
How much data do we need to monitor and understand our environment?

Claudia Notarnicola

Many applications related to Earth observation, from monitoring water resources and vegetation to climate change impact analysis, have developed rapidly in the last few years thanks to advances in methods and models and the increased availability of data. These applications are characterized by a set of requirements in terms of data needs (Verstraete et al. 2015).

Requirements for the robust monitoring of geophysical variables are expressed in terms of six criteria: uncertainty, horizontal resolution, vertical resolution, observing cycle, timeliness, and stability (where appropriate) (OSCAR 2021a). They apply to a wide range of applications such as hydrology, agricultural meteorology, climate, nowcasting, and atmospheric processes.

THE ERA OF BIG DATA

Data refer to both satellite images and in-situ measurements. Different space agencies operate many satellites which provide long-term global observations of the land surface, biosphere, atmosphere, and oceans. From these observations, key parameters for land surfaces such as surface temperature, land surface water, energy, and carbon fluxes are derived, thus enabling the monitoring of their spatiotemporal variation. Earth observation components have been developed asynchronously because, until now, the data and information to be exploited were relatively small in size and most applications were not yet ready to process large volumes and many types of data (Lehmann et al. 2014; McCabe et al. 2017). Nowadays, with the huge increase in data availability (Figure 1), for example through the Copernicus program (Copernicus 2021), the pressure to deliver useful information on the main societal challenges is evident, and a paradigm shift is needed to reduce duplication, optimize the observing potential, and empower the application suites.

For Earth observation, other relevant data also come from ground networks. These data are distributed worldwide (OSCAR 2021b), even though some areas, such as Europe and North America, exhibit a higher density of data, and a decline of in-situ networks has been observed since the 1980s (Fekete et al. 2012). For example, surface observations are one of the key observation components of the World Meteorological Organization’s (WMO) Global Observing System (GOS), which is composed of about 11,000 stations at or near the Earth’s surface that gather atmospheric parameters (GOS 2021). A regional and global network of micrometeorological tower sites (FLUXNET) has also been established around the world (Fluxnet 2021). Moreover, a plethora of other information derived from CubeSats (Nanosats 2021), high-altitude platforms (d’Oliveira et al. 2016), unmanned vehicles, and smartphone technologies is becoming available to the scientific community (Reis et al. 2015; McCabe et al. 2017).

Figure 1: The need for information from satellites is growing at an ever-increasing rate. The picture shows the satellites developed by the European Space Agency to observe our planet (Credit: ESA/ESA-developed Earth observation missions – CC BY-SA IGO 3.0, https://creativecommons.org/licenses/by-sa/3.0/igo/).

Based on these considerations, we can definitively say that the era of Big Data has arrived in the domain of Earth observation. The Digital Universe Report released by the International Data Corporation (IDC) even states that the amount of data is set to double every two years (Gantz and Reinsel 2012). In addition, the capability to access global Earth observation information is rapidly increasing. For Earth monitoring systems, there is a demand for continuity of data, information on data uncertainty, and optimization of the network of multi-source remote sensing data. Moreover, data assimilation procedures still need validation data, and Earth observation information has not yet satisfied some specific needs of model parameters (Guo et al. 2015; IPCC 2014; McCabe et al. 2017; Lehmann et al. 2014; Largeron et al. 2020).

METHODS FOR EFFICIENT USE OF HETEROGENEOUS DATASETS

The availability of large data sets, for example from ground measurements and satellites, has led to the development of several methodologies that can cope with dataset heterogeneity. The main methodologies that deal with the complexity and richness of data are: a) data fusion approaches, which aim at improving the retrieval of information when several data sources are available; b) data mining, which extracts non-redundant information from these data sets; c) domain adaptation, whose main objective is to link data sets in different domains (spatial and/or temporal). These methodologies define and identify links in the data in order to extract only relevant information. Moreover, most of them apply techniques based on Artificial Intelligence, which by nature can deal with heterogeneous and large data sets (Tuia et al. 2016).
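As a minimal illustration of the data fusion idea, the sketch below combines independent estimates of the same quantity by inverse-variance weighting, one of the simplest fusion rules. It is not a method taken from the cited works; the function name, the snow depth values, and the error variances are hypothetical and chosen only for illustration.

```python
def fuse_inverse_variance(estimates, variances):
    """Fuse independent estimates of one quantity by inverse-variance weighting.

    Returns a tuple (fused_value, fused_variance).
    """
    if not estimates or len(estimates) != len(variances):
        raise ValueError("need non-empty inputs of equal length")
    if any(v <= 0 for v in variances):
        raise ValueError("variances must be positive")
    # More certain sources (smaller variance) receive larger weights.
    weights = [1.0 / v for v in variances]
    total = sum(weights)
    fused = sum(w * x for w, x in zip(weights, estimates)) / total
    return fused, 1.0 / total


# Hypothetical snow depth estimates in metres:
# a ground station reading and a satellite retrieval (assumed variances).
fused_depth, fused_var = fuse_inverse_variance([1.20, 1.50], [0.01, 0.04])
print(fused_depth, fused_var)
```

The fused value lies closer to the more certain source, and the fused variance is smaller than either input variance, which is the sense in which combining sources can improve the retrieval under these assumptions.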

PHYSICALLY BASED AND EMPIRICAL MODELLING

In the last few years, several theoretical, physically based, and empirical models have been developed at both local and global scales. Physically based models represent natural processes by describing each individual physical process of the system and combining them into a complex one. Physical equations describe natural processes, such as stream flow or sediment transport. Physically based models can explain the spatial variability of most important land surface characteristics such as vegetation and soil parameters, as well as climate parameters including precipitation, temperature, and evaporation (Legesse et al. 2003). These complex approaches often require high resolution spatial and temporal input data with different resolution depending on the scale addressed by the model.

Along with physically based models, empirical models and empirical expressions used in theoretical model construction, also known as data-driven models, have been developed based on observations of phenomena relating to natural processes (Goldstein & Coco 2015). Both types of model are calibrated and validated against observed environmental data, which have traditionally been available only from a few sparsely distributed monitoring stations or from costly short-term field measurement campaigns. Nowadays this scenario has changed considerably thanks to the large availability of data in many application domains. For example, low-cost sensors and citizen crowdsourcing can enlarge the scale of environmental measurements (McCabe et al. 2017). Models play a potentially important role in integrating these data as inputs to refine and quantify important environmental relationships and processes. Models can also benefit from having new data to use as calibration, validation, and assimilation points to improve the outputs of increasingly complex and downscaled data. Figure 2 shows an example of the likely interaction among physically based models, ground observations, satellite data, and data-driven methods.

Figure 2: Interaction among physically based models, data (ground and satellite) and data-driven methods which can be used for data assimilation, data fusion and for generating hybrid models (combination of physical and data-driven models).

Moreover, the combined use of data and models at different spatio-temporal scales can serve in the identification of scale-dependent and universal relationships between potential causal factors and outcomes of interest. Global climate models, currently used for science and policy purposes, are an archetype of how data and modeling might be integrated. In addition, hydrological science provides an example in which there are already large datasets and initiatives underway to develop data-driven modeling methods, such as the “Panta Rhei-Change in Hydrology and Society initiative” (Montanari et al. 2013).

APPLICATIONS FOR EARTH OBSERVATION

Data and methods can serve applications in several domains. It has been shown how the richness and variety of data can improve the understanding of complex processes. One example is climate change analysis, where extremely large datasets stemming from asynchronous satellite, aerial, and ground observation experiments are available. Earth observational elements and factors (e.g., glaciers, lakes, vegetation, radiation, and urbanization), global environmental change information, and simulation systems can be obtained from such huge data sets. The availability of data has drastically improved prediction capabilities and the physical description of the phenomena (Guo et al. 2015). Besides climate change studies, applications in other domains such as hydrology are profiting from data availability to improve the representativeness of phenomena and even to implement new physical laws (Lehmann et al. 2014). Artificial Intelligence-based methodologies can be applied to derive, among others, accurate land cover maps (Romero et al. 2016) and river discharge predictions (Callegari et al. 2015). As an example, the following section presents the current situation in the monitoring of the cryosphere in mountain areas, outlining the main data sources and models available, with their related uncertainties and gaps. These considerations are the result of many projects developed in this domain by Eurac Research over the last ten years.

CURRENT NEEDS FOR IMPROVED MONITORING OF THE CRYOSPHERE IN MOUNTAIN AREAS

Recent studies of snow climatology suggest an overall tendency toward decreases in several metrics of snow such as snow cover extent, snow cover duration, snow water equivalent, and snow depth, even though not all the areas show uniform trends (Bormann et al. 2018; Beniston et al. 2018; Smith & Bookhagen 2018; Notarnicola 2020).

Because of the significant consequences of changes in snow amount on Earth’s environment and population, scientists have developed several approaches to continuously measure and monitor snow and its related properties. To establish this knowledge and enable us to produce such information, adequate observations of snow parameters are needed. Today most of the available information on snow parameters, apart from snow extent, is based on observations from ground-based (meteo) stations, which in some cases may be sparse and therefore not sufficient for providing accurate estimates of snow parameters over mountain areas.

The most widespread and long-term snow measurement is snow depth. Unless the measurement area is highly homogeneous, automatic snow depth measurements at the point scale are not necessarily representative of the surrounding landscape (Haberkorn et al. 2019). Nowadays it is possible to access databases from locations worldwide that provide snow depth measurements from as far back as 1960, such as those found here: https://globalcryospherewatch.org/reference/snow_inventory.php.

Another key parameter for understanding snow dynamics and conditions is snow water equivalent. Unfortunately, these observations are not as readily available in time and space as snow depth measurements; for this reason, in several research approaches, snow water equivalent is estimated by model simulations (Largeron et al. 2020).

Ensuring additional and essential in-situ data to alleviate this problem can be challenging and expensive due to the need for data from remote and highly inaccessible areas. For this reason, Earth observation from space has the potential to be particularly useful. Indeed, since the beginning of the satellite era in the 1960s, the areal extent of snow cover has been a key satellite observation target for the purposes of daily weather forecasting and a better understanding of the Earth’s climate system and hydrological cycle (Frei et al. 2012). Nowadays, this has become increasingly relevant thanks to Copernicus, the European Union’s Earth observation program, which, in combination with the plethora of other satellites, can be exploited to monitor the cryosphere (Figure 3). The techniques based on satellite data nevertheless present some limitations due to factors such as cloud presence, forest cover, and the heterogeneity of complex mountainous terrain.

Figure 3: Satellite missions for observing the cryosphere (including snow, glaciers, permafrost, solid precipitation, river and lake ice) (from Global Cryosphere Watch, Courtesy of M. Drinkwater and J. Key, updated March 2, 2021).

Snow models express the energy and mass balances through equations which characterize the temperature and water content of the canopy, snowpack, and soil, coupled with terms that describe the evaporation, sublimation, and melt processes. Snow cover models vary significantly, from the simplest degree-day snow schemes to multi-layer snow cover evolution models, which can even include explicit representations of snow microstructure (e.g., for avalanche hazard forecasting). Physically based models have different sources of uncertainty, such as the meteorological inputs (Raleigh et al. 2015) and the chosen parameters and simplifications in the description of the physical processes (Essery et al. 2013; Günther et al. 2019). The uncertainties of snow cover simulations may also be related to the small-scale variability of the study area, especially in complex mountainous terrain, with features such as topography and slope. Intercomparison studies have shown that models differ greatly in their predictions of snow accumulation and ablation. Although the physical processes within the snowpack are increasingly well parameterized, uncertainties still exist (Largeron et al. 2020). Data assimilation of ground and satellite data would not only improve the prediction of main snow parameters such as snow water equivalent and river discharge, but also favor the understanding of the snowpack processes.
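To make the simplest end of this model range concrete, the sketch below implements a basic degree-day snow scheme: precipitation accumulates as snow below a threshold temperature, and melt is proportional to the positive difference between air temperature and a melt threshold. This is a generic textbook formulation, not the scheme of any model cited here; the function name, the default degree-day factor, the thresholds, and the input series are all assumed values for illustration.

```python
def degree_day_swe(temps_c, precip_mm, ddf=3.0, t_melt=0.0, t_snow=1.0, swe0=0.0):
    """Simulate daily snow water equivalent (mm) with a degree-day scheme.

    temps_c   -- daily mean air temperature (deg C)
    precip_mm -- daily precipitation (mm water equivalent)
    ddf       -- degree-day factor (mm per deg C per day), assumed value
    t_melt    -- temperature above which melt occurs (deg C), assumed value
    t_snow    -- temperature below which precipitation falls as snow (deg C), assumed value
    swe0      -- initial snow water equivalent (mm)
    """
    if len(temps_c) != len(precip_mm):
        raise ValueError("temperature and precipitation series must have equal length")
    swe = swe0
    series = []
    for temp, precip in zip(temps_c, precip_mm):
        # Accumulation: precipitation is stored as snow on cold days.
        if temp < t_snow:
            swe += precip
        # Ablation: potential melt scales with positive degree-days,
        # limited by the snow actually present.
        melt = min(swe, ddf * max(temp - t_melt, 0.0))
        swe -= melt
        series.append(swe)
    return series


# Hypothetical five-day forcing: two cold snowy days followed by a warm spell.
print(degree_day_swe([-5, -2, 0.5, 4, 6], [10, 5, 0, 0, 0]))
# → [10.0, 15.0, 13.5, 1.5, 0.0]
```

A single calibrated parameter (the degree-day factor) stands in for the full energy balance here, which is why such schemes need few inputs but cannot explain individual snowpack processes the way multi-layer models can.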

| Measuring / observing tool / models | Advantages | Main gaps | Proposed developments |
| --- | --- | --- | --- |
| Ground observations | Continuous in terms of time for some parameters such as snow depth | Missing in high elevation; sparsely available information for relevant parameters such as snow water equivalent; number of networks is declining | Increase density of networks especially in high mountains; focus on other parameters such as snow water equivalent, liquid water content |
| Satellite observations | Continuous in terms of space and able to provide information on inaccessible areas | Acquisition time of satellite (e.g., weekly) does not match requirements for daily observations; cloud coverage; detection of snow under forest canopy. Relevant parameters such as snow water equivalent can be monitored only at large scale with coarse ground resolution | New spectral bands for better discrimination of cloud over snow; sensors to detect relevant parameters such as snow water equivalent; dedicated studies for detection of snow under forest canopy |
| Physically based models | Continuous in terms of both space and time and able to provide physical explanations to the snowpack processes | Large uncertainties for specific processes (e.g., accumulation/ablation). Limited availability of ground data in specific regions (e.g., high elevations) | Reduce uncertainties through data assimilation with satellite products. Use of data-driven approaches (e.g., Artificial Intelligence) |

Table 1: Advantages, gaps and new developments needed to improve the understanding of cryosphere processes.

CONCLUSIONS

We are witnessing the convergence of continuous data requests from different scientific communities and the generation of Big Data in a variety of domains, e.g., Earth Observation. Yet applications, data acquisition systems and elaboration methods are still only being developed for narrow disciplinary requirements, without systematically exploiting synergies between disciplines.

The challenge now is to use data more efficiently: a deeper understanding of Earth processes can be achieved by providing the data to fill specific gaps (Halevy et al. 2009). In the current situation a paradigm shift may be required: a framework capable of detecting real data gaps and missing links in existing data, methods, and models. Such gaps can then be filled efficiently with specific satellite missions, new ground networks, improved methodologies, and better physically based models to reach auto-completion in the Earth Observation domain.

For future and better exploitation of these data and methods, it would be advisable to: 1) create requirement-based clusters of applications and clusters of data and methods to identify the links between them, since such links explain similarities and differences between datasets and allow the exchange of data and methods between clusters; 2) generate an ensemble of solutions for applications based on the existing available data and methods; 3) identify the main gaps in the current information setting by analyzing the ensemble of solutions and their limitations.

The impact of these actions would reach far beyond the Earth Observation domain – they could provide solutions to the incessant growth of data volumes, methods and models and lead to smarter, more cost-efficient approaches and tools that fully utilize the potential of Big Data. Moreover, the need for credible information about the current and future trends in environmental processes to aid in decision making is becoming more urgent (IPCC 2014).



Abstract

The quantity of data and the methods available for observing our planet today are unprecedented. Booming sectors, like remotely sensed data and their processing algorithms, have experienced incessant and in some cases uncontrolled growth. In parallel, many communities, such as natural hazard and environmental monitoring scientists, continuously ask for new, more intense field campaigns and higher spectral, spatial, and temporal resolution satellite missions. We are witnessing the convergence of continuous data requests from scientific communities for a better understanding of our environment and the generation of Big Data in different domains, e.g., Earth Observation and new and emerging technologies and approaches such as smartphones, low-cost sensors, and citizen crowdsourcing.

REFERENCES

  • Araripe d’Oliveira, Flavio; Lourenço de Melo, Francisco C. & Campos Devezas, Tessaleno (2016). High-Altitude Platforms — Present Situation and Technology Trends, J. Aerosp. Technol. Manag., São José dos Campos, Vol. 8, No. 3, Jul.-Sep., pp. 249–262.

  • Beniston, Martin; Farinotti, Daniel; Stoffel, Markus; et al. (2018). The European mountain cryosphere: a review of its current state, trends, and future challenges. The Cryosphere 12, pp. 759–794.

  • Bormann, Kat J.; Brown, Ross D.; Derksen, Chris & Painter, Thomas H. (2018). Estimating snow-cover trends from space. Nat. Clim. Change 8, pp. 924–928.

  • Callegari, Mattia; Mazzoli, Paolo; De Gregorio, Ludovica; et al. (2015). A Seasonal river discharge forecast using machine learning techniques: a case study in the Italian Alps, Water 7 (5), pp. 2494–2515.

  • COPERNICUS 2021: Copernicus program, http://www.copernicus.eu/.

  • Essery, Richard L. H.; Morin, Samuel; Lejeune, Yves & Menard, Cecile B. (2013). A comparison of 1701 snow models using observations from an alpine site. Adv. Water Resour. 55, pp. 131–148. doi: 10.1016/j.advwatres.2012.07.013.

  • Fekete, Balázs M.; Looser, Ulrich; Pietroniro, Alain & Robarts, Richard D. (2012). Rationale for Monitoring Discharge on the Ground, J. Hydrometeorol. 13, pp. 1977–1986, https://doi.org/10.1175/jhm-d-11-0126.1.

  • FLUXNET 2021: The Data portal serving the FLUXNET community http://fluxnet.fluxdata.org/.

  • Frei, Allan; Tedesco, Marco; Lee, Shihyan; et al. (2012). A review of global satellite-derived snow products, Adv. Space Res. 50, pp. 1007–1029.

  • Gantz, John & Reinsel, David (2012). The Digital Universe in 2020: Big Data, Bigger Digital Shadows, and Biggest Growth in the Far East, IDC Analyze the Future, Framingham.

  • Goldstein, Evan B. & Coco, Giovanni (2015). Machine learning components in deterministic models: hybrid synergy in the age of data, Frontiers in Environmental Science 3, 33.

  • GOS 2021: WMO Global Observing System, https://public.wmo.int/en.

  • Günther, Daniel; Marke, Thomas; Essery, Richard & Strasser, Ulrich (2019). Uncertainties in snowpack simulations-assessing the impact of model structure, parameter choice, and forcing data error on point-scale energy balance snow model performance. Water Resour. Res. 55, pp. 2779–2800. doi: 10.1029/2018WR023403.

  • Guo, Hua-Dong; Zhang, Li & Zhu, Lan-Wei (2015). Earth observation big data for climate change research, Advances in Climate Change Research 6, pp. 108–117.

  • Haberkorn, Anna; Helmert, Jürgen; Leppänen, Leena; López-Moreno, Juan-Ignacio & Pirazzini, Roberta (2019). European Snow Booklet – an Inventory of Snow Measurements in Europe, doi: 10.16904/envidat.59.

  • Halevy, Alon; Norvig, Peter & Pereira, Fernando (2009). The unreasonable effectiveness of data, IEEE Intelligent Systems 24 (2), pp. 8–12.

  • IPCC Climate Change 2014: Synthesis Report. Geneva, Switzerland 2014, https://www.ipcc.ch/report/ar5/syr/.

  • Legesse, Dagnachew; Vallet-Coulomb, Christine & Gasse, Françoise (2003). Hydrological response of a catchment to climate and land use changes in tropical Africa: Case study south central Ethiopia, Journal of Hydrology 275, pp. 67–85.

  • Lehmann, Anthony; Giuliani, Gregory; Ray, Nicolas; et al. (2014). Reviewing innovative Earth Observation solutions for filling science-policy gaps in hydrology, Journal of Hydrology.

  • Largeron, Chloé; Dumont, Marie; Morin, Samuel; et al. (2020). Toward Snow Cover Estimation in Mountainous Areas Using Modern Data Assimilation Methods: A Review. Front. Earth Sci. 8:325. doi: 10.3389/feart.2020.00325.

  • McCabe, Matthew F.; Rodell, Matthew; Alsdorf, Douglas E.; et al. (2017). The future of Earth observation in hydrology, Hydrol. Earth Syst. Sci. 21, pp. 3879–3914.

  • Montanari, Alberto; Young, Gordon; Savenije, Hubert H. G.; et al. (2013). Panta Rhei – Everything Flows: Change in hydrology and society – The IAHS Scientific Decade 2013–2022. Hydrological Sciences Journal. 58 (6), pp. 1256–1275.

  • NANOSATS 2021: Nanosatellites and cubesats database, http://www.nanosats.eu/.

  • Notarnicola, Claudia (2020). Hotspots of snow cover changes in global mountain regions over 2000–2018. Rem. Sen. Environ. 243, 111781.

  • OSCAR 2021a: The Observing Systems Capability Analysis and Review tool OSCAR maintained by the World Meteorological Organization, https://www.wmo-sat.info/oscar/observingrequirements.

  • OSCAR 2021b: World Meteorological Organization’s official repository of WIGOS metadata for all surface-based observing stations and platforms, https://oscar.wmo.int/surface//index.html#/.

  • Raleigh, Mark S.; Lundquist, Jessica D. & Clark, Martyn P. (2015). Exploring the impact of forcing error characteristics on physically based snow simulations within a global sensitivity analysis framework. Hydrol. Earth Syst. Sci. 19, pp. 3153–3179. doi: 10.5194/hess-19-3153-2015.

  • Reis, Stefan; Seto, Edmund; Northcross, Amanda; et al. (2015). Integrating modelling and smart sensors for environmental and human health, Environmental Modelling & Software 74, pp. 238–246.

  • Romero, Adriana; Gatta, Carlo & Camps-Valls, Gustau (2016). Unsupervised Deep Feature Extraction for Remote Sensing Image Classification. IEEE Transactions on Geoscience and Remote Sensing 54 (3), pp. 1349–1362.

  • Smith, Taylor & Bookhagen, Bodo (2018). Changes in seasonal snow water equivalent distribution in High Mountain Asia (1987 to 2009). Sci. Adv. 4, e1701550.

  • Tuia, Devis; Persello, Claudio & Bruzzone, Lorenzo (2016). Domain Adaptation for the classification of remote sensing data: an overview of recent advances, IEEE Geoscience and Remote Sensing Magazine, pp. 41–57.

  • Verstraete, Michel M.; Diner, David J. & Bézy, Jean-Loup (2015). Planning for a spaceborne Earth Observation mission: From user expectations to measurement requirements, Environmental Science & Policy 54, pp. 419–427.