2. ESDC Description

2.1. Macro Structure

The data is organised in the described 4-dimensional form x(u,v,t,k), but additionally each data stream k is assigned to one of the subsystems of interest:

  • Land surface
  • Atmospheric forcing
  • Socio-economic data

2.2. Spatial and Temporal Coverage

The fine grid of the ESDC has a spatial resolution of 0.083° (5′), which is properly nested within a coarse grid of 0.25° (15′). Hence, the ESDC is available in two versions:

  • High resolution version: 0.083° (5′) spatial resolution,
  • Low resolution version: 0.25° (15′) spatial resolution.

While the latter contains all variables, the former only comprises those variables that are natively available at this resolution. The high-resolution data are nested within the low-resolution data set so that the two can be analysed in tandem. In particular, data from the socio-economic subsystem are often organised by administrative units, typically nation states, rather than on regular grids. These data are distributed to the coarse grid by means of a nation-state mask, which is created by assigning a nation-state property to each grid point.
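The mask-based distribution of per-country data onto the grid can be illustrated with a minimal sketch. The function name, the country codes, and the numbers are hypothetical and chosen only for illustration; the actual mask format used by the ESDC is not specified here.

```python
def disperse_by_country(country_mask, values_by_country, fill_value):
    """Spread one value per country onto a grid.

    country_mask is a 2-D list holding a country code per grid point
    (None where no country applies); codes without a value get fill_value.
    """
    return [
        [values_by_country.get(code, fill_value) for code in row]
        for row in country_mask
    ]


# Hypothetical 2 x 3 mask with two made-up country codes "A" and "B".
mask = [["A", "A", "B"],
        [None, "B", "B"]]
values = {"A": 1.0, "B": 2.0}  # illustrative numbers, not real data
grid = disperse_by_country(mask, values, fill_value=-9999.0)
```

Every grid point of a country receives the same value, so the result carries no sub-national spatial information.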

The temporal resolution is 8 days.

The time span currently covered is 2001-2011. We are committed to extending this period at both ends, but preserving the ESDC’s characteristics requires a reasonable coverage of data streams over the extended period.

2.3. Format and Structure

The binary data format for the Earth System Data Cube (ESDC) in the CAB-LAB project is netCDF 4 classic, where the term classic stands for an underlying HDF-5 format accessed by a netCDF 3 API, i.e. restricted to the classic netCDF data model.

The netCDF file’s content and structure follow the CF-conventions. That is, there are always at least three dimensions defined:

  1. lon - Always the inner and therefore fastest varying dimension. Defines the raster width of spatial images.
  2. lat - Always the second dimension. Defines the raster height of spatial images.
  3. time - Time dimension.

Figure: The spatio-temporal structure of the Earth System Data Cube.

There are 1D-variables related to each dimension providing its actual values:

  • lon(lon) and lat(lat) - longitudes and latitudes in decimal degrees defined in a WGS-84 geographical
    coordinate reference system. The spatial grid is homogeneous with the distance between two grid points referred to as the ESDC’s spatial resolution.
  • start_time(time) and end_time(time) - Period start and end times of a datum given in
    days since 2001-01-01 00:00. The increments between two values in time are identical and referred to as the ESDC’s temporal resolution.
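As a minimal sketch of how such time values relate to calendar dates, the following converts between "days since 2001-01-01 00:00" and Python datetime objects. The helper names decode_time and encode_time are hypothetical and not part of any ESDC tool.

```python
from datetime import datetime, timedelta

REF_TIME = datetime(2001, 1, 1)  # reference date/time stated in the text


def decode_time(days_since_ref, ref_time=REF_TIME):
    """Convert a start_time/end_time value (days since ref_time) to a datetime."""
    return ref_time + timedelta(days=days_since_ref)


def encode_time(moment, ref_time=REF_TIME):
    """Convert a datetime to fractional days since ref_time."""
    return (moment - ref_time).total_seconds() / 86400.0


# Start times of the first three 8-day periods of 2001.
starts = [encode_time(datetime(2001, 1, 1) + timedelta(days=8 * i)) for i in range(3)]
```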

There is usually only a single geophysical variable with shape(time, lat, lon) represented by each netCDF file. So each netCDF file is composed of length(time) spatial images of that variable, where each image of size length(lon) x length(lat) pixels has been generated by aggregating all source data contributing to the period given by the ESDC’s temporal resolution.

To limit the size of individual files, the geophysical variables are stored in one file per year. For example, if the spatial resolution is 0.25 degrees and the temporal resolution is an 8-day period, then there will be up to 46 images of 1440 x 720 pixels in each annual netCDF file. These annual files are stored in dedicated sub-directories as follows:

<cube-root-dir>/
    cube.config
    data/
        LAI/
            2001_LAI.nc
            2002_LAI.nc
            ...
            2011_LAI.nc
        Ozone/
            2001_Ozone.nc
            2002_Ozone.nc
            ...
            2011_Ozone.nc
        ...

The name of the geophysical variable in a netCDF file must match both the name of the corresponding sub-directory and the variable part of the file name.
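The directory layout shown above can be expressed as a small path helper. This is a sketch only: variable_file_path is a hypothetical name, and the pattern is inferred from the example tree rather than taken from the cube tools.

```python
from pathlib import PurePosixPath


def variable_file_path(cube_root_dir, variable, year):
    """Build <cube-root-dir>/data/<variable>/<year>_<variable>.nc."""
    return PurePosixPath(cube_root_dir) / "data" / variable / f"{year}_{variable}.nc"


# Annual LAI files for the period covered by the example tree.
lai_files = [str(variable_file_path("cube", "LAI", year)) for year in range(2001, 2012)]
```

Because the variable name appears in both the directory and the file name, a mismatch between the two is easy to detect when scanning a cube directory.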

The text file cube.config contains a Data Cube’s static configuration such as its temporal and spatial resolution. Also the spatial coverage is constant, that is, all spatial images are of the same size. Where actual data is missing, fill values are inserted to expand a data set to the dimensions of the Data Cube. The fill values in the Data Cube are identical to the ones used in the Data Cube’s sources. The same holds for the data types. While all images for all time periods have the same size, the temporal coverage for a given variable may vary. Missing spatial images for a given time period are treated as images with all pixels set to a fill value.

The following table contains all possible configuration parameters:

Parameter     Default Value          Description
temporal_res  8                      The constant temporal resolution given as integer days.
calendar      'gregorian'            Defines the Data Cube’s time units.
ref_time      datetime(2001, 1, 1)   The Data Cube’s time unit is days since a reference date/time.
start_time    datetime(2001, 1, 1)   The start date/time of contributing source data.
end_time      datetime(2011, 1, 1)   The end date/time of contributing source data.
spatial_res   0.25                   The constant spatial resolution given in decimal degrees.
grid_x0       0                      The spatial grid’s X-offset.
grid_y0       0                      The spatial grid’s Y-offset.
grid_width    1440                   The spatial grid’s width. Must always be 360 / spatial_res.
grid_height   720                    The spatial grid’s height. Must always be 180 / spatial_res.
variables     None                   The variables contained in the Data Cube.
file_format   'NETCDF4_CLASSIC'      The target binary file format.
compression   False                  Whether or not the target binary files should be compressed.
model_version '0.1'                  The version of the Data Cube model and configuration.
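The grid constraints in the table, and the figure of up to 46 images per year given earlier, follow from simple arithmetic. The sketch below reproduces them; the function names are hypothetical, and rounding up the number of periods per year is an assumption based on the "up to 46" statement.

```python
import math


def grid_size(spatial_res):
    """Return (grid_width, grid_height) of a global grid for a resolution in degrees."""
    return round(360 / spatial_res), round(180 / spatial_res)


def periods_per_year(temporal_res, days_in_year=365):
    """Number of periods needed to cover one year (the last one may be partial)."""
    return math.ceil(days_in_year / temporal_res)


low_res = grid_size(0.25)      # coarse grid
high_res = grid_size(1 / 12)   # fine grid; 0.083 degrees is 1/12 degree rounded
images = periods_per_year(8)
```

Note that the fine grid has to be computed from the exact value 1/12 rather than the rounded 0.083, otherwise the width is not a whole number of cells.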

2.4. Processing Applied

The Data Cube is generated by the cube-cli tool. This tool creates a Data Cube for a given configuration and can be used to subsequently add variables, one by one, to the Data Cube. Each variable is read from its specific data source and transformed in time and space to comply with the specification defined by the target Data Cube’s configuration.

The general approach is as follows. For each variable and a given Data Cube time period:

  1. Read the variable’s data from all contributing sources that overlap with the target period;
  2. Perform temporal aggregation of all contributing spatial images in the original spatial resolution;
  3. Perform spatial upsampling or downsampling of the image aggregated in time;
  4. Mask the resulting upsampled/downsampled image by the common land-sea mask;
  5. Insert the final image for the variable and target time period into the Data Cube.

The following sections describe each method used in more detail.

2.4.1. Gap-Filling Approach

The current version (version 0.1, Feb 2016) of the ESDC does not explicitly fill gaps. However, some gap-filling occurs during temporal aggregation as described below. The CAB-LAB team may provide gap-filled ESDC versions at a later point in the project. Gap-filling is part of the Data Analytics Toolkit and is therefore not performed during Data Cube generation, so that the information on the original data coverage is retained as far as possible.

For future Data Cube versions, per-variable gap-filling strategies may be applied. Also, only a spatio-temporal region of interest may be gap-filled, while cells outside this region may be filled with global default values. An instructive example of such an approach would be the gap-filling of a leaf area index (LAI) data set, which only takes place in mid-latitudes while gaps in high latitudes are filled with zeros.

2.4.2. Temporal Resampling

Temporal resampling starts on 1 January of every year so that the i-th spatial image of each year in the ESDC refers to the same time of the year, namely the period starting at i x temporal resolution. Source data is collected for every resulting ESDC target period. If there is more than one contribution in time, then each contribution is weighted according to its temporal overlap with the target period. Finally, target pixel values are computed by averaging all weighted values in time that are not masked by a fill value. By doing so, some temporal gaps are filled implicitly.
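The overlap-weighted averaging for a single pixel can be sketched as follows. This is an illustration under stated assumptions, not the cube-cli implementation: the function names are hypothetical, times are given in days since the reference date, and the fill value is assumed to be an ordinary number (not NaN) so that it can be compared with ==.

```python
def overlap_days(src_start, src_end, tgt_start, tgt_end):
    """Length in days of the overlap between a source and a target period."""
    return max(0.0, min(src_end, tgt_end) - max(src_start, tgt_start))


def aggregate_pixel(contributions, tgt_start, tgt_end, fill_value):
    """Overlap-weighted mean of (start, end, value) contributions for one pixel.

    Contributions carrying the fill value are skipped, so a period is only
    left as fill_value when no valid contribution overlaps it.
    """
    weighted_sum = 0.0
    weight_total = 0.0
    for start, end, value in contributions:
        weight = overlap_days(start, end, tgt_start, tgt_end)
        if weight <= 0.0 or value == fill_value:
            continue
        weighted_sum += weight * value
        weight_total += weight
    return weighted_sum / weight_total if weight_total > 0.0 else fill_value


# Target period: days 0 to 8. Two sources cover 2 and 6 days of it.
mean = aggregate_pixel([(0, 2, 10.0), (2, 8, 20.0)], 0, 8, fill_value=-9999.0)
```

Skipping fill values instead of propagating them is what produces the implicit gap-filling: a target pixel is valid as soon as one overlapping source value is valid.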

2.4.3. Spatial Resampling

Spatial resampling occurs after temporal resampling only if the ESDC’s spatial resolution differs from the data source resolution.

If the ESDC’s spatial resolution is higher than the data source’s spatial resolution, source images are upsampled by rescaling, thereby duplicating original values without performing any spatial interpolation.
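Upsampling by duplication can be sketched in a few lines. The function name upsample is hypothetical and an integer scale factor between source and target grid is assumed.

```python
def upsample(image, factor):
    """Enlarge a 2-D list by an integer factor, repeating each source value."""
    result = []
    for row in image:
        wide_row = [value for value in row for _ in range(factor)]
        # Repeat the widened row `factor` times; copy so rows stay independent.
        result.extend(list(wide_row) for _ in range(factor))
    return result


doubled = upsample([[1, 2], [3, 4]], 2)
```

Each source cell becomes a factor x factor block of identical values, so fill values are duplicated in the same way as valid data.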

If the ESDC’s spatial resolution is lower than the data source’s spatial resolution, source images are downsampled by aggregation, thereby performing a weighted spatial averaging that takes missing values into account. If there is no integer factor between the source and the Data Cube resolution, the weights are determined by the spatial overlap of source and target cells.
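A minimal sketch of such an overlap-weighted downsampling is given below. It is an illustration, not the cube-cli code: the function names are hypothetical, the weights are computed in grid-index space (ignoring that cell areas shrink towards the poles), and the fill value is assumed to be an ordinary number rather than NaN.

```python
import math


def axis_weights(target_index, factor):
    """Return [(source_index, overlap)] for one target cell along one axis."""
    lo, hi = target_index * factor, (target_index + 1) * factor
    weights = []
    for s in range(int(math.floor(lo)), int(math.ceil(hi))):
        overlap = min(hi, s + 1) - max(lo, s)
        if overlap > 0:
            weights.append((s, overlap))
    return weights


def downsample(image, factor, fill_value):
    """Shrink a 2-D list by `factor` (source cells per target cell, may be fractional)."""
    src_h, src_w = len(image), len(image[0])
    tgt_h, tgt_w = int(round(src_h / factor)), int(round(src_w / factor))
    result = []
    for ty in range(tgt_h):
        row = []
        for tx in range(tgt_w):
            total = weight_sum = 0.0
            for sy, wy in axis_weights(ty, factor):
                for sx, wx in axis_weights(tx, factor):
                    if sy >= src_h or sx >= src_w:
                        continue  # guard against rounding at the grid edge
                    value = image[sy][sx]
                    if value != fill_value:  # missing values get no weight
                        total += wy * wx * value
                        weight_sum += wy * wx
            row.append(total / weight_sum if weight_sum > 0 else fill_value)
        result.append(row)
    return result


halved = downsample([[1, 3], [5, 7]], 2, fill_value=-1)
```

With an integer factor all weights are 1 and the result is a plain mean of the valid source cells; with a fractional factor such as 1.5, cells straddling a target boundary contribute in proportion to their overlap.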

Figure panels: Contiguous Oversampling, Contiguous Undersampling, Discontiguous Oversampling, Discontiguous Undersampling.

2.4.4. Land-Water Masking

After spatial resampling, a land-water mask is applied to individual variables depending on whether a variable is defined for water surfaces only, land surfaces only, or both. A common land-water mask is used for all variables for a given spatial resolution. Masked values are indicated by fill values.
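The masking step can be sketched as follows. The function name, the boolean mask layout (True for water), and the keep argument are assumptions made for illustration; the actual mask encoding of the ESDC is not described here.

```python
def apply_land_water_mask(image, water_mask, keep, fill_value):
    """Replace pixels outside a variable's domain by fill_value.

    keep is 'land', 'water', or 'both'; water_mask[y][x] is True over water.
    """
    if keep not in ("land", "water", "both"):
        raise ValueError("keep must be 'land', 'water' or 'both'")
    if keep == "both":
        return [list(row) for row in image]
    keep_water = keep == "water"
    return [
        [value if is_water == keep_water else fill_value
         for value, is_water in zip(row, mask_row)]
        for row, mask_row in zip(image, water_mask)
    ]


image = [[1.0, 2.0], [3.0, 4.0]]
water = [[True, False], [False, True]]
land_only = apply_land_water_mask(image, water, keep="land", fill_value=-9999.0)
```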

2.4.5. Constraints and Limitations

The ESDC approach of transforming all variables onto a common grid greatly facilitates handling and joint analysis of data sets that originally had different characteristics and were generated under different assumptions. Regridding, gap-filling, and averaging, however, may alter the information contained in the original data considerably.

The main idea of the ESDC is to provide a consistent and synoptic characterisation of the Earth System at given time steps to promote global analyses. Therefore, conducting small-scale, high-frequency studies that are potentially highly sensitive to individual artifacts introduced by data transformation is not encouraged. The cautious expert user should hence carefully check phenomena close to the land-sea mask or in data-sparse regions of the original data. If in doubt, suspicious patterns in the ESDC or unexpected analytical results should be verified with the source data in the native resolution. We try as much as possible to conserve the characteristics of the original data, while facilitating data handling and analysis by transformation.

This is a difficult balance to strike that at times involves inconvenient trade-offs. We thus embrace transparency and reproducibility to enable the informed user to evaluate the validity and consistency of the processed data and strive to offer options for data transformation wherever possible.

2.5. Cube Data Variables

Project name: ESA Aerosol CCI
URL: http://www.esa-aerosol-cci.org/
Variable names in ESDC:
  • aerosol_optical_thickness_1610
  • aerosol_optical_thickness_865
  • aerosol_optical_thickness_659
  • aerosol_optical_thickness_555
  • aerosol_optical_thickness_550
Citation: Holzer-Popp, T., de Leeuw, G., Griesfeller, J., Martynenko, D., Klueser, L., Bevan, S., et al. (2013). Aerosol retrieval experiments in the ESA Aerosol_cci project. Atmospheric Measurement Techniques, 6, 1919-1957. doi:10.5194/amt-6-1919-2013.

Project name: GLEAM
URL: http://www.gleam.eu/
Variable names in ESDC:
  • bare_soil_evaporation
  • evaporation
  • evaporative_stress
  • interception_loss
  • open_water_evaporation
  • potential_evaporation
  • root_moisture
  • snow_sublimation
  • surface_moisture
  • transpiration
Citation: Martens, B., Miralles, D.G., Lievens, H., van der Schalie, R., de Jeu, R.A.M., Fernández-Prieto, D., Beck, H.E., Dorigo, W.A., and Verhoest, N.E.C.: GLEAM v3.0: satellite-based land evaporation and root-zone soil moisture, Geoscientific Model Development Discussions, doi: 10.5194/gmd-2016-162, 2016.

Project name: ERAInterim
URL: http://www.ecmwf.int/en/research/climate-reanalysis/era-interim
Variable names in ESDC:
  • air_temperature_2m
Citation: Dee, D.P. et al. 2011 http://onlinelibrary.wiley.com/doi/10.1002/qj.828/abstract

Project name: GlobAlbedo
URL: http://www.globalbedo.org/
Variable names in ESDC:
  • black_sky_albedo
  • white_sky_albedo
Citation: Muller, Jan-Peter, et al. “The ESA GLOBALBEDO project for mapping the Earth’s land surface albedo for 15 years from European sensors.” Geophysical Research Abstracts. Vol. 13. 2012.

Project name: GFED4
URL: http://www.globalfiredata.org/
Variable names in ESDC:
  • burnt_area
  • c_emission
Citation: Giglio, Louis, James T. Randerson, and Guido R. Werf. “Analysis of daily, monthly and annual burned area using the fourth‐generation global fire emissions database (GFED4).” Journal of Geophysical Research: Biogeosciences 118.1 (2013): 317-328.

Project name: GlobSnow
URL: http://www.globsnow.info/
Variable names in ESDC:
  • fractional_snow_cover
  • snow_water_equivalent
Citation: Luojus, Kari, et al. “ESA DUE Globsnow-Global Snow Database for Climate Research.” ESA Special Publication. Vol. 686. 2010.

Project name: FLUXCOM
URL: http://fluxcom.org/
Variable names in ESDC:
  • gross_primary_productivity
  • terrestrial_ecosystem_respiration
  • latent_heat
  • evapotranspiration
  • net_ecosystem_exchange
Citation: Tramontana, Gianluca, et al. “Predicting carbon dioxide and energy fluxes across global FLUXNET sites with regression algorithms.” (2016).

Project name: GlobTemperature
URL: http://data.globtemperature.info/
Variable names in ESDC:
  • land_surface_temperature
Citation: Freitas, S. C. et al., 2010: Quantifying the Uncertainty of Land Surface Temperature Retrievals From SEVIRI/Meteosat, IEEE Trans. Geosci. Remote Sens. Trigo, I. F., et al., 2011: The Satellite Application Facility on Land Surface Analysis. Int. J. Remote Sens., 32, 2725-2744, doi: 10.1080/01431161003743199.

Project name: Ozone CCI
URL: http://www.esa-ozone-cci.org/
Variable names in ESDC:
  • ozone
Citation: Laeng, A., et al. “The ozone climate change initiative: Comparison of four Level-2 processors for the Michelson Interferometer for Passive Atmospheric Sounding (MIPAS).” Remote Sensing of Environment 162 (2015): 316-343.

Project name: GPCP
URL: http://precip.gsfc.nasa.gov/
Variable names in ESDC:
  • precipitation
Citation: Adler, Robert F., et al. “The version-2 global precipitation climatology project (GPCP) monthly precipitation analysis (1979-present).” Journal of hydrometeorology 4.6 (2003): 1147-1167.

Project name: SoilMoisture CCI
URL: http://www.esa-soilmoisture-cci.org/
Variable names in ESDC:
  • soil_moisture
Citation: Liu, Y.Y., Parinussa, R.M., Dorigo, W.A., De Jeu, R.A.M., Wagner, W., McCabe, M.F., Evans, J.P., and van Dijk, A.I.J.M. (2012): Trend-preserving blending of passive and active microwave soil moisture retrievals

Project name: GlobVapour
URL: http://www.globvapour.info/
Variable names in ESDC:
  • water_vapour
Citation: Schneider, Nadine, et al. “ESA DUE GlobVapour water vapor products: Validation.” AIP Conference Proceedings. Vol. 1531. No. 1. 2013.