4. ESDC Generation

This section explains how an ESDC is generated and how it can be extended with new variables.

4.1. Command-Line Tool

To generate new data cubes or update existing ones, a dedicated command-line tool, cube-gen, is used.

After installing cablab-core as described in section Installation, try:

$ cube-gen --help

CAB-LAB command-line interface, version 0.2.2
usage: cube-gen [-h] [-l] [-G] [-c CONFIG] [TARGET] [SOURCE [SOURCE ...]]

Generates a new CAB-LAB data cube or updates an existing one.

positional arguments:
  TARGET                data cube root directory
  SOURCE                <provider name>:dir=<directory>, use -l to list source
                        provider names

optional arguments:
  -h, --help            show this help message and exit
  -l, --list            list all available source providers
  -G, --dont-clear-cache
                        do not clear data cache before updating the cube
                        (faster)
  -c CONFIG, --cube-conf CONFIG
                        data cube configuration file

The --list option lists all currently installed source data providers:

$ cube-gen --list

ozone -> cablab.providers.ozone.OzoneProvider
net_ecosystem_exchange -> cablab.providers.mpi_bgc.MPIBGCProvider
air_temperature -> cablab.providers.air_temperature.AirTemperatureProvider
interception_loss -> cablab.providers.gleam.GleamProvider
transpiration -> cablab.providers.gleam.GleamProvider
open_water_evaporation -> cablab.providers.gleam.GleamProvider
...

Source data providers are the pluggable software components that cube-gen uses to read data from a source directory and transform it into the common data cube structure. The list above maps the short names used on the cube-gen command line to the actual Python code. For example, ozone maps to the OzoneProvider class in the cablab/providers/ozone.py module.
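Providers are registered under the cablab.source_providers entry-point group (see the setup.py excerpt in section 4.2), so a listing similar to cube-gen --list can be reproduced with the standard library. The following is a minimal sketch, assuming cube-gen resolves short names through that entry-point group; the helper names provider_entry_points and load_provider_class are hypothetical and not part of cablab.

```python
from importlib.metadata import entry_points

# Entry-point group used to register source providers in setup.py.
GROUP = "cablab.source_providers"


def provider_entry_points(group: str = GROUP) -> dict:
    """Map provider short names to their registered entry points."""
    eps = entry_points()
    if hasattr(eps, "select"):  # Python 3.10+
        selected = eps.select(group=group)
    else:  # Python 3.8/3.9 return a dict of group name -> entry points
        selected = eps.get(group, [])
    return {ep.name: ep for ep in selected}


def load_provider_class(name: str, group: str = GROUP):
    """Import and return the provider class registered under a short name."""
    eps = provider_entry_points(group)
    if name not in eps:
        raise KeyError(f"no source provider registered as {name!r}")
    return eps[name].load()


if __name__ == "__main__":
    # Prints "name -> module:Class" for every installed provider.
    for name, ep in sorted(provider_entry_points().items()):
        print(f"{name} -> {ep.value}")
```

Without cablab-core installed the listing is simply empty; with it installed, each line corresponds to one entry of the setup.py entry_points section.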

The common cube structure is defined by a cube configuration file passed with the -c/--cube-conf option. Here is the configuration file used to produce the low-resolution ESDC. It produces a global cube at 0.25 degree resolution whose source data are aggregated or interpolated to 8-day periods and then resampled onto a grid of 1440 x 720 spatial cells:

model_version = '0.2.4'
spatial_res = 0.25
temporal_res = 8
grid_width = 1440
grid_height = 720
start_time = datetime.datetime(2001, 1, 1, 0, 0)
end_time = datetime.datetime(2012, 1, 1, 0, 0)
ref_time = datetime.datetime(2001, 1, 1, 0, 0)
calendar = 'gregorian'
file_format = 'NETCDF4_CLASSIC'
compression = False
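The values in this file are mutually consistent with a global grid, and together they determine the length of the time axis. The arithmetic below checks this in plain Python. It assumes that 8-day periods restart on 1 January of each year, so that the last period of a year is shorter than 8 days; this is an assumption about the time scheme, not something stated in the configuration file.

```python
import datetime
import math

# Values copied from the configuration file above.
spatial_res = 0.25
temporal_res = 8
grid_width = 1440
grid_height = 720
start_time = datetime.datetime(2001, 1, 1, 0, 0)
end_time = datetime.datetime(2012, 1, 1, 0, 0)

# A global grid spans 360 degrees of longitude and 180 degrees of latitude.
assert grid_width * spatial_res == 360.0
assert grid_height * spatial_res == 180.0

# Assumption: periods restart every 1 January, so a year holds
# ceil(365 / 8) periods, the last one being shorter.
periods_per_year = math.ceil(365 / temporal_res)

# end_time is exclusive and both bounds fall on 1 January,
# so the years 2001 through 2011 are covered.
years = end_time.year - start_time.year
num_time_steps = years * periods_per_year
print(periods_per_year, years, num_time_steps)  # 46 11 506
```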

To create or update a cube, call the cube-gen tool with the configuration and one or more source data providers. Providers can take their own parameters. All current providers take a dir parameter giving the source data directory, but this is not a requirement. Providers that read from multivariate sources also take a var parameter that selects which of the available variables to use.

$ cube-gen mycube -c mycube.config ozone:dir=/path/to/ozone/netcdfs

This command creates the cube mycube in the current directory using the configuration mycube.config and adds a single variable, ozone, from the source NetCDF files in /path/to/ozone/netcdfs.
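When cube generation is scripted, for example to add several variables in turn, the same call can be built from Python. The sketch below uses only the options documented in the help text above (-c, -G, TARGET, SOURCE in the form <provider name>:dir=<directory>); the function names are hypothetical. It places the options before the positional arguments, an ordering that argparse-style parsers always accept.

```python
import shlex
import subprocess
from typing import List


def source_spec(provider: str, directory: str) -> str:
    """Format a SOURCE argument as <provider name>:dir=<directory>."""
    return f"{provider}:dir={directory}"


def cube_gen_args(target: str, config: str, sources: List[str],
                  dont_clear_cache: bool = False) -> List[str]:
    """Build a cube-gen argument list from the documented options."""
    args = ["cube-gen", "-c", config]
    if dont_clear_cache:
        args.append("-G")  # do not clear the data cache (faster)
    args.append(target)
    args.extend(sources)
    return args


def run_cube_gen(args: List[str]) -> int:
    """Run cube-gen; requires it on PATH (installed with cablab-core)."""
    return subprocess.run(args, check=False).returncode


args = cube_gen_args("mycube", "mycube.config",
                     [source_spec("ozone", "/path/to/ozone/netcdfs")])
print(shlex.join(args))
# cube-gen -c mycube.config mycube ozone:dir=/path/to/ozone/netcdfs
```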

Note that the configurations of individual ESDC versions are kept in the GitHub repository cube-config.

4.2. Writing a new Provider

To add source data for which no source data provider exists yet, you can write your own.

Make sure cablab-core is installed as described in section Installation above.

If your source data is in NetCDF format, writing a new provider is easy: copy one of the existing providers, e.g. cablab/providers/ozone.py, and adapt the code to your needs.

For source data other than NetCDF, you will have to write a provider from scratch by implementing the cablab.CubeSourceProvider interface or by extending the cablab.BaseCubeSourceProvider which is usually easier. Make sure you adhere to the contract described in the documentation of the respective class.

To run your provider, you have to register it in the setup.py file. Assuming your provider is called sst and its class SeaSurfaceTemperatureProvider is located in myproviders.py, the entry_points section of the setup.py file should look as follows:

entry_points={
    'cablab.source_providers': [
        'burnt_area = cablab.providers.burnt_area:BurntAreaProvider',
        'c_emissions = cablab.providers.c_emissions:CEmissionsProvider',
        'ozone = cablab.providers.ozone:OzoneProvider',
        ...
        # new entry: <short name> = <module>:<provider class>
        'sst = myproviders:SeaSurfaceTemperatureProvider',
    ],
},

To run it:

$ cube-gen mycube -c mycube.config sst:dir=/path/to/sst/netcdfs

4.3. Sharing a Provider

If you plan to distribute and share your provider, create your own Python package, separate from cablab-core, with a dedicated setup.py that lists only your providers in its entry_points section. Other users can then install your package on top of cablab-core to use your plugin.
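A standalone plugin package could use a setup.py along the following lines. This is a minimal sketch: the distribution name cablab-sst-provider is hypothetical, and myproviders/SeaSurfaceTemperatureProvider are the example names from section 4.2. cablab-core is assumed to be installed separately rather than declared as a dependency.

```python
# setup.py of a hypothetical, separately distributed provider package.
# cablab-core must already be installed in the same environment.

setup_kwargs = dict(
    name="cablab-sst-provider",          # hypothetical distribution name
    version="0.1.0",
    py_modules=["myproviders"],          # module holding SeaSurfaceTemperatureProvider
    entry_points={
        # The same entry-point group that cablab-core uses for its own providers.
        "cablab.source_providers": [
            "sst = myproviders:SeaSurfaceTemperatureProvider",
        ],
    },
)

if __name__ == "__main__":
    # Imported here so the metadata above can be inspected without setuptools.
    from setuptools import setup

    setup(**setup_kwargs)
```

After installing the package (e.g. with pip install .), the sst provider should appear in the output of cube-gen --list.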

4.4. Python Cube API Reference

Data Cube read-only access:

from cablab import Cube
from datetime import datetime

cube = Cube.open('./cablab-cube-v05')
# Read the variables LAI and Precip for the given time range and coordinates.
data = cube.data.get(['LAI', 'Precip'], [datetime(2001, 6, 1), datetime(2012, 1, 1)], 53.2, 12.8)
cube.close()

Data Cube creation/update:

from cablab import Cube, CubeConfig

cube = Cube.create('./my-cablab-cube', CubeConfig(spatial_res=0.05))
# MyVar1SourceProvider and MyVar2SourceProvider are placeholders for
# your own provider classes (see section 4.2).
cube.update(MyVar1SourceProvider(cube.config, './my-cube-sources/var1'))
cube.update(MyVar2SourceProvider(cube.config, './my-cube-sources/var2'))
cube.close()

class cablab.Cube(base_dir, config)

Represents a data cube. Use the static open() or create() methods to obtain data cube objects.

base_dir

The cube’s base directory.

close()

Closes the data cube.

closed

Checks if the cube has been closed.

config

The cube’s configuration. See CubeConfig class.

static create(base_dir, config=CubeConfig(spatial_res=0.250000, grid_x0=0, grid_y0=0, grid_width=1440, grid_height=720, temporal_res=8, ref_time=datetime.datetime(2001, 1, 1, 0, 0)))

Create a new data cube. Use the Cube.update(provider) method to add data to the cube via a source data provider.

Parameters:
  • base_dir – The data cube’s base directory. Must not exist.
  • config – The data cube’s static information.
Returns: A cube instance.

data

The cube’s data which is an instance of the CubeDataAccess class.

info() → str

Return a human-readable information string about this data cube (markdown formatted).

static open(base_dir)

Open an existing data cube. Use the Cube.update(provider) method to add data to the cube via a source data provider.

Parameters: base_dir – The base directory of an existing data cube.
Returns: A cube instance.

update(provider)

Updates the data cube with source data from the given source data provider.

Parameters: provider – An instance of a cablab.CubeSourceProvider implementation (see section 4.2).

class cablab.CubeConfig(spatial_res=0.25, grid_x0=0, grid_y0=0, grid_width=1440, grid_height=720, temporal_res=8, calendar='gregorian', ref_time=datetime.datetime(2001, 1, 1, 0, 0), start_time=datetime.datetime(2001, 1, 1, 0, 0), end_time=datetime.datetime(2012, 1, 1, 0, 0), variables=None, file_format='NETCDF4_CLASSIC', compression=False, chunk_sizes=None, static_data=False, model_version='1.0.1')

A data cube’s static configuration information.

Parameters:
  • spatial_res – The spatial image resolution in degrees.
  • grid_x0 – The fixed grid X offset (longitude direction).
  • grid_y0 – The fixed grid Y offset (latitude direction).
  • grid_width – The fixed grid width in pixels (longitude direction).
  • grid_height – The fixed grid height in pixels (latitude direction).
  • temporal_res – The temporal resolution in days.
  • ref_time – A datetime value which defines the units in which time values are given, namely ‘days since ref_time’.
  • start_time – The inclusive start time of the first image of any variable in the cube given as datetime value. None means unlimited.
  • end_time – The exclusive end time of the last image of any variable in the cube given as datetime value. None means unlimited.
  • variables – A list of variable names to be included in the cube.
  • file_format – The file format used. Must be one of ‘NETCDF4’, ‘NETCDF4_CLASSIC’, ‘NETCDF3_CLASSIC’ or ‘NETCDF3_64BIT’.
  • compression – Whether the data should be compressed.

date2num(date) → float

Return the number of days for the given date as a number in the time units given by the time_units property.

Parameters: date – The date as a datetime.datetime value.
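For dates after 1582, where the 'gregorian' calendar coincides with Python's proleptic Gregorian calendar, the conversion amounts to counting days since ref_time. The sketch below shows this arithmetic with the standard library only; it is an illustration under that assumption, not the CubeConfig implementation, and num2date is a hypothetical inverse added for completeness.

```python
import datetime

REF_TIME = datetime.datetime(2001, 1, 1, 0, 0)  # default ref_time of CubeConfig


def date2num(date: datetime.datetime,
             ref_time: datetime.datetime = REF_TIME) -> float:
    """Fractional days since ref_time (proleptic Gregorian arithmetic)."""
    return (date - ref_time).total_seconds() / 86400.0


def num2date(days: float,
             ref_time: datetime.datetime = REF_TIME) -> datetime.datetime:
    """Inverse of date2num: ref_time plus the given number of days."""
    return ref_time + datetime.timedelta(days=days)


print(date2num(datetime.datetime(2001, 1, 9)))      # 8.0
print(date2num(datetime.datetime(2001, 6, 1, 12)))  # 151.5
```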

easting

The longitude position of the upper-left-most corner of the upper-left-most grid cell given by (grid_x0, grid_y0).

geo_bounds

The geographical boundary given as ((LL-lon, LL-lat), (UR-lon, UR-lat)), where LL denotes the lower-left and UR the upper-right corner.

static load(path) → object

Load a CubeConfig from a text file.

Parameters: path – The file’s path name.
Returns: A new CubeConfig instance.

northing

The latitude position of the upper-left-most corner of the upper-left-most grid cell given by (grid_x0, grid_y0).
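The easting, northing and geo_bounds properties are related by simple grid arithmetic. The sketch below assumes that cell (0, 0) of a global grid has its upper-left corner at longitude -180 and latitude 90, with x increasing eastward and y increasing southward; these conventions and the helper cell_index are assumptions for illustration, not confirmed cablab behaviour.

```python
import math


def easting(grid_x0: int = 0, spatial_res: float = 0.25) -> float:
    """Longitude of the upper-left corner of cell (grid_x0, grid_y0)."""
    return -180.0 + grid_x0 * spatial_res


def northing(grid_y0: int = 0, spatial_res: float = 0.25) -> float:
    """Latitude of the upper-left corner of cell (grid_x0, grid_y0)."""
    return 90.0 - grid_y0 * spatial_res


def geo_bounds(grid_x0=0, grid_y0=0, grid_width=1440, grid_height=720,
               spatial_res=0.25):
    """((LL-lon, LL-lat), (UR-lon, UR-lat)) of the grid."""
    lon0 = easting(grid_x0, spatial_res)
    lat0 = northing(grid_y0, spatial_res)
    return ((lon0, lat0 - grid_height * spatial_res),
            (lon0 + grid_width * spatial_res, lat0))


def cell_index(lat, lon, grid_x0=0, grid_y0=0, spatial_res=0.25):
    """Hypothetical helper: (x, y) index of the cell containing a point."""
    x = math.floor((lon - easting(grid_x0, spatial_res)) / spatial_res)
    y = math.floor((northing(grid_y0, spatial_res) - lat) / spatial_res)
    return x, y


print(geo_bounds())            # ((-180.0, -90.0), (180.0, 90.0))
print(cell_index(53.2, 12.8))  # (771, 147)
```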

num_periods_per_year

Return the integer number of target periods per year.

store(path)

Store a CubeConfig in a text file.

Parameters: path – The file’s path name.
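The configuration file shown in section 4.1 consists of Python-style name = value assignments, where datetime values appear in their repr form. The sketch below writes and reads that kind of text format. It assumes, without confirmation, that store and load behave this way; the functions are hypothetical stand-ins, not the CubeConfig methods. Because exec runs arbitrary code, such a loader should only be used on trusted files.

```python
import datetime
import os
import tempfile


def store_config(path: str, config: dict) -> None:
    """Write one 'name = repr(value)' line per configuration entry."""
    with open(path, "w") as fp:
        for name, value in config.items():
            fp.write(f"{name} = {value!r}\n")


def load_config(path: str) -> dict:
    """Execute the file with the datetime module in scope; collect assignments."""
    namespace: dict = {}
    with open(path) as fp:
        exec(fp.read(), {"datetime": datetime}, namespace)
    return namespace


config = {
    "spatial_res": 0.25,
    "temporal_res": 8,
    "ref_time": datetime.datetime(2001, 1, 1, 0, 0),
    "file_format": "NETCDF4_CLASSIC",
    "compression": False,
}
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "mycube.config")
    store_config(path, config)
    # The file now contains lines such as
    # ref_time = datetime.datetime(2001, 1, 1, 0, 0)
    assert load_config(path) == config
```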

time_units

Return the time units used by the data cube as string using the format ‘days since ref_time’.