# 4. ESDC Generation

This section explains how an ESDC is generated and how it can be extended with new variables.

## 4.1. Command-Line Tool

To generate new data cubes or to update existing ones, a dedicated command-line tool, cube-gen, is used.

After installing cablab-core as described in section Installation, try:

```
$ cube-gen --help
CAB-LAB command-line interface, version 0.2.2
usage: cube-gen [-h] [-l] [-G] [-c CONFIG] [TARGET] [SOURCE [SOURCE ...]]

Generates a new CAB-LAB data cube or updates an existing one.

positional arguments:
  TARGET                data cube root directory
  SOURCE                <provider name>:dir=<directory>, use -l to list source
                        provider names

optional arguments:
  -h, --help            show this help message and exit
  -l, --list            list all available source providers
  -G, --dont-clear-cache
                        do not clear data cache before updating the cube
                        (faster)
  -c CONFIG, --cube-conf CONFIG
                        data cube configuration file
```

The list option (-l, --list) lists all currently installed source data providers:

```
$ cube-gen --list

ozone -> cablab.providers.ozone.OzoneProvider
net_ecosystem_exchange -> cablab.providers.mpi_bgc.MPIBGCProvider
air_temperature -> cablab.providers.air_temperature.AirTemperatureProvider
interception_loss -> cablab.providers.gleam.GleamProvider
transpiration -> cablab.providers.gleam.GleamProvider
open_water_evaporation -> cablab.providers.gleam.GleamProvider
...
```


Source data providers are the pluggable software components that cube-gen uses to read data from a source directory and transform it into a common data cube structure. The list above maps the short names used on the cube-gen command line to the actual Python code. For example, for ozone, the OzoneProvider class of the cablab/providers/ozone.py module is used.
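To illustrate what the mapping from a short name to a dotted class path means, the sketch below resolves a string such as cablab.providers.ozone.OzoneProvider to a class object with the standard library's importlib. It is not cablab's own lookup code (which, judging by section 4.3, registers providers through setup.py entry points); the helper resolve_class is hypothetical, and standard-library classes stand in for real providers so the example runs without cablab-core installed.

```python
import importlib


def resolve_class(dotted_path: str) -> type:
    """Resolve 'package.module.ClassName' to the class object it names.

    Hypothetical helper for illustration; not part of cablab-core.
    """
    module_name, _, class_name = dotted_path.rpartition(".")
    if not module_name:
        raise ValueError(f"not a dotted path: {dotted_path!r}")
    module = importlib.import_module(module_name)  # e.g. cablab.providers.ozone
    return getattr(module, class_name)              # e.g. OzoneProvider


# A short-name table in the style of the `cube-gen --list` output.
# Standard-library classes replace real providers so this runs anywhere.
providers = {
    "ordered_dict": "collections.OrderedDict",
    "decimal": "decimal.Decimal",
}

for short_name, path in providers.items():
    print(short_name, "->", resolve_class(path))
```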

The common cube structure is established by a cube configuration file passed with the --cube-conf (-c) option. Here is the configuration file used to produce the low-resolution ESDC. It produces a global cube at 0.25 degree resolution whose source data are aggregated/interpolated to 8-day periods and then resampled onto a grid of 1440 x 720 cells:

```python
model_version = '0.2.4'
spatial_res = 0.25
temporal_res = 8
grid_width = 1440
grid_height = 720
start_time = datetime.datetime(2001, 1, 1, 0, 0)
end_time = datetime.datetime(2012, 1, 1, 0, 0)
ref_time = datetime.datetime(2001, 1, 1, 0, 0)
calendar = 'gregorian'
file_format = 'NETCDF4_CLASSIC'
compression = False
```
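cablab-core reads this file through CubeConfig.load (see the API reference below); how it parses the file is not described here. Because the file uses datetime.datetime without importing it, one plausible reading is that it is evaluated as Python with the datetime module already available. The sketch below follows that assumption with a hypothetical read_cube_config helper and a hypothetical file name, and then checks that the grid covers the globe (1440 x 0.25 = 360 degrees of longitude, 720 x 0.25 = 180 degrees of latitude).

```python
import datetime
import tempfile
from pathlib import Path


def read_cube_config(path):
    """Evaluate a cube configuration file and return its settings as a dict.

    Hypothetical helper: assumes the file holds plain Python assignments, as in
    the example above, with the datetime module pre-imported. exec() runs
    arbitrary code, so only use it on trusted files.
    """
    namespace = {"datetime": datetime}
    exec(Path(path).read_text(), namespace)
    # Drop the injected module and the __builtins__ entry that exec() adds.
    return {k: v for k, v in namespace.items()
            if k != "datetime" and not k.startswith("__")}


EXAMPLE = """\
spatial_res = 0.25
temporal_res = 8
grid_width = 1440
grid_height = 720
start_time = datetime.datetime(2001, 1, 1, 0, 0)
end_time = datetime.datetime(2012, 1, 1, 0, 0)
"""

with tempfile.TemporaryDirectory() as tmp:
    conf_path = Path(tmp) / "cube-config.py"  # hypothetical file name
    conf_path.write_text(EXAMPLE)
    conf = read_cube_config(conf_path)

# A global grid must span 360 degrees of longitude and 180 degrees of latitude.
assert conf["grid_width"] * conf["spatial_res"] == 360
assert conf["grid_height"] * conf["spatial_res"] == 180
print(conf["end_time"] - conf["start_time"])  # 4017 days, 0:00:00
```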


To create or update a cube, call the cube-gen tool with the configuration file, the target cube directory and one or more source data providers. Providers can have parameters of their own. All current providers have a dir parameter indicating the source data directory, but this is not a requirement. Providers that read from multivariate sources also have a var parameter indicating which of the available variables should be used.
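The help text documents a SOURCE argument of the form <provider name>:dir=<directory>. The sketch below splits such an argument into a provider name and a parameter dictionary. It is a hypothetical helper, not cube-gen's own parser; in particular, separating several parameters (such as dir and var) with commas is an assumption, and my_provider is a placeholder name.

```python
def parse_source_arg(arg):
    """Split a SOURCE argument such as 'ozone:dir=/data/ozone' into its parts.

    Hypothetical helper. The '<provider name>:dir=<directory>' form comes from
    the cube-gen help text; the comma separator between parameters is assumed.
    """
    name, sep, param_text = arg.partition(":")
    if not name:
        raise ValueError(f"missing provider name in {arg!r}")
    params = {}
    if sep and param_text:
        for item in param_text.split(","):
            key, eq, value = item.partition("=")
            if not eq:
                raise ValueError(f"expected key=value, got {item!r}")
            params[key.strip()] = value.strip()
    return name, params


print(parse_source_arg("ozone:dir=/data/ozone"))
# A multivariate source with both dir and var (separator assumed):
print(parse_source_arg("my_provider:dir=/data/multi,var=LAI"))
```

Splitting on commas would break for directory names that contain commas; the real cube-gen parser may handle this differently.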



## 4.3. Sharing a Provider

If you plan to distribute and share your provider, create your own Python module, separate from cablab-core, with a dedicated setup.py that lists only your providers in its entry_points section. Other users can then install your module on top of cablab-core to make use of your plugin.
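In the plugin's setup.py, a registration could look like entry_points={'<group>': ['my_var = my_package.providers:MyVarProvider']}, where the group name, my_var, my_package and MyVarProvider are all placeholders: the group that cablab-core actually scans is not stated in this section. Once such a package is installed, its entry points become visible to any Python process. The sketch below lists the entry points of a group with importlib.metadata; the group name cablab.source_providers is hypothetical.

```python
from importlib.metadata import entry_points

# Hypothetical group name: check cablab-core's own setup.py for the real one.
PROVIDER_GROUP = "cablab.source_providers"


def installed_providers(group=PROVIDER_GROUP):
    """Return {short name: 'module:attr'} for all entry points in `group`."""
    eps = entry_points()
    if hasattr(eps, "select"):  # Python 3.10+
        selected = eps.select(group=group)
    else:                       # Python 3.8/3.9 return a plain dict
        selected = eps.get(group, [])
    return {ep.name: ep.value for ep in selected}


print(installed_providers())  # {} unless an installed package registers this group
```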

## 4.4. Python Cube API Reference

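Data cube access:

```python
from datetime import datetime

from cablab import Cube

# Open an existing cube directory.
cube = Cube.open('./cablab-cube-v05')
# Read LAI and Precip for the given time range at the given coordinates.
data = cube.data.get(['LAI', 'Precip'], [datetime(2001, 6, 1), datetime(2012, 1, 1)], 53.2, 12.8)
```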
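Point access of this kind has to turn a geographic coordinate into a grid cell index. The sketch below shows one way to do that for the default grid, under two assumptions not stated in this section: that 53.2 is the latitude and 12.8 the longitude in the call above, and that cell (0, 0) of the global grid has its upper-left corner at 180 W, 90 N with grid_x0/grid_y0 as cell offsets. The helper cell_index is hypothetical, not part of the cablab API.

```python
import math


def cell_index(lat, lon, spatial_res=0.25, grid_x0=0, grid_y0=0):
    """Return (x, y) indices of the grid cell containing (lat, lon).

    Hypothetical helper. Assumes x grows eastward from 180 W and y grows
    southward from 90 N, shifted by grid_x0/grid_y0 cells.
    """
    x = math.floor((lon + 180.0) / spatial_res) - grid_x0
    y = math.floor((90.0 - lat) / spatial_res) - grid_y0
    return x, y


print(cell_index(53.2, 12.8))  # (771, 147)
```

Cell 771 spans 12.75 to 13.0 degrees of longitude and cell 147 spans 53.25 down to 53.0 degrees of latitude, so both contain the requested point.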


Data Cube creation/update:

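```python
from cablab import Cube, CubeConfig

# Create a new cube at 0.05 degree resolution (the directory must not exist yet).
cube = Cube.create('./my-cablab-cube', CubeConfig(spatial_res=0.05))
# MyVar1SourceProvider and MyVar2SourceProvider stand for your own
# CubeSourceProvider implementations; they are not part of cablab-core.
cube.update(MyVar1SourceProvider(cube.config, './my-cube-sources/var1'))
cube.update(MyVar2SourceProvider(cube.config, './my-cube-sources/var2'))
```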

class cablab.Cube(base_dir, config)

Represents a data cube. Use the static open() or create() methods to obtain data cube objects.

base_dir

The cube’s base directory.

close()

Closes the data cube.

closed

Checks if the cube has been closed.

config

The cube’s configuration. See CubeConfig class.

static create(base_dir, config=CubeConfig(spatial_res=0.250000, grid_x0=0, grid_y0=0, grid_width=1440, grid_height=720, temporal_res=8, ref_time=datetime.datetime(2001, 1, 1, 0, 0)))

Create a new data cube. Use the Cube.update(provider) method to add data to the cube via a source data provider.

Parameters:

- base_dir – The data cube’s base directory. Must not exist.
- config – The data cube’s static information.

Returns: A cube instance.

data

The cube’s data which is an instance of the CubeDataAccess class.

info() → str

static open(base_dir)

Open an existing data cube. Use the Cube.update(provider) method to add data to the cube via a source data provider.

Parameters: base_dir – The data cube’s base directory, which must contain an existing cube.

Returns: A cube instance.

update(provider: CubeSourceProvider)

Updates the data cube with source data from the given image provider.

Parameters: provider – An instance of the abstract CubeSourceProvider class.

class cablab.CubeConfig(spatial_res=0.25, grid_x0=0, grid_y0=0, grid_width=1440, grid_height=720, temporal_res=8, calendar='gregorian', ref_time=datetime.datetime(2001, 1, 1, 0, 0), start_time=datetime.datetime(2001, 1, 1, 0, 0), end_time=datetime.datetime(2012, 1, 1, 0, 0), variables=None, file_format='NETCDF4_CLASSIC', chunk_sizes=None, compression=False, comp_level=5, static_data=False, model_version='1.0.2')

A data cube’s static configuration information.

Parameters:

- spatial_res – The spatial image resolution in degrees.
- grid_x0 – The fixed grid X offset (longitude direction).
- grid_y0 – The fixed grid Y offset (latitude direction).
- grid_width – The fixed grid width in pixels (longitude direction).
- grid_height – The fixed grid height in pixels (latitude direction).
- temporal_res – The temporal resolution in days.
- ref_time – A datetime value which defines the units in which time values are given, namely ‘days since ref_time’.
- start_time – The inclusive start time of the first image of any variable in the cube, given as a datetime value. None means unlimited.
- end_time – The exclusive end time of the last image of any variable in the cube, given as a datetime value. None means unlimited.
- variables – A list of variable names to be included in the cube.
- file_format – The file format used. Must be one of ‘NETCDF4’, ‘NETCDF4_CLASSIC’, ‘NETCDF3_CLASSIC’ or ‘NETCDF3_64BIT’.
- chunk_sizes – A mapping of dimension names to chunk size for encoding. Default is None.
- compression – Whether gzip compression is used for encoding. Default is False.
- comp_level – Integer between 1 and 9 describing the level of compression desired for encoding. Default is 5. Ignored if compression is False.

date2num(date) → float

Return the given date as a number of days, in the time units given by the time_units property.

Parameters: date – The date as a datetime.datetime value.

easting

The longitude position of the upper-left-most corner of the upper-left-most grid cell given by (grid_x0, grid_y0).

geo_bounds

The geographical boundary given as ((LL-lon, LL-lat), (UR-lon, UR-lat)).
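The geo_bounds tuple can be derived from the grid parameters. The sketch below assumes that cell (0, 0) of the global grid has its upper-left corner at (-180, 90) and takes easting and northing as the longitude and latitude of the upper-left corner of cell (grid_x0, grid_y0); these conventions, and the helper geo_bounds itself, are illustrative rather than cablab's implementation. With the defaults, 1440 x 0.25 = 360 degrees and 720 x 0.25 = 180 degrees, so the cube covers the whole globe.

```python
def geo_bounds(spatial_res=0.25, grid_x0=0, grid_y0=0, grid_width=1440, grid_height=720):
    """Return ((LL-lon, LL-lat), (UR-lon, UR-lat)) for a cube grid.

    Hypothetical helper. Assumes cell (0, 0) has its upper-left corner at
    (-180, 90) and that grid_x0/grid_y0 are offsets in cells.
    """
    easting = -180.0 + grid_x0 * spatial_res  # longitude of upper-left corner
    northing = 90.0 - grid_y0 * spatial_res   # latitude of upper-left corner
    lower_left = (easting, northing - grid_height * spatial_res)
    upper_right = (easting + grid_width * spatial_res, northing)
    return lower_left, upper_right


print(geo_bounds())  # ((-180.0, -90.0), (180.0, 90.0))
# A hypothetical 40 x 20 cell window starting at cell (760, 140):
print(geo_bounds(grid_x0=760, grid_y0=140, grid_width=40, grid_height=20))
# ((10.0, 50.0), (20.0, 55.0))
```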

static load(path) → object

Load a CubeConfig from a text file.

Parameters: path – The file’s path name.

Returns: A new CubeConfig instance.

northing

The latitude position of the upper-left-most corner of the upper-left-most grid cell given by (grid_x0, grid_y0).

num_periods_per_year

Return the integer number of target periods per year.
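num_periods_per_year is documented only as returning an integer. For temporal_res = 8, the value ceil(365 / 8) = 46 is a natural candidate, but that formula is an assumption here, as is the convention of restarting the 8-day sequence every 1 January. Under those assumptions, the sketch below also shows how the inclusive start_time and exclusive end_time bound the list of period start dates; both function names are hypothetical.

```python
import datetime
import math


def num_periods_per_year(temporal_res=8):
    """Periods per year; ceil(365 / temporal_res) is an assumption."""
    return math.ceil(365 / temporal_res)


def period_starts(start_time, end_time, temporal_res=8):
    """Yield period start dates with start_time <= t < end_time.

    Assumes the period sequence restarts on 1 January of every year.
    """
    step = datetime.timedelta(days=temporal_res)
    for year in range(start_time.year, end_time.year + 1):
        t = datetime.datetime(year, 1, 1)
        for _ in range(num_periods_per_year(temporal_res)):
            if start_time <= t < end_time:
                yield t
            t += step


starts = list(period_starts(datetime.datetime(2001, 1, 1), datetime.datetime(2012, 1, 1)))
print(num_periods_per_year(), len(starts))  # 46 506
```

Under these assumptions the default time range yields 11 x 46 = 506 periods, and the last period of each year starts in late December and runs past the year end.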

store(path)

Store a CubeConfig in a text file.

Parameters: path – The file’s path name.

time_units

Return the time units used by the data cube as a string of the form ‘days since ref_time’.
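The time_units string and date2num work together: a time value is the number of days elapsed since ref_time. The sketch below reproduces that relationship with the standard library only. The exact date formatting inside the units string is an assumption (cablab may format ref_time differently), and both functions are hypothetical stand-ins for the CubeConfig members.

```python
import datetime

REF_TIME = datetime.datetime(2001, 1, 1, 0, 0)  # default ref_time of CubeConfig


def time_units(ref_time=REF_TIME):
    # The date format used after 'days since' is an assumption.
    return "days since " + ref_time.isoformat(sep=" ")


def date2num(date, ref_time=REF_TIME):
    """Days elapsed since ref_time as a float (fractions of a day allowed)."""
    return (date - ref_time) / datetime.timedelta(days=1)


print(time_units())                                 # days since 2001-01-01 00:00:00
print(date2num(datetime.datetime(2001, 1, 9)))      # 8.0
print(date2num(datetime.datetime(2001, 1, 9, 12)))  # 8.5
```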