GeoZarrHandler

class dtcg.datacube.geozarr.GeoZarrHandler(
ds=None,
ds_name='L1',
target_chunk_mb=5.0,
compressor=None,
metadata_mapping_data=None,
metadata_mapping_coords=None,
zarr_format=2,
)

Bases: MetadataMapper

Attributes

metadata_mappings_data

metadata_mappings_coords

Methods

__init__([ds, ds_name, target_chunk_mb, ...])

Initialise a GeoZarrHandler object.

add_datacube(datacubes, datacube_name[, ...])

Add a new dataset as a child group of the DataTree at the root.

add_layer(ds, ds_name[, overwrite])

Add a new dataset as a child group of the DataTree at the root.

export(storage_directory[, overwrite])

Write the dataset to GeoZarr format.

get_layer(ds_name)

Get a dataset from a DataTree.

read_metadata_mappings(schema, map_file)

Load and validate metadata mappings from a YAML file.

update_metadata(dataset, ds_name)

Apply variable and shared metadata to an xarray Dataset.

Parameters:
  • ds (xr.Dataset)

  • ds_name (str)

  • target_chunk_mb (float)

  • compressor (Optional[Blosc])

  • metadata_mapping_data (str)

  • metadata_mapping_coords (str)

  • zarr_format (int)

__init__(
ds=None,
ds_name='L1',
target_chunk_mb=5.0,
compressor=None,
metadata_mapping_data=None,
metadata_mapping_coords=None,
zarr_format=2,
)

Initialise a GeoZarrHandler object.

Parameters:
  • ds (xarray.DataTree | xarray.Dataset, default None) – Input dataset with dimensions (‘x’, ‘y’) or (‘t’, ‘x’, ‘y’). Must include coordinate variables. Accepts either a dataset or data tree.

  • data_tree (xarray.DataTree, default None) – Input data_tree. Either ds or data_tree must be provided.

  • ds_name (str, default 'L1') – Name of datacube.

  • target_chunk_mb (float, default 5.0) – Approximate chunk size in megabytes for efficient storage.

  • compressor (Blosc, default None) – Compressor to apply on arrays. If None, the compression will be Blosc with zstd.

  • metadata_mapping_data (str, default None) – Path to the YAML file containing variable metadata mappings. If None, defaults to ‘metadata_mapping_data.yaml’ in the current directory.

  • metadata_mapping_coords (str, default None) – Path to the YAML file containing time coordinate metadata mappings. If None, defaults to ‘metadata_mapping_data.yaml’ in the current directory.

  • zarr_format (int, default 2) – Zarr format version to use (2 or 3).

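To make the target_chunk_mb parameter concrete, the following is a minimal sketch of how an approximate chunk size in megabytes could be turned into a chunk shape. The helper chunk_shape_for_target and its uniform-scaling strategy are assumptions made for illustration; the source does not say how GeoZarrHandler derives its chunks.

```python
import math


def chunk_shape_for_target(shape, itemsize, target_chunk_mb=5.0):
    """Return a chunk shape holding at most about target_chunk_mb of data.

    Hypothetical helper: every dimension is shrunk by the same factor.
    This is an illustrative strategy, not the dtcg implementation.
    """
    target_elems = target_chunk_mb * 1024**2 / itemsize
    total_elems = math.prod(shape)
    if total_elems <= target_elems:
        # The whole array already fits into a single chunk.
        return tuple(shape)
    # Uniform scale factor so the scaled volume matches the target.
    scale = (target_elems / total_elems) ** (1.0 / len(shape))
    return tuple(max(1, math.floor(dim * scale)) for dim in shape)


# A ('t', 'x', 'y') cube of 4-byte floats, chunked for roughly 5 MB.
chunks = chunk_shape_for_target((120, 1000, 1000), itemsize=4)
print(chunks, math.prod(chunks) * 4 / 1024**2, "MB")
```

Rounding each dimension down keeps the chunk at or below the target, so the resulting size is approximate rather than exact.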

add_datacube(
datacubes,
datacube_name,
overwrite=False,
)

Add a new dataset as a child group of the DataTree at the root.

Parameters:
  • datacubes (dict) – A dictionary whose keys are names of the currently supported L2 datacubes (‘monthly’, ‘annual_hydro’, ‘daily_smb’) and whose values are the corresponding xr.Dataset objects.

  • datacube_name (str) – Layer name to be used for this node of the tree. It should contain either L2 or L3. If it contains neither, L2_ is added to the name as a suffix.

  • overwrite (bool) – If True, allow a layer of the same name to be overwritten.

Return type:

None

add_layer(ds, ds_name, overwrite=False)

Add a new dataset as a child group of the DataTree at the root.

Parameters:
  • ds (xarray.Dataset) – New dataset layer to be added to the existing data tree.

  • ds_name (str) – Layer name to be used for this node of the tree.

  • overwrite (bool) – If True, allow a layer of the same name to be overwritten.

Return type:

None

export(storage_directory, overwrite=True)

Write the dataset to GeoZarr format.

Parameters:
  • storage_directory (str) – Path to write the Zarr data.

  • overwrite (bool, default True) – Whether to overwrite existing Zarr contents in the target location.

Return type:

None

get_layer(ds_name)

Get a dataset from a DataTree.

Parameters:
  • ds_name (str) – Name of the layer in the data tree.

Returns:

Dataset layer in tree.

Return type:

xr.Dataset

Raises:
  • KeyError – If the layer name is not present in the data tree.

  • AttributeError – If the layer does not contain a dataset.
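The contract documented for add_layer and get_layer can be sketched with a plain dictionary standing in for the data tree. The class LayerTree below is hypothetical and is not the dtcg implementation. In particular, raising ValueError when a layer already exists and overwrite is False is an assumption, since the source does not state what happens in that case.

```python
class LayerTree:
    """Dictionary-backed stand-in for a data tree (hypothetical)."""

    def __init__(self):
        self._nodes = {}

    def add_layer(self, ds, ds_name, overwrite=False):
        # Assumption: refusing to overwrite is signalled with ValueError.
        if ds_name in self._nodes and not overwrite:
            raise ValueError(f"Layer {ds_name!r} already exists.")
        self._nodes[ds_name] = ds

    def get_layer(self, ds_name):
        # KeyError if the layer name is not present in the tree.
        if ds_name not in self._nodes:
            raise KeyError(ds_name)
        ds = self._nodes[ds_name]
        # AttributeError if the node exists but holds no dataset.
        if ds is None:
            raise AttributeError(f"Layer {ds_name!r} contains no dataset.")
        return ds


tree = LayerTree()
tree.add_layer({"elevation": [1, 2, 3]}, "L1")
tree.add_layer({"elevation": [4, 5, 6]}, "L1", overwrite=True)
print(tree.get_layer("L1"))
```

Callers that are unsure whether a layer exists can wrap get_layer in a try block and handle KeyError and AttributeError separately.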

read_metadata_mappings(schema, map_file)

Load and validate metadata mappings from a YAML file.

Parameters:
  • schema (Schema) – The schema structure used for validation.

  • map_file (str) – Path to the YAML file containing metadata mappings.

Returns:

Metadata mappings loaded from YAML file.

Return type:

dict

Raises:

schema.SchemaError – If any of the metadata entries fail schema validation.

update_metadata(dataset, ds_name)

Apply variable and shared metadata to an xarray Dataset.

Parameters:
  • dataset (xarray.Dataset) – Dataset to which the metadata should be applied.

  • ds_name (str) – Name of dataset.

Returns:

The input dataset with updated metadata.

Return type:

xarray.Dataset

Warns:

UserWarning – If any dataset variables are missing in the metadata mapping.

Notes

This function adds both per-variable and global metadata attributes. Missing variable mappings are reported as warnings, not errors.
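The warn-instead-of-fail behaviour described in the notes can be illustrated with plain dictionaries in place of an xarray Dataset. The function apply_metadata, its argument layout and the warning text are all assumptions made for this sketch; only the use of UserWarning for unmapped variables comes from the documentation above.

```python
import warnings


def apply_metadata(variables, mapping, shared):
    """Attach per-variable and global attributes (hypothetical sketch).

    variables: dict of variable name -> attribute dict (stand-in for a Dataset)
    mapping:   dict of variable name -> attributes to apply
    shared:    dict of global attributes
    """
    missing = []
    for name, attrs in variables.items():
        if name in mapping:
            attrs.update(mapping[name])
        else:
            missing.append(name)
    if missing:
        # Missing mappings are reported as a warning, not raised as an error.
        warnings.warn(
            "No metadata mapping for: " + ", ".join(sorted(missing)),
            UserWarning,
        )
    return variables, dict(shared)


variables = {"thickness": {}, "runoff": {}}
mapping = {"thickness": {"units": "m"}}
updated, global_attrs = apply_metadata(variables, mapping, {"title": "demo"})
print(updated, global_attrs)
```

Because the unmapped variable only triggers a warning, the call still returns the updated variables and global attributes.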