Inspecting dataset attributes

The datasets collected during ORCESTRA follow a common Attribute Convention, which means that certain global attributes can be expected to be set. These can be used by downstream applications to automatically retrieve metadata from a dataset, but are also intended to help scientists in their day-to-day work.

The ORCESTRA Data Browser, for example, retrieves all the information it needs to render a landing page directly from the dataset. The same global attributes also give scientists a quick overview of a dataset.

from pprint import pprint

import xarray as xr

The attrs attribute of a dataset gives access to its global attributes as a dictionary. Here, we print all global attributes of the CTD measurements taken during the BOW-TIE subcampaign:

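# Open the Zarr store by its IPFS content identifier.
# Note: the ipfs:// protocol needs an fsspec backend such as ipfsspec.
ds_ctd = xr.open_dataset("ipfs://bafybeihoghhgi655g7arw2ubtpudbq4c4hpwjlrwghcex3snu7f36imjgq", engine="zarr")
# Pretty-print the dictionary of global attributes.
pprint(ds_ctd.attrs)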
{'Conventions': 'CF-1.6, OceanSites Manual-1.3, EGO glider users manual 1.3, '
                'ACDD-1.3',
 'Conventions_comment': 'this file is not strict according to OceanSites, EGO, '
                        'or ACDD',
 'GEOMAR_netcdf_version': '7',
 'GEOMAR_po_svn_global_revision_when_writing': 1142.0,
 'Metadata_Conventions': 'Unidata Dataset Discovery v1.0',
 'cast': '1',
 'cdm_data_type': 'Station',
 'chief_scientist': 'see contributor_name attribute',
 'comment': ' ',
 'contributor_email': 'daniel.klocke@mpimet.mpg.de ,\n'
                      'mdengler@geomar.de ,\n'
                      'mdengler@geomar.de ,\n'
                      'mdengler@geomar.de ,\n'
                      'mdengler@geomar.de ,\n'
                      'mdengler@geomar.de ,\n'
                      'mdengler@geomar.de ,',
 'contributor_name': 'Daniel Klocke ,\n'
                     'Marcus Dengler ,\n'
                     'Marcus Dengler ,\n'
                     'Marcus Dengler ,\n'
                     'Marcus Dengler ,\n'
                     'Marcus Dengler ,\n'
                     'Marcus Dengler ,',
 'contributor_role': 'Chief Scientist of the Cruise ,\n'
                     'Principal Investigator for the Cruise/Project ,\n'
                     'Scientist handling instrument on board ,\n'
                     'Scientist handling final processing ,\n'
                     'Principal Investigator for this data set ,\n'
                     'Publisher ,\n'
                     'GEOMAR FB1 PO data steward converting to NetCDF ,',
 'coverage_content_type': 'physicalMeasurement',
 'creator_email': 'mdengler@geomar.de',
 'creator_name': 'Marcus Dengler',
 'cruise_identifier': 'met_203_1',
 'cruise_leg': '1',
 'data_mode': 'D',
 'data_mode_list': 'R:real-time data   \n'
                   'P:provisional data  (this means RAW data)   \n'
                   'D:delayed-mode data  (this means calibrated FINAL data)\n'
                   'M:mixture of the above',
 'data_type': 'OceanSITES profile data',
 'expocode': '06M320240810',
 'expocode_old_style': ' ',
 'featureType': 'trajectoryProfile',
 'format_version': '1.3',
 'geospatial_lat_max': 14.8473,
 'geospatial_lat_min': 4.6514,
 'geospatial_lat_units': 'degrees_north',
 'geospatial_lon_max': -22.9982,
 'geospatial_lon_min': -58.7197,
 'geospatial_lon_units': 'degrees_east',
 'geospatial_vertical_positive': 'down',
 'geospatial_vertical_units': 'm',
 'history': 'created during CTD processing; {now}: converted to Zarr by Lukas '
            'Kluft (lukas.kluft@mpimet.mpg.de)',
 'ices_platform_code': '06M3',
 'institution': 'Helmholtz Centre for Marine Research Kiel (GEOMAR)',
 'institution_references': 'http://www.geomar.de',
 'license': 'CC-BY-4.0',
 'naming_authority': 'GEOMAR, de.geomar',
 'netcdf_version': '3.5',
 'nodc_template_version': 'NODC_NetCDF_Profile_Orthogonal_Template_v1.0',
 'platform': 'RV METEOR',
 'principal_investigator': 'Marcus Dengler',
 'principal_investigator_email': 'mdengler@geomar.de',
 'project': 'ORCESTRA, BOW-TIE',
 'publisher_email': 'mdengler@geomar.de',
 'publisher_name': 'Marcus Dengler',
 'publisher_url': 'https://orcid.org/0000-0001-5993-9088',
 'references': ' ',
 'sdn_edmo_code': '2947',
 'sea_name': ' ',
 'ship_name': 'Meteor III',
 'source': 'CTD profile observation',
 'standard_name_vocabulary': 'CF-1.6, v19',
 'time_coverage_end': '2024-09-22T14:24:12.000000000',
 'time_coverage_start': '2024-08-18T01:40:17.000000000',
 'title': 'GEOMAR PO-processed CTD data of cruise Meteor 203/1 CTD station '
          'number 1'}
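
Some attributes in the output above, such as contributor_name, contributor_email and contributor_role, pack several entries into a single string. As a minimal sketch, they can be split into one record per contributor. This assumes that entries are separated by commas and that the three lists are aligned entry by entry, which is inferred only from the output shown here and is not a documented guarantee. The helper names split_entries and contributors are hypothetical, and the example dictionary is a shortened hand copy of the attributes so that the block runs without network access.

```python
def split_entries(value):
    """Split a comma-separated attribute string into a list of clean entries."""
    # Assumption: entries are separated by commas and may carry stray
    # whitespace, newlines and a trailing comma, as in the output above.
    return [entry.strip() for entry in value.split(",") if entry.strip()]


def contributors(attrs):
    """Combine the contributor_* attributes into one dict per contributor."""
    names = split_entries(attrs.get("contributor_name", ""))
    emails = split_entries(attrs.get("contributor_email", ""))
    roles = split_entries(attrs.get("contributor_role", ""))
    # Assumption: the three lists are aligned; zip() stops at the shortest one.
    return [
        {"name": n, "email": e, "role": r}
        for n, e, r in zip(names, emails, roles)
    ]


# A shortened hand copy of the attributes printed above.
example_attrs = {
    "contributor_name": "Daniel Klocke ,\nMarcus Dengler ,",
    "contributor_email": "daniel.klocke@mpimet.mpg.de ,\nmdengler@geomar.de ,",
    "contributor_role": "Chief Scientist of the Cruise ,\nPublisher ,",
}

for person in contributors(example_attrs):
    print(f"{person['name']} ({person['role']}): {person['email']}")
```

With the opened dataset, the same helper could be called as contributors(ds_ctd.attrs). Splitting on commas would break for entries that themselves contain a comma, so this is only suitable for attributes formatted like the ones shown here.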

We can also look up individual keys to retrieve only the information that is relevant to us. This is useful for printing standardised log messages when working with different datasets, or for checking the license and creator of a dataset (see also the ORCESTRA Data Policy).

pprint(ds_ctd.attrs["title"])
pprint(ds_ctd.attrs["creator_name"])
pprint(ds_ctd.attrs["license"])
'GEOMAR PO-processed CTD data of cruise Meteor 203/1 CTD station number 1'
'Marcus Dengler'
'CC-BY-4.0'
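
A standardised log message built from these attributes can be sketched as a small helper. This is a minimal example and not part of any ORCESTRA tooling: the function name describe_dataset and the 'n/a' fallback for missing keys are assumptions, and the dictionary is a hand-copied subset of the attributes shown above so that the block runs without network access.

```python
def describe_dataset(attrs, keys=("title", "creator_name", "license")):
    """Build a one-line summary from selected global attributes."""
    # dict.get() avoids a KeyError if a dataset lacks one of the attributes.
    parts = [f"{key}={attrs.get(key, 'n/a')!r}" for key in keys]
    return "; ".join(parts)


# Hand-copied subset of the CTD attributes printed above.
example_attrs = {
    "title": "GEOMAR PO-processed CTD data of cruise Meteor 203/1 CTD station number 1",
    "creator_name": "Marcus Dengler",
    "license": "CC-BY-4.0",
}

# Summary for a dataset that sets all requested attributes.
print(describe_dataset(example_attrs))
# Summary for a dataset that is missing the license attribute.
print(describe_dataset({"title": "untitled"}, keys=("title", "license")))
```

For an opened dataset, the call would be describe_dataset(ds_ctd.attrs). Using a fallback instead of direct indexing keeps the message format identical across datasets, even when one of them does not set every attribute.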