ERA5 data on the HEALPix grid#

ERA5[1] data for selected variables has been restructured and remapped onto the HEALPix grid for the years 2010 to 2023. This dataset, which we call HERA5, is accessible through the internal TCO catalog and can be read and used by everyone. HERA5 provides data at different temporal resolutions and enables an open and fast access to the data from everywhere and without a local copy. This notebook shows the access to the data and some basic usage examples.

This tutorial makes use of the easygems python package which hosts several convenience functions to work with and display Earth-system data on the HEALPix grid. If you don’t have the package installed, you can run the code cell below to install it via pip:

%pip install easygems

Loading the data from the catalog#

First, let’s import all packages that are required to run the full notebook.

import intake
import healpix as hp
import cmocean
import numpy as np
import xarray as xr

import cartopy.crs as ccrs
import cartopy.feature as cf
import matplotlib.pyplot as plt

import easygems.healpix as egh

Now we can open the catalog and select the hierarchical HEALPix ERA5 data HERA5.

cat = intake.open_catalog("https://tcodata.mpimet.mpg.de/internal.yaml")
list(cat)
['BCO', 'METEOR', 'ORCESTRA', 'HERA5', 'HIFS', 'airinfo']
cat.HERA5.to_dask().pipe(egh.attach_coords)
<xarray.Dataset> Size: 66GB
Dimensions:    (time: 164, cell: 196608, level: 29)
Coordinates:
    latitude   (cell) float32 786kB dask.array<chunksize=(196608,), meta=np.ndarray>
  * level      (level) int64 232B 50 70 100 125 150 175 ... 900 925 950 975 1000
    longitude  (cell) float32 786kB dask.array<chunksize=(196608,), meta=np.ndarray>
  * time       (time) datetime64[ns] 1kB 2010-01-16T04:15:00 ... 2023-08-16T0...
    crs        int64 8B 0
  * cell       (cell) int64 2MB 0 1 2 3 4 ... 196603 196604 196605 196606 196607
    lat        (cell) float64 2MB 0.2984 0.5968 0.5968 ... -0.5968 -0.2984
    lon        (cell) float64 2MB 45.0 45.35 44.65 45.0 ... 315.4 314.6 315.0
Data variables: (12/73)
    100u       (time, cell) float32 129MB dask.array<chunksize=(24, 4096), meta=np.ndarray>
    100v       (time, cell) float32 129MB dask.array<chunksize=(24, 4096), meta=np.ndarray>
    10si       (time, cell) float32 129MB dask.array<chunksize=(24, 4096), meta=np.ndarray>
    10u        (time, cell) float32 129MB dask.array<chunksize=(24, 4096), meta=np.ndarray>
    10v        (time, cell) float32 129MB dask.array<chunksize=(24, 4096), meta=np.ndarray>
    2d         (time, cell) float32 129MB dask.array<chunksize=(24, 4096), meta=np.ndarray>
    ...         ...
    isor       (cell) float32 786kB dask.array<chunksize=(196608,), meta=np.ndarray>
    lsm        (cell) float32 786kB dask.array<chunksize=(196608,), meta=np.ndarray>
    sdfor      (cell) float32 786kB dask.array<chunksize=(196608,), meta=np.ndarray>
    sdor       (cell) float32 786kB dask.array<chunksize=(196608,), meta=np.ndarray>
    slor       (cell) float32 786kB dask.array<chunksize=(196608,), meta=np.ndarray>
    z_sfc      (cell) float32 786kB dask.array<chunksize=(196608,), meta=np.ndarray>
Attributes:
    acknowledgment:  Contains modified Copernicus Climate Change Service info...
    contact:         lukas.kluft@mpimet.mpg.de
    creator:         Lukas Kluft
    description:     Selected variables from ERA5, restructured and saved on ...
    institution:     Max Planck Institute for Meteorology
    source:          Post-processed dataset based on the ERA5 mirror located ...
    title:           HERA5 - HEALPixelation of ERA5

Some notes on the dataset#

  • Dimensions

    • the dimensions are time, cell and (pressure-)level and the coordinate reference system crs. Here, cell refers to the cells in the HEALPix grid, which is a pixelation of a sphere. Get to know more about HEALPix here. The zoom level of the HERA5 data is \(z = 7\) and the number of cells respectively \(12 \cdot z^{4} = 196608\)

    • you can distinguish between 2d and 3d variables by their dimensions, the former having only (time, cell) and the latter having (time, level, cell).

  • Temporal hierarchy

    • The dataset is constructed hierarchically in time, which means that you can directly access either monthly aggregated values P1M, daily P1D, or the original hourly data PT1H. By default, the catalog entry points to the monthly values which are great to do a fast analysis of longterm and coarse features such as calculating the global mean surface temperature over 10 years. If you need higher temporal resolutions, you can add the respective acronym when loading the catalog: ds = cat.HERA5(time="PT1H").to_dask().

    • In principle, one could also implement spatial hierarchies. However, the original ERA5 data is already rather coarse such that there is no need for coarser spatial (zoom) levels in the present case.

  • Spatial resolution

    • The spatial resolution is given by the zoom level \(z = 7\) which is closest to the original ERA5 model output and roughly corresponds to 0.35 x 0.7° effective grid resolution.

  • Variables

    • Most variables are from the ERA5 “analysis” product. Only few variables such as total precipitation and boundary layer height are added from the “forecast” product. Each variable has a type attribute which will let you know whether it is from the analysis or forecast product.

Plotting examples#

2m temperature at the BCO in Aug-Sep 2020#

We would like to plot the 2m air temperature at one location, the BCO, in August and September 2020. For this rather short time period, we would like to work with the hourly data. Therefore, we need to specify the time hierarchy when addressing the data in the catalog.

era5 = cat.HERA5(time="PT1H").to_dask().pipe(egh.attach_coords)

Next, we select the cell index that is the nearest neighbor to the BCO location using healpix.ang2pix().

i_bco = hp.ang2pix(
    egh.get_nside(era5),
    -59.42875, # longitude at BCO
    13.16264, # latitude at BCO
    nest=egh.get_nest(era5),
    lonlat=True,
)

Now, we can plot the time series by selecting the respective cell as well as the time range.

era5["2t"].sel(cell=i_bco, time=slice("2020-08", "2020-09")).plot()
[<matplotlib.lines.Line2D at 0x7f5d83f9bf80>]
_images/63ad2aff6d2db06ae3c301a9949fc8d2a7e7b5799cfad8ace2607464699ab87a.png

Evolution of the moisture field at BCO#

Next, we extend our analysis and use a 3d variable to plot a time-height diagramm of moisture. Again, we select the BCO cell, a time range and now also a range in height level where we leave out the upper most layers (<100hPa) as we are interested in the troposphere only.

era5["q"].sel(
    cell=i_bco,
    time="2020-08",
    level=slice(100, 1000),
).plot(
    x="time",
    yincrease=False,
    cmap="cmo.dense",
    vmin=0,
)
<matplotlib.collections.QuadMesh at 0x7f5d8eaf4500>
_images/3f9209eafaee09f209ef4c3e9d6692211fd71c56a87b407aa164f5bb2d01a903.png

Select ORCESTRA region#

We want to see how the RH profile changes with latitude when averaged over the ORCESTRA campaign region on the 20th of August 2020. The easygems package provides the the isel_extent function which takes a lat/lon extent ([W, E, S, N]) as input and returns the corresponding indices of the global HEALPix grid. These indices can then be used to select the region of interest.

is_orcestra = egh.isel_extent(era5, [-60, -10, -5, 20])

fig, ax = plt.subplots()
era5["r"].sel(
    cell=is_orcestra,
    time="2020-08-20",
    level=slice(100, 1000),
).mean(
    "time"
).groupby(
    era5.lat.sel(cell=is_orcestra)
).mean().plot(
    x="lat",
    yincrease=False,
    cmap="cmo.dense",
    ax=ax,
)
ax.set_title("Relative humidity profile on 2020-08-20 (ORCESTRA region)")
Text(0.5, 1.0, 'Relative humidity profile on 2020-08-20 (ORCESTRA region)')
_images/abd195e751a892ca51de9bbec17399ecf6016a130acf40af502367feb8244d12.png

A snapshot of global SST#

egh.healpix_show(
    era5["sst"].sel(time="2020-08-01T12:00:00", method="nearest"),
    nest=egh.get_nest(era5),
    vmin=273,
    vmax=303,
    cmap="cmo.thermal",
)
<matplotlib.image.AxesImage at 0x7f5d749de240>
/home/runner/miniconda3/envs/orcestra_book/lib/python3.12/site-packages/cartopy/io/__init__.py:241: DownloadWarning: Downloading: https://naturalearth.s3.amazonaws.com/110m_physical/ne_110m_coastline.zip
  warnings.warn(f'Downloading: {url}', DownloadWarning)
<Figure size 640x480 with 0 Axes>
_images/2a0e5019df36d28dbeb574cd70a54f0a00c14b747323ec5a90a4835ccdb74bee.png

Plotting the Atlantic basin only#

In real-life analyses it is often necessary to constrain a plot to certain regions. The function easygems.healpix_show() allows to plot two-dimensional data onto a cartopy GeoAxis with arbitrary projection and extent.

era5 = cat.HERA5(time="P1M").to_dask().pipe(egh.attach_coords)
var = era5["tp"]

fig, ax = plt.subplots(figsize=(10, 6), subplot_kw={"projection": ccrs.PlateCarree()})
ax.set_extent([-65, -5, -10, 25])
ax.coastlines(lw=0.8)
im = egh.healpix_show(
    var.sel(time="2020-08", method="nearest").values,
    method="linear",
    cmap="cmo.rain",
    vmin=0,
)
fig.colorbar(im, label=f"{var.long_name} / {var.units}", shrink=0.7)
<matplotlib.colorbar.Colorbar at 0x7f5d753f2780>
_images/b8aea7690cbaff27bb07fdd59c75bc298802c242958386171cd6f2f2747a0254.png

Further reading#