ECCOv4 access via NASA’s EarthData in the Cloud#

This notebook demonstrates access to ECCOv4 model output. Broad information about the ECCO dataset can be found in the PODAAC website (see here).

Requirements to run this notebook

  1. Have an Earth Data Login account

  2. Have a Bearer Token.

Objectives

Use pydap’s client API to demonstrate

  • Access to NASA’s EarthData via the use of tokens. Tokens are greatly favored over username/password, since tokens avoid repeated authentication over the many redirects.

  • A workflow involving multiple OPeNDAP URLs and xarray parallelism, lazy evaluation, and plotting of Level 4 with complex Topology ECCOv4 Data via OPeNDAP.

Some variables of focus are

Author: Miguel Jimenez-Urias, ‘24

import matplotlib.pyplot as plt
import numpy as np
from pydap.net import create_session
from pydap.client import open_url
import xarray as xr
import pydap
print("xarray version: ", xr.__version__)
print("pydap version: ", pydap.__version__)
xarray version:  2025.4.0
pydap version:  3.5.5

Access EARTHDATA#

Many of the data variables can be browsed here. Here we will work with the original data defined on the Lat-Lon-Cap (LLC90) grid.

Grid_url = 'dap4://opendap.earthdata.nasa.gov/providers/POCLOUD/collections/ECCO%20Geometry%20Parameters%20for%20the%20Lat-Lon-Cap%2090%20(llc90)%20Native%20Model%20Grid%20(Version%204%20Release%204)/granules/GRID_GEOMETRY_ECCO_V4r4_native_llc0090'

Use .netrc credentials#

my_session = create_session()

Alternative token approach#

session_extra = {"token": "YourToken"}

# initialize a requests.session object with the token headers. All handled by pydap.
my_session = create_session(session_kwargs=session_extra)

Lazy access to remote data via pydap’s client API#

pydap exploits the OPeNDAP’s separation between metadata and data, to create lazy dataset objects that point to the data. These lazy objects contain all the attributes detailed in OPeNDAP’s metadata files (DMR).

ds_grid = open_url(Grid_url, session=my_session, protocol="dap4")
ds_grid.tree()
.GRID_GEOMETRY_ECCO_V4r4_native_llc0090.nc
├──dxC
├──PHrefF
├──XG
├──dyG
├──rA
├──hFacS
├──Zp1
├──Zl
├──rAw
├──dxG
├──maskW
├──YC
├──XC
├──maskS
├──YG
├──hFacC
├──drC
├──drF
├──XC_bnds
├──Zu
├──Z_bnds
├──YC_bnds
├──PHrefC
├──rAs
├──Depth
├──dyC
├──SN
├──rAz
├──maskC
├──CS
├──hFacW
├──Z
├──i
├──i_g
├──j
├──j_g
├──k
├──k_l
├──k_p1
├──k_u
├──nb
├──nv
└──tile

Note

PyDAP accesses the remote dataset’s metadata, and no data has been downloaded yet!

Download data into memory

The syntax is as follows:

# this fetches remote data into a pydap object container
pydap_Array = dataset[<VarName>][:]

where <VarName> is the name of one of the variables in the pydap.model.BaseType object. The pydap.model.BaseType is a thing wrapper of numpy arrays. The array has been downloaded into “local” memory (RAM) as (uncompressed) numpy arrays.

To extract the data as a numpy array, run

# The `.data` command allows direct access to the Numpy array (e.g. for manipulation)
pydap_Array.data
# lets download some data
Depth = ds_grid['Depth'][:]
print(type(Depth))
<class 'pydap.model.BaseType'>
Depth.attributes
{'_FillValue': 9.969209968e+36,
 'long_name': 'model seafloor depth below ocean surface at rest',
 'units': 'm',
 'coordinate': 'XC YC',
 'coverage_content_type': 'modelResult',
 'standard_name': 'sea_floor_depth_below_geoid',
 'comment': "Model sea surface height (SSH) of 0m corresponds to an ocean surface at rest relative to the geoid. Depth corresponds to seafloor depth below geoid. Note: the MITgcm used by ECCO V4r4 implements 'partial cells' so the actual model seafloor depth may differ from the seafloor depth provided by the input bathymetry file.",
 'coordinates': 'YC XC',
 'origname': 'Depth',
 'fullnamepath': '/Depth',
 'Maps': (),
 'checksum': array([950678499], dtype=uint32)}
Depth.shape, Depth.dims
((13, 90, 90), ['/tile', '/j', '/i'])

Plot Depth along native grid

ECCO data is defined on a Cube Sphere – meaning the horizontal grid contains an extra dimension: tile or face. You can inspect the data in its native grid by plotting all horizontal data onto a grid as follows:

Variable = [Depth[i].data for i in range(13)]
clevels =  np.linspace(0, 6000, 100)
cMap = 'Greys_r'
fig, axes = plt.subplots(nrows=5, ncols=5, figsize=(8, 8), gridspec_kw={'hspace':0.01, 'wspace':0.01})
AXES = [
    axes[4, 0], axes[3, 0], axes[2, 0], axes[4, 1], axes[3, 1], axes[2, 1],
    axes[1, 1], 
    axes[1, 2], axes[1, 3], axes[1, 4], axes[0, 2], axes[0, 3], axes[0, 4],
]
for i in range(len(AXES)):
    AXES[i].contourf(Variable[i], clevels, cmap=cMap)

for ax in np.ravel(axes):
    ax.axis('off')
    plt.setp(ax.get_xticklabels(), visible=False)
    plt.setp(ax.get_yticklabels(), visible=False)

plt.show()
../_images/849a9ed14c6d8839c23705e6c44ea3e1fdab748bdc0456ebcc990fac02e6e7c7.png

Fig. 1. Depth plotted on a horizontal layout. Data on tiles 0-5 follow C-ordering, whereas data on tiles 7-13 follow F-ordering. Data on the arctic cap, is defined on a polar coordinate grid.

Plot with corrected Topology

fig, axes = plt.subplots(nrows=4, ncols=4, figsize=(8, 8), gridspec_kw={'hspace':0.01, 'wspace':0.01})
AXES_NR = [
    axes[3, 0], axes[2, 0], axes[1, 0], axes[3, 1], axes[2, 1], axes[1, 1],
]
AXES_CAP = [axes[0, 0]]
AXES_R = [
    axes[1, 2], axes[2, 2], axes[3, 2], axes[1, 3], axes[2, 3], axes[3, 3],
]
for i in range(len(AXES_NR)):
    AXES_NR[i].contourf(Variable[i], clevels, cmap=cMap)

for i in range(len(AXES_CAP)):
    AXES_CAP[i].contourf(Variable[6].transpose()[:, ::-1], clevels, cmap=cMap)

for i in range(len(AXES_R)):
    AXES_R[i].contourf(Variable[7+i].transpose()[::-1, :], clevels, cmap=cMap)

for ax in np.ravel(axes):
    ax.axis('off')
    plt.setp(ax.get_xticklabels(), visible=False)
    plt.setp(ax.get_yticklabels(), visible=False)

plt.show()
../_images/19f85cdceb8c590181f7084e3cdcab349f14bc6271c5f41aa74c4547434b4d06.png

Fig. 2. Depth plotted on a horizontal layout that approximates lat-lon display. Data on the arctic cap, however, remains on a polar coordinate grid.

pydap + xarray to aggregate multiple URLs#

OPeNDAP allows remote access via dataURLs. Each dataURL represents a variable, i.e. a piece of the whole puzzle. We can exploit xarray concatenation and parallelism to combine multiple dataURLs, and thus multiple pydap objects, into a single xarray.Dataset. Can do this because xarray has long-implemented pydap as an engine to access/open remote datasets.

A single URL

Many OPeNDAP servers can serve data in either DAP2 and DAP4 (much newer) model implementations. In pydap, you can specify which of the two implementations by replacing the beginning of the URL with either one of dap2 or dap4. Since xarray imports pydap internally as is, then xarray can recognize this behavior as well.

Below we access remote Temperature and Salinity ECCO data with xarray via (internally) pydap.

baseURL = 'dap4://opendap.earthdata.nasa.gov/providers/POCLOUD/collections/'
Temp_Salt = "ECCO%20Ocean%20Temperature%20and%20Salinity%20-%20Monthly%20Mean%20llc90%20Grid%20(Version%204%20Release%204)/granules/OCEAN_TEMPERATURE_SALINITY_mon_mean_"
year = '2017-'
month = '01'
end_ = '_ECCO_V4r4_native_llc0090'
Temp_2017 = baseURL + Temp_Salt +year  + month + end_
Temp_2017
'dap4://opendap.earthdata.nasa.gov/providers/POCLOUD/collections/ECCO%20Ocean%20Temperature%20and%20Salinity%20-%20Monthly%20Mean%20llc90%20Grid%20(Version%204%20Release%204)/granules/OCEAN_TEMPERATURE_SALINITY_mon_mean_2017-01_ECCO_V4r4_native_llc0090'

Using CEs to pre-select only variables of interest#

Allows the user to drop variables but ONLY after creating the dataset of the entire remote URL. With OPeNDAP servers, the you encode the names of the variables you want to access in the URL. THis can significantly speed up analysis and exploration.

Below we manually specify the DAP4 Constraint Expression for keeping only the variables THETA, along with its dimensions time, k, time, j, i and tile.

You can also construct the full URL with constraint expressions interactively, by selecting manually the variables on the datasets’s Data Request Form and selecting Copy (Raw) Data URL.

pyds = open_url(Temp_2017, session=my_session)
pyds.tree()
.OCEAN_TEMPERATURE_SALINITY_mon_mean_2017-01_ECCO_V4r4_native_llc0090.nc
├──XG
├──Zp1
├──Zl
├──YC
├──XC
├──SALT
├──YG
├──XC_bnds
├──Zu
├──THETA
├──Z_bnds
├──YC_bnds
├──time_bnds
├──Z
├──i
├──i_g
├──j
├──j_g
├──k
├──k_l
├──k_p1
├──k_u
├──nb
├──nv
├──tile
└──time

Pydap’s object can now helps up construct the Constraint Expression that will allow us to create a much simpler datacube

dims = pyds['THETA'].dims
Vars = ['/THETA'] + dims

# Below construct Contraint Expression
CE = "?dap4.ce="+(";").join(Vars)
print("Constraint Expression: ", CE)
Constraint Expression:  ?dap4.ce=/THETA;/time;/k;/tile;/j;/i

Now we add the CE to the data url as follows below:

Temp_2017 = baseURL + Temp_Salt +year  + month + end_ + CE
Temp_2017
'dap4://opendap.earthdata.nasa.gov/providers/POCLOUD/collections/ECCO%20Ocean%20Temperature%20and%20Salinity%20-%20Monthly%20Mean%20llc90%20Grid%20(Version%204%20Release%204)/granules/OCEAN_TEMPERATURE_SALINITY_mon_mean_2017-01_ECCO_V4r4_native_llc0090?dap4.ce=/THETA;/time;/k;/tile;/j;/i'
%%time
dataset = xr.open_dataset(Temp_2017, engine='pydap', session=my_session)
dataset
CPU times: user 151 ms, sys: 38.3 ms, total: 189 ms
Wall time: 30.5 s
<xarray.Dataset> Size: 21MB
Dimensions:  (time: 1, k: 50, tile: 13, j: 90, i: 90)
Coordinates:
  * i        (i) int32 360B 0 1 2 3 4 5 6 7 8 9 ... 81 82 83 84 85 86 87 88 89
  * j        (j) int32 360B 0 1 2 3 4 5 6 7 8 9 ... 81 82 83 84 85 86 87 88 89
  * k        (k) int32 200B 0 1 2 3 4 5 6 7 8 9 ... 41 42 43 44 45 46 47 48 49
  * tile     (tile) int32 52B 0 1 2 3 4 5 6 7 8 9 10 11 12
  * time     (time) datetime64[ns] 8B 2017-01-16T12:00:00
Data variables:
    THETA    (time, k, tile, j, i) float32 21MB ...
Attributes: (12/62)
    acknowledgement:                 This research was carried out by the Jet...
    author:                          Ian Fenty and Ou Wang
    cdm_data_type:                   Grid
    comment:                         Fields provided on the curvilinear lat-l...
    Conventions:                     CF-1.8, ACDD-1.3
    coordinates_comment:             Note: the global 'coordinates' attribute...
    ...                              ...
    time_coverage_duration:          P1M
    time_coverage_end:               2017-02-01T00:00:00
    time_coverage_resolution:        P1M
    time_coverage_start:             2017-01-01T00:00:00
    title:                           ECCO Ocean Temperature and Salinity - Mo...
    uuid:                            f5b7028c-4181-11eb-b7e6-0cc47a3f47b1

Multiple files in parallel#

We can exploit xarray’s intuitive concat + merge capabilities to read multiple data URL in parallel. To accomplish that, we need to create a list of all relevant URLs that each point to an OPeNDAP dataset.

Tip

In this case we will use a session that caches the response. It can handle the authentication too!

Temp_2017 = [baseURL + Temp_Salt + year + f'{i:02}' + end_ + CE for i in range(1, 13)]
Temp_2017[:4]
['dap4://opendap.earthdata.nasa.gov/providers/POCLOUD/collections/ECCO%20Ocean%20Temperature%20and%20Salinity%20-%20Monthly%20Mean%20llc90%20Grid%20(Version%204%20Release%204)/granules/OCEAN_TEMPERATURE_SALINITY_mon_mean_2017-01_ECCO_V4r4_native_llc0090?dap4.ce=/THETA;/time;/k;/tile;/j;/i',
 'dap4://opendap.earthdata.nasa.gov/providers/POCLOUD/collections/ECCO%20Ocean%20Temperature%20and%20Salinity%20-%20Monthly%20Mean%20llc90%20Grid%20(Version%204%20Release%204)/granules/OCEAN_TEMPERATURE_SALINITY_mon_mean_2017-02_ECCO_V4r4_native_llc0090?dap4.ce=/THETA;/time;/k;/tile;/j;/i',
 'dap4://opendap.earthdata.nasa.gov/providers/POCLOUD/collections/ECCO%20Ocean%20Temperature%20and%20Salinity%20-%20Monthly%20Mean%20llc90%20Grid%20(Version%204%20Release%204)/granules/OCEAN_TEMPERATURE_SALINITY_mon_mean_2017-03_ECCO_V4r4_native_llc0090?dap4.ce=/THETA;/time;/k;/tile;/j;/i',
 'dap4://opendap.earthdata.nasa.gov/providers/POCLOUD/collections/ECCO%20Ocean%20Temperature%20and%20Salinity%20-%20Monthly%20Mean%20llc90%20Grid%20(Version%204%20Release%204)/granules/OCEAN_TEMPERATURE_SALINITY_mon_mean_2017-04_ECCO_V4r4_native_llc0090?dap4.ce=/THETA;/time;/k;/tile;/j;/i']
cache_session = create_session(use_cache=True) # caching the session
cache_session.cache.clear() # clear all cache to demonstrate the behavior. You do not have to do this!

Tip

You can use the consolidated_metadata method to rapidly download all metadata associated with a datacube, and cache it. This is a feature of pydap >= 3.5.5.

from pydap.client import consolidate_metadata
%%time
consolidate_metadata(Temp_2017, cache_session)
datacube has dimensions {'i[0:1:89]', 'j[0:1:89]', 'k[0:1:49]', 'time[0:1:0]', 'tile[0:1:12]'}
CPU times: user 619 ms, sys: 354 ms, total: 973 ms
Wall time: 11.4 s
cache_session.cache.urls()
['https://opendap.earthdata.nasa.gov/providers/POCLOUD/collections/ECCO%20Ocean%20Temperature%20and%20Salinity%20-%20Monthly%20Mean%20llc90%20Grid%20(Version%204%20Release%204)/granules/OCEAN_TEMPERATURE_SALINITY_mon_mean_2017-01_ECCO_V4r4_native_llc0090.dap?dap4.ce=i%5B0%3A1%3A89%5D',
 'https://opendap.earthdata.nasa.gov/providers/POCLOUD/collections/ECCO%20Ocean%20Temperature%20and%20Salinity%20-%20Monthly%20Mean%20llc90%20Grid%20(Version%204%20Release%204)/granules/OCEAN_TEMPERATURE_SALINITY_mon_mean_2017-01_ECCO_V4r4_native_llc0090.dap?dap4.ce=j%5B0%3A1%3A89%5D',
 'https://opendap.earthdata.nasa.gov/providers/POCLOUD/collections/ECCO%20Ocean%20Temperature%20and%20Salinity%20-%20Monthly%20Mean%20llc90%20Grid%20(Version%204%20Release%204)/granules/OCEAN_TEMPERATURE_SALINITY_mon_mean_2017-01_ECCO_V4r4_native_llc0090.dap?dap4.ce=k%5B0%3A1%3A49%5D',
 'https://opendap.earthdata.nasa.gov/providers/POCLOUD/collections/ECCO%20Ocean%20Temperature%20and%20Salinity%20-%20Monthly%20Mean%20llc90%20Grid%20(Version%204%20Release%204)/granules/OCEAN_TEMPERATURE_SALINITY_mon_mean_2017-01_ECCO_V4r4_native_llc0090.dap?dap4.ce=tile%5B0%3A1%3A12%5D',
 'https://opendap.earthdata.nasa.gov/providers/POCLOUD/collections/ECCO%20Ocean%20Temperature%20and%20Salinity%20-%20Monthly%20Mean%20llc90%20Grid%20(Version%204%20Release%204)/granules/OCEAN_TEMPERATURE_SALINITY_mon_mean_2017-01_ECCO_V4r4_native_llc0090.dap?dap4.ce=time%5B0%3A1%3A0%5D',
 'https://opendap.earthdata.nasa.gov/providers/POCLOUD/collections/ECCO%20Ocean%20Temperature%20and%20Salinity%20-%20Monthly%20Mean%20llc90%20Grid%20(Version%204%20Release%204)/granules/OCEAN_TEMPERATURE_SALINITY_mon_mean_2017-10_ECCO_V4r4_native_llc0090.dmr?dap4.ce=%2FTHETA%3B%2Ftime%3B%2Fk%3B%2Ftile%3B%2Fj%3B%2Fi']
%%time
theta_salt_ds = xr.open_mfdataset(
    Temp_2017, 
    engine='pydap',
    session=cache_session, 
    parallel=True,
    combine='nested',
    concat_dim='time',
    decode_times=False
)
CPU times: user 242 ms, sys: 120 ms, total: 363 ms
Wall time: 300 ms
theta_salt_ds
<xarray.Dataset> Size: 253MB
Dimensions:  (time: 12, k: 50, tile: 13, j: 90, i: 90)
Coordinates:
  * i        (i) int32 360B 0 1 2 3 4 5 6 7 8 9 ... 81 82 83 84 85 86 87 88 89
  * j        (j) int32 360B 0 1 2 3 4 5 6 7 8 9 ... 81 82 83 84 85 86 87 88 89
  * k        (k) int32 200B 0 1 2 3 4 5 6 7 8 9 ... 41 42 43 44 45 46 47 48 49
  * tile     (tile) int32 52B 0 1 2 3 4 5 6 7 8 9 10 11 12
  * time     (time) int32 48B 219528 219528 219528 ... 219528 219528 219528
Data variables:
    THETA    (time, k, tile, j, i) float32 253MB dask.array<chunksize=(1, 50, 13, 90, 90), meta=np.ndarray>
Attributes: (12/62)
    acknowledgement:                 This research was carried out by the Jet...
    author:                          Ian Fenty and Ou Wang
    cdm_data_type:                   Grid
    comment:                         Fields provided on the curvilinear lat-l...
    Conventions:                     CF-1.8, ACDD-1.3
    coordinates_comment:             Note: the global 'coordinates' attribute...
    ...                              ...
    time_coverage_duration:          P1M
    time_coverage_end:               2017-11-01T00:00:00
    time_coverage_resolution:        P1M
    time_coverage_start:             2017-10-01T00:00:00
    title:                           ECCO Ocean Temperature and Salinity - Mo...
    uuid:                            fba8c270-4181-11eb-a317-0cc47a3f47df

Finally, we plot THETA#

Variable = [theta_salt_ds['THETA'][0, 0, i, :, :] for i in range(13)]
clevels = np.linspace(-5, 30, 100)
cMap='RdBu_r'
fig, axes = plt.subplots(nrows=4, ncols=4, figsize=(8, 8), gridspec_kw={'hspace':0.01, 'wspace':0.01})
AXES_NR = [
    axes[3, 0], axes[2, 0], axes[1, 0], axes[3, 1], axes[2, 1], axes[1, 1],
]
AXES_CAP = [axes[0, 0]]
AXES_R = [
    axes[1, 2], axes[2, 2], axes[3, 2], axes[1, 3], axes[2, 3], axes[3, 3],
]
for i in range(len(AXES_NR)):
    AXES_NR[i].contourf(Variable[i], clevels, cmap=cMap)

for i in range(len(AXES_CAP)):
    AXES_CAP[i].contourf(Variable[6].transpose()[:, ::-1], clevels, cmap=cMap)

for i in range(len(AXES_R)):
    AXES_R[i].contourf(Variable[7+i].transpose()[::-1, :], clevels, cmap=cMap)

for ax in np.ravel(axes):
    ax.axis('off')
    plt.setp(ax.get_xticklabels(), visible=False)
    plt.setp(ax.get_yticklabels(), visible=False)

plt.show()
../_images/a0b2a1760b67af5bec260ddde1e53201a3ce8f38fce240277701e0a6fbe22d71.png

Fig. 3. Surface temperature, plotted on a horizontal layout that approximates lat-lon display. Data on the arctic cap, however, remains on a polar coordinate grid.