ECCOv4 access via NASA’s EarthData in the Cloud#

This notebook demonstrates access to ECCOv4 model output. Broad information about the ECCO dataset can be found in the PODAAC website (see here).

Requirements to run this notebook

  1. Have an Earth Data Login account

  2. Have a Bearer Token.

Objectives

Use pydap’s client API to demonstrate

  • Access to NASA’s EarthData via the use of tokens. Tokens are greatly favored over username/password, since tokens avoid repeated authentication over the many redirects.

  • A workflow involving multiple OPeNDAP URLs and xarray parallelism, lazy evaluation, and plotting of Level 4 with complex Topology ECCOv4 Data via OPeNDAP.

Some variables of focus are

Author: Miguel Jimenez-Urias, ‘24

import matplotlib.pyplot as plt
import numpy as np
from pydap.net import create_session
from pydap.client import open_url
import xarray as xr

Access EARTHDATA#

Many of the data variables can be browsed here. Here we will work with the original data defined on the Lat-Lon-Cap (LLC90) grid.

Grid_url = 'dap4://opendap.earthdata.nasa.gov/providers/POCLOUD/collections/ECCO%20Geometry%20Parameters%20for%20the%20Lat-Lon-Cap%2090%20(llc90)%20Native%20Model%20Grid%20(Version%204%20Release%204)/granules/GRID_GEOMETRY_ECCO_V4r4_native_llc0090'

Use .netrc credentials#

my_session = create_session()

Alternative token approach#

session_extra = {"token": "YourToken"}

# initialize a requests.session object with the token headers. All handled by pydap.
my_session = create_session(session_kwargs=session_extra)

Lazy access to remote data via pydap’s client API#

pydap exploits the OPeNDAP’s separation between metadata and data, to create lazy dataset objects that point to the data. These lazy objects contain all the attributes detailed in OPeNDAP’s metadata files (DMR).

ds_grid = open_url(Grid_url, session=my_session, protocol="dap4")
ds_grid.tree()
.GRID_GEOMETRY_ECCO_V4r4_native_llc0090.nc
├──dxC
├──PHrefF
├──XG
├──dyG
├──rA
├──hFacS
├──Zp1
├──Zl
├──rAw
├──dxG
├──maskW
├──YC
├──XC
├──maskS
├──YG
├──hFacC
├──drC
├──drF
├──XC_bnds
├──Zu
├──Z_bnds
├──YC_bnds
├──PHrefC
├──rAs
├──Depth
├──dyC
├──SN
├──rAz
├──maskC
├──CS
├──hFacW
├──Z
├──i
├──i_g
├──j
├──j_g
├──k
├──k_l
├──k_p1
├──k_u
├──nb
├──nv
└──tile

Note

PyDAP accesses the remote dataset’s metadata, and no data has been downloaded yet!

Download data into memory

The syntax is as follows:

# this fetches remote data into a pydap object container
pydap_Array = dataset[<VarName>][:]

where <VarName> is the name of one of the variables in the pydap.model.BaseType object. The pydap.model.BaseType is a thing wrapper of numpy arrays. The array has been downloaded into “local” memory (RAM) as (uncompressed) numpy arrays.

To extract the data as a numpy array, run

# The `.data` command allows direct access to the Numpy array (e.g. for manipulation)
pydap_Array.data
# lets download some data
Depth = ds_grid['Depth'][:]
print(type(Depth))
<class 'pydap.model.BaseType'>
Depth.attributes
{'_FillValue': 9.969209968e+36,
 'long_name': 'model seafloor depth below ocean surface at rest',
 'units': 'm',
 'coordinate': 'XC YC',
 'coverage_content_type': 'modelResult',
 'standard_name': 'sea_floor_depth_below_geoid',
 'comment': "Model sea surface height (SSH) of 0m corresponds to an ocean surface at rest relative to the geoid. Depth corresponds to seafloor depth below geoid. Note: the MITgcm used by ECCO V4r4 implements 'partial cells' so the actual model seafloor depth may differ from the seafloor depth provided by the input bathymetry file.",
 'coordinates': 'YC XC',
 'origname': 'Depth',
 'fullnamepath': '/Depth',
 'dims': ['tile', 'j', 'i'],
 'Maps': (),
 'checksum': array([950678499], dtype=uint32)}
Depth.shape, Depth.dimensions
((13, 90, 90), ['/tile', '/j', '/i'])

Plot Depth along native grid

ECCO data is defined on a Cube Sphere – meaning the horizontal grid contains an extra dimension: tile or face. You can inspect the data in its native grid by plotting all horizontal data onto a grid as follows:

Variable = [Depth[i].data for i in range(13)]
clevels =  np.linspace(0, 6000, 100)
cMap = 'Greys_r'
fig, axes = plt.subplots(nrows=5, ncols=5, figsize=(8, 8), gridspec_kw={'hspace':0.01, 'wspace':0.01})
AXES = [
    axes[4, 0], axes[3, 0], axes[2, 0], axes[4, 1], axes[3, 1], axes[2, 1],
    axes[1, 1], 
    axes[1, 2], axes[1, 3], axes[1, 4], axes[0, 2], axes[0, 3], axes[0, 4],
]
for i in range(len(AXES)):
    AXES[i].contourf(Variable[i], clevels, cmap=cMap)

for ax in np.ravel(axes):
    ax.axis('off')
    plt.setp(ax.get_xticklabels(), visible=False)
    plt.setp(ax.get_yticklabels(), visible=False)

plt.show()
../_images/849a9ed14c6d8839c23705e6c44ea3e1fdab748bdc0456ebcc990fac02e6e7c7.png

Fig. 1. Depth plotted on a horizontal layout. Data on tiles 0-5 follow C-ordering, whereas data on tiles 7-13 follow F-ordering. Data on the arctic cap, is defined on a polar coordinate grid.

Plot with corrected Topology

fig, axes = plt.subplots(nrows=4, ncols=4, figsize=(8, 8), gridspec_kw={'hspace':0.01, 'wspace':0.01})
AXES_NR = [
    axes[3, 0], axes[2, 0], axes[1, 0], axes[3, 1], axes[2, 1], axes[1, 1],
]
AXES_CAP = [axes[0, 0]]
AXES_R = [
    axes[1, 2], axes[2, 2], axes[3, 2], axes[1, 3], axes[2, 3], axes[3, 3],
]
for i in range(len(AXES_NR)):
    AXES_NR[i].contourf(Variable[i], clevels, cmap=cMap)

for i in range(len(AXES_CAP)):
    AXES_CAP[i].contourf(Variable[6].transpose()[:, ::-1], clevels, cmap=cMap)

for i in range(len(AXES_R)):
    AXES_R[i].contourf(Variable[7+i].transpose()[::-1, :], clevels, cmap=cMap)

for ax in np.ravel(axes):
    ax.axis('off')
    plt.setp(ax.get_xticklabels(), visible=False)
    plt.setp(ax.get_yticklabels(), visible=False)

plt.show()
../_images/19f85cdceb8c590181f7084e3cdcab349f14bc6271c5f41aa74c4547434b4d06.png

Fig. 2. Depth plotted on a horizontal layout that approximates lat-lon display. Data on the arctic cap, however, remains on a polar coordinate grid.

pydap + xarray to aggregate multiple URLs#

OPeNDAP allows remote access via dataURLs. Each dataURL represents a variable, i.e. a piece of the whole puzzle. We can exploit xarray concatenation and parallelism to combine multiple dataURLs, and thus multiple pydap objects, into a single xarray.Dataset. Can do this because xarray has long-implemented pydap as an engine to access/open remote datasets.

A single URL

Many OPeNDAP servers can serve data in either DAP2 and DAP4 (much newer) model implementations. In pydap, you can specify which of the two implementations by replacing the beginning of the URL with either one of dap2 or dap4. Since xarray imports pydap internally as is, then xarray can recognize this behavior as well.

Below we access remote Temperature and Salinity ECCO data with xarray via (internally) pydap.

baseURL = 'dap4://opendap.earthdata.nasa.gov/providers/POCLOUD/collections/'
Temp_Salt = "ECCO%20Ocean%20Temperature%20and%20Salinity%20-%20Monthly%20Mean%20llc90%20Grid%20(Version%204%20Release%204)/granules/OCEAN_TEMPERATURE_SALINITY_mon_mean_"
year = '2017-'
month = '01'
end_ = '_ECCO_V4r4_native_llc0090'
Temp_2017 = baseURL + Temp_Salt +year  + month + end_
%%time
dataset = xr.open_dataset(Temp_2017, engine='pydap', session=my_session)
dataset
CPU times: user 115 ms, sys: 19.5 ms, total: 134 ms
Wall time: 20.5 s
<xarray.Dataset> Size: 47MB
Dimensions:    (/tile: 13, /j_g: 90, /i_g: 90, /k_p1: 51, /k_l: 50, /j: 90,
                /i: 90, /time: 1, /k: 50, /nb: 4, /k_u: 50, /nv: 2)
Coordinates: (12/13)
    XG         (/tile, /j_g, /i_g) float32 421kB ...
    Zp1        (/k_p1) float32 204B ...
    Zl         (/k_l) float32 200B ...
    YC         (/tile, /j, /i) float32 421kB ...
    XC         (/tile, /j, /i) float32 421kB ...
    YG         (/tile, /j_g, /i_g) float32 421kB ...
    ...         ...
    Zu         (/k_u) float32 200B ...
    Z_bnds     (/k, /nv) float32 400B ...
    YC_bnds    (/tile, /j, /i, /nb) float32 2MB ...
    time_bnds  (/time, /nv) datetime64[ns] 16B ...
    Z          (/k) float32 200B ...
    time       (/time) datetime64[ns] 8B ...
Dimensions without coordinates: /tile, /j_g, /i_g, /k_p1, /k_l, /j, /i, /time,
                                /k, /nb, /k_u, /nv
Data variables: (12/13)
    SALT       (/time, /k, /tile, /j, /i) float32 21MB ...
    THETA      (/time, /k, /tile, /j, /i) float32 21MB ...
    i          (/i) int32 360B ...
    i_g        (/i_g) int32 360B ...
    j          (/j) int32 360B ...
    j_g        (/j_g) int32 360B ...
    ...         ...
    k_l        (/k_l) int32 200B ...
    k_p1       (/k_p1) int32 204B ...
    k_u        (/k_u) int32 200B ...
    nb         (/nb) float32 16B ...
    nv         (/nv) float32 8B ...
    tile       (/tile) int32 52B ...
Attributes: (12/62)
    acknowledgement:                 This research was carried out by the Jet...
    author:                          Ian Fenty and Ou Wang
    cdm_data_type:                   Grid
    comment:                         Fields provided on the curvilinear lat-l...
    Conventions:                     CF-1.8, ACDD-1.3
    coordinates_comment:             Note: the global 'coordinates' attribute...
    ...                              ...
    time_coverage_duration:          P1M
    time_coverage_end:               2017-02-01T00:00:00
    time_coverage_resolution:        P1M
    time_coverage_start:             2017-01-01T00:00:00
    title:                           ECCO Ocean Temperature and Salinity - Mo...
    uuid:                            f5b7028c-4181-11eb-b7e6-0cc47a3f47b1

Using CEs to pre-select only variables of interest#

Allows the user to drop variables but ONLY after creating the dataset of the entire remote URL. With OPeNDAP servers, the you encode the names of the variables you want to access in the URL. THis can significantly speed up analysis and exploration.

Below we manually specify the DAP4 Constraint Expression for keeping only the variables THETA, along with its dimensions time, k, time, j, i and tile.

You can also construct the full URL with constraint expressions interactively, by selecting manually the variables on the datasets’s Data Request Form and selecting Copy (Raw) Data URL.

CE = "?dap4.ce=/THETA;/i;/time;/j;/k"
Temp_2017 = baseURL + Temp_Salt +year  + month + end_ + CE
%%time
dataset = xr.open_dataset(Temp_2017, engine='pydap', session=my_session, decode_times=False)
dataset
CPU times: user 17.4 ms, sys: 1.82 ms, total: 19.3 ms
Wall time: 2.76 s
<xarray.Dataset> Size: 21MB
Dimensions:  (/time: 1, /k: 50, /tile: 13, /j: 90, /i: 90)
Coordinates:
    time     (/time) int32 4B ...
Dimensions without coordinates: /time, /k, /tile, /j, /i
Data variables:
    THETA    (/time, /k, /tile, /j, /i) float32 21MB ...
    i        (/i) int32 360B ...
    j        (/j) int32 360B ...
    k        (/k) int32 200B ...
Attributes: (12/62)
    acknowledgement:                 This research was carried out by the Jet...
    author:                          Ian Fenty and Ou Wang
    cdm_data_type:                   Grid
    comment:                         Fields provided on the curvilinear lat-l...
    Conventions:                     CF-1.8, ACDD-1.3
    coordinates_comment:             Note: the global 'coordinates' attribute...
    ...                              ...
    time_coverage_duration:          P1M
    time_coverage_end:               2017-02-01T00:00:00
    time_coverage_resolution:        P1M
    time_coverage_start:             2017-01-01T00:00:00
    title:                           ECCO Ocean Temperature and Salinity - Mo...
    uuid:                            f5b7028c-4181-11eb-b7e6-0cc47a3f47b1

Multiple files in parallel#

We can exploit xarray’s intuitive concat + merge capabilities to read multiple data URL in parallel. To accomplish that, we need to create a list of all relevant URLs that each point to an OPeNDAP dataset.

Tip

In this case we will use a session that caches the response. It can handle the authentication too!

cache_session = create_session(use_cache=True) # caching the session
cache_session.cache.clear() # clear all cache to demonstrate the behavior. You do not have to do this!
%%time
Temp_2017 = [baseURL + Temp_Salt + year + f'{i:02}' + end_ + CE for i in range(1, 13)]

theta_salt_ds = xr.open_mfdataset(
    Temp_2017, 
    engine='pydap',
    session=cache_session, 
    parallel=True, 
    combine='nested', 
    concat_dim='/time',
    decode_times=False,
)
CPU times: user 310 ms, sys: 132 ms, total: 442 ms
Wall time: 6.37 s

Any subsequent try should take milliseconds!#

Note

By default, a cache_session will store the metadata associated with this URLs (or any dap response downloaded) for 1 day, persisting your notebook. You can modify this by passing extra arguments to the create_session function.

%%time
Temp_2017 = [baseURL + Temp_Salt + year + f'{i:02}' + end_ + CE for i in range(1, 13)]

theta_salt_ds = xr.open_mfdataset(
    Temp_2017, 
    engine='pydap',
    session=cache_session, 
    parallel=True, 
    combine='nested', 
    concat_dim='/time',
    decode_times=False,
)
theta_salt_ds
CPU times: user 37.8 ms, sys: 3.28 ms, total: 41.1 ms
Wall time: 38.6 ms
<xarray.Dataset> Size: 253MB
Dimensions:  (/time: 12, /k: 50, /tile: 13, /j: 90, /i: 90)
Coordinates:
    time     (/time) int32 48B dask.array<chunksize=(1,), meta=np.ndarray>
Dimensions without coordinates: /time, /k, /tile, /j, /i
Data variables:
    THETA    (/time, /k, /tile, /j, /i) float32 253MB dask.array<chunksize=(1, 50, 13, 90, 90), meta=np.ndarray>
    i        (/time, /i) int32 4kB dask.array<chunksize=(1, 90), meta=np.ndarray>
    j        (/time, /j) int32 4kB dask.array<chunksize=(1, 90), meta=np.ndarray>
    k        (/time, /k) int32 2kB dask.array<chunksize=(1, 50), meta=np.ndarray>
Attributes: (12/62)
    acknowledgement:                 This research was carried out by the Jet...
    author:                          Ian Fenty and Ou Wang
    cdm_data_type:                   Grid
    comment:                         Fields provided on the curvilinear lat-l...
    Conventions:                     CF-1.8, ACDD-1.3
    coordinates_comment:             Note: the global 'coordinates' attribute...
    ...                              ...
    time_coverage_duration:          P1M
    time_coverage_end:               2017-05-01T00:00:00
    time_coverage_resolution:        P1M
    time_coverage_start:             2017-04-01T00:00:00
    title:                           ECCO Ocean Temperature and Salinity - Mo...
    uuid:                            fa96f474-4181-11eb-bcb1-0cc47a3f47b1

Finally, we plot THETA#

Variable = [theta_salt_ds['THETA'][0, 0, i, :, :] for i in range(13)]
clevels = np.linspace(-5, 30, 100)
cMap='RdBu_r'
fig, axes = plt.subplots(nrows=4, ncols=4, figsize=(8, 8), gridspec_kw={'hspace':0.01, 'wspace':0.01})
AXES_NR = [
    axes[3, 0], axes[2, 0], axes[1, 0], axes[3, 1], axes[2, 1], axes[1, 1],
]
AXES_CAP = [axes[0, 0]]
AXES_R = [
    axes[1, 2], axes[2, 2], axes[3, 2], axes[1, 3], axes[2, 3], axes[3, 3],
]
for i in range(len(AXES_NR)):
    AXES_NR[i].contourf(Variable[i], clevels, cmap=cMap)

for i in range(len(AXES_CAP)):
    AXES_CAP[i].contourf(Variable[6].transpose()[:, ::-1], clevels, cmap=cMap)

for i in range(len(AXES_R)):
    AXES_R[i].contourf(Variable[7+i].transpose()[::-1, :], clevels, cmap=cMap)

for ax in np.ravel(axes):
    ax.axis('off')
    plt.setp(ax.get_xticklabels(), visible=False)
    plt.setp(ax.get_yticklabels(), visible=False)

plt.show()
../_images/a0b2a1760b67af5bec260ddde1e53201a3ce8f38fce240277701e0a6fbe22d71.png

Fig. 3. Surface temperature, plotted on a horizontal layout that approximates lat-lon display. Data on the arctic cap, however, remains on a polar coordinate grid.