ECCOv4 access via NASA’s EarthData in the Cloud#

This notebook demonstrates access to ECCOv4 model output. Broad information about the ECCO dataset can be found in the PODAAC website (see here).

Requirements to run this notebook

  1. Have an Earth Data Login account

  2. Have a Bearer Token.

Objectives

Use pydap’s client API to demonstrate

  • Access to NASA’s EarthData via the use of tokens. Tokens are greatly favored over username/password, since tokens avoid repeated authentication over the many redirects.

  • A workflow involving multiple OPeNDAP URLs and xarray parallelism, lazy evaluation, and plotting of Level 4 with complex Topology ECCOv4 Data via OPeNDAP.

Some variables of focus are

Author: Miguel Jimenez-Urias, ‘24

import matplotlib.pyplot as plt
import numpy as np
import requests
from pydap.client import open_url
import cartopy.crs as ccrs
import json
import xarray as xr

Access EARTHDATA#

Many of the data variables can be browsed here. Here we will work with the original data defined on the Lat-Lon-Cap (LLC90) grid.

Grid_url = 'dap4://opendap.earthdata.nasa.gov/providers/POCLOUD/collections/ECCO%20Geometry%20Parameters%20for%20the%20Lat-Lon-Cap%2090%20(llc90)%20Native%20Model%20Grid%20(Version%204%20Release%204)/granules/GRID_GEOMETRY_ECCO_V4r4_native_llc0090'

Add to session’s headers Token Authorization#

edl_token = "YourToken"

auth_hdr="Bearer " + edl_token

# pass Token Authorization to a new Session.

my_session = requests.Session()
my_session.headers={"Authorization": auth_hdr}

Lazy access to remote data via pydap’s client API#

pydap exploits the OPeNDAP’s separation between metadata and data, to create lazy dataset objects that point to the data. These lazy objects contain all the attributes detailed in OPeNDAP’s metadata files (DMR).

ds_grid = open_url(Grid_url, session=my_session, protocol="dap4")
ds_grid.tree()
.GRID_GEOMETRY_ECCO_V4r4_native_llc0090.nc
├──dxC
├──PHrefF
├──XG
├──dyG
├──rA
├──hFacS
├──Zp1
├──Zl
├──rAw
├──dxG
├──maskW
├──YC
├──XC
├──maskS
├──YG
├──hFacC
├──drC
├──drF
├──XC_bnds
├──Zu
├──Z_bnds
├──YC_bnds
├──PHrefC
├──rAs
├──Depth
├──dyC
├──SN
├──rAz
├──maskC
├──CS
├──hFacW
├──Z
├──i
├──i_g
├──j
├──j_g
├──k
├──k_l
├──k_p1
├──k_u
├──nb
├──nv
└──tile

Note

PyDAP accesses the remote dataset’s metadata, and no data has been downloaded yet!

Download data into memory

The syntax is as follows:

# this fetches remote data into a pydap object container
pydap_Array = dataset[<VarName>][:]

where <VarName> is the name of one of the variables in the pydap.model.BaseType object. The pydap.model.BaseType is a thing wrapper of numpy arrays. The array has been downloaded into “local” memory (RAM) as (uncompressed) numpy arrays.

To extract the data as a numpy array, run

# The `.data` command allows direct access to the Numpy array (e.g. for manipulation)
pydap_Array.data
# lets download some data
Depth = ds_grid['Depth'][:]
print(type(Depth))
<class 'pydap.model.BaseType'>
Depth.attributes
{'_FillValue': 9.969209968e+36,
 'long_name': 'model seafloor depth below ocean surface at rest',
 'units': 'm',
 'coordinate': 'XC YC',
 'coverage_content_type': 'modelResult',
 'standard_name': 'sea_floor_depth_below_geoid',
 'comment': "Model sea surface height (SSH) of 0m corresponds to an ocean surface at rest relative to the geoid. Depth corresponds to seafloor depth below geoid. Note: the MITgcm used by ECCO V4r4 implements 'partial cells' so the actual model seafloor depth may differ from the seafloor depth provided by the input bathymetry file.",
 'coordinates': 'YC XC',
 'origname': 'Depth',
 'fullnamepath': '/Depth',
 'Maps': (),
 'checksum': array([950678499], dtype=uint32)}
Depth.shape, Depth.dimensions
((13, 90, 90), ['/tile', '/j', '/i'])

Plot Depth along native grid

ECCO data is defined on a Cube Sphere – meaning the horizontal grid contains an extra dimension: tile or face. You can inspect the data in its native grid by plotting all horizontal data onto a grid as follows:

Variable = [Depth[i].data for i in range(13)]
clevels =  np.linspace(0, 6000, 100)
cMap = 'Greys_r'
fig, axes = plt.subplots(nrows=5, ncols=5, figsize=(8, 8), gridspec_kw={'hspace':0.01, 'wspace':0.01})
AXES = [
    axes[4, 0], axes[3, 0], axes[2, 0], axes[4, 1], axes[3, 1], axes[2, 1],
    axes[1, 1], 
    axes[1, 2], axes[1, 3], axes[1, 4], axes[0, 2], axes[0, 3], axes[0, 4],
]
for i in range(len(AXES)):
    AXES[i].contourf(Variable[i], clevels, cmap=cMap)

for ax in np.ravel(axes):
    ax.axis('off')
    plt.setp(ax.get_xticklabels(), visible=False)
    plt.setp(ax.get_yticklabels(), visible=False)

plt.show()
../_images/6fd2fd4eaf1ed69240ada7abd345b2d8191843844ace20d5c0d145e2869504d8.png

Fig. 1. Depth plotted on a horizontal layout. Data on tiles 0-5 follow C-ordering, whereas data on tiles 7-13 follow F-ordering. Data on the arctic cap, is defined on a polar coordinate grid.

Plot with corrected Topology

fig, axes = plt.subplots(nrows=4, ncols=4, figsize=(8, 8), gridspec_kw={'hspace':0.01, 'wspace':0.01})
AXES_NR = [
    axes[3, 0], axes[2, 0], axes[1, 0], axes[3, 1], axes[2, 1], axes[1, 1],
]
AXES_CAP = [axes[0, 0]]
AXES_R = [
    axes[1, 2], axes[2, 2], axes[3, 2], axes[1, 3], axes[2, 3], axes[3, 3],
]
for i in range(len(AXES_NR)):
    AXES_NR[i].contourf(Variable[i], clevels, cmap=cMap)

for i in range(len(AXES_CAP)):
    AXES_CAP[i].contourf(Variable[6].transpose()[:, ::-1], clevels, cmap=cMap)

for i in range(len(AXES_R)):
    AXES_R[i].contourf(Variable[7+i].transpose()[::-1, :], clevels, cmap=cMap)

for ax in np.ravel(axes):
    ax.axis('off')
    plt.setp(ax.get_xticklabels(), visible=False)
    plt.setp(ax.get_yticklabels(), visible=False)

plt.show()
../_images/5e0e9dd9a216e10148d7c7600f48657dc7c0ec8900be1a9b196619e0bd29ce14.png

Fig. 2. Depth plotted on a horizontal layout that approximates lat-lon display. Data on the arctic cap, however, remains on a polar coordinate grid.

pydap + xarray to aggregate multiple URLs

OPeNDAP allows remote access via dataURLs. Each dataURL represents a variable, i.e. a piece of the whole puzzle. We can exploit xarray concatenation and parallelism to combine multiple dataURLs, and thus multiple pydap objects, into a single xarray.Dataset. Can do this because xarray has long-implemented pydap as an engine to access/open remote datasets.

A single URL

Many OPeNDAP servers can serve data in either DAP2 and DAP4 (much newer) model implementations. In pydap, you can specify which of the two implementations by replacing the beginning of the URL with either one of dap2 or dap4. Since xarray imports pydap internally as is, then xarray can recognize this behavior as well.

Below we access remote Temperature and Salinity ECCO data with xarray via (internally) pydap.

baseURL = 'dap4://opendap.earthdata.nasa.gov/providers/POCLOUD/collections/'
Temp_Salt = "ECCO%20Ocean%20Temperature%20and%20Salinity%20-%20Monthly%20Mean%20llc90%20Grid%20(Version%204%20Release%204)/granules/OCEAN_TEMPERATURE_SALINITY_mon_mean_"
year = '2017-'
month = '01'
end_ = '_ECCO_V4r4_native_llc0090'
Temp_2017 = baseURL + Temp_Salt +year  + month + end_
dataset = xr.open_dataset(Temp_2017, engine='pydap', session=my_session)
dataset
<xarray.Dataset> Size: 47MB
Dimensions:    (/tile: 13, /j_g: 90, /i_g: 90, /k_p1: 51, /k_l: 50, /j: 90,
                /i: 90, /time: 1, /k: 50, /nb: 4, /k_u: 50, /nv: 2)
Coordinates: (12/13)
    XG         (/tile, /j_g, /i_g) float32 421kB ...
    Zp1        (/k_p1) float32 204B ...
    Zl         (/k_l) float32 200B ...
    YC         (/tile, /j, /i) float32 421kB ...
    XC         (/tile, /j, /i) float32 421kB ...
    YG         (/tile, /j_g, /i_g) float32 421kB ...
    ...         ...
    Zu         (/k_u) float32 200B ...
    Z_bnds     (/k, /nv) float32 400B ...
    YC_bnds    (/tile, /j, /i, /nb) float32 2MB ...
    time_bnds  (/time, /nv) datetime64[ns] 16B ...
    Z          (/k) float32 200B ...
    time       (/time) datetime64[ns] 8B ...
Dimensions without coordinates: /tile, /j_g, /i_g, /k_p1, /k_l, /j, /i, /time,
                                /k, /nb, /k_u, /nv
Data variables: (12/13)
    SALT       (/time, /k, /tile, /j, /i) float32 21MB ...
    THETA      (/time, /k, /tile, /j, /i) float32 21MB ...
    i          (/i) int32 360B ...
    i_g        (/i_g) int32 360B ...
    j          (/j) int32 360B ...
    j_g        (/j_g) int32 360B ...
    ...         ...
    k_l        (/k_l) int32 200B ...
    k_p1       (/k_p1) int32 204B ...
    k_u        (/k_u) int32 200B ...
    nb         (/nb) float32 16B ...
    nv         (/nv) float32 8B ...
    tile       (/tile) int32 52B ...
Attributes: (12/62)
    acknowledgement:                 This research was carried out by the Jet...
    author:                          Ian Fenty and Ou Wang
    cdm_data_type:                   Grid
    comment:                         Fields provided on the curvilinear lat-l...
    Conventions:                     CF-1.8, ACDD-1.3
    coordinates_comment:             Note: the global 'coordinates' attribute...
    ...                              ...
    time_coverage_duration:          P1M
    time_coverage_end:               2017-02-01T00:00:00
    time_coverage_resolution:        P1M
    time_coverage_start:             2017-01-01T00:00:00
    title:                           ECCO Ocean Temperature and Salinity - Mo...
    uuid:                            f5b7028c-4181-11eb-b7e6-0cc47a3f47b1

Add a Constraint expression to the URL to ONLY retrieve THETA#

You can do that by appending to the <base_URL> the CE syntax:

<base_URL>?dap4.ce=/<VarName>

where <VarName> is the name of the Array you want to keep

Note

When using xarray, we need to include all dimensions associated with THETA in the Constraint Expression.

CE = '?dap4.ce=/THETA;/SALT;/tile;/j;/k;/i;/time'
dataset = xr.open_dataset(Temp_2017+CE, engine='pydap', session=my_session)

Multiple files in parallel#

We can exploit xarray’s intuitive concat + merge capabilities to read multiple data URL in parallel. To accomplish that, we need to create a list of all relevant URLs that each point to an OPeNDAP dataset.

Temp_2017 = [baseURL + Temp_Salt + year + f'{i:02}' + end_ + CE for i in range(1, 13)]

theta_salt_ds = xr.open_mfdataset(
    Temp_2017, 
    engine='pydap',
    session=my_session, 
    parallel=True, 
    combine='nested', 
    concat_dim='time',
)

Finally, we plot THETA#

Variable = [dataset['THETA'][0, 0, i, :, :] for i in range(13)]
clevels = np.linspace(-5, 30, 100)
cMap='RdBu_r'
fig, axes = plt.subplots(nrows=4, ncols=4, figsize=(8, 8), gridspec_kw={'hspace':0.01, 'wspace':0.01})
AXES_NR = [
    axes[3, 0], axes[2, 0], axes[1, 0], axes[3, 1], axes[2, 1], axes[1, 1],
]
AXES_CAP = [axes[0, 0]]
AXES_R = [
    axes[1, 2], axes[2, 2], axes[3, 2], axes[1, 3], axes[2, 3], axes[3, 3],
]
for i in range(len(AXES_NR)):
    AXES_NR[i].contourf(Variable[i], clevels, cmap=cMap)

for i in range(len(AXES_CAP)):
    AXES_CAP[i].contourf(Variable[6].transpose()[:, ::-1], clevels, cmap=cMap)

for i in range(len(AXES_R)):
    AXES_R[i].contourf(Variable[7+i].transpose()[::-1, :], clevels, cmap=cMap)

for ax in np.ravel(axes):
    ax.axis('off')
    plt.setp(ax.get_xticklabels(), visible=False)
    plt.setp(ax.get_yticklabels(), visible=False)

plt.show()
../_images/604989e092b267acf686a80c7b6f81aeb1b1ec065ef6d1ee424a87cc6dba60c7.png

Fig. 3. Surface temperature, plotted on a horizontal layout that approximates lat-lon display. Data on the arctic cap, however, remains on a polar coordinate grid.