ECCOv4 access via NASA’s EarthData in the Cloud#
This notebook demonstrates access to ECCOv4 model output. Broad information about the ECCO dataset can be found in the PODAAC website (see here).
Requirements to run this notebook
Have an Earth Data Login account
Have a Bearer Token.
Objectives
Use pydap’s client API to demonstrate
Access to NASA’s EarthData via the use of
tokens
.Tokens
are greatly favored overusername/password
, sincetokens
avoid repeated authentication over the many redirects.A workflow involving multiple OPeNDAP URLs and xarray parallelism, lazy evaluation, and plotting of Level 4 with complex Topology ECCOv4 Data via OPeNDAP.
Some variables of focus are
Author
: Miguel Jimenez-Urias, ‘24
import matplotlib.pyplot as plt
import numpy as np
from pydap.net import create_session
from pydap.client import open_url
import xarray as xr
Access EARTHDATA#
Many of the data variables can be browsed here. Here we will work with the original data defined on the Lat-Lon-Cap (LLC90) grid.
Grid_url = 'dap4://opendap.earthdata.nasa.gov/providers/POCLOUD/collections/ECCO%20Geometry%20Parameters%20for%20the%20Lat-Lon-Cap%2090%20(llc90)%20Native%20Model%20Grid%20(Version%204%20Release%204)/granules/GRID_GEOMETRY_ECCO_V4r4_native_llc0090'
Use .netrc
credentials#
my_session = create_session()
Alternative token approach#
session_extra = {"token": "YourToken"}
# initialize a requests.session object with the token headers. All handled by pydap.
my_session = create_session(session_kwargs=session_extra)
Lazy access to remote data via pydap’s client API#
pydap
exploits the OPeNDAP’s separation between metadata and data, to create lazy dataset objects that point to the data
. These lazy objects contain all the attributes detailed in OPeNDAP’s metadata files (DMR).
ds_grid = open_url(Grid_url, session=my_session, protocol="dap4")
ds_grid.tree()
.GRID_GEOMETRY_ECCO_V4r4_native_llc0090.nc
├──dxC
├──PHrefF
├──XG
├──dyG
├──rA
├──hFacS
├──Zp1
├──Zl
├──rAw
├──dxG
├──maskW
├──YC
├──XC
├──maskS
├──YG
├──hFacC
├──drC
├──drF
├──XC_bnds
├──Zu
├──Z_bnds
├──YC_bnds
├──PHrefC
├──rAs
├──Depth
├──dyC
├──SN
├──rAz
├──maskC
├──CS
├──hFacW
├──Z
├──i
├──i_g
├──j
├──j_g
├──k
├──k_l
├──k_p1
├──k_u
├──nb
├──nv
└──tile
Note
PyDAP accesses the remote dataset’s metadata, and no data has been downloaded yet!
Download data into memory
The syntax is as follows:
# this fetches remote data into a pydap object container
pydap_Array = dataset[<VarName>][:]
where <VarName>
is the name of one of the variables in the pydap.model.BaseType
object. The pydap.model.BaseType
is a thing wrapper of numpy arrays
. The array has been downloaded into “local” memory (RAM) as (uncompressed) numpy arrays.
To extract the data as a numpy array, run
# The `.data` command allows direct access to the Numpy array (e.g. for manipulation)
pydap_Array.data
# lets download some data
Depth = ds_grid['Depth'][:]
print(type(Depth))
<class 'pydap.model.BaseType'>
Depth.attributes
{'_FillValue': 9.969209968e+36,
'long_name': 'model seafloor depth below ocean surface at rest',
'units': 'm',
'coordinate': 'XC YC',
'coverage_content_type': 'modelResult',
'standard_name': 'sea_floor_depth_below_geoid',
'comment': "Model sea surface height (SSH) of 0m corresponds to an ocean surface at rest relative to the geoid. Depth corresponds to seafloor depth below geoid. Note: the MITgcm used by ECCO V4r4 implements 'partial cells' so the actual model seafloor depth may differ from the seafloor depth provided by the input bathymetry file.",
'coordinates': 'YC XC',
'origname': 'Depth',
'fullnamepath': '/Depth',
'dims': ['tile', 'j', 'i'],
'Maps': (),
'checksum': array([950678499], dtype=uint32)}
Depth.shape, Depth.dimensions
((13, 90, 90), ['/tile', '/j', '/i'])
Plot Depth along native grid
ECCO
data is defined on a Cube Sphere – meaning the horizontal grid contains an extra
dimension: tile
or face
. You can inspect the data in its native grid by plotting all horizontal data onto a grid as follows:
Variable = [Depth[i].data for i in range(13)]
clevels = np.linspace(0, 6000, 100)
cMap = 'Greys_r'
fig, axes = plt.subplots(nrows=5, ncols=5, figsize=(8, 8), gridspec_kw={'hspace':0.01, 'wspace':0.01})
AXES = [
axes[4, 0], axes[3, 0], axes[2, 0], axes[4, 1], axes[3, 1], axes[2, 1],
axes[1, 1],
axes[1, 2], axes[1, 3], axes[1, 4], axes[0, 2], axes[0, 3], axes[0, 4],
]
for i in range(len(AXES)):
AXES[i].contourf(Variable[i], clevels, cmap=cMap)
for ax in np.ravel(axes):
ax.axis('off')
plt.setp(ax.get_xticklabels(), visible=False)
plt.setp(ax.get_yticklabels(), visible=False)
plt.show()

Fig. 1. Depth
plotted on a horizontal layout. Data on tiles 0-5
follow C-ordering
, whereas data on tiles 7-13
follow F-ordering
. Data on the arctic cap
, is defined on a polar coordinate grid.
Plot with corrected Topology
fig, axes = plt.subplots(nrows=4, ncols=4, figsize=(8, 8), gridspec_kw={'hspace':0.01, 'wspace':0.01})
AXES_NR = [
axes[3, 0], axes[2, 0], axes[1, 0], axes[3, 1], axes[2, 1], axes[1, 1],
]
AXES_CAP = [axes[0, 0]]
AXES_R = [
axes[1, 2], axes[2, 2], axes[3, 2], axes[1, 3], axes[2, 3], axes[3, 3],
]
for i in range(len(AXES_NR)):
AXES_NR[i].contourf(Variable[i], clevels, cmap=cMap)
for i in range(len(AXES_CAP)):
AXES_CAP[i].contourf(Variable[6].transpose()[:, ::-1], clevels, cmap=cMap)
for i in range(len(AXES_R)):
AXES_R[i].contourf(Variable[7+i].transpose()[::-1, :], clevels, cmap=cMap)
for ax in np.ravel(axes):
ax.axis('off')
plt.setp(ax.get_xticklabels(), visible=False)
plt.setp(ax.get_yticklabels(), visible=False)
plt.show()

Fig. 2. Depth
plotted on a horizontal layout that approximates lat-lon
display. Data on the arctic cap
, however, remains on a polar coordinate grid.
pydap + xarray to aggregate multiple URLs#
OPeNDAP
allows remote access via dataURL
s. Each dataURL
represents a variable, i.e. a piece of the whole puzzle. We can exploit xarray
concatenation and parallelism to combine multiple dataURL
s, and thus multiple pydap objects, into a single xarray.Dataset
. Can do this because xarray has long-implemented pydap as an engine to access/open remote datasets
.
A single URL
Many OPeNDAP
servers can serve data in either DAP2
and DAP4
(much newer) model implementations. In pydap
, you can specify which of the two implementations by replacing the beginning of the URL with either one of dap2
or dap4
. Since xarray
imports pydap internally as is, then xarray can recognize this behavior as well.
Below we access remote Temperature and Salinity ECCO data with xarray
via (internally) pydap
.
baseURL = 'dap4://opendap.earthdata.nasa.gov/providers/POCLOUD/collections/'
Temp_Salt = "ECCO%20Ocean%20Temperature%20and%20Salinity%20-%20Monthly%20Mean%20llc90%20Grid%20(Version%204%20Release%204)/granules/OCEAN_TEMPERATURE_SALINITY_mon_mean_"
year = '2017-'
month = '01'
end_ = '_ECCO_V4r4_native_llc0090'
Temp_2017 = baseURL + Temp_Salt +year + month + end_
%%time
dataset = xr.open_dataset(Temp_2017, engine='pydap', session=my_session)
dataset
CPU times: user 115 ms, sys: 19.5 ms, total: 134 ms
Wall time: 20.5 s
<xarray.Dataset> Size: 47MB Dimensions: (/tile: 13, /j_g: 90, /i_g: 90, /k_p1: 51, /k_l: 50, /j: 90, /i: 90, /time: 1, /k: 50, /nb: 4, /k_u: 50, /nv: 2) Coordinates: (12/13) XG (/tile, /j_g, /i_g) float32 421kB ... Zp1 (/k_p1) float32 204B ... Zl (/k_l) float32 200B ... YC (/tile, /j, /i) float32 421kB ... XC (/tile, /j, /i) float32 421kB ... YG (/tile, /j_g, /i_g) float32 421kB ... ... ... Zu (/k_u) float32 200B ... Z_bnds (/k, /nv) float32 400B ... YC_bnds (/tile, /j, /i, /nb) float32 2MB ... time_bnds (/time, /nv) datetime64[ns] 16B ... Z (/k) float32 200B ... time (/time) datetime64[ns] 8B ... Dimensions without coordinates: /tile, /j_g, /i_g, /k_p1, /k_l, /j, /i, /time, /k, /nb, /k_u, /nv Data variables: (12/13) SALT (/time, /k, /tile, /j, /i) float32 21MB ... THETA (/time, /k, /tile, /j, /i) float32 21MB ... i (/i) int32 360B ... i_g (/i_g) int32 360B ... j (/j) int32 360B ... j_g (/j_g) int32 360B ... ... ... k_l (/k_l) int32 200B ... k_p1 (/k_p1) int32 204B ... k_u (/k_u) int32 200B ... nb (/nb) float32 16B ... nv (/nv) float32 8B ... tile (/tile) int32 52B ... Attributes: (12/62) acknowledgement: This research was carried out by the Jet... author: Ian Fenty and Ou Wang cdm_data_type: Grid comment: Fields provided on the curvilinear lat-l... Conventions: CF-1.8, ACDD-1.3 coordinates_comment: Note: the global 'coordinates' attribute... ... ... time_coverage_duration: P1M time_coverage_end: 2017-02-01T00:00:00 time_coverage_resolution: P1M time_coverage_start: 2017-01-01T00:00:00 title: ECCO Ocean Temperature and Salinity - Mo... uuid: f5b7028c-4181-11eb-b7e6-0cc47a3f47b1
Using CEs to pre-select only variables of interest#
Allows the user to drop variables but ONLY after creating the dataset of the entire remote URL. With OPeNDAP servers, the you encode the names of the variables you want to access in the URL. THis can significantly speed up analysis and exploration.
Below we manually specify the DAP4 Constraint Expression for keeping only the variables THETA
, along with its dimensions time
, k
, time
, j
, i
and tile
.
You can also construct the full URL with constraint expressions interactively, by selecting manually the variables on the datasets’s Data Request Form and selecting Copy (Raw) Data URL.
CE = "?dap4.ce=/THETA;/i;/time;/j;/k"
Temp_2017 = baseURL + Temp_Salt +year + month + end_ + CE
%%time
dataset = xr.open_dataset(Temp_2017, engine='pydap', session=my_session, decode_times=False)
dataset
CPU times: user 17.4 ms, sys: 1.82 ms, total: 19.3 ms
Wall time: 2.76 s
<xarray.Dataset> Size: 21MB Dimensions: (/time: 1, /k: 50, /tile: 13, /j: 90, /i: 90) Coordinates: time (/time) int32 4B ... Dimensions without coordinates: /time, /k, /tile, /j, /i Data variables: THETA (/time, /k, /tile, /j, /i) float32 21MB ... i (/i) int32 360B ... j (/j) int32 360B ... k (/k) int32 200B ... Attributes: (12/62) acknowledgement: This research was carried out by the Jet... author: Ian Fenty and Ou Wang cdm_data_type: Grid comment: Fields provided on the curvilinear lat-l... Conventions: CF-1.8, ACDD-1.3 coordinates_comment: Note: the global 'coordinates' attribute... ... ... time_coverage_duration: P1M time_coverage_end: 2017-02-01T00:00:00 time_coverage_resolution: P1M time_coverage_start: 2017-01-01T00:00:00 title: ECCO Ocean Temperature and Salinity - Mo... uuid: f5b7028c-4181-11eb-b7e6-0cc47a3f47b1
Multiple files in parallel#
We can exploit xarray
’s intuitive concat
+ merge
capabilities to read multiple data URL in parallel. To accomplish that, we need to create a list of all relevant URLs that each point to an OPeNDAP dataset.
Tip
In this case we will use a session that caches the response. It can handle the authentication too!
cache_session = create_session(use_cache=True) # caching the session
cache_session.cache.clear() # clear all cache to demonstrate the behavior. You do not have to do this!
%%time
Temp_2017 = [baseURL + Temp_Salt + year + f'{i:02}' + end_ + CE for i in range(1, 13)]
theta_salt_ds = xr.open_mfdataset(
Temp_2017,
engine='pydap',
session=cache_session,
parallel=True,
combine='nested',
concat_dim='/time',
decode_times=False,
)
CPU times: user 310 ms, sys: 132 ms, total: 442 ms
Wall time: 6.37 s
Any subsequent try should take milliseconds!#
Note
By default, a cache_session will store the metadata associated with this URLs (or any dap response downloaded) for 1 day, persisting your notebook. You can modify this by passing extra arguments to the create_session
function.
%%time
Temp_2017 = [baseURL + Temp_Salt + year + f'{i:02}' + end_ + CE for i in range(1, 13)]
theta_salt_ds = xr.open_mfdataset(
Temp_2017,
engine='pydap',
session=cache_session,
parallel=True,
combine='nested',
concat_dim='/time',
decode_times=False,
)
theta_salt_ds
CPU times: user 37.8 ms, sys: 3.28 ms, total: 41.1 ms
Wall time: 38.6 ms
<xarray.Dataset> Size: 253MB Dimensions: (/time: 12, /k: 50, /tile: 13, /j: 90, /i: 90) Coordinates: time (/time) int32 48B dask.array<chunksize=(1,), meta=np.ndarray> Dimensions without coordinates: /time, /k, /tile, /j, /i Data variables: THETA (/time, /k, /tile, /j, /i) float32 253MB dask.array<chunksize=(1, 50, 13, 90, 90), meta=np.ndarray> i (/time, /i) int32 4kB dask.array<chunksize=(1, 90), meta=np.ndarray> j (/time, /j) int32 4kB dask.array<chunksize=(1, 90), meta=np.ndarray> k (/time, /k) int32 2kB dask.array<chunksize=(1, 50), meta=np.ndarray> Attributes: (12/62) acknowledgement: This research was carried out by the Jet... author: Ian Fenty and Ou Wang cdm_data_type: Grid comment: Fields provided on the curvilinear lat-l... Conventions: CF-1.8, ACDD-1.3 coordinates_comment: Note: the global 'coordinates' attribute... ... ... time_coverage_duration: P1M time_coverage_end: 2017-05-01T00:00:00 time_coverage_resolution: P1M time_coverage_start: 2017-04-01T00:00:00 title: ECCO Ocean Temperature and Salinity - Mo... uuid: fa96f474-4181-11eb-bcb1-0cc47a3f47b1
Finally, we plot THETA
#
Variable = [theta_salt_ds['THETA'][0, 0, i, :, :] for i in range(13)]
clevels = np.linspace(-5, 30, 100)
cMap='RdBu_r'
fig, axes = plt.subplots(nrows=4, ncols=4, figsize=(8, 8), gridspec_kw={'hspace':0.01, 'wspace':0.01})
AXES_NR = [
axes[3, 0], axes[2, 0], axes[1, 0], axes[3, 1], axes[2, 1], axes[1, 1],
]
AXES_CAP = [axes[0, 0]]
AXES_R = [
axes[1, 2], axes[2, 2], axes[3, 2], axes[1, 3], axes[2, 3], axes[3, 3],
]
for i in range(len(AXES_NR)):
AXES_NR[i].contourf(Variable[i], clevels, cmap=cMap)
for i in range(len(AXES_CAP)):
AXES_CAP[i].contourf(Variable[6].transpose()[:, ::-1], clevels, cmap=cMap)
for i in range(len(AXES_R)):
AXES_R[i].contourf(Variable[7+i].transpose()[::-1, :], clevels, cmap=cMap)
for ax in np.ravel(axes):
ax.axis('off')
plt.setp(ax.get_xticklabels(), visible=False)
plt.setp(ax.get_yticklabels(), visible=False)
plt.show()

Fig. 3. Surface temperature
, plotted on a horizontal layout that approximates lat-lon
display. Data on the arctic cap
, however, remains on a polar coordinate grid.