PACE#

This notebook demonstrates access to PACE Ocean Color Data. Broad information about the dataset can be found on the PACE website (see here)

Requirements to run this notebook

  1. Have an Earth Data Login account

  2. Have a Bearer Token.

Objectives

Use pydap’s client API to demonstrate

  • Access to NASA’s EarthData in the cloud via the use of tokens.

  • Access/download PACE data with an OPeNDAP URL and pydap’s client.

  • Construct a Constraint Expression.

  • Speed up the workflow by exploiting OPeNDAP’s data-proximate subsetting to access and download only subset of the original data.

Author: Miguel Jimenez-Urias, ‘24

import matplotlib.pyplot as plt
import numpy as np
import requests
from pydap.client import open_url
import cartopy.crs as ccrs
import xarray as xr

Access EARTHDATA#

The PACE OPeNDAP data catalog can be found here. Data only starts in 2024.

# slow download URL / higher resolution
url_DAP4 = "http://oceandata.sci.gsfc.nasa.gov/opendap/PACE_OCI/L3SMI/2024/0310/PACE_OCI.20240310.L3m.DAY.CHL.V2_0.chlor_a.4km.NRT.nc"

Add to session’s headers Token Authorization#

edl_token = "YourToken"

auth_hdr="Bearer " + edl_token

# pass Token Authorization to a new Session.

my_session = requests.Session()
my_session.headers={"Authorization": auth_hdr}
%%time
ds_full = open_url(url_DAP4, session=my_session, protocol='dap4')
CPU times: user 43.7 ms, sys: 4.73 ms, total: 48.4 ms
Wall time: 945 ms
ds_full.tree()
.PACE_OCI.20240310.L3m.DAY.CHL.V2_0.chlor_a.4km.NRT.nc
├──lat
├──lon
├──chlor_a
└──palette

Note

PyDAP accesses the remote dataset’s metadata, and no data has been downloaded yet!

ds_full['chlor_a'].attributes
{'long_name': 'Chlorophyll Concentration, OCI Algorithm',
 'units': 'mg m^-3',
 'standard_name': 'mass_concentration_of_chlorophyll_in_sea_water',
 '_FillValue': -32767.0,
 'valid_min': 0.00100000005,
 'valid_max': 100.0,
 'reference': 'Hu, C., Lee Z., and Franz, B.A. (2012). Chlorophyll-a algorithms for oligotrophic oceans: A novel approach based on three-band reflectance difference, J. Geophys. Res., 117, C01011, doi:10.1029/2011JC007395.',
 'display_scale': 'log',
 'display_min': 0.00999999978,
 'display_max': 20.0,
 'Maps': ('/lat', '/lon')}
print('uncompressed dataset size [GBs]: ', ds_full.nbytes / 1e9)
uncompressed dataset size [GBs]:  0.149351808
ds_full['chlor_a'].shape
(4320, 8640)
print('uncompressed dataset size [GBs]: ', ds_full['chlor_a'].nbytes / 1e9)
uncompressed dataset size [GBs]:  0.1492992

Lets download some data#

%%time
chlor_a = ds_full['chlor_a'][:]
CPU times: user 1min 9s, sys: 2min 2s, total: 3min 12s
Wall time: 3min 26s
chlor_a.attributes
{'long_name': 'Chlorophyll Concentration, OCI Algorithm',
 'units': 'mg m^-3',
 'standard_name': 'mass_concentration_of_chlorophyll_in_sea_water',
 '_FillValue': -32767.0,
 'valid_min': 0.00100000005,
 'valid_max': 100.0,
 'reference': 'Hu, C., Lee Z., and Franz, B.A. (2012). Chlorophyll-a algorithms for oligotrophic oceans: A novel approach based on three-band reflectance difference, J. Geophys. Res., 117, C01011, doi:10.1029/2011JC007395.',
 'display_scale': 'log',
 'display_min': 0.00999999978,
 'display_max': 20.0,
 'Maps': ('/lat', '/lon'),
 'checksum': array([221433038], dtype=uint32)}
chlor_a.nbytes/1e9
0.1492992

Decoding data values:

xarray decodes time and spatial values internally by default, everytime one accesses the data values, whereas currently there is no such method within pydap to do so. But it is often useful to understand how this works internally, and what type of parameters are used for decoding. Because OPeNDAP is based on the NetCDF data model, it if a CF-compliant software. Below are some of the most used metadata attributes associated for decoding data:

CF - Conventions

In OPeNDAP’s metadata rich datasets, each contains standard attributes used to describe missing data, units in which the data is presented, and any stretching/scaling of the values.

  • standard name

  • units

  • _FillValue

  • scale_factor

  • off_set

Below is a simple function that decodes the spatial values within the array:

def decode(variable) -> np.ndarray:
    """Decodes the variable BaseType according with atributes:
        _FillValue
        scale_factor

    Parameters:
        variable: BaseType (pydap model)
    """
    scale_factor = 1
    _Fillvalue = None

    if 'scale_factor' in variable.attributes:
        scale_factor = variable.scale_factor
    if '_FillValue' in variable.attributes:
        data = np.where(variable.data == variable._FillValue, np.nan, variable.data)    
    else:
        data = variable.data
    return scale_factor * data
CHLOR_A = decode(chlor_a)
CHLOR_A.shape
(4320, 8640)
ds_full['lon'].shape
(8640,)
Lon, Lat = np.meshgrid(decode(ds_full['lon'][:]), decode(ds_full['lat'][:]))
plt.figure(figsize=(25, 8))
ax = plt.axes(projection=ccrs.PlateCarree())
ax.set_global()
ax.coastlines()
plt.contourf(Lon[::4], Lat[::4], np.log(CHLOR_A[::4]), 400, cmap='nipy_spectral')
plt.colorbar().set_label(chlor_a.name + ' ['+chlor_a.units+']');
../_images/53d4916b62fb68c3dd73984c08747283cc7834e1d120650b7b77a4d2b0c153ed.png

Fig 1. Chlorophyll A concentration for the entire dataset.

Subsetting data#

We can use the figure above to find a region of interest, in index space. Then we can combine xarray and OPeNDAP's Hyrax to access only the data we want.

  • NorthAtlantic

plt.figure(figsize=(25, 8))
ax = plt.axes(projection=ccrs.PlateCarree())
ax.set_global()
ax.coastlines()
plt.contourf(Lon[500:2000,2000:4500], Lat[500:2000,2000:4500], np.log(CHLOR_A[500:2000,2000:4500]), 400, cmap='nipy_spectral')
plt.colorbar().set_label(chlor_a.name + ' ['+chlor_a.units+']');
../_images/7628bc3e36ff0a7eff599c0a2ab96ce7d747463a45045984c47cb5f7c6dc0399.png

Fig 2. An approach to visually subset the global Chlorophyll A concentration. This approach allows to extract index values to construct the Constraint Expression and add it to the URL.

Adding a Constraint Expression to URL#

Note

In OPeNDAP’s Constraint Expressions, hyper_slices define index range as follows: start:step:stop, and these include the last index (stop).

You can pass the index range to the URL, in the form of a Constraint Expression (CE). The syntax for CEs in DAP4 is:

url + "?dap4.ce=/lat[500:1:2000];/lon[2000:1:4500];/chlor_a[500:1:2000][2000:1:4500]"
url = "dap4://oceandata.sci.gsfc.nasa.gov/opendap/PACE_OCI/L3SMI/2024/0310/PACE_OCI.20240310.L3m.DAY.CHL.V2_0.chlor_a.4km.NRT.nc?dap4.ce=/lat[500:1:2000];/lon[2000:1:4500];/chlor_a[500:1:2000][2000:1:4500]"
ds = open_url(url, session=my_session)
ds.tree()
.PACE_OCI.20240310.L3m.DAY.CHL.V2_0.chlor_a.4km.NRT.nc
├──lat
├──lon
└──chlor_a
Lon, Lat = np.meshgrid(decode(ds['lon'][:]), decode(ds['lat'][:]))
%%time
chlor_a = ds['chlor_a'][:]
CPU times: user 749 ms, sys: 614 ms, total: 1.36 s
Wall time: 3.08 s

Downloading into memory is now 60 times faster !

CHLOR_A = decode(chlor_a)
CHLOR_A.shape
(1501, 2501)
plt.figure(figsize=(25, 8))
ax = plt.axes(projection=ccrs.PlateCarree())
ax.coastlines()
plt.contourf(Lon, Lat, np.log(CHLOR_A), 400, cmap='nipy_spectral')
plt.colorbar().set_label(chlor_a.name + ' ['+chlor_a.units+']')
plt.show()
../_images/2467d4aaa2b361921634ec9c77e37e0228e876b1412a6244cd3dd408e7b0b0d9.png

Fig 3. Subset the global Chlorophyll A concentration. Only this data was requested and downloaded. The OPeNDAP server, in this case Hyrax, perform all subseting in a data-proximate way.