PACE#
This notebook demonstrates access to PACE Ocean Color Data. Broad information about the dataset can be found on the PACE website (see here)
Requirements to run this notebook
Have an Earth Data Login account
Have a Bearer Token.
Objectives
Use pydap’s client API to demonstrate
Access to NASA’s EarthData in the cloud via the use of
tokens
.Access/download
PACE
data with anOPeNDAP
URL andpydap
’s client.Construct a
Constraint Expression
.Speed up the workflow by exploiting
OPeNDAP
’s data-proximate subsetting to access and download only subset of the original data.
Author
: Miguel Jimenez-Urias, ‘24
import matplotlib.pyplot as plt
import numpy as np
import requests
from pydap.client import open_url
import cartopy.crs as ccrs
import xarray as xr
Access EARTHDATA#
The PACE OPeNDAP data catalog can be found here. Data only starts in 2024.
# slow download URL / higher resolution
url_DAP4 = "http://oceandata.sci.gsfc.nasa.gov/opendap/PACE_OCI/L3SMI/2024/0310/PACE_OCI.20240310.L3m.DAY.CHL.V2_0.chlor_a.4km.NRT.nc"
Lets download some data#
%%time
chlor_a = ds_full['chlor_a'][:]
CPU times: user 1min 9s, sys: 2min 2s, total: 3min 12s
Wall time: 3min 26s
chlor_a.attributes
{'long_name': 'Chlorophyll Concentration, OCI Algorithm',
'units': 'mg m^-3',
'standard_name': 'mass_concentration_of_chlorophyll_in_sea_water',
'_FillValue': -32767.0,
'valid_min': 0.00100000005,
'valid_max': 100.0,
'reference': 'Hu, C., Lee Z., and Franz, B.A. (2012). Chlorophyll-a algorithms for oligotrophic oceans: A novel approach based on three-band reflectance difference, J. Geophys. Res., 117, C01011, doi:10.1029/2011JC007395.',
'display_scale': 'log',
'display_min': 0.00999999978,
'display_max': 20.0,
'Maps': ('/lat', '/lon'),
'checksum': array([221433038], dtype=uint32)}
chlor_a.nbytes/1e9
0.1492992
Decoding data values:
xarray
decodes time and spatial values internally by default, everytime one accesses the data values, whereas currently there is no such method within pydap
to do so. But it is often useful to understand how this works internally, and what type of parameters are used for decoding. Because OPeNDAP is based on the NetCDF data model, it if a CF-compliant software. Below are some of the most used metadata attributes associated for decoding data:
CF - Conventions
In OPeNDAP’s metadata rich datasets, each contains standard attributes used to describe missing data, units in which the data is presented, and any stretching/scaling of the values.
standard name
units
_FillValue
scale_factor
off_set
Below is a simple function that decodes the spatial values within the array:
def decode(variable) -> np.ndarray:
"""Decodes the variable BaseType according with atributes:
_FillValue
scale_factor
Parameters:
variable: BaseType (pydap model)
"""
scale_factor = 1
_Fillvalue = None
if 'scale_factor' in variable.attributes:
scale_factor = variable.scale_factor
if '_FillValue' in variable.attributes:
data = np.where(variable.data == variable._FillValue, np.nan, variable.data)
else:
data = variable.data
return scale_factor * data
CHLOR_A = decode(chlor_a)
CHLOR_A.shape
(4320, 8640)
ds_full['lon'].shape
(8640,)
Lon, Lat = np.meshgrid(decode(ds_full['lon'][:]), decode(ds_full['lat'][:]))
plt.figure(figsize=(25, 8))
ax = plt.axes(projection=ccrs.PlateCarree())
ax.set_global()
ax.coastlines()
plt.contourf(Lon[::4], Lat[::4], np.log(CHLOR_A[::4]), 400, cmap='nipy_spectral')
plt.colorbar().set_label(chlor_a.name + ' ['+chlor_a.units+']');
Fig 1. Chlorophyll A concentration for the entire dataset.
Subsetting data#
We can use the figure above to find a region of interest, in index space. Then we can combine xarray
and OPeNDAP's Hyrax
to access only the data we want.
NorthAtlantic
plt.figure(figsize=(25, 8))
ax = plt.axes(projection=ccrs.PlateCarree())
ax.set_global()
ax.coastlines()
plt.contourf(Lon[500:2000,2000:4500], Lat[500:2000,2000:4500], np.log(CHLOR_A[500:2000,2000:4500]), 400, cmap='nipy_spectral')
plt.colorbar().set_label(chlor_a.name + ' ['+chlor_a.units+']');
Fig 2. An approach to visually subset the global Chlorophyll A concentration. This approach allows to extract index values to construct the Constraint Expression and add it to the URL.
Adding a Constraint Expression to URL#
Note
In OPeNDAP
’s Constraint Expressions
, hyper_slices
define index range as follows: start:step:stop
, and these include the last index (stop
).
You can pass the index range to the URL, in the form of a Constraint Expression (CE)
. The syntax for CE
s in DAP4
is:
url + "?dap4.ce=/lat[500:1:2000];/lon[2000:1:4500];/chlor_a[500:1:2000][2000:1:4500]"
url = "dap4://oceandata.sci.gsfc.nasa.gov/opendap/PACE_OCI/L3SMI/2024/0310/PACE_OCI.20240310.L3m.DAY.CHL.V2_0.chlor_a.4km.NRT.nc?dap4.ce=/lat[500:1:2000];/lon[2000:1:4500];/chlor_a[500:1:2000][2000:1:4500]"
ds = open_url(url, session=my_session)
ds.tree()
.PACE_OCI.20240310.L3m.DAY.CHL.V2_0.chlor_a.4km.NRT.nc
├──lat
├──lon
└──chlor_a
Lon, Lat = np.meshgrid(decode(ds['lon'][:]), decode(ds['lat'][:]))
%%time
chlor_a = ds['chlor_a'][:]
CPU times: user 749 ms, sys: 614 ms, total: 1.36 s
Wall time: 3.08 s
Downloading into memory is now 60 times faster !
CHLOR_A = decode(chlor_a)
CHLOR_A.shape
(1501, 2501)
plt.figure(figsize=(25, 8))
ax = plt.axes(projection=ccrs.PlateCarree())
ax.coastlines()
plt.contourf(Lon, Lat, np.log(CHLOR_A), 400, cmap='nipy_spectral')
plt.colorbar().set_label(chlor_a.name + ' ['+chlor_a.units+']')
plt.show()
Fig 3. Subset the global Chlorophyll A concentration. Only this data was requested and downloaded. The OPeNDAP server, in this case Hyrax, perform all subseting in a data-proximate way.