Discovering OPeNDAP URLs from NASA’s Earthdata
This tutorial demonstrates how to find OPeNDAP URLs through the Common Metadata Repository (CMR). The CMR is NASA’s Earthdata API for querying datasets available through many download and subsetting services, including OPeNDAP. The CMR API is complex and broad in scope; pydap.client.get_cmr_urls lets users query it and retrieve OPeNDAP URLs.
Requirements to run this notebook
Have an Earthdata Login account
Knowledge of the Collection Concept ID (CCID), or Digital Object Identifier (DOI) of the collection of interest.
Note
In NASA’s terminology, a collection is a dataset, whereas a granule can be thought of as an individual file; together, the granules make up the collection. The CCID and the DOI are each unique identifiers for that archived dataset.
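As a quick local sanity check before querying, the structure of a concept ID can be inspected without any network access. The sketch below assumes the commonly seen CMR layout of a type prefix (C for collection, G for granule), a numeric identifier, and a provider suffix. The helper name `parse_concept_id` is hypothetical and is not part of pydap.

```python
import re

# Assumed layout: <type prefix><digits>-<provider>, e.g. "C3620140255-OB_CLOUD".
_CONCEPT_ID = re.compile(
    r"^(?P<kind>[A-Z]+)(?P<number>\d+)-(?P<provider>[A-Za-z0-9_]+)$"
)


def parse_concept_id(concept_id):
    """Split a concept ID into its type prefix, number, and provider (hypothetical helper)."""
    match = _CONCEPT_ID.match(concept_id)
    if match is None:
        raise ValueError(f"Unexpected concept ID format: {concept_id!r}")
    return match.groupdict()


info = parse_concept_id("C3620140255-OB_CLOUD")
is_collection = info["kind"] == "C"  # "C" is assumed to mark a collection
print(info, is_collection)
```

A check like this only catches malformed strings; it does not confirm that the collection exists in the CMR.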
Objectives
Use PyDAP to discover all OPeNDAP URLs in two simple case studies
Discover all OPeNDAP URLs associated with a specific Collection Concept ID (or DOI).
Discover all OPeNDAP URLs from a collection that match a time range and spatial bounding box of interest. These parameters, among others, are widely used by the CMR (and Earthdata Search) to filter the results of a query and so narrow the search.
Author
: Miguel Jimenez-Urias, ‘25
from pydap.client import get_cmr_urls
import pydap
import datetime as dt
print("pydap version: ", pydap.__version__)
pydap version: 3.5.8
1) Discovering daily, 4km chlorophyll data from PACE (Level 3)
In this example, we are interested in retrieving ALL Granule URLs from OPeNDAP, associated with a collection from PACE. For this collection, the CMR returns various versions of the data regarding the following variable:
Gridded Chlorophyll A, Version 3.1
PACE_ccid = "C3620140255-OB_CLOUD"  # Collection Concept ID, found on the mission page for PACE
urls = get_cmr_urls(ccid=PACE_ccid, limit=1000) # limit by default = 50
urls[:10]
['https://oceandata.sci.gsfc.nasa.gov/opendap/PACE_OCI/L3SMI/2025/0701/PACE_OCI.20250701.L3m.DAY.CHL.V3_1.chlor_a.4km.NRT.nc',
'https://oceandata.sci.gsfc.nasa.gov/opendap/PACE_OCI/L3SMI/2025/0701/PACE_OCI.20250701.L3m.DAY.CHL.V3_1.chlor_a.0p1deg.NRT.nc',
'https://oceandata.sci.gsfc.nasa.gov/opendap/PACE_OCI/L3SMI/2025/0702/PACE_OCI.20250702.L3m.DAY.CHL.V3_1.chlor_a.0p1deg.NRT.nc',
'https://oceandata.sci.gsfc.nasa.gov/opendap/PACE_OCI/L3SMI/2025/0702/PACE_OCI.20250702.L3m.DAY.CHL.V3_1.chlor_a.4km.NRT.nc',
'https://oceandata.sci.gsfc.nasa.gov/opendap/PACE_OCI/L3SMI/2025/0703/PACE_OCI.20250703.L3m.DAY.CHL.V3_1.chlor_a.0p1deg.NRT.nc',
'https://oceandata.sci.gsfc.nasa.gov/opendap/PACE_OCI/L3SMI/2025/0703/PACE_OCI.20250703.L3m.DAY.CHL.V3_1.chlor_a.4km.NRT.nc',
'https://oceandata.sci.gsfc.nasa.gov/opendap/PACE_OCI/L3SMI/2025/0704/PACE_OCI.20250704.L3m.DAY.CHL.V3_1.chlor_a.0p1deg.NRT.nc',
'https://oceandata.sci.gsfc.nasa.gov/opendap/PACE_OCI/L3SMI/2025/0704/PACE_OCI.20250704.L3m.DAY.CHL.V3_1.chlor_a.4km.NRT.nc',
'https://oceandata.sci.gsfc.nasa.gov/opendap/PACE_OCI/L3SMI/2025/0705/PACE_OCI.20250705.L3m.DAY.CHL.V3_1.chlor_a.0p1deg.NRT.nc',
'https://oceandata.sci.gsfc.nasa.gov/opendap/PACE_OCI/L3SMI/2025/0705/PACE_OCI.20250705.L3m.DAY.CHL.V3_1.chlor_a.4km.NRT.nc']
Identify the granules of interest
Not all of the URLs above can be aggregated into a single dataset. They describe the same variable on different grids (the listing above mixes 4km and 0p1deg files) and, potentially, over different averaging periods. We want daily data at 4km resolution. Since this information is encoded in the URL, we can use a list comprehension to further filter the results.
pace_urls = [url for url in urls if '4km' in url and "DAY" in url]
pace_urls[:4]
['https://oceandata.sci.gsfc.nasa.gov/opendap/PACE_OCI/L3SMI/2025/0701/PACE_OCI.20250701.L3m.DAY.CHL.V3_1.chlor_a.4km.NRT.nc',
'https://oceandata.sci.gsfc.nasa.gov/opendap/PACE_OCI/L3SMI/2025/0702/PACE_OCI.20250702.L3m.DAY.CHL.V3_1.chlor_a.4km.NRT.nc',
'https://oceandata.sci.gsfc.nasa.gov/opendap/PACE_OCI/L3SMI/2025/0703/PACE_OCI.20250703.L3m.DAY.CHL.V3_1.chlor_a.4km.NRT.nc',
'https://oceandata.sci.gsfc.nasa.gov/opendap/PACE_OCI/L3SMI/2025/0704/PACE_OCI.20250704.L3m.DAY.CHL.V3_1.chlor_a.4km.NRT.nc']
print("We found ", len(pace_urls), " relevant OPeNDAP urls from PACE")
We found 94 relevant OPeNDAP urls from PACE
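Substring checks work here, but they match anywhere in the URL. As an alternative, the file name can be split into its dot-separated fields and compared field by field. The sketch below assumes the field layout seen in the listing above (date, then `L3m`, period, product, version, variable, resolution); the helper `parse_l3m_name` is hypothetical, and the three URLs are copied from the output above rather than fetched.

```python
from datetime import datetime

# Three URLs copied from the listing above, deliberately out of order.
sample_urls = [
    "https://oceandata.sci.gsfc.nasa.gov/opendap/PACE_OCI/L3SMI/2025/0702/PACE_OCI.20250702.L3m.DAY.CHL.V3_1.chlor_a.4km.NRT.nc",
    "https://oceandata.sci.gsfc.nasa.gov/opendap/PACE_OCI/L3SMI/2025/0701/PACE_OCI.20250701.L3m.DAY.CHL.V3_1.chlor_a.0p1deg.NRT.nc",
    "https://oceandata.sci.gsfc.nasa.gov/opendap/PACE_OCI/L3SMI/2025/0701/PACE_OCI.20250701.L3m.DAY.CHL.V3_1.chlor_a.4km.NRT.nc",
]


def parse_l3m_name(url):
    """Extract date, period, and resolution from an L3m file name (assumed layout)."""
    fields = url.rsplit("/", 1)[-1].split(".")
    return {
        "date": datetime.strptime(fields[1][:8], "%Y%m%d").date(),
        "period": fields[3],
        "resolution": fields[7],
    }


# Pair each URL with its parsed fields, sort by date, then keep daily 4km files.
parsed = sorted(
    ((parse_l3m_name(url), url) for url in sample_urls),
    key=lambda pair: pair[0]["date"],
)
daily_4km = [
    url
    for info, url in parsed
    if info["period"] == "DAY" and info["resolution"] == "4km"
]
print(daily_4km)
```

Sorting by the parsed date also gives a predictable order, which can help if the URLs are later aggregated along time.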
2) Accessing swath data from ECOSTRESS (Level 2 data)
In this example, we will filter the CMR result to return only granule URLs that have data in a specific area of interest.
Land Surface Temperature
Swath of data during the period of March 2025, in a bounding box defined below:
bounding_box = [-128.847656,41.112469,-107.050781,46.679594]
Note
You can use a web application, such as bbox finder, geojson, or Earthdata Search, to construct polygons. For a bounding box, the CMR API expects the following order: [West_Longitude, South_Latitude, East_Longitude, North_Latitude]
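Because the order of the four values is easy to get wrong, a small local check can catch a latitude-first list before it is sent to the CMR. This is a minimal sketch; `check_bounding_box` is a hypothetical helper, not a pydap function. It deliberately allows west to be greater than east, on the assumption that such a box is meant to cross the antimeridian.

```python
def check_bounding_box(bbox):
    """Validate a [west, south, east, north] box in degrees (hypothetical helper)."""
    west, south, east, north = bbox
    if not (-180 <= west <= 180 and -180 <= east <= 180):
        raise ValueError("Longitudes must lie within [-180, 180]")
    if not (-90 <= south <= 90 and -90 <= north <= 90):
        raise ValueError("Latitudes must lie within [-90, 90]")
    if south > north:
        raise ValueError("South latitude must not exceed north latitude")
    return [west, south, east, north]


bounding_box = check_bounding_box([-128.847656, 41.112469, -107.050781, 46.679594])
print(bounding_box)
```

A latitude-first list such as `[41.112469, -128.847656, 46.679594, -107.050781]` fails the latitude range check, since -128.847656 is not a valid latitude.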
ECOSTRESS_ccid = "C2076114664-LPCLOUD"
bounding_box = [-128.847656,41.112469,-107.050781,46.679594]
time_range = [dt.datetime(2025, 3, 1), dt.datetime(2025, 3, 31)]
urls = get_cmr_urls(ccid=ECOSTRESS_ccid, bounding_box=bounding_box, time_range=time_range, limit=500)
print("Found ", len(urls), "relevant opendap urls for ECOSTRESS data")
Found 194 relevant opendap urls for ECOSTRESS data
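One detail worth noting about the query above: `dt.datetime(2025, 3, 31)` is midnight at the start of March 31, so granules acquired later that day may fall outside the requested range. The sketch below builds a range that runs to the last second of the month. `month_time_range` is a hypothetical helper, and it assumes that `get_cmr_urls` honours the time-of-day component of the datetimes it receives.

```python
import calendar
import datetime as dt


def month_time_range(year, month):
    """Return [start, end] datetimes spanning a whole calendar month (hypothetical helper)."""
    last_day = calendar.monthrange(year, month)[1]  # number of days in the month
    start = dt.datetime(year, month, 1)
    end = dt.datetime(year, month, last_day, 23, 59, 59)
    return [start, end]


time_range = month_time_range(2025, 3)
print(time_range)
```

The resulting list has the same shape as the `time_range` argument used above, so it could be passed in its place.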
urls[:6]
['https://opendap.earthdata.nasa.gov/collections/C2076114664-LPCLOUD/granules/ECOv002_L2_LSTE_37709_001_20250301T092419_0713_01',
'https://opendap.earthdata.nasa.gov/collections/C2076114664-LPCLOUD/granules/ECOv002_L2_LSTE_37712_008_20250301T141617_0713_01',
'https://opendap.earthdata.nasa.gov/collections/C2076114664-LPCLOUD/granules/ECOv002_L2_LSTE_37723_006_20250302T070023_0713_01',
'https://opendap.earthdata.nasa.gov/collections/C2076114664-LPCLOUD/granules/ECOv002_L2_LSTE_37723_007_20250302T070115_0713_01',
'https://opendap.earthdata.nasa.gov/collections/C2076114664-LPCLOUD/granules/ECOv002_L2_LSTE_37724_005_20250302T083359_0713_01',
'https://opendap.earthdata.nasa.gov/collections/C2076114664-LPCLOUD/granules/ECOv002_L2_LSTE_37724_007_20250302T083600_0713_01']
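Each granule name in the listing above carries a timestamp of the form `YYYYMMDDTHHMMSS`, which makes it possible to group the results by acquisition date without opening any file. The following sketch assumes that pattern holds for every granule name; the helper `granule_date` is hypothetical, and the three URLs are copied from the output above.

```python
import re
from collections import defaultdict
from datetime import datetime

# Three URLs copied from the listing above.
sample_urls = [
    "https://opendap.earthdata.nasa.gov/collections/C2076114664-LPCLOUD/granules/ECOv002_L2_LSTE_37709_001_20250301T092419_0713_01",
    "https://opendap.earthdata.nasa.gov/collections/C2076114664-LPCLOUD/granules/ECOv002_L2_LSTE_37712_008_20250301T141617_0713_01",
    "https://opendap.earthdata.nasa.gov/collections/C2076114664-LPCLOUD/granules/ECOv002_L2_LSTE_37723_006_20250302T070023_0713_01",
]

_TIMESTAMP = re.compile(r"_(\d{8}T\d{6})_")


def granule_date(url):
    """Return the acquisition date embedded in a granule name (assumed pattern)."""
    name = url.rsplit("/", 1)[-1]
    match = _TIMESTAMP.search(name)
    if match is None:
        raise ValueError(f"No timestamp found in {name!r}")
    return datetime.strptime(match.group(1), "%Y%m%dT%H%M%S").date()


# Group URLs by the date parsed from each granule name.
urls_by_date = defaultdict(list)
for url in sample_urls:
    urls_by_date[granule_date(url)].append(url)

for day, day_urls in sorted(urls_by_date.items()):
    print(day, len(day_urls))
```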
Warning
The CMR returns OPeNDAP URLs for granules with data that fall inside our bounding box. However, no subsetting has taken place yet: the CMR only filters by metadata. To download only the data within the area of interest, a user still needs to subset the data inside each file. For example, one remote file may contain only a single data point inside the bounding box, whereas in another most of the data may fall inside it.
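To make the warning concrete, the sketch below computes the fraction of points that fall inside the bounding box for two granules. The coordinates are synthetic and chosen only for illustration; in practice they would come from the latitude and longitude variables of each remote file. `fraction_inside` is a hypothetical helper, and the comparison assumes a box that does not cross the antimeridian.

```python
def fraction_inside(lats, lons, bbox):
    """Fraction of (lat, lon) points inside a [west, south, east, north] box."""
    west, south, east, north = bbox
    inside = [
        south <= lat <= north and west <= lon <= east
        for lat, lon in zip(lats, lons)
    ]
    return sum(inside) / len(inside)


bounding_box = [-128.847656, 41.112469, -107.050781, 46.679594]

# Synthetic granule A: only one of four points reaches into the box.
frac_a = fraction_inside([39.0, 40.0, 40.5, 41.2], [-120.0] * 4, bounding_box)

# Synthetic granule B: three of four points lie inside the box.
frac_b = fraction_inside([42.0, 43.0, 44.0, 47.0], [-115.0] * 4, bounding_box)

print(frac_a, frac_b)
```

Both synthetic granules would match a bounding-box query, yet they contribute very different amounts of data, which is why subsetting inside each file is still needed.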