PyDAP as a client#

PyDAP can be used to “lazily” inspect and retrieve remote data from any of the thousands of scientific datasets available on the internet on OPeNDAP data servers. The user can manipulate a Dataset as if it were stored locally, and data is downloaded on the fly only when necessary. To transmit data from the server to the client, both must agree on a way to represent it: is it an array of integers, or a multi-dimensional grid? To this end, the DAP protocol defines a data model that, in theory, should be able to represent any existing (scientific) dataset.

Pydap uses the requests library to fetch remote data from an OPeNDAP data server. Data from such a server is one of the following types:

Note

Clicking on any of the dap or dods example URLs will trigger a download of OPeNDAP binary data. Pydap parses this binary data and turns it into a pydap Dataset.

Requests library#

As of version 3.5.4, pydap uses Python’s requests library to fetch the remote datasets described in the table above, and can also use the requests_cache library to cache responses. Pydap provides a dedicated function, create_session, to initialize either kind of session:

Session with No Cache: use_cache=False (default)

Cached Session: use_cache=True

from pydap.client import open_url
from pydap.net import create_session
data_url = "http://test.opendap.org/opendap/data/nc/coads_climatology.nc"

Use default non-cached Session#

# default
my_session = create_session()
%%time
pyds = open_url(data_url, protocol="dap4", session=my_session)
CPU times: user 17.2 ms, sys: 1.82 ms, total: 19.1 ms
Wall time: 269 ms

Let’s try again#

%%time
pyds = open_url(data_url, protocol="dap4", session=my_session)
CPU times: user 4.24 ms, sys: 1.09 ms, total: 5.32 ms
Wall time: 155 ms

What is happening?#

In both cases, only the dmr associated with the remote dataset was fetched, and used to create the pydap dataset.

The apparent difference in timing can sometimes be attributed to what is called “cold reading” vs “warm reading”. But in both scenarios, each time pyds is created, the remote dmr is fetched and processed by pydap to create the lazy dataset that points to the original OPeNDAP source.
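To make concrete which request is being repeated, the sketch below derives the metadata URL from the data URL. It assumes the common OPeNDAP convention of appending a suffix to the dataset URL (.dmr for DAP4; .dds and .das for DAP2). The name metadata_urls is a hypothetical helper for illustration only and is not part of pydap.

```python
from urllib.parse import urlsplit, urlunsplit

# Assumed suffix convention for metadata responses, per protocol.
METADATA_SUFFIXES = {"dap4": (".dmr",), "dap2": (".dds", ".das")}


def metadata_urls(data_url, protocol="dap4"):
    """Return the metadata URL(s) a client would request for `data_url`.

    Hypothetical helper, not a pydap function. The suffix is appended to
    the path so that any query string in the URL is left untouched.
    """
    parts = urlsplit(data_url)
    return [
        urlunsplit(parts._replace(path=parts.path + suffix))
        for suffix in METADATA_SUFFIXES[protocol]
    ]


data_url = "http://test.opendap.org/opendap/data/nc/coads_climatology.nc"
dmr_urls = metadata_urls(data_url, protocol="dap4")
```

With the non-cached session, a URL of this form is requested on every call to open_url.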

To avoid downloading the same resource over and over, potentially overwhelming remote data servers, pydap can now cache responses.

Use Cached-Session#

# Non-default
cached_session = create_session(use_cache=True)

Clear any previous cached session#

cached_session.cache.clear()
%%time
new_pyds = open_url(data_url, protocol="dap4", session=cached_session)
CPU times: user 6.62 ms, sys: 3.62 ms, total: 10.2 ms
Wall time: 191 ms

The time required to download the remote dmr from the same server remains close to that of the warm case.

Now let’s try again!#

%%time
new_pyds = open_url(data_url, protocol="dap4", session=cached_session)
CPU times: user 2.29 ms, sys: 432 μs, total: 2.72 ms
Wall time: 2.58 ms

The timing has dropped significantly. This is because the dmr was never downloaded from the remote source. Instead, it was fetched from the cache.

print("Default location of cached response: ", cached_session.cache.db_path)
Default location of cached response:  /var/folders/hc/tkfpclz952n091r0k5b2t9jr0000gn/T/http_cache.sqlite
print("URLs of cached responses: ", cached_session.cache.urls())
URLs of cached responses:  ['http://test.opendap.org/opendap/data/nc/coads_climatology.nc.dmr']
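The behaviour above can be imitated offline. The following is a minimal sketch of URL-keyed response caching, with a deliberately slow function standing in for the HTTP request. CachingFetcher and slow_fetch are hypothetical names, and this is not how requests_cache is implemented (it persists responses in a SQLite file, as the path above shows). It only illustrates why the second request is so much faster.

```python
import time


class CachingFetcher:
    """Toy in-memory response cache keyed by URL (illustration only)."""

    def __init__(self, fetch):
        self._fetch = fetch        # function that performs the "remote" request
        self._responses = {}       # url -> cached response body
        self.remote_calls = 0      # how many times the remote was contacted

    def get(self, url):
        # Contact the remote only if this URL has not been seen before.
        if url not in self._responses:
            self.remote_calls += 1
            self._responses[url] = self._fetch(url)
        return self._responses[url]

    def urls(self):
        return list(self._responses)

    def clear(self):
        self._responses.clear()


def slow_fetch(url):
    time.sleep(0.05)               # stand-in for network latency
    return b"<Dataset/>"           # placeholder body, not a real dmr


fetcher = CachingFetcher(slow_fetch)
dmr_url = "http://test.opendap.org/opendap/data/nc/coads_climatology.nc.dmr"

first = fetcher.get(dmr_url)       # goes to the (fake) remote
second = fetcher.get(dmr_url)      # served from the cache
```

After both calls, fetcher.remote_calls is 1: the second request never reached the fake remote.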

Finally, let’s clear the cache#

cached_session.cache.clear()
print("URLs of cached responses: ", cached_session.cache.urls())
URLs of cached responses:  []
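If the cache should always be emptied before and after a block of work, the clearing step can be wrapped in a context manager. This is a sketch under one assumption only: that the session exposes cache.clear(), as used above. temporary_cache is a hypothetical helper, and a stand-in object replaces the real session so the snippet runs without pydap or a network connection.

```python
from contextlib import contextmanager
from types import SimpleNamespace


@contextmanager
def temporary_cache(session):
    """Yield `session`, clearing its response cache on entry and on exit.

    Hypothetical helper: it only assumes `session.cache.clear()` exists.
    """
    session.cache.clear()
    try:
        yield session
    finally:
        session.cache.clear()      # runs even if the body raises


# Offline stand-in for a cached session, holding one stale entry.
stored = {"http://example.org/stale.nc.dmr": b"..."}
fake_session = SimpleNamespace(
    cache=SimpleNamespace(clear=stored.clear, urls=lambda: list(stored))
)

with temporary_cache(fake_session) as session:
    # Pretend a response was cached while the block was running.
    stored["http://example.org/data.nc.dmr"] = b"..."
    urls_inside = session.cache.urls()

urls_after = fake_session.cache.urls()
```

With pydap, the same pattern would presumably take the form `with temporary_cache(create_session(use_cache=True)) as session:` followed by `open_url(data_url, protocol="dap4", session=session)`.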

Timeout#

To specify a timeout for the client, set the desired number of seconds using the timeout option of open_url(...). For example, the following would time out after 30 seconds without receiving a response from the server:

dataset = open_url('http://test.opendap.org/dap/data/nc/coads_climatology.nc', timeout=30)

Note

The default timeout is 120 seconds, or 2 minutes.
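The snippet below does not use pydap. It is a standard-library sketch of what a client-side timeout means in general: a local socket accepts the connection but never answers, and the client gives up after the chosen number of seconds. The port, the URL path, and the 0.5 second value are arbitrary choices for the demonstration. Which exception pydap itself raises on a timeout is not covered here.

```python
import socket
import time
import urllib.error
import urllib.request

# A local socket that lets clients connect but never sends a response,
# standing in for an unresponsive data server.
silent_server = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
silent_server.bind(("127.0.0.1", 0))      # port 0: let the OS pick a free port
silent_server.listen(1)
port = silent_server.getsockname()[1]

# Ignore any proxy configured in the environment for this local request.
opener = urllib.request.build_opener(urllib.request.ProxyHandler({}))

started = time.monotonic()
try:
    opener.open(f"http://127.0.0.1:{port}/data.nc.dmr", timeout=0.5)
    timed_out = False
except (socket.timeout, urllib.error.URLError):
    # No response arrived within the timeout, so the client gave up.
    timed_out = True
finally:
    elapsed = time.monotonic() - started
    silent_server.close()
```

Here the request is abandoned after roughly half a second instead of hanging indefinitely, which is the same role the timeout option plays for open_url.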