Basic Usage
Using SlideRule
There are four steps to using SlideRule:
- Import the client package
- Optionally configure the client
- Define the request parameters
- Issue the processing request
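The four steps can be sketched as a single script. This is a minimal sketch, not official client code: the endpoint, parameter, and granule come from examples later in this guide, `build_request` and `main` are hypothetical names, and the client is imported inside `main()` so that the helper can be used without the client installed.

```python
def build_request():
    """Step 3: define the request parameters as a plain Python dictionary.

    The endpoint name, parameter, and granule are taken from this guide's
    examples; substitute your own values.
    """
    endpoint = "atl03x"
    parms = {"cnf": -1}  # confidence filter setting from the guide's example
    resources = ["ATL03_20181019065445_03150111_005_01.h5"]
    return endpoint, parms, resources


def main():
    # Step 1: import the client package.
    from sliderule import sliderule

    # Step 2 (optional): configure the client; with no arguments the
    # defaults (public cluster, minimal logging) are used.
    sliderule.init()

    # Step 4: issue the processing request; the result is a dataframe.
    endpoint, parms, resources = build_request()
    gdf = sliderule.run(endpoint, parms, resources=resources)
    print(gdf.head())
```

Calling `main()` issues a real request, so it needs the `sliderule` package and network access to the public cluster.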
Import the Client Package
The majority of the SlideRule Python client functionality is found in the sliderule module, but other modules provide additional features and mission-specific functions and variables. To import the client and start using SlideRule, use the following code:
from sliderule import sliderule
Here is a list of modules in the SlideRule Python client.
- sliderule
Core functionality: initialization and configuration, making requests, processing an area of interest.
- earthdata
Query for resources using CMR, CMR-STAC, TNM, and other services.
- h5
Directly read HDF5 data from the cloud using the server-side H5Coro implementation.
- raster
Sample and subset supported raster datasets.
- icesat2
Issue processing requests for ICESat-2 standard and custom data products.
- gedi
Issue processing requests for GEDI standard and custom data products.
- io
Read and write SlideRule output in different formats.
- ipysliderule
Widgets and routines for building interactive interfaces to SlideRule in a Jupyter notebook.
To use multiple modules from the SlideRule Python client, import them together, as in the following example:
from sliderule import sliderule, earthdata, icesat2
Optionally Configure the Client
You can begin using the SlideRule Python client right away without any configuration. By default, the client uses the public cluster and issues a minimal number of log messages; if no further configuration is performed, all processing requests use those settings.
However, the client provides numerous routines for configuring its behavior. Each available routine is listed below with a short description; see the API Reference for a more detailed description.
- sliderule.init
Primary routine for configuring the client; attempts to capture as parameters the typical settings that a user would want to change.
- sliderule.set_url
Configure the domain that the client points to for the server-side cluster; defaults to slideruleearth.io but can be changed to something like localhost for local development. Typically, this should not be changed.
- sliderule.set_ssl_verify
Disable the SSL certificate check when making processing requests to SlideRule. By default the client verifies the certificate, but in cases (usually development) where a certificate is invalid but the user knows the server being pointed to is valid, this setting can be overridden to allow the requests to go through. Typically, this should not be changed.
- sliderule.set_verbose
Change the verbosity of the log messages being generated; when enabled, server-side log messages will be printed to the user console.
- sliderule.set_rqst_timeout
Change how long to wait for the request to finish; needed when a user is making a very large processing request and needs to match the client timeouts to the server-side timeouts provided in the request parameters.
- sliderule.set_processing_flags
Certain streamed responses flag some fields in their response structure as auxiliary, meaning those fields are not necessary for core functionality; the client can be configured to skip those fields in order to speed up processing of large responses.
- sliderule.update_available_servers
Query the number of nodes in the cluster and make a request to change that number (e.g. request a capacity increase).
- sliderule.scaleout
Increase the number of nodes in a cluster and wait for the cluster to reach the requested capacity.
- sliderule.authenticate
Configure the organization that the client points to for processing requests. This is paired with the sliderule.set_url() routine to create the full URL for processing requests. For example, if the user calls sliderule.set_url("slideruleearth.io") and then sliderule.authenticate("uw"), the client will make all requests to https://uw.slideruleearth.io. This routine also handles authenticating to the organization when the associated cluster is a private cluster.
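The sketch below pairs a few of the routines above. `request_url` is a hypothetical helper, not part of the client; it only mirrors the set_url/authenticate example in the text. `configure_client` uses routine names from the list, with argument values that are assumptions ("uw" is the example organization from the text and may require credentials you do not have); check the API Reference for full signatures.

```python
def request_url(domain, organization):
    """Hypothetical helper: combine a domain and organization into a URL.

    Mirrors the example above: set_url("slideruleearth.io") followed by
    authenticate("uw") sends requests to https://uw.slideruleearth.io.
    """
    return f"https://{organization}.{domain}"


def configure_client():
    """Configure the client with routines described above (needs sliderule)."""
    from sliderule import sliderule

    sliderule.init()                        # defaults: public cluster, minimal logging
    sliderule.set_verbose(True)             # print server-side log messages
    sliderule.set_url("slideruleearth.io")  # the default domain; normally left alone
    sliderule.authenticate("uw")            # example organization; may need credentials
```

`configure_client()` contacts the cluster when called; `request_url` runs offline.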
Define the Request Parameters
When making a request to the SlideRule servers, the parameters of the request (i.e. what the user wants to process and how they want to process it) are supplied in the body of the request as a JSON structure. When using the SlideRule Python client, the parameters are captured and provided by the user in a Python dictionary, and the dictionary is automatically serialized into a JSON structure by the client when making the request.
For example, to set the confidence filter on an ATL03 subsetting request, pass the parameters needed by the endpoint to the sliderule.run() function as a dictionary:
sliderule.run("atl03x", {"cnf":-1}, resources=["ATL03_20181019065445_03150111_005_01.h5"])
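To see what the serialization step amounts to, the standard `json` module can turn the same dictionary into JSON text. This is only an approximation of what the client does internally; the exact request body it sends may wrap or extend these parameters.

```python
import json

# Request parameters as the user writes them: an ordinary Python dictionary.
parms = {"cnf": -1}

# The client serializes the dictionary into JSON for the request body;
# json.dumps shows roughly what that serialized form looks like.
body = json.dumps(parms)
print(body)  # {"cnf": -1}
```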
Issue the Processing Request
There are two general purpose routines provided in the SlideRule Python client for issuing processing requests.
- sliderule.source
Implements the low-level protocol for making requests to SlideRule and processing the results. This can be used to issue a request to any SlideRule endpoint.
- sliderule.run
Implements a standard SlideRule convention for making requests to SlideRule endpoints that return a dataframe. This uses the sliderule.source() routine.
A user is always free to use either of the routines above for making requests to SlideRule, but it is often more convenient to use one of the helper functions in the mission-specific modules. For example, when making processing requests for ICESat-2 data, the icesat2 module provides many routines that wrap calls to specific endpoints in easy-to-use Python functions; to make a request to the atl06p endpoint, a user should use the icesat2.atl06p() Python routine.
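The sketch below contrasts the two styles. It assumes `icesat2.atl06p` accepts a parameter dictionary plus a `resources` keyword (verify in the API Reference), and reuses the granule and confidence setting from this guide as illustrative values; `issue_requests` is a hypothetical name.

```python
def issue_requests():
    """Issue requests both ways (requires the sliderule package and network)."""
    from sliderule import sliderule, icesat2

    sliderule.init()
    resources = ["ATL03_20181019065445_03150111_005_01.h5"]
    parms = {"cnf": -1}  # illustrative value reused from this guide

    # General-purpose routine: the endpoint is named explicitly.
    atl03 = sliderule.run("atl03x", parms, resources=resources)

    # Mission-specific helper: wraps the atl06p endpoint in a Python function.
    atl06 = icesat2.atl06p(parms, resources=resources)

    return atl03, atl06
```

Both calls return dataframes; the helper saves the user from remembering endpoint names and conventions.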
General Request Parameters
Resources
All requests must provide some way for the server-side code to know which resources to process. Typically, this is accomplished by specifying an area of interest or other query parameters, but sometimes it is necessary to specify manually which resources to process. In that case, the user can supply the following parameters:
resources
: a list of resources to process (e.g. granule names like “ATL03_20181019065445_03150111_005_01.h5”)
asset
: the name of a collection of resources; this rarely needs to be specified because the default values for most endpoints are sufficient
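When building a resources list by hand, it can help to filter granule names before passing them in. This sketch assumes the standard ICESat-2 naming pattern (`ATLxx_YYYYMMDDhhmmss_...`); `granule_date` and `select_resources` are hypothetical helpers, not client functions.

```python
from datetime import date


def granule_date(name):
    """Return the acquisition date encoded in an ICESat-2 granule name.

    Assumes the pattern ATLxx_YYYYMMDDhhmmss_..., e.g.
    "ATL03_20181019065445_03150111_005_01.h5" -> 2018-10-19.
    """
    stamp = name.split("_")[1]  # "20181019065445"
    return date(int(stamp[0:4]), int(stamp[4:6]), int(stamp[6:8]))


def select_resources(names, start, stop):
    """Keep granules acquired between start and stop (inclusive)."""
    return [n for n in names if start <= granule_date(n) <= stop]


granules = ["ATL03_20181019065445_03150111_005_01.h5"]
resources = select_resources(granules, date(2018, 10, 1), date(2018, 10, 31))
# resources can then be passed as: sliderule.run("atl03x", parms, resources=resources)
```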
Polygons
All polygons must be provided to SlideRule as a list of dictionaries containing longitudes and latitudes in counter-clockwise order, with the first and last points matching.
The applicable parameters used to specify the polygon are:
poly
: polygon of the region of interest
proj
: projection used when subsetting data (“north_polar”, “south_polar”, “plate_carree”). In most cases, leave this unspecified and the code will choose the right projection.
ignore_poly_for_cmr
: boolean controlling whether the polygon is used as part of the request to CMR for obtaining the list of resources to process. By default the polygon is used; this option exists only for unusual cases where SlideRule can handle a polygon for subsetting that CMR cannot, and the list of resources to process is obtained some other way.
For example:
# counter-clockwise polygon; the first and last points match
region = [ {"lon": -108.3435200747503, "lat": 38.89102961045247},
           {"lon": -107.7677425431139, "lat": 38.90611184543033},
           {"lon": -107.7818591266989, "lat": 39.26613714985466},
           {"lon": -108.3605610678553, "lat": 39.25086131372244},
           {"lon": -108.3435200747503, "lat": 38.89102961045247} ]
# the list itself is the polygon; pass it directly as the "poly" parameter
parms = {
    "poly": region
}
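Before sending a hand-built polygon, its ring can be checked against the rules above. This sketch uses the shoelace formula, treating longitude and latitude as planar x and y (reasonable for small regions that do not cross the antimeridian); the helper names are hypothetical.

```python
def is_closed(poly):
    """True if the first and last points of the polygon match."""
    return poly[0] == poly[-1]


def signed_area(poly):
    """Shoelace formula on a closed ring of {"lon", "lat"} points.

    Positive for counter-clockwise rings, negative for clockwise ones.
    The ring must be closed so that zip() also covers the final edge.
    """
    area = 0.0
    for p, q in zip(poly, poly[1:]):
        area += p["lon"] * q["lat"] - q["lon"] * p["lat"]
    return area / 2.0


def is_counter_clockwise(poly):
    return signed_area(poly) > 0


region = [ {"lon": -108.3435200747503, "lat": 38.89102961045247},
           {"lon": -107.7677425431139, "lat": 38.90611184543033},
           {"lon": -107.7818591266989, "lat": 39.26613714985466},
           {"lon": -108.3605610678553, "lat": 39.25086131372244},
           {"lon": -108.3435200747503, "lat": 38.89102961045247} ]

print(is_closed(region), is_counter_clockwise(region))  # True True
```

Reversing a clockwise ring (`list(reversed(poly))`) is enough to make it acceptable.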
To support other formats, the sliderule.toregion function can be used to convert polygons from the GeoJSON and Shapefile formats into the format accepted by SlideRule.
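For intuition about what such a conversion involves, the hypothetical helper below turns a single GeoJSON linear ring (`[[lon, lat], ...]`) into the list-of-dictionaries format. It is an illustration only; sliderule.toregion handles real files and more complex geometries. GeoJSON (RFC 7946) already lists exterior rings counter-clockwise with the first and last positions equal, which matches SlideRule's requirements.

```python
def geojson_ring_to_region(ring):
    """Convert a GeoJSON linear ring to SlideRule's polygon format.

    Hypothetical helper for illustration; extra position elements such as
    altitude are ignored, and the ring is closed if it is not already.
    """
    region = [{"lon": lon, "lat": lat} for lon, lat, *_ in ring]
    if region[0] != region[-1]:
        region.append(dict(region[0]))
    return region


# Toy unit square, exterior ring only, deliberately left unclosed.
square = {"type": "Polygon", "coordinates": [[[0, 0], [1, 0], [1, 1], [0, 1]]]}
region = geojson_ring_to_region(square["coordinates"][0])
print(region[0], region[-1])  # {'lon': 0, 'lat': 0} {'lon': 0, 'lat': 0}

# With the real client, the equivalent step for a file would be:
# region = sliderule.toregion('examples/grandmesa.geojson')["poly"]
```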
Rasterized Area of Interest
There is no limit to the number of points in the polygon, but as the number of points grows, the time it takes to perform the subsetting process also grows. Also, some regions cannot be expressed as a single polygon because they have holes in them or define discrete, unconnected areas. Because of this, one of the outputs of the sliderule.toregion function is a GeoJSON object for describing complex geometries. It is available under the "region_mask" element of the returned dictionary.
When the GeoJSON is supplied in the parameters sent in the request, the server-side software forgoes using the polygon for subsetting operations; instead, it builds a raster of the GeoJSON object using the specified cellsize and then uses that raster image as a mask to determine which points in the source datasets are included in the region of interest. The applicable parameters used to specify this functionality are:
raster
: GeoJSON describing the region of interest; enables use of a rasterized region for subsetting
cellsize
: the resolution at which to rasterize the GeoJSON; currently the units are spherical degrees and no projections are supported, but support for projections will be included in future releases
The example code below shows how this option can be enabled and used (note that the poly parameter is still required):
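from sliderule import sliderule

# convert the GeoJSON file; cellsize is the rasterization resolution in degrees
region = sliderule.toregion('examples/grandmesa.geojson', cellsize=0.02)
parms = {
    "poly": region['poly'],
    "region_mask": region['raster']
}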
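Because cellsize is in degrees, halving it roughly quadruples the number of cells in the mask. The back-of-the-envelope sketch below estimates the grid dimensions over the polygon's bounding box, assuming square cells of `cellsize` degrees; the server's actual rasterization may differ, and `mask_grid_shape` is a hypothetical helper.

```python
import math


def mask_grid_shape(poly, cellsize):
    """Rough (rows, cols) of a raster covering the polygon's bounding box."""
    lons = [p["lon"] for p in poly]
    lats = [p["lat"] for p in poly]
    cols = math.ceil((max(lons) - min(lons)) / cellsize)
    rows = math.ceil((max(lats) - min(lats)) / cellsize)
    return rows, cols


# The polygon from the Polygons example above.
region = [ {"lon": -108.3435200747503, "lat": 38.89102961045247},
           {"lon": -107.7677425431139, "lat": 38.90611184543033},
           {"lon": -107.7818591266989, "lat": 39.26613714985466},
           {"lon": -108.3605610678553, "lat": 39.25086131372244},
           {"lon": -108.3435200747503, "lat": 38.89102961045247} ]

for cellsize in (0.02, 0.01):
    print(cellsize, mask_grid_shape(region, cellsize))
# 0.02 (19, 30)
# 0.01 (38, 60)
```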
Timeouts
Each request supports setting three different timeouts. A user should only need to set these manually when making extremely large processing requests, or when failing fast is necessary and the default timeouts are too long.
rqst_timeout
: total time in seconds for the request to be processed
node_timeout
: time in seconds for a single node to work on a distributed request (used for proxied requests)
read_timeout
: time in seconds for a single read of an asset to take
timeout
: global timeout setting that sets all timeouts at once (can be overridden by further specifying the other timeouts)
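Timeouts are passed like any other request parameter. The hypothetical helper below only illustrates the precedence described above (the global timeout sets all three, and a specific timeout overrides it); it is not the server's code, and the values are chosen for illustration.

```python
def effective_timeouts(parms, default=None):
    """Resolve the three timeouts from a parameter dictionary.

    'timeout' sets all three at once; any specific timeout that is also
    present overrides it. Unset values fall back to `default`.
    """
    base = parms.get("timeout", default)
    return {key: parms.get(key, base)
            for key in ("rqst_timeout", "node_timeout", "read_timeout")}


# Values in seconds: a long overall budget, but fail fast on slow reads.
parms = {"timeout": 600, "read_timeout": 60}
print(effective_timeouts(parms))
# {'rqst_timeout': 600, 'node_timeout': 600, 'read_timeout': 60}
```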
Time
A time range is typically used to limit the resources being processed to only those collected within the specified time range. All times sent as request parameters are in GMT. All times returned in result records are in seconds (fractional, double precision) since the GPS epoch, which is January 6, 1980 at midnight (1980-01-06T00:00:00.000000Z).
t0
: start time for filtering source datasets (format %Y-%m-%dT%H:%M:%SZ, e.g. 2018-10-13T00:00:00Z)
t1
: stop time for filtering source datasets (format %Y-%m-%dT%H:%M:%SZ, e.g. 2018-10-13T00:00:00Z)
The SlideRule Python client provides helper functions to perform the conversion; see gps2utc in the API Reference (/web/rtd/api_reference/sliderule.html#gps2utc).
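The sketch below formats t0/t1 strings with the documented format and converts GPS seconds by hand. It assumes returned times are true GPS time, which excludes leap seconds; the 18 s default is the GPS-UTC offset in effect since 2017-01-01 and is wrong for earlier dates. For real work, prefer the client's gps2utc helper. All names here are hypothetical.

```python
from datetime import datetime, timedelta, timezone

GPS_EPOCH = datetime(1980, 1, 6, tzinfo=timezone.utc)
REQUEST_TIME_FORMAT = "%Y-%m-%dT%H:%M:%SZ"  # format expected for t0 and t1


def to_request_time(dt):
    """Format a timezone-aware datetime as a t0/t1 string (in UTC/GMT)."""
    return dt.astimezone(timezone.utc).strftime(REQUEST_TIME_FORMAT)


def gps_seconds_to_utc(gps_seconds, leap_seconds=18):
    """Convert seconds since the GPS epoch to a UTC datetime.

    leap_seconds defaults to the post-2017 GPS-UTC offset (an assumption).
    """
    return GPS_EPOCH + timedelta(seconds=gps_seconds - leap_seconds)


parms = {
    "t0": to_request_time(datetime(2018, 10, 13, tzinfo=timezone.utc)),
    "t1": to_request_time(datetime(2018, 10, 14, tzinfo=timezone.utc)),
}
print(parms)  # {'t0': '2018-10-13T00:00:00Z', 't1': '2018-10-14T00:00:00Z'}
```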
For APIs that return GeoDataFrames, the columns that hold times are represented as a datetime with microsecond precision. In most cases, the applicable time column is used as the index of the GeoDataFrame.