GeoRaster
2023-05-04
Background
Raster data consists of 2-dimensional datasets that form a grid of square pixels, often called an image. A common format for storing raster data is TIFF. GeoTIFF is an extension to the TIFF format that embeds geospatial information into the TIFF file that ties the raster data to a geospatial reference. COGs are cloud-optimized GeoTIFFs that are internally optimized for access in the cloud. For more information see https://www.cogeo.org.
Overview
SlideRule supports sampling raster data at points of interest and including those sampled values alongside its customized data products. For instance, when performing an ATL06-SR processing run (atl06p
), the returned GeoDataFrame has a row for each calculated elevation; that row can also include values from different raster datasets that have been sampled at the geolocation of the calculated elevation.
In order to sample a raster dataset, SlideRule must first ascertain which individual raster files in the dataset intersect the point of interest, then obtain credentials to access the identified files, and then lastly, open up those files and read the necessary pixels to calculate the returned sample value. Unfortunately, most raster datasets are organized slightly differently and require a small amount of specialized code to perform the first step of determining which raster files need to be sampled. The second step of obtaining credentials also requires some specialized code, but since most of our datasets are in AWS and authenticated through NASA DAACs, most of the authentication code is generic. But even still, because of this, each raster dataset supported by SlideRule needs to be registered with SlideRule ahead of time and provided in what we call an Asset Directory.
Asset Directory
SlideRule’s asset directory is a list of datasets that SlideRule has access to. Each entry in the asset directory describes a dataset and provides the necessary information to find, authenticate, and read that dataset.
The following datasets are currently provided in SlideRule’s Asset Directory (with more being added as time goes on); the ones marked as rasters can be sampled; the ones that are not marked as rasters can be subsetted through different subsetting APIs.
asset | raster | description |
---|---|---|
icesat2 | The ICESat-2 Standard Data Products: ATL03, ATL06, and ATL08 | |
gedil4a | GEDI L4A Footprint Level Aboveground Biomass Density | |
gedil3-elevation | * | GEDI L3 gridded ground elevation |
gedil3-canopy | * | GEDI L3 gridded canopy height |
gedil3-elevation-stddev | * | GEDI L3 gridded ground elevation-standard deviation |
gedil3-canopy-stddev | * | GEDI L3 gridded canopy heigh-standard deviation |
gedil3-counts | * | GEDI L3 gridded counts of valid laser footprints |
gedil2a | GEDI L2A Elevation and Height Metrics Data Global Footprint | |
gedil1b | GEDI L1B Geolocated Waveforms | |
merit-dem | * | MERIT Digital Elevation Model |
swot-sim-ecco-llc4320 | * | Simulated SWOT Data |
swot-sim-glorys | * | Simulated SWOT Data |
usgs3dep-1meter-dem | * | USGS 3DEP 1m Digital Elevation Model |
esa-worldcover-10meter | * | Worldwide land cover mapping |
landsat-hls | * | Harmonized LandSat Sentinal-2 |
arcticdem-mosaic | * | PGC Arctic Digital Elevation Model Mosaic |
arcticdem-strips | * | PGC Arctic Digital Elevation Model Strips |
rema-mosaic | * | PGC Reference Elevation Model of Antarctica Mosaic |
rema-strips | * | PGC Reference Elevation Model of Antarctica Strips |
atlas-local | Local file-system staged ICESat-2 Standard Data Products: ATL03, ATL06, and ATL08 | |
gedi-local | Local file-system staged GEDI Stadard Data Products: L1B, L2A, L3, L4A, L4B | |
atlas-s3 | Internal s3-bucket staged ICESat-2 Standard Data Products: ATL03, ATL06, and ATL08 | |
nsidc-s3 | Alias for "icesat2" asset |
Sampling
Sampling rasters is controlled by the samples
field of a request’s parameter dictionary. The samples
field is itself a dictionary of dictionaries, each describing a raster dataset that needs to be sampled for the given processing request. The example below specifies two raster datasets that need to be sampled:
parms {
"samples" : {
"mosaic": {"asset": "arcticdem-mosaic", "radius": 10.0, "zonal_stats": True},
"strips": {"asset": "arcticdem-strips", "algorithm": "CubicSpline"}
}
}
The mosaic
and strips
dictionary keys are provided by the user to identify the data returned back to the user in the GeoDataFrame - they can be anything the user wants as long as it unique. The asset
dictionary keys are used to identify which asset in SlideRule’s asset directory is to be sampled. The dictionary keys that follow all control how the raster is to be sampled.
SlideRule supports various algorithms for sampling rasters: NearestNeighbour, Bilinear, Cubic, CubicSpline, Lanczos, Average, Mode, and Gauss. These are the algorithms supplied by GDAL and exposed to users through the SlideRule APIs. In addition to these algorithms, SlideRule also provides the ability to calculate zonal statistics for a given geolocation: count (number of pixels), minimum, maximum, mean, median, standard deviation, and median absolute deviation. Lastly, depending on the dataset, SlideRule provides custom bands that can be derived using other bands in the raster file. For example, the HLS dataset (landsat-hls
) provides the following three custom bands: “NDSI”, “NDVI”, “NDWI”.
Prior to sampling a raster, SlideRule provides ways to select and filter which rasters are sampled in a dataset. These include filtering based on time, and file name pattern matching.
For a full list of parameters available to users when sampling rasters, please see the sampling parameters.
Sample Results
The results returned by SlideRule for sampled raster datasets are sent as ancillary datasets which are then merged by the client software into the primary GeoDataFrame. The column name for each of the values below are prepended with the key
supplied by the user in the processing request. For example, if the user supplied "mosaic": {"asset": "arcticdem-mosaic", "radius": 10.0, "zonal_stats": True}
in the processing request, then each column associated with the results of sampling the arcticdem-mosaic
asset would have mosiac.
prepended to it.
The standard columns added to a GeoDataFrame for each sampled raster dataset are:
“value”: the sampled value from the raster at the point of interest
“time”: the best time provided by the raster dataset for when the sampled value was measured
“file_id”: a number used to identify the name of the file the sample value came from; this is used in conjunction with the
file_directory
provided in the metadata of a GeoDataFrame“flags”: any flags (if requested) that acompany the sampled data in the raster it was read from
The zonal statistic columns added to a GeoDataFrame for each sampled raster dataset are:
“count”: number of pixels read to calculate sample value
“min”: minimum pixel value of pixels that constributed to sample value
“max”: maximum pixel value of pixels that constributed to sample value
“mean”: average/mean pixel value of pixels that constributed to sample value
“median”: average/median pixel value of pixels that constributed to sample value
“stdev”: standard deviation of pixel values of pixels that constributed to sample value
“mad”: median absolute deviation of pixel values of pixels that constributed to sample value
Special Case - HLS
Unlike the other datasets supported by SlideRule, the HLS dataset (identified as landsat-hls
in the Asset Directory) requires the user to first query NASA’s EarthData CMR system, and provide the results of that query in the request to SlideRule.
For example, if you wanted to sample all HLS rasters that intersect the following polygon:
polygon = [ {"lon": -177.0000000001, "lat": 51.0000000001},
{"lon": -179.0000000001, "lat": 51.0000000001},
{"lon": -179.0000000001, "lat": 49.0000000001},
{"lon": -177.0000000001, "lat": 49.0000000001},
{"lon": -177.0000000001, "lat": 51.0000000001} ]
collected in the month of January in year 2021,
time_start = "2021-01-01T00:00:00Z"
time_end = "2021-02-01T23:59:59Z"
Then you would first have to perform a STAC query with those parameters like so:
catalog = earthdata.stac(short_name="HLS", polygon=polygon, time_start=time_start, time_end=time_end, as_str=True)
and then include the response in the request parameters under the catalog
field:
rqst = {"samples": {"asset": "landsat-hls", "catalog": catalog, "bands": ["B02"]}}