GeoParquet
2023-02-24
Background
Apache Parquet is an open source, column-oriented data file format designed for efficient data storage and retrieval. It has strong support in Python/DataFrames and is a popular data format for big data analytics. See https://parquet.apache.org/ for more details.
GeoParquet is a metadata and file organization convention overlaid on top of the Parquet format that provides a standard way of storing geospatial information inside Parquet files. It has growing support in geospatial analysis tools, including Python/GeoDataFrames. See https://geoparquet.org/ for more details.
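To make the convention concrete, the sketch below shows the two pieces a GeoParquet writer adds to an ordinary Parquet file: geometries stored as Well-Known Binary (WKB) bytes in a regular binary column, and a JSON document stored under the file-level "geo" metadata key. It uses only the Python standard library. The field names follow our reading of the GeoParquet 1.0.0 specification; the column name "geometry" and the point-only geometry type are assumptions for illustration, not necessarily what SlideRule writes.

```python
import json
import struct


def point_to_wkb(x: float, y: float) -> bytes:
    """Encode a 2D point as little-endian Well-Known Binary (WKB)."""
    # Byte-order flag 1 = little endian, geometry type 1 = Point,
    # followed by the X and Y coordinates as 64-bit floats (21 bytes total).
    return struct.pack("<BIdd", 1, 1, x, y)


def geo_metadata(geometry_column: str = "geometry") -> str:
    """Build the JSON stored under the Parquet file-level "geo" key.

    Assumes GeoParquet 1.0.0 and a single point-valued geometry column.
    """
    meta = {
        "version": "1.0.0",
        "primary_column": geometry_column,
        "columns": {
            geometry_column: {
                "encoding": "WKB",
                "geometry_types": ["Point"],
            }
        },
    }
    return json.dumps(meta)


wkb = point_to_wkb(-108.1, 39.0)
print(len(wkb), geo_metadata())
```

Everything else in the file is ordinary Parquet, which is why tools that do not understand GeoParquet can still read the non-geometry columns.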
Overview
SlideRule currently supports returning results to data users as GeoParquet files. These files are built on the server and are either streamed directly back to the user or uploaded to a user-specified S3 bucket for later access. To specify the GeoParquet option, the request must include the output parameter with the output.format field set to “parquet”. See the section on output parameters for more details.
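A minimal sketch of the request parameters, written as a Python dictionary, is shown below. Only the output.path and output.format fields come from this guide; the helper name parquet_output is hypothetical, and any other output fields or processing parameters a real request needs are deliberately left out.

```python
def parquet_output(path: str) -> dict:
    """Return a hypothetical 'output' parameter block requesting GeoParquet."""
    # Only 'path' and 'format' are taken from this guide; other output
    # fields that SlideRule may accept are not shown here.
    return {"path": path, "format": "parquet"}


parms = {
    # ... processing parameters for the request would go here ...
    "output": parquet_output("grandmesa.parquet"),
}
print(parms)
```

The dictionary would then be passed as the parameters of the request; consult the output parameters section and the client documentation for the exact call.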
S3 as a destination
To specify S3 as a destination, the output.path field must start with “s3://” (for example, “s3://mybucket/maps/grandmesa.parquet”).
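A client-side check such as the hypothetical split_s3_path below can catch a malformed destination before the request is sent. It mirrors the rule above (the path must start with “s3://”) and also splits out the bucket name and object key, which is handy when scoping credentials to that bucket. This is an illustrative sketch, not part of any SlideRule client.

```python
def split_s3_path(path: str) -> tuple[str, str]:
    """Split an 's3://bucket/key' output path into (bucket, key)."""
    prefix = "s3://"
    if not path.startswith(prefix):
        raise ValueError(f"not an S3 destination: {path!r}")
    # partition() splits on the first '/', separating bucket from key.
    bucket, sep, key = path[len(prefix):].partition("/")
    if not bucket or not sep or not key:
        raise ValueError(f"expected s3://<bucket>/<key>, got {path!r}")
    return bucket, key


print(split_s3_path("s3://mybucket/maps/grandmesa.parquet"))
```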
The SlideRule project does not maintain any S3 buckets that are open for public access. It is therefore the user's responsibility to provide the path to an S3 bucket they have access to, and to include in the request the temporary credentials SlideRule should use to write to that bucket.
Methods for obtaining temporary AWS credentials are outside the scope of this user guide, but users are strongly encouraged to limit the scope of the credentials they provide as much as possible. For SlideRule's purposes, the only permission needed is “s3:PutObject”.
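As an illustration of narrowly scoped credentials, the sketch below builds an AWS IAM policy document that allows only “s3:PutObject”, and only under one key prefix of one bucket. The bucket and prefix reuse the example path above and are placeholders; how the policy is attached to the role or session that issues the temporary credentials (for instance via AWS STS) is outside this guide and not shown.

```python
import json


def put_only_policy(bucket: str, key_prefix: str) -> dict:
    """Build an IAM policy that allows only s3:PutObject under one prefix."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                "Action": "s3:PutObject",
                # Restrict writes to objects whose keys start with key_prefix.
                "Resource": f"arn:aws:s3:::{bucket}/{key_prefix}*",
            }
        ],
    }


print(json.dumps(put_only_policy("mybucket", "maps/"), indent=2))
```

Scoping the resource to a prefix rather than the whole bucket means leaked credentials could not overwrite unrelated objects.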
Constraints
Currently, only atl06, atl08, and flattened atl03 records are supported. This means that setting the ICESat-2 compact parameter is not supported when outputting to GeoParquet, and that atl03 results may look slightly different between native runs and runs that request the GeoParquet format.
The results in the GeoParquet file are not sorted.
The SlideRule server-side version information includes only the server core version; it does not include version information for any of the plugins the server has loaded and is running.
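The record-type, compact, and sorting constraints above can be handled on the client side, as in the sketch below. The names check_parquet_request, SUPPORTED_RECORDS, and sort_rows are hypothetical; the sketch assumes compact is a top-level boolean request parameter and that the rows carry a "time" column to sort on, neither of which is confirmed by this guide.

```python
# Record types this guide lists as supported (atl03 only in flattened form).
SUPPORTED_RECORDS = {"atl06", "atl08", "atl03"}


def check_parquet_request(record_type: str, parms: dict) -> None:
    """Raise ValueError if a request cannot be returned as GeoParquet."""
    output = parms.get("output", {})
    if output.get("format") != "parquet":
        return  # the constraints below only apply to GeoParquet output
    if record_type not in SUPPORTED_RECORDS:
        raise ValueError(f"{record_type} records are not supported in GeoParquet")
    # Assumption: 'compact' is a top-level boolean request parameter.
    if parms.get("compact"):
        raise ValueError("'compact' is not supported with GeoParquet output")


def sort_rows(rows: list[dict], column: str = "time") -> list[dict]:
    """Sort rows read from the (unsorted) GeoParquet output by one column."""
    return sorted(rows, key=lambda row: row[column])


check_parquet_request("atl06", {"output": {"format": "parquet"}})
print(sort_rows([{"time": 3}, {"time": 1}, {"time": 2}]))
```

With GeoPandas, the equivalent of sort_rows is reading the file with geopandas.read_parquet and calling sort_values on the chosen column.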