This repository is a simple set of demonstrations intended to prompt discussion about whether, and how, we should approach virtualizing GeoTIFFs and COGs.
First, some thoughts on why we should virtualize GeoTIFFs and/or COGs:
- Provide faster access to non-cloud-optimized GeoTIFFs that contain some form of internal tiling, without any data duplication (see notebook #1)
- Provide fully async I/O for both GeoTIFFs and COGs using Zarr-Python
- Allow loading a stack of GeoTIFFs/COGs into a data cube with fewer GET requests than stackstac/xstac require, thereby decreasing cost and increasing performance
- Give users access to a lazily loaded DataTree containing both the data and its overviews, allowing scientists to use the overviews not only for tile-based visualization but also for quickly iterating on analyses
- Include ETags in the virtualized datasets to support reproducibility
- A motivation that's less clear to me, but possibly feasible, is using the virtualization layer to access COGs with disparate CRSs as a single dataset (zarr-developers/geozarr-spec#53)
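The points about internally tiled GeoTIFFs and fewer GET requests rest on the fact that a tiled TIFF already stores its pixels as independently addressable tiles, listed (left-to-right, top-to-bottom) in the TileOffsets and TileByteCounts tags. The following is a minimal sketch, assuming a single-band, chunky (PlanarConfiguration=1) tiled TIFF whose tag values have already been read. It shows how those tiles could be mapped to Zarr-style chunk references without copying any data, and how a pixel window maps to only the tiles (and therefore byte-range GET requests) it touches. The function names, the `example.tif` path, and the offset values are hypothetical. The `"row.col"` key format mirrors Zarr v2 chunk keys; a real virtualization layer would also need to handle compression, multiple bands, and overviews.

```python
import math


def tiff_tiles_to_chunk_refs(path, image_width, image_length, tile_width,
                             tile_length, tile_offsets, tile_byte_counts):
    """Hypothetical helper: map each internal TIFF tile to a Zarr-style
    chunk key -> (path, byte offset, byte length) reference."""
    tiles_across = math.ceil(image_width / tile_width)
    tiles_down = math.ceil(image_length / tile_length)
    if len(tile_offsets) != tiles_across * tiles_down:
        raise ValueError("expected one offset per tile")
    if len(tile_byte_counts) != len(tile_offsets):
        raise ValueError("expected one byte count per tile")
    refs = {}
    for i, (offset, length) in enumerate(zip(tile_offsets, tile_byte_counts)):
        # TIFF orders tiles left-to-right, then top-to-bottom.
        row, col = divmod(i, tiles_across)
        refs[f"{row}.{col}"] = (path, offset, length)
    return refs


def tiles_for_window(row_start, row_stop, col_start, col_stop,
                     tile_length, tile_width):
    """Hypothetical helper: chunk keys intersecting a pixel window.
    Stops are exclusive, like Python slices."""
    rows = range(row_start // tile_length, (row_stop - 1) // tile_length + 1)
    cols = range(col_start // tile_width, (col_stop - 1) // tile_width + 1)
    return [f"{r}.{c}" for r in rows for c in cols]


if __name__ == "__main__":
    # Hypothetical 600 x 400 image with 256 x 256 tiles -> 3 x 2 = 6 tiles.
    refs = tiff_tiles_to_chunk_refs(
        "example.tif", 600, 400, 256, 256,
        [1000, 2000, 3000, 4000, 5000, 6000], [900] * 6,
    )
    print(refs["1.2"])  # ('example.tif', 6000, 900)
    # Only the four tiles overlapping this window would need to be fetched.
    print(tiles_for_window(100, 300, 200, 300, 256, 256))  # ['0.0', '0.1', '1.0', '1.1']
```

Because each reference is just a path plus a byte range, the original file stays where it is; reading a window becomes a handful of independent range requests, which is what makes async I/O through Zarr-Python possible.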
`why-virtualize-geotiff` is distributed under the terms of the MIT license.