Description
Below is the representation of the SpatialData object when the store is read from a local copy. This is a Visium HD dataset that I originally created with spatialdata_io.visium_hd, followed by some post-processing.
SpatialData object, with associated Zarr store: /<path>/11692b64-b34a-4dbe-adc9-784a87a7a856.zarr
├── Images
│     ├── 'spatialdata_hires_image': DataArray[cyx] (3, 4352, 6000)
│     └── 'spatialdata_lowres_image': DataArray[cyx] (3, 435, 600)
├── Shapes
│     └── 'spatialdata_square_008um': GeoDataFrame shape: (127839, 1) (2D shapes)
└── Tables
      ├── 'square_008um': AnnData (127839, 19059)
      └── 'table': AnnData (127839, 19059)
with coordinate systems:
    ▸ 'downscaled_hires', with elements:
        spatialdata_hires_image (Images), spatialdata_square_008um (Shapes)
    ▸ 'downscaled_lowres', with elements:
        spatialdata_lowres_image (Images), spatialdata_square_008um (Shapes)
    ▸ 'global', with elements:
        spatialdata_square_008um (Shapes)
Reproducible example
This is a public dataset, and the data store should be downloadable:
import spatialdata as sd
rem_path = "https://devel.umgear.org/datasets/spatial/11692b64-b34a-4dbe-adc9-784a87a7a856.zarr"
sdata = sd.read_zarr(rem_path)
# ERROR
The following reads in fine, but has other issues (which I will document in separate tickets):
sdata = sd.read_zarr(rem_path, selection=["images", "tables"])
Describe the bug
When I attempt to read a publicly accessible remote Zarr dataset, it seems that one of the "/" characters in the "https://" prefix of the URI is dropped (possibly by PyArrow) when it comes to the "shapes.parquet" file. I'm not sure whether this is a downstream issue on that package's end or something further upstream (including something on my end).
Traceback (most recent call last):
File "/opt/homebrew/lib/python3.12/site-packages/geopandas/io/arrow.py", line 653, in _read_parquet_schema_and_metadata
schema = parquet.ParquetDataset(path, filesystem=filesystem, **kwargs).schema
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/opt/homebrew/lib/python3.12/site-packages/pyarrow/parquet/core.py", line 1348, in __init__
finfo = filesystem.get_file_info(path_or_paths)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "pyarrow/_fs.pyx", line 590, in pyarrow._fs.FileSystem.get_file_info
File "pyarrow/error.pxi", line 155, in pyarrow.lib.pyarrow_internal_check_status
File "pyarrow/error.pxi", line 92, in pyarrow.lib.check_status
pyarrow.lib.ArrowInvalid: Expected a local filesystem path, got a URI: 'https:/devel.umgear.org/datasets/spatial/11692b64-b34a-4dbe-adc9-784a87a7a856.zarr/shapes/spatialdata_square_008um/shapes.parquet'
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "/opt/homebrew/lib/python3.12/site-packages/spatialdata/_core/spatialdata.py", line 1850, in read
return read_zarr(file_path, selection=selection)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/opt/homebrew/lib/python3.12/site-packages/spatialdata/_io/io_zarr.py", line 121, in read_zarr
shapes[subgroup_name] = _read_shapes(f_elem_store)
^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/opt/homebrew/lib/python3.12/site-packages/spatialdata/_io/io_shapes.py", line 54, in _read_shapes
geo_df = read_parquet(path)
^^^^^^^^^^^^^^^^^^
File "/opt/homebrew/lib/python3.12/site-packages/geopandas/io/arrow.py", line 751, in _read_parquet
schema, metadata = _read_parquet_schema_and_metadata(path, filesystem)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/opt/homebrew/lib/python3.12/site-packages/geopandas/io/arrow.py", line 655, in _read_parquet_schema_and_metadata
schema = parquet.read_schema(path, filesystem=filesystem)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/opt/homebrew/lib/python3.12/site-packages/pyarrow/parquet/core.py", line 2339, in read_schema
filesystem, where = _resolve_filesystem_and_path(where, filesystem)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/opt/homebrew/lib/python3.12/site-packages/pyarrow/fs.py", line 179, in _resolve_filesystem_and_path
filesystem, path = FileSystem.from_uri(path)
^^^^^^^^^^^^^^^^^^^^^^^^^
File "pyarrow/_fs.pyx", line 477, in pyarrow._fs.FileSystem.from_uri
File "pyarrow/error.pxi", line 155, in pyarrow.lib.pyarrow_internal_check_status
File "pyarrow/error.pxi", line 92, in pyarrow.lib.check_status
pyarrow.lib.ArrowInvalid: Unrecognized filesystem type in URI: https:/devel.umgear.org/datasets/spatial/11692b64-b34a-4dbe-adc9-784a87a7a856.zarr/shapes/spatialdata_square_008um/shapes.parquet
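One possible explanation for the missing slash, offered as an unconfirmed guess rather than a diagnosis: if the store URL is handled as a filesystem path anywhere before it reaches PyArrow, POSIX path normalization collapses the "//" after the scheme into a single "/". The sketch below uses only the standard library and the URL from the reproducible example. Whether spatialdata, geopandas, or PyArrow actually performs this kind of join is an assumption; the snippet only shows that path-style joining reproduces the exact string in the traceback.

```python
from pathlib import PurePosixPath

# Store URL taken from the reproducible example in this report.
store_url = (
    "https://devel.umgear.org/datasets/spatial/"
    "11692b64-b34a-4dbe-adc9-784a87a7a856.zarr"
)

# Assumption (not verified against the library source): the URL is treated
# as a filesystem path at some point. POSIX path handling collapses the
# repeated slash after "https:" into a single one.
mangled = str(
    PurePosixPath(store_url) / "shapes" / "spatialdata_square_008um" / "shapes.parquet"
)

# Plain string joining keeps the "https://" scheme separator intact.
intact = "/".join(
    [store_url.rstrip("/"), "shapes", "spatialdata_square_008um", "shapes.parquet"]
)

print("path-style join:  ", mangled)
print("string-style join:", intact)
```

If this is what happens, the fix would likely belong wherever the path to "shapes.parquet" is assembled rather than in PyArrow itself.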
Expected behavior
The SpatialData object is successfully created.
Desktop (optional):
- Tested on macOS Sequoia 15.3 as well as in a Dockerized ubuntu:jammy image
Additional context
Relevant package versions are listed below. If you need me to dig deeper, let me know.
Python 3.12.7
spatialdata==0.3.0
spatialdata_io==0.1.6
pandas==2.2.1
anndata==0.10.6
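The traceback passes through geopandas and pyarrow, which are not in the list above. A minimal sketch for collecting their installed versions with the standard library is shown here. The helper name `installed_versions` is hypothetical, and including zarr and fsspec is an assumption about which other packages might be relevant to remote reads.

```python
from importlib.metadata import PackageNotFoundError, version


def installed_versions(packages):
    """Return a {package: version} mapping, marking packages that are missing."""
    found = {}
    for name in packages:
        try:
            found[name] = version(name)
        except PackageNotFoundError:
            found[name] = "not installed"
    return found


# geopandas and pyarrow appear in the traceback; zarr and fsspec are
# assumed to be relevant for remote stores and may not matter here.
for name, ver in installed_versions(["geopandas", "pyarrow", "zarr", "fsspec"]).items():
    print(f"{name}=={ver}")
```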