This project stores test and example data for the TileDB-SOMA repository. Examples of datasets that might be included:
- an AnnData H5AD file;
- gene expression data and the spatial folder produced by 10x Genomics' Space Ranger (the Visium processing software);
- a set of multiscale images stored as a tar file.

Available data:

- Dataset 2025-02-19: Visium v2 Example

This repository exists to store large binary files and similar artifacts needed as test or example inputs for TileDB-SOMA. Data is added to this repository as GitHub release assets rather than as files committed to Git.

To add new test data, do the following:
- Create a new release in the repository using the GitHub interface.
- Set the title using the format "Dataset YYYY-MM-DD" (for example "Dataset 2025-02-19"). If a release was already created on that day, append the next unused letter of the alphabet, starting with "b" (for example "Dataset 2025-02-19b").
- Set the tag to "dataset-YYYY-MM-DD" (for example "dataset-2025-02-19").
- Add a description of the data. This may include information such as where the data was originally hosted or how it was otherwise generated.
- Upload the datasets to the release using the file dialog on the release page (at the time of writing, in Firefox, this is done by clicking "Attach binaries by dropping them here or selecting them." on the new release page).
- Publish the release.
- Add a description of the new data to this README in the "Available data" section. Include the full URL for downloading the data (for example "https://github.com/single-cell-data/TileDB-SOMA-Test-Data/releases/download/Visium-v2.0.0-Dataset/spatial.tar.gz").
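
The naming conventions in the steps above can be checked with a small helper before creating a release. This is a minimal sketch: the names `release_names` and `asset_url` are hypothetical and not part of any existing tooling. It assumes the letter suffix is applied to the tag as well as the title (the steps above only state it for the title, but GitHub tags must be unique), and it builds download links with GitHub's standard `releases/download/<tag>/<filename>` pattern, which reproduces the example URL above.

```python
import datetime
import string

# Base URL for release assets of this repository (taken from the example URL).
REPO_DOWNLOAD_BASE = (
    "https://github.com/single-cell-data/TileDB-SOMA-Test-Data/releases/download"
)


def release_names(day: datetime.date, earlier_releases_that_day: int = 0) -> tuple[str, str]:
    """Return (title, tag) for a new dataset release.

    The first release of a day gets no suffix; later ones get "b", "c", ...
    ASSUMPTION: the same suffix is appended to the tag, since tags must be unique.
    """
    if not 0 <= earlier_releases_that_day < 26:
        raise ValueError("expected between 0 and 25 earlier releases on that day")
    suffix = (
        "" if earlier_releases_that_day == 0
        else string.ascii_lowercase[earlier_releases_that_day]  # 1 -> "b", 2 -> "c"
    )
    stamp = day.isoformat()  # formats as YYYY-MM-DD
    return f"Dataset {stamp}{suffix}", f"dataset-{stamp}{suffix}"


def asset_url(tag: str, filename: str) -> str:
    """Build the GitHub download URL for a file attached to the release `tag`."""
    return f"{REPO_DOWNLOAD_BASE}/{tag}/{filename}"
```

For example, `release_names(datetime.date(2025, 2, 19), 1)` gives the title "Dataset 2025-02-19b" for a second release on that day, and `asset_url("Visium-v2.0.0-Dataset", "spatial.tar.gz")` returns the example URL listed above.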
Note: We add data directly to releases because the test data does not need Git's version control, and GitHub allows larger files as release assets than as files committed to the repository.
The data can be downloaded by navigating to the release on GitHub, or by using `wget` or `curl` to fetch it from the URL listed in this README.md.
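
For scripted access, for example from a test setup, the sketch below is a Python standard-library equivalent of `curl -L -o <file> <url>` followed by unpacking the archive. The function names are hypothetical, and it assumes the asset is a tar archive (optionally compressed), such as the `spatial.tar.gz` file in the example URL. `urllib` follows the redirect GitHub issues for release downloads, and `tarfile` detects the compression automatically.

```python
import pathlib
import shutil
import tarfile
import urllib.request


def download(url: str, dest: pathlib.Path) -> pathlib.Path:
    """Stream `url` into the file `dest` and return its path."""
    dest.parent.mkdir(parents=True, exist_ok=True)
    with urllib.request.urlopen(url) as response, open(dest, "wb") as out:
        shutil.copyfileobj(response, out)  # copy in chunks, not all in memory
    return dest


def download_and_extract(url: str, workdir: pathlib.Path) -> pathlib.Path:
    """Download a tar asset into `workdir`, unpack it, and return the output dir."""
    # Name the local file after the last path component of the URL.
    archive = download(url, workdir / url.rsplit("/", 1)[-1])
    extract_dir = workdir / "extracted"
    with tarfile.open(archive) as tar:  # mode "r" auto-detects gz/bz2/xz
        if hasattr(tarfile, "data_filter"):
            # Newer Pythons: reject absolute paths and members escaping extract_dir.
            tar.extractall(extract_dir, filter="data")
        else:
            tar.extractall(extract_dir)
    return extract_dir
```

For example, calling `download_and_extract(url, pathlib.Path("test-data"))` with a URL listed in this README would leave the unpacked files under `test-data/extracted`. The `"data"` extraction filter is used when the running Python provides it, because test archives come from the network and should not be able to write outside the target directory.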
After you add a new dataset to this repository, you should open a sibling PR in the TileDB-SOMA repository that accesses the data. More information about how to do this will be added to test/README.md in the near future.