anndata2cas add accession_columns parameter #165
Merged
Pull Request Overview
This PR updates anndata2cas to support using pre-calculated accession IDs by introducing an accession_columns parameter while removing the now-unsupported parent_cell_set_name field. Key changes include updating tests to reflect schema changes, modifying conversion utility functions to accept an accession mapping, and adding a new MappedAccessionManager to handle predefined accession IDs.
Reviewed Changes
Copilot reviewed 10 out of 10 changed files in this pull request and generated 1 comment.
| File | Description |
|---|---|
| src/test/spreadsheet_to_cas_test.py | Updated expected annotation length in the test to match the removal of parent_cell_set_name. |
| src/test/conversion_utils_test.py | Commented out parent_cell_set_name test case as it is no longer part of the schema. |
| src/cas/utils/conversion_utils.py | Added an accession mapping parameter to generate_parent_cell_lookup and created create_accession_mapping. |
| src/cas/anndata_to_cas.py | Updated function signature and logic to pass accession_columns and use the new accession mapping. |
| src/cas/accession/*.py | Updated accession manager implementations to accept an extra cellset_name parameter. |
| src/cas/main.py & docs/cli.md | Updated CLI options and documentation for the new accession_columns parameter. |
ubyndr approved these changes on Jun 2, 2025
anndata2cas automatically generates accession IDs from a hash of the cells belonging to each cell set. However, in some cases the dataset (AnnData or spreadsheet) already contains pre-calculated accession IDs. This update lets users identify the accession columns and use those IDs instead of generating new ones.
Also, parent_cell_set_name doesn't exist in the CAS schema, so it has been removed from the JSON output.
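The accession-column behaviour described above can be sketched roughly as follows. This is an illustrative sketch only, not the code merged in this PR: the function names `hash_accession` and `build_accession_mapping`, the list-of-dicts row layout with a `cell_id` key, the positional pairing of labelset columns with accession columns, and the 10-character SHA-256 prefix are all assumptions made for the example.

```python
import hashlib
from typing import Dict, Iterable, List, Optional, Tuple


def hash_accession(cell_ids: Iterable[str], length: int = 10) -> str:
    """Fallback: derive a deterministic ID from the sorted cell IDs (assumed scheme)."""
    joined = ",".join(sorted(cell_ids))
    return hashlib.sha256(joined.encode("utf-8")).hexdigest()[:length]


def build_accession_mapping(
    obs_rows: List[Dict[str, str]],
    labelsets: List[str],
    accession_columns: Optional[List[Optional[str]]] = None,
) -> Dict[Tuple[str, str], str]:
    """Map (labelset, label) to an accession ID.

    A pre-calculated ID from the paired accession column is preferred;
    a hash of the cell set's cell IDs is used only when none is present.
    """
    if accession_columns is None:
        accession_columns = [None] * len(labelsets)
    if len(accession_columns) != len(labelsets):
        raise ValueError("accession_columns must pair one-to-one with labelsets")

    mapping: Dict[Tuple[str, str], str] = {}
    for labelset, acc_col in zip(labelsets, accession_columns):
        cells_by_label: Dict[str, List[str]] = {}
        predefined: Dict[str, str] = {}
        for row in obs_rows:
            label = row[labelset]
            cells_by_label.setdefault(label, []).append(row["cell_id"])
            # Keep the first non-empty pre-calculated accession seen for a label.
            if acc_col and row.get(acc_col):
                predefined.setdefault(label, row[acc_col])
        for label, cells in cells_by_label.items():
            mapping[(labelset, label)] = predefined.get(label) or hash_accession(cells)
    return mapping


# Hypothetical data: "T cell" carries a pre-calculated accession, "B cell" does not.
rows = [
    {"cell_id": "c1", "cell_type": "T cell", "cell_type_accession": "CS_001"},
    {"cell_id": "c2", "cell_type": "T cell", "cell_type_accession": "CS_001"},
    {"cell_id": "c3", "cell_type": "B cell", "cell_type_accession": ""},
]
mapping = build_accession_mapping(rows, ["cell_type"], ["cell_type_accession"])
```

With this data, the "T cell" cell set keeps its pre-calculated accession, while "B cell" falls back to a hash-derived ID. Omitting `accession_columns` in the sketch reproduces the hash-only behaviour for every cell set.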