Conversation

Contributor

@dorien-er dorien-er commented Oct 9, 2025

Changelog

Make gene name sanitization optional:
Recommended for removing version numbers from Ensembl IDs (e.g. ENSMUSG00000017167.6), but not for gene names with splice variants (e.g. AL627309.1).

Note: #1083 needs to be merged first

Issue ticket number and link

Closes #xxxx (Replace xxxx with the GitHub issue number)

Checklist before requesting a review

  • I have performed a self-review of my code

  • Conforms to the Contributor's guide

  • Check the correct box. Does this PR contain:

    • Breaking changes
    • New functionality
    • Major changes
    • Minor changes
    • Documentation
    • Bug fixes
  • Proposed changes are described in the CHANGELOG.md

  • CI tests succeed!

description: |
The minimum number of genes present in both the reference and query datasets.
- name: "--sanitize_gene_names"
Member

Does this only remove version numbers? If so, I would name the argument

--var_names_remove_ensembl_version_number or something like that.
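
For illustration only, a flag restricted to Ensembl version numbers could behave roughly like the sketch below. The function name and the regular expression are assumptions made for this example and are not taken from the PR; the pattern assumes Ensembl stable IDs of the form "ENS", an optional species/feature prefix, and 11 digits.

```python
import re

# Hypothetical pattern (an assumption, not the PR's code): an Ensembl stable ID
# such as ENSG00000141510 or ENSMUSG00000017167, followed by ".<version>".
ENSEMBL_VERSIONED_ID = re.compile(r"^(ENS[A-Z]*\d{11})\.\d+$")


def remove_ensembl_version(name: str) -> str:
    """Drop the version suffix only when the name looks like an Ensembl ID."""
    match = ENSEMBL_VERSIONED_ID.match(name)
    # Names that do not match (e.g. gene symbols) are returned unchanged.
    return match.group(1) if match else name
```

With this restriction, ENSMUSG00000017167.6 would lose its version number, while a gene name such as AL627309.1 would be left as it is.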

Contributor Author

It depends on what is passed to --var_gene_names: if this field contains Ensembl IDs, it would indeed only remove version numbers. But if this field contains gene symbols, it would also strip the splice variant suffixes (e.g. AL627309.1 would become AL627309).
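
A minimal sketch of the behavior described here, assuming the sanitization simply strips a trailing ".<digits>" suffix; the helper name and the regular expression are hypothetical and may differ from the actual implementation in this PR:

```python
import re


def strip_numeric_suffix(name: str) -> str:
    """Remove a trailing '.<digits>' suffix, whatever kind of name it is attached to."""
    # Assumed sanitization rule for illustration; not confirmed by the PR.
    return re.sub(r"\.\d+$", "", name)


names = ["ENSMUSG00000017167.6", "AL627309.1"]
sanitized = [strip_numeric_suffix(n) for n in names]
# The same rule removes the Ensembl version and the gene-name suffix alike.
```

Because the rule cannot tell the two kinds of identifiers apart, the outcome depends entirely on which field is passed in.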
