make gene name sanitation optional #1084
base: main
description: |
  The minimum number of genes present in both the reference and query datasets.
- name: "--sanitize_gene_names"
Does this only remove version numbers? In that case I would use --var_names_remove_ensembl_version_number or something like that.
It depends on what is passed to --var_gene_names: if this field contains Ensembl IDs, it would indeed only remove version numbers. But if this field contains gene symbols, it would remove the suffixes that distinguish splice variants.
Changelog
Make gene name sanitization optional:
recommended for removing version numbers from Ensembl IDs (e.g. ENSMUSG00000017167.6), but not for gene names with splice variants (e.g. AL627309.1).
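To illustrate why the flag is useful, here is a minimal sketch of the behaviour described above. It assumes sanitization amounts to dropping a trailing `.<digits>` suffix; the helper name `strip_version_suffix` and its `sanitize` parameter are hypothetical and are not taken from this PR's code. The second gene symbol, AL627309.5, is added purely for illustration.

```python
import re

# Hypothetical helper for illustration only; not the implementation in this PR.
# Assumption: "sanitizing" means removing a trailing ".<digits>" suffix.
_VERSION_SUFFIX = re.compile(r"\.\d+$")


def strip_version_suffix(names, sanitize=True):
    """Return the names, with a trailing '.<digits>' removed when sanitize is True."""
    if not sanitize:
        return list(names)
    return [_VERSION_SUFFIX.sub("", name) for name in names]


ensembl_ids = ["ENSMUSG00000017167.6"]
gene_symbols = ["AL627309.1", "AL627309.5"]

# Ensembl IDs: only the version number is dropped.
print(strip_version_suffix(ensembl_ids))

# Gene symbols: the distinguishing suffix is lost and both names become identical.
print(strip_version_suffix(gene_symbols))

# With sanitization switched off, the gene symbols are left untouched.
print(strip_version_suffix(gene_symbols, sanitize=False))
```

Under this assumption the same rule that cleans up Ensembl IDs makes distinct gene symbols collide, which is the case for leaving the step optional.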
Note: #1083 needs to be merged first
Issue ticket number and link
Closes #xxxx (Replace xxxx with the GitHub issue number)
Checklist before requesting a review
I have performed a self-review of my code
Conforms to the Contributor's guide
Check the correct box. Does this PR contain:
Proposed changes are described in the CHANGELOG.md
CI tests succeed!