Hello, first let me say I think transvar is great, and we are planning to use it in PomBase, the model organism database for fission yeast.
I think there is a bug / inconsistency that arises as follows. Suppose we have a feature in a GTF file (a real example from the pombe genome) whose `gene_id` contains lowercase letters and that has no gene name:
I can create a database and retrieve the gene:

However, because of this:

`transvar/transvar/anno.py`, lines 193 to 195 in f7c17a8

Or this:

`transvar/transvar/anno.py`, lines 153 to 155 in f7c17a8

it is impossible to use it for a protein annotation such as `SPBC1198.04c:p.N3A`, because `q.tok` is converted to uppercase and the uppercased name does not exist in `db`. A possible fix for this particular case would be to use `gene_id.upper()` here, although I am not sure whether it would break something else:

`transvar/transvar/localdb.py`, lines 574 to 577 in f7c17a8

A minimal example showing the problem is below, together with the files needed to reproduce it:

- `data/pombe_genome.fa`: `https://curation.pombase.org/dumps/latest_build/fasta/chromosomes/Schizosaccharomyces_pombe_all_chromosomes.fa.gz -o data/pombe_genome.fa.gz`
- `data/pombe_genome.gtf`: pombe_genome.gtf.zip

Then, to set up the transvar config:

Then run the code that gives the error (`transvar_main_script` below is `bin/transvar`, from which I import the parser functions):

```python
import argparse
from functools import partial

from transvar.annodb import AnnoDB
from transvar.anno import main_anno
from transvar.config import read_config
from transvar_main_script import parser_add_annotation, parser_add_mutation, parser_add_general

# Register a "panno" subcommand using the parser helpers from bin/transvar.
parser = argparse.ArgumentParser(description=__doc__)
subparsers = parser.add_subparsers()
p = subparsers.add_parser("panno", help='annotate protein element')
parser_add_annotation(p)
parser_add_mutation(p)
parser_add_general(p)
p.set_defaults(func=partial(main_anno, at='p'))

variant_type = 'panno'
variant_description = 'SPBC1198.04c:p.N3A'
args = parser.parse_args([variant_type, '-i', variant_description,
                          '--ensembl', 'data/pombe_genome.gtf.transvardb',
                          '--reference', 'data/pombe_genome.fa', '-v', '2'])

# A direct lookup with the original mixed-case gene_id succeeds.
db = AnnoDB(args, read_config())
print('the gene is found in the db: ', list(db.get_gene('SPBC1198.04c')))

# The panno call uppercases the token, so the same gene is not found.
print('the gene is not found in the panno call')
args.func(args)
```
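To make the mismatch concrete outside of transvar, here is a minimal, hypothetical sketch. It does not reproduce transvar's actual code: `build_index` and `lookup` are illustrative names, and the index is assumed to be a plain dict keyed by gene identifier, with a query path that uppercases the token the way `q.tok` is uppercased. It contrasts storing keys as written with uppercasing them at index time (the `gene_id.upper()` idea above).

```python
# Hypothetical sketch, NOT transvar's implementation: a dict index keyed by
# gene identifier, queried through a path that uppercases the user token.

def build_index(gene_ids, normalize=False):
    """Map each gene identifier to a record; optionally uppercase keys on insert."""
    index = {}
    for gid in gene_ids:
        key = gid.upper() if normalize else gid
        index[key] = {"gene_id": gid}  # keep the original spelling for output
    return index


def lookup(index, token):
    """Query path that uppercases the token before looking it up."""
    return index.get(token.upper())


ids = ["SPBC1198.04c"]

# Keys stored as written: the uppercased query 'SPBC1198.04C' misses.
raw = build_index(ids)
print(lookup(raw, "SPBC1198.04c"))  # None

# Keys uppercased when indexed: the query hits, and the original
# mixed-case identifier is still available for reporting.
fixed = build_index(ids, normalize=True)
print(lookup(fixed, "SPBC1198.04c"))  # {'gene_id': 'SPBC1198.04c'}
```

Normalizing only the stored key, while keeping the original spelling in the record, leaves displayed identifiers unchanged. One caveat of this approach: two identifiers that differ only in case would collide on the same key.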