Issue importing db: TypeError: cannot unpack non-iterable NoneType object
#236
Comments
@daler, might you be able to provide some guidance on this?
Does the issue occur when using a smaller file? E.g.,
Also, the
In such a case, it's unclear how best (or even whether) to merge them, since they have different start and stop coordinates and are children of different exons. Since it's unclear which one should be returned if you asked for

I wonder if there was an issue in creating the database in your case (running out of memory, timing out, or something similar), because adding the version to the database is the last thing to happen, and the error message implies that information doesn't exist. Testing with the first, say, 10k lines will help diagnose that.
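To follow the suggestion of testing with the first ~10k lines, a small subset can be cut from the full annotation before calling `create_db`. The sketch below is an illustration, not part of gffutils: `head_gff` is a hypothetical helper, and it assumes the input is either plain text or gzip-compressed.

```python
import gzip
import itertools


def head_gff(src: str, dst: str, n_lines: int = 10_000) -> int:
    """Copy the first n_lines of a GFF/GTF file (optionally .gz) to dst.

    Hypothetical helper for smoke-testing; returns the number of lines written.
    """
    # Open gzipped input in text mode so lines come back as str.
    opener = gzip.open if src.endswith(".gz") else open
    written = 0
    with opener(src, "rt") as fin, open(dst, "w") as fout:
        # islice stops after n_lines without reading the rest of the file.
        for line in itertools.islice(fin, n_lines):
            fout.write(line)
            written += 1
    return written
```

For example, `head_gff("gencode.v47.annotation.gff3.gz", "subset.gff3")` followed by `gffutils.create_db("subset.gff3", dbfn="subset.db", merge_strategy="create_unique")`. Because the cut falls at an arbitrary line, the last gene in the subset may be missing some of its children, which is acceptable for a quick diagnostic but not for real analysis.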
Thanks for the reply, @daler.
Using a small subset seems to work fine.
OK, though I should note I'm using the official GENCODE release annotations. Does this imply they don't follow the expected GFF3 standard? I'll try
I don't think memory should be an issue: I'm running this in an interactive HPC session with 16 cores and 258 GB of RAM. Unless Another possibility is that creating the database takes so long that my interactive session times out after 12 hours (the maximum I can request). Though I'd hope it wouldn't take that long to process one file.
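One way to tell an interrupted build from a finished one is to look inside the `.db` file directly, since the version information is written last. The sketch below uses only Python's built-in `sqlite3`. It assumes (this is not confirmed in the thread) that gffutils records its dialect and version in a table named `meta`; `db_build_finished` is a hypothetical helper name.

```python
import os
import sqlite3
from typing import Optional, Tuple


def db_build_finished(dbfn: str) -> Optional[Tuple]:
    """Return the first row of a `meta` table, or None if it is absent or empty.

    ASSUMPTION: gffutils keeps dialect/version info in a table named `meta`
    and fills it at the end of create_db, so an empty or missing table would
    suggest the build never finished.
    """
    if not os.path.exists(dbfn):
        # sqlite3.connect would otherwise silently create a new empty file.
        raise FileNotFoundError(dbfn)
    con = sqlite3.connect(dbfn)
    try:
        tables = {row[0] for row in con.execute(
            "SELECT name FROM sqlite_master WHERE type = 'table'")}
        if "meta" not in tables:
            return None
        # fetchone() returns None when the table has no rows.
        return con.execute("SELECT * FROM meta").fetchone()
    finally:
        con.close()
```

If this returns `None` for the database that raised the `TypeError`, that would be consistent with the build never completing, for example because the 12-hour session ended.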
FYI, the gencode.v47.annotation.gff3 file is 1.7 GB. What's the largest file you've successfully run
I use it on GENCODE files all the time; you can also leave them gzipped to save a little space. It just so happens that the arguments you're using trigger complex behavior that can be helpful in some cases, but probably not in the general case. The following runs in ~8 minutes with under 200 MB of RAM total:

```python
import gffutils

gffutils.create_db(
    "gencode.gff.gz",
    dbfn="gencode_gff.db",
    merge_strategy="create_unique",
    verbose=True,
)
```

Or, for GTF:

```python
import gffutils

gffutils.create_db(
    "gencode.gtf.gz",
    dbfn="gencode_gtf.db",
    merge_strategy="create_unique",
    disable_infer_transcripts=True,
    disable_infer_genes=True,
    verbose=True,
)
```

Regarding specs: GFF expects every feature to have a unique ID (see this entry in the spec), and the GTF spec does not include transcript or gene features; per the spec, they are expected to be inferred from exons. So no, GENCODE GFF and GTF files do not follow the specs, hence needing to build in detection and warning when trying to build a db from GENCODE files. But honestly, hardly anyone follows the specs... hence needing to build gffutils in the first place to deal with all that messiness!

For your original example, when you use the

Also,
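The unique-ID point can be checked directly on a file. The hypothetical helper below (not a gffutils function) counts how many data lines share each `ID` attribute in a GFF3 file; any ID it reports is one that `merge_strategy` has to resolve. It assumes well-formed, tab-separated GFF3 with `key=value` attributes in column 9.

```python
import gzip
from collections import Counter
from typing import List, Tuple


def duplicate_ids(path: str, limit: int = 10) -> List[Tuple[str, int]]:
    """Return up to `limit` (ID, count) pairs for IDs used on more than one line."""
    opener = gzip.open if path.endswith(".gz") else open
    counts: Counter = Counter()
    with opener(path, "rt") as fh:
        for line in fh:
            # Skip directives, comments, and blank lines.
            if line.startswith("#") or not line.strip():
                continue
            fields = line.rstrip("\n").split("\t")
            if len(fields) < 9:
                continue  # not a complete feature line
            for attr in fields[8].split(";"):
                key, sep, value = attr.partition("=")
                if sep and key.strip() == "ID":
                    counts[value] += 1
    # most_common() sorts by count, highest first.
    return [(i, c) for i, c in counts.most_common() if c > 1][:limit]
```

Running it on the full annotation reads the whole file once, so it is much cheaper than building the database and shows up front how much merging the chosen strategy will have to do.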
Closing because I think everything is behaving as expected, but please reopen if you have any issues with the adjusted arguments.
Thanks so much for the detailed response, @daler. Trying this again now. Just a note: my example used the GFF3 format (not GFF or GTF as in your examples). Not sure if this makes a difference.
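Whether a file is GFF3 depends on its contents rather than its extension: the GFF3 spec says a file should begin with a `##gff-version 3` directive, while GTF has no such directive. GENCODE publishes its GFF annotations as GFF3, so the `.gff` vs `.gff3` naming likely makes no difference here. A quick way to check, sketched with a hypothetical `gff_version` helper:

```python
import gzip
from typing import Optional


def gff_version(path: str) -> Optional[str]:
    """Return the version from a leading '##gff-version N' line, or None."""
    opener = gzip.open if path.endswith(".gz") else open
    with opener(path, "rt") as fh:
        for line in fh:
            if not line.startswith("#"):
                break  # reached feature lines without finding the directive
            if line.startswith("##gff-version"):
                parts = line.split()
                return parts[1] if len(parts) > 1 else None
    return None
```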
@daler, I'm still encountering the same issues as before with the GFF3 file from my initial reproducible example. Namely, the function hangs indefinitely, even after modifying the arguments.
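To tell whether `create_db` is truly hung or just slow on a 1.7 GB file, one option is to time builds on subsets of increasing size and see how runtime grows. The sketch below assumes a hypothetical `build(n)` callable that you would write yourself, e.g. one that writes the first `n` lines of the annotation to a temporary file and runs `gffutils.create_db` on it with the arguments suggested above. `time_builds` and `growth_ratios` are likewise hypothetical names.

```python
import time
from typing import Callable, List, Tuple


def time_builds(build: Callable[[int], None],
                sizes: List[int]) -> List[Tuple[int, float]]:
    """Call build(n) for each n in sizes and record wall-clock seconds."""
    results = []
    for n in sizes:
        start = time.perf_counter()
        build(n)  # hypothetical: e.g. create_db on the first n lines
        results.append((n, time.perf_counter() - start))
    return results


def growth_ratios(timings: List[Tuple[int, float]]) -> List[float]:
    """Ratio of each runtime to the previous one (skips zero denominators)."""
    return [t2 / t1
            for (_, t1), (_, t2) in zip(timings, timings[1:])
            if t1 > 0]
```

With sizes that double at each step (say 10k, 20k, 40k lines), ratios near 2 suggest roughly linear scaling, so the full file should eventually finish; ratios near 4, or ratios that keep growing, point to quadratic or worse behavior, in which case the full build may not complete within a 12-hour session.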
Hello,
Thanks for the tool; I love the concept. However, I'm having some issues getting the db to work. I've tried this with two different files (GFF and GFF3) and encountered the same error.
Thanks in advance for your help,
Brian
Reprex
Download gff3
Download annotations from GENCODE.
Create db
Import db
Versioning
All packages