Issue importing db: `TypeError: cannot unpack non-iterable NoneType object` #236

bschilder · 2024-11-30T19:05:16Z

Hello,

Thanks for the tool, love the concept. Though I'm having some issues getting the db to work. I've tried this with two different files (gff and gff3) and encountered the same error.

Thanks in advance for your help,
Brian

Reprex

Download gff3

Download annotations from GENCODE.

!wget https://ftp.ebi.ac.uk/pub/databases/gencode/Gencode_human/release_47/gencode.v47.annotation.gff3.gz
!gunzip gencode.v47.annotation.gff3.gz

Create db

import gffutils

dbfn = "gencode.v47.annotation.db"
db = gffutils.create_db("gencode.v47.annotation.gff3",
                        dbfn=dbfn,
                        force=False,
                        keep_order=True,
                        merge_strategy='merge',
                        sort_attribute_values=True)

Import db

db = gffutils.FeatureDB(dbfn, keep_order=True)

---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
Input In [6], in <cell line: 11>()
     11 if os.path.exists(dbfn):
     12     print("Using existing db.")
---> 13     db = gffutils.FeatureDB(dbfn)
     14 else:
     15     print("Creating db.")

File ~/.local/lib/python3.9/site-packages/gffutils/interface.py:199, in FeatureDB.__init__(self, dbfn, default_encoding, keep_order, pragmas, sort_attribute_values, text_factory)
    191 # Load some meta info
    192 # TODO: this is a good place to check for previous versions, and offer
    193 # to upgrade...
    194 c.execute(
    195     """
    196     SELECT version, dialect FROM meta
    197     """
    198 )
--> 199 version, dialect = c.fetchone()
    200 self.version = version
    201 self.dialect = helpers._unjsonify(dialect)

TypeError: cannot unpack non-iterable NoneType object

Versioning

python 3.9.5
gffutils 0.13

All packages

``` absl-py @ file:///dev/shm/jax/0.2.24/foss-2021a-CUDA-11.3.1/absl-py-0.15.0 alabaster @ file:///dev/shm/Python/3.9.5/GCCcore-10.3.0/alabaster/alabaster-0.7.12 anyio==4.6.2.post1 appdirs @ file:///dev/shm/Python/3.9.5/GCCcore-10.3.0/appdirs/appdirs-1.4.4 argcomplete==3.5.1 argh==0.31.3 argon2-cffi==23.1.0 argon2-cffi-bindings==21.2.0 arrow==1.3.0 asn1crypto @ file:///dev/shm/Python/3.9.5/GCCcore-10.3.0/asn1crypto/asn1crypto-1.4.0 astor @ file:///dev/shm/TensorFlow/2.6.0/foss-2021a-CUDA-11.3.1/astor/astor-0.8.1 asttokens==2.0.5 astunparse @ file:///dev/shm/TensorFlow/2.6.0/foss-2021a-CUDA-11.3.1/astunparse/astunparse-1.6.3 async-lru==2.0.4 atomicwrites @ file:///dev/shm/Python/3.9.5/GCCcore-10.3.0/atomicwrites/atomicwrites-1.4.0 attrs==24.2.0 babel==2.16.0 backcall==0.2.0 bcbio-gff==0.7.1 bcrypt @ file:///dev/shm/Python/3.9.5/GCCcore-10.3.0/bcrypt/bcrypt-3.2.0 beautifulsoup4==4.12.3 biopython==1.84 bitstring @ file:///dev/shm/Python/3.9.5/GCCcore-10.3.0/bitstring/bitstring-3.1.7 bleach==6.2.0 blist @ file:///dev/shm/Python/3.9.5/GCCcore-10.3.0/blist/blist-1.3.6 Bottleneck @ file:///dev/shm/SciPybundle/2021.05/foss-2021a/Bottleneck/Bottleneck-1.3.2 CacheControl @ file:///dev/shm/Python/3.9.5/GCCcore-10.3.0/CacheControl/CacheControl-0.12.6 cachetools @ file:///dev/shm/TensorFlow/2.6.0/foss-2021a-CUDA-11.3.1/cachetools/cachetools-4.2.2 cachy @ file:///dev/shm/Python/3.9.5/GCCcore-10.3.0/cachy/cachy-0.3.0 certifi @ file:///dev/shm/Python/3.9.5/GCCcore-10.3.0/certifi/certifi-2020.12.5 cffi @ file:///dev/shm/Python/3.9.5/GCCcore-10.3.0/cffi/cffi-1.14.5 chardet @ file:///dev/shm/Python/3.9.5/GCCcore-10.3.0/chardet/chardet-4.0.0 charset-normalizer==2.0.12 clang @ file:///dev/shm/TensorFlow/2.6.0/foss-2021a-CUDA-11.3.1/clang/clang-5.0 cleo @ file:///grid/it/data/elzar/easybuild/sources/p/Python/extensions/cleo-0.8.1-py2.py3-none-any.whl click @ file:///dev/shm/Python/3.9.5/GCCcore-10.3.0/click/click-7.1.2 clikit @ 
file:///grid/it/data/elzar/easybuild/sources/p/Python/extensions/clikit-0.6.2-py2.py3-none-any.whl colorama @ file:///dev/shm/Python/3.9.5/GCCcore-10.3.0/colorama/colorama-0.4.4 coloredlogs==15.0.1 crashtest @ file:///grid/it/data/elzar/easybuild/sources/p/Python/extensions/crashtest-0.3.1-py3-none-any.whl cryptography @ file:///dev/shm/Python/3.9.5/GCCcore-10.3.0/cryptography/cryptography-3.4.7 cycler==0.11.0 Cython @ file:///dev/shm/Python/3.9.5/GCCcore-10.3.0/Cython/Cython-0.29.23 cyvcf2==0.31.1 deap @ file:///dev/shm/SciPybundle/2021.05/foss-2021a/deap/deap-1.3.1 debugpy==1.5.1 decorator @ file:///dev/shm/Python/3.9.5/GCCcore-10.3.0/decorator/decorator-5.0.7 defusedxml==0.7.1 dill @ file:///dev/shm/TensorFlow/2.6.0/foss-2021a-CUDA-11.3.1/dill/dill-0.3.3 distlib @ file:///dev/shm/Python/3.9.5/GCCcore-10.3.0/distlib/distlib-0.3.1 dm-tree==0.1.6 docopt @ file:///dev/shm/Python/3.9.5/GCCcore-10.3.0/docopt/docopt-0.6.2 docutils @ file:///dev/shm/Python/3.9.5/GCCcore-10.3.0/docutils/docutils-0.17.1 ecdsa @ file:///dev/shm/Python/3.9.5/GCCcore-10.3.0/ecdsa/ecdsa-0.16.1 entrypoints==0.4 exceptiongroup==1.2.2 executing==0.8.3 expecttest @ file:///dev/shm/expecttest/0.1.3/GCCcore-10.3.0/expecttest-0.1.3 fastjsonschema==2.20.0 filelock @ file:///dev/shm/Python/3.9.5/GCCcore-10.3.0/filelock/filelock-3.0.12 flatbuffers @ file:///dev/shm/flatbufferspython/2.0/GCCcore-10.3.0/flatbuffers-2.0 flit @ file:///dev/shm/Python/3.9.5/GCCcore-10.3.0/flit/flit-3.2.0 flit_core @ file:///dev/shm/Python/3.9.5/GCCcore-10.3.0/flitcore/flit_core-3.2.0 fonttools==4.29.1 fqdn==1.5.1 fsspec @ file:///dev/shm/Python/3.9.5/GCCcore-10.3.0/fsspec/fsspec-2021.4.0 future @ file:///dev/shm/Python/3.9.5/GCCcore-10.3.0/future/future-0.18.2 gast @ file:///dev/shm/TensorFlow/2.6.0/foss-2021a-CUDA-11.3.1/gast/gast-0.4.0 gffpandas==1.2.0 gffutils==0.13 google-api-core==2.23.0 google-auth @ file:///dev/shm/TensorFlow/2.6.0/foss-2021a-CUDA-11.3.1/googleauth/google-auth-1.35.0 google-auth-oauthlib @ 
file:///dev/shm/TensorFlow/2.6.0/foss-2021a-CUDA-11.3.1/googleauthoauthlib/google-auth-oauthlib-0.4.5 google-cloud-bigquery==3.27.0 google-cloud-core==2.4.1 google-crc32c==1.6.0 google-pasta @ file:///dev/shm/TensorFlow/2.6.0/foss-2021a-CUDA-11.3.1/googlepasta/google-pasta-0.2.0 google-resumable-media==2.7.2 googleapis-common-protos==1.66.0 greenlet==3.1.1 grpcio @ file:///dev/shm/TensorFlow/2.6.0/foss-2021a-CUDA-11.3.1/grpcio/grpcio-1.39.0 grpcio-status==1.68.0 gviz-api @ file:///dev/shm/TensorFlow/2.6.0/foss-2021a-CUDA-11.3.1/gvizapi/gviz_api-1.9.0 h11==0.14.0 h5py @ file:///dev/shm/h5py/3.2.1/foss-2021a/h5py-3.2.1 html5lib @ file:///dev/shm/Python/3.9.5/GCCcore-10.3.0/html5lib/html5lib-1.1 httpcore==1.0.7 httpx==0.27.2 humanfriendly==10.0 idna @ file:///dev/shm/Python/3.9.5/GCCcore-10.3.0/idna/idna-2.10 imagesize @ file:///dev/shm/Python/3.9.5/GCCcore-10.3.0/imagesize/imagesize-1.2.0 importlib_metadata==8.5.0 iniconfig @ file:///dev/shm/Python/3.9.5/GCCcore-10.3.0/iniconfig/iniconfig-1.1.1 intervaltree @ file:///dev/shm/Python/3.9.5/GCCcore-10.3.0/intervaltree/intervaltree-3.1.0 intreehooks @ file:///dev/shm/Python/3.9.5/GCCcore-10.3.0/intreehooks/intreehooks-1.0 ipaddress @ file:///dev/shm/Python/3.9.5/GCCcore-10.3.0/ipaddress/ipaddress-1.0.23 ipykernel==6.9.1 ipython==8.1.1 ipython-genutils==0.2.0 ipython-sql==0.5.0 isoduration==20.11.0 jax @ file:///dev/shm/jax/0.2.24/foss-2021a-CUDA-11.3.1/jax/jax-jax-v0.2.24 jaxlib @ file:///dev/shm/jax/0.2.24/foss-2021a-CUDA-11.3.1/jax-jaxlib-v0.1.73/dist/jaxlib-0.1.73-cp39-none-manylinux2010_x86_64.whl jedi==0.18.1 jeepney @ file:///grid/it/data/elzar/easybuild/sources/p/Python/extensions/jeepney-0.6.0-py3-none-any.whl Jinja2==3.1.4 joblib @ file:///dev/shm/Python/3.9.5/GCCcore-10.3.0/joblib/joblib-1.0.1 json5==0.10.0 jsonpointer==3.0.0 jsonschema==4.23.0 jsonschema-specifications==2024.10.1 jupyter-events==0.10.0 jupyter-lsp==2.2.5 jupyter_client==7.4.9 jupyter_core==5.7.2 jupyter_server==2.14.2 
jupyter_server_terminals==0.5.3 jupyterlab==4.3.1 jupyterlab-lsp==5.1.0 jupyterlab_pygments==0.3.0 jupyterlab_server==2.27.3 keras @ file:///grid/it/data/elzar/easybuild/sources/t/TensorFlow/extensions/keras-2.6.0-py2.py3-none-any.whl Keras-Preprocessing @ file:///dev/shm/TensorFlow/2.6.0/foss-2021a-CUDA-11.3.1/Keras_Preprocessing/Keras_Preprocessing-1.1.2 keyring @ file:///dev/shm/Python/3.9.5/GCCcore-10.3.0/keyring/keyring-21.8.0 keyrings.alt @ file:///dev/shm/Python/3.9.5/GCCcore-10.3.0/keyringsalt/keyrings.alt-4.0.2 kiwisolver==1.3.2 liac-arff @ file:///dev/shm/Python/3.9.5/GCCcore-10.3.0/liacarff/liac-arff-2.5.0 lockfile @ file:///dev/shm/Python/3.9.5/GCCcore-10.3.0/lockfile/lockfile-0.12.2 MACS2==2.2.6 Markdown @ file:///dev/shm/TensorFlow/2.6.0/foss-2021a-CUDA-11.3.1/Markdown/Markdown-3.3.4 MarkupSafe==3.0.2 matplotlib==3.5.1 matplotlib-inline==0.1.3 mistune==3.0.2 mock @ file:///dev/shm/Python/3.9.5/GCCcore-10.3.0/mock/mock-4.0.3 more-itertools @ file:///dev/shm/Python/3.9.5/GCCcore-10.3.0/moreitertools/more-itertools-8.7.0 mpi4py @ file:///dev/shm/SciPybundle/2021.05/foss-2021a/mpi4py/mpi4py-3.0.3 mpmath @ file:///dev/shm/SciPybundle/2021.05/foss-2021a/mpmath/mpmath-1.2.1 msgpack @ file:///dev/shm/Python/3.9.5/GCCcore-10.3.0/msgpack/msgpack-1.0.2 nbclient==0.10.0 nbconvert==7.16.4 nbformat==5.10.4 nest-asyncio==1.5.4 netaddr @ file:///dev/shm/Python/3.9.5/GCCcore-10.3.0/netaddr/netaddr-0.8.0 netifaces @ file:///dev/shm/Python/3.9.5/GCCcore-10.3.0/netifaces/netifaces-0.10.9 nose @ file:///dev/shm/Python/3.9.5/GCCcore-10.3.0/nose/nose-1.3.7 notebook_shim==0.2.4 numexpr @ file:///dev/shm/SciPybundle/2021.05/foss-2021a/numexpr/numexpr-2.7.3 numpy==1.22.3 oauthlib @ file:///dev/shm/TensorFlow/2.6.0/foss-2021a-CUDA-11.3.1/oauthlib/oauthlib-3.1.1 opt-einsum @ file:///dev/shm/jax/0.2.24/foss-2021a-CUDA-11.3.1/opt_einsum/opt_einsum-3.3.0 overrides==7.7.0 packaging==24.2 pandas @ file:///dev/shm/SciPybundle/2021.05/foss-2021a/pandas/pandas-1.2.4 pandocfilters==1.5.1 
paramiko @ file:///dev/shm/Python/3.9.5/GCCcore-10.3.0/paramiko/paramiko-2.7.2 parso==0.8.3 pastel @ file:///grid/it/data/elzar/easybuild/sources/p/Python/extensions/pastel-0.2.1-py2.py3-none-any.whl pathlib2 @ file:///dev/shm/Python/3.9.5/GCCcore-10.3.0/pathlib2/pathlib2-2.3.5 paycheck @ file:///dev/shm/Python/3.9.5/GCCcore-10.3.0/paycheck/paycheck-1.0.2 pbr @ file:///dev/shm/Python/3.9.5/GCCcore-10.3.0/pbr/pbr-5.6.0 pexpect @ file:///dev/shm/Python/3.9.5/GCCcore-10.3.0/pexpect/pexpect-4.8.0 pickleshare==0.7.5 Pillow @ file:///dev/shm/Pillow/8.2.0/GCCcore-10.3.0/Pillow-8.2.0 pkginfo @ file:///dev/shm/Python/3.9.5/GCCcore-10.3.0/pkginfo/pkginfo-1.7.0 platformdirs==4.3.6 pluggy @ file:///dev/shm/Python/3.9.5/GCCcore-10.3.0/pluggy/pluggy-0.13.1 poetry @ file:///dev/shm/Python/3.9.5/GCCcore-10.3.0/poetry/poetry-1.1.6 poetry-core @ file:///dev/shm/Python/3.9.5/GCCcore-10.3.0/poetrycore/poetry-core-1.0.3 portpicker @ file:///dev/shm/TensorFlow/2.6.0/foss-2021a-CUDA-11.3.1/portpicker/portpicker-1.4.0 prettytable==3.12.0 prometheus_client==0.21.0 promise==2.3 prompt-toolkit==3.0.28 proto-plus==1.25.0 protobuf @ file:///dev/shm/protobufpython/3.17.3/GCCcore-10.3.0/protobuf-3.17.3 psutil @ file:///dev/shm/Python/3.9.5/GCCcore-10.3.0/psutil/psutil-5.8.0 ptyprocess @ file:///grid/it/data/elzar/easybuild/sources/p/Python/extensions/ptyprocess-0.7.0-py2.py3-none-any.whl pure-eval==0.2.2 py @ file:///dev/shm/Python/3.9.5/GCCcore-10.3.0/py/py-1.10.0 py-expression-eval @ file:///dev/shm/Python/3.9.5/GCCcore-10.3.0/py_expression_eval/py_expression_eval-0.3.13 pyasn1 @ file:///dev/shm/Python/3.9.5/GCCcore-10.3.0/pyasn1/pyasn1-0.4.8 pyasn1-modules @ file:///dev/shm/TensorFlow/2.6.0/foss-2021a-CUDA-11.3.1/pyasn1modules/pyasn1-modules-0.2.8 pybind11 @ file:///dev/shm/pybind11/2.6.2/GCCcore-10.3.0/pybind11-2.6.2 pycparser @ file:///dev/shm/Python/3.9.5/GCCcore-10.3.0/pycparser/pycparser-2.20 pycrypto @ file:///dev/shm/Python/3.9.5/GCCcore-10.3.0/pycrypto/pycrypto-2.6.1 pyfaidx==0.8.1.3 
Pygments @ file:///dev/shm/Python/3.9.5/GCCcore-10.3.0/Pygments/Pygments-2.9.0 pygrgl==1.3 pylev @ file:///dev/shm/Python/3.9.5/GCCcore-10.3.0/pylev/pylev-1.3.0 PyNaCl @ file:///dev/shm/Python/3.9.5/GCCcore-10.3.0/PyNaCl/PyNaCl-1.4.0 pyparsing @ file:///dev/shm/Python/3.9.5/GCCcore-10.3.0/pyparsing/pyparsing-2.4.7 pyrsistent @ file:///dev/shm/Python/3.9.5/GCCcore-10.3.0/pyrsistent/pyrsistent-0.17.3 pysam==0.22.1 pytest @ file:///dev/shm/Python/3.9.5/GCCcore-10.3.0/pytest/pytest-6.2.4 python-dateutil==2.9.0.post0 python-json-logger==2.0.7 pytoml @ file:///dev/shm/Python/3.9.5/GCCcore-10.3.0/pytoml/pytoml-0.1.21 pytz @ file:///dev/shm/Python/3.9.5/GCCcore-10.3.0/pytz/pytz-2021.1 PyVCF==0.6.8 PyYAML @ file:///dev/shm/PyYAML/5.4.1/GCCcore-10.3.0/PyYAML-5.4.1 pyzmq==26.2.0 referencing==0.35.1 regex @ file:///dev/shm/Python/3.9.5/GCCcore-10.3.0/regex/regex-2021.4.4 requests==2.32.3 requests-oauthlib @ file:///dev/shm/TensorFlow/2.6.0/foss-2021a-CUDA-11.3.1/requestsoauthlib/requests-oauthlib-1.3.0 requests-toolbelt @ file:///dev/shm/Python/3.9.5/GCCcore-10.3.0/requeststoolbelt/requests-toolbelt-0.9.1 rfc3339-validator==0.1.4 rfc3986-validator==0.1.1 rpds-py==0.21.0 rsa @ file:///dev/shm/TensorFlow/2.6.0/foss-2021a-CUDA-11.3.1/rsa/rsa-4.7.2 sacremoses==0.0.47 scandir @ file:///dev/shm/Python/3.9.5/GCCcore-10.3.0/scandir/scandir-1.10.0 scipy @ file:///dev/shm/SciPybundle/2021.05/foss-2021a/scipy/scipy-1.6.3 SecretStorage @ file:///dev/shm/Python/3.9.5/GCCcore-10.3.0/SecretStorage/SecretStorage-3.3.1 semantic-version @ file:///dev/shm/Python/3.9.5/GCCcore-10.3.0/semantic_version/semantic_version-2.8.5 Send2Trash==1.8.3 setuptools-rust @ file:///dev/shm/Python/3.9.5/GCCcore-10.3.0/setuptoolsrust/setuptools-rust-0.12.1 setuptools-scm @ file:///dev/shm/Python/3.9.5/GCCcore-10.3.0/setuptools_scm/setuptools_scm-6.0.1 shellingham @ file:///dev/shm/Python/3.9.5/GCCcore-10.3.0/shellingham/shellingham-1.4.0 simplegeneric @ 
file:///dev/shm/Python/3.9.5/GCCcore-10.3.0/simplegeneric/simplegeneric-0.8.1 simplejson @ file:///dev/shm/Python/3.9.5/GCCcore-10.3.0/simplejson/simplejson-3.17.2 six @ file:///dev/shm/Python/3.9.5/GCCcore-10.3.0/six/six-1.16.0 sniffio==1.3.1 snowballstemmer @ file:///dev/shm/Python/3.9.5/GCCcore-10.3.0/snowballstemmer/snowballstemmer-2.1.0 sortedcontainers @ file:///dev/shm/Python/3.9.5/GCCcore-10.3.0/sortedcontainers/sortedcontainers-2.3.0 soupsieve==2.6 Sphinx @ file:///dev/shm/Python/3.9.5/GCCcore-10.3.0/Sphinx/Sphinx-4.0.0 sphinx-bootstrap-theme @ file:///dev/shm/Python/3.9.5/GCCcore-10.3.0/sphinxbootstraptheme/sphinx-bootstrap-theme-0.7.1 sphinxcontrib-applehelp @ file:///dev/shm/Python/3.9.5/GCCcore-10.3.0/sphinxcontribapplehelp/sphinxcontrib-applehelp-1.0.2 sphinxcontrib-devhelp @ file:///dev/shm/Python/3.9.5/GCCcore-10.3.0/sphinxcontribdevhelp/sphinxcontrib-devhelp-1.0.2 sphinxcontrib-htmlhelp @ file:///dev/shm/Python/3.9.5/GCCcore-10.3.0/sphinxcontribhtmlhelp/sphinxcontrib-htmlhelp-1.0.3 sphinxcontrib-jsmath @ file:///dev/shm/Python/3.9.5/GCCcore-10.3.0/sphinxcontribjsmath/sphinxcontrib-jsmath-1.0.1 sphinxcontrib-qthelp @ file:///dev/shm/Python/3.9.5/GCCcore-10.3.0/sphinxcontribqthelp/sphinxcontrib-qthelp-1.0.3 sphinxcontrib-serializinghtml @ file:///dev/shm/Python/3.9.5/GCCcore-10.3.0/sphinxcontribserializinghtml/sphinxcontrib-serializinghtml-1.1.4 sphinxcontrib-websupport @ file:///dev/shm/Python/3.9.5/GCCcore-10.3.0/sphinxcontribwebsupport/sphinxcontrib-websupport-1.2.4 SQLAlchemy==2.0.36 sqlparse==0.5.2 stack-data==0.2.0 tabulate @ file:///dev/shm/Python/3.9.5/GCCcore-10.3.0/tabulate/tabulate-0.8.9 tblib @ file:///dev/shm/TensorFlow/2.6.0/foss-2021a-CUDA-11.3.1/tblib/tblib-1.7.0 tensorboard @ file:///grid/it/data/elzar/easybuild/sources/t/TensorFlow/extensions/tensorboard-2.6.0-py3-none-any.whl tensorboard-data-server @ file:///grid/it/data/elzar/easybuild/sources/t/TensorFlow/extensions/tensorboard_data_server-0.6.1-py3-none-any.whl 
tensorboard-plugin-profile @ file:///dev/shm/TensorFlow/2.6.0/foss-2021a-CUDA-11.3.1/tensorboard_plugin_profile/tensorboard_plugin_profile-2.5.0 tensorboard-plugin-wit @ file:///grid/it/data/elzar/easybuild/sources/t/TensorFlow/extensions/tensorboard_plugin_wit-1.8.0-py3-none-any.whl tensorflow @ file:///dev/shm/TensorFlow/2.6.0/foss-2021a-CUDA-11.3.1/tensorflow-2.6.0-cp39-cp39-linux_x86_64.whl tensorflow-estimator @ file:///grid/it/data/elzar/easybuild/sources/t/TensorFlow/extensions/tensorflow_estimator-2.6.0-py2.py3-none-any.whl termcolor @ file:///dev/shm/TensorFlow/2.6.0/foss-2021a-CUDA-11.3.1/termcolor/termcolor-1.1.0 terminado==0.18.1 threadpoolctl @ file:///dev/shm/Python/3.9.5/GCCcore-10.3.0/threadpoolctl/threadpoolctl-2.1.0 tinycss2==1.4.0 tokenizers==0.11.4 toml @ file:///dev/shm/Python/3.9.5/GCCcore-10.3.0/toml/toml-0.10.2 tomli==2.1.0 tomlkit @ file:///grid/it/data/elzar/easybuild/sources/p/Python/extensions/tomlkit-0.7.0-py2.py3-none-any.whl toolz==0.11.2 torch==1.10.0 tornado==6.4.2 tqdm==4.62.3 traitlets==5.14.3 types-python-dateutil==2.9.0.20241003 typing-extensions @ file:///dev/shm/typingextensions/3.10.0.0/GCCcore-10.3.0/typing_extensions-3.10.0.0 ujson @ file:///dev/shm/Python/3.9.5/GCCcore-10.3.0/ujson/ujson-4.0.2 uri-template==1.3.0 urllib3 @ file:///dev/shm/Python/3.9.5/GCCcore-10.3.0/urllib3/urllib3-1.26.4 vcf2seq==0.7.4 virtualenv @ file:///dev/shm/Python/3.9.5/GCCcore-10.3.0/virtualenv/virtualenv-20.4.6 wcwidth @ file:///dev/shm/Python/3.9.5/GCCcore-10.3.0/wcwidth/wcwidth-0.2.5 webcolors==24.11.1 webencodings @ file:///dev/shm/Python/3.9.5/GCCcore-10.3.0/webencodings/webencodings-0.5.1 websocket-client==1.8.0 Werkzeug @ file:///dev/shm/TensorFlow/2.6.0/foss-2021a-CUDA-11.3.1/Werkzeug/Werkzeug-2.0.1 wrapt @ file:///dev/shm/TensorFlow/2.6.0/foss-2021a-CUDA-11.3.1/wrapt/wrapt-1.12.1 xlrd @ file:///dev/shm/Python/3.9.5/GCCcore-10.3.0/xlrd/xlrd-2.0.1 zipp==3.21.0 ```


bschilder · 2024-12-03T15:19:41Z

@daler might you be able to provide some guidance on this?

daler · 2024-12-03T19:16:48Z

Does the issue occur when using a smaller file? E.g., head -n 10000 from one of the files?
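If you'd rather make the subset from Python (for example while keeping the download gzipped), a minimal sketch might look like the following. `head_gff` is a hypothetical helper, not part of gffutils, and it detects gzip input from the `.gz` suffix only.

```python
import gzip
import itertools


def head_gff(src, dest, n_lines=10_000):
    """Copy the first n_lines of a GFF/GTF file (plain or gzipped) to dest.

    Roughly equivalent to `head -n 10000 src > dest`. Gzip input is
    detected from the ".gz" suffix only (an assumption, not a magic-byte check).
    """
    opener = gzip.open if src.endswith(".gz") else open
    with opener(src, "rt") as fin, open(dest, "w") as fout:
        # islice stops after n_lines, so the rest of the file is never read
        fout.writelines(itertools.islice(fin, n_lines))
    return dest
```

For example, `head_gff("gencode.v47.annotation.gff3.gz", "tmp.gff3")` writes a 10k-line test file that can then be passed to `gffutils.create_db` in place of the full annotation.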

daler · 2024-12-03T22:32:34Z

Also, merge_strategy="merge" makes it really slow to build a database. It's clear why the default, merge_strategy="error", doesn't work: for example, the GFF has two different CDS entries for the same transcript that share the ID CDS:ENST00000641515.2.

chr1    HAVANA  CDS     65565   65573   .       +       0       ID=CDS:ENST00000641515.2;Parent=ENST00000641515.2;gene_id=ENSG00000186092.7;transcript_id=ENST00000641515.2;gene_type=protein_coding;gene_name=OR4F5;transcript_type=protein_coding;transcript_name=OR4F5-201;exon_number=2;exon_id=ENSE00003813641.1;level=2;protein_id=ENSP00000493376.2;hgnc_id=HGNC:14825;tag=RNA_Seq_supported_partial,basic,Ensembl_canonical,GENCODE_Primary,MANE_Select,appris_principal_1,CCDS;ccdsid=CCDS30547.2;havana_gene=OTTHUMG00000001094.4;havana_transcript=OTTHUMT00000003223.4
chr1    HAVANA  CDS     69037   70008   .       +       0       ID=CDS:ENST00000641515.2;Parent=ENST00000641515.2;gene_id=ENSG00000186092.7;transcript_id=ENST00000641515.2;gene_type=protein_coding;gene_name=OR4F5;transcript_type=protein_coding;transcript_name=OR4F5-201;exon_number=3;exon_id=ENSE00003813949.1;level=2;protein_id=ENSP00000493376.2;hgnc_id=HGNC:14825;tag=RNA_Seq_supported_partial,basic,Ensembl_canonical,GENCODE_Primary,MANE_Select,appris_principal_1,CCDS;ccdsid=CCDS30547.2;havana_gene=OTTHUMG00000001094.4;havana_transcript=OTTHUMT00000003223.4

In such a case, it's unclear how best (or even whether) to merge them, since they have different start/stop coordinates and correspond to different exons (exon_number 2 vs. 3). Since it's unclear which one should be returned if you asked for CDS:ENST00000641515.2, I think I would prefer merge_strategy="create_unique".
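As a rough illustration of the idea behind "create_unique" (keep every duplicate as its own feature instead of merging), the sketch below leaves the first occurrence of an ID alone and gives later repeats a numeric suffix. This is not gffutils' own code, and the `<id>_<n>` suffix scheme is an assumption; inspect the IDs in your built db rather than relying on this exact format.

```python
from collections import defaultdict


def make_unique_ids(ids):
    """Return IDs where repeats get a numeric suffix (hypothetical scheme).

    The first occurrence keeps its original ID; later ones become
    <id>_1, <id>_2, ... Note this toy version does not check whether a
    generated ID collides with an ID that already exists in the input.
    """
    seen = defaultdict(int)
    out = []
    for feature_id in ids:
        count = seen[feature_id]
        out.append(feature_id if count == 0 else f"{feature_id}_{count}")
        seen[feature_id] += 1
    return out
```

Applied to the two CDS lines above, `make_unique_ids(["CDS:ENST00000641515.2", "CDS:ENST00000641515.2"])` gives `["CDS:ENST00000641515.2", "CDS:ENST00000641515.2_1"]`, so both records survive with distinct keys and neither has to be picked as "the" CDS.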

I wonder if there was an issue in creating the database in your case -- out of memory, or timed out or something -- because adding the version to the database is the last thing to happen, and the error message implies that information doesn't exist. Testing with the first, say, 10k lines will help diagnose that.
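Since the traceback shows `FeatureDB` running `SELECT version, dialect FROM meta` and failing when no row comes back, a quick way to tell a finished db from an interrupted one is to run the same query yourself before opening it. A minimal sketch using only the standard-library `sqlite3` module; `db_is_complete` is a hypothetical helper name, and it assumes the meta row is written last, as described above.

```python
import os
import sqlite3


def db_is_complete(dbfn):
    """Return True if a gffutils-style SQLite file has a row in its meta table.

    Assumes the meta (version, dialect) row is written at the end of the
    build, so a missing table or an empty result suggests it never finished.
    """
    if not os.path.exists(dbfn):
        return False  # avoid sqlite3.connect silently creating an empty file
    conn = sqlite3.connect(dbfn)
    try:
        row = conn.execute("SELECT version, dialect FROM meta").fetchone()
    except sqlite3.DatabaseError:
        # e.g. "no such table: meta", or the file is not a SQLite db at all
        return False
    finally:
        conn.close()
    return row is not None
```

If this returns False for `gencode.v47.annotation.db`, the file is most likely a partial build and should be deleted (or rebuilt with `force=True`) rather than reopened.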

bschilder · 2024-12-03T23:01:49Z

Thanks for the reply @daler

> Does the issue occur when using a smaller file? E.g., head -n 10000 from one of the files?

!head -1000 GRCh38/gencode.v47.annotation.gff3 > tmp.gff3

Using a small subset seems to work fine.

dbfn = 'tmp.db'
db = gffutils.create_db("tmp.gff3",
                        dbfn=dbfn,
                        force=False,
                        keep_order=True,
                        merge_strategy='merge',
                        sort_attribute_values=True)
db = gffutils.FeatureDB(dbfn, keep_order=True)
db
# <gffutils.interface.FeatureDB at 0x155362e9b910>

> Also, the merge_strategy="merge" makes it really slow to build a database. It's clear why the default of merge_strategy="error" doesn't work, for example with two different CDS entries for the same transcript in the GFF, CDS:ENST00000641515.2.
>
> chr1    HAVANA  CDS     65565   65573   .       +       0       ID=CDS:ENST00000641515.2;Parent=ENST00000641515.2;gene_id=ENSG00000186092.7;transcript_id=ENST00000641515.2;gene_type=protein_coding;gene_name=OR4F5;transcript_type=protein_coding;transcript_name=OR4F5-201;exon_number=2;exon_id=ENSE00003813641.1;level=2;protein_id=ENSP00000493376.2;hgnc_id=HGNC:14825;tag=RNA_Seq_supported_partial,basic,Ensembl_canonical,GENCODE_Primary,MANE_Select,appris_principal_1,CCDS;ccdsid=CCDS30547.2;havana_gene=OTTHUMG00000001094.4;havana_transcript=OTTHUMT00000003223.4
> chr1    HAVANA  CDS     69037   70008   .       +       0       ID=CDS:ENST00000641515.2;Parent=ENST00000641515.2;gene_id=ENSG00000186092.7;transcript_id=ENST00000641515.2;gene_type=protein_coding;gene_name=OR4F5;transcript_type=protein_coding;transcript_name=OR4F5-201;exon_number=3;exon_id=ENSE00003813949.1;level=2;protein_id=ENSP00000493376.2;hgnc_id=HGNC:14825;tag=RNA_Seq_supported_partial,basic,Ensembl_canonical,GENCODE_Primary,MANE_Select,appris_principal_1,CCDS;ccdsid=CCDS30547.2;havana_gene=OTTHUMG00000001094.4;havana_transcript=OTTHUMT00000003223.4
>
> In such a case, it's unclear how best (or even) to merge them since they have different start/stops and are children of different exons. Since it's unclear which one should be returned if you asked for CDS:ENST00000641515.2, I think I would prefer merge_strategy="create_unique".

OK, though I should note I'm using the official GENCODE release annotations. Does this imply they are not following the expected standards for gff3?

I'll try merge_strategy="create_unique" with the full gff3 file and see if that helps (this will take a while).

> I wonder if there was an issue in creating the database in your case -- out of memory, or timed out or something -- because adding the version to the database is the last thing to happen, and the error message implies that information doesn't exist. Testing with the first, say, 10k lines will help diagnose that.

I don't think memory should normally be an issue: I'm running this on my interactive HPC session with 16 cores and 258 GB of RAM. Unless, that is, gffutils processes the data in a way that makes memory usage explode.

Another possibility is that creating the database takes so long that my interactive session times out after 12 hours (the maximum I can request). Though I'd hope it wouldn't take that long to process one file.
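If the 12-hour limit is the culprit, one defensive pattern is to build into a temporary file and rename it only after the build returns, so a killed job cannot leave a half-built db that a later `os.path.exists(dbfn)` check (like the one in the traceback) mistakes for a finished one. A minimal sketch; `build_atomically` and the `.partial` suffix are hypothetical names, and `build` is any callable that writes a database to the path it is given.

```python
import os


def build_atomically(dbfn, build):
    """Run build(tmp_path), then move the result to dbfn only on success.

    Example (assumes gffutils is installed and imported):
        build_atomically("gencode.v47.annotation.db",
                         lambda p: gffutils.create_db(
                             "gencode.v47.annotation.gff3", dbfn=p,
                             merge_strategy="create_unique"))
    """
    tmp = dbfn + ".partial"
    if os.path.exists(tmp):
        os.remove(tmp)  # leftover from an earlier interrupted run
    try:
        build(tmp)
    except BaseException:
        # also covers KeyboardInterrupt; a hard kill just leaves tmp behind
        if os.path.exists(tmp):
            os.remove(tmp)
        raise
    os.replace(tmp, dbfn)  # atomic rename when both paths share a filesystem
    return dbfn
```

After the rename, reopen the result with `gffutils.FeatureDB(dbfn)` rather than reusing the object returned by `create_db`, since that object was opened against the temporary path.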

bschilder · 2024-12-03T23:14:11Z

FYI, the gencode.v47.annotation.gff3 file is 1.7 GB. What's the largest file you've successfully run gffutils on, @daler?

daler · 2024-12-04T16:39:06Z

I use it on GENCODE files all the time; you can also leave the file gzipped to save a little space. It just so happens that the arguments you're using trigger complex behavior that can be helpful in some cases, but probably not in the general case.

The following runs in ~8 minutes using under 200 MB of RAM in total:

import gffutils

gffutils.create_db(
    "gencode.gff.gz",
    dbfn="gencode_gff.db",
    merge_strategy="create_unique",
    verbose=True)

Or, for GTF,

gffutils.create_db(
    "gencode.gtf.gz",
    dbfn="gencode_gtf.db",
    merge_strategy="create_unique",
    disable_infer_transcripts=True,
    disable_infer_genes=True,
    verbose=True)

Regarding specs...

GFF expects every feature to have a unique ID (see this entry in the spec), while the GTF spec does not include transcript or gene features at all; per the spec, they are expected to be inferred from exons.

So no, GENCODE GFF and GTF files do not follow the specs, hence the detection and warnings built in for when you try to build a db from GENCODE files. But honestly, hardly anyone follows the specs... hence needing to build gffutils in the first place to deal with all that messiness!
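To see how often IDs repeat in a given file before choosing a merge_strategy, a small standard-library scan of column 9 is enough. This is a minimal sketch, not a full GFF3 parser: it takes the first `ID=` attribute on each feature line and does not URL-decode values. Keep in mind that the GFF3 spec does let a single discontinuous feature (such as a multi-exon CDS) repeat its ID across lines, so a hit here is not necessarily a spec violation, but it is exactly the situation where the default merge_strategy="error" stops.

```python
from collections import Counter


def duplicate_ids(lines):
    """Return {ID: count} for GFF3 ID= values seen on more than one line.

    Skips blank and '#' lines and anything without 9 tab-separated
    columns (which also skips FASTA sequence lines after ##FASTA).
    """
    counts = Counter()
    for line in lines:
        if not line.strip() or line.startswith("#"):
            continue
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 9:
            continue
        for attr in fields[8].split(";"):
            if attr.startswith("ID="):
                counts[attr[3:]] += 1
                break  # only the first ID= per line is counted
    return {k: v for k, v in counts.items() if v > 1}
```

For a gzipped annotation, pass an open text handle, e.g. `with gzip.open("gencode.v47.annotation.gff3.gz", "rt") as f: dups = duplicate_ids(f)`.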

For your original example, when you use the merge_strategy="merge", that triggers some rather complex behavior that involves scanning everything in the database so far to figure out what the merge candidates are. I haven't checked it, but this is probably something approaching O(n^2) complexity and I would not be surprised if spending all that effort on a GENCODE-size file took >12 hrs. In that case, I bet what happened is that the job timed out and never created the version entry, giving the original error you reported.
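To make that scaling concrete, here is a toy model (not gffutils' actual merge code) comparing a per-feature scan of everything loaded so far with a keyed lookup. The naive scan makes n(n-1)/2 comparisons, so a file with 1,000 times more lines needs roughly a million times more comparisons, while the keyed version grows linearly.

```python
def find_dups_naive(ids):
    """Compare each new ID with every earlier one: n*(n-1)/2 comparisons."""
    comparisons, dups = 0, set()
    for i, current in enumerate(ids):
        for earlier in ids[:i]:
            comparisons += 1
            if earlier == current:
                dups.add(current)
    return dups, comparisons


def find_dups_indexed(ids):
    """Check each ID against a set of seen IDs: one lookup per ID."""
    seen, dups = set(), set()
    for current in ids:
        if current in seen:
            dups.add(current)
        seen.add(current)
    return dups, len(ids)
```

Both functions find the same duplicates; only the amount of work differs. For 100 IDs the naive version already makes 4,950 comparisons versus 100 lookups.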

Also, keep_order=True and sort_attribute_values=True are really only useful for tests, or when it's very important to retain round-trip invariance. They don't add that much work, but it's something. It's the merge stuff that's super time-consuming though.

daler · 2024-12-16T18:34:11Z

Closing because I think everything is behaving as expected, but please reopen if you have any issues with the adjusted arguments.

bschilder · 2024-12-16T19:50:32Z

Thanks so much for the detailed response @daler. Trying this again now.

Just a note, my example used gff3 format (not gff or gtf as in your examples). Not sure if this makes a difference.

bschilder · 2024-12-16T20:21:18Z

@daler, I'm still encountering the same issue as before with the gff3 file from my initial reproducible example: the function hangs indefinitely, even with the modified arguments.

gff_fn = "GRCh38/gencode.v47.annotation.gff3"
dbfn = f"{gff_fn}.db"
db = gffutils.create_db(gff_fn,
                        dbfn=dbfn,
                        merge_strategy='create_unique')

daler closed this as completed Dec 16, 2024
