
Feature Request: Faster Data Imports #2

@zachcp

Feature Request: Faster Ingest Scripts

Is your feature request related to a problem? Please describe.
Loading many genomes (say, thousands) into the AS5 DB takes an exceedingly long time.

Describe the solution you'd like
Faster ingest scripts. Right now, each transaction makes many database calls that are needed for data consistency. Could the data instead be checked for consistency offline, so that all of the tables corresponding to a cluster, genome, or set of genomes are committed at once with a single COPY call? I think this could yield order-of-magnitude gains.
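A minimal sketch of the proposed flow, assuming a PostgreSQL backend: run the consistency checks in memory, then serialize each table into PostgreSQL's COPY text format so it can be loaded in one statement per table. The `Genome`/`Cluster` classes, the `genomes`/`clusters` tables, their columns, and the helpers `check_consistency` and `to_copy_buffer` are all hypothetical stand-ins; the real AS5 DB schema is much larger and is not described in this issue.

```python
import io
from dataclasses import dataclass


# Hypothetical, heavily simplified tables; the real AS5 DB schema is much larger.
@dataclass
class Genome:
    genome_id: int
    name: str


@dataclass
class Cluster:
    cluster_id: int
    genome_id: int  # foreign key -> Genome.genome_id
    start_pos: int
    end_pos: int


def check_consistency(genomes, clusters):
    """Run consistency checks in memory instead of one DB query per row."""
    genome_ids = {g.genome_id for g in genomes}
    if len(genome_ids) != len(genomes):
        raise ValueError("duplicate genome_id")
    seen_clusters = set()
    for c in clusters:
        if c.genome_id not in genome_ids:
            raise ValueError(f"cluster {c.cluster_id} references unknown genome {c.genome_id}")
        if c.cluster_id in seen_clusters:
            raise ValueError(f"duplicate cluster_id {c.cluster_id}")
        if not 0 <= c.start_pos < c.end_pos:
            raise ValueError(f"cluster {c.cluster_id} has invalid coordinates")
        seen_clusters.add(c.cluster_id)


def _copy_escape(value):
    """Encode one field in PostgreSQL COPY text format."""
    if value is None:
        return r"\N"  # COPY's default NULL marker
    text = str(value)
    # Escape the backslash first, then the delimiter and line-break characters.
    return (text.replace("\\", "\\\\")
                .replace("\t", "\\t")
                .replace("\n", "\\n")
                .replace("\r", "\\r"))


def to_copy_buffer(rows):
    """Serialize rows into an in-memory, tab-delimited COPY text stream."""
    buf = io.StringIO()
    for row in rows:
        buf.write("\t".join(_copy_escape(v) for v in row) + "\n")
    buf.seek(0)
    return buf


genomes = [Genome(1, "Streptomyces sp. A"), Genome(2, "Streptomyces sp. B")]
clusters = [Cluster(10, 1, 100, 5000), Cluster(11, 2, 200, 9000)]
check_consistency(genomes, clusters)  # raises before anything touches the DB

genome_buf = to_copy_buffer((g.genome_id, g.name) for g in genomes)
cluster_buf = to_copy_buffer((c.cluster_id, c.genome_id, c.start_pos, c.end_pos) for c in clusters)

# With psycopg2, both buffers could then be loaded in one transaction, e.g.:
#   with conn, conn.cursor() as cur:
#       cur.copy_expert("COPY genomes (genome_id, name) FROM STDIN", genome_buf)
#       cur.copy_expert("COPY clusters (cluster_id, genome_id, start_pos, end_pos) FROM STDIN", cluster_buf)
```

The design choice here is that every check the database would otherwise enforce row by row (foreign keys, uniqueness, coordinate sanity) is done once over the whole batch, so the database only sees a few bulk COPY statements.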

Describe alternatives you've considered
Writing each genome to its own SQLite DB in parallel, then dumping those DBs and ingesting the dumps with COPY.
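A rough sketch of this alternative, assuming a shared per-genome schema: each genome is written to its own SQLite file concurrently, and each table is then concatenated across all shards into a CSV stream that PostgreSQL could ingest with `COPY ... FROM STDIN WITH (FORMAT csv)`. The schema, table names, and the helpers `write_genome_db` and `dump_table_csv` are hypothetical. `ThreadPoolExecutor` keeps the example simple; if per-genome parsing is CPU-bound Python, a `ProcessPoolExecutor` would likely be the better choice.

```python
import csv
import io
import os
import sqlite3
import tempfile
from concurrent.futures import ThreadPoolExecutor

# Hypothetical minimal schema shared by every per-genome shard.
SCHEMA = """
CREATE TABLE IF NOT EXISTS genomes (
    genome_id INTEGER PRIMARY KEY,
    name      TEXT NOT NULL
);
CREATE TABLE IF NOT EXISTS clusters (
    cluster_id INTEGER PRIMARY KEY,
    genome_id  INTEGER NOT NULL REFERENCES genomes (genome_id),
    start_pos  INTEGER NOT NULL,
    end_pos    INTEGER NOT NULL
);
"""
DUMPABLE_TABLES = ("genomes", "clusters")


def write_genome_db(path, genome_id, name, clusters):
    """Write one genome and its clusters into a standalone SQLite file."""
    con = sqlite3.connect(path)
    try:
        con.executescript(SCHEMA)
        with con:  # a single transaction per genome
            con.execute("INSERT INTO genomes VALUES (?, ?)", (genome_id, name))
            con.executemany(
                "INSERT INTO clusters VALUES (?, ?, ?, ?)",
                [(cid, genome_id, start, end) for cid, start, end in clusters],
            )
    finally:
        con.close()
    return path


def dump_table_csv(shard_paths, table, out):
    """Concatenate one table from every shard into a CSV stream."""
    if table not in DUMPABLE_TABLES:  # the table name is interpolated into SQL below
        raise ValueError(f"unexpected table {table!r}")
    writer = csv.writer(out, lineterminator="\n")
    for shard in shard_paths:
        con = sqlite3.connect(shard)
        try:
            writer.writerows(con.execute(f"SELECT * FROM {table} ORDER BY 1"))
        finally:
            con.close()


jobs = [
    (1, "Streptomyces sp. A", [(10, 100, 5000)]),
    (2, "Streptomyces sp. B", [(11, 200, 9000), (12, 9500, 12000)]),
]
with tempfile.TemporaryDirectory() as tmp:
    def run(job):
        return write_genome_db(os.path.join(tmp, f"genome_{job[0]}.sqlite"), *job)

    with ThreadPoolExecutor(max_workers=4) as pool:
        shards = list(pool.map(run, jobs))  # map() preserves input order

    genomes_csv, clusters_csv = io.StringIO(), io.StringIO()
    dump_table_csv(shards, "genomes", genomes_csv)
    dump_table_csv(shards, "clusters", clusters_csv)
```

Because each shard is independent, a failure while parsing one genome only loses that shard, and the final load into the central database is a handful of bulk COPY calls rather than per-row inserts.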

Additional context
As more groups scale up their genome mining efforts, databases will be used in place of flat files, so fast data ingest and stable, extensible DB schemas will be very important.
