Feature Request: Faster Ingest scripts
Is your feature request related to a problem? Please describe.
If you try to load many genomes (say, thousands) into the AS5 DB, ingest takes an exceedingly long time.
Describe the solution you'd like
Faster ingest scripts. Right now, each transaction makes many database calls that are needed for data consistency. Could the data instead be checked for consistency offline, and then all of the tables corresponding to a cluster, a genome, or a set of genomes be committed at once using a single COPY call? I think this could yield order-of-magnitude gains.
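A minimal sketch of the "validate offline, then COPY once" idea, assuming a PostgreSQL backend. The row layout `(genome_id, region_number, start, end)` and the checks are hypothetical stand-ins, not the actual AS5 DB schema; the point is that all consistency checks run in memory, and the database only sees one bulk load in PostgreSQL's COPY text format (tab-delimited, `\N` for NULL, backslash escapes).

```python
import io

# Hypothetical row layout for illustration only:
# (genome_id, region_number, start, end). Not the real AS5 DB schema.


def check_consistency(genome_ids, regions):
    """Validate rows in memory before touching the database.

    Returns a list of error strings; an empty list means the batch can be loaded.
    """
    errors = []
    seen = set()
    for genome_id, region_number, start, end in regions:
        key = (genome_id, region_number)
        if genome_id not in genome_ids:
            errors.append(f"{key}: unknown genome")
        if start >= end:
            errors.append(f"{key}: start {start} >= end {end}")
        if key in seen:
            errors.append(f"{key}: duplicate region")
        seen.add(key)
    return errors


def copy_escape(value):
    """Render one value in PostgreSQL COPY text format (tab-delimited)."""
    if value is None:
        return r"\N"  # COPY's default NULL marker
    text = str(value)
    # Escape backslashes first so the escapes added below are not doubled.
    return (
        text.replace("\\", "\\\\")
        .replace("\t", "\\t")
        .replace("\n", "\\n")
        .replace("\r", "\\r")
    )


def to_copy_buffer(rows):
    """Serialize validated rows into one in-memory buffer for a single COPY."""
    buf = io.StringIO()
    for row in rows:
        buf.write("\t".join(copy_escape(v) for v in row))
        buf.write("\n")
    buf.seek(0)
    return buf
```

With psycopg2, the buffer could then be loaded in one round trip, e.g. `cursor.copy_expert("COPY regions FROM STDIN", buf)` inside a single transaction, where `regions` is a hypothetical table name. Referential checks that normally need a query per row (does this genome exist?) become set lookups against data already in memory.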
Describe alternatives you've considered
Writing each genome to its own SQLite DB in parallel, then dumping those DBs and ingesting the dumps with COPY.
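A rough sketch of this alternative using only the Python standard library. Each worker writes one genome to its own SQLite file, so workers never contend for the same database; the per-genome files are then streamed into a single target inside one transaction. The one-table schema, the `genomes` input shape, and the function names are hypothetical. The target here is an in-memory SQLite DB standing in for the central database; against PostgreSQL the merged rows would instead be dumped and loaded with COPY.

```python
import os
import sqlite3
import tempfile
from concurrent.futures import ThreadPoolExecutor

# Hypothetical one-table schema standing in for the real per-genome tables.
SCHEMA = """
CREATE TABLE regions (
    genome TEXT NOT NULL,
    region_number INTEGER NOT NULL,
    start_pos INTEGER NOT NULL,
    end_pos INTEGER NOT NULL,
    PRIMARY KEY (genome, region_number)
);
"""
INSERT = "INSERT INTO regions VALUES (?, ?, ?, ?)"


def write_genome_db(out_dir, genome, regions):
    """Write one genome to its own SQLite file (assumes genome names are safe filenames)."""
    path = os.path.join(out_dir, f"{genome}.sqlite")
    conn = sqlite3.connect(path)
    try:
        conn.executescript(SCHEMA)
        with conn:  # one transaction per genome file
            conn.executemany(
                INSERT,
                [(genome, n, s, e) for n, (s, e) in enumerate(regions, start=1)],
            )
    finally:
        conn.close()
    return path


def merge_dbs(paths, target):
    """Stream rows from every per-genome file into the target in a single transaction."""
    target.executescript(SCHEMA)
    with target:
        for path in paths:
            src = sqlite3.connect(path)
            try:
                rows = src.execute(
                    "SELECT genome, region_number, start_pos, end_pos FROM regions"
                )
                target.executemany(INSERT, rows)
            finally:
                src.close()


def build_and_merge(genomes, out_dir, workers=4):
    """Write all genomes in parallel, then merge them into one in-memory target."""
    # Threads keep the sketch simple; CPU-bound parsing would favour processes.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        futures = [pool.submit(write_genome_db, out_dir, g, r) for g, r in genomes.items()]
        paths = [f.result() for f in futures]
    target = sqlite3.connect(":memory:")
    merge_dbs(paths, target)
    return target


if __name__ == "__main__":
    demo = {"genomeA": [(100, 5000), (9000, 15000)], "genomeB": [(200, 3000)]}
    with tempfile.TemporaryDirectory() as tmp:
        db = build_and_merge(demo, tmp)
        print(db.execute("SELECT COUNT(*) FROM regions").fetchone()[0])
        db.close()
```

Each per-genome connection is created and used inside its own worker thread, which keeps the sketch within the sqlite3 module's default thread-safety checks.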
Additional context
As more groups scale up their genome mining efforts, databases will replace flat files, so fast data ingest and stable, extensible DB schemas will be very important.