@douglasgscofield

Adds the ability to set the library path (normally given via --library_path) through the environment variable COMPLEASM_LIBRARY_PATH. The logic is:

  1. change the default of every --library_path option to None
  2. at the end of argument processing, if args.library_path is None, check whether the environment variable COMPLEASM_LIBRARY_PATH is set
  3. if it is set, use its value for args.library_path
  4. if it is not set, fall back to the existing default, mb_downloads, which is created in the current directory

This also modifies the __init__ logic in Downloader to do the same.
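
As a rough illustration of the precedence described above, here is a minimal sketch. It is not the code in this PR: the helper names resolve_library_path and parse_args are hypothetical, and treating an empty COMPLEASM_LIBRARY_PATH as "set" is an assumption of the sketch.

```python
import argparse
import os

ENV_VAR = "COMPLEASM_LIBRARY_PATH"
DEFAULT_LIBRARY_PATH = "mb_downloads"  # default named in the description


def resolve_library_path(cli_value, environ=None):
    """Return the CLI value if given, else the env var, else the default.

    Hypothetical helper for illustration only.
    """
    if environ is None:
        environ = os.environ
    if cli_value is not None:
        return cli_value
    env_value = environ.get(ENV_VAR)
    # Assumption: any value, even an empty string, counts as "set".
    if env_value is not None:
        return env_value
    return DEFAULT_LIBRARY_PATH


def parse_args(argv=None, environ=None):
    """Parse arguments, then resolve library_path once parsing is done."""
    parser = argparse.ArgumentParser()
    # Default is None so "not given on the command line" can be detected.
    parser.add_argument("--library_path", default=None)
    args = parser.parse_args(argv)
    args.library_path = resolve_library_path(args.library_path, environ)
    return args
```

A constructor such as Downloader's could call the same helper when its own library path argument is None, so that both code paths agree on the fallback order.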

This change enables using a central location for lineage sets. That is useful for streamlining project-wide storage and, for example, on HPC clusters such as ours, where we've already downloaded the lineage sets to a single system-wide location for both BUSCO and compleasm. These lineage sets rarely change, so using a common location for them is not just feasible but recommended.

douglasgscofield commented Apr 3, 2024

I should add that to use existing BUSCO v5 lineage sets for compleasm, each of the lineage directories, e.g., methanomicrobia_odb10, needs a corresponding methanomicrobia_odb10.done file at the same level. This can be created with touch for each lineage directory:

cd <BUSCO v5 lineage sets base directory>/lineages
for D in *_odb10; do
    test -d "$D" && touch "$D.done"
done

Also, if you uncompress the refseq_db.faa.gz within each lineage directory, leave the gzipped version in place for compleasm to use.
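
For sites that script this setup in Python, the two notes above could be combined into one check. This is a sketch under assumptions: mark_lineages_done is a hypothetical helper that is not part of compleasm or BUSCO, and it only touches the marker files and reports lineage directories where refseq_db.faa.gz is no longer present.

```python
from pathlib import Path


def mark_lineages_done(lineages_dir, suffix="_odb10"):
    """Create <lineage>.done beside each lineage directory.

    Returns the names of lineage directories lacking refseq_db.faa.gz.
    Hypothetical helper for illustration only.
    """
    missing_gz = []
    # sorted() materializes the listing before any marker file is created.
    for entry in sorted(Path(lineages_dir).iterdir()):
        if not (entry.is_dir() and entry.name.endswith(suffix)):
            continue
        # Equivalent of `touch <lineage>.done` at the same level.
        (entry.parent / (entry.name + ".done")).touch()
        if not (entry / "refseq_db.faa.gz").is_file():
            missing_gz.append(entry.name)
    return missing_gz
```

Calling it on the lineages directory returns a list that is empty when every lineage still has its gzipped file in place.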
