
Reference sets configuration for microbes #1004

Open
manuelcarbajo wants to merge 9 commits into Ensembl:feature/mvp-rapid from manuelcarbajo:feature/mvp-rapid

Conversation

@manuelcarbajo
Contributor

Updated microbial reference genomes for the upcoming release. This update includes Archaea, Bacteria, Fungi, and Protists.

Note: As “Protists” is no longer a formal or searchable taxonomic group in NCBI Taxonomy, a base_collection named "shared_protists" has been introduced. This base collection aggregates all genomes used for comparative analyses across protist lineages.

Nine protist collection names have been defined: sar, amoebozoa, excavata, discoba, opisthokonta, haptophyta, viridiplantae, rhodophyta, and cryptophyceae. These collections correspond to higher-level eukaryotic clades that are explicitly represented and queryable in NCBI Taxonomy; together they provide comprehensive coverage of protist diversity with genome-scale representation. The groups were selected using two sources: the NCBI Taxonomy Browser (to determine which higher-level eukaryotic clades are explicitly represented and queryable) and Adl SM et al. (2019), "Revisions to the classification, nomenclature, and diversity of eukaryotes," Journal of Eukaryotic Microbiology 66:4–119.
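For illustration only, the layout described above (nine named collections that all draw on one shared base collection) could be generated with a short Python sketch. The `collections` wrapper and the `collection` element with a `name` attribute are assumptions made for this example; only `base_collection name="shared_protists"` and the nine collection names come from this PR, and the real `mlss_conf.xml` may be structured differently.

```python
import xml.etree.ElementTree as ET

# The nine protist collection names listed in the PR description.
PROTIST_COLLECTIONS = [
    "sar", "amoebozoa", "excavata", "discoba", "opisthokonta",
    "haptophyta", "viridiplantae", "rhodophyta", "cryptophyceae",
]


def build_protist_collections(base_name="shared_protists"):
    """Build one <collection> per protist clade, each pointing at the shared base.

    The <collections>/<collection> element names are hypothetical; they are
    used here only to show the one-base-many-collections pattern.
    """
    root = ET.Element("collections")
    for name in PROTIST_COLLECTIONS:
        coll = ET.SubElement(root, "collection", {"name": name})
        ET.SubElement(coll, "base_collection", {"name": base_name})
    return root


root = build_protist_collections()
ET.indent(root)  # pretty-print; requires Python 3.9+
print(ET.tostring(root, encoding="unicode"))
```

Keeping the genome list in a single shared collection means each clade-level collection stays a one-line reference, so a genome only has to be added or removed in one place.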

@codecov

codecov Bot commented Jan 20, 2026

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 59.07%. Comparing base (69393a7) to head (2d222ea).
⚠️ Report is 1 commits behind head on feature/mvp-rapid.

Additional details and impacted files
@@                Coverage Diff                 @@
##           feature/mvp-rapid    #1004   +/-   ##
==================================================
  Coverage              59.07%   59.07%           
==================================================
  Files                    213      213           
  Lines                  22671    22671           
  Branches                3527     3527           
==================================================
  Hits                   13394    13394           
  Misses                  8150     8150           
  Partials                1127     1127           


Comment thread conf/references/mlss_conf.xml Outdated
Comment thread conf/references/mlss_conf.xml Outdated
Comment thread conf/references/mlss_conf.xml Outdated
manuelcarbajo and others added 3 commits January 23, 2026 10:51
Adding separation space between different collection blocks

Co-authored-by: twalsh-ebi <twalsh@ebi.ac.uk>
Adds missing closing characters

Co-authored-by: twalsh-ebi <twalsh@ebi.ac.uk>
Deletes repeated closing collection block

Co-authored-by: twalsh-ebi <twalsh@ebi.ac.uk>
@manuelcarbajo
Contributor Author

Thank you very much for flagging these typos, Thomas. If everything else is good and Travis passes all checks, I'm happy to go ahead with the merge.

Manuel Carbajo and others added 5 commits January 29, 2026 15:17
removed diff comments
According to the RNG rules, a collection must define either:
<taxonomic_group .../> OR <genome .../>
Since all genomes are defined in the base_collection, I added dummy taxonomic_groups
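
The constraint described in this commit message can be approximated with a small check. This is a simplified sketch of the rule as stated here, not the actual RNG schema: it assumes collections are `collection` elements with a `name` attribute (a hypothetical layout for the example) and flags any collection that has neither a `taxonomic_group` nor a `genome` child.

```python
import xml.etree.ElementTree as ET


def collections_without_members(xml_text):
    """Return names of collections lacking both <taxonomic_group> and <genome>.

    Approximates the rule quoted in the commit message; the real RNG schema
    may impose additional or different constraints.
    """
    root = ET.fromstring(xml_text)
    missing = []
    for coll in root.iter("collection"):
        has_taxon = coll.find("taxonomic_group") is not None
        has_genome = coll.find("genome") is not None
        if not (has_taxon or has_genome):
            missing.append(coll.get("name"))
    return missing


# Hypothetical sample: "sar" relies only on the base collection,
# while "amoebozoa" also carries a taxonomic_group.
SAMPLE = """
<collections>
  <collection name="sar">
    <base_collection name="shared_protists"/>
  </collection>
  <collection name="amoebozoa">
    <base_collection name="shared_protists"/>
    <taxonomic_group taxon_name="Amoebozoa"/>
  </collection>
</collections>
"""

print(collections_without_members(SAMPLE))  # ['sar']
```

Under this reading, a collection whose genomes all come from the base collection fails the check, which is why the dummy taxonomic_groups were added.
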
Comment on lines +700 to +701
<base_collection name="shared_protists"/>
<taxonomic_group taxon_name="SAR"/>
Contributor

Hi @manuelcarbajo, would it be possible to remove the Protists taxonomic_group elements and replace each of the base_collection elements with a composable_collection, as I've suggested in this case?

Suggested change
- <base_collection name="shared_protists"/>
- <taxonomic_group taxon_name="SAR"/>
+ <composable_collection name="shared_protists"/>

Contributor Author

@manuelcarbajo manuelcarbajo Mar 12, 2026

Thanks @twalsh-ebi. If Travis is happy with this configuration, it works for me.

The taxonomic_group elements were originally added to satisfy the RNG validation requirements for the collections definition, so I’m glad to see a cleaner approach using composable_collection instead.
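
As a rough sketch of the change discussed in this thread, the rewrite could be applied across all protist collections with a few lines of Python. The `collection` element name and the sample document are assumptions for illustration; the element names `base_collection`, `taxonomic_group`, and `composable_collection` are taken from the review suggestion. Whether the result passes the RNG validation still depends on the real schema.

```python
import xml.etree.ElementTree as ET


def use_composable_collection(root, shared_name="shared_protists"):
    """Swap base_collection for composable_collection and drop taxonomic_groups.

    Only collections that reference `shared_name` are touched; everything
    else is left as-is.
    """
    # Materialise the list first so the tree is not mutated mid-iteration.
    for coll in list(root.iter("collection")):
        bases = [b for b in coll.findall("base_collection")
                 if b.get("name") == shared_name]
        if not bases:
            continue
        for base in bases:
            base.tag = "composable_collection"  # rename in place, keep attributes
        for group in coll.findall("taxonomic_group"):
            coll.remove(group)
    return root


# Hypothetical input: one protist collection and one unrelated collection.
SAMPLE = """
<collections>
  <collection name="sar">
    <base_collection name="shared_protists"/>
    <taxonomic_group taxon_name="SAR"/>
  </collection>
  <collection name="fungi">
    <taxonomic_group taxon_name="Fungi"/>
  </collection>
</collections>
"""

tree_root = use_composable_collection(ET.fromstring(SAMPLE))
print(ET.tostring(tree_root, encoding="unicode"))
```

Renaming the element in place keeps its position and attributes, so the only visible difference in each protist block is the tag name and the missing dummy taxonomic_group.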
