This is the central repository for branchwater.
branchwater is the framework we use for searching large collections of sequencing data with genome-scale queries. At its core it is a new search index for sourmash signatures, allowing near real-time search of large-scale databases. It is an inverted index implemented on top of RocksDB.
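To illustrate the inverted-index idea, here is a minimal sketch in Python. It is not the branchwater implementation: a plain in-memory dictionary stands in for RocksDB, and the function names, dataset names, hash values, and threshold are all hypothetical. It only shows how mapping each hash to the datasets containing it lets a query be answered by looking up the query's own hashes, rather than scanning every dataset.

```python
from collections import Counter, defaultdict


def build_inverted_index(sketches):
    """Map each hash value to the set of dataset names whose sketch contains it."""
    index = defaultdict(set)
    for name, hashes in sketches.items():
        for h in hashes:
            index[h].add(name)
    return index


def search(index, query_hashes, threshold=0.5):
    """Return (dataset, containment) pairs with containment >= threshold.

    Containment is the fraction of query hashes found in a dataset.
    """
    query = set(query_hashes)
    if not query:
        return []
    counts = Counter()
    for h in query:
        # .get avoids creating empty entries for hashes absent from the index
        for name in index.get(h, ()):
            counts[name] += 1
    results = [
        (name, n / len(query))
        for name, n in counts.items()
        if n / len(query) >= threshold
    ]
    # Best matches first; ties broken by name for stable output
    return sorted(results, key=lambda r: (-r[1], r[0]))


# Hypothetical toy data: small integers stand in for real sketch hashes.
sketches = {
    "metagenome_A": {1, 2, 3, 4},
    "metagenome_B": {3, 4, 5},
    "metagenome_C": {9},
}
index = build_inverted_index(sketches)
hits = search(index, {2, 3, 4, 7}, threshold=0.5)
print(hits)
```

In this toy example the work done per query depends on the number of query hashes and their posting lists, not on the total number of indexed datasets, which is the property that makes this kind of index attractive for large collections.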
You can read more about branchwater in Sourmash Branchwater Enables Lightweight Petabyte-Scale Sequence Search, Irber et al., 2022, and you can read about one of the earliest use cases in Biogeographic Distribution of Five Antarctic Cyanobacteria Using Large-Scale k-mer Searching with sourmash branchwater, Lumian et al., 2022.
branchwater has had several names over time:
- sra_search
- MAGsearch
- rocksdb-eval
- mastiff
We finally brought it all together under the same umbrella.
Here are a few blog posts:
- MinHashing all the things: searching for MAGs in the SRA
- MinHashing all the things: a quick analysis of MAG search results
- Searching all public metagenomes with sourmash
- Discussion for the initial prototype for real time search of the SRA
branchwater is based on sourmash, and the search index data structure has lived there since version 0.12 of the Rust crate.
branchwater is currently (Jan 2024) mostly contained in this repo, with the tools developed to work with the new index:
- branchwater-api, a search server indexing ~946,000 SRA metagenomes.
- branchwater-web, a webapp that takes a genome of interest and rapidly searches for publicly-available metagenomes within NCBI's sequence read archive with branchwater. Metadata associated with the metagenome accessions are summarized in interactive tables, plots, and maps.
- branchwater-index, a command-line interface to build the search index. See the Query README for more details.
- branchwater-query, a command-line interface to submit queries to a search server.
There are also additional resources:
- The code for monitoring the SRA and building sourmash sketches from genomes and metagenomes is in wort.
- sourmash_plugin_branchwater is a sourmash plugin exposing more features from branchwater in sourmash.
Please file branchwater-specific issues and pull requests in the branchwater repo. We also hang out in the sourmash repo a lot, if you have more general questions about sourmash. And there's a gitter/matrix channel where you can contact a number of the sourmash collaborators.
branchwater is AGPL licensed.
The webapp was developed by the USDA Agricultural Research Service, Genomics and Bioinformatics Research Unit group in Gainesville, FL, primarily authored by Suzanne Fleishman and led by Adam Rivers. Check out their other work at https://tinyecology.com. As a work of the United States Government, the original code is available under the CC0 1.0 Universal Public Domain Dedication (CC0 1.0).