Quickstart
- Install the software (see Installation instructions).
- Call Castanet "lite" with the help flag
$ python3 -m app.castanet_lite -h
- Install the software (see Installation instructions).
- Ensure your Castanet Conda environment is active.
- Check that the dependencies installed correctly. Look for red warnings in the terminal: these come with guidance on any installation issues and how to fix them. N.b. this check will fail if you are using the Docker container and haven't manually installed a Kraken2 database into the Kraken2_human_db directory.
$ python3 -m app.castanet_lite_deptest
- Run an end-to-end job
$ python3 -m app.castanet_lite -ExpDir data/eval -ExpName CastanetTest -SaveDir ./experiments -RefStem data/eval/ref.fa -DoKrakenPrefilter False
- Check the output in "./experiments/CastanetTest" (see Output file descriptions).
If you're using ONT / single-ended reads, set the "SingleEndedReads" argument to true. We also highly recommend changing the MSA tool ("Mapper") to minimap2.
- Install the software (see Installation instructions).
- Ensure your Castanet Conda environment is active.
- Start an API server with
$ castanet
(or $ uvicorn app.api:app --port 8001 if you haven't created an alias command). Check that the Castanet startup message appears in your terminal window.
- Open your favourite web browser and visit the following address for the Castanet GUI
http://127.0.0.1:8001/docs
- Scroll down to find the green "check_dependencies" box and click on it to expand. Click "Try it out" (top right corner of the green box), then the blue "Execute" box that appears underneath. This tests that all of the dependencies Castanet needs are installed and functioning as expected. Check the output in either the terminal or the API window (you might need to scroll down) to ensure it completes successfully. Look for red warnings in the terminal: these come with guidance on any installation issues and how to fix them.
- Expand the green drop-down for the end_to_end endpoint. Click the "Try it out" button (top right of the expanded green box). Copy-paste the command string below, then trigger the function with the "Execute" button.
- Check the output in "./experiments/CastanetTest" (see Output file descriptions).
{
"ExpDir": "./data/eval/",
"ExpName": "CastanetTest",
"SaveDir": "./experiments",
"RefStem": "data/eval/ref.fa",
"SingleEndedReads": false,
"MatchLength": 40,
"DoTrimming": true,
"TrimMinLen": 36,
"DoKrakenPrefilter": true,
"LineageFile": "data/ncbi_lineages_2023-06-15.csv.gz",
"ExcludeIds": "9606",
"RetainIds": "",
"RetainNames": "",
"ExcludeNames": "Homo",
"ConsensusMinD": 10,
"ConsensusCoverage": 30,
"ConsensusMapQ": 1,
"ConsensusCleanFiles": true,
"GtFile": "",
"GtOrg": "",
"KrakenDbDir": "kraken2_human_db/",
"KeepDups": true,
"Clin": "",
"DepthInf": "",
"SamplesFile": "",
"PostFilt": false,
"AdaptP": "data/all_adapters.fa",
"NThreads": "auto"
}
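The same payload can also be sent from a script rather than the browser. The following is a minimal sketch, not part of Castanet: `build_request` is a hypothetical helper, and the endpoint path `/end_to_end/` is an assumption inferred from the endpoint name shown in the GUI, so confirm the exact path on the `/docs` page before relying on it. The host and port are those used in the steps above.

```python
import json
import urllib.request

# Assumption: the path is guessed from the endpoint name; check /docs.
API_URL = "http://127.0.0.1:8001/end_to_end/"


def build_request(payload: dict, url: str = API_URL) -> urllib.request.Request:
    """Wrap a payload dict in a JSON POST request. Nothing is sent yet."""
    body = json.dumps(payload).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


# The test-dataset payload from above, written as a Python dict
# (note Python's True/False become JSON true/false when serialised).
payload = {
    "ExpDir": "./data/eval/",
    "ExpName": "CastanetTest",
    "SaveDir": "./experiments",
    "RefStem": "data/eval/ref.fa",
    "SingleEndedReads": False,
    "MatchLength": 40,
    "DoTrimming": True,
    "TrimMinLen": 36,
    "DoKrakenPrefilter": True,
    "LineageFile": "data/ncbi_lineages_2023-06-15.csv.gz",
    "ExcludeIds": "9606",
    "RetainIds": "",
    "RetainNames": "",
    "ExcludeNames": "Homo",
    "ConsensusMinD": 10,
    "ConsensusCoverage": 30,
    "ConsensusMapQ": 1,
    "ConsensusCleanFiles": True,
    "GtFile": "",
    "GtOrg": "",
    "KrakenDbDir": "kraken2_human_db/",
    "KeepDups": True,
    "Clin": "",
    "DepthInf": "",
    "SamplesFile": "",
    "PostFilt": False,
    "AdaptP": "data/all_adapters.fa",
    "NThreads": "auto",
}

request = build_request(payload)

# Uncomment once the API server is running to actually submit the job:
# with urllib.request.urlopen(request) as response:
#     print(response.read().decode("utf-8"))
```

Building the request separately from sending it makes it easy to inspect the JSON body before a long-running job is triggered.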
Users may choose a simplified CLI or a GUI-based method for the Castanet quick-start. Guides for both are included below, using a small test dataset that's included with the repository. N.b. the simplified CLI does not support the full range of input parameters that the GUI and programmatic CLI have, but functionality is broad enough to suit the majority of use cases.
There are four arguments that new users need to be aware of when starting their first experiments:
- ExpDir: A folder containing your paired read files. N.b. Castanet currently only supports data input from folders containing just these two read files. This folder path can be absolute or relative: e.g. "./my_folder" will look for (or create a new) folder within your Castanet repository; "/mnt/d/datasets" will look in your D drive for a folder called datasets.
- ExpName: A name for your experiment, which will be used to name the folder in which output data are saved.
- SaveDir: Directory path for where your experiment data will be saved; combines with ExpName. E.g. if user specifies SaveDir: "./experiments" and ExpName: "MyExperiment", data will be saved to "./experiments/MyExperiment". Path may be absolute or relative.
- RefStem: Path to a multi-fasta file containing your mapping reference sequences. It's essential that the headers in this file are named in a manner that Castanet can interpret (see section below, "Generating custom probe files").
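To make the relationship between these arguments concrete, here is a minimal sketch of how SaveDir and ExpName combine into an output path, plus a pre-flight check that an ExpDir folder holds exactly two files. Both functions are hypothetical helpers written for illustration and are not part of Castanet; the demo uses a temporary folder with empty placeholder files.

```python
import tempfile
from pathlib import Path


def output_dir(save_dir: str, exp_name: str) -> Path:
    """Combine SaveDir and ExpName the way the description above states."""
    return Path(save_dir) / exp_name


def paired_read_files(exp_dir: str) -> list:
    """Return the files in exp_dir, raising if there are not exactly two."""
    files = sorted(p for p in Path(exp_dir).iterdir() if p.is_file())
    if len(files) != 2:
        raise ValueError(
            f"Expected exactly 2 read files in {exp_dir}, found {len(files)}"
        )
    return files


# Demo with a throwaway folder containing two empty placeholder files.
with tempfile.TemporaryDirectory() as tmp:
    for name in ("read1.fastq.gz", "read2.fastq.gz"):
        (Path(tmp) / name).touch()
    found = [p.name for p in paired_read_files(tmp)]

destination = output_dir("./experiments", "MyExperiment")
print(found)
print(destination)
```

Running a check like this before starting a job catches stray files in the input folder early, rather than part-way through a pipeline run.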
- Install the software (see Installation instructions).
- Ensure your Castanet Conda environment is active.
- Check that the dependencies installed correctly. Look for red warnings in the terminal: these come with guidance on any installation issues and how to fix them.
$ python3 -m dev.castanet_lite_deptest
- Run an end-to-end job
$ python3 -m dev.castanet_lite -ExpDir <<InputFol>> -ExpName <<ExperimentDir>> -SaveDir ./experiments -RefStem <<MappingRef>>
replacing <<InputFol>> with your directory containing 2 paired fastq files, <<ExperimentDir>> with a name for your experiment (used to name the output folder) and <<MappingRef>> with the path to your mapping reference (fasta) file (see Usage).
- Check the output in "./experiments/<<ExperimentDir>>" (see Output file descriptions).
- Install the software (see Installation instructions).
- Ensure your Castanet Conda environment is active.
- Start an API server with
$ castanet
(or $ uvicorn app.api:app --port 8001 if you haven't created an alias command). Check that the Castanet startup message appears in your terminal window.
- Open your favourite web browser and visit the following address for the Castanet GUI
http://127.0.0.1:8001/docs
- Scroll down to find the green "check_dependencies" box and click on it to expand. Click "Try it out" (top right corner of the green box), then the blue "Execute" box that appears underneath. This tests that all of the dependencies Castanet needs are installed and functioning as expected. Check the output in either the terminal or the API window (you might need to scroll down) to ensure it completes successfully. Look for red warnings in the terminal: these come with guidance on any installation issues and how to fix them.
- Expand the green drop-down for the end_to_end endpoint. Click the "Try it out" button (top right of the expanded green box). Copy-paste the command string below, then substitute the following fields:
<<InputFol>> with your directory containing 2 paired fastq files, <<ExperimentDir>> with a name for your experiment (used to name the output folder) and <<MappingRef>> with the path to your mapping reference (fasta) file (see Usage).
- Trigger the function with the "Execute" button.
{
"ExpDir": "<<InputFol>>",
"ExpName": "<<ExperimentDir>>",
"SaveDir": "./experiments",
"RefStem": "<<MappingRef>>",
"SingleEndedReads": false,
"MatchLength": 40,
"DoTrimming": true,
"TrimMinLen": 36,
"DoKrakenPrefilter": true,
"LineageFile": "data/ncbi_lineages_2023-06-15.csv.gz",
"ExcludeIds": "9606",
"RetainIds": "",
"RetainNames": "",
"ExcludeNames": "Homo",
"ConsensusMinD": 10,
"ConsensusCoverage": 30,
"ConsensusMapQ": 1,
"ConsensusCleanFiles": true,
"GtFile": "",
"GtOrg": "",
"KrakenDbDir": "kraken2_human_db/",
"KeepDups": true,
"Clin": "",
"DepthInf": "",
"SamplesFile": "",
"PostFilt": false,
"AdaptP": "data/all_adapters.fa",
"NThreads": "auto"
}
N.b. pay attention to your argument types: strings should be enclosed in double quotes, whereas numbers and booleans (true, false) should not be. Any arguments that default to empty ('"ArgName": ""') are optional and may be left blank. The API will give you error messages in the "response body" box in your web browser, and detailed error messages will be printed to the terminal.
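The typing rules above are those of JSON itself. As a small illustration using only Python's standard `json` module (the argument names are taken from the payload above, the values are arbitrary), serialising a dict shows which values end up quoted:

```python
import json

# Strings are quoted; numbers and booleans are not; empty strings stay "".
args = {
    "ExpName": "MyExperiment",
    "MatchLength": 40,
    "DoTrimming": True,
    "GtFile": "",
}
encoded = json.dumps(args, indent=2)
print(encoded)

# A quoted "true" is a string, not a boolean, once parsed.
quoted = json.loads('{"DoTrimming": "true"}')["DoTrimming"]
unquoted = json.loads('{"DoTrimming": true}')["DoTrimming"]
print(type(quoted).__name__, type(unquoted).__name__)  # str bool
```

The first `print` shows `"MatchLength": 40` and `"DoTrimming": true` without quotes, which is the form the text box in the GUI expects.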
The Castanet batch endpoint applies the end-to-end analysis pipeline iteratively to multiple datasets within a master data folder. It assumes your data are structured as follows:
-DataFolder
|__>Sample_1
|_____>read1.fastq.gz
|_____>read2.fastq.gz
|__>Sample_n
|_____>read1.fastq.gz
|_____>read2.fastq.gz
Otherwise, this function behaves exactly like the end-to-end pipeline (above).
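Before launching a batch job it can help to confirm that a data folder actually follows this layout. The sketch below is illustrative only and not part of Castanet: `sample_layout` and `incomplete_samples` are hypothetical helpers, and the check assumes "well-formed" means exactly two files per sample sub-folder, as in the diagram above. The demo builds a temporary folder in which one sample is deliberately missing its second read file.

```python
import tempfile
from pathlib import Path


def sample_layout(data_folder: str) -> dict:
    """Map each sample sub-folder name to the sorted file names inside it."""
    layout = {}
    for sub in sorted(Path(data_folder).iterdir()):
        if sub.is_dir():
            layout[sub.name] = sorted(f.name for f in sub.iterdir() if f.is_file())
    return layout


def incomplete_samples(layout: dict) -> list:
    """Return names of samples that do not contain exactly two files."""
    return [name for name, files in layout.items() if len(files) != 2]


# Demo: Sample_1 is complete, Sample_2 lacks its second read file.
with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    for sample in ("Sample_1", "Sample_2"):
        (root / sample).mkdir()
        (root / sample / "read1.fastq.gz").touch()
    (root / "Sample_1" / "read2.fastq.gz").touch()

    layout = sample_layout(tmp)
    problems = incomplete_samples(layout)

print(layout)
print("Needs attention:", problems)
```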
- Install Castanet and activate your Conda environment (see above).
- Run a batch job
$ python3 -m dev.castanet_lite -ExpDir <<InputFol>> -ExpName CastanetTest -SaveDir ./experiments -RefStem <<MappingRef>> -Batch True
substituting <<InputFol>> with your input data folder and <<MappingRef>> with your mapping reference file path.
- Output will be saved in multiple folders (one for each read pair) in your SaveDir folder. A summary csv called {ExpName}.csv will be generated in the Castanet repository.
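If you launch batch jobs from a script, the command above can be assembled as an argument list. This is a minimal sketch: `batch_command` is a hypothetical helper, the example paths are placeholders, and the module name and flags are simply those shown in the command above. It assumes you run it from the Castanet repository with your Conda environment active.

```python
import shlex
import subprocess


def batch_command(
    input_folder: str,
    mapping_ref: str,
    exp_name: str = "CastanetTest",
    save_dir: str = "./experiments",
) -> list:
    """Build the batch command shown above as an argument list."""
    return [
        "python3", "-m", "dev.castanet_lite",
        "-ExpDir", input_folder,
        "-ExpName", exp_name,
        "-SaveDir", save_dir,
        "-RefStem", mapping_ref,
        "-Batch", "True",
    ]


# Placeholder paths; substitute your own data folder and reference file.
command = batch_command("./my_data", "./my_refs.fa")
print(shlex.join(command))

# Uncomment to launch the job; check=True raises if Castanet exits non-zero.
# subprocess.run(command, check=True)
```

Passing a list to `subprocess.run` avoids shell quoting problems when folder names contain spaces.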
- Install Castanet and activate your Conda environment (see above).
- Start an API server with
$ castanet
(or $ uvicorn app.api:app --port 8001 if you haven't created an alias command, but make sure that your Castanet conda environment is active). Check that the Castanet startup message appears in your terminal window.
- Open your favourite web browser and visit the following address for the Castanet GUI
http://127.0.0.1:8001/docs
- Expand the green drop-down for the batch endpoint. Click the "Try it out" button (top right of the expanded green box). Amend the DataFolder argument to point towards your data folder (remembering to preserve the punctuation marks in the text box), then change the ExpName, SaveDir and RefStem arguments accordingly (see above).
- Output will be saved in multiple folders (one for each read pair) in your SaveDir folder. A summary csv called {ExpName}.csv will be generated in the Castanet repository.