Commit 634db46

2024.09 (#3433)
* 2024.09
* addressing @charles-cowart comments
* update based on @qiyunzhu recommendations
1 parent 913a31f commit 634db46

12 files changed

+102
-14
lines changed

CHANGELOG.md

+15
@@ -1,5 +1,20 @@
 # Qiita changelog
 
+Version 2024.09
+---------------
+
+Deployed on September 23rd, 2024
+
+* Added update_resource_allocation_redis and companion code, so resource allocation summaries are available for review. Thank you @Gossty!
+* It is now possible to have default workflows with only one step.
+* `qiita_client.update_job_step` now accepts an optional `ignore_error` parameter. Thank you @charles-cowart!
+* Initial changes in `qiita_client` to have more accurate variable names: `QIITA_SERVER_CERT` -> `QIITA_ROOTCA_CERT`. Thank you @charles-cowart!
+* Added `get_artifact_html_summary` to `qiita_client` to retrieve the summary file of an artifact.
+* Re-added GitHub Actions to `https://github.com/qiita-spots/qiita_client`.
+* `Woltka v0.1.4, paired-end` superseded `Woltka v0.1.4` in `qp-woltka`; [more information](https://qiita.ucsd.edu/static/doc/html/processingdata/woltka_pairedend.html). Thank you to @qiyunzhu for the benchmarks!
+* Other general fixes, like [#3424](https://github.com/qiita-spots/qiita/pull/3424), [#3425](https://github.com/qiita-spots/qiita/pull/3425).
+
+
 Version 2024.07
 ---------------
 

qiita_core/__init__.py

+1-1
@@ -6,4 +6,4 @@
 # The full license is in the file LICENSE, distributed with this software.
 # -----------------------------------------------------------------------------
 
-__version__ = "2024.02"
+__version__ = "2024.09"

qiita_db/__init__.py

+1-1
@@ -27,7 +27,7 @@
 from . import user
 from . import processing_job
 
-__version__ = "2024.02"
+__version__ = "2024.09"
 
 __all__ = ["analysis", "artifact", "archive", "base", "commands",
            "environment_manager", "exceptions", "investigation", "logger",

qiita_pet/__init__.py

+1-1
@@ -6,4 +6,4 @@
 # The full license is in the file LICENSE, distributed with this software.
 # -----------------------------------------------------------------------------
 
-__version__ = "2024.02"
+__version__ = "2024.09"

qiita_pet/handlers/api_proxy/__init__.py

+1-1
@@ -38,7 +38,7 @@
 from .user import (user_jobs_get_req)
 from .util import check_access, check_fp
 
-__version__ = "2024.02"
+__version__ = "2024.09"
 
 __all__ = ['prep_template_summary_get_req', 'data_types_get_req',
            'study_get_req', 'sample_template_filepaths_get_req',

qiita_pet/support_files/doc/source/processingdata/index.rst

+7
@@ -202,3 +202,10 @@ Closed-Reference OTU Picking
 * **Sortmerna e_value** :ref:`[10]<reference10>` (required): Maximum e-value when clustering (local sequence alignment tool for filtering, mapping, and OTU picking) can expect to see by chance when searching a database
 * **Sortmerna max-pos** :ref:`[10]<reference10>` (required): Maximum number of positions per seed to store in the indexed database
 * **Threads** (required): Number of threads to use per job
+
+Woltka / bowtie2 paired-end
+---------------------------
+
+.. toctree::
+
+   woltka_pairedend.rst

qiita_pet/support_files/doc/source/processingdata/processing-recommendations.rst

+11-7
@@ -71,17 +71,22 @@ References:
 Shotgun sequencing
 ------------------
 
-Qiita currently has one active shotgun metagenomics data analysis pipeline: a per sample
+Qiita currently has one active shotgun metagenomics data analysis pipeline: a per sample, paired-end
 bowtie2 alignment step with Woltka classification using either the WoLr2 (default) or RS210 databases.
 Below you will find more information about each of these options.
 
 .. note::
-   The bowtie2 settings are maximum and minimum mismatch penalties (mp=[1,1]), a
-   penalty for ambiguities (np=1; default), read and reference gap open- and
+   The bowtie2 settings are set for interleaved processing with maximum and minimum mismatch
+   penalties (mp=[1,1]), a penalty for ambiguities (np=1; default), read and reference gap open and
    extend penalties (rdg=[0,1], rfg=[0,1]), a minimum alignment score for an
    alignment to be considered valid (score-min=[L,0,-0.05]), a defined number of
    distinct, valid alignments (k=16), and the suppression of SAM records for
-   unaligned reads, as well as SAM headers (no-unal, no-hd).
+   unaligned reads, as well as SAM headers (no-unal, no-hd), and skipping the upfront exact and
+   1-mismatch end-to-end alignment attempts before the multiseed heuristic (no-exact-upfront, no-1mm-upfront). For more information, visit:
+
+.. toctree::
+
+   woltka_pairedend.rst
 
 The current workflow is as follows:
 
@@ -110,10 +115,9 @@ For more information about the versions in this plugin, visit:
 
    qp-fastp-minimap2.rst
 
-Note that the command produces up to 6 output artifacts based on the aligner and database selected:
+Note that the command produces up to 5 output artifacts based on the aligner and database selected:
 
-- Alignment Profile: contains the raw alignment file and the no rank classification BIOM table
-- Per genome Predictions: contains the per genome level predictions BIOM table
+- Per genome Predictions: contains the raw alignment file and the per genome level predictions BIOM table
 - Per gene Predictions: Only WoLr2, contains the per gene level predictions BIOM table
 - KEGG Pathways: Only WoLr2, contains the functional profile
 - KEGG Ontology (KO): Only WoLr2, contains the functional profile
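
Review note: the settings listed in the note map directly onto bowtie2 command-line flags. The sketch below assembles them into an argument list. It is an illustration only, not the plugin's actual invocation; the function name, `index_prefix`, `interleaved_fastq`, and `threads` are placeholders assumed for this example.

```python
def bowtie2_paired_args(index_prefix, interleaved_fastq, threads=1):
    """Build a bowtie2 argument list mirroring the settings in the note.

    index_prefix, interleaved_fastq and threads are hypothetical inputs
    chosen for this sketch; they are not taken from qp-woltka.
    """
    return [
        "bowtie2",
        "-p", str(threads),                  # number of alignment threads
        "-x", index_prefix,                  # reference index prefix
        "--interleaved", interleaved_fastq,  # paired reads in one file
        "--mp", "1,1",                       # max,min mismatch penalties
        "--np", "1",                         # penalty for ambiguous bases
        "--rdg", "0,1",                      # read gap open,extend
        "--rfg", "0,1",                      # reference gap open,extend
        "--score-min", "L,0,-0.05",          # minimum valid alignment score
        "-k", "16",                          # up to 16 distinct alignments
        "--no-unal",                         # drop records for unaligned reads
        "--no-hd",                           # drop SAM header lines
        "--no-exact-upfront",                # skip upfront exact end-to-end pass
        "--no-1mm-upfront",                  # skip upfront 1-mismatch pass
    ]


if __name__ == "__main__":
    print(" ".join(bowtie2_paired_args("WoLr2", "sample.fastq", threads=8)))
```

Returning a list rather than a single string keeps the arguments safe to pass to `subprocess.run` without shell quoting.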

qiita_pet/support_files/doc/source/processingdata/woltka_pairedend.rst

+62
@@ -0,0 +1,62 @@
+Woltka and Bowtie2 using Read Pairing Schemes
+=============================================
+
+Benchmarks created by Qiyun Zhu (@qiyunzhu) on Aug 1, 2024.
+
+Summary
+-------
+
+I tested alternative read pairing schemes in the analysis of shotgun metagenomic sequencing data. Sequencing reads were aligned against a reference microbial genome database as unpaired or paired, with or without singleton and/or discordant alignments suppressed. A series of synthetic datasets were used in the analysis.
+
+The results reveal that treating reads as paired is always advantageous over unpaired. Suppressing singleton alignments further increases the accuracy of results, at the cost of a lower mapping rate. Suppressing discordant alignments has no obvious impact on the result. Regardless of accuracy, the downstream community ecology analyses are not obviously impacted by the choice of parameters.
+
+Therefore, I recommend the general adoption of paired alignments as a standard procedure. I also endorse suppressing singleton and discordant alignments, but further tests are needed to determine whether this reduces sensitivity with complex communities.
+
+Alignment parameters
+--------------------
+
+Sequencing data were aligned using Bowtie2 v2.5.1 in the “very sensitive” mode against the WoL2 database. They were treated as either unpaired or paired-end:
+
+- SE: Reads are treated as unpaired (Bowtie2 input: -U merged.fq)
+- PE: Reads are treated as paired (Bowtie2 input: -1 fwd.fq, -2 rev.fq)
+- PE.NU: Reads are treated as paired, with the additional flags `--no-exact-upfront --no-1mm-upfront`.
+
+Resulting alignment files (SAM format) were processed by Woltka v0.1.6 using default parameters to generate OGU tables.
+
+Synthetic data
+--------------
+
+Five synthetic datasets were generated with 25 samples each consisting of randomly selected WoL2 genomes. CAMISIM was executed to simulate 500 Mbp of 150 bp paired-end Illumina sequencing reads (approx. 3.3 million reads) per sample. The five datasets have different taxon counts and distribution patterns. The result of one of the five datasets is displayed below. It consists of 400 taxa (more than the others) and therefore is presumably the most realistic. However, all five results largely shared the same pattern.
+
+The results of the five Bowtie2 parameter sets were compared using nine metrics:
+
+Three metrics rely only on each result:
+
+- Mapping rate (%)
+- Number of taxa
+- Entropy (i.e., Shannon index, but without subsampling)
+
+Six metrics rely on comparing each result against the ground truth (higher is better):
+
+- Presence/absence-based:
+  - Precision (fraction of discovered taxa that are true)
+  - Recall (sensitivity) (fraction of true taxa that are discovered)
+  - F1 score (combination of precision and recall)
+- Abundance-based:
+
+  - Spearman correlation coefficient
+  - Bray-Curtis similarity *
+  - Weighted UniFrac similarity *
+
+* Note: Bray-Curtis and weighted UniFrac similarities were calculated after subsampling to a constant sum of taxon frequencies per sample.
+
+.. figure:: woltka_synthetic.png
+   :align: center
+
+
+The results revealed:
+
+#. PE outperforms SE in all metrics. Most importantly, it reduces the false positive rate (higher precision) while retaining the mapping rate. Meanwhile, the sensitivity (recall) of identifying true taxa is not obviously compromised (note the y-axis scale).
+#. PE.NU: the two additional parameters had minimal effect on the result and made the alignment step faster. This suggests that the additional parameters are safe to use.
+
+Therefore, I would recommend adopting paired alignment in preference to unpaired alignment. I may suggest no mixing, as it improves accuracy, but the potential adverse effect of a lower mapping rate should be explored further before making a firm recommendation. Although it has no visible effect, no discordance may be added for logical coherence.
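
Review note: for readers unfamiliar with the metrics named in this file, here is a minimal sketch of four of them (precision, recall, F1, Shannon entropy, and Bray-Curtis similarity) using only the Python standard library. It is not the code used for the benchmark. The function names and toy profiles are made up for illustration, and the natural-log base for entropy is an assumption, since the document does not state one.

```python
import math


def presence_absence_scores(observed, truth):
    """Return (precision, recall, F1) for two collections of taxon IDs."""
    observed, truth = set(observed), set(truth)
    true_positives = len(observed & truth)
    precision = true_positives / len(observed) if observed else 0.0
    recall = true_positives / len(truth) if truth else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f1


def shannon_entropy(counts):
    """Shannon index of a {taxon: count} profile (natural log assumed)."""
    total = sum(counts.values())
    if total == 0:
        return 0.0
    return -sum((c / total) * math.log(c / total)
                for c in counts.values() if c > 0)


def bray_curtis_similarity(a, b):
    """1 - Bray-Curtis dissimilarity between two {taxon: count} profiles."""
    shared = sum(min(a.get(t, 0), b.get(t, 0)) for t in set(a) | set(b))
    total = sum(a.values()) + sum(b.values())
    return 2 * shared / total if total else 1.0


if __name__ == "__main__":
    # Hypothetical toy profiles, not benchmark data.
    truth = {"G1": 50, "G2": 30, "G3": 20}
    observed = {"G1": 45, "G2": 35, "G4": 20}
    print(presence_absence_scores(observed, truth))
    print(shannon_entropy(observed))
    print(bray_curtis_similarity(observed, truth))
```

Spearman correlation and weighted UniFrac are left out because they need rank handling and a phylogeny, respectively, which established libraries already provide.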

qiita_ware/__init__.py

+1-1
@@ -6,4 +6,4 @@
 # The full license is in the file LICENSE, distributed with this software.
 # -----------------------------------------------------------------------------
 
-__version__ = "2024.02"
+__version__ = "2024.09"

scripts/qiita-recover-jobs

+1-1
@@ -31,7 +31,7 @@ def _submit_jobs(jids_to_recover, recover_type):
     st = int(ceil(SLEEP_TIME/CHANCES))
     len_jids_to_recover = len(jids_to_recover)
     for i, j in enumerate(jids_to_recover):
-        print('recovering %s: %d/%d' % (recover_type, len_jids_to_recover, i))
+        print(f'recovering {j} {recover_type}: {len_jids_to_recover}/{i}')
         job = ProcessingJob(j)
         job._set_status('in_construction')
         job.submit()
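
Review note: the new message keeps the ordering of the old one, printing the total first and the zero-based index second. If a "current/total" reading is what is wanted (an assumption on my part, the commit does not say), a sketch could look like the following. `progress_messages` is a hypothetical helper, not part of the script.

```python
def progress_messages(jids_to_recover, recover_type):
    """Return one 'current/total' progress line per job id.

    Hypothetical helper for illustration; positions are 1-based.
    """
    total = len(jids_to_recover)
    return [
        f'recovering {jid} {recover_type}: {position}/{total}'
        for position, jid in enumerate(jids_to_recover, start=1)
    ]


if __name__ == "__main__":
    for line in progress_messages(["job-a", "job-b"], "queued"):
        print(line)
```

Using `enumerate(..., start=1)` makes the last line read `N/N` rather than stopping at `N-1`.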

setup.py

+1-1
@@ -10,7 +10,7 @@
 from setuptools import setup
 from glob import glob
 
-__version__ = "2024.02"
+__version__ = "2024.09"
 
 
 classes = """
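
Review note: this commit bumps the same `__version__` string in several files by hand. A small check along the following lines could catch a missed file. It is a sketch under stated assumptions: the helper names are hypothetical, and the sources are passed in as strings so the example stays self-contained rather than reading the repository.

```python
import re

# Matches a module-level assignment such as: __version__ = "2024.09"
VERSION_RE = re.compile(
    r'^__version__\s*=\s*["\']([^"\']+)["\']', re.MULTILINE)


def find_version(source_text):
    """Return the __version__ string in source_text, or None if absent."""
    match = VERSION_RE.search(source_text)
    return match.group(1) if match else None


def mismatched_versions(sources, expected):
    """Return {path: found_version} for files not at the expected version."""
    found = {path: find_version(text) for path, text in sources.items()}
    return {path: v for path, v in found.items() if v != expected}


if __name__ == "__main__":
    # Hypothetical file contents for illustration.
    sources = {
        "setup.py": '__version__ = "2024.09"\n',
        "qiita_db/__init__.py": '__version__ = "2024.02"\n',
    }
    print(mismatched_versions(sources, "2024.09"))
```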
