Commit 13eafdf

fix CHANGELOG.md CONFLICT
2 parents 57d1b14 + 62292ca commit 13eafdf

39 files changed

+832
-267
lines changed

Diff for: .github/workflows/qiita-ci.yml

+2
Original file line numberDiff line numberDiff line change
@@ -154,6 +154,8 @@ jobs:
154154
155155
echo "5. Setting up qiita"
156156
conda activate qiita
157+
# adapt environment_script for private qiita plugins from travis to github actions.
158+
sed 's#export PATH="/home/travis/miniconda3/bin:$PATH"; source #source /home/runner/.profile; conda #' -i qiita_db/support_files/patches/54.sql
157159
qiita-env make --no-load-ontologies
158160
qiita-test-install
159161
qiita plugins update
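
The `sed` call added above rewrites the Travis-specific prefix of the `environment_script` values in `54.sql` so they work on GitHub Actions runners. As an illustration only, the same first-match-per-line substitution might be sketched in Python as follows; the function name `adapt_environment_script` and the sample line are hypothetical and not part of the commit.

```python
# Literal strings taken from the sed expression in the workflow.
OLD = 'export PATH="/home/travis/miniconda3/bin:$PATH"; source '
NEW = 'source /home/runner/.profile; conda '


def adapt_environment_script(text: str) -> str:
    """Replace the first Travis-style prefix on each line (sed without /g)."""
    return ''.join(line.replace(OLD, NEW, 1)
                   for line in text.splitlines(keepends=True))


# Hypothetical line for illustration; the real contents of 54.sql differ.
sample = ('export PATH="/home/travis/miniconda3/bin:$PATH"; '
          'source activate qp-example\n')
print(adapt_environment_script(sample), end='')
```

Lines that do not contain the Travis prefix pass through unchanged, which matches how `sed` leaves non-matching lines alone.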

Diff for: CHANGELOG.md

+14
Original file line numberDiff line numberDiff line change
@@ -1,5 +1,18 @@
11
# Qiita changelog
22

3+
Version 2024.02
4+
---------------
5+
6+
Deployed on February 27th, 2024
7+
8+
* Default workflows now accept commands with multiple inputs.
9+
* The loading time of the main study page was improved [#3350](https://github.com/qiita-spots/qiita/pull/3350).
10+
* SPP improvements - mainly @charles-cowart, thank you! Errors are now shown to the user in the GUI [#127](https://github.com/biocore/mg-scripts/pull/127), admins can restart jobs [#129](https://github.com/biocore/mg-scripts/pull/129), adapter-trimmer files are now stored and their sequence counts are part of the prep-info [#126](https://github.com/biocore/mg-scripts/pull/126), and support for per instrument/data-type configuration [#123](https://github.com/biocore/mg-scripts/pull/123).
11+
* The internal Sequence Processing Pipeline now uses the https://www.gencodegenes.org human transcripts v44 for Metatranscriptomic data, in addition to the human pan-genome reference together with the GRCh38 genome + PhiX and T2T-CHM13v2.0 genome, for human host filtering.
12+
* Added a command to qp-woltka: 'Calculate RNA Copy Counts'.
13+
* Other fixes - mainly by @sjanssen2, thank you!: [#3345](https://github.com/qiita-spots/qiita/pull/3345), [#3224](https://github.com/qiita-spots/qiita/pull/3224), [#3357](https://github.com/qiita-spots/qiita/pull/3357), [#3358](https://github.com/qiita-spots/qiita/pull/3358), [#3359](https://github.com/qiita-spots/qiita/pull/3359), [#3362](https://github.com/qiita-spots/qiita/pull/3362), [#3364](https://github.com/qiita-spots/qiita/pull/3364).
14+
15+
316
Version 2023.12
417
---------------
518

@@ -14,6 +27,7 @@ Deployed on January 8th, 2024
1427
* Updated the Adapter and host filtering plugin (qp-fastp-minimap2) to v2023.12 addressing a bug in adapter filtering; [more information](https://qiita.ucsd.edu/static/doc/html/processingdata/qp-fastp-minimap2.html).
1528
* Other fixes: [3334](https://github.com/qiita-spots/qiita/pull/3334), [3338](https://github.com/qiita-spots/qiita/pull/3338). Thank you @sjanssen2.
1629
* The internal Sequence Processing Pipeline is now using the human pan-genome reference, together with the GRCh38 genome + PhiX and T2T-CHM13v2.0 genome for human host filtering.
30+
* Added two new commands to qp-woltka: 'SynDNA Woltka' & 'Calculate Cell Counts'.
1731

1832

1933
Version 2023.10

Diff for: INSTALL.md

+10-10
Original file line numberDiff line numberDiff line change
@@ -162,9 +162,9 @@ Navigate to the cloned directory and ensure your conda environment is active:
162162
cd qiita
163163
source activate qiita
164164
```
165-
If you are using Ubuntu or a Windows Subsystem for Linux (WSL), you will need to ensure that you have a C++ compiler and that development libraries and include files for PostgreSQL are available. Type `cc` into your system to ensure that it doesn't result in `program not found`. The following commands will install a C++ compiler and `libpq-dev`:
165+
If you are using Ubuntu or a Windows Subsystem for Linux (WSL), you will need to ensure that you have a C++ compiler and that development libraries and include files for PostgreSQL are available. Type `cc` into your system to ensure that it doesn't result in `program not found`. If you use the GNU Compiler Collection, make sure to have `gcc` and `g++` available. The following commands will install a C++ compiler and `libpq-dev`:
166166
```bash
167-
sudo apt install gcc # alternatively, you can install clang instead
167+
sudo apt install gcc g++ # alternatively, you can install clang instead
168168
sudo apt-get install libpq-dev
169169
```
170170
Install Qiita (this occurs through setuptools' `setup.py` file in the qiita directory):
@@ -178,7 +178,7 @@ At this point, Qiita will be installed and the system will start. However,
178178
you will need to install plugins in order to process any kind of data. For a list
179179
of available plugins, visit the [Qiita Spots](https://github.com/qiita-spots)
180180
github organization. Each of the plugins have their own installation instructions, we
181-
suggest looking at each individual .travis.yml file to see detailed installation
181+
suggest looking at each individual .github/workflows/qiita-plugin-ci.yml file to see detailed installation
182182
instructions. Note that the most common plugins are:
183183
- [qtp-biom](https://github.com/qiita-spots/qtp-biom)
184184
- [qtp-sequencing](https://github.com/qiita-spots/qtp-sequencing)
@@ -224,15 +224,15 @@ export REDBIOM_HOST=http://my_host.com:7379
224224

225225
## Configure NGINX and supervisor
226226

227-
(NGINX)[https://www.nginx.com/] is not a requirement for Qiita development but it's highly recommended for deploys as this will allow us
228-
to have multiple workers. Note that we are already installing (NGINX)[https://www.nginx.com/] within the Qiita conda environment; also,
229-
that Qiita comes with an example (NGINX)[https://www.nginx.com/] config file: `qiita_pet/nginx_example.conf`, which is used in the Travis builds.
227+
[NGINX](https://www.nginx.com/) is not a requirement for Qiita development but it's highly recommended for deploys as this will allow us
228+
to have multiple workers. Note that we are already installing [NGINX](https://www.nginx.com/) within the Qiita conda environment; also,
229+
that Qiita comes with an example [NGINX](https://www.nginx.com/) config file: `qiita_pet/nginx_example.conf`, which is used in the Travis builds.
230230

231-
Now, (supervisor)[https://github.com/Supervisor/supervisor] will allow us to start all the workers we want based on its configuration file; and we
232-
need that both the (NGINX)[https://www.nginx.com/] and (supervisor)[https://github.com/Supervisor/supervisor] config files to match. For our Travis
231+
Now, [supervisor](https://github.com/Supervisor/supervisor) will allow us to start all the workers we want based on its configuration file; and we
232+
need that both the [NGINX](https://www.nginx.com/) and [supervisor](https://github.com/Supervisor/supervisor) config files to match. For our Travis
233233
testing we are creating 3 workers: 21174 for master and 21175-6 as regular workers.
234234

235-
If you are using (NGINX)[https://www.nginx.com/] via conda, you are going to need to create the NGINX folder within the environment; thus run:
235+
If you are using [NGINX](https://www.nginx.com/) via conda, you are going to need to create the NGINX folder within the environment; thus run:
236236

237237
```bash
238238
mkdir -p ${CONDA_PREFIX}/var/run/nginx/
@@ -256,7 +256,7 @@ Start the qiita server:
256256
qiita pet webserver start
257257
```
258258

259-
If all the above commands executed correctly, you should be able to access Qiita by going in your browser to https://localhost:21174 if you are not using NGINX, or https://localhost:8383 if you are using NGINX, to login use `[email protected]` and `password` as the credentials. (In the future, we will have a *single user mode* that will allow you to use a local Qiita server without logging in. You can track progress on this on issue [#920](https://github.com/biocore/qiita/issues/920).)
259+
If all the above commands executed correctly, you should be able to access Qiita by going in your browser to https://localhost:21174 if you are not using NGINX, or https://localhost:8383 if you are using NGINX, to login use `[email protected]` and `password` as the credentials. (Login as `[email protected]` with `password` to see admin functionality. In the future, we will have a *single user mode* that will allow you to use a local Qiita server without logging in. You can track progress on this on issue [#920](https://github.com/biocore/qiita/issues/920).)
260260

261261

262262

Original file line numberDiff line numberDiff line change
@@ -0,0 +1,239 @@
1+
from qiita_core.util import MaxRSS_helper
2+
from qiita_db.software import Software
3+
import datetime
4+
from io import StringIO
5+
from subprocess import check_output
6+
import pandas as pd
7+
from os.path import join
8+
9+
# This is an example script to collect the data we need from SLURM, the plan
10+
# is that in the near future we will clean up and add these to the Qiita's main
11+
# code and then have cronjobs to run them.
12+
13+
# at time of writing we have:
14+
# qp-spades spades
15+
# (*) qp-woltka Woltka v0.1.4
16+
# qp-woltka SynDNA Woltka
17+
# qp-woltka Calculate Cell Counts
18+
# (*) qp-meta Sortmerna v2.1b
19+
# (*) qp-fastp-minimap2 Adapter and host filtering v2023.12
20+
# ... and the admin plugin
21+
# (*) qp-klp
22+
# Here we are only going to create summaries for (*)
23+
24+
25+
sacct = ['sacct', '-p',
26+
'--format=JobName,JobID,JobIDRaw,ElapsedRaw,MaxRSS,ReqMem', '-j']
27+
# for the non admin jobs, we will use jobs from the last six months
28+
six_months = datetime.date.today() - datetime.timedelta(weeks=6*4)
29+
30+
print('The current "software - commands" that use job-arrays are:')
31+
for s in Software.iter():
32+
if 'ENVIRONMENT="' in s.environment_script:
33+
for c in s.commands:
34+
print(f"{s.name} - {c.name}")
35+
36+
# 1. Command: woltka
37+
38+
fn = join('/panfs', 'qiita', 'jobs_woltka.tsv.gz')
39+
print(f"Generating the summary for the woltka jobs: {fn}.")
40+
41+
cmds = [c for s in Software.iter(False)
42+
if 'woltka' in s.name for c in s.commands]
43+
jobs = [j for c in cmds for j in c.processing_jobs if j.status == 'success' and
44+
j.heartbeat.date() > six_months and j.input_artifacts]
45+
46+
data = []
47+
for j in jobs:
48+
size = sum([fp['fp_size'] for fp in j.input_artifacts[0].filepaths])
49+
jid, mjid = j.external_id.strip().split()
50+
rvals = StringIO(check_output(sacct + [jid]).decode('ascii'))
51+
_d = pd.read_csv(rvals, sep='|')
52+
jmem = _d.MaxRSS.apply(lambda x: x if type(x) is not str
53+
else MaxRSS_helper(x)).max()
54+
jwt = _d.ElapsedRaw.max()
55+
56+
rvals = StringIO(check_output(sacct + [mjid]).decode('ascii'))
57+
_d = pd.read_csv(rvals, sep='|')
58+
mmem = _d.MaxRSS.apply(lambda x: x if type(x) is not str
59+
else MaxRSS_helper(x)).max()
60+
mwt = _d.ElapsedRaw.max()
61+
62+
data.append({
63+
'jid': j.id, 'sjid': jid, 'mem': jmem, 'wt': jwt, 'type': 'main',
64+
'db': j.parameters.values['Database'].split('/')[-1]})
65+
data.append(
66+
{'jid': j.id, 'sjid': mjid, 'mem': mmem, 'wt': mwt, 'type': 'merge',
67+
'db': j.parameters.values['Database'].split('/')[-1]})
68+
df = pd.DataFrame(data)
69+
df.to_csv(fn, sep='\t', index=False)
70+
71+
# 2. qp-meta Sortmerna
72+
73+
fn = join('/panfs', 'qiita', 'jobs_sortmerna.tsv.gz')
74+
print(f"Generating the summary for the Sortmerna jobs: {fn}.")
75+
76+
# for Sortmerna we will only use jobs from the last 6 months
77+
cmds = [c for s in Software.iter(False) if 'meta' in s.name.lower()
78+
for c in s.commands if 'sortmerna' in c.name.lower()]
79+
jobs = [j for c in cmds for j in c.processing_jobs if j.status == 'success' and
80+
j.heartbeat.date() > six_months and j.input_artifacts]
81+
82+
data = []
83+
for j in jobs:
84+
size = sum([fp['fp_size'] for fp in j.input_artifacts[0].filepaths])
85+
jid, mjid = j.external_id.strip().split()
86+
rvals = StringIO(check_output(sacct + [jid]).decode('ascii'))
87+
_d = pd.read_csv(rvals, sep='|')
88+
jmem = _d.MaxRSS.apply(lambda x: x if type(x) is not str
89+
else MaxRSS_helper(x)).max()
90+
jwt = _d.ElapsedRaw.max()
91+
92+
rvals = StringIO(check_output(sacct + [mjid]).decode('ascii'))
93+
_d = pd.read_csv(rvals, sep='|')
94+
mmem = _d.MaxRSS.apply(lambda x: x if type(x) is not str
95+
else MaxRSS_helper(x)).max()
96+
mwt = _d.ElapsedRaw.max()
97+
98+
data.append({
99+
'jid': j.id, 'sjid': jid, 'mem': jmem, 'wt': jwt, 'type': 'main'})
100+
data.append(
101+
{'jid': j.id, 'sjid': mjid, 'mem': mmem, 'wt': mwt, 'type': 'merge'})
102+
df = pd.DataFrame(data)
103+
df.to_csv(fn, sep='\t', index=False)
104+
105+
106+
# 3. Adapter and host filtering. Note that there is a new version deployed on
107+
# Jan 2024 so the current results will not be the most accurate
108+
109+
fn = join('/panfs', 'qiita', 'jobs_adapter_host.tsv.gz')
110+
print(f"Generating the summary for the adapter/host filtering jobs: {fn}.")
111+
112+
# for adapter and host filtering we will only use jobs from the last 6 months
113+
cmds = [c for s in Software.iter(False)
114+
if 'minimap2' in s.name.lower() for c in s.commands]
115+
jobs = [j for c in cmds
116+
for j in c.processing_jobs if j.status == 'success' and
117+
j.heartbeat.date() > six_months and j.input_artifacts]
118+
119+
data = []
120+
for j in jobs:
121+
size = sum([fp['fp_size'] for fp in j.input_artifacts[0].filepaths])
122+
jid, mjid = j.external_id.strip().split()
123+
rvals = StringIO(check_output(sacct + [jid]).decode('ascii'))
124+
_d = pd.read_csv(rvals, sep='|')
125+
jmem = _d.MaxRSS.apply(lambda x: x if type(x) is not str
126+
else MaxRSS_helper(x)).max()
127+
jwt = _d.ElapsedRaw.max()
128+
129+
rvals = StringIO(check_output(sacct + [mjid]).decode('ascii'))
130+
_d = pd.read_csv(rvals, sep='|')
131+
mmem = _d.MaxRSS.apply(lambda x: x if type(x) is not str
132+
else MaxRSS_helper(x)).max()
133+
mwt = _d.ElapsedRaw.max()
134+
135+
data.append({
136+
'jid': j.id, 'sjid': jid, 'mem': jmem, 'wt': jwt, 'type': 'main'})
137+
data.append(
138+
{'jid': j.id, 'sjid': mjid, 'mem': mmem, 'wt': mwt, 'type': 'merge'})
139+
df = pd.DataFrame(data)
140+
df.to_csv(fn, sep='\t', index=False)
141+
142+
143+
# 4. The SPP!
144+
145+
fn = join('/panfs', 'qiita', 'jobs_spp.tsv.gz')
146+
print(f"Generating the summary for the SPP jobs: {fn}.")
147+
148+
# for the SPP we will look at jobs from the last year
149+
year = datetime.date.today() - datetime.timedelta(days=365)
150+
cmds = [c for s in Software.iter(False)
151+
if s.name == 'qp-klp' for c in s.commands]
152+
jobs = [j for c in cmds for j in c.processing_jobs if j.status == 'success' and
153+
j.heartbeat.date() > year]
154+
155+
# for the SPP we need to find the jobs that were actually run, this means
156+
# looping through the existing slurm jobs and finding them
157+
max_inter = 2000
158+
159+
data = []
160+
for job in jobs:
161+
jei = int(job.external_id)
162+
rvals = StringIO(
163+
check_output(sacct + [str(jei)]).decode('ascii'))
164+
_d = pd.read_csv(rvals, sep='|')
165+
mem = _d.MaxRSS.apply(
166+
lambda x: x if type(x) is not str else MaxRSS_helper(x)).max()
167+
wt = _d.ElapsedRaw.max()
168+
# the current "easy" way to determine if amplicon or other is to check
169+
# the file extension of the filename
170+
stype = 'other'
171+
if job.parameters.values['sample_sheet']['filename'].endswith('.txt'):
172+
stype = 'amplicon'
173+
rid = job.parameters.values['run_identifier']
174+
data.append(
175+
{'jid': job.id, 'sjid': jei, 'mem': mem, 'stype': stype, 'wt': wt,
176+
'type': 'main', 'rid': rid, 'name': _d.JobName[0]})
177+
178+
# let's look for the convert job
179+
for jid in range(jei + 1, jei + max_inter):
180+
rvals = StringIO(check_output(sacct + [str(jid)]).decode('ascii'))
181+
_d = pd.read_csv(rvals, sep='|')
182+
if [1 for x in _d.JobName.values if x.startswith(job.id)]:
183+
cjid = int(_d.JobID[0])
184+
mem = _d.MaxRSS.apply(
185+
lambda x: x if type(x) is not str else MaxRSS_helper(x)).max()
186+
wt = _d.ElapsedRaw.max()
187+
188+
data.append(
189+
{'jid': job.id, 'sjid': cjid, 'mem': mem, 'stype': stype,
190+
'wt': wt, 'type': 'convert', 'rid': rid,
191+
'name': _d.JobName[0]})
192+
193+
# now let's look for the next step, if amplicon that's fastqc but
194+
# if other that's qc/nuqc
195+
for jid in range(cjid + 1, cjid + max_inter):
196+
rvals = StringIO(
197+
check_output(sacct + [str(jid)]).decode('ascii'))
198+
_d = pd.read_csv(rvals, sep='|')
199+
if [1 for x in _d.JobName.values if x.startswith(job.id)]:
200+
qc_jid = _d.JobIDRaw.apply(
201+
lambda x: int(x.split('.')[0])).max()
202+
qcmem = _d.MaxRSS.apply(
203+
lambda x: x if type(x) is not str
204+
else MaxRSS_helper(x)).max()
205+
qcwt = _d.ElapsedRaw.max()
206+
207+
if stype == 'amplicon':
208+
data.append(
209+
{'jid': job.id, 'sjid': qc_jid, 'mem': qcmem,
210+
'stype': stype, 'wt': qcwt, 'type': 'fastqc',
211+
'rid': rid, 'name': _d.JobName[0]})
212+
else:
213+
data.append(
214+
{'jid': job.id, 'sjid': qc_jid, 'mem': qcmem,
215+
'stype': stype, 'wt': qcwt, 'type': 'qc',
216+
'rid': rid, 'name': _d.JobName[0]})
217+
for jid in range(qc_jid + 1, qc_jid + max_inter):
218+
rvals = StringIO(check_output(
219+
sacct + [str(jid)]).decode('ascii'))
220+
_d = pd.read_csv(rvals, sep='|')
221+
if [1 for x in _d.JobName.values if x.startswith(
222+
job.id)]:
223+
fqc_jid = _d.JobIDRaw.apply(
224+
lambda x: int(x.split('.')[0])).max()
225+
fqcmem = _d.MaxRSS.apply(
226+
lambda x: x if type(x) is not str
227+
else MaxRSS_helper(x)).max()
228+
fqcwt = _d.ElapsedRaw.max()
229+
data.append(
230+
{'jid': job.id, 'sjid': fqc_jid,
231+
'mem': fqcmem, 'stype': stype,
232+
'wt': fqcwt, 'type': 'fastqc',
233+
'rid': rid, 'name': _d.JobName[0]})
234+
break
235+
break
236+
break
237+
238+
df = pd.DataFrame(data)
239+
df.to_csv(fn, sep='\t', index=False)

Diff for: notebooks/resource-allocation/generate-allocation-summary.py

+2-12
Original file line numberDiff line numberDiff line change
@@ -5,6 +5,7 @@
55
from json import loads
66
from os.path import join
77

8+
from qiita_core.util import MaxRSS_helper
89
from qiita_db.exceptions import QiitaDBUnknownIDError
910
from qiita_db.processing_job import ProcessingJob
1011
from qiita_db.software import Software
@@ -117,19 +118,8 @@
117118
print('Make sure that only 0/K/M exist', set(
118119
df.MaxRSS.apply(lambda x: str(x)[-1])))
119120

120-
121-
def _helper(x):
122-
if x[-1] == 'K':
123-
y = float(x[:-1]) * 1000
124-
elif x[-1] == 'M':
125-
y = float(x[:-1]) * 1000000
126-
else:
127-
y = float(x)
128-
return y
129-
130-
131121
# Generating new columns
132-
df['MaxRSSRaw'] = df.MaxRSS.apply(lambda x: _helper(str(x)))
122+
df['MaxRSSRaw'] = df.MaxRSS.apply(lambda x: MaxRSS_helper(str(x)))
133123
df['ElapsedRawTime'] = df.ElapsedRaw.apply(
134124
lambda x: timedelta(seconds=float(x)))
135125

Diff for: qiita_core/__init__.py

+1-1
Original file line numberDiff line numberDiff line change
@@ -6,4 +6,4 @@
66
# The full license is in the file LICENSE, distributed with this software.
77
# -----------------------------------------------------------------------------
88

9-
__version__ = "2023.12"
9+
__version__ = "2024.02"

Diff for: qiita_core/tests/test_util.py

+15-1
Original file line numberDiff line numberDiff line change
@@ -10,7 +10,7 @@
1010

1111
from qiita_core.util import (
1212
qiita_test_checker, execute_as_transaction, get_qiita_version,
13-
is_test_environment, get_release_info)
13+
is_test_environment, get_release_info, MaxRSS_helper)
1414
from qiita_db.meta_util import (
1515
generate_biom_and_metadata_release, generate_plugin_releases)
1616
import qiita_db as qdb
@@ -82,6 +82,20 @@ def test_get_release_info(self):
8282
self.assertEqual(biom_metadata_release, ('', '', ''))
8383
self.assertNotEqual(archive_release, ('', '', ''))
8484

85+
def test_MaxRSS_helper(self):
86+
tests = [
87+
('6', 6.0),
88+
('6K', 6000),
89+
('6M', 6000000),
90+
('6G', 6000000000),
91+
('6.9', 6.9),
92+
('6.9K', 6900),
93+
('6.9M', 6900000),
94+
('6.9G', 6900000000),
95+
]
96+
for x, y in tests:
97+
self.assertEqual(MaxRSS_helper(x), y)
98+
8599

86100
if __name__ == '__main__':
87101
main()
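
The test cases above pin down the expected behaviour of `MaxRSS_helper`: a plain number comes back as a float, and the `K`, `M` and `G` suffixes scale by 10^3, 10^6 and 10^9. The implementation in `qiita_core/util.py` is not shown in this excerpt, so the following is only a sketch consistent with those cases, under the hypothetical name `max_rss_sketch`.

```python
def max_rss_sketch(value: str) -> float:
    """Convert a SLURM MaxRSS string such as '6.9M' into a plain float.

    Hypothetical sketch; the real MaxRSS_helper may differ in details.
    """
    multipliers = {'K': 1_000, 'M': 1_000_000, 'G': 1_000_000_000}
    suffix = value[-1] if value else ''
    if suffix in multipliers:
        return float(value[:-1]) * multipliers[suffix]
    # No recognised suffix: treat the whole string as a number.
    return float(value)


for raw in ('6', '6K', '6M', '6G'):
    print(raw, max_rss_sketch(raw))
```

Compared with the `_helper` function removed from `generate-allocation-summary.py`, the only behavioural addition this sketch assumes is the `G` suffix, which the new test explicitly covers.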
