README.md: 16 additions & 8 deletions
@@ -12,12 +12,11 @@
  - Now the PCA creation in the final reports procedure removes any blanks or beds before plotting; and also creates a PCA plot with only the putative protonated m/z.
  - Bug fix in the join_jobs command, Step 7 to join IVAMNs (fixed the removal of invalid connections).
  - In the **test** command, added a new output_path parameter to allow running and storing the NP³ test results in a different folder (not in the repository folder).
- - (1.2.2) The command **join_jobs** was updated with incremental clustering approaches to guarantee that the msclusterIDs from the reference job are maintained throughout consecutive executions. Also, the SAMPLE_CODE from the samples metadata among different original jobs are now allowed to contain duplicated values, which are automatically resolved and the original codes and last used codes are stored in new separated columns (see documentation). Additionally, now only the results from the joining jobs are needed, the results from all original jobs do not need to be kept for consecutive executions.
- - A final report was implemented and the conda environment was updated with new dependencies. The **setup** command must be executed again for this new version after updating the conda environment. The final reports are documented in the manual, they contain quantification, chemical and molecular networking statistics and plots. The chemical space of the identified result using PCA method is also created using reference datasets for comparison and reproducibility.
+ - (1.2.2) The command **join_jobs** was updated with incremental clustering approaches to guarantee that the msclusterIDs from the reference job are maintained throughout consecutive executions (see documentation).
+ - A final report was implemented and the conda environment was updated with new dependencies. The **setup** command must be executed again for this new version after updating the conda environment.
  - The **gnps_result** command has one new mandatory parameter called job_output_path and one new optional parameter equal to the metadata table path.
  - (1.2.1) The quantification grouping is now performed by the *corr* command (Step 9) and before every biocorrelation computation. This allows adding extra groupings to a job a posteriori its processing. Useful to add groups to a joined job.
- - (1.2.0) A new command called **join_jobs** was created! The *join_jobs* command is used to join NP³ jobs (results of the *run* or the *join_jobs* commands) into a single united job.
- - The *join_jobs* command may be useful for processing growing libraries, which will have new datasets being included from time to time; or for processing very large jobs, which may be divided into smaller jobs and then joined by chunks with a smaller memory footprint (divide and conquer strategy).
+ - (1.2.0) A new command called **join_jobs** was created! The *join_jobs* command is used to join NP³ jobs (results of the *run* or the *join_jobs* commands) into a single united job.

  - - - -

@@ -42,7 +41,7 @@ The NP³ MS workflow consists of ten major steps, where only the first requires

  This workflow also contains two interactive commands for MS1 and MS2 data visualization and analysis. A third command to join the GNPS library identification results to the NP³ MS workflow quantification tables. A fourth command to unite different results from the NP³ MS Workflow. And a fifth command to create PCA plots in the NP³ reference chemical space.

- The Steps 2 to 10 can be automatically executed with the NP³ command **run**. And different results from the NP³ MS Workflow may be united using the command **join_jobs**, which automatically executes Steps 3 to 10 with adaptations for joining previous results in an incremental clustering approach.
+ The Steps 2 to 10 can be automatically executed with the NP³ command **run**. And different results from the NP³ MS Workflow may be united using the command **join_jobs**, which automatically executes Steps 3 to 10 with adaptations for joining previous results in an incremental clustering approach. At the end of the pipeline processing, final reports are created containing quantification, chemical and molecular networking statistics and plots. The chemical space of the identified result using PCA method is also created using a reference dataset for comparison and reproducibility.

  For the complete details of each command see the [NP³ MS workflow user manual](docs/Manual_NP3_workflow.pdf).

@@ -197,7 +196,7 @@ L754_bacs_test
  │
  ├── outs <- the results from the clustering steps in separated folders, and inside them the results from the other workflow steps as described below
  │ │
- │ ├── L754_bacs_test <- the final clustering (Step 3) result folder
+ │ ├── L754_bacs_test <- the final clustering (Step 3) result folder - final results!
  │ │ │
  | | ├── clust <- the folder with clusters membership files (which SCANS or msclusterID were joined in the final clustering step) (Step3)
  │ │ │
@@ -206,7 +205,15 @@ L754_bacs_test
  │ │ │ ├── clean <- the folder with the quantification tables from Steps 5, 7 and 9
  │ │ │ |
  │ │ │ └── merge <- the folder with the quantification tables from Steps 8 and 9
- │ │ │
+ │ │ │
+ | | ├── final_reports <- the folder with the final reports computed at the end of the processing based on the final clean counts and identifications
+ │ │ │ |
+ │ │ │ ├── chemical_report <- the folder with the chemical statistics and PCA plots (chemical_space_identifications subfolder)
+ │ │ │ |
+ │ │ │ ├── molecular_networking_report <- the folder with the molecular networks statistics
+ │ │ │ |
+ │ │ │ └── quantification_report <- the folder with the quantification statistics
+ │ │ │
  | | ├── identifications <- the folder with the complete list of identifications from UNPD returned by tremolo
  │ │ │
  | | ├── mgf <- the folder with the MGF files from the clustering Step 3 (named L754_bacs_test_all.mgf), containing the complete list of consensus spectra, and from the clean Step 5 (named L754_bacs_test_clean.mgf), containing the final list of clean consensus spectra.
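
The final_reports layout added in the hunk above can be checked after a job finishes. The following is a minimal sketch in Python, assuming only the three subfolder names shown in the tree. The helper name `missing_report_dirs` is hypothetical and is not part of NP³.

```python
from pathlib import Path

# Subfolder names taken from the directory tree in the README diff.
EXPECTED_REPORTS = (
    "chemical_report",
    "molecular_networking_report",
    "quantification_report",
)


def missing_report_dirs(job_result_dir):
    """Return the expected final_reports subfolders absent from job_result_dir.

    Hypothetical helper: job_result_dir is the final clustering result folder
    (the folder that holds 'final_reports' in the tree).
    """
    reports_root = Path(job_result_dir) / "final_reports"
    return [name for name in EXPECTED_REPORTS if not (reports_root / name).is_dir()]
```

An empty list means all three report folders exist. Any returned names point to reports that were not created, for example because the **setup** command was not executed again after updating the conda environment.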
@@ -329,7 +336,8 @@ Commands:
  - List of mandatory options:
  - *\-o, \-\-output_path* \<path\> : path to the output data folder, inside the outs directory of the clustering result folder. It should contain the 'molecular_networking' folder and inside it the 'similarity_tables' folder. The job name will be extracted from here

- - **join_jobs** [options] : Command to join NP³ jobs (results of the *run* or the *join_jobs* commands) into a single united job using an incremental clustering approach. Concatenate different jobs without the need of running them all together again. Uses a different metadata, called *metadata_join*, defining the jobs to be joined and their unique reference codes, the names of their used metadata and pre processing directory. It uses the clean results from the provided NP3 jobs and executes the main pipeline from Step 3 to 10 with some modifications and adaptations in an incremental clustering manner, except for Step 8 which is skipped.
+ - **join_jobs** [options] : Command to join NP³ jobs (results of the *run* or the *join_jobs* commands) into a single united job using an incremental clustering approach. Concatenate different jobs without the need of running them all together again. Uses a different metadata, called *metadata_join*, defining the jobs to be joined and their unique reference codes, the names of their used metadata and pre processing directory. It uses the clean results from the provided NP3 jobs and executes the main pipeline from Step 3 to 10 with some modifications and adaptations in an incremental clustering manner, except for Step 8 which is skipped.
+ The *join_jobs* command may be useful for processing growing libraries, which will have new datasets being included from time to time; or for processing very large jobs, which may be divided into smaller jobs and then joined by chunks with a smaller memory footprint (divide and conquer strategy).
  - List of mandatory options:
  - *\-n, \-\-output_name* \<name\> : the job name. It will be used to name the output directory and the results from joining the jobs. It must have less than 80 characters.
  - *\-m, \-\-metadata_join* \<file\> : path to the metadata_join table CSV file defining the jobs to be joined. Different format, see manual.
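
The two mandatory **join_jobs** options listed above can be validated before a join is launched. This is a sketch in Python under stated assumptions: the function `build_join_jobs_args` is hypothetical, the `.csv` suffix check is an assumption derived from the "CSV file" wording, and the function only assembles an argument list. How the NP³ command itself is invoked is not shown in this diff, so nothing is executed here.

```python
from pathlib import Path

# The README states the job name must have less than 80 characters.
MAX_NAME_LENGTH = 80


def build_join_jobs_args(output_name, metadata_join):
    """Assemble the mandatory join_jobs options after basic validation.

    Hypothetical helper: it only builds an argument list and never runs NP³.
    """
    if not output_name or len(output_name) >= MAX_NAME_LENGTH:
        raise ValueError("output_name must have between 1 and 79 characters")
    metadata_path = Path(metadata_join)
    # Assumption: the metadata_join table is identified by a .csv extension.
    if metadata_path.suffix.lower() != ".csv":
        raise ValueError("metadata_join must point to a CSV file")
    return [
        "join_jobs",
        "--output_name", output_name,
        "--metadata_join", str(metadata_path),
    ]
```

The returned list can be appended to whatever launcher the installation uses. Rejecting a long job name up front avoids starting Steps 3 to 10 only to fail on the output directory name.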