# workflowinput.yaml (crisfbazz/NP3_MS_Workflow_Nextflow, master branch)

workflowname: NP3_MS_Workflow_nextflow
workflowdescription: This is the NP3 MS Workflow v1.3.0 for GNPS2 using Nextflow.
workflowlongdescription: The NP3 MS Workflow is a pipeline for processing and analyzing LC-MS/MS metabolomics data, focused on untargeted data. This workflow implements the NP3 *run* command, which executes Steps 2 to 10. The pre-process command (Step 2) is executed separately with the provided parameters. ** Check out the NP³ repository for details about these commands and the pipeline steps - https://github.com/danielatrivella/NP3_MS_Workflow **
workflowversion: "2025.11.26"
workflowfile: nf_workflow.nf
workflowautohide: false
adminonly: false
# This maps the parameters from the input form to those that will appear in Nextflow
parameterlist:
    - displayname: Mandatory Parameters - Files Selection
      paramtype: section

    - displayname: Input Metadata Table
      paramtype: fileselector
      nf_paramname: metadata
      formplaceholder: Enter the path to the metadata table CSV file.
      formvalue: ""
      targettaskfolder: metadata
      optional: false
      selectsinglefile: true
      tooltip: "The path to the metadata table, following the format expected by NP3."

    - displayname: Input Raw Data Folder
      paramtype: fileselector
      nf_paramname: raw_data_path
      formplaceholder: Enter the path to the raw data folder.
      formvalue: ""
      targettaskfolder: raw_data_path
      optional: false
      selectsinglefile: false
      folderunroll: true
      tooltip: "The path to the folder containing the input LC-MS/MS raw spectra data files (mzXML format is recommended). The pre_process (Step 2) output will be stored here. To reuse a previous pre-processed result, this path should point to the directory that stores the pre-processed result folder (the 'np3_results/pre_processed_results/' folder of a previous workflow result)."

    - displayname: Critical Parameters - LC-MS/MS
      paramtype: section

    # critical parms for *run*
    - displayname: Precursor m/z tolerance
      paramtype: text
      nf_paramname: mz_tolerance
      formplaceholder: Enter the m/z tolerance for MS2 precursor
      formvalue: "0.025"
      tooltip: "The tolerance in Daltons for the precursor m/z that determines whether two spectra will be compared and possibly joined. Used in the clustering jobs (Step 3), in the cleaning (Step 5), in the library identifications (Step 6) and in the annotation of ionization variants (Step 7, where it is also used as the fragment tolerance of the annotations) (defaults to 0.025)."

    - displayname: Fragment tolerance for MS2 peaks
      paramtype: text
      nf_paramname: fragment_tolerance
      formplaceholder: Enter the tolerance in Daltons for MS2 peaks
      formvalue: "0.05"
      tooltip: "The tolerance in Daltons for fragment peaks. Peaks in the original MS/MS spectra that are closer than this are merged in the clustering jobs (Step 3). Also used in the pre-process (Step 2), in the spectra similarity comparisons and in the cleaning (Step 5) (defaults to 0.05)."

    - displayname: Ion Mode
      paramtype: text
      nf_paramname: ion_mode
      formplaceholder: Enter '1' for positive or '2' for negative ion mode
      formvalue: "1"
      tooltip: "The precursor ion mode. One of the following numeric values, each corresponding to an ion adduct type: '1' for positive [M+H]+ or '2' for negative [M-H]- ion mode (defaults to '1')."

    # critical parms for *pre_process*
    - displayname: Critical Parameters - Pre_process
      paramtype: section

    - displayname: Expected peak width of MS1 - minimum and maximum
      paramtype: text
      nf_paramname: peak_width
      formplaceholder: Enter the expected minimum,maximum peak width
      formvalue: "2,10"
      tooltip: "Two numeric values separated by a comma, without spaces and using a dot as the decimal separator, giving the expected approximate peak width in chromatographic space as a range (min,max) in seconds. The mean value will be used to simulate the width of the fake MS1 peaks (see the documentation; defaults to '2,10')."

    - displayname: MS1 and MS2 retention time deviation
      paramtype: text
      nf_paramname: rt_tolerance_deviation
      formplaceholder: Enter the retention time deviation between MS1 and MS2
      formvalue: "3.0"
      tooltip: "The retention time tolerance in seconds used to enlarge the MS1 peak boundaries and accept as a match all MS2 ions that have a retention time value within the enlarged MS1 peak range. This tolerance is applied to both sides of the MS1 peaks (RTmin - rt_tolerance and RTmax + rt_tolerance). It helps compensate for poor MS1 peak integrations (defaults to 3 s)."

    - displayname: MS1 and MS2 m/z deviation
      paramtype: text
      nf_paramname: mz_tolerance_deviation
      formplaceholder: Enter the m/z deviation between MS1 and MS2
      formvalue: "0.05"
      tooltip: "The tolerance in Daltons for matching an MS1 peak m/z with an MS2 spectrum precursor m/z (defaults to 0.05)."

    - displayname: PPM tolerance for MS1
      paramtype: text
      nf_paramname: ppm_tolerance
      formplaceholder: Enter the PPM tolerance for MS1
      formvalue: "15"
      tooltip: "The maximum tolerated m/z deviation in consecutive MS1 scans, in parts per million (ppm), for the initial ROI definition of the R::xcms::centWave algorithm (Step 2). Typically set to a generous multiple of the mass accuracy of the mass spectrometer (defaults to 15)."

    # other parms for *run*
    - displayname: Clustering Parameters
      paramtype: section

    - displayname: Retention time tolerances
      paramtype: text
      nf_paramname: rt_tolerance
      formplaceholder: Enter x,y retention time tolerances in seconds
      formvalue: "1,2"
      tooltip: "Two retention time tolerances (x,y) in seconds that widen the retention time range of the precursor and determine whether two spectra will be compared and possibly joined. Each tolerance is applied directly to the retention time minimum (subtracted) and maximum (added) of the spectra, enlarging the peak boundaries to deal with misaligned samples or ionization variant spectra. The first tolerance [x] is used in Step 3 (first clustering) and Step 7 (ionization variants annotation); the second tolerance [y] is used in Step 3 (final clustering) and Step 5 (clean) (defaults to '1,2')."

# removed for now
#    - displayname: Similarity Function for Clean and MN
#      paramtype: select
#      nf_paramname: similarity_function
#      formvalue: np3_shifted_cosine
#      tooltip: "the similarity function to be used in the spectra comparison to create the pairwise similarity tables for clean (Step 5) and molecular networking (Step 10). One of 'np3_shifted_cosine' or 'spec2vec'. If 'spec2vec' is selected, the model trained on UniqueInchikey subset (12,797 spectra) is used by spec2vec in the spectra comparison; otherwise, the NP3 shifted cosine function is used (default to 'np3_shifted_cosine')."
#      options:
#        - value: np3_shifted_cosine
#          display: np3_shifted_cosine
#        - value: spec2vec
#          display: spec2vec

    - displayname: Trim Precursor m/z
      paramtype: select
      nf_paramname: trim_mz
      formvalue: TRUE
      tooltip: "A logical 'True' or 'False' indicating whether the fragment peaks within +/-20 Da of the precursor m/z should be deleted before the pairwise comparisons. If 'True', this removes the residual precursor ion, which is frequently observed in MS/MS spectra acquired on qTOFs (defaults to 'True')."
      options:
        - value: TRUE
          display: True
        - value: FALSE
          display: False

    - displayname: Noise cutoff
      paramtype: text
      nf_paramname: noise_cutoff
      formplaceholder: Enter the noise cutoff value
      formvalue: "FALSE"
      tooltip: "A positive numeric value used to scale the interquartile range (IQR) of the blank spectra basePeakInt distribution from the clustering (Step 3) result; spectra with a basePeakInt value below the median of this distribution plus IQR*noise_cutoff are removed after the clean step (Step 5). Use FALSE to disable it. When no blank sample is present in the metadata, the full distribution is used. This cutoff affects spectra with a low basePeakInt value, which are probably noise features."
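    # Worked example for noise_cutoff (illustrative numbers only, not taken from NP3):
    # with a blank basePeakInt median of 1000 and an IQR of 400, noise_cutoff = 1.5
    # gives a threshold of 1000 + 400*1.5 = 1600, so spectra with a basePeakInt
    # below 1600 would be removed after Step 5.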

    - displayname: Molecular Networking Parameters
      paramtype: section

    - displayname: Minimum Similarity
      paramtype: text
      nf_paramname: similarity_mn
      formplaceholder: Enter the minimum similarity score for molecular networking
      formvalue: "0.6"
      tooltip: "The minimum similarity score that a pair of consensus spectra must reach to be connected by a link in the molecular network of similarity. Lower values increase the component sizes by connecting less related spectra; higher values reduce the component sizes by keeping only links between more closely related spectra."

    - displayname: Network Top K
      paramtype: text
      nf_paramname: net_top_k
      formplaceholder: Enter the maximum top K for molecular networking
      formvalue: "15"
      tooltip: "The maximum number of connections for a single node in the molecular network of similarity. A link between two nodes is kept only if both nodes are within each other's top K most similar nodes. Keeping this value low makes very large networks (many nodes) much easier to visualize (defaults to 15)."

    - displayname: Maximum Component Size
      paramtype: text
      nf_paramname: max_component_size
      formplaceholder: Enter the maximum number of nodes for the network components
      formvalue: "200"
      tooltip: "The maximum number of nodes that each component of the molecular network of similarity may have (Step 10). Links of this network are removed using an increasing cosine threshold until each component has at most this number of nodes. Keeping this value low makes very large networks (many nodes and links) much easier to visualize (defaults to 200)."

    - displayname: Minimum Number of Matched Peaks
      paramtype: text
      nf_paramname: min_matched_peaks
      formplaceholder: Enter the minimum number of matched peaks to connect spectra
      formvalue: "6"
      tooltip: "The minimum number of common peaks that two spectra must share to be connected by an edge in the filtered SSMN. Connections between spectra with fewer common peaks than this cutoff are removed when filtering the SSMN, except when one of the spectra has fewer fragment peaks than the given min_matched_peaks value; in that case the spectra must share at least 2 peaks. The fragment peaks are counted after the spectra are normalized and cleaned (defaults to 6)."


    - displayname: Optional Parameters - admins
      paramtype: section

    # install_dependencies
    - displayname: Install Dependencies
      paramtype: select
      nf_paramname: install_dependencies
      formvalue: "No"
      options:
        - value: "Yes"
          display: "Yes"
        - value: "No"
          display: "No"