Commit df8591a

Merge pull request #51 from msk-access/sort_pv_filter
Sort pv filter
2 parents 5e7f718 + 8d2c57d commit df8591a

5 files changed: +249 -25 lines changed

.github/workflows/document_package.yml

Lines changed: 1 addition & 0 deletions
@@ -40,6 +40,7 @@ jobs:
           git checkout docs --
           git checkout ${{ steps.extract_branch.outputs.branch }} -- README.md
           git checkout ${{ steps.extract_branch.outputs.branch }} -- docs/cli.md
+          mv docs/cli.md cli.md
       - uses: EndBug/add-and-commit@v9
         with:
           default_author: github_actions

.github/workflows/python-publish.yml

Lines changed: 0 additions & 19 deletions
This file was deleted.

docs/cli.md

Lines changed: 211 additions & 2 deletions
@@ -17,6 +17,7 @@ $ main [OPTIONS] COMMAND [ARGS]...

 * `maf`: operations for manipulating maf files...
 * `mutect1`: post-processing commands for MuTect...
+* `mutect2`: post-processing commands for MuTect...
 * `vardict`: post-processing commands for VarDict...

 ## `main maf`
@@ -58,9 +59,27 @@ $ main maf annotate [OPTIONS] COMMAND [ARGS]...

 **Commands**:

+* `extract_blocklist`: Extract values from an optional blocklist...
 * `mafbybed`: annotate a maf column by a bed file.
 * `mafbytsv`: annotate a maf column by a bed file.

+#### `main maf annotate extract_blocklist`
+
+Extract values from an optional blocklist file if provided. Used in SNVs/indels workflow.
+
+**Usage**:
+
+```console
+$ main maf annotate extract_blocklist [OPTIONS]
+```
+
+**Options**:
+
+* `-b, --blocklist_file FILE`: Blocklist text file to extract values from. Needs to be in TSV format [required]
+* `-m, --maf FILE`: MAF file to subset [required]
+* `-sep, --separator TEXT`: Specify a separator for delimited data. [default: tsv]
+* `--help`: Show this message and exit.
+
 #### `main maf annotate mafbybed`

 annotate a maf column by a bed file.
@@ -135,13 +154,66 @@ $ main maf filter [OPTIONS] COMMAND [ARGS]...

 **Commands**:

+* `access_filters`: Filter a MAF file based on all the...
+* `access_remove_variants`: Filter a MAF file based on all the...
 * `cmo_ch`: Filter a MAF file based on all the parameters
 * `hotspot`: filter a MAF file based on the presence of...
 * `mappable`: Filter a MAF file to retain only mappable...
 * `non_common_variant`: Filter a MAF file for common variants and...
 * `non_hotspot`: filter a MAF file based on the presence of...
 * `not_complex`: Filter a MAF filter for complex variants...

+#### `main maf filter access_filters`
+
+Filter a MAF file based on all the parameters listed in ACCESS filters python script
+
+**Usage**:
+
+```console
+$ main maf filter access_filters [OPTIONS]
+```
+
+**Options**:
+
+* `-f, --fillout_maf FILE`: Fillout MAF file to subset (direct output from traceback subworkflow) [required]
+* `-a, --anno_maf FILE`: Annotated MAF file to subset (direct input file from beginning of traceback subworkflow) [required]
+* `-o, --output PATH`: Maf output file name. [default: output]
+* `-sep, --separator TEXT`: Specify a seperator for delimited data. [default: tsv]
+* `-bl, --blocklist TEXT`: Optional input blocklist file for access filtering criteria. [default: tsv]
+* `-ts, --tumor_samplename TEXT`: Name of Tumor Sample [required]
+* `-ns, --normal_samplename TEXT`: Name of MATCHED normal sample [required]
+* `--tumor_detect_alt_thres TEXT`: The Minimum Alt depth required to be considered detected in fillout [default: 2]
+* `--tumor_detect_alt_thres TEXT`: The Minimum Alt depth required to be considered detected in fillout [default: 2]
+* `--curated_detect_alt_thres TEXT`: The Minimum Alt depth required to be considered detected in fillout [default: 2]
+* `--plasma_detect_alt_thres TEXT`: The Minimum Alt depth required to be considered detected in fillout [default: 2]
+* `--tumor_TD_min TEXT`: The Minimum Total Depth required in tumor to consider a variant Likely Germline [default: 20]
+* `--normal_TD_min TEXT`: The Minimum Total Depth required in Matched Normal to consider a variant Germline [default: 20]
+* `--tumor_vaf_germline_thres TEXT`: The threshold for variant allele fraction required in Tumor to be consider a variant Likely Germline [default: 0.4]
+* `--tumor_vaf_germline_thres TEXT`: The threshold for variant allele fraction required in Matched Normal to be consider a variant Germline [default: 0.4]
+* `--tier_one_alt_min TEXT`: The Minimum Alt Depth required in hotspots [default: 3]
+* `--tier_two_alt_min TEXT`: The Minimum Alt Depth required in non-hotspots [default: 5]
+* `--min_n_curated_samples_alt_detected TEXT`: The Minimum number of curated samples variant is detected to be flagged [default: 2]
+* `--tn_ratio_thres TEXT`: Tumor-Normal variant fraction ratio threshold [default: 5]
+* `--help`: Show this message and exit.
+
+#### `main maf filter access_remove_variants`
+
+Filter a MAF file based on all the parameters satisfied by the remove variants by annotations CWL script in the ACCESS pipeline
+
+**Usage**:
+
+```console
+$ main maf filter access_remove_variants [OPTIONS]
+```
+
+**Options**:
+
+* `-m, --maf FILE`: MAF file to subset [required]
+* `-i, --intervals FILE`: Intervals file containing rows of criterion to tag input MAF by [required]
+* `-o, --output PATH`: Maf output file name. [default: output.maf]
+* `-sep, --separator TEXT`: Specify a seperator for delimited data. [default: tsv]
+* `--help`: Show this message and exit.
+
 #### `main maf filter cmo_ch`

 Filter a MAF file based on all the parameters
@@ -207,7 +279,7 @@ $ main maf filter non_common_variant [OPTIONS]

 * `-m, --maf FILE`: MAF file to subset [required]
 * `-o, --output PATH`: Maf output file name. [default: output.maf]
-* `-sep, --separator TEXT`: Specify a seperator for delimited data. [default: tsv]
+* `-sep, --separator TEXT`: Specify a separator for delimited data. [default: tsv]
 * `--help`: Show this message and exit.

 #### `main maf filter non_hotspot`
@@ -300,13 +372,72 @@ $ main maf tag [OPTIONS] COMMAND [ARGS]...

 **Commands**:

+* `access`: Tag a variant in a MAF file based on...
+* `by_rules`: Tag a variant in a MAF file based on...
+* `by_variant_classification`: Tag filtered MAF file by variant...
 * `cmo_ch`: Tag a variant in MAF file based on all the...
 * `common_variant`: Tag a variant in a MAF file as common...
 * `germline_status`: Tag a variant in a MAF file as germline...
+* `hotspots`: Tag a variant in a MAF file based on...
 * `prevalence_in_cosmicDB`: Tag a variant in a MAF file with...
 * `traceback`: Generate combined count columns between...
 * `truncating_mut_in_TSG`: Tag a truncating mutating variant in a MAF...

+#### `main maf tag access`
+
+Tag a variant in a MAF file based on criterion stated by the SNV/indels ACCESS pipeline workflow
+
+**Usage**:
+
+```console
+$ main maf tag access [OPTIONS]
+```
+
+**Options**:
+
+* `-m, --maf FILE`: MAF file to tag [required]
+* `-r, --rules FILE`: Intervals JSON file containing criterion to tag input MAF by [required]
+* `-h, --hotspots FILE`: Text file containing hotspots to tag input MAF by [required]
+* `-o, --output PATH`: Maf output file name. [default: output_tagged.maf]
+* `-sep, --separator TEXT`: Specify a separator for delimited data. [default: tsv]
+* `--help`: Show this message and exit.
+
+#### `main maf tag by_rules`
+
+Tag a variant in a MAF file based on criterion stated by an input rules.json JSON file
+
+**Usage**:
+
+```console
+$ main maf tag by_rules [OPTIONS]
+```
+
+**Options**:
+
+* `-m, --maf FILE`: MAF file to tag [required]
+* `-r, --rules FILE`: Intervals JSON file containing criterion to tag input MAF by [required]
+* `-o, --output PATH`: Maf output file name. [default: output_tagged.maf]
+* `-sep, --separator TEXT`: Specify a separator for delimited data. [default: tsv]
+* `--help`: Show this message and exit.
+
+#### `main maf tag by_variant_classification`
+
+Tag filtered MAF file by variant classifications and subset into individual text files.
+
+**Usage**:
+
+```console
+$ main maf tag by_variant_classification [OPTIONS]
+```
+
+**Options**:
+
+* `-m, --maf FILE`: filtered MAF file to split by annotations with [required]
+* `-tx_ref, --canonical_tx_ref FILE`: Reference canonical transcript file [required]
+* `-o, --output_dir PATH`: Output Directory to export individual text files to. [default: output_dir]
+* `-sep, --separator TEXT`: Specify a seperator for delimited data. [default: tsv]
+* `--help`: Show this message and exit.
+
 #### `main maf tag cmo_ch`

 Tag a variant in MAF file based on all the parameters listed
@@ -358,6 +489,24 @@ $ main maf tag germline_status [OPTIONS]
 * `-sep, --separator TEXT`: Specify a seperator for delimited data. [default: tsv]
 * `--help`: Show this message and exit.

+#### `main maf tag hotspots`
+
+Tag a variant in a MAF file based on hotspots file
+
+**Usage**:
+
+```console
+$ main maf tag hotspots [OPTIONS]
+```
+
+**Options**:
+
+* `-m, --maf FILE`: MAF file to tag [required]
+* `-h, --hotspots FILE`: Text file containing hotspots to tag input MAF by [required]
+* `-o, --output PATH`: Maf output file name. [default: output_tagged.maf]
+* `-sep, --separator TEXT`: Specify a separator for delimited data. [default: tsv]
+* `--help`: Show this message and exit.
+
 #### `main maf tag prevalence_in_cosmicDB`

 Tag a variant in a MAF file with prevalence in COSMIC DB
@@ -390,6 +539,7 @@ $ main maf tag traceback [OPTIONS]
 * `-m, --maf FILE`: MAF file to tag [required]
 * `-o, --output PATH`: Maf output file name. [default: output.maf]
 * `-sep, --separator TEXT`: Specify a seperator for delimited data. [default: tsv]
+* `-sheet, --samplesheet PATH`: Samplesheets in nucleovar formatting. See README for more info: `https://github.com/mskcc-omics-workflows/nucleovar/blob/main/README.md`. Used to add fillout type information to maf. The `sample_id` and `type` columns must be present.
 * `--help`: Show this message and exit.

 #### `main maf tag truncating_mut_in_TSG`
@@ -468,6 +618,65 @@ $ main mutect1 case-control filter [OPTIONS]
 * `-o, --outDir TEXT`: Full Path to the output dir
 * `--help`: Show this message and exit.

+## `main mutect2`
+
+post-processing commands for MuTect version 2 VCFs.
+
+**Usage**:
+
+```console
+$ main mutect2 [OPTIONS] COMMAND [ARGS]...
+```
+
+**Options**:
+
+* `--help`: Show this message and exit.
+
+**Commands**:
+
+* `case-control`: Post-processing commands for filtering of...
+
+### `main mutect2 case-control`
+
+Post-processing commands for filtering of MuTect version 2 VCF input file.
+
+**Usage**:
+
+```console
+$ main mutect2 case-control [OPTIONS] COMMAND [ARGS]...
+```
+
+**Options**:
+
+* `--help`: Show this message and exit.
+
+**Commands**:
+
+* `filter`: This tool helps to filter MuTect version 2...
+
+#### `main mutect2 case-control filter`
+
+This tool helps to filter MuTect version 2 VCFs for case-control calling
+
+**Usage**:
+
+```console
+$ main mutect2 case-control filter [OPTIONS]
+```
+
+**Options**:
+
+* `-i, --inputVcf FILE`: Input vcf generated by MuTect2 which needs to be processed [required]
+* `-it, --inputTxt FILE`: Input Txt generated by MuTect which needs to be processed. NOTE, a Txt file will not be used for Mutect2 filtering as it is not provided in standard output. [default: /dev/null]
+* `--refFasta FILE`: Input reference fasta [default: /dev/null]
+* `--tsampleName TEXT`: Name of the tumor sample. [required]
+* `-dp, --totalDepth INTEGER RANGE`: Tumor total depth threshold [default: 20; x>=0]
+* `-ad, --alleleDepth INTEGER RANGE`: [default: 1; x>=0]
+* `-tnr, --tnRatio INTEGER RANGE`: Tumor-Normal variant fraction ratio threshold [default: 1; x>=0]
+* `-vf, --variantFraction FLOAT RANGE`: Tumor variant fraction threshold [default: 5e-05; x>=0]
+* `-o, --outDir TEXT`: Full Path to the output dir
+* `--help`: Show this message and exit.
+
 ## `main vardict`

 post-processing commands for VarDict version 1.4.6 VCFs.
@@ -519,8 +728,8 @@ $ main vardict case-control filter [OPTIONS]

 * `-i, --inputVcf FILE`: Input vcf generated by vardict which needs to be processed [required]
 * `--tsampleName TEXT`: Name of the tumor Sample [required]
+* `-ad, --alleledepth INTEGER RANGE`: [x>=1] [required]
 * `-dp, --totalDepth INTEGER RANGE`: Tumor total depth threshold [default: 20; x>=20]
-* `-ad, --alleledepth INTEGER RANGE`: [x>=1]
 * `-tnr, --tnRatio INTEGER`: Tumor-Normal variant fraction ratio threshold [default: 1]
 * `-vf, --variantFraction FLOAT`: Tumor variant fraction threshold [default: 5e-05]
 * `-mq, --minQual INTEGER`: Minimum variant call quality [default: 0]
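
The `case-control filter` options documented in this diff (`--totalDepth`, `--alleleDepth`, `--variantFraction`, `--tnRatio`) are all thresholds, but the docs do not say how they are combined. The sketch below is one plausible reading, not the package's implementation: the `Thresholds` class, the `passes` function, and the rule that the tumor variant fraction must exceed `tnRatio` times the normal variant fraction are all assumptions made for illustration. Only the default values are taken from the `mutect2` option list.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Thresholds:
    """Hypothetical container; defaults copied from the mutect2 docs."""
    total_depth: int = 20             # -dp, --totalDepth
    allele_depth: int = 1             # -ad, --alleleDepth
    tn_ratio: int = 1                 # -tnr, --tnRatio
    variant_fraction: float = 5e-05   # -vf, --variantFraction


def passes(t_dp: int, t_ad: int, n_dp: int, n_ad: int,
           th: Thresholds = Thresholds()) -> bool:
    """Assumed rule: tumor must clear every threshold and beat the normal."""
    t_vf = t_ad / t_dp if t_dp else 0.0
    n_vf = n_ad / n_dp if n_dp else 0.0
    if t_dp < th.total_depth:
        return False
    if t_ad < th.allele_depth:
        return False
    if t_vf < th.variant_fraction:
        return False
    # Assumption: the tumor fraction must exceed tn_ratio times the normal one.
    return t_vf > th.tn_ratio * n_vf


print(passes(t_dp=100, t_ad=10, n_dp=100, n_ad=1))  # tumor well above normal
print(passes(t_dp=10, t_ad=5, n_dp=100, n_ad=0))    # tumor depth below 20
```

Keeping the thresholds in one immutable object makes it easy to compare the `mutect2` and `vardict` defaults side by side, should the real filters share this shape.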

postprocessing_variant_calls/vardict/vardict_class.py

Lines changed: 30 additions & 1 deletion
@@ -62,6 +62,10 @@ def __init__(
         self.txt_out = self.vcf_out + "_STDfilter.txt"
         self.vcf_complex_out = self.vcf_out + "_STDfilter_complex.vcf"
         self.vcf_out = self.vcf_out + "_STDfilter.vcf"
+        # vcf output from sort
+        self.vcf_out_sort = self.out_name()
+        self.vcf_complex_out_sort = self.vcf_complex_out.replace(".vcf", "_sorted.vcf")
+        self.vcf_out_sort = self.vcf_out.replace(".vcf", "_sorted.vcf")
         # vcf reader
         self.vcf_reader = self.set_reader()
         # sample list
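
The sorted output names in this hunk are derived from the existing `_STDfilter` paths with `str.replace`. A minimal sketch of that derivation follows; the helper names and the sample paths are hypothetical and not part of the repository. It also shows a suffix-anchored variant, since `str.replace` rewrites every occurrence of `.vcf` in the path, not only the extension.

```python
from pathlib import PurePosixPath


def sorted_name(vcf_path: str) -> str:
    """Same approach as the diff: replace ".vcf" with "_sorted.vcf"."""
    return vcf_path.replace(".vcf", "_sorted.vcf")


def sorted_name_suffix_only(vcf_path: str) -> str:
    """Hypothetical alternative that only touches the final extension."""
    p = PurePosixPath(vcf_path)
    return str(p.with_name(p.stem + "_sorted" + p.suffix))


# A plain file name behaves identically under both helpers.
print(sorted_name("sampleA_STDfilter.vcf"))
print(sorted_name_suffix_only("sampleA_STDfilter.vcf"))

# A directory that happens to contain ".vcf" is rewritten by str.replace.
print(sorted_name("runs.vcf.d/sampleA_STDfilter.vcf"))
print(sorted_name_suffix_only("runs.vcf.d/sampleA_STDfilter.vcf"))
```

Note also that, as committed, `self.vcf_out_sort` is assigned twice, so the value from `self.out_name()` is overwritten by the `replace`-based name two lines later.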
@@ -371,8 +375,33 @@ def filter_case_control(self):
                         + "\n"
                     )
                     txt_fh.write(out_line)
-
         vcf_writer.close()
         vcf_complex_writer.close()
         txt_fh.close()
         return self.vcf_out, self.vcf_complex_out, self.txt_out
+
+    def sort_vcf(self):
+        # Read the input VCF file
+        vcf_reader = vcf.Reader(open(self.vcf_out, "r"))
+        sorted_records = sorted(
+            vcf_reader, key=lambda record: (record.CHROM, record.POS)
+        )
+        # Write sorted records to the output VCF file
+        vcf_writer = vcf.Writer(open(self.vcf_out_sort, "w"), vcf_reader)
+        for record in sorted_records:
+            vcf_writer.write_record(record)
+        vcf_writer.close()
+        return self.vcf_out_sort
+
+    def sort_vcf_complex(self):
+        # Read the input VCF file
+        vcf_reader = vcf.Reader(open(self.vcf_complex_out, "r"))
+        sorted_records = sorted(
+            vcf_reader, key=lambda record: (record.CHROM, record.POS)
+        )
+        # Write sorted records to the output VCF file
+        vcf_writer = vcf.Writer(open(self.vcf_complex_out_sort, "w"), vcf_reader)
+        for record in sorted_records:
+            vcf_writer.write_record(record)
+        vcf_writer.close()
+        return self.vcf_complex_out_sort
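
Both new methods sort on `(record.CHROM, record.POS)`. Because `CHROM` is a string, that key orders contigs lexicographically, so `"10"` sorts before `"2"`. Whether that matters depends on what downstream tools expect; many expect the contig order of the reference or VCF header. The sketch below contrasts the key used in the diff with a hypothetical natural-order key. It uses plain `(chrom, pos)` tuples instead of PyVCF records so that it runs without dependencies, and the `chrom_sort_key` helper is an illustration, not code from the repository.

```python
def chrom_sort_key(chrom: str, pos: int):
    """Order numeric contigs numerically, then X, Y, M/MT, then the rest."""
    name = chrom[3:] if chrom.lower().startswith("chr") else chrom
    if name.isdigit():
        return (0, int(name), "", pos)
    special = {"X": 0, "Y": 1, "M": 2, "MT": 2}
    if name.upper() in special:
        return (1, special[name.upper()], "", pos)
    # Unplaced or decoy contigs fall back to alphabetical order.
    return (2, 0, name, pos)


# Stand-ins for (record.CHROM, record.POS).
records = [("10", 500), ("2", 900), ("X", 10), ("2", 100), ("1", 42)]

# Key from the diff: plain string comparison on the contig name.
lexicographic = sorted(records, key=lambda r: (r[0], r[1]))

# Hypothetical natural-order key.
natural = sorted(records, key=lambda r: chrom_sort_key(r[0], r[1]))

print(lexicographic)
print(natural)
```

With PyVCF records the same helper could be applied as `key=lambda record: chrom_sort_key(record.CHROM, record.POS)`. Since `sort_vcf` and `sort_vcf_complex` differ only in their input and output paths, a single parameterised helper would remove the duplication.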
