Adds fq/lint for early validation of FASTQs #67
base: dev
Duplicate of #57?
It is! Sorry, I wasn't aware of that one, although I thought I checked 🤔. This one has some additional features to handle different use cases for fq lint, such as continuing despite lint failures.
Got scooped! :D

If you'd consider closing it and reviewing this one, we can maybe get the best of both worlds 🌻
I think there is a small typo in README.md line 34.
Rest seems good to me.
Warning: A newer version of the nf-core template is available. Your pipeline is using an old version of the nf-core template: 3.0.2. For more documentation on how to update your pipeline, please see the nf-core documentation and Synchronisation documentation.
Validating FASTQs early prevents running the pipeline on invalid FASTQ files, which makes the pipeline more efficient at achieving its ultimate objective of checking FASTQ validity.

It adds 3 more parameters:

- `--skip_linting`, which skips the linting of FASTQs
- `--fq_lint_args`, a string of arguments to pass to the linting tool
- `--continue_with_lint_fail`, a boolean that determines whether to continue if the linting fails

Between these three options the user has a high degree of control over how the pipeline lints, which should handle most use cases.

Closes nf-core#31
f6e38f9 to 7072d59
Hej, @adamrtalbot, thanks for your PR :). Just to let you know that we decided in the last seqinspector meeting on a defined list of modules to add to version 1. So while this is great, we will only implement it in a version after the first release. It's basically just to keep the first release simple.
Hi, just had a minor comment on this PR.
  "type": "string",
  "description": "Comma-separated string of tools to skip",
- "pattern": "^((fastqc|fastqscreen|seqfu_stats|seqtk_sample)?,?)*(?<!,)$"
+ "pattern": "^((fq|fastqc|fastqscreen|seqfu_stats|seqtk_sample)?,?)*(?<!,)$"
Suggested change:
- "pattern": "^((fq|fastqc|fastqscreen|seqfu_stats|seqtk_sample)?,?)*(?<!,)$"
+ "pattern": "^((fq_lint|fastqc|fastqscreen|seqfu_stats|seqtk_sample)?,?)*(?<!,)$"
Since we have used the naming convention <tool>_<subcommand> for the other tools, it seems prudent to keep this going.
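To illustrate what the suggested pattern accepts, here is a minimal sketch that checks a few `--skip_tools` values against it. It uses Python's `re` module as a stand-in for the schema validator, which is an assumption; the pipeline's actual JSON Schema validation may use a different regex engine. The helper name `is_valid_skip_tools` is hypothetical.

```python
import re

# Pattern copied from the suggested change above (tool named `fq_lint`).
# Assumption: Python's `re` behaves like the schema validator for this pattern.
SKIP_TOOLS_PATTERN = re.compile(
    r"^((fq_lint|fastqc|fastqscreen|seqfu_stats|seqtk_sample)?,?)*(?<!,)$"
)


def is_valid_skip_tools(value: str) -> bool:
    """Return True if `value` matches the skip_tools pattern (hypothetical helper)."""
    return SKIP_TOOLS_PATTERN.match(value) is not None


if __name__ == "__main__":
    for candidate in ["fq_lint", "fq_lint,fastqc", "fastqc,", "fq"]:
        print(f"{candidate!r}: {is_valid_skip_tools(candidate)}")
```

With the rename applied, a bare `fq` no longer matches, so anyone already passing `--skip_tools 'fq'` would need to switch to `fq_lint`. The trailing `(?<!,)` lookbehind is what rejects a value ending in a comma.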
//
// MODULE: Run FQ_LINT to catch early errors
//
if ( !("fq" in skip_tools) ) {
Suggested change:
- if ( !("fq" in skip_tools) ) {
+ if ( !("fq_lint" in skip_tools) ) {
Just had an idea regarding the […] Thinking further on this option, have you considered reversing the logic here so that the pipeline would continue by default even if some samples fail linting? To me, the main purpose of this pipeline is to identify which samples are bad (failed lint, contamination, low quality, etc.) and which are good for continued analysis. Stopping everything early due to one failed sample would go against this.
Based on @FranBonath's comment here I've stopped any further development on this feature, but yes, I think "keep going and report on all samples" is a good strategy for handling FQ linting.
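The "keep going and report on all samples" idea can be sketched outside Nextflow as a simple partition of lint results. This is only an illustration of the logic discussed here, not the pipeline's implementation: `LintResult`, `partition_by_lint`, and the default value of `continue_with_lint_fail` are all hypothetical.

```python
from typing import List, NamedTuple, Tuple


class LintResult(NamedTuple):
    """Hypothetical record of one sample's lint outcome."""
    sample: str
    passed: bool


def partition_by_lint(
    results: List[LintResult],
    continue_with_lint_fail: bool = True,  # assumed default: keep going
) -> Tuple[List[str], List[str]]:
    """Split samples into (passed, failed); abort only if asked to."""
    passed = [r.sample for r in results if r.passed]
    failed = [r.sample for r in results if not r.passed]
    if failed and not continue_with_lint_fail:
        # Strict mode: stop as soon as any sample has failed linting.
        raise RuntimeError(f"FASTQ linting failed for: {', '.join(failed)}")
    return passed, failed


if __name__ == "__main__":
    results = [LintResult("sample_a", True), LintResult("sample_b", False)]
    ok, bad = partition_by_lint(results)
    print("continue with:", ok)
    print("report as failed:", bad)
```

In this sketch the failed samples are returned rather than dropped, so they can still appear in the final report alongside the samples that continue through the other tools.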
Validating FASTQs early prevents running the pipeline on invalid FASTQ files, which makes the pipeline more efficient at achieving its ultimate objective of checking FASTQ validity.

It adds 3 more parameters:

- ~~`--skip_linting`, which skips the linting of FASTQs~~ [update March 25] Replaced with `--skip_tools 'fq'`
- `--fq_lint_args`, a string of arguments to pass to the linting tool
- `--continue_with_lint_fail`, a boolean that determines whether to continue if the linting fails

Between these three options the user has a high degree of control over how the pipeline lints, which should handle most use cases.
Implements tests for all cases using the rnaseq minimal test dataset which has invalid sequencing names 🙄 .
Closes #31
PR checklist

- `nf-core lint`
- `nf-test test main.nf.test -profile test,docker`
- `nextflow run . -profile debug,test,docker --outdir <OUTDIR>`
- `docs/usage.md` is updated.
- `docs/output.md` is updated.
- `CHANGELOG.md` is updated.
- `README.md` is updated (including new tool citations and authors/contributors).