Seeking advice on fastp command for BGI/MGI metagenomic data QC #629

@ChenZx1234

Hello everyone,
I am new to the field of metagenomics analysis and I'm currently learning how to perform quality control on raw sequencing data. My data comes from a BGI/MGI sequencing platform.
I have read the documentation and put together the following fastp command to process my samples. Instead of filtering my raw data immediately, my plan is to first run an initial analysis to diagnose its quality. My goal is to use this preliminary analysis to define the most appropriate parameters for a second, dedicated filtering step.

Here is the command I've prepared:

fastp \
  -i rawdata/{}.r1.fq.gz \
  -I rawdata/{}.r2.fq.gz \
  -o qc/fastp/{}.r1.fq.gz \
  -O qc/fastp/{}.r2.fq.gz \
  --html qc/fastp/pre_qc_report/{}_preqc_fastp.html \
  --json qc/fastp/pre_qc_report/{}_preqc_fastp.json \
  -R "Pre Quality Control Report for {}" \
  --adapter_sequence AAGTCGGAGGCCAAGCGGTCTTAGGAAGACAA \
  --adapter_sequence_r2 AAGTCGGATCGTAGCCATGTCGTTCTGTGAGCCAAGGAGTTG \
  --trim_poly_g \
  -w 16

I would be very grateful for some guidance on whether this command is appropriate.
My questions are:

  1. Is this the correct way to run fastp in an "analysis-only" mode? The command above still passes -o and -O; if I omit them, will fastp leave my data unmodified and only produce the HTML and JSON reports?
  2. Will this initial report be sufficient for me to make informed decisions? Specifically, will the HTML report clearly show me:
    - The presence and sequences of adapters (so I can decide whether to use auto-detection or specify them manually in the next step)?
    - Any significant Poly-G tailing issues (so I can decide if --trim_poly_g or --poly_g_min_len is needed)?
    - Other potential issues like low-quality regions or sequence duplications?
  3. Is this two-step strategy (1. Analyze Raw Data -> 2. Filter with Optimized Parameters) considered a good practice in the field? Or is it more common to simply run a single, comprehensive fastp command from the start?
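
For reference, the "analysis-only" variant from question 1 might look like the sketch below. It assumes, as I understand the fastp README, that when -o and -O are not given no output FASTQ files are written while QC is still performed and reported. The SAMPLE and RUN variables and the sampleA default are hypothetical stand-ins for the {} placeholder; they are not part of fastp. The adapter and poly-G options are left out here so that the report reflects fastp's defaults.

```shell
# Hypothetical sample name; the original command uses {} as a per-sample placeholder.
sample="${SAMPLE:-sampleA}"

# Report-only command: same inputs and report paths as the original,
# but no -o/-O, so no FASTQ output is requested.
set -- fastp \
  -i "rawdata/${sample}.r1.fq.gz" \
  -I "rawdata/${sample}.r2.fq.gz" \
  --html "qc/fastp/pre_qc_report/${sample}_preqc_fastp.html" \
  --json "qc/fastp/pre_qc_report/${sample}_preqc_fastp.json" \
  -R "Pre Quality Control Report for ${sample}" \
  -w 16

# Dry run by default: print the command. Set RUN=1 to execute (needs fastp on PATH).
echo "$*"
if [ "${RUN:-0}" = "1" ]; then
  mkdir -p qc/fastp/pre_qc_report
  "$@"
fi
```

Run as-is, this only prints the command so it can be checked before anything is executed; with RUN=1 it creates the report directory and calls fastp.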

Thank you in advance for your time and help.
