Handling Unequal-Length Paired-End Reads with detect_adapter_for_pe #622

@xingyongma

Description

I enabled the detect_adapter_for_pe option in fastp to detect adapters. In my sequencing data R2 is longer than R1, but after adapter trimming the output R1 and R2 have the same length. What I actually want is for the output R2 to start at its own beginning and end at the position where R1 starts.

Does detect_adapter_for_pe currently work such that, once an adapter is detected, only the overlapping portion of R1 and R2 is output? If it were changed to the approach described above, where each read starts at its own beginning and ends at its mate's starting point, could it stay compatible with the current behavior while also handling paired-end reads of unequal length?
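To make the requested behavior concrete, here is a minimal Python sketch of "keep each read from its own first base up to the point where its mate starts". It is an illustration only, not fastp's implementation: the helper name `trim_to_mate_start`, the `seed_len` parameter, and the demo sequences are all made up, and it uses exact seed matching with no mismatch tolerance.

```python
# Hypothetical sketch of the proposed trimming rule; not fastp code.
_COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of an uppercase DNA string."""
    return seq.translate(_COMPLEMENT)[::-1]


def trim_to_mate_start(read: str, mate: str, seed_len: int = 12) -> str:
    """Keep `read` from its first base up to the point where `mate` starts.

    The first `seed_len` bases of `mate`, reverse-complemented, mark the far
    end of the insert as seen from `read`. Everything after that seed is
    read-through (adapter, index, etc.) and is dropped. If the seed is not
    found, the read is returned unchanged.
    """
    seed = reverse_complement(mate[:seed_len])
    hit = read.find(seed)
    if hit == -1:
        return read
    return read[: hit + len(seed)]


# Made-up demo: a 34 bp insert, a short R1 and a longer R2 that runs into
# an example adapter sequence.
insert = "ACGTTGCAAGGCTTAACCGGATCGTACGATTGCA"
adapter = "CTGTCTCTTATACACATCT"
r1 = insert[:20]
r2 = (reverse_complement(insert) + adapter)[:50]

# R1 never reaches R2's start, so it is kept whole; R2 is cut at R1's start.
print(len(trim_to_mate_start(r1, r2)), len(trim_to_mate_start(r2, r1)))
# prints: 20 34
```

With this rule the two output reads may differ in length: the shorter read is left alone, and the longer read keeps everything up to the end of the insert instead of being cut to the overlap.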

[Image: schematic diagram]

Command I ran:

fastp \
  --detect_adapter_for_pe \
  -i R1.fastq \
  -I R2.fastq \
  -o clean_R1.fastq \
  -O clean_R2.fastq \
  --json fastp_2.json \
  --html fastp_2.html

Input R1.fastq:

@TTGGAAGGGATGTATGT_TTGTTGTWSKZXOYTLEQW_M_AE01-241101011:9:4P250320056US293234A2:L04:R001C001:0107:1438 1:N:0:TAAGGCGA+TCAGTCAC
TTTTTTAGAATATTTCGGTTAGTTTATATATATTTTGAATTTTATGAAAGAGAATTCGTTTTTAATATTTATTTGGTTGGT
+
IIIIIIIIIIIIIIIIIII?IIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIII

Input R2.fastq:

@TTGGAAGGGATGTATGT_TTGTTGTWSKZXOYTLEQW_M_AE01-241101011:9:4P250320056US293234A2:L04:R001C001:0107:1438 2:N:0:TAAGGCGA+TCAGTCAC
TCAAAACAACCCGTAACTTATTAACCAACCAAATAAATATTAAAAACGAATTCTCTTTCATAAAATTCAAAATATATATAAACTAACCGAAATATTCTAAAAAACCCTCCAACCTATCTCTTATACACATCTACAAACAAC
+
IIIIIIIIIIIIIIIIIII?IIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIII

Output clean_R1.fastq:

@TTGGAAGGGATGTATGT_TTGTTGTWSKZXOYTLEQW_M_AE01-241101011:9:4P250320056US293234A2:L04:R001C001:0107:1438 1:N:0:TAAGGCGA+TCAGTCAC
TTTTTTAGAATATTTCGGTTAGTTTATATATATTTTGAATTTTATGAAAGAGAATTCGTTTTTAATATTTATTTGGTTGGT
+
IIIIIIIIIIIIIIIIIII?IIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIII

Output clean_R2.fastq:

@TTGGAAGGGATGTATGT_TTGTTGTWSKZXOYTLEQW_M_AE01-241101011:9:4P250320056US293234A2:L04:R001C001:0107:1438 2:N:0:TAAGGCGA+TCAGTCAC
TCAAAACAACCCGTAACTTATTAACCAACCAAATAAATATTAAAAACGAATTCTCTTTCATAAAATTCAAAATATATATAA
+
IIIIIIIIIIIIIIIIIII?IIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIII
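
The reads above can be checked directly. The following Python sketch compares the lengths and looks for the point in R2 where R1's start appears as a reverse complement. The 16-base seed and the exact-match search are arbitrary choices for illustration and say nothing about how fastp detects overlap internally.

```python
# Sanity check on the reads from this report (sequences copied from above).
_COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of an uppercase DNA string."""
    return seq.translate(_COMPLEMENT)[::-1]


r1 = ("TTTTTTAGAATATTTCGGTTAGTTTATATATATTTTGAATTTTATGAAAGAGAATTCGTTTTTAATATT"
      "TATTTGGTTGGT")
r2 = ("TCAAAACAACCCGTAACTTATTAACCAACCAAATAAATATTAAAAACGAATTCTCTTTCATAAAATTCA"
      "AAATATATATAAACTAACCGAAATATTCTAAAAAACCCTCCAACCTATCTCTTATACACATCTACAAAC"
      "AAC")
clean_r2 = ("TCAAAACAACCCGTAACTTATTAACCAACCAAATAAATATTAAAAACGAATTCTCTTTCATAA"
            "AATTCAAAATATATATAA")

# Where does R2 reach the start of R1? Search for the reverse complement of
# R1's first 16 bases (assumed seed length, exact match only).
seed = reverse_complement(r1[:16])
hit = r2.find(seed)
insert_end = hit + len(seed) if hit != -1 else None

print("R1:", len(r1), "R2:", len(r2), "clean R2:", len(clean_r2))
print("clean R2 equals R2 cut to R1's length:", clean_r2 == r2[:len(r1)])
print("R2 reaches R1's start after", insert_end, "bases")
```

For these reads the script reports lengths of 81, 141 and 81, confirms that clean_R2 is the first 81 bases of R2, and finds R1's start at base 104 of R2. Under the requested behavior, the output R2 would therefore keep 104 bases rather than 81.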
