Two FASTQ files were not correctly formatted #36

alexyfyf · 2023-05-12T06:24:03Z

Hi team,

I have downloaded some cDNA FASTQ files from your S3 repo.
I found that 2 files are not correctly formatted when I ran QC with NanoPlot.

SGNex_MCF7_cDNAStranded_replicate2_run1/SGNex_MCF7_cDNAStranded_replicate2_run1.fastq.gz
SGNex_K562_cDNAStranded_replicate3_run3/SGNex_K562_cDNAStranded_replicate3_run3.fastq.gz

The first one has additional strings before the @ character of the first read.

fastq_fail/FAK34234_679ea2e77287c6ea3bab84c69ca16d29e5d9c760_228.fastq000666 001750 001750 00010735421 13424777162 023424 0ustar00gridgrid000000 000000 @0185f0c7-c4a5-40fb-9ac2-6907653a86a5 runid=679ea2e77287c6ea3bab84c69ca16d29e5d9c760 read=46243 ch=61 start_time=2019-02-01T08:06:48Z flow_cell_id=FAK34234 protocol_group_id=010219_MCF7_mRNA_PCS109 sample_id=010219_MCF7_mRNA_PCS109
ACGGTAATACTTCGGTCTTGTTTCGACAATCGGTCGCTCAGACCGACCGTGGAAC
+
#"*%&$#%"$&"""""$&&#"""""""++*++)/+%#%##'+*$%&'%"##("&$

The second one has a read whose quality string length does not match its sequence length.

@09f55d50-803e-4048-899d-bb2fbdbf9c33 runid=446e90283984afd70d3f9af90262644290c7fca2 read=1796 ch=64 start_time=2019-01-07T07:56:26Z flow_cell_id=FAK11042 protocol_group_id=070119_K562_mRNA_PCS109 sample_id=070119_K562_mRNA_PCS109
TCGGTGATAAAGTGTTAATCGTCGG
+
%"-$&%""""""""$"""""""""

Can you confirm this?
Cheers,
Alex
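
Both problems reported above (text in front of the leading @ of a record, and a quality string whose length differs from its sequence) can be checked mechanically. The following is a minimal sketch, not the maintainers' QC procedure: `check_fastq` is a hypothetical helper name, and it assumes the usual four-lines-per-record FASTQ layout with unwrapped sequences.

```shell
# Hypothetical helper: read FASTQ text on stdin and report malformed records.
# Assumes exactly four lines per record (header, sequence, "+", quality).
check_fastq() {
  awk '
    # Line 1 of each record must start with "@".
    NR % 4 == 1 && substr($0, 1, 1) != "@" {
      printf "line %d: header does not start with @\n", NR; bad = 1
    }
    # Line 2 is the sequence; remember its length.
    NR % 4 == 2 { seqlen = length($0) }
    # Line 3 must start with "+".
    NR % 4 == 3 && substr($0, 1, 1) != "+" {
      printf "line %d: separator does not start with +\n", NR; bad = 1
    }
    # Line 4 is the quality string; it must be as long as the sequence.
    NR % 4 == 0 && length($0) != seqlen {
      printf "line %d: quality length %d != sequence length %d\n", NR, length($0), seqlen
      bad = 1
    }
    END {
      if (NR % 4 != 0) { print "truncated final record"; bad = 1 }
      exit bad + 0
    }
  '
}
```

Usage would look like `zcat SGNex_K562_cDNAStranded_replicate3_run3.fastq.gz | check_fastq`. The function exits non-zero when it finds at least one problem, so it can gate a download or QC step.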


cying111 · 2023-11-08T05:51:51Z

Hi @alexyfyf ,

Thanks for pointing out the problems with those files.

I have corrected those two files and updated them in the S3 bucket. Please have a look.

Please let us know if issues are found for other files as well!

Thank you.
Warm regards,
Ying

alexyfyf · 2023-11-09T00:29:27Z

Hi Ying,

I did spot another file, from dRNA, that is also corrupted.
SGNex_MCF7_directRNA_replicate2_run2

It has quite a few problems, and I used the following code to fix it.

zcat SGNex_MCF7_directRNA_replicate2_run2.fastq.gz | sed 's/.*@/@/g' | sed '$d' | gzip > SGNex_MCF7_directRNA_replicate2_run2_fixed.fastq.gz

You can have a look and see if there's a better way.

Cheers,
Alex
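
One caveat about the one-liner above: `@` is also a valid quality character (Phred+33 score 31), so applying `s/.*@/@/` to every line can shorten a quality line that happens to contain it. A more conservative variant touches only header lines. This is a sketch under stated assumptions, not the fix that was applied to the uploaded file: `strip_header_junk` is a hypothetical name, records are assumed to span exactly four lines, and the debris is assumed to sit on the header line in front of the first `@`, as in the MCF7 cDNA example earlier in the thread. NUL bytes are removed first so that awk implementations that cannot handle them still work.

```shell
# Hypothetical helper: strip debris in front of "@" on header lines only.
# Assumes four lines per record, so headers are lines 1, 5, 9, ...
strip_header_junk() {
  tr -d '\000' |
    awk 'NR % 4 == 1 { sub(/^[^@]*@/, "@") }  # drop everything before the first "@"
         { print }'                           # all other lines pass through unchanged
}
```

Usage would look like `zcat in.fastq.gz | strip_header_junk | gzip > out.fastq.gz`, where `in.fastq.gz` and `out.fastq.gz` are placeholder names. Unlike `sed '$d'`, this does not delete the final line, so trailing debris other than NUL bytes still needs separate handling.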

cying111 · 2023-11-09T09:42:27Z

Hi Alex,

Thanks again for the heads-up and for sharing your code to correct it.

I think that's good already.

I have uploaded the corrected version just now.

Thank you
Regards,
Ying
