Hi team,
I have downloaded some cDNA fastq files from your S3 repo. When I ran QC with NanoPlot, I found that 2 files are not correctly formatted:
SGNex_MCF7_cDNAStranded_replicate2_run1/SGNex_MCF7_cDNAStranded_replicate2_run1.fastq.gz
SGNex_K562_cDNAStranded_replicate3_run3/SGNex_K562_cDNAStranded_replicate3_run3.fastq.gz
The first one has additional strings before the @ character of the first read.
fastq_fail/FAK34234_679ea2e77287c6ea3bab84c69ca16d29e5d9c760_228.fastq000666 001750 001750 00010735421 13424777162 023424 0ustar00gridgrid000000 000000 @0185f0c7-c4a5-40fb-9ac2-6907653a86a5 runid=679ea2e77287c6ea3bab84c69ca16d29e5d9c760 read=46243 ch=61 start_time=2019-02-01T08:06:48Z flow_cell_id=FAK34234 protocol_group_id=010219_MCF7_mRNA_PCS109 sample_id=010219_MCF7_mRNA_PCS109
ACGGTAATACTTCGGTCTTGTTTCGACAATCGGTCGCTCAGACCGACCGTGGAAC
+
#"*%&$#%"$&"""""$&&#"""""""++*++)/+%#%##'+*$%&'%"##("&$
The second one has a read whose quality string length does not match its sequence length.
@09f55d50-803e-4048-899d-bb2fbdbf9c33 runid=446e90283984afd70d3f9af90262644290c7fca2 read=1796 ch=64 start_time=2019-01-07T07:56:26Z flow_cell_id=FAK11042 protocol_group_id=070119_K562_mRNA_PCS109 sample_id=070119_K562_mRNA_PCS109
TCGGTGATAAAGTGTTAATCGTCGG
+
%"-$&%""""""""$"""""""""
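Both kinds of problem described above (text before the `@` of a header, and a quality string whose length differs from the sequence) can be checked for directly. The following is a minimal sketch, not the check NanoPlot performs; it assumes every record occupies exactly four lines, and the function names `find_fastq_problems` and `find_fastq_problems_in_gz` are hypothetical names chosen for illustration.

```python
import gzip


def find_fastq_problems(lines):
    """Return (record_number, message) pairs for malformed 4-line FASTQ records."""
    problems = []
    it = iter(lines)
    for number, first in enumerate(it, start=1):
        # Pull the remaining three lines of this record (None if the input ends early).
        raw = [first] + [next(it, None) for _ in range(3)]
        if None in raw:
            problems.append((number, "truncated record"))
            break
        header, seq, plus, qual = (field.rstrip("\r\n") for field in raw)
        if not header.startswith("@"):
            problems.append((number, "header does not start with '@'"))
        if not plus.startswith("+"):
            problems.append((number, "separator line does not start with '+'"))
        if len(seq) != len(qual):
            problems.append(
                (number, f"sequence length {len(seq)} != quality length {len(qual)}")
            )
    return problems


def find_fastq_problems_in_gz(path):
    """Run the same checks on a gzip-compressed FASTQ file (hypothetical helper)."""
    # errors="replace" keeps stray binary bytes from aborting the scan.
    with gzip.open(path, "rt", errors="replace") as handle:
        return find_fastq_problems(handle)
```

Calling `find_fastq_problems_in_gz` on a downloaded `.fastq.gz` returns an empty list when no problems of these three kinds are found. Once one record is misaligned (for example by a stray extra line), every later record will also be reported, so the first entry in the list is the one to inspect.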
Can you confirm this? Cheers, Alex
Hi @alexyfyf ,
Thanks for pointing out the problems with those files.
I have corrected those two files and updated them in the S3 bucket. Please have a look.
Please let us know if issues are found for other files as well!
Thank you. Warm regards, Ying
Hi Ying,
I also spotted another corrupted file, this one from direct RNA (dRNA):
SGNex_MCF7_directRNA_replicate2_run2
It has quite a few problems, and I used the following code to fix it.
zcat SGNex_MCF7_directRNA_replicate2_run2.fastq.gz | sed 's/.*@/@/g' | sed '$d' | gzip > SGNex_MCF7_directRNA_replicate2_run2_fixed.fastq.gz
You can have a look and see if there's a better way.
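One caveat about the pipeline above: `sed 's/.*@/@/g'` is applied to every line, and `@` is also a valid quality character (Phred 31 in the usual Phred+33 encoding), so a quality line containing `@` would be truncated as well, leaving it shorter than its sequence. A possible alternative, sketched here in Python under the assumption that each record still occupies exactly four lines, strips leading text only from header lines and drops an incomplete trailing record. The function name `repair_fastq_lines` is hypothetical.

```python
def repair_fastq_lines(lines):
    """Strip text before '@' on header lines only; drop an incomplete last record."""
    cleaned = []
    record = []
    for line in lines:
        record.append(line.rstrip("\r\n"))
        if len(record) == 4:
            header, seq, plus, qual = record
            # rfind mirrors sed's greedy '.*@': cut at the last '@' in the header.
            at = header.rfind("@")
            if at > 0:
                header = header[at:]
            # Sequence, separator, and quality lines are passed through untouched.
            cleaned.extend([header, seq, plus, qual])
            record = []
    # Any leftover lines form an incomplete record and are discarded here.
    return cleaned
```

The returned lines carry no trailing newlines, so they can be written back with `"\n".join(...)` plus a final newline. If the damaged file contains stray extra lines, the four-line grouping no longer holds and this sketch would not be sufficient on its own.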
Cheers, Alex
Hi Alex,
Thanks again for the heads-up and for sharing your code for correcting that.
I think that's good already.
I have uploaded the corrected version just now.
Thank you. Regards, Ying