High PolyA content in R2 reads after trimming. #180
Comments
Hi @AmrSaadeldin, thanks for the details. Here is my initial assessment of the situation:
In either case, I don't think the results would be very different.
Hi @FelixKrueger, thank you so much for your help and your detailed observations. Based on your insights, I am considering another round of trimming using the A{10} parameter, but I'm concerned that this might introduce bias into the data. For that reason, I am inclined to proceed directly to mapping. I suspect that these sequences will either not map uniquely or not map at all, which, as you mentioned, could be due to technical artifacts or genomic stretches of A's. Given these possibilities, do you think proceeding directly to mapping, without an additional trimming step, is a sound approach for our downstream analysis? Thank you again.
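To get a rough idea of how much sequence an extra A{10} pass would remove, a small sketch like the following might help. It is only an approximation under stated assumptions: it cuts a read at the first exact run of at least 10 A's and drops everything after it, which is a simplification and not the matching algorithm Trim Galore or Cutadapt actually use. The function names `trim_polya_3prime` and `bases_lost` are hypothetical helpers introduced for illustration.

```python
import re


def trim_polya_3prime(seq: str, min_run: int = 10) -> str:
    """Cut the read at the first run of >= min_run A's and drop the rest.

    Assumption: exact matching only (no mismatches, no partial matches at
    the read end), so this is a simplified stand-in for real 3' trimming.
    """
    match = re.search("A{%d,}" % min_run, seq)
    return seq if match is None else seq[: match.start()]


def bases_lost(seqs, min_run: int = 10) -> int:
    """Total number of bases the simplified trimming would remove."""
    total = sum(len(s) for s in seqs)
    kept = sum(len(trim_polya_3prime(s, min_run)) for s in seqs)
    return total - kept


if __name__ == "__main__":
    reads = ["ACGT" + "A" * 12 + "GG", "ACGTACGT"]
    print([trim_polya_3prime(r) for r in reads])
    print(bases_lost(reads))
```

If the number of bases lost is small relative to the total, the extra trimming round is unlikely to change much either way.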
My gut feeling is that you should be fine to proceed as-is, but for your own peace of mind I would run a test in parallel, maybe on just a single sample. If you can convince yourself that the effects are either undetectable or negligibly small, you should be well prepared to answer any questions in that direction. In theory, Read 2 is the read where the methylation state is encoded by G/A (and not C/T as for Read 1), so if some technical bias does make it through to the uniquely mapped stage (which I doubt), you would expect some more unmethylated calls at these positions.
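As a possible starting point for such a single-sample test, the sketch below counts the fraction of reads in a FASTQ file that contain a run of at least 10 A's. It assumes standard 4-line FASTQ records (plain or gzipped) and exact matching; `polya_fraction` is a hypothetical helper name, and the threshold of 10 simply mirrors the A{10} setting discussed above.

```python
import gzip
import re


def polya_fraction(fastq_path, min_run: int = 10) -> float:
    """Fraction of reads containing a run of >= min_run consecutive A's.

    Assumption: the file uses strict 4-line FASTQ records, so the sequence
    is always the second line of each record.
    """
    opener = gzip.open if str(fastq_path).endswith(".gz") else open
    pattern = re.compile("A{%d,}" % min_run)
    total = 0
    hits = 0
    with opener(fastq_path, "rt") as handle:
        for i, line in enumerate(handle):
            if i % 4 == 1:  # sequence line of the record
                total += 1
                if pattern.search(line):
                    hits += 1
    return hits / total if total else 0.0
```

Running this on the trimmed R2 file and again on the R2 reads that ended up uniquely mapped (extracted to FASTQ by whatever means you prefer) would show whether the polyA-containing reads survive mapping at all.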
Hello
I am working with human whole-genome bisulfite sequencing data, and paired-end reads. After using Trim Galore to remove adapters, I encountered an issue with the FastQC report for R2 reads across all samples, which indicates a high PolyA content. This is unexpected since Trim Galore usually removes PolyA sequences. My primary concern is whether to proceed with mapping, given that the R1 reads are fine and have passed all FastQC tests. Additionally, it's important to note that both R1 and R2 reads in all samples do not show any overrepresented sequences and meet most other FastQC criteria, except for the adapter content in R2 reads. I am seeking advice on how to address this issue with the PolyA content in R2 reads and whether it's advisable to move forward with the current data.
Below are the images before and after trimming.
The first image: The diagram shows the adapter content before trimming across various samples. Each line in the diagram represents the adapter content for a specific sample. The blue lines indicate the Illumina adapters in the R1 and R2 samples, while the orange line represents the polyA content in one of the R1 samples. The remaining lines, colored red and light blue, correspond to the polyA adapter content in all R2 samples.
The second image: This is the FastQC report depicting adapter content after trimming with Trim Galore. Every line in this report represents polyA adapter content. The orange-red line at the bottom shows the polyA content for one of the R1 samples. All other lines correspond to the polyA content in the R2 samples.