2 changes: 1 addition & 1 deletion deeptools/correctGCBias.py
@@ -33,7 +33,7 @@ def parse_arguments(args=None):
' method proposed by [Benjamini & Speed (2012). '
'Nucleic Acids Research, 40(10)]. It will remove reads'
' from regions with too high coverage compared to the'
-' expected values (typically GC-rich regions) and will'
+' expected values (typically GC-moderate regions) and will'
' add reads to regions where too few reads are seen '
'(typically AT-rich regions). '
'The tool ``computeGCBias`` needs to be run first to generate the '
2 changes: 1 addition & 1 deletion docs/content/tools/computeGCBias.rst
@@ -15,7 +15,7 @@ Background

``computeGCBias`` is based on a paper by `Benjamini and Speed <http://nar.oxfordjournals.org/content/40/10/e72>`_.
The basic assumption of the GC bias diagnosis is that an ideal sample should show a uniform distribution of sequenced reads across the genome, i.e. all regions of the genome should have similar numbers of reads, regardless of their base-pair composition.
-In reality, the DNA polymerases used for PCR-based amplifications during the library preparation of the sequencing protocols prefer GC-rich regions. This will influence the outcome of the sequencing as there will be more reads for GC-rich regions just because of the DNA polymerase's preference.
+In reality, the DNA polymerases used for PCR-based amplifications during the library preparation of the sequencing protocols prefer GC-moderate regions. This will influence the outcome of the sequencing as there will be more reads for GC-moderate regions just because of the DNA polymerase's preference. As shown **real-life-data** below, the peak is at where the GC content is moderate.

``computeGCbias`` will first calculate the **expected GC profile** by counting the number of DNA fragments of a fixed size per GC fraction where GC fraction is defined as the number of G's or C's in a genome region of a given length.
The result is basically a histogram depicting the frequency of DNA fragments for each type of genome region with a GC fraction between 0 to 100 percent. This will be different for each reference genome, but is independent of the actual sequencing experiment.
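
For readers of this diff, the expected GC profile described in the context lines above can be illustrated with a small sketch. This is not the deepTools implementation: the function name ``expected_gc_profile``, its parameters, and the choice to skip fragments containing ``N`` are assumptions made for illustration only. It slides a fixed-length window over a reference sequence and counts how many fragments fall at each GC fraction.

```python
from collections import Counter


def expected_gc_profile(sequence, fragment_length, step=1):
    """Count fixed-length fragments per GC fraction (illustrative only).

    Hypothetical helper, not part of deepTools. Fragments containing
    'N' are skipped, which is an assumption of this sketch.
    """
    seq = sequence.upper()
    counts = Counter()
    for start in range(0, len(seq) - fragment_length + 1, step):
        fragment = seq[start:start + fragment_length]
        if "N" in fragment:
            continue  # ambiguous bases: skip the whole fragment
        gc = fragment.count("G") + fragment.count("C")
        counts[gc] += 1
    # Map the GC count to a fraction between 0 and 1
    return {gc / fragment_length: counts[gc] for gc in sorted(counts)}


profile = expected_gc_profile("AAGGCC", fragment_length=2)
print(profile)
```

Because the profile depends only on the reference sequence and the fragment length, it would be the same for every sequencing experiment mapped to that reference, which matches the description in the documentation text.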