deeptools · yuw444 · Oct 3, 2025 · Oct 3, 2025
diff --git a/deeptools/correctGCBias.py b/deeptools/correctGCBias.py
@@ -33,7 +33,7 @@ def parse_arguments(args=None):
         ' method proposed by [Benjamini & Speed (2012). '
         'Nucleic Acids Research, 40(10)]. It will remove reads'
         ' from regions with too high coverage compared to the'
-        ' expected values (typically GC-rich regions) and will'
+        ' expected values (typically GC-moderate regions) and will'
         ' add reads to regions where too few reads are seen '
         '(typically AT-rich regions). '
         'The tool ``computeGCBias`` needs to be run first to generate the '

diff --git a/docs/content/tools/computeGCBias.rst b/docs/content/tools/computeGCBias.rst
@@ -15,7 +15,7 @@ Background
 
 ``computeGCBias`` is based on a paper by `Benjamini and Speed <http://nar.oxfordjournals.org/content/40/10/e72>`_.
 The basic assumption of the GC bias diagnosis is that an ideal sample should show a uniform distribution of sequenced reads across the genome, i.e. all regions of the genome should have similar numbers of reads, regardless of their base-pair composition.
-In reality, the DNA polymerases used for PCR-based amplifications during the library preparation of the sequencing protocols prefer GC-rich regions. This will influence the outcome of the sequencing as there will be more reads for GC-rich regions just because of the DNA polymerase's preference.
+In reality, the DNA polymerases used for PCR-based amplifications during the library preparation of the sequencing protocols prefer GC-moderate regions. This will influence the outcome of the sequencing as there will be more reads for GC-moderate regions just because of the DNA polymerase's preference. As shown in the **real-life data** below, the read count peaks where the GC content is moderate.
 
 ``computeGCbias`` will first calculate the **expected GC profile** by counting the number of DNA fragments of a fixed size per GC fraction where GC fraction is defined as the number of G's or C's in a genome region of a given length.
 The result is basically a histogram depicting the frequency of DNA fragments for each type of genome region with a GC fraction between 0 to 100 percent. This will be different for each reference genome, but is independent of the actual sequencing experiment.
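
Note on the "expected GC profile" described in the context lines above: the idea can be illustrated with a minimal sketch. This is not the deepTools implementation. The function name `expected_gc_profile`, the plain-string genome, and the step-by-one sliding window are assumptions made only for illustration. It counts, for every fragment of a fixed length, how many G or C bases it contains, which gives the histogram of fragments per GC fraction.

```python
from collections import Counter


def expected_gc_profile(genome: str, fragment_length: int) -> Counter:
    """Count fragments of a fixed length by their number of G/C bases.

    Hypothetical helper for illustration only; it is not part of deepTools.
    Keys are G/C counts per fragment; divide by fragment_length to get
    the GC fraction.
    """
    genome = genome.upper()
    counts = Counter()
    if fragment_length <= 0 or len(genome) < fragment_length:
        return counts

    # G/C count of the first window.
    gc = sum(base in "GC" for base in genome[:fragment_length])
    counts[gc] += 1

    # Slide the window one base at a time, updating the count incrementally.
    for start in range(1, len(genome) - fragment_length + 1):
        gc -= genome[start - 1] in "GC"                    # base leaving the window
        gc += genome[start + fragment_length - 1] in "GC"  # base entering the window
        counts[gc] += 1
    return counts


profile = expected_gc_profile("ATGCGC", 3)
for gc_count in sorted(profile):
    print(f"GC fraction {gc_count / 3:.2f}: {profile[gc_count]} fragment(s)")
```

Because the profile depends only on the reference sequence and the fragment length, it can be computed once per genome and reused across sequencing experiments, which matches the description in the documentation text.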