Extracting reads from bam file using binReadCounts takes too long #117

Urja25 · 2023-09-22T13:17:06Z

I am trying to use QDNAseq to perform copy number calling on a pseudobulk bam file (sorted and indexed ~50GB in size). I want to start with analyzing at 100 kb bin size first.

When I run the script, it takes WEEKS just to extract read counts from the bam file.

This is part of the code I am using:
library(QDNAseq)
library(QDNAseq.hg38)
library(future)

setwd("/home/u855h/chromothripsis/urja/LFS02CP_output/outs/clean_bam_merged")

bins <- getBinAnnotations(binSize=100, genome="hg38")

future::plan("multisession")

readCounts <- binReadCounts(bins, bamfiles="LFS02CP_merged_sorted.bam", chunkSize = 100)

print("binReadCounts: Done!")
print(readCounts)

I submit the job to the cluster with 200GB of RAM allocated. What am I doing wrong?

ilarischeinin · 2023-09-22T13:31:23Z

chunkSize = 100

This splits the genome into chunks of 100 nucleotides and processes each one separately, which explains the slowness. The smaller the chunk, the less memory is required. The bigger it is, the faster the processing. You can try increasing the number significantly, or even removing it altogether, as you have quite a bit of memory available. If you run out of memory, that means you need smaller chunks.
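To get a feel for the scale, here is a rough back-of-the-envelope sketch in base R. It takes the description above at face value (chunkSize measured in nucleotides) and assumes a round genome length of about 3.1 Gb for hg38. `n_chunks` is a hypothetical helper, not part of QDNAseq, and the real chunking may differ (for example, per chromosome), so treat the numbers as orders of magnitude only.

```r
# Illustrative estimate only: how many chunks does a given chunkSize imply?
# genome_length is an assumed round figure for hg38, not a value from QDNAseq.
genome_length <- 3.1e9

# Hypothetical helper (not a QDNAseq function): chunks needed to cover the genome.
n_chunks <- function(chunk_size, genome_length) {
  ceiling(genome_length / chunk_size)
}

# Compare the chunkSize from the original script with two much larger values.
chunk_sizes <- c(1e2, 1e6, 1e8)
estimate <- data.frame(
  chunkSize = chunk_sizes,
  chunks = n_chunks(chunk_sizes, genome_length)
)
print(estimate)
```

Under these assumptions, chunkSize = 100 implies tens of millions of separate passes, while a chunk size in the millions of nucleotides brings that down to a few thousand.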

(I switched fields years ago, so I haven't been involved with any of this in a long time. There's not much more I can say, but that one number jumped right out at me as this popped into my email.)

Urja25 · 2023-09-22T14:53:29Z

Hi @ilarischeinin! Removed chunksize completely and the pipeline finished in ~ 30 min :) Thank you!!
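For anyone landing here later, the revised call would look roughly like this. It is a sketch based on the script in the original post with the chunkSize argument dropped; the file name is the one from that post, and the `have_pkgs` / `file.exists` guard is an addition for illustration so the snippet does nothing if the packages or the file are missing.

```r
# File name from the original post; replace with your own sorted, indexed BAM.
bam_path <- "LFS02CP_merged_sorted.bam"

# Guard added for illustration: only run if the packages are installed.
have_pkgs <- requireNamespace("QDNAseq", quietly = TRUE) &&
  requireNamespace("QDNAseq.hg38", quietly = TRUE) &&
  requireNamespace("future", quietly = TRUE)

if (have_pkgs && file.exists(bam_path)) {
  library(QDNAseq)
  library(QDNAseq.hg38)

  # Kept from the original script.
  future::plan("multisession")

  # 100 kb bins for hg38, as in the original script.
  bins <- getBinAnnotations(binSize = 100, genome = "hg38")

  # chunkSize is left out, so the default is used.
  readCounts <- binReadCounts(bins, bamfiles = bam_path)
  print(readCounts)
}
```

If this runs out of memory, the advice above applies: bring chunkSize back with a large value and lower it only as far as needed.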

HenrikBengtsson added the question label Sep 22, 2023
