Extracting reads from bam file using binReadCounts takes too long #117

Open
Urja25 opened this issue Sep 22, 2023 · 2 comments

Urja25 commented Sep 22, 2023

I am trying to use QDNAseq to perform copy number calling on a pseudobulk BAM file (sorted and indexed, ~50 GB in size). I want to start by analyzing at a 100 kb bin size.

When I run the script, it takes WEEKS just to extract read counts from this BAM file.

This is part of the code I am using:
library(QDNAseq)
library(QDNAseq.hg38)
library(future)

setwd("/home/u855h/chromothripsis/urja/LFS02CP_output/outs/clean_bam_merged")

bins <- getBinAnnotations(binSize=100, genome="hg38")

future::plan("multisession")

readCounts <- binReadCounts(bins, bamfiles="LFS02CP_merged_sorted.bam", chunkSize = 100)

print("binReadCounts: Done!")
print(readCounts)

I submit the job to the cluster with 200 GB of RAM allocated. What am I doing wrong?

@ilarischeinin
Member

chunkSize = 100

This splits the genome into chunks of 100 nucleotides and processes each one separately, which explains the slowness. The smaller the chunk, the less memory is required. The bigger it is, the faster the processing. You can try increasing the number significantly, or even removing it altogether, as you have quite a bit of memory available. If you run out of memory, that means you need smaller chunks.
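To put rough numbers on that trade-off, here is a back-of-the-envelope sketch in base R. The genome length of about 3.1 billion nucleotides is an approximation for hg38, `n_chunks` is a hypothetical helper, and the two larger chunk sizes are illustrative values only, not recommendations from the QDNAseq documentation. The estimate also ignores chromosome boundaries, so the real chunk count will differ somewhat.

```r
# Rough chunk-count estimate; ignores per-chromosome boundaries.
genome_length <- 3.1e9  # assumed approximate hg38 length in nucleotides

# Hypothetical helper: number of chunks needed to cover the genome.
n_chunks <- function(genome_length, chunk_size) {
  ceiling(genome_length / chunk_size)
}

# Chunk sizes in nucleotides; 100 is from the original script,
# the other two are illustrative.
chunk_sizes <- c(100, 1e6, 1e7)

estimate <- data.frame(
  chunk_size = chunk_sizes,
  chunks = n_chunks(genome_length, chunk_sizes)
)
print(estimate)
```

With `chunkSize = 100` this works out to roughly 31 million separate passes over the BAM file, against a few hundred to a few thousand for the larger values.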

(I switched fields years ago and haven't been involved with any of this in a long time, so there's not much more I can say, but that one number jumped right out at me as this popped into my email.)

@Urja25
Author

Urja25 commented Sep 22, 2023

Hi @ilarischeinin! I removed chunkSize completely and the pipeline finished in ~30 min :) Thank you!!
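For anyone landing here with the same problem, the fix can be sketched as below. `count_reads` is a hypothetical wrapper around the calls from the original script, not part of QDNAseq. It assumes that `chunkSize` defaults to `NULL` in `binReadCounts`, so passing `NULL` is equivalent to leaving the argument out, and that QDNAseq and QDNAseq.hg38 are installed.

```r
# Hypothetical helper wrapping the calls from the original script.
# chunk_size = NULL means "do not chunk" (assumed to match the
# binReadCounts default); pass a large value in nucleotides only
# if memory runs out.
count_reads <- function(bam_file, bin_size_kb = 100, chunk_size = NULL) {
  # Attach the same packages the original script loads.
  library(QDNAseq)
  library(QDNAseq.hg38)

  # binSize is given in kbp, so 100 means 100 kb bins.
  bins <- getBinAnnotations(binSize = bin_size_kb, genome = "hg38")

  binReadCounts(bins, bamfiles = bam_file, chunkSize = chunk_size)
}

# Usage (needs the BAM file from the original post in the working directory):
# readCounts <- count_reads("LFS02CP_merged_sorted.bam")
# print(readCounts)
```

If the job runs out of memory without chunking, reintroducing `chunk_size` with a value in the millions of nucleotides rather than 100 is the direction suggested above.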
