Identifying a spike in memory usage #12

@Martingales

Hi,

I'm running tebreak on 40x WGS mouse samples from multiple strains (CAST, CAROLI, BL6). I recently switched from strain-specific reference genomes to the mm10 reference genome to make comparisons between strains easier. More information and feature tracks are also available for mm10.

Since the switch, tebreak needs far more RAM (>60 or even >80 GB), and the cluster kills my jobs as a result. I had to reduce the number of cores from 12 to 4, which extends the runtime from hours to multiple days per job, and that is not acceptable. The odd thing is that the average memory usage apparently doesn't change much and stays quite low, so tebreak's memory usage must be spiking at some point during the run.

What makes this more complicated is that the resources needed can change dramatically when I run the same job twice, as in this example:

Run 1:
Max Memory : 56962 MB
Average Memory : 4866.31 MB
Total Requested Memory : 50000.00 MB
Max Swap : 66314 MB

Run 2:
Max Memory : 13446 MB
Average Memory : 3554.79 MB
Total Requested Memory : 80000.00 MB
Max Swap : 56173 MB

This is the code I use to run tebreak and resolve one after the other:

$PYTHONVERSION $TB/tebreak/tebreak.py \
    --disco_target $REF_TES \
    --bam $BAM \
    --bwaref $REF \
    --processes $n \
    --mask $REF_MASK \
    --map_tabix $UMAPK100 \
    --min_mappability $MINMAP \
    --pickle $OUTPUT.1.pickle \
    --detail_out $OUTPUT.1.tb.detail \
    --tmpdir $TMPDIRTB \
    --debug > $OUTPUT.1.out 2>&1

$PYTHONVERSION $TB/tebreak/resolve.py \
    -p $OUTPUT.1.pickle \
    -t $n \
    -i $REPBASE \
    -v \
    --detail_out $OUTPUT.2.rs.detail \
    --refoutdir $TMPDIRRF \
    --tmpdir $TMPDIRRS \
    --callmuts > $OUTPUT.2.out 2>&1

I will try to profile the memory usage over time for one sample to get a clearer picture of where exactly it spikes.
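
One way to get such a memory trace is to launch the job through a small wrapper that samples the resident memory of the whole process tree at a fixed interval and writes it to a log with timestamps, which can then be lined up against the --debug output. The following is only a minimal sketch under some assumptions: it relies on the Linux /proc filesystem, the function names (parse_rss_kb, descendants, monitor) are made up for illustration, and none of it is part of tebreak. Summing RSS over worker processes counts shared pages more than once, so the totals are an upper bound rather than an exact figure.

```python
import os
import subprocess
import time


def parse_rss_kb(status_text):
    """Return VmRSS in kB from the text of /proc/<pid>/status, or 0 if absent."""
    for line in status_text.splitlines():
        if line.startswith("VmRSS:"):
            return int(line.split()[1])
    return 0


def descendants(root, ppid_of):
    """Return root plus every pid whose chain of parents leads back to root."""
    children = {}
    for pid, ppid in ppid_of.items():
        children.setdefault(ppid, []).append(pid)
    found, stack = set(), [root]
    while stack:
        pid = stack.pop()
        if pid not in found:
            found.add(pid)
            stack.extend(children.get(pid, []))
    return found


def read_ppids():
    """Map pid -> parent pid for all live processes (Linux /proc only)."""
    ppid_of = {}
    for name in os.listdir("/proc"):
        if not name.isdigit():
            continue
        try:
            with open(f"/proc/{name}/status") as fh:
                for line in fh:
                    if line.startswith("PPid:"):
                        ppid_of[int(name)] = int(line.split()[1])
                        break
        except OSError:
            pass  # the process exited while we were scanning
    return ppid_of


def tree_rss_mb(root):
    """Sum the RSS of root and all of its descendants, in MB."""
    total_kb = 0
    for pid in descendants(root, read_ppids()):
        try:
            with open(f"/proc/{pid}/status") as fh:
                total_kb += parse_rss_kb(fh.read())
        except OSError:
            pass  # the process exited between the scan and the read
    return total_kb / 1024


def monitor(cmd, log_path, interval=5.0):
    """Run cmd, log the tree RSS every `interval` seconds, return (exit code, peak MB)."""
    proc = subprocess.Popen(cmd)
    peak = 0.0
    start = time.time()
    with open(log_path, "w") as log:
        log.write("elapsed_s\trss_mb\n")
        while proc.poll() is None:
            rss = tree_rss_mb(proc.pid)
            peak = max(peak, rss)
            log.write(f"{time.time() - start:.0f}\t{rss:.1f}\n")
            log.flush()  # keep the log usable if the cluster kills the job
            time.sleep(interval)
    return proc.returncode, peak
```

Calling monitor() with the tebreak command line as a list of arguments and a log path would give a tab-separated trace of elapsed seconds against total RSS. A spike shorter than the sampling interval can still be missed, so a smaller interval may be needed once the rough location is known.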

If you all can offer any help or have some ideas, that would be appreciated!

Thank you!
