Shared memory issue while running Protenix example #74

Open
rakeshr10 opened this issue Feb 21, 2025 · 10 comments

@rakeshr10

rakeshr10 commented Feb 21, 2025

Hi, when I try to run the example I get the following error. How can I resolve this memory issue?

protenix predict --input examples/example.json --out_dir ./output --seeds 101

[Screenshot of the error]
@zhangyuxuann
Collaborator

@rakeshr10 it's a similar issue to #2

@rakeshr10
Author

@zhangyuxuann Thanks for your reply. Adding shared memory worked, but I was wondering if there is a way to run without it. I need to do large-scale runs by spinning up pods on Kubernetes, so relying on /dev/shm might become an issue.

Another question: is there a way to download the model weights once and point the inference script to them, instead of downloading them every time a new pod is spun up?

It would also be helpful if you could provide some guidance on how to use the DeepSpeed evoformer attention kernel without compiling it every time a new pod is spun up. I noticed that ninja workers are invoked to compile the evoformer code when the compiled CUDA kernel is not found in the /root/.cache folder, and this ends up taking a lot of time even though the inference itself is faster.

@zhangyuxuann
Collaborator

@rakeshr10

  1. PyTorch's DataLoader uses shared memory by default to speed up data loading. You can disable multi-process data loading by setting --num_workers 0 and try again.
  2. If you put or copy the weights and CCD cache into ./release_data/ when the pod is spun up, they will not be downloaded again.
  3. If you want to use the DeepSpeed evoformer attention kernel without compiling it every time a new pod is spun up, you can modify builder.py as follows. For example, copy your precompiled evoformer_attn.so to /root/.cache/torch_extensions/ or another path (a combined Dockerfile sketch is given after the snippets below).
# new_builder.py: modified section of deepspeed/ops/op_builder/builder.py, inside the
# method that JIT-builds the op (the call to torch.utils.cpp_extension.load);
# os, importlib and load are assumed to already be in scope there, as in the original file.
        if self.name == "evoformer_attn" and os.path.exists(
            "/root/.cache/torch_extensions/py39_cu121/evoformer_attn/evoformer_attn.so"
        ):
            # reuse the precompiled kernel instead of recompiling it with ninja/nvcc
            import sys

            sys.path.append("/root/.cache/torch_extensions/py39_cu121/evoformer_attn")
            op_module = importlib.import_module("evoformer_attn")
            print(
                "using precompilation from /root/.cache/torch_extensions/py39_cu121/evoformer_attn"
            )
        else:
            # fall back to the original JIT build
            op_module = load(
                name=self.name,
                sources=self.strip_empty_entries(sources),
                extra_include_paths=self.strip_empty_entries(extra_include_paths),
                extra_cflags=cxx_args,
                extra_cuda_cflags=nvcc_args,
                extra_ldflags=self.strip_empty_entries(self.extra_ldflags()),
                verbose=verbose,
            )

If you build your image with Docker, just copy the modified builder.py over the installed one, like:

COPY ./new_builder.py /usr/local/lib/python3.9/dist-packages/deepspeed/ops/op_builder/builder.py
# you can find the destination path with:
#   import deepspeed
#   print(deepspeed.__file__)
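Putting the three points together, here is a minimal Dockerfile sketch. The destination paths follow the examples above; the source paths (./release_data/, ./evoformer_attn.so, ./new_builder.py) and the /workspace working directory are assumptions, so adjust them to your own image layout.

# Sketch only: left-hand paths are files prepared locally, right-hand paths are assumptions.

# Point 2: bake the weights and CCD cache into the image so a new pod does not re-download them.
#          ./release_data/ should end up in the working directory from which protenix predict runs.
WORKDIR /workspace
COPY ./release_data/ /workspace/release_data/

# Point 3: ship the precompiled evoformer kernel and the patched builder.py so DeepSpeed
#          skips the ninja/nvcc compilation on startup.
COPY ./evoformer_attn.so /root/.cache/torch_extensions/py39_cu121/evoformer_attn/evoformer_attn.so
COPY ./new_builder.py /usr/local/lib/python3.9/dist-packages/deepspeed/ops/op_builder/builder.py

Inside such a pod, point 1 is then just a command-line flag, e.g. protenix predict --input examples/example.json --out_dir ./output --seeds 101 --num_workers 0.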

@rakeshr10
Author

@zhangyuxuann Thanks for your reply. I tried some of your suggestions and noticed the following.

  1. Setting --num_workers 0 worked, but it was interesting to see that this was required only when inference needed to run on GPU. One question: does setting --num_workers 0 affect batch inference jobs?
  2. I was able to package the compiled evoformer code into the Docker image; however, I noticed that the inference speed was not much different from normal inference. If I use layernorm, I do see a speedup during inference. Similar to evoformer, is there a way to package the precompiled layernorm CUDA kernel into the Docker image so that it does not recompile whenever a new pod is spun up?
  3. How do I specify bf16 weights for inference, and is there any speedup or benefit to using bf16 weights?

@zhangyuxuann
Collaborator

zhangyuxuann commented Feb 27, 2025

@rakeshr10

  1. Could you describe the first question in more detail? By the way, I think batch inference mode is not needed, because you can put multiple instances to be inferred in one JSON file and have different pods infer different JSON files.
  2. You can copy the precompiled fastfold_layer_norm_cuda.so to ./protenix/model/layer_norm and it will not recompile layernorm again (see the sketch below). The inference speedup from the evoformer kernel is more noticeable for long sequences.
  3. You can set configs.skip_amp.confidence_head = False and configs.skip_amp.sample_diffusion = False to enable mixed BF16 inference, which is faster and more memory-efficient for long sequences. For better performance, we recommend the default inference settings.
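For point 2, a minimal Dockerfile sketch. The destination is an assumption: copy the .so next to wherever protenix/model/layer_norm lives in your image (a repo checkout or the installed package; python -c "import protenix; print(protenix.__file__)" shows the latter), and fastfold_layer_norm_cuda.so is assumed to have been compiled once in an identical environment.

# Sketch only: /workspace/protenix is a placeholder for your protenix checkout or install location.
COPY ./fastfold_layer_norm_cuda.so /workspace/protenix/model/layer_norm/fastfold_layer_norm_cuda.so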

[Screenshot of the default inference settings]

@rakeshr10
Author

rakeshr10 commented Feb 27, 2025

@zhangyuxuann Thanks for your reply. I was able to add fastfold_layer_norm_cuda.so to the path and now it works. I noticed my inference settings are the same as in the picture, so I won't change them.

Regarding my first question: what I meant was that, with --num_workers 0, inference no longer requires shared memory when running on GPU. My question was whether setting --num_workers 0 causes any issue when running the protenix predict command on a multi-sequence JSON file, which I am assuming is batch inference, where the JSON file contains multiple sequences and MSA files.

I tried running the colabfold_msa.py script on a single FASTA file containing multiple sequence pairs. While colabfold_search generates multiple a3m files, the a3m processor does not convert all of the a3m files into the corresponding pairing and non-pairing a3m files needed to run protenix inference on multiple a3m files.

Also, it would be great if this script could take the FASTA and a3m files and convert them into JSON input files for running protenix inference.

@JinyuanSun


The issue may be caused by the FASTA format. Can you provide the input file so I can reproduce the issue?

@rakeshr10
Author

rakeshr10 commented Mar 3, 2025

This is the FASTA file, with a .txt extension. It has multiple pairs of sequences, which can be submitted as a batch sequence-search job to MMseqs2 using the colabfold_search command to do MSA generation for all pairs simultaneously.

It would be good if all of the a3m files generated for all pairs by the colabfold_search command could be converted into protenix-compatible a3m and JSON files using the colabfold_msa.py script.

interactors.txt

@rakeshr10
Author

@JinyuanSun Were you able to look into this issue and the request for the script to handle multiple a3m files?

@JinyuanSun


The easiest approach would be to split the file into individual FASTA sequence files and then use a for loop in the shell to process them, thanks.
Here is a script for you:

#!/bin/bash
# Split a multi-sequence FASTA into one file per sequence and run colabfold_msa.py on each.

input_fasta="input.fasta"
db_path="<path/to/colabfold_db>"
mmseqs_path="<path/to/mmseqs>"
output_dir="dimer_colabfold_msa"

mkdir -p split_fasta
mkdir -p "$output_dir"

# start a new split_fasta/seqN.fasta at every ">" header line
awk '/^>/{f="split_fasta/seq"++i".fasta"} {print > f}' "$input_fasta"

for file in split_fasta/*.fasta; do
    echo "Processing $file with colabfold_msa.py..."
    python3 scripts/colabfold_msa.py "$file" "$db_path" "$output_dir" \
        --db1 uniref30_2103_db \
        --db3 colabfold_envdb_202108_db \
        --mmseqs_path "$mmseqs_path"
done
