KeyError: 'msa_mask' Issue #179

@YoshitakaMo

I tried to run FastFold on a large hetero-multimer protein complex, but inference failed with the error below.

2023-07-25 14:53:20,908	INFO worker.py:1518 -- Started a local Ray instance.
2023-07-25 14:53:22,253	INFO workflow_access.py:356 -- Initializing workflow manager...
2023-07-25 14:53:24,072	INFO api.py:203 -- Workflow job created. [id="fastfold_data_workflow Tue Jul 25 14:53:22 2023"].
(WorkflowManagementActor pid=1387) 2023-07-25 14:53:24,128	INFO workflow_executor.py:86 -- Workflow job [id=fastfold_data_workflow Tue Jul 25 14:53:22 2023] started.
...
<MSA generation was successfully finished>
...
...
running in multimer mode...
Traceback (most recent call last):
  File "/foo/bar/FastFold/inference.py", line 556, in <module>
    main(args)
  File "/foo/bar/FastFold/inference.py", line 164, in main
    inference_multimer_model(args)
  File "/foo/bar/FastFold/inference.py", line 285, in inference_multimer_model
    processed_feature_dict = feature_processor.process_features(
  File "/foo/bar/FastFold/fastfold/data/feature_pipeline.py", line 124, in process_features
    return np_example_to_features(
  File "/foo/bar/FastFold/fastfold/data/feature_pipeline.py", line 106, in np_example_to_features
    features = input_pipeline_fn(tensor_dict, cfg.common, cfg[mode])
  File "/foo/bar/FastFold/fastfold/data/input_pipeline_multimer.py", line 107, in process_tensors_from_config
    tensors = compose(nonensembled)(tensors)
  File "/foo/bar/FastFold/fastfold/data/data_transforms.py", line 76, in <lambda>
    return lambda x: f(x, *args, **kwargs)
  File "/foo/bar/FastFold/fastfold/data/input_pipeline_multimer.py", line 124, in compose
    x = f(x)
  File "/foo/bar/FastFold/fastfold/data/data_transforms_multimer.py", line 298, in make_msa_profile
    batch['msa_mask'][..., None],
KeyError: 'msa_mask'

I know this issue is similar to #119, but I don't know how to resolve it with the latest FastFold version. Could you tell me how to fix it?
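
For anyone trying to narrow this down, here is a minimal sketch of a check for the missing key. It assumes the feature dictionary is a plain mapping from feature names to arrays; `missing_keys` and the toy `batch` are hypothetical helpers for illustration and are not part of FastFold. Running a check like this on the dictionary just before `make_msa_profile` would show whether `msa_mask` was never created or was dropped by an earlier transform.

```python
import numpy as np


def missing_keys(batch, required=("msa", "msa_mask")):
    """Return the required feature names that are absent from `batch`.

    `required` is an assumption: it lists the keys that the failing
    transform appears to read, based on the traceback.
    """
    return [key for key in required if key not in batch]


# Toy feature dict standing in for the real one (shape is illustrative only).
batch = {"msa": np.zeros((4, 10), dtype=np.int64)}

absent = missing_keys(batch)
print("available keys:", sorted(batch))
print("missing keys:", absent)
```

If `msa_mask` shows up as missing at that point, the step that should produce it is probably not part of the multimer transform list being composed.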

Computational environment

  • OS: Red Hat Enterprise Linux Server release 7.7
  • CUDA version: CUDA 11.6
  • GCC version: 10.3.0
  • FastFold version: The latest commit, eba4968

The input command was:

FASTAFILE="foo2.fasta"
OUTPUTDIR="./foo2"
DATE="2099-07-14"
DATABASEDIR=/foobar/alphafold/db-v2.3.2
python3.9 ${FASTFOLDDIR}/inference.py ${FASTAFILE} ${DATABASEDIR}/pdb_mmcif/mmcif_files/ \
    --output_dir ${OUTPUTDIR} \
    --gpus 4 \
    --model_preset multimer \
    --max_template_date ${DATE} \
    --relaxation \
    --use_precomputed_alignments ${OUTPUTDIR}/alignments \
    --save_prediction_result True \
    --uniref90_database_path=$DATABASEDIR/uniref90/uniref90.fasta \
    --mgnify_database_path=$DATABASEDIR/mgnify/mgy_clusters_2022_05.fa \
    --bfd_database_path=$DATABASEDIR/bfd/bfd_metaclust_clu_complete_id30_c90_final_seq.sorted_opt \
    --uniref30_database_path=$DATABASEDIR/uniref30/UniRef30_2021_03 \
    --obsolete_pdbs_path=$DATABASEDIR/pdb_mmcif/obsolete.dat \
    --uniprot_database_path=$DATABASEDIR/uniprot/uniprot.fasta \
    --pdb_seqres_database_path=$DATABASEDIR/pdb_seqres/pdb_seqres.txt \
    --jackhmmer_binary_path=$FASTFOLDDIR/fastfold-conda/bin/jackhmmer \
    --hhblits_binary_path=$FASTFOLDDIR/fastfold-conda/bin/hhblits \
    --hhsearch_binary_path=$FASTFOLDDIR/fastfold-conda/bin/hhsearch \
    --kalign_binary_path=$FASTFOLDDIR/fastfold-conda/bin/kalign
