Dataset preparation and benchmark metadata for BM5.5 (257 protein-protein complexes). Sister repo to Protein_Relax_Pipeline.
- 257 complexes, 605 chains, 122,966 total residues
- 119 homo-multimeric targets, with identical chains deduplicated to unique sequences
- FASTA sequences derived from the crystal structures (not UniProt canonical sequences), so RMSD is computed against the residues actually present in each structure
- 41 targets with expression tags requiring sequence cleaning
- 4 non-standard entries: BAAD, BOYV, BP57, CP57
- PROJECT_STATUS.md — completion tracking
- COMPARISON.md — benchmark comparison data
- NOTEBOOK.md — development journal and decisions
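
The deduplication and tag-cleaning steps listed in the dataset summary can be sketched roughly as follows. This is a minimal illustration, not the repository's actual code: the function names `strip_his_tag` and `unique_sequences`, the example sequences, and the assumption that the expression tag is a terminal poly-histidine run are all hypothetical. Real targets may carry other tags or linkers that need their own rules.

```python
import re

# Assumption: the tag is a run of 6+ histidines at either terminus,
# optionally preceded by a start methionine at the N-terminus.
# Other tag types (e.g. linkers, protease sites) are not handled here.
N_TERM_HIS = re.compile(r"^M?H{6,}")
C_TERM_HIS = re.compile(r"H{6,}$")


def strip_his_tag(seq: str) -> str:
    """Remove a terminal poly-His tag from a one-letter amino acid sequence."""
    seq = N_TERM_HIS.sub("", seq)
    return C_TERM_HIS.sub("", seq)


def unique_sequences(chains):
    """Group chain IDs by identical sequence, keeping first-seen order.

    `chains` is an iterable of (chain_id, sequence) pairs. The result maps
    each distinct sequence to the list of chain IDs that share it, so a
    homo-multimer collapses to a single entry.
    """
    groups = {}
    for chain_id, seq in chains:
        groups.setdefault(seq, []).append(chain_id)
    return groups


# Made-up example: a homodimer (chains A and B) with an N-terminal His-tag,
# plus one unrelated partner chain C.
chains = [
    ("A", "MHHHHHHGSKVLT"),
    ("B", "MHHHHHHGSKVLT"),
    ("C", "QWERTY"),
]
cleaned = [(cid, strip_his_tag(seq)) for cid, seq in chains]
groups = unique_sequences(cleaned)

for seq, chain_ids in groups.items():
    print(",".join(chain_ids), seq)
```

Grouping by exact string identity is the simplest choice. It treats chains that differ by even one residue as distinct, which may or may not match how the 119 homo-multimeric targets were actually handled.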