@stephprince
Collaborator

@stephprince stephprince commented Mar 14, 2025

Motivation

Draft of an approach, inspired by Fadini et al., that runs iterative inference-time optimization by modifying the MSA inputs.

There are a couple of remaining to-do items, but I wanted to get feedback on the current implementation in the meantime.

Questions:

  1. It was unclear to me from the paper which MSA values are modified as inputs to AF: the raw MSA features or the embedded MSA representation? Currently I'm using the raw MSA features.
  2. Does the current setup apply the Lightning module wrapper at the right level of abstraction for training? Each "training step" currently loops n_iterations=N times through the linear layer + AF model and uses the manual optimization functions.
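For reviewers who want the control flow of question 2 in one place, here is a minimal, framework-free sketch of a single "training step" that loops N times through a trainable layer and a frozen model. It is not the code in this PR: `frozen_model`, `refine_step`, the elementwise affine layer, and the squared-error loss are all hypothetical stand-ins, and the gradient is written out by hand instead of using Lightning's manual optimization.

```python
def frozen_model(features):
    # Hypothetical stand-in for the frozen AF model: a fixed, differentiable
    # function of its inputs (mean of squares). Its "weights" never change.
    return sum(f * f for f in features) / len(features)


def refine_step(msa_features, target, n_iterations=20, lr=0.05):
    """One 'training step': N optimizer updates on a single input."""
    n = len(msa_features)
    # Trainable elementwise affine layer, initialised to the identity so the
    # first iteration feeds the unmodified features to the model.
    w = [1.0] * n
    b = [0.0] * n
    losses = []
    for _ in range(n_iterations):
        # Forward pass: layer -> frozen model -> loss.
        modified = [wi * x + bi for wi, bi, x in zip(w, b, msa_features)]
        err = frozen_model(modified) - target
        losses.append(err * err)
        # Backward pass by hand: d(loss)/d(modified_i) = 2 * err * 2 * m_i / n.
        grad_mod = [4.0 * err * m / n for m in modified]
        # Only the layer parameters are updated; the model stays frozen.
        w = [wi - lr * g * x for wi, g, x in zip(w, grad_mod, msa_features)]
        b = [bi - lr * g for bi, g in zip(b, grad_mod)]
    return w, b, losses


w, b, losses = refine_step([0.5, -1.0, 2.0], target=1.0)
print(f"first loss {losses[0]:.4f}, last loss {losses[-1]:.2e}")
```

In a Lightning module the same loop would presumably live inside `training_step` with `automatic_optimization = False`, which is the level-of-abstraction question raised above.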

TODO

  • Update the initialization of the linear layer parameters to come from the config file, similar to the AF setup. Edit: With the current setup this is challenging because the shape of the raw MSA features modified by the linear layer changes for each data input. I updated the Lightning callbacks to reinitialize the model parameters/optimizer at the start of each batch.
  • Set up checkpointing for the final model / best version per batch?
  • Add an option for multiple phases of training (e.g. the original paper used a two-phase approach, selecting the best model from phase 1 to fine-tune in phase 2)
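To make the first and last TODO items concrete, the sketch below shows one way per-input reinitialization and a two-phase schedule could fit together. Everything here is an assumption for illustration: the names `init_params`, `optimise`, and `two_phase`, the toy loss, the number of candidates, and the learning rates are hypothetical and are not taken from this PR or from the paper.

```python
import random


def loss_fn(params, features, target):
    # Toy loss: squared error between a frozen stand-in model (mean of
    # squares of the modified features) and a scalar target.
    modified = [w * x + b for (w, b), x in zip(params, features)]
    pred = sum(m * m for m in modified) / len(modified)
    return (pred - target) ** 2


def optimise(params, features, target, n_iterations, lr):
    # Plain gradient descent on the layer parameters only.
    params = list(params)
    n = len(features)
    for _ in range(n_iterations):
        modified = [w * x + b for (w, b), x in zip(params, features)]
        err = sum(m * m for m in modified) / n - target
        params = [
            (w - lr * 4.0 * err * m * x / n, b - lr * 4.0 * err * m / n)
            for (w, b), x, m in zip(params, features, modified)
        ]
    return params, loss_fn(params, features, target)


def init_params(n_features, rng):
    # Fresh parameters sized to the current input. Because the feature shape
    # differs per input, this has to be rebuilt for every batch.
    return [(1.0 + rng.uniform(-0.1, 0.1), 0.0) for _ in range(n_features)]


def two_phase(features, target, n_candidates=3, seed=0):
    rng = random.Random(seed)
    # Phase 1: a few short runs from different initialisations.
    phase1 = [
        optimise(init_params(len(features), rng), features, target, 5, 0.05)
        for _ in range(n_candidates)
    ]
    best_params, best_loss = min(phase1, key=lambda result: result[1])
    # Phase 2: fine-tune only the best phase-1 candidate with a smaller step.
    _, final_loss = optimise(best_params, features, target, 20, 0.01)
    return best_loss, final_loss


# Inputs of different lengths each get their own freshly sized parameters.
for feats, tgt in [([0.5, -1.0, 2.0], 1.0), ([0.2, 0.4, -0.6, 0.8, 1.0], 0.5)]:
    best, final = two_phase(feats, tgt)
    print(len(feats), best, final)
```

Keeping the initializer a function of the input size is what would let its settings (here the perturbation range) come from a config file even though the parameter shapes cannot.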

How to test the behavior?

cd metfish
python src/metfish/refinement_model/train.py /path/to/data /path/to/output

Checklist

  • Did you update CHANGELOG.md with your changes?
  • Have you checked our Contributing document?
  • Have you ensured the PR clearly describes the problem and the solution?
  • Is your contribution compliant with our coding style? This can be checked by running ruff from the source directory.
  • Have you checked to ensure that there aren't other open Pull Requests for the same change?
  • Have you included the relevant issue number using "Fix #XXX" notation where XXX is the issue number? By including "Fix #XXX" you allow GitHub to close issue #XXX when the PR is merged.

@codecov

codecov bot commented Jul 7, 2025

Codecov Report

Attention: Patch coverage is 40.00000% with 6 lines in your changes missing coverage. Please review.

Project coverage is 25.22%. Comparing base (9a2a41c) to head (7caba1d).
Report is 1 commit behind head on main.

Files with missing lines    Patch %    Lines
src/metfish/utils.py        40.00%     6 Missing ⚠️

❗ There is a different number of reports uploaded between BASE (9a2a41c) and HEAD (7caba1d). HEAD has 1 upload less than BASE.

Flag    BASE (9a2a41c)    HEAD (7caba1d)
        2                 1
Additional details and impacted files
@@             Coverage Diff             @@
##             main       #8       +/-   ##
===========================================
- Coverage   44.05%   25.22%   -18.83%     
===========================================
  Files           3        7        +4     
  Lines         227     1118      +891     
===========================================
+ Hits          100      282      +182     
- Misses        127      836      +709     

