Recreation of the IMPRESS protein-binding workflow using LangGraph (workflow definition) and Flowgentic (execution via RADICAL AsyncFlow), with mocked heavy tools for AlphaFold and ProteinMPNN.
- Overview
- Goals and Scope
- How This Mirrors IMPRESS
- Architecture
- Program Structure
- Workflow Execution Details
- Mock Tool Behavior
- Adaptive Logic
- Run Instructions
- Expected Artifacts
- Observed Run Results
- Troubleshooting
- Limitations and Next Steps
`impress-flowgentic` reproduces the behavior of IMPRESS protein-binding pipelines in a locally runnable form:
- Multi-pass pipeline execution.
- Sequence design + ranking stage.
- Structure prediction stage per target.
- Score extraction stage (`af_stats_<pipeline>_pass_<n>.csv`).
- Adaptive child-pipeline spawning when targets degrade.
The project intentionally mocks expensive external tools (AlphaFold and MPNN), but keeps their data contracts and output directory conventions so the workflow resembles real IMPRESS execution.
Goals:
- Recreate IMPRESS adaptive pipeline semantics using Flowgentic + LangGraph.
- Keep artifact paths and pass-level outputs compatible with IMPRESS expectations.
- Make the whole workflow runnable locally and observable through the files it produces.

Out of scope:
- Running real AlphaFold.
- Running real ProteinMPNN.
- Integrating HPC scheduler-specific runtime configurations.
This implementation follows IMPRESS behavior at three levels:
- Manager semantics
  - Tracks active pipelines, adaptive tasks, and spawned children.
  - Submits child pipelines dynamically.
  - Supports terminating a parent pipeline once its work has migrated.
- Pipeline semantics
  - Runs a multi-pass loop with per-pass execution.
  - Child pipelines skip design/ranking on their first inherited pass.
  - Each pipeline keeps `iter_seqs`, `score_history`, `current_scores`, and pass counters.
- Artifact semantics
  - Produces an IMPRESS-like layout under `af_pipeline_outputs_multi/<pipeline>/...`.
  - Produces `af_stats_<pipeline>_pass_<n>.csv` files.
  - Copies migrated targets into `<child>_in/`.
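The artifact conventions above reduce to a handful of path rules. The following is a minimal sketch, not the code in `impress_flowgentic/io.py`: the helper names are hypothetical, the `af_stats` files are assumed to live at the top of `workspace/` (as in the run layout shown later), and the caller decides which PDB file represents each migrated target.

```python
import shutil
from pathlib import Path

# Hypothetical helpers mirroring the IMPRESS-like layout; not the project's actual API.
OUTPUTS_DIRNAME = "af_pipeline_outputs_multi"


def pipeline_root(workspace: Path, pipeline: str) -> Path:
    """Per-pipeline output tree, e.g. workspace/af_pipeline_outputs_multi/p1."""
    return workspace / OUTPUTS_DIRNAME / pipeline


def stats_csv(workspace: Path, pipeline: str, pass_no: int) -> Path:
    """Pass-level score file, e.g. workspace/af_stats_p1_pass_2.csv."""
    return workspace / f"af_stats_{pipeline}_pass_{pass_no}.csv"


def child_input_dir(workspace: Path, child: str) -> Path:
    """Input directory for a spawned child, e.g. workspace/p1_sub1_in."""
    return workspace / f"{child}_in"


def migrate_targets(workspace: Path, child: str, sources: dict[str, Path]) -> list[Path]:
    """Copy each migrated target's PDB into <child>_in/ and return the new paths.

    Which parent file is copied (seed input vs. latest best model) is left to the
    caller; that choice is an assumption of this sketch.
    """
    dest_dir = child_input_dir(workspace, child)
    dest_dir.mkdir(parents=True, exist_ok=True)
    copied = []
    for target, src in sorted(sources.items()):
        dst = dest_dir / f"{target}.pdb"
        shutil.copyfile(src, dst)
        copied.append(dst)
    return copied
```

Keeping every path in one place means the mocks, the adaptive step, and any later real-tool wrappers agree on the same data contract.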
- `FlowgenticImpressManager`
  - Coordinates pipeline lifecycle and adaptive execution.
  - File: `impress_flowgentic/manager.py`
- `ProteinBindingFlowgenticPipeline`
  - Implements the pass loop.
  - Compiles and executes the LangGraph for each pass.
  - File: `impress_flowgentic/pipeline.py`
- Adaptive policy
  - Detects degraded targets and requests child-pipeline spawning.
  - File: `impress_flowgentic/adaptive.py`
- Mock tool layer
  - MPNN/AlphaFold/scoring simulators with deterministic outputs.
  - File: `impress_flowgentic/mocks.py`
- I/O helpers
  - Seeds input PDBs and the output directory tree.
  - File: `impress_flowgentic/io.py`
Each pass uses a `StateGraph(PassState)` whose nodes are wrapped by Flowgentic execution wrappers (`AsyncFlowType.EXECUTION_BLOCK`):
- `prepare_pass`
- `mock_mpnn` (skipped on a child pipeline's first inherited pass)
- `rank_sequences` (skipped on a child pipeline's first inherited pass)
- `build_fasta`
- `mock_alphafold`
- `mock_plddt_extract`

Conditional routing:
- `prepare_pass -> build_fasta` if `skip_design=True`
- `prepare_pass -> mock_mpnn -> rank_sequences -> build_fasta` otherwise

Then always:
- `build_fasta -> mock_alphafold -> mock_plddt_extract -> END`
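The routing can be checked without LangGraph installed. The sketch below is a dependency-free emulation, not the project's graph: in LangGraph a branch like this is typically declared with `StateGraph.add_conditional_edges`, whereas here a plain loop walks an edge table. `PassState` is reduced to the hypothetical keys `skip_design` and `trace`, and every node only records its own name.

```python
from typing import Callable, TypedDict


class PassState(TypedDict, total=False):
    """Hypothetical, trimmed-down pass state."""
    skip_design: bool
    trace: list[str]


Node = Callable[[PassState], PassState]

# Unconditional successors; prepare_pass is routed separately below.
EDGES = {
    "mock_mpnn": "rank_sequences",
    "rank_sequences": "build_fasta",
    "build_fasta": "mock_alphafold",
    "mock_alphafold": "mock_plddt_extract",
    "mock_plddt_extract": "END",
}


def route_after_prepare(state: PassState) -> str:
    """Child pipelines skip design/ranking on their first inherited pass."""
    return "build_fasta" if state.get("skip_design") else "mock_mpnn"


def run_pass(state: PassState, nodes: dict[str, Node]) -> PassState:
    """Execute nodes from prepare_pass until END, following the routing rules."""
    current = "prepare_pass"
    while current != "END":
        state = nodes[current](state)
        current = route_after_prepare(state) if current == "prepare_pass" else EDGES[current]
    return state


def recorder(name: str) -> Node:
    """Stand-in node that only appends its own name to the trace."""
    def node(state: PassState) -> PassState:
        return {**state, "trace": state.get("trace", []) + [name]}
    return node


nodes = {name: recorder(name) for name in ["prepare_pass", *EDGES]}
```

With `skip_design` true, the trace is `prepare_pass`, `build_fasta`, `mock_alphafold`, `mock_plddt_extract`; with it false, `mock_mpnn` and `rank_sequences` appear between `prepare_pass` and `build_fasta`.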
```
impress-flowgentic/
├── impress_flowgentic/
│   ├── __init__.py
│   ├── adaptive.py
│   ├── base.py
│   ├── io.py
│   ├── manager.py
│   ├── mocks.py
│   ├── pipeline.py
│   ├── runner.py
│   ├── setup.py
│   └── state.py
├── scripts/
│   └── run_impress_flowgentic.py
├── workspace/          # generated run artifacts
├── pyproject.toml
└── README.md
```
- The runner seeds initial inputs (`p1_in/*.pdb`) and the required output directories.
- The manager starts the `p1` pipeline.
- The pipeline executes the pass graph for each pass, up to `max_passes`.
- After each pass, the pipeline triggers the adaptive step.
- The adaptive function may spawn a child pipeline (`p1_sub1`, etc.) containing only the degraded targets.
- The manager continues until all parent and child pipelines complete.
- The manager writes a summary report to `workspace/run_summary.json`.
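The lifecycle steps above amount to a work queue of pipelines. This sequential sketch only illustrates the control flow; the real manager runs pipelines concurrently through Flowgentic and AsyncFlow. `PipelineRun`, `run_manager`, and the `find_degraded` hook are hypothetical names, and the rule that a child resumes at the spawning pass while inheriting the parent's pass counter is an assumption. The child naming (`<parent>_sub<k>`) and the returned keys follow `p1_sub1` and `workspace/run_summary.json` as described in this README.

```python
from collections import deque
from dataclasses import dataclass
from typing import Callable


@dataclass
class PipelineRun:
    """Hypothetical bookkeeping for one pipeline (parent or child)."""
    name: str
    targets: list[str]
    start_pass: int = 1
    passes_executed: int = 0


# Hypothetical adaptive hook: (pipeline name, pass number, targets) -> targets to migrate.
FindDegraded = Callable[[str, int, list[str]], list[str]]


def run_manager(root: PipelineRun, max_passes: int, find_degraded: FindDegraded) -> dict:
    queue = deque([root])
    completed: list[dict] = []
    spawn_requests: list[dict] = []
    child_counts: dict[str, int] = {}
    while queue:
        pipe = queue.popleft()
        for pass_no in range(pipe.start_pass, max_passes + 1):
            pipe.passes_executed += 1  # stands in for executing the pass graph
            migrated = find_degraded(pipe.name, pass_no, pipe.targets)
            if migrated:
                child_counts[pipe.name] = child_counts.get(pipe.name, 0) + 1
                child_name = f"{pipe.name}_sub{child_counts[pipe.name]}"
                # Assumption: the child resumes at this pass and inherits the parent's
                # pass counter as it stood before this pass.
                queue.append(PipelineRun(child_name, list(migrated), pass_no, pass_no - 1))
                spawn_requests.append({"parent": pipe.name, "child": child_name, "pass": pass_no})
                pipe.targets = [t for t in pipe.targets if t not in migrated]
            if not pipe.targets:
                break  # all work migrated: the parent terminates early
        completed.append({
            "name": pipe.name,
            "status": "completed",
            "passes_executed": pipe.passes_executed,
            "remaining_targets": len(pipe.targets),
            "error": None,
        })
    return {"completed_pipelines": completed, "spawn_requests": spawn_requests}
```

For example, a hook that migrates `t2` and `t3` out of `p1` on pass 3 (with `max_passes=4`) yields one spawn request and a child `p1_sub1` that finishes with two targets.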
Mock MPNN (`mock_mpnn`):
- Generates deterministic candidate sequences per target and pass.
- Writes files compatible with the ranking parser: `.../mpnn/job_<pass>/seqs/<target>.fa`

Mock AlphaFold (`mock_alphafold`):
- Produces per-target dimer model outputs: `.../af/prediction/dimer_models/<target>/...`
- Copies selected files into:
  - `.../af/prediction/best_models/<target>.pdb`
  - `.../af/prediction/best_ptm/<target>.json`
  - `.../mpnn/job_<pass>/<target>.pdb`

Mock score extraction (`mock_plddt_extract`):
- Computes deterministic per-target metrics.
- Writes `workspace/af_stats_<pipeline>_pass_<n>.csv`.
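Determinism is what makes the mocks useful for testing: the same target and pass always yield the same sequences and scores. One way to get that is to derive values from a SHA-256 digest of the inputs, as sketched below. This is an assumption about technique, not a description of `impress_flowgentic/mocks.py`; the FASTA header format and the CSV column names are hypothetical, and writing to disk is left out.

```python
import csv
import hashlib
import io

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def _digest(*parts: object) -> bytes:
    """Stable bytes derived from the inputs (unlike str hash(), not salted per process)."""
    return hashlib.sha256(":".join(map(str, parts)).encode()).digest()


def mock_sequences(target: str, pass_no: int, n: int = 4, length: int = 12) -> list[str]:
    """Deterministic candidate sequences; length must be <= 32 (one SHA-256 digest)."""
    return [
        "".join(AMINO_ACIDS[b % len(AMINO_ACIDS)] for b in _digest("seq", target, pass_no, i)[:length])
        for i in range(n)
    ]


def to_fasta(target: str, seqs: list[str]) -> str:
    """FASTA text; the header format here is hypothetical."""
    lines = []
    for i, seq in enumerate(seqs):
        lines += [f">{target}_design_{i}", seq]
    return "\n".join(lines) + "\n"


def mock_metrics(target: str, pass_no: int) -> dict[str, float]:
    """Deterministic pLDDT-like (50-95) and pTM-like (0-1) values."""
    d = _digest("metrics", target, pass_no)
    return {"plddt": round(50 + d[0] / 255 * 45, 2), "ptm": round(d[1] / 255, 3)}


def stats_csv_text(metrics: dict[str, dict[str, float]]) -> str:
    """Render per-target metrics as CSV sorted by target (column names are hypothetical)."""
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=["target", "plddt", "ptm"], lineterminator="\n")
    writer.writeheader()
    for target in sorted(metrics):
        writer.writerow({"target": target, **metrics[target]})
    return buf.getvalue()
```

Because the pass number is part of the hashed input, scores change from pass to pass in a reproducible way, which is enough to exercise the adaptive degradation check.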
The adaptive policy intentionally mimics IMPRESS-style behavior:
- Wait until at least pass 2.
- Compare the latest score with the previous score for each target.
- If the degradation exceeds the threshold, migrate the target to a child pipeline.
- The child pipeline inherits context (`score_history`, pass number, etc.) and increments `seq_rank`.
- The parent finalizes by removing migrated targets from its local work set.
- The parent may terminate if no targets remain.
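The adaptive rule can be sketched as two small functions: one that finds targets whose score fell by more than a threshold between the last two passes, and one that moves them into a child context. `PipelineContext`, `degraded_targets`, and `spawn_child` are hypothetical names; the sketch assumes higher scores are better (as with pLDDT) and that the threshold is an absolute drop, neither of which the project confirms.

```python
from dataclasses import dataclass, field


@dataclass
class PipelineContext:
    """Hypothetical subset of pipeline state used by the adaptive step."""
    name: str
    pass_no: int
    seq_rank: int = 0
    score_history: dict[str, list[float]] = field(default_factory=dict)


def degraded_targets(ctx: PipelineContext, threshold: float) -> list[str]:
    """Targets whose latest score dropped by more than `threshold` since the previous pass.

    Assumes higher scores are better (as with pLDDT).
    """
    if ctx.pass_no < 2:
        return []  # nothing to compare against yet
    degraded = []
    for target, history in sorted(ctx.score_history.items()):
        if len(history) >= 2 and history[-2] - history[-1] > threshold:
            degraded.append(target)
    return degraded


def spawn_child(parent: PipelineContext, child_name: str, targets: list[str]) -> PipelineContext:
    """Move `targets` into a child that inherits pass number and history, with seq_rank + 1."""
    child = PipelineContext(
        name=child_name,
        pass_no=parent.pass_no,
        seq_rank=parent.seq_rank + 1,
        score_history={t: list(parent.score_history[t]) for t in targets},
    )
    for t in targets:
        del parent.score_history[t]  # parent drops migrated targets from its work set
    return child
```

After `spawn_child`, an empty `parent.score_history` corresponds to the case where the parent has no targets left and may terminate.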
Requirements:
- Python 3.10+.
- `git` available (dependencies are installed from Git repositories).

Using `uv`:

```bash
cd /Users/yamirghofran0/STRIDE/impress-flowgentic
uv sync
uv run python scripts/run_impress_flowgentic.py
```

Using `pip` and a virtual environment:

```bash
cd /Users/yamirghofran0/STRIDE/impress-flowgentic
python -m venv .venv
source .venv/bin/activate
pip install .
python scripts/run_impress_flowgentic.py
```

If you want to test against a local Flowgentic checkout instead of the Git-installed package versions:

```bash
pip install -e ../flowgentic
```

After a run, inspect:
- `workspace/run_summary.json`
- `workspace/af_stats_p1_pass_*.csv`
- `workspace/af_stats_p1_sub1_pass_*.csv` (if a child was spawned)
- `workspace/p1_in/*.pdb`
- `workspace/p1_sub1_in/*.pdb` (if a child was spawned)
- `workspace/af_pipeline_outputs_multi/p1/...`
- `workspace/af_pipeline_outputs_multi/p1_sub1/...` (if a child was spawned)
Typical directory shape:
```
workspace/
├── af_pipeline_outputs_multi/
│   ├── p1/
│   │   ├── af/
│   │   └── mpnn/
│   └── p1_sub1/
│       ├── af/
│       └── mpnn/
├── af_stats_p1_pass_1.csv
├── af_stats_p1_pass_2.csv
├── af_stats_p1_pass_3.csv
├── af_stats_p1_pass_4.csv
├── af_stats_p1_sub1_pass_2.csv
├── af_stats_p1_sub1_pass_3.csv
├── af_stats_p1_sub1_pass_4.csv
├── p1_in/
├── p1_sub1_in/
└── run_summary.json
```
A validated run produced:
- `p1` completed all configured passes.
- One adaptive spawn occurred: `p1 -> p1_sub1` on parent pass 3.
- `p1_sub1` completed its inherited passes.
- No pipeline errors were reported.
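Those checks can be scripted. The sketch below parses a summary document and reports errors, non-`completed` statuses, and spawned children that never appear in `completed_pipelines`. It relies only on the keys visible in the example summary that follows; `summary_problems` is a hypothetical helper, not part of the project, and status values other than `completed` are assumed rather than documented.

```python
import json


def summary_problems(text: str) -> list[str]:
    """Return human-readable problems found in a run_summary.json document."""
    summary = json.loads(text)
    pipelines = summary.get("completed_pipelines", [])
    problems = []
    for p in pipelines:
        if p.get("error") is not None:
            problems.append(f"{p['name']}: error {p['error']!r}")
        elif p.get("status") != "completed":
            problems.append(f"{p['name']}: status {p.get('status')!r}")
    names = {p["name"] for p in pipelines}
    for req in summary.get("spawn_requests", []):
        if req["child"] not in names:
            problems.append(f"{req['child']}: spawned by {req['parent']} but not in completed_pipelines")
    return problems
```

To check a real run, pass it the file contents, for example `summary_problems(Path("workspace/run_summary.json").read_text())` with `Path` imported from `pathlib`; an empty list means none of these problems were found.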
Example summary (`workspace/run_summary.json`):

```json
{
  "completed_pipelines": [
    {"name": "p1", "status": "completed", "passes_executed": 4, "remaining_targets": 1, "error": null},
    {"name": "p1_sub1", "status": "completed", "passes_executed": 4, "remaining_targets": 2, "error": null}
  ],
  "spawn_requests": [
    {"parent": "p1", "child": "p1_sub1", "pass": 3}
  ]
}
```

If imports fail when running the workflow, install the project dependencies first:
```bash
uv sync
```

If adaptive spawning does not behave as expected, remember that it depends on the score trajectories and the degradation threshold. Check:
- `degradation_threshold` in `impress_flowgentic/runner.py`
- The generated scores in `workspace/af_stats_*.csv`
The workflow reuses the `workspace/` paths, so files from earlier runs can remain alongside new ones. For a clean run, remove or rename `workspace/` before running.
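If you prefer renaming to deleting, a small helper can move the old workspace aside so earlier results stay available for comparison. `archive_workspace` is a hypothetical helper, not part of the project; it assumes it runs from the project root, where `workspace/` lives.

```python
import time
from pathlib import Path
from typing import Optional


def archive_workspace(workspace: Path = Path("workspace")) -> Optional[Path]:
    """Rename an existing workspace/ to workspace-<timestamp>/ so the next run starts clean.

    Returns the archive path, or None if there was nothing to archive.
    """
    if not workspace.exists():
        return None
    stamp = time.strftime("%Y%m%d-%H%M%S")
    archive = workspace.with_name(f"{workspace.name}-{stamp}")
    suffix = 1
    while archive.exists():  # avoid clobbering an archive made in the same second
        archive = workspace.with_name(f"{workspace.name}-{stamp}-{suffix}")
        suffix += 1
    return workspace.rename(archive)
```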
Limitations:
- Tools are mocks, not actual AlphaFold/MPNN integrations.
- The runtime currently uses a local concurrent backend, not cluster-specific backends.

Next steps:
- Replace the mock MPNN generator with a real wrapper invocation.
- Replace the mock AlphaFold writer with real execution commands and staging.
- Add tests for adaptive branching and artifact contracts.
- Add optional telemetry artifact generation through Flowgentic introspection APIs.
This project is a faithful behavioral prototype of IMPRESS adaptive orchestration, implemented with Flowgentic + LangGraph and ready to evolve into real tool integrations.