impress-flowgentic

Recreation of the IMPRESS protein-binding workflow using LangGraph (workflow definition) and Flowgentic (execution via RADICAL AsyncFlow), with mocked stand-ins for the heavy AlphaFold and ProteinMPNN tools.

Overview

impress-flowgentic reproduces the behavior of IMPRESS protein-binding pipelines in a form that runs locally:

Multi-pass pipeline execution.
Sequence design + ranking stage.
Structure prediction stage per target.
Score extraction stage (af_stats_<pipeline>_pass_<n>.csv).
Adaptive child-pipeline spawning when targets degrade.

The project intentionally mocks expensive external tools (AlphaFold and MPNN), but keeps their data contracts and output directory conventions so the workflow resembles real IMPRESS execution.

Goals and Scope

Goals

Recreate IMPRESS adaptive pipeline semantics using Flowgentic + LangGraph.
Keep artifact paths and pass-level outputs compatible with IMPRESS expectations.
Make the whole workflow runnable locally and observable via produced files.

Non-goals

Running real AlphaFold.
Running real ProteinMPNN.
Integrating HPC scheduler-specific runtime configs.

How This Mirrors IMPRESS

This implementation follows IMPRESS behavior at three levels:

Manager semantics
- Tracks active pipelines, adaptive tasks, and spawned children.
- Submits child pipelines dynamically.
- Supports parent termination once work is migrated.
Pipeline semantics
- Multi-pass loop with per-pass execution.
- Child pipelines skip design/ranking on their first inherited pass.
- Pipeline keeps iter_seqs, score_history, current_scores, and pass counters.
Artifact semantics
- Produces IMPRESS-like layout under af_pipeline_outputs_multi/<pipeline>/....
- Produces af_stats_<pipeline>_pass_<n>.csv files.
- Copies migrated targets into <child>_in/.
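The pipeline-level state listed above can be pictured as a small record per pipeline. The sketch below is hypothetical: only iter_seqs, score_history, current_scores and the existence of pass counters come from this README; the class name PipelineState and the fields start_pass, passes_executed and is_child are illustrative assumptions, not the project's actual types.

```python
from dataclasses import dataclass, field


@dataclass
class PipelineState:
    # Hypothetical container. Only iter_seqs, score_history, current_scores
    # and "pass counters" are named in the README; the rest is assumed.
    name: str
    targets: list[str]
    max_passes: int
    is_child: bool = False
    start_pass: int = 1              # assumed: a child starts at the inherited pass number
    passes_executed: int = 0         # assumed pass counter
    seq_rank: int = 0
    iter_seqs: dict[str, str] = field(default_factory=dict)              # target -> sequence
    current_scores: dict[str, float] = field(default_factory=dict)       # target -> latest score
    score_history: dict[str, list[float]] = field(default_factory=dict)  # target -> one score per pass

    def skip_design(self, pass_n: int) -> bool:
        # Design (MPNN) and ranking are skipped only on a child's first inherited pass.
        return self.is_child and pass_n == self.start_pass

    def record_scores(self, scores: dict[str, float]) -> None:
        # Update the latest scores and extend each target's history by one pass.
        for target, score in scores.items():
            self.current_scores[target] = score
            self.score_history.setdefault(target, []).append(score)
        self.passes_executed += 1
```

Keeping the full per-target history, rather than only the last score, is what lets a child pipeline inherit the context the adaptive step needs.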

Architecture

High-level components

FlowgenticImpressManager
- Coordinates pipeline lifecycle and adaptive execution.
- File: impress_flowgentic/manager.py
ProteinBindingFlowgenticPipeline
- Implements pass loop.
- Compiles and executes a LangGraph graph for each pass.
- File: impress_flowgentic/pipeline.py
Adaptive policy
- Detects degraded targets and requests child pipeline spawning.
- File: impress_flowgentic/adaptive.py
Mock tool layer
- MPNN/AlphaFold/scoring simulators with deterministic outputs.
- File: impress_flowgentic/mocks.py
I/O helpers
- Seeds input PDBs and output directory tree.
- File: impress_flowgentic/io.py

Per-pass LangGraph

Each pass uses a StateGraph(PassState) with nodes wrapped by Flowgentic execution wrappers (AsyncFlowType.EXECUTION_BLOCK):

prepare_pass
mock_mpnn (skipped on a child pipeline's first inherited pass)
rank_sequences
build_fasta
mock_alphafold
mock_plddt_extract

Conditional routing:

prepare_pass -> build_fasta if skip_design=True
prepare_pass -> mock_mpnn -> rank_sequences -> build_fasta otherwise

Then always:

build_fasta -> mock_alphafold -> mock_plddt_extract -> END
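The routing above can be traced without LangGraph installed. This is a plain-Python stand-in, not the project's StateGraph code: route_after_prepare plays the role of the conditional edge on prepare_pass (which LangGraph would express with add_conditional_edges), EDGES holds the fixed edges, and END is a local sentinel. As in LangGraph, each node here returns a partial update that is merged into the state.

```python
from typing import Callable, Optional

END = "__end__"  # local sentinel for the end of the pass

# Fixed edges; the edge out of prepare_pass is decided by route_after_prepare.
EDGES = {
    "mock_mpnn": "rank_sequences",
    "rank_sequences": "build_fasta",
    "build_fasta": "mock_alphafold",
    "mock_alphafold": "mock_plddt_extract",
    "mock_plddt_extract": END,
}


def route_after_prepare(state: dict) -> str:
    # Child pipelines on their first inherited pass go straight to build_fasta.
    return "build_fasta" if state.get("skip_design") else "mock_mpnn"


def run_pass(state: dict, nodes: Optional[dict[str, Callable[[dict], dict]]] = None) -> tuple[list[str], dict]:
    """Walk the pass graph, returning the visited node names and the final state."""
    nodes = nodes or {}
    trace: list[str] = []
    current = "prepare_pass"
    while current != END:
        trace.append(current)
        node = nodes.get(current)
        if node is not None:
            state = {**state, **node(state)}  # merge the node's partial update
        current = route_after_prepare(state) if current == "prepare_pass" else EDGES[current]
    return trace, state
```

With skip_design=True the trace is prepare_pass, build_fasta, mock_alphafold, mock_plddt_extract; otherwise mock_mpnn and rank_sequences appear between prepare_pass and build_fasta.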

Program Structure

impress-flowgentic/
├── impress_flowgentic/
│   ├── __init__.py
│   ├── adaptive.py
│   ├── base.py
│   ├── io.py
│   ├── manager.py
│   ├── mocks.py
│   ├── pipeline.py
│   ├── runner.py
│   ├── setup.py
│   └── state.py
├── scripts/
│   └── run_impress_flowgentic.py
├── workspace/                       # generated run artifacts
├── pyproject.toml
└── README.md

Workflow Execution Details

The runner seeds the initial inputs (p1_in/*.pdb) and the required output directories.
The manager starts the p1 pipeline.
The pipeline executes the pass graph once per pass, up to max_passes.
After each pass, the pipeline triggers the adaptive step.
The adaptive step may spawn a child pipeline (p1_sub1, etc.) that contains only the degraded targets.
The manager continues until all parent and child pipelines complete.
The manager writes a summary report to workspace/run_summary.json.
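The manager loop above, where children are submitted while other pipelines are still running, can be sketched with plain asyncio. This is an illustration under assumptions: the real FlowgenticImpressManager executes through Flowgentic/AsyncFlow, while here run_manager, the spawn callback and the string status are hypothetical.

```python
import asyncio
from typing import Awaitable, Callable

# A pipeline coroutine receives its name and a spawn(name) callback, and returns a status.
PipelineFn = Callable[[str, Callable[[str], None]], Awaitable[str]]


async def run_manager(root: str, run_pipeline: PipelineFn) -> dict[str, str]:
    """Run `root` and every pipeline it (transitively) spawns; return name -> status."""
    results: dict[str, str] = {}
    tasks: set[asyncio.Task] = set()

    def submit(name: str) -> None:
        # Children can be submitted at any time while the manager loop is running.
        tasks.add(asyncio.create_task(run_pipeline(name, submit), name=name))

    submit(root)
    while tasks:
        # asyncio.wait works on a copy of the set, so tasks added meanwhile
        # are picked up on the next iteration of this loop.
        done, _ = await asyncio.wait(tasks, return_when=asyncio.FIRST_COMPLETED)
        for task in done:
            tasks.discard(task)
            try:
                results[task.get_name()] = task.result()
            except Exception as exc:  # record failures instead of stopping the manager
                results[task.get_name()] = f"failed: {exc}"
    return results
```

Waiting with FIRST_COMPLETED rather than gathering a fixed list is what allows the set of pipelines to grow during the run.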

Mock Tool Behavior

Mock MPNN

Generates deterministic candidate sequences per target and pass.
Writes files compatible with the ranking parser:
- .../mpnn/job_<pass>/seqs/<target>.fa
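A deterministic mock of this kind can be sketched by seeding a random generator from the target and pass. The function name, sequence length, sample count and FASTA header format below are hypothetical; the real ranking parser's header expectations are not documented here. The directory passed as mpnn_dir stands in for the elided ".../mpnn" prefix.

```python
import hashlib
import random
from pathlib import Path

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def mock_mpnn(mpnn_dir: Path, target: str, pass_n: int, n_seqs: int = 4, length: int = 60) -> Path:
    """Write <mpnn_dir>/job_<pass>/seqs/<target>.fa with reproducible mock sequences."""
    # Seed from (target, pass) with SHA-256 so the output is identical across runs.
    digest = hashlib.sha256(f"{target}:{pass_n}".encode()).hexdigest()
    rng = random.Random(int(digest[:16], 16))

    out = mpnn_dir / f"job_{pass_n}" / "seqs" / f"{target}.fa"
    out.parent.mkdir(parents=True, exist_ok=True)
    lines = []
    for i in range(n_seqs):
        seq = "".join(rng.choice(AMINO_ACIDS) for _ in range(length))
        score = round(rng.uniform(0.5, 2.0), 4)  # hypothetical per-sample score
        lines.append(f">{target}, sample={i}, score={score}")  # hypothetical header format
        lines.append(seq)
    out.write_text("\n".join(lines) + "\n")
    return out
```

A hash digest is used instead of Python's built-in hash(), because hash() of a string is salted per process and would break reproducibility between runs.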

Mock AlphaFold

Produces per-target dimer model outputs:
- .../af/prediction/dimer_models/<target>/...
Copies selected files into:
- .../af/prediction/best_models/<target>.pdb
- .../af/prediction/best_ptm/<target>.json
- .../mpnn/job_<pass>/<target>.pdb
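The copy step can be sketched as follows. The af_dir and mpnn_dir arguments stand in for the elided ".../af" and ".../mpnn" prefixes; the file names inside dimer_models/<target>/ and the JSON contents are hypothetical placeholders, since the README elides them.

```python
import json
import shutil
from pathlib import Path


def mock_alphafold(af_dir: Path, mpnn_dir: Path, target: str, pass_n: int) -> dict[str, Path]:
    """Write mock dimer outputs for one target, then copy them to the collection paths."""
    pred = af_dir / "prediction"
    model_dir = pred / "dimer_models" / target
    model_dir.mkdir(parents=True, exist_ok=True)

    # Hypothetical file names and contents for the mocked dimer outputs.
    model_pdb = model_dir / "model_0.pdb"
    ptm_json = model_dir / "model_0_ptm.json"
    model_pdb.write_text(f"REMARK mock dimer model for {target}, pass {pass_n}\nEND\n")
    ptm_json.write_text(json.dumps({"target": target, "pass": pass_n, "ptm": 0.5}))

    # Destination layout taken from the list above.
    copies = {
        "best_model": (model_pdb, pred / "best_models" / f"{target}.pdb"),
        "best_ptm": (ptm_json, pred / "best_ptm" / f"{target}.json"),
        "mpnn_job_pdb": (model_pdb, mpnn_dir / f"job_{pass_n}" / f"{target}.pdb"),
    }
    written = {}
    for key, (src, dest) in copies.items():
        dest.parent.mkdir(parents=True, exist_ok=True)
        shutil.copyfile(src, dest)
        written[key] = dest
    return written
```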

Mock score extraction (pLDDT/PAE)

Computes deterministic per-target metrics.
Writes:
- workspace/af_stats_<pipeline>_pass_<n>.csv
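A sketch of deterministic pLDDT/PAE mocking and the CSV write, assuming hypothetical column names (target, avg_plddt, avg_pae) and value ranges; only the af_stats_<pipeline>_pass_<n>.csv file name comes from this README.

```python
import csv
import hashlib
from pathlib import Path


def mock_scores(target: str, pass_n: int) -> tuple[float, float]:
    """Derive a reproducible (pLDDT, PAE) pair from the target name and pass number."""
    digest = hashlib.sha256(f"score:{target}:{pass_n}".encode()).digest()
    plddt = 50.0 + 45.0 * digest[0] / 255  # assumed range 50..95
    pae = 2.0 + 28.0 * digest[1] / 255     # assumed range 2..30
    return round(plddt, 2), round(pae, 2)


def write_af_stats(workspace: Path, pipeline: str, pass_n: int, targets: list[str]) -> Path:
    """Write workspace/af_stats_<pipeline>_pass_<n>.csv with one row per target."""
    path = workspace / f"af_stats_{pipeline}_pass_{pass_n}.csv"
    path.parent.mkdir(parents=True, exist_ok=True)
    with path.open("w", newline="") as fh:
        writer = csv.writer(fh)
        writer.writerow(["target", "avg_plddt", "avg_pae"])  # hypothetical columns
        for target in sorted(targets):
            writer.writerow([target, *mock_scores(target, pass_n)])
    return path
```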

Adaptive Logic

Adaptive policy intentionally mimics IMPRESS-style behavior:

Wait until at least pass 2.
Compare latest score vs previous score for each target.
If degradation exceeds threshold, migrate target to child pipeline.
The child pipeline inherits context (score_history, pass number, etc.) and increments seq_rank.
The parent finalizes by removing migrated targets from its local work set.
The parent may terminate if no targets remain.
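The policy above can be sketched as two small functions. This is an illustration, not the code in impress_flowgentic/adaptive.py: it assumes a higher score is better (as with pLDDT), so degradation is the previous score minus the latest one, and it represents pipeline context as a plain dict with hypothetical keys.

```python
def find_degraded(score_history: dict[str, list[float]], pass_n: int, threshold: float) -> list[str]:
    """Return targets whose latest score dropped by more than `threshold`."""
    if pass_n < 2:  # nothing to compare before pass 2
        return []
    degraded = []
    for target, history in score_history.items():
        if len(history) < 2:
            continue
        drop = history[-2] - history[-1]  # assumes higher score is better
        if drop > threshold:
            degraded.append(target)
    return sorted(degraded)


def spawn_child_context(parent: dict, degraded: list[str], child_name: str) -> tuple[dict, dict]:
    """Split degraded targets into a child context; return (updated parent, child)."""
    child = {
        "name": child_name,
        "targets": list(degraded),
        "pass_n": parent["pass_n"],            # child continues from the parent's pass
        "seq_rank": parent["seq_rank"] + 1,    # and tries the next-ranked sequence
        "score_history": {t: list(parent["score_history"][t]) for t in degraded},
    }
    remaining = [t for t in parent["targets"] if t not in degraded]
    updated_parent = {**parent, "targets": remaining, "terminate": not remaining}
    return updated_parent, child
```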

Run Instructions

Prerequisites

Python 3.10+.
git available on PATH (dependencies are installed from Git repositories).

Recommended command

cd impress-flowgentic
uv sync
uv run python scripts/run_impress_flowgentic.py

Alternative

cd impress-flowgentic
python -m venv .venv
source .venv/bin/activate
pip install .
python scripts/run_impress_flowgentic.py

Optional local-development override

If you want to test against a local Flowgentic checkout instead of Git-installed package versions:

pip install -e ../flowgentic

Expected Artifacts

After a run, inspect:

workspace/run_summary.json
workspace/af_stats_p1_pass_*.csv
workspace/af_stats_p1_sub1_pass_*.csv (if a child was spawned)
workspace/p1_in/*.pdb
workspace/p1_sub1_in/*.pdb (if a child was spawned)
workspace/af_pipeline_outputs_multi/p1/...
workspace/af_pipeline_outputs_multi/p1_sub1/... (if a child was spawned)

Typical directory shape:

workspace/
├── af_pipeline_outputs_multi/
│   ├── p1/
│   │   ├── af/
│   │   └── mpnn/
│   └── p1_sub1/
│       ├── af/
│       └── mpnn/
├── af_stats_p1_pass_1.csv
├── af_stats_p1_pass_2.csv
├── af_stats_p1_pass_3.csv
├── af_stats_p1_pass_4.csv
├── af_stats_p1_sub1_pass_2.csv
├── af_stats_p1_sub1_pass_3.csv
├── af_stats_p1_sub1_pass_4.csv
├── p1_in/
├── p1_sub1_in/
└── run_summary.json

Observed Run Results

A validated run produced:

p1 completed all configured passes.
One adaptive spawn occurred: p1 -> p1_sub1 on parent pass 3.
p1_sub1 completed its inherited passes.
No pipeline errors reported.

Example summary (workspace/run_summary.json):

{
  "completed_pipelines": [
    {"name": "p1", "status": "completed", "passes_executed": 4, "remaining_targets": 1, "error": null},
    {"name": "p1_sub1", "status": "completed", "passes_executed": 4, "remaining_targets": 2, "error": null}
  ],
  "spawn_requests": [
    {"parent": "p1", "child": "p1_sub1", "pass": 3}
  ]
}
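To check a run's summary programmatically, a small helper can flag pipelines that did not complete cleanly and spawned children that never finished. It relies only on the fields shown in the example above; the helper names are hypothetical and not part of the project.

```python
import json
from pathlib import Path


def check_summary(summary: dict) -> list[str]:
    """Return human-readable problems found in a run_summary.json payload."""
    problems = []
    pipelines = summary.get("completed_pipelines", [])
    for p in pipelines:
        if p.get("error") is not None or p.get("status") != "completed":
            problems.append(f"{p['name']}: status={p.get('status')} error={p.get('error')}")
    finished = {p["name"] for p in pipelines}
    for req in summary.get("spawn_requests", []):
        if req["child"] not in finished:
            problems.append(f"child {req['child']} (spawned by {req['parent']}) did not finish")
    return problems


def load_and_check(path: Path = Path("workspace/run_summary.json")) -> list[str]:
    return check_summary(json.loads(path.read_text()))
```

An empty list means every pipeline, including any adaptively spawned child, reported status "completed" with no error.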

Troubleshooting

`ModuleNotFoundError: No module named 'flowgentic'`

Install project dependencies first:

uv sync

No child pipeline spawned

Adaptive spawning depends on the score trajectories and the degradation threshold. Check:

degradation_threshold in impress_flowgentic/runner.py
Generated scores in workspace/af_stats_*.csv
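To see whether any target's score actually dropped enough to trigger a spawn, the per-pass CSVs can be read back in numeric pass order. This sketch assumes hypothetical column names ("target" plus a score column, default avg_plddt); adjust them to the headers in your generated files. It also assumes higher is better, so a positive drop means degradation, to compare against degradation_threshold.

```python
import csv
import re
from pathlib import Path

PASS_RE = re.compile(r"af_stats_(?P<pipeline>.+)_pass_(?P<n>\d+)\.csv$")


def score_trajectories(workspace: Path, pipeline: str, score_col: str = "avg_plddt") -> dict[str, list[float]]:
    """Collect each target's scores across passes, ordered by pass number."""
    files = []
    for path in workspace.glob(f"af_stats_{pipeline}_pass_*.csv"):
        m = PASS_RE.match(path.name)
        if m and m.group("pipeline") == pipeline:
            files.append((int(m.group("n")), path))
    trajectories: dict[str, list[float]] = {}
    for _, path in sorted(files):  # numeric sort: pass_10 comes after pass_9
        with path.open(newline="") as fh:
            for row in csv.DictReader(fh):
                trajectories.setdefault(row["target"], []).append(float(row[score_col]))
    return trajectories


def largest_drop(trajectories: dict[str, list[float]]) -> dict[str, float]:
    """Largest pass-to-pass decrease per target (0.0 if it never decreased)."""
    return {
        t: max((max(a - b, 0.0) for a, b in zip(s, s[1:])), default=0.0)
        for t, s in trajectories.items()
    }
```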

Existing artifacts from previous runs

This workflow reuses workspace/ paths. If you want a clean run, remove or rename workspace/ before running.

Limitations and Next Steps

Current limitations

Tools are mocks, not actual AlphaFold/MPNN integrations.
The runtime currently uses a local concurrent backend rather than cluster-specific backends.

Suggested next steps

Replace mock MPNN generator with real wrapper invocation.
Replace mock AlphaFold writer with real execution command/staging.
Add tests for adaptive branching and artifact contracts.
Add optional telemetry artifact generation through Flowgentic introspection APIs.

This project is a faithful behavioral prototype of IMPRESS adaptive orchestration, implemented with Flowgentic + LangGraph and ready to evolve into real tool integrations.
