State not updating

While running a reduced version of our tool-calling agent workflow, I noticed what looks like an error in how state is handled between tool calls. The run behaves like this:
1) The router works (with manual routing logic), and the LLM output, which is ignored, looks fine.
2) The first tool call, run_mpnn, appears to work.
3) The following tool step, score_mpnn, fails to update the state. As a result, score_mpnn runs again for some or all pipelines, sometimes looping on that tool once or twice before crashing.
All of this happens within about two minutes on a single A100, after which the job hangs.
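To illustrate what I think is happening, here is a minimal, framework-agnostic sketch. Everything in it is hypothetical: the state keys (`mpnn_sequences`, `mpnn_scores`), the stub tools, and the `route`/`run_pipeline` helpers are stand-ins, not the code in the attached script. It assumes the router chooses the next tool by checking which outputs are still missing from state, so if one tool's result is never merged back into state, the router keeps sending the pipeline to that same tool.

```python
from typing import Callable, Dict, List, Optional

# Hypothetical pipeline state: each tool records its output under its own key.
State = Dict[str, object]


def run_mpnn(state: State) -> State:
    # Stand-in for the real run_mpnn tool; returns only a partial state update.
    return {"mpnn_sequences": ["SEQ_A", "SEQ_B"]}


def score_mpnn(state: State) -> State:
    # Stand-in for the real score_mpnn tool; reads run_mpnn's output from state.
    seqs = state["mpnn_sequences"]
    return {"mpnn_scores": [len(s) for s in seqs]}


TOOLS: Dict[str, Callable[[State], State]] = {
    "run_mpnn": run_mpnn,
    "score_mpnn": score_mpnn,
}


def route(state: State) -> Optional[str]:
    # Manual routing: pick the first step whose output is missing from state.
    if "mpnn_sequences" not in state:
        return "run_mpnn"
    if "mpnn_scores" not in state:
        return "score_mpnn"
    return None  # every step has produced output, so the pipeline is done


def run_pipeline(
    state: State,
    drop_update_for: Optional[str] = None,
    max_steps: int = 5,
) -> List[str]:
    """Run tools until the router returns None; return the tools called, in order.

    drop_update_for simulates the bug: that tool runs, but its update is
    never merged into state, so the router sees no progress.
    """
    calls: List[str] = []
    for _ in range(max_steps):  # guard against routing forever
        tool = route(state)
        if tool is None:
            break
        calls.append(tool)
        update = TOOLS[tool](state)
        if tool != drop_update_for:
            # Merge the tool's update; the router depends on this step.
            state = {**state, **update}
    return calls


print(run_pipeline({}))
print(run_pipeline({}, drop_update_for="score_mpnn"))
```

With updates merged, the router calls `run_mpnn` then `score_mpnn` and stops. When `score_mpnn`'s update is dropped, the router keeps returning `score_mpnn` until the step limit is hit, which resembles the looping described above.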

I can't tell exactly where the disconnect is between the successful tool run and the failed state update. Could you take a look? My code, Anvil Slurm script, and example Slurm output are attached.

[mason_yesterdayupdate.py](https://github.com/user-attachments/files/24868072/mason_yesterdayupdate.py)


State not updating #93

Metadata

Assignees

Labels

Type

Projects

Milestone

Relationships

Development

State not updating #93

Description

Metadata

Metadata

Assignees

Labels

Type

Projects

Milestone

Relationships

Development

Issue actions