Hi Benchflow team,
I ran into an issue when evaluating long-running optimization tasks with ACP agents, especially claude-agent-acp / Claude Code with Claude Opus 4.6 or 4.7.
In these runs, the agent often builds a valid but slower optimization model, starts a long solver command, and then does not emit any new tool call/message/thought for a while. Benchflow then kills the run with:
Agent idle for 600s with no new tool call, message, or thought
The result is recorded as a partial trajectory, even though the run is not actually stuck: the agent is simply waiting on a long solver run to finish.
From the source, it looks like this behavior is controlled internally by:
agent_idle_timeout: int | None = 600
and the comment says:
None disables idle detection and falls back to the agent's wall-clock timeout
That behavior makes sense. However, as far as I can tell, agent_idle_timeout is not exposed through the normal CLI path, e.g. bench eval create. So users cannot easily run:
bench eval create ... --agent-idle-timeout none
or extend it to something like 1800 seconds without manually modifying the installed Benchflow source.
Request
Could Benchflow expose this setting in the CLI and/or YAML config?
For example:
bench eval create ... --agent-idle-timeout 1800
bench eval create ... --agent-idle-timeout none
or in YAML:
agent_idle_timeout: null
# or
agent_idle_timeout: 1800
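To make the proposed values concrete, here is a minimal sketch of how a flag or YAML value could be mapped onto the existing `int | None` field. `parse_idle_timeout` is a hypothetical helper of mine, not something that exists in Benchflow, and accepting `none`/`null` case-insensitively is an assumption about the desired behavior.

```python
from __future__ import annotations


def parse_idle_timeout(raw: str) -> int | None:
    """Hypothetical helper: turn a CLI/YAML string into an idle timeout.

    "none" or "null" (any case) means: disable idle detection.
    Anything else must be a positive integer number of seconds.
    """
    text = raw.strip().lower()
    if text in {"none", "null"}:
        return None
    value = int(text)  # raises ValueError for non-numeric input
    if value <= 0:
        raise ValueError("idle timeout must be a positive number of seconds")
    return value
```

A function like this could be used as a `type=` callable in an argparse-style parser, since a `ValueError` raised there is reported as a normal usage error. How it would plug into the actual `bench eval create` command is something I have not checked.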
Why this matters
For long-running optimization / scientific computing tasks, a solver command can legitimately run for more than 600 seconds without the agent producing a new message. In that case, the existing wall-clock timeout from task.toml [agent].timeout_sec is usually the right limit, while the idle watchdog can be too aggressive.
I do not think the default needs to change. The 600s default is useful for catching genuinely stuck agents. But exposing the knob would make long-running benchmark tasks much easier to run reproducibly without patching Benchflow locally.
Relevant code:
agent_idle_timeout: int | None = 600
When idle_timeout is None, _prompt_with_idle_watchdog is skipped and the prompt falls back to asyncio.wait_for(..., timeout=timeout), so only the wall-clock timeout applies.
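For reference, this is roughly how I understand the two code paths. It is a self-contained sketch, not Benchflow's actual implementation: `prompt_with_optional_watchdog`, `AgentIdleError`, the `touch` callback, and `poll_interval` are all names I made up for illustration. Only `asyncio.wait_for(..., timeout=timeout)` as the fallback comes from the source.

```python
from __future__ import annotations

import asyncio
import contextlib
import time
from typing import Awaitable, Callable


class AgentIdleError(RuntimeError):
    """Hypothetical error raised when the agent shows no activity."""


async def prompt_with_optional_watchdog(
    prompt: Callable[[Callable[[], None]], Awaitable[str]],
    timeout: float,
    idle_timeout: float | None,
    poll_interval: float = 0.01,
) -> str:
    last_activity = time.monotonic()

    def touch() -> None:
        # The agent calls this on each tool call, message, or thought.
        nonlocal last_activity
        last_activity = time.monotonic()

    if idle_timeout is None:
        # Idle detection disabled: only the wall-clock timeout applies.
        return await asyncio.wait_for(prompt(touch), timeout=timeout)

    task = asyncio.ensure_future(prompt(touch))
    deadline = time.monotonic() + timeout
    try:
        while True:
            done, _ = await asyncio.wait({task}, timeout=poll_interval)
            if done:
                return task.result()
            now = time.monotonic()
            if now - last_activity > idle_timeout:
                raise AgentIdleError(
                    f"Agent idle for {idle_timeout}s with no new activity"
                )
            if now >= deadline:
                raise asyncio.TimeoutError
    finally:
        # Make sure the prompt task does not outlive the watchdog.
        if not task.done():
            task.cancel()
            with contextlib.suppress(asyncio.CancelledError):
                await task
```

In this sketch, a prompt that waits silently on a solver for longer than `idle_timeout` is killed by the watchdog, while the same prompt with `idle_timeout=None` runs until it finishes or hits the wall-clock `timeout`. That second behavior is what I would like to be able to select from the CLI or YAML.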