Expose agent_idle_timeout in CLI / YAML config #338

@wxj630

Hi Benchflow team,

I ran into an issue when evaluating long-running optimization tasks with ACP agents, especially claude-agent-acp / Claude Code with Claude Opus 4.6 or 4.7.

In these runs, the agent often builds a valid but slower optimization model, starts a long solver command, and then does not emit any new tool call/message/thought for a while. Benchflow then kills the run with:

Agent idle for 600s with no new tool call, message, or thought

The result is recorded as a partial trajectory, even though the task itself is not necessarily stuck. It is simply waiting on a long solver run.

From the source, it looks like this behavior is controlled internally by:

agent_idle_timeout: int | None = 600

and the comment says:

None disables idle detection and falls back to the agent's wall-clock timeout

That behavior makes sense. However, as far as I can tell, agent_idle_timeout is not exposed through the normal CLI path, e.g. bench eval create. So users cannot easily run:

bench eval create ... --agent-idle-timeout none

or extend it to something like 1800 seconds without manually modifying the installed Benchflow source.

Request

Could Benchflow expose this setting in the CLI and/or YAML config?

For example:

bench eval create ... --agent-idle-timeout 1800
bench eval create ... --agent-idle-timeout none

or in YAML:

agent_idle_timeout: null
# or
agent_idle_timeout: 1800
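
To make the requested flag semantics concrete, here is a minimal sketch of how a value such as `1800` or `none` could be parsed into `int | None`. It uses the standard-library `argparse` purely for illustration; I have not checked which CLI framework Benchflow actually uses, and `parse_idle_timeout` and `build_parser` are hypothetical names, not existing Benchflow functions.

```python
import argparse
from typing import Optional


def parse_idle_timeout(value: str) -> Optional[int]:
    """Map 'none'/'null' to None, anything else to a positive number of seconds."""
    if value.strip().lower() in {"none", "null"}:
        return None  # None means: disable idle detection entirely
    try:
        seconds = int(value)
    except ValueError:
        raise argparse.ArgumentTypeError(
            f"expected a positive integer or 'none', got {value!r}"
        )
    if seconds <= 0:
        raise argparse.ArgumentTypeError("idle timeout must be greater than 0")
    return seconds


def build_parser() -> argparse.ArgumentParser:
    # Hypothetical stand-in for the real `bench eval create` parser.
    parser = argparse.ArgumentParser(prog="bench-eval-create-sketch")
    parser.add_argument(
        "--agent-idle-timeout",
        type=parse_idle_timeout,
        default=600,  # keeps the current default mentioned above
        help="seconds without agent activity before the run is killed, or 'none'",
    )
    return parser
```

With this shape, omitting the flag keeps the 600s default, `--agent-idle-timeout 1800` extends it, and `--agent-idle-timeout none` yields `None`, which matches the YAML `null` case.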

Why this matters

For long-running optimization / scientific computing tasks, a solver command can legitimately run for more than 600 seconds without the agent producing a new message. In that case, the existing wall-clock timeout from task.toml [agent].timeout_sec is usually the right limit, while the idle watchdog can be too aggressive.

I do not think the default needs to change. The 600s default is useful for catching genuinely stuck agents. But exposing the knob would make long-running benchmark tasks much easier to run reproducibly without patching Benchflow locally.

Relevant code:

  • agent_idle_timeout: int | None = 600
  • when idle_timeout is None, _prompt_with_idle_watchdog is skipped and the call falls back to asyncio.wait_for(..., timeout=timeout)
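
For readers who have not looked at the source, the behavior described in the two bullets can be sketched roughly as follows. This is my own simplified reconstruction, not Benchflow's actual code: `prompt_with_optional_watchdog`, `AgentIdleError`, `last_activity`, and `poll_interval` are hypothetical names, and the real `_prompt_with_idle_watchdog` may track activity differently.

```python
import asyncio
import time
from typing import Any, Awaitable, Callable, Optional


class AgentIdleError(RuntimeError):
    """Raised when no agent activity is seen for idle_timeout seconds."""


async def prompt_with_optional_watchdog(
    prompt: Awaitable[Any],
    last_activity: Callable[[], float],  # returns time.monotonic() of last event
    idle_timeout: Optional[float],
    wall_timeout: float,
    poll_interval: float = 1.0,
) -> Any:
    if idle_timeout is None:
        # Idle detection disabled: only the wall-clock limit applies.
        return await asyncio.wait_for(prompt, timeout=wall_timeout)

    task = asyncio.ensure_future(prompt)
    deadline = time.monotonic() + wall_timeout
    try:
        while True:
            # Wake up periodically to check both limits.
            done, _ = await asyncio.wait({task}, timeout=poll_interval)
            if done:
                return task.result()
            now = time.monotonic()
            if now >= deadline:
                raise asyncio.TimeoutError("wall-clock timeout reached")
            if now - last_activity() >= idle_timeout:
                raise AgentIdleError(
                    f"Agent idle for {idle_timeout}s with no new tool call, "
                    "message, or thought"
                )
    finally:
        if not task.done():
            # Stop the agent call and wait for the cancellation to settle.
            task.cancel()
            await asyncio.gather(task, return_exceptions=True)
```

In this sketch a long solver command that emits no events trips the idle branch well before the wall-clock deadline, which is the situation described above. Passing `idle_timeout=None` leaves only the `asyncio.wait_for` limit in place.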

Labels

fixed (verified fixed by running the patched code)