Expose `agent_idle_timeout` in CLI / YAML config


Hi Benchflow team,

I ran into an issue when evaluating long-running optimization tasks with ACP agents, especially `claude-agent-acp` / Claude Code with Claude Opus 4.6 or 4.7.

In these runs, the agent often builds a valid but slower optimization model, starts a long solver command, and then does not emit any new tool call/message/thought for a while. Benchflow then kills the run with:

```text
Agent idle for 600s with no new tool call, message, or thought
```

The run is then recorded as a partial trajectory, even though the agent is not actually stuck. It is simply waiting on a long solver run.

From the source, it looks like this behavior is controlled internally by:

```python
agent_idle_timeout: int | None = 600
```

and the comment says:

```python
# None disables idle detection and falls back to the agent's wall-clock timeout
```

That behavior makes sense. However, as far as I can tell, `agent_idle_timeout` is not exposed through the normal CLI path, e.g. `bench eval create`. So users cannot easily run:

```bash
bench eval create ... --agent-idle-timeout none
```

or extend it to something like 1800 seconds without manually modifying the installed Benchflow source.

### Request

Could Benchflow expose this setting in the CLI and/or YAML config?

For example:

```bash
bench eval create ... --agent-idle-timeout 1800
bench eval create ... --agent-idle-timeout none
```

or in YAML:

```yaml
agent_idle_timeout: null
# or
agent_idle_timeout: 1800
```

### Why this matters

For long-running optimization / scientific computing tasks, a solver command can legitimately run for more than 600 seconds without the agent producing a new message. In that case, the existing wall-clock timeout from `task.toml [agent].timeout_sec` is usually the right limit, while the idle watchdog can be too aggressive.

I do not think the default needs to change. The 600s default is useful for catching genuinely stuck agents. But exposing the knob would make long-running benchmark tasks much easier to run reproducibly without patching Benchflow locally.

Relevant code:
- `agent_idle_timeout: int | None = 600`
- `idle_timeout is None` disables `_prompt_with_idle_watchdog` and falls back to `asyncio.wait_for(..., timeout=timeout)`
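
For reference, this is roughly how I understand the two code paths. It is a simplified sketch written from my reading of the behavior, not Benchflow's actual implementation: `run_prompt`, `Activity`, and `IdleTimeout` are names I made up for illustration, and the real `_prompt_with_idle_watchdog` may be structured differently.

```python
import asyncio
import time


class IdleTimeout(Exception):
    """Raised when no activity was seen for longer than idle_timeout."""


class Activity:
    """Records the time of the latest tool call, message, or thought."""

    def __init__(self) -> None:
        self.last = time.monotonic()

    def touch(self) -> None:
        self.last = time.monotonic()


async def run_prompt(coro, activity, timeout, idle_timeout=600, poll=0.05):
    """Run `coro` under a wall-clock limit and an optional idle limit."""
    if idle_timeout is None:
        # Idle detection disabled: only the wall-clock timeout applies.
        return await asyncio.wait_for(coro, timeout=timeout)

    task = asyncio.ensure_future(coro)
    deadline = time.monotonic() + timeout
    try:
        while True:
            # Wake up every `poll` seconds to check both limits.
            done, _ = await asyncio.wait({task}, timeout=poll)
            if done:
                return task.result()
            now = time.monotonic()
            if now - activity.last > idle_timeout:
                raise IdleTimeout(f"Agent idle for {idle_timeout}s")
            if now >= deadline:
                raise asyncio.TimeoutError()
    finally:
        # Cancel the prompt if we are leaving because of a timeout.
        if not task.done():
            task.cancel()
```

In this sketch, a solver that runs quietly for longer than `idle_timeout` is cancelled even when the wall-clock budget has plenty of time left, which is the situation described above. Passing `idle_timeout=None` leaves only the wall-clock limit in force.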