Describe the bug
In Copilot CLI non-interactive/autopilot sessions, the init system prompt labels shell commands as "Available tools":
<environment_context>
...
* Available tools: git, curl, gh
</environment_context>
In the observed session, curl was available as a shell command, but not as a callable Copilot tool/function. The model treated it as a callable tool and emitted:
tool= curl args= {'url': 'https://app.notion.com/p/...'}
The runtime then returned:
Tool 'curl' does not exist.
This looks like prompt-surface ambiguity rather than purely a model mistake: the system prompt uses the word "tools" for shell commands, while callable tools are also exposed to the model as "tools".
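To make the ambiguity concrete, here is a minimal sketch (tool names and call shapes are assumptions, not the actual Copilot CLI internals): "curl" is a shell command, so it must be passed as an argument to the shell tool rather than used as a tool name itself.

```python
# Hypothetical illustration: only registered tool names are callable.
# The set below is an assumption for the sketch, not the real tool registry.
REGISTERED_TOOLS = {"shell", "task", "read_agent"}

# What the model emitted (invalid): curl used as a callable tool name
bad_call = {"tool": "curl", "arguments": {"url": "https://example.com"}}

# What a runtime like this would accept: curl run via the shell tool
good_call = {"tool": "shell", "arguments": {"command": "curl https://example.com"}}

def is_valid(call):
    """A call is valid only if its tool name is actually registered."""
    return call["tool"] in REGISTERED_TOOLS

print(is_valid(bad_call))   # False -> "Tool 'curl' does not exist."
print(is_valid(good_call))  # True
```

The rename suggested below removes exactly this trap: if the prompt never calls shell commands "tools", the model has no reason to emit `bad_call`.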
Related schema confusion observed in the same testing work
The same ambiguity showed up around other tool workflows:
- Shell/session tools were called without the required prior shell state:
Multiple validation errors:
- "shellId": Required
- "delay": Required
- Sub-agent/task invocation was attempted with an incomplete schema:
{name: "task", arguments: {agent_type: "translation-validator", description: "Validate changelog files", prompt: "...", mode: "background"}}
Runtime response:
The model then switched to background agent + read_agent, which worked but added extra turns and runtime cost.
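The validation errors above can be mirrored by a simple required-field check. This is a hypothetical sketch of what the runtime's schema validation appears to do for shell-output-style tools (the field names `shellId` and `delay` are taken from the observed error messages; everything else is assumed):

```python
def validate_shell_output_call(args):
    """Sketch of a required-field check matching the observed error format."""
    errors = []
    for field in ("shellId", "delay"):
        if field not in args:
            errors.append(f'"{field}": Required')
    return errors

# Calling the tool with no prior shell state reproduces both errors:
print(validate_shell_output_call({}))
# A call that carries an existing shell's id and a delay passes:
print(validate_shell_output_call({"shellId": "s1", "delay": 2}))
```

A pre-flight check like this on the client side would catch the malformed call before it costs a model round trip.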
Why this matters
Autopilot mode absorbs these validation errors and keeps going, but each invalid tool call costs extra model requests, tokens, and wall time. This is especially visible with smaller/local models that are more sensitive to tool-surface ambiguity.
In our prompt optimization test, making the prompt explicitly avoid the ambiguous paths removed the validation errors across 10/10 successful runs:
- Require direct file write, not shell input/bash write
- Require translation-validator with name=translation-validator
- Require synchronous validator, not background/read_agent
- Require validator to read only output files, not Notion/web/curl
- Use events.jsonl session.task_complete as the completion signal, not stdout
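The last constraint (completion via events.jsonl, not stdout) can be checked with a few lines. This is a sketch under assumptions: the event file is JSON Lines, and the completion event carries a `type` field equal to `session.task_complete`; the exact schema may differ.

```python
import json
import os
import tempfile

def task_completed(events_path):
    """Return True once a session.task_complete event appears in events.jsonl."""
    with open(events_path) as f:
        for line in f:
            line = line.strip()
            if not line:
                continue
            if json.loads(line).get("type") == "session.task_complete":
                return True
    return False

# Demo with a synthetic events.jsonl (event shapes are assumptions)
path = os.path.join(tempfile.mkdtemp(), "events.jsonl")
with open(path, "w") as f:
    f.write(json.dumps({"type": "session.log", "msg": "working"}) + "\n")
    f.write(json.dumps({"type": "session.task_complete"}) + "\n")

print(task_completed(path))  # True
```

Keying completion off a structured event rather than stdout parsing is what made the 10/10 runs deterministic in our test.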
Affected version
Observed in Copilot CLI session metadata:
copilotVersion: 1.0.39
model: Qwen3.6-35B-A3B-bf16
mode: --autopilot, non-interactive
Expected behavior
The init system prompt should clearly distinguish callable tools from shell commands. For example:
* Available shell commands: git, curl, gh
instead of:
* Available tools: git, curl, gh
It would also help if the tool guidance made preconditions and required fields harder to miss, especially for:
- shell/session output tools that require shellId
- task / sub-agent tools that require name
- background agent flows that require read_agent
Suggested fixes
- Rename Available tools: git, curl, gh to Available shell commands: git, curl, gh in the system prompt.
- Explicitly state that shell commands must be run through the bash/shell tool and are not callable tool names.
- Add stronger schema guidance for task / sub-agent invocation, including the required name field and when not to use background/read_agent.
- Add stronger precondition wording for shell output/input tools that require an existing shellId.
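As a companion to stronger schema wording, a client-side pre-flight check could reject the exact task call observed earlier (missing `name`) before it reaches the model loop. A minimal sketch, assuming `name` is the only missing required field in that call:

```python
def validate_task_call(arguments):
    """Sketch: reject a task/sub-agent call that omits the required name field."""
    errors = []
    if "name" not in arguments:
        errors.append('"name": Required')
    return errors

# The invocation observed in testing (prompt elided), which lacked "name":
observed = {
    "agent_type": "translation-validator",
    "description": "Validate changelog files",
    "prompt": "...",
    "mode": "background",
}
print(validate_task_call(observed))  # ['"name": Required']
```

Surfacing the missing field locally would have saved the extra background-agent + read_agent turns the model fell back to.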
Additional context
This is related in impact to task completion/output reliability issues, but it is a distinct problem: tool-surface ambiguity in the init prompt causes invalid tool calls and unnecessary autopilot cost.