Describe the bug
In Copilot CLI non-interactive/autopilot sessions, the init system prompt labels shell commands as "Available tools":
<environment_context>
...
* Available tools: git, curl, gh
</environment_context>
In the observed session, curl was available as a shell command, but not as a callable Copilot tool/function. The model treated it as a callable tool and emitted:
tool= curl args= {'url': 'https://app.notion.com/p/...'}
The runtime then returned:
Tool 'curl' does not exist.
This looks like prompt-surface ambiguity rather than purely a model mistake: the system prompt uses the word "tools" for shell commands, while callable tools are also exposed to the model as "tools".
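To make the ambiguity concrete, here is a minimal sketch (tool names and call shapes are assumptions, not the actual Copilot CLI internals): "curl" is a shell command, so it must be passed as an argument to the shell tool rather than used as a tool name itself.

```python
# Hypothetical illustration: only registered tool names are callable.
# The set below is an assumption for the sketch, not the real tool registry.
REGISTERED_TOOLS = {"shell", "task", "read_agent"}

# What the model emitted (invalid): curl used as a callable tool name
bad_call = {"tool": "curl", "arguments": {"url": "https://example.com"}}

# What a runtime like this would accept: curl run via the shell tool
good_call = {"tool": "shell", "arguments": {"command": "curl https://example.com"}}

def is_valid(call):
    """A call is valid only if its tool name is actually registered."""
    return call["tool"] in REGISTERED_TOOLS

print(is_valid(bad_call))   # False -> "Tool 'curl' does not exist."
print(is_valid(good_call))  # True
```

The rename suggested below removes exactly this trap: if the prompt never calls shell commands "tools", the model has no reason to emit `bad_call`.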
Related schema confusion observed in the same testing work
The same ambiguity showed up around other tool workflows:
- Shell/session tools were called without the required prior shell state:
Multiple validation errors:
- "shellId": Required
- "delay": Required
- Sub-agent/task invocation was attempted with an incomplete schema:
{name: "task", arguments: {agent_type: "translation-validator", description: "Validate changelog files", prompt: "...", mode: "background"}}
Runtime response:
The model then switched to background agent + read_agent, which worked but added extra turns and runtime cost.
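The validation errors above can be mirrored by a simple required-field check. This is a hypothetical sketch of what the runtime's schema validation appears to do for shell-output-style tools (the field names `shellId` and `delay` are taken from the observed error messages; everything else is assumed):

```python
def validate_shell_output_call(args):
    """Sketch of a required-field check matching the observed error format."""
    errors = []
    for field in ("shellId", "delay"):
        if field not in args:
            errors.append(f'"{field}": Required')
    return errors

# Calling the tool with no prior shell state reproduces both errors:
print(validate_shell_output_call({}))
# A call that carries an existing shell's id and a delay passes:
print(validate_shell_output_call({"shellId": "s1", "delay": 2}))
```

A pre-flight check like this on the client side would catch the malformed call before it costs a model round trip.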
Why this matters
Autopilot mode absorbs these validation errors and keeps going, but each invalid tool call costs extra model requests, tokens, and wall time. This is especially visible with smaller/local models that are more sensitive to tool-surface ambiguity.
In our prompt optimization test, making the prompt explicitly avoid the ambiguous paths removed the validation errors across 10/10 successful runs:
- Require direct file write, not shell input/bash write
- Require translation-validator with name=translation-validator
- Require synchronous validator, not background/read_agent
- Require validator to read only output files, not Notion/web/curl
- Use events.jsonl session.task_complete as the completion signal, not stdout
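The last constraint (completion via events.jsonl, not stdout) can be checked with a few lines. This is a sketch under assumptions: the event file is JSON Lines, and the completion event carries a `type` field equal to `session.task_complete`; the exact schema may differ.

```python
import json
import os
import tempfile

def task_completed(events_path):
    """Return True once a session.task_complete event appears in events.jsonl."""
    with open(events_path) as f:
        for line in f:
            line = line.strip()
            if not line:
                continue
            if json.loads(line).get("type") == "session.task_complete":
                return True
    return False

# Demo with a synthetic events.jsonl (event shapes are assumptions)
path = os.path.join(tempfile.mkdtemp(), "events.jsonl")
with open(path, "w") as f:
    f.write(json.dumps({"type": "session.log", "msg": "working"}) + "\n")
    f.write(json.dumps({"type": "session.task_complete"}) + "\n")

print(task_completed(path))  # True
```

Keying completion off a structured event rather than stdout parsing is what made the 10/10 runs deterministic in our test.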
Affected version
Observed in Copilot CLI session metadata:
copilotVersion: 1.0.39
model: Qwen3.6-35B-A3B-bf16
mode: --autopilot, non-interactive
Expected behavior
The init system prompt should clearly distinguish callable tools from shell commands. For example:
* Available shell commands: git, curl, gh
instead of:
* Available tools: git, curl, gh
It would also help if the tool guidance made preconditions and required fields harder to miss, especially for:
- shell/session output tools that require shellId
- task / sub-agent tools that require name
- background agent flows that require read_agent
Suggested fixes
- Rename Available tools: git, curl, gh to Available shell commands: git, curl, gh in the system prompt.
- Explicitly state that shell commands must be run through the bash/shell tool and are not callable tool names.
- Add stronger schema guidance for task / sub-agent invocation, including the required name field and when not to use background/read_agent.
- Add stronger precondition wording for shell output/input tools that require an existing shellId.
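As a companion to stronger schema wording, a client-side pre-flight check could reject the exact task call observed earlier (missing `name`) before it reaches the model loop. A minimal sketch, assuming `name` is the only missing required field in that call:

```python
def validate_task_call(arguments):
    """Sketch: reject a task/sub-agent call that omits the required name field."""
    errors = []
    if "name" not in arguments:
        errors.append('"name": Required')
    return errors

# The invocation observed in testing (prompt elided), which lacked "name":
observed = {
    "agent_type": "translation-validator",
    "description": "Validate changelog files",
    "prompt": "...",
    "mode": "background",
}
print(validate_task_call(observed))  # ['"name": Required']
```

Surfacing the missing field locally would have saved the extra background-agent + read_agent turns the model fell back to.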
Additional context
This is related in impact to task completion/output reliability issues, but it is a distinct problem: tool-surface ambiguity in the init prompt causes invalid tool calls and unnecessary autopilot cost.