Fix azd tool install azure-skills treating the VS Code Copilot Chat launcher stub as a usable host#8805
Conversation
…entified as copilot cli
There was a problem hiding this comment.
Pull request overview
This PR hardens azd tool install azure-skills host selection by verifying that an agentic CLI host on PATH is actually functional (via a non-interactive --version probe) rather than merely present as a same-named launcher stub (notably the VS Code Copilot Chat extension stub). This makes “no supported host installed” scenarios fail consistently with clear install guidance across macOS/Linux/Windows and avoids triggering stub-driven interactive flows.
Changes:
- Extend
SkillHostmetadata to support a host-binary version probe (BinaryVersionArgs/BinaryVersionRegex) and configure it forcopilotandclaude. - Add
installer.hostUsable(ctx, host)and use it in skill target resolution and preferred-host selection to reject non-functional launcher stubs. - Add a regression test ensuring a
copilotlauncher stub onPATHresults in clean “install GitHub Copilot CLI” guidance and does not attempt plugin install/list actions through the stub.
Reviewed changes
Copilot reviewed 3 out of 3 changed files in this pull request and generated 1 comment.
| File | Description |
|---|---|
cli/azd/pkg/tool/manifest.go |
Adds version-probe metadata to SkillHost and wires it for copilot/claude Azure Skills hosts. |
cli/azd/pkg/tool/installer.go |
Implements and applies hostUsable functional probing in host selection/target resolution to avoid stub hosts. |
cli/azd/pkg/tool/installer_test.go |
Updates host-presence mocking for version probes and adds a launcher-stub regression test. |
jongio
left a comment
There was a problem hiding this comment.
Root cause is correct: exec.LookPath returns true for the VS Code Copilot Chat extension's copilot stub, which exits 0 but can't actually install skills. The functional-probe approach (run --version, match a real version string, pipe empty stdin to prevent hangs) is the right fix for this class of false-positive PATH matches.
A few notes:
hostUsablecorrectly falls back to an existence-only check whenBinaryVersionArgs/BinaryVersionRegexaren't configured, so hosts without a version probe aren't penalized.- Context error handling is good: treating a cancelled/timed-out probe as "assume usable" avoids rejecting a real host over a transient issue.
- The regression test (
TestRunSkill_Install_LauncherStubHost_ReturnsInstallGuidance) covers the exact scenario from the bug report with realistic stub output, and also asserts that no plugin commands are attempted through the stub. - Existing test updates (adding
!slices.Contains(args.Args, "--version")guards) are clean. ThemockHostPresencehelper now auto-registers version probes for present hosts, with a comment that later registrations win (confirmed by the mock runner's reverse iteration atmock_runner.go:43).
One observation for a possible follow-up: AvailableSkillHosts (installer.go:235) still uses the old ToolInPath check rather than hostUsable. If a user has both the VS Code stub and a real host (e.g. claude) on PATH, the host-selection prompt at tool.go:676 would list the stub as a valid option. Selecting it would then fail in resolveSkillTargets since that path now calls hostUsable. Not a functional bug (the install/upgrade flow is fully protected), but the prompt-then-reject UX is slightly inconsistent. Updating AvailableSkillHosts to also use hostUsable would fix this, though it requires adding ctx to the Installer interface method signature, so it's reasonable to defer.
jongio
left a comment
There was a problem hiding this comment.
Incremental review of d5f428f (1 new commit since my prior review against 3b163b1).
The second commit closes the gap I flagged: AvailableSkillHosts now calls hostUsable (functional version probe) instead of ToolInPath (existence-only check), so the interactive host picker won't offer a launcher stub that the install path would later reject. The Installer interface, implementation, Manager pass-through, both cmd/tool.go call sites, and all test mocks/call sites are updated consistently.
No new issues from the incremental change. The full PR remains solid: root cause is correctly identified, the functional-probe approach is principled, edge cases (context cancellation, missing version config) are handled, and the regression test covers the reported scenario with realistic stub output.
RickWinter
left a comment
There was a problem hiding this comment.
This resolves #8797 by gating skill-host selection on a functional --version probe instead of a bare exec.LookPath existence check. The root-cause analysis is correct, and routing both the interactive picker and the install path through a single hostUsable so they cannot disagree is the right shape. Reusing matchVersion and adding BinaryVersionArgs / BinaryVersionRegex next to the existing VersionRegex keeps this consistent with how detection already works in the package, and the existence-only fallback leaves behavior unchanged for hosts that configure no probe.
The one concern worth resolving before merge is how strict the probe actually is at line 734. The match succeeds on any version-shaped token anywhere in the host's combined output, so the fix sits one stray semver in a stub's banner away from regressing to the same verification failed path it removes. The repeated probe spawns across the four call sites and the last-match-wins test ordering are lower priority and can follow. I would want the first point either tightened or consciously accepted; nothing else here is blocking.
…na/fix-8797-azure-skills-no-host
|
Thanks for the thorough review, @RickWinter and @jongio. Addressed the follow-ups: Probe strictness —
Added Repeated probe spawns — added an installer-field Self-defending stub test — the launcher-stub test now records and asserts the stub
|
Azure Dev CLI Install InstructionsInstall scriptsMacOS/Linux
bash: pwsh: WindowsPowerShell install MSI install Standalone Binary
MSI
Documentationlearn.microsoft.com documentationtitle: Azure Developer CLI reference
|
jongio
left a comment
There was a problem hiding this comment.
Incremental review of c6e3cc8 (addresses RickWinter's feedback).
The memoization in hostUsable stores only on-PATH probe results correctly: the ToolInPath early-return at the top skips the cache entirely, so a host installed earlier in the same batch (e.g. azd tool install github-copilot-cli azure-skills) is re-probed when its PATH status changes. The lock-probe-lock-store pattern allows benign duplicate probes under concurrency rather than holding the mutex across the subprocess spawn, which is the right tradeoff here.
The \n separator between stdout and stderr in probeOnPathHost ensures (?m)^ anchors work when the version banner appears at the start of stderr output. Regex anchoring to each host's specific banner format closes the incidental-semver gap that was identified.
New tests (TestHostUsable_OnPathProbeMemoizedAcrossPasses, TestHostUsable_NotOnPathResultNotMemoized, TestRunSkill_Install_StubWithIncidentalVersion_Rejected, TestAzureSkillsHostVersionProbeRegex) lock these behaviors with good coverage of the edge cases.
Fixes #8797.
Problem
In the "no host installed" scenario,
azd tool install azure-skillsmisbehaved on macOS/Linux:copilot: Azure Skills was installed via copilot but verification failed.s~/.profile` PATH prompt.Windows already failed cleanly with
no supported agentic CLI host found on PATH.Root cause
azd treated "a file named
copilotexists on PATH" (exec.LookPath) as "a working Copilot CLI host." In VS Code / Codespaces the GitHub Copilot Chat extension drops a smallcopilotlauncher stub at~/.vscode-server/.../github.copilot-chat/copilotCli/copilotand adds it to the integrated-terminal PATH. That stub is not the real CLI — invoking any subcommand (including--version) printsInstall GitHub Copilot CLI? [y/N]and exits 0 without doing anything. azd "installed" the skill through it (a no-op), then verification found nothing → the misleading error. On Windows the stub is not on PATH, so it already failed cleanly.Fix
Gate skill-host selection on a functional probe, not mere existence:
SkillHost.BinaryVersionArgs/BinaryVersionRegex(--version/(\d+\.\d+\.\d+)for copilot and claude).installer.hostUsable(ctx, host)— a host counts only if a non-interactive--versionprobe returns a real version. The probe runs with an empty stdin, so a stub that prompts gets EOF and cannot hang the flow. Wired intopickSkillHost(now takesctx) and bothresolveSkillTargetsloops.A non-functional launcher stub now fails the probe → azd returns the same clean "install GitHub Copilot CLI" guidance on all platforms, and never triggers the launcher`s interactive self-install (which is what hung on Linux).
Verified against real output:
GitHub Copilot CLI 1.0.64-3→ matches1.0.64;2.1.178 (Claude Code)→ matches2.1.178; the stub`s prompt text → no match (rejected).Testing
TestRunSkill_Install_LauncherStubHost_ReturnsInstallGuidance(asserts clean guidance, not "verification failed", and that no install is attempted through the stub).go build ./...,go vet ./...,golangci-lint run ./pkg/tool/...(0 issues),cspell,gofmtall clean.go test ./pkg/tool/... ./cmd ./internal/agentpass.copilotstub on PATH,azd tool install azure-skillsreturns the clean guidance instead of "verification failed".No CHANGELOG entry:
azure-skillsis an unreleased feature (added by #8704 in the same1.26.0-beta.1cycle).