Skip to content

Add live golden-path (Tier 2) pipeline for azd ai agent extension#8758

Open
v1212 wants to merge 29 commits into
mainfrom
wujia/ai-agent-live-tests
Open

Add live golden-path (Tier 2) pipeline for azd ai agent extension#8758
v1212 wants to merge 29 commits into
mainfrom
wujia/ai-agent-live-tests

Conversation

@v1212

@v1212 v1212 commented Jun 22, 2026

Copy link
Copy Markdown
Collaborator

Fixes #8759

Summary

  • Adds eng/pipelines/ext-azure-ai-agents-live.yml: an on-demand / weekly Azure DevOps pipeline that runs the Tier 2 live golden path (init → provision → deploy → invoke → down) for the azure.ai.agents extension against a real Azure (TME) subscription, for both code and container deploy modes.
  • Adds the Tier 2 live driver as a Go test (tier2_live_test.go, console_test.go, assert.go) under cli/azd/extensions/azure.ai.agents/tests/e2e-live/, driving azd ai agent through a pseudo-terminal with go-expect + vt10x + creack/pty (replacing the original tmux/Python prototype from test: add static E2E tests for azure.ai.agents extension #8692), with CI auth detection for Azure DevOps (TF_BUILD / E2E_USE_AZ_CLI_AUTH).
  • Adds a README.md documenting how to run (CI + local) and the one-time ADO setup.

Together with the PR-gate tests in #8754 (Tier 0 offline + Tier 1 recording/playback), this covers Tier 0/1/2 from the original prototype (#8692).

At a glance

  1. What it tests — the full live golden path init → provision → deploy → invoke → down of azd ai agent, driving a real CLI over a pseudo-terminal (Go go-expect + vt10x + creack/pty), for both code (zip) and container (ACR build) deploy modes, run sequentially. The interactive init phase is an event-driven prompt loop (wait for the survey UI to settle, read the rendered screen, dispatch on the verbatim prompt strings) rather than a fixed script, so it tolerates the live model/region/project branches. The invoke step asserts a live model answers "what is 2+2?" with a standalone 4 / four.

  2. How to triggertrigger: none / pr: none, so it never runs on PR push. Run it via: a PR comment /azp run ext-azure-ai-agents-live (requires repo write), the weekly Mon 07:00 UTC schedule, or a manual ADO queue (pick deployModes = both / code / container).

  3. Identity / subscription / what it does — authenticates as the azure-sdk-tests ADO service connection (Workload Identity Federation / OIDC) into the shared TME test subscription, region eastus2. It creates a fresh Foundry project + gpt-4o-mini deployment (+ an ACR in container mode) and the agent service, exercises them, then tears them down. az logs in via OIDC and azd reuses that session (auth.useAzCliAuth + keepAzSessionActive: true, so the WIF token survives the multi-minute run).

  4. Leftover resources?No, by design. Each mode registers a t.Cleanup that runs azd down --force --purge, and a teardown failure fails the run (never green while leaking). A condition: always() AzureCLI@2 step is a second safety net: it force-purges any project dirs left under /tmp/e2e-tests/tier2-*/ after a crash or timeout. (Crash-path cleanup is best-effort, so the TME subscription is worth an occasional spot-check.)

  5. Runs on Azure DevOps, not GitHub Actions — the YAML lives in this GitHub repo (eng/pipelines/…) but is an ADO pipeline (extends: 1es-redirect, task: AzureCLI@2, $(Build.*), ##vso[...]), executed by the azure-sdk ADO project. GitHub CI for this PR stays green/inert; the live run only happens in ADO.

  6. Post-merge admin action — needs an Azure DevOps admin on the azure-sdk project (Azure SDK EngSys)not a GitHub admin — to: (a) register the pipeline as ext-azure-ai-agents-live (the exact name /azp run uses) against this repo + YAML path; (b) confirm the azure-sdk-tests service connection maps to the TME subscription with RBAC to create Foundry projects and deploy models (Contributor + Azure AI Developer + Cognitive Services Contributor, or equivalent); (c) kick the first /azp run to validate the keepAzSessionActive auth path. Merging the PR itself only needs a GitHub maintainer approval. The init GitHub token is already wired via the ambient azuresdk-github-pat org secret, so no extra secret setup is needed.

Testing

  • The golden-path flow was validated end-to-end in the test: add static E2E tests for azure.ai.agents extension #8692 prototype fork run (code ~669s, container ~711s). The driver has since been rewritten from tmux/Python to the Go PTY harness in this PR; the live run of the Go version is exercised post-merge via /azp run (the test is gated behind AZURE_AI_AGENTS_E2E_LIVE=1 and never runs in PR CI).
  • Local validation of the Go harness: gofmt, go vet, golangci-lint (ubuntu + windows), and a go test -c compile all clean on the linux build tag; the no-tag assert.go answer-matching logic is covered by assert_test.go (runs on the host); README.md passes cspell.
  • Driver design choices hardened across review: an event-driven prompt loop with output-settle detection (replacing a fixed poll) so live conditional prompts don't desync; loop detection that advances past or fails out of a stuck prompt; t.Context() as the run-context parent so go test -timeout cancellation propagates; LIFO t.Cleanup ordering so teardown runs before the copied AZD_CONFIG_DIR is removed; a destructive-RemoveAll guardrail on the test dir; and a robust standalone-token check for the invoke result (accepts 4 or four, ignores incidental 4s such as gpt-4o-mini / 4.1 / 404).
  • Timeout hierarchy: per-mode runTimeout (60 min, a context deadline) > the sum of the per-phase budgets (init/provision/deploy/invoke ≈ 32 min) so a slow-but-healthy run is never preempted mid-phase; both modes fit under go test -timeout 125m, and the ADO AzureCLI@2 step (130 min) and job (150 min) caps sit strictly above that, so a run can never be hard-killed mid-azd down (which would leak live resources).
  • The ADO pipeline itself only runs once registered in ADO (inert in GitHub CI by design).

Adds eng/pipelines/ext-azure-ai-agents-live.yml, an on-demand/weekly Azure DevOps pipeline that drives the real 'azd ai agent' CLI through tmux against live Azure (TME), exercising init -> provision -> deploy -> invoke -> down for both code and container deploy modes.

This is the live counterpart to the PR-gate checks (Tier 0 offline + Tier 1 recording/playback in #8754). Per Azure SDK EngSys / SFI guidance, live access stays out of the automatic PR pipeline (trigger: none) and runs only via '/azp run ext-azure-ai-agents-live' or the weekly schedule.

The Tier 2 tmux driver (test_full_e2e.py, test_tier2.py) is migrated from the #8692 prototype; CI auth detection is extended to recognize Azure DevOps (TF_BUILD) and an explicit E2E_USE_AZ_CLI_AUTH override.
@v1212 v1212 added ext-agents azure.ai.agents extension ext-foundry azure.ai.{agents,connections,inspector,projects,routines,skills,toolboxes}, microsoft.foundry labels Jun 22, 2026
Jian Wu added 2 commits June 22, 2026 19:54
…zSessionActive

The azure-sdk-tests service connection uses Workload Identity Federation, whose
az session is isolated to the task's private AZURE_CONFIG_DIR and expires after
~10 min. Running the ~50 min golden-path test (and the cleanup) as plain bash
steps after a separate login step would fail auth on both counts. Run them
inside AzureCLI@2 with keepAzSessionActive:true (matching build-cli.yml) so the
session stays refreshed and reaches azd (auth.useAzCliAuth) through tmux, which
inherits AZURE_CONFIG_DIR. Subscription/tenant are now read in-script via
az account show instead of cross-step pipeline variables.
test_tier2.py always ran sequentially, but kept a tautological if-condition
(len==1 or len>1), an unused concurrent.futures import, a no-op --serial flag,
and a docstring/print claiming parallel execution. Simplify to an explicit
sequential loop and update the docstring to match. Also fix test_full_e2e.py's
module docstring to point at README.md (LOCAL-TEST-GUIDE.md does not exist).
@v1212 v1212 marked this pull request as ready for review June 22, 2026 12:08
Copilot AI review requested due to automatic review settings June 22, 2026 12:08

Copilot AI left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR adds a dedicated Azure DevOps pipeline and supporting Python drivers/docs to run the Tier 2 live golden-path E2E for the azure.ai.agents azd extension (init → provision → deploy → invoke → down) against real Azure resources, outside of the GitHub PR gate.

Changes:

  • Added an on-demand + weekly ADO pipeline (ext-azure-ai-agents-live) to run Tier 2 live E2E for both code and container deploy modes.
  • Added a tmux-driven Python Tier 2 runner (test_tier2.py) and a full golden-path driver (test_full_e2e.py) adapted for ADO CI auth detection.
  • Added documentation for running the live Tier 2 tests locally and in CI.

Reviewed changes

Copilot reviewed 4 out of 4 changed files in this pull request and generated 6 comments.

File Description
eng/pipelines/ext-azure-ai-agents-live.yml New ADO pipeline wiring to build azd + extension, run Tier 2 live E2E under AzureCLI@2, and publish logs/cleanup.
cli/azd/extensions/azure.ai.agents/tests/e2e-live/test_tier2.py Tier 2 orchestrator: runs code + container golden paths sequentially with isolation and timeout handling.
cli/azd/extensions/azure.ai.agents/tests/e2e-live/test_full_e2e.py tmux-driven end-to-end golden path driver (init/provision/deploy/invoke/down) with CI/local auth switching.
cli/azd/extensions/azure.ai.agents/tests/e2e-live/README.md Docs for CI setup (ADO registration/service connection/secrets) and local WSL execution.

Comment thread eng/pipelines/ext-azure-ai-agents-live.yml Outdated
Comment thread eng/pipelines/ext-azure-ai-agents-live.yml Outdated
Comment thread eng/pipelines/ext-azure-ai-agents-live.yml Outdated
Comment thread cli/azd/extensions/azure.ai.agents/tests/e2e-live/test_full_e2e.py Outdated
Comment thread cli/azd/extensions/azure.ai.agents/tests/e2e-live/test_tier2.py Outdated
Comment thread cli/azd/extensions/azure.ai.agents/tests/e2e-live/test_full_e2e.py Outdated
- Use the ambient azure-sdk org secret `azuresdk-github-pat` for GH_TOKEN
  instead of an empty `GitHubPat` placeholder variable (mirrors eval-waza.yml);
  removes a misleading masked variable and the need for admin PAT setup.
- Harden the AzureCLI@2 inline script: `set -euo pipefail` and assign-then-verify
  subscription/tenant so an `az account show` failure fails fast (a plain
  `export X=$(...)` would have masked the error from set -e).
- Reword the extension-install comment to be self-contained (it no longer
  inaccurately claims to mirror lint-ext-azure-ai-agents.yml).
- Clarify the test_full_e2e.py auth prerequisite: only local WSL runs leave
  auth.useAzCliAuth unset; CI auto-enables az CLI auth.
- Clear tmux scrollback after env setup so the exported GH token cannot leak
  into capture() output on failures/timeouts.
- _cleanup_leaked_resources now checks azd down's return code and reports
  failures instead of always printing "Cleanup complete".

Copilot AI left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 5 out of 5 changed files in this pull request and generated 5 comments.

Comment thread cli/azd/extensions/azure.ai.agents/tests/e2e-live/test_tier2.py Outdated
Comment thread cli/azd/extensions/azure.ai.agents/tests/e2e-live/test_full_e2e.py Outdated
Comment thread cli/azd/extensions/azure.ai.agents/tests/e2e-live/test_tier2.py Outdated
Comment thread cli/azd/extensions/azure.ai.agents/tests/e2e-live/test_tier2.py Outdated
Comment thread eng/pipelines/ext-azure-ai-agents-live.yml Outdated
- Stream child E2E output live with a watchdog-enforced hard timeout
  instead of buffering everything via capture_output
- Shell-escape the GitHub token (shlex.quote) before exporting in tmux
- Clean up the per-mode AZD_CONFIG_DIR temp copy unless E2E_KEEP_ARTIFACTS
- Use sha256 instead of md5 for the agent-name uniqueness suffix
- Derive the agent binary arch from uname -m instead of hard-coding amd64

Copilot AI left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 5 out of 5 changed files in this pull request and generated 4 comments.

Comment thread cli/azd/extensions/azure.ai.agents/tests/e2e-live/test_full_e2e.py Outdated
Comment thread cli/azd/extensions/azure.ai.agents/tests/e2e-live/test_full_e2e.py Outdated
Comment thread cli/azd/extensions/azure.ai.agents/tests/e2e-live/test_full_e2e.py Outdated
Comment thread cli/azd/extensions/azure.ai.agents/tests/e2e-live/test_tier2.py Outdated
- Shell-escape HOME/PATH/TENANT, the cd target, and the agent name with
  shlex.quote() (consistent with the earlier token fix)
- On Tier 2 timeout, kill the child's detached tmux server so reused CI
  agents do not accumulate orphaned tmux sockets
@v1212 v1212 requested a review from Copilot June 22, 2026 13:18

@jongio jongio left a comment

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The Go rewrite (commits c62a696 and 4d27988) directly addresses @vhvb1989's direction request. The interactive driver now uses go-expect + vt10x + creack/pty with event-driven synchronization (waitForQuiet) instead of fixed-interval polling. This is solid work.

What looks good:

  • Event-driven sync: promptQuiet/listSettle waits for the UI to stop emitting before dispatching, rather than sleeping a fixed interval. Correct approach for flake-free interactive tests.
  • Timeout hierarchy: per-mode runTimeout (60m) < go test -timeout (125m) < ADO step (130m) < job (150m). Each layer has room for graceful teardown before the next one kills it.
  • Cleanup safety: assertSafeTestDir blocks dangerous os.RemoveAll on protected paths; t.Cleanup(r.teardown) guarantees azd down --force --purge runs even on failure; the condition: always() pipeline step is a crash-path backstop.
  • Platform-portable assert logic: assert.go builds on all platforms (no build tag), with the regex-free standalone-"4" detection explained clearly.
  • Pipeline arch detection: uname -m mapping to GOARCH means the step won't break if the pool moves off amd64.

One observation for the conversation with @vhvb1989:

The latest commits already deliver what was requested. The old Python/tmux files are gone, the Go driver is in place, and it uses exactly the libraries and approach described in the direction-change comment. The driver is local to the extension's tests/e2e-live/ for now (as suggested), ready to lift into the shared module when #8795 lands.

Minor notes (non-blocking):

  1. tier2_live_test.go at 859 lines is large for a single file, but splitting a sequential test flow (init/provision/deploy/invoke/down + all helpers) across multiple files would hurt readability. Acceptable.
  2. The dispatchPrompt source-line annotations (e.g., init.go:722) are useful for maintenance but will become stale as the extension evolves. The README already documents this trade-off ("changes there can require updating dispatchPrompt"). Consider a lint or generator if drift becomes a maintenance burden.
  3. findServiceName uses line-by-line string matching to parse azure.yaml. This is pragmatic for the known scaffolded output format and avoids pulling in a YAML library for one test helper. Fine as-is.

@v1212

v1212 commented Jun 24, 2026

Copy link
Copy Markdown
Collaborator Author

Thanks @vhvb1989 — done. Ported the interactive init driver to Go (go-expect + vt10x + creack/pty) and dropped the tmux/Python files; the --no-prompt phases stay as os/exec calls.

Per "one thing to validate" note: I did a source-grounded spike and calibrated the prompt dispatch to the real azd ai agent init strings (event-driven sync; deploy mode via --deploy-mode). The one thing I can't check locally — vt10x rendering the live wizard — gets exercised on the first ADO /azp run; I'll validate/debug from there.

@v1212

v1212 commented Jun 24, 2026

Copy link
Copy Markdown
Collaborator Author

Thanks @jongio. The driver changed — the Go rewrite removed the original Python/tmux code, so those first-review points are gone. The three notes on the new code are acknowledged trade-offs (the dispatchPrompt drift one is documented in the README); I'll revisit a generator if that drift becomes painful. No code changes needed — ready for another look.

@jongio jongio left a comment

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My prior analysis (two rounds against this same HEAD) still holds. The Go rewrite using go-expect + vt10x + creack/pty is solid: event-driven synchronization via waitForQuiet, clean timeout hierarchy, three-layer resource cleanup, and well-tested assert logic.

One code modernization note below. Also: the PR currently has merge conflicts that need resolving before merge.

No blocking issues from my side.

Comment thread cli/azd/extensions/azure.ai.agents/tests/e2e-live/tier2_live_test.go Outdated

@jongio jongio left a comment

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The Go rewrite from tmux+Python to a PTY-based Go driver is a significant improvement: same toolchain as the rest of the extension, no Python/tmux dependency on the agent, and the event-driven dispatch loop (wait for survey UI to settle, then match the rendered prompt) is more robust than fixed polling.

A few observations on the final state:

  1. findServiceName naive YAML walk (tier2_live_test.go): The line-by-line parser assumes a specific indentation structure. Since azd ai agent init always scaffolds the same format this is fine today, but if the template output changes it will silently break. Consider a targeted grep -m1 approach or importing a minimal YAML decoder if the function gets reused.

  2. Timeout cascading is well-designed: per-mode 60m > sum-of-phases 50m (10m teardown margin), step 130m > go-test 125m, job 150m > step. This ensures graceful teardown always runs before hard kills, preventing resource leaks. The commit that adjusted these (db53b06) explains the invariant clearly.

  3. responseHasExpectedAnswer rejects "4.0" by design (the 4 followed by . then digit rule). Worth a brief code comment noting this is intentional, since a future maintainer might wonder why a valid math answer is rejected. The existing unit tests cover this implicitly via the "version" case ("4.1") but an explicit "4.0" case in the table would make the design intent obvious.

  4. Dependency versions (go-expect, creack/pty, vt10x) are the same 2022 pseudo-versions that AlecAivazis/survey uses in its own test suite. They are proven to work well together for this exact use case. No action needed, but worth noting in case a future dep audit flags them as stale.

Jian Wu added 2 commits June 25, 2026 14:20
Per AGENTS.md Go 1.26 guidance, switch exitCode() from errors.As to errors.AsType[*exec.ExitError]. Add an explicit "4.0" case to the assert table and document why a decimal answer is deliberately rejected.
# Conflicts:
#	cli/azd/extensions/azure.ai.agents/cspell.yaml

@jongio jongio left a comment

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

One minor nit on the safety-net cleanup step: the 2>/dev/null on line 213 swallows all diagnostic output from azd down. If a crash-path cleanup fails, the pipeline log won't show why. Since the task already sets continueOnError: true (so a non-zero exit can't fail the build), consider 2>&1 || true instead to keep diagnostic output visible for post-mortem investigation while still preventing the set -e from aborting the loop.

Not blocking, the current pattern is defensible for a quiet safety net.

Comment thread eng/pipelines/ext-azure-ai-agents-live.yml Outdated
Use 2>&1 instead of 2>/dev/null in the always() cleanup loop so a failed force-purge is visible for debugging a resource leak; keep || true so set -e doesn't abort the loop.

@jongio jongio left a comment

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The Go rewrite of the Tier 2 driver is well-structured: event-driven prompt dispatch, proper timeout hierarchy (per-mode 60m < go test 125m < step 130m < job 150m), and LIFO cleanup ordering so teardown always runs before the config dir it depends on is removed.

One minor observation below on findServiceName; everything else looks solid after 12+ rounds of iteration.

Comment thread cli/azd/extensions/azure.ai.agents/tests/e2e-live/tier2_live_test.go Outdated

@jongio jongio left a comment

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The crash-path cleanup now surfaces stderr via 2>&1 instead of swallowing it, making failed force-purge attempts visible in the pipeline log. The || true guard still prevents the loop from stopping on individual azd down failures.

findServiceName (tier2_live_test.go:649) parses azure.yaml by string matching rather than a YAML library. This works for the predictable output of azd ai agent init, but could silently break on edge-case YAML (a comment containing services:, or a multi-line string value). Low risk since the generated files are predictable, but worth noting if the test starts failing on valid projects.

Comment thread cli/azd/extensions/azure.ai.agents/tests/e2e-live/tier2_live_test.go Outdated
Replace the indentation/colon line scan with a minimal struct unmarshal (gopkg.in/yaml.v3, already a direct dep) so the service-name lookup tolerates comments and indentation changes in scaffolded azure.yaml.

@jongio jongio left a comment

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

One suggestion on context propagation in the test runner.

Comment thread cli/azd/extensions/azure.ai.agents/tests/e2e-live/tier2_live_test.go Outdated

@jongio jongio left a comment

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

One suggestion on context propagation in the test runner.

Comment thread cli/azd/extensions/azure.ai.agents/tests/e2e-live/tier2_live_test.go Outdated

@jongio jongio left a comment

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewed the full diff (8 files, +1550/-1). My prior feedback from yesterday was addressed in the latest commits (errors.AsType, stderr visibility in cleanup, struct-based YAML parsing, t.Context() propagation).

Fresh pass on the current state: the test architecture is well-designed. The event-driven prompt loop (go-expect waitForQuiet instead of fixed-interval polling), the bounded ring buffer for diagnostics, the LIFO cleanup ordering, and the tiered timeout hierarchy (per-phase < per-mode < go-test < ADO step < ADO job) are all solid engineering.

One non-blocking note for future maintenance: the dispatchPrompt cases are tightly coupled to the extension's prompt strings (each case annotated with source file:line). The README documents this, but consider adding a CI step or a grep-based check that flags when prompt-emitting lines in the extension source change without a corresponding update here. That would catch desync earlier than a full Tier 2 run.

@jongio jongio left a comment

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Checked the Go PTY test driver, answer matcher logic, console helpers, pipeline YAML, and README.

Prior feedback (errors.AsType, t.Context(), yaml struct parsing, stderr redirect) is all addressed. CI is green across all checks.

The timeout hierarchy, LIFO cleanup ordering, assertSafeTestDir guard, and event-driven prompt loop with loop detection are correctly implemented. The standalone-4 checker handles edge cases well (model names, versions, status codes). Dependencies were already in go.sum as indirect deps; the go.mod change just promotes them to direct.

One non-blocking suggestion on the dispatchPrompt default case.

@jongio jongio left a comment

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewed all 8 files (1554 additions). My earlier comments (errors.AsType, t.Context(), yaml.Unmarshal for service name parsing, unhandled-prompt logging) are all addressed in the current revision.

Checked specifically:

  • Timeout hierarchy math: per-mode 60 min runTimeout x2 + teardown fits inside the go test -timeout 125m cap, and the ADO step (130 min) and job (150 min) caps sit above that. The condition: always() cleanup step provides a second safety net if the process is force-killed mid-teardown.
  • Pipeline security: no hardcoded secrets, proper WIF auth via service connection, the GH_TOKEN comes from an ambient org secret. The set -euo pipefail + assign-then-verify pattern for subscription/tenant avoids the export X=$(cmd) pitfall.
  • PTY wiring: Setsid: true, Setctty: true with stdin as the pty slave correctly gives the child a controlling terminal. The event-driven waitForQuiet approach with the loop detector (repeat >= 3/5) handles both slow CI and stuck prompts without hanging.
  • Cleanup ordering: LIFO t.Cleanup ensures azd down runs before the copied AZD_CONFIG_DIR is deleted. The context.Background() in teardown is correct (must survive test-context cancellation).
  • assertSafeTestDir guard: protects against destructive RemoveAll on protected paths.

No blocking issues. The test-only dependencies (go-expect, vt10x, creack/pty) are at the same versions used by the survey/expect ecosystem and are already present in go.sum as indirect deps (no go.sum diff needed).

@jongio jongio left a comment

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Prior feedback addressed: errors.AsType, t.Context(), yaml.Unmarshal for service name parsing, default-case logging for unhandled prompts.

One maintainability note on the prompt dispatch (inline).

Comment thread cli/azd/extensions/azure.ai.agents/tests/e2e-live/tier2_live_test.go Outdated

@jongio jongio left a comment

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

One minor nit on the location dispatch case for consistency with the other tightened prompt matches.

Merging main shifted prompt_service.go, leaving the cited line numbers
(143/190/226) pointing at the wrong lines. Drop the brittle file:line
reference and rely on the function name plus the quoted picker messages,
which the case clauses already match exactly.

@jongio jongio left a comment

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Previous feedback (errors.AsType, t.Context(), yaml.Unmarshal for service name parsing, tightening prompt dispatch matches, logging default-case fall-throughs) all addressed in recent commits. Code quality is solid.

One observation: the PR description mentions a condition: always() AzureCLI@2 step as a "second safety net" for crash-path cleanup, but the pipeline YAML only has PublishPipelineArtifact@1 with condition: always(). The Go .Cleanup(r.teardown) approach is sufficient given the timeout hierarchy (job 150m > step 130m > go test 125m > per-mode 60m), so this isn't a gap in practice, but the description is slightly overstated. Consider updating the PR body to reflect the actual cleanup mechanism (t.Cleanup + LIFO ordering) or adding the explicit cleanup step if you want the belt-and-suspenders guarantee for cases where the Go process is killed before t.Cleanup runs.

No blocking issues. The event-driven prompt loop design, the timeout hierarchy, and the �ssertSafeTestDir guard are all well thought out.

ab890b1 tightened the subscription case to has("select an azure
subscription"), but that string is the extension's fmt.Println preamble,
not the survey "?" line activePrompt reads. ensureSubscription passes an
empty PromptSubscriptionRequest, so the picker renders azd-core's default
message "Select subscription" (prompt_service.go:507). The tightened match
missed it, sending the prompt to the default Enter and selecting the wrong
subscription in a live run. Match "select subscription" instead.

@jongio jongio left a comment

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Incremental commit fixes the subscription dispatch to match azd-core's actual picker message (Select subscription from prompt_service.go:507) rather than the extension's preamble text. Verified: activePrompt lowercases before dispatch, so the case-sensitive has() correctly matches.

Minor note for future awareness: select subscription (singular) is also a prefix/substring of select subscriptions to include in the filter (config_sub_filter.go:188). This won't cause false matches since that multi-subscription filter prompt doesn't appear in the agent init flow, but worth keeping in mind if future init steps add subscription filtering.

@RickWinter RickWinter left a comment

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This adds a Tier 2 live golden-path test (init/provision/deploy/invoke/down) for the azure.ai.agents extension, driven through a real PTY with go-expect + vt10x, plus an on-demand/weekly Azure DevOps pipeline that runs it against the TME subscription. The shape is right: it is gated behind AZURE_AI_AGENTS_E2E_LIVE, kept out of the PR pipeline, runs the two deploy modes sequentially, and registers an azd down teardown with a condition: always() safety net so a crash does not leak billable resources. go.sum already carries the three new direct dependencies, so the extension module still builds, and the answer matcher is well-factored and covered on its own by assert_test.go.

One thing is worth tightening before a green run is trusted as signal: the invoke assertion matches a standalone "4" against the entire combined CLI output rather than the agent's actual response, so incidental azd chatter (an elapsed-seconds token, a count, a step number) could turn the check green while the agent is broken. That is the difference between a live test that verifies the agent and one that mostly verifies the pipeline ran, so I would scope it to the response payload.

The rest are non-blocking. The numerous file:line source citations will rot silently the first time those files are edited, and the pipeline hand-writes the extension manifest into config.json where it can drift from the real one. Nothing here runs in PR CI, so the blast radius is bounded to the manual or scheduled live run. None of it blocks merge; I would fix the invoke matcher scope before relying on a green run and treat the other two as follow-ups.

continue
}

if !responseHasExpectedAnswer(out) {

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

responseHasExpectedAnswer here runs against out, which is the full combined stdout+stderr of azd ai agent invoke, not just the agent's response. The matcher is careful to reject gpt-4o-mini, 4.1, and 404, but it still accepts any standalone 4, and azd's own output can emit one (think "4 seconds", "4 warnings", a step counter). If that happens the invoke check goes green even when the model never answered, which quietly defeats the assertion. Can we scope the match to the agent's actual response (parse the invoke output field) instead of the whole stream?

// Deploy mode is NOT an interactive prompt in the template/--agent-name
// flow: init auto-resolves it to "container" when a manifest is provided
// (init_from_code.go:1373), so it must be chosen via the --deploy-mode flag
// (init.go:1306). r.mode is exactly "container" or "code".

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: these file:line anchors (init.go:1306 here, init_from_code.go:1373 above, and the dozen more in dispatchPrompt) are genuinely helpful context, but they rot the first time someone edits those files and nothing fails when they do. Could we anchor to the function name or the message string/constant instead of a line number, so the reference survives unrelated edits above it?

"azure.ai.agents": {
"id": "azure.ai.agents",
"namespace": "ai.agent",
"capabilities": ["custom-commands", "lifecycle-events", "mcp-server", "service-target-provider", "metadata"],

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: this inlines the extension's manifest (capabilities, displayName, description, usage) into the pipeline. It drifts from the real manifest the moment that changes, and a stale capabilities list here could either mask a capability-resolution regression or break the run for a reason unrelated to the test. Since we just built the binary one step up, can we let azd install/register it (or read the real manifest) rather than hand-writing config.json?

@jongio jongio left a comment

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Confirmed RickWinter's invoke assertion concern (replied on his thread with details). responseHasExpectedAnswer runs on the full combined stdout+stderr stream, and patterns like "4 seconds" or "step 4:" pass the standalone-four check since space/colon aren't word runes. Consider adding -o raw to the invoke args so the assertion is scoped to the server response payload only.

No other new findings. The event-driven dispatch loop, layered timeout hierarchy (per-mode 60m < go test 125m < ADO step 130m < job 150m), and LIFO cleanup ordering are correctly structured. Prior review feedback has been addressed in the current HEAD.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

ext-agents azure.ai.agents extension ext-foundry azure.ai.{agents,connections,inspector,projects,routines,skills,toolboxes}, microsoft.foundry

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Add automated test infrastructure (PR-gate + live E2E) for the azure.ai.agents extension

6 participants