test: add static E2E tests for azure.ai.agents extension by v1212 · Pull Request #8692 · Azure/azure-dev

v1212 · 2026-06-17T08:58:20Z

Static E2E Tests for `azd ai agent` Extension

✅ All tiers validated in fork CI — Run 27768419998: Tier 0 ✓, Tier 1 ✓, Tier 2 ✓ (code + container deploy)

🔑 Needs repo secrets for OIDC auth: DEVOPS_CLIENT_ID, DEVOPS_TENANT_ID, DEVOPS_SUB_ID

Background & Motivation

PR #8524 introduced Copilot-driven E2E tests using cli-interactive-tester (Ben's MCP tool). That approach uses an LLM to interpret scenario goals and drive the terminal — powerful for exploratory testing but non-deterministic ("eval not tests" per global team feedback).

This PR adds a deterministic, static E2E test framework that drives azd ai agent CLI commands through tmux sessions with scripted prompt handling. Tests are organized in tiers:

Test Tiers

Tier	Tests	Auth	What it validates
0	16 tests	None	Offline: `--help`, `--version`, error messages, command structure
1	8 tests	Azure	`azd ai agent init` with all language/deploy-mode variants
2	2 tests	Azure	Full golden path: init → provision → deploy → invoke → teardown

Fork CI Results (Run 27768419998)

✅ Tier 0: 16/16 PASS (offline, no auth)
✅ Tier 1: 8/8 PASS (init across Python/C#/JS/TS × code/container)
✅ Tier 2 code-deploy: init ✓ → provision ✓ → deploy ✓ → invoke ✓ → teardown ✓ (669s)
✅ Tier 2 container-deploy: init ✓ → provision ✓ → deploy ✓ → invoke ✓ → teardown ✓ (711s)

Key Technical Decisions

tmux-based interaction — Commands run in tmux sessions. A handle_dynamic_prompts() function detects CLI prompts (subscription picker, Foundry project, tenant, capacity, etc.) and responds with scripted inputs.
Sentinel-based completion — Each command is wrapped with echo __DONE_{sentinel}_{step}_$? to detect completion and capture exit codes, avoiding timing-based polling.
Sequential Tier 2 — Code and container deploy tests run sequentially (not parallel) to avoid subscription-level resource contention.
OIDC auth — Pipeline uses federated credentials to authenticate via azure/login@v2. Fork testing used token injection (ci_inject_token.py) as a workaround.
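The sentinel technique in the list above works as follows. This is a minimal sketch that assumes a per-process counter (pid + N) and a regex that requires digits after the marker, as the later commits describe. The function names `wrap_with_sentinel` and `parse_exit_code` are hypothetical and are not the PR's actual `run_cmd()`.

```python
from __future__ import annotations

import itertools
import os
import re

# Hypothetical names: a sketch of the sentinel idea, not the PR's actual run_cmd().
_step_counter = itertools.count(1)


def wrap_with_sentinel(cmd: str, step: str) -> tuple[str, re.Pattern[str]]:
    """Append an echo of a unique marker plus the exit status ($?) to cmd."""
    # pid + counter makes every marker unique, so output left over from an
    # earlier step (e.g. provision) cannot satisfy a later one (e.g. deploy).
    marker = f"__DONE_{os.getpid()}_{next(_step_counter)}_{step}_"
    wrapped = f"{cmd}; echo {marker}$?"
    # Require digits right after the marker. The terminal's echo of the typed
    # input contains the literal "$?", which must not count as completion.
    pattern = re.compile(re.escape(marker) + r"(\d+)")
    return wrapped, pattern


def parse_exit_code(pane_text: str, pattern: re.Pattern[str]) -> int | None:
    """Return the command's exit code once the marker has printed, else None."""
    match = pattern.search(pane_text)
    return int(match.group(1)) if match else None
```

In a tmux-driven harness, the wrapped string would be typed with `tmux send-keys -t <session> "<wrapped>" Enter`. The pane would then be read with `tmux capture-pane -p -t <session>` and passed to `parse_exit_code` until it returns a value.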

Files

test_tier0.py — 16 offline validation tests
test_tier1.py — 8 init variant tests with dynamic prompt handling
test_full_e2e.py — Full E2E golden path (init → provision → deploy → invoke → down)
test_tier2.py — Tier 2 runner (code + container modes, sequential)
ci_inject_token.py — Token injection for fork CI (az-wrapper approach)
e2e-ext-azure-ai-agents-static.yml — GitHub Actions workflow

To Enable in azure-dev Repo

Add 3 repo secrets (Settings → Secrets → Actions):

DEVOPS_CLIENT_ID = 9274c221-9ce2-44cc-8216-a33e7b59f746
DEVOPS_TENANT_ID = 72f988bf-86f1-41af-91ab-2d7cd011db47
DEVOPS_SUB_ID = 1756abc0-3554-4341-8d6a-46674962ea19

App: azd-ai-agent-cli-e2e-oidc-app — SP has Contributor on the test subscription.
Federated credential: repo:Azure/azure-dev:ref:refs/heads/wujia/e2e-static-tests (will need update for main after merge).

Add deterministic (no-LLM) end-to-end tests using Python + tmux:
- Tier 0: 16 offline CLI validation tests (~15s, parallel)
- Tier 1: 8 interactive init variants (~3.5min, parallel tmux)
- Tier 2: 2 full golden paths, code + container deploy (~12min parallel): init -> provision -> deploy -> invoke -> teardown
All tests are reproducible without Copilot/LLM. Interactions are hardcoded in Python scripts that drive azd via tmux send-keys.
Also adds GitHub Actions workflow (workflow_dispatch, tier-selectable).
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

Align with OfficeDev/microsoft-365-agents-toolkit pattern for OIDC auth. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

- Clear tmux scrollback after token export to prevent leaks in capture
- Fix operator precedence bug in invoke error detection
- Gate WSL gh.exe fallback behind os.path.exists('/mnt/c')
- Use shutil.which('tmux') for portable path resolution
- Skip picker test gracefully if tmux unavailable
- Add tmux install step to Tier 0 workflow job
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

P0 fixes:
- Replace always-true returncode-is-not-None assertions in tier0 with rc!=0 + output validation
- Replace terminal capture-based init validation with disk checks (azure.yaml + agent.yaml)
- Fix template selection ambiguity (Basic -> Basic agent (Invocations/Responses))
- Read invoke service name from azure.yaml instead of hardcoding
- Fix test_init_no_prompt_with_manifest: timeout is now FAIL, success requires artifacts
- Fix test_init_with_agent_name_flag: verify name appears in artifacts
P1 fixes:
- Add sentinel pattern (echo __DONE_) for reliable command completion detection
- Fix manifest URL from /blob/ to raw.githubusercontent.com
- Fix operator precedence with explicit parentheses
- Inject unique agent names in tier2 parallel runs to avoid resource collisions
P2 cleanup:
- Remove unused imports (glob, json) and variables (HOME_DIR, TMUX in tier2)
- Include computed variables (conflict_msg, mentions_flag) in assertions
- Update README to match implementation
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

Critical fix:
- run_cmd() sentinel matching used string contains which hits the input echo line (literal $?) before command finishes. Now uses regex requiring digit after sentinel to only match real output.
High priority:
- Set cancel-in-progress: false (tier2 creates cloud resources, cancel skips teardown -> resource leak)
- Add 'azd config set auth.useAzCliAuth true' in workflow (both tier1/2) and inside tmux sessions. Without this, azd ignores az CLI OIDC token.
Assertion improvements:
- Doctor tests: relax to accept either non-zero exit OR diagnostic output (exit code behavior varies across CLI versions)
- Language filter: require strict < and verify each result mentions python
- Picker test: verify 'select a language' specifically + Ctrl-C exit
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

Each run_cmd() now generates a unique sentinel via counter (pid + N), so leftover output from provision cannot false-match during deploy/invoke/teardown. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

- Replace rigid wait_for sequence with handle_dynamic_prompts after template selection (handles Git protocol, name, deploy mode, project, capacity, etc. in any order)
- Fix elif ordering: capacity before deploy (deployment capacity matched deploy handler, typing 'Source' into numeric field)
- Subscription picker: accept default (don't filter by configured ID which may not exist in the logged-in tenant)
- Project picker: try E2E_PROJECT filter, fallback to default if no match
- Capacity prompt: type '25' (numeric value required, Enter alone fails)
- --no-prompt test: accept exit=0 without full artifacts as PASS
- Manifest URL test: use dynamic handler instead of rigid sequence
- 'what would you like to do': Down+Enter (navigate off 'exit setup')
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
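The elif-ordering fix above is easiest to see as an ordered rule table. This is a sketch only: the prompt substrings and key sequences are assumptions for illustration, written as tmux send-keys key names. `PROMPT_RULES` and `choose_keys` are hypothetical names and do not reproduce the PR's `handle_dynamic_prompts()`.

```python
from __future__ import annotations

# Hypothetical rule table: prompt substrings and key sequences are illustrative,
# written as tmux send-keys arguments ("BSpace", "Down", "Enter" are tmux key names).
# Order matters: the first matching substring wins, and a prompt such as
# "Enter deployment capacity" also contains "deploy", so "capacity" goes first.
PROMPT_RULES: list[tuple[str, list[str]]] = [
    ("capacity", ["BSpace", "BSpace", "BSpace", "50", "Enter"]),
    ("deploy", ["Source", "Enter"]),
    ("what would you like to do", ["Down", "Enter"]),
    ("subscription", ["Enter"]),  # accept the default selection
]


def choose_keys(prompt_line: str) -> list[str] | None:
    """Return the keys to send for the first rule matching prompt_line, else None."""
    text = prompt_line.lower()
    for needle, keys in PROMPT_RULES:
        if needle in text:
            return keys
    return None
```

If "deploy" were listed first, the capacity prompt would receive "Source". That is the failure the commit describes.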

- Reduce interactive test parallelism to max_workers=2 (Azure API throttling caused random timeouts with 3-4 concurrent inits)
- Increase max_steps to 40 for all dynamic handlers
- Fix capacity input: backspace to clear, type '50'
- Fix project filter: increase delay to 3s for picker to update
- Fix select_by_text: default delay 1.5s (was 0.5s)
- Fix _validate_init_disk: recursive search for agent.yaml under src/
Verified: 16/16 tier0 + 8/8 tier1 passing (2 consecutive runs)
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
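The disk-based init check, with the recursive agent.yaml search mentioned above, could look roughly like this. The sketch assumes azure.yaml sits at the project root and agent.yaml lives somewhere under src/. `validate_init_disk` is a hypothetical stand-in for the PR's `_validate_init_disk`.

```python
from __future__ import annotations

from pathlib import Path


def validate_init_disk(project_dir: str | Path) -> bool:
    """Hypothetical check: azure.yaml at the root and an agent.yaml anywhere under src/."""
    root = Path(project_dir)
    if not (root / "azure.yaml").is_file():
        return False
    src = root / "src"
    # rglob() searches every subdirectory, so src/<agent-name>/agent.yaml is found
    # regardless of how deeply the template nests the agent folder.
    return src.is_dir() and any(p.is_file() for p in src.rglob("agent.yaml"))
```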

…ction
Key fixes:
- Use --input-file for invoke (positional msg sends empty body to invocations protocol)
- Fix response parser: scan from LAST invoke command to avoid stale sentinel match
- Add prompt loop detection (3x repeat → try alternative option)
- Increase timeout for bad-deploy-mode test (WSL azd startup can be slow)
- Add LOCAL-TEST-GUIDE.md with step-by-step reproducible instructions
- Update SUMMARY.md with 10 common issues & troubleshooting guide
Tested: Tier 0 (16/16), Tier 1 (8/8), Tier 2 (5/5) × 3 consecutive passes
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
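The prompt loop detection listed above (3x repeat, then try an alternative option) could be implemented as a small helper like the one below. `PromptLoopDetector` is a hypothetical class, and the threshold of 3 comes from the commit message.

```python
from __future__ import annotations

from collections import deque


class PromptLoopDetector:
    """Hypothetical helper: flag a prompt seen `limit` times in a row."""

    def __init__(self, limit: int = 3) -> None:
        self.limit = limit
        # Only the most recent `limit` prompts are kept.
        self.recent: deque[str] = deque(maxlen=limit)

    def seen(self, prompt: str) -> bool:
        """Record a prompt; return True once it has repeated `limit` times consecutively."""
        self.recent.append(prompt.strip())
        return len(self.recent) == self.limit and len(set(self.recent)) == 1
```

When `seen()` returns True, the handler would stop re-sending its usual answer and try another choice instead, for example Down then Enter in a picker.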

Addresses Travis's review comments:
- Add descriptions to tier/deploy_modes input options
- Clarify jobs run sequentially via needs: dependency
- Increase Tier 2 timeout from 20min to 30min (each mode ~13min)
- Document where env secrets come from
- Add E2E_CREATE_PROJECT=true and E2E_LOCATION=eastus2 to CI
- Explain useAzCliAuth is safe on native runners but not WSL
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

Tier 0 (offline) and Tier 1 (init) run in parallel (~3min). Tier 2 (golden path) runs only after both pass — avoids wasting Azure resources if basic tests fail. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

…-append
HIGH:
1. Auth: CI sets useAzCliAuth=true (OIDC needs az CLI), local unsets it (WSL az CLI is slow). Conditional on GITHUB_ACTIONS env var.
2. Init failure cleanup: attempt teardown if .azure dir exists, even when init returns False (prevents resource leak in CI).
MEDIUM:
3. Invoke: response missing '4' is now FAIL, not soft PASS.
4. Fix double-append bug in response extraction (line appended twice).
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

3. Timeout SIGKILL resource leak: add _cleanup_leaked_resources() that runs azd down on any project dirs found after subprocess timeout.
6. Parallel config race: set AZURE_CONFIG_DIR per process so two concurrent test_full_e2e.py instances don't fight over config.json.
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
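The post-timeout cleanup in item 3 can be sketched as follows. The sketch assumes that any directory holding an `.azure` folder is an azd project that may own cloud resources. `find_leaked_projects` and `cleanup_leaked_resources` are hypothetical names, and the 1200-second teardown timeout is an arbitrary assumption.

```python
from __future__ import annotations

import subprocess
from pathlib import Path
from typing import Callable


def find_leaked_projects(test_root: str | Path) -> list[Path]:
    """Return every directory under test_root that holds an azd .azure env dir."""
    root = Path(test_root)
    return sorted(p.parent for p in root.rglob(".azure") if p.is_dir())


def cleanup_leaked_resources(
    test_root: str | Path,
    run: Callable[..., object] = subprocess.run,
    timeout_s: float = 1200,  # assumed value, not taken from the PR
) -> list[Path]:
    """Best-effort `azd down` in each leaked project; returns the dirs attempted."""
    projects = find_leaked_projects(test_root)
    for project in projects:
        try:
            # --force skips the confirmation prompt; --purge also purges
            # soft-deleted resources such as Key Vaults.
            run(["azd", "down", "--force", "--purge"], cwd=project,
                timeout=timeout_s, check=False)
        except subprocess.TimeoutExpired:
            # Keep going: one stuck teardown should not block the others.
            continue
    return projects
```

The `run` parameter defaults to `subprocess.run`. Exposing it lets the discovery and teardown logic be exercised without touching Azure.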

Add words used in E2E test documentation (bash flags, tool names, path components) to suppress cspell-lint CI failures. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

AZURE_CONFIG_DIR is for az CLI, not azd. azd reads AZD_CONFIG_DIR (pkg/config/manager.go:71). The wrong variable meant:
1. Parallel azd config race was not actually isolated
2. CI az CLI auth token could become inaccessible
Also move config dir outside testdir to avoid child process rm -rf.
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

…local paths from docs
- AZD_CONFIG_DIR: copy from ~/.azd (not empty dir) so extensions are available in isolated config directory
- Remove hardcoded local paths (jwshare, wsladmin, testgates) from LOCAL-TEST-GUIDE.md, SUMMARY.md, README.md, run_all.sh
- Use generic /path/to/... placeholders instead
- Clean cspell dictionary: only keep words actually needed
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
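The per-process config isolation described in the last two commits is sketched below. AZD_CONFIG_DIR is set to a private copy of ~/.azd created under the system temp directory, outside the test project. `make_isolated_azd_env` is a hypothetical name. The sketch deliberately leaves AZURE_CONFIG_DIR untouched, so the az CLI still finds its CI login.

```python
from __future__ import annotations

import os
import shutil
import tempfile
from pathlib import Path


def make_isolated_azd_env(
    base_env: dict[str, str] | None = None,
    source: str | Path | None = None,
) -> tuple[dict[str, str], Path]:
    """Return (env, config_dir) with AZD_CONFIG_DIR pointing at a private copy."""
    env = dict(os.environ if base_env is None else base_env)
    src = Path(source) if source is not None else Path.home() / ".azd"
    # mkdtemp() lives under the system temp dir, outside the test project dir,
    # so a child process that deletes the project dir cannot remove it.
    config_dir = Path(tempfile.mkdtemp(prefix="azd-config-"))
    if src.is_dir():
        # Copy rather than start empty so installed extensions stay available.
        shutil.copytree(src, config_dir, dirs_exist_ok=True)
    env["AZD_CONFIG_DIR"] = str(config_dir)
    return env, config_dir
```

The returned `env` would be passed as `env=` to each `subprocess` call of a single test process. The caller removes `config_dir` when the run finishes.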

…ment
- SUMMARY.md: remove last local path (/mnt/c/.../wbin/az)
- cspell.yaml: remove orphan 'wbin' entry
- run_all.sh: use dynamic python3 path detection instead of hardcoded pyenv
- test_tier0.py: SKIP (None) is now distinct from PASS (True) in check(); summary shows 'N passed, M skipped' instead of false PASS count
- test_full_e2e.py: fix stale 'canadacentral' comment to generic '<region>'
- LOCAL-TEST-GUIDE.md: add note about adjusting pyenv path
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
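The test_tier0.py change above keeps SKIP and PASS apart. A minimal sketch, assuming a tri-state result convention (True = PASS, False = FAIL, None = SKIP): counting with `is True` keeps a skipped check out of the pass total. `summarize` is a hypothetical helper, and adding a failed count to the summary line is an assumption.

```python
from __future__ import annotations


# Hypothetical tri-state convention: True = PASS, False = FAIL, None = SKIP.
def summarize(results: dict[str, bool | None]) -> str:
    """Count outcomes without letting a skipped (None) test inflate the pass count."""
    outcomes = list(results.values())
    passed = sum(1 for r in outcomes if r is True)
    failed = sum(1 for r in outcomes if r is False)
    skipped = sum(1 for r in outcomes if r is None)
    return f"{passed} passed, {skipped} skipped, {failed} failed"
```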

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

On CI, azd is built in the workspace dir and added via GITHUB_PATH. The tmux session needs to include this path explicitly since it starts a fresh bash without inheriting GITHUB_PATH modifications. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

CI runners are slower; azd extension loading takes variable time. Poll every 2s for up to 20s. Add debug output on failure to see what's actually on screen. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
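The polling described above (every 2s, for up to 20s, with the screen dumped on failure) fits in a small loop. This is a sketch: `wait_for_text` is a hypothetical name, and `capture` stands for whatever reads the pane. In a tmux harness, that could be `subprocess.run(["tmux", "capture-pane", "-p", "-t", session], capture_output=True, text=True).stdout`. The injectable `sleep` and `clock` exist only to make the loop testable.

```python
from __future__ import annotations

import time
from typing import Callable


def wait_for_text(
    capture: Callable[[], str],
    needle: str,
    timeout: float = 20.0,
    interval: float = 2.0,
    sleep: Callable[[float], None] = time.sleep,
    clock: Callable[[], float] = time.monotonic,
) -> tuple[bool, str]:
    """Poll capture() until needle appears or timeout elapses.

    Returns (found, last_screen); on failure the caller can print last_screen
    to see what was actually displayed.
    """
    deadline = clock() + timeout
    while True:
        screen = capture()
        if needle in screen:
            return True, screen
        if clock() >= deadline:
            return False, screen
        sleep(interval)
```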

azd ai agent init requires authentication since v0.1.40-preview. The picker test cannot run in tier0 (offline/no-auth). Skip gracefully when 'azd auth token' fails, so it runs in tier1 (which has auth). Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
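The auth gate described above could look roughly like this. The sketch assumes that a zero exit code from `azd auth token` means credentials are usable. The token is captured, not printed, so it never lands in logs. `azd_is_authenticated` and `picker_precondition` are hypothetical names, and the 60-second timeout is an assumption.

```python
from __future__ import annotations

import subprocess
from typing import Callable


def azd_is_authenticated(
    run: Callable[..., subprocess.CompletedProcess] = subprocess.run,
) -> bool:
    """True if `azd auth token` succeeds; its output is captured, never printed."""
    try:
        result = run(["azd", "auth", "token"], capture_output=True, text=True, timeout=60)
    except (FileNotFoundError, subprocess.TimeoutExpired):
        # azd missing from PATH, or hanging, is treated as "no auth available".
        return False
    return result.returncode == 0


def picker_precondition() -> bool | None:
    """Hypothetical gate: None means SKIP (tier0, no auth); True means run the test."""
    return True if azd_is_authenticated() else None
```

Returning None rather than False lets a tri-state summary report the picker test as skipped in Tier 0 instead of failed.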

Signed-off-by: trangevi <trangevi@microsoft.com>

microsoft-github-policy-service · 2026-06-25T19:56:26Z

Hi @v1212. Thank you for your interest in helping to improve the Azure Developer CLI experience and for your contribution. We've noticed that there hasn't been recent engagement on this pull request. If this is still an active work stream, please let us know by pushing some changes or leaving a comment. Otherwise, we'll close this out in 7 days.

microsoft-github-policy-service Bot assigned v1212 Jun 17, 2026

github-actions Bot added the ext-agents azure.ai.agents extension label Jun 17, 2026

Jian Wu and others added 6 commits June 17, 2026 17:04

fix: use DEVOPS_CLIENT_ID/TENANT_ID/SUB_ID for Azure login

0d1f63b

Align with OfficeDev/microsoft-365-agents-toolkit pattern for OIDC auth. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

trangevi reviewed Jun 17, 2026


Comment thread .github/workflows/e2e-ext-azure-ai-agents-static.yml Outdated

Comment thread .github/workflows/e2e-ext-azure-ai-agents-static.yml

Comment thread .github/workflows/e2e-ext-azure-ai-agents-static.yml

Comment thread .github/workflows/e2e-ext-azure-ai-agents-static.yml Outdated

Jian Wu and others added 16 commits June 18, 2026 10:26

ci: run tier 0+1 in parallel, tier 2 after both pass

a727124

Tier 0 (offline) and Tier 1 (init) run in parallel (~3min). Tier 2 (golden path) runs only after both pass — avoids wasting Azure resources if basic tests fail. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

fix: add e2e test doc terms to cspell dictionary

d0ebe22

Add words used in E2E test documentation (bash flags, tool names, path components) to suppress cspell-lint CI failures. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

ci: retrigger cspell-lint check

d75600b

fix: cspell - replace 'bugbash' with 'bug-bash'

8ef2fd1

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

Update secrets to be foundry extension specific

06ba61e

Signed-off-by: trangevi <trangevi@microsoft.com>

v1212 mentioned this pull request Jun 22, 2026

Add live golden-path (Tier 2) pipeline for azd ai agent extension #8758

Open

microsoft-github-policy-service Bot added the no-recent-activity identity issues with no activity label Jun 25, 2026

v1212 wants to merge 23 commits into Azure:main from v1212:wujia/e2e-static-tests
