test: add static E2E tests for azure.ai.agents extension #8692

Draft: v1212 wants to merge 23 commits into Azure:main from v1212:wujia/e2e-static-tests

@v1212 commented Jun 17, 2026

Collaborator

Static E2E Tests for azd ai agent Extension

All tiers validated in fork CI (Run 27768419998): Tier 0 ✓, Tier 1 ✓, Tier 2 ✓ (code + container deploy)

🔑 Needs repo secrets for OIDC auth: DEVOPS_CLIENT_ID, DEVOPS_TENANT_ID, DEVOPS_SUB_ID

Background & Motivation

PR #8524 introduced Copilot-driven E2E tests using cli-interactive-tester (Ben's MCP tool). That approach uses an LLM to interpret scenario goals and drive the terminal — powerful for exploratory testing but non-deterministic ("eval not tests" per global team feedback).

This PR adds a deterministic, static E2E test framework that drives azd ai agent CLI commands through tmux sessions with scripted prompt handling. Tests are organized in tiers:

Test Tiers

| Tier | Tests | Auth | What it validates |
|------|-------|------|-------------------|
| 0 | 16 tests | None | Offline: --help, --version, error messages, command structure |
| 1 | 8 tests | Azure | azd ai agent init with all language/deploy-mode variants |
| 2 | 2 tests | Azure | Full golden path: init → provision → deploy → invoke → teardown |

Fork CI Results (Run 27768419998)

  • Tier 0: 16/16 PASS (offline, no auth)
  • Tier 1: 8/8 PASS (init across Python/C#/JS/TS × code/container)
  • Tier 2 code-deploy: init ✓ → provision ✓ → deploy ✓ → invoke ✓ → teardown ✓ (669s)
  • Tier 2 container-deploy: init ✓ → provision ✓ → deploy ✓ → invoke ✓ → teardown ✓ (711s)

Key Technical Decisions

  1. tmux-based interaction — Commands run in tmux sessions. A handle_dynamic_prompts() function detects CLI prompts (subscription picker, Foundry project, tenant, capacity, etc.) and responds with scripted inputs.

  2. Sentinel-based completion — Each command is followed by echo __DONE_{sentinel}_{step}_$? so the harness can detect completion and capture the exit code from the terminal output instead of relying on fixed sleeps.

  3. Sequential Tier 2 — Code and container deploy tests run sequentially (not parallel) to avoid subscription-level resource contention.

  4. OIDC auth — Pipeline uses federated credentials to authenticate via azure/login@v2. Fork testing used token injection (ci_inject_token.py) as a workaround.
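
The sentinel approach in decision 2 can be sketched roughly as below. This is a minimal illustration, not the code in this PR: `wrap_with_sentinel` and `find_exit_code` are hypothetical names, and the sketch assumes the marker ends with the shell's `$?` so the exit code is printed right after it.

```python
import re
from typing import Optional


def wrap_with_sentinel(command: str, sentinel: str, step: str) -> str:
    """Append an echo that prints a marker followed by the command's exit code."""
    # The shell expands $? to the exit status of `command` when the echo runs.
    return f"{command}; echo __DONE_{sentinel}_{step}_$?"


def find_exit_code(pane_text: str, sentinel: str, step: str) -> Optional[int]:
    """Return the exit code if the marker has been printed, else None."""
    # Require digits after the marker: the typed command line still shows a
    # literal "$?" and must not be mistaken for completion.
    pattern = re.compile(
        rf"__DONE_{re.escape(sentinel)}_{re.escape(step)}_(\d+)"
    )
    match = pattern.search(pane_text)
    return int(match.group(1)) if match else None
```

In a harness like the one described, `pane_text` would presumably come from capturing the tmux pane; the functions above only cover the string handling.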

Files

  • test_tier0.py — 16 offline validation tests
  • test_tier1.py — 8 init variant tests with dynamic prompt handling
  • test_full_e2e.py — Full E2E golden path (init → provision → deploy → invoke → down)
  • test_tier2.py — Tier 2 runner (code + container modes, sequential)
  • ci_inject_token.py — Token injection for fork CI (az-wrapper approach)
  • e2e-ext-azure-ai-agents-static.yml — GitHub Actions workflow

To Enable in azure-dev Repo

Add 3 repo secrets (Settings → Secrets → Actions):

  • DEVOPS_CLIENT_ID = 9274c221-9ce2-44cc-8216-a33e7b59f746
  • DEVOPS_TENANT_ID = 72f988bf-86f1-41af-91ab-2d7cd011db47
  • DEVOPS_SUB_ID = 1756abc0-3554-4341-8d6a-46674962ea19

App: azd-ai-agent-cli-e2e-oidc-app — SP has Contributor on the test subscription.
Federated credential: repo:Azure/azure-dev:ref:refs/heads/wujia/e2e-static-tests (will need update for main after merge).

Add deterministic (no-LLM) end-to-end tests using Python + tmux:

- Tier 0: 16 offline CLI validation tests (~15s, parallel)
- Tier 1: 8 interactive init variants (~3.5min, parallel tmux)
- Tier 2: 2 full golden paths - code + container deploy (~12min parallel)
  - init -> provision -> deploy -> invoke -> teardown

All tests are reproducible without Copilot/LLM. Interactions are
hardcoded in Python scripts that drive azd via tmux send-keys.

Also adds GitHub Actions workflow (workflow_dispatch, tier-selectable).

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
github-actions bot added the ext-agents (azure.ai.agents extension) label Jun 17, 2026
Jian Wu and others added 6 commits June 17, 2026 17:04
Align with OfficeDev/microsoft-365-agents-toolkit pattern for OIDC auth.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
- Clear tmux scrollback after token export to prevent leaks in capture
- Fix operator precedence bug in invoke error detection
- Gate WSL gh.exe fallback behind os.path.exists('/mnt/c')
- Use shutil.which('tmux') for portable path resolution
- Skip picker test gracefully if tmux unavailable
- Add tmux install step to Tier 0 workflow job

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
P0 fixes:
- Replace always-true returncode-is-not-None assertions in tier0 with rc!=0 + output validation
- Replace terminal capture-based init validation with disk checks (azure.yaml + agent.yaml)
- Fix template selection ambiguity (Basic -> Basic agent (Invocations/Responses))
- Read invoke service name from azure.yaml instead of hardcoding
- Fix test_init_no_prompt_with_manifest: timeout is now FAIL, success requires artifacts
- Fix test_init_with_agent_name_flag: verify name appears in artifacts

P1 fixes:
- Add sentinel pattern (echo __DONE_) for reliable command completion detection
- Fix manifest URL from /blob/ to raw.githubusercontent.com
- Fix operator precedence with explicit parentheses
- Inject unique agent names in tier2 parallel runs to avoid resource collisions

P2 cleanup:
- Remove unused imports (glob, json) and variables (HOME_DIR, TMUX in tier2)
- Include computed variables (conflict_msg, mentions_flag) in assertions
- Update README to match implementation

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Critical fix:
- run_cmd() sentinel matching used string contains which hits the input
  echo line (literal $?) before command finishes. Now uses regex
  requiring digit after sentinel to only match real output.

High priority:
- Set cancel-in-progress: false (tier2 creates cloud resources, cancel
  skips teardown -> resource leak)
- Add 'azd config set auth.useAzCliAuth true' in workflow (both tier1/2)
  and inside tmux sessions. Without this, azd ignores az CLI OIDC token.

Assertion improvements:
- Doctor tests: relax to accept either non-zero exit OR diagnostic output
  (exit code behavior varies across CLI versions)
- Language filter: require strict < and verify each result mentions python
- Picker test: verify 'select a language' specifically + Ctrl-C exit

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Each run_cmd() now generates a unique sentinel via counter (pid + N),
so leftover output from provision cannot false-match during deploy/invoke/teardown.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
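
A per-call unique sentinel, as described in the commit above, might look like the following sketch. The name `next_sentinel` and the exact token format are assumptions for illustration; the commit only states that the token combines the pid with a counter.

```python
import itertools
import os

# Module-level counter: each call yields the next integer, starting at 1.
_counter = itertools.count(1)


def next_sentinel() -> str:
    """Return a token that differs on every call within this process."""
    # The pid separates concurrent test processes; the counter separates
    # successive commands inside one process.
    return f"{os.getpid()}_{next(_counter)}"
```

Because every command gets a fresh token, a marker left on screen by an earlier step cannot match the pattern the harness is waiting for in a later step.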
- Replace rigid wait_for sequence with handle_dynamic_prompts after
  template selection (handles Git protocol, name, deploy mode, project,
  capacity, etc. in any order)
- Fix elif ordering: capacity before deploy (deployment capacity matched
  deploy handler, typing 'Source' into numeric field)
- Subscription picker: accept default (don't filter by configured ID
  which may not exist in the logged-in tenant)
- Project picker: try E2E_PROJECT filter, fallback to default if no match
- Capacity prompt: type '25' (numeric value required, Enter alone fails)
- --no-prompt test: accept exit=0 without full artifacts as PASS
- Manifest URL test: use dynamic handler instead of rigid sequence
- 'what would you like to do': Down+Enter (navigate off 'exit setup')

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
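
The ordering fix in the commit above (capacity before deploy) is easiest to see in a rule table. The sketch below is illustrative only: `PROMPT_RULES` and `keys_for_prompt` are hypothetical names, the matched substrings are guesses at the prompt wording, and the key names follow tmux `send-keys` conventions.

```python
from typing import List, Optional, Tuple

# (substring searched in the lower-cased screen, keys to send), checked in order.
# "capacity" must precede "deploy": a "deployment capacity" prompt contains both,
# and the deploy handler would type text into a numeric field.
PROMPT_RULES: List[Tuple[str, List[str]]] = [
    ("capacity", ["25", "Enter"]),
    ("deploy", ["Source", "Enter"]),
    ("what would you like to do", ["Down", "Enter"]),
    ("subscription", ["Enter"]),  # accept the default subscription
]


def keys_for_prompt(screen: str) -> Optional[List[str]]:
    """Return the keys for the first matching rule, or None if nothing matches."""
    lowered = screen.lower()
    for needle, keys in PROMPT_RULES:
        if needle in lowered:
            return keys
    return None
```

First-match-wins tables like this are order-sensitive by design, so the more specific pattern has to come first.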
Jian Wu and others added 16 commits June 18, 2026 10:26
- Reduce interactive test parallelism to max_workers=2 (Azure API
  throttling caused random timeouts with 3-4 concurrent inits)
- Increase max_steps to 40 for all dynamic handlers
- Fix capacity input: backspace to clear, type '50'
- Fix project filter: increase delay to 3s for picker to update
- Fix select_by_text: default delay 1.5s (was 0.5s)
- Fix _validate_init_disk: recursive search for agent.yaml under src/

Verified: 16/16 tier0 + 8/8 tier1 passing (2 consecutive runs)

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
…ction

Key fixes:
- Use --input-file for invoke (positional msg sends empty body to invocations protocol)
- Fix response parser: scan from LAST invoke command to avoid stale sentinel match
- Add prompt loop detection (3x repeat → try alternative option)
- Increase timeout for bad-deploy-mode test (WSL azd startup can be slow)
- Add LOCAL-TEST-GUIDE.md with step-by-step reproducible instructions
- Update SUMMARY.md with 10 common issues & troubleshooting guide

Tested: Tier 0 (16/16), Tier 1 (8/8), Tier 2 (5/5) × 3 consecutive passes

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
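
The prompt loop detection listed above (three repeats, then try an alternative) could be tracked with a small helper like this one. `PromptLoopDetector` is a hypothetical name; the sketch only covers detecting the repeat, not choosing the alternative option.

```python
from collections import deque
from typing import Deque


class PromptLoopDetector:
    """Flag when the same prompt text is observed several times in a row."""

    def __init__(self, threshold: int = 3) -> None:
        self._threshold = threshold
        # Keep only the most recent `threshold` observations.
        self._recent: Deque[str] = deque(maxlen=threshold)

    def observe(self, prompt: str) -> bool:
        """Record a prompt; return True if the last `threshold` prompts are identical."""
        self._recent.append(prompt)
        return (
            len(self._recent) == self._threshold
            and len(set(self._recent)) == 1
        )
```

A caller would switch to a different response when `observe` returns True, since the scripted answer is evidently not being accepted.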
Addresses Travis's review comments:
- Add descriptions to tier/deploy_modes input options
- Clarify jobs run sequentially via needs: dependency
- Increase Tier 2 timeout from 20min to 30min (each mode ~13min)
- Document where env secrets come from
- Add E2E_CREATE_PROJECT=true and E2E_LOCATION=eastus2 to CI
- Explain useAzCliAuth is safe on native runners but not WSL

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Tier 0 (offline) and Tier 1 (init) run in parallel (~3min).
Tier 2 (golden path) runs only after both pass — avoids wasting
Azure resources if basic tests fail.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
…-append

HIGH:
1. Auth: CI sets useAzCliAuth=true (OIDC needs az CLI), local unsets it
   (WSL az CLI is slow). Conditional on GITHUB_ACTIONS env var.
2. Init failure cleanup: attempt teardown if .azure dir exists, even when
   init returns False (prevents resource leak in CI).

MEDIUM:
3. Invoke: response missing '4' is now FAIL, not soft PASS.
4. Fix double-append bug in response extraction (line appended twice).

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
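
The environment-dependent auth setting in item 1 above might be chosen like this. The helper name `auth_config_command` is hypothetical, and the local branch assumes `azd config unset` is how the setting is cleared; the CI branch uses the `azd config set auth.useAzCliAuth true` command quoted earlier in this PR.

```python
from typing import List, Mapping


def auth_config_command(env: Mapping[str, str]) -> List[str]:
    """Return the azd config command appropriate for CI or a local run."""
    # GitHub Actions sets GITHUB_ACTIONS=true on its runners.
    if env.get("GITHUB_ACTIONS") == "true":
        return ["azd", "config", "set", "auth.useAzCliAuth", "true"]
    # Locally, clear the setting so azd uses its own login.
    return ["azd", "config", "unset", "auth.useAzCliAuth"]
```

Returning an argument list rather than running the command keeps the decision easy to test without azd installed.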
3. Timeout SIGKILL resource leak: add _cleanup_leaked_resources() that
   runs azd down on any project dirs found after subprocess timeout.
6. Parallel config race: set AZURE_CONFIG_DIR per process so two
   concurrent test_full_e2e.py instances don't fight over config.json.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Add words used in E2E test documentation (bash flags, tool names,
path components) to suppress cspell-lint CI failures.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
AZURE_CONFIG_DIR is for az CLI, not azd. azd reads AZD_CONFIG_DIR
(pkg/config/manager.go:71). The wrong variable meant:
1. Parallel azd config race was not actually isolated
2. CI az CLI auth token could become inaccessible

Also move config dir outside testdir to avoid child process rm -rf.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
…local paths from docs

- AZD_CONFIG_DIR: copy from ~/.azd (not empty dir) so extensions are
  available in isolated config directory
- Remove hardcoded local paths (jwshare, wsladmin, testgates) from
  LOCAL-TEST-GUIDE.md, SUMMARY.md, README.md, run_all.sh
- Use generic /path/to/... placeholders instead
- Clean cspell dictionary: only keep words actually needed

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
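
The per-process config isolation from the last two commits (AZD_CONFIG_DIR, seeded with a copy of the existing config so extensions stay available) could be set up roughly as follows. `isolated_azd_env` and its parameters are hypothetical; only the AZD_CONFIG_DIR variable name comes from the commit messages.

```python
import shutil
import tempfile
from pathlib import Path
from typing import Dict, Mapping


def isolated_azd_env(
    base_env: Mapping[str, str], source_config: Path, parent: Path
) -> Dict[str, str]:
    """Copy `source_config` into a fresh directory under `parent` and
    return an environment whose AZD_CONFIG_DIR points at the copy."""
    # mkdtemp gives each call its own directory, so parallel runs never share one.
    target = Path(tempfile.mkdtemp(prefix="azd-config-", dir=parent)) / "azd"
    if source_config.is_dir():
        shutil.copytree(source_config, target)
    else:
        target.mkdir()
    env = dict(base_env)
    env["AZD_CONFIG_DIR"] = str(target)
    return env
```

Passing a `parent` outside the test working directory matches the commit's note about keeping the config away from cleanup that removes the test directory.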
…ment

- SUMMARY.md: remove last local path (/mnt/c/.../wbin/az)
- cspell.yaml: remove orphan 'wbin' entry
- run_all.sh: use dynamic python3 path detection instead of hardcoded pyenv
- test_tier0.py: SKIP (None) is now distinct from PASS (True) in check()
  summary shows 'N passed, M skipped' instead of false PASS count
- test_full_e2e.py: fix stale 'canadacentral' comment to generic '<region>'
- LOCAL-TEST-GUIDE.md: add note about adjusting pyenv path

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
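
The three-state result handling described for test_tier0.py (True for PASS, None for SKIP) can be illustrated with a small summary function. `summarize` is a hypothetical name, and treating False as FAIL and reporting a failed count are assumptions added for completeness.

```python
from typing import Dict, Optional


def summarize(results: Dict[str, Optional[bool]]) -> str:
    """Count passes, skips, and failures separately."""
    # Compare with `is` so None (skip) is never folded into False (fail).
    passed = sum(1 for r in results.values() if r is True)
    skipped = sum(1 for r in results.values() if r is None)
    failed = sum(1 for r in results.values() if r is False)
    return f"{passed} passed, {skipped} skipped, {failed} failed"
```

Keeping skips out of the pass count means a run where tmux or auth is unavailable does not look fully green.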
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
On CI, azd is built in the workspace dir and added via GITHUB_PATH.
The tmux session needs to include this path explicitly since it starts
a fresh bash without inheriting GITHUB_PATH modifications.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
CI runners are slower; azd extension loading takes variable time.
Poll every 2s for up to 20s. Add debug output on failure to see
what's actually on screen.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
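
The polling change above (every 2s for up to 20s) follows a common pattern, sketched here. `poll_until` is a hypothetical helper; the injectable `clock` and `sleep` parameters are an addition for testability and are not claimed to exist in the PR.

```python
import time
from typing import Callable


def poll_until(
    check: Callable[[], bool],
    interval: float = 2.0,
    timeout: float = 20.0,
    clock: Callable[[], float] = time.monotonic,
    sleep: Callable[[float], None] = time.sleep,
) -> bool:
    """Call `check` every `interval` seconds until it returns True or `timeout` elapses."""
    deadline = clock() + timeout
    while True:
        if check():
            return True
        if clock() >= deadline:
            return False
        sleep(interval)
```

On a False return, the caller can dump the captured screen, which is the debug output the commit describes.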
azd ai agent init requires authentication since v0.1.40-preview.
The picker test cannot run in tier0 (offline/no-auth). Skip gracefully
when 'azd auth token' fails, so it runs in tier1 (which has auth).

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Signed-off-by: trangevi <trangevi@microsoft.com>
@microsoft-github-policy-service

Contributor

Hi @v1212. Thank you for your interest in helping to improve the Azure Developer CLI experience and for your contribution. We've noticed that there hasn't been recent engagement on this pull request. If this is still an active work stream, please let us know by pushing some changes or leaving a comment. Otherwise, we'll close this out in 7 days.


Labels: ext-agents (azure.ai.agents extension), no-recent-activity (identity issues with no activity)
