From eda279f36b1b7dd9036440faedbf95055b569867 Mon Sep 17 00:00:00 2001
From: Jian Wu
Date: Mon, 22 Jun 2026 19:38:40 +0800
Subject: [PATCH 01/33] Add live golden-path (Tier 2) pipeline for azd ai agent extension

Adds eng/pipelines/ext-azure-ai-agents-live.yml, an on-demand/weekly
Azure DevOps pipeline that drives the real 'azd ai agent' CLI through
tmux against live Azure (TME), exercising init -> provision -> deploy ->
invoke -> down for both code and container deploy modes.

This is the live counterpart to the PR-gate checks (Tier 0 offline +
Tier 1 recording/playback in #8754). Per Azure SDK EngSys / SFI
guidance, live access stays out of the automatic PR pipeline
(trigger: none) and runs only via '/azp run ext-azure-ai-agents-live'
or the weekly schedule.

The Tier 2 tmux driver (test_full_e2e.py, test_tier2.py) is migrated
from the #8692 prototype; CI auth detection is extended to recognize
Azure DevOps (TF_BUILD) and an explicit E2E_USE_AZ_CLI_AUTH override.
---
 .../azure.ai.agents/tests/e2e-live/README.md  |  96 ++
 .../tests/e2e-live/test_full_e2e.py           | 881 ++++++++++++++++++
 .../tests/e2e-live/test_tier2.py              | 173 ++++
 eng/pipelines/ext-azure-ai-agents-live.yml    | 185 ++++
 4 files changed, 1335 insertions(+)
 create mode 100644 cli/azd/extensions/azure.ai.agents/tests/e2e-live/README.md
 create mode 100644 cli/azd/extensions/azure.ai.agents/tests/e2e-live/test_full_e2e.py
 create mode 100644 cli/azd/extensions/azure.ai.agents/tests/e2e-live/test_tier2.py
 create mode 100644 eng/pipelines/ext-azure-ai-agents-live.yml

diff --git a/cli/azd/extensions/azure.ai.agents/tests/e2e-live/README.md b/cli/azd/extensions/azure.ai.agents/tests/e2e-live/README.md
new file mode 100644
index 00000000000..e60d916e6a0
--- /dev/null
+++ b/cli/azd/extensions/azure.ai.agents/tests/e2e-live/README.md
@@ -0,0 +1,96 @@
+# azure.ai.agents — Live E2E (Tier 2)
+
+Full golden-path tests that exercise the real `azd ai agent` CLI against **live
+Azure** resources:
+
+```
+init → provision → deploy → invoke → down
+```
+
+A Python driver sends keystrokes to the CLI through a **tmux** session and asserts
+on the captured output, for both deploy modes:
+
+| Mode        | What it does                                            |
+| ----------- | ------------------------------------------------------- |
+| `code`      | Source-code (zip) deploy of the agent service           |
+| `container` | Container (ACR build) deploy of the agent service       |
+
+The two modes run **sequentially** (same subscription → avoids resource races).
+
+## Where this fits
+
+| Tier | Coverage                                  | Where it runs                                          |
+| ---- | ----------------------------------------- | ------------------------------------------------------ |
+| 0    | Offline CLI validation (no auth)          | PR gate — `.github/workflows/lint-ext-azure-ai-agents.yml` |
+| 1    | `init` variants (recording/playback)      | PR gate — same workflow                                |
+| 2    | **Full live golden path** (this folder)   | **`eng/pipelines/ext-azure-ai-agents-live.yml`**       |
+
+Live Azure access is deliberately kept **out** of the automatic PR pipeline (Azure
+SDK EngSys / SFI guidance). Tier 2 runs only on demand or on a schedule.
+
+## Running in CI
+
+Pipeline: `eng/pipelines/ext-azure-ai-agents-live.yml` (ADO).
+
+- **On demand (per PR):** comment `/azp run ext-azure-ai-agents-live` on the PR.
+  Requires write permission on the repo.
+- **Scheduled:** weekly, Monday 07:00 UTC against `main`.
+- **Manual:** queue the pipeline and pick `deployModes` = `both` / `code` /
+  `container`.
+
+Logs for each run are published as the `tier2-live-logs-` artifact.
+
+### One-time admin setup
+
+1. **Register the pipeline** in Azure DevOps pointing at
+   `eng/pipelines/ext-azure-ai-agents-live.yml`, named `ext-azure-ai-agents-live`
+   (the name used by `/azp run`).
+2. **Service connection** — the `serviceConnection` parameter (default
+   `azure-sdk-tests`) must map to the shared **TME test subscription** via OIDC /
+   workload-identity federation. The federated identity needs enough RBAC to
+   create Foundry projects and deploy models (Contributor + Azure AI Developer +
+   Cognitive Services Contributor, or equivalent).
+3. **Optional `GitHubPat`** — add a secret pipeline variable with a GitHub PAT to
+   avoid anonymous rate limits when the template is cloned during `init`.
+
+## Running locally (WSL)
+
+Prerequisites: WSL with `tmux` (>= 3.4), Python 3.12+, `azd` (>= 1.25.5) with the
+`azure.ai.agents` extension installed, and `az` logged in.
+
+```bash
+# Use azd's built-in auth locally (NOT az CLI auth — it is slow under WSL).
+azd config unset auth.useAzCliAuth
+azd auth login
+
+# Both modes (sequential):
+python3 test_tier2.py --mode both
+
+# A single golden path:
+python3 test_full_e2e.py --deploy-mode code
+python3 test_full_e2e.py --deploy-mode container --keep # leave resources up
+```
+
+### Useful environment variables
+
+| Variable               | Default      | Purpose                                                        |
+| ---------------------- | ------------ | -------------------------------------------------------------- |
+| `E2E_CREATE_PROJECT`   | `false`      | `true` → always create a fresh Foundry project                 |
+| `E2E_LOCATION`         | `eastus2`    | Region for new projects (needs model quota)                    |
+| `E2E_SUBSCRIPTION`     | —            | Subscription id (filters the picker)                           |
+| `E2E_TENANT`           | —            | AAD tenant id                                                  |
+| `E2E_USE_AZ_CLI_AUTH`  | —            | `true` → set `auth.useAzCliAuth` (CI; auto-on under ADO/GHA)   |
+| `GH_TOKEN`             | —            | GitHub token for template clone (optional)                     |
+
+In CI the driver auto-detects GitHub Actions (`GITHUB_ACTIONS`) and Azure DevOps
+(`TF_BUILD`) and switches to `az` CLI auth automatically.
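The auto-detection described above can be sketched roughly as follows. This is a minimal illustration that mirrors the check in `test_full_e2e.py`; the helper name `use_az_cli_auth` and its `env` parameter are assumptions made for this example and are not part of the driver.

```python
import os

# Values treated as "on" for E2E_USE_AZ_CLI_AUTH, matching the driver's check.
TRUTHY = ("1", "true", "yes")


def use_az_cli_auth(env=None):
    """Return True when the run should set auth.useAzCliAuth (hypothetical helper).

    `env` defaults to os.environ; passing a plain dict keeps the check testable.
    """
    env = os.environ if env is None else env
    # Explicit override for other CI systems or manual runs.
    explicit = env.get("E2E_USE_AZ_CLI_AUTH", "").lower() in TRUTHY
    # GitHub Actions sets GITHUB_ACTIONS; Azure DevOps sets TF_BUILD.
    return explicit or bool(env.get("GITHUB_ACTIONS")) or bool(env.get("TF_BUILD"))


print(use_az_cli_auth({}))                              # False
print(use_az_cli_auth({"TF_BUILD": "True"}))            # True
print(use_az_cli_auth({"E2E_USE_AZ_CLI_AUTH": "no"}))   # False
```

Any non-empty `GITHUB_ACTIONS` or `TF_BUILD` value counts as CI in this sketch, whereas the explicit override only accepts `1`, `true`, or `yes` (case-insensitive).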
+
+## Files
+
+| File               | Purpose                                                           |
+| ------------------ | ----------------------------------------------------------------- |
+| `test_tier2.py`    | Runner — invokes `test_full_e2e.py` once per deploy mode          |
+| `test_full_e2e.py` | One golden path: setup → init → provision → deploy → invoke → down |
+
+Each phase has bounded timeouts and best-effort `azd down --force --purge`
+teardown so a crash mid-run does not leak billable resources.
diff --git a/cli/azd/extensions/azure.ai.agents/tests/e2e-live/test_full_e2e.py b/cli/azd/extensions/azure.ai.agents/tests/e2e-live/test_full_e2e.py
new file mode 100644
index 00000000000..d04f0b18bf3
--- /dev/null
+++ b/cli/azd/extensions/azure.ai.agents/tests/e2e-live/test_full_e2e.py
@@ -0,0 +1,881 @@
+#!/usr/bin/env python3
+"""Full E2E test: init -> provision -> deploy -> invoke -> down.
+
+See LOCAL-TEST-GUIDE.md for complete setup & run instructions.
+
+Prerequisites:
+  - WSL with tmux (>=3.4), Python 3.12+
+  - azd (>=1.25.5) with azure.ai.agents extension, logged in via `azd auth login`
+  - auth.useAzCliAuth must NOT be true (use `azd config unset auth.useAzCliAuth`)
+  - GitHub token available via gh.exe or $GITHUB_TOKEN
+
+Recommended env vars:
+  E2E_CREATE_PROJECT=true — always create new Foundry project (avoid stale resources)
+  E2E_LOCATION=eastus2 — region with sufficient model quota
+  E2E_HOME=$HOME — home directory for azd config
+"""
+import subprocess
+import time
+import sys
+import os
+
+import re
+
+TMUX = os.environ.get("E2E_TMUX", "tmux")
+SOCK = os.environ.get("E2E_SOCK", "e2e")
+SESS = os.environ.get("E2E_SESS", "e2e")
+TESTDIR = os.environ.get("E2E_TESTDIR", "/tmp/e2e-tests/full-e2e")
+HOME_DIR = os.environ.get("E2E_HOME", os.environ.get("HOME", "/home/runner"))
+SUBSCRIPTION = os.environ.get("E2E_SUBSCRIPTION", "")
+PROJECT = os.environ.get("E2E_PROJECT", "")
+TENANT = os.environ.get("E2E_TENANT", "")
+GH_TOKEN = os.environ.get("GH_TOKEN", os.environ.get("GITHUB_TOKEN", ""))
+AGENT_NAME = 
os.environ.get("E2E_AGENT_NAME", "") # Optional: unique name for parallel isolation +CREATE_PROJECT = os.environ.get("E2E_CREATE_PROJECT", "").lower() in ("1", "true", "yes") +LOCATION = os.environ.get("E2E_LOCATION", "eastus2") # Region for new projects +# Inherit full parent PATH so tmux sessions get az-wrapper, azd, etc. +PARENT_PATH = os.environ.get("PATH", f"{HOME_DIR}/bin:/usr/local/bin:/usr/bin:/bin") +_tenant_env = f"; export AZURE_TENANT_ID={TENANT}" if TENANT else "" +_gh_env = f"; export GH_TOKEN={GH_TOKEN}; export GITHUB_TOKEN={GH_TOKEN}" if GH_TOKEN else "" +ENV_SETUP = f"export HOME={HOME_DIR}; export PATH={PARENT_PATH}{_tenant_env}{_gh_env}" + +# Track results +results = {} +DEPLOY_MODE = os.environ.get("E2E_DEPLOY_MODE", "code") # "code" or "container" +_SENTINEL_BASE = "__DONE_{}_".format(os.getpid()) +_sentinel_counter = 0 + + +def get_gh_token(): + """Get GitHub token from env or gh CLI.""" + token = os.environ.get("GITHUB_TOKEN", os.environ.get("GH_TOKEN", "")) + if token: + return token + # Try native gh CLI + try: + r = subprocess.run(["gh", "auth", "token"], capture_output=True, text=True, timeout=10) + if r.returncode == 0 and r.stdout.strip(): + return r.stdout.strip() + except Exception: + pass + # Try Windows gh.exe (WSL local-dev only) + if os.path.exists("/mnt/c"): + try: + r = subprocess.run( + ["/mnt/c/Program Files/GitHub CLI/gh.exe", "auth", "token"], + capture_output=True, text=True, timeout=10 + ) + if r.returncode == 0 and r.stdout.strip(): + return r.stdout.strip() + except Exception: + pass + return "" + + +def tmux(*args): + cmd = [TMUX, "-L", SOCK] + list(args) + r = subprocess.run(cmd, capture_output=True, text=True, timeout=10) + if r.returncode != 0 and r.stderr: + print(f" [tmux error] {' '.join(args[:3])}: {r.stderr.strip()}") + return r.stdout + + +def send(text): + tmux("send-keys", "-t", SESS, "-l", text) + + +def key(k): + tmux("send-keys", "-t", SESS, k) + + +def capture(): + return tmux("capture-pane", "-t", SESS, 
"-p") + + +def wait_for(pattern, timeout=60): + deadline = time.time() + timeout + while time.time() < deadline: + cap = capture() + if pattern.lower() in cap.lower(): + return cap + time.sleep(1) + return None + + +def wait_for_or_fail(pattern, timeout=60, phase=""): + cap = wait_for(pattern, timeout) + if cap is None: + print(f"TIMEOUT waiting for: {pattern}") + print("Last capture:") + print(capture()) + if phase: + results[phase] = "FAIL (timeout)" + return None + return cap + + +def select_by_text(target, delay=1.5): + send(target) + time.sleep(delay) + key("Enter") + + +def show(label="", lines_count=15): + cap = capture() + lines = [l for l in cap.split("\n") if l.strip()] + if label: + print(f"\n--- {label} ---") + for l in lines[-lines_count:]: + print(f" {l}") + + +def run_cmd(cmd, timeout=600): + """Send command with unique sentinel and wait for completion. Returns (capture_text, exit_code). + + Each call uses a unique sentinel (base + counter) so that leftover output from + previous commands cannot cause a false match. 
+ """ + global _sentinel_counter + _sentinel_counter += 1 + sentinel = f"{_SENTINEL_BASE}{_sentinel_counter}_" + sentinel_re = re.compile(re.escape(sentinel) + r"(\d+)") + + send(f"{cmd} ; echo {sentinel}$?") + key("Enter") + deadline = time.time() + timeout + while time.time() < deadline: + cap = capture() + m = sentinel_re.search(cap) + if m: + rc = int(m.group(1)) + return cap, rc + time.sleep(3) + return None, -1 + + +# Legacy: kept for reference, prefer run_cmd() +def _wait_for_shell_prompt_legacy(timeout=600): + """Wait for bash prompt (command finished).""" + deadline = time.time() + timeout + while time.time() < deadline: + cap = capture() + lines = [l for l in cap.split("\n") if l.strip()] + if lines: + last = lines[-1].strip() + if last.endswith("$") or last.startswith("bash"): + return cap + time.sleep(3) + return None + + +def validate_init_output(testdir): + """Validate init produced correct artifacts on disk.""" + import glob as _glob + for d in os.listdir(testdir): + subdir = os.path.join(testdir, d) + if os.path.isdir(subdir): + azure_yaml = os.path.join(subdir, "azure.yaml") + if os.path.exists(azure_yaml): + with open(azure_yaml) as f: + content = f.read() + if "host:" in content and "azure.ai.agent" in content: + # agent.yaml may be nested under src// + agent_yamls = _glob.glob(os.path.join(subdir, "**", "agent.yaml"), recursive=True) + if agent_yamls or os.path.exists(os.path.join(subdir, "agent.yaml")): + return True + return False + + +def find_service_name(testdir): + """Read the first service name from azure.yaml under the generated project.""" + for d in os.listdir(testdir): + subdir = os.path.join(testdir, d) + azure_yaml_path = os.path.join(subdir, "azure.yaml") + if os.path.isdir(subdir) and os.path.exists(azure_yaml_path): + with open(azure_yaml_path) as f: + content = f.read() + in_services = False + for line in content.split("\n"): + if line.strip() == "services:": + in_services = True + continue + if in_services and line.startswith(" 
") and line.strip().endswith(":"): + return line.strip().rstrip(":") + if in_services and not line.startswith(" ") and line.strip(): + break + return None + + +# =========================================================== +# SETUP +# =========================================================== +def setup(): + print("=" * 60) + print("SETUP") + print("=" * 60) + + subprocess.run([TMUX, "-L", SOCK, "kill-server"], capture_output=True) + time.sleep(0.5) + + # Clean test dir + subprocess.run(["rm", "-rf", TESTDIR]) + os.makedirs(TESTDIR, exist_ok=True) + + # Create tmux session + tmux("new-session", "-d", "-s", SESS, "-x", "200", "-y", "50", "bash --norc --noprofile") + time.sleep(2) + + cap = capture() + print(f"Session alive: {len(cap)} chars") + + # Set environment + gh_token = get_gh_token() + env_cmd = ENV_SETUP + if gh_token: + env_cmd += f"; export GH_TOKEN={gh_token}; export GITHUB_TOKEN={gh_token}" + print(f"GitHub token: {len(gh_token)} chars") + send(env_cmd) + key("Enter") + time.sleep(1) + # Clear scrollback to avoid token leaking into capture output + send("clear") + key("Enter") + time.sleep(0.5) + + send("echo ENV_OK") + key("Enter") + time.sleep(2) + cap = capture() + if "ENV_OK" not in cap: + print("ERROR: Environment setup failed") + sys.exit(1) + print("Environment OK") + + # Auth config: CI uses az CLI (OIDC token), local WSL uses azd built-in auth. + # In CI, the pipeline logs az CLI in via OIDC → azd needs useAzCliAuth=true. + # In WSL, az CLI is slow (cross-process) → must use azd built-in auth. + # Detection: GitHub Actions (GITHUB_ACTIONS), Azure DevOps (TF_BUILD), or an + # explicit E2E_USE_AZ_CLI_AUTH override for other CI / manual runs. 
+ _use_az_cli_auth = ( + os.environ.get("E2E_USE_AZ_CLI_AUTH", "").lower() in ("1", "true", "yes") + or bool(os.environ.get("GITHUB_ACTIONS")) + or bool(os.environ.get("TF_BUILD")) # Azure DevOps pipeline + ) + if _use_az_cli_auth: + send("azd config set auth.useAzCliAuth true") + else: + send("azd config unset auth.useAzCliAuth 2>/dev/null") + key("Enter") + time.sleep(1) + + send(f"cd {TESTDIR}") + key("Enter") + time.sleep(1) + + +# =========================================================== +# PHASE 1: INIT +# =========================================================== +def phase_init(): + print("\n" + "=" * 60) + print("PHASE 1: azd ai agent init") + print("=" * 60) + + init_cmd = "azd ai agent init" + if AGENT_NAME: + init_cmd += f" --agent-name {AGENT_NAME}" + send(init_cmd) + key("Enter") + time.sleep(8) + + # Step 1: Language + if not wait_for_or_fail("Select a language", 30, "init"): + return False + print("[1] Language: Python") + select_by_text("Python") + time.sleep(3) + + # Step 2: Template + if not wait_for_or_fail("Select a starter template", 30, "init"): + return False + print("[2] Template: Basic agent (Invocations)") + select_by_text("Basic agent (Invocations") + time.sleep(8) + + # Step 2.5: Git protocol (may appear between template download and name prompt) + time.sleep(3) + cap = capture() + if "protocol" in cap.lower() or "git operations" in cap.lower(): + print("[2.5] Git protocol: HTTPS (default)") + key("Enter") + time.sleep(3) + + # Step 3: Name (may be skipped if --agent-name was used) + if AGENT_NAME: + print(f"[3] Name: {AGENT_NAME} (via --agent-name, prompt may be skipped)") + # Wait briefly for name prompt — if it doesn't appear, flag worked + cap = wait_for("Enter a name", 15) + if cap: + key("Enter") + time.sleep(5) + else: + if not wait_for_or_fail("Enter a name", 30, "init"): + return False + print("[3] Name: default") + key("Enter") + time.sleep(8) + + # Step 4: Foundry project type + if not wait_for_or_fail("Select a Foundry 
project", 30, "init"): + return False + + if CREATE_PROJECT: + # Create a new Foundry project — azd manages all resources + print("[4] Create a new Foundry project") + select_by_text("Create") + time.sleep(5) + # Remaining prompts (subscription, location, names) handled by dynamic loop + else: + # Use existing Foundry project + print("[4] Use existing Foundry project") + key("Enter") + + # Step 5: Wait for subscription or project picker + deadline = time.time() + 30 + while time.time() < deadline: + time.sleep(3) + cap = capture() + lines = [l for l in cap.split("\n") if l.strip()] + active_prompt = "" + for l in reversed(lines): + if l.strip().startswith("?"): + active_prompt = l.strip().lower() + break + if "subscription" in active_prompt: + print("[5] Subscription: accept default") + key("Enter") + time.sleep(10) + if not wait_for_or_fail("Select a Foundry project", 30, "init"): + return False + break + elif "select a foundry project" in active_prompt and "use an existing" not in active_prompt: + print("[5] Subscription: skipped (already on project picker)") + break + if lines and (lines[-1].strip().endswith("$") or lines[-1].strip().startswith("bash")): + if any("error" in l.lower() for l in lines[-5:]): + print("[5] ERROR: CLI exited") + show("Error") + results["init"] = "FAIL (error)" + return False + else: + print("[5] Timeout waiting for subscription/project picker") + show("Timeout") + results["init"] = "FAIL (timeout step 5)" + return False + + # Step 6: Project — verify we're on the project picker before typing + cap = capture() + cap_lines = [l for l in cap.split("\n") if l.strip()] + last_prompt = "" + for l in reversed(cap_lines): + if l.strip().startswith("?"): + last_prompt = l.strip().lower() + break + + if "foundry project" in last_prompt or "project" in last_prompt: + print(f"[6] Project: {PROJECT}") + if PROJECT: + select_by_text(PROJECT, delay=3) + else: + key("Enter") + time.sleep(10) + + # Verify we're past the project picker (not stuck) + 
time.sleep(3) + cap = capture() + prompt_line = "" + for l in reversed(cap.split("\n")): + if l.strip().startswith("?"): + prompt_line = l.strip().lower() + break + if "select a foundry project" in prompt_line: + print("[6b] Project filter may have failed, accepting highlighted") + key("Enter") + time.sleep(5) + else: + print(f"[6] Not on project picker, moving to dynamic") + + # Step 7+: Dynamic prompts + _last_prompt = "" + _same_prompt_count = 0 + for step_num in range(7, 45): + time.sleep(3) + cap = capture() + cap_lower = cap.lower() + + if "added to your azd project" in cap_lower or "agent definition added" in cap_lower: + print(f"[{step_num}] === INIT COMPLETE ===") + if not validate_init_output(TESTDIR): + print(" WARNING: marker found but disk validation failed, checking...") + time.sleep(5) + if not validate_init_output(TESTDIR): + print(" FAIL: artifacts not on disk despite completion marker") + results["init"] = "FAIL (no artifacts)" + return False + results["init"] = "PASS" + return True + + # Check for error exit + lines = [l for l in cap.split("\n") if l.strip()] + if lines: + last = lines[-1].strip() + if (last.endswith("$") or last.startswith("bash")): + if "error" in cap_lower: + print(f"[{step_num}] Init exited with error") + show("Error") + results["init"] = "FAIL (error)" + return False + + # Find ? 
prompt + prompt = "" + for l in reversed(lines): + if l.strip().startswith("?"): + prompt = l.strip().lower() + break + + if not prompt: + time.sleep(5) + cap = capture() + lines = [l for l in cap.split("\n") if l.strip()] + for l in reversed(lines): + if l.strip().startswith("?"): + prompt = l.strip().lower() + break + + if not prompt: + if lines and (lines[-1].strip().startswith("bash") or lines[-1].strip().endswith("$")): + # Check if init completed without marker + if validate_init_output(TESTDIR): + print(f"[{step_num}] Init complete (disk validation)") + results["init"] = "PASS" + return True + print(f"[{step_num}] Shell prompt, no completion marker") + show("Final") + results["init"] = "FAIL (no completion)" + return False + print(f"[{step_num}] Waiting...") + continue + + print(f"[{step_num}] {prompt[:80]}") + + # Detect prompt loops — same prompt question repeating 3+ times + # Compare by question part before ':' to handle varying filter text + colon_idx = prompt.find(":") + prompt_key = prompt[:colon_idx].strip() if colon_idx > 0 else prompt.strip() + if prompt_key == _last_prompt: + _same_prompt_count += 1 + else: + _same_prompt_count = 1 + _last_prompt = prompt_key + + if _same_prompt_count >= 3: + print(f" !! Loop detected ({_same_prompt_count}x same prompt)") + if "model" in prompt or "is specified" in prompt: + # Model prompt looping — probably no quota. Try Down to pick alt option. 
+ print(" -> navigating to alternative option") + key("Down") + time.sleep(0.3) + key("Enter") + time.sleep(3) + continue + elif _same_prompt_count >= 5: + print(" FAIL: stuck in prompt loop") + results["init"] = "FAIL (prompt loop)" + return False + + # Handle prompts + if "[y/n]" in prompt or "(y/n)" in prompt: + # Confirm prompts — answer yes unless it's asking to reuse a conflicting name + if "continue with this existing agent name" in prompt: + print(" -> no (use fresh name)") + send("n") + key("Enter") + else: + print(" -> yes") + send("y") + key("Enter") + elif "protocol" in prompt or "git operations" in prompt: + # "What is your preferred protocol for Git operations?" → HTTPS (default) + print(" -> HTTPS (default)") + key("Enter") + elif "enter a different name" in prompt: + print(" -> default name") + key("Enter") + elif "acr" in prompt or "container registry" in prompt: + print(" -> blank (create new)") + key("Enter") + elif "enter model deployment name" in prompt or ("enter" in prompt and "deployment" in prompt and "name" in prompt): + print(" -> default name") + key("Enter") + elif "existing deployment" in prompt or "is specified in the agent manifest" in prompt or ("found" in prompt and "deployment" in prompt): + print(" -> use existing/specified") + key("Enter") + elif "capacity" in prompt: + # Capacity field is usually pre-filled; accept default + print(" -> accept capacity (default)") + key("Enter") + elif "sku" in prompt: + print(" -> default SKU") + key("Enter") + elif "version" in prompt: + print(" -> default version") + key("Enter") + elif "select" in prompt and "model" in prompt: + print(" -> select gpt-4o-mini") + select_by_text("gpt-4o-mini") + elif "subscription" in prompt: + if SUBSCRIPTION: + print(f" -> subscription: filter by {SUBSCRIPTION[:8]}") + select_by_text(SUBSCRIPTION[:8], delay=2) + else: + print(" -> subscription: accept default") + key("Enter") + elif "location" in prompt or "region" in prompt: + print(f" -> location: 
{LOCATION}") + select_by_text(LOCATION, delay=2) + elif "foundry project" in prompt or ("select" in prompt and "project" in prompt): + if PROJECT: + print(f" -> project: {PROJECT}") + select_by_text(PROJECT, delay=3) + else: + print(" -> default project") + key("Enter") + elif "account name" in prompt or "resource name" in prompt or "hub name" in prompt: + print(" -> accept default name") + key("Enter") + elif "model" in prompt and "capacity" not in prompt: + print(" -> default model") + key("Enter") + elif "deploy" in prompt and ("mode" in prompt or "how" in prompt) and "capacity" not in prompt: + if DEPLOY_MODE == "container": + print(" -> Container") + select_by_text("Container") + else: + print(" -> Source Code") + select_by_text("Source") + elif "what would you like to do" in prompt: + # Accept "Exit setup" (default) to finish init. + # Do NOT navigate up/down — that causes infinite loops by selecting + # "Add another model" or similar options. + print(" -> Exit setup (default)") + key("Enter") + else: + print(" -> Enter (default)") + key("Enter") + time.sleep(3) + + results["init"] = "FAIL (too many steps)" + return False + + +# =========================================================== +# PHASE 2: PROVISION +# =========================================================== +def phase_provision(): + print("\n" + "=" * 60) + print("PHASE 2: azd provision") + print("=" * 60) + + # Find the project subdirectory created by init + project_dir = None + for d in os.listdir(TESTDIR): + subdir = os.path.join(TESTDIR, d) + if os.path.isdir(subdir) and os.path.exists(os.path.join(subdir, "azure.yaml")): + project_dir = subdir + break + + if not project_dir: + print("ERROR: No project directory with azure.yaml found") + results["provision"] = "FAIL (no project dir)" + return False + + print(f"Project dir: {project_dir}") + send(f"cd {project_dir}") + key("Enter") + time.sleep(1) + + # Provision can take several minutes + print("Waiting for provision to complete (up to 10 
min)...") + cap, rc = run_cmd("azd provision --no-prompt", timeout=600) + if cap is None: + print("TIMEOUT: provision did not complete in 10 min") + show("Current state", 20) + results["provision"] = "FAIL (timeout)" + return False + + show("Provision result", 20) + if rc != 0: + print(f"Provision FAILED (exit code {rc})") + results["provision"] = f"FAIL (exit code {rc})" + return False + + print("Provision appears complete") + results["provision"] = "PASS" + return True + + +# =========================================================== +# PHASE 3: DEPLOY +# =========================================================== +def phase_deploy(): + print("\n" + "=" * 60) + print("PHASE 3: azd deploy") + print("=" * 60) + + # Deploy can take several minutes + print("Waiting for deploy to complete (up to 10 min)...") + cap, rc = run_cmd("azd deploy --no-prompt", timeout=600) + if cap is None: + print("TIMEOUT: deploy did not complete in 10 min") + show("Current state", 20) + results["deploy"] = "FAIL (timeout)" + return False + + show("Deploy result", 20) + if rc != 0: + print(f"Deploy FAILED (exit code {rc})") + results["deploy"] = f"FAIL (exit code {rc})" + return False + + print("Deploy appears complete") + results["deploy"] = "PASS" + return True + + +# =========================================================== +# PHASE 4: INVOKE +# =========================================================== +def phase_invoke(): + print("\n" + "=" * 60) + print("PHASE 4: azd ai agent invoke") + print("=" * 60) + + # Wait for agent to fully start after deploy + wait_secs = 60 if DEPLOY_MODE == "container" else 30 + print(f"Waiting {wait_secs}s for agent startup ({DEPLOY_MODE} mode)...") + time.sleep(wait_secs) + + # The invocations protocol requires JSON payload via --input-file. + # Positional message sends empty body to invocations agents (azd bug/limitation). 
+ service_name = find_service_name(TESTDIR) + if not service_name: + print("ERROR: Could not determine service name from azure.yaml") + results["invoke"] = "FAIL (no service name)" + return False + print(f" Service name: {service_name}") + + # Write payload to temp file for --input-file + payload_file = os.path.join(TESTDIR, ".invoke-payload.json") + with open(payload_file, "w") as f: + f.write('{"message": "Hello, what is 2+2?"}') + + max_retries = 3 + for attempt in range(1, max_retries + 1): + print(f"\nInvoke attempt {attempt}/{max_retries}...") + cap, rc = run_cmd( + f"azd ai agent invoke {service_name} --new-session -f {payload_file}", + timeout=180, + ) + if cap is None: + print("TIMEOUT: invoke did not complete in 3 min") + show("Current state", 20) + if attempt == max_retries: + results["invoke"] = "FAIL (timeout)" + return False + continue + + show("Invoke result", 20) + + # Check for errors + # Look for ERROR line in last few lines of output + lines = [l for l in cap.split("\n") if l.strip()] + has_error = False + error_msg = "" + if rc != 0: + for l in lines: + if "ERROR:" in l or ("error" in l.lower() and "500" in l): + has_error = True + error_msg = l.strip() + break + if not error_msg: + error_msg = f"exit code {rc}" + + if rc != 0 and has_error and ("500" in error_msg or "Internal Server Error" in error_msg): + print(f" Server error: {error_msg[:100]}") + if attempt < max_retries: + print(f" Retrying in 30s (container may still be starting)...") + time.sleep(30) + continue + else: + # Get container logs for debugging + print("\n Fetching agent logs for debugging...") + send(f"azd ai agent monitor {service_name} --tail 50") + key("Enter") + time.sleep(10) + log_cap = _wait_for_shell_prompt_legacy(timeout=60) + if log_cap: + show("Agent logs", 30) + results["invoke"] = f"FAIL (HTTP 500: {error_msg[:80]})" + return False + elif rc != 0: + print(f" Error: {error_msg[:100]}") + if attempt < max_retries: + time.sleep(15) + continue + results["invoke"] = 
f"FAIL ({error_msg[:80]})" + return False + else: + # Success — verify response content + # Extract lines between the LAST invoke command and its sentinel. + # The capture may contain output from previous phases, so we must + # find the last occurrence of the invoke command to avoid matching + # stale sentinels from earlier phases (deploy, provision, etc.). + all_lines = cap.split("\n") + # Find the last line that contains the invoke command + invoke_start = -1 + for i in range(len(all_lines) - 1, -1, -1): + if "invoke" in all_lines[i].lower() and service_name in all_lines[i]: + invoke_start = i + break + + resp_lines = [] + if invoke_start >= 0: + for line in all_lines[invoke_start + 1:]: + if _SENTINEL_BASE in line: + break + if line.strip(): + resp_lines.append(line.strip()) + + response_text = "\n".join(resp_lines) + if not response_text.strip(): + print(" WARNING: invoke returned empty response") + if attempt < max_retries: + print(" Retrying...") + time.sleep(15) + continue + results["invoke"] = "FAIL (empty response)" + return False + + # Payload asks "what is 2+2?" 
— response should contain "4" + has_expected = "4" in response_text + print(f" Response ({len(response_text)} chars): {response_text[:120]}") + if not has_expected: + print(" FAIL: response does not contain expected '4'") + results["invoke"] = "FAIL (response missing '4')" + return False + + results["invoke"] = "PASS" + return True + + results["invoke"] = "FAIL (all retries exhausted)" + return False + + +# =========================================================== +# PHASE 5: TEARDOWN +# =========================================================== +def phase_teardown(): + print("\n" + "=" * 60) + print("PHASE 5: azd down (teardown)") + print("=" * 60) + + print("Waiting for teardown (up to 10 min)...") + cap, rc = run_cmd("azd down --force --purge --no-prompt", timeout=600) + if cap is None: + print("TIMEOUT: teardown did not complete") + show("Current state", 20) + results["teardown"] = "FAIL (timeout)" + return False + + show("Teardown result", 20) + if rc != 0: + print(f"Teardown FAILED (exit code {rc})") + results["teardown"] = f"FAIL (exit code {rc})" + return False + print("Teardown complete") + results["teardown"] = "PASS" + return True + + +# =========================================================== +# MAIN +# =========================================================== +if __name__ == "__main__": + import argparse + parser = argparse.ArgumentParser() + parser.add_argument("--keep", action="store_true", help="Keep deployed agent (skip teardown)") + parser.add_argument("--deploy-mode", choices=["code", "container"], default="code", + help="Deploy mode: 'code' (Source Code) or 'container' (Container)") + args = parser.parse_args() + DEPLOY_MODE = args.deploy_mode + + print(f"Deploy mode: {DEPLOY_MODE}") + start_time = time.time() + + setup() + + # Run phases sequentially + init_ok = phase_init() + if init_ok: + if phase_provision(): + if phase_deploy(): + phase_invoke() + if not args.keep: + phase_teardown() + else: + print("\n>>> --keep flag: skipping 
teardown, agent remains deployed <<<") + results["teardown"] = "SKIPPED (--keep)" + else: + if not args.keep: + phase_teardown() + else: + # Init failed — but may have already created Azure resources (RG, project). + # Attempt cleanup if there's a .azure directory indicating provisioned state. + project_dir = None + if os.path.isdir(TESTDIR): + for d in os.listdir(TESTDIR): + azure_dir = os.path.join(TESTDIR, d, ".azure") + if os.path.isdir(azure_dir): + project_dir = os.path.join(TESTDIR, d) + break + if project_dir and not args.keep: + print(f"\nInit failed but found .azure in {project_dir} — attempting cleanup...") + send(f"cd {project_dir}") + key("Enter") + time.sleep(1) + phase_teardown() + + # Cleanup tmux + tmux("kill-session", "-t", SESS) + + elapsed = time.time() - start_time + print("\n" + "=" * 60) + print(f"RESULTS (elapsed: {elapsed:.0f}s)") + print("=" * 60) + all_pass = True + for phase, result in results.items(): + status = "✓" if "PASS" in result or "SKIPPED" in result else "✗" + print(f" {status} {phase}: {result}") + if "FAIL" in result: + all_pass = False + + required = ["init", "provision", "deploy", "invoke"] + passed_required = all(results.get(p, "").startswith("PASS") for p in required) + + if passed_required: + print("\n✓ ALL REQUIRED PHASES PASSED") + sys.exit(0) + else: + missing = [p for p in required if not results.get(p, "").startswith("PASS")] + print(f"\n✗ FAILED PHASES: {', '.join(missing)}") + sys.exit(1) diff --git a/cli/azd/extensions/azure.ai.agents/tests/e2e-live/test_tier2.py b/cli/azd/extensions/azure.ai.agents/tests/e2e-live/test_tier2.py new file mode 100644 index 00000000000..8d48f106873 --- /dev/null +++ b/cli/azd/extensions/azure.ai.agents/tests/e2e-live/test_tier2.py @@ -0,0 +1,173 @@ +#!/usr/bin/env python3 +"""Tier 2: Full E2E golden path tests — code deploy + container deploy in parallel. 
+ +Runs two instances of test_full_e2e.py simultaneously with different: + - deploy modes (code vs container) + - tmux session/socket names + - working directories + +Note: Agent names are derived from template defaults in separate directories. +Each instance uses its own isolated tmux socket and test directory. + +Prerequisites: + - Same as test_full_e2e.py (WSL, tmux, azd, az CLI, tokens) + - Sufficient quota for 2 concurrent deployments +""" +import subprocess +import sys +import os +import time +import tempfile +import shutil +from concurrent.futures import ThreadPoolExecutor, as_completed + +SCRIPT_DIR = os.path.dirname(os.path.abspath(__file__)) + + +def _cleanup_leaked_resources(testdir, env, label): + """Best-effort azd down for any project dirs left behind after timeout/crash.""" + if not os.path.isdir(testdir): + return + for d in os.listdir(testdir): + project_dir = os.path.join(testdir, d) + azure_yaml = os.path.join(project_dir, "azure.yaml") + if os.path.isdir(project_dir) and os.path.isfile(azure_yaml): + print(f" [{label}] Cleaning up leaked resources in {project_dir}...") + try: + subprocess.run( + ["azd", "down", "--force", "--purge", "--no-prompt"], + cwd=project_dir, env=env, + capture_output=True, text=True, timeout=300, + ) + print(f" [{label}] Cleanup complete") + except Exception as e: + print(f" [{label}] Cleanup failed: {e}") + + +def run_e2e(deploy_mode, label): + """Run a full E2E test with the given deploy mode.""" + sock = f"e2e-{deploy_mode}" + sess = f"e2e-{deploy_mode}" + testdir = f"/tmp/e2e-tests/tier2-{deploy_mode}" + + script_path = os.path.join(SCRIPT_DIR, "test_full_e2e.py") + + cmd = [ + "python3", script_path, "--deploy-mode", deploy_mode + ] + + env = os.environ.copy() + env["E2E_DEPLOY_MODE"] = deploy_mode + env["E2E_SOCK"] = sock + env["E2E_SESS"] = sess + env["E2E_TESTDIR"] = testdir + # Isolate azd config per process to prevent parallel race on ~/.azd/config.json + # Use AZD_CONFIG_DIR (not AZURE_CONFIG_DIR which is for 
az CLI). + # Place outside testdir because child process rm -rf's testdir on startup. + # Copy from default ~/.azd so extensions (installed there) are available. + azd_config_dir = os.path.join(tempfile.gettempdir(), f"e2e-azd-config-{deploy_mode}") + default_azd = os.path.expanduser("~/.azd") + if os.path.isdir(default_azd): + if os.path.isdir(azd_config_dir): + shutil.rmtree(azd_config_dir) + shutil.copytree(default_azd, azd_config_dir) + else: + os.makedirs(azd_config_dir, exist_ok=True) + env["AZD_CONFIG_DIR"] = azd_config_dir + # Unique agent name to avoid Azure resource collisions in parallel runs + import hashlib + unique_suffix = hashlib.md5(f"{deploy_mode}-{os.getpid()}".encode()).hexdigest()[:6] + env["E2E_AGENT_NAME"] = f"e2e-{deploy_mode}-{unique_suffix}" + + print(f"\n{'='*60}") + print(f"[{label}] Starting: deploy_mode={deploy_mode}, sock={sock}") + print(f"{'='*60}") + + start = time.time() + try: + r = subprocess.run( + cmd, env=env, + capture_output=True, text=True, timeout=1500 # 25 min max per test + ) + elapsed = time.time() - start + success = r.returncode == 0 + + # Print output + print(f"\n--- [{label}] Output ({elapsed:.0f}s) ---") + lines = r.stdout.strip().split("\n") + for line in lines[-30:]: + print(f" {line}") + if r.stderr.strip(): + print(f" [stderr] {r.stderr.strip()[:200]}") + + return { + "label": label, + "deploy_mode": deploy_mode, + "success": success, + "elapsed": elapsed, + "returncode": r.returncode, + } + except subprocess.TimeoutExpired: + elapsed = time.time() - start + print(f"\n--- [{label}] TIMEOUT after {elapsed:.0f}s ---") + # Attempt cleanup: find any azure.yaml and run azd down to prevent resource leak. 
+ _cleanup_leaked_resources(testdir, env, label) + return { + "label": label, + "deploy_mode": deploy_mode, + "success": False, + "elapsed": elapsed, + "returncode": -1, + } + + +if __name__ == "__main__": + import argparse + parser = argparse.ArgumentParser(description="Tier 2: Parallel golden path E2E tests") + parser.add_argument("--serial", action="store_true", help="Run sequentially instead of parallel") + parser.add_argument("--mode", choices=["both", "code", "container"], default="both", + help="Which mode(s) to run") + args = parser.parse_args() + + print("=" * 60) + print("TIER 2: Golden Path E2E Tests") + print("=" * 60) + + tests = [] + if args.mode in ("both", "code"): + tests.append(("code", "CODE-DEPLOY")) + if args.mode in ("both", "container"): + tests.append(("container", "CONTAINER-DEPLOY")) + + print(f" Tests: {[t[1] for t in tests]}") + print(f" Parallel: {not args.serial}") + + start_all = time.time() + results = [] + + if args.serial or len(tests) == 1 or len(tests) > 1: + # Always run sequentially — parallel causes resource conflicts in same subscription + for mode, label in tests: + result = run_e2e(mode, label) + results.append(result) + + total_elapsed = time.time() - start_all + + # Summary + print(f"\n{'='*60}") + print(f"TIER 2 RESULTS ({total_elapsed:.0f}s total)") + print("=" * 60) + all_pass = True + for r in results: + status = "✓" if r["success"] else "✗" + print(f" {status} {r['label']}: {'PASS' if r['success'] else 'FAIL'} ({r['elapsed']:.0f}s)") + if not r["success"]: + all_pass = False + + if all_pass: + print(f"\n✓ ALL TIER 2 TESTS PASSED ({total_elapsed:.0f}s)") + sys.exit(0) + else: + failed = [r["label"] for r in results if not r["success"]] + print(f"\n✗ FAILED: {', '.join(failed)}") + sys.exit(1) diff --git a/eng/pipelines/ext-azure-ai-agents-live.yml b/eng/pipelines/ext-azure-ai-agents-live.yml new file mode 100644 index 00000000000..890289996c4 --- /dev/null +++ b/eng/pipelines/ext-azure-ai-agents-live.yml @@ -0,0 
+1,185 @@ +# Live E2E: azure.ai.agents extension — Tier 2 golden path +# +# Runs the full agent lifecycle (init -> provision -> deploy -> invoke -> down) +# against LIVE Azure resources, driving the real `azd ai agent` CLI through tmux. +# +# This pipeline is the live counterpart to the PR-gate checks in +# `.github/workflows/lint-ext-azure-ai-agents.yml` (Tier 0 offline + Tier 1 +# recording/playback). Live Azure access is intentionally kept OUT of the +# automatic PR pipeline (per Azure SDK EngSys / SFI guidance) and runs here only: +# - On demand via the PR comment: /azp run ext-azure-ai-agents-live +# (requires write permission on the repo) +# - On the weekly schedule below. +# +# Required ADO setup (one-time, admin) — see tests/e2e-live/README.md: +# - Register this YAML as a pipeline named `ext-azure-ai-agents-live`. +# - Service connection (parameter `serviceConnection`, default `azure-sdk-tests`) +# must map to the shared TME test subscription with Contributor + the RBAC +# needed to create Foundry projects and deploy models. +# - Optional secret variable `GitHubPat`: a GitHub PAT used when cloning the +# starter template, to avoid anonymous GitHub rate limits. + +trigger: none +pr: none + +schedules: + # 7am UTC Monday (offset from other weekly E2E pipelines to reduce contention). 
+ - cron: "0 7 * * 1" + displayName: Weekly live golden-path E2E + branches: + include: + - main + always: true + +parameters: + - name: deployModes + displayName: "Tier 2 deploy modes" + type: string + default: both + values: + - both + - code + - container + - name: serviceConnection + displayName: "Azure service connection (TME subscription)" + type: string + default: azure-sdk-tests + +extends: + template: /eng/pipelines/templates/stages/1es-redirect.yml + parameters: + stages: + - stage: AiAgentsLiveE2E + displayName: AI Agents Live Golden Path + variables: + - template: /eng/pipelines/templates/variables/image.yml + - name: GitHubPat + value: "" + jobs: + - job: Tier2 + displayName: Tier 2 — init/provision/deploy/invoke/down + pool: + name: $(LINUXPOOL) + image: $(LINUXVMIMAGE) + os: linux + # Two golden paths (code + container) run sequentially, ~13-15 min + # each, plus build/provision overhead. + timeoutInMinutes: 90 + steps: + - checkout: self + + - template: /eng/pipelines/templates/steps/setup-go.yml + + - task: UsePythonVersion@0 + inputs: + versionSpec: "3.12" + displayName: Use Python 3.12 + + - bash: | + set -euo pipefail + sudo apt-get update + sudo apt-get install -y tmux + tmux -V + displayName: Install tmux + + # Live build — NO `-tags=record`, so the CLI/extension talk to real + # Azure instead of the recording proxy used by the PR-gate tests. + - bash: go build -o azd . + workingDirectory: cli/azd + displayName: Build azd + + - bash: go build -o azure-ai-agents . + workingDirectory: cli/azd/extensions/azure.ai.agents + displayName: Build azure.ai.agents extension + + - bash: echo "##vso[task.prependpath]$(Build.SourcesDirectory)/cli/azd" + displayName: Add azd to PATH + + # Install the freshly built extension into the azd config dir. + # Mirrors the install used by lint-ext-azure-ai-agents.yml, but with + # the live (non-record) binary. 
+ - bash: | + set -euo pipefail + EXT_DIR="$HOME/.azd/extensions/azure.ai.agents" + mkdir -p "$EXT_DIR" + cp cli/azd/extensions/azure.ai.agents/azure-ai-agents "$EXT_DIR/azure-ai-agents-linux-amd64" + chmod +x "$EXT_DIR/azure-ai-agents-linux-amd64" + cat > "$HOME/.azd/config.json" << 'EOF' + { + "extension": { + "installed": { + "azure.ai.agents": { + "id": "azure.ai.agents", + "namespace": "ai.agent", + "capabilities": ["custom-commands", "lifecycle-events", "mcp-server", "service-target-provider", "metadata"], + "displayName": "Foundry agents (Preview)", + "description": "Ship agents with Microsoft Foundry from your terminal. (Preview)", + "version": "0.0.0-test", + "usage": "azd ai agent [options]", + "path": "extensions/azure.ai.agents/azure-ai-agents-linux-amd64", + "source": "azd" + } + } + } + } + EOF + displayName: Install azure.ai.agents extension + + # OIDC login via the service connection. Leaves an authenticated az + # CLI session on the agent (used by azd via auth.useAzCliAuth) and + # exports the resolved subscription/tenant for the test driver. + - task: AzureCLI@2 + displayName: Azure Login (TME) + inputs: + azureSubscription: ${{ parameters.serviceConnection }} + scriptType: bash + scriptLocation: inlineScript + inlineScript: | + set -euo pipefail + SUB_ID=$(az account show --query id -o tsv) + TENANT_ID=$(az account show --query tenantId -o tsv) + echo "##vso[task.setvariable variable=SubscriptionId;issecret=false]$SUB_ID" + echo "##vso[task.setvariable variable=TenantId;issecret=false]$TENANT_ID" + echo "Logged in. 
Subscription: $SUB_ID" + + - bash: azd config set auth.useAzCliAuth true + displayName: Configure azd to use az CLI auth + + - bash: | + set -o pipefail + mkdir -p "$(Build.ArtifactStagingDirectory)/logs" + python3 test_tier2.py --mode ${{ parameters.deployModes }} 2>&1 \ + | tee "$(Build.ArtifactStagingDirectory)/logs/tier2.log" + workingDirectory: cli/azd/extensions/azure.ai.agents/tests/e2e-live + timeoutInMinutes: 80 + displayName: Run Tier 2 live golden path + env: + E2E_SUBSCRIPTION: $(SubscriptionId) + E2E_TENANT: $(TenantId) + E2E_CREATE_PROJECT: "true" + E2E_LOCATION: eastus2 + E2E_USE_AZ_CLI_AUTH: "true" + # Optional GitHub PAT to avoid anonymous clone rate limits. + GH_TOKEN: $(GitHubPat) + + - task: PublishPipelineArtifact@1 + condition: always() + inputs: + targetPath: $(Build.ArtifactStagingDirectory)/logs + artifactName: tier2-live-logs-$(Build.BuildId) + displayName: Publish test logs + + # Safety net: the test self-tears-down via `azd down`, but if it + # crashed mid-run, force-purge any leftover project environments. + - bash: | + echo "Best-effort teardown of any leaked resources..." + for dir in /tmp/e2e-tests/tier2-*/; do + [ -d "$dir" ] || continue + proj=$(find "$dir" -maxdepth 2 -name azure.yaml -exec dirname {} \; | head -1) + if [ -n "$proj" ]; then + ( cd "$proj" && azd down --force --purge --no-prompt ) 2>/dev/null || true + fi + done + condition: always() + continueOnError: true + displayName: Cleanup leaked Azure resources From b877e6ed25a6a4de74cdbaeb4a955aaf26c53568 Mon Sep 17 00:00:00 2001 From: Jian Wu Date: Mon, 22 Jun 2026 19:54:45 +0800 Subject: [PATCH 02/33] Fix live pipeline auth: run test/cleanup inside AzureCLI@2 with keepAzSessionActive The azure-sdk-tests service connection uses Workload Identity Federation, whose az session is isolated to the task's private AZURE_CONFIG_DIR and expires after ~10 min. 
Running the ~50 min golden-path test (and the cleanup) as plain bash steps after a separate login step would fail auth on both counts. Run them inside AzureCLI@2 with keepAzSessionActive:true (matching build-cli.yml) so the session stays refreshed and reaches azd (auth.useAzCliAuth) through tmux, which inherits AZURE_CONFIG_DIR. Subscription/tenant are now read in-script via az account show instead of cross-step pipeline variables. --- eng/pipelines/ext-azure-ai-agents-live.yml | 78 ++++++++++++---------- 1 file changed, 42 insertions(+), 36 deletions(-) diff --git a/eng/pipelines/ext-azure-ai-agents-live.yml b/eng/pipelines/ext-azure-ai-agents-live.yml index 890289996c4..2a54ef8464b 100644 --- a/eng/pipelines/ext-azure-ai-agents-live.yml +++ b/eng/pipelines/ext-azure-ai-agents-live.yml @@ -125,37 +125,33 @@ extends: EOF displayName: Install azure.ai.agents extension - # OIDC login via the service connection. Leaves an authenticated az - # CLI session on the agent (used by azd via auth.useAzCliAuth) and - # exports the resolved subscription/tenant for the test driver. + # Run the live golden path INSIDE the AzureCLI@2 task so the az CLI + # session (consumed by azd via auth.useAzCliAuth) stays valid for the + # whole run. keepAzSessionActive is REQUIRED: the service connection + # uses Workload Identity Federation and the test runs well past the + # ~10 min default token lifetime. A separate login step would NOT + # work — AzureCLI@2 isolates AZURE_CONFIG_DIR to a task-temp dir, so + # the session does not persist to later plain bash steps. 
- task: AzureCLI@2 - displayName: Azure Login (TME) + displayName: Run Tier 2 live golden path + timeoutInMinutes: 80 inputs: azureSubscription: ${{ parameters.serviceConnection }} + keepAzSessionActive: true + visibleAzLogin: false scriptType: bash scriptLocation: inlineScript + workingDirectory: cli/azd/extensions/azure.ai.agents/tests/e2e-live inlineScript: | - set -euo pipefail - SUB_ID=$(az account show --query id -o tsv) - TENANT_ID=$(az account show --query tenantId -o tsv) - echo "##vso[task.setvariable variable=SubscriptionId;issecret=false]$SUB_ID" - echo "##vso[task.setvariable variable=TenantId;issecret=false]$TENANT_ID" - echo "Logged in. Subscription: $SUB_ID" - - - bash: azd config set auth.useAzCliAuth true - displayName: Configure azd to use az CLI auth - - - bash: | - set -o pipefail - mkdir -p "$(Build.ArtifactStagingDirectory)/logs" - python3 test_tier2.py --mode ${{ parameters.deployModes }} 2>&1 \ - | tee "$(Build.ArtifactStagingDirectory)/logs/tier2.log" - workingDirectory: cli/azd/extensions/azure.ai.agents/tests/e2e-live - timeoutInMinutes: 80 - displayName: Run Tier 2 live golden path + set -o pipefail + azd config set auth.useAzCliAuth true + export E2E_SUBSCRIPTION="$(az account show --query id -o tsv)" + export E2E_TENANT="$(az account show --query tenantId -o tsv)" + echo "Using subscription: $E2E_SUBSCRIPTION" + mkdir -p "$(Build.ArtifactStagingDirectory)/logs" + python3 test_tier2.py --mode ${{ parameters.deployModes }} 2>&1 \ + | tee "$(Build.ArtifactStagingDirectory)/logs/tier2.log" env: - E2E_SUBSCRIPTION: $(SubscriptionId) - E2E_TENANT: $(TenantId) E2E_CREATE_PROJECT: "true" E2E_LOCATION: eastus2 E2E_USE_AZ_CLI_AUTH: "true" @@ -165,21 +161,31 @@ extends: - task: PublishPipelineArtifact@1 condition: always() inputs: - targetPath: $(Build.ArtifactStagingDirectory)/logs + targetPath: $(Build.ArtifactStagingDirectory) artifactName: tier2-live-logs-$(Build.BuildId) displayName: Publish test logs - # Safety net: the test 
self-tears-down via `azd down`, but if it - # crashed mid-run, force-purge any leftover project environments. - - bash: | - echo "Best-effort teardown of any leaked resources..." - for dir in /tmp/e2e-tests/tier2-*/; do - [ -d "$dir" ] || continue - proj=$(find "$dir" -maxdepth 2 -name azure.yaml -exec dirname {} \; | head -1) - if [ -n "$proj" ]; then - ( cd "$proj" && azd down --force --purge --no-prompt ) 2>/dev/null || true - fi - done + # Safety net for hard crashes / step timeout: the in-test teardown + # runs `azd down` already, but if the run died mid-way, force-purge + # any leftover project environments. Must run inside AzureCLI@2 so it + # is authenticated — the previous task's az session does not persist. + - task: AzureCLI@2 condition: always() continueOnError: true displayName: Cleanup leaked Azure resources + inputs: + azureSubscription: ${{ parameters.serviceConnection }} + keepAzSessionActive: true + visibleAzLogin: false + scriptType: bash + scriptLocation: inlineScript + inlineScript: | + azd config set auth.useAzCliAuth true + echo "Best-effort teardown of any leaked resources..." + for dir in /tmp/e2e-tests/tier2-*/; do + [ -d "$dir" ] || continue + proj=$(find "$dir" -maxdepth 2 -name azure.yaml -exec dirname {} \; | head -1) + if [ -n "$proj" ]; then + ( cd "$proj" && azd down --force --purge --no-prompt ) 2>/dev/null || true + fi + done From 9c906b1def1630bb8ace67efd498ef0f22c770d9 Mon Sep 17 00:00:00 2001 From: Jian Wu Date: Mon, 22 Jun 2026 20:08:11 +0800 Subject: [PATCH 03/33] Clean up Tier 2 driver: drop dead parallel-mode code, fix doc ref test_tier2.py always ran sequentially, but kept a tautological if-condition (len==1 or len>1), an unused concurrent.futures import, a no-op --serial flag, and a docstring/print claiming parallel execution. Simplify to an explicit sequential loop and update the docstring to match. Also fix test_full_e2e.py's module docstring to point at README.md (LOCAL-TEST-GUIDE.md does not exist). 
--- .../tests/e2e-live/test_full_e2e.py | 2 +- .../tests/e2e-live/test_tier2.py | 34 +++++++++---------- 2 files changed, 17 insertions(+), 19 deletions(-) diff --git a/cli/azd/extensions/azure.ai.agents/tests/e2e-live/test_full_e2e.py b/cli/azd/extensions/azure.ai.agents/tests/e2e-live/test_full_e2e.py index d04f0b18bf3..7e5aeed9085 100644 --- a/cli/azd/extensions/azure.ai.agents/tests/e2e-live/test_full_e2e.py +++ b/cli/azd/extensions/azure.ai.agents/tests/e2e-live/test_full_e2e.py @@ -1,7 +1,7 @@ #!/usr/bin/env python3 """Full E2E test: init -> provision -> deploy -> invoke -> down. -See LOCAL-TEST-GUIDE.md for complete setup & run instructions. +See README.md for complete setup & run instructions. Prerequisites: - WSL with tmux (>=3.4), Python 3.12+ diff --git a/cli/azd/extensions/azure.ai.agents/tests/e2e-live/test_tier2.py b/cli/azd/extensions/azure.ai.agents/tests/e2e-live/test_tier2.py index 8d48f106873..3b83c564338 100644 --- a/cli/azd/extensions/azure.ai.agents/tests/e2e-live/test_tier2.py +++ b/cli/azd/extensions/azure.ai.agents/tests/e2e-live/test_tier2.py @@ -1,17 +1,17 @@ #!/usr/bin/env python3 -"""Tier 2: Full E2E golden path tests — code deploy + container deploy in parallel. +"""Tier 2: Full E2E golden path tests — code deploy + container deploy. -Runs two instances of test_full_e2e.py simultaneously with different: - - deploy modes (code vs container) - - tmux session/socket names - - working directories - -Note: Agent names are derived from template defaults in separate directories. -Each instance uses its own isolated tmux socket and test directory. +Runs test_full_e2e.py once per deploy mode (code, then container), sequentially. 
+Each run is isolated with its own: + - deploy mode (code vs container) + - tmux session/socket name + - working directory + - AZD_CONFIG_DIR (copied from ~/.azd so the installed extension is available) + - unique agent name (avoids Azure resource collisions) Prerequisites: - Same as test_full_e2e.py (WSL, tmux, azd, az CLI, tokens) - - Sufficient quota for 2 concurrent deployments + - Model quota for one deployment at a time """ import subprocess import sys @@ -19,7 +19,6 @@ import time import tempfile import shutil -from concurrent.futures import ThreadPoolExecutor, as_completed SCRIPT_DIR = os.path.dirname(os.path.abspath(__file__)) @@ -123,8 +122,7 @@ def run_e2e(deploy_mode, label): if __name__ == "__main__": import argparse - parser = argparse.ArgumentParser(description="Tier 2: Parallel golden path E2E tests") - parser.add_argument("--serial", action="store_true", help="Run sequentially instead of parallel") + parser = argparse.ArgumentParser(description="Tier 2: Golden path E2E tests") parser.add_argument("--mode", choices=["both", "code", "container"], default="both", help="Which mode(s) to run") args = parser.parse_args() @@ -140,16 +138,16 @@ def run_e2e(deploy_mode, label): tests.append(("container", "CONTAINER-DEPLOY")) print(f" Tests: {[t[1] for t in tests]}") - print(f" Parallel: {not args.serial}") + print(" Execution: sequential") start_all = time.time() results = [] - if args.serial or len(tests) == 1 or len(tests) > 1: - # Always run sequentially — parallel causes resource conflicts in same subscription - for mode, label in tests: - result = run_e2e(mode, label) - results.append(result) + # Always run sequentially: concurrent deploys in the same subscription race + # on shared resources (ACR, Foundry project) and exhaust model quota. 
+ for mode, label in tests: + result = run_e2e(mode, label) + results.append(result) total_elapsed = time.time() - start_all From f7d9c9837a747febbd27ecf762059143eb26ce58 Mon Sep 17 00:00:00 2001 From: Jian Wu Date: Mon, 22 Jun 2026 20:26:35 +0800 Subject: [PATCH 04/33] Address Copilot review on live Tier 2 pipeline - Use the ambient azure-sdk org secret `azuresdk-github-pat` for GH_TOKEN instead of an empty `GitHubPat` placeholder variable (mirrors eval-waza.yml); removes a misleading masked variable and the need for admin PAT setup. - Harden the AzureCLI@2 inline script: `set -euo pipefail` and assign-then-verify subscription/tenant so an `az account show` failure fails fast (a plain `export X=$(...)` would have masked the error from set -e). - Reword the extension-install comment to be self-contained (it no longer inaccurately claims to mirror lint-ext-azure-ai-agents.yml). - Clarify the test_full_e2e.py auth prerequisite: only local WSL runs leave auth.useAzCliAuth unset; CI auto-enables az CLI auth. - Clear tmux scrollback after env setup so the exported GH token cannot leak into capture() output on failures/timeouts. - _cleanup_leaked_resources now checks azd down's return code and reports failures instead of always printing "Cleanup complete". 
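
The `export X=$(...)` point in the hardening bullet above can be reproduced in isolation. This is a minimal sketch, assuming `bash` is on PATH; `false` stands in for a failing `az account show`, and the names `SUB`, `masked`, and `strict` are illustrative only, not taken from the pipeline:

```shell
#!/usr/bin/env bash
# Illustrative sketch: `false` plays the role of a failing `az account show`.

# Form 1: export and command substitution in one statement.
# The statement's exit status is that of `export` (success), so `set -e`
# does not abort and the echo still runs.
masked=$(bash -c 'set -e; export SUB="$(false)"; echo reached' 2>/dev/null || true)

# Form 2: plain assignment first.
# The statement's exit status is that of the command substitution (failure),
# so `set -e` aborts before the echo.
strict=$(bash -c 'set -e; SUB="$(false)"; echo reached' 2>/dev/null || true)

echo "export form:     '${masked}'"
echo "assignment form: '${strict}'"
```

Run as written, the export form still reaches its `echo`, while the plain assignment stops under `set -e`. That difference is why the inline script assigns first, checks for empty values, and only then exports.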
--- .../extensions/azure.ai.agents/cspell.yaml | 1 + .../azure.ai.agents/tests/e2e-live/README.md | 5 +-- .../tests/e2e-live/test_full_e2e.py | 9 ++++-- .../tests/e2e-live/test_tier2.py | 10 ++++-- eng/pipelines/ext-azure-ai-agents-live.yml | 31 ++++++++++++------- 5 files changed, 38 insertions(+), 18 deletions(-) diff --git a/cli/azd/extensions/azure.ai.agents/cspell.yaml b/cli/azd/extensions/azure.ai.agents/cspell.yaml index 8b19175da50..4a7b9563144 100644 --- a/cli/azd/extensions/azure.ai.agents/cspell.yaml +++ b/cli/azd/extensions/azure.ai.agents/cspell.yaml @@ -38,6 +38,7 @@ words: - aoai - authorizationfailed - azdaiagent + - azuresdk - CLIENTSECRET - curr - dataagent diff --git a/cli/azd/extensions/azure.ai.agents/tests/e2e-live/README.md b/cli/azd/extensions/azure.ai.agents/tests/e2e-live/README.md index e60d916e6a0..aaddc1bff0a 100644 --- a/cli/azd/extensions/azure.ai.agents/tests/e2e-live/README.md +++ b/cli/azd/extensions/azure.ai.agents/tests/e2e-live/README.md @@ -50,8 +50,9 @@ Logs for each run are published as the `tier2-live-logs-` artifact. workload-identity federation. The federated identity needs enough RBAC to create Foundry projects and deploy models (Contributor + Azure AI Developer + Cognitive Services Contributor, or equivalent). -3. **Optional `GitHubPat`** — add a secret pipeline variable with a GitHub PAT to - avoid anonymous rate limits when the template is cloned during `init`. +3. **GitHub auth** — clones of the starter template use the azure-sdk org secret + `azuresdk-github-pat` (already provided by the Azure SDK ADO project) to avoid + anonymous rate limits, so no extra secret setup is required. 
## Running locally (WSL) diff --git a/cli/azd/extensions/azure.ai.agents/tests/e2e-live/test_full_e2e.py b/cli/azd/extensions/azure.ai.agents/tests/e2e-live/test_full_e2e.py index 7e5aeed9085..44d09d036eb 100644 --- a/cli/azd/extensions/azure.ai.agents/tests/e2e-live/test_full_e2e.py +++ b/cli/azd/extensions/azure.ai.agents/tests/e2e-live/test_full_e2e.py @@ -6,7 +6,9 @@ Prerequisites: - WSL with tmux (>=3.4), Python 3.12+ - azd (>=1.25.5) with azure.ai.agents extension, logged in via `azd auth login` - - auth.useAzCliAuth must NOT be true (use `azd config unset auth.useAzCliAuth`) + - For local WSL runs, leave auth.useAzCliAuth unset (azd built-in auth, via + `azd config unset auth.useAzCliAuth`); under CI (GitHub Actions / Azure + DevOps / E2E_USE_AZ_CLI_AUTH=true) the script auto-enables az CLI auth. - GitHub token available via gh.exe or $GITHUB_TOKEN Recommended env vars: @@ -237,10 +239,13 @@ def setup(): send(env_cmd) key("Enter") time.sleep(1) - # Clear scrollback to avoid token leaking into capture output + # Clear scrollback to avoid token leaking into capture output. `clear` only + # wipes the visible screen, so also drop tmux's scrollback buffer — the GH + # token was just typed into this pane and must never resurface in capture(). 
send("clear") key("Enter") time.sleep(0.5) + tmux("clear-history", "-t", SESS) send("echo ENV_OK") key("Enter") diff --git a/cli/azd/extensions/azure.ai.agents/tests/e2e-live/test_tier2.py b/cli/azd/extensions/azure.ai.agents/tests/e2e-live/test_tier2.py index 3b83c564338..6060aab2991 100644 --- a/cli/azd/extensions/azure.ai.agents/tests/e2e-live/test_tier2.py +++ b/cli/azd/extensions/azure.ai.agents/tests/e2e-live/test_tier2.py @@ -33,12 +33,18 @@ def _cleanup_leaked_resources(testdir, env, label): if os.path.isdir(project_dir) and os.path.isfile(azure_yaml): print(f" [{label}] Cleaning up leaked resources in {project_dir}...") try: - subprocess.run( + r = subprocess.run( ["azd", "down", "--force", "--purge", "--no-prompt"], cwd=project_dir, env=env, capture_output=True, text=True, timeout=300, ) - print(f" [{label}] Cleanup complete") + if r.returncode == 0: + print(f" [{label}] Cleanup complete") + else: + print(f" [{label}] Cleanup FAILED (exit {r.returncode}) — " + f"resources may be leaked, check the subscription") + if r.stderr.strip(): + print(f" [{label}] [stderr] {r.stderr.strip()[:300]}") except Exception as e: print(f" [{label}] Cleanup failed: {e}") diff --git a/eng/pipelines/ext-azure-ai-agents-live.yml b/eng/pipelines/ext-azure-ai-agents-live.yml index 2a54ef8464b..0457f667c68 100644 --- a/eng/pipelines/ext-azure-ai-agents-live.yml +++ b/eng/pipelines/ext-azure-ai-agents-live.yml @@ -16,8 +16,9 @@ # - Service connection (parameter `serviceConnection`, default `azure-sdk-tests`) # must map to the shared TME test subscription with Contributor + the RBAC # needed to create Foundry projects and deploy models. -# - Optional secret variable `GitHubPat`: a GitHub PAT used when cloning the -# starter template, to avoid anonymous GitHub rate limits. 
+# - GitHub clones of the starter template authenticate with the azure-sdk org +# secret `azuresdk-github-pat` (already provided by the Azure SDK ADO +# project) to avoid anonymous rate limits — no extra secret setup required. trigger: none pr: none @@ -53,8 +54,6 @@ extends: displayName: AI Agents Live Golden Path variables: - template: /eng/pipelines/templates/variables/image.yml - - name: GitHubPat - value: "" jobs: - job: Tier2 displayName: Tier 2 — init/provision/deploy/invoke/down @@ -95,9 +94,9 @@ extends: - bash: echo "##vso[task.prependpath]$(Build.SourcesDirectory)/cli/azd" displayName: Add azd to PATH - # Install the freshly built extension into the azd config dir. - # Mirrors the install used by lint-ext-azure-ai-agents.yml, but with - # the live (non-record) binary. + # Install the freshly built (live, non-record) extension into the + # azd config dir: copy the binary where azd expects it and write a + # minimal config.json so `azd ai agent` resolves the extension. - bash: | set -euo pipefail EXT_DIR="$HOME/.azd/extensions/azure.ai.agents" @@ -143,10 +142,17 @@ extends: scriptLocation: inlineScript workingDirectory: cli/azd/extensions/azure.ai.agents/tests/e2e-live inlineScript: | - set -o pipefail + set -euo pipefail azd config set auth.useAzCliAuth true - export E2E_SUBSCRIPTION="$(az account show --query id -o tsv)" - export E2E_TENANT="$(az account show --query tenantId -o tsv)" + # Assign first (not `export X=$(...)`, which hides command + # substitution failures from set -e), then verify non-empty. 
+ E2E_SUBSCRIPTION="$(az account show --query id -o tsv)" + E2E_TENANT="$(az account show --query tenantId -o tsv)" + if [ -z "$E2E_SUBSCRIPTION" ] || [ -z "$E2E_TENANT" ]; then + echo "ERROR: failed to resolve subscription/tenant from az account show" >&2 + exit 1 + fi + export E2E_SUBSCRIPTION E2E_TENANT echo "Using subscription: $E2E_SUBSCRIPTION" mkdir -p "$(Build.ArtifactStagingDirectory)/logs" python3 test_tier2.py --mode ${{ parameters.deployModes }} 2>&1 \ @@ -155,8 +161,9 @@ extends: E2E_CREATE_PROJECT: "true" E2E_LOCATION: eastus2 E2E_USE_AZ_CLI_AUTH: "true" - # Optional GitHub PAT to avoid anonymous clone rate limits. - GH_TOKEN: $(GitHubPat) + # azure-sdk org PAT (ambient in the ADO project) used only to + # avoid anonymous GitHub rate limits when cloning the template. + GH_TOKEN: $(azuresdk-github-pat) - task: PublishPipelineArtifact@1 condition: always() From 2518d8f9ec6936d3030a9d1fa5fbdf6dff978c9a Mon Sep 17 00:00:00 2001 From: Jian Wu Date: Mon, 22 Jun 2026 20:45:17 +0800 Subject: [PATCH 05/33] Address second Copilot review pass on live Tier 2 pipeline - Stream child E2E output live with a watchdog-enforced hard timeout instead of buffering everything via capture_output - Shell-escape the GitHub token (shlex.quote) before exporting in tmux - Clean up the per-mode AZD_CONFIG_DIR temp copy unless E2E_KEEP_ARTIFACTS - Use sha256 instead of md5 for the agent-name uniqueness suffix - Derive the agent binary arch from uname -m instead of hard-coding amd64 --- .../tests/e2e-live/test_full_e2e.py | 5 +- .../tests/e2e-live/test_tier2.py | 81 ++++++++++++------- eng/pipelines/ext-azure-ai-agents-live.yml | 16 +++- 3 files changed, 68 insertions(+), 34 deletions(-) diff --git a/cli/azd/extensions/azure.ai.agents/tests/e2e-live/test_full_e2e.py b/cli/azd/extensions/azure.ai.agents/tests/e2e-live/test_full_e2e.py index 44d09d036eb..b18d5dd7d16 100644 --- a/cli/azd/extensions/azure.ai.agents/tests/e2e-live/test_full_e2e.py +++ 
b/cli/azd/extensions/azure.ai.agents/tests/e2e-live/test_full_e2e.py @@ -22,6 +22,7 @@ import os import re +import shlex TMUX = os.environ.get("E2E_TMUX", "tmux") SOCK = os.environ.get("E2E_SOCK", "e2e") @@ -38,7 +39,7 @@ # Inherit full parent PATH so tmux sessions get az-wrapper, azd, etc. PARENT_PATH = os.environ.get("PATH", f"{HOME_DIR}/bin:/usr/local/bin:/usr/bin:/bin") _tenant_env = f"; export AZURE_TENANT_ID={TENANT}" if TENANT else "" -_gh_env = f"; export GH_TOKEN={GH_TOKEN}; export GITHUB_TOKEN={GH_TOKEN}" if GH_TOKEN else "" +_gh_env = f"; export GH_TOKEN={shlex.quote(GH_TOKEN)}; export GITHUB_TOKEN={shlex.quote(GH_TOKEN)}" if GH_TOKEN else "" ENV_SETUP = f"export HOME={HOME_DIR}; export PATH={PARENT_PATH}{_tenant_env}{_gh_env}" # Track results @@ -234,7 +235,7 @@ def setup(): gh_token = get_gh_token() env_cmd = ENV_SETUP if gh_token: - env_cmd += f"; export GH_TOKEN={gh_token}; export GITHUB_TOKEN={gh_token}" + env_cmd += f"; export GH_TOKEN={shlex.quote(gh_token)}; export GITHUB_TOKEN={shlex.quote(gh_token)}" print(f"GitHub token: {len(gh_token)} chars") send(env_cmd) key("Enter") diff --git a/cli/azd/extensions/azure.ai.agents/tests/e2e-live/test_tier2.py b/cli/azd/extensions/azure.ai.agents/tests/e2e-live/test_tier2.py index 6060aab2991..07bd9ed1474 100644 --- a/cli/azd/extensions/azure.ai.agents/tests/e2e-live/test_tier2.py +++ b/cli/azd/extensions/azure.ai.agents/tests/e2e-live/test_tier2.py @@ -19,6 +19,9 @@ import time import tempfile import shutil +import hashlib +import collections +import threading SCRIPT_DIR = os.path.dirname(os.path.abspath(__file__)) @@ -79,51 +82,73 @@ def run_e2e(deploy_mode, label): else: os.makedirs(azd_config_dir, exist_ok=True) env["AZD_CONFIG_DIR"] = azd_config_dir - # Unique agent name to avoid Azure resource collisions in parallel runs - import hashlib - unique_suffix = hashlib.md5(f"{deploy_mode}-{os.getpid()}".encode()).hexdigest()[:6] + # Unique agent name to avoid Azure resource collisions across runs. 
+    # sha256 (not md5) only to avoid noise from security scanners — this is a
+    # non-cryptographic uniqueness suffix.
+    unique_suffix = hashlib.sha256(f"{deploy_mode}-{os.getpid()}".encode()).hexdigest()[:6]
     env["E2E_AGENT_NAME"] = f"e2e-{deploy_mode}-{unique_suffix}"

     print(f"\n{'='*60}")
     print(f"[{label}] Starting: deploy_mode={deploy_mode}, sock={sock}")
     print(f"{'='*60}")

+    timeout_s = 1500 # 25 min hard cap per test
+    keep_artifacts = os.environ.get("E2E_KEEP_ARTIFACTS", "").lower() in ("1", "true", "yes")
     start = time.time()
     try:
-        r = subprocess.run(
-            cmd, env=env,
-            capture_output=True, text=True, timeout=1500 # 25 min max per test
+        # Stream child output live (visible in the CI log, nothing buffered in
+        # memory) while keeping a bounded tail for the summary. A watchdog timer
+        # enforces the hard timeout even if the child hangs without any output.
+        tail = collections.deque(maxlen=30)
+        proc = subprocess.Popen(
+            cmd, env=env, text=True, bufsize=1,
+            stdout=subprocess.PIPE, stderr=subprocess.STDOUT,
         )
+        assert proc.stdout is not None # stdout=PIPE guarantees this
+        timed_out = threading.Event()
+
+        def _on_timeout():
+            timed_out.set()
+            proc.kill()
+
+        watchdog = threading.Timer(timeout_s, _on_timeout)
+        watchdog.start()
+        try:
+            for line in proc.stdout:
+                sys.stdout.write(line)
+                sys.stdout.flush()
+                tail.append(line.rstrip("\n"))
+        finally:
+            watchdog.cancel()
+        returncode = proc.wait()
         elapsed = time.time() - start
-        success = r.returncode == 0

-        # Print output
-        print(f"\n--- [{label}] Output ({elapsed:.0f}s) ---")
-        lines = r.stdout.strip().split("\n")
-        for line in lines[-30:]:
+        if timed_out.is_set():
+            print(f"\n--- [{label}] TIMEOUT after {elapsed:.0f}s ---")
+            # Best-effort cleanup so a hung run does not leak Azure resources.
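The streaming-plus-watchdog pattern introduced in this hunk can be tried in isolation. The following is a minimal sketch, not code from the patch: the helper name `run_with_watchdog` and its return shape are assumptions made for illustration, and it omits the tmux and Azure cleanup that the real timeout branch performs.

```python
import collections
import subprocess
import sys
import threading


def run_with_watchdog(cmd, timeout_s, tail_lines=30):
    """Stream a child's output live and hard-kill it after timeout_s seconds.

    Hypothetical helper: returns (returncode, timed_out, tail).
    """
    tail = collections.deque(maxlen=tail_lines)  # bounded tail for a summary
    proc = subprocess.Popen(
        cmd, text=True, bufsize=1,
        stdout=subprocess.PIPE, stderr=subprocess.STDOUT,
    )
    if proc.stdout is None:
        raise RuntimeError("subprocess stdout pipe was not created")
    timed_out = threading.Event()

    def _on_timeout():
        # Runs on the timer thread: record the timeout, then kill the child.
        timed_out.set()
        proc.kill()

    watchdog = threading.Timer(timeout_s, _on_timeout)
    watchdog.start()
    try:
        # Iteration ends when the child closes stdout (exit or kill).
        for line in proc.stdout:
            sys.stdout.write(line)
            sys.stdout.flush()
            tail.append(line.rstrip("\n"))
    finally:
        watchdog.cancel()
    returncode = proc.wait()
    return returncode, timed_out.is_set(), list(tail)
```

Because the timer fires independently of the read loop, a child that hangs without printing anything is still killed, which a per-line read timeout alone would not guarantee.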
+            _cleanup_leaked_resources(testdir, env, label)
+            return {
+                "label": label,
+                "deploy_mode": deploy_mode,
+                "success": False,
+                "elapsed": elapsed,
+                "returncode": -1,
+            }
+
+        print(f"\n--- [{label}] Summary ({elapsed:.0f}s, exit {returncode}) ---")
+        for line in tail:
             print(f" {line}")
-        if r.stderr.strip():
-            print(f" [stderr] {r.stderr.strip()[:200]}")
-
-        return {
-            "label": label,
-            "deploy_mode": deploy_mode,
-            "success": success,
-            "elapsed": elapsed,
-            "returncode": r.returncode,
-        }
-    except subprocess.TimeoutExpired:
-        elapsed = time.time() - start
-        print(f"\n--- [{label}] TIMEOUT after {elapsed:.0f}s ---")
-        # Attempt cleanup: find any azure.yaml and run azd down to prevent resource leak.
-        _cleanup_leaked_resources(testdir, env, label)
         return {
             "label": label,
             "deploy_mode": deploy_mode,
-            "success": False,
+            "success": returncode == 0,
             "elapsed": elapsed,
-            "returncode": -1,
+            "returncode": returncode,
         }
+    finally:
+        # Drop the per-mode AZD_CONFIG_DIR copy unless explicitly kept for debugging.
+        if not keep_artifacts and os.path.isdir(azd_config_dir):
+            shutil.rmtree(azd_config_dir, ignore_errors=True)


 if __name__ == "__main__":
diff --git a/eng/pipelines/ext-azure-ai-agents-live.yml b/eng/pipelines/ext-azure-ai-agents-live.yml
index 0457f667c68..4c91b7734ae 100644
--- a/eng/pipelines/ext-azure-ai-agents-live.yml
+++ b/eng/pipelines/ext-azure-ai-agents-live.yml
@@ -99,11 +99,19 @@ extends:
               # minimal config.json so `azd ai agent` resolves the extension.
               - bash: |
                   set -euo pipefail
+                  # Map the agent architecture to azd's expected binary suffix so
+                  # this keeps working if the pool ever moves off linux/amd64.
+                  case "$(uname -m)" in
+                    x86_64|amd64) GOARCH=amd64 ;;
+                    aarch64|arm64) GOARCH=arm64 ;;
+                    *) echo "Unsupported architecture: $(uname -m)" >&2; exit 1 ;;
+                  esac
+                  BIN_NAME="azure-ai-agents-linux-${GOARCH}"
                   EXT_DIR="$HOME/.azd/extensions/azure.ai.agents"
                   mkdir -p "$EXT_DIR"
-                  cp cli/azd/extensions/azure.ai.agents/azure-ai-agents "$EXT_DIR/azure-ai-agents-linux-amd64"
-                  chmod +x "$EXT_DIR/azure-ai-agents-linux-amd64"
-                  cat > "$HOME/.azd/config.json" << 'EOF'
+                  cp cli/azd/extensions/azure.ai.agents/azure-ai-agents "$EXT_DIR/$BIN_NAME"
+                  chmod +x "$EXT_DIR/$BIN_NAME"
+                  cat > "$HOME/.azd/config.json" << EOF
                   {
                     "extension": {
                       "installed": {
@@ -115,7 +123,7 @@ extends:
                           "description": "Ship agents with Microsoft Foundry from your terminal. (Preview)",
                           "version": "0.0.0-test",
                           "usage": "azd ai agent [options]",
-                          "path": "extensions/azure.ai.agents/azure-ai-agents-linux-amd64",
+                          "path": "extensions/azure.ai.agents/${BIN_NAME}",
                           "source": "azd"
                         }
                       }

From 08a4aef328ff1134d0d324a6e1a0f35dc093fa65 Mon Sep 17 00:00:00 2001
From: Jian Wu
Date: Mon, 22 Jun 2026 21:17:35 +0800
Subject: [PATCH 06/33] Address third Copilot review pass: shell-quoting + tmux cleanup

- Shell-escape HOME/PATH/TENANT, the cd target, and the agent name with
  shlex.quote() (consistent with the earlier token fix)
- On Tier 2 timeout, kill the child's detached tmux server so reused CI
  agents do not accumulate orphaned tmux sockets
---
 .../azure.ai.agents/tests/e2e-live/test_full_e2e.py | 8 ++++----
 .../azure.ai.agents/tests/e2e-live/test_tier2.py | 11 +++++++++++
 2 files changed, 15 insertions(+), 4 deletions(-)

diff --git a/cli/azd/extensions/azure.ai.agents/tests/e2e-live/test_full_e2e.py b/cli/azd/extensions/azure.ai.agents/tests/e2e-live/test_full_e2e.py
index b18d5dd7d16..401d04c90f5 100644
--- a/cli/azd/extensions/azure.ai.agents/tests/e2e-live/test_full_e2e.py
+++ b/cli/azd/extensions/azure.ai.agents/tests/e2e-live/test_full_e2e.py
@@ -38,9 +38,9 @@
 LOCATION = os.environ.get("E2E_LOCATION", "eastus2") #
Region for new projects
 # Inherit full parent PATH so tmux sessions get az-wrapper, azd, etc.
 PARENT_PATH = os.environ.get("PATH", f"{HOME_DIR}/bin:/usr/local/bin:/usr/bin:/bin")
-_tenant_env = f"; export AZURE_TENANT_ID={TENANT}" if TENANT else ""
+_tenant_env = f"; export AZURE_TENANT_ID={shlex.quote(TENANT)}" if TENANT else ""
 _gh_env = f"; export GH_TOKEN={shlex.quote(GH_TOKEN)}; export GITHUB_TOKEN={shlex.quote(GH_TOKEN)}" if GH_TOKEN else ""
-ENV_SETUP = f"export HOME={HOME_DIR}; export PATH={PARENT_PATH}{_tenant_env}{_gh_env}"
+ENV_SETUP = f"export HOME={shlex.quote(HOME_DIR)}; export PATH={shlex.quote(PARENT_PATH)}{_tenant_env}{_gh_env}"

 # Track results
 results = {}
@@ -274,7 +274,7 @@ def setup():
     key("Enter")
     time.sleep(1)

-    send(f"cd {TESTDIR}")
+    send(f"cd {shlex.quote(TESTDIR)}")
     key("Enter")
     time.sleep(1)

@@ -289,7 +289,7 @@ def phase_init():

     init_cmd = "azd ai agent init"
     if AGENT_NAME:
-        init_cmd += f" --agent-name {AGENT_NAME}"
+        init_cmd += f" --agent-name {shlex.quote(AGENT_NAME)}"
     send(init_cmd)
     key("Enter")
     time.sleep(8)
diff --git a/cli/azd/extensions/azure.ai.agents/tests/e2e-live/test_tier2.py b/cli/azd/extensions/azure.ai.agents/tests/e2e-live/test_tier2.py
index 07bd9ed1474..2cc8aac9eb9 100644
--- a/cli/azd/extensions/azure.ai.agents/tests/e2e-live/test_tier2.py
+++ b/cli/azd/extensions/azure.ai.agents/tests/e2e-live/test_tier2.py
@@ -125,6 +125,17 @@ def _on_timeout():

         if timed_out.is_set():
             print(f"\n--- [{label}] TIMEOUT after {elapsed:.0f}s ---")
+            # The child's tmux server runs detached, so killing the child Python
+            # process does not stop it. Tear it down explicitly so we don't leak
+            # orphaned tmux servers/sockets on reused CI agents.
+            tmux_bin = env.get("E2E_TMUX", "tmux")
+            try:
+                subprocess.run(
+                    [tmux_bin, "-L", sock, "kill-server"],
+                    capture_output=True, text=True, timeout=30,
+                )
+            except Exception as e:
+                print(f" [{label}] tmux kill-server failed: {e}")
             # Best-effort cleanup so a hung run does not leak Azure resources.
             _cleanup_leaked_resources(testdir, env, label)
             return {

From 7890eec6abf9fec51942701289c7b9672e79d68c Mon Sep 17 00:00:00 2001
From: Jian Wu
Date: Mon, 22 Jun 2026 21:29:41 +0800
Subject: [PATCH 07/33] Harden E2E shell composition and rm -rf guardrail (Copilot round 4)

- Quote project_dir, service_name, and payload_file with shlex.quote in the
  cd/invoke/monitor commands (and the two remaining cleanup-path cd calls)
- Guard the test-dir wipe: reject an unsafe E2E_TESTDIR (/, /tmp, $HOME,
  etc.) and pass -- to rm so paths starting with - are not treated as flags
---
 .../tests/e2e-live/test_full_e2e.py | 29 ++++++++++++++-----
 1 file changed, 22 insertions(+), 7 deletions(-)

diff --git a/cli/azd/extensions/azure.ai.agents/tests/e2e-live/test_full_e2e.py b/cli/azd/extensions/azure.ai.agents/tests/e2e-live/test_full_e2e.py
index 401d04c90f5..74d4c627c55 100644
--- a/cli/azd/extensions/azure.ai.agents/tests/e2e-live/test_full_e2e.py
+++ b/cli/azd/extensions/azure.ai.agents/tests/e2e-live/test_full_e2e.py
@@ -209,6 +209,20 @@ def find_service_name(testdir):
     return None


+def _assert_safe_testdir(path):
+    """Guardrail before `rm -rf`: refuse a path that is not a clearly disposable
+    test dir, so a bad E2E_TESTDIR (e.g. '/', '/tmp', '$HOME') can never trigger
+    a destructive delete.
Returns the normalized absolute path."""
+    abspath = os.path.abspath(path)
+    home = os.path.abspath(os.path.expanduser("~"))
+    protected = {"/", "/tmp", "/var", "/usr", "/etc", "/bin", "/lib",
+                 "/root", "/home", home}
+    if abspath in protected or abspath.count("/") < 2:
+        raise RuntimeError(
+            f"Refusing to `rm -rf` unsafe E2E_TESTDIR={path!r} (resolved {abspath!r})")
+    return abspath
+
+
 # ===========================================================
 # SETUP
 # ===========================================================
@@ -220,9 +234,10 @@ def setup():
     subprocess.run([TMUX, "-L", SOCK, "kill-server"], capture_output=True)
     time.sleep(0.5)

-    # Clean test dir
-    subprocess.run(["rm", "-rf", TESTDIR])
-    os.makedirs(TESTDIR, exist_ok=True)
+    # Clean test dir (guard against a destructive E2E_TESTDIR like '/' or '/tmp')
+    safe_testdir = _assert_safe_testdir(TESTDIR)
+    subprocess.run(["rm", "-rf", "--", safe_testdir])
+    os.makedirs(safe_testdir, exist_ok=True)

     # Create tmux session
     tmux("new-session", "-d", "-s", SESS, "-x", "200", "-y", "50", "bash --norc --noprofile")
@@ -606,7 +621,7 @@ def phase_provision():
         return False

     print(f"Project dir: {project_dir}")
-    send(f"cd {project_dir}")
+    send(f"cd {shlex.quote(project_dir)}")
     key("Enter")
     time.sleep(1)

@@ -689,7 +704,7 @@ def phase_invoke():
     for attempt in range(1, max_retries + 1):
         print(f"\nInvoke attempt {attempt}/{max_retries}...")
         cap, rc = run_cmd(
-            f"azd ai agent invoke {service_name} --new-session -f {payload_file}",
+            f"azd ai agent invoke {shlex.quote(service_name)} --new-session -f {shlex.quote(payload_file)}",
             timeout=180,
         )
         if cap is None:
@@ -725,7 +740,7 @@ def phase_invoke():
             else:
                 # Get container logs for debugging
                 print("\n Fetching agent logs for debugging...")
-                send(f"azd ai agent monitor {service_name} --tail 50")
+                send(f"azd ai agent monitor {shlex.quote(service_name)} --tail 50")
                 key("Enter")
                 time.sleep(10)
                 log_cap = _wait_for_shell_prompt_legacy(timeout=60)
@@ -856,7 +871,7 @@ def
phase_teardown():
                 break
         if project_dir and not args.keep:
             print(f"\nInit failed but found .azure in {project_dir} — attempting cleanup...")
-            send(f"cd {project_dir}")
+            send(f"cd {shlex.quote(project_dir)}")
             key("Enter")
             time.sleep(1)
             phase_teardown()

From a2f634db6222576bc80f9ef07173516598901a55 Mon Sep 17 00:00:00 2001
From: Jian Wu
Date: Mon, 22 Jun 2026 21:40:03 +0800
Subject: [PATCH 08/33] Clarify Linux/WSL support in E2E docstrings (Copilot round 5)

The Tier 2 scripts also run unattended on the Azure DevOps Linux agent,
not only under local WSL; adjust the prerequisite wording so the header
comments do not contradict the CI usage.
---
 .../extensions/azure.ai.agents/tests/e2e-live/test_full_e2e.py | 3 ++-
 .../extensions/azure.ai.agents/tests/e2e-live/test_tier2.py | 3 ++-
 2 files changed, 4 insertions(+), 2 deletions(-)

diff --git a/cli/azd/extensions/azure.ai.agents/tests/e2e-live/test_full_e2e.py b/cli/azd/extensions/azure.ai.agents/tests/e2e-live/test_full_e2e.py
index 74d4c627c55..7e67a3d6972 100644
--- a/cli/azd/extensions/azure.ai.agents/tests/e2e-live/test_full_e2e.py
+++ b/cli/azd/extensions/azure.ai.agents/tests/e2e-live/test_full_e2e.py
@@ -4,7 +4,8 @@
 See README.md for complete setup & run instructions.
Prerequisites: - - WSL with tmux (>=3.4), Python 3.12+ + - Linux (including WSL) with tmux (>=3.4), Python 3.12+ — also runs on the + Azure DevOps Linux agent via eng/pipelines/ext-azure-ai-agents-live.yml - azd (>=1.25.5) with azure.ai.agents extension, logged in via `azd auth login` - For local WSL runs, leave auth.useAzCliAuth unset (azd built-in auth, via `azd config unset auth.useAzCliAuth`); under CI (GitHub Actions / Azure diff --git a/cli/azd/extensions/azure.ai.agents/tests/e2e-live/test_tier2.py b/cli/azd/extensions/azure.ai.agents/tests/e2e-live/test_tier2.py index 2cc8aac9eb9..f3c408471a9 100644 --- a/cli/azd/extensions/azure.ai.agents/tests/e2e-live/test_tier2.py +++ b/cli/azd/extensions/azure.ai.agents/tests/e2e-live/test_tier2.py @@ -10,7 +10,8 @@ - unique agent name (avoids Azure resource collisions) Prerequisites: - - Same as test_full_e2e.py (WSL, tmux, azd, az CLI, tokens) + - Same as test_full_e2e.py: Linux (including WSL) with tmux, azd, az CLI, + tokens. Runs unattended on the Azure DevOps Linux agent in CI. - Model quota for one deployment at a time """ import subprocess From ef890ee631e57c282e586ba4a4799a3240545361 Mon Sep 17 00:00:00 2001 From: Jian Wu Date: Mon, 22 Jun 2026 21:53:37 +0800 Subject: [PATCH 09/33] Bound setup() subprocess calls with timeout/check (Copilot round 6) --- .../azure.ai.agents/tests/e2e-live/test_full_e2e.py | 10 +++++++--- 1 file changed, 7 insertions(+), 3 deletions(-) diff --git a/cli/azd/extensions/azure.ai.agents/tests/e2e-live/test_full_e2e.py b/cli/azd/extensions/azure.ai.agents/tests/e2e-live/test_full_e2e.py index 7e67a3d6972..42a9cc97049 100644 --- a/cli/azd/extensions/azure.ai.agents/tests/e2e-live/test_full_e2e.py +++ b/cli/azd/extensions/azure.ai.agents/tests/e2e-live/test_full_e2e.py @@ -232,12 +232,16 @@ def setup(): print("SETUP") print("=" * 60) - subprocess.run([TMUX, "-L", SOCK, "kill-server"], capture_output=True) + # Bound kill-server so a wedged tmux cannot hang the whole CI job. 
It stays
+    # best-effort (no check): a "no server running" error here is expected.
+    subprocess.run([TMUX, "-L", SOCK, "kill-server"], capture_output=True, timeout=30)
     time.sleep(0.5)

-    # Clean test dir (guard against a destructive E2E_TESTDIR like '/' or '/tmp')
+    # Clean test dir (guard against a destructive E2E_TESTDIR like '/' or '/tmp').
+    # check=True surfaces a failed delete instead of running against a dirty dir;
+    # timeout keeps a stuck delete from stalling the job with no diagnostic.
     safe_testdir = _assert_safe_testdir(TESTDIR)
-    subprocess.run(["rm", "-rf", "--", safe_testdir])
+    subprocess.run(["rm", "-rf", "--", safe_testdir], check=True, timeout=120)
     os.makedirs(safe_testdir, exist_ok=True)

     # Create tmux session

From f2e67c2a5d03e2d3f7c0766279a5b4e34db2feff Mon Sep 17 00:00:00 2001
From: Jian Wu
Date: Mon, 22 Jun 2026 22:16:38 +0800
Subject: [PATCH 10/33] Harden invoke assertion to a standalone 4/four token + Linux/WSL docs (Copilot round 7)
---
 .../azure.ai.agents/tests/e2e-live/README.md | 6 +++---
 .../tests/e2e-live/test_full_e2e.py | 15 +++++++++++----
 2 files changed, 14 insertions(+), 7 deletions(-)

diff --git a/cli/azd/extensions/azure.ai.agents/tests/e2e-live/README.md b/cli/azd/extensions/azure.ai.agents/tests/e2e-live/README.md
index aaddc1bff0a..52c2b705232 100644
--- a/cli/azd/extensions/azure.ai.agents/tests/e2e-live/README.md
+++ b/cli/azd/extensions/azure.ai.agents/tests/e2e-live/README.md
@@ -54,10 +54,10 @@ Logs for each run are published as the `tier2-live-logs-` artifact.
 `azuresdk-github-pat` (already provided by the Azure SDK ADO project) to avoid
 anonymous rate limits, so no extra secret setup is required.

-## Running locally (WSL)
+## Running locally (Linux / WSL)

-Prerequisites: WSL with `tmux` (>= 3.4), Python 3.12+, `azd` (>= 1.25.5) with the
-`azure.ai.agents` extension installed, and `az` logged in.
+Prerequisites: Linux (including WSL) with `tmux` (>= 3.4), Python 3.12+, `azd`
+(>= 1.25.5) with the `azure.ai.agents` extension installed, and `az` logged in.

 ```bash
 # Use azd's built-in auth locally (NOT az CLI auth — it is slow under WSL).
diff --git a/cli/azd/extensions/azure.ai.agents/tests/e2e-live/test_full_e2e.py b/cli/azd/extensions/azure.ai.agents/tests/e2e-live/test_full_e2e.py
index 42a9cc97049..eb5d8c82a8a 100644
--- a/cli/azd/extensions/azure.ai.agents/tests/e2e-live/test_full_e2e.py
+++ b/cli/azd/extensions/azure.ai.agents/tests/e2e-live/test_full_e2e.py
@@ -792,12 +792,19 @@ def phase_invoke():
         results["invoke"] = "FAIL (empty response)"
         return False

-    # Payload asks "what is 2+2?" — response should contain "4"
-    has_expected = "4" in response_text
+    # Payload asks "what is 2+2?". Accept a standalone "4" token or the
+    # spelled-out word "four" (a live model may answer either). The regex
+    # requires "4" to stand alone so unrelated "4"s in captured output —
+    # model names ("gpt-4o-mini"), versions ("4.1"), or status codes
+    # ("404") — don't produce a false pass.
+    has_expected = (
+        re.search(r"(?
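The regular expression itself is cut off in this copy of the patch, so the author's exact pattern is not shown here. As an illustration of the behaviour the comment describes, the following sketch uses a pattern of my own: the name `has_expected_answer` and the specific lookarounds are assumptions, not the patch's code.

```python
import re

# Assumed pattern, not the one from the patch. A "4" counts only when it is
# not glued to a word character, dot, or hyphen on the left, and not followed
# by a word character, hyphen, or a decimal part on the right. The word
# "four" is accepted case-insensitively as a whole word.
_STANDALONE_FOUR = re.compile(
    r"(?<![\w.\-])4(?![\w\-]|\.\d)|\bfour\b",
    re.IGNORECASE,
)


def has_expected_answer(response_text: str) -> bool:
    """Return True if the text contains a standalone 4 or the word 'four'."""
    return _STANDALONE_FOUR.search(response_text) is not None
```

A sentence-ending period after the digit ("The answer is 4.") still matches because the right-hand lookahead only rejects a dot that is followed by another digit.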
Date: Mon, 22 Jun 2026 22:54:48 +0800
Subject: [PATCH 11/33] Export GH token only once in setup(); raise instead of assert for stdout pipe (Copilot round 9)
---
 .../azure.ai.agents/tests/e2e-live/test_full_e2e.py | 7 ++++---
 .../azure.ai.agents/tests/e2e-live/test_tier2.py | 3 ++-
 2 files changed, 6 insertions(+), 4 deletions(-)

diff --git a/cli/azd/extensions/azure.ai.agents/tests/e2e-live/test_full_e2e.py b/cli/azd/extensions/azure.ai.agents/tests/e2e-live/test_full_e2e.py
index eb5d8c82a8a..3792694becd 100644
--- a/cli/azd/extensions/azure.ai.agents/tests/e2e-live/test_full_e2e.py
+++ b/cli/azd/extensions/azure.ai.agents/tests/e2e-live/test_full_e2e.py
@@ -33,15 +33,16 @@
 SUBSCRIPTION = os.environ.get("E2E_SUBSCRIPTION", "")
 PROJECT = os.environ.get("E2E_PROJECT", "")
 TENANT = os.environ.get("E2E_TENANT", "")
-GH_TOKEN = os.environ.get("GH_TOKEN", os.environ.get("GITHUB_TOKEN", ""))
 AGENT_NAME = os.environ.get("E2E_AGENT_NAME", "") # Optional: unique name for parallel isolation
 CREATE_PROJECT = os.environ.get("E2E_CREATE_PROJECT", "").lower() in ("1", "true", "yes")
 LOCATION = os.environ.get("E2E_LOCATION", "eastus2") # Region for new projects
 # Inherit full parent PATH so tmux sessions get az-wrapper, azd, etc.
 PARENT_PATH = os.environ.get("PATH", f"{HOME_DIR}/bin:/usr/local/bin:/usr/bin:/bin")
 _tenant_env = f"; export AZURE_TENANT_ID={shlex.quote(TENANT)}" if TENANT else ""
-_gh_env = f"; export GH_TOKEN={shlex.quote(GH_TOKEN)}; export GITHUB_TOKEN={shlex.quote(GH_TOKEN)}" if GH_TOKEN else ""
-ENV_SETUP = f"export HOME={shlex.quote(HOME_DIR)}; export PATH={shlex.quote(PARENT_PATH)}{_tenant_env}{_gh_env}"
+# The GitHub token is intentionally NOT baked into ENV_SETUP. It is exported
+# exactly once in setup() (immediately before the tmux scrollback is cleared),
+# so the secret never lingers in pane history or gets duplicated across panes.
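To see why every interpolated value in the environment-setup string goes through `shlex.quote`, it can help to build the string in a small function and feed it a hostile value. This is a minimal sketch under assumptions: `build_env_setup` is a hypothetical helper, not a function in the patch, and it mirrors only the HOME, PATH, and tenant parts of the string (the token is deliberately left out, as the comment above explains).

```python
import shlex


def build_env_setup(home_dir: str, path: str, tenant: str = "") -> str:
    """Compose a '; '-joined export line that is safe to type into a shell.

    Hypothetical helper for illustration. Each value is quoted so that
    spaces, semicolons, or other metacharacters cannot terminate the
    assignment and start a new command.
    """
    parts = [
        f"export HOME={shlex.quote(home_dir)}",
        f"export PATH={shlex.quote(path)}",
    ]
    if tenant:
        parts.append(f"export AZURE_TENANT_ID={shlex.quote(tenant)}")
    return "; ".join(parts)
```

Values made only of safe characters pass through unchanged, so the common case stays readable in logs, while a value such as `/tmp/x; rm -rf ~` is wrapped in single quotes and reaches the shell as one literal word.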
+ENV_SETUP = f"export HOME={shlex.quote(HOME_DIR)}; export PATH={shlex.quote(PARENT_PATH)}{_tenant_env}"

 # Track results
 results = {}
diff --git a/cli/azd/extensions/azure.ai.agents/tests/e2e-live/test_tier2.py b/cli/azd/extensions/azure.ai.agents/tests/e2e-live/test_tier2.py
index f3c408471a9..d6738fb771e 100644
--- a/cli/azd/extensions/azure.ai.agents/tests/e2e-live/test_tier2.py
+++ b/cli/azd/extensions/azure.ai.agents/tests/e2e-live/test_tier2.py
@@ -105,7 +105,8 @@ def run_e2e(deploy_mode, label):
             cmd, env=env, text=True, bufsize=1,
             stdout=subprocess.PIPE, stderr=subprocess.STDOUT,
         )
-        assert proc.stdout is not None # stdout=PIPE guarantees this
+        if proc.stdout is None: # stdout=PIPE guarantees a pipe; be explicit for `python -O`
+            raise RuntimeError("subprocess stdout pipe was not created")
         timed_out = threading.Event()

         def _on_timeout():

From 38310d06d896e828e169b74b228d867e664f9e1f Mon Sep 17 00:00:00 2001
From: Jian Wu
Date: Mon, 22 Jun 2026 23:12:05 +0800
Subject: [PATCH 12/33] Normalize TESTDIR to a vetted abspath; size watchdog to child phase-sum (Copilot round 10)
---
 .../tests/e2e-live/test_full_e2e.py | 10 +++++++---
 .../azure.ai.agents/tests/e2e-live/test_tier2.py | 9 ++++++++-
 eng/pipelines/ext-azure-ai-agents-live.yml | 15 +++++++++++----
 3 files changed, 26 insertions(+), 8 deletions(-)

diff --git a/cli/azd/extensions/azure.ai.agents/tests/e2e-live/test_full_e2e.py b/cli/azd/extensions/azure.ai.agents/tests/e2e-live/test_full_e2e.py
index 3792694becd..d1f27f4b7d4 100644
--- a/cli/azd/extensions/azure.ai.agents/tests/e2e-live/test_full_e2e.py
+++ b/cli/azd/extensions/azure.ai.agents/tests/e2e-live/test_full_e2e.py
@@ -229,6 +229,7 @@ def _assert_safe_testdir(path):
 # SETUP
 # ===========================================================
 def setup():
+    global TESTDIR
     print("=" * 60)
     print("SETUP")
     print("=" * 60)
@@ -239,11 +240,14 @@ def setup():
     time.sleep(0.5)

     # Clean test dir (guard against a destructive E2E_TESTDIR like '/' or '/tmp').
+    # Normalize to a vetted ABSOLUTE path and assign it back to the module global
+    # so every later step (cd, payload --input-file, dir scans, teardown) acts on
+    # the exact directory we wipe/create here — even if E2E_TESTDIR was relative.
     # check=True surfaces a failed delete instead of running against a dirty dir;
     # timeout keeps a stuck delete from stalling the job with no diagnostic.
-    safe_testdir = _assert_safe_testdir(TESTDIR)
-    subprocess.run(["rm", "-rf", "--", safe_testdir], check=True, timeout=120)
-    os.makedirs(safe_testdir, exist_ok=True)
+    TESTDIR = _assert_safe_testdir(TESTDIR)
+    subprocess.run(["rm", "-rf", "--", TESTDIR], check=True, timeout=120)
+    os.makedirs(TESTDIR, exist_ok=True)

     # Create tmux session
     tmux("new-session", "-d", "-s", SESS, "-x", "200", "-y", "50", "bash --norc --noprofile")
diff --git a/cli/azd/extensions/azure.ai.agents/tests/e2e-live/test_tier2.py b/cli/azd/extensions/azure.ai.agents/tests/e2e-live/test_tier2.py
index d6738fb771e..6bbc997a32e 100644
--- a/cli/azd/extensions/azure.ai.agents/tests/e2e-live/test_tier2.py
+++ b/cli/azd/extensions/azure.ai.agents/tests/e2e-live/test_tier2.py
@@ -93,7 +93,14 @@ def run_e2e(deploy_mode, label):
     print(f"[{label}] Starting: deploy_mode={deploy_mode}, sock={sock}")
     print(f"{'='*60}")

-    timeout_s = 1500 # 25 min hard cap per test
+    # Hard cap for the whole child run. Must be >= the SUM of the child's own
+    # sequential per-phase bounds in test_full_e2e.py, so this watchdog only
+    # trips for a truly hung child and never preempts a slow-but-healthy run
+    # (which would be a spurious failure AND would skip the child's azd
+    # teardown, leaking live Azure resources). Child phase budget:
+    #   setup ~3m + init ~5m + provision 10m + deploy 10m + invoke ~12m
+    #   + teardown 10m ~= 50m.
+    timeout_s = 3000 # 50 min hard cap per child (>= sum of child phase bounds)
     keep_artifacts = os.environ.get("E2E_KEEP_ARTIFACTS", "").lower() in ("1", "true", "yes")
     start = time.time()
     try:
diff --git a/eng/pipelines/ext-azure-ai-agents-live.yml b/eng/pipelines/ext-azure-ai-agents-live.yml
index 4c91b7734ae..b8dcee19555 100644
--- a/eng/pipelines/ext-azure-ai-agents-live.yml
+++ b/eng/pipelines/ext-azure-ai-agents-live.yml
@@ -61,9 +61,12 @@ extends:
               name: $(LINUXPOOL)
               image: $(LINUXVMIMAGE)
               os: linux
-            # Two golden paths (code + container) run sequentially, ~13-15 min
-            # each, plus build/provision overhead.
-            timeoutInMinutes: 90
+            # Two golden paths (code + container) run sequentially (~13-15 min
+            # each in the typical case), plus build/provision overhead. The cap
+            # is sized for the worst case so an ungraceful job timeout never
+            # preempts the in-test teardown: 2x the child's 50 min watchdog
+            # (test_tier2.py timeout_s) + per-run cleanup + build/setup steps.
+            timeoutInMinutes: 130

             steps:
               - checkout: self
@@ -141,7 +144,11 @@ extends:
               # the session does not persist to later plain bash steps.
               - task: AzureCLI@2
                 displayName: Run Tier 2 live golden path
-                timeoutInMinutes: 80
+                # Holds BOTH deploy modes run sequentially: 2x the child's 50 min
+                # watchdog (test_tier2.py timeout_s) + per-run cleanup margin, so
+                # this step timeout never trips before the child's own watchdog
+                # (which runs the graceful azd teardown).
+                timeoutInMinutes: 110
                 inputs:
                   azureSubscription: ${{ parameters.serviceConnection }}
                   keepAzSessionActive: true

From 729c744fe986945e2ef9696bf6b42925e023ed79 Mon Sep 17 00:00:00 2001
From: Jian Wu
Date: Mon, 22 Jun 2026 23:23:14 +0800
Subject: [PATCH 13/33] Fail run on teardown failure; spawn child via sys.executable; use pinned python in CI (Copilot round 11)
---
 .../azure.ai.agents/tests/e2e-live/test_full_e2e.py | 8 +++++++-
 .../azure.ai.agents/tests/e2e-live/test_tier2.py | 5 ++++-
 eng/pipelines/ext-azure-ai-agents-live.yml | 5 ++++-
 3 files changed, 15 insertions(+), 3 deletions(-)

diff --git a/cli/azd/extensions/azure.ai.agents/tests/e2e-live/test_full_e2e.py b/cli/azd/extensions/azure.ai.agents/tests/e2e-live/test_full_e2e.py
index d1f27f4b7d4..63d857aa4ea 100644
--- a/cli/azd/extensions/azure.ai.agents/tests/e2e-live/test_full_e2e.py
+++ b/cli/azd/extensions/azure.ai.agents/tests/e2e-live/test_full_e2e.py
@@ -909,11 +909,17 @@ def phase_teardown():

     required = ["init", "provision", "deploy", "invoke"]
     passed_required = all(results.get(p, "").startswith("PASS") for p in required)
+    # A failed `azd down` leaks live Azure resources, so a teardown FAIL must fail
+    # the run too — never report a green run while leaking. "SKIPPED (--keep)" and
+    # an unreached teardown don't start with "FAIL", so they don't trip this.
+    teardown_ok = not results.get("teardown", "").startswith("FAIL")

-    if passed_required:
+    if passed_required and teardown_ok:
         print("\n✓ ALL REQUIRED PHASES PASSED")
         sys.exit(0)
     else:
         missing = [p for p in required if not results.get(p, "").startswith("PASS")]
+        if not teardown_ok:
+            missing.append("teardown")
         print(f"\n✗ FAILED PHASES: {', '.join(missing)}")
         sys.exit(1)
diff --git a/cli/azd/extensions/azure.ai.agents/tests/e2e-live/test_tier2.py b/cli/azd/extensions/azure.ai.agents/tests/e2e-live/test_tier2.py
index 6bbc997a32e..ca75bd7c630 100644
--- a/cli/azd/extensions/azure.ai.agents/tests/e2e-live/test_tier2.py
+++ b/cli/azd/extensions/azure.ai.agents/tests/e2e-live/test_tier2.py
@@ -61,8 +61,11 @@ def run_e2e(deploy_mode, label):

     script_path = os.path.join(SCRIPT_DIR, "test_full_e2e.py")

+    # Use sys.executable (not "python3") so the child runs under the exact same
+    # interpreter/version as this parent — matches the UsePythonVersion@0 pinned
+    # Python in CI and works inside virtualenvs.
     cmd = [
-        "python3", script_path, "--deploy-mode", deploy_mode
+        sys.executable, script_path, "--deploy-mode", deploy_mode
     ]

     env = os.environ.copy()
diff --git a/eng/pipelines/ext-azure-ai-agents-live.yml b/eng/pipelines/ext-azure-ai-agents-live.yml
index b8dcee19555..f43ceb69254 100644
--- a/eng/pipelines/ext-azure-ai-agents-live.yml
+++ b/eng/pipelines/ext-azure-ai-agents-live.yml
@@ -170,7 +170,10 @@ extends:
                     export E2E_SUBSCRIPTION E2E_TENANT
                     echo "Using subscription: $E2E_SUBSCRIPTION"
                     mkdir -p "$(Build.ArtifactStagingDirectory)/logs"
-                    python3 test_tier2.py --mode ${{ parameters.deployModes }} 2>&1 \
+                    # Invoke as `python` (not `python3`) so the UsePythonVersion@0
+                    # pinned 3.12 is used; `python3` may still resolve to the system
+                    # Python on some agent images (matches eng/pipelines/eval-unit.yml).
+                    python test_tier2.py --mode ${{ parameters.deployModes }} 2>&1 \
                       | tee "$(Build.ArtifactStagingDirectory)/logs/tier2.log"
                 env:
                   E2E_CREATE_PROJECT: "true"

From be2ddd1bf5a84b420682c95bc6377e3e55133046 Mon Sep 17 00:00:00 2001
From: Jian Wu
Date: Mon, 22 Jun 2026 23:34:18 +0800
Subject: [PATCH 14/33] Use backtick command substitution for uname so ADO can't read it as a macro (Copilot round 12)
---
 eng/pipelines/ext-azure-ai-agents-live.yml | 8 ++++++--
 1 file changed, 6 insertions(+), 2 deletions(-)

diff --git a/eng/pipelines/ext-azure-ai-agents-live.yml b/eng/pipelines/ext-azure-ai-agents-live.yml
index f43ceb69254..05a3b3b6355 100644
--- a/eng/pipelines/ext-azure-ai-agents-live.yml
+++ b/eng/pipelines/ext-azure-ai-agents-live.yml
@@ -104,10 +104,14 @@ extends:
                   set -euo pipefail
                   # Map the agent architecture to azd's expected binary suffix so
                   # this keeps working if the pool ever moves off linux/amd64.
-                  case "$(uname -m)" in
+                  # Use backticks (not $(...)) for the command substitution so
+                  # Azure DevOps cannot mistake it for a $(macro) variable; assign
+                  # once and reference the plain shell var ($ARCH) thereafter.
+                  ARCH=`uname -m`
+                  case "$ARCH" in
                     x86_64|amd64) GOARCH=amd64 ;;
                     aarch64|arm64) GOARCH=arm64 ;;
-                    *) echo "Unsupported architecture: $(uname -m)" >&2; exit 1 ;;
+                    *) echo "Unsupported architecture: $ARCH" >&2; exit 1 ;;
                   esac
                   BIN_NAME="azure-ai-agents-linux-${GOARCH}"
                   EXT_DIR="$HOME/.azd/extensions/azure.ai.agents"

From db53b06a538a6bd6a1431df80a3fb26d8a31d54d Mon Sep 17 00:00:00 2001
From: Jian Wu
Date: Tue, 23 Jun 2026 12:58:21 +0800
Subject: [PATCH 15/33] Give child watchdog a teardown margin over phase-sum; widen pipeline caps

The child watchdog (test_tier2.py timeout_s) equalled the 50m sum of the
per-phase budgets, so a slow-but-healthy run that consumed its earlier
phases could be hard-killed during the final azd teardown and leak live
Azure resources -- the exact failure the watchdog is meant to prevent.
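The timeout cascade described in this commit message can be checked with plain arithmetic. The sketch below assumes the per-phase figures quoted in the test_tier2.py comment (setup ~3m, init ~5m, provision 10m, deploy 10m, invoke ~12m, teardown 10m). The helper name `check_timeout_cascade` is hypothetical, and the third check (job cap above step cap) is my own assumption about the intent rather than a rule stated in the patch.

```python
# Per-phase budgets in minutes, as quoted in the test_tier2.py comment.
PHASE_BUDGET_MIN = {
    "setup": 3,
    "init": 5,
    "provision": 10,
    "deploy": 10,
    "invoke": 12,
    "teardown": 10,
}


def check_timeout_cascade(child_watchdog_min, step_cap_min, job_cap_min, deploy_modes=2):
    """Return a list of violated invariants (empty when the caps are consistent)."""
    phase_sum = sum(PHASE_BUDGET_MIN.values())  # 50 with the figures above
    problems = []
    # The watchdog needs the phase sum plus one spare teardown budget.
    if child_watchdog_min < phase_sum + PHASE_BUDGET_MIN["teardown"]:
        problems.append("child watchdog has no teardown margin over the phase sum")
    # The step runs every deploy mode sequentially, so it must outlast them all.
    if step_cap_min <= deploy_modes * child_watchdog_min:
        problems.append("step cap is not strictly greater than modes x child watchdog")
    # Assumption: the job wraps the step plus build/setup, so it should be larger.
    if job_cap_min <= step_cap_min:
        problems.append("job cap is not greater than the step cap")
    return problems
```

With the values this commit settles on (60, 130, 150) the list comes back empty; with the previous values (50, 110, 130) only the first check fires, which matches the problem the message describes.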
Raise the child watchdog to 60m (50m phase sum + a full 10m teardown
margin) and cascade the outer caps to keep the 'outer cap strictly
greater than 2x child watchdog' invariant: AzureCLI@2 step 110->130m,
job 130->150m. No behavior change on a healthy run.
---
 .../azure.ai.agents/tests/e2e-live/test_tier2.py | 15 +++++++++------
 eng/pipelines/ext-azure-ai-agents-live.yml | 8 ++++----
 2 files changed, 13 insertions(+), 10 deletions(-)

diff --git a/cli/azd/extensions/azure.ai.agents/tests/e2e-live/test_tier2.py b/cli/azd/extensions/azure.ai.agents/tests/e2e-live/test_tier2.py
index ca75bd7c630..de4f9df3d76 100644
--- a/cli/azd/extensions/azure.ai.agents/tests/e2e-live/test_tier2.py
+++ b/cli/azd/extensions/azure.ai.agents/tests/e2e-live/test_tier2.py
@@ -96,14 +96,17 @@ def run_e2e(deploy_mode, label):
     print(f"[{label}] Starting: deploy_mode={deploy_mode}, sock={sock}")
     print(f"{'='*60}")

-    # Hard cap for the whole child run. Must be >= the SUM of the child's own
-    # sequential per-phase bounds in test_full_e2e.py, so this watchdog only
-    # trips for a truly hung child and never preempts a slow-but-healthy run
-    # (which would be a spurious failure AND would skip the child's azd
-    # teardown, leaking live Azure resources). Child phase budget:
+    # Hard cap for the whole child run. Must be strictly GREATER than the SUM of
+    # the child's own sequential per-phase bounds in test_full_e2e.py, so this
+    # watchdog only trips for a truly hung child and never preempts a
+    # slow-but-healthy run (which would be a spurious failure AND would skip the
+    # child's azd teardown, leaking live Azure resources). Child phase budget:
     #   setup ~3m + init ~5m + provision 10m + deploy 10m + invoke ~12m
     #   + teardown 10m ~= 50m.
-    timeout_s = 3000 # 50 min hard cap per child (>= sum of child phase bounds)
+    # The 10 min margin over that 50m sum is a full extra teardown budget, so even
+    # a run that exhausts every earlier phase still has room to tear down
+    # gracefully before this hard kill fires.
+    timeout_s = 3600 # 60 min hard cap per child (50m phase sum + 10m teardown margin)
     keep_artifacts = os.environ.get("E2E_KEEP_ARTIFACTS", "").lower() in ("1", "true", "yes")
     start = time.time()
     try:
diff --git a/eng/pipelines/ext-azure-ai-agents-live.yml b/eng/pipelines/ext-azure-ai-agents-live.yml
index 05a3b3b6355..d9b52c54da8 100644
--- a/eng/pipelines/ext-azure-ai-agents-live.yml
+++ b/eng/pipelines/ext-azure-ai-agents-live.yml
@@ -64,9 +64,9 @@ extends:
             # Two golden paths (code + container) run sequentially (~13-15 min
             # each in the typical case), plus build/provision overhead. The cap
             # is sized for the worst case so an ungraceful job timeout never
-            # preempts the in-test teardown: 2x the child's 50 min watchdog
+            # preempts the in-test teardown: 2x the child's 60 min watchdog
             # (test_tier2.py timeout_s) + per-run cleanup + build/setup steps.
-            timeoutInMinutes: 130
+            timeoutInMinutes: 150

             steps:
               - checkout: self
@@ -148,11 +148,11 @@ extends:
               # the session does not persist to later plain bash steps.
               - task: AzureCLI@2
                 displayName: Run Tier 2 live golden path
-                # Holds BOTH deploy modes run sequentially: 2x the child's 50 min
+                # Holds BOTH deploy modes run sequentially: 2x the child's 60 min
                 # watchdog (test_tier2.py timeout_s) + per-run cleanup margin, so
                 # this step timeout never trips before the child's own watchdog
                 # (which runs the graceful azd teardown).
- timeoutInMinutes: 110 + timeoutInMinutes: 130 inputs: azureSubscription: ${{ parameters.serviceConnection }} keepAzSessionActive: true From 019d402e7b38cefe169156ac8440bdb6ced5ae09 Mon Sep 17 00:00:00 2001 From: Jian Wu Date: Tue, 23 Jun 2026 13:44:53 +0800 Subject: [PATCH 16/33] Drop redundant f-prefix on two placeholder-less log strings pyflakes flagged two f-strings with no interpolation; they are static text, so the f-prefix is dead. Plain string literals keep the linter (and Copilot) clean. No behavior change. --- .../azure.ai.agents/tests/e2e-live/test_full_e2e.py | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/cli/azd/extensions/azure.ai.agents/tests/e2e-live/test_full_e2e.py b/cli/azd/extensions/azure.ai.agents/tests/e2e-live/test_full_e2e.py index 63d857aa4ea..2e994ba069b 100644 --- a/cli/azd/extensions/azure.ai.agents/tests/e2e-live/test_full_e2e.py +++ b/cli/azd/extensions/azure.ai.agents/tests/e2e-live/test_full_e2e.py @@ -434,7 +434,7 @@ def phase_init(): key("Enter") time.sleep(5) else: - print(f"[6] Not on project picker, moving to dynamic") + print("[6] Not on project picker, moving to dynamic") # Step 7+: Dynamic prompts _last_prompt = "" @@ -744,7 +744,7 @@ def phase_invoke(): if rc != 0 and has_error and ("500" in error_msg or "Internal Server Error" in error_msg): print(f" Server error: {error_msg[:100]}") if attempt < max_retries: - print(f" Retrying in 30s (container may still be starting)...") + print(" Retrying in 30s (container may still be starting)...") time.sleep(30) continue else: From 171d7e65281a0ca787dd5814eec1e88768a0e5ba Mon Sep 17 00:00:00 2001 From: Jian Wu Date: Tue, 23 Jun 2026 13:50:08 +0800 Subject: [PATCH 17/33] Correct documented tmux minimum to match the ubuntu-22.04 CI agent The docstring and README claimed tmux >=3.4, but the live pipeline installs tmux via apt-get on ubuntu-22.04 (LINUXVMIMAGE), which ships 3.2a. 
The driver only uses basic verbs (send-keys -l, capture-pane -p, new-session -x/-y, clear-history, kill-session/server) available well before 3.2, so >=3.4 was inaccurate and misleading. Align both docs to >=3.2. --- cli/azd/extensions/azure.ai.agents/tests/e2e-live/README.md | 2 +- .../extensions/azure.ai.agents/tests/e2e-live/test_full_e2e.py | 2 +- 2 files changed, 2 insertions(+), 2 deletions(-) diff --git a/cli/azd/extensions/azure.ai.agents/tests/e2e-live/README.md b/cli/azd/extensions/azure.ai.agents/tests/e2e-live/README.md index 52c2b705232..5b468bcfc43 100644 --- a/cli/azd/extensions/azure.ai.agents/tests/e2e-live/README.md +++ b/cli/azd/extensions/azure.ai.agents/tests/e2e-live/README.md @@ -56,7 +56,7 @@ Logs for each run are published as the `tier2-live-logs-` artifact. ## Running locally (Linux / WSL) -Prerequisites: Linux (including WSL) with `tmux` (>= 3.4), Python 3.12+, `azd` +Prerequisites: Linux (including WSL) with `tmux` (>= 3.2), Python 3.12+, `azd` (>= 1.25.5) with the `azure.ai.agents` extension installed, and `az` logged in. ```bash diff --git a/cli/azd/extensions/azure.ai.agents/tests/e2e-live/test_full_e2e.py b/cli/azd/extensions/azure.ai.agents/tests/e2e-live/test_full_e2e.py index 2e994ba069b..f04f26f9346 100644 --- a/cli/azd/extensions/azure.ai.agents/tests/e2e-live/test_full_e2e.py +++ b/cli/azd/extensions/azure.ai.agents/tests/e2e-live/test_full_e2e.py @@ -4,7 +4,7 @@ See README.md for complete setup & run instructions. 
Prerequisites: - - Linux (including WSL) with tmux (>=3.4), Python 3.12+ — also runs on the + - Linux (including WSL) with tmux (>=3.2), Python 3.12+ — also runs on the Azure DevOps Linux agent via eng/pipelines/ext-azure-ai-agents-live.yml - azd (>=1.25.5) with azure.ai.agents extension, logged in via `azd auth login` - For local WSL runs, leave auth.useAzCliAuth unset (azd built-in auth, via From c62a6966d48e012b0df52d3a5d30561d2f22f096 Mon Sep 17 00:00:00 2001 From: Jian Wu Date: Wed, 24 Jun 2026 18:02:40 +0800 Subject: [PATCH 18/33] Rewrite Tier 2 live E2E driver from tmux+Python to Go Per review feedback, replace the tmux-driven Python golden-path driver with a Go test so the live suite uses the same toolchain as the rest of the extension and needs no Python/tmux on the agent. - TestTier2Live drives the interactive `azd ai agent init` through a pseudo-terminal (Netflix/go-expect sends keys, hinshun/vt10x renders the screen, creack/pty provides the PTY); provision/deploy/invoke/down shell out to azd with --no-prompt. Per-mode 60m context drives the graceful azd-down teardown via t.Cleanup. - The PTY driver is tagged //go:build linux; the answer matcher and its table unit tests build and run on any platform. - Pipeline builds and runs `go test -run TestTier2Live` and drops the Python/tmux setup steps; promote go-expect/vt10x/creack-pty to direct requires; refresh README and cspell. - Remove test_tier2.py and test_full_e2e.py. 
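For reviewers, the standalone-"4" rule used by the answer matcher can be sketched as below. This is a minimal reconstruction from the rules visible in assert.go in this patch, not the committed code: the name hasStandaloneFour is hypothetical, and the loop is an assumption pieced together from the surviving checks (previous rune, decimal continuation, following word rune).

```go
package main

import (
	"fmt"
	"regexp"
	"unicode"
)

// spelledFour matches the standalone word "four", case-insensitively.
var spelledFour = regexp.MustCompile(`(?i)\bfour\b`)

// isWordRune reports whether r is a letter, digit, or underscore.
func isWordRune(r rune) bool {
	return r == '_' || unicode.IsLetter(r) || unicode.IsDigit(r)
}

// hasStandaloneFour (hypothetical name) reports whether text contains the
// word "four" or a "4" that is not part of a larger word or number.
func hasStandaloneFour(text string) bool {
	if spelledFour.MatchString(text) {
		return true
	}
	runes := []rune(text)
	for i, r := range runes {
		if r != '4' {
			continue
		}
		// Reject a "4" preceded by '.' or a word rune ("1.4", "24", "v4").
		if i > 0 {
			if prev := runes[i-1]; prev == '.' || isWordRune(prev) {
				continue
			}
		}
		// Reject a "4" that starts a decimal such as "4.1".
		if i+2 < len(runes) && runes[i+1] == '.' && unicode.IsDigit(runes[i+2]) {
			continue
		}
		// Reject a "4" followed by a word rune ("4o", "404", "40").
		if i+1 < len(runes) && isWordRune(runes[i+1]) {
			continue
		}
		return true
	}
	return false
}

func main() {
	for _, s := range []string{"The answer is 4.", "gpt-4o-mini", "4.1", "404", "It is four."} {
		fmt.Printf("%q -> %v\n", s, hasStandaloneFour(s))
	}
}
```

A hand-written scan is used here because Go's regexp package (RE2) has no lookaround, so the rule cannot be written as a single pattern; the sample inputs are taken from the table in assert_test.go.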
--- .../extensions/azure.ai.agents/cspell.yaml | 4 + cli/azd/extensions/azure.ai.agents/go.mod | 7 +- .../azure.ai.agents/tests/e2e-live/README.md | 69 +- .../azure.ai.agents/tests/e2e-live/assert.go | 68 ++ .../tests/e2e-live/assert_test.go | 41 + .../tests/e2e-live/console_test.go | 143 +++ .../tests/e2e-live/test_full_e2e.py | 925 ------------------ .../tests/e2e-live/test_tier2.py | 228 ----- .../tests/e2e-live/tier2_live_test.go | 770 +++++++++++++++ eng/pipelines/ext-azure-ai-agents-live.yml | 43 +- 10 files changed, 1099 insertions(+), 1199 deletions(-) create mode 100644 cli/azd/extensions/azure.ai.agents/tests/e2e-live/assert.go create mode 100644 cli/azd/extensions/azure.ai.agents/tests/e2e-live/assert_test.go create mode 100644 cli/azd/extensions/azure.ai.agents/tests/e2e-live/console_test.go delete mode 100644 cli/azd/extensions/azure.ai.agents/tests/e2e-live/test_full_e2e.py delete mode 100644 cli/azd/extensions/azure.ai.agents/tests/e2e-live/test_tier2.py create mode 100644 cli/azd/extensions/azure.ai.agents/tests/e2e-live/tier2_live_test.go diff --git a/cli/azd/extensions/azure.ai.agents/cspell.yaml b/cli/azd/extensions/azure.ai.agents/cspell.yaml index 4a7b9563144..ed66ca68c4a 100644 --- a/cli/azd/extensions/azure.ai.agents/cspell.yaml +++ b/cli/azd/extensions/azure.ai.agents/cspell.yaml @@ -82,3 +82,7 @@ words: - deepseek - ttfb - Bhadauria + # Live E2E (Tier 2) Go driver + - creack + - elive + - testdir diff --git a/cli/azd/extensions/azure.ai.agents/go.mod b/cli/azd/extensions/azure.ai.agents/go.mod index dbc6ceaed39..b40c0b86033 100644 --- a/cli/azd/extensions/azure.ai.agents/go.mod +++ b/cli/azd/extensions/azure.ai.agents/go.mod @@ -30,7 +30,12 @@ require ( require github.com/denormal/go-gitignore v0.0.0-20180930084346-ae8ad1d07817 -require golang.org/x/term v0.41.0 +require ( + github.com/Netflix/go-expect v0.0.0-20220104043353-73e0943537d2 + github.com/creack/pty v1.1.17 + github.com/hinshun/vt10x v0.0.0-20220119200601-820417d04eec + 
golang.org/x/term v0.41.0 +) require ( dario.cat/mergo v1.0.2 // indirect diff --git a/cli/azd/extensions/azure.ai.agents/tests/e2e-live/README.md b/cli/azd/extensions/azure.ai.agents/tests/e2e-live/README.md index 5b468bcfc43..2fcd9408a71 100644 --- a/cli/azd/extensions/azure.ai.agents/tests/e2e-live/README.md +++ b/cli/azd/extensions/azure.ai.agents/tests/e2e-live/README.md @@ -7,8 +7,11 @@ Azure** resources: init → provision → deploy → invoke → down ``` -A Python driver sends keystrokes to the CLI through a **tmux** session and asserts -on the captured output, for both deploy modes: +A Go test driver answers the interactive `azd ai agent init` prompts through a +**pseudo-terminal** — [go-expect] sends keystrokes and [vt10x] renders the CLI's +terminal UI so the test can assert on the on-screen text, with [creack/pty] +providing the PTY. The non-interactive phases (`provision`, `deploy`, `invoke`, +`down`) shell out to `azd ... --no-prompt`. Both deploy modes are covered: | Mode | What it does | | ----------- | ------------------------------------------------------- | @@ -17,6 +20,10 @@ on the captured output, for both deploy modes: The two modes run **sequentially** (same subscription → avoids resource races). +[go-expect]: https://github.com/Netflix/go-expect +[vt10x]: https://github.com/hinshun/vt10x +[creack/pty]: https://github.com/creack/pty + ## Where this fits | Tier | Coverage | Where it runs | @@ -30,7 +37,9 @@ SDK EngSys / SFI guidance). Tier 2 runs only on demand or on a schedule. ## Running in CI -Pipeline: `eng/pipelines/ext-azure-ai-agents-live.yml` (ADO). +Pipeline: `eng/pipelines/ext-azure-ai-agents-live.yml` (ADO). The Tier 2 step +builds `azd` + the extension and runs `go test -run TestTier2Live` inside an +`AzureCLI@2` task (so the federated az session stays valid for the whole run). - **On demand (per PR):** comment `/azp run ext-azure-ai-agents-live` on the PR. Requires write permission on the repo. 
@@ -56,8 +65,14 @@ Logs for each run are published as the `tier2-live-logs-` artifact. ## Running locally (Linux / WSL) -Prerequisites: Linux (including WSL) with `tmux` (>= 3.2), Python 3.12+, `azd` -(>= 1.25.5) with the `azure.ai.agents` extension installed, and `az` logged in. +The live driver is tagged `//go:build linux` — it relies on a real PTY and a +controlling terminal (the platform CI runs on). On Windows, run it under WSL. + +Prerequisites: Linux (including WSL), a Go toolchain matching `go.mod` +(`GOTOOLCHAIN=auto` fetches the right version automatically), `azd` (>= 1.25.5) +with the `azure.ai.agents` extension installed, and `az` logged in. + +Run from the extension root (`cli/azd/extensions/azure.ai.agents`): ```bash # Use azd's built-in auth locally (NOT az CLI auth — it is slow under WSL). @@ -65,33 +80,45 @@ azd config unset auth.useAzCliAuth azd auth login # Both modes (sequential): -python3 test_tier2.py --mode both +AZURE_AI_AGENTS_E2E_LIVE=1 E2E_DEPLOY_MODES=both \ + go test -run TestTier2Live -count=1 -timeout 130m -v ./tests/e2e-live/ # A single golden path: -python3 test_full_e2e.py --deploy-mode code -python3 test_full_e2e.py --deploy-mode container --keep # leave resources up +AZURE_AI_AGENTS_E2E_LIVE=1 E2E_DEPLOY_MODES=code \ + go test -run TestTier2Live -count=1 -timeout 90m -v ./tests/e2e-live/ ``` +Without `AZURE_AI_AGENTS_E2E_LIVE=1` the test is **skipped**, so the package is +safe to include in a normal `go test ./...`. 
+ ### Useful environment variables -| Variable | Default | Purpose | -| ---------------------- | ------------ | -------------------------------------------------------------- | -| `E2E_CREATE_PROJECT` | `false` | `true` → always create a fresh Foundry project | -| `E2E_LOCATION` | `eastus2` | Region for new projects (needs model quota) | -| `E2E_SUBSCRIPTION` | — | Subscription id (filters the picker) | -| `E2E_TENANT` | — | AAD tenant id | -| `E2E_USE_AZ_CLI_AUTH` | — | `true` → set `auth.useAzCliAuth` (CI; auto-on under ADO/GHA) | -| `GH_TOKEN` | — | GitHub token for template clone (optional) | +| Variable | Default | Purpose | +| -------------------------- | ------------------------------ | ----------------------------------------------------------- | +| `AZURE_AI_AGENTS_E2E_LIVE` | — | **Required** `=1` gate; unset → the test is skipped | +| `E2E_DEPLOY_MODES` | `both` | `both` / `code` / `container` | +| `E2E_CREATE_PROJECT` | `false` | `true` → always create a fresh Foundry project | +| `E2E_PROJECT` | — | Name of an existing Foundry project to select instead | +| `E2E_LOCATION` | `eastus2` | Region for new projects (needs model quota) | +| `E2E_SUBSCRIPTION` | — | Subscription id (filters the picker) | +| `E2E_TENANT` | — | AAD tenant id (sets `AZURE_TENANT_ID` for azd) | +| `E2E_USE_AZ_CLI_AUTH` | — | `true` → set `auth.useAzCliAuth` (CI; auto-on under ADO/GHA) | +| `E2E_TESTDIR` | `/tmp/e2e-tests/tier2-` | Scratch dir for the scaffolded project | +| `E2E_KEEP_ARTIFACTS` | — | `true` → keep the per-run `AZD_CONFIG_DIR` copy for debugging | +| `GH_TOKEN` | — | GitHub token for template clone (optional) | In CI the driver auto-detects GitHub Actions (`GITHUB_ACTIONS`) and Azure DevOps -(`TF_BUILD`) and switches to `az` CLI auth automatically. +(`TF_BUILD`) and switches to `az` CLI auth automatically. Azure resources are +always torn down (`azd down --force --purge`) via `t.Cleanup`, even on failure. 
## Files -| File | Purpose | -| ------------------ | ----------------------------------------------------------------- | -| `test_tier2.py` | Runner — invokes `test_full_e2e.py` once per deploy mode | -| `test_full_e2e.py` | One golden path: setup → init → provision → deploy → invoke → down | +| File | Purpose | +| -------------------- | -------------------------------------------------------------------------------- | +| `tier2_live_test.go` | `TestTier2Live` — drives init/provision/deploy/invoke/down per mode (Linux-only) | +| `console_test.go` | PTY + vt10x console helper that renders the interactive CLI (Linux-only) | +| `assert.go` | Pure-logic answer matcher (`responseHasExpectedAnswer`) — builds on any platform | +| `assert_test.go` | Unit tests for the matcher — run anywhere via `go test ./tests/e2e-live/` | Each phase has bounded timeouts and best-effort `azd down --force --purge` teardown so a crash mid-run does not leak billable resources. diff --git a/cli/azd/extensions/azure.ai.agents/tests/e2e-live/assert.go b/cli/azd/extensions/azure.ai.agents/tests/e2e-live/assert.go new file mode 100644 index 00000000000..89ebe9d94c0 --- /dev/null +++ b/cli/azd/extensions/azure.ai.agents/tests/e2e-live/assert.go @@ -0,0 +1,68 @@ +// Copyright (c) Microsoft Corporation. All rights reserved. +// Licensed under the MIT License. + +// Package e2elive contains the Tier 2 live golden-path end-to-end test for the +// azure.ai.agents extension: init -> provision -> deploy -> invoke -> down, +// driven against real Azure resources. See README.md for setup and how to run. +package e2elive + +import ( + "regexp" + "unicode" +) + +// spelledFourRe matches the spelled-out word "four" as a standalone word +// (case-insensitive), e.g. "the answer is four". +var spelledFourRe = regexp.MustCompile(`(?i)\bfour\b`) + +// responseHasExpectedAnswer reports whether text answers "what is 2+2?" with a +// standalone "4" or the spelled-out word "four". 
+// +// A live model may answer either, and the captured CLI output also contains +// unrelated digits — model names ("gpt-4o-mini"), versions ("4.1"), or status +// codes ("404") — so a bare substring search would produce false positives. +// The "4" must therefore stand alone: not part of a larger word or number. +// The standalone-"4" rule is the lookaround (? 0 { + if prev := runes[i-1]; prev == '.' || isWordRune(prev) { + continue + } + } + if i+2 < len(runes) && runes[i+1] == '.' && unicode.IsDigit(runes[i+2]) { + continue + } + if i+1 < len(runes) && isWordRune(runes[i+1]) { + continue + } + return true + } + return false +} + +// isWordRune reports whether r is a word character, matching the Python regex +// \w class (Unicode letters, digits, and underscore). +func isWordRune(r rune) bool { + return r == '_' || unicode.IsLetter(r) || unicode.IsDigit(r) +} diff --git a/cli/azd/extensions/azure.ai.agents/tests/e2e-live/assert_test.go b/cli/azd/extensions/azure.ai.agents/tests/e2e-live/assert_test.go new file mode 100644 index 00000000000..b5e2eaab809 --- /dev/null +++ b/cli/azd/extensions/azure.ai.agents/tests/e2e-live/assert_test.go @@ -0,0 +1,41 @@ +// Copyright (c) Microsoft Corporation. All rights reserved. +// Licensed under the MIT License. + +package e2elive + +import "testing" + +func TestResponseHasExpectedAnswer(t *testing.T) { + t.Parallel() + + cases := []struct { + name string + text string + want bool + }{ + {"plain four digit", "The answer is 4.", true}, + {"bare four", "4", true}, + {"equation", "2+2=4", true}, + {"spelled word", "It is four.", true}, + {"spelled upper", "FOUR", true}, + {"parenthesized", "(4)", true}, + {"trailing period mid-sentence", "the value 4. 
is final", true}, + {"model name", "gpt-4o-mini", false}, + {"version", "4.1", false}, + {"status code", "404", false}, + {"price", "$40", false}, + {"ratio", "24/7", false}, + {"fourteen", "fourteen apples", false}, + {"no answer", "I am not sure", false}, + {"empty", "", false}, + } + + for _, tc := range cases { + t.Run(tc.name, func(t *testing.T) { + t.Parallel() + if got := responseHasExpectedAnswer(tc.text); got != tc.want { + t.Errorf("responseHasExpectedAnswer(%q) = %v, want %v", tc.text, got, tc.want) + } + }) + } +} diff --git a/cli/azd/extensions/azure.ai.agents/tests/e2e-live/console_test.go b/cli/azd/extensions/azure.ai.agents/tests/e2e-live/console_test.go new file mode 100644 index 00000000000..1943fe61087 --- /dev/null +++ b/cli/azd/extensions/azure.ai.agents/tests/e2e-live/console_test.go @@ -0,0 +1,143 @@ +// Copyright (c) Microsoft Corporation. All rights reserved. +// Licensed under the MIT License. + +//go:build linux + +package e2elive + +import ( + "fmt" + "os" + "strings" + + expect "github.com/Netflix/go-expect" + "github.com/creack/pty" + "github.com/hinshun/vt10x" +) + +// Key sequences sent to the interactive CLI over the pseudo-terminal. +const ( + keyEnter = "\r" + keyDown = "\x1b[B" +) + +// console drives an interactive child process through a pseudo-terminal and +// renders its output with a vt10x virtual terminal so tests can assert on the +// on-screen text (the same role tmux capture-pane played in the old driver). +// +// Wiring (mirrors AlecAivazis/survey's posix expect tests): +// +// child stdio ── ec.Tty() (pts) ─┐ +// ├─ go-expect tees child output ─► vt10x screen +// vt10x query replies ─► extSlave ┘ ▲ +// extMaster ─ go-expect feeds back to child stdin +// +// go-expect creates its own internal pty for the child (ec.Tty()). The external +// pty pair (extMaster/extSlave) exists solely so vt10x can answer terminal +// queries (e.g. 
cursor-position reports) back to the child; it is closed via +// WithCloser when the console is closed. +type console struct { + term vt10x.Terminal + ec *expect.Console +} + +// newConsole creates a console with a virtual terminal of the given size. +func newConsole(cols, rows int) (*console, error) { + extMaster, extSlave, err := pty.Open() + if err != nil { + return nil, fmt.Errorf("open feedback pty: %w", err) + } + + term := vt10x.New(vt10x.WithWriter(extSlave), vt10x.WithSize(cols, rows)) + + // Deliberately no WithDefaultTimeout: the drain goroutine runs ExpectEOF for + // the whole child lifetime, and a read timeout would stop it (ending screen + // updates) during the long quiet stretches of init (e.g. template download). + ec, err := expect.NewConsole( + expect.WithStdin(extMaster), + expect.WithStdout(term), + expect.WithCloser(extMaster, extSlave), + ) + if err != nil { + _ = extMaster.Close() + _ = extSlave.Close() + return nil, fmt.Errorf("create expect console: %w", err) + } + + // Match the child tty size to the virtual terminal so line wrapping in the + // rendered screen matches what the CLI actually drew. + //nolint:gosec // cols/rows are small fixed test dimensions; no overflow. + _ = pty.Setsize(ec.Tty(), &pty.Winsize{Cols: uint16(cols), Rows: uint16(rows)}) + + return &console{term: term, ec: ec}, nil +} + +// tty returns the slave pseudo-terminal the child process should attach its +// stdin/stdout/stderr to. +func (c *console) tty() *os.File { + return c.ec.Tty() +} + +// send writes raw bytes (keystrokes) to the child's tty. +func (c *console) send(s string) { + _, _ = c.ec.Send(s) +} + +// drain continuously renders child output to the virtual terminal until the +// child's tty closes (process exit). It MUST run for the whole child lifetime: +// go-expect only tees output to the screen while a read is in flight, so +// without this the screen would stay blank and the child would eventually block +// once the output pipe filled. 
+func (c *console) drain() { + _, _ = c.ec.ExpectEOF() +} + +// screen returns the current rendered virtual-terminal contents, cleaned of NUL +// padding and trailing whitespace on each line. +func (c *console) screen() string { + return cleanScreen(c.term.String()) +} + +// close tears down the console and all of its pseudo-terminals. +func (c *console) close() { + _ = c.ec.Close() +} + +// cleanScreen normalizes a vt10x screen dump: empty cells render as NUL, which +// is replaced with spaces, then trailing whitespace is trimmed from each row. +func cleanScreen(s string) string { + s = strings.ReplaceAll(s, "\x00", " ") + lines := strings.Split(s, "\n") + for i, l := range lines { + lines[i] = strings.TrimRight(l, " \t") + } + return strings.Join(lines, "\n") +} + +// nonEmptyLines returns the screen's non-blank lines, trimmed. +func nonEmptyLines(screen string) []string { + var out []string + for l := range strings.SplitSeq(screen, "\n") { + if t := strings.TrimSpace(l); t != "" { + out = append(out, t) + } + } + return out +} + +// activePrompt returns the lowercased text of the last survey "?" prompt line on +// screen, or "" if none is visible. +func activePrompt(screen string) string { + lines := nonEmptyLines(screen) + for i := len(lines) - 1; i >= 0; i-- { + if strings.HasPrefix(lines[i], "?") { + return strings.ToLower(lines[i]) + } + } + return "" +} + +// screenContains reports whether screen contains sub (case-insensitive). +func screenContains(screen, sub string) bool { + return strings.Contains(strings.ToLower(screen), strings.ToLower(sub)) +} diff --git a/cli/azd/extensions/azure.ai.agents/tests/e2e-live/test_full_e2e.py b/cli/azd/extensions/azure.ai.agents/tests/e2e-live/test_full_e2e.py deleted file mode 100644 index f04f26f9346..00000000000 --- a/cli/azd/extensions/azure.ai.agents/tests/e2e-live/test_full_e2e.py +++ /dev/null @@ -1,925 +0,0 @@ -#!/usr/bin/env python3 -"""Full E2E test: init -> provision -> deploy -> invoke -> down. 
- -See README.md for complete setup & run instructions. - -Prerequisites: - - Linux (including WSL) with tmux (>=3.2), Python 3.12+ — also runs on the - Azure DevOps Linux agent via eng/pipelines/ext-azure-ai-agents-live.yml - - azd (>=1.25.5) with azure.ai.agents extension, logged in via `azd auth login` - - For local WSL runs, leave auth.useAzCliAuth unset (azd built-in auth, via - `azd config unset auth.useAzCliAuth`); under CI (GitHub Actions / Azure - DevOps / E2E_USE_AZ_CLI_AUTH=true) the script auto-enables az CLI auth. - - GitHub token available via gh.exe or $GITHUB_TOKEN - -Recommended env vars: - E2E_CREATE_PROJECT=true — always create new Foundry project (avoid stale resources) - E2E_LOCATION=eastus2 — region with sufficient model quota - E2E_HOME=$HOME — home directory for azd config -""" -import subprocess -import time -import sys -import os - -import re -import shlex - -TMUX = os.environ.get("E2E_TMUX", "tmux") -SOCK = os.environ.get("E2E_SOCK", "e2e") -SESS = os.environ.get("E2E_SESS", "e2e") -TESTDIR = os.environ.get("E2E_TESTDIR", "/tmp/e2e-tests/full-e2e") -HOME_DIR = os.environ.get("E2E_HOME", os.environ.get("HOME", "/home/runner")) -SUBSCRIPTION = os.environ.get("E2E_SUBSCRIPTION", "") -PROJECT = os.environ.get("E2E_PROJECT", "") -TENANT = os.environ.get("E2E_TENANT", "") -AGENT_NAME = os.environ.get("E2E_AGENT_NAME", "") # Optional: unique name for parallel isolation -CREATE_PROJECT = os.environ.get("E2E_CREATE_PROJECT", "").lower() in ("1", "true", "yes") -LOCATION = os.environ.get("E2E_LOCATION", "eastus2") # Region for new projects -# Inherit full parent PATH so tmux sessions get az-wrapper, azd, etc. -PARENT_PATH = os.environ.get("PATH", f"{HOME_DIR}/bin:/usr/local/bin:/usr/bin:/bin") -_tenant_env = f"; export AZURE_TENANT_ID={shlex.quote(TENANT)}" if TENANT else "" -# The GitHub token is intentionally NOT baked into ENV_SETUP. 
It is exported -# exactly once in setup() (immediately before the tmux scrollback is cleared), -# so the secret never lingers in pane history or gets duplicated across panes. -ENV_SETUP = f"export HOME={shlex.quote(HOME_DIR)}; export PATH={shlex.quote(PARENT_PATH)}{_tenant_env}" - -# Track results -results = {} -DEPLOY_MODE = os.environ.get("E2E_DEPLOY_MODE", "code") # "code" or "container" -_SENTINEL_BASE = "__DONE_{}_".format(os.getpid()) -_sentinel_counter = 0 - - -def get_gh_token(): - """Get GitHub token from env or gh CLI.""" - token = os.environ.get("GITHUB_TOKEN", os.environ.get("GH_TOKEN", "")) - if token: - return token - # Try native gh CLI - try: - r = subprocess.run(["gh", "auth", "token"], capture_output=True, text=True, timeout=10) - if r.returncode == 0 and r.stdout.strip(): - return r.stdout.strip() - except Exception: - pass - # Try Windows gh.exe (WSL local-dev only) - if os.path.exists("/mnt/c"): - try: - r = subprocess.run( - ["/mnt/c/Program Files/GitHub CLI/gh.exe", "auth", "token"], - capture_output=True, text=True, timeout=10 - ) - if r.returncode == 0 and r.stdout.strip(): - return r.stdout.strip() - except Exception: - pass - return "" - - -def tmux(*args): - cmd = [TMUX, "-L", SOCK] + list(args) - r = subprocess.run(cmd, capture_output=True, text=True, timeout=10) - if r.returncode != 0 and r.stderr: - print(f" [tmux error] {' '.join(args[:3])}: {r.stderr.strip()}") - return r.stdout - - -def send(text): - tmux("send-keys", "-t", SESS, "-l", text) - - -def key(k): - tmux("send-keys", "-t", SESS, k) - - -def capture(): - return tmux("capture-pane", "-t", SESS, "-p") - - -def wait_for(pattern, timeout=60): - deadline = time.time() + timeout - while time.time() < deadline: - cap = capture() - if pattern.lower() in cap.lower(): - return cap - time.sleep(1) - return None - - -def wait_for_or_fail(pattern, timeout=60, phase=""): - cap = wait_for(pattern, timeout) - if cap is None: - print(f"TIMEOUT waiting for: {pattern}") - print("Last 
capture:") - print(capture()) - if phase: - results[phase] = "FAIL (timeout)" - return None - return cap - - -def select_by_text(target, delay=1.5): - send(target) - time.sleep(delay) - key("Enter") - - -def show(label="", lines_count=15): - cap = capture() - lines = [l for l in cap.split("\n") if l.strip()] - if label: - print(f"\n--- {label} ---") - for l in lines[-lines_count:]: - print(f" {l}") - - -def run_cmd(cmd, timeout=600): - """Send command with unique sentinel and wait for completion. Returns (capture_text, exit_code). - - Each call uses a unique sentinel (base + counter) so that leftover output from - previous commands cannot cause a false match. - """ - global _sentinel_counter - _sentinel_counter += 1 - sentinel = f"{_SENTINEL_BASE}{_sentinel_counter}_" - sentinel_re = re.compile(re.escape(sentinel) + r"(\d+)") - - send(f"{cmd} ; echo {sentinel}$?") - key("Enter") - deadline = time.time() + timeout - while time.time() < deadline: - cap = capture() - m = sentinel_re.search(cap) - if m: - rc = int(m.group(1)) - return cap, rc - time.sleep(3) - return None, -1 - - -# Legacy: kept for reference, prefer run_cmd() -def _wait_for_shell_prompt_legacy(timeout=600): - """Wait for bash prompt (command finished).""" - deadline = time.time() + timeout - while time.time() < deadline: - cap = capture() - lines = [l for l in cap.split("\n") if l.strip()] - if lines: - last = lines[-1].strip() - if last.endswith("$") or last.startswith("bash"): - return cap - time.sleep(3) - return None - - -def validate_init_output(testdir): - """Validate init produced correct artifacts on disk.""" - import glob as _glob - for d in os.listdir(testdir): - subdir = os.path.join(testdir, d) - if os.path.isdir(subdir): - azure_yaml = os.path.join(subdir, "azure.yaml") - if os.path.exists(azure_yaml): - with open(azure_yaml) as f: - content = f.read() - if "host:" in content and "azure.ai.agent" in content: - # agent.yaml may be nested under src// - agent_yamls = 
_glob.glob(os.path.join(subdir, "**", "agent.yaml"), recursive=True) - if agent_yamls or os.path.exists(os.path.join(subdir, "agent.yaml")): - return True - return False - - -def find_service_name(testdir): - """Read the first service name from azure.yaml under the generated project.""" - for d in os.listdir(testdir): - subdir = os.path.join(testdir, d) - azure_yaml_path = os.path.join(subdir, "azure.yaml") - if os.path.isdir(subdir) and os.path.exists(azure_yaml_path): - with open(azure_yaml_path) as f: - content = f.read() - in_services = False - for line in content.split("\n"): - if line.strip() == "services:": - in_services = True - continue - if in_services and line.startswith(" ") and line.strip().endswith(":"): - return line.strip().rstrip(":") - if in_services and not line.startswith(" ") and line.strip(): - break - return None - - -def _assert_safe_testdir(path): - """Guardrail before `rm -rf`: refuse a path that is not a clearly disposable - test dir, so a bad E2E_TESTDIR (e.g. '/', '/tmp', '$HOME') can never trigger - a destructive delete. Returns the normalized absolute path.""" - abspath = os.path.abspath(path) - home = os.path.abspath(os.path.expanduser("~")) - protected = {"/", "/tmp", "/var", "/usr", "/etc", "/bin", "/lib", - "/root", "/home", home} - if abspath in protected or abspath.count("/") < 2: - raise RuntimeError( - f"Refusing to `rm -rf` unsafe E2E_TESTDIR={path!r} (resolved {abspath!r})") - return abspath - - -# =========================================================== -# SETUP -# =========================================================== -def setup(): - global TESTDIR - print("=" * 60) - print("SETUP") - print("=" * 60) - - # Bound kill-server so a wedged tmux cannot hang the whole CI job. It stays - # best-effort (no check): a "no server running" error here is expected. 
- subprocess.run([TMUX, "-L", SOCK, "kill-server"], capture_output=True, timeout=30) - time.sleep(0.5) - - # Clean test dir (guard against a destructive E2E_TESTDIR like '/' or '/tmp'). - # Normalize to a vetted ABSOLUTE path and assign it back to the module global - # so every later step (cd, payload --input-file, dir scans, teardown) acts on - # the exact directory we wipe/create here — even if E2E_TESTDIR was relative. - # check=True surfaces a failed delete instead of running against a dirty dir; - # timeout keeps a stuck delete from stalling the job with no diagnostic. - TESTDIR = _assert_safe_testdir(TESTDIR) - subprocess.run(["rm", "-rf", "--", TESTDIR], check=True, timeout=120) - os.makedirs(TESTDIR, exist_ok=True) - - # Create tmux session - tmux("new-session", "-d", "-s", SESS, "-x", "200", "-y", "50", "bash --norc --noprofile") - time.sleep(2) - - cap = capture() - print(f"Session alive: {len(cap)} chars") - - # Set environment - gh_token = get_gh_token() - env_cmd = ENV_SETUP - if gh_token: - env_cmd += f"; export GH_TOKEN={shlex.quote(gh_token)}; export GITHUB_TOKEN={shlex.quote(gh_token)}" - print(f"GitHub token: {len(gh_token)} chars") - send(env_cmd) - key("Enter") - time.sleep(1) - # Clear scrollback to avoid token leaking into capture output. `clear` only - # wipes the visible screen, so also drop tmux's scrollback buffer — the GH - # token was just typed into this pane and must never resurface in capture(). - send("clear") - key("Enter") - time.sleep(0.5) - tmux("clear-history", "-t", SESS) - - send("echo ENV_OK") - key("Enter") - time.sleep(2) - cap = capture() - if "ENV_OK" not in cap: - print("ERROR: Environment setup failed") - sys.exit(1) - print("Environment OK") - - # Auth config: CI uses az CLI (OIDC token), local WSL uses azd built-in auth. - # In CI, the pipeline logs az CLI in via OIDC → azd needs useAzCliAuth=true. - # In WSL, az CLI is slow (cross-process) → must use azd built-in auth. 
- # Detection: GitHub Actions (GITHUB_ACTIONS), Azure DevOps (TF_BUILD), or an - # explicit E2E_USE_AZ_CLI_AUTH override for other CI / manual runs. - _use_az_cli_auth = ( - os.environ.get("E2E_USE_AZ_CLI_AUTH", "").lower() in ("1", "true", "yes") - or bool(os.environ.get("GITHUB_ACTIONS")) - or bool(os.environ.get("TF_BUILD")) # Azure DevOps pipeline - ) - if _use_az_cli_auth: - send("azd config set auth.useAzCliAuth true") - else: - send("azd config unset auth.useAzCliAuth 2>/dev/null") - key("Enter") - time.sleep(1) - - send(f"cd {shlex.quote(TESTDIR)}") - key("Enter") - time.sleep(1) - - -# =========================================================== -# PHASE 1: INIT -# =========================================================== -def phase_init(): - print("\n" + "=" * 60) - print("PHASE 1: azd ai agent init") - print("=" * 60) - - init_cmd = "azd ai agent init" - if AGENT_NAME: - init_cmd += f" --agent-name {shlex.quote(AGENT_NAME)}" - send(init_cmd) - key("Enter") - time.sleep(8) - - # Step 1: Language - if not wait_for_or_fail("Select a language", 30, "init"): - return False - print("[1] Language: Python") - select_by_text("Python") - time.sleep(3) - - # Step 2: Template - if not wait_for_or_fail("Select a starter template", 30, "init"): - return False - print("[2] Template: Basic agent (Invocations)") - select_by_text("Basic agent (Invocations") - time.sleep(8) - - # Step 2.5: Git protocol (may appear between template download and name prompt) - time.sleep(3) - cap = capture() - if "protocol" in cap.lower() or "git operations" in cap.lower(): - print("[2.5] Git protocol: HTTPS (default)") - key("Enter") - time.sleep(3) - - # Step 3: Name (may be skipped if --agent-name was used) - if AGENT_NAME: - print(f"[3] Name: {AGENT_NAME} (via --agent-name, prompt may be skipped)") - # Wait briefly for name prompt — if it doesn't appear, flag worked - cap = wait_for("Enter a name", 15) - if cap: - key("Enter") - time.sleep(5) - else: - if not wait_for_or_fail("Enter a 
name", 30, "init"): - return False - print("[3] Name: default") - key("Enter") - time.sleep(8) - - # Step 4: Foundry project type - if not wait_for_or_fail("Select a Foundry project", 30, "init"): - return False - - if CREATE_PROJECT: - # Create a new Foundry project — azd manages all resources - print("[4] Create a new Foundry project") - select_by_text("Create") - time.sleep(5) - # Remaining prompts (subscription, location, names) handled by dynamic loop - else: - # Use existing Foundry project - print("[4] Use existing Foundry project") - key("Enter") - - # Step 5: Wait for subscription or project picker - deadline = time.time() + 30 - while time.time() < deadline: - time.sleep(3) - cap = capture() - lines = [l for l in cap.split("\n") if l.strip()] - active_prompt = "" - for l in reversed(lines): - if l.strip().startswith("?"): - active_prompt = l.strip().lower() - break - if "subscription" in active_prompt: - print("[5] Subscription: accept default") - key("Enter") - time.sleep(10) - if not wait_for_or_fail("Select a Foundry project", 30, "init"): - return False - break - elif "select a foundry project" in active_prompt and "use an existing" not in active_prompt: - print("[5] Subscription: skipped (already on project picker)") - break - if lines and (lines[-1].strip().endswith("$") or lines[-1].strip().startswith("bash")): - if any("error" in l.lower() for l in lines[-5:]): - print("[5] ERROR: CLI exited") - show("Error") - results["init"] = "FAIL (error)" - return False - else: - print("[5] Timeout waiting for subscription/project picker") - show("Timeout") - results["init"] = "FAIL (timeout step 5)" - return False - - # Step 6: Project — verify we're on the project picker before typing - cap = capture() - cap_lines = [l for l in cap.split("\n") if l.strip()] - last_prompt = "" - for l in reversed(cap_lines): - if l.strip().startswith("?"): - last_prompt = l.strip().lower() - break - - if "foundry project" in last_prompt or "project" in last_prompt: - 
print(f"[6] Project: {PROJECT}") - if PROJECT: - select_by_text(PROJECT, delay=3) - else: - key("Enter") - time.sleep(10) - - # Verify we're past the project picker (not stuck) - time.sleep(3) - cap = capture() - prompt_line = "" - for l in reversed(cap.split("\n")): - if l.strip().startswith("?"): - prompt_line = l.strip().lower() - break - if "select a foundry project" in prompt_line: - print("[6b] Project filter may have failed, accepting highlighted") - key("Enter") - time.sleep(5) - else: - print("[6] Not on project picker, moving to dynamic") - - # Step 7+: Dynamic prompts - _last_prompt = "" - _same_prompt_count = 0 - for step_num in range(7, 45): - time.sleep(3) - cap = capture() - cap_lower = cap.lower() - - if "added to your azd project" in cap_lower or "agent definition added" in cap_lower: - print(f"[{step_num}] === INIT COMPLETE ===") - if not validate_init_output(TESTDIR): - print(" WARNING: marker found but disk validation failed, checking...") - time.sleep(5) - if not validate_init_output(TESTDIR): - print(" FAIL: artifacts not on disk despite completion marker") - results["init"] = "FAIL (no artifacts)" - return False - results["init"] = "PASS" - return True - - # Check for error exit - lines = [l for l in cap.split("\n") if l.strip()] - if lines: - last = lines[-1].strip() - if (last.endswith("$") or last.startswith("bash")): - if "error" in cap_lower: - print(f"[{step_num}] Init exited with error") - show("Error") - results["init"] = "FAIL (error)" - return False - - # Find ? 
prompt - prompt = "" - for l in reversed(lines): - if l.strip().startswith("?"): - prompt = l.strip().lower() - break - - if not prompt: - time.sleep(5) - cap = capture() - lines = [l for l in cap.split("\n") if l.strip()] - for l in reversed(lines): - if l.strip().startswith("?"): - prompt = l.strip().lower() - break - - if not prompt: - if lines and (lines[-1].strip().startswith("bash") or lines[-1].strip().endswith("$")): - # Check if init completed without marker - if validate_init_output(TESTDIR): - print(f"[{step_num}] Init complete (disk validation)") - results["init"] = "PASS" - return True - print(f"[{step_num}] Shell prompt, no completion marker") - show("Final") - results["init"] = "FAIL (no completion)" - return False - print(f"[{step_num}] Waiting...") - continue - - print(f"[{step_num}] {prompt[:80]}") - - # Detect prompt loops — same prompt question repeating 3+ times - # Compare by question part before ':' to handle varying filter text - colon_idx = prompt.find(":") - prompt_key = prompt[:colon_idx].strip() if colon_idx > 0 else prompt.strip() - if prompt_key == _last_prompt: - _same_prompt_count += 1 - else: - _same_prompt_count = 1 - _last_prompt = prompt_key - - if _same_prompt_count >= 3: - print(f" !! Loop detected ({_same_prompt_count}x same prompt)") - if "model" in prompt or "is specified" in prompt: - # Model prompt looping — probably no quota. Try Down to pick alt option. 
- print(" -> navigating to alternative option") - key("Down") - time.sleep(0.3) - key("Enter") - time.sleep(3) - continue - elif _same_prompt_count >= 5: - print(" FAIL: stuck in prompt loop") - results["init"] = "FAIL (prompt loop)" - return False - - # Handle prompts - if "[y/n]" in prompt or "(y/n)" in prompt: - # Confirm prompts — answer yes unless it's asking to reuse a conflicting name - if "continue with this existing agent name" in prompt: - print(" -> no (use fresh name)") - send("n") - key("Enter") - else: - print(" -> yes") - send("y") - key("Enter") - elif "protocol" in prompt or "git operations" in prompt: - # "What is your preferred protocol for Git operations?" → HTTPS (default) - print(" -> HTTPS (default)") - key("Enter") - elif "enter a different name" in prompt: - print(" -> default name") - key("Enter") - elif "acr" in prompt or "container registry" in prompt: - print(" -> blank (create new)") - key("Enter") - elif "enter model deployment name" in prompt or ("enter" in prompt and "deployment" in prompt and "name" in prompt): - print(" -> default name") - key("Enter") - elif "existing deployment" in prompt or "is specified in the agent manifest" in prompt or ("found" in prompt and "deployment" in prompt): - print(" -> use existing/specified") - key("Enter") - elif "capacity" in prompt: - # Capacity field is usually pre-filled; accept default - print(" -> accept capacity (default)") - key("Enter") - elif "sku" in prompt: - print(" -> default SKU") - key("Enter") - elif "version" in prompt: - print(" -> default version") - key("Enter") - elif "select" in prompt and "model" in prompt: - print(" -> select gpt-4o-mini") - select_by_text("gpt-4o-mini") - elif "subscription" in prompt: - if SUBSCRIPTION: - print(f" -> subscription: filter by {SUBSCRIPTION[:8]}") - select_by_text(SUBSCRIPTION[:8], delay=2) - else: - print(" -> subscription: accept default") - key("Enter") - elif "location" in prompt or "region" in prompt: - print(f" -> location: 
{LOCATION}") - select_by_text(LOCATION, delay=2) - elif "foundry project" in prompt or ("select" in prompt and "project" in prompt): - if PROJECT: - print(f" -> project: {PROJECT}") - select_by_text(PROJECT, delay=3) - else: - print(" -> default project") - key("Enter") - elif "account name" in prompt or "resource name" in prompt or "hub name" in prompt: - print(" -> accept default name") - key("Enter") - elif "model" in prompt and "capacity" not in prompt: - print(" -> default model") - key("Enter") - elif "deploy" in prompt and ("mode" in prompt or "how" in prompt) and "capacity" not in prompt: - if DEPLOY_MODE == "container": - print(" -> Container") - select_by_text("Container") - else: - print(" -> Source Code") - select_by_text("Source") - elif "what would you like to do" in prompt: - # Accept "Exit setup" (default) to finish init. - # Do NOT navigate up/down — that causes infinite loops by selecting - # "Add another model" or similar options. - print(" -> Exit setup (default)") - key("Enter") - else: - print(" -> Enter (default)") - key("Enter") - time.sleep(3) - - results["init"] = "FAIL (too many steps)" - return False - - -# =========================================================== -# PHASE 2: PROVISION -# =========================================================== -def phase_provision(): - print("\n" + "=" * 60) - print("PHASE 2: azd provision") - print("=" * 60) - - # Find the project subdirectory created by init - project_dir = None - for d in os.listdir(TESTDIR): - subdir = os.path.join(TESTDIR, d) - if os.path.isdir(subdir) and os.path.exists(os.path.join(subdir, "azure.yaml")): - project_dir = subdir - break - - if not project_dir: - print("ERROR: No project directory with azure.yaml found") - results["provision"] = "FAIL (no project dir)" - return False - - print(f"Project dir: {project_dir}") - send(f"cd {shlex.quote(project_dir)}") - key("Enter") - time.sleep(1) - - # Provision can take several minutes - print("Waiting for provision to complete 
(up to 10 min)...") - cap, rc = run_cmd("azd provision --no-prompt", timeout=600) - if cap is None: - print("TIMEOUT: provision did not complete in 10 min") - show("Current state", 20) - results["provision"] = "FAIL (timeout)" - return False - - show("Provision result", 20) - if rc != 0: - print(f"Provision FAILED (exit code {rc})") - results["provision"] = f"FAIL (exit code {rc})" - return False - - print("Provision appears complete") - results["provision"] = "PASS" - return True - - -# =========================================================== -# PHASE 3: DEPLOY -# =========================================================== -def phase_deploy(): - print("\n" + "=" * 60) - print("PHASE 3: azd deploy") - print("=" * 60) - - # Deploy can take several minutes - print("Waiting for deploy to complete (up to 10 min)...") - cap, rc = run_cmd("azd deploy --no-prompt", timeout=600) - if cap is None: - print("TIMEOUT: deploy did not complete in 10 min") - show("Current state", 20) - results["deploy"] = "FAIL (timeout)" - return False - - show("Deploy result", 20) - if rc != 0: - print(f"Deploy FAILED (exit code {rc})") - results["deploy"] = f"FAIL (exit code {rc})" - return False - - print("Deploy appears complete") - results["deploy"] = "PASS" - return True - - -# =========================================================== -# PHASE 4: INVOKE -# =========================================================== -def phase_invoke(): - print("\n" + "=" * 60) - print("PHASE 4: azd ai agent invoke") - print("=" * 60) - - # Wait for agent to fully start after deploy - wait_secs = 60 if DEPLOY_MODE == "container" else 30 - print(f"Waiting {wait_secs}s for agent startup ({DEPLOY_MODE} mode)...") - time.sleep(wait_secs) - - # The invocations protocol requires JSON payload via --input-file. - # Positional message sends empty body to invocations agents (azd bug/limitation). 
- service_name = find_service_name(TESTDIR) - if not service_name: - print("ERROR: Could not determine service name from azure.yaml") - results["invoke"] = "FAIL (no service name)" - return False - print(f" Service name: {service_name}") - - # Write payload to temp file for --input-file - payload_file = os.path.join(TESTDIR, ".invoke-payload.json") - with open(payload_file, "w") as f: - f.write('{"message": "Hello, what is 2+2?"}') - - max_retries = 3 - for attempt in range(1, max_retries + 1): - print(f"\nInvoke attempt {attempt}/{max_retries}...") - cap, rc = run_cmd( - f"azd ai agent invoke {shlex.quote(service_name)} --new-session -f {shlex.quote(payload_file)}", - timeout=180, - ) - if cap is None: - print("TIMEOUT: invoke did not complete in 3 min") - show("Current state", 20) - if attempt == max_retries: - results["invoke"] = "FAIL (timeout)" - return False - continue - - show("Invoke result", 20) - - # Check for errors - # Look for ERROR line in last few lines of output - lines = [l for l in cap.split("\n") if l.strip()] - has_error = False - error_msg = "" - if rc != 0: - for l in lines: - if "ERROR:" in l or ("error" in l.lower() and "500" in l): - has_error = True - error_msg = l.strip() - break - if not error_msg: - error_msg = f"exit code {rc}" - - if rc != 0 and has_error and ("500" in error_msg or "Internal Server Error" in error_msg): - print(f" Server error: {error_msg[:100]}") - if attempt < max_retries: - print(" Retrying in 30s (container may still be starting)...") - time.sleep(30) - continue - else: - # Get container logs for debugging - print("\n Fetching agent logs for debugging...") - send(f"azd ai agent monitor {shlex.quote(service_name)} --tail 50") - key("Enter") - time.sleep(10) - log_cap = _wait_for_shell_prompt_legacy(timeout=60) - if log_cap: - show("Agent logs", 30) - results["invoke"] = f"FAIL (HTTP 500: {error_msg[:80]})" - return False - elif rc != 0: - print(f" Error: {error_msg[:100]}") - if attempt < max_retries: - 
time.sleep(15) - continue - results["invoke"] = f"FAIL ({error_msg[:80]})" - return False - else: - # Success — verify response content - # Extract lines between the LAST invoke command and its sentinel. - # The capture may contain output from previous phases, so we must - # find the last occurrence of the invoke command to avoid matching - # stale sentinels from earlier phases (deploy, provision, etc.). - all_lines = cap.split("\n") - # Find the last line that contains the invoke command - invoke_start = -1 - for i in range(len(all_lines) - 1, -1, -1): - if "invoke" in all_lines[i].lower() and service_name in all_lines[i]: - invoke_start = i - break - - resp_lines = [] - if invoke_start >= 0: - for line in all_lines[invoke_start + 1:]: - if _SENTINEL_BASE in line: - break - if line.strip(): - resp_lines.append(line.strip()) - - response_text = "\n".join(resp_lines) - if not response_text.strip(): - print(" WARNING: invoke returned empty response") - if attempt < max_retries: - print(" Retrying...") - time.sleep(15) - continue - results["invoke"] = "FAIL (empty response)" - return False - - # Payload asks "what is 2+2?". Accept a standalone "4" token or the - # spelled-out word "four" (a live model may answer either). The regex - # requires "4" to stand alone so unrelated "4"s in captured output — - # model names ("gpt-4o-mini"), versions ("4.1"), or status codes - # ("404") — don't produce a false pass. - has_expected = ( - re.search(r"(?>> --keep flag: skipping teardown, agent remains deployed <<<") - results["teardown"] = "SKIPPED (--keep)" - else: - if not args.keep: - phase_teardown() - else: - # Init failed — but may have already created Azure resources (RG, project). - # Attempt cleanup if there's a .azure directory indicating provisioned state. 
- project_dir = None - if os.path.isdir(TESTDIR): - for d in os.listdir(TESTDIR): - azure_dir = os.path.join(TESTDIR, d, ".azure") - if os.path.isdir(azure_dir): - project_dir = os.path.join(TESTDIR, d) - break - if project_dir and not args.keep: - print(f"\nInit failed but found .azure in {project_dir} — attempting cleanup...") - send(f"cd {shlex.quote(project_dir)}") - key("Enter") - time.sleep(1) - phase_teardown() - - # Cleanup tmux - tmux("kill-session", "-t", SESS) - - elapsed = time.time() - start_time - print("\n" + "=" * 60) - print(f"RESULTS (elapsed: {elapsed:.0f}s)") - print("=" * 60) - all_pass = True - for phase, result in results.items(): - status = "✓" if "PASS" in result or "SKIPPED" in result else "✗" - print(f" {status} {phase}: {result}") - if "FAIL" in result: - all_pass = False - - required = ["init", "provision", "deploy", "invoke"] - passed_required = all(results.get(p, "").startswith("PASS") for p in required) - # A failed `azd down` leaks live Azure resources, so a teardown FAIL must fail - # the run too — never report a green run while leaking. "SKIPPED (--keep)" and - # an unreached teardown don't start with "FAIL", so they don't trip this. - teardown_ok = not results.get("teardown", "").startswith("FAIL") - - if passed_required and teardown_ok: - print("\n✓ ALL REQUIRED PHASES PASSED") - sys.exit(0) - else: - missing = [p for p in required if not results.get(p, "").startswith("PASS")] - if not teardown_ok: - missing.append("teardown") - print(f"\n✗ FAILED PHASES: {', '.join(missing)}") - sys.exit(1) diff --git a/cli/azd/extensions/azure.ai.agents/tests/e2e-live/test_tier2.py b/cli/azd/extensions/azure.ai.agents/tests/e2e-live/test_tier2.py deleted file mode 100644 index de4f9df3d76..00000000000 --- a/cli/azd/extensions/azure.ai.agents/tests/e2e-live/test_tier2.py +++ /dev/null @@ -1,228 +0,0 @@ -#!/usr/bin/env python3 -"""Tier 2: Full E2E golden path tests — code deploy + container deploy. 
- -Runs test_full_e2e.py once per deploy mode (code, then container), sequentially. -Each run is isolated with its own: - - deploy mode (code vs container) - - tmux session/socket name - - working directory - - AZD_CONFIG_DIR (copied from ~/.azd so the installed extension is available) - - unique agent name (avoids Azure resource collisions) - -Prerequisites: - - Same as test_full_e2e.py: Linux (including WSL) with tmux, azd, az CLI, - tokens. Runs unattended on the Azure DevOps Linux agent in CI. - - Model quota for one deployment at a time -""" -import subprocess -import sys -import os -import time -import tempfile -import shutil -import hashlib -import collections -import threading - -SCRIPT_DIR = os.path.dirname(os.path.abspath(__file__)) - - -def _cleanup_leaked_resources(testdir, env, label): - """Best-effort azd down for any project dirs left behind after timeout/crash.""" - if not os.path.isdir(testdir): - return - for d in os.listdir(testdir): - project_dir = os.path.join(testdir, d) - azure_yaml = os.path.join(project_dir, "azure.yaml") - if os.path.isdir(project_dir) and os.path.isfile(azure_yaml): - print(f" [{label}] Cleaning up leaked resources in {project_dir}...") - try: - r = subprocess.run( - ["azd", "down", "--force", "--purge", "--no-prompt"], - cwd=project_dir, env=env, - capture_output=True, text=True, timeout=300, - ) - if r.returncode == 0: - print(f" [{label}] Cleanup complete") - else: - print(f" [{label}] Cleanup FAILED (exit {r.returncode}) — " - f"resources may be leaked, check the subscription") - if r.stderr.strip(): - print(f" [{label}] [stderr] {r.stderr.strip()[:300]}") - except Exception as e: - print(f" [{label}] Cleanup failed: {e}") - - -def run_e2e(deploy_mode, label): - """Run a full E2E test with the given deploy mode.""" - sock = f"e2e-{deploy_mode}" - sess = f"e2e-{deploy_mode}" - testdir = f"/tmp/e2e-tests/tier2-{deploy_mode}" - - script_path = os.path.join(SCRIPT_DIR, "test_full_e2e.py") - - # Use sys.executable (not 
"python3") so the child runs under the exact same - # interpreter/version as this parent — matches the UsePythonVersion@0 pinned - # Python in CI and works inside virtualenvs. - cmd = [ - sys.executable, script_path, "--deploy-mode", deploy_mode - ] - - env = os.environ.copy() - env["E2E_DEPLOY_MODE"] = deploy_mode - env["E2E_SOCK"] = sock - env["E2E_SESS"] = sess - env["E2E_TESTDIR"] = testdir - # Isolate azd config per process to prevent parallel race on ~/.azd/config.json - # Use AZD_CONFIG_DIR (not AZURE_CONFIG_DIR which is for az CLI). - # Place outside testdir because child process rm -rf's testdir on startup. - # Copy from default ~/.azd so extensions (installed there) are available. - azd_config_dir = os.path.join(tempfile.gettempdir(), f"e2e-azd-config-{deploy_mode}") - default_azd = os.path.expanduser("~/.azd") - if os.path.isdir(default_azd): - if os.path.isdir(azd_config_dir): - shutil.rmtree(azd_config_dir) - shutil.copytree(default_azd, azd_config_dir) - else: - os.makedirs(azd_config_dir, exist_ok=True) - env["AZD_CONFIG_DIR"] = azd_config_dir - # Unique agent name to avoid Azure resource collisions across runs. - # sha256 (not md5) only to avoid noise from security scanners — this is a - # non-cryptographic uniqueness suffix. - unique_suffix = hashlib.sha256(f"{deploy_mode}-{os.getpid()}".encode()).hexdigest()[:6] - env["E2E_AGENT_NAME"] = f"e2e-{deploy_mode}-{unique_suffix}" - - print(f"\n{'='*60}") - print(f"[{label}] Starting: deploy_mode={deploy_mode}, sock={sock}") - print(f"{'='*60}") - - # Hard cap for the whole child run. Must be strictly GREATER than the SUM of - # the child's own sequential per-phase bounds in test_full_e2e.py, so this - # watchdog only trips for a truly hung child and never preempts a - # slow-but-healthy run (which would be a spurious failure AND would skip the - # child's azd teardown, leaking live Azure resources). 
Child phase budget: - # setup ~3m + init ~5m + provision 10m + deploy 10m + invoke ~12m - # + teardown 10m ~= 50m. - # The 10 min margin over that 50m sum is a full extra teardown budget, so even - # a run that exhausts every earlier phase still has room to tear down - # gracefully before this hard kill fires. - timeout_s = 3600 # 60 min hard cap per child (50m phase sum + 10m teardown margin) - keep_artifacts = os.environ.get("E2E_KEEP_ARTIFACTS", "").lower() in ("1", "true", "yes") - start = time.time() - try: - # Stream child output live (visible in the CI log, nothing buffered in - # memory) while keeping a bounded tail for the summary. A watchdog timer - # enforces the hard timeout even if the child hangs without any output. - tail = collections.deque(maxlen=30) - proc = subprocess.Popen( - cmd, env=env, text=True, bufsize=1, - stdout=subprocess.PIPE, stderr=subprocess.STDOUT, - ) - if proc.stdout is None: # stdout=PIPE guarantees a pipe; be explicit for `python -O` - raise RuntimeError("subprocess stdout pipe was not created") - timed_out = threading.Event() - - def _on_timeout(): - timed_out.set() - proc.kill() - - watchdog = threading.Timer(timeout_s, _on_timeout) - watchdog.start() - try: - for line in proc.stdout: - sys.stdout.write(line) - sys.stdout.flush() - tail.append(line.rstrip("\n")) - finally: - watchdog.cancel() - returncode = proc.wait() - elapsed = time.time() - start - - if timed_out.is_set(): - print(f"\n--- [{label}] TIMEOUT after {elapsed:.0f}s ---") - # The child's tmux server runs detached, so killing the child Python - # process does not stop it. Tear it down explicitly so we don't leak - # orphaned tmux servers/sockets on reused CI agents. 
- tmux_bin = env.get("E2E_TMUX", "tmux") - try: - subprocess.run( - [tmux_bin, "-L", sock, "kill-server"], - capture_output=True, text=True, timeout=30, - ) - except Exception as e: - print(f" [{label}] tmux kill-server failed: {e}") - # Best-effort cleanup so a hung run does not leak Azure resources. - _cleanup_leaked_resources(testdir, env, label) - return { - "label": label, - "deploy_mode": deploy_mode, - "success": False, - "elapsed": elapsed, - "returncode": -1, - } - - print(f"\n--- [{label}] Summary ({elapsed:.0f}s, exit {returncode}) ---") - for line in tail: - print(f" {line}") - return { - "label": label, - "deploy_mode": deploy_mode, - "success": returncode == 0, - "elapsed": elapsed, - "returncode": returncode, - } - finally: - # Drop the per-mode AZD_CONFIG_DIR copy unless explicitly kept for debugging. - if not keep_artifacts and os.path.isdir(azd_config_dir): - shutil.rmtree(azd_config_dir, ignore_errors=True) - - -if __name__ == "__main__": - import argparse - parser = argparse.ArgumentParser(description="Tier 2: Golden path E2E tests") - parser.add_argument("--mode", choices=["both", "code", "container"], default="both", - help="Which mode(s) to run") - args = parser.parse_args() - - print("=" * 60) - print("TIER 2: Golden Path E2E Tests") - print("=" * 60) - - tests = [] - if args.mode in ("both", "code"): - tests.append(("code", "CODE-DEPLOY")) - if args.mode in ("both", "container"): - tests.append(("container", "CONTAINER-DEPLOY")) - - print(f" Tests: {[t[1] for t in tests]}") - print(" Execution: sequential") - - start_all = time.time() - results = [] - - # Always run sequentially: concurrent deploys in the same subscription race - # on shared resources (ACR, Foundry project) and exhaust model quota. 
- for mode, label in tests: - result = run_e2e(mode, label) - results.append(result) - - total_elapsed = time.time() - start_all - - # Summary - print(f"\n{'='*60}") - print(f"TIER 2 RESULTS ({total_elapsed:.0f}s total)") - print("=" * 60) - all_pass = True - for r in results: - status = "✓" if r["success"] else "✗" - print(f" {status} {r['label']}: {'PASS' if r['success'] else 'FAIL'} ({r['elapsed']:.0f}s)") - if not r["success"]: - all_pass = False - - if all_pass: - print(f"\n✓ ALL TIER 2 TESTS PASSED ({total_elapsed:.0f}s)") - sys.exit(0) - else: - failed = [r["label"] for r in results if not r["success"]] - print(f"\n✗ FAILED: {', '.join(failed)}") - sys.exit(1) diff --git a/cli/azd/extensions/azure.ai.agents/tests/e2e-live/tier2_live_test.go b/cli/azd/extensions/azure.ai.agents/tests/e2e-live/tier2_live_test.go new file mode 100644 index 00000000000..1c0535d57b2 --- /dev/null +++ b/cli/azd/extensions/azure.ai.agents/tests/e2e-live/tier2_live_test.go @@ -0,0 +1,770 @@ +// Copyright (c) Microsoft Corporation. All rights reserved. +// Licensed under the MIT License. + +//go:build linux + +package e2elive + +import ( + "bytes" + "context" + "crypto/sha256" + "encoding/hex" + "errors" + "fmt" + "io" + "io/fs" + "os" + "os/exec" + "path/filepath" + "strings" + "syscall" + "testing" + "time" +) + +// liveEnvVar gates the live test: it only runs when set to "1". This keeps the +// expensive, Azure-touching test out of the normal `go test ./...` run. +const liveEnvVar = "AZURE_AI_AGENTS_E2E_LIVE" + +// Virtual terminal dimensions for the interactive init phase. +const ( + initCols = 200 + initRows = 50 +) + +// Phase time budgets. The per-mode runTimeout must exceed the sum of the phase +// budgets so a slow-but-healthy run is never preempted (which would also skip +// the teardown and leak resources). 
Two modes at runTimeout each fit inside the +// `go test -timeout 125m` cap used by the pipeline, whose ADO step adds a small +// margin on top before force-killing the process. +const ( + runTimeout = 60 * time.Minute + initTimeout = 8 * time.Minute + provisionTimeout = 10 * time.Minute + deployTimeout = 10 * time.Minute + invokeTimeout = 3 * time.Minute + monitorTimeout = 60 * time.Second + teardownTimeout = 10 * time.Minute + + initStepDelay = 3 * time.Second +) + +// TestTier2Live exercises the full golden path against live Azure for each +// requested deploy mode, sequentially (concurrent deploys in one subscription +// race on shared resources and exhaust model quota). +func TestTier2Live(t *testing.T) { + if os.Getenv(liveEnvVar) != "1" { + t.Skipf("set %s=1 to run the live Tier 2 golden-path test", liveEnvVar) + } + + for _, mode := range deployModesFromEnv() { + t.Run(mode, func(t *testing.T) { + r := newRunner(t, mode) + ctx, cancel := context.WithTimeout(context.Background(), runTimeout) + defer cancel() + r.run(ctx) + }) + } +} + +// deployModesFromEnv reads E2E_DEPLOY_MODES (code|container|both); default both. +func deployModesFromEnv() []string { + switch strings.ToLower(strings.TrimSpace(os.Getenv("E2E_DEPLOY_MODES"))) { + case "code": + return []string{"code"} + case "container": + return []string{"container"} + default: + return []string{"code", "container"} + } +} + +// runner holds the per-mode state for one golden-path run. +type runner struct { + t *testing.T + mode string + testDir string + agentName string + env []string + projectDir string + c *console +} + +// newRunner prepares an isolated working directory, a private AZD_CONFIG_DIR +// (copied from ~/.azd so the installed extension is available), and a unique +// agent name, then registers teardown so resources are cleaned up even on +// failure. 
+func newRunner(t *testing.T, mode string) *runner { + t.Helper() + + testDir := getenvDefault("E2E_TESTDIR", "/tmp/e2e-tests/tier2-"+mode) + if err := assertSafeTestDir(testDir); err != nil { + t.Fatal(err) + } + if err := os.RemoveAll(testDir); err != nil { + t.Fatalf("clean test dir: %v", err) + } + if err := os.MkdirAll(testDir, 0o700); err != nil { + t.Fatalf("create test dir: %v", err) + } + + configDir := filepath.Join(os.TempDir(), "e2e-azd-config-"+mode) + setupConfigDir(t, configDir) + + env := os.Environ() + env = append(env, "AZD_CONFIG_DIR="+configDir) + if tenant := os.Getenv("E2E_TENANT"); tenant != "" { + env = append(env, "AZURE_TENANT_ID="+tenant) + } + if tok := ghToken(); tok != "" { + env = append(env, "GH_TOKEN="+tok, "GITHUB_TOKEN="+tok) + } + + r := &runner{ + t: t, + mode: mode, + testDir: testDir, + agentName: fmt.Sprintf("e2e-%s-%s", mode, shortHash(mode)), + env: env, + } + + // Cleanups run LIFO, so register the config-dir delete first and teardown + // second: teardown (azd down) runs before the config copy it relies on is + // removed. + if !envTrue("E2E_KEEP_ARTIFACTS") { + t.Cleanup(func() { _ = os.RemoveAll(configDir) }) + } + t.Cleanup(r.teardown) + + // CI (GitHub Actions / Azure DevOps / explicit override) uses the az CLI + // session for auth; local WSL keeps azd's built-in auth (az CLI is slow there). + if useAzCliAuth() { + _, _ = r.runAzd(context.Background(), testDir, time.Minute, + "config", "set", "auth.useAzCliAuth", "true") + } + + return r +} + +// setupConfigDir creates configDir as a copy of ~/.azd (so installed extensions +// resolve), or an empty dir if ~/.azd is absent. cp -a preserves the extension +// binary's executable bit. 
+func setupConfigDir(t *testing.T, configDir string) { + t.Helper() + + home, err := os.UserHomeDir() + if err != nil { + t.Fatalf("resolve home dir: %v", err) + } + defaultAzd := filepath.Join(home, ".azd") + if info, err := os.Stat(defaultAzd); err == nil && info.IsDir() { + _ = os.RemoveAll(configDir) + //nolint:gosec // both paths derive from HOME / TempDir, not user input. + out, err := exec.Command("cp", "-a", defaultAzd, configDir).CombinedOutput() + if err != nil { + t.Fatalf("copy azd config dir: %v: %s", err, out) + } + return + } + if err := os.MkdirAll(configDir, 0o700); err != nil { + t.Fatalf("create azd config dir: %v", err) + } +} + +// run executes the phases in order, stopping at the first failure. Teardown is +// registered separately as a cleanup, so it always runs. +func (r *runner) run(ctx context.Context) { + if err := r.phaseInit(ctx); err != nil { + r.t.Errorf("init: %v", err) + return + } + if err := r.phaseProvision(ctx); err != nil { + r.t.Errorf("provision: %v", err) + return + } + if err := r.phaseDeploy(ctx); err != nil { + r.t.Errorf("deploy: %v", err) + return + } + if err := r.phaseInvoke(ctx); err != nil { + r.t.Errorf("invoke: %v", err) + return + } +} + +// phaseInit runs `azd ai agent init` attached to a pseudo-terminal and drives +// its interactive prompts until the project is scaffolded on disk. +func (r *runner) phaseInit(ctx context.Context) error { + c, err := newConsole(initCols, initRows) + if err != nil { + return err + } + defer c.close() + r.c = c + + ictx, cancel := context.WithTimeout(ctx, initTimeout) + defer cancel() + + args := []string{"ai", "agent", "init", "--agent-name", r.agentName} + //nolint:gosec // azd is a trusted fixed binary; args are test-controlled. + cmd := exec.CommandContext(ictx, "azd", args...) 
+ cmd.Dir = r.testDir + cmd.Env = r.env + cmd.Stdin = c.tty() + cmd.Stdout = c.tty() + cmd.Stderr = c.tty() + // Give the child the pts as its controlling terminal (as tmux did), so + // survey treats it as a real interactive terminal. + cmd.SysProcAttr = &syscall.SysProcAttr{Setsid: true, Setctty: true} + + if err := cmd.Start(); err != nil { + return fmt.Errorf("start azd ai agent init: %w", err) + } + + // Render child output to the screen for the whole lifetime of the process. + go c.drain() + + exited := make(chan struct{}) + go func() { + _ = cmd.Wait() + close(exited) + }() + + driveErr := r.driveInit(ictx, exited) + + // Make sure the child is gone before returning (it normally exits itself). + select { + case <-exited: + case <-time.After(10 * time.Second): + _ = cmd.Process.Kill() + <-exited + } + + return driveErr +} + +// driveInit is the prompt state machine: it polls the rendered screen and +// answers each survey prompt until init reports completion (or the process +// exits, or it times out). 
+func (r *runner) driveInit(ctx context.Context, exited <-chan struct{}) error {
+	var lastPromptKey string
+	samePromptCount := 0
+
+	for {
+		if err := sleepCtx(ctx, initStepDelay); err != nil {
+			return fmt.Errorf("init timed out: %w", err)
+		}
+
+		select {
+		case <-exited:
+			if r.validateInitOutput() {
+				return nil
+			}
+			return errors.New("azd ai agent init exited without producing artifacts")
+		default:
+		}
+
+		screen := r.c.screen()
+
+		if screenContains(screen, "added to your azd project") ||
+			screenContains(screen, "agent definition added") {
+			if r.validateInitOutput() {
+				return nil
+			}
+			if err := sleepCtx(ctx, 5*time.Second); err != nil {
+				return err
+			}
+			if r.validateInitOutput() {
+				return nil
+			}
+			return errors.New("init completion marker found but artifacts missing on disk")
+		}
+
+		prompt := activePrompt(screen)
+		if prompt == "" {
+			continue
+		}
+		r.t.Logf("prompt: %s", truncate(prompt, 100))
+
+		// Loop detection: compare the question text before ':' so varying filter
+		// text on the same prompt doesn't reset the counter.
+		key := prompt
+		if i := strings.Index(prompt, ":"); i > 0 {
+			key = strings.TrimSpace(prompt[:i])
+		}
+		if key == lastPromptKey {
+			samePromptCount++
+		} else {
+			samePromptCount = 1
+			lastPromptKey = key
+		}
+		if samePromptCount >= 3 {
+			if strings.Contains(prompt, "model") || strings.Contains(prompt, "is specified") {
+				r.t.Log("loop detected on model prompt; trying next option")
+				r.c.send(keyDown)
+				time.Sleep(300 * time.Millisecond)
+				r.c.send(keyEnter)
+				continue
+			}
+			if samePromptCount >= 5 {
+				return fmt.Errorf("init stuck in prompt loop: %q", key)
+			}
+		}
+
+		r.dispatchPrompt(screen, prompt)
+	}
+}
+
+// dispatchPrompt answers a single survey prompt. The case order mirrors the
+// original Python elif chain: more specific prompts must precede generic ones.
+func (r *runner) dispatchPrompt(screen, prompt string) {
+	contains := func(sub string) bool { return strings.Contains(prompt, sub) }
+
+	switch {
+	case contains("[y/n]") || contains("(y/n)"):
+		if contains("continue with this existing agent name") {
+			r.c.send("n") // use a fresh name
+		} else {
+			r.c.send("y")
+		}
+		r.c.send(keyEnter)
+	case contains("language"):
+		r.selectByText("Python", 1500*time.Millisecond)
+	case contains("template"):
+		r.selectByText("Basic agent (Invocations", 1500*time.Millisecond)
+	case contains("protocol") || contains("git operations"):
+		r.enter() // HTTPS (default)
+	case contains("enter a different name"):
+		r.enter()
+	case contains("container registry") || contains("acr"):
+		r.enter() // blank -> create new
+	case contains("model deployment name") ||
+		(contains("enter") && contains("deployment") && contains("name")):
+		r.enter()
+	case contains("existing deployment") || contains("is specified in the agent manifest") ||
+		(contains("found") && contains("deployment")):
+		r.enter()
+	case contains("capacity"):
+		r.enter()
+	case contains("sku"):
+		r.enter()
+	case contains("version"):
+		r.enter()
+	case contains("select") && contains("model"):
+		r.selectByText("gpt-4o-mini", 1500*time.Millisecond)
+	case contains("subscription"):
+		if sub := os.Getenv("E2E_SUBSCRIPTION"); sub != "" {
+			r.selectByText(sub[:min(8, len(sub))], 2*time.Second)
+		} else {
+			r.enter()
+		}
+	case contains("location") || contains("region"):
+		r.selectByText(getenvDefault("E2E_LOCATION", "eastus2"), 2*time.Second)
+	case contains("foundry project") || (contains("select") && contains("project")):
+		switch {
+		case r.createProject() && screenContains(screen, "create a new"):
+			r.selectByText("Create", 3*time.Second)
+		case os.Getenv("E2E_PROJECT") != "":
+			r.selectByText(os.Getenv("E2E_PROJECT"), 3*time.Second)
+		default:
+			r.enter()
+		}
+	case contains("account name") || contains("resource name") || contains("hub name"):
+		r.enter()
+	case contains("model") && !contains("capacity"):
+		r.enter()
+	case contains("deploy") && (contains("mode") || contains("how")) && !contains("capacity"):
+		if r.mode == "container" {
+			r.selectByText("Container", 1500*time.Millisecond)
+		} else {
+			r.selectByText("Source", 1500*time.Millisecond)
+		}
+	case contains("what would you like to do"):
+		r.enter() // Exit setup (default)
+	case contains("enter a name"):
+		r.enter()
+	default:
+		r.enter()
+	}
+}
+
+// phaseProvision finds the scaffolded project and runs `azd provision`.
+func (r *runner) phaseProvision(ctx context.Context) error {
+	dir := r.findProjectDir()
+	if dir == "" {
+		return errors.New("no project directory with azure.yaml found")
+	}
+	r.projectDir = dir
+	r.t.Logf("project dir: %s", dir)
+
+	_, code := r.runAzd(ctx, dir, provisionTimeout, "provision", "--no-prompt")
+	if code != 0 {
+		return fmt.Errorf("azd provision failed (exit %d)", code)
+	}
+	return nil
+}
+
+// phaseDeploy runs `azd deploy`.
+func (r *runner) phaseDeploy(ctx context.Context) error {
+	_, code := r.runAzd(ctx, r.projectDir, deployTimeout, "deploy", "--no-prompt")
+	if code != 0 {
+		return fmt.Errorf("azd deploy failed (exit %d)", code)
+	}
+	return nil
+}
+
+// phaseInvoke calls the deployed agent and verifies it answers "2+2" with 4.
+func (r *runner) phaseInvoke(ctx context.Context) error {
+	wait := 30 * time.Second
+	if r.mode == "container" {
+		wait = 60 * time.Second
+	}
+	r.t.Logf("waiting %s for agent startup (%s mode)", wait, r.mode)
+	if err := sleepCtx(ctx, wait); err != nil {
+		return err
+	}
+
+	svc := r.findServiceName()
+	if svc == "" {
+		return errors.New("could not determine service name from azure.yaml")
+	}
+	r.t.Logf("service name: %s", svc)
+
+	// The invocations protocol requires a JSON body via --input-file.
+	payload := filepath.Join(r.testDir, ".invoke-payload.json")
+	if err := os.WriteFile(payload, []byte(`{"message": "Hello, what is 2+2?"}`), 0o600); err != nil {
+		return fmt.Errorf("write invoke payload: %w", err)
+	}
+
+	const maxRetries = 3
+	for attempt := 1; attempt <= maxRetries; attempt++ {
+		r.t.Logf("invoke attempt %d/%d", attempt, maxRetries)
+		out, code := r.runAzd(ctx, r.projectDir, invokeTimeout,
+			"ai", "agent", "invoke", svc, "--new-session", "-f", payload)
+
+		if code != 0 {
+			if attempt == maxRetries {
+				logs, _ := r.runAzd(ctx, r.projectDir, monitorTimeout,
+					"ai", "agent", "monitor", svc, "--tail", "50")
+				r.t.Logf("agent logs (tail):\n%s", tail(logs, 4000))
+				return fmt.Errorf("azd invoke failed (exit %d)", code)
+			}
+			delay := 15 * time.Second
+			if strings.Contains(out, "500") ||
+				strings.Contains(strings.ToLower(out), "internal server error") {
+				delay = 30 * time.Second // container may still be starting
+			}
+			r.t.Logf("invoke failed (exit %d); retrying in %s", code, delay)
+			if err := sleepCtx(ctx, delay); err != nil {
+				return err
+			}
+			continue
+		}
+
+		if !responseHasExpectedAnswer(out) {
+			if attempt < maxRetries {
+				r.t.Log("response missing expected '4'/'four'; retrying")
+				if err := sleepCtx(ctx, 15*time.Second); err != nil {
+					return err
+				}
+				continue
+			}
+			return fmt.Errorf("invoke response missing expected '4'/'four': %s", truncate(out, 200))
+		}
+
+		r.t.Log("invoke succeeded; response contains the expected answer")
+		return nil
+	}
+	return errors.New("invoke failed after all retries")
+}
+
+// teardown runs `azd down` so a run never leaves billable resources behind. It
+// uses a fresh context because the per-run deadline may already have fired.
+func (r *runner) teardown() {
+	if r.projectDir == "" {
+		r.projectDir = r.findProjectDir()
+	}
+	if r.projectDir == "" {
+		return
+	}
+	r.t.Log("teardown: azd down --force --purge")
+	_, code := r.runAzd(context.Background(), r.projectDir, teardownTimeout,
+		"down", "--force", "--purge", "--no-prompt")
+	if code != 0 {
+		r.t.Errorf("azd down failed (exit %d) — Azure resources may be leaked", code)
+	}
+}
+
+// runAzd runs an azd command in dir with a timeout, streaming combined output to
+// the test log and returning it along with the exit code.
+func (r *runner) runAzd(ctx context.Context, dir string, timeout time.Duration, args ...string) (string, int) {
+	cctx, cancel := context.WithTimeout(ctx, timeout)
+	defer cancel()
+
+	//nolint:gosec // azd is a trusted fixed binary; args are test-controlled.
+	cmd := exec.CommandContext(cctx, "azd", args...)
+	cmd.Dir = dir
+	cmd.Env = r.env
+
+	var buf bytes.Buffer
+	lw := &lineLogger{t: r.t}
+	cmd.Stdout = io.MultiWriter(&buf, lw)
+	// Same writer value as Stdout => os/exec uses one pipe and one copier
+	// goroutine, so there is no concurrent write to buf/lw.
+	cmd.Stderr = cmd.Stdout
+
+	err := cmd.Run()
+	lw.flush()
+	return buf.String(), exitCode(err)
+}
+
+// selectByText filters a survey list by typing target, waits for the list to
+// settle, then confirms with Enter.
+func (r *runner) selectByText(target string, delay time.Duration) {
+	r.c.send(target)
+	time.Sleep(delay)
+	r.c.send(keyEnter)
+}
+
+// enter accepts a prompt's default by pressing Enter.
+func (r *runner) enter() {
+	r.c.send(keyEnter)
+}
+
+// createProject reports whether the run should create a fresh Foundry project.
+func (r *runner) createProject() bool {
+	return envTrue("E2E_CREATE_PROJECT")
+}
+
+// findProjectDir returns the first immediate subdirectory of testDir that
+// contains an azure.yaml (the project scaffolded by init), or "".
+func (r *runner) findProjectDir() string {
+	entries, err := os.ReadDir(r.testDir)
+	if err != nil {
+		return ""
+	}
+	for _, e := range entries {
+		if !e.IsDir() {
+			continue
+		}
+		dir := filepath.Join(r.testDir, e.Name())
+		if _, err := os.Stat(filepath.Join(dir, "azure.yaml")); err == nil {
+			return dir
+		}
+	}
+	return ""
+}
+
+// findServiceName reads the first service name from the project's azure.yaml.
+func (r *runner) findServiceName() string {
+	dir := r.projectDir
+	if dir == "" {
+		dir = r.findProjectDir()
+	}
+	if dir == "" {
+		return ""
+	}
+	//nolint:gosec // azure.yaml path is under the test-controlled testDir.
+	data, err := os.ReadFile(filepath.Join(dir, "azure.yaml"))
+	if err != nil {
+		return ""
+	}
+	inServices := false
+	for line := range strings.SplitSeq(string(data), "\n") {
+		trimmed := strings.TrimSpace(line)
+		if trimmed == "services:" {
+			inServices = true
+			continue
+		}
+		if inServices && strings.HasPrefix(line, " ") && strings.HasSuffix(trimmed, ":") {
+			return strings.TrimSuffix(trimmed, ":")
+		}
+		if inServices && !strings.HasPrefix(line, " ") && trimmed != "" {
+			break
+		}
+	}
+	return ""
+}
+
+// validateInitOutput confirms init produced an agent project on disk: a project
+// dir whose azure.yaml targets the agent host and a nested agent.yaml.
+func (r *runner) validateInitOutput() bool {
+	entries, err := os.ReadDir(r.testDir)
+	if err != nil {
+		return false
+	}
+	for _, e := range entries {
+		if !e.IsDir() {
+			continue
+		}
+		subdir := filepath.Join(r.testDir, e.Name())
+		//nolint:gosec // azure.yaml path is under the test-controlled testDir.
+		data, err := os.ReadFile(filepath.Join(subdir, "azure.yaml"))
+		if err != nil {
+			continue
+		}
+		content := string(data)
+		if strings.Contains(content, "host:") && strings.Contains(content, "azure.ai.agent") &&
+			hasAgentYAML(subdir) {
+			return true
+		}
+	}
+	return false
+}
+
+// hasAgentYAML reports whether an agent.yaml exists anywhere under root.
+func hasAgentYAML(root string) bool {
+	found := false
+	_ = filepath.WalkDir(root, func(_ string, d fs.DirEntry, err error) error {
+		if err != nil {
+			return nil
+		}
+		if !d.IsDir() && d.Name() == "agent.yaml" {
+			found = true
+			return filepath.SkipAll
+		}
+		return nil
+	})
+	return found
+}
+
+// lineLogger forwards a stream to t.Log one line at a time so long-running azd
+// output is visible live in the CI log.
+type lineLogger struct {
+	t   *testing.T
+	buf []byte
+}
+
+func (l *lineLogger) Write(p []byte) (int, error) {
+	l.buf = append(l.buf, p...)
+	for {
+		i := bytes.IndexByte(l.buf, '\n')
+		if i < 0 {
+			break
+		}
+		l.t.Log(strings.TrimRight(string(l.buf[:i]), "\r"))
+		l.buf = l.buf[i+1:]
+	}
+	return len(p), nil
+}
+
+func (l *lineLogger) flush() {
+	if len(l.buf) > 0 {
+		l.t.Log(strings.TrimRight(string(l.buf), "\r"))
+		l.buf = nil
+	}
+}
+
+// exitCode extracts a process exit code from an exec error (-1 if it never ran).
+func exitCode(err error) int {
+	if err == nil {
+		return 0
+	}
+	var ee *exec.ExitError
+	if errors.As(err, &ee) {
+		return ee.ExitCode()
+	}
+	return -1
+}
+
+// ghToken resolves a GitHub token from the environment, falling back to `gh`.
+func ghToken() string {
+	for _, k := range []string{"GITHUB_TOKEN", "GH_TOKEN"} {
+		if v := os.Getenv(k); v != "" {
+			return v
+		}
+	}
+	ctx, cancel := context.WithTimeout(context.Background(), 10*time.Second)
+	defer cancel()
+	//nolint:gosec // gh is a trusted fixed binary; no user input in args.
+	out, err := exec.CommandContext(ctx, "gh", "auth", "token").Output()
+	if err != nil {
+		return ""
+	}
+	return strings.TrimSpace(string(out))
+}
+
+// shortHash returns a short, non-cryptographic uniqueness suffix for the agent
+// name (sha256 only to avoid noise from security scanners).
+func shortHash(mode string) string {
+	sum := sha256.Sum256(fmt.Appendf(nil, "%s-%d", mode, os.Getpid()))
+	return hex.EncodeToString(sum[:])[:6]
+}
+
+// assertSafeTestDir refuses a path that is not clearly a disposable test dir, so
+// a bad E2E_TESTDIR (e.g. "/", "/tmp", "$HOME") can never trigger a destructive
+// delete.
+func assertSafeTestDir(path string) error {
+	abs, err := filepath.Abs(path)
+	if err != nil {
+		return fmt.Errorf("resolve test dir: %w", err)
+	}
+	abs = filepath.Clean(abs)
+	protected := map[string]bool{
+		"/": true, "/tmp": true, "/var": true, "/usr": true, "/etc": true,
+		"/bin": true, "/lib": true, "/root": true, "/home": true,
+	}
+	if home, err := os.UserHomeDir(); err == nil && home != "" {
+		protected[filepath.Clean(home)] = true
+	}
+	if protected[abs] || strings.Count(abs, "/") < 2 {
+		return fmt.Errorf("refusing to delete unsafe test dir %q (resolved %q)", path, abs)
+	}
+	return nil
+}
+
+// useAzCliAuth reports whether to use the az CLI session for azd auth (CI), as
+// opposed to azd's built-in auth (local WSL).
+func useAzCliAuth() bool {
+	return envTrue("E2E_USE_AZ_CLI_AUTH") ||
+		os.Getenv("GITHUB_ACTIONS") != "" ||
+		os.Getenv("TF_BUILD") != ""
+}
+
+// getenvDefault returns the env var value, or def if unset/empty.
+func getenvDefault(key, def string) string {
+	if v := os.Getenv(key); v != "" {
+		return v
+	}
+	return def
+}
+
+// envTrue reports whether an env var is set to a truthy value.
+func envTrue(key string) bool {
+	switch strings.ToLower(strings.TrimSpace(os.Getenv(key))) {
+	case "1", "true", "yes":
+		return true
+	default:
+		return false
+	}
+}
+
+// sleepCtx sleeps for d unless ctx is cancelled first, returning ctx.Err() then.
+func sleepCtx(ctx context.Context, d time.Duration) error {
+	timer := time.NewTimer(d)
+	defer timer.Stop()
+	select {
+	case <-ctx.Done():
+		return ctx.Err()
+	case <-timer.C:
+		return nil
+	}
+}
+
+// truncate trims s and caps it to n characters with an ellipsis.
+func truncate(s string, n int) string { + s = strings.TrimSpace(s) + if len(s) <= n { + return s + } + return s[:n] + "..." +} + +// tail returns the last n bytes of s with a leading ellipsis when truncated. +func tail(s string, n int) string { + if len(s) <= n { + return s + } + return "..." + s[len(s)-n:] +} diff --git a/eng/pipelines/ext-azure-ai-agents-live.yml b/eng/pipelines/ext-azure-ai-agents-live.yml index d9b52c54da8..577e9d05748 100644 --- a/eng/pipelines/ext-azure-ai-agents-live.yml +++ b/eng/pipelines/ext-azure-ai-agents-live.yml @@ -1,7 +1,9 @@ # Live E2E: azure.ai.agents extension — Tier 2 golden path # # Runs the full agent lifecycle (init -> provision -> deploy -> invoke -> down) -# against LIVE Azure resources, driving the real `azd ai agent` CLI through tmux. +# against LIVE Azure resources. The interactive `azd ai agent init` prompts are +# driven by the Go pseudo-terminal test driver (go-expect + vt10x); the other +# phases shell out to azd with --no-prompt. See tests/e2e-live/README.md. # # This pipeline is the live counterpart to the PR-gate checks in # `.github/workflows/lint-ext-azure-ai-agents.yml` (Tier 0 offline + Tier 1 @@ -64,26 +66,14 @@ extends: # Two golden paths (code + container) run sequentially (~13-15 min # each in the typical case), plus build/provision overhead. The cap # is sized for the worst case so an ungraceful job timeout never - # preempts the in-test teardown: 2x the child's 60 min watchdog - # (test_tier2.py timeout_s) + per-run cleanup + build/setup steps. + # preempts the in-test teardown: 2x the per-mode 60 min runTimeout + # (tier2_live_test.go) + per-run cleanup + build/setup steps. 
timeoutInMinutes: 150 steps: - checkout: self - template: /eng/pipelines/templates/steps/setup-go.yml - - task: UsePythonVersion@0 - inputs: - versionSpec: "3.12" - displayName: Use Python 3.12 - - - bash: | - set -euo pipefail - sudo apt-get update - sudo apt-get install -y tmux - tmux -V - displayName: Install tmux - # Live build — NO `-tags=record`, so the CLI/extension talk to real # Azure instead of the recording proxy used by the PR-gate tests. - bash: go build -o azd . @@ -148,10 +138,10 @@ extends: # the session does not persist to later plain bash steps. - task: AzureCLI@2 displayName: Run Tier 2 live golden path - # Holds BOTH deploy modes run sequentially: 2x the child's 60 min - # watchdog (test_tier2.py timeout_s) + per-run cleanup margin, so - # this step timeout never trips before the child's own watchdog - # (which runs the graceful azd teardown). + # Holds BOTH deploy modes run sequentially. `go test -timeout` + # (below) self-caps at 125 min — under this 130 min step budget — + # so the test process exits before ADO force-kills the step, and + # the per-mode 60 min runTimeout drives the graceful azd teardown. timeoutInMinutes: 130 inputs: azureSubscription: ${{ parameters.serviceConnection }} @@ -159,7 +149,7 @@ extends: visibleAzLogin: false scriptType: bash scriptLocation: inlineScript - workingDirectory: cli/azd/extensions/azure.ai.agents/tests/e2e-live + workingDirectory: cli/azd/extensions/azure.ai.agents inlineScript: | set -euo pipefail azd config set auth.useAzCliAuth true @@ -174,12 +164,17 @@ extends: export E2E_SUBSCRIPTION E2E_TENANT echo "Using subscription: $E2E_SUBSCRIPTION" mkdir -p "$(Build.ArtifactStagingDirectory)/logs" - # Invoke as `python` (not `python3`) so the UsePythonVersion@0 - # pinned 3.12 is used; `python3` may still resolve to the system - # Python on some agent images (matches eng/pipelines/eval-unit.yml). 
- python test_tier2.py --mode ${{ parameters.deployModes }} 2>&1 \ + # Drive the live golden path through the Go pseudo-terminal + # test driver. -v streams per-phase logs; -count=1 defeats the + # test cache (a live test must always re-run); -timeout self- + # caps the process under this step's budget so the per-mode + # teardown (t.Cleanup) runs before ADO force-kills the step. + go test -run TestTier2Live -count=1 -timeout 125m -v ./tests/e2e-live/ 2>&1 \ | tee "$(Build.ArtifactStagingDirectory)/logs/tier2.log" env: + # Gate + mode selection consumed by tier2_live_test.go. + AZURE_AI_AGENTS_E2E_LIVE: "1" + E2E_DEPLOY_MODES: ${{ parameters.deployModes }} E2E_CREATE_PROJECT: "true" E2E_LOCATION: eastus2 E2E_USE_AZ_CLI_AUTH: "true" From 4d27988457f20d49199e37a42ed244c763af12d8 Mon Sep 17 00:00:00 2001 From: Jian Wu Date: Wed, 24 Jun 2026 23:19:14 +0800 Subject: [PATCH 19/33] Make Tier 2 live init driver event-driven; calibrate prompts to wizard source Replace the init driver's fixed 3s polling with go-expect event-driven reads that block until the survey UI settles, then dispatch on the verbatim prompt strings the extension prints (each case annotated with its source file:line). Select deploy mode via the --deploy-mode flag, which init auto-resolves rather than prompts when a manifest is supplied. Document the dispatch-loop design and the live-only verification path in the README. Addresses review feedback on #8758. 
---
 .../azure.ai.agents/tests/e2e-live/README.md  |  27 +-
 .../tests/e2e-live/console_test.go            |  93 +++++-
 .../tests/e2e-live/tier2_live_test.go         | 291 ++++++++++++------
 3 files changed, 292 insertions(+), 119 deletions(-)

diff --git a/cli/azd/extensions/azure.ai.agents/tests/e2e-live/README.md b/cli/azd/extensions/azure.ai.agents/tests/e2e-live/README.md
index 2fcd9408a71..e085e26c10c 100644
--- a/cli/azd/extensions/azure.ai.agents/tests/e2e-live/README.md
+++ b/cli/azd/extensions/azure.ai.agents/tests/e2e-live/README.md
@@ -10,8 +10,14 @@ init → provision → deploy → invoke → down
 A Go test driver answers the interactive `azd ai agent init` prompts through a
 **pseudo-terminal** — [go-expect] sends keystrokes and [vt10x] renders the CLI's
 terminal UI so the test can assert on the on-screen text, with [creack/pty]
-providing the PTY. The non-interactive phases (`provision`, `deploy`, `invoke`,
-`down`) shell out to `azd ... --no-prompt`. Both deploy modes are covered:
+providing the PTY. Synchronization is **event-driven**: the driver blocks on
+go-expect reads until the survey UI stops emitting — i.e. a prompt is fully
+drawn and waiting for input — instead of sleeping a fixed interval, then
+dispatches on the rendered prompt text. The deploy mode is chosen up front via
+`azd ai agent init --deploy-mode code|container` (it is not an interactive
+prompt once a manifest is supplied). The non-interactive phases (`provision`,
+`deploy`, `invoke`, `down`) shell out to `azd ... --no-prompt`. Both deploy
+modes are covered:
 
 | Mode        | What it does                                            |
 | ----------- | ------------------------------------------------------- |
@@ -24,6 +30,23 @@ The two modes run **sequentially** (same subscription → avoids resource races)
 [vt10x]: https://github.com/hinshun/vt10x
 [creack/pty]: https://github.com/creack/pty
 
+## How the `init` driver answers prompts
+
+The interactive sub-flows (Foundry project selection, model/deployment) branch
+on live runtime state, so the exact set and order of prompts is not fixed ahead
+of time. Rather than a linear expect script, the driver runs a **dispatch
+loop**: it waits for output to settle, reads the rendered screen, matches the
+active `?` prompt against the verbatim strings the extension prints — each case
+in `dispatchPrompt` is annotated with the source `file:line` it mirrors — and
+sends the answer. A loop detector bounds any prompt that fails to advance so a
+wording change upstream fails fast instead of hanging.
+
+Because the prompt strings are calibrated against the extension source, changes
+there can require updating `dispatchPrompt`. And because a real PTY, Azure auth,
+and the installed extension are all required, the **end-to-end interactive
+correctness is only exercised by a live Tier 2 run** — it cannot be reproduced
+by the platform-agnostic unit tests in this package.
+ ## Where this fits | Tier | Coverage | Where it runs | diff --git a/cli/azd/extensions/azure.ai.agents/tests/e2e-live/console_test.go b/cli/azd/extensions/azure.ai.agents/tests/e2e-live/console_test.go index 1943fe61087..2a0153b571a 100644 --- a/cli/azd/extensions/azure.ai.agents/tests/e2e-live/console_test.go +++ b/cli/azd/extensions/azure.ai.agents/tests/e2e-live/console_test.go @@ -9,6 +9,8 @@ import ( "fmt" "os" "strings" + "sync" + "time" expect "github.com/Netflix/go-expect" "github.com/creack/pty" @@ -19,16 +21,22 @@ import ( const ( keyEnter = "\r" keyDown = "\x1b[B" + keyUp = "\x1b[A" ) +// tailBytes caps the rolling raw-output buffer kept for failure diagnostics +// (the interactive init screen is otherwise not echoed to the test log). +const tailBytes = 16 << 10 + // console drives an interactive child process through a pseudo-terminal and -// renders its output with a vt10x virtual terminal so tests can assert on the -// on-screen text (the same role tmux capture-pane played in the old driver). +// renders its output with a vt10x virtual terminal so tests can both block on +// expected output (go-expect) and assert on the on-screen text (the role tmux +// capture-pane played in the old driver). // // Wiring (mirrors AlecAivazis/survey's posix expect tests): // // child stdio ── ec.Tty() (pts) ─┐ -// ├─ go-expect tees child output ─► vt10x screen +// ├─ go-expect tees child output ─► vt10x screen + tail // vt10x query replies ─► extSlave ┘ ▲ // extMaster ─ go-expect feeds back to child stdin // @@ -39,6 +47,7 @@ const ( type console struct { term vt10x.Terminal ec *expect.Console + tail *ringBuffer } // newConsole creates a console with a virtual terminal of the given size. 
@@ -49,13 +58,15 @@ func newConsole(cols, rows int) (*console, error) { } term := vt10x.New(vt10x.WithWriter(extSlave), vt10x.WithSize(cols, rows)) + tail := newRingBuffer(tailBytes) - // Deliberately no WithDefaultTimeout: the drain goroutine runs ExpectEOF for - // the whole child lifetime, and a read timeout would stop it (ending screen - // updates) during the long quiet stretches of init (e.g. template download). + // go-expect tees everything it reads to these writers, so every read driven + // by expect()/waitForQuiet() simultaneously renders the screen (term) and + // records the raw bytes (tail) for diagnostics. No WithDefaultTimeout: each + // read's deadline is supplied per call via expect.WithTimeout. ec, err := expect.NewConsole( expect.WithStdin(extMaster), - expect.WithStdout(term), + expect.WithStdout(term, tail), expect.WithCloser(extMaster, extSlave), ) if err != nil { @@ -69,7 +80,7 @@ func newConsole(cols, rows int) (*console, error) { //nolint:gosec // cols/rows are small fixed test dimensions; no overflow. _ = pty.Setsize(ec.Tty(), &pty.Winsize{Cols: uint16(cols), Rows: uint16(rows)}) - return &console{term: term, ec: ec}, nil + return &console{term: term, ec: ec, tail: tail}, nil } // tty returns the slave pseudo-terminal the child process should attach its @@ -83,13 +94,29 @@ func (c *console) send(s string) { _, _ = c.ec.Send(s) } -// drain continuously renders child output to the virtual terminal until the -// child's tty closes (process exit). It MUST run for the whole child lifetime: -// go-expect only tees output to the screen while a read is in flight, so -// without this the screen would stay blank and the child would eventually block -// once the output pipe filled. -func (c *console) drain() { - _, _ = c.ec.ExpectEOF() +// expect reads child output (teeing it to the screen and the tail buffer) until +// one of opts matches, idle elapses with no new byte, or the child's tty +// closes. 
It is the event-driven synchronization primitive that replaces the +// old fixed-interval polling: go-expect only renders output to the screen while +// a read is in flight, so every wait routes through here. +// +// Return contract (go-expect's passthrough pipe, see passthrough_pipe.go): +// - a match => (buf, nil) +// - idle of silence => (buf, err) with os.IsTimeout(err) == true +// - child exit / pts close => (buf, err) with a non-timeout error +func (c *console) expect(idle time.Duration, opts ...expect.ExpectOpt) (string, error) { + return c.ec.Expect(append(opts, expect.WithTimeout(idle))...) +} + +// waitForQuiet renders pending output to the screen until the UI stops emitting +// for quiet (a survey prompt fully drawn and now blocking on input) or the +// child exits. It returns exited=true once the child's tty has closed. +// +// It passes no matchers, so go-expect can only return on the idle read deadline +// (os.IsTimeout) or on a terminal read error (EOF / pts closed == child gone). +func (c *console) waitForQuiet(quiet time.Duration) (exited bool) { + _, err := c.expect(quiet) + return err != nil && !os.IsTimeout(err) } // screen returns the current rendered virtual-terminal contents, cleaned of NUL @@ -98,11 +125,44 @@ func (c *console) screen() string { return cleanScreen(c.term.String()) } +// tailString returns the most recent raw child output captured for diagnostics. +func (c *console) tailString() string { + return c.tail.String() +} + // close tears down the console and all of its pseudo-terminals. func (c *console) close() { _ = c.ec.Close() } +// ringBuffer is an io.Writer that retains only the last max bytes written, used +// to keep a bounded tail of raw child output for failure diagnostics. 
+type ringBuffer struct {
+	mu  sync.Mutex
+	buf []byte
+	max int
+}
+
+func newRingBuffer(max int) *ringBuffer {
+	return &ringBuffer{max: max}
+}
+
+func (r *ringBuffer) Write(p []byte) (int, error) {
+	r.mu.Lock()
+	defer r.mu.Unlock()
+	r.buf = append(r.buf, p...)
+	if len(r.buf) > r.max {
+		r.buf = r.buf[len(r.buf)-r.max:]
+	}
+	return len(p), nil
+}
+
+func (r *ringBuffer) String() string {
+	r.mu.Lock()
+	defer r.mu.Unlock()
+	return string(r.buf)
+}
+
 // cleanScreen normalizes a vt10x screen dump: empty cells render as NUL, which
 // is replaced with spaces, then trailing whitespace is trimmed from each row.
 func cleanScreen(s string) string {
@@ -126,7 +186,8 @@ func nonEmptyLines(screen string) []string {
 }
 
 // activePrompt returns the lowercased text of the last survey "?" prompt line on
-// screen, or "" if none is visible.
+// screen, or "" if none is visible. The last "?" line is the one survey is
+// currently blocking on (earlier "?" lines are answered prompts it echoed).
 func activePrompt(screen string) string {
 	lines := nonEmptyLines(screen)
 	for i := len(lines) - 1; i >= 0; i-- {
diff --git a/cli/azd/extensions/azure.ai.agents/tests/e2e-live/tier2_live_test.go b/cli/azd/extensions/azure.ai.agents/tests/e2e-live/tier2_live_test.go
index 1c0535d57b2..89fd5cb2f88 100644
--- a/cli/azd/extensions/azure.ai.agents/tests/e2e-live/tier2_live_test.go
+++ b/cli/azd/extensions/azure.ai.agents/tests/e2e-live/tier2_live_test.go
@@ -47,7 +47,13 @@ const (
 	monitorTimeout  = 60 * time.Second
 	teardownTimeout = 10 * time.Minute
 
-	initStepDelay = 3 * time.Second
+	// Event-driven tuning for the interactive init loop. promptQuiet is how long
+	// the survey UI must stop emitting before we treat the current prompt as
+	// "drawn and waiting for input"; listSettle is the shorter pause we let a
+	// filtered Select list redraw after typing before confirming with Enter.
+	// Both replace the old fixed 3s poll; the hard init cap is the ctx deadline.
+	promptQuiet = 800 * time.Millisecond
+	listSettle  = 600 * time.Millisecond
 )
 
 // TestTier2Live exercises the full golden path against live Azure for each
@@ -206,7 +212,11 @@ func (r *runner) phaseInit(ctx context.Context) error {
 	ictx, cancel := context.WithTimeout(ctx, initTimeout)
 	defer cancel()
 
-	args := []string{"ai", "agent", "init", "--agent-name", r.agentName}
+	// Deploy mode is NOT an interactive prompt in the template/--agent-name
+	// flow: init auto-resolves it to "container" when a manifest is provided
+	// (init_from_code.go:1373), so it must be chosen via the --deploy-mode flag
+	// (init.go:1306). r.mode is exactly "container" or "code".
+	args := []string{"ai", "agent", "init", "--agent-name", r.agentName, "--deploy-mode", r.mode}
 	//nolint:gosec // azd is a trusted fixed binary; args are test-controlled.
 	cmd := exec.CommandContext(ictx, "azd", args...)
 	cmd.Dir = r.testDir
@@ -222,9 +232,10 @@ func (r *runner) phaseInit(ctx context.Context) error {
 		return fmt.Errorf("start azd ai agent init: %w", err)
 	}
 
-	// Render child output to the screen for the whole lifetime of the process.
-	go c.drain()
-
+	// No separate render goroutine: go-expect's passthrough pipe drains the
+	// child's pty in the background, and driveInit's expect()/waitForQuiet()
+	// calls do the reading that renders the screen. (A concurrent reader would
+	// race those calls for the same stream.)
 	exited := make(chan struct{})
 	go func() {
 		_ = cmd.Wait()
@@ -244,71 +255,67 @@ func (r *runner) phaseInit(ctx context.Context) error {
 	return driveErr
 }
 
-// driveInit is the prompt state machine: it polls the rendered screen and
-// answers each survey prompt until init reports completion (or the process
-// exits, or it times out).
+// driveInit is the event-driven prompt loop: it waits (via go-expect) for the
+// survey UI to settle on a prompt, reads the rendered screen, and answers it,
+// until init reports completion (or the process exits, or it times out).
+// +// Why a screen-dispatch loop and not a linear ExpectString script: the live +// model/deployment and Foundry-project sub-flows branch on runtime state — +// whether the just-created project already has the model deployed, region/model +// availability, existing-name collisions — so the exact set and order of +// prompts cannot be predetermined. A linear ExpectString sequence would desync +// at the first conditional prompt. Instead we block on output settling (the +// go-expect read), then dispatch on the verbatim prompt strings the extension +// prints (each case annotated with its source file:line). func (r *runner) driveInit(ctx context.Context, exited <-chan struct{}) error { - var lastPromptKey string - samePromptCount := 0 + var lastKey string + repeat := 0 for { - if err := sleepCtx(ctx, initStepDelay); err != nil { - return fmt.Errorf("init timed out: %w", err) - } - select { + case <-ctx.Done(): + return fmt.Errorf("init timed out: %w\n--- tail ---\n%s", + ctx.Err(), tail(r.c.tailString(), 2000)) case <-exited: - if r.validateInitOutput() { - return nil - } - return errors.New("azd ai agent init exited without producing artifacts") + return r.finishInit(ctx) default: } - screen := r.c.screen() + // Block until the UI stops emitting (prompt fully drawn, awaiting input) + // or the child exits. Replaces the old fixed-interval poll. 
+ if r.c.waitForQuiet(promptQuiet) { + return r.finishInit(ctx) + } - if screenContains(screen, "added to your azd project") || - screenContains(screen, "agent definition added") { - if r.validateInitOutput() { - return nil - } - if err := sleepCtx(ctx, 5*time.Second); err != nil { - return err - } - if r.validateInitOutput() { - return nil - } - return errors.New("init completion marker found but artifacts missing on disk") + screen := r.c.screen() + if isInitComplete(screen) { + return r.finishInit(ctx) } prompt := activePrompt(screen) if prompt == "" { - continue + continue // spinner / transient output, not a survey prompt yet } r.t.Logf("prompt: %s", truncate(prompt, 100)) // Loop detection: compare the question text before ':' so varying filter // text on the same prompt doesn't reset the counter. - key := prompt - if i := strings.Index(prompt, ":"); i > 0 { - key = strings.TrimSpace(prompt[:i]) - } - if key == lastPromptKey { - samePromptCount++ + key := promptKey(prompt) + if key == lastKey { + repeat++ } else { - samePromptCount = 1 - lastPromptKey = key + repeat, lastKey = 1, key } - if samePromptCount >= 3 { + if repeat >= 3 { if strings.Contains(prompt, "model") || strings.Contains(prompt, "is specified") { r.t.Log("loop detected on model prompt; trying next option") r.c.send(keyDown) - time.Sleep(300 * time.Millisecond) + r.c.waitForQuiet(listSettle) r.c.send(keyEnter) continue } - if samePromptCount >= 5 { - return fmt.Errorf("init stuck in prompt loop: %q", key) + if repeat >= 5 { + return fmt.Errorf("init stuck in prompt loop: %q\n--- screen ---\n%s", key, screen) } } @@ -316,74 +323,152 @@ func (r *runner) driveInit(ctx context.Context, exited <-chan struct{}) error { } } -// dispatchPrompt answers a single survey prompt. The case order mirrors the -// original Python elif chain: more specific prompts must precede generic ones. 
+// finishInit confirms init produced the expected artifacts on disk, allowing a +// brief grace for files to flush after the completion marker or process exit. +func (r *runner) finishInit(ctx context.Context) error { + if r.validateInitOutput() { + return nil + } + _ = sleepCtx(ctx, 5*time.Second) + if r.validateInitOutput() { + return nil + } + return fmt.Errorf( + "init finished but expected artifacts are missing on disk\n--- tail ---\n%s", + tail(r.c.tailString(), 2000), + ) +} + +// isInitComplete reports whether the success marker is on screen. Source: +// init.go:1483 prints "AI agent definition added to your azd project +// successfully!" in green at the end of runInitFromManifest. +func isInitComplete(screen string) bool { + return screenContains(screen, "added to your azd project") || + screenContains(screen, "agent definition added") +} + +// promptKey reduces a prompt line to its stable question text (before the first +// ':') for loop detection. +func promptKey(prompt string) string { + if i := strings.Index(prompt, ":"); i > 0 { + return strings.TrimSpace(prompt[:i]) + } + return prompt +} + +// dispatchPrompt answers a single survey prompt. Cases are ordered specific → +// generic and keyed on the verbatim messages the extension prints; the file:line +// in each comment points at the source string this matches. The prompt argument +// is already lowercased (see activePrompt). +// +// Only a subset of these fire on the --agent-name template critical path +// (language, template, Foundry project, subscription, location, the manifest +// model, deployment name, capacity/sku/version). The rest are kept as defensive +// handlers because init auto-resolves them under userProvidedManifest=true (so +// they normally do NOT prompt) or only surfaces them for specific runtime state. 
func (r *runner) dispatchPrompt(screen, prompt string) { - contains := func(sub string) bool { return strings.Contains(prompt, sub) } + has := func(sub string) bool { return strings.Contains(prompt, sub) } switch { - case contains("[y/n]") || contains("(y/n)"): - if contains("continue with this existing agent name") { - r.c.send("n") // use a fresh name + // Yes/No confirms. "Continue with this existing agent name?" (init.go:722) + // only fires when the unique name already exists; decline it to reach the + // fresh-name input. Any other confirm: accept. + case has("[y/n]") || has("(y/n)") || has("continue with this existing agent name"): + if has("continue with this existing agent name") { + r.c.send("n") } else { r.c.send("y") } r.c.send(keyEnter) - case contains("language"): - r.selectByText("Python", 1500*time.Millisecond) - case contains("template"): - r.selectByText("Basic agent (Invocations", 1500*time.Millisecond) - case contains("protocol") || contains("git operations"): - r.enter() // HTTPS (default) - case contains("enter a different name"): - r.enter() - case contains("container registry") || contains("acr"): - r.enter() // blank -> create new - case contains("model deployment name") || - (contains("enter") && contains("deployment") && contains("name")): - r.enter() - case contains("existing deployment") || contains("is specified in the agent manifest") || - (contains("found") && contains("deployment")): - r.enter() - case contains("capacity"): - r.enter() - case contains("sku"): - r.enter() - case contains("version"): - r.enter() - case contains("select") && contains("model"): - r.selectByText("gpt-4o-mini", 1500*time.Millisecond) - case contains("subscription"): - if sub := os.Getenv("E2E_SUBSCRIPTION"); sub != "" { - r.selectByText(sub[:min(8, len(sub))], 2*time.Second) + + // Language select — "Select a language" (init_from_templates_helpers.go:263). 
+ case has("select a language"): + r.selectByText("Python") + + // Template select — "Select a starter template" / "Select an agent template" + // (init_from_templates_helpers.go:304 / 327). + case has("starter template") || has("agent template"): + r.selectByText("Basic agent (Invocations") + + // Foundry project hosting — "Select a Foundry project to host your agent..." + // (init.go:1752 / 1910); choices "Use an existing..." / "Create a new...". + case has("foundry project to host"): + if r.createProject() { + r.selectByText("Create a new Foundry project") + } else { + r.selectByText("Use an existing Foundry project") + } + + // Existing-project picker — "Select a Foundry project" + // (init_foundry_resources_helpers.go:1360); only when reusing a project. + case has("select a foundry project"): + if p := os.Getenv("E2E_PROJECT"); p != "" { + r.selectByText(p) } else { r.enter() } - case contains("location") || contains("region"): - r.selectByText(getenvDefault("E2E_LOCATION", "eastus2"), 2*time.Second) - case contains("foundry project") || (contains("select") && contains("project")): - switch { - case r.createProject() && screenContains(screen, "create a new"): - r.selectByText("Create", 3*time.Second) - case os.Getenv("E2E_PROJECT") != "": - r.selectByText(os.Getenv("E2E_PROJECT"), 3*time.Second) - default: + + // Subscription — preamble "Select an Azure subscription..." + // (init_foundry_resources_helpers.go:905) + azd-core picker. + case has("subscription"): + if sub := os.Getenv("E2E_SUBSCRIPTION"); sub != "" { + r.selectByText(sub[:min(8, len(sub))]) + } else { r.enter() } - case contains("account name") || contains("resource name") || contains("hub name"): + + // Location — preamble "Select an Azure location..." + // (init_foundry_resources_helpers.go:1004) + azd-core picker. + case has("location") || has("region"): + r.selectByText(getenvDefault("E2E_LOCATION", "eastus2")) + + // Manifest model decision — "Model '%s' is specified in the agent manifest." 
+ // (init_models.go:463); keep the manifest model (default first choice). + case has("is specified in the agent manifest"): r.enter() - case contains("model") && !contains("capacity"): + + // Existing deployments / generic proceed — init_models.go:263 / 330. + case has("how would you like to proceed") || has("existing deployment"): r.enter() - case contains("deploy") && (contains("mode") || contains("how")) && !contains("capacity"): - if r.mode == "container" { - r.selectByText("Container", 1500*time.Millisecond) - } else { - r.selectByText("Source", 1500*time.Millisecond) - } - case contains("what would you like to do"): - r.enter() // Exit setup (default) - case contains("enter a name"): + + // Model deployment name input — init_models.go:398 (default = model name). + case has("model deployment name") || (has("deployment name") && has("model")): + r.enter() + + // Model select — "Select a model" (init_models.go:704 etc.). + case has("select a model"): + r.selectByText("gpt-4o-mini") + + // Deployment capacity / sku / version — azd-core PromptAiDeployment + // (init_models.go:519); accept defaults. + case has("capacity") || has("sku") || has("version"): + r.enter() + + // Code-deploy prompts (init_from_code.go:1508 / 1534 / 1563). Auto-resolved + // under userProvidedManifest=true, so kept as defensive handlers only. + case has("select the runtime for your agent"): + r.enter() // default Python 3.13 + case has("entry point"): + r.enter() // accept detected default + case has("how should dependencies be resolved"): + r.enter() // default remote build + + // Optional infra (blank => create new): ACR login server + // (init_foundry_resources_helpers.go:481), App Insights (:606 / :621). + case has("acr login server") || has("container registry"): + r.enter() + case has("application insights"): + r.enter() + + // Startup command (helpers.go:773); blank => skip. 
+ case has("command to start your agent"): r.enter() + + // Replacement agent name after declining the existing-name confirm + // (init.go:745) / the name input (init.go:261); accept the default. + case has("enter a different name for your agent") || has("enter a name for your agent"): + r.enter() + default: r.enter() } @@ -519,11 +604,15 @@ func (r *runner) runAzd(ctx context.Context, dir string, timeout time.Duration, return buf.String(), exitCode(err) } -// selectByText filters a survey list by typing target, waits for the list to -// settle, then confirms with Enter. -func (r *runner) selectByText(target string, delay time.Duration) { +// selectByText filters a survey list by typing target, waits (event-driven) for +// the filtered list to stop redrawing, then confirms with Enter. This assumes +// the survey / azd-core Select supports type-to-filter; that behavior is only +// verifiable against a live run (documented in README). waitForQuiet's exited +// result is intentionally ignored: a child that exited mid-select makes the +// trailing Enter a harmless no-op on the closed pty. +func (r *runner) selectByText(target string) { r.c.send(target) - time.Sleep(delay) + r.c.waitForQuiet(listSettle) r.c.send(keyEnter) } From e17c88ecff259708275f9250f05362c395e49164 Mon Sep 17 00:00:00 2001 From: Jian Wu Date: Thu, 25 Jun 2026 14:20:21 +0800 Subject: [PATCH 20/33] Address review: use errors.AsType and assert intentional 4.0 rejection Per AGENTS.md Go 1.26 guidance, switch exitCode() from errors.As to errors.AsType[*exec.ExitError]. Add an explicit "4.0" case to the assert table and document why a decimal answer is deliberately rejected. 
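For illustration only, a minimal sketch of the rune-scan approach described above, not the code in assert.go. The names standaloneFour and answersFour are hypothetical, and the exact neighbor rules are assumptions: a "4" is rejected when it touches a letter, digit, or underscore, or when it is part of a decimal such as "4.0" or "1.4".

```go
package main

import (
	"regexp"
	"unicode"
)

// spelledFour matches the whole word "four", case-insensitively.
var spelledFour = regexp.MustCompile(`(?i)\bfour\b`)

// isWordRune reports whether r is a letter, digit, or underscore.
func isWordRune(r rune) bool {
	return r == '_' || unicode.IsLetter(r) || unicode.IsDigit(r)
}

// standaloneFour (hypothetical name) scans runes because Go's RE2 engine has
// no lookahead or lookbehind. Assumed rule: a "4" counts only when it is not
// glued to a word character and is not part of a decimal or version token.
func standaloneFour(text string) bool {
	rs := []rune(text)
	for i, r := range rs {
		if r != '4' {
			continue
		}
		if i > 0 && isWordRune(rs[i-1]) {
			continue // e.g. "404", "v4"
		}
		if i+1 < len(rs) && isWordRune(rs[i+1]) {
			continue // e.g. "4o", "42"
		}
		if i+2 < len(rs) && rs[i+1] == '.' && unicode.IsDigit(rs[i+2]) {
			continue // "4.0": treated as a decimal or version token
		}
		if i >= 2 && rs[i-1] == '.' && unicode.IsDigit(rs[i-2]) {
			continue // "1.4": fractional part of a decimal
		}
		return true
	}
	return false
}

// answersFour (hypothetical name) accepts a standalone "4" or the word "four".
func answersFour(text string) bool {
	return standaloneFour(text) || spelledFour.MatchString(text)
}
```

Under these assumed rules a sentence-final "4." still passes, because the period is not followed by a digit, while "4.0" and "gpt-4o-mini" do not.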
--- .../extensions/azure.ai.agents/tests/e2e-live/assert.go | 7 ++++++- .../azure.ai.agents/tests/e2e-live/assert_test.go | 1 + .../azure.ai.agents/tests/e2e-live/tier2_live_test.go | 3 +-- 3 files changed, 8 insertions(+), 3 deletions(-) diff --git a/cli/azd/extensions/azure.ai.agents/tests/e2e-live/assert.go b/cli/azd/extensions/azure.ai.agents/tests/e2e-live/assert.go index 89ebe9d94c0..d210f5ff19d 100644 --- a/cli/azd/extensions/azure.ai.agents/tests/e2e-live/assert.go +++ b/cli/azd/extensions/azure.ai.agents/tests/e2e-live/assert.go @@ -25,6 +25,11 @@ var spelledFourRe = regexp.MustCompile(`(?i)\bfour\b`) // The standalone-"4" rule is the lookaround (?" form is treated as a version/decimal token to +// keep "4.1"-style strings out, and a live model answering "2+2" replies "4" or +// "four", never "4.0". +// // Go's regexp engine (RE2) has no lookahead/lookbehind, so the standalone-"4" // rule is implemented by scanning runes instead of with a single expression. func responseHasExpectedAnswer(text string) bool { @@ -37,7 +42,7 @@ func responseHasExpectedAnswer(text string) bool { // hasStandaloneFour reports whether text contains a "4" digit that stands alone, // reproducing the lookaround in the Python regex (? Date: Thu, 25 Jun 2026 15:34:57 +0800 Subject: [PATCH 21/33] Surface crash-path teardown errors in pipeline log Use 2>&1 instead of 2>/dev/null in the always() cleanup loop so a failed force-purge is visible for debugging a resource leak; keep || true so set -e doesn't abort the loop. 
--- eng/pipelines/ext-azure-ai-agents-live.yml | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/eng/pipelines/ext-azure-ai-agents-live.yml b/eng/pipelines/ext-azure-ai-agents-live.yml index 577e9d05748..bc91070778a 100644 --- a/eng/pipelines/ext-azure-ai-agents-live.yml +++ b/eng/pipelines/ext-azure-ai-agents-live.yml @@ -210,6 +210,6 @@ extends: [ -d "$dir" ] || continue proj=$(find "$dir" -maxdepth 2 -name azure.yaml -exec dirname {} \; | head -1) if [ -n "$proj" ]; then - ( cd "$proj" && azd down --force --purge --no-prompt ) 2>/dev/null || true + ( cd "$proj" && azd down --force --purge --no-prompt ) 2>&1 || true fi done From d6170cb0797f1c6fa662f02ed5a0d5135d416778 Mon Sep 17 00:00:00 2001 From: Jian Wu Date: Thu, 25 Jun 2026 15:46:47 +0800 Subject: [PATCH 22/33] Parse azure.yaml via yaml.Unmarshal in findServiceName Replace the indentation/colon line scan with a minimal struct unmarshal (gopkg.in/yaml.v3, already a direct dep) so the service-name lookup tolerates comments and indentation changes in scaffolded azure.yaml. --- .../tests/e2e-live/tier2_live_test.go | 28 +++++++++---------- 1 file changed, 14 insertions(+), 14 deletions(-) diff --git a/cli/azd/extensions/azure.ai.agents/tests/e2e-live/tier2_live_test.go b/cli/azd/extensions/azure.ai.agents/tests/e2e-live/tier2_live_test.go index b77a83222c0..f02edc03ca4 100644 --- a/cli/azd/extensions/azure.ai.agents/tests/e2e-live/tier2_live_test.go +++ b/cli/azd/extensions/azure.ai.agents/tests/e2e-live/tier2_live_test.go @@ -21,6 +21,8 @@ import ( "syscall" "testing" "time" + + "gopkg.in/yaml.v3" ) // liveEnvVar gates the live test: it only runs when set to "1". This keeps the @@ -645,7 +647,8 @@ func (r *runner) findProjectDir() string { return "" } -// findServiceName reads the first service name from the project's azure.yaml. +// findServiceName reads the service name from the project's azure.yaml. azd +// scaffolds exactly one service, so the sole key under services: is the name. 
func (r *runner) findServiceName() string { dir := r.projectDir if dir == "" { @@ -659,19 +662,16 @@ func (r *runner) findServiceName() string { if err != nil { return "" } - inServices := false - for line := range strings.SplitSeq(string(data), "\n") { - trimmed := strings.TrimSpace(line) - if trimmed == "services:" { - inServices = true - continue - } - if inServices && strings.HasPrefix(line, " ") && strings.HasSuffix(trimmed, ":") { - return strings.TrimSuffix(trimmed, ":") - } - if inServices && !strings.HasPrefix(line, " ") && trimmed != "" { - break - } + // A struct unmarshal is more robust than scanning lines: it tolerates + // comments and indentation changes that a naive parser would mishandle. + var proj struct { + Services map[string]any `yaml:"services"` + } + if err := yaml.Unmarshal(data, &proj); err != nil || len(proj.Services) == 0 { + return "" + } + for name := range proj.Services { + return name } return "" } From 9a1584cef28ad80f3ab0381b9c55f3e0519adbc0 Mon Sep 17 00:00:00 2001 From: Jian Wu Date: Thu, 25 Jun 2026 16:17:04 +0800 Subject: [PATCH 23/33] Use t.Context() as the test parent context in Tier 2 runner --- .../azure.ai.agents/tests/e2e-live/tier2_live_test.go | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/cli/azd/extensions/azure.ai.agents/tests/e2e-live/tier2_live_test.go b/cli/azd/extensions/azure.ai.agents/tests/e2e-live/tier2_live_test.go index f02edc03ca4..953d1c10153 100644 --- a/cli/azd/extensions/azure.ai.agents/tests/e2e-live/tier2_live_test.go +++ b/cli/azd/extensions/azure.ai.agents/tests/e2e-live/tier2_live_test.go @@ -69,7 +69,7 @@ func TestTier2Live(t *testing.T) { for _, mode := range deployModesFromEnv() { t.Run(mode, func(t *testing.T) { r := newRunner(t, mode) - ctx, cancel := context.WithTimeout(context.Background(), runTimeout) + ctx, cancel := context.WithTimeout(t.Context(), runTimeout) defer cancel() r.run(ctx) }) @@ -148,7 +148,7 @@ func newRunner(t *testing.T, mode string) *runner { // CI 
(GitHub Actions / Azure DevOps / explicit override) uses the az CLI // session for auth; local WSL uses azd's slower-to-avoid built-in auth. if useAzCliAuth() { - _, _ = r.runAzd(context.Background(), testDir, time.Minute, + _, _ = r.runAzd(t.Context(), testDir, time.Minute, "config", "set", "auth.useAzCliAuth", "true") } From c23f9005087f636f133f9d92204ffd3731a9fcfa Mon Sep 17 00:00:00 2001 From: Jian Wu Date: Fri, 26 Jun 2026 10:54:06 +0800 Subject: [PATCH 24/33] Log unhandled prompts that fall through to the default dispatch case --- .../azure.ai.agents/tests/e2e-live/tier2_live_test.go | 4 ++++ 1 file changed, 4 insertions(+) diff --git a/cli/azd/extensions/azure.ai.agents/tests/e2e-live/tier2_live_test.go b/cli/azd/extensions/azure.ai.agents/tests/e2e-live/tier2_live_test.go index 953d1c10153..b62b61f05e7 100644 --- a/cli/azd/extensions/azure.ai.agents/tests/e2e-live/tier2_live_test.go +++ b/cli/azd/extensions/azure.ai.agents/tests/e2e-live/tier2_live_test.go @@ -472,6 +472,10 @@ func (r *runner) dispatchPrompt(screen, prompt string) { r.enter() default: + // No specific case matched: send Enter as a safe default, but log the + // fall-through so CI can distinguish "matched and answered correctly" + // from "hit the catch-all" when a new or changed prompt appears. 
+ r.t.Logf("unhandled prompt (default Enter): %s", truncate(prompt, 100)) r.enter() } } From ab890b1163c9a53c6f56031a06e66bc3b4943d82 Mon Sep 17 00:00:00 2001 From: Jian Wu Date: Fri, 26 Jun 2026 16:23:20 +0800 Subject: [PATCH 25/33] Tighten interactive prompt dispatch matches to full preamble strings --- .../tests/e2e-live/tier2_live_test.go | 18 ++++++++++++------ 1 file changed, 12 insertions(+), 6 deletions(-) diff --git a/cli/azd/extensions/azure.ai.agents/tests/e2e-live/tier2_live_test.go b/cli/azd/extensions/azure.ai.agents/tests/e2e-live/tier2_live_test.go index b62b61f05e7..845059ed73e 100644 --- a/cli/azd/extensions/azure.ai.agents/tests/e2e-live/tier2_live_test.go +++ b/cli/azd/extensions/azure.ai.agents/tests/e2e-live/tier2_live_test.go @@ -410,9 +410,11 @@ func (r *runner) dispatchPrompt(screen, prompt string) { r.enter() } - // Subscription — preamble "Select an Azure subscription..." - // (init_foundry_resources_helpers.go:905) + azd-core picker. - case has("subscription"): + // Subscription — the extension prints "Select an Azure subscription to ..." + // (init.go:1709 etc.) before the azd-core picker. Match the full preamble, + // not the bare word, so an unrelated prompt mentioning a subscription can't + // match by accident. + case has("select an azure subscription"): if sub := os.Getenv("E2E_SUBSCRIPTION"); sub != "" { r.selectByText(sub[:min(8, len(sub))]) } else { @@ -441,9 +443,13 @@ func (r *runner) dispatchPrompt(screen, prompt string) { case has("select a model"): r.selectByText("gpt-4o-mini") - // Deployment capacity / sku / version — azd-core PromptAiDeployment - // (init_models.go:519); accept defaults. - case has("capacity") || has("sku") || has("version"): + // Deployment version / SKU / capacity — azd-core PromptAiDeployment renders + // these exact picker messages (prompt_service.go:143 / 190 / 226); accept + // defaults. 
Match the full message rather than the bare keyword so a future + // prompt merely containing "version"/"sku"/"capacity" can't match by accident + // (it would fall through to the logged default instead). + case has("select a version for") || has("select a sku for") || + has("enter deployment capacity for"): r.enter() // Code-deploy prompts (init_from_code.go:1508 / 1534 / 1563). Auto-resolved From 4906d7161c131b49ea1556f0b090d1248c04f688 Mon Sep 17 00:00:00 2001 From: Jian Wu Date: Fri, 26 Jun 2026 17:11:56 +0800 Subject: [PATCH 26/33] Reference PromptAiDeployment by name instead of stale line numbers Merging main shifted prompt_service.go, leaving the cited line numbers (143/190/226) pointing at the wrong lines. Drop the brittle file:line reference and rely on the function name plus the quoted picker messages, which the case clauses already match exactly. --- .../azure.ai.agents/tests/e2e-live/tier2_live_test.go | 10 +++++----- 1 file changed, 5 insertions(+), 5 deletions(-) diff --git a/cli/azd/extensions/azure.ai.agents/tests/e2e-live/tier2_live_test.go b/cli/azd/extensions/azure.ai.agents/tests/e2e-live/tier2_live_test.go index 845059ed73e..842a6469ffc 100644 --- a/cli/azd/extensions/azure.ai.agents/tests/e2e-live/tier2_live_test.go +++ b/cli/azd/extensions/azure.ai.agents/tests/e2e-live/tier2_live_test.go @@ -443,11 +443,11 @@ func (r *runner) dispatchPrompt(screen, prompt string) { case has("select a model"): r.selectByText("gpt-4o-mini") - // Deployment version / SKU / capacity — azd-core PromptAiDeployment renders - // these exact picker messages (prompt_service.go:143 / 190 / 226); accept - // defaults. Match the full message rather than the bare keyword so a future - // prompt merely containing "version"/"sku"/"capacity" can't match by accident - // (it would fall through to the logged default instead). + // Deployment version / SKU / capacity — azd-core's PromptAiDeployment renders + // these exact picker messages; accept defaults. 
Match the full message rather + // than the bare keyword so a future prompt merely containing + // "version"/"sku"/"capacity" can't match by accident (it would fall through to + // the logged default instead). case has("select a version for") || has("select a sku for") || has("enter deployment capacity for"): r.enter() From cbdf5a5e2eefe42c9514611532a3241d25102cda Mon Sep 17 00:00:00 2001 From: Jian Wu Date: Fri, 26 Jun 2026 20:12:56 +0800 Subject: [PATCH 27/33] Fix subscription dispatch to match azd-core's "Select subscription" ab890b116 tightened the subscription case to has("select an azure subscription"), but that string is the extension's fmt.Println preamble, not the survey "?" line activePrompt reads. ensureSubscription passes an empty PromptSubscriptionRequest, so the picker renders azd-core's default message "Select subscription" (prompt_service.go:507). The tightened match missed it, sending the prompt to the default Enter and selecting the wrong subscription in a live run. Match "select subscription" instead. --- .../azure.ai.agents/tests/e2e-live/tier2_live_test.go | 11 ++++++----- 1 file changed, 6 insertions(+), 5 deletions(-) diff --git a/cli/azd/extensions/azure.ai.agents/tests/e2e-live/tier2_live_test.go b/cli/azd/extensions/azure.ai.agents/tests/e2e-live/tier2_live_test.go index 842a6469ffc..68bc85db619 100644 --- a/cli/azd/extensions/azure.ai.agents/tests/e2e-live/tier2_live_test.go +++ b/cli/azd/extensions/azure.ai.agents/tests/e2e-live/tier2_live_test.go @@ -410,11 +410,12 @@ func (r *runner) dispatchPrompt(screen, prompt string) { r.enter() } - // Subscription — the extension prints "Select an Azure subscription to ..." - // (init.go:1709 etc.) before the azd-core picker. Match the full preamble, - // not the bare word, so an unrelated prompt mentioning a subscription can't - // match by accident. 
- case has("select an azure subscription"): + // Subscription — the extension prints a descriptive preamble via fmt.Println + // (init.go:1709 etc.), but that line isn't the survey "?" line activePrompt + // reads. ensureSubscription passes an empty request, so the picker shows + // azd-core's default message "Select subscription" (prompt_service.go:507) — + // match that, not the preamble. + case has("select subscription"): if sub := os.Getenv("E2E_SUBSCRIPTION"); sub != "" { r.selectByText(sub[:min(8, len(sub))]) } else { From 3c0f717062e5168a67f30ac98e2aa78e839ed275 Mon Sep 17 00:00:00 2001 From: Jian Wu Date: Mon, 29 Jun 2026 10:49:21 +0800 Subject: [PATCH 28/33] Scope 2+2 invoke check to the agent response region Match only the model's reply ([agent] ... up to 'Server responded in') so stray digits elsewhere in CLI output can't false-pass; replace stale source line numbers in dispatch comments with function names. --- .../azure.ai.agents/tests/e2e-live/assert.go | 27 +++++++++ .../tests/e2e-live/assert_test.go | 40 +++++++++++++ .../tests/e2e-live/tier2_live_test.go | 60 ++++++++++--------- 3 files changed, 98 insertions(+), 29 deletions(-) diff --git a/cli/azd/extensions/azure.ai.agents/tests/e2e-live/assert.go b/cli/azd/extensions/azure.ai.agents/tests/e2e-live/assert.go index d210f5ff19d..6026116653b 100644 --- a/cli/azd/extensions/azure.ai.agents/tests/e2e-live/assert.go +++ b/cli/azd/extensions/azure.ai.agents/tests/e2e-live/assert.go @@ -15,6 +15,33 @@ import ( // (case-insensitive), e.g. "the answer is four". var spelledFourRe = regexp.MustCompile(`(?i)\bfour\b`) +// agentLineRe matches the start of an agent reply line, which invoke prints as +// "[] " (invoke.go printf "[%s] %s"). responseEndRe matches the +// green footer invoke prints after the reply, "Server responded in ..." — the +// region between them is exactly the model's answer, with no surrounding noise. 
+var ( + agentLineRe = regexp.MustCompile(`(?m)^\[[^\]]+\] `) + responseEndRe = regexp.MustCompile(`Server responded in`) +) + +// agentResponseRegion returns just the agent's printed answer, sliced from the +// first "[] " line to the "Server responded in" footer. Scoping the +// 2+2 check to this region keeps stray "4"s from the rest of the CLI output +// (model names, versions, status codes) from passing the test. If either marker +// is missing the format changed, so it returns the full text and lets the +// standalone-digit rules below guard against false positives. +func agentResponseRegion(out string) string { + start := agentLineRe.FindStringIndex(out) + if start == nil { + return out + } + rest := out[start[0]:] + if end := responseEndRe.FindStringIndex(rest); end != nil { + return rest[:end[0]] + } + return out +} + // responseHasExpectedAnswer reports whether text answers "what is 2+2?" with a // standalone "4" or the spelled-out word "four". // diff --git a/cli/azd/extensions/azure.ai.agents/tests/e2e-live/assert_test.go b/cli/azd/extensions/azure.ai.agents/tests/e2e-live/assert_test.go index d35224e592b..4e9e4493276 100644 --- a/cli/azd/extensions/azure.ai.agents/tests/e2e-live/assert_test.go +++ b/cli/azd/extensions/azure.ai.agents/tests/e2e-live/assert_test.go @@ -40,3 +40,43 @@ func TestResponseHasExpectedAnswer(t *testing.T) { }) } } + +func TestAgentResponseRegion(t *testing.T) { + t.Parallel() + + cases := []struct { + name string + out string + want bool // responseHasExpectedAnswer over the sliced region + }{ + { + "answer scoped between markers", + "using model gpt-4o-mini\n[agent] The answer is 4.\nServer responded in 2s (first byte: 1s)\n", + true, + }, + { + "stray digits outside region rejected", + "gpt-4o-mini deployed (404 cached)\n[agent] I am not sure.\nServer responded in 4.0s\n", + false, + }, + { + "missing footer falls back to full text", + "using gpt-4o-mini\n[agent] four", + true, + }, + { + "no agent line falls back to full 
text", + "the answer is four", + true, + }, + } + + for _, tc := range cases { + t.Run(tc.name, func(t *testing.T) { + t.Parallel() + if got := responseHasExpectedAnswer(agentResponseRegion(tc.out)); got != tc.want { + t.Errorf("region(%q) -> %v, want %v", tc.out, got, tc.want) + } + }) + } +} diff --git a/cli/azd/extensions/azure.ai.agents/tests/e2e-live/tier2_live_test.go b/cli/azd/extensions/azure.ai.agents/tests/e2e-live/tier2_live_test.go index 68bc85db619..4c4b08d2610 100644 --- a/cli/azd/extensions/azure.ai.agents/tests/e2e-live/tier2_live_test.go +++ b/cli/azd/extensions/azure.ai.agents/tests/e2e-live/tier2_live_test.go @@ -215,9 +215,9 @@ func (r *runner) phaseInit(ctx context.Context) error { defer cancel() // Deploy mode is NOT an interactive prompt in the template/--agent-name - // flow: init auto-resolves it to "container" when a manifest is provided - // (init_from_code.go:1373), so it must be chosen via the --deploy-mode flag - // (init.go:1306). r.mode is exactly "container" or "code". + // flow: promptDeployMode (init_from_code.go) auto-resolves it to "container" + // when a manifest is provided, so it must be chosen via newInitCommand's + // --deploy-mode flag (init.go). r.mode is exactly "container" or "code". args := []string{"ai", "agent", "init", "--agent-name", r.agentName, "--deploy-mode", r.mode} //nolint:gosec // azd is a trusted fixed binary; args are test-controlled. cmd := exec.CommandContext(ictx, "azd", args...) @@ -268,7 +268,7 @@ func (r *runner) phaseInit(ctx context.Context) error { // prompts cannot be predetermined. A linear ExpectString sequence would desync // at the first conditional prompt. Instead we block on output settling (the // go-expect read), then dispatch on the verbatim prompt strings the extension -// prints (each case annotated with its source file:line). +// prints (each case annotated with the source function that prints it). 
func (r *runner) driveInit(ctx context.Context, exited <-chan struct{}) error { var lastKey string repeat := 0 @@ -342,8 +342,8 @@ func (r *runner) finishInit(ctx context.Context) error { } // isInitComplete reports whether the success marker is on screen. Source: -// init.go:1483 prints "AI agent definition added to your azd project -// successfully!" in green at the end of runInitFromManifest. +// runInitFromManifest (init.go) prints "AI agent definition added to your azd +// project successfully!" in green at the end. func isInitComplete(screen string) bool { return screenContains(screen, "added to your azd project") || screenContains(screen, "agent definition added") @@ -359,7 +359,7 @@ func promptKey(prompt string) string { } // dispatchPrompt answers a single survey prompt. Cases are ordered specific → -// generic and keyed on the verbatim messages the extension prints; the file:line +// generic and keyed on the verbatim messages the extension prints; the function // in each comment points at the source string this matches. The prompt argument // is already lowercased (see activePrompt). // @@ -372,9 +372,10 @@ func (r *runner) dispatchPrompt(screen, prompt string) { has := func(sub string) bool { return strings.Contains(prompt, sub) } switch { - // Yes/No confirms. "Continue with this existing agent name?" (init.go:722) - // only fires when the unique name already exists; decline it to reach the - // fresh-name input. Any other confirm: accept. + // Yes/No confirms. "Continue with this existing agent name?" + // (resolveExistingAgentNameConflictWithChecker) only fires when the unique + // name already exists; decline it to reach the fresh-name input. Any other + // confirm: accept. 
case has("[y/n]") || has("(y/n)") || has("continue with this existing agent name"): if has("continue with this existing agent name") { r.c.send("n") @@ -383,17 +384,17 @@ func (r *runner) dispatchPrompt(screen, prompt string) { } r.c.send(keyEnter) - // Language select — "Select a language" (init_from_templates_helpers.go:263). + // Language select — "Select a language" (promptAgentTemplate). case has("select a language"): r.selectByText("Python") // Template select — "Select a starter template" / "Select an agent template" - // (init_from_templates_helpers.go:304 / 327). + // (promptAgentTemplate). case has("starter template") || has("agent template"): r.selectByText("Basic agent (Invocations") // Foundry project hosting — "Select a Foundry project to host your agent..." - // (init.go:1752 / 1910); choices "Use an existing..." / "Create a new...". + // (runInitFromManifest); choices "Use an existing..." / "Create a new...". case has("foundry project to host"): if r.createProject() { r.selectByText("Create a new Foundry project") @@ -402,7 +403,7 @@ func (r *runner) dispatchPrompt(screen, prompt string) { } // Existing-project picker — "Select a Foundry project" - // (init_foundry_resources_helpers.go:1360); only when reusing a project. + // (selectFoundryProject); only when reusing a project. case has("select a foundry project"): if p := os.Getenv("E2E_PROJECT"); p != "" { r.selectByText(p) @@ -411,10 +412,10 @@ func (r *runner) dispatchPrompt(screen, prompt string) { } // Subscription — the extension prints a descriptive preamble via fmt.Println - // (init.go:1709 etc.), but that line isn't the survey "?" line activePrompt + // (runInitFromManifest), but that line isn't the survey "?" line activePrompt // reads. ensureSubscription passes an empty request, so the picker shows - // azd-core's default message "Select subscription" (prompt_service.go:507) — - // match that, not the preamble. 
+ // azd-core's default message "Select subscription" (promptSubscriptionMessage) + // — match that, not the preamble. case has("select subscription"): if sub := os.Getenv("E2E_SUBSCRIPTION"); sub != "" { r.selectByText(sub[:min(8, len(sub))]) @@ -422,25 +423,25 @@ func (r *runner) dispatchPrompt(screen, prompt string) { r.enter() } - // Location — preamble "Select an Azure location..." - // (init_foundry_resources_helpers.go:1004) + azd-core picker. + // Location — preamble "Select an Azure location..." (ensureLocation) + + // azd-core picker. case has("location") || has("region"): r.selectByText(getenvDefault("E2E_LOCATION", "eastus2")) // Manifest model decision — "Model '%s' is specified in the agent manifest." - // (init_models.go:463); keep the manifest model (default first choice). + // (getModelDetails); keep the manifest model (default first choice). case has("is specified in the agent manifest"): r.enter() - // Existing deployments / generic proceed — init_models.go:263 / 330. + // Existing deployments / generic proceed — getModelDeploymentDetails. case has("how would you like to proceed") || has("existing deployment"): r.enter() - // Model deployment name input — init_models.go:398 (default = model name). + // Model deployment name input — getModelDeploymentDetails (default = model name). case has("model deployment name") || (has("deployment name") && has("model")): r.enter() - // Model select — "Select a model" (init_models.go:704 etc.). + // Model select — "Select a model" (promptForAlternativeModel etc.). case has("select a model"): r.selectByText("gpt-4o-mini") @@ -453,8 +454,8 @@ func (r *runner) dispatchPrompt(screen, prompt string) { has("enter deployment capacity for"): r.enter() - // Code-deploy prompts (init_from_code.go:1508 / 1534 / 1563). Auto-resolved - // under userProvidedManifest=true, so kept as defensive handlers only. + // Code-deploy prompts (promptCodeConfig). 
Auto-resolved under + // userProvidedManifest=true, so kept as defensive handlers only. case has("select the runtime for your agent"): r.enter() // default Python 3.13 case has("entry point"): @@ -463,18 +464,19 @@ func (r *runner) dispatchPrompt(screen, prompt string) { r.enter() // default remote build // Optional infra (blank => create new): ACR login server - // (init_foundry_resources_helpers.go:481), App Insights (:606 / :621). + // (configureAcrConnection), App Insights (configureAppInsightsConnection). case has("acr login server") || has("container registry"): r.enter() case has("application insights"): r.enter() - // Startup command (helpers.go:773); blank => skip. + // Startup command (resolveStartupCommandForInit); blank => skip. case has("command to start your agent"): r.enter() // Replacement agent name after declining the existing-name confirm - // (init.go:745) / the name input (init.go:261); accept the default. + // (promptForReplacementAgentName) / the name input (resolveInitAgentName); + // accept the default. case has("enter a different name for your agent") || has("enter a name for your agent"): r.enter() @@ -560,7 +562,7 @@ func (r *runner) phaseInvoke(ctx context.Context) error { continue } - if !responseHasExpectedAnswer(out) { + if !responseHasExpectedAnswer(agentResponseRegion(out)) { if attempt < maxRetries { r.t.Log("response missing expected '4'/'four'; retrying") if err := sleepCtx(ctx, 15*time.Second); err != nil { From 517544a47d9ed2a790fcf5e573d65c9ca0960fbf Mon Sep 17 00:00:00 2001 From: Jian Wu Date: Mon, 29 Jun 2026 10:49:22 +0800 Subject: [PATCH 29/33] Generate live-pipeline config.json from extension.yaml via yq Derive the extension manifest fields from extension.yaml so they cannot drift from a hand-maintained copy; only path/source/test version injected. 
--- eng/pipelines/ext-azure-ai-agents-live.yml | 43 ++++++++++++---------- 1 file changed, 23 insertions(+), 20 deletions(-) diff --git a/eng/pipelines/ext-azure-ai-agents-live.yml b/eng/pipelines/ext-azure-ai-agents-live.yml index bc91070778a..9d2f81f8a19 100644 --- a/eng/pipelines/ext-azure-ai-agents-live.yml +++ b/eng/pipelines/ext-azure-ai-agents-live.yml @@ -89,7 +89,11 @@ extends: # Install the freshly built (live, non-record) extension into the # azd config dir: copy the binary where azd expects it and write a - # minimal config.json so `azd ai agent` resolves the extension. + # config.json so `azd ai agent` resolves the extension. The config + # is generated FROM extension.yaml via yq so the manifest fields + # (capabilities, namespace, usage, ...) can never drift from a + # hand-maintained copy here; only test-specific fields (path, + # source, sentinel version) are injected. - bash: | set -euo pipefail # Map the agent architecture to azd's expected binary suffix so @@ -108,25 +112,24 @@ extends: mkdir -p "$EXT_DIR" cp cli/azd/extensions/azure.ai.agents/azure-ai-agents "$EXT_DIR/$BIN_NAME" chmod +x "$EXT_DIR/$BIN_NAME" - cat > "$HOME/.azd/config.json" << EOF - { - "extension": { - "installed": { - "azure.ai.agents": { - "id": "azure.ai.agents", - "namespace": "ai.agent", - "capabilities": ["custom-commands", "lifecycle-events", "mcp-server", "service-target-provider", "metadata"], - "displayName": "Foundry agents (Preview)", - "description": "Ship agents with Microsoft Foundry from your terminal. (Preview)", - "version": "0.0.0-test", - "usage": "azd ai agent [options]", - "path": "extensions/azure.ai.agents/${BIN_NAME}", - "source": "azd" - } - } - } - } - EOF + # yq ships on the azure-sdk Linux images; install the pinned + # version as a fallback if a future image drops it. 
+ command -v yq >/dev/null 2>&1 || go install github.com/mikefarah/yq/v4@v4.44.3 + export BIN_NAME + yq -o=json ' + .id as $id | { + "extension": {"installed": {$id: { + "id": .id, + "namespace": .namespace, + "capabilities": .capabilities, + "displayName": .displayName, + "description": .description, + "version": "0.0.0-test", + "usage": .usage, + "path": "extensions/azure.ai.agents/" + env(BIN_NAME), + "source": "azd" + }}} + }' cli/azd/extensions/azure.ai.agents/extension.yaml > "$HOME/.azd/config.json" displayName: Install azure.ai.agents extension # Run the live golden path INSIDE the AzureCLI@2 task so the az CLI From 2d9dfb7e1e705efd1c1e9092432bc1b20eb12f33 Mon Sep 17 00:00:00 2001 From: Jian Wu Date: Mon, 29 Jun 2026 11:17:24 +0800 Subject: [PATCH 30/33] Add region test for standalone 4 before agent reply Per review: verify a real standalone digit ahead of the [agent] line is excluded by the response-region slicing. --- .../extensions/azure.ai.agents/tests/e2e-live/assert_test.go | 5 +++++ 1 file changed, 5 insertions(+) diff --git a/cli/azd/extensions/azure.ai.agents/tests/e2e-live/assert_test.go b/cli/azd/extensions/azure.ai.agents/tests/e2e-live/assert_test.go index 4e9e4493276..d81b7ba309a 100644 --- a/cli/azd/extensions/azure.ai.agents/tests/e2e-live/assert_test.go +++ b/cli/azd/extensions/azure.ai.agents/tests/e2e-live/assert_test.go @@ -59,6 +59,11 @@ func TestAgentResponseRegion(t *testing.T) { "gpt-4o-mini deployed (404 cached)\n[agent] I am not sure.\nServer responded in 4.0s\n", false, }, + { + "standalone 4 before agent line excluded by region", + "completed step 4\n[agent] I don't know.\nServer responded in 1s\n", + false, + }, { "missing footer falls back to full text", "using gpt-4o-mini\n[agent] four", From 30dfb29f993fa57c480b928b32957510542b7296 Mon Sep 17 00:00:00 2001 From: Jian Wu Date: Tue, 30 Jun 2026 09:48:27 +0800 Subject: [PATCH 31/33] fix: modernize strings.Split to SplitSeq for go fix lint --- 
.../azure.ai.agents/tests/e2e-live/tier2_live_test.go | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/cli/azd/extensions/azure.ai.agents/tests/e2e-live/tier2_live_test.go b/cli/azd/extensions/azure.ai.agents/tests/e2e-live/tier2_live_test.go index 8cab0ba341a..ad963a1bbe8 100644 --- a/cli/azd/extensions/azure.ai.agents/tests/e2e-live/tier2_live_test.go +++ b/cli/azd/extensions/azure.ai.agents/tests/e2e-live/tier2_live_test.go @@ -674,7 +674,7 @@ func (r *runner) azdEnvValues(ctx context.Context) map[string]string { } // Lines are KEY="value"; Cut on the first '=' so values containing '=' are // preserved, then strip the surrounding quotes azd always emits. - for _, line := range strings.Split(out, "\n") { + for line := range strings.SplitSeq(out, "\n") { key, val, ok := strings.Cut(strings.TrimSpace(line), "=") if !ok { continue From e1c7ab8f4601af274d124b9a21457b9bed6e4a82 Mon Sep 17 00:00:00 2001 From: Jian Wu Date: Tue, 30 Jun 2026 10:46:54 +0800 Subject: [PATCH 32/33] fix: tag cleanup resource groups by azd env name, not env values A provision that times out before persisting AZURE_RESOURCE_GROUP would leave its resource group untagged, so the EngSys cleanup pipeline could not reclaim it if azd down also failed. Enumerate azd envs on disk (.azure/) and tag the deterministic rg- group, which exists from the start of provisioning, closing the only permanent-leak gap. 
--- eng/pipelines/ext-azure-ai-agents-live.yml | 23 ++++++++++++++-------- 1 file changed, 15 insertions(+), 8 deletions(-) diff --git a/eng/pipelines/ext-azure-ai-agents-live.yml b/eng/pipelines/ext-azure-ai-agents-live.yml index 4def1f28748..05b4b1713ae 100644 --- a/eng/pipelines/ext-azure-ai-agents-live.yml +++ b/eng/pipelines/ext-azure-ai-agents-live.yml @@ -217,16 +217,23 @@ extends: [ -n "$proj" ] || continue ( cd "$proj" - # Tag the resource group before attempting `azd down` so the - # EngSys garbage collector can still reclaim it if the delete - # below fails - the cleanup pipeline keys off DeleteAfter. - rg=$(azd env get-values 2>/dev/null \ - | sed -n 's/^AZURE_RESOURCE_GROUP="\(.*\)"$/\1/p' | head -1) - if [ -n "$rg" ]; then - echo "Tagging $rg with DeleteAfter=$delete_after" + # Tag the resource group(s) before attempting `azd down` so + # the EngSys garbage collector can still reclaim them if the + # delete below fails - the cleanup pipeline keys off + # DeleteAfter. Enumerate by azd env name rather than reading + # AZURE_RESOURCE_GROUP from env values, which a provision that + # times out may not have persisted yet; azd creates the group + # as rg- at the start of provisioning, so the name is + # known even when the run dies mid-deploy. 
+ for d in .azure/*/; do + [ -d "$d" ] || continue + name=${d#.azure/} + name=${name%/} + rg="rg-$name" + echo "Tagging $rg (env=$name) with DeleteAfter=$delete_after" az group update --name "$rg" \ --set "tags.DeleteAfter=$delete_after" --output none || true - fi + done azd down --force --purge --no-prompt ) 2>&1 || true done From 990ffa5c6749ceb4c05acdb1cafc9b7760359635 Mon Sep 17 00:00:00 2001 From: Jian Wu Date: Tue, 30 Jun 2026 15:00:29 +0800 Subject: [PATCH 33/33] build: bump creack/pty test dep to v1.1.24 --- cli/azd/extensions/azure.ai.agents/go.mod | 2 +- cli/azd/extensions/azure.ai.agents/go.sum | 3 ++- 2 files changed, 3 insertions(+), 2 deletions(-) diff --git a/cli/azd/extensions/azure.ai.agents/go.mod b/cli/azd/extensions/azure.ai.agents/go.mod index cc00b9e4628..dc8e461f274 100644 --- a/cli/azd/extensions/azure.ai.agents/go.mod +++ b/cli/azd/extensions/azure.ai.agents/go.mod @@ -37,7 +37,7 @@ require github.com/denormal/go-gitignore v0.0.0-20180930084346-ae8ad1d07817 require ( github.com/Netflix/go-expect v0.0.0-20220104043353-73e0943537d2 - github.com/creack/pty v1.1.17 + github.com/creack/pty v1.1.24 github.com/hinshun/vt10x v0.0.0-20220119200601-820417d04eec go.opentelemetry.io/otel v1.43.0 go.opentelemetry.io/otel/trace v1.43.0 diff --git a/cli/azd/extensions/azure.ai.agents/go.sum b/cli/azd/extensions/azure.ai.agents/go.sum index 396d8b1ea21..3a4cc258c9e 100644 --- a/cli/azd/extensions/azure.ai.agents/go.sum +++ b/cli/azd/extensions/azure.ai.agents/go.sum @@ -100,8 +100,9 @@ github.com/clipperhouse/stringish v0.1.1/go.mod h1:v/WhFtE1q0ovMta2+m+UbpZ+2/HEX github.com/clipperhouse/uax29/v2 v2.5.0 h1:x7T0T4eTHDONxFJsL94uKNKPHrclyFI0lm7+w94cO8U= github.com/clipperhouse/uax29/v2 v2.5.0/go.mod h1:Wn1g7MK6OoeDT0vL+Q0SQLDz/KpfsVRgg6W7ihQeh4g= github.com/cpuguy83/go-md2man/v2 v2.0.6/go.mod h1:oOW0eioCTA6cOiMLiUPZOpcVxMig6NIQQ7OS05n1F4g= -github.com/creack/pty v1.1.17 h1:QeVUsEDNrLBW4tMgZHvxy18sKtr6VI492kBhUfhDJNI= github.com/creack/pty v1.1.17/go.mod 
h1:MOBLtS5ELjhRRrroQr9kyvTxUAFNvYEK993ew/Vr4O4= +github.com/creack/pty v1.1.24 h1:bJrF4RRfyJnbTJqzRLHzcGaZK1NeM5kTC9jGgovnR1s= +github.com/creack/pty v1.1.24/go.mod h1:08sCNb52WyoAwi2QDyzUCTgcvVFhUzewun7wtTfvcwE= github.com/danwakefield/fnmatch v0.0.0-20160403171240-cbb64ac3d964 h1:y5HC9v93H5EPKqaS1UYVg1uYah5Xf51mBfIoWehClUQ= github.com/danwakefield/fnmatch v0.0.0-20160403171240-cbb64ac3d964/go.mod h1:Xd9hchkHSWYkEqJwUGisez3G1QY8Ryz0sdWrLPMGjLk= github.com/davecgh/go-spew v1.1.0/go.mod h1:J7Y8YcW2NihsgmVo/mv3lAwl/skON4iLHjSsI+c5H38=