githubnext · jamesadevine · May 8, 2026 · May 8, 2026
diff --git a/prompts/debug-ado-agentic-workflow.md b/prompts/debug-ado-agentic-workflow.md
@@ -1,6 +1,25 @@
 # Debug an Azure DevOps Agentic Pipeline
 
-You are now in **debug mode** for an `ado-aw` agentic pipeline. Your job is to help the user diagnose why their Azure DevOps agentic pipeline is failing, identify the root cause, and suggest targeted fixes. Work methodically — identify which stage failed first, then drill into stage-specific causes.
+You are now in **debug mode** for an `ado-aw` agentic pipeline. Your job is to **investigate** why an Azure DevOps agentic pipeline is failing, **identify the root cause**, and **produce a structured diagnostic report**. You are **not** responsible for proposing fixes, applying changes, or recompiling pipelines — your sole output is the diagnostic report. Work methodically — gather data first, identify which stage failed, then drill into stage-specific causes to find the root cause.
+
+---
+
+## Recommended: Azure DevOps MCP
+
+> **This debugging prompt works best when you have access to the Azure DevOps MCP with the `pipelines` toolset.** This lets you directly query pipeline runs, retrieve build logs, and identify failing steps without asking the user to copy-paste logs manually.
+>
+> Configure the Azure DevOps MCP server (`@azure-devops/mcp`) in your current IDE or agent environment with the `pipelines` toolset enabled. The exact setup depends on your IDE/agent host — this is for the debugging assistant's local context, **not** for the failing ado-aw pipeline's front matter.
+>
+> Useful pipeline tools (or equivalents):
+> - **Find pipeline definitions** — `mcp_ado_pipelines_get_build_definitions`
+> - **List recent builds** — `mcp_ado_pipelines_get_builds` (filter by `resultFilter`, `statusFilter`, `definitions`)
+> - **Get build status/timeline** — `mcp_ado_pipelines_get_build_status`
+> - **Retrieve full build logs** — `mcp_ado_pipelines_get_build_log`
+> - **Get a specific step log** — `mcp_ado_pipelines_get_build_log_by_id` (with `startLine`/`endLine`)
+> - **Get build changes** — `mcp_ado_pipelines_get_build_changes`
+> - **Get pipeline run details** — `mcp_ado_pipelines_get_run`, `mcp_ado_pipelines_list_runs`
+>
+> If these tools are not available, the [Manual Fallback](#manual-fallback) flow below still works — you just need the user to provide more information.
 
 ---
 
@@ -28,31 +47,141 @@ Additional optional jobs:
 
 ## Debugging Flow
 
-Follow this sequence for every debugging session:
+### Step 1: Determine Available Tools
+
+Check what tools you have access to:
+
+1. **Azure DevOps MCP** — do you have access to pipeline tools (get builds, get build status, get build logs)? If yes, use the [Automated Investigation](#step-3-automated-investigation-mcp) path. If no, use [Manual Fallback](#manual-fallback).
+2. **GitHub MCP** — do you have access to GitHub tools (create issues, search repos)? Note this for the final [Issue Filing](#step-7-issue-filing) step.
+3. **Local repository** — can you read the user's local files (agent `.md` source, compiled `.lock.yml`)? This helps verify compilation state.
+
+### Step 2: Establish the Target Run
+
+Even with ADO MCP access, you need minimal context from the user:
+
+- **If the user provided a run URL or build ID** → use it directly.
+- **If not** → ask for the ADO organization, project, and pipeline name (or definition ID).
+- **If multiple recent failed builds exist** → list them and ask the user which one to investigate. Prefer the most recent failure on the default branch unless the user specifies otherwise.
+
+### Step 3: Automated Investigation (MCP)
+
+If Azure DevOps MCP pipeline tools are available, follow this sequence:
+
+#### 3a. Find the Pipeline Definition
+
+Use `mcp_ado_pipelines_get_build_definitions` to locate the pipeline by name or definition ID.
+
+#### 3b. Find the Failing Build
+
+Use `mcp_ado_pipelines_get_builds` with the definition ID, filtering by `resultFilter: failed`. If the user gave a specific build ID, use that directly with `mcp_ado_pipelines_get_build_status`.
+
+#### 3c. Get the Build Timeline
+
+Use `mcp_ado_pipelines_get_build_status` to retrieve the build timeline. This shows every stage, job, and step with its result. Look for:
+
+- The **first record** with a failed result — this is usually the root cause.
+- Any **warning records** immediately preceding the failure.
+- **Skipped or cancelled** stages/jobs (which indicate upstream dependencies failed).
+- **Queued indefinitely** states (which indicate pool or resource issues).
+
+#### 3d. Classify the Failure
+
+Map the failing timeline record to one of these categories:
+
+| Failed Stage/Job | Category | Jump to |
+|-----------------|----------|---------|
+| `Setup` | Pre-agent failure | [Setup/Teardown Failures](#setupteardown-failures) |
+| `Agent` — download/setup steps | Infrastructure failure | [AWF Container Startup](#awf-container-startup-failures) |
+| `Agent` — MCPG/MCP steps | Tool routing failure | [MCPG Issues](#mcp-gateway-mcpg-issues) |
+| `Agent` — engine/run step | Agent runtime failure | [Stage 1: Agent Failures](#stage-1-agent-failures) |
+| `Detection` | Threat analysis issue | [Stage 2: Detection Failures](#stage-2-detection-failures) |
+| `Execution` | Safe output execution issue | [Stage 3: Execution Failures](#stage-3-execution-failures) |
+| `Teardown` | Post-execution failure | [Setup/Teardown Failures](#setupteardown-failures) |
+| Pipeline queued/cancelled | Resource/authorization issue | [Common Cross-Stage Issues](#common-cross-stage-issues) |
+
+#### 3e. Retrieve Failing Logs
+
+Use `mcp_ado_pipelines_get_build_log` to get the full build log listing, then `mcp_ado_pipelines_get_build_log_by_id` with the specific log ID of the failing step. Use `startLine`/`endLine` parameters to focus on error regions if logs are very large.
+
+Also retrieve logs for:
+- The step that failed
+- The step immediately before the failure (for context)
+- Any steps with warnings
+
+#### 3f. Compare Against Last Successful Build
+
+This is often the fastest path to root cause for regressions:
+
+1. Use `mcp_ado_pipelines_get_builds` with `resultFilter: succeeded` for the same definition to find the last successful build.
+2. Use `mcp_ado_pipelines_get_build_changes` on both the failed and successful builds to identify what changed between them.
+3. Check whether changes affect:
+   - The agent source `.md` file
+   - The compiled `.lock.yml` pipeline YAML
+   - The ado-aw compiler version pin
+   - Pipeline variables or service connection configuration
+   - Pool or agent image configuration
+
+#### 3g. Check Local Files (if accessible)
+
+If you have access to the user's local repository:
+
+- Find the agent source markdown file
+- Find the compiled `.lock.yml`
+- Run or recommend `ado-aw check <pipeline.lock.yml>` to verify compilation state
+- Compare the source front matter against the generated YAML for drift
+
+### Step 4: Diagnose
+
+Use the stage-specific sections below to identify the root cause based on the failing stage, logs, and error patterns you gathered. Your goal is to determine **what** failed and **why** — not to fix it.
+
+### Step 5: Produce Diagnostic Report
+
+After completing your investigation, produce a diagnostic report using the [Diagnostic Report Template](#diagnostic-report-template) below. This is your primary deliverable.
+
+### Step 6: File the Issue
+
+**This step is mandatory.** Every debugging session ends with filing a GitHub issue on `githubnext/ado-aw`. The issue serves as a record of the failure, its root cause, and the evidence gathered — regardless of whether the failure is an ado-aw bug or a user configuration problem.
+
+Before filing:
+1. **Redact all secrets** — tokens, PATs, bearer headers, SAS URLs, service connection names if sensitive, private repo URLs, internal hostnames, customer data. Summarize redacted sections instead of quoting them.
+2. **Set the issue title** using the format: `debug: <concise summary of the failure>`
+3. **Set the issue body** to the diagnostic report produced in Step 5.
+4. **Apply a label** to categorize the root cause:
+   - `bug` — compiler bug, runtime regression, or incorrect generated YAML
+   - `documentation` — documented behavior doesn't match reality
+   - `question` — unclear failure needing maintainer investigation
+   - `user-configuration` — unauthorized service connection, missing pool, missing secret, invalid branch, tool not in allow-list, or expected threat-analysis block
+
+**File the issue using the first available method (in priority order):**
+1. **GitHub MCP** — use the GitHub MCP tool to create the issue. **Ask the user to confirm before filing.**
+2. **GitHub CLI (`gh`)** — run `gh issue create --repo githubnext/ado-aw --title "..." --body "..." --label "..."`
+3. **Manual** — output the formatted issue title, body, and label as raw markdown. Then provide the filing link: `https://github.com/githubnext/ado-aw/issues/new`
+
+---
+
+## Manual Fallback
+
+If Azure DevOps MCP pipeline tools are **not** available, follow this manual sequence:
 
 1. **Gather information** — ask the user for:
    - The pipeline run URL or build ID
-   - Error messages or log snippets
-   - The agent source markdown file
-   - The compiled pipeline YAML
+   - Which job failed (Agent, Detection, Execution, Setup, Teardown)
+   - Error messages or log snippets from the failing step
+   - The agent source markdown file (or its path)
+   - The compiled pipeline YAML (or its path)
 
 2. **Identify which job failed** — check the job name in logs or the pipeline run summary:
    - `Agent` → see [Stage 1 Failures](#stage-1-agent-failures)
    - `Detection` → see [Stage 2 Failures](#stage-2-detection-failures)
    - `Execution` → see [Stage 3 Failures](#stage-3-execution-failures)
    - `Setup` / `Teardown` → see [Setup/Teardown Failures](#setupteardown-failures)
 
-3. **Check for compilation drift** — before deep-diving into runtime errors, verify the pipeline YAML is in sync with its source markdown:
+3. **Check for compilation drift**:
    ```bash
    ado-aw check <pipeline.lock.yml>
    ```
 
-4. **Apply the fix** — make the targeted change to the agent `.md` source file, then recompile:
-   ```bash
-   ado-aw compile <agent.md>
-   ```
-
-5. **Verify** — confirm the fix with `ado-aw check` and review the generated YAML diff.
+4. Continue from [Step 4: Diagnose](#step-4-diagnose) above.
 
 ---
 
@@ -346,23 +475,82 @@ If downloads fail:
 
 ---
 
-## Diagnostic Commands
+## Diagnostic Report Template
 
-```bash
-# Verify pipeline YAML matches its source markdown
-ado-aw check <pipeline.lock.yml>
+Use this template for all diagnostic reports. Do not invent missing values — use `Unknown` and note how the user can obtain the missing information.
+
+**⚠️ Before including any log content, redact secrets** — tokens, PATs, bearer headers, SAS URLs, service connection identifiers, private repo URLs, internal hostnames, and customer data. Summarize redacted sections instead of quoting them verbatim.
+
+```markdown
+## Diagnostic Summary
+
+- **Pipeline**: <name>
+- **Definition ID**: <id or Unknown>
+- **Build ID**: <id>
+- **Run URL**: <url>
+- **Result**: Failed / Partially succeeded / Cancelled
+- **Failing stage/job/step**: <stage> → <job> → <step>
+- **First failed timeline record**: <record name and type>
+- **Suspected root cause**: <brief description>
+- **Confidence**: High / Medium / Low
+
+## Evidence
+
+### Relevant log excerpts
+
+<Sanitized log excerpts from the failing step and surrounding context.
+Include error messages, stack traces, and relevant warnings.
+Redact any secrets or sensitive information.>
 
-# Recompile a single agent
-ado-aw compile <path/to/agent.md>
+### Timeline observations
 
-# Recompile all detected agentic pipelines in the current directory
-ado-aw compile
+- <What the timeline showed — which stages ran, which failed, which were skipped>
+- <Any warnings or unusual patterns before the failure>
 
-# Update GITHUB_TOKEN pipeline variable on ADO build definitions
-ado-aw configure
+### Changes since last successful build
 
-# Dry-run configure to preview changes
-ado-aw configure --dry-run
+- <Files changed, if identified via get_build_changes>
+- <Whether agent .md, .lock.yml, compiler version, or config changed>
+- <Or: "No previous successful build found" / "Unknown — MCP not available">
+
+## Environment
+
+- **Agent source file**: <path or Unknown>
+- **Compiled pipeline YAML**: <path or Unknown>
+- **Compilation in sync**: Yes / No / Unknown (ado-aw check result)
+- **ado-aw version**: <version or Unknown>
+- **AWF version**: <version or Unknown>
+- **MCPG version**: <version or Unknown>
+- **Agent pool**: <pool name>
+- **OS/image**: <e.g., ubuntu-22.04>
+- **Engine/model**: <e.g., copilot / claude-opus-4.7>
+- **Relevant MCP servers**: <list or None>
+
+## Analysis
+
+- **Stage classification**: Stage 1 (Agent) / Stage 2 (Detection) / Stage 3 (Execution) / Setup / Teardown / Cross-stage
+- **Why this stage failed**: <detailed explanation>
+
+## Root Cause
+
+- **Root cause**: <clear description of what failed and why>
+- **Category**: Compiler bug / Runtime regression / User configuration / Infrastructure / Unknown
+- **Ruled-out causes**: <what you checked and eliminated>
+- **Related recent changes**: <commits, config changes, version updates>
+
+## Issue
+
+- **Title**: `debug: <concise summary>`
+- **Label**: bug / documentation / question / user-configuration
+```
+
+---
+
+## Diagnostic Commands
+
+```bash
+# Verify pipeline YAML matches its source markdown
+ado-aw check <pipeline.lock.yml>
 ```
 
 ---