From 55c49abd706cd738fdf59044d81668ed0dba6cc5 Mon Sep 17 00:00:00 2001
From: Qingchuan Hao <qingchuan.hao@microsoft.com>
Date: Tue, 12 Aug 2025 12:04:19 +0000
Subject: [PATCH 1/3] new runbooks prompt

---
 holmes/plugins/runbooks/README.md             | 24 +++++++++++++
 .../plugins/runbooks/runbook-format.prompt.md | 36 +++++++++++++++++++
 2 files changed, 60 insertions(+)
 create mode 100644 holmes/plugins/runbooks/runbook-format.prompt.md

diff --git a/holmes/plugins/runbooks/README.md b/holmes/plugins/runbooks/README.md
index 217a6214b..7cc9b176b 100644
--- a/holmes/plugins/runbooks/README.md
+++ b/holmes/plugins/runbooks/README.md
@@ -20,3 +20,27 @@ This runbook is mainly used for `holmes investigate`
 
 Catalog specified in [catalog.json](catalog.json) contains a collection of runbooks written in markdown.
 During runtime, LLM will compare the runbook description with the user question and return the most matched runbook for investigation. It's possible no runbook is returned for no match.
+
+## Generating Runbooks
+
+To ensure all runbooks follow a consistent format and improve troubleshooting accuracy, contributors should use the standardized [runbook format prompt](runbook-format.prompt.md) when creating new runbooks.
+
+### Using the Runbook Format Prompt
+
+1. **Start with the Template**: Use `runbook-format.prompt.md` as your guide when creating new runbooks
+2. **Follow the Structure**: Ensure your runbook includes all required sections:
+   - **Goal**: Clear definition of issues addressed and agent mandate
+   - **Workflow**: Sequential diagnostic steps with detailed function descriptions
+   - **Synthesize Findings**: Logic for combining outputs and identifying root causes
+   - **Recommended Remediation Steps**: Both immediate and permanent solutions
+
+### Benefits of Using the Standard Format
+
+- **Consistency**: All runbooks follow the same structure and terminology
+- **AI Agent Compatibility**: Ensures runbooks are machine-readable and executable by AI agents
+- **Improved Accuracy**: Standardized format reduces ambiguity and improves diagnostic success rates
+- **Maintainability**: Easier to update and maintain runbooks across the project
+
+### Example Usage
+
+When creating a runbook for a new issue category (e.g., storage problems, authentication failures), provide the issue description to an LLM along with the prompt template to generate a properly formatted runbook that follows the established patterns.
diff --git a/holmes/plugins/runbooks/runbook-format.prompt.md b/holmes/plugins/runbooks/runbook-format.prompt.md
new file mode 100644
index 000000000..861ac3e08
--- /dev/null
+++ b/holmes/plugins/runbooks/runbook-format.prompt.md
@@ -0,0 +1,36 @@
+You are an expert in automated diagnostics and runbook creation for AI-driven troubleshooting agents. I will provide you with one or more issue descriptions or test scenarios.
+
+Your task is to generate a strictly executable runbook for AI Agent to follow. The runbook should be machine-readable but human-understandable, and must include:
+
+## Goal
+- **Primary Objective:** Clearly define the specific category of issues this runbook addresses (e.g., "diagnose network connectivity problems", "troubleshoot pod startup failures", "investigate performance degradation").
+- **Scope:** Specify the environment, technology stack, or system components covered by this runbook.
+- **Agent Mandate:** Explicitly state that the AI agent must follow the workflow steps sequentially and systematically without deviation to ensure consistent, thorough troubleshooting.
+- **Expected Outcome:** Define what successful completion of this runbook should achieve (root cause identification, issue resolution, or escalation criteria).
+
+## Workflow for [Issue Category] Diagnosis
+- Provide numbered, sequential steps the AI agent must execute in order.
+- Each step should specify:
+  - **Action:** Describe the diagnostic function conceptually (e.g., "retrieve container logs from specified pod", "check service connectivity between components", "examine resource utilization metrics")
+  - **Function Description:** Explain what the function should accomplish rather than naming specific tools (e.g., "query the cluster to list all pods in a namespace and their current status" instead of "kubectl_get_pods()")
+  - **Parameters:** What data/arguments to pass to the function (namespace, pod name, time range, etc.)
+  - **Expected Output:** What information to gather from the result (status codes, error messages, metrics, configurations)
+  - **Success/Failure Criteria:** How to interpret the output and what indicates normal vs. problematic conditions
+- Use conditional logic (IF/ELSE) when branching is required based on findings.
+- Describe functions generically so they can be mapped to available tools (e.g., "execute a command to test network connectivity" rather than "ping_host()")
+- Include verification steps to confirm each diagnostic action was successful.
+
+## Synthesize Findings
+- **Data Correlation:** Describe how the AI agent should combine outputs from multiple workflow steps.
+- **Pattern Recognition:** Specify what patterns, error messages, or metrics indicate specific root causes.
+- **Prioritization Logic:** Provide criteria for ranking potential causes by likelihood or severity.
+- **Evidence Requirements:** Define what evidence is needed to confidently identify each potential root cause.
+- **Example Scenarios:** Include sample synthesis statements showing how findings should be summarized.
+
+## Recommended Remediation Steps
+- **Immediate Actions:** List temporary workarounds or urgent fixes for critical issues.
+- **Permanent Solutions:** Provide step-by-step permanent remediation procedures.
+- **Verification Steps:** Define how to confirm each remediation action was successful.
+- **Documentation References:** Include links to official documentation, best practices, or vendor guidance.
+- **Escalation Criteria:** Specify when and how to escalate if remediation steps fail.
+- **Post-Remediation Monitoring:** Describe what to monitor to prevent recurrence.
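
Note on the prompt added above: the four required sections (Goal, Workflow, Synthesize Findings, Recommended Remediation Steps) can be checked mechanically in a generated runbook. The following is a minimal sketch, not part of this patch. The function name `missing_sections` and the heading-matching rules (level-two `##` headings, optional `1.`-style numbering, prefix match on the title) are assumptions made for illustration.

```python
import re

# Section titles are taken from the prompt in this patch; the matching rules are assumed.
REQUIRED_SECTIONS = (
    "Goal",
    "Workflow",
    "Synthesize Findings",
    "Recommended Remediation Steps",
)

# Matches a level-two heading, skipping an optional leading number such as "1. ".
HEADING = re.compile(r"^##\s+(?:\d+\.\s*)?(.+?)\s*$")


def missing_sections(runbook_text: str) -> list[str]:
    """Return the required section titles that have no matching `##` heading."""
    headings = []
    for line in runbook_text.splitlines():
        match = HEADING.match(line)
        if match:
            headings.append(match.group(1).lower())
    return [
        title
        for title in REQUIRED_SECTIONS
        if not any(h.startswith(title.lower()) for h in headings)
    ]
```

Calling `missing_sections(text)` on a generated runbook returns an empty list when all four sections are present. A prefix match is used so that a heading such as "Workflow for DNS Diagnosis" still counts as the Workflow section.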

From cc59300de05f3abd2a5e6d79c5d77f3335d4dcaa Mon Sep 17 00:00:00 2001
From: Qingchuan Hao <qingchuan.hao@microsoft.com>
Date: Wed, 13 Aug 2025 05:56:22 +0000
Subject: [PATCH 2/3] fix AI comment

---
 holmes/plugins/runbooks/runbook-format.prompt.md | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/holmes/plugins/runbooks/runbook-format.prompt.md b/holmes/plugins/runbooks/runbook-format.prompt.md
index 861ac3e08..95892eb82 100644
--- a/holmes/plugins/runbooks/runbook-format.prompt.md
+++ b/holmes/plugins/runbooks/runbook-format.prompt.md
@@ -1,4 +1,4 @@
-You are an expert in automated diagnostics and runbook creation for AI-driven troubleshooting agents. I will provide you with one or more issue descriptions or test scenarios.
+You are an expert in automated diagnostics and runbook creation for an AI-driven troubleshooting agents. I will provide you with one or more issue descriptions or test scenarios.
 
 Your task is to generate a strictly executable runbook for AI Agent to follow. The runbook should be machine-readable but human-understandable, and must include:
 

From 45185a11182df98923110eab6cc985f24edc167f Mon Sep 17 00:00:00 2001
From: Qingchuan Hao <qingchuan.hao@microsoft.com>
Date: Wed, 13 Aug 2025 09:41:26 +0000
Subject: [PATCH 3/3] explain the runbook file category and use claude.md

---
 .../{runbook-format.prompt.md => CLAUDE.md}   | 59 +++++++++++++++++--
 1 file changed, 54 insertions(+), 5 deletions(-)
 rename holmes/plugins/runbooks/{runbook-format.prompt.md => CLAUDE.md} (58%)

diff --git a/holmes/plugins/runbooks/runbook-format.prompt.md b/holmes/plugins/runbooks/CLAUDE.md
similarity index 58%
rename from holmes/plugins/runbooks/runbook-format.prompt.md
rename to holmes/plugins/runbooks/CLAUDE.md
index 95892eb82..842b5f29b 100644
--- a/holmes/plugins/runbooks/runbook-format.prompt.md
+++ b/holmes/plugins/runbooks/CLAUDE.md
@@ -1,14 +1,16 @@
 You are an expert in automated diagnostics and runbook creation for an AI-driven troubleshooting agents. I will provide you with one or more issue descriptions or test scenarios.
 
-Your task is to generate a strictly executable runbook for AI Agent to follow. The runbook should be machine-readable but human-understandable, and must include:
+Your task is to generate a strictly executable runbook for an AI agent to follow. The runbook should be machine-readable but human-understandable, and must include the following sections:
 
-## Goal
+# Runbook Content Structure
+
+## 1. Goal
 - **Primary Objective:** Clearly define the specific category of issues this runbook addresses (e.g., "diagnose network connectivity problems", "troubleshoot pod startup failures", "investigate performance degradation").
 - **Scope:** Specify the environment, technology stack, or system components covered by this runbook.
 - **Agent Mandate:** Explicitly state that the AI agent must follow the workflow steps sequentially and systematically without deviation to ensure consistent, thorough troubleshooting.
 - **Expected Outcome:** Define what successful completion of this runbook should achieve (root cause identification, issue resolution, or escalation criteria).
 
-## Workflow for [Issue Category] Diagnosis
+## 2. Workflow for [Issue Category] Diagnosis
 - Provide numbered, sequential steps the AI agent must execute in order.
 - Each step should specify:
   - **Action:** Describe the diagnostic function conceptually (e.g., "retrieve container logs from specified pod", "check service connectivity between components", "examine resource utilization metrics")
@@ -20,17 +22,64 @@ Your task is to generate a strictly executable runbook for AI Agent to follow. T
 - Describe functions generically so they can be mapped to available tools (e.g., "execute a command to test network connectivity" rather than "ping_host()")
 - Include verification steps to confirm each diagnostic action was successful.
 
-## Synthesize Findings
+## 3. Synthesize Findings
 - **Data Correlation:** Describe how the AI agent should combine outputs from multiple workflow steps.
 - **Pattern Recognition:** Specify what patterns, error messages, or metrics indicate specific root causes.
 - **Prioritization Logic:** Provide criteria for ranking potential causes by likelihood or severity.
 - **Evidence Requirements:** Define what evidence is needed to confidently identify each potential root cause.
 - **Example Scenarios:** Include sample synthesis statements showing how findings should be summarized.
 
-## Recommended Remediation Steps
+## 4. Recommended Remediation Steps
 - **Immediate Actions:** List temporary workarounds or urgent fixes for critical issues.
 - **Permanent Solutions:** Provide step-by-step permanent remediation procedures.
 - **Verification Steps:** Define how to confirm each remediation action was successful.
 - **Documentation References:** Include links to official documentation, best practices, or vendor guidance.
 - **Escalation Criteria:** Specify when and how to escalate if remediation steps fail.
 - **Post-Remediation Monitoring:** Describe what to monitor to prevent recurrence.
+
+# File Organization Guidelines
+
+## Folder Structure
+*Category folders are used to distinguish and categorize different runbooks based on their focus area or technology domain. Each runbook must be placed into a specific category folder under `holmes/plugins/runbooks/` for better organization and discoverability. Create a new category folder if your runbook doesn't fit into existing categories.*
+
+## File Naming
+*Use consistent naming conventions for runbook files:*
+
+- Use descriptive, lowercase names with hyphens: `dns-resolution-troubleshooting.md`
+- Include the issue type or technology: `redis-connection-issues.md`
+- Avoid generic names like `troubleshooting.md` or `debug.md`
+
+## Catalog Registration
+After creating your runbook, you must add an entry to `catalog.json` in the runbooks directory to make it discoverable by AI agents.
+
+**Steps to add a new catalog entry:**
+
+1. **Open** `holmes/plugins/runbooks/catalog.json`
+2. **Add your entry** to the JSON array following this structure:
+   ```json
+   {
+     "name": "Brief, descriptive name of the runbook",
+     "path": "category-folder/your-runbook-filename.md",
+     "description": "Clear description of what issues this runbook addresses",
+     "tags": ["relevant", "tags", "for", "search"]
+   }
+   ```
+
+3. **Ensure proper JSON formatting** - add a comma after the previous entry if needed
+4. **Validate the JSON** to confirm it is well-formed before committing
+
+**Field Guidelines:**
+- `name`: Keep concise but descriptive (e.g., "Redis Connection Issues")
+- `path`: Always include the category folder (e.g., "database/redis-connection-issues.md")
+- `description`: Explain what specific problems this runbook solves
+- `tags`: Include technology names, issue types, and relevant keywords
+
+Example catalog entry:
+```json
+{
+  "name": "DNS Resolution Troubleshooting",
+  "path": "networking/dns-resolution-troubleshooting.md",
+  "description": "Comprehensive guide for diagnosing and resolving DNS resolution issues in Kubernetes clusters",
+  "tags": ["dns", "networking", "kubernetes", "troubleshooting"]
+}
+```
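
Note on the catalog and naming guidelines above: they can be approximated with a small check run before committing. This is a hedged sketch, not part of this patch. The helper `check_entry`, the path pattern (one lowercase, hyphenated category folder followed by a lowercase, hyphenated `.md` filename), and the list of generic names are assumptions drawn from the guidelines. The real `catalog.json` schema is not shown here and may differ.

```python
import re

# Field names follow the catalog entry structure shown in this patch.
REQUIRED_FIELDS = ("name", "path", "description", "tags")

# Assumed reading of the naming rules: "category-folder/lowercase-hyphenated-name.md".
PATH_PATTERN = re.compile(r"^[a-z0-9]+(?:-[a-z0-9]+)*/[a-z0-9]+(?:-[a-z0-9]+)*\.md$")

# Generic filenames the guidelines say to avoid.
GENERIC_NAMES = {"troubleshooting.md", "debug.md"}


def check_entry(entry: dict) -> list[str]:
    """Return a list of problems found in one catalog entry (empty if none)."""
    problems = [f"missing field: {field}" for field in REQUIRED_FIELDS if field not in entry]

    if "path" in entry:
        path = entry["path"]
        if not isinstance(path, str) or not PATH_PATTERN.match(path):
            problems.append("path should look like category-folder/lowercase-name.md")
        elif path.split("/")[-1] in GENERIC_NAMES:
            problems.append("filename is too generic")

    if "tags" in entry:
        tags = entry["tags"]
        if not (isinstance(tags, list) and all(isinstance(t, str) for t in tags)):
            problems.append("tags should be a list of strings")

    return problems
```

Passing the DNS example entry from the guidelines to `check_entry` returns an empty list. An entry whose path lacks a category folder, or whose filename is `debug.md`, is reported with a short message.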