Skip to content

Add azuremonitormetrics toolset & azuremonitorlogs toolset (both toolsets for CLI only) #782

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
wants to merge 26 commits into
base: master
Choose a base branch
from

Conversation

vishiy
Copy link

@vishiy vishiy commented Aug 4, 2025

  • Add azure monitor metrics toolset for CLI
  • Add azuremonitor logs toolset for CLI
    - Note : I have disabled azuremonitorlogs toolset for now, as it requires AzureMCP (which has removed SSE based activations from last release) where as HOlmes support only SSE . Once Holmes supports stdout, i will enable azuremonitor logs toolset in CLI by default.
  • Also added azuremonitoraerts for investigate flows to investigate metric based alerts in azure monitor.
  • All the tools use defaultcreds that requires login from az cli
  • Added unit tests for azure monitor metrics & alerts (will add azuremonitorlogs tests in subsequent PRs when azuremonitorlogs get enabled)

vishiy and others added 20 commits July 16, 2025 13:37
* add azuremonitor metrics toolset v1.0.0

* working queries

* limit satapoints to avoid excessive tokens that causes OAI throttling

* enforce cluster filter and show actual queries run..
* add azuremonitor metrics toolset v1.0.0

* working queries

* limit satapoints to avoid excessive tokens that causes OAI throttling

* enforce cluster filter and show actual queries run..

* add azure monitor metrics alert investigation
…oks) (#3)

* add azuremonitor metrics toolset v1.0.0

* working queries

* limit satapoints to avoid excessive tokens that causes OAI throttling

* enforce cluster filter and show actual queries run..

* add azure monitor metrics alert investigation

* Pushing runbook changes

Added three runbooks on DNS, nodes, and pods + catalog changes

---------

Co-authored-by: Auston Li <[email protected]>
* add azure monitor logs toolset

* fix azure montitor metrics toolset for detection issues.
@CLAassistant
Copy link

CLAassistant commented Aug 4, 2025

CLA assistant check
All committers have signed the CLA.

Copy link
Contributor

coderabbitai bot commented Aug 4, 2025

Walkthrough

This update introduces comprehensive Azure Monitor integration for AKS clusters, adding new toolsets for Azure Monitor Metrics and Logs, complete with configuration, installation, and troubleshooting documentation. It implements new plugins for fetching and investigating Azure Monitor Prometheus metric alerts, extends the CLI with alert investigation commands, and provides diagnostic runbooks for systematic alert analysis. Extensive unit tests and documentation support these additions, alongside dependency updates and configuration examples.

Changes

Cohort / File(s) Change Summary
Azure Monitor Metrics Toolset Implementation
holmes/plugins/toolsets/azuremonitor_metrics/azuremonitor_metrics_toolset.py, holmes/plugins/toolsets/azuremonitor_metrics/utils.py, holmes/plugins/toolsets/azuremonitor_metrics/azuremonitor_metrics_instructions.jinja2, holmes/plugins/toolsets/azuremonitor_metrics/install.md, holmes/plugins/toolsets/azuremonitor_metrics/__init__.py
Implements Azure Monitor Metrics toolset for querying managed Prometheus metrics on AKS clusters, including configuration, installation guide, utility functions for cluster/resource detection, LLM instructions, and module docstring.
Azure Monitor Logs Toolset Implementation
holmes/plugins/toolsets/azuremonitorlogs/azuremonitorlogs_toolset.py, holmes/plugins/toolsets/azuremonitorlogs/utils.py, holmes/plugins/toolsets/azuremonitorlogs/azuremonitorlogs_instructions.jinja2, holmes/plugins/toolsets/azuremonitorlogs/install.md, holmes/plugins/toolsets/azuremonitorlogs/__init__.py
Adds Azure Monitor Logs toolset for AKS Container Insights detection and Log Analytics workspace integration, with configuration, installation, utilities for cluster/workspace detection, and instruction documentation.
Azure Monitor Alerts Source Plugin
holmes/plugins/sources/azuremonitoralerts/__init__.py
Implements a new source plugin to fetch and process Azure Monitor Prometheus metric alerts for AKS clusters, converting them into Holmes issues with detailed query and description extraction.
Runbooks for Azure Monitor & Kubernetes Troubleshooting
holmes/plugins/runbooks/azuremonitor-alerts.yaml, holmes/plugins/runbooks/azure_monitor_logs_cost_optimization.md, holmes/plugins/runbooks/catalog.json, holmes/plugins/runbooks/networking/dns_troubleshooting_prometheus_instructions.md, holmes/plugins/runbooks/node_troubleshooting_prometheus_instructions.md, holmes/plugins/runbooks/pod_troubleshooting_prometheus_instructions.md, examples/azuremonitor_runbooks.yaml
Adds YAML and Markdown runbooks for Azure Monitor alert diagnostics, cost optimization, and Kubernetes troubleshooting (DNS, node, pod issues), and updates the runbooks catalog.
Documentation and Guides
AZURE_MONITOR_SETUP_GUIDE.md, docs/toolsets/azuremonitor-metrics.md, docs/toolsets/azuremonitor-logs.md, README.md, config.example.yaml
Introduces setup, usage, and troubleshooting guides for Azure Monitor toolsets, updates README with new integrations, and extends configuration examples.
Toolset and Core Integration
holmes/plugins/toolsets/__init__.py, holmes/core/toolset_manager.py
Integrates new toolsets with conditional imports, availability checks, and special enable/disable logic for default toolsets.
CLI and Investigation Enhancements
holmes/main.py, holmes/core/tool_calling_llm.py, holmes/core/runbooks.py
Adds CLI command for Azure Monitor Prometheus alert investigation, improves runbook display in investigations, and introduces a method to retrieve matched runbooks with metadata.
Dependency Updates
pyproject.toml
Adds azure-mgmt-resourcegraph dependency and updates azure-mgmt-resource version.
Azure Monitor Metrics Toolset Tests
tests/plugins/toolsets/azuremonitor_metrics/test_azuremonitor_metrics_tools.py, tests/plugins/toolsets/azuremonitor_metrics/README.md
Adds comprehensive unit tests and documentation for the Azure Monitor Metrics toolset, covering all tools, success/failure cases, and mocking strategies.

Sequence Diagram(s)

sequenceDiagram
    participant User
    participant CLI (holmes/main.py)
    participant AzureMonitorAlertsSource
    participant Azure API
    participant Investigator
    participant RunbookManager

    User->>CLI (holmes/main.py): Run `azuremonitormetrics --alertid=...`
    CLI (holmes/main.py)->>AzureMonitorAlertsSource: fetch_issue(alertid)
    AzureMonitorAlertsSource->>Azure API: Query Alert Management API
    Azure API-->>AzureMonitorAlertsSource: Return alert details
    AzureMonitorAlertsSource-->>CLI (holmes/main.py): Return Issue object
    CLI (holmes/main.py)->>Investigator: investigate(issue)
    Investigator->>RunbookManager: get_matched_runbooks_for_issue(issue)
    RunbookManager-->>Investigator: Return matched runbooks
    Investigator-->>CLI (holmes/main.py): Investigation result
    CLI (holmes/main.py)-->>User: Display analysis and runbook info
Loading
sequenceDiagram
    participant HolmesGPT
    participant AzureMonitorMetricsToolset
    participant Azure API
    participant AKS Cluster

    HolmesGPT->>AzureMonitorMetricsToolset: Execute PromQL query
    AzureMonitorMetricsToolset->>Azure API: Authenticate (Azure CLI/Managed Identity)
    AzureMonitorMetricsToolset->>AKS Cluster: Detect cluster resource ID
    AzureMonitorMetricsToolset->>Azure API: Query Prometheus endpoint
    Azure API-->>AzureMonitorMetricsToolset: Return metrics/alerts
    AzureMonitorMetricsToolset-->>HolmesGPT: Return results
Loading

Estimated code review effort

🎯 5 (Critical) | ⏱️ ~70 minutes

Suggested reviewers

  • arikalon1

Note

⚡️ Unit Test Generation is now available in beta!

Learn more here, or try it out under "Finishing Touches" below.

✨ Finishing Touches
  • 📝 Generate Docstrings
🧪 Generate unit tests
  • Create PR with unit tests
  • Post copyable unit tests in a comment

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share
🪧 Tips

Chat

There are 3 ways to chat with CodeRabbit:

  • Review comments: Directly reply to a review comment made by CodeRabbit. Example:
    • I pushed a fix in commit <commit_id>, please review it.
    • Explain this complex logic.
    • Open a follow-up GitHub issue for this discussion.
  • Files and specific lines of code (under the "Files changed" tab): Tag @coderabbitai in a new review comment at the desired location with your query. Examples:
    • @coderabbitai explain this code block.
  • PR comments: Tag @coderabbitai in a new PR comment to ask questions about the PR branch. For the best results, please provide a very specific query, as very limited context is provided in this mode. Examples:
    • @coderabbitai gather interesting stats about this repository and render them as a table. Additionally, render a pie chart showing the language distribution in the codebase.
    • @coderabbitai read src/utils.ts and explain its main purpose.
    • @coderabbitai read the files in the src/scheduler package and generate a class diagram using mermaid and a README in the markdown format.

Support

Need help? Create a ticket on our support page for assistance with any issues or questions.

CodeRabbit Commands (Invoked using PR comments)

  • @coderabbitai pause to pause the reviews on a PR.
  • @coderabbitai resume to resume the paused reviews.
  • @coderabbitai review to trigger an incremental review. This is useful when automatic reviews are disabled for the repository.
  • @coderabbitai full review to do a full review from scratch and review all the files again.
  • @coderabbitai summary to regenerate the summary of the PR.
  • @coderabbitai generate docstrings to generate docstrings for this PR.
  • @coderabbitai generate sequence diagram to generate a sequence diagram of the changes in this PR.
  • @coderabbitai generate unit tests to generate unit tests for this PR.
  • @coderabbitai resolve resolve all the CodeRabbit review comments.
  • @coderabbitai configuration to show the current CodeRabbit configuration for the repository.
  • @coderabbitai help to get help.

Other keywords and placeholders

  • Add @coderabbitai ignore anywhere in the PR description to prevent this PR from being reviewed.
  • Add @coderabbitai summary to generate the high-level summary at a specific location in the PR description.
  • Add @coderabbitai anywhere in the PR title to generate the title automatically.

CodeRabbit Configuration File (.coderabbit.yaml)

  • You can programmatically configure CodeRabbit by adding a .coderabbit.yaml file to the root of your repository.
  • Please see the configuration documentation for more information.
  • If your editor has YAML language server enabled, you can add the path at the top of this file to enable auto-completion and validation: # yaml-language-server: $schema=https://coderabbit.ai/integrations/schema.v2.json

Documentation and Community

  • Visit our Documentation for detailed information on how to use CodeRabbit.
  • Join our Discord Community to get help, request features, and share feedback.
  • Follow us on X/Twitter for updates and announcements.

@vishiy
Copy link
Author

vishiy commented Aug 4, 2025

Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 23

🔭 Outside diff range comments (1)
holmes/plugins/toolsets/azuremonitor_metrics/utils.py (1)

715-716: Move import to the top of the file.

Per the coding guidelines, all Python imports should be at the top of the file, not inside functions or at the bottom.

Move the import os statement from line 715 to the top of the file with other imports.

🧹 Nitpick comments (18)
holmes/plugins/runbooks/node_troubleshooting_prometheus_instructions.md (2)

19-19: Use proper heading syntax instead of emphasis

Replace bold emphasis with proper heading syntax for better document structure and accessibility.

-**Core signals**
+#### Core signals

-**Kube-State Metrics**
+#### Kube-State Metrics

-**Node Exporter Metrics**
+#### Node Exporter Metrics

Also applies to: 37-37, 48-48


109-109: Add newline at end of file

Add a newline character at the end of the file to follow standard file conventions.

 | **Node Upkeep**           | Periodically rotate nodes via rolling upgrade or nodepool reimage |
+
holmes/core/runbooks.py (1)

28-44: Consider refactoring to reduce code duplication.

The new method works correctly but duplicates the filtering logic from get_instructions_for_issue. Consider extracting the common filtering logic into a private helper method to improve maintainability.

Example refactor:

+    def _get_matching_runbooks(self, issue: Issue) -> List[Runbook]:
+        """Private helper to get runbooks matching an issue"""
+        matched_runbooks = []
+        for runbook in self.runbooks:
+            if runbook.match.issue_id and not runbook.match.issue_id.match(issue.id):
+                continue
+            if runbook.match.issue_name and not runbook.match.issue_name.match(issue.name):
+                continue
+            if runbook.match.source and not runbook.match.source.match(issue.source_type):
+                continue
+            matched_runbooks.append(runbook)
+        return matched_runbooks
+
     def get_instructions_for_issue(self, issue: Issue) -> List[str]:
-        instructions = []
-        for runbook in self.runbooks:
-            if runbook.match.issue_id and not runbook.match.issue_id.match(issue.id):
-                continue
-            if runbook.match.issue_name and not runbook.match.issue_name.match(issue.name):
-                continue
-            if runbook.match.source and not runbook.match.source.match(issue.source_type):
-                continue
-            instructions.append(runbook.instructions)
-        return instructions
+        matched_runbooks = self._get_matching_runbooks(issue)
+        return [runbook.instructions for runbook in matched_runbooks]

     def get_matched_runbooks_for_issue(self, issue: Issue) -> List[Runbook]:
         """Get the actual Runbook objects that match an issue (for accessing metadata like file paths)"""
-        matched_runbooks = []
-        for runbook in self.runbooks:
-            if runbook.match.issue_id and not runbook.match.issue_id.match(issue.id):
-                continue
-            if runbook.match.issue_name and not runbook.match.issue_name.match(issue.name):
-                continue
-            if runbook.match.source and not runbook.match.source.match(issue.source_type):
-                continue
-            matched_runbooks.append(runbook)
-        return matched_runbooks
+        return self._get_matching_runbooks(issue)
README.md (1)

44-44: Add missing period after "etc."

In American English, abbreviations like "etc." require a period.

Apply this diff:

-| [<img src="images/integration_logos/http-icon.png" alt="Internet" width="20" style="vertical-align: middle;"> **Internet**](https://docs.robusta.dev/master/configuration/holmesgpt/toolsets/internet.html) | ✅ | Public runbooks, community docs etc |
+| [<img src="images/integration_logos/http-icon.png" alt="Internet" width="20" style="vertical-align: middle;"> **Internet**](https://docs.robusta.dev/master/configuration/holmesgpt/toolsets/internet.html) | ✅ | Public runbooks, community docs etc. |
holmes/plugins/toolsets/azuremonitor_metrics/install.md (1)

70-135: Specify a language for all fenced code blocks (markdownlint MD040).

markdownlint-cli2 fails on the blocks that open on Lines 77, 102, 109, 119 and 129 because they use plain ``` without a language qualifier.
These blocks contain CLI samples or expected console output; declaring a language (e.g. bash, `yaml`, `text`) will silence the linter and improve syntax highlighting.

Example fix for the first offending block:

-```
+```text
azuremonitorlogs    │ True    │ enabled  │ built-in │      │

Repeat for each affected block.

holmes/plugins/runbooks/networking/dns_troubleshooting_prometheus_instructions.md (2)

12-18: Avoid using bold text as a substitute for headings (markdownlint MD036).

The emphasised line “PromQL Queries” (Line 13) triggers MD036.
Either promote it to a real heading or drop the bold styling:

-**PromQL Queries**
+### PromQL Queries

44-48: Remove blank lines inside block-quotes (markdownlint MD028).

Lines 45 and 47 are empty lines that sit inside the block-quotes, tripping MD028.
Delete those blanks to keep the quote intact.

holmes/plugins/toolsets/azuremonitorlogs/install.md (1)

70-82: Add language identifiers to fenced code blocks (markdownlint MD040).

Blocks that start on Lines 73, 101, 109, 118 and 128 are missing a language spec.
Mark them as text (for expected output / errors) or bash as appropriate.

-```
+```text
Error: DefaultAzureCredential failed to retrieve a token

Repeat for each occurrence.

docs/toolsets/azuremonitor-logs.md (1)

140-176: Language spec missing in fenced blocks (markdownlint MD040).

The error / output snippets opened on Lines 145, 151, 160 and 169 lack a language tag.
Use text to satisfy the linter:

-```
+```text
Error: DefaultAzureCredential failed to retrieve a token
examples/azuremonitor_runbooks.yaml (1)

35-43: Trim trailing whitespace flagged by yamllint.

Line 39 has stray spaces at the end:

-         - Use time ranges that cover the alert period plus some buffer (2h, 4h, 6h)␠␠
+         - Use time ranges that cover the alert period plus some buffer (2h, 4h, 6h)

A similar issue exists on Line 187.
Cleaning these prevents CI failures on strict YAML linting.

holmes/plugins/toolsets/azuremonitor_metrics/azuremonitor_metrics_instructions.jinja2 (1)

111-119: Add language identifier to the code block.

The example investigation flow should have a language identifier for better syntax highlighting.

-```
+```text
 1. List alerts → get_azure_monitor_alerts_with_fired_times
 2. See alert query: kube_deployment_status_replicas{cluster="myaks"} != kube_deployment_spec_replicas{cluster="myaks"}
 3. Execute alert query → execute_alert_promql_query (alert_id, time_range="2h")
 4. Analyze deployment status → execute_azuremonitor_prometheus_range_query with deployment queries
 5. Check pod status → query kube_pod_status_phase for related pods
 6. Investigate resources → query CPU/memory metrics for the timeframe

</blockquote></details>
<details>
<summary>holmes/plugins/toolsets/azuremonitor_metrics/utils.py (1)</summary><blockquote>

`62-62`: **Remove unused variable assignment.**

The `namespace` variable is read but never used.

```diff
-        with open("/var/run/secrets/kubernetes.io/serviceaccount/namespace", "r") as f:
-            namespace = f.read().strip()
+        # Verify we're in a Kubernetes pod by checking service account
+        with open("/var/run/secrets/kubernetes.io/serviceaccount/namespace", "r") as f:
+            f.read()  # Just verify the file exists and is readable
AZURE_MONITOR_SETUP_GUIDE.md (1)

7-10: Add language specifier to code block.

Fenced code blocks should have a language specified for proper syntax highlighting.

-```
+```text
 Cannot determine if Azure Monitor metrics is enabled because this environment is not running inside an AKS cluster. Please run this from within your AKS cluster or provide cluster details.

</blockquote></details>
<details>
<summary>tests/plugins/toolsets/azuremonitor_metrics/test_azuremonitor_metrics_tools.py (1)</summary><blockquote>

`683-735`: **Consider adding test for maximum data points enforcement**

The step size calculation tests are comprehensive, but consider adding a test case for when the calculated step size needs to be increased to stay within the `max_data_points` limit (1000 as configured in line 616).


Example test case to add:

```python
@patch('holmes.plugins.toolsets.azuremonitor_metrics.azuremonitor_metrics_toolset.process_timestamps_to_rfc3339')
def test_step_size_max_data_points_enforcement(self, mock_process_timestamps):
    """Test that step size is increased to respect max_data_points limit."""
    # Mock timestamp processing for 30-day range
    mock_process_timestamps.return_value = ("2023-01-01T00:00:00Z", "2023-01-31T00:00:00Z")
    
    # Test step size calculation
    params = {
        "query": "up",
        "description": "Test query",
        "output_type": "Plain"
    }
    step = self.tool._calculate_optimal_step_size(params, "2023-01-01T00:00:00Z", "2023-01-31T00:00:00Z")
    
    # For 30 days with max 1000 data points, minimum step should be ~2592 seconds
    assert step >= 2592
holmes/plugins/runbooks/azure_monitor_logs_cost_optimization.md (2)

18-83: Standardize markdown list style for consistency

The document uses both asterisk (*) and dash (-) for unordered lists. For consistency, use asterisks throughout as that's the predominant style in the document.

Apply this pattern throughout the document:

-* **STEP 1**: Use azuremonitorlogs toolset to detect Log Analytics workspace and cluster resource ID
-  - If workspace/cluster not immediately detectable, use `check_aks_cluster_context` to get current cluster
-  - If resource group needed, use `get_aks_cluster_resource_id` with available cluster names
+* **STEP 1**: Use azuremonitorlogs toolset to detect Log Analytics workspace and cluster resource ID
+  * If workspace/cluster not immediately detectable, use `check_aks_cluster_context` to get current cluster
+  * If resource group needed, use `get_aks_cluster_resource_id` with available cluster names

526-580: Add language specifiers to code blocks

Several code blocks are missing language specifiers, which affects syntax highlighting.

Add language specifiers to these code blocks:

-```
+```text
 BEFORE (ContainerLog v1):
 ├── ContainerLog table ($2.99/GB - Analytics only)
 ├── KubePodInventory table ($2.99/GB - Separate collection)
 └── Limited filtering options
-```
+```

-```
+```text
 Monthly Cost = (ContainerLog Volume × $2.99) + (KubePodInventory Volume × $2.99)
 Example: (10GB × $89.70) + (2GB × $59.80) = $956.40/month
-```
+```

-```
+```text
 Monthly Cost = (ContainerLogV2 Volume × $0.50) + (Eliminated KubePodInventory)
 Example: (10GB × $15.00) + $0 = $15.00/month
 Savings: $941.40/month (98.4% reduction)
-```
+```

</blockquote></details>
<details>
<summary>docs/toolsets/azuremonitor-metrics.md (1)</summary><blockquote>

`234-234`: **Add language specification to fenced code block**

The fenced code block should have a language specified for proper syntax highlighting.

```diff
-```
+```text
holmes/plugins/toolsets/azuremonitor_metrics/azuremonitor_metrics_toolset.py (1)

568-645: Consider reducing verbose timestamp debugging

The timestamp parsing logic contains extensive debug logging that may be too verbose for production use. Consider reducing the logging or making it conditional based on a debug flag.

Consider wrapping the verbose debug logs in a condition or reducing them to key points only:

if logging.getLogger().isEnabledFor(logging.DEBUG):
    # Detailed debug logging here

Or simplify to log only the final result and errors.

📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 4143c15 and 67d788a.

⛔ Files ignored due to path filters (1)
  • poetry.lock is excluded by !**/*.lock
📒 Files selected for processing (31)
  • AZURE_MONITOR_SETUP_GUIDE.md (1 hunks)
  • README.md (2 hunks)
  • config.example.yaml (1 hunks)
  • docs/toolsets/azuremonitor-logs.md (1 hunks)
  • docs/toolsets/azuremonitor-metrics.md (1 hunks)
  • examples/azuremonitor_runbooks.yaml (1 hunks)
  • holmes/core/runbooks.py (1 hunks)
  • holmes/core/tool_calling_llm.py (1 hunks)
  • holmes/core/toolset_manager.py (1 hunks)
  • holmes/main.py (1 hunks)
  • holmes/plugins/runbooks/azure_monitor_logs_cost_optimization.md (1 hunks)
  • holmes/plugins/runbooks/azuremonitor-alerts.yaml (1 hunks)
  • holmes/plugins/runbooks/catalog.json (1 hunks)
  • holmes/plugins/runbooks/networking/dns_troubleshooting_prometheus_instructions.md (1 hunks)
  • holmes/plugins/runbooks/node_troubleshooting_prometheus_instructions.md (1 hunks)
  • holmes/plugins/runbooks/pod_troubleshooting_prometheus_instructions.md (1 hunks)
  • holmes/plugins/sources/azuremonitoralerts/__init__.py (1 hunks)
  • holmes/plugins/toolsets/__init__.py (3 hunks)
  • holmes/plugins/toolsets/azuremonitor_metrics/__init__.py (1 hunks)
  • holmes/plugins/toolsets/azuremonitor_metrics/azuremonitor_metrics_instructions.jinja2 (1 hunks)
  • holmes/plugins/toolsets/azuremonitor_metrics/azuremonitor_metrics_toolset.py (1 hunks)
  • holmes/plugins/toolsets/azuremonitor_metrics/install.md (1 hunks)
  • holmes/plugins/toolsets/azuremonitor_metrics/utils.py (1 hunks)
  • holmes/plugins/toolsets/azuremonitorlogs/__init__.py (1 hunks)
  • holmes/plugins/toolsets/azuremonitorlogs/azuremonitorlogs_instructions.jinja2 (1 hunks)
  • holmes/plugins/toolsets/azuremonitorlogs/azuremonitorlogs_toolset.py (1 hunks)
  • holmes/plugins/toolsets/azuremonitorlogs/install.md (1 hunks)
  • holmes/plugins/toolsets/azuremonitorlogs/utils.py (1 hunks)
  • pyproject.toml (1 hunks)
  • tests/plugins/toolsets/azuremonitor_metrics/README.md (1 hunks)
  • tests/plugins/toolsets/azuremonitor_metrics/test_azuremonitor_metrics_tools.py (1 hunks)
🧰 Additional context used
📓 Path-based instructions (1)
**/*.py

📄 CodeRabbit Inference Engine (CLAUDE.md)

**/*.py: ALWAYS place Python imports at the top of the file, not inside functions or methods
Use Ruff for formatting and linting (configured in pyproject.toml)
Type hints required (mypy configuration in pyproject.toml)

Files:

  • holmes/plugins/toolsets/azuremonitorlogs/__init__.py
  • holmes/plugins/toolsets/azuremonitor_metrics/__init__.py
  • holmes/core/tool_calling_llm.py
  • holmes/core/toolset_manager.py
  • holmes/core/runbooks.py
  • holmes/plugins/toolsets/__init__.py
  • holmes/plugins/toolsets/azuremonitorlogs/utils.py
  • tests/plugins/toolsets/azuremonitor_metrics/test_azuremonitor_metrics_tools.py
  • holmes/main.py
  • holmes/plugins/toolsets/azuremonitor_metrics/azuremonitor_metrics_toolset.py
  • holmes/plugins/toolsets/azuremonitorlogs/azuremonitorlogs_toolset.py
  • holmes/plugins/toolsets/azuremonitor_metrics/utils.py
  • holmes/plugins/sources/azuremonitoralerts/__init__.py
🧠 Learnings (9)
📓 Common learnings
Learnt from: nherment
PR: robusta-dev/holmesgpt#610
File: .github/workflows/llm-evaluation.yaml:39-42
Timestamp: 2025-07-08T08:45:41.069Z
Learning: The robusta-dev/holmesgpt codebase has comprehensive existing validation for Azure environment variables (AZURE_API_BASE, AZURE_API_KEY, AZURE_API_VERSION) and MODEL in tests/llm/utils/classifiers.py, tests/llm/conftest.py, and holmes/core/llm.py. Don't suggest adding redundant validation logic.
Learnt from: nherment
PR: robusta-dev/holmesgpt#408
File: holmes/plugins/toolsets/kubernetes_logs.py:90-97
Timestamp: 2025-05-15T05:13:43.169Z
Learning: In the Kubernetes logs toolset for Holmes, both current and previous logs are intentionally fetched and combined for each pod, even though this requires more API calls. This design ensures all logs are captured even when pods restart but retain their name, providing complete diagnostic information.
📚 Learning: the robusta-dev/holmesgpt codebase has comprehensive existing validation for azure environment varia...
Learnt from: nherment
PR: robusta-dev/holmesgpt#610
File: .github/workflows/llm-evaluation.yaml:39-42
Timestamp: 2025-07-08T08:45:41.069Z
Learning: The robusta-dev/holmesgpt codebase has comprehensive existing validation for Azure environment variables (AZURE_API_BASE, AZURE_API_KEY, AZURE_API_VERSION) and MODEL in tests/llm/utils/classifiers.py, tests/llm/conftest.py, and holmes/core/llm.py. Don't suggest adding redundant validation logic.

Applied to files:

  • holmes/plugins/toolsets/azuremonitorlogs/__init__.py
  • README.md
  • holmes/plugins/toolsets/__init__.py
  • holmes/plugins/toolsets/azuremonitorlogs/utils.py
  • tests/plugins/toolsets/azuremonitor_metrics/test_azuremonitor_metrics_tools.py
  • holmes/plugins/toolsets/azuremonitor_metrics/azuremonitor_metrics_toolset.py
  • holmes/plugins/toolsets/azuremonitorlogs/azuremonitorlogs_toolset.py
📚 Learning: applies to holmes/plugins/toolsets/**/*.yaml : toolsets must be placed in holmes/plugins/toolsets/{n...
Learnt from: CR
PR: robusta-dev/holmesgpt#0
File: CLAUDE.md:0-0
Timestamp: 2025-08-03T07:25:36.018Z
Learning: Applies to holmes/plugins/toolsets/**/*.yaml : Toolsets must be placed in holmes/plugins/toolsets/{name}.yaml or {name}/

Applied to files:

  • holmes/plugins/toolsets/azuremonitorlogs/__init__.py
  • holmes/core/toolset_manager.py
  • holmes/plugins/toolsets/azuremonitorlogs/install.md
  • holmes/plugins/toolsets/azuremonitor_metrics/install.md
  • holmes/plugins/toolsets/azuremonitorlogs/azuremonitorlogs_instructions.jinja2
  • holmes/plugins/toolsets/azuremonitor_metrics/azuremonitor_metrics_instructions.jinja2
  • holmes/plugins/toolsets/__init__.py
  • holmes/plugins/toolsets/azuremonitor_metrics/azuremonitor_metrics_toolset.py
  • holmes/plugins/toolsets/azuremonitorlogs/azuremonitorlogs_toolset.py
📚 Learning: in the kubernetes logs toolset for holmes, both current and previous logs are intentionally fetched ...
Learnt from: nherment
PR: robusta-dev/holmesgpt#408
File: holmes/plugins/toolsets/kubernetes_logs.py:90-97
Timestamp: 2025-05-15T05:13:43.169Z
Learning: In the Kubernetes logs toolset for Holmes, both current and previous logs are intentionally fetched and combined for each pod, even though this requires more API calls. This design ensures all logs are captured even when pods restart but retain their name, providing complete diagnostic information.

Applied to files:

  • holmes/plugins/toolsets/azuremonitorlogs/__init__.py
  • holmes/plugins/runbooks/pod_troubleshooting_prometheus_instructions.md
  • README.md
  • holmes/plugins/toolsets/azuremonitorlogs/install.md
  • docs/toolsets/azuremonitor-logs.md
  • holmes/plugins/toolsets/azuremonitorlogs/azuremonitorlogs_instructions.jinja2
  • AZURE_MONITOR_SETUP_GUIDE.md
  • holmes/plugins/runbooks/azure_monitor_logs_cost_optimization.md
  • holmes/plugins/toolsets/azuremonitorlogs/azuremonitorlogs_toolset.py
📚 Learning: the holmesgpt project uses python >= 3.10 and prefers modern type hint syntax like `list[str]`, `dic...
Learnt from: nherment
PR: robusta-dev/holmesgpt#436
File: tests/llm/utils/mock_utils.py:240-249
Timestamp: 2025-06-05T12:23:27.634Z
Learning: The holmesgpt project uses Python >= 3.10 and prefers modern type hint syntax like `list[str]`, `dict[str, int]` over importing equivalent types from the typing module like `List[str]`, `Dict[str, int]`.

Applied to files:

  • holmes/plugins/toolsets/azuremonitorlogs/__init__.py
📚 Learning: the init_config method in toolsets should be idempotent - safely callable multiple times without err...
Learnt from: nherment
PR: robusta-dev/holmesgpt#535
File: holmes/plugins/toolsets/bash/bash_toolset.py:207-209
Timestamp: 2025-06-24T05:51:04.543Z
Learning: The init_config method in toolsets should be idempotent - safely callable multiple times without errors. self.config should maintain consistent typing (not alternate between dict and config object types) throughout the object lifecycle.

Applied to files:

  • holmes/core/toolset_manager.py
  • holmes/plugins/toolsets/azuremonitor_metrics/azuremonitor_metrics_toolset.py
  • holmes/plugins/toolsets/azuremonitorlogs/azuremonitorlogs_toolset.py
📚 Learning: all new features require unit tests...
Learnt from: CR
PR: robusta-dev/holmesgpt#0
File: CLAUDE.md:0-0
Timestamp: 2025-08-03T07:25:36.018Z
Learning: All new features require unit tests

Applied to files:

  • tests/plugins/toolsets/azuremonitor_metrics/README.md
  • tests/plugins/toolsets/azuremonitor_metrics/test_azuremonitor_metrics_tools.py
📚 Learning: new toolsets require integration tests with mocks...
Learnt from: CR
PR: robusta-dev/holmesgpt#0
File: CLAUDE.md:0-0
Timestamp: 2025-08-03T07:25:36.018Z
Learning: New toolsets require integration tests with mocks

Applied to files:

  • tests/plugins/toolsets/azuremonitor_metrics/README.md
  • tests/plugins/toolsets/azuremonitor_metrics/test_azuremonitor_metrics_tools.py
📚 Learning: tests must match source structure under tests/...
Learnt from: CR
PR: robusta-dev/holmesgpt#0
File: CLAUDE.md:0-0
Timestamp: 2025-08-03T07:25:36.018Z
Learning: Tests must match source structure under tests/

Applied to files:

  • tests/plugins/toolsets/azuremonitor_metrics/README.md
🪛 LanguageTool
README.md

[style] ~44-~44: In American English, abbreviations like “etc.” require a period.
Context: ...) | ✅ | Public runbooks, community docs etc | | [<img src="images/integration_logos...

(ETC_PERIOD)

holmes/plugins/runbooks/azure_monitor_logs_cost_optimization.md

[uncategorized] ~832-~832: If this is a compound adjective that modifies the following noun, use a hyphen.
Context: ...onitor Metrics is NOT ENABLED:** - 🚨 HIGH PRIORITY RECOMMENDATION: Enable Azure Monitor ...

(EN_COMPOUND_ADJECTIVE_INTERNAL)

🪛 markdownlint-cli2 (0.17.2)
holmes/plugins/toolsets/azuremonitor_metrics/install.md

77-77: Fenced code blocks should have a language specified

(MD040, fenced-code-language)


102-102: Fenced code blocks should have a language specified

(MD040, fenced-code-language)


109-109: Fenced code blocks should have a language specified

(MD040, fenced-code-language)


119-119: Fenced code blocks should have a language specified

(MD040, fenced-code-language)


129-129: Fenced code blocks should have a language specified

(MD040, fenced-code-language)

holmes/plugins/runbooks/node_troubleshooting_prometheus_instructions.md

19-19: Emphasis used instead of a heading

(MD036, no-emphasis-as-heading)


37-37: Emphasis used instead of a heading

(MD036, no-emphasis-as-heading)


48-48: Emphasis used instead of a heading

(MD036, no-emphasis-as-heading)


80-80: Blank line inside blockquote

(MD028, no-blanks-blockquote)


82-82: Blank line inside blockquote

(MD028, no-blanks-blockquote)

holmes/plugins/runbooks/networking/dns_troubleshooting_prometheus_instructions.md

13-13: Emphasis used instead of a heading

(MD036, no-emphasis-as-heading)


45-45: Blank line inside blockquote

(MD028, no-blanks-blockquote)


47-47: Blank line inside blockquote

(MD028, no-blanks-blockquote)

docs/toolsets/azuremonitor-logs.md

145-145: Fenced code blocks should have a language specified

(MD040, fenced-code-language)


151-151: Fenced code blocks should have a language specified

(MD040, fenced-code-language)


160-160: Fenced code blocks should have a language specified

(MD040, fenced-code-language)


169-169: Fenced code blocks should have a language specified

(MD040, fenced-code-language)

AZURE_MONITOR_SETUP_GUIDE.md

7-7: Fenced code blocks should have a language specified

(MD040, fenced-code-language)

docs/toolsets/azuremonitor-metrics.md

234-234: Fenced code blocks should have a language specified

(MD040, fenced-code-language)

holmes/plugins/runbooks/azure_monitor_logs_cost_optimization.md

18-18: Unordered list style
Expected: asterisk; Actual: dash

(MD004, ul-style)


19-19: Unordered list style
Expected: asterisk; Actual: dash

(MD004, ul-style)


21-21: Unordered list style
Expected: asterisk; Actual: dash

(MD004, ul-style)


22-22: Unordered list style
Expected: asterisk; Actual: dash

(MD004, ul-style)


65-65: Unordered list style
Expected: asterisk; Actual: dash

(MD004, ul-style)


66-66: Unordered list style
Expected: asterisk; Actual: dash

(MD004, ul-style)


67-67: Unordered list style
Expected: asterisk; Actual: dash

(MD004, ul-style)


68-68: Unordered list style
Expected: asterisk; Actual: dash

(MD004, ul-style)


69-69: Unordered list style
Expected: asterisk; Actual: dash

(MD004, ul-style)


80-80: Unordered list style
Expected: asterisk; Actual: dash

(MD004, ul-style)


81-81: Unordered list style
Expected: asterisk; Actual: dash

(MD004, ul-style)


82-82: Unordered list style
Expected: asterisk; Actual: dash

(MD004, ul-style)


83-83: Unordered list style
Expected: asterisk; Actual: dash

(MD004, ul-style)


514-514: Unordered list style
Expected: asterisk; Actual: dash

(MD004, ul-style)


515-515: Unordered list style
Expected: asterisk; Actual: dash

(MD004, ul-style)


516-516: Unordered list style
Expected: asterisk; Actual: dash

(MD004, ul-style)


517-517: Unordered list style
Expected: asterisk; Actual: dash

(MD004, ul-style)


520-520: Unordered list style
Expected: asterisk; Actual: dash

(MD004, ul-style)


521-521: Unordered list style
Expected: asterisk; Actual: dash

(MD004, ul-style)


522-522: Unordered list style
Expected: asterisk; Actual: dash

(MD004, ul-style)


523-523: Unordered list style
Expected: asterisk; Actual: dash

(MD004, ul-style)


526-526: Fenced code blocks should have a language specified

(MD040, fenced-code-language)


574-574: Fenced code blocks should have a language specified

(MD040, fenced-code-language)


580-580: Fenced code blocks should have a language specified

(MD040, fenced-code-language)


587-587: Unordered list style
Expected: asterisk; Actual: dash

(MD004, ul-style)


588-588: Unordered list style
Expected: asterisk; Actual: dash

(MD004, ul-style)


589-589: Unordered list style
Expected: asterisk; Actual: dash

(MD004, ul-style)


590-590: Unordered list style
Expected: asterisk; Actual: dash

(MD004, ul-style)


591-591: Unordered list style
Expected: asterisk; Actual: dash

(MD004, ul-style)


611-611: Unordered list style
Expected: asterisk; Actual: dash

(MD004, ul-style)


612-612: Unordered list style
Expected: asterisk; Actual: dash

(MD004, ul-style)


613-613: Unordered list style
Expected: asterisk; Actual: dash

(MD004, ul-style)


614-614: Unordered list style
Expected: asterisk; Actual: dash

(MD004, ul-style)


615-615: Unordered list style
Expected: asterisk; Actual: dash

(MD004, ul-style)


647-647: Unordered list style
Expected: asterisk; Actual: dash

(MD004, ul-style)


648-648: Unordered list style
Expected: asterisk; Actual: dash

(MD004, ul-style)


649-649: Unordered list style
Expected: asterisk; Actual: dash

(MD004, ul-style)


650-650: Unordered list style
Expected: asterisk; Actual: dash

(MD004, ul-style)


651-651: Unordered list style
Expected: asterisk; Actual: dash

(MD004, ul-style)


654-654: Unordered list style
Expected: asterisk; Actual: dash

(MD004, ul-style)


655-655: Unordered list style
Expected: asterisk; Actual: dash

(MD004, ul-style)


656-656: Unordered list style
Expected: asterisk; Actual: dash

(MD004, ul-style)


657-657: Unordered list style
Expected: asterisk; Actual: dash

(MD004, ul-style)


660-660: Unordered list style
Expected: asterisk; Actual: dash

(MD004, ul-style)


661-661: Unordered list style
Expected: asterisk; Actual: dash

(MD004, ul-style)


662-662: Unordered list style
Expected: asterisk; Actual: dash

(MD004, ul-style)


663-663: Unordered list style
Expected: asterisk; Actual: dash

(MD004, ul-style)


672-672: Unordered list style
Expected: asterisk; Actual: dash

(MD004, ul-style)


673-673: Unordered list style
Expected: asterisk; Actual: dash

(MD004, ul-style)


674-674: Unordered list style
Expected: asterisk; Actual: dash

(MD004, ul-style)


675-675: Unordered list style
Expected: asterisk; Actual: dash

(MD004, ul-style)


678-678: Unordered list style
Expected: asterisk; Actual: dash

(MD004, ul-style)


679-679: Unordered list style
Expected: asterisk; Actual: dash

(MD004, ul-style)


680-680: Unordered list style
Expected: asterisk; Actual: dash

(MD004, ul-style)


681-681: Unordered list style
Expected: asterisk; Actual: dash

(MD004, ul-style)


689-689: Unordered list style
Expected: asterisk; Actual: dash

(MD004, ul-style)


690-690: Unordered list style
Expected: asterisk; Actual: dash

(MD004, ul-style)


691-691: Unordered list style
Expected: asterisk; Actual: dash

(MD004, ul-style)


692-692: Unordered list style
Expected: asterisk; Actual: dash

(MD004, ul-style)


695-695: Unordered list style
Expected: asterisk; Actual: dash

(MD004, ul-style)


696-696: Unordered list style
Expected: asterisk; Actual: dash

(MD004, ul-style)


697-697: Unordered list style
Expected: asterisk; Actual: dash

(MD004, ul-style)


698-698: Unordered list style
Expected: asterisk; Actual: dash

(MD004, ul-style)


703-703: Unordered list style
Expected: asterisk; Actual: dash

(MD004, ul-style)


704-704: Unordered list style
Expected: asterisk; Actual: dash

(MD004, ul-style)


705-705: Unordered list style
Expected: asterisk; Actual: dash

(MD004, ul-style)


706-706: Unordered list style
Expected: asterisk; Actual: dash

(MD004, ul-style)


790-790: Fenced code blocks should have a language specified

(MD040, fenced-code-language)


795-795: Fenced code blocks should have a language specified

(MD040, fenced-code-language)


802-802: Fenced code blocks should have a language specified

(MD040, fenced-code-language)


826-826: Unordered list style
Expected: asterisk; Actual: dash

(MD004, ul-style)


827-827: Unordered list style
Expected: asterisk; Actual: dash

(MD004, ul-style)


828-828: Unordered list style
Expected: asterisk; Actual: dash

(MD004, ul-style)


829-829: Unordered list style
Expected: asterisk; Actual: dash

(MD004, ul-style)


832-832: Unordered list style
Expected: asterisk; Actual: dash

(MD004, ul-style)


833-833: Unordered list style
Expected: asterisk; Actual: dash

(MD004, ul-style)


834-834: Unordered list style
Expected: asterisk; Actual: dash

(MD004, ul-style)


835-835: Unordered list style
Expected: asterisk; Actual: dash

(MD004, ul-style)


836-836: Unordered list style
Expected: asterisk; Actual: dash

(MD004, ul-style)


848-848: Unordered list style
Expected: asterisk; Actual: dash

(MD004, ul-style)


849-849: Unordered list style
Expected: asterisk; Actual: dash

(MD004, ul-style)


850-850: Unordered list style
Expected: asterisk; Actual: dash

(MD004, ul-style)


851-851: Unordered list style
Expected: asterisk; Actual: dash

(MD004, ul-style)


921-921: Fenced code blocks should have a language specified

(MD040, fenced-code-language)


1014-1014: Emphasis used instead of a heading

(MD036, no-emphasis-as-heading)


1015-1015: Unordered list style
Expected: asterisk; Actual: dash

(MD004, ul-style)


1016-1016: Unordered list style
Expected: asterisk; Actual: dash

(MD004, ul-style)


1017-1017: Unordered list style
Expected: asterisk; Actual: dash

(MD004, ul-style)


1018-1018: Unordered list style
Expected: asterisk; Actual: dash

(MD004, ul-style)


1019-1019: Unordered list style
Expected: asterisk; Actual: dash

(MD004, ul-style)


1020-1020: Unordered list style
Expected: asterisk; Actual: dash

(MD004, ul-style)


1023-1023: Unordered list style
Expected: asterisk; Actual: dash

(MD004, ul-style)


1024-1024: Unordered list style
Expected: asterisk; Actual: dash

(MD004, ul-style)


1025-1025: Unordered list style
Expected: asterisk; Actual: dash

(MD004, ul-style)


1026-1026: Unordered list style
Expected: asterisk; Actual: dash

(MD004, ul-style)


1029-1029: Unordered list style
Expected: asterisk; Actual: dash

(MD004, ul-style)


1030-1030: Unordered list style
Expected: asterisk; Actual: dash

(MD004, ul-style)


1031-1031: Unordered list style
Expected: asterisk; Actual: dash

(MD004, ul-style)


1032-1032: Unordered list style
Expected: asterisk; Actual: dash

(MD004, ul-style)


1033-1033: Fenced code blocks should have a language specified

(MD040, fenced-code-language)

🪛 YAMLlint (1.37.1)
examples/azuremonitor_runbooks.yaml

[error] 39-39: trailing spaces

(trailing-spaces)


[error] 187-187: trailing spaces

(trailing-spaces)

🪛 Ruff (0.12.2)
holmes/plugins/toolsets/azuremonitorlogs/utils.py

3-3: json imported but unused

Remove unused import: json

(F401)

tests/plugins/toolsets/azuremonitor_metrics/test_azuremonitor_metrics_tools.py

4-4: pytest imported but unused

Remove unused import: pytest

(F401)


5-5: unittest.mock.MagicMock imported but unused

Remove unused import: unittest.mock.MagicMock

(F401)


8-8: holmes.core.tools.StructuredToolResult imported but unused

Remove unused import: holmes.core.tools.StructuredToolResult

(F401)

holmes/main.py

888-888: f-string without any placeholders

Remove extraneous f prefix

(F541)


918-918: f-string without any placeholders

Remove extraneous f prefix

(F541)

holmes/plugins/toolsets/azuremonitor_metrics/azuremonitor_metrics_toolset.py

7-7: typing.List imported but unused

Remove unused import: typing.List

(F401)


34-34: azure.core.exceptions.AzureError imported but unused

Remove unused import: azure.core.exceptions.AzureError

(F401)


504-504: Local variable result_data is assigned to but never used

Remove assignment to unused variable result_data

(F841)


603-603: datetime.datetime imported but unused

Remove unused import: datetime.datetime

(F401)


653-653: f-string without any placeholders

Remove extraneous f prefix

(F541)

holmes/plugins/toolsets/azuremonitorlogs/azuremonitorlogs_toolset.py

3-3: json imported but unused

Remove unused import: json

(F401)


10-10: pydantic.field_validator imported but unused

Remove unused import: pydantic.field_validator

(F401)

holmes/plugins/toolsets/azuremonitor_metrics/utils.py

3-3: json imported but unused

Remove unused import: json

(F401)


6-6: typing.Tuple imported but unused

Remove unused import: typing.Tuple

(F401)


62-62: Local variable namespace is assigned to but never used

Remove assignment to unused variable namespace

(F841)


387-387: azure.mgmt.resource.ResourceManagementClient imported but unused

Remove unused import: azure.mgmt.resource.ResourceManagementClient

(F401)

holmes/plugins/sources/azuremonitoralerts/__init__.py

86-86: Local variable alert_name is assigned to but never used

Remove assignment to unused variable alert_name

(F841)


622-622: f-string without any placeholders

Remove extraneous f prefix

(F541)

🔇 Additional comments (26)
pyproject.toml (1)

61-61: LGTM!

The addition of azure-mgmt-resourcegraph is appropriate for querying Azure resources in the new Azure Monitor toolsets.

holmes/plugins/runbooks/node_troubleshooting_prometheus_instructions.md (1)

21-74: Excellent metrics reference tables!

The comprehensive list of metrics with their uses and labels provides valuable reference material for node troubleshooting scenarios.

holmes/plugins/sources/azuremonitoralerts/__init__.py (4)

20-34: Well-structured initialization with proper validation!

The initialization properly validates the cluster resource ID and extracts necessary components with appropriate error handling.


211-274: Clean implementation with smart console output control!

The use of a class-level flag to prevent repeated console output is a thoughtful touch. The method properly structures all alert information into the Issue object.


629-652: Clean alert name parsing with modern type hints!

Good use of modern Python type hints and proper handling of edge cases.


805-826: Robust token acquisition with proper fallback!

The implementation properly handles credential fallback scenarios and includes appropriate logging for debugging.

holmes/plugins/toolsets/azuremonitorlogs/__init__.py (1)

1-2: LGTM!

Standard package marker with appropriate docstring.

holmes/plugins/toolsets/azuremonitor_metrics/__init__.py (1)

1-2: LGTM!

Clear and descriptive docstring for the Azure Monitor Metrics toolset package.

holmes/core/toolset_manager.py (1)

93-99: LGTM! Well-designed special handling for disabled toolsets.

The implementation correctly addresses the requirement to keep azuremonitorlogs disabled even when all toolsets are enabled. The approach using a list of toolsets_to_keep_disabled is extensible and makes the intention clear.

holmes/plugins/runbooks/catalog.json (1)

13-32: LGTM! Comprehensive Azure Monitor runbook additions.

The new runbook entries are well-structured and provide valuable troubleshooting resources for Azure Monitor and Prometheus scenarios. The descriptions are clear and specific, covering DNS resolution, node issues, pod issues, and cost optimization.

README.md (2)

31-53: LGTM! Comprehensive Azure Monitor integration documented.

The documentation updates effectively showcase the new Azure Monitor capabilities with clear descriptions and appropriate status indicators. The migration from robusta-dev.github.io to docs.robusta.dev maintains consistency across all integration links.


62-62: LGTM! Azure Monitor Alerts integration well documented.

The addition of Azure Monitor Alerts to the End-to-End Automation section is appropriate and clearly indicates its CLI-only status for investigating Prometheus metric alerts.

holmes/plugins/runbooks/pod_troubleshooting_prometheus_instructions.md (1)

1-63: Excellent systematic troubleshooting workflow.

This runbook provides comprehensive guidance for pod troubleshooting with well-structured workflow steps and valuable metrics reference tables. The systematic approach covering scoping, metrics analysis, events, logs, and remediation aligns well with best practices.

tests/plugins/toolsets/azuremonitor_metrics/README.md (1)

1-110: Excellent comprehensive test documentation.

This README provides thorough documentation of the Azure Monitor Metrics toolset test suite, covering all 7 tools with detailed success/failure scenarios, clear mocking strategies, and practical execution instructions. The organization and structure follow best practices and align with the project's testing standards.

holmes/plugins/toolsets/__init__.py (3)

10-30: Excellent conditional import handling.

The try/catch blocks with proper logging provide graceful handling of import failures. This ensures the system remains functional even if Azure Monitor dependencies are missing.


112-125: Good conditional toolset registration.

The conditional addition of toolsets based on import availability with appropriate logging provides good visibility into which toolsets are loaded.


155-159: Proper default enablement logic.

The logic to set enabled status based on the is_default property with a sensible fallback to False follows good practices for toolset management.

holmes/plugins/runbooks/azuremonitor-alerts.yaml (1)

1-152: Excellent comprehensive diagnostic runbooks.

These runbooks provide systematic, actionable guidance for Azure Monitor alert investigation. The 10-step generic workflow is thorough and logical, while the specialized runbooks appropriately focus on specific alert types (CPU, memory, pod states). The integration with available toolsets and clear instructions make these valuable diagnostic resources.

holmes/plugins/toolsets/azuremonitor_metrics/azuremonitor_metrics_instructions.jinja2 (4)

1-16: LGTM!

The introduction and key capabilities section provide clear documentation. The Jinja2 template syntax for listing tools is correct.


36-62: LGTM!

The query types, error handling, common metrics, and troubleshooting sections are well-documented and provide valuable guidance.


63-83: LGTM!

The alert investigation workflow Step 1 is thoroughly documented with appropriate emphasis on displaying the exact tool output without modifications.


135-149: LGTM!

The time range considerations, conditional configuration display, and setup reminder are well-documented. The Jinja2 syntax for conditional blocks is correct.

holmes/main.py (1)

854-944: LGTM!

The azuremonitormetrics command implementation follows the established pattern of other investigate commands with proper error handling, console output formatting, and JSON output support.

holmes/plugins/toolsets/azuremonitor_metrics/utils.py (1)

580-713: LGTM!

The enhance_promql_with_cluster_filter function is well-implemented with comprehensive handling of PromQL syntax edge cases, proper validation, and good error handling.

holmes/plugins/toolsets/azuremonitorlogs/azuremonitorlogs_instructions.jinja2 (1)

1-142: LGTM!

The Azure Monitor Logs toolset instructions are comprehensive and well-structured. The clear distinction between detection and query execution roles is appropriately emphasized, and the cost optimization workflow provides detailed guidance.

AZURE_MONITOR_SETUP_GUIDE.md (1)

14-388: LGTM!

The Azure Monitor Setup Guide is exceptionally comprehensive with detailed prerequisites, multiple configuration methods, thorough troubleshooting guidance, and complete workflow examples. The documentation will be very helpful for users setting up the toolset from external environments.

Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 8

🔭 Outside diff range comments (1)
holmes/plugins/toolsets/azuremonitor_metrics/utils.py (1)

714-715: Move import to the top of the file

Python imports should be placed at the top of the file, not at the end. This import os statement should be moved to line 1-12 with the other imports.

-import os
-

And add at the top of the file:

 """Utility functions for Azure Monitor Metrics toolset."""

 import json
 import logging
+import os
 import re
 from typing import Dict, Optional, Tuple
♻️ Duplicate comments (8)
holmes/plugins/toolsets/azuremonitor_metrics/utils.py (1)

6-6: Remove unused import Tuple

The Tuple import from typing is not used in the code.

-from typing import Dict, Optional, Tuple
+from typing import Dict, Optional
tests/plugins/toolsets/azuremonitor_metrics/test_azuremonitor_metrics_tools.py (1)

4-8: Remove unused imports

The following imports are not used in the test file:

  • pytest (line 4)
  • MagicMock (line 5)
  • StructuredToolResult (line 8)
 """Unit tests for Azure Monitor Metrics toolset individual tools."""

 import json
-import pytest
-from unittest.mock import MagicMock, patch, Mock
+from unittest.mock import patch, Mock
 from requests import RequestException

-from holmes.core.tools import ToolResultStatus, StructuredToolResult
+from holmes.core.tools import ToolResultStatus
 from holmes.plugins.toolsets.azuremonitor_metrics.azuremonitor_metrics_toolset import (
holmes/plugins/toolsets/azuremonitorlogs/azuremonitorlogs_toolset.py (2)

3-3: Remove unused import json

The json module is imported but never used in this file.

-import json

10-10: Remove unused import field_validator

The field_validator is imported from pydantic but never used in this file.

-from pydantic import BaseModel, field_validator
+from pydantic import BaseModel
holmes/plugins/toolsets/azuremonitor_metrics/azuremonitor_metrics_toolset.py (3)

7-7: Remove unused import List

The List type is imported but never used in this file. Consider using the modern type hint syntax list[T] instead of importing from typing.

-from typing import Any, Dict, List, Optional, Tuple
+from typing import Any, Dict, Optional, Tuple

34-34: Remove unused import AzureError

The AzureError exception is imported but never used in this file.

-from azure.core.exceptions import AzureError

693-751: Remove duplicate unused methods

The methods _get_alert_rule_details and _is_prometheus_alert_rule are defined here but never used within this file. According to previous review comments, these same methods exist and are actively used in holmes/plugins/sources/azuremonitoralerts/__init__.py, making these duplicates.

Please remove these unused duplicate methods to avoid code duplication and maintenance issues.

holmes/plugins/sources/azuremonitoralerts/__init__.py (1)

621-621: Remove unnecessary f-string prefix

The string doesn't contain any placeholders, so the f prefix is not needed.

-            logging.warning(f"No rules found in rule group properties")
+            logging.warning("No rules found in rule group properties")
🧹 Nitpick comments (3)
holmes/plugins/toolsets/azuremonitor_metrics/utils.py (1)

530-547: Extract complex query to a constant for better maintainability

The Azure Resource Graph query is very complex and embedded inline. Consider extracting it to a module-level constant or a separate function for better readability and maintainability.

# At module level
AZURE_MONITOR_WORKSPACE_QUERY_TEMPLATE = """
resources 
| where type == "microsoft.insights/datacollectionrules"
| extend ma = properties.destinations.monitoringAccounts
| extend flows = properties.dataFlows
| mv-expand flows
| where flows.streams contains "Microsoft-PrometheusMetrics"
| mv-expand ma
| where array_index_of(flows.destinations, tostring(ma.name)) != -1
| project dcrId = tolower(id), azureMonitorWorkspaceResourceId=tolower(tostring(ma.accountResourceId))
| join (insightsresources | extend clusterId = split(tolower(id), '/providers/microsoft.insights/datacollectionruleassociations/')[0] | where clusterId =~ "{cluster_resource_id}" | project clusterId = tostring(clusterId), dcrId = tolower(tostring(parse_json(properties).dataCollectionRuleId)), dcraName = name) on dcrId
| join kind=leftouter (resources | where type == "microsoft.monitor/accounts" | extend prometheusQueryEndpoint=tostring(properties.metrics.prometheusQueryEndpoint) | extend amwLocation = location | project azureMonitorWorkspaceResourceId=tolower(id), prometheusQueryEndpoint, amwLocation) on azureMonitorWorkspaceResourceId
| project-away dcrId1, azureMonitorWorkspaceResourceId1
| join kind=leftouter (resources | where type == "microsoft.dashboard/grafana" | extend amwIntegrations = properties.grafanaIntegrations.azureMonitorWorkspaceIntegrations | mv-expand amwIntegrations | extend azureMonitorWorkspaceResourceId = tolower(tostring(amwIntegrations.azureMonitorWorkspaceResourceId)) | where azureMonitorWorkspaceResourceId != "" | extend grafanaObject = pack("grafanaResourceId", tolower(id), "grafanaWorkspaceName", name, "grafanaEndpoint", properties.endpoint) | summarize associatedGrafanas=make_list(grafanaObject) by azureMonitorWorkspaceResourceId) on azureMonitorWorkspaceResourceId
| extend amwToGrafana = pack("azureMonitorWorkspaceResourceId", azureMonitorWorkspaceResourceId, "prometheusQueryEndpoint", prometheusQueryEndpoint, "amwLocation", amwLocation, "associatedGrafanas", associatedGrafanas)
| summarize amwToGrafanas=make_list(amwToGrafana) by dcrResourceId = dcrId, dcraName
| order by dcrResourceId
"""

# In the function
query = AZURE_MONITOR_WORKSPACE_QUERY_TEMPLATE.format(
    cluster_resource_id=cluster_resource_id.lower()
)
holmes/plugins/sources/azuremonitoralerts/__init__.py (2)

395-627: Comprehensive query extraction with good fallback strategies

The Prometheus query extraction methods are well-designed with:

  • Multiple fallback strategies to handle various alert formats
  • Comprehensive logging for debugging
  • Smart parsing of alert names to locate specific rules

Consider documenting the expected alert name format in the _parse_alert_name method docstring for future maintainability.


804-824: Consider adding token caching for management API

The token acquisition logic is correct, but unlike the Prometheus token in the toolset, this doesn't implement caching. Since multiple API calls may be made during alert fetching, consider adding token caching similar to the toolset implementation to reduce authentication overhead.

Would you like me to generate the token caching implementation for consistency with the toolset?

📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 67d788a and 3676c0c.

📒 Files selected for processing (11)
  • holmes/core/tool_calling_llm.py (2 hunks)
  • holmes/plugins/runbooks/azure_monitor_logs_cost_optimization.md (1 hunks)
  • holmes/plugins/runbooks/pod_troubleshooting_prometheus_instructions.md (1 hunks)
  • holmes/plugins/sources/azuremonitoralerts/__init__.py (1 hunks)
  • holmes/plugins/toolsets/__init__.py (3 hunks)
  • holmes/plugins/toolsets/azuremonitor_metrics/azuremonitor_metrics_instructions.jinja2 (1 hunks)
  • holmes/plugins/toolsets/azuremonitor_metrics/azuremonitor_metrics_toolset.py (1 hunks)
  • holmes/plugins/toolsets/azuremonitor_metrics/utils.py (1 hunks)
  • holmes/plugins/toolsets/azuremonitorlogs/azuremonitorlogs_toolset.py (1 hunks)
  • tests/plugins/toolsets/azuremonitor_metrics/README.md (1 hunks)
  • tests/plugins/toolsets/azuremonitor_metrics/test_azuremonitor_metrics_tools.py (1 hunks)
✅ Files skipped from review due to trivial changes (1)
  • tests/plugins/toolsets/azuremonitor_metrics/README.md
🚧 Files skipped from review as they are similar to previous changes (4)
  • holmes/plugins/runbooks/pod_troubleshooting_prometheus_instructions.md
  • holmes/plugins/toolsets/init.py
  • holmes/core/tool_calling_llm.py
  • holmes/plugins/toolsets/azuremonitor_metrics/azuremonitor_metrics_instructions.jinja2
🧰 Additional context used
📓 Path-based instructions (1)
**/*.py

📄 CodeRabbit Inference Engine (CLAUDE.md)

**/*.py: ALWAYS place Python imports at the top of the file, not inside functions or methods
Use Ruff for formatting and linting (configured in pyproject.toml)
Type hints required (mypy configuration in pyproject.toml)

Files:

  • holmes/plugins/sources/azuremonitoralerts/__init__.py
  • holmes/plugins/toolsets/azuremonitor_metrics/utils.py
  • holmes/plugins/toolsets/azuremonitorlogs/azuremonitorlogs_toolset.py
  • tests/plugins/toolsets/azuremonitor_metrics/test_azuremonitor_metrics_tools.py
  • holmes/plugins/toolsets/azuremonitor_metrics/azuremonitor_metrics_toolset.py
🧠 Learnings (11)
📓 Common learnings
Learnt from: vishiy
PR: robusta-dev/holmesgpt#782
File: config.example.yaml:31-49
Timestamp: 2025-08-05T00:42:23.792Z
Learning: In robusta-dev/holmesgpt config.example.yaml, the azuremonitorlogs toolset configuration shows "enabled: true" as an example of how to enable the toolset, not as a default setting. The toolset is disabled by default and requires explicit enablement in user configurations.
Learnt from: nherment
PR: robusta-dev/holmesgpt#610
File: .github/workflows/llm-evaluation.yaml:39-42
Timestamp: 2025-07-08T08:45:41.069Z
Learning: The robusta-dev/holmesgpt codebase has comprehensive existing validation for Azure environment variables (AZURE_API_BASE, AZURE_API_KEY, AZURE_API_VERSION) and MODEL in tests/llm/utils/classifiers.py, tests/llm/conftest.py, and holmes/core/llm.py. Don't suggest adding redundant validation logic.
Learnt from: nherment
PR: robusta-dev/holmesgpt#408
File: holmes/plugins/toolsets/kubernetes_logs.py:90-97
Timestamp: 2025-05-15T05:13:43.169Z
Learning: In the Kubernetes logs toolset for Holmes, both current and previous logs are intentionally fetched and combined for each pod, even though this requires more API calls. This design ensures all logs are captured even when pods restart but retain their name, providing complete diagnostic information.
📚 Learning: in robusta-dev/holmesgpt config.example.yaml, the azuremonitorlogs toolset configuration shows "enab...
Learnt from: vishiy
PR: robusta-dev/holmesgpt#782
File: config.example.yaml:31-49
Timestamp: 2025-08-05T00:42:23.792Z
Learning: In robusta-dev/holmesgpt config.example.yaml, the azuremonitorlogs toolset configuration shows "enabled: true" as an example of how to enable the toolset, not as a default setting. The toolset is disabled by default and requires explicit enablement in user configurations.

Applied to files:

  • holmes/plugins/runbooks/azure_monitor_logs_cost_optimization.md
  • holmes/plugins/toolsets/azuremonitorlogs/azuremonitorlogs_toolset.py
  • tests/plugins/toolsets/azuremonitor_metrics/test_azuremonitor_metrics_tools.py
  • holmes/plugins/toolsets/azuremonitor_metrics/azuremonitor_metrics_toolset.py
📚 Learning: in the kubernetes logs toolset for holmes, both current and previous logs are intentionally fetched ...
Learnt from: nherment
PR: robusta-dev/holmesgpt#408
File: holmes/plugins/toolsets/kubernetes_logs.py:90-97
Timestamp: 2025-05-15T05:13:43.169Z
Learning: In the Kubernetes logs toolset for Holmes, both current and previous logs are intentionally fetched and combined for each pod, even though this requires more API calls. This design ensures all logs are captured even when pods restart but retain their name, providing complete diagnostic information.

Applied to files:

  • holmes/plugins/runbooks/azure_monitor_logs_cost_optimization.md
  • holmes/plugins/toolsets/azuremonitorlogs/azuremonitorlogs_toolset.py
📚 Learning: when suggesting improvements to environment variable handling in robusta-dev/holmesgpt, check first ...
Learnt from: nherment
PR: robusta-dev/holmesgpt#610
File: .github/workflows/llm-evaluation.yaml:39-42
Timestamp: 2025-07-08T08:45:41.069Z
Learning: When suggesting improvements to environment variable handling in robusta-dev/holmesgpt, check first if validation logic already exists rather than reimplementing it.

Applied to files:

  • holmes/plugins/sources/azuremonitoralerts/__init__.py
  • holmes/plugins/toolsets/azuremonitor_metrics/azuremonitor_metrics_toolset.py
📚 Learning: the robusta-dev/holmesgpt codebase has comprehensive existing validation for azure environment varia...
Learnt from: nherment
PR: robusta-dev/holmesgpt#610
File: .github/workflows/llm-evaluation.yaml:39-42
Timestamp: 2025-07-08T08:45:41.069Z
Learning: The robusta-dev/holmesgpt codebase has comprehensive existing validation for Azure environment variables (AZURE_API_BASE, AZURE_API_KEY, AZURE_API_VERSION) and MODEL in tests/llm/utils/classifiers.py, tests/llm/conftest.py, and holmes/core/llm.py. Don't suggest adding redundant validation logic.

Applied to files:

  • holmes/plugins/sources/azuremonitoralerts/__init__.py
  • holmes/plugins/toolsets/azuremonitor_metrics/utils.py
  • holmes/plugins/toolsets/azuremonitorlogs/azuremonitorlogs_toolset.py
  • tests/plugins/toolsets/azuremonitor_metrics/test_azuremonitor_metrics_tools.py
  • holmes/plugins/toolsets/azuremonitor_metrics/azuremonitor_metrics_toolset.py
📚 Learning: the holmesgpt project uses python >= 3.10 and prefers modern type hint syntax like `list[str]`, `dic...
Learnt from: nherment
PR: robusta-dev/holmesgpt#436
File: tests/llm/utils/mock_utils.py:240-249
Timestamp: 2025-06-05T12:23:27.634Z
Learning: The holmesgpt project uses Python >= 3.10 and prefers modern type hint syntax like `list[str]`, `dict[str, int]` over importing equivalent types from the typing module like `List[str]`, `Dict[str, int]`.

Applied to files:

  • holmes/plugins/toolsets/azuremonitor_metrics/utils.py
  • holmes/plugins/toolsets/azuremonitor_metrics/azuremonitor_metrics_toolset.py
📚 Learning: applies to holmes/plugins/toolsets/**/*.yaml : toolsets must be placed in holmes/plugins/toolsets/{n...
Learnt from: CR
PR: robusta-dev/holmesgpt#0
File: CLAUDE.md:0-0
Timestamp: 2025-08-03T07:25:36.018Z
Learning: Applies to holmes/plugins/toolsets/**/*.yaml : Toolsets must be placed in holmes/plugins/toolsets/{name}.yaml or {name}/

Applied to files:

  • holmes/plugins/toolsets/azuremonitorlogs/azuremonitorlogs_toolset.py
  • holmes/plugins/toolsets/azuremonitor_metrics/azuremonitor_metrics_toolset.py
📚 Learning: the init_config method in toolsets should be idempotent - safely callable multiple times without err...
Learnt from: nherment
PR: robusta-dev/holmesgpt#535
File: holmes/plugins/toolsets/bash/bash_toolset.py:207-209
Timestamp: 2025-06-24T05:51:04.543Z
Learning: The init_config method in toolsets should be idempotent - safely callable multiple times without errors. self.config should maintain consistent typing (not alternate between dict and config object types) throughout the object lifecycle.

Applied to files:

  • holmes/plugins/toolsets/azuremonitorlogs/azuremonitorlogs_toolset.py
  • holmes/plugins/toolsets/azuremonitor_metrics/azuremonitor_metrics_toolset.py
📚 Learning: new toolsets require integration tests with mocks...
Learnt from: CR
PR: robusta-dev/holmesgpt#0
File: CLAUDE.md:0-0
Timestamp: 2025-08-03T07:25:36.018Z
Learning: New toolsets require integration tests with mocks

Applied to files:

  • tests/plugins/toolsets/azuremonitor_metrics/test_azuremonitor_metrics_tools.py
📚 Learning: all new features require unit tests...
Learnt from: CR
PR: robusta-dev/holmesgpt#0
File: CLAUDE.md:0-0
Timestamp: 2025-08-03T07:25:36.018Z
Learning: All new features require unit tests

Applied to files:

  • tests/plugins/toolsets/azuremonitor_metrics/test_azuremonitor_metrics_tools.py
📚 Learning: for the robusta-dev/holmesgpt repository, do not flag fbt003 (boolean positional value in function c...
Learnt from: moshemorad
PR: robusta-dev/holmesgpt#391
File: holmes/common/env_vars.py:25-25
Timestamp: 2025-07-17T20:00:38.391Z
Learning: For the robusta-dev/holmesgpt repository, do not flag FBT003 (boolean positional value in function call) as an issue. The team is okay with boolean positional arguments in function calls.

Applied to files:

  • holmes/plugins/toolsets/azuremonitor_metrics/azuremonitor_metrics_toolset.py
🪛 LanguageTool
holmes/plugins/runbooks/azure_monitor_logs_cost_optimization.md

[uncategorized] ~832-~832: If this is a compound adjective that modifies the following noun, use a hyphen.
Context: ...onitor Metrics is NOT ENABLED:** - 🚨 HIGH PRIORITY RECOMMENDATION: Enable Azure Monitor ...

(EN_COMPOUND_ADJECTIVE_INTERNAL)

🪛 markdownlint-cli2 (0.17.2)
holmes/plugins/runbooks/azure_monitor_logs_cost_optimization.md

18-18: Unordered list style
Expected: asterisk; Actual: dash

(MD004, ul-style)


19-19: Unordered list style
Expected: asterisk; Actual: dash

(MD004, ul-style)


21-21: Unordered list style
Expected: asterisk; Actual: dash

(MD004, ul-style)


22-22: Unordered list style
Expected: asterisk; Actual: dash

(MD004, ul-style)


65-65: Unordered list style
Expected: asterisk; Actual: dash

(MD004, ul-style)


66-66: Unordered list style
Expected: asterisk; Actual: dash

(MD004, ul-style)


67-67: Unordered list style
Expected: asterisk; Actual: dash

(MD004, ul-style)


68-68: Unordered list style
Expected: asterisk; Actual: dash

(MD004, ul-style)


69-69: Unordered list style
Expected: asterisk; Actual: dash

(MD004, ul-style)


80-80: Unordered list style
Expected: asterisk; Actual: dash

(MD004, ul-style)


81-81: Unordered list style
Expected: asterisk; Actual: dash

(MD004, ul-style)


82-82: Unordered list style
Expected: asterisk; Actual: dash

(MD004, ul-style)


83-83: Unordered list style
Expected: asterisk; Actual: dash

(MD004, ul-style)


514-514: Unordered list style
Expected: asterisk; Actual: dash

(MD004, ul-style)


515-515: Unordered list style
Expected: asterisk; Actual: dash

(MD004, ul-style)


516-516: Unordered list style
Expected: asterisk; Actual: dash

(MD004, ul-style)


517-517: Unordered list style
Expected: asterisk; Actual: dash

(MD004, ul-style)


520-520: Unordered list style
Expected: asterisk; Actual: dash

(MD004, ul-style)


521-521: Unordered list style
Expected: asterisk; Actual: dash

(MD004, ul-style)


522-522: Unordered list style
Expected: asterisk; Actual: dash

(MD004, ul-style)


523-523: Unordered list style
Expected: asterisk; Actual: dash

(MD004, ul-style)


526-526: Fenced code blocks should have a language specified

(MD040, fenced-code-language)


574-574: Fenced code blocks should have a language specified

(MD040, fenced-code-language)


580-580: Fenced code blocks should have a language specified

(MD040, fenced-code-language)


587-587: Unordered list style
Expected: asterisk; Actual: dash

(MD004, ul-style)


588-588: Unordered list style
Expected: asterisk; Actual: dash

(MD004, ul-style)


589-589: Unordered list style
Expected: asterisk; Actual: dash

(MD004, ul-style)


590-590: Unordered list style
Expected: asterisk; Actual: dash

(MD004, ul-style)


591-591: Unordered list style
Expected: asterisk; Actual: dash

(MD004, ul-style)


611-611: Unordered list style
Expected: asterisk; Actual: dash

(MD004, ul-style)


612-612: Unordered list style
Expected: asterisk; Actual: dash

(MD004, ul-style)


613-613: Unordered list style
Expected: asterisk; Actual: dash

(MD004, ul-style)


614-614: Unordered list style
Expected: asterisk; Actual: dash

(MD004, ul-style)


615-615: Unordered list style
Expected: asterisk; Actual: dash

(MD004, ul-style)


647-647: Unordered list style
Expected: asterisk; Actual: dash

(MD004, ul-style)


648-648: Unordered list style
Expected: asterisk; Actual: dash

(MD004, ul-style)


649-649: Unordered list style
Expected: asterisk; Actual: dash

(MD004, ul-style)


650-650: Unordered list style
Expected: asterisk; Actual: dash

(MD004, ul-style)


651-651: Unordered list style
Expected: asterisk; Actual: dash

(MD004, ul-style)


654-654: Unordered list style
Expected: asterisk; Actual: dash

(MD004, ul-style)


655-655: Unordered list style
Expected: asterisk; Actual: dash

(MD004, ul-style)


656-656: Unordered list style
Expected: asterisk; Actual: dash

(MD004, ul-style)


657-657: Unordered list style
Expected: asterisk; Actual: dash

(MD004, ul-style)


660-660: Unordered list style
Expected: asterisk; Actual: dash

(MD004, ul-style)


661-661: Unordered list style
Expected: asterisk; Actual: dash

(MD004, ul-style)


662-662: Unordered list style
Expected: asterisk; Actual: dash

(MD004, ul-style)


663-663: Unordered list style
Expected: asterisk; Actual: dash

(MD004, ul-style)


672-672: Unordered list style
Expected: asterisk; Actual: dash

(MD004, ul-style)


673-673: Unordered list style
Expected: asterisk; Actual: dash

(MD004, ul-style)


674-674: Unordered list style
Expected: asterisk; Actual: dash

(MD004, ul-style)


675-675: Unordered list style
Expected: asterisk; Actual: dash

(MD004, ul-style)


678-678: Unordered list style
Expected: asterisk; Actual: dash

(MD004, ul-style)


679-679: Unordered list style
Expected: asterisk; Actual: dash

(MD004, ul-style)


680-680: Unordered list style
Expected: asterisk; Actual: dash

(MD004, ul-style)


681-681: Unordered list style
Expected: asterisk; Actual: dash

(MD004, ul-style)


689-689: Unordered list style
Expected: asterisk; Actual: dash

(MD004, ul-style)


690-690: Unordered list style
Expected: asterisk; Actual: dash

(MD004, ul-style)


691-691: Unordered list style
Expected: asterisk; Actual: dash

(MD004, ul-style)


692-692: Unordered list style
Expected: asterisk; Actual: dash

(MD004, ul-style)


695-695: Unordered list style
Expected: asterisk; Actual: dash

(MD004, ul-style)


696-696: Unordered list style
Expected: asterisk; Actual: dash

(MD004, ul-style)


697-697: Unordered list style
Expected: asterisk; Actual: dash

(MD004, ul-style)


698-698: Unordered list style
Expected: asterisk; Actual: dash

(MD004, ul-style)


703-703: Unordered list style
Expected: asterisk; Actual: dash

(MD004, ul-style)


704-704: Unordered list style
Expected: asterisk; Actual: dash

(MD004, ul-style)


705-705: Unordered list style
Expected: asterisk; Actual: dash

(MD004, ul-style)


706-706: Unordered list style
Expected: asterisk; Actual: dash

(MD004, ul-style)


790-790: Fenced code blocks should have a language specified

(MD040, fenced-code-language)


795-795: Fenced code blocks should have a language specified

(MD040, fenced-code-language)


802-802: Fenced code blocks should have a language specified

(MD040, fenced-code-language)


826-826: Unordered list style
Expected: asterisk; Actual: dash

(MD004, ul-style)


827-827: Unordered list style
Expected: asterisk; Actual: dash

(MD004, ul-style)


828-828: Unordered list style
Expected: asterisk; Actual: dash

(MD004, ul-style)


829-829: Unordered list style
Expected: asterisk; Actual: dash

(MD004, ul-style)


832-832: Unordered list style
Expected: asterisk; Actual: dash

(MD004, ul-style)


833-833: Unordered list style
Expected: asterisk; Actual: dash

(MD004, ul-style)


834-834: Unordered list style
Expected: asterisk; Actual: dash

(MD004, ul-style)


835-835: Unordered list style
Expected: asterisk; Actual: dash

(MD004, ul-style)


836-836: Unordered list style
Expected: asterisk; Actual: dash

(MD004, ul-style)


848-848: Unordered list style
Expected: asterisk; Actual: dash

(MD004, ul-style)


849-849: Unordered list style
Expected: asterisk; Actual: dash

(MD004, ul-style)


850-850: Unordered list style
Expected: asterisk; Actual: dash

(MD004, ul-style)


851-851: Unordered list style
Expected: asterisk; Actual: dash

(MD004, ul-style)


921-921: Fenced code blocks should have a language specified

(MD040, fenced-code-language)


1014-1014: Emphasis used instead of a heading

(MD036, no-emphasis-as-heading)


1015-1015: Unordered list style
Expected: asterisk; Actual: dash

(MD004, ul-style)


1016-1016: Unordered list style
Expected: asterisk; Actual: dash

(MD004, ul-style)


1017-1017: Unordered list style
Expected: asterisk; Actual: dash

(MD004, ul-style)


1018-1018: Unordered list style
Expected: asterisk; Actual: dash

(MD004, ul-style)


1019-1019: Unordered list style
Expected: asterisk; Actual: dash

(MD004, ul-style)


1020-1020: Unordered list style
Expected: asterisk; Actual: dash

(MD004, ul-style)


1023-1023: Unordered list style
Expected: asterisk; Actual: dash

(MD004, ul-style)


1024-1024: Unordered list style
Expected: asterisk; Actual: dash

(MD004, ul-style)


1025-1025: Unordered list style
Expected: asterisk; Actual: dash

(MD004, ul-style)


1026-1026: Unordered list style
Expected: asterisk; Actual: dash

(MD004, ul-style)


1029-1029: Unordered list style
Expected: asterisk; Actual: dash

(MD004, ul-style)


1030-1030: Unordered list style
Expected: asterisk; Actual: dash

(MD004, ul-style)


1031-1031: Unordered list style
Expected: asterisk; Actual: dash

(MD004, ul-style)


1032-1032: Unordered list style
Expected: asterisk; Actual: dash

(MD004, ul-style)


1033-1033: Fenced code blocks should have a language specified

(MD040, fenced-code-language)

🪛 Ruff (0.12.2)
holmes/plugins/sources/azuremonitoralerts/__init__.py

621-621: f-string without any placeholders

Remove extraneous f prefix

(F541)

holmes/plugins/toolsets/azuremonitor_metrics/utils.py

3-3: json imported but unused

Remove unused import: json

(F401)


6-6: typing.Tuple imported but unused

Remove unused import: typing.Tuple

(F401)


62-62: Local variable namespace is assigned to but never used

Remove assignment to unused variable namespace

(F841)

holmes/plugins/toolsets/azuremonitorlogs/azuremonitorlogs_toolset.py

3-3: json imported but unused

Remove unused import: json

(F401)


10-10: pydantic.field_validator imported but unused

Remove unused import: pydantic.field_validator

(F401)

tests/plugins/toolsets/azuremonitor_metrics/test_azuremonitor_metrics_tools.py

4-4: pytest imported but unused

Remove unused import: pytest

(F401)


5-5: unittest.mock.MagicMock imported but unused

Remove unused import: unittest.mock.MagicMock

(F401)


8-8: holmes.core.tools.StructuredToolResult imported but unused

Remove unused import: holmes.core.tools.StructuredToolResult

(F401)

holmes/plugins/toolsets/azuremonitor_metrics/azuremonitor_metrics_toolset.py

7-7: typing.List imported but unused

Remove unused import: typing.List

(F401)


34-34: azure.core.exceptions.AzureError imported but unused

Remove unused import: azure.core.exceptions.AzureError

(F401)


504-504: Local variable result_data is assigned to but never used

Remove assignment to unused variable result_data

(F841)


652-652: f-string without any placeholders

Remove extraneous f prefix

(F541)

🔇 Additional comments (9)
holmes/plugins/runbooks/azure_monitor_logs_cost_optimization.md (1)

1-1059: Excellent comprehensive cost optimization runbook!

This runbook provides thorough guidance for Azure Monitor Container Insights cost optimization with detailed KQL queries, implementation strategies, and executive reporting templates. The content is well-structured and technically sound.

holmes/plugins/toolsets/azuremonitorlogs/azuremonitorlogs_toolset.py (3)

31-43: Well-structured configuration model!

The AzureMonitorLogsConfig class follows established patterns with proper typing and meaningful field names. Good use of Optional types for flexible configuration.


49-333: Excellent tool implementations following established patterns!

All four tools (CheckAKSClusterContext, GetAKSClusterResourceID, CheckAzureMonitorLogsEnabled, GenerateCostOptimizationReport) follow the established toolset patterns with:

  • Proper inheritance and initialization
  • Consistent error handling with try/catch blocks
  • Appropriate use of StructuredToolResult
  • Well-defined parameters and descriptions

The implementation is robust and integrates well with the existing toolset framework.


334-392: Solid toolset implementation following best practices!

The AzureMonitorLogsToolset class properly implements:

  • Idempotent prerequisites_callable method (safely callable multiple times)
  • Proper error handling in configuration initialization
  • Disabled by default setting (aligns with toolset design)
  • Appropriate metadata and documentation URLs
  • Clean separation of concerns

The implementation follows established patterns and incorporates best practices from the codebase.

holmes/plugins/toolsets/azuremonitor_metrics/azuremonitor_metrics_toolset.py (3)

46-70: Well-structured configuration model

The configuration class is properly implemented with appropriate field types, defaults, and validation. The trailing slash validator ensures consistent URL formatting, and the use of private fields for sensitive data like credentials is a good security practice.


280-408: Well-implemented Prometheus query execution

The query execution tool is properly structured with:

  • Clear error handling for missing configuration
  • Smart auto-cluster filtering with appropriate warnings
  • Comprehensive response handling with different status types
  • Good debugging output

1200-1268: Secure and efficient authentication implementation

The authentication handling is well-implemented with:

  • Efficient token caching with proper expiry management
  • Smart credential fallback strategy (CLI first, then default)
  • Correct Azure Monitor Prometheus scope
  • Appropriate error handling and logging
holmes/plugins/sources/azuremonitoralerts/__init__.py (2)

14-34: Well-structured source plugin initialization

The initialization properly handles:

  • Resource ID validation with clear error messages
  • Extraction of subscription ID and cluster name
  • Class-level console output control to prevent spam

Good defensive programming with format validation.


35-127: Robust alert fetching implementation

The fetch_issues method is well-implemented with:

  • Proper filtering for Prometheus metric alerts
  • Smart deduplication using alert IDs
  • Comprehensive error handling and logging
  • Clear API parameter configuration

The removal of timeRange parameter to get all active alerts is a good design decision.

Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

♻️ Duplicate comments (8)
holmes/plugins/toolsets/azuremonitor_metrics/utils.py (3)

7-7: Remove unused import Tuple

The Tuple type is imported but not used in this file.

-from typing import Dict, Optional, Tuple
+from typing import Dict, Optional

62-63: Remove unused variable namespace

The namespace variable is read from the service account file but never used. Either use the namespace value or simplify the check to just verify the file exists.

-        with open("/var/run/secrets/kubernetes.io/serviceaccount/namespace", "r") as f:
-            namespace = f.read().strip()
+        # Check if we're running in a Kubernetes pod
+        if not os.path.exists("/var/run/secrets/kubernetes.io/serviceaccount/namespace"):
+            raise FileNotFoundError("Not running in Kubernetes")

96-292: Refactor complex function into smaller, focused functions

This function is 196 lines long with 4 different detection strategies. Consider breaking it down into smaller helper functions for better maintainability and testability.

def get_aks_cluster_id_from_kubectl() -> Optional[str]:
    """Get AKS cluster resource ID using kubectl and Azure CLI."""
    strategies = [
        _try_kubectl_context_name_strategy,
        _try_kubectl_server_url_strategy,
        _try_kubectl_node_labels_strategy,
        _try_az_aks_list_strategy
    ]
    
    for strategy in strategies:
        try:
            result = strategy()
            if result:
                return result
        except Exception as e:
            logging.debug(f"Strategy {strategy.__name__} failed: {e}")
    
    return None
holmes/plugins/toolsets/azuremonitor_metrics/azuremonitor_metrics_toolset.py (5)

7-7: Remove unused import List

The List type is imported but never used in this file.

-from typing import Any, Dict, List, Optional, Tuple
+from typing import Any, Dict, Optional, Tuple

34-34: Remove unused import AzureError

The AzureError exception is imported but never used in this file.

-from azure.core.exceptions import AzureError

504-504: Remove unused variable result_data

The variable result_data is assigned but never used in this scope.

-                result_data = {
-                    "cluster_name": cluster_name,
-                    "cluster_resource_id": cluster_resource_id,
-                    "total_prometheus_alerts": len(prometheus_alerts),
-                    "alerts": prometheus_alerts
-                }

652-652: Remove unnecessary f-string prefix

The string doesn't contain any placeholders, so the f prefix is not needed.

-                    summary_lines.append(f"   🔬 **Type:** Prometheus Metric Alert")
+                    summary_lines.append("   🔬 **Type:** Prometheus Metric Alert")

431-691: Refactor complex method for better maintainability

This _invoke method is over 260 lines long and handles multiple responsibilities. Consider extracting the following into separate methods:

  1. Timestamp parsing logic (lines 563-642) → _parse_alert_timestamp(alert, issue)
  2. Alert formatting logic (lines 512-661) → _format_alert_summary(alerts, issues)
  3. Console output logic → _print_alert_summary(formatted_summary)

This will improve readability, testability, and maintainability.

📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 3676c0c and 7edd5b1.

⛔ Files ignored due to path filters (1)
  • poetry.lock is excluded by !**/*.lock
📒 Files selected for processing (4)
  • holmes/plugins/toolsets/__init__.py (3 hunks)
  • holmes/plugins/toolsets/azuremonitor_metrics/azuremonitor_metrics_toolset.py (1 hunks)
  • holmes/plugins/toolsets/azuremonitor_metrics/utils.py (1 hunks)
  • pyproject.toml (1 hunks)
🚧 Files skipped from review as they are similar to previous changes (2)
  • pyproject.toml
  • holmes/plugins/toolsets/init.py
🧰 Additional context used
📓 Path-based instructions (1)
**/*.py

📄 CodeRabbit Inference Engine (CLAUDE.md)

**/*.py: ALWAYS place Python imports at the top of the file, not inside functions or methods
Use Ruff for formatting and linting (configured in pyproject.toml)
Type hints required (mypy configuration in pyproject.toml)

Files:

  • holmes/plugins/toolsets/azuremonitor_metrics/utils.py
  • holmes/plugins/toolsets/azuremonitor_metrics/azuremonitor_metrics_toolset.py
🧠 Learnings (8)
📓 Common learnings
Learnt from: vishiy
PR: robusta-dev/holmesgpt#782
File: config.example.yaml:31-49
Timestamp: 2025-08-05T00:42:23.792Z
Learning: In robusta-dev/holmesgpt config.example.yaml, the azuremonitorlogs toolset configuration shows "enabled: true" as an example of how to enable the toolset, not as a default setting. The toolset is disabled by default and requires explicit enablement in user configurations.
Learnt from: nherment
PR: robusta-dev/holmesgpt#610
File: .github/workflows/llm-evaluation.yaml:39-42
Timestamp: 2025-07-08T08:45:41.069Z
Learning: The robusta-dev/holmesgpt codebase has comprehensive existing validation for Azure environment variables (AZURE_API_BASE, AZURE_API_KEY, AZURE_API_VERSION) and MODEL in tests/llm/utils/classifiers.py, tests/llm/conftest.py, and holmes/core/llm.py. Don't suggest adding redundant validation logic.
Learnt from: nherment
PR: robusta-dev/holmesgpt#408
File: holmes/plugins/toolsets/kubernetes_logs.py:90-97
Timestamp: 2025-05-15T05:13:43.169Z
Learning: In the Kubernetes logs toolset for Holmes, both current and previous logs are intentionally fetched and combined for each pod, even though this requires more API calls. This design ensures all logs are captured even when pods restart but retain their name, providing complete diagnostic information.
📚 Learning: the holmesgpt project uses python >= 3.10 and prefers modern type hint syntax like `list[str]`, `dic...
Learnt from: nherment
PR: robusta-dev/holmesgpt#436
File: tests/llm/utils/mock_utils.py:240-249
Timestamp: 2025-06-05T12:23:27.634Z
Learning: The holmesgpt project uses Python >= 3.10 and prefers modern type hint syntax like `list[str]`, `dict[str, int]` over importing equivalent types from the typing module like `List[str]`, `Dict[str, int]`.

Applied to files:

  • holmes/plugins/toolsets/azuremonitor_metrics/utils.py
  • holmes/plugins/toolsets/azuremonitor_metrics/azuremonitor_metrics_toolset.py
📚 Learning: the robusta-dev/holmesgpt codebase has comprehensive existing validation for azure environment varia...
Learnt from: nherment
PR: robusta-dev/holmesgpt#610
File: .github/workflows/llm-evaluation.yaml:39-42
Timestamp: 2025-07-08T08:45:41.069Z
Learning: The robusta-dev/holmesgpt codebase has comprehensive existing validation for Azure environment variables (AZURE_API_BASE, AZURE_API_KEY, AZURE_API_VERSION) and MODEL in tests/llm/utils/classifiers.py, tests/llm/conftest.py, and holmes/core/llm.py. Don't suggest adding redundant validation logic.

Applied to files:

  • holmes/plugins/toolsets/azuremonitor_metrics/utils.py
  • holmes/plugins/toolsets/azuremonitor_metrics/azuremonitor_metrics_toolset.py
📚 Learning: in robusta-dev/holmesgpt config.example.yaml, the azuremonitorlogs toolset configuration shows "enab...
Learnt from: vishiy
PR: robusta-dev/holmesgpt#782
File: config.example.yaml:31-49
Timestamp: 2025-08-05T00:42:23.792Z
Learning: In robusta-dev/holmesgpt config.example.yaml, the azuremonitorlogs toolset configuration shows "enabled: true" as an example of how to enable the toolset, not as a default setting. The toolset is disabled by default and requires explicit enablement in user configurations.

Applied to files:

  • holmes/plugins/toolsets/azuremonitor_metrics/azuremonitor_metrics_toolset.py
📚 Learning: applies to holmes/plugins/toolsets/**/*.yaml : toolsets must be placed in holmes/plugins/toolsets/{n...
Learnt from: CR
PR: robusta-dev/holmesgpt#0
File: CLAUDE.md:0-0
Timestamp: 2025-08-03T07:25:36.018Z
Learning: Applies to holmes/plugins/toolsets/**/*.yaml : Toolsets must be placed in holmes/plugins/toolsets/{name}.yaml or {name}/

Applied to files:

  • holmes/plugins/toolsets/azuremonitor_metrics/azuremonitor_metrics_toolset.py
📚 Learning: when suggesting improvements to environment variable handling in robusta-dev/holmesgpt, check first ...
Learnt from: nherment
PR: robusta-dev/holmesgpt#610
File: .github/workflows/llm-evaluation.yaml:39-42
Timestamp: 2025-07-08T08:45:41.069Z
Learning: When suggesting improvements to environment variable handling in robusta-dev/holmesgpt, check first if validation logic already exists rather than reimplementing it.

Applied to files:

  • holmes/plugins/toolsets/azuremonitor_metrics/azuremonitor_metrics_toolset.py
📚 Learning: for the robusta-dev/holmesgpt repository, do not flag fbt003 (boolean positional value in function c...
Learnt from: moshemorad
PR: robusta-dev/holmesgpt#391
File: holmes/common/env_vars.py:25-25
Timestamp: 2025-07-17T20:00:38.391Z
Learning: For the robusta-dev/holmesgpt repository, do not flag FBT003 (boolean positional value in function call) as an issue. The team is okay with boolean positional arguments in function calls.

Applied to files:

  • holmes/plugins/toolsets/azuremonitor_metrics/azuremonitor_metrics_toolset.py
📚 Learning: the init_config method in toolsets should be idempotent - safely callable multiple times without err...
Learnt from: nherment
PR: robusta-dev/holmesgpt#535
File: holmes/plugins/toolsets/bash/bash_toolset.py:207-209
Timestamp: 2025-06-24T05:51:04.543Z
Learning: The init_config method in toolsets should be idempotent - safely callable multiple times without errors. self.config should maintain consistent typing (not alternate between dict and config object types) throughout the object lifecycle.

Applied to files:

  • holmes/plugins/toolsets/azuremonitor_metrics/azuremonitor_metrics_toolset.py
🪛 Ruff (0.12.2)
holmes/plugins/toolsets/azuremonitor_metrics/utils.py

3-3: json imported but unused

Remove unused import: json

(F401)


7-7: typing.Tuple imported but unused

Remove unused import: typing.Tuple

(F401)


63-63: Local variable namespace is assigned to but never used

Remove assignment to unused variable namespace

(F841)


714-714: Redefinition of unused os from line 6

Remove definition: os

(F811)

holmes/plugins/toolsets/azuremonitor_metrics/azuremonitor_metrics_toolset.py

7-7: typing.List imported but unused

Remove unused import: typing.List

(F401)


34-34: azure.core.exceptions.AzureError imported but unused

Remove unused import: azure.core.exceptions.AzureError

(F401)


504-504: Local variable result_data is assigned to but never used

Remove assignment to unused variable result_data

(F841)


652-652: f-string without any placeholders

Remove extraneous f prefix

(F541)


return None

class CheckAKSClusterContext(BaseAzureMonitorMetricsTool):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can elaborate why need this tool? can we check it once when we load the tools? what is the use case for this tool?

@@ -6,6 +6,27 @@
import yaml # type: ignore
from pydantic import ValidationError

try:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Consider create function that do the import logic and return true/false and reuse.

tags=[
ToolsetTag.CORE
],
is_default=True, # Enable by default like internet toolset
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
is_default=True, # Enable by default like internet toolset
is_default=False, # Enable by default like internet toolset

@@ -122,6 +157,11 @@ def load_builtin_toolsets(dal: Optional[SupabaseDal] = None) -> List[Toolset]:
toolset.type = ToolsetType.BUILTIN
# dont' expose build-in toolsets path
toolset.path = None
# Set enabled status based on is_default property
if hasattr(toolset, 'is_default'):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

the is_default was used in other use cases also in the saas. Can you elaborate why we need this changeand why azuremonitormetrics is mark as default=true?

GenerateCostOptimizationReport(toolset=self),
],
tags=[
ToolsetTag.CORE
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Shouldn't this be cli only?

ExecuteAzureMonitorPrometheusRangeQuery(toolset=self),
],
tags=[
ToolsetTag.CORE
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Shouldn't this be cli only?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants