-
Notifications
You must be signed in to change notification settings - Fork 149
Add azuremonitormetrics toolset & azuremonitorlogs toolset (both toolsets for CLI only) #782
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: master
Are you sure you want to change the base?
Conversation
* add azuremonitor metrics toolset v1.0.0 * working queries * limit satapoints to avoid excessive tokens that causes OAI throttling * enforce cluster filter and show actual queries run..
* add azuremonitor metrics toolset v1.0.0 * working queries * limit satapoints to avoid excessive tokens that causes OAI throttling * enforce cluster filter and show actual queries run.. * add azure monitor metrics alert investigation
…oks) (#3) * add azuremonitor metrics toolset v1.0.0 * working queries * limit satapoints to avoid excessive tokens that causes OAI throttling * enforce cluster filter and show actual queries run.. * add azure monitor metrics alert investigation * Pushing runbook changes Added three runbooks on DNS, nodes, and pods + catalog changes --------- Co-authored-by: Auston Li <[email protected]>
* add azure monitor logs toolset * fix azure montitor metrics toolset for detection issues.
WalkthroughThis update introduces comprehensive Azure Monitor integration for AKS clusters, adding new toolsets for Azure Monitor Metrics and Logs, complete with configuration, installation, and troubleshooting documentation. It implements new plugins for fetching and investigating Azure Monitor Prometheus metric alerts, extends the CLI with alert investigation commands, and provides diagnostic runbooks for systematic alert analysis. Extensive unit tests and documentation support these additions, alongside dependency updates and configuration examples. Changes
Sequence Diagram(s)sequenceDiagram
participant User
participant CLI (holmes/main.py)
participant AzureMonitorAlertsSource
participant Azure API
participant Investigator
participant RunbookManager
User->>CLI (holmes/main.py): Run `azuremonitormetrics --alertid=...`
CLI (holmes/main.py)->>AzureMonitorAlertsSource: fetch_issue(alertid)
AzureMonitorAlertsSource->>Azure API: Query Alert Management API
Azure API-->>AzureMonitorAlertsSource: Return alert details
AzureMonitorAlertsSource-->>CLI (holmes/main.py): Return Issue object
CLI (holmes/main.py)->>Investigator: investigate(issue)
Investigator->>RunbookManager: get_matched_runbooks_for_issue(issue)
RunbookManager-->>Investigator: Return matched runbooks
Investigator-->>CLI (holmes/main.py): Investigation result
CLI (holmes/main.py)-->>User: Display analysis and runbook info
sequenceDiagram
participant HolmesGPT
participant AzureMonitorMetricsToolset
participant Azure API
participant AKS Cluster
HolmesGPT->>AzureMonitorMetricsToolset: Execute PromQL query
AzureMonitorMetricsToolset->>Azure API: Authenticate (Azure CLI/Managed Identity)
AzureMonitorMetricsToolset->>AKS Cluster: Detect cluster resource ID
AzureMonitorMetricsToolset->>Azure API: Query Prometheus endpoint
Azure API-->>AzureMonitorMetricsToolset: Return metrics/alerts
AzureMonitorMetricsToolset-->>HolmesGPT: Return results
Estimated code review effort🎯 5 (Critical) | ⏱️ ~70 minutes Suggested reviewers
Note ⚡️ Unit Test Generation is now available in beta!Learn more here, or try it out under "Finishing Touches" below. ✨ Finishing Touches
🧪 Generate unit tests
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. 🪧 TipsChatThere are 3 ways to chat with CodeRabbit:
SupportNeed help? Create a ticket on our support page for assistance with any issues or questions. CodeRabbit Commands (Invoked using PR comments)
Other keywords and placeholders
CodeRabbit Configuration File (
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 23
🔭 Outside diff range comments (1)
holmes/plugins/toolsets/azuremonitor_metrics/utils.py (1)
715-716
: Move import to the top of the file.Per the coding guidelines, all Python imports should be at the top of the file, not inside functions or at the bottom.
Move the
import os
statement from line 715 to the top of the file with other imports.
🧹 Nitpick comments (18)
holmes/plugins/runbooks/node_troubleshooting_prometheus_instructions.md (2)
19-19
: Use proper heading syntax instead of emphasisReplace bold emphasis with proper heading syntax for better document structure and accessibility.
-**Core signals** +#### Core signals -**Kube-State Metrics** +#### Kube-State Metrics -**Node Exporter Metrics** +#### Node Exporter MetricsAlso applies to: 37-37, 48-48
109-109
: Add newline at end of fileAdd a newline character at the end of the file to follow standard file conventions.
| **Node Upkeep** | Periodically rotate nodes via rolling upgrade or nodepool reimage | +
holmes/core/runbooks.py (1)
28-44
: Consider refactoring to reduce code duplication.The new method works correctly but duplicates the filtering logic from
get_instructions_for_issue
. Consider extracting the common filtering logic into a private helper method to improve maintainability.Example refactor:
+ def _get_matching_runbooks(self, issue: Issue) -> List[Runbook]: + """Private helper to get runbooks matching an issue""" + matched_runbooks = [] + for runbook in self.runbooks: + if runbook.match.issue_id and not runbook.match.issue_id.match(issue.id): + continue + if runbook.match.issue_name and not runbook.match.issue_name.match(issue.name): + continue + if runbook.match.source and not runbook.match.source.match(issue.source_type): + continue + matched_runbooks.append(runbook) + return matched_runbooks + def get_instructions_for_issue(self, issue: Issue) -> List[str]: - instructions = [] - for runbook in self.runbooks: - if runbook.match.issue_id and not runbook.match.issue_id.match(issue.id): - continue - if runbook.match.issue_name and not runbook.match.issue_name.match(issue.name): - continue - if runbook.match.source and not runbook.match.source.match(issue.source_type): - continue - instructions.append(runbook.instructions) - return instructions + matched_runbooks = self._get_matching_runbooks(issue) + return [runbook.instructions for runbook in matched_runbooks] def get_matched_runbooks_for_issue(self, issue: Issue) -> List[Runbook]: """Get the actual Runbook objects that match an issue (for accessing metadata like file paths)""" - matched_runbooks = [] - for runbook in self.runbooks: - if runbook.match.issue_id and not runbook.match.issue_id.match(issue.id): - continue - if runbook.match.issue_name and not runbook.match.issue_name.match(issue.name): - continue - if runbook.match.source and not runbook.match.source.match(issue.source_type): - continue - matched_runbooks.append(runbook) - return matched_runbooks + return self._get_matching_runbooks(issue)README.md (1)
44-44
: Add missing period after "etc."In American English, abbreviations like "etc." require a period.
Apply this diff:
-| [<img src="images/integration_logos/http-icon.png" alt="Internet" width="20" style="vertical-align: middle;"> **Internet**](https://docs.robusta.dev/master/configuration/holmesgpt/toolsets/internet.html) | ✅ | Public runbooks, community docs etc | +| [<img src="images/integration_logos/http-icon.png" alt="Internet" width="20" style="vertical-align: middle;"> **Internet**](https://docs.robusta.dev/master/configuration/holmesgpt/toolsets/internet.html) | ✅ | Public runbooks, community docs etc. |holmes/plugins/toolsets/azuremonitor_metrics/install.md (1)
70-135
: Specify a language for all fenced code blocks (markdownlint MD040).
markdownlint-cli2
fails on the blocks that open on Lines 77, 102, 109, 119 and 129 because they use plain```
without a language qualifier.
These blocks contain CLI samples or expected console output; declaring a language (e.g.bash
, `yaml`, `text`) will silence the linter and improve syntax highlighting.Example fix for the first offending block:
-``` +```text azuremonitorlogs │ True │ enabled │ built-in │ │Repeat for each affected block.
holmes/plugins/runbooks/networking/dns_troubleshooting_prometheus_instructions.md (2)
12-18
: Avoid using bold text as a substitute for headings (markdownlint MD036).The emphasised line “PromQL Queries” (Line 13) triggers MD036.
Either promote it to a real heading or drop the bold styling:-**PromQL Queries** +### PromQL Queries
44-48
: Remove blank lines inside block-quotes (markdownlint MD028).Lines 45 and 47 are empty lines that sit inside the block-quotes, tripping MD028.
Delete those blanks to keep the quote intact.holmes/plugins/toolsets/azuremonitorlogs/install.md (1)
70-82
: Add language identifiers to fenced code blocks (markdownlint MD040).Blocks that start on Lines 73, 101, 109, 118 and 128 are missing a language spec.
Mark them astext
(for expected output / errors) orbash
as appropriate.-``` +```text Error: DefaultAzureCredential failed to retrieve a tokenRepeat for each occurrence.
docs/toolsets/azuremonitor-logs.md (1)
140-176
: Language spec missing in fenced blocks (markdownlint MD040).The error / output snippets opened on Lines 145, 151, 160 and 169 lack a language tag.
Usetext
to satisfy the linter:-``` +```text Error: DefaultAzureCredential failed to retrieve a tokenexamples/azuremonitor_runbooks.yaml (1)
35-43
: Trim trailing whitespace flagged by yamllint.Line 39 has stray spaces at the end:
- - Use time ranges that cover the alert period plus some buffer (2h, 4h, 6h)␠␠ + - Use time ranges that cover the alert period plus some buffer (2h, 4h, 6h)A similar issue exists on Line 187.
Cleaning these prevents CI failures on strict YAML linting.holmes/plugins/toolsets/azuremonitor_metrics/azuremonitor_metrics_instructions.jinja2 (1)
111-119
: Add language identifier to the code block.The example investigation flow should have a language identifier for better syntax highlighting.
-``` +```text 1. List alerts → get_azure_monitor_alerts_with_fired_times 2. See alert query: kube_deployment_status_replicas{cluster="myaks"} != kube_deployment_spec_replicas{cluster="myaks"} 3. Execute alert query → execute_alert_promql_query (alert_id, time_range="2h") 4. Analyze deployment status → execute_azuremonitor_prometheus_range_query with deployment queries 5. Check pod status → query kube_pod_status_phase for related pods 6. Investigate resources → query CPU/memory metrics for the timeframe</blockquote></details> <details> <summary>holmes/plugins/toolsets/azuremonitor_metrics/utils.py (1)</summary><blockquote> `62-62`: **Remove unused variable assignment.** The `namespace` variable is read but never used. ```diff - with open("/var/run/secrets/kubernetes.io/serviceaccount/namespace", "r") as f: - namespace = f.read().strip() + # Verify we're in a Kubernetes pod by checking service account + with open("/var/run/secrets/kubernetes.io/serviceaccount/namespace", "r") as f: + f.read() # Just verify the file exists and is readable
AZURE_MONITOR_SETUP_GUIDE.md (1)
7-10
: Add language specifier to code block.Fenced code blocks should have a language specified for proper syntax highlighting.
-``` +```text Cannot determine if Azure Monitor metrics is enabled because this environment is not running inside an AKS cluster. Please run this from within your AKS cluster or provide cluster details.</blockquote></details> <details> <summary>tests/plugins/toolsets/azuremonitor_metrics/test_azuremonitor_metrics_tools.py (1)</summary><blockquote> `683-735`: **Consider adding test for maximum data points enforcement** The step size calculation tests are comprehensive, but consider adding a test case for when the calculated step size needs to be increased to stay within the `max_data_points` limit (1000 as configured in line 616). Example test case to add: ```python @patch('holmes.plugins.toolsets.azuremonitor_metrics.azuremonitor_metrics_toolset.process_timestamps_to_rfc3339') def test_step_size_max_data_points_enforcement(self, mock_process_timestamps): """Test that step size is increased to respect max_data_points limit.""" # Mock timestamp processing for 30-day range mock_process_timestamps.return_value = ("2023-01-01T00:00:00Z", "2023-01-31T00:00:00Z") # Test step size calculation params = { "query": "up", "description": "Test query", "output_type": "Plain" } step = self.tool._calculate_optimal_step_size(params, "2023-01-01T00:00:00Z", "2023-01-31T00:00:00Z") # For 30 days with max 1000 data points, minimum step should be ~2592 seconds assert step >= 2592
holmes/plugins/runbooks/azure_monitor_logs_cost_optimization.md (2)
18-83
: Standardize markdown list style for consistencyThe document uses both asterisk (
*
) and dash (-
) for unordered lists. For consistency, use asterisks throughout as that's the predominant style in the document.Apply this pattern throughout the document:
-* **STEP 1**: Use azuremonitorlogs toolset to detect Log Analytics workspace and cluster resource ID - - If workspace/cluster not immediately detectable, use `check_aks_cluster_context` to get current cluster - - If resource group needed, use `get_aks_cluster_resource_id` with available cluster names +* **STEP 1**: Use azuremonitorlogs toolset to detect Log Analytics workspace and cluster resource ID + * If workspace/cluster not immediately detectable, use `check_aks_cluster_context` to get current cluster + * If resource group needed, use `get_aks_cluster_resource_id` with available cluster names
526-580
: Add language specifiers to code blocksSeveral code blocks are missing language specifiers, which affects syntax highlighting.
Add language specifiers to these code blocks:
-``` +```text BEFORE (ContainerLog v1): ├── ContainerLog table ($2.99/GB - Analytics only) ├── KubePodInventory table ($2.99/GB - Separate collection) └── Limited filtering options -``` +``` -``` +```text Monthly Cost = (ContainerLog Volume × $2.99) + (KubePodInventory Volume × $2.99) Example: (10GB × $89.70) + (2GB × $59.80) = $956.40/month -``` +``` -``` +```text Monthly Cost = (ContainerLogV2 Volume × $0.50) + (Eliminated KubePodInventory) Example: (10GB × $15.00) + $0 = $15.00/month Savings: $941.40/month (98.4% reduction) -``` +``` </blockquote></details> <details> <summary>docs/toolsets/azuremonitor-metrics.md (1)</summary><blockquote> `234-234`: **Add language specification to fenced code block** The fenced code block should have a language specified for proper syntax highlighting. ```diff -``` +```textholmes/plugins/toolsets/azuremonitor_metrics/azuremonitor_metrics_toolset.py (1)
568-645
: Consider reducing verbose timestamp debuggingThe timestamp parsing logic contains extensive debug logging that may be too verbose for production use. Consider reducing the logging or making it conditional based on a debug flag.
Consider wrapping the verbose debug logs in a condition or reducing them to key points only:
if logging.getLogger().isEnabledFor(logging.DEBUG): # Detailed debug logging hereOr simplify to log only the final result and errors.
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
⛔ Files ignored due to path filters (1)
poetry.lock
is excluded by!**/*.lock
📒 Files selected for processing (31)
AZURE_MONITOR_SETUP_GUIDE.md
(1 hunks)README.md
(2 hunks)config.example.yaml
(1 hunks)docs/toolsets/azuremonitor-logs.md
(1 hunks)docs/toolsets/azuremonitor-metrics.md
(1 hunks)examples/azuremonitor_runbooks.yaml
(1 hunks)holmes/core/runbooks.py
(1 hunks)holmes/core/tool_calling_llm.py
(1 hunks)holmes/core/toolset_manager.py
(1 hunks)holmes/main.py
(1 hunks)holmes/plugins/runbooks/azure_monitor_logs_cost_optimization.md
(1 hunks)holmes/plugins/runbooks/azuremonitor-alerts.yaml
(1 hunks)holmes/plugins/runbooks/catalog.json
(1 hunks)holmes/plugins/runbooks/networking/dns_troubleshooting_prometheus_instructions.md
(1 hunks)holmes/plugins/runbooks/node_troubleshooting_prometheus_instructions.md
(1 hunks)holmes/plugins/runbooks/pod_troubleshooting_prometheus_instructions.md
(1 hunks)holmes/plugins/sources/azuremonitoralerts/__init__.py
(1 hunks)holmes/plugins/toolsets/__init__.py
(3 hunks)holmes/plugins/toolsets/azuremonitor_metrics/__init__.py
(1 hunks)holmes/plugins/toolsets/azuremonitor_metrics/azuremonitor_metrics_instructions.jinja2
(1 hunks)holmes/plugins/toolsets/azuremonitor_metrics/azuremonitor_metrics_toolset.py
(1 hunks)holmes/plugins/toolsets/azuremonitor_metrics/install.md
(1 hunks)holmes/plugins/toolsets/azuremonitor_metrics/utils.py
(1 hunks)holmes/plugins/toolsets/azuremonitorlogs/__init__.py
(1 hunks)holmes/plugins/toolsets/azuremonitorlogs/azuremonitorlogs_instructions.jinja2
(1 hunks)holmes/plugins/toolsets/azuremonitorlogs/azuremonitorlogs_toolset.py
(1 hunks)holmes/plugins/toolsets/azuremonitorlogs/install.md
(1 hunks)holmes/plugins/toolsets/azuremonitorlogs/utils.py
(1 hunks)pyproject.toml
(1 hunks)tests/plugins/toolsets/azuremonitor_metrics/README.md
(1 hunks)tests/plugins/toolsets/azuremonitor_metrics/test_azuremonitor_metrics_tools.py
(1 hunks)
🧰 Additional context used
📓 Path-based instructions (1)
**/*.py
📄 CodeRabbit Inference Engine (CLAUDE.md)
**/*.py
: ALWAYS place Python imports at the top of the file, not inside functions or methods
Use Ruff for formatting and linting (configured in pyproject.toml)
Type hints required (mypy configuration in pyproject.toml)
Files:
holmes/plugins/toolsets/azuremonitorlogs/__init__.py
holmes/plugins/toolsets/azuremonitor_metrics/__init__.py
holmes/core/tool_calling_llm.py
holmes/core/toolset_manager.py
holmes/core/runbooks.py
holmes/plugins/toolsets/__init__.py
holmes/plugins/toolsets/azuremonitorlogs/utils.py
tests/plugins/toolsets/azuremonitor_metrics/test_azuremonitor_metrics_tools.py
holmes/main.py
holmes/plugins/toolsets/azuremonitor_metrics/azuremonitor_metrics_toolset.py
holmes/plugins/toolsets/azuremonitorlogs/azuremonitorlogs_toolset.py
holmes/plugins/toolsets/azuremonitor_metrics/utils.py
holmes/plugins/sources/azuremonitoralerts/__init__.py
🧠 Learnings (9)
📓 Common learnings
Learnt from: nherment
PR: robusta-dev/holmesgpt#610
File: .github/workflows/llm-evaluation.yaml:39-42
Timestamp: 2025-07-08T08:45:41.069Z
Learning: The robusta-dev/holmesgpt codebase has comprehensive existing validation for Azure environment variables (AZURE_API_BASE, AZURE_API_KEY, AZURE_API_VERSION) and MODEL in tests/llm/utils/classifiers.py, tests/llm/conftest.py, and holmes/core/llm.py. Don't suggest adding redundant validation logic.
Learnt from: nherment
PR: robusta-dev/holmesgpt#408
File: holmes/plugins/toolsets/kubernetes_logs.py:90-97
Timestamp: 2025-05-15T05:13:43.169Z
Learning: In the Kubernetes logs toolset for Holmes, both current and previous logs are intentionally fetched and combined for each pod, even though this requires more API calls. This design ensures all logs are captured even when pods restart but retain their name, providing complete diagnostic information.
📚 Learning: the robusta-dev/holmesgpt codebase has comprehensive existing validation for azure environment varia...
Learnt from: nherment
PR: robusta-dev/holmesgpt#610
File: .github/workflows/llm-evaluation.yaml:39-42
Timestamp: 2025-07-08T08:45:41.069Z
Learning: The robusta-dev/holmesgpt codebase has comprehensive existing validation for Azure environment variables (AZURE_API_BASE, AZURE_API_KEY, AZURE_API_VERSION) and MODEL in tests/llm/utils/classifiers.py, tests/llm/conftest.py, and holmes/core/llm.py. Don't suggest adding redundant validation logic.
Applied to files:
holmes/plugins/toolsets/azuremonitorlogs/__init__.py
README.md
holmes/plugins/toolsets/__init__.py
holmes/plugins/toolsets/azuremonitorlogs/utils.py
tests/plugins/toolsets/azuremonitor_metrics/test_azuremonitor_metrics_tools.py
holmes/plugins/toolsets/azuremonitor_metrics/azuremonitor_metrics_toolset.py
holmes/plugins/toolsets/azuremonitorlogs/azuremonitorlogs_toolset.py
📚 Learning: applies to holmes/plugins/toolsets/**/*.yaml : toolsets must be placed in holmes/plugins/toolsets/{n...
Learnt from: CR
PR: robusta-dev/holmesgpt#0
File: CLAUDE.md:0-0
Timestamp: 2025-08-03T07:25:36.018Z
Learning: Applies to holmes/plugins/toolsets/**/*.yaml : Toolsets must be placed in holmes/plugins/toolsets/{name}.yaml or {name}/
Applied to files:
holmes/plugins/toolsets/azuremonitorlogs/__init__.py
holmes/core/toolset_manager.py
holmes/plugins/toolsets/azuremonitorlogs/install.md
holmes/plugins/toolsets/azuremonitor_metrics/install.md
holmes/plugins/toolsets/azuremonitorlogs/azuremonitorlogs_instructions.jinja2
holmes/plugins/toolsets/azuremonitor_metrics/azuremonitor_metrics_instructions.jinja2
holmes/plugins/toolsets/__init__.py
holmes/plugins/toolsets/azuremonitor_metrics/azuremonitor_metrics_toolset.py
holmes/plugins/toolsets/azuremonitorlogs/azuremonitorlogs_toolset.py
📚 Learning: in the kubernetes logs toolset for holmes, both current and previous logs are intentionally fetched ...
Learnt from: nherment
PR: robusta-dev/holmesgpt#408
File: holmes/plugins/toolsets/kubernetes_logs.py:90-97
Timestamp: 2025-05-15T05:13:43.169Z
Learning: In the Kubernetes logs toolset for Holmes, both current and previous logs are intentionally fetched and combined for each pod, even though this requires more API calls. This design ensures all logs are captured even when pods restart but retain their name, providing complete diagnostic information.
Applied to files:
holmes/plugins/toolsets/azuremonitorlogs/__init__.py
holmes/plugins/runbooks/pod_troubleshooting_prometheus_instructions.md
README.md
holmes/plugins/toolsets/azuremonitorlogs/install.md
docs/toolsets/azuremonitor-logs.md
holmes/plugins/toolsets/azuremonitorlogs/azuremonitorlogs_instructions.jinja2
AZURE_MONITOR_SETUP_GUIDE.md
holmes/plugins/runbooks/azure_monitor_logs_cost_optimization.md
holmes/plugins/toolsets/azuremonitorlogs/azuremonitorlogs_toolset.py
📚 Learning: the holmesgpt project uses python >= 3.10 and prefers modern type hint syntax like `list[str]`, `dic...
Learnt from: nherment
PR: robusta-dev/holmesgpt#436
File: tests/llm/utils/mock_utils.py:240-249
Timestamp: 2025-06-05T12:23:27.634Z
Learning: The holmesgpt project uses Python >= 3.10 and prefers modern type hint syntax like `list[str]`, `dict[str, int]` over importing equivalent types from the typing module like `List[str]`, `Dict[str, int]`.
Applied to files:
holmes/plugins/toolsets/azuremonitorlogs/__init__.py
📚 Learning: the init_config method in toolsets should be idempotent - safely callable multiple times without err...
Learnt from: nherment
PR: robusta-dev/holmesgpt#535
File: holmes/plugins/toolsets/bash/bash_toolset.py:207-209
Timestamp: 2025-06-24T05:51:04.543Z
Learning: The init_config method in toolsets should be idempotent - safely callable multiple times without errors. self.config should maintain consistent typing (not alternate between dict and config object types) throughout the object lifecycle.
Applied to files:
holmes/core/toolset_manager.py
holmes/plugins/toolsets/azuremonitor_metrics/azuremonitor_metrics_toolset.py
holmes/plugins/toolsets/azuremonitorlogs/azuremonitorlogs_toolset.py
📚 Learning: all new features require unit tests...
Learnt from: CR
PR: robusta-dev/holmesgpt#0
File: CLAUDE.md:0-0
Timestamp: 2025-08-03T07:25:36.018Z
Learning: All new features require unit tests
Applied to files:
tests/plugins/toolsets/azuremonitor_metrics/README.md
tests/plugins/toolsets/azuremonitor_metrics/test_azuremonitor_metrics_tools.py
📚 Learning: new toolsets require integration tests with mocks...
Learnt from: CR
PR: robusta-dev/holmesgpt#0
File: CLAUDE.md:0-0
Timestamp: 2025-08-03T07:25:36.018Z
Learning: New toolsets require integration tests with mocks
Applied to files:
tests/plugins/toolsets/azuremonitor_metrics/README.md
tests/plugins/toolsets/azuremonitor_metrics/test_azuremonitor_metrics_tools.py
📚 Learning: tests must match source structure under tests/...
Learnt from: CR
PR: robusta-dev/holmesgpt#0
File: CLAUDE.md:0-0
Timestamp: 2025-08-03T07:25:36.018Z
Learning: Tests must match source structure under tests/
Applied to files:
tests/plugins/toolsets/azuremonitor_metrics/README.md
🪛 LanguageTool
README.md
[style] ~44-~44: In American English, abbreviations like “etc.” require a period.
Context: ...) | ✅ | Public runbooks, community docs etc | | [<img src="images/integration_logos...
(ETC_PERIOD)
holmes/plugins/runbooks/azure_monitor_logs_cost_optimization.md
[uncategorized] ~832-~832: If this is a compound adjective that modifies the following noun, use a hyphen.
Context: ...onitor Metrics is NOT ENABLED:** - 🚨 HIGH PRIORITY RECOMMENDATION: Enable Azure Monitor ...
(EN_COMPOUND_ADJECTIVE_INTERNAL)
🪛 markdownlint-cli2 (0.17.2)
holmes/plugins/toolsets/azuremonitor_metrics/install.md
77-77: Fenced code blocks should have a language specified
(MD040, fenced-code-language)
102-102: Fenced code blocks should have a language specified
(MD040, fenced-code-language)
109-109: Fenced code blocks should have a language specified
(MD040, fenced-code-language)
119-119: Fenced code blocks should have a language specified
(MD040, fenced-code-language)
129-129: Fenced code blocks should have a language specified
(MD040, fenced-code-language)
holmes/plugins/runbooks/node_troubleshooting_prometheus_instructions.md
19-19: Emphasis used instead of a heading
(MD036, no-emphasis-as-heading)
37-37: Emphasis used instead of a heading
(MD036, no-emphasis-as-heading)
48-48: Emphasis used instead of a heading
(MD036, no-emphasis-as-heading)
80-80: Blank line inside blockquote
(MD028, no-blanks-blockquote)
82-82: Blank line inside blockquote
(MD028, no-blanks-blockquote)
holmes/plugins/runbooks/networking/dns_troubleshooting_prometheus_instructions.md
13-13: Emphasis used instead of a heading
(MD036, no-emphasis-as-heading)
45-45: Blank line inside blockquote
(MD028, no-blanks-blockquote)
47-47: Blank line inside blockquote
(MD028, no-blanks-blockquote)
docs/toolsets/azuremonitor-logs.md
145-145: Fenced code blocks should have a language specified
(MD040, fenced-code-language)
151-151: Fenced code blocks should have a language specified
(MD040, fenced-code-language)
160-160: Fenced code blocks should have a language specified
(MD040, fenced-code-language)
169-169: Fenced code blocks should have a language specified
(MD040, fenced-code-language)
AZURE_MONITOR_SETUP_GUIDE.md
7-7: Fenced code blocks should have a language specified
(MD040, fenced-code-language)
docs/toolsets/azuremonitor-metrics.md
234-234: Fenced code blocks should have a language specified
(MD040, fenced-code-language)
holmes/plugins/runbooks/azure_monitor_logs_cost_optimization.md
18-18: Unordered list style
Expected: asterisk; Actual: dash
(MD004, ul-style)
19-19: Unordered list style
Expected: asterisk; Actual: dash
(MD004, ul-style)
21-21: Unordered list style
Expected: asterisk; Actual: dash
(MD004, ul-style)
22-22: Unordered list style
Expected: asterisk; Actual: dash
(MD004, ul-style)
65-65: Unordered list style
Expected: asterisk; Actual: dash
(MD004, ul-style)
66-66: Unordered list style
Expected: asterisk; Actual: dash
(MD004, ul-style)
67-67: Unordered list style
Expected: asterisk; Actual: dash
(MD004, ul-style)
68-68: Unordered list style
Expected: asterisk; Actual: dash
(MD004, ul-style)
69-69: Unordered list style
Expected: asterisk; Actual: dash
(MD004, ul-style)
80-80: Unordered list style
Expected: asterisk; Actual: dash
(MD004, ul-style)
81-81: Unordered list style
Expected: asterisk; Actual: dash
(MD004, ul-style)
82-82: Unordered list style
Expected: asterisk; Actual: dash
(MD004, ul-style)
83-83: Unordered list style
Expected: asterisk; Actual: dash
(MD004, ul-style)
514-514: Unordered list style
Expected: asterisk; Actual: dash
(MD004, ul-style)
515-515: Unordered list style
Expected: asterisk; Actual: dash
(MD004, ul-style)
516-516: Unordered list style
Expected: asterisk; Actual: dash
(MD004, ul-style)
517-517: Unordered list style
Expected: asterisk; Actual: dash
(MD004, ul-style)
520-520: Unordered list style
Expected: asterisk; Actual: dash
(MD004, ul-style)
521-521: Unordered list style
Expected: asterisk; Actual: dash
(MD004, ul-style)
522-522: Unordered list style
Expected: asterisk; Actual: dash
(MD004, ul-style)
523-523: Unordered list style
Expected: asterisk; Actual: dash
(MD004, ul-style)
526-526: Fenced code blocks should have a language specified
(MD040, fenced-code-language)
574-574: Fenced code blocks should have a language specified
(MD040, fenced-code-language)
580-580: Fenced code blocks should have a language specified
(MD040, fenced-code-language)
587-587: Unordered list style
Expected: asterisk; Actual: dash
(MD004, ul-style)
588-588: Unordered list style
Expected: asterisk; Actual: dash
(MD004, ul-style)
589-589: Unordered list style
Expected: asterisk; Actual: dash
(MD004, ul-style)
590-590: Unordered list style
Expected: asterisk; Actual: dash
(MD004, ul-style)
591-591: Unordered list style
Expected: asterisk; Actual: dash
(MD004, ul-style)
611-611: Unordered list style
Expected: asterisk; Actual: dash
(MD004, ul-style)
612-612: Unordered list style
Expected: asterisk; Actual: dash
(MD004, ul-style)
613-613: Unordered list style
Expected: asterisk; Actual: dash
(MD004, ul-style)
614-614: Unordered list style
Expected: asterisk; Actual: dash
(MD004, ul-style)
615-615: Unordered list style
Expected: asterisk; Actual: dash
(MD004, ul-style)
647-647: Unordered list style
Expected: asterisk; Actual: dash
(MD004, ul-style)
648-648: Unordered list style
Expected: asterisk; Actual: dash
(MD004, ul-style)
649-649: Unordered list style
Expected: asterisk; Actual: dash
(MD004, ul-style)
650-650: Unordered list style
Expected: asterisk; Actual: dash
(MD004, ul-style)
651-651: Unordered list style
Expected: asterisk; Actual: dash
(MD004, ul-style)
654-654: Unordered list style
Expected: asterisk; Actual: dash
(MD004, ul-style)
655-655: Unordered list style
Expected: asterisk; Actual: dash
(MD004, ul-style)
656-656: Unordered list style
Expected: asterisk; Actual: dash
(MD004, ul-style)
657-657: Unordered list style
Expected: asterisk; Actual: dash
(MD004, ul-style)
660-660: Unordered list style
Expected: asterisk; Actual: dash
(MD004, ul-style)
661-661: Unordered list style
Expected: asterisk; Actual: dash
(MD004, ul-style)
662-662: Unordered list style
Expected: asterisk; Actual: dash
(MD004, ul-style)
663-663: Unordered list style
Expected: asterisk; Actual: dash
(MD004, ul-style)
672-672: Unordered list style
Expected: asterisk; Actual: dash
(MD004, ul-style)
673-673: Unordered list style
Expected: asterisk; Actual: dash
(MD004, ul-style)
674-674: Unordered list style
Expected: asterisk; Actual: dash
(MD004, ul-style)
675-675: Unordered list style
Expected: asterisk; Actual: dash
(MD004, ul-style)
678-678: Unordered list style
Expected: asterisk; Actual: dash
(MD004, ul-style)
679-679: Unordered list style
Expected: asterisk; Actual: dash
(MD004, ul-style)
680-680: Unordered list style
Expected: asterisk; Actual: dash
(MD004, ul-style)
681-681: Unordered list style
Expected: asterisk; Actual: dash
(MD004, ul-style)
689-689: Unordered list style
Expected: asterisk; Actual: dash
(MD004, ul-style)
690-690: Unordered list style
Expected: asterisk; Actual: dash
(MD004, ul-style)
691-691: Unordered list style
Expected: asterisk; Actual: dash
(MD004, ul-style)
692-692: Unordered list style
Expected: asterisk; Actual: dash
(MD004, ul-style)
695-695: Unordered list style
Expected: asterisk; Actual: dash
(MD004, ul-style)
696-696: Unordered list style
Expected: asterisk; Actual: dash
(MD004, ul-style)
697-697: Unordered list style
Expected: asterisk; Actual: dash
(MD004, ul-style)
698-698: Unordered list style
Expected: asterisk; Actual: dash
(MD004, ul-style)
703-703: Unordered list style
Expected: asterisk; Actual: dash
(MD004, ul-style)
704-704: Unordered list style
Expected: asterisk; Actual: dash
(MD004, ul-style)
705-705: Unordered list style
Expected: asterisk; Actual: dash
(MD004, ul-style)
706-706: Unordered list style
Expected: asterisk; Actual: dash
(MD004, ul-style)
790-790: Fenced code blocks should have a language specified
(MD040, fenced-code-language)
795-795: Fenced code blocks should have a language specified
(MD040, fenced-code-language)
802-802: Fenced code blocks should have a language specified
(MD040, fenced-code-language)
826-826: Unordered list style
Expected: asterisk; Actual: dash
(MD004, ul-style)
827-827: Unordered list style
Expected: asterisk; Actual: dash
(MD004, ul-style)
828-828: Unordered list style
Expected: asterisk; Actual: dash
(MD004, ul-style)
829-829: Unordered list style
Expected: asterisk; Actual: dash
(MD004, ul-style)
832-832: Unordered list style
Expected: asterisk; Actual: dash
(MD004, ul-style)
833-833: Unordered list style
Expected: asterisk; Actual: dash
(MD004, ul-style)
834-834: Unordered list style
Expected: asterisk; Actual: dash
(MD004, ul-style)
835-835: Unordered list style
Expected: asterisk; Actual: dash
(MD004, ul-style)
836-836: Unordered list style
Expected: asterisk; Actual: dash
(MD004, ul-style)
848-848: Unordered list style
Expected: asterisk; Actual: dash
(MD004, ul-style)
849-849: Unordered list style
Expected: asterisk; Actual: dash
(MD004, ul-style)
850-850: Unordered list style
Expected: asterisk; Actual: dash
(MD004, ul-style)
851-851: Unordered list style
Expected: asterisk; Actual: dash
(MD004, ul-style)
921-921: Fenced code blocks should have a language specified
(MD040, fenced-code-language)
1014-1014: Emphasis used instead of a heading
(MD036, no-emphasis-as-heading)
1015-1015: Unordered list style
Expected: asterisk; Actual: dash
(MD004, ul-style)
1016-1016: Unordered list style
Expected: asterisk; Actual: dash
(MD004, ul-style)
1017-1017: Unordered list style
Expected: asterisk; Actual: dash
(MD004, ul-style)
1018-1018: Unordered list style
Expected: asterisk; Actual: dash
(MD004, ul-style)
1019-1019: Unordered list style
Expected: asterisk; Actual: dash
(MD004, ul-style)
1020-1020: Unordered list style
Expected: asterisk; Actual: dash
(MD004, ul-style)
1023-1023: Unordered list style
Expected: asterisk; Actual: dash
(MD004, ul-style)
1024-1024: Unordered list style
Expected: asterisk; Actual: dash
(MD004, ul-style)
1025-1025: Unordered list style
Expected: asterisk; Actual: dash
(MD004, ul-style)
1026-1026: Unordered list style
Expected: asterisk; Actual: dash
(MD004, ul-style)
1029-1029: Unordered list style
Expected: asterisk; Actual: dash
(MD004, ul-style)
1030-1030: Unordered list style
Expected: asterisk; Actual: dash
(MD004, ul-style)
1031-1031: Unordered list style
Expected: asterisk; Actual: dash
(MD004, ul-style)
1032-1032: Unordered list style
Expected: asterisk; Actual: dash
(MD004, ul-style)
1033-1033: Fenced code blocks should have a language specified
(MD040, fenced-code-language)
🪛 YAMLlint (1.37.1)
examples/azuremonitor_runbooks.yaml
[error] 39-39: trailing spaces
(trailing-spaces)
[error] 187-187: trailing spaces
(trailing-spaces)
🪛 Ruff (0.12.2)
holmes/plugins/toolsets/azuremonitorlogs/utils.py
3-3: json
imported but unused
Remove unused import: json
(F401)
tests/plugins/toolsets/azuremonitor_metrics/test_azuremonitor_metrics_tools.py
4-4: pytest
imported but unused
Remove unused import: pytest
(F401)
5-5: unittest.mock.MagicMock
imported but unused
Remove unused import: unittest.mock.MagicMock
(F401)
8-8: holmes.core.tools.StructuredToolResult
imported but unused
Remove unused import: holmes.core.tools.StructuredToolResult
(F401)
holmes/main.py
888-888: f-string without any placeholders
Remove extraneous f
prefix
(F541)
918-918: f-string without any placeholders
Remove extraneous f
prefix
(F541)
holmes/plugins/toolsets/azuremonitor_metrics/azuremonitor_metrics_toolset.py
7-7: typing.List
imported but unused
Remove unused import: typing.List
(F401)
34-34: azure.core.exceptions.AzureError
imported but unused
Remove unused import: azure.core.exceptions.AzureError
(F401)
504-504: Local variable result_data
is assigned to but never used
Remove assignment to unused variable result_data
(F841)
603-603: datetime.datetime
imported but unused
Remove unused import: datetime.datetime
(F401)
653-653: f-string without any placeholders
Remove extraneous f
prefix
(F541)
holmes/plugins/toolsets/azuremonitorlogs/azuremonitorlogs_toolset.py
3-3: json
imported but unused
Remove unused import: json
(F401)
10-10: pydantic.field_validator
imported but unused
Remove unused import: pydantic.field_validator
(F401)
holmes/plugins/toolsets/azuremonitor_metrics/utils.py
3-3: json
imported but unused
Remove unused import: json
(F401)
6-6: typing.Tuple
imported but unused
Remove unused import: typing.Tuple
(F401)
62-62: Local variable namespace
is assigned to but never used
Remove assignment to unused variable namespace
(F841)
387-387: azure.mgmt.resource.ResourceManagementClient
imported but unused
Remove unused import: azure.mgmt.resource.ResourceManagementClient
(F401)
holmes/plugins/sources/azuremonitoralerts/__init__.py
86-86: Local variable alert_name
is assigned to but never used
Remove assignment to unused variable alert_name
(F841)
622-622: f-string without any placeholders
Remove extraneous f
prefix
(F541)
🔇 Additional comments (26)
pyproject.toml (1)
61-61
: LGTM!The addition of
azure-mgmt-resourcegraph
is appropriate for querying Azure resources in the new Azure Monitor toolsets.holmes/plugins/runbooks/node_troubleshooting_prometheus_instructions.md (1)
21-74
: Excellent metrics reference tables!The comprehensive list of metrics with their uses and labels provides valuable reference material for node troubleshooting scenarios.
holmes/plugins/sources/azuremonitoralerts/__init__.py (4)
20-34
: Well-structured initialization with proper validation!The initialization properly validates the cluster resource ID and extracts necessary components with appropriate error handling.
211-274
: Clean implementation with smart console output control!The use of a class-level flag to prevent repeated console output is a thoughtful touch. The method properly structures all alert information into the Issue object.
629-652
: Clean alert name parsing with modern type hints!Good use of modern Python type hints and proper handling of edge cases.
805-826
: Robust token acquisition with proper fallback!The implementation properly handles credential fallback scenarios and includes appropriate logging for debugging.
holmes/plugins/toolsets/azuremonitorlogs/__init__.py (1)
1-2
: LGTM!Standard package marker with appropriate docstring.
holmes/plugins/toolsets/azuremonitor_metrics/__init__.py (1)
1-2
: LGTM!Clear and descriptive docstring for the Azure Monitor Metrics toolset package.
holmes/core/toolset_manager.py (1)
93-99
: LGTM! Well-designed special handling for disabled toolsets.The implementation correctly addresses the requirement to keep
azuremonitorlogs
disabled even when all toolsets are enabled. The approach using a list oftoolsets_to_keep_disabled
is extensible and makes the intention clear.holmes/plugins/runbooks/catalog.json (1)
13-32
: LGTM! Comprehensive Azure Monitor runbook additions.The new runbook entries are well-structured and provide valuable troubleshooting resources for Azure Monitor and Prometheus scenarios. The descriptions are clear and specific, covering DNS resolution, node issues, pod issues, and cost optimization.
README.md (2)
31-53
: LGTM! Comprehensive Azure Monitor integration documented.The documentation updates effectively showcase the new Azure Monitor capabilities with clear descriptions and appropriate status indicators. The migration from
robusta-dev.github.io
todocs.robusta.dev
maintains consistency across all integration links.
62-62
: LGTM! Azure Monitor Alerts integration well documented.The addition of Azure Monitor Alerts to the End-to-End Automation section is appropriate and clearly indicates its CLI-only status for investigating Prometheus metric alerts.
holmes/plugins/runbooks/pod_troubleshooting_prometheus_instructions.md (1)
1-63
: Excellent systematic troubleshooting workflow.This runbook provides comprehensive guidance for pod troubleshooting with well-structured workflow steps and valuable metrics reference tables. The systematic approach covering scoping, metrics analysis, events, logs, and remediation aligns well with best practices.
tests/plugins/toolsets/azuremonitor_metrics/README.md (1)
1-110
: Excellent comprehensive test documentation.This README provides thorough documentation of the Azure Monitor Metrics toolset test suite, covering all 7 tools with detailed success/failure scenarios, clear mocking strategies, and practical execution instructions. The organization and structure follow best practices and align with the project's testing standards.
holmes/plugins/toolsets/__init__.py (3)
10-30
: Excellent conditional import handling.The try/catch blocks with proper logging provide graceful handling of import failures. This ensures the system remains functional even if Azure Monitor dependencies are missing.
112-125
: Good conditional toolset registration.The conditional addition of toolsets based on import availability with appropriate logging provides good visibility into which toolsets are loaded.
155-159
: Proper default enablement logic.The logic to set
enabled
status based on theis_default
property with a sensible fallback toFalse
follows good practices for toolset management.holmes/plugins/runbooks/azuremonitor-alerts.yaml (1)
1-152
: Excellent comprehensive diagnostic runbooks.These runbooks provide systematic, actionable guidance for Azure Monitor alert investigation. The 10-step generic workflow is thorough and logical, while the specialized runbooks appropriately focus on specific alert types (CPU, memory, pod states). The integration with available toolsets and clear instructions make these valuable diagnostic resources.
holmes/plugins/toolsets/azuremonitor_metrics/azuremonitor_metrics_instructions.jinja2 (4)
1-16
: LGTM!The introduction and key capabilities section provide clear documentation. The Jinja2 template syntax for listing tools is correct.
36-62
: LGTM!The query types, error handling, common metrics, and troubleshooting sections are well-documented and provide valuable guidance.
63-83
: LGTM!The alert investigation workflow Step 1 is thoroughly documented with appropriate emphasis on displaying the exact tool output without modifications.
135-149
: LGTM!The time range considerations, conditional configuration display, and setup reminder are well-documented. The Jinja2 syntax for conditional blocks is correct.
holmes/main.py (1)
854-944
: LGTM!The
azuremonitormetrics
command implementation follows the established pattern of other investigate commands with proper error handling, console output formatting, and JSON output support.holmes/plugins/toolsets/azuremonitor_metrics/utils.py (1)
580-713
: LGTM!The
enhance_promql_with_cluster_filter
function is well-implemented with comprehensive handling of PromQL syntax edge cases, proper validation, and good error handling.holmes/plugins/toolsets/azuremonitorlogs/azuremonitorlogs_instructions.jinja2 (1)
1-142
: LGTM!The Azure Monitor Logs toolset instructions are comprehensive and well-structured. The clear distinction between detection and query execution roles is appropriately emphasized, and the cost optimization workflow provides detailed guidance.
AZURE_MONITOR_SETUP_GUIDE.md (1)
14-388
: LGTM!The Azure Monitor Setup Guide is exceptionally comprehensive with detailed prerequisites, multiple configuration methods, thorough troubleshooting guidance, and complete workflow examples. The documentation will be very helpful for users setting up the toolset from external environments.
holmes/plugins/runbooks/azure_monitor_logs_cost_optimization.md
Outdated
Show resolved
Hide resolved
holmes/plugins/toolsets/azuremonitorlogs/azuremonitorlogs_toolset.py
Outdated
Show resolved
Hide resolved
tests/plugins/toolsets/azuremonitor_metrics/test_azuremonitor_metrics_tools.py
Outdated
Show resolved
Hide resolved
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 8
🔭 Outside diff range comments (1)
holmes/plugins/toolsets/azuremonitor_metrics/utils.py (1)
714-715
: Move import to the top of the filePython imports should be placed at the top of the file, not at the end. This
import os
statement should be moved to line 1-12 with the other imports.-import os -And add at the top of the file:
"""Utility functions for Azure Monitor Metrics toolset.""" import json import logging +import os import re from typing import Dict, Optional, Tuple
♻️ Duplicate comments (8)
holmes/plugins/toolsets/azuremonitor_metrics/utils.py (1)
6-6
: Remove unused importTuple
The
Tuple
import from typing is not used in the code.-from typing import Dict, Optional, Tuple +from typing import Dict, Optionaltests/plugins/toolsets/azuremonitor_metrics/test_azuremonitor_metrics_tools.py (1)
4-8
: Remove unused importsThe following imports are not used in the test file:
pytest
(line 4)MagicMock
(line 5)StructuredToolResult
(line 8)"""Unit tests for Azure Monitor Metrics toolset individual tools.""" import json -import pytest -from unittest.mock import MagicMock, patch, Mock +from unittest.mock import patch, Mock from requests import RequestException -from holmes.core.tools import ToolResultStatus, StructuredToolResult +from holmes.core.tools import ToolResultStatus from holmes.plugins.toolsets.azuremonitor_metrics.azuremonitor_metrics_toolset import (holmes/plugins/toolsets/azuremonitorlogs/azuremonitorlogs_toolset.py (2)
3-3
: Remove unused importjson
The
json
module is imported but never used in this file.-import json
10-10
: Remove unused importfield_validator
The
field_validator
is imported from pydantic but never used in this file.-from pydantic import BaseModel, field_validator +from pydantic import BaseModelholmes/plugins/toolsets/azuremonitor_metrics/azuremonitor_metrics_toolset.py (3)
7-7
: Remove unused importList
The
List
type is imported but never used in this file. Consider using the modern type hint syntaxlist[T]
instead of importing from typing.-from typing import Any, Dict, List, Optional, Tuple +from typing import Any, Dict, Optional, Tuple
34-34
: Remove unused importAzureError
The
AzureError
exception is imported but never used in this file.-from azure.core.exceptions import AzureError
693-751
: Remove duplicate unused methodsThe methods
_get_alert_rule_details
and_is_prometheus_alert_rule
are defined here but never used within this file. According to previous review comments, these same methods exist and are actively used inholmes/plugins/sources/azuremonitoralerts/__init__.py
, making these duplicates.Please remove these unused duplicate methods to avoid code duplication and maintenance issues.
holmes/plugins/sources/azuremonitoralerts/__init__.py (1)
621-621
: Remove unnecessary f-string prefixThe string doesn't contain any placeholders, so the
f
prefix is not needed.- logging.warning(f"No rules found in rule group properties") + logging.warning("No rules found in rule group properties")
🧹 Nitpick comments (3)
holmes/plugins/toolsets/azuremonitor_metrics/utils.py (1)
530-547
: Extract complex query to a constant for better maintainabilityThe Azure Resource Graph query is very complex and embedded inline. Consider extracting it to a module-level constant or a separate function for better readability and maintainability.
# At module level AZURE_MONITOR_WORKSPACE_QUERY_TEMPLATE = """ resources | where type == "microsoft.insights/datacollectionrules" | extend ma = properties.destinations.monitoringAccounts | extend flows = properties.dataFlows | mv-expand flows | where flows.streams contains "Microsoft-PrometheusMetrics" | mv-expand ma | where array_index_of(flows.destinations, tostring(ma.name)) != -1 | project dcrId = tolower(id), azureMonitorWorkspaceResourceId=tolower(tostring(ma.accountResourceId)) | join (insightsresources | extend clusterId = split(tolower(id), '/providers/microsoft.insights/datacollectionruleassociations/')[0] | where clusterId =~ "{cluster_resource_id}" | project clusterId = tostring(clusterId), dcrId = tolower(tostring(parse_json(properties).dataCollectionRuleId)), dcraName = name) on dcrId | join kind=leftouter (resources | where type == "microsoft.monitor/accounts" | extend prometheusQueryEndpoint=tostring(properties.metrics.prometheusQueryEndpoint) | extend amwLocation = location | project azureMonitorWorkspaceResourceId=tolower(id), prometheusQueryEndpoint, amwLocation) on azureMonitorWorkspaceResourceId | project-away dcrId1, azureMonitorWorkspaceResourceId1 | join kind=leftouter (resources | where type == "microsoft.dashboard/grafana" | extend amwIntegrations = properties.grafanaIntegrations.azureMonitorWorkspaceIntegrations | mv-expand amwIntegrations | extend azureMonitorWorkspaceResourceId = tolower(tostring(amwIntegrations.azureMonitorWorkspaceResourceId)) | where azureMonitorWorkspaceResourceId != "" | extend grafanaObject = pack("grafanaResourceId", tolower(id), "grafanaWorkspaceName", name, "grafanaEndpoint", properties.endpoint) | summarize associatedGrafanas=make_list(grafanaObject) by azureMonitorWorkspaceResourceId) on azureMonitorWorkspaceResourceId | extend amwToGrafana = pack("azureMonitorWorkspaceResourceId", azureMonitorWorkspaceResourceId, "prometheusQueryEndpoint", prometheusQueryEndpoint, "amwLocation", amwLocation, "associatedGrafanas", associatedGrafanas) | summarize amwToGrafanas=make_list(amwToGrafana) by dcrResourceId = dcrId, dcraName | order by dcrResourceId """ # In the function query = AZURE_MONITOR_WORKSPACE_QUERY_TEMPLATE.format( cluster_resource_id=cluster_resource_id.lower() )holmes/plugins/sources/azuremonitoralerts/__init__.py (2)
395-627
: Comprehensive query extraction with good fallback strategiesThe Prometheus query extraction methods are well-designed with:
- Multiple fallback strategies to handle various alert formats
- Comprehensive logging for debugging
- Smart parsing of alert names to locate specific rules
Consider documenting the expected alert name format in the
_parse_alert_name
method docstring for future maintainability.
804-824
: Consider adding token caching for management APIThe token acquisition logic is correct, but unlike the Prometheus token in the toolset, this doesn't implement caching. Since multiple API calls may be made during alert fetching, consider adding token caching similar to the toolset implementation to reduce authentication overhead.
Would you like me to generate the token caching implementation for consistency with the toolset?
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (11)
holmes/core/tool_calling_llm.py
(2 hunks)holmes/plugins/runbooks/azure_monitor_logs_cost_optimization.md
(1 hunks)holmes/plugins/runbooks/pod_troubleshooting_prometheus_instructions.md
(1 hunks)holmes/plugins/sources/azuremonitoralerts/__init__.py
(1 hunks)holmes/plugins/toolsets/__init__.py
(3 hunks)holmes/plugins/toolsets/azuremonitor_metrics/azuremonitor_metrics_instructions.jinja2
(1 hunks)holmes/plugins/toolsets/azuremonitor_metrics/azuremonitor_metrics_toolset.py
(1 hunks)holmes/plugins/toolsets/azuremonitor_metrics/utils.py
(1 hunks)holmes/plugins/toolsets/azuremonitorlogs/azuremonitorlogs_toolset.py
(1 hunks)tests/plugins/toolsets/azuremonitor_metrics/README.md
(1 hunks)tests/plugins/toolsets/azuremonitor_metrics/test_azuremonitor_metrics_tools.py
(1 hunks)
✅ Files skipped from review due to trivial changes (1)
- tests/plugins/toolsets/azuremonitor_metrics/README.md
🚧 Files skipped from review as they are similar to previous changes (4)
- holmes/plugins/runbooks/pod_troubleshooting_prometheus_instructions.md
- holmes/plugins/toolsets/init.py
- holmes/core/tool_calling_llm.py
- holmes/plugins/toolsets/azuremonitor_metrics/azuremonitor_metrics_instructions.jinja2
🧰 Additional context used
📓 Path-based instructions (1)
**/*.py
📄 CodeRabbit Inference Engine (CLAUDE.md)
**/*.py
: ALWAYS place Python imports at the top of the file, not inside functions or methods
Use Ruff for formatting and linting (configured in pyproject.toml)
Type hints required (mypy configuration in pyproject.toml)
Files:
holmes/plugins/sources/azuremonitoralerts/__init__.py
holmes/plugins/toolsets/azuremonitor_metrics/utils.py
holmes/plugins/toolsets/azuremonitorlogs/azuremonitorlogs_toolset.py
tests/plugins/toolsets/azuremonitor_metrics/test_azuremonitor_metrics_tools.py
holmes/plugins/toolsets/azuremonitor_metrics/azuremonitor_metrics_toolset.py
🧠 Learnings (11)
📓 Common learnings
Learnt from: vishiy
PR: robusta-dev/holmesgpt#782
File: config.example.yaml:31-49
Timestamp: 2025-08-05T00:42:23.792Z
Learning: In robusta-dev/holmesgpt config.example.yaml, the azuremonitorlogs toolset configuration shows "enabled: true" as an example of how to enable the toolset, not as a default setting. The toolset is disabled by default and requires explicit enablement in user configurations.
Learnt from: nherment
PR: robusta-dev/holmesgpt#610
File: .github/workflows/llm-evaluation.yaml:39-42
Timestamp: 2025-07-08T08:45:41.069Z
Learning: The robusta-dev/holmesgpt codebase has comprehensive existing validation for Azure environment variables (AZURE_API_BASE, AZURE_API_KEY, AZURE_API_VERSION) and MODEL in tests/llm/utils/classifiers.py, tests/llm/conftest.py, and holmes/core/llm.py. Don't suggest adding redundant validation logic.
Learnt from: nherment
PR: robusta-dev/holmesgpt#408
File: holmes/plugins/toolsets/kubernetes_logs.py:90-97
Timestamp: 2025-05-15T05:13:43.169Z
Learning: In the Kubernetes logs toolset for Holmes, both current and previous logs are intentionally fetched and combined for each pod, even though this requires more API calls. This design ensures all logs are captured even when pods restart but retain their name, providing complete diagnostic information.
📚 Learning: in robusta-dev/holmesgpt config.example.yaml, the azuremonitorlogs toolset configuration shows "enab...
Learnt from: vishiy
PR: robusta-dev/holmesgpt#782
File: config.example.yaml:31-49
Timestamp: 2025-08-05T00:42:23.792Z
Learning: In robusta-dev/holmesgpt config.example.yaml, the azuremonitorlogs toolset configuration shows "enabled: true" as an example of how to enable the toolset, not as a default setting. The toolset is disabled by default and requires explicit enablement in user configurations.
Applied to files:
holmes/plugins/runbooks/azure_monitor_logs_cost_optimization.md
holmes/plugins/toolsets/azuremonitorlogs/azuremonitorlogs_toolset.py
tests/plugins/toolsets/azuremonitor_metrics/test_azuremonitor_metrics_tools.py
holmes/plugins/toolsets/azuremonitor_metrics/azuremonitor_metrics_toolset.py
📚 Learning: in the kubernetes logs toolset for holmes, both current and previous logs are intentionally fetched ...
Learnt from: nherment
PR: robusta-dev/holmesgpt#408
File: holmes/plugins/toolsets/kubernetes_logs.py:90-97
Timestamp: 2025-05-15T05:13:43.169Z
Learning: In the Kubernetes logs toolset for Holmes, both current and previous logs are intentionally fetched and combined for each pod, even though this requires more API calls. This design ensures all logs are captured even when pods restart but retain their name, providing complete diagnostic information.
Applied to files:
holmes/plugins/runbooks/azure_monitor_logs_cost_optimization.md
holmes/plugins/toolsets/azuremonitorlogs/azuremonitorlogs_toolset.py
📚 Learning: when suggesting improvements to environment variable handling in robusta-dev/holmesgpt, check first ...
Learnt from: nherment
PR: robusta-dev/holmesgpt#610
File: .github/workflows/llm-evaluation.yaml:39-42
Timestamp: 2025-07-08T08:45:41.069Z
Learning: When suggesting improvements to environment variable handling in robusta-dev/holmesgpt, check first if validation logic already exists rather than reimplementing it.
Applied to files:
holmes/plugins/sources/azuremonitoralerts/__init__.py
holmes/plugins/toolsets/azuremonitor_metrics/azuremonitor_metrics_toolset.py
📚 Learning: the robusta-dev/holmesgpt codebase has comprehensive existing validation for azure environment varia...
Learnt from: nherment
PR: robusta-dev/holmesgpt#610
File: .github/workflows/llm-evaluation.yaml:39-42
Timestamp: 2025-07-08T08:45:41.069Z
Learning: The robusta-dev/holmesgpt codebase has comprehensive existing validation for Azure environment variables (AZURE_API_BASE, AZURE_API_KEY, AZURE_API_VERSION) and MODEL in tests/llm/utils/classifiers.py, tests/llm/conftest.py, and holmes/core/llm.py. Don't suggest adding redundant validation logic.
Applied to files:
holmes/plugins/sources/azuremonitoralerts/__init__.py
holmes/plugins/toolsets/azuremonitor_metrics/utils.py
holmes/plugins/toolsets/azuremonitorlogs/azuremonitorlogs_toolset.py
tests/plugins/toolsets/azuremonitor_metrics/test_azuremonitor_metrics_tools.py
holmes/plugins/toolsets/azuremonitor_metrics/azuremonitor_metrics_toolset.py
📚 Learning: the holmesgpt project uses python >= 3.10 and prefers modern type hint syntax like `list[str]`, `dic...
Learnt from: nherment
PR: robusta-dev/holmesgpt#436
File: tests/llm/utils/mock_utils.py:240-249
Timestamp: 2025-06-05T12:23:27.634Z
Learning: The holmesgpt project uses Python >= 3.10 and prefers modern type hint syntax like `list[str]`, `dict[str, int]` over importing equivalent types from the typing module like `List[str]`, `Dict[str, int]`.
Applied to files:
holmes/plugins/toolsets/azuremonitor_metrics/utils.py
holmes/plugins/toolsets/azuremonitor_metrics/azuremonitor_metrics_toolset.py
📚 Learning: applies to holmes/plugins/toolsets/**/*.yaml : toolsets must be placed in holmes/plugins/toolsets/{n...
Learnt from: CR
PR: robusta-dev/holmesgpt#0
File: CLAUDE.md:0-0
Timestamp: 2025-08-03T07:25:36.018Z
Learning: Applies to holmes/plugins/toolsets/**/*.yaml : Toolsets must be placed in holmes/plugins/toolsets/{name}.yaml or {name}/
Applied to files:
holmes/plugins/toolsets/azuremonitorlogs/azuremonitorlogs_toolset.py
holmes/plugins/toolsets/azuremonitor_metrics/azuremonitor_metrics_toolset.py
📚 Learning: the init_config method in toolsets should be idempotent - safely callable multiple times without err...
Learnt from: nherment
PR: robusta-dev/holmesgpt#535
File: holmes/plugins/toolsets/bash/bash_toolset.py:207-209
Timestamp: 2025-06-24T05:51:04.543Z
Learning: The init_config method in toolsets should be idempotent - safely callable multiple times without errors. self.config should maintain consistent typing (not alternate between dict and config object types) throughout the object lifecycle.
Applied to files:
holmes/plugins/toolsets/azuremonitorlogs/azuremonitorlogs_toolset.py
holmes/plugins/toolsets/azuremonitor_metrics/azuremonitor_metrics_toolset.py
📚 Learning: new toolsets require integration tests with mocks...
Learnt from: CR
PR: robusta-dev/holmesgpt#0
File: CLAUDE.md:0-0
Timestamp: 2025-08-03T07:25:36.018Z
Learning: New toolsets require integration tests with mocks
Applied to files:
tests/plugins/toolsets/azuremonitor_metrics/test_azuremonitor_metrics_tools.py
📚 Learning: all new features require unit tests...
Learnt from: CR
PR: robusta-dev/holmesgpt#0
File: CLAUDE.md:0-0
Timestamp: 2025-08-03T07:25:36.018Z
Learning: All new features require unit tests
Applied to files:
tests/plugins/toolsets/azuremonitor_metrics/test_azuremonitor_metrics_tools.py
📚 Learning: for the robusta-dev/holmesgpt repository, do not flag fbt003 (boolean positional value in function c...
Learnt from: moshemorad
PR: robusta-dev/holmesgpt#391
File: holmes/common/env_vars.py:25-25
Timestamp: 2025-07-17T20:00:38.391Z
Learning: For the robusta-dev/holmesgpt repository, do not flag FBT003 (boolean positional value in function call) as an issue. The team is okay with boolean positional arguments in function calls.
Applied to files:
holmes/plugins/toolsets/azuremonitor_metrics/azuremonitor_metrics_toolset.py
🪛 LanguageTool
holmes/plugins/runbooks/azure_monitor_logs_cost_optimization.md
[uncategorized] ~832-~832: If this is a compound adjective that modifies the following noun, use a hyphen.
Context: ...onitor Metrics is NOT ENABLED:** - 🚨 HIGH PRIORITY RECOMMENDATION: Enable Azure Monitor ...
(EN_COMPOUND_ADJECTIVE_INTERNAL)
🪛 markdownlint-cli2 (0.17.2)
holmes/plugins/runbooks/azure_monitor_logs_cost_optimization.md
18-18: Unordered list style
Expected: asterisk; Actual: dash
(MD004, ul-style)
19-19: Unordered list style
Expected: asterisk; Actual: dash
(MD004, ul-style)
21-21: Unordered list style
Expected: asterisk; Actual: dash
(MD004, ul-style)
22-22: Unordered list style
Expected: asterisk; Actual: dash
(MD004, ul-style)
65-65: Unordered list style
Expected: asterisk; Actual: dash
(MD004, ul-style)
66-66: Unordered list style
Expected: asterisk; Actual: dash
(MD004, ul-style)
67-67: Unordered list style
Expected: asterisk; Actual: dash
(MD004, ul-style)
68-68: Unordered list style
Expected: asterisk; Actual: dash
(MD004, ul-style)
69-69: Unordered list style
Expected: asterisk; Actual: dash
(MD004, ul-style)
80-80: Unordered list style
Expected: asterisk; Actual: dash
(MD004, ul-style)
81-81: Unordered list style
Expected: asterisk; Actual: dash
(MD004, ul-style)
82-82: Unordered list style
Expected: asterisk; Actual: dash
(MD004, ul-style)
83-83: Unordered list style
Expected: asterisk; Actual: dash
(MD004, ul-style)
514-514: Unordered list style
Expected: asterisk; Actual: dash
(MD004, ul-style)
515-515: Unordered list style
Expected: asterisk; Actual: dash
(MD004, ul-style)
516-516: Unordered list style
Expected: asterisk; Actual: dash
(MD004, ul-style)
517-517: Unordered list style
Expected: asterisk; Actual: dash
(MD004, ul-style)
520-520: Unordered list style
Expected: asterisk; Actual: dash
(MD004, ul-style)
521-521: Unordered list style
Expected: asterisk; Actual: dash
(MD004, ul-style)
522-522: Unordered list style
Expected: asterisk; Actual: dash
(MD004, ul-style)
523-523: Unordered list style
Expected: asterisk; Actual: dash
(MD004, ul-style)
526-526: Fenced code blocks should have a language specified
(MD040, fenced-code-language)
574-574: Fenced code blocks should have a language specified
(MD040, fenced-code-language)
580-580: Fenced code blocks should have a language specified
(MD040, fenced-code-language)
587-587: Unordered list style
Expected: asterisk; Actual: dash
(MD004, ul-style)
588-588: Unordered list style
Expected: asterisk; Actual: dash
(MD004, ul-style)
589-589: Unordered list style
Expected: asterisk; Actual: dash
(MD004, ul-style)
590-590: Unordered list style
Expected: asterisk; Actual: dash
(MD004, ul-style)
591-591: Unordered list style
Expected: asterisk; Actual: dash
(MD004, ul-style)
611-611: Unordered list style
Expected: asterisk; Actual: dash
(MD004, ul-style)
612-612: Unordered list style
Expected: asterisk; Actual: dash
(MD004, ul-style)
613-613: Unordered list style
Expected: asterisk; Actual: dash
(MD004, ul-style)
614-614: Unordered list style
Expected: asterisk; Actual: dash
(MD004, ul-style)
615-615: Unordered list style
Expected: asterisk; Actual: dash
(MD004, ul-style)
647-647: Unordered list style
Expected: asterisk; Actual: dash
(MD004, ul-style)
648-648: Unordered list style
Expected: asterisk; Actual: dash
(MD004, ul-style)
649-649: Unordered list style
Expected: asterisk; Actual: dash
(MD004, ul-style)
650-650: Unordered list style
Expected: asterisk; Actual: dash
(MD004, ul-style)
651-651: Unordered list style
Expected: asterisk; Actual: dash
(MD004, ul-style)
654-654: Unordered list style
Expected: asterisk; Actual: dash
(MD004, ul-style)
655-655: Unordered list style
Expected: asterisk; Actual: dash
(MD004, ul-style)
656-656: Unordered list style
Expected: asterisk; Actual: dash
(MD004, ul-style)
657-657: Unordered list style
Expected: asterisk; Actual: dash
(MD004, ul-style)
660-660: Unordered list style
Expected: asterisk; Actual: dash
(MD004, ul-style)
661-661: Unordered list style
Expected: asterisk; Actual: dash
(MD004, ul-style)
662-662: Unordered list style
Expected: asterisk; Actual: dash
(MD004, ul-style)
663-663: Unordered list style
Expected: asterisk; Actual: dash
(MD004, ul-style)
672-672: Unordered list style
Expected: asterisk; Actual: dash
(MD004, ul-style)
673-673: Unordered list style
Expected: asterisk; Actual: dash
(MD004, ul-style)
674-674: Unordered list style
Expected: asterisk; Actual: dash
(MD004, ul-style)
675-675: Unordered list style
Expected: asterisk; Actual: dash
(MD004, ul-style)
678-678: Unordered list style
Expected: asterisk; Actual: dash
(MD004, ul-style)
679-679: Unordered list style
Expected: asterisk; Actual: dash
(MD004, ul-style)
680-680: Unordered list style
Expected: asterisk; Actual: dash
(MD004, ul-style)
681-681: Unordered list style
Expected: asterisk; Actual: dash
(MD004, ul-style)
689-689: Unordered list style
Expected: asterisk; Actual: dash
(MD004, ul-style)
690-690: Unordered list style
Expected: asterisk; Actual: dash
(MD004, ul-style)
691-691: Unordered list style
Expected: asterisk; Actual: dash
(MD004, ul-style)
692-692: Unordered list style
Expected: asterisk; Actual: dash
(MD004, ul-style)
695-695: Unordered list style
Expected: asterisk; Actual: dash
(MD004, ul-style)
696-696: Unordered list style
Expected: asterisk; Actual: dash
(MD004, ul-style)
697-697: Unordered list style
Expected: asterisk; Actual: dash
(MD004, ul-style)
698-698: Unordered list style
Expected: asterisk; Actual: dash
(MD004, ul-style)
703-703: Unordered list style
Expected: asterisk; Actual: dash
(MD004, ul-style)
704-704: Unordered list style
Expected: asterisk; Actual: dash
(MD004, ul-style)
705-705: Unordered list style
Expected: asterisk; Actual: dash
(MD004, ul-style)
706-706: Unordered list style
Expected: asterisk; Actual: dash
(MD004, ul-style)
790-790: Fenced code blocks should have a language specified
(MD040, fenced-code-language)
795-795: Fenced code blocks should have a language specified
(MD040, fenced-code-language)
802-802: Fenced code blocks should have a language specified
(MD040, fenced-code-language)
826-826: Unordered list style
Expected: asterisk; Actual: dash
(MD004, ul-style)
827-827: Unordered list style
Expected: asterisk; Actual: dash
(MD004, ul-style)
828-828: Unordered list style
Expected: asterisk; Actual: dash
(MD004, ul-style)
829-829: Unordered list style
Expected: asterisk; Actual: dash
(MD004, ul-style)
832-832: Unordered list style
Expected: asterisk; Actual: dash
(MD004, ul-style)
833-833: Unordered list style
Expected: asterisk; Actual: dash
(MD004, ul-style)
834-834: Unordered list style
Expected: asterisk; Actual: dash
(MD004, ul-style)
835-835: Unordered list style
Expected: asterisk; Actual: dash
(MD004, ul-style)
836-836: Unordered list style
Expected: asterisk; Actual: dash
(MD004, ul-style)
848-848: Unordered list style
Expected: asterisk; Actual: dash
(MD004, ul-style)
849-849: Unordered list style
Expected: asterisk; Actual: dash
(MD004, ul-style)
850-850: Unordered list style
Expected: asterisk; Actual: dash
(MD004, ul-style)
851-851: Unordered list style
Expected: asterisk; Actual: dash
(MD004, ul-style)
921-921: Fenced code blocks should have a language specified
(MD040, fenced-code-language)
1014-1014: Emphasis used instead of a heading
(MD036, no-emphasis-as-heading)
1015-1015: Unordered list style
Expected: asterisk; Actual: dash
(MD004, ul-style)
1016-1016: Unordered list style
Expected: asterisk; Actual: dash
(MD004, ul-style)
1017-1017: Unordered list style
Expected: asterisk; Actual: dash
(MD004, ul-style)
1018-1018: Unordered list style
Expected: asterisk; Actual: dash
(MD004, ul-style)
1019-1019: Unordered list style
Expected: asterisk; Actual: dash
(MD004, ul-style)
1020-1020: Unordered list style
Expected: asterisk; Actual: dash
(MD004, ul-style)
1023-1023: Unordered list style
Expected: asterisk; Actual: dash
(MD004, ul-style)
1024-1024: Unordered list style
Expected: asterisk; Actual: dash
(MD004, ul-style)
1025-1025: Unordered list style
Expected: asterisk; Actual: dash
(MD004, ul-style)
1026-1026: Unordered list style
Expected: asterisk; Actual: dash
(MD004, ul-style)
1029-1029: Unordered list style
Expected: asterisk; Actual: dash
(MD004, ul-style)
1030-1030: Unordered list style
Expected: asterisk; Actual: dash
(MD004, ul-style)
1031-1031: Unordered list style
Expected: asterisk; Actual: dash
(MD004, ul-style)
1032-1032: Unordered list style
Expected: asterisk; Actual: dash
(MD004, ul-style)
1033-1033: Fenced code blocks should have a language specified
(MD040, fenced-code-language)
🪛 Ruff (0.12.2)
holmes/plugins/sources/azuremonitoralerts/__init__.py
621-621: f-string without any placeholders
Remove extraneous f
prefix
(F541)
holmes/plugins/toolsets/azuremonitor_metrics/utils.py
3-3: json
imported but unused
Remove unused import: json
(F401)
6-6: typing.Tuple
imported but unused
Remove unused import: typing.Tuple
(F401)
62-62: Local variable namespace
is assigned to but never used
Remove assignment to unused variable namespace
(F841)
holmes/plugins/toolsets/azuremonitorlogs/azuremonitorlogs_toolset.py
3-3: json
imported but unused
Remove unused import: json
(F401)
10-10: pydantic.field_validator
imported but unused
Remove unused import: pydantic.field_validator
(F401)
tests/plugins/toolsets/azuremonitor_metrics/test_azuremonitor_metrics_tools.py
4-4: pytest
imported but unused
Remove unused import: pytest
(F401)
5-5: unittest.mock.MagicMock
imported but unused
Remove unused import: unittest.mock.MagicMock
(F401)
8-8: holmes.core.tools.StructuredToolResult
imported but unused
Remove unused import: holmes.core.tools.StructuredToolResult
(F401)
holmes/plugins/toolsets/azuremonitor_metrics/azuremonitor_metrics_toolset.py
7-7: typing.List
imported but unused
Remove unused import: typing.List
(F401)
34-34: azure.core.exceptions.AzureError
imported but unused
Remove unused import: azure.core.exceptions.AzureError
(F401)
504-504: Local variable result_data
is assigned to but never used
Remove assignment to unused variable result_data
(F841)
652-652: f-string without any placeholders
Remove extraneous f
prefix
(F541)
🔇 Additional comments (9)
holmes/plugins/runbooks/azure_monitor_logs_cost_optimization.md (1)
1-1059
: Excellent comprehensive cost optimization runbook!This runbook provides thorough guidance for Azure Monitor Container Insights cost optimization with detailed KQL queries, implementation strategies, and executive reporting templates. The content is well-structured and technically sound.
holmes/plugins/toolsets/azuremonitorlogs/azuremonitorlogs_toolset.py (3)
31-43
: Well-structured configuration model!The
AzureMonitorLogsConfig
class follows established patterns with proper typing and meaningful field names. Good use of Optional types for flexible configuration.
49-333
: Excellent tool implementations following established patterns!All four tools (
CheckAKSClusterContext
,GetAKSClusterResourceID
,CheckAzureMonitorLogsEnabled
,GenerateCostOptimizationReport
) follow the established toolset patterns with:
- Proper inheritance and initialization
- Consistent error handling with try/catch blocks
- Appropriate use of StructuredToolResult
- Well-defined parameters and descriptions
The implementation is robust and integrates well with the existing toolset framework.
334-392
: Solid toolset implementation following best practices!The
AzureMonitorLogsToolset
class properly implements:
- Idempotent
prerequisites_callable
method (safely callable multiple times)- Proper error handling in configuration initialization
- Disabled by default setting (aligns with toolset design)
- Appropriate metadata and documentation URLs
- Clean separation of concerns
The implementation follows established patterns and incorporates best practices from the codebase.
holmes/plugins/toolsets/azuremonitor_metrics/azuremonitor_metrics_toolset.py (3)
46-70
: Well-structured configuration modelThe configuration class is properly implemented with appropriate field types, defaults, and validation. The trailing slash validator ensures consistent URL formatting, and the use of private fields for sensitive data like credentials is a good security practice.
280-408
: Well-implemented Prometheus query executionThe query execution tool is properly structured with:
- Clear error handling for missing configuration
- Smart auto-cluster filtering with appropriate warnings
- Comprehensive response handling with different status types
- Good debugging output
1200-1268
: Secure and efficient authentication implementationThe authentication handling is well-implemented with:
- Efficient token caching with proper expiry management
- Smart credential fallback strategy (CLI first, then default)
- Correct Azure Monitor Prometheus scope
- Appropriate error handling and logging
holmes/plugins/sources/azuremonitoralerts/__init__.py (2)
14-34
: Well-structured source plugin initializationThe initialization properly handles:
- Resource ID validation with clear error messages
- Extraction of subscription ID and cluster name
- Class-level console output control to prevent spam
Good defensive programming with format validation.
35-127
: Robust alert fetching implementationThe
fetch_issues
method is well-implemented with:
- Proper filtering for Prometheus metric alerts
- Smart deduplication using alert IDs
- Comprehensive error handling and logging
- Clear API parameter configuration
The removal of timeRange parameter to get all active alerts is a good design decision.
holmes/plugins/toolsets/azuremonitor_metrics/azuremonitor_metrics_toolset.py
Show resolved
Hide resolved
holmes/plugins/toolsets/azuremonitor_metrics/azuremonitor_metrics_toolset.py
Show resolved
Hide resolved
holmes/plugins/toolsets/azuremonitor_metrics/azuremonitor_metrics_toolset.py
Show resolved
Hide resolved
holmes/plugins/toolsets/azuremonitor_metrics/azuremonitor_metrics_toolset.py
Outdated
Show resolved
Hide resolved
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 1
♻️ Duplicate comments (8)
holmes/plugins/toolsets/azuremonitor_metrics/utils.py (3)
7-7
: Remove unused importTuple
The
Tuple
type is imported but not used in this file.-from typing import Dict, Optional, Tuple +from typing import Dict, Optional
62-63
: Remove unused variablenamespace
The
namespace
variable is read from the service account file but never used. Either use the namespace value or simplify the check to just verify the file exists.- with open("/var/run/secrets/kubernetes.io/serviceaccount/namespace", "r") as f: - namespace = f.read().strip() + # Check if we're running in a Kubernetes pod + if not os.path.exists("/var/run/secrets/kubernetes.io/serviceaccount/namespace"): + raise FileNotFoundError("Not running in Kubernetes")
96-292
: Refactor complex function into smaller, focused functionsThis function is 196 lines long with 4 different detection strategies. Consider breaking it down into smaller helper functions for better maintainability and testability.
def get_aks_cluster_id_from_kubectl() -> Optional[str]: """Get AKS cluster resource ID using kubectl and Azure CLI.""" strategies = [ _try_kubectl_context_name_strategy, _try_kubectl_server_url_strategy, _try_kubectl_node_labels_strategy, _try_az_aks_list_strategy ] for strategy in strategies: try: result = strategy() if result: return result except Exception as e: logging.debug(f"Strategy {strategy.__name__} failed: {e}") return Noneholmes/plugins/toolsets/azuremonitor_metrics/azuremonitor_metrics_toolset.py (5)
7-7
: Remove unused importList
The
List
type is imported but never used in this file.-from typing import Any, Dict, List, Optional, Tuple +from typing import Any, Dict, Optional, Tuple
34-34
: Remove unused importAzureError
The
AzureError
exception is imported but never used in this file.-from azure.core.exceptions import AzureError
504-504
: Remove unused variableresult_data
The variable
result_data
is assigned but never used in this scope.- result_data = { - "cluster_name": cluster_name, - "cluster_resource_id": cluster_resource_id, - "total_prometheus_alerts": len(prometheus_alerts), - "alerts": prometheus_alerts - }
652-652
: Remove unnecessary f-string prefixThe string doesn't contain any placeholders, so the
f
prefix is not needed.- summary_lines.append(f" 🔬 **Type:** Prometheus Metric Alert") + summary_lines.append(" 🔬 **Type:** Prometheus Metric Alert")
431-691
: Refactor complex method for better maintainabilityThis
_invoke
method is over 260 lines long and handles multiple responsibilities. Consider extracting the following into separate methods:
- Timestamp parsing logic (lines 563-642) →
_parse_alert_timestamp(alert, issue)
- Alert formatting logic (lines 512-661) →
_format_alert_summary(alerts, issues)
- Console output logic →
_print_alert_summary(formatted_summary)
This will improve readability, testability, and maintainability.
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
⛔ Files ignored due to path filters (1)
poetry.lock
is excluded by!**/*.lock
📒 Files selected for processing (4)
holmes/plugins/toolsets/__init__.py
(3 hunks)holmes/plugins/toolsets/azuremonitor_metrics/azuremonitor_metrics_toolset.py
(1 hunks)holmes/plugins/toolsets/azuremonitor_metrics/utils.py
(1 hunks)pyproject.toml
(1 hunks)
🚧 Files skipped from review as they are similar to previous changes (2)
- pyproject.toml
- holmes/plugins/toolsets/init.py
🧰 Additional context used
📓 Path-based instructions (1)
**/*.py
📄 CodeRabbit Inference Engine (CLAUDE.md)
**/*.py
: ALWAYS place Python imports at the top of the file, not inside functions or methods
Use Ruff for formatting and linting (configured in pyproject.toml)
Type hints required (mypy configuration in pyproject.toml)
Files:
holmes/plugins/toolsets/azuremonitor_metrics/utils.py
holmes/plugins/toolsets/azuremonitor_metrics/azuremonitor_metrics_toolset.py
🧠 Learnings (8)
📓 Common learnings
Learnt from: vishiy
PR: robusta-dev/holmesgpt#782
File: config.example.yaml:31-49
Timestamp: 2025-08-05T00:42:23.792Z
Learning: In robusta-dev/holmesgpt config.example.yaml, the azuremonitorlogs toolset configuration shows "enabled: true" as an example of how to enable the toolset, not as a default setting. The toolset is disabled by default and requires explicit enablement in user configurations.
Learnt from: nherment
PR: robusta-dev/holmesgpt#610
File: .github/workflows/llm-evaluation.yaml:39-42
Timestamp: 2025-07-08T08:45:41.069Z
Learning: The robusta-dev/holmesgpt codebase has comprehensive existing validation for Azure environment variables (AZURE_API_BASE, AZURE_API_KEY, AZURE_API_VERSION) and MODEL in tests/llm/utils/classifiers.py, tests/llm/conftest.py, and holmes/core/llm.py. Don't suggest adding redundant validation logic.
Learnt from: nherment
PR: robusta-dev/holmesgpt#408
File: holmes/plugins/toolsets/kubernetes_logs.py:90-97
Timestamp: 2025-05-15T05:13:43.169Z
Learning: In the Kubernetes logs toolset for Holmes, both current and previous logs are intentionally fetched and combined for each pod, even though this requires more API calls. This design ensures all logs are captured even when pods restart but retain their name, providing complete diagnostic information.
📚 Learning: the holmesgpt project uses python >= 3.10 and prefers modern type hint syntax like `list[str]`, `dic...
Learnt from: nherment
PR: robusta-dev/holmesgpt#436
File: tests/llm/utils/mock_utils.py:240-249
Timestamp: 2025-06-05T12:23:27.634Z
Learning: The holmesgpt project uses Python >= 3.10 and prefers modern type hint syntax like `list[str]`, `dict[str, int]` over importing equivalent types from the typing module like `List[str]`, `Dict[str, int]`.
Applied to files:
holmes/plugins/toolsets/azuremonitor_metrics/utils.py
holmes/plugins/toolsets/azuremonitor_metrics/azuremonitor_metrics_toolset.py
📚 Learning: the robusta-dev/holmesgpt codebase has comprehensive existing validation for azure environment varia...
Learnt from: nherment
PR: robusta-dev/holmesgpt#610
File: .github/workflows/llm-evaluation.yaml:39-42
Timestamp: 2025-07-08T08:45:41.069Z
Learning: The robusta-dev/holmesgpt codebase has comprehensive existing validation for Azure environment variables (AZURE_API_BASE, AZURE_API_KEY, AZURE_API_VERSION) and MODEL in tests/llm/utils/classifiers.py, tests/llm/conftest.py, and holmes/core/llm.py. Don't suggest adding redundant validation logic.
Applied to files:
holmes/plugins/toolsets/azuremonitor_metrics/utils.py
holmes/plugins/toolsets/azuremonitor_metrics/azuremonitor_metrics_toolset.py
📚 Learning: in robusta-dev/holmesgpt config.example.yaml, the azuremonitorlogs toolset configuration shows "enab...
Learnt from: vishiy
PR: robusta-dev/holmesgpt#782
File: config.example.yaml:31-49
Timestamp: 2025-08-05T00:42:23.792Z
Learning: In robusta-dev/holmesgpt config.example.yaml, the azuremonitorlogs toolset configuration shows "enabled: true" as an example of how to enable the toolset, not as a default setting. The toolset is disabled by default and requires explicit enablement in user configurations.
Applied to files:
holmes/plugins/toolsets/azuremonitor_metrics/azuremonitor_metrics_toolset.py
📚 Learning: applies to holmes/plugins/toolsets/**/*.yaml : toolsets must be placed in holmes/plugins/toolsets/{n...
Learnt from: CR
PR: robusta-dev/holmesgpt#0
File: CLAUDE.md:0-0
Timestamp: 2025-08-03T07:25:36.018Z
Learning: Applies to holmes/plugins/toolsets/**/*.yaml : Toolsets must be placed in holmes/plugins/toolsets/{name}.yaml or {name}/
Applied to files:
holmes/plugins/toolsets/azuremonitor_metrics/azuremonitor_metrics_toolset.py
📚 Learning: when suggesting improvements to environment variable handling in robusta-dev/holmesgpt, check first ...
Learnt from: nherment
PR: robusta-dev/holmesgpt#610
File: .github/workflows/llm-evaluation.yaml:39-42
Timestamp: 2025-07-08T08:45:41.069Z
Learning: When suggesting improvements to environment variable handling in robusta-dev/holmesgpt, check first if validation logic already exists rather than reimplementing it.
Applied to files:
holmes/plugins/toolsets/azuremonitor_metrics/azuremonitor_metrics_toolset.py
📚 Learning: for the robusta-dev/holmesgpt repository, do not flag fbt003 (boolean positional value in function c...
Learnt from: moshemorad
PR: robusta-dev/holmesgpt#391
File: holmes/common/env_vars.py:25-25
Timestamp: 2025-07-17T20:00:38.391Z
Learning: For the robusta-dev/holmesgpt repository, do not flag FBT003 (boolean positional value in function call) as an issue. The team is okay with boolean positional arguments in function calls.
Applied to files:
holmes/plugins/toolsets/azuremonitor_metrics/azuremonitor_metrics_toolset.py
📚 Learning: the init_config method in toolsets should be idempotent - safely callable multiple times without err...
Learnt from: nherment
PR: robusta-dev/holmesgpt#535
File: holmes/plugins/toolsets/bash/bash_toolset.py:207-209
Timestamp: 2025-06-24T05:51:04.543Z
Learning: The init_config method in toolsets should be idempotent - safely callable multiple times without errors. self.config should maintain consistent typing (not alternate between dict and config object types) throughout the object lifecycle.
Applied to files:
holmes/plugins/toolsets/azuremonitor_metrics/azuremonitor_metrics_toolset.py
🪛 Ruff (0.12.2)
holmes/plugins/toolsets/azuremonitor_metrics/utils.py
3-3: json
imported but unused
Remove unused import: json
(F401)
7-7: typing.Tuple
imported but unused
Remove unused import: typing.Tuple
(F401)
63-63: Local variable namespace
is assigned to but never used
Remove assignment to unused variable namespace
(F841)
714-714: Redefinition of unused os
from line 6
Remove definition: os
(F811)
holmes/plugins/toolsets/azuremonitor_metrics/azuremonitor_metrics_toolset.py
7-7: typing.List
imported but unused
Remove unused import: typing.List
(F401)
34-34: azure.core.exceptions.AzureError
imported but unused
Remove unused import: azure.core.exceptions.AzureError
(F401)
504-504: Local variable result_data
is assigned to but never used
Remove assignment to unused variable result_data
(F841)
652-652: f-string without any placeholders
Remove extraneous f
prefix
(F541)
|
||
return None | ||
|
||
class CheckAKSClusterContext(BaseAzureMonitorMetricsTool): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can elaborate why need this tool? can we check it once when we load the tools? what is the use case for this tool?
@@ -6,6 +6,27 @@ | |||
import yaml # type: ignore | |||
from pydantic import ValidationError | |||
|
|||
try: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Consider create function that do the import logic and return true/false and reuse.
tags=[ | ||
ToolsetTag.CORE | ||
], | ||
is_default=True, # Enable by default like internet toolset |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
is_default=True, # Enable by default like internet toolset | |
is_default=False, # Enable by default like internet toolset |
@@ -122,6 +157,11 @@ def load_builtin_toolsets(dal: Optional[SupabaseDal] = None) -> List[Toolset]: | |||
toolset.type = ToolsetType.BUILTIN | |||
# dont' expose build-in toolsets path | |||
toolset.path = None | |||
# Set enabled status based on is_default property | |||
if hasattr(toolset, 'is_default'): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
the is_default was used in other use cases also in the saas. Can you elaborate why we need this changeand why azuremonitormetrics is mark as default=true?
GenerateCostOptimizationReport(toolset=self), | ||
], | ||
tags=[ | ||
ToolsetTag.CORE |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Shouldn't this be cli only?
ExecuteAzureMonitorPrometheusRangeQuery(toolset=self), | ||
], | ||
tags=[ | ||
ToolsetTag.CORE |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Shouldn't this be cli only?
- Note : I have disabled azuremonitorlogs toolset for now, as it requires AzureMCP (which has removed SSE based activations from last release) where as HOlmes support only SSE . Once Holmes supports stdout, i will enable azuremonitor logs toolset in CLI by default.