feat(ansible): pve-host-netmon role — Proxmox network bandwidth monitoring#1383
Open
… monitoring

Installs Grafana Alloy directly on Proxmox hosts (not in LXC) to collect per-physical-interface byte counters via prometheus.exporter.unix (netdev collector only). Goal: track monthly bandwidth against a 25 TB housing limit without adding Cacti or fragmenting monitoring into Zabbix.

Only node_network_receive_bytes_total and node_network_transmit_bytes_total are emitted (~4-6 timeseries total). All Proxmox virtual interfaces (vmbr*, veth*, fwbr*, fwln*, fwpr*, tap*) are excluded by regex to stay within the existing Grafana Cloud metrics quota.

Role includes a full molecule test suite (converge + idempotence + verify) mirroring the pve-monitoring role pattern.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
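As a rough illustration of the interface filter described in this commit, the sketch below applies an exclusion regex to a list of interface names. The pattern is an assumption assembled from the prefixes listed above; the role's actual default regex is not reproduced here, and the sample interface names are hypothetical.

```python
import re

# Assumed pattern, built from the prefixes named in the commit message.
# The role's real default value may differ.
DEVICE_EXCLUDE = re.compile(r"^(vmbr|veth|fwbr|fwln|fwpr|tap).*$")


def physical_interfaces(names):
    """Return the interface names that do not match the exclusion pattern."""
    return [name for name in names if not DEVICE_EXCLUDE.match(name)]


# Hypothetical interface list as it might appear on a Proxmox host.
sample = [
    "eno1", "eno2", "vmbr0", "veth101i0",
    "fwbr101i0", "fwln101i0", "fwpr101p0", "tap100i0",
]
print(physical_interfaces(sample))
```

With this pattern only the two physical NICs survive, which is consistent with the small number of timeseries the commit mentions.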
Adds a Grafana dashboard (proxmox folder) to track monthly bandwidth usage on rabbit-01-psp against the 25 TB housing limit (in+out combined, calendar month from the 1st).

Dashboard panels (default time range: now/M → now, i.e. "This month"):
- Gauge: % of 25 TB limit (green/yellow/orange/red thresholds at 70/90/95%)
- Stat: total used this month (decbytes)
- Stat: remaining budget (colour-inverted thresholds)
- Stat: daily average bytes/day via $__range_s
- Time series: per-interface rx/tx rate (binBps) for spike detection

All stat panels use instant queries with $__range so they compute the exact month-to-date delta rather than a rolling 30-day approximation.

Also fixes the Alloy config template to set instance="rabbit-01-psp" via prometheus.relabel (otherwise the label would be the internal scrape address). molecule verify updated to assert the relabel block is present (18 checks total, all pass).

generate_dashboards.py changes:
- p_stat gains instant=False param (backward-compatible)
- p_gauge added (type: gauge with threshold markers)
- make_dashboard gains time_range/refresh overrides
- stable_uid handles "proxmox" cluster (prefix "pve")
- build_rabbit_netbw() builder added
- APPS["proxmox"] registered; "rabbit-netbw" builder wired in main()

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
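The arithmetic behind the gauge and stat panels can be sketched in plain Python. This is only an illustration, not the dashboard's actual queries: it assumes the 25 TB limit is decimal (25 × 10^12 bytes, in line with the decbytes unit), and the function name, threshold mapping and sample byte counts are hypothetical.

```python
# Assumption: the housing limit is decimal terabytes (25 * 10^12 bytes).
LIMIT_BYTES = 25 * 10**12


def month_to_date(rx_bytes, tx_bytes, range_s):
    """Summarise month-to-date usage from rx/tx byte deltas.

    range_s is the number of seconds elapsed since the 1st of the month,
    playing the role that $__range_s plays on the dashboard.
    """
    used = rx_bytes + tx_bytes  # in + out combined
    pct = 100.0 * used / LIMIT_BYTES
    # Thresholds at 70/90/95% as listed in the commit message.
    if pct >= 95:
        colour = "red"
    elif pct >= 90:
        colour = "orange"
    elif pct >= 70:
        colour = "yellow"
    else:
        colour = "green"
    return {
        "used_bytes": used,
        "percent_of_limit": pct,
        "remaining_bytes": LIMIT_BYTES - used,
        "daily_average_bytes": used * 86400 / range_s,
        "colour": colour,
    }


# Hypothetical figures: 6 TB in, 4 TB out, ten days into the month.
summary = month_to_date(6 * 10**12, 4 * 10**12, 10 * 86400)
print(summary)
```

For these sample figures the host has used 40% of the limit after ten days, averaging 1 TB per day.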
eno1 is the internet-facing interface on rabbit-01-psp that counts
toward the 25 TB housing quota. Scoping all six PromQL expressions to
{device="eno1"} makes the intent explicit and prevents spurious data
if additional interfaces are ever added to the host.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
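To make the scoping concrete, here is a minimal sketch of how a device-scoped PromQL expression could be assembled in Python. The helper name `scoped_increase` is hypothetical and is not taken from generate_dashboards.py; the six expressions actually changed by this commit are not reproduced here.

```python
def scoped_increase(metric, device, window="$__range"):
    """Build an increase() expression restricted to a single device.

    Hypothetical helper for illustration only.
    """
    return f'increase({metric}{{device="{device}"}}[{window}])'


rx = scoped_increase("node_network_receive_bytes_total", "eno1")
tx = scoped_increase("node_network_transmit_bytes_total", "eno1")
total_used = f"{rx} + {tx}"  # in + out combined
print(total_used)
```

Because the device label is fixed in the selector, an interface added to the host later would not change the result of such an expression.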
Summary
- Adds role ansible/pve-host-netmon/ that installs Grafana Alloy directly on Proxmox hosts to track per-interface network bandwidth
- Uses prometheus.exporter.unix with only the netdev collector: produces ~4–6 timeseries total, negligible Grafana Cloud quota impact
- Excludes Proxmox virtual interfaces (vmbr*, veth*, fwbr*, fwln*, fwpr*, tap*) via the configurable netmon_device_exclude regex
- Emits node_network_receive_bytes_total + node_network_transmit_bytes_total per physical NIC → enables increase(...[30d]) for monthly totals and alerting against the 25 TB housing limit

Role structure
- defaults/main.yml
- tasks/main.yml
- templates/alloy-config.alloy.j2
- handlers/main.yml
- inventory.yml: pve_hosts group (rabbit-01-psp placeholder)
- playbook.yml: -e @secrets.yml
- molecule/default/

Molecule test results
Verify checks: GPG keyring, alloy package, config dir/file, set_collectors = ["netdev"], device_exclude, remote_write URL, prometheus.exporter.unix block, CUSTOM_ARGS in /etc/default/alloy, service enabled, pve-exporter absent.

Deployment (rabbit-01-psp)
Before running, fill in the real SSH host and IP in inventory.yml, then provide secrets. secrets.yml must contain:
- grafana_api_key
- grafana_metrics_username
- grafana_prometheus_url

Test plan
- molecule test passes (converge + idempotence + verify)
- node_network_receive_bytes_total{site="bgy"}
- Virtual interfaces excluded (vmbr*, veth*, etc.)

🤖 Generated with Claude Code