Commit dac8fa4

feat: describe new metrics stack (#146)
* feat: describe new metrics stack
* fix(metrics): remove Loki dashboard

Refs NethServer/dev#7162
1 parent 7d6441c commit dac8fa4

4 files changed: +114 -8 lines changed

index.rst

+1
@@ -66,6 +66,7 @@ NethServer 8 administrator manual
    dnsmasq
    netdata
    piler
+   metrics
 
 .. toctree::
    :maxdepth: 2

metrics.rst

+103
@@ -0,0 +1,103 @@
.. _metrics-section:

==================
Metrics and alerts
==================

The monitoring stack is automatically installed on the leader node.

All nodes run `Node exporter <https://prometheus.io/docs/guides/node-exporter/>`_, which provides the node metrics endpoint.

The leader node runs:

- `Prometheus <https://prometheus.io/>`_ scrapes the node_exporter metrics endpoint of every node and stores the metrics on a local disk
- `Alertmanager <https://prometheus.io/docs/alerting/latest/alertmanager/>`_ sends alerts to the configured receivers
- `Grafana <https://grafana.com/>`_ visualizes the collected metrics; it is disabled by default

The monitoring stack does not require any configuration and automatically reconfigures itself when
nodes are added to or removed from the cluster.
When a node is promoted to leader, the monitoring stack is automatically installed on the new leader node
and removed from the old one.

.. note:: Metrics and alerts are not preserved when the leader node is switched.

Alerts
======

Prometheus automatically sends alerts to Alertmanager when a rule is triggered.
The current rules send alerts when:

- no swap is configured
- swap space is nearly full
- one or more backups have failed
- disk partitions are nearly full

If the machine has a valid subscription, the alerts are forwarded to the Nethesis portal, such as `my.nethesis.it <https://my.nethesis.it>`_
or `my.nethserver.com <https://my.nethserver.com>`_.

If the machine does not have a valid subscription, the alerts are visible only in the Grafana dashboard.
You can still configure the alerts to be sent to a specific email address. See the :ref:`mail-notifications` section.

Enable Grafana
==============

Grafana is an open-source platform for monitoring and observability. It allows you to query, visualize, alert on,
and understand your metrics no matter where they are stored.
Grafana provides you with tools to turn your time-series data into insightful graphs and visualizations.

By default, Grafana is not enabled. You can enable it by exposing it on a path of your choice.

To enable Grafana access, run the following command on the leader node: ::

  api-cli run module/metrics1/configure-module --data '{"prometheus_path": "", "grafana_path": "grafana"}'

Grafana will then be accessible at the following URL: ``https://<leader-node>/grafana``.
If you switched the leader, note that you may have to replace ``metrics1`` with the actual metrics module instance name.

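Because the module instance name can change after a leader switch, it may help to generate the command instead of editing it by hand. The following is a minimal sketch, not part of NethServer: the helper name ``configure_module_command`` is hypothetical, and it only assembles the same ``api-cli`` command line shown above from the two documented parameters.

```python
import json
import shlex


def configure_module_command(module_id, grafana_path="grafana", prometheus_path=""):
    """Return the api-cli command line that sets the Grafana and Prometheus paths.

    module_id is the metrics module instance name (for example "metrics1").
    This helper is illustrative only; it builds a string and runs nothing.
    """
    payload = {"prometheus_path": prometheus_path, "grafana_path": grafana_path}
    return " ".join([
        "api-cli", "run", f"module/{module_id}/configure-module",
        # shlex.quote wraps the JSON in single quotes so the shell passes it unchanged
        "--data", shlex.quote(json.dumps(payload)),
    ])


print(configure_module_command("metrics1"))
```

Calling the helper with a different instance name, such as ``configure_module_command("metrics2")``, changes only the ``module/<name>/configure-module`` part of the command.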

Default Grafana credentials are:

- username: ``admin``
- password: ``admin``

During the first login, you will be asked to change the password.

Grafana automatically displays:

- a dashboard with the metrics of all nodes, such as CPU load, memory usage, and disk space
- a dashboard for fired alerts

.. warning::

  If the leader node is switched, Grafana will be accessible on the new leader node but its configuration will be lost:
  you will need to set the admin password again and redo any dashboard customization.

Access Prometheus web interface
===============================

By default, the Prometheus web interface is not exposed to the public network.

If you need to troubleshoot the Prometheus configuration, you can expose the Prometheus web interface on a path of your choice.

To enable access to the Prometheus web interface, run the following command on the leader node: ::

  api-cli run module/metrics1/configure-module --data '{"prometheus_path": "prometheus", "grafana_path": "grafana"}'

Prometheus will then be accessible at the following URL: ``https://<leader-node>/prometheus``.

.. note:: The Prometheus web interface will be accessible from any IP address without authentication. Use with caution.

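Once the web interface is exposed, a quick troubleshooting step is to ask Prometheus which scrape targets are down. The sketch below uses the standard Prometheus HTTP API instant query (``/api/v1/query``) with the built-in ``up`` metric. It assumes the API is reachable under the same path prefix as the web interface (``https://<leader-node>/prometheus/api/v1/query``), which is not confirmed by this manual. The function names, host name and sample addresses are illustrative.

```python
import json
import urllib.parse
import urllib.request


def up_query_url(leader_host, prometheus_path="prometheus"):
    """Build the instant-query URL for the built-in ``up`` metric.

    Assumption: the HTTP API is served under the configured path prefix.
    """
    query = urllib.parse.urlencode({"query": "up"})
    return f"https://{leader_host}/{prometheus_path}/api/v1/query?{query}"


def down_targets(response_body):
    """Return the instances whose latest ``up`` sample is not 1."""
    data = json.loads(response_body)
    if data.get("status") != "success":
        raise ValueError("Prometheus query failed")
    return [
        sample["metric"].get("instance", "unknown")
        for sample in data["data"]["result"]
        # each sample value is [timestamp, "<value as string>"]
        if float(sample["value"][1]) != 1.0
    ]


def fetch_down_targets(leader_host, prometheus_path="prometheus"):
    """Query a live Prometheus; requires the web interface to be exposed."""
    url = up_query_url(leader_host, prometheus_path)
    with urllib.request.urlopen(url, timeout=10) as resp:
        return down_targets(resp.read().decode("utf-8"))


# Offline example with a hand-written response in the API's vector format.
sample = json.dumps({
    "status": "success",
    "data": {"resultType": "vector", "result": [
        {"metric": {"__name__": "up", "instance": "10.0.0.1:9100"},
         "value": [1700000000, "1"]},
        {"metric": {"__name__": "up", "instance": "10.0.0.2:9100"},
         "value": [1700000000, "0"]},
    ]},
})
print(down_targets(sample))
```

On a real cluster, ``fetch_down_targets("<leader-node>")`` would perform the HTTPS request. Since the exposed interface has no authentication, consider setting ``prometheus_path`` back to an empty string when troubleshooting is finished.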

.. _mail-notifications:

Mail notifications
==================

Mail notifications can be sent to users when an alert is fired or resolved.
The cluster needs an SMTP server to send the notifications, so first make sure to enable the :ref:`email-notifications` feature.
If mail notifications are not enabled, the alerts are visible only in the Grafana dashboard and are not sent to any email address.

Then, configure the mail notifications by running the following command on the leader node: ::

  api-cli run module/metrics1/configure-module --data '{"prometheus_path": "", "grafana_path": "grafana", "mail_to": ["[email protected]"], "mail_from": "[email protected]"}'

The ``mail_to`` parameter is a list of email addresses that will receive the alerts.
The ``mail_from`` parameter is the email address that will be used as the sender.
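Since ``mail_to`` must be a JSON list while ``mail_from`` is a single string, a small helper can guard against passing a bare address by mistake. This is a sketch under assumptions: the function name is hypothetical, the ``example.org`` addresses are placeholders, and the helper only assembles the command line shown above without running it.

```python
import json
import shlex


def mail_notification_command(module_id, mail_to, mail_from,
                              grafana_path="grafana", prometheus_path=""):
    """Return the api-cli command line that configures alert emails.

    mail_to must be a non-empty list of addresses; mail_from is one address.
    """
    if isinstance(mail_to, str) or not mail_to:
        raise ValueError("mail_to must be a non-empty list of addresses")
    payload = {
        "prometheus_path": prometheus_path,
        "grafana_path": grafana_path,
        "mail_to": list(mail_to),
        "mail_from": mail_from,
    }
    # Quote the JSON so the shell passes it to api-cli as a single argument
    return "api-cli run module/{}/configure-module --data {}".format(
        module_id, shlex.quote(json.dumps(payload))
    )


# Placeholder addresses, for illustration only
print(mail_notification_command(
    "metrics1", ["alerts@example.org"], "no-reply@example.org"
))
```

Passing a plain string such as ``"alerts@example.org"`` for ``mail_to`` raises ``ValueError`` instead of producing a payload with the wrong type.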

prometheus.rst

+9-7
@@ -1,26 +1,28 @@
-.. _metrics-section:
-
 .. _prometheus-section:
 
 ==========
 Prometheus
 ==========
 
-NethServer 8 uses a widely adopted monitoring stack composed by:
+.. note::
+   The Prometheus module is different from the one in :ref:`metrics-section`. This module is focused on monitoring
+   external services and applications.
+
+NethServer 8 includes a widely adopted monitoring stack composed of:
 
 - `Prometheus <https://prometheus.io/>`_ scrapes all metrics endpoint and stores them on a local disk
-- `Node exporter <https://prometheus.io/docs/guides/node-exporter/>`_ provides the node metrics endpoint
 - `Grafana <https://grafana.com/>`_ visualizes the collected metrics
 
+This is the same stack used by :ref:`metrics-section`, but this module is focused on monitoring
+external services and applications.
 
 You can install only one instance of **Prometheus**, usually on the leader node.
 Prometheus does not require any configuration and it will be exposed on a random URL.
 The URL is available on the Prometheus instance ``Status`` page. You can access it from the software center or
 from the application menu in the top-right corner.
 
-You should install the **node exporter** on each cluster node.
-To install it, access the :ref:`software_center-section` and look for the ``node_exporter`` application.
-Each time a new node with the exporter is installed, Prometheus will automatically collect the node metrics.
+Since core 3.5.0, the node_exporter is already installed as a core module on all nodes.
+Prometheus will automatically scrape metrics from all nodes.
 
 **Grafana** can be installed only on the leader node.
 After installation, you will need to configure the ``Host name`` with a valid FQDN to access the Grafana instance.

subscription.rst

+1-1
@@ -33,7 +33,7 @@ When a cluster has an active subscription, the following services are
 enabled:
 
 - Remote support by Nethesis
-- Resources monitoring and alerting
+- Resources :ref:`monitoring and alerting <metrics-section>`
 - Upload of leader node inventory
 - Scheduled updates for node operating systems, core components, and
   applications
