@@ -52,6 +52,12 @@ The MONITOR information includes the following data:

The metrics above are directly read from and stored in the monitoring database.

{{< alert title="GPU Monitoring Notes" color="success" >}}
- GPU metrics are aggregated across all detected GPUs. Depending on the metric, the aggregate is either a sum or an average.
- GPU load can spike very quickly (e.g. during inference). Because probes run at a configurable interval (`MONITOR_VM`, 30s by default in `monitord.conf`), short spikes may not be captured. Lower the interval to capture more detail. See the [monitoring system configuration]({{% relref "../../../product/cloud_system_administration/resource_monitoring/monitoring_system" %}}) for the available settings.
- GPU metrics may not behave as expected with MIG or vGPU configurations. Interpret the values with caution in those cases.
{{< /alert >}}
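
The effect of the probe interval on short GPU spikes can be illustrated with a small simulation. This is only a sketch: the load trace, the `gpu_load` and `sample` helpers, and the 5s interval are hypothetical, and it assumes each probe reads an instantaneous utilization value, which may not match how the real probes collect data.

```python
def gpu_load(t):
    """Hypothetical GPU utilization (%) at second t: idle except for a 10 s burst."""
    return 95.0 if 40 <= t < 50 else 5.0


def sample(interval, duration=120):
    """Values a probe would record when polling every `interval` seconds."""
    return [gpu_load(t) for t in range(0, duration, interval)]


coarse = sample(30)  # polls at t = 0, 30, 60, 90
fine = sample(5)     # polls at t = 0, 5, 10, ..., 115

print("peak seen at 30 s interval:", max(coarse))
print("peak seen at 5 s interval:", max(fine))
```

In this sketch the 30s probe never lands inside the burst, so the recorded peak stays at the idle level, while the 5s probe records it. A shorter interval captures more detail at the cost of more frequent probe executions.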

Additionally, the following derived metrics are calculated from the stored metrics and used for forecasting. These derived metrics are not stored in the database; they are computed on demand:

| Key | Description |