Component(s)
receiver/prometheus
What happened?
Description
When collecting metrics through kubernetes_sd_configs, occasional label changes may occur.
Steps to Reproduce
1. Use kubernetes_sd_configs to configure a Prometheus scrape job, e.g.:

```yaml
kubernetes_sd_configs:
  - role: pod
    namespaces:
      names:
        - default
    selectors:
      - role: pod
        label: label=value-1
```

At this point, occasional label changes may already occur. The following step makes it easier to detect when they do.
2. Use the cumulativetodelta processor in the Collector configuration to convert metrics to delta.
Expected Result
If the pod corresponding to the scraped instance does not change, the labels of the metrics should always remain the same.
Actual Result
Metric labels occasionally change even though the scraped pod does not.
Collector version
v0.95.0 (after troubleshooting, the latest version likely has the same issue)
Environment information
Environment
OS: kubernetes
Compiler: go 1.22.0
OpenTelemetry Collector configuration
```yaml
exporters:
  file:
    path: ./otel.log
extensions:
  health_check:
  memory_ballast:
    size_mib: "256"
processors:
  batch:
    send_batch_size: 100
    send_batch_max_size: 100
    timeout: 5s
  memory_limiter:
    check_interval: 1s
    limit_mib: 2056
  cumulativetodelta:
receivers:
  prometheus:
    report_extra_scrape_metrics: true
    trim_metric_suffixes: false
    config:
      scrape_configs:
        - job_name: 'scrape-prometheus-open-metrics'
          kubernetes_sd_configs:
            - role: pod
              namespaces:
                names:
                  - default
              selectors:
                - role: pod
                  field: spec.nodeName=$KUBE_NODE_NAME
          scrape_interval: 10s
          scrape_timeout: 10s
          metrics_path: /openMetrics
          scheme: http
          relabel_configs:
            - source_labels: [__address__]
              target_label: __address__
              regex: ([^:]+)(?::\d+)?
              replacement: $$1:26666
              action: replace
service:
  extensions:
    - health_check
    - memory_ballast
  pipelines:
    metrics/prometheus:
      receivers:
        - prometheus
      processors:
        - memory_limiter
        - cumulativetodelta
        - batch
      exporters:
        - file
```

Log output
Additional context
We discovered this issue because we observed that the value of the delta metric was the same as that of the cumulative metric, i.e., a sharp increase appeared at a certain point.
After troubleshooting, we believe that while OTel updates the target information, some scrapes read that target information concurrently, causing occasional, brief label errors. We believe the causes are as follows.
- For targets with the same URL, Prometheus updates `DiscoveredLabels` with the information of the most recently processed target.
- In the meantime, the prometheus receiver reads the target's `DiscoveredLabels`.
- Step one and step two run concurrently. If step two executes before step one has fully completed, step two reads the information of an unexpected target.
The following are examples.
Separately, a question: why does the prometheus receiver need to add nodeResources?
