Skip to content

add timeout to node stats#529

Closed
trk-bxn wants to merge 5 commits intoprometheus-community:masterfrom
trk-bxn:add_node_timeout
Closed

add timeout to node stats#529
trk-bxn wants to merge 5 commits intoprometheus-community:masterfrom
trk-bxn:add_node_timeout

Conversation

@trk-bxn
Copy link
Copy Markdown

@trk-bxn trk-bxn commented Feb 4, 2022

  • Added fix for metric gabs on high load on single node.

With the elasticsearch version 7 and Opensearch version 1 there is a new timeout parameter implement.

Elasticsearch 6 ignores this timeout parameter and this change have no impact.

Tested with elasticsearch 6.8 and opensearch 1.2

Elasticsearch
https://www.elastic.co/guide/en/elasticsearch/reference/current/cluster-nodes-stats.html#cluster-nodes-stats-api-query-params

Opensearch
https://opendistro.github.io/for-elasticsearch-docs/docs/elasticsearch/rest-api-reference/#nodesinfo

resolve #508

Signed-off-by: trk-bxn <tobias.koenigs@tonies.com>
Signed-off-by: trk-bxn <tobias.koenigs@tonies.com>
Signed-off-by: trk-bxn <tobias.koenigs@tonies.com>
Copy link
Copy Markdown
Contributor

@sysadmind sysadmind left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It looks like the httpclient passed into collector.NewNodes already has this set on the client side. I think you could pull it from the http.Client.Timeout field so the signature does not need to change. I am also not sure what the benefit is of the http query string parameter. Can you explain how this would benefit users of this exporter?

@trk-bxn
Copy link
Copy Markdown
Author

trk-bxn commented Jun 15, 2022

Hi,
the benefit is that if one single node have a high response time e.g. high load or bad network connection. The request from exporter to the cluster return with this fix metrics of the other nodes. Currently the exporter return no data for all nodes in this situation and create a gap in Grafana. With this fix this Gab is only for this single node.

This are two different timeout option. The http.Client.Timeout stop if the request is to long. But in the background the requested node try to get the metrics from the other nodes. This backend request have a elasticsearch internal timeout of default 30 seconds. If now the http.Client timedout 5 secends the requested noded is not ready to send all node metrics and return then an empty response. This is a problem if you have more nodes.

The elasticsearch 7 and opensearch 1 support now to change the default internal timeout for node metrics request and fix the empty response. I defined now to set the elasticsearch internal timeout to the same timeout of the elasticsearch exporter.

@trk-bxn
Copy link
Copy Markdown
Author

trk-bxn commented Jun 15, 2022

this fix this problem in newer elasticsearch version
#330

@fostdvelop
Copy link
Copy Markdown

Hi,
what is keeping this PR from being merged?
Is there a workaround that we can use for now, cause we think these gaps are causing false alarms for us.

@sysadmind
Copy link
Copy Markdown
Contributor

If you're seeing gaps in data, you want to try adjusting the flag --es.timeout=5s. This PR puts a timeout URL parameter on the nodes endpoint request, but I'm not sure how that help. The HTTP client that we use in the exporter will respect the --es.timeout parameter to timeout a request when it takes too long.

@sysadmind
Copy link
Copy Markdown
Contributor

@trk-bxn this PR is out of date and needs a rebase. I'm still not clear how the URL parameter makes a difference. Does elasticsearch do something special with the URL parameter? Can you link to the docs for that? I'm going to close for now but if you have information about how this effects elasticsearch, I would like to reconsider.

@sysadmind sysadmind closed this Nov 5, 2025
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Suggestion for timeout method on ES 7+ and opensearch

3 participants