-
Notifications
You must be signed in to change notification settings - Fork 54
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Adding regex matcher optimizer #266
Conversation
I think prometheus already does this: https://github.com/prometheus/prometheus/blob/92d69803606620505ed8f0226abb640a121ec1cd/tsdb/querier_test.go#L1840 |
The problem is that Thanos does not, and the Querier and all Stores end up parsing and compiling regular expressions for these cases. |
Can you share the code? @fpetkovski I think store gateway has the same optimization in the posting group code, right? |
In the receiver, we see high CPU usage for This is the function in Prometheus: https://github.com/prometheus/prometheus/blob/92d69803606620505ed8f0226abb640a121ec1cd/model/labels/regexp.go#L30. |
Got it. I think we saw same thing in Prometheus. A lot of time spend on |
Do we have any benchmark? I am wondering how much could this help by using equal matcher rather than regex matcher? |
Ah i finally understand; probably its grafana sending lots of regex matchers with literals! |
thsi is in prometheus now prometheus/prometheus#12434 |
case *parser.VectorSelector: | ||
for matcherIndex, lm := range e.LabelMatchers { | ||
if lm.Type == labels.MatchRegexp { | ||
if !strings.ContainsAny(lm.Value, hasRegexChars) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
the way in the prometheus pr was the only way i could come up with to make sure that its a literal without parsing though, is it problematic?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
it is a bit of a hack for sure
@yeya24 added the banchmark test. |
Signed-off-by: Sharad <[email protected]> make format Signed-off-by: Sharad <[email protected]> make format Signed-off-by: Sharad <[email protected]> removing regex Signed-off-by: Sharad <[email protected]> Adding banchmark test Signed-off-by: Sharad <[email protected]> Changing test name Signed-off-by: Sharad <[email protected]> Fix step generation in selectors (#271) Selector operators have a bug where if no series for a given timestamp exists, the step vector will not be generated for that step. This complicates the implementation of the absent function, and could also cause subtle bugs in downstream operators which expect all end-result steps to be present in a batch. This commit fixes the issue by making sure all steps for a batch are generated in the matrix and vector selectors before returning the batch. distribute: Add offset to engines to account for range selectors (#246) Range selectors in PromQL need to select points from before the start of the range query. When running the distributed optimizer, certain engines might not have sufficient range in the past to correctly calculate the result for certain steps in the range. To avoid returning incorrect results, this commit calculates the offset that needs to be added to the beginning of a query for each individual remote engine. This PR also adds detection for whether a sufficient time-based overlap exists between engines for queries to be safely distributed. If the overlap is not sufficient, the distributed optimizer will not modify the query and the engine will execute it in memory. Fix repo references (#274) This commit changes references from thanos-community to thanos-io. Signed-off-by: fpetkovski <[email protected]> Updating Prometheus to 905a0bd63a12fa328eac4b7f2674356c283bacac (#275) Signed-off-by: Alan Protasio <[email protected]> execution: fix scalar function (#265) Signed-off-by: Michael Hoffmann <[email protected]> Fix package reference (#276) Signed-off-by: Alan Protasio <[email protected]> Update support matrix (#273) Signed-off-by: Filip Petkovski <[email protected]> engine,test: sort results when fuzzing instant queries (#278) Signed-off-by: Michael Hoffmann <[email protected]> execution: fix stddev for large values (#267) Signed-off-by: Michael Hoffmann <[email protected]> execution: add relabel operator; add label_replace function (#279) Signed-off-by: Michael Hoffmann <[email protected]> execution: add absent function (#184) Signed-off-by: Michael Hoffmann <[email protected]> execution: fix khashaggregate comparison logic (#270) Signed-off-by: Michael Hoffmann <[email protected]> execution: remove unused labels (#284) Labels are handled by the VectorOperator Series method. Since the promql.Samples are used in the Next method they cannot affect labels. Deleting all those fields has no impact on correctness. Signed-off-by: Michael Hoffmann <[email protected]> Add support for histogram and scalar binop (#283) * Add support for histogram and scalar binop This commit adds support for operations between a histogram and a scalar. Signed-off-by: Filip Petkovski <[email protected]> * Take scalarVal out of loop Signed-off-by: Filip Petkovski <[email protected]> --------- Signed-off-by: Filip Petkovski <[email protected]> Add absent support for native histograms (#286) Absent should work on native histograms too. Signed-off-by: Filip Petkovski <[email protected]> Include histograms from remote instant queries (#288) Native histograms are not loaded from remote instant queries. This commit fixes that. Signed-off-by: Filip Petkovski <[email protected]> execution: fix step invariant operator (#281) Signed-off-by: Michael Hoffmann <[email protected]> Changing imported package name to thanos-io Signed-off-by: Sharad <[email protected]> linting Signed-off-by: Sharad <[email protected]>
Do we still need the optimizer with prometheus/prometheus#12434 merged upstream? |
ah yes we don't need it any more. I will close it. |
I think we need to bump prometheus dependency in thanos still though! |
Optimizing the regular expression if input string does not contain any regex literal. This optimization step will convert the input query from up{app=~"something"} to up{app="something"}