Use rollup queries in custom dimension reports #22647
Draft
Description
Alternative implementation of #22571 using `WITH ROLLUP`.
To avoid running two queries for custom dimension reports, this PR explores the option of using rollup queries.
Fixture changes and updated result expectations have been copied from #22571. The failing entry in the `test_reportLimitingdimension_2_rankingQuery__CustomDimensions.getCustomDimension_day.xml` fixture stems from a new `Others` row being added by the implementation of this PR. It still needs to be verified whether this change is actually correct or an unwanted side effect.

As `WITH ROLLUP` uses `NULL` for the summary rows, we need to prevent `NULL` values that already exist in a grouping column from changing the results. If we cannot `COALESCE` those values, we need to find a way to update the counter expressions to handle them.

Followup tasks
The current way of `SELECT FROM ( SELECT WITH ROLLUP ) ORDER BY` should be compatible with all supported database engines. Using `SELECT WITH ROLLUP ORDER BY` without a subquery could remove the added `filesort` from the rollup queries, improving the overall performance.

The follow-up should look not only at the SQL query performance, but also at the results being created. In the example queries each distinct `custom_dimension_1` value will create a separate rollup row, in addition to the ones already present. This could increase the data table size, which we should be aware of before committing to a full rollup implementation.

Local testing
Compared the current queries with the new implementation, both with and without the `COALESCE` addition, as my local dataset contained several entries with `idaction_url = NULL` that changed the results.

Current state
SQL Query
EXPLAIN Query
Result Data
ROLLUP (with duplicate NULLs)
SQL Query
EXPLAIN Query
Result Data
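For readers without access to the collapsed sections, the `NULL` collision can be reproduced on a toy table. This is only a minimal sketch: the table `demo_rollup_nulls` and its rows are hypothetical and not part of the Matomo schema or of the queries used in this PR.

```sql
-- Hypothetical toy table, NOT the real Matomo schema.
CREATE TABLE demo_rollup_nulls (
    custom_dimension_1 VARCHAR(50) NOT NULL,
    idaction_url       INT NULL
);

INSERT INTO demo_rollup_nulls (custom_dimension_1, idaction_url) VALUES
    ('blue', 1),
    ('blue', 1),
    ('blue', NULL),  -- a real NULL in the grouping column
    ('red',  2);

-- For 'blue' this returns two rows with idaction_url = NULL:
--   the group of real NULL values (hits = 1) and
--   the rollup subtotal for 'blue' (hits = 3).
-- From the column values alone they cannot be told apart.
SELECT custom_dimension_1, idaction_url, COUNT(*) AS hits
FROM demo_rollup_nulls
GROUP BY custom_dimension_1, idaction_url WITH ROLLUP;
```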
The duplicate `url = NULL` rows create problems for the data table: they interfere with the ranking counter and are not merged into the `__mtm_ranking_query_others__` row.

The query is only listed to highlight the problems of potential `NULL` values in a grouping column.

ROLLUP with COALESCE
SQL Query
EXPLAIN Query
Result Data
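The general shape of this variant can be sketched on a toy table, using the `SELECT FROM ( SELECT WITH ROLLUP ) ORDER BY` form mentioned under the followup tasks. Everything here is an assumption for illustration only: the table `demo_rollup_coalesce`, its rows, and the sentinel value `-1` are hypothetical and may differ from what the PR actually uses.

```sql
-- Hypothetical toy table, NOT the real Matomo schema.
CREATE TABLE demo_rollup_coalesce (
    custom_dimension_1 VARCHAR(50) NOT NULL,
    idaction_url       INT NULL
);

INSERT INTO demo_rollup_coalesce (custom_dimension_1, idaction_url) VALUES
    ('blue', 1),
    ('blue', 1),
    ('blue', NULL),
    ('red',  2);

-- Real NULLs are mapped to an assumed sentinel (-1) before grouping,
-- so a NULL url in the output should only come from a rollup row.
-- The outer SELECT applies ORDER BY outside the rollup query.
SELECT r.custom_dimension_1, r.url, r.hits
FROM (
    SELECT a.custom_dimension_1, a.url, COUNT(*) AS hits
    FROM (
        SELECT custom_dimension_1,
               COALESCE(idaction_url, -1) AS url
        FROM demo_rollup_coalesce
    ) AS a
    GROUP BY a.custom_dimension_1, a.url WITH ROLLUP
) AS r
ORDER BY r.custom_dimension_1, r.hits DESC;
```

A sentinel only works if it can never occur as a real value in the column, which would need to be checked against the actual data before relying on it.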