Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Use rollup queries in custom dimension reports #22647

Draft
wants to merge 3 commits into
base: 5.x-dev
Choose a base branch
from
Draft

Conversation

mneudert
Copy link
Member

@mneudert mneudert commented Oct 8, 2024

Description

Alternative implementation of #22571 using WITH ROLLUP

To avoid running two queries for custom dimension reports, this PR explores the option of using rollup queries.

Fixture changes and updated result expectations have been copied from #22571. The failing entry in the test_reportLimitingdimension_2_rankingQuery__CustomDimensions.getCustomDimension_day.xml fixture stems from a new Others row being added by the implementation of this PR. Needs to be verified if the changes are actually correct or a bad side effect.

As WITH ROLLUP uses NULL for the summary rows, we need to prevent those values changing the results. If we cannot COALESCE those values we need find a way to update the counter expressions to handle them.

Followup tasks

  1. Verify the query results are actually correct.
  2. Add more tests and further improve test data.
  3. Replace the subquery sorting with a sorted rollup if the database engine supports it.
    The current way of SELECT FROM ( SELECT WITH ROLLUP ) ORDER BY should be compatible with all supported database engines. Using SELECT WITH ROLLUP ORDER BY without a subquery could remove the added filesort from the rollup queries, improving the overall performance.
  4. Performance test against real data to ensure the performance does not degrade too much.
    This should not only look at the SQL query performance, but also the results being created. In the example queries each distinct custom_dimension_1 value will create a separate rollup row, in addition to the ones already present. This could lead to an increased data table size we should be aware of before committing to a full rollup implementation.

Local testing

Compared the current queries with the new implementation, both WITH and WITHOUT the COALESCE addition, as my local dataset contained several entries with idaction_url = NULL that changed the results.

Current state

SQL Query
EXPLAIN Query
Result Data

ROLLUP (with duplicate NULLs)

SQL Query
EXPLAIN Query
Result Data

The duplicate url = NULL rows are creating some problems with the data table. They are messing with the ranking counter and are not merged into the __mtm_ranking_query_others__ row.

The query is only listed to highlight the problems of potential NULL values in a grouping column.

ROLLUP with COALESCE

SQL Query
EXPLAIN Query
Result Data

Review

Copy link
Contributor

If you don't want this PR to be closed automatically in 28 days then you need to assign the label 'Do not close'.

@github-actions github-actions bot added the Stale The label used by the Close Stale Issues action label Oct 23, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Stale The label used by the Close Stale Issues action
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants