WIP: Reduce time spent normalizing #14049

Open
wants to merge 2 commits into
base: main
@alamb commented Jan 8, 2025

Still a Work In Progress 🚨

Which issue does this PR close?

Rationale for this change

Repeatedly re-normalizing equivalence classes is very expensive, especially when the sort order contains many expressions and many constants, as described in #13748 (comment)

A substantial share of the planning time for these queries is spent recomputing a normalized version of OrderingEquivalenceClass over and over.

What changes are included in this PR?

Always keep OrderingEquivalenceClass normalized (deprecate OrderingEquivalenceClass::normalized_oeq_class)
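
To illustrate the idea (not the actual DataFusion code), here is a minimal sketch of "normalize on write" instead of "normalize on every read". Everything in it is hypothetical: `NormalizedOrderings`, `SortExpr` as a plain `String`, and the simplified normalization rule (drop constants and repeated expressions). The real normalization in `OrderingEquivalenceClass` is more involved.

```rust
use std::collections::HashSet;

/// Hypothetical stand-in for a sort expression: just a column name.
type SortExpr = String;
type Ordering = Vec<SortExpr>;

/// Toy ordering equivalence class that is kept normalized at all times,
/// so readers never need to ask for a freshly normalized copy.
#[derive(Debug, Default)]
pub struct NormalizedOrderings {
    /// Expressions known to be constant; they carry no ordering information.
    constants: HashSet<SortExpr>,
    /// Invariant: every ordering stored here is already normalized.
    orderings: Vec<Ordering>,
}

impl NormalizedOrderings {
    pub fn new(constants: impl IntoIterator<Item = SortExpr>) -> Self {
        Self {
            constants: constants.into_iter().collect(),
            orderings: Vec::new(),
        }
    }

    /// Simplified normalization (an assumption for this sketch):
    /// remove constants and any repeated expression, keeping first occurrences.
    fn normalize(&self, ordering: Ordering) -> Ordering {
        let mut seen = HashSet::new();
        ordering
            .into_iter()
            .filter(|e| !self.constants.contains(e))
            .filter(|e| seen.insert(e.clone()))
            .collect()
    }

    /// Normalize once, at insertion time, and skip empty or duplicate results.
    pub fn add_ordering(&mut self, ordering: Ordering) {
        let normalized = self.normalize(ordering);
        if !normalized.is_empty() && !self.orderings.contains(&normalized) {
            self.orderings.push(normalized);
        }
    }

    /// Reads are a plain borrow: no recomputation per call.
    pub fn orderings(&self) -> &[Ordering] {
        &self.orderings
    }
}
```

With this shape the cost of normalization is paid once per mutation rather than once per lookup, which is the trade-off this PR is aiming for.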

Are these changes tested?

By CI.

Manually testing performance, on datafusion 44:

Running with 10 columns...completed in 36.26575ms
Running with 20 columns...completed in 106.079083ms
Running with 30 columns...completed in 432.379833ms
Running with 40 columns...completed in 1.415281375s
Running with 50 columns...completed in 3.870661417s
Running with 60 columns...completed in 9.034940917s

Currently (Jan 8):

Running with 10 columns...completed in 21.796ms
Running with 20 columns...completed in 80.530584ms
Running with 30 columns...completed in 318.37525ms
Running with 40 columns...completed in 1.091937708s
Running with 50 columns...completed in 2.966728667s
Running with 60 columns...completed in 7.124604708s

So this is better (roughly 21% faster at 60 columns, 9.03s down to 7.12s) but still not good enough.

Are there any user-facing changes?

Development

Successfully merging this pull request may close these issues.

Exponential planning time (100s of seconds) with UNION and ORDER BY queries