-
Notifications
You must be signed in to change notification settings - Fork 8
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
E-Matching Non-Canonical Nodes #34
Comments
Thanks for reporting this careful analysis @Tarmean. I am wary of these exponential slowdowns, but it would be very good to have a minimal reproducer example to investigate further. @folivetti may be running into this, but will hopefully let me know if in his investigation it turns out to be a problem with the system (and not with hegg). |
edit: never mind again, this works without my height calculation from the other issue. Sorry about the noise. |
I've been trying two things, so I'm gonna split this into two parts in the hopes it remains somewhat comprehensible: Testing performanceIt's not properly self-contained but I have been using the test cases as benchmarks. Here is a hacked-together version. master...Tarmean:hegg:exponential_example This limits the iterations to 8, and disables scheduling. When I run
But normClasses is 30% slower for smaller rewrites! A workaround would to count merges, and only clean up if there have been >n merges since the last time. But maybe it is simpler to leave it be since it only becomes relevant with big examples like 50s of non-scheduled rewriting? General slowdownMaybe there is a second thing going on. Even with normClasses the cleanup the test is 1s for 6 iterations, 4s for 7 iterations, and 53s for 8 iterations. My understanding of the current algorithm is roughly:
Where gather_var loops over the database and finds variable assignments which are possible for all pattern constraints with the existing substitutions. If the DB size increases, this slows down gather_var. If gather_var returns more items, the nested gather_var is called more frequently. So if they scale together this maybe causes quadratic behavior? Tbh I wasn't smart enough to fully understand the explanation in the relational e-matching paper. I may be totally off with my understanding. I found the implementation strategy of this paper interesting. It can freely move between generic join and hash join. But it builds an index for each atom in the query! Even when sharing indexes with compatible variable orderings across all rewrite rules the index building could dominate the runtime for small databases. I have an experimental implementation of this here but the code is in serious need of a cleanup https://github.com/Tarmean/hegg/blob/master/src/Data/Equality/Compiler/QueryExecution.hs#L73 . |
I have been playing with alternative e-matching implementations and tried comparing my versions with the existing matching. Largely the differences are fairly minor, but dumping all rule triggers for the test suite showed some interesting differences:
Not sure if those are critical issues, but the scheduler blocking extra rules makes performance comparisons tricky. And disabling the scheduler seems to cause occasional exponential slowdowns for the existing genericJoin, I haven't figured out if its something about the join algorithm or if the duplicates actually cause exponentially more results.
The text was updated successfully, but these errors were encountered: