-
Notifications
You must be signed in to change notification settings - Fork 1.5k
[red-knot] simplify "removing" in UnionBuilder::add #16947
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[red-knot] simplify "removing" in UnionBuilder::add #16947
Conversation
It's now O(m) instead of O(n + m) and easier to read.
|
Thanks for the PR! Codspeed is neutral on this PR, but I'm hesitant to make lots of changes to this code motivated by performance until we have better benchmarks for code that involves lots of large unions. Certain third-party libraries often do create large unions, which has created lots of pathological performance problems for other type checkers (I wroteup a list of some examples in astral-sh/ty#240). I sort of think we should prioritise adding better benchmarks for code with big unions before making too many changes like this |
}); | ||
self.elements.push(to_add); | ||
if let Some((&first, rest)) = to_remove.split_first() { | ||
self.elements[first] = to_add; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I do like the improvement overall as it simplifies the code quiet a bit and it also reduces runtime complexity
I agree that the existing code is somewhat hard to reason about.
I'm getting the impression that this new code changes behavior because it now always swaps the first element to remove with the added element where, I think, previously, the to_add
was always appended at the end.
A version of this that's closer to the original is:
match to_remove[..] {
[] => self.elements.push(to_add),
[index] => self.elements[index] = to_add,
_ => {
// We iterate in descending order to keep remaining indices valid after `swap_remove`.
for index in to_remove.into_iter().rev() {
self.elements.swap_remove(index);
}
self.elements.push(to_add);
}
}
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm getting the impression that this new code changes behavior because it now always swaps the first element to remove with the added element where, I think, previously, the to_add was always appended at the end.
Yes. It does.
Also it now changes the order of elements
by multiple swap_remove
-s (before it was the order-preserving algorithm of removing).
But AFAIU there are no order-related requirements for this code (e.g. it could self.elements[index] = to_add
to the middle). It's not important to preserve previous behavior, so I chose the current version in favor of simplicity.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In general we prefer not to change the order of union elements if it's possible and it's cheap to avoid doing so. However, if it leads to significantly worse performance to retain the order of union elements, we're okay with mutating the order (that's the reason why we already use swap_remove
in several places).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ok. In this case it is strange to have the optimization (?) self.elements[index] = to_add
in one instance, and do not have it in another.
The option, where we append to_add
always to the end:
// We iterate in descending order to keep remaining indices valid after `swap_remove`.
for index in to_remove.into_iter().rev() {
self.elements.swap_remove(index);
}
self.elements.push(to_add);
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think we want to generally preserve the order of user-provided union annotations, but once we are modifying a union, I think it's a pretty weak preference and easily overridden by simplicity or performance.
I also note that no existing test requires changing here, suggesting that the difference in behavior is minimal in practice and doesn't seem to affect common scenarios. And as @alex-700 points out, the new simpler implementation is arguably more consistent in how it handles the ordering.
I would be inclined to accept this change as-is.
@AlexWaygood thanks for the quick response! I completely agree with you that performance-wise changes should be measured through better benchmark. Thoughts about performance:
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for the PR!
}); | ||
self.elements.push(to_add); | ||
if let Some((&first, rest)) = to_remove.split_first() { | ||
self.elements[first] = to_add; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think we want to generally preserve the order of user-provided union annotations, but once we are modifying a union, I think it's a pretty weak preference and easily overridden by simplicity or performance.
I also note that no existing test requires changing here, suggesting that the difference in behavior is minimal in practice and doesn't seem to affect common scenarios. And as @alex-700 points out, the new simpler implementation is arguably more consistent in how it handles the ordering.
I would be inclined to accept this change as-is.
Summary
Simplify "removing" in UnionBuilder::add
It's now O(m) instead of O(n + m) and easier to read.
Test Plan
cargo test (incl. mdtest)