Skip to content

[red-knot] simplify "removing" in UnionBuilder::add #16947

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 1 commit into from
Mar 24, 2025

Conversation

alex-700
Copy link
Contributor

Summary

Simplify "removing" in UnionBuilder::add
It's now O(m) instead of O(n + m) and easier to read.

Test Plan

cargo test (incl. mdtest)

It's now O(m) instead of O(n + m) and easier to read.
@AlexWaygood AlexWaygood added the ty Multi-file analysis & type inference label Mar 24, 2025
Copy link
Contributor

mypy_primer results

No ecosystem changes detected ✅

@AlexWaygood
Copy link
Member

Thanks for the PR! Codspeed is neutral on this PR, but I'm hesitant to make lots of changes to this code motivated by performance until we have better benchmarks for code that involves lots of large unions. Certain third-party libraries often do create large unions, which has created lots of pathological performance problems for other type checkers (I wroteup a list of some examples in astral-sh/ty#240). I sort of think we should prioritise adding better benchmarks for code with big unions before making too many changes like this

});
self.elements.push(to_add);
if let Some((&first, rest)) = to_remove.split_first() {
self.elements[first] = to_add;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I do like the improvement overall as it simplifies the code quiet a bit and it also reduces runtime complexity

I agree that the existing code is somewhat hard to reason about.

I'm getting the impression that this new code changes behavior because it now always swaps the first element to remove with the added element where, I think, previously, the to_add was always appended at the end.

A version of this that's closer to the original is:

match to_remove[..] {
    [] => self.elements.push(to_add),
    [index] => self.elements[index] = to_add,
    _ => {
        // We iterate in descending order to keep remaining indices valid after `swap_remove`.
        for index in to_remove.into_iter().rev() {
            self.elements.swap_remove(index);
        }

        self.elements.push(to_add);
    }
}

Copy link
Contributor Author

@alex-700 alex-700 Mar 24, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm getting the impression that this new code changes behavior because it now always swaps the first element to remove with the added element where, I think, previously, the to_add was always appended at the end.

Yes. It does.

Also it now changes the order of elements by multiple swap_remove-s (before it was the order-preserving algorithm of removing).

But AFAIU there are no order-related requirements for this code (e.g. it could self.elements[index] = to_add to the middle). It's not important to preserve previous behavior, so I chose the current version in favor of simplicity.

Copy link
Member

@AlexWaygood AlexWaygood Mar 24, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In general we prefer not to change the order of union elements if it's possible and it's cheap to avoid doing so. However, if it leads to significantly worse performance to retain the order of union elements, we're okay with mutating the order (that's the reason why we already use swap_remove in several places).

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok. In this case it is strange to have the optimization (?) self.elements[index] = to_add in one instance, and do not have it in another.

The option, where we append to_add always to the end:

// We iterate in descending order to keep remaining indices valid after `swap_remove`.
for index in to_remove.into_iter().rev() {
    self.elements.swap_remove(index);
}
self.elements.push(to_add);

Copy link
Contributor

@carljm carljm Mar 24, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we want to generally preserve the order of user-provided union annotations, but once we are modifying a union, I think it's a pretty weak preference and easily overridden by simplicity or performance.

I also note that no existing test requires changing here, suggesting that the difference in behavior is minimal in practice and doesn't seem to affect common scenarios. And as @alex-700 points out, the new simpler implementation is arguably more consistent in how it handles the ordering.

I would be inclined to accept this change as-is.

@alex-700
Copy link
Contributor Author

@AlexWaygood thanks for the quick response!
This PR was motiviert mostly by simplicity and consistency (the pattern is already used here, here, and here).

I completely agree with you that performance-wise changes should be measured through better benchmark.

Thoughts about performance:

  • it is unlikely that the new code will be worse, than the current one.
  • before the code [from PR] we will spend the time in the loop. So:
  • It is non-trivial to challenge one implementation or another for perf.

Copy link
Contributor

@carljm carljm left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the PR!

});
self.elements.push(to_add);
if let Some((&first, rest)) = to_remove.split_first() {
self.elements[first] = to_add;
Copy link
Contributor

@carljm carljm Mar 24, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we want to generally preserve the order of user-provided union annotations, but once we are modifying a union, I think it's a pretty weak preference and easily overridden by simplicity or performance.

I also note that no existing test requires changing here, suggesting that the difference in behavior is minimal in practice and doesn't seem to affect common scenarios. And as @alex-700 points out, the new simpler implementation is arguably more consistent in how it handles the ordering.

I would be inclined to accept this change as-is.

@carljm carljm merged commit 68ea2b8 into astral-sh:main Mar 24, 2025
23 checks passed
@alex-700 alex-700 deleted the latyshev/builder-simplify-remove branch June 10, 2025 20:46
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
ty Multi-file analysis & type inference
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants