resolvelib: emit Requires-Python dependency first #13270

Merged: 3 commits, Mar 7, 2025

ichard26
Member

@ichard26 ichard26 commented Mar 6, 2025

Revived version of #11398. This is a rewrite of #13160, focused on minimizing the diff to the strictly necessary changes.

Closes #11398.
Fixes #13146.
Fixes #11142.

@pfmoore PTAL. I believe I have addressed all of your concerns (namely the confusion with the tests and adding a comment about why the change to a generator is important).

ichard26 Richard Si
This makes the resolver always inspect Requires-Python first when
checking a candidate's consistency, ensuring that no other candidates
are prepared if the Requires-Python check fails.

This regression was masked due to a broken test which checked for the
(nonpresence of the) wrong package name.

---

The resolvelib provider was also updated to return dependencies lazily.

While ideally we wouldn't prepare candidates unnecessarily, pip has grown
numerous metadata checks (for reporting bad metadata, skipping candidates
with unsupported legacy metadata, etc.) so it's infeasible to stop
preparing candidates upon creation (without a serious architectural
redesign). However, we can create the candidates one-by-one as they're
processed instead of all dependencies at once.

This is necessary so the resolver can process Requires-Python first
without processing other dependencies.

Co-authored-by: Tzu-ping Chung <[email protected]>
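
To illustrate why returning dependencies lazily matters, here is a minimal sketch. It is not pip's actual code: `make_requirement`, `iter_dependencies`, `check_consistency`, and the `PREPARED` list are hypothetical names invented for this example, and the "preparation" step is simulated by appending to a list. The point is only that a generator lets the consumer stop after the Requires-Python check without ever creating the remaining dependencies.

```python
from typing import Iterable, Iterator, List, Optional

# Hypothetical log of which dependencies were "prepared" (an expensive step in practice).
PREPARED: List[str] = []


def make_requirement(name: str) -> str:
    """Stand-in for costly work such as fetching and validating metadata."""
    PREPARED.append(name)
    return name


def iter_dependencies(requires_python: Optional[str], deps: List[str]) -> Iterator[str]:
    """Yield the Requires-Python pseudo-dependency first, then the rest, one at a time."""
    if requires_python is not None:
        yield make_requirement(f"<Python {requires_python}>")
    for dep in deps:
        # Only runs if the consumer keeps iterating.
        yield make_requirement(dep)


def check_consistency(dependencies: Iterable[str], python_ok: bool) -> bool:
    """Toy consistency check: bail out as soon as the Python constraint fails."""
    for dep in dependencies:
        if dep.startswith("<Python") and not python_ok:
            return False
    return True


if __name__ == "__main__":
    result = check_consistency(iter_dependencies(">=3.12", ["a", "b"]), python_ok=False)
    print(result, PREPARED)
```

With an eager list instead of a generator, `a` and `b` would have been prepared before the check even started.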
@ichard26
Member Author

ichard26 commented Mar 6, 2025

@notatallshaw Given the actual resolvelib changes are still the same from #13160, I'm going to consider your review still valid... however, I'm curious as to whether PR #13253 will reduce the small overhead introduced here as you explained earlier.

@ichard26
Member Author

ichard26 commented Mar 6, 2025

While I'm here, I'm curious: do people read the commit messages when reviewing a PR? I strive to provide additional context and generally write good, informative commit messages, but I'm wondering if they're too hidden. I usually read the commit messages, but I tend to be a slow reviewer anyway.

@pfmoore
Member

pfmoore commented Mar 6, 2025

do people read the commit messages when reviewing a PR?

Personally, I tend to just review the final "all commits" diff. Going through commit by commit doesn't really work for me, I find it too easy to lose context while reviewing. That's probably more a comment on my review process than on the value of having well structured and described commits though 🙄

Member

@pfmoore pfmoore left a comment

LGTM

@notatallshaw
Member

notatallshaw commented Mar 7, 2025

@notatallshaw Given the actual resolvelib changes are still the same from #13160, I'm going to consider your review still valid... however, I'm curious as to whether PR #13253 will reduce the small overhead introduced here as you explained earlier.

Yes, the cost of preferring Requires-Python is now the lowest of any requirement: it will in general immediately short-circuit the preference calculation, ignoring all other options.

@ichard26
Member Author

ichard26 commented Mar 7, 2025

Yes, the cost of preferring Requires-Python is now the lowest of any requirement: it will in general immediately short-circuit the preference calculation, ignoring all other options.

You're going to need to ELI5 this :)

What does it mean to prefer a "candidate"? If a candidate is preferred, is it the first to be attempted while pinning a package?

@notatallshaw
Member

notatallshaw commented Mar 7, 2025

What does it mean to prefer a "candidate"? If a candidate is preferred, is it the first to be attempted while pinning a package?

The terminology is overloaded here; I don't think I said "candidate"? I'm going to try my best to clarify, but be aware I'm not in front of my computer right now:

In each resolution round there is a choice of requirements, and the provider must tell the resolver which one is preferred via the get_preference method. Once a requirement is chosen, a candidate is selected to attempt to pin that requirement.

Prior to resolvelib 1.1, get_preference was called for every unpinned requirement in each round. But now, for Requires-Python requirements, the get_preference call is short-circuited.

@ichard26
Member Author

ichard26 commented Mar 7, 2025

Ah, that's my bad. I definitely pulled the "candidate" term out of thin air. I don't think I understand why get_preference needs to be called for every unsatisfied requirement, but that's more so a reflection of my utter lack of knowledge of the resolution logic. I should take a deeper look at the logic at some point... Either way, that does help me to understand the improvement. Thanks for taking the time to explain it!

@ichard26
Member Author

ichard26 commented Mar 7, 2025

Thanks @pfmoore and @notatallshaw for reviewing!

@ichard26 ichard26 enabled auto-merge (squash) March 7, 2025 23:24
@pfmoore
Member

pfmoore commented Mar 7, 2025

A candidate is a specific project/version item ("foo 1.0", for example), whereas a requirement is a specification of what is valid ("foo >= 1.0"). The resolution algorithm is, at the most basic level, a process of picking a requirement from the set that still needs resolving, and "pinning" it by choosing a specific candidate. That proceeds until a resolution is found, or until we hit a dead end, at which point we backtrack, unpinning things until we can try a different solution.

I don't fully remember how get_preference fits in, and I think it's changed since I worked on the resolver, but basically when you're looking for something to pin next, you call get_preference on each requirement to get a priority order. The highest priority requirement gets pinned first.

I think the point of @notatallshaw's comment is that now, if we can see a Requires-Python we prioritise that, as it can be dealt with without doing all the work of getting dependencies and analysing them that get_preference normally involves. But I'd really need to read the current code to confirm that, so take it as at best an educated guess as to what's going on...
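
The pin-and-backtrack loop described above can be sketched as a toy resolver. This is only an illustration under simplifying assumptions, not resolvelib's algorithm: the `INDEX` data and the `resolve` function are hypothetical, versions are plain integers, and a requirement is just a set of allowed versions.

```python
from typing import Dict, Optional, Set

# Hypothetical package index: name -> {version: {dependency name: allowed versions}}
INDEX: Dict[str, Dict[int, Dict[str, Set[int]]]] = {
    "foo": {1: {"bar": {1, 2}}, 2: {"bar": {3}}},
    "bar": {1: {}, 2: {}},
}


def resolve(
    requirements: Dict[str, Set[int]], pins: Dict[str, int]
) -> Optional[Dict[str, int]]:
    """Pin one requirement at a time; return None to signal a dead end."""
    # A pinned candidate that violates any requirement is a dead end.
    for name, allowed in requirements.items():
        if name in pins and pins[name] not in allowed:
            return None
    unpinned = [name for name in requirements if name not in pins]
    if not unpinned:
        return pins  # every requirement is satisfied by a pinned candidate
    # A real resolver would consult a preference heuristic here.
    name = unpinned[0]
    for version in sorted(INDEX.get(name, {}), reverse=True):  # newest first
        if version not in requirements[name]:
            continue
        # Merge the candidate's dependencies into the requirement set.
        merged = dict(requirements)
        for dep, allowed in INDEX[name][version].items():
            merged[dep] = merged.get(dep, allowed) & allowed
        result = resolve(merged, {**pins, name: version})
        if result is not None:
            return result
        # Otherwise: backtrack and try the next candidate.
    return None


if __name__ == "__main__":
    print(resolve({"foo": {1, 2}}, {}))
```

Here "foo 2" is tried first, fails because no "bar 3" exists, and the resolver backtracks to "foo 1".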

@notatallshaw
Member

notatallshaw commented Mar 7, 2025

I don't think I understand why get_preference needs to be called for every unsatisfied requirement, but that's more so a reflection of my utter lack of knowledge of the resolution logic

Because of this line: https://github.com/sarugaku/resolvelib/blob/1.1.0/src/resolvelib/resolvers/resolution.py#L440

A resolver, in general, needs to know which unsatisfied requirement to try next. get_preference drives this choice for resolvelib, and resolvelib naively assumes that calling get_preference for every unsatisfied requirement is cheap, taking the min as the "best" preference.

The new narrow_requirement_selection method allows the set of requirements passed to get_preference to be narrowed, saving the cost of calling it so many times (in fact, in the case of Requires-Python, it is narrowed to calling it 0 times).
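
A rough sketch of that selection step follows. It mirrors the idea only: the functions below take simplified, hypothetical signatures (the real provider methods receive more arguments), `REQUIRES_PYTHON` is an invented identifier, and the preference heuristic is a placeholder.

```python
from typing import List

# Hypothetical identifier standing in for the Requires-Python requirement.
REQUIRES_PYTHON = "<Python from Requires-Python>"

# Records every identifier that get_preference was asked about.
calls: List[str] = []


def get_preference(identifier: str) -> int:
    """Placeholder heuristic: lower value means 'resolve this first'."""
    calls.append(identifier)
    return len(identifier)


def narrow_requirement_selection(identifiers: List[str]) -> List[str]:
    """If Requires-Python is unsatisfied, it is the only thing worth considering."""
    if REQUIRES_PYTHON in identifiers:
        return [REQUIRES_PYTHON]
    return identifiers


def choose_next(unsatisfied: List[str]) -> str:
    narrowed = narrow_requirement_selection(unsatisfied)
    if len(narrowed) == 1:
        return narrowed[0]  # no preference calls needed at all
    return min(narrowed, key=get_preference)


if __name__ == "__main__":
    print(choose_next(["requests", REQUIRES_PYTHON, "urllib3"]), calls)
    print(choose_next(["requests", "idna"]), calls)
```

In the first call the selection is narrowed to a single requirement, so `get_preference` is never invoked; in the second it is invoked once per remaining requirement.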

@ichard26 ichard26 merged commit 5b23c59 into pypa:main Mar 7, 2025
28 checks passed
@ichard26
Member Author

ichard26 commented Mar 7, 2025

but basically when you're looking for something to pin next, you call get_preference on each requirement to get a priority order. The highest priority requirement gets pinned first.

Ah, I thought get_preference operated on all unpinned requirements at once (producing an ordered list, or at least returning the highest priority requirement) which is why I was confused by the need to call it numerous times per resolution round.

That makes much more sense, indeed 🙂

@ichard26 ichard26 deleted the bug/requires-python-take2 branch March 8, 2025 04:19
@pfmoore
Member

pfmoore commented Mar 8, 2025

Thanks @notatallshaw - the point I'd forgotten was that get_preference assumes that calculating the preference is cheap (or rather, that a cheap calculation is sufficient - after all, it's only intended as a heuristic). That also clarifies for me the role of narrow_requirement_selection, which I hadn't yet managed to fit into my mental model.

@github-actions github-actions bot locked as resolved and limited conversation to collaborators Mar 24, 2025