
Introduce resumable downloads with --resume-retries #12991

Open
wants to merge 33 commits into
base: main

Conversation


@gmargaritis gmargaritis commented Oct 4, 2024

Resolves #4796

Introduced the --resume-retries option in order to allow resuming incomplete downloads in case of dropped or timed-out connections.

This option additionally uses the values specified for --retries and --timeout for each resume attempt, since they are passed in the session.

Used 0 as the default in order to keep backwards compatibility.

This PR is based on #11180

The downloader will make new requests and attempt to resume downloading using a Range header. If the initial response includes an ETag (preferred) or Date header, the downloader will ask the server to resume downloading only when it is safe (i.e., the file hasn't changed since the initial request) using an If-Range header.

If the server responds with a 200 (e.g. if the server doesn't support partial content or can't check if the file has changed), the downloader will restart the download (i.e. start from the very first byte); if the server responds with a 206 Partial Content, the downloader will resume the download from the partially downloaded file.
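The request/response handling described above can be sketched roughly as follows. This is a minimal illustration, not the code from this PR: the function names `build_resume_headers` and `offset_after_response` are hypothetical, and the sketch assumes the initial response headers are available as a plain mapping with exact-case keys.

```python
from typing import Dict, Mapping


def build_resume_headers(
    bytes_received: int, initial_headers: Mapping[str, str]
) -> Dict[str, str]:
    """Build headers for a request that resumes after `bytes_received` bytes."""
    # Ask only for the bytes that are still missing.
    headers = {"Range": f"bytes={bytes_received}-"}
    # Prefer the ETag, fall back to Date; with neither, no If-Range is sent.
    validator = initial_headers.get("ETag") or initial_headers.get("Date")
    if validator:
        headers["If-Range"] = validator
    return headers


def offset_after_response(status_code: int, bytes_received: int) -> int:
    """Return the file offset at which the response body should be written."""
    if status_code == 206:
        # Partial Content: the server honoured the range, so append.
        return bytes_received
    if status_code == 200:
        # Full body: range unsupported or file changed, so start over.
        return 0
    raise ValueError(f"unexpected status code: {status_code}")
```

With an ETag of `"abc"` and 1024 bytes already on disk, the resume request would carry `Range: bytes=1024-` and `If-Range: "abc"`.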

yichi-yang and others added 3 commits September 26, 2024 21:26

Verified

This commit was signed with the committer’s verified signature.
gmargaritis George Margaritis
- Added --resume-retries option to allow resuming incomplete downloads
- Setting --resume-retries=N allows pip to make N attempts to resume downloading, in case of dropped or timed out connections
- Each resume attempt uses the values specified for --retries and --timeout internally

Signed-off-by: gmargaritis <[email protected]>
@gmargaritis
Author

I'm guessing the CI fails because of the new linter rules introduced in 102d818

@thk686

thk686 commented Oct 4, 2024

Does this do rsync-style checksums? That would increase reliability.

@notatallshaw
Member

I'm guessing the CI fails because of the new linter rules introduced in 102d818

This is the CI fix; CI will keep failing until it's merged: #12964

@gmargaritis
Author

Hey @notatallshaw 👋

Is there anything that I can do to move this one forward?

@notatallshaw
Member

notatallshaw commented Dec 11, 2024

Is there anything that I can do to move this one forward?

A pip maintainer needs to take up the task of reviewing it, as we're all volunteers it's a matter of finding time.

I think my main concern would be the behavior when interacting with index servers that behave badly, e.g. give the wrong content length (usually 0). Your description looks good to me, but I haven't had time to look over the code yet.
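One possible way to guard against a misreported length, shown only as a sketch of the concern above and not as what this PR does: treat a missing, non-numeric, or zero Content-Length as "unknown" and never flag a download as incomplete on that basis. The helper names are hypothetical.

```python
from typing import Mapping, Optional


def expected_length(headers: Mapping[str, str]) -> Optional[int]:
    """Return the advertised body length, or None if it cannot be trusted."""
    raw = headers.get("Content-Length")
    if raw is None:
        return None
    try:
        length = int(raw)
    except ValueError:
        return None
    # Assumption: a length of 0 (or less) is treated as "unknown".
    return length if length > 0 else None


def is_incomplete(bytes_received: int, headers: Mapping[str, str]) -> bool:
    """Report an incomplete download only when the expected length is known."""
    total = expected_length(headers)
    return total is not None and bytes_received < total
```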

@gmargaritis
Author

A pip maintainer needs to take up the task of reviewing it, as we're all volunteers it's a matter of finding time.

Yeah, I know how it goes, so no worries!

If you need any clarifications or would like me to make changes, I'd be happy to help!

@art-ignatev

Any chance that it'll be merged soon?

@notatallshaw notatallshaw added this to the 25.1 milestone Feb 1, 2025
@notatallshaw
Member

I've had an initial cursory glance at this PR and it appears to be sufficiently high quality.

I've also run the functionality locally (I selected a large wheel to download and then disconnected my WiFi midway through the download) and it has a good UX.

My main concern, although this is a ship that has probably sailed, is it would be nice for pip not to have to directly handle HTTP intricacies and leave that to a separate library.

I can’t promise a full review or other maintainers will agree, but I am adding it to the 25.1 milestone for it to be tracked.

@pfmoore
Member

pfmoore commented Feb 1, 2025

The PR looks good, although I’m not a http expert so I can’t comment on details like status and header handling. Like @notatallshaw I wish we could leave this sort of detail to a 3rd party library, but that would be a major refactoring. Add this PR (along with cert handling, parallel downloads, etc) to the list of reasons we should consider such a refactoring, but in the meantime I’m in favour of adding this.

@pfmoore
Member

pfmoore commented Feb 1, 2025

There isn’t an “approve with conditions” button, but I approve this change on the basis that someone who understands http should check the header and status handling.

@ichard26
Member

Does this do rsync-style checksums? That would increase reliability.

@thk686 I'm very late to the party, but could you elaborate on how checksums come into play? AFAIK, indices don't serve the checksums of their distributions so there is no way pip could double check the download wasn't corrupted unless the checksums were given by the user. This PR uses conditional range requests (via the If-Range HTTP request header) which will avoid the issue of the file being changed on the server in-between requests.
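For the case mentioned above where the user does supply a checksum, verification after the download could look roughly like this. It is a generic sketch using only `hashlib`; the function names are hypothetical and this is not pip's hash-checking code.

```python
import hashlib


def sha256_of(path: str, chunk_size: int = 64 * 1024) -> str:
    """Hash a file in chunks so large downloads need not fit in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_expected(path: str, expected_hex: str) -> bool:
    """Compare the file's SHA-256 with a user-supplied hex digest."""
    return sha256_of(path) == expected_hex.strip().lower()
```

A check like this catches corruption of the final file, which complements the If-Range validator: the validator guards against the file changing between requests, while the hash guards the assembled result.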

@thk686

thk686 commented Mar 13, 2025 via email

@gmargaritis
Author

gmargaritis commented Mar 16, 2025

@ichard26

I'd prefer if pip's behaviour emulated that of a browser where it caches the incomplete download so the download can be resumed at some later point, but I realize that would significantly increase the complexity of the implementation (and it'd also introduce some friction as the download would have to be manually restarted).

There has been some discussion around this in the past[1] and I’d pretty much prefer it. However, I think that it’s out of scope for this first step in implementing resumable downloads, considering the amount of work needed.

Would it be possible to default to allowing a few (1-3) resume attempts? That way, if the download fails halfway through, the download will be given another shot. It may not be enough if the connection is so unstable that it requires a ton of resumes, but for one-off failures, it would still be a major improvement. As long as the messaging is clear, I don't think automatic resumes would be that annoying to the user. I consider resumes as the preferred option and opting out of resumption to be an exceptional (but still important to support!) case.

I initially set the default --resume-retries to 0 for backward compatibility and to get the discussion going. I agree that a low default (e.g., 2–5 attempts) would provide a better UX, but I'd also be cautious about changing pip’s default install behavior.

We have two options:

  1. Set a low default right away, as you suggested.
  2. Release it as-is, monitor and fix any issues that arise, and consider making it the default in a future version.

Footnotes

  1. https://github.com/pypa/pip/issues/4796#issuecomment-1153260254

Signed-off-by: gmargaritis <[email protected]>
(cherry picked from commit f2e48c3f5885305369b88761ab74cd16a0869667)

Signed-off-by: gmargaritis <[email protected]>
(cherry picked from commit 53ce184348de1af4937dc04de7a1aedbe4ede19a)

Signed-off-by: gmargaritis <[email protected]>
(cherry picked from commit 1f8d7fe0b0a5c7b53719bd8713619f982c042dbf)

Signed-off-by: gmargaritis <[email protected]>
(cherry picked from commit af6b7ac624ebc18035d2da217c4c1850a6850cd7)

Signed-off-by: gmargaritis <[email protected]>
(cherry picked from commit 67e366aec42d913436159ca3bf877c46a0d5cd2c)

…_download

Signed-off-by: gmargaritis <[email protected]>
@ichard26 ichard26 self-assigned this Mar 17, 2025
@ichard26
Member

Just so everyone is on the same page, I plan on re-reviewing this PR sometime this week. I'm working on prototyping some code style changes which I'll share soon. Beyond that, I'd like to review the other parts of the resuming UX. After that, I should be happy enough with this to merge it and let any other suggestions be handled at a later date.

@ichard26 ichard26 force-pushed the introduce-resuming-downloads branch from 7e165af to c146e81 Compare March 30, 2025 03:08
Member

@ichard26 ichard26 left a comment

Hey, I've completed another review pass. In this review, I've taken a closer look at the code structure and the user-facing messaging.


Firstly, I pushed three commits to your branch:

  • ff2ccd2 - I didn't like how bytes_received was being passed absolutely everywhere. Thus, I removed it as a parameter to _write_chunks_to_file. The method will only return how many bytes it has downloaded. It's on the calling method to keep track of how many bytes have been downloaded. I also made some other simplifications where it didn't impact readability.

  • 2808134 - The point of the exceptions.py module is to separate the error printing logic from the business logic. For that reason, I moved all of the formatting logic inside the exception. Finally, I made some changes to the error itself:

    • If resumes are disabled, don't even bother saying that there are 0 attempts configured
    • If resumes are disabled, provide a more specific hint to enable resumes using --resume-retries
    • Don't repeat how many attempts are configured
    • "File" -> "URL"
    • Drop the "error" suffix from the error code, it's redundant
  • c146e81 - The "Attempting to resume" warning was reworked to be easier to understand for non-technical users. Also, I replaced any mention of "retries" with "attempts", as "retries" is IMO easy to confuse with the --retries flag
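To illustrate the idea behind the second commit (keeping message formatting inside the exception), here is a rough sketch. The class name `IncompleteDownloadSketch`, its fields, and the message wording are all invented for illustration; they are not the actual class or text in pip's exceptions.py.

```python
class IncompleteDownloadSketch(Exception):
    """Hypothetical error that owns its own message formatting."""

    def __init__(self, url: str, received: int, expected: int, attempts: int) -> None:
        self.url = url
        self.received = received
        self.expected = expected
        self.attempts = attempts
        super().__init__(self._format())

    def _format(self) -> str:
        message = (
            f"Download failed: received {self.received} of "
            f"{self.expected} bytes from URL {self.url}"
        )
        if self.attempts > 0:
            # Mention the configured attempt count exactly once.
            message += f" after {self.attempts} resume attempts"
            hint = "Consider raising --resume-retries."
        else:
            # Resumes are disabled: say nothing about "0 attempts" and
            # point at the flag that enables them instead.
            hint = "Use --resume-retries to allow resuming incomplete downloads."
        return f"{message}. {hint}"
```

The calling code then only raises the exception with raw values, and all wording decisions live in one place.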

Please let me know if you have any issues with these changes.


Secondly, I have some larger design questions:

(As I mention at the end, these questions are not blocking. I want to discuss them, but no changes need to be made to get this PR landed.)

  • How about we rename the --resume-retries flag to --resume-attempts? I know that doesn't follow the pattern set by --retries, but --resume-retries is IMO confusing because it sounds similar to --retries but configures something else entirely (in addition, the former is handled by urllib3, while we handle the latter). Or because this PR actually enables download resumption and restarting (see discussion below), perhaps --download-retries?

  • Should restarts be allowed while retrying a download? My concern is that a user will be okay with pip resuming a download (perhaps even 10s of times) but wouldn't be okay with restarting the download from scratch several times (e.g., they're on a metered connection). I had a semi-private discussion with @emmatyping about this. I've decided that this is fine for now, but I want to revisit this before we make resuming the default.

More broadly, this PR has been pretty hairy to review. That's unavoidable, of course, but I think it highlights that this feature could be confusing for end-users, too.

With this PR, there are several layers to our HTTP logic. urllib3 handles connection timeouts, read timeouts, and request retrying. These correspond to the --retries and --timeout flags. Now, there's also our own logic to retry a HTTP download, corresponding to --resume-retries. Note the emphasis on "retry". While reviewing this PR, I got confused by the fact resuming and restarting are both supported. This PR generally refers to both features as "resuming" but IMO this is really "download retrying" (we resume from where we left off, but restart if resuming is impossible). This is confusing, and that's why I decided to switch to use the word "attempt" as it better encapsulates both behaviours.
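The "download retrying" layer described above can be modelled in a few lines. This is a simplified sketch, not the PR's implementation: `fetch` is a hypothetical callable standing in for a single HTTP request (including whatever connection-level retries and timeouts the lower layer performs), and it returns the status code plus the bytes received before the connection dropped.

```python
from typing import Callable, Tuple

# fetch(offset) -> (status_code, bytes received before the connection dropped)
Fetch = Callable[[int], Tuple[int, bytes]]


def download_with_attempts(fetch: Fetch, total_size: int, attempts: int) -> bytes:
    """Retry an incomplete download, resuming when possible, else restarting."""
    _, chunk = fetch(0)
    data = bytearray(chunk)
    while len(data) < total_size:
        if attempts <= 0:
            raise ConnectionError(
                f"incomplete download: {len(data)}/{total_size} bytes"
            )
        attempts -= 1
        status, chunk = fetch(len(data))
        if status == 206:
            # Resume: the server sent only the missing tail.
            data += chunk
        else:
            # Restart: the server sent the body from the first byte again.
            data = bytearray(chunk)
    return bytes(data)
```

Both branches consume one "attempt", which is why a single counter covers resuming and restarting alike.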

Realistically, this nuance is going to be hard to get across[1], but I just don't want our users to have the wrong impression of the functionality. This PR doesn't add documentation. That's fine for now, but we should add some docs before making this the default.


Anyway, that's my review. I still haven't reviewed the tests, but that's not a blocking concern.

I want to thank you again for the PR! I will admit that I've been more nit-picky than usual. That's for two reasons: A) this is the first large pip PR I've reviewed, and B) this is complicated and I want to get this right. The latter is doubly important as it's hard to change things in pip down the line. Fortunately, backwards compatibility isn't really a huge problem with download resuming, but it's still something to consider.

On that note, I do want to get this in the pip 25.1 release, so barring any major objections, I will merge this in ~probably a week or two. I'm in favour of the PR and it's good enough to land as-is. We'll get feedback once pip 25.1 is out and we can make more changes based on that feedback as needed.

Footnotes

  1. I also recognize that as a maintainer/reviewer, I am more aware of the nuance and subtlety involved here than any regular pip user. It's likely that is contributing to the confusion, and the average user won't be as confused as they won't even know enough to be confused 🙂

@ichard26
Member

Oh right, here are some screenshots of the new messaging:

image

(It says "Resuming download of" here. I'd made this adjustment earlier, but I reverted the change in my last commit)

image
image

The urllib3 request retrying warnings are very noisy here, but this is probably the less common scenario for retrying. I'm disconnecting the Internet entirely to trigger the retry logic. In practice, I'd imagine the download would fail, but the request/connection can be easily restored (so urllib3 won't complain). Either way, this is a pre-existing issue and should be dealt with separately.

@pfmoore
Member

pfmoore commented Apr 1, 2025

@ichard26 What's the status on this? It seems close to ready, but there are still some design questions being asked. Is release 25.1 (in a couple of weeks) still a realistic target here, or should we defer it? This is a big enough feature that I'd like to see it go in, but conversely, I really don't want it as a last-minute merge a few days before the release.

@ichard26
Member

ichard26 commented Apr 1, 2025

What's the status on this? It seems close to ready, but there are still some design questions being asked.

@pfmoore we've decided that this feature should be opt-in upon release, only to be enabled by default in pip 25.2 or later once we've gotten some feedback. Thus, I'm not too worried about the exact implementation details. Those can change in a future release if needed. I'm also pretty happy with the code as-is.

The sticking point I have is that I'm still not sure of the UI of resumable downloads. --resume-retries is a weird flag.[1] As someone who understands the implementation, it makes sense, but it's likely to be rather obtuse for users. How many resumes should I allow? How does it work differently than --retries? Part of me wants to suggest that we reuse the --retries flag to enable resumable downloads[2] to keep the UI simpler. OTOH, all of the networking flags (except for --proxy and --timeout) are already "advanced" features so maybe it's fine to expose these fine knobs to the users. If that's the case, I'm also not really happy with the current name. See #12991 (review) for more.

While the rest of the feature can be reworked in future releases, it's likely not feasible to rename a flag once released.

Any thoughts @pfmoore @notatallshaw? I'm rather torn and can't make up my mind.

Footnotes

  1. Does any other tool that accesses the network have a similar flag?

  2. Although once automatic resuming is the default, reusing the --retries default of 5 does seem a bit high... Also to handle the opt-in phase, we'd need to switch to using --use-feature=resume-downloads (and set a reasonable internal limit [10?] since it wouldn't be configurable).

@ichard26
Member

ichard26 commented Apr 1, 2025

My current feeling is that it's better to stick with a simpler UI for pip 25.1. If we get complaints, then we can consider giving the users more control later (à la --resume-retries or whatever name we end up choosing) before it's enabled by default. It's harder to remove/restrict the UI after the fact (as packaging standardization has shown time after time).

Proposal:

  • The --resume-retries flag is removed
  • To opt in to resumable/restartable downloads, one must pass --use-feature=retry-downloads[1]. This has the benefit that --use-feature is explicitly meant for experimental features
  • The resume/restart limit is hard-coded to some value (10? - given it's opt-in, it's fine to err on the higher end)
  • Once it's made the default, the current proposal would be to link it to the --retries flag, unless feedback received after the pip 25.1 release indicates a separate flag is necessary.

Footnotes

  1. I'm using the word "retry" over "resume" as restarting will be done if range requests aren't supported.

@pfmoore
Member

pfmoore commented Apr 1, 2025

Any thoughts @pfmoore @notatallshaw?

I'm going to keep out of the design discussions. I have a bunch of other things on my plate, and not much spare time, so I don't want to add anything else.

I will say, though, that I don't like the idea of changing the UI once it becomes default. That's not (in my mind) what the --use-feature flag is for - it should be an opt-in to using a new feature that's complete, and won't change, to give people a chance to try it out before it's made the default[1].

I'm going to say that I'd rather not have it in 25.1 unless it's complete (so assuming it's gated behind --use-feature, the only change needed to make it the default is to remove the need for --use-feature).

Footnotes

  1. I'll note that as described, this is a weird feature to use --use-feature on. Because the default is no retries, --use-feature=retry-downloads on its own is a no-op. So why have --use-feature at all? But as I said, I don't want to get sucked into design discussions, so I'm not really looking for an explanation here, just pointing out the oddity.

@notatallshaw
Member

notatallshaw commented Apr 1, 2025

@ichard26 I'm not a fan of this proposal for a couple of reasons:

Firstly, I really dislike changing how features are enabled between pip versions; it makes guides outdated quickly and makes it more difficult to write scripts against pip.

Secondly, I mildly dislike overloading a flag with multiple meanings, particularly to simplify reading the help at the cost of user control. What if a user is working against an index where they need to disable resumable retries but enable regular retries? Will that be possible under the new scheme?

In terms of having nuanced options, I think most users should be served well by good defaults, and users who really need something other than the default will learn the names and meanings of those additional options.

@gmargaritis
Author

gmargaritis commented Apr 2, 2025

Thank you all for your input and efforts!

There have been various discussions about the naming convention in the past (#11180, #4796).

In terms of having nuanced options, I think most users should be served well by good defaults, and users who really need something other than the default will learn the names and meanings of those additional options.

I fully agree, and I suggest keeping the existing implementation of --resume-retries.

Merging the behavior with --retries requires a deeper technical and UX discussion, so keeping them separate for now ensures clarity without limiting future improvements.

Successfully merging this pull request may close these issues.

[Improvement] Pip could resume download package at halfway the connection is poor
8 participants