ClosedConnectionError & rate limiting #82
No worries! TBH, I’ve lost track of the current rate limits the Wayback Machine imposes, but I think earlier this year it was at 10 requests/second for both CDX search (i.e.

If you are using multiple threads, you can do some messy stuff to share connections across threads, which has helped us reduce connection errors with Wayback in these code samples:

That’s way over-complicated and I hope to get that functionality built into this package as part of #58.

You also might find some useful inspiration from other parts of the above script, which we use to pull in ~20 GB of data every night from Wayback. It’s really messy and a bit hard to follow, though. (It’s had a lot of iterations but limited time to really clean it up over the last few years, and is what this package was originally extracted from.)

(Sorry about the slow feedback here, @jordannickerson. I’ve been semi-offline for the last couple weeks.)
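For anyone who wants something more structured than a bare `time.sleep` in a loop, here is a minimal sketch of a thread-safe limiter that uses only the standard library. It is not part of this package and is not the script mentioned above. `RateLimiter` and `throttled` are hypothetical names, and the 5 requests/second default is an assumption chosen to stay under the roughly 10 requests/second recalled above, which may no longer be accurate.

```python
import threading
import time


class RateLimiter:
    """Space out calls so at most `per_second` begin each second, across threads."""

    def __init__(self, per_second, clock=time.monotonic, sleep=time.sleep):
        self._interval = 1.0 / per_second
        self._clock = clock
        self._sleep = sleep
        self._lock = threading.Lock()
        self._next_slot = clock()

    def wait(self):
        # Reserve the next free time slot while holding the lock, then sleep
        # outside the lock so other threads can reserve their own slots.
        with self._lock:
            now = self._clock()
            slot = max(self._next_slot, now)
            self._next_slot = slot + self._interval
        delay = slot - now
        if delay > 0:
            self._sleep(delay)
        return delay


# Assumed limit: well below the ~10 requests/second mentioned in this thread.
limiter = RateLimiter(per_second=5)


def throttled(fetch, *args, **kwargs):
    """Call `fetch` (any function that makes one request) after waiting for a slot."""
    limiter.wait()
    return fetch(*args, **kwargs)
```

Sharing one `limiter` between all worker threads keeps the combined request rate bounded, whereas a per-thread `time.sleep` only bounds each thread separately. This only paces requests; it does not retry or reconnect after a closed connection.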
Quick update: I’m considering this a duplicate of #58, which I am pretty committed to actually solving this month.
I apologize for the slight abuse of the term "Issues", as I don't think the problem I'm encountering is a true issue of your project.
While using wayback, I've run into issues with the connection being closed by the remote host. I've been performing a lot of search requests/pulling mementos, and suspect I'm hitting a rate limit. However, I have put a large delay between queries (5ish seconds).
Is there a best practice on how much we should throttle usage, and are there other things that we should do beyond just looping over all our searches with a time.sleep call to avoid slamming the server?