Sporadic 403 “You have exceeded a secondary rate limit” errors #276
Based on the error you shared, I'm guessing you have automatic branch updates enabled for this repository? That feature can definitely be expensive when a repository has many open pull requests that need to be updated. Each time a PR merges, Bulldozer needs to evaluate all open PRs to see if they need updates, and the required API calls can add up. While it may not fit with your workflow, you can reduce requests by making updates opt-in.

Otherwise, to further diagnose this, I think we'll want to look at the GitHub request logs. If you have debug logging enabled in Bulldozer, you can look at all of the GitHub request log messages. The request logs will also show the cache status of requests. Bulldozer maintains an in-memory cache for GitHub responses that is 50MB by default. One thing you could try is increasing the size of this cache and then checking the request logs to see if cache hits have increased:

```yaml
cache:
  max_size: "100MB"
```

Cache metrics can also be exported to Datadog, but unfortunately, we don't support any other metric exporters yet.
Thank you for the reply. The information you supplied is very helpful. Our current bulldozer configuration is:
Currently in our 30 repos, we have 57 pull requests with the “auto update” label and 11 with the “auto merge” label applied. Many Bulldozer operations run successfully; however, they start failing with the “403 You have exceeded a secondary rate limit. Please wait a few minutes before you try again” error, which I think indicates Bulldozer may be making too many API requests too quickly, so the secondary rate limit kicks in. Could 68 PRs being processed by Bulldozer cause this? Is there a way to slow down some of the Bulldozer operations, e.g., some delay variable setting? We do have debug configured, and I have access to the Bulldozer logs with a history of calls. I am attaching a compressed log, with our org and repo names replaced, but it has all other data. Please let me know if you need any more information. In the attached log, you'll see that "OUR_REPO_7" seems to have more issues. That is our biggest repo and has the most PRs and therefore the most Bulldozer operations.
Thanks for the logs. I found at least one request, That seems like what happened here: the first API request made after the 11th

When we process PRs for updates, we just run through them in a loop, as fast as possible, so it's not too surprising that updating many PRs could trigger the secondary limits. We should probably add a delay to this processing, either fixed or dynamic based on the number of PRs to process. I'll need to think about this a bit more to make sure there are no negative consequences to adding the delay, but if there aren't, it should be a relatively simple change.
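A rough sketch of that idea, using made-up types and function names rather than Bulldozer's actual code:

```go
package bulldozer

import (
	"context"
	"log"
	"time"
)

// PullRequest is a minimal stand-in for the pull request data Bulldozer tracks.
type PullRequest struct {
	Number int
}

// updateAllOpenPRs processes every open PR, pausing a fixed delay between
// updates so write requests are spread out instead of issued as fast as possible.
func updateAllOpenPRs(ctx context.Context, prs []PullRequest, delay time.Duration,
	update func(context.Context, PullRequest) error) {
	for i, pr := range prs {
		if i > 0 {
			// Throttle between updates to avoid tripping secondary rate limits.
			time.Sleep(delay)
		}
		if err := update(ctx, pr); err != nil {
			log.Printf("failed to update PR #%d: %v", pr.Number, err)
		}
	}
}
```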
Thank you. The ability to have some kind of delay could help. As you can see from our bulldozer.yml file, do you think us having "auto merge" in both the "merge" and "update" triggers could be causing this behavior? We added this label to the "update" trigger so it would force updates without requiring developers to set both the "auto update" and "auto merge" labels. The c7k1sfpsbsbq0kn51rq0 appears to be an old Draft PR. I just tried adding the "ignore_drafts: true" option under the "update" section, but the Bulldozer logs show the following errors:

We are using the latest "develop" branch of Bulldozer. On another note, a nice feature would be the ability to exclude certain PRs from being processed at all. UPDATE: I removed the "auto merge" trigger label from the update section, but we still saw the errors, so I put back our original configuration.
A few other options:
Just a thought.
Yet another option, perhaps even cleaner: use a custom label on PRs that need to be ignored by Bulldozer. Then here: bulldozer/pull/pull_requests.go (line 70 in 3ae63f8)
It'd require organizations to have a mechanism in place to apply that label to their PRs, but it'd avoid having to artificially sleep anywhere between sending requests.
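A minimal sketch of that suggestion, with a hypothetical label name and stand-in types rather than the actual code at that line:

```go
package bulldozer

import "strings"

// labeledPR is a minimal stand-in for a pull request and its labels.
type labeledPR struct {
	Number int
	Labels []string
}

// ignoreLabel is a hypothetical label that tells Bulldozer to skip a PR entirely.
const ignoreLabel = "no bulldozer"

// filterIgnored drops PRs carrying the ignore label before any update
// processing, so no API requests are ever made for them.
func filterIgnored(prs []labeledPR) []labeledPR {
	var kept []labeledPR
	for _, pr := range prs {
		ignored := false
		for _, l := range pr.Labels {
			if strings.EqualFold(l, ignoreLabel) {
				ignored = true
				break
			}
		}
		if !ignored {
			kept = append(kept, pr)
		}
	}
	return kept
}
```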
It's strange that you got an error trying to use the ignore_drafts option.

For the original issue, I've proposed two changes that might help:
You’re correct. We spin up Bulldozer docker containers using docker compose, and I froze us to version 1.10.1. We once pulled in the latest docker container and our config broke, so I locked us to that release. Thanks for the fixes.

To reproduce the original problem, I created 80 pull requests with the “auto update” label.

TEST1: curl -s http://localhost:8089/api/health

I updated our base branch in GitHub so the 80 test PRs (with the “auto update” label set) would trigger updates. Approx 1 minute in, I started seeing the same “secondary rate limit” errors.

TEST2: curl -s http://localhost:8089/api/health

This PR does fix the issue. I did not see any “secondary rate limit” errors; however, for my 80 test PRs, it took over an hour to update all of them. I do see your new messages in the log.

LAST TEST: I created 20 more PRs, for a total of 100 PRs for my test, and there were still no errors. It looks like adding a small 2 second (or possibly less) delay is all that may be required to get around this problem.

I am attaching the Bulldozer logs so you can see all the operations. One has your 60 second max delay default and the other is the 2 second max delay. Would it be easy to make the “PullUpdateMaxDelay” configurable just in case the hardcoded value does not work for everyone? Thanks again for the quick turnaround on these fixes. We depend on Bulldozer and it makes developers' lives a lot easier around here.
Thanks for the extensive testing! Based on your results, I'll definitely drop the default max delay to something much lower, like the 2 seconds you tested with. I'll also consider adding a configuration property, although I'd prefer to keep things simple if there's a fixed value that will work in almost all cases.
I've been running this change in our production environment since yesterday, and I just saw the errors again. I'm sorry, but the 2 second change may not be enough. I'm going to try the value "PullUpdateMaxDelay = 5 * time.Second" for the new container and continue to monitor. I am attaching the logfile with the 2s delay so maybe you can see where the issue could be. I'm hoping we don't have to use the full 60s because of how long it takes to process all our open PRs.
Sorry, I clicked the close link by accident and reopened. We saw more secondary rate limit errors with the 5 second max setting. Trying again with "PullUpdateMaxDelay = 15 * time.Second".
Thanks for the continued testing. I updated the PR yesterday to use a fixed 2s delay instead of exponential backoff (with a lower max value, the backoff seemed to happen too fast to make much of a difference), but I also added a configuration property to adjust the delay value. Based on your testing so far, do you think a fixed but configurable delay will work for you (if you can find the appropriate value)?
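If that configuration property lands, a minimal sketch of how such a knob might look; the field name, YAML tag, and defaulting below are assumptions, not the actual Bulldozer option:

```go
package bulldozer

import "time"

// DefaultPullUpdateDelay is a fallback pause between PR update requests.
const DefaultPullUpdateDelay = 2 * time.Second

// WorkerConfig is a hypothetical config section exposing the delay as a setting.
type WorkerConfig struct {
	// PullUpdateDelay overrides the default pause between PR update requests.
	PullUpdateDelay time.Duration `yaml:"pull_update_delay"`
}

// pullUpdateDelay returns the configured delay, falling back to the default.
func (c WorkerConfig) pullUpdateDelay() time.Duration {
	if c.PullUpdateDelay > 0 {
		return c.PullUpdateDelay
	}
	return DefaultPullUpdateDelay
}
```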
I’m still not clear why my test repository with 100 test PRs worked fine with your update-delay branch. I tried setting

Also, when we created our "bulldozer" GitHub app, what/whose API key does this app use?
I pulled your latest bkeyes/update-delay branch (commit 44c959d). I changed your DefaultPullUpdateDelay = 2 * time.Second back to DefaultPullUpdateDelay = 60 * time.Second because we were seeing the secondary rate limit errors with the 2 second delay. There was just an update to our repo base branch that caused Bulldozer to run through all PRs, and we're still seeing the rate limit errors. I don't understand why our production repo is hitting this issue when our test repo did not. I looked at the https://docs.github.com/en/rest/guides/best-practices-for-integrators#dealing-with-abuse-rate-limits page and there is a line that says "If you're making a large number of POST, PATCH, PUT, or DELETE requests for a single user or client ID, wait at least one second between each request." Could we be hitting this? I am attaching the Bulldozer log for you. Please let me know if you need any more information.
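As an illustration of that guideline (this is a hypothetical wrapper, not Bulldozer's actual GitHub client), an HTTP transport could throttle write methods to roughly one request per second:

```go
package bulldozer

import (
	"net/http"

	"golang.org/x/time/rate"
)

// throttledTransport delays POST, PATCH, PUT, and DELETE requests so at most
// one write is sent per second, matching GitHub's "wait at least one second
// between each request" guidance for content-generating calls.
type throttledTransport struct {
	base    http.RoundTripper
	limiter *rate.Limiter
}

func (t *throttledTransport) RoundTrip(req *http.Request) (*http.Response, error) {
	switch req.Method {
	case http.MethodPost, http.MethodPatch, http.MethodPut, http.MethodDelete:
		// Block until the limiter allows another write request.
		if err := t.limiter.Wait(req.Context()); err != nil {
			return nil, err
		}
	}
	return t.base.RoundTrip(req)
}

// newThrottledClient wraps an http.Client so all GitHub write calls share one limiter.
func newThrottledClient(base *http.Client) *http.Client {
	tr := base.Transport
	if tr == nil {
		tr = http.DefaultTransport
	}
	return &http.Client{
		Transport: &throttledTransport{
			base:    tr,
			limiter: rate.NewLimiter(rate.Limit(1), 1), // 1 write per second
		},
	}
}
```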
Do you have an update on this?
The GitHub API doc for pull request merging warns of secondary rate limiting: https://docs.github.com/en/rest/reference/pulls#merge-a-pull-request
Our org currently has Bulldozer configured for 30 repos and 300+ open pull requests. We are seeing sporadic “403 You have exceeded a secondary rate limit” errors in our Bulldozer logs. Is it possible that Bulldozer is making too many concurrent API requests and GitHub is throwing the 403 error? An example of the complete error is: