Suggestion: Multiple req_throttle() limits in one request's policies #555
Comments
This would require splitting the parameters in two (i.e. number of requests and time limit). But are you sure you need this? Most modern APIs will return a rate-limit header that you can respond to dynamically.
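As an illustration of that header-driven approach, a minimal httr2 sketch (the endpoint URL is hypothetical, and it assumes the server sends a numeric Retry-After header on 429 responses):

```r
library(httr2)

# Retry rate-limited requests, waiting as long as the server asks via Retry-After.
req <- request("https://api.example.com/v1/data") |>
  req_retry(
    max_tries = 5,
    # treat 429 (rate limited) as transient
    is_transient = function(resp) resp_status(resp) == 429,
    # assumes the Retry-After header is present and numeric
    after = function(resp) as.numeric(resp_header(resp, "Retry-After"))
  )

resp <- req_perform(req)
```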
I know it's common for more "official" APIs to have Retry-After responses, but there are lots of community data sources (particularly in sports analytics, where I spend most of my time) that have set guidelines or rules but may not have implemented Retry-After headers for when a user is over the limits (just a 403 or a nonspecific 429 response).
Ok, fair enough.
I recently learned about the idea of token buckets, which work something like this:

```r
token_bucket <- function(capacity, fill_time_s) {
  fill_rate <- capacity / fill_time_s
  tokens <- capacity
  last_fill <- Sys.time()

  # top the bucket up based on how long it has been since the last refill
  refill <- function() {
    now <- Sys.time()
    # force seconds so the difftime units don't change the maths
    new_tokens <- as.numeric(now - last_fill, units = "secs") * fill_rate
    tokens <<- min(capacity, tokens + new_tokens)
    last_fill <<- now
  }
  # take one token: returns 0 if a request can go out now, otherwise the
  # number of seconds to wait until the next token becomes available
  function() {
    refill()
    if (tokens >= 1) {
      tokens <<- tokens - 1
      0
    } else {
      1 / fill_rate
    }
  }
}

# demo: a bucket of 10 tokens that refills over 10 seconds
token <- token_bucket(10, 10)
repeat {
  wait <- token()
  if (wait == 0) {
    cat(".")
  } else {
    Sys.sleep(wait)
  }
}
}
```

This is already more sophisticated than the current rate-limiting algorithm, but it's also easy to extend to multiple request limits:

```r
# 100 tokens / minute
minute <- token_bucket(100, 60)
# 10000 tokens / day
daily <- token_bucket(10000, 60 * 60 * 24)
# need to wait until both tokens are available
wait <- max(minute(), daily())
```
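To connect this to actual requests, one way several buckets could gate a request is sketched below. throttled_perform() is a made-up helper, not part of httr2, and the limits mirror the ones described in this issue. Like max(minute(), daily()) above, it takes a token from the faster buckets even on attempts where a slower bucket forces a wait, which is slightly conservative but keeps the logic simple.

```r
# Sketch: wait until every bucket can supply a token, then perform the request.
throttled_perform <- function(req, buckets) {
  repeat {
    waits <- vapply(buckets, function(take) take(), numeric(1))
    if (max(waits) == 0) return(httr2::req_perform(req))
    Sys.sleep(max(waits))
  }
}

buckets <- list(
  token_bucket(4, 1),          # 4 requests / second
  token_bucket(20, 60),        # 20 requests / minute
  token_bucket(200, 60 * 60)   # 200 requests / hour
)
```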
Some APIs have multiple throttle limits, e.g. no more than 4 requests/second, no more than 20 requests/minute, and no more than 200 requests/hour.
Current behavior (as far as I've been able to verify) is that placing multiple req_throttle() policies on a request results in just the last policy being enforced. A desired behavior (if feasible) is a pool of throttles, with the delay required to satisfy all of them. For the above scenario, this results in a stepwise increase in throttle delays: 20 requests could happen in as little as 5 seconds, but once 20 requests have been sent the throttle engages the next limit and holds until one minute has passed before sending further requests. The same applies to the 200 requests/hour limit.
The benefit is that it permits bursts of activity to happen quickly, while still respecting the larger-scale limit(s) when users are doing more significant API access tasks.
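To make the desired behavior concrete, here is a hypothetical sketch of what chaining policies could look like if each req_throttle() call added to a pool instead of replacing the previous policy (this is not current httr2 behavior; rate is expressed in requests per second, and the URL is made up):

```r
# Hypothetical: each req_throttle() adds a limit to a pool, and a request
# waits until every limit in the pool is satisfied.
req <- request("https://api.example.com/v1/scores") |>
  req_throttle(rate = 4) |>           # no more than 4 requests / second
  req_throttle(rate = 20 / 60) |>     # no more than 20 requests / minute
  req_throttle(rate = 200 / 3600)     # no more than 200 requests / hour
```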