Suggestion: Multiple req_throttle() limits in one request's policies #555
Comments
This would require splitting the parameters in two (i.e. number of requests and time limit). But are you sure you need this? Most modern APIs will return a rate-limit header that you can respond to dynamically.
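As an illustration of that header-driven approach, a minimal httr2 sketch (the endpoint URL is hypothetical, and it assumes the server sends a numeric Retry-After header on 429 responses):

```r
library(httr2)

# Retry rate-limited requests, waiting as long as the server asks via Retry-After.
req <- request("https://api.example.com/v1/data") |>
  req_retry(
    max_tries = 5,
    # treat 429 (rate limited) as transient
    is_transient = function(resp) resp_status(resp) == 429,
    # assumes the Retry-After header is present and numeric
    after = function(resp) as.numeric(resp_header(resp, "Retry-After"))
  )

resp <- req_perform(req)
```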
I know it's common for more "official" APIs to have Retry-After responses, but there are lots of community data sources (particularly in sports analytics, where I spend most of my time) that have set guidelines or rules but may not have implemented Retry-After headers for when a user is over the limits (just a 403 or a nonspecific 429 response).
Ok, fair enough.
I recently learned about the idea of token buckets, which work something like this:

```r
token_bucket <- function(capacity, fill_time_s) {
  fill_rate <- capacity / fill_time_s
  tokens <- capacity
  last_fill <- Sys.time()

  # top the bucket up based on how long it has been since the last refill
  refill <- function() {
    now <- Sys.time()
    # force seconds so the difftime units don't change the maths
    new_tokens <- as.numeric(now - last_fill, units = "secs") * fill_rate
    tokens <<- min(capacity, tokens + new_tokens)
    last_fill <<- now
  }
  # take one token: returns 0 if a request can go out now, otherwise the
  # number of seconds to wait until the next token becomes available
  function() {
    refill()
    if (tokens >= 1) {
      tokens <<- tokens - 1
      0
    } else {
      1 / fill_rate
    }
  }
}

# demo: a bucket of 10 tokens that refills over 10 seconds
token <- token_bucket(10, 10)
repeat {
  wait <- token()
  if (wait == 0) {
    cat(".")
  } else {
    Sys.sleep(wait)
  }
}
}
```

This is already more sophisticated than the current rate-limiting algorithm, but it's also easy to extend to multiple request limits:

```r
# 100 tokens / minute
minute <- token_bucket(100, 60)
# 10000 tokens / day
daily <- token_bucket(10000, 60 * 60 * 24)
# need to wait until both tokens are available
wait <- max(minute(), daily())
```
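To connect this to actual requests, one way several buckets could gate a request is sketched below. throttled_perform() is a made-up helper, not part of httr2, and the limits mirror the ones described in this issue. Like max(minute(), daily()) above, it takes a token from the faster buckets even on attempts where a slower bucket forces a wait, which is slightly conservative but keeps the logic simple.

```r
# Sketch: wait until every bucket can supply a token, then perform the request.
throttled_perform <- function(req, buckets) {
  repeat {
    waits <- vapply(buckets, function(take) take(), numeric(1))
    if (max(waits) == 0) return(httr2::req_perform(req))
    Sys.sleep(max(waits))
  }
}

buckets <- list(
  token_bucket(4, 1),          # 4 requests / second
  token_bucket(20, 60),        # 20 requests / minute
  token_bucket(200, 60 * 60)   # 200 requests / hour
)
```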
Some APIs have multiple throttle limits, e.g. no more than 4 requests/second, no more than 20 requests/minute, and no more than 200 requests/hour.
Current behavior (as far as I've been able to verify) is that placing multiple req_throttle() policies on a request results in just the last policy being enforced. A desired behavior (if feasible) is a pool of throttles, with the delay required to satisfy all of them. For the above scenario, this results in a stepwise increase in throttle delays: 20 requests could happen in as little as 5 seconds, but once 20 requests have been sent the throttle engages the next limit and holds until one minute has passed before sending further requests. The same applies to the 200 requests/hour limit.
The benefit is that it permits bursts of activity to happen quickly, while still respecting the larger-scale limit(s) when users are doing more significant API access tasks.
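To make the desired behavior concrete, here is a hypothetical sketch of what chaining policies could look like if each req_throttle() call added to a pool instead of replacing the previous policy (this is not current httr2 behavior; rate is expressed in requests per second, and the URL is made up):

```r
# Hypothetical: each req_throttle() adds a limit to a pool, and a request
# waits until every limit in the pool is satisfied.
req <- request("https://api.example.com/v1/scores") |>
  req_throttle(rate = 4) |>           # no more than 4 requests / second
  req_throttle(rate = 20 / 60) |>     # no more than 20 requests / minute
  req_throttle(rate = 200 / 3600)     # no more than 200 requests / hour
```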