-
Notifications
You must be signed in to change notification settings - Fork 66
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Implement new iteration strategy #353
Conversation
I quickly skimmed over it and it looks quite nice. Let me know when this is ready for a proper review. |
I think it's probably good for a proper review now. The part I'm least certain about is the return value — my gut feeling is that it should return a list of resps and a list of reqs, so that if there is an error you can track it down more easily. But I'm struggling with what to call that object, and it should go in a separate PR anyway. |
I used the new interface in an internal package (for an absolutely terrible API with a very inconsistent interface. So, the perfect use case actually). Some things that might be difficult:
api_paginate_token <- function(req,
field,
spec,
page_size,
n_results_field,
...,
req_token_field = "nextToken",
resp_token_field = "nextToken",
pb_name = "Download",
n = Inf) {
check_dots_empty()
parse_resp <- function(resp) {
parsed <- httr2::resp_body_json(resp, check_type = FALSE, bigint_as_char = TRUE)
tibblify(parsed[[field]], spec)
}
known_total <- FALSE
next_req <- function(resp, req) {
parsed <- httr2::resp_body_json(resp, check_type = FALSE, bigint_as_char = TRUE)
next_token <- purrr::pluck(parsed, !!!resp_token_field)
if (is.null(next_token)) {
return(NULL)
}
if (!known_total && !is.null(n_results_field)) {
n_results <- purrr::pluck(parsed, !!!n_results_field)
check_number_whole(n_results)
known_total <<- TRUE
signal("", class = "httr2_total_pages", n = ceiling(n_results / page_size))
}
data <- req$body$data
data <- purrr::assign_in(data, req_token_field, value = u(next_token))
req$body$data <- data
req
}
responses <- httr2::req_perform_iteratively(
req = req,
next_req = next_req,
max_reqs = ceiling(n / page_size),
progress = pb_name
)
httr2::resps_combine(responses, parse_resp)
} |
Oh and of course I couldn't use the nice helpers because of the pagination patterns used by the API... |
#' * `iterate_with_offset()` increments a query parameter, e.g. `?page=1`, | ||
#' `?page=2`, or `?offset=1`, `offset=21`. | ||
#' * `iterate_with_cursor()` updates a query parameter with the value of a | ||
#' cursor found somewhere in the response. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Restricting iterate_with_offset()
and iterate_with_cursor()
to a query parameter makes the implementation much simpler. I hope this covers most APIs but it definitely doesn't cover the main one I have to use...
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think it's a reasonable trade-off. My plan is to eventually have vignette that describes how to handle more complex situations.
#' @param resp_param_value A callback function that takes a response (`resp`) | ||
#' and returns the next cursor value. Return `NULL` if there are no further | ||
#' pages. | ||
iterate_with_cursor <- function(param_name, resp_param_value) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is there a reason to not include a resp_pages
argument?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I didn't include a resp_pages()
for either of these two helpers because I think it's relatively rare that the total number of pages is available with these interfaces. But I guess there's no harm in adding it.
#' @param resp_data A function with one argument `resp` that parses the | ||
#' response and returns a list with the field `data` and other fields needed | ||
#' to create the request for the next page. | ||
resps_combine <- function(resps, resp_data) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
CRAN might be more happy if you explicitly document the return value with @return
check_request(req) | ||
check_function2(next_req, args = c("resp", "req")) | ||
check_number_whole(max_reqs, allow_infinite = TRUE, min = 1) | ||
check_string(path, allow_empty = FALSE, allow_null = TRUE) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe check that the path actually includes {i}
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thats a bit fiddly for me for now, particularly since I'm not sure if I'm super committed to this API. We can work on it in the future.
R/test.R
Outdated
|
||
res$set_status(200L)$send_json( | ||
object = list(data = data, count = n), | ||
object = list(data = data, count = n, pages = n / page_size), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
object = list(data = data, count = n, pages = n / page_size), | |
object = list(data = data, count = n, pages = ceiling(n / page_size)), |
Overall this looks good to me. Switching to the new interface was relatively quick (as written above there are some annoying things but I guess this is mainly due to the terrible API I have to use). |
Maybe add the request to each response as an attribute. Or add a helper |
Co-authored-by: Maximilian Girlich <[email protected]>
A first pass. Fixes #341. Fixes #298.
To do (possibly in future PRs)
req_chunk()
(Addreq_chunk()
#307) look like?req_perform_parallel()
if you wanted.req_perform_parallel()
(but notmulti_req_perform()
)@mgirlich it's probably worth a glance if you have time, but it's likely to evolve quite a bit more as I work on it, so no need for a thorough review yet. It's a bit PR so I recommend reading
iterate.R
first, theniterate-helpers.R
theniterate-responses.R