|
| 1 | +--- |
| 2 | +title: "httr2" |
| 3 | +output: rmarkdown::html_vignette |
| 4 | +vignette: > |
| 5 | + %\VignetteIndexEntry{httr2} |
| 6 | + %\VignetteEngine{knitr::rmarkdown} |
| 7 | + %\VignetteEncoding{UTF-8} |
| 8 | +--- |
| 9 | + |
| 10 | +```{r, include = FALSE} |
| 11 | +knitr::opts_chunk$set( |
| 12 | + collapse = TRUE, |
| 13 | + comment = "#>" |
| 14 | +) |
| 15 | +``` |
| 16 | + |
| 17 | +The goal of this document is to get you up and running with httr2 as quickly as possible. |
| 18 | +httr2 is designed to map closely to the underlying HTTP protocol. |
| 19 | +I'll try to explain the basics in this intro, but I'd also recommend "[An overview of HTTP](https://developer.mozilla.org/en-US/docs/Web/HTTP/Overview)" from MDN. |
| 20 | + |
| 21 | +There are two important parts to HTTP: the **request**, the data sent to the server, and the **response**, the data sent back from the server. |
| 22 | + |
| 23 | +```{r setup} |
| 24 | +library(httr2) |
| 25 | +``` |
| 26 | + |
| 27 | +## Create a request |
| 28 | + |
| 29 | +In httr2, you start by creating a request. |
| 30 | +This is quite different to httr, where you could only submit a request, immediately receiving a response. |
| 31 | +Having an explicit request object makes it easier to build up a complex request piece by piece, and works well with the pipe. |
| 32 | + |
| 33 | +The simplest request just needs a url: |
| 34 | + |
| 35 | +```{r} |
| 36 | +req <- request("http://httpbin.org/get") |
| 37 | +req |
| 38 | +``` |
| 39 | + |
| 40 | +We can see exactly what this request will send to the server by performing a dry run: |
| 41 | + |
| 42 | +```{r} |
| 43 | +req %>% req_dry_run() |
| 44 | +``` |
| 45 | + |
| 46 | +The first line of the request contains three important pieces of information: |
| 47 | + |
| 48 | +- The HTTP **method**, which is a verb telling the server what action you want to perform. |
| 49 | + Here it's GET, indicating that we want to get some data. |
| 50 | + |
| 51 | +- The **path**, which is the url stripped of all information that the server already knows: the protocol (http), the domain or host (httpbin.org) and the port (not used here). |
| 52 | + |
| 53 | +- The version of the HTTP protocol. |
| 54 | + |
| 55 | +The following lines consist of name-value pairs separated by ":". |
| 56 | +These are called the HTTP **headers**. |
| 57 | +These headers are automatically added by httr2; if you want to add your own, you can use `req_headers()`: |
| 58 | + |
| 59 | +```{r} |
| 60 | +req %>% |
| 61 | + req_headers(Name = "Hadley", `Shoe-Size` = "11") %>% |
| 62 | + req_dry_run() |
| 63 | +``` |
| 64 | + |
| 65 | +HTTP servers will ignore headers that they don't understand. |
| 66 | + |
| 67 | +The headers are finished up by a blank line which is followed by the **body**. |
| 68 | +The requests above (like all GET requests) don't have a body, so let's see what happens if we add one. |
| 69 | +Here we'll use `req_body_json()` to add some data encoded as JSON: |
| 70 | + |
| 71 | +```{r} |
| 72 | +req %>% |
| 73 | + req_body_json(list(x = 1, y = "a")) %>% |
| 74 | + req_dry_run() |
| 75 | +``` |
| 76 | + |
| 77 | +What's changed? |
| 78 | + |
| 79 | +- The method has changed from GET to POST. |
| 80 | + POST is the standard method for sending data to a website. |
| 81 | + You can use other methods with `req_method()`. |
| 82 | + |
| 83 | +- We have two new headers, `Content-Type` and `Content-Length` which tell the server how to interpret the body --- it's going to be 15 bytes long and is encoded as JSON. |
| 84 | + |
| 85 | +- We have a body, consisting of some JSON. |
| 86 | + |
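
The list above mentions `req_method()` for choosing a different method. As a minimal sketch, assuming `req_method()` takes the method name as a string (as it does in released versions of httr2) and using httpbin's `/put` endpoint purely as an illustrative target, a PUT request with a JSON body might be built like this. The dry run only prints the request, so nothing is sent to the server:

```r
library(httr2)

# Assumption: req_method(req, "PUT") overrides the POST that
# req_body_json() would otherwise imply.
req_put <- request("http://httpbin.org/put") %>%
  req_body_json(list(x = 1, y = "a")) %>%
  req_method("PUT")

# Print the request that would be sent, without contacting the server.
req_put %>% req_dry_run()
```

The first line of the printed request should now start with PUT rather than POST, while the headers and body are built the same way as before.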
| 87 | +Different servers expect data encoded in different formats, so httr2 provides helpers for several common encodings. For example, `req_body_form()` sends form-encoded data: |
| 88 | + |
| 89 | +```{r} |
| 90 | +req %>% |
| 91 | + req_body_form(list(x = 1, y = "a")) %>% |
| 92 | + req_dry_run() |
| 93 | +``` |
| 94 | + |
| 95 | +If you need to send data encoded differently, you can use `req_body_string()` or `req_body_raw()` to add the raw data to the body, and `req_headers()` to set the `Content-Type` header to the appropriate type. |
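
As a rough illustration of the paragraph above, the sketch below sends a small CSV payload as a raw body and sets `Content-Type` by hand with `req_headers()`. The payload, the `text/csv` type and the httpbin `/post` endpoint are illustrative choices, not something the vignette prescribes, and the exact arguments of `req_body_raw()` may differ in this development snapshot:

```r
library(httr2)

# Hypothetical payload: two columns, one data row.
csv <- "x,y\n1,a\n"

req_csv <- request("http://httpbin.org/post") %>%
  # Supply the bytes of the body directly.
  req_body_raw(charToRaw(csv)) %>%
  # Tell the server how to interpret those bytes.
  req_headers(`Content-Type` = "text/csv")

# Inspect the request without sending it.
req_csv %>% req_dry_run()
```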
| 96 | + |
| 97 | +## Perform a request and fetch the response |
| 98 | + |
| 99 | +To actually perform a request and fetch the response back, you'll use `req_fetch()`: |
| 100 | + |
| 101 | +```{r} |
| 102 | +req <- request("https://httpbin.org/json") |
| 103 | +resp <- req %>% req_fetch() |
| 104 | +resp |
| 105 | +``` |
| 106 | + |
| 107 | +You can see a simulation of what httr2 actually received with `resp_raw()`: |
| 108 | + |
| 109 | +```{r} |
| 110 | +resp %>% resp_raw() |
| 111 | +``` |
| 112 | + |
| 113 | +An HTTP response has a very similar structure to an HTTP request. |
| 114 | +The first line gives the version of HTTP used, and a status code followed by a short description. |
| 115 | +Then we have the headers, followed by a blank line, followed by a body. |
| 116 | +The majority of responses will have a body, unlike requests. |
| 117 | + |
| 118 | +You can extract data from the response using the `resp_()` functions: |
| 119 | + |
| 120 | +- `resp_status()` returns the status code and `resp_status_desc()` returns the description: |
| 121 | + |
| 122 | + ```{r} |
| 123 | + resp %>% resp_status() |
| 124 | + resp %>% resp_status_desc() |
| 125 | + ``` |
| 126 | + |
| 127 | +- You can extract all headers with `resp_headers()` or a specific header with `resp_header()`: |
| 128 | + |
| 129 | + ```{r} |
| 130 | + resp %>% resp_headers() %>% str() |
| 131 | + resp %>% resp_header("Content-Length") |
| 132 | + ``` |
| 133 | + |
| 134 | + Headers are case insensitive: |
| 135 | + |
| 136 | + ```{r} |
| 137 | + resp %>% resp_header("CoNtEnT-LeNgTh") |
| 138 | + ``` |
| 139 | + |
| 140 | +- You can extract the body in various forms using the `resp_body_*()` family of functions. |
| 141 | + Since this response returns JSON we can use `resp_body_json()`: |
| 142 | + |
| 143 | + ```{r} |
| 144 | + resp %>% resp_body_json() %>% str() |
| 145 | + ``` |
| 146 | + |
| 147 | +Responses with status codes 4xx and 5xx are considered to be HTTP errors. |
| 148 | +httr2 automatically turns these into R errors: |
| 149 | + |
| 150 | +```{r, error = TRUE} |
| 151 | +request("https://httpbin.org/status/404") %>% req_fetch() |
| 152 | +request("https://httpbin.org/status/500") %>% req_fetch() |
| 153 | +``` |
| 154 | + |
| 155 | +This is another important difference with httr, which required that you explicitly call `httr::stop_for_status()` to turn HTTP errors into R errors. |
| 156 | +If needed, you can revert to the httr behaviour with `req_error(req, is_error = ~ FALSE)`. |
| 157 | + |
| 158 | +## Controlling the request process |
| 159 | + |
| 160 | +A number of `req_` functions don't directly affect how the request is sent but affect how it's handled. |
| 161 | +I'm not going to go into detail here, but wanted to make you aware of the options: |
| 162 | + |
| 163 | +- `req_cache()` sets up a cache so if repeated requests return the same results, you can avoid a trip to the server. |
| 164 | + |
| 165 | +- `req_throttle()` will automatically add a small delay before each request so you can avoid hammering a server with many requests. |
| 166 | + |
| 167 | +- `req_retry()` sets up a retry strategy so that if the request either fails or you get a transient HTTP error, it'll automatically retry after a short delay. |
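
The three options above can be combined on a single request. The following is only a sketch: the argument names (`path`, `rate`, `max_tries`) and the specific values are assumptions taken from released versions of httr2 and may not match this development snapshot. Building the request does not contact the server; the policies only take effect once the request is performed.

```r
library(httr2)

req_polite <- request("https://httpbin.org/json") %>%
  # Assumption: cache responses in a temporary directory.
  req_cache(path = tempfile()) %>%
  # Assumption: allow at most 10 requests per 60 seconds.
  req_throttle(rate = 10 / 60) %>%
  # Assumption: give up after 3 attempts on transient errors.
  req_retry(max_tries = 3)

req_polite
```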