Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

req_body_file leaks file handles #534

Closed
plietar opened this issue Sep 4, 2024 · 6 comments · Fixed by #542
Closed

req_body_file leaks file handles #534

plietar opened this issue Sep 4, 2024 · 6 comments · Fixed by #542

Comments

@plietar
Copy link

plietar commented Sep 4, 2024

When using httr2::req_body_file, httr2 opens a connection to the file being uploaded but never closes it. Eventually the handle gets closed by the garbage collector, but a warning gets printed onto the console.

Looking at the source, I see some code that is supposed to close the handle on a short read:

httr2/R/req-body.R

Lines 241 to 246 in 06e9133

out <- readBin(con, "raw", nbytes)
if (length(out) < nbytes) {
close(con)
done <<- TRUE
con <<- NULL
}

However from what I can tell by adding print statements, libcurl never tries to read more than necessary, so short reads never happen, even on a successful file transfer. For small enough files, nbytes is always just equal to size, and to length(out)

One possible solution would be to accumulate the number of bytes read so far (the sum of all the length(out)), and when that equals size then close the file. This would work in the happy case where the connection is not closed early.

I think a more foolproof solution would be to record the connection handle in the request object, and have req_perform close it when the request is complete.

Example reproduction:

> writeLines("Hello", "hello.txt")
> httr2::request("https://example.com") |> httr2::req_body_file("hello.txt") |> httr2::req_perform()
<httr2_response>
POST https://example.com/
Status: 200 OK
Content-Type: text/html
Body: In memory (1256 bytes)

> gc()

           used  (Mb) gc trigger  (Mb) max used  (Mb)
Ncells  6221410 332.3    9245604 493.8  9245604 493.8
Vcells 37136790 283.4   64900073 495.2 42431065 323.8
Warning message:
In .Internal(gc(verbose, reset, full)) :
  closing unused connection 3 (hello.txt)

The call to gc() is used to force the GC to run, but even without it the warning message gets printed eventually.

@hadley
Copy link
Member

hadley commented Sep 4, 2024

Are you using the development version? I can't reproduce the problem:

library(httr2)

path <- tempfile()
writeLines("Hello", path)

showConnections()
#>   description class            mode text   isopen   can read can write
#> 3 "output"    "textConnection" "wr" "text" "opened" "no"     "yes"

resp <- request("https://example.com/") |> 
  req_body_file(path) |> 
  req_perform()

showConnections()
#>   description class            mode text   isopen   can read can write
#> 3 "output"    "textConnection" "wr" "text" "opened" "no"     "yes"

Created on 2024-09-04 with reprex v2.1.0

@plietar
Copy link
Author

plietar commented Sep 4, 2024

Yeah, same happens using the latest httr2 Git commit.

It might have to do with the curl version. I found this commit, introduced in curl 8.7.0 (ie. March 2024): curl/curl@9369c30 which refactors a lot of the file transfer code.

In particular, it adds a piece of code that caps the blen argument to the user callback:
curl/curl@9369c30#diff-c9087157c5758c4d1d00d619debe8f14f5e33227a892315175a90ea6dcbd9ad2R517-R524

  /* respect length limitations */
  if(ctx->total_len >= 0) {
    curl_off_t remain = ctx->total_len - ctx->read_len;
    if(remain <= 0)
      blen = 0;
    else if(remain < (curl_off_t)blen)
      blen = (size_t)remain;
  }
  nread = 0;
  if(data->state.fread_func && blen) {
    Curl_set_in_callback(data, true);
    nread = data->state.fread_func(buf, 1, blen, data->state.in);
    Curl_set_in_callback(data, false);
    ctx->has_used_cb = TRUE;
  }

I don't see any equivalent code in the removed lines, so I guess the behaviour has changed slightly, and where it used to pass the size of the full outgoing buffer, it now only passes the size of how many bytes are actually needed.

For reference, this is my curl version and sessionInfo:

> curl::curl_version()
$version
[1] "8.9.1"

$ssl_version
[1] "OpenSSL/3.0.14"

$libz_version
[1] "1.3.1"

$libssh_version
[1] "libssh2/1.11.0"

$libidn_version
[1] "2.3.7"

$host
[1] "x86_64-pc-linux-gnu"

$protocols
 [1] "dict"    "file"    "ftp"     "ftps"    "gopher"  "gophers" "http"
 [8] "https"   "imap"    "imaps"   "mqtt"    "pop3"    "pop3s"   "rtsp"
[15] "scp"     "sftp"    "smb"     "smbs"    "smtp"    "smtps"   "telnet"
[22] "tftp"

$ipv6
[1] TRUE

$http2
[1] TRUE

$idn
[1] TRUE

> sessionInfo()
R version 4.4.1 (2024-06-14)
Platform: x86_64-pc-linux-gnu
Running under: NixOS 23.11 (Tapir)

Matrix products: default
BLAS/LAPACK: /nix/store/7mqsnvbpnzli3lmb6vi0yfy5kyn1arlx-blas-3/lib/libblas.so.3;  LAPACK version 3.12.0

locale:
 [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C
 [3] LC_TIME=en_US.UTF-8        LC_COLLATE=en_US.UTF-8
 [5] LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8
 [7] LC_PAPER=en_US.UTF-8       LC_NAME=C
 [9] LC_ADDRESS=C               LC_TELEPHONE=C
[11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C

time zone: Europe/Paris
tzcode source: system (glibc)

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base

other attached packages:
[1] httr2_1.0.3.9000 testthat_3.2.1.1

loaded via a namespace (and not attached):
 [1] miniUI_0.1.1.1     compiler_4.4.1     brio_1.1.5         promises_1.3.0
 [5] Rcpp_1.0.12        stringr_1.5.1      later_1.3.2        fastmap_1.2.0
 [9] mime_0.12          R6_2.5.1           curl_5.2.2         htmlwidgets_1.6.4
[13] desc_1.4.3         profvis_0.3.7.9000 openssl_2.2.0      rprojroot_2.0.4
[17] shiny_1.8.1.1      rlang_1.1.4        cachem_1.1.0       stringi_1.8.4
[21] httpuv_1.6.15      fs_1.6.4           pkgload_1.3.4      memoise_2.0.1
[25] cli_3.6.2          withr_3.0.0        magrittr_2.0.3     digest_0.6.35
[29] rstudioapi_0.16.0  xtable_1.8-4       rappdirs_0.3.3     askpass_1.2.0
[33] remotes_2.5.0      devtools_2.4.5     lifecycle_1.0.4    vctrs_0.6.5
[37] glue_1.7.0         urlchecker_1.0.1   sessioninfo_1.2.2  pkgbuild_1.4.4
[41] purrr_1.0.2        tools_4.4.1        usethis_2.2.3      ellipsis_0.3.2
[45] htmltools_0.5.8.1

@hadley
Copy link
Member

hadley commented Sep 4, 2024

Can you please send me the results of my reprex run on your computer?

hadley added a commit that referenced this issue Sep 4, 2024
So we can ensure that open connections are always closed. Fixes #534.
@plietar
Copy link
Author

plietar commented Sep 4, 2024

Reprex below. I've had to run reprex with std_out_err = TRUE to capture the GC warnings. See the details pane.

showConnections always runs a gc() first, which means that its output never includes the leaked file. See: https://github.com/r-devel/r-svn/blob/b927f3dd98f11b32c6c36c15bc8093ec87db008e/src/library/base/R/connections.R#L223

library(httr2)

path <- tempfile()
writeLines("Hello", path)

showConnections()
#>   description class            mode text   isopen   can read can write
#> 3 "output"    "textConnection" "wr" "text" "opened" "no"     "yes"

resp <- request("https://example.com/") |> 
  req_body_file(path) |> 
  req_perform()

showConnections()
#>   description class            mode text   isopen   can read can write
#> 3 "output"    "textConnection" "wr" "text" "opened" "no"     "yes"

Created on 2024-09-04 with reprex v2.1.0

Standard output and standard error
Warning message:
In .Internal(gc(verbose, reset, full)) :
  closing unused connection 4 (/tmp/nix-shell.Of66Tb/RtmpPnh7Wt/filef07445c59a7d)

I can run a slightly different reprex, which uses getAllConnections and summary.connection instead of showConnections. With that I actually do get the extra handle lingering around.

library(httr2)

path <- tempfile()
writeLines("Hello", path)

sapply(getAllConnections(), \(x) summary.connection(x)$description)
#> [1] "stdin"  "stdout" "stderr" "output"

resp <- request("https://example.com/") |> 
  req_body_file(path) |> 
  req_perform()

sapply(getAllConnections(), \(x) summary.connection(x)$description)
#> [1] "stdin"                                            
#> [2] "stdout"                                           
#> [3] "stderr"                                           
#> [4] "output"                                           
#> [5] "/tmp/nix-shell.Of66Tb/Rtmp7pXUfP/filefd4a7ce78bdc"

gc()
#>           used (Mb) gc trigger (Mb) max used (Mb)
#> Ncells  707211 37.8    1429215 76.4   835805 44.7
#> Vcells 1282302  9.8    8388608 64.0  2003164 15.3

sapply(getAllConnections(), \(x) summary.connection(x)$description)
#> [1] "stdin"  "stdout" "stderr" "output"

Created on 2024-09-04 with reprex v2.1.0

Standard output and standard error
Warning message:
In .Internal(gc(verbose, reset, full)) :
  closing unused connection 4 (/tmp/nix-shell.Of66Tb/Rtmp7pXUfP/filefd4a7ce78bdc)

@hadley
Copy link
Member

hadley commented Sep 4, 2024

Thanks! I hacked together a quick fix in #542. I need to think about it a bit more to make sure it's the right approach, but I think it should solve your problem (and is a safer approach in general).

@plietar
Copy link
Author

plietar commented Sep 4, 2024

Thanks for the quick resolution! I can confirm that the done-callback branch fixes the issue for me.

hadley added a commit that referenced this issue Sep 5, 2024
So we can ensure that open connections are always closed. Fixes #534.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging a pull request may close this issue.

2 participants