duckplyr 1.0.0 #724
Furthermore, this blog post might need a benchmark. Maybe it could be structured around "why bother" (despite already having code that works without duckplyr, despite the fallbacks and some "annoying" incompatibilities like factors and timezones): duckplyr already works fairly well, and is under active development. And the choice is IMHO probably not duckplyr vs dplyr but rather duckplyr vs other dplyr backends. So large data support is crucial.
`duckdb_tibble()` needs the dot, the other functions don't.
b0ccfba to 0df0d53
- [computation to files](https://duckplyr.tidyverse.org/reference/compute_file.html) using `compute_parquet()` or `compute_csv()`.

A drawback of analyzing large data with duckplyr is that the limitations of duckplyr won't be compensated by fallbacks, since fallbacks to dplyr necessitate putting data into memory.
Therefore, if your pipeline encounters fallbacks, you might want to work around them by converting the duck frame into a table through `compute()` then running SQL code through the experimental `read_sql_duckdb()` function. Again, over time, we expect more native support for dplyr functionality.
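
A minimal sketch of that workaround, for illustration only. It assumes that `compute()` on a duck frame accepts a `name` argument to create a named DuckDB table, and that `read_sql_duckdb()` queries the same default connection. The table name `my_data` and the column `x` are hypothetical, and DuckDB's `ceil()` stands in for any operation that might otherwise trigger a fallback.

```r
library(dplyr)
library(duckplyr)

# Small stand-in for a dataset that would be too large for memory
df <- duckdb_tibble(x = c(1.2, 2.5, 3.7))

# Materialize the duck frame as a DuckDB table
# (assumption: compute() takes `name`; "my_data" is a hypothetical table name)
df |> compute(name = "my_data")

# Run the step in SQL so that the work stays inside DuckDB
res <- read_sql_duckdb("SELECT x, ceil(x) AS x_up FROM my_data ORDER BY x")

# Bring the (small) result into R
collect(res)
```

Because `read_sql_duckdb()` is experimental, its interface may change between releases.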
@krlmlr could we tweak the example to use `ceiling()`, which isn't supported I think? So it'd look more realistic. (I do not know SQL 🙈 )
Code for acknowledgements:
c1 <- usethis::use_tidy_thanks("tidyverse/duckplyr", to = "v1.0.0.9003")
c2 <- usethis::use_tidy_thanks("duckdb/duckdb-r", to = "v1.2.0")
all_c <- setdiff(union(c1, c2), c("krlmlr", "maelle", "github-actions[bot]")) # do not thank post authors
purrr::map_chr(all_c, ~ sprintf("[@%s](https://github.com/%s)", .x, .x)) |>
  glue::glue_collapse(sep = ", ", last = ", and ") |>
  clipr::write_clip()

<!-- FIXME:

We have many more dplyr backends, the two above are just from the tidyverse.
Is the action needed here to create this repo, under the cynkra org?
```

As with other dplyr backends such as dtplyr and dbplyr, duckplyr allows you to get faster results without learning a different syntax.
Unlike other dplyr backends, duckplyr does not require you to change existing code or learn specific idiosyncrasies.
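
As a rough illustration of the "same syntax" claim, the sketch below runs an ordinary dplyr pipeline on a duck frame. The data and the column names `g` and `x` are made up for the example; whether each verb runs natively in DuckDB or falls back to dplyr depends on the duckplyr version.

```r
library(dplyr)
library(duckplyr)

# A duck frame built from hypothetical toy data
df <- duckdb_tibble(g = c("a", "a", "b"), x = c(1, 2, 3))

# Plain dplyr code, with no backend-specific verbs
out <- df |>
  summarise(total = sum(x), .by = g) |>
  arrange(g)

# Materialize the result as a regular tibble
collect(out)
```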
I'm wondering whether this sentence is slightly misleading: to use duckplyr efficiently, it's probably best to read the "limits" vignette, which is akin to learning about idiosyncrasies?
…rudence Most users should not care about prudence at all. We must give it a name and formalize it only for providing this experience. I added the "clutter your memory" bit. How does this look to you?
@krlmlr the "stingy" example does not work, it should generate an error but does not. 🤔
I'm a bit undecided regarding structure. I tried starting with basic usage, but even simply discussing `library()` vs individual activation via `duck_tibble()` is better done with some understanding of prudence I think.