GH-49067: [R] Disable GCS on macos #49068
Merged
+197
−3
4 commits
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,193 @@ | ||
| --- | ||
| title: "Libarrow binary features" | ||
| description: > | ||
| Understanding which C++ features are enabled in Arrow R package builds | ||
| output: rmarkdown::html_vignette | ||
| --- | ||
|
|
||
| This document explains which C++ features are enabled in different Arrow R | ||
| package build configurations and records the decisions behind our default | ||
| feature set. It is internal developer documentation, not a guide to | ||
| installing the Arrow R package; for that, see the | ||
| [installation guide](../../install.html). | ||
|
|
||
| ## Overview | ||
|
|
||
| When the Arrow R package is installed, it needs a copy of the Arrow C++ library | ||
| (libarrow). This can come from: | ||
|
|
||
| 1. **Prebuilt binaries** we host (for releases and nightlies) | ||
| 2. **Source builds** when binaries aren't available or users opt out | ||
|
|
||
| The features available in libarrow depend on how it was built. This document | ||
| covers the feature configuration for both scenarios. | ||
|
|
||
| ## Prebuilt libarrow binary configuration | ||
|
|
||
| We produce prebuilt libarrow binaries for macOS, Windows, and Linux. These | ||
| binaries include **more features** than the default source build, so that | ||
| users get a fully featured experience out of the box. | ||
|
|
||
| ### Current binary feature set | ||
|
|
||
| | Platform | S3 | GCS | Configured in | | ||
| |----------|----|----|---------------| | ||
| | macOS (ARM64, x86_64) | ON | ON | `dev/tasks/r/github.packages.yml` | | ||
| | Windows | ON | ON | `ci/scripts/PKGBUILD` | | ||
| | Linux (x86_64) | ON | ON | `compose.yaml` (`ubuntu-cpp-static`) | | ||
|
|
||
| ### Exceptions to our build defaults | ||
|
|
||
| Even though GCS defaults to OFF for source builds, we explicitly enable it in | ||
| our prebuilt binaries because: | ||
|
|
||
| 1. **Binary users expect features to "just work"** - they shouldn't need to | ||
| rebuild from source to access cloud storage | ||
| 2. **Build time is not a concern** - we build binaries once in CI, not on | ||
| user machines | ||
| 3. **Parity across platforms** - users get the same features regardless of OS | ||
|
|
||
|
|
||
| ## Feature configuration in source builds of libarrow | ||
|
|
||
| Source builds are controlled by `r/inst/build_arrow_static.sh`. The key | ||
| environment variable is `LIBARROW_MINIMAL`: | ||
|
|
||
| - `LIBARROW_MINIMAL` unset: Default feature set (Parquet, Dataset, JSON, common compression ON; S3/GCS/jemalloc OFF) | ||
| - `LIBARROW_MINIMAL=false`: Full feature set (adds S3, jemalloc, additional compression) | ||
| - `LIBARROW_MINIMAL=true`: Truly minimal (disables Parquet, Dataset, JSON, most compression, SIMD) | ||
|
|
||
| ### Features always enabled | ||
|
|
||
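| These features are built by default. As noted in the list above, | ||
| `LIBARROW_MINIMAL=true` is the exception: it disables Parquet, Dataset, JSON, | ||
| and most compression. | ||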
|
|
||
| | Feature | CMake Flag | Notes | | ||
| |---------|------------|-------| | ||
| | Compute | `ARROW_COMPUTE=ON` | Core compute functions | | ||
| | CSV | `ARROW_CSV=ON` | CSV reading/writing | | ||
| | Filesystem | `ARROW_FILESYSTEM=ON` | Local filesystem support | | ||
| | JSON | `ARROW_JSON=ON` | JSON reading | | ||
| | Parquet | `ARROW_PARQUET=ON` | Parquet file format | | ||
| | Dataset | `ARROW_DATASET=ON` | Multi-file datasets | | ||
| | Acero | `ARROW_ACERO=ON` | Query execution engine | | ||
| | Mimalloc | `ARROW_MIMALLOC=ON` | Memory allocator | | ||
| | LZ4 | `ARROW_WITH_LZ4=ON` | LZ4 compression | | ||
| | Snappy | `ARROW_WITH_SNAPPY=ON` | Snappy compression | | ||
| | RE2 | `ARROW_WITH_RE2=ON` | Regular expressions | | ||
| | UTF8Proc | `ARROW_WITH_UTF8PROC=ON` | Unicode support | | ||
|
|
||
| ### Features controlled by LIBARROW_MINIMAL | ||
|
|
||
| When `LIBARROW_MINIMAL=false`, the following additional features are enabled | ||
| (via `$ARROW_DEFAULT_PARAM=ON`): | ||
|
|
||
| | Feature | CMake Flag | Default | | ||
| |---------|------------|---------| | ||
| | S3 | `ARROW_S3` | `$ARROW_DEFAULT_PARAM` | | ||
| | Jemalloc | `ARROW_JEMALLOC` | `$ARROW_DEFAULT_PARAM` | | ||
| | Brotli | `ARROW_WITH_BROTLI` | `$ARROW_DEFAULT_PARAM` | | ||
| | BZ2 | `ARROW_WITH_BZ2` | `$ARROW_DEFAULT_PARAM` | | ||
| | Zlib | `ARROW_WITH_ZLIB` | `$ARROW_DEFAULT_PARAM` | | ||
| | Zstd | `ARROW_WITH_ZSTD` | `$ARROW_DEFAULT_PARAM` | | ||
|
|
||
| ### Features that require explicit opt-in | ||
|
|
||
| GCS (Google Cloud Storage) is **always off by default**, even when | ||
| `LIBARROW_MINIMAL=false`: | ||
|
|
||
| | Feature | CMake Flag | Default | Reason | | ||
| |---------|------------|---------|--------| | ||
| | GCS | `ARROW_GCS` | `OFF` | Build complexity, dependency size | | ||
|
|
||
| To enable GCS in a source build, you must explicitly set `ARROW_GCS=ON`. | ||
|
|
||
| **Why is GCS off by default?** | ||
|
|
||
| GCS was turned off by default in [#48343](https://github.com/apache/arrow/pull/48343) | ||
| (December 2025) because: | ||
|
|
||
| 1. Building google-cloud-cpp is fragile and adds significant build time | ||
| 2. The dependency on abseil (ABSL) has caused compatibility issues | ||
| 3. Users who need GCS can still enable it explicitly | ||
|
|
||
| ## Configuration file locations | ||
|
|
||
| ### libarrow source build configuration | ||
|
|
||
| The main build script that controls source builds: | ||
|
|
||
| **`r/inst/build_arrow_static.sh`** - CMake flags and defaults | ||
| ([view source](https://github.com/apache/arrow/blob/main/r/inst/build_arrow_static.sh)). | ||
| The environment variables to look for are `LIBARROW_MINIMAL`, `ARROW_*`, and `ARROW_DEFAULT_PARAM`. | ||
|
|
||
| ### libarrow binary build configuration | ||
|
|
||
| Each platform has its own configuration file: | ||
|
|
||
| | Platform | Config file | Key settings | | ||
| |----------|-------------|--------------| | ||
| | macOS | `dev/tasks/r/github.packages.yml` | `LIBARROW_MINIMAL=false`, `ARROW_GCS=ON` | | ||
| | Windows | `ci/scripts/PKGBUILD` | `ARROW_GCS=ON`, `ARROW_S3=ON` | | ||
| | Linux | `compose.yaml` (`ubuntu-cpp-static`) | `LIBARROW_MINIMAL=false`, `ARROW_GCS=ON` | | ||
|
|
||
| ## R-universe builds | ||
|
|
||
| [R-universe](https://apache.r-universe.dev/arrow) builds the Arrow R package | ||
| for users who want versions newer than the CRAN release. R-universe behavior | ||
| varies by platform and architecture: | ||
|
|
||
| | Platform | Architecture | Build method | Features | | ||
| |----------|--------------|--------------|----------| | ||
| | macOS | ARM64 | Downloads prebuilt binary | Full (S3 + GCS) | | ||
| | macOS | x86_64 | Downloads prebuilt binary | Full (S3 + GCS) | | ||
| | Windows | x86_64 | Downloads prebuilt binary | Full (S3 + GCS) | | ||
| | Windows | ARM64 | Not supported | NA | | ||
| | Linux | x86_64 | Downloads prebuilt binary | Full (S3 + GCS) | | ||
| | Linux | ARM64 | Builds from source | S3 only (no GCS) | | ||
|
|
||
| ### Why Linux ARM64 builds from source | ||
|
|
||
| We publish prebuilt Linux binaries only for the x86_64 architecture. The binary | ||
| selection logic in `r/tools/nixlibs.R` (line 263) explicitly checks for this: | ||
|
|
||
| ```r | ||
| if (identical(os, "darwin") || (identical(os, "linux") && identical(arch, "x86_64"))) { | ||
| ``` | ||
| When R-universe builds on Linux ARM64 runners, no binary is available, so it | ||
| falls back to building from source using `build_arrow_static.sh`. Since GCS | ||
| defaults to OFF in that script, Linux ARM64 users don't get GCS support. | ||
|
|
||
| ### Enabling GCS for Linux ARM64 | ||
|
|
||
| To provide full feature parity for Linux ARM64, we would need to: | ||
|
|
||
| 1. Add an ARM64 Linux build job to `dev/tasks/r/github.packages.yml` | ||
| 2. Update `select_binary()` in `nixlibs.R` to recognize `linux-aarch64` | ||
| 3. Add the artifact pattern to `dev/tasks/tasks.yml` | ||
| 4. Update the nightly upload workflow | ||
|
|
||
| See [GH-36193](https://github.com/apache/arrow/issues/36193) for tracking this work. | ||
|
|
||
| Alternatively, changing the GCS default in `build_arrow_static.sh` from `OFF` | ||
| to `$ARROW_DEFAULT_PARAM` would enable GCS for all source builds, including | ||
| Linux ARM64 on R-universe. | ||
|
|
||
| ## Checking installed features | ||
|
|
||
| Users can check which features are enabled in their installation: | ||
|
|
||
| ```r | ||
| # Show all capabilities | ||
| arrow::arrow_info() | ||
|
|
||
| # Check specific features | ||
| arrow::arrow_with_s3() | ||
| arrow::arrow_with_gcs() | ||
| ``` | ||
|
|
||
| ## Related documentation | ||
|
|
||
| - [Installation guide](../install.html) - User-facing installation docs | ||
| - [Installation details](./install_details.html) - How the build system works | ||
| - [Developer setup](./setup.html) - Building Arrow for development | ||
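The source-build defaults described in the vignette above (optional features following `$ARROW_DEFAULT_PARAM`, GCS staying off unless requested) may be easier to follow as a small model. The sketch below is only an illustration of the tables in the diff: `resolve_optional_features()` is a hypothetical helper, it does not read or run `r/inst/build_arrow_static.sh`, and the real script may differ in details.

```r
# Hypothetical helper: models the defaults documented in the vignette.
# It is NOT part of the arrow package or of build_arrow_static.sh.
resolve_optional_features <- function(libarrow_minimal = NA, arrow_gcs = NA) {
  # Per the vignette, LIBARROW_MINIMAL=false turns ARROW_DEFAULT_PARAM ON;
  # leaving it unset (NA here) or setting "true" keeps these features OFF.
  default_param <- if (identical(libarrow_minimal, "false")) "ON" else "OFF"

  c(
    ARROW_S3 = default_param,
    ARROW_JEMALLOC = default_param,
    ARROW_WITH_BROTLI = default_param,
    ARROW_WITH_BZ2 = default_param,
    ARROW_WITH_ZLIB = default_param,
    ARROW_WITH_ZSTD = default_param,
    # GCS does not follow ARROW_DEFAULT_PARAM: OFF unless set explicitly.
    ARROW_GCS = if (identical(arrow_gcs, "ON")) "ON" else "OFF"
  )
}

# Full feature set without GCS, then with the explicit opt-in.
resolve_optional_features(libarrow_minimal = "false")
resolve_optional_features(libarrow_minimal = "false", arrow_gcs = "ON")
```

Under this model, `LIBARROW_MINIMAL=false` alone never enables GCS, which matches the "explicit opt-in" section of the vignette.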
These are the docs I wish we had a few years ago, thank you for writing them!!
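For readers following the "Why Linux ARM64 builds from source" section of the vignette, the quoted condition from `r/tools/nixlibs.R` can be wrapped in a tiny function to show which OS and architecture pairs take which path. This is a sketch under assumptions: `libarrow_source_for()` is a hypothetical name, it is not the package's `select_binary()`, and it leaves out every other check the real code performs. Windows is out of scope here because its binaries are configured separately (`ci/scripts/PKGBUILD`).

```r
# Hypothetical helper built around the condition quoted in the vignette.
# It is NOT the real select_binary() from r/tools/nixlibs.R.
libarrow_source_for <- function(os, arch) {
  # Prebuilt binaries: any macOS architecture, or Linux on x86_64 only.
  has_prebuilt <- identical(os, "darwin") ||
    (identical(os, "linux") && identical(arch, "x86_64"))

  if (has_prebuilt) "prebuilt binary" else "source build"
}

libarrow_source_for("darwin", "arm64")   # "prebuilt binary"
libarrow_source_for("linux", "x86_64")   # "prebuilt binary"
libarrow_source_for("linux", "aarch64")  # "source build"
```

The last call is the Linux ARM64 case from the R-universe table: no binary matches, so the build falls back to source and picks up the source-build defaults, including GCS off.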