
[VL] Unify native-build component isolation via a single resolver (macOS + Linux)#12331

Draft
jackylee-ch wants to merge 1 commit into
apache:mainfrom
jackylee-ch:gluten-native-build-isolation

@jackylee-ch


What changes are proposed in this pull request?

Native-build path policy was duplicated across three shell entry points
(builddeps-veloxbe.sh, build-helper-functions.sh, build-velox.sh), each
independently hardcoding -DCMAKE_IGNORE_PREFIX_PATH=/usr/local on macOS only.
This left Linux without first-class isolation and, importantly, left Velox's
own dependency builds (folly, bundled Arrow, …) unprotected from /usr/local.

This PR introduces dev/build-isolation.sh as a single source of truth. It
normalizes all path inputs, decides isolation on/off, and emits a CMake
toolchain fragment + path-policy.env + machine-readable resolved roots that
every build layer consumes.

Default behavior (user-facing contract)

  • macOS and Linux are both default-on (GLUTEN_BUILD_ISOLATION=auto → on);
    vcpkg forces off; explicit on+vcpkg fails fast (only one toolchain slot).
  • macOS default: local prefix ${VELOX_HOME}/deps-install; /usr/local ignored.
  • Linux default: setup still installs to system dirs, which stay trusted-managed
    (Docker/CI behavior and artifact locations unchanged); only ambient residue
    (stray Conda, user CMake registry) is filtered. On Linux this is effectively
    a no-op unless you opt into a separate install.
  • Either platform + explicit INSTALL_PREFIX (separate install): /usr/local
    and /usr flip to ignored, with GLUTEN_ALLOW_IGNORED_ROOTS /
    GLUTEN_TRUST_PREFIX escape hatches. GLUTEN_BUILD_ISOLATION=off is a full
    kill-switch on both platforms.
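
As a rough illustration of the on/off contract above, the decision could be sketched as follows. This is not the code in dev/build-isolation.sh: the function name `resolve_isolation_mode` and its two positional arguments are hypothetical, and only the auto/on/off values and the vcpkg interaction come from the description.

```shell
#!/usr/bin/env bash
# Hypothetical sketch of the mode decision; names are illustrative only.
# Usage: resolve_isolation_mode <requested: auto|on|off> <vcpkg: ON|OFF>
resolve_isolation_mode() {
  local requested="${1:-auto}" vcpkg="${2:-OFF}"
  case "${requested}" in
    off)
      # Full kill-switch on both platforms.
      echo "off" ;;
    auto)
      # auto resolves to on, unless vcpkg already owns the toolchain slot.
      if [ "${vcpkg}" = "ON" ]; then echo "off"; else echo "on"; fi ;;
    on)
      # Explicit on + vcpkg is a conflict: fail fast instead of guessing.
      if [ "${vcpkg}" = "ON" ]; then
        echo "isolation=on conflicts with vcpkg (only one toolchain slot)" >&2
        return 1
      fi
      echo "on" ;;
    *)
      echo "unknown isolation mode: ${requested}" >&2
      return 2 ;;
  esac
}
```

A caller would typically pass `"${GLUTEN_BUILD_ISOLATION:-auto}"` as the first argument and abort the build on a non-zero return.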

Two-level isolation

  • CMake find policy: ignore roots + NO_SYSTEM_FROM_IMPORTED + package-registry
    off, propagated to every nested cmake (incl. Velox's own dependency setup)
    via the exported CMAKE_TOOLCHAIN_FILE. The toolchain carries only the ignore
    policy — it deliberately does not prepend trusted prefixes globally, which
    would wrongly redirect Velox's/Arrow's self-contained bundled builds to
    deps-install artifacts.
  • Compiler include search: CMAKE_IGNORE_* doesn't govern the compiler, and on
    macOS clang searches /usr/local/include ahead of -isystem, so a stale
    header there (e.g. an old gtest/fmt) shadows the bundled copy. The resolver
    exports CFLAGS/CXXFLAGS with -idirafter <ignored>/include to demote those
    roots below every -I/-isystem dir; child cmake processes inherit it.
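
The two levels can be sketched roughly as below. The function names are hypothetical and the fragment the resolver actually generates may differ; the CMake variables used here (`CMAKE_IGNORE_PREFIX_PATH`, `CMAKE_NO_SYSTEM_FROM_IMPORTED`, `CMAKE_FIND_USE_PACKAGE_REGISTRY`, `CMAKE_FIND_USE_SYSTEM_PACKAGE_REGISTRY`) are standard CMake variables chosen as one plausible way to express the policy described above.

```shell
#!/usr/bin/env bash
# Hypothetical sketch; function names and output layout are illustrative.

# Level 1: print a toolchain fragment that carries only the ignore policy.
# Arguments: the ignored roots, e.g. /usr/local
emit_toolchain_fragment() {
  local joined="" root
  for root in "$@"; do
    joined="${joined:+${joined};}${root}"   # CMake lists are ;-separated
  done
  cat <<EOF
# Generated: ignore policy only, no trusted prefixes are prepended.
set(CMAKE_IGNORE_PREFIX_PATH "${joined}")
set(CMAKE_NO_SYSTEM_FROM_IMPORTED ON)
set(CMAKE_FIND_USE_PACKAGE_REGISTRY OFF)
set(CMAKE_FIND_USE_SYSTEM_PACKAGE_REGISTRY OFF)
EOF
}

# Level 2: print "-idirafter <root>/include" for every ignored root.
idirafter_flags() {
  local flags="" root
  for root in "$@"; do
    flags="${flags:+${flags} }-idirafter ${root}/include"
  done
  printf '%s\n' "${flags}"
}

# Possible wiring (assumed, not taken from the PR):
#   emit_toolchain_fragment /usr/local > "${fragment}"
#   export CMAKE_TOOLCHAIN_FILE="${fragment}"
#   export CXXFLAGS="${CXXFLAGS:-} $(idirafter_flags /usr/local)"
```

Exporting both `CMAKE_TOOLCHAIN_FILE` and the compiler flags through the environment is what lets nested cmake invocations pick them up without each script passing them explicitly.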

dev/build-arrow.sh: guard the destructive download-dir removal (never wipe a
user-provided ARROW_PREFIX) and resolve a sane default install prefix for
standalone runs instead of silently targeting /usr/local.
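
A guard of this kind might look like the sketch below. The helper `remove_if_managed` and the notion of a single "managed root" are assumptions made for illustration; the actual check in dev/build-arrow.sh may be structured differently.

```shell
#!/usr/bin/env bash
# Hypothetical helper: delete a directory only when it lives strictly
# below the build's own managed root; refuse anything else.
remove_if_managed() {
  local target="$1" managed_root="$2"
  [ -n "${target}" ] && [ -n "${managed_root}" ] || return 1
  # Reject paths containing ".." so they cannot escape the managed root.
  case "${target}/" in
    */../*) echo "refusing path with '..': ${target}" >&2; return 1 ;;
  esac
  case "${target%/}/" in
    "${managed_root%/}"/?*)
      rm -rf -- "${target}" ;;
    *)
      echo "refusing to remove unmanaged path: ${target}" >&2
      return 1 ;;
  esac
}
```

With this shape, a user-provided prefix outside the managed root is never deleted, and the managed root itself is also left in place.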

This PR is best read not as "block /usr/local" but as establishing a single
resolver for component installation and dependency discovery: opt out via
GLUTEN_BUILD_ISOLATION, user-explicit paths win, and ambient residue is isolated.

How was this patch tested?

  • dev/tests/test-build-isolation.sh: a fast (seconds) dry-run scenario harness
    asserting the resolved policy and CMAKE_TOOLCHAIN_FILE / -idirafter
    propagation for the default-behavior scenarios (macOS/Linux defaults,
    separate-install, system mode, vcpkg off, on+vcpkg fail-fast, kill-switch,
    conda filtering, allow-list). 19/19 passing.
  • End-to-end complete native macOS build (arm64) on top of the project's
    macOS build fixes: produced valid libgluten.dylib + libvelox.dylib with
    zero /usr/local linkage (otool -L), confirming the isolation holds at
    the runtime-link level.
  • Linux path is a no-op by default (system stays trusted), preserving existing
    Docker/CI behavior.
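
The otool -L check from the macOS build can be automated with something like the sketch below. The helper name `no_linkage_under` is hypothetical and is not part of dev/tests/test-build-isolation.sh; it only assumes the usual `otool -L` layout of one indented dependency path per line.

```shell
#!/usr/bin/env bash
# Hypothetical helper: reads `otool -L` output on stdin and succeeds only
# when no listed dependency path starts with the given root.
no_linkage_under() {
  local root="${1%/}"
  awk -v prefix="${root}/" '
    {
      sub(/^[ \t]+/, "")                    # drop the leading indentation
      if (index($0, prefix) == 1) found = 1 # path begins with the root
    }
    END { exit (found ? 1 : 0) }
  '
}

# Possible usage on macOS (library path is illustrative):
#   otool -L libgluten.dylib | no_linkage_under /usr/local
```

The prefix is compared as a plain string rather than a regular expression, so roots containing characters such as `.` or `+` need no escaping.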

Was this patch authored or co-authored using generative AI tooling?

Co-authored using Claude (Opus) via Claude Code.

Copilot AI review requested due to automatic review settings June 22, 2026 12:28

Copilot AI left a comment


Pull request overview

Introduces a unified native-build “component isolation” resolver (dev/build-isolation.sh) and wires it into the Velox/Arrow build entrypoints so that path-discovery isolation (CMake + compiler include search demotion) is consistently applied across macOS and Linux.

Changes:

  • Add dev/build-isolation.sh to compute isolation policy, generate a toolchain fragment, and export discovery/ignore flags for all nested CMake builds.
  • Update native build entrypoints (builddeps-veloxbe.sh, build-velox.sh, build-helper-functions.sh) to consume the resolver outputs instead of duplicating macOS-only /usr/local ignore logic.
  • Add a dry-run scenario test harness (dev/tests/test-build-isolation.sh) and ignore generated artifacts via .gitignore; harden dev/build-arrow.sh’s managed-dir cleanup behavior.

Reviewed changes

Copilot reviewed 6 out of 7 changed files in this pull request and generated 3 comments.

| File | Description |
| --- | --- |
| ep/build-velox/src/build-velox.sh | Sources the isolation resolver and injects resolved CMake/compiler isolation flags into Velox’s build. |
| dev/tests/test-build-isolation.sh | Adds dry-run tests validating resolved isolation behavior and propagation. |
| dev/builddeps-veloxbe.sh | Establishes install-prefix explicitness and applies resolver-derived CMake isolation across native build stages. |
| dev/build-isolation.sh | New single-source resolver that generates toolchain/env artifacts and exports isolation flags/toolchain. |
| dev/build-helper-functions.sh | Uses the resolver to apply isolation flags consistently for CMake-based dependency builds. |
| dev/build-arrow.sh | Avoids wiping user-provided Arrow prefixes and consults resolver for standalone prefix defaults. |
| .gitignore | Ignores generated isolation artifacts under .gluten-build-cache/. |


Comment thread dev/build-arrow.sh
Comment thread dev/build-isolation.sh Outdated
Comment thread dev/build-isolation.sh
@jackylee-ch jackylee-ch marked this pull request as draft June 22, 2026 13:47
@jackylee-ch jackylee-ch force-pushed the gluten-native-build-isolation branch 2 times, most recently from ee16043 to 51280f4 on June 23, 2026 06:57
…cOS + Linux)

Native-build path policy was duplicated across three shell entry points
(builddeps-veloxbe.sh, build-helper-functions.sh, build-velox.sh), each
independently hardcoding `-DCMAKE_IGNORE_PREFIX_PATH=/usr/local` on macOS only.
This left Linux without first-class isolation and, importantly, left Velox's
own dependency builds (folly, bundled Arrow, ...) unprotected from /usr/local.

Introduce dev/build-isolation.sh as a single source of truth. It normalizes all
path inputs, decides isolation on/off, and emits a CMake toolchain fragment +
path-policy.env + machine-readable resolved_{trusted,ignored,runtime_ignored}_roots
under the already-gitignored ep/_ep working dir. Every build layer consumes them.

Default behavior (user-facing contract):
  * macOS and Linux both default-on (GLUTEN_BUILD_ISOLATION=auto -> on); vcpkg
    forces off; explicit on+vcpkg fails fast (only one toolchain slot).
  * macOS default: local prefix ${VELOX_HOME}/deps-install; /usr/local ignored.
  * Linux default: setup still installs to system dirs (trusted-managed,
    Docker/CI behavior and artifact locations unchanged); only ambient residue
    (stray Conda, user CMake registry) is filtered -- effectively a no-op unless
    you opt into a separate install.
  * Either platform + explicit INSTALL_PREFIX (separate install): /usr/local and
    /usr flip to ignored, with GLUTEN_ALLOW_IGNORED_ROOTS / GLUTEN_TRUST_PREFIX
    escape hatches. GLUTEN_BUILD_ISOLATION=off is a full kill-switch.

Two-level isolation:
  * CMake find policy: ignore roots + NO_SYSTEM_FROM_IMPORTED + package-registry
    off, propagated to every nested cmake (incl. Velox's own dependency setup)
    via the exported CMAKE_TOOLCHAIN_FILE. The toolchain carries only the ignore
    policy -- it does NOT prepend trusted prefixes globally, which would wrongly
    redirect Velox's/Arrow's self-contained bundled builds to deps-install.
  * Compiler include search: CMAKE_IGNORE_* doesn't govern the compiler, and on
    macOS clang searches /usr/local/include ahead of -isystem, so a stale header
    there (e.g. an old gtest/fmt) shadows the bundled copy. The resolver exports
    CFLAGS/CXXFLAGS with `-idirafter <ignored>/include` to demote those roots
    below every -I/-isystem dir; child cmake processes inherit it.

build-arrow.sh: guard the destructive download-dir removal (never wipe a
user-provided ARROW_PREFIX) and resolve a sane default install prefix for
standalone runs instead of silently targeting /usr/local.

Verified end-to-end by a complete native macOS build (arm64): valid
libgluten.dylib + libvelox.dylib with zero /usr/local linkage (otool -L). The
resolver supports GLUTEN_ISOLATION_DRYRUN=1 to emit the policy without building.
Linux is a no-op by default, preserving existing Docker/CI behavior.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
@jackylee-ch jackylee-ch force-pushed the gluten-native-build-isolation branch from 51280f4 to 44b67a8 on June 23, 2026 07:29
