[MINOR][TESTS] Extract TestEnvHelper trait from SparkTestSuite#55710
Draft
zhengruifeng wants to merge 8 commits intoapache:masterfrom
Draft
[MINOR][TESTS] Extract TestEnvHelper trait from SparkTestSuite#55710zhengruifeng wants to merge 8 commits intoapache:masterfrom
zhengruifeng wants to merge 8 commits intoapache:masterfrom
Conversation
### What changes were proposed in this pull request? Introduces `org.apache.spark.util.SparkTestPaths` in `common-utils` test sources, holding the two helpers that are currently duplicated across module boundaries: - `getWorkspaceFilePath(parts...)` — resolve a path relative to `spark.test.home` / `SPARK_HOME`. - `regenerateGoldenFiles` — env-var check for `SPARK_GENERATE_GOLDEN_FILES=1`. The trait is mixed into: - `SparkTestSuite` (core) — drops local copies of both helpers. - `ConnectFunSuite` (connect client) — drops the local `getWorkspaceFilePath` (which carried a `// Borrowed from SparkFunSuite` comment). - `LogKeysSuite` (common-utils) — drops local copies of both. Also removes the now-redundant `private val regenerateGoldenFiles` in `PlanGenerationTestSuite` (it would have shadowed/clashed with the inherited `protected def`). Living in `common-utils` is required because some consumers (`LogKeysSuite` in `common-utils`, `ConnectFunSuite` in the shaded Spark Connect client) cannot reach `SparkFunSuite` / `SparkTestSuite` in `spark-core` due to module dependency direction. ### Why are the changes needed? Cleanup. The two helpers are byte-for-byte identical at three call sites (four counting the `regenerateGoldenFiles` clone), and the connect copy explicitly comments that it was borrowed from `SparkFunSuite`. Consolidating prevents drift. ### Does this PR introduce _any_ user-facing change? No. Test-only refactor. ### How was this patch tested? `build/sbt common-utils/Test/compile core/Test/compile connect-client-jvm/Test/compile sql/Test/compile` succeeds. No behavior change. ### Was this patch authored or co-authored using generative AI tooling? Generated-by: Claude Code (Opus 4.7)
The trait covers env-var-driven test plumbing — both the workspace path resolution (via `spark.test.home` / `SPARK_HOME`) and the golden-file regeneration flag (`SPARK_GENERATE_GOLDEN_FILES`). The earlier name was too narrow for the latter. Generated-by: Claude Code (Opus 4.7)
Drop the redundant `Spark` prefix; everything in `org.apache.spark.util` is already Spark-scoped. Generated-by: Claude Code (Opus 4.7)
Mixing `TestEnvHelper` (defined in `common-utils` test sources) into `ConnectFunSuite` puts it in the parent chain of every connect-client test class. When a closure from such a test is sent to the connect server (e.g. UDFs in `KeyValueGroupedDatasetE2ETestSuite`), the server JVM tries to load `TestEnvHelper.class` and fails because `common-utils` test JAR is not registered as an artifact (only `connect-client-jvm` test-classes is, via `RemoteSparkSession`). Restore `ConnectFunSuite`'s local `getWorkspaceFilePath` and `PlanGenerationTestSuite`'s local `regenerateGoldenFiles`. The `TestEnvHelper` trait remains shared between `SparkTestSuite` (core) and `LogKeysSuite` (common-utils), where the cross-classloader path does not apply. Generated-by: Claude Code (Opus 4.7)
HyukjinKwon
approved these changes
May 7, 2026
Re-extends `ConnectFunSuite` with `TestEnvHelper`, but this time also uploads `common-utils/target/<scala>/test-classes` to the connect server's artifact path so the server can resolve `TestEnvHelper.class` when deserializing client-side closures. Without this, every test class extending `ConnectFunSuite` had `TestEnvHelper` in its parent chain, and any closure sent to the server (UDFs, KV grouped operations) would fail with `SparkClassNotFoundException: Failed to load class: org.apache.spark.util.TestEnvHelper`. Generated-by: Claude Code (Opus 4.7)
Maven (unlike SBT) does not propagate test-scoped transitive dependencies. Mixing `TestEnvHelper` (in `common-utils` test-jar) into `SparkTestSuite` (in `core` test-jar) put `TestEnvHelper` in the linearization of `SparkFunSuite`, which broke Maven compilation of every downstream module that depends on `core` test-jar without an explicit `common-utils` test-jar dep — graphx, mllib, sql/catalyst, sql/hive, etc. Restore `SparkTestSuite`'s local `getWorkspaceFilePath` and `regenerateGoldenFiles`. The `TestEnvHelper` trait is still shared between `ConnectFunSuite` (connect-client-jvm — pom already declares the dep) and `LogKeysSuite` (common-utils itself). Generated-by: Claude Code (Opus 4.7)
…kTestSuite Re-targets the trait at `SparkTestSuite` only, dropping the shared usage across `LogKeysSuite` (common-utils) and `ConnectFunSuite` (connect-client-jvm). The trait now lives in `core/src/test/scala/` beside `SparkTestSuite`, so `TestEnvHelper.class` ships in the `core` test-jar that every downstream test module already depends on — no Maven dependency-resolution surprises. Reverts: - `LogKeysSuite` back to its local `getWorkspaceFilePath` and `regenerateGoldenFiles` (common-utils does not depend on core). - `ConnectFunSuite` back to its local `getWorkspaceFilePath` (connect-client-jvm does not depend on core, and closure serialization to the connect server made cross-classpath sharing awkward). - `PlanGenerationTestSuite`'s `private val regenerateGoldenFiles`. - `RemoteSparkSession.syncTestDependencies` no longer uploads common-utils test-classes; `IntegrationTestUtils.commonUtilsTestClassDir` removed. Net effect: `SparkTestSuite` defers two helpers to the trait; everyone else is unchanged from master. Generated-by: Claude Code (Opus 4.7)
Centralizes the `SPARK_CLEAN_ORPHANED_GOLDEN_FILES` env-var flag in the trait alongside `regenerateGoldenFiles`. `ProtoToParsedPlanTestSuite` now inherits it via `SparkFunSuite` instead of redeclaring locally. Also refreshes the trait's docstring (the file moved out of common-utils into core test sources earlier in this branch). `PlanGenerationTestSuite` (in connect-client-jvm) still keeps its own local copy — connect-client-jvm cannot reach core test-jar. Generated-by: Claude Code (Opus 4.7)
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
What changes were proposed in this pull request?
Splits two env-driven helpers out of
SparkTestSuiteinto a new traitorg.apache.spark.util.TestEnvHelper(incore/src/test/scala/):getWorkspaceFilePath(parts...)— resolve a path relative tospark.test.home/SPARK_HOME.regenerateGoldenFiles— env-var check forSPARK_GENERATE_GOLDEN_FILES=1.SparkTestSuitemixes the trait in; everything else is unchanged from master.Why are the changes needed?
Cleanup. The two helpers are well-scoped (env-var-driven test plumbing), worth grouping into their own trait for readability and so future env-driven helpers have a natural home.
The trait is intentionally scoped to
SparkTestSuiteonly. Earlier iterations of this PR also tried to share withLogKeysSuite(common-utils) andConnectFunSuite(connect-client-jvm), but each of those introduced cross-module concerns:common-utilstest sources broke Maven compilation of every module that depends oncoretest-jar but notcommon-utilstest-jar (graphx, mllib, sql/catalyst, etc.) — Maven does not propagate test-scoped transitive deps.ConnectFunSuite-based tests send closures to the connect server, which fails to deserialize them when the captured class hierarchy includes a trait whose.classisn't on the server's artifact path.Putting the trait in
core/src/test/avoids both: it ships in thecoretest-jar that downstream test modules already depend on.Does this PR introduce any user-facing change?
No. Test-only refactor.
How was this patch tested?
build/sbt common-utils/Test/compile core/Test/compile graphx/Test/compile connect-client-jvm/Test/compile sql/Test/compilesucceeds.Was this patch authored or co-authored using generative AI tooling?
Generated-by: Claude Code (Opus 4.7)