Skip to content

[MINOR][TESTS] Extract TestEnvHelper trait from SparkTestSuite#55710

Draft
zhengruifeng wants to merge 8 commits intoapache:masterfrom
zhengruifeng:unify-workspace-test-helpers
Draft

[MINOR][TESTS] Extract TestEnvHelper trait from SparkTestSuite#55710
zhengruifeng wants to merge 8 commits intoapache:masterfrom
zhengruifeng:unify-workspace-test-helpers

Conversation

@zhengruifeng
Copy link
Copy Markdown
Contributor

@zhengruifeng zhengruifeng commented May 6, 2026

What changes were proposed in this pull request?

Splits two env-driven helpers out of SparkTestSuite into a new trait org.apache.spark.util.TestEnvHelper (in core/src/test/scala/):

  • getWorkspaceFilePath(parts...) — resolve a path relative to spark.test.home / SPARK_HOME.
  • regenerateGoldenFiles — env-var check for SPARK_GENERATE_GOLDEN_FILES=1.

SparkTestSuite mixes the trait in; everything else is unchanged from master.

Why are the changes needed?

Cleanup. The two helpers are well-scoped (env-var-driven test plumbing), worth grouping into their own trait for readability and so future env-driven helpers have a natural home.

The trait is intentionally scoped to SparkTestSuite only. Earlier iterations of this PR also tried to share with LogKeysSuite (common-utils) and ConnectFunSuite (connect-client-jvm), but each of those introduced cross-module concerns:

  • Living in common-utils test sources broke Maven compilation of every module that depends on core test-jar but not common-utils test-jar (graphx, mllib, sql/catalyst, etc.) — Maven does not propagate test-scoped transitive deps.
  • ConnectFunSuite-based tests send closures to the connect server, which fails to deserialize them when the captured class hierarchy includes a trait whose .class isn't on the server's artifact path.

Putting the trait in core/src/test/ avoids both: it ships in the core test-jar that downstream test modules already depend on.

Does this PR introduce any user-facing change?

No. Test-only refactor.

How was this patch tested?

build/sbt common-utils/Test/compile core/Test/compile graphx/Test/compile connect-client-jvm/Test/compile sql/Test/compile succeeds.

Was this patch authored or co-authored using generative AI tooling?

Generated-by: Claude Code (Opus 4.7)

### What changes were proposed in this pull request?

Introduces `org.apache.spark.util.SparkTestPaths` in `common-utils` test sources, holding the two helpers that are currently duplicated across module boundaries:

- `getWorkspaceFilePath(parts...)` — resolve a path relative to `spark.test.home` / `SPARK_HOME`.
- `regenerateGoldenFiles` — env-var check for `SPARK_GENERATE_GOLDEN_FILES=1`.

The trait is mixed into:
- `SparkTestSuite` (core) — drops local copies of both helpers.
- `ConnectFunSuite` (connect client) — drops the local `getWorkspaceFilePath` (which carried a `// Borrowed from SparkFunSuite` comment).
- `LogKeysSuite` (common-utils) — drops local copies of both.

Also removes the now-redundant `private val regenerateGoldenFiles` in `PlanGenerationTestSuite` (it would have shadowed/clashed with the inherited `protected def`).

Living in `common-utils` is required because some consumers (`LogKeysSuite` in `common-utils`, `ConnectFunSuite` in the shaded Spark Connect client) cannot reach `SparkFunSuite` / `SparkTestSuite` in `spark-core` due to module dependency direction.

### Why are the changes needed?

Cleanup. The two helpers are byte-for-byte identical at three call sites (four counting the `regenerateGoldenFiles` clone), and the connect copy explicitly comments that it was borrowed from `SparkFunSuite`. Consolidating prevents drift.

### Does this PR introduce _any_ user-facing change?

No. Test-only refactor.

### How was this patch tested?

`build/sbt common-utils/Test/compile core/Test/compile connect-client-jvm/Test/compile sql/Test/compile` succeeds. No behavior change.

### Was this patch authored or co-authored using generative AI tooling?

Generated-by: Claude Code (Opus 4.7)
The trait covers env-var-driven test plumbing — both the workspace path
resolution (via `spark.test.home` / `SPARK_HOME`) and the golden-file
regeneration flag (`SPARK_GENERATE_GOLDEN_FILES`). The earlier name was
too narrow for the latter.

Generated-by: Claude Code (Opus 4.7)
@zhengruifeng zhengruifeng changed the title [MINOR][TESTS] Unify workspace test helpers via SparkTestPaths trait [MINOR][TESTS] Unify workspace test helpers via SparkTestEnvHelper trait May 6, 2026
Drop the redundant `Spark` prefix; everything in `org.apache.spark.util`
is already Spark-scoped.

Generated-by: Claude Code (Opus 4.7)
@zhengruifeng zhengruifeng changed the title [MINOR][TESTS] Unify workspace test helpers via SparkTestEnvHelper trait [MINOR][TESTS] Unify workspace test helpers via TestEnvHelper trait May 6, 2026
@zhengruifeng zhengruifeng changed the title [MINOR][TESTS] Unify workspace test helpers via TestEnvHelper trait [SPARK-56748][TESTS] Unify workspace test helpers via TestEnvHelper trait May 6, 2026
Mixing `TestEnvHelper` (defined in `common-utils` test sources) into
`ConnectFunSuite` puts it in the parent chain of every connect-client
test class. When a closure from such a test is sent to the connect
server (e.g. UDFs in `KeyValueGroupedDatasetE2ETestSuite`), the server
JVM tries to load `TestEnvHelper.class` and fails because
`common-utils` test JAR is not registered as an artifact (only
`connect-client-jvm` test-classes is, via `RemoteSparkSession`).

Restore `ConnectFunSuite`'s local `getWorkspaceFilePath` and
`PlanGenerationTestSuite`'s local `regenerateGoldenFiles`. The
`TestEnvHelper` trait remains shared between `SparkTestSuite` (core)
and `LogKeysSuite` (common-utils), where the cross-classloader path
does not apply.

Generated-by: Claude Code (Opus 4.7)
Re-extends `ConnectFunSuite` with `TestEnvHelper`, but this time also
uploads `common-utils/target/<scala>/test-classes` to the connect
server's artifact path so the server can resolve `TestEnvHelper.class`
when deserializing client-side closures.

Without this, every test class extending `ConnectFunSuite` had
`TestEnvHelper` in its parent chain, and any closure sent to the server
(UDFs, KV grouped operations) would fail with
`SparkClassNotFoundException: Failed to load class:
org.apache.spark.util.TestEnvHelper`.

Generated-by: Claude Code (Opus 4.7)
Maven (unlike SBT) does not propagate test-scoped transitive
dependencies. Mixing `TestEnvHelper` (in `common-utils` test-jar) into
`SparkTestSuite` (in `core` test-jar) put `TestEnvHelper` in the
linearization of `SparkFunSuite`, which broke Maven compilation of
every downstream module that depends on `core` test-jar without an
explicit `common-utils` test-jar dep — graphx, mllib, sql/catalyst,
sql/hive, etc.

Restore `SparkTestSuite`'s local `getWorkspaceFilePath` and
`regenerateGoldenFiles`. The `TestEnvHelper` trait is still shared
between `ConnectFunSuite` (connect-client-jvm — pom already declares
the dep) and `LogKeysSuite` (common-utils itself).

Generated-by: Claude Code (Opus 4.7)
…kTestSuite

Re-targets the trait at `SparkTestSuite` only, dropping the shared usage
across `LogKeysSuite` (common-utils) and `ConnectFunSuite`
(connect-client-jvm). The trait now lives in `core/src/test/scala/`
beside `SparkTestSuite`, so `TestEnvHelper.class` ships in the `core`
test-jar that every downstream test module already depends on — no
Maven dependency-resolution surprises.

Reverts:
- `LogKeysSuite` back to its local `getWorkspaceFilePath` and
  `regenerateGoldenFiles` (common-utils does not depend on core).
- `ConnectFunSuite` back to its local `getWorkspaceFilePath`
  (connect-client-jvm does not depend on core, and closure
  serialization to the connect server made cross-classpath sharing
  awkward).
- `PlanGenerationTestSuite`'s `private val regenerateGoldenFiles`.
- `RemoteSparkSession.syncTestDependencies` no longer uploads
  common-utils test-classes; `IntegrationTestUtils.commonUtilsTestClassDir`
  removed.

Net effect: `SparkTestSuite` defers two helpers to the trait; everyone
else is unchanged from master.

Generated-by: Claude Code (Opus 4.7)
@zhengruifeng zhengruifeng changed the title [SPARK-56748][TESTS] Unify workspace test helpers via TestEnvHelper trait [MINOR][TESTS] Extract TestEnvHelper trait from SparkTestSuite May 7, 2026
Centralizes the `SPARK_CLEAN_ORPHANED_GOLDEN_FILES` env-var flag in the
trait alongside `regenerateGoldenFiles`. `ProtoToParsedPlanTestSuite`
now inherits it via `SparkFunSuite` instead of redeclaring locally.

Also refreshes the trait's docstring (the file moved out of
common-utils into core test sources earlier in this branch).

`PlanGenerationTestSuite` (in connect-client-jvm) still keeps its own
local copy — connect-client-jvm cannot reach core test-jar.

Generated-by: Claude Code (Opus 4.7)
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants