diff --git a/AGENTS.md b/AGENTS.md
index 96f5b7917cae..f69793f3e8a3 100644
--- a/AGENTS.md
+++ b/AGENTS.md
@@ -20,6 +20,31 @@ Spark Connect protocol is defined in proto files under `sql/connect/common/src/m
 
 Avoid introducing non-ASCII characters in code or comments. String literals may contain non-ASCII when the content requires it (error messages, test data, etc.). Identifiers are ASCII by convention. The common failure mode is typographic characters (em-dash, smart quotes, ellipsis, non-breaking space) sneaking into comments; scalastyle flags some of these. Spot-check before committing: `grep -rn -P "[^\x00-\x7F]" `.
 
+## Scala Test Base Classes
+
+When writing a new Scala test suite, pick the lowest base class that provides what the test actually needs. Spark uses the `AnyFunSuite` ScalaTest style throughout, so the bases below form the chain to choose from. Each adds capabilities on top of the previous one:
+
+    SparkFunSuite (core)
+      <- PlanTest (sql/catalyst)
+        <- QueryTest (sql/core)
+          <- SharedSparkSession (sql/core)
+
+| Test scope | Base | Notes |
+|------------|------|-------|
+| Plain JVM/Scala, no Spark SQL | `SparkFunSuite` | `core` utilities, RDD, network, util classes, etc. Adds per-test timeout, `testRetry`, `gridTest`, thread audit, fixed timezone/locale, `withTempDir`, `withLogAppender`, `checkError`. |
+| Catalyst plan tests, no `SparkSession` | `PlanTest` | Adds `comparePlans`, `normalizePlan`, `normalizeExprIds`. For analyzer, optimizer, and planner rule tests. |
+| SQL/DataFrame helpers, abstract `spark` | `QueryTest` | Adds `checkAnswer` and codegen-on/off helpers. Cannot be instantiated alone: `spark` is abstract and must be supplied by a session-providing trait. |
+| SQL/DataFrame integration tests that need a session | `SharedSparkSession` | The default for most SQL suites. Provides a shared classic `TestSparkSession` and `testImplicits`, plus `checkAnswer` from `QueryTest`. |
+
+`QueryTest` declares `spark: SparkSession` abstractly via `SparkSessionProvider`. To run a concrete suite, mix in a session-providing trait. The common providers in this repo are:
+
+| Session provider | Module / location | Use case |
+|---|---|---|
+| `SharedSparkSession` | `sql/core` | Classic in-process `SparkSession`. Default for tests under `sql/core`. |
+| `TestHiveSingleton` | `sql/hive` | Hive-backed session (`TestHive`). Used by tests under `sql/hive`. |
+
+Linearization gotcha: the first item in the `extends` clause must transitively extend a class (i.e. carry a non-`Object` superclass). The four bases above all carry the `SparkFunSuite` chain, so they can appear first. A "pure helper" trait (e.g. `*ErrorsBase`, `*Helper`) does not, so it must never come first: list one of the bases above first and mix the helper in after it. Mixing a class-bearing trait in right after a leading pure trait does not help, because the implicit superclass is already `Object`; compilation fails with `superclass Object is not a subclass of the superclass SparkFunSuite of the mixin trait ...`. Quick check: run `grep "^trait "` on the trait's source file; if the declaration ends in `extends DataTypeErrorsBase` or another pure trait, it does not carry the class chain.
+
 ## Build and Test
 
 Build and tests can take a long time. If the user explicitly asked to run tests, run them. Otherwise (you are running tests on your own to verify a change), first ask the user if they have more changes to make.
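The linearization gotcha can be reproduced without Spark. The following is a minimal sketch, assuming Scala 2.13 (Scala 3 computes the superclass differently); `FunSuiteBase`, `PlanTestLike`, `ErrorsHelper`, `GoodSuite`, and `LinearizationDemo` are hypothetical stand-ins for `SparkFunSuite`, `PlanTest`, and a `*Helper` trait, not real Spark classes. The failing ordering is kept as a comment so the file still compiles.

```scala
// Hypothetical stand-ins for Spark's test hierarchy; names are illustrative only.
// FunSuiteBase plays the role of SparkFunSuite: an actual class.
abstract class FunSuiteBase {
  def suiteKind: String = "funsuite"
}

// Like PlanTest: a trait that extends a class, so it carries the class chain.
trait PlanTestLike extends FunSuiteBase {
  def comparesPlans: Boolean = true
}

// Like a *Helper or *ErrorsBase trait: its only superclass is AnyRef (Object).
trait ErrorsHelper {
  def errorPrefix: String = "ERR"
}

// Compiles: the first parent carries the class chain, so the superclass is
// FunSuiteBase and the pure helper can be mixed in afterwards.
class GoodSuite extends PlanTestLike with ErrorsHelper

// Does NOT compile under Scala 2.13 (kept as a comment): with a pure trait
// first, the implicit superclass is Object, and scalac reports an
// "illegal inheritance" error: superclass Object is not a subclass of the
// superclass FunSuiteBase of the mixin trait PlanTestLike.
// class BadSuite extends ErrorsHelper with PlanTestLike

object LinearizationDemo {
  def main(args: Array[String]): Unit = {
    val suite = new GoodSuite
    // Members from both the class chain and the pure helper are available.
    println(s"${suite.suiteKind} ${suite.comparesPlans} ${suite.errorPrefix}")
  }
}
```

Running `LinearizationDemo` prints `funsuite true ERR`. Uncommenting `BadSuite` is a quick way to see the compiler error for yourself.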