Skip to content
Open
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
25 changes: 25 additions & 0 deletions AGENTS.md
Original file line number Diff line number Diff line change
Expand Up @@ -20,6 +20,31 @@ Spark Connect protocol is defined in proto files under `sql/connect/common/src/m

Avoid introducing non-ASCII characters in code or comments. String literals may contain non-ASCII when the content requires it (error messages, test data, etc.). Identifiers are ASCII by convention. The common failure mode is typographic characters (em-dash, smart quotes, ellipsis, non-breaking space) sneaking into comments; scalastyle flags some of these. Spot-check before committing: `grep -rn -P "[^\x00-\x7F]" <files>`.

## Scala Test Base Classes

When writing a new Scala test suite, pick the lowest base class that provides what the test actually needs. Spark uses the `AnyFunSuite` ScalaTest style throughout, so the bases below are the chain to choose from. Each adds capability on top of the previous:

SparkFunSuite (core)
<- PlanTest (sql/catalyst)
<- QueryTest (sql/core)
<- SharedSparkSession (sql/core)

| Test scope | Base | Notes |
|------------|------|-------|
| Plain JVM/Scala — no Spark SQL | `SparkFunSuite` | `core` utilities, RDD, network, util classes, etc. Adds per-test timeout, `testRetry`, `gridTest`, thread audit, fixed timezone/locale, `withTempDir`, `withLogAppender`, `checkError`. |
| Catalyst plan tests — no `SparkSession` | `PlanTest` | Adds `comparePlans`, `normalizePlan`, `normalizeExprIds`. For analyzer / optimizer / planner rule tests. |
| SQL/DataFrame helpers — abstract `spark` | `QueryTest` | Adds `checkAnswer`, codegen-on/off helpers. Cannot be instantiated alone — `spark` is abstract and must be supplied by a session-providing trait. |
| SQL/DataFrame integration tests — provides a session | `SharedSparkSession` | The default for most SQL suites. Provides a shared classic `TestSparkSession`, `testImplicits`, plus `checkAnswer` from `QueryTest`. |

`QueryTest` declares `spark: SparkSession` abstractly via `SparkSessionProvider`. To run a concrete suite, mix in a session-providing trait. The common providers in this repo are:

| Session provider | Module / location | Use case |
|---|---|---|
| `SharedSparkSession` | `sql/core` | Classic in-process `SparkSession`. Default for tests under `sql/core`. |
| `TestHiveSingleton` | `sql/hive` | Hive-backed session (`TestHive`). Used by tests under `sql/hive`. |

Linearization gotcha: the first item in the `extends` clause must transitively extend a class (i.e. carry a non-`Object` superclass). The four bases above all carry the `SparkFunSuite` chain, so they can appear first. A "pure helper" trait (e.g. `*ErrorsBase`, `*Helper`) does not — if you put one first, mix in a class-bearing trait immediately after, or compilation fails with `superclass Object is not a subclass of the superclass SparkFunSuite of the mixin trait ...`. Quick check: `grep "^trait <Name>"` — if it ends in `extends DataTypeErrorsBase` or another pure trait, it does not carry the class chain.

## Build and Test

Build and tests can take a long time. If the user explicitly asked to run tests, run them. Otherwise (you are running tests on your own to verify a change), first ask the user if they have more changes to make.
Expand Down