
Commit 8cfc73a

[SPARK-54119] Support METRIC_VIEW creation on V2 catalogs
### What changes were proposed in this pull request?

This PR extends metric-view support to **DS v2 catalogs** by routing `CREATE VIEW ... WITH METRICS` through the `ViewCatalog` / `TableViewCatalog` APIs introduced by [SPARK-52729](#51419) and finalized by [SPARK-56655](https://github.com/apache/spark/pull/55954). Third-party v2 catalogs that implement `ViewCatalog` can now host metric views with the same metadata fidelity as session-catalog metric views.

**1. V2 metric-view CREATE path -- shared with `CreateV2ViewExec`.**

A new `CreateV2MetricViewExec` and the existing `CreateV2ViewExec` both extend a new `V2CreateViewPreparation` trait (which itself extends `V2ViewPreparation`). The trait owns the shared CREATE-side `run()`: the `viewExists` short-circuit on `IF NOT EXISTS`, `createOrReplaceView` for `OR REPLACE`, and cross-type collision decoding (`ViewAlreadyExistsException` -> `tableExists` -> `EXPECT_VIEW_NOT_TABLE.NO_ALTERNATIVE`). The metric-view subclass supplies only the metric-view-specific bits (no collation, schema-mode `UNSUPPORTED`, typed `viewDependencies`, `PROP_TABLE_TYPE = METRIC_VIEW`, `retainColumnMetadata = true`) via optional hooks on `V2ViewPreparation`. `DataSourceV2Strategy` intercepts `CreateMetricViewCommand` on a non-session catalog and routes it to the new exec; the v1 session-catalog path stays in `CreateMetricViewCommand.run`.

**2. First-class `METRIC_VIEW` table type.**

- `CatalogTableType.METRIC_VIEW` is added alongside `EXTERNAL` / `MANAGED` / `VIEW`.
- `TableSummary.METRIC_VIEW_TABLE_TYPE = "METRIC_VIEW"` is the matching constant on the V2 surface.
- The previous `view.viewWithMetrics` property hack is removed; `CatalogTable.isMetricView` checks `tableType == METRIC_VIEW` directly.
- `V1Table.summarizeTableType` and `V1Table.toCatalogTable(catalog, ident, ViewInfo)` translate between the V2 property form and the V1 enum.
- HMS round-trip support: `HiveTableType` has no `METRIC_VIEW` variant (both regular views and metric views serialize as `VIRTUAL_VIEW`), so `HiveExternalCatalog` now persists a `view.subType = METRIC_VIEW` property on write and lifts `tableType` back to `METRIC_VIEW` on read. HMS-backed metric views therefore survive the round trip.

**3. Repo-wide `tableType == VIEW` audit + `CatalogTable.isViewLike` helper.**

Promoting metric views to a distinct `CatalogTableType` opens silent regressions wherever existing code branches on `VIEW`. To consolidate the audit and reduce divergence from the Databricks Runtime (which has the same helper), this PR introduces:

- a `CatalogTable.isViewLike` instance method (DBR parity: today it returns `tableType == VIEW || tableType == METRIC_VIEW`; forks may extend the set), and
- a `CatalogTable.isViewLike(t: CatalogTableType)` companion form for the few sites that have a `CatalogTableType` but no `CatalogTable` (e.g. `SessionCatalog.isView`, `verifyAlterTableType`, `HiveClientImpl.toHiveTableType`).

All 18 sites in `catalyst` / `core` / `hive` that previously did the inline `tableType == VIEW || tableType == METRIC_VIEW` check (or the `CatalogTableType.VIEW | CatalogTableType.METRIC_VIEW` pattern alternation) are now routed through these helpers, so adding a new view-like type in the future is a one-line change in the helper body, sketched below.
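As a shape sketch only (the real definitions live on `CatalogTable` and its companion, per the description above; `ViewLikeCheck` is an illustrative holder):

```scala
import org.apache.spark.sql.catalyst.catalog.CatalogTableType

object ViewLikeCheck {
  // Companion form, for call sites that only hold a CatalogTableType
  // (e.g. SessionCatalog.isView, verifyAlterTableType).
  def isViewLike(t: CatalogTableType): Boolean =
    t == CatalogTableType.VIEW || t == CatalogTableType.METRIC_VIEW
}
// The instance form, table.isViewLike, applies the same check to table.tableType,
// so a future view-like type is a one-line change in this body.
```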
Notable touched call sites: `CatalogTable.toJsonLinkedHashMap` (DESCRIBE EXTENDED rows), `HiveExternalCatalog.{createTable, alterTable, restoreTableMetadata}`, `HiveClientImpl.toHiveTableType`, `SessionCatalog.isView`, `InMemoryCatalog.listViews`, `RelationResolution`, `Analyzer.lookupTableOrView`, `rules.scala`, `DataStreamWriter`, `DescribeRelationJsonCommand`, `AnalyzeColumnCommand`, `AnalyzePartitionCommand`, `CommandUtils.analyzeTable`, `V2SessionCatalog.dropTableInternal`, `verifyAlterTableType` in `ddl.scala`, and 3 sites in `tables.scala`.

**Explicit rejection (uniform error class):** `SHOW CREATE TABLE` on a metric view has no round-trippable `CREATE VIEW ... WITH METRICS` form, so it is rejected explicitly with the dedicated `UNSUPPORTED_SHOW_CREATE_TABLE.ON_METRIC_VIEW` error class on **both** the v1 session-catalog path (in `tables.scala`) and the v2 catalog path (in `DataSourceV2Strategy`), so users see the same actionable message regardless of catalog kind.

**4. Drop-command parity.**

- `DropTableCommand` (v1 path) treats both `VIEW` and `METRIC_VIEW` as views: `DROP TABLE` rejects either with `wrongCommandForObjectTypeError`, and `DROP VIEW` accepts either.
- `V2SessionCatalog.dropTableInternal` extends the existing "view rejected from `DROP TABLE`" guard to cover `METRIC_VIEW`.
- For non-session v2 catalogs, `DropTableExec` (post-SPARK-56655) actively rejects with `WRONG_COMMAND_FOR_OBJECT_TYPE` ("Use DROP VIEW instead") when a view sits at the identifier. This works unchanged for metric views, since `TableViewCatalog`'s default `viewExists` derives from `loadTableOrView` and recognizes `MetadataTable + ViewInfo`.
- `ResolveSessionCatalog`'s `DropView` routing comment is clarified: v2 metric views fall through to `DataSourceV2Strategy` and `ViewCatalog.dropView`.

**5. Typed view dependencies (`ViewInfo.viewDependencies`).**

- New public DTOs in `org.apache.spark.sql.connector.catalog`: `Dependency` (a sealed interface with `Dependency.table(String[])` / `Dependency.function(String[])` non-vararg factories), `TableDependency`, `FunctionDependency`, and `DependencyList(Dependency[])`.
- `TableDependency` and `FunctionDependency` carry the dependency identifier as **structural multi-part name parts** (`record TableDependency(String[] nameParts)`), not as a single dot-flattened string. Arity is preserved per source, so multi-level-namespace V2 catalogs (e.g. Iceberg's `cat.db1.db2.tbl` -> 4 parts) round-trip without ambiguity against quoted identifiers containing a literal `.`. v1 sources resolved through the session catalog are normalized by a new `MetricViewHelper.qualifyV1` to a stable 3-part `[spark_catalog, db, table]` shape so consumers see deterministic arity per source kind (otherwise `TableIdentifier.nameParts` could return 1, 2, or 3 parts depending on what the analyzer captured).
- All three records (`TableDependency`, `FunctionDependency`, `DependencyList`) override `equals` / `hashCode` / `toString` using `Arrays.equals` / `Arrays.hashCode` / `Arrays.toString` to give value semantics on their array fields. Without the overrides, a record's auto-generated methods on array fields fall through to `Object.equals` (reference equality), which would make structural multi-part names unusable as Map keys or for dedup. Each record also overrides the canonical accessor to return a defensive `clone()`, so callers cannot mutate the record's internal array.
- `ViewInfo` gains a `viewDependencies` field and a `ViewInfo.Builder.withViewDependencies(...)` setter. Per the field's contract, `null` means "no dependency list was supplied" while an empty `DependencyList.of(new Dependency[0])` means "supplied, but the object has none" -- metric-view CREATE always emits the latter, never the former, even when `collectTableDependencies` returns empty.
- `MetricViewHelper.collectTableDependencies` walks the analyzed plan and emits structural `Seq[Seq[String]]` parts; the v2 source arm preserves full namespace arity, while the v1 source arms (`View`, `HiveTableRelation`, `LogicalRelation`) all route through `qualifyV1` for the stable 3-part shape. A short emission sketch follows this list.
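A minimal sketch of the CREATE-side emission contract (the bridge name `toDependencyList` is illustrative, not in the PR):

```scala
import org.apache.spark.sql.connector.catalog.{Dependency, DependencyList}

// Bridge from collectTableDependencies' Seq[Seq[String]] output to the typed API:
// arity is preserved per source, and an empty result still yields an empty
// DependencyList rather than null ("supplied but none" vs. "not supplied").
def toDependencyList(parts: Seq[Seq[String]]): DependencyList = {
  val deps = parts.map(p => Dependency.table(p.toArray): Dependency).toArray
  DependencyList.of(deps)
}
```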
**6. Multi-level-namespace targets for v2 metric views.**

`MetricViewHelper.analyzeMetricViewText` previously required a `TableIdentifier`, capping the metric-view target at 3 name parts; v2 metric views with multi-level-namespace targets (e.g. `cat.db1.db2.mv`) failed at `ident.asTableIdentifier` with `requiresSinglePartNamespaceError`. The helper now takes `nameParts: Seq[String]` directly, and the call sites on both the v1 path (`CreateMetricViewCommand`) and the v2 path (`DataSourceV2Strategy`) are updated. The helper also now returns `(LogicalPlan, MetricView)` so callers don't have to re-parse the YAML body just to read descriptor properties.

**7. `metric_view.*` descriptor properties (v1/v2 parity).**

`MetricView.getProperties` produces canonical descriptive properties (`metric_view.from.type`, `metric_view.from.name` / `metric_view.from.sql`, `metric_view.where`) that **both** the v1 path (`CreateMetricViewCommand.createMetricViewInSessionCatalog`) and the v2 path (`DataSourceV2Strategy`) merge into the view's properties bag, so catalog browsers and tooling see the same descriptor rows in `DESCRIBE TABLE EXTENDED` regardless of catalog kind. Long values are truncated to `Constants.MAXIMUM_PROPERTY_SIZE`; the Scaladoc on `getProperties` calls out that `metric_view.from.sql` is therefore a descriptive value, not a round-trippable representation -- consumers should re-read the YAML body for the full SQL.

**8. `ViewInfo` constructor cleanup.**

The metric-view-specific `PROP_TABLE_TYPE = METRIC_VIEW` special case is dropped from the generic `ViewInfo` constructor in favor of `properties().putIfAbsent(...)`. Callers that want a more specific kind (e.g. `METRIC_VIEW`) call `BaseBuilder.withTableType(...)` before `build()` -- exercised by `CreateV2MetricViewExec` via the new `V2ViewPreparation.tableType` hook.

**9. `ViewHelper.aliasPlan(retainMetadata)`.**

The user-specified-column-with-comment branch in `aliasPlan` previously dropped existing column metadata. A new `retainMetadata: Boolean = false` parameter merges the analyzed attribute's metadata into the new comment metadata. `ViewHelper.prepareTable` passes `retainMetadata = isMetricView` (v1 path); `V2ViewPreparation` exposes a `retainColumnMetadata` hook that `CreateV2MetricViewExec` overrides to `true` (v2 path). Both preserve the per-column `metric_view.type` / `metric_view.expr` keys that the analyzer attaches to dimensions and measures, even when the user renames columns and adds comments.

**10. Error classes.**

- New `INVALID_METRIC_VIEW_YAML` (sqlState 42K0L). `MetricViewPlanner.parseYAML`'s catch blocks now route through `QueryCompilationErrors.invalidMetricViewYamlError` instead of `SparkException.internalError`, so a typo in the user's YAML body surfaces as a user-correctable `AnalysisException` rather than "please contact support".
- New `UNSUPPORTED_SHOW_CREATE_TABLE.ON_METRIC_VIEW` (sqlState 0A000), used by both the v1 session-catalog path and the v2 catalog path so `SHOW CREATE TABLE` on a metric view produces the same actionable message regardless of catalog kind.
**11. Misc.**

- `MetricViewCanonical.parseSource` accepts multipart identifiers (`parseMultipartIdentifier`) so 3-part `catalog.schema.table` source references work as `AssetSource`.

### Why are the changes needed?

Before this PR, metric-view DDL only worked against the session catalog: the create path called `SessionCatalog.createTable` directly, and there was no way for a third-party v2 catalog (Unity Catalog, Hive Metastore catalog, custom REST catalogs, etc.) to own a metric view's lifecycle. SPARK-52729 / SPARK-56655 shipped `ViewCatalog` and `TableViewCatalog` as the public v2 surface for catalog-managed views; metric views are a kind of view and naturally belong on this surface.

Once metric views can live on a v2 catalog, two more constraints surface:

1. **Type discriminator.** A consumer reading a row through `ViewCatalog.loadView` needs to know it's a metric view, not a plain SQL view, so it can render the right UI / planner output. Encoding this in `PROP_TABLE_TYPE = METRIC_VIEW` keeps the distinction wire-compatible and lets `V1Table.toCatalogTable` reconstruct `CatalogTableType.METRIC_VIEW` on the read path.
2. **Structured dependency lineage.** Metric views always reference at least one source table; cataloging that lineage as flat string properties or single dot-joined strings loses arity for multi-level namespaces and is ambiguous against quoted identifiers. A typed `DependencyList` of `TableDependency` / `FunctionDependency` with structural `String[] nameParts` lets catalogs persist the lineage as a first-class field with full fidelity.

The remaining changes (drop-command parity, `aliasPlan` metadata retention, `metric_view.*` properties, `parseMultipartIdentifier`, the `tableType == VIEW` audit + `isViewLike` helper, the multi-level-namespace lift, the HMS round-trip marker) are mechanical follow-ups that fall out of supporting metric views as a real `CatalogTableType` and as a v2 catalog citizen -- without them, basic operations like `DROP VIEW`, `DESCRIBE TABLE EXTENDED`, `CREATE VIEW (a COMMENT 'c') WITH METRICS ...`, or `CREATE VIEW cat.db1.db2.mv WITH METRICS ...` would silently degrade.

### Does this PR introduce _any_ user-facing change?

Yes, both for end users and for catalog plugin developers.

**End users** (a usage sketch follows this list):

- `CREATE VIEW <ident> WITH METRICS ...` now works against any v2 catalog that implements `ViewCatalog`, including catalogs with multi-level namespace targets. Previously it was rejected with `MISSING_CATALOG_ABILITY.VIEWS` for non-session catalogs and capped at single-level namespaces. `IF NOT EXISTS` and `OR REPLACE` are honored on the v2 path (a regression vs. v1, now fixed).
- A v2 metric view can be queried with `SELECT region, measure(count_sum) FROM <mv> ...`, dropped with `DROP VIEW`, listed via `SHOW VIEWS` (and via `SHOW TABLES` on a `TableViewCatalog`, matching v1 SHOW TABLES output), and described with `DESCRIBE TABLE` / `DESCRIBE TABLE EXTENDED`. `DROP TABLE` on a metric view throws `WRONG_COMMAND_FOR_OBJECT_TYPE` ("Use DROP VIEW instead").
- `ALTER VIEW <metric_view> RENAME TO ...` is wired through `RenameV2ViewExec` and preserves the metric-view kind across the rename.
- `SHOW CREATE TABLE` on a metric view throws `UNSUPPORTED_SHOW_CREATE_TABLE.ON_METRIC_VIEW` (no round-trippable form yet) on both the v1 and v2 paths -- the same error class regardless of catalog kind.
- Session-catalog metric views are now stored as `CatalogTableType.METRIC_VIEW` instead of `CatalogTableType.VIEW + view.viewWithMetrics=true`. This is observable in `DESCRIBE TABLE EXTENDED`'s `Type` row and the `tableType` column of the `tables` system table; SQL behavior is unchanged. Hive-metastore-backed metric views also round-trip through HMS via a `view.subType = METRIC_VIEW` property marker.
- `DESCRIBE TABLE EXTENDED` on metric views (v1 and v2) now consistently surfaces the `metric_view.from.type` / `metric_view.from.name` / `metric_view.from.sql` / `metric_view.where` descriptor rows.
- Error messages from a `DROP TABLE` / `DROP VIEW` mismatch now mention `METRIC_VIEW` alongside `VIEW`.
- Malformed metric-view YAML now surfaces as `INVALID_METRIC_VIEW_YAML` (user-correctable) instead of "Spark internal error, please contact support".
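A hedged sketch of that surface (`testcat`, `ns`, and `mv` are illustrative identifiers; assumes a `ViewCatalog`-implementing catalog registered as `testcat`, a metric view already created at `testcat.ns.mv`, and a `SparkSession` in scope as `spark`):

```scala
// Query a v2 metric view with measure():
spark.sql("SELECT region, measure(count_sum) FROM testcat.ns.mv GROUP BY region").show()

// List it:
spark.sql("SHOW VIEWS IN testcat.ns").show()

// DROP VIEW succeeds; DROP TABLE on the same identifier would instead throw
// WRONG_COMMAND_FOR_OBJECT_TYPE ("Use DROP VIEW instead"):
spark.sql("DROP VIEW testcat.ns.mv")
```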
**Catalog plugin developers:**

- New public API surface in `org.apache.spark.sql.connector.catalog`: the sealed interface `Dependency` with `permits TableDependency, FunctionDependency`, both records carrying `String[] nameParts`; `Dependency.table(String[])` / `Dependency.function(String[])` static factories (non-vararg per review; callers pass an existing array directly); and `DependencyList(Dependency[])` with a `DependencyList.of(Dependency[])` factory. All three records override `equals` / `hashCode` / `toString` to give value semantics on their array fields, and the canonical accessors return a defensive `clone()` so internal state is not mutable through the public API. All are `Evolving`, `since 4.2.0`. Note: today the only producer in Spark itself is metric-view dependency extraction, which emits `TableDependency` only; `FunctionDependency` and `Dependency.function(...)` are exposed as groundwork for future producers (e.g. SQL UDF dependency tracking).
- `ViewInfo` gains a typed `viewDependencies()` accessor and a `ViewInfo.Builder.withViewDependencies(...)` setter. `viewDependencies` is populated only on the non-session v2 CREATE path; v1 metric views (and v2 metric views read back through `V1Table.toCatalogTable`) carry `null`. Catalog plugin authors persisting dependency lineage should treat the field as v2-only for now -- broadening to v1 is a tracked follow-up. A consumption sketch follows this list.
- `TableSummary.METRIC_VIEW_TABLE_TYPE = "METRIC_VIEW"` constant.
- `CatalogTableType.METRIC_VIEW` enum value (v1 surface).
- `CatalogTable.isViewLike` instance + `CatalogTable.isViewLike(CatalogTableType)` companion helpers (DBR-parity helpers for "does this table behave like a view at resolution / DDL time?"). Forks that add their own view-like types (e.g. DBR's `MATERIALIZED_VIEW`, `STREAMING_TABLE`) only need to extend the helper body.
- `V2ViewPreparation` (private to `org.apache.spark.sql.execution.datasources.v2`) gains optional `viewDependencies` / `tableType` / `retainColumnMetadata` hooks.
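For plugin authors, a sketch of consuming the new surface (`LineageSink` and its `record*` methods are hypothetical persistence hooks, not part of the API):

```scala
import org.apache.spark.sql.connector.catalog.{FunctionDependency, TableDependency, ViewInfo}

object LineageSink {
  def recordTable(parts: Array[String]): Unit = println(s"table: ${parts.mkString(" / ")}")
  def recordFunction(parts: Array[String]): Unit = println(s"function: ${parts.mkString(" / ")}")

  def persistLineage(info: ViewInfo): Unit = {
    // null means "no dependency list was supplied" (e.g. the v1 path today).
    Option(info.viewDependencies()).foreach { list =>
      list.dependencies().foreach {
        case t: TableDependency    => recordTable(t.nameParts())    // namespace arity preserved
        case f: FunctionDependency => recordFunction(f.nameParts()) // no Spark producer yet
      }
    }
  }
}
```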
### How was this patch tested?

`MetricViewV2CatalogSuite` -- 31 tests across 5 sections, all against an in-memory `ViewCatalog` test fixture (`MetricViewRecordingCatalog extends InMemoryTableCatalog with TableViewCatalog`).

**Section 1 -- CREATE-related (11 tests):**

- V2 catalog receives the `METRIC_VIEW` table type and view text via `ViewInfo`.
- V2 catalog path populates `metric_view.*` descriptor properties + view context (`currentCatalog` / `currentNamespace`) + captured SQL configs.
- V2 catalog path captures `SQLSource` and comment.
- Metric view columns carry `metric_view.type` / `metric_view.expr` in column metadata.
- User-specified column names with comments preserve `metric_view.*` metadata (pins the `aliasPlan(retainMetadata = true)` fix).
- `CREATE OR REPLACE VIEW ... WITH METRICS` replaces an existing v2 metric view (asserts on the replacement's distinguishing fields: queryText, `metric_view.where`, dependencies).
- `CREATE VIEW IF NOT EXISTS ... WITH METRICS` is a no-op when the view exists (the catalog never sees the second `createView` call).
- `CREATE VIEW ... WITH METRICS` over a v2 table at the ident throws `TABLE_OR_VIEW_ALREADY_EXISTS` (analyzer-time pre-check).
- `CREATE VIEW IF NOT EXISTS ... WITH METRICS` is a no-op when a v2 table sits at the ident (v1 parity).
- `CREATE VIEW ... WITH METRICS` on a non-`ViewCatalog` catalog fails with `MISSING_CATALOG_ABILITY.VIEWS`.
- `CREATE VIEW ... WITH METRICS` at a multi-level-namespace v2 target (`testcat.ns_a.ns_b.mv_deep`) succeeds (pins the `analyzeMetricViewText` lift to `Seq[String]`).

**Section 2 -- Dependency extraction (5 tests):**

- SQL source `JOIN` captures both tables as 3-part `nameParts`.
- SQL source subquery deduplicates same-table references.
- SQL source self-join deduplicates same-table references.
- V1 session-catalog source emits exactly 3 parts, normalized to `[spark_catalog, db, table]` by `qualifyV1`.
- Multi-level V2 namespace source (`testcat.ns_a.ns_b.events_deep`) emits 4-part `nameParts`.

**Section 3 -- SELECT cases (5 tests, modeled on `MetricViewSuite` patterns):**

- `SELECT measure(count_sum) FROM <mv> GROUP BY region` returns aggregated rows (exercises the full `loadTableOrView` -> `MetadataTable(ViewInfo)` -> `V1Table.toCatalogTable(ViewInfo)` -> `ResolveMetricView` round-trip).
- `SELECT measure(...) WHERE region = ...` -- a query-layer filter on top of the view.
- The view's pre-defined `where` clause is applied (`where = Some("count > 1")` filters at view-resolution time).
- Multiple measures with different aggregations (sum / sum / max).
- `ORDER BY measure(...) DESC LIMIT 1` over the metric view.

**Section 4 -- DESCRIBE cases (2 tests):**

- `DESCRIBE TABLE EXTENDED` round-trips through `loadTableOrView` and emits the `View Text` / `Type` rows (gated through `CatalogTable.isViewLike` in `toJsonLinkedHashMap`, which now recognizes `METRIC_VIEW`).
- `DESCRIBE TABLE` (non-EXTENDED) returns the aliased columns.

**Section 5 -- DROP / SHOW / RENAME cases (8 tests):**

- `DROP VIEW` succeeds on a v2 metric view.
- `DROP VIEW IF EXISTS` on a non-existent v2 metric view is a no-op.
- `DROP TABLE` on a v2 metric view throws `WRONG_COMMAND_FOR_OBJECT_TYPE` ("Use DROP VIEW instead", per SPARK-56655's `DropTableExec`) and asserts the metric view is **not** deleted.
- `DROP TABLE IF EXISTS` on a v2 metric view also throws (`IF EXISTS` doesn't silence the wrong-type error, v1 parity).
- `SHOW CREATE TABLE` on a v2 metric view throws `UNSUPPORTED_SHOW_CREATE_TABLE.ON_METRIC_VIEW` (the same dedicated error class as the v1 path).
- `SHOW TABLES` on a `TableViewCatalog` lists both tables and metric views (matches v1 SHOW TABLES output per SPARK-56655).
- `SHOW VIEWS` lists v2 metric views.
- `ALTER VIEW <metric_view> RENAME TO ...` succeeds and preserves the metric-view kind across the rename (pins `RenameV2ViewExec` end-to-end against the fixture's `renameView`).
Existing session-catalog metric-view tests (`MetricViewSuite`, `SimpleMetricViewSuite`, `HiveMetricViewSuite`) and v1 path tests pass unchanged. `DDLSuite` and `HiveDDLSuite` had their `tableTypes` enumerations updated to include `METRIC_VIEW` in two assertion lists. The `PlanResolutionSuite` test fixture was updated to stub the new `CatalogTable.isViewLike` method on the Mockito mock.

### Was this patch authored or co-authored using generative AI tooling?

Generated-by: Cursor (Claude Sonnet 4.7)

Closes #55487 from chenwang-databricks/metric-view-on-51419.

Lead-authored-by: Chen Wang <chen.wang@databricks.com>
Co-authored-by: Wenchen Fan <cloud0fan@gmail.com>
Signed-off-by: Wenchen Fan <wenchen@databricks.com>
1 parent c23e166 commit 8cfc73a

40 files changed

Lines changed: 1888 additions & 109 deletions


common/utils/src/main/resources/error/error-conditions.json

Lines changed: 11 additions & 0 deletions
```diff
@@ -4123,6 +4123,12 @@
     },
     "sqlState" : "KD002"
   },
+  "INVALID_METRIC_VIEW_YAML" : {
+    "message" : [
+      "Failed to parse metric view YAML: <message>"
+    ],
+    "sqlState" : "42K0L"
+  },
   "INVALID_NAME_IN_USE_COMMAND" : {
     "message" : [
       "Invalid name '<name>' in <command> command. Reason: <reason>"
@@ -8314,6 +8320,11 @@
        "The table <tableName> is a Spark data source table. Please use SHOW CREATE TABLE without AS SERDE instead."
      ]
    },
+   "ON_METRIC_VIEW" : {
+     "message" : [
+       "The command is not supported on a metric view <tableName>."
+     ]
+   },
    "ON_TEMPORARY_VIEW" : {
      "message" : [
        "The command is not supported on a temporary view <tableName>."
```
sql/catalyst/src/main/java/org/apache/spark/sql/connector/catalog/Dependency.java (new file)

Lines changed: 58 additions & 0 deletions

```java
/*
 * Licensed to the Apache Software Foundation (ASF) under one or more
 * contributor license agreements. See the NOTICE file distributed with
 * this work for additional information regarding copyright ownership.
 * The ASF licenses this file to You under the Apache License, Version 2.0
 * (the "License"); you may not use this file except in compliance with
 * the License. You may obtain a copy of the License at
 *
 *    http://www.apache.org/licenses/LICENSE-2.0
 *
 * Unless required by applicable law or agreed to in writing, software
 * distributed under the License is distributed on an "AS IS" BASIS,
 * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
 * See the License for the specific language governing permissions and
 * limitations under the License.
 */

package org.apache.spark.sql.connector.catalog;

import org.apache.spark.annotation.Evolving;

/**
 * Represents a dependency of a SQL object such as a view or metric view.
 * <p>
 * A dependency is one of: {@link TableDependency} or {@link FunctionDependency}. The
 * {@code sealed} declaration enforces this structurally.
 * <p>
 * Note: today the only producer in Spark itself is metric-view dependency extraction, which
 * emits {@link TableDependency} only. {@link FunctionDependency} and the
 * {@link #function(String[])} factory are exposed as groundwork for future producers
 * (e.g. SQL UDF dependency tracking); consumers iterating a {@link DependencyList} received
 * from Spark today should expect to see only {@link TableDependency} instances.
 *
 * @since 4.2.0
 */
@Evolving
public sealed interface Dependency permits TableDependency, FunctionDependency {

  /**
   * Construct a {@link TableDependency} from the structural multi-part name of the dependent
   * table. {@code nameParts} should contain at least one element; for catalog-managed tables
   * the first element is typically the catalog name and subsequent elements are namespace
   * components followed by the table name.
   */
  static TableDependency table(String[] nameParts) {
    return new TableDependency(nameParts);
  }

  /**
   * Construct a {@link FunctionDependency} from the structural multi-part name of the
   * dependent function. {@code nameParts} should contain at least one element; for
   * catalog-managed functions the first element is typically the catalog name and subsequent
   * elements are namespace components followed by the function name.
   */
  static FunctionDependency function(String[] nameParts) {
    return new FunctionDependency(nameParts);
  }
}
```
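A brief Scala usage sketch of the factories above (arrays are passed directly, since the factories are deliberately non-vararg):

```scala
import org.apache.spark.sql.connector.catalog.Dependency

val v1Dep = Dependency.table(Array("spark_catalog", "db", "events"))    // normalized 3-part v1 shape
val v2Dep = Dependency.table(Array("cat", "db1", "db2", "tbl"))         // multi-level arity preserved
val fnDep = Dependency.function(Array("spark_catalog", "db", "my_udf")) // no producer in Spark yet
```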
sql/catalyst/src/main/java/org/apache/spark/sql/connector/catalog/DependencyList.java (new file)

Lines changed: 75 additions & 0 deletions

```java
/*
 * Licensed to the Apache Software Foundation (ASF) under one or more
 * contributor license agreements. See the NOTICE file distributed with
 * this work for additional information regarding copyright ownership.
 * The ASF licenses this file to You under the Apache License, Version 2.0
 * (the "License"); you may not use this file except in compliance with
 * the License. You may obtain a copy of the License at
 *
 *    http://www.apache.org/licenses/LICENSE-2.0
 *
 * Unless required by applicable law or agreed to in writing, software
 * distributed under the License is distributed on an "AS IS" BASIS,
 * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
 * See the License for the specific language governing permissions and
 * limitations under the License.
 */

package org.apache.spark.sql.connector.catalog;

import java.util.Arrays;
import java.util.Objects;

import org.apache.spark.annotation.Evolving;

/**
 * A list of dependencies for a SQL object such as a view or metric view.
 * <p>
 * <ul>
 *   <li>When {@code null}, the dependency information is not provided.</li>
 *   <li>When the array is empty, dependencies are provided but the object has none.</li>
 *   <li>When the array is non-empty, each entry describes one dependency.</li>
 * </ul>
 * <p>
 * Records' auto-generated {@code equals}/{@code hashCode} on array fields fall through to
 * {@link Object#equals} (reference equality), so this record overrides them to use
 * {@link Arrays#equals(Object[], Object[])} / {@link Arrays#hashCode(Object[])} on
 * {@code dependencies}; per-element equality delegates to the element's overridden
 * {@code equals} ({@link TableDependency} / {@link FunctionDependency} both implement value
 * semantics on their {@code nameParts} array). The defensive-copy accessor override clones
 * on read so callers cannot mutate the record's internal array.
 *
 * @param dependencies array of dependencies; must contain no null elements (defensive
 *                     copy made; not validated element-wise -- callers passing nulls will
 *                     surface NPEs in downstream consumers)
 * @since 4.2.0
 */
@Evolving
public record DependencyList(Dependency[] dependencies) {

  public DependencyList {
    Objects.requireNonNull(dependencies, "dependencies must not be null");
    dependencies = dependencies.clone();
  }

  /** Returns a defensive copy of the underlying dependencies array. */
  @Override
  public Dependency[] dependencies() { return dependencies.clone(); }

  @Override
  public boolean equals(Object o) {
    return o instanceof DependencyList that && Arrays.equals(dependencies, that.dependencies);
  }

  @Override
  public int hashCode() { return Arrays.hashCode(dependencies); }

  @Override
  public String toString() {
    return "DependencyList[dependencies=" + Arrays.toString(dependencies) + "]";
  }

  public static DependencyList of(Dependency[] dependencies) {
    return new DependencyList(dependencies);
  }
}
```
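To see why the overrides matter, a quick equality check from Scala (where `==` delegates to `equals`):

```scala
import org.apache.spark.sql.connector.catalog.{Dependency, DependencyList}

val a = DependencyList.of(Array[Dependency](Dependency.table(Array("cat", "db", "t"))))
val b = DependencyList.of(Array[Dependency](Dependency.table(Array("cat", "db", "t"))))

assert(a == b)                   // Arrays.equals plus per-element value equality
assert(a.hashCode == b.hashCode) // safe to use as Map keys / for dedup

a.dependencies()(0) = null       // mutates only the defensive copy returned by the accessor
assert(a == b)                   // internal state is unchanged
```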
sql/catalyst/src/main/java/org/apache/spark/sql/connector/catalog/FunctionDependency.java (new file)

Lines changed: 68 additions & 0 deletions

```java
/*
 * Licensed to the Apache Software Foundation (ASF) under one or more
 * contributor license agreements. See the NOTICE file distributed with
 * this work for additional information regarding copyright ownership.
 * The ASF licenses this file to You under the Apache License, Version 2.0
 * (the "License"); you may not use this file except in compliance with
 * the License. You may obtain a copy of the License at
 *
 *    http://www.apache.org/licenses/LICENSE-2.0
 *
 * Unless required by applicable law or agreed to in writing, software
 * distributed under the License is distributed on an "AS IS" BASIS,
 * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
 * See the License for the specific language governing permissions and
 * limitations under the License.
 */

package org.apache.spark.sql.connector.catalog;

import java.util.Arrays;
import java.util.Objects;

import org.apache.spark.annotation.Evolving;

/**
 * A function dependency of a SQL object.
 * <p>
 * The dependent function is identified by its structural multi-part name. See
 * {@link TableDependency} for the parts-form contract.
 * <p>
 * Records' auto-generated {@code equals}/{@code hashCode} on array fields fall through to
 * {@link Object#equals} (reference equality), so this record overrides them to use
 * {@link Arrays#equals(Object[], Object[])} / {@link Arrays#hashCode(Object[])} on
 * {@code nameParts} and give value-based semantics. The defensive-copy accessor override
 * also clones on read so callers cannot mutate the record's internal array.
 *
 * @param nameParts structural multi-part identifier; must be non-empty and contain no
 *                  null elements (defensive copy made; not validated element-wise --
 *                  callers passing nulls will surface NPEs in downstream consumers)
 * @since 4.2.0
 */
@Evolving
public record FunctionDependency(String[] nameParts) implements Dependency {
  public FunctionDependency {
    Objects.requireNonNull(nameParts, "nameParts must not be null");
    if (nameParts.length == 0) {
      throw new IllegalArgumentException("nameParts must not be empty");
    }
    nameParts = nameParts.clone();
  }

  /** Returns a defensive copy of the underlying parts array. */
  @Override
  public String[] nameParts() { return nameParts.clone(); }

  @Override
  public boolean equals(Object o) {
    return o instanceof FunctionDependency that && Arrays.equals(nameParts, that.nameParts);
  }

  @Override
  public int hashCode() { return Arrays.hashCode(nameParts); }

  @Override
  public String toString() {
    return "FunctionDependency[nameParts=" + Arrays.toString(nameParts) + "]";
  }
}
```
sql/catalyst/src/main/java/org/apache/spark/sql/connector/catalog/TableDependency.java (new file)

Lines changed: 76 additions & 0 deletions

```java
/*
 * Licensed to the Apache Software Foundation (ASF) under one or more
 * contributor license agreements. See the NOTICE file distributed with
 * this work for additional information regarding copyright ownership.
 * The ASF licenses this file to You under the Apache License, Version 2.0
 * (the "License"); you may not use this file except in compliance with
 * the License. You may obtain a copy of the License at
 *
 *    http://www.apache.org/licenses/LICENSE-2.0
 *
 * Unless required by applicable law or agreed to in writing, software
 * distributed under the License is distributed on an "AS IS" BASIS,
 * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
 * See the License for the specific language governing permissions and
 * limitations under the License.
 */

package org.apache.spark.sql.connector.catalog;

import java.util.Arrays;
import java.util.Objects;

import org.apache.spark.annotation.Evolving;

/**
 * A table dependency of a SQL object.
 * <p>
 * The dependent table is identified by its structural multi-part name. {@code nameParts}
 * arity matches the catalog's namespace depth plus one for the table name -- for a catalog
 * with single-level namespaces the parts are {@code [catalog, schema, table]}; for a catalog
 * with multi-level namespaces (e.g. Iceberg with {@code db1.db2}) the parts are
 * {@code [catalog, db1, db2, ..., table]}; for v1 sources resolved through the session
 * catalog, producers should normalize to {@code [spark_catalog, db, table]} so consumers see
 * a stable arity per source kind. The structural form preserves arity and is unambiguous
 * against quoted identifiers containing a literal {@code .}; consumers that need a flat
 * string should join the parts themselves with a quoting scheme appropriate to their wire
 * format.
 * <p>
 * Records' auto-generated {@code equals}/{@code hashCode} on array fields fall through to
 * {@link Object#equals} (reference equality), so this record overrides them to use
 * {@link Arrays#equals(Object[], Object[])} / {@link Arrays#hashCode(Object[])} on
 * {@code nameParts} and give value-based semantics. The defensive-copy accessor override
 * also clones on read so callers cannot mutate the record's internal array.
 *
 * @param nameParts structural multi-part identifier; must be non-empty and contain no
 *                  null elements (defensive copy made; not validated element-wise --
 *                  callers passing nulls will surface NPEs in downstream consumers)
 * @since 4.2.0
 */
@Evolving
public record TableDependency(String[] nameParts) implements Dependency {
  public TableDependency {
    Objects.requireNonNull(nameParts, "nameParts must not be null");
    if (nameParts.length == 0) {
      throw new IllegalArgumentException("nameParts must not be empty");
    }
    nameParts = nameParts.clone();
  }

  /** Returns a defensive copy of the underlying parts array. */
  @Override
  public String[] nameParts() { return nameParts.clone(); }

  @Override
  public boolean equals(Object o) {
    return o instanceof TableDependency that && Arrays.equals(nameParts, that.nameParts);
  }

  @Override
  public int hashCode() { return Arrays.hashCode(nameParts); }

  @Override
  public String toString() {
    return "TableDependency[nameParts=" + Arrays.toString(nameParts) + "]";
  }
}
```

sql/catalyst/src/main/java/org/apache/spark/sql/connector/catalog/TableSummary.java

Lines changed: 1 addition & 0 deletions
```diff
@@ -27,6 +27,7 @@ public interface TableSummary {
   String EXTERNAL_TABLE_TYPE = "EXTERNAL";
   String VIEW_TABLE_TYPE = "VIEW";
   String FOREIGN_TABLE_TYPE = "FOREIGN";
+  String METRIC_VIEW_TABLE_TYPE = "METRIC_VIEW";

   Identifier identifier();
   String tableType();
```

sql/catalyst/src/main/java/org/apache/spark/sql/connector/catalog/ViewInfo.java

Lines changed: 25 additions & 5 deletions
```diff
@@ -48,6 +48,7 @@ public class ViewInfo extends TableInfo {
   private final Map<String, String> sqlConfigs;
   private final String schemaMode;
   private final String[] queryColumnNames;
+  private final DependencyList viewDependencies;

   protected ViewInfo(Builder builder) {
     super(builder);
@@ -57,11 +58,11 @@ protected ViewInfo(Builder builder) {
     this.sqlConfigs = Collections.unmodifiableMap(builder.sqlConfigs);
     this.schemaMode = builder.schemaMode;
     this.queryColumnNames = builder.queryColumnNames;
-    // Force PROP_TABLE_TYPE = VIEW so that `properties()` reflects the typed ViewInfo
-    // classification. Catalogs and generic viewers reading PROP_TABLE_TYPE from the properties
-    // bag (e.g. TableCatalog.listTableSummaries default impl, DESCRIBE) see "VIEW" without
-    // requiring authors to remember to call withTableType(VIEW).
-    properties().put(TableCatalog.PROP_TABLE_TYPE, TableSummary.VIEW_TABLE_TYPE);
+    this.viewDependencies = builder.viewDependencies;
+    // Default PROP_TABLE_TYPE = VIEW so `properties()` reflects the typed ViewInfo
+    // classification. Callers can refine to a more specific view kind (for example,
+    // METRIC_VIEW) by calling BaseBuilder.withTableType(...) on the builder before build().
+    properties().putIfAbsent(TableCatalog.PROP_TABLE_TYPE, TableSummary.VIEW_TABLE_TYPE);
   }

   /** The SQL text of the view. */
@@ -102,13 +103,22 @@ protected ViewInfo(Builder builder) {
    */
   public String[] queryColumnNames() { return queryColumnNames; }

+  /**
+   * Returns the structured list of objects this view depends on (source tables and functions),
+   * or {@code null} if no dependency list was supplied. Unlike other view metadata which is
+   * encoded into {@link #properties()}, dependency lists are a first-class field because their
+   * nested structure does not round-trip cleanly through flat string properties.
+   */
+  public DependencyList viewDependencies() { return viewDependencies; }
+
   public static class Builder extends BaseBuilder<Builder> {
     private String queryText;
     private String currentCatalog;
     private String[] currentNamespace = new String[0];
     private Map<String, String> sqlConfigs = new HashMap<>();
     private String schemaMode;
     private String[] queryColumnNames = new String[0];
+    private DependencyList viewDependencies = null;

     @Override
     protected Builder self() { return this; }
@@ -143,6 +153,16 @@ public Builder withQueryColumnNames(String[] queryColumnNames) {
       return this;
     }

+    /**
+     * Sets the structured dependency list for this view. Source tables and functions referenced
+     * by the view text should be recorded here so downstream consumers (e.g. catalogs persisting
+     * lineage) can access them without re-analyzing the view body.
+     */
+    public Builder withViewDependencies(DependencyList viewDependencies) {
+      this.viewDependencies = viewDependencies;
+      return this;
+    }
+
     @Override
     public ViewInfo build() {
       Objects.requireNonNull(columns, "columns should not be null");
```
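A hedged builder sketch: `withColumns` / `withTableType` are assumed setters inherited from `BaseBuilder` (outside this diff), and `Column.create` is the existing connector column factory; only `withViewDependencies` is new here.

```scala
import org.apache.spark.sql.connector.catalog.{Column, Dependency, DependencyList, TableSummary, ViewInfo}
import org.apache.spark.sql.types.StringType

val info = new ViewInfo.Builder()
  .withColumns(Array(Column.create("region", StringType)))  // assumed BaseBuilder setter
  .withTableType(TableSummary.METRIC_VIEW_TABLE_TYPE)       // refine before build(); putIfAbsent keeps it
  .withViewDependencies(DependencyList.of(
    Array[Dependency](Dependency.table(Array("cat", "db", "events")))))
  .build()

assert(info.viewDependencies() != null)  // "supplied" (possibly empty), per the field contract
```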

sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Analyzer.scala

Lines changed: 1 addition & 1 deletion
```diff
@@ -1231,7 +1231,7 @@ class Analyzer(
     ) {
       CatalogV2Util.loadTable(catalog, ident).map {
         case v1Table: V1Table if CatalogV2Util.isSessionCatalog(catalog) &&
-            v1Table.v1Table.tableType == CatalogTableType.VIEW =>
+            v1Table.v1Table.isViewLike =>
           val v1Ident = v1Table.catalogTable.identifier
           val v2Ident = Identifier.of(v1Ident.database.toArray, v1Ident.identifier)
           ResolvedPersistentView(
```