[SPARK-54119] Support METRIC_VIEW creation on V2 catalogs#55487
[SPARK-54119] Support METRIC_VIEW creation on V2 catalogs#55487chenwang-databricks wants to merge 17 commits intoapache:masterfrom
Conversation
a48673d to
6f2ac0b
Compare
245dc93 to
9190bf0
Compare
12905c0 to
8f62efc
Compare
cloud-fan
left a comment
There was a problem hiding this comment.
Thanks for extending metric views to v2 catalogs. The shape of the change is right — route through ViewCatalog.createView with PROP_TABLE_TYPE = METRIC_VIEW and a typed viewDependencies, and promote METRIC_VIEW to a first-class CatalogTableType. A few significant concerns:
1. v2 path forks instead of reusing V2ViewPreparation and changes behavior on IF NOT EXISTS / OR REPLACE. CreateMetricViewCommand.createMetricViewInV2Catalog is open-coded inside a v1 RunnableCommand and duplicates most of V2ViewPreparation.buildViewInfo, but with materially different behavior: allowExisting and replace are silently dropped on the v2 path (regression vs. v1, which honored ignoreIfExists), there's no viewExists short-circuit, no createOrReplaceView, no cross-type collision decoding (so a table sitting at the ident leaks ViewAlreadyExistsException instead of unsupportedCreateOrReplaceViewOnTableError), and the CharVarcharUtils / SchemaUtils.checkColumnNameDuplication / checkIndeterminateCollationInSchema / owner-stamping work that V2ViewPreparation does is missing. Refactoring this into a CreateV2MetricViewExec (or extending V2ViewPreparation with a viewDependencies / extraProperties hook) plus a DataSourceV2Strategy rule would close all of these gaps in one move.
2. Audit other tableType == VIEW gates for METRIC_VIEW. Promoting metric views to a distinct CatalogTableType opens up silent regressions wherever existing code branches on VIEW. Two I'm fairly sure about: CatalogTable.toJsonLinkedHashMap (interface.scala:670) gates View Text / View Original Text / View Schema Mode / View Catalog and Namespace / SQL Path on tableType == VIEW, so DESCRIBE TABLE EXTENDED on a metric view will lose all of those rows — visible regression vs. the pre-PR viewWithMetrics storage. And HiveExternalCatalog.{createTable, alterTable, restoreTableMetadata} (lines 277, 297, 598, 838) only special-case VIEW, so a Hive-metastore-backed session-catalog metric view will go through the wrong branches on persist and read-back. Tests don't cover Hive-backed metric views, so this would only surface in production. Worth grepping tableType == CatalogTableType.VIEW and tableType == VIEW repo-wide and deciding which sites also need to accept METRIC_VIEW.
3. TableDependency.tableFullName is a single dot-joined string but the Javadoc claims a fixed three-part shape. This breaks down two ways: (a) MetricViewHelper.collectTableDependencies uses unquotedString for V1 sources, which yields a 2-part db.tbl whenever TableIdentifier.catalog is unset, and mkString(".") for V2 sources, which yields N-part names for catalogs with multi-level namespaces (e.g., Iceberg cat.db1.db2.tbl is 4 parts) — so the producer doesn't honor the contract; (b) any consumer parsing the string back via split/join loses arity and is ambiguous against quoted identifiers containing a literal .. Since this is brand-new public API, it's the right time to make the storage structural — e.g., TableDependency(String[] nameParts) — rather than committing to a string the producer can't always honor.
4. MetricViewPlanner.parseYAML raises SparkException.internalError for malformed user YAML. Replacing invalidLiteralForWindowDurationError is right, but internalError is the wrong category — a typo in the user's YAML body shouldn't surface as "Spark internal error, please contact support." Should be a QueryCompilationErrors / AnalysisException with a real error class.
Minor stuff (typed DTO sealing/mutability, the ViewInfo constructor's metric-view special case, missing tests for OR REPLACE / IF NOT EXISTS / V1-source dependency / multi-level-namespace V2 source / read-back through loadRelation) is in the inline comments below.
…eedback Address all 4 substantive concerns and 8 inline comments from cloud-fan's review on PR apache#55487. V2 path refactor (concern 1, inline 87/111/142): - New CreateV2MetricViewExec extends V2ViewPreparation (mirrors CreateV2ViewExec): viewExists short-circuit on IF NOT EXISTS, OR REPLACE via createOrReplaceView, and cross-type collision decoding via ViewAlreadyExistsException -> tableExists -> EXPECT_VIEW_NOT_TABLE. - DataSourceV2Strategy rule routes CreateMetricViewCommand on a non-session catalog to the new exec; CreateMetricViewCommand keeps only the v1 session-catalog path in run(). - V2ViewPreparation gains optional viewDependencies / tableType hooks the metric-view subclass populates; plain views leave them None. - Picks up CharVarcharUtils.getRawSchema, SchemaUtils.checkColumn- NameDuplication, SchemaUtils.checkIndeterminateCollationInSchema, and PROP_OWNER stamping for free via the shared trait. Structural TableDependency / FunctionDependency (concern 3, inline 30/169): - Replace single-string `tableFullName` with `String[] nameParts`; arity is preserved per source, unambiguous against quoted identifiers containing literal `.`. Apply same shape to FunctionDependency. - Dependency.table / .function factories take varargs. - MetricViewHelper.collectTableDependencies returns Seq[Seq[String]]; each match arm emits structural parts (V1 sources via TableIdentifier.nameParts; V2 sources via catalog +: namespace :+ name). Sealed Dependency + defensive DependencyList (inline 30 / 40): - `sealed interface Dependency permits TableDependency, FunctionDependency` enforces the documented "is one of" structurally. - DependencyList canonical constructor and accessor defensively clone the array so consumers can't mutate stored ViewInfo dependency state. ViewInfo constructor cleanup (inline 68): - Drop the metric-view-specific PROP_TABLE_TYPE branch in the generic constructor in favor of `properties().putIfAbsent(...)`. Callers that want a more specific kind (e.g. METRIC_VIEW) call BaseBuilder.withTableType(...) before build() -- exercised by CreateV2MetricViewExec via the new V2ViewPreparation tableType hook. VIEW-gate audit (concern 2, inline 1088): - CatalogTable.toJsonLinkedHashMap (interface.scala:670) accepts METRIC_VIEW so DESCRIBE TABLE EXTENDED still emits "View Text" / "View Original Text" / "View Schema Mode" / "View Catalog and Namespace" / "SQL Path" rows for metric views. - HiveExternalCatalog: 4 sites (createTable empty-schema fallback x2, alterTable VIEW path, restoreTableMetadata read-back) accept METRIC_VIEW so a Hive-metastore-backed session-catalog metric view round-trips. - Repo-wide: SessionCatalog.isView, InMemoryCatalog.listViews, RelationResolution.createDataSourceV1Scan (streaming-on-view rejection), Analyzer.lookupTableOrView (ResolvedPersistentView wrapping), rules.saveDataIntoView, DataStreamWriter.writeToV1Table, DescribeRelationJsonCommand.describePartitionInfoJson, AnalyzeColumnCommand, AnalyzePartitionCommand, CommandUtils.analyzeTable, tables.CreateTableLike provider lookup, tables.AlterTableAddColumnsCommand, tables.describeDetailedPartitionInfo, tables.ShowCreateTableCommand (3 sites) -- all extended to accept METRIC_VIEW. Real error class for malformed YAML (concern 4, inline 70): - New INVALID_METRIC_VIEW_YAML error condition (sqlState 42K0L) + QueryCompilationErrors.invalidMetricViewYamlError. Replaces SparkException.internalError so a typo in the user's YAML body surfaces as a user-correctable AnalysisException. Tests (inline 39): - CREATE OR REPLACE VIEW WITH METRICS replaces the existing view. - CREATE VIEW IF NOT EXISTS WITH METRICS is a no-op when the view exists. - CREATE VIEW WITH METRICS over a v2 table at the ident throws EXPECT_VIEW_NOT_TABLE.NO_ALTERNATIVE. - CREATE VIEW IF NOT EXISTS WITH METRICS is a no-op when a v2 table sits at the ident. - V1 session-catalog source dependency: nameParts arity 2-3. - Multi-level V2 namespace source dependency: nameParts arity N+2. - DESCRIBE TABLE EXTENDED on a v2 metric view -- read-back through loadRelation -> MetadataOnlyTable -> V1Table.toCatalogTable(ViewInfo). - Existing dep tests updated to assert nameParts instead of tableFullName. Misc (inline 178): - Scaladoc on MetricView.getProperties calls out that metric_view.from.sql / metric_view.where are truncated to Constants.MAXIMUM_PROPERTY_SIZE and are descriptive (not round-trippable) values; consumers should re-read the YAML body for full SQL.
…e#55487 CI fix: - Restore the missing `"MISSING_CATALOG_ABILITY.VIEWS") {` line in `MetricViewV2CatalogSuite.scala` whose absence broke the test file's parse (compileIncremental "four errors found" -- ')' expected, unmatched '}', unclosed multi-line string, '}' expected at EOF). Self-review feedback: - Shorten `viewDependencies` and `tableType` Scaladocs on `V2ViewPreparation` to one sentence each; drop the plain-view / metric-view distinctions. - Consolidate the duplicated `run()` body from `CreateV2ViewExec` and `CreateV2MetricViewExec` into a new `V2CreateViewPreparation` trait (extends `V2ViewPreparation`) that owns the shared CREATE-side flow: `viewExists` short-circuit, `OR REPLACE` via `createOrReplaceView`, and `ViewAlreadyExistsException` decoding for cross-type collisions. Both subclasses now extend `V2CreateViewPreparation` and inherit `run()` unchanged. The intermediate trait keeps `V2ViewPreparation` (also used by `AlterV2ViewExec`) free of CREATE-only fields. - Drop the defensive `case _: ResolvedIdentifier =>` branch in `CreateMetricViewCommand.run` -- the catch-all internal-error case is sufficient now that `DataSourceV2Strategy` reliably intercepts the non-session path. - `SHOW CREATE TABLE` on a metric view now throws the dedicated `UNSUPPORTED_SHOW_CREATE_TABLE.ON_METRIC_VIEW` (sqlState 0A000) instead of falling through to the VIEW branch. Reverts the three METRIC_VIEW additions in `tables.scala`'s VIEW gates from the previous round. Test reorg + new coverage in `MetricViewV2CatalogSuite`: - Reorganized into 5 sections: (1) Create-related, (2) Dependency extraction, (3) SELECT cases, (4) DESCRIBE cases, (5) DROP / SHOW. - Trimmed the `OR REPLACE` test's `!== firstYaml` assertion; replaced with positive assertions on the captured ViewInfo's queryText, metric_view.where, and viewDependencies (parts shape). - New `SELECT from a v2 metric view returns aggregated rows` -- exercises the `loadRelation` -> `MetadataOnlyTable(ViewInfo)` -> `V1Table.toCatalogTable(ViewInfo)` -> `ResolveMetricView` round-trip and asserts on the aggregated output rows. - New `DESCRIBE TABLE on a v2 metric view returns the aliased columns` (non-EXTENDED variant alongside the existing EXTENDED test). - New `SHOW TABLES does not list v2 metric views` (RelationCatalog contract: tables only). - New `SHOW VIEWS lists v2 metric views`.
… TABLE coverage Add the test cases requested on PR apache#55487: SELECT read-back (section 3) -- 4 new tests modeled on `MetricViewSuite` patterns: - Existing test fixed to use `measure(count_sum)` (the v1 suite shows the required syntax for measure columns) and switched to `checkAnswer` against an equivalent raw aggregation over the source. - `SELECT measure(...) with a WHERE clause on a dimension` -- exercises a filter at the query layer. - `SELECT against a v2 metric view honors the view's pre-defined where clause` -- creates the metric view with `where = Some("count > 1")`, asserts only the matching rows surface. - `SELECT from a v2 metric view supports multiple measures with different aggregations` -- adds price_sum / price_max alongside count_sum. - `SELECT from a v2 metric view supports ORDER BY and LIMIT on measures`. DROP TABLE on a v2 metric view (section 5) -- 2 new tests: - `DROP TABLE on a v2 metric view throws EXPECT_TABLE_NOT_VIEW` -- `DropTableExec`'s `tableExists` probe returns false (RelationCatalog passive filtering), the `viewExists` fallback returns true, so the exec emits `EXPECT_TABLE_NOT_VIEW.NO_ALTERNATIVE`. Also asserts the metric view is still present in the catalog after the failed DROP. - `DROP TABLE IF EXISTS on a v2 metric view also throws EXPECT_TABLE_NOT_VIEW` -- IF EXISTS does not silence the wrong-type error (v1 parity). SHOW CREATE TABLE on a v2 metric view (section 5) -- 1 new test: - `SHOW CREATE TABLE on a v2 metric view is unsupported` -- routes through `DataSourceV2Strategy`'s `ResolvedPersistentView` case and throws `UNSUPPORTED_FEATURE.TABLE_OPERATION` with operation "SHOW CREATE TABLE".
b7d086f to
269056a
Compare
…eedback Address all 4 substantive concerns and 8 inline comments from cloud-fan's review on PR apache#55487. V2 path refactor (concern 1, inline 87/111/142): - New CreateV2MetricViewExec extends V2ViewPreparation (mirrors CreateV2ViewExec): viewExists short-circuit on IF NOT EXISTS, OR REPLACE via createOrReplaceView, and cross-type collision decoding via ViewAlreadyExistsException -> tableExists -> EXPECT_VIEW_NOT_TABLE. - DataSourceV2Strategy rule routes CreateMetricViewCommand on a non-session catalog to the new exec; CreateMetricViewCommand keeps only the v1 session-catalog path in run(). - V2ViewPreparation gains optional viewDependencies / tableType hooks the metric-view subclass populates; plain views leave them None. - Picks up CharVarcharUtils.getRawSchema, SchemaUtils.checkColumn- NameDuplication, SchemaUtils.checkIndeterminateCollationInSchema, and PROP_OWNER stamping for free via the shared trait. Structural TableDependency / FunctionDependency (concern 3, inline 30/169): - Replace single-string `tableFullName` with `String[] nameParts`; arity is preserved per source, unambiguous against quoted identifiers containing literal `.`. Apply same shape to FunctionDependency. - Dependency.table / .function factories take varargs. - MetricViewHelper.collectTableDependencies returns Seq[Seq[String]]; each match arm emits structural parts (V1 sources via TableIdentifier.nameParts; V2 sources via catalog +: namespace :+ name). Sealed Dependency + defensive DependencyList (inline 30 / 40): - `sealed interface Dependency permits TableDependency, FunctionDependency` enforces the documented "is one of" structurally. - DependencyList canonical constructor and accessor defensively clone the array so consumers can't mutate stored ViewInfo dependency state. ViewInfo constructor cleanup (inline 68): - Drop the metric-view-specific PROP_TABLE_TYPE branch in the generic constructor in favor of `properties().putIfAbsent(...)`. Callers that want a more specific kind (e.g. METRIC_VIEW) call BaseBuilder.withTableType(...) before build() -- exercised by CreateV2MetricViewExec via the new V2ViewPreparation tableType hook. VIEW-gate audit (concern 2, inline 1088): - CatalogTable.toJsonLinkedHashMap (interface.scala:670) accepts METRIC_VIEW so DESCRIBE TABLE EXTENDED still emits "View Text" / "View Original Text" / "View Schema Mode" / "View Catalog and Namespace" / "SQL Path" rows for metric views. - HiveExternalCatalog: 4 sites (createTable empty-schema fallback x2, alterTable VIEW path, restoreTableMetadata read-back) accept METRIC_VIEW so a Hive-metastore-backed session-catalog metric view round-trips. - Repo-wide: SessionCatalog.isView, InMemoryCatalog.listViews, RelationResolution.createDataSourceV1Scan (streaming-on-view rejection), Analyzer.lookupTableOrView (ResolvedPersistentView wrapping), rules.saveDataIntoView, DataStreamWriter.writeToV1Table, DescribeRelationJsonCommand.describePartitionInfoJson, AnalyzeColumnCommand, AnalyzePartitionCommand, CommandUtils.analyzeTable, tables.CreateTableLike provider lookup, tables.AlterTableAddColumnsCommand, tables.describeDetailedPartitionInfo, tables.ShowCreateTableCommand (3 sites) -- all extended to accept METRIC_VIEW. Real error class for malformed YAML (concern 4, inline 70): - New INVALID_METRIC_VIEW_YAML error condition (sqlState 42K0L) + QueryCompilationErrors.invalidMetricViewYamlError. Replaces SparkException.internalError so a typo in the user's YAML body surfaces as a user-correctable AnalysisException. Tests (inline 39): - CREATE OR REPLACE VIEW WITH METRICS replaces the existing view. - CREATE VIEW IF NOT EXISTS WITH METRICS is a no-op when the view exists. - CREATE VIEW WITH METRICS over a v2 table at the ident throws EXPECT_VIEW_NOT_TABLE.NO_ALTERNATIVE. - CREATE VIEW IF NOT EXISTS WITH METRICS is a no-op when a v2 table sits at the ident. - V1 session-catalog source dependency: nameParts arity 2-3. - Multi-level V2 namespace source dependency: nameParts arity N+2. - DESCRIBE TABLE EXTENDED on a v2 metric view -- read-back through loadRelation -> MetadataOnlyTable -> V1Table.toCatalogTable(ViewInfo). - Existing dep tests updated to assert nameParts instead of tableFullName. Misc (inline 178): - Scaladoc on MetricView.getProperties calls out that metric_view.from.sql / metric_view.where are truncated to Constants.MAXIMUM_PROPERTY_SIZE and are descriptive (not round-trippable) values; consumers should re-read the YAML body for full SQL.
…e#55487 CI fix: - Restore the missing `"MISSING_CATALOG_ABILITY.VIEWS") {` line in `MetricViewV2CatalogSuite.scala` whose absence broke the test file's parse (compileIncremental "four errors found" -- ')' expected, unmatched '}', unclosed multi-line string, '}' expected at EOF). Self-review feedback: - Shorten `viewDependencies` and `tableType` Scaladocs on `V2ViewPreparation` to one sentence each; drop the plain-view / metric-view distinctions. - Consolidate the duplicated `run()` body from `CreateV2ViewExec` and `CreateV2MetricViewExec` into a new `V2CreateViewPreparation` trait (extends `V2ViewPreparation`) that owns the shared CREATE-side flow: `viewExists` short-circuit, `OR REPLACE` via `createOrReplaceView`, and `ViewAlreadyExistsException` decoding for cross-type collisions. Both subclasses now extend `V2CreateViewPreparation` and inherit `run()` unchanged. The intermediate trait keeps `V2ViewPreparation` (also used by `AlterV2ViewExec`) free of CREATE-only fields. - Drop the defensive `case _: ResolvedIdentifier =>` branch in `CreateMetricViewCommand.run` -- the catch-all internal-error case is sufficient now that `DataSourceV2Strategy` reliably intercepts the non-session path. - `SHOW CREATE TABLE` on a metric view now throws the dedicated `UNSUPPORTED_SHOW_CREATE_TABLE.ON_METRIC_VIEW` (sqlState 0A000) instead of falling through to the VIEW branch. Reverts the three METRIC_VIEW additions in `tables.scala`'s VIEW gates from the previous round. Test reorg + new coverage in `MetricViewV2CatalogSuite`: - Reorganized into 5 sections: (1) Create-related, (2) Dependency extraction, (3) SELECT cases, (4) DESCRIBE cases, (5) DROP / SHOW. - Trimmed the `OR REPLACE` test's `!== firstYaml` assertion; replaced with positive assertions on the captured ViewInfo's queryText, metric_view.where, and viewDependencies (parts shape). - New `SELECT from a v2 metric view returns aggregated rows` -- exercises the `loadRelation` -> `MetadataOnlyTable(ViewInfo)` -> `V1Table.toCatalogTable(ViewInfo)` -> `ResolveMetricView` round-trip and asserts on the aggregated output rows. - New `DESCRIBE TABLE on a v2 metric view returns the aliased columns` (non-EXTENDED variant alongside the existing EXTENDED test). - New `SHOW TABLES does not list v2 metric views` (RelationCatalog contract: tables only). - New `SHOW VIEWS lists v2 metric views`.
… TABLE coverage Add the test cases requested on PR apache#55487: SELECT read-back (section 3) -- 4 new tests modeled on `MetricViewSuite` patterns: - Existing test fixed to use `measure(count_sum)` (the v1 suite shows the required syntax for measure columns) and switched to `checkAnswer` against an equivalent raw aggregation over the source. - `SELECT measure(...) with a WHERE clause on a dimension` -- exercises a filter at the query layer. - `SELECT against a v2 metric view honors the view's pre-defined where clause` -- creates the metric view with `where = Some("count > 1")`, asserts only the matching rows surface. - `SELECT from a v2 metric view supports multiple measures with different aggregations` -- adds price_sum / price_max alongside count_sum. - `SELECT from a v2 metric view supports ORDER BY and LIMIT on measures`. DROP TABLE on a v2 metric view (section 5) -- 2 new tests: - `DROP TABLE on a v2 metric view throws EXPECT_TABLE_NOT_VIEW` -- `DropTableExec`'s `tableExists` probe returns false (RelationCatalog passive filtering), the `viewExists` fallback returns true, so the exec emits `EXPECT_TABLE_NOT_VIEW.NO_ALTERNATIVE`. Also asserts the metric view is still present in the catalog after the failed DROP. - `DROP TABLE IF EXISTS on a v2 metric view also throws EXPECT_TABLE_NOT_VIEW` -- IF EXISTS does not silence the wrong-type error (v1 parity). SHOW CREATE TABLE on a v2 metric view (section 5) -- 1 new test: - `SHOW CREATE TABLE on a v2 metric view is unsupported` -- routes through `DataSourceV2Strategy`'s `ResolvedPersistentView` case and throws `UNSUPPORTED_FEATURE.TABLE_OPERATION` with operation "SHOW CREATE TABLE".
cloud-fan
left a comment
There was a problem hiding this comment.
Round 2 review.
Status: 13 prior findings addressed (4 main concerns + 9 inline), 0 remaining, 4 new (3 newly introduced in commits since the prior review at 8f62efc6, 1 late catch). The shared V2ViewPreparation / V2CreateViewPreparation refactor reads cleanly, the typed Dependency / DependencyList API replaces the dot-joined-string form, the INVALID_METRIC_VIEW_YAML error is in place, and the tableType == VIEW audit covers most sites. Four follow-ups inline.
…pache#55487 Address the four new findings from cloud-fan's round 2 review of PR apache#55487 plus the corresponding CI fixes. Combines the revert of the StructField/Column public-API commit (per cloud-fan's split request) with the three remaining metric-view fixes into one commit. Cloud-fan finding #1 (`verifyAlterTableType` audit miss): * `ddl.scala` `verifyAlterTableType` (lines 1086-1103) was missed in the `tableType == VIEW` -> `VIEW || METRIC_VIEW` audit. Without this fix, `ALTER TABLE <metric_view> RENAME TO ...` silently succeeds (renames as if a regular table), and `ALTER VIEW <metric_view> RENAME TO ...` throws `cannotAlterTableWithAlterViewError` (wrong category for a view- kind). Widen both arms to also accept `METRIC_VIEW`. Cloud-fan finding apache#2/apache#3 (StructField commit split-out): * Revert d55143f ("Public StructField.json / .fromJson + Column.fromStructField"). None of the three new public APIs are called by code in this PR -- they exist to support the downstream Unity Catalog connector. Per cloud-fan's request, split them out into a separate PR with its own JIRA ticket so they can be reviewed independently. Reverting the commit also removes the `Column.fromStructField` divergence from `structFieldToV2Column` (which would have duplicated the comment field in `metadataInJSON`); the split-out PR fixes that inline. The revert also fixes two CI failures introduced by the same commit: - `mimaReportBinaryIssues` losing `StructField.tupled` / `StructField.curried` / `AbstractFunction4` parent on the `StructField` companion object (sql-api 4.0.0 binary surface). - `Javaunidoc` failing on the `{@link StructField.json}` / `{@link StructField.fromJson}` references that javadoc rejects (member separator must be `#`, not `.`). Cloud-fan finding apache#4 (multi-level namespace lift): * `MetricViewHelper.analyzeMetricViewText` previously required a `TableIdentifier`, which can only carry up to 3 name parts. v2 metric views with multi-level-namespace targets (e.g. `cat.db1.db2.mv`) failed at `ident.asTableIdentifier` with `requiresSinglePartNamespaceError`. Lift the helper to take `nameParts: Seq[String]` directly. The synthetic `CatalogTable` used as analysis context still needs a TableIdentifier (it's a v1 type); we collapse the intermediate namespace components into the synthetic `database` slot via dot-join. Since the synthetic identifier is never used to resolve the view body itself, this collapse is observationally invisible to the analyzed plan, and `verifyTemporaryObjectsNotExists` still receives the full `nameParts` so error rendering preserves the multi-part form. Update both call sites: - `CreateMetricViewCommand.createMetricViewInSessionCatalog` (v1) passes `name.nameParts`. - `DataSourceV2Strategy` (v2) passes `(catalog.name() +: ident.namespace() :+ ident.name())`. Add a new test in `MetricViewV2CatalogSuite` that creates a metric view at `testcat.ns_a.ns_b.mv_deep` and asserts the create succeeds + the captured `ViewInfo` carries `PROP_TABLE_TYPE = METRIC_VIEW`. Test realignment to SPARK-56655's new contract: * SPARK-56655 (master) added several v2 view DDL behaviors that changed the error categories surfaced for cross-type collisions and added view-aware listing on `TableViewCatalog`. Realign 5 existing test assertions in `MetricViewV2CatalogSuite`: - "CREATE VIEW ... WITH METRICS over a v2 table at the ident": `EXPECT_VIEW_NOT_TABLE.NO_ALTERNATIVE` -> `TABLE_OR_VIEW_ ALREADY_EXISTS` (master's analyzer-time pre-check fires before the exec-time decoding in `CreateV2MetricViewExec.run`). - "DROP TABLE on a v2 metric view" + "DROP TABLE IF EXISTS on a v2 metric view": `EXPECT_TABLE_NOT_VIEW.NO_ALTERNATIVE` -> `WRONG_COMMAND_FOR_OBJECT_TYPE` ("Use DROP VIEW instead", raised by master's new `DropTableExec.run`). - "SHOW TABLES does not list v2 metric views" -> "SHOW TABLES on a v2 TableViewCatalog lists both tables and metric views": SPARK-56655 routes SHOW TABLES on a `TableViewCatalog` through `listRelationSummaries` so views appear alongside tables in the output (matches v1 SHOW TABLES on a session catalog). - DESCRIBE TABLE EXTENDED `View Text` row: trim before comparing to the pre-substitution YAML body (the row carries the raw text between the SQL `$$ ... $$` markers, including the surrounding `\n`). * Add an explicit `SHOW CREATE TABLE` rejection guard for v2 metric views in `DataSourceV2Strategy`. Master's new `ShowCreateV2ViewExec` would emit a plain `CREATE VIEW <ident> AS <yaml>`, which is not round-trippable as metric-view DDL (the right form is `CREATE VIEW <ident> WITH METRICS LANGUAGE YAML AS $$ <yaml> $$`). Reject up front with `unsupportedTableOperationError("SHOW CREATE TABLE")` rather than emit lossy DDL; the existing test already expects this exact behavior. Signed-off-by: chen.wang <chen.wang@databricks.com>
…pache#55487 Address the four new findings from cloud-fan's round 2 review of PR apache#55487 plus the corresponding CI fixes. Combines the revert of the StructField/Column public-API commit (per cloud-fan's split request) with the three remaining metric-view fixes into one commit. Cloud-fan finding #1 (`verifyAlterTableType` audit miss): * `ddl.scala` `verifyAlterTableType` (lines 1086-1103) was missed in the `tableType == VIEW` -> `VIEW || METRIC_VIEW` audit. Without this fix, `ALTER TABLE <metric_view> RENAME TO ...` silently succeeds (renames as if a regular table), and `ALTER VIEW <metric_view> RENAME TO ...` throws `cannotAlterTableWithAlterViewError` (wrong category for a view- kind). Widen both arms to also accept `METRIC_VIEW`. Cloud-fan finding apache#2/apache#3 (StructField commit split-out): * Revert d55143f ("Public StructField.json / .fromJson + Column.fromStructField"). None of the three new public APIs are called by code in this PR -- they exist to support the downstream Unity Catalog connector. Per cloud-fan's request, split them out into a separate PR with its own JIRA ticket so they can be reviewed independently. Reverting the commit also removes the `Column.fromStructField` divergence from `structFieldToV2Column` (which would have duplicated the comment field in `metadataInJSON`); the split-out PR fixes that inline. The revert also fixes two CI failures introduced by the same commit: - `mimaReportBinaryIssues` losing `StructField.tupled` / `StructField.curried` / `AbstractFunction4` parent on the `StructField` companion object (sql-api 4.0.0 binary surface). - `Javaunidoc` failing on the `{@link StructField.json}` / `{@link StructField.fromJson}` references that javadoc rejects (member separator must be `#`, not `.`). Cloud-fan finding apache#4 (multi-level namespace lift): * `MetricViewHelper.analyzeMetricViewText` previously required a `TableIdentifier`, which can only carry up to 3 name parts. v2 metric views with multi-level-namespace targets (e.g. `cat.db1.db2.mv`) failed at `ident.asTableIdentifier` with `requiresSinglePartNamespaceError`. Lift the helper to take `nameParts: Seq[String]` directly. The synthetic `CatalogTable` used as analysis context still needs a TableIdentifier (it's a v1 type); we collapse the intermediate namespace components into the synthetic `database` slot via dot-join. Since the synthetic identifier is never used to resolve the view body itself, this collapse is observationally invisible to the analyzed plan, and `verifyTemporaryObjectsNotExists` still receives the full `nameParts` so error rendering preserves the multi-part form. Update both call sites: - `CreateMetricViewCommand.createMetricViewInSessionCatalog` (v1) passes `name.nameParts`. - `DataSourceV2Strategy` (v2) passes `(catalog.name() +: ident.namespace() :+ ident.name())`. Add a new test in `MetricViewV2CatalogSuite` that creates a metric view at `testcat.ns_a.ns_b.mv_deep` and asserts the create succeeds + the captured `ViewInfo` carries `PROP_TABLE_TYPE = METRIC_VIEW`. Test realignment to SPARK-56655's new contract: * SPARK-56655 (master) added several v2 view DDL behaviors that changed the error categories surfaced for cross-type collisions and added view-aware listing on `TableViewCatalog`. Realign 5 existing test assertions in `MetricViewV2CatalogSuite`: - "CREATE VIEW ... WITH METRICS over a v2 table at the ident": `EXPECT_VIEW_NOT_TABLE.NO_ALTERNATIVE` -> `TABLE_OR_VIEW_ ALREADY_EXISTS` (master's analyzer-time pre-check fires before the exec-time decoding in `CreateV2MetricViewExec.run`). - "DROP TABLE on a v2 metric view" + "DROP TABLE IF EXISTS on a v2 metric view": `EXPECT_TABLE_NOT_VIEW.NO_ALTERNATIVE` -> `WRONG_COMMAND_FOR_OBJECT_TYPE` ("Use DROP VIEW instead", raised by master's new `DropTableExec.run`). - "SHOW TABLES does not list v2 metric views" -> "SHOW TABLES on a v2 TableViewCatalog lists both tables and metric views": SPARK-56655 routes SHOW TABLES on a `TableViewCatalog` through `listRelationSummaries` so views appear alongside tables in the output (matches v1 SHOW TABLES on a session catalog). - DESCRIBE TABLE EXTENDED `View Text` row: trim before comparing to the pre-substitution YAML body (the row carries the raw text between the SQL `$$ ... $$` markers, including the surrounding `\n`). * Add an explicit `SHOW CREATE TABLE` rejection guard for v2 metric views in `DataSourceV2Strategy`. Master's new `ShowCreateV2ViewExec` would emit a plain `CREATE VIEW <ident> AS <yaml>`, which is not round-trippable as metric-view DDL (the right form is `CREATE VIEW <ident> WITH METRICS LANGUAGE YAML AS $$ <yaml> $$`). Reject up front with `unsupportedTableOperationError("SHOW CREATE TABLE")` rather than emit lossy DDL; the existing test already expects this exact behavior. Signed-off-by: chen.wang <chen.wang@databricks.com>
fb8da8c to
28543c1
Compare
…pache#55487 Address the four new findings from cloud-fan's round 2 review of PR apache#55487 plus the corresponding CI fixes. Combines the revert of the StructField/Column public-API commit (per cloud-fan's split request) with the three remaining metric-view fixes into one commit. Cloud-fan finding #1 (`verifyAlterTableType` audit miss): * `ddl.scala` `verifyAlterTableType` (lines 1086-1103) was missed in the `tableType == VIEW` -> `VIEW || METRIC_VIEW` audit. Without this fix, `ALTER TABLE <metric_view> RENAME TO ...` silently succeeds (renames as if a regular table), and `ALTER VIEW <metric_view> RENAME TO ...` throws `cannotAlterTableWithAlterViewError` (wrong category for a view- kind). Widen both arms to also accept `METRIC_VIEW`. Cloud-fan finding apache#2/apache#3 (StructField commit split-out): * Revert d55143f ("Public StructField.json / .fromJson + Column.fromStructField"). None of the three new public APIs are called by code in this PR -- they exist to support the downstream Unity Catalog connector. Per cloud-fan's request, split them out into a separate PR with its own JIRA ticket so they can be reviewed independently. Reverting the commit also removes the `Column.fromStructField` divergence from `structFieldToV2Column` (which would have duplicated the comment field in `metadataInJSON`); the split-out PR fixes that inline. The revert also fixes two CI failures introduced by the same commit: - `mimaReportBinaryIssues` losing `StructField.tupled` / `StructField.curried` / `AbstractFunction4` parent on the `StructField` companion object (sql-api 4.0.0 binary surface). - `Javaunidoc` failing on the `{@link StructField.json}` / `{@link StructField.fromJson}` references that javadoc rejects (member separator must be `#`, not `.`). Cloud-fan finding apache#4 (multi-level namespace lift): * `MetricViewHelper.analyzeMetricViewText` previously required a `TableIdentifier`, which can only carry up to 3 name parts. v2 metric views with multi-level-namespace targets (e.g. `cat.db1.db2.mv`) failed at `ident.asTableIdentifier` with `requiresSinglePartNamespaceError`. Lift the helper to take `nameParts: Seq[String]` directly. The synthetic `CatalogTable` used as analysis context still needs a TableIdentifier (it's a v1 type); we collapse the intermediate namespace components into the synthetic `database` slot via dot-join. Since the synthetic identifier is never used to resolve the view body itself, this collapse is observationally invisible to the analyzed plan, and `verifyTemporaryObjectsNotExists` still receives the full `nameParts` so error rendering preserves the multi-part form. Update both call sites: - `CreateMetricViewCommand.createMetricViewInSessionCatalog` (v1) passes `name.nameParts`. - `DataSourceV2Strategy` (v2) passes `(catalog.name() +: ident.namespace() :+ ident.name())`. Add a new test in `MetricViewV2CatalogSuite` that creates a metric view at `testcat.ns_a.ns_b.mv_deep` and asserts the create succeeds + the captured `ViewInfo` carries `PROP_TABLE_TYPE = METRIC_VIEW`. Test realignment to SPARK-56655's new contract: * SPARK-56655 (master) added several v2 view DDL behaviors that changed the error categories surfaced for cross-type collisions and added view-aware listing on `TableViewCatalog`. Realign 5 existing test assertions in `MetricViewV2CatalogSuite`: - "CREATE VIEW ... WITH METRICS over a v2 table at the ident": `EXPECT_VIEW_NOT_TABLE.NO_ALTERNATIVE` -> `TABLE_OR_VIEW_ ALREADY_EXISTS` (master's analyzer-time pre-check fires before the exec-time decoding in `CreateV2MetricViewExec.run`). - "DROP TABLE on a v2 metric view" + "DROP TABLE IF EXISTS on a v2 metric view": `EXPECT_TABLE_NOT_VIEW.NO_ALTERNATIVE` -> `WRONG_COMMAND_FOR_OBJECT_TYPE` ("Use DROP VIEW instead", raised by master's new `DropTableExec.run`). - "SHOW TABLES does not list v2 metric views" -> "SHOW TABLES on a v2 TableViewCatalog lists both tables and metric views": SPARK-56655 routes SHOW TABLES on a `TableViewCatalog` through `listRelationSummaries` so views appear alongside tables in the output (matches v1 SHOW TABLES on a session catalog). - DESCRIBE TABLE EXTENDED `View Text` row: trim before comparing to the pre-substitution YAML body (the row carries the raw text between the SQL `$$ ... $$` markers, including the surrounding `\n`). * Add an explicit `SHOW CREATE TABLE` rejection guard for v2 metric views in `DataSourceV2Strategy`. Master's new `ShowCreateV2ViewExec` would emit a plain `CREATE VIEW <ident> AS <yaml>`, which is not round-trippable as metric-view DDL (the right form is `CREATE VIEW <ident> WITH METRICS LANGUAGE YAML AS $$ <yaml> $$`). Reject up front with `unsupportedTableOperationError("SHOW CREATE TABLE")` rather than emit lossy DDL; the existing test already expects this exact behavior. Signed-off-by: chen.wang <chen.wang@databricks.com>
28543c1 to
b8bd9cc
Compare
…pache#55487 Address the four new findings from cloud-fan's round 2 review of PR apache#55487 plus the corresponding CI fixes. Combines the revert of the StructField/Column public-API commit (per cloud-fan's split request) with the three remaining metric-view fixes into one commit. Cloud-fan finding #1 (`verifyAlterTableType` audit miss): * `ddl.scala` `verifyAlterTableType` (lines 1086-1103) was missed in the `tableType == VIEW` -> `VIEW || METRIC_VIEW` audit. Without this fix, `ALTER TABLE <metric_view> RENAME TO ...` silently succeeds (renames as if a regular table), and `ALTER VIEW <metric_view> RENAME TO ...` throws `cannotAlterTableWithAlterViewError` (wrong category for a view- kind). Widen both arms to also accept `METRIC_VIEW`. Cloud-fan finding apache#2/apache#3 (StructField commit split-out): * Revert d55143f ("Public StructField.json / .fromJson + Column.fromStructField"). None of the three new public APIs are called by code in this PR -- they exist to support the downstream Unity Catalog connector. Per cloud-fan's request, split them out into a separate PR with its own JIRA ticket so they can be reviewed independently. Reverting the commit also removes the `Column.fromStructField` divergence from `structFieldToV2Column` (which would have duplicated the comment field in `metadataInJSON`); the split-out PR fixes that inline. The revert also fixes two CI failures introduced by the same commit: - `mimaReportBinaryIssues` losing `StructField.tupled` / `StructField.curried` / `AbstractFunction4` parent on the `StructField` companion object (sql-api 4.0.0 binary surface). - `Javaunidoc` failing on the `{@link StructField.json}` / `{@link StructField.fromJson}` references that javadoc rejects (member separator must be `#`, not `.`). Cloud-fan finding apache#4 (multi-level namespace lift): * `MetricViewHelper.analyzeMetricViewText` previously required a `TableIdentifier`, which can only carry up to 3 name parts. v2 metric views with multi-level-namespace targets (e.g. `cat.db1.db2.mv`) failed at `ident.asTableIdentifier` with `requiresSinglePartNamespaceError`. Lift the helper to take `nameParts: Seq[String]` directly. The synthetic `CatalogTable` used as analysis context still needs a TableIdentifier (it's a v1 type); we collapse the intermediate namespace components into the synthetic `database` slot via dot-join. Since the synthetic identifier is never used to resolve the view body itself, this collapse is observationally invisible to the analyzed plan, and `verifyTemporaryObjectsNotExists` still receives the full `nameParts` so error rendering preserves the multi-part form. Update both call sites: - `CreateMetricViewCommand.createMetricViewInSessionCatalog` (v1) passes `name.nameParts`. - `DataSourceV2Strategy` (v2) passes `(catalog.name() +: ident.namespace() :+ ident.name())`. Add a new test in `MetricViewV2CatalogSuite` that creates a metric view at `testcat.ns_a.ns_b.mv_deep` and asserts the create succeeds + the captured `ViewInfo` carries `PROP_TABLE_TYPE = METRIC_VIEW`. Test realignment to SPARK-56655's new contract: * SPARK-56655 (master) added several v2 view DDL behaviors that changed the error categories surfaced for cross-type collisions and added view-aware listing on `TableViewCatalog`. Realign 5 existing test assertions in `MetricViewV2CatalogSuite`: - "CREATE VIEW ... WITH METRICS over a v2 table at the ident": `EXPECT_VIEW_NOT_TABLE.NO_ALTERNATIVE` -> `TABLE_OR_VIEW_ ALREADY_EXISTS` (master's analyzer-time pre-check fires before the exec-time decoding in `CreateV2MetricViewExec.run`). - "DROP TABLE on a v2 metric view" + "DROP TABLE IF EXISTS on a v2 metric view": `EXPECT_TABLE_NOT_VIEW.NO_ALTERNATIVE` -> `WRONG_COMMAND_FOR_OBJECT_TYPE` ("Use DROP VIEW instead", raised by master's new `DropTableExec.run`). - "SHOW TABLES does not list v2 metric views" -> "SHOW TABLES on a v2 TableViewCatalog lists both tables and metric views": SPARK-56655 routes SHOW TABLES on a `TableViewCatalog` through `listRelationSummaries` so views appear alongside tables in the output (matches v1 SHOW TABLES on a session catalog). - DESCRIBE TABLE EXTENDED `View Text` row: trim before comparing to the pre-substitution YAML body (the row carries the raw text between the SQL `$$ ... $$` markers, including the surrounding `\n`). * Add an explicit `SHOW CREATE TABLE` rejection guard for v2 metric views in `DataSourceV2Strategy`. Master's new `ShowCreateV2ViewExec` would emit a plain `CREATE VIEW <ident> AS <yaml>`, which is not round-trippable as metric-view DDL (the right form is `CREATE VIEW <ident> WITH METRICS LANGUAGE YAML AS $$ <yaml> $$`). Reject up front with `unsupportedTableOperationError("SHOW CREATE TABLE")` rather than emit lossy DDL; the existing test already expects this exact behavior. Signed-off-by: chen.wang <chen.wang@databricks.com>
b8bd9cc to
5d89566
Compare
Add support for CREATE VIEW ... WITH METRICS and the corresponding V2 catalog round-trip on top of the ViewCatalog / RelationCatalog APIs introduced by SPARK-52729. The V2 metric-view create path now builds a ViewInfo and calls ViewCatalog.createView. Metric views are distinguished from plain views with PROP_TABLE_TYPE=METRIC_VIEW, which ViewInfo now preserves when callers intentionally set it. This avoids mutating ViewInfo properties after construction while keeping plain views defaulted to VIEW. ViewInfo gains a typed viewDependencies field so catalogs can persist structured table/function dependencies without flattening nested lineage into string properties. The metric-view planner collects direct table dependencies from the analyzed source plan and passes them through this field. Also add first-class CatalogTableType.METRIC_VIEW / TableSummary.METRIC_VIEW_TABLE_TYPE support, V2-to-V1 conversion for MetricView ViewInfo rows, drop-command parity for metric views, metric_view.* descriptive properties, and focused V1/V2 tests for dependency extraction, ViewInfo payloads, DROP VIEW routing, and user-specified column metadata preservation.
…eedback Address all 4 substantive concerns and 8 inline comments from cloud-fan's review on PR apache#55487. V2 path refactor (concern 1, inline 87/111/142): - New CreateV2MetricViewExec extends V2ViewPreparation (mirrors CreateV2ViewExec): viewExists short-circuit on IF NOT EXISTS, OR REPLACE via createOrReplaceView, and cross-type collision decoding via ViewAlreadyExistsException -> tableExists -> EXPECT_VIEW_NOT_TABLE. - DataSourceV2Strategy rule routes CreateMetricViewCommand on a non-session catalog to the new exec; CreateMetricViewCommand keeps only the v1 session-catalog path in run(). - V2ViewPreparation gains optional viewDependencies / tableType hooks the metric-view subclass populates; plain views leave them None. - Picks up CharVarcharUtils.getRawSchema, SchemaUtils.checkColumn- NameDuplication, SchemaUtils.checkIndeterminateCollationInSchema, and PROP_OWNER stamping for free via the shared trait. Structural TableDependency / FunctionDependency (concern 3, inline 30/169): - Replace single-string `tableFullName` with `String[] nameParts`; arity is preserved per source, unambiguous against quoted identifiers containing literal `.`. Apply same shape to FunctionDependency. - Dependency.table / .function factories take varargs. - MetricViewHelper.collectTableDependencies returns Seq[Seq[String]]; each match arm emits structural parts (V1 sources via TableIdentifier.nameParts; V2 sources via catalog +: namespace :+ name). Sealed Dependency + defensive DependencyList (inline 30 / 40): - `sealed interface Dependency permits TableDependency, FunctionDependency` enforces the documented "is one of" structurally. - DependencyList canonical constructor and accessor defensively clone the array so consumers can't mutate stored ViewInfo dependency state. ViewInfo constructor cleanup (inline 68): - Drop the metric-view-specific PROP_TABLE_TYPE branch in the generic constructor in favor of `properties().putIfAbsent(...)`. Callers that want a more specific kind (e.g. METRIC_VIEW) call BaseBuilder.withTableType(...) before build() -- exercised by CreateV2MetricViewExec via the new V2ViewPreparation tableType hook. VIEW-gate audit (concern 2, inline 1088): - CatalogTable.toJsonLinkedHashMap (interface.scala:670) accepts METRIC_VIEW so DESCRIBE TABLE EXTENDED still emits "View Text" / "View Original Text" / "View Schema Mode" / "View Catalog and Namespace" / "SQL Path" rows for metric views. - HiveExternalCatalog: 4 sites (createTable empty-schema fallback x2, alterTable VIEW path, restoreTableMetadata read-back) accept METRIC_VIEW so a Hive-metastore-backed session-catalog metric view round-trips. - Repo-wide: SessionCatalog.isView, InMemoryCatalog.listViews, RelationResolution.createDataSourceV1Scan (streaming-on-view rejection), Analyzer.lookupTableOrView (ResolvedPersistentView wrapping), rules.saveDataIntoView, DataStreamWriter.writeToV1Table, DescribeRelationJsonCommand.describePartitionInfoJson, AnalyzeColumnCommand, AnalyzePartitionCommand, CommandUtils.analyzeTable, tables.CreateTableLike provider lookup, tables.AlterTableAddColumnsCommand, tables.describeDetailedPartitionInfo, tables.ShowCreateTableCommand (3 sites) -- all extended to accept METRIC_VIEW. Real error class for malformed YAML (concern 4, inline 70): - New INVALID_METRIC_VIEW_YAML error condition (sqlState 42K0L) + QueryCompilationErrors.invalidMetricViewYamlError. Replaces SparkException.internalError so a typo in the user's YAML body surfaces as a user-correctable AnalysisException. Tests (inline 39): - CREATE OR REPLACE VIEW WITH METRICS replaces the existing view. - CREATE VIEW IF NOT EXISTS WITH METRICS is a no-op when the view exists. - CREATE VIEW WITH METRICS over a v2 table at the ident throws EXPECT_VIEW_NOT_TABLE.NO_ALTERNATIVE. - CREATE VIEW IF NOT EXISTS WITH METRICS is a no-op when a v2 table sits at the ident. - V1 session-catalog source dependency: nameParts arity 2-3. - Multi-level V2 namespace source dependency: nameParts arity N+2. - DESCRIBE TABLE EXTENDED on a v2 metric view -- read-back through loadRelation -> MetadataOnlyTable -> V1Table.toCatalogTable(ViewInfo). - Existing dep tests updated to assert nameParts instead of tableFullName. Misc (inline 178): - Scaladoc on MetricView.getProperties calls out that metric_view.from.sql / metric_view.where are truncated to Constants.MAXIMUM_PROPERTY_SIZE and are descriptive (not round-trippable) values; consumers should re-read the YAML body for full SQL.
…e#55487 CI fix: - Restore the missing `"MISSING_CATALOG_ABILITY.VIEWS") {` line in `MetricViewV2CatalogSuite.scala` whose absence broke the test file's parse (compileIncremental "four errors found" -- ')' expected, unmatched '}', unclosed multi-line string, '}' expected at EOF). Self-review feedback: - Shorten `viewDependencies` and `tableType` Scaladocs on `V2ViewPreparation` to one sentence each; drop the plain-view / metric-view distinctions. - Consolidate the duplicated `run()` body from `CreateV2ViewExec` and `CreateV2MetricViewExec` into a new `V2CreateViewPreparation` trait (extends `V2ViewPreparation`) that owns the shared CREATE-side flow: `viewExists` short-circuit, `OR REPLACE` via `createOrReplaceView`, and `ViewAlreadyExistsException` decoding for cross-type collisions. Both subclasses now extend `V2CreateViewPreparation` and inherit `run()` unchanged. The intermediate trait keeps `V2ViewPreparation` (also used by `AlterV2ViewExec`) free of CREATE-only fields. - Drop the defensive `case _: ResolvedIdentifier =>` branch in `CreateMetricViewCommand.run` -- the catch-all internal-error case is sufficient now that `DataSourceV2Strategy` reliably intercepts the non-session path. - `SHOW CREATE TABLE` on a metric view now throws the dedicated `UNSUPPORTED_SHOW_CREATE_TABLE.ON_METRIC_VIEW` (sqlState 0A000) instead of falling through to the VIEW branch. Reverts the three METRIC_VIEW additions in `tables.scala`'s VIEW gates from the previous round. Test reorg + new coverage in `MetricViewV2CatalogSuite`: - Reorganized into 5 sections: (1) Create-related, (2) Dependency extraction, (3) SELECT cases, (4) DESCRIBE cases, (5) DROP / SHOW. - Trimmed the `OR REPLACE` test's `!== firstYaml` assertion; replaced with positive assertions on the captured ViewInfo's queryText, metric_view.where, and viewDependencies (parts shape). - New `SELECT from a v2 metric view returns aggregated rows` -- exercises the `loadRelation` -> `MetadataOnlyTable(ViewInfo)` -> `V1Table.toCatalogTable(ViewInfo)` -> `ResolveMetricView` round-trip and asserts on the aggregated output rows. - New `DESCRIBE TABLE on a v2 metric view returns the aliased columns` (non-EXTENDED variant alongside the existing EXTENDED test). - New `SHOW TABLES does not list v2 metric views` (RelationCatalog contract: tables only). - New `SHOW VIEWS lists v2 metric views`.
… TABLE coverage Add the test cases requested on PR apache#55487: SELECT read-back (section 3) -- 4 new tests modeled on `MetricViewSuite` patterns: - Existing test fixed to use `measure(count_sum)` (the v1 suite shows the required syntax for measure columns) and switched to `checkAnswer` against an equivalent raw aggregation over the source. - `SELECT measure(...) with a WHERE clause on a dimension` -- exercises a filter at the query layer. - `SELECT against a v2 metric view honors the view's pre-defined where clause` -- creates the metric view with `where = Some("count > 1")`, asserts only the matching rows surface. - `SELECT from a v2 metric view supports multiple measures with different aggregations` -- adds price_sum / price_max alongside count_sum. - `SELECT from a v2 metric view supports ORDER BY and LIMIT on measures`. DROP TABLE on a v2 metric view (section 5) -- 2 new tests: - `DROP TABLE on a v2 metric view throws EXPECT_TABLE_NOT_VIEW` -- `DropTableExec`'s `tableExists` probe returns false (RelationCatalog passive filtering), the `viewExists` fallback returns true, so the exec emits `EXPECT_TABLE_NOT_VIEW.NO_ALTERNATIVE`. Also asserts the metric view is still present in the catalog after the failed DROP. - `DROP TABLE IF EXISTS on a v2 metric view also throws EXPECT_TABLE_NOT_VIEW` -- IF EXISTS does not silence the wrong-type error (v1 parity). SHOW CREATE TABLE on a v2 metric view (section 5) -- 1 new test: - `SHOW CREATE TABLE on a v2 metric view is unsupported` -- routes through `DataSourceV2Strategy`'s `ResolvedPersistentView` case and throws `UNSUPPORTED_FEATURE.TABLE_OPERATION` with operation "SHOW CREATE TABLE".
…ursion, scalastyle Address four CI failures introduced by the v2 metric-view changes on this PR: 1. `SparkOperation.tableTypeString` (hive-thriftserver) did not know how to map the new `CatalogTableType.METRIC_VIEW` to a Hive JDBC table type, causing `SparkMetadataOperationSuite` `GET_TABLE_TYPES` to fail with "Unknown table type is found: CatalogTableType(METRIC_VIEW)". Map `METRIC_VIEW` to `"VIEW"` on the JDBC wire (Hive only knows `TABLE` / `VIEW`). 2. `HiveClientImpl.toHiveTableType` (hive) had the same gap, breaking 19 `HiveMetricViewSuite` cases that go through HMS persistence on the V1 session-catalog path. Map `METRIC_VIEW` to `HiveTableType.VIRTUAL_VIEW`. 3. `MetricViewRecordingCatalog.loadRelation` in `MetricViewV2CatalogSuite` recursed infinitely. `class MetricViewRecordingCatalog extends InMemoryTableCatalog with RelationCatalog` linearizes `RelationCatalog` ahead of `InMemoryTableCatalog`, so `super.loadTable` resolved to `RelationCatalog.loadTable`, whose default delegates back to `loadRelation`. Qualify the super call to `super[InMemoryTableCatalog]` to dispatch directly to the table-catalog implementation. 4. Scalastyle import-order violation in `DataSourceV2Strategy.scala`: `metricview.serde.MetricViewFactory` was placed before `execution.datasources.*`. Reorder to satisfy the Scalastyle import-order rule. Signed-off-by: chen.wang <chen.wang@databricks.com>
…ble / loadTableOrView Adapt `MetricViewV2CatalogSuite` to the renames in SPARK-56655, which closed out the still-`Evolving` v2 view API surface introduced by SPARK-52729: - `RelationCatalog` -> `TableViewCatalog` - `MetadataOnlyTable` -> `MetadataTable` - `loadRelation` -> `loadTableOrView` - `ViewCatalog` gained a new abstract `renameView(oldIdent, newIdent)` Updates: - Imports: drop `MetadataOnlyTable`, `RelationCatalog`; add `MetadataTable`, `TableViewCatalog`. - `MetricViewRecordingCatalog`: mix in `TableViewCatalog` instead of `RelationCatalog`; rename the override of `loadRelation` -> `loadTableOrView`, and `MetadataOnlyTable` -> `MetadataTable`. The qualified `super[InMemoryTableCatalog].loadTable(ident)` recursion-bypass continues to work (the default `loadTable` derived from `TableViewCatalog` still delegates to `loadTableOrView`, hence the bypass is still required). - Add a `renameView(oldIdent, newIdent)` implementation: moves the `views` map entry, the `MetricViewRecordingCatalog.capturedViews` entry, and rejects via `ViewAlreadyExistsException` / `NoSuchViewException` per the `TableViewCatalog` contract. - Update doc comments and a few test names that mentioned the old API. Signed-off-by: chen.wang <chen.wang@databricks.com>
…-trip
Five fixes to recover the metric-view test suites after the latest rebase
on apache/spark master:
1. `MetricViewV2CatalogSuite.withTestCatalogTables`: a few negative tests
pre-create a regular table at `mv` to exercise cross-type collisions,
but the previous cleanup only did `DROP VIEW IF EXISTS mv`, leaving the
table behind to poison every subsequent test in the suite (cleanup
raised `WRONG_COMMAND_FOR_OBJECT_TYPE` because mv was a TABLE, not a
VIEW). Sweep `mv` as both view and table in cleanup, wrapped in `Try`
so the symmetric "wrong command for type" errors don't escape.
2. Test assertions on `info.queryText() === yaml`: the captured queryText
includes the `\n` between the SQL `$$` markers and the YAML body, but
`MetricViewFactory.toYAML(...)` doesn't emit surrounding newlines. Trim
both sides before the equality check (3 sites).
3. `MISSING_CATALOG_ABILITY` assertion: SPARK-56655 added the `.VIEWS`
subclass; the bare error condition no longer surfaces directly. Update
the assertion to expect `MISSING_CATALOG_ABILITY.VIEWS`.
4. `MetricViewRecordingCatalog.{createView, replaceView}`: the
`TableViewCatalog` active-rejection contract requires `createView` to
throw `ViewAlreadyExistsException` (and `replaceView` to throw
`NoSuchViewException`) when a *table* sits at the ident, not just when
a view does. The previous override only checked `views`, so the
pre-table-then-create-view test never raised the cross-type collision
`CreateV2MetricViewExec` is supposed to decode into
`EXPECT_VIEW_NOT_TABLE.NO_ALTERNATIVE`. Add a tables-only existence
probe (bypassing `TableViewCatalog.tableExists`'s default, which would
recurse through our own `loadTableOrView`) and reject up front.
5. `HiveExternalCatalog` HMS round-trip for metric views: `HiveTableType`
has no `METRIC_VIEW` variant -- both regular views and metric views
serialize as `VIRTUAL_VIEW`, and `HiveClientImpl.getTableOption` always
maps `VIRTUAL_VIEW` back to `CatalogTableType.VIEW`, dropping the
metric-view distinction on read. Result: `HiveMetricViewSuite` (which
reuses the same v1 `MetricViewSuite` against
`HiveExternalCatalog`) saw the YAML body parsed as SQL, raising
`PARSE_SYNTAX_ERROR` on every test.
Persist a `view.subType = METRIC_VIEW` property on write
(`tableMetaToTableProps`) and lift `tableType` back to `METRIC_VIEW`
on read (`restoreTableMetadata`) when the marker is present. Property
name follows the existing `view.*` convention so it round-trips
through HMS like other view metadata.
Also factored the `metric_view.*` column metadata preservation across
user-specified renames into a hookable `retainColumnMetadata: Boolean`
on `V2ViewPreparation` (defaults to `false`); `CreateV2MetricViewExec`
overrides it to `true`. Mirrors what
`ViewHelper.prepareTable(isMetricView = true)` already does on the v1
path, fixing the `key not found: metric_view.type` failure on the v2
"user-specified column names with comments preserve metric_view.*
metadata" test.
Signed-off-by: chen.wang <chen.wang@databricks.com>
…tructField Add three small public APIs that let v2 catalog connectors (such as the Unity Catalog Spark connector built on TableViewCatalog) round-trip a Spark StructField through external storage without reaching for private[sql] helpers or the singleton-StructType wrap workaround: - StructField.json / StructField.prettyJson: public counterparts of DataType.json / DataType.prettyJson, exposing the existing private[sql] jsonValue. - StructField.fromJson(String): companion-object parser that mirrors DataType.fromJson and is the inverse of StructField.json. - Column.fromStructField(StructField): static factory in the catalog Column interface that maps a Spark StructField (with metadata) into a connector Column (with metadataInJSON), symmetric to TableInfo.schema() which already goes the other way via CatalogV2Util.v2ColumnsToStructType. Widens DataType.parseStructField from private to private[sql] so the new StructField.fromJson companion can call it directly. Tests: - DataTypeSuite covers StructField round-trip via .json / .fromJson with metric_view-style metadata + comment, and an empty-metadata field. - CatalogV2UtilSuite covers Column.fromStructField for both the metadata-bearing and empty-metadata cases. Signed-off-by: chen.wang <chen.wang@databricks.com>
…pache#55487 Address the four new findings from cloud-fan's round 2 review of PR apache#55487 plus the corresponding CI fixes. Combines the revert of the StructField/Column public-API commit (per cloud-fan's split request) with the three remaining metric-view fixes into one commit. Cloud-fan finding #1 (`verifyAlterTableType` audit miss): * `ddl.scala` `verifyAlterTableType` (lines 1086-1103) was missed in the `tableType == VIEW` -> `VIEW || METRIC_VIEW` audit. Without this fix, `ALTER TABLE <metric_view> RENAME TO ...` silently succeeds (renames as if a regular table), and `ALTER VIEW <metric_view> RENAME TO ...` throws `cannotAlterTableWithAlterViewError` (wrong category for a view- kind). Widen both arms to also accept `METRIC_VIEW`. Cloud-fan finding apache#2/apache#3 (StructField commit split-out): * Revert d55143f ("Public StructField.json / .fromJson + Column.fromStructField"). None of the three new public APIs are called by code in this PR -- they exist to support the downstream Unity Catalog connector. Per cloud-fan's request, split them out into a separate PR with its own JIRA ticket so they can be reviewed independently. Reverting the commit also removes the `Column.fromStructField` divergence from `structFieldToV2Column` (which would have duplicated the comment field in `metadataInJSON`); the split-out PR fixes that inline. The revert also fixes two CI failures introduced by the same commit: - `mimaReportBinaryIssues` losing `StructField.tupled` / `StructField.curried` / `AbstractFunction4` parent on the `StructField` companion object (sql-api 4.0.0 binary surface). - `Javaunidoc` failing on the `{@link StructField.json}` / `{@link StructField.fromJson}` references that javadoc rejects (member separator must be `#`, not `.`). Cloud-fan finding apache#4 (multi-level namespace lift): * `MetricViewHelper.analyzeMetricViewText` previously required a `TableIdentifier`, which can only carry up to 3 name parts. v2 metric views with multi-level-namespace targets (e.g. `cat.db1.db2.mv`) failed at `ident.asTableIdentifier` with `requiresSinglePartNamespaceError`. Lift the helper to take `nameParts: Seq[String]` directly. The synthetic `CatalogTable` used as analysis context still needs a TableIdentifier (it's a v1 type); we collapse the intermediate namespace components into the synthetic `database` slot via dot-join. Since the synthetic identifier is never used to resolve the view body itself, this collapse is observationally invisible to the analyzed plan, and `verifyTemporaryObjectsNotExists` still receives the full `nameParts` so error rendering preserves the multi-part form. Update both call sites: - `CreateMetricViewCommand.createMetricViewInSessionCatalog` (v1) passes `name.nameParts`. - `DataSourceV2Strategy` (v2) passes `(catalog.name() +: ident.namespace() :+ ident.name())`. Add a new test in `MetricViewV2CatalogSuite` that creates a metric view at `testcat.ns_a.ns_b.mv_deep` and asserts the create succeeds + the captured `ViewInfo` carries `PROP_TABLE_TYPE = METRIC_VIEW`. Test realignment to SPARK-56655's new contract: * SPARK-56655 (master) added several v2 view DDL behaviors that changed the error categories surfaced for cross-type collisions and added view-aware listing on `TableViewCatalog`. Realign 5 existing test assertions in `MetricViewV2CatalogSuite`: - "CREATE VIEW ... WITH METRICS over a v2 table at the ident": `EXPECT_VIEW_NOT_TABLE.NO_ALTERNATIVE` -> `TABLE_OR_VIEW_ ALREADY_EXISTS` (master's analyzer-time pre-check fires before the exec-time decoding in `CreateV2MetricViewExec.run`). - "DROP TABLE on a v2 metric view" + "DROP TABLE IF EXISTS on a v2 metric view": `EXPECT_TABLE_NOT_VIEW.NO_ALTERNATIVE` -> `WRONG_COMMAND_FOR_OBJECT_TYPE` ("Use DROP VIEW instead", raised by master's new `DropTableExec.run`). - "SHOW TABLES does not list v2 metric views" -> "SHOW TABLES on a v2 TableViewCatalog lists both tables and metric views": SPARK-56655 routes SHOW TABLES on a `TableViewCatalog` through `listRelationSummaries` so views appear alongside tables in the output (matches v1 SHOW TABLES on a session catalog). - DESCRIBE TABLE EXTENDED `View Text` row: trim before comparing to the pre-substitution YAML body (the row carries the raw text between the SQL `$$ ... $$` markers, including the surrounding `\n`). * Add an explicit `SHOW CREATE TABLE` rejection guard for v2 metric views in `DataSourceV2Strategy`. Master's new `ShowCreateV2ViewExec` would emit a plain `CREATE VIEW <ident> AS <yaml>`, which is not round-trippable as metric-view DDL (the right form is `CREATE VIEW <ident> WITH METRICS LANGUAGE YAML AS $$ <yaml> $$`). Reject up front with `unsupportedTableOperationError("SHOW CREATE TABLE")` rather than emit lossy DDL; the existing test already expects this exact behavior. Signed-off-by: chen.wang <chen.wang@databricks.com>
5d89566 to
0b26f95
Compare
cloud-fan
left a comment
There was a problem hiding this comment.
Round 3 review.
Status: All 4 prior findings addressed (verifyAlterTableType audit miss, StructField/Column public-API split, Column.fromStructField comment-duplication via revert, multi-level namespace lift in analyzeMetricViewText), 0 remaining, 7 new (1 newly introduced, 6 late catches). The round 3 follow-up commit cleanly addresses the round 2 findings; remaining concerns center on v1/v2 inconsistencies (descriptive properties, error class, dependency-name arity) and API correctness of the new Dependency records.
I'll be honest about the 6 late catches: most have been latent since round 1 — the structural-form switch fixed the format dimension I flagged but not the arity dimension; the records I suggested have a value-semantics issue I missed; the metric_view.* merge has been on the v2 path since the original PR. Apologies for not surfacing these earlier.
| val analyzed = MetricViewHelper.analyzeMetricViewText( | ||
| session, nameParts, originalText) | ||
| val metricView = MetricViewFactory.fromYAML(originalText) | ||
| val mergedProps = properties ++ metricView.getProperties |
There was a problem hiding this comment.
The v2 path merges metric_view.from.type / metric_view.from.name / metric_view.from.sql / metric_view.where into the property bag here (properties ++ metricView.getProperties), but the v1 path in CreateMetricViewCommand.createMetricViewInSessionCatalog (metricViewCommands.scala:74-78) passes properties to prepareTable without the merge. Result: DESCRIBE TABLE EXTENDED on a session-catalog metric view doesn't show these descriptor rows; on a v2 metric view, it does.
The PR description states the merge happens "into the view's properties bag" without a v1/v2 carve-out — please confirm this is intentional, or merge in v1 too.
| // would emit a plain `CREATE VIEW <ident> AS <yaml>`, which is not a round-trippable | ||
| // metric-view DDL form (the right form is `CREATE VIEW <ident> WITH METRICS LANGUAGE | ||
| // YAML AS $$ <yaml> $$`). Reject up front rather than emit lossy DDL. | ||
| throw QueryCompilationErrors.unsupportedTableOperationError( |
There was a problem hiding this comment.
The v1 path uses the dedicated UNSUPPORTED_SHOW_CREATE_TABLE.ON_METRIC_VIEW (tables.scala:1219 → showCreateTableNotSupportedOnMetricViewError); this v2 path uses the generic UNSUPPORTED_FEATURE.TABLE_OPERATION. Same user action, two error classes — the dedicated one carries a more actionable message ("not supported on metric view"). showCreateTableNotSupportedOnMetricViewError takes a String; you can format the multipart name the same way the case below does ((catalog.name() +: ident.asMultipartIdentifier).map(quoteIfNeeded).mkString(".")).
| case r: HiveTableRelation => | ||
| tables += r.tableMeta.identifier.nameParts | ||
| case r: LogicalRelation if r.catalogTable.isDefined => | ||
| tables += r.catalogTable.get.identifier.nameParts |
There was a problem hiding this comment.
TableIdentifier.nameParts returns 1, 2, or 3 parts depending on which of catalog / database are set on the resolved CatalogTable.identifier. The same v1 source table can yield [db, tbl] or [spark_catalog, db, tbl] across runs — the test in MetricViewV2CatalogSuite (the V1 source dependency-extraction test) admits this with parts.length >= 2 && parts.length <= 3. The TableDependency Javadoc (TableDependency.java:32-33) acknowledges it, but downstream consumers comparing dependency names structurally won't be able to rely on stable arity. Same issue applies to the View arm (line 105) and HiveTableRelation arm (line 112).
Consider normalizing v1 sources to always emit 3-part [spark_catalog, db, tbl] (prepending the session-catalog name when missing) so consumers see a stable shape per source kind. The test should then assert exactly 3 parts.
| val nameParts = (catalog.name() +: ident.namespace().toIndexedSeq) :+ ident.name() | ||
| val analyzed = MetricViewHelper.analyzeMetricViewText( | ||
| session, nameParts, originalText) | ||
| val metricView = MetricViewFactory.fromYAML(originalText) |
There was a problem hiding this comment.
MetricViewHelper.analyzeMetricViewText (called on line 345) already parses the YAML internally via MetricViewPlanner.parseYAML → MetricViewFactory.fromYAML. Re-parsing here duplicates the work. analyzeMetricViewText could return (LogicalPlan, MetricView) (or expose the parsed MetricView via the MetricViewPlaceholder it builds) so the strategy doesn't pay the parse twice.
| } | ||
|
|
||
| case DropTable(r: ResolvedIdentifier, ifExists, purge) => | ||
| val tableCatalog = r.catalog.asTableCatalog |
There was a problem hiding this comment.
Extracting r.catalog.asTableCatalog into tableCatalog is a no-op refactor unrelated to metric-view support and is not mentioned in the PR description. Please revert or split into a separate cleanup PR — keeps the metric-view PR focused.
| * @since 4.2.0 | ||
| */ | ||
| @Evolving | ||
| public record DependencyList(Dependency[] dependencies) { |
There was a problem hiding this comment.
Java records auto-generate equals/hashCode from per-field Objects.equals/Objects.hashCode, which on array fields fall through to Object.equals (reference equality). So DependencyList.of(Dependency.table("a","b")).equals(DependencyList.of(Dependency.table("a","b"))) == false. Same issue on TableDependency.nameParts (TableDependency.java:42) and FunctionDependency.nameParts (FunctionDependency.java:34).
This undermines the "structural multi-part name" intent — consumers can't dedup, compare, or use these as Map keys. Please override equals/hashCode to use Arrays.equals/Arrays.hashCode on each record (and Arrays.deepEquals/Arrays.deepHashCode on DependencyList), or expose List<...> instead of [] to inherit value semantics for free.
| where = None, | ||
| select = metricViewColumns) | ||
| val yaml = MetricViewFactory.toYAML(metricView) | ||
| // Give both columns new names, and a comment on each. Without the `retainMetadata` |
There was a problem hiding this comment.
Two test comments describe behavior in terms of the in-PR change rather than the current invariant — they age poorly post-merge:
- Lines 253-254:
// ... Without theretainMetadatafix toViewHelper.aliasPlan, the metric_view.* keys disappear here.— once merged, there is no "fix" to refer to. Rephrase as a pin:// Pins aliasPlan(retainMetadata=true): metric_view.* keys must survive a column rename with comments. - Lines 749-752 (DESCRIBE EXTENDED test):
// The "Type" row was added alongside this metric-view PR so DescribeV2ViewExec matches v1 parity...— same problem. Rephrase to describe the current invariant being pinned (e.g.DescribeV2ViewExec emits a "Type" row matching v1 parity, so users can distinguish VIEW vs METRIC_VIEW.).
…pache#55487 - Revert the unrelated `tableCatalog` extraction in the `DropTable` strategy case (round 2 leftover). - Single YAML parse per CREATE: `MetricViewHelper.analyzeMetricViewText` now returns `(LogicalPlan, MetricView)`, grabbing the parsed `MetricView` off the un-analyzed `MetricViewPlaceholder` so callers don't re-parse. - v1/v2 parity for `metric_view.*` descriptor properties: lift the `metricView.getProperties` merge into the v1 `createMetricViewInSessionCatalog` path so v1 `DESCRIBE TABLE EXTENDED` surfaces the same descriptor rows as v2. - v1/v2 parity for `SHOW CREATE TABLE` error class: switch the v2 path to the dedicated `UNSUPPORTED_SHOW_CREATE_TABLE.ON_METRIC_VIEW` error class. - Normalize v1 source dependencies to a stable 3-part `[spark_catalog, db, table]` shape via a new `qualifyV1` helper, applied in the `View` / `HiveTableRelation` / `LogicalRelation` arms of `collectTableDependencies`. Updates `TableDependency` Javadoc and the V1 source dep test accordingly. - Switch `Dependency` / `TableDependency` / `FunctionDependency` / `DependencyList` from `String[]` / `Dependency[]` to immutable `List<>` so records inherit value `equals` / `hashCode` for free (closes the reference-equality bug on array fields). Internal copies use `List.copyOf`; static factories still take varargs. - Rephrase 2 test comments in `MetricViewV2CatalogSuite` to describe the pinned invariant rather than the in-PR fix (so they age well post-merge). Signed-off-by: chen.wang <chen.wang@databricks.com>
Two CI fixes for the previous round-3 review commit: - Scalastyle: replace the multi-space alignment before `=>` in `MetricViewHelper.qualifyV1`'s pattern-match arms with a single space (Scalastyle's `NoWhitespaceBeforeLeftBracketChecker` flags lined-up `=>` arrows). - Test: update `MetricViewV2CatalogSuite`'s SHOW CREATE TABLE assertion to expect `UNSUPPORTED_SHOW_CREATE_TABLE.ON_METRIC_VIEW` (the dedicated error class introduced in the round-3 commit) instead of the previously-used generic `UNSUPPORTED_FEATURE.TABLE_OPERATION`. Other CI job failures on the prior run are unrelated infrastructure flakes (Kerberos PREAUTH on Docker integration tests, minikube termination on Kubernetes IT, RemoteClassLoaderError on Connect's `SparkSessionE2ESuite`). Signed-off-by: chen.wang <chen.wang@databricks.com>
cloud-fan
left a comment
There was a problem hiding this comment.
Round 4 review.
Status: All 7 prior findings addressed (SHOW CREATE TABLE error class parity, metric_view.* v1/v2 parity, Dependency records value-equality via List<>, v1 source dep normalization through qualifyV1, single YAML parse, DropTable revert, test comment phrasing), 0 remaining, 6 new (3 newly introduced in 437b71d79e, 3 late catches I missed earlier). The round-3 follow-up cleanly closes the round-3 findings; remaining concerns are mostly small (text-quality, test gap) plus a few design-level questions worth answering inline before merge.
Apologies again for the late catches -- the FunctionDependency unused-API and the cyclic-view-check skip have been latent since round 1, and the DBR terminology leak has been there since the original commit.
Followup-able (not blocking, worth a tracking ticket):
ALTER VIEW <metric_view> AS SELECT ...is silently accepted byAlterV2ViewExec--existingView.propertiescarryPROP_TABLE_TYPE = METRIC_VIEWthrough, so the view ends up classified asMETRIC_VIEWwith a plain-SQL body; the nextSELECTblows up withINVALID_METRIC_VIEW_YAML. Same hole exists on the v1 path (AlterViewAsCommand.alterPermanentViewpreservestableTypeviaviewMeta.copy). Not introduced by this PR, but promotingMETRIC_VIEWto a first-class type makes the inconsistency more visible.viewDependenciesis only populated on the non-session v2 path -- v1 metric views (and v2 metric views read back throughV1Table.toCatalogTable) carrynull. Probably the right scope cap for this PR, but worth a one-line note in the description so catalog plugin developers know not to rely on it for the v1 surface.- No test for
ALTER VIEW <metric_view> RENAME TO ...even thoughRenameV2ViewExecis in the strategy and the fixture'srenameViewis implemented. Coverage gap.
| val deps = if (depParts.nonEmpty) { | ||
| Some(DependencyList.of( | ||
| depParts.map(parts => Dependency.table(parts: _*)): _*)) | ||
| } else None |
There was a problem hiding this comment.
The empty-depParts case maps to None here, which becomes null on ViewInfo.viewDependencies(). Per the Javadoc on ViewInfo#viewDependencies, null means "no dependency list was supplied" while an empty list means "provided but the object has none." For a metric view we always compute deps, so when collectTableDependencies returns empty (e.g. SQLSource("SELECT 1 AS x")) the right value is Some(DependencyList.of()), not None.
| val deps = if (depParts.nonEmpty) { | |
| Some(DependencyList.of( | |
| depParts.map(parts => Dependency.table(parts: _*)): _*)) | |
| } else None | |
| val deps = Some(DependencyList.of( | |
| depParts.map(parts => Dependency.table(parts: _*)): _*)) |
| * @since 4.2.0 | ||
| */ | ||
| @Evolving | ||
| public sealed interface Dependency permits TableDependency, FunctionDependency { |
There was a problem hiding this comment.
FunctionDependency is in the sealed permits list and exposed via the Dependency.function(...) factory below, but no producer in this PR ever emits one -- MetricViewHelper.collectTableDependencies only emits TableDependency. Two options: (a) drop FunctionDependency until it has a producer (the @Evolving annotation is meant to evolve before stabilizing, so adding it later is cheap); (b) keep it as groundwork but mention in the PR description so reviewers don't trip on the dead surface, and add a sentence to this class-level Javadoc noting that consumers may receive only TableDependency instances today.
| * with single-level namespaces the parts are {@code [catalog, schema, table]}; for a catalog | ||
| * with multi-level namespaces (e.g. Iceberg with {@code db1.db2}) the parts are | ||
| * {@code [catalog, db1, db2, ..., table]}; for v1 sources resolved through the session | ||
| * catalog producers should normalize to {@code [spark_catalog, db, table]} so consumers see |
There was a problem hiding this comment.
Missing comma -- as written, "session catalog producers" reads as a compound noun.
| * catalog producers should normalize to {@code [spark_catalog, db, table]} so consumers see | |
| * catalog, producers should normalize to {@code [spark_catalog, db, table]} so consumers see |
| ViewHelper.verifyTemporaryObjectsNotExists( | ||
| isTemporary = false, name.nameParts, analyzed, Seq.empty) | ||
| analyzed | ||
| isTemporary = false, nameParts, analyzed, Seq.empty) |
There was a problem hiding this comment.
analyzeMetricViewText runs verifyTemporaryObjectsNotExists but skips the cyclic-view-reference check that plain CREATE VIEW gets via CheckViewReferences.checkCyclicViewReference (only applied on replace). For CREATE OR REPLACE VIEW <mv> WITH METRICS LANGUAGE YAML AS $$ from: <mv> ... $$ the cycle currently surfaces only on the next SELECT. Pre-existing on the v1 path too (the old createMetricViewInSessionCatalog didn't call it either), so this is a follow-up rather than a blocker for this PR -- but wiring it in here means both v1 and v2 metric-view CREATE paths benefit from one fix.
| val info = capturedViewInfo() | ||
| val props = info.properties() | ||
|
|
||
| // metric_view.* descriptive properties (mirrors DBR SingleSourceMetricView). |
There was a problem hiding this comment.
"DBR" is internal Databricks-Runtime terminology and shouldn't appear in OSS comments.
| // metric_view.* descriptive properties (mirrors DBR SingleSourceMetricView). | |
| // metric_view.* descriptive properties (mirrors the canonical metric-view property layout). |
| // emits a "Type" row matching v1 `CatalogTable.toJsonLinkedHashMap` parity, so users | ||
| // can distinguish a plain VIEW from a sub-kind like METRIC_VIEW. |
There was a problem hiding this comment.
Sentence is broken: "Pins DescribeV2ViewExec emits a "Type" row matching v1 CatalogTable.toJsonLinkedHashMap parity ..." reads as if Pins is a verb taking DescribeV2ViewExec as its direct object, and "matching X parity" is awkward.
| // emits a "Type" row matching v1 `CatalogTable.toJsonLinkedHashMap` parity, so users | |
| // can distinguish a plain VIEW from a sub-kind like METRIC_VIEW. | |
| // "View Current Catalog" / "View Schema Mode" / etc. rows. Pins that `DescribeV2ViewExec` | |
| // emits a "Type" row for parity with v1 `CatalogTable.toJsonLinkedHashMap`, so users |
…pache#55487 - DataSourceV2Strategy: empty dep list -> Some(DependencyList.of()), not None, so ViewInfo.viewDependencies semantics match its Javadoc ("null = not supplied", "empty = supplied but none"). Metric-view CREATE always *computes* deps, so the right empty representation is Some(empty list). - Dependency.java: add Javadoc note that today's producers emit only TableDependency; FunctionDependency / Dependency.function(...) are groundwork for future producers (e.g. SQL UDF dep tracking). - TableDependency.java: insert missing comma after "session catalog" in the parts-form Javadoc. - MetricViewV2CatalogSuite: drop "DBR" terminology, fix the broken "Pins ..." sentence in the DESCRIBE EXTENDED test, and add an ALTER VIEW <metric_view> RENAME TO test that pins the existing RenameV2ViewExec wiring (catalog renameView relocates both the views entry and the captured ViewInfo entry; metric-view kind survives the rename). Signed-off-by: chen.wang <chen.wang@databricks.com>
cloud-fan
left a comment
There was a problem hiding this comment.
Round 5 review.
Status: All 6 prior round-4 findings addressed (R4 inline: Some(DependencyList.of()) not None; FunctionDependency unused-API Javadoc note; TableDependency missing-comma typo; "DBR" terminology leak; broken sentence at line 755 -- plus R4 follow-up #3: ALTER VIEW ... RENAME TO test). 1 remaining (R4 inline #4: analyzeMetricViewText still skips CheckViewReferences.checkCyclicViewReference -- accepted as a follow-up rather than a blocker in round 4). 1 new (newly introduced in e980f24427): a small citation-precision nit on the new DataSourceV2Strategy comment (inline below).
The round-5 commit is small and surgical; nothing else from prior rounds regressed. Approving -- the inline below is a nit, not blocking.
…rces/v2/DataSourceV2Strategy.scala Co-authored-by: Wenchen Fan <cloud0fan@gmail.com>
| * the first element is typically the catalog name and subsequent elements are namespace | ||
| * components followed by the table name. | ||
| */ | ||
| static TableDependency table(String... nameParts) { |
There was a problem hiding this comment.
a late catch. This API is implemented by users and called by Spark. Var-args API is not recommended here as it requires Spark to do parts: _* which is extra work.
There was a problem hiding this comment.
String[] should be better.
… isViewLike Address cloud-fan late catch on PR apache#55487 (apache#55487 (comment)) plus port DBR's `CatalogTable.isViewLike` helper to OSS to reduce divergence. - Dependency.table / Dependency.function / DependencyList.of: switch from `String... nameParts` / `Dependency... dependencies` varargs to `List<String>` / `List<Dependency>` parameters. Eliminates the `: _*` splat and transient `String[]` / `Dependency[]` array allocation on every call. Records still hold immutable List internally via `List.copyOf` in the canonical constructor, so wire / value semantics are unchanged. - DataSourceV2Strategy: update the sole producer call site to construct via `.asJava` instead of `: _*`, with an explicit `Seq[Dependency]` ascription so the Java invariance + Scala covariance gap is bridged before `.asJava`. - CatalogTable.isViewLike: new instance method (mirrors DBR signature exactly so the body is the only OSS/DBR diff: today VIEW or METRIC_VIEW; DBR also includes MATERIALIZED_VIEW and STREAMING_TABLE), plus a companion `CatalogTable.isViewLike(t: CatalogTableType)` for callers that only have a `CatalogTableType` (`SessionCatalog.isView`, `ddl.scala::verifyAlterTableType`, drop-table type check, and `HiveClientImpl.toHiveTableType`). - Replace all 17 inline `tableType == VIEW || tableType == METRIC_VIEW` disjunctions and 1 `CatalogTableType.VIEW | CatalogTableType.METRIC_VIEW` pattern-alternation across catalyst / core / hive with the new helpers. Single source of truth -- adding new view-like types in the future is a one-line change in `isViewLike`. - Drop now-unused `CatalogTableType` imports from 6 files where the only reference was the just-removed disjunction. Signed-off-by: chen.wang <chen.wang@databricks.com>
…d-fan Per apache#55487 review feedback, flip the Dependency / DependencyList records back to array-shape fields to match the dominant DSv2 convention (Identifier.namespace, Column.children, etc.) while preserving the value semantics that the round-3 List<> switch was introduced to fix. - TableDependency.nameParts: List<String> -> String[] - FunctionDependency.nameParts: List<String> -> String[] - DependencyList.dependencies: List<Dependency> -> Dependency[] - Each record gains explicit equals / hashCode / toString overrides using Arrays.equals / Arrays.hashCode / Arrays.toString so structural names still compare value-wise. Without the overrides, records' auto-generated methods on array fields fall through to Object.equals (reference equality). - Each record gains a defensive accessor override returning a clone, so callers cannot mutate the record's internal array. - Static factories Dependency.table / Dependency.function: switch from List<String> to String[] (non-vararg, satisfying the same R3189677004 intent -- no `: _*` splat). - DependencyList.of: switch from List<Dependency> to Dependency[]. - DataSourceV2Strategy producer: build via .toArray instead of .asJava; drop the now-unused CollectionConverters import. - MetricViewV2CatalogSuite: array-shape access (.length / (0) / .toSeq) instead of .size / .get(0) / .asScala.toSeq. Also folds in three pre-existing CI failures from `90deee1949c`: - Dependency.java Javadoc: `{@link #function(String...)}` -> `{@link #function(String[])}` so `Run / Documentation generation` resolves the new signature. - MetricViewV2CatalogSuite ALTER VIEW RENAME TO test: drop the catalog component from the new name (`ns.mv_renamed`, not `testcat.ns.mv_renamed`) per upstream `DataSourceV2SQLSuite` convention, so `newName.asIdentifier` doesn't leak the catalog into the namespace and the fixture's relocated entry can be found by loadTableOrView. - PlanResolutionSuite `alter view: alter view properties`: stub the new `isViewLike` method on the Mockito CatalogTable mock at line 141 so Analyzer.scala:1229's call to `v1Table.v1Table.isViewLike` returns the correct value computed from the stubbed `tableType`. Without the stub Mockito returns false (default for unstubbed Boolean methods), the view fixture falls through to the table branch, and the test fails with `EXPECT_VIEW_NOT_TABLE.USE_ALTER_TABLE`. Signed-off-by: chen.wang <chen.wang@databricks.com>
There was a problem hiding this comment.
Round 6 review.
Status: 2 addressed (R5 nit on DataSourceV2Strategy comment citation, fixed in fd3d8b2; R5 late catch on Dependency var-args API, fixed in f2251d8). 1 remaining (analyzeMetricViewText cyclic-view-reference follow-up — still scoped out per round 4). 4 new (3 newly introduced — DBR leak in new isViewLike Javadoc, misleading "Body kept in sync" line, null-element rejection regression on the new array records; 1 late catch — single-VIEW audit gap at views.scala).
(GitHub's inline-comment endpoint is returning a transient 422 "internal error" for this PR, so the per-line findings are inlined below as a temporary workaround. I'll re-post them properly once the API recovers.)
1. DBR terminology leak — interface.scala:466 (newly introduced in 90deee1). "DBR" is internal Databricks-Runtime terminology and shouldn't appear in OSS comments — round 4's inline #2 explicitly flagged this for the test comment, round 5 confirmed it was fixed, but porting the helper from the internal fork brought the term back. Drop the parenthetical or rephrase generically:
- * METRIC_VIEW. Forks may extend this set (e.g. DBR also includes MATERIALIZED_VIEW and
- * STREAMING_TABLE), so call sites that need a uniform "is this view-like?" check should
+ * METRIC_VIEW. Forks may extend this set with additional view-like types, so call sites
+ * that need a uniform "is this view-like?" check should2. Misleading "Body kept in sync" line — interface.scala:773-774 (newly introduced in 90deee1). There's only one body to keep in sync because def isViewLike: Boolean = CatalogTable.isViewLike(tableType) (line 470) delegates to this companion form. The note reads as if there are two parallel bodies a future maintainer needs to update together. Drop the line:
- * patterns or [[org.apache.spark.sql.catalyst.catalog.SessionCatalog.isView]]). Body kept
- * in sync with the instance method.
+ * patterns or [[org.apache.spark.sql.catalyst.catalog.SessionCatalog.isView]]).3. Null-element rejection regression — TableDependency.java:55, FunctionDependency.java:47, DependencyList.java:50 (newly introduced in f2251d8). Subtle regression from the List<String> → String[] switch: the previous List.copyOf(nameParts) rejected null elements with NPE at construction time (immutable-list invariant). nameParts.clone() does not, so Dependency.table(new String[]{null, "tbl"}) now silently constructs a TableDependency whose nameParts()[0] is null, surfacing as an NPE later when a consumer iterates parts. Same pattern in all three records.
Spark's own producer in MetricViewHelper.collectTableDependencies never emits nulls (parts come from analyzed-plan identifiers), but Dependency / DependencyList is @Evolving SPI exposed to third-party catalogs. If you want to preserve the strictness, add a per-element non-null check after the clone in each compact constructor; if relaxing it is fine, document the precondition on the nameParts / dependencies Javadoc. Not blocking either way.
4. views.scala:175 audit miss — single-VIEW audit gap (late catch). The disjunction-style tableType == VIEW || tableType == METRIC_VIEW was migrated to isViewLike in 14 sites, but CreateViewCommand.runCommand at views.scala:175 still uses the single-VIEW form tableMetadata.tableType != CatalogTableType.VIEW. Pre-PR, a metric view had tableType == VIEW so this branch did not fire on cross-kind collision; post-PR, a metric view has tableType == METRIC_VIEW, so CREATE VIEW <existing_metric_view> (no IF NOT EXISTS) now throws unsupportedCreateOrReplaceViewOnTableError ("… on a table") instead of pre-PR viewAlreadyExistsError. The error message no longer matches reality (the existing object is a view, just a metric one). Either migrate to !isViewLike to recover pre-PR behavior, or keep the strict separation but adjust the error class. Related lower-stakes consistency gap: createDataSourceTables.scala:53,154 (assert(table.tableType != CatalogTableType.VIEW)) silently accepts METRIC_VIEW input where the audit's intent reads as "no view-like types here."
…pache#55487 Address 4 findings from apache#55487 (review): - Drop "DBR" terminology leak in CatalogTable.isViewLike Javadoc (interface.scala:466) -- newly introduced in round 5; matches the resolution of round 4's DBR-mention finding on test comments. - Drop misleading "Body kept in sync with the instance method" line in the CatalogTable.isViewLike(t: CatalogTableType) companion Javadoc (interface.scala:773-774) -- there's only one body since the instance method delegates to the companion form. - Document the non-null-element precondition on TableDependency, FunctionDependency, and DependencyList @param Javadocs. Round-5b's switch from List.copyOf back to Array.clone() lost the implicit null-element rejection that the immutable-list invariant gave. Spark's own producer never emits nulls, but the SPI is @evolving and exposed to third-party catalogs; documenting the precondition keeps the API consistent with other DSv2 array-shape SPIs (Identifier.namespace, Column.children) which also trust callers. - views.scala:175 (CreateViewCommand.runCommand) audit miss: change `tableMetadata.tableType != CatalogTableType.VIEW` to `!tableMetadata.isViewLike` so `CREATE VIEW <existing_metric_view>` surfaces the correct viewAlreadyExistsError instead of the misclassifying unsupportedCreateOrReplaceViewOnTableError ("...on a table"). Pre-PR this branch didn't fire on metric views (they had tableType == VIEW); post-PR they have tableType == METRIC_VIEW, so the strict comparison incorrectly classified them as tables. Also widen the two assertions in createDataSourceTables.scala (`assert(table.tableType != CatalogTableType.VIEW)`) to `assert(!table.isViewLike)` per the same pattern -- these are invariant sanity checks whose intent is "no view-like input here". Signed-off-by: chen.wang <chen.wang@databricks.com>
cloud-fan
left a comment
There was a problem hiding this comment.
Round 7 review.
Status: 4 addressed (all R6 findings: DBR terminology drop in interface.scala:466, "Body kept in sync" line in interface.scala:773-774, null-element precondition documented on the three array-record @param Javadocs, views.scala:175 audit miss + bonus widening of the two assertions in createDataSourceTables.scala). 0 remaining. 2 new -- both late catches of mine: same isViewLike audit-miss pattern at two more sites in HiveExternalCatalog.scala (605 and 854) that I should have caught alongside views.scala:175 in round 6.
(GitHub's inline-comment endpoint was returning a transient 422 last round; trying inline again this round.)
Co-authored-by: Wenchen Fan <cloud0fan@gmail.com>
|
thanks, merging to master/4.x/4.2 (to finalize the ds v2 view api before release) |
### What changes were proposed in this pull request? This PR extends metric-view support to **DS v2 catalogs** by routing `CREATE VIEW ... WITH METRICS` through the `ViewCatalog` / `TableViewCatalog` APIs introduced by [SPARK-52729](#51419) and finalized by [SPARK-56655](https://github.com/apache/spark/pull/55954). Third-party v2 catalogs that implement `ViewCatalog` can now host metric views with the same metadata fidelity as session-catalog metric views. **1. V2 metric-view CREATE path -- shared with `CreateV2ViewExec`.** A new `CreateV2MetricViewExec` and `CreateV2ViewExec` both extend a new `V2CreateViewPreparation` trait (which itself extends `V2ViewPreparation`). The trait owns the shared CREATE-side `run()`: `viewExists` short-circuit on `IF NOT EXISTS`, `createOrReplaceView` for `OR REPLACE`, and cross-type collision decoding (`ViewAlreadyExistsException` -> `tableExists` -> `EXPECT_VIEW_NOT_TABLE.NO_ALTERNATIVE`). The metric-view subclass only supplies the metric-view-specific bits (no collation, schema-mode `UNSUPPORTED`, typed `viewDependencies`, `PROP_TABLE_TYPE = METRIC_VIEW`, `retainColumnMetadata = true`) via optional hooks on `V2ViewPreparation`. `DataSourceV2Strategy` intercepts `CreateMetricViewCommand` on a non-session catalog and routes to the new exec; the v1 session-catalog path stays in `CreateMetricViewCommand.run`. **2. First-class `METRIC_VIEW` table type.** - `CatalogTableType.METRIC_VIEW` is added alongside `EXTERNAL` / `MANAGED` / `VIEW`. - `TableSummary.METRIC_VIEW_TABLE_TYPE = "METRIC_VIEW"` constant for the V2 surface. - The previous `view.viewWithMetrics` property hack is removed; `CatalogTable.isMetricView` checks `tableType == METRIC_VIEW` directly. - `V1Table.summarizeTableType` and `V1Table.toCatalogTable(catalog, ident, ViewInfo)` translate between the V2 property form and the V1 enum. - HMS round-trip support: `HiveTableType` has no `METRIC_VIEW` variant (both regular views and metric views serialize as `VIRTUAL_VIEW`). `HiveExternalCatalog` now persists a `view.subType = METRIC_VIEW` property on write and lifts `tableType` back to `METRIC_VIEW` on read, so HMS-backed metric views survive the round trip. **3. Repo-wide `tableType == VIEW` audit + `CatalogTable.isViewLike` helper.** Promoting metric views to a distinct `CatalogTableType` opens silent regressions wherever existing code branches on `VIEW`. To consolidate the audit and reduce divergence with the Databricks Runtime (which has the same helper), this PR introduces: - `CatalogTable.isViewLike` instance method (DBR parity: today returns `tableType == VIEW || tableType == METRIC_VIEW`; forks may extend the set). - `CatalogTable.isViewLike(t: CatalogTableType)` companion form for the few sites that have a `CatalogTableType` but no `CatalogTable` (e.g. `SessionCatalog.isView`, `verifyAlterTableType`, `HiveClientImpl.toHiveTableType`). All 18 sites in `catalyst` / `core` / `hive` that previously did inline `tableType == VIEW || tableType == METRIC_VIEW` (or the `CatalogTableType.VIEW | CatalogTableType.METRIC_VIEW` pattern alternation) are now routed through these helpers, so adding a new view-like type in the future is a one-line change in the helper body. Notable touched call sites: `CatalogTable.toJsonLinkedHashMap` (DESCRIBE EXTENDED rows), `HiveExternalCatalog.{createTable, alterTable, restoreTableMetadata}`, `HiveClientImpl.toHiveTableType`, `SessionCatalog.isView`, `InMemoryCatalog.listViews`, `RelationResolution`, `Analyzer.lookupTableOrView`, `rules.scala`, `DataStreamWriter`, `DescribeRelationJsonCommand`, `AnalyzeColumnCommand`, `AnalyzePartitionCommand`, `CommandUtils.analyzeTable`, `V2SessionCatalog.dropTableInternal`, `verifyAlterTableType` in `ddl.scala`, and 3 sites in `tables.scala`. **Explicit rejection (uniform error class):** `SHOW CREATE TABLE` on a metric view has no round-trippable `CREATE VIEW ... WITH METRICS` form, so it's rejected explicitly with the dedicated `UNSUPPORTED_SHOW_CREATE_TABLE.ON_METRIC_VIEW` error class on **both** the v1 session-catalog path (in `tables.scala`) and the v2 catalog path (in `DataSourceV2Strategy`), so users see the same actionable message regardless of catalog kind. **4. Drop-command parity.** - `DropTableCommand` (v1 path) treats both `VIEW` and `METRIC_VIEW` as views: `DROP TABLE` rejects either with `wrongCommandForObjectTypeError`, and `DROP VIEW` accepts either. - `V2SessionCatalog.dropTableInternal` extends the existing "view rejected from `DROP TABLE`" guard to cover `METRIC_VIEW`. - For non-session v2: `DropTableExec` (post-SPARK-56655) actively rejects with `WRONG_COMMAND_FOR_OBJECT_TYPE` ("Use DROP VIEW instead") when a view sits at the ident -- works unchanged for metric views since `TableViewCatalog`'s default `viewExists` derives from `loadTableOrView` and recognizes `MetadataTable + ViewInfo`. - `ResolveSessionCatalog`'s `DropView` routing comment is clarified: v2 metric views fall through to `DataSourceV2Strategy` and `ViewCatalog.dropView`. **5. Typed view dependencies (`ViewInfo.viewDependencies`).** - New public DTOs in `org.apache.spark.sql.connector.catalog`: `Dependency` (sealed interface with `Dependency.table(String[])` / `Dependency.function(String[])` non-vararg factories), `TableDependency`, `FunctionDependency`, `DependencyList(Dependency[])`. - `TableDependency` and `FunctionDependency` carry the dependency identifier as **structural multi-part name parts** (`record TableDependency(String[] nameParts)`), not a single dot-flattened string. Arity is preserved per source so multi-level-namespace V2 catalogs (e.g. Iceberg `cat.db1.db2.tbl` -> 4 parts) round-trip without ambiguity against quoted identifiers containing literal `.`. v1 sources resolved through the session catalog are normalized by a new `MetricViewHelper.qualifyV1` to a stable 3-part `[spark_catalog, db, table]` shape so consumers see deterministic arity per source kind (otherwise `TableIdentifier.nameParts` could return 1, 2, or 3 parts depending on what the analyzer captured). - All three records (`TableDependency`, `FunctionDependency`, `DependencyList`) override `equals` / `hashCode` / `toString` using `Arrays.equals` / `Arrays.hashCode` / `Arrays.toString` to give value semantics on their array fields. Without the overrides, records' auto-generated methods on array fields fall through to `Object.equals` (reference equality), which would make structural multi-part names unusable as Map keys / for dedup. Each record also overrides the canonical accessor to return a defensive `clone()` so callers cannot mutate the record's internal array. - `ViewInfo` gains a `viewDependencies` field and a `ViewInfo.Builder.withViewDependencies(...)` setter. Per the field's contract, `null` means "no dependency list was supplied" while an empty `DependencyList.of(new Dependency[0])` means "supplied but the object has none" -- metric-view CREATE always emits the latter, never the former, even when `collectTableDependencies` returns empty. - `MetricViewHelper.collectTableDependencies` walks the analyzed plan and emits structural `Seq[Seq[String]]` parts; the v2 source arm preserves full namespace arity, the v1 source arms (`View`, `HiveTableRelation`, `LogicalRelation`) all route through `qualifyV1` for the stable 3-part shape. **6. Multi-level-namespace targets for v2 metric views.** `MetricViewHelper.analyzeMetricViewText` previously required a `TableIdentifier`, capping the metric-view target at 3 name parts. v2 metric views with multi-level-namespace targets (e.g. `cat.db1.db2.mv`) failed at `ident.asTableIdentifier` with `requiresSinglePartNamespaceError`. The helper now takes `nameParts: Seq[String]` directly; call sites in both the v1 path (`CreateMetricViewCommand`) and the v2 path (`DataSourceV2Strategy`) updated. The helper now also returns `(LogicalPlan, MetricView)` so callers don't have to re-parse the YAML body just to read descriptor properties. **7. `metric_view.*` descriptor properties (v1/v2 parity).** `MetricView.getProperties` produces canonical descriptive properties (`metric_view.from.type`, `metric_view.from.name` / `metric_view.from.sql`, `metric_view.where`) that **both** the v1 path (`CreateMetricViewCommand.createMetricViewInSessionCatalog`) and the v2 path (`DataSourceV2Strategy`) merge into the view's properties bag, so catalog browsers and tooling see the same descriptor rows in `DESCRIBE TABLE EXTENDED` regardless of catalog kind. Long values are truncated to `Constants.MAXIMUM_PROPERTY_SIZE`; the Scaladoc on `getProperties` calls out that `metric_view.from.sql` is therefore a descriptive value, not a round-trippable representation -- consumers should re-read the YAML body for the full SQL. **8. `ViewInfo` constructor cleanup.** The metric-view-specific `PROP_TABLE_TYPE = METRIC_VIEW` special case is dropped from the generic `ViewInfo` constructor in favor of `properties().putIfAbsent(...)`. Callers that want a more specific kind (e.g. `METRIC_VIEW`) call `BaseBuilder.withTableType(...)` before `build()` -- exercised by `CreateV2MetricViewExec` via the new `V2ViewPreparation.tableType` hook. **9. `ViewHelper.aliasPlan(retainMetadata)`.** The user-specified-column-with-comment branch in `aliasPlan` previously dropped existing column metadata. A new `retainMetadata: Boolean = false` parameter merges the analyzed attribute's metadata into the new comment metadata. `ViewHelper.prepareTable` passes `retainMetadata = isMetricView` (v1 path); `V2ViewPreparation` exposes a `retainColumnMetadata` hook that `CreateV2MetricViewExec` overrides to `true` (v2 path). Both preserve the per-column `metric_view.type` / `metric_view.expr` keys that the analyzer attaches to dimensions and measures even when the user renames columns and adds comments. **10. Error classes.** - New `INVALID_METRIC_VIEW_YAML` (sqlState 42K0L). `MetricViewPlanner.parseYAML`'s catch blocks now route through `QueryCompilationErrors.invalidMetricViewYamlError` instead of `SparkException.internalError`, so a typo in the user's YAML body surfaces as a user-correctable `AnalysisException` rather than "please contact support". - New `UNSUPPORTED_SHOW_CREATE_TABLE.ON_METRIC_VIEW` (sqlState 0A000), used by both the v1 session-catalog path and the v2 catalog path so `SHOW CREATE TABLE` on a metric view produces the same actionable message regardless of catalog kind. **11. Misc.** - `MetricViewCanonical.parseSource` accepts multipart identifiers (`parseMultipartIdentifier`) so 3-part `catalog.schema.table` source references work as `AssetSource`. ### Why are the changes needed? Before this PR, metric-view DDL only worked against the session catalog: the create path called `SessionCatalog.createTable` directly, and there was no way for a third-party v2 catalog (Unity Catalog, Hive Metastore catalog, custom REST catalogs, etc.) to own a metric view's lifecycle. SPARK-52729 / SPARK-56655 shipped `ViewCatalog` and `TableViewCatalog` as the public v2 surface for catalog-managed views; metric views are a kind of view and naturally belong on this surface. Once metric views can live on a v2 catalog, two more constraints surface: 1. **Type discriminator.** A consumer reading a row through `ViewCatalog.loadView` needs to know it's a metric view, not a plain SQL view, so it can render the right UI / planner output. Encoding this in `PROP_TABLE_TYPE = METRIC_VIEW` keeps the distinction wire-compatible and lets `V1Table.toCatalogTable` reconstruct `CatalogTableType.METRIC_VIEW` on the read path. 2. **Structured dependency lineage.** Metric views always reference at least one source table; cataloging that lineage as flat string properties or single dot-joined strings loses arity for multi-level namespaces and is ambiguous against quoted identifiers. A typed `DependencyList` of `TableDependency` / `FunctionDependency` with structural `String[] nameParts` lets catalogs persist the lineage as a first-class field with full fidelity. The remaining changes (drop-command parity, `aliasPlan` metadata retention, `metric_view.*` properties, `parseMultipartIdentifier`, `tableType == VIEW` audit + `isViewLike` helper, multi-level-namespace lift, HMS round-trip marker) are mechanical follow-ups that fall out of supporting metric views as a real `CatalogTableType` and as a v2 catalog citizen -- without them, basic operations like `DROP VIEW`, `DESCRIBE TABLE EXTENDED`, `CREATE VIEW (a COMMENT 'c') WITH METRICS ...`, or `CREATE VIEW cat.db1.db2.mv WITH METRICS ...` would silently degrade. ### Does this PR introduce _any_ user-facing change? Yes, both for end users and for catalog plugin developers: **End users:** - `CREATE VIEW <ident> WITH METRICS ...` now works against any v2 catalog that implements `ViewCatalog`, including catalogs with multi-level namespace targets. Previously it was rejected with `MISSING_CATALOG_ABILITY.VIEWS` for non-session catalogs, and capped at single-level namespaces. `IF NOT EXISTS` and `OR REPLACE` are honored on the v2 path (regression vs. v1 fixed). - A v2 metric view can be queried with `SELECT region, measure(count_sum) FROM <mv> ...`, dropped with `DROP VIEW`, listed via `SHOW VIEWS` (and via `SHOW TABLES` on a `TableViewCatalog`, matching v1 SHOW TABLES output), and described with `DESCRIBE TABLE` / `DESCRIBE TABLE EXTENDED`. `DROP TABLE` on a metric view throws `WRONG_COMMAND_FOR_OBJECT_TYPE` ("Use DROP VIEW instead"). - `ALTER VIEW <metric_view> RENAME TO ...` is wired through `RenameV2ViewExec` and preserves the metric-view kind across the rename. - `SHOW CREATE TABLE` on a metric view throws `UNSUPPORTED_SHOW_CREATE_TABLE.ON_METRIC_VIEW` (no round-trippable form yet) on both the v1 and v2 paths -- same error class regardless of catalog kind. - Session-catalog metric views are now stored as `CatalogTableType.METRIC_VIEW` instead of `CatalogTableType.VIEW + view.viewWithMetrics=true`. Observable in `DESCRIBE TABLE EXTENDED`'s `Type` row and the `tableType` column of the `tables` system table. SQL behavior is unchanged. Hive-metastore-backed metric views also round-trip through HMS via a `view.subType = METRIC_VIEW` property marker. - `DESCRIBE TABLE EXTENDED` on metric views (v1 and v2) now consistently surfaces `metric_view.from.type` / `metric_view.from.name` / `metric_view.from.sql` / `metric_view.where` descriptor rows. - Error messages from `DROP TABLE` / `DROP VIEW` mismatch now mention `METRIC_VIEW` alongside `VIEW`. - Malformed metric-view YAML now surfaces as `INVALID_METRIC_VIEW_YAML` (user-correctable) instead of "Spark internal error, please contact support". **Catalog plugin developers:** - New public API surface in `org.apache.spark.sql.connector.catalog`: sealed interface `Dependency` with `permits TableDependency, FunctionDependency`, both records carrying `String[] nameParts`. `Dependency.table(String[])` / `Dependency.function(String[])` static factories (non-vararg per review; callers pass an existing array directly). `DependencyList(Dependency[])` with `DependencyList.of(Dependency[])` factory. All three records override `equals` / `hashCode` / `toString` to give value semantics on their array fields, and the canonical accessors return a defensive `clone()` so internal state is not mutable through the public API. All `Evolving`, `since 4.2.0`. Note: today's only producer in Spark itself is metric-view dependency extraction, which emits `TableDependency` only; `FunctionDependency` and `Dependency.function(...)` are exposed as groundwork for future producers (e.g. SQL UDF dep tracking). - `ViewInfo` gains a typed `viewDependencies()` accessor and `ViewInfo.Builder.withViewDependencies(...)` setter. `viewDependencies` is populated only on the non-session v2 CREATE path; v1 metric views (and v2 metric views read back through `V1Table.toCatalogTable`) carry `null`. Catalog plugin authors persisting dependency lineage should treat the field as v2-only for now -- broadening to v1 is a tracked follow-up. - `TableSummary.METRIC_VIEW_TABLE_TYPE = "METRIC_VIEW"` constant. - `CatalogTableType.METRIC_VIEW` enum value (v1 surface). - `CatalogTable.isViewLike` instance + `CatalogTable.isViewLike(CatalogTableType)` companion helpers (DBR parity helpers for "does this table behave like a view at resolution / DDL time?"). Forks that add their own view-like types (e.g. DBR's `MATERIALIZED_VIEW`, `STREAMING_TABLE`) only need to extend the helper body. - `V2ViewPreparation` (private to `org.apache.spark.sql.execution.datasources.v2`) gains optional `viewDependencies` / `tableType` / `retainColumnMetadata` hooks. ### How was this patch tested? `MetricViewV2CatalogSuite` -- 31 tests across 5 sections, all against an in-memory `ViewCatalog` test fixture (`MetricViewRecordingCatalog extends InMemoryTableCatalog with TableViewCatalog`): **Section 1 -- CREATE-related (11 tests):** - V2 catalog receives `METRIC_VIEW` table type and view text via `ViewInfo`. - V2 catalog path populates `metric_view.*` descriptor properties + view context (`currentCatalog` / `currentNamespace`) + captured SQL configs. - V2 catalog path captures `SQLSource` and comment. - Metric view columns carry `metric_view.type` / `metric_view.expr` in column metadata. - User-specified column names with comments preserve `metric_view.*` metadata (pins the `aliasPlan(retainMetadata = true)` fix). - `CREATE OR REPLACE VIEW ... WITH METRICS` replaces an existing v2 metric view (asserts on the replacement's distinguishing fields: queryText, `metric_view.where`, dependencies). - `CREATE VIEW IF NOT EXISTS ... WITH METRICS` is a no-op when the view exists (catalog never sees the second `createView` call). - `CREATE VIEW ... WITH METRICS` over a v2 table at the ident throws `TABLE_OR_VIEW_ALREADY_EXISTS` (analyzer-time pre-check). - `CREATE VIEW IF NOT EXISTS ... WITH METRICS` is a no-op when a v2 table sits at the ident (v1 parity). - `CREATE VIEW ... WITH METRICS` on a non-`ViewCatalog` catalog fails with `MISSING_CATALOG_ABILITY.VIEWS`. - `CREATE VIEW ... WITH METRICS` at a multi-level-namespace v2 target (`testcat.ns_a.ns_b.mv_deep`) succeeds (pins the `analyzeMetricViewText` lift to `Seq[String]`). **Section 2 -- Dependency extraction (5 tests):** - SQL source `JOIN` captures both tables as 3-part `nameParts`. - SQL source subquery deduplicates same-table references. - SQL source self-join deduplicates same-table references. - V1 session-catalog source emits exactly 3 parts, normalized to `[spark_catalog, db, table]` by `qualifyV1`. - Multi-level V2 namespace source (`testcat.ns_a.ns_b.events_deep`) emits 4-part `nameParts`. **Section 3 -- SELECT cases (5 tests, modeled on `MetricViewSuite` patterns):** - `SELECT measure(count_sum) FROM <mv> GROUP BY region` returns aggregated rows (exercises the full `loadTableOrView` -> `MetadataTable(ViewInfo)` -> `V1Table.toCatalogTable(ViewInfo)` -> `ResolveMetricView` round-trip). - `SELECT measure(...) WHERE region = ...` -- query-layer filter on top of the view. - View's pre-defined `where` clause is applied (`where = Some("count > 1")` filters at view-resolution time). - Multiple measures with different aggregations (sum / sum / max). - `ORDER BY measure(...) DESC LIMIT 1` over the metric view. **Section 4 -- DESCRIBE cases (2 tests):** - `DESCRIBE TABLE EXTENDED` round-trips through `loadTableOrView` and emits the `View Text` / `Type` rows (gated through `CatalogTable.isViewLike` in `toJsonLinkedHashMap`, which now recognizes `METRIC_VIEW`). - `DESCRIBE TABLE` (non-EXTENDED) returns the aliased columns. **Section 5 -- DROP / SHOW / RENAME cases (8 tests):** - `DROP VIEW` succeeds on a v2 metric view. - `DROP VIEW IF EXISTS` on a non-existent v2 metric view is a no-op. - `DROP TABLE` on a v2 metric view throws `WRONG_COMMAND_FOR_OBJECT_TYPE` ("Use DROP VIEW instead", per SPARK-56655's `DropTableExec`) and asserts the metric view is **not** deleted. - `DROP TABLE IF EXISTS` on a v2 metric view also throws (`IF EXISTS` doesn't silence the wrong-type error, v1 parity). - `SHOW CREATE TABLE` on a v2 metric view throws `UNSUPPORTED_SHOW_CREATE_TABLE.ON_METRIC_VIEW` (same dedicated error class as the v1 path). - `SHOW TABLES` on a `TableViewCatalog` lists both tables and metric views (matches v1 SHOW TABLES output per SPARK-56655). - `SHOW VIEWS` lists v2 metric views. - `ALTER VIEW <metric_view> RENAME TO ...` succeeds and preserves the metric-view kind across the rename (pins `RenameV2ViewExec` end-to-end against the fixture's `renameView`). Existing session-catalog metric-view tests (`MetricViewSuite`, `SimpleMetricViewSuite`, `HiveMetricViewSuite`) and v1 path tests pass unchanged. `DDLSuite` and `HiveDDLSuite` had their `tableTypes` enumerations updated to include `METRIC_VIEW` in two assertion lists. `PlanResolutionSuite` test fixture was updated to stub the new `CatalogTable.isViewLike` method on the Mockito mock. ### Was this patch authored or co-authored using generative AI tooling? Generated-by: Cursor (Claude Sonnet 4.7) Closes #55487 from chenwang-databricks/metric-view-on-51419. Lead-authored-by: Chen Wang <chen.wang@databricks.com> Co-authored-by: Wenchen Fan <cloud0fan@gmail.com> Signed-off-by: Wenchen Fan <wenchen@databricks.com> (cherry picked from commit 8cfc73a) Signed-off-by: Wenchen Fan <wenchen@databricks.com>
### What changes were proposed in this pull request? This PR extends metric-view support to **DS v2 catalogs** by routing `CREATE VIEW ... WITH METRICS` through the `ViewCatalog` / `TableViewCatalog` APIs introduced by [SPARK-52729](#51419) and finalized by [SPARK-56655](https://github.com/apache/spark/pull/55954). Third-party v2 catalogs that implement `ViewCatalog` can now host metric views with the same metadata fidelity as session-catalog metric views. **1. V2 metric-view CREATE path -- shared with `CreateV2ViewExec`.** A new `CreateV2MetricViewExec` and `CreateV2ViewExec` both extend a new `V2CreateViewPreparation` trait (which itself extends `V2ViewPreparation`). The trait owns the shared CREATE-side `run()`: `viewExists` short-circuit on `IF NOT EXISTS`, `createOrReplaceView` for `OR REPLACE`, and cross-type collision decoding (`ViewAlreadyExistsException` -> `tableExists` -> `EXPECT_VIEW_NOT_TABLE.NO_ALTERNATIVE`). The metric-view subclass only supplies the metric-view-specific bits (no collation, schema-mode `UNSUPPORTED`, typed `viewDependencies`, `PROP_TABLE_TYPE = METRIC_VIEW`, `retainColumnMetadata = true`) via optional hooks on `V2ViewPreparation`. `DataSourceV2Strategy` intercepts `CreateMetricViewCommand` on a non-session catalog and routes to the new exec; the v1 session-catalog path stays in `CreateMetricViewCommand.run`. **2. First-class `METRIC_VIEW` table type.** - `CatalogTableType.METRIC_VIEW` is added alongside `EXTERNAL` / `MANAGED` / `VIEW`. - `TableSummary.METRIC_VIEW_TABLE_TYPE = "METRIC_VIEW"` constant for the V2 surface. - The previous `view.viewWithMetrics` property hack is removed; `CatalogTable.isMetricView` checks `tableType == METRIC_VIEW` directly. - `V1Table.summarizeTableType` and `V1Table.toCatalogTable(catalog, ident, ViewInfo)` translate between the V2 property form and the V1 enum. - HMS round-trip support: `HiveTableType` has no `METRIC_VIEW` variant (both regular views and metric views serialize as `VIRTUAL_VIEW`). `HiveExternalCatalog` now persists a `view.subType = METRIC_VIEW` property on write and lifts `tableType` back to `METRIC_VIEW` on read, so HMS-backed metric views survive the round trip. **3. Repo-wide `tableType == VIEW` audit + `CatalogTable.isViewLike` helper.** Promoting metric views to a distinct `CatalogTableType` opens silent regressions wherever existing code branches on `VIEW`. To consolidate the audit and reduce divergence with the Databricks Runtime (which has the same helper), this PR introduces: - `CatalogTable.isViewLike` instance method (DBR parity: today returns `tableType == VIEW || tableType == METRIC_VIEW`; forks may extend the set). - `CatalogTable.isViewLike(t: CatalogTableType)` companion form for the few sites that have a `CatalogTableType` but no `CatalogTable` (e.g. `SessionCatalog.isView`, `verifyAlterTableType`, `HiveClientImpl.toHiveTableType`). All 18 sites in `catalyst` / `core` / `hive` that previously did inline `tableType == VIEW || tableType == METRIC_VIEW` (or the `CatalogTableType.VIEW | CatalogTableType.METRIC_VIEW` pattern alternation) are now routed through these helpers, so adding a new view-like type in the future is a one-line change in the helper body. Notable touched call sites: `CatalogTable.toJsonLinkedHashMap` (DESCRIBE EXTENDED rows), `HiveExternalCatalog.{createTable, alterTable, restoreTableMetadata}`, `HiveClientImpl.toHiveTableType`, `SessionCatalog.isView`, `InMemoryCatalog.listViews`, `RelationResolution`, `Analyzer.lookupTableOrView`, `rules.scala`, `DataStreamWriter`, `DescribeRelationJsonCommand`, `AnalyzeColumnCommand`, `AnalyzePartitionCommand`, `CommandUtils.analyzeTable`, `V2SessionCatalog.dropTableInternal`, `verifyAlterTableType` in `ddl.scala`, and 3 sites in `tables.scala`. **Explicit rejection (uniform error class):** `SHOW CREATE TABLE` on a metric view has no round-trippable `CREATE VIEW ... WITH METRICS` form, so it's rejected explicitly with the dedicated `UNSUPPORTED_SHOW_CREATE_TABLE.ON_METRIC_VIEW` error class on **both** the v1 session-catalog path (in `tables.scala`) and the v2 catalog path (in `DataSourceV2Strategy`), so users see the same actionable message regardless of catalog kind. **4. Drop-command parity.** - `DropTableCommand` (v1 path) treats both `VIEW` and `METRIC_VIEW` as views: `DROP TABLE` rejects either with `wrongCommandForObjectTypeError`, and `DROP VIEW` accepts either. - `V2SessionCatalog.dropTableInternal` extends the existing "view rejected from `DROP TABLE`" guard to cover `METRIC_VIEW`. - For non-session v2: `DropTableExec` (post-SPARK-56655) actively rejects with `WRONG_COMMAND_FOR_OBJECT_TYPE` ("Use DROP VIEW instead") when a view sits at the ident -- works unchanged for metric views since `TableViewCatalog`'s default `viewExists` derives from `loadTableOrView` and recognizes `MetadataTable + ViewInfo`. - `ResolveSessionCatalog`'s `DropView` routing comment is clarified: v2 metric views fall through to `DataSourceV2Strategy` and `ViewCatalog.dropView`. **5. Typed view dependencies (`ViewInfo.viewDependencies`).** - New public DTOs in `org.apache.spark.sql.connector.catalog`: `Dependency` (sealed interface with `Dependency.table(String[])` / `Dependency.function(String[])` non-vararg factories), `TableDependency`, `FunctionDependency`, `DependencyList(Dependency[])`. - `TableDependency` and `FunctionDependency` carry the dependency identifier as **structural multi-part name parts** (`record TableDependency(String[] nameParts)`), not a single dot-flattened string. Arity is preserved per source so multi-level-namespace V2 catalogs (e.g. Iceberg `cat.db1.db2.tbl` -> 4 parts) round-trip without ambiguity against quoted identifiers containing literal `.`. v1 sources resolved through the session catalog are normalized by a new `MetricViewHelper.qualifyV1` to a stable 3-part `[spark_catalog, db, table]` shape so consumers see deterministic arity per source kind (otherwise `TableIdentifier.nameParts` could return 1, 2, or 3 parts depending on what the analyzer captured). - All three records (`TableDependency`, `FunctionDependency`, `DependencyList`) override `equals` / `hashCode` / `toString` using `Arrays.equals` / `Arrays.hashCode` / `Arrays.toString` to give value semantics on their array fields. Without the overrides, records' auto-generated methods on array fields fall through to `Object.equals` (reference equality), which would make structural multi-part names unusable as Map keys / for dedup. Each record also overrides the canonical accessor to return a defensive `clone()` so callers cannot mutate the record's internal array. - `ViewInfo` gains a `viewDependencies` field and a `ViewInfo.Builder.withViewDependencies(...)` setter. Per the field's contract, `null` means "no dependency list was supplied" while an empty `DependencyList.of(new Dependency[0])` means "supplied but the object has none" -- metric-view CREATE always emits the latter, never the former, even when `collectTableDependencies` returns empty. - `MetricViewHelper.collectTableDependencies` walks the analyzed plan and emits structural `Seq[Seq[String]]` parts; the v2 source arm preserves full namespace arity, the v1 source arms (`View`, `HiveTableRelation`, `LogicalRelation`) all route through `qualifyV1` for the stable 3-part shape. **6. Multi-level-namespace targets for v2 metric views.** `MetricViewHelper.analyzeMetricViewText` previously required a `TableIdentifier`, capping the metric-view target at 3 name parts. v2 metric views with multi-level-namespace targets (e.g. `cat.db1.db2.mv`) failed at `ident.asTableIdentifier` with `requiresSinglePartNamespaceError`. The helper now takes `nameParts: Seq[String]` directly; call sites in both the v1 path (`CreateMetricViewCommand`) and the v2 path (`DataSourceV2Strategy`) updated. The helper now also returns `(LogicalPlan, MetricView)` so callers don't have to re-parse the YAML body just to read descriptor properties. **7. `metric_view.*` descriptor properties (v1/v2 parity).** `MetricView.getProperties` produces canonical descriptive properties (`metric_view.from.type`, `metric_view.from.name` / `metric_view.from.sql`, `metric_view.where`) that **both** the v1 path (`CreateMetricViewCommand.createMetricViewInSessionCatalog`) and the v2 path (`DataSourceV2Strategy`) merge into the view's properties bag, so catalog browsers and tooling see the same descriptor rows in `DESCRIBE TABLE EXTENDED` regardless of catalog kind. Long values are truncated to `Constants.MAXIMUM_PROPERTY_SIZE`; the Scaladoc on `getProperties` calls out that `metric_view.from.sql` is therefore a descriptive value, not a round-trippable representation -- consumers should re-read the YAML body for the full SQL. **8. `ViewInfo` constructor cleanup.** The metric-view-specific `PROP_TABLE_TYPE = METRIC_VIEW` special case is dropped from the generic `ViewInfo` constructor in favor of `properties().putIfAbsent(...)`. Callers that want a more specific kind (e.g. `METRIC_VIEW`) call `BaseBuilder.withTableType(...)` before `build()` -- exercised by `CreateV2MetricViewExec` via the new `V2ViewPreparation.tableType` hook. **9. `ViewHelper.aliasPlan(retainMetadata)`.** The user-specified-column-with-comment branch in `aliasPlan` previously dropped existing column metadata. A new `retainMetadata: Boolean = false` parameter merges the analyzed attribute's metadata into the new comment metadata. `ViewHelper.prepareTable` passes `retainMetadata = isMetricView` (v1 path); `V2ViewPreparation` exposes a `retainColumnMetadata` hook that `CreateV2MetricViewExec` overrides to `true` (v2 path). Both preserve the per-column `metric_view.type` / `metric_view.expr` keys that the analyzer attaches to dimensions and measures even when the user renames columns and adds comments. **10. Error classes.** - New `INVALID_METRIC_VIEW_YAML` (sqlState 42K0L). `MetricViewPlanner.parseYAML`'s catch blocks now route through `QueryCompilationErrors.invalidMetricViewYamlError` instead of `SparkException.internalError`, so a typo in the user's YAML body surfaces as a user-correctable `AnalysisException` rather than "please contact support". - New `UNSUPPORTED_SHOW_CREATE_TABLE.ON_METRIC_VIEW` (sqlState 0A000), used by both the v1 session-catalog path and the v2 catalog path so `SHOW CREATE TABLE` on a metric view produces the same actionable message regardless of catalog kind. **11. Misc.** - `MetricViewCanonical.parseSource` accepts multipart identifiers (`parseMultipartIdentifier`) so 3-part `catalog.schema.table` source references work as `AssetSource`. ### Why are the changes needed? Before this PR, metric-view DDL only worked against the session catalog: the create path called `SessionCatalog.createTable` directly, and there was no way for a third-party v2 catalog (Unity Catalog, Hive Metastore catalog, custom REST catalogs, etc.) to own a metric view's lifecycle. SPARK-52729 / SPARK-56655 shipped `ViewCatalog` and `TableViewCatalog` as the public v2 surface for catalog-managed views; metric views are a kind of view and naturally belong on this surface. Once metric views can live on a v2 catalog, two more constraints surface: 1. **Type discriminator.** A consumer reading a row through `ViewCatalog.loadView` needs to know it's a metric view, not a plain SQL view, so it can render the right UI / planner output. Encoding this in `PROP_TABLE_TYPE = METRIC_VIEW` keeps the distinction wire-compatible and lets `V1Table.toCatalogTable` reconstruct `CatalogTableType.METRIC_VIEW` on the read path. 2. **Structured dependency lineage.** Metric views always reference at least one source table; cataloging that lineage as flat string properties or single dot-joined strings loses arity for multi-level namespaces and is ambiguous against quoted identifiers. A typed `DependencyList` of `TableDependency` / `FunctionDependency` with structural `String[] nameParts` lets catalogs persist the lineage as a first-class field with full fidelity. The remaining changes (drop-command parity, `aliasPlan` metadata retention, `metric_view.*` properties, `parseMultipartIdentifier`, `tableType == VIEW` audit + `isViewLike` helper, multi-level-namespace lift, HMS round-trip marker) are mechanical follow-ups that fall out of supporting metric views as a real `CatalogTableType` and as a v2 catalog citizen -- without them, basic operations like `DROP VIEW`, `DESCRIBE TABLE EXTENDED`, `CREATE VIEW (a COMMENT 'c') WITH METRICS ...`, or `CREATE VIEW cat.db1.db2.mv WITH METRICS ...` would silently degrade. ### Does this PR introduce _any_ user-facing change? Yes, both for end users and for catalog plugin developers: **End users:** - `CREATE VIEW <ident> WITH METRICS ...` now works against any v2 catalog that implements `ViewCatalog`, including catalogs with multi-level namespace targets. Previously it was rejected with `MISSING_CATALOG_ABILITY.VIEWS` for non-session catalogs, and capped at single-level namespaces. `IF NOT EXISTS` and `OR REPLACE` are honored on the v2 path (regression vs. v1 fixed). - A v2 metric view can be queried with `SELECT region, measure(count_sum) FROM <mv> ...`, dropped with `DROP VIEW`, listed via `SHOW VIEWS` (and via `SHOW TABLES` on a `TableViewCatalog`, matching v1 SHOW TABLES output), and described with `DESCRIBE TABLE` / `DESCRIBE TABLE EXTENDED`. `DROP TABLE` on a metric view throws `WRONG_COMMAND_FOR_OBJECT_TYPE` ("Use DROP VIEW instead"). - `ALTER VIEW <metric_view> RENAME TO ...` is wired through `RenameV2ViewExec` and preserves the metric-view kind across the rename. - `SHOW CREATE TABLE` on a metric view throws `UNSUPPORTED_SHOW_CREATE_TABLE.ON_METRIC_VIEW` (no round-trippable form yet) on both the v1 and v2 paths -- same error class regardless of catalog kind. - Session-catalog metric views are now stored as `CatalogTableType.METRIC_VIEW` instead of `CatalogTableType.VIEW + view.viewWithMetrics=true`. Observable in `DESCRIBE TABLE EXTENDED`'s `Type` row and the `tableType` column of the `tables` system table. SQL behavior is unchanged. Hive-metastore-backed metric views also round-trip through HMS via a `view.subType = METRIC_VIEW` property marker. - `DESCRIBE TABLE EXTENDED` on metric views (v1 and v2) now consistently surfaces `metric_view.from.type` / `metric_view.from.name` / `metric_view.from.sql` / `metric_view.where` descriptor rows. - Error messages from `DROP TABLE` / `DROP VIEW` mismatch now mention `METRIC_VIEW` alongside `VIEW`. - Malformed metric-view YAML now surfaces as `INVALID_METRIC_VIEW_YAML` (user-correctable) instead of "Spark internal error, please contact support". **Catalog plugin developers:** - New public API surface in `org.apache.spark.sql.connector.catalog`: sealed interface `Dependency` with `permits TableDependency, FunctionDependency`, both records carrying `String[] nameParts`. `Dependency.table(String[])` / `Dependency.function(String[])` static factories (non-vararg per review; callers pass an existing array directly). `DependencyList(Dependency[])` with `DependencyList.of(Dependency[])` factory. All three records override `equals` / `hashCode` / `toString` to give value semantics on their array fields, and the canonical accessors return a defensive `clone()` so internal state is not mutable through the public API. All `Evolving`, `since 4.2.0`. Note: today's only producer in Spark itself is metric-view dependency extraction, which emits `TableDependency` only; `FunctionDependency` and `Dependency.function(...)` are exposed as groundwork for future producers (e.g. SQL UDF dep tracking). - `ViewInfo` gains a typed `viewDependencies()` accessor and `ViewInfo.Builder.withViewDependencies(...)` setter. `viewDependencies` is populated only on the non-session v2 CREATE path; v1 metric views (and v2 metric views read back through `V1Table.toCatalogTable`) carry `null`. Catalog plugin authors persisting dependency lineage should treat the field as v2-only for now -- broadening to v1 is a tracked follow-up. - `TableSummary.METRIC_VIEW_TABLE_TYPE = "METRIC_VIEW"` constant. - `CatalogTableType.METRIC_VIEW` enum value (v1 surface). - `CatalogTable.isViewLike` instance + `CatalogTable.isViewLike(CatalogTableType)` companion helpers (DBR parity helpers for "does this table behave like a view at resolution / DDL time?"). Forks that add their own view-like types (e.g. DBR's `MATERIALIZED_VIEW`, `STREAMING_TABLE`) only need to extend the helper body. - `V2ViewPreparation` (private to `org.apache.spark.sql.execution.datasources.v2`) gains optional `viewDependencies` / `tableType` / `retainColumnMetadata` hooks. ### How was this patch tested? `MetricViewV2CatalogSuite` -- 31 tests across 5 sections, all against an in-memory `ViewCatalog` test fixture (`MetricViewRecordingCatalog extends InMemoryTableCatalog with TableViewCatalog`): **Section 1 -- CREATE-related (11 tests):** - V2 catalog receives `METRIC_VIEW` table type and view text via `ViewInfo`. - V2 catalog path populates `metric_view.*` descriptor properties + view context (`currentCatalog` / `currentNamespace`) + captured SQL configs. - V2 catalog path captures `SQLSource` and comment. - Metric view columns carry `metric_view.type` / `metric_view.expr` in column metadata. - User-specified column names with comments preserve `metric_view.*` metadata (pins the `aliasPlan(retainMetadata = true)` fix). - `CREATE OR REPLACE VIEW ... WITH METRICS` replaces an existing v2 metric view (asserts on the replacement's distinguishing fields: queryText, `metric_view.where`, dependencies). - `CREATE VIEW IF NOT EXISTS ... WITH METRICS` is a no-op when the view exists (catalog never sees the second `createView` call). - `CREATE VIEW ... WITH METRICS` over a v2 table at the ident throws `TABLE_OR_VIEW_ALREADY_EXISTS` (analyzer-time pre-check). - `CREATE VIEW IF NOT EXISTS ... WITH METRICS` is a no-op when a v2 table sits at the ident (v1 parity). - `CREATE VIEW ... WITH METRICS` on a non-`ViewCatalog` catalog fails with `MISSING_CATALOG_ABILITY.VIEWS`. - `CREATE VIEW ... WITH METRICS` at a multi-level-namespace v2 target (`testcat.ns_a.ns_b.mv_deep`) succeeds (pins the `analyzeMetricViewText` lift to `Seq[String]`). **Section 2 -- Dependency extraction (5 tests):** - SQL source `JOIN` captures both tables as 3-part `nameParts`. - SQL source subquery deduplicates same-table references. - SQL source self-join deduplicates same-table references. - V1 session-catalog source emits exactly 3 parts, normalized to `[spark_catalog, db, table]` by `qualifyV1`. - Multi-level V2 namespace source (`testcat.ns_a.ns_b.events_deep`) emits 4-part `nameParts`. **Section 3 -- SELECT cases (5 tests, modeled on `MetricViewSuite` patterns):** - `SELECT measure(count_sum) FROM <mv> GROUP BY region` returns aggregated rows (exercises the full `loadTableOrView` -> `MetadataTable(ViewInfo)` -> `V1Table.toCatalogTable(ViewInfo)` -> `ResolveMetricView` round-trip). - `SELECT measure(...) WHERE region = ...` -- query-layer filter on top of the view. - View's pre-defined `where` clause is applied (`where = Some("count > 1")` filters at view-resolution time). - Multiple measures with different aggregations (sum / sum / max). - `ORDER BY measure(...) DESC LIMIT 1` over the metric view. **Section 4 -- DESCRIBE cases (2 tests):** - `DESCRIBE TABLE EXTENDED` round-trips through `loadTableOrView` and emits the `View Text` / `Type` rows (gated through `CatalogTable.isViewLike` in `toJsonLinkedHashMap`, which now recognizes `METRIC_VIEW`). - `DESCRIBE TABLE` (non-EXTENDED) returns the aliased columns. **Section 5 -- DROP / SHOW / RENAME cases (8 tests):** - `DROP VIEW` succeeds on a v2 metric view. - `DROP VIEW IF EXISTS` on a non-existent v2 metric view is a no-op. - `DROP TABLE` on a v2 metric view throws `WRONG_COMMAND_FOR_OBJECT_TYPE` ("Use DROP VIEW instead", per SPARK-56655's `DropTableExec`) and asserts the metric view is **not** deleted. - `DROP TABLE IF EXISTS` on a v2 metric view also throws (`IF EXISTS` doesn't silence the wrong-type error, v1 parity). - `SHOW CREATE TABLE` on a v2 metric view throws `UNSUPPORTED_SHOW_CREATE_TABLE.ON_METRIC_VIEW` (same dedicated error class as the v1 path). - `SHOW TABLES` on a `TableViewCatalog` lists both tables and metric views (matches v1 SHOW TABLES output per SPARK-56655). - `SHOW VIEWS` lists v2 metric views. - `ALTER VIEW <metric_view> RENAME TO ...` succeeds and preserves the metric-view kind across the rename (pins `RenameV2ViewExec` end-to-end against the fixture's `renameView`). Existing session-catalog metric-view tests (`MetricViewSuite`, `SimpleMetricViewSuite`, `HiveMetricViewSuite`) and v1 path tests pass unchanged. `DDLSuite` and `HiveDDLSuite` had their `tableTypes` enumerations updated to include `METRIC_VIEW` in two assertion lists. `PlanResolutionSuite` test fixture was updated to stub the new `CatalogTable.isViewLike` method on the Mockito mock. ### Was this patch authored or co-authored using generative AI tooling? Generated-by: Cursor (Claude Sonnet 4.7) Closes #55487 from chenwang-databricks/metric-view-on-51419. Lead-authored-by: Chen Wang <chen.wang@databricks.com> Co-authored-by: Wenchen Fan <cloud0fan@gmail.com> Signed-off-by: Wenchen Fan <wenchen@databricks.com> (cherry picked from commit 8cfc73a) Signed-off-by: Wenchen Fan <wenchen@databricks.com>
What changes were proposed in this pull request?
This PR extends metric-view support to DS v2 catalogs by routing
CREATE VIEW ... WITH METRICSthrough theViewCatalog/TableViewCatalogAPIs introduced by SPARK-52729 and finalized by SPARK-56655. Third-party v2 catalogs that implementViewCatalogcan now host metric views with the same metadata fidelity as session-catalog metric views.1. V2 metric-view CREATE path -- shared with
CreateV2ViewExec. A newCreateV2MetricViewExecandCreateV2ViewExecboth extend a newV2CreateViewPreparationtrait (which itself extendsV2ViewPreparation). The trait owns the shared CREATE-siderun():viewExistsshort-circuit onIF NOT EXISTS,createOrReplaceViewforOR REPLACE, and cross-type collision decoding (ViewAlreadyExistsException->tableExists->EXPECT_VIEW_NOT_TABLE.NO_ALTERNATIVE). The metric-view subclass only supplies the metric-view-specific bits (no collation, schema-modeUNSUPPORTED, typedviewDependencies,PROP_TABLE_TYPE = METRIC_VIEW,retainColumnMetadata = true) via optional hooks onV2ViewPreparation.DataSourceV2StrategyinterceptsCreateMetricViewCommandon a non-session catalog and routes to the new exec; the v1 session-catalog path stays inCreateMetricViewCommand.run.2. First-class
METRIC_VIEWtable type.CatalogTableType.METRIC_VIEWis added alongsideEXTERNAL/MANAGED/VIEW.TableSummary.METRIC_VIEW_TABLE_TYPE = "METRIC_VIEW"constant for the V2 surface.view.viewWithMetricsproperty hack is removed;CatalogTable.isMetricViewcheckstableType == METRIC_VIEWdirectly.V1Table.summarizeTableTypeandV1Table.toCatalogTable(catalog, ident, ViewInfo)translate between the V2 property form and the V1 enum.HiveTableTypehas noMETRIC_VIEWvariant (both regular views and metric views serialize asVIRTUAL_VIEW).HiveExternalCatalognow persists aview.subType = METRIC_VIEWproperty on write and liftstableTypeback toMETRIC_VIEWon read, so HMS-backed metric views survive the round trip.3. Repo-wide
tableType == VIEWaudit +CatalogTable.isViewLikehelper. Promoting metric views to a distinctCatalogTableTypeopens silent regressions wherever existing code branches onVIEW. To consolidate the audit and reduce divergence with the Databricks Runtime (which has the same helper), this PR introduces:CatalogTable.isViewLikeinstance method (DBR parity: today returnstableType == VIEW || tableType == METRIC_VIEW; forks may extend the set).CatalogTable.isViewLike(t: CatalogTableType)companion form for the few sites that have aCatalogTableTypebut noCatalogTable(e.g.SessionCatalog.isView,verifyAlterTableType,HiveClientImpl.toHiveTableType).All 18 sites in
catalyst/core/hivethat previously did inlinetableType == VIEW || tableType == METRIC_VIEW(or theCatalogTableType.VIEW | CatalogTableType.METRIC_VIEWpattern alternation) are now routed through these helpers, so adding a new view-like type in the future is a one-line change in the helper body. Notable touched call sites:CatalogTable.toJsonLinkedHashMap(DESCRIBE EXTENDED rows),HiveExternalCatalog.{createTable, alterTable, restoreTableMetadata},HiveClientImpl.toHiveTableType,SessionCatalog.isView,InMemoryCatalog.listViews,RelationResolution,Analyzer.lookupTableOrView,rules.scala,DataStreamWriter,DescribeRelationJsonCommand,AnalyzeColumnCommand,AnalyzePartitionCommand,CommandUtils.analyzeTable,V2SessionCatalog.dropTableInternal,verifyAlterTableTypeinddl.scala, and 3 sites intables.scala.Explicit rejection (uniform error class):
SHOW CREATE TABLEon a metric view has no round-trippableCREATE VIEW ... WITH METRICSform, so it's rejected explicitly with the dedicatedUNSUPPORTED_SHOW_CREATE_TABLE.ON_METRIC_VIEWerror class on both the v1 session-catalog path (intables.scala) and the v2 catalog path (inDataSourceV2Strategy), so users see the same actionable message regardless of catalog kind.4. Drop-command parity.
DropTableCommand(v1 path) treats bothVIEWandMETRIC_VIEWas views:DROP TABLErejects either withwrongCommandForObjectTypeError, andDROP VIEWaccepts either.V2SessionCatalog.dropTableInternalextends the existing "view rejected fromDROP TABLE" guard to coverMETRIC_VIEW.DropTableExec(post-SPARK-56655) actively rejects withWRONG_COMMAND_FOR_OBJECT_TYPE("Use DROP VIEW instead") when a view sits at the ident -- works unchanged for metric views sinceTableViewCatalog's defaultviewExistsderives fromloadTableOrViewand recognizesMetadataTable + ViewInfo.ResolveSessionCatalog'sDropViewrouting comment is clarified: v2 metric views fall through toDataSourceV2StrategyandViewCatalog.dropView.5. Typed view dependencies (
ViewInfo.viewDependencies).org.apache.spark.sql.connector.catalog:Dependency(sealed interface withDependency.table(String[])/Dependency.function(String[])non-vararg factories),TableDependency,FunctionDependency,DependencyList(Dependency[]).TableDependencyandFunctionDependencycarry the dependency identifier as structural multi-part name parts (record TableDependency(String[] nameParts)), not a single dot-flattened string. Arity is preserved per source so multi-level-namespace V2 catalogs (e.g. Icebergcat.db1.db2.tbl-> 4 parts) round-trip without ambiguity against quoted identifiers containing literal.. v1 sources resolved through the session catalog are normalized by a newMetricViewHelper.qualifyV1to a stable 3-part[spark_catalog, db, table]shape so consumers see deterministic arity per source kind (otherwiseTableIdentifier.namePartscould return 1, 2, or 3 parts depending on what the analyzer captured).TableDependency,FunctionDependency,DependencyList) overrideequals/hashCode/toStringusingArrays.equals/Arrays.hashCode/Arrays.toStringto give value semantics on their array fields. Without the overrides, records' auto-generated methods on array fields fall through toObject.equals(reference equality), which would make structural multi-part names unusable as Map keys / for dedup. Each record also overrides the canonical accessor to return a defensiveclone()so callers cannot mutate the record's internal array.ViewInfogains aviewDependenciesfield and aViewInfo.Builder.withViewDependencies(...)setter. Per the field's contract,nullmeans "no dependency list was supplied" while an emptyDependencyList.of(new Dependency[0])means "supplied but the object has none" -- metric-view CREATE always emits the latter, never the former, even whencollectTableDependenciesreturns empty.MetricViewHelper.collectTableDependencieswalks the analyzed plan and emits structuralSeq[Seq[String]]parts; the v2 source arm preserves full namespace arity, the v1 source arms (View,HiveTableRelation,LogicalRelation) all route throughqualifyV1for the stable 3-part shape.6. Multi-level-namespace targets for v2 metric views.
MetricViewHelper.analyzeMetricViewTextpreviously required aTableIdentifier, capping the metric-view target at 3 name parts. v2 metric views with multi-level-namespace targets (e.g.cat.db1.db2.mv) failed atident.asTableIdentifierwithrequiresSinglePartNamespaceError. The helper now takesnameParts: Seq[String]directly; call sites in both the v1 path (CreateMetricViewCommand) and the v2 path (DataSourceV2Strategy) updated. The helper now also returns(LogicalPlan, MetricView)so callers don't have to re-parse the YAML body just to read descriptor properties.7.
metric_view.*descriptor properties (v1/v2 parity).MetricView.getPropertiesproduces canonical descriptive properties (metric_view.from.type,metric_view.from.name/metric_view.from.sql,metric_view.where) that both the v1 path (CreateMetricViewCommand.createMetricViewInSessionCatalog) and the v2 path (DataSourceV2Strategy) merge into the view's properties bag, so catalog browsers and tooling see the same descriptor rows inDESCRIBE TABLE EXTENDEDregardless of catalog kind. Long values are truncated toConstants.MAXIMUM_PROPERTY_SIZE; the Scaladoc ongetPropertiescalls out thatmetric_view.from.sqlis therefore a descriptive value, not a round-trippable representation -- consumers should re-read the YAML body for the full SQL.8.
ViewInfoconstructor cleanup. The metric-view-specificPROP_TABLE_TYPE = METRIC_VIEWspecial case is dropped from the genericViewInfoconstructor in favor ofproperties().putIfAbsent(...). Callers that want a more specific kind (e.g.METRIC_VIEW) callBaseBuilder.withTableType(...)beforebuild()-- exercised byCreateV2MetricViewExecvia the newV2ViewPreparation.tableTypehook.9.
ViewHelper.aliasPlan(retainMetadata). The user-specified-column-with-comment branch inaliasPlanpreviously dropped existing column metadata. A newretainMetadata: Boolean = falseparameter merges the analyzed attribute's metadata into the new comment metadata.ViewHelper.prepareTablepassesretainMetadata = isMetricView(v1 path);V2ViewPreparationexposes aretainColumnMetadatahook thatCreateV2MetricViewExecoverrides totrue(v2 path). Both preserve the per-columnmetric_view.type/metric_view.exprkeys that the analyzer attaches to dimensions and measures even when the user renames columns and adds comments.10. Error classes.
INVALID_METRIC_VIEW_YAML(sqlState 42K0L).MetricViewPlanner.parseYAML's catch blocks now route throughQueryCompilationErrors.invalidMetricViewYamlErrorinstead ofSparkException.internalError, so a typo in the user's YAML body surfaces as a user-correctableAnalysisExceptionrather than "please contact support".UNSUPPORTED_SHOW_CREATE_TABLE.ON_METRIC_VIEW(sqlState 0A000), used by both the v1 session-catalog path and the v2 catalog path soSHOW CREATE TABLEon a metric view produces the same actionable message regardless of catalog kind.11. Misc.
MetricViewCanonical.parseSourceaccepts multipart identifiers (parseMultipartIdentifier) so 3-partcatalog.schema.tablesource references work asAssetSource.Why are the changes needed?
Before this PR, metric-view DDL only worked against the session catalog: the create path called
SessionCatalog.createTabledirectly, and there was no way for a third-party v2 catalog (Unity Catalog, Hive Metastore catalog, custom REST catalogs, etc.) to own a metric view's lifecycle. SPARK-52729 / SPARK-56655 shippedViewCatalogandTableViewCatalogas the public v2 surface for catalog-managed views; metric views are a kind of view and naturally belong on this surface.Once metric views can live on a v2 catalog, two more constraints surface:
ViewCatalog.loadViewneeds to know it's a metric view, not a plain SQL view, so it can render the right UI / planner output. Encoding this inPROP_TABLE_TYPE = METRIC_VIEWkeeps the distinction wire-compatible and letsV1Table.toCatalogTablereconstructCatalogTableType.METRIC_VIEWon the read path.DependencyListofTableDependency/FunctionDependencywith structuralString[] namePartslets catalogs persist the lineage as a first-class field with full fidelity.The remaining changes (drop-command parity,
aliasPlanmetadata retention,metric_view.*properties,parseMultipartIdentifier,tableType == VIEWaudit +isViewLikehelper, multi-level-namespace lift, HMS round-trip marker) are mechanical follow-ups that fall out of supporting metric views as a realCatalogTableTypeand as a v2 catalog citizen -- without them, basic operations likeDROP VIEW,DESCRIBE TABLE EXTENDED,CREATE VIEW (a COMMENT 'c') WITH METRICS ..., orCREATE VIEW cat.db1.db2.mv WITH METRICS ...would silently degrade.Does this PR introduce any user-facing change?
Yes, both for end users and for catalog plugin developers:
End users:
CREATE VIEW <ident> WITH METRICS ...now works against any v2 catalog that implementsViewCatalog, including catalogs with multi-level namespace targets. Previously it was rejected withMISSING_CATALOG_ABILITY.VIEWSfor non-session catalogs, and capped at single-level namespaces.IF NOT EXISTSandOR REPLACEare honored on the v2 path (regression vs. v1 fixed).SELECT region, measure(count_sum) FROM <mv> ..., dropped withDROP VIEW, listed viaSHOW VIEWS(and viaSHOW TABLESon aTableViewCatalog, matching v1 SHOW TABLES output), and described withDESCRIBE TABLE/DESCRIBE TABLE EXTENDED.DROP TABLEon a metric view throwsWRONG_COMMAND_FOR_OBJECT_TYPE("Use DROP VIEW instead").ALTER VIEW <metric_view> RENAME TO ...is wired throughRenameV2ViewExecand preserves the metric-view kind across the rename.SHOW CREATE TABLEon a metric view throwsUNSUPPORTED_SHOW_CREATE_TABLE.ON_METRIC_VIEW(no round-trippable form yet) on both the v1 and v2 paths -- same error class regardless of catalog kind.CatalogTableType.METRIC_VIEWinstead ofCatalogTableType.VIEW + view.viewWithMetrics=true. Observable inDESCRIBE TABLE EXTENDED'sTyperow and thetableTypecolumn of thetablessystem table. SQL behavior is unchanged. Hive-metastore-backed metric views also round-trip through HMS via aview.subType = METRIC_VIEWproperty marker.DESCRIBE TABLE EXTENDEDon metric views (v1 and v2) now consistently surfacesmetric_view.from.type/metric_view.from.name/metric_view.from.sql/metric_view.wheredescriptor rows.DROP TABLE/DROP VIEWmismatch now mentionMETRIC_VIEWalongsideVIEW.INVALID_METRIC_VIEW_YAML(user-correctable) instead of "Spark internal error, please contact support".Catalog plugin developers:
org.apache.spark.sql.connector.catalog: sealed interfaceDependencywithpermits TableDependency, FunctionDependency, both records carryingString[] nameParts.Dependency.table(String[])/Dependency.function(String[])static factories (non-vararg per review; callers pass an existing array directly).DependencyList(Dependency[])withDependencyList.of(Dependency[])factory. All three records overrideequals/hashCode/toStringto give value semantics on their array fields, and the canonical accessors return a defensiveclone()so internal state is not mutable through the public API. All@Evolving,@since 4.2.0. Note: today's only producer in Spark itself is metric-view dependency extraction, which emitsTableDependencyonly;FunctionDependencyandDependency.function(...)are exposed as groundwork for future producers (e.g. SQL UDF dep tracking).ViewInfogains a typedviewDependencies()accessor andViewInfo.Builder.withViewDependencies(...)setter.viewDependenciesis populated only on the non-session v2 CREATE path; v1 metric views (and v2 metric views read back throughV1Table.toCatalogTable) carrynull. Catalog plugin authors persisting dependency lineage should treat the field as v2-only for now -- broadening to v1 is a tracked follow-up.TableSummary.METRIC_VIEW_TABLE_TYPE = "METRIC_VIEW"constant.CatalogTableType.METRIC_VIEWenum value (v1 surface).CatalogTable.isViewLikeinstance +CatalogTable.isViewLike(CatalogTableType)companion helpers (DBR parity helpers for "does this table behave like a view at resolution / DDL time?"). Forks that add their own view-like types (e.g. DBR'sMATERIALIZED_VIEW,STREAMING_TABLE) only need to extend the helper body.V2ViewPreparation(private toorg.apache.spark.sql.execution.datasources.v2) gains optionalviewDependencies/tableType/retainColumnMetadatahooks.How was this patch tested?
MetricViewV2CatalogSuite-- 31 tests across 5 sections, all against an in-memoryViewCatalogtest fixture (MetricViewRecordingCatalog extends InMemoryTableCatalog with TableViewCatalog):Section 1 -- CREATE-related (11 tests):
METRIC_VIEWtable type and view text viaViewInfo.metric_view.*descriptor properties + view context (currentCatalog/currentNamespace) + captured SQL configs.SQLSourceand comment.metric_view.type/metric_view.exprin column metadata.metric_view.*metadata (pins thealiasPlan(retainMetadata = true)fix).CREATE OR REPLACE VIEW ... WITH METRICSreplaces an existing v2 metric view (asserts on the replacement's distinguishing fields: queryText,metric_view.where, dependencies).CREATE VIEW IF NOT EXISTS ... WITH METRICSis a no-op when the view exists (catalog never sees the secondcreateViewcall).CREATE VIEW ... WITH METRICSover a v2 table at the ident throwsTABLE_OR_VIEW_ALREADY_EXISTS(analyzer-time pre-check).CREATE VIEW IF NOT EXISTS ... WITH METRICSis a no-op when a v2 table sits at the ident (v1 parity).CREATE VIEW ... WITH METRICSon a non-ViewCatalogcatalog fails withMISSING_CATALOG_ABILITY.VIEWS.CREATE VIEW ... WITH METRICSat a multi-level-namespace v2 target (testcat.ns_a.ns_b.mv_deep) succeeds (pins theanalyzeMetricViewTextlift toSeq[String]).Section 2 -- Dependency extraction (5 tests):
JOINcaptures both tables as 3-partnameParts.[spark_catalog, db, table]byqualifyV1.testcat.ns_a.ns_b.events_deep) emits 4-partnameParts.Section 3 -- SELECT cases (5 tests, modeled on
MetricViewSuitepatterns):SELECT measure(count_sum) FROM <mv> GROUP BY regionreturns aggregated rows (exercises the fullloadTableOrView->MetadataTable(ViewInfo)->V1Table.toCatalogTable(ViewInfo)->ResolveMetricViewround-trip).SELECT measure(...) WHERE region = ...-- query-layer filter on top of the view.whereclause is applied (where = Some("count > 1")filters at view-resolution time).ORDER BY measure(...) DESC LIMIT 1over the metric view.Section 4 -- DESCRIBE cases (2 tests):
DESCRIBE TABLE EXTENDEDround-trips throughloadTableOrViewand emits theView Text/Typerows (gated throughCatalogTable.isViewLikeintoJsonLinkedHashMap, which now recognizesMETRIC_VIEW).DESCRIBE TABLE(non-EXTENDED) returns the aliased columns.Section 5 -- DROP / SHOW / RENAME cases (8 tests):
DROP VIEWsucceeds on a v2 metric view.DROP VIEW IF EXISTSon a non-existent v2 metric view is a no-op.DROP TABLEon a v2 metric view throwsWRONG_COMMAND_FOR_OBJECT_TYPE("Use DROP VIEW instead", per SPARK-56655'sDropTableExec) and asserts the metric view is not deleted.DROP TABLE IF EXISTSon a v2 metric view also throws (IF EXISTSdoesn't silence the wrong-type error, v1 parity).SHOW CREATE TABLEon a v2 metric view throwsUNSUPPORTED_SHOW_CREATE_TABLE.ON_METRIC_VIEW(same dedicated error class as the v1 path).SHOW TABLESon aTableViewCataloglists both tables and metric views (matches v1 SHOW TABLES output per SPARK-56655).SHOW VIEWSlists v2 metric views.ALTER VIEW <metric_view> RENAME TO ...succeeds and preserves the metric-view kind across the rename (pinsRenameV2ViewExecend-to-end against the fixture'srenameView).Existing session-catalog metric-view tests (
MetricViewSuite,SimpleMetricViewSuite,HiveMetricViewSuite) and v1 path tests pass unchanged.DDLSuiteandHiveDDLSuitehad theirtableTypesenumerations updated to includeMETRIC_VIEWin two assertion lists.PlanResolutionSuitetest fixture was updated to stub the newCatalogTable.isViewLikemethod on the Mockito mock.Was this patch authored or co-authored using generative AI tooling?
Generated-by: Cursor (Claude Sonnet 4.7)