
[SPARK-54119] Support METRIC_VIEW creation on V2 catalogs #55487

Closed
chenwang-databricks wants to merge 17 commits into apache:master from chenwang-databricks:metric-view-on-51419

Conversation


@chenwang-databricks chenwang-databricks commented Apr 22, 2026

What changes were proposed in this pull request?

This PR extends metric-view support to DS v2 catalogs by routing CREATE VIEW ... WITH METRICS through the ViewCatalog / TableViewCatalog APIs introduced by SPARK-52729 and finalized by SPARK-56655. Third-party v2 catalogs that implement ViewCatalog can now host metric views with the same metadata fidelity as session-catalog metric views.

1. V2 metric-view CREATE path -- shared with CreateV2ViewExec. A new CreateV2MetricViewExec and the existing CreateV2ViewExec both extend a new V2CreateViewPreparation trait (which itself extends V2ViewPreparation). The trait owns the shared CREATE-side run(): viewExists short-circuit on IF NOT EXISTS, createOrReplaceView for OR REPLACE, and cross-type collision decoding (ViewAlreadyExistsException -> tableExists -> EXPECT_VIEW_NOT_TABLE.NO_ALTERNATIVE). The metric-view subclass supplies only the metric-view-specific bits (no collation, schema-mode UNSUPPORTED, typed viewDependencies, PROP_TABLE_TYPE = METRIC_VIEW, retainColumnMetadata = true) via optional hooks on V2ViewPreparation. DataSourceV2Strategy intercepts CreateMetricViewCommand on a non-session catalog and routes to the new exec; the v1 session-catalog path stays in CreateMetricViewCommand.run.
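The shared CREATE-side flow the trait owns can be sketched roughly as follows. All names here (ViewOps, CreateFlow, StubOps) are illustrative stand-ins, not Spark's actual V2CreateViewPreparation API; the sketch only models the three behaviors listed above:

```java
// Illustrative stand-ins, not Spark's actual V2CreateViewPreparation API.
interface ViewOps {
    boolean viewExists(String ident);
    boolean tableExists(String ident);
    void createView(String ident);          // throws ViewAlreadyExists on collision
    void createOrReplaceView(String ident);
}

class ViewAlreadyExists extends RuntimeException {
    final String ident;
    ViewAlreadyExists(String ident) { this.ident = ident; }
}

class ExpectViewNotTable extends RuntimeException {
    final String ident;
    ExpectViewNotTable(String ident) { this.ident = ident; }
}

class CreateFlow {
    // Shared CREATE-side flow: IF NOT EXISTS short-circuit, OR REPLACE,
    // then cross-type collision decoding (ViewAlreadyExists + tableExists
    // at the same ident => EXPECT_VIEW_NOT_TABLE).
    static String run(ViewOps ops, String ident, boolean ifNotExists, boolean orReplace) {
        if (ifNotExists && ops.viewExists(ident)) return "noop";
        if (orReplace) { ops.createOrReplaceView(ident); return "replaced"; }
        try {
            ops.createView(ident);
            return "created";
        } catch (ViewAlreadyExists e) {
            if (ops.tableExists(e.ident)) throw new ExpectViewNotTable(e.ident);
            throw e;
        }
    }
}

// Tiny in-memory stub so the flow can be exercised.
class StubOps implements ViewOps {
    final java.util.Set<String> views = new java.util.HashSet<>();
    final java.util.Set<String> tables = new java.util.HashSet<>();
    public boolean viewExists(String i) { return views.contains(i); }
    public boolean tableExists(String i) { return tables.contains(i); }
    public void createView(String i) {
        if (views.contains(i) || tables.contains(i)) throw new ViewAlreadyExists(i);
        views.add(i);
    }
    public void createOrReplaceView(String i) { views.add(i); }
}
```

The subclasses then only differ in what ViewInfo they build, not in this control flow.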

2. First-class METRIC_VIEW table type.

  • CatalogTableType.METRIC_VIEW is added alongside EXTERNAL / MANAGED / VIEW.
  • TableSummary.METRIC_VIEW_TABLE_TYPE = "METRIC_VIEW" constant for the V2 surface.
  • The previous view.viewWithMetrics property hack is removed; CatalogTable.isMetricView checks tableType == METRIC_VIEW directly.
  • V1Table.summarizeTableType and V1Table.toCatalogTable(catalog, ident, ViewInfo) translate between the V2 property form and the V1 enum.
  • HMS round-trip support: HiveTableType has no METRIC_VIEW variant (both regular views and metric views serialize as VIRTUAL_VIEW). HiveExternalCatalog now persists a view.subType = METRIC_VIEW property on write and lifts tableType back to METRIC_VIEW on read, so HMS-backed metric views survive the round trip.

3. Repo-wide tableType == VIEW audit + CatalogTable.isViewLike helper. Promoting metric views to a distinct CatalogTableType risks silent regressions wherever existing code branches on VIEW. To consolidate the audit and reduce divergence from the Databricks Runtime (which has the same helper), this PR introduces:

  • CatalogTable.isViewLike instance method (DBR parity: today returns tableType == VIEW || tableType == METRIC_VIEW; forks may extend the set).
  • CatalogTable.isViewLike(t: CatalogTableType) companion form for the few sites that have a CatalogTableType but no CatalogTable (e.g. SessionCatalog.isView, verifyAlterTableType, HiveClientImpl.toHiveTableType).

All 18 sites in catalyst / core / hive that previously did inline tableType == VIEW || tableType == METRIC_VIEW (or the CatalogTableType.VIEW | CatalogTableType.METRIC_VIEW pattern alternation) are now routed through these helpers, so adding a new view-like type in the future is a one-line change in the helper body. Notable touched call sites: CatalogTable.toJsonLinkedHashMap (DESCRIBE EXTENDED rows), HiveExternalCatalog.{createTable, alterTable, restoreTableMetadata}, HiveClientImpl.toHiveTableType, SessionCatalog.isView, InMemoryCatalog.listViews, RelationResolution, Analyzer.lookupTableOrView, rules.scala, DataStreamWriter, DescribeRelationJsonCommand, AnalyzeColumnCommand, AnalyzePartitionCommand, CommandUtils.analyzeTable, V2SessionCatalog.dropTableInternal, verifyAlterTableType in ddl.scala, and 3 sites in tables.scala.
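A minimal sketch of the helper's shape (the enum values mirror the CatalogTableType names above; the surrounding class is illustrative):

```java
// Sketch of the isViewLike consolidation; enum values mirror the
// CatalogTableType names in the description, the helper shape is illustrative.
enum SketchCatalogTableType { EXTERNAL, MANAGED, VIEW, METRIC_VIEW }

class CatalogTableTypes {
    // One-line body: adding a future view-like type (e.g. a fork's
    // MATERIALIZED_VIEW) means editing only this method, not 18 call sites.
    static boolean isViewLike(SketchCatalogTableType t) {
        return t == SketchCatalogTableType.VIEW
            || t == SketchCatalogTableType.METRIC_VIEW;
    }
}
```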

Explicit rejection (uniform error class): SHOW CREATE TABLE on a metric view has no round-trippable CREATE VIEW ... WITH METRICS form, so it is rejected explicitly with the dedicated UNSUPPORTED_SHOW_CREATE_TABLE.ON_METRIC_VIEW error class on both the v1 session-catalog path (in tables.scala) and the v2 catalog path (in DataSourceV2Strategy); users see the same actionable message regardless of catalog kind.

4. Drop-command parity.

  • DropTableCommand (v1 path) treats both VIEW and METRIC_VIEW as views: DROP TABLE rejects either with wrongCommandForObjectTypeError, and DROP VIEW accepts either.
  • V2SessionCatalog.dropTableInternal extends the existing "view rejected from DROP TABLE" guard to cover METRIC_VIEW.
  • For non-session v2: DropTableExec (post-SPARK-56655) actively rejects with WRONG_COMMAND_FOR_OBJECT_TYPE ("Use DROP VIEW instead") when a view sits at the ident -- works unchanged for metric views since TableViewCatalog's default viewExists derives from loadTableOrView and recognizes MetadataTable + ViewInfo.
  • ResolveSessionCatalog's DropView routing comment is clarified: v2 metric views fall through to DataSourceV2Strategy and ViewCatalog.dropView.

5. Typed view dependencies (ViewInfo.viewDependencies).

  • New public DTOs in org.apache.spark.sql.connector.catalog: Dependency (sealed interface with Dependency.table(String[]) / Dependency.function(String[]) non-vararg factories), TableDependency, FunctionDependency, DependencyList(Dependency[]).
  • TableDependency and FunctionDependency carry the dependency identifier as structural multi-part name parts (record TableDependency(String[] nameParts)), not a single dot-flattened string. Arity is preserved per source, so multi-level-namespace V2 catalogs (e.g. Iceberg cat.db1.db2.tbl -> 4 parts) round-trip without ambiguity against quoted identifiers containing a literal `.`. V1 sources resolved through the session catalog are normalized by a new MetricViewHelper.qualifyV1 to a stable 3-part [spark_catalog, db, table] shape, so consumers see deterministic arity per source kind (otherwise TableIdentifier.nameParts could return 1, 2, or 3 parts depending on what the analyzer captured).
  • All three records (TableDependency, FunctionDependency, DependencyList) override equals / hashCode / toString using Arrays.equals / Arrays.hashCode / Arrays.toString to give value semantics on their array fields. Without the overrides, records' auto-generated methods on array fields fall through to Object.equals (reference equality), which would make structural multi-part names unusable as Map keys / for dedup. Each record also overrides the canonical accessor to return a defensive clone() so callers cannot mutate the record's internal array.
  • ViewInfo gains a viewDependencies field and a ViewInfo.Builder.withViewDependencies(...) setter. Per the field's contract, null means "no dependency list was supplied" while an empty DependencyList.of(new Dependency[0]) means "supplied but the object has none" -- metric-view CREATE always emits the latter, never the former, even when collectTableDependencies returns empty.
  • MetricViewHelper.collectTableDependencies walks the analyzed plan and emits structural Seq[Seq[String]] parts; the v2 source arm preserves full namespace arity, the v1 source arms (View, HiveTableRelation, LogicalRelation) all route through qualifyV1 for the stable 3-part shape.
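The array-field equality pitfall and its fix can be illustrated with a sketch record (not the actual TableDependency source). A record with an array component needs the explicit overrides because the auto-generated ones compare the array by reference:

```java
import java.util.Arrays;
import java.util.Map;

// Sketch of the described overrides: equals/hashCode/toString delegate to
// Arrays.*, and the canonical accessor hands out a defensive clone so callers
// cannot mutate the record's internal array.
record TableDep(String[] nameParts) {
    TableDep { nameParts = nameParts.clone(); }           // defensive copy in
    @Override public String[] nameParts() { return nameParts.clone(); }  // ...and out
    @Override public boolean equals(Object o) {
        return o instanceof TableDep that && Arrays.equals(nameParts, that.nameParts);
    }
    @Override public int hashCode() { return Arrays.hashCode(nameParts); }
    @Override public String toString() { return "TableDep" + Arrays.toString(nameParts); }
}

// NaiveDep keeps the auto-generated members to show the pitfall: two deps with
// equal contents are NOT equal, because the array compares by reference.
record NaiveDep(String[] nameParts) {}
```

Without the overrides, structurally equal dependencies would be unusable as Map keys or for dedup, exactly as the description notes.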

6. Multi-level-namespace targets for v2 metric views. MetricViewHelper.analyzeMetricViewText previously required a TableIdentifier, capping the metric-view target at 3 name parts. v2 metric views with multi-level-namespace targets (e.g. cat.db1.db2.mv) failed at ident.asTableIdentifier with requiresSinglePartNamespaceError. The helper now takes nameParts: Seq[String] directly; call sites in both the v1 path (CreateMetricViewCommand) and the v2 path (DataSourceV2Strategy) updated. The helper now also returns (LogicalPlan, MetricView) so callers don't have to re-parse the YAML body just to read descriptor properties.
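Item 5's qualifyV1 normalization, now fed by the Seq[String]-based call sites, can be sketched as follows. Only the 3-part [spark_catalog, db, table] output shape comes from the description; the padding rules for 1- and 2-part inputs (and the "default" database name) are assumptions for illustration:

```java
import java.util.List;

// Hedged sketch: pad whatever arity the analyzer captured for a v1 source to
// a stable 3-part [spark_catalog, db, table] shape. The 1- and 2-part padding
// rules here are illustrative assumptions.
class QualifyV1 {
    static List<String> qualify(List<String> parts, String currentDb) {
        switch (parts.size()) {
            case 1:  return List.of("spark_catalog", currentDb, parts.get(0));
            case 2:  return List.of("spark_catalog", parts.get(0), parts.get(1));
            default: return parts;   // already fully qualified
        }
    }
}
```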

7. metric_view.* descriptor properties (v1/v2 parity). MetricView.getProperties produces canonical descriptive properties (metric_view.from.type, metric_view.from.name / metric_view.from.sql, metric_view.where) that both the v1 path (CreateMetricViewCommand.createMetricViewInSessionCatalog) and the v2 path (DataSourceV2Strategy) merge into the view's properties bag, so catalog browsers and tooling see the same descriptor rows in DESCRIBE TABLE EXTENDED regardless of catalog kind. Long values are truncated to Constants.MAXIMUM_PROPERTY_SIZE; the Scaladoc on getProperties calls out that metric_view.from.sql is therefore a descriptive value, not a round-trippable representation -- consumers should re-read the YAML body for the full SQL.

8. ViewInfo constructor cleanup. The metric-view-specific PROP_TABLE_TYPE = METRIC_VIEW special case is dropped from the generic ViewInfo constructor in favor of properties().putIfAbsent(...). Callers that want a more specific kind (e.g. METRIC_VIEW) call BaseBuilder.withTableType(...) before build() -- exercised by CreateV2MetricViewExec via the new V2ViewPreparation.tableType hook.
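The putIfAbsent cleanup amounts to this pattern (names and the property key are illustrative, not ViewInfo's actual fields): a caller-supplied table type wins, and the generic default only fills the gap.

```java
import java.util.HashMap;
import java.util.Map;

// Sketch: instead of a metric-view special case in the constructor, the
// generic default is applied only when the caller didn't set a type.
class ViewInfoSketch {
    static final String PROP_TABLE_TYPE = "table_type";  // illustrative key

    static Map<String, String> build(Map<String, String> props, String defaultType) {
        Map<String, String> out = new HashMap<>(props);
        out.putIfAbsent(PROP_TABLE_TYPE, defaultType);   // keeps e.g. METRIC_VIEW if set
        return out;
    }
}
```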

9. ViewHelper.aliasPlan(retainMetadata). The user-specified-column-with-comment branch in aliasPlan previously dropped existing column metadata. A new retainMetadata: Boolean = false parameter merges the analyzed attribute's metadata into the new comment metadata. ViewHelper.prepareTable passes retainMetadata = isMetricView (v1 path); V2ViewPreparation exposes a retainColumnMetadata hook that CreateV2MetricViewExec overrides to true (v2 path). Both preserve the per-column metric_view.type / metric_view.expr keys that the analyzer attaches to dimensions and measures even when the user renames columns and adds comments.
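The retainMetadata merge can be sketched with plain string maps standing in for Spark's column Metadata; the precedence choice here (the comment metadata wins on key conflicts) is an assumption for illustration:

```java
import java.util.HashMap;
import java.util.Map;

// Sketch of the aliasPlan(retainMetadata) behavior: without retention, the
// user-specified-column-with-comment branch keeps only the comment metadata;
// with retention, the analyzer's metric_view.* keys are merged in.
class AliasMeta {
    static Map<String, String> merge(Map<String, String> analyzed,
                                     Map<String, String> commentMeta,
                                     boolean retainMetadata) {
        if (!retainMetadata) return commentMeta;           // old behavior: keys dropped
        Map<String, String> out = new HashMap<>(analyzed); // keep metric_view.* keys
        out.putAll(commentMeta);                           // comment metadata on top
        return out;
    }
}
```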

10. Error classes.

  • New INVALID_METRIC_VIEW_YAML (sqlState 42K0L). MetricViewPlanner.parseYAML's catch blocks now route through QueryCompilationErrors.invalidMetricViewYamlError instead of SparkException.internalError, so a typo in the user's YAML body surfaces as a user-correctable AnalysisException rather than "please contact support".
  • New UNSUPPORTED_SHOW_CREATE_TABLE.ON_METRIC_VIEW (sqlState 0A000), used by both the v1 session-catalog path and the v2 catalog path so SHOW CREATE TABLE on a metric view produces the same actionable message regardless of catalog kind.

11. Misc.

  • MetricViewCanonical.parseSource accepts multipart identifiers (parseMultipartIdentifier) so 3-part catalog.schema.table source references work as AssetSource.

Why are the changes needed?

Before this PR, metric-view DDL only worked against the session catalog: the create path called SessionCatalog.createTable directly, and there was no way for a third-party v2 catalog (Unity Catalog, Hive Metastore catalog, custom REST catalogs, etc.) to own a metric view's lifecycle. SPARK-52729 / SPARK-56655 shipped ViewCatalog and TableViewCatalog as the public v2 surface for catalog-managed views; metric views are a kind of view and naturally belong on this surface.

Once metric views can live on a v2 catalog, two more constraints surface:

  1. Type discriminator. A consumer reading a row through ViewCatalog.loadView needs to know it's a metric view, not a plain SQL view, so it can render the right UI / planner output. Encoding this in PROP_TABLE_TYPE = METRIC_VIEW keeps the distinction wire-compatible and lets V1Table.toCatalogTable reconstruct CatalogTableType.METRIC_VIEW on the read path.
  2. Structured dependency lineage. Metric views always reference at least one source table; cataloging that lineage as flat string properties or single dot-joined strings loses arity for multi-level namespaces and is ambiguous against quoted identifiers. A typed DependencyList of TableDependency / FunctionDependency with structural String[] nameParts lets catalogs persist the lineage as a first-class field with full fidelity.

The remaining changes (drop-command parity, aliasPlan metadata retention, metric_view.* properties, parseMultipartIdentifier, tableType == VIEW audit + isViewLike helper, multi-level-namespace lift, HMS round-trip marker) are mechanical follow-ups that fall out of supporting metric views as a real CatalogTableType and as a v2 catalog citizen -- without them, basic operations like DROP VIEW, DESCRIBE TABLE EXTENDED, CREATE VIEW (a COMMENT 'c') WITH METRICS ..., or CREATE VIEW cat.db1.db2.mv WITH METRICS ... would silently degrade.

Does this PR introduce any user-facing change?

Yes, both for end users and for catalog plugin developers:

End users:

  • CREATE VIEW <ident> WITH METRICS ... now works against any v2 catalog that implements ViewCatalog, including catalogs with multi-level namespace targets. Previously it was rejected with MISSING_CATALOG_ABILITY.VIEWS for non-session catalogs, and capped at single-level namespaces. IF NOT EXISTS and OR REPLACE are honored on the v2 path (regression vs. v1 fixed).
  • A v2 metric view can be queried with SELECT region, measure(count_sum) FROM <mv> ..., dropped with DROP VIEW, listed via SHOW VIEWS (and via SHOW TABLES on a TableViewCatalog, matching v1 SHOW TABLES output), and described with DESCRIBE TABLE / DESCRIBE TABLE EXTENDED. DROP TABLE on a metric view throws WRONG_COMMAND_FOR_OBJECT_TYPE ("Use DROP VIEW instead").
  • ALTER VIEW <metric_view> RENAME TO ... is wired through RenameV2ViewExec and preserves the metric-view kind across the rename.
  • SHOW CREATE TABLE on a metric view throws UNSUPPORTED_SHOW_CREATE_TABLE.ON_METRIC_VIEW (no round-trippable form yet) on both the v1 and v2 paths -- same error class regardless of catalog kind.
  • Session-catalog metric views are now stored as CatalogTableType.METRIC_VIEW instead of CatalogTableType.VIEW + view.viewWithMetrics=true. Observable in DESCRIBE TABLE EXTENDED's Type row and the tableType column of the tables system table. SQL behavior is unchanged. Hive-metastore-backed metric views also round-trip through HMS via a view.subType = METRIC_VIEW property marker.
  • DESCRIBE TABLE EXTENDED on metric views (v1 and v2) now consistently surfaces metric_view.from.type / metric_view.from.name / metric_view.from.sql / metric_view.where descriptor rows.
  • Error messages from DROP TABLE / DROP VIEW mismatch now mention METRIC_VIEW alongside VIEW.
  • Malformed metric-view YAML now surfaces as INVALID_METRIC_VIEW_YAML (user-correctable) instead of "Spark internal error, please contact support".

Catalog plugin developers:

  • New public API surface in org.apache.spark.sql.connector.catalog: sealed interface Dependency with permits TableDependency, FunctionDependency, both records carrying String[] nameParts. Dependency.table(String[]) / Dependency.function(String[]) static factories (non-vararg per review; callers pass an existing array directly). DependencyList(Dependency[]) with DependencyList.of(Dependency[]) factory. All three records override equals / hashCode / toString to give value semantics on their array fields, and the canonical accessors return a defensive clone() so internal state is not mutable through the public API. All @Evolving, @since 4.2.0. Note: today's only producer in Spark itself is metric-view dependency extraction, which emits TableDependency only; FunctionDependency and Dependency.function(...) are exposed as groundwork for future producers (e.g. SQL UDF dep tracking).
  • ViewInfo gains a typed viewDependencies() accessor and ViewInfo.Builder.withViewDependencies(...) setter. viewDependencies is populated only on the non-session v2 CREATE path; v1 metric views (and v2 metric views read back through V1Table.toCatalogTable) carry null. Catalog plugin authors persisting dependency lineage should treat the field as v2-only for now -- broadening to v1 is a tracked follow-up.
  • TableSummary.METRIC_VIEW_TABLE_TYPE = "METRIC_VIEW" constant.
  • CatalogTableType.METRIC_VIEW enum value (v1 surface).
  • CatalogTable.isViewLike instance + CatalogTable.isViewLike(CatalogTableType) companion helpers (DBR parity helpers for "does this table behave like a view at resolution / DDL time?"). Forks that add their own view-like types (e.g. DBR's MATERIALIZED_VIEW, STREAMING_TABLE) only need to extend the helper body.
  • V2ViewPreparation (private to org.apache.spark.sql.execution.datasources.v2) gains optional viewDependencies / tableType / retainColumnMetadata hooks.

How was this patch tested?

MetricViewV2CatalogSuite -- 31 tests across 5 sections, all against an in-memory ViewCatalog test fixture (MetricViewRecordingCatalog extends InMemoryTableCatalog with TableViewCatalog):

Section 1 -- CREATE-related (11 tests):

  • V2 catalog receives METRIC_VIEW table type and view text via ViewInfo.
  • V2 catalog path populates metric_view.* descriptor properties + view context (currentCatalog / currentNamespace) + captured SQL configs.
  • V2 catalog path captures SQLSource and comment.
  • Metric view columns carry metric_view.type / metric_view.expr in column metadata.
  • User-specified column names with comments preserve metric_view.* metadata (pins the aliasPlan(retainMetadata = true) fix).
  • CREATE OR REPLACE VIEW ... WITH METRICS replaces an existing v2 metric view (asserts on the replacement's distinguishing fields: queryText, metric_view.where, dependencies).
  • CREATE VIEW IF NOT EXISTS ... WITH METRICS is a no-op when the view exists (catalog never sees the second createView call).
  • CREATE VIEW ... WITH METRICS over a v2 table at the ident throws TABLE_OR_VIEW_ALREADY_EXISTS (analyzer-time pre-check).
  • CREATE VIEW IF NOT EXISTS ... WITH METRICS is a no-op when a v2 table sits at the ident (v1 parity).
  • CREATE VIEW ... WITH METRICS on a non-ViewCatalog catalog fails with MISSING_CATALOG_ABILITY.VIEWS.
  • CREATE VIEW ... WITH METRICS at a multi-level-namespace v2 target (testcat.ns_a.ns_b.mv_deep) succeeds (pins the analyzeMetricViewText lift to Seq[String]).

Section 2 -- Dependency extraction (5 tests):

  • SQL source JOIN captures both tables as 3-part nameParts.
  • SQL source subquery deduplicates same-table references.
  • SQL source self-join deduplicates same-table references.
  • V1 session-catalog source emits exactly 3 parts, normalized to [spark_catalog, db, table] by qualifyV1.
  • Multi-level V2 namespace source (testcat.ns_a.ns_b.events_deep) emits 4-part nameParts.

Section 3 -- SELECT cases (5 tests, modeled on MetricViewSuite patterns):

  • SELECT measure(count_sum) FROM <mv> GROUP BY region returns aggregated rows (exercises the full loadTableOrView -> MetadataTable(ViewInfo) -> V1Table.toCatalogTable(ViewInfo) -> ResolveMetricView round-trip).
  • SELECT measure(...) WHERE region = ... -- query-layer filter on top of the view.
  • View's pre-defined where clause is applied (where = Some("count > 1") filters at view-resolution time).
  • Multiple measures with different aggregations (sum / sum / max).
  • ORDER BY measure(...) DESC LIMIT 1 over the metric view.

Section 4 -- DESCRIBE cases (2 tests):

  • DESCRIBE TABLE EXTENDED round-trips through loadTableOrView and emits the View Text / Type rows (gated through CatalogTable.isViewLike in toJsonLinkedHashMap, which now recognizes METRIC_VIEW).
  • DESCRIBE TABLE (non-EXTENDED) returns the aliased columns.

Section 5 -- DROP / SHOW / RENAME cases (8 tests):

  • DROP VIEW succeeds on a v2 metric view.
  • DROP VIEW IF EXISTS on a non-existent v2 metric view is a no-op.
  • DROP TABLE on a v2 metric view throws WRONG_COMMAND_FOR_OBJECT_TYPE ("Use DROP VIEW instead", per SPARK-56655's DropTableExec) and asserts the metric view is not deleted.
  • DROP TABLE IF EXISTS on a v2 metric view also throws (IF EXISTS doesn't silence the wrong-type error, v1 parity).
  • SHOW CREATE TABLE on a v2 metric view throws UNSUPPORTED_SHOW_CREATE_TABLE.ON_METRIC_VIEW (same dedicated error class as the v1 path).
  • SHOW TABLES on a TableViewCatalog lists both tables and metric views (matches v1 SHOW TABLES output per SPARK-56655).
  • SHOW VIEWS lists v2 metric views.
  • ALTER VIEW <metric_view> RENAME TO ... succeeds and preserves the metric-view kind across the rename (pins RenameV2ViewExec end-to-end against the fixture's renameView).

Existing session-catalog metric-view tests (MetricViewSuite, SimpleMetricViewSuite, HiveMetricViewSuite) and v1 path tests pass unchanged. DDLSuite and HiveDDLSuite had their tableTypes enumerations updated to include METRIC_VIEW in two assertion lists. PlanResolutionSuite test fixture was updated to stub the new CatalogTable.isViewLike method on the Mockito mock.

Was this patch authored or co-authored using generative AI tooling?

Generated-by: Cursor (Claude Sonnet 4.7)

@chenwang-databricks chenwang-databricks force-pushed the metric-view-on-51419 branch 3 times, most recently from 245dc93 to 9190bf0 Compare April 28, 2026 14:47
@chenwang-databricks chenwang-databricks changed the title [Draft] Support METRIC_VIEW creation on V2 catalogs (stacks on #51419) [SPARK-54119] Support METRIC_VIEW creation on V2 catalogs Apr 28, 2026
@chenwang-databricks chenwang-databricks marked this pull request as ready for review April 28, 2026 15:19

@cloud-fan cloud-fan left a comment


Thanks for extending metric views to v2 catalogs. The shape of the change is right — route through ViewCatalog.createView with PROP_TABLE_TYPE = METRIC_VIEW and a typed viewDependencies, and promote METRIC_VIEW to a first-class CatalogTableType. A few significant concerns:

1. v2 path forks instead of reusing V2ViewPreparation and changes behavior on IF NOT EXISTS / OR REPLACE. CreateMetricViewCommand.createMetricViewInV2Catalog is open-coded inside a v1 RunnableCommand and duplicates most of V2ViewPreparation.buildViewInfo, but with materially different behavior: allowExisting and replace are silently dropped on the v2 path (regression vs. v1, which honored ignoreIfExists), there's no viewExists short-circuit, no createOrReplaceView, no cross-type collision decoding (so a table sitting at the ident leaks ViewAlreadyExistsException instead of unsupportedCreateOrReplaceViewOnTableError), and the CharVarcharUtils / SchemaUtils.checkColumnNameDuplication / checkIndeterminateCollationInSchema / owner-stamping work that V2ViewPreparation does is missing. Refactoring this into a CreateV2MetricViewExec (or extending V2ViewPreparation with a viewDependencies / extraProperties hook) plus a DataSourceV2Strategy rule would close all of these gaps in one move.

2. Audit other tableType == VIEW gates for METRIC_VIEW. Promoting metric views to a distinct CatalogTableType opens up silent regressions wherever existing code branches on VIEW. Two I'm fairly sure about: CatalogTable.toJsonLinkedHashMap (interface.scala:670) gates View Text / View Original Text / View Schema Mode / View Catalog and Namespace / SQL Path on tableType == VIEW, so DESCRIBE TABLE EXTENDED on a metric view will lose all of those rows — visible regression vs. the pre-PR viewWithMetrics storage. And HiveExternalCatalog.{createTable, alterTable, restoreTableMetadata} (lines 277, 297, 598, 838) only special-case VIEW, so a Hive-metastore-backed session-catalog metric view will go through the wrong branches on persist and read-back. Tests don't cover Hive-backed metric views, so this would only surface in production. Worth grepping tableType == CatalogTableType.VIEW and tableType == VIEW repo-wide and deciding which sites also need to accept METRIC_VIEW.

3. TableDependency.tableFullName is a single dot-joined string but the Javadoc claims a fixed three-part shape. This breaks down two ways: (a) MetricViewHelper.collectTableDependencies uses unquotedString for V1 sources, which yields a 2-part db.tbl whenever TableIdentifier.catalog is unset, and mkString(".") for V2 sources, which yields N-part names for catalogs with multi-level namespaces (e.g., Iceberg cat.db1.db2.tbl is 4 parts) — so the producer doesn't honor the contract; (b) any consumer parsing the string back via split/join loses arity and is ambiguous against quoted identifiers containing a literal `.`. Since this is brand-new public API, it's the right time to make the storage structural — e.g., TableDependency(String[] nameParts) — rather than committing to a string the producer can't always honor.

4. MetricViewPlanner.parseYAML raises SparkException.internalError for malformed user YAML. Replacing invalidLiteralForWindowDurationError is right, but internalError is the wrong category — a typo in the user's YAML body shouldn't surface as "Spark internal error, please contact support." Should be a QueryCompilationErrors / AnalysisException with a real error class.

Minor stuff (typed DTO sealing/mutability, the ViewInfo constructor's metric-view special case, missing tests for OR REPLACE / IF NOT EXISTS / V1-source dependency / multi-level-namespace V2 source / read-back through loadRelation) is in the inline comments below.

Comment thread sql/catalyst/src/main/java/org/apache/spark/sql/connector/catalog/ViewInfo.java Outdated
Comment thread sql/catalyst/src/main/java/org/apache/spark/sql/connector/catalog/Dependency.java Outdated
chenwang-databricks added a commit to chenwang-databricks/spark that referenced this pull request Apr 29, 2026
…eedback

Address all 4 substantive concerns and 8 inline comments from cloud-fan's
review on PR apache#55487.

V2 path refactor (concern 1, inline 87/111/142):
- New CreateV2MetricViewExec extends V2ViewPreparation (mirrors
  CreateV2ViewExec): viewExists short-circuit on IF NOT EXISTS, OR REPLACE
  via createOrReplaceView, and cross-type collision decoding via
  ViewAlreadyExistsException -> tableExists -> EXPECT_VIEW_NOT_TABLE.
- DataSourceV2Strategy rule routes CreateMetricViewCommand on a
  non-session catalog to the new exec; CreateMetricViewCommand keeps only
  the v1 session-catalog path in run().
- V2ViewPreparation gains optional viewDependencies / tableType hooks the
  metric-view subclass populates; plain views leave them None.
- Picks up CharVarcharUtils.getRawSchema, SchemaUtils.checkColumn-
  NameDuplication, SchemaUtils.checkIndeterminateCollationInSchema, and
  PROP_OWNER stamping for free via the shared trait.

Structural TableDependency / FunctionDependency (concern 3, inline 30/169):
- Replace single-string `tableFullName` with `String[] nameParts`; arity
  is preserved per source, unambiguous against quoted identifiers
  containing literal `.`. Apply same shape to FunctionDependency.
- Dependency.table / .function factories take varargs.
- MetricViewHelper.collectTableDependencies returns Seq[Seq[String]];
  each match arm emits structural parts (V1 sources via
  TableIdentifier.nameParts; V2 sources via catalog +: namespace :+ name).

Sealed Dependency + defensive DependencyList (inline 30 / 40):
- `sealed interface Dependency permits TableDependency, FunctionDependency`
  enforces the documented "is one of" structurally.
- DependencyList canonical constructor and accessor defensively clone the
  array so consumers can't mutate stored ViewInfo dependency state.

ViewInfo constructor cleanup (inline 68):
- Drop the metric-view-specific PROP_TABLE_TYPE branch in the generic
  constructor in favor of `properties().putIfAbsent(...)`. Callers that
  want a more specific kind (e.g. METRIC_VIEW) call
  BaseBuilder.withTableType(...) before build() -- exercised by
  CreateV2MetricViewExec via the new V2ViewPreparation tableType hook.

VIEW-gate audit (concern 2, inline 1088):
- CatalogTable.toJsonLinkedHashMap (interface.scala:670) accepts
  METRIC_VIEW so DESCRIBE TABLE EXTENDED still emits "View Text" /
  "View Original Text" / "View Schema Mode" / "View Catalog and
  Namespace" / "SQL Path" rows for metric views.
- HiveExternalCatalog: 4 sites (createTable empty-schema fallback x2,
  alterTable VIEW path, restoreTableMetadata read-back) accept METRIC_VIEW
  so a Hive-metastore-backed session-catalog metric view round-trips.
- Repo-wide: SessionCatalog.isView, InMemoryCatalog.listViews,
  RelationResolution.createDataSourceV1Scan (streaming-on-view rejection),
  Analyzer.lookupTableOrView (ResolvedPersistentView wrapping),
  rules.saveDataIntoView, DataStreamWriter.writeToV1Table,
  DescribeRelationJsonCommand.describePartitionInfoJson,
  AnalyzeColumnCommand, AnalyzePartitionCommand, CommandUtils.analyzeTable,
  tables.CreateTableLike provider lookup,
  tables.AlterTableAddColumnsCommand, tables.describeDetailedPartitionInfo,
  tables.ShowCreateTableCommand (3 sites) -- all extended to accept
  METRIC_VIEW.

Real error class for malformed YAML (concern 4, inline 70):
- New INVALID_METRIC_VIEW_YAML error condition (sqlState 42K0L) +
  QueryCompilationErrors.invalidMetricViewYamlError. Replaces
  SparkException.internalError so a typo in the user's YAML body surfaces
  as a user-correctable AnalysisException.

Tests (inline 39):
- CREATE OR REPLACE VIEW WITH METRICS replaces the existing view.
- CREATE VIEW IF NOT EXISTS WITH METRICS is a no-op when the view exists.
- CREATE VIEW WITH METRICS over a v2 table at the ident throws
  EXPECT_VIEW_NOT_TABLE.NO_ALTERNATIVE.
- CREATE VIEW IF NOT EXISTS WITH METRICS is a no-op when a v2 table sits
  at the ident.
- V1 session-catalog source dependency: nameParts arity 2-3.
- Multi-level V2 namespace source dependency: nameParts arity N+2.
- DESCRIBE TABLE EXTENDED on a v2 metric view -- read-back through
  loadRelation -> MetadataOnlyTable -> V1Table.toCatalogTable(ViewInfo).
- Existing dep tests updated to assert nameParts instead of tableFullName.

Misc (inline 178):
- Scaladoc on MetricView.getProperties calls out that
  metric_view.from.sql / metric_view.where are truncated to
  Constants.MAXIMUM_PROPERTY_SIZE and are descriptive (not round-trippable)
  values; consumers should re-read the YAML body for full SQL.
Comment thread sql/core/src/main/scala/org/apache/spark/sql/execution/command/tables.scala Outdated
chenwang-databricks added a commit to chenwang-databricks/spark that referenced this pull request Apr 29, 2026
…e#55487

CI fix:
- Restore the missing `"MISSING_CATALOG_ABILITY.VIEWS") {` line in
  `MetricViewV2CatalogSuite.scala` whose absence broke the test file's
  parse (compileIncremental "four errors found" -- ')' expected, unmatched
  '}', unclosed multi-line string, '}' expected at EOF).

Self-review feedback:
- Shorten `viewDependencies` and `tableType` Scaladocs on
  `V2ViewPreparation` to one sentence each; drop the plain-view /
  metric-view distinctions.
- Consolidate the duplicated `run()` body from `CreateV2ViewExec` and
  `CreateV2MetricViewExec` into a new `V2CreateViewPreparation` trait
  (extends `V2ViewPreparation`) that owns the shared CREATE-side
  flow: `viewExists` short-circuit, `OR REPLACE` via
  `createOrReplaceView`, and `ViewAlreadyExistsException` decoding for
  cross-type collisions. Both subclasses now extend
  `V2CreateViewPreparation` and inherit `run()` unchanged. The intermediate
  trait keeps `V2ViewPreparation` (also used by `AlterV2ViewExec`) free of
  CREATE-only fields.
- Drop the defensive `case _: ResolvedIdentifier =>` branch in
  `CreateMetricViewCommand.run` -- the catch-all internal-error case is
  sufficient now that `DataSourceV2Strategy` reliably intercepts the
  non-session path.
- `SHOW CREATE TABLE` on a metric view now throws the dedicated
  `UNSUPPORTED_SHOW_CREATE_TABLE.ON_METRIC_VIEW` (sqlState 0A000) instead
  of falling through to the VIEW branch. Reverts the three METRIC_VIEW
  additions in `tables.scala`'s VIEW gates from the previous round.

Test reorg + new coverage in `MetricViewV2CatalogSuite`:
- Reorganized into 5 sections: (1) Create-related, (2) Dependency
  extraction, (3) SELECT cases, (4) DESCRIBE cases, (5) DROP / SHOW.
- Trimmed the `OR REPLACE` test's `!== firstYaml` assertion; replaced
  with positive assertions on the captured ViewInfo's queryText,
  metric_view.where, and viewDependencies (parts shape).
- New `SELECT from a v2 metric view returns aggregated rows` -- exercises
  the `loadRelation` -> `MetadataOnlyTable(ViewInfo)` ->
  `V1Table.toCatalogTable(ViewInfo)` -> `ResolveMetricView` round-trip
  and asserts on the aggregated output rows.
- New `DESCRIBE TABLE on a v2 metric view returns the aliased columns`
  (non-EXTENDED variant alongside the existing EXTENDED test).
- New `SHOW TABLES does not list v2 metric views` (RelationCatalog
  contract: tables only).
- New `SHOW VIEWS lists v2 metric views`.
chenwang-databricks added a commit to chenwang-databricks/spark that referenced this pull request Apr 29, 2026
… TABLE coverage

Add the test cases requested on PR apache#55487:

SELECT read-back (section 3) -- 4 new tests modeled on `MetricViewSuite`
patterns:
- Existing test fixed to use `measure(count_sum)` (the v1 suite shows the
  required syntax for measure columns) and switched to `checkAnswer`
  against an equivalent raw aggregation over the source.
- `SELECT measure(...) with a WHERE clause on a dimension` -- exercises a
  filter at the query layer.
- `SELECT against a v2 metric view honors the view's pre-defined where
  clause` -- creates the metric view with `where = Some("count > 1")`,
  asserts only the matching rows surface.
- `SELECT from a v2 metric view supports multiple measures with different
  aggregations` -- adds price_sum / price_max alongside count_sum.
- `SELECT from a v2 metric view supports ORDER BY and LIMIT on measures`.

DROP TABLE on a v2 metric view (section 5) -- 2 new tests:
- `DROP TABLE on a v2 metric view throws EXPECT_TABLE_NOT_VIEW` --
  `DropTableExec`'s `tableExists` probe returns false (RelationCatalog
  passive filtering), the `viewExists` fallback returns true, so the exec
  emits `EXPECT_TABLE_NOT_VIEW.NO_ALTERNATIVE`. Also asserts the metric
  view is still present in the catalog after the failed DROP.
- `DROP TABLE IF EXISTS on a v2 metric view also throws
  EXPECT_TABLE_NOT_VIEW` -- IF EXISTS does not silence the wrong-type
  error (v1 parity).
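
The probe order the two DROP tests exercise can be condensed to a few lines (a hedged sketch, not `DropTableExec` itself; the string return codes stand in for the raised error conditions):

```java
import java.util.Set;

// tableExists is probed first; the viewExists fallback raises the
// wrong-type error, and IF EXISTS only silences the not-found case.
class DropTableSketch {
    static String dropTable(Set<String> tables, Set<String> views,
                            String name, boolean ifExists) {
        if (tables.contains(name)) return "dropped";
        if (views.contains(name)) return "EXPECT_TABLE_NOT_VIEW"; // even with IF EXISTS
        return ifExists ? "no-op" : "TABLE_OR_VIEW_NOT_FOUND";
    }
}
```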

SHOW CREATE TABLE on a v2 metric view (section 5) -- 1 new test:
- `SHOW CREATE TABLE on a v2 metric view is unsupported` -- routes
  through `DataSourceV2Strategy`'s `ResolvedPersistentView` case and
  throws `UNSUPPORTED_FEATURE.TABLE_OPERATION` with operation
  "SHOW CREATE TABLE".
chenwang-databricks added a commit to chenwang-databricks/spark that referenced this pull request Apr 30, 2026
…eedback

Address all 4 substantive concerns and 8 inline comments from cloud-fan's
review on PR apache#55487.

V2 path refactor (concern 1, inline 87/111/142):
- New CreateV2MetricViewExec extends V2ViewPreparation (mirrors
  CreateV2ViewExec): viewExists short-circuit on IF NOT EXISTS, OR REPLACE
  via createOrReplaceView, and cross-type collision decoding via
  ViewAlreadyExistsException -> tableExists -> EXPECT_VIEW_NOT_TABLE.
- DataSourceV2Strategy rule routes CreateMetricViewCommand on a
  non-session catalog to the new exec; CreateMetricViewCommand keeps only
  the v1 session-catalog path in run().
- V2ViewPreparation gains optional viewDependencies / tableType hooks the
  metric-view subclass populates; plain views leave them None.
- Picks up CharVarcharUtils.getRawSchema, SchemaUtils.checkColumn-
  NameDuplication, SchemaUtils.checkIndeterminateCollationInSchema, and
  PROP_OWNER stamping for free via the shared trait.

Structural TableDependency / FunctionDependency (concern 3, inline 30/169):
- Replace single-string `tableFullName` with `String[] nameParts`; arity
  is preserved per source, unambiguous against quoted identifiers
  containing literal `.`. Apply same shape to FunctionDependency.
- Dependency.table / .function factories take varargs.
- MetricViewHelper.collectTableDependencies returns Seq[Seq[String]];
  each match arm emits structural parts (V1 sources via
  TableIdentifier.nameParts; V2 sources via catalog +: namespace :+ name).
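
Why structural parts beat a dot-joined string: once flattened, a quoted identifier containing a literal `.` is indistinguishable from an extra namespace level. A small self-contained illustration (helper names are hypothetical):

```java
import java.util.ArrayList;
import java.util.List;

class NamePartsSketch {
    // The old single-string form loses arity.
    static String flatten(List<String> parts) {
        return String.join(".", parts);
    }

    // The V2 shape: catalog +: namespace :+ name.
    static List<String> v2Parts(String catalog, List<String> namespace, String name) {
        List<String> parts = new ArrayList<>();
        parts.add(catalog);
        parts.addAll(namespace);
        parts.add(name);
        return parts;
    }
}
```

`["cat", "db.x", "t"]` and `["cat", "db", "x", "t"]` flatten to the same string but stay distinct as parts.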

Sealed Dependency + defensive DependencyList (inline 30 / 40):
- `sealed interface Dependency permits TableDependency, FunctionDependency`
  enforces the documented "is one of" structurally.
- DependencyList canonical constructor and accessor defensively clone the
  array so consumers can't mutate stored ViewInfo dependency state.
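
The sealed "is one of" contract plus the defensive clone look roughly like this (illustrative stand-ins for the connector API, with records instead of the real classes):

```java
class DependencyListSketch {
    // The sealed interface structurally enforces the documented "is one of".
    sealed interface Dependency permits TableDependency, FunctionDependency {}

    record TableDependency(String[] nameParts) implements Dependency {}
    record FunctionDependency(String[] nameParts) implements Dependency {}

    static final class DependencyList {
        private final Dependency[] deps;

        DependencyList(Dependency[] deps) {
            this.deps = deps.clone();   // copy on the way in...
        }

        Dependency[] dependencies() {
            return deps.clone();        // ...and on the way out
        }
    }
}
```

Neither the caller's input array nor a returned array can mutate the stored dependency state.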

ViewInfo constructor cleanup (inline 68):
- Drop the metric-view-specific PROP_TABLE_TYPE branch in the generic
  constructor in favor of `properties().putIfAbsent(...)`. Callers that
  want a more specific kind (e.g. METRIC_VIEW) call
  BaseBuilder.withTableType(...) before build() -- exercised by
  CreateV2MetricViewExec via the new V2ViewPreparation tableType hook.
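
The `putIfAbsent` defaulting pattern in a few lines (a sketch with hypothetical names, not the real ViewInfo builder):

```java
import java.util.HashMap;
import java.util.Map;

class ViewInfoBuilderSketch {
    static final String PROP_TABLE_TYPE = "table_type";
    private final Map<String, String> props = new HashMap<>();

    ViewInfoBuilderSketch withTableType(String type) {
        props.put(PROP_TABLE_TYPE, type);
        return this;
    }

    Map<String, String> build() {
        // The generic default is stamped only when the caller did not
        // already set a more specific kind.
        props.putIfAbsent(PROP_TABLE_TYPE, "VIEW");
        return props;
    }
}
```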

VIEW-gate audit (concern 2, inline 1088):
- CatalogTable.toJsonLinkedHashMap (interface.scala:670) accepts
  METRIC_VIEW so DESCRIBE TABLE EXTENDED still emits "View Text" /
  "View Original Text" / "View Schema Mode" / "View Catalog and
  Namespace" / "SQL Path" rows for metric views.
- HiveExternalCatalog: 4 sites (createTable empty-schema fallback x2,
  alterTable VIEW path, restoreTableMetadata read-back) accept METRIC_VIEW
  so a Hive-metastore-backed session-catalog metric view round-trips.
- Repo-wide: SessionCatalog.isView, InMemoryCatalog.listViews,
  RelationResolution.createDataSourceV1Scan (streaming-on-view rejection),
  Analyzer.lookupTableOrView (ResolvedPersistentView wrapping),
  rules.saveDataIntoView, DataStreamWriter.writeToV1Table,
  DescribeRelationJsonCommand.describePartitionInfoJson,
  AnalyzeColumnCommand, AnalyzePartitionCommand, CommandUtils.analyzeTable,
  tables.CreateTableLike provider lookup,
  tables.AlterTableAddColumnsCommand, tables.describeDetailedPartitionInfo,
  tables.ShowCreateTableCommand (3 sites) -- all extended to accept
  METRIC_VIEW.

Real error class for malformed YAML (concern 4, inline 70):
- New INVALID_METRIC_VIEW_YAML error condition (sqlState 42K0L) +
  QueryCompilationErrors.invalidMetricViewYamlError. Replaces
  SparkException.internalError so a typo in the user's YAML body surfaces
  as a user-correctable AnalysisException.
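
The shape of the change, sketched with a trivial stand-in validity check (no real YAML parsing, and `AnalysisLikeException` is a hypothetical name):

```java
class MetricViewYamlSketch {
    static class AnalysisLikeException extends RuntimeException {
        final String condition;
        AnalysisLikeException(String condition, String msg) {
            super(msg);
            this.condition = condition;
        }
    }

    static String parse(String body) {
        if (!body.contains("version:")) {   // stand-in for real YAML validation
            // Surfaced as a named, user-correctable condition rather than
            // an internal error.
            throw new AnalysisLikeException(
                "INVALID_METRIC_VIEW_YAML", "malformed metric view body");
        }
        return body;
    }
}
```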

Tests (inline 39):
- CREATE OR REPLACE VIEW WITH METRICS replaces the existing view.
- CREATE VIEW IF NOT EXISTS WITH METRICS is a no-op when the view exists.
- CREATE VIEW WITH METRICS over a v2 table at the ident throws
  EXPECT_VIEW_NOT_TABLE.NO_ALTERNATIVE.
- CREATE VIEW IF NOT EXISTS WITH METRICS is a no-op when a v2 table sits
  at the ident.
- V1 session-catalog source dependency: nameParts arity 2-3.
- Multi-level V2 namespace source dependency: nameParts arity N+2.
- DESCRIBE TABLE EXTENDED on a v2 metric view -- read-back through
  loadRelation -> MetadataOnlyTable -> V1Table.toCatalogTable(ViewInfo).
- Existing dep tests updated to assert nameParts instead of tableFullName.

Misc (inline 178):
- Scaladoc on MetricView.getProperties calls out that
  metric_view.from.sql / metric_view.where are truncated to
  Constants.MAXIMUM_PROPERTY_SIZE and are descriptive (not round-trippable)
  values; consumers should re-read the YAML body for full SQL.

@cloud-fan cloud-fan left a comment


Round 2 review.

Status: 13 prior findings addressed (4 main concerns + 9 inline), 0 remaining, 4 new (3 newly introduced in commits since the prior review at 8f62efc6, 1 late catch). The shared V2ViewPreparation / V2CreateViewPreparation refactor reads cleanly, the typed Dependency / DependencyList API replaces the dot-joined-string form, the INVALID_METRIC_VIEW_YAML error is in place, and the tableType == VIEW audit covers most sites. Four follow-ups inline.

Comment thread sql/core/src/main/scala/org/apache/spark/sql/execution/command/ddl.scala Outdated
Comment thread sql/api/src/main/scala/org/apache/spark/sql/types/StructField.scala Outdated
Comment thread sql/catalyst/src/main/java/org/apache/spark/sql/connector/catalog/Column.java Outdated
chenwang-databricks added a commit to chenwang-databricks/spark that referenced this pull request May 1, 2026
…pache#55487

Address the four new findings from cloud-fan's round 2 review of
PR apache#55487 plus the corresponding CI fixes. Combines the revert of
the StructField/Column public-API commit (per cloud-fan's split
request) with the three remaining metric-view fixes into one commit.

Cloud-fan finding #1 (`verifyAlterTableType` audit miss):

* `ddl.scala` `verifyAlterTableType` (lines 1086-1103) was missed in
  the `tableType == VIEW` -> `VIEW || METRIC_VIEW` audit. Without
  this fix, `ALTER TABLE <metric_view> RENAME TO ...` silently
  succeeds (renames as if a regular table), and
  `ALTER VIEW <metric_view> RENAME TO ...` throws
  `cannotAlterTableWithAlterViewError` (wrong category for a view-
  kind). Widen both arms to also accept `METRIC_VIEW`.

Cloud-fan findings #2/#3 (StructField commit split-out):

* Revert d55143f ("Public StructField.json / .fromJson +
  Column.fromStructField"). None of the three new public APIs are
  called by code in this PR -- they exist to support the downstream
  Unity Catalog connector. Per cloud-fan's request, split them out
  into a separate PR with its own JIRA ticket so they can be
  reviewed independently. Reverting the commit also removes the
  `Column.fromStructField` divergence from `structFieldToV2Column`
  (which would have duplicated the comment field in
  `metadataInJSON`); the split-out PR fixes that inline.

  The revert also fixes two CI failures introduced by the same
  commit:
  - `mimaReportBinaryIssues` losing `StructField.tupled` /
    `StructField.curried` / `AbstractFunction4` parent on the
    `StructField` companion object (sql-api 4.0.0 binary surface).
  - `Javaunidoc` failing on the `{@link StructField.json}` /
    `{@link StructField.fromJson}` references that javadoc rejects
    (member separator must be `#`, not `.`).

Cloud-fan finding #4 (multi-level namespace lift):

* `MetricViewHelper.analyzeMetricViewText` previously required a
  `TableIdentifier`, which can only carry up to 3 name parts. v2
  metric views with multi-level-namespace targets (e.g.
  `cat.db1.db2.mv`) failed at `ident.asTableIdentifier` with
  `requiresSinglePartNamespaceError`. Lift the helper to take
  `nameParts: Seq[String]` directly. The synthetic `CatalogTable`
  used as analysis context still needs a TableIdentifier (it's a
  v1 type); we collapse the intermediate namespace components into
  the synthetic `database` slot via dot-join. Since the synthetic
  identifier is never used to resolve the view body itself, this
  collapse is observationally invisible to the analyzed plan, and
  `verifyTemporaryObjectsNotExists` still receives the full
  `nameParts` so error rendering preserves the multi-part form.

  Update both call sites:
  - `CreateMetricViewCommand.createMetricViewInSessionCatalog` (v1)
    passes `name.nameParts`.
  - `DataSourceV2Strategy` (v2) passes
    `(catalog.name() +: ident.namespace() :+ ident.name())`.

  Add a new test in `MetricViewV2CatalogSuite` that creates a
  metric view at `testcat.ns_a.ns_b.mv_deep` and asserts the create
  succeeds + the captured `ViewInfo` carries
  `PROP_TABLE_TYPE = METRIC_VIEW`.
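
The dot-join collapse into the 3-slot v1 identifier shape can be sketched as follows (assuming a catalog-qualified identifier of at least 3 parts; `V1Ident` is an illustrative stand-in for `TableIdentifier`):

```java
import java.util.List;

class IdentifierCollapseSketch {
    record V1Ident(String catalog, String database, String table) {}

    static V1Ident collapse(List<String> nameParts) {
        int n = nameParts.size();
        // Intermediate namespace levels are dot-joined into the database
        // slot; the full nameParts stay available for error rendering.
        String database = String.join(".", nameParts.subList(1, n - 1));
        return new V1Ident(nameParts.get(0), database, nameParts.get(n - 1));
    }
}
```

So `cat.db1.db2.mv` lands as `(cat, "db1.db2", mv)`, which is safe precisely because the synthetic identifier never resolves the view body.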

Test realignment to SPARK-56655's new contract:

* SPARK-56655 (master) added several v2 view DDL behaviors that
  changed the error categories surfaced for cross-type collisions
  and added view-aware listing on `TableViewCatalog`. Realign 5
  existing test assertions in `MetricViewV2CatalogSuite`:
  - "CREATE VIEW ... WITH METRICS over a v2 table at the ident":
    `EXPECT_VIEW_NOT_TABLE.NO_ALTERNATIVE` -> `TABLE_OR_VIEW_ALREADY_EXISTS`
    (master's analyzer-time pre-check fires before
    the exec-time decoding in `CreateV2MetricViewExec.run`).
  - "DROP TABLE on a v2 metric view" + "DROP TABLE IF EXISTS on
    a v2 metric view": `EXPECT_TABLE_NOT_VIEW.NO_ALTERNATIVE` ->
    `WRONG_COMMAND_FOR_OBJECT_TYPE` ("Use DROP VIEW instead",
    raised by master's new `DropTableExec.run`).
  - "SHOW TABLES does not list v2 metric views" -> "SHOW TABLES
    on a v2 TableViewCatalog lists both tables and metric views":
    SPARK-56655 routes SHOW TABLES on a `TableViewCatalog` through
    `listRelationSummaries` so views appear alongside tables in
    the output (matches v1 SHOW TABLES on a session catalog).
  - DESCRIBE TABLE EXTENDED `View Text` row: trim before comparing
    to the pre-substitution YAML body (the row carries the raw
    text between the SQL `$$ ... $$` markers, including the
    surrounding `\n`).

* Add an explicit `SHOW CREATE TABLE` rejection guard for v2
  metric views in `DataSourceV2Strategy`. Master's new
  `ShowCreateV2ViewExec` would emit a plain `CREATE VIEW <ident>
  AS <yaml>`, which is not round-trippable as metric-view DDL
  (the right form is `CREATE VIEW <ident> WITH METRICS LANGUAGE
  YAML AS $$ <yaml> $$`). Reject up front with
  `unsupportedTableOperationError("SHOW CREATE TABLE")` rather
  than emit lossy DDL; the existing test already expects this
  exact behavior.

Signed-off-by: chen.wang <chen.wang@databricks.com>
Add support for CREATE VIEW ... WITH METRICS and the corresponding V2 catalog round-trip on top of the ViewCatalog / RelationCatalog APIs introduced by SPARK-52729.

The V2 metric-view create path now builds a ViewInfo and calls ViewCatalog.createView. Metric views are distinguished from plain views with PROP_TABLE_TYPE=METRIC_VIEW, which ViewInfo now preserves when callers intentionally set it. This avoids mutating ViewInfo properties after construction while keeping plain views defaulted to VIEW.

ViewInfo gains a typed viewDependencies field so catalogs can persist structured table/function dependencies without flattening nested lineage into string properties. The metric-view planner collects direct table dependencies from the analyzed source plan and passes them through this field.

Also add first-class CatalogTableType.METRIC_VIEW / TableSummary.METRIC_VIEW_TABLE_TYPE support, V2-to-V1 conversion for MetricView ViewInfo rows, drop-command parity for metric views, metric_view.* descriptive properties, and focused V1/V2 tests for dependency extraction, ViewInfo payloads, DROP VIEW routing, and user-specified column metadata preservation.
…eedback

Address all 4 substantive concerns and 8 inline comments from cloud-fan's
review on PR apache#55487.

V2 path refactor (concern 1, inline 87/111/142):
- New CreateV2MetricViewExec extends V2ViewPreparation (mirrors
  CreateV2ViewExec): viewExists short-circuit on IF NOT EXISTS, OR REPLACE
  via createOrReplaceView, and cross-type collision decoding via
  ViewAlreadyExistsException -> tableExists -> EXPECT_VIEW_NOT_TABLE.
- DataSourceV2Strategy rule routes CreateMetricViewCommand on a
  non-session catalog to the new exec; CreateMetricViewCommand keeps only
  the v1 session-catalog path in run().
- V2ViewPreparation gains optional viewDependencies / tableType hooks the
  metric-view subclass populates; plain views leave them None.
- Picks up CharVarcharUtils.getRawSchema,
  SchemaUtils.checkColumnNameDuplication,
  SchemaUtils.checkIndeterminateCollationInSchema, and
  PROP_OWNER stamping for free via the shared trait.

Structural TableDependency / FunctionDependency (concern 3, inline 30/169):
- Replace single-string `tableFullName` with `String[] nameParts`; arity
  is preserved per source, unambiguous against quoted identifiers
  containing literal `.`. Apply same shape to FunctionDependency.
- Dependency.table / .function factories take varargs.
- MetricViewHelper.collectTableDependencies returns Seq[Seq[String]];
  each match arm emits structural parts (V1 sources via
  TableIdentifier.nameParts; V2 sources via catalog +: namespace :+ name).
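
A small illustration of why the structural shape matters (stand-in code, not the `MetricViewHelper` implementation): once a name is flattened to one dot-joined string, a quoted identifier containing a literal `.` is indistinguishable from a deeper namespace, whereas the part list keeps arity unambiguous.

```java
import java.util.List;

public class NamePartsDemo {
  // Lossy flattening: erases the boundary between name parts.
  static String flatten(List<String> parts) {
    return String.join(".", parts);
  }
}
```

`["cat", "a", "b", "t"]` (a two-level namespace) and `["cat", "a.b", "t"]` (a database literally named `a.b`) flatten to the same string but compare unequal as part lists.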

Sealed Dependency + defensive DependencyList (inline 30 / 40):
- `sealed interface Dependency permits TableDependency, FunctionDependency`
  enforces the documented "is one of" structurally.
- DependencyList canonical constructor and accessor defensively clone the
  array so consumers can't mutate stored ViewInfo dependency state.

ViewInfo constructor cleanup (inline 68):
- Drop the metric-view-specific PROP_TABLE_TYPE branch in the generic
  constructor in favor of `properties().putIfAbsent(...)`. Callers that
  want a more specific kind (e.g. METRIC_VIEW) call
  BaseBuilder.withTableType(...) before build() -- exercised by
  CreateV2MetricViewExec via the new V2ViewPreparation tableType hook.
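
The `putIfAbsent` defaulting can be sketched as follows (the property key is written out as a plain string here; the real constant lives on the view-catalog surface): the generic path stamps `VIEW` only when no more specific kind was pre-set, so a builder that sets `METRIC_VIEW` first keeps it.

```java
import java.util.HashMap;
import java.util.Map;

public class TableTypeDefaultDemo {
  static final String PROP_TABLE_TYPE = "table_type";

  // Default the table type to VIEW without clobbering a caller-set value.
  static Map<String, String> withDefaultType(Map<String, String> props) {
    props.putIfAbsent(PROP_TABLE_TYPE, "VIEW");
    return props;
  }
}
```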

VIEW-gate audit (concern 2, inline 1088):
- CatalogTable.toJsonLinkedHashMap (interface.scala:670) accepts
  METRIC_VIEW so DESCRIBE TABLE EXTENDED still emits "View Text" /
  "View Original Text" / "View Schema Mode" / "View Catalog and
  Namespace" / "SQL Path" rows for metric views.
- HiveExternalCatalog: 4 sites (createTable empty-schema fallback x2,
  alterTable VIEW path, restoreTableMetadata read-back) accept METRIC_VIEW
  so a Hive-metastore-backed session-catalog metric view round-trips.
- Repo-wide: SessionCatalog.isView, InMemoryCatalog.listViews,
  RelationResolution.createDataSourceV1Scan (streaming-on-view rejection),
  Analyzer.lookupTableOrView (ResolvedPersistentView wrapping),
  rules.saveDataIntoView, DataStreamWriter.writeToV1Table,
  DescribeRelationJsonCommand.describePartitionInfoJson,
  AnalyzeColumnCommand, AnalyzePartitionCommand, CommandUtils.analyzeTable,
  tables.CreateTableLike provider lookup,
  tables.AlterTableAddColumnsCommand, tables.describeDetailedPartitionInfo,
  tables.ShowCreateTableCommand (3 sites) -- all extended to accept
  METRIC_VIEW.

Real error class for malformed YAML (concern 4, inline 70):
- New INVALID_METRIC_VIEW_YAML error condition (sqlState 42K0L) +
  QueryCompilationErrors.invalidMetricViewYamlError. Replaces
  SparkException.internalError so a typo in the user's YAML body surfaces
  as a user-correctable AnalysisException.

Tests (inline 39):
- CREATE OR REPLACE VIEW WITH METRICS replaces the existing view.
- CREATE VIEW IF NOT EXISTS WITH METRICS is a no-op when the view exists.
- CREATE VIEW WITH METRICS over a v2 table at the ident throws
  EXPECT_VIEW_NOT_TABLE.NO_ALTERNATIVE.
- CREATE VIEW IF NOT EXISTS WITH METRICS is a no-op when a v2 table sits
  at the ident.
- V1 session-catalog source dependency: nameParts arity 2-3.
- Multi-level V2 namespace source dependency: nameParts arity N+2.
- DESCRIBE TABLE EXTENDED on a v2 metric view -- read-back through
  loadRelation -> MetadataOnlyTable -> V1Table.toCatalogTable(ViewInfo).
- Existing dep tests updated to assert nameParts instead of tableFullName.

Misc (inline 178):
- Scaladoc on MetricView.getProperties calls out that
  metric_view.from.sql / metric_view.where are truncated to
  Constants.MAXIMUM_PROPERTY_SIZE and are descriptive (not round-trippable)
  values; consumers should re-read the YAML body for full SQL.
…e#55487

CI fix:
- Restore the missing `"MISSING_CATALOG_ABILITY.VIEWS") {` line in
  `MetricViewV2CatalogSuite.scala` whose absence broke the test file's
  parse (compileIncremental "four errors found" -- ')' expected, unmatched
  '}', unclosed multi-line string, '}' expected at EOF).

Self-review feedback:
- Shorten `viewDependencies` and `tableType` Scaladocs on
  `V2ViewPreparation` to one sentence each; drop the plain-view /
  metric-view distinctions.
- Consolidate the duplicated `run()` body from `CreateV2ViewExec` and
  `CreateV2MetricViewExec` into a new `V2CreateViewPreparation` trait
  (extends `V2ViewPreparation`) that owns the shared CREATE-side
  flow: `viewExists` short-circuit, `OR REPLACE` via
  `createOrReplaceView`, and `ViewAlreadyExistsException` decoding for
  cross-type collisions. Both subclasses now extend
  `V2CreateViewPreparation` and inherit `run()` unchanged. The intermediate
  trait keeps `V2ViewPreparation` (also used by `AlterV2ViewExec`) free of
  CREATE-only fields.
- Drop the defensive `case _: ResolvedIdentifier =>` branch in
  `CreateMetricViewCommand.run` -- the catch-all internal-error case is
  sufficient now that `DataSourceV2Strategy` reliably intercepts the
  non-session path.
- `SHOW CREATE TABLE` on a metric view now throws the dedicated
  `UNSUPPORTED_SHOW_CREATE_TABLE.ON_METRIC_VIEW` (sqlState 0A000) instead
  of falling through to the VIEW branch. Reverts the three METRIC_VIEW
  additions in `tables.scala`'s VIEW gates from the previous round.

Test reorg + new coverage in `MetricViewV2CatalogSuite`:
- Reorganized into 5 sections: (1) Create-related, (2) Dependency
  extraction, (3) SELECT cases, (4) DESCRIBE cases, (5) DROP / SHOW.
- Trimmed the `OR REPLACE` test's `!== firstYaml` assertion; replaced
  with positive assertions on the captured ViewInfo's queryText,
  metric_view.where, and viewDependencies (parts shape).
- New `SELECT from a v2 metric view returns aggregated rows` -- exercises
  the `loadRelation` -> `MetadataOnlyTable(ViewInfo)` ->
  `V1Table.toCatalogTable(ViewInfo)` -> `ResolveMetricView` round-trip
  and asserts on the aggregated output rows.
- New `DESCRIBE TABLE on a v2 metric view returns the aliased columns`
  (non-EXTENDED variant alongside the existing EXTENDED test).
- New `SHOW TABLES does not list v2 metric views` (RelationCatalog
  contract: tables only).
- New `SHOW VIEWS lists v2 metric views`.
… TABLE coverage

Add the test cases requested on PR apache#55487:

SELECT read-back (section 3) -- 4 new tests modeled on `MetricViewSuite`
patterns:
- Existing test fixed to use `measure(count_sum)` (the v1 suite shows the
  required syntax for measure columns) and switched to `checkAnswer`
  against an equivalent raw aggregation over the source.
- `SELECT measure(...) with a WHERE clause on a dimension` -- exercises a
  filter at the query layer.
- `SELECT against a v2 metric view honors the view's pre-defined where
  clause` -- creates the metric view with `where = Some("count > 1")`,
  asserts only the matching rows surface.
- `SELECT from a v2 metric view supports multiple measures with different
  aggregations` -- adds price_sum / price_max alongside count_sum.
- `SELECT from a v2 metric view supports ORDER BY and LIMIT on measures`.

DROP TABLE on a v2 metric view (section 5) -- 2 new tests:
- `DROP TABLE on a v2 metric view throws EXPECT_TABLE_NOT_VIEW` --
  `DropTableExec`'s `tableExists` probe returns false (RelationCatalog
  passive filtering), the `viewExists` fallback returns true, so the exec
  emits `EXPECT_TABLE_NOT_VIEW.NO_ALTERNATIVE`. Also asserts the metric
  view is still present in the catalog after the failed DROP.
- `DROP TABLE IF EXISTS on a v2 metric view also throws
  EXPECT_TABLE_NOT_VIEW` -- IF EXISTS does not silence the wrong-type
  error (v1 parity).

SHOW CREATE TABLE on a v2 metric view (section 5) -- 1 new test:
- `SHOW CREATE TABLE on a v2 metric view is unsupported` -- routes
  through `DataSourceV2Strategy`'s `ResolvedPersistentView` case and
  throws `UNSUPPORTED_FEATURE.TABLE_OPERATION` with operation
  "SHOW CREATE TABLE".
…ursion, scalastyle

Address four CI failures introduced by the v2 metric-view changes on this PR:

1. `SparkOperation.tableTypeString` (hive-thriftserver) did not know how to map
   the new `CatalogTableType.METRIC_VIEW` to a Hive JDBC table type, causing
   `SparkMetadataOperationSuite` `GET_TABLE_TYPES` to fail with "Unknown table
   type is found: CatalogTableType(METRIC_VIEW)". Map `METRIC_VIEW` to `"VIEW"`
   on the JDBC wire (Hive only knows `TABLE` / `VIEW`).

2. `HiveClientImpl.toHiveTableType` (hive) had the same gap, breaking 19
   `HiveMetricViewSuite` cases that go through HMS persistence on the V1
   session-catalog path. Map `METRIC_VIEW` to `HiveTableType.VIRTUAL_VIEW`.

3. `MetricViewRecordingCatalog.loadRelation` in `MetricViewV2CatalogSuite`
   recursed infinitely. `class MetricViewRecordingCatalog extends
   InMemoryTableCatalog with RelationCatalog` linearizes `RelationCatalog`
   ahead of `InMemoryTableCatalog`, so `super.loadTable` resolved to
   `RelationCatalog.loadTable`, whose default delegates back to
   `loadRelation`. Qualify the super call to `super[InMemoryTableCatalog]`
   to dispatch directly to the table-catalog implementation.

4. Scalastyle import-order violation in `DataSourceV2Strategy.scala`:
   `metricview.serde.MetricViewFactory` was placed before
   `execution.datasources.*`. Reorder to satisfy the Scalastyle import-order
   rule.
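
The wire-level degradation in item 1 amounts to a mapping like the following sketch (method name illustrative, not the `SparkOperation` signature): Hive's JDBC surface only knows `TABLE` and `VIEW`, so `METRIC_VIEW` must degrade to `VIEW`.

```java
public class JdbcTableTypeDemo {
  // Map a Spark catalog table type onto the two Hive JDBC table types.
  static String tableTypeString(String catalogTableType) {
    return switch (catalogTableType) {
      case "VIEW", "METRIC_VIEW" -> "VIEW";
      default -> "TABLE"; // EXTERNAL / MANAGED
    };
  }
}
```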

Signed-off-by: chen.wang <chen.wang@databricks.com>
…ble / loadTableOrView

Adapt `MetricViewV2CatalogSuite` to the renames in SPARK-56655, which closed
out the still-`Evolving` v2 view API surface introduced by SPARK-52729:

- `RelationCatalog` -> `TableViewCatalog`
- `MetadataOnlyTable` -> `MetadataTable`
- `loadRelation` -> `loadTableOrView`
- `ViewCatalog` gained a new abstract `renameView(oldIdent, newIdent)`

Updates:
- Imports: drop `MetadataOnlyTable`, `RelationCatalog`; add `MetadataTable`,
  `TableViewCatalog`.
- `MetricViewRecordingCatalog`: mix in `TableViewCatalog` instead of
  `RelationCatalog`; rename the override of `loadRelation` -> `loadTableOrView`,
  and `MetadataOnlyTable` -> `MetadataTable`. The qualified
  `super[InMemoryTableCatalog].loadTable(ident)` recursion-bypass continues
  to work (the default `loadTable` derived from `TableViewCatalog` still
  delegates to `loadTableOrView`, hence the bypass is still required).
- Add a `renameView(oldIdent, newIdent)` implementation: moves the
  `views` map entry, the `MetricViewRecordingCatalog.capturedViews` entry,
  and rejects via `ViewAlreadyExistsException` / `NoSuchViewException`
  per the `TableViewCatalog` contract.
- Update doc comments and a few test names that mentioned the old API.

Signed-off-by: chen.wang <chen.wang@databricks.com>
…-trip

Five fixes to recover the metric-view test suites after the latest rebase
on apache/spark master:

1. `MetricViewV2CatalogSuite.withTestCatalogTables`: a few negative tests
   pre-create a regular table at `mv` to exercise cross-type collisions,
   but the previous cleanup only did `DROP VIEW IF EXISTS mv`, leaving the
   table behind to poison every subsequent test in the suite (cleanup
   raised `WRONG_COMMAND_FOR_OBJECT_TYPE` because mv was a TABLE, not a
   VIEW). Sweep `mv` as both view and table in cleanup, wrapped in `Try`
   so the symmetric "wrong command for type" errors don't escape.

2. Test assertions on `info.queryText() === yaml`: the captured queryText
   includes the `\n` between the SQL `$$` markers and the YAML body, but
   `MetricViewFactory.toYAML(...)` doesn't emit surrounding newlines. Trim
   both sides before the equality check (3 sites).

3. `MISSING_CATALOG_ABILITY` assertion: SPARK-56655 added the `.VIEWS`
   subclass; the bare error condition no longer surfaces directly. Update
   the assertion to expect `MISSING_CATALOG_ABILITY.VIEWS`.

4. `MetricViewRecordingCatalog.{createView, replaceView}`: the
   `TableViewCatalog` active-rejection contract requires `createView` to
   throw `ViewAlreadyExistsException` (and `replaceView` to throw
   `NoSuchViewException`) when a *table* sits at the ident, not just when
   a view does. The previous override only checked `views`, so the
   pre-table-then-create-view test never raised the cross-type collision
   `CreateV2MetricViewExec` is supposed to decode into
   `EXPECT_VIEW_NOT_TABLE.NO_ALTERNATIVE`. Add a tables-only existence
   probe (bypassing `TableViewCatalog.tableExists`'s default, which would
   recurse through our own `loadTableOrView`) and reject up front.

5. `HiveExternalCatalog` HMS round-trip for metric views: `HiveTableType`
   has no `METRIC_VIEW` variant -- both regular views and metric views
   serialize as `VIRTUAL_VIEW`, and `HiveClientImpl.getTableOption` always
   maps `VIRTUAL_VIEW` back to `CatalogTableType.VIEW`, dropping the
   metric-view distinction on read. Result: `HiveMetricViewSuite` (which
   reuses the same v1 `MetricViewSuite` against
   `HiveExternalCatalog`) saw the YAML body parsed as SQL, raising
   `PARSE_SYNTAX_ERROR` on every test.

   Persist a `view.subType = METRIC_VIEW` property on write
   (`tableMetaToTableProps`) and lift `tableType` back to `METRIC_VIEW`
   on read (`restoreTableMetadata`) when the marker is present. Property
   name follows the existing `view.*` convention so it round-trips
   through HMS like other view metadata.
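
The marker round-trip in item 5 can be sketched like this (the property name is from the text above; the helpers are illustrative, not the `HiveExternalCatalog` signatures): since HMS only persists `VIRTUAL_VIEW`, the metric-view distinction travels as a `view.subType` property and is lifted back on read.

```java
import java.util.Map;

public class SubTypeRoundTripDemo {
  // Write side: stamp the marker property only for metric views.
  static Map<String, String> writeProps(String tableType) {
    return tableType.equals("METRIC_VIEW")
        ? Map.of("view.subType", "METRIC_VIEW")
        : Map.of();
  }

  // Read side: lift VIRTUAL_VIEW back to METRIC_VIEW when the marker is set.
  static String restore(String hiveType, Map<String, String> props) {
    if (!hiveType.equals("VIRTUAL_VIEW")) return "TABLE";
    return "METRIC_VIEW".equals(props.get("view.subType")) ? "METRIC_VIEW" : "VIEW";
  }
}
```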

Also factored the `metric_view.*` column metadata preservation across
user-specified renames into a hookable `retainColumnMetadata: Boolean`
on `V2ViewPreparation` (defaults to `false`); `CreateV2MetricViewExec`
overrides it to `true`. Mirrors what
`ViewHelper.prepareTable(isMetricView = true)` already does on the v1
path, fixing the `key not found: metric_view.type` failure on the v2
"user-specified column names with comments preserve metric_view.*
metadata" test.

Signed-off-by: chen.wang <chen.wang@databricks.com>
…tructField

Add three small public APIs that let v2 catalog connectors (such as the Unity
Catalog Spark connector built on TableViewCatalog) round-trip a Spark
StructField through external storage without reaching for private[sql] helpers
or the singleton-StructType wrap workaround:

- StructField.json / StructField.prettyJson: public counterparts of
  DataType.json / DataType.prettyJson, exposing the existing private[sql]
  jsonValue.
- StructField.fromJson(String): companion-object parser that mirrors
  DataType.fromJson and is the inverse of StructField.json.
- Column.fromStructField(StructField): static factory in the catalog Column
  interface that maps a Spark StructField (with metadata) into a connector
  Column (with metadataInJSON), symmetric to TableInfo.schema() which already
  goes the other way via CatalogV2Util.v2ColumnsToStructType.

Widens DataType.parseStructField from private to private[sql] so the new
StructField.fromJson companion can call it directly.

Tests:
- DataTypeSuite covers StructField round-trip via .json / .fromJson with
  metric_view-style metadata + comment, and an empty-metadata field.
- CatalogV2UtilSuite covers Column.fromStructField for both the
  metadata-bearing and empty-metadata cases.

Signed-off-by: chen.wang <chen.wang@databricks.com>
…pache#55487

Address the four new findings from cloud-fan's round 2 review of
PR apache#55487 plus the corresponding CI fixes. Combines the revert of
the StructField/Column public-API commit (per cloud-fan's split
request) with the three remaining metric-view fixes into one commit.

Cloud-fan finding #1 (`verifyAlterTableType` audit miss):

* `ddl.scala` `verifyAlterTableType` (lines 1086-1103) was missed in
  the `tableType == VIEW` -> `VIEW || METRIC_VIEW` audit. Without
  this fix, `ALTER TABLE <metric_view> RENAME TO ...` silently
  succeeds (renames as if a regular table), and
  `ALTER VIEW <metric_view> RENAME TO ...` throws
  `cannotAlterTableWithAlterViewError` (wrong category for a view-
  kind). Widen both arms to also accept `METRIC_VIEW`.

Cloud-fan findings #2/#3 (StructField commit split-out):

* Revert d55143f ("Public StructField.json / .fromJson +
  Column.fromStructField"). None of the three new public APIs are
  called by code in this PR -- they exist to support the downstream
  Unity Catalog connector. Per cloud-fan's request, split them out
  into a separate PR with its own JIRA ticket so they can be
  reviewed independently. Reverting the commit also removes the
  `Column.fromStructField` divergence from `structFieldToV2Column`
  (which would have duplicated the comment field in
  `metadataInJSON`); the split-out PR fixes that inline.

  The revert also fixes two CI failures introduced by the same
  commit:
  - `mimaReportBinaryIssues` losing `StructField.tupled` /
    `StructField.curried` / `AbstractFunction4` parent on the
    `StructField` companion object (sql-api 4.0.0 binary surface).
  - `Javaunidoc` failing on the `{@link StructField.json}` /
    `{@link StructField.fromJson}` references that javadoc rejects
    (member separator must be `#`, not `.`).

Cloud-fan finding #4 (multi-level namespace lift):

* `MetricViewHelper.analyzeMetricViewText` previously required a
  `TableIdentifier`, which can only carry up to 3 name parts. v2
  metric views with multi-level-namespace targets (e.g.
  `cat.db1.db2.mv`) failed at `ident.asTableIdentifier` with
  `requiresSinglePartNamespaceError`. Lift the helper to take
  `nameParts: Seq[String]` directly. The synthetic `CatalogTable`
  used as analysis context still needs a TableIdentifier (it's a
  v1 type); we collapse the intermediate namespace components into
  the synthetic `database` slot via dot-join. Since the synthetic
  identifier is never used to resolve the view body itself, this
  collapse is observationally invisible to the analyzed plan, and
  `verifyTemporaryObjectsNotExists` still receives the full
  `nameParts` so error rendering preserves the multi-part form.

  Update both call sites:
  - `CreateMetricViewCommand.createMetricViewInSessionCatalog` (v1)
    passes `name.nameParts`.
  - `DataSourceV2Strategy` (v2) passes
    `(catalog.name() +: ident.namespace() :+ ident.name())`.

  Add a new test in `MetricViewV2CatalogSuite` that creates a
  metric view at `testcat.ns_a.ns_b.mv_deep` and asserts the create
  succeeds + the captured `ViewInfo` carries
  `PROP_TABLE_TYPE = METRIC_VIEW`.

Test realignment to SPARK-56655's new contract:

* SPARK-56655 (master) added several v2 view DDL behaviors that
  changed the error categories surfaced for cross-type collisions
  and added view-aware listing on `TableViewCatalog`. Realign 5
  existing test assertions in `MetricViewV2CatalogSuite`:
  - "CREATE VIEW ... WITH METRICS over a v2 table at the ident":
    `EXPECT_VIEW_NOT_TABLE.NO_ALTERNATIVE` ->
    `TABLE_OR_VIEW_ALREADY_EXISTS` (master's analyzer-time pre-check
    fires before the exec-time decoding in `CreateV2MetricViewExec.run`).
  - "DROP TABLE on a v2 metric view" + "DROP TABLE IF EXISTS on
    a v2 metric view": `EXPECT_TABLE_NOT_VIEW.NO_ALTERNATIVE` ->
    `WRONG_COMMAND_FOR_OBJECT_TYPE` ("Use DROP VIEW instead",
    raised by master's new `DropTableExec.run`).
  - "SHOW TABLES does not list v2 metric views" -> "SHOW TABLES
    on a v2 TableViewCatalog lists both tables and metric views":
    SPARK-56655 routes SHOW TABLES on a `TableViewCatalog` through
    `listRelationSummaries` so views appear alongside tables in
    the output (matches v1 SHOW TABLES on a session catalog).
  - DESCRIBE TABLE EXTENDED `View Text` row: trim before comparing
    to the pre-substitution YAML body (the row carries the raw
    text between the SQL `$$ ... $$` markers, including the
    surrounding `\n`).

* Add an explicit `SHOW CREATE TABLE` rejection guard for v2
  metric views in `DataSourceV2Strategy`. Master's new
  `ShowCreateV2ViewExec` would emit a plain `CREATE VIEW <ident>
  AS <yaml>`, which is not round-trippable as metric-view DDL
  (the right form is `CREATE VIEW <ident> WITH METRICS LANGUAGE
  YAML AS $$ <yaml> $$`). Reject up front with
  `unsupportedTableOperationError("SHOW CREATE TABLE")` rather
  than emit lossy DDL; the existing test already expects this
  exact behavior.

Signed-off-by: chen.wang <chen.wang@databricks.com>

@cloud-fan cloud-fan left a comment


Round 3 review.

Status: All 4 prior findings addressed (verifyAlterTableType audit miss, StructField/Column public-API split, Column.fromStructField comment-duplication via revert, multi-level namespace lift in analyzeMetricViewText), 0 remaining, 7 new (1 newly introduced, 6 late catches). The round 3 follow-up commit cleanly addresses the round 2 findings; remaining concerns center on v1/v2 inconsistencies (descriptive properties, error class, dependency-name arity) and API correctness of the new Dependency records.

I'll be honest about the 6 late catches: most have been latent since round 1 — the structural-form switch fixed the format dimension I flagged but not the arity dimension; the records I suggested have a value-semantics issue I missed; the metric_view.* merge has been on the v2 path since the original PR. Apologies for not surfacing these earlier.

val analyzed = MetricViewHelper.analyzeMetricViewText(
session, nameParts, originalText)
val metricView = MetricViewFactory.fromYAML(originalText)
val mergedProps = properties ++ metricView.getProperties

The v2 path merges metric_view.from.type / metric_view.from.name / metric_view.from.sql / metric_view.where into the property bag here (properties ++ metricView.getProperties), but the v1 path in CreateMetricViewCommand.createMetricViewInSessionCatalog (metricViewCommands.scala:74-78) passes properties to prepareTable without the merge. Result: DESCRIBE TABLE EXTENDED on a session-catalog metric view doesn't show these descriptor rows; on a v2 metric view, it does.

The PR description states the merge happens "into the view's properties bag" without a v1/v2 carve-out — please confirm this is intentional, or merge in v1 too.

// would emit a plain `CREATE VIEW <ident> AS <yaml>`, which is not a round-trippable
// metric-view DDL form (the right form is `CREATE VIEW <ident> WITH METRICS LANGUAGE
// YAML AS $$ <yaml> $$`). Reject up front rather than emit lossy DDL.
throw QueryCompilationErrors.unsupportedTableOperationError(

The v1 path uses the dedicated UNSUPPORTED_SHOW_CREATE_TABLE.ON_METRIC_VIEW (tables.scala:1219 → showCreateTableNotSupportedOnMetricViewError); this v2 path uses the generic UNSUPPORTED_FEATURE.TABLE_OPERATION. Same user action, two error classes — the dedicated one carries a more actionable message ("not supported on metric view"). showCreateTableNotSupportedOnMetricViewError takes a String; you can format the multipart name the same way the case below does ((catalog.name() +: ident.asMultipartIdentifier).map(quoteIfNeeded).mkString(".")).
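
The suggested name formatting follows the usual quote-if-needed convention; a simplified sketch (Spark's real `quoteIfNeeded` has more rules, this version only backquotes parts containing a `.` or a backtick, doubling embedded backticks):

```java
import java.util.List;
import java.util.stream.Collectors;

public class QuoteDemo {
  // Backquote a name part only when leaving it bare would be ambiguous.
  static String quoteIfNeeded(String part) {
    if (part.contains(".") || part.contains("`")) {
      return "`" + part.replace("`", "``") + "`";
    }
    return part;
  }

  // Render a multipart identifier as dotted, selectively-quoted text.
  static String format(List<String> parts) {
    return parts.stream().map(QuoteDemo::quoteIfNeeded).collect(Collectors.joining("."));
  }
}
```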

case r: HiveTableRelation =>
tables += r.tableMeta.identifier.nameParts
case r: LogicalRelation if r.catalogTable.isDefined =>
tables += r.catalogTable.get.identifier.nameParts

TableIdentifier.nameParts returns 1, 2, or 3 parts depending on which of catalog / database are set on the resolved CatalogTable.identifier. The same v1 source table can yield [db, tbl] or [spark_catalog, db, tbl] across runs — the test in MetricViewV2CatalogSuite (the V1 source dependency-extraction test) admits this with parts.length >= 2 && parts.length <= 3. The TableDependency Javadoc (TableDependency.java:32-33) acknowledges it, but downstream consumers comparing dependency names structurally won't be able to rely on stable arity. Same issue applies to the View arm (line 105) and HiveTableRelation arm (line 112).

Consider normalizing v1 sources to always emit 3-part [spark_catalog, db, tbl] (prepending the session-catalog name when missing) so consumers see a stable shape per source kind. The test should then assert exactly 3 parts.

val nameParts = (catalog.name() +: ident.namespace().toIndexedSeq) :+ ident.name()
val analyzed = MetricViewHelper.analyzeMetricViewText(
session, nameParts, originalText)
val metricView = MetricViewFactory.fromYAML(originalText)

MetricViewHelper.analyzeMetricViewText (called on line 345) already parses the YAML internally via MetricViewPlanner.parseYAML -> MetricViewFactory.fromYAML. Re-parsing here duplicates the work. analyzeMetricViewText could return (LogicalPlan, MetricView) (or expose the parsed MetricView via the MetricViewPlaceholder it builds) so the strategy doesn't pay the parse twice.

}

case DropTable(r: ResolvedIdentifier, ifExists, purge) =>
val tableCatalog = r.catalog.asTableCatalog

Extracting r.catalog.asTableCatalog into tableCatalog is a no-op refactor unrelated to metric-view support and is not mentioned in the PR description. Please revert or split into a separate cleanup PR — keeps the metric-view PR focused.

* @since 4.2.0
*/
@Evolving
public record DependencyList(Dependency[] dependencies) {

Java records auto-generate equals/hashCode from per-field Objects.equals/Objects.hashCode, which on array fields fall through to Object.equals (reference equality). So DependencyList.of(Dependency.table("a","b")).equals(DependencyList.of(Dependency.table("a","b"))) == false. Same issue on TableDependency.nameParts (TableDependency.java:42) and FunctionDependency.nameParts (FunctionDependency.java:34).

This undermines the "structural multi-part name" intent — consumers can't dedup, compare, or use these as Map keys. Please override equals/hashCode to use Arrays.equals/Arrays.hashCode on each record (and Arrays.deepEquals/Arrays.deepHashCode on DependencyList), or expose List<...> instead of [] to inherit value semantics for free.
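
A minimal reproduction of the finding (stand-in records, not the actual Dependency types): a record with an array component inherits reference equality for that field, while a `List` component compares element-wise.

```java
import java.util.List;

public class RecordEqualityDemo {
  // Generated equals() uses Objects.equals on the array -> reference equality.
  record ArrayDep(String[] nameParts) {}

  // Generated equals() delegates to List.equals -> element-wise comparison.
  record ListDep(List<String> nameParts) {}
}
```

Two `ArrayDep` instances built from identical contents compare unequal; the `ListDep` pair compares equal, which is why switching the fields to `List<>` restores value semantics without hand-written `equals`/`hashCode`.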

where = None,
select = metricViewColumns)
val yaml = MetricViewFactory.toYAML(metricView)
// Give both columns new names, and a comment on each. Without the `retainMetadata`

Two test comments describe behavior in terms of the in-PR change rather than the current invariant — they age poorly post-merge:

  • Lines 253-254: `// ... Without the retainMetadata fix to ViewHelper.aliasPlan, the metric_view.* keys disappear here.` — once merged, there is no "fix" to refer to. Rephrase as a pin: `// Pins aliasPlan(retainMetadata=true): metric_view.* keys must survive a column rename with comments.`
  • Lines 749-752 (DESCRIBE EXTENDED test): // The "Type" row was added alongside this metric-view PR so DescribeV2ViewExec matches v1 parity... — same problem. Rephrase to describe the current invariant being pinned (e.g. DescribeV2ViewExec emits a "Type" row matching v1 parity, so users can distinguish VIEW vs METRIC_VIEW.).

…pache#55487

- Revert the unrelated `tableCatalog` extraction in the `DropTable` strategy
  case (round 2 leftover).
- Single YAML parse per CREATE: `MetricViewHelper.analyzeMetricViewText` now
  returns `(LogicalPlan, MetricView)`, grabbing the parsed `MetricView` off
  the un-analyzed `MetricViewPlaceholder` so callers don't re-parse.
- v1/v2 parity for `metric_view.*` descriptor properties: lift the
  `metricView.getProperties` merge into the v1
  `createMetricViewInSessionCatalog` path so v1 `DESCRIBE TABLE EXTENDED`
  surfaces the same descriptor rows as v2.
- v1/v2 parity for `SHOW CREATE TABLE` error class: switch the v2 path to
  the dedicated `UNSUPPORTED_SHOW_CREATE_TABLE.ON_METRIC_VIEW` error class.
- Normalize v1 source dependencies to a stable 3-part
  `[spark_catalog, db, table]` shape via a new `qualifyV1` helper, applied
  in the `View` / `HiveTableRelation` / `LogicalRelation` arms of
  `collectTableDependencies`. Updates `TableDependency` Javadoc and the V1
  source dep test accordingly.
- Switch `Dependency` / `TableDependency` / `FunctionDependency` /
  `DependencyList` from `String[]` / `Dependency[]` to immutable
  `List<>` so records inherit value `equals` / `hashCode` for free
  (closes the reference-equality bug on array fields). Internal copies use
  `List.copyOf`; static factories still take varargs.
- Rephrase 2 test comments in `MetricViewV2CatalogSuite` to describe the
  pinned invariant rather than the in-PR fix (so they age well post-merge).
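
The `qualifyV1` normalization above can be sketched as follows (hypothetical standalone form; the real helper reads the session's current database rather than taking it as a parameter): v1 identifiers arrive with 1-3 parts and are padded to a stable `[spark_catalog, db, table]` shape.

```java
import java.util.List;

public class QualifyV1Demo {
  // Pad a 1- or 2-part v1 name to the stable 3-part shape; longer names
  // cannot come from v1 sources and pass through untouched.
  static List<String> qualifyV1(List<String> parts, String currentDb) {
    return switch (parts.size()) {
      case 1 -> List.of("spark_catalog", currentDb, parts.get(0));
      case 2 -> List.of("spark_catalog", parts.get(0), parts.get(1));
      default -> parts;
    };
  }
}
```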

Signed-off-by: chen.wang <chen.wang@databricks.com>
Two CI fixes for the previous round-3 review commit:

- Scalastyle: replace the multi-space alignment before `=>` in
  `MetricViewHelper.qualifyV1`'s pattern-match arms with a single space
  (Scalastyle's `NoWhitespaceBeforeLeftBracketChecker` flags lined-up
  `=>` arrows).
- Test: update `MetricViewV2CatalogSuite`'s SHOW CREATE TABLE
  assertion to expect `UNSUPPORTED_SHOW_CREATE_TABLE.ON_METRIC_VIEW`
  (the dedicated error class introduced in the round-3 commit) instead
  of the previously-used generic `UNSUPPORTED_FEATURE.TABLE_OPERATION`.

Other CI job failures on the prior run are unrelated infrastructure
flakes (Kerberos PREAUTH on Docker integration tests, minikube
termination on Kubernetes IT, RemoteClassLoaderError on Connect's
`SparkSessionE2ESuite`).

Signed-off-by: chen.wang <chen.wang@databricks.com>

@cloud-fan cloud-fan left a comment


Round 4 review.

Status: All 7 prior findings addressed (SHOW CREATE TABLE error class parity, metric_view.* v1/v2 parity, Dependency records value-equality via List<>, v1 source dep normalization through qualifyV1, single YAML parse, DropTable revert, test comment phrasing), 0 remaining, 6 new (3 newly introduced in 437b71d79e, 3 late catches I missed earlier). The round-3 follow-up cleanly closes the round-3 findings; remaining concerns are mostly small (text-quality, test gap) plus a few design-level questions worth answering inline before merge.

Apologies again for the late catches -- the FunctionDependency unused-API and the cyclic-view-check skip have been latent since round 1, and the DBR terminology leak has been there since the original commit.

Followup-able (not blocking, worth a tracking ticket):

  • ALTER VIEW <metric_view> AS SELECT ... is silently accepted by AlterV2ViewExec -- existingView.properties carry PROP_TABLE_TYPE = METRIC_VIEW through, so the view ends up classified as METRIC_VIEW with a plain-SQL body; the next SELECT blows up with INVALID_METRIC_VIEW_YAML. Same hole exists on the v1 path (AlterViewAsCommand.alterPermanentView preserves tableType via viewMeta.copy). Not introduced by this PR, but promoting METRIC_VIEW to a first-class type makes the inconsistency more visible.
  • viewDependencies is only populated on the non-session v2 path -- v1 metric views (and v2 metric views read back through V1Table.toCatalogTable) carry null. Probably the right scope cap for this PR, but worth a one-line note in the description so catalog plugin developers know not to rely on it for the v1 surface.
  • No test for ALTER VIEW <metric_view> RENAME TO ... even though RenameV2ViewExec is in the strategy and the fixture's renameView is implemented. Coverage gap.

Comment on lines +348 to +351
val deps = if (depParts.nonEmpty) {
Some(DependencyList.of(
depParts.map(parts => Dependency.table(parts: _*)): _*))
} else None

The empty-depParts case maps to None here, which becomes null on ViewInfo.viewDependencies(). Per the Javadoc on ViewInfo#viewDependencies, null means "no dependency list was supplied" while an empty list means "provided but the object has none." For a metric view we always compute deps, so when collectTableDependencies returns empty (e.g. SQLSource("SELECT 1 AS x")) the right value is Some(DependencyList.of()), not None.

Suggested change
val deps = if (depParts.nonEmpty) {
Some(DependencyList.of(
depParts.map(parts => Dependency.table(parts: _*)): _*))
} else None
val deps = Some(DependencyList.of(
depParts.map(parts => Dependency.table(parts: _*)): _*))
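The null-vs-empty contract being cited here can be sketched with simplified stand-ins (`ViewInfoSketch` and the stripped-down `DependencyList` below are illustrative names, not the actual SPI):

```java
// Illustrative stand-ins only -- not the actual SPI classes.
record DependencyList(String[] deps) {
    static DependencyList of(String... deps) { return new DependencyList(deps); }
}

class ViewInfoSketch {
    // Contract under discussion:
    //   null  => "no dependency list was supplied"
    //   empty => "supplied, but the object has no dependencies"
    private final DependencyList viewDependencies;

    ViewInfoSketch(DependencyList viewDependencies) {
        this.viewDependencies = viewDependencies;
    }

    String describe() {
        if (viewDependencies == null) return "dependencies unknown";
        if (viewDependencies.deps().length == 0) return "no dependencies";
        return viewDependencies.deps().length + " dependencies";
    }

    public static void main(String[] args) {
        // A metric-view CREATE always computes deps, so it should always
        // land in one of the two non-null states.
        System.out.println(new ViewInfoSketch(null).describe());                 // dependencies unknown
        System.out.println(new ViewInfoSketch(DependencyList.of()).describe());  // no dependencies
    }
}
```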

* @since 4.2.0
*/
@Evolving
public sealed interface Dependency permits TableDependency, FunctionDependency {

FunctionDependency is in the sealed permits list and exposed via the Dependency.function(...) factory below, but no producer in this PR ever emits one -- MetricViewHelper.collectTableDependencies only emits TableDependency. Two options: (a) drop FunctionDependency until it has a producer (the @Evolving annotation is meant to evolve before stabilizing, so adding it later is cheap); (b) keep it as groundwork but mention in the PR description so reviewers don't trip on the dead surface, and add a sentence to this class-level Javadoc noting that consumers may receive only TableDependency instances today.

* with single-level namespaces the parts are {@code [catalog, schema, table]}; for a catalog
* with multi-level namespaces (e.g. Iceberg with {@code db1.db2}) the parts are
* {@code [catalog, db1, db2, ..., table]}; for v1 sources resolved through the session
* catalog producers should normalize to {@code [spark_catalog, db, table]} so consumers see

Missing comma -- as written, "session catalog producers" reads as a compound noun.

Suggested change
* catalog producers should normalize to {@code [spark_catalog, db, table]} so consumers see
* catalog, producers should normalize to {@code [spark_catalog, db, table]} so consumers see

ViewHelper.verifyTemporaryObjectsNotExists(
isTemporary = false, name.nameParts, analyzed, Seq.empty)
analyzed
isTemporary = false, nameParts, analyzed, Seq.empty)

analyzeMetricViewText runs verifyTemporaryObjectsNotExists but skips the cyclic-view-reference check that plain CREATE VIEW gets via CheckViewReferences.checkCyclicViewReference (only applied on replace). For CREATE OR REPLACE VIEW <mv> WITH METRICS LANGUAGE YAML AS $$ from: <mv> ... $$ the cycle currently surfaces only on the next SELECT. Pre-existing on the v1 path too (the old createMetricViewInSessionCatalog didn't call it either), so this is a follow-up rather than a blocker for this PR -- but wiring it in here means both v1 and v2 metric-view CREATE paths benefit from one fix.

val info = capturedViewInfo()
val props = info.properties()

// metric_view.* descriptive properties (mirrors DBR SingleSourceMetricView).

"DBR" is internal Databricks-Runtime terminology and shouldn't appear in OSS comments.

Suggested change
// metric_view.* descriptive properties (mirrors DBR SingleSourceMetricView).
// metric_view.* descriptive properties (mirrors the canonical metric-view property layout).

Comment on lines +754 to +755
// emits a "Type" row matching v1 `CatalogTable.toJsonLinkedHashMap` parity, so users
// can distinguish a plain VIEW from a sub-kind like METRIC_VIEW.

Sentence is broken: "Pins DescribeV2ViewExec emits a "Type" row matching v1 CatalogTable.toJsonLinkedHashMap parity ..." reads as if Pins is a verb taking DescribeV2ViewExec as its direct object, and "matching X parity" is awkward.

Suggested change
// emits a "Type" row matching v1 `CatalogTable.toJsonLinkedHashMap` parity, so users
// can distinguish a plain VIEW from a sub-kind like METRIC_VIEW.
// "View Current Catalog" / "View Schema Mode" / etc. rows. Pins that `DescribeV2ViewExec`
// emits a "Type" row for parity with v1 `CatalogTable.toJsonLinkedHashMap`, so users

…pache#55487

- DataSourceV2Strategy: empty dep list -> Some(DependencyList.of()),
  not None, so ViewInfo.viewDependencies semantics match its Javadoc
  ("null = not supplied", "empty = supplied but none"). Metric-view
  CREATE always *computes* deps, so the right empty representation is
  Some(empty list).
- Dependency.java: add Javadoc note that today's producers emit only
  TableDependency; FunctionDependency / Dependency.function(...) are
  groundwork for future producers (e.g. SQL UDF dep tracking).
- TableDependency.java: insert missing comma after "session catalog"
  in the parts-form Javadoc.
- MetricViewV2CatalogSuite: drop "DBR" terminology, fix the broken
  "Pins ..." sentence in the DESCRIBE EXTENDED test, and add an
  ALTER VIEW <metric_view> RENAME TO test that pins the existing
  RenameV2ViewExec wiring (catalog renameView relocates both the
  views entry and the captured ViewInfo entry; metric-view kind
  survives the rename).

Signed-off-by: chen.wang <chen.wang@databricks.com>

@cloud-fan cloud-fan left a comment


Round 5 review.

Status: All 6 prior round-4 findings addressed (R4 inline: Some(DependencyList.of()) not None; FunctionDependency unused-API Javadoc note; TableDependency missing-comma typo; "DBR" terminology leak; broken sentence at line 755 -- plus R4 follow-up #3: ALTER VIEW ... RENAME TO test). 1 remaining (R4 inline #4: analyzeMetricViewText still skips CheckViewReferences.checkCyclicViewReference -- accepted as a follow-up rather than a blocker in round 4). 1 new (newly introduced in e980f24427): a small citation-precision nit on the new DataSourceV2Strategy comment (inline below).

The round-5 commit is small and surgical; nothing else from prior rounds regressed. Approving -- the inline below is a nit, not blocking.

…rces/v2/DataSourceV2Strategy.scala

Co-authored-by: Wenchen Fan <cloud0fan@gmail.com>
* the first element is typically the catalog name and subsequent elements are namespace
* components followed by the table name.
*/
static TableDependency table(String... nameParts) {

A late catch: this API is implemented by users and called by Spark. A var-args API is not recommended here, as it requires Spark to do parts: _*, which is extra work.


String[] would be better.

… isViewLike

Address cloud-fan late catch on PR apache#55487
(apache#55487 (comment)) plus
port DBR's `CatalogTable.isViewLike` helper to OSS to reduce divergence.

- Dependency.table / Dependency.function / DependencyList.of: switch from
  `String... nameParts` / `Dependency... dependencies` varargs to
  `List<String>` / `List<Dependency>` parameters. Eliminates the `: _*`
  splat and transient `String[]` / `Dependency[]` array allocation on
  every call. Records still hold immutable List internally via
  `List.copyOf` in the canonical constructor, so wire / value semantics
  are unchanged.
- DataSourceV2Strategy: update the sole producer call site to construct
  via `.asJava` instead of `: _*`, with an explicit `Seq[Dependency]`
  ascription so the Java invariance + Scala covariance gap is bridged
  before `.asJava`.
- CatalogTable.isViewLike: new instance method (mirrors DBR signature
  exactly so the body is the only OSS/DBR diff: today VIEW or
  METRIC_VIEW; DBR also includes MATERIALIZED_VIEW and STREAMING_TABLE),
  plus a companion `CatalogTable.isViewLike(t: CatalogTableType)` for
  callers that only have a `CatalogTableType` (`SessionCatalog.isView`,
  `ddl.scala::verifyAlterTableType`, drop-table type check, and
  `HiveClientImpl.toHiveTableType`).
- Replace all 17 inline `tableType == VIEW || tableType == METRIC_VIEW`
  disjunctions and 1 `CatalogTableType.VIEW | CatalogTableType.METRIC_VIEW`
  pattern-alternation across catalyst / core / hive with the new helpers.
  Single source of truth -- adding new view-like types in the future is a
  one-line change in `isViewLike`.
- Drop now-unused `CatalogTableType` imports from 6 files where the only
  reference was the just-removed disjunction.

Signed-off-by: chen.wang <chen.wang@databricks.com>
…d-fan

Per apache#55487 review feedback, flip the
Dependency / DependencyList records back to array-shape fields to match
the dominant DSv2 convention (Identifier.namespace, Column.children, etc.)
while preserving the value semantics that the round-3 List<> switch was
introduced to fix.

- TableDependency.nameParts: List<String> -> String[]
- FunctionDependency.nameParts: List<String> -> String[]
- DependencyList.dependencies: List<Dependency> -> Dependency[]
- Each record gains explicit equals / hashCode / toString overrides using
  Arrays.equals / Arrays.hashCode / Arrays.toString so structural names
  still compare value-wise. Without the overrides, records' auto-generated
  methods on array fields fall through to Object.equals (reference equality).
- Each record gains a defensive accessor override returning a clone, so
  callers cannot mutate the record's internal array.
- Static factories Dependency.table / Dependency.function: switch from
  List<String> to String[] (non-vararg, satisfying the same R3189677004
  intent -- no `: _*` splat).
- DependencyList.of: switch from List<Dependency> to Dependency[].
- DataSourceV2Strategy producer: build via .toArray instead of .asJava;
  drop the now-unused CollectionConverters import.
- MetricViewV2CatalogSuite: array-shape access (.length / (0) / .toSeq)
  instead of .size / .get(0) / .asScala.toSeq.
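The reference-equality pitfall this commit works around can be reproduced in a few lines; NaiveName / ValueName below are hypothetical stand-ins for the real records:

```java
import java.util.Arrays;

// Auto-generated record equals() compares an array component by reference.
record NaiveName(String[] parts) {}

// Explicit overrides restore value semantics on the array field.
record ValueName(String[] parts) {
    ValueName { parts = parts.clone(); }  // defensive copy in
    @Override public String[] parts() { return parts.clone(); }  // defensive copy out
    @Override public boolean equals(Object o) {
        return o instanceof ValueName other && Arrays.equals(parts, other.parts);
    }
    @Override public int hashCode() { return Arrays.hashCode(parts); }
    @Override public String toString() { return Arrays.toString(parts); }
}

class RecordArrayEqualityDemo {
    public static void main(String[] args) {
        String[] a = {"cat", "db", "tbl"};
        String[] b = {"cat", "db", "tbl"};
        System.out.println(new NaiveName(a).equals(new NaiveName(b))); // false -- unusable as a Map key
        System.out.println(new ValueName(a).equals(new ValueName(b))); // true  -- structural comparison
    }
}
```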

Also folds in three pre-existing CI failures from `90deee1949c`:

- Dependency.java Javadoc: `{@link #function(String...)}` ->
  `{@link #function(String[])}` so `Run / Documentation generation`
  resolves the new signature.
- MetricViewV2CatalogSuite ALTER VIEW RENAME TO test: drop the catalog
  component from the new name (`ns.mv_renamed`, not
  `testcat.ns.mv_renamed`) per upstream `DataSourceV2SQLSuite`
  convention, so `newName.asIdentifier` doesn't leak the catalog into
  the namespace and the fixture's relocated entry can be found by
  loadTableOrView.
- PlanResolutionSuite `alter view: alter view properties`: stub the new
  `isViewLike` method on the Mockito CatalogTable mock at line 141 so
  Analyzer.scala:1229's call to `v1Table.v1Table.isViewLike` returns the
  correct value computed from the stubbed `tableType`. Without the stub
  Mockito returns false (default for unstubbed Boolean methods), the
  view fixture falls through to the table branch, and the test fails
  with `EXPECT_VIEW_NOT_TABLE.USE_ALTER_TABLE`.

Signed-off-by: chen.wang <chen.wang@databricks.com>

@cloud-fan cloud-fan left a comment


Round 6 review.

Status: 2 addressed (R5 nit on DataSourceV2Strategy comment citation, fixed in fd3d8b2; R5 late catch on Dependency var-args API, fixed in f2251d8). 1 remaining (analyzeMetricViewText cyclic-view-reference follow-up — still scoped out per round 4). 4 new (3 newly introduced — DBR leak in new isViewLike Javadoc, misleading "Body kept in sync" line, null-element rejection regression on the new array records; 1 late catch — single-VIEW audit gap at views.scala).

(GitHub's inline-comment endpoint is returning a transient 422 "internal error" for this PR, so the per-line findings are inlined below as a temporary workaround. I'll re-post them properly once the API recovers.)


1. DBR terminology leak — interface.scala:466 (newly introduced in 90deee1). "DBR" is internal Databricks-Runtime terminology and shouldn't appear in OSS comments — round 4's inline #2 explicitly flagged this for the test comment, round 5 confirmed it was fixed, but porting the helper from the internal fork brought the term back. Drop the parenthetical or rephrase generically:

-   * METRIC_VIEW. Forks may extend this set (e.g. DBR also includes MATERIALIZED_VIEW and
-   * STREAMING_TABLE), so call sites that need a uniform "is this view-like?" check should
+   * METRIC_VIEW. Forks may extend this set with additional view-like types, so call sites
+   * that need a uniform "is this view-like?" check should

2. Misleading "Body kept in sync" line — interface.scala:773-774 (newly introduced in 90deee1). There's only one body to keep in sync because def isViewLike: Boolean = CatalogTable.isViewLike(tableType) (line 470) delegates to this companion form. The note reads as if there are two parallel bodies a future maintainer needs to update together. Drop the line:

-   * patterns or [[org.apache.spark.sql.catalyst.catalog.SessionCatalog.isView]]). Body kept
-   * in sync with the instance method.
+   * patterns or [[org.apache.spark.sql.catalyst.catalog.SessionCatalog.isView]]).

3. Null-element rejection regression — TableDependency.java:55, FunctionDependency.java:47, DependencyList.java:50 (newly introduced in f2251d8). Subtle regression from the List<String> -> String[] switch: the previous List.copyOf(nameParts) rejected null elements with NPE at construction time (immutable-list invariant). nameParts.clone() does not, so Dependency.table(new String[]{null, "tbl"}) now silently constructs a TableDependency whose nameParts()[0] is null, surfacing as an NPE later when a consumer iterates parts. Same pattern in all three records.

Spark's own producer in MetricViewHelper.collectTableDependencies never emits nulls (parts come from analyzed-plan identifiers), but Dependency / DependencyList is @Evolving SPI exposed to third-party catalogs. If you want to preserve the strictness, add a per-element non-null check after the clone in each compact constructor; if relaxing it is fine, document the precondition on the nameParts / dependencies Javadoc. Not blocking either way.
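A minimal reproduction of the behavioral difference (ListName / ArrayName are hypothetical stand-ins, not the real records):

```java
import java.util.Arrays;
import java.util.List;

class NullElementDemo {
    // Hypothetical stand-ins for the SPI records under discussion.
    record ListName(List<String> parts) {
        ListName { parts = List.copyOf(parts); }  // NPE here if any element is null
    }
    record ArrayName(String[] parts) {
        ArrayName { parts = parts.clone(); }      // a null element passes silently
    }

    public static void main(String[] args) {
        try {
            new ListName(Arrays.asList(null, "tbl"));
        } catch (NullPointerException e) {
            System.out.println("rejected at construction");
        }
        // Constructs fine; the NPE is deferred to whichever consumer reads parts later.
        ArrayName lenient = new ArrayName(new String[]{null, "tbl"});
        System.out.println(lenient.parts()[0] == null); // true
    }
}
```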

4. views.scala:175 audit miss — single-VIEW audit gap (late catch). The disjunction-style tableType == VIEW || tableType == METRIC_VIEW was migrated to isViewLike in 14 sites, but CreateViewCommand.runCommand at views.scala:175 still uses the single-VIEW form tableMetadata.tableType != CatalogTableType.VIEW. Pre-PR, a metric view had tableType == VIEW so this branch did not fire on cross-kind collision; post-PR, a metric view has tableType == METRIC_VIEW, so CREATE VIEW <existing_metric_view> (no IF NOT EXISTS) now throws unsupportedCreateOrReplaceViewOnTableError ("… on a table") instead of pre-PR viewAlreadyExistsError. The error message no longer matches reality (the existing object is a view, just a metric one). Either migrate to !isViewLike to recover pre-PR behavior, or keep the strict separation but adjust the error class. Related lower-stakes consistency gap: createDataSourceTables.scala:53,154 (assert(table.tableType != CatalogTableType.VIEW)) silently accepts METRIC_VIEW input where the audit's intent reads as "no view-like types here."

…pache#55487

Address 4 findings from
apache#55487 (review):

- Drop "DBR" terminology leak in CatalogTable.isViewLike Javadoc
  (interface.scala:466) -- newly introduced in round 5; matches the
  resolution of round 4's DBR-mention finding on test comments.
- Drop misleading "Body kept in sync with the instance method" line in
  the CatalogTable.isViewLike(t: CatalogTableType) companion Javadoc
  (interface.scala:773-774) -- there's only one body since the instance
  method delegates to the companion form.
- Document the non-null-element precondition on TableDependency,
  FunctionDependency, and DependencyList @param Javadocs. Round-5b's
  switch from List.copyOf back to Array.clone() lost the implicit
  null-element rejection that the immutable-list invariant gave. Spark's
  own producer never emits nulls, but the SPI is @Evolving and exposed
  to third-party catalogs; documenting the precondition keeps the API
  consistent with other DSv2 array-shape SPIs (Identifier.namespace,
  Column.children) which also trust callers.
- views.scala:175 (CreateViewCommand.runCommand) audit miss: change
  `tableMetadata.tableType != CatalogTableType.VIEW` to
  `!tableMetadata.isViewLike` so `CREATE VIEW <existing_metric_view>`
  surfaces the correct viewAlreadyExistsError instead of the
  misclassifying unsupportedCreateOrReplaceViewOnTableError ("...on a
  table"). Pre-PR this branch didn't fire on metric views (they had
  tableType == VIEW); post-PR they have tableType == METRIC_VIEW, so the
  strict comparison incorrectly classified them as tables. Also widen
  the two assertions in createDataSourceTables.scala
  (`assert(table.tableType != CatalogTableType.VIEW)`) to
  `assert(!table.isViewLike)` per the same pattern -- these are
  invariant sanity checks whose intent is "no view-like input here".

Signed-off-by: chen.wang <chen.wang@databricks.com>

@cloud-fan cloud-fan left a comment


Round 7 review.

Status: 4 addressed (all R6 findings: DBR terminology drop in interface.scala:466, "Body kept in sync" line in interface.scala:773-774, null-element precondition documented on the three array-record @param Javadocs, views.scala:175 audit miss + bonus widening of the two assertions in createDataSourceTables.scala). 0 remaining. 2 new -- both late catches of mine: same isViewLike audit-miss pattern at two more sites in HiveExternalCatalog.scala (605 and 854) that I should have caught alongside views.scala:175 in round 6.

(GitHub's inline-comment endpoint was returning a transient 422 last round; trying inline again this round.)

Co-authored-by: Wenchen Fan <cloud0fan@gmail.com>
@cloud-fan

thanks, merging to master/4.x/4.2 (to finalize the ds v2 view api before release)

@cloud-fan cloud-fan closed this in 8cfc73a May 7, 2026
cloud-fan added a commit that referenced this pull request May 7, 2026
### What changes were proposed in this pull request?

This PR extends metric-view support to **DS v2 catalogs** by routing `CREATE VIEW ... WITH METRICS` through the `ViewCatalog` / `TableViewCatalog` APIs introduced by [SPARK-52729](#51419) and finalized by [SPARK-56655](https://github.com/apache/spark/pull/55954). Third-party v2 catalogs that implement `ViewCatalog` can now host metric views with the same metadata fidelity as session-catalog metric views.

**1. V2 metric-view CREATE path -- shared with `CreateV2ViewExec`.** A new `CreateV2MetricViewExec` and `CreateV2ViewExec` both extend a new `V2CreateViewPreparation` trait (which itself extends `V2ViewPreparation`). The trait owns the shared CREATE-side `run()`: `viewExists` short-circuit on `IF NOT EXISTS`, `createOrReplaceView` for `OR REPLACE`, and cross-type collision decoding (`ViewAlreadyExistsException` -> `tableExists` -> `EXPECT_VIEW_NOT_TABLE.NO_ALTERNATIVE`). The metric-view subclass only supplies the metric-view-specific bits (no collation, schema-mode `UNSUPPORTED`, typed `viewDependencies`, `PROP_TABLE_TYPE = METRIC_VIEW`, `retainColumnMetadata = true`) via optional hooks on `V2ViewPreparation`. `DataSourceV2Strategy` intercepts `CreateMetricViewCommand` on a non-session catalog and routes to the new exec; the v1 session-catalog path stays in `CreateMetricViewCommand.run`.

**2. First-class `METRIC_VIEW` table type.**
- `CatalogTableType.METRIC_VIEW` is added alongside `EXTERNAL` / `MANAGED` / `VIEW`.
- `TableSummary.METRIC_VIEW_TABLE_TYPE = "METRIC_VIEW"` constant for the V2 surface.
- The previous `view.viewWithMetrics` property hack is removed; `CatalogTable.isMetricView` checks `tableType == METRIC_VIEW` directly.
- `V1Table.summarizeTableType` and `V1Table.toCatalogTable(catalog, ident, ViewInfo)` translate between the V2 property form and the V1 enum.
- HMS round-trip support: `HiveTableType` has no `METRIC_VIEW` variant (both regular views and metric views serialize as `VIRTUAL_VIEW`). `HiveExternalCatalog` now persists a `view.subType = METRIC_VIEW` property on write and lifts `tableType` back to `METRIC_VIEW` on read, so HMS-backed metric views survive the round trip.

**3. Repo-wide `tableType == VIEW` audit + `CatalogTable.isViewLike` helper.** Promoting metric views to a distinct `CatalogTableType` opens silent regressions wherever existing code branches on `VIEW`. To consolidate the audit and reduce divergence with the Databricks Runtime (which has the same helper), this PR introduces:
- `CatalogTable.isViewLike` instance method (DBR parity: today returns `tableType == VIEW || tableType == METRIC_VIEW`; forks may extend the set).
- `CatalogTable.isViewLike(t: CatalogTableType)` companion form for the few sites that have a `CatalogTableType` but no `CatalogTable` (e.g. `SessionCatalog.isView`, `verifyAlterTableType`, `HiveClientImpl.toHiveTableType`).

All 18 sites in `catalyst` / `core` / `hive` that previously did inline `tableType == VIEW || tableType == METRIC_VIEW` (or the `CatalogTableType.VIEW | CatalogTableType.METRIC_VIEW` pattern alternation) are now routed through these helpers, so adding a new view-like type in the future is a one-line change in the helper body. Notable touched call sites: `CatalogTable.toJsonLinkedHashMap` (DESCRIBE EXTENDED rows), `HiveExternalCatalog.{createTable, alterTable, restoreTableMetadata}`, `HiveClientImpl.toHiveTableType`, `SessionCatalog.isView`, `InMemoryCatalog.listViews`, `RelationResolution`, `Analyzer.lookupTableOrView`, `rules.scala`, `DataStreamWriter`, `DescribeRelationJsonCommand`, `AnalyzeColumnCommand`, `AnalyzePartitionCommand`, `CommandUtils.analyzeTable`, `V2SessionCatalog.dropTableInternal`, `verifyAlterTableType` in `ddl.scala`, and 3 sites in `tables.scala`.
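The shape of that consolidation, reduced to a self-contained sketch (the enum and helper below are illustrative simplifications of `CatalogTableType` / `CatalogTable.isViewLike`, not the actual Spark code):

```java
class ViewLikeDemo {
    enum TableType { EXTERNAL, MANAGED, VIEW, METRIC_VIEW }

    // Single source of truth: adding a future view-like type is a one-line
    // change here, instead of an N-site audit of inline disjunctions.
    static boolean isViewLike(TableType t) {
        return t == TableType.VIEW || t == TableType.METRIC_VIEW;
    }

    public static void main(String[] args) {
        System.out.println(isViewLike(TableType.METRIC_VIEW)); // true
        System.out.println(isViewLike(TableType.MANAGED));     // false
    }
}
```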

**Explicit rejection (uniform error class):** `SHOW CREATE TABLE` on a metric view has no round-trippable `CREATE VIEW ... WITH METRICS` form, so it's rejected explicitly with the dedicated `UNSUPPORTED_SHOW_CREATE_TABLE.ON_METRIC_VIEW` error class on **both** the v1 session-catalog path (in `tables.scala`) and the v2 catalog path (in `DataSourceV2Strategy`), so users see the same actionable message regardless of catalog kind.

**4. Drop-command parity.**
- `DropTableCommand` (v1 path) treats both `VIEW` and `METRIC_VIEW` as views: `DROP TABLE` rejects either with `wrongCommandForObjectTypeError`, and `DROP VIEW` accepts either.
- `V2SessionCatalog.dropTableInternal` extends the existing "view rejected from `DROP TABLE`" guard to cover `METRIC_VIEW`.
- For non-session v2: `DropTableExec` (post-SPARK-56655) actively rejects with `WRONG_COMMAND_FOR_OBJECT_TYPE` ("Use DROP VIEW instead") when a view sits at the ident -- works unchanged for metric views since `TableViewCatalog`'s default `viewExists` derives from `loadTableOrView` and recognizes `MetadataTable + ViewInfo`.
- `ResolveSessionCatalog`'s `DropView` routing comment is clarified: v2 metric views fall through to `DataSourceV2Strategy` and `ViewCatalog.dropView`.

**5. Typed view dependencies (`ViewInfo.viewDependencies`).**
- New public DTOs in `org.apache.spark.sql.connector.catalog`: `Dependency` (sealed interface with `Dependency.table(String[])` / `Dependency.function(String[])` non-vararg factories), `TableDependency`, `FunctionDependency`, `DependencyList(Dependency[])`.
- `TableDependency` and `FunctionDependency` carry the dependency identifier as **structural multi-part name parts** (`record TableDependency(String[] nameParts)`), not a single dot-flattened string. Arity is preserved per source so multi-level-namespace V2 catalogs (e.g. Iceberg `cat.db1.db2.tbl` -> 4 parts) round-trip without ambiguity against quoted identifiers containing literal `.`. v1 sources resolved through the session catalog are normalized by a new `MetricViewHelper.qualifyV1` to a stable 3-part `[spark_catalog, db, table]` shape so consumers see deterministic arity per source kind (otherwise `TableIdentifier.nameParts` could return 1, 2, or 3 parts depending on what the analyzer captured).
- All three records (`TableDependency`, `FunctionDependency`, `DependencyList`) override `equals` / `hashCode` / `toString` using `Arrays.equals` / `Arrays.hashCode` / `Arrays.toString` to give value semantics on their array fields. Without the overrides, records' auto-generated methods on array fields fall through to `Object.equals` (reference equality), which would make structural multi-part names unusable as Map keys / for dedup. Each record also overrides the canonical accessor to return a defensive `clone()` so callers cannot mutate the record's internal array.
- `ViewInfo` gains a `viewDependencies` field and a `ViewInfo.Builder.withViewDependencies(...)` setter. Per the field's contract, `null` means "no dependency list was supplied" while an empty `DependencyList.of(new Dependency[0])` means "supplied but the object has none" -- metric-view CREATE always emits the latter, never the former, even when `collectTableDependencies` returns empty.
- `MetricViewHelper.collectTableDependencies` walks the analyzed plan and emits structural `Seq[Seq[String]]` parts; the v2 source arm preserves full namespace arity, the v1 source arms (`View`, `HiveTableRelation`, `LogicalRelation`) all route through `qualifyV1` for the stable 3-part shape.

**6. Multi-level-namespace targets for v2 metric views.** `MetricViewHelper.analyzeMetricViewText` previously required a `TableIdentifier`, capping the metric-view target at 3 name parts. v2 metric views with multi-level-namespace targets (e.g. `cat.db1.db2.mv`) failed at `ident.asTableIdentifier` with `requiresSinglePartNamespaceError`. The helper now takes `nameParts: Seq[String]` directly; call sites in both the v1 path (`CreateMetricViewCommand`) and the v2 path (`DataSourceV2Strategy`) updated. The helper now also returns `(LogicalPlan, MetricView)` so callers don't have to re-parse the YAML body just to read descriptor properties.

**7. `metric_view.*` descriptor properties (v1/v2 parity).** `MetricView.getProperties` produces canonical descriptive properties (`metric_view.from.type`, `metric_view.from.name` / `metric_view.from.sql`, `metric_view.where`) that **both** the v1 path (`CreateMetricViewCommand.createMetricViewInSessionCatalog`) and the v2 path (`DataSourceV2Strategy`) merge into the view's properties bag, so catalog browsers and tooling see the same descriptor rows in `DESCRIBE TABLE EXTENDED` regardless of catalog kind. Long values are truncated to `Constants.MAXIMUM_PROPERTY_SIZE`; the Scaladoc on `getProperties` calls out that `metric_view.from.sql` is therefore a descriptive value, not a round-trippable representation -- consumers should re-read the YAML body for the full SQL.

**8. `ViewInfo` constructor cleanup.** The metric-view-specific `PROP_TABLE_TYPE = METRIC_VIEW` special case is dropped from the generic `ViewInfo` constructor in favor of `properties().putIfAbsent(...)`. Callers that want a more specific kind (e.g. `METRIC_VIEW`) call `BaseBuilder.withTableType(...)` before `build()` -- exercised by `CreateV2MetricViewExec` via the new `V2ViewPreparation.tableType` hook.
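The `putIfAbsent` approach is the standard "default only if unset" idiom; a minimal sketch with illustrative property keys (not the actual `ViewCatalog` constants):

```java
import java.util.HashMap;
import java.util.Map;

class TableTypeDefaultDemo {
    public static void main(String[] args) {
        // Generic caller: no type set yet, so the constructor's default applies.
        Map<String, String> generic = new HashMap<>();
        generic.putIfAbsent("table_type", "VIEW");

        // Metric-view caller: a more specific type was set before build(),
        // so the same default becomes a no-op.
        Map<String, String> metric = new HashMap<>();
        metric.put("table_type", "METRIC_VIEW");
        metric.putIfAbsent("table_type", "VIEW");

        System.out.println(generic.get("table_type")); // VIEW
        System.out.println(metric.get("table_type"));  // METRIC_VIEW
    }
}
```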

**9. `ViewHelper.aliasPlan(retainMetadata)`.** The user-specified-column-with-comment branch in `aliasPlan` previously dropped existing column metadata. A new `retainMetadata: Boolean = false` parameter merges the analyzed attribute's metadata into the new comment metadata. `ViewHelper.prepareTable` passes `retainMetadata = isMetricView` (v1 path); `V2ViewPreparation` exposes a `retainColumnMetadata` hook that `CreateV2MetricViewExec` overrides to `true` (v2 path). Both preserve the per-column `metric_view.type` / `metric_view.expr` keys that the analyzer attaches to dimensions and measures even when the user renames columns and adds comments.

**10. Error classes.**
- New `INVALID_METRIC_VIEW_YAML` (sqlState 42K0L). `MetricViewPlanner.parseYAML`'s catch blocks now route through `QueryCompilationErrors.invalidMetricViewYamlError` instead of `SparkException.internalError`, so a typo in the user's YAML body surfaces as a user-correctable `AnalysisException` rather than "please contact support".
- New `UNSUPPORTED_SHOW_CREATE_TABLE.ON_METRIC_VIEW` (sqlState 0A000), used by both the v1 session-catalog path and the v2 catalog path so `SHOW CREATE TABLE` on a metric view produces the same actionable message regardless of catalog kind.

**11. Misc.**
- `MetricViewCanonical.parseSource` accepts multipart identifiers (`parseMultipartIdentifier`) so 3-part `catalog.schema.table` source references work as `AssetSource`.

### Why are the changes needed?

Before this PR, metric-view DDL only worked against the session catalog: the create path called `SessionCatalog.createTable` directly, and there was no way for a third-party v2 catalog (Unity Catalog, Hive Metastore catalog, custom REST catalogs, etc.) to own a metric view's lifecycle. SPARK-52729 / SPARK-56655 shipped `ViewCatalog` and `TableViewCatalog` as the public v2 surface for catalog-managed views; metric views are a kind of view and naturally belong on this surface.

Once metric views can live on a v2 catalog, two more constraints surface:

1. **Type discriminator.** A consumer reading a row through `ViewCatalog.loadView` needs to know it's a metric view, not a plain SQL view, so it can render the right UI / planner output. Encoding this in `PROP_TABLE_TYPE = METRIC_VIEW` keeps the distinction wire-compatible and lets `V1Table.toCatalogTable` reconstruct `CatalogTableType.METRIC_VIEW` on the read path.
2. **Structured dependency lineage.** Metric views always reference at least one source table; cataloging that lineage as flat string properties or single dot-joined strings loses arity for multi-level namespaces and is ambiguous against quoted identifiers. A typed `DependencyList` of `TableDependency` / `FunctionDependency` with structural `String[] nameParts` lets catalogs persist the lineage as a first-class field with full fidelity.
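The ambiguity in point 2 is easy to demonstrate: two distinct names collapse to the same flat string once quoting information is gone, while structural name parts stay distinct (a minimal demo, not Spark code):

```java
// Dot-joining loses arity: a quoted identifier containing a literal '.' becomes
// indistinguishable from a deeper namespace.
import java.util.List;

class ArityDemo {
    static final List<String> QUOTED_DOT = List.of("cat", "db1.db2", "tbl"); // `db1.db2` is one part
    static final List<String> DEEP       = List.of("cat", "db1", "db2", "tbl");

    static String flat(List<String> parts) {
        return String.join(".", parts);
    }
}
```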

The remaining changes (drop-command parity, `aliasPlan` metadata retention, `metric_view.*` properties, `parseMultipartIdentifier`, `tableType == VIEW` audit + `isViewLike` helper, multi-level-namespace lift, HMS round-trip marker) are mechanical follow-ups that fall out of supporting metric views as a real `CatalogTableType` and as a v2 catalog citizen -- without them, basic operations like `DROP VIEW`, `DESCRIBE TABLE EXTENDED`, `CREATE VIEW (a COMMENT 'c') WITH METRICS ...`, or `CREATE VIEW cat.db1.db2.mv WITH METRICS ...` would silently degrade.

### Does this PR introduce _any_ user-facing change?

Yes, both for end users and for catalog plugin developers:

**End users:**
- `CREATE VIEW <ident> WITH METRICS ...` now works against any v2 catalog that implements `ViewCatalog`, including catalogs with multi-level namespace targets. Previously it was rejected with `MISSING_CATALOG_ABILITY.VIEWS` for non-session catalogs, and capped at single-level namespaces. `IF NOT EXISTS` and `OR REPLACE` are honored on the v2 path (regression vs. v1 fixed).
- A v2 metric view can be queried with `SELECT region, measure(count_sum) FROM <mv> ...`, dropped with `DROP VIEW`, listed via `SHOW VIEWS` (and via `SHOW TABLES` on a `TableViewCatalog`, matching v1 SHOW TABLES output), and described with `DESCRIBE TABLE` / `DESCRIBE TABLE EXTENDED`. `DROP TABLE` on a metric view throws `WRONG_COMMAND_FOR_OBJECT_TYPE` ("Use DROP VIEW instead").
- `ALTER VIEW <metric_view> RENAME TO ...` is wired through `RenameV2ViewExec` and preserves the metric-view kind across the rename.
- `SHOW CREATE TABLE` on a metric view throws `UNSUPPORTED_SHOW_CREATE_TABLE.ON_METRIC_VIEW` (no round-trippable form yet) on both the v1 and v2 paths -- same error class regardless of catalog kind.
- Session-catalog metric views are now stored as `CatalogTableType.METRIC_VIEW` instead of `CatalogTableType.VIEW + view.viewWithMetrics=true`. Observable in `DESCRIBE TABLE EXTENDED`'s `Type` row and the `tableType` column of the `tables` system table. SQL behavior is unchanged. Hive-metastore-backed metric views also round-trip through HMS via a `view.subType = METRIC_VIEW` property marker.
- `DESCRIBE TABLE EXTENDED` on metric views (v1 and v2) now consistently surfaces `metric_view.from.type` / `metric_view.from.name` / `metric_view.from.sql` / `metric_view.where` descriptor rows.
- Error messages from `DROP TABLE` / `DROP VIEW` mismatch now mention `METRIC_VIEW` alongside `VIEW`.
- Malformed metric-view YAML now surfaces as `INVALID_METRIC_VIEW_YAML` (user-correctable) instead of "Spark internal error, please contact support".

**Catalog plugin developers:**
- New public API surface in `org.apache.spark.sql.connector.catalog`: sealed interface `Dependency` with `permits TableDependency, FunctionDependency`, both records carrying `String[] nameParts`. `Dependency.table(String[])` / `Dependency.function(String[])` static factories (non-vararg per review; callers pass an existing array directly). `DependencyList(Dependency[])` with `DependencyList.of(Dependency[])` factory. All three records override `equals` / `hashCode` / `toString` to give value semantics on their array fields, and the canonical accessors return a defensive `clone()` so internal state is not mutable through the public API. All `Evolving`, `since 4.2.0`. Note: today's only producer in Spark itself is metric-view dependency extraction, which emits `TableDependency` only; `FunctionDependency` and `Dependency.function(...)` are exposed as groundwork for future producers (e.g. SQL UDF dep tracking).
- `ViewInfo` gains a typed `viewDependencies()` accessor and `ViewInfo.Builder.withViewDependencies(...)` setter. `viewDependencies` is populated only on the non-session v2 CREATE path; v1 metric views (and v2 metric views read back through `V1Table.toCatalogTable`) carry `null`. Catalog plugin authors persisting dependency lineage should treat the field as v2-only for now -- broadening to v1 is a tracked follow-up.
- `TableSummary.METRIC_VIEW_TABLE_TYPE = "METRIC_VIEW"` constant.
- `CatalogTableType.METRIC_VIEW` enum value (v1 surface).
- `CatalogTable.isViewLike` instance + `CatalogTable.isViewLike(CatalogTableType)` companion helpers (DBR parity helpers for "does this table behave like a view at resolution / DDL time?"). Forks that add their own view-like types (e.g. DBR's `MATERIALIZED_VIEW`, `STREAMING_TABLE`) only need to extend the helper body.
- `V2ViewPreparation` (private to `org.apache.spark.sql.execution.datasources.v2`) gains optional `viewDependencies` / `tableType` / `retainColumnMetadata` hooks.

### How was this patch tested?

`MetricViewV2CatalogSuite` -- 31 tests across 5 sections, all against an in-memory `ViewCatalog` test fixture (`MetricViewRecordingCatalog extends InMemoryTableCatalog with TableViewCatalog`):

**Section 1 -- CREATE-related (11 tests):**
- V2 catalog receives `METRIC_VIEW` table type and view text via `ViewInfo`.
- V2 catalog path populates `metric_view.*` descriptor properties + view context (`currentCatalog` / `currentNamespace`) + captured SQL configs.
- V2 catalog path captures `SQLSource` and comment.
- Metric view columns carry `metric_view.type` / `metric_view.expr` in column metadata.
- User-specified column names with comments preserve `metric_view.*` metadata (pins the `aliasPlan(retainMetadata = true)` fix).
- `CREATE OR REPLACE VIEW ... WITH METRICS` replaces an existing v2 metric view (asserts on the replacement's distinguishing fields: queryText, `metric_view.where`, dependencies).
- `CREATE VIEW IF NOT EXISTS ... WITH METRICS` is a no-op when the view exists (catalog never sees the second `createView` call).
- `CREATE VIEW ... WITH METRICS` over a v2 table at the ident throws `TABLE_OR_VIEW_ALREADY_EXISTS` (analyzer-time pre-check).
- `CREATE VIEW IF NOT EXISTS ... WITH METRICS` is a no-op when a v2 table sits at the ident (v1 parity).
- `CREATE VIEW ... WITH METRICS` on a non-`ViewCatalog` catalog fails with `MISSING_CATALOG_ABILITY.VIEWS`.
- `CREATE VIEW ... WITH METRICS` at a multi-level-namespace v2 target (`testcat.ns_a.ns_b.mv_deep`) succeeds (pins the `analyzeMetricViewText` lift to `Seq[String]`).

**Section 2 -- Dependency extraction (5 tests):**
- SQL source `JOIN` captures both tables as 3-part `nameParts`.
- SQL source subquery deduplicates same-table references.
- SQL source self-join deduplicates same-table references.
- V1 session-catalog source emits exactly 3 parts, normalized to `[spark_catalog, db, table]` by `qualifyV1`.
- Multi-level V2 namespace source (`testcat.ns_a.ns_b.events_deep`) emits 4-part `nameParts`.

**Section 3 -- SELECT cases (5 tests, modeled on `MetricViewSuite` patterns):**
- `SELECT measure(count_sum) FROM <mv> GROUP BY region` returns aggregated rows (exercises the full `loadTableOrView` -> `MetadataTable(ViewInfo)` -> `V1Table.toCatalogTable(ViewInfo)` -> `ResolveMetricView` round-trip).
- `SELECT measure(...) WHERE region = ...` -- query-layer filter on top of the view.
- View's pre-defined `where` clause is applied (`where = Some("count > 1")` filters at view-resolution time).
- Multiple measures with different aggregations (sum / sum / max).
- `ORDER BY measure(...) DESC LIMIT 1` over the metric view.

**Section 4 -- DESCRIBE cases (2 tests):**
- `DESCRIBE TABLE EXTENDED` round-trips through `loadTableOrView` and emits the `View Text` / `Type` rows (gated through `CatalogTable.isViewLike` in `toJsonLinkedHashMap`, which now recognizes `METRIC_VIEW`).
- `DESCRIBE TABLE` (non-EXTENDED) returns the aliased columns.

**Section 5 -- DROP / SHOW / RENAME cases (8 tests):**
- `DROP VIEW` succeeds on a v2 metric view.
- `DROP VIEW IF EXISTS` on a non-existent v2 metric view is a no-op.
- `DROP TABLE` on a v2 metric view throws `WRONG_COMMAND_FOR_OBJECT_TYPE` ("Use DROP VIEW instead", per SPARK-56655's `DropTableExec`) and asserts the metric view is **not** deleted.
- `DROP TABLE IF EXISTS` on a v2 metric view also throws (`IF EXISTS` doesn't silence the wrong-type error, v1 parity).
- `SHOW CREATE TABLE` on a v2 metric view throws `UNSUPPORTED_SHOW_CREATE_TABLE.ON_METRIC_VIEW` (same dedicated error class as the v1 path).
- `SHOW TABLES` on a `TableViewCatalog` lists both tables and metric views (matches v1 SHOW TABLES output per SPARK-56655).
- `SHOW VIEWS` lists v2 metric views.
- `ALTER VIEW <metric_view> RENAME TO ...` succeeds and preserves the metric-view kind across the rename (pins `RenameV2ViewExec` end-to-end against the fixture's `renameView`).

Existing session-catalog metric-view tests (`MetricViewSuite`, `SimpleMetricViewSuite`, `HiveMetricViewSuite`) and v1 path tests pass unchanged. `DDLSuite` and `HiveDDLSuite` had their `tableTypes` enumerations updated to include `METRIC_VIEW` in two assertion lists. `PlanResolutionSuite` test fixture was updated to stub the new `CatalogTable.isViewLike` method on the Mockito mock.

### Was this patch authored or co-authored using generative AI tooling?

Generated-by: Cursor (Claude Sonnet 4.7)

Closes #55487 from chenwang-databricks/metric-view-on-51419.

Lead-authored-by: Chen Wang <chen.wang@databricks.com>
Co-authored-by: Wenchen Fan <cloud0fan@gmail.com>
Signed-off-by: Wenchen Fan <wenchen@databricks.com>
(cherry picked from commit 8cfc73a)
Signed-off-by: Wenchen Fan <wenchen@databricks.com>
cloud-fan added a commit that referenced this pull request May 7, 2026
### What changes were proposed in this pull request?

This PR extends metric-view support to **DS v2 catalogs** by routing `CREATE VIEW ... WITH METRICS` through the `ViewCatalog` / `TableViewCatalog` APIs introduced by [SPARK-52729](#51419) and finalized by [SPARK-56655](https://github.com/apache/spark/pull/55954). Third-party v2 catalogs that implement `ViewCatalog` can now host metric views with the same metadata fidelity as session-catalog metric views.

**1. V2 metric-view CREATE path -- shared with `CreateV2ViewExec`.** A new `CreateV2MetricViewExec` and `CreateV2ViewExec` both extend a new `V2CreateViewPreparation` trait (which itself extends `V2ViewPreparation`). The trait owns the shared CREATE-side `run()`: `viewExists` short-circuit on `IF NOT EXISTS`, `createOrReplaceView` for `OR REPLACE`, and cross-type collision decoding (`ViewAlreadyExistsException` -> `tableExists` -> `EXPECT_VIEW_NOT_TABLE.NO_ALTERNATIVE`). The metric-view subclass only supplies the metric-view-specific bits (no collation, schema-mode `UNSUPPORTED`, typed `viewDependencies`, `PROP_TABLE_TYPE = METRIC_VIEW`, `retainColumnMetadata = true`) via optional hooks on `V2ViewPreparation`. `DataSourceV2Strategy` intercepts `CreateMetricViewCommand` on a non-session catalog and routes to the new exec; the v1 session-catalog path stays in `CreateMetricViewCommand.run`.

**2. First-class `METRIC_VIEW` table type.**
- `CatalogTableType.METRIC_VIEW` is added alongside `EXTERNAL` / `MANAGED` / `VIEW`.
- `TableSummary.METRIC_VIEW_TABLE_TYPE = "METRIC_VIEW"` constant for the V2 surface.
- The previous `view.viewWithMetrics` property hack is removed; `CatalogTable.isMetricView` checks `tableType == METRIC_VIEW` directly.
- `V1Table.summarizeTableType` and `V1Table.toCatalogTable(catalog, ident, ViewInfo)` translate between the V2 property form and the V1 enum.
- HMS round-trip support: `HiveTableType` has no `METRIC_VIEW` variant (both regular views and metric views serialize as `VIRTUAL_VIEW`). `HiveExternalCatalog` now persists a `view.subType = METRIC_VIEW` property on write and lifts `tableType` back to `METRIC_VIEW` on read, so HMS-backed metric views survive the round trip.
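The round-trip lift on the read path can be sketched like this (strings stand in for `HiveTableType` / `CatalogTableType` values; only the `view.subType` property name is from the PR):

```java
// Both view kinds serialize to HMS as VIRTUAL_VIEW; the view.subType marker
// lifts the kind back to METRIC_VIEW on read.
import java.util.Map;

class HmsRoundTrip {
    static String restoreTableType(String hiveType, Map<String, String> props) {
        if (!"VIRTUAL_VIEW".equals(hiveType)) {
            return hiveType;
        }
        return "METRIC_VIEW".equals(props.get("view.subType")) ? "METRIC_VIEW" : "VIEW";
    }
}
```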

**3. Repo-wide `tableType == VIEW` audit + `CatalogTable.isViewLike` helper.** Promoting metric views to a distinct `CatalogTableType` opens silent regressions wherever existing code branches on `VIEW`. To consolidate the audit and reduce divergence with the Databricks Runtime (which has the same helper), this PR introduces:
- `CatalogTable.isViewLike` instance method (DBR parity: today returns `tableType == VIEW || tableType == METRIC_VIEW`; forks may extend the set).
- `CatalogTable.isViewLike(t: CatalogTableType)` companion form for the few sites that have a `CatalogTableType` but no `CatalogTable` (e.g. `SessionCatalog.isView`, `verifyAlterTableType`, `HiveClientImpl.toHiveTableType`).

All 18 sites in `catalyst` / `core` / `hive` that previously did inline `tableType == VIEW || tableType == METRIC_VIEW` (or the `CatalogTableType.VIEW | CatalogTableType.METRIC_VIEW` pattern alternation) are now routed through these helpers, so adding a new view-like type in the future is a one-line change in the helper body. Notable touched call sites: `CatalogTable.toJsonLinkedHashMap` (DESCRIBE EXTENDED rows), `HiveExternalCatalog.{createTable, alterTable, restoreTableMetadata}`, `HiveClientImpl.toHiveTableType`, `SessionCatalog.isView`, `InMemoryCatalog.listViews`, `RelationResolution`, `Analyzer.lookupTableOrView`, `rules.scala`, `DataStreamWriter`, `DescribeRelationJsonCommand`, `AnalyzeColumnCommand`, `AnalyzePartitionCommand`, `CommandUtils.analyzeTable`, `V2SessionCatalog.dropTableInternal`, `verifyAlterTableType` in `ddl.scala`, and 3 sites in `tables.scala`.
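A local model of the consolidation (the enum stands in for `CatalogTableType`; the real helpers live on `CatalogTable` and its companion):

```java
// Routing every "is this a view?" branch through one predicate means a future
// view-like type is a one-line change here rather than edits at 18 call sites.
class ViewLike {
    enum TableType { MANAGED, EXTERNAL, VIEW, METRIC_VIEW }

    static boolean isViewLike(TableType t) {
        return t == TableType.VIEW || t == TableType.METRIC_VIEW;
    }
}
```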

**Explicit rejection (uniform error class):** `SHOW CREATE TABLE` on a metric view has no round-trippable `CREATE VIEW ... WITH METRICS` form, so it's rejected explicitly with the dedicated `UNSUPPORTED_SHOW_CREATE_TABLE.ON_METRIC_VIEW` error class on **both** the v1 session-catalog path (in `tables.scala`) and the v2 catalog path (in `DataSourceV2Strategy`), so users see the same actionable message regardless of catalog kind.

**4. Drop-command parity.**
- `DropTableCommand` (v1 path) treats both `VIEW` and `METRIC_VIEW` as views: `DROP TABLE` rejects either with `wrongCommandForObjectTypeError`, and `DROP VIEW` accepts either.
- `V2SessionCatalog.dropTableInternal` extends the existing "view rejected from `DROP TABLE`" guard to cover `METRIC_VIEW`.
- For non-session v2: `DropTableExec` (post-SPARK-56655) actively rejects with `WRONG_COMMAND_FOR_OBJECT_TYPE` ("Use DROP VIEW instead") when a view sits at the ident -- works unchanged for metric views since `TableViewCatalog`'s default `viewExists` derives from `loadTableOrView` and recognizes `MetadataTable + ViewInfo`.
- `ResolveSessionCatalog`'s `DropView` routing comment is clarified: v2 metric views fall through to `DataSourceV2Strategy` and `ViewCatalog.dropView`.

**5. Typed view dependencies (`ViewInfo.viewDependencies`).**
- New public DTOs in `org.apache.spark.sql.connector.catalog`: `Dependency` (sealed interface with `Dependency.table(String[])` / `Dependency.function(String[])` non-vararg factories), `TableDependency`, `FunctionDependency`, `DependencyList(Dependency[])`.
- `TableDependency` and `FunctionDependency` carry the dependency identifier as **structural multi-part name parts** (`record TableDependency(String[] nameParts)`), not a single dot-flattened string. Arity is preserved per source so multi-level-namespace V2 catalogs (e.g. Iceberg `cat.db1.db2.tbl` -> 4 parts) round-trip without ambiguity against quoted identifiers containing literal `.`. v1 sources resolved through the session catalog are normalized by a new `MetricViewHelper.qualifyV1` to a stable 3-part `[spark_catalog, db, table]` shape so consumers see deterministic arity per source kind (otherwise `TableIdentifier.nameParts` could return 1, 2, or 3 parts depending on what the analyzer captured).
- All three records (`TableDependency`, `FunctionDependency`, `DependencyList`) override `equals` / `hashCode` / `toString` using `Arrays.equals` / `Arrays.hashCode` / `Arrays.toString` to give value semantics on their array fields. Without the overrides, records' auto-generated methods on array fields fall through to `Object.equals` (reference equality), which would make structural multi-part names unusable as Map keys / for dedup. Each record also overrides the canonical accessor to return a defensive `clone()` so callers cannot mutate the record's internal array.
- `ViewInfo` gains a `viewDependencies` field and a `ViewInfo.Builder.withViewDependencies(...)` setter. Per the field's contract, `null` means "no dependency list was supplied" while an empty `DependencyList.of(new Dependency[0])` means "supplied but the object has none" -- metric-view CREATE always emits the latter, never the former, even when `collectTableDependencies` returns empty.
- `MetricViewHelper.collectTableDependencies` walks the analyzed plan and emits structural `Seq[Seq[String]]` parts; the v2 source arm preserves full namespace arity, the v1 source arms (`View`, `HiveTableRelation`, `LogicalRelation`) all route through `qualifyV1` for the stable 3-part shape.
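The equals/hashCode point above can be sketched with a record over an array field (this is a sketch, not the shipped DTO; the name is hypothetical):

```java
// A record's auto-generated members use reference equality on array components,
// so the overrides delegate to java.util.Arrays, and the accessor returns a
// defensive clone so callers cannot mutate internal state.
import java.util.Arrays;

record TableDependencySketch(String[] nameParts) {
    @Override
    public String[] nameParts() {
        return nameParts.clone();
    }

    @Override
    public boolean equals(Object o) {
        return o instanceof TableDependencySketch t
            && Arrays.equals(nameParts, t.nameParts);
    }

    @Override
    public int hashCode() {
        return Arrays.hashCode(nameParts);
    }

    @Override
    public String toString() {
        return "TableDependencySketch" + Arrays.toString(nameParts);
    }
}
```

Without the overrides, two dependencies with identical name parts would compare unequal, breaking dedup and Map-key use.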
