-
Notifications
You must be signed in to change notification settings - Fork 5.5k
feat: Add materialized views to information_schema #26688
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: master
Are you sure you want to change the base?
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sorry @tdcmeehan, your pull request is larger than the review limit of 150000 diff characters
ca8c374 to
b32f72c
Compare
|
@sourcery-ai guide |
Reviewer's GuideAdds information_schema.materialized_views support by extending metadata/connector APIs to list and fetch materialized views in bulk, wiring it through information_schema and JDBC metadata listing, implementing it for Iceberg, and adding tests to validate visibility and freshness state reporting. Sequence diagram for querying information_schema.materialized_viewssequenceDiagram
actor User
participant PrestoEngine
participant InformationSchemaPageSourceProvider
participant Metadata
participant MetadataManager
participant CatalogMetadata
participant ConnectorMetadata
participant IcebergAbstractMetadata
User->>PrestoEngine: SELECT * FROM information_schema.materialized_views
PrestoEngine->>InformationSchemaPageSourceProvider: getInformationSchemaTable(session, catalog, TABLE_MATERIALIZED_VIEWS, prefixes)
loop for_each_prefix
InformationSchemaPageSourceProvider->>Metadata: getMaterializedViews(session, prefix)
Metadata->>MetadataManager: getMaterializedViews(session, prefix)
MetadataManager->>CatalogMetadata: listConnectorIds()
loop for_each_connectorId
MetadataManager->>ConnectorMetadata: listMaterializedViews(connectorSession, schemaName)
alt multiple_schemas
MetadataManager->>ConnectorMetadata: getMaterializedViews(connectorSession, schemaName, viewNames)
else single_schema
MetadataManager->>ConnectorMetadata: getMaterializedViews(connectorSession, schemaName, viewNames)
end
ConnectorMetadata-->>MetadataManager: Map viewName_to_definition
end
MetadataManager-->>InformationSchemaPageSourceProvider: Map QualifiedObjectName_to_definition
loop for_each_materialized_view
InformationSchemaPageSourceProvider->>Metadata: getMaterializedViewStatus(session, viewName)
Metadata->>MetadataManager: getMaterializedViewStatus(session, viewName)
MetadataManager->>CatalogMetadata: getMetadata()
CatalogMetadata-->>MetadataManager: ConnectorMetadata
MetadataManager->>ConnectorMetadata: getMaterializedViewStatus(connectorSession, schemaTableName)
ConnectorMetadata-->>MetadataManager: MaterializedViewStatus
MetadataManager-->>InformationSchemaPageSourceProvider: MaterializedViewStatus
InformationSchemaPageSourceProvider-->>InformationSchemaPageSourceProvider: build row with definition, owner, security, storage, base_tables, freshness_state
end
end
InformationSchemaPageSourceProvider-->>PrestoEngine: InternalTable(materialized_views)
PrestoEngine-->>User: result_set_rows
Sequence diagram for JDBC TABLES listing including MATERIALIZED VIEWsequenceDiagram
actor Client
participant JdbcDriver
participant TableJdbcTable
participant Metadata
participant AccessControl
Client->>JdbcDriver: DatabaseMetaData.getTables(..., types=[TABLE, VIEW, MATERIALIZED VIEW])
JdbcDriver->>TableJdbcTable: cursor(transactionHandle, session, split, columns)
loop for_each_prefix
alt include_views
TableJdbcTable->>Metadata: listViews(session, prefix)
Metadata-->>TableJdbcTable: Set views
end
alt include_materialized_views
TableJdbcTable->>Metadata: listMaterializedViews(session, prefix)
Metadata-->>TableJdbcTable: List materializedViews
TableJdbcTable->>AccessControl: filterTables(materializedViews)
AccessControl-->>TableJdbcTable: filtered_materializedViews
TableJdbcTable-->>TableJdbcTable: addRow(name, type=MATERIALIZED VIEW)
end
alt include_tables
TableJdbcTable->>Metadata: listTables(session, prefix)
Metadata-->>TableJdbcTable: Set tables
TableJdbcTable-->>TableJdbcTable: addRow(name, type=TABLE) when not in views and not in materializedViews
end
end
TableJdbcTable-->>JdbcDriver: RecordCursor
JdbcDriver-->>Client: JDBC ResultSet with TABLE, VIEW, MATERIALIZED VIEW rows
ER diagram for new information_schema.materialized_views tableerDiagram
information_schema_tables {
string table_catalog
string table_schema
string table_name
string table_type
}
information_schema_views {
string table_catalog
string table_schema
string table_name
string view_owner
string view_definition
}
information_schema_materialized_views {
string table_catalog
string table_schema
string table_name
string view_definition
string view_owner
string view_security
string storage_schema
string storage_table_name
string base_tables
string freshness_state
}
information_schema_schemata {
string catalog_name
string schema_name
string schema_owner
string default_character_set_name
string default_collation_name
string is_mutable
}
information_schema_schemata ||--o{ information_schema_tables : contains
information_schema_tables ||--o{ information_schema_views : may_have_view
information_schema_tables ||--o{ information_schema_materialized_views : may_have_materialized_view
Class diagram for new materialized view metadata APIsclassDiagram
class Metadata {
<<interface>>
+void dropMaterializedView(Session session, QualifiedObjectName viewName)
+List~QualifiedObjectName~ listMaterializedViews(Session session, QualifiedTablePrefix prefix)
+Map~QualifiedObjectName, MaterializedViewDefinition~ getMaterializedViews(Session session, QualifiedTablePrefix prefix)
+void beginRefreshMaterializedView(Session session, QualifiedObjectName materializedViewName)
+void finishRefreshMaterializedView(Session session, QualifiedObjectName materializedViewName, boolean success, List~QualifiedObjectName~ baseTables)
+List~QualifiedObjectName~ getReferencedMaterializedViews(Session session, QualifiedObjectName tableName)
+MaterializedViewStatus getMaterializedViewStatus(Session session, QualifiedObjectName viewName)
}
class MetadataManager {
+List~QualifiedObjectName~ listMaterializedViews(Session session, QualifiedTablePrefix prefix)
+Map~QualifiedObjectName, MaterializedViewDefinition~ getMaterializedViews(Session session, QualifiedTablePrefix prefix)
+MaterializedViewStatus getMaterializedViewStatus(Session session, QualifiedObjectName viewName)
-Optional~CatalogMetadata~ getOptionalCatalogMetadata(Session session, TransactionManager transactionManager, String catalogName)
}
class DelegatingMetadataManager {
-Metadata delegate
+List~QualifiedObjectName~ listMaterializedViews(Session session, QualifiedTablePrefix prefix)
+Map~QualifiedObjectName, MaterializedViewDefinition~ getMaterializedViews(Session session, QualifiedTablePrefix prefix)
+MaterializedViewStatus getMaterializedViewStatus(Session session, QualifiedObjectName viewName)
}
class ConnectorMetadata {
<<interface>>
+Optional~MaterializedViewDefinition~ getMaterializedView(ConnectorSession session, SchemaTableName viewName)
+List~SchemaTableName~ listMaterializedViews(ConnectorSession session, String schemaName)
+Map~SchemaTableName, MaterializedViewDefinition~ getMaterializedViews(ConnectorSession session, String schemaName, List~SchemaTableName~ viewNames)
+MaterializedViewStatus getMaterializedViewStatus(ConnectorSession session, SchemaTableName materializedViewName, TupleDomain~String~ baseQueryDomain)
+MaterializedViewStatus getMaterializedViewStatus(ConnectorSession session, SchemaTableName materializedViewName)
}
class ClassLoaderSafeConnectorMetadata {
-ConnectorMetadata delegate
-ClassLoader classLoader
+List~SchemaTableName~ listMaterializedViews(ConnectorSession session, String schemaName)
+Map~SchemaTableName, MaterializedViewDefinition~ getMaterializedViews(ConnectorSession session, String schemaName, List~SchemaTableName~ viewNames)
}
class IcebergAbstractMetadata {
+List~SchemaTableName~ listMaterializedViews(ConnectorSession session, String schemaName)
+Optional~MaterializedViewDefinition~ getMaterializedView(ConnectorSession session, SchemaTableName viewName)
+MaterializedViewStatus getMaterializedViewStatus(ConnectorSession session, SchemaTableName materializedViewName)
-View getIcebergView(ConnectorSession session, SchemaTableName viewName)
}
class MetadataResolver {
<<interface>>
+MaterializedViewStatus getMaterializedViewStatus(QualifiedObjectName materializedViewName)
}
Metadata <|.. MetadataManager
Metadata <|.. DelegatingMetadataManager
ConnectorMetadata <|.. ClassLoaderSafeConnectorMetadata
ConnectorMetadata <|.. IcebergAbstractMetadata
MetadataManager --> ConnectorMetadata : uses_via_CatalogMetadata
DelegatingMetadataManager --> Metadata : delegates
ClassLoaderSafeConnectorMetadata --> ConnectorMetadata : delegates
File-Level Changes
Assessment against linked issues
Possibly linked issues
Tips and commandsInteracting with Sourcery
Customizing Your ExperienceAccess your dashboard to:
Getting Help
|
|
@sourcery-ai review |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hey there - I've reviewed your changes - here's some feedback:
- In MetadataResolver, changing the default getMaterializedViewStatus(QualifiedObjectName) from delegating to the TupleDomain version to throwing UnsupportedOperationException may break existing implementations that only overrode the domain-based method; consider keeping the delegating default for backward compatibility.
- In InformationSchemaMetadata.calculatePrefixesWithTableName, the new Stream.of(...).flatMap(stream -> stream) construction is harder to read than the previous chained Stream.concat style; consider simplifying this to a more conventional pattern for maintainability.
- In buildMaterializedViews, base_tables is rendered as a comma-separated "schema.table" list without catalog, which may be ambiguous in multi-catalog setups; consider including the catalog or clarifying the expected format to avoid confusion when base tables span catalogs.
Prompt for AI Agents
Please address the comments from this code review:
## Overall Comments
- In MetadataResolver, changing the default getMaterializedViewStatus(QualifiedObjectName) from delegating to the TupleDomain version to throwing UnsupportedOperationException may break existing implementations that only overrode the domain-based method; consider keeping the delegating default for backward compatibility.
- In InformationSchemaMetadata.calculatePrefixesWithTableName, the new Stream.of(...).flatMap(stream -> stream) construction is harder to read than the previous chained Stream.concat style; consider simplifying this to a more conventional pattern for maintainability.
- In buildMaterializedViews, base_tables is rendered as a comma-separated "schema.table" list without catalog, which may be ambiguous in multi-catalog setups; consider including the catalog or clarifying the expected format to avoid confusion when base tables span catalogs.
## Individual Comments
### Comment 1
<location> `presto-spi/src/main/java/com/facebook/presto/spi/analyzer/MetadataResolver.java:126` </location>
<code_context>
*/
default MaterializedViewStatus getMaterializedViewStatus(QualifiedObjectName materializedViewName)
{
- return getMaterializedViewStatus(materializedViewName, TupleDomain.all());
+ throw new UnsupportedOperationException("getMaterializedViewStatus is not supported");
}
}
</code_context>
<issue_to_address>
**issue (bug_risk):** Changing the default getMaterializedViewStatus implementation to throw is a behavioral breaking change.
Previously, this default delegated to `getMaterializedViewStatus(materializedViewName, TupleDomain.all())`, but now it always throws `UnsupportedOperationException`. Implementations that only overrode the TupleDomain-based method will now fail at runtime. If the goal is to require explicit implementations, consider first deprecating this method or ensuring all in-tree uses either call the TupleDomain variant or override the no-arg method.
</issue_to_address>Help me be more useful! Please click 👍 or 👎 on each comment and I'll use the feedback to improve your reviews.
| */ | ||
| default MaterializedViewStatus getMaterializedViewStatus(QualifiedObjectName materializedViewName) | ||
| { | ||
| return getMaterializedViewStatus(materializedViewName, TupleDomain.all()); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
issue (bug_risk): Changing the default getMaterializedViewStatus implementation to throw is a behavioral breaking change.
Previously, this default delegated to getMaterializedViewStatus(materializedViewName, TupleDomain.all()), but now it always throws UnsupportedOperationException. Implementations that only overrode the TupleDomain-based method will now fail at runtime. If the goal is to require explicit implementations, consider first deprecating this method or ensuring all in-tree uses either call the TupleDomain variant or override the no-arg method.
b32f72c to
0e4171e
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hey there - I've reviewed your changes - here's some feedback:
- In buildMaterializedViews(), base_tables is constructed using the view’s catalog for all entries; consider using each base table’s own catalog (and a stable ordering) so cross-catalog materialized views are represented accurately.
- The new MetadataResolver.getMaterializedViewStatus default now throws UnsupportedOperationException; to avoid surprising breakages, consider providing a backwards-compatible default (for example delegating to the existing metadata APIs) rather than requiring all callers to override immediately.
Prompt for AI Agents
Please address the comments from this code review:
## Overall Comments
- In buildMaterializedViews(), base_tables is constructed using the view’s catalog for all entries; consider using each base table’s own catalog (and a stable ordering) so cross-catalog materialized views are represented accurately.
- The new MetadataResolver.getMaterializedViewStatus default now throws UnsupportedOperationException; to avoid surprising breakages, consider providing a backwards-compatible default (for example delegating to the existing metadata APIs) rather than requiring all callers to override immediately.
## Individual Comments
### Comment 1
<location> `presto-iceberg/src/test/java/com/facebook/presto/iceberg/TestIcebergMaterializedViews.java:1405-1414` </location>
<code_context>
+ @Test
</code_context>
<issue_to_address>
**suggestion (testing):** Consider asserting information_schema.materialized_views emptiness or absence after dropping materialized views
Both `testInformationSchemaMaterializedViews` and `testInformationSchemaMaterializedViewsAfterRefresh` drop the materialized views (and base tables) but never verify the post-drop state. Please add a final assertion that the corresponding rows are removed from `information_schema.materialized_views` (for example, `SELECT COUNT(*) FROM information_schema.materialized_views WHERE table_schema = 'test_schema' AND table_name = '...'` returns 0) to confirm the information schema is updated on DROP as well as on CREATE/REFRESH.
Suggested implementation:
```java
assertUpdate("DROP MATERIALIZED VIEW test_mv_metadata");
assertUpdate("DROP TABLE test_mv_metadata_base");
assertUpdate("DROP TABLE existing_table_name");
assertUpdate("DROP TABLE test_mv_base");
assertQuery(
"SELECT COUNT(*) " +
"FROM information_schema.materialized_views " +
"WHERE table_schema = 'test_schema' AND table_name = 'test_mv_metadata'",
"VALUES 0");
```
To fully implement your suggestion for the specific tests you mentioned, the following additional updates are needed elsewhere in the same file:
1) In testInformationSchemaMaterializedViews:
- After the existing DROP statements that remove test_is_mv1, test_is_mv2 and their base tables, add assertions verifying that the corresponding rows are removed from information_schema.materialized_views, for example:
- assertQuery("SELECT COUNT(*) FROM information_schema.materialized_views WHERE table_schema = 'test_schema' AND table_name = 'test_is_mv1'", "VALUES 0");
- assertQuery("SELECT COUNT(*) FROM information_schema.materialized_views WHERE table_schema = 'test_schema' AND table_name = 'test_is_mv2'", "VALUES 0");
- If there is a single query used to assert presence before drop (e.g. filtering by table_schema and table_name IN ('test_is_mv1','test_is_mv2')), you may instead assert a single COUNT(*) = 0 over the same predicate after the drops.
2) In testInformationSchemaMaterializedViewsAfterRefresh:
- Identify where the materialized views created in that test are dropped.
- Immediately after the DROP MATERIALIZED VIEW statements (and any table drops if present), add analogous assertions that COUNT(*) = 0 in information_schema.materialized_views for those specific (schema, table_name) combinations.
- Ensure the table_schema used in the WHERE clause matches whatever schema is used to create the materialized views in that test (likely 'test_schema', but adjust if the test uses a different schema).
These additional changes will ensure all relevant tests validate that information_schema.materialized_views is correctly updated on DROP, not only on CREATE/REFRESH.
</issue_to_address>Help me be more useful! Please click 👍 or 👎 on each comment and I'll use the feedback to improve your reviews.
| @Test | ||
| public void testInformationSchemaMaterializedViews() | ||
| { | ||
| assertUpdate("CREATE TABLE test_is_mv_base1 (id BIGINT, name VARCHAR, value BIGINT)"); | ||
| assertUpdate("CREATE TABLE test_is_mv_base2 (category VARCHAR, amount BIGINT)"); | ||
|
|
||
| assertUpdate("INSERT INTO test_is_mv_base1 VALUES (1, 'Alice', 100), (2, 'Bob', 200)", 2); | ||
| assertUpdate("INSERT INTO test_is_mv_base2 VALUES ('A', 50), ('B', 75)", 2); | ||
|
|
||
| assertUpdate("CREATE MATERIALIZED VIEW test_is_mv1 AS SELECT id, name, value FROM test_is_mv_base1 WHERE id > 0"); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
suggestion (testing): Consider asserting information_schema.materialized_views emptiness or absence after dropping materialized views
Both testInformationSchemaMaterializedViews and testInformationSchemaMaterializedViewsAfterRefresh drop the materialized views (and base tables) but never verify the post-drop state. Please add a final assertion that the corresponding rows are removed from information_schema.materialized_views (for example, SELECT COUNT(*) FROM information_schema.materialized_views WHERE table_schema = 'test_schema' AND table_name = '...' returns 0) to confirm the information schema is updated on DROP as well as on CREATE/REFRESH.
Suggested implementation:
assertUpdate("DROP MATERIALIZED VIEW test_mv_metadata");
assertUpdate("DROP TABLE test_mv_metadata_base");
assertUpdate("DROP TABLE existing_table_name");
assertUpdate("DROP TABLE test_mv_base");
assertQuery(
"SELECT COUNT(*) " +
"FROM information_schema.materialized_views " +
"WHERE table_schema = 'test_schema' AND table_name = 'test_mv_metadata'",
"VALUES 0");To fully implement your suggestion for the specific tests you mentioned, the following additional updates are needed elsewhere in the same file:
-
In testInformationSchemaMaterializedViews:
- After the existing DROP statements that remove test_is_mv1, test_is_mv2 and their base tables, add assertions verifying that the corresponding rows are removed from information_schema.materialized_views, for example:
- assertQuery("SELECT COUNT(*) FROM information_schema.materialized_views WHERE table_schema = 'test_schema' AND table_name = 'test_is_mv1'", "VALUES 0");
- assertQuery("SELECT COUNT(*) FROM information_schema.materialized_views WHERE table_schema = 'test_schema' AND table_name = 'test_is_mv2'", "VALUES 0");
- If there is a single query used to assert presence before drop (e.g. filtering by table_schema and table_name IN ('test_is_mv1','test_is_mv2')), you may instead assert a single COUNT(*) = 0 over the same predicate after the drops.
- After the existing DROP statements that remove test_is_mv1, test_is_mv2 and their base tables, add assertions verifying that the corresponding rows are removed from information_schema.materialized_views, for example:
-
In testInformationSchemaMaterializedViewsAfterRefresh:
- Identify where the materialized views created in that test are dropped.
- Immediately after the DROP MATERIALIZED VIEW statements (and any table drops if present), add analogous assertions that COUNT(*) = 0 in information_schema.materialized_views for those specific (schema, table_name) combinations.
- Ensure the table_schema used in the WHERE clause matches whatever schema is used to create the materialized views in that test (likely 'test_schema', but adjust if the test uses a different schema).
These additional changes will ensure all relevant tests validate that information_schema.materialized_views is correctly updated on DROP, not only on CREATE/REFRESH.
0e4171e to
5e5b141
Compare
Description
Add materialized views to information_schema.
Motivation and Context
Allow users to probe the materialized view definition, access control and status.
Impact
A new table,
materialized_views, is added to the information schema.Test Plan
New tests have been added.
Contributor checklist
Release Notes
Please follow release notes guidelines and fill in the release notes below.