Skip to content

HIVE-28655: Implement HMS Related Drop Stats Changes (Part2) COL_STATS_ACCURATE related changes#6198

Merged
soumyakanti3578 merged 12 commits into
apache:masterfrom
DanielZhu58:HIVE-28655-part2
Mar 26, 2026
Merged

HIVE-28655: Implement HMS Related Drop Stats Changes (Part2) COL_STATS_ACCURATE related changes#6198
soumyakanti3578 merged 12 commits into
apache:masterfrom
DanielZhu58:HIVE-28655-part2

Conversation

@DanielZhu58

Copy link
Copy Markdown
Contributor

What changes were proposed in this pull request?

We need a way to drop the stats associated with the table/partition and its columns.
This can help a lot in migration or replication where the stats data take huge time to copy.
Particularly when the table is partitioned, we have stats rows for each table, partition, column combination, which can get huge when the number of partitions is huge.
This is the HMS side changes.
This is the part2 with the parameter COL_STATS_ACCURATE related changes.

Why are the changes needed?

This can give the users potential options to clear the unnecessary stats.

Does this PR introduce any user-facing change?

No.

How was this patch tested?

Manual tests and unit tests.

@sonarqubecloud

Copy link
Copy Markdown

@soumyakanti3578 soumyakanti3578 left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Added a few comments here, but I also noticed that I had reviewed the earlier version of this PR: #5790
Please go through the review comments there too.

Also it would be nice to go through the sonar report and fix the ones that make sense. For example, there are many reports on code conventions that you should incorporate to improve maintainability and consistency.

Copilot AI left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 19 out of 19 changed files in this pull request and generated 7 comments.

Comments suppressed due to low confidence (2)

standalone-metastore/metastore-server/src/main/java/org/apache/hadoop/hive/metastore/directsql/MetastoreDirectSqlUtils.java:588

  • Indentation for these method declarations is inconsistent with the surrounding code (one less leading space than other members). This can trip checkstyle and makes diffs noisier; please align indentation with the rest of the class.
    standalone-metastore/metastore-server/src/main/java/org/apache/hadoop/hive/metastore/directsql/DirectSqlUpdatePart.java:675
  • The query select DISTINCT "CD_ID" ... returns a single column per row, but sqlResult is declared as List<Object[]>. This will cause a ClassCastException during iteration (the iterator will attempt to cast each element to Object[]). Use List<Object> (or List<?>) for this result and pass each element to extractSqlLong, or adjust the query/result handling so the element type matches the actual result.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment on lines +1854 to +1856
String table_column_stats_accurate = tableParams.get(StatsSetupConst.COLUMN_STATS_ACCURATE);
assertTrue("parameter COLUMN_STATS_ACCURATE is not accurate in " + tblName, table_column_stats_accurate == null ||
(!table_column_stats_accurate.contains(colName[0]) && !table_column_stats_accurate.contains(colName[1])));

Copilot AI Mar 25, 2026

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

COLUMN_STATS_ACCURATE is a JSON payload; checking it via String.contains(...) is brittle (substring collisions, formatting changes, etc.). Prefer parsing via StatsSetupConst.getColumnsHavingStats(tableParams) and asserting the returned list does/doesn’t contain the expected column names.

Suggested change
String table_column_stats_accurate = tableParams.get(StatsSetupConst.COLUMN_STATS_ACCURATE);
assertTrue("parameter COLUMN_STATS_ACCURATE is not accurate in " + tblName, table_column_stats_accurate == null ||
(!table_column_stats_accurate.contains(colName[0]) && !table_column_stats_accurate.contains(colName[1])));
List<String> colsWithStats = StatsSetupConst.getColumnsHavingStats(tableParams);
assertTrue("parameter COLUMN_STATS_ACCURATE is not accurate in " + tblName,
colsWithStats == null || (!colsWithStats.contains(colName[0]) && !colsWithStats.contains(colName[1])));

Copilot uses AI. Check for mistakes.
Comment on lines +1877 to +1878
assertTrue("parameter COLUMN_STATS_ACCURATE is not accurate in " + tblName, table_column_stats_accurate == null ||
(!table_column_stats_accurate.contains(colName[0]) && !table_column_stats_accurate.contains(colName[1])));

Copilot AI Mar 25, 2026

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same issue as above: validating COLUMN_STATS_ACCURATE by substring search is brittle. Consider using StatsSetupConst.getColumnsHavingStats(...) and asserting the list doesn’t contain the deleted column names.

Suggested change
assertTrue("parameter COLUMN_STATS_ACCURATE is not accurate in " + tblName, table_column_stats_accurate == null ||
(!table_column_stats_accurate.contains(colName[0]) && !table_column_stats_accurate.contains(colName[1])));
List<String> colsWithStats = StatsSetupConst.getColumnsHavingStats(tableParams);
assertTrue("parameter COLUMN_STATS_ACCURATE is not accurate in " + tblName,
colsWithStats == null ||
(!colsWithStats.contains(colName[0]) && !colsWithStats.contains(colName[1])));

Copilot uses AI. Check for mistakes.

Copilot AI left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 20 out of 20 changed files in this pull request and generated 2 comments.

Comments suppressed due to low confidence (1)

standalone-metastore/metastore-server/src/main/java/org/apache/hadoop/hive/metastore/directsql/DirectSqlUpdatePart.java:676

  • In the DISTINCT CD_ID query, sqlResult is declared as List<Object[]> but then iterated as for (Object cdId : sqlResult) and passed to extractSqlLong(cdId). This will likely fail at runtime (elements are Object[], not Number). Use List<Object> for single-column results, or iterate Object[] row and extract row[0] consistently.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Copilot AI left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 20 out of 20 changed files in this pull request and generated 1 comment.

Comments suppressed due to low confidence (1)

standalone-metastore/metastore-server/src/main/java/org/apache/hadoop/hive/metastore/directsql/DirectSqlUpdatePart.java:676

  • In this query result handling, executeWithArray for select DISTINCT "CD_ID" returns a single-column result list (elements are typically Number/Long). Declaring it as List<Object[]> will cause a ClassCastException when iterating because the enhanced-for will cast each element to Object[]. Use List<Object> (as before) or map to List<Long> and call extractSqlLong on each element.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Copilot AI left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 20 out of 20 changed files in this pull request and generated 8 comments.

Comments suppressed due to low confidence (1)

standalone-metastore/metastore-server/src/main/java/org/apache/hadoop/hive/metastore/directsql/DirectSqlUpdatePart.java:673

  • This query selects a single column (SELECT DISTINCT "CD_ID" ...), but the result is stored in a List<Object[]>. That’s misleading and makes it easy to accidentally treat each row as an array later. Prefer List<Object> (or explicitly convert to List<Long>) for single-column selects to reflect the actual result shape.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment on lines +168 to +172
for (int i = 0; i < numUpdates.length; i++) {
if (numUpdates[i] != 1) {
throw new MetaException("Invalid state of PARTITION_PARAMS ("
+ StatsSetupConst.COLUMN_STATS_ACCURATE + ") for PART_ID " + partIds.get(i));
}

Copilot AI Mar 25, 2026

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

verifyUpdates treats any batch update count other than 1 as failure. JDBC allows executeBatch() to return Statement.SUCCESS_NO_INFO for successful updates, which would make this throw even though the update succeeded. Consider treating SUCCESS_NO_INFO as success (while still flagging 0 updates).

Copilot uses AI. Check for mistakes.
Comment on lines +68 to +72
String deleteSql = "delete from \"PART_COL_STATS\" where \"PART_ID\" in ( " + getIdListForIn(partitionIds) + ")";
List<Object> params = new ArrayList<>(colNames == null ? 1 : colNames.size() + 1);

if (colNames != null && !colNames.isEmpty()) {
deleteSql += " and \"COLUMN_NAME\" in (" + makeParams(colNames.size()) + ")";

Copilot AI Mar 25, 2026

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This class hard-codes metastore table names in SQL strings (e.g., "PART_COL_STATS", "TAB_COL_STATS", "PARTITION_PARAMS"). MetaStoreDirectSql schema-qualifies these identifiers via getFullyQualifiedName(schema, ...) based on javax.jdo.mapping.Schema. If a schema is configured, these statements will hit the wrong table names. Consider obtaining the qualified table identifiers from MetaStoreDirectSql (or passing schema in and using getFullyQualifiedName) instead of embedding raw names.

Copilot uses AI. Check for mistakes.

@dengzhhu653 dengzhhu653 left a comment

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

pending test

@sonarqubecloud

Copy link
Copy Markdown

@DanielZhu58

Copy link
Copy Markdown
Contributor Author

@dengzhhu653 @soumyakanti3578
Thank you for all the reviewing and help.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants