HIVE-27190: Fix cache-key collisions for time-travel queries on Iceberg by deniskuzZ · Pull Request #6380 · apache/hive

deniskuzZ · 2026-03-19T16:41:40Z

What changes were proposed in this pull request?

Introduces Table#getQualifier(), which extends the fully qualified name with the metatable name and the time-travel / snapshot ref when present. The qualifier is now used as the cache key.

Why are the changes needed?

Fixes a data-correctness bug: when a single query references the same table at multiple points in time, the entries collide in the cache. A later reader then reuses the partition list / column stats that were computed for a different snapshot, producing wrong row counts, wrong plans, and incorrect query results.
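
To illustrate the collision described above, here is a minimal sketch. It is not Hive code: the class, the `qualifier` helper, the table name, the tag name, and the row counts are all hypothetical, and the real `Table#getQualifier()` may build its string differently. It only shows why a key built from the table name alone lets a second read pick up stats cached for a different snapshot.

```java
import java.util.HashMap;
import java.util.Map;

public class CacheKeyCollisionSketch {

  // Hypothetical key builder: appends the snapshot ref only when one is present.
  static String qualifier(String fullyQualifiedName, String snapshotRef) {
    return snapshotRef == null ? fullyQualifiedName : fullyQualifiedName + "." + snapshotRef;
  }

  // Simulates two reads of the same table within one query: first the current
  // state (100 rows), then a tagged snapshot (40 rows). Returns the row count
  // that the second read gets back from the cache.
  static long rowCountSeenByTaggedRead(boolean useQualifier) {
    Map<String, Long> cache = new HashMap<>();
    String name = "default.orders"; // hypothetical table name
    String currentKey = useQualifier ? qualifier(name, null) : name;
    String taggedKey = useQualifier ? qualifier(name, "tag_v1") : name;

    cache.computeIfAbsent(currentKey, k -> 100L);
    // With a name-only key this lookup hits the entry cached above.
    return cache.computeIfAbsent(taggedKey, k -> 40L);
  }

  public static void main(String[] args) {
    System.out.println(rowCountSeenByTaggedRead(false)); // 100: stale count from the current-state read
    System.out.println(rowCountSeenByTaggedRead(true));  // 40: count computed for the tagged snapshot
  }
}
```

With the name-only key, the tagged read silently inherits the count for the current state. Adding the ref to the key gives each point in time its own entry.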

Does this PR introduce any user-facing change?

No

How was this patch tested?

mvn test -Dtest=TestIcebergCliDriver -Dqfile=iceberg_partition_pruner_cache_key.q

sonarqubecloud · 2026-03-20T14:42:50Z

Quality Gate passed

Issues
3 New issues
0 Accepted issues

Measures
0 Security Hotspots
0.0% Coverage on New Code
0.0% Duplication on New Code

See analysis details on SonarQube Cloud

zhangbutao · 2026-04-11T13:44:55Z

+    } else if (tab.isNonNative()) {
+      long snapshotId = tab.getStorageHandler().getSnapshotId(tab);
+      if (snapshotId > 0) {
+        key = tab.getFullyQualifiedName() + "." + snapshotId + ";";


Why change it here?
In my opinion,
key = tab.getFullyQualifiedName() + "." + tab.getSnapshotRef() + ";"
can also represent a unique key.

A tag, a branch, and AS OF might all reference the same snapshot, but at the moment each of them produces a different key, doesn't it?

Sorry for the late reply.
My understanding is that if a tag and a branch point to the same snapshot, then their snapshot IDs should be the same. Am I wrong?

Yes, that is what this PR is about. We use snapshotRef as part of the key, not snapshotId. A tag, a branch, and time travel produce different keys for the same snapshot.

Let me rephrase this. If a branch (test_branch) and a tag (test_tag) are both created from main at the same time, then their snapshot IDs will be the same, and the key will be the same (key = tab.getFullyQualifiedName() + "." + snapshotId + ";").
If data is written to test_branch later, then test_branch's current snapshot ID and test_tag's snapshot ID will differ, and so will the keys.

Is that correct?

yes, that is correct.
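
The two key shapes discussed in this thread can be compared with a small sketch. It is illustrative only: the table name and the snapshot IDs are made up, and the helpers simply mirror the string expressions quoted above rather than any Hive API.

```java
public class SnapshotKeySketch {

  // Key shape quoted in the thread: name + "." + snapshotId + ";".
  static String keyBySnapshotId(String fqn, long snapshotId) {
    return fqn + "." + snapshotId + ";";
  }

  // Alternative key shape from the thread: name + "." + snapshotRef + ";".
  static String keyByRef(String fqn, String snapshotRef) {
    return fqn + "." + snapshotRef + ";";
  }

  public static void main(String[] args) {
    String fqn = "default.orders"; // hypothetical table name

    // test_branch and test_tag are both created from main at snapshot 1 (assumed ID).
    long branchSnapshot = 1L;
    long tagSnapshot = 1L;

    // ID-based keys are equal while both refs point at the same snapshot.
    System.out.println(
        keyBySnapshotId(fqn, branchSnapshot).equals(keyBySnapshotId(fqn, tagSnapshot))); // true

    // Ref-based keys differ even though the snapshot is the same.
    System.out.println(
        keyByRef(fqn, "test_branch").equals(keyByRef(fqn, "test_tag"))); // false

    // A later write to test_branch moves it to a new snapshot (assumed ID 2).
    branchSnapshot = 2L;
    System.out.println(
        keyBySnapshotId(fqn, branchSnapshot).equals(keyBySnapshotId(fqn, tagSnapshot))); // false
  }
}
```

Under these assumptions, ID-based keys let two refs share a cache entry for as long as they point to the same snapshot, while ref-based keys never share an entry. Both avoid mixing entries from different snapshots.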

@zhangbutao, I've revisited the fix. Since it's a query-level cache, adding a new API to the storage handler (SH) is probably overkill. I created getQualifier to account for all supported cases.
WDYT?

Sure, that's fine.

Thanks for checking, @zhangbutao!

zhangbutao

+1 LGTM

Copilot

Pull request overview

This PR addresses a correctness issue where partition-pruning and column-stats caches can collide when the same Iceberg table is referenced at different points in time (time travel), by incorporating a time-travel/metatable qualifier into table identity and cache keys.

Changes:

Introduces Table#getQualifier() and uses it to extend table identity/cache keys for time-travel and Iceberg metadata-table scans.
Updates partition pruner and Calcite planning paths to use the new qualifier-based identity, and adjusts shared-work merge checks accordingly.
Adds an Iceberg CLI query test + expected outputs to validate distinct planning/stats for current vs time-travel scans.

Reviewed changes

Copilot reviewed 10 out of 10 changed files in this pull request and generated 3 comments.

File	Description
ql/src/java/org/apache/hadoop/hive/ql/parse/PrunedPartitionList.java	Changes partition-list key API from `Optional<String>` to `String` and updates list creation.
ql/src/java/org/apache/hadoop/hive/ql/parse/ParseContext.java	Updates column-stats cache lookup to use the new `String` key API; minor generics cleanup.
ql/src/java/org/apache/hadoop/hive/ql/parse/CalcitePlanner.java	Incorporates table qualifier into Calcite table identity to avoid time-travel collisions.
ql/src/java/org/apache/hadoop/hive/ql/optimizer/SharedWorkOptimizer.java	Requires matching qualifier when deciding whether two TableScans can be merged.
ql/src/java/org/apache/hadoop/hive/ql/optimizer/ppr/PartitionPruner.java	Builds partition-pruner cache key using `Table#getQualifier()`.
ql/src/java/org/apache/hadoop/hive/ql/optimizer/calcite/RelOptHiveTable.java	Uses `String` partition-list keys (and `computeIfAbsent`) for column-stats caching.
ql/src/java/org/apache/hadoop/hive/ql/metadata/Table.java	Adds `getQualifier()` to represent metatable/time-travel identity for caching/planning.
iceberg/iceberg-handler/src/test/queries/positive/iceberg_partition_pruner_cache_key.q	New query test covering current vs tag-based time travel in the same query/session.
iceberg/iceberg-handler/src/test/results/positive/iceberg_partition_pruner_cache_key.q.out	New golden output for the added cache-key regression test.
iceberg/iceberg-handler/src/test/results/positive/iceberg_metadata_table_alias.q.out	Updates expected output to reflect metadata-table qualification in optimized SQL.
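
The SharedWorkOptimizer row above says two table scans may only be merged when their qualifiers match. A minimal sketch of that idea follows. The `Scan` class and `canMerge` method are hypothetical stand-ins, not Hive's operator classes, and the real check presumably involves further conditions that are not shown here.

```java
import java.util.Objects;

public class MergeCheckSketch {

  // Hypothetical stand-in for a table scan; not Hive's TableScanOperator.
  static final class Scan {
    final String qualifier; // assumed to hold table name plus optional metatable / snapshot ref

    Scan(String qualifier) {
      this.qualifier = qualifier;
    }
  }

  // Two scans are merge candidates only if they read the same table state.
  static boolean canMerge(Scan a, Scan b) {
    return Objects.equals(a.qualifier, b.qualifier);
  }

  public static void main(String[] args) {
    Scan current = new Scan("default.orders");
    Scan tagged = new Scan("default.orders.tag_v1");
    Scan currentAgain = new Scan("default.orders");

    System.out.println(canMerge(current, currentAgain)); // true: same table, same point in time
    System.out.println(canMerge(current, tagged));       // false: same table, different snapshot ref
  }
}
```

Comparing only the table name would treat the current and tagged scans as interchangeable, which is the same class of bug as the cache-key collision.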

  public PrunedPartitionList(Table source, String key, Set<Partition> partitions,
      List<String> referred, boolean hasUnknowns) {
    this.source = Objects.requireNonNull(source);
-    this.ppListKey = Optional.ofNullable(key);
+    this.ppListKey = key;
    this.referred = Objects.requireNonNull(referred);


zhangbutao

This almost looks good to me. Thanks, @deniskuzZ.

zhangbutao · 2026-05-23T15:30:47Z

Sure, that's fine.

deniskuzZ · 2026-05-25T09:35:01Z

@zhangbutao, I've addressed the review comments. Please let me know if I missed anything.

sonarqubecloud · 2026-05-25T10:44:59Z

Quality Gate passed

Issues
2 New issues
0 Accepted issues

Measures
0 Security Hotspots
0.0% Coverage on New Code
1.6% Duplication on New Code

See analysis details on SonarQube Cloud

zhangbutao

+1 LGTM
Thanks @deniskuzZ

asf-ci-hive added tests pending tests unstable and removed tests pending labels Mar 19, 2026

deniskuzZ force-pushed the HIVE-27190 branch from cc7465a to ca6f216 March 20, 2026 13:00

asf-ci-hive added tests pending and removed tests unstable labels Mar 20, 2026

asf-ci-hive added tests passed and removed tests pending labels Mar 20, 2026

deniskuzZ requested a review from zhangbutao April 2, 2026 20:11

zhangbutao reviewed Apr 6, 2026

zhangbutao reviewed Apr 11, 2026

zhangbutao approved these changes May 6, 2026

asf-ci-hive added tests pending tests failed and removed tests passed tests pending labels May 12, 2026

deniskuzZ added 2 commits May 12, 2026 17:22

HIVE-27190: Implement col stats cache for hive iceberg table

8507a00

refactor, drop SH API

03435da

deniskuzZ force-pushed the HIVE-27190 branch from b9cfa85 to 03435da May 12, 2026 14:22

asf-ci-hive added tests pending and removed tests failed labels May 12, 2026

deniskuzZ force-pushed the HIVE-27190 branch from 4c0d06e to b5f3aa0 May 12, 2026 18:25

asf-ci-hive added tests unstable tests pending and removed tests pending tests unstable labels May 12, 2026

tests

584159f

deniskuzZ force-pushed the HIVE-27190 branch from b5f3aa0 to 584159f May 13, 2026 07:02

asf-ci-hive added tests pending tests passed and removed tests unstable tests pending labels May 13, 2026

deniskuzZ changed the title ~~HIVE-27190: Implement col stats cache for hive iceberg table~~ HIVE-27190: Fix cache-key collisions for time-travel and metadata-table queries on Iceberg May 13, 2026

deniskuzZ changed the title ~~HIVE-27190: Fix cache-key collisions for time-travel and metadata-table queries on Iceberg~~ HIVE-27190: Fix cache-key collisions for time-travel queries on Iceberg May 13, 2026

deniskuzZ requested a review from zhangbutao May 13, 2026 15:43

zhangbutao requested a review from Copilot May 23, 2026 14:20

Copilot started reviewing on behalf of zhangbutao May 23, 2026 14:21

Copilot AI reviewed May 23, 2026

zhangbutao approved these changes May 23, 2026

zhangbutao reviewed May 23, 2026

Comment thread ql/src/java/org/apache/hadoop/hive/ql/optimizer/SharedWorkOptimizer.java

review comments #1

859facf

asf-ci-hive added tests pending and removed tests passed labels May 25, 2026

asf-ci-hive added tests passed and removed tests pending labels May 25, 2026

zhangbutao approved these changes May 25, 2026

deniskuzZ merged commit d2d7dd2 into apache:master May 26, 2026
4 checks passed
