[server] Add negative cache for non-existent partition IDs in metadata requests by swuferhong · Pull Request #3511 · apache/fluss

swuferhong · 2026-06-23T05:58:22Z

Purpose

Linked issue: close #3510

During hourly partition rotation, clients holding stale partition IDs repeatedly trigger
ZooKeeper lookups that always return "not found". This adds unnecessary pressure on ZK.
The negative cache eliminates these redundant queries by remembering the "not found" result
for a configurable TTL period.

Brief change log

Tests

API and Format

Documentation

loserwang1024

I have left some comments.

loserwang1024 · 2026-06-23T10:32:02Z

+     */
+    private static final long CLEANUP_INTERVAL_MS = 60_000;
+
+    private final ConcurrentHashMap<Long, Long> cache;


Why not use Guava Cache instead of implementing this yourself?

Unreliable expiration cleanup — maybeCleanup() is only called inside markNonExistent(). If no new partitions are marked as non-existent for a long time (in practice, markNonExistent is only triggered after a ZK query confirms non-existence), expired entries will linger in memory indefinitely. While isKnownNonExistent() lazily removes individual expired entries, it never performs a full sweep.
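To make the cleanup concern concrete, here is a minimal sketch of a TTL map with lazy per-key expiry. It is not the PR's actual class: `NegativeCacheSketch`, `sweepExpired()`, and the injectable `clockMs` are hypothetical names introduced for illustration, and only `markNonExistent` and `isKnownNonExistent` mirror the method names mentioned above.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.LongSupplier;

/** Hypothetical stand-in for the negative cache under review; not the Fluss class. */
class NegativeCacheSketch {
    // partition ID -> absolute expiry time in milliseconds
    private final ConcurrentHashMap<Long, Long> expiryByPartitionId = new ConcurrentHashMap<>();
    private final long ttlMs;
    // Assumption: the clock is injectable so expiry can be exercised without sleeping.
    private final LongSupplier clockMs;

    NegativeCacheSketch(long ttlMs, LongSupplier clockMs) {
        this.ttlMs = ttlMs;
        this.clockMs = clockMs;
    }

    void markNonExistent(long partitionId) {
        expiryByPartitionId.put(partitionId, clockMs.getAsLong() + ttlMs);
    }

    /** Lazy expiry: only the entry being asked about is checked and removed. */
    boolean isKnownNonExistent(long partitionId) {
        Long expiresAt = expiryByPartitionId.get(partitionId);
        if (expiresAt == null) {
            return false;
        }
        if (clockMs.getAsLong() >= expiresAt) {
            // Conditional remove so a concurrent re-mark with a newer expiry survives.
            expiryByPartitionId.remove(partitionId, expiresAt);
            return false;
        }
        return true;
    }

    /** Full sweep: drops every expired entry, including IDs nobody queries again. */
    void sweepExpired() {
        long now = clockMs.getAsLong();
        expiryByPartitionId.values().removeIf(expiresAt -> now >= expiresAt);
    }

    int size() {
        return expiryByPartitionId.size();
    }
}
```

In this sketch, an ID that is marked once and never queried again stays in the map until `sweepExpired()` runs. If the sweep is only ever triggered from `markNonExistent`, nothing bounds how long such entries remain.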

The project already has shaded Guava, so you can use it directly:

import java.util.concurrent.TimeUnit;

import org.apache.fluss.shaded.guava32.com.google.common.cache.Cache;
import org.apache.fluss.shaded.guava32.com.google.common.cache.CacheBuilder;

private final Cache<Long, Boolean> negativeCache =
        CacheBuilder.newBuilder()
                .expireAfterAccess(10, TimeUnit.MINUTES)
                .maximumSize(10000) // prevent OOM in extreme cases
                .build();

Advantages of Guava Cache:

Automatic eviction: access-time-based TTL without manual maintenance

Bounded size: maximumSize prevents unbounded growth (the current implementation has no size limit)

Battle-tested thread safety: no need to roll your own CAS logic

@wuchong, WDYT?

loserwang1024 · 2026-06-23T10:32:50Z

        long[] partitionIds = request.getPartitionsIds();
        List<Long> partitionIdsNotExistsInCache = new ArrayList<>();
        for (long partitionId : partitionIds) {
+            // Fast-path: throw immediately for partition IDs known to not exist,


I think the negative cache check should NOT be placed before the metadata cache lookup.

Problem scenario — a race condition exists:

A new partition is being created; its ID has been assigned.

A client sends a metadata request, but the server's metadata cache hasn't synced yet, and ZK may not have the entry written yet either.

Server queries ZK, finds nothing → calls markNonExistent(partitionId).

Partition creation completes; metadata cache is updated via ZK watch.

Subsequent requests are blocked by the negative cache and can never retrieve the now-existing partition.

Suggested fix: Move the negative cache check to after the metadata cache lookup but before the ZK query:
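The suggested ordering can be sketched as follows. This is an illustration under stated assumptions, not the code from the PR: `PartitionLookupSketch`, `onMetadataCacheUpdated`, the `zkLookup` function, and the `String` metadata type are all hypothetical. The sketch returns `Optional.empty()` where the real handler would throw, and it omits the TTL for brevity.

```java
import java.util.Map;
import java.util.Optional;
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.LongFunction;

/** Hypothetical illustration of the lookup order; not the Fluss request handler. */
class PartitionLookupSketch {
    private final Map<Long, String> metadataCache = new ConcurrentHashMap<>();
    private final Set<Long> negativeCache = ConcurrentHashMap.newKeySet();
    // Assumption: stands in for the ZooKeeper query; empty means "not found".
    private final LongFunction<Optional<String>> zkLookup;
    // Counter used only to observe how often ZK is hit; not thread-safe.
    int zkQueries = 0;

    PartitionLookupSketch(LongFunction<Optional<String>> zkLookup) {
        this.zkLookup = zkLookup;
    }

    /** Stands in for the metadata cache being refreshed once creation completes. */
    void onMetadataCacheUpdated(long partitionId, String metadata) {
        metadataCache.put(partitionId, metadata);
    }

    Optional<String> lookup(long partitionId) {
        // 1. Metadata cache first: a partition that now exists always wins.
        String cached = metadataCache.get(partitionId);
        if (cached != null) {
            return Optional.of(cached);
        }
        // 2. Negative cache next: skip ZK for IDs recently confirmed missing.
        if (negativeCache.contains(partitionId)) {
            return Optional.empty();
        }
        // 3. Only now fall back to ZK, and remember a "not found" answer.
        zkQueries++;
        Optional<String> fromZk = zkLookup.apply(partitionId);
        if (!fromZk.isPresent()) {
            negativeCache.add(partitionId);
        }
        return fromZk;
    }
}
```

With this order, a stale negative entry left over from the race above cannot hide a partition once the metadata cache knows about it. The entry only suppresses repeated ZK queries while the partition is still absent from the metadata cache.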

[server] Add negative cache for non-existent partition IDs in metadat…

bd34e12

…a requests

swuferhong requested a review from loserwang1024 June 23, 2026 06:14

loserwang1024 reviewed Jun 23, 2026

swuferhong wants to merge 1 commit into apache:main from swuferhong:partition-not-exists-cache

2 participants
