KAFKA-19775: Don't fail if nextOffsetsAndMetadataToBeConsumed is not available. #20665

Nikita-Shupletsov · 2025-10-08T23:15:25Z

Before we added caching for consumer next offsets, we called
mainConsumer.position and always expected something back. When we
added the caching, we kept the check that we always have nextOffset, but
since the offsets are now taken from the records returned by poll, we may not have
anything for topics that have no messages. This PR accounts for that.
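A minimal sketch of the behavior change, under stated assumptions: `TopicPartition` and `OffsetAndMetadata` below are simplified stand-in records, not the Kafka client classes, and `NextOffsetCache`, `recordPolled`, and `requireOffsetAndMetadata` are hypothetical names. Offsets are cached only for partitions that returned records from poll, so a topic with no messages has no entry. The old-style lookup fails on such a partition, while an `Optional`-returning lookup (named after `findOffsetAndMetadata` from the diff) just reports absence.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Optional;

// Simplified stand-ins for the Kafka client classes (assumption: not the real API).
record TopicPartition(String topic, int partition) { }

record OffsetAndMetadata(long offset, Optional<Integer> leaderEpoch) { }

// Hypothetical cache of "next offset to consume", filled only from polled records.
class NextOffsetCache {
    private final Map<TopicPartition, OffsetAndMetadata> nextOffsets = new HashMap<>();

    // Called for a partition that returned records; the next offset is last + 1.
    void recordPolled(final TopicPartition partition, final long lastPolledOffset, final Optional<Integer> leaderEpoch) {
        nextOffsets.put(partition, new OffsetAndMetadata(lastPolledOffset + 1, leaderEpoch));
    }

    // Old-style check: every partition is assumed to have an entry.
    OffsetAndMetadata requireOffsetAndMetadata(final TopicPartition partition) {
        final OffsetAndMetadata offsetAndMetadata = nextOffsets.get(partition);
        if (offsetAndMetadata == null) {
            throw new IllegalStateException("Stream task does not know the partition: " + partition);
        }
        return offsetAndMetadata;
    }

    // New-style lookup: a partition of an empty topic simply has no entry.
    Optional<OffsetAndMetadata> findOffsetAndMetadata(final TopicPartition partition) {
        return Optional.ofNullable(nextOffsets.get(partition));
    }
}
```

With this shape, callers decide what "no offset yet" means instead of the lookup treating it as a bug.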

Reviewers: Lucas Brutschy [email protected], Matthias J. Sax
[email protected]

Nikita-Shupletsov · 2025-10-08T23:17:19Z

streams/src/main/java/org/apache/kafka/streams/processor/internals/StreamTask.java

-                }
-            } catch (final KafkaException fatal) {
-                throw new StreamsException(fatal);
+            if (nextOffsetsAndMetadataToBeConsumed.containsKey(partition)) {


The change that reworked that code to use cached offsets: https://github.com/apache/kafka/pull/17091/files#diff-a76674468cda8772230fb8411717cf9068b1a363a792f32c602fb2ec5ba9efd7R472

@aliehsaeedii @lucasbru, as you worked on it, could you please take a look? Thanks!

mjsax · 2025-10-08T23:31:44Z

streams/src/main/java/org/apache/kafka/streams/processor/internals/StreamTask.java

-                break;
+                return partitionsNeedCommit.stream()
+                        .flatMap(partition -> findOffsetAndMetadata(partition)
+                                .map(offsetAndMetadata -> Map.entry(partition, offsetAndMetadata))


If findOffsetAndMetadata returns an empty Optional, the partition will be dropped and does not end up in the computed "committableOffsets" Map, right?

Yes. If findOffsetAndMetadata returns an empty Optional, that topic partition will not be in the resulting map.
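To illustrate the answer, here is a self-contained sketch of the same stream shape as the diff. It assumes simplified stand-in records for `TopicPartition` and `OffsetAndMetadata` (not the Kafka classes), a plain map in place of `nextOffsetsAndMetadataToBeConsumed`, and a hypothetical static `findOffsetAndMetadata` helper. A partition without a cached offset yields an empty stream inside `flatMap` and is therefore absent from the result.

```java
import java.util.Collection;
import java.util.Map;
import java.util.Optional;
import java.util.stream.Collectors;

// Simplified stand-ins (assumption: not the Kafka client API).
record TopicPartition(String topic, int partition) { }

record OffsetAndMetadata(long offset) { }

class CommittableOffsetsSketch {
    // Hypothetical helper: empty when no offset was cached for the partition.
    static Optional<OffsetAndMetadata> findOffsetAndMetadata(
            final TopicPartition partition,
            final Map<TopicPartition, OffsetAndMetadata> nextOffsets) {
        return Optional.ofNullable(nextOffsets.get(partition));
    }

    // Assumes partitionsNeedCommit has no duplicates (toMap throws on duplicate keys).
    static Map<TopicPartition, OffsetAndMetadata> committableOffsets(
            final Collection<TopicPartition> partitionsNeedCommit,
            final Map<TopicPartition, OffsetAndMetadata> nextOffsets) {
        return partitionsNeedCommit.stream()
                // Optional<OffsetAndMetadata> -> Optional<Map.Entry> -> stream of 0 or 1 entries
                .flatMap(partition -> findOffsetAndMetadata(partition, nextOffsets)
                        .map(offsetAndMetadata -> Map.entry(partition, offsetAndMetadata))
                        .stream())
                .collect(Collectors.toMap(Map.Entry::getKey, Map.Entry::getValue));
    }
}
```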

Nikita-Shupletsov · 2025-10-08T23:32:01Z

streams/src/main/java/org/apache/kafka/streams/processor/internals/StreamTask.java

-                    // This indicates a bug and thus we rethrow it as fatal `IllegalStateException`
-                    throw new IllegalStateException("Stream task " + id + " does not know the partition: " + partition);
-                }
-            } catch (final KafkaException fatal) {


As there are no client calls anymore, the catch is redundant.

Nice side cleanup.

mjsax · 2025-10-08T23:34:15Z

streams/src/main/java/org/apache/kafka/streams/processor/internals/StreamTask.java

-                }
-                break;
+                return partitionsNeedCommit.stream()
+                        .flatMap(partition -> findOffsetAndMetadata(partition)


Java streams API always confused me. Why do we use flatMap here? Don't we get a single entry in "committableOffsets" Map that we compute here? Or is it needed, because we might drop some partitions?

So the path here is:
a stream of partitionsNeedCommit -> flatMap(partition -> Optional<OffsetAndMetadata> -> map the Optional to an Optional<Map.Entry> -> turn that into a stream) -> collect

The reason we map the Optional to a stream is that it looks nicer. Otherwise we would need two operations: filter(Optional::isPresent).map(Optional::get).
If you prefer that syntax, I will update the code.
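For comparison, a small sketch showing that the two styles discussed here keep the same elements in the same order. It uses plain `Optional<String>` values rather than the task's offsets, and `OptionalStreamStyles` is a hypothetical class name. `Optional::stream` requires Java 9 or later.

```java
import java.util.List;
import java.util.Optional;
import java.util.stream.Collectors;

class OptionalStreamStyles {
    // Style 1: flatten each Optional into a stream of zero or one element.
    static List<String> viaFlatMap(final List<Optional<String>> values) {
        return values.stream()
                .flatMap(Optional::stream)
                .collect(Collectors.toList());
    }

    // Style 2: the more explicit filter-then-unwrap form.
    // The predicate must be Optional::isPresent; Optional::ifPresent takes a
    // Consumer and returns void, so it cannot serve as a filter.
    static List<String> viaFilterMap(final List<Optional<String>> values) {
        return values.stream()
                .filter(Optional::isPresent)
                .map(Optional::get)
                .collect(Collectors.toList());
    }
}
```

Both drop empty values; the choice between them is readability only.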

Subjective. :) I would personally find it easier to read using filter(...).map(...); it's more explicit and less "magic".

mjsax · 2025-10-08T23:39:10Z

streams/src/main/java/org/apache/kafka/streams/processor/internals/StreamTask.java

-            } catch (final KafkaException fatal) {
-                throw new StreamsException(fatal);
+            if (nextOffsetsAndMetadataToBeConsumed.containsKey(partition)) {
+                final OffsetAndMetadata offsetAndMetadata = nextOffsetsAndMetadataToBeConsumed.get(partition);


nit: I always find the containsKey() followed by get() pattern confusing and hard to read... How about

final OffsetAndMetadata offsetAndMetadata = nextOffsetsAndMetadataToBeConsumed.get(partition);
if (offsetAndMetadata != null) {
    offset = offsetAndMetadata.offset();
    leaderEpoch = offsetAndMetadata.leaderEpoch();
}

Just a personal preference.
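A short sketch of why the single `get()` plus null check is a safe replacement: for a map that never stores `null` values, it answers the same question as `containsKey()` followed by `get()`, with one lookup instead of two. It uses a plain `Map<String, Long>` as a stand-in for the offsets map, a fallback value for the "not found" case, and a hypothetical `LookupStyles` class.

```java
import java.util.Map;

class LookupStyles {
    // containsKey() followed by get(): two lookups for one logical question.
    static long viaContainsKey(final Map<String, Long> offsets, final String partition, final long fallback) {
        if (offsets.containsKey(partition)) {
            return offsets.get(partition);
        }
        return fallback;
    }

    // Single get() plus a null check. Equivalent only if the map never stores null values.
    static long viaNullCheck(final Map<String, Long> offsets, final String partition, final long fallback) {
        final Long offset = offsets.get(partition);
        if (offset != null) {
            return offset;
        }
        return fallback;
    }
}
```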

...ion-tests/src/test/java/org/apache/kafka/streams/integration/RegexSourceIntegrationTest.java

lucasbru

LGTM, thanks!

Nikita-Shupletsov · 2025-10-10T22:45:27Z

streams/src/main/java/org/apache/kafka/streams/processor/internals/StreamTask.java

+            final OffsetAndMetadata offsetAndMetadata = nextOffsetsAndMetadataToBeConsumed.get(partition);
+            if (offsetAndMetadata == null) {
+                try {
+                    offset = mainConsumer.position(partition);


My understanding so far:
we have some metadata that is one per task, but we have multiple input partitions. So instead of picking one partition, we just add the metadata to all of them, and during restore we read them all in whatever order and expect them to be more or less the same.
So if we are adding a new partition on the fly, we want that partition to also have that metadata. If we follow that logic, we need to commit even for that empty partition.

So I added a fallback to the previous logic, where we ask the consumer for the offset.

Yes, this is also what I understood when digging into it last week. I'm actually not sure that logic is fully watertight: what if my regular expression matches a disjoint subset of topics? None of them will have the metadata, so I have lost it, right? It seems a bit of a best-effort thing to me, in which case we could also consider not committing the "partition we never consumed from" here.
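The reasoning in this exchange can be made concrete with a toy model. This is a hypothetical sketch, not how Kafka Streams actually stores or reads committed metadata: partitions are plain strings, `commitMetadata` copies one per-task metadata string onto every committed partition, and `restoreMetadata` takes it from whichever current partition has it. If the task's partitions later become disjoint from the ones that were committed, nothing is found, which is the gap described above.

```java
import java.util.Collection;
import java.util.HashMap;
import java.util.Map;
import java.util.Objects;
import java.util.Optional;

class TaskMetadataSketch {
    // Attach the same per-task metadata to every partition being committed.
    static Map<String, String> commitMetadata(final Collection<String> committedPartitions, final String taskMetadata) {
        final Map<String, String> committed = new HashMap<>();
        for (final String partition : committedPartitions) {
            committed.put(partition, taskMetadata);
        }
        return committed;
    }

    // On restore, read the metadata from any current partition that has it.
    static Optional<String> restoreMetadata(final Collection<String> currentPartitions, final Map<String, String> committed) {
        return currentPartitions.stream()
                .map(committed::get)
                .filter(Objects::nonNull)
                .findFirst();
    }
}
```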

lucasbru · 2025-10-13T12:55:24Z

streams/src/main/java/org/apache/kafka/streams/processor/internals/StreamTask.java

+                } catch (final TimeoutException error) {
+                    // the `consumer.position()` call should never block, because we know that we did process data
+                    // for the requested partition and thus the consumer should have a valid local position
+                    // that it can return immediately


Is that still valid in the corner case? I think in the case you are describing, we haven't processed any records, so we may not have a position for the partition yet, and this call will actually block on an offset fetch to find the last committed offset for the partition?
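The fallback under discussion can be sketched as follows, assuming simplified stand-in records (not the Kafka classes) and a hypothetical `positionLookup` function standing in for `mainConsumer.position(partition)`. The cached offset from poll is preferred; only a partition with no cached entry goes to the position lookup. As raised in this comment, the real `position()` call may need a remote fetch for a partition without a local position, which is why it can block.

```java
import java.util.Map;
import java.util.function.ToLongFunction;

// Simplified stand-ins (assumption: not the Kafka client API).
record TopicPartition(String topic, int partition) { }

record OffsetAndMetadata(long offset, String metadata) { }

class FallbackSketch {
    // Prefer the offset cached from poll(); otherwise ask the position source.
    static OffsetAndMetadata offsetToCommit(
            final TopicPartition partition,
            final Map<TopicPartition, OffsetAndMetadata> nextOffsets,
            final ToLongFunction<TopicPartition> positionLookup,
            final String taskMetadata) {
        final OffsetAndMetadata cached = nextOffsets.get(partition);
        if (cached != null) {
            return cached;
        }
        // Hypothetical stand-in for mainConsumer.position(partition); in the real
        // client this may issue a remote call when no local position exists.
        return new OffsetAndMetadata(positionLookup.applyAsLong(partition), taskMetadata);
    }
}
```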

mjsax · 2025-10-16T04:42:20Z

Thanks for the fix! -- Merged to trunk and cherry-picked to 4.1 and 4.0 branches.

MINOR: Don't fail if nextOffsetsAndMetadataToBeConsumed is not available. (af1c8b7)

github-actions bot added the streams and triage labels Oct 8, 2025

Nikita-Shupletsov commented Oct 8, 2025

mjsax reviewed Oct 8, 2025

Nikita-Shupletsov commented Oct 8, 2025

mjsax reviewed Oct 8, 2025

streams/src/main/java/org/apache/kafka/streams/processor/internals/StreamTask.java

mjsax reviewed Oct 8, 2025

...ion-tests/src/test/java/org/apache/kafka/streams/integration/RegexSourceIntegrationTest.java (outdated)

Update streams/integration-tests/src/test/java/org/apache/kafka/streams/integration/RegexSourceIntegrationTest.java (31d0b1e)

Co-authored-by: Matthias J. Sax <[email protected]>

mjsax added the ci-approved label and removed the triage label Oct 8, 2025

Small refactoring. (be74cac)

Nikita-Shupletsov changed the title from "MINOR: Don't fail if nextOffsetsAndMetadataToBeConsumed is not available." to "KAFKA-19775: Don't fail if nextOffsetsAndMetadataToBeConsumed is not available." Oct 9, 2025

lucasbru approved these changes Oct 9, 2025

Updated the logic to always return OffsetAndMetadata. (d6664ab)

github-actions bot added the small label Oct 10, 2025

Nikita-Shupletsov commented Oct 10, 2025

lucasbru reviewed Oct 13, 2025

Revert "Updated the logic to always return OffsetAndMetadata." (f889723)

This reverts commit d6664ab.

github-actions bot removed the small label Oct 15, 2025

mjsax approved these changes Oct 15, 2025

mjsax merged commit 44481ca into apache:trunk Oct 16, 2025
30 of 35 checks passed
