Conversation

abhishekrb19
Contributor

When a segment publish error occurs, it's hard to tell which supervisor to reset if multiple supervisors are consuming from the same stream. This change adds the datasource name to the error messages. Ideally we would also include the supervisor ID in the error, but since the supervisor ID is currently the same as the datasource name, this should suffice for now.

This PR has:

  • been self-reviewed.
  • added unit tests or modified existing tests to cover new code paths, ensuring the threshold for code coverage is met.
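A minimal sketch of the kind of change the description refers to, i.e. prefixing a publish-failure message with the datasource name. The method name and message format here are illustrative, not Druid's actual `SegmentPublishResult` code:

```java
// Illustrative sketch: include the datasource in a publish-failure message
// so operators can tell which supervisor to reset when several supervisors
// consume from the same stream. Not the actual Druid implementation.
public class PublishErrorExample {
    static String publishFailureMessage(String dataSource, String cause) {
        // The dataSource[...] tag disambiguates otherwise identical errors
        // coming from different supervisors on the same stream.
        return String.format(
            "Failed to publish segments for dataSource[%s]: %s. Try resetting the supervisor.",
            dataSource, cause);
    }

    public static void main(String[] args) {
        System.out.println(publishFailureMessage(
            "wiki_events", "inconsistent start metadata"));
    }
}
```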

@kfaraz
Contributor

kfaraz commented Jun 4, 2025

@abhishekrb19, the SegmentPublishResult is sent back to the task anyway, and the task already belongs to a specific datasource. The task logs this failure, but I don't think it is ever logged on the Overlord unless an exception occurs.
I am not sure how including the datasource name in the publish result message would help.

@abhishekrb19
Contributor Author

abhishekrb19 commented Jun 4, 2025

That’s right from a task logs perspective, @kfaraz. However, on 32.0.1, I noticed some error and warning logs on the Overlord that lacked the additional context, which prompted me to add the datasource name to the error messages to make them more actionable and help with troubleshooting. For example:

ERROR [task-runner-0-priority-0] org.apache.druid.indexing.seekablestream.SeekableStreamIndexTaskRunner - Encountered exception while running task.
java.util.concurrent.ExecutionException: org.apache.druid.java.util.common.ISE: Failed to publish segments because of [org.apache.druid.error.DruidException: The new start metadata state[KafkaDataSourceMetadata{SeekableStreamStartSequenceNumbers=SeekableStreamStartSequenceNumbers{stream=...multiTopicPartition=false}=4278168656}}}]. Try resetting the supervisor.]

@kfaraz
Contributor

kfaraz commented Jun 4, 2025

Fair enough, @abhishekrb19 ! I have run into this several times myself.

For this case, I would advise including the datasource name in the log on the supervisor side by using task.getDatasource() rather than putting it in the SegmentPublishResult.

In fact, I feel we should do this for several of the log messages coming out of the supervisor (maybe in a follow-up).
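A hedged sketch of the suggestion above, i.e. tagging supervisor-side log lines with the task's datasource via `task.getDatasource()`. The `Task` interface here is a simplified stand-in for Druid's, and the log-line format is an assumption:

```java
// Sketch of supervisor-side logging that includes the datasource.
// `Task` is a simplified stand-in for Druid's task abstraction.
interface Task {
    String getId();
    String getDatasource();
}

public class SupervisorLogExample {
    // Build a log line that carries both the task ID and its datasource,
    // so Overlord-side messages are actionable without consulting task logs.
    static String logLine(Task task, String event) {
        return String.format("Task[%s] for dataSource[%s]: %s",
            task.getId(), task.getDatasource(), event);
    }

    public static void main(String[] args) {
        Task t = new Task() {
            public String getId() { return "index_kafka_wiki_abc"; }
            public String getDatasource() { return "wiki"; }
        };
        System.out.println(logLine(t, "segment publish failed"));
    }
}
```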


github-actions bot commented Aug 4, 2025

This pull request has been marked as stale due to 60 days of inactivity.
It will be closed in 4 weeks if no further activity occurs. If you think
that's incorrect or this pull request should instead be reviewed, please simply
write any comment. Even if closed, you can still revive the PR at any time or
discuss it on the [email protected] list.
Thank you for your contributions.

github-actions bot added the stale label on Aug 4, 2025