Wrong lag calculation for cooperative-sticky consumers #902

Open
omnilight opened this issue Jan 30, 2025 · 1 comment
Comments

omnilight commented Jan 30, 2025

Hello!

I want to thank you for the great library!

We encountered an issue in the pkg/kadm package when using the client.Lag() function.

We have consumers that subscribe to two topics, let's say topic1 and topic2, using the cooperative-sticky rebalancing strategy.

After running for some time and consuming events, we disconnect from topic1, while there are still unconsumed messages in it.

At the same time, we continue consuming from topic2, which also has messages.

As a result, we end up in a situation where topic1 has no assigned members, while topic2 still has members.

When calling client.Lag() in this case, it returns the lag only for topic2, while topic1 is completely absent from the response. This happens because the function iterates only over the partitions assigned to active members:

https://github.com/twmb/franz-go/blob/master/pkg/kadm/groups.go#L1595
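
To make the effect concrete, here is a minimal standalone sketch of a lag calculation that only looks at assigned partitions. It is a simplified model, not the kadm code: the `partition` type, the `lagForAssigned` function, and the offsets are all hypothetical and exist only for illustration.

```go
package main

import "fmt"

// partition is a hypothetical stand-in for a topic/partition pair.
// This is a simplified model and not the kadm implementation.
type partition struct {
	topic string
	id    int32
}

// lagForAssigned sums lag per topic, but only for partitions that are
// currently assigned to some group member. Partitions that have commits
// but no assigned member are never visited.
func lagForAssigned(assigned []partition, committed, end map[partition]int64) map[string]int64 {
	lag := make(map[string]int64)
	for _, p := range assigned {
		lag[p.topic] += end[p] - committed[p]
	}
	return lag
}

func main() {
	t1 := partition{"topic1", 0}
	t2 := partition{"topic2", 0}

	// Made-up offsets: topic1 is 60 messages behind, topic2 is 10 behind.
	committed := map[partition]int64{t1: 40, t2: 90}
	end := map[partition]int64{t1: 100, t2: 100}

	// The members stopped consuming topic1, so only topic2 is assigned.
	lag := lagForAssigned([]partition{t2}, committed, end)
	fmt.Println(lag) // map[topic2:10]
}
```

In this model topic1 still has a committed offset and unconsumed messages, yet it never appears in the result because no member holds it.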

Our expected behavior in this case would be to see the lag for topic1 as well. If we disconnect completely from both topics, the lag is shown correctly because an Empty group is handled separately:

https://github.com/twmb/franz-go/blob/master/pkg/kadm/groups.go#L1590

Thanks for your help!

twmb (Owner) commented Feb 10, 2025

I think the reason I originally did this was because I assumed that in an active group, a person is only interested in the lag for topics that are actually doled out to users. If I consume topic1 and topic2, and then I deliberately stop consuming topic1, I don't want to permanently have topic1 show up as lagging.

That said,

  • As you point out, in an empty group, all topics with commits are eligible for lag calculations -- so there is already an inconsistency that should be resolved
  • In a group, if memberA can only consume topic1 and memberB can only consume topic2, we should not exclude topic1 from lag calculations (note I think this is what's going on in your example above)
  • If a person does want to stop consuming topic1 permanently, they can use the DeleteOffsets admin API to stop topic1 from showing up in lag calculations

I'll change the logic in Lag.
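
One possible direction, sketched as a standalone model under the assumption that lag is derived from every partition with a committed offset rather than from member assignments. The actual change to Lag may differ. The `partition` type and `lagForCommitted` are hypothetical, and the `delete` call only stands in for the effect of removing a topic's commits with the DeleteOffsets admin API.

```go
package main

import "fmt"

// partition is a hypothetical stand-in for a topic/partition pair.
// This is a simplified model and not the kadm implementation.
type partition struct {
	topic string
	id    int32
}

// lagForCommitted sums lag per topic for every partition that has a
// committed offset, whether or not a member is currently assigned to it.
func lagForCommitted(committed, end map[partition]int64) map[string]int64 {
	lag := make(map[string]int64)
	for p, offset := range committed {
		lag[p.topic] += end[p] - offset
	}
	return lag
}

func main() {
	t1 := partition{"topic1", 0}
	t2 := partition{"topic2", 0}

	// Made-up offsets: topic1 is 60 messages behind, topic2 is 10 behind.
	committed := map[partition]int64{t1: 40, t2: 90}
	end := map[partition]int64{t1: 100, t2: 100}

	// topic1 is reported even though no member consumes it any more.
	fmt.Println(lagForCommitted(committed, end)) // map[topic1:60 topic2:10]

	// Stand-in for deleting topic1's committed offsets: once the commit
	// is gone, topic1 no longer shows up as lagging.
	delete(committed, t1)
	fmt.Println(lagForCommitted(committed, end)) // map[topic2:10]
}
```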
