EventHub component delivers out of order messages (pubsub, binding) #3568
Comments
Triaged
IMO this is a P0, as it fundamentally breaks the underlying FIFO ordering that one would expect from EventHubs when processing each message individually via Dapr PubSub
Agree
And also the EH binding! They share the same code AFAICT.
Hey @olitomlinson @yaron2 - After a crash course in golang, I don't think the issue is at https://github.com/dapr/components-contrib/blob/main/common/component/azure/eventhubs/eventhubs.go?rgh-link-date=2024-10-16T14%3A11%3A56Z#L293 as this starts a goroutine for the partition which seems perfectly fine. Partitions should be handled in parallel. However, looking at
That seems correct, yes
So removing the If this really is a one line fix, would you expect unit tests? They would be entirely beyond me at this point in my golang career :D Also, as a P0 bug - would this warrant making it into 1.14.5?
This is exactly what I said in Discord :)
I obviously misread or missed that -- but it's good that we agree! :) I will submit the two-line PR as draft and link it, and we can go from there. Given this is a blocker for our solution, I would really like to see this make a point release and not wait for 1.15...
It could be as simple as a one-liner, but it needs thorough testing to make sure that the checkpointing is done correctly after each message completes. My one reservation on fixing this quickly is that there may be users out there in the wild with high-throughput use cases that depend on the throughput currently afforded by this incorrect implementation. Until a fix is in place, it's hard to quantify what the performance degradation from checkpointing on each message may be. The real solution here is to use Bulk Subscriptions for high-throughput use cases, but this is not Stable yet. Idea: this could be fixed, with the fix put behind an opt-in feature flag on the metadata so it doesn't impact people with existing expectations (from the incorrect implementation).
Then, when Bulk Subscriptions do graduate to Stable, the feature flag could be removed and replaced with an opt-in feature flag that reverts the behavior back to the broken implementation. Users with high-throughput expectations would be encouraged to migrate to Bulk Subscriptions (or opt back in to the previous broken implementation, for a window of supported releases).
Hmm, I'm not going to be competent enough in the language to fix this in the window that my project requires. If you could collaborate with me, then I may learn enough to address my other feature requests for event hubs myself. How busy are you, lol
Expected Behavior
When using Event Hubs as a pubsub or binding, messages should be delivered in the order they were posted (assuming a PartitionKey is set when publishing/posting, so that related events land in the same partition, where ordering is preserved.)
Actual Behavior
In the pubsub case, the sidecar delivers new events before the subscriber has finished handling the previous one. This causes major problems when trying to ensure order-sensitive work is executed correctly (e.g. starting a workflow to process subsequent events.)
Steps to Reproduce the Problem
We're publishing to our topic like this (dotnet sdk):
and receiving like this:
The problem is clear when watching the logs: instead of a constant start/stop/start/stop alternating sequence of log events, we're seeing start/stop/start/start/stop/stop interleaving. The sidecar should not send another event until the current one has completed processing, i.e. until it receives an HTTP 200 (in this case.)
The same issue likely occurs for the binding since the common code is the problem (according to @yaron2):
Release Note
PubSub and Binding components using ordered delivery (with a PartitionKey) would interleave event deliveries to a subscriber. Now the sidecar waits until the handler returns before sending the next event.