chore(performance): EventMetadata UUID generation optimizations #24345
v7 UUIDs are created using a timestamp so that they can be ordered, but this comes at a performance cost. We currently don't need to order these UUIDs, so for now we can use v4.
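To make the v4/v7 trade-off concrete, here is a minimal, std-only sketch of the two bit layouts. It is not the `uuid` crate's implementation: `XorShift64`, `with_version`, `uuid_v4`, `uuid_v7`, and `now_ms` are hypothetical names, and the xorshift generator is only a stand-in for a real RNG. It shows that v7 places a 48-bit Unix millisecond timestamp in the most significant bits (which requires a clock read per ID and makes values sort by creation time), while v4 is random apart from the version and variant bits.

```rust
use std::time::{SystemTime, UNIX_EPOCH};

/// Tiny xorshift64 generator: a stand-in for a real RNG, for illustration only.
struct XorShift64(u64);

impl XorShift64 {
    fn next_u64(&mut self) -> u64 {
        let mut x = self.0;
        x ^= x << 13;
        x ^= x >> 7;
        x ^= x << 17;
        self.0 = x;
        x
    }

    fn next_u128(&mut self) -> u128 {
        ((self.next_u64() as u128) << 64) | self.next_u64() as u128
    }
}

// Version nibble lives in bits 76..=79, variant in bits 62..=63.
const VERSION_MASK: u128 = 0xF << 76;
const VARIANT_MASK: u128 = 0x3 << 62;
const VARIANT_RFC: u128 = 0x2 << 62;

/// Stamps the version and variant fields onto otherwise arbitrary bits.
fn with_version(bits: u128, version: u128) -> u128 {
    (bits & !VERSION_MASK & !VARIANT_MASK) | (version << 76) | VARIANT_RFC
}

/// v4 layout: everything except version and variant is random.
fn uuid_v4(rng: &mut XorShift64) -> u128 {
    with_version(rng.next_u128(), 4)
}

/// v7 layout: top 48 bits are a Unix timestamp in milliseconds, the rest random.
fn uuid_v7(rng: &mut XorShift64, unix_ms: u64) -> u128 {
    let random_low = rng.next_u128() & ((1u128 << 80) - 1);
    let timestamp = ((unix_ms as u128) & ((1u128 << 48) - 1)) << 80;
    with_version(timestamp | random_low, 7)
}

/// The extra per-ID work that v7 needs and v4 does not: reading the clock.
fn now_ms() -> u64 {
    SystemTime::now()
        .duration_since(UNIX_EPOCH)
        .expect("system clock is before the Unix epoch")
        .as_millis() as u64
}

fn main() {
    let mut rng = XorShift64(0x9E37_79B9_7F4A_7C15);

    // IDs with a later timestamp compare greater, regardless of the random bits.
    let earlier = uuid_v7(&mut rng, 1_000);
    let later = uuid_v7(&mut rng, 2_000);
    assert!(earlier < later);

    println!("v7: {:032x}", uuid_v7(&mut rng, now_ms()));
    println!("v4: {:032x}", uuid_v4(&mut rng));
}
```

The ordering property is exactly what v4 gives up, so any consumer that relies on ID order would need to keep using v7.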
[Websocket server sink uses it](https://github.com/vectordotdev/vector/blob/72e09673fda9d6fbf933adacea1220bdfae162a8/src/sinks/websocket_server/buffering.rs#L235) for time-ordered replays in case the connection drops.
thomasqueirozb
left a comment
Nice!
Summary
This PR fixes some severe performance issues that were discovered while load testing and profiling the disk buffer v2 implementation. UUIDs were added to event metadata in #21074, and while regression tests were run on that branch before merging, none of them tested disk buffer performance. Adding UUIDs to events is inconsequential in most pipeline configurations, but disk buffers were disproportionately affected because events are serialized to and deserialized from the buffer, which led to an additional (immediately overwritten) UUID being generated per event upon deserialization (see fix in #24336). This PR enables the `fast-rng` feature in `UUID`, which speeds up generation significantly, and switches the generator to v4, which further speeds up generation by forgoing v7's timestamp ordering.
Combined, these changes lead to a ~40% increase in bandwidth through a multi-threaded disk buffer load test with identical test settings.
Finally, this PR adds a regression test for disk buffers so we can test for disk buffer performance regressions in future contributions.
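As an illustration of how deserialization can end up generating an ID that is immediately overwritten, here is a minimal sketch under stated assumptions. It is not Vector's code and not necessarily the mechanism fixed in #24336: `Metadata`, `generate_id`, `decode_via_default`, and `decode_direct` are hypothetical names, and the counter exists only to make the wasted work visible.

```rust
use std::sync::atomic::{AtomicUsize, Ordering};

/// Counts how many IDs have been generated (for illustration only).
static GENERATED: AtomicUsize = AtomicUsize::new(0);

/// Hypothetical stand-in for UUID generation.
fn generate_id() -> u128 {
    let n = GENERATED.fetch_add(1, Ordering::Relaxed);
    n as u128 + 1
}

/// Hypothetical, simplified metadata type; not Vector's actual `EventMetadata`.
#[derive(Debug, PartialEq)]
struct Metadata {
    id: u128,
    source_type: String,
}

impl Default for Metadata {
    fn default() -> Self {
        // Every default construction pays for a fresh ID.
        Metadata {
            id: generate_id(),
            source_type: String::new(),
        }
    }
}

/// Wasteful path: build a default (generating an ID), then overwrite it
/// with the ID that was decoded from the buffer.
fn decode_via_default(id: u128, source_type: &str) -> Metadata {
    let mut metadata = Metadata::default();
    metadata.id = id;
    metadata.source_type = source_type.to_string();
    metadata
}

/// Cheaper path: construct directly from the decoded fields, so no
/// throwaway ID is generated.
fn decode_direct(id: u128, source_type: &str) -> Metadata {
    Metadata {
        id,
        source_type: source_type.to_string(),
    }
}

fn main() {
    let a = decode_via_default(7, "file");
    let after_default = GENERATED.load(Ordering::Relaxed);

    let b = decode_direct(7, "file");
    let after_direct = GENERATED.load(Ordering::Relaxed);

    // Both paths produce the same value; only the first generated an ID.
    assert_eq!(a, b);
    println!(
        "IDs generated via default: {}, via direct construction: {}",
        after_default,
        after_direct - after_default
    );
}
```

In a pattern like this, the cost of the discarded ID is paid once per decoded event, which is why a buffer that round-trips every event would be hit harder than pipelines that never deserialize.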
Flame graph of a thread before UUID optimizations:

After UUID Optimizations:

Vector configuration
How did you test this PR?
Change Type
Is this a breaking change?
Does this PR include user facing changes?
`no-changelog` label to this PR.
References
Related: #24336
Notes
- `@vectordotdev/vector` to reach out to us regarding this PR.
- `pre-push` hook, please see this template.
  - `make fmt`
  - `make check-clippy` (if there are failures it's possible some of them can be fixed with `make clippy-fix`)
  - `make test`
- `git merge origin master` and `git push`.
- `Cargo.lock`), please run `make build-licenses` to regenerate the license inventory and commit the changes (if any). More details here.