KEP-4979: Evented desired state of world populator in kubelet volume manager #4980
/assign @xing-yang @jsafrane @SergeyKanzhelev (since you have already been involved in the analysis and presentation)
/assign
Please add a prod-readiness file for this KEP.
/approve
[APPROVALNOTIFIER] This PR is NOT APPROVED This pull-request has been approved by: bouaouda-achraf, jpbetz The full list of commands accepted by this bot can be found here.
Needs approval from an approver in each of these files:
Approvers can indicate their approval by writing |
## Summary

This KEP proposes optimizing the loop iteration period (currently fixed at 100ms) in the Desired State of the World Populator (DSWP). The enhancement involves dynamically increasing the sleep period when no changes are detected and reacting to gRPC event streams from the CRI implementation to reduce unnecessary processing.
DSWP of which side? volume-manager or attach-detach controller? We should explicitly mention that.
I retitled the KEP to clarify which side it affects.
- [ ] Add E2E tests for DSWP

#### Beta (enabled by default)
Beta -> GA?
Deprecation?
Sections updated.
## Proposal

The Desired State of the World Populator will listen to gRPC event streams from the CRI implementation. Specifically, the CONTAINER_CREATED_EVENT and CONTAINER_DELETED_EVENT will trigger the populator loop.
During periods of inactivity, the populator loop interval will increase by 100ms increments after the third execution, up to a maximum of 1 second. If an event is detected, the interval resets to the default 100ms. This approach ensures responsiveness while reducing CPU usage.
DSWP also triggers online expansion of volumes in kubelet if requested by the user. How does that work? Does this mean online resizing will still work, but more slowly?
I agree with you.
The KEP will improve performance for mount/unmount operations. However, for online volume resize, users may experience some delay (100 ms currently vs. up to 1 second).
My considerations on this:
- Ship the current KEP with this behavior: the resize request delay will be documented and highlighted when using this KEP.
- Reduce the maximum sleep period to 500 ms, or keep it at ≤ 1 second?
- Catch volume resize events with an informer? (Is that a good choice, or does it couple the kubelet with a non-node component?)
Do you think this resize delay is a reasonable trade-off for the overall improvement in mount/unmount performance?
/cc @harche Is every runtime emitting these events now? Evented PLEG has been beta for a while, and I am pretty sure emitting the events is still optional. Will this feature rely on that?
owning-sig: sig-storage
participating-sigs:
- sig-node
status: provisional
Should this be marked as implementable if we are targeting alpha in the 1.33 release?
I updated the status to "implementable". Please correct me if I'm wrong.
We need someone from sig-node to review this KEP as well.
The feature is in |
Keeping @tallclair in the loop since he is looking into enhancing the Generic PLEG.
Should I mention in the KEP how to activate the pod-event option on CRI-O in addition to enabling the feature flag?
That would certainly be helpful to CRI-O users, thanks.
Triggering the existing DSWP implementation based on the event type:

1. CONTAINER_CREATED_EVENT
2. CONTAINER_DELETED_EVENT
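As a rough illustration of the triggering idea in this snippet, the sketch below filters a stream of container events and wakes a populator loop only for the two listed types. It is not taken from the PR: `EventType` is a local stand-in for the CRI event enum, and `shouldTrigger`, `forward`, and the `wake` channel are hypothetical names. It also says nothing about whether these events arrive at the right time relative to volume mounts, which is the question raised in the review comments.

```go
package main

import "fmt"

// EventType is a local, hypothetical stand-in for the CRI container event enum.
type EventType int

const (
	ContainerCreated EventType = iota
	ContainerStarted
	ContainerStopped
	ContainerDeleted
)

// shouldTrigger reports whether an event type should wake the populator loop.
// Only the two types named in the KEP text are treated as triggers.
func shouldTrigger(t EventType) bool {
	switch t {
	case ContainerCreated, ContainerDeleted:
		return true
	default:
		return false
	}
}

// forward drains events until the channel is closed and signals wake for
// triggering events. The non-blocking send coalesces a burst of events into
// a single pending wake-up when wake has a buffer of one.
func forward(events <-chan EventType, wake chan<- struct{}) {
	for ev := range events {
		if !shouldTrigger(ev) {
			continue
		}
		select {
		case wake <- struct{}{}:
		default: // a wake-up is already pending; drop the extra signal
		}
	}
}

func main() {
	events := make(chan EventType, 4)
	wake := make(chan struct{}, 1)
	for _, ev := range []EventType{ContainerStarted, ContainerCreated, ContainerDeleted} {
		events <- ev
	}
	close(events)
	forward(events, wake)
	fmt.Println("pending wake-ups:", len(wake))
}
```

Coalescing is a design choice in this sketch only: since one populator pass re-reads the whole desired state, a burst of create/delete events arguably needs just one extra pass rather than one per event.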
I am not familiar with when these events are emitted. But ideally volumes must be ready before containers are created by CRI. Are containers created before the volume is mounted? Traditionally that wasn't the case, because all CRI does is prepare the bind mounts. So volumes must be ready before a container can be created. Has this been split into two phases in the container runtime or something?
Similarly, a volume must be unmounted after containers have been terminated. Is the CONTAINER_DELETED_EVENT emitted after containers are terminated or when deletion of containers starts?
We need to be crystal clear about the source and timeline of these events to make sure we don't introduce races, cause data loss, etc. when using evented DSWP. cc @smarterclayton who refactored this code a while back to reduce races.
This is a great point. I am pretty sure the container creation/deletion events come before/after volumes need to be created/deleted, which would make the timing of this not work, AFAIU.
actually, sorry--for deletion this could work, as the container is deleted before the volume needs to be unmounted. I'm not sure how the kubelet would notify the volume plugin that it should mount before the container creation event happens, but it couldn't rely just on CRI events in this case. It'd have to be triggered by something else.
## Proposal

The Desired State of the World Populator will listen to gRPC event streams from the CRI implementation. Specifically, the CONTAINER_CREATED_EVENT and CONTAINER_DELETED_EVENT will trigger the populator loop.
How would you listen to CRI events, rely on the current evented PLEG or implement it independently?
I tagged you on the PR side: https://github.com/kubernetes/kubernetes/pull/128958/files#r1950654976 (it is implemented independently).
##### e2e tests

- [ ] Generate a large number of CRI Events by creating and deleting a significant number of containers within a short period of time.
After this feature is released, the populator loop interval will be expanded from 100ms to 1s. While this remains a very short interval, how would you validate its effectiveness in e2e tests?
Whether the interval is 100ms or 1s, your tests can pass normally. How do you expect to correctly distinguish between them?
Triggering the existing DSWP implementation based on the event type:

1. CONTAINER_CREATED_EVENT
2. CONTAINER_DELETED_EVENT
I recommend elaborating on the implementation details, preferably with sequence diagrams to clearly demonstrate how CRI events trigger volume mount/unmount operations.
Yep, I agree. I will add a sequence diagram.
Issue link: Optimize DSWP loop with dynamic sleep period and CRI event integration #4979
Discussion Link:
SIG Storage meeting (Thursday, September 12, 2024)
Initial doc
Issue comment