Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

KEP-4979: Evented desired state of world populator in kubelet volume manager #4980

Open
wants to merge 1 commit into
base: master
Choose a base branch
from

Conversation

bouaouda-achraf
Copy link

@bouaouda-achraf bouaouda-achraf commented Nov 24, 2024

  • One-line PR description: Dynamically adjust the Desired State of the World Populator (DSWP) sleep period during inactivity and trigger populator logic based on CRI events.

@k8s-ci-robot k8s-ci-robot added cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. kind/kep Categorizes KEP tracking issues and PRs modifying the KEP directory sig/storage Categorizes an issue or PR as relevant to SIG Storage. labels Nov 24, 2024
@k8s-ci-robot k8s-ci-robot added the size/L Denotes a PR that changes 100-499 lines, ignoring generated files. label Nov 24, 2024
@bouaouda-achraf bouaouda-achraf changed the title feat: Evented desired state of world populator KEP-4979: Evented desired state of world populator Nov 24, 2024
@bouaouda-achraf bouaouda-achraf force-pushed the evented-dswp branch 2 times, most recently from 87e4d5c to e75c26e Compare November 24, 2024 22:02
@bouaouda-achraf
Copy link
Author

/assign @xing-yang @jsafrane @SergeyKanzhelev

( since you have already been involved in the analysis and presentation )

@bouaouda-achraf bouaouda-achraf changed the title KEP-4979: Evented desired state of world populator KEP-4979: kubelet: Evented desired state of world populator Jan 6, 2025
@gnufied
Copy link
Member

gnufied commented Jan 23, 2025

/assign

@kannon92
Copy link
Contributor

Please add a prod-readiness file for this KEP.

@jpbetz
Copy link
Contributor

jpbetz commented Feb 5, 2025

/approve
For PRR

@k8s-ci-robot
Copy link
Contributor

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by: bouaouda-achraf, jpbetz
Once this PR has been reviewed and has the lgtm label, please ask for approval from jsafrane. For more information see the Code Review Process.

The full list of commands accepted by this bot can be found here.

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment


## Summary

This KEP proposes optimizing the loop iteration period (currently fixed at 100ms) in the Desired State of the World Populator (DSWP). The enhancement involves dynamically increasing the sleep period when no changes are detected and reacting to gRPC event streams from the CRI implementation to reduce unnecessary processing.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

DSWP of which side? volume-manager or attach-detach controller? We should explicitly mention that.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I retitled the kep to calarify which side it affects .

- [ ] Add E2E tests for DSWP

#### Beta (enabled by default)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Beta -> GA?
Deprecation?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

sections updated .

## Proposal

The Desired State of the World Populator will listen to gRPC event streams from the CRI implementation. Specifically, the CONTAINER_CREATED_EVENT and CONTAINER_DELETED_EVENT will trigger the populator loop.
During periods of inactivity, the populator loop interval will increase by 100ms increments after the third execution, up to a maximum of 1 second. If an event is detected, the interval resets to the default 100ms. This approach ensures responsiveness while reducing CPU usage.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

DSWP also triggers online expansion of volumes in kubelet if requested by the user. How does that work? Does this mean, online resizing will still work, but it will be slower?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree with you.

The KEP will improve performance for mount/unmount operations. However, for online volume resize, users may experience some delay (100 ms currently VS <= 1 second).

My considerations on this:

  1. Ship the current KEP with this behavior: the resize request delay will be documented and highlighted when using this KEP.
  2. Reduce the maximum sleep period to 500 ms or keep it ≤ 1 second?
  3. Catching volume resize event with informer ? (a good choice ? coupling kubelet with non node-component ? )

What do you think about the fact that this delay in resizing is a reasonable trade-off for the overall improvement in mount/unmount performance?

@haircommander
Copy link
Contributor

/cc @harche

is every runtime emitting these events now? evented pleg has been beta for a while and I am pretty sure the events emitting is still optional. Will this feature rely on that?

@k8s-ci-robot k8s-ci-robot requested a review from harche February 6, 2025 17:13
owning-sig: sig-storage
participating-sigs:
- sig-node
status: provisional
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should this be marked as implementable if we are targetting alpha in 1.33 release?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I updated the status to "implementable". please correct me if I'm wrong .

@xing-yang
Copy link
Contributor

We need someone from sig-node to review this KEP as well.
cc @SergeyKanzhelev

@harche
Copy link
Contributor

harche commented Feb 6, 2025

/cc @harche

is every runtime emitting these events now? evented pleg has been beta for a while and I am pretty sure the events emitting is still optional. Will this feature rely on that?

The feature is in alpha. CRIO doesn't emit them unless you specify --enable-pod-events flag.

@harche
Copy link
Contributor

harche commented Feb 6, 2025

Keeping @tallclair in the loop since he is looking into enhancing the Generic PLEG.

@bouaouda-achraf bouaouda-achraf changed the title KEP-4979: kubelet: Evented desired state of world populator KEP-4979: Evented desired state of world populator in kubelet volume manager Feb 9, 2025
@bouaouda-achraf bouaouda-achraf force-pushed the evented-dswp branch 3 times, most recently from f015d09 to 0b26e80 Compare February 9, 2025 12:48
@bouaouda-achraf
Copy link
Author

/cc @harche
is every runtime emitting these events now? evented pleg has been beta for a while and I am pretty sure the events emitting is still optional. Will this feature rely on that?

The feature is in alpha. CRIO doesn't emit them unless you specify --enable-pod-events flag.

Should I mention in the KEP how to activate the pod-event option on CRI-O in addition to enabling the feature flag ?

@harche
Copy link
Contributor

harche commented Feb 10, 2025

/cc @harche
is every runtime emitting these events now? evented pleg has been beta for a while and I am pretty sure the events emitting is still optional. Will this feature rely on that?

The feature is in alpha. CRIO doesn't emit them unless you specify --enable-pod-events flag.

Should I mention in the KEP how to activate the pod-event option on CRI-O in addition to enabling the feature flag ?

That would be certainly helpful to crio users, thanks.

Triggering the existing DSWP implementation based on the event type :

1. CONTAINER_CREATED_EVENT
2. CONTAINER_DELETED_EVENT
Copy link
Member

@gnufied gnufied Feb 10, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am not familiar with the time these events are emitted. But ideally volumes must be ready before containers are created by CRI. Are containers created before volume is mounted? Traditionally that wasn't the case. Because all CRI does is prepares the bind mounts. So volumes must be ready before container could be created. Has this been split in two phases in container runtime or something?

Similarly - volume must be mounted after containers have been terminated. Is CONTAINER_DELETED_EVENT event emitted after containers are terminated or when deletion of containers start?

We need to be crystal clear about source and timeline of these events to make sure we don't introduce races and cause data loss etc when using evented DSOWP. cc @smarterclayton who refactored this code a bit back to reduce races.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this is a great point. I am pretty sure the container creation/deletion events are before/after volumes need to be created/deleted which would make the timing of this not work AFAIU

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

actually, sorry--for deletion this could work, as the container is deleted before the volume needs to be unmounted. I'm not sure how the kubelet would notify the volume plugin that it should mount before the container creation event happens, but it couldn't rely just on CRI events in this case. It'd have to be triggered by something else.


## Proposal

The Desired State of the World Populator will listen to gRPC event streams from the CRI implementation. Specifically, the CONTAINER_CREATED_EVENT and CONTAINER_DELETED_EVENT will trigger the populator loop.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How would you listen to CRI events, rely on the current evented PLEG or implement it independently?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I tagged you on the PR side https://github.com/kubernetes/kubernetes/pull/128958/files#r1950654976 ( independently )


##### e2e tests

- [ ] Generate a large number of CRI Events by creating and deleting a significant number of containers within a short period of time.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

After this feature is released, the populator loop interval will be expanded from 100ms to 1s. While this remains a very short interval, how would you validate its effectiveness in e2e tests?

Whether the interval is 100ms or 1s, your tests can pass normally. How do you expect to correctly distinguish between them?

Triggering the existing DSWP implementation based on the event type :

1. CONTAINER_CREATED_EVENT
2. CONTAINER_DELETED_EVENT
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I recommend elaborating on the implementation details, preferably with sequence diagrams to clearly demonstrate how CRI events trigger volume mount/unmount operations.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yep I agree , I will add a sequence diagram.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. kind/kep Categorizes KEP tracking issues and PRs modifying the KEP directory sig/storage Categorizes an issue or PR as relevant to SIG Storage. size/L Denotes a PR that changes 100-499 lines, ignoring generated files.
Projects
None yet
Development

Successfully merging this pull request may close these issues.