How can I ensure that containerd cannot create a container if the NRI plugin is unavailable? #85
Comments
This is not possible right now, and I'm not sure it would be a good idea. Some NRI plugins are going to be deployed as containers, and infrastructure components also run in containers (e.g. kubelet's static pod manifests), so refusing to start containers until an NRI plugin registers would lead to a non-functional node. Another concern is plugin crashes and reconnects: we shouldn't have scenarios where a plugin crash renders the node unusable, even temporarily. The sync-on-connect calls were implemented for all of these cases: during startup, the plugin can inspect and adjust running containers that were started before it registered. And if there is something that can't be modified via Update calls, the NRI plugin can always trigger a stop of the existing container, which is eventually re-created by the orchestration layer above.
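To make the sync-on-connect idea concrete, here is a minimal sketch of the decision a plugin might make when it receives the list of already-running containers. It is not the NRI API: the `Container` type, the `synchronize` function, the `example.io/needs-create-hook` annotation, and the CPU-shares policy are all hypothetical stand-ins chosen for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class Container:
    """Hypothetical, simplified view of a container reported on connect."""
    id: str
    cpu_shares: int
    annotations: dict = field(default_factory=dict)


def synchronize(existing, wanted_cpu_shares=1024):
    """Decide what to do with containers that started before the plugin.

    Returns (updates, to_stop):
      updates: {container id: new cpu_shares} for containers fixable in place
      to_stop: ids of containers whose adjustment only works at creation time
    """
    updates, to_stop = {}, []
    for c in existing:
        # Hypothetical annotation: the needed change cannot be applied via an update.
        if c.annotations.get("example.io/needs-create-hook") == "true":
            to_stop.append(c.id)
        elif c.cpu_shares != wanted_cpu_shares:
            updates[c.id] = wanted_cpu_shares
    return updates, to_stop
```

Containers listed in `to_stop` would be stopped by the plugin and left for the orchestrator to re-create, at which point the plugin is connected and sees the creation request.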
Some discussion here: wip #43 (comment)
@kad good insight. We should probably develop some use cases: an integration bucket and an e2e bucket that exercise the supported use cases.
I agree with @kad that it's probably not a good idea to bake such logic into the runtime itself. If I had to do something to this effect, my first idea would probably be to roll some extra tooling for it. Run the critical plugin(s) as DaemonSets and monitor whether they are ready/live (or have them periodically refresh a CRD or a label on their node and monitor that), then have the extra tooling taint the nodes and, if necessary, drain/cordon/uncordon them. If useful, maybe also label the workloads that need the critical plugins, and have a mutating webhook inject tolerations into unlabeled workloads so that they tolerate nodes where the critical plugins are not up and running.
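The core decisions in such tooling could look roughly like the sketch below. It only models the logic: no Kubernetes client is involved, and the taint key, label name, heartbeat source, and 30-second timeout are assumptions made for illustration. Only the toleration fields (`key`, `operator`, `effect`) follow the standard Kubernetes shape.

```python
# Hypothetical names; pick whatever fits your cluster conventions.
TAINT_KEY = "example.io/critical-nri-plugin-down"
NEEDS_PLUGIN_LABEL = "example.io/needs-critical-plugin"


def plan_taints(last_heartbeat, tainted, now, timeout=30.0):
    """Compare plugin heartbeats against a timeout.

    last_heartbeat: {node name: unix time of the plugin's last refresh}
    tainted: set of node names that currently carry TAINT_KEY
    Returns (to_taint, to_untaint) as sorted lists of node names.
    """
    to_taint, to_untaint = [], []
    for node, ts in last_heartbeat.items():
        stale = (now - ts) > timeout
        if stale and node not in tainted:
            to_taint.append(node)
        elif not stale and node in tainted:
            to_untaint.append(node)
    return sorted(to_taint), sorted(to_untaint)


def tolerations_for(labels):
    """What a mutating webhook would inject for a workload with these labels."""
    if labels.get(NEEDS_PLUGIN_LABEL) == "true":
        return []  # must wait for a node where the plugin is up
    return [{"key": TAINT_KEY, "operator": "Exists", "effect": "NoSchedule"}]
```

Workloads labeled as needing the plugin get no toleration and so stay off tainted nodes, while everything else (including infrastructure pods) keeps scheduling normally.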
In the scenarios where I work, I have also encountered similar problems. I'm not sure whether it would be a good idea for us to provide a way to categorize plugins by role and treat them differently during plugin registration/deregistration. For example, a plugin could pass a flag identifying its criticality level to NRI when it initiates registration, and NRI could then apply a different policy when deregistering plugins of different levels, such as completing the deregistration only when the plugin explicitly requests it through an RPC. (This is just an example and hasn't been carefully considered.)
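As a rough illustration of that proposal (nothing like this exists in NRI today as far as this thread indicates), the policy could be modeled as follows. The `Criticality` levels, the `PluginRegistry` class, and the `explicit_request` flag are all hypothetical.

```python
from enum import Enum


class Criticality(Enum):
    OPTIONAL = "optional"
    CRITICAL = "critical"


class PluginRegistry:
    """Hypothetical registry applying a per-level deregistration policy."""

    def __init__(self):
        self._plugins = {}

    def register(self, name, criticality):
        self._plugins[name] = criticality

    def registered(self, name):
        return name in self._plugins

    def deregister(self, name, explicit_request):
        """Return True if the plugin was removed from the registry.

        explicit_request is True when the plugin itself asked to leave,
        False when the runtime merely noticed a lost connection.
        """
        level = self._plugins.get(name)
        if level is None:
            return False
        if level is Criticality.CRITICAL and not explicit_request:
            return False  # a crash or disconnect alone does not deregister it
        del self._plugins[name]
        return True
```

What the runtime does while a critical plugin is registered but disconnected is left open here, and that is the hard part raised earlier in the thread.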
Is there a way to configure containerd so that starting containers is impossible if the NRI plugin is not working or hasn't registered yet?