How can I ensure that containerd cannot create a container if the NRI plugin is unavailable? #85

t33m · 2024-05-22T06:48:48Z

Is there a way to configure containerd so that starting containers is impossible if the NRI plugin is not working or hasn't registered yet?

kad · 2024-05-22T08:17:17Z

This is not possible today, and I'm not sure it would be a good idea. Some NRI plugins are going to be deployed as containers, and infrastructure components also run in containers (e.g. kubelet's static pod manifests), so refusing to start containers until an NRI plugin registers would lead to a non-functional node. Another concern is plugin crashes and reconnects: we shouldn't have scenarios where a plugin crash renders the node unusable, even temporarily.

The sync-on-connect calls were implemented for all of these cases: during startup, the plugin can inspect and adjust running containers that were started before it registered. Also, if something can't be modified via Update calls, the NRI plugin can always trigger a stop of the existing container, and the orchestration layer above will eventually re-create it.
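As a rough illustration of the sync-on-connect idea, here is a minimal sketch of the decision a plugin might make when it is handed the containers that already exist. The `Container` and `Action` types and the `synchronize` function are simplified, hypothetical stand-ins written for this example; they are not the NRI API, and the CPU-shares policy is only an assumed placeholder for whatever the plugin actually enforces.

```go
package main

import "fmt"

// Container is a simplified, hypothetical stand-in for the description of
// an already-running container that a plugin sees when it connects.
type Container struct {
	ID        string
	CPUShares uint64
	// NeedsRecreate marks a container whose settings cannot be fixed by an
	// update and therefore has to be stopped and re-created.
	NeedsRecreate bool
}

// Action is what the plugin wants done with one existing container.
type Action struct {
	ContainerID string
	Kind        string // "update" or "stop"
	CPUShares   uint64 // only meaningful for "update"
}

// synchronize inspects containers that were started before the plugin
// registered. Containers that can be fixed in place get an update; the
// rest are stopped so the orchestration layer can re-create them.
func synchronize(existing []Container, wantShares uint64) []Action {
	var actions []Action
	for _, c := range existing {
		switch {
		case c.NeedsRecreate:
			actions = append(actions, Action{ContainerID: c.ID, Kind: "stop"})
		case c.CPUShares != wantShares:
			actions = append(actions, Action{ContainerID: c.ID, Kind: "update", CPUShares: wantShares})
		}
	}
	return actions
}

func main() {
	existing := []Container{
		{ID: "ctr-a", CPUShares: 1024},
		{ID: "ctr-b", CPUShares: 512},
		{ID: "ctr-c", NeedsRecreate: true},
	}
	for _, a := range synchronize(existing, 1024) {
		fmt.Printf("%s %s\n", a.Kind, a.ContainerID)
	}
}
```

A real plugin would express these decisions through the runtime's update and stop mechanisms rather than a local `Action` type.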

mikebrow · 2024-05-23T23:05:28Z

Some discussion here .. wip #43 (comment)

mikebrow · 2024-05-23T23:14:42Z

@kad good insight

We should probably develop some use cases .. an integration bucket and an e2e bucket that exercise the supported use cases.

klihub · 2024-05-24T08:31:59Z

> This is not possible today, and I'm not sure it would be a good idea. Some NRI plugins are going to be deployed as containers, and infrastructure components also run in containers (e.g. kubelet's static pod manifests), so refusing to start containers until an NRI plugin registers would lead to a non-functional node. Another concern is plugin crashes and reconnects: we shouldn't have scenarios where a plugin crash renders the node unusable, even temporarily.
>
> The sync-on-connect calls were implemented for all of these cases: during startup, the plugin can inspect and adjust running containers that were started before it registered. Also, if something can't be modified via Update calls, the NRI plugin can always trigger a stop of the existing container, and the orchestration layer above will eventually re-create it.

I agree with @kad that it's probably not a good idea to bake such logic into the runtime itself.

If I had to do something to this effect, my first idea would probably be to roll some extra tooling for it. Run the critical plugin(s) as DaemonSets and monitor whether they are ready/live (or have them periodically refresh a CRD or a label on their node and monitor that). Then have the extra tooling taint the nodes and, if necessary, drain/cordon/uncordon them. If useful, maybe also label workloads that need the critical plugins as such, and have a mutating webhook inject tolerations into unlabeled workloads so that they tolerate nodes where the critical plugins are not up and running.

zhaodiaoer · 2024-07-17T07:30:25Z

In the scenarios where I work, I have also encountered similar problems... I'm not sure whether it's a good idea for us to provide a way to categorize plugins by role and to treat them differently during plugin registration/deregistration. For example, a plugin could pass a flag identifying its criticality level to NRI when it initiates registration, and deregistration could follow a different policy per level, such as completing deregistration only when the plugin explicitly sends an RPC request for it. (This is just an example and hasn't been carefully considered.)
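To make the idea above a bit more concrete, here is a small sketch of what a per-level deregistration policy might look like. Nothing like this exists in NRI as far as this thread indicates; the `Criticality` type, the `Plugin` struct, and `mayDeregister` are hypothetical names invented purely for illustration.

```go
package main

import "fmt"

// Criticality is a hypothetical level a plugin would declare at registration.
type Criticality int

const (
	Optional Criticality = iota
	Critical
)

// Plugin is a hypothetical record of a registered plugin.
type Plugin struct {
	Name  string
	Level Criticality
}

// mayDeregister reports whether a deregistration should be accepted.
// explicit is true when the plugin sent an explicit deregistration request
// and false when its connection simply went away (for example, a crash).
func mayDeregister(p Plugin, explicit bool) bool {
	if p.Level == Critical {
		// A critical plugin is only considered gone when it says so itself.
		return explicit
	}
	return true
}

func main() {
	p := Plugin{Name: "policy-plugin", Level: Critical}
	fmt.Println(mayDeregister(p, false))
	fmt.Println(mayDeregister(p, true))
}
```

What the runtime should do while a critical plugin is disconnected but not deregistered is left open here, as it is in the proposal.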

mikebrow mentioned this issue May 23, 2024

Add hook integration tests #86

Open
