Container Toolkit messed up config.toml in GKE cluster with containerd 2.0 #1222

@quoctruong

Description

I've been using the GPU Operator to install the NVIDIA driver in my GKE cluster. This worked fine until last week, when the cluster was upgraded from GKE 1.32 to 1.33. One of the big changes in that upgrade is that containerd is now version 2.0. Since then, many of the GPU Operator DaemonSet pods have been failing with the following error:

"failed to \"CreatePodSandbox\" for \"gpu-feature-discovery-nv7f9_gpu-operator(b07a2405-29b1-412e-8400-a1367a10e76d)\" with CreatePodSandboxError: \"Failed to create sandbox for pod \\\"gpu-feature-discovery-nv7f9_gpu-operator(b07a2405-29b1-412e-8400-a1367a10e76d)\\\": rpc error: code = Unknown desc = failed to setup network for sandbox \\\"2b62cb2980cd477cefbea1155871bcd2f50b2c6fb3e922039554d8b3fc14784f\\\": plugin type=\\\"gke\\\" failed (add): failed to find plugin \\\"gke\\\" in path [/opt/cni/bin]\"

So we looked at the node's containerd config.toml, and it appears it was modified to add an incorrect section:

    [plugins."io.containerd.cri.v1.runtime".cni]
      bin_dir = "/opt/cni/bin"
      conf_dir = "/etc/cni/net.d"
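
My guess at why this breaks things (the exact GKE defaults here are an assumption on my part): GKE nodes keep their CNI binaries, including the gke plugin, in a GKE-managed directory rather than the containerd default, so overriding bin_dir back to /opt/cni/bin means the gke plugin can no longer be found. Roughly, I would expect the unmodified section to look something like this:

    # Rough sketch of the expected GKE-managed section; the bin_dir path is an
    # assumption and may differ between GKE versions.
    [plugins."io.containerd.cri.v1.runtime".cni]
      bin_dir = "/home/kubernetes/bin"
      conf_dir = "/etc/cni/net.d"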

This was most likely done by the nvidia-container-toolkit-daemonset, since it was the last pod that managed to start successfully (and it also looks like the toolkit modifies this file?).

The toolkit comes from the container image nvcr.io/nvidia/k8s/container-toolkit:v1.17.8-ubuntu20.04, and the GPU Operator is version 25.3.2 (the latest one).
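
For anyone who wants to check the same thing, the file can be read from an affected node along these lines (the node name is a placeholder):

    # Inspect the containerd config on an affected GPU node; <gpu-node-name> is a placeholder.
    kubectl debug node/<gpu-node-name> -it --image=ubuntu -- \
      chroot /host cat /etc/containerd/config.toml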

Any insight on what is happening and why the config.toml was modified incorrectly for containerd 2.0?
