node-installer job does not terminate properly #140
The install-pods of kwasm do not terminate with status `Completed`. In the case of systemd, this means that containerd receives a termination signal.
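For illustration, a minimal sketch of the "find the containerd PID and signal it" step discussed here, assuming the host's /proc is visible to the pod; the helper name and the choice of SIGTERM are my assumptions, not the project's exact code:

```shell
#!/bin/sh
# Hypothetical sketch of the current approach: locate containerd by scanning
# /proc (pgrep may be missing in minimal images) and send it a signal.
pid_of_comm() {
  for d in /proc/[0-9]*; do
    [ "$(cat "$d/comm" 2>/dev/null)" = "$1" ] && { basename "$d"; return 0; }
  done
  return 1
}

# pid="$(pid_of_comm containerd)" && kill -TERM "$pid"
# The problem described in this issue: restarting containerd this way also
# tears down the installer pod itself, so it exits 255 with status Unknown.
```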
Seeing similar behavior in the uninstall jobs (when deleting a shim). The first pod deletes the shim and restarts containerd, but ends with status `Unknown`.
Update here after testing the current rcm node-installer behavior on different distros.

| distro | exit code | pod status |
| --- | --- | --- |
| k3d | 255 | Unknown |
| k3s | 255 | Unknown |
| k0s | 255 | Unknown |
| rke2 | 255 | Unknown |
| kind | 255 | Unknown |
| microk8s | 0 | Completed |
| minikube | 255 | Unknown |
| aks | 255 | Unknown |

I first played around with different signals and slightly varied logic for the current "get containerd pid, send syscall to terminate" approach. I didn't land on a combo that solved the issue for the k3d/k3s/rke2 distros. I then took inspiration from the current version of the node installer script in the containerd-shim-spin project and went that route instead. This did solve the issue (the container exit code is 0 and thus the pod's status is `Completed`). What do we think of going this route? Regardless of the revised approach, I do like the idea of moving away from the current one-size-fits-all "get containerd pid, send syscall to terminate" and moving to distro-specific restarters.
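To make "distro-specific restarters" concrete, here is a hedged sketch of a dispatch table. The per-distro commands are my assumptions about how containerd is supervised on each distro (a snap service on microk8s, the k3s/rke2 systemd units elsewhere), not the project's actual design, and they would have to run in the host's namespaces (e.g. via nsenter):

```shell
#!/bin/sh
# Hypothetical distro-specific restarter dispatch. Distro names mirror the
# test matrix above; the restart commands are assumptions and would need to
# be executed on the host (e.g. via nsenter into PID 1's mount namespace).
restart_command_for() {
  case "$1" in
    microk8s) echo "snap restart microk8s.daemon-containerd" ;;
    k3s)      echo "systemctl restart k3s" ;;
    rke2)     echo "systemctl restart rke2-agent" ;;
    k3d|kind) echo "kill containerd; the container supervisor respawns it" ;;
    *)        echo "systemctl restart containerd" ;;
  esac
}
```

A table like this keeps the restart logic per distro in one place, so adding support for a new distro is a one-line change rather than another special case in signal-handling logic.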
+1. Moving to distro-specific restarters sounds like the most flexible approach to me.
It seems to be an acceptable workaround. I only wonder what
Let's do this!
As part of #68 I investigated an issue in the containerd restart routine. When the node-installer installs a runtime and restarts containerd, the corresponding pod terminates with status `Unknown`.
Overview:
Logs of Pod with status `Unknown`:

```
$ kubectl logs kwasm-worker-spin-v2-install-n82d9 -c downloader
2024-05-20T20:49:40 INFO start downloading shim from https://github.com/spinkube/containerd-shim-spin/releases/download/v0.14.1/containerd-shim-spin-v2-linux-aarch64.tar.gz...
2024-05-20T20:49:42 INFO download successful:
total 40M
drwxrwxrwx 1 root root    46 May 20 20:49 .
drwxr-xr-x 1 root root    48 May 20 20:49 ..
-rwxr-xr-x 1 1001  127 39.6M May  8 17:13 containerd-shim-spin-v2
```
Logs of Pod with status `Completed`:

```
$ kubectl logs kwasm-worker-spin-v2-install-rq78d -c downloader
2024-05-20T20:49:57 INFO start downloading shim from https://github.com/spinkube/containerd-shim-spin/releases/download/v0.14.1/containerd-shim-spin-v2-linux-aarch64.tar.gz...
2024-05-20T20:49:59 INFO download successful:
total 40M
drwxrwxrwx 1 root root    46 May 20 20:49 .
drwxr-xr-x 1 root root    48 May 20 20:49 ..
-rwxr-xr-x 1 1001  127 39.6M May  8 17:13 containerd-shim-spin-v2
```

```
$ kubectl logs kwasm-worker-spin-v2-install-rq78d -c provisioner
2024/05/20 20:50:00 INFO shim installed shim=spin-v2 path=/opt/kwasm/bin/containerd-shim-spin-v2 new-version=false
2024/05/20 20:50:00 INFO runtime config already exists, skipping runtime=spin-v2
2024/05/20 20:50:00 INFO shim configured shim=spin-v2 path=/etc/containerd/config.toml
2024/05/20 20:50:00 INFO nothing changed, nothing more to do
```
The `Completed` pod only gets scheduled in the first place because the first one did not terminate successfully, even though the actual job (rewriting the containerd config and removing the binary) is done. As a result, the second run of the job has nothing left to do.

Description of Pod with status `Unknown`:

```
$ kubectl describe po kwasm-worker-spin-v2-install-n82d9
```
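As the provisioner log above ("nothing changed, nothing more to do") indicates, the job is idempotent, which is why the rescheduled pod can complete cleanly. A minimal sketch of such an idempotency check; the function name and paths are illustrative assumptions, not the project's actual code:

```shell
#!/bin/sh
# Hedged sketch of an idempotent install step: only copy the shim binary if
# the staged file differs from what is already on the node.
install_if_changed() {
  staged="$1"; target="$2"
  if [ -f "$target" ] && cmp -s "$staged" "$target"; then
    echo "nothing changed, nothing more to do"
    return 0
  fi
  install -m 0755 "$staged" "$target"
  echo "shim installed"
}
```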
Entire resource of the Job (e.g. for reproduction of the bug):
While the goal of installing/uninstalling the shim is achieved, this is not desired behavior and calls for a solution.