feat: Add KVM module loader DaemonSet to enable hardware acceleration…#1708

Draft
sanya-pf9 wants to merge 1 commit into main from feat/adding-dev/kvm-support-to-not-fallback-to-qemu

Conversation

sanya-pf9 (Contributor) commented Mar 17, 2026

The v2v-helper pod uses QEMU for disk conversion. Without /dev/kvm, QEMU falls back to software emulation (TCG), which is significantly slower and more memory-hungry (observed ~1.1 GB RSS versus the hardware-accelerated baseline).

This adds a privileged DaemonSet (kvm-module-loader) to the migration-system namespace that loads the kvm-intel or kvm-amd kernel module on the vjailbreak node at startup. Once loaded, /dev/kvm is visible inside the v2v-helper pod via the existing /dev bind mount, enabling QEMU to use KVM hardware acceleration automatically.
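The loader's behavior can be sketched as a short script (a sketch under assumptions; module names and the exact image are illustrative, not this PR's actual code):

```shell
#!/bin/sh
# Sketch of what the kvm-module-loader container would run (assumed names).
# modprobe is idempotent, and '|| true' prevents the pod from crash-looping
# on hosts where neither module can load (e.g. no nested virtualisation).
load_kvm() {
  modprobe kvm_intel 2>/dev/null || modprobe kvm_amd 2>/dev/null || true
}

load_kvm
# the real DaemonSet pod would then run: sleep infinity
```

Because the last branch is `true`, the script always exits 0, so the DaemonSet pod comes up healthy whether or not a module could be loaded.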

Which issue(s) this PR fixes


fixes #1646

Special notes for your reviewer

I weighed the pros and cons before choosing a DaemonSet; they are listed below.

Design decisions

Why a DaemonSet and not an init container in the v2v-helper job?
The v2v-helper runs thousands of migrations on a single node. An init container would run modprobe on every single migration, adding overhead to each one for something that only needs to happen once per
node boot. The DaemonSet loads the module once at startup and it stays loaded for all subsequent migrations.

Why not both (DaemonSet + init container as a safety net)?
modprobe is idempotent: if the module is already loaded, it exits immediately. So the init container wouldn't be harmful, but it would be redundant overhead on every migration. The race window where the DaemonSet hasn't run yet but a migration job has already started is practically unreachable in normal usage (the module loads in ~100 ms; by the time a user creates a MigrationPlan and the job pod is scheduled, the DaemonSet has already run). Even if the race did occur, QEMU would fall back to TCG gracefully rather than crash.

Why is the DaemonSet always running (sleep infinity)?
DaemonSet pods require restartPolicy: Always. The sleep infinity keeps the pod alive after modprobe completes so the DaemonSet considers the node configured. Resource overhead is negligible (~1MB RSS, 0 CPU).

Minimal RBAC — why no ServiceAccount or RBAC rules?
The pod never calls the Kubernetes API. automountServiceAccountToken: false ensures no token is mounted. No ClusterRole or RoleBinding is needed. The only elevated privilege is privileged: true on the container (required to load kernel modules) and a read-only /lib/modules host mount (required by modprobe to locate module files).

Why scoped to control-plane nodes only?
The vjailbreak node runs with the node-role.kubernetes.io/control-plane label. Scoping the DaemonSet to this label ensures the module loader only runs on the vjailbreak node and not on any other nodes
in a broader cluster. The toleration for the control-plane:NoSchedule taint is required because control-plane nodes typically carry this taint.
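Taken together, the decisions above could look roughly like the following manifest (a sketch under assumptions: image, labels, and exact field values are illustrative, not this PR's actual YAML):

```yaml
# Hypothetical sketch of the kvm-module-loader DaemonSet described above.
apiVersion: apps/v1
kind: DaemonSet
metadata:
  name: kvm-module-loader
  namespace: migration-system
spec:
  selector:
    matchLabels:
      app: kvm-module-loader
  template:
    metadata:
      labels:
        app: kvm-module-loader
    spec:
      automountServiceAccountToken: false   # pod never calls the Kubernetes API
      nodeSelector:
        node-role.kubernetes.io/control-plane: ""   # vjailbreak node only
      tolerations:
        - key: node-role.kubernetes.io/control-plane
          operator: Exists
          effect: NoSchedule
      containers:
        - name: loader
          image: busybox   # assumption: any small image that ships modprobe
          command: ["sh", "-c", "modprobe kvm-intel || modprobe kvm-amd || true; sleep infinity"]
          securityContext:
            privileged: true   # required to load kernel modules
          volumeMounts:
            - name: lib-modules
              mountPath: /lib/modules
              readOnly: true   # modprobe only reads module files
      volumes:
        - name: lib-modules
          hostPath:
            path: /lib/modules
```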

Testing done

please add testing details (logs, screenshots, etc.)

spai-p9 (Collaborator) commented Mar 18, 2026

@sanya-pf9 I know this is still a draft, but there are some considerations to weigh before we start making a fix.

We also need to mount /dev/kvm inside v2v-helper. For `modprobe kvm-intel || modprobe kvm-amd || true; sleep infinity` to actually load a module, the vjailbreak VM must have nested virtualisation enabled; otherwise modprobe will error out. So we have to do two things:

  1. We need to detect whether the vjailbreak VM has nested virtualisation enabled; if so, load the kvm kernel modules.
  2. Only if (1) is satisfied should we mount /dev/kvm; otherwise the v2v-helper pod will fail to come up with a mount-not-found error.

We also need to handle this smartly during scale-up: some agents may have nested virtualisation enabled while others don't, and that mixed case needs handling as well.
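Point (1) could be sketched as follows (hypothetical helper, not code from this PR; it takes a cpuinfo-style file as an argument so the logic can be exercised on any input):

```shell
#!/bin/sh
# Hypothetical sketch: pick the kvm module for this node from the CPU
# flags the VM exposes. A guest sees 'vmx' or 'svm' in /proc/cpuinfo
# only when the hypervisor exposes nested virtualisation.
kvm_module_for() {
  if grep -qw vmx "$1"; then
    echo kvm_intel        # Intel VT-x exposed: nested virt available
  elif grep -qw svm "$1"; then
    echo kvm_amd          # AMD-V exposed
  else
    echo none             # no nested virt: skip the /dev/kvm mount
  fi
}

kvm_module_for /proc/cpuinfo
```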

sanya-pf9 (Contributor, Author) replied:
@spai-p9 yess!! All valid points!

My idea was to let VMs fall back to QEMU/TCG gracefully when KVM isn't available; we don't need to force it. The `|| true` ensures the DaemonSet never crashes, and if /dev/kvm doesn't exist (nested virt not enabled), QEMU automatically falls back to TCG. No pod failures, no special detection logic needed. Enabling nested virt on the hypervisor is outside vjailbreak's control anyway.
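The graceful-fallback idea boils down to a presence check on the device node; a minimal sketch (hypothetical helper, with the device path passed in so it can be tested without real hardware):

```shell
#!/bin/sh
# Hypothetical sketch of the fallback logic: QEMU uses KVM only when the
# kvm character device exists; otherwise it silently drops to TCG.
choose_accel() {
  if [ -c "$1" ]; then
    echo kvm    # hardware acceleration available
  else
    echo tcg    # software emulation fallback
  fi
}

choose_accel /dev/kvm
```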

We should discuss this in detail with the team; hence I'm recording my thought process in the PR as well.



Development

Successfully merging this pull request may close these issues.

Mount /dev/kvm into the v2v-helper pod so KVM can actually be used
