|
| 1 | +# Root Cause Analysis: BoxLite VM Failure on Amazon Linux 2023 (EC2 c8i) |
| 2 | + |
| 3 | +## Summary |
| 4 | + |
| 5 | +BoxLite VMs fail to start on Amazon Linux 2023 (kernel 6.1) on EC2 c8i instances. The guest kernel (Linux 6.12.62, embedded in libkrunfw) triggers an i8042 CPU reset during early boot, causing immediate VM termination with no error output. |
| 6 | + |
| 7 | +## Environment |
| 8 | + |
| 9 | +| Component | Working (Ubuntu) | Failing (AL2023) | |
| 10 | +|-----------|-----------------|------------------| |
| 11 | +| Host OS | Ubuntu 24.04 | Amazon Linux 2023 | |
| 12 | +| Host kernel | 6.17.0-1007-aws | 6.1.164-196.303.amzn2023 | |
| 13 | +| Instance type | c8i.4xlarge | c8i.4xlarge | |
| 14 | +| Instance ID | i-0a2516cdf06c221df (boxlite-prod) | i-095ae61ecbca0d780 (boxlite-dev) | |
| 15 | +| Nested KVM | Yes | Yes | |
| 16 | +| KVM capabilities | ept, vpid, unrestricted_guest | Identical | |
| 17 | +| Guest kernel | 6.12.62 (libkrunfw v5.1.0) | 6.12.62 (libkrunfw v5.1.0) | |
| 18 | + |
| 19 | +## Root Cause |
| 20 | + |
| 21 | +The guest kernel (Linux 6.12.62) issues a **hardware reset via the i8042 keyboard controller** during very early boot when running under nested KVM on host kernel 6.1. |
| 22 | + |
| 23 | +### Shutdown sequence |
| 24 | + |
| 25 | +``` |
| 26 | +1. krun_start_enter() called |
| 27 | +2. VMM creates VM, sets up devices (CMOS, plus virtio balloon, rng, console, fs, block, vsock, net) |
| 28 | +3. 8 vCPUs started in paused state, then resumed |
| 29 | +4. Boot vCPU (vCPU0) runs ~5 KVM_RUN iterations (I/O handling) |
| 30 | +5. Guest kernel writes CMD_RESET_CPU (0xFE) to i8042 port 0x64 |
| 31 | + → i8042 device handler writes to reset_evt EventFd |
| 32 | +6. VMM event loop receives reset_evt, calls libc::_exit(0) |
| 33 | +7. All threads killed instantly — no console output, no error messages |
| 34 | +``` |
| 35 | + |
| 36 | +### Why the original diagnosis was wrong |
| 37 | + |
| 38 | +PR #417 diagnosed this as "broken nested KVM on EC2 c8i / Amazon Linux 2023" based on a flawed KVM smoke test. The smoke test didn't initialize vCPU registers (CS:IP, RFLAGS) before KVM_RUN, causing it to fail on any nested KVM. This was fixed in PR #421. |
| 39 | + |
| 40 | +KVM itself works correctly on both kernels — the issue is that the **guest kernel** (libkrunfw 6.12.62) can't boot under kernel 6.1's nested KVM emulation. |
| 41 | + |
| 42 | +## Diagnosis Process |
| 43 | + |
| 44 | +### Step 1: Verify KVM works |
| 45 | + |
| 46 | +Wrote a C program (`kvm_test.c`) that creates a minimal VM with a HLT instruction and proper vCPU register init (CS base=0, RIP=0, RFLAGS=0x2). |
| 47 | + |
| 48 | +``` |
| 49 | +# AL2023 (kernel 6.1) |
| 50 | +WITH register setup: exit_reason=5 (HLT) ✅ |
| 51 | +WITHOUT register setup: exit_reason=17 (INTERNAL_ERROR) ❌ |
| 52 | + |
| 53 | +# Ubuntu 24.04 (kernel 6.17) |
| 54 | +WITH register setup: exit_reason=5 (HLT) ✅ |
| 55 | +WITHOUT register setup: exit_reason=0 (UNKNOWN) ❌ |
| 56 | +``` |
| 57 | + |
| 58 | +**Conclusion:** KVM works on both. The smoke test was broken, not KVM. |
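For reading the `exit_reason` values printed by the smoke test, a small lookup helps. The sketch below is illustrative only: the numeric values are a subset of the `KVM_EXIT_*` constants in `<linux/kvm.h>`, while the helper names `describe_exit` and `smoke_test_passed` are hypothetical and not part of `kvm_test.c`.

```python
# Subset of KVM_EXIT_* values as defined in <linux/kvm.h>.
KVM_EXIT_REASONS = {
    0: "KVM_EXIT_UNKNOWN",
    2: "KVM_EXIT_IO",
    5: "KVM_EXIT_HLT",
    6: "KVM_EXIT_MMIO",
    8: "KVM_EXIT_SHUTDOWN",
    9: "KVM_EXIT_FAIL_ENTRY",
    17: "KVM_EXIT_INTERNAL_ERROR",
}


def describe_exit(reason: int) -> str:
    """Return the symbolic name for a kvm_run.exit_reason value."""
    return KVM_EXIT_REASONS.get(reason, f"unlisted exit reason {reason}")


def smoke_test_passed(reason: int) -> bool:
    """A guest consisting of a single HLT passes only on KVM_EXIT_HLT (5)."""
    return reason == 5
```

With this mapping, the "WITH register setup" runs on both hosts count as a pass, and any other exit reason counts as a smoke-test failure regardless of host kernel.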
| 59 | + |
| 60 | +### Step 2: Compare KVM capabilities |
| 61 | + |
| 62 | +Used `KVM_CHECK_EXTENSION` ioctl via Python to compare capabilities: |
| 63 | + |
| 64 | +``` |
| 65 | +# Both kernels — identical results: |
| 66 | +IRQCHIP(1)=1, HLT(2)=1, USER_MEMORY(4)=1, SET_TSS_ADDR(6)=1 |
| 67 | +VAPIC(13)=0, EXT_CPUID(14)=1, INTERNAL_ERROR_DATA(25)=4096 |
| 68 | +TSC_CONTROL(28)=0, SET_BOOT_CPU_ID(37)=1, X86_DISABLE_EXITS(41)=1 |
| 69 | +SPLIT_IRQCHIP(117)=1, IMMEDIATE_EXIT(119)=0, VMX_EXCEPTION_PAYLOAD(129)=3 |
| 70 | + |
| 71 | +CPU flags (both): vmx, ept, vpid, unrestricted_guest, tpr_shadow, vnmi, flexpriority |
| 72 | +Nested virt: Y (both) |
| 73 | +``` |
| 74 | + |
| 75 | +**Conclusion:** KVM capabilities are identical. The difference is not in feature exposure. |
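The comparison itself can be automated once the dump from each host is saved as text. The following is a minimal sketch that assumes the `NAME(index)=value` format shown above; `parse_caps` and `diff_caps` are hypothetical helper names, and the sketch does not query `/dev/kvm` itself.

```python
import re

# Matches entries such as "IRQCHIP(1)=1" or "INTERNAL_ERROR_DATA(25)=4096".
CAP_RE = re.compile(r"([A-Z0-9_]+)\((\d+)\)=(\d+)")


def parse_caps(dump: str) -> dict:
    """Parse a capability dump into {name: value}, ignoring other text."""
    return {name: int(value) for name, _index, value in CAP_RE.findall(dump)}


def diff_caps(first: dict, second: dict) -> dict:
    """Return {name: (first_value, second_value)} for every differing entry.

    A capability missing from one dump is reported with None on that side.
    """
    names = sorted(set(first) | set(second))
    return {
        name: (first.get(name), second.get(name))
        for name in names
        if first.get(name) != second.get(name)
    }
```

An empty result from `diff_caps` corresponds to the "identical" finding above; any non-empty result would name the capability to investigate first.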
| 76 | + |
| 77 | +### Step 3: Run BoxLite with debug logging |
| 78 | + |
| 79 | +Built boxlite-cli on AL2023 and ran with `RUST_LOG=trace`: |
| 80 | + |
| 81 | +``` |
| 82 | +[shim] T+0ms: main() entered |
| 83 | +[shim] T+5ms: config parsed |
| 84 | +[shim] T+9ms: logging initialized |
| 85 | +[shim] T+83ms: gvproxy created |
| 86 | +[shim] T+83ms: engine created |
| 87 | +[shim] T+92ms: instance created (krun FFI calls done) |
| 88 | +[shim] T+93ms: entering VM (krun_start_enter) |
| 89 | +[krun] krun_start_enter called |
| 90 | +[DEBUG vmm] using vcpu exit code: 0 |
| 91 | +[INFO vmm] Vmm is stopping. |
| 92 | +``` |
| 93 | + |
| 94 | +**Observations:** |
| 95 | +- Shim starts fine, all FFI calls succeed |
| 96 | +- Virtio devices set up: `set_irq_line: 5-13` (balloon, rng, console, fs×2, block×2, vsock, net) |
| 97 | +- VMM stops with exit code 0 (clean shutdown) |
| 98 | +- Console output: completely empty |
| 99 | +- No `KVM_EXIT_HLT`, `SHUTDOWN`, `FAIL_ENTRY`, or `INTERNAL_ERROR` observed |
| 100 | + |
| 101 | +### Step 4: Instrument libkrun vCPU run loop |
| 102 | + |
| 103 | +Added `eprintln!("[krun-debug] vCPU run loop iteration {n}")` to the `running()` function in `vmm/src/linux/vstate.rs`. Required rebuilding libkrun from vendored source (not the prebuilt binary): |
| 104 | + |
| 105 | +```bash |
| 106 | +# Must delete libkrun's own target directory to force rebuild |
| 107 | +rm -rf src/deps/libkrun-sys/vendor/libkrun/target/ |
| 108 | +make shim # triggers full rebuild |
| 109 | +``` |
| 110 | + |
| 111 | +**Result:** |
| 112 | +``` |
| 113 | +[krun-debug] vCPU run loop iteration 1 # ×8 (one per vCPU) |
| 114 | +[krun-debug] vCPU run loop iteration 2 # boot vCPU only |
| 115 | +[krun-debug] vCPU run loop iteration 3 |
| 116 | +[krun-debug] vCPU run loop iteration 4 |
| 117 | +[krun-debug] vCPU run loop iteration 5 |
| 118 | +``` |
| 119 | + |
| 120 | +Boot vCPU runs exactly 5 KVM_RUN iterations. Other 7 vCPUs run 1 each. |
| 121 | + |
| 122 | +### Step 5: Instrument KVM exit handlers |
| 123 | + |
| 124 | +Added `eprintln` and `std::fs::write("/tmp/krun-vcpu-*.log")` to every `VcpuExit` handler: HLT, Shutdown, SystemEvent, FailEntry, InternalError, and the `Stopped`/`Error` catch in `running()`. |
| 125 | + |
| 126 | +**Result:** None of them fired. No files written. No exit handler triggered. |
| 127 | + |
| 128 | +Verified strings are in the deployed binary: |
| 129 | +```bash |
| 130 | +$ strings ~/.local/share/boxlite/runtimes/v0.8.0-*/boxlite-shim | grep 'krun-debug' |
| 131 | +[krun-debug] KVM_EXIT_HLT |
| 132 | +[krun-debug] KVM_EXIT_SHUTDOWN |
| 133 | +[krun-debug] KVM_SYSTEM_EVENT: event= |
| 134 | +[krun-debug] vCPU STOPPED at iteration |
| 135 | +... |
| 136 | +``` |
| 137 | + |
| 138 | +**Conclusion:** The vCPU exits through `Interrupted` (EINTR) → channel disconnected, not through any KVM exit handler. Something external triggers the shutdown. |
| 139 | + |
| 140 | +### Step 6: Instrument i8042 device handler |
| 141 | + |
| 142 | +Found in `libkrun/src/devices/src/legacy/i8042.rs:229-236`: |
| 143 | +```rust |
| 144 | +OFS_STATUS if data[0] == CMD_RESET_CPU => { |
| 145 | + // The guest wants to assert the CPU reset line. |
| 146 | + if let Err(e) = self.reset_evt.write(1) { ... } |
| 147 | +} |
| 148 | +``` |
| 149 | + |
| 150 | +Added instrumentation: |
| 151 | +```rust |
| 152 | +let _ = std::fs::write("/tmp/krun-i8042-reset.log", |
| 153 | + "i8042: CMD_RESET_CPU triggered by guest kernel\n"); |
| 154 | +eprintln!("[krun-debug] i8042: CMD_RESET_CPU - guest requested reset!"); |
| 155 | +``` |
| 156 | + |
| 157 | +**Result:** |
| 158 | +``` |
| 159 | +$ cat /tmp/krun-i8042-reset.log |
| 160 | +i8042: CMD_RESET_CPU triggered by guest kernel |
| 161 | + |
| 162 | +$ grep krun-debug shim.stderr |
| 163 | +[krun-debug] vCPU run loop iteration 1 # ×8 |
| 164 | +[krun-debug] vCPU run loop iteration 2 |
| 165 | +[krun-debug] i8042: CMD_RESET_CPU - guest requested reset! # ← HERE |
| 166 | +[krun-debug] vCPU run loop iteration 3 |
| 167 | +[krun-debug] vCPU run loop iteration 4 |
| 168 | +[krun-debug] i8042: CMD_RESET_CPU - guest requested reset! # ← RETRY |
| 169 | +[krun-debug] vCPU run loop iteration 5 |
| 170 | +``` |
| 171 | + |
| 172 | +**Root cause confirmed:** The guest kernel triggers CMD_RESET_CPU via i8042 port 0x64 at iteration 2, retries at iteration 4. The i8042 handler writes to `reset_evt` EventFd → VMM event loop calls `_exit(0)`. |
| 173 | + |
| 174 | +### Step 7: Capture all I/O operations |
| 175 | + |
| 176 | +Added `eprintln` to IoIn, IoOut, MmioRead, MmioWrite handlers. Also modified `DEFAULT_KERNEL_CMDLINE` to remove `quiet` and add `earlyprintk=hvc0 panic=5 panic_print=15`. |
| 177 | + |
| 178 | +**Complete I/O trace of the failed boot:** |
| 179 | +``` |
| 180 | +vCPU run loop iteration 1 # ×8 (one per vCPU, likely HLT/interrupt) |
| 181 | +IoIn port=0x64 len=1 # Read i8042 status register |
| 182 | +IoOut port=0x64 data=[254] # Write 0xFE = CMD_RESET_CPU (first reset attempt) |
| 183 | +i8042: CMD_RESET_CPU |
| 184 | +IoIn port=0x64 len=1 # Read status again (retry) |
| 185 | +IoOut port=0x64 data=[254] # Write 0xFE again (second reset attempt) |
| 186 | +i8042: CMD_RESET_CPU |
| 187 | +``` |
| 188 | + |
| 189 | +**Key observation:** The guest kernel performs no other I/O operations: no serial port writes, no MMIO, no CMOS, no PIC/APIC, no PCI. It goes directly from initial execution to the i8042 reset. This suggests the failure happens during **kernel decompression or very early startup** (before any hardware probing). |
| 190 | + |
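| 191 | +The `earlyprintk=hvc0` option and the removal of `quiet` had no effect: console output remained empty, consistent with a failure before the console driver is initialized. Note also that `hvc0` is not among the targets documented for x86 `earlyprintk=` (such as `serial`, `ttyS0`, `vga`), so the option may simply have been ignored. |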
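A captured trace can be checked mechanically for this pattern. The sketch below assumes the `IoIn`/`IoOut` line format shown in the trace above; the shape of `MmioRead`/`MmioWrite` lines is an assumption (none appeared in the failing run), and `summarize_trace` is a hypothetical helper name.

```python
import re

I8042_CMD_PORT = 0x64   # i8042 status/command port
CMD_RESET_CPU = 0xFE    # value that pulses the CPU reset line

# "IoOut port=0x64 data=[254]" as printed by the instrumented handlers.
IO_OUT_RE = re.compile(r"IoOut port=(0x[0-9a-fA-F]+) data=\[(\d+)")
# Any I/O handler line; the port is optional because MMIO lines carry none.
IO_ANY_RE = re.compile(
    r"\b(IoIn|IoOut|MmioRead|MmioWrite)\b(?: port=(0x[0-9a-fA-F]+))?"
)


def summarize_trace(trace: str) -> dict:
    """Count i8042 reset writes and I/O that does not touch port 0x64."""
    resets = 0
    other_io = 0
    for line in trace.splitlines():
        match = IO_ANY_RE.search(line)
        if match is None:
            continue  # not an I/O handler line (e.g. run-loop messages)
        port = match.group(2)
        out = IO_OUT_RE.search(line)
        if (
            out is not None
            and int(out.group(1), 16) == I8042_CMD_PORT
            and int(out.group(2)) == CMD_RESET_CPU
        ):
            resets += 1
        elif port is None or int(port, 16) != I8042_CMD_PORT:
            other_io += 1
    return {"resets": resets, "other_io": other_io}
```

For the failing boot, the expected summary is two resets and zero other I/O. A non-zero `other_io` count on a different host or guest kernel would indicate the guest got further into hardware probing before failing.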
| 192 | + |
| 193 | +### Step 8: Analysis of the failure point |
| 194 | + |
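| 195 | +The kernel cmdline includes `reboot=k`, which tells Linux to reboot via the keyboard controller, and `panic=-1`, which reboots immediately on panic. The likely sequence is: |
| 196 | +1. Guest kernel starts decompression / early init |
| 197 | +2. An early boot error occurs before any console output, most likely a panic (a true triple fault would normally surface to the VMM as `KVM_EXIT_SHUTDOWN`, which Step 5 did not observe) |
| 198 | +3. The guest's reboot path reads the i8042 status register and writes CMD_RESET_CPU (0xFE) to port 0x64, matching the Step 7 trace |
| 199 | +4. VMM `_exit(0)` |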
| 200 | + |
| 201 | +The early failure likely occurs because kernel 6.1's nested KVM doesn't properly emulate a CPU feature that the guest kernel 6.12.62 requires during very early boot (decompressor or head_64.S code). This is a hypothesis; the specific feature has not been identified. |
| 202 | + |
| 203 | +### Further investigation needed |
| 204 | + |
| 205 | +To determine the exact CPU feature causing the early boot failure: |
| 206 | +1. Use `perf kvm stat` on the host to capture VM entry/exit reasons |
| 207 | +2. Try an older libkrunfw guest kernel (e.g., 5.15 or 6.1) to see if it boots |
| 208 | +3. Compare CPUID leaves between kernel 6.1 and 6.17 nested KVM using a test program |
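As a cheap first pass before writing a CPUID test program, the `flags` line of `/proc/cpuinfo` captured on each host can be compared. This is only a sketch under stated assumptions: it shows what each L1 kernel reports for its own CPU, not what KVM exposes to the L2 guest (that would need `KVM_GET_SUPPORTED_CPUID`), and `parse_flags` and `flag_diff` are hypothetical helper names.

```python
def parse_flags(cpuinfo: str) -> set:
    """Return the CPU flags from the first 'flags' line of /proc/cpuinfo text.

    Lines such as 'vmx flags' are skipped because their key is not exactly
    'flags'.
    """
    for line in cpuinfo.splitlines():
        key, sep, value = line.partition(":")
        if sep and key.strip() == "flags":
            return set(value.split())
    return set()


def flag_diff(first_cpuinfo: str, second_cpuinfo: str) -> dict:
    """List flags present on only one of the two hosts."""
    first = parse_flags(first_cpuinfo)
    second = parse_flags(second_cpuinfo)
    return {
        "only_in_first": sorted(first - second),
        "only_in_second": sorted(second - first),
    }
```

Feeding it the saved `/proc/cpuinfo` contents from the AL2023 and Ubuntu hosts gives a short list of candidate features to check in the guest's early boot code.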
| 209 | + |
| 210 | +## Why kernel 6.1 vs 6.17 |
| 211 | + |
| 212 | +KVM capabilities are identical between both kernels (verified via `KVM_CHECK_EXTENSION`). The difference is likely in: |
| 213 | + |
| 214 | +1. **CPUID emulation**: Kernel 6.1 may not expose certain CPUID leaves that the guest kernel 6.12.62 requires (e.g., newer Intel features like CET, WAITPKG, AMX) |
| 215 | +2. **MSR handling**: Certain MSRs may not be properly emulated under nested KVM in 6.1 |
| 216 | +3. **VMX feature bits**: The nested VMX VMCS may not advertise features the guest kernel expects |
| 217 | + |
| 218 | +The guest kernel hits an early boot failure (likely during CPU feature detection or APIC setup), determines it can't continue, and uses the i8042 reset as the fallback shutdown mechanism. |
| 219 | + |
| 220 | +## Recommendations |
| 221 | + |
| 222 | +### For libkrun/libkrunfw |
| 223 | + |
| 224 | +1. **Add early console output**: The guest kernel should print diagnostics before reaching the point where it triggers a reset. Since `earlyprintk=hvc0` produced no output in Step 7, this needs an output path that is available before the virtio console is initialized (for example a legacy serial port with `earlyprintk=serial`, if the VMM exposes one). |
| 225 | + |
| 226 | +2. **Consider a more compatible guest kernel**: libkrunfw uses kernel 6.12.62, which may require features not available in kernel 6.1's nested KVM. Testing with an older guest kernel (e.g., 5.15 or 6.1) would show whether the guest kernel version is the trigger. |
| 227 | + |
| 228 | +3. **Don't silently _exit on i8042 reset**: Instead of calling `_exit(0)`, the VMM should log a clear error: "Guest kernel triggered hardware reset (i8042 CMD_RESET_CPU). The guest kernel may be incompatible with this KVM configuration." |
| 229 | + |
| 230 | +### For BoxLite users |
| 231 | + |
| 232 | +1. **Use Ubuntu 24.04** (kernel 6.17) instead of Amazon Linux 2023 for EC2 instances |
| 233 | +2. **Or upgrade AL2023 kernel** to a newer version if available |
| 234 | +3. **Or use bare-metal instances** (.metal) which don't have nested KVM limitations |
| 235 | + |
| 236 | +## Files involved |
| 237 | + |
| 238 | +| File | Role | |
| 239 | +|------|------| |
| 240 | +| `libkrun/src/devices/src/legacy/i8042.rs:229-236` | i8042 CMD_RESET_CPU handler triggers VM exit | |
| 241 | +| `libkrun/src/vmm/src/lib.rs:403-428` | VMM event handler calls _exit(0) on reset_evt | |
| 242 | +| `libkrun/src/vmm/src/linux/vstate.rs:1421-1590` | vCPU run_emulation() and running() state machine | |
| 243 | +| `libkrun/src/libkrun/src/lib.rs:2684-2688` | VMM event loop in krun_start_enter | |
| 244 | + |
| 245 | +## Upstream Status |
| 246 | + |
| 247 | +No existing upstream issues or fixes found for running libkrun on a nested KVM host (our scenario: EC2 L0 → KVM L1 → libkrun L2). This appears to be an unreported configuration. |
| 248 | + |
| 249 | +Note: [libkrunfw#50](https://github.com/containers/libkrunfw/issues/50) covers a different case: enabling nested KVM *inside* libkrun guest VMs, not running libkrun on top of nested KVM. |
| 250 | + |
| 251 | +Relevant references: |
| 252 | +- [libkrun#460](https://github.com/containers/libkrun/issues/460) — Silent reset with low memory (fixed in v1.17.0, similar symptom) |
| 253 | +- [libkrun#314](https://github.com/containers/libkrun/issues/314) — ENOMEM on kernel 6.12/6.13 host (different issue) |
| 254 | +- [libkrun#302](https://github.com/containers/libkrun/pull/302) — KVM SystemEvents support (aarch64 only) |
| 255 | + |
| 256 | +BoxLite works on Ubuntu 24.04 (kernel 6.17) on the same EC2 c8i hardware, presumably because the newer host kernel's nested KVM emulation satisfies the requirements of guest kernel 6.12.62. The specific difference has not yet been identified. |
| 257 | + |
| 258 | +## Date |
| 259 | + |
| 260 | +Investigation conducted: 2026-04-02 |