Commit 23509fe

docs: add root cause analysis for AL2023 EC2 c8i VM boot failure
Guest kernel (libkrunfw 6.12.62) triggers i8042 CMD_RESET_CPU during early boot on nested KVM with host kernel 6.1 (Amazon Linux 2023). The reset causes immediate _exit(0) with no console output. Root cause: the guest kernel detects an incompatible CPU/hardware configuration under kernel 6.1's nested KVM emulation and performs a hardware reset via the i8042 controller. Ubuntu 24.04 (kernel 6.17) works because it provides better nested VMX emulation. This is an unreported configuration upstream — libkrun has not been tested on nested KVM Linux hosts.
1 parent da71624 commit 23509fe

1 file changed

Lines changed: 260 additions & 0 deletions

@@ -0,0 +1,260 @@
# Root Cause Analysis: BoxLite VM Failure on Amazon Linux 2023 (EC2 c8i)

## Summary

BoxLite VMs fail to start on Amazon Linux 2023 (kernel 6.1) on EC2 c8i instances. The guest kernel (Linux 6.12.62, embedded in libkrunfw) triggers an i8042 CPU reset during early boot, causing immediate VM termination with no error output.

## Environment

| Component | Working (Ubuntu) | Failing (AL2023) |
|-----------|-----------------|------------------|
| Host OS | Ubuntu 24.04 | Amazon Linux 2023 |
| Host kernel | 6.17.0-1007-aws | 6.1.164-196.303.amzn2023 |
| Instance type | c8i.4xlarge | c8i.4xlarge |
| Instance ID | i-0a2516cdf06c221df (boxlite-prod) | i-095ae61ecbca0d780 (boxlite-dev) |
| Nested KVM | Yes | Yes |
| KVM capabilities | ept, vpid, unrestricted_guest | Identical |
| Guest kernel | 6.12.62 (libkrunfw v5.1.0) | 6.12.62 (libkrunfw v5.1.0) |

## Root Cause

The guest kernel (Linux 6.12.62) issues a **hardware reset via the i8042 keyboard controller** during very early boot when running under nested KVM on host kernel 6.1.

### Shutdown sequence

```
1. krun_start_enter() called
2. VMM creates VM, sets up devices (CMOS plus virtio balloon, rng, console, fs, block, vsock, net)
3. 8 vCPUs started in paused state, then resumed
4. Boot vCPU (vCPU0) runs ~5 KVM_RUN iterations (I/O handling)
5. Guest kernel writes CMD_RESET_CPU (0xFE) to i8042 port 0x64
   → i8042 device handler writes to reset_evt EventFd
6. VMM event loop receives reset_evt, calls libc::_exit(0)
7. All threads killed instantly: no console output, no error messages
```

### Why the original diagnosis was wrong

PR #417 diagnosed this as "broken nested KVM on EC2 c8i / Amazon Linux 2023" based on a flawed KVM smoke test. The smoke test didn't initialize vCPU registers (CS:IP, RFLAGS) before KVM_RUN, causing it to fail on any nested KVM. This was fixed in PR #421.

KVM itself works correctly on both kernels. The issue is that the **guest kernel** (libkrunfw 6.12.62) can't boot under kernel 6.1's nested KVM emulation.

## Diagnosis Process

### Step 1: Verify KVM works

Wrote a C program (`kvm_test.c`) that creates a minimal VM with a HLT instruction and proper vCPU register init (CS base=0, RIP=0, RFLAGS=0x2).

```
# AL2023 (kernel 6.1)
WITH register setup: exit_reason=5 (HLT) ✅
WITHOUT register setup: exit_reason=17 (SHUTDOWN) ❌

# Ubuntu 24.04 (kernel 6.17)
WITH register setup: exit_reason=5 (HLT) ✅
WITHOUT register setup: exit_reason=0 (UNKNOWN) ❌
```

**Conclusion:** KVM works on both. The smoke test was broken, not KVM. (In `linux/kvm.h`, exit reason 17 is `KVM_EXIT_INTERNAL_ERROR` and `KVM_EXIT_SHUTDOWN` is 8, so the `SHUTDOWN` label on the AL2023 line is probably a mislabel in the test program. The conclusion is unaffected.)
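
The numeric exit reasons printed by `kvm_test.c` are easier to read with a small lookup table. The sketch below covers only the exit reasons this document discusses; the values are taken from the Linux UAPI header `linux/kvm.h` and should be verified against the headers on the host under test. The helper name `describe_exit` is hypothetical.

```python
# Subset of KVM exit reasons with their values from <linux/kvm.h>.
# Only the reasons relevant to this investigation are listed.
KVM_EXIT_REASONS = {
    0: "KVM_EXIT_UNKNOWN",
    2: "KVM_EXIT_IO",
    5: "KVM_EXIT_HLT",
    6: "KVM_EXIT_MMIO",
    8: "KVM_EXIT_SHUTDOWN",
    9: "KVM_EXIT_FAIL_ENTRY",
    10: "KVM_EXIT_INTR",
    17: "KVM_EXIT_INTERNAL_ERROR",
    24: "KVM_EXIT_SYSTEM_EVENT",
}


def describe_exit(reason: int) -> str:
    """Format an exit reason number together with its symbolic name."""
    name = KVM_EXIT_REASONS.get(reason, "unlisted")
    return f"exit_reason={reason} ({name})"


for reason in (5, 17, 0):
    print(describe_exit(reason))
```

Printing the symbolic name next to the number makes it harder for a test program's label and the kernel's actual exit reason to drift apart.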

### Step 2: Compare KVM capabilities

Used `KVM_CHECK_EXTENSION` ioctl via Python to compare capabilities:

```
# Both kernels, identical results:
IRQCHIP(1)=1, HLT(2)=1, USER_MEMORY(4)=1, SET_TSS_ADDR(6)=1
VAPIC(13)=0, EXT_CPUID(14)=1, INTERNAL_ERROR_DATA(25)=4096
TSC_CONTROL(28)=0, SET_BOOT_CPU_ID(37)=1, X86_DISABLE_EXITS(41)=1
SPLIT_IRQCHIP(117)=1, IMMEDIATE_EXIT(119)=0, VMX_EXCEPTION_PAYLOAD(129)=3

CPU flags (both): vmx, ept, vpid, unrestricted_guest, tpr_shadow, vnmi, flexpriority
Nested virt: Y (both)
```

**Conclusion:** The checked KVM capabilities are identical, so the difference is not in this part of the feature exposure. Note that the numeric IDs in the output do not match the names in `linux/kvm.h` (for example, `KVM_CAP_IRQCHIP` is 0 and `KVM_CAP_HLT` is 1), so the labels should be rechecked. The host-to-host comparison of the queried IDs still holds.
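
The original comparison script is not included here, so the following is a minimal sketch of how such a capability query could look, not the script that produced the output above. It assumes `KVM_CHECK_EXTENSION` is `_IO(0xAE, 0x03)` and uses a handful of capability IDs from `linux/kvm.h`; the function names and the injectable `ioctl` parameter are choices made for this sketch.

```python
import fcntl
import os

# _IO(KVMIO, 0x03) with KVMIO = 0xAE, as defined in <linux/kvm.h>.
KVM_CHECK_EXTENSION = 0xAE03

# A few capability IDs from <linux/kvm.h>. Extend as needed and verify
# the numbers against the headers of the kernel under test.
CAPS = {
    "KVM_CAP_IRQCHIP": 0,
    "KVM_CAP_HLT": 1,
    "KVM_CAP_USER_MEMORY": 3,
    "KVM_CAP_SET_TSS_ADDR": 4,
    "KVM_CAP_EXT_CPUID": 7,
}


def query_caps(fd, caps=CAPS, ioctl=fcntl.ioctl):
    """Return {name: value} for each capability; 0 means unsupported."""
    return {name: ioctl(fd, KVM_CHECK_EXTENSION, cap) for name, cap in caps.items()}


def diff_caps(a, b):
    """Return {name: (value_a, value_b)} for capabilities that differ."""
    names = sorted(set(a) | set(b))
    return {n: (a.get(n), b.get(n)) for n in names if a.get(n) != b.get(n)}


def main():
    """Query the local /dev/kvm and print one NAME=value line per capability."""
    fd = os.open("/dev/kvm", os.O_RDWR)
    try:
        for name, value in query_caps(fd).items():
            print(f"{name}={value}")
    finally:
        os.close(fd)
```

Calling `main()` on each host and feeding the two result dictionaries to `diff_caps` gives an explicit list of differences; an empty dictionary corresponds to the "identical" result reported above.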

### Step 3: Run BoxLite with debug logging

Built boxlite-cli on AL2023 and ran with `RUST_LOG=trace`:

```
[shim] T+0ms: main() entered
[shim] T+5ms: config parsed
[shim] T+9ms: logging initialized
[shim] T+83ms: gvproxy created
[shim] T+83ms: engine created
[shim] T+92ms: instance created (krun FFI calls done)
[shim] T+93ms: entering VM (krun_start_enter)
[krun] krun_start_enter called
[DEBUG vmm] using vcpu exit code: 0
[INFO vmm] Vmm is stopping.
```

**Observations:**
- Shim starts fine, all FFI calls succeed
- Virtio devices set up: `set_irq_line: 5-13` (balloon, rng, console, fs×2, block×2, vsock, net)
- VMM stops with exit code 0 (clean shutdown)
- Console output: completely empty
- No `KVM_EXIT_HLT`, `SHUTDOWN`, `FAIL_ENTRY`, or `INTERNAL_ERROR` observed

### Step 4: Instrument libkrun vCPU run loop

Added `eprintln!("[krun-debug] vCPU run loop iteration {n}")` to the `running()` function in `vmm/src/linux/vstate.rs`. Required rebuilding libkrun from vendored source (not the prebuilt binary):

```bash
# Must delete libkrun's own target directory to force rebuild
rm -rf src/deps/libkrun-sys/vendor/libkrun/target/
make shim  # triggers full rebuild
```

**Result:**
```
[krun-debug] vCPU run loop iteration 1  # ×8 (one per vCPU)
[krun-debug] vCPU run loop iteration 2  # boot vCPU only
[krun-debug] vCPU run loop iteration 3
[krun-debug] vCPU run loop iteration 4
[krun-debug] vCPU run loop iteration 5
```

Boot vCPU runs exactly 5 KVM_RUN iterations. Other 7 vCPUs run 1 each.

### Step 5: Instrument KVM exit handlers

Added `eprintln` and `std::fs::write("/tmp/krun-vcpu-*.log")` to every `VcpuExit` handler: HLT, Shutdown, SystemEvent, FailEntry, InternalError, and the `Stopped`/`Error` catch in `running()`.

**Result:** None of them fired. No files written. No exit handler triggered.

Verified strings are in the deployed binary:
```bash
$ strings ~/.local/share/boxlite/runtimes/v0.8.0-*/boxlite-shim | grep 'krun-debug'
[krun-debug] KVM_EXIT_HLT
[krun-debug] KVM_EXIT_SHUTDOWN
[krun-debug] KVM_SYSTEM_EVENT: event=
[krun-debug] vCPU STOPPED at iteration
...
```

**Conclusion:** The vCPU exits through `Interrupted` (EINTR) → channel disconnected, not through any KVM exit handler. Something external triggers the shutdown.

### Step 6: Instrument i8042 device handler

Found in `libkrun/src/devices/src/legacy/i8042.rs:229-236`:
```rust
OFS_STATUS if data[0] == CMD_RESET_CPU => {
    // The guest wants to assert the CPU reset line.
    if let Err(e) = self.reset_evt.write(1) { ... }
}
```

Added instrumentation:
```rust
let _ = std::fs::write("/tmp/krun-i8042-reset.log",
    "i8042: CMD_RESET_CPU triggered by guest kernel\n");
eprintln!("[krun-debug] i8042: CMD_RESET_CPU - guest requested reset!");
```

**Result:**
```
$ cat /tmp/krun-i8042-reset.log
i8042: CMD_RESET_CPU triggered by guest kernel

$ grep krun-debug shim.stderr
[krun-debug] vCPU run loop iteration 1  # ×8
[krun-debug] vCPU run loop iteration 2
[krun-debug] i8042: CMD_RESET_CPU - guest requested reset!  # ← HERE
[krun-debug] vCPU run loop iteration 3
[krun-debug] vCPU run loop iteration 4
[krun-debug] i8042: CMD_RESET_CPU - guest requested reset!  # ← RETRY
[krun-debug] vCPU run loop iteration 5
```

**Root cause confirmed:** The guest kernel triggers CMD_RESET_CPU via i8042 port 0x64 at iteration 2, retries at iteration 4. The i8042 handler writes to `reset_evt` EventFd → VMM event loop calls `_exit(0)`.
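
The hand-off from device handler to event loop can be illustrated with a toy model. This is a sketch for reasoning about the flow, not libkrun's code: `I8042Model`, `run_event_loop`, and the use of `threading.Event` in place of an EventFd are all assumptions made for illustration. Only port 0x64 and the value 0xFE come from the findings above.

```python
import threading

I8042_COMMAND_PORT = 0x64  # command/status port named in the findings
CMD_RESET_CPU = 0xFE       # value the guest writes to request a reset


class I8042Model:
    """Toy stand-in for the i8042 device; not libkrun's implementation."""

    def __init__(self, reset_evt: threading.Event):
        self.reset_evt = reset_evt
        self.reset_requests = 0

    def io_out(self, port: int, data: bytes) -> None:
        """Handle a guest port write; signal the reset event on 0xFE to 0x64."""
        if port == I8042_COMMAND_PORT and data[:1] == bytes([CMD_RESET_CPU]):
            self.reset_requests += 1
            self.reset_evt.set()  # plays the role of reset_evt.write(1)


def run_event_loop(reset_evt, on_reset, timeout=1.0):
    """Wait for the reset event, then run on_reset (libkrun calls _exit(0))."""
    if reset_evt.wait(timeout):
        on_reset()
        return True
    return False
```

The model makes one property visible: the vCPU thread never sees a dedicated exit reason for the reset. It only handles an ordinary port write, and the process terminates from a different thread, which is consistent with none of the exit handlers in Step 5 firing.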

### Step 7: Capture all I/O operations

Added `eprintln` to IoIn, IoOut, MmioRead, MmioWrite handlers. Also modified `DEFAULT_KERNEL_CMDLINE` to remove `quiet` and add `earlyprintk=hvc0 panic=5 panic_print=15`.

**Complete I/O trace of the failed boot:**
```
vCPU run loop iteration 1  # ×8 (one per vCPU, likely HLT/interrupt)
IoIn port=0x64 len=1  # Read i8042 status register
IoOut port=0x64 data=[254]  # Write 0xFE = CMD_RESET_CPU (first reset attempt)
i8042: CMD_RESET_CPU
IoIn port=0x64 len=1  # Read status again (retry)
IoOut port=0x64 data=[254]  # Write 0xFE again (second reset attempt)
i8042: CMD_RESET_CPU
```

**Key observation:** The guest kernel performs ZERO other I/O operations: no serial port writes, no MMIO, no CMOS, no PIC/APIC, no PCI. It goes directly from initial execution to i8042 reset. This means the failure happens during **kernel decompression or very early startup** (before any hardware probing).

The `earlyprintk=hvc0` and removal of `quiet` had no effect. Console output remained empty because the crash occurs before the console driver is even initialized.
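
A trace like the one above can be checked mechanically, which helps when repeating the experiment with other guest kernels. The sketch below assumes the exact `IoIn port=... len=...` and `IoOut port=... data=[...]` line shapes shown in the trace; `parse_trace` and `summarize` are hypothetical helper names.

```python
import re

# Matches the two line shapes from the trace: "IoIn port=0x64 len=1"
# and "IoOut port=0x64 data=[254]". Other lines are ignored.
IO_LINE = re.compile(
    r"^(IoIn|IoOut)\s+port=(0x[0-9a-fA-F]+)"
    r"(?:\s+(?:len=(\d+)|data=\[([\d,\s]*)\]))?"
)


def parse_trace(text):
    """Return a list of (kind, port, data_values) tuples for port I/O lines."""
    events = []
    for line in text.splitlines():
        line = line.split("#", 1)[0].strip()  # drop trailing annotations
        match = IO_LINE.match(line)
        if not match:
            continue
        kind, port, _length, data = match.groups()
        values = [int(v) for v in data.split(",") if v.strip()] if data else []
        events.append((kind, int(port, 16), values))
    return events


def summarize(events):
    """Report which ports were touched and how many 0xFE writes hit 0x64."""
    ports = sorted({port for _, port, _ in events})
    resets = sum(
        1
        for kind, port, values in events
        if kind == "IoOut" and port == 0x64 and values[:1] == [0xFE]
    )
    return {"ports": ports, "reset_writes": resets}
```

For the trace above, `summarize` reports port 0x64 as the only port touched and two reset writes, which is the "zero other I/O" observation in machine-checkable form.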

### Step 8: Analysis of the failure point

The kernel cmdline includes `reboot=k`, which tells Linux to reboot via the keyboard controller. Combined with `panic=-1` (immediate reboot on panic), the sequence is:
1. Guest kernel starts decompression / early init
2. Triple fault or early boot error (before any console output)
3. CPU reset → i8042 CMD_RESET_CPU (0xFE) on port 0x64
4. VMM `_exit(0)`

The early failure likely occurs because kernel 6.1's nested KVM doesn't properly emulate a CPU feature that the guest kernel 6.12.62 requires during very early boot (decompressor or head_64.S code). One caveat: a true triple fault would normally be reported to the VMM as `KVM_EXIT_SHUTDOWN`, which Step 5 did not observe. The status read followed by a `0xFE` write on port 0x64 matches the kernel's own `reboot=k` restart path, so an early panic that reaches the reboot code fits the trace better than a triple fault.

### Further investigation needed

To determine the exact CPU feature causing the early boot failure:
1. Use `perf kvm stat` on the host to capture VM entry/exit reasons
2. Try an older libkrunfw guest kernel (e.g., 5.15 or 6.1) to see if it boots
3. Compare CPUID leaves between kernel 6.1 and 6.17 nested KVM using a test program
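
For the CPUID comparison in item 3, one possible approach is to capture a raw dump on each host and diff the two offline. The sketch below assumes the dumps use the line layout printed by the `cpuid -r` utility (`0x00000007 0x00: eax=0x... ebx=0x... ecx=0x... edx=0x...`); that format, and the helper names `parse_dump` and `diff_dumps`, are assumptions to adjust to whatever tool is actually used.

```python
import re

REGS = ("eax", "ebx", "ecx", "edx")

# Assumed line layout: leaf, subleaf, then the four registers in hex.
LEAF = re.compile(
    r"^\s*(0x[0-9a-f]{8})\s+(0x[0-9a-f]+):\s+"
    r"eax=(0x[0-9a-f]{8})\s+ebx=(0x[0-9a-f]{8})\s+"
    r"ecx=(0x[0-9a-f]{8})\s+edx=(0x[0-9a-f]{8})",
    re.IGNORECASE,
)


def parse_dump(text):
    """Map (leaf, subleaf) to (eax, ebx, ecx, edx); keeps the first CPU seen."""
    leaves = {}
    for line in text.splitlines():
        match = LEAF.match(line)
        if match:
            leaf, sub, *regs = (int(g, 16) for g in match.groups())
            leaves.setdefault((leaf, sub), tuple(regs))
    return leaves


def diff_dumps(a, b):
    """Return (leaf, subleaf, register, value_a, value_b) for each mismatch.

    A leaf missing from one dump shows up with None for that side.
    """
    diffs = []
    for key in sorted(set(a) | set(b)):
        regs_a = a.get(key, (None,) * 4)
        regs_b = b.get(key, (None,) * 4)
        for name, va, vb in zip(REGS, regs_a, regs_b):
            if va != vb:
                diffs.append((key[0], key[1], name, va, vb))
    return diffs
```

A dump taken on the host shows what the hypervisor below exposes to that host. What KVM in turn offers to the libkrun guest would need a separate query (for example via `KVM_GET_SUPPORTED_CPUID`), which this sketch does not perform.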

## Why kernel 6.1 vs 6.17

The checked KVM capabilities are identical between both kernels (verified via `KVM_CHECK_EXTENSION`). The difference is likely in:

1. **CPUID emulation**: Kernel 6.1 may not expose certain CPUID leaves that the guest kernel 6.12.62 requires (e.g., newer Intel features like CET, WAITPKG, AMX)
2. **MSR handling**: Certain MSRs may not be properly emulated under nested KVM in 6.1
3. **VMX feature bits**: The nested VMX VMCS may not advertise features the guest kernel expects

The guest kernel hits an early boot failure (likely during CPU feature detection or APIC setup), determines it can't continue, and uses the i8042 reset as the fallback shutdown mechanism.

## Recommendations

### For libkrun/libkrunfw

1. **Add early console output**: The guest kernel should print diagnostics before reaching the point where it would trigger a reset. Step 7 showed that `earlyprintk=hvc0` alone is not sufficient, because the failure precedes console initialization. An output channel that works before the virtio console is initialized would be needed to capture the actual error.

2. **Consider a more compatible guest kernel**: libkrunfw uses kernel 6.12.62, which may require features not available in kernel 6.1's nested KVM. An older guest kernel (e.g., 5.15 or 6.1) may boot where 6.12.62 does not; this is untested.

3. **Don't silently _exit on i8042 reset**: Instead of calling `_exit(0)`, the VMM should log a clear error: "Guest kernel triggered hardware reset (i8042 CMD_RESET_CPU). The guest kernel may be incompatible with this KVM configuration."
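
Recommendation 3 could take roughly the following shape. This is a sketch of the idea in Python, not a patch for libkrun: the function name, the non-zero exit status, and the "no console output seen" heuristic for telling an early failure apart from an ordinary guest reboot are all hypothetical choices. Only the error message text is taken from the recommendation.

```python
import logging
import sys

RESET_MESSAGE = (
    "Guest kernel triggered hardware reset (i8042 CMD_RESET_CPU). "
    "The guest kernel may be incompatible with this KVM configuration."
)

# Hypothetical exit status for this sketch; the current behaviour is _exit(0).
GUEST_RESET_EXIT_CODE = 1


def handle_reset_event(console_bytes_seen, log=logging.getLogger("vmm"), exit_fn=sys.exit):
    """Log why the VM is stopping before terminating the process.

    Heuristic (an assumption of this sketch): a reset before the guest has
    written anything to the console is treated as a boot failure.
    """
    if console_bytes_seen == 0:
        log.error(RESET_MESSAGE)
        exit_fn(GUEST_RESET_EXIT_CODE)
    else:
        log.info("Guest requested reset after producing console output.")
        exit_fn(0)
```

Passing `exit_fn` as a parameter keeps the handler testable. A non-zero status for the early-failure case would also let callers such as the BoxLite shim distinguish this situation from a clean shutdown.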

### For BoxLite users

1. **Use Ubuntu 24.04** (kernel 6.17) instead of Amazon Linux 2023 for EC2 instances
2. **Or upgrade AL2023 kernel** to a newer version if available
3. **Or use bare-metal instances** (.metal) which don't have nested KVM limitations

## Files involved

| File | Role |
|------|------|
| `libkrun/src/devices/src/legacy/i8042.rs:229-236` | i8042 CMD_RESET_CPU handler triggers VM exit |
| `libkrun/src/vmm/src/lib.rs:403-428` | VMM event handler calls _exit(0) on reset_evt |
| `libkrun/src/vmm/src/linux/vstate.rs:1421-1590` | vCPU run_emulation() and running() state machine |
| `libkrun/src/libkrun/src/lib.rs:2684-2688` | VMM event loop in krun_start_enter |

## Upstream Status

No existing upstream issues or fixes found for running libkrun on a nested KVM host (our scenario: EC2 L0 → KVM L1 → libkrun L2). This appears to be an unreported configuration.

Note: [libkrunfw#50](https://github.com/containers/libkrunfw/issues/50) covers a *different* case: enabling nested KVM *inside* libkrun guest VMs, not running libkrun on top of nested KVM.

Relevant references:
- [libkrun#460](https://github.com/containers/libkrun/issues/460): Silent reset with low memory (fixed in v1.17.0, similar symptom)
- [libkrun#314](https://github.com/containers/libkrun/issues/314): ENOMEM on kernel 6.12/6.13 host (different issue)
- [libkrun#302](https://github.com/containers/libkrun/pull/302): KVM SystemEvents support (aarch64 only)

BoxLite works on Ubuntu 24.04 (kernel 6.17) on the same EC2 c8i hardware, presumably because the newer kernel provides nested KVM emulation that satisfies the guest kernel 6.12.62's requirements. The specific difference has not yet been identified.

## Date

Investigation conducted: 2026-04-02
