Conversation
Merge the Flint VMM (a custom Zig microVMM) into the repo at vmm/, replacing Firecracker as the underlying VM engine.

TypeScript SDK changes:
- Rename FirecrackerApi → FlintApi with Flint-compatible payloads (two-phase snapshot/load, flat mem_file_path, no resume_vm)
- Sandbox restore uses the Flint CLI restore mode (--restore flags) instead of HTTP roundtrips for snapshot loading
- Replace getFirecrackerPath() → getVmmPath(), pointing to ~/.hearth/bin/flint
- Setup builds Flint from source (zig build -Doptimize=ReleaseSafe) instead of downloading a Firecracker binary
- Delete dm-thin provisioning (src/vm/thin.ts, ~360 lines) — use cp --reflink=auto (COPYFILE_FICLONE) for all rootfs copies
- Remove the pool CLI command and the thin pool from status
- Socket renamed to flint.sock; vcpu_count set to 1

VMM changes (vmm/):
- Copy the Flint source, removing redundant files (agent, pool, pool_api, sandbox)
- Strip pool-mode CLI args, the agent listener/accept path, and sandbox API endpoints
- Add ELF vmlinux loading support alongside bzImage in boot/loader.zig
- Remove the flint-agent build target

Known issue: ELF vmlinux boot (Firecracker CI kernels) stalls during init — needs further debugging of boot_params setup for the ELF path. bzImage kernels work correctly.
📝 Walkthrough

Replaces Firecracker with an in-tree Zig VMM named Flint: docs and CLI updated, thin-pool/dm-thin support removed, TypeScript VM API renamed/adjusted, and setup now builds Flint from source.
Sequence Diagrams

```mermaid
sequenceDiagram
    participant TS as TypeScript host client
    participant Flint as flint (Zig VMM)
    participant KVM as /dev/kvm (kernel)
    participant Devices as VirtIO backends
    TS->>Flint: PUT /boot-source, /drives, /vsock, /machine-config
    Flint-->>TS: 200 OK (config stored)
    TS->>Flint: PUT /actions {"ActionType":"InstanceStart"}
    Flint->>KVM: open /dev/kvm, create VM, set memory, create vCPU
    Flint->>Devices: init blk/net/vsock/serial backends
    Flint->>KVM: KVM_RUN (vCPU loop)
    loop device events
        Devices->>Flint: signal interrupt / queue work
        Flint->>KVM: inject IRQ / handle MMIO IO
    end
```

```mermaid
sequenceDiagram
    participant TS as TypeScript host client
    participant FlintCLI as flint (CLI)
    participant KVM as /dev/kvm
    TS->>FlintCLI: spawn flint --restore --vmstate-path ... --mem-path ...
    FlintCLI->>KVM: open /dev/kvm, create VM, mmap mem file
    FlintCLI->>KVM: restore vCPU, irq/timer, device state (snapshot order)
    FlintCLI-->>TS: API available / VM running
```
Estimated code review effort: 🎯 5 (Critical) | ⏱️ ~120 minutes
🚥 Pre-merge checks: ✅ 2 passed | ❌ 1 failed (warning)
Actionable comments posted: 7
Note: Due to the large number of review comments, Critical severity comments were prioritized as inline comments.

Caution: Some comments are outside the diff and can't be posted inline due to platform limitations.
⚠️ Outside diff range comments (2)
examples/claude-in-sandbox.ts (1)
5-16: ⚠️ Potential issue | 🟡 Minor
Update the macOS prerequisite text to the new remote-daemon flow.

The block around line 14 still tells macOS users to run `hearth lima setup`, which conflicts with the new README guidance that Flint requires KVM and macOS should connect to a remote Linux daemon instead.
🤖 Prompt for AI Agents: Verify each finding against the current code and only fix it if needed. In `@examples/claude-in-sandbox.ts` around lines 5-16: update the macOS prerequisite line in the README block; replace the old "hearth lima setup (macOS M3+)" instruction with the new remote-daemon guidance (e.g., mention connecting to a remote Linux daemon or running "npx hearth setup" on a remote Linux host) so it matches the README and Flint's KVM requirements; edit the Prerequisites bullet where "2." and the macOS note appear to reflect the remote-daemon flow, and remove the lima instruction.

README.md (1)
573: ⚠️ Potential issue | 🟡 Minor
Synchronize the earlier checkpoint section with this new behavior.

This line says `sandbox.checkpoint()` keeps the sandbox running, but the `## Checkpoint and Rollback` section around line 182 still says the active session is terminated on checkpoint. Please update that section too, so the README does not document two different semantics for the same API.
🤖 Prompt for AI Agents: Verify each finding against the current code and only fix it if needed. In `@README.md` at line 573: update the "## Checkpoint and Rollback" section to match the current behavior described in v0.6; change any text that says checkpointing terminates or destroys the active session to state that sandbox.checkpoint() saves the VM state while keeping the sandbox/session running, allowing rollback and branching without terminating the active sandbox; mention sandbox.checkpoint() by name and remove or replace phrases like "terminates the active session" or "destroys the sandbox" so the README documents a single, consistent API behavior.
🟠 Major comments (22)
src/cli/setup.ts-77-80 (1)
77-80: ⚠️ Potential issue | 🟠 Major
Don't treat any existing Flint binary as current.

Once `~/.hearth/bin/flint` exists, setup never rebuilds it. After an upgrade, users can keep running an older VMM binary that no longer matches the shipped SDK/API or snapshot behavior. Please key this cache on a Flint version/fingerprint, not bare file existence.
🤖 Prompt for AI Agents: Verify each finding against the current code and only fix it if needed. In `@src/cli/setup.ts` around lines 77-80: the current check treats any file at flintPath (join(BIN_DIR, "flint")) as up-to-date; change it to validate a version/fingerprint before skipping install. Compute a stable fingerprint or query the binary (e.g., run "flint --version" or compute the SHA256 of flintPath) and compare it to the expected FLINT_FINGERPRINT/FLINT_VERSION constant shipped with the build; only log "flint: already installed" and return if the fingerprint/version matches, otherwise remove/overwrite the stale binary and proceed with building/installing; also handle failures to read or execute the existing binary by falling back to reinstall.

src/cli/setup.ts-123-125 (1)
123-125: ⚠️ Potential issue | 🟠 Major
Use HTTPS for the kernel index request.

The final kernel download is HTTPS, but the object listing that chooses `latestKey` is fetched over plaintext HTTP. That allows on-path tampering or downgrade selection before the secure download step.

🔒 Suggested change

```diff
- const listUrl = `http://spec.ccfc.min.s3.amazonaws.com/?prefix=firecracker-ci/v1.15/${architecture}/vmlinux-&list-type=2`;
+ const listUrl = `https://spec.ccfc.min.s3.amazonaws.com/?prefix=firecracker-ci/v1.15/${architecture}/vmlinux-&list-type=2`;
```
🤖 Prompt for AI Agents: Verify each finding against the current code and only fix it if needed. In `@src/cli/setup.ts` around lines 123-125: change the kernel index request to use HTTPS; update the construction of listUrl (where it's built using architecture and passed to fetchText) to use "https://" instead of "http://", so the object listing that determines latestKey is fetched over TLS; ensure fetchText is still used to fetch the HTTPS URL and that any logic that derives latestKey from the XML remains unchanged.

src/vm/api.ts-31-32 (1)
31-32: 🛠️ Refactor suggestion | 🟠 Major
Create a typed `FlintApiError` class for HTTP errors from the Flint API.

The module violates the coding guideline requiring each module to define its own typed error types. Using a generic `Error` forces callers to parse the message string to distinguish Flint HTTP status failures from other VM boot/configuration errors. Define the error class at the module level with structured fields for method, path, status code, and response body.

♻️ Suggested change

```diff
+export class FlintApiError extends Error {
+  constructor(
+    readonly method: string,
+    readonly path: string,
+    readonly statusCode: number,
+    readonly responseBody: string,
+  ) {
+    super(`Flint ${method} ${path}: ${statusCode} ${responseBody}`);
+    this.name = "FlintApiError";
+  }
+}
+
 /** Thin client for the Flint VMM REST API over Unix socket. */
 export class FlintApi {
 @@
-      reject(new Error(`Flint ${method} ${path}: ${res.statusCode} ${data}`));
+      reject(new FlintApiError(method, path, res.statusCode, data));
```
🤖 Prompt for AI Agents: Verify each finding against the current code and only fix it if needed. In `@src/vm/api.ts` around lines 31-32: add a module-level typed FlintApiError class (exported) that extends Error and exposes structured fields: method: string, path: string, statusCode: number, and body: string (set name = "FlintApiError" and preserve the stack). Replace the generic reject(new Error(...)) in the request/response path (the branch that currently checks res.statusCode >= 300) with reject(new FlintApiError(method, path, res.statusCode, data)) so callers can instanceof-check and access structured fields instead of parsing the message.

src/cli/setup.ts-100-108 (1)
100-108: ⚠️ Potential issue | 🟠 Major
Use a typed error instead of a generic Error for setup failures.

The `findVmmDir()` function currently throws a generic `Error`, which violates the coding guideline that "each module must define its own typed error types instead of using generic errors." Create a `SetupError` class (or similar) extending `HearthError` and use it here. This allows callers to distinguish setup failures from other error types.

Note: the `vmm/` directory will be included in the published npm package by default (no `.npmignore` exists and `.gitignore` only excludes build artifacts), so the lookup will work from distributed tarballs.
🤖 Prompt for AI Agents: Verify each finding against the current code and only fix it if needed. In `@src/cli/setup.ts` around lines 100-108: define a typed setup error and throw it instead of a generic Error; add a SetupError class (or similarly named) that extends HearthError (importing HearthError where needed) in the same module or a nearby errors module, and replace the throw in findVmmDir() with throw new SetupError("Could not find vmm/ directory. Run from the hearth repo root."); export the new SetupError if other modules need to catch it, so callers can distinguish setup failures from other error types.

src/cli/setup.ts-83-95 (1)
83-95: ⚠️ Potential issue | 🟠 Major
Validate the Zig version against the manifest's minimum requirement.

The preflight only checks that `zig` exists via `execSync("zig version", { stdio: "pipe" })` but does not validate the version number. The `vmm/build.zig.zon` manifest declares `.minimum_zig_version = "0.16.0-dev.1484+d0ba6642b"`, which is never checked. When an incompatible Zig version is installed, the build fails later with an opaque error from `zig build`, since `stdio: "pipe"` suppresses stderr output to the user.
🤖 Prompt for AI Agents: Verify each finding against the current code and only fix it if needed. In `@src/cli/setup.ts` around lines 83-95: read and parse the manifest's minimum_zig_version (from the manifest file in the vmm dir found by findVmmDir()), then replace the current naive execSync("zig version") check with logic that captures and parses the installed Zig version, compares it against the manifest's minimum_zig_version string (handling pre-release/metadata as needed), and if the installed version is lower prints a clear error mentioning both the installed version and the required minimum and exits; additionally, when invoking execSync("zig build -Doptimize=ReleaseSafe", { cwd: vmmDir, ... }), ensure stderr is not suppressed (either forward stdio or capture and log stderr on failure) so build errors are visible to the user.

vmm/src/api.zig-109-117 (1)
109-117: ⚠️ Potential issue | 🟠 Major
Don't return 204 for InstanceStart until the boot config is valid.

`handleAction()` acknowledges `InstanceStart` immediately, but `serve()` can still discover that neither `kernel_path` nor `snapshot_path` was configured and just keep waiting. Clients will see a successful start request even though the VM never transitions. Validate before responding, or return a 4xx from the action handler.

Also applies to: 184-186, 367-386
🤖 Prompt for AI Agents: Verify each finding against the current code and only fix it if needed. In `@vmm/src/api.zig` around lines 109-117: handleAction() currently acknowledges InstanceStart (returns 204) before validating the boot configuration, causing clients to see success even when neither config.kernel_path nor config.snapshot_path is set; update handleAction() (and the InstanceStart handling branches around the started/config checks) to validate that either config.snapshot_path != null or config.kernel_path != null before sending a 2xx response, and if validation fails return a 4xx error instead; mirror the same validation logic used in serve() (the code paths around the started flag and snapshot/kernel checks) so the action handler only acknowledges when the boot config is actually valid.

vmm/src/devices/virtio/blk.zig-125-146 (1)
125-146: ⚠️ Potential issue | 🟠 Major
Keep `used_len` equal to the bytes the device actually produced.

The `T_IN` path unconditionally adds `total_data_len` even if a `pread()` fails partway through, and `T_GET_ID` adds `id_desc.len` even though the device writes at most 20 bytes. `vmm/src/devices/virtio/queue.zig` lines 76-92 copy this value straight into the used ring, so over-reporting here can make the guest treat unwritten bytes as valid data.

Also applies to: 175-184, 195-197
🤖 Prompt for AI Agents: Verify each finding against the current code and only fix it if needed. In `@vmm/src/devices/virtio/blk.zig` around lines 125-146: ensure used_len/device_written reflects actual bytes written rather than assumed totals. In the T_IN path (the loop using linux.pread and the variables file_offset, rc, bytes_read) accumulate the real bytes written per descriptor (bytes_read, or 0 on short reads/failure) into used_len/device_written and do not add total_data_len when a pread fails partway; similarly, in the T_GET_ID handling, cap the increment to at most the actual number of bytes written (the min of id_desc.len and the device-written ID length, e.g., 20). Update the logic around used_len, device_written, and any additions of total_data_len/id_desc.len so they sum the actual bytes written (using rc/bytes_read) and stop adding on errors/breaks; mirror this fix at the other affected ranges (around lines 175-184 and 195-197) so queue.zig receives correct used lengths.

vmm/src/api.zig-726-739 (1)
726-739: ⚠️ Potential issue | 🟠 Major
`SendCtrlAltDel` is a no-op right now.

This branch never injects a key event, triggers ACPI shutdown, or otherwise tells the guest to power off. It only waits up to 5 seconds for `runtime.exited` to change on its own, so the endpoint will mostly time out while still advertising shutdown semantics. Return 501/400 until there is an actual shutdown path, or wire it to a real device/runtime action.
🤖 Prompt for AI Agents: Verify each finding against the current code and only fix it if needed. In `@vmm/src/api.zig` around lines 726-739: the SendCtrlAltDel branch currently does nothing to initiate a shutdown and merely waits on runtime.exited; until there is a real shutdown/inject path, have the handler return a clear not-implemented response instead of pretending to shut down. Detect the parsed.value.action_type == "SendCtrlAltDel" branch and call respondError(request, .not_implemented, "SendCtrlAltDel not supported") (or .bad_request, as preferred) rather than spinning on runtime.exited/respondOk; alternatively, if you wire it to a device/runtime action, invoke the appropriate shutdown/inject function on the runtime/API and then wait on runtime.exited before calling respondOk/respondError.

vmm/src/integration_tests.zig-32-43 (1)
32-43: ⚠️ Potential issue | 🟠 Major
Stop embedding `init_script` inside a raw single-quoted shell string.

Several callers pass `echo -n ''`, which closes the shell quote early and turns the rest of the script into shell syntax. That makes `buildInitrd()` fail for the API/snapshot tests before Flint even starts. Write the file with Zig I/O, or escape `'` before interpolating.
🤖 Prompt for AI Agents: Verify each finding against the current code and only fix it if needed. In `@vmm/src/integration_tests.zig` around lines 32-43: in buildInitrd, stop embedding init_script inside a raw single-quoted shell string passed to process.run (the current printf '...'), because unescaped single quotes in init_script break the shell; instead either (preferred) write the init file using Zig I/O (open a temp file in TMPDIR and write init_script bytes directly before running bsdcpio) or (if keeping the shell approach) escape single quotes in init_script (replacing each ' with '\'' in the string used to build the shell command) before concatenation; update the process.run invocation accordingly and keep the references to init_script and process.run in the revised buildInitrd implementation.

vmm/src/integration_tests.zig-200-219 (1)
200-219: ⚠️ Potential issue | 🟠 Major
Replace fixed sleeps with a real readiness check.

`sleep_ms(500/1000/2000)` does not guarantee that the Unix socket is listening or that the VM/restore has reached the expected state, so these tests will be flaky on slower CI or under load. Poll `connect()` or `GET /vm` until the API becomes reachable instead of sleeping a guessed duration.

Also applies to: 245-257, 299-318, 346-401
🤖 Prompt for AI Agents: Verify each finding against the current code and only fix it if needed. In `@vmm/src/integration_tests.zig` around lines 200-219: replace the brittle sleep_ms(...) calls with a polling readiness helper that attempts to connect to the API socket (using connect() on sock_path) or repeatedly calls httpRequest(sock_path, "GET", "/vm", null) until a success/expected state (e.g., HTTP success and "Running" in the body) or a short timeout; remove the fixed sleeps in the boot/restore/start sequences (the locations using sleep_ms, httpRequest, sock_path, and the status check that searches for "Running") and use the helper with backoff/retry and a clear timeout, freeing responses via allocator.free on each attempt and returning an error if the socket/API never becomes ready.

vmm/src/devices/virtio/blk.zig-154-168 (1)
154-168: ⚠️ Potential issue | 🟠 Major
Treat short `pwrite` results as I/O failure, not success.

A positive short write is legal on Linux. The current loop advances `file_offset` by the full descriptor length and can still return `S_OK`, which silently truncates guest writes under ENOSPC or signal interruption. Loop until each descriptor is fully written, or fail the request when the full payload cannot be persisted.
🤖 Prompt for AI Agents: Verify each finding against the current code and only fix it if needed. In `@vmm/src/devices/virtio/blk.zig` around lines 154-168: the loop treating pwrite short writes as success must be changed so each descriptor is fully written. Inside the for over descs, use a loop that calls linux.pwrite(self.fd, buf.ptr + written, buf.len - written, file_offset + written) until written == buf.len; on each pwrite check rc < 0 => set status = S_IOERR and break, treat rc == 0 as failure and set S_IOERR, otherwise advance written by rc, and only after the descriptor finishes advance file_offset by the descriptor's written bytes (casting rc to u64 for arithmetic), handling overflow the same way you currently do; reference functions/vars: pwrite, file_offset, descs, desc.len, status, S_IOERR.

vmm/src/integration_tests.zig-157-172 (1)
157-172: ⚠️ Potential issue | 🟠 Major
The 10s boot timeout never fires while `read()` is blocked.

`linux.read()` on the stdout pipe is blocking, so if the guest stalls before printing `FLINT_BOOT_OK`, this loop can hang indefinitely inside the first read and never re-check the monotonic clock. Use `poll`/nonblocking I/O around stdout so the timeout is enforceable.
🤖 Prompt for AI Agents: Verify each finding against the current code and only fix it if needed. In `@vmm/src/integration_tests.zig` around lines 157-172: the loop that reads from the child stdout uses blocking linux.read(), so the timeout based on start_ts never triggers; change the I/O to use poll or non-blocking reads: call linux.poll (or set stdout.handle to O_NONBLOCK and use nonblocking reads) to wait for readability with a timeout derived from start_ts (or pass a computed timeout to poll), then only call linux.read when poll indicates data is available and handle EAGAIN/EWOULDBLOCK; update the loop around buf/total/FLINT_BOOT_OK to break on poll timeout and keep the existing logic for accumulating data and searching for "FLINT_BOOT_OK".

src/sandbox/sandbox.ts-111-122 (1)
111-122: ⚠️ Potential issue | 🟠 Major
Clean up the child if restore initialization fails before `new Sandbox(...)`.

Right now copy failures before `spawn()` leave a stale `runDir`, and failures after `spawn()` but before the constructor returns can also leave a live Flint process behind. Because `activeSandboxes.add(this)` only happens in the constructor, nothing reaps that partial restore later. Put the whole restore sequence under one cleanup guard.

Also applies to: 156-173
🤖 Prompt for AI Agents: Verify each finding against the current code and only fix it if needed. In `@src/sandbox/sandbox.ts` around lines 111-122: the restoreFromDir flow can leave a stale runDir or a live child process if any step fails before the Sandbox constructor adds the instance to activeSandboxes; wrap the entire restore sequence in a cleanup guard (e.g., local variables for runDir and the spawned child) so that on any error you remove the runDir and, if spawn() already created a child process, kill/clean up that child; ensure the guard runs both for copyFile failures and for failures after spawn() but before new Sandbox(...) returns, and keep activeSandboxes.add(this) only in the constructor as-is while the guard handles pre-constructor cleanup.

vmm/src/api.zig-488-499 (1)
488-499: ⚠️ Potential issue | 🟠 Major
Make the post-boot accept loop interruptible on guest exit.

The loop condition is checked before `accept()`, so once `runtime.exited` flips, the main thread can still block forever waiting for another client. In `vmm/src/main.zig`, lines 426-457 call `servePostBoot()` before `thread.join()`, so this can keep Flint alive after the guest has already exited. Use nonblocking accept/poll, or close the listener when the VM exits.
🤖 Prompt for AI Agents: Verify each finding against the current code and only fix it if needed. In `@vmm/src/api.zig` around lines 488-499: the accept loop in servePostBoot (the while loop that calls server.accept) can block after runtime.exited flips; make the accept interruptible by either switching the listener to non-blocking and polling/selecting on both the server socket and runtime.exited, or by arranging for the listener to be closed when runtime.exited is set so accept() returns. Locate the loop that references runtime.exited, server.accept, and handlePostBootConnection and modify it to (1) avoid a blocking accept when runtime.exited is true by using a nonblocking accept or poll, or (2) ensure the listener is closed from the VM-exit path so accept returns with an error and the loop can exit; update the error handling for accept to treat the listener-closed case as loop termination rather than retry.

vmm/src/main.zig-854-860 (1)
854-860: ⚠️ Potential issue | 🟠 Major
The pause handshake can miss a resume→pause transition.

`vmm/src/api.zig` lines 606-620 clear `ack_paused` and flip `paused` back to `false` before returning. If another pause request sets `paused = true` again before this loop observes that `false` edge, the vCPU keeps spinning here, never publishes a fresh ack, and the second pause waits forever. A level-triggered bool pair is too weak here; this needs an epoch/futex/condition-style handshake.
🤖 Prompt for AI Agents: Verify each finding against the current code and only fix it if needed. In `@vmm/src/main.zig` around lines 854-860: the current level-triggered boolean handshake using rt.ack_paused and rt.paused (observed in the loop where the vCPU spins and then clears vcpu.kvm_run.immediate_exit) can miss a resume→pause race; replace it with an epoch-based handshake: add an atomic pause_epoch counter (e.g., rt.pause_epoch) that the requestor increments to request a pause and an atomic ack_epoch (e.g., rt.ack_epoch) that the vCPU sets to match the requester's epoch when it has acknowledged the pause; use release stores when updating epochs and acquire loads when reading them, and block/wake using a futex/condvar (or Zig's Event/Cond API) rather than a spin loop so the vCPU can publish a fresh ack for every new epoch and the requester can wait for ack_epoch == pause_epoch safely; update references to rt.paused/rt.ack_paused and the vcpu.kvm_run.immediate_exit handling to use the new epoch/futex handshake (wake the vCPU when pause_epoch increments, and the requester clears by moving to the next epoch).

vmm/src/devices/virtio/queue.zig-159-165 (1)
159-165: ⚠️ Potential issue | 🟠 Major
Reject `avail.idx` values that outrun the queue.

`popAvail()` accepts any forward jump and just mods the ring index. A guest can advance `avail.idx` by more than `size`, which makes the VMM reconsume ring entries and duplicate blk/net work.

Suggested fix

```diff
 pub fn popAvail(self: *Self, mem: *Memory) !?u16 {
     const avail_idx = try self.getAvailIdx(mem);
     if (avail_idx == self.last_avail_idx) return null;
+    const pending = avail_idx -% self.last_avail_idx;
+    if (pending > self.size) return error.AvailRingCorrupt;
     const head = try self.getAvailRing(mem, self.last_avail_idx);
     self.last_avail_idx +%= 1;
     return head;
 }
```
🤖 Prompt for AI Agents: Verify each finding against the current code and only fix it if needed. In `@vmm/src/devices/virtio/queue.zig` around lines 159-165: popAvail currently accepts arbitrarily large forward jumps of avail.idx and just mods the ring index; fix it by validating the advance before consuming. After calling getAvailIdx (symbol: avail_idx), compute the forward distance from self.last_avail_idx to avail_idx using ring-size arithmetic (use self.size/the queue size) and reject (return an error) if that distance is greater than the queue size. Only if the distance is in range proceed to call getAvailRing(self.last_avail_idx) and increment last_avail_idx; reference functions/fields to change: popAvail, getAvailIdx, getAvailRing, last_avail_idx, and the queue size field (self.size).

vmm/src/devices/virtio/net.zig-171-178 (1)
171-178: ⚠️ Potential issue | 🟠 Major
Don't consume an RX buffer on `WouldBlock`.

The frame was never delivered, but this path still moves the descriptor to the used ring with `len=0`. A transient EAGAIN race drains the guest RX ring and can stall receive entirely.

Suggested fix

```diff
 const bytes_written = self.receiveFrame(mem, queue, head) catch |err| {
     // EAGAIN/EWOULDBLOCK means no more frames
     if (err == error.WouldBlock) {
-        // Already popped descriptor, push back as unused
-        queue.pushUsed(mem, head, 0) catch |e| log.warn("RX pushUsed failed: {}", .{e});
+        queue.last_avail_idx -%= 1;
         break;
     }
     log.err("RX failed: {}", .{err});
     queue.pushUsed(mem, head, 0) catch |e| log.warn("RX pushUsed failed: {}", .{e});
     break;
```
🤖 Prompt for AI Agents: Verify each finding against the current code and only fix it if needed. In `@vmm/src/devices/virtio/net.zig` around lines 171-178: when receiveFrame returns error.WouldBlock after popping a descriptor (head) you must not move that descriptor to the used ring with queue.pushUsed(..., 0); instead restore the descriptor to the guest RX available ring (i.e., requeue head) or otherwise avoid consuming it so the guest can retry. Update the error handler in the popAvail/receiveFrame block (where bytes_written is assigned) to requeue the descriptor (e.g., via the queue API that returns the descriptor to available) and skip calling queue.pushUsed for WouldBlock, then break/return as before.

vmm/src/devices/virtio/net.zig-75-80 (1)
75-80: ⚠️ Potential issue | 🟠 Major
TX packets are silently lost on `writev()` failures under host backpressure.

`init()` sets `O_NONBLOCK` on the TAP fd (lines 77-80), but `transmitChain()` does not handle write failures. When `writev()` returns EAGAIN due to host backpressure (lines 151-155), the code only logs a warning and continues to `queue.pushUsed()` (line 158), marking the descriptor complete. The packet data is lost without retry or deferral, while the guest believes transmission succeeded.

The RX path handles this correctly (lines 174-177) by re-queuing the descriptor on `WouldBlock`. The TX path needs equivalent handling.
🤖 Prompt for AI Agents: Verify each finding against the current code and only fix it if needed. In `@vmm/src/devices/virtio/net.zig` around lines 75-80: init() sets O_NONBLOCK on the TAP fd but transmitChain() currently treats writev() EAGAIN as a successful transmission and calls queue.pushUsed(), losing the packet; modify transmitChain() (the writev() handling) to detect EAGAIN/WouldBlock and NOT call queue.pushUsed() for that descriptor, instead requeue or stash the descriptor for retry (e.g., push it into a pending TX queue or re-add it to the virtqueue) and schedule a retry when the TAP fd becomes writable (via poll/epoll); also handle partial writes by tracking bytes written and adjusting iovecs so the remaining bytes are sent on retry; ensure queue.pushUsed() is only called after the entire packet has actually been written.

vmm/src/devices/virtio/mmio.zig-209-217 (1)
209-217: ⚠️ Potential issue | 🟠 Major
Device reset needs a backend reset hook.

`reset()` clears only transport and queue state. For `Vsock`, established host sockets, pending control packets, and buffered writes survive a guest reset, so a reinitialized device can inherit stale connections and leaked FDs. Please clear backend-local state when status is written back to `0`.
🤖 Prompt for AI Agents: Verify each finding against the current code and only fix it if needed. In `@vmm/src/devices/virtio/mmio.zig` around lines 209-217: the device reset currently only clears transport/queue fields in fn reset(self: *Self) but does not clear backend-local state (e.g., Vsock's host sockets, pending control packets, buffered writes), so add and call a backend reset hook from this code path: define a reset method on the backend interface (e.g., Backend.reset()) and invoke it from the mmio reset implementation (and/or wherever status is written back to 0) so that when status is cleared the backend cleans up sockets, pending packets, and FDs; reference the existing fn reset(self: *Self), Self.device_features_sel, Self.queue_sel, and the Vsock backend implementation to implement the cleanup.

vmm/src/snapshot.zig-251-268 (1)
251-268: ⚠️ Potential issue | 🟠 Major
Reject truncated or mismatched device records during restore.

This loop only checks `dev_len >= 16`, then restores into whatever backend happens to be in slot `i`. A malformed snapshot can therefore trip bounds checks inside `VirtioMmio.snapshotRestore()`, and a reordered or missing CLI device can have queue state applied to the wrong transport. Since the first 16 bytes already carry `device_id`/`mmio_base`/`irq`, restore should fail unless the backend exists, the identity matches, and the consumed size matches `dev_len`.

vmm/src/snapshot.zig-286-292 (1)
286-292: ⚠️ Potential issue | 🟠 Major
Snapshot files should not be world-readable.

`openCreate()` is used for both vmstate and raw RAM dumps, and `0o644` exposes guest memory/register state to other local users. Please tighten this to `0o600` unless shared snapshots are an explicit requirement.

Suggested change

```diff
-    }, 0o644));
+    }, 0o600));
```
Verify each finding against the current code and only fix it if needed. In `@vmm/src/snapshot.zig` around lines 286 - 292, The mode used when creating snapshot files in openCreate currently sets permissions to 0o644; update the file creation mode in openCreate (in snapshot.zig) from 0o644 to 0o600 so created vmstate and raw RAM dump files are owner-read/write only, ensuring snapshots are not world-readable; keep the rest of the flags (.WRONLY, .CREAT, .TRUNC, .CLOEXEC) unchanged.vmm/src/kvm/vcpu.zig-192-202 (1)
192-202: ⚠️ Potential issue | 🟠 Major
Use the counted ioctl path for MSRs instead of ioctlVoid.
`KVM_GET_MSRS` and `KVM_SET_MSRS` return the number of MSR entries successfully transferred, but the current code drops this via `ioctlVoid`. If the kernel handles fewer entries than requested, the code won't detect it: the uninitialized tail of `MsrBuffer` gets serialized as snapshot state, silently corrupting restore. Replace `abi.ioctlVoid()` with `abi.ioctl()` and validate that the returned count equals the expected entry count.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@vmm/src/kvm/vcpu.zig` around lines 192 - 202, In getMsrs and setMsrs, replace the use of abi.ioctlVoid with the counted path abi.ioctl and check the returned number of MSRs transferred against the expected count (derived from snapshot_msr_indices.len / buf.nmsrs or buf.nmsrs) to detect partial transfers; after calling abi.ioctl(self.fd, c.KVM_GET_MSRS, `@intFromPtr`(buf)) / abi.ioctl(self.fd, c.KVM_SET_MSRS, `@intFromPtr`(buf)) verify the returned int equals the expected entry count and return an error (or handle it) if it is smaller so you never serialize uninitialized tail entries from MsrBuffer.
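The counted-ioctl pattern this comment asks for can be modeled in a few lines of C. This is a sketch only: `fake_kvm_get_msrs` is a stand-in for the real `ioctl(fd, KVM_GET_MSRS, buf)` call (which returns the number of entries transferred), and `get_msrs_checked` shows the check the review wants added.

```c
#include <errno.h>

/* Stand-in for ioctl(fd, KVM_GET_MSRS, buf): the real ioctl returns the
 * number of MSR entries the kernel actually transferred. Here we simulate
 * a kernel that handles one entry fewer than requested. */
static int fake_kvm_get_msrs(int nmsrs_requested) {
    return nmsrs_requested - 1;
}

/* Counted path: treat any short transfer as an error so the uninitialized
 * tail of the MSR buffer is never serialized into a snapshot. */
static int get_msrs_checked(int nmsrs) {
    int rc = fake_kvm_get_msrs(nmsrs);
    if (rc < 0)
        return -errno;   /* the ioctl itself failed */
    if (rc != nmsrs)
        return -EIO;     /* partial transfer: reject, don't snapshot garbage */
    return 0;
}
```

The key point is that the return value of the ioctl carries information; discarding it via a void wrapper makes partial transfers indistinguishable from full ones.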
🟡 Minor comments (2)
vmm/src/seccomp.zig-171-227 (1)
171-227: ⚠️ Potential issue | 🟡 Minor
Audit mode still hard-kills on argument violations; log message is misleading.
The `default_action` parameter only controls the fallthrough for unlisted syscalls at line 172. The argument-validation branches for `clone`, `socket`, and `mprotect` at lines 179, 185, and 192 hardcode `SECCOMP_RET_KILL_PROCESS`. This means in audit mode (enabled via `--seccomp-audit`), argument violations still terminate the process despite the log message at line 225 claiming "violations logged, not killed." Either route those reject branches through `default_action` so audit mode behaves consistently, or clarify both the log message and CLI help text to specify that only unlisted syscalls are logged.
Suggested fix if audit mode should cover these cases too
-    f[CLONE_BLK + 3] = bpf_stmt(BPF_RET | BPF_K, SECCOMP_RET_KILL_PROCESS);
+    f[CLONE_BLK + 3] = bpf_stmt(BPF_RET | BPF_K, default_action);
-    f[SOCKET_BLK + 2] = bpf_stmt(BPF_RET | BPF_K, SECCOMP_RET_KILL_PROCESS);
+    f[SOCKET_BLK + 2] = bpf_stmt(BPF_RET | BPF_K, default_action);
-    f[MPROT_BLK + 3] = bpf_stmt(BPF_RET | BPF_K, SECCOMP_RET_KILL_PROCESS);
+    f[MPROT_BLK + 3] = bpf_stmt(BPF_RET | BPF_K, default_action);
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@vmm/src/seccomp.zig` around lines 171 - 227, the argument-validation branches for CLONE_BLK, SOCKET_BLK, and MPROT_BLK currently hardcode SECCOMP_RET_KILL_PROCESS, so audit mode still kills; change the return statements at CLONE_BLK+3, SOCKET_BLK+2, and MPROT_BLK+3 to return buildFilter's `default_action` instead of SECCOMP_RET_KILL_PROCESS so that when buildFilter is invoked with SECCOMP_RET_LOG (log_filter) those argument violations are logged rather than killing the process; keep kill_filter/log_filter and buildFilter usage as-is so behavior is driven by the passed `default_action`.
vmm/src/devices/serial.zig-151-158 (1)
151-158: ⚠️ Potential issue | 🟡 Minor
Clear the pending IRQ latch when THRE interrupts are disabled.
`updateIir()` switches `iir` back to `IIR_NO_INT`, but it leaves `irq_pending` set. If the guest clears `IER_THRE` before `hasPendingIrq()` runs, Flint still injects one stale interrupt.
Suggested fix
 if (self.ier & IER_THRE != 0) {
     self.iir = fifo_bits | IIR_THR_EMPTY;
     self.irq_pending = true;
 } else {
     self.iir = fifo_bits | IIR_NO_INT;
+    self.irq_pending = false;
 }
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@vmm/src/devices/serial.zig` around lines 151 - 158, The updateIir function leaves the irq_pending latch set when THRE is disabled, causing stale interrupts; modify updateIir (in serial.zig) so that in the branch where (self.ier & IER_THRE) == 0 you not only set self.iir = fifo_bits | IIR_NO_INT but also clear the pending latch (self.irq_pending = false) so hasPendingIrq() won't report a stale IRQ; keep the existing behavior where setting IER_THRE sets IIR_THR_EMPTY and irq_pending = true.
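The suggested fix can be modeled compactly in C. This is an illustrative sketch, not the Zig device code: the struct and function names are hypothetical, while the register bit values follow the standard 16550 UART convention the device emulates.

```c
#include <stdbool.h>
#include <stdint.h>

#define IER_THRE      0x02  /* THR-empty interrupt enable */
#define IIR_NO_INT    0x01  /* bit 0 set = no interrupt pending */
#define IIR_THR_EMPTY 0x02  /* interrupt ID: transmitter holding register empty */

typedef struct {
    uint8_t ier;      /* interrupt enable register */
    uint8_t iir;      /* interrupt identification register */
    bool irq_pending; /* latch consulted by hasPendingIrq() */
} Serial;

static void update_iir(Serial *s, uint8_t fifo_bits) {
    if (s->ier & IER_THRE) {
        s->iir = fifo_bits | IIR_THR_EMPTY;
        s->irq_pending = true;
    } else {
        s->iir = fifo_bits | IIR_NO_INT;
        s->irq_pending = false;  /* the fix: drop the stale pending latch */
    }
}
```

Without the `irq_pending = false` line, a guest that disables `IER_THRE` between the latch being set and the next interrupt poll still receives one spurious interrupt.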
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@src/cli/setup.ts`:
- Around line 120-125: The setup flow in src/cli/setup.ts currently constructs a
listUrl that points to ELF vmlinux artifacts; change it to fetch a bzImage
artifact instead (e.g., use the Firecracker CI prefix that contains bzImage
builds) or add a flag to prevent downloading ELF by default; update the code
around the fetchText/listUrl construction in setup.ts so it requests the bzImage
file for the chosen architecture, and ensure this aligns with the loader used by
src/vm/snapshot.ts (getKernelPath) so base snapshot creation uses a bzImage
rather than an ELF until the ELF loader bug is fixed.
In `@vmm/build.zig`:
- Around line 4-6: The build currently forces an x86_64 Linux default by setting
.default_target in the target = b.standardTargetOptions(...) block; remove that
hardcoding so Zig's default/CLI-provided target is used instead (e.g., call
b.standardTargetOptions(.{}) or omit the .default_target field) and ensure the
same `target` symbol (used in the test and integration-test step definitions)
remains referenced so tests and the flint binary are built for the
host/CLI-provided target rather than always x86_64.
In `@vmm/src/boot/loader.zig`:
- Around line 238-249: The PT_LOAD virtual-to-physical translation uses the
wrong high-half base: change the subtraction base from 0xffffffff80000000 to the
documented 0xffffffff81000000 so that addresses like 0xffffffff81000000 map to
0x01000000 (PHYS_OFFSET) instead of being shifted by an extra 0x01000000; update
the same logic wherever you translate the kernel entry/RIP (the other block
mentioned) so both places that compute phys from p_paddr use the corrected
high-half base while keeping PHYS_OFFSET and the existing bounds checks.
- Around line 296-305: The setup_header fields are being written at the wrong
offsets (using params.OFF_SETUP_HEADER + 6 and +10); update the writes so the
HdrS magic and boot-protocol version are written at the start of the synthetic
setup_header. Replace the writeInt calls that target params.OFF_SETUP_HEADER + 6
and +10 with writes at params.OFF_SETUP_HEADER (for the 4-byte HDRS_MAGIC) and
params.OFF_SETUP_HEADER + 4 (for the 2-byte 0x020F version), keeping the same bp
buffer, std.mem.writeInt calls, and little-endian encoding.
In `@vmm/src/devices/virtio/vsock.zig`:
- Around line 403-447: The code currently consumes the TX descriptor chain
before ensuring data is actually kept: both the initial flush-block path and the
.AGAIN branch subtract full descriptor lengths and restart stashing from
descs[1..], which can drop or duplicate bytes. Fix by treating stashWrite() as
authoritative: track per-descriptor progress (an index and an offset into the
current descriptor) and only decrement remaining and advance to the next
descriptor by the actual number of bytes stashed/ written; in the .AGAIN branch,
stash the unsent suffix of the current descriptor and then continue stashing
subsequent descriptors starting at the next descriptor index (not re-iterating
from descs[1..]). Update conn.rx_cnt by the actual stashed count returned from
stashWrite() and use that to decide when to stop, and ensure any mem.slice()
failure breaks the descriptor loop rather than assuming the full length was
handled.
- Around line 479-556: deliverRxData currently pops an RX head (queue.popAvail)
before knowing if there's TX window or any readable socket data, which lets
polling drain the guest RX ring; change the logic in deliverRxData to first
check conn.availableForTx() and confirm the host socket is readable (e.g., via
non-blocking poll or a recv/peek) and only then call queue.popAvail to consume
descriptors, or if you must pop early ensure you restore the head (not
queue.pushUsed with len 0) when you hit tx_window==0 or EAGAIN; update the
branches around conn.availableForTx(), linux.readv handling of EAGAIN, and
queue.pushUsed usage so descriptors are not consumed unless header+data will
actually be written (refer to deliverRxData, queue.popAvail,
conn.availableForTx, linux.readv, and queue.pushUsed).
In `@vmm/src/jail.zig`:
- Around line 65-75: The device nodes are created as root:root and supplementary
groups remain after setuid/setgid; before calling linux.setgid(config.gid) and
linux.setuid(config.uid) you must change ownership of created nodes and clear
supplementary groups so the target user can access them. After creating dev/kvm
and (if config.need_tun) dev/net/tun, call chown/chownat (or the equivalent
syscall wrapper) to set owner to config.uid and group to config.gid for
linux.mknod-created files, and call the function to clear supplementary groups
(setgroups(0, NULL) or the Zig wrapper) before invoking linux.setgid and
linux.setuid; ensure these operations occur after linux.mknod and before
linux.setgid/linux.setuid so the jailed VMM can open the devices.
---
Outside diff comments:
In `@examples/claude-in-sandbox.ts`:
- Around line 5-16: Update the macOS prerequisite line in the
examples/claude-in-sandbox.ts README block: replace the old "hearth lima setup
(macOS M3+)" instruction with the new remote-daemon guidance (e.g., mention
connecting to a remote Linux daemon or running "npx hearth setup" on a remote
Linux host) so it matches README and Flint KVM requirements; edit the
Prerequisites bullet where "2." and the macOS note appear to reflect the
remote-daemon flow and remove the lima instruction.
In `@README.md`:
- Line 573: Update the "## Checkpoint and Rollback" section to match the current
behavior described in v0.6: change any text that says checkpointing terminates
or destroys the active session to state that sandbox.checkpoint() saves the VM
state while keeping the sandbox/session running, allowing rollback and branching
without terminating the active sandbox; ensure you mention sandbox.checkpoint()
by name and remove or replace phrases like "terminates the active session" or
"destroys the sandbox" so the README documents a single, consistent API
behavior.
---
Major comments:
In `@src/cli/setup.ts`:
- Around line 77-80: The current check treats any file at flintPath
(join(BIN_DIR, "flint")) as up-to-date; change it to validate a
version/fingerprint before skipping install: compute a stable fingerprint or
query the binary (e.g., run "flint --version" or compute SHA256 of flintPath)
and compare it to the expected FLINT_FINGERPRINT/FLINT_VERSION constant shipped
with the build; only log "flint: already installed" and return if the
fingerprint/version matches, otherwise remove/overwrite the stale binary and
proceed with building/installing; also handle failures to read or execute the
existing binary by falling back to reinstall.
- Around line 123-125: Change the kernel index request to use HTTPS: update the
construction of listUrl (where it's built using architecture and passed to
fetchText) to use "https://" instead of "http://", so the object listing that
determines latestKey is fetched over TLS; ensure fetchText is still used to
fetch the HTTPS URL and that any logic that derives latestKey from xml remains
unchanged.
- Around line 100-108: Define a typed setup error and throw it instead of a
generic Error: add a SetupError class (or similarly named) that extends
HearthError (import HearthError where needed) in the same module or a nearby
errors module, and replace the throw in findVmmDir() with throw new
SetupError("Could not find vmm/ directory. Run from the hearth repo root.");
ensure the new SetupError is exported if other modules need to catch it so
callers can distinguish setup failures from other error types.
- Around line 83-95: Read and parse the manifest's minimum_zig_version (from the
manifest file in the vmm dir found by findVmmDir()), then replace the current
naive execSync("zig version") check with logic that execSyncs "zig version",
captures and parses the installed Zig semantic version, compares it against the
manifest's minimum_zig_version string (handle pre-release/metadata as needed),
and if the installed version is lower print a clear error mentioning both the
installed version and the required minimum and exit; additionally, when invoking
execSync("zig build -Doptimize=ReleaseSafe", { cwd: vmmDir, ... }), ensure
stderr is not suppressed (either forward stdio or capture and log stderr on
failure) so build errors are visible to the user.
In `@src/sandbox/sandbox.ts`:
- Around line 111-122: The restoreFromDir flow can leave a stale runDir or a
live child process if any step fails before the Sandbox constructor adds the
instance to activeSandboxes; wrap the entire restore sequence in a cleanup guard
(e.g., local variables for runDir and spawned child) so that on any error you
remove the runDir and, if spawn() already created a child process, kill/cleanup
that child; ensure the guard runs both for copyFile failures and for failures
after spawn() but before the new Sandbox(...) returns, and keep
activeSandboxes.add(this) only in the constructor as-is while the guard handles
pre-constructor cleanup.
In `@src/vm/api.ts`:
- Around line 31-32: Add a module-level typed FlintApiError class (exported)
that extends Error and exposes structured fields: method: string, path: string,
statusCode: number, and body: string (and set name = "FlintApiError" and
preserve stack). Replace the generic reject(new Error(...)) in the
request/response path (the branch inside the code that currently checks
res.statusCode >= 300 in the function that sends HTTP requests) with reject(new
FlintApiError(method, path, res.statusCode, data)) so callers can
instanceof-check and access structured fields instead of parsing the message.
In `@vmm/src/api.zig`:
- Around line 109-117: handleAction() currently acknowledges InstanceStart
(returns 204) before validating the boot configuration, causing clients to see
success even when neither config.kernel_path nor config.snapshot_path is set;
update handleAction() (and the InstanceStart handling branches around the
started/config checks) to validate that either config.snapshot_path != null or
config.kernel_path != null before sending a 2xx response, and if validation
fails return a 4xx error instead; mirror the same validation logic used in
serve() (the code paths around the started flag and snapshot/kernel checks) so
the action handler only acknowledges when the boot config is actually valid.
- Around line 726-739: The SendCtrlAltDel branch currently does nothing to
initiate a shutdown and merely waits on runtime.exited; update it so that until
you implement a real shutdown/inject path the handler returns a clear
not-implemented response instead of pretending to shut down: detect the
parsed.value.action_type == "SendCtrlAltDel" branch and call
respondError(request, .not_implemented, "SendCtrlAltDel not supported") (or
.bad_request as preferred) rather than spinning on runtime.exited/respondOk;
alternatively, if you wire it to a device/runtime action, invoke the appropriate
shutdown/inject function on the runtime/API and then wait on runtime.exited
before calling respondOk/respondError.
- Around line 488-499: The accept loop in servePostBoot (the while loop that
calls server.accept) can block after runtime.exited flips; make the accept
interruptible by either switching the listener to non-blocking and
polling/selecting on both the server socket and runtime.exited, or by arranging
for the listener to be closed when runtime.exited is set so accept() returns.
Locate the loop that references runtime.exited, server.accept, and
handlePostBootConnection and modify it to (1) avoid a blocking accept when
runtime.exited is true by checking/using a nonblocking accept or poll, or (2)
ensure the listener is closed from the VM-exit path so accept returns with an
error and the loop can exit; update error handling for accept to treat the
listener-closed case as loop termination rather than retry.
In `@vmm/src/devices/virtio/blk.zig`:
- Around line 125-146: Ensure used_len/device_written reflects actual bytes
written rather than assumed totals: in the T_IN path (loop using linux.pread and
variables file_offset, rc, bytes_read) accumulate the real bytes written per
descriptor (bytes_read or 0 on short/failure) into used_len/device_written and
do not add total_data_len when a pread fails partway; similarly in the T_GET_ID
handling cap the increment to at most the actual number of bytes written (the
min of id_desc.len and the device-written ID length, e.g., 20). Update the logic
around variables used_len, device_written, and any additions of
total_data_len/id_desc.len so they sum the actual bytes written (using
rc/bytes_read) and stop adding on errors/breaks; also mirror this fix at the
other affected ranges noted (around lines 175-184 and 195-197) so queue.zig
receives correct used lengths.
- Around line 154-168: The loop treating pwrite short writes as success must be
changed so each descriptor is fully written: inside the for over descs use a
loop that calls linux.pwrite(self.fd, buf.ptr + written, buf.len - written,
file_offset + written) until written == buf.len; on each pwrite check rc < 0 =>
set status = S_IOERR and break, treat rc == 0 as failure and set S_IOERR,
otherwise advance written by rc and only after the descriptor finishes advance
file_offset by the descriptor's written bytes (cast rc to u64 for arithmetic),
handling overflow the same way you currently do; reference functions/vars:
pwrite, file_offset, descs, desc.len, status, S_IOERR.
In `@vmm/src/devices/virtio/mmio.zig`:
- Around line 209-217: The device reset currently only clears transport/queue
fields in fn reset(self: *Self) but does not clear backend-local state (e.g.,
Vsock's host sockets, pending control packets, buffered writes), so add and call
a backend reset hook from this code path: define a reset method on the backend
interface (e.g., Backend.reset() or reset_on_device_reset()) and invoke it from
the mmio reset implementation (fn reset and/or wherever status is written back
to 0) so that when status is cleared the backend cleans up sockets, pending
packets, and FDs; reference the existing fn reset(self: *Self),
Self.device_features_sel, Self.queue_sel, and the Vsock backend implementation
to implement the cleanup.
In `@vmm/src/devices/virtio/net.zig`:
- Around line 171-178: When receiveFrame returns error.WouldBlock after popping
a descriptor (head) you must not move that descriptor to the used ring with
queue.pushUsed(..., 0); instead restore the descriptor to the guest RX available
ring (i.e., requeue head) or otherwise avoid consuming it so the guest can
retry. Update the error handler in the popAvail/receiveFrame block (where
bytes_written is assigned) to call the appropriate requeue method (e.g.,
queue.pushAvail(mem, head) or the queue API that returns the descriptor to
available) and skip calling queue.pushUsed for WouldBlock, then break/return as
before.
- Around line 75-80: init() sets O_NONBLOCK on the TAP fd but transmitChain()
currently treats writev() EAGAIN as a successful transmission and calls
queue.pushUsed(), losing the packet; modify transmitChain() (the writev()
handling) to detect EAGAIN/WouldBlock and NOT call queue.pushUsed() for that
descriptor, instead re-queue or stash the descriptor for retry (e.g., push into
a pending TX queue or re-add to the virtqueue) and schedule a retry when the TAP
fd becomes writable (via poll/epoll), and also handle partial writes by tracking
bytes written and adjusting iovecs so remaining bytes are sent on retry; ensure
queue.pushUsed() is only called after the entire packet has actually been
written.
In `@vmm/src/devices/virtio/queue.zig`:
- Around line 159-165: popAvail currently accepts arbitrarily large forward
jumps of avail.idx and just mods the ring index; fix it by validating the
advance before consuming: after calling getAvailIdx (symbol: avail_idx) compute
the forward distance from self.last_avail_idx to avail_idx using ring-size
arithmetic (use self.size/the queue size) and reject (return an error) if that
distance is greater than the queue size (or >= self.size, whichever matches your
semantics). Only if the distance is in-range proceed to call
getAvailRing(self.last_avail_idx) and increment last_avail_idx; reference
functions/fields to change: popAvail, getAvailIdx, getAvailRing, last_avail_idx,
and the queue size field (self.size).
In `@vmm/src/integration_tests.zig`:
- Around line 32-43: In buildInitrd, stop embedding init_script inside a raw
single-quoted shell string passed to process.run (the current printf '...'),
because unescaped single quotes in init_script break the shell; instead either
(preferred) write the init file using Zig I/O (open a temp file in the TMPDIR
and write init_script bytes directly before running bsdcpio) or (if keeping the
shell approach) escape single quotes in init_script (replace each ' with '\'' or
'"'"' in the string used to build the shell command) before concatenation;
update the process.run invocation accordingly and keep references to init_script
and process.run in the revised buildInitrd implementation.
- Around line 200-219: Replace the brittle sleep_ms(...) calls with a polling
readiness helper that attempts to connect to the Firecracker API socket (using
connect() on sock_path) or repeatedly calls httpRequest(sock_path, "GET", "/vm",
null) until a success/expected state (e.g., HTTP success and "Running" in the
body) or a short timeout; remove the fixed sleeps in the boot/restore/start
sequences around boot_cmd/httpRequest and VM status checks (locations using
sleep_ms, httpRequest, sock_path, and the status check that searches for
"Running") and use the helper with backoff/retry and a clear timeout, freeing
responses via allocator.free on each attempt and returning an error if the
socket/API never becomes ready.
- Around line 157-172: The loop that reads from the child stdout uses blocking
linux.read(), so the timeout based on start_ts never triggers; change the I/O to
use poll or non-blocking reads: call linux.poll (or set stdout.handle to
O_NONBLOCK and use nonblocking reads) to wait for readability with a timeout
derived from start_ts (or pass a computed timeout to poll), then only call
linux.read when poll indicates data available and handle EAGAIN/EWOULDBLOCK;
update the loop around buf/total/FLINT_BOOT_OK to break on poll timeout and keep
existing logic for accumulating data and searching for "FLINT_BOOT_OK".
In `@vmm/src/kvm/vcpu.zig`:
- Around line 192-202: In getMsrs and setMsrs, replace the use of abi.ioctlVoid
with the counted path abi.ioctl and check the returned number of MSRs
transferred against the expected count (derived from snapshot_msr_indices.len /
buf.nmsrs or buf.nmsrs) to detect partial transfers; after calling
abi.ioctl(self.fd, c.KVM_GET_MSRS, `@intFromPtr`(buf)) / abi.ioctl(self.fd,
c.KVM_SET_MSRS, `@intFromPtr`(buf)) verify the returned int equals the expected
entry count and return an error (or handle it) if it is smaller so you never
serialize uninitialized tail entries from MsrBuffer.
In `@vmm/src/main.zig`:
- Around line 854-860: The current level-triggered boolean handshake using
rt.ack_paused and rt.paused (observed in the loop in main.zig where vcpu spins
and then clears vcpu.kvm_run.immediate_exit) can miss a resume→pause race;
replace it with an epoch-based handshake: add an atomic pause_epoch counter
(e.g., rt.pause_epoch) that the requestor increments to request a pause and an
atomic ack_epoch (e.g., rt.ack_epoch) that the vCPU sets to match the
requester’s epoch when it has acknowledged the pause; use release stores when
updating epochs and acquire loads when reading them, and block/wake using a
futex/condvar (or Zig's Event/Cond API) rather than a spin loop so the vCPU can
publish a fresh ack for every new epoch and the requester can wait for ack_epoch
== pause_epoch safely; update references to rt.paused/rt.ack_paused and
vcpu.kvm_run.immediate_exit handling to use the new epoch/futex handshake (wake
the vCPU when pause_epoch increments, and requester clears by incrementing
resume or moving to next epoch).
In `@vmm/src/snapshot.zig`:
- Around line 286-292: The mode used when creating snapshot files in openCreate
currently sets permissions to 0o644; update the file creation mode in openCreate
(in snapshot.zig) from 0o644 to 0o600 so created vmstate and raw RAM dump files
are owner-read/write only, ensuring snapshots are not world-readable; keep the
rest of the flags (.WRONLY, .CREAT, .TRUNC, .CLOEXEC) unchanged.
---
Minor comments:
In `@vmm/src/devices/serial.zig`:
- Around line 151-158: The updateIir function leaves the irq_pending latch set
when THRE is disabled, causing stale interrupts; modify updateIir (in
serial.zig) so that in the branch where (self.ier & IER_THRE) == 0 you not only
set self.iir = fifo_bits | IIR_NO_INT but also clear the pending latch
(self.irq_pending = false) so hasPendingIrq() won't report a stale IRQ; keep the
existing behavior where setting IER_THRE sets IIR_THR_EMPTY and irq_pending =
true.
In `@vmm/src/seccomp.zig`:
- Around line 171-227: The argument-validation branches for CLONE_BLK,
SOCKET_BLK, and MPROT_BLK currently hardcode SECCOMP_RET_KILL_PROCESS so audit
mode still kills; change the return statements at CLONE_BLK+3, SOCKET_BLK+2, and
MPROT_BLK+3 to return the buildFilter's default_action instead of
SECCOMP_RET_KILL_PROCESS so that when buildFilter is invoked with
SECCOMP_RET_LOG (log_filter) those argument violations are logged rather than
killing the process; keep kill_filter/log_filter and buildFilter usage as-is so
behavior is driven by the passed default_action.
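The `popAvail` bound check requested above for `vmm/src/devices/virtio/queue.zig` reduces to wrapping 16-bit distance arithmetic: the driver-owned `avail.idx` is a free-running `u16`, so a valid advance is bounded by the queue size. A minimal C sketch of that guard (function name is illustrative):

```c
#include <stdint.h>

/* Returns nonzero if the driver's avail_idx is a legal advance from
 * last_avail_idx. Unsigned subtraction wraps mod 2^16, matching the
 * free-running index semantics of a split virtqueue. */
static int avail_advance_ok(uint16_t last_avail_idx, uint16_t avail_idx,
                            uint16_t queue_size) {
    uint16_t distance = (uint16_t)(avail_idx - last_avail_idx); /* mod 2^16 */
    return distance <= queue_size; /* more pending buffers than slots = bogus */
}
```

A malicious or buggy guest that jumps `avail.idx` by more than the ring size would otherwise cause the device to re-read stale descriptor slots after the modulo.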
ℹ️ Review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro
Run ID: 5807f445-edf3-4944-b154-178ed15f036a
📒 Files selected for processing (41)
- .gitignore
- ARCHITECTURE.md
- CLAUDE.md
- README.md
- docs/exec-plans/active/merge-flint.md
- examples/claude-in-sandbox.ts
- package.json
- src/cli/hearth.ts
- src/cli/pool.ts
- src/cli/setup.test.ts
- src/cli/setup.ts
- src/cli/status.ts
- src/network/proxy.ts
- src/sandbox/sandbox.ts
- src/vm/api.ts
- src/vm/binary.ts
- src/vm/snapshot.ts
- src/vm/thin.ts
- vmm/build.zig
- vmm/build.zig.zon
- vmm/src/api.zig
- vmm/src/boot/loader.zig
- vmm/src/boot/params.zig
- vmm/src/devices/serial.zig
- vmm/src/devices/virtio.zig
- vmm/src/devices/virtio/blk.zig
- vmm/src/devices/virtio/mmio.zig
- vmm/src/devices/virtio/net.zig
- vmm/src/devices/virtio/queue.zig
- vmm/src/devices/virtio/vsock.zig
- vmm/src/integration_tests.zig
- vmm/src/jail.zig
- vmm/src/kvm/abi.zig
- vmm/src/kvm/system.zig
- vmm/src/kvm/vcpu.zig
- vmm/src/kvm/vm.zig
- vmm/src/main.zig
- vmm/src/memory.zig
- vmm/src/seccomp.zig
- vmm/src/snapshot.zig
- vmm/src/tests.zig
💤 Files with no reviewable changes (3)
- src/cli/status.ts
- src/cli/pool.ts
- src/vm/thin.ts
const target = b.standardTargetOptions(.{
    .default_target = .{ .cpu_arch = .x86_64, .os_tag = .linux },
});
🧩 Analysis chain
🏁 Script executed:
#!/bin/bash
set -euo pipefail
echo "vmm/build.zig target configuration:"
sed -n '3,8p' vmm/build.zig
echo
echo "src/cli/setup.ts Flint build invocation:"
sed -n '76,96p' src/cli/setup.ts
Repository: joshuaisaact/hearth
Length of output: 1046
🏁 Script executed:
# Check for any comments explaining the x86_64 hardcoding in build.zig
echo "=== Checking for comments around hardcoded target ==="
head -20 vmm/build.zig
echo -e "\n=== Checking if build.zig defines any test steps ==="
grep -i "test\|zig build test" vmm/build.zig || echo "(no test references found)"
echo -e "\n=== Checking for any -Dtarget usage in the entire codebase ==="
rg "\-Dtarget" --type ts --type zig --type sh --type json 2>/dev/null || echo "(no -Dtarget found)"
echo -e "\n=== Checking if there's architecture documentation in README ==="
rg "x86_64|arm64|architecture|support" README.md 2>/dev/null | head -20 || echo "(no architecture info in README)"
Repository: joshuaisaact/hearth
Length of output: 1886
🏁 Script executed:
# Verify Zig standardTargetOptions behavior with and without cpu_arch
# Check if there are any similar patterns in the repo or comments about target selection
echo "=== Checking README for build/target guidance ==="
grep -A5 -B5 "cross-compile\|aarch64\|arm64" README.md | head -30
echo -e "\n=== Checking full build.zig for any conditional target logic ==="
wc -l vmm/build.zig
Repository: joshuaisaact/hearth
Length of output: 665
🏁 Script executed:
# Read full vmm/build.zig to see complete context
echo "=== Full vmm/build.zig ==="
cat -n vmm/build.zig
echo -e "\n=== Check agent/build.zig for multi-arch pattern ==="
head -20 agent/build.zig
Repository: joshuaisaact/hearth
Length of output: 3086
Don't hardcode the default build target to x86_64.
Lines 4–6 hardcode .cpu_arch = .x86_64, .os_tag = .linux, but src/cli/setup.ts:86 invokes zig build without a -Dtarget override. This causes the setup to install an x86_64 flint binary on arm64 Linux hosts. Additionally, the test steps defined at lines 30–41 and 46–52 inherit this hardcoded target, causing zig build test and zig build integration-test to attempt execution of cross-compiled x86_64 artifacts on non-x86_64 machines, resulting in failures.
♻️ Suggested change
const target = b.standardTargetOptions(.{
- .default_target = .{ .cpu_arch = .x86_64, .os_tag = .linux },
+ .default_target = .{ .os_tag = .linux },
});
📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
const target = b.standardTargetOptions(.{
    .default_target = .{ .os_tag = .linux },
});
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@vmm/build.zig` around lines 4 - 6, The build currently forces an x86_64 Linux
default by setting .default_target in the target = b.standardTargetOptions(...)
block; remove that hardcoding so Zig's default/CLI-provided target is used
instead (e.g., call b.standardTargetOptions(.{}) or omit the .default_target
field) and ensure the same `target` symbol (used in the test and
integration-test step definitions) remains referenced so tests and the flint
binary are built for the host/CLI-provided target rather than always x86_64.
// Convert virtual address to physical if needed (Linux kernels
// use virtual addresses starting at 0xffffffff81000000, physical at 0x1000000)
const paddr: usize = blk: {
    if (p_paddr < mem.size()) break :blk @intCast(p_paddr);
    // Try using a standard kernel text offset (16MB)
    const PHYS_OFFSET: u64 = 0x1000000;
    if (p_paddr >= 0xffffffff80000000) {
        const phys = p_paddr - 0xffffffff80000000 + PHYS_OFFSET;
        if (phys < mem.size()) break :blk @intCast(phys);
    }
    log.err("PT_LOAD segment paddr 0x{x} out of guest memory bounds", .{p_paddr});
    return error.InvalidKernel;
Use the documented high-half base when translating ELF addresses.
The comment says 0xffffffff81000000 should map to 0x1000000, but subtracting 0xffffffff80000000 maps that same address to 0x2000000. That shifts both PT_LOAD segments and the final entry RIP by 16 MiB.
Suggested fix

```diff
             const PHYS_OFFSET: u64 = 0x1000000;
-            if (p_paddr >= 0xffffffff80000000) {
-                const phys = p_paddr - 0xffffffff80000000 + PHYS_OFFSET;
+            const KERNEL_VIRT_BASE: u64 = 0xffffffff81000000;
+            if (p_paddr >= KERNEL_VIRT_BASE) {
+                const phys = p_paddr - KERNEL_VIRT_BASE + PHYS_OFFSET;
                 if (phys < mem.size()) break :blk @intCast(phys);
             }
@@
             const PHYS_OFFSET: u64 = 0x1000000;
-            if (entry >= 0xffffffff80000000) {
-                break :blk entry - 0xffffffff80000000 + PHYS_OFFSET;
+            const KERNEL_VIRT_BASE: u64 = 0xffffffff81000000;
+            if (entry >= KERNEL_VIRT_BASE) {
+                break :blk entry - KERNEL_VIRT_BASE + PHYS_OFFSET;
             }
```

Also applies to: 273-279
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@vmm/src/boot/loader.zig` around lines 238 - 249, The PT_LOAD
virtual-to-physical translation uses the wrong high-half base: change the
subtraction base from 0xffffffff80000000 to the documented 0xffffffff81000000 so
that addresses like 0xffffffff81000000 map to 0x01000000 (PHYS_OFFSET) instead
of being shifted by an extra 0x01000000; update the same logic wherever you
translate the kernel entry/RIP (the other block mentioned) so both places that
compute phys from p_paddr use the corrected high-half base while keeping
PHYS_OFFSET and the existing bounds checks.
```zig
fn deliverRxData(self: *Self, mem: *Memory, queue: *Queue, conn: *Connection) !bool {
    // Pop an RX descriptor from the guest
    const head = (try queue.popAvail(mem)) orelse return error.NoBuffers;

    var descs: [16]Queue.Desc = undefined;
    const desc_count = queue.collectChain(mem, head, &descs) catch |err| {
        queue.pushUsed(mem, head, 0) catch |e| log.warn("pushUsed failed: {}", .{e});
        return err;
    };

    if (desc_count == 0 or descs[0].len < HDR_SIZE) {
        queue.pushUsed(mem, head, 0) catch |e| log.warn("pushUsed failed: {}", .{e});
        return false;
    }

    // Calculate available data buffer space
    var data_space: u32 = 0;
    for (descs[1..desc_count]) |desc| {
        data_space += desc.len;
    }

    // Also respect flow control
    const tx_window = conn.availableForTx();
    if (tx_window == 0 and data_space > 0) {
        // No TX window — push descriptor back unused
        queue.pushUsed(mem, head, 0) catch |e| log.warn("pushUsed failed: {}", .{e});
        return false;
    }
    const max_read = @min(data_space, tx_window);

    // Try to read from the host socket into data descriptors using readv
    var total_read: u32 = 0;
    const data_descs = descs[1..desc_count];
    if (max_read > 0 and data_descs.len > 0) {
        var iov: [16]std.posix.iovec = undefined;
        var iov_count: usize = 0;
        var remaining = max_read;
        for (data_descs) |desc| {
            if (remaining == 0) break;
            const read_len = @min(desc.len, remaining);
            const buf = mem.slice(@intCast(desc.addr), read_len) catch break;
            iov[iov_count] = .{ .base = buf.ptr, .len = buf.len };
            iov_count += 1;
            remaining -= read_len;
        }

        if (iov_count == 0) {
            queue.pushUsed(mem, head, 0) catch |e| log.warn("RX pushUsed failed: {}", .{e});
            return false;
        }

        const rc: isize = @bitCast(linux.readv(conn.fd, @ptrCast(&iov), @intCast(iov_count)));
        if (rc < 0) {
            const errno: linux.E = @enumFromInt(@as(u16, @intCast(-rc)));
            if (errno == .AGAIN) {
                // No data available
                queue.pushUsed(mem, head, 0) catch |e| log.warn("RX pushUsed failed: {}", .{e});
                return false;
            }
            // Socket error — close connection, send RST to guest
            log.warn("read from host socket failed: {}", .{errno});
            self.sendRstToGuest(mem, queue, head, &descs, conn);
            self.closeConnectionPtr(conn);
            return true;
        }
        if (rc == 0) {
            // EOF — host closed connection, send RST to guest
            self.sendRstToGuest(mem, queue, head, &descs, conn);
            self.closeConnectionPtr(conn);
            return true;
        }
        total_read = @intCast(rc);
    }

    if (total_read == 0 and data_space > 0) {
        // No data to deliver
        queue.pushUsed(mem, head, 0) catch |e| log.warn("pushUsed failed: {}", .{e});
        return false;
```
Avoid consuming RX descriptors when no packet can be produced.
deliverRxData() pops an RX head before it knows whether the socket is readable. Both the tx_window == 0 path and the readv(...)=EAGAIN path complete that descriptor with len 0 and return false, so polling an idle connection drains the guest RX ring without delivering data or raising an interrupt. Please gate popAvail() on readiness/window or keep the head until a header/data packet is actually written.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@vmm/src/devices/virtio/vsock.zig` around lines 479 - 556, deliverRxData
currently pops an RX head (queue.popAvail) before knowing if there's TX window
or any readable socket data, which lets polling drain the guest RX ring; change
the logic in deliverRxData to first check conn.availableForTx() and confirm the
host socket is readable (e.g., via non-blocking poll or a recv/peek) and only
then call queue.popAvail to consume descriptors, or if you must pop early ensure
you restore the head (not queue.pushUsed with len 0) when you hit tx_window==0
or EAGAIN; update the branches around conn.availableForTx(), linux.readv
handling of EAGAIN, and queue.pushUsed usage so descriptors are not consumed
unless header+data will actually be written (refer to deliverRxData,
queue.popAvail, conn.availableForTx, linux.readv, and queue.pushUsed).
```zig
    try check(linux.mkdir("dev", 0o755), "mkdir(/dev)");
    try check(linux.mknod("dev/kvm", S_IFCHR | 0o660, DEV_KVM), "mknod(/dev/kvm)");

    if (config.need_tun) {
        try check(linux.mkdir("dev/net", 0o755), "mkdir(/dev/net)");
        try check(linux.mknod("dev/net/tun", S_IFCHR | 0o660, DEV_NET_TUN), "mknod(/dev/net/tun)");
    }

    // Drop privileges — last step requiring root
    try check(linux.setgid(config.gid), "setgid");
    try check(linux.setuid(config.uid), "setuid");
```
Fix device ownership and supplementary groups before dropping root.
/dev/kvm and /dev/net/tun are created as 0660 root:root, then the process switches to the non-zero UID/GID required by vmm/src/main.zig, Lines 146-226. After that, the jailed VMM can no longer open those devices unless it happens to retain a privileged group, and setuid/setgid here also leave any inherited supplementary groups intact. Chown the device nodes (or otherwise grant the target UID/GID access) and clear supplementary groups before the UID/GID drop.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@vmm/src/jail.zig` around lines 65 - 75, The device nodes are created as
root:root and supplementary groups remain after setuid/setgid; before calling
linux.setgid(config.gid) and linux.setuid(config.uid) you must change ownership
of created nodes and clear supplementary groups so the target user can access
them. After creating dev/kvm and (if config.need_tun) dev/net/tun, call
chown/chownat (or the equivalent syscall wrapper) to set owner to config.uid and
group to config.gid for linux.mknod-created files, and call the function to
clear supplementary groups (setgroups(0, NULL) or the Zig wrapper) before
invoking linux.setgid and linux.setuid; ensure these operations occur after
linux.mknod and before linux.setgid/linux.setuid so the jailed VMM can open the
devices.
Three fixes for the ELF kernel boot path:

1. Set initial MSRs (IA32_MISC_ENABLE, IA32_APICBASE, TSC) before the vCPU starts. Without these, the kernel hangs after LSM init because perf_event_init and APIC setup fail silently.
2. Return 0xFF for unhandled IO port reads. Without this, the kernel's 8250 serial driver detects phantom UARTs at COM2/COM3/COM4 and spins trying to initialize them.
3. Switch to a 5.10.x kernel from Firecracker CI. The 6.1.x kernel dropped CONFIG_VIRTIO_MMIO_CMDLINE_DEVICES, which Flint needs for virtio device discovery via the kernel cmdline.

Also: add boot_params fields (boot_flag, cmdline_size, kernel_alignment, init_size, xloadflags) matching what Firecracker sets for ELF kernels. Use sequential API calls in snapshot creation (Promise.all caused a race). Add waitForFile for the vsock listener before VM spawn.

Fresh boot works e2e: setup → base snapshot creation succeeds. Snapshot restore still needs debugging (guest doesn't resume).
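The "sequential API calls" fix can be sketched as follows. This is a hypothetical illustration, not the SDK's actual code: the `FlintSnapshotApi` interface and the `pauseVm`/`createSnapshot` method names stand in for whatever the real FlintApi exposes.

```typescript
// Hypothetical sketch of the snapshot-creation ordering fix. Promise.all()
// started both requests at once, racing the VMM's state machine; awaiting
// them sequentially guarantees the VM is paused before the snapshot request.
interface FlintSnapshotApi {
  pauseVm(): Promise<void>;
  createSnapshot(opts: { snapshot_path: string; mem_file_path: string }): Promise<void>;
}

export async function createSnapshotSequential(
  api: FlintSnapshotApi,
  snapshotPath: string,
  memPath: string,
): Promise<void> {
  await api.pauseVm(); // VM fully paused before the snapshot request goes out
  await api.createSnapshot({ snapshot_path: snapshotPath, mem_file_path: memPath });
}
```

The point is only the ordering guarantee: each `await` completes before the next request is issued, which `Promise.all` does not promise.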
Actionable comments posted: 5
♻️ Duplicate comments (2)
src/cli/setup.ts (1)
120-126: ⚠️ Potential issue | 🟠 Major — Setup still downloads ELF vmlinux, which stalls during kernel init.
The PR objectives explicitly state "ELF vmlinux boot stalls during kernel init" and "bzImage kernels boot correctly." Yet setup fetches `vmlinux-5.10` (ELF format). This will cause the default `hearth setup` flow to produce a non-functional base snapshot. Switch to fetching a bzImage kernel until the ELF loader is fixed, or add a clear warning that ELF loading is experimental.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@src/cli/setup.ts` around lines 120 - 126, The setup currently downloads an ELF vmlinux (see listUrl and the fetchText import in src/cli/setup.ts) which stalls during kernel init; change the download target to the corresponding bzImage build (update the prefix/path in listUrl to point to bzImage for the 5.10 series) or, if you prefer a temporary opt-in, add an explicit warning message before download (using console.warn) that ELF vmlinux is experimental and may stall and provide a flag to force ELF. Update references to listUrl and any downstream variable names so the code fetches and uses the bzImage artifact (or emits the warning and respects the new flag) instead of vmlinux-5.10.
vmm/src/boot/loader.zig (1)
238-246: ⚠️ Potential issue | 🔴 Critical — Use `0xffffffff81000000` as the ELF high-half translation base.
This is still subtracting `0xffffffff80000000`, so `0xffffffff81000000` maps to `0x02000000` instead of `0x01000000`. That shifts both the loaded PT_LOAD segments and the final RIP by an extra 16 MiB.
🧭 Suggested fix

```diff
             const PHYS_OFFSET: u64 = 0x1000000;
-            if (p_paddr >= 0xffffffff80000000) {
-                const phys = p_paddr - 0xffffffff80000000 + PHYS_OFFSET;
+            const KERNEL_VIRT_BASE: u64 = 0xffffffff81000000;
+            if (p_paddr >= KERNEL_VIRT_BASE) {
+                const phys = p_paddr - KERNEL_VIRT_BASE + PHYS_OFFSET;
                 if (phys < mem.size()) break :blk @intCast(phys);
             }
@@
             if (entry < mem.size()) break :blk entry;
             const PHYS_OFFSET: u64 = 0x1000000;
-            if (entry >= 0xffffffff80000000) {
-                break :blk entry - 0xffffffff80000000 + PHYS_OFFSET;
+            const KERNEL_VIRT_BASE: u64 = 0xffffffff81000000;
+            if (entry >= KERNEL_VIRT_BASE) {
+                break :blk entry - KERNEL_VIRT_BASE + PHYS_OFFSET;
             }
```

Also applies to: 273-279
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@vmm/src/boot/loader.zig` around lines 238 - 246, The virtual-to-physical translation in the paddr calculation uses the wrong high-half base constant: change the subtraction constant 0xffffffff80000000 to 0xffffffff81000000 so that the computed phys = p_paddr - 0xffffffff81000000 + PHYS_OFFSET yields the intended 0x01000000 offset; update the same constant in the later identical block (the second occurrence around the other mapping) so both PT_LOAD segment and final RIP mapping are shifted correctly; verify this in the code around the paddr calculation (symbols: p_paddr, paddr, PHYS_OFFSET, mem).
🧹 Nitpick comments (3)
src/cli/setup.ts (3)
76-98: `setupFlint()` swallows build errors without diagnostic output.
When `zig build` fails, `execSync` throws, but `stdio: "pipe"` suppresses the actual compiler errors. The user sees only "Setup failed" with no indication of what went wrong. Consider capturing and logging stderr on failure:
Proposed fix

```diff
 const vmmDir = findVmmDir();
 console.log("  flint: building with Zig...");
-execSync("zig build -Doptimize=ReleaseSafe", { cwd: vmmDir, stdio: "pipe" });
+try {
+  execSync("zig build -Doptimize=ReleaseSafe", { cwd: vmmDir, stdio: "pipe" });
+} catch (err) {
+  const stderr = err instanceof Error && "stderr" in err ? (err as { stderr: Buffer }).stderr?.toString() : "";
+  console.error("ERROR: Flint build failed.");
+  if (stderr) console.error(stderr);
+  process.exit(1);
+}
 copyFileSync(join(vmmDir, "zig-out", "bin", "flint"), flintPath);
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@src/cli/setup.ts` around lines 76 - 98, The setupFlint() function currently calls execSync("zig build -Doptimize=ReleaseSafe", { cwd: vmmDir, stdio: "pipe" }) and if that throws the build stderr is hidden; update setupFlint to wrap the zig build execSync in a try/catch, capture the thrown error (from execSync) and log the error's stdout/stderr or message via console.error before exiting, so users see the Zig compiler output; keep existing behavior of copying and chmod on success and only perform those after a successful build.
125-126: Dynamic import of `fetchText` is unnecessary.
`fetchText` is already exported from `./download.js`, which is statically imported on line 17 for `download`. The dynamic import adds complexity without benefit.
Proposed fix

```diff
-import { download } from "./download.js";
+import { download, fetchText } from "./download.js";
```

Then remove the dynamic import:

```diff
 console.log("  kernel: finding latest from Firecracker CI...");
-const { fetchText } = await import("./download.js");
 const listUrl = `http://spec.ccfc.min.s3.amazonaws.com/?prefix=firecracker-ci/v1.15/${architecture}/vmlinux-5.10&list-type=2`;
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@src/cli/setup.ts` around lines 125 - 126, The dynamic import of fetchText is unnecessary; remove the dynamic import statement that destructures fetchText and instead use the already statically imported export from the existing import named download (i.e., reference the fetchText export directly where used). Update any code that referenced the dynamically-imported fetchText to use the static import, and delete the now-unused dynamic import line to simplify setup.ts.
100-109: `findVmmDir()` throws a generic `Error` instead of a typed error.
Per coding guidelines, each module should define its own typed error types instead of using generic errors. This pattern also appears in `findAgentDir()`.
Verify each finding against the current code and only fix it if needed. In `@src/cli/setup.ts` around lines 100 - 109, findVmmDir() currently throws a generic Error; change this to throw a module-specific typed error (e.g., create and export a VmmNotFoundError class) and update findAgentDir() to use a corresponding AgentNotFoundError if not already typed. Specifically, define a subclass of Error (named VmmNotFoundError) in the same module, update findVmmDir() to throw new VmmNotFoundError("Could not find vmm/ directory. Run from the hearth repo root."), and ensure any callers or tests that expect the generic Error are adjusted to handle the new typed error (or exported type) so type checks and error handling remain correct.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@src/cli/setup.ts`:
- Line 126: The S3 listing URL stored in the variable listUrl uses "http" and
should be switched to "https" to prevent MITM manipulation; update the template
string assigned to listUrl in src/cli/setup.ts (the declaration of listUrl) to
use "https://spec.ccfc.min.s3.amazonaws.com/..." and ensure any other code that
constructs or uses this S3 listing URL follows the same HTTPS scheme.
In `@src/sandbox/sandbox.ts`:
- Around line 131-134: The code races because waitForFile(`${vsockPath}_1024`,
...) is invoked before the server/listener started by
agent.waitForConnection(...) has been awaited; either await the promise returned
by agent.waitForConnection (e.g. capture agentConnected =
agent.waitForConnection(...) and await it before calling waitForFile) so the
listener is guaranteed to be active before polling the socket file, or
alternatively change waitForConnection() so it only resolves after the socket
file exists; update references to waitForConnection, agentConnected, and
waitForFile accordingly.
In `@vmm/src/boot/loader.zig`:
- Around line 164-188: loadInitrd currently only protects the fixed range
[params.KERNEL_ADDR, params.KERNEL_ADDR + kernel_size) which is fine for bzImage
but can overlap ELF PT_LOAD segments that are relocated around 0x01000000;
update loadInitrd to reserve the actual ELF load window instead of using
params.KERNEL_ADDR+kernel_size: compute (or accept as parameters) the ELF
PT_LOAD minimum and maximum virtual addresses (the ELF load window) and use that
range when checking and computing initrd_addr (replace uses of
params.KERNEL_ADDR and kernel_size in the overlap check and size calculations
with elf_load_min/elf_load_max), and apply the same change for the other
occurrence referenced in lines 315-317 so the initrd placement never collides
with any ELF PT_LOAD segments.
- Around line 252-269: The ELF segment handling currently writes file-backed
bytes and skips zeroing when paddr + p_memsz exceeds mem.size(), leaving a
partial load and kernel_end beyond guest RAM; update the logic to compute
seg_end = paddr + `@as`(usize, `@intCast`(p_memsz)) and if seg_end > mem.size()
return error.InvalidKernel before updating kernel_end or returning success so
that any segment that extends past guest RAM causes the loader to fail (adjust
the flow around the existing p_filesz write, the p_memsz zeroing branch, and the
kernel_end assignment to enforce this check using paddr, p_memsz, mem.size(),
and kernel_end).
- Around line 215-228: The loop currently slices program header bytes into ph
before validating phentsize, which can bounds-trap; add a guard before creating
ph that (1) validates phentsize is at least the minimum required size (use 56 as
suggested) and (2) ensures ph_start + phentsize <= kernel_data.len, returning
error.InvalidKernel if either check fails; update the code around phentsize,
ph_start, and the creation of ph (the variables phentsize, ph_start,
kernel_data, ph in the loop) so you reject undersized or out-of-bounds program
headers before slicing.
---
Duplicate comments:
In `@src/cli/setup.ts`:
- Around line 120-126: The setup currently downloads an ELF vmlinux (see listUrl
and the fetchText import in src/cli/setup.ts) which stalls during kernel init;
change the download target to the corresponding bzImage build (update the
prefix/path in listUrl to point to bzImage for the 5.10 series) or, if you
prefer a temporary opt-in, add an explicit warning message before download
(using console.warn) that ELF vmlinux is experimental and may stall and provide
a flag to force ELF. Update references to listUrl and any downstream variable
names so the code fetches and uses the bzImage artifact (or emits the warning
and respects the new flag) instead of vmlinux-5.10.
In `@vmm/src/boot/loader.zig`:
- Around line 238-246: The virtual-to-physical translation in the paddr
calculation uses the wrong high-half base constant: change the subtraction
constant 0xffffffff80000000 to 0xffffffff81000000 so that the computed phys =
p_paddr - 0xffffffff81000000 + PHYS_OFFSET yields the intended 0x01000000
offset; update the same constant in the later identical block (the second
occurrence around the other mapping) so both PT_LOAD segment and final RIP
mapping are shifted correctly; verify this in the code around the paddr
calculation (symbols: p_paddr, paddr, PHYS_OFFSET, mem).
---
Nitpick comments:
In `@src/cli/setup.ts`:
- Around line 76-98: The setupFlint() function currently calls execSync("zig
build -Doptimize=ReleaseSafe", { cwd: vmmDir, stdio: "pipe" }) and if that
throws the build stderr is hidden; update setupFlint to wrap the zig build
execSync in a try/catch, capture the thrown error (from execSync) and log the
error's stdout/stderr or message via console.error before exiting, so users see
the Zig compiler output; keep existing behavior of copying and chmod on success
and only perform those after a successful build.
- Around line 125-126: The dynamic import of fetchText is unnecessary; remove
the dynamic import statement that destructures fetchText and instead use the
already statically imported export from the existing import named download
(i.e., reference the fetchText export directly where used). Update any code that
referenced the dynamically-imported fetchText to use the static import, and
delete the now-unused dynamic import line to simplify setup.ts.
- Around line 100-109: findVmmDir() currently throws a generic Error; change
this to throw a module-specific typed error (e.g., create and export a
VmmNotFoundError class) and update findAgentDir() to use a corresponding
AgentNotFoundError if not already typed. Specifically, define a subclass of
Error (named VmmNotFoundError) in the same module, update findVmmDir() to throw
new VmmNotFoundError("Could not find vmm/ directory. Run from the hearth repo
root."), and ensure any callers or tests that expect the generic Error are
adjusted to handle the new typed error (or exported type) so type checks and
error handling remain correct.
ℹ️ Review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro
Run ID: e105c085-4a07-451e-8b5e-ce35c5b7557f
📒 Files selected for processing (6)
- src/cli/setup.ts
- src/sandbox/sandbox.ts
- src/vm/snapshot.ts
- vmm/src/boot/loader.zig
- vmm/src/boot/params.zig
- vmm/src/main.zig
✅ Files skipped from review due to trivial changes (2)
- vmm/src/boot/params.zig
- vmm/src/main.zig
🚧 Files skipped from review as they are similar to previous changes (1)
- src/vm/snapshot.ts
```zig
fn loadInitrd(bp: []u8, mem: *Memory, path: [*:0]const u8, kernel_size: usize) !void {
    const initrd_data = try readFile(path);
    defer std.heap.page_allocator.free(initrd_data);

    if (initrd_data.len == 0) return error.EmptyInitrd;

    // Place initrd at end of RAM, page-aligned down.
    // Respect initrd_addr_max from the kernel header.
    const initrd_max = std.mem.readInt(u32, bp[params.OFF_INITRD_ADDR_MAX..][0..4], .little);
    const mem_top = mem.size();
    const top = if (initrd_max > 0 and initrd_max < mem_top) initrd_max else mem_top;

    if (initrd_data.len > top) {
        log.err("initrd ({} bytes) larger than available memory ({})", .{ initrd_data.len, top });
        return error.InitrdTooLarge;
    }

    const initrd_addr = (top - initrd_data.len) & ~@as(usize, 0xFFF); // page-align down

    if (initrd_addr < params.KERNEL_ADDR + kernel_size) {
        log.err("initrd too large: needs 0x{x} but only 0x{x} available", .{
            initrd_data.len, top - (params.KERNEL_ADDR + kernel_size),
        });
        return error.InitrdTooLarge;
    }
```
Reserve the actual ELF load window when placing the initrd.
loadInitrd only protects [params.KERNEL_ADDR, params.KERNEL_ADDR + kernel_size). That is correct for bzImage, but not for the ELF path here, where PT_LOAD segments are translated around 0x01000000. With a large initrd and modest guest RAM, this can place the initrd on top of an ELF segment. params.KERNEL_ADDR is fixed at 0x100000 in vmm/src/boot/params.zig:79-95, so this helper needs the ELF segment min/max addresses, not just a size.
Also applies to: 315-317
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@vmm/src/boot/loader.zig` around lines 164 - 188, loadInitrd currently only
protects the fixed range [params.KERNEL_ADDR, params.KERNEL_ADDR + kernel_size)
which is fine for bzImage but can overlap ELF PT_LOAD segments that are
relocated around 0x01000000; update loadInitrd to reserve the actual ELF load
window instead of using params.KERNEL_ADDR+kernel_size: compute (or accept as
parameters) the ELF PT_LOAD minimum and maximum virtual addresses (the ELF load
window) and use that range when checking and computing initrd_addr (replace uses
of params.KERNEL_ADDR and kernel_size in the overlap check and size calculations
with elf_load_min/elf_load_max), and apply the same change for the other
occurrence referenced in lines 315-317 so the initrd placement never collides
with any ELF PT_LOAD segments.
```zig
    const entry = std.mem.readInt(u64, kernel_data[24..32], .little);
    const phoff = std.mem.readInt(u64, kernel_data[32..40], .little);
    const phentsize = std.mem.readInt(u16, kernel_data[54..56], .little);
    const phnum = std.mem.readInt(u16, kernel_data[56..58], .little);

    log.info("ELF kernel: entry=0x{x}, {} program headers", .{ entry, phnum });

    // Load all PT_LOAD segments into guest memory
    var kernel_end: u64 = 0;
    var i: u16 = 0;
    while (i < phnum) : (i += 1) {
        const ph_start = phoff + @as(u64, i) * phentsize;
        if (ph_start + phentsize > kernel_data.len) return error.InvalidKernel;
        const ph = kernel_data[ph_start..][0..phentsize];
```
Reject undersized ELF program headers before slicing them.
phentsize comes from the file, but the loop immediately reads up to ph[40..48]. If e_phentsize < 56, this will bounds-trap instead of returning error.InvalidKernel.
🛡️ Suggested guard

```diff
 const entry = std.mem.readInt(u64, kernel_data[24..32], .little);
 const phoff = std.mem.readInt(u64, kernel_data[32..40], .little);
 const phentsize = std.mem.readInt(u16, kernel_data[54..56], .little);
 const phnum = std.mem.readInt(u16, kernel_data[56..58], .little);
+const file_len: u64 = @intCast(kernel_data.len);
+
+if (phentsize < 56) {
+    log.err("unexpected ELF program header size: {}", .{phentsize});
+    return error.InvalidKernel;
+}
+const ph_table_size = try std.math.mul(u64, @as(u64, phentsize), @as(u64, phnum));
+if (phoff > file_len or ph_table_size > file_len - phoff) {
+    return error.InvalidKernel;
+}
```
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@vmm/src/boot/loader.zig` around lines 215 - 228, The loop currently slices
program header bytes into ph before validating phentsize, which can bounds-trap;
add a guard before creating ph that (1) validates phentsize is at least the
minimum required size (use 56 as suggested) and (2) ensures ph_start + phentsize
<= kernel_data.len, returning error.InvalidKernel if either check fails; update
the code around phentsize, ph_start, and the creation of ph (the variables
phentsize, ph_start, kernel_data, ph in the loop) so you reject undersized or
out-of-bounds program headers before slicing.
```zig
        if (p_filesz > 0) {
            const end = p_offset + p_filesz;
            if (end > kernel_data.len) return error.InvalidKernel;
            try mem.write(paddr, kernel_data[@intCast(p_offset)..@intCast(end)]);
        }

        // Zero BSS (memsz > filesz)
        if (p_memsz > p_filesz) {
            const bss_start = paddr + @as(usize, @intCast(p_filesz));
            const bss_size: usize = @intCast(p_memsz - p_filesz);
            if (bss_start + bss_size <= mem.size()) {
                const bss = try mem.slice(bss_start, bss_size);
                @memset(bss, 0);
            }
        }

        const seg_end = paddr + @as(usize, @intCast(p_memsz));
        if (seg_end > kernel_end) kernel_end = seg_end;
```
Fail the load when an ELF segment extends past guest RAM.
If paddr + p_memsz exceeds mem.size(), the file-backed bytes may still be written and the @memset branch is just skipped. That leaves a partially loaded kernel and a kernel_end that points past real guest memory. Return error.InvalidKernel instead of silently continuing.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@vmm/src/boot/loader.zig` around lines 252 - 269, The ELF segment handling
currently writes file-backed bytes and skips zeroing when paddr + p_memsz
exceeds mem.size(), leaving a partial load and kernel_end beyond guest RAM;
update the logic to compute seg_end = paddr + `@as`(usize, `@intCast`(p_memsz)) and
if seg_end > mem.size() return error.InvalidKernel before updating kernel_end or
returning success so that any segment that extends past guest RAM causes the
loader to fail (adjust the flow around the existing p_filesz write, the p_memsz
zeroing branch, and the kernel_end assignment to enforce this check using paddr,
p_memsz, mem.size(), and kernel_end).
- Register memory with KVM BEFORE restoring vCPU state (LAPIC/MSR state may reference guest memory addresses)
- Restore vCPU state BEFORE VM state (PIT/clock/irqchip) so irqchip-injected interrupts aren't overwritten
- Read file data in file order, apply in Firecracker's order
- Add KVM_KVMCLOCK_CTRL call after vCPU restore (prevents soft lockup watchdog false positives)
- Add KVM paravirt MSRs to snapshot save/restore list (kvm-clock, async PF, steal time, PV EOI)
- Add KVM_PIT_SPEAKER_DUMMY flag on PIT restore

Snapshot restore still hangs on kernel 6.19 with APICv=Y — known KVM bug loses LAPIC timer interrupt during KVM_SET_LAPIC. Needs either a kernel patch or APICv disabled.
Actionable comments posted: 1
🧹 Nitpick comments (2)
vmm/src/snapshot.zig (2)
317-333: Consider handling EINTR for robustness.

The I/O helpers treat any `rc <= 0` as an error, but `read`/`write` can return `-EINTR` if interrupted by a signal. In a controlled VMM environment this is unlikely, but it could cause spurious snapshot failures if signals are delivered during I/O.

♻️ Example EINTR handling for writeAll

```diff
 fn writeAll(fd: i32, data: []const u8) !void {
     var written: usize = 0;
     while (written < data.len) {
         const rc: isize = @bitCast(linux.write(fd, data[written..].ptr, data.len - written));
-        if (rc <= 0) return error.SnapshotWriteFailed;
+        if (rc < 0) {
+            if (@as(linux.E, @enumFromInt(-rc)) == .INTR) continue;
+            return error.SnapshotWriteFailed;
+        }
+        if (rc == 0) return error.SnapshotWriteFailed;
         written += @intCast(rc);
     }
 }
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@vmm/src/snapshot.zig` around lines 317 - 333, The helpers writeAll and readExact should treat interrupted syscalls specially: when the syscall returns -linux.EINTR (i.e., rc < 0 and rc == -linux.EINTR) simply continue the loop instead of returning error.SnapshotWriteFailed/SnapshotReadFailed; keep the existing behavior for other negative/zero returns and still advance the counters on positive rc. Update both functions to detect EINTR and retry the read/write rather than failing immediately.
186-187: Duplicate and potentially misleading comment.

The comment "Restore vCPU state FIRST" appears here (line 186) and again at line 214. Line 186 is about reading state from the file, while lines 214-224 are about applying state to the vCPU. Consider clarifying the distinction or removing the duplicate.
✏️ Suggested clarification
```diff
-    // --- Restore vCPU state FIRST (matches Firecracker's order) ---
+    // --- Read vCPU state from file (in save order) ---
     var mp_state: c.kvm_mp_state = undefined;
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@vmm/src/snapshot.zig` around lines 186 - 187, The comment "Restore vCPU state FIRST" is duplicated and ambiguous; update the earlier occurrence (the comment adjacent to the snapshot read logic) to explicitly say it's reading/parsing vCPU state from the snapshot (e.g., "Read vCPU state from snapshot — do not apply yet") and keep the later comment that precedes the vCPU apply logic as the authoritative "Restore vCPU state FIRST (apply to vCPU)" so the distinction between reading and applying is clear and not misleading.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@vmm/src/snapshot.zig`:
- Around line 270-282: The restore path reads dev_type from the snapshot but
never validates it against the live backend; before calling
dev.snapshotRestore(dev_buf[0..dev_len]) you must check that the saved dev_type
matches the provided device's type (e.g., call or add a device method like
getType() on the device at devices[i] or compare against a known identity field
on the device struct) and if they differ log an error/warn and skip/abort
restore for that slot rather than applying state. Update the code around
devices[i] to perform this validation (use the existing dev_type variable and
the device instance at devices[i], and avoid relying on
VirtioMmio.snapshotRestore to validate identity) and ensure the code returns or
continues appropriately when a mismatch is detected.
---
Nitpick comments:
In `@vmm/src/snapshot.zig`:
- Around line 317-333: The helpers writeAll and readExact should treat
interrupted syscalls specially: when the syscall returns -linux.EINTR (i.e., rc
< 0 and rc == -linux.EINTR) simply continue the loop instead of returning
error.SnapshotWriteFailed/SnapshotReadFailed; keep the existing behavior for
other negative/zero returns and still advance the counters on positive rc.
Update both functions to detect EINTR and retry the read/write rather than
failing immediately.
- Around line 186-187: The comment "Restore vCPU state FIRST" is duplicated and
ambiguous; update the earlier occurrence (the comment adjacent to the snapshot
read logic) to explicitly say it's reading/parsing vCPU state from the snapshot
(e.g., "Read vCPU state from snapshot — do not apply yet") and keep the later
comment that precedes the vCPU apply logic as the authoritative "Restore vCPU
state FIRST (apply to vCPU)" so the distinction between reading and applying is
clear and not misleading.
ℹ️ Review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro
Run ID: 311c12f7-98e7-4288-8360-e008d5051342
📒 Files selected for processing (3)
- vmm/src/kvm/vcpu.zig
- vmm/src/main.zig
- vmm/src/snapshot.zig
✅ Files skipped from review due to trivial changes (1)
- vmm/src/main.zig
🚧 Files skipped from review as they are similar to previous changes (1)
- vmm/src/kvm/vcpu.zig
The snapshot was missing KVM_SET_XSAVE/KVM_GET_XSAVE — the full FPU/SSE/AVX register state. Without this, the vCPU had zeroed FPU state after restore, causing the guest to stall on the first FPU instruction. Fixing this unblocked MMIO exits (the guest now produces 3 exits after restore vs 0 before).

Snapshot restore still has a timer interrupt delivery issue — the guest processes the initial MMIO exits but then enters HLT idle and timer interrupts don't wake it (only ~3 exits in 5 seconds). Fresh boot and snapshot creation work end-to-end.
Actionable comments posted: 4
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@vmm/src/kvm/vcpu.zig`:
- Around line 104-109: The getCpuid function currently only sets cpuid.nent and
padding before issuing KVM_GET_CPUID2, leaving the tail of
Kvm.CpuidBuffer.entries uninitialized; zero the entire cpuid buffer (all bytes
of the Kvm.CpuidBuffer instance) before setting nent/padding and calling
abi.ioctlVoid so any unused entries are deterministic and not leaked into
snapshots. Locate the getCpuid function and the Kvm.CpuidBuffer usage (symbols:
getCpuid, Kvm.CpuidBuffer, cpuid.nent, cpuid.padding, Kvm.MAX_CPUID_ENTRIES,
KVM_GET_CPUID2) and add a full zeroing of the cpuid struct (using the project’s
preferred mem zero helper) prior to the ioctl call.
- Around line 213-217: The doc comment for kvmclockCtrl currently claims the
ioctl is "Non-fatal if the guest doesn't support kvmclock" but the
implementation uses try abi.ioctlVoid(self.fd, 0xAED5, 0) which propagates all
errors; update kvmclockCtrl to match the comment by catching the ioctl error and
only propagate unexpected errors: call abi.ioctlVoid and if it returns
error.InvalidArgument or error.NoDevice (or the corresponding errno values)
swallow/ignore them, otherwise return the error; reference the kvmclockCtrl
function, its use of abi.ioctlVoid and self.fd when making the change.
- Around line 100-102: The setCpuid function currently passes the
Kvm.CpuidBuffer directly into the KVM_SET_CPUID2 ioctl; add a defensive bounds
check on the buffer's nent field before calling abi.ioctlVoid to ensure nent <=
256 (the fixed capacity) and return an error if it exceeds that limit. Locate
the setCpuid function and Kvm.CpuidBuffer usage, read cpuid.nent (or equivalent
field) and validate it against the constant max entries (256) prior to invoking
abi.ioctlVoid(self.fd, c.KVM_SET_CPUID2, ...), failing early on invalid values
to prevent the kernel from reading past the userspace buffer. Ensure the error
returned is appropriate for the function signature (!void).
- Around line 199-210: The getMsrs and setMsrs implementations currently call
abi.ioctlVoid and assume success; change them to treat KVM_GET_MSRS/KVM_SET_MSRS
as counted transfers by calling the ioctl variant that returns an integer (e.g.,
abi.ioctl returning isize), capture the returned count, verify it equals
buf.nmsrs (or the expected number) and return typed errors (e.g.,
error.PartialMsrRead / error.PartialMsrWrite / error.InvalidMsrCount) on
mismatch; also ensure the MsrBuffer is zero-initialized before use (zero buf in
getMsrs and zero any temporary buffers before save) to avoid leaking
uninitialized memory, and when persisting/saving only write the actual number of
populated entries (snapshot_msr_indices.len / buf.nmsrs) rather than the full
32-entry MsrBuffer.
ℹ️ Review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro
Run ID: d0e7f02e-91aa-435f-adcf-bbf4b68b92c9
📒 Files selected for processing (2)
- vmm/src/kvm/vcpu.zig
- vmm/src/snapshot.zig
🚧 Files skipped from review as they are similar to previous changes (1)
- vmm/src/snapshot.zig
```zig
pub fn setCpuid(self: Self, cpuid: *Kvm.CpuidBuffer) !void {
    try abi.ioctlVoid(self.fd, c.KVM_SET_CPUID2, @intFromPtr(cpuid));
}
```
🧩 Analysis chain
🏁 Scripts executed:

```shell
# Find and examine the vcpu.zig file
find . -name "vcpu.zig" -type f

# Look at the setCpuid function context
head -120 vmm/src/kvm/vcpu.zig | tail -30

# Search for the CpuidBuffer definition
rg "CpuidBuffer" --type zig -B 2 -A 5

# Search for the MAX_CPUID_ENTRIES constant
rg "MAX_CPUID_ENTRIES" --type zig

# Inspect the Kvm module structure
fd "\.zig" | xargs rg "pub const|struct.*Cpuid" --type zig -B 1 -A 3 | head -50
```
Add bounds check for nent in setCpuid before passing to KVM.
The setCpuid function passes the CPUID buffer directly to KVM_SET_CPUID2 without validating that nent does not exceed the fixed buffer capacity (256 entries). If nent is corrupted or oversized (e.g., from a malformed snapshot), the kernel ioctl will trust the embedded count and may read past the end of the userspace buffer. The snapshot.zig restore path already validates this defensively, but the check belongs in setCpuid itself to prevent the ioctl call from being invoked with invalid data.
Suggested fix

```diff
 pub fn setCpuid(self: Self, cpuid: *Kvm.CpuidBuffer) !void {
+    if (cpuid.nent > Kvm.MAX_CPUID_ENTRIES) return error.InvalidCpuidEntryCount;
     try abi.ioctlVoid(self.fd, c.KVM_SET_CPUID2, @intFromPtr(cpuid));
 }
```

📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
```zig
pub fn setCpuid(self: Self, cpuid: *Kvm.CpuidBuffer) !void {
    if (cpuid.nent > Kvm.MAX_CPUID_ENTRIES) return error.InvalidCpuidEntryCount;
    try abi.ioctlVoid(self.fd, c.KVM_SET_CPUID2, @intFromPtr(cpuid));
}
```
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@vmm/src/kvm/vcpu.zig` around lines 100 - 102, The setCpuid function currently
passes the Kvm.CpuidBuffer directly into the KVM_SET_CPUID2 ioctl; add a
defensive bounds check on the buffer's nent field before calling abi.ioctlVoid
to ensure nent <= 256 (the fixed capacity) and return an error if it exceeds
that limit. Locate the setCpuid function and Kvm.CpuidBuffer usage, read
cpuid.nent (or equivalent field) and validate it against the constant max
entries (256) prior to invoking abi.ioctlVoid(self.fd, c.KVM_SET_CPUID2, ...),
failing early on invalid values to prevent the kernel from reading past the
userspace buffer. Ensure the error returned is appropriate for the function
signature (!void).
```zig
/// Read back the CPUID entries currently set on this vCPU.
/// Needed for snapshot: the guest may see filtered CPUID vs host's supported set.
pub fn getCpuid(self: Self, cpuid: *Kvm.CpuidBuffer) !void {
    cpuid.nent = Kvm.MAX_CPUID_ENTRIES;
    cpuid.padding = 0;
    try abi.ioctlVoid(self.fd, c.KVM_GET_CPUID2, @intFromPtr(cpuid));
}
```
🧩 Analysis chain
🏁 Scripts executed:

```shell
# Find the CpuidBuffer definition
find . -type f -name "*.zig" | head -20
rg "CpuidBuffer" -t zig -B 5 -A 10

# Find and inspect the snapshot.zig file mentioned in the original scratchpad
fd -t f "snapshot.zig" -x cat {}

# Look at the getCpuid function in context
sed -n '100,120p' vmm/src/kvm/vcpu.zig

# Check getMsrs/setMsrs to compare patterns
rg "getMsrs|setMsrs" -t zig -B 2 -A 8

# Get the full vcpu.zig file size to understand the context
wc -l vmm/src/kvm/vcpu.zig
```
Zero the full CPUID buffer before KVM_GET_CPUID2.
The snapshot save path writes the entire Kvm.CpuidBuffer struct to disk, but getCpuid only initializes the live prefix. When KVM returns fewer than Kvm.MAX_CPUID_ENTRIES, the uninitialized tail of the entries array persists as stack garbage into the snapshot file.
Suggested fix

```diff
 pub fn getCpuid(self: Self, cpuid: *Kvm.CpuidBuffer) !void {
+    cpuid.* = std.mem.zeroes(Kvm.CpuidBuffer);
     cpuid.nent = Kvm.MAX_CPUID_ENTRIES;
-    cpuid.padding = 0;
     try abi.ioctlVoid(self.fd, c.KVM_GET_CPUID2, @intFromPtr(cpuid));
 }
```

📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
```zig
/// Read back the CPUID entries currently set on this vCPU.
/// Needed for snapshot: the guest may see filtered CPUID vs host's supported set.
pub fn getCpuid(self: Self, cpuid: *Kvm.CpuidBuffer) !void {
    cpuid.* = std.mem.zeroes(Kvm.CpuidBuffer);
    cpuid.nent = Kvm.MAX_CPUID_ENTRIES;
    try abi.ioctlVoid(self.fd, c.KVM_GET_CPUID2, @intFromPtr(cpuid));
}
```
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@vmm/src/kvm/vcpu.zig` around lines 104 - 109, The getCpuid function currently
only sets cpuid.nent and padding before issuing KVM_GET_CPUID2, leaving the tail
of Kvm.CpuidBuffer.entries uninitialized; zero the entire cpuid buffer (all
bytes of the Kvm.CpuidBuffer instance) before setting nent/padding and calling
abi.ioctlVoid so any unused entries are deterministic and not leaked into
snapshots. Locate the getCpuid function and the Kvm.CpuidBuffer usage (symbols:
getCpuid, Kvm.CpuidBuffer, cpuid.nent, cpuid.padding, Kvm.MAX_CPUID_ENTRIES,
KVM_GET_CPUID2) and add a full zeroing of the cpuid struct (using the project’s
preferred mem zero helper) prior to the ioctl call.
```zig
/// Notify KVM that the guest was paused (prevents soft lockup watchdog
/// false positives on resume). Non-fatal if the guest doesn't support kvmclock.
pub fn kvmclockCtrl(self: Self) !void {
    try abi.ioctlVoid(self.fd, 0xAED5, 0); // KVM_KVMCLOCK_CTRL
}
```
🧩 Analysis chain
🏁 Scripts executed:

```shell
cd vmm && find . -name "vcpu.zig" -type f
cat -n vmm/src/kvm/vcpu.zig | sed -n '205,225p'

# Search for the abi.ioctlVoid implementation
rg "fn ioctlVoid" vmm/

# Check error-handling patterns with ioctlVoid in the codebase
rg -A 3 "ioctlVoid" vmm/src/kvm/vcpu.zig | head -50
cat -n vmm/src/kvm/abi.zig | grep -A 15 "pub fn ioctlVoid"

# Search for calls to kvmclockCtrl to see how errors are currently handled
rg "kvmclockCtrl" vmm/

# Check for patterns that ignore specific errors
rg "catch|error:" vmm/src/kvm/ | head -30
cat -n vmm/src/kvm/abi.zig | grep -B 10 "pub fn ioctlVoid"

# Find and read the ioctl function
rg -A 20 "^fn ioctl\(" vmm/src/kvm/abi.zig
rg -B 25 "pub fn ioctlVoid" vmm/src/kvm/abi.zig | head -40

# Read the full abi.zig to see the complete error handling
cat vmm/src/kvm/abi.zig
```
Fix the contract mismatch between the doc comment and implementation.
The doc comment states "Non-fatal if the guest doesn't support kvmclock", but the function propagates all ioctl errors via try. Either update the doc comment to accurately reflect that the function can error, or change the implementation to catch and ignore the specific errors that occur when the guest doesn't support kvmclock (typically error.InvalidArgument or error.NoDevice).
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@vmm/src/kvm/vcpu.zig` around lines 213 - 217, The doc comment for
kvmclockCtrl currently claims the ioctl is "Non-fatal if the guest doesn't
support kvmclock" but the implementation uses try abi.ioctlVoid(self.fd, 0xAED5,
0) which propagates all errors; update kvmclockCtrl to match the comment by
catching the ioctl error and only propagate unexpected errors: call
abi.ioctlVoid and if it returns error.InvalidArgument or error.NoDevice (or the
corresponding errno values) swallow/ignore them, otherwise return the error;
reference the kvmclockCtrl function, its use of abi.ioctlVoid and self.fd when
making the change.
- Fix KVM_KVMCLOCK_CTRL: was using hardcoded 0xAED5 (wrong); now uses c.KVM_KVMCLOCK_CTRL from the KVM headers (0xAEAD)
- Add KVM_GET_XSAVE/KVM_SET_XSAVE to snapshot save/restore (full FPU/SSE/AVX register state)
- Log a warning if KVMCLOCK_CTRL fails (non-fatal)

Snapshot restore is still blocked: the guest doesn't resume after restore. KVM_RUN produces ~3 MMIO exits, then the vCPU goes idle with no timer interrupts waking it. All KVM state restoration succeeds (matching Firecracker's exact order). APICv=N doesn't help.
Actionable comments posted: 3
♻️ Duplicate comments (2)
vmm/src/kvm/vcpu.zig (2)
100-109: ⚠️ Potential issue | 🟠 Major

Defensively bound and sanitize the CPUID buffer before these ioctls.

`setCpuid()` still trusts `cpuid.nent`, and `getCpuid()` still leaves the unused tail uninitialized even though `vmm/src/snapshot.zig::save()` writes the whole `Kvm.CpuidBuffer` to disk. That combination can both over-trust malformed input and leak stack garbage into snapshots.

🧩 Suggested fix

```diff
 pub fn setCpuid(self: Self, cpuid: *Kvm.CpuidBuffer) !void {
+    if (cpuid.nent > Kvm.MAX_CPUID_ENTRIES) return error.InvalidCpuidEntryCount;
     try abi.ioctlVoid(self.fd, c.KVM_SET_CPUID2, @intFromPtr(cpuid));
 }
@@
 pub fn getCpuid(self: Self, cpuid: *Kvm.CpuidBuffer) !void {
+    cpuid.* = std.mem.zeroes(Kvm.CpuidBuffer);
     cpuid.nent = Kvm.MAX_CPUID_ENTRIES;
-    cpuid.padding = 0;
     try abi.ioctlVoid(self.fd, c.KVM_GET_CPUID2, @intFromPtr(cpuid));
 }
```

Web query: Linux KVM API: what constraints does KVM_SET_CPUID2 place on struct kvm_cpuid2.nent, and does KVM_GET_CPUID2 populate only the first nent entries of the caller-provided buffer?

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@vmm/src/kvm/vcpu.zig` around lines 100 - 109, Ensure the CPUID buffer is defensively bounded and zeroed: in setCpuid(), validate and clamp cpuid.nent to Kvm.MAX_CPUID_ENTRIES (or return an error if cpuid.nent is out of range) before calling ioctl to avoid trusting a malicious nent; in getCpuid(), set cpuid.nent = Kvm.MAX_CPUID_ENTRIES as you do but also zero the unused entries/tail of the Kvm.CpuidBuffer after the ioctl so any entries beyond the populated count are deterministic (avoid leaking stack garbage), and document/handle the case where the kernel returns fewer entries than MAX_CPUID_ENTRIES by only trusting the populated count the kernel reports while keeping the rest zeroed.
199-210: ⚠️ Potential issue | 🟠 Major

Treat MSR ioctls as counted transfers and zero the fixed buffer first.

`KVM_GET_MSRS` and `KVM_SET_MSRS` can succeed partially, so `ioctlVoid` hides incomplete transfers. `getMsrs()` also initializes only the live prefix of a 32-entry buffer, while `vmm/src/snapshot.zig::save()` persists the full struct.

🧩 Suggested fix

```diff
 pub fn getMsrs(self: Self, buf: *MsrBuffer) !void {
+    buf.* = std.mem.zeroes(MsrBuffer);
     buf.nmsrs = snapshot_msr_indices.len;
-    buf.pad = 0;
     for (snapshot_msr_indices, 0..) |idx, i| {
         buf.entries[i] = .{ .index = idx, .reserved = 0, .data = 0 };
     }
-    try abi.ioctlVoid(self.fd, c.KVM_GET_MSRS, @intFromPtr(buf));
+    const requested: usize = @intCast(buf.nmsrs);
+    const got = try abi.ioctl(self.fd, c.KVM_GET_MSRS, @intFromPtr(buf));
+    if (got != requested) return error.PartialMsrRead;
 }

 pub fn setMsrs(self: Self, buf: *const MsrBuffer) !void {
-    try abi.ioctlVoid(self.fd, c.KVM_SET_MSRS, @intFromPtr(buf));
+    if (buf.nmsrs > MAX_MSR_ENTRIES) return error.InvalidMsrCount;
+    const requested: usize = @intCast(buf.nmsrs);
+    const written = try abi.ioctl(self.fd, c.KVM_SET_MSRS, @intFromPtr(buf));
+    if (written != requested) return error.PartialMsrWrite;
 }
```

Web query: Linux KVM API: what do KVM_GET_MSRS and KVM_SET_MSRS return on success, and can either ioctl report a partial transfer instead of failing?

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@vmm/src/kvm/vcpu.zig` around lines 199 - 210, getMsrs and setMsrs must treat KVM_GET_MSRS/KVM_SET_MSRS as counted transfers and operate on a zeroed fixed-size MsrBuffer to avoid persisting garbage; zero the entire MsrBuffer first, populate buf.nmsrs with the intended count, initialize all buf.entries to zero then fill the live prefix using snapshot_msr_indices in getMsrs, then call the ioctl variant that returns the actual number of MSRs processed (don’t use ioctlVoid), check the returned count and update buf.nmsrs to that count to reflect a partial success; similarly, in setMsrs ensure you only send the number of entries indicated by buf.nmsrs (zero the buffer beforehand) and call the ioctl that returns the actual processed count and handle partial writes instead of assuming success.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@vmm/src/snapshot.zig`:
- Around line 309-315: The snapshot file creation in openCreate currently uses
mode 0o644 which exposes sensitive guest RAM; change the file mode passed to
linux.open in openCreate from 0o644 to owner-only 0o600 so created snapshot
artifacts are readable/writable only by the owner (update the mode argument in
the `@bitCast` call that constructs the open flags).
- Around line 286-290: The restore loop currently treats a saved device without
a live backend as a warning, allowing a partially reconstructed VM; change the
branch that handles devices[i] == null to abort the restore instead: in the
function performing the restore loop (the code around devices[i] checks in
snapshot.zig that calls dev.snapshotRestore), replace the log.warn path with a
failing return (e.g., return error.RestoreMissingBackend or an appropriate
existing error variant) so load()/restore exits immediately when a saved device
slot (dev_type at slot i) has no provided backend, and include the device slot
and type in the error context for diagnostics.
- Around line 185-187: The code currently calls Memory.initFromFile and
mem.deinit() via errdefer after vm.setMemoryRegion(0, 0, mem.alignedMem()),
which can unmap RAM while KVM slot 0 still references it; add an API in the
VM/KVM layer (e.g., vm.deleteMemoryRegion or vm.unsetMemoryRegion) in
vmm/src/kvm/vm.zig to unregister a memory region/slot from KVM, and change
snapshot.zig so the cleanup path first calls that new vm method to remove slot 0
before calling mem.deinit() (or reverse the errdefer order / add an errdefer to
call the new vm.unsetMemoryRegion before mem.deinit) to ensure KVM no longer
points at the mapped memory.
---
Duplicate comments:
In `@vmm/src/kvm/vcpu.zig`:
- Around line 100-109: Ensure the CPUID buffer is defensively bounded and
zeroed: in setCpuid(), validate and clamp cpuid.nent to Kvm.MAX_CPUID_ENTRIES
(or return an error if cpuid.nent is out of range) before calling ioctl to avoid
trusting a malicious nent; in getCpuid(), set cpuid.nent = Kvm.MAX_CPUID_ENTRIES
as you do but also zero the unused entries/tail of the Kvm.CpuidBuffer after the
ioctl so any entries beyond the populated count are deterministic (avoid leaking
stack garbage), and document/handle the case where the kernel returns fewer
entries than MAX_CPUID_ENTRIES by only trusting the populated count the kernel
reports while keeping the rest zeroed.
- Around line 199-210: getMsrs and setMsrs must treat KVM_GET_MSRS/KVM_SET_MSRS
as counted transfers and operate on a zeroed fixed-size MsrBuffer to avoid
persisting garbage; zero the entire MsrBuffer first, populate buf.nmsrs with the
intended count, initialize all buf.entries to zero then fill the live prefix
using snapshot_msr_indices in getMsrs, then call the ioctl variant that returns
the actual number of MSRs processed (don’t use ioctlVoid), check the returned
count and update buf.nmsrs to that count to reflect a partial success;
similarly, in setMsrs ensure you only send the number of entries indicated by
buf.nmsrs (zero the buffer beforehand) and call the ioctl that returns the
actual processed count and handle partial writes instead of assuming success.
ℹ️ Review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro
Run ID: e78380d7-fdd7-43ff-8051-0aa0d9241619
📒 Files selected for processing (3)
- vmm/src/kvm/vcpu.zig
- vmm/src/main.zig
- vmm/src/snapshot.zig
✅ Files skipped from review due to trivial changes (1)
- vmm/src/main.zig
vmm/src/snapshot.zig
Outdated
```zig
const mem = try Memory.initFromFile(mem_path, header.mem_size);
errdefer mem.deinit();
try vm.setMemoryRegion(0, 0, mem.alignedMem());
```
🧩 Analysis chain
🌐 Web query:
Linux KVM API: after KVM_SET_USER_MEMORY_REGION registers a userspace_addr, is it valid to munmap that range without first deleting or replacing the memory slot?
💡 Result:
No, it is not valid to munmap the userspace memory range registered via KVM_SET_USER_MEMORY_REGION without first deleting or replacing the memory slot. The userspace memory must remain valid and accessible at the registered userspace_addr for the lifetime of the slot. KVM relies on MMU notifiers to detect changes like munmap and invalidate its internal mappings (e.g. EPT/NPT/SPTEs), but munmapping while the slot is active can lead to use-after-free scenarios where KVM or the guest accesses unmapped memory before invalidation completes. Official docs state changes like mmap/madvise are reflected automatically with KVM_CAP_SYNC_MMU, implying the backing must stay valid. Examples in code (e.g. QEMU, tutorials) mmap once and keep mapped until slot modification/removal. To change backing, use KVM_SET_USER_MEMORY_REGION to delete (memory_size=0) or replace the slot first.
Citations:
- 1: https://www.kernel.org/doc/Documentation/virt/kvm/api.txt
- 2: https://docs.kernel.org/virt/kvm/api.html
- 3: https://www.linux-kvm.org/page/Memory
🏁 Scripts executed:

```shell
find . -name "*.zig" -type f | head -20

# Locate the snapshot, memory, and vm files
fd -e zig | grep -E "(snapshot|memory|vm\.zig)" | head -20

# Check the structure
ls -la vmm/src/ 2>/dev/null | head -30

# Check snapshot.zig around lines 185-210 to see the error handling
cat -n vmm/src/snapshot.zig | sed -n '175,220p'

# Check memory.zig to see if deinit unmaps
cat -n vmm/src/memory.zig

# Check vm.zig for setMemoryRegion and any clearMemoryRegion function
cat -n vmm/src/kvm/vm.zig | head -100

# Search for clearMemoryRegion or unmap in vm.zig
rg -n "clear|unmap" vmm/src/kvm/vm.zig

# See the rest of the restore function to understand all possible error paths
cat -n vmm/src/snapshot.zig | sed -n '185,280p'

# Check for a clearMemoryRegion or deleteMemoryRegion function anywhere
rg -n "Memory.*Region|delete|clear" vmm/src/kvm/vm.zig

# Look at error-handling patterns around the memory setup
rg -n -A5 "setMemoryRegion" vmm/src/snapshot.zig
```
Don't unmap restored RAM while slot 0 still points at it.
After vm.setMemoryRegion(0, 0, mem.alignedMem()) at line 187, any error in the subsequent restoration steps (lines 192–261) will trigger errdefer mem.deinit(), which unmaps the memory via std.posix.munmap(). Since setMemoryRegion() only registers the userspace pointer with KVM and no corresponding deletion exists in vm.zig, slot 0 remains active, pointing to freed memory. The VM will fault or access invalid memory if the caller doesn't immediately tear down the VM.
🧹 Suggested fix
Add a helper to delete memory regions in vmm/src/kvm/vm.zig:
```diff
 /// Register a guest physical memory region backed by host memory.
 pub fn setMemoryRegion(self: Self, slot: u32, guest_phys_addr: u64, memory: []align(std.heap.page_size_min) u8) !void {
     var region = c.kvm_userspace_memory_region{
         .slot = slot,
         .flags = 0,
         .guest_phys_addr = guest_phys_addr,
         .memory_size = memory.len,
         .userspace_addr = @intFromPtr(memory.ptr),
     };
     try abi.ioctlVoid(self.fd, c.KVM_SET_USER_MEMORY_REGION, @intFromPtr(&region));
     log.info("memory region: slot={} guest=0x{x} size=0x{x}", .{ slot, guest_phys_addr, memory.len });
 }
+
+/// Delete a guest physical memory region by setting memory_size to 0.
+pub fn clearMemoryRegion(self: Self, slot: u32) !void {
+    var region = std.mem.zeroes(c.kvm_userspace_memory_region);
+    region.slot = slot;
+    try abi.ioctlVoid(self.fd, c.KVM_SET_USER_MEMORY_REGION, @intFromPtr(&region));
+}
```

Then in vmm/src/snapshot.zig lines 185–187:

```diff
 const mem = try Memory.initFromFile(mem_path, header.mem_size);
 errdefer mem.deinit();
 try vm.setMemoryRegion(0, 0, mem.alignedMem());
+errdefer vm.clearMemoryRegion(0) catch {};
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@vmm/src/snapshot.zig` around lines 185 - 187, The code currently calls
Memory.initFromFile and mem.deinit() via errdefer after vm.setMemoryRegion(0, 0,
mem.alignedMem()), which can unmap RAM while KVM slot 0 still references it; add
an API in the VM/KVM layer (e.g., vm.deleteMemoryRegion or vm.unsetMemoryRegion)
in vmm/src/kvm/vm.zig to unregister a memory region/slot from KVM, and change
snapshot.zig so the cleanup path first calls that new vm method to remove slot 0
before calling mem.deinit() (or reverse the errdefer order / add an errdefer to
call the new vm.unsetMemoryRegion before mem.deinit) to ensure KVM no longer
points at the mapped memory.
Snapshot restore has a KVM vCPU resume issue (KVM_RUN blocks forever after state restore, likely related to APICv and LAPIC timer state). As a workaround, Sandbox.create() now boots a fresh VM each time instead of restoring from a snapshot. This is slower (~6s vs target ~135ms) but functionally correct. Fresh boot + exec + destroy cycle works end-to-end. Also: fix KVM_GET_MSRS to check return count, add MSR_IA32_EFER and FS/GS_BASE to MSR list, remove identity map addr from restore paths (Firecracker doesn't use it). Snapshot restore remains as a TODO for optimization.
…tion Issues fixed: - Remove SA_RESTART from SIGUSR1 handler (deadlocked VM pause) - Add missing seccomp syscalls (epoll, nanosleep, statx) - Fix vsock EAGAIN recovery re-iterating from wrong descriptor index - Add DESC_F_WRITE validation in virtio-blk and virtio-net - Add KVM_SET_IDENTITY_MAP_ADDR to both snapshot restore paths - Validate device type matches on snapshot restore - SendCtrlAltDel now actually signals VM exit - Path traversal validation on snapshot create API - Check agent.ping() return value in freshBoot/restoreFromDir - Remove dead imports and loadSnapshot() method - Fix VmConfig memory leak on repeated API requests - Add backoff to pause spin loops (nanosleep after 1000 spins) - Fix /dev/kvm permissions inside jail (0666 in private namespace) - Add mem_size upper bound check on snapshot restore - Show zig build errors on setup failure
Root cause: the ELF kernel loader's synthetic boot_params broke resume from HLT after snapshot restore. Switching to bzImage format (which uses the kernel's own setup header) fixes restore completely. Changes: - Switch kernel from ELF vmlinux to bzImage (built from 5.10 source via Docker) - getKernelPath() prefers bzImage over vmlinux - Setup builds bzImage from Firecracker CI 5.10 kernel config - Sandbox.create() uses snapshot restore (68ms) instead of fresh boot (6s) - Add 200ms settle delay before snapshot capture so vCPU enters HALTED state (mp_state=3) instead of RUNNABLE (mp_state=0) - Dynamic MSR discovery via KVM_GET_MSR_INDEX_LIST (60+ MSRs vs 21) - Add debug register save/restore (KVM_GET/SET_DEBUGREGS) - Bump snapshot format to v2
Eliminates Docker dependency from setup. The 5.10.245 bzImage is hosted at github.com/joshuaisaact/hearth/releases/tag/kernel-5.10.245.
…tion Fixes: - blk.zig: DESC_F_WRITE validation now actually prevents I/O (else → else if) - api.zig: move immediate_exit=1 after cmpxchg to avoid pause race - api.zig: add isValidBasename() to handleSnapshotLoad (was missing) - api.zig: remove noop SendCtrlAltDel wait loop - seccomp.zig: add open(2), shutdown(48), fdatasync(75) to allowlist - snapshot.zig: assert device slots non-null during save, mode 0600 - sandbox.ts: check ping() in reconnectAgent() - setup.test.ts: accept bzImage or vmlinux Tests added: - snapshot header mem_size bounds validation - isValidBasename accepts/rejects (path traversal regression guard) - seccomp filter contains all required syscalls
… sync Node.js copyFile uses copy_file_range which does server-side CoW on btrfs regardless of flags. The VMM mmaps the memory snapshot with MAP_PRIVATE for demand-paging, and reflinked extents break page fault handling on restore. Shell out to cp --reflink=never for the memory file. Also includes prior review fixes: tar path traversal guards, env var key validation, checkpoint resume error handling, cmpxchg pause race, IO exit bounds check, getIoData optional return, hardcoded MSR list, snapshot file permissions, device min size validation, blk pwrite retry, vsock payload cap, and vsock queue index sync after restore.
Summary
Sandbox.create()~145ms (was ~135ms with Firecracker)cp --reflink=autoKey changes
Known issue
Multiple sequential
exec()calls on the same sandbox hang — first exec works, second blocks. This is a pre-existing hearth-agent protocol bug (single request per connection), not a regression from this migration.42 files changed, +7133 / -653