[nanvix-http] B: serve_kill waits for VM exit #20
Open
esaurez wants to merge 1 commit into
Re-order the cleanup in `serve_kill` so the VMM task is awaited BEFORE the gateway bridge task is aborted.

The previous order aborted the bridge as soon as the shim's KILL arrived, which closed `output_rx` while the guest was still writing to stdio. Every subsequent guest write then failed at the io_handler with `output channel closed`. On Unix this surfaces to consumers of the gateway socket as an abrupt EOF instead of a graceful close; on Windows (after the cross-platform gateway consumer landed) it manifests as CPython BrokenPipeError -> exit 120, ~60 seconds after KILL was issued.

After the fix, the bridge ends naturally when the io_handler closes `output_tx` at VM teardown: `output_rx.recv()` returns `None`, the bridge's read loop terminates, and the spawned task resolves on its own. The `abort()` after `wait()` is retained as defensive cleanup in case the bridge has not yet released its connection handles by the time we return the KILL response.

The comment block documents the invariant the new ordering relies on: the bridge's consumer (UDS peer on Unix, named-pipe peer on Windows) must keep draining bytes the bridge forwards. If a future consumer stops reading mid-stream, the connection write back-pressures the bridge, the bridge stops draining `output_rx`, and the io_handler eventually blocks on `output_tx.send().await` -- which would hang the KILL response.

The fix is a 3-line re-order. It is logically correct on both Unix and Windows; on Unix it merely cleans up an EOF-instead-of-graceful-close artifact in the gateway socket consumers, but on Windows it is what keeps CPython workloads alive past their first stdio write.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
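
The effect of the re-order described in this commit message can be sketched with plain threads and `std::sync::mpsc`. This is only an analogy, not the nanvix-http code: the real path uses async tasks and an `abort()` call, whereas here `run_guest`, `kill_old_order`, `kill_new_order`, and `KillOutcome` are hypothetical names, and "aborting the bridge" is modelled by dropping the receiver before the guest has finished writing.

```rust
use std::sync::mpsc;
use std::thread;

/// Result of one simulated KILL (hypothetical type, for illustration only).
#[derive(Debug, PartialEq)]
pub struct KillOutcome {
    /// Guest writes that reached the bridge.
    pub forwarded: usize,
    /// Guest writes that failed because the channel was already closed.
    pub failed_writes: usize,
}

/// Stand-in for the guest plus io_handler: performs `n` writes and counts
/// the ones that fail. `output_tx` is dropped when this function returns,
/// which closes the channel from the sending side.
fn run_guest(output_tx: mpsc::Sender<String>, n: usize) -> usize {
    let mut failed = 0;
    for i in 0..n {
        if output_tx.send(format!("line {}", i)).is_err() {
            failed += 1; // analogous to "output channel closed"
        }
    }
    failed
}

/// Old order: the bridge side is torn down first, then the VM is awaited.
pub fn kill_old_order(n: usize) -> KillOutcome {
    let (output_tx, output_rx) = mpsc::channel::<String>();
    drop(output_rx); // models aborting the bridge while the guest still runs
    let vm = thread::spawn(move || run_guest(output_tx, n));
    let failed_writes = vm.join().expect("guest thread panicked");
    KillOutcome { forwarded: 0, failed_writes }
}

/// New order: the VM is awaited first; the bridge ends on its own once the
/// last sender is dropped and its receive loop runs dry.
pub fn kill_new_order(n: usize) -> KillOutcome {
    let (output_tx, output_rx) = mpsc::channel::<String>();
    let bridge = thread::spawn(move || {
        let mut forwarded = 0;
        // Iteration stops when every sender has been dropped.
        for _line in output_rx {
            forwarded += 1;
        }
        forwarded
    });
    let vm = thread::spawn(move || run_guest(output_tx, n));
    let failed_writes = vm.join().expect("guest thread panicked");
    let forwarded = bridge.join().expect("bridge thread panicked");
    KillOutcome { forwarded, failed_writes }
}
```

In this model, `kill_old_order(3)` reports three failed writes and nothing forwarded, while `kill_new_order(3)` forwards all three writes with no failures.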
Summary
Re-order the cleanup in nanvix-http's `serve_kill` so the VMM task is awaited before the gateway bridge is aborted. This is a small, surgical correctness fix for a bug that was previously silent on Unix and becomes user-visible on Windows once the cross-platform gateway endpoint lands (#18).

Base
This PR is stacked on #18 (`enhancement-windows-http-mode`). The new bridge field name (`gateway_sockaddr`) and the `cfg(unix)`-gated `remove_file` are introduced there.

What changed
The match arm now operates on the saved `wait_result`.

A multi-paragraph comment captures the rationale and documents the invariant the new ordering relies on: the bridge's consumer (the gateway UDS peer on Unix or the named-pipe peer on Windows) must keep draining the bytes the bridge forwards. If a future consumer stops reading mid-stream, the connection write back-pressures the bridge, the bridge stops draining `output_rx`, the io_handler eventually blocks on `output_tx.send().await` (once the bounded channel buffer fills), and the guest stalls without reaching VM exit, which would hang the KILL response.

The same `RunningVm.handle`/bridge are also torn down in `StandaloneState::cleanup()`, but `cleanup()` uses `abort_and_wait()` (forced shutdown) where the drain invariant does not apply. A short comment in `cleanup()` records this asymmetry so a future contributor does not "harmonise" the two paths.

Why this matters
The previous order aborted the bridge as soon as the shim's `KILL` arrived, which closed `output_rx` while the guest was still writing to stdio. Every subsequent guest write then failed at the io_handler with `output channel closed`. The consequences:

- On Unix, consumers of the gateway socket see an abrupt EOF instead of a graceful close.
- On Windows (after the cross-platform gateway consumer landed), CPython workloads fail with `BrokenPipeError`.

After the fix, the bridge ends naturally when the io_handler closes `output_tx` at VM teardown: `output_rx.recv()` returns `None`, the bridge's read loop terminates, and the spawned task resolves on its own. The `abort()` after `wait()` is retained as defensive cleanup in case the bridge has not yet released its connection handles by the time we return the KILL response.

Why it's safe
- `await` earlier; the function's semantics (return KILL exit code after the VM exits) are unchanged.
- `_gateway_bridge.abort()` no longer races with `output_tx`: the io_handler has already closed it from the VMM side by the time `wait()` returns.

Tests
No straightforward unit test for this path (requires a live VM + bridge). Existing standalone integration coverage exercises the new ordering implicitly.
`z.ps1 build`, `z.ps1 build -- format-check`, `z.ps1 build -- lint-check`, and `z.ps1 build -- spellcheck` all pass on Windows.

Notes for reviewers
nanvixd -log-to-stdout). It is the smallest of the four.
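
The drain invariant described under "What changed" can be illustrated with a bounded channel from the standard library. This is a rough sketch under stated assumptions, not the nanvix-http implementation: `std::sync::mpsc::sync_channel` stands in for the bounded async channel, and `writes_until_full` and `writes_with_draining` are hypothetical helper names. The first function shows a producer running out of buffer space when nobody reads; the second shows the same producer completing once a consumer keeps draining.

```rust
use std::sync::mpsc::{sync_channel, TrySendError};
use std::thread;

/// Counts how many writes a bounded channel accepts when its consumer
/// never reads. A blocking `send` would stall at this point, which is
/// the analogue of the io_handler blocking and the KILL response hanging.
pub fn writes_until_full(capacity: usize) -> usize {
    // `_output_rx` stays alive until the function returns, but is never read.
    let (output_tx, _output_rx) = sync_channel::<u8>(capacity);
    let mut accepted = 0;
    loop {
        match output_tx.try_send(0) {
            Ok(()) => accepted += 1,
            Err(TrySendError::Full(_)) => return accepted,
            Err(TrySendError::Disconnected(_)) => {
                unreachable!("receiver is still alive")
            }
        }
    }
}

/// With a consumer that keeps draining, `n` blocking writes all complete,
/// even when `n` is far larger than the channel capacity.
pub fn writes_with_draining(capacity: usize, n: usize) -> usize {
    let (output_tx, output_rx) = sync_channel::<u8>(capacity);
    // The consumer counts messages until every sender has been dropped.
    let consumer = thread::spawn(move || output_rx.iter().count());
    for _ in 0..n {
        output_tx.send(0).expect("consumer is draining");
    }
    drop(output_tx); // close the channel so the consumer loop ends
    consumer.join().expect("consumer thread panicked")
}
```

For example, `writes_until_full(4)` stops after four accepted writes, whereas `writes_with_draining(4, 100)` delivers all one hundred.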