fix(win): prevent UI hang — spawn git without leaking handles into the child#799
fix(win): prevent UI hang — spawn git without leaking handles into the child#799Flipper1994 wants to merge 2 commits into
Conversation
On Windows with the UI enabled (--ui=true), list_projects hangs forever and wedges the single-threaded HTTP server, so the web UI never loads. Root cause: list_projects resolves git context per project via git_capture -> cbm_popen (git_context.c). On Windows cbm_popen was _popen, which spawns the child with CreateProcess(bInheritHandles=TRUE) and leaks every inheritable handle into the git child -- including the Winsock/AFD handles that exist only in UI mode. git-for-Windows (MSYS2/Cygwin) classifies each inherited handle via NtQueryObject on startup, which deadlocks on an inherited socket/AFD handle, so git never runs and the server blocks forever in fgets on git's stdout pipe. Confirmed with gdb (request thread in fgets/git_capture, git.exe child in ntdll!ZwQueryObject). The plain MCP-stdio server and the CLI are unaffected -- no socket handles to inherit -- which is why only the UI hangs. Fix: reimplement cbm_popen on Windows with CreateProcess + STARTUPINFOEX and an explicit PROC_THREAD_ATTRIBUTE_HANDLE_LIST containing only the stdout write-end and a NUL handle for stdin/stderr. The git child now inherits only the pipe, so there is no foreign handle for NtQueryObject to deadlock on. cbm_pclose reaps via a small FILE*->HANDLE table. The POSIX path is unchanged (popen already sets O_CLOEXEC). Centralized in cbm_popen, so it also covers watcher.c and the git-history pass. Signed-off-by: Flipper <jacobphilipp@ymail.com> Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
|
Thanks for the Windows UI hang fix for #798. Triage: high-priority Windows/UI stability. Review will focus on the custom Windows |
…le leak Guards the cbm_popen handle-inheritance fix. The git NtQueryObject deadlock is environment-sensitive (does not reproduce on every git-for-Windows/MSYS build), so a git-based test cannot be relied on to go RED. Instead cbm_popen_isolates_inheritable_handle tests the fix's contract directly: it creates an inheritable marker handle and spawns a re-exec of the test runner via cbm_popen (hidden __handle_probe mode), asserting the child cannot read the marker. Verified RED without the fix, GREEN with it. Two companion end-to-end liveness invariants (git-context resolve and POST /rpc list_projects under a watchdog while the UI server holds live sockets) catch DeusData#798 on affected git builds and any future hang regression. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com> Signed-off-by: Flipper <jacobphilipp@ymail.com>
|
@DeusData Added the regression guard you asked for ( The tricky part: the git
Checklist status:
All three tests run in the existing |
|
Thank you — this was a genuinely hard Windows bug found and fixed correctly: leaked inheritable handles keeping the MSYS2 git child hung on the AFD socket walk, matching #798's stacks exactly. We carried it over the line as d2a5975 (PR #846) with you credited as co-author, plus three hardening deltas from review: the isolated spawn no longer silently falls back to _popen (that would reintroduce the deadlock) and logs on failure, cmd.exe is resolved via %COMSPEC%/System32 (no search-path planting), and the command line is UTF-16 (CreateProcessW) so non-ASCII repo paths survive. The full UI-mode hang reproduction (listening socket + single-threaded server + MSYS2 handle walk) is flagged as a follow-up. Closes #798. Closing in favor of the distill — thanks again! |
Windows cbm_popen wrapped _popen, whose CreateProcess(bInheritHandles=TRUE) leaks every inheritable handle - including the UI listening socket and Winsock/AFD helpers - into git children. Git-for-Windows' MSYS2 runtime walks inherited handles with NtQueryObject at startup and deadlocks on socket/AFD handles, wedging the single-threaded UI server (DeusData#798). Read-mode cbm_popen now spawns via CreateProcessW with STARTUPINFOEXW and PROC_THREAD_ATTRIBUTE_HANDLE_LIST restricted to the stdout pipe write end and a NUL handle for stdin/stderr; cbm_pclose reaps the child via a handle table + WaitForSingleObject + GetExitCodeProcess. Distilled from DeusData#799 with three corrections: - no silent fallback to _popen: a failed isolated spawn logs a structured warning (compat.popen_isolated_failed stage/gle/errno) and returns NULL (all call sites handle NULL); the composed command line is heap-allocated instead of a fixed 2048-byte buffer whose overflow forced the fallback - cmd.exe resolved explicitly from %COMSPEC% (GetSystemDirectoryW\cmd.exe fallback) and passed as lpApplicationName - no search-path lookup - GetExitCodeProcess failure reports -1 instead of success; a failed NUL open fails the spawn instead of feeding INVALID_HANDLE_VALUE into STARTF_USESTDHANDLES Adds two windows-only regression tests (test_security.c) driving the cbm_popen/cbm_pclose round-trip against real git and asserting the isolated path was taken via the cbm_popen_last_was_isolated() hook. Closes DeusData#798 Co-authored-by: Flipper <jacobphilipp@ymail.com> Signed-off-by: Martin Vogel <martin.vogel.tech@gmail.com>
What does this PR do?
Fixes a Windows-only deadlock that makes the web UI hang forever. With
--ui=true,the MCP tool
list_projectsnever returns and, because the UI HTTP server issingle-threaded, the whole server stops responding — the UI page loads but stays
blank. Closes #798.
Root cause
list_projectsshells out to git per project for git context(
git_capture→cbm_popen,src/git/git_context.c). On Windowscbm_popenwasthe CRT
_popen, which doesCreateProcess(bInheritHandles = TRUE)and leaksevery inheritable handle into the git child — including the Winsock/AFD handles
that exist only in UI mode. git-for-Windows (MSYS2/Cygwin) classifies each inherited
handle via
NtQueryObjecton startup, which deadlocks on an inheritedsocket/AFD handle. git never runs, so the server blocks forever in
fgetson git'sstdout pipe. Confirmed with gdb: request thread in
fgets/git_capture, thegit.exechild inntdll!ZwQueryObject.The plain MCP-stdio server and the CLI are unaffected (no socket handles to inherit),
which is why only the UI hangs.
The fix
Reimplement
cbm_popenon Windows (src/foundation/compat_fs.c) to spawn viaCreateProcess+STARTUPINFOEXwith an explicitPROC_THREAD_ATTRIBUTE_HANDLE_LISTcontaining only the stdout write-end and aNULhandle for the child's stdin/stderr. The git child now inherits only the pipe —
no sockets, no MCP stdin pipe, no Winsock handles — so there is no foreign handle for
NtQueryObjectto deadlock on.cbm_pclosereaps the process via a smallFILE*→HANDLEtable. The POSIX path is unchanged (popenalready opens its pipeO_CLOEXEC). This is centralized incbm_popen, so it also covers the watcher(
src/watcher/watcher.c) and git-history pass, which shell out to git the same way.Validation (Windows, production
cbm-with-uibinary)POST /rpc list_projects(UI)git/cmdprocessesChecklist
git commit -s)test.shbuilds with ASan/UBSan, whichare unavailable on the MSYS2/MinGW toolchain used to reproduce this; the fix is
#ifdef _WIN32-only and does not touch the POSIX path.httpdsuite (tests/test_httpd.c,tests/test_main.c). The gitNtQueryObjectdeadlock isenvironment-sensitive (it does not reproduce on every git-for-Windows/MSYS
build, nor under Linux CI), so the load-bearing guard tests the fix's
contract directly and deterministically:
cbm_popen_isolates_inheritable_handlecreates an inheritable marker handle and spawns a re-exec of the test runner
via
cbm_popen(hidden__handle_probemode), asserting the child cannotread the marker — verified RED without the fix (
leaked==1), GREEN withit. Two companion end-to-end liveness invariants (git-context resolve and
POST /rpc list_projectsunder a watchdog while the UI server holds livesockets) catch Windows: web UI hangs — list_projects deadlocks the HTTP server (git via _popen handle inheritance) #798 on affected git builds and any future hang regression.