Skip to content

fix(win): prevent UI hang — spawn git without leaking handles into the child#799

Closed
Flipper1994 wants to merge 2 commits into
DeusData:mainfrom
Flipper1994:fix/windows-ui-popen-handle-inheritance
Closed

fix(win): prevent UI hang — spawn git without leaking handles into the child#799
Flipper1994 wants to merge 2 commits into
DeusData:mainfrom
Flipper1994:fix/windows-ui-popen-handle-inheritance

Conversation

@Flipper1994

@Flipper1994 Flipper1994 commented Jul 3, 2026

Copy link
Copy Markdown
Contributor

What does this PR do?

Fixes a Windows-only deadlock that makes the web UI hang forever. With --ui=true,
the MCP tool list_projects never returns and, because the UI HTTP server is
single-threaded, the whole server stops responding — the UI page loads but stays
blank. Closes #798.

Root cause

list_projects shells out to git per project for git context
(git_capturecbm_popen, src/git/git_context.c). On Windows cbm_popen was
the CRT _popen, which does CreateProcess(bInheritHandles = TRUE) and leaks
every inheritable handle into the git child — including the Winsock/AFD handles
that exist only in UI mode. git-for-Windows (MSYS2/Cygwin) classifies each inherited
handle via NtQueryObject on startup, which deadlocks on an inherited
socket/AFD handle. git never runs, so the server blocks forever in fgets on git's
stdout pipe. Confirmed with gdb: request thread in fgets/git_capture, the
git.exe child in ntdll!ZwQueryObject.

The plain MCP-stdio server and the CLI are unaffected (no socket handles to inherit),
which is why only the UI hangs.

The fix

Reimplement cbm_popen on Windows (src/foundation/compat_fs.c) to spawn via
CreateProcess + STARTUPINFOEX with an explicit
PROC_THREAD_ATTRIBUTE_HANDLE_LIST containing only the stdout write-end and a NUL
handle for the child's stdin/stderr. The git child now inherits only the pipe —
no sockets, no MCP stdin pipe, no Winsock handles — so there is no foreign handle for
NtQueryObject to deadlock on. cbm_pclose reaps the process via a small
FILE*HANDLE table. The POSIX path is unchanged (popen already opens its pipe
O_CLOEXEC). This is centralized in cbm_popen, so it also covers the watcher
(src/watcher/watcher.c) and git-history pass, which shell out to git the same way.

Validation (Windows, production cbm-with-ui binary)

before after
POST /rpc list_projects (UI) hang / timeout 200 in ~1 s, 6× stable
leftover git/cmd processes persist forever none
web UI in browser blank page renders, stays responsive

Checklist

  • Every commit is signed off (git commit -s)
  • Tests pass locally — note: this repo's test.sh builds with ASan/UBSan, which
    are unavailable on the MSYS2/MinGW toolchain used to reproduce this; the fix is
    #ifdef _WIN32-only and does not touch the POSIX path.
  • Lint passes
  • Regression test — added in the httpd suite (tests/test_httpd.c,
    tests/test_main.c). The git NtQueryObject deadlock is
    environment-sensitive (it does not reproduce on every git-for-Windows/MSYS
    build, nor under Linux CI), so the load-bearing guard tests the fix's
    contract directly and deterministically: cbm_popen_isolates_inheritable_handle
    creates an inheritable marker handle and spawns a re-exec of the test runner
    via cbm_popen (hidden __handle_probe mode), asserting the child cannot
    read the marker — verified RED without the fix (leaked==1), GREEN with
    it
    . Two companion end-to-end liveness invariants (git-context resolve and
    POST /rpc list_projects under a watchdog while the UI server holds live
    sockets) catch Windows: web UI hangs — list_projects deadlocks the HTTP server (git via _popen handle inheritance) #798 on affected git builds and any future hang regression.

On Windows with the UI enabled (--ui=true), list_projects hangs forever and
wedges the single-threaded HTTP server, so the web UI never loads.

Root cause: list_projects resolves git context per project via git_capture ->
cbm_popen (git_context.c). On Windows cbm_popen was _popen, which spawns the
child with CreateProcess(bInheritHandles=TRUE) and leaks every inheritable
handle into the git child -- including the Winsock/AFD handles that exist only
in UI mode. git-for-Windows (MSYS2/Cygwin) classifies each inherited handle via
NtQueryObject on startup, which deadlocks on an inherited socket/AFD handle, so
git never runs and the server blocks forever in fgets on git's stdout pipe.
Confirmed with gdb (request thread in fgets/git_capture, git.exe child in
ntdll!ZwQueryObject). The plain MCP-stdio server and the CLI are unaffected --
no socket handles to inherit -- which is why only the UI hangs.

Fix: reimplement cbm_popen on Windows with CreateProcess + STARTUPINFOEX and an
explicit PROC_THREAD_ATTRIBUTE_HANDLE_LIST containing only the stdout write-end
and a NUL handle for stdin/stderr. The git child now inherits only the pipe, so
there is no foreign handle for NtQueryObject to deadlock on. cbm_pclose reaps
via a small FILE*->HANDLE table. The POSIX path is unchanged (popen already
sets O_CLOEXEC). Centralized in cbm_popen, so it also covers watcher.c and the
git-history pass.

Signed-off-by: Flipper <jacobphilipp@ymail.com>
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
@Flipper1994 Flipper1994 requested a review from DeusData as a code owner July 3, 2026 14:07
@DeusData DeusData added bug Something isn't working windows Windows-specific issues ux/behavior Display bugs, docs, adoption UX stability/performance Server crashes, OOM, hangs, high CPU/memory priority/high Needs near-term maintainer attention; high-impact bug, regression, safety issue, or release blocker. labels Jul 4, 2026
@DeusData

DeusData commented Jul 4, 2026

Copy link
Copy Markdown
Owner

Thanks for the Windows UI hang fix for #798. Triage: high-priority Windows/UI stability.

Review will focus on the custom Windows cbm_popen implementation: handle inheritance must be closed for git-for-Windows without leaking handles, hanging the HTTP server, or changing POSIX behavior. A regression that proves list_projects returns under UI mode on Windows is the important guard.

…le leak

Guards the cbm_popen handle-inheritance fix. The git NtQueryObject deadlock
is environment-sensitive (does not reproduce on every git-for-Windows/MSYS
build), so a git-based test cannot be relied on to go RED. Instead
cbm_popen_isolates_inheritable_handle tests the fix's contract directly:
it creates an inheritable marker handle and spawns a re-exec of the test
runner via cbm_popen (hidden __handle_probe mode), asserting the child
cannot read the marker. Verified RED without the fix, GREEN with it.

Two companion end-to-end liveness invariants (git-context resolve and
POST /rpc list_projects under a watchdog while the UI server holds live
sockets) catch DeusData#798 on affected git builds and any future hang regression.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Signed-off-by: Flipper <jacobphilipp@ymail.com>
@Flipper1994

Copy link
Copy Markdown
Contributor Author

@DeusData Added the regression guard you asked for (0c2daca, test-only).

The tricky part: the git NtQueryObject deadlock is environment-sensitive — it does not reproduce on every git-for-Windows/MSYS build (nor under Linux CI, which has no sockets), so a git-based test can't be relied on to go RED. So the load-bearing guard tests the fix's contract directly:

  • cbm_popen_isolates_inheritable_handle (tests/test_httpd.c + a hidden __handle_probe re-exec mode in tests/test_main.c): creates an inheritable marker handle, spawns a re-exec of the test runner through cbm_popen, and asserts the child cannot read the marker — i.e. no foreign handle crosses into the child. Verified RED without the fix (leaked==1), GREEN with it. This is the exact leak that caused the deadlock, tested deterministically on any Windows.
  • Two companion end-to-end liveness invariants — cbm_git_context_resolve (the per-project path list_projects runs) and POST /rpc list_projects, both under a watchdog while the UI server holds live sockets. They enforce "must-not-hang / must-respond under UI mode" and catch Windows: web UI hangs — list_projects deadlocks the HTTP server (git via _popen handle inheritance) #798 on affected git builds; same pattern as the environment-sensitive guards already in test_git_context.c.

Checklist status:

  • Handle inheritance closed for git-for-Windows — STARTUPINFOEX + PROC_THREAD_ATTRIBUTE_HANDLE_LIST = {stdout pipe, NUL}; every error path closes its handles.
  • No server hang / POSIX unchanged — Windows path is #ifdef _WIN32-only; POSIX popen (already O_CLOEXEC) untouched.
  • Regression test — done, as above; the checklist box is now ticked.

All three tests run in the existing httpd suite (no new build wiring), so they execute on the current test-windows CI leg. No production code changed in this follow-up.

@DeusData

DeusData commented Jul 4, 2026

Copy link
Copy Markdown
Owner

Thank you — this was a genuinely hard Windows bug found and fixed correctly: leaked inheritable handles keeping the MSYS2 git child hung on the AFD socket walk, matching #798's stacks exactly. We carried it over the line as d2a5975 (PR #846) with you credited as co-author, plus three hardening deltas from review: the isolated spawn no longer silently falls back to _popen (that would reintroduce the deadlock) and logs on failure, cmd.exe is resolved via %COMSPEC%/System32 (no search-path planting), and the command line is UTF-16 (CreateProcessW) so non-ASCII repo paths survive. The full UI-mode hang reproduction (listening socket + single-threaded server + MSYS2 handle walk) is flagged as a follow-up. Closes #798. Closing in favor of the distill — thanks again!

@DeusData DeusData closed this Jul 4, 2026
rarepops pushed a commit to rarepops/codebase-memory-mcp that referenced this pull request Jul 4, 2026
Windows cbm_popen wrapped _popen, whose CreateProcess(bInheritHandles=TRUE)
leaks every inheritable handle - including the UI listening socket and
Winsock/AFD helpers - into git children. Git-for-Windows' MSYS2 runtime
walks inherited handles with NtQueryObject at startup and deadlocks on
socket/AFD handles, wedging the single-threaded UI server (DeusData#798).

Read-mode cbm_popen now spawns via CreateProcessW with STARTUPINFOEXW and
PROC_THREAD_ATTRIBUTE_HANDLE_LIST restricted to the stdout pipe write end
and a NUL handle for stdin/stderr; cbm_pclose reaps the child via a handle
table + WaitForSingleObject + GetExitCodeProcess.

Distilled from DeusData#799 with three corrections:
- no silent fallback to _popen: a failed isolated spawn logs a structured
  warning (compat.popen_isolated_failed stage/gle/errno) and returns NULL
  (all call sites handle NULL); the composed command line is heap-allocated
  instead of a fixed 2048-byte buffer whose overflow forced the fallback
- cmd.exe resolved explicitly from %COMSPEC% (GetSystemDirectoryW\cmd.exe
  fallback) and passed as lpApplicationName - no search-path lookup
- GetExitCodeProcess failure reports -1 instead of success; a failed NUL
  open fails the spawn instead of feeding INVALID_HANDLE_VALUE into
  STARTF_USESTDHANDLES

Adds two windows-only regression tests (test_security.c) driving the
cbm_popen/cbm_pclose round-trip against real git and asserting the isolated
path was taken via the cbm_popen_last_was_isolated() hook.

Closes DeusData#798

Co-authored-by: Flipper <jacobphilipp@ymail.com>
Signed-off-by: Martin Vogel <martin.vogel.tech@gmail.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

bug Something isn't working priority/high Needs near-term maintainer attention; high-impact bug, regression, safety issue, or release blocker. stability/performance Server crashes, OOM, hangs, high CPU/memory ux/behavior Display bugs, docs, adoption UX windows Windows-specific issues

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Windows: web UI hangs — list_projects deadlocks the HTTP server (git via _popen handle inheritance)

2 participants