fix(macOS): reap entire codegraph process tree on exit (Setpgid + negative-PID kill) #3735
ttmouse wants to merge 1 commit into
7d42f64 to 86321f7
Thanks. The Setpgid + negative-PID reaping is the right approach for the macOS/Unix process tree. One blocker: the Windows … Could you move the process-group setup + group-kill behind a build tag, e.g. a …
86321f7 to 78ecb8e
…ative-PID kill)

On Unix (macOS/Linux), KillTree only killed the direct child process (cmd.Process.Kill()), leaving orphan grandchildren alive, e.g. the codegraph launcher shell's bundled node runtime. Over time, 40+ orphan processes accumulated, causing severe system slowdown.

Fix: set Setpgid on the child before starting it, making it the leader of a new process group. KillTree (and the Cancel handler) now uses syscall.Kill with a negative PID to reap the entire process group. This matches the pattern already proven in bash_kill_other.go and shell_kill_other.go, and is symmetric with the Windows Job Object approach in kill_windows.go.
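The process-group technique described in this commit message can be sketched roughly as follows. This is an illustration only, not the PR's code: the helper names startInGroup, killGroup, and grandchildReaped are hypothetical, and the check assumes a POSIX sh and sleep are on PATH. The grandchild inherits the write end of a pipe, so seeing EOF on the read end shows that every process in the group has exited.

```go
//go:build !windows

package main

import (
	"fmt"
	"io"
	"os"
	"os/exec"
	"syscall"
	"time"
)

// startInGroup (hypothetical helper) starts cmd as the leader of a new
// process group, so the group ID equals the child's PID.
func startInGroup(cmd *exec.Cmd) error {
	cmd.SysProcAttr = &syscall.SysProcAttr{Setpgid: true}
	return cmd.Start()
}

// killGroup (hypothetical helper) sends SIGKILL to every process in the
// child's group; a negative PID addresses the group instead of one process.
func killGroup(cmd *exec.Cmd) error {
	return syscall.Kill(-cmd.Process.Pid, syscall.SIGKILL)
}

// grandchildReaped reports whether a backgrounded grandchild dies together
// with its parent shell when the group is killed.
func grandchildReaped() (bool, error) {
	r, w, err := os.Pipe()
	if err != nil {
		return false, err
	}
	defer r.Close()

	// The shell backgrounds a long sleep (the grandchild), then announces
	// that it is running. Both processes hold the pipe's write end.
	cmd := exec.Command("sh", "-c", "sleep 30 & echo ready; wait")
	cmd.Stdout = w
	if err := startInGroup(cmd); err != nil {
		w.Close()
		return false, err
	}
	w.Close() // drop the parent's copy so EOF depends only on the group

	ready := make([]byte, len("ready\n"))
	if _, err := io.ReadFull(r, ready); err != nil {
		return false, err
	}

	if err := killGroup(cmd); err != nil {
		return false, err
	}
	_ = cmd.Wait() // the shell was killed, so an exit error is expected

	// EOF arrives only once no live process holds the write end.
	eof := make(chan struct{})
	go func() {
		_, _ = io.Copy(io.Discard, r)
		close(eof)
	}()
	select {
	case <-eof:
		return true, nil
	case <-time.After(5 * time.Second):
		return false, nil
	}
}

func main() {
	ok, err := grandchildReaped()
	fmt.Println("grandchild reaped:", ok, "error:", err)
}
```

Replacing killGroup with cmd.Process.Kill() in this sketch should leave the sleep process holding the pipe, which is the orphaning behaviour the commit message describes.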
78ecb8e to 509aa55
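Update notes
Adjustments made in response to review feedback:
1. Windows build fix
2. A latent problem in EnsureInit (a bug found during code review)
3. KillTree error logging
About unit tests
The review also suggested, for …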
Thanks @ttmouse, re-landed as #3787, with credit. Your Setpgid + negative-pid approach is exactly right; I just integrated it into the StartTracked path that #3755 added for the Windows Job Object, so the two reaping mechanisms sit side by side (group kill off Windows, Job Object on Windows) and transport_stdio needs no per-call change. Added a grandchild-reaping regression test too. Appreciate the fix!
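Problem
On macOS/Linux, codegraph's grandchild processes are left behind after Reasonix exits. 40+ orphan processes accumulated, causing severe system slowdown.
Root cause
KillTree in internal/proc/kill_other.go only calls cmd.Process.Kill(), which kills only the direct child (codegraph's shell launcher) and not the node runtime and worker processes that the launcher starts.
By contrast, the Windows version in internal/proc/kill_windows.go uses a Job Object (JOB_OBJECT_LIMIT_KILL_ON_JOB_CLOSE) plus taskkill /T, which cleans up the whole process tree, so it does not have this problem.
Fix
Two changes, reusing a pattern that already exists in the codebase (bash_kill_other.go / shell_kill_other.go):
- internal/proc/kill_other.go: KillTree changes from cmd.Process.Kill() to syscall.Kill(-cmd.Process.Pid, syscall.SIGKILL), which kills the entire process group.
- internal/plugin/transport_stdio.go: newStdioTransport sets cmd.SysProcAttr = &syscall.SysProcAttr{Setpgid: true} before cmd.Start() and installs a cmd.Cancel handler that kills the process group.
With these changes, codegraph's process tree is fully cleaned up on both normal exit and context cancellation.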
Fixes #3734