Windows: ZMQ ipc:// transport not supported — TUI fails to start after /dev/tty fix #41

@mind6

Summary

After the /dev/tty fix in #40, the next Windows blocker is that Kaimon binds/connects ZMQ sockets using the ipc:// (Unix domain socket) transport, which the standard ZMQ_jll Windows build does not support. Result: kaimon aborts at TUI init.

Reproducer

Windows 11 / Julia 1.12.5 / Kaimon 1.3.1 with PR #40 applied:

ERROR: ZMQ: Protocol not supported
Stacktrace:
  [1] bind(socket::ZMQ.Socket, endpoint::String)
    @ ZMQ .../ZMQ/src/socket.jl:115
  [2] _start_event_pub!(mgr::Kaimon.ConnectionManager)
    @ Kaimon .../src/gate_client.jl:315
  [3] start!(mgr::Kaimon.ConnectionManager)
    @ Kaimon .../src/gate_client.jl:1525
  [4] init!(m::Kaimon.KaimonModel, _t::Tachikoma.Terminal)
    @ Kaimon .../src/tui/lifecycle.jl:221

Confirmed by direct probe — bind(sock, "ipc://./test.sock") on the ZMQ_jll shipped to Windows raises ZMQ: Protocol not supported. This is not a Kaimon bug per se, but a long-standing libzmq/Windows situation: see zeromq/libzmq#153, zeromq/pyzmq#1462, zeromq/zeromq.js#478. libzmq has experimental named-pipe IPC (PR zeromq/libzmq#3717) but it isn't enabled in the binary builds Julia ships.
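
The probe can be reproduced with a few lines of ZMQ.jl. This is a minimal sketch, not Kaimon code: `probe_transport` is a hypothetical helper name, and the ipc:// path is placed in a temporary directory rather than `./test.sock` so that nothing is left behind on platforms where the bind succeeds.

```julia
using ZMQ

# Hypothetical helper (not part of Kaimon): try to bind a throwaway PUB socket
# to `endpoint` and report whether libzmq accepted the transport.
function probe_transport(endpoint::AbstractString)
    ctx = ZMQ.Context()
    sock = ZMQ.Socket(ctx, ZMQ.PUB)
    try
        ZMQ.bind(sock, endpoint)
        return (ok = true, message = "bound")
    catch err
        # On the Windows build described above, the ipc:// case ends up here.
        return (ok = false, message = sprint(showerror, err))
    finally
        close(sock)
        close(ctx)
    end
end

ipc_endpoint = "ipc://" * joinpath(mktempdir(), "test.sock")
for endpoint in (ipc_endpoint, "tcp://127.0.0.1:*")
    result = probe_transport(endpoint)
    println(endpoint, " => ", result.ok ? "ok" : result.message)
end
```

On a platform with IPC support both lines should report `ok`; on the Windows build only the tcp:// line is expected to.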

Scope of the change

ipc:// is used at these call sites across four files:

  • src/gate.jl:2092 — gate REQ/REP endpoint per session
  • src/gate.jl:2108 — gate PUB/SUB stream endpoint per session
  • src/gate.jl:2668 — service REP endpoint client-side connect
  • src/gate_client.jl:315 — global event PUB (the first one to crash)
  • src/gate_client.jl:640 — fallback when reading session metadata
  • src/service_endpoint.jl:30,31,101 — service endpoint bind + cleanup
  • src/extension_manager.jl:138 — extension SUB connect

Endpoints are also persisted in session metadata JSON files (sock_dir/<sid>.json), so the wire format on disk is affected too.

Suggested fix: TCP loopback on Windows

Switch transports based on Sys.iswindows():

  • Unix: keep ipc://...sock (cheap, no port allocation).
  • Windows: use tcp://127.0.0.1:<port>, with a port chosen dynamically via bind(sock, "tcp://127.0.0.1:*") and the resolved endpoint read back via get_last_endpoint(sock) (ZMQ.jl exposes the LAST_ENDPOINT socket option). Persist the resolved tcp://... URL in session metadata exactly as ipc://... is today; the rest of the code only sees an opaque endpoint string.

Sketch:

function _gate_endpoint(sock_dir, sid; suffix="")
    if Sys.iswindows()
        # caller binds with port=*, then reads LAST_ENDPOINT
        return "tcp://127.0.0.1:*"
    else
        return "ipc://" * joinpath(sock_dir, "$(sid)$(suffix).sock")
    end
end

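function _bind_dynamic!(sock, endpoint_template)
    bind(sock, endpoint_template)
    # After a wildcard bind, read back the resolved endpoint (concrete port).
    # The accessor below stands in for libzmq's ZMQ_LAST_ENDPOINT option; check
    # how the installed ZMQ.jl version exposes it before relying on this spelling.
    return String(ZMQ.get(sock, ZMQ.LAST_ENDPOINT))
end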

Then gate.jl/gate_client.jl save the resolved endpoint (with concrete port) into session metadata, and clients connect to that. Cleanup paths that today rm the .sock file simply become no-ops on Windows.
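
One way to get the "no-op on Windows" behaviour without scattering Sys.iswindows() checks is to key cleanup off the endpoint scheme. The sketch below assumes a hypothetical helper `cleanup_endpoint!`; the name and return convention are illustrative and do not come from the Kaimon source.

```julia
# Hypothetical helper: remove the backing socket file only for ipc:// endpoints.
# Returns true if the endpoint was ipc:// (file removed or already gone),
# false for any other transport, where there is nothing on disk to clean up.
function cleanup_endpoint!(endpoint::AbstractString)
    startswith(endpoint, "ipc://") || return false
    path = chopprefix(endpoint, "ipc://")
    rm(path; force = true)   # force=true: no error if the file no longer exists
    return true
end

# Usage: a tcp:// endpoint is skipped, an ipc:// endpoint has its file removed.
sock_path = joinpath(mktempdir(), "demo.sock")
touch(sock_path)
println(cleanup_endpoint!("tcp://127.0.0.1:54321"))  # nothing to remove
println(cleanup_endpoint!("ipc://" * sock_path))     # removes demo.sock
```

Because the decision depends on the stored endpoint string rather than on the host OS, the same cleanup path also handles metadata written by either transport.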

Trade-offs vs. IPC:

  • ✅ Same security posture as IPC for a single-user dev workstation: the socket binds to 127.0.0.1 only, so it is not reachable from other hosts.
  • ✅ No libzmq rebuild required.
  • ⚠️ Multiple Kaimon instances on the same machine work fine (dynamic ports), but stale tcp:// entries in old session metadata need explicit cleanup; the existing _maybe_cleanup_stale_session! logic keying off pid/mtime should still work.
  • ⚠️ A loopback ZMQ socket is visible to other local users; if that's a concern, gate it behind ZAP/CURVE auth or document the difference.
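
Since the wildcard template and the resolved endpoint are both plain strings, a client reading session metadata could cheaply verify that what it is about to connect to is a concrete loopback endpoint, and not a leftover tcp://127.0.0.1:* template or a non-loopback address. A minimal sketch, assuming a hypothetical helper `loopback_port` that is not part of Kaimon or ZMQ.jl:

```julia
# Hypothetical validator: return the port of a concrete tcp://127.0.0.1:<port>
# endpoint, or `nothing` for anything else (wildcard port, other hosts,
# other transports, out-of-range ports).
function loopback_port(endpoint::AbstractString)
    m = match(r"^tcp://127\.0\.0\.1:(\d{1,5})$", endpoint)
    m === nothing && return nothing
    port = parse(Int, m[1])
    return 1 <= port <= 65535 ? port : nothing
end

for endpoint in ("tcp://127.0.0.1:54321", "tcp://127.0.0.1:*", "tcp://0.0.0.0:5555")
    println(endpoint, " => ", repr(loopback_port(endpoint)))
end
```

This only checks the shape of the string. It does not show that the owning process is still alive, so it would complement the pid/mtime staleness logic, not replace it.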

Alternative considered: ship a Windows-only ZMQ_jll rebuilt with -DZMQ_HAVE_IPC=ON -DZMQ_HAVE_WINDOWS_NAMED_PIPES=ON. That works but pulls Kaimon into the BinaryBuilder / Yggdrasil rabbit hole and risks divergence from upstream ZMQ_jll. TCP loopback is a smaller, more portable change.

Bigger picture: state of Windows support

For context — there don't appear to be any Windows tracking issues on the repo today, and the Discourse announcement thread doesn't mention Windows. The two real blockers I've hit running kaimon cold on Windows 11 are:

  1. /dev/tty open in _start_stdout_capture! → fixed by #40 (one line).
  2. ZMQ ipc:// everywhere → this issue.

Beyond those, there are a handful of Unix-flavored references (/dev/ttys* paths, the tty shell-out in tool_definitions.jl) but those are explicitly documented as macOS/Linux-only features (external-TTY attach), so they degrade gracefully — they shouldn't block a Windows user from running the core gate + TUI. So the realistic path to "Kaimon works on Windows" looks like:

  • PR #40: /dev/tty → CON
  • This issue: ipc:// → tcp://127.0.0.1 on Windows
  • Optional: document that the external-TTY (tty-path) feature is Unix-only
  • CI: a Windows runner in GitHub Actions to keep this from regressing

Happy to put together a PR for the TCP-loopback switch if the design above sounds reasonable — wanted to file this first to check whether you'd prefer that approach, a ZMQ_jll rebuild, or something else (e.g. inproc + a single-process design on Windows).
