session, tcpip: fix exit-status ordering and half-close in forwarding#8
session, tcpip: fix exit-status ordering and half-close in forwarding#8
Conversation
There was a problem hiding this comment.
Pull request overview
This PR ports upstream Tailscale/gliderlabs SSH fixes related to SSH forwarding semantics, focusing on correct exit-status/close ordering and proper half-close behavior during bidirectional stream copying.
Changes:
- Update
bicopyto use half-closes (CloseWrite/CloseRead when available) so one direction completing doesn’t prematurely tear down the other. - Adjust session exit handling so
exit-statusis sent without immediately closing the channel; close is performed after handler completion. - Make reverse UNIX socket forwarding resilient to nil contexts when calling
net.ListenConfig.Listen. - Refactor agent forwarding to reuse
bicopyfor data transfer.
Reviewed changes
Copilot reviewed 5 out of 5 changed files in this pull request and generated 4 comments.
Show a summary per file
| File | Description |
|---|---|
tcpip.go |
Reworks bicopy to support half-close semantics and adds helper half-close functions. |
tcpip_test.go |
Adds new tests covering bicopy normal operation, cancellation, half-close propagation, and half-close no-op behavior. |
session.go |
Changes Session.Exit to no longer close immediately; explicitly closes after handler completion. |
streamlocal.go |
Avoids passing nil contexts to net.ListenConfig.Listen by falling back to context.Background(). |
agent.go |
Switches agent forwarding data copy logic to use bicopy. |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
Decouple Exit() from Close() in session.go. Per RFC 4254 section 6.10, the exit-status message should be sent before the channel is closed. By not closing immediately, we allow the session handler to complete any remaining I/O operations before the channel is closed by the request handler cleanup code. Rewrite bicopy() to use half-close (CloseWrite/CloseRead) instead of immediately cancelling the context when one copy direction finishes. When one direction finishes, half-close the write side of the destination to signal EOF to the peer and close the read side of the source to release kernel resources, allowing the other direction to finish gracefully. Context cancellation is preserved as a safety fallback. Add halfCloseWrite and halfCloseRead helpers that use type assertions for CloseWrite and CloseRead respectively. All connection types used in SSH forwarding (gossh.Channel, net.TCPConn, net.UnixConn) support CloseWrite. net.TCPConn and net.UnixConn also support CloseRead; gossh.Channel does not, so halfCloseRead is a no-op for SSH channels. Updates tailscale/tailscale#18256 Co-authored-by: James Tucker <james@tailscale.com>
19e5212 to
cbe4b0e
Compare
Add RFC references to the comments introduced in the previous commit: - session.go Exit(): cite RFC 4254 Section 6.10 with URL for the exit-status before CHANNEL_CLOSE requirement - tcpip.go bicopy(): cite RFC 4254 Section 5.3 with URL for the EOF signaling requirement - tcpip.go halfCloseWrite(): expand comment explaining how half-close propagation unblocks the other copy direction - tcpip.go halfCloseRead(): expand comment explaining kernel resource release and gossh.Channel no-op behavior No behavioral changes, only documentation. Updates tailscale/tailscale#18256
Replace wg.Add(2) + go func() { defer wg.Done(); ... }() with
wg.Go(func() { ... }). Available since Go 1.24.
Suggested-by: Ox Cart <ox@tailscale.com> (tailscale/tailscale#18331 review)
Updates tailscale/tailscale#18256
Replace the inline bidirectional copy in ForwardAgentConnections with the shared bicopy function. This fixes a panic-prone hard cast to *net.UnixConn (which would crash on any non-Unix connection type) and adds context integration that was previously missing from agent forwarding. Updates tailscale/tailscale#18256
cbe4b0e to
54dbde8
Compare
| } | ||
| if opts.PathValidator != nil { | ||
| // PathValidator requires a valid SSH Context for session | ||
| // metadata; skip when ctx is nil (only in tests). |
There was a problem hiding this comment.
Why do we allow the nil context only for tests instead of just making the test construct one and pass it in?
There was a problem hiding this comment.
Are we testing it because we expect that it might happen in real-world usage and we want to allow it? Or is it just convenience for the tests, to reduce setup code?
|
|
||
| select { | ||
| case <-done: | ||
| case <-time.After(5 * time.Second): |
There was a problem hiding this comment.
Nit: 1s should be plenty, do we need 5s?
neinkeinkaffee
left a comment
There was a problem hiding this comment.
Given that we'll cover the exit ordering in an integration test in a follow-up in OSS, this looks good to me apart from one question mark I've got regarding the nil context safeguard!
Add tests using real TCP connections to verify bicopy correctness: TestBicopyNormal: data flows correctly in both directions. TestBicopyContextCancel: context cancellation force-closes both connections and bicopy returns promptly. TestBicopyHalfClosePropagation: when one side finishes sending, the other side still receives all its data. This is the key property that half-close provides over the old cancel-on-first-direction approach. TestHalfCloseWriteNoOpPreservesConnection: verifies halfCloseWrite is a safe no-op for types without CloseWrite support. The connection remains readable, proving a c.Close() fallback would be harmful. TestHalfCloseReadNoOpPreservesConnection: verifies halfCloseRead is a safe no-op for types without CloseRead support. The connection remains writable, proving a c.Close() fallback would be harmful. Updates tailscale/tailscale#18256
The returned callback passed its Context parameter directly to net.ListenConfig.Listen, which panics when ctx is nil (Go 1.21+ dereferences the context in net.ContextSockTrace). Use context.Background as a fallback when ctx is nil. This fixes TestNewReverseUnixForwardingCallbackBindUnlinkSkipsNonSocket and TestNewReverseUnixForwardingCallbackSocketPermissions which both pass nil as the Context.
54dbde8 to
ba9c35c
Compare
Fix a race where CloseWrite (SSH_MSG_CHANNEL_EOF) could be sent before Exit (exit-status), causing SSH clients (especially on macOS) to miss the exit status. The root cause was a select between outputDone and ctx.Done that allowed Exit to be called while stdout was still copying, or after CloseWrite had already been sent. Replace the atomic.Bool/atomic.Int32/channel synchronization with sync.WaitGroup and guarantee the wire ordering required by RFC 4254 section 6.10: exit-status -> EOF -> CHANNEL_CLOSE. The new ordering in run(): 1. cmd.Wait() returns with exit code 2. ss.Exit(exitCode) sends exit-status 3. closeAll(childPipes) signals io.Copy goroutines to finish 4. wg.Wait() waits for stdout goroutine which calls CloseWrite (EOF) 5. defer ss.Close() sends CHANNEL_CLOSE on function return Additional changes: - Send SIGHUP instead of SIGKILL on session termination (cite OpenSSH session.c:2246 for PTY master close, POSIX General Terminal Interface for terminal disconnect semantics) - Use exit code 255 for SSH protocol/permission errors (cite OpenSSH ssh.c:1693 for the 255 convention) - Use exit code 127 for command not found (cite POSIX Shell Command Language for the 127 convention) - Use exit code 254 for recording failures - Add isNotFoundOrExecutable helper Bumps the github.com/tailscale/gliderssh dependency to pull in the matching wire-ordering changes from tailscale/gliderssh#8, and regenerates the corresponding nix vendor hashes via tool/updateflakes. Based on #18331. Updates #18256 Signed-off-by: Kristoffer Dalby <kristoffer@tailscale.com>
This PR brings in the gliderlabs ssh changes from tailscale/tailscale#18331 into our fork.
This is the first part of getting tailscale/tailscale#18331 into main, where the other part will be update the gliderssh dependency and the other half of the changes.
In addition to the changes, it tries to pick up some of the comments, adds some references to various specifications and tests.
Updates tailscale/tailscale#18256