Skip to content

session, tcpip: fix exit-status ordering and half-close in forwarding#8

Merged
kradalby merged 6 commits intomasterfrom
kradalby/ssh-exit-status-ordering
Apr 29, 2026
Merged

session, tcpip: fix exit-status ordering and half-close in forwarding#8
kradalby merged 6 commits intomasterfrom
kradalby/ssh-exit-status-ordering

Conversation

@kradalby
Copy link
Copy Markdown
Member

This PR brings in the gliderlabs ssh changes from tailscale/tailscale#18331 into our fork.

This is the first part of getting tailscale/tailscale#18331 into main, where the other part will be update the gliderssh dependency and the other half of the changes.

In addition to the changes, it tries to pick up some of the comments, adds some references to various specifications and tests.

Updates tailscale/tailscale#18256

Copy link
Copy Markdown

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR ports upstream Tailscale/gliderlabs SSH fixes related to SSH forwarding semantics, focusing on correct exit-status/close ordering and proper half-close behavior during bidirectional stream copying.

Changes:

  • Update bicopy to use half-closes (CloseWrite/CloseRead when available) so one direction completing doesn’t prematurely tear down the other.
  • Adjust session exit handling so exit-status is sent without immediately closing the channel; close is performed after handler completion.
  • Make reverse UNIX socket forwarding resilient to nil contexts when calling net.ListenConfig.Listen.
  • Refactor agent forwarding to reuse bicopy for data transfer.

Reviewed changes

Copilot reviewed 5 out of 5 changed files in this pull request and generated 4 comments.

Show a summary per file
File Description
tcpip.go Reworks bicopy to support half-close semantics and adds helper half-close functions.
tcpip_test.go Adds new tests covering bicopy normal operation, cancellation, half-close propagation, and half-close no-op behavior.
session.go Changes Session.Exit to no longer close immediately; explicitly closes after handler completion.
streamlocal.go Avoids passing nil contexts to net.ListenConfig.Listen by falling back to context.Background().
agent.go Switches agent forwarding data copy logic to use bicopy.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment thread tcpip_test.go Outdated
Comment thread tcpip_test.go
Comment thread session.go
Comment thread streamlocal.go Outdated
Decouple Exit() from Close() in session.go. Per RFC 4254 section 6.10,
the exit-status message should be sent before the channel is closed.
By not closing immediately, we allow the session handler to complete
any remaining I/O operations before the channel is closed by the
request handler cleanup code.

Rewrite bicopy() to use half-close (CloseWrite/CloseRead) instead of
immediately cancelling the context when one copy direction finishes.
When one direction finishes, half-close the write side of the
destination to signal EOF to the peer and close the read side of the
source to release kernel resources, allowing the other direction to
finish gracefully. Context cancellation is preserved as a safety
fallback.

Add halfCloseWrite and halfCloseRead helpers that use type assertions
for CloseWrite and CloseRead respectively. All connection types used
in SSH forwarding (gossh.Channel, net.TCPConn, net.UnixConn) support
CloseWrite. net.TCPConn and net.UnixConn also support CloseRead;
gossh.Channel does not, so halfCloseRead is a no-op for SSH channels.

Updates tailscale/tailscale#18256

Co-authored-by: James Tucker <james@tailscale.com>
@kradalby kradalby force-pushed the kradalby/ssh-exit-status-ordering branch from 19e5212 to cbe4b0e Compare April 16, 2026 08:42
Add RFC references to the comments introduced in the previous commit:

- session.go Exit(): cite RFC 4254 Section 6.10 with URL for the
  exit-status before CHANNEL_CLOSE requirement
- tcpip.go bicopy(): cite RFC 4254 Section 5.3 with URL for the EOF
  signaling requirement
- tcpip.go halfCloseWrite(): expand comment explaining how half-close
  propagation unblocks the other copy direction
- tcpip.go halfCloseRead(): expand comment explaining kernel resource
  release and gossh.Channel no-op behavior

No behavioral changes, only documentation.

Updates tailscale/tailscale#18256
Replace wg.Add(2) + go func() { defer wg.Done(); ... }() with
wg.Go(func() { ... }). Available since Go 1.24.

Suggested-by: Ox Cart <ox@tailscale.com> (tailscale/tailscale#18331 review)

Updates tailscale/tailscale#18256
Replace the inline bidirectional copy in ForwardAgentConnections with
the shared bicopy function. This fixes a panic-prone hard cast to
*net.UnixConn (which would crash on any non-Unix connection type) and
adds context integration that was previously missing from agent
forwarding.

Updates tailscale/tailscale#18256
@kradalby kradalby force-pushed the kradalby/ssh-exit-status-ordering branch from cbe4b0e to 54dbde8 Compare April 16, 2026 09:18
Comment thread streamlocal.go Outdated
}
if opts.PathValidator != nil {
// PathValidator requires a valid SSH Context for session
// metadata; skip when ctx is nil (only in tests).
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why do we allow the nil context only for tests instead of just making the test construct one and pass it in?

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Are we testing it because we expect that it might happen in real-world usage and we want to allow it? Or is it just convenience for the tests, to reduce setup code?

Comment thread tcpip_test.go Outdated

select {
case <-done:
case <-time.After(5 * time.Second):
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit: 1s should be plenty, do we need 5s?

Copy link
Copy Markdown

@neinkeinkaffee neinkeinkaffee left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Given that we'll cover the exit ordering in an integration test in a follow-up in OSS, this looks good to me apart from one question mark I've got regarding the nil context safeguard!

Add tests using real TCP connections to verify bicopy correctness:

TestBicopyNormal: data flows correctly in both directions.

TestBicopyContextCancel: context cancellation force-closes both
connections and bicopy returns promptly.

TestBicopyHalfClosePropagation: when one side finishes sending, the
other side still receives all its data. This is the key property that
half-close provides over the old cancel-on-first-direction approach.

TestHalfCloseWriteNoOpPreservesConnection: verifies halfCloseWrite is
a safe no-op for types without CloseWrite support. The connection
remains readable, proving a c.Close() fallback would be harmful.

TestHalfCloseReadNoOpPreservesConnection: verifies halfCloseRead is
a safe no-op for types without CloseRead support. The connection
remains writable, proving a c.Close() fallback would be harmful.

Updates tailscale/tailscale#18256
The returned callback passed its Context parameter directly to
net.ListenConfig.Listen, which panics when ctx is nil (Go 1.21+
dereferences the context in net.ContextSockTrace). Use
context.Background as a fallback when ctx is nil.

This fixes TestNewReverseUnixForwardingCallbackBindUnlinkSkipsNonSocket
and TestNewReverseUnixForwardingCallbackSocketPermissions which both
pass nil as the Context.
@kradalby kradalby force-pushed the kradalby/ssh-exit-status-ordering branch from 54dbde8 to ba9c35c Compare April 29, 2026 11:33
@kradalby kradalby requested a review from neinkeinkaffee April 29, 2026 11:33
@kradalby kradalby merged commit 434cb5b into master Apr 29, 2026
4 checks passed
kradalby added a commit to tailscale/tailscale that referenced this pull request May 6, 2026
Fix a race where CloseWrite (SSH_MSG_CHANNEL_EOF) could be sent before
Exit (exit-status), causing SSH clients (especially on macOS) to miss
the exit status. The root cause was a select between outputDone and
ctx.Done that allowed Exit to be called while stdout was still copying,
or after CloseWrite had already been sent.

Replace the atomic.Bool/atomic.Int32/channel synchronization with
sync.WaitGroup and guarantee the wire ordering required by RFC 4254
section 6.10: exit-status -> EOF -> CHANNEL_CLOSE.

The new ordering in run():
1. cmd.Wait() returns with exit code
2. ss.Exit(exitCode) sends exit-status
3. closeAll(childPipes) signals io.Copy goroutines to finish
4. wg.Wait() waits for stdout goroutine which calls CloseWrite (EOF)
5. defer ss.Close() sends CHANNEL_CLOSE on function return

Additional changes:
- Send SIGHUP instead of SIGKILL on session termination
  (cite OpenSSH session.c:2246 for PTY master close, POSIX General
  Terminal Interface for terminal disconnect semantics)
- Use exit code 255 for SSH protocol/permission errors
  (cite OpenSSH ssh.c:1693 for the 255 convention)
- Use exit code 127 for command not found
  (cite POSIX Shell Command Language for the 127 convention)
- Use exit code 254 for recording failures
- Add isNotFoundOrExecutable helper

Bumps the github.com/tailscale/gliderssh dependency to pull in the
matching wire-ordering changes from tailscale/gliderssh#8, and
regenerates the corresponding nix vendor hashes via tool/updateflakes.

Based on #18331.

Updates #18256

Signed-off-by: Kristoffer Dalby <kristoffer@tailscale.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants