Skip to content

fix(lxc): lxc tty#283

Merged
bbonaby merged 21 commits into
microsoft:mainfrom
caarlos0:fix/lxc-stdin
May 14, 2026
Merged

fix(lxc): lxc tty#283
bbonaby merged 21 commits into
microsoft:mainfrom
caarlos0:fix/lxc-stdin

Conversation

@caarlos0

@caarlos0 caarlos0 commented May 12, 2026

Copy link
Copy Markdown
Contributor

📖 Description

Make the LXC backend usable end-to-end for an interactive sandboxed shell driven over a host pty (the model the GitHub Copilot CLI uses on Linux for its bash tool). Four stacked fixes — each was necessary, none was sufficient on its own.

1. attach_run: inherit parent stdio (a3f2a66)

Wires lxc-attach's stdin/stdout/stderr to the parent process so the inner shell shares whatever drives lxc-exec. Without this, the child gets a closed stdin and exits immediately on EOF — interactive shells can't survive their first read. Mirrors the AppContainer runner's pty sharing on Windows.

2. destroy(): skip the redundant lxc-stop (562cd5a)

lxc-destroy -f already force-stops a running container. We were also calling lxc-stop first, which waits up to 60 s for graceful shutdown. On distros that run systemd as PID 1 in unprivileged userns (anything ubuntu-class), init never cleanly responds to SIGPWR — the wait always times out before the force-stop step. Cleanup drops from ~60 s to ~0.2 s; alpine is unaffected.

3. attach_run: bridge through an inner pty with a detached ctty (49fbbb9)

The inherit-stdio fix from #1 got bash booting and printing its prompt, but the shell then ignored everything the host wrote. Two compounding root causes:

  • Pre-buffered input was discarded. Bash's readline init calls tcsetattr on its stdin, which can flush bytes the parent buffered into the pty before bash got there. Anything the CLI pre-sent (e.g. its shell-init wrapper) was lost.
  • lxc-attach bypassed our slave fds. When it inherits a controlling terminal from the parent, it forwards the inner container pty's I/O straight to /dev/tty instead of using the stdio slots. Bash's prompt reached our screen but the master fd we kept stayed empty, so there was no way to forward host stdin into bash either.

Fix:

  • Allocate an inner pty pair via nix::pty::openpty. Give the slave to lxc-attach (and thus bash) for stdin/stdout/stderr; keep the master in the parent.
  • In pre_exec: setsid() to drop the inherited controlling terminal, then ioctl(TIOCSCTTY) on fd 0 so lxc-attach adopts our slave as its new ctty. This is what makes the slave fds actually carry the data.
  • Output thread copies master → host stdout, signaling "ready" on the first byte from inside the container (bash is past readline init at that point and safe to feed).
  • Input thread waits for "ready" (capped at 5 s; the higher-level command timeout takes over from there) before forwarding host stdin → master.

Adds the term feature to the workspace nix dependency so pty::openpty is available.

4. lxc-exec: destroy the active container on fatal signals (f146e9f)

LxcScriptRunner::run_internal only calls container.destroy() after attach_run returns. When the runner is killed by a fatal signal — its parent exited, or it received SIGHUP/SIGTERM/SIGINT — the in-flight attach_run is interrupted and the destroy block is never reached, leaving the container running indefinitely (lxc-ls lists it long after the CLI session that spawned it is gone).

Install a signal_cleanup watchdog in lxc-exec's main:

  • Block SIGHUP/SIGTERM/SIGINT in the calling thread before any other thread is spawned, so all later threads inherit the blocked mask.
  • A dedicated thread sigwaits for any of those signals, destroys the currently active container, and exits with 128 + signo so the parent sees a normal signal-style exit.

LxcScriptRunner::run_internal registers the active container name with the watchdog right after resolving it, so the cleanup window covers create/start/attach as well as the normal post-attach destroy path.

🔗 References

Surfaced while wiring up Linux sandbox support in the GitHub Copilot CLI runtime. Tested against the createConfigFromPolicy migration in copilot-agent-runtime#7767.

🔍 Validation

  • cargo test -p lxc_common --lib — 25/25 unit tests green on host (macOS arm64) and target (linux/aarch64-unknown-linux-gnu).
  • cargo check -p lxc_common --target aarch64-unknown-linux-gnu clean.
  • End-to-end manual test on Ubuntu Resolute (arm64, OrbStack VM) running the Copilot CLI's sandboxed bash tool: container creates, bash boots, the CLI's init wrapper executes, commands run, output streams back, container destroys in <1 s. Repeated successfully across 10+ invocations with no ABORTING start failures or 60 s teardown waits.
  • Signal cleanup verified by SIGTERM/SIGINT'ing lxc-exec mid-run: container is destroyed before the process exits, lxc-ls shows no orphaned containers across repeated runs.

✅ Checklist

📋 Issue Type

  • Bug fix
  • Feature
  • Task

refs https://github.com/github/copilot-agent-runtime/issues/164

Microsoft Reviewers: Open in CodeFlow

caarlos0 and others added 3 commits May 12, 2026 10:58
Wire lxc-attach's stdin/stdout/stderr to the parent process so the inner
process shares the pty driving lxc-exec. Without this, stdin is closed
and interactive shells exit immediately on EOF.

Mirrors the AppContainer runner on Windows, which shares the parent's
ConPTY with the sandboxed child.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Signed-off-by: Carlos Alexandro Becker <caarlos0@users.noreply.github.com>
`lxc-destroy -f` already force-stops a running container, but `destroy()`
also called `lxc-stop` first. Plain `lxc-stop` waits up to 60 seconds for
a graceful shutdown, which is fatal for distros that run systemd as PID 1
in unprivileged userns: init never cleanly responds to SIGPWR, so the wait
times out before the force-stop step.

Removing the prior stop drops cleanup from ~60s to ~0.2s on ubuntu and is
a no-op on alpine (no init to wait on).

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Signed-off-by: Carlos Alexandro Becker <caarlos0@users.noreply.github.com>
The inherit-stdio approach from the previous commit got `lxc-attach` running
but produced silent shells: bash booted, printed its prompt, and then ignored
everything the host wrote. Two compounding problems:

1. **Pre-buffered input was discarded.** Bash's readline init calls
   `tcsetattr` on its stdin, which can flush bytes the parent buffered into
   the pty before bash got there. Anything the CLI pre-sent (e.g. a shell
   wrapper writing `PS1=""; PS2=""; unset HISTFILE`) was lost.
2. **`lxc-attach` bypassed our slave fds.** When it inherits a controlling
   terminal from the parent, it forwards the inner container pty's I/O
   straight to `/dev/tty` instead of using the stdio slots we set. Bash's
   prompt reached our screen but the master fd we kept stayed empty, so
   there was no way to forward host stdin into bash either.

Fix:

- Allocate an inner pty pair via `nix::pty::openpty`. Give the slave to
  `lxc-attach` (and thus bash) for stdin/stdout/stderr; keep the master.
- In `pre_exec`: `setsid()` to drop the inherited controlling terminal,
  then `ioctl(TIOCSCTTY)` on fd 0 so `lxc-attach` adopts our slave as its
  new ctty. This is what makes the slave fds actually carry the data.
- Output thread copies master to host stdout and signals "ready" on the
  first byte from inside the container, which marks bash as past its
  readline init and safe to feed.
- Input thread waits for "ready" (capped at 5s so a wedged shell doesn't
  block forever, since the higher-level command timeout takes over) before
  forwarding host stdin to the master.

Adds the `term` feature to the workspace `nix` dependency so `pty::openpty`
is available.

Validated end-to-end on Ubuntu Resolute (arm64, OrbStack) running the CLI's
sandboxed shell tool; bash receives wrapper text, runs commands, and exits
cleanly.

Supersedes the inherit-stdio approach in a3f2a66.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Signed-off-by: Carlos Alexandro Becker <caarlos0@users.noreply.github.com>
@caarlos0 caarlos0 changed the title Fix/lxc stdin fix(lxc): lxc tty May 12, 2026
caarlos0 and others added 5 commits May 12, 2026 14:14
The new pty-based attach_run uses pre_exec, openpty, TIOCSCTTY, and
nix::unistd::setsid — none of which exist on Windows. Workspace clippy
runs on windows-latest and was failing to compile lxc_common.

Gate the Unix implementation behind cfg(unix) and provide a stub for
non-Unix targets so the crate keeps compiling workspace-wide. lxc-exec
itself is still Linux-only at runtime; the stub just lets the workspace
build on Windows.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Signed-off-by: Carlos Alexandro Becker <caarlos0@users.noreply.github.com>
The codebase consistently gates platform-specific code with explicit
target_os checks (target_os = "windows" / "linux" / "macos") rather
than the broader unix/not(unix) family. cfg(unix) also covers macOS,
where lxc isn't available either, so the explicit Linux gate is more
accurate. Convert the existing current_euid gate (added in #275) and
the new attach_run gate to match.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Signed-off-by: Carlos Alexandro Becker <caarlos0@users.noreply.github.com>
`LxcScriptRunner::run_internal` only calls `container.destroy()` after
`attach_run` returns. When the runner is terminated by a fatal signal
(its parent exited or sent a SIGHUP/SIGTERM/SIGINT), the in-flight
`attach_run` is interrupted and the destroy block is never reached,
leaving the container running indefinitely — `lxc-ls` lists it long
after the CLI session that spawned it is gone.

Install a `signal_cleanup` watchdog in `lxc-exec`'s `main` that:

  - Blocks SIGHUP/SIGTERM/SIGINT in the calling thread before any other
    thread is spawned, so all threads inherit the blocked mask.
  - Spawns a dedicated thread that synchronously `sigwait`s for any of
    those signals, destroys the currently active container, and exits
    with `128 + signo` so the parent observes a normal signal-style
    exit.

`LxcScriptRunner::run_internal` registers the active container name with
the watchdog right after resolving it, so the cleanup window covers
create/start/attach as well as the normal post-attach destroy path.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Signed-off-by: Carlos Alexandro Becker <caarlos0@users.noreply.github.com>
The new signal_cleanup module pulls in nix::sys::signal::{SigSet,
Signal} and nix::Result, which are POSIX-only and don't compile on
Windows. The wxc-exec-lint CI job runs cargo clippy --workspace on
windows-latest, so lxc_common has to compile workspace-wide.

Gate the nix imports, the watchdog thread, and the install() body
behind cfg(target_os = "linux"); provide a no-op install() stub for
non-Linux that returns Ok(()). Signal-driven cleanup only matters at
runtime where lxc-exec actually runs, which is Linux. set_active stays
cross-platform — it's a plain Mutex<Option<String>> with no syscall
dependencies — so callers don't need their own cfg gates.

Also switches the install() return type from nix::Result<()> to
Result<(), String> so the signature exists on every target. The single
caller in lxc/src/main.rs already formats the error with Display.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Signed-off-by: Carlos Alexandro Becker <caarlos0@users.noreply.github.com>
Pre-existing formatting drift from commit 49fbbb9 (the inner-pty
attach_run rewrite) — surfaced now that clippy passes and the lint
workflow reaches its 'Fail if formatting check failed' step.

No behavior change.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Signed-off-by: Carlos Alexandro Becker <caarlos0@users.noreply.github.com>
@asklar asklar requested a review from bbonaby May 12, 2026 18:29
Comment thread src/lxc_common/src/signal_cleanup.rs
Comment thread src/lxc_common/src/signal_cleanup.rs Outdated
Signed-off-by: Carlos Alexandro Becker <caarlos0@users.noreply.github.com>
@caarlos0

Copy link
Copy Markdown
Contributor Author

@bbonaby thanks for the review, updated!

caarlos0 and others added 3 commits May 13, 2026 11:27
The INSTALLED guard added in the review commit was placed at module
scope without a cfg gate, but its only reader — install() — is
cfg(target_os = "linux"). On Windows clippy that makes INSTALLED
dead code:

    error: static `INSTALLED` is never used
      --> lxc_common\src\signal_cleanup.rs:29:8

Gate the static behind cfg(target_os = "linux") to match install().
ACTIVE_CONTAINER stays cross-platform — set_active is callable on every
target.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Signed-off-by: Carlos Alexandro Becker <caarlos0@users.noreply.github.com>
@caarlos0

Copy link
Copy Markdown
Contributor Author

failing tests are on main as well - ignoring.

@bbonaby

bbonaby commented May 13, 2026

Copy link
Copy Markdown
Collaborator

@Caarlos, I merged a PR that should fix those errors you were getting. If you merge with main and fix the conflict we should see all green after

caarlos0 added 2 commits May 13, 2026 14:10
Signed-off-by: Carlos Alexandro Becker <caarlos0@users.noreply.github.com>
@caarlos0

Copy link
Copy Markdown
Contributor Author

@bbonaby done! thank you

Comment thread src/lxc_common/src/lxc_bindings.rs
Signed-off-by: Carlos Alexandro Becker <caarlos0@users.noreply.github.com>
Copilot AI review requested due to automatic review settings May 14, 2026 13:47

Copilot AI left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR updates the Linux LXC backend to better support interactive PTY-driven shell sessions and to clean up active containers when lxc-exec is terminated by common fatal signals.

Changes:

  • Adds signal-based cleanup tracking for the active LXC container.
  • Reworks lxc-attach execution to bridge through an inner PTY with controlling-terminal setup.
  • Simplifies LXC destruction via lxc-destroy -f and enables the nix term feature.
Show a summary per file
File Description
src/lxc/src/main.rs Installs LXC signal cleanup early in lxc-exec startup.
src/lxc_common/src/signal_cleanup.rs Adds active-container tracking and a signal watchdog thread.
src/lxc_common/src/lxc_runner.rs Registers the active container name for signal cleanup.
src/lxc_common/src/lxc_bindings.rs Adds PTY-backed attach_run, adjusts LXC path cfg handling, and simplifies destroy.
src/lxc_common/src/lib.rs Exports the new signal cleanup module.
src/Cargo.toml Enables the nix term feature for PTY support.

Copilot's findings

  • Files reviewed: 6/6 changed files
  • Comments generated: 5

Comment thread src/lxc_common/src/lxc_bindings.rs
Comment thread src/lxc_common/src/signal_cleanup.rs Outdated
Comment thread src/lxc_common/src/lxc_runner.rs Outdated
Comment thread src/lxc_common/src/lxc_bindings.rs Outdated
Comment on lines +331 to +333
// Cap the wait so a wedged container doesn't block forever; if the
// shell never produces output the caller's higher-level timeout
// catches it.
Comment thread src/lxc_common/src/signal_cleanup.rs Outdated
caarlos0 and others added 2 commits May 14, 2026 11:40
The signal mask installed by signal_cleanup::install() blocks SIGHUP,
SIGTERM, and SIGINT in the runner so the watchdog thread can sigwait
on them. That mask is inherited across fork+exec, so without an explicit
reset the inner shell launched by lxc-attach inherits the blocked set
and silently ignores Ctrl-C and termination signals.

Restore the default mask in the existing pre_exec closure so the inner
process behaves like any other interactive shell.

Also tighten the comment on the 5-second readiness gate: it only caps
how long the stdin forwarder waits before kicking in, it is not a
substitute for the (still unimplemented) script_timeout. The original
wording overstated the protection it provides.

Addresses Copilot review feedback on PR #283.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Signed-off-by: Carlos Alexandro Becker <caarlos0@users.noreply.github.com>
…xit is true

LxcScriptRunner unconditionally called signal_cleanup::set_active so
the watchdog thread would lxc-destroy the container on a fatal signal.
That is the right behavior for the default case, but with
`lifecycle.destroyOnExit = false` the normal completion path
deliberately preserves the container. A SIGTERM during such a run would
silently delete a container the caller asked to keep.

Gate the registration on self.destroy_on_exit so the signal path mirrors
the normal-completion behavior.

Addresses Copilot review feedback on PR #283.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Signed-off-by: Carlos Alexandro Becker <caarlos0@users.noreply.github.com>
caarlos0 and others added 2 commits May 14, 2026 11:41
Previously, install() called thread_block() to mask SIGHUP/SIGTERM/SIGINT
before spawning the watchdog thread. If the spawn failed, the function
returned an error but the signals stayed blocked. main() only logged a
warning and continued, leaving the process in a state where those
termination signals were silently ignored — exactly the failure mode
the watchdog exists to handle.

Two fixes:

1. signal_cleanup::install() now restores the original mask via
   thread_unblock() when the watchdog spawn fails, so callers see a
   clean error rather than a permanently broken signal disposition.
   It also defers the INSTALLED flag until the watchdog is actually
   running, so a retry after a transient spawn failure works.

2. lxc-exec's main() now exits with status 1 on install() failure
   instead of warning and continuing. Containers that leak on Ctrl-C
   are exactly what this code prevents; we should not run without it.

Addresses Copilot review feedback on PR #283.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Signed-off-by: Carlos Alexandro Becker <caarlos0@users.noreply.github.com>
On a fatal signal the watchdog destroyed the active container but
bypassed the network-policy cleanup that the normal completion path
runs. Any iptables MXC-* chain and FORWARD hook the runner installed
for that container leaked, breaking later runs that reused the same
container name (or just slowly accumulating dead chains).

Track the host-side veth interface alongside the container name in
the watchdog's active-sandbox slot, register it from
LxcScriptRunner::run_internal right after veth discovery, and have
the watchdog tear down iptables before destroying the container so
the FORWARD hook is removed while the veth name still exists.

Adds NetworkIptablesManager::force_cleanup() so the watchdog can
trigger the same teardown path without owning the original manager
instance.

Addresses Copilot review feedback on PR #283.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Signed-off-by: Carlos Alexandro Becker <caarlos0@users.noreply.github.com>
@caarlos0

Copy link
Copy Markdown
Contributor Author

Addressed the latest review pass:

  • Signals stay blocked in the inner shell (lxc_bindings.rs:276) → 13d644d. pre_exec now restores the default mask before exec so Ctrl-C / termination work inside the attached shell.
  • set_active runs even when destroyOnExit=false (lxc_runner.rs:103) → d752012. Registration is gated on self.destroy_on_exit so a SIGTERM no longer deletes a container the caller asked to keep.
  • Failed watchdog spawn leaves signals blocked (signal_cleanup.rs:69) → 1a9d6a4. install() calls thread_unblock() on spawn failure, defers INSTALLED until the watchdog is actually running, and lxc-exec's main now exits 1 on install failure instead of warning and continuing.
  • iptables chain/FORWARD hook leaks on fatal signal (signal_cleanup.rs:91) → 7783829. Watchdog now tracks the host-side veth alongside the container name and tears down iptables before destroying the container; new NetworkIptablesManager::force_cleanup does the recovery without owning the original manager.
  • Comment overstated timeout protection (lxc_bindings.rs:333) → folded into 13d644d. Comment now says the 5 s gate only delays the stdin forwarder; real timeout enforcement is still tracked by the existing TODO in lxc_runner.

cargo fmt, cargo clippy --target aarch64-unknown-linux-gnu -- -D warnings, and cargo test -p lxc_common (25 passed) all green.

@bbonaby bbonaby left a comment

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks!

@bbonaby bbonaby merged commit 9cf07a8 into microsoft:main May 14, 2026
15 checks passed
@bbonaby

bbonaby commented May 14, 2026

Copy link
Copy Markdown
Collaborator

We have a few more PRs we probably want to get in before we spin up another release. #284, #288, and #308. potentially #306 and #307 as well. Will keep you posted.

@caarlos0 caarlos0 deleted the fix/lxc-stdin branch May 14, 2026 15:33
@caarlos0

Copy link
Copy Markdown
Contributor Author

cool, FWIW seems like PTY is not working on macos seatbelt as well, working on a fix 🙏🏻

@bbonaby

bbonaby commented May 14, 2026

Copy link
Copy Markdown
Collaborator

Thanks for the catch! I created this issue: https://github.com/microsoft/mxc/issues/311 feel free to update it however you wish with more details if you want. CC: @richiemsft, not sure if #288 will fix it.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants