fix(lxc): lxc tty#283
Conversation
Wire lxc-attach's stdin/stdout/stderr to the parent process so the inner process shares the pty driving lxc-exec. Without this, stdin is closed and interactive shells exit immediately on EOF. Mirrors the AppContainer runner on Windows, which shares the parent's ConPTY with the sandboxed child. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com> Signed-off-by: Carlos Alexandro Becker <caarlos0@users.noreply.github.com>
`lxc-destroy -f` already force-stops a running container, but `destroy()` also called `lxc-stop` first. Plain `lxc-stop` waits up to 60 seconds for a graceful shutdown, which is fatal for distros that run systemd as PID 1 in unprivileged userns: init never cleanly responds to SIGPWR, so the wait times out before the force-stop step. Removing the prior stop drops cleanup from ~60s to ~0.2s on ubuntu and is a no-op on alpine (no init to wait on). Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com> Signed-off-by: Carlos Alexandro Becker <caarlos0@users.noreply.github.com>
The inherit-stdio approach from the previous commit got `lxc-attach` running but produced silent shells: bash booted, printed its prompt, and then ignored everything the host wrote. Two compounding problems: 1. **Pre-buffered input was discarded.** Bash's readline init calls `tcsetattr` on its stdin, which can flush bytes the parent buffered into the pty before bash got there. Anything the CLI pre-sent (e.g. a shell wrapper writing `PS1=""; PS2=""; unset HISTFILE`) was lost. 2. **`lxc-attach` bypassed our slave fds.** When it inherits a controlling terminal from the parent, it forwards the inner container pty's I/O straight to `/dev/tty` instead of using the stdio slots we set. Bash's prompt reached our screen but the master fd we kept stayed empty, so there was no way to forward host stdin into bash either. Fix: - Allocate an inner pty pair via `nix::pty::openpty`. Give the slave to `lxc-attach` (and thus bash) for stdin/stdout/stderr; keep the master. - In `pre_exec`: `setsid()` to drop the inherited controlling terminal, then `ioctl(TIOCSCTTY)` on fd 0 so `lxc-attach` adopts our slave as its new ctty. This is what makes the slave fds actually carry the data. - Output thread copies master to host stdout and signals "ready" on the first byte from inside the container, which marks bash as past its readline init and safe to feed. - Input thread waits for "ready" (capped at 5s so a wedged shell doesn't block forever, since the higher-level command timeout takes over) before forwarding host stdin to the master. Adds the `term` feature to the workspace `nix` dependency so `pty::openpty` is available. Validated end-to-end on Ubuntu Resolute (arm64, OrbStack) running the CLI's sandboxed shell tool; bash receives wrapper text, runs commands, and exits cleanly. Supersedes the inherit-stdio approach in a3f2a66. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com> Signed-off-by: Carlos Alexandro Becker <caarlos0@users.noreply.github.com>
The new pty-based attach_run uses pre_exec, openpty, TIOCSCTTY, and nix::unistd::setsid — none of which exist on Windows. Workspace clippy runs on windows-latest and was failing to compile lxc_common. Gate the Unix implementation behind cfg(unix) and provide a stub for non-Unix targets so the crate keeps compiling workspace-wide. lxc-exec itself is still Linux-only at runtime; the stub just lets the workspace build on Windows. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com> Signed-off-by: Carlos Alexandro Becker <caarlos0@users.noreply.github.com>
The codebase consistently gates platform-specific code with explicit target_os checks (target_os = "windows" / "linux" / "macos") rather than the broader unix/not(unix) family. cfg(unix) also covers macOS, where lxc isn't available either, so the explicit Linux gate is more accurate. Convert the existing current_euid gate (added in #275) and the new attach_run gate to match. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com> Signed-off-by: Carlos Alexandro Becker <caarlos0@users.noreply.github.com>
`LxcScriptRunner::run_internal` only calls `container.destroy()` after
`attach_run` returns. When the runner is terminated by a fatal signal
(its parent exited or sent a SIGHUP/SIGTERM/SIGINT), the in-flight
`attach_run` is interrupted and the destroy block is never reached,
leaving the container running indefinitely — `lxc-ls` lists it long
after the CLI session that spawned it is gone.
Install a `signal_cleanup` watchdog in `lxc-exec`'s `main` that:
- Blocks SIGHUP/SIGTERM/SIGINT in the calling thread before any other
thread is spawned, so all threads inherit the blocked mask.
- Spawns a dedicated thread that synchronously `sigwait`s for any of
those signals, destroys the currently active container, and exits
with `128 + signo` so the parent observes a normal signal-style
exit.
`LxcScriptRunner::run_internal` registers the active container name with
the watchdog right after resolving it, so the cleanup window covers
create/start/attach as well as the normal post-attach destroy path.
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Signed-off-by: Carlos Alexandro Becker <caarlos0@users.noreply.github.com>
The new signal_cleanup module pulls in nix::sys::signal::{SigSet,
Signal} and nix::Result, which are POSIX-only and don't compile on
Windows. The wxc-exec-lint CI job runs cargo clippy --workspace on
windows-latest, so lxc_common has to compile workspace-wide.
Gate the nix imports, the watchdog thread, and the install() body
behind cfg(target_os = "linux"); provide a no-op install() stub for
non-Linux that returns Ok(()). Signal-driven cleanup only matters at
runtime where lxc-exec actually runs, which is Linux. set_active stays
cross-platform — it's a plain Mutex<Option<String>> with no syscall
dependencies — so callers don't need their own cfg gates.
Also switches the install() return type from nix::Result<()> to
Result<(), String> so the signature exists on every target. The single
caller in lxc/src/main.rs already formats the error with Display.
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Signed-off-by: Carlos Alexandro Becker <caarlos0@users.noreply.github.com>
Pre-existing formatting drift from commit 49fbbb9 (the inner-pty attach_run rewrite) — surfaced now that clippy passes and the lint workflow reaches its 'Fail if formatting check failed' step. No behavior change. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com> Signed-off-by: Carlos Alexandro Becker <caarlos0@users.noreply.github.com>
Signed-off-by: Carlos Alexandro Becker <caarlos0@users.noreply.github.com>
|
@bbonaby thanks for the review, updated! |
The INSTALLED guard added in the review commit was placed at module
scope without a cfg gate, but its only reader — install() — is
cfg(target_os = "linux"). On Windows clippy that makes INSTALLED
dead code:
error: static `INSTALLED` is never used
--> lxc_common\src\signal_cleanup.rs:29:8
Gate the static behind cfg(target_os = "linux") to match install().
ACTIVE_CONTAINER stays cross-platform — set_active is callable on every
target.
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Signed-off-by: Carlos Alexandro Becker <caarlos0@users.noreply.github.com>
|
failing tests are on main as well - ignoring. |
|
@Caarlos, I merged a PR that should fix those errors you were getting. If you merge with main and fix the conflict we should see all green after |
Signed-off-by: Carlos Alexandro Becker <caarlos0@users.noreply.github.com>
|
@bbonaby done! thank you |
Signed-off-by: Carlos Alexandro Becker <caarlos0@users.noreply.github.com>
There was a problem hiding this comment.
Pull request overview
This PR updates the Linux LXC backend to better support interactive PTY-driven shell sessions and to clean up active containers when lxc-exec is terminated by common fatal signals.
Changes:
- Adds signal-based cleanup tracking for the active LXC container.
- Reworks
lxc-attachexecution to bridge through an inner PTY with controlling-terminal setup. - Simplifies LXC destruction via
lxc-destroy -fand enables thenixtermfeature.
Show a summary per file
| File | Description |
|---|---|
src/lxc/src/main.rs |
Installs LXC signal cleanup early in lxc-exec startup. |
src/lxc_common/src/signal_cleanup.rs |
Adds active-container tracking and a signal watchdog thread. |
src/lxc_common/src/lxc_runner.rs |
Registers the active container name for signal cleanup. |
src/lxc_common/src/lxc_bindings.rs |
Adds PTY-backed attach_run, adjusts LXC path cfg handling, and simplifies destroy. |
src/lxc_common/src/lib.rs |
Exports the new signal cleanup module. |
src/Cargo.toml |
Enables the nix term feature for PTY support. |
Copilot's findings
- Files reviewed: 6/6 changed files
- Comments generated: 5
| // Cap the wait so a wedged container doesn't block forever; if the | ||
| // shell never produces output the caller's higher-level timeout | ||
| // catches it. |
The signal mask installed by signal_cleanup::install() blocks SIGHUP, SIGTERM, and SIGINT in the runner so the watchdog thread can sigwait on them. That mask is inherited across fork+exec, so without an explicit reset the inner shell launched by lxc-attach inherits the blocked set and silently ignores Ctrl-C and termination signals. Restore the default mask in the existing pre_exec closure so the inner process behaves like any other interactive shell. Also tighten the comment on the 5-second readiness gate: it only caps how long the stdin forwarder waits before kicking in, it is not a substitute for the (still unimplemented) script_timeout. The original wording overstated the protection it provides. Addresses Copilot review feedback on PR #283. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com> Signed-off-by: Carlos Alexandro Becker <caarlos0@users.noreply.github.com>
…xit is true LxcScriptRunner unconditionally called signal_cleanup::set_active so the watchdog thread would lxc-destroy the container on a fatal signal. That is the right behavior for the default case, but with `lifecycle.destroyOnExit = false` the normal completion path deliberately preserves the container. A SIGTERM during such a run would silently delete a container the caller asked to keep. Gate the registration on self.destroy_on_exit so the signal path mirrors the normal-completion behavior. Addresses Copilot review feedback on PR #283. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com> Signed-off-by: Carlos Alexandro Becker <caarlos0@users.noreply.github.com>
Previously, install() called thread_block() to mask SIGHUP/SIGTERM/SIGINT before spawning the watchdog thread. If the spawn failed, the function returned an error but the signals stayed blocked. main() only logged a warning and continued, leaving the process in a state where those termination signals were silently ignored — exactly the failure mode the watchdog exists to handle. Two fixes: 1. signal_cleanup::install() now restores the original mask via thread_unblock() when the watchdog spawn fails, so callers see a clean error rather than a permanently broken signal disposition. It also defers the INSTALLED flag until the watchdog is actually running, so a retry after a transient spawn failure works. 2. lxc-exec's main() now exits with status 1 on install() failure instead of warning and continuing. Containers that leak on Ctrl-C are exactly what this code prevents; we should not run without it. Addresses Copilot review feedback on PR #283. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com> Signed-off-by: Carlos Alexandro Becker <caarlos0@users.noreply.github.com>
On a fatal signal the watchdog destroyed the active container but bypassed the network-policy cleanup that the normal completion path runs. Any iptables MXC-* chain and FORWARD hook the runner installed for that container leaked, breaking later runs that reused the same container name (or just slowly accumulating dead chains). Track the host-side veth interface alongside the container name in the watchdog's active-sandbox slot, register it from LxcScriptRunner::run_internal right after veth discovery, and have the watchdog tear down iptables before destroying the container so the FORWARD hook is removed while the veth name still exists. Adds NetworkIptablesManager::force_cleanup() so the watchdog can trigger the same teardown path without owning the original manager instance. Addresses Copilot review feedback on PR #283. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com> Signed-off-by: Carlos Alexandro Becker <caarlos0@users.noreply.github.com>
|
Addressed the latest review pass:
|
|
cool, FWIW seems like PTY is not working on macos seatbelt as well, working on a fix 🙏🏻 |
|
Thanks for the catch! I created this issue: https://github.com/microsoft/mxc/issues/311 feel free to update it however you wish with more details if you want. CC: @richiemsft, not sure if #288 will fix it. |
📖 Description
Make the LXC backend usable end-to-end for an interactive sandboxed shell driven over a host pty (the model the GitHub Copilot CLI uses on Linux for its
bashtool). Four stacked fixes — each was necessary, none was sufficient on its own.1.
attach_run: inherit parent stdio (a3f2a66)Wires
lxc-attach's stdin/stdout/stderr to the parent process so the inner shell shares whatever driveslxc-exec. Without this, the child gets a closed stdin and exits immediately on EOF — interactive shells can't survive their first read. Mirrors the AppContainer runner's pty sharing on Windows.2.
destroy(): skip the redundantlxc-stop(562cd5a)lxc-destroy -falready force-stops a running container. We were also callinglxc-stopfirst, which waits up to 60 s for graceful shutdown. On distros that run systemd as PID 1 in unprivileged userns (anything ubuntu-class), init never cleanly responds to SIGPWR — the wait always times out before the force-stop step. Cleanup drops from ~60 s to ~0.2 s; alpine is unaffected.3.
attach_run: bridge through an inner pty with a detached ctty (49fbbb9)The inherit-stdio fix from #1 got bash booting and printing its prompt, but the shell then ignored everything the host wrote. Two compounding root causes:
tcsetattron its stdin, which can flush bytes the parent buffered into the pty before bash got there. Anything the CLI pre-sent (e.g. its shell-init wrapper) was lost.lxc-attachbypassed our slave fds. When it inherits a controlling terminal from the parent, it forwards the inner container pty's I/O straight to/dev/ttyinstead of using the stdio slots. Bash's prompt reached our screen but the master fd we kept stayed empty, so there was no way to forward host stdin into bash either.Fix:
nix::pty::openpty. Give the slave tolxc-attach(and thus bash) for stdin/stdout/stderr; keep the master in the parent.pre_exec:setsid()to drop the inherited controlling terminal, thenioctl(TIOCSCTTY)on fd 0 solxc-attachadopts our slave as its new ctty. This is what makes the slave fds actually carry the data.Adds the
termfeature to the workspacenixdependency sopty::openptyis available.4.
lxc-exec: destroy the active container on fatal signals (f146e9f)LxcScriptRunner::run_internalonly callscontainer.destroy()afterattach_runreturns. When the runner is killed by a fatal signal — its parent exited, or it received SIGHUP/SIGTERM/SIGINT — the in-flightattach_runis interrupted and the destroy block is never reached, leaving the container running indefinitely (lxc-lslists it long after the CLI session that spawned it is gone).Install a
signal_cleanupwatchdog inlxc-exec'smain:sigwaits for any of those signals, destroys the currently active container, and exits with128 + signoso the parent sees a normal signal-style exit.LxcScriptRunner::run_internalregisters the active container name with the watchdog right after resolving it, so the cleanup window covers create/start/attach as well as the normal post-attach destroy path.🔗 References
Surfaced while wiring up Linux sandbox support in the GitHub Copilot CLI runtime. Tested against the
createConfigFromPolicymigration in copilot-agent-runtime#7767.🔍 Validation
cargo test -p lxc_common --lib— 25/25 unit tests green on host (macOS arm64) and target (linux/aarch64-unknown-linux-gnu).cargo check -p lxc_common --target aarch64-unknown-linux-gnuclean.bashtool: container creates, bash boots, the CLI's init wrapper executes, commands run, output streams back, container destroys in <1 s. Repeated successfully across 10+ invocations with noABORTINGstart failures or 60 s teardown waits.lxc-execmid-run: container is destroyed before the process exits,lxc-lsshows no orphaned containers across repeated runs.✅ Checklist
📋 Issue Type
refs https://github.com/github/copilot-agent-runtime/issues/164
Microsoft Reviewers: Open in CodeFlow