fix(distroless): router self-probe sweep + prompt-shields default + kind staleness + e2e gate (v0.1.13)#449

Merged: pallakatos merged 2 commits into main from fix/distroless-router-probe-sweep on Jun 24, 2026
@pallakatos

Collaborator

Summary

Completes the distroless-move cleanup (#383) across AKS, local-k8s, and docker. The AL3 distroless images dropped sh/curl/iptables, so every kubectl exec of those tools into the distroless inference-router container broke on the real distroless path: the operator AGT/metrics panels and kars egress|policy|model|add|handoff. (The egress-guard init and controller probes were earlier instances of the same problem and were already fixed.)

The unifying fix

New kars-inference-router probe [GET|POST] <path> [json-body] subcommand: it hits the router's own 127.0.0.1:8443, reads the admin token internally (sent as Authorization: Bearer), and is present in both the distroless router image and the sandbox image. Every CLI/operator call that previously exec'd curl/sh/wget/cat into the distroless container now uses it. No tools were added to any hardened image. Docker-mode execs (tool-rich sandbox container) are unchanged.
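As an illustration of the call shape, here is a minimal sketch of how a CLI helper might assemble the kubectl exec argv for the probe. The function name `buildProbeExecArgs`, its parameters, and the `/healthz` path in the usage note are hypothetical and not taken from this PR; only the `-c inference-router` container name and the `kars-inference-router probe [GET|POST] <path> [json-body]` shape come from the description above.

```typescript
// Hypothetical helper (not the PR's actual code): builds the argv for
// `kubectl exec` so the probe runs inside the inference-router container.
type ProbeMethod = "GET" | "POST";

function buildProbeExecArgs(
  namespace: string,
  pod: string,
  method: ProbeMethod,
  path: string,
  jsonBody?: string,
): string[] {
  if (!path.startsWith("/")) {
    throw new Error(`probe path must start with "/": ${path}`);
  }
  if (jsonBody !== undefined) {
    // Fail early on the client side; JSON.parse throws on malformed input.
    JSON.parse(jsonBody);
  }
  const args = [
    "exec", "-n", namespace, pod,
    "-c", "inference-router",
    "--",
    "kars-inference-router", "probe", method, path,
  ];
  if (jsonBody !== undefined) {
    args.push(jsonBody);
  }
  return args;
}
```

For example, `buildProbeExecArgs("ns1", "pod-0", "GET", "/healthz")` yields the arguments for `kubectl exec -n ns1 pod-0 -c inference-router -- kars-inference-router probe GET /healthz`. Returning an argv array rather than a command string means the JSON body needs no shell quoting when handed to a process-spawn API.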

Also

  • prompt-shields default off + --require-prompt-shields opt-in (bare Foundry emits no prompt_filter_results → prior default blocked every response).
  • kind staleness: loadImageIntoKind verifies the node image by ID and re-imports on mismatch — kars dev --release now actually loads the distroless images instead of a stale :dev (the bug that masked all of this locally).
  • e2e gate: test_sandbox_pod_starts asserts that the sandbox pod passes the egress-guard init AND that the router self-probe works. The e2e previously checked only ns/NetworkPolicy/SA and never the pod, which is exactly why this class of bug shipped uncaught.
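The kind staleness check above can be sketched roughly as follows. This is not the actual loadImageIntoKind code: the `ImageIds` shape and both function names are hypothetical, and the sketch assumes the host runtime and the kind node report comparable image IDs, with or without a `sha256:` prefix.

```typescript
// Hypothetical shape: the image ID seen on the host and on the kind node.
interface ImageIds {
  hostId: string;        // ID of the freshly built image on the host
  nodeId: string | null; // ID of the same tag on the node, null if absent
}

// Strip whitespace, case differences, and an optional "sha256:" prefix so
// that IDs printed by different tools compare equal.
function normalizeId(id: string): string {
  return id.trim().toLowerCase().replace(/^sha256:/, "");
}

// True when the node must (re-)import the image: either the tag is missing
// on the node, or it resolves to a different image than the host build.
function needsReimport({ hostId, nodeId }: ImageIds): boolean {
  if (nodeId === null) {
    return true;
  }
  return normalizeId(hostId) !== normalizeId(nodeId);
}
```

Comparing by ID rather than by tag is the point of the fix: a tag such as :dev can exist on the node while still pointing at an older build.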

Verification

cargo build --release + probe smoke-tested; CLI tsc + lint (0 errors) + 821 tests (incl. new refs.test.ts); bash -n e2e clean. End-to-end distroless validation runs via the new e2e gate (Linux CI) and kars dev --release on a fresh kind.

Security audit: docs/internal/security-audits/2026-06-24-distroless-router-probe-sweep.md (2 sign-offs). No runtime security control weakened.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

…ind staleness + e2e gate

The AL3 distroless move (#383) removed sh/curl/iptables from the controller/
inference-router/a2a/conformance images. Everything that exec'd those tools into
the distroless inference-router broke on the real distroless path (AKS + kind
--release): operator AGT/metrics panels, kars egress/policy/model/add/handoff.

Fixes:
- inference-router: new 'kars-inference-router probe [GET|POST] <path> [body]'
  subcommand — hits its own localhost:8443, reads the admin token internally
  (Authorization: Bearer), present in distroless router AND sandbox image.
- CLI: replace every kubectl-exec curl/sh/wget/cat into -c inference-router with
  the probe binary (operator x8, egress, policy, model, add, handoff). Docker-mode
  execs (tool-rich sandbox container) unchanged.
- prompt-shields default OFF + --require-prompt-shields opt-in (bare Foundry
  emits no prompt_filter_results -> prior default blocked every response).
- dev/local-k8s: loadImageIntoKind verifies node image by ID + re-imports on
  mismatch — 'kars dev --release' now loads the real distroless images instead of
  a stale :dev (the bug that masked all of this locally).
- e2e: test_sandbox_pod_starts asserts the sandbox POD passes the egress-guard
  init AND the router self-probe works — the regression gate that was missing.

docker mode clean (single tool-rich container, router in-process).

Verified: cargo build --release + probe smoke-test; CLI tsc + lint(0) + 821 tests;
bash -n e2e. Security audit: docs/internal/security-audits/2026-06-24-distroless-router-probe-sweep.md

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
@github-actions


Dependency Review

✅ No vulnerabilities or license issues or OpenSSF Scorecard issues found.

Scanned Files

None

…or distroless gate

The new test_sandbox_pod_starts gate caught a real gap: the e2e never loaded
a sandbox image, so the egress-guard init (now on ctx.sandbox_image) hit
ErrImagePull. Two fixes:

- controller: egress-guard init container now sets imagePullPolicy: pull_policy
  (same as the agent container — both run the sandbox image). Neutral on AKS
  (Always for :latest), correct on kind (IfNotPresent) so a loaded sandbox
  image is authoritative instead of force-pulling ACR.
- e2e: tests/e2e/Dockerfile.sandbox-stub (azurelinux+iptables, same base/backend
  as the production sandbox) loaded as kars-sandbox-e2e:dev; SANDBOX_IMAGE points
  at it so the gate runs the egress-guard's real iptables. Gate diagnostics now
  distinguish ErrImagePull (harness) from a tool break (the regression class).
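For context on the "Always for :latest" versus "IfNotPresent" distinction, the sketch below mirrors the rule Kubernetes applies when imagePullPolicy is left unset. It is only an illustration: how the controller actually computes pull_policy is not shown in this PR, and the function name `defaultPullPolicy` is hypothetical.

```typescript
type PullPolicy = "Always" | "IfNotPresent";

// Mirrors the Kubernetes defaulting rule for an unset imagePullPolicy:
// a digest or a non-latest tag gives IfNotPresent; :latest or no tag gives Always.
function defaultPullPolicy(image: string): PullPolicy {
  // A digest pins the exact content, so no forced pull is needed.
  if (image.includes("@")) {
    return "IfNotPresent";
  }
  // Only look for a tag in the last path segment, so a registry port
  // such as "localhost:5000/name" is not mistaken for a tag.
  const lastSegment = image.slice(image.lastIndexOf("/") + 1);
  const colon = lastSegment.lastIndexOf(":");
  const tag = colon === -1 ? "latest" : lastSegment.slice(colon + 1);
  return tag === "latest" ? "Always" : "IfNotPresent";
}
```

Under this rule an image loaded into kind as kars-sandbox-e2e:dev resolves to IfNotPresent, so the locally loaded copy is used and no registry pull is attempted.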

Security audit addendum appended (no control weakened).

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
@pallakatos pallakatos merged commit b669fa1 into main Jun 24, 2026
35 checks passed
@pallakatos pallakatos deleted the fix/distroless-router-probe-sweep branch June 24, 2026 20:19