feat(lucebox): hub CLI + autotune/sweep/profile + harness adapters + shell wrapper #335
easel wants to merge 9 commits
91276a7 to 62b21f0
## What

Containerization stack for lucebox-hub. Dockerfile + docker-bake.hcl build the lucebox-hub image (build-env and runtime stages); scripts/build_image.sh drives local builds; server/scripts/entrypoint.sh emits IMAGE_INFO / HOST_INFO sidecars consumed by /props. GitHub Actions add .github/workflows/docker.yml (build & publish), update ci.yml, and add release-luce-bench.yml for tagging. Workspace-root files (pyproject.toml, uv.lock, Makefile, lefthook.yml, .gitignore, README) live here because the Dockerfile uv-syncs the workspace at build time.

## Why

Provides the reproducible image and CI pipeline every other split PR deploys into. Centralizing build/publish here keeps Dockerfile, entrypoint, and workspace-root pinning in one reviewable change.

## Dependencies

- Luce-Org#335 (lucebox-cli): Dockerfile COPYs lucebox/ into the image
- Luce-Org#337 (lucebench-harness): Dockerfile COPYs luce-bench/ into the image
6c9078b to 360d332
1d588db to 0d8e1ff
29 issues found across 52 files
Prompt for AI agents (unresolved issues)
Check if these issues are valid — if so, understand the root cause of each and fix them. If appropriate, use sub-agents to investigate and fix each issue separately.
<file name="lucebox/src/lucebox/docker_run.py">
<violation number="1" location="lucebox/src/lucebox/docker_run.py:231">
P3: Several newly added helper functions are dead code (defined but never called anywhere in the codebase).</violation>
</file>
P3: Several newly added helper functions are dead code (defined but never called anywhere in the codebase).
Prompt for AI agents
Check if this issue is valid — if so, understand the root cause and fix it. At lucebox/src/lucebox/docker_run.py, line 231:
<comment>Several newly added helper functions are dead code (defined but never called anywhere in the codebase).</comment>
<file context>
@@ -0,0 +1,270 @@
+# ── subprocess helpers ─────────────────────────────────────────────────────
+
+
+def run(argv: list[str], *, check: bool = True) -> subprocess.CompletedProcess[str]:
+ """Run a command, streaming stdout/stderr to the user. `check=False` to
+ inspect exit codes manually."""
</file context>
Fixed in 73f04f4: removed the unused run(), docker_inspect_running(), host_path_visible(), and stderr() helpers from docker_run.py, plus the now-unused sys import.
Thanks for the update!
bb7bb11 to a56b51b
…R + personal refs

Strip forward-references to lucebox-cli (Luce-Org#335) and luce-bench (Luce-Org#337) plus the contributor's personal repos so the docker stack stands alone:

- delete .github/workflows/release-luce-bench.yml (luce-bench PyPI publish; fires only on luce-bench-v* tags, needs a luce-bench/ dir not in this repo)
- Makefile: drop test/smoke/bench/profile targets (invoke lucebench/lucebox modules absent here) and their now-unused vars
- .gitignore: drop luce-bench/snapshots and external baseline-repo URLs
- pyproject.toml / Dockerfile / docker.yml: de-reference Luce-Org#335/Luce-Org#337/luce-bench in comments

No functional change: deps, workspace members, ruff config, and every build instruction are untouched, so CI stays green. The siblings re-add their own scaffolding when they land.

Co-Authored-By: WOZCODE <contact@withwoz.com>
5a7e617 to c46e358
1 issue found across 2 files (changes from recent commits).
Prompt for AI agents (unresolved issues)
Check if these issues are valid — if so, understand the root cause of each and fix them. If appropriate, use sub-agents to investigate and fix each issue separately.
<file name="lucebox/src/lucebox/docker_run.py">
<violation number="1" location="lucebox/src/lucebox/docker_run.py:231">
P3: Several newly added helper functions are dead code (defined but never called anywhere in the codebase).</violation>
</file>
1235758 to a731f1d
… adapters

New lucebox/ Python package exposing the hub CLI (autotune, sweep, profile, smoke, models, config, download, host-check, docker_run) plus the lucebox.sh launcher wrapper and install.sh. Adds the harness/ adapter package wrapping external coding agents (claude_code, codex, hermes, openclaw, opencode, pi) that autotune sweeps drive. Ships scripts/check_lucebox_wrapper_sandbox.sh and scripts/test_lucebox_sh.sh for wrapper validation, full pytest coverage under lucebox/tests/, and the bragi autotune profile-sweep protocol docs.

This is the user-facing surface of lucebox-hub: one CLI to launch the image, tune layer-split / pflash settings against a host, run sweeps, and dispatch bench runs. Splitting it out keeps Python-side review independent of the C++ server and Docker stack reviews.

- Luce-Org#334 (docker-stack): docker_run.py launches the lucebox-hub image
- Luce-Org#337 (lucebench-harness): lucebox bench delegates to luce-bench (workspace dep)
- Luce-Org#336 (server-layer-split): autotune presumes layer-split build artifacts
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
…to-auto-pick

entrypoint.sh:write_host_info() bailed loudly when /opt/lucebox-hub/ did not exist on the host (unit tests, plain docker run without bind mount), because bash refuses the > redirect before the command runs and 2>/dev/null does not suppress the redirect's own error. Guard with an upfront [ -d ] check.

test_lucebox_sh.sh:test_entrypoint_multi_target was asserting against the pre-Luce-Org#334 multi-target semantics (auto-pick + warn + exec shim). PR Luce-Org#334 (merged) changed that to refuse-to-auto-pick + exit non-zero. Update the assertion: still drives the auto-detect block (so any DRAFT_FAMILY_GLOB set -u regression trips), but now requires the refuse warn to fire and the shim to NOT exec.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
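The redirect behaviour described in this commit can be reproduced outside the entrypoint. The following is a minimal sketch, not the project's code: it drives POSIX `sh` from Python, uses a temporary directory as a stand-in for /opt/lucebox-hub/, and the `HOST_INFO` file name and `run_sh` helper are illustrative assumptions.

```python
import subprocess
import tempfile
from pathlib import Path


def run_sh(script: str) -> subprocess.CompletedProcess:
    """Run a snippet under POSIX sh and capture stdout/stderr as text."""
    return subprocess.run(["sh", "-c", script], capture_output=True, text=True)


with tempfile.TemporaryDirectory() as tmp:
    missing = Path(tmp) / "no-such-dir"  # stand-in for an absent /opt/lucebox-hub/
    target = missing / "HOST_INFO"       # hypothetical sidecar file name

    # Unguarded: the shell fails to open the `>` target before `2>/dev/null`
    # takes effect, so the shell's own error message still reaches stderr.
    unguarded = run_sh(f'echo info > "{target}" 2>/dev/null')

    # Guarded: the directory test runs first, so the redirect is never attempted.
    guarded = run_sh(f'if [ -d "{missing}" ]; then echo info > "{target}"; fi')

print("unguarded rc:", unguarded.returncode, "stderr empty:", unguarded.stderr == "")
print("guarded rc:", guarded.returncode, "stderr empty:", guarded.stderr == "")
```

The guarded form exits 0 and stays silent because an `if` whose condition is false and has no `else` branch returns success.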
Adds the missing rc != 0 assertion to test_entrypoint_multi_target. The previous fix updated the test to look for the "Refusing to auto-select" warn and a non-exec'd shim, but didn't check the exit code. A regression where the entrypoint logged the warn but still exited 0 (silently auto-picking under the covers) would have slipped through. The container MUST fail to start on multi-target ambiguity — that is the whole point of the policy added in Luce-Org#334. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
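Why the exit-code check matters can be shown with a small stand-in. This is a sketch under stated assumptions: the fake entrypoint is a one-line Python child process rather than entrypoint.sh, and `SHIM_MARKER`, `run_fake_entrypoint`, and `refuses_multi_target` are hypothetical names; only the "Refusing to auto-select" text comes from the commit message.

```python
import subprocess
import sys

WARN = "Refusing to auto-select"
SHIM_MARKER = "SHIM_EXEC"  # hypothetical marker a test shim would print if exec'd


def run_fake_entrypoint(exit_code: int) -> subprocess.CompletedProcess:
    """Stand-in entrypoint: logs the refusal warning, then exits with exit_code."""
    script = f"import sys; print({WARN!r}, file=sys.stderr); sys.exit({exit_code})"
    return subprocess.run([sys.executable, "-c", script], capture_output=True, text=True)


def refuses_multi_target(proc: subprocess.CompletedProcess) -> bool:
    """All three conditions must hold: warn fired, shim not exec'd, non-zero exit."""
    return (
        WARN in proc.stderr
        and SHIM_MARKER not in proc.stdout
        and proc.returncode != 0
    )


good = run_fake_entrypoint(1)       # warns and fails to start
regressed = run_fake_entrypoint(0)  # warns but still exits 0

print(refuses_multi_target(good))       # True
print(refuses_multi_target(regressed))  # False: caught only by the rc check
```

Without the `returncode != 0` clause, both runs would pass, which is the gap this commit closes.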
…e CI

Behavior-preserving cleanup of the lucebox hub CLI surfaced by review:

- cli.py: delete dead _pick_variant_from_driver; collapse the six near-identical Typer client subcommands into a factory loop.
- harness adapters: rewire claude_code onto _common; add build_base_parser + run_main so the six main() arg-parsers stop duplicating (per-client differences preserved); hoist KNOWN_AREAS in bench.py.
- Unify DFLASH_ALLOWLIST in types.py (was duplicated with a silent one-field divergence between cli.py and sweep.py).
- config.py: extract _atomic_write_doc.
- Shell: shared _append_host_env / _append_scalar_env / _set_tty_flags helpers in lucebox.sh, _trim in entrypoint.sh, sandbox factory in the test runner.
- FIX a tty regression introduced while sharing the docker interactive flags: the helper ran behind `< <(...)`, so `[ -t 1 ]` tested the pipe and forced -i even on a real tty (breaking interactive client TUIs). Now runs in caller scope via nameref; added a PTY-based regression test.
- Add unit tests for the previously-untested harness adapters + bench, and wire lucebox + harness pytest into CI (it previously never ran pytest).

116 -> 156 tests; bash wrapper 54 -> 55; ruff + mypy clean.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
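The tty regression above has a close analogue in Python, sketched here rather than taken from lucebox.sh: a check that runs in a child whose stdout is a pipe reports "not a tty" regardless of the parent's terminal. The `interactive_flags` helper and its flag choice are assumptions for illustration.

```python
import subprocess
import sys

# Child process that reports whether *its own* stdout is a terminal.
CHILD = "import sys; print(sys.stdout.isatty())"

# capture_output=True connects the child's stdout to a pipe, much as running
# a helper behind `< <(...)` does in bash.
piped = subprocess.run([sys.executable, "-c", CHILD], capture_output=True, text=True)
print(piped.stdout.strip())  # False: the child sees the pipe, not the terminal


def interactive_flags(stdout_is_tty: bool) -> list:
    """Hypothetical helper: pick docker flags from a tty fact decided by the caller."""
    return ["-i", "-t"] if stdout_is_tty else ["-i"]


# The caller evaluates the tty check in its own scope and passes the result in,
# which mirrors the "runs in caller scope" fix described in the commit.
flags = interactive_flags(sys.stdout.isatty())
print(flags)
```

Passing the tty fact in as an argument keeps the helper shareable without moving the check into a context where file descriptor 1 is no longer the terminal.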
…oad core

Defer second-order features to follow-up PRs so this PR is just the host wrapper and the CLI needed to install, launch/serve, configure, and download models. Cuts the PR from ~11.2k to ~6.2k added lines.

Deferred out of this PR (land as stacked follow-ups):

- Agent-client adapters + bench: the entire net-new harness/ package (bench.py, the six clients, run_lucebench.sh) and the client launcher verbs. harness/ reverts to main's loose-scripts state; the root pyproject no longer adds harness as a workspace member/dep.
- Autotune sweep + profiles: candidate_configs, the Profile registry and per-arch brackets, sweep.py, and the `autotune` command. The host-derived DFLASH_* heuristic (runtime_from_host) STAYS (config.live_config needs it to bake serve defaults), slimmed into autotune.py.
- profile + smoke commands and their modules.

Coupling fixes:

- recommend_preset moved autotune.py -> download.py (models sub-app uses it).
- Wrapper: drop the autotune --sweep exec-routing special case and trim usage/completion/exec-set to the core verbs.

Tests/CI follow the surface: deferred-feature tests removed; runtime_from_host heuristic tests kept; new guard asserts the deferred verbs are NOT registered. CI pytest step scoped to lucebox.

lucebox 72 passed, wrapper 53 passed, ruff + mypy + shellcheck clean.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
1 issue found across 37 files (changes from recent commits).
Prompt for AI agents (unresolved issues)
Check if these issues are valid — if so, understand the root cause of each and fix them. If appropriate, use sub-agents to investigate and fix each issue separately.
<file name="lucebox.sh">
<violation number="1" location="lucebox.sh:904">
P2: Bash and fish completion lists are missing 'autotune', 'profile', and 'smoke' subcommands that remain valid via the catch-all dispatch in main()</violation>
</file>
P2: Bash and fish completion lists are missing 'autotune', 'profile', and 'smoke' subcommands that remain valid via the catch-all dispatch in main()
Prompt for AI agents
Check if this issue is valid — if so, understand the root cause and fix it. At lucebox.sh, line 904:
<comment>Bash and fish completion lists are missing 'autotune', 'profile', and 'smoke' subcommands that remain valid via the catch-all dispatch in main()</comment>
<file context>
@@ -901,8 +901,8 @@ _lucebox_complete() {
cmds="install uninstall start stop restart enable disable status logs \
- serve pull update check completion config models autotune \
- profile smoke print-run help version"
+ serve pull update check completion config models \
+ print-run help version"
config_verbs="get set unset"
</file context>
…utotune) The exhaustive VRAM-tier test for recommend_preset lived in the deferred test_autotune_cli.py; since the function moved to download.py in the split, its direct test belongs alongside it. The models auto-recommend path is also covered indirectly by test_models_cli. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…eftovers

The serve-argv builder (docker_run.py), the core's whole job, was only 50% covered by one test. Add lucebox/tests/test_docker_run.py pinning the argv/printable contract, every DFLASH_* env (always-on + conditional), target/draft path resolution, host-env forwarding, volume mounts, and docker_pull. docker_run.py 50% -> 96%.

Includes a regression guard for the preset-cap analysis: a large preset serves at the conservative DflashRuntime() floor (max_ctx=16384), not a high OOM-prone ctx; the VRAM-tier heuristic's higher caps only apply via `autotune --apply` (which threads cfg.model.preset; separate PR). No bug.

Also:

- Add lucebox/**/*.py to the ruff include set (was mypy-checked but never ruff-linted) and clear the 3 unused imports it surfaced in cli.py.
- Drop an orphan `# autotune` section divider and refresh two docstrings that referenced the deferred autotune subcommand.

lucebox 89 passed, wrapper 53 passed, ruff + mypy clean.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Stacked on the core lucebox CLI (Luce-Org#335). Restores the second-order tuning and diagnostics surface deferred from that PR:

- autotune.py: the empirical sweep machinery (candidate_configs, the Profile registry + per-arch coding-agent-loop brackets, get_profile). recommend_preset stays in download.py from the core split.
- sweep.py: the per-cell config sweep driver (config set -> restart -> luce-bench snapshot -> winner pick).
- profile.py / smoke.py + the `autotune`, `profile`, `smoke` CLI commands.
- Wrapper: the `autotune --sweep` exec-routing special case (sweeps must stay on docker run, not exec into the container they'd restart) and the autotune/profile/smoke entries in usage + completion + the exec set.

Tests: sweep / profile / smoke / autotune-cli / candidate-configs suites and the Profile/bracket tests restored; test_cli guards that the client launcher verbs are still deferred while autotune/profile/smoke register.

lucebox 122 passed, wrapper 55 passed, ruff + mypy + shellcheck clean.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
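The per-cell loop named above (config set -> restart -> luce-bench snapshot -> winner pick) can be sketched as follows. This is an illustration only, not sweep.py: `CellResult`, `run_sweep`, the injected callables, the `max_ctx` values, and the "higher score wins" rule are all assumptions.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Iterable, List

Config = Dict[str, int]


@dataclass(frozen=True)
class CellResult:
    config: Config
    score: float  # assumed: higher is better


def run_sweep(
    cells: Iterable[Config],
    apply_config: Callable[[Config], None],
    restart: Callable[[], None],
    snapshot: Callable[[], float],
) -> CellResult:
    """For each cell: set the config, restart the server, record one bench score."""
    results: List[CellResult] = []
    for cfg in cells:
        apply_config(cfg)  # "config set"
        restart()          # server must pick up the new settings
        results.append(CellResult(cfg, snapshot()))  # bench snapshot for this cell
    if not results:
        raise ValueError("sweep needs at least one cell")
    return max(results, key=lambda r: r.score)  # winner pick


# Fake surfaces so the sketch runs without docker or luce-bench.
active: Config = {}
restarts: List[int] = []
fake_scores = {8192: 41.0, 16384: 47.5, 32768: 39.0}

winner = run_sweep(
    cells=[{"max_ctx": 8192}, {"max_ctx": 16384}, {"max_ctx": 32768}],
    apply_config=lambda cfg: (active.clear(), active.update(cfg)),
    restart=lambda: restarts.append(1),
    snapshot=lambda: fake_scores[active["max_ctx"]],
)
print(winner.config, winner.score)
```

Injecting the three side-effecting steps as callables is one way to keep such a driver unit-testable with mocked docker and bench surfaces.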
Stacked on the tuning PR (which is stacked on core Luce-Org#335). Restores the last deferred cluster: the agent-client launch surface.

- harness/ becomes an installable package again (bench.py + the six client adapters: claude_code, codex, hermes, openclaw, opencode, pi, plus _common and run_lucebench.sh) and rejoins the uv workspace.
- cli.py: the claude/codex/opencode/hermes/pi/openclaw verbs, registered from one factory, plus _detect_server_url / _exec_client (which reuse profile._server_base_urls from the tuning PR, hence this stacks on it).
- CI runs the harness pytest suite alongside lucebox again.

recommend_preset stays in download.py (core's relocation), so the models sub-app is unaffected.

lucebox + harness 156 passed, wrapper 55 passed, ruff + mypy + shellcheck clean.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
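Registering the six client verbs from one factory might look roughly like the sketch below. The PR does this with Typer; this version uses the standard library's argparse so it stays self-contained, and `make_launcher`, `build_parser`, the `--server-url` flag, and its default are hypothetical.

```python
import argparse
from typing import Callable

# Verb names as listed in the commit message.
CLIENTS = ("claude", "codex", "opencode", "hermes", "pi", "openclaw")


def make_launcher(name: str) -> Callable[[argparse.Namespace], str]:
    """Factory: returns a handler bound to one client name."""
    def launch(args: argparse.Namespace) -> str:
        # A real handler would exec the client; the sketch only describes it.
        return f"launch {name} against {args.server_url}"
    return launch


def build_parser() -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser(prog="lucebox-sketch")
    sub = parser.add_subparsers(dest="verb", required=True)
    for name in CLIENTS:
        p = sub.add_parser(name)
        # Hypothetical shared option; per-client differences would be added here.
        p.add_argument("--server-url", default="http://localhost:8080")
        p.set_defaults(handler=make_launcher(name))
    return parser


args = build_parser().parse_args(["codex", "--server-url", "http://127.0.0.1:9000"])
print(args.handler(args))  # launch codex against http://127.0.0.1:9000
```

Building each handler through `make_launcher(name)` rather than defining it inline in the loop avoids Python's late-binding closure pitfall, where every handler would otherwise see the last client name.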
Core carried three bits only the deferred follow-ups use. Evict them so the core CLI is just the basic launch/config/download surface:

- types.BASE_DFLASH_ALLOWLIST: no core caller; used by the sweep's per-cell bracket. Moves to the tuning PR.
- autotune.runtime_from_host preset-awareness (_preset_approx_gb + the ≥20 GB large-model 32K branch): core always calls runtime_from_host(host) with no preset, so this was unreachable here. The preset-size cap moves to the tuning PR alongside autotune --apply (which threads cfg.model.preset). Behavior for the core's preset-less call is unchanged.
- config.live_config(preset_name=...): dead across the entire tree (every call site passes no arg). Deleted outright, not moved.

Their tests move with them.

lucebox 87 passed, wrapper 53, ruff + mypy clean.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Summary
Adds the `lucebox` hub CLI (autotune, sweep, profile, smoke, models, config, download, host-check, docker_run) plus the `lucebox.sh` wrapper and `install.sh` bootstrap. Also lands the `harness/` adapter package (claude_code, codex, hermes, openclaw, opencode, pi) and the `run_lucebench.sh` shim that `lucebox bench`-style flows shell out to. Additive only: no existing source files modified outside CI workflow edits, README touch-ups, and a `lefthook.yml` for the new packages.

Files

- `lucebox/`: new Python package (CLI entrypoint, autotune/sweep/profile/smoke/config/download/host-check/docker_run modules, full pytest suite under `lucebox/tests/`).
- `harness/`: new Python package (`harness.bench` + per-agent client adapters `claude_code.py`, `codex.py`, `hermes.py`, `openclaw.py`, `opencode.py`, `pi.py`, and the `run_lucebench.sh` operator shim).
- `lucebox.sh` + `install.sh`: top-level shell wrapper and bootstrap installer.
- `scripts/check_lucebox_wrapper_sandbox.sh`, `scripts/test_lucebox_sh.sh`: wrapper-shape integration tests.
- `.github/workflows/ci.yml`: wires the new packages into CI.
- `harness/clients/README.md`, `lefthook.yml`: supporting docs/config.

Single commit, ~10.5k LOC added across 49 files.

Dependencies

- `lucebox.docker_run` shells out to `docker run` against the `lucebox-hub` image and reads the `IMAGE_INFO` produced by the docker build. The CLI cannot exercise a real run path until docker-stack lands.
- `harness/bench.py` invokes `python -m lucebench.cli ...` as a subprocess; `harness/clients/run_lucebench.sh` is the operator shim around the same package. Both require the lucebench package to be importable.

Unit tests in `lucebox/tests/` and `harness/` mock the external surfaces and pass standalone, but end-to-end validation requires the three siblings above.

Test plan

- `cd lucebox && uv run pytest` (standalone; mocks docker/lucebench surfaces)
- `cd harness && uv run pytest` (standalone)
- `./lucebox.sh --help` resolves once build(docker): lucebox-hub container image + CI release pipeline #334 lands and the image is available
- `lucebox host-check` reports GPU + driver facts on a fresh host
- `lucebox autotune --dry-run` enumerates candidate configs without launching a sweep
- `lucebox bench`-style flow once feat(luce-bench): in-tree bench harness + multi-turn agent_recorded + LLM judge #337 (luce-bench) is mergeable and refactor(server): shared layer-split backend + GGUF inspection + c2-gate plumbing #336 (server layer-split) is exposing the expected build flags

Note: this PR's CLI/adapter code is self-contained and unit-tests pass standalone, but full integration validation requires the docker, luce-bench, and server-layer-split siblings to land together.
Generated with Claude Code