feat(lucebox): docker stack + CLI + bench/profile + harness + luce-bench in-tree #285
Open
easel wants to merge 72 commits into Luce-Org:main from easel:feat/lucebox-docker
Commits
d4546a5 feat(pflash): ee7 early-exit drafter + anchor-transitive cascade + bu… (dusterbloom)
94907a4 refactor(pflash): rename DFLASH_COMPRESS_* → PFLASH_COMPRESS_* (casca… (dusterbloom)
99f6b38 fix(pflash): adaptive anchor_radius eliminates 64K NIAH cliff (dusterbloom)
766e46d bench: add eval_quality_compare.py for LongBench F1 regression detection (dusterbloom)
8c1d705 feat(qwen35): derive scalars from weights, assert vs GGUF metadata (dusterbloom)
699bb5c feat(pflash): adaptive composition via per-request fa_window override (dusterbloom)
a676161 feat(pflash): PFLASH_*/DFLASH_* env-var dual aliasing + transitive ca… (dusterbloom)
6536b76 refactor(pflash): extract compress_cfg_from_env, kill qwen35/qwen3 pa… (dusterbloom)
b7dd89b chore(pflash): move narrative comments to docs/, trim mega-blocks (dusterbloom)
ff0a6b9 fix(server): append closed <think> prefill in Jinja renderer when thi… (dusterbloom)
fc8c8e2 fix(chat_template): gate closed-think prefill injection to Qwen3 arch… (dusterbloom)
e64a2b8 refactor(c2-gate): wire c2_spec_decode_permitted into qwen35_backend (dusterbloom)
97ca5b1 feat(lucebox): docker stack + CLI + bench/profile + harness + luce-be… (easel)
f4db35b fix(lucebox): env > config.toml > default in _load_or_build (easel)
a73d482 chore(lint): fix ruff E741/E731/E501/F401/I001 surfaced by PR #285 CI (easel)
09dc0be chore(ci): fix ruff I001 + mypy KernelInfo union in lucebox (easel)
28701f7 fix(ci): restore contents:read on luce-bench release job (easel)
459194d fix(harness,docker): reproducible uv sync + sensible lucebench defaults (easel)
4e9fcf6 fix(p2): cubic P2 batch — pin actions, sanitize tags, mirror shell wr… (easel)
ccef455 fix(p3): cubic P3 batch — docs links, NOTICE copyright, areas __all__ (easel)
37b3fbd feat(lucebox): docker stack + CLI + bench/profile + harness + luce-be… (easel)
8c1f37d feat(pflash): effective-size admission gate + keep-ratio guard (keep … (dusterbloom)
db79a7d fix(server): route Qwen3.6/Laguna think-mode reasoning to reasoning_c… (easel)
f62b1eb test(server): add render→emit integration tests for reasoning channel (easel)
8b48ad8 Merge PR #308 (qwen-think-channel) into PR #285 (lucebox-docker) (easel)
9a6db60 feat(lucebench): card-driven thinking control + client-side thinking … (easel)
2f63a42 feat(extractor): multi-turn replay slicing + _is_claude_session fix (easel)
026ab07 feat(agent_recorded): multi-turn replay loader + prefill-and-decode v… (easel)
cb58edb feat(autotune): profile-driven sweep + fa_window axis + coding-agent-… (easel)
72a8afc fix(sweep): fall back to persisted [host] when LUCEBOX_HOST_* env empty (easel)
cefa0f5 feat(autotune): gemma4 WSL 24GB → max_ctx=98304 from sweep evidence (easel)
aadb424 Merge easel/feat/lucebox-docker (card-driven thinking control + qwen-… (easel)
4b24445 docs(autotune): sweep protocol + qwen3.6-27b bragi runbook (easel)
cb36e06 feat(lucebox.sh): docker exec into live container for steady-state su… (easel)
4668c0b fix(longctx): lenient Risk-prefix grader accepts thinking preamble (easel)
48fafe6 fix(lucebox): add luce-bench dep + fix model_cards duplication in Doc… (easel)
542b4e2 Merge bragi's lucebench dep + Dockerfile model_cards dedup (easel)
3dffb30 feat(autotune/sweep): bragi sweep learnings — tq3_0 required for qwen… (easel)
b915ccc Merge bragi: sweep learnings (gemma 131K, qwen tq3_0) (easel)
31fcaf6 test(sweep): update winner-pick test for post-3dffb30 max_ctx-first o… (easel)
148fba0 docs(experiments): sindri verifies bragi's 131K on gemma — drop-in clean (easel)
6ea9694 docs(experiments): note GPU power throttle on bragi baselines (86-90W… (easel)
52b9da4 fix(agent_recorded): recognize call:<verb>{} tool emissions + hyphena… (easel)
1d15ced fix(agent): recognize call:<verb>{} emissions as agent-shape (easel)
20554ff fix(humaneval): parse-pass grader trims trailing garbage before decla… (easel)
4b4fd28 Merge bragi: GPU power throttle note on baselines (easel)
060492e docs(experiments): bragi think vs nothink baseline summary 2026-05-30 (easel)
fbc2d41 feat(pflash): adaptive compression-regime router (correct-by-construc… (dusterbloom)
b31544f feat(pflash): wire type-gate router into live handler; prune disprove… (dusterbloom)
8fc961b feat(pflash): empty-response guard + bandit floor reconciliation (tas… (dusterbloom)
deba2fd fix(forge): synthesize tool_use from call:<verb>{} plain-text emissions (easel)
deb5adb Merge bragi: think vs nothink baseline summary doc (easel)
83c5567 feat(pflash): merge pflash/ee7 — prefill KV compression (16 commits, … (easel)
a45c9fa Merge remote-tracking branch 'easel/feat/lucebox-docker' into feat/lu… (easel)
1122d02 Merge easel/feat/lucebox-docker: PFlash batch + chat_template gates (easel)
5b15d34 Merge origin/main: spec-decode empty fallback + prefix-cache fix + docs (easel)
6790deb feat(pflash): add multi-turn session bench script + build test_server… (easel)
4b757d1 fix(server): Gemma4 <|channel>thought token routing (code=0% fix) (easel)
2062a37 fix(Dockerfile): verify test_server_unit binary present in runtime image (easel)
12c50c0 docs(server): plan for call:<verb>{} tool-parser pattern + codex review (easel)
cdb8b9c fix(server): parse gemma's call:<verb>{} plain-text tool emissions (easel)
5ca695c Merge PR #323: server-side call:<verb>{} tool parser (easel)
c70ebb0 Merge bragi: gemma4 channel-token fix + pflash test scripts (easel)
329f611 Merge bragi: Dockerfile test_server_unit guard (easel)
b48ce66 fix(config): validate prefill_mode accepts only off/auto/always (easel)
b69a75e docs(experiments): add Qwen3.6-27B PFlash A/B test doc (baseline in p… (easel)
d339916 Merge remote-tracking branch 'easel/feat/lucebox-docker' into feat/lu… (easel)
b240239 docs(experiments): update pflash A/B doc with critical finding (easel)
1443239 fix(server): replace C++20 starts_with with C++17 rfind (easel)
aa00f49 docs(experiments): add Gemma4 call:verb parser fix verification doc (easel)
dcd28c1 docs(experiments): update Gemma4 fix doc with analysis and partial re… (easel)
b707e87 fix(autotune): document pflash requires prefix_cache_slots > 0 (easel)
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,32 @@ | ||
| # Local venv and Python caches — uv rebuilds inside the image. | ||
| .venv/ | ||
| **/__pycache__/ | ||
| **/*.pyc | ||
| | ||
| # Build artefacts. | ||
| **/build/ | ||
| **/build-*/ | ||
| dflash/build/ | ||
| | ||
| # Model weights — bind-mount at runtime instead of baking into the image. | ||
| dflash/models/ | ||
| **/*.gguf | ||
| **/*.safetensors | ||
| | ||
| # Git metadata. Submodule contents are kept; .git files inside the worktree | ||
| # are not needed at build time. | ||
| .git/ | ||
| **/.git | ||
| **/.gitignore.local | ||
| | ||
| # Local agent / IDE state. | ||
| .claude/ | ||
| .idea/ | ||
| .vscode/ | ||
| | ||
| # Misc large or volatile. | ||
| *.log | ||
| *.tmp | ||
| *.swp | ||
| **/*.bin | ||
| **/*.npy |
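The ignore rules above mix root-anchored patterns (`*.log`) with any-depth patterns (`**/*.bin`). The following is a minimal sketch of how such patterns select paths, assuming Docker's documented behaviour that `*` does not cross a `/`, that `**` matches any number of directories, and that a matched directory excludes everything beneath it. The helper names `to_regex` and `is_ignored` are hypothetical, and negation (`!`) and character classes are deliberately not handled, so treat this as an approximation rather than Docker's actual matcher.

```python
import re


def to_regex(pattern: str) -> "re.Pattern[str]":
    """Approximate a .dockerignore pattern as a regex (illustrative only)."""
    pattern = pattern.strip().strip("/")
    parts = []
    i = 0
    while i < len(pattern):
        if pattern.startswith("**/", i):
            parts.append(r"(?:.*/)?")   # zero or more leading directories
            i += 3
        elif pattern.startswith("**", i):
            parts.append(r".*")         # anything, including slashes
            i += 2
        elif pattern[i] == "*":
            parts.append(r"[^/]*")      # anything within one path segment
            i += 1
        elif pattern[i] == "?":
            parts.append(r"[^/]")       # one character within a segment
            i += 1
        else:
            parts.append(re.escape(pattern[i]))
            i += 1
    # A matched directory also excludes everything below it.
    return re.compile("^" + "".join(parts) + r"(?:/.*)?$")


def is_ignored(path: str, patterns: list) -> bool:
    """Return True if any pattern matches the context-relative path."""
    return any(to_regex(p).match(path) for p in patterns)


# A subset of the patterns from the file above.
PATTERNS = [".venv/", "**/__pycache__/", "**/*.pyc", "dflash/models/",
            "**/*.gguf", "*.log", "**/*.bin"]

if __name__ == "__main__":
    for candidate in ["dflash/models/qwen.gguf", "build.log",
                      "server/run.log", "server/src/main.cpp"]:
        print(candidate, is_ignored(candidate, PATTERNS))
```

Under these assumptions `server/run.log` is not excluded, because `*.log` has no `**/` prefix and so only applies at the context root, whereas `**/*.bin` applies at any depth.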
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,177 @@ | ||
| name: Docker prebuilds | ||
| | ||
| # Builds the cuda12 lucebox-hub Docker image defined in docker-bake.hcl | ||
| # and pushes it to GHCR. The bake file is the source of | ||
| # truth for arch matrices and CUDA pinning; this workflow only handles | ||
| # fetching submodules, freeing runner disk, signing in to the registry, and | ||
| # wiring the cache. | ||
| | ||
| on: | ||
| # Build + push to GHCR when a GitHub Release is published. The release tag | ||
| # becomes one of the image tags via docker/metadata-action's `type=ref, | ||
| # event=tag` + `type=semver` rules below. | ||
| release: | ||
| types: [published] | ||
| # Build-only CI guard on PRs that touch the docker surface. We never push | ||
| # from a PR — even if we wanted to, GITHUB_TOKEN on PRs from forks lacks | ||
| # `packages:write`. The point is to catch Dockerfile / bake-file / arch- | ||
| # list regressions before they land on main. | ||
| pull_request: | ||
| paths: | ||
| - Dockerfile | ||
| - docker-bake.hcl | ||
| - .dockerignore | ||
| - .github/workflows/docker.yml | ||
| - server/CMakeLists.txt | ||
| - server/src/** | ||
| - server/test/** | ||
| - server/include/** | ||
| - server/scripts/** | ||
| - server/deps/** | ||
| - server/pyproject.toml | ||
| - pyproject.toml | ||
| - uv.lock | ||
| - lucebox.sh | ||
| - lucebox/** | ||
| # Manual trigger for one-off rebuilds or pre-release smoke tests. The | ||
| # `push` input controls whether the resulting images land in GHCR or only | ||
| # populate the buildx cache. | ||
| workflow_dispatch: | ||
| inputs: | ||
| push: | ||
| description: "Push images to GHCR after build" | ||
| type: boolean | ||
| default: false | ||
| | ||
| # Single in-flight build per ref. New pushes cancel the previous run so we | ||
| # don't queue 30-min compiles. | ||
| concurrency: | ||
| group: docker-${{ github.ref }} | ||
| cancel-in-progress: true | ||
| | ||
| env: | ||
| REGISTRY: ghcr.io | ||
| IMAGE_NAME: ${{ github.repository_owner }}/lucebox-hub | ||
| | ||
| jobs: | ||
| build: | ||
| name: ${{ matrix.variant }} | ||
| # ubuntu-latest = 4 CPU / 16 GB RAM / 14 GB free disk on the GitHub- | ||
| # hosted plan. The disk-free step at the top of the job claws back | ||
| # ~30 GB, which is enough to land a 14 GB image with build cache. | ||
| # CPU is the harder constraint: the fat-binary arch list can take hours | ||
| # on hosted runners. If you outgrow this: | ||
| # • Larger GitHub-hosted runners (`ubuntu-latest-8-cores`, paid) | ||
| # halve wall time. | ||
| # • A self-hosted runner with the host's nvcc avoids the | ||
| # containerised CUDA toolkit pull entirely. | ||
| runs-on: ubuntu-latest | ||
| permissions: | ||
| contents: read | ||
| packages: write | ||
| strategy: | ||
| fail-fast: false | ||
| matrix: | ||
| variant: [cuda12] | ||
| steps: | ||
| - name: Free runner disk space | ||
| # The default ubuntu-latest image keeps ~25 GB of preinstalled | ||
| # tooling (Android SDK, .NET, Haskell, ghc, etc.) we don't need. | ||
| # Pinned action; check upstream releases before bumping. | ||
| uses: jlumbroso/free-disk-space@54081f138730dfa15788a46383842cd2f914a1be # v1.3.1 | ||
| with: | ||
| tool-cache: true | ||
| android: true | ||
| dotnet: true | ||
| haskell: true | ||
| large-packages: false # skipped: removing the preinstalled apt packages is slow, even though we don't need them | ||
| swap-storage: true | ||
| | ||
| - uses: actions/checkout@34e114876b0b11c390a56381ad16ebd13914f8d5 # v4 | ||
| with: | ||
| # Submodule contents are needed by the cmake build (llama.cpp ggml | ||
| # subtree, mit-han-lab Block-Sparse-Attention). The Dockerfile | ||
| # asserts they're present before running cmake. | ||
| submodules: recursive | ||
| | ||
| - uses: docker/setup-buildx-action@8d2750c68a42422c14e847fe6c8ac0403b4cbd6f # v3 | ||
| | ||
| - name: Log in to GHCR | ||
| # Skip on PR runs: we never push from a PR and the token from a fork | ||
| # PR can't `packages:write` anyway. | ||
| if: github.event_name != 'pull_request' | ||
| uses: docker/login-action@c94ce9fb468520275223c153574b00df6fe4bcc9 # v3 | ||
| with: | ||
| registry: ${{ env.REGISTRY }} | ||
| username: ${{ github.actor }} | ||
| password: ${{ secrets.GITHUB_TOKEN }} | ||
| | ||
| - name: Capture build identity | ||
| id: identity | ||
| # /props.build identity baked into the image. GIT_SHA is the full | ||
| # commit sha (`${{ github.sha }}`). A short form would suffice, but we | ||
| # use the full 40-char form for "exactly which weights are running" | ||
| # forensics. BUILD_TIME is ISO 8601 UTC. IMAGE_TAG is filled in | ||
| # after the metadata-action step below picks the headline tag. | ||
| run: | | ||
| echo "git_sha=${{ github.sha }}" >> "$GITHUB_OUTPUT" | ||
| echo "build_time=$(date -u +%Y-%m-%dT%H:%M:%SZ)" >> "$GITHUB_OUTPUT" | ||
| | ||
| - name: Derive image metadata | ||
| id: meta | ||
| uses: docker/metadata-action@c299e40c65443455700f0fdfc63efafe5b349051 # v5 | ||
| with: | ||
| images: ${{ env.REGISTRY }}/${{ env.IMAGE_NAME }} | ||
| # Suffix every tag with the variant so future CUDA stacks can | ||
| # coexist under the same image name. Examples (using cuda12): | ||
| # ghcr.io/<owner>/lucebox-hub:cuda12 (moving — main/dispatch/release) | ||
| # ghcr.io/<owner>/lucebox-hub:0.3.0-cuda12 (pinned — from `lucebox-v0.3.0` tag) | ||
| # ghcr.io/<owner>/lucebox-hub:feat-x-cuda12 (per branch) | ||
| # ghcr.io/<owner>/lucebox-hub:sha-abc1234-cuda12 (per commit) | ||
| flavor: | | ||
| latest=false | ||
| suffix=-${{ matrix.variant }},onlatest=true | ||
| tags: | | ||
| # Moving variant tag — emitted on main, release, and any | ||
| # workflow_dispatch with push:true. The `enable=` gate keeps | ||
| # branch + PR builds from clobbering the published `:cuda12`. | ||
| type=raw,value=${{ matrix.variant }},suffix=,priority=1000,enable=${{ github.event_name == 'release' || (github.ref == 'refs/heads/main' && github.event_name != 'pull_request') || (github.event_name == 'workflow_dispatch' && inputs.push) }} | ||
| # Pinned version tag — extracts the version from a | ||
| # `lucebox-v<X.Y.Z>` git tag push, mirroring the hatch-vcs | ||
| # scheme used by luce-bench and lucebox. Yields e.g. | ||
| # `0.3.0-cuda12` when `lucebox-v0.3.0` is pushed. | ||
| type=match,pattern=lucebox-v(\d+\.\d+\.\d+),group=1 | ||
| type=ref,event=branch | ||
| type=ref,event=tag | ||
| type=ref,event=pr | ||
| type=sha,prefix=sha- | ||
| type=semver,pattern={{version}} | ||
| type=semver,pattern={{major}}.{{minor}} | ||
| | ||
| - name: Build and push | ||
| uses: docker/bake-action@4a9a8d494466d37134e2bfca2d3a8de8fb2681ad # v5 | ||
| env: | ||
| # Wire identity into docker-bake.hcl's GIT_SHA / IMAGE_TAG / | ||
| # BUILD_TIME variables. IMAGE_TAG is `${{ steps.meta.outputs. | ||
| # version }}` — the headline tag metadata-action picked | ||
| # (e.g. `cuda12` on main, `0.3.0-cuda12` on a release tag). | ||
| # The image's /props.build will surface these so a curl can | ||
| # pin down "what binary is this exactly" without inspecting | ||
| # the registry. | ||
| GIT_SHA: ${{ steps.identity.outputs.git_sha }} | ||
| IMAGE_TAG: ${{ steps.meta.outputs.version }} | ||
| BUILD_TIME: ${{ steps.identity.outputs.build_time }} | ||
| with: | ||
| files: | | ||
| docker-bake.hcl | ||
| ${{ steps.meta.outputs.bake-file }} | ||
| targets: ${{ matrix.variant }} | ||
| push: ${{ github.event_name == 'release' || (github.event_name == 'workflow_dispatch' && inputs.push) }} | ||
| # gha cache stores layer blobs in the workflow's Actions cache, | ||
| # scoped by variant so future CUDA stacks don't evict each other. | ||
| # mode=max also caches multi-stage intermediate layers (the | ||
| # builder stage with the 30-min nvcc compile), which is the whole | ||
| # point of doing this. | ||
| set: | | ||
| ${{ matrix.variant }}.cache-from=type=gha,scope=${{ matrix.variant }} | ||
| ${{ matrix.variant }}.cache-to=type=gha,scope=${{ matrix.variant }},mode=max |
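The tag and push gating in this workflow is dense, so here is a small sketch that restates it as plain Python. It mirrors the `enable=` expression on the moving tag, the `push:` expression on the bake step, and the `type=match` rule with the variant suffix. The function names are hypothetical, and the use of an unanchored regex search is an assumption about how the match rule is applied; this is an illustration of the expressions as written, not the action's implementation.

```python
import re
from typing import Optional

# Same pattern as the `type=match` rule in the workflow.
PINNED_PATTERN = re.compile(r"lucebox-v(\d+\.\d+\.\d+)")


def should_push(event_name: str, dispatch_push: bool = False) -> bool:
    """Mirror of the bake step's `push:` expression."""
    return event_name == "release" or (
        event_name == "workflow_dispatch" and dispatch_push
    )


def moving_tag_enabled(event_name: str, ref: str,
                       dispatch_push: bool = False) -> bool:
    """Mirror of the `enable=` gate on the moving `:cuda12` tag."""
    return (
        event_name == "release"
        or (ref == "refs/heads/main" and event_name != "pull_request")
        or (event_name == "workflow_dispatch" and dispatch_push)
    )


def pinned_tag(git_tag: str, variant: str) -> Optional[str]:
    """Capture group 1 of the version pattern plus the `-<variant>` suffix."""
    found = PINNED_PATTERN.search(git_tag)  # assumption: unanchored search
    return f"{found.group(1)}-{variant}" if found else None


if __name__ == "__main__":
    print(should_push("pull_request"))
    print(moving_tag_enabled("workflow_dispatch", "refs/heads/feat-x", True))
    print(pinned_tag("lucebox-v0.3.0", "cuda12"))
```

Read this way, a pull request build never pushes and never emits the moving tag, while a release both pushes and, for a `lucebox-v0.3.0` tag, yields the pinned `0.3.0-cuda12` form described in the workflow comments.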
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,51 @@ | ||
| name: Release luce-bench | ||
| | ||
| # Builds and publishes the luce-bench package to PyPI when a tag | ||
| # matching `luce-bench-v*` is pushed (e.g. `luce-bench-v0.2.7`). The | ||
| # release version is derived from the tag itself via hatch-vcs (see | ||
| # `luce-bench/pyproject.toml`), so there's no version-in-file to keep | ||
| # in sync. | ||
| # | ||
| # Uses PyPI trusted publishing (OIDC): set up the publisher in the | ||
| # PyPI project settings as `easel/lucebox-hub` repo + this workflow | ||
| # file + the `pypi` environment. No long-lived API token needed. | ||
| | ||
| on: | ||
| push: | ||
| tags: | ||
| - 'luce-bench-v*' | ||
| | ||
| permissions: | ||
| contents: read | ||
| | ||
| jobs: | ||
| build-and-publish: | ||
| runs-on: ubuntu-latest | ||
| environment: | ||
| name: pypi | ||
| url: https://pypi.org/p/luce-bench | ||
| permissions: | ||
| # Job-level `permissions` completely replaces the workflow-level | ||
| # block, so `contents: read` has to be repeated here for | ||
| # actions/checkout to be able to read the repo. | ||
| contents: read | ||
| id-token: write # trusted publishing | ||
| steps: | ||
| - uses: actions/checkout@34e114876b0b11c390a56381ad16ebd13914f8d5 # v4 | ||
| with: | ||
| fetch-depth: 0 | ||
| | ||
| - name: Install uv | ||
| uses: astral-sh/setup-uv@d4b2f3b6ecc6e67c4457f6d3e41ec42d3d0fcb86 # v5 | ||
| with: | ||
| version: latest | ||
| | ||
| - name: Build wheel + sdist | ||
| working-directory: luce-bench | ||
| run: | | ||
| uv build --out-dir dist | ||
| | ||
| - name: Publish to PyPI (trusted publisher) | ||
| uses: pypa/gh-action-pypi-publish@release/v1 | ||
| with: | ||
| packages-dir: luce-bench/dist | ||
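The release workflow states that the package version comes from the tag itself (for example `luce-bench-v0.2.7`). As a rough illustration of that mapping, the sketch below checks a tag against the `luce-bench-v*` filter and strips the prefix. The function name `version_from_tag` is hypothetical, and the real derivation is done by hatch-vcs according to `luce-bench/pyproject.toml`, which is not shown in this diff, so this only restates the tag-to-version relationship described in the comments.

```python
from fnmatch import fnmatchcase
from typing import Optional

TAG_GLOB = "luce-bench-v*"    # same glob as the workflow's tag filter
TAG_PREFIX = "luce-bench-v"


def version_from_tag(tag: str) -> Optional[str]:
    """Return the version part of a release tag, or None if it is not one."""
    # fnmatch lets `*` cross "/", which a workflow tag filter does not,
    # so slashes are rejected explicitly here.
    if "/" in tag or not fnmatchcase(tag, TAG_GLOB):
        return None
    return tag[len(TAG_PREFIX):]


if __name__ == "__main__":
    print(version_from_tag("luce-bench-v0.2.7"))
    print(version_from_tag("lucebox-v0.3.0"))
```

A tag such as `lucebox-v0.3.0` does not match the filter, which is why the Docker image tags and the PyPI release can be versioned independently from the same repository.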