Closed
111 commits
d4546a5
feat(pflash): ee7 early-exit drafter + anchor-transitive cascade + bu…
dusterbloom May 27, 2026
94907a4
refactor(pflash): rename DFLASH_COMPRESS_* → PFLASH_COMPRESS_* (casca…
dusterbloom May 27, 2026
99f6b38
fix(pflash): adaptive anchor_radius eliminates 64K NIAH cliff
dusterbloom May 27, 2026
766e46d
bench: add eval_quality_compare.py for LongBench F1 regression detection
dusterbloom May 27, 2026
8c1d705
feat(qwen35): derive scalars from weights, assert vs GGUF metadata
dusterbloom May 28, 2026
699bb5c
feat(pflash): adaptive composition via per-request fa_window override
dusterbloom May 28, 2026
a676161
feat(pflash): PFLASH_*/DFLASH_* env-var dual aliasing + transitive ca…
dusterbloom May 28, 2026
6536b76
refactor(pflash): extract compress_cfg_from_env, kill qwen35/qwen3 pa…
dusterbloom May 28, 2026
b7dd89b
chore(pflash): move narrative comments to docs/, trim mega-blocks
dusterbloom May 28, 2026
ff0a6b9
fix(server): append closed <think> prefill in Jinja renderer when thi…
dusterbloom May 28, 2026
fc8c8e2
fix(chat_template): gate closed-think prefill injection to Qwen3 arch…
dusterbloom May 28, 2026
e64a2b8
refactor(c2-gate): wire c2_spec_decode_permitted into qwen35_backend
dusterbloom May 28, 2026
97ca5b1
feat(lucebox): docker stack + CLI + bench/profile + harness + luce-be…
easel May 29, 2026
f4db35b
fix(lucebox): env > config.toml > default in _load_or_build
easel May 29, 2026
a73d482
chore(lint): fix ruff E741/E731/E501/F401/I001 surfaced by PR #285 CI
easel May 29, 2026
09dc0be
chore(ci): fix ruff I001 + mypy KernelInfo union in lucebox
easel May 29, 2026
28701f7
fix(ci): restore contents:read on luce-bench release job
easel May 29, 2026
459194d
fix(harness,docker): reproducible uv sync + sensible lucebench defaults
easel May 29, 2026
4e9fcf6
fix(p2): cubic P2 batch — pin actions, sanitize tags, mirror shell wr…
easel May 29, 2026
ccef455
fix(p3): cubic P3 batch — docs links, NOTICE copyright, areas __all__
easel May 29, 2026
37b3fbd
feat(lucebox): docker stack + CLI + bench/profile + harness + luce-be…
easel May 29, 2026
8c1f37d
feat(pflash): effective-size admission gate + keep-ratio guard (keep …
dusterbloom May 29, 2026
db79a7d
fix(server): route Qwen3.6/Laguna think-mode reasoning to reasoning_c…
easel May 29, 2026
f62b1eb
test(server): add render→emit integration tests for reasoning channel
easel May 29, 2026
8b48ad8
Merge PR #308 (qwen-think-channel) into PR #285 (lucebox-docker)
easel May 29, 2026
9a6db60
feat(lucebench): card-driven thinking control + client-side thinking …
easel May 29, 2026
2f63a42
feat(extractor): multi-turn replay slicing + _is_claude_session fix
easel May 30, 2026
026ab07
feat(agent_recorded): multi-turn replay loader + prefill-and-decode v…
easel May 30, 2026
cb58edb
feat(autotune): profile-driven sweep + fa_window axis + coding-agent-…
easel May 30, 2026
72a8afc
fix(sweep): fall back to persisted [host] when LUCEBOX_HOST_* env empty
easel May 30, 2026
cefa0f5
feat(autotune): gemma4 WSL 24GB → max_ctx=98304 from sweep evidence
easel May 30, 2026
aadb424
Merge easel/feat/lucebox-docker (card-driven thinking control + qwen-…
easel May 30, 2026
4b24445
docs(autotune): sweep protocol + qwen3.6-27b bragi runbook
easel May 30, 2026
cb36e06
feat(lucebox.sh): docker exec into live container for steady-state su…
easel May 30, 2026
4668c0b
fix(longctx): lenient Risk-prefix grader accepts thinking preamble
easel May 30, 2026
48fafe6
fix(lucebox): add luce-bench dep + fix model_cards duplication in Doc…
easel May 30, 2026
542b4e2
Merge bragi's lucebench dep + Dockerfile model_cards dedup
easel May 30, 2026
3dffb30
feat(autotune/sweep): bragi sweep learnings — tq3_0 required for qwen…
easel May 30, 2026
b915ccc
Merge bragi: sweep learnings (gemma 131K, qwen tq3_0)
easel May 30, 2026
31fcaf6
test(sweep): update winner-pick test for post-3dffb30 max_ctx-first o…
easel May 30, 2026
148fba0
docs(experiments): sindri verifies bragi's 131K on gemma — drop-in clean
easel May 30, 2026
6ea9694
docs(experiments): note GPU power throttle on bragi baselines (86-90W…
easel May 30, 2026
52b9da4
fix(agent_recorded): recognize call:<verb>{} tool emissions + hyphena…
easel May 30, 2026
1d15ced
fix(agent): recognize call:<verb>{} emissions as agent-shape
easel May 30, 2026
20554ff
fix(humaneval): parse-pass grader trims trailing garbage before decla…
easel May 30, 2026
4b4fd28
Merge bragi: GPU power throttle note on baselines
easel May 31, 2026
060492e
docs(experiments): bragi think vs nothink baseline summary 2026-05-30
easel May 31, 2026
fbc2d41
feat(pflash): adaptive compression-regime router (correct-by-construc…
dusterbloom May 30, 2026
b31544f
feat(pflash): wire type-gate router into live handler; prune disprove…
dusterbloom May 30, 2026
8fc961b
feat(pflash): empty-response guard + bandit floor reconciliation (tas…
dusterbloom May 30, 2026
deba2fd
fix(forge): synthesize tool_use from call:<verb>{} plain-text emissions
easel May 31, 2026
deb5adb
Merge bragi: think vs nothink baseline summary doc
easel May 31, 2026
83c5567
feat(pflash): merge pflash/ee7 — prefill KV compression (16 commits, …
easel May 31, 2026
a45c9fa
Merge remote-tracking branch 'easel/feat/lucebox-docker' into feat/lu…
easel May 31, 2026
1122d02
Merge easel/feat/lucebox-docker: PFlash batch + chat_template gates
easel May 31, 2026
5b15d34
Merge origin/main: spec-decode empty fallback + prefix-cache fix + docs
easel May 31, 2026
6790deb
feat(pflash): add multi-turn session bench script + build test_server…
easel May 31, 2026
4b757d1
fix(server): Gemma4 <|channel>thought token routing (code=0% fix)
easel May 31, 2026
2062a37
fix(Dockerfile): verify test_server_unit binary present in runtime image
easel May 31, 2026
12c50c0
docs(server): plan for call:<verb>{} tool-parser pattern + codex review
easel May 31, 2026
cdb8b9c
fix(server): parse gemma's call:<verb>{} plain-text tool emissions
easel May 31, 2026
5ca695c
Merge PR #323: server-side call:<verb>{} tool parser
easel May 31, 2026
c70ebb0
Merge bragi: gemma4 channel-token fix + pflash test scripts
easel May 31, 2026
329f611
Merge bragi: Dockerfile test_server_unit guard
easel May 31, 2026
b48ce66
fix(config): validate prefill_mode accepts only off/auto/always
easel May 31, 2026
b69a75e
docs(experiments): add Qwen3.6-27B PFlash A/B test doc (baseline in p…
easel May 31, 2026
d339916
Merge remote-tracking branch 'easel/feat/lucebox-docker' into feat/lu…
easel May 31, 2026
b240239
docs(experiments): update pflash A/B doc with critical finding
easel May 31, 2026
1443239
fix(server): replace C++20 starts_with with C++17 rfind
easel May 31, 2026
aa00f49
docs(experiments): add Gemma4 call:verb parser fix verification doc
easel May 31, 2026
dcd28c1
docs(experiments): update Gemma4 fix doc with analysis and partial re…
easel May 31, 2026
b707e87
fix(autotune): document pflash requires prefix_cache_slots > 0
easel May 31, 2026
8039911
fix(server): use C++17 rfind idiom instead of C++20 starts_with
easel Jun 1, 2026
004a81b
fix(tool_parser): accept ``_call:`` prefix from gemma tokenizer artifact
easel Jun 1, 2026
fac7e0f
Merge bragi: parallel C++17 fix + pflash docs + config validation
easel Jun 1, 2026
1552495
docs(experiments): plan soft-close thinking termination via logit-rat…
easel Jun 1, 2026
e974ac3
docs(experiments): plan SSE emitter CONTENT-mode tool parse
easel Jun 1, 2026
d799d00
feat(server): soft-close thinking termination via logit-ratio peek
easel Jun 1, 2026
8055201
docs(server): plan for call:<verb>{} tool-parser pattern + codex review
easel May 31, 2026
d67a269
fix(server): parse gemma's call:<verb>{} plain-text tool emissions
easel May 31, 2026
80e6e2a
fix(tool_parser): accept ``_call:`` prefix from gemma tokenizer artifact
easel Jun 1, 2026
8218333
fix(server): wire sse_emitter to detect plain-text call:<verb>{} tools
easel Jun 1, 2026
ee9cd9e
fix(server): address cubic PR #329 review feedback
easel Jun 2, 2026
009c6fb
Merge PR #329 (call:<verb>{} parser + emitter wiring) into feat/luceb…
easel Jun 2, 2026
44230e2
Merge PR #326 (soft-close thinking termination) into feat/lucebox-docker
easel Jun 2, 2026
3ed7d5a
fix(test): close brace in test_soft_close_natural_at_boundary lost du…
easel Jun 2, 2026
9395917
feat(server): add --debug-thinking-logits trajectory diagnostic
easel Jun 2, 2026
acf718b
fix(server+luce-bench): detect call:verb{} in streaming emitter + all…
easel May 31, 2026
3bc38db
docs(experiments): complete Gemma4 agent_recorded results + final ver…
easel May 31, 2026
2e89171
fix(server): handle Anthropic-format tool_use + tool_result content b…
easel May 31, 2026
69893a1
docs(experiments): add Qwen3.6 forge tool_result fix doc (0%→100%)
easel May 31, 2026
bfd7f1e
docs(experiments): finalize forge fix doc with Gemma4 re-test (20%→20%)
easel May 31, 2026
3443d1d
docs(experiments): add bragi RTX 5090 final tuning summary 2026-05-31
easel May 31, 2026
090d1ec
docs(experiments): add comprehensive Qwen3.6 sweep + update final sum…
easel May 31, 2026
29d78ad
docs+fix(autotune): quantify prefix_cache regression on agent_recorded
easel May 31, 2026
9304c4f
docs(experiments): bragi auto-tuning complete summary 2026-05-31
easel May 31, 2026
e6e21bd
feat(lucebox): wire Laguna-XS.2 safetensors speculator into server la…
easel Jun 1, 2026
9aa029e
docs(experiments): update Qwen3.6 baseline with final numbers + add L…
easel Jun 1, 2026
6f9b9cd
docs(experiments): add Laguna-XS.2 initial characterization for bragi
easel Jun 1, 2026
e9a53bc
docs(laguna): fill in 32K benchmark results table
easel Jun 1, 2026
e1e2973
docs(laguna): document 48K ctx regression + budget=4/16 crash
easel Jun 1, 2026
f259880
docs(gemma4-31b): initial characterization skeleton for bragi 2026-05-31
easel Jun 1, 2026
3262a2a
fix(autotune): make runtime_from_host preset-aware for large models
easel Jun 1, 2026
5cffc98
docs(gemma4-31b): complete initial characterization results — bragi 2…
easel Jun 1, 2026
0edafa9
docs(sweep): qwen3.6-27b coding-agent-loop sweep bragi 2026-06-01
easel Jun 1, 2026
778c4f0
docs(autotune): bragi auto-tune summary 2026-06-01
easel Jun 1, 2026
0b23967
fix(autotune): document GPU hang risk for budget=32+q8_0 at 65K ctx
easel Jun 1, 2026
cd32af1
docs(autotune): add winner-config agent_recorded QA result (9/26, no …
easel Jun 1, 2026
718709e
docs(autotune): mark bragi autotune complete, add sweep status table
easel Jun 1, 2026
4c16cec
fix(server): treat max_tokens as response-only budget when thinking i…
easel Jun 2, 2026
175c8a7
fix(server): split soft-close probe ids from inject ids
easel Jun 2, 2026
32 changes: 32 additions & 0 deletions .dockerignore
@@ -0,0 +1,32 @@
# Local venv and Python caches — uv rebuilds inside the image.
.venv/
**/__pycache__/
**/*.pyc

# Build artefacts.
**/build/
**/build-*/
dflash/build/

# Model weights — bind-mount at runtime instead of baking into the image.
dflash/models/
**/*.gguf
**/*.safetensors

# Git metadata. Submodule contents are kept; .git files inside the worktree
# are not needed at build time.
.git/
**/.git
**/.gitignore.local

# Local agent / IDE state.
.claude/
.idea/
.vscode/

# Misc large or volatile.
*.log
*.tmp
*.swp
**/*.bin
**/*.npy
34 changes: 30 additions & 4 deletions .github/workflows/ci.yml
@@ -10,20 +10,46 @@ jobs:
name: uv workspace (lock + sync + import smoke)
runs-on: ubuntu-latest
steps:
- - uses: actions/checkout@v4
- - uses: astral-sh/setup-uv@v3
+ - uses: actions/checkout@34e114876b0b11c390a56381ad16ebd13914f8d5 # v4
+ - uses: astral-sh/setup-uv@caf0cab7a618c569241d31dcd442f54681755d39 # v3
with:
version: "0.11.x"
- name: Verify uv lockfile and workspace sync
# Skips the torch wheel in this fast job; the CUDA build below runs a
# full sync and builds megakernel against torch.
run: bash scripts/check_uv_workspace.sh

- name: Lint Python surfaces touched by lucebox tooling
run: uv run --frozen --extra dev ruff check .

Review comment (P2): Lint and typecheck steps using `uv run --frozen --extra dev` will trigger a full re-sync that installs the cu128 torch wheel (~2 GB), defeating the `--no-install-package torch` optimization in `check_uv_workspace.sh` that was explicitly designed to keep this job fast.


- name: Typecheck lucebox CLI
run: uv run --frozen --extra dev python -m mypy --package lucebox

- name: Install shellcheck (for bash test runner)
# ubuntu-latest typically ships shellcheck pre-installed, but pin
# the dependency explicitly so the bash test runner can always rely
# on `command -v shellcheck` succeeding.
run: |
if ! command -v shellcheck >/dev/null 2>&1; then
sudo apt-get update
sudo apt-get install -y shellcheck
fi
shellcheck --version | head -3

- name: Smoke-test lucebox.sh wrapper
# Catches `set -u` regressions, syntax errors, and stale dispatch
# handlers in the host-side wrapper + the in-container entrypoint.
# Runs shellcheck --severity=error across every shipped .sh file,
# exercises every subcommand dispatch under `set -u`, and drives the
# entrypoint's draft-resolution block through every family-glob
# branch — all on the bare runner without docker/nvidia/systemd.
run: bash scripts/test_lucebox_sh.sh

build:
name: Build (cmake + uv sync --extra megakernel)
runs-on: ubuntu-latest
steps:
- - uses: actions/checkout@v4
+ - uses: actions/checkout@34e114876b0b11c390a56381ad16ebd13914f8d5 # v4
with:
submodules: recursive
token: ${{ secrets.SUBMODULE_PAT || secrets.GITHUB_TOKEN }}
@@ -39,7 +65,7 @@ jobs:
sub-packages: '["nvcc", "cudart-dev", "thrust", "driver-dev"]'
non-cuda-sub-packages: '["libcublas-dev"]'

- - uses: astral-sh/setup-uv@v3
+ - uses: astral-sh/setup-uv@caf0cab7a618c569241d31dcd442f54681755d39 # v3
with:
version: "0.11.x"
# uv reads .python-version (3.12, matching the previous CI) and downloads the matching
177 changes: 177 additions & 0 deletions .github/workflows/docker.yml
@@ -0,0 +1,177 @@
name: Docker prebuilds

# Builds the cuda12 lucebox-hub Docker image defined in docker-bake.hcl
# and pushes it to GHCR. The bake file is the source of
# truth for arch matrices and CUDA pinning; this workflow only handles
# fetching submodules, freeing runner disk, signing in to the registry, and
# wiring the cache.

on:
# Build + push to GHCR when a GitHub Release is published. The release tag
# becomes one of the image tags via docker/metadata-action's `type=ref,
# event=tag` + `type=semver` rules below.
release:
types: [published]
# Build-only CI guard on PRs that touch the docker surface. We never push
# from a PR — even if we wanted to, GITHUB_TOKEN on PRs from forks lacks
# `packages:write`. The point is to catch Dockerfile / bake-file / arch-
# list regressions before they land on main.
pull_request:
paths:
- Dockerfile
- docker-bake.hcl
- .dockerignore
- .github/workflows/docker.yml
- server/CMakeLists.txt
- server/src/**
- server/test/**
- server/include/**
- server/scripts/**
- server/deps/**
- server/pyproject.toml
- pyproject.toml
- uv.lock
- lucebox.sh
- lucebox/**
# Manual trigger for one-off rebuilds or pre-release smoke tests. The
# `push` input controls whether the resulting images land in GHCR or only
# populate the buildx cache.
workflow_dispatch:
inputs:
push:
description: "Push images to GHCR after build"
type: boolean
default: false

# Single in-flight build per ref. New pushes cancel the previous run so we
# don't queue 30-min compiles.
concurrency:
group: docker-${{ github.ref }}
cancel-in-progress: true

env:
REGISTRY: ghcr.io
IMAGE_NAME: ${{ github.repository_owner }}/lucebox-hub

jobs:
build:
name: ${{ matrix.variant }}
# ubuntu-latest = 4 CPU / 16 GB RAM / 14 GB free disk on the GitHub-
# hosted plan. The disk-free step at the top of the job claws back
# ~30 GB, which is enough to land a 14 GB image with build cache.
# CPU is the harder constraint: the fat-binary arch list can take hours
# on hosted runners. If you outgrow this:
# • Larger GitHub-hosted runners (`ubuntu-latest-8-cores`, paid)
# halve wall time.
# • A self-hosted runner with the host's nvcc avoids the
# containerised CUDA toolkit pull entirely.
runs-on: ubuntu-latest
permissions:
contents: read
packages: write
strategy:
fail-fast: false
matrix:
variant: [cuda12]
steps:
- name: Free runner disk space
# The default ubuntu-latest image keeps ~25 GB of preinstalled
# tooling (Android SDK, .NET, Haskell, ghc, etc.) we don't need.
# Pinned action; check upstream releases before bumping.
uses: jlumbroso/free-disk-space@54081f138730dfa15788a46383842cd2f914a1be # v1.3.1
with:
tool-cache: true
android: true
dotnet: true
haskell: true
large-packages: false # slow; preinstalled apt packages we don't need
swap-storage: true

- uses: actions/checkout@34e114876b0b11c390a56381ad16ebd13914f8d5 # v4
with:
# Submodule contents are needed by the cmake build (llama.cpp ggml
# subtree, mit-han-lab Block-Sparse-Attention). The Dockerfile
# asserts they're present before running cmake.
submodules: recursive

- uses: docker/setup-buildx-action@8d2750c68a42422c14e847fe6c8ac0403b4cbd6f # v3

- name: Log in to GHCR
# Skip on PR runs: we never push from a PR and the token from a fork
# PR can't `packages:write` anyway.
if: github.event_name != 'pull_request'
uses: docker/login-action@c94ce9fb468520275223c153574b00df6fe4bcc9 # v3
with:
registry: ${{ env.REGISTRY }}
username: ${{ github.actor }}
password: ${{ secrets.GITHUB_TOKEN }}

- name: Capture build identity
id: identity
# /props.build identity baked into the image. GIT_SHA is the full
# 40-char commit sha (`${{ github.sha }}`); a short form would work, but
# the full form is kept for "exactly which weights are running"
# forensics. BUILD_TIME is ISO 8601 UTC. IMAGE_TAG is filled in
# after the metadata-action step below picks the headline tag.
run: |
echo "git_sha=${{ github.sha }}" >> "$GITHUB_OUTPUT"
echo "build_time=$(date -u +%Y-%m-%dT%H:%M:%SZ)" >> "$GITHUB_OUTPUT"

- name: Derive image metadata
id: meta
uses: docker/metadata-action@c299e40c65443455700f0fdfc63efafe5b349051 # v5
with:
images: ${{ env.REGISTRY }}/${{ env.IMAGE_NAME }}
# Suffix every tag with the variant so future CUDA stacks can
# coexist under the same image name. Examples (using cuda12):
# ghcr.io/<owner>/lucebox-hub:cuda12 (moving — main/dispatch/release)
# ghcr.io/<owner>/lucebox-hub:0.3.0-cuda12 (pinned — from `lucebox-v0.3.0` tag)
# ghcr.io/<owner>/lucebox-hub:feat-x-cuda12 (per branch)
# ghcr.io/<owner>/lucebox-hub:sha-abc1234-cuda12 (per commit)
flavor: |
latest=false
suffix=-${{ matrix.variant }},onlatest=true
tags: |
# Moving variant tag — emitted on main, release, and any
# workflow_dispatch with push:true. The `enable=` gate keeps
# branch + PR builds from clobbering the published `:cuda12`.
type=raw,value=${{ matrix.variant }},suffix=,priority=1000,enable=${{ github.event_name == 'release' || (github.ref == 'refs/heads/main' && github.event_name != 'pull_request') || (github.event_name == 'workflow_dispatch' && inputs.push) }}
# Pinned version tag — extracts the version from a
# `lucebox-v<X.Y.Z>` git tag push, mirroring the hatch-vcs
# scheme used by luce-bench and lucebox. Yields e.g.
# `0.3.0-cuda12` when `lucebox-v0.3.0` is pushed.
type=match,pattern=lucebox-v(\d+\.\d+\.\d+),group=1
type=ref,event=branch
type=ref,event=tag
type=ref,event=pr
type=sha,prefix=sha-
type=semver,pattern={{version}}
type=semver,pattern={{major}}.{{minor}}

- name: Build and push
uses: docker/bake-action@4a9a8d494466d37134e2bfca2d3a8de8fb2681ad # v5
env:
# Wire identity into docker-bake.hcl's GIT_SHA / IMAGE_TAG /
# BUILD_TIME variables. IMAGE_TAG is `${{ steps.meta.outputs.
# version }}` — the headline tag metadata-action picked
# (e.g. `cuda12` on main, `0.3.0-cuda12` on a release tag).
# The image's /props.build will surface these so a curl can
# pin down "what binary is this exactly" without inspecting
# the registry.
GIT_SHA: ${{ steps.identity.outputs.git_sha }}
IMAGE_TAG: ${{ steps.meta.outputs.version }}
BUILD_TIME: ${{ steps.identity.outputs.build_time }}
with:
files: |
docker-bake.hcl
${{ steps.meta.outputs.bake-file }}
targets: ${{ matrix.variant }}
push: ${{ github.event_name == 'release' || (github.event_name == 'workflow_dispatch' && inputs.push) }}
# gha cache stores layer blobs in the workflow's Actions cache,
# scoped by variant so future CUDA stacks don't evict each other.
# mode=max also caches multi-stage intermediate layers (the
# builder stage with the 30-min nvcc compile), which is the whole
# point of doing this.
set: |
${{ matrix.variant }}.cache-from=type=gha,scope=${{ matrix.variant }}
${{ matrix.variant }}.cache-to=type=gha,scope=${{ matrix.variant }},mode=max
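
The tag and push rules in this workflow can be read as three small pure functions. The Python sketch below is illustrative only and is not part of the PR: the function names are hypothetical, the regex is copied from the `type=match` rule, and search semantics for that rule are an assumption. It models only the pinned-version tag, the `enable=` gate on the moving tag, and the `push:` expression as written above, not the full behaviour of docker/metadata-action.

```python
import re
from typing import Optional

# Pattern copied from the workflow's `type=match` rule.
_VERSION_RE = re.compile(r"lucebox-v(\d+\.\d+\.\d+)")


def pinned_version_tag(git_tag: str, variant: str) -> Optional[str]:
    """Hypothetical helper: map `lucebox-v0.3.0` to `0.3.0-<variant>`.

    Assumes the pattern is searched for anywhere in the tag name.
    """
    m = _VERSION_RE.search(git_tag)
    if m is None:
        return None
    # The `flavor` block appends `-<variant>` to the extracted version.
    return f"{m.group(1)}-{variant}"


def moving_tag_enabled(event_name: str, ref: str, push_input: bool = False) -> bool:
    """Mirror of the `enable=` expression on the moving `:<variant>` tag."""
    return (
        event_name == "release"
        or (ref == "refs/heads/main" and event_name != "pull_request")
        or (event_name == "workflow_dispatch" and push_input)
    )


def should_push(event_name: str, push_input: bool = False) -> bool:
    """Mirror of the `push:` expression on the bake step."""
    return event_name == "release" or (
        event_name == "workflow_dispatch" and push_input
    )
```

Read this way, a pull-request run never enables the moving tag and never pushes, while a manual dispatch does both only when its `push` input is true.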
51 changes: 51 additions & 0 deletions .github/workflows/release-luce-bench.yml
@@ -0,0 +1,51 @@
name: Release luce-bench

# Builds and publishes the luce-bench package to PyPI when a tag
# matching `luce-bench-v*` is pushed (e.g. `luce-bench-v0.2.7`). The
# release version is derived from the tag itself via hatch-vcs (see
# `luce-bench/pyproject.toml`), so there's no version-in-file to keep
# in sync.
#
# Uses PyPI trusted publishing (OIDC): set up the publisher in the
# PyPI project settings as `easel/lucebox-hub` repo + this workflow
# file + the `pypi` environment. No long-lived API token needed.

on:
push:
tags:
- 'luce-bench-v*'

permissions:
contents: read

jobs:
build-and-publish:
runs-on: ubuntu-latest
environment:
name: pypi
url: https://pypi.org/p/luce-bench
permissions:
# Job-level `permissions` completely replaces the workflow-level
# block, so `contents: read` has to be repeated here for
# actions/checkout to be able to read the repo.
contents: read
id-token: write # trusted publishing
steps:
- uses: actions/checkout@34e114876b0b11c390a56381ad16ebd13914f8d5 # v4
with:
fetch-depth: 0

- name: Install uv
uses: astral-sh/setup-uv@d4b2f3b6ecc6e67c4457f6d3e41ec42d3d0fcb86 # v5
with:
version: latest

- name: Build wheel + sdist
working-directory: luce-bench
run: |
uv build --out-dir dist

- name: Publish to PyPI (trusted publisher)
uses: pypa/gh-action-pypi-publish@release/v1
with:
packages-dir: luce-bench/dist
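
Two points in this workflow are easy to misread: which tags trigger it, and why `contents: read` is repeated at job level. The Python sketch below illustrates both under stated assumptions. The function names are hypothetical, the regex approximates the `luce-bench-v*` filter (where `*` does not match `/`), and stripping the prefix is only a simplified stand-in for hatch-vcs, whose actual tag-to-version rule is configured in `luce-bench/pyproject.toml`.

```python
import re
from typing import Dict, Optional

# Approximation of the `luce-bench-v*` tag filter: `*` matches any run of
# characters except `/`.
_TAG_RE = re.compile(r"^luce-bench-v(?P<rest>[^/]*)$")


def release_version_from_tag(tag: str) -> Optional[str]:
    """Hypothetical helper: return the part after `luce-bench-v`, or None.

    Simplified stand-in for hatch-vcs; the real rule is in pyproject.toml.
    """
    m = _TAG_RE.match(tag)
    return m.group("rest") if m else None


def effective_permissions(
    workflow_perms: Dict[str, str],
    job_perms: Optional[Dict[str, str]],
) -> Dict[str, str]:
    """A job-level block replaces the workflow-level block; nothing is merged."""
    return dict(workflow_perms) if job_perms is None else dict(job_perms)
```

With only `id-token: write` at job level, `effective_permissions({"contents": "read"}, {"id-token": "write"})` drops `contents` entirely, which is the situation the comment in the job's `permissions` block guards against.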
19 changes: 19 additions & 0 deletions .gitignore
@@ -79,3 +79,22 @@ fix-plan.md
# Harness test artifacts
.harness-work/
health

# lucebox host-side generated config + benchmark output
.lucebox/
models/.lucebox/

# Claude Code session state (worktrees, agent scratchpads)
.claude/

# Benchmark snapshots live in the standalone luce-bench-baselines repo
# (https://github.com/easel/luce-bench-baselines) — not in lucebox-hub.
dflash/docs/tuning-snapshots/

# luce-bench --sweep default output dir (per-host bench runs); reference
# baselines live in github.com/easel/luce-bench-baselines.
luce-bench/snapshots/

# Workdir editor backup suffixes
*.git-head
*.pre-pflash-rename