feat(dFlash): MTP foundation by dusterbloom · Pull Request #237 · Luce-Org/lucebox-hub

dusterbloom · 2026-05-20T18:53:07Z

Summary

Adds Qwen3.5/3.6 NextN MTP support: IMtpModule interface + chain runner in common/, Qwen35MtpModule concrete impl, qwen35 backend wiring, and --mtp-gguf flags in dflash_server.

Works with both GGUF layouts:

Separate files: backbone GGUF as primary + MTP-head GGUF via --mtp-gguf
Unsloth single-file combined GGUF (Qwen3.6-27B-Q4_K_M-mtp.gguf): pass the same file as both the primary and --mtp-gguf; the loader reads backbone tensors and MTP head tensors from it. Verified greedy-equivalent + same 1.82× speedup as the separate-files setup.

Validation

Single-prompt HTTP (`/v1/chat/completions`, temp=0)

HumanEval-10 via harness/benchmarks/generation_benchmark.py on Qwen3.6-27B Q4_K_M + MTP-Q4_K_M (RTX 3090):

| Mode | mean tok/s | speedup | accept rate |
| --- | --- | --- | --- |
| AR | 32.46 | 1.00x | — |
| chain γ=2 | 59.24 | 1.82x | 0.93–1.00 |

All 10 prompts pass functional checks. Greedy text byte-identical on 8/10; remaining 2 within the AR-vs-AR non-determinism baseline (verified via control run: same prompts produced 9/10 identical text across two AR-only runs of the same server).

Confirmed equivalent between separate-files and unsloth single-file layouts: 1.82× → 1.82× (statistically identical), 8/10 byte-identical text in both, same 95–100% accept rate range.

Agentic-CLI smoke (OpenAI Responses API)

pi (v0.55.3) via harness/clients/run_pi.sh: server starts with MTP, SSE streaming delivers tokens, [mtp_decode] iters=2 proposed=4 accepted=4 accept_rate=1.00, marker received exact-match. Proves the full HTTP → Responses API → SSE → MTP chain → tokenizer → client round-trip.
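For readers unfamiliar with the `[mtp_decode]` line, the sketch below shows how the counters relate: the accept rate is accepted draft tokens divided by proposed draft tokens. The struct and function names are hypothetical; only the log format is taken from the line quoted above.

```cpp
#include <cassert>
#include <cstdio>
#include <string>

// Hypothetical counters; field names mirror the quoted log line.
struct MtpDecodeStats {
    int iters    = 0;  // propose/verify iterations
    int proposed = 0;  // draft tokens offered to the target
    int accepted = 0;  // draft tokens the target agreed with
};

// Fraction of proposed draft tokens that were accepted (0 when nothing was proposed).
inline double accept_rate(const MtpDecodeStats & s) {
    return s.proposed > 0 ? static_cast<double>(s.accepted) / s.proposed : 0.0;
}

// Renders the counters in the same shape as the log line quoted above.
inline std::string format_mtp_decode(const MtpDecodeStats & s) {
    char buf[128];
    std::snprintf(buf, sizeof(buf),
                  "[mtp_decode] iters=%d proposed=%d accepted=%d accept_rate=%.2f",
                  s.iters, s.proposed, s.accepted, accept_rate(s));
    return std::string(buf);
}
```

With γ=2, two iterations propose four draft tokens, which matches `iters=2 proposed=4` in the smoke run.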

Long-prompt resilience

25K-token prompt under --mtp-gguf + --max-ctx 65536: server stays alive, MTP fires (iters=16, 50% accept), real response returned. Earlier silent SIGKILL on long prompts (DFLASH27B_MTP_CTX default 8192 not tracking max_ctx) fixed in `ada7bb0`; warm_head_kv overflow now triggers graceful AR fallback.

Unit tests

test_common_mtp_orchestrator: 20/20 pass — chain runner state machine (γ propagation, EOS termination, partial-accept rollback, n_gen termination), orchestrator lifecycle, Qwen35MtpModule input-validation guards, T20 ExternalDrafter partial-accept hidden threading (asserts the set_capture_row + consume_captured_hidden invariant the `74f708a` fix added). Mock-based, no GGUF required.

Scope

IMtpModule + INativeMtp + IExternalDrafterMtp mixins
Qwen35MtpModule (INativeMtp impl) + GGUF loader + step graph
MtpChainRunner + MtpOrchestrator
dflash_server: --mtp-gguf, --mtp-gamma, --mtp-draft-source, --mtp-draft-topk

Not in this PR (follow-ups)

Gemma4 ExternalDrafter MTP module (uses the IExternalDrafterMtp interface declared here)
Per-request DFlash/MTP/auto dispatcher (commit `3b65ac1` on `feat/mtp-prefix-warm-ghost`)
IDrafter abstraction unifying DFlash + MTP under one runner
Tree-mode runner + entropy-adaptive selector
Prefix-cache WARM hit for MTP

cubic-dev-ai

2 issues found across 34 files

dusterbloom · 2026-05-20T19:15:39Z

Addressed cubic review in commit 74f708a:

F1 (P2): add missing io.emit(-1) on chain-runner failure path in warm_and_decode (mtp_orchestrator.cpp:152)
F2 (P1): call set_capture_row(accept_n) + consume_captured_hidden() before threading running_hidden on partial accept for ExternalDrafter flavor; NativeHeads path is unaffected (next_hidden is always empty for that flavor, new branch not entered)

Build clean. All 19 unit tests pass. T7 (partial_accept_rollback) exercises the NativeHeads rollback path only — the ExternalDrafter hidden-sync invariant is not covered by the existing suite (noted for follow-up).

E2E HumanEval-10 AR-vs-MTP same-text rate: 8/10 post-fix (unchanged from 8/10 pre-fix). F2 is confined to ExternalDrafter and does not affect the Qwen3.6/NativeHeads path tested here. he_01/he_02 divergences remain — consistent with prior AR-vs-AR control showing GPU non-determinism on those two prompts.

dusterbloom · 2026-05-20T19:26:51Z

Coverage gap closed in bf2d3e9: T20 now exercises the ExternalDrafter partial-accept path that the 74f708a fix addressed (T7 used a NativeHeads stub, so it never reached the fixed branch). 20/20 unit tests pass.

dusterbloom · 2026-05-20T19:42:54Z

Two more fixes from real-workload hermes testing (commit ada7bb0):

A. Silent server death on long-prompt MTP: DFLASH27B_MTP_CTX defaulted to 8192 but the server never propagated --max-ctx to that env var. With --max-ctx 65536, the backbone KV was allocated at 65536 slots while the MTP head_kv stayed at 8192 — VRAM mismatch caused OOM on the next prefill. (The warm_head_kv overflow guard already returned false cleanly — the bug was upstream at init time.) Fix: 5-line propagation in server_main.cpp using setenv(..., overwrite=0) so user env still wins.

B. --draft + --mtp-gguf conflict: server now logs a WARNING and clears draft_path when both are set (MTP wins).

Verified: 25K-token prompt → server stays alive, MTP ran (iters=16 proposed=32 accepted=16 accept_rate=0.50), real response returned. Conflict invocation prints the warning and starts cleanly with draft = (none) in the config banner.
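Fix A can be sketched as below. This is a minimal POSIX-only illustration, not the actual code in server_main.cpp: the function name is hypothetical, and only the env var name and the `overwrite=0` behaviour come from the description above.

```cpp
#include <cassert>
#include <stdlib.h>   // setenv/getenv/unsetenv (POSIX)
#include <string>

// Propagate the server's --max-ctx into the MTP context env var.
// overwrite=0 means an existing user-provided value is left untouched,
// so the user's environment still wins.
inline bool propagate_mtp_ctx(int max_ctx) {
    const std::string value = std::to_string(max_ctx);
    return ::setenv("DFLASH27B_MTP_CTX", value.c_str(), /*overwrite=*/0) == 0;
}
```

Calling `propagate_mtp_ctx(65536)` before backend init would size the MTP head KV to match the backbone unless the variable was already exported.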

cubic-dev-ai

6 issues found across 34 files

dusterbloom · 2026-05-20T22:08:28Z

Addressed cubic re-review in commit f6e8e94:

P1:

gguf_mmap.h open() now idempotent (releases prior mapping first via placement-destroy+reconstruct; leaves object in default state on any failure path)
gguf_mmap.h Windows release() now calls CloseHandle() before clearing handle_ (fix by inspection — Linux build cannot exercise the Windows path)
qwen35_mtp.cpp GPU warmup off-by-one (> -> >=): a prompt of exactly n_ctx tokens would have written slot n_ctx (out-of-bounds; valid range is [0, n_ctx-1] with slot_start=1). Now declines MTP cleanly and chain runner falls through to AR.

P2:

test_dflash.cpp missing hidden_seq capture now clears all_prefill_hidden and breaks out of the prefill loop instead of running warm_head_kv on an undefined buffer
T7 now asserts restore_kv_at_chain_calls >= 1 — rollback path is confirmed to fire on partial accept

P3:

T6 EOS termination assertion tightened from total_emitted <= 10 to total_emitted == 1 (EOS is the first emitted token; runner must stop immediately)

Build clean, 20/20 unit tests pass. HumanEval-10 re-bench: 8/10 byte-identical.
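The warmup off-by-one fix listed under P1 can be illustrated with a small bounds check. The sketch assumes the guard compares the prompt length against n_ctx, which the comment implies but does not show; all names here are hypothetical.

```cpp
#include <cassert>

// First KV slot used for prompt tokens during warmup (slot_start=1 per the comment).
constexpr int kSlotStart = 1;

// True when warming `prompt_len` tokens stays inside the valid range [0, n_ctx-1].
// The last slot written is kSlotStart + prompt_len - 1, which equals prompt_len,
// so the prompt must be strictly shorter than n_ctx.
inline bool warm_fits(int prompt_len, int n_ctx) {
    const int last_slot = kSlotStart + prompt_len - 1;
    return prompt_len > 0 && last_slot <= n_ctx - 1;
}

// Pre-fix guard (`>`): lets prompt_len == n_ctx through, writing slot n_ctx.
inline bool declines_old(int prompt_len, int n_ctx) { return prompt_len >  n_ctx; }

// Post-fix guard (`>=`): declines that case so the chain runner can fall back to AR.
inline bool declines_new(int prompt_len, int n_ctx) { return prompt_len >= n_ctx; }
```

For `n_ctx = 8192`, a prompt of exactly 8192 tokens passes the old guard but does not fit, while the new guard rejects it and still admits 8191 tokens.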

howard0su · 2026-05-20T23:06:07Z

+
+    // MTP (Multi-Token Prediction) speculator — mutually exclusive with --draft.
+    // When mtp_gguf_path is set, the backend ignores draft_path.
+    const char * mtp_gguf_path    = nullptr;


Shall we use https://huggingface.co/unsloth/Qwen3.6-27B-MTP-GGUF, which contains MTP in the main model?

howard0su · 2026-05-20T23:06:16Z

+    // When mtp_gguf_path is set, the backend ignores draft_path.
+    const char * mtp_gguf_path    = nullptr;
+    int          mtp_gamma        = 0;        // 0 = MTP loaded but not active; >0 = chain depth
+    const char * mtp_draft_source = nullptr;  // "chain" (default) | "mtp_topk"


Why not an enum?

howard0su · 2026-05-20T23:08:00Z

@@ -0,0 +1,219 @@
+// common/gguf_mmap.h — RAII wrapper for platform-conditional mmap of GGUF files.


This is worth a separate PR.

howard0su · 2026-05-20T23:08:29Z

+    // supports_mtp() returns true. Default returns nullptr.
+    // Forward-declared to avoid a header dependency from model_backend.h
+    // on mtp_interface.h — backends include both as needed.
+    virtual mtp::IMtpModule * mtp() { return nullptr; }


Why not let a nullptr return indicate that this model doesn't support MTP?
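One way to read this suggestion is that the pointer itself serves as the capability check, making a separate supports_mtp() query redundant. The types below are simplified stand-ins for the PR's ModelBackend and mtp::IMtpModule, and `can_speculate` is a hypothetical caller.

```cpp
#include <cassert>

namespace mtp {
// Stand-in for the PR's interface; the real one declares MTP operations.
struct IMtpModule { virtual ~IMtpModule() = default; };
}

struct ModelBackend {
    virtual ~ModelBackend() = default;
    // nullptr means "this backend has no MTP module".
    virtual mtp::IMtpModule * mtp() { return nullptr; }
};

// A backend without MTP simply inherits the default.
struct PlainBackend : ModelBackend {};

// A backend with MTP returns its module.
struct MtpBackend : ModelBackend {
    mtp::IMtpModule module_;
    mtp::IMtpModule * mtp() override { return &module_; }
};

// Caller-side check: no separate capability flag is consulted.
inline bool can_speculate(ModelBackend & b) { return b.mtp() != nullptr; }
```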

howard0su · 2026-05-20T23:09:17Z

+// matching mixin (IExternalDrafterMtp / INativeMtp).
+enum class MtpFlavor {
+    ExternalDrafter,   // Gemma4-style: separate drafter, h_prev chain
+    NativeHeads,       // Qwen3.6-style: MTP heads in the backbone


This should be hidden inside ModelBackend.

howard0su · 2026-05-20T23:10:55Z

+        const int n = std::min(prefill_ubatch, prompt_len - start);
+        std::vector<int32_t> chunk(req.prompt.begin() + start,
+                                   req.prompt.begin() + start + n);
+        if (!target->verify_batch(chunk, start, last_tok, nullptr)) {


MTP should only change how we generate the batch (vs. draft). Can we normalize this code with the existing logic?

howard0su · 2026-05-20T23:11:49Z

@@ -0,0 +1,133 @@
+// qwen35_mtp_graph.h — CUDA cgraph for Qwen3.6 MTP head step forward.


We should use the existing graph API rather than a new one, unless we want to put the MTP head into another ggml backend.

howard0su · 2026-05-20T23:12:03Z

@@ -0,0 +1,225 @@
+// qwen35_mtp_loader.cpp — Discovery loader for Qwen3.6 -MTP-GGUF files.


See the previous comment about GGUF loading.

weicj · 2026-05-21T05:57:28Z

I’m working on a shared hardware/backend/runtime placement foundation for the C++ native server path (#236).

Could we align the MTP backend/device selection with that shared placement layer, instead of keeping a separate MTP-specific selection path?

This would avoid duplicated CUDA/HIP/device handling across MTP, DFlash, PFlash, and future speculative paths.

Mechanical rename per weicj's review on PR Luce-Org#237: the legacy namespace name baked the first backend (Qwen3-27B) into shared code. Renaming to dflash::common removes the backend leak from the substrate so future backends plug into a neutral namespace.

Scope:
- namespace dflash27b → namespace dflash::common
- dflash27b::* → dflash::common::*
- CMake static lib dflash27b → dflash_common
- CMake project(dflash27b) → project(dflash)
- Private CMake vars _dflash27b_* → _dflash_*
- Stale comment references

Out of scope (deferred to a follow-up):
- Public C header dflash/include/dflash27b.h
- C symbol dflash27b_last_error()
- Preprocessor macros DFLASH27B_*
- Env vars DFLASH27B_*

Build note: CUDA 12.6 + GCC 13.3 has a known _Float128 conflict during CUDA host-compiler ID detection. Workaround: -DCMAKE_CUDA_HOST_COMPILER=/usr/bin/g++-11

No behavior change. Clean build green. Symbol mangling confirmed as N6dflash6common* via nm; zero residual dflash27b symbols outside the deferred public C ABI.

@weicj

Mechanical rename per @weicj's review on Luce-Org#237. Renames the legacy backend-baked namespace to a neutral one so future backends plug into shared code without name leakage.

Scope:
- namespace dflash27b → namespace dflash::common (sources + tests)
- dflash27b::* → dflash::common::*
- CMake static lib dflash27b → dflash_common
- CMake project(dflash27b) → project(dflash)
- Private CMake vars _dflash27b_* → _dflash_*
- Stale comment references

Out of scope (deferred to a follow-up):
- Public C header dflash/include/dflash27b.h
- C symbol dflash27b_last_error()
- Preprocessor macros DFLASH27B_*
- Env vars DFLASH27B_*

Build note: CUDA 12.6 + GCC 13.3 has a known _Float128 conflict during CUDA host-compiler ID detection. Workaround: -DCMAKE_CUDA_HOST_COMPILER=/usr/bin/g++-11

No behavior change. Symbol mangling confirmed via nm as N6dflash6common*; no residual dflash27b mangled symbols outside the deferred public C ABI.

cubic-dev-ai

1 issue found across 8 files (changes from recent commits).

@howard0su

Per @howard0su's review on Luce-Org#237: gguf_mmap.h/.cpp are platform abstraction (POSIX mmap / Windows MapViewOfFile) that will be reused by multiple loaders. Extracting into a standalone PR ahead of the loader refactor (target/draft/MTP heads all benefit).

Includes the cubic P1 fixes from f6e8e94:
- open() is now idempotent (releases prior mapping before re-opening, leaves object in default state on any failure path)
- Windows release() now calls CloseHandle() before clearing handle_

New: test_gguf_mmap unit test covers open/close, idempotency, missing-file path, RAII destructor.

Stacked on Luce-Org#241 (uses dflash::common namespace from the start).
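The idempotent-open contract described in this commit can be sketched with a POSIX-only RAII wrapper. `MappedFile` is a hypothetical stand-in, not the actual gguf_mmap.h class; it omits the Windows path and uses a plain release-then-reopen rather than the placement-destroy approach mentioned in the review reply.

```cpp
#include <cassert>
#include <cstddef>
#include <fcntl.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <unistd.h>

class MappedFile {
public:
    MappedFile() = default;
    MappedFile(const MappedFile &) = delete;
    MappedFile & operator=(const MappedFile &) = delete;
    ~MappedFile() { release(); }

    // Idempotent: drops any prior mapping first. On every failure path the
    // object is left in its default (empty) state.
    bool open(const char * path) {
        release();
        const int fd = ::open(path, O_RDONLY);
        if (fd < 0) return false;
        struct stat st{};
        if (::fstat(fd, &st) != 0 || st.st_size <= 0) { ::close(fd); return false; }
        void * p = ::mmap(nullptr, static_cast<size_t>(st.st_size),
                          PROT_READ, MAP_PRIVATE, fd, 0);
        ::close(fd);  // the mapping stays valid after the descriptor is closed
        if (p == MAP_FAILED) return false;
        data_ = p;
        size_ = static_cast<size_t>(st.st_size);
        return true;
    }

    // Safe to call repeatedly; unmaps only when a mapping is held.
    void release() {
        if (data_) ::munmap(data_, size_);
        data_ = nullptr;
        size_ = 0;
    }

    const void * data() const { return data_; }
    size_t size() const { return size_; }

private:
    void * data_ = nullptr;
    size_t size_ = 0;
};
```

Calling `open()` twice on the same object leaves exactly one live mapping, and a failed `open()` on a missing file leaves `data()` null and `size()` zero.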

Per cubic P1 on PR Luce-Org#237: --mtp-source=none was silently overridden by the legacy-flag inference because MtpSource::None served both as the default and as the explicit 'none' value. Adds an internal MtpSource::Unset sentinel as the new default. Legacy-flag inference only fires when the field is still Unset. After inference (or if no legacy flag matched), Unset is resolved to None before any backend code sees it. User-visible CLI surface unchanged: --mtp-source still accepts exactly {none, native, external, auto}. Unset is internal-only and never escapes arg parsing. Defensive assert in create_backend() enforces this.
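The sentinel pattern described here can be sketched as follows. `resolve_mtp_source` and the `legacy_flag_seen` parameter are hypothetical, and mapping a legacy flag to Auto is an assumption made for illustration; the commit message only states that inference runs while the field is still Unset and that Unset never reaches backend code.

```cpp
#include <cassert>

// Unset is internal-only: it distinguishes "flag absent" from an explicit "none".
enum class MtpSource { Unset, None, Native, ExternalDrafter, Auto };

// `parsed` is what --mtp-source produced (Unset when the flag was absent).
// `legacy_flag_seen` stands in for the legacy-flag inference.
inline MtpSource resolve_mtp_source(MtpSource parsed, bool legacy_flag_seen) {
    MtpSource out = parsed;
    // Inference fires only when the user gave no explicit --mtp-source.
    if (out == MtpSource::Unset && legacy_flag_seen) out = MtpSource::Auto;
    // Anything still Unset becomes None before backend code sees it.
    if (out == MtpSource::Unset) out = MtpSource::None;
    assert(out != MtpSource::Unset);  // defensive: the sentinel must not escape
    return out;
}
```

With this shape, an explicit `--mtp-source=none` survives even when a legacy flag is also present, which is the behaviour the fix restores.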

@howard0su

Per @howard0su's review on Luce-Org#237 (lines 57, 59):
- 57: 'shall we use the unsloth single-file MTP-in-target GGUF?'
- 59: 'why not a enum?'

Replaces:

    const char * mtp_gguf_path    = nullptr;
    const char * mtp_draft_source = nullptr;  // "chain" | "mtp_topk"

with:

    enum class MtpSource { None, Native, ExternalDrafter, Auto };
    MtpSource    mtp_source    = MtpSource::None;
    const char * mtp_gguf_path = nullptr;  // only for ExternalDrafter
    bool         mtp_use_topk  = false;    // false=chain, true=mtp_topk

Adds gguf_contains_mtp_tensors() probe (keyed on qwen35.nextn_predict_layers metadata) so --mtp-gguf becomes optional when the primary GGUF embeds MTP tensors (unsloth single-file case). Stacked on Luce-Org#237.

dflash_server arg parsing updated to:
- --mtp-source [none|native|external|auto] (new explicit flag)
- --mtp-gguf PATH (now optional; only needed for ExternalDrafter)
- Old --mtp-draft-source string flag warns + ignored (migration aid)
- --mtp-gamma alone triggers Auto detection

All test_common_mtp_orchestrator tests still pass (mock-based, unaffected by the config-surface change).
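A minimal sketch of how the four accepted `--mtp-source` values might map onto the enum, assuming C++17. `parse_mtp_source` is a hypothetical helper, not a function named in the commit.

```cpp
#include <cassert>
#include <optional>
#include <string_view>

enum class MtpSource { None, Native, ExternalDrafter, Auto };

// Returns std::nullopt for anything outside {none, native, external, auto},
// so the caller can reject the argument instead of silently defaulting.
inline std::optional<MtpSource> parse_mtp_source(std::string_view s) {
    if (s == "none")     return MtpSource::None;
    if (s == "native")   return MtpSource::Native;
    if (s == "external") return MtpSource::ExternalDrafter;
    if (s == "auto")     return MtpSource::Auto;
    return std::nullopt;
}
```

Compared with the old string field, a typo such as `--mtp-source natve` fails at parse time rather than being compared against string literals deep in the backend.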

- server_main.cpp: gate --draft suppression and DFLASH27B_MTP_CTX on concrete MtpSource (Native|ExternalDrafter) instead of !None. Auto defers to backend GGUF probe; preserves --draft as fallback when Auto resolves to None.
- mtp_orchestrator.cpp: wrap io with req.on_token via DaemonIO::with_token_callback (mirrors laguna_backend.cpp:151 and gemma4_backend.cpp:172). MTP requests now get streaming disconnect cancellation and per-token callbacks.

20/20 MTP test suite still passes.

Ports the Qwen3.6 MTP head onto the qwen35 backbone (same arch, NextN block at layer n_layer-1). Speculation runs through a new common chain runner; the existing DFlashTarget adapter handles verify/snapshot/restore.

- common/mtp_interface.h: flavor-tagged IMtpModule + INativeMtp / IExternalDrafterMtp mixins. Future Gemma4 drafter plugs in via IExternalDrafterMtp without touching the chain runner.
- common/mtp_chain_runner.{h,cpp}: γ-chain propose/verify/accept loop, hoisted out of the backend. Three KV-reconciliation paths (accept-all / fast rollback / recommit) share a single post-iter invariant so AR equivalence holds under recommit.
- common/mtp_orchestrator.{h,cpp}: chunked prefill + warm + dispatch to chain runner. Owns only control flow; all compute lives in DFlashTarget::verify_batch and INativeMtp::step_batch graphs on the backend device.
- qwen36/qwen36_mtp.{h,cpp,_graph.cpp,_loader.cpp}: GGUF tensor inventory for Qwen3.6 -MTP-GGUF, GPU warm graph, GPU step graph cached on (head_idx, fa_window, fused_lm_head, topk_k). γ is bound at attach time as the single source of truth.
- qwen35: supports_mtp()/mtp() exposed through ModelBackend; generate() delegates to common::mtp::warm_and_decode when MTP is configured. Cache sized for max(γ+1, ddtree_budget+1) verify tokens.
- server.py: --mtp-gguf and --mtp-gamma flags routed through; daemon command surface unchanged.

Tests: 4/4 test_common_mtp_orchestrator. Full build green; harness probe 7/7 (claude_code, codex, opencode, openwebui, pi, hermes, openclaw) at --max-ctx 65536; MTP decode reports accept_rate 0.43-0.88 on short agentic prompts.
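The accept step of a γ-chain propose/verify/accept loop under greedy decoding can be sketched as below. This is a generic illustration of greedy speculative acceptance, not the code in mtp_chain_runner.cpp; `AcceptResult` and `accept_greedy` are hypothetical names, and KV reconciliation is left out.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

struct AcceptResult {
    int accept_n = 0;              // number of draft tokens accepted
    std::vector<int32_t> emitted;  // accepted drafts plus one target token
};

// draft:  d[0..γ-1], tokens proposed by the MTP head.
// target: t[0..γ], where t[i] is the target's greedy token after the prefix
//         extended by d[0..i-1] (one verify batch yields all of them).
inline AcceptResult accept_greedy(const std::vector<int32_t> & draft,
                                  const std::vector<int32_t> & target) {
    assert(target.size() == draft.size() + 1);
    AcceptResult r;
    // Accept the longest prefix on which draft and target agree.
    while (static_cast<size_t>(r.accept_n) < draft.size() &&
           draft[r.accept_n] == target[r.accept_n]) {
        r.emitted.push_back(draft[r.accept_n]);
        ++r.accept_n;
    }
    // The target's token at the first disagreement (or the bonus token after
    // a full accept) is always emitted, so every iteration makes progress.
    r.emitted.push_back(target[r.accept_n]);
    return r;
}
```

Because the emitted tokens are always a prefix of what the target would have produced on its own, the output matches plain AR decoding under greedy sampling; a partial accept (`accept_n < γ`) is the case where the runner has to roll back or recommit KV state.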

… paths

Adds unit tests exercising MtpChainRunner state machine (gamma propagation, EOS termination, partial-accept rollback, n_gen termination), MtpOrchestrator lifecycle (reset_chain ordering, set_initial_hidden plumbing, gamma derivation), and Qwen36MtpModule input-validation guards (attach(nullptr), out-of-range gamma, pre-attach calls, shutdown idempotency). All new tests are mock-based and require no GGUF.

Both qwen35_backend and the test_dflash harness called the 4-arg overload of Qwen36MtpModule::init() with backend.tensor_context() — the BACKBONE ggml_ctx. That ctx contains only blk.0..63; MTP head tensors live in blk.64.* in the separate MTP GGUF, so the loader could never resolve them and init failed silently before chain decode. Switched both call sites to the 3-arg overload init(gguf_path, target, err) which calls load_gguf_tensor_context(mtp_gguf_path) internally to build the correct context for tensor lookup. Caught by E2E sweep on real Qwen3.6-27B Q4_K_M + MTP-Q4_K_M GGUFs.

… is qwen35)

The GGUF general.architecture field on both 'Qwen3.5-0.8B-Q4_K_M.gguf' and 'Qwen3.6-27B-MTP-Q4_K_M.gguf' is 'qwen35'. The 'Qwen3.6' label lives only in general.name (model display name). The MTP module is a NextN extension of the qwen35 architecture, not a separate arch. Aligns naming with Qwen35Backend (the backbone) and the loader's arch check in gguf_target_loader.cpp:271. Pure rename, no functional change.

Files renamed (git mv, blame preserved):
- src/qwen36/qwen36_mtp.{h,cpp} → src/qwen35/qwen35_mtp.{h,cpp}
- src/qwen36/qwen36_mtp_graph.{h,cpp} → src/qwen35/qwen35_mtp_graph.{h,cpp}
- src/qwen36/qwen36_mtp_loader.cpp → src/qwen35/qwen35_mtp_loader.cpp
- dflash/src/qwen36/ directory removed

E2E re-verified on Qwen3.6-27B Q4_K_M + MTP-Q4_K_M (RTX 3090):
- AR (γ=0): 35.6 tok/s
- Chain γ=2: 57.8 tok/s = 1.62× over AR (accept_rate=0.83)
- Chain γ=4: 50.9 tok/s = 1.43× over AR (accept_rate=0.58)

Greedy token output byte-identical across all three modes.

Mechanical fixes from style review:
- qwen35_mtp_loader.cpp: rewrite header from stale 'PR 2 skeleton' narrative to current behavior
- mtp_chain_runner.cpp: remove dead if-block at the post-propose size check (comment-only body)
- Qwen35DaemonArgs::n renamed to Qwen35DaemonArgs::mtp_draft_source to match Qwen35Config::mtp_draft_source
- mtp_orchestrator namespace: drop unused 'common' layer (dflash27b::common::mtp -> dflash27b::mtp)
- mtp_orchestrator: dynamic_cast -> static_cast (flavor() already discriminates the concrete type)
- qwen35_mtp.cpp: drop redundant 'static' on TU-local helpers inside anonymous namespace
- qwen35_mtp.{h,cpp}: convert Phase-A/B roadmap comments to TODOs or remove (completed phases)
- qwen35_backend.h: label ensure_decode_cache/tensor_context as test-only (they remain public for harness access)
- dflash_target.h: forward-declare DDTree; move ddtree.h include to .cpp files that use it concretely
- mtp_chain_runner.h: clean stale PR-internal cross-reference
- qwen35_backend.h: terminology consistency (MTP module)
- mtp_orchestrator.cpp: comment that enable_hidden_seq_capture is no-op for non-MTP targets
- test_common_mtp_orchestrator.cpp: remove PR-review narrative
- qwen35_backend.cpp: add fflush after [mtp] loaded printf

No functional change. E2E sweep re-verified greedy byte-identical output across AR / chain γ=2 / chain γ=4.

Adds --mtp-gguf, --mtp-gamma, --mtp-draft-source, --mtp-draft-topk to the native C++ HTTP server (dflash/src/server/dflash_server). Threads them through BackendArgs into Qwen35Config so the existing Qwen35Backend MTP path (already wired) is reachable through the production HTTP endpoint, not just the test_dflash harness. Mirror of the existing flags in dflash/scripts/server.py. Verified via HTTP /v1/chat/completions: AR baseline and chain γ=2 produce greedy-identical text under temperature=0 on Qwen3.6-27B Q4_K_M + MTP-Q4_K_M. Also confirmed with unsloth single-file combined GGUF (Qwen3.6-27B-Q4_K_M-mtp.gguf) passed as --mtp-gguf — identical output, greedy equivalent.

Record the latest unattended PR probe pass, including fresh direct-merge conflict checks for the remaining non-integrated PRs and a usable tmux/Codex feasibility report for PR Luce-Org#237. No code changes.

Record the 2026-05-29 19:36 cron preflight, fresh direct-merge probes for the remaining non-integrated PRs, and a tmux/Codex feasibility report for PR Luce-Org#237. No code changes were made.

Record the 2026-05-29 20:14 unattended integration pass, including fresh conflict probes for unresolved old-layout PRs and the latest tmux Claude/Codex delegation results for PR Luce-Org#237.

Record the 2026-05-29 21:56 cron refresh, fresh direct-merge probes for unresolved old-layout PRs, and the latest tmux-driven Luce-Org#237 Claude/Codex delegation outcome. No code changes are included.

Record the 2026-05-29 22:17 unattended integration probe pass. Note current included PR heads, retained conflicted worktrees, and the latest tmux-driven Luce-Org#237 salvage delegation results.

Record 2026-05-30 01:26 cron probe results, including fresh Codex review for PR Luce-Org#237 and retained probe artifact paths.

Record the 2026-05-30 02:03 UTC-4 unattended run, fresh direct-merge probes for remaining old-layout PRs, and a tmux-driven Codex feasibility review for PR Luce-Org#237.

Record the 2026-05-30 03:30 unattended reconciliation pass, including fresh conflicting probe worktrees, Codex Luce-Org#237 salvage review output, and validation results.

Record the May 30 04:31 unattended reconciliation, refreshed conflict probes, and a read-only Codex assessment for PR Luce-Org#237's current-layout selective port path.

Record the 2026-05-30 04:49 unattended reconciliation, fresh conflict probes, and renewed Luce-Org#237 Claude/Codex assessments. No stack code changes were required because current auto-integration already contains the mergeable non-draft PR heads.

Reconfirm open PR containment against origin/main and easel/auto-integration. Record fresh worktree probes for non-contained PRs plus read-only Luce-Org#237 Claude/Codex delegation attempts.

Record the 2026-05-30 08:35 unattended integration pass, including fresh direct-merge probes for non-contained PRs and the read-only Luce-Org#237 Claude/Codex delegation results.

Record the 2026-05-30 09:14 cron reconciliation pass, fresh conflict probes for the remaining non-contained PRs, and the Luce-Org#237 Claude/Codex delegation results.

Record the 2026-05-30 11:16 EDT unattended integration pass: refreshed PR-head containment, fresh conflicted probe worktrees, and Codex feasibility reports for the Luce-Org#305 and Luce-Org#237 selective-port candidates. No product-code stack changes were made.

Record the 2026-05-30 13:21 cron probe results, exact PR-head containment, retained worktrees, and the Codex Luce-Org#237 selective-port recommendation.

Record the 2026-05-30 15:09 cron pass, repeated conflict probes for remaining non-integrated PRs, and the tmux-delegated Luce-Org#237 selective-port assessment.

Record the 2026-05-30 15:57 cron pass, refreshed PR-head containment, fresh conflict probes for the remaining non-integrated PRs, and the tmux-driven Codex Luce-Org#237 feasibility report.

Record the 2026-05-30 17:00 EDT unattended reconciliation pass, including refreshed PR-head containment, conflict probes, and the bounded Claude rerun outcome for PR Luce-Org#237.

Record the 2026-05-30 17:38 unattended refresh, exact PR-head containment, repeated conflicted probe results, and the tmux-driven Luce-Org#237 selective-port feasibility audit.

Record the latest unattended PR enumeration, direct-merge probe results, and the tmux-driven Luce-Org#237 feasibility audit. No product code changed in this refresh.

Rerun direct merge probes for remaining non-integrated PRs and record the tmux-driven Codex audit for Luce-Org#237/Luce-Org#305. No product-code changes were integrated.

Record the 2026-05-30 23:50 cron pass, fresh conflict probes, and the tmux-driven Luce-Org#237 audit results.

Record the 2026-05-31 00:29 EDT unattended reconciliation run, current open PR containment, repeated conflict probes, and the Luce-Org#237 delegated audit outcomes.

Record the 2026-05-31 02:27 UTC-4 unattended run, fresh conflict probes, and the Codex-derived Luce-Org#237 selective MTP salvage plan.

Record 2026-05-31 03:14 cron preflight, current PR containment, fresh conflict probes for the remaining non-ancestor PRs, and the Luce-Org#237 Codex salvage audit.

Record the 2026-05-31 03:55 EDT unattended reconciliation run, including current PR-head containment, repeated conflict probes for the six remaining non-ancestor PRs, and the fresh Luce-Org#237/Luce-Org#135 delegated audit attempts.

Record 2026-05-31 04:10 cron reconciliation: no new PR heads, fresh conflict probes for Luce-Org#305/Luce-Org#237/Luce-Org#221/Luce-Org#154/Luce-Org#153/Luce-Org#135, and failed read-only delegation attempts for Luce-Org#237.

Record the 2026-05-31 cron preflight, exact PR containment check, repeated conflict probes, and the new Codex feasibility summary for PR Luce-Org#237's current-layout MTP port.

Port the safe current-layout common MTP interface, chain runner, and orchestrator slice from PR Luce-Org#237 without Qwen-specific runtime wiring. Add a common orchestrator regression test and CMake wiring.

Record the Luce-Org#237 common MTP foundation selective port, validation, remaining PR classification, and retained worktrees from the unattended integration run.

dusterbloom mentioned this pull request May 20, 2026

feat(mtp): MTP-via-daemon end-to-end (incl. MTP infrastructure) #214

Closed

8 tasks

cubic-dev-ai Bot reviewed May 20, 2026

Comment thread dflash/src/common/mtp_chain_runner.cpp Outdated

Comment thread dflash/src/common/mtp_orchestrator.cpp

dusterbloom marked this pull request as draft May 20, 2026 19:36

dusterbloom marked this pull request as ready for review May 20, 2026 19:53

cubic-dev-ai Bot reviewed May 20, 2026

howard0su suggested changes May 20, 2026

dusterbloom mentioned this pull request May 21, 2026

refactor(dflash): rename namespace dflash27b → dflash::common #241

Merged

This was referenced May 21, 2026

refactor(common): extract gguf_mmap RAII wrapper as standalone PR #243

Merged

refactor(mtp): MtpSource enum + auto-detect MTP tensors dusterbloom/lucebox-hub#2

Merged

cubic-dev-ai Bot reviewed May 21, 2026

Comment thread dflash/src/server/server_main.cpp Outdated

dusterbloom force-pushed the feat/dflash-mtp-foundation branch from 1051e71 to 8ec9a95 Compare May 21, 2026 20:10

davide221 requested a review from howard0su May 22, 2026 09:07

dusterbloom added 6 commits May 23, 2026 11:42

easel pushed a commit to easel/lucebox-hub that referenced this pull request May 30, 2026

docs: refresh auto-integration probe manifest

11c709a

Record 2026-05-30 01:26 cron probe results, including fresh Codex review for PR Luce-Org#237 and retained probe artifact paths.

easel pushed a commit to easel/lucebox-hub that referenced this pull request May 30, 2026

docs: refresh auto-integration manifest

1a00f25

Record the 2026-05-30 13:21 cron probe results, exact PR-head containment, retained worktrees, and the Codex Luce-Org#237 selective-port recommendation.

easel pushed a commit to easel/lucebox-hub that referenced this pull request May 31, 2026

docs: refresh auto-integration manifest

4f59378

Record the 2026-05-30 23:50 cron pass, fresh conflict probes, and the tmux-driven Luce-Org#237 audit results.

easel pushed a commit to easel/lucebox-hub that referenced this pull request May 31, 2026

docs: refresh auto-integration manifest

9956605

Record the 2026-05-31 02:27 UTC-4 unattended run, fresh conflict probes, and the Codex-derived Luce-Org#237 selective MTP salvage plan.
