feat(server): add Laguna target-layer-split adapter by weicj · Pull Request #297 · Luce-Org/lucebox-hub

weicj · 2026-05-28T19:23:03Z

Summary

Add Laguna support to the shared C++ target layer-split path.

Laguna already has a single-card C++ target backend. After #295, the shared LayerSplitBackend path can also handle sampled requests when an adapter returns final-token logits. This PR adds the Laguna adapter on that foundation, so Laguna can use same-backend multi-GPU target layer split while preserving normal generation, CPU sampling, and prefix-cache snapshot/restore behavior.
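To illustrate what "sampled requests" need from an adapter, here is a minimal sketch of temperature sampling on the CPU from a final-token logits vector. The names `argmax_token` and `sample_token` are hypothetical and do not come from the PR; the shared CPU sampler in the repository may apply other steps (top-k, top-p, penalties) that are not shown here.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <random>
#include <vector>

// Hypothetical helper: greedy pick, the behavior available when only argmax is returned.
inline int argmax_token(const std::vector<float>& logits) {
    return static_cast<int>(
        std::max_element(logits.begin(), logits.end()) - logits.begin());
}

// Hypothetical helper: temperature sampling from final-token logits.
// Returns -1 for an empty vector; temperature <= 0 falls back to greedy.
inline int sample_token(const std::vector<float>& logits,
                        float temperature,
                        std::mt19937& rng) {
    if (logits.empty()) return -1;
    if (temperature <= 0.0f) return argmax_token(logits);

    // Subtract the max logit before exponentiating for numerical stability.
    const float max_logit = *std::max_element(logits.begin(), logits.end());
    std::vector<double> weights(logits.size());
    for (std::size_t i = 0; i < logits.size(); ++i) {
        weights[i] = std::exp(
            static_cast<double>(logits[i] - max_logit) / temperature);
    }

    // discrete_distribution normalizes the weights internally.
    std::discrete_distribution<int> dist(weights.begin(), weights.end());
    const int token = dist(rng);
    assert(token >= 0 && token < static_cast<int>(logits.size()));
    return token;
}
```

The point of the sketch is that sampling needs the full logits row for the last token, which is why an adapter that returns only an argmax index is limited to greedy decoding.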

Changes

Add LagunaLayerSplitAdapter and route arch=laguna layer-split placement through LayerSplitBackend.
Add Laguna partial target loading: each shard loads only its owned block layers, while the final shard keeps output_norm and output for logits.
Add partial Laguna KV-cache allocation and make snapshot save/restore skip layers that are not owned by the shard.
Add Laguna per-layer split forward execution, including activation handoff between same-backend GPUs.
Add a Laguna split projection helper that can return both argmax and final-token logits, so sampled requests use the shared CPU sampler instead of falling back to greedy-only behavior.
Preserve prefix-cache restore for sampled requests by saving/restoring the cached prefill logits with each layer-split snapshot.
Pass resolved device placement into the existing single-card Laguna backend instead of hardcoding GPU 0.
Include the Laguna adapter in the server build.
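The partial-loading and snapshot items above can be pictured with a small sketch. It assumes an even contiguous split of block layers across shards, which is only an illustration: the PR routes a resolved device placement and does not state the split policy. `ShardLayout`, `split_layers`, and `snapshot_layers` are hypothetical names, not identifiers from the PR.

```cpp
#include <cassert>
#include <vector>

// Hypothetical description of what one shard owns.
struct ShardLayout {
    int first_layer;        // inclusive
    int last_layer;         // exclusive
    bool owns_output_head;  // true only for the final shard (output_norm + output)

    bool owns_layer(int il) const {
        return il >= first_layer && il < last_layer;
    }
};

// Assumed policy: contiguous ranges, earlier shards take the remainder layers.
inline std::vector<ShardLayout> split_layers(int n_layers, int n_shards) {
    std::vector<ShardLayout> shards;
    if (n_layers <= 0 || n_shards <= 0) return shards;

    const int base = n_layers / n_shards;
    const int extra = n_layers % n_shards;
    int begin = 0;
    for (int s = 0; s < n_shards; ++s) {
        const int count = base + (s < extra ? 1 : 0);
        shards.push_back({begin, begin + count, s == n_shards - 1});
        begin += count;
    }
    assert(begin == n_layers);  // every layer is assigned exactly once
    return shards;
}

// Layers a shard would visit when saving or restoring a KV snapshot:
// layers it does not own are skipped because no cache was allocated for them.
inline std::vector<int> snapshot_layers(const ShardLayout& shard, int n_layers) {
    std::vector<int> owned;
    for (int il = 0; il < n_layers; ++il) {
        if (!shard.owns_layer(il)) continue;
        owned.push_back(il);
    }
    return owned;
}
```

Under this assumed policy, `split_layers(5, 2)` gives the first shard layers 0 to 2 and the final shard layers 3 to 4 plus the output head.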

Notes

This PR is intentionally target-only for Laguna. It does not add DFlash remote draft/spec-decode for Laguna or expand Laguna PFlash behavior.
Local validation: runtime smoke passed on dual Pro VII with Laguna-XS.2 Q4_K_M using hip:0,hip:1 target layer split, temperature=0.7, and prefix-cache restore on the second request.
Draft note: this PR is currently stacked on #295 (fix(server): support sampled requests in target layer split). After #295 lands, this branch should be rebased onto main so the final diff contains only the Laguna adapter changes.

Record the latest open PR classification, note that Luce-Org#297 is non-draft and already carried, and capture refreshed conflict probes plus Luce-Org#237 delegated feasibility results.

The post-push PR recheck showed Luce-Org#297 is draft again, so keep it as an already-carried draft dependency rather than a current non-draft target.

cubic-dev-ai

No issues found across 21 files


## What

Adds a richer GGUF identity reader on top of Howard Su's existing `gguf_inspect` (PR Luce-Org#305):

- `GgufMetadata` struct: captures `general.*` + `<arch>.*` header fields (architecture, name, file_type, quantization_version, block_count, embedding_length, context_length, vocab_size) with -1 / "" sentinels distinguishing "not in GGUF" from legitimate zero.
- `read_gguf_metadata(path, compute_sha256)`: best-effort header read; optional SHA-256 of the whole file.
- Self-contained SHA-256 mini-impl (RFC 6234): no OpenSSL dependency added for one hash.
- `<path>.sha256` sidecar caching: first server start hashes the file (~30s for a 17 GB GGUF on NVMe), subsequent starts read the sidecar. Sidecar I/O failures are non-fatal.
- `llama_ftype_name` decode: maps `general.file_type` ints to human-readable names ("Q4_K_M", "IQ4_XS", etc.) for /props.

## Why

`/props` schema-4 wants a single authoritative "exactly what binary + GGUF + quant + sha256 is loaded" payload so benchmarking and provenance tooling can pin model identity across runs without re-parsing GGUF headers in every consumer. The sidecar makes the SHA-256 free after the first boot, which is what makes it usable as a default-on identity field.

## Dependencies

None. This is purely additive on top of `gguf_inspect.{cpp,h}` as merged in PR Luce-Org#305: zero deletions, 333 insertions total. No other server files or build rules change in this PR; consumers will be wired up separately.

## Scope note

This PR is the extracted-and-cleaned remnant of the previously closed PR Luce-Org#336 after a provenance audit; everything else from that branch (c2_gate, qwen3 drafter changes, structural-defense loaders, and the inadvertent reverts of Luce-Org#273/Luce-Org#295/Luce-Org#297) is either landing through its canonical PR (Luce-Org#274) or being dropped entirely.
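The `<path>.sha256` sidecar behavior described above can be sketched roughly as follows. This is an illustration under assumptions, not the code from that PR: `cached_sha256` and `looks_like_sha256_hex` are hypothetical names, the hash function is passed in as a callback so the sketch stays short, and staleness detection (for example when the GGUF file is replaced) is not addressed because the description does not mention it.

```cpp
#include <cassert>
#include <fstream>
#include <functional>
#include <string>

// Hypothetical validator: 64 lowercase hex characters.
inline bool looks_like_sha256_hex(const std::string& s) {
    if (s.size() != 64) return false;
    for (char c : s) {
        const bool digit = c >= '0' && c <= '9';
        const bool lower = c >= 'a' && c <= 'f';
        if (!digit && !lower) return false;
    }
    return true;
}

// Hypothetical helper: return the digest from "<path>.sha256" when present and
// well-formed; otherwise compute it and try to write the sidecar.
// Sidecar read or write failures are ignored (non-fatal).
inline std::string cached_sha256(
    const std::string& path,
    const std::function<std::string(const std::string&)>& compute) {
    assert(!path.empty());
    const std::string sidecar = path + ".sha256";

    {
        std::ifstream in(sidecar);
        std::string cached;
        if (in && (in >> cached) && looks_like_sha256_hex(cached)) {
            return cached;  // fast path on later starts
        }
    }

    const std::string digest = compute(path);  // slow path: hash the whole file
    if (looks_like_sha256_hex(digest)) {
        std::ofstream out(sidecar, std::ios::trunc);
        if (out) out << digest << '\n';  // a failed write is silently skipped
    }
    return digest;
}
```

Passing the hash as a callback keeps the caching logic independent of which SHA-256 implementation is used.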

weicj added 2 commits May 29, 2026 02:07

fix(server): enable sampling for target layer split

a9aedf7

feat(server): add Laguna target-layer-split adapter

53dd168

easel pushed a commit to easel/lucebox-hub that referenced this pull request May 29, 2026

docs: correct final draft status for PR 297

22520ce


weicj mentioned this pull request May 29, 2026

refactor(server): share target layer-split runtime helpers #306

Merged

weicj marked this pull request as ready for review May 29, 2026 10:40

cubic-dev-ai Bot reviewed May 29, 2026


davide221 merged commit bdc706a into Luce-Org:main Jun 3, 2026
3 checks passed

This was referenced Jun 4, 2026

refactor(server): shared layer-split backend + GGUF inspection + c2-gate plumbing #336

Closed

feat(server): GgufMetadata reader + SHA-256 sidecar for /props schema-4 #344

Open

davide221 merged 2 commits into Luce-Org:main from weicj:feat-server-laguna-layer-split-adapter-v2

2 participants
