feat(server): add Laguna target-layer-split adapter #297
Merged
davide221 merged 2 commits on Jun 3, 2026
easel pushed a commit to easel/lucebox-hub that referenced this pull request on May 29, 2026
Record the latest open PR classification, note that Luce-Org#297 is non-draft and already carried, and capture refreshed conflict probes plus Luce-Org#237 delegated feasibility results.
easel pushed a commit to easel/lucebox-hub that referenced this pull request on May 29, 2026
The post-push PR recheck showed Luce-Org#297 is draft again, so keep it as an already-carried draft dependency rather than a current non-draft target.
This was referenced Jun 4, 2026
easel added a commit to easel/lucebox-hub that referenced this pull request on Jun 4, 2026
## What

Adds a richer GGUF identity reader on top of Howard Su's existing `gguf_inspect` (PR Luce-Org#305):

- `GgufMetadata` struct: captures `general.*` + `<arch>.*` header fields (architecture, name, file_type, quantization_version, block_count, embedding_length, context_length, vocab_size) with -1 / "" sentinels distinguishing "not in GGUF" from legitimate zero.
- `read_gguf_metadata(path, compute_sha256)`: best-effort header read; optional SHA-256 of the whole file.
- Self-contained SHA-256 mini-impl (RFC 6234): no OpenSSL dependency added for one hash.
- `<path>.sha256` sidecar caching: first server start hashes the file (~30s for a 17 GB GGUF on NVMe), subsequent starts read the sidecar. Sidecar I/O failures are non-fatal.
- `llama_ftype_name` decode: maps `general.file_type` ints to human-readable names ("Q4_K_M", "IQ4_XS", etc.) for /props.

## Why

`/props` schema-4 wants a single authoritative "exactly what binary + GGUF + quant + sha256 is loaded" payload so benchmarking and provenance tooling can pin model identity across runs without re-parsing GGUF headers in every consumer. The sidecar makes the SHA-256 free after the first boot, which is what makes it usable as a default-on identity field.

## Dependencies

None. This is purely additive on top of `gguf_inspect.{cpp,h}` as merged in PR Luce-Org#305: zero deletions, 333 insertions total. No other server files or build rules change in this PR; consumers will be wired up separately.

## Scope note

This PR is the extracted-and-cleaned remnant of the previously closed PR Luce-Org#336 after a provenance audit; everything else from that branch (c2_gate, qwen3 drafter changes, structural-defense loaders, and the inadvertent reverts of Luce-Org#273/Luce-Org#295/Luce-Org#297) is either landing through its canonical PR (Luce-Org#274) or being dropped entirely.
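The `<path>.sha256` sidecar behaviour described in the commit message can be sketched roughly as follows. This is a minimal illustration, not the code from that commit: `cached_sha256` and the `hash_file` callback are hypothetical names, and the hashing itself is left to the caller so the sketch stays independent of any particular SHA-256 implementation.

```cpp
#include <cctype>
#include <filesystem>
#include <fstream>
#include <functional>
#include <string>

namespace fs = std::filesystem;

// Hypothetical helper (not from the PR): returns the hex digest for
// `model_path`, reading `<model_path>.sha256` when it holds a plausible
// digest and falling back to the caller-supplied `hash_file` otherwise.
std::string cached_sha256(
    const fs::path& model_path,
    const std::function<std::string(const fs::path&)>& hash_file) {
    fs::path sidecar = model_path;
    sidecar += ".sha256";

    // Fast path: accept a sidecar that contains exactly 64 hex characters.
    {
        std::ifstream in(sidecar);
        std::string digest;
        if (in && (in >> digest) && digest.size() == 64) {
            bool all_hex = true;
            for (unsigned char c : digest) {
                if (!std::isxdigit(c)) { all_hex = false; break; }
            }
            if (all_hex) return digest;
        }
    }

    // Slow path: hash the whole file (the expensive first-boot step).
    std::string digest = hash_file(model_path);

    // Best effort: a failed sidecar write is ignored, so it is non-fatal.
    std::ofstream out(sidecar, std::ios::trunc);
    if (out) out << digest << '\n';
    return digest;
}
```

This sketch does not check whether the model file changed after the sidecar was written; how the actual commit handles staleness is not stated here.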
easel added a commit to easel/lucebox-hub that referenced this pull request on Jun 5, 2026 (commit message identical to the Jun 4 commit above)
Summary
Add Laguna support to the shared C++ target layer-split path.
Laguna already has a single-card C++ target backend. After #295, the shared `LayerSplitBackend` path can also handle sampled requests when an adapter returns final-token logits. This PR adds the Laguna adapter on that foundation, so Laguna can use same-backend multi-GPU target layer split while preserving normal generation, CPU sampling, and prefix-cache snapshot/restore behavior.

Changes
- Add `LagunaLayerSplitAdapter` and route `arch=laguna` layer-split placement through `LayerSplitBackend`.
- … `output_norm` and `output` for logits.

Notes
This PR is intentionally target-only for Laguna. It does not add DFlash remote draft/spec-decode for Laguna or expand Laguna PFlash behavior.
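To make "target layer split" concrete, here is a minimal sketch of dividing a model's layers across a device list such as `hip:0,hip:1`. The even, contiguous split is an assumed policy chosen for illustration; `LayerRange`, `parse_devices`, and `split_layers` are hypothetical names and do not describe how `LayerSplitBackend` actually places layers.

```cpp
#include <sstream>
#include <string>
#include <vector>

// Hypothetical description of which layers live on which device.
struct LayerRange {
    std::string device;  // e.g. "hip:0"
    int begin;           // first layer index (inclusive)
    int end;             // one past the last layer index
};

// Split a comma-separated device spec such as "hip:0,hip:1" into names.
std::vector<std::string> parse_devices(const std::string& spec) {
    std::vector<std::string> out;
    std::stringstream ss(spec);
    std::string item;
    while (std::getline(ss, item, ',')) {
        if (!item.empty()) out.push_back(item);
    }
    return out;
}

// Assumed policy: contiguous, near-even ranges; when the layer count does
// not divide evenly, earlier devices take one extra layer each.
std::vector<LayerRange> split_layers(int n_layers,
                                     const std::vector<std::string>& devices) {
    std::vector<LayerRange> ranges;
    if (devices.empty() || n_layers <= 0) return ranges;
    const int n_dev = static_cast<int>(devices.size());
    const int base = n_layers / n_dev;
    const int extra = n_layers % n_dev;
    int begin = 0;
    for (int i = 0; i < n_dev; ++i) {
        const int count = base + (i < extra ? 1 : 0);
        ranges.push_back({devices[i], begin, begin + count});
        begin += count;
    }
    return ranges;
}
```

With five layers and two devices, this policy assigns layers 0 to 2 to the first device and layers 3 to 4 to the second. A real backend would likely also weigh per-device memory, which this sketch ignores.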
Local validation: runtime smoke passed on dual Pro VII with Laguna-XS.2 Q4_K_M using `hip:0,hip:1` target layer split, `temperature=0.7`, and prefix-cache restore on the second request.

Draft note: this PR is currently stacked on #295 (fix(server): support sampled requests in target layer split). After #295 lands, this branch should be rebased onto main so the final diff contains only the Laguna adapter changes.
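For readers unfamiliar with what "CPU sampling" from final-token logits involves, the sketch below shows plain temperature sampling, as exercised by the `temperature=0.7` smoke test. It is an illustration under assumptions, not the server's sampler: `sample_token` is a hypothetical name, and treating a non-positive temperature as greedy decoding is a choice made here.

```cpp
#include <algorithm>
#include <cmath>
#include <cstddef>
#include <random>
#include <vector>

// Hypothetical sampler: picks a token id from final-token logits.
// A temperature <= 0 is treated as greedy (argmax) in this sketch.
// Returns -1 for an empty logits vector.
int sample_token(const std::vector<float>& logits, float temperature,
                 std::mt19937& rng) {
    if (logits.empty()) return -1;
    const auto max_it = std::max_element(logits.begin(), logits.end());
    if (temperature <= 0.0f) {
        return static_cast<int>(max_it - logits.begin());
    }
    // Subtract the max logit before exponentiating for numerical stability.
    std::vector<double> weights(logits.size());
    for (std::size_t i = 0; i < logits.size(); ++i) {
        weights[i] = std::exp((logits[i] - *max_it) / temperature);
    }
    // std::discrete_distribution normalizes the weights internally,
    // so this is equivalent to sampling from softmax(logits / temperature).
    std::discrete_distribution<int> dist(weights.begin(), weights.end());
    return dist(rng);
}
```

Because the adapter only has to hand back the final-token logits, a routine like this can run on the host regardless of which device produced them.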