feat(server): add Laguna target-layer-split adapter #297

Merged
davide221 merged 2 commits into Luce-Org:main from weicj:feat-server-laguna-layer-split-adapter-v2 on Jun 3, 2026

@weicj (Collaborator) commented May 28, 2026

Summary

Add Laguna support to the shared C++ target layer-split path.

Laguna already has a single-card C++ target backend. After #295, the shared LayerSplitBackend path can also handle sampled requests when an adapter returns final-token logits. This PR builds the Laguna adapter on that foundation, so Laguna can split its target layers across multiple GPUs on the same backend while preserving normal generation, CPU sampling, and prefix-cache snapshot/restore behavior.
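Why final-token logits matter for sampled requests can be sketched as follows. This is a minimal illustration, not the server's LayerSplitBackend code: `SplitStepResult`, `SamplingParams`, and `pick_next_token` are hypothetical names, and plain temperature sampling stands in for whatever the shared CPU sampler actually does.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstdint>
#include <optional>
#include <random>
#include <vector>

// Hypothetical per-step output of a layer-split adapter (not the real API).
struct SplitStepResult {
    int32_t argmax = -1;                       // always produced by the final shard
    std::optional<std::vector<float>> logits;  // present only if the adapter returns them
};

// Hypothetical sampling parameters; temperature <= 0 means greedy decoding.
struct SamplingParams {
    float temperature = 0.0f;
    uint64_t seed = 0;
};

// Greedy requests only need the argmax. Sampled requests need the final-token
// logits; without them the only option is to fall back to greedy.
int32_t pick_next_token(const SplitStepResult& step, const SamplingParams& p) {
    if (p.temperature <= 0.0f || !step.logits) {
        return step.argmax;
    }
    const std::vector<float>& z = *step.logits;
    assert(!z.empty());
    // Temperature softmax weights, shifted by the max logit for numerical stability.
    const float max_z = *std::max_element(z.begin(), z.end());
    std::vector<double> w(z.size());
    for (size_t i = 0; i < z.size(); ++i) {
        w[i] = std::exp((z[i] - max_z) / p.temperature);
    }
    std::mt19937_64 rng(p.seed);
    std::discrete_distribution<int> dist(w.begin(), w.end());
    return static_cast<int32_t>(dist(rng));
}
```

An adapter that reports only the argmax can still serve greedy requests; returning the logits as well is what lets a sampled request reach the CPU sampler.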

Changes

  • Add LagunaLayerSplitAdapter and route arch=laguna layer-split placement through LayerSplitBackend.
  • Add Laguna partial target loading: each shard loads only its owned block layers, while the final shard keeps output_norm and output for logits.
  • Add partial Laguna KV-cache allocation and make snapshot save/restore skip layers that are not owned by the shard.
  • Add Laguna per-layer split forward execution, including activation handoff between same-backend GPUs.
  • Add a Laguna split projection helper that can return both argmax and final-token logits, so sampled requests use the shared CPU sampler instead of falling back to greedy-only behavior.
  • Preserve prefix-cache restore for sampled requests by saving/restoring the cached prefill logits with each layer-split snapshot.
  • Pass resolved device placement into the existing single-card Laguna backend instead of hardcoding GPU 0.
  • Include the Laguna adapter in the server build.
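The ownership rules in the list above (each shard loads and caches only its block layers, the final shard also keeps `output_norm` and `output`, and snapshots skip unowned layers) can be sketched as below. This assumes an even contiguous split purely for illustration; the actual placement may weight shards differently, and `ShardPlan`, `plan_layer_split`, and `layers_to_snapshot` are hypothetical names.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical description of one GPU shard's slice of the target model.
struct ShardPlan {
    int device = 0;            // GPU index on the same backend
    int layer_begin = 0;       // first owned block layer (inclusive)
    int layer_end = 0;         // one past the last owned block layer
    bool owns_output = false;  // only the final shard keeps output_norm + output

    bool owns_layer(int il) const { return il >= layer_begin && il < layer_end; }
};

// Splits n_layers block layers into contiguous ranges across n_shards devices.
// Earlier shards take one extra layer when the count does not divide evenly.
std::vector<ShardPlan> plan_layer_split(int n_layers, int n_shards) {
    assert(n_shards > 0 && n_layers >= n_shards);
    std::vector<ShardPlan> plans;
    int begin = 0;
    for (int s = 0; s < n_shards; ++s) {
        const int count = n_layers / n_shards + (s < n_layers % n_shards ? 1 : 0);
        plans.push_back(ShardPlan{s, begin, begin + count, s == n_shards - 1});
        begin += count;
    }
    return plans;
}

// Snapshot save on one shard: visit only the KV layers this shard allocated.
std::vector<int> layers_to_snapshot(const ShardPlan& plan, int n_layers) {
    std::vector<int> out;
    for (int il = 0; il < n_layers; ++il) {
        if (!plan.owns_layer(il)) continue;  // no KV cache exists here for this layer
        out.push_back(il);
    }
    return out;
}
```

Activations cross a shard boundary exactly where one `layer_end` meets the next shard's `layer_begin`, and only the shard with `owns_output` can produce argmax or logits.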

Notes

easel pushed a commit to easel/lucebox-hub that referenced this pull request May 29, 2026
Record the latest open PR classification, note that Luce-Org#297 is non-draft and already carried, and capture refreshed conflict probes plus Luce-Org#237 delegated feasibility results.
easel pushed a commit to easel/lucebox-hub that referenced this pull request May 29, 2026
The post-push PR recheck showed Luce-Org#297 is draft again, so keep it as an already-carried draft dependency rather than a current non-draft target.
@weicj weicj marked this pull request as ready for review May 29, 2026 10:40
@cubic-dev-ai (Bot, Contributor) left a comment


No issues found across 21 files


@davide221 davide221 merged commit bdc706a into Luce-Org:main Jun 3, 2026
3 checks passed
easel added a commit to easel/lucebox-hub that referenced this pull request Jun 4, 2026
## What

Adds a richer GGUF identity reader on top of Howard Su's existing
`gguf_inspect` (PR Luce-Org#305):

- `GgufMetadata` struct — captures `general.*` + `<arch>.*` header fields
  (architecture, name, file_type, quantization_version, block_count,
  embedding_length, context_length, vocab_size) with -1 / "" sentinels
  distinguishing "not in GGUF" from legitimate zero.
- `read_gguf_metadata(path, compute_sha256)` — best-effort header read;
  optional SHA-256 of the whole file.
- Self-contained SHA-256 mini-impl (RFC 6234) — no OpenSSL dependency
  added for one hash.
- `<path>.sha256` sidecar caching — first server start hashes the file
  (~30s for a 17 GB GGUF on NVMe), subsequent starts read the sidecar.
  Sidecar I/O failures are non-fatal.
- `llama_ftype_name` decode — maps `general.file_type` ints to
  human-readable names ("Q4_K_M", "IQ4_XS", etc.) for /props.
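The sentinel convention and the `file_type` decode can be sketched as follows. `GgufMetadataSketch` and `ftype_name_sketch` are hypothetical stand-ins, not the PR's `GgufMetadata` or `llama_ftype_name`, and the numeric codes are a small subset assumed to match llama.cpp's `llama_ftype` enum; check them against `llama.h` before relying on them.

```cpp
#include <cassert>
#include <cstdint>
#include <string>

// Hypothetical mirror of the listed fields. -1 / "" mean "key absent from the
// GGUF header", so a legitimate 0 read from the file stays distinguishable.
struct GgufMetadataSketch {
    std::string architecture;           // general.architecture
    std::string name;                   // general.name
    int64_t file_type = -1;             // general.file_type
    int64_t quantization_version = -1;  // general.quantization_version
    int64_t block_count = -1;           // <arch>.* fields below
    int64_t embedding_length = -1;
    int64_t context_length = -1;
    int64_t vocab_size = -1;
    std::string sha256;                 // empty unless hashing was requested
};

// Subset of llama_ftype codes (assumed values; verify against llama.h).
const char* ftype_name_sketch(int64_t ftype) {
    switch (ftype) {
        case -1: return "";  // general.file_type not present
        case 0:  return "F32";
        case 1:  return "F16";
        case 7:  return "Q8_0";
        case 15: return "Q4_K_M";
        case 18: return "Q6_K";
        case 30: return "IQ4_XS";
        default: return "unknown";
    }
}
```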

## Why

`/props` schema-4 wants a single authoritative "exactly what binary +
GGUF + quant + sha256 is loaded" payload so benchmarking and provenance
tooling can pin model identity across runs without re-parsing GGUF
headers in every consumer. The sidecar makes the SHA-256 free after the
first boot, which is what makes it usable as a default-on identity
field.
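The sidecar fast path can be sketched as below, assuming the sidecar holds the hex digest as its first whitespace-separated token. `sha256_with_sidecar` is a hypothetical name, and the full-file hasher is passed in as a callback rather than reimplementing SHA-256 here. This sketch trusts any well-formed sidecar; it does not check whether the GGUF changed after the hash was written.

```cpp
#include <cassert>
#include <cctype>
#include <fstream>
#include <functional>
#include <string>

// True if s has the shape of a hex-encoded SHA-256 digest (64 hex characters).
static bool is_sha256_hex(const std::string& s) {
    if (s.size() != 64) return false;
    for (unsigned char c : s) {
        if (!std::isxdigit(c)) return false;
    }
    return true;
}

// Returns the file's SHA-256, preferring the "<path>.sha256" sidecar.
// hash_file is a caller-supplied (hypothetical) full-file hasher.
// Sidecar read/write failures are ignored; the worst case is hashing again next start.
std::string sha256_with_sidecar(
        const std::string& path,
        const std::function<std::string(const std::string&)>& hash_file) {
    const std::string sidecar = path + ".sha256";
    {
        std::ifstream in(sidecar);
        std::string cached;
        if (in && (in >> cached) && is_sha256_hex(cached)) {
            return cached;  // fast path: the GGUF itself is never read
        }
    }
    const std::string digest = hash_file(path);  // slow path: reads the whole file
    if (is_sha256_hex(digest)) {
        std::ofstream out(sidecar);
        if (out) out << digest << '\n';  // best effort; a failed write is not fatal
    }
    return digest;
}
```

Only the first start pays for the full read; every later start costs one small file open.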

## Dependencies

None. This is purely additive on top of `gguf_inspect.{cpp,h}` as merged
in PR Luce-Org#305 — zero deletions, 333 insertions total. No other server
files or build rules change in this PR; consumers will be wired up
separately.

## Scope note

This PR is the extracted-and-cleaned remnant of the previously closed
PR Luce-Org#336 after a provenance audit; everything else from that branch
(c2_gate, qwen3 drafter changes, structural-defense loaders, and the
inadvertent reverts of Luce-Org#273/Luce-Org#295/Luce-Org#297) is either landing through its
canonical PR (Luce-Org#274) or being dropped entirely.
easel added a commit to easel/lucebox-hub that referenced this pull request Jun 5, 2026