feat(server): GgufMetadata reader + SHA-256 sidecar for /props schema-4 #344

Open: easel wants to merge 1 commit into Luce-Org:main from easel:feat/server-gguf-inspect

@easel commented Jun 4, 2026

What

Adds a richer GGUF identity reader on top of Howard Su's existing gguf_inspect (PR #305):

  • GgufMetadata struct — captures general.* + <arch>.* header fields (architecture, name, file_type, quantization_version, block_count, embedding_length, context_length, vocab_size) with -1 / "" sentinels distinguishing "not in GGUF" from legitimate zero.
  • read_gguf_metadata(path, compute_sha256) — best-effort header read; optional SHA-256 of the whole file.
  • Self-contained SHA-256 mini-impl (RFC 6234) — no OpenSSL dependency added for one hash.
  • <path>.sha256 sidecar caching — first server start hashes the file (~30s for a 17 GB GGUF on NVMe), subsequent starts read the sidecar. Sidecar I/O failures are non-fatal.
  • llama_ftype_name decode — maps general.file_type ints to human-readable names (Q4_K_M, IQ4_XS, etc.) for /props.
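
As a rough illustration of the sentinel idea and the file-type decode described in the list above, here is a minimal sketch. The field names follow the PR description, but the field types, the `sha256_hex` member, and the helper names `ftype_name_sketch` and `has_value` are assumptions made for this example, not the contents of `gguf_inspect.h`. The numeric file-type values are a small subset taken from llama.cpp's `llama_ftype` enum.

```cpp
#include <cstdint>
#include <string>

// Hypothetical layout: names mirror the PR description, types and defaults
// are assumptions for illustration only.
struct GgufMetadata {
    bool         ok = false;                 // header could be read at all
    std::string  architecture;               // "" means the key was absent
    std::string  name;                       // "" means the key was absent
    std::int64_t file_type = -1;             // -1 means the key was absent
    std::int64_t quantization_version = -1;
    std::int64_t block_count = -1;
    std::int64_t embedding_length = -1;
    std::int64_t context_length = -1;
    std::int64_t vocab_size = -1;
    std::string  file_type_name;             // decoded from file_type
    std::string  sha256_hex;                 // assumed field; empty if not computed
};

// Partial decode of general.file_type. Unknown values map to "" so callers
// can tell "not decodable" apart from a real name.
inline const char * ftype_name_sketch(std::int64_t ftype) {
    switch (ftype) {
        case 0:  return "F32";
        case 1:  return "F16";
        case 2:  return "Q4_0";
        case 7:  return "Q8_0";
        case 15: return "Q4_K_M";
        case 17: return "Q5_K_M";
        case 18: return "Q6_K";
        case 30: return "IQ4_XS";
        case 32: return "BF16";
        default: return "";
    }
}

// A numeric field is present when it is non-negative; zero is a real value.
inline bool has_value(std::int64_t field) { return field >= 0; }
```

With this shape, a consumer can treat `block_count == 0` as data and `block_count == -1` as "key missing" without a separate presence flag per field.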

Why

/props schema-4 wants a single authoritative "exactly what binary + GGUF + quant + sha256 is loaded" payload so benchmarking and provenance tooling can pin model identity across runs without re-parsing GGUF headers in every consumer. The sidecar makes SHA-256 free after the first boot, which is what makes it usable as a default-on identity field.
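
For readers who want to see what a dependency-free SHA-256 involves, the following is a compact sketch of the algorithm as specified in RFC 6234. It is not the code from this PR: the `sha256_sketch` namespace and `hex_digest` name are invented here, and it hashes an in-memory buffer in one shot, whereas hashing a multi-gigabyte GGUF would need to feed `compress` from a streaming file read.

```cpp
#include <cstddef>
#include <cstdint>
#include <string>
#include <vector>

namespace sha256_sketch {

// Round constants from the SHA-256 specification.
constexpr std::uint32_t K[64] = {
    0x428a2f98,0x71374491,0xb5c0fbcf,0xe9b5dba5,0x3956c25b,0x59f111f1,0x923f82a4,0xab1c5ed5,
    0xd807aa98,0x12835b01,0x243185be,0x550c7dc3,0x72be5d74,0x80deb1fe,0x9bdc06a7,0xc19bf174,
    0xe49b69c1,0xefbe4786,0x0fc19dc6,0x240ca1cc,0x2de92c6f,0x4a7484aa,0x5cb0a9dc,0x76f988da,
    0x983e5152,0xa831c66d,0xb00327c8,0xbf597fc7,0xc6e00bf3,0xd5a79147,0x06ca6351,0x14292967,
    0x27b70a85,0x2e1b2138,0x4d2c6dfc,0x53380d13,0x650a7354,0x766a0abb,0x81c2c92e,0x92722c85,
    0xa2bfe8a1,0xa81a664b,0xc24b8b70,0xc76c51a3,0xd192e819,0xd6990624,0xf40e3585,0x106aa070,
    0x19a4c116,0x1e376c08,0x2748774c,0x34b0bcb5,0x391c0cb3,0x4ed8aa4a,0x5b9cca4f,0x682e6ff3,
    0x748f82ee,0x78a5636f,0x84c87814,0x8cc70208,0x90befffa,0xa4506ceb,0xbef9a3f7,0xc67178f2};

inline std::uint32_t rotr(std::uint32_t x, unsigned n) { return (x >> n) | (x << (32 - n)); }

// Fold one 64-byte block into the running 8-word state.
inline void compress(std::uint32_t h[8], const std::uint8_t * p) {
    std::uint32_t w[64];
    for (int i = 0; i < 16; ++i)  // big-endian load of the 16 message words
        w[i] = (std::uint32_t(p[4*i]) << 24) | (std::uint32_t(p[4*i+1]) << 16) |
               (std::uint32_t(p[4*i+2]) << 8) | std::uint32_t(p[4*i+3]);
    for (int i = 16; i < 64; ++i) {  // message schedule expansion
        std::uint32_t s0 = rotr(w[i-15], 7) ^ rotr(w[i-15], 18) ^ (w[i-15] >> 3);
        std::uint32_t s1 = rotr(w[i-2], 17) ^ rotr(w[i-2], 19) ^ (w[i-2] >> 10);
        w[i] = w[i-16] + s0 + w[i-7] + s1;
    }
    std::uint32_t a = h[0], b = h[1], c = h[2], d = h[3];
    std::uint32_t e = h[4], f = h[5], g = h[6], hh = h[7];
    for (int i = 0; i < 64; ++i) {
        std::uint32_t S1  = rotr(e, 6) ^ rotr(e, 11) ^ rotr(e, 25);
        std::uint32_t ch  = (e & f) ^ (~e & g);
        std::uint32_t t1  = hh + S1 + ch + K[i] + w[i];
        std::uint32_t S0  = rotr(a, 2) ^ rotr(a, 13) ^ rotr(a, 22);
        std::uint32_t maj = (a & b) ^ (a & c) ^ (b & c);
        std::uint32_t t2  = S0 + maj;
        hh = g; g = f; f = e; e = d + t1; d = c; c = b; b = a; a = t1 + t2;
    }
    h[0] += a; h[1] += b; h[2] += c; h[3] += d;
    h[4] += e; h[5] += f; h[6] += g; h[7] += hh;
}

// One-shot digest of an in-memory buffer, returned as 64 lowercase hex chars.
inline std::string hex_digest(const std::string & data) {
    std::uint32_t h[8] = {0x6a09e667, 0xbb67ae85, 0x3c6ef372, 0xa54ff53a,
                          0x510e527f, 0x9b05688c, 0x1f83d9ab, 0x5be0cd19};
    std::vector<std::uint8_t> msg(data.begin(), data.end());
    const std::uint64_t bit_len = std::uint64_t(data.size()) * 8;
    msg.push_back(0x80);                                   // mandatory 1 bit
    while (msg.size() % 64 != 56) msg.push_back(0x00);     // zero padding
    for (int i = 7; i >= 0; --i)                           // 64-bit big-endian length
        msg.push_back(std::uint8_t(bit_len >> (8 * i)));
    for (std::size_t off = 0; off < msg.size(); off += 64) compress(h, msg.data() + off);

    static const char * digits = "0123456789abcdef";
    std::string out;
    for (std::uint32_t word : h)
        for (int shift = 28; shift >= 0; shift -= 4)
            out.push_back(digits[(word >> shift) & 0xF]);
    return out;
}

}  // namespace sha256_sketch
```

A quick sanity check for any such implementation is the standard test vector: the digest of the three bytes `abc` begins with `ba7816bf`.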

Dependencies

None. Purely additive on top of gguf_inspect.{cpp,h} as merged in PR #305 — zero deletions, 333 insertions total. No other server files or build rules change in this PR; /props consumers will be wired up in a follow-up.

Diff stat

 server/src/common/gguf_inspect.cpp | 293 +++++++++++++++++++++++++++++++++++++
 server/src/common/gguf_inspect.h   |  40 +++++
 2 files changed, 333 insertions(+)

Scope / provenance note

This PR is the extracted-and-cleaned remnant of the previously closed PR #336 after a provenance audit. Everything else from that branch (c2_gate, qwen3 drafter changes, structural-defense loaders, and the inadvertent reverts of #273 / #295 / #297) is either landing through its canonical PR (#274) or being dropped entirely. Only this 333-line additive layer on top of Howard's prior gguf_inspect is Erik's own work and survives the audit.

Test plan

  • Local build of dflash-server compiles unchanged (no other files touched, so this is mostly a "does it still link" check).
  • Manual read_gguf_metadata("…/qwen3-coder-30b-iq4_xs.gguf", true) call on a known model: confirm file_type_name == "IQ4_XS" and that block_count and context_length match llama-gguf-dump.
  • Verify sidecar: delete *.sha256, call once → sidecar appears with 64-hex-char SHA-256 + newline; call again → no re-hash (instrument with a print or strace).
  • Verify sentinel behavior on a GGUF missing general.quantization_version: field stays at -1, ok == true.
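
The sidecar check in the plan above can be exercised without hashing a real model. The sketch below is one possible shape for the cache logic, under stated assumptions: `cached_sha256` and `looks_like_sha256` are hypothetical names, and the hash function is injected as a callback so a test can count how often it runs. It does not detect a stale sidecar when the GGUF file changes underneath it; whether the PR handles that case is not stated here.

```cpp
#include <cctype>
#include <fstream>
#include <functional>
#include <string>

// True when `s` is exactly 64 hexadecimal characters.
inline bool looks_like_sha256(const std::string & s) {
    if (s.size() != 64) return false;
    for (unsigned char c : s)
        if (!std::isxdigit(c)) return false;
    return true;
}

// Hypothetical helper: return the digest stored in "<path>.sha256" when that
// file exists and is well-formed; otherwise call `hash_file` and try to write
// the sidecar. Sidecar read or write failures never cause an error.
inline std::string cached_sha256(
        const std::string & path,
        const std::function<std::string(const std::string &)> & hash_file) {
    const std::string sidecar = path + ".sha256";
    {
        std::ifstream in(sidecar);
        std::string line;
        if (in && std::getline(in, line) && looks_like_sha256(line)) {
            return line;  // cache hit: no hashing needed
        }
    }
    const std::string digest = hash_file(path);  // cache miss: the slow path
    if (looks_like_sha256(digest)) {
        std::ofstream out(sidecar, std::ios::trunc);
        if (out) out << digest << '\n';  // a failed write is silently ignored
    }
    return digest;
}
```

Passing a counting lambda as `hash_file` gives the "call again → no re-hash" check from the plan: the counter should read 1 after two calls on the same path.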

@easel marked this pull request as ready for review June 4, 2026 18:37

@cubic-dev-ai (bot) left a comment

2 issues found across 2 files

Comment thread server/src/common/gguf_inspect.cpp
Comment thread server/src/common/gguf_inspect.cpp Outdated
@easel force-pushed the feat/server-gguf-inspect branch from 532f678 to f7d8278 on June 4, 2026 23:20
easel pushed a commit to easel/lucebox-hub that referenced this pull request Jun 5, 2026