Skip to content

test(mtp): add RED-bar scripts for converter + e2e determinism#242

Merged
davide221 merged 3 commits into
Luce-Org:mainfrom
dusterbloom:test/mtp-red-bar-scripts
May 23, 2026
Merged

test(mtp): add RED-bar scripts for converter + e2e determinism#242
davide221 merged 3 commits into
Luce-Org:mainfrom
dusterbloom:test/mtp-red-bar-scripts

Conversation

@dusterbloom
Copy link
Copy Markdown
Collaborator

Summary

Two shell test scripts carved out of #241 per @howard0su's review request to split the rename PR from the test additions.

What's added

  • dflash/test/test_mtp_converter.shPhase 1 RED test for convert_dflash_to_gguf.py --mtp-assistant mode. Validates the converter's GGUF output matches the AtomicChat gold reference on (1) tensor list, (2) required MTP tensors, (3) required GGUF metadata keys, (4) requires_target_arch == "gemma4". Exits 77 (autotools skip code) cleanly if the gold reference at $MTP_GOLD_GGUF is missing.

  • dflash/test/test_mtp_e2e.shPhase 3 RED test for --draft-method mtp byte-identical output vs --draft-method none at temp=0 seed=0. Exits 77 cleanly if the test_gemma4_dflash binary or the model files are missing.

Safe to land now

Both scripts:

  • Use #!/usr/bin/env bash + set -euo pipefail
  • Have no C++ namespace dependencies
  • Exit 77 (skip) when prerequisites aren't on disk
  • Have no global state, no side effects beyond writing to /tmp

They'll turn GREEN when the corresponding MTP work lands. Until then they're CI-stable skips.

Not in this PR

The third test file flagged in #241, dflash/test/test_flash_attn_sparse_mask.cpp, is deferred. It's a TDD red-bar that intentionally references symbols (flash_prefill_forward_bf16_masked(), 6-arg ggml_flash_attn_sparse()) that don't exist yet, so adding it to main without the spike that turns it green would leave a permanently broken build target. It will ship with the sparse-mask spike PR.

Copy link
Copy Markdown
Contributor

@cubic-dev-ai cubic-dev-ai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

2 issues found across 2 files

Reply with feedback, questions, or to request a fix.

Re-trigger cubic

Comment thread dflash/test/test_mtp_converter.sh Outdated
Comment thread dflash/test/test_mtp_converter.sh
dusterbloom added a commit to dusterbloom/lucebox-hub that referenced this pull request May 21, 2026
Two cubic findings on PR Luce-Org#242 (both valid P2):
- Tensor-list extraction used 'grep -E \d' which matches a literal
  backslash-d in POSIX ERE, not a digit. Switched to [0-9].
- target-arch check was substring-based and would pass for values
  merely containing 'gemma4'. Switched to exact-equality.

Script still exits 77 cleanly when prerequisites (gold model,
safetensors) are absent.
@davide221
Copy link
Copy Markdown
Contributor

Submodule rollback hidden in diff, can you fix?

Two shell tests carved out of Luce-Org#241 per @howard0su's review request:

- test_mtp_converter.sh: Phase 1 RED for convert_dflash_to_gguf.py
  --mtp-assistant mode. Validates the converter output matches the
  AtomicChat gold reference (tensor list, required MTP tensors, GGUF
  metadata keys, requires_target_arch == gemma4). Exits 77 cleanly if
  gold reference missing.

- test_mtp_e2e.sh: Phase 3 RED for --draft-method mtp byte-identical
  output vs --draft-method none at temp=0 seed=0. Exits 77 cleanly if
  test_gemma4_dflash binary or models missing.

Both safe to land now: scripts skip gracefully when their prerequisites
aren't on disk, so they're CI-stable. Will turn GREEN when the
corresponding MTP work lands.
Two cubic findings on PR Luce-Org#242 (both valid P2):
- Tensor-list extraction used 'grep -E \d' which matches a literal
  backslash-d in POSIX ERE, not a digit. Switched to [0-9].
- target-arch check was substring-based and would pass for values
  merely containing 'gemma4'. Switched to exact-equality.

Script still exits 77 cleanly when prerequisites (gold model,
safetensors) are absent.
@dusterbloom dusterbloom force-pushed the test/mtp-red-bar-scripts branch from 2ce2b8a to 03fdc4d Compare May 23, 2026 09:32
…ter assertions

- bump dflash/deps/llama.cpp to b896cf69 (matches origin/main; Luce-Org#242 rebase missed the submodule)
- replace POSIX-incompatible \d with [0-9] in tensor-list grep (cubic P2, lines 35+37)
- tighten gemma4 arch check from substring to exact match (cubic P2, line 79)
@davide221 davide221 merged commit afc10fc into Luce-Org:main May 23, 2026
3 checks passed
easel pushed a commit to easel/lucebox-hub that referenced this pull request May 24, 2026
…ipts

test(mtp): add RED-bar scripts for converter + e2e determinism
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants