
[LuceBox][DFlash][lucebox-pr314-common-empty-fallback][2/n] Default empty spec retry in backend calls#319

Open
OmarB97 wants to merge 3 commits into Luce-Org:main from OmarB97:codex/pr314-restore-default


OmarB97 commented May 31, 2026

Why

Howard's PR #314 review asked that the empty speculative-decode retry stay behind the normal backend call name, especially for restore_and_generate, with the retry enabled by default instead of exposed as a separate call-site helper.

What changed

  • Kept ModelBackend::generate and ModelBackend::restore_and_generate as the public default call surface.
  • Moved backend-specific implementations behind generate_impl and restore_and_generate_impl.
  • Updated daemon, HTTP, backend subclasses, and unit tests to use the default call names while preserving the centralized zero-token speculative retry.
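
The wrapper shape described above can be sketched roughly as follows. This is a minimal illustration, not the code in server/src/common/model_backend.h: the GenerateRequest and GenerateResult types, the speculative flag, and the FlakySpecBackend class are hypothetical stand-ins, and the actual retry condition and signatures may differ. Only the names ModelBackend, generate, and generate_impl come from this PR.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical request/result types, invented for this sketch.
struct GenerateRequest {
    std::string prompt;
    bool speculative = true;  // assumed flag: request speculative decoding
};

struct GenerateResult {
    std::vector<int> tokens;
};

class ModelBackend {
public:
    virtual ~ModelBackend() = default;

    // Public default call surface: callers keep using this name and get
    // the empty-output retry without opting in at the call site.
    GenerateResult generate(const GenerateRequest& req) {
        GenerateResult out = generate_impl(req);
        if (req.speculative && out.tokens.empty()) {
            GenerateRequest retry = req;
            retry.speculative = false;  // assumed fallback: plain autoregressive decode
            out = generate_impl(retry);
        }
        return out;
    }

protected:
    // Backend-specific work lives behind the _impl hook.
    virtual GenerateResult generate_impl(const GenerateRequest& req) = 0;
};

// Toy backend whose speculative path always produces zero tokens.
class FlakySpecBackend : public ModelBackend {
public:
    int impl_calls = 0;

protected:
    GenerateResult generate_impl(const GenerateRequest& req) override {
        ++impl_calls;
        if (req.speculative) return {};
        return GenerateResult{{1, 2, 3}};
    }
};

// Usage check: one public call results in two impl calls (speculative, then retry).
inline void check_retry() {
    FlakySpecBackend backend;
    GenerateResult out = backend.generate({"hi", true});
    assert(out.tokens.size() == 3);
    assert(backend.impl_calls == 2);
}
```

Keeping the retry in the non-virtual public method means every call site that uses the normal name gets the same behavior, which is the property the #314 review asked for. restore_and_generate would follow the same pattern around restore_and_generate_impl.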

How to review

Start with server/src/common/model_backend.h to verify the default retry wrapper, then scan the call-site cleanup in server/src/common/daemon_loop.cpp and server/src/server/http_server.cpp. The remaining backend changes are mechanical override renames.

Evidence

Verification

  • git diff --check
  • On taro, from /tmp/lucebox-pr314-restore-default-a079a4b: cmake -S server -B server/build -DCMAKE_BUILD_TYPE=Release -DCMAKE_CUDA_COMPILER=/usr/local/cuda/bin/nvcc -DCUDAToolkit_ROOT=/usr/local/cuda -DCMAKE_CUDA_ARCHITECTURES=120 -DDFLASH27B_FA_ALL_QUANTS=OFF
  • On taro: cmake --build server/build --target test_server_unit -j$(nproc)
  • On taro: ./server/build/test_server_unit -> 1620 assertions, 0 failures
  • On taro: cmake --build server/build --target dflash_server -j$(nproc)

Risks / gaps

The implementation rename touches every backend subclass, so the main risk is a missed override or call site. No follow-up task is needed: the focused CUDA build compiled dflash_common, test_server_unit, and dflash_server on sm_120, which covers the touched call surface.
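
As a rough illustration of why a successful build is reasonable evidence here: assuming the _impl hook is declared pure virtual (an assumption; this description does not say so), a subclass that was not migrated to the new name stays abstract and cannot be instantiated. The Backend, MigratedBackend, and StaleBackend types below are hypothetical.

```cpp
#include <type_traits>

// Hypothetical base with a pure-virtual implementation hook.
struct Backend {
    virtual ~Backend() = default;
    int generate() { return generate_impl(); }

protected:
    virtual int generate_impl() = 0;
};

// Migrated subclass: overrides the renamed hook, so it is concrete.
struct MigratedBackend : Backend {
protected:
    int generate_impl() override { return 1; }
};

// Missed subclass: still defines the old name and never overrides the
// hook, so it remains abstract and any attempt to construct it fails
// to compile.
struct StaleBackend : Backend {
    int generate() { return 2; }
};

static_assert(!std::is_abstract<MigratedBackend>::value,
              "migrated backend is instantiable");
static_assert(std::is_abstract<StaleBackend>::value,
              "stale backend cannot be instantiated");
```

A subclass that kept the override keyword on the old name would also be rejected at compile time if the base method is no longer virtual.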

Collaborators

  • Omar Baradei requested this follow-up to PR #314 (fix(common): retry empty spec-decode output through AR) from ko-mac on May 31, 2026.
  • Codex on ko-mac (ko-mac.codex#629f416c13) implemented and verified the change for MeshBoard task lucebox-pr314-common-empty-fallback, using taro only as the CUDA build host.


cubic-dev-ai (bot) left a comment


No issues found across 16 files
