[LuceBox][DFlash][lucebox-pr314-common-empty-fallback][2/n] Default empty spec retry in backend calls #319
Open
OmarB97 wants to merge 3 commits into
Why
Howard's PR #314 review asked that the empty speculative-decode retry stay behind the normal backend call name, especially for `restore_and_generate`, with the retry enabled by default instead of exposed as a separate call-site helper.

What changed
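- `ModelBackend::generate` and `ModelBackend::restore_and_generate` serve as the public default call surface, with the retry applied by default.
- Backend subclass overrides are renamed to `generate_impl` and `restore_and_generate_impl`.

How to review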
Start with `server/src/common/model_backend.h` to verify the default retry wrapper, then scan the call-site cleanup in `server/src/common/daemon_loop.cpp` and `server/src/server/http_server.cpp`. The remaining backend changes are mechanical override renames.

Evidence
- `test_server_unit` stdout on `taro`: `Results: 1620 assertions, 0 failures` and `ALL PASSED`.
- `dflash_server` build on `taro`: `[100%] Built target dflash_server`.

Verification
- `git diff --check`
- `taro`, from `/tmp/lucebox-pr314-restore-default-a079a4b`: `cmake -S server -B server/build -DCMAKE_BUILD_TYPE=Release -DCMAKE_CUDA_COMPILER=/usr/local/cuda/bin/nvcc -DCUDAToolkit_ROOT=/usr/local/cuda -DCMAKE_CUDA_ARCHITECTURES=120 -DDFLASH27B_FA_ALL_QUANTS=OFF`
- `taro`: `cmake --build server/build --target test_server_unit -j$(nproc)`
- `taro`: `./server/build/test_server_unit` -> 1620 assertions, 0 failures
- `taro`: `cmake --build server/build --target dflash_server -j$(nproc)`

Risks / gaps
The implementation rename touches every backend subclass, so the main risk is a missed override or call site. No follow-up task is needed: the focused CUDA build compiled `dflash_common`, `test_server_unit`, and `dflash_server` on `sm_120`, which covers the touched call surface.

Collaborators
ko-mac on May 31, 2026. ko-mac (ko-mac.codex#629f416c13) implemented and verified the change for MeshBoard task `lucebox-pr314-common-empty-fallback`, using `taro` only as the CUDA build host.
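
For reviewers who want a mental model before opening `model_backend.h`, the default-retry wrapper described under "What changed" can be pictured roughly as below. This is a minimal sketch, not the code in this PR: `GenResult`, the parameter lists, the `retry_if_empty` helper, the `FlakyBackend` test double, and the "retry once when the result is empty" policy are all assumptions made for illustration. Only the names `ModelBackend`, `generate`, `restore_and_generate`, `generate_impl`, and `restore_and_generate_impl` come from the description above.

```cpp
#include <string>
#include <vector>

// Hypothetical result type; the real return type is not shown in this PR.
struct GenResult {
    std::vector<int> tokens;  // empty means the attempt produced nothing
};

class ModelBackend {
public:
    virtual ~ModelBackend() = default;

    // Public call surface: callers use these names and get the retry by default.
    GenResult generate(const std::string& prompt) {
        return retry_if_empty([&] { return generate_impl(prompt); });
    }

    GenResult restore_and_generate(int slot, const std::string& prompt) {
        return retry_if_empty([&] { return restore_and_generate_impl(slot, prompt); });
    }

protected:
    // Backend subclasses override only the *_impl hooks (signatures assumed).
    virtual GenResult generate_impl(const std::string& prompt) = 0;
    virtual GenResult restore_and_generate_impl(int slot, const std::string& prompt) = 0;

private:
    // Assumed policy: one retry when the first attempt comes back empty.
    template <typename Call>
    static GenResult retry_if_empty(Call&& call) {
        GenResult result = call();
        if (result.tokens.empty()) {
            result = call();
        }
        return result;
    }
};

// Hypothetical test double: the first attempt is empty, later attempts succeed.
class FlakyBackend : public ModelBackend {
public:
    int attempts = 0;

protected:
    GenResult generate_impl(const std::string&) override { return next(); }
    GenResult restore_and_generate_impl(int, const std::string&) override { return next(); }

private:
    GenResult next() {
        ++attempts;
        return attempts == 1 ? GenResult{} : GenResult{{1, 2, 3}};
    }
};
```

With this shape, a missed override shows up as a compile error (the subclass stays abstract or its `override` no longer matches), which is consistent with the build-based verification listed above.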