feat(server): support DFlash with mixed-backend target layer split #321
Open
weicj wants to merge 6 commits into
easel pushed a commit to easel/lucebox-hub that referenced this pull request on May 31, 2026
Record draft PR Luce-Org#321 from the final PR re-enumeration and confirm no new non-draft PR appeared after the auto-integration push.
72107e2 to 71b3e98
Contributor
6 issues found across 43 files
easel pushed a commit to easel/lucebox-hub that referenced this pull request on May 31, 2026
Record the Luce-Org#291/Luce-Org#290 draft-residency integration, newly non-draft Luce-Org#321/Luce-Org#325 classification, validation, and retained worktree/transcript paths for the May 31 13:30 UTC run.
Contributor
1 issue found across 10 files (changes from recent commits).
Prompt for AI agents (unresolved issues)
Check if these issues are valid — if so, understand the root cause of each and fix them. If appropriate, use sub-agents to investigate and fix each issue separately.
<file name="server/src/qwen35/qwen35_target_shard_ipc.cpp">
<violation number="1" location="server/src/qwen35/qwen35_target_shard_ipc.cpp:60">
P2: The negative-value guard only checks `raw[0]`, so signed inputs with leading whitespace (for example `" -1"`) still pass through `strtoull` and can produce an unintended huge shared-memory size.</violation>
</file>
Comment on lines +60 to +64
    if (raw[0] == '-') {
        return required_bytes;
    }
    char * end = nullptr;
    const unsigned long long parsed = std::strtoull(raw, &end, 10);
Contributor
P2: The negative-value guard only checks raw[0], so signed inputs with leading whitespace (for example " -1") still pass through strtoull and can produce an unintended huge shared-memory size.
Prompt for AI agents
Check if this issue is valid — if so, understand the root cause and fix it. At server/src/qwen35/qwen35_target_shard_ipc.cpp, line 60:
<comment>The negative-value guard only checks `raw[0]`, so signed inputs with leading whitespace (for example `" -1"`) still pass through `strtoull` and can produce an unintended huge shared-memory size.</comment>
<file context>
@@ -57,9 +57,13 @@ size_t target_shard_shared_bytes_from_env(size_t required_bytes) {
if (!raw || !*raw) {
return required_bytes;
}
+ if (raw[0] == '-') {
+ return required_bytes;
+ }
</file context>
Suggested change
-    if (raw[0] == '-') {
-        return required_bytes;
-    }
-    char * end = nullptr;
-    const unsigned long long parsed = std::strtoull(raw, &end, 10);
+    const char * p = raw;
+    while (*p == ' ' || *p == '\t' || *p == '\n' || *p == '\r' || *p == '\f' || *p == '\v') {
+        ++p;
+    }
+    if (*p == '-') {
+        return required_bytes;
+    }
+    char * end = nullptr;
+    const unsigned long long parsed = std::strtoull(p, &end, 10);
+    if (end == p || *end != '\0' ||
+        parsed > (unsigned long long)std::numeric_limits<size_t>::max()) {
+        return required_bytes;
+    }
easel pushed a commit to easel/lucebox-hub that referenced this pull request on May 31, 2026
Record the exact Luce-Org#290/Luce-Org#291 merges, current Luce-Org#321/Luce-Org#325 classification, retained worktrees, and validation for the 2026-05-31 13:57 integration run.
easel pushed a commit to easel/lucebox-hub that referenced this pull request on May 31, 2026
Selectively carries the same-backend Qwen3.5 layer-split disk prefix-cache snapshot export/adopt slice from PR Luce-Org#325 while leaving the mixed-backend runtime and Laguna cache work blocked on the larger PR Luce-Org#321 architecture reconciliation. Also refreshes the auto-integration manifest/run log with the current PR classification and retained worktree notes.
easel pushed a commit to easel/lucebox-hub that referenced this pull request on May 31, 2026
Record the 2026-05-31 15:02 unattended run: exact open-PR containment, fresh conflict probe counts for the eight remaining non-ancestor candidates, and the tmux-driven Luce-Org#321 Claude/Codex read-only attempts. No source changes were promoted.
easel pushed a commit to easel/lucebox-hub that referenced this pull request on May 31, 2026
Carry the conflict-free PR Luce-Org#321 placement foundation over the current auto-integration stack. DevicePlacement now records per-shard backends, parses mixed-backend layer-split device lists, validates duplicate devices by backend plus GPU, and extends placement unit coverage.
The target-shard IPC/runtime pieces remain documented as pending selective-port work.
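The placement behavior described in this commit message can be illustrated with a small sketch. It assumes the `backend:index` comma-separated syntax seen in this PR's notes (`cuda:0,hip:0,hip:1`); `DeviceSpec` and `parse_device_list` are hypothetical names, not the actual `DevicePlacement` API, and the real validation rules may differ.

```cpp
#include <cstddef>
#include <set>
#include <string>
#include <utility>
#include <vector>

// Hypothetical record for one shard's placement: backend name plus GPU index.
struct DeviceSpec {
    std::string backend;
    int gpu;
};

// Parse "cuda:0,hip:0,hip:1" into per-shard specs. Returns false on malformed
// input or when the same (backend, gpu) pair appears twice; `out` may then
// hold a partial result and should be ignored.
static bool parse_device_list(const std::string & list, std::vector<DeviceSpec> & out) {
    out.clear();
    std::set<std::pair<std::string, int>> seen;
    std::size_t pos = 0;
    while (pos <= list.size()) {
        std::size_t comma = list.find(',', pos);
        if (comma == std::string::npos) {
            comma = list.size();
        }
        const std::string item = list.substr(pos, comma - pos);
        const std::size_t colon = item.find(':');
        if (colon == std::string::npos || colon == 0 || colon + 1 >= item.size()) {
            return false;  // missing backend, missing index, or empty item
        }
        const std::string backend = item.substr(0, colon);
        const std::string index = item.substr(colon + 1);
        if (index.size() > 4 || index.find_first_not_of("0123456789") != std::string::npos) {
            return false;  // index must be a short run of decimal digits
        }
        const int gpu = std::stoi(index);
        // Duplicates are keyed by backend plus GPU, so cuda:0 and hip:0 can coexist.
        if (!seen.insert({backend, gpu}).second) {
            return false;
        }
        out.push_back({backend, gpu});
        pos = comma + 1;
    }
    return !out.empty();
}
```

Keying the duplicate check on the pair rather than on the GPU index alone is what allows `cuda:0,hip:0` while still rejecting `hip:0,hip:0`.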
easel pushed a commit to easel/lucebox-hub that referenced this pull request on May 31, 2026
Port a narrow PR Luce-Org#321 control-plane slice by adding RemoteTargetShardConfig, threading it through BackendArgs, and parsing/printing the target-shard IPC CLI options without enabling mixed-backend execution yet. Refresh the auto-integration manifest with current probe/delegation results.
easel pushed a commit to easel/lucebox-hub that referenced this pull request on May 31, 2026
Port a narrow PR Luce-Org#321 runtime slice into the auto-integration stack: resolve a null-safe log prefix once, use it consistently for layer-split runtime diagnostics/snapshot setup, and stamp shard metadata with each configured per-shard placement backend.
Also refresh the auto-integration manifest with current PR classification, probe counts, retained worktrees, and validation notes.
easel pushed a commit to easel/lucebox-hub that referenced this pull request on May 31, 2026
Carry the next narrow PR Luce-Org#321 slice by passing the staged RemoteTargetShardConfig from BackendArgs into Qwen35LayerSplitAdapterConfig. Also add the LayerSplitShardMeta placement_backend field required by the previously ported runtime metadata slice.
Validation: git diff --check; conflict-marker scan on promoted source files; stub g++ syntax smoke for LayerSplitShardMeta::placement_backend. Full CMake remains locally blocked by missing server deps/CUDA compiler-id environment issues.
easel pushed a commit to easel/lucebox-hub that referenced this pull request on May 31, 2026
Selective-port a no-op-safe slice from PR Luce-Org#321 by adding the backend IPC mode parse/name surface and declaration-only Qwen35 target-shard IPC client/daemon contract. Runtime implementation, CMake wiring, daemon dispatch, and mixed-backend activation remain intentionally deferred until the broader layer-split conflicts are reconciled. Update the auto-integration manifest with current PR classifications, retained worktrees, validation, and Codex delegation evidence.
easel pushed a commit to easel/lucebox-hub that referenced this pull request on May 31, 2026
Port the inert PR Luce-Org#321 target-shard IPC client implementation and register it with dflash_common. The client remains unactivated until daemon dispatch and runtime adapter wiring are reconciled. Validation: git diff --check; conflict-marker search; YAML parse. Local syntax probing remains blocked by the missing vendored ggml-backend.h dependency in this checkout.
easel pushed a commit to easel/lucebox-hub that referenced this pull request on May 31, 2026
Port a narrow Luce-Org#321 current-layout slice by making inactive Qwen35 target-shard IPC state/snapshot calls no-op successes. This lets future runtime adapter hooks call snapshot/reset/restore helpers safely before the mixed-backend target-shard client is active.
Update auto-integration manifest with current PR containment, probe results, Codex delegation outcome, validation, and retained worktrees.
easel pushed a commit to easel/lucebox-hub that referenced this pull request on Jun 1, 2026
Record the 2026-05-31 19:49 cron preflight, current PR containment, direct-merge probe counts, and the unpromoted PR Luce-Org#321 daemon-dispatch attempt blocked by the missing current-layout forward-from-activation helper.
easel pushed a commit to easel/lucebox-hub that referenced this pull request on Jun 1, 2026
Port a narrow PR Luce-Org#321 safety guard into the current stack: invalid capture layer indices, invalid positions, non-positive ring capacity, and invalid hidden size now fail instead of silently no-oping during DFlash feature-ring capture copies. Refresh auto-integration metadata with current PR containment and probe results.
easel pushed a commit to easel/lucebox-hub that referenced this pull request on Jun 1, 2026
Selectively ports the next inert PR Luce-Org#321 target-shard IPC prerequisite onto auto-integration. Adds a Qwen35 layer-split forward-from-activation entry point with boundary activation validation, explicit ActivationPair ownership semantics, and F32 capture guards while leaving daemon dispatch and adapter wiring deferred. Refreshes the auto-integration manifest with the 22:04 probe results.
easel pushed a commit to easel/lucebox-hub that referenced this pull request on Jun 1, 2026
Record the 2026-05-31 23:00 metadata/probe refresh: current PR-head containment, Luce-Org#321/Luce-Org#325 conflict probes, and tmux Claude/Codex read-only delegation outcomes. No source changes were promoted.
easel pushed a commit to easel/lucebox-hub that referenced this pull request on Jun 1, 2026
Record PR Luce-Org#326 integration, current PR-head coverage, retained conflict probes, and Luce-Org#321 target-shard IPC feasibility findings.
easel pushed a commit to easel/lucebox-hub that referenced this pull request on Jun 1, 2026
Record the 2026-06-01 00:28 unattended probe pass. No source changes were promoted; Luce-Org#321 still needs live Qwen35 mixed-target adapter wiring, while Luce-Org#325's non-Luce-Org#321 same-backend disk-prefix-cache behavior is represented pending Luce-Org#321 mixed-target wiring.
easel pushed a commit to easel/lucebox-hub that referenced this pull request on Jun 1, 2026
Record the 2026-06-01 unattended PR integration pass, updated PR Luce-Org#285 head containment, current selective-port conflict counts, and delegated review conclusions for the remaining Luce-Org#321/Luce-Org#325/Luce-Org#135 slices.
easel pushed a commit to easel/lucebox-hub that referenced this pull request on Jun 1, 2026
Record the 2026-06-01 02:48 unattended probe pass, current PR containment, direct-merge conflict counts, and retained Luce-Org#321 Codex transcript. No source changes were promoted.
easel pushed a commit to easel/lucebox-hub that referenced this pull request on Jun 1, 2026
Record the 2026-06-01 03:07 unattended probe run, including current PR-head containment, direct-merge conflict counts, the Codex Luce-Org#321 read-only delegation outcome, validation, and retained worktree paths.
easel pushed a commit to easel/lucebox-hub that referenced this pull request on Jun 1, 2026
easel pushed a commit to easel/lucebox-hub that referenced this pull request on Jun 1, 2026
Record current heads for PR Luce-Org#321 and Luce-Org#325 as represented by the auto-integration stack after direct merge and tmux-delegated conflict-resolution attempts confirmed the remaining diffs are already carried by current-layout port commits.
Summary
This PR lets target layer split run across different backends and completes the DFlash speculative decode path on top of mixed-backend target split.
Same-backend layer split could already shard the target across multiple GPUs from the same backend, but CUDA/HIP mixed placement was limited to draft/target separation. The target itself could not be split across backend processes. DFlash also needs more than a plain target forward: verify requires hidden-state capture, draft feature ring or remote draft IPC forwarding, target KV snapshot/restore, and final token projection. This PR wires those required DFlash pieces into the mixed target shard IPC path.
Changes
[…] `DraftFeatureMirror` or forwards them to remote draft IPC.
[…] `snapshot/restore` support to the remote target shard for DFlash speculative verify rollback.
[…] `--target-shard-ipc-bin` is provided.
Notes
[…] `cuda:0,hip:0,hip:1` and layer split `0.08,0.46,0.46`; logs show CUDA running layers `[0,5)`, and the two HIP shards running `[5,35)` and `[35,64)`.
[…] `restore=true`.
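As a worked check of the layer ranges quoted in the notes, the sketch below maps split fractions to half-open layer ranges by rounding the cumulative share. This is only one rule that happens to reproduce the logged ranges; the actual partitioning logic in the server may differ. The total of 64 layers is inferred from the end of the last logged range, and `split_layers` is a hypothetical name.

```cpp
#include <cmath>
#include <cstddef>
#include <utility>
#include <vector>

// Hypothetical helper: turn split fractions into half-open [begin, end) layer
// ranges by rounding each cumulative share of n_layers to the nearest layer.
static std::vector<std::pair<int, int>> split_layers(const std::vector<double> & fractions,
                                                     int n_layers) {
    std::vector<std::pair<int, int>> ranges;
    double total = 0.0;
    for (double f : fractions) {
        total += f;
    }
    if (total <= 0.0 || n_layers <= 0) {
        return ranges;  // nothing sensible to split
    }
    double cumulative = 0.0;
    int begin = 0;
    for (std::size_t i = 0; i < fractions.size(); ++i) {
        cumulative += fractions[i];
        // The last shard always ends at n_layers so rounding never drops a layer.
        const int end = (i + 1 == fractions.size())
            ? n_layers
            : (int) std::lround(cumulative / total * n_layers);
        ranges.emplace_back(begin, end);
        begin = end;
    }
    return ranges;
}
```

For fractions `0.08,0.46,0.46` over 64 layers, the cumulative shares are 5.12 and 34.56 layers, which round to boundaries 5 and 35 and give `[0,5)`, `[5,35)`, and `[35,64)`.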