feat: mocker disagg #3833

PeaBrane · 2025-10-22T21:28:23Z

Overview:

A continuation of #3847

Major fixes/chores:

Refactored the KvManager to publish KV events directly over NATS instead of relying on intermediate relays
Removed a very expensive op: ForwardPassMetrics were being sent after every generated token instead of after every forward pass
Switched to a running-mean data structure for hit rate tracking (this was the second most expensive op)
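
For illustration, the running-mean idea above can be sketched in Python as a fixed-window mean with constant-time updates. This is only a sketch: the PR's implementation lives in Rust, and the class name `RunningMean`, the windowed design, and the `window` parameter are assumptions, not taken from the PR.

```python
from collections import deque


class RunningMean:
    """Fixed-window running mean with O(1) updates (hypothetical helper)."""

    def __init__(self, window):
        if window <= 0:
            raise ValueError("window must be positive")
        self._window = window
        self._values = deque()
        self._total = 0.0

    def push(self, value):
        # Add the new sample and evict the oldest one once the window is full,
        # adjusting the running total instead of re-summing every sample.
        self._values.append(value)
        self._total += value
        if len(self._values) > self._window:
            self._total -= self._values.popleft()

    def mean(self):
        # Return 0.0 for an empty window rather than dividing by zero.
        return self._total / len(self._values) if self._values else 0.0
```

Each `push` costs the same regardless of window size, which is the property that matters when the hit rate is updated on a hot path.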

Other minor fixes/chores:

Limited the mocker random token range to 100-200 to make detokenization failures less likely
Updated mocker timing estimates with new planner sweeps on H200

Scoped for future:

Run benchmarks with the disaggregated mocker
Publish events over ZMQ so that our KV event publisher can be tested in CI as well
Simulate NIXL transfer latency (currently assumed to be 0)

Signed-off-by: PeaBrane <[email protected]>

coderabbitai · 2025-10-22T22:15:59Z

Walkthrough

This PR introduces worker type awareness to the Dynamo Mocker engine by adding prefill/decode mode flags, centralizing CLI argument parsing, and propagating an is_prefill flag through Python and Rust layers to control engine behavior including KV event publishing, max token handling, and endpoint model type selection.

Changes

| Cohort / File(s) | Change Summary |
| --- | --- |
| Mocker shell script invocation `benchmarks/router/run_engines.sh` | Appends mode-aware flags to MOCKER_ARGS per worker loop: `--is-prefill-worker` when MODE is "prefill", `--is-decode-worker` when MODE is "decode" |
| Python CLI argument parsing `components/src/dynamo/mocker/args.py` | New module providing `parse_args()` for comprehensive CLI interface and `create_temp_engine_args_file(args)` to build engine config from CLI arguments, write to temp JSON file, and return path. Supports worker-type flags, KV events toggling, and legacy extra engine args file |
| Python worker refactoring `components/src/dynamo/mocker/main.py` | Removed inline `cmd_line_args()` function; replaced with imports from `.args`. Worker now uses `parse_args()` and either consumes provided extra_engine_args or generates temp file via `create_temp_engine_args_file()`. EntrypointArgs construction now includes `is_prefill=args.is_prefill_worker` |
| Rust engine configuration `launch/dynamo-run/src/lib.rs`, `lib/llm/src/entrypoint.rs` | Added `is_prefill: bool` field to EngineConfig::StaticCore variant, updating the enum signature and construction path for Mocker engine output |
| Rust bindings and entrypoint `lib/bindings/python/rust/llm/entrypoint.rs` | Added `is_prefill: bool` field to EntrypointArgs struct with PyO3 binding (default false). Updated constructor signature and propagated field through engine selection and Mocker engine configuration |
| Endpoint model type selection `lib/llm/src/entrypoint/input/endpoint.rs` | Updated StaticCore pattern match to destructure `is_prefill`. Sets model_type to Prefill when is_prefill is true, otherwise Chat \| Completions |
| Mocker engine logic `lib/llm/src/mocker/engine.rs` | For prefill workers: override max_tokens to 1, add dummy disaggregated_params to output payload. KV events publishing now requires both `enable_prefix_caching` and `publish_kv_events` to be true (previously only `enable_prefix_caching`) |
| Mocker configuration `lib/llm/src/mocker/protocols.rs` | Added `publish_kv_events: bool` (default true) and `is_prefill: bool` (default false) fields to MockEngineArgs struct; wired through JSON builder path for extra_args overrides |
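
The engine behavior summarized in the last two rows can be modeled in a few lines of Python. This is a rough illustration only, not the Rust code from the PR: the dataclass `MockEngineArgsSketch` and both helper functions are hypothetical, and the default for `enable_prefix_caching` is an assumption (only the defaults of `publish_kv_events` and `is_prefill` are stated above).

```python
from dataclasses import dataclass


@dataclass
class MockEngineArgsSketch:
    """Hypothetical stand-in for the fields discussed in the walkthrough."""

    enable_prefix_caching: bool = True  # assumed default, not stated in the PR
    publish_kv_events: bool = True      # default true per the summary
    is_prefill: bool = False            # default false per the summary


def should_publish_kv_events(args):
    # KV events are published only when both toggles are on.
    return args.enable_prefix_caching and args.publish_kv_events


def effective_max_tokens(args, requested_max_tokens):
    # A prefill worker emits a single token regardless of the request.
    return 1 if args.is_prefill else requested_max_tokens
```

With these helpers, a decode worker configured with `publish_kv_events=False` publishes nothing even when prefix caching is enabled, and a prefill worker always caps generation at one token.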

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~25 minutes

The changes follow a consistent pattern of threading an is_prefill flag through multiple layers (Python CLI → Rust bindings → engine config → worker logic). While the file count is moderate-to-high (~9 files), the modifications are largely homogeneous plumbing changes with localized logic implementations in the engine and endpoint layers. Understanding the flow requires tracing across layers, but individual edits remain straightforward.

Poem

🐰 A prefill flag hops through the code,

From shell to Python, Rust to load,

With KV events and tokens refined,

Workers now know their dispatch kind! ✨

Pre-merge checks

❌ Failed checks (1 warning, 2 inconclusive)

| Check name | Status | Explanation | Resolution |
| --- | --- | --- | --- |
| Docstring Coverage | ⚠️ Warning | Docstring coverage is 63.64% which is insufficient. The required threshold is 80.00%. | You can run `@coderabbitai generate docstrings` to improve docstring coverage. |
| Title Check | ❓ Inconclusive | The title “feat: mocker disagg” is too terse and uses an unclear abbreviation that does not clearly convey the main change of adding prefill mode support and disaggregated parameters in the mocker engine. It is related to the mocker component but is vague about the actual feature being introduced. | Please revise the title to explicitly describe the primary feature, for example “feat: add prefill worker support and disaggregated parameters to mocker engine,” to make the change clear at a glance. |
| Description Check | ❓ Inconclusive | The pull request description provides substantive information about the changes in the Overview section, including specific details about major fixes (KvManager refactoring, ForwardPassMetrics removal, hit rate tracking), minor fixes (mocker token range, timing estimates), and future work. However, the description deviates significantly from the required template structure by lacking three specified section headings: a dedicated "#### Details:" section, a "#### Where should the reviewer start?" section calling out specific files for review, and a "#### Related Issues:" section with action keywords (though #3847 is mentioned inline in the Overview). While the content quality is good and directly relevant to the PR objectives, the structural mismatch with the explicit template creates ambiguity about whether it fully satisfies the documentation requirements. | To fully meet the template requirements, consider reorganizing the description to include all four sections with their specified headings. Specifically, add a dedicated "#### Details:" section that summarizes the changes, a "#### Where should the reviewer start?" section that calls out key files like `lib/llm/src/mocker/engine.rs` and `lib/llm/src/mocker/protocols.rs` for focused review, and a "#### Related Issues:" section that properly references #3847 using an action keyword such as "Relates to #3847". This will ensure the description aligns with the repository's documentation standards. |

coderabbitai

Actionable comments posted: 2

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (3)

lib/bindings/python/rust/llm/entrypoint.rs (1)
255-269: Propagate args.is_prefill to MockEngineArgs

MockEngineArgs is loaded or defaulted without considering the endpoint’s args.is_prefill, so the mock engine won’t enforce prefill limits. After constructing mocker_args, assign the flag:
-            let mocker_args = if let Some(extra_args_path) = args.extra_engine_args {
+            let mut mocker_args = if let Some(extra_args_path) = args.extra_engine_args {
                 MockEngineArgs::from_json_file(&extra_args_path)? 
             } else {
                 MockEngineArgs::default()
             };
+            mocker_args.is_prefill = args.is_prefill;
launch/dynamo-run/src/lib.rs (1)

144-149: Add a prefill_worker flag and wire it to is_prefill

The Flags struct (launch/dynamo-run/src/flags.rs) currently has no prefill indicator, so is_prefill is always set to false. Introduce a --prefill-worker boolean in Flags and pass its value to is_prefill when constructing EngineConfig::StaticCore in launch/dynamo-run/src/lib.rs.

components/src/dynamo/mocker/main.py (1)

26-34: Decode worker flag silently ignored when --extra-engine-args is used

Right now, if someone launches the mocker with both --is-decode-worker and a custom --extra-engine-args JSON, the branch on Lines 26-34 just forwards that file untouched. As a result, publish_kv_events stays whatever the JSON dictated (often True), so decode workers keep emitting KV events even though the CLI flag promises “does not publish KV events.” This is a regression compared to the non-JSON path where create_temp_engine_args_file forces publish_kv_events=False. Please make sure the decode/no-kv toggles are applied regardless of how the extra args are supplied (e.g., merge the override into a temp copy of the supplied JSON or error out if it conflicts).
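
One way to apply the suggestion above (merging the overrides into a temp copy of the supplied JSON) is sketched below. It is a minimal sketch under stated assumptions: the function name `merged_engine_args_file` and its parameters are hypothetical and not part of the PR; only the JSON keys `is_prefill` and `publish_kv_events` come from the review text.

```python
import json
import tempfile
from pathlib import Path
from typing import Optional


def merged_engine_args_file(
    extra_args_path: Optional[Path],
    is_prefill_worker: bool,
    is_decode_worker: bool,
) -> Path:
    """Write a temp JSON config with worker-type overrides applied (hypothetical helper)."""
    config = {}
    if extra_args_path is not None:
        # Start from the user-supplied file so unrelated settings are preserved.
        config = json.loads(Path(extra_args_path).read_text())
    if is_prefill_worker:
        config["is_prefill"] = True
    if is_decode_worker:
        # The CLI flag wins over whatever the JSON file says.
        config["publish_kv_events"] = False
    with tempfile.NamedTemporaryFile("w", suffix=".json", delete=False) as f:
        json.dump(config, f)
        return Path(f.name)
```

The supplied file is never modified; the overrides land in a separate temp file, so the same JSON can be shared between prefill and decode workers.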

🧹 Nitpick comments (5)

lib/llm/src/mocker/protocols.rs (1)
101-108: JSON wiring for publish_kv_events and is_prefill looks correct; add a small mapping test.

The builder defaults and parsing logic are sound. Add a unit test to lock behavior and guard against regressions.

Example:
#[test]
fn loads_publish_kv_and_is_prefill() {
    let tmp = tempfile::NamedTempFile::new().unwrap();
    std::fs::write(tmp.path(), r#"{ "publish_kv_events": false, "is_prefill": true }"#).unwrap();
    let args = MockEngineArgs::from_json_file(tmp.path()).unwrap();
    assert!(!args.publish_kv_events);
    assert!(args.is_prefill);
}
Also applies to: 132-145, 226-236
lib/llm/src/entrypoint/input/endpoint.rs (1)

70-92: Correctly attaches Prefill vs Chat|Completions based on is_prefill.

Good conditional routing of model type with no behavior change for non-prefill.

Consider a trace log on attach indicating the chosen model_type for easier debugging.
lib/llm/src/mocker/engine.rs (1)
359-366: Also bound scheduler’s requested tokens in prefill mode.

You cap streamed tokens to 1, but DirectRequest.max_output_tokens remains the original value. This can overproduce scheduler work and signals. Clamp it to 1 when is_prefill.

Example adjustment (within generate):
let is_prefill = self.engine_args.is_prefill;
let requested_max = request
    .stop_conditions
    .max_tokens
    .expect("max_output_tokens must be specified for mocker") as usize;

let effective_max = if is_prefill { 1 } else { requested_max };

let direct_request = DirectRequest {
    tokens: request.token_ids.clone(),
    max_output_tokens: effective_max,
    uuid: Some(request_uuid),
    dp_rank,
};
Optional: if a completion signal arrives before effective_max, send a graceful length finish instead of an error to avoid noisy failures in prefill.
components/src/dynamo/mocker/args.py (2)
180-192: Make worker-type flags mutually exclusive.

Prevent accidental --is-prefill-worker + --is-decode-worker combos.

Patch:
-    # Worker type configuration
-    parser.add_argument(
+    # Worker type configuration (mutually exclusive)
+    group = parser.add_mutually_exclusive_group()
+    group.add_argument(
         "--is-prefill-worker",
         action="store_true",
         default=False,
         help="Register as Prefill model type instead of Chat+Completions (default: False)",
     )
-    parser.add_argument(
+    group.add_argument(
         "--is-decode-worker",
         action="store_true",
         default=False,
         help="Mark this as a decode worker which does not publish KV events (default: False)",
     )
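
To see what the mutually exclusive group changes in practice, here is a small standalone sketch. The `build_parser` function is hypothetical and includes only the two flags from the patch, not the rest of the mocker CLI.

```python
import argparse


def build_parser():
    """Build a parser with only the two worker-type flags (hypothetical helper)."""
    parser = argparse.ArgumentParser(prog="mocker-sketch")
    # argparse rejects any command line that sets more than one flag in the group.
    group = parser.add_mutually_exclusive_group()
    group.add_argument("--is-prefill-worker", action="store_true", default=False)
    group.add_argument("--is-decode-worker", action="store_true", default=False)
    return parser


if __name__ == "__main__":
    args = build_parser().parse_args()
    print(args.is_prefill_worker, args.is_decode_worker)
```

Passing both flags makes argparse print a usage error and exit with a non-zero status, so the conflict is caught before any worker starts.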
53-61: Consider cleaning up the temp JSON automatically.

If the main doesn’t delete it, register an atexit hook here to remove the file.

Example:
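import atexit
from pathlib import Path
# ...
temp_path = Path(f.name)
# Remove the temp file at interpreter exit; missing_ok avoids an error if it is already gone.
atexit.register(lambda p=temp_path: p.unlink(missing_ok=True))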

📜 Review details

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between fb294b9 and 127883b.

📒 Files selected for processing (9)

benchmarks/router/run_engines.sh (1 hunks)
components/src/dynamo/mocker/args.py (1 hunks)
components/src/dynamo/mocker/main.py (2 hunks)
launch/dynamo-run/src/lib.rs (1 hunks)
lib/bindings/python/rust/llm/entrypoint.rs (4 hunks)
lib/llm/src/entrypoint.rs (1 hunks)
lib/llm/src/entrypoint/input/endpoint.rs (2 hunks)
lib/llm/src/mocker/engine.rs (3 hunks)
lib/llm/src/mocker/protocols.rs (3 hunks)

🧰 Additional context used

🧠 Learnings (1)

📓 Common learnings

Learnt from: PeaBrane
PR: ai-dynamo/dynamo#3184
File: docs/architecture/kv_cache_routing.md:70-73
Timestamp: 2025-09-23T20:08:37.105Z
Learning: PeaBrane prefers to keep documentation diagrams simplified to avoid visual overload, even when this means sacrificing some technical precision for the sake of clarity and comprehension. They prioritize pedagogical effectiveness over exhaustive technical detail in architectural diagrams.

Learnt from: PeaBrane
PR: ai-dynamo/dynamo#2756
File: lib/llm/src/kv_router/subscriber.rs:36-44
Timestamp: 2025-08-29T10:03:48.330Z
Learning: PeaBrane prefers to keep PRs contained in scope and is willing to defer technical improvements to future PRs when the current implementation works for the immediate use case. They acknowledge technical debt but prioritize deliverability over completeness in individual PRs.

🧬 Code graph analysis (4)

lib/llm/src/entrypoint/input/endpoint.rs (2)

lib/llm/src/model_card.rs (2)

model_type (550-550)

model_type (711-713)

lib/bindings/python/src/dynamo/_core.pyi (1)

ModelType (889-896)

components/src/dynamo/mocker/args.py (1)

lib/llm/src/mocker/protocols.rs (1)

default (111-115)

components/src/dynamo/mocker/main.py (1)

components/src/dynamo/mocker/args.py (2)

create_temp_engine_args_file (19-61)

parse_args (64-202)

lib/bindings/python/rust/llm/entrypoint.rs (2)

lib/llm/src/discovery/model_manager.rs (1)

new (70-81)

lib/llm/src/local_model.rs (21)

model_path (90-93)

model_name (95-98)

endpoint_id (100-103)

endpoint_id (399-401)

context_length (105-108)

router_config (136-139)

router_config (372-374)

kv_cache_block_size (111-114)

http_host (116-119)

http_host (356-358)

http_port (121-124)

http_port (360-362)

tls_cert_path (126-129)

tls_cert_path (364-366)

extra_engine_args (166-169)

namespace (141-144)

namespace (380-382)

custom_backend_metrics_endpoint (181-184)

custom_backend_metrics_endpoint (384-386)

custom_backend_metrics_polling_interval (186-189)

custom_backend_metrics_polling_interval (388-390)

⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (8)

GitHub Check: Mirror Repository to GitLab
GitHub Check: tests (lib/bindings/python)
GitHub Check: tests (.)
GitHub Check: clippy (.)
GitHub Check: tests (launch/dynamo-run)
GitHub Check: clippy (launch/dynamo-run)
GitHub Check: tests (lib/runtime/examples)
GitHub Check: Build and Test - dynamo

🔇 Additional comments (2)

components/src/dynamo/mocker/args.py (1)

45-52: Good: decode disables KV events, prefill sets is_prefill.

This matches the engine semantics introduced in Rust.

benchmarks/router/run_engines.sh (1)

201-205: Prefill/decode worker flags are correctly supported by mocker CLI

The flags --is-prefill-worker and --is-decode-worker are already defined in components/src/dynamo/mocker/args.py and handled in components/src/dynamo/mocker/main.py, so the script will not trigger unknown-argument errors.

grahamking · 2025-10-23T13:05:53Z

@PeaBrane This PR is three or more different things. Could you split it into more focused PRs? One for the worker_id removal. One for the --is-prefill param. And so on.

That makes it much easier to locate a change when you git blame, easier to revert a specific change, easier to review, easier for someone reading the logs to understand what is changing in the project.

PeaBrane · 2025-10-23T15:27:31Z

@grahamking Thanks, I will try to do better. But I think this is one of those cases where the changes are too correlated to be easily broken down into separate PRs. The additional context here is that we would like to get disagg mockers into a functional state fast for router/planner benchmarking, and ideally we need it soon.

The core changes should be contained to the mockers, so they should not affect the core dynamo components. That being said, I will see if I can break it down into a series of PRs.

first commit

a767d48

Signed-off-by: PeaBrane <[email protected]>

pull-request-size bot added the size/M label Oct 22, 2025

github-actions bot added the feat label Oct 22, 2025

Merge branch 'main' into rupei/mocker-prefill

89f9560

copy-pr-bot bot temporarily deployed to GITLAB October 22, 2025 21:30 Inactive

copy-pr-bot bot temporarily deployed to GITLAB October 22, 2025 21:36 Inactive

hook everything up

127883b

Signed-off-by: PeaBrane <[email protected]>

pull-request-size bot added size/L and removed size/M labels Oct 22, 2025

copy-pr-bot bot temporarily deployed to GITLAB October 22, 2025 22:06 Inactive

PeaBrane marked this pull request as ready for review October 22, 2025 22:06

PeaBrane requested review from a team as code owners October 22, 2025 22:06

copy-pr-bot bot temporarily deployed to GITLAB October 22, 2025 22:08 Inactive

PeaBrane requested review from grahamking and tedzhouhk October 22, 2025 22:09

coderabbitai bot reviewed Oct 22, 2025

lib/llm/src/entrypoint.rs (resolved)

lib/llm/src/mocker/engine.rs (outdated, resolved)

new mocker timing predictions based on h200 planner sweeps

d14df06

Signed-off-by: PeaBrane <[email protected]>

copy-pr-bot bot temporarily deployed to GITLAB October 22, 2025 23:10 Inactive

copy-pr-bot bot temporarily deployed to GITLAB October 22, 2025 23:19 Inactive

fix triple inversion

dff4790

Signed-off-by: PeaBrane <[email protected]>

copy-pr-bot bot temporarily deployed to GITLAB October 23, 2025 00:39 Inactive

copy-pr-bot bot temporarily deployed to GITLAB October 23, 2025 00:40 Inactive

more perf fixes

c082d48

Signed-off-by: PeaBrane <[email protected]>

pull-request-size bot added size/XL and removed size/L labels Oct 23, 2025

copy-pr-bot bot temporarily deployed to GITLAB October 23, 2025 01:48 Inactive

copy-pr-bot bot temporarily deployed to GITLAB October 23, 2025 01:49 Inactive

copy-pr-bot bot temporarily deployed to GITLAB October 23, 2025 06:05 Inactive

mocker docs update

bb7d360

Signed-off-by: PeaBrane <[email protected]>

grahamking reviewed Oct 23, 2025

lib/llm/src/entrypoint/input/endpoint.rs (outdated, resolved)

grahamking reviewed Oct 23, 2025

lib/bindings/python/rust/llm/kv.rs (resolved)

grahamking reviewed Oct 23, 2025

lib/llm/src/kv_router/indexer.rs (outdated, resolved)

Merge branch 'main' into rupei/mocker-prefill

f5fb2d8

copy-pr-bot bot temporarily deployed to GITLAB October 23, 2025 15:42 Inactive

copy-pr-bot bot temporarily deployed to GITLAB October 23, 2025 15:44 Inactive

no need for ? debug on dp_rank u32

5f3798d

Signed-off-by: PeaBrane <[email protected]>

copy-pr-bot bot temporarily deployed to GITLAB October 23, 2025 17:46 Inactive

copy-pr-bot bot temporarily deployed to GITLAB October 23, 2025 17:47 Inactive

Merge remote-tracking branch 'origin/main' into rupei/mocker-prefill

d6ed47c

Signed-off-by: PeaBrane <[email protected]>

pull-request-size bot added size/XL and removed size/XXL labels Oct 23, 2025

copy-pr-bot bot temporarily deployed to GITLAB October 23, 2025 17:53 Inactive

PeaBrane requested a review from grahamking October 23, 2025 17:53

copy-pr-bot bot temporarily deployed to GITLAB October 23, 2025 17:54 Inactive

PeaBrane requested review from alec-flowers and jthomson04 October 23, 2025 17:55

Merge remote-tracking branch 'origin/main' into rupei/mocker-prefill

fff5b3e

Signed-off-by: PeaBrane <[email protected]>

copy-pr-bot bot temporarily deployed to GITLAB October 23, 2025 22:21 Inactive

copy-pr-bot bot temporarily deployed to GITLAB October 23, 2025 22:26 Inactive

jthomson04 reviewed Oct 24, 2025

lib/llm/src/mocker/scheduler.rs (outdated, resolved)

Merge remote-tracking branch 'origin/main' into rupei/mocker-prefill

f5cee08

Signed-off-by: PeaBrane <[email protected]>

copy-pr-bot bot temporarily deployed to GITLAB October 24, 2025 21:22 Inactive

keep updated decode formula

3bdd59c

Signed-off-by: PeaBrane <[email protected]>

copy-pr-bot bot temporarily deployed to GITLAB October 24, 2025 21:27 Inactive

copy-pr-bot bot temporarily deployed to GITLAB October 24, 2025 21:33 Inactive
