[Hackathon] linux-kernel: per-hop latency models for the in-memory transport#8
[Hackathon] linux-kernel: per-hop latency models for the in-memory transport#8mariagorskikh wants to merge 2 commits into
Conversation
The README documents that mean_latency and duration are always 0.0 with the default in_memory transport because every hop delivers at ``time = now``. This adds a small, focused latency-model layer so those numbers actually mean something — without breaking determinism or existing traces. Surface ------- * ``Simulator(latency_model=...)`` — optional callable ``(rng, sender, receiver) -> delay_seconds``. The simulator now carries a dedicated, seed-derived ``_latency_rng`` so adding latency never perturbs per-agent or failure RNG streams. * ``InMemoryTransport.send`` schedules deliveries at ``now + delay``. ``delay`` is clamped at 0 so the clock can never go backwards. * ``ScenarioConfig.transport_config`` — YAML block parsed by ``runner._build_latency_model_from_config``. Empty/absent means the legacy zero-latency behaviour: byte-identical traces with prior runs. Models in ``nest_plugins_reference.transport.latency`` ------------------------------------------------------ Pure, composable functions: * ``constant_latency`` * ``uniform_latency`` * ``exponential_latency`` (long-tail packet-network model) * ``normal_latency`` (clamped at ``min_delay``) * ``pair_matrix_latency`` (heterogeneous topologies) * ``with_jitter`` (compose ±jitter onto any base model) * ``zero_latency`` (explicit no-op for clarity) * ``make_latency_model`` (YAML-dict -> model) Tests ----- * 36 new tests covering: each model's bounds and approximate mean, factory parsing, jitter envelope, RNG determinism, simulator integration (clock advances, mean_latency matches config), runner end-to-end (YAML -> measurable latency in trace), compatibility with message_drop and partition logic. * All 193 pre-existing tests still pass. Baseline marketplace trace is byte-identical to a fresh main checkout. Usage ----- ```yaml transport_config: kind: exponential mean: 0.020 jitter: 0.005 ``` Same seed -> byte-identical trace, with realistic per-hop latency.
Reviewer's GuideAdds configurable, deterministic per-hop latency modeling to the in-memory transport, plumbed from scenario YAML through ScenarioRunner into Simulator and InMemoryTransport, along with a suite of latency models in the reference plugins and comprehensive tests and docs. Sequence diagram for latency model construction from YAML transport_configsequenceDiagram
actor User
participant ScenarioRunner
participant ScenarioConfig
participant LatencyFactory as nest_plugins_reference.transport.latency
participant Simulator
User->>ScenarioRunner: run()
ScenarioRunner->>ScenarioConfig: read transport_config
ScenarioRunner->>ScenarioRunner: _build_latency_model()
ScenarioRunner->>ScenarioRunner: _build_latency_model_from_config(transport_config)
ScenarioRunner->>LatencyFactory: make_latency_model(spec)
LatencyFactory-->>ScenarioRunner: LatencyModel
ScenarioRunner-->>ScenarioRunner: latency_model
ScenarioRunner->>Simulator: __init__(latency_model=latency_model, seed, ...)
Simulator-->>ScenarioRunner: Simulator instance
Sequence diagram for InMemoryTransport send with per-hop latency modelsequenceDiagram
participant Agent
participant InMemoryTransport
participant LatencyModel
participant VirtualClock
participant EventQueue
Agent->>InMemoryTransport: send(to, payload, ...)
InMemoryTransport->>VirtualClock: now
VirtualClock-->>InMemoryTransport: now
InMemoryTransport->>LatencyModel: latency_model(_latency_rng, _agent_id, to)
LatencyModel-->>InMemoryTransport: delay_seconds
InMemoryTransport->>EventQueue: push(Event(time=now + max(0.0, delay_seconds), kind=deliver, ...))
EventQueue-->>InMemoryTransport: queued
InMemoryTransport-->>Agent: send completed
File-Level Changes
Tips and commandsInteracting with Sourcery
Customizing Your ExperienceAccess your dashboard to:
Getting Help
|
There was a problem hiding this comment.
Hey - I've found 4 issues, and left some high level feedback:
- The LatencyModel type alias is defined separately in both nest_core.sim.transport and nest_plugins_reference.transport.latency, while Simulator/runner use Any in places; consider centralising this alias in one module and importing it to avoid divergence between core and plugin interfaces.
- The comments around the latency RNG usage are now misleading: InMemoryTransport.init says it reuses the simulator's failure RNG, and the LatencyModel docstring in transport.py says the simulator passes its failure RNG, but Simulator actually passes a dedicated _latency_rng; update these docstrings to reflect the current RNG wiring.
Prompt for AI Agents
Please address the comments from this code review:
## Overall Comments
- The LatencyModel type alias is defined separately in both nest_core.sim.transport and nest_plugins_reference.transport.latency, while Simulator/runner use Any in places; consider centralising this alias in one module and importing it to avoid divergence between core and plugin interfaces.
- The comments around the latency RNG usage are now misleading: InMemoryTransport.__init__ says it reuses the simulator's failure RNG, and the LatencyModel docstring in transport.py says the simulator passes its failure RNG, but Simulator actually passes a dedicated _latency_rng; update these docstrings to reflect the current RNG wiring.
## Individual Comments
### Comment 1
<location path="packages/nest-core/nest_core/runner.py" line_range="53-57" />
<code_context>
+ spec = raw_latency if isinstance(raw_latency, dict) else transport_config
+ if not spec:
+ return None
+ try:
+ from nest_plugins_reference.transport.latency import make_latency_model
+ except ImportError: # pragma: no cover — only when reference plugins aren't installed
+ return None
+ return make_latency_model(spec)
+
+
</code_context>
<issue_to_address>
**issue (bug_risk):** Silently ignoring ImportError for latency plugins may hide configuration mistakes.
When `nest_plugins_reference` isn’t installed, `_build_latency_model_from_config` returns `None`, so a config that explicitly sets a latency model ends up running with no latency and no signal that anything is wrong. It would be better to raise an explicit error or at least log a warning when `transport_config` requests a latency model but the reference plugin import fails, so misconfigurations don’t go unnoticed.
</issue_to_address>
### Comment 2
<location path="packages/nest-plugins-reference/nest_plugins_reference/transport/latency.py" line_range="262-266" />
<code_context>
+ )
+ raise ValueError(msg)
+
+ jitter = spec.get("jitter")
+ if jitter is not None and float(jitter) > 0:
+ base = with_jitter(base, float(jitter))
+
+ return base
+
+
</code_context>
<issue_to_address>
**issue:** Negative `jitter` values are silently ignored instead of raising as other helpers do.
Here `jitter` is only applied when `float(jitter) > 0`, so negative values are silently treated as “no jitter”. In contrast, `with_jitter` validates `jitter >= 0` and raises on invalid input. To keep behavior consistent and surface misconfigurations, consider rejecting negative `jitter` here as well (e.g., raise if `jitter < 0`).
</issue_to_address>
### Comment 3
<location path="packages/nest-plugins-reference/nest_plugins_reference/transport/latency.py" line_range="229-233" />
<code_context>
+ base = uniform_latency(float(spec["low"]), float(spec["high"]))
+ elif kind == "exponential":
+ base = exponential_latency(float(spec["mean"]))
+ elif kind == "normal":
+ base = normal_latency(
+ float(spec["mean"]),
+ float(spec["stddev"]),
+ min_delay=float(spec.get("min_delay", 0.0)),
+ )
+ elif kind == "pair_matrix":
</code_context>
<issue_to_address>
**suggestion:** Missing required parameters in `normal` latency spec cause a KeyError rather than a clearer configuration error.
Here we index `spec["mean"]` and `spec["stddev"]` directly, so a missing field results in a raw `KeyError` instead of a clear configuration error. Consider either raising a `ValueError` with a descriptive message when required fields are absent, or validating required keys for each `kind` before constructing the latency function.
Suggested implementation:
```python
elif kind == "normal":
missing = [key for key in ("mean", "stddev") if key not in spec]
if missing:
missing_str = ", ".join(sorted(missing))
msg = f"normal latency spec missing required field(s): {missing_str}"
raise ValueError(msg)
base = normal_latency(
float(spec["mean"]),
float(spec["stddev"]),
min_delay=float(spec.get("min_delay", 0.0)),
)
```
If you want consistent behavior across all latency kinds, you can apply the same pattern to other branches:
- `"constant"`: require `"mean"`.
- `"uniform"`: require `"low"` and `"high"`.
- `"exponential"`: require `"mean"`.
For each branch, add a similar `missing = [...]` check before indexing into `spec` and raise a descriptive `ValueError` listing the missing keys.
</issue_to_address>
### Comment 4
<location path="packages/nest-core/tests/test_sim_latency.py" line_range="95-104" />
<code_context>
+
+
+# ---------------------------------------------------------------------------
+# Constant latency: every hop adds exactly the configured delay
+# ---------------------------------------------------------------------------
+
+
+@pytest.mark.asyncio
+async def test_constant_latency_advances_clock(tmp_path: Path) -> None:
+ from nest_plugins_reference.transport.latency import constant_latency
+
+ trace = tmp_path / "constant.jsonl"
+ sim = Simulator(seed=1, trace_path=trace, latency_model=constant_latency(0.01))
+ sim.add_agent(AgentId("p"), _Pinger(AgentId("e")))
+ sim.add_agent(AgentId("e"), _Echoer())
+
+ await sim.run(max_ticks=200)
+
+ events = _read_trace(trace)
+ mean = _mean_latency(events)
+ # ping -> arrives at 0.01, pong -> arrives at 0.02. Mean of the two
+ # send-to-receive deltas is exactly 0.01.
+ assert mean == pytest.approx(0.01, rel=1e-9)
+ # The virtual clock advanced past zero.
+ assert sim.clock.now >= 0.01
+
+
</code_context>
<issue_to_address>
**suggestion (testing):** Add a transport-level test that a negative latency from a custom model is clamped and does not move the clock backwards
The existing clamping in `InMemoryTransport.send` prevents negative delays, but a custom `latency_model` could still return them. Please add a test that injects a custom model returning a negative delay into `Simulator`, sends a message, and asserts that event timestamps never precede `clock.now` and that the clock is monotonically non-decreasing. This will lock in the contract that latency cannot move time backwards, independent of the built-in models.
Suggested implementation:
```python
events = _read_trace(trace)
assert _mean_latency(events) == 0.0
assert sim.clock.now == 0.0
# ---------------------------------------------------------------------------
# Negative latency: custom model is clamped and does not move time backwards
# ---------------------------------------------------------------------------
@pytest.mark.asyncio
async def test_negative_latency_is_clamped_and_monotone(tmp_path: Path) -> None:
# Custom latency model that always returns a negative delay.
# The InMemoryTransport is expected to clamp this to a non-negative delay,
# so that time never moves backwards.
def negative_latency(*_args: object, **_kwargs: object) -> float:
return -0.5
trace = tmp_path / "negative_latency.jsonl"
sim = Simulator(seed=1, trace_path=trace, latency_model=negative_latency)
sim.add_agent(AgentId("p"), _Pinger(AgentId("e")))
sim.add_agent(AgentId("e"), _Echoer())
await sim.run(max_ticks=200)
events = _read_trace(trace)
# Extract event timestamps from the trace. We expect them to be
# monotonically non-decreasing and never move time backwards.
timestamps = [event["time"] for event in events]
# No event should be scheduled before time zero.
assert not timestamps or min(timestamps) >= 0.0
# Event timestamps must be monotonically non-decreasing.
assert all(
later >= earlier for earlier, later in zip(timestamps, timestamps[1:])
)
# The simulator clock is monotonically non-decreasing and should end
# at or after the last event timestamp.
if timestamps:
assert sim.clock.now >= timestamps[-1]
assert sim.clock.now >= 0.0
# ---------------------------------------------------------------------------
These tests live in nest-core (not nest-plugins-reference) because they
```
I assumed each trace event has a numeric `"time"` field representing the event timestamp, since that’s the most common pattern and matches the semantics needed for these assertions. If the actual key used in your trace format differs (for example `"ts"`, `"timestamp"`, or nested under another key), please update `timestamps = [event["time"] for event in events]` accordingly, or adjust `_read_trace` to expose a uniform `"time"` field.
</issue_to_address>Help me be more useful! Please click 👍 or 👎 on each comment and I'll use the feedback to improve your reviews.
| try: | ||
| from nest_plugins_reference.transport.latency import make_latency_model | ||
| except ImportError: # pragma: no cover — only when reference plugins aren't installed | ||
| return None | ||
| return make_latency_model(spec) |
There was a problem hiding this comment.
issue (bug_risk): Silently ignoring ImportError for latency plugins may hide configuration mistakes.
When nest_plugins_reference isn’t installed, _build_latency_model_from_config returns None, so a config that explicitly sets a latency model ends up running with no latency and no signal that anything is wrong. It would be better to raise an explicit error or at least log a warning when transport_config requests a latency model but the reference plugin import fails, so misconfigurations don’t go unnoticed.
| jitter = spec.get("jitter") | ||
| if jitter is not None and float(jitter) > 0: | ||
| base = with_jitter(base, float(jitter)) | ||
|
|
||
| return base |
There was a problem hiding this comment.
issue: Negative jitter values are silently ignored instead of raising as other helpers do.
Here jitter is only applied when float(jitter) > 0, so negative values are silently treated as “no jitter”. In contrast, with_jitter validates jitter >= 0 and raises on invalid input. To keep behavior consistent and surface misconfigurations, consider rejecting negative jitter here as well (e.g., raise if jitter < 0).
| elif kind == "normal": | ||
| base = normal_latency( | ||
| float(spec["mean"]), | ||
| float(spec["stddev"]), | ||
| min_delay=float(spec.get("min_delay", 0.0)), |
There was a problem hiding this comment.
suggestion: Missing required parameters in normal latency spec cause a KeyError rather than a clearer configuration error.
Here we index spec["mean"] and spec["stddev"] directly, so a missing field results in a raw KeyError instead of a clear configuration error. Consider either raising a ValueError with a descriptive message when required fields are absent, or validating required keys for each kind before constructing the latency function.
Suggested implementation:
elif kind == "normal":
missing = [key for key in ("mean", "stddev") if key not in spec]
if missing:
missing_str = ", ".join(sorted(missing))
msg = f"normal latency spec missing required field(s): {missing_str}"
raise ValueError(msg)
base = normal_latency(
float(spec["mean"]),
float(spec["stddev"]),
min_delay=float(spec.get("min_delay", 0.0)),
)If you want consistent behavior across all latency kinds, you can apply the same pattern to other branches:
"constant": require"mean"."uniform": require"low"and"high"."exponential": require"mean".
For each branch, add a similar missing = [...] check before indexing into spec and raise a descriptive ValueError listing the missing keys.
| # Constant latency: every hop adds exactly the configured delay | ||
| # --------------------------------------------------------------------------- | ||
|
|
||
|
|
||
| @pytest.mark.asyncio | ||
| async def test_constant_latency_advances_clock(tmp_path: Path) -> None: | ||
| from nest_plugins_reference.transport.latency import constant_latency | ||
|
|
||
| trace = tmp_path / "constant.jsonl" | ||
| sim = Simulator(seed=1, trace_path=trace, latency_model=constant_latency(0.01)) |
There was a problem hiding this comment.
suggestion (testing): Add a transport-level test that a negative latency from a custom model is clamped and does not move the clock backwards
The existing clamping in InMemoryTransport.send prevents negative delays, but a custom latency_model could still return them. Please add a test that injects a custom model returning a negative delay into Simulator, sends a message, and asserts that event timestamps never precede clock.now and that the clock is monotonically non-decreasing. This will lock in the contract that latency cannot move time backwards, independent of the built-in models.
Suggested implementation:
events = _read_trace(trace)
assert _mean_latency(events) == 0.0
assert sim.clock.now == 0.0
# ---------------------------------------------------------------------------
# Negative latency: custom model is clamped and does not move time backwards
# ---------------------------------------------------------------------------
@pytest.mark.asyncio
async def test_negative_latency_is_clamped_and_monotone(tmp_path: Path) -> None:
# Custom latency model that always returns a negative delay.
# The InMemoryTransport is expected to clamp this to a non-negative delay,
# so that time never moves backwards.
def negative_latency(*_args: object, **_kwargs: object) -> float:
return -0.5
trace = tmp_path / "negative_latency.jsonl"
sim = Simulator(seed=1, trace_path=trace, latency_model=negative_latency)
sim.add_agent(AgentId("p"), _Pinger(AgentId("e")))
sim.add_agent(AgentId("e"), _Echoer())
await sim.run(max_ticks=200)
events = _read_trace(trace)
# Extract event timestamps from the trace. We expect them to be
# monotonically non-decreasing and never move time backwards.
timestamps = [event["time"] for event in events]
# No event should be scheduled before time zero.
assert not timestamps or min(timestamps) >= 0.0
# Event timestamps must be monotonically non-decreasing.
assert all(
later >= earlier for earlier, later in zip(timestamps, timestamps[1:])
)
# The simulator clock is monotonically non-decreasing and should end
# at or after the last event timestamp.
if timestamps:
assert sim.clock.now >= timestamps[-1]
assert sim.clock.now >= 0.0
# ---------------------------------------------------------------------------
These tests live in nest-core (not nest-plugins-reference) because theyI assumed each trace event has a numeric "time" field representing the event timestamp, since that’s the most common pattern and matches the semantics needed for these assertions. If the actual key used in your trace format differs (for example "ts", "timestamp", or nested under another key), please update timestamps = [event["time"] for event in events] accordingly, or adjust _read_trace to expose a uniform "time" field.
Layer picked
Transport (layer 1) — reference plugin extension + simulator wiring.
Why
The README has a section titled "Determinism & what the clock does" that explicitly flags this gap:
For a testing tool that brags about "diff a trace against properties you care about," a transport that always reports zero latency is a sharp edge. NEST already exposes
mean_latencyas a built-in metric; it's just always 0.0. This PR fixes that without breaking anything.Core idea
A tiny, composable latency-model layer wired into the existing in-memory transport. One thing, well.
Simulator(latency_model=...)— optional callable(rng, sender, receiver) -> delay_seconds. The simulator gets a dedicated, seed-derived_latency_rngso adding latency never perturbs the per-agent or failure RNG streams — existing traces stay byte-identical.InMemoryTransport.sendschedules deliveries atnow + delay, clamped at zero so the clock can never go backwards.transport_config:(parsed byrunner._build_latency_model_from_config) so users get realistic latency from YAML alone.Six pure, composable models in
nest_plugins_reference.transport.latency:kindconstantmeanuniformlow,highexponentialmeannormalmean,stddev,min_delaypair_matrixmatrix: {"a,b": delay, ...},defaultzeroPlus
with_jitter(base, jitter)to wrap any model in ±uniform jitter.How to test
Run it twice — the traces are byte-identical (
md5sumconfirms).Key assumptions / design notes
transport_config⇒ no latency model ⇒time = now⇒ existing traces are byte-for-byte unchanged. I verified the bundledmarketplacebaseline trace MD5-matches across a fresh run on this branch.seeddirectly ((seed * 2654435761 + 0xA1A7C7) & ((1<<63)-1)), not from the master RNG sequence. Adding/removing the latency feature does not change the per-agent or failure-RNG streams for an existing scenario.max(0.0, ...)so the virtual clock can never move backwards — important for thenormalmodel and forwith_jitter.test_latency_compatible_with_drop_rate: latency +message_drop_rate+ Byzantine + partitions all coexist.transport_config: {kind: ..., mean: ...}at the top level or nested astransport_config: {latency: {kind: ..., ...}}— whichever reads better.Future work
latency_modelcould itself become a named plugin innest.plugins.transport, so the registry-driven plugin path (currently shadowed by the simulator's hard-wiredInMemoryTransport) gets exercised.latency_p50/latency_p99metric alongsidemean_latency— easy now that the data is real.pair_matrix-style) to model lossy WAN links.replaytransport that reads latencies from a captured production trace.Persona
Linux kernel maintainer — networking, queues, scheduling. Small, sharp tools. Pure functions, no global state, one knob per job.
https://claude.ai/code/session_01C5j2D4MgCkPgsjSCqBVpWW
Generated by Claude Code
Summary by Sourcery
Introduce configurable per-hop latency modeling to the in-memory transport and wire it through the simulator and scenario runner while preserving backward-compatible zero-latency behavior.
New Features:
Enhancements:
Tests: