Nvidia Blackwell Support #7

Merged
LZL0 merged 13 commits into master from performance on Dec 8, 2025
@LZL0 (Member) commented Dec 8, 2025

Summary by cubic

Optimized streaming guardrails and drift detection for Nvidia Blackwell-class throughput, and added a full benchmark suite and docs. L0 now sustains 90K+ tokens/s with all features enabled, leaving ample headroom for models that generate 1000+ tokens/s.

  • Performance
    • JSON guardrail now uses incremental state (O(delta) per token) with full analysis only at completion.
    • Drift detection now analyzes a sliding window (default 500 chars) for meta/tone/repetition/markdown checks.
    • Tuned defaults: guardrails=15, drift=25, checkpoint=20; updated examples in ADVANCED.md.
  • Benchmarks
    • New benchmark suite (tests/test_benchmark.py) and BENCHMARKS.md with scenarios, TTFT, and overhead.
    • README highlights “Blackwell-ready”.
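
To illustrate the "incremental state, O(delta) per token" idea from the Performance notes, here is a minimal sketch. It is not L0's implementation: `JsonStreamState` and `validate_complete` are hypothetical names, and the only structure tracked is bracket depth and string state. Each call to `feed` scans just the newly arrived text, and the full parse with `json.loads` runs once, at completion.

```python
import json
from dataclasses import dataclass


@dataclass
class JsonStreamState:
    """Running structural state for a partially streamed JSON document (hypothetical)."""
    depth: int = 0            # current nesting depth of {} and []
    in_string: bool = False   # True while inside a string literal
    escaped: bool = False     # True if the previous char was a backslash inside a string
    unbalanced: bool = False  # True once a closer appears with nothing open

    def feed(self, delta: str) -> None:
        """Update the state from newly arrived text only: O(len(delta))."""
        for ch in delta:
            if self.in_string:
                if self.escaped:
                    self.escaped = False
                elif ch == "\\":
                    self.escaped = True
                elif ch == '"':
                    self.in_string = False
            elif ch == '"':
                self.in_string = True
            elif ch in "{[":
                self.depth += 1
            elif ch in "}]":
                self.depth -= 1
                if self.depth < 0:
                    self.unbalanced = True

    def is_closed(self) -> bool:
        """True when every opener seen so far has been closed and no string is open."""
        return self.depth == 0 and not self.in_string and not self.unbalanced


def validate_complete(text: str) -> bool:
    """Full analysis, run once when the stream has finished."""
    try:
        json.loads(text)
        return True
    except json.JSONDecodeError:
        return False
```

A caller would invoke `state.feed(token)` for each streamed token and call `validate_complete(full_text)` only after the final token, so per-token cost does not grow with the length of the output.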

Written for commit dcb1a47. Summary will update automatically on new commits.
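
The sliding-window drift check mentioned in the summary can be sketched in the same spirit. This is an assumption-laden illustration, not the code from this PR: `recent_window`, `has_repeated_sentence`, and `check_drift` are hypothetical helpers, and only a simple repetition check is shown. The point is that the check inspects at most the last 500 characters, so its cost stays bounded as the output grows.

```python
def recent_window(content: str, window_size: int = 500) -> str:
    """Return at most the last `window_size` characters of the stream so far."""
    return content[-window_size:] if window_size > 0 else ""


def has_repeated_sentence(window: str, min_length: int = 20) -> bool:
    """Hypothetical repetition check: flag any sufficiently long sentence seen twice."""
    seen = set()
    for sentence in window.split("."):
        # Normalize case and whitespace so trivial differences do not hide repeats.
        normalized = " ".join(sentence.lower().split())
        if len(normalized) < min_length:
            continue
        if normalized in seen:
            return True
        seen.add(normalized)
    return False


def check_drift(content: str, window_size: int = 500) -> bool:
    """Run the repetition check on the recent window only, not the full content."""
    return has_repeated_sentence(recent_window(content, window_size))
```

One consequence of this design is that repetition which has already scrolled out of the window is no longer detected; that is the trade-off for bounded per-check cost.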

@cubic-dev-ai cubic-dev-ai Bot left a comment

2 issues found across 7 files

Prompt for AI agents (all 2 issues)

Check if these issues are valid — if so, understand the root cause of each and fix them.


<file name="BENCHMARKS.md">

<violation number="1" location="BENCHMARKS.md:50">
P2: Incorrect import pattern in code example. The L0 API exports `run` directly from the module, so `from l0 import l0` will fail. Use `import l0` instead, and replace `json_rule()` with an actual exported guardrail like `l0.JSON_ONLY_GUARDRAILS` or `l0.Guardrails.recommended()`.</violation>
</file>

<file name="tests/test_benchmark.py">

<violation number="1" location="tests/test_benchmark.py:291">
P2: Inconsistent TTFT measurement: `start_time` is set after `_internal_run` completes, but in `run_baseline_benchmark` it's set before iteration. This makes time-to-first-token comparisons between baseline and L0 benchmarks invalid. Consider moving `start_time` before the `_internal_run` call to be consistent.</violation>
</file>

Comment thread BENCHMARKS.md Outdated

Configure via `check_intervals`:
```python
from l0 import l0
```

@cubic-dev-ai cubic-dev-ai Bot Dec 8, 2025

P2: Incorrect import pattern in code example. The L0 API exports `run` directly from the module, so `from l0 import l0` will fail. Use `import l0` instead, and replace `json_rule()` with an actual exported guardrail like `l0.JSON_ONLY_GUARDRAILS` or `l0.Guardrails.recommended()`.

Prompt for AI agents
Check if this issue is valid — if so, understand the root cause and fix it. At BENCHMARKS.md, line 50:

<comment>Incorrect import pattern in code example. The L0 API exports `run` directly from the module, so `from l0 import l0` will fail. Use `import l0` instead, and replace `json_rule()` with an actual exported guardrail like `l0.JSON_ONLY_GUARDRAILS` or `l0.Guardrails.recommended()`.</comment>

<file context>
@@ -0,0 +1,97 @@
+
+Configure via `check_intervals`:
+```python
+from l0 import l0
+
+result = await l0.run(
</file context>

✅ Addressed in 768c1a8

Comment thread tests/test_benchmark.py Outdated
        check_intervals=check_intervals,
    )

    start_time = time.perf_counter()

@cubic-dev-ai cubic-dev-ai Bot Dec 8, 2025

P2: Inconsistent TTFT measurement: `start_time` is set after `_internal_run` completes, but in `run_baseline_benchmark` it's set before iteration. This makes time-to-first-token comparisons between baseline and L0 benchmarks invalid. Consider moving `start_time` before the `_internal_run` call to be consistent.

Prompt for AI agents
Check if this issue is valid — if so, understand the root cause and fix it. At tests/test_benchmark.py, line 291:

<comment>Inconsistent TTFT measurement: `start_time` is set after `_internal_run` completes, but in `run_baseline_benchmark` it's set before iteration. This makes time-to-first-token comparisons between baseline and L0 benchmarks invalid. Consider moving `start_time` before the `_internal_run` call to be consistent.</comment>

<file context>
@@ -0,0 +1,1044 @@
+        check_intervals=check_intervals,
+    )
+
+    start_time = time.perf_counter()
+
+    async for event in result:
</file context>

✅ Addressed in 9d74796
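
The TTFT concern raised in this thread can be sketched as follows. This is not the benchmark code from `tests/test_benchmark.py`: `token_stream`, `baseline_run`, `wrapped_run`, and `measure_ttft` are hypothetical stand-ins, and the 50 ms setup delay is an arbitrary assumption. The sketch only shows the principle that the timer should start before any setup call in both code paths, so that setup cost is counted the same way for the baseline and the wrapped run.

```python
import asyncio
import time
from typing import AsyncIterator, Awaitable, Callable, Optional


async def token_stream(n: int = 3) -> AsyncIterator[str]:
    """Stand-in for a model token stream."""
    for i in range(n):
        await asyncio.sleep(0)
        yield f"tok{i}"


async def baseline_run() -> AsyncIterator[str]:
    """Baseline path: returns the raw stream with no setup cost."""
    return token_stream()


async def wrapped_run(setup_delay: float = 0.05) -> AsyncIterator[str]:
    """Wrapped path: hypothetical setup work happens before the stream is returned."""
    await asyncio.sleep(setup_delay)
    return token_stream()


async def measure_ttft(
    run: Callable[[], Awaitable[AsyncIterator[str]]],
) -> Optional[float]:
    """Seconds from just before `run()` is called until the first token arrives."""
    start_time = time.perf_counter()  # started BEFORE setup, identically for both paths
    stream = await run()
    ttft = None
    async for _ in stream:
        if ttft is None:
            ttft = time.perf_counter() - start_time
    return ttft
```

Calling `asyncio.run(measure_ttft(baseline_run))` and `asyncio.run(measure_ttft(wrapped_run))` then gives two numbers measured from the same reference point. Had the timer started after `await run()`, the wrapped path's setup delay would be silently excluded from its TTFT.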

@LZL0 LZL0 merged commit 8ec63c3 into master Dec 8, 2025
4 checks passed