Commit e5dcbcb

docs: show per-context speedup multipliers in benchmark table
1 parent 1239f09 commit e5dcbcb

1 file changed

Lines changed: 4 additions & 3 deletions

README.md
@@ -60,10 +60,11 @@ Benchmarked with `gemma-4-26b-a4b-it-4bit` running three configurations across 5
 | Configuration | 512 tokens | 40K tokens | 100K tokens | Avg TPS* |
 |---|---|---|---|---|
 | Baseline | 70.8 | 34.3 | 25.8 | 36.6 |
-| **MTP Speculative** | 71.5 | 38.4 | 29.1 | **40.3** (1.10×) |
-| **MTP + TurboQuant**| **72.1** | **65.2** | **62.1** | **66.2** (1.81×) |
+| **MTP Speculative** | 71.5 (1.01×) | 38.4 (1.12×) | 29.1 (1.13×) | **40.3** |
+| **MTP + TurboQuant**| **72.1 (1.02×)** | **65.2 (1.90×)** | **62.1 (2.41×)** | **66.2** |
+
+*\* Time-weighted average: `total_tokens / sum(60/TPS)` — correct wall-clock representation vs arithmetic mean.*
 
-*\* Time-weighted average: `total_tokens / sum(60/TPS)` — gives correct wall-clock representation vs arithmetic mean.*
 
 ### Time to First Token (seconds) — lower is better
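
Review note: the per-context multipliers and the `Avg TPS*` column in this diff can be checked with a short script. The sketch below is not part of the commit. It assumes each context run generates the same number of tokens (60, inferred from the `60/TPS` term in the footnote) and uses only the three context columns visible in the table. The names `TPS`, `TOKENS_PER_RUN`, `time_weighted_tps`, and `speedups` are hypothetical.

```python
# TPS values copied from the benchmark table in this diff
# (columns: 512, 40K, 100K tokens of context).
TPS = {
    "Baseline": [70.8, 34.3, 25.8],
    "MTP Speculative": [71.5, 38.4, 29.1],
    "MTP + TurboQuant": [72.1, 65.2, 62.1],
}

# Assumption: every run generates 60 tokens, inferred from `60/TPS`.
TOKENS_PER_RUN = 60


def time_weighted_tps(tps_values, tokens_per_run=TOKENS_PER_RUN):
    """Total tokens divided by total wall-clock seconds across all runs."""
    total_tokens = tokens_per_run * len(tps_values)
    total_seconds = sum(tokens_per_run / tps for tps in tps_values)
    return total_tokens / total_seconds


def speedups(config, baseline="Baseline"):
    """Per-context TPS ratio of `config` over the baseline, rounded to 2 places."""
    return [round(c / b, 2) for c, b in zip(TPS[config], TPS[baseline])]


if __name__ == "__main__":
    for name, values in TPS.items():
        print(name, round(time_weighted_tps(values), 1), speedups(name))
```

With equal tokens per run, this average reduces to the harmonic mean of the per-context TPS values. For the Baseline row it gives 36.6, whereas the arithmetic mean of the same three values is about 43.6, which is why the footnote prefers the time-weighted form.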
