```diff
@@ -60,10 +60,11 @@ Benchmarked with `gemma-4-26b-a4b-it-4bit` running three configurations across 5
 | Configuration | 512 tokens | 40K tokens | 100K tokens | Avg TPS* |
 |---|---|---|---|---|
 | Baseline | 70.8 | 34.3 | 25.8 | 36.6 |
-| **MTP Speculative** | 71.5 | 38.4 | 29.1 | **40.3** (1.10×) |
-| **MTP + TurboQuant** ⭐ | **72.1** | **65.2** | **62.1** | **66.2** (1.81×) |
+| **MTP Speculative** | 71.5 (1.01×) | 38.4 (1.12×) | 29.1 (1.13×) | **40.3** |
+| **MTP + TurboQuant** ⭐ | **72.1 (1.02×)** | **65.2 (1.90×)** | **62.1 (2.41×)** | **66.2** |
+
+*\* Time-weighted average: `total_tokens / sum(60/TPS)` — correct wall-clock representation vs arithmetic mean.*
 
-*\* Time-weighted average: `total_tokens / sum(60/TPS)` — gives correct wall-clock representation vs arithmetic mean.*
 
 ### Time to First Token (seconds) — lower is better
 
```
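The footnote's formula and the new per-column speedups can be checked against the table. Below is a minimal Python sketch, assuming (the table does not say so) that every context size generates the same 60 tokens per run, so each run takes `60 / TPS` seconds and the time-weighted average reduces to a harmonic mean of the per-context TPS values. The names `TOKENS_PER_RUN`, `RESULTS`, `time_weighted_tps` and `speedups` are hypothetical and not taken from the benchmark code. Under that assumption the sketch reproduces the Avg TPS column (36.6, 40.3, 66.2) to one decimal place, along with the per-context speedups in the updated rows.

```python
"""Recompute the Avg TPS column and per-context speedups from the table.

Assumption (not stated in the table): every context size is measured by
generating the same fixed number of tokens, so per-run wall-clock time is
tokens_per_run / TPS.
"""

TOKENS_PER_RUN = 60  # hypothetical reading of the "60" in the footnote's formula

# Decode throughput (tokens/s) at 512, 40K and 100K context, copied from the table.
RESULTS = {
    "Baseline": [70.8, 34.3, 25.8],
    "MTP Speculative": [71.5, 38.4, 29.1],
    "MTP + TurboQuant": [72.1, 65.2, 62.1],
}


def time_weighted_tps(tps_values, tokens_per_run=TOKENS_PER_RUN):
    """total_tokens / total_seconds, i.e. the harmonic mean of the TPS values."""
    total_tokens = tokens_per_run * len(tps_values)
    total_seconds = sum(tokens_per_run / tps for tps in tps_values)
    return total_tokens / total_seconds


def speedups(tps_values, baseline_values):
    """Per-context speedup over the baseline, rounded to 2 decimals like the table."""
    return [round(t / b, 2) for t, b in zip(tps_values, baseline_values)]


if __name__ == "__main__":
    base = RESULTS["Baseline"]
    for name, tps in RESULTS.items():
        weighted = time_weighted_tps(tps)
        arithmetic = sum(tps) / len(tps)
        print(
            f"{name:18s} time-weighted={weighted:5.1f}  "
            f"arithmetic={arithmetic:5.1f}  speedups={speedups(tps, base)}"
        )
```

Because the per-run token count appears in both the numerator and the denominator, it cancels out: any equal count per context size gives the same average. For comparison, the arithmetic mean of the Baseline row is about 43.6 TPS, which overstates throughput because the slow long-context runs account for most of the wall-clock time.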