tests
token length fix
sample prefix tokens not chars
lint
fix tests
percentage
lint fixes
gemini-feedback
fix tests failing
rename prompt-prefix-ratio
make current_prefix_length local
remove prompt prefix length
refactor prefix sampling logic
format
revert back to 4 digits
fix prefix length to change with variable distribution
use line char ratio
put latest changes in their own function and update prefix truncation
gemini-feedback
make number of tokens more accurate to scenario
use tokenizer.encode
fix tests
documentation
fix merge issues
docs/user-guide/run-benchmark.md — 5 additions, 0 deletions
@@ -186,6 +186,11 @@ For heavier traffic scenarios, like `D(16000,200)` or `D(128000,200)`, use the f
     --num-concurrency 32 \
 ```
 
+
+To benchmark with prefix caching, you can make a given fraction of each prompt a common prefix with `--prompt-prefix-ratio`. For example, to set the first half of each prompt to a common prefix, use:
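The example command itself is cut off in this extract. A minimal sketch of what such an invocation might look like, assuming the same benchmark CLI and flags used earlier in this guide (the command name, traffic scenario, and concurrency value are placeholders carried over from the surrounding context; only `--prompt-prefix-ratio` is introduced by this change):

```bash
# Hypothetical invocation; only --prompt-prefix-ratio comes from this change.
# A ratio of 0.5 makes the first half of each prompt (measured in tokens,
# not characters) a prefix shared across all requests, so the server can
# serve it from the prefix cache.
benchmark \
  --traffic-scenario "D(16000,200)" \
  --num-concurrency 32 \
  --prompt-prefix-ratio 0.5
```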