
Conversation

@TJ5 (Collaborator) commented Jul 8, 2025:

Motivation: We want to be able to benchmark prefix caching by giving users the option to make a portion of the input prompts a shared prefix.

Changes:

  • Added a new option --prompt-prefix-ratio [float], which takes a ratio in [0.0, 1.0] of the overall prompt length (in tokens) to be used as a shared prefix. Example: 0.5 means the first half of the tokens of every generated text prompt is a shared prefix.
  • Made the body of each prompt (after the prefix, and regardless of whether a prefix is requested) start with a randomly selected number. Because sonnet is a very small dataset, some prompts could otherwise end up sharing a longer prefix than intended simply because they happen to be very similar by chance. A randomly selected number acts as a buffer between the prefix and the body, guaranteeing that prompts share only the prefix the user wants to benchmark (see the sketch below the image).


Image: a randomly sampled prefix of 10 tokens, "And make T", prepended to all prompts, followed by a random 4-digit number, followed by the rest of the randomly sampled prompt.
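For reference, here is a minimal sketch of the prompt construction described above. It assumes a Hugging Face-style tokenizer with encode/decode, and the function and argument names (build_prompts, dataset_lines, etc.) are illustrative, not the actual TextSampler API.

```python
import random


def build_prompts(tokenizer, dataset_lines, num_prompts, prompt_len, prefix_ratio):
    """Sketch: build prompts that share their first prefix_ratio * prompt_len tokens."""
    prefix_len = int(prompt_len * prefix_ratio)
    body_len = prompt_len - prefix_len

    # Tokenize the dataset once; the shared prefix is taken from its start.
    all_tokens = tokenizer.encode(" ".join(dataset_lines))
    shared_prefix = tokenizer.decode(all_tokens[:prefix_len])

    prompts = []
    for _ in range(num_prompts):
        # A random 4-digit number separates the prefix from the body so that two
        # prompts cannot accidentally share more than the intended prefix.
        buffer = str(random.randint(1000, 9999))
        start = random.randrange(max(1, len(all_tokens) - body_len))
        body = tokenizer.decode(all_tokens[start:start + body_len])
        prompts.append(f"{shared_prefix} {buffer} {body}")
    return prompts
```

The final token count in this sketch is only approximate, since the buffer number and the decode/re-encode round trip add a few tokens; it illustrates the structure of the prompts, not exact length accounting.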

Added tests

@TJ5 requested review from CatherineSue and slin1237 as code owners on July 8, 2025 23:28
@gemini-code-assist bot left a comment:

Summary of Changes

Hello @TJ5, I'm Gemini Code Assist! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request significantly enhances our benchmarking capabilities by adding robust support for testing prefix caching. It allows users to define a shared prefix for all input prompts, either by a fixed token count or as a ratio of the total prompt length. This is a critical feature for accurately evaluating the performance benefits of prefix caching in large language models. Additionally, I've implemented a mechanism to ensure that the non-prefix portion of the prompts remains distinct, preventing accidental shared prefixes from the underlying dataset and guaranteeing precise benchmark results.

Highlights

  • New CLI Options for Prefix Caching: I've introduced --prompt-prefix-length and --prompt-prefix-ratio command-line options to allow users to specify a shared prefix for all input prompts, enabling benchmarking of prefix caching.
  • Dynamic Prefix Generation: The TextSampler now includes logic to generate a shared prefix of a specified token length or ratio, sampled from the existing dataset, and prepend it to all prompts.
  • Prompt Uniqueness Mechanism: To prevent unintended shared prefixes from the small sonnet.txt dataset, a random 4-digit number is now inserted between the generated prefix and the main body of each prompt, ensuring distinctness for benchmarking.
  • Input Validation: I've added validation to ensure that only one of the prefix options (--prompt-prefix-length or --prompt-prefix-ratio) can be used at a time, preventing conflicting configurations (a sketch of this check follows this list).
  • Comprehensive Testing: New unit tests have been added to cover both fixed and ratio-based prefix generation, as well as edge cases such as handling short prompts, empty datasets, and scenarios where the prefix length might exceed the total prompt length.
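The input-validation highlight above can be illustrated with a rough argparse-based sketch; the actual CLI framework, flag wiring, and error messages in the repository may differ, so treat the details here as assumptions.

```python
import argparse

parser = argparse.ArgumentParser(description="Prompt generation options (sketch)")
group = parser.add_mutually_exclusive_group()
group.add_argument("--prompt-prefix-length", type=int,
                   help="Fixed number of tokens to use as the shared prefix.")
group.add_argument("--prompt-prefix-ratio", type=float,
                   help="Fraction in [0.0, 1.0] of the prompt length to use as the shared prefix.")
args = parser.parse_args()

# The mutually exclusive group already rejects passing both flags; this adds a
# range check for the ratio option.
if args.prompt_prefix_ratio is not None and not 0.0 <= args.prompt_prefix_ratio <= 1.0:
    parser.error("--prompt-prefix-ratio must be between 0.0 and 1.0")
```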

@gemini-code-assist bot left a comment:

Code Review

The code changes introduce the ability to benchmark prefix caching by giving users the option to make a portion of the input prompts a shared prefix. I've provided feedback to improve the correctness of the token length calculations and the validation logic for the new CLI options.
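One point behind the token-length feedback (and the later "sample prefix tokens not chars" and "use tokenizer.encode" commits) is that prefix lengths should be measured in tokens rather than characters. Below is a minimal sketch of token-accurate truncation, assuming a Hugging Face-style tokenizer; the helper name is hypothetical.

```python
def truncate_to_tokens(tokenizer, text, max_tokens):
    """Sketch: keep at most max_tokens tokens of text, not max_tokens characters."""
    token_ids = tokenizer.encode(text, add_special_tokens=False)
    return tokenizer.decode(token_ids[:max_tokens])


# Example: a 10-token shared prefix measured with the model's own tokenizer.
# prefix = truncate_to_tokens(tokenizer, " ".join(sonnet_lines), 10)
```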

@YouNeedCryDear self-requested a review on July 9, 2025 22:35
YouNeedCryDear previously approved these changes Jul 9, 2025
@YouNeedCryDear (Collaborator) commented:

@slin1237 @CatherineSue Could you please take some time to review? There are some other tasks that depend on this feature.

@CatherineSue (Collaborator) left a comment:

Will take another round of detailed implementation review later.

Can we add documentation for this change?

@TJ5 force-pushed the prefix-caching branch from b59fa7f to 88e3322 on July 30, 2025 17:38
@YouNeedCryDear (Collaborator) commented:

@TJ5 Could you resolve the conflict and have @CatherineSue review again?

@TJ5 force-pushed the prefix-caching branch 2 times, most recently from 0637d74 to 9f704a4 on August 5, 2025 19:52
TJ5 added 2 commits August 12, 2025 10:41. Commit messages in the branch history:

  • tests
  • token length fix
  • sample prefix tokens not chars
  • lint
  • fix tests
  • percentage
  • lint fixes
  • gemini-feedback
  • fix tests failing
  • rename prompt-prefix-ratio
  • mkae current_prefix_length local
  • remove prompt prefix length
  • refactor prefix sampling logic
  • format
  • revert back to 4 digits
  • fix prefix length to change with variable distribution
  • use line char ratio
  • put latest changes in their own function and update prefix truncation
  • gemini-feedback
  • make number of tokens more accurate to scenario
  • use tokenizer.encode
  • fix tests
  • documentation
  • fix merge issues

@ai-jz (Collaborator) commented Sep 30, 2025:

testing codex usage

@codex review this PR

@chatgpt-codex-connector bot left a comment:

💡 Codex Review

Here are some automated review suggestions for this pull request.

