
Conversation

@TJ5 (Collaborator) commented Jul 8, 2025:

Motivation: We want to be able to benchmark prefix caching by giving users the option to make a portion of the input prompts a shared prefix.

Changes:

  • Added a new option --prompt-prefix-ratio [float], which takes a ratio in [0.0, 1.0] of the overall prompt length (in tokens) to be used as a shared prefix. Example: 0.5 means the first half of the tokens of every generated text prompt is a shared prefix.
  • Made the body of each prompt (after the prefix, and regardless of whether a prefix is requested) start with a randomly selected number. Because sonnet is a very small dataset, some prompts could otherwise end up sharing a longer prefix than intended simply because they happen to be very similar by chance. A randomly selected number acts as a buffer between the prefix and the body, guaranteeing that prompts share only the prefix the user wants to benchmark (see the sketch below the image).


Image: a randomly sampled prefix of 10 tokens, "And make T", prepended to all prompts, followed by a random 4-digit number, followed by the rest of the randomly sampled prompt.
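For reference, here is a minimal sketch of the prompt construction described above. It assumes a Hugging Face-style tokenizer with encode/decode, and the function and argument names (build_prompts, dataset_lines, etc.) are illustrative, not the actual TextSampler API.

```python
import random


def build_prompts(tokenizer, dataset_lines, num_prompts, prompt_len, prefix_ratio):
    """Sketch: build prompts that share their first prefix_ratio * prompt_len tokens."""
    prefix_len = int(prompt_len * prefix_ratio)
    body_len = prompt_len - prefix_len

    # Tokenize the dataset once; the shared prefix is taken from its start.
    all_tokens = tokenizer.encode(" ".join(dataset_lines))
    shared_prefix = tokenizer.decode(all_tokens[:prefix_len])

    prompts = []
    for _ in range(num_prompts):
        # A random 4-digit number separates the prefix from the body so that two
        # prompts cannot accidentally share more than the intended prefix.
        buffer = str(random.randint(1000, 9999))
        start = random.randrange(max(1, len(all_tokens) - body_len))
        body = tokenizer.decode(all_tokens[start:start + body_len])
        prompts.append(f"{shared_prefix} {buffer} {body}")
    return prompts
```

The final token count in this sketch is only approximate, since the buffer number and the decode/re-encode round trip add a few tokens; it illustrates the structure of the prompts, not exact length accounting.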

Added tests

@TJ5 requested review from CatherineSue and slin1237 as code owners on July 8, 2025 23:28
@gemini-code-assist bot left a comment:

Summary of Changes

Hello @TJ5, I'm Gemini Code Assist! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request significantly enhances our benchmarking capabilities by adding robust support for testing prefix caching. It allows users to define a shared prefix for all input prompts, either by a fixed token count or as a ratio of the total prompt length. This is a critical feature for accurately evaluating the performance benefits of prefix caching in large language models. Additionally, I've implemented a mechanism to ensure that the non-prefix portion of the prompts remains distinct, preventing accidental shared prefixes from the underlying dataset and guaranteeing precise benchmark results.

Highlights

  • New CLI Options for Prefix Caching: I've introduced --prompt-prefix-length and --prompt-prefix-ratio command-line options to allow users to specify a shared prefix for all input prompts, enabling benchmarking of prefix caching.
  • Dynamic Prefix Generation: The TextSampler now includes logic to generate a shared prefix of a specified token length or ratio, sampled from the existing dataset, and prepend it to all prompts.
  • Prompt Uniqueness Mechanism: To prevent unintended shared prefixes from the small sonnet.txt dataset, a random 4-digit number is now inserted between the generated prefix and the main body of each prompt, ensuring distinctness for benchmarking.
  • Input Validation: I've added validation to ensure that only one of the prefix options (--prompt-prefix-length or --prompt-prefix-ratio) can be used at a time, preventing conflicting configurations (a sketch of this check follows this list).
  • Comprehensive Testing: New unit tests have been added to cover both fixed and ratio-based prefix generation, as well as edge cases such as handling short prompts, empty datasets, and scenarios where the prefix length might exceed the total prompt length.
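The input-validation highlight above can be illustrated with a rough argparse-based sketch; the actual CLI framework, flag wiring, and error messages in the repository may differ, so treat the details here as assumptions.

```python
import argparse

parser = argparse.ArgumentParser(description="Prompt generation options (sketch)")
group = parser.add_mutually_exclusive_group()
group.add_argument("--prompt-prefix-length", type=int,
                   help="Fixed number of tokens to use as the shared prefix.")
group.add_argument("--prompt-prefix-ratio", type=float,
                   help="Fraction in [0.0, 1.0] of the prompt length to use as the shared prefix.")
args = parser.parse_args()

# The mutually exclusive group already rejects passing both flags; this adds a
# range check for the ratio option.
if args.prompt_prefix_ratio is not None and not 0.0 <= args.prompt_prefix_ratio <= 1.0:
    parser.error("--prompt-prefix-ratio must be between 0.0 and 1.0")
```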

@gemini-code-assist bot left a comment:

Code Review

The code changes introduce the ability to benchmark prefix caching by giving users the option to make a portion of the input prompts a shared prefix. I've provided feedback to improve the correctness of the token length calculations and the validation logic for the new CLI options.
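One point behind the token-length feedback (and the later "sample prefix tokens not chars" and "use tokenizer.encode" commits) is that prefix lengths should be measured in tokens rather than characters. Below is a minimal sketch of token-accurate truncation, assuming a Hugging Face-style tokenizer; the helper name is hypothetical.

```python
def truncate_to_tokens(tokenizer, text, max_tokens):
    """Sketch: keep at most max_tokens tokens of text, not max_tokens characters."""
    token_ids = tokenizer.encode(text, add_special_tokens=False)
    return tokenizer.decode(token_ids[:max_tokens])


# Example: a 10-token shared prefix measured with the model's own tokenizer.
# prefix = truncate_to_tokens(tokenizer, " ".join(sonnet_lines), 10)
```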

@YouNeedCryDear self-requested a review on July 9, 2025 22:35
YouNeedCryDear previously approved these changes Jul 9, 2025
@YouNeedCryDear (Collaborator) commented:

@slin1237 @CatherineSue Could you please take some time to review? There are some other tasks that depend on this feature.

@CatherineSue (Collaborator) left a comment:

Will take another round of detailed implementation review later.

Can we add documentation for this change?

@TJ5 force-pushed the prefix-caching branch from b59fa7f to 88e3322 on July 30, 2025 17:38
@YouNeedCryDear (Collaborator) commented:

@TJ5 Could you resolve the conflict and have @CatherineSue review again?

@TJ5 force-pushed the prefix-caching branch 2 times, most recently from 0637d74 to 9f704a4 on August 5, 2025 19:52
TJ5 added 2 commits August 12, 2025 10:41. Commit messages in the branch history:

  • tests
  • token length fix
  • sample prefix tokens not chars
  • lint
  • fix tests
  • percentage
  • lint fixes
  • gemini-feedback
  • fix tests failing
  • rename prompt-prefix-ratio
  • mkae current_prefix_length local
  • remove prompt prefix length
  • refactor prefix sampling logic
  • format
  • revert back to 4 digits
  • fix prefix length to change with variable distribution
  • use line char ratio
  • put latest changes in their own function and update prefix truncation
  • gemini-feedback
  • make number of tokens more accurate to scenario
  • use tokenizer.encode
  • fix tests
  • documentation
  • fix merge issues

@ai-jz (Collaborator) commented Sep 30, 2025:

testing codex usage

@codex review this PR

@chatgpt-codex-connector bot left a comment:

💡 Codex Review

Here are some automated review suggestions for this pull request.

