Add support to benchmark prefix caching #36
Summary of Changes
Hello @TJ5, I'm Gemini Code Assist! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!
This pull request significantly enhances our benchmarking capabilities by adding robust support for testing prefix caching. It allows users to define a shared prefix for all input prompts, either by a fixed token count or as a ratio of the total prompt length. This is a critical feature for accurately evaluating the performance benefits of prefix caching in large language models. Additionally, I've implemented a mechanism to ensure that the non-prefix portion of the prompts remains distinct, preventing accidental shared prefixes from the underlying dataset and guaranteeing precise benchmark results.
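As a rough illustration of the two ways of sizing the shared prefix described above, the sketch below resolves a prefix size from either a fixed token count or a ratio of the prompt length. The function name `resolve_prefix_tokens` and its parameters are hypothetical and are not taken from the PR; the clamping behavior is an assumption based on the edge cases mentioned in the summary.

```python
from typing import Optional


def resolve_prefix_tokens(
    prompt_tokens: int,
    prefix_length: Optional[int] = None,
    prefix_ratio: Optional[float] = None,
) -> int:
    """Return how many leading tokens of each prompt form the shared prefix.

    Hypothetical helper: the names and exact rules are assumptions,
    not the PR's actual implementation.
    """
    if prefix_length is not None and prefix_ratio is not None:
        raise ValueError("specify a prefix length or a prefix ratio, not both")
    if prefix_length is not None:
        if prefix_length < 0:
            raise ValueError("prefix length must be non-negative")
        wanted = prefix_length
    elif prefix_ratio is not None:
        if not 0.0 <= prefix_ratio <= 1.0:
            raise ValueError("prefix ratio must be between 0 and 1")
        wanted = int(prompt_tokens * prefix_ratio)
    else:
        wanted = 0  # no shared prefix requested
    # Clamp so the prefix never exceeds the prompt itself.
    return min(wanted, prompt_tokens)
```

For example, `resolve_prefix_tokens(100, prefix_ratio=0.25)` yields a 25-token prefix, while a fixed length larger than the prompt is clamped to the prompt length.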
Highlights
- New CLI Options for Prefix Caching: I've introduced `--prompt-prefix-length` and `--prompt-prefix-ratio` command-line options to allow users to specify a shared prefix for all input prompts, enabling benchmarking of prefix caching.
- Dynamic Prefix Generation: The `TextSampler` now includes logic to generate a shared prefix of a specified token length or ratio, sampled from the existing dataset, and prepend it to all prompts.
- Prompt Uniqueness Mechanism: To prevent unintended shared prefixes from the small `sonnet.txt` dataset, a random 4-digit number is now inserted between the generated prefix and the main body of each prompt, ensuring distinctness for benchmarking.
- Input Validation: I've added validation to ensure that only one of the prefix length options (`--prompt-prefix-length` or `--prompt-prefix-ratio`) can be used at a time, preventing conflicting configurations.
- Comprehensive Testing: New unit tests have been added to cover both fixed and ratio-based prefix generation, as well as edge cases such as handling short prompts, empty datasets, and scenarios where the prefix length might exceed the total prompt length.
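The prefix generation and uniqueness mechanism listed above could look roughly like the following. This is a minimal sketch, not the PR's `TextSampler` code: `build_prompts` is a hypothetical function, whitespace-separated words stand in for tokenizer tokens, and the corpus lines are passed in directly rather than read from `sonnet.txt`.

```python
import random
from typing import List


def build_prompts(
    corpus_lines: List[str],
    num_prompts: int,
    prompt_tokens: int,
    prefix_tokens: int,
    rng: random.Random,
) -> List[str]:
    """Build prompts that share one sampled prefix (hypothetical sketch)."""
    # Assumption: whitespace-separated words approximate tokens here.
    words = [w for line in corpus_lines for w in line.split()]
    if not words:
        raise ValueError("corpus is empty")
    prefix_tokens = min(prefix_tokens, prompt_tokens)
    # Sample the shared prefix once so every prompt starts identically.
    prefix = [rng.choice(words) for _ in range(prefix_tokens)]
    prompts = []
    for _ in range(num_prompts):
        # A random 4-digit number keeps the bodies from accidentally
        # sharing more leading text than the intended prefix.
        separator = str(rng.randint(1000, 9999))
        body = [rng.choice(words) for _ in range(prompt_tokens - prefix_tokens)]
        # Note: the separator adds one extra word beyond prompt_tokens.
        prompts.append(" ".join(prefix + [separator] + body))
    return prompts
```

Passing a seeded `random.Random` makes the generated prompts reproducible across benchmark runs, which is a design choice of this sketch rather than something the PR states.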
Code Review
The code changes introduce the ability to benchmark prefix caching by giving users the option to make a portion of the input prompts a shared prefix. I've provided feedback to improve the correctness of the token length calculations and the validation logic for the new CLI options.
@slin1237 @CatherineSue Could you please take some time to review? There are some other tasks that depend on this feature.
I will take another, more detailed pass over the implementation later.
Can we add documentation for this change?
@TJ5 Could you resolve the conflict and have @CatherineSue review again?
0637d74 to 9f704a4
tests token length fix sample prefix tokens not chars lint fix tests percentage lint fixes gemini-feedback fix tests failing rename prompt-prefix-ratio make current_prefix_length local remove prompt prefix length refactor prefix sampling logic format revert back to 4 digits fix prefix length to change with variable distribution use line char ratio put latest changes in their own function and update prefix truncation gemini-feedback make number of tokens more accurate to scenario use tokenizer.encode fix tests documentation fix merge issues
testing codex usage @codex review this PR
💡 Codex Review
Here are some automated review suggestions for this pull request.
Motivation: We want to be able to benchmark prefix caching by giving users the option to make a portion of the input prompts a shared prefix.
Changes:
Image: Randomly sampled prefix of 10 tokens, "And make T", prepended to all prompts, followed by a random 4-digit number, followed by the rest of the randomly sampled prompt.
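One way to sanity-check a layout like the one pictured is to compute the longest leading string shared by every prompt and confirm it stops at the intended prefix. The sketch below uses the standard library's `os.path.commonprefix`, which compares strings character by character. The helper name `shared_prefix` is hypothetical, and the three prompts are made-up examples: only the "And make T" prefix comes from the image description, while the numbers and bodies are illustrative.

```python
import os
from typing import List


def shared_prefix(prompts: List[str]) -> str:
    """Longest leading string common to every prompt (character-wise)."""
    return os.path.commonprefix(prompts)


# Illustrative prompts: shared prefix, then a 4-digit number, then a body.
example_prompts = [
    "And make T 4821 the world away",
    "And make T 1093 or else this glutton be",
    "And make T 7750 to eat the world's due",
]

# The common part ends right before the differing 4-digit numbers.
common = shared_prefix(example_prompts)
```

With purely random numbers, two prompts may occasionally share a leading digit, so a check like this is most useful across the whole prompt set rather than on a single pair.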