[FeatureRequest] Constraint input & output sizes no matter the dataset #106

Closed
markVaykhansky opened this issue Apr 8, 2025 · 2 comments

@markVaykhansky
Collaborator

Description
Currently, the user can limit the prompt input size and the model output size only when using the emulated data type.
Please decouple the input and output size configuration from the data type.

Suggested Implementation
Add --data-max-input-tokens and --max-output-tokens parameters to the CLI.
If the data type is file or transformers and an input is larger than data_max_input_tokens, the input should be truncated at the limit.
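
The truncation rule suggested above could look roughly like the sketch below. This is only an illustration, not GuideLLM code: `truncate_prompt` and its `tokenize` / `detokenize` parameters are hypothetical names, and whitespace splitting stands in for a real model tokenizer.

```python
from typing import Callable, List, Optional


def truncate_prompt(
    prompt: str,
    max_input_tokens: Optional[int],
    tokenize: Callable[[str], List[str]] = str.split,
    detokenize: Callable[[List[str]], str] = " ".join,
) -> str:
    """Cut a prompt off at max_input_tokens tokens (hypothetical helper).

    The default whitespace tokenizer is an assumption for illustration;
    a real implementation would use the model's own tokenizer.
    """
    if max_input_tokens is None:
        # No limit configured: leave the prompt untouched.
        return prompt
    if max_input_tokens < 0:
        raise ValueError("max_input_tokens must be non-negative")

    tokens = tokenize(prompt)
    if len(tokens) <= max_input_tokens:
        # Already within the limit: return the prompt unchanged.
        return prompt

    # Keep only the first max_input_tokens tokens.
    return detokenize(tokens[:max_input_tokens])
```

Prompts at or under the limit pass through unchanged, so the same function could be applied to every row regardless of the data type.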

@markVaykhansky markVaykhansky changed the title Constraint input & output sizes no matter the dataset [FeatureRequest] Constraint input & output sizes no matter the dataset Apr 8, 2025
@rgreenberg1
Collaborator

rgreenberg1 commented May 8, 2025

@markVaykhansky just following up here. How does the following proposal work?

If we need to limit the input tokens, pre-process the dataset. For output tokens, we have an environment variable, GUIDELLM__OPENAI__MAX_OUTPUT_TOKENS.

cc @markurtz @sjmonson
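
For reference, a minimal sketch of passing that environment variable to a child process from Python. Only the variable name comes from this thread; the helper name `env_with_output_limit` is hypothetical, and the assumption that the value is a plain integer string is mine.

```python
import os
from typing import Dict, Optional


def env_with_output_limit(
    max_output_tokens: int,
    base_env: Optional[Dict[str, str]] = None,
) -> Dict[str, str]:
    """Return a copy of the environment with the output-token limit set.

    Hypothetical helper: assumes the variable accepts an integer string.
    """
    # Copy so the caller's (or the process's) environment is not mutated.
    env = dict(os.environ if base_env is None else base_env)
    env["GUIDELLM__OPENAI__MAX_OUTPUT_TOKENS"] = str(max_output_tokens)
    return env
```

The returned dictionary can be handed to `subprocess.run(..., env=env)` when launching the benchmark.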

@rgreenberg1 rgreenberg1 moved this to Backlog in GuideLLM Kanban Board May 8, 2025
markurtz added a commit that referenced this issue Jun 5, 2025
Added a new command: preprocess. This command allows users to
preprocess a dataset (from HF, a file, etc.) and limit the prompts to a
specific token-size distribution. The generated dataset is saved to a
local file and optionally to HF, and can later be used by the GuideLLM
benchmark. Solves #106

---------

Co-authored-by: Mark Kurtz <[email protected]>
@markurtz
Member

Closing this out since this was addressed in #162

@github-project-automation github-project-automation bot moved this from Backlog to Done in GuideLLM Kanban Board Jun 12, 2025