[FeatureRequest] Constraint input & output sizes no matter the dataset #106
Comments
@markVaykhansky just following up here. How does the following proposal work? If we need to limit the output tokens, pre-process the dataset. For input tokens, we have an environment variable.
markurtz added a commit that referenced this issue on Jun 5, 2025:
Added a new command: preprocess. This command allows users to preprocess a dataset (from HF, a file, etc.) and limit the prompts to a specific token size distribution. The generated dataset is saved to a local file (and optionally to HF) and can later be used by GuideLLM benchmark. Solves #106 --------- Co-authored-by: Mark Kurtz <[email protected]>
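The preprocess flow described in the commit can be sketched as follows. This is a minimal illustration, not GuideLLM's actual implementation: the JSONL layout with a "prompt" field and the `tokenize`/`detokenize` helpers are assumptions for the sake of the example.

```python
import json

def preprocess_dataset(in_path, out_path, max_tokens, tokenize, detokenize):
    """Sketch of the preprocessing step: read prompts from a local file,
    limit each one to max_tokens tokens, and write the result to a new
    file that a later benchmark run can consume.

    Assumptions (not the real GuideLLM format): the input is JSONL with
    a "prompt" field, and tokenize/detokenize are stand-ins for a real
    tokenizer's encode/decode.
    """
    with open(in_path) as fin, open(out_path, "w") as fout:
        for line in fin:
            row = json.loads(line)
            tokens = tokenize(row["prompt"])
            # Cut the prompt off at the token limit.
            row["prompt"] = detokenize(tokens[:max_tokens])
            fout.write(json.dumps(row) + "\n")
```

In a real setup the tokenizer would be the target model's tokenizer, so the measured token counts match what the server sees.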
Closing this out since this was addressed in #162.
Description
Currently, the user can limit the prompt input size and model output size only when using the emulated data type. Please separate the input & output size configuration from the data type.
Suggested Implementation
Add `--data-max-input-tokens` and `--max-output-tokens` parameters to the CLI. If the data type is `file` or `transformers` and the input size is larger than `data_max_input_tokens`, it should cut off the input at the limit.