[FeatureRequest] Constraint input & output sizes no matter the dataset #106
Comments
@markVaykhansky just following up here. How does the following proposal work? If we need to limit the output tokens, pre-process the dataset. For input tokens, we have an environment variable.
markurtz added a commit that referenced this issue on Jun 5, 2025:
Added a new command: preprocess. This command allows users to preprocess a dataset (from HF, a file, etc.) and limit the prompts to a specific token size distribution. The generated dataset is saved to a local file (and optionally to HF) and can later be used by GuideLLM benchmark. Solves #106 --------- Co-authored-by: Mark Kurtz <[email protected]>
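The preprocess flow described in the commit can be sketched as follows. This is a minimal illustration, not GuideLLM's actual implementation: the JSONL layout with a "prompt" field and the `tokenize`/`detokenize` helpers are assumptions for the sake of the example.

```python
import json

def preprocess_dataset(in_path, out_path, max_tokens, tokenize, detokenize):
    """Sketch of the preprocessing step: read prompts from a local file,
    limit each one to max_tokens tokens, and write the result to a new
    file that a later benchmark run can consume.

    Assumptions (not the real GuideLLM format): the input is JSONL with
    a "prompt" field, and tokenize/detokenize are stand-ins for a real
    tokenizer's encode/decode.
    """
    with open(in_path) as fin, open(out_path, "w") as fout:
        for line in fin:
            row = json.loads(line)
            tokens = tokenize(row["prompt"])
            # Cut the prompt off at the token limit.
            row["prompt"] = detokenize(tokens[:max_tokens])
            fout.write(json.dumps(row) + "\n")
```

In a real setup the tokenizer would be the target model's tokenizer, so the measured token counts match what the server sees.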
Closing this out since this was addressed in #162.
Description
Currently, the user can limit the prompt input size and model output size only when using the emulated data type. Please separate the input & output size configuration from the data type.
Suggested Implementation
Add `--data-max-input-tokens` and `--max-output-tokens` parameters to the CLI. If the data type is `file` or `transformers` and the input size is larger than `data_max_input_tokens`, it should cut off the input at the limit.