How to predict a specific length of tokens? #1975

Open
simmonssong opened this issue Mar 19, 2025 · 3 comments

@simmonssong

In llama.cpp, the --n-predict option sets the number of tokens to predict when generating text.

I can't find the corresponding binding in the docs.

@DanieleMorotti
Contributor

Hi, the binding for that parameter is max_tokens.
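For example (a minimal sketch; the model path and prompt are placeholders), you pass max_tokens to the high-level completion call:

```python
from llama_cpp import Llama

# Load a local GGUF model (placeholder path).
llm = Llama(model_path="./models/model.gguf")

# max_tokens caps the number of generated tokens, analogous to llama.cpp's
# --n-predict. Generation can still stop earlier, e.g. when the model emits
# an EOS token or hits a stop string.
output = llm(
    "Q: Name the planets in the solar system. A:",
    max_tokens=32,
)
print(output["choices"][0]["text"])
```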

@simmonssong
Author

max_tokens cannot guarantee an exact number of predicted tokens; sometimes the model generates fewer than max_tokens.

@DanieleMorotti
Contributor

Yes, and the --n-predict option in llama.cpp won't do that either unless you also ignore the EOS token, as explained here. So I'm not sure whether this is what you were looking for: sampling until the --n-predict value is reached and then truncating.

I wasn't able to find such an option in the high-level API of this repo; maybe you can have a look at this example, which uses the low-level API.
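As an illustration only (a sketch, not taken from the repo's examples): if I remember correctly, Llama.generate keeps sampling until the caller stops it, so taking exactly n_predict tokens from the generator without breaking at EOS roughly mimics --n-predict together with --ignore-eos in llama.cpp. Text sampled past an EOS token may be low quality. The model path and prompt below are placeholders.

```python
import itertools

from llama_cpp import Llama

llm = Llama(model_path="./models/model.gguf")  # placeholder path

n_predict = 64  # exact number of tokens to sample
prompt_tokens = llm.tokenize(b"Once upon a time")

# Llama.generate yields one sampled token id at a time and does not stop by
# itself; slicing the generator returns exactly n_predict tokens, EOS included.
generated = list(itertools.islice(llm.generate(prompt_tokens), n_predict))

print(llm.detokenize(generated).decode("utf-8", errors="ignore"))
```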
