This repository has been archived by the owner on Jun 24, 2024. It is now read-only.

Medusa Speculative Decoding #423

Open
someone13574 opened this issue Sep 11, 2023 · 1 comment

@someone13574
A project called Medusa was recently released. It trains additional lm_head's that, instead of predicting the next token, predict the tokens at positions n+2, n+3, and n+4. It then generates a tree of possible combinations of the top-k candidates for those upcoming tokens and evaluates them all at once with some clever attention masking, accepting the best one. They report a ~2x speedup, and it looks like they are planning to integrate it into llama.cpp, so I thought it would be a good fit for this project as well.
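To make the idea concrete, here is a minimal sketch of the candidate-tree and verification logic described above. Everything here is hypothetical illustration, not Medusa's actual code: `head_topk` stands in for the top-k proposals of each extra head, and `oracle(prefix)` stands in for the base model's greedy next-token prediction. Real Medusa scores all tree paths in a single batched forward pass using a tree attention mask; the explicit loop below only illustrates the acceptance rule.

```python
from itertools import product

def build_candidates(head_topk):
    """Cartesian product of per-head top-k lists -> candidate continuations.
    head_topk[i] holds the guesses of the head predicting token n+1+i."""
    return [list(c) for c in product(*head_topk)]

def verify(prefix, candidates, oracle):
    """Accept the longest candidate prefix the base model agrees with."""
    best = []
    for cand in candidates:
        accepted = []
        ctx = list(prefix)
        for tok in cand:
            if oracle(ctx) != tok:   # base model disagrees: stop this path
                break
            accepted.append(tok)
            ctx.append(tok)
        if len(accepted) > len(best):
            best = accepted
    return best

# Toy base model: the next token is always the previous one plus 1.
oracle = lambda ctx: ctx[-1] + 1
prefix = [10]
heads = [[11, 12], [12, 13], [14, 13]]   # top-2 guesses per head (made up)
print(verify(prefix, build_candidates(heads), oracle))  # [11, 12, 13]
```

Because several speculated tokens can be accepted per base-model step, the method trades extra (parallel) compute for fewer sequential decoding steps, which is where the reported speedup comes from.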

Links: Blog, Implementation, Models

@someone13574
Author

Ref to llama.cpp issue ggerganov/llama.cpp#3137
