SpD, multiprojection heads #306

Open
wants to merge 44 commits into
base: main

eplatero97
Contributor

Objective

This PR implements post-attention hidden-size projections that are used to speculate tokens ahead of the base model. It has three primary components:

  1. Extending the base model with multi-projection support in modeling_auto.py.
  2. Implementing the multi-projection forward pass.
  3. An app demo of the multi-projection model.

Initial Implementation

The initial implementation gives users the flexibility to define their own projection architecture and pass it to QEffAutoModelForCausalLM.

QEfficient then attaches these projections to the model so they can be used during the forward pass. The attaching is done with the accelerate library, which I chose because it has a robust implementation for attaching weights to an existing model. We can implement our own abstraction if needed, but first we must agree on what the user-facing API will be.
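To make the idea concrete, here is a minimal pure-Python sketch of attaching user-defined projections to a model and using them in the forward pass. It is not the code in this PR: `ToyBaseModel`, `attach_projections`, and the helper functions are hypothetical names, the attaching is done by plain attribute assignment rather than with accelerate, and it assumes each projection maps the final hidden state to a new hidden state that is then fed through the shared lm_head.

```python
import random


def matvec(matrix, vector):
    """Multiply a row-major matrix (list of rows) by a vector."""
    return [sum(w * x for w, x in zip(row, vector)) for row in matrix]


def random_matrix(rows, cols, rng):
    """Build a rows x cols matrix of small random weights."""
    return [[rng.uniform(-1.0, 1.0) for _ in range(cols)] for _ in range(rows)]


class ToyBaseModel:
    """Hypothetical stand-in for a causal LM; not a QEfficient class."""

    def __init__(self, hidden_size, vocab_size, rng):
        self.hidden_size = hidden_size
        self.lm_head = random_matrix(vocab_size, hidden_size, rng)
        self.projections = []  # filled in by attach_projections

    def attach_projections(self, projections):
        """Attach user-defined hidden_size x hidden_size projections."""
        for proj in projections:
            square = len(proj) == self.hidden_size and all(
                len(row) == self.hidden_size for row in proj
            )
            if not square:
                raise ValueError("each projection must be hidden_size x hidden_size")
        self.projections = list(projections)

    def forward(self, hidden):
        """Return one logits vector per position: base first, then speculated."""
        # Assumption: projected hidden states reuse the base model's lm_head.
        states = [hidden] + [matvec(proj, hidden) for proj in self.projections]
        return [matvec(self.lm_head, state) for state in states]


rng = random.Random(0)
model = ToyBaseModel(hidden_size=4, vocab_size=10, rng=rng)
model.attach_projections([random_matrix(4, 4, rng) for _ in range(3)])
logits = model.forward([0.5, -0.25, 1.0, 0.0])
print(len(logits), len(logits[0]))  # 4 logits vectors, each of length 10
```

With three projections attached, the forward pass yields four logits vectors: one for the base model's next token and one for each speculated position.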

NOTE: Integrating Medusa will need similar changes. Instead of multiple hidden-size projections, Medusa uses multiple lm_heads to speculate ahead of the base model.
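The structural difference between the two approaches can be sketched as follows. This is a simplified illustration, not code from this PR or from Medusa: all function names are hypothetical, the heads are modeled as plain linear maps (actual Medusa heads may contain additional layers), and the hidden size, vocabulary size, and head count are illustrative values only.

```python
def matvec(matrix, vector):
    """Multiply a row-major matrix (list of rows) by a vector."""
    return [sum(w * x for w, x in zip(row, vector)) for row in matrix]


def multi_projection_logits(hidden, projections, lm_head):
    """K hidden-to-hidden projections followed by ONE shared lm_head."""
    return [matvec(lm_head, matvec(proj, hidden)) for proj in projections]


def medusa_style_logits(hidden, lm_heads):
    """K separate lm_heads applied directly to the same hidden state."""
    return [matvec(head, hidden) for head in lm_heads]


def extra_parameters(hidden_size, vocab_size, num_heads):
    """Count the extra weights each scheme adds (biases ignored)."""
    return {
        "multi_projection": num_heads * hidden_size * hidden_size,
        "medusa_style": num_heads * hidden_size * vocab_size,
    }


# Illustrative sizes only; not taken from the PR.
print(extra_parameters(hidden_size=4096, vocab_size=32000, num_heads=4))
# {'multi_projection': 67108864, 'medusa_style': 524288000}
```

Under these simplifying assumptions, both schemes produce one logits vector per speculated position, so the surrounding forward-pass plumbing is similar even though the added weights differ in shape.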

@vbaddi vbaddi added the enhancement New feature or request label Mar 5, 2025
@quic-rishinr quic-rishinr marked this pull request as draft March 5, 2025 09:29

Signed-off-by: eplatero <[email protected]>
Signed-off-by: eplatero <[email protected]>
Signed-off-by: eplatero <[email protected]>
lint fix
Signed-off-by: eplatero <[email protected]>
Signed-off-by: eplatero <[email protected]>
Signed-off-by: eplatero <[email protected]>
lint fix
Signed-off-by: eplatero <[email protected]>
lint fix
Signed-off-by: eplatero <[email protected]>
lint fix
Signed-off-by: eplatero <[email protected]>
Signed-off-by: eplatero <[email protected]>
Signed-off-by: eplatero <[email protected]>
Signed-off-by: eplatero <[email protected]>
Signed-off-by: eplatero <[email protected]>
Signed-off-by: eplatero <[email protected]>
Signed-off-by: eplatero <[email protected]>
lint fix
Signed-off-by: eplatero <[email protected]>
@eplatero97
Contributor Author

Using SDK 1.20.0.63, I manually validated that all Qwen vanilla SpD tests pass, as does the Llama multi-projection PyTorch unit test.

Signed-off-by: eplatero <[email protected]>
Signed-off-by: eplatero <[email protected]>
Signed-off-by: eplatero <[email protected]>
@quic-rishinr quic-rishinr marked this pull request as ready for review March 21, 2025 11:37
…jections

Signed-off-by: eplatero <[email protected]>
lint fix
Signed-off-by: eplatero <[email protected]>
Signed-off-by: eplatero <[email protected]>
lint fix
Signed-off-by: eplatero <[email protected]>
Signed-off-by: eplatero <[email protected]>
Signed-off-by: eplatero <[email protected]>
Signed-off-by: eplatero <[email protected]>
Signed-off-by: eplatero <[email protected]>
Signed-off-by: eplatero <[email protected]>
Signed-off-by: eplatero <[email protected]>
Signed-off-by: eplatero <[email protected]>
lint fix
Signed-off-by: eplatero <[email protected]>
Signed-off-by: eplatero <[email protected]>
Signed-off-by: eplatero <[email protected]>
Signed-off-by: eplatero <[email protected]>
eplatero97 and others added 7 commits April 20, 2025 21:26
Signed-off-by: eplatero <[email protected]>
Signed-off-by: Erick Platero <[email protected]>
lint fix
Signed-off-by: eplatero <[email protected]>
Signed-off-by: eplatero <[email protected]>
Signed-off-by: Erick Platero <[email protected]>
lint fix
Signed-off-by: eplatero <[email protected]>
Signed-off-by: eplatero <[email protected]>
Labels
1.20.0 enhancement New feature or request

4 participants