Conversation


@attafosu attafosu marked this pull request as ready for review September 12, 2025 18:45
@xuechendi (Collaborator):

Do we see performance improvement?


key_rot = key[..., :self.rotary_dim]    # slice that receives rotary embedding
key_pass = key[..., self.rotary_dim:]   # remaining dims pass through unchanged
key_rot = apply_rotary_pos_emb(key_rot, cos, sin, None, 0, rope_mode)
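For context, the partial-rotary pattern in the quoted lines can be sketched as follows. This is a minimal NumPy stand-in using the rotate-half formulation, not vLLM's actual `apply_rotary_pos_emb` kernel; the names `apply_partial_rope` and `rotate_half` are hypothetical helpers for illustration.

```python
# Minimal sketch (assumed rotate-half RoPE, NOT vLLM's real kernel):
# only the first rotary_dim channels of each head are rotated; the
# rest pass through unchanged.
import numpy as np

def rotate_half(x):
    # Split the last dim in half and rotate pairs: (x1, x2) -> (-x2, x1)
    x1, x2 = np.split(x, 2, axis=-1)
    return np.concatenate([-x2, x1], axis=-1)

def apply_partial_rope(key, cos, sin, rotary_dim):
    # key: [..., head_dim]; cos/sin: [..., rotary_dim]
    key_rot = key[..., :rotary_dim]
    key_pass = key[..., rotary_dim:]
    key_rot = key_rot * cos + rotate_half(key_rot) * sin
    return np.concatenate([key_rot, key_pass], axis=-1)
```

A sanity check: at position 0 (cos = 1, sin = 0) the rotation is the identity, so the key comes back unchanged.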
@xuechendi (Collaborator):
I did a comparison with the existing forward_native; the major difference seems to be apply_rotary_pos_emb vs apply_rotary_emb_torch. Could you check whether we actually get a perf gain with the oot impl, or whether we can use native?

@attafosu (Contributor, Author), Sep 12, 2025:

Yeah, I did some quick tests and there's some perf gain over the default:
forward_native: 11.53 tok/sec
forward_oot: 12.32 tok/sec
This was on a smaller-sized image, and I expect the gain to be more pronounced on even bigger inputs (text or image).
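The tok/sec comparison above can be reproduced with a simple wall-clock measurement. The sketch below is a hypothetical harness, not the benchmark actually used in this PR; `generate_fn` is a stand-in for a decode loop running either forward_native or forward_oot.

```python
# Hypothetical tok/sec micro-benchmark sketch. generate_fn is an assumed
# callable that produces num_tokens tokens with one of the two forward
# implementations being compared.
import time

def tokens_per_second(generate_fn, num_tokens):
    # Time one full generation run and report tokens per wall-clock second.
    start = time.perf_counter()
    generate_fn(num_tokens)
    elapsed = time.perf_counter() - start
    return num_tokens / elapsed
```

Running this once per implementation on the same prompt yields directly comparable tok/sec figures like those quoted above.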

@xuechendi (Collaborator):

please fix pre-commit

@xuechendi (Collaborator):

/run-gaudi-tests

@xuechendi xuechendi enabled auto-merge (squash) September 12, 2025 21:36
@xuechendi xuechendi disabled auto-merge September 16, 2025 23:24
@xuechendi xuechendi merged commit b94548a into vllm-project:main Sep 16, 2025
8 checks passed
PatrykWo pushed a commit to PatrykWo/vllm-gaudi-vllm-docker that referenced this pull request Sep 17, 2025
- HPU Mrope implementation had a bug which was exposed by
vllm-project/vllm#24444
- Initial workaround was to use the default implementation:
vllm-project#162
- This PR fixes the bug in the HPU mrope

---------

Signed-off-by: attafosu <[email protected]>
Co-authored-by: Chendi.Xue <[email protected]>