jainapurva

For the Llama model, set enable_gqa=True in the sdpa function call to use PyTorch's built-in grouped query attention (GQA) functionality.
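As a rough illustration of what this change amounts to, the sketch below calls `F.scaled_dot_product_attention` with `enable_gqa=True` and falls back to expanding the key/value heads by hand when that path is unavailable. The helper name `gqa_sdpa` and the tensor shapes are assumptions made for this example and are not taken from the PR diff; the fallback relies on the documented behavior that `enable_gqa=True` is equivalent to `repeat_interleave` on the head dimension.

```python
import torch
import torch.nn.functional as F


def gqa_sdpa(q, k, v, is_causal=True):
    """Hypothetical helper, not code from this PR.

    q:    (batch, n_head, seq_len, head_dim)
    k, v: (batch, n_kv_head, seq_len, head_dim), with n_head % n_kv_head == 0
    """
    try:
        # Built-in GQA: SDPA shares each KV head across a group of query heads.
        return F.scaled_dot_product_attention(
            q, k, v, is_causal=is_causal, enable_gqa=True
        )
    except (TypeError, RuntimeError):
        # Assumed failure modes: TypeError when the installed PyTorch does not
        # know the enable_gqa keyword, RuntimeError when no backend accepts it
        # for this device. Fall back to repeating the KV heads manually.
        n_rep = q.size(1) // k.size(1)
        k = k.repeat_interleave(n_rep, dim=1)
        v = v.repeat_interleave(n_rep, dim=1)
        return F.scaled_dot_product_attention(q, k, v, is_causal=is_causal)


# Example shapes (illustrative only): 8 query heads sharing 2 KV heads.
q = torch.randn(2, 8, 16, 32)
k = torch.randn(2, 2, 16, 32)
v = torch.randn(2, 2, 16, 32)
out = gqa_sdpa(q, k, v)  # same shape as q
```

Both branches should produce the same values up to floating-point tolerance, so a guard like this is one way to adopt the keyword without breaking installs that predate it.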

@facebook-github-bot added the CLA Signed label Jul 13, 2024
@jainapurva requested a review from drisspg July 13, 2024 03:56
yanboliang (Contributor) commented Jul 26, 2024

I think we should wait a bit before merging this, since a lot of users are still on older versions of PyTorch that don't support enable_gqa. But I'm interested in how much perf gain we get from enabling the built-in GQA. Do you have numbers on A100?
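One way to collect the numbers asked about here is a small timing harness built on `torch.utils.benchmark`. The sketch below is only an assumption about how such a measurement could be set up: the function name `time_sdpa`, the default head counts, sequence length, and head dimension are placeholders rather than the model's actual configuration, and no results are implied.

```python
import torch
import torch.nn.functional as F
from torch.utils import benchmark


def time_sdpa(enable_gqa, n_head=32, n_kv_head=8, seq_len=1024, head_dim=128,
              device="cpu", dtype=torch.float32, min_run_time=1.0):
    """Hypothetical harness: median seconds per SDPA call.

    enable_gqa=True  -> built-in GQA path (needs a PyTorch build that
                        accepts the enable_gqa keyword)
    enable_gqa=False -> baseline that repeats the KV heads by hand
    """
    q = torch.randn(1, n_head, seq_len, head_dim, device=device, dtype=dtype)
    k = torch.randn(1, n_kv_head, seq_len, head_dim, device=device, dtype=dtype)
    v = torch.randn_like(k)
    if enable_gqa:
        stmt = ("F.scaled_dot_product_attention("
                "q, k, v, is_causal=True, enable_gqa=True)")
    else:
        stmt = ("F.scaled_dot_product_attention("
                "q, k.repeat_interleave(n_rep, dim=1), "
                "v.repeat_interleave(n_rep, dim=1), is_causal=True)")
    timer = benchmark.Timer(
        stmt=stmt,
        globals={"F": F, "q": q, "k": k, "v": v, "n_rep": n_head // n_kv_head},
    )
    # Timer handles warm-up and CUDA synchronization internally.
    return timer.blocked_autorange(min_run_time=min_run_time).median


# Baseline timing on CPU with small, illustrative shapes.
baseline = time_sdpa(False, n_head=4, n_kv_head=2, seq_len=64, head_dim=16,
                     min_run_time=0.1)
print(f"manual repeat_interleave: {baseline * 1e6:.1f} us per call")
```

For an A100 comparison, the same function could be called with `device="cuda"`, `dtype=torch.bfloat16`, and both values of `enable_gqa`, using shapes that match the model being benchmarked.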

3 participants