The FlexAttention benchmarks currently only assess the performance of MHA (Multi-Head Attention).
GQA (Grouped-Query Attention) is widely used in LLMs (e.g. Llama 3.1) to improve performance: groups of query heads share a single key/value head, which shrinks the KV cache and reduces memory bandwidth.
The FlexAttention benchmark should be extended to also evaluate FlexAttention with GQA.
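
A minimal sketch of what such a benchmark case could look like, assuming PyTorch >= 2.5 (where `flex_attention` accepts `enable_gqa=True` for mismatched query/KV head counts); the shapes and the CUDA-event timing helper are illustrative, not the suite's actual harness:

```python
import torch
from torch.nn.attention.flex_attention import flex_attention

torch.manual_seed(0)
device, dtype = "cuda", torch.float16

B, S, D = 4, 4096, 128   # batch, sequence length, head dim (illustrative values)
Hq, Hkv = 32, 8          # 32 query heads sharing 8 KV heads, as in Llama 3.1 8B

q = torch.randn(B, Hq, S, D, device=device, dtype=dtype)
k = torch.randn(B, Hkv, S, D, device=device, dtype=dtype)
v = torch.randn(B, Hkv, S, D, device=device, dtype=dtype)

# flex_attention is meant to be compiled for performance measurements.
flex = torch.compile(flex_attention)

def bench(fn, iters=20):
    # Simple CUDA-event timing; a real benchmark entry would reuse
    # the suite's existing measurement harness instead.
    for _ in range(3):  # warmup
        fn()
    start = torch.cuda.Event(enable_timing=True)
    end = torch.cuda.Event(enable_timing=True)
    start.record()
    for _ in range(iters):
        fn()
    end.record()
    torch.cuda.synchronize()
    return start.elapsed_time(end) / iters

ms = bench(lambda: flex(q, k, v, enable_gqa=True))
print(f"GQA ({Hq} query heads, {Hkv} KV heads): {ms:.3f} ms/iter")
```

Running the same shapes with `Hkv = Hq` (plain MHA) alongside would show the speedup GQA gets from the smaller KV tensors.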