optimize mla to get best performance #10102

Open
wants to merge 4 commits into
base: develop

lizhenyun01
Contributor

@lizhenyun01 lizhenyun01 commented Mar 12, 2025

PR types

Function optimization & Performance optimization

PR changes

Others

Description

Optimize MLA to get the best performance:

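  1. Add an optimal-parameter search for the MLA chunk_size to get the best performance; the search supports sequence lengths up to 128k. Use `export FLAGS_mla_dec_chunk_size=-1` to enable the search, or set the flag to a value > 0 to use that value as the chunk_size. The original FLAGS_cascade_attention_max_partition_size is no longer used, to avoid conflicts with append_attn.
  2. Remove fixed_block_num to work around a bug that may occur on CUDA versions below 12.8; this also gives a small performance gain at large batch sizes.
  3. Disable register reallocation when the CUDA version is below 12.8, to avoid NaNs that may appear at large batch sizes for an unknown reason.
  4. Enable TMA to accelerate load/store on Hopper.
  5. Add the run_mla_benchmark.sh script so users can get a profile of the MLA kernel with a single command.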
    ...
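
To make the flag semantics in item 1 concrete, here is a minimal Python sketch. It is not the code from this PR: `resolve_chunk_size`, `num_chunks`, the candidate list, and the `measure` callback are hypothetical names introduced for illustration, and the handling of values other than -1 or > 0 is an assumption, since the description does not say what happens in that case.

```python
from typing import Callable, Iterable


def resolve_chunk_size(
    flag_value: int,
    candidates: Iterable[int],
    measure: Callable[[int], float],
) -> int:
    """Hypothetical helper mirroring the described FLAGS_mla_dec_chunk_size semantics.

    flag_value > 0   -> use it directly as the chunk size.
    flag_value == -1 -> search the candidates and keep the one with the lowest cost.
    """
    if flag_value > 0:
        return flag_value
    if flag_value == -1:
        # `measure` stands in for timing the kernel with a given chunk size.
        return min(candidates, key=measure)
    # Assumption: any other value is treated as invalid in this sketch.
    raise ValueError(f"unsupported chunk size flag: {flag_value}")


def num_chunks(seq_len: int, chunk_size: int) -> int:
    """Number of KV chunks needed to cover seq_len (ceiling division)."""
    return -(-seq_len // chunk_size)


if __name__ == "__main__":
    # Toy cost function for illustration only; real costs would come from profiling.
    toy_cost = lambda c: abs(c - 128)
    chunk = resolve_chunk_size(-1, [64, 128, 256], toy_cost)
    print(chunk, num_chunks(128 * 1024, chunk))
```

In this sketch a fixed positive value bypasses the search entirely, which matches the description's "set it > 0 to specify the chunk_size"; the search itself is reduced to picking the cheapest candidate under a caller-supplied cost.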


paddle-bot bot commented Mar 12, 2025

Thanks for your contribution!

@lizhenyun01 lizhenyun01 changed the title from "Mla split kv" to "optimize mla to get best performance" Mar 12, 2025

codecov bot commented Mar 12, 2025

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 50.00%. Comparing base (6d80be7) to head (104f7ba).
Report is 10 commits behind head on develop.

Current head 104f7ba differs from pull request most recent head e3128e5

Please upload reports for the commit e3128e5 to get more accurate results.

❌ Your project check has failed because the head coverage (50.00%) is below the target coverage (58.00%). You can increase the head coverage or adjust the target coverage.

Additional details and impacted files
@@           Coverage Diff            @@
##           develop   #10102   +/-   ##
========================================
  Coverage    49.99%   50.00%           
========================================
  Files          757      757           
  Lines       122442   122442           
========================================
+ Hits         61217    61227   +10     
+ Misses       61225    61215   -10     


yuanlehome previously approved these changes Mar 12, 2025
3 participants