avxtime: Summarize AVX-512 cputime per task as a histogram #4795
base: master
Can you help me review it? I will improve it based on any comments.
I wonder how to create a workload to test this tool.
I'm not familiar with tracing x86_fpu_regs_deactivated, so I'll assume you've dug into that and understand how it all works.
Note that there are AVX-512 PMCs, but I don't think they are easy to find. https://github.com/intel/PerfSpect recently added support for AVX-512 metrics, although I don't think it can emit a per-PID breakdown or the context-switch event runtimes. (It's also unlikely to work in most cloud environments due to a lack of the PMU.)
I think this tool should be a good starting point. Future enhancements could add more detail about how the system is affected (power, clockspeed, CPI) during AVX-512 runs, likely requiring PMCs.
man/man8/avxcputime.8
critical in cloud environments for placement.

This tool summarizes AVX-512 cputime as a histogram, showing the amount of CPU
time consumed by AVX-512 per-task. This provides valuable insights - it can
I'm a bit confused about this first sentence: If it's measuring AVX-512 time per task, what is the interval? Per second? It looks like it's doing it for each context switch.
Yes, maybe I didn't describe it clearly enough. It is similar to cpudist. It calculates the AVX-512 time from the moment each task is context-switched onto the CPU until x86_fpu_regs_deactivated fires.
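To make the measurement concrete, here is a minimal user-space sketch of the pairing logic; it is not the tool's actual BPF code. It assumes a hypothetical, already-filtered stream of `(timestamp_ns, event, pid)` records in which `"switch_in"` stands for a task being scheduled onto a CPU and `"fpu_regs_deactivated"` stands for the `x86_fpu_regs_deactivated` tracepoint. The function name `avx_deltas` is also illustrative.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple

# Hypothetical event record: (timestamp_ns, event_name, pid).
Event = Tuple[int, str, int]


def avx_deltas(events: Iterable[Event]) -> Dict[int, List[int]]:
    """Pair each switch-in with the next FPU-register deactivation for the same pid.

    Assumption: the stream only contains tasks that used AVX-512.
    """
    start: Dict[int, int] = {}  # pid -> switch-in timestamp
    deltas: Dict[int, List[int]] = defaultdict(list)
    for ts, name, pid in events:
        if name == "switch_in":
            start[pid] = ts  # analogous to a BPF hash map keyed by pid
        elif name == "fpu_regs_deactivated" and pid in start:
            # pop() so a later unmatched deactivation is ignored, not double-counted
            deltas[pid].append(ts - start.pop(pid))
    return dict(deltas)


events = [
    (1_000, "switch_in", 42),
    (1_500, "switch_in", 7),
    (4_000, "fpu_regs_deactivated", 42),
    (9_500, "fpu_regs_deactivated", 7),
    (10_000, "fpu_regs_deactivated", 42),  # no open switch-in for pid 42: ignored
]
print(avx_deltas(events))  # {42: [3000], 7: [8000]}
```

Keying the start timestamps by pid mirrors how a cpudist-style tool typically keeps per-task state, and each resulting delta is one sample for that task's histogram.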
README.md
@@ -85,6 +85,7 @@ pair of .c and .py files, and some are directories of files.
- tools/[argdist](tools/argdist.py): Display function parameter values as a histogram or frequency count. [Examples](tools/argdist_example.txt).
- tools/[avxcputime](tools/avxcputime.py): Summarize AVX-512 cputime per task as a histogram. [Examples](tools/avxcputime_example.txt).
I'd consider calling it just avxtime.
OK, avxtime sounds great.
Hi hengqi, thanks for the comment. I have used this benchmark to simulate an AVX-512 workload for testing. I hope it will be helpful to you.
Hi Brendan, thank you for your insightful comments. As you mentioned, there are now some PMCs available to support the detection of AVX-512. The PMCs of the ... In addition, there are some PMCs that can hint at the execution of AVX-512 instructions, such as:
The more AVX-512 instructions are executed, the higher the level will be and the more severely the CPU will downclock. However, these PMCs can only be used to estimate the execution of AVX-512 instructions.
The AVX-512 instruction set in x86 can accelerate FPU execution, but it also causes a core turbo frequency drop that impacts sibling CPUs. This can affect the performance of other workloads running on the sibling CPU. Detecting AVX-512 usage is thus critical in cloud environments for placement.

This tool summarizes AVX-512 cputime as a histogram, showing the amount of CPU time consumed by AVX-512 per task. Within the specified time period, it tracks, for each task that uses AVX-512, the time from being scheduled onto the CPU until the FPU registers are deactivated. This provides valuable insights: it can identify processes and CPUs executing AVX-512 so that sensitive jobs are not scheduled together. The cputime distribution can also help debug AVX-512 performance issues.

This program is based on this Linux kernel patch that tracks per-task AVX-512 usage: torvalds/linux@2f7726f

It uses BPF to measure the time between a task's AVX-512 timestamps to calculate the CPU time in AVX-512 mode. The aggregated cputime distribution is printed for analysis.

Signed-off-by: Zhiyong Ye <[email protected]>
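The final aggregation step can be sketched as a per-task power-of-two histogram, similar in spirit to cpudist's output. This is an illustrative assumption rather than the tool's implementation: the helpers `log2_slot`, `histogram`, and `render` are hypothetical, and the sample per-pid times (treated as microseconds) are made up for the demo.

```python
from collections import Counter
from typing import Dict, Iterable


def log2_slot(value: int) -> int:
    # Slot n holds values in [2**(n-1), 2**n - 1]; slot 0 holds only 0.
    return value.bit_length()


def histogram(deltas_us: Iterable[int]) -> Dict[int, int]:
    """Count AVX-512 time samples per power-of-two slot."""
    return dict(Counter(log2_slot(d) for d in deltas_us))


def render(hist: Dict[int, int], width: int = 40) -> str:
    """Render a text histogram, one row per slot from lowest to highest."""
    if not hist:
        return ""
    peak = max(hist.values())
    rows = []
    for slot in range(min(hist), max(hist) + 1):
        count = hist.get(slot, 0)
        low = 0 if slot == 0 else 1 << (slot - 1)
        high = 0 if slot == 0 else (1 << slot) - 1
        bar = "*" * (count * width // peak)  # bar length scaled to the peak count
        rows.append(f"{low:>8} -> {high:<8}: {count:<6}|{bar:<{width}}|")
    return "\n".join(rows)


# Hypothetical per-task AVX-512 times in microseconds (made-up sample data).
per_task = {42: [3, 5, 6, 120], 7: [900]}
for pid, deltas in per_task.items():
    print(f"pid = {pid}")
    print(render(histogram(deltas)))
```

Power-of-two slots keep the per-task state small (one counter per slot), which is why log2 histograms are a common choice when aggregating in kernel context and printing from user space.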
Hi @brendangregg, I've updated the PR. The changes are:
Feel free to review it again. Any comments will help me improve this patch.