
Commit 6a34dfa

Merge tag 'kbuild-v6.13' of git://git.kernel.org/pub/scm/linux/kernel/git/masahiroy/linux-kbuild
Pull Kbuild updates from Masahiro Yamada:

 - Add generic support for built-in boot DTB files
 - Enable TAB cycling for dialog buttons in nconfig
 - Fix issues in streamline_config.pl
 - Refactor Kconfig
 - Add support for Clang's AutoFDO (Automatic Feedback-Directed Optimization)
 - Add support for Clang's Propeller, a profile-guided optimization.
 - Change the working directory to the external module directory for M= builds
 - Support building external modules in a separate output directory
 - Enable objtool for *.mod.o and additional kernel objects
 - Use lz4 instead of deprecated lz4c
 - Work around a performance issue with "git describe"
 - Refactor modpost

* tag 'kbuild-v6.13' of git://git.kernel.org/pub/scm/linux/kernel/git/masahiroy/linux-kbuild: (85 commits)
  kbuild: rename .tmp_vmlinux.kallsyms0.syms to .tmp_vmlinux0.syms
  gitignore: Don't ignore 'tags' directory
  kbuild: add dependency from vmlinux to resolve_btfids
  modpost: replace tdb_hash() with hash_str()
  kbuild: deb-pkg: add python3:native to build dependency
  genksyms: reduce indentation in export_symbol()
  modpost: improve error messages in device_id_check()
  modpost: rename alias symbol for MODULE_DEVICE_TABLE()
  modpost: rename variables in handle_moddevtable()
  modpost: move strstarts() to modpost.h
  modpost: convert do_usb_table() to a generic handler
  modpost: convert do_of_table() to a generic handler
  modpost: convert do_pnp_device_entry() to a generic handler
  modpost: convert do_pnp_card_entries() to a generic handler
  modpost: call module_alias_printf() from all do_*_entry() functions
  modpost: pass (struct module *) to do_*_entry() functions
  modpost: remove DEF_FIELD_ADDR_VAR() macro
  modpost: deduplicate MODULE_ALIAS() for all drivers
  modpost: introduce module_alias_printf() helper
  modpost: remove unnecessary check in do_acpi_entry()
  ...
2 parents 0e287d3 + e6064da commit 6a34dfa


73 files changed, +1471 -1010 lines changed

.gitignore

+1
@@ -129,6 +129,7 @@ series
 
 # ctags files
 tags
+!tags/
 TAGS
 
 # cscope files

Documentation/dev-tools/autofdo.rst

+168
@@ -0,0 +1,168 @@

.. SPDX-License-Identifier: GPL-2.0

===================================
Using AutoFDO with the Linux kernel
===================================

This enables AutoFDO build support for the kernel when using
the Clang compiler. AutoFDO (Auto-Feedback-Directed Optimization)
is a type of profile-guided optimization (PGO) used to enhance the
performance of binary executables. It gathers information about the
frequency of execution of various code paths within a binary using
hardware sampling. This data is then used to guide the compiler's
optimization decisions, resulting in a more efficient binary. AutoFDO
is a powerful optimization technique, and data indicates that it can
significantly improve kernel performance. It's especially beneficial
for workloads affected by front-end stalls.

For AutoFDO builds, unlike non-FDO builds, the user must supply a
profile. Acquiring an AutoFDO profile can be done in several ways.
AutoFDO profiles are created by converting hardware samples collected
with the "perf" tool. It is crucial that the workloads used to create
these perf files are representative; they must exhibit runtime
characteristics similar to the workloads that are intended to be
optimized. Failure to do so will result in the compiler optimizing
for the wrong objective.

The AutoFDO profile often encapsulates the program's behavior. If the
performance-critical code is architecture-independent, the profile
can be applied across platforms to achieve performance gains. For
instance, using the profile generated on Intel architecture to build
a kernel for AMD architecture can also yield performance improvements.

There are two methods for acquiring a representative profile:

(1) Sample real workloads using a production environment.
(2) Generate the profile using a representative load test.

When enabling the AutoFDO build configuration without providing an
AutoFDO profile, the compiler only modifies the DWARF information in
the kernel without impacting runtime performance. It's advisable to
use a kernel binary built with the same AutoFDO configuration to
collect the perf profile. While it's possible to use a kernel built
with different options, it may result in inferior performance.

One can also collect profiles using an AutoFDO build of a previous
kernel. AutoFDO employs relative line numbers to match the profiles,
offering some tolerance for source changes. This mode is commonly used
in a production environment for profile collection.

In a profile collection based on a load test, the AutoFDO collection
process consists of the following steps:

#. Initial build: The kernel is built with AutoFDO options
   without a profile.

#. Profiling: The above kernel is then run with a representative
   workload to gather execution frequency data. This data is
   collected using hardware sampling, via perf. AutoFDO is most
   effective on platforms supporting advanced PMU features like
   LBR on Intel machines.

#. AutoFDO profile generation: The perf output file is converted to
   the AutoFDO profile via offline tools.

AutoFDO support requires Clang from LLVM 17 or later.
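
As a minimal sketch of checking this requirement before a build, the
helpers below extract the major version from a ``clang --version`` banner
and compare it with 17. The function names are hypothetical, and the
parsing assumes the usual ``clang version X.Y.Z`` banner; vendor builds
may print something different.

```shell
# Hypothetical helper: print the major version found in a
# "clang version X.Y.Z" banner, or nothing if no banner is found.
clang_major() {
    printf '%s\n' "$1" |
        sed -n 's/.*clang version \([0-9][0-9]*\)\..*/\1/p' |
        head -n 1
}

# Succeed if the banner reports at least the required major version.
clang_at_least() {
    major=$(clang_major "$1")
    [ -n "$major" ] && [ "$major" -ge "$2" ]
}

# Usage: only runs when clang is installed on this machine.
if command -v clang >/dev/null 2>&1; then
    if clang_at_least "$(clang --version)" 17; then
        echo "clang is new enough for AutoFDO"
    else
        echo "clang is older than LLVM 17" >&2
    fi
fi
```

The same helper can be called with 19 to check the newer release that
llvm-profgen and Propeller need.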

Preparation
===========

Configure the kernel with::

   CONFIG_AUTOFDO_CLANG=y
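
One way to confirm that the option actually ended up in the generated
configuration is sketched below. ``config_enabled`` is a hypothetical
helper, not part of Kbuild, and the usage part assumes the configuration
lives in ``./.config``.

```shell
# Hypothetical helper: succeed if SYMBOL is set to "y" in a kernel
# config file. The match is on the whole line, so a line such as
# "# CONFIG_FOO is not set" does not count as enabled.
config_enabled() {
    grep -qx "CONFIG_$2=y" "$1"
}

# Usage: report the state of AUTOFDO_CLANG in ./.config, if present.
if [ -f .config ]; then
    if config_enabled .config AUTOFDO_CLANG; then
        echo "CONFIG_AUTOFDO_CLANG is enabled"
    else
        echo "CONFIG_AUTOFDO_CLANG is not enabled" >&2
    fi
fi
```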

Customization
=============

The default CONFIG_AUTOFDO_CLANG setting covers kernel space objects for
AutoFDO builds. One can, however, enable or disable AutoFDO build for
individual files and directories by adding a line similar to the following
to the respective kernel Makefile:

- For enabling a single file (e.g. foo.o) ::

     AUTOFDO_PROFILE_foo.o := y

- For enabling all files in one directory ::

     AUTOFDO_PROFILE := y

- For disabling one file ::

     AUTOFDO_PROFILE_foo.o := n

- For disabling all files in one directory ::

     AUTOFDO_PROFILE := n

Workflow
========

Here is an example workflow for AutoFDO kernel:

1) Build the kernel on the host machine with LLVM enabled,
   for example, ::

      $ make menuconfig LLVM=1

   Turn on AutoFDO build config::

      CONFIG_AUTOFDO_CLANG=y

   With a configuration that has LLVM enabled, use the following command::

      $ scripts/config -e AUTOFDO_CLANG

   After getting the config, build with ::

      $ make LLVM=1

2) Install the kernel on the test machine.

3) Run the load tests. The '-c' option in perf specifies the sample
   event period. We suggest using a suitable prime number, like 500009,
   for this purpose.

   - For Intel platforms::

      $ perf record -e BR_INST_RETIRED.NEAR_TAKEN:k -a -N -b -c <count> -o <perf_file> -- <loadtest>

   - For AMD platforms:

     The supported systems are: Zen3 with BRS, or Zen4 with amd_lbr_v2. To check:

     For Zen3::

      $ cat /proc/cpuinfo | grep " brs"

     For Zen4::

      $ cat /proc/cpuinfo | grep amd_lbr_v2

     The following command generates the perf data file::

      $ perf record --pfm-events RETIRED_TAKEN_BRANCH_INSTRUCTIONS:k -a -N -b -c <count> -o <perf_file> -- <loadtest>

4) (Optional) Download the raw perf file to the host machine.

5) To generate an AutoFDO profile, two offline tools are available:
   create_llvm_prof and llvm-profgen. The create_llvm_prof tool is part
   of the AutoFDO project and can be found on GitHub
   (https://github.com/google/autofdo), version v0.30.1 or later.
   The llvm-profgen tool is included in the LLVM toolchain itself. Note
   that the version of llvm-profgen doesn't need to match the version of
   Clang, but it must come from the LLVM 19 release or later, or from
   the LLVM trunk. ::

      $ llvm-profgen --kernel --binary=<vmlinux> --perfdata=<perf_file> -o <profile_file>

   or ::

      $ create_llvm_prof --binary=<vmlinux> --profile=<perf_file> --format=extbinary --out=<profile_file>

   Note that multiple AutoFDO profile files can be merged into one via::

      $ llvm-profdata merge -o <profile_file> <profile_1> <profile_2> ... <profile_n>

6) Rebuild the kernel using the AutoFDO profile file with the same config as step 1
   (note that CONFIG_AUTOFDO_CLANG needs to be enabled)::

      $ make LLVM=1 CLANG_AUTOFDO_PROFILE=<profile_file>
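
The platform-specific parts of step 3 can be sketched as two small shell
helpers. Both function names are hypothetical; the events and options are
the ones quoted in the workflow above, and the command is only printed,
not executed, so it can be reviewed before running it on the test machine.

```shell
# Hypothetical helper: report which AMD branch-sampling feature a
# cpuinfo-style file advertises: "amd_lbr_v2", "brs", or "none".
amd_branch_sampling() {
    if grep -qw amd_lbr_v2 "$1"; then
        echo amd_lbr_v2
    elif grep -qw brs "$1"; then
        echo brs
    else
        echo none
    fi
}

# Hypothetical helper: print the perf record command for a platform.
# Arguments: platform ("intel" or "amd"), sample period, output file,
# then the load test command line.
autofdo_perf_cmd() {
    platform=$1 count=$2 perf_file=$3
    shift 3
    case "$platform" in
        intel) event="-e BR_INST_RETIRED.NEAR_TAKEN:k" ;;
        amd)   event="--pfm-events RETIRED_TAKEN_BRANCH_INSTRUCTIONS:k" ;;
        *)     echo "unsupported platform: $platform" >&2; return 1 ;;
    esac
    echo "perf record $event -a -N -b -c $count -o $perf_file -- $*"
}

# Usage: print the Intel command with the suggested prime period.
autofdo_perf_cmd intel 500009 perf.data ./loadtest
```

On an AMD machine, ``amd_branch_sampling /proc/cpuinfo`` gives a quick
indication of whether branch sampling is available before a long load
test is started.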

Documentation/dev-tools/coccinelle.rst

+7-15
@@ -250,25 +250,17 @@ variables for .cocciconfig is as follows:
 - Your directory from which spatch is called is processed next
 - The directory provided with the ``--dir`` option is processed last, if used
 
-Since coccicheck runs through make, it naturally runs from the kernel
-proper dir; as such the second rule above would be implied for picking up a
-.cocciconfig when using ``make coccicheck``.
-
 ``make coccicheck`` also supports using M= targets. If you do not supply
 any M= target, it is assumed you want to target the entire kernel.
 The kernel coccicheck script has::
 
-    if [ "$KBUILD_EXTMOD" = "" ] ; then
-        OPTIONS="--dir $srctree $COCCIINCLUDE"
-    else
-        OPTIONS="--dir $KBUILD_EXTMOD $COCCIINCLUDE"
-    fi
-
-KBUILD_EXTMOD is set when an explicit target with M= is used. For both cases
-the spatch ``--dir`` argument is used, as such third rule applies when whether
-M= is used or not, and when M= is used the target directory can have its own
-.cocciconfig file. When M= is not passed as an argument to coccicheck the
-target directory is the same as the directory from where spatch was called.
+    OPTIONS="--dir $srcroot $COCCIINCLUDE"
+
+Here, $srcroot refers to the source directory of the target: it points to the
+external module's source directory when M= is used, and otherwise, to the kernel
+source directory. The third rule ensures that spatch reads the .cocciconfig from
+the target directory, allowing external modules to have their own .cocciconfig
+file.
 
 If not using the kernel's coccicheck target, keep the above precedence
 order logic of .cocciconfig reading. If using the kernel's coccicheck target,

Documentation/dev-tools/index.rst

+2
@@ -34,6 +34,8 @@ Documentation/dev-tools/testing-overview.rst
    ktap
    checkuapi
    gpio-sloppy-logic-analyzer
+   autofdo
+   propeller
 
 
 .. only::  subproject and html

Documentation/dev-tools/propeller.rst

+162
@@ -0,0 +1,162 @@

.. SPDX-License-Identifier: GPL-2.0

=====================================
Using Propeller with the Linux kernel
=====================================

This enables Propeller build support for the kernel when using the Clang
compiler. Propeller is a profile-guided optimization (PGO) method used
to optimize binary executables. Like AutoFDO, it utilizes hardware
sampling to gather information about the frequency of execution of
different code paths within a binary. Unlike AutoFDO, this information
is then used right before the linking phase to optimize (among others)
block layout within and across functions.

A few important notes about adopting Propeller optimization:

#. Although it can be used as a standalone optimization step, it is
   strongly recommended to apply Propeller on top of AutoFDO,
   AutoFDO+ThinLTO or Instrumented FDO (iFDO). The rest of this
   document assumes this paradigm.

#. Propeller uses another round of profiling on top of
   AutoFDO/AutoFDO+ThinLTO/iFDO. The whole build process involves
   "build-afdo - train-afdo - build-propeller - train-propeller -
   build-optimized".

#. Propeller requires LLVM 19 release or later for Clang/Clang++
   and the linker (ld.lld).

#. In addition to the LLVM toolchain, Propeller requires a profiling
   conversion tool: https://github.com/google/autofdo with a release
   after v0.30.1: https://github.com/google/autofdo/releases/tag/v0.30.1.

The Propeller optimization process involves the following steps:

#. Initial building: Build the AutoFDO or AutoFDO+ThinLTO binary as
   you would normally do, but with a set of compile-time / link-time
   flags, so that a special metadata section is created within the
   kernel binary. The special section is only intended to be used by
   the profiling tool; it is not part of the runtime image, nor does it
   change the kernel's runtime text sections.

#. Profiling: The above kernel is then run with a representative
   workload to gather execution frequency data. This data is collected
   using hardware sampling, via perf. Propeller is most effective on
   platforms supporting advanced PMU features like LBR on Intel
   machines. This step is the same as profiling the kernel for AutoFDO
   (the exact perf parameters can be different).

#. Propeller profile generation: The perf output file is converted to a
   pair of Propeller profiles via an offline tool.

#. Optimized build: Build the AutoFDO or AutoFDO+ThinLTO optimized
   binary as you would normally do, but with a compile-time /
   link-time flag to pick up the Propeller compile-time and link-time
   profiles. This build step uses 3 profiles - the AutoFDO profile,
   the Propeller compile-time profile and the Propeller link-time
   profile.

#. Deployment: The optimized kernel binary is deployed and used
   in production environments, providing improved performance
   and reduced latency.

Preparation
===========

Configure the kernel with::

   CONFIG_AUTOFDO_CLANG=y
   CONFIG_PROPELLER_CLANG=y

Customization
=============

The default CONFIG_PROPELLER_CLANG setting covers kernel space objects
for Propeller builds. One can, however, enable or disable Propeller build
for individual files and directories by adding a line similar to the
following to the respective kernel Makefile:

- For enabling a single file (e.g. foo.o)::

     PROPELLER_PROFILE_foo.o := y

- For enabling all files in one directory::

     PROPELLER_PROFILE := y

- For disabling one file::

     PROPELLER_PROFILE_foo.o := n

- For disabling all files in one directory::

     PROPELLER_PROFILE := n

Workflow
========

Here is an example workflow for building an AutoFDO+Propeller kernel:

1) Assuming an AutoFDO profile is already collected following
   instructions in the AutoFDO document, build the kernel on the host
   machine, with AutoFDO and Propeller build configs ::

      CONFIG_AUTOFDO_CLANG=y
      CONFIG_PROPELLER_CLANG=y

   and ::

      $ make LLVM=1 CLANG_AUTOFDO_PROFILE=<autofdo-profile-name>

2) Install the kernel on the test machine.

3) Run the load tests. The '-c' option in perf specifies the sample
   event period. We suggest using a suitable prime number, like 500009,
   for this purpose.

   - For Intel platforms::

      $ perf record -e BR_INST_RETIRED.NEAR_TAKEN:k -a -N -b -c <count> -o <perf_file> -- <loadtest>

   - For AMD platforms::

      $ perf record --pfm-events RETIRED_TAKEN_BRANCH_INSTRUCTIONS:k -a -N -b -c <count> -o <perf_file> -- <loadtest>

   Note you can repeat the above steps to collect multiple <perf_file>s.

4) (Optional) Download the raw perf file(s) to the host machine.

5) Use the create_llvm_prof tool (https://github.com/google/autofdo) to
   generate the Propeller profile. ::

      $ create_llvm_prof --binary=<vmlinux> --profile=<perf_file> \
                         --format=propeller --propeller_output_module_name \
                         --out=<propeller_profile_prefix>_cc_profile.txt \
                         --propeller_symorder=<propeller_profile_prefix>_ld_profile.txt

   "<propeller_profile_prefix>" can be something like "/home/user/dir/any_string".

   This command generates a pair of Propeller profiles:
   "<propeller_profile_prefix>_cc_profile.txt" and
   "<propeller_profile_prefix>_ld_profile.txt".

   If more than one perf file was collected in the previous step,
   you can create a temporary list file "<perf_file_list>" with each
   line containing one perf file name and run::

      $ create_llvm_prof --binary=<vmlinux> --profile=@<perf_file_list> \
                         --format=propeller --propeller_output_module_name \
                         --out=<propeller_profile_prefix>_cc_profile.txt \
                         --propeller_symorder=<propeller_profile_prefix>_ld_profile.txt

6) Rebuild the kernel using the AutoFDO and Propeller
   profiles. ::

      CONFIG_AUTOFDO_CLANG=y
      CONFIG_PROPELLER_CLANG=y

   and ::

      $ make LLVM=1 CLANG_AUTOFDO_PROFILE=<profile_file> CLANG_PROPELLER_PROFILE_PREFIX=<propeller_profile_prefix>
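
The file naming used in step 5 can be illustrated with a short sketch.
Both helpers below are hypothetical conveniences rather than part of
create_llvm_prof: one derives the two profile names from a prefix, the
other writes the list file that is passed as ``--profile=@<perf_file_list>``.

```shell
# Hypothetical helper: print the compile-time and link-time profile
# paths that share a given prefix.
propeller_profiles() {
    echo "${1}_cc_profile.txt"
    echo "${1}_ld_profile.txt"
}

# Hypothetical helper: write one perf file name per line to a list file.
# Arguments: list file, then the perf files collected in step 3.
write_perf_list() {
    list_file=$1
    shift
    printf '%s\n' "$@" > "$list_file"
}

# Usage: build a list in a temporary file and show the derived names.
perf_list=$(mktemp)
write_perf_list "$perf_list" perf1.data perf2.data
echo "--profile=@$perf_list"
propeller_profiles /home/user/dir/any_string
rm -f "$perf_list"
```

Only the prefix, not either full file name, is what
``CLANG_PROPELLER_PROFILE_PREFIX`` expects in step 6.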
