Commit 555686d

gh-101282: rework the BOLT build process
(This change is a quick and dirty way to merge some of the build system improvements I'm proposing in gh-101093 before the 3.12 feature freeze. I let the scope of that proposal grow in order to fix some longstanding deficiencies in the build system around profile-guided builds. But I'm getting soft resistance in review so close to the freeze deadline, and it is obvious that we need a simpler solution to hit the 3.12 deadline. While this change is quick and dirty, it attempts to not make things worse.)
Before this change, we only applied BOLT to the main python binary. After this change, we also apply BOLT to libpython when a shared library build is configured. In shared library builds, most of the C code is in libpython, so it is critical to apply BOLT to libpython to realize BOLT benefits.
This change also reworks how BOLT instrumentation is applied. It effectively removes the readelf based logic added in gh-101525 and replaces it with a mechanism that saves a copy of the pre-BOLT binary and restores that copy when necessary. This allows us to perform BOLT optimizations without having to manually delete the output binary to force a new BOLT run.
We also add a new make target for purging BOLT files and hook it up to `clean` so BOLT state is purged when appropriate.
`.gitignore` rules have been added to ignore files related to BOLT.
Before and after this refactor, `make` will no-op after a previous run. Both versions should also share common make DAG deficiencies where targets fail to trigger as often as they need to or can trigger prematurely in certain scenarios. For example, after this change you may need to `rm profile-bolt-stamp` to force a BOLT run because there aren't appropriate non-phony targets for BOLT's make target to depend on. Fixing this is a non-trivial amount of work that will likely have to wait until the 3.13 window.
To make it easier to iterate on custom BOLT settings, the flags to pass to instrumentation and application are now defined in configure and can be overridden by passing `BOLT_INSTRUMENT_FLAGS` and `BOLT_APPLY_FLAGS`.
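As a rough illustration of the override behaviour described above, the sketch below mimics the `test -z` defaulting that configure performs: a value supplied by the caller wins, otherwise a built-in default is used. The function name `resolve_flags` and the shortened default string are hypothetical and exist only for this illustration; this is not the generated configure code.

```shell
#!/bin/sh
# Hypothetical helper: print the caller-supplied value if it is non-empty,
# otherwise print the default (mirrors `if test -z "$VAR"` in configure.ac).
resolve_flags() {
    supplied="$1"
    default="$2"
    if test -z "${supplied}"; then
        printf '%s\n' "${default}"
    else
        printf '%s\n' "${supplied}"
    fi
}

# Shortened stand-in for the real default apply flags (assumption).
DEFAULT_APPLY_FLAGS="-reorder-blocks=ext-tsp -split-functions"

# No override given: the default is used.
resolve_flags "" "${DEFAULT_APPLY_FLAGS}"

# Override given, as if BOLT_APPLY_FLAGS had been set for configure.
resolve_flags "-reorder-functions=hfsort+" "${DEFAULT_APPLY_FLAGS}"
```

In a real build the override would presumably be passed on the configure command line in the usual autoconf way, for example `./configure --enable-bolt BOLT_APPLY_FLAGS="-reorder-blocks=ext-tsp -reorder-functions=hfsort+"`, with the flag values chosen to suit the local llvm-bolt version.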
1 parent d1732fe commit 555686d

6 files changed, +127 -131 lines changed

.gitignore

+5
@@ -23,6 +23,10 @@
 *.gc??
 *.profclang?
 *.profraw
+# Copies of binaries before BOLT optimizations.
+*.prebolt
+# BOLT profile data.
+*.fdata
 *.dyn
 .gdb_history
 .purify
@@ -124,6 +128,7 @@ Tools/unicode/data/
 /platform
 /profile-clean-stamp
 /profile-run-stamp
+/profile-bolt-stamp
 /Python/deepfreeze/*.c
 /pybuilddir.txt
 /pyconfig.h

Doc/using/configure.rst

+7-1
@@ -313,7 +313,13 @@ also be used to improve performance.
    experimental for now. Because this tool operates on machine code its success
    is dependent on a combination of the build environment + the other
    optimization configure args + the CPU architecture, and not all combinations
-   are supported.
+   are supported. BOLT versions before LLVM 16 are known to crash BOLT under
+   some scenarios. Use of LLVM 16 or newer for BOLT optimization is strongly
+   encouraged.
+
+   The ``BOLT_INSTRUMENT_FLAGS`` and ``BOLT_APPLY_FLAGS`` configure variables
+   can be defined to override the default set of arguments for ``llvm-bolt``
+   to instrument and apply BOLT data to binaries, respectively.

    .. versionadded:: 3.12

Makefile.pre.in

+50-15
@@ -672,21 +672,55 @@ profile-opt: profile-run-stamp
 	-rm -f profile-clean-stamp
 	$(MAKE) @DEF_MAKE_RULE@ CFLAGS_NODIST="$(CFLAGS_NODIST) $(PGO_PROF_USE_FLAG)" LDFLAGS_NODIST="$(LDFLAGS_NODIST)"

-.PHONY: bolt-opt
-bolt-opt: @PREBOLT_RULE@
+# List of binaries that BOLT runs on.
+BOLT_BINARIES := @BOLT_BINARIES@
+
+BOLT_INSTRUMENT_FLAGS := @BOLT_INSTRUMENT_FLAGS@
+BOLT_APPLY_FLAGS := @BOLT_APPLY_FLAGS@
+
+.PHONY: clean-bolt
+clean-bolt:
+	# Profile data.
 	rm -f *.fdata
-	@if $(READELF) -p .note.bolt_info $(BUILDPYTHON) | grep BOLT > /dev/null; then\
-	  echo "skip: $(BUILDPYTHON) is already BOLTed."; \
-	else \
-	  @LLVM_BOLT@ ./$(BUILDPYTHON) -instrument -instrumentation-file-append-pid -instrumentation-file=$(abspath $(BUILDPYTHON).bolt) -o $(BUILDPYTHON).bolt_inst; \
-	  ./$(BUILDPYTHON).bolt_inst $(PROFILE_TASK) || true; \
-	  @MERGE_FDATA@ $(BUILDPYTHON).*.fdata > $(BUILDPYTHON).fdata; \
-	  @LLVM_BOLT@ ./$(BUILDPYTHON) -o $(BUILDPYTHON).bolt -data=$(BUILDPYTHON).fdata -update-debug-sections -reorder-blocks=ext-tsp -reorder-functions=hfsort+ -split-functions -icf=1 -inline-all -split-eh -reorder-functions-use-hot-size -peepholes=none -jump-tables=aggressive -inline-ap -indirect-call-promotion=all -dyno-stats -use-gnu-stack -frame-opt=hot; \
-	  rm -f *.fdata; \
-	  rm -f $(BUILDPYTHON).bolt_inst; \
-	  mv $(BUILDPYTHON).bolt $(BUILDPYTHON); \
-	fi
+	# Pristine binaries before BOLT optimization.
+	rm -f *.prebolt
+	# BOLT instrumented binaries.
+	rm -f *.bolt_inst
+
+profile-bolt-stamp: $(BUILDPYTHON)
+	# Ensure a pristine, pre-BOLT copy of the binary and no profile data from last run.
+	for bin in $(BOLT_BINARIES); do \
+	  prebolt="$${bin}.prebolt"; \
+	  if [ -e "$${prebolt}" ]; then \
+	    echo "Restoring pre-BOLT binary $${prebolt}"; \
+	    mv "$${bin}.prebolt" "$${bin}"; \
+	  fi; \
+	  cp "$${bin}" "$${prebolt}"; \
+	  rm -f $${bin}.bolt.*.fdata $${bin}.fdata; \
+	done
+	# Instrument each binary.
+	for bin in $(BOLT_BINARIES); do \
+	  @LLVM_BOLT@ "$${bin}" -instrument -instrumentation-file-append-pid -instrumentation-file=$(abspath $${bin}.bolt) -o $${bin}.bolt_inst $(BOLT_INSTRUMENT_FLAGS); \
+	  mv "$${bin}.bolt_inst" "$${bin}"; \
+	done
+	# Run instrumented binaries to collect data.
+	$(RUNSHARED) ./$(BUILDPYTHON) $(PROFILE_TASK) || true
+	# Merge all the data files together.
+	for bin in $(BOLT_BINARIES); do \
+	  @MERGE_FDATA@ $${bin}.*.fdata > "$${bin}.fdata"; \
+	  rm -f $${bin}.*.fdata; \
+	done
+	# Run bolt against the merged data to produce an optimized binary.
+	for bin in $(BOLT_BINARIES); do \
+	  @LLVM_BOLT@ "$${bin}.prebolt" -o "$${bin}.bolt" -data="$${bin}.fdata" $(BOLT_APPLY_FLAGS); \
+	  mv "$${bin}.bolt" "$${bin}"; \
+	done
+	touch $@

+.PHONY: bolt-opt
+bolt-opt:
+	$(MAKE) @PREBOLT_RULE@
+	$(MAKE) profile-bolt-stamp

 # Compile and run with gcov
 .PHONY: coverage
@@ -2623,10 +2657,11 @@ profile-removal:
 	rm -f $(COVERAGE_INFO)
 	rm -rf $(COVERAGE_REPORT)
 	rm -f profile-run-stamp
+	rm -f profile-bolt-stamp

 .PHONY: clean
-clean: clean-retain-profile
-	@if test @DEF_MAKE_ALL_RULE@ = profile-opt; then \
+clean: clean-retain-profile clean-bolt
+	@if test @DEF_MAKE_ALL_RULE@ = profile-opt -o @DEF_MAKE_ALL_RULE@ = bolt-opt; then \
 		rm -f profile-gen-stamp profile-clean-stamp; \
 		$(MAKE) profile-removal; \
 	fi
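
The save-and-restore step in `profile-bolt-stamp` can be sketched in isolation. This is a minimal sketch, assuming a scratch directory and a text file standing in for the binary; the `bolt_cycle` function is hypothetical and the "optimize" step only appends a marker line instead of invoking `llvm-bolt`. It aims to show why repeated runs always start from the pristine binary and never optimize an already optimized one.

```shell
#!/bin/sh
# Work in a scratch directory so no real build output is touched.
workdir="$(mktemp -d)"
trap 'rm -rf "${workdir}"' EXIT

# Stand-in for a freshly linked binary (assumption: plain text, not an ELF file).
bin="${workdir}/python"
printf 'pristine\n' > "${bin}"

# Hypothetical stand-in for one BOLT cycle over a single binary.
bolt_cycle() {
    target="$1"
    prebolt="${target}.prebolt"
    # Restore the pristine copy if a previous cycle left one behind.
    if [ -e "${prebolt}" ]; then
        mv "${prebolt}" "${target}"
    fi
    # Save a pristine copy before the binary is modified.
    cp "${target}" "${prebolt}"
    # Pretend to optimize, always reading from the pristine copy.
    { cat "${prebolt}"; printf 'optimized\n'; } > "${target}"
}

bolt_cycle "${bin}"
bolt_cycle "${bin}"

# After two cycles the marker still appears only once.
cat "${bin}"
```

Without the restore step, the second cycle would copy the already modified file to `.prebolt` and the marker would accumulate, which is the situation the old readelf check was guarding against.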
@@ -0,0 +1,4 @@
+BOLT optimization is now applied to the libpython shared library if building
+a shared library. BOLT instrumentation and application settings can now be
+influenced via the ``BOLT_INSTRUMENT_FLAGS`` and ``BOLT_APPLY_FLAGS``
+configure variables.

configure

+37-108
Some generated files are not rendered by default.

configure.ac

+24-7
@@ -2028,13 +2028,6 @@ if test "$Py_BOLT" = 'true' ; then
   DEF_MAKE_ALL_RULE="bolt-opt"
   DEF_MAKE_RULE="build_all"

-  AC_SUBST(READELF)
-  AC_CHECK_TOOLS(READELF, [readelf], "notfound")
-  if test "$READELF" == "notfound"
-  then
-    AC_MSG_ERROR([readelf is required for a --enable-bolt build but could not be found.])
-  fi
-
   # -fno-reorder-blocks-and-partition is required for bolt to work.
   # Possibly GCC only.
   AX_CHECK_COMPILE_FLAG([-fno-reorder-blocks-and-partition],[
@@ -2067,6 +2060,30 @@ if test "$Py_BOLT" = 'true' ; then
   fi
 fi

+# Enable BOLT of libpython if built.
+AC_SUBST(BOLT_BINARIES)
+BOLT_BINARIES='$(BUILDPYTHON)'
+if test "${enable_shared}" = "yes"
+then
+  BOLT_BINARIES="${BOLT_BINARIES} \$(INSTSONAME)"
+fi
+
+AC_ARG_VAR(BOLT_INSTRUMENT_FLAGS, Arguments to llvm-bolt when instrumenting binaries)
+AC_MSG_CHECKING(BOLT_INSTRUMENT_FLAGS)
+if test -z "${BOLT_INSTRUMENT_FLAGS}"
+then
+  BOLT_INSTRUMENT_FLAGS=
+fi
+AC_MSG_RESULT($BOLT_INSTRUMENT_FLAGS)
+
+AC_ARG_VAR(BOLT_APPLY_FLAGS, Arguments to llvm-bolt when creating a BOLT optimized binary)
+AC_MSG_CHECKING(BOLT_APPLY_FLAGS)
+if test -z "${BOLT_APPLY_FLAGS}"
+then
+  BOLT_APPLY_FLAGS="-update-debug-sections -reorder-blocks=ext-tsp -reorder-functions=hfsort+ -split-functions -icf=1 -inline-all -split-eh -reorder-functions-use-hot-size -peepholes=none -jump-tables=aggressive -inline-ap -indirect-call-promotion=all -dyno-stats -use-gnu-stack -frame-opt=hot"
+fi
+AC_MSG_RESULT($BOLT_APPLY_FLAGS)
+
 # XXX Shouldn't the code above that fiddles with BASECFLAGS and OPT be
 # merged with this chunk of code?
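
How the `BOLT_BINARIES` list is assembled can be checked outside of configure. The sketch below is an assumption-laden illustration: the `bolt_binaries` function is hypothetical, and it only reproduces the string handling, leaving `$(BUILDPYTHON)` and `$(INSTSONAME)` as unexpanded make references exactly as they would be substituted into the Makefile.

```shell
#!/bin/sh
# Hypothetical helper mirroring the configure.ac logic: the interpreter binary
# is always listed, and the shared library is appended for shared builds.
bolt_binaries() {
    enable_shared="$1"
    # Single quotes keep the make variable reference literal.
    binaries='$(BUILDPYTHON)'
    if test "${enable_shared}" = "yes"; then
        # The escaped dollar sign keeps the shell from running a command substitution.
        binaries="${binaries} \$(INSTSONAME)"
    fi
    printf '%s\n' "${binaries}"
}

bolt_binaries no    # non-shared build: only the interpreter binary
bolt_binaries yes   # shared build: interpreter binary plus libpython
```

Because the values stay as make variable references, the loops in `profile-bolt-stamp` see the real file names only once make expands them.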
