.. SPDX-License-Identifier: GPL-2.0

===================================
Using AutoFDO with the Linux kernel
===================================

This document describes AutoFDO build support for the kernel when using
the Clang compiler. AutoFDO (Automatic Feedback-Directed Optimization)
is a form of profile-guided optimization (PGO) used to improve the
performance of binary executables. It uses hardware sampling to gather
information about how frequently the various code paths within a binary
execute. This data then guides the compiler's optimization decisions,
resulting in a more efficient binary. AutoFDO is a powerful optimization
technique, and data indicates that it can significantly improve kernel
performance. It is especially beneficial for workloads affected by
front-end stalls.

Unlike non-FDO builds, AutoFDO builds require the user to supply a
profile. AutoFDO profiles are created by converting hardware samples
collected with the "perf" tool, and such a profile can be acquired in
several ways. It is crucial that the workload used to create the perf
data is representative: it must exhibit runtime characteristics similar
to those of the workloads that are intended to be optimized. Otherwise,
the compiler will optimize for the wrong objective.

An AutoFDO profile captures the program's runtime behavior. If the
performance-critical code is architecture-independent, the profile can
be applied across platforms to achieve performance gains. For instance,
a profile generated on an Intel machine can also yield performance
improvements when used to build a kernel for an AMD machine.

There are two methods for acquiring a representative profile:

(1) Sample real workloads in a production environment.

(2) Generate the profile using a representative load test.

When the AutoFDO build configuration is enabled without providing an
AutoFDO profile, the compiler only modifies the DWARF debug information
in the kernel; runtime performance is not affected. It is advisable to
collect the perf profile on a kernel binary built with the same AutoFDO
configuration. While it is possible to use a kernel built with different
options, doing so may result in inferior performance.

Profiles can also be collected on the AutoFDO build of the previous
kernel release. AutoFDO matches profile data using relative line numbers
(line offsets from the start of each function), which offers some
tolerance for source changes. This mode is commonly used for profile
collection in production environments.

When profiles are collected with a load test, the AutoFDO process
consists of the following steps:

#. Initial build: The kernel is built with AutoFDO options
   without a profile.

#. Profiling: The above kernel is then run with a representative
   workload to gather execution frequency data. This data is
   collected with perf using hardware sampling. AutoFDO is most
   effective on platforms that support advanced PMU features, such
   as LBR on Intel machines.

#. AutoFDO profile generation: The perf output file is converted to
   an AutoFDO profile with offline tools.

AutoFDO support requires Clang from LLVM 17 or later.

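
Because this requirement is easy to miss until a long build fails, the
toolchain can be checked up front. The helper below is a minimal sketch,
not something shipped with the kernel: the name ``clang_supports_autofdo``
is hypothetical, and it assumes the ``clang --version`` banner contains
``clang version <major>.<minor>...``, which holds for upstream and common
distribution builds but not necessarily for every vendor toolchain.

```shell
# Hypothetical helper: succeed if a "clang --version" banner reports LLVM 17+.
#   $1: the output of "clang --version"
# Assumes the banner contains "clang version <major>.<minor>..."; vendor
# builds with their own version numbering are not handled.
clang_supports_autofdo() {
	# Extract the major version from the first matching line only.
	major=$(printf '%s\n' "$1" |
		sed -n 's/.*clang version \([0-9][0-9]*\)\..*/\1/p' | head -n 1)
	# Fail if no version was found or it is older than 17.
	[ -n "$major" ] && [ "$major" -ge 17 ]
}
```

For example: ``clang_supports_autofdo "$(clang --version)" || echo "Clang
from LLVM 17 or later is required"``.
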
Preparation
===========

Configure the kernel with::

   CONFIG_AUTOFDO_CLANG=y

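
Kconfig does not keep an option enabled when its dependencies are not
met, so it can be worth confirming that the option survived into the
generated ``.config``. This is a minimal sketch; the helper name
``autofdo_config_enabled`` is hypothetical, and the default path assumes
the command is run from the top of the kernel build directory.

```shell
# Hypothetical helper: succeed if a kernel config enables AutoFDO.
#   $1: config file (default: .config in the current directory)
autofdo_config_enabled() {
	# Kconfig records an enabled bool option as a "CONFIG_FOO=y" line and
	# a disabled one as "# CONFIG_FOO is not set".
	grep -qxF 'CONFIG_AUTOFDO_CLANG=y' "${1:-.config}"
}
```

For example: ``autofdo_config_enabled && echo "AutoFDO enabled"``.
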
Customization
=============

The default CONFIG_AUTOFDO_CLANG setting covers kernel space objects for
AutoFDO builds. One can, however, enable or disable AutoFDO build for
individual files and directories by adding a line similar to the following
to the respective kernel Makefile:

- For enabling a single file (e.g. foo.o) ::

   AUTOFDO_PROFILE_foo.o := y

- For enabling all files in one directory ::

   AUTOFDO_PROFILE := y

- For disabling one file ::

   AUTOFDO_PROFILE_foo.o := n

- For disabling all files in one directory ::

   AUTOFDO_PROFILE := n

Workflow
========

Here is an example workflow for building an AutoFDO kernel:

1) Build the kernel on the host machine with LLVM enabled,
   for example, ::

      $ make menuconfig LLVM=1

   Turn on the AutoFDO build config::

      CONFIG_AUTOFDO_CLANG=y

   Alternatively, starting from an existing configuration with LLVM
   enabled, use the following command::

      $ scripts/config -e AUTOFDO_CLANG

   After getting the config, build with::

      $ make LLVM=1

2) Install the kernel on the test machine.

3) Run the load tests. The ``-c`` option in perf specifies the sample
   event period. We suggest using a suitable prime number, such as
   500009, for this purpose; a prime period makes it less likely that
   samples fall into lockstep with periodic behavior in the workload.

   - For Intel platforms::

      $ perf record -e BR_INST_RETIRED.NEAR_TAKEN:k -a -N -b -c <count> -o <perf_file> -- <loadtest>

   - For AMD platforms:

     The supported systems are Zen3 with BRS and Zen4 with amd_lbr_v2.
     To check for support on Zen3::

      $ grep " brs" /proc/cpuinfo

     On Zen4::

      $ grep amd_lbr_v2 /proc/cpuinfo

     Then generate the perf data file with the following command (the
     ``--pfm-events`` option requires perf built with libpfm4 support)::

      $ perf record --pfm-events RETIRED_TAKEN_BRANCH_INSTRUCTIONS:k -a -N -b -c <count> -o <perf_file> -- <loadtest>

4) (Optional) Download the raw perf file to the host machine.

5) To generate an AutoFDO profile, two offline tools are available:
   create_llvm_prof and llvm-profgen. The create_llvm_prof tool is part
   of the AutoFDO project and can be found on GitHub
   (https://github.com/google/autofdo), version v0.30.1 or later.
   The llvm-profgen tool is included in LLVM itself. Its version does
   not need to match the version of Clang used to build the kernel, but
   it must come from the LLVM 19 release or later, or from LLVM trunk. ::

      $ llvm-profgen --kernel --binary=<vmlinux> --perfdata=<perf_file> -o <profile_file>

   or ::

      $ create_llvm_prof --binary=<vmlinux> --profile=<perf_file> --format=extbinary --out=<profile_file>

   In both commands, ``<vmlinux>`` is the vmlinux image of the kernel
   that was running while perf recorded.

   Multiple AutoFDO profile files can be merged into one with::

      $ llvm-profdata merge -o <profile_file> <profile_1> <profile_2> ... <profile_n>

6) Rebuild the kernel with the AutoFDO profile, using the same
   configuration as in step 1 (CONFIG_AUTOFDO_CLANG must still be
   enabled)::

      $ make LLVM=1 CLANG_AUTOFDO_PROFILE=<profile_file>

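
Steps 3, 5 and 6 can be partly scripted. The helpers below are a minimal
sketch for a POSIX shell, not tooling shipped with the kernel: the
function names, their argument orders, the ``<perf_file>.prof``
intermediate file names and the ``LLVM_PROFGEN``, ``LLVM_PROFDATA`` and
``MAKE`` overrides are all illustrative assumptions. ``autofdo_perf_cmd``
picks the Intel or AMD event from a cpuinfo file and prints the
``perf record`` command instead of running it, so it can be reviewed
before being run with the privileges that system-wide sampling typically
needs. ``autofdo_make_profile`` converts each perf file with
``llvm-profgen`` and merges the results with ``llvm-profdata``.
``autofdo_rebuild`` refuses to start the final build without a non-empty
profile.

```shell
# Hypothetical helpers for steps 3, 5 and 6 of the workflow above.
# Function names, argument orders and the LLVM_PROFGEN, LLVM_PROFDATA and
# MAKE overrides are illustrative assumptions, not kernel interfaces.

# Step 3: print (not run) the perf record command for this CPU.
#   $1: cpuinfo file (normally /proc/cpuinfo)  $2: sample period
#   $3: perf output file                       $4...: load test command
# Intel is detected by vendor_id only; LBR support is not checked.
autofdo_perf_cmd() {
	cpuinfo=$1 count=$2 perf_file=$3
	shift 3
	if grep -q 'GenuineIntel' "$cpuinfo"; then
		event='-e BR_INST_RETIRED.NEAR_TAKEN:k'
	elif grep -qw -e 'brs' -e 'amd_lbr_v2' "$cpuinfo"; then
		# Zen3 reports the "brs" flag, Zen4 reports "amd_lbr_v2".
		event='--pfm-events RETIRED_TAKEN_BRANCH_INSTRUCTIONS:k'
	else
		echo "no supported branch sampling feature found" >&2
		return 1
	fi
	echo "perf record $event -a -N -b -c $count -o $perf_file -- $*"
}

# Step 5: convert each perf file with llvm-profgen, then merge them.
#   $1: vmlinux that was running while perf recorded
#   $2: merged output profile                  $3...: perf data files
autofdo_make_profile() {
	vmlinux=$1 out=$2
	shift 2
	profgen=${LLVM_PROFGEN:-llvm-profgen}
	profdata=${LLVM_PROFDATA:-llvm-profdata}
	# Convert each perf file into <perf_file>.prof.
	for perf_file in "$@"; do
		"$profgen" --kernel --binary="$vmlinux" \
			--perfdata="$perf_file" -o "$perf_file.prof" || return 1
	done
	# Rotate the argument list into the per-file profile names.
	for perf_file in "$@"; do
		set -- "$@" "$perf_file.prof"
		shift
	done
	"$profdata" merge -o "$out" "$@"
}

# Step 6: rebuild with the profile, refusing to start without one.
#   $1: AutoFDO profile file
# CONFIG_AUTOFDO_CLANG must still be enabled; this helper does not check it.
autofdo_rebuild() {
	profile=$1
	if [ ! -s "$profile" ]; then
		echo "profile missing or empty: $profile" >&2
		return 1
	fi
	# Use an absolute path as a precaution against the compiler resolving
	# a relative path from a different working directory.
	case $profile in
	/*) ;;
	*) profile=$(pwd)/$profile ;;
	esac
	${MAKE:-make} LLVM=1 CLANG_AUTOFDO_PROFILE="$profile"
}
```

For example, after sourcing the helpers,
``autofdo_perf_cmd /proc/cpuinfo 500009 perf.data ./loadtest.sh`` prints
the recording command for the local machine (``./loadtest.sh`` stands in
for your own load test), and
``autofdo_make_profile vmlinux kernel.afdo perf1.data perf2.data``
followed by ``autofdo_rebuild kernel.afdo`` covers steps 5 and 6.
Printing rather than executing the perf command is deliberate: the event
names differ between vendors, and seeing the exact command makes a wrong
detection easy to spot.
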