Message ID: 20210226222030.3718075-1-morbo@google.com (mailing list archive)
State: New, archived
Series: [v8] pgo: add clang's Profile Guided Optimization infrastructure
On Fri, Feb 26, 2021 at 2:20 PM Bill Wendling <morbo@google.com> wrote: > > From: Sami Tolvanen <samitolvanen@google.com> > > Enable the use of clang's Profile-Guided Optimization[1]. To generate a > profile, the kernel is instrumented with PGO counters, a representative > workload is run, and the raw profile data is collected from > /sys/kernel/debug/pgo/profraw. > > The raw profile data must be processed by clang's "llvm-profdata" tool > before it can be used during recompilation: > > $ cp /sys/kernel/debug/pgo/profraw vmlinux.profraw > $ llvm-profdata merge --output=vmlinux.profdata vmlinux.profraw > > Multiple raw profiles may be merged during this step. > > The data can now be used by the compiler: > > $ make LLVM=1 KCFLAGS=-fprofile-use=vmlinux.profdata ... > > This initial submission is restricted to x86, as that's the platform we > know works. This restriction can be lifted once other platforms have > been verified to work with PGO. > > Note that this method of profiling the kernel is clang-native, unlike > the clang support in kernel/gcov. > > [1] https://clang.llvm.org/docs/UsersManual.html#profile-guided-optimization > > Signed-off-by: Sami Tolvanen <samitolvanen@google.com> > Co-developed-by: Bill Wendling <morbo@google.com> > Signed-off-by: Bill Wendling <morbo@google.com> I forgot to add these tags: Tested-by: Nick Desaulniers <ndesaulniers@google.com> Reviewed-by: Nick Desaulniers <ndesaulniers@google.com> > --- > v8: - Rebased on top-of-tree. > v7: - Fix minor build failure reported by Sedat. > v6: - Add better documentation about the locking scheme and other things. > - Rename macros to better match the same macros in LLVM's source code. > v5: - Correct padding calculation, discovered by Nathan Chancellor. > v4: - Remove non-x86 Makfile changes and se "hweight64" instead of using our > own popcount implementation, based on Nick Desaulniers's comment. > v3: - Added change log section based on Sedat Dilek's comments. > v2: - Added "__llvm_profile_instrument_memop" based on Nathan Chancellor's > testing. > - Corrected documentation, re PGO flags when using LTO, based on Fangrui > Song's comments. > --- > Documentation/dev-tools/index.rst | 1 + > Documentation/dev-tools/pgo.rst | 127 +++++++++ > MAINTAINERS | 9 + > Makefile | 3 + > arch/Kconfig | 1 + > arch/x86/Kconfig | 1 + > arch/x86/boot/Makefile | 1 + > arch/x86/boot/compressed/Makefile | 1 + > arch/x86/crypto/Makefile | 4 + > arch/x86/entry/vdso/Makefile | 1 + > arch/x86/kernel/vmlinux.lds.S | 2 + > arch/x86/platform/efi/Makefile | 1 + > arch/x86/purgatory/Makefile | 1 + > arch/x86/realmode/rm/Makefile | 1 + > arch/x86/um/vdso/Makefile | 1 + > drivers/firmware/efi/libstub/Makefile | 1 + > include/asm-generic/vmlinux.lds.h | 44 +++ > kernel/Makefile | 1 + > kernel/pgo/Kconfig | 35 +++ > kernel/pgo/Makefile | 5 + > kernel/pgo/fs.c | 389 ++++++++++++++++++++++++++ > kernel/pgo/instrument.c | 189 +++++++++++++ > kernel/pgo/pgo.h | 203 ++++++++++++++ > scripts/Makefile.lib | 10 + > 24 files changed, 1032 insertions(+) > create mode 100644 Documentation/dev-tools/pgo.rst > create mode 100644 kernel/pgo/Kconfig > create mode 100644 kernel/pgo/Makefile > create mode 100644 kernel/pgo/fs.c > create mode 100644 kernel/pgo/instrument.c > create mode 100644 kernel/pgo/pgo.h > > diff --git a/Documentation/dev-tools/index.rst b/Documentation/dev-tools/index.rst > index f7809c7b1ba9..8d6418e85806 100644 > --- a/Documentation/dev-tools/index.rst > +++ b/Documentation/dev-tools/index.rst > @@ -26,6 +26,7 @@ whole; patches welcome! 
> kgdb > kselftest > kunit/index > + pgo > > > .. only:: subproject and html > diff --git a/Documentation/dev-tools/pgo.rst b/Documentation/dev-tools/pgo.rst > new file mode 100644 > index 000000000000..b7f11d8405b7 > --- /dev/null > +++ b/Documentation/dev-tools/pgo.rst > @@ -0,0 +1,127 @@ > +.. SPDX-License-Identifier: GPL-2.0 > + > +=============================== > +Using PGO with the Linux kernel > +=============================== > + > +Clang's profiling kernel support (PGO_) enables profiling of the Linux kernel > +when building with Clang. The profiling data is exported via the ``pgo`` > +debugfs directory. > + > +.. _PGO: https://clang.llvm.org/docs/UsersManual.html#profile-guided-optimization > + > + > +Preparation > +=========== > + > +Configure the kernel with: > + > +.. code-block:: make > + > + CONFIG_DEBUG_FS=y > + CONFIG_PGO_CLANG=y > + > +Note that kernels compiled with profiling flags will be significantly larger > +and run slower. > + > +Profiling data will only become accessible once debugfs has been mounted: > + > +.. code-block:: sh > + > + mount -t debugfs none /sys/kernel/debug > + > + > +Customization > +============= > + > +You can enable or disable profiling for individual file and directories by > +adding a line similar to the following to the respective kernel Makefile: > + > +- For a single file (e.g. main.o) > + > + .. code-block:: make > + > + PGO_PROFILE_main.o := y > + > +- For all files in one directory > + > + .. code-block:: make > + > + PGO_PROFILE := y > + > +To exclude files from being profiled use > + > + .. code-block:: make > + > + PGO_PROFILE_main.o := n > + > +and > + > + .. code-block:: make > + > + PGO_PROFILE := n > + > +Only files which are linked to the main kernel image or are compiled as kernel > +modules are supported by this mechanism. > + > + > +Files > +===== > + > +The PGO kernel support creates the following files in debugfs: > + > +``/sys/kernel/debug/pgo`` > + Parent directory for all PGO-related files. > + > +``/sys/kernel/debug/pgo/reset`` > + Global reset file: resets all coverage data to zero when written to. > + > +``/sys/kernel/debug/profraw`` > + The raw PGO data that must be processed with ``llvm_profdata``. > + > + > +Workflow > +======== > + > +The PGO kernel can be run on the host or test machines. The data though should > +be analyzed with Clang's tools from the same Clang version as the kernel was > +compiled. Clang's tolerant of version skew, but it's easier to use the same > +Clang version. > + > +The profiling data is useful for optimizing the kernel, analyzing coverage, > +etc. Clang offers tools to perform these tasks. > + > +Here is an example workflow for profiling an instrumented kernel with PGO and > +using the result to optimize the kernel: > + > +1) Install the kernel on the TEST machine. > + > +2) Reset the data counters right before running the load tests > + > + .. code-block:: sh > + > + $ echo 1 > /sys/kernel/debug/pgo/reset > + > +3) Run the load tests. > + > +4) Collect the raw profile data > + > + .. code-block:: sh > + > + $ cp -a /sys/kernel/debug/pgo/profraw /tmp/vmlinux.profraw > + > +5) (Optional) Download the raw profile data to the HOST machine. > + > +6) Process the raw profile data > + > + .. code-block:: sh > + > + $ llvm-profdata merge --output=vmlinux.profdata vmlinux.profraw > + > + Note that multiple raw profile data files can be merged during this step. > + > +7) Rebuild the kernel using the profile data (PGO disabled) > + > + .. 
code-block:: sh > + > + $ make LLVM=1 KCFLAGS=-fprofile-use=vmlinux.profdata ... > diff --git a/MAINTAINERS b/MAINTAINERS > index c71664ca8bfd..3a6668792bc5 100644 > --- a/MAINTAINERS > +++ b/MAINTAINERS > @@ -14019,6 +14019,15 @@ S: Maintained > F: include/linux/personality.h > F: include/uapi/linux/personality.h > > +PGO BASED KERNEL PROFILING > +M: Sami Tolvanen <samitolvanen@google.com> > +M: Bill Wendling <wcw@google.com> > +R: Nathan Chancellor <natechancellor@gmail.com> > +R: Nick Desaulniers <ndesaulniers@google.com> > +S: Supported > +F: Documentation/dev-tools/pgo.rst > +F: kernel/pgo > + > PHOENIX RC FLIGHT CONTROLLER ADAPTER > M: Marcus Folkesson <marcus.folkesson@gmail.com> > L: linux-input@vger.kernel.org > diff --git a/Makefile b/Makefile > index 6ecd0d22e608..b57d4d44c799 100644 > --- a/Makefile > +++ b/Makefile > @@ -657,6 +657,9 @@ endif # KBUILD_EXTMOD > # Defaults to vmlinux, but the arch makefile usually adds further targets > all: vmlinux > > +CFLAGS_PGO_CLANG := -fprofile-generate > +export CFLAGS_PGO_CLANG > + > CFLAGS_GCOV := -fprofile-arcs -ftest-coverage \ > $(call cc-option,-fno-tree-loop-im) \ > $(call cc-disable-warning,maybe-uninitialized,) > diff --git a/arch/Kconfig b/arch/Kconfig > index 2bb30673d8e6..111e642a2af7 100644 > --- a/arch/Kconfig > +++ b/arch/Kconfig > @@ -1192,6 +1192,7 @@ config ARCH_HAS_ELFCORE_COMPAT > bool > > source "kernel/gcov/Kconfig" > +source "kernel/pgo/Kconfig" > > source "scripts/gcc-plugins/Kconfig" > > diff --git a/arch/x86/Kconfig b/arch/x86/Kconfig > index cd4b9b1204a8..c9808583b528 100644 > --- a/arch/x86/Kconfig > +++ b/arch/x86/Kconfig > @@ -99,6 +99,7 @@ config X86 > select ARCH_SUPPORTS_KMAP_LOCAL_FORCE_MAP if NR_CPUS <= 4096 > select ARCH_SUPPORTS_LTO_CLANG if X86_64 > select ARCH_SUPPORTS_LTO_CLANG_THIN if X86_64 > + select ARCH_SUPPORTS_PGO_CLANG if X86_64 > select ARCH_USE_BUILTIN_BSWAP > select ARCH_USE_QUEUED_RWLOCKS > select ARCH_USE_QUEUED_SPINLOCKS > diff --git a/arch/x86/boot/Makefile b/arch/x86/boot/Makefile > index fe605205b4ce..383853e32f67 100644 > --- a/arch/x86/boot/Makefile > +++ b/arch/x86/boot/Makefile > @@ -71,6 +71,7 @@ KBUILD_AFLAGS := $(KBUILD_CFLAGS) -D__ASSEMBLY__ > KBUILD_CFLAGS += $(call cc-option,-fmacro-prefix-map=$(srctree)/=) > KBUILD_CFLAGS += -fno-asynchronous-unwind-tables > GCOV_PROFILE := n > +PGO_PROFILE := n > UBSAN_SANITIZE := n > > $(obj)/bzImage: asflags-y := $(SVGA_MODE) > diff --git a/arch/x86/boot/compressed/Makefile b/arch/x86/boot/compressed/Makefile > index e0bc3988c3fa..ed12ab65f606 100644 > --- a/arch/x86/boot/compressed/Makefile > +++ b/arch/x86/boot/compressed/Makefile > @@ -54,6 +54,7 @@ CFLAGS_sev-es.o += -I$(objtree)/arch/x86/lib/ > > KBUILD_AFLAGS := $(KBUILD_CFLAGS) -D__ASSEMBLY__ > GCOV_PROFILE := n > +PGO_PROFILE := n > UBSAN_SANITIZE :=n > > KBUILD_LDFLAGS := -m elf_$(UTS_MACHINE) > diff --git a/arch/x86/crypto/Makefile b/arch/x86/crypto/Makefile > index b28e36b7c96b..4b2e9620c412 100644 > --- a/arch/x86/crypto/Makefile > +++ b/arch/x86/crypto/Makefile > @@ -4,6 +4,10 @@ > > OBJECT_FILES_NON_STANDARD := y > > +# Disable PGO for curve25519-x86_64. With PGO enabled, clang runs out of > +# registers for some of the functions. 
> +PGO_PROFILE_curve25519-x86_64.o := n > + > obj-$(CONFIG_CRYPTO_TWOFISH_586) += twofish-i586.o > twofish-i586-y := twofish-i586-asm_32.o twofish_glue.o > obj-$(CONFIG_CRYPTO_TWOFISH_X86_64) += twofish-x86_64.o > diff --git a/arch/x86/entry/vdso/Makefile b/arch/x86/entry/vdso/Makefile > index 05c4abc2fdfd..f7421e44725a 100644 > --- a/arch/x86/entry/vdso/Makefile > +++ b/arch/x86/entry/vdso/Makefile > @@ -180,6 +180,7 @@ quiet_cmd_vdso = VDSO $@ > VDSO_LDFLAGS = -shared --hash-style=both --build-id=sha1 \ > $(call ld-option, --eh-frame-hdr) -Bsymbolic > GCOV_PROFILE := n > +PGO_PROFILE := n > > quiet_cmd_vdso_and_check = VDSO $@ > cmd_vdso_and_check = $(cmd_vdso); $(cmd_vdso_check) > diff --git a/arch/x86/kernel/vmlinux.lds.S b/arch/x86/kernel/vmlinux.lds.S > index efd9e9ea17f2..f6cab2316c46 100644 > --- a/arch/x86/kernel/vmlinux.lds.S > +++ b/arch/x86/kernel/vmlinux.lds.S > @@ -184,6 +184,8 @@ SECTIONS > > BUG_TABLE > > + PGO_CLANG_DATA > + > ORC_UNWIND_TABLE > > . = ALIGN(PAGE_SIZE); > diff --git a/arch/x86/platform/efi/Makefile b/arch/x86/platform/efi/Makefile > index 84b09c230cbd..5f22b31446ad 100644 > --- a/arch/x86/platform/efi/Makefile > +++ b/arch/x86/platform/efi/Makefile > @@ -2,6 +2,7 @@ > OBJECT_FILES_NON_STANDARD_efi_thunk_$(BITS).o := y > KASAN_SANITIZE := n > GCOV_PROFILE := n > +PGO_PROFILE := n > > obj-$(CONFIG_EFI) += quirks.o efi.o efi_$(BITS).o efi_stub_$(BITS).o > obj-$(CONFIG_EFI_MIXED) += efi_thunk_$(BITS).o > diff --git a/arch/x86/purgatory/Makefile b/arch/x86/purgatory/Makefile > index 95ea17a9d20c..36f20e99da0b 100644 > --- a/arch/x86/purgatory/Makefile > +++ b/arch/x86/purgatory/Makefile > @@ -23,6 +23,7 @@ targets += purgatory.ro purgatory.chk > > # Sanitizer, etc. runtimes are unavailable and cannot be linked here. > GCOV_PROFILE := n > +PGO_PROFILE := n > KASAN_SANITIZE := n > UBSAN_SANITIZE := n > KCSAN_SANITIZE := n > diff --git a/arch/x86/realmode/rm/Makefile b/arch/x86/realmode/rm/Makefile > index 83f1b6a56449..21797192f958 100644 > --- a/arch/x86/realmode/rm/Makefile > +++ b/arch/x86/realmode/rm/Makefile > @@ -76,4 +76,5 @@ KBUILD_CFLAGS := $(REALMODE_CFLAGS) -D_SETUP -D_WAKEUP \ > KBUILD_AFLAGS := $(KBUILD_CFLAGS) -D__ASSEMBLY__ > KBUILD_CFLAGS += -fno-asynchronous-unwind-tables > GCOV_PROFILE := n > +PGO_PROFILE := n > UBSAN_SANITIZE := n > diff --git a/arch/x86/um/vdso/Makefile b/arch/x86/um/vdso/Makefile > index 5943387e3f35..54f5768f5853 100644 > --- a/arch/x86/um/vdso/Makefile > +++ b/arch/x86/um/vdso/Makefile > @@ -64,6 +64,7 @@ quiet_cmd_vdso = VDSO $@ > > VDSO_LDFLAGS = -fPIC -shared -Wl,--hash-style=sysv > GCOV_PROFILE := n > +PGO_PROFILE := n > > # > # Install the unstripped copy of vdso*.so listed in $(vdso-install-y). > diff --git a/drivers/firmware/efi/libstub/Makefile b/drivers/firmware/efi/libstub/Makefile > index c23466e05e60..724fb389bb9d 100644 > --- a/drivers/firmware/efi/libstub/Makefile > +++ b/drivers/firmware/efi/libstub/Makefile > @@ -42,6 +42,7 @@ KBUILD_CFLAGS := $(filter-out $(CC_FLAGS_SCS), $(KBUILD_CFLAGS)) > KBUILD_CFLAGS := $(filter-out $(CC_FLAGS_LTO), $(KBUILD_CFLAGS)) > > GCOV_PROFILE := n > +PGO_PROFILE := n > # Sanitizer runtimes are unavailable and cannot be linked here. 
> KASAN_SANITIZE := n > KCSAN_SANITIZE := n > diff --git a/include/asm-generic/vmlinux.lds.h b/include/asm-generic/vmlinux.lds.h > index 6786f8c0182f..4a0c21b840b3 100644 > --- a/include/asm-generic/vmlinux.lds.h > +++ b/include/asm-generic/vmlinux.lds.h > @@ -329,6 +329,49 @@ > #define DTPM_TABLE() > #endif > > +#ifdef CONFIG_PGO_CLANG > +#define PGO_CLANG_DATA \ > + __llvm_prf_data : AT(ADDR(__llvm_prf_data) - LOAD_OFFSET) { \ > + . = ALIGN(8); \ > + __llvm_prf_start = .; \ > + __llvm_prf_data_start = .; \ > + KEEP(*(__llvm_prf_data)) \ > + . = ALIGN(8); \ > + __llvm_prf_data_end = .; \ > + } \ > + __llvm_prf_cnts : AT(ADDR(__llvm_prf_cnts) - LOAD_OFFSET) { \ > + . = ALIGN(8); \ > + __llvm_prf_cnts_start = .; \ > + KEEP(*(__llvm_prf_cnts)) \ > + . = ALIGN(8); \ > + __llvm_prf_cnts_end = .; \ > + } \ > + __llvm_prf_names : AT(ADDR(__llvm_prf_names) - LOAD_OFFSET) { \ > + . = ALIGN(8); \ > + __llvm_prf_names_start = .; \ > + KEEP(*(__llvm_prf_names)) \ > + . = ALIGN(8); \ > + __llvm_prf_names_end = .; \ > + . = ALIGN(8); \ > + } \ > + __llvm_prf_vals : AT(ADDR(__llvm_prf_vals) - LOAD_OFFSET) { \ > + __llvm_prf_vals_start = .; \ > + KEEP(*(__llvm_prf_vals)) \ > + . = ALIGN(8); \ > + __llvm_prf_vals_end = .; \ > + . = ALIGN(8); \ > + } \ > + __llvm_prf_vnds : AT(ADDR(__llvm_prf_vnds) - LOAD_OFFSET) { \ > + __llvm_prf_vnds_start = .; \ > + KEEP(*(__llvm_prf_vnds)) \ > + . = ALIGN(8); \ > + __llvm_prf_vnds_end = .; \ > + __llvm_prf_end = .; \ > + } > +#else > +#define PGO_CLANG_DATA > +#endif > + > #define KERNEL_DTB() \ > STRUCT_ALIGN(); \ > __dtb_start = .; \ > @@ -1105,6 +1148,7 @@ > CONSTRUCTORS \ > } \ > BUG_TABLE \ > + PGO_CLANG_DATA > > #define INIT_TEXT_SECTION(inittext_align) \ > . = ALIGN(inittext_align); \ > diff --git a/kernel/Makefile b/kernel/Makefile > index 320f1f3941b7..a2a23ef2b12f 100644 > --- a/kernel/Makefile > +++ b/kernel/Makefile > @@ -111,6 +111,7 @@ obj-$(CONFIG_BPF) += bpf/ > obj-$(CONFIG_KCSAN) += kcsan/ > obj-$(CONFIG_SHADOW_CALL_STACK) += scs.o > obj-$(CONFIG_HAVE_STATIC_CALL_INLINE) += static_call.o > +obj-$(CONFIG_PGO_CLANG) += pgo/ > > obj-$(CONFIG_PERF_EVENTS) += events/ > > diff --git a/kernel/pgo/Kconfig b/kernel/pgo/Kconfig > new file mode 100644 > index 000000000000..76a640b6cf6e > --- /dev/null > +++ b/kernel/pgo/Kconfig > @@ -0,0 +1,35 @@ > +# SPDX-License-Identifier: GPL-2.0-only > +menu "Profile Guided Optimization (PGO) (EXPERIMENTAL)" > + > +config ARCH_SUPPORTS_PGO_CLANG > + bool > + > +config PGO_CLANG > + bool "Enable clang's PGO-based kernel profiling" > + depends on DEBUG_FS > + depends on ARCH_SUPPORTS_PGO_CLANG > + depends on CC_IS_CLANG && CLANG_VERSION >= 120000 > + help > + This option enables clang's PGO (Profile Guided Optimization) based > + code profiling to better optimize the kernel. > + > + If unsure, say N. > + > + Run a representative workload for your application on a kernel > + compiled with this option and download the raw profile file from > + /sys/kernel/debug/pgo/profraw. This file needs to be processed with > + llvm-profdata. It may be merged with other collected raw profiles. > + > + Copy the resulting profile file into vmlinux.profdata, and enable > + KCFLAGS=-fprofile-use=vmlinux.profdata to produce an optimized > + kernel. > + > + Note that a kernel compiled with profiling flags will be > + significantly larger and run slower. Also be sure to exclude files > + from profiling which are not linked to the kernel image to prevent > + linker errors. 
> + > + Note that the debugfs filesystem has to be mounted to access > + profiling data. > + > +endmenu > diff --git a/kernel/pgo/Makefile b/kernel/pgo/Makefile > new file mode 100644 > index 000000000000..41e27cefd9a4 > --- /dev/null > +++ b/kernel/pgo/Makefile > @@ -0,0 +1,5 @@ > +# SPDX-License-Identifier: GPL-2.0 > +GCOV_PROFILE := n > +PGO_PROFILE := n > + > +obj-y += fs.o instrument.o > diff --git a/kernel/pgo/fs.c b/kernel/pgo/fs.c > new file mode 100644 > index 000000000000..1678df3b7d64 > --- /dev/null > +++ b/kernel/pgo/fs.c > @@ -0,0 +1,389 @@ > +// SPDX-License-Identifier: GPL-2.0 > +/* > + * Copyright (C) 2019 Google, Inc. > + * > + * Author: > + * Sami Tolvanen <samitolvanen@google.com> > + * > + * This software is licensed under the terms of the GNU General Public > + * License version 2, as published by the Free Software Foundation, and > + * may be copied, distributed, and modified under those terms. > + * > + * This program is distributed in the hope that it will be useful, > + * but WITHOUT ANY WARRANTY; without even the implied warranty of > + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the > + * GNU General Public License for more details. > + * > + */ > + > +#define pr_fmt(fmt) "pgo: " fmt > + > +#include <linux/kernel.h> > +#include <linux/debugfs.h> > +#include <linux/fs.h> > +#include <linux/module.h> > +#include <linux/slab.h> > +#include <linux/vmalloc.h> > +#include "pgo.h" > + > +static struct dentry *directory; > + > +struct prf_private_data { > + void *buffer; > + unsigned long size; > +}; > + > +/* > + * Raw profile data format: > + * > + * - llvm_prf_header > + * - __llvm_prf_data > + * - __llvm_prf_cnts > + * - __llvm_prf_names > + * - zero padding to 8 bytes > + * - for each llvm_prf_data in __llvm_prf_data: > + * - llvm_prf_value_data > + * - llvm_prf_value_record + site count array > + * - llvm_prf_value_node_data > + * ... > + * ... > + * ... > + */ > + > +static void prf_fill_header(void **buffer) > +{ > + struct llvm_prf_header *header = *(struct llvm_prf_header **)buffer; > + > +#ifdef CONFIG_64BIT > + header->magic = LLVM_INSTR_PROF_RAW_MAGIC_64; > +#else > + header->magic = LLVM_INSTR_PROF_RAW_MAGIC_32; > +#endif > + header->version = LLVM_VARIANT_MASK_IR_PROF | LLVM_INSTR_PROF_RAW_VERSION; > + header->data_size = prf_data_count(); > + header->padding_bytes_before_counters = 0; > + header->counters_size = prf_cnts_count(); > + header->padding_bytes_after_counters = 0; > + header->names_size = prf_names_count(); > + header->counters_delta = (u64)__llvm_prf_cnts_start; > + header->names_delta = (u64)__llvm_prf_names_start; > + header->value_kind_last = LLVM_INSTR_PROF_IPVK_LAST; > + > + *buffer += sizeof(*header); > +} > + > +/* > + * Copy the source into the buffer, incrementing the pointer into buffer in the > + * process. 
> + */ > +static void prf_copy_to_buffer(void **buffer, void *src, unsigned long size) > +{ > + memcpy(*buffer, src, size); > + *buffer += size; > +} > + > +static u32 __prf_get_value_size(struct llvm_prf_data *p, u32 *value_kinds) > +{ > + struct llvm_prf_value_node **nodes = > + (struct llvm_prf_value_node **)p->values; > + u32 kinds = 0; > + u32 size = 0; > + unsigned int kind; > + unsigned int n; > + unsigned int s = 0; > + > + for (kind = 0; kind < ARRAY_SIZE(p->num_value_sites); kind++) { > + unsigned int sites = p->num_value_sites[kind]; > + > + if (!sites) > + continue; > + > + /* Record + site count array */ > + size += prf_get_value_record_size(sites); > + kinds++; > + > + if (!nodes) > + continue; > + > + for (n = 0; n < sites; n++) { > + u32 count = 0; > + struct llvm_prf_value_node *site = nodes[s + n]; > + > + while (site && ++count <= U8_MAX) > + site = site->next; > + > + size += count * > + sizeof(struct llvm_prf_value_node_data); > + } > + > + s += sites; > + } > + > + if (size) > + size += sizeof(struct llvm_prf_value_data); > + > + if (value_kinds) > + *value_kinds = kinds; > + > + return size; > +} > + > +static u32 prf_get_value_size(void) > +{ > + u32 size = 0; > + struct llvm_prf_data *p; > + > + for (p = __llvm_prf_data_start; p < __llvm_prf_data_end; p++) > + size += __prf_get_value_size(p, NULL); > + > + return size; > +} > + > +/* Serialize the profiling's value. */ > +static void prf_serialize_value(struct llvm_prf_data *p, void **buffer) > +{ > + struct llvm_prf_value_data header; > + struct llvm_prf_value_node **nodes = > + (struct llvm_prf_value_node **)p->values; > + unsigned int kind; > + unsigned int n; > + unsigned int s = 0; > + > + header.total_size = __prf_get_value_size(p, &header.num_value_kinds); > + > + if (!header.num_value_kinds) > + /* Nothing to write. */ > + return; > + > + prf_copy_to_buffer(buffer, &header, sizeof(header)); > + > + for (kind = 0; kind < ARRAY_SIZE(p->num_value_sites); kind++) { > + struct llvm_prf_value_record *record; > + u8 *counts; > + unsigned int sites = p->num_value_sites[kind]; > + > + if (!sites) > + continue; > + > + /* Profiling value record. */ > + record = *(struct llvm_prf_value_record **)buffer; > + *buffer += prf_get_value_record_header_size(); > + > + record->kind = kind; > + record->num_value_sites = sites; > + > + /* Site count array. */ > + counts = *(u8 **)buffer; > + *buffer += prf_get_value_record_site_count_size(sites); > + > + /* > + * If we don't have nodes, we can skip updating the site count > + * array, because the buffer is zero filled. 
> + */ > + if (!nodes) > + continue; > + > + for (n = 0; n < sites; n++) { > + u32 count = 0; > + struct llvm_prf_value_node *site = nodes[s + n]; > + > + while (site && ++count <= U8_MAX) { > + prf_copy_to_buffer(buffer, site, > + sizeof(struct llvm_prf_value_node_data)); > + site = site->next; > + } > + > + counts[n] = (u8)count; > + } > + > + s += sites; > + } > +} > + > +static void prf_serialize_values(void **buffer) > +{ > + struct llvm_prf_data *p; > + > + for (p = __llvm_prf_data_start; p < __llvm_prf_data_end; p++) > + prf_serialize_value(p, buffer); > +} > + > +static inline unsigned long prf_get_padding(unsigned long size) > +{ > + return 7 & (sizeof(u64) - size % sizeof(u64)); > +} > + > +static unsigned long prf_buffer_size(void) > +{ > + return sizeof(struct llvm_prf_header) + > + prf_data_size() + > + prf_cnts_size() + > + prf_names_size() + > + prf_get_padding(prf_names_size()) + > + prf_get_value_size(); > +} > + > +/* > + * Serialize the profiling data into a format LLVM's tools can understand. > + * Note: caller *must* hold pgo_lock. > + */ > +static int prf_serialize(struct prf_private_data *p) > +{ > + int err = 0; > + void *buffer; > + > + p->size = prf_buffer_size(); > + p->buffer = vzalloc(p->size); > + > + if (!p->buffer) { > + err = -ENOMEM; > + goto out; > + } > + > + buffer = p->buffer; > + > + prf_fill_header(&buffer); > + prf_copy_to_buffer(&buffer, __llvm_prf_data_start, prf_data_size()); > + prf_copy_to_buffer(&buffer, __llvm_prf_cnts_start, prf_cnts_size()); > + prf_copy_to_buffer(&buffer, __llvm_prf_names_start, prf_names_size()); > + buffer += prf_get_padding(prf_names_size()); > + > + prf_serialize_values(&buffer); > + > +out: > + return err; > +} > + > +/* open() implementation for PGO. Creates a copy of the profiling data set. */ > +static int prf_open(struct inode *inode, struct file *file) > +{ > + struct prf_private_data *data; > + unsigned long flags; > + int err; > + > + data = kzalloc(sizeof(*data), GFP_KERNEL); > + if (!data) { > + err = -ENOMEM; > + goto out; > + } > + > + flags = prf_lock(); > + > + err = prf_serialize(data); > + if (unlikely(err)) { > + kfree(data); > + goto out_unlock; > + } > + > + file->private_data = data; > + > +out_unlock: > + prf_unlock(flags); > +out: > + return err; > +} > + > +/* read() implementation for PGO. */ > +static ssize_t prf_read(struct file *file, char __user *buf, size_t count, > + loff_t *ppos) > +{ > + struct prf_private_data *data = file->private_data; > + > + BUG_ON(!data); > + > + return simple_read_from_buffer(buf, count, ppos, data->buffer, > + data->size); > +} > + > +/* release() implementation for PGO. Release resources allocated by open(). */ > +static int prf_release(struct inode *inode, struct file *file) > +{ > + struct prf_private_data *data = file->private_data; > + > + if (data) { > + vfree(data->buffer); > + kfree(data); > + } > + > + return 0; > +} > + > +static const struct file_operations prf_fops = { > + .owner = THIS_MODULE, > + .open = prf_open, > + .read = prf_read, > + .llseek = default_llseek, > + .release = prf_release > +}; > + > +/* write() implementation for resetting PGO's profile data. 
*/ > +static ssize_t reset_write(struct file *file, const char __user *addr, > + size_t len, loff_t *pos) > +{ > + struct llvm_prf_data *data; > + > + memset(__llvm_prf_cnts_start, 0, prf_cnts_size()); > + > + for (data = __llvm_prf_data_start; data < __llvm_prf_data_end; data++) { > + struct llvm_prf_value_node **vnodes; > + u64 current_vsite_count; > + u32 i; > + > + if (!data->values) > + continue; > + > + current_vsite_count = 0; > + vnodes = (struct llvm_prf_value_node **)data->values; > + > + for (i = LLVM_INSTR_PROF_IPVK_FIRST; i <= LLVM_INSTR_PROF_IPVK_LAST; i++) > + current_vsite_count += data->num_value_sites[i]; > + > + for (i = 0; i < current_vsite_count; i++) { > + struct llvm_prf_value_node *current_vnode = vnodes[i]; > + > + while (current_vnode) { > + current_vnode->count = 0; > + current_vnode = current_vnode->next; > + } > + } > + } > + > + return len; > +} > + > +static const struct file_operations prf_reset_fops = { > + .owner = THIS_MODULE, > + .write = reset_write, > + .llseek = noop_llseek, > +}; > + > +/* Create debugfs entries. */ > +static int __init pgo_init(void) > +{ > + directory = debugfs_create_dir("pgo", NULL); > + if (!directory) > + goto err_remove; > + > + if (!debugfs_create_file("profraw", 0600, directory, NULL, > + &prf_fops)) > + goto err_remove; > + > + if (!debugfs_create_file("reset", 0200, directory, NULL, > + &prf_reset_fops)) > + goto err_remove; > + > + return 0; > + > +err_remove: > + pr_err("initialization failed\n"); > + return -EIO; > +} > + > +/* Remove debugfs entries. */ > +static void __exit pgo_exit(void) > +{ > + debugfs_remove_recursive(directory); > +} > + > +module_init(pgo_init); > +module_exit(pgo_exit); > diff --git a/kernel/pgo/instrument.c b/kernel/pgo/instrument.c > new file mode 100644 > index 000000000000..62ff5cfce7b1 > --- /dev/null > +++ b/kernel/pgo/instrument.c > @@ -0,0 +1,189 @@ > +// SPDX-License-Identifier: GPL-2.0 > +/* > + * Copyright (C) 2019 Google, Inc. > + * > + * Author: > + * Sami Tolvanen <samitolvanen@google.com> > + * > + * This software is licensed under the terms of the GNU General Public > + * License version 2, as published by the Free Software Foundation, and > + * may be copied, distributed, and modified under those terms. > + * > + * This program is distributed in the hope that it will be useful, > + * but WITHOUT ANY WARRANTY; without even the implied warranty of > + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the > + * GNU General Public License for more details. > + * > + */ > + > +#define pr_fmt(fmt) "pgo: " fmt > + > +#include <linux/bitops.h> > +#include <linux/kernel.h> > +#include <linux/export.h> > +#include <linux/spinlock.h> > +#include <linux/types.h> > +#include "pgo.h" > + > +/* > + * This lock guards both profile count updating and serialization of the > + * profiling data. Keeping both of these activities separate via locking > + * ensures that we don't try to serialize data that's only partially updated. > + */ > +static DEFINE_SPINLOCK(pgo_lock); > +static int current_node; > + > +unsigned long prf_lock(void) > +{ > + unsigned long flags; > + > + spin_lock_irqsave(&pgo_lock, flags); > + > + return flags; > +} > + > +void prf_unlock(unsigned long flags) > +{ > + spin_unlock_irqrestore(&pgo_lock, flags); > +} > + > +/* > + * Return a newly allocated profiling value node which contains the tracked > + * value by the value profiler. > + * Note: caller *must* hold pgo_lock. 
> + */ > +static struct llvm_prf_value_node *allocate_node(struct llvm_prf_data *p, > + u32 index, u64 value) > +{ > + if (&__llvm_prf_vnds_start[current_node + 1] >= __llvm_prf_vnds_end) > + return NULL; /* Out of nodes */ > + > + current_node++; > + > + /* Make sure the node is entirely within the section */ > + if (&__llvm_prf_vnds_start[current_node] >= __llvm_prf_vnds_end || > + &__llvm_prf_vnds_start[current_node + 1] > __llvm_prf_vnds_end) > + return NULL; > + > + return &__llvm_prf_vnds_start[current_node]; > +} > + > +/* > + * Counts the number of times a target value is seen. > + * > + * Records the target value for the index if not seen before. Otherwise, > + * increments the counter associated w/ the target value. > + */ > +void __llvm_profile_instrument_target(u64 target_value, void *data, u32 index); > +void __llvm_profile_instrument_target(u64 target_value, void *data, u32 index) > +{ > + struct llvm_prf_data *p = (struct llvm_prf_data *)data; > + struct llvm_prf_value_node **counters; > + struct llvm_prf_value_node *curr; > + struct llvm_prf_value_node *min = NULL; > + struct llvm_prf_value_node *prev = NULL; > + u64 min_count = U64_MAX; > + u8 values = 0; > + unsigned long flags; > + > + if (!p || !p->values) > + return; > + > + counters = (struct llvm_prf_value_node **)p->values; > + curr = counters[index]; > + > + while (curr) { > + if (target_value == curr->value) { > + curr->count++; > + return; > + } > + > + if (curr->count < min_count) { > + min_count = curr->count; > + min = curr; > + } > + > + prev = curr; > + curr = curr->next; > + values++; > + } > + > + if (values >= LLVM_INSTR_PROF_MAX_NUM_VAL_PER_SITE) { > + if (!min->count || !(--min->count)) { > + curr = min; > + curr->value = target_value; > + curr->count++; > + } > + return; > + } > + > + /* Lock when updating the value node structure. */ > + flags = prf_lock(); > + > + curr = allocate_node(p, index, target_value); > + if (!curr) > + goto out; > + > + curr->value = target_value; > + curr->count++; > + > + if (!counters[index]) > + counters[index] = curr; > + else if (prev && !prev->next) > + prev->next = curr; > + > +out: > + prf_unlock(flags); > +} > +EXPORT_SYMBOL(__llvm_profile_instrument_target); > + > +/* Counts the number of times a range of targets values are seen. */ > +void __llvm_profile_instrument_range(u64 target_value, void *data, > + u32 index, s64 precise_start, > + s64 precise_last, s64 large_value); > +void __llvm_profile_instrument_range(u64 target_value, void *data, > + u32 index, s64 precise_start, > + s64 precise_last, s64 large_value) > +{ > + if (large_value != S64_MIN && (s64)target_value >= large_value) > + target_value = large_value; > + else if ((s64)target_value < precise_start || > + (s64)target_value > precise_last) > + target_value = precise_last + 1; > + > + __llvm_profile_instrument_target(target_value, data, index); > +} > +EXPORT_SYMBOL(__llvm_profile_instrument_range); > + > +static u64 inst_prof_get_range_rep_value(u64 value) > +{ > + if (value <= 8) > + /* The first ranges are individually tracked, use it as is. */ > + return value; > + else if (value >= 513) > + /* The last range is mapped to its lowest value. */ > + return 513; > + else if (hweight64(value) == 1) > + /* If it's a power of two, use it as is. */ > + return value; > + > + /* Otherwise, take to the previous power of two + 1. */ > + return (1 << (64 - __builtin_clzll(value) - 1)) + 1; > +} > + > +/* > + * The target values are partitioned into multiple ranges. 
The range spec is > + * defined in compiler-rt/include/profile/InstrProfData.inc. > + */ > +void __llvm_profile_instrument_memop(u64 target_value, void *data, > + u32 counter_index); > +void __llvm_profile_instrument_memop(u64 target_value, void *data, > + u32 counter_index) > +{ > + u64 rep_value; > + > + /* Map the target value to the representative value of its range. */ > + rep_value = inst_prof_get_range_rep_value(target_value); > + __llvm_profile_instrument_target(rep_value, data, counter_index); > +} > +EXPORT_SYMBOL(__llvm_profile_instrument_memop); > diff --git a/kernel/pgo/pgo.h b/kernel/pgo/pgo.h > new file mode 100644 > index 000000000000..ddc8d3002fe5 > --- /dev/null > +++ b/kernel/pgo/pgo.h > @@ -0,0 +1,203 @@ > +/* SPDX-License-Identifier: GPL-2.0 */ > +/* > + * Copyright (C) 2019 Google, Inc. > + * > + * Author: > + * Sami Tolvanen <samitolvanen@google.com> > + * > + * This software is licensed under the terms of the GNU General Public > + * License version 2, as published by the Free Software Foundation, and > + * may be copied, distributed, and modified under those terms. > + * > + * This program is distributed in the hope that it will be useful, > + * but WITHOUT ANY WARRANTY; without even the implied warranty of > + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the > + * GNU General Public License for more details. > + * > + */ > + > +#ifndef _PGO_H > +#define _PGO_H > + > +/* > + * Note: These internal LLVM definitions must match the compiler version. > + * See llvm/include/llvm/ProfileData/InstrProfData.inc in LLVM's source code. > + */ > + > +#define LLVM_INSTR_PROF_RAW_MAGIC_64 \ > + ((u64)255 << 56 | \ > + (u64)'l' << 48 | \ > + (u64)'p' << 40 | \ > + (u64)'r' << 32 | \ > + (u64)'o' << 24 | \ > + (u64)'f' << 16 | \ > + (u64)'r' << 8 | \ > + (u64)129) > +#define LLVM_INSTR_PROF_RAW_MAGIC_32 \ > + ((u64)255 << 56 | \ > + (u64)'l' << 48 | \ > + (u64)'p' << 40 | \ > + (u64)'r' << 32 | \ > + (u64)'o' << 24 | \ > + (u64)'f' << 16 | \ > + (u64)'R' << 8 | \ > + (u64)129) > + > +#define LLVM_INSTR_PROF_RAW_VERSION 5 > +#define LLVM_INSTR_PROF_DATA_ALIGNMENT 8 > +#define LLVM_INSTR_PROF_IPVK_FIRST 0 > +#define LLVM_INSTR_PROF_IPVK_LAST 1 > +#define LLVM_INSTR_PROF_MAX_NUM_VAL_PER_SITE 255 > + > +#define LLVM_VARIANT_MASK_IR_PROF (0x1ULL << 56) > +#define LLVM_VARIANT_MASK_CSIR_PROF (0x1ULL << 57) > + > +/** > + * struct llvm_prf_header - represents the raw profile header data structure. > + * @magic: the magic token for the file format. > + * @version: the version of the file format. > + * @data_size: the number of entries in the profile data section. > + * @padding_bytes_before_counters: the number of padding bytes before the > + * counters. > + * @counters_size: the size in bytes of the LLVM profile section containing the > + * counters. > + * @padding_bytes_after_counters: the number of padding bytes after the > + * counters. > + * @names_size: the size in bytes of the LLVM profile section containing the > + * counters' names. > + * @counters_delta: the beginning of the LLMV profile counters section. > + * @names_delta: the beginning of the LLMV profile names section. > + * @value_kind_last: the last profile value kind. 
> + */ > +struct llvm_prf_header { > + u64 magic; > + u64 version; > + u64 data_size; > + u64 padding_bytes_before_counters; > + u64 counters_size; > + u64 padding_bytes_after_counters; > + u64 names_size; > + u64 counters_delta; > + u64 names_delta; > + u64 value_kind_last; > +}; > + > +/** > + * struct llvm_prf_data - represents the per-function control structure. > + * @name_ref: the reference to the function's name. > + * @func_hash: the hash value of the function. > + * @counter_ptr: a pointer to the profile counter. > + * @function_ptr: a pointer to the function. > + * @values: the profiling values associated with this function. > + * @num_counters: the number of counters in the function. > + * @num_value_sites: the number of value profile sites. > + */ > +struct llvm_prf_data { > + const u64 name_ref; > + const u64 func_hash; > + const void *counter_ptr; > + const void *function_ptr; > + void *values; > + const u32 num_counters; > + const u16 num_value_sites[LLVM_INSTR_PROF_IPVK_LAST + 1]; > +} __aligned(LLVM_INSTR_PROF_DATA_ALIGNMENT); > + > +/** > + * structure llvm_prf_value_node_data - represents the data part of the struct > + * llvm_prf_value_node data structure. > + * @value: the value counters. > + * @count: the counters' count. > + */ > +struct llvm_prf_value_node_data { > + u64 value; > + u64 count; > +}; > + > +/** > + * struct llvm_prf_value_node - represents an internal data structure used by > + * the value profiler. > + * @value: the value counters. > + * @count: the counters' count. > + * @next: the next value node. > + */ > +struct llvm_prf_value_node { > + u64 value; > + u64 count; > + struct llvm_prf_value_node *next; > +}; > + > +/** > + * struct llvm_prf_value_data - represents the value profiling data in indexed > + * format. > + * @total_size: the total size in bytes including this field. > + * @num_value_kinds: the number of value profile kinds that has value profile > + * data. > + */ > +struct llvm_prf_value_data { > + u32 total_size; > + u32 num_value_kinds; > +}; > + > +/** > + * struct llvm_prf_value_record - represents the on-disk layout of the value > + * profile data of a particular kind for one function. > + * @kind: the kind of the value profile record. > + * @num_value_sites: the number of value profile sites. > + * @site_count_array: the first element of the array that stores the number > + * of profiled values for each value site. 
> + */ > +struct llvm_prf_value_record { > + u32 kind; > + u32 num_value_sites; > + u8 site_count_array[]; > +}; > + > +#define prf_get_value_record_header_size() \ > + offsetof(struct llvm_prf_value_record, site_count_array) > +#define prf_get_value_record_site_count_size(sites) \ > + roundup((sites), 8) > +#define prf_get_value_record_size(sites) \ > + (prf_get_value_record_header_size() + \ > + prf_get_value_record_site_count_size((sites))) > + > +/* Data sections */ > +extern struct llvm_prf_data __llvm_prf_data_start[]; > +extern struct llvm_prf_data __llvm_prf_data_end[]; > + > +extern u64 __llvm_prf_cnts_start[]; > +extern u64 __llvm_prf_cnts_end[]; > + > +extern char __llvm_prf_names_start[]; > +extern char __llvm_prf_names_end[]; > + > +extern struct llvm_prf_value_node __llvm_prf_vnds_start[]; > +extern struct llvm_prf_value_node __llvm_prf_vnds_end[]; > + > +/* Locking for vnodes */ > +extern unsigned long prf_lock(void); > +extern void prf_unlock(unsigned long flags); > + > +#define __DEFINE_PRF_SIZE(s) \ > + static inline unsigned long prf_ ## s ## _size(void) \ > + { \ > + unsigned long start = \ > + (unsigned long)__llvm_prf_ ## s ## _start; \ > + unsigned long end = \ > + (unsigned long)__llvm_prf_ ## s ## _end; \ > + return roundup(end - start, \ > + sizeof(__llvm_prf_ ## s ## _start[0])); \ > + } \ > + static inline unsigned long prf_ ## s ## _count(void) \ > + { \ > + return prf_ ## s ## _size() / \ > + sizeof(__llvm_prf_ ## s ## _start[0]); \ > + } > + > +__DEFINE_PRF_SIZE(data); > +__DEFINE_PRF_SIZE(cnts); > +__DEFINE_PRF_SIZE(names); > +__DEFINE_PRF_SIZE(vnds); > + > +#undef __DEFINE_PRF_SIZE > + > +#endif /* _PGO_H */ > diff --git a/scripts/Makefile.lib b/scripts/Makefile.lib > index eee59184de64..48a65d092c5b 100644 > --- a/scripts/Makefile.lib > +++ b/scripts/Makefile.lib > @@ -139,6 +139,16 @@ _c_flags += $(if $(patsubst n%,, \ > $(CFLAGS_GCOV)) > endif > > +# > +# Enable clang's PGO profiling flags for a file or directory depending on > +# variables PGO_PROFILE_obj.o and PGO_PROFILE. > +# > +ifeq ($(CONFIG_PGO_CLANG),y) > +_c_flags += $(if $(patsubst n%,, \ > + $(PGO_PROFILE_$(basetarget).o)$(PGO_PROFILE)y), \ > + $(CFLAGS_PGO_CLANG)) > +endif > + > # > # Enable address sanitizer flags for kernel except some files or directories > # we don't want to check (depends on variables KASAN_SANITIZE_obj.o, KASAN_SANITIZE) > -- > 2.30.1.766.gb4fecdf3b7-goog >
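The commit message and Documentation/dev-tools/pgo.rst above describe the same cycle step by step; strung together, it looks like the sketch below. Only the debugfs paths, the llvm-profdata invocation, and the make command come from the patch itself; the load-test command name is a placeholder.

```sh
#!/bin/sh
# End-to-end cycle for an instrumented kernel, assuming CONFIG_PGO_CLANG=y,
# debugfs mounted at /sys/kernel/debug, and clang/llvm-profdata >= 12.
set -e

# Zero the counters so the profile reflects only the workload below.
echo 1 > /sys/kernel/debug/pgo/reset

# Run a representative workload (placeholder command).
./run-load-tests.sh

# Snapshot the raw profile data exported by the kernel.
cp -a /sys/kernel/debug/pgo/profraw /tmp/vmlinux.profraw

# Convert the raw profile into the indexed form the compiler consumes.
llvm-profdata merge --output=vmlinux.profdata /tmp/vmlinux.profraw

# Rebuild the kernel with the profile applied (instrumentation disabled).
make LLVM=1 KCFLAGS=-fprofile-use=vmlinux.profdata
```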
Reviewed-by: Fangrui Song <maskray@google.com> Some minor items below: On 2021-02-26, 'Bill Wendling' via Clang Built Linux wrote: >From: Sami Tolvanen <samitolvanen@google.com> > >Enable the use of clang's Profile-Guided Optimization[1]. To generate a >profile, the kernel is instrumented with PGO counters, a representative >workload is run, and the raw profile data is collected from >/sys/kernel/debug/pgo/profraw. > >The raw profile data must be processed by clang's "llvm-profdata" tool >before it can be used during recompilation: > > $ cp /sys/kernel/debug/pgo/profraw vmlinux.profraw > $ llvm-profdata merge --output=vmlinux.profdata vmlinux.profraw > >Multiple raw profiles may be merged during this step. > >The data can now be used by the compiler: > > $ make LLVM=1 KCFLAGS=-fprofile-use=vmlinux.profdata ... > >This initial submission is restricted to x86, as that's the platform we >know works. This restriction can be lifted once other platforms have >been verified to work with PGO. > >Note that this method of profiling the kernel is clang-native, unlike >the clang support in kernel/gcov. > >[1] https://clang.llvm.org/docs/UsersManual.html#profile-guided-optimization > >Signed-off-by: Sami Tolvanen <samitolvanen@google.com> >Co-developed-by: Bill Wendling <morbo@google.com> >Signed-off-by: Bill Wendling <morbo@google.com> >--- >v8: - Rebased on top-of-tree. >v7: - Fix minor build failure reported by Sedat. >v6: - Add better documentation about the locking scheme and other things. > - Rename macros to better match the same macros in LLVM's source code. >v5: - Correct padding calculation, discovered by Nathan Chancellor. >v4: - Remove non-x86 Makfile changes and se "hweight64" instead of using our > own popcount implementation, based on Nick Desaulniers's comment. >v3: - Added change log section based on Sedat Dilek's comments. >v2: - Added "__llvm_profile_instrument_memop" based on Nathan Chancellor's > testing. > - Corrected documentation, re PGO flags when using LTO, based on Fangrui > Song's comments. >--- > Documentation/dev-tools/index.rst | 1 + > Documentation/dev-tools/pgo.rst | 127 +++++++++ > MAINTAINERS | 9 + > Makefile | 3 + > arch/Kconfig | 1 + > arch/x86/Kconfig | 1 + > arch/x86/boot/Makefile | 1 + > arch/x86/boot/compressed/Makefile | 1 + > arch/x86/crypto/Makefile | 4 + > arch/x86/entry/vdso/Makefile | 1 + > arch/x86/kernel/vmlinux.lds.S | 2 + > arch/x86/platform/efi/Makefile | 1 + > arch/x86/purgatory/Makefile | 1 + > arch/x86/realmode/rm/Makefile | 1 + > arch/x86/um/vdso/Makefile | 1 + > drivers/firmware/efi/libstub/Makefile | 1 + > include/asm-generic/vmlinux.lds.h | 44 +++ > kernel/Makefile | 1 + > kernel/pgo/Kconfig | 35 +++ > kernel/pgo/Makefile | 5 + > kernel/pgo/fs.c | 389 ++++++++++++++++++++++++++ > kernel/pgo/instrument.c | 189 +++++++++++++ > kernel/pgo/pgo.h | 203 ++++++++++++++ > scripts/Makefile.lib | 10 + > 24 files changed, 1032 insertions(+) > create mode 100644 Documentation/dev-tools/pgo.rst > create mode 100644 kernel/pgo/Kconfig > create mode 100644 kernel/pgo/Makefile > create mode 100644 kernel/pgo/fs.c > create mode 100644 kernel/pgo/instrument.c > create mode 100644 kernel/pgo/pgo.h > >diff --git a/Documentation/dev-tools/index.rst b/Documentation/dev-tools/index.rst >index f7809c7b1ba9..8d6418e85806 100644 >--- a/Documentation/dev-tools/index.rst >+++ b/Documentation/dev-tools/index.rst >@@ -26,6 +26,7 @@ whole; patches welcome! > kgdb > kselftest > kunit/index >+ pgo > > > .. 
only:: subproject and html >diff --git a/Documentation/dev-tools/pgo.rst b/Documentation/dev-tools/pgo.rst >new file mode 100644 >index 000000000000..b7f11d8405b7 >--- /dev/null >+++ b/Documentation/dev-tools/pgo.rst >@@ -0,0 +1,127 @@ >+.. SPDX-License-Identifier: GPL-2.0 >+ >+=============================== >+Using PGO with the Linux kernel >+=============================== >+ >+Clang's profiling kernel support (PGO_) enables profiling of the Linux kernel >+when building with Clang. The profiling data is exported via the ``pgo`` >+debugfs directory. >+ >+.. _PGO: https://clang.llvm.org/docs/UsersManual.html#profile-guided-optimization >+ >+ >+Preparation >+=========== >+ >+Configure the kernel with: >+ >+.. code-block:: make >+ >+ CONFIG_DEBUG_FS=y >+ CONFIG_PGO_CLANG=y >+ >+Note that kernels compiled with profiling flags will be significantly larger >+and run slower. >+ >+Profiling data will only become accessible once debugfs has been mounted: >+ >+.. code-block:: sh >+ >+ mount -t debugfs none /sys/kernel/debug >+ >+ >+Customization >+============= >+ >+You can enable or disable profiling for individual file and directories by >+adding a line similar to the following to the respective kernel Makefile: >+ >+- For a single file (e.g. main.o) >+ >+ .. code-block:: make >+ >+ PGO_PROFILE_main.o := y >+ >+- For all files in one directory >+ >+ .. code-block:: make >+ >+ PGO_PROFILE := y >+ >+To exclude files from being profiled use >+ >+ .. code-block:: make >+ >+ PGO_PROFILE_main.o := n >+ >+and >+ >+ .. code-block:: make >+ >+ PGO_PROFILE := n >+ >+Only files which are linked to the main kernel image or are compiled as kernel >+modules are supported by this mechanism. >+ >+ >+Files >+===== >+ >+The PGO kernel support creates the following files in debugfs: >+ >+``/sys/kernel/debug/pgo`` >+ Parent directory for all PGO-related files. >+ >+``/sys/kernel/debug/pgo/reset`` >+ Global reset file: resets all coverage data to zero when written to. >+ >+``/sys/kernel/debug/profraw`` >+ The raw PGO data that must be processed with ``llvm_profdata``. >+ >+ >+Workflow >+======== >+ >+The PGO kernel can be run on the host or test machines. The data though should >+be analyzed with Clang's tools from the same Clang version as the kernel was >+compiled. Clang's tolerant of version skew, but it's easier to use the same >+Clang version. >+ >+The profiling data is useful for optimizing the kernel, analyzing coverage, >+etc. Clang offers tools to perform these tasks. >+ >+Here is an example workflow for profiling an instrumented kernel with PGO and >+using the result to optimize the kernel: >+ >+1) Install the kernel on the TEST machine. >+ >+2) Reset the data counters right before running the load tests >+ >+ .. code-block:: sh >+ >+ $ echo 1 > /sys/kernel/debug/pgo/reset >+ >+3) Run the load tests. >+ >+4) Collect the raw profile data >+ >+ .. code-block:: sh >+ >+ $ cp -a /sys/kernel/debug/pgo/profraw /tmp/vmlinux.profraw >+ >+5) (Optional) Download the raw profile data to the HOST machine. >+ >+6) Process the raw profile data >+ >+ .. code-block:: sh >+ >+ $ llvm-profdata merge --output=vmlinux.profdata vmlinux.profraw >+ >+ Note that multiple raw profile data files can be merged during this step. >+ >+7) Rebuild the kernel using the profile data (PGO disabled) >+ >+ .. code-block:: sh >+ >+ $ make LLVM=1 KCFLAGS=-fprofile-use=vmlinux.profdata ... 
>diff --git a/MAINTAINERS b/MAINTAINERS >index c71664ca8bfd..3a6668792bc5 100644 >--- a/MAINTAINERS >+++ b/MAINTAINERS >@@ -14019,6 +14019,15 @@ S: Maintained > F: include/linux/personality.h > F: include/uapi/linux/personality.h > >+PGO BASED KERNEL PROFILING >+M: Sami Tolvanen <samitolvanen@google.com> >+M: Bill Wendling <wcw@google.com> >+R: Nathan Chancellor <natechancellor@gmail.com> >+R: Nick Desaulniers <ndesaulniers@google.com> >+S: Supported >+F: Documentation/dev-tools/pgo.rst >+F: kernel/pgo >+ > PHOENIX RC FLIGHT CONTROLLER ADAPTER > M: Marcus Folkesson <marcus.folkesson@gmail.com> > L: linux-input@vger.kernel.org >diff --git a/Makefile b/Makefile >index 6ecd0d22e608..b57d4d44c799 100644 >--- a/Makefile >+++ b/Makefile >@@ -657,6 +657,9 @@ endif # KBUILD_EXTMOD > # Defaults to vmlinux, but the arch makefile usually adds further targets > all: vmlinux > >+CFLAGS_PGO_CLANG := -fprofile-generate >+export CFLAGS_PGO_CLANG >+ > CFLAGS_GCOV := -fprofile-arcs -ftest-coverage \ > $(call cc-option,-fno-tree-loop-im) \ > $(call cc-disable-warning,maybe-uninitialized,) >diff --git a/arch/Kconfig b/arch/Kconfig >index 2bb30673d8e6..111e642a2af7 100644 >--- a/arch/Kconfig >+++ b/arch/Kconfig >@@ -1192,6 +1192,7 @@ config ARCH_HAS_ELFCORE_COMPAT > bool > > source "kernel/gcov/Kconfig" >+source "kernel/pgo/Kconfig" > > source "scripts/gcc-plugins/Kconfig" > >diff --git a/arch/x86/Kconfig b/arch/x86/Kconfig >index cd4b9b1204a8..c9808583b528 100644 >--- a/arch/x86/Kconfig >+++ b/arch/x86/Kconfig >@@ -99,6 +99,7 @@ config X86 > select ARCH_SUPPORTS_KMAP_LOCAL_FORCE_MAP if NR_CPUS <= 4096 > select ARCH_SUPPORTS_LTO_CLANG if X86_64 > select ARCH_SUPPORTS_LTO_CLANG_THIN if X86_64 >+ select ARCH_SUPPORTS_PGO_CLANG if X86_64 > select ARCH_USE_BUILTIN_BSWAP > select ARCH_USE_QUEUED_RWLOCKS > select ARCH_USE_QUEUED_SPINLOCKS >diff --git a/arch/x86/boot/Makefile b/arch/x86/boot/Makefile >index fe605205b4ce..383853e32f67 100644 >--- a/arch/x86/boot/Makefile >+++ b/arch/x86/boot/Makefile >@@ -71,6 +71,7 @@ KBUILD_AFLAGS := $(KBUILD_CFLAGS) -D__ASSEMBLY__ > KBUILD_CFLAGS += $(call cc-option,-fmacro-prefix-map=$(srctree)/=) > KBUILD_CFLAGS += -fno-asynchronous-unwind-tables > GCOV_PROFILE := n >+PGO_PROFILE := n > UBSAN_SANITIZE := n > > $(obj)/bzImage: asflags-y := $(SVGA_MODE) >diff --git a/arch/x86/boot/compressed/Makefile b/arch/x86/boot/compressed/Makefile >index e0bc3988c3fa..ed12ab65f606 100644 >--- a/arch/x86/boot/compressed/Makefile >+++ b/arch/x86/boot/compressed/Makefile >@@ -54,6 +54,7 @@ CFLAGS_sev-es.o += -I$(objtree)/arch/x86/lib/ > > KBUILD_AFLAGS := $(KBUILD_CFLAGS) -D__ASSEMBLY__ > GCOV_PROFILE := n >+PGO_PROFILE := n > UBSAN_SANITIZE :=n > > KBUILD_LDFLAGS := -m elf_$(UTS_MACHINE) >diff --git a/arch/x86/crypto/Makefile b/arch/x86/crypto/Makefile >index b28e36b7c96b..4b2e9620c412 100644 >--- a/arch/x86/crypto/Makefile >+++ b/arch/x86/crypto/Makefile >@@ -4,6 +4,10 @@ > > OBJECT_FILES_NON_STANDARD := y > >+# Disable PGO for curve25519-x86_64. With PGO enabled, clang runs out of >+# registers for some of the functions. 
>+PGO_PROFILE_curve25519-x86_64.o := n >+ > obj-$(CONFIG_CRYPTO_TWOFISH_586) += twofish-i586.o > twofish-i586-y := twofish-i586-asm_32.o twofish_glue.o > obj-$(CONFIG_CRYPTO_TWOFISH_X86_64) += twofish-x86_64.o >diff --git a/arch/x86/entry/vdso/Makefile b/arch/x86/entry/vdso/Makefile >index 05c4abc2fdfd..f7421e44725a 100644 >--- a/arch/x86/entry/vdso/Makefile >+++ b/arch/x86/entry/vdso/Makefile >@@ -180,6 +180,7 @@ quiet_cmd_vdso = VDSO $@ > VDSO_LDFLAGS = -shared --hash-style=both --build-id=sha1 \ > $(call ld-option, --eh-frame-hdr) -Bsymbolic > GCOV_PROFILE := n >+PGO_PROFILE := n > > quiet_cmd_vdso_and_check = VDSO $@ > cmd_vdso_and_check = $(cmd_vdso); $(cmd_vdso_check) >diff --git a/arch/x86/kernel/vmlinux.lds.S b/arch/x86/kernel/vmlinux.lds.S >index efd9e9ea17f2..f6cab2316c46 100644 >--- a/arch/x86/kernel/vmlinux.lds.S >+++ b/arch/x86/kernel/vmlinux.lds.S >@@ -184,6 +184,8 @@ SECTIONS > > BUG_TABLE > >+ PGO_CLANG_DATA >+ > ORC_UNWIND_TABLE > > . = ALIGN(PAGE_SIZE); >diff --git a/arch/x86/platform/efi/Makefile b/arch/x86/platform/efi/Makefile >index 84b09c230cbd..5f22b31446ad 100644 >--- a/arch/x86/platform/efi/Makefile >+++ b/arch/x86/platform/efi/Makefile >@@ -2,6 +2,7 @@ > OBJECT_FILES_NON_STANDARD_efi_thunk_$(BITS).o := y > KASAN_SANITIZE := n > GCOV_PROFILE := n >+PGO_PROFILE := n > > obj-$(CONFIG_EFI) += quirks.o efi.o efi_$(BITS).o efi_stub_$(BITS).o > obj-$(CONFIG_EFI_MIXED) += efi_thunk_$(BITS).o >diff --git a/arch/x86/purgatory/Makefile b/arch/x86/purgatory/Makefile >index 95ea17a9d20c..36f20e99da0b 100644 >--- a/arch/x86/purgatory/Makefile >+++ b/arch/x86/purgatory/Makefile >@@ -23,6 +23,7 @@ targets += purgatory.ro purgatory.chk > > # Sanitizer, etc. runtimes are unavailable and cannot be linked here. > GCOV_PROFILE := n >+PGO_PROFILE := n > KASAN_SANITIZE := n > UBSAN_SANITIZE := n > KCSAN_SANITIZE := n >diff --git a/arch/x86/realmode/rm/Makefile b/arch/x86/realmode/rm/Makefile >index 83f1b6a56449..21797192f958 100644 >--- a/arch/x86/realmode/rm/Makefile >+++ b/arch/x86/realmode/rm/Makefile >@@ -76,4 +76,5 @@ KBUILD_CFLAGS := $(REALMODE_CFLAGS) -D_SETUP -D_WAKEUP \ > KBUILD_AFLAGS := $(KBUILD_CFLAGS) -D__ASSEMBLY__ > KBUILD_CFLAGS += -fno-asynchronous-unwind-tables > GCOV_PROFILE := n >+PGO_PROFILE := n > UBSAN_SANITIZE := n >diff --git a/arch/x86/um/vdso/Makefile b/arch/x86/um/vdso/Makefile >index 5943387e3f35..54f5768f5853 100644 >--- a/arch/x86/um/vdso/Makefile >+++ b/arch/x86/um/vdso/Makefile >@@ -64,6 +64,7 @@ quiet_cmd_vdso = VDSO $@ > > VDSO_LDFLAGS = -fPIC -shared -Wl,--hash-style=sysv > GCOV_PROFILE := n >+PGO_PROFILE := n > > # > # Install the unstripped copy of vdso*.so listed in $(vdso-install-y). >diff --git a/drivers/firmware/efi/libstub/Makefile b/drivers/firmware/efi/libstub/Makefile >index c23466e05e60..724fb389bb9d 100644 >--- a/drivers/firmware/efi/libstub/Makefile >+++ b/drivers/firmware/efi/libstub/Makefile >@@ -42,6 +42,7 @@ KBUILD_CFLAGS := $(filter-out $(CC_FLAGS_SCS), $(KBUILD_CFLAGS)) > KBUILD_CFLAGS := $(filter-out $(CC_FLAGS_LTO), $(KBUILD_CFLAGS)) > > GCOV_PROFILE := n >+PGO_PROFILE := n > # Sanitizer runtimes are unavailable and cannot be linked here. 
> KASAN_SANITIZE := n > KCSAN_SANITIZE := n >diff --git a/include/asm-generic/vmlinux.lds.h b/include/asm-generic/vmlinux.lds.h >index 6786f8c0182f..4a0c21b840b3 100644 >--- a/include/asm-generic/vmlinux.lds.h >+++ b/include/asm-generic/vmlinux.lds.h >@@ -329,6 +329,49 @@ > #define DTPM_TABLE() > #endif > >+#ifdef CONFIG_PGO_CLANG >+#define PGO_CLANG_DATA \ >+ __llvm_prf_data : AT(ADDR(__llvm_prf_data) - LOAD_OFFSET) { \ >+ . = ALIGN(8); \ >+ __llvm_prf_start = .; \ >+ __llvm_prf_data_start = .; \ >+ KEEP(*(__llvm_prf_data)) \ >+ . = ALIGN(8); \ >+ __llvm_prf_data_end = .; \ >+ } \ Some minor items on linker script usage. The end of a metadata section usually does not need alignment. Does the . = ALIGN(8) have significance? Ditto below. This is an item about LD_DEAD_CODE_DATA_ELIMINATION. Feel free to postpone after this patch is in tree: KEEP(*(__llvm_prf_data)) KEEP should be dropped. I have been involved in improving GC (my recent interests on such metadata sections :) https://maskray.me/blog/2021-01-31-metadata-sections-comdat-and-shf-link-order) With LLVM>=13 (https://reviews.llvm.org/D96757), __llvm_prf_* associated to non-COMDAT text sections can be GCed as well. KEEP would unnecessarily retain them under LD_DEAD_CODE_DATA_ELIMINATION. For older releases (at least 10<=LLVM<13), such __llvm_prf_* sections are not in zero flag section groups so they usually cannot be discarded. So perhaps with KEEP or without KEEP, you won't find many size differences. >+ __llvm_prf_cnts : AT(ADDR(__llvm_prf_cnts) - LOAD_OFFSET) { \ >+ . = ALIGN(8); \ >+ __llvm_prf_cnts_start = .; \ >+ KEEP(*(__llvm_prf_cnts)) \ >+ . = ALIGN(8); \ >+ __llvm_prf_cnts_end = .; \ >+ } \ >+ __llvm_prf_names : AT(ADDR(__llvm_prf_names) - LOAD_OFFSET) { \ >+ . = ALIGN(8); \ >+ __llvm_prf_names_start = .; \ >+ KEEP(*(__llvm_prf_names)) \ >+ . = ALIGN(8); \ >+ __llvm_prf_names_end = .; \ >+ . = ALIGN(8); \ >+ } \ __llvm_prf_names does not need alignment. It is often 1 in userspace programs. >+ __llvm_prf_vals : AT(ADDR(__llvm_prf_vals) - LOAD_OFFSET) { \ >+ __llvm_prf_vals_start = .; \ >+ KEEP(*(__llvm_prf_vals)) \ >+ . = ALIGN(8); \ >+ __llvm_prf_vals_end = .; \ >+ . = ALIGN(8); \ >+ } \ >+ __llvm_prf_vnds : AT(ADDR(__llvm_prf_vnds) - LOAD_OFFSET) { \ >+ __llvm_prf_vnds_start = .; \ >+ KEEP(*(__llvm_prf_vnds)) \ >+ . = ALIGN(8); \ >+ __llvm_prf_vnds_end = .; \ >+ __llvm_prf_end = .; \ >+ } In userspace PGO instrumentation, the start is often aligned by 16. The end does not need alignment. >+#else >+#define PGO_CLANG_DATA >+#endif >+ > #define KERNEL_DTB() \ > STRUCT_ALIGN(); \ > __dtb_start = .; \ >@@ -1105,6 +1148,7 @@ > CONSTRUCTORS \ > } \ > BUG_TABLE \ >+ PGO_CLANG_DATA > > #define INIT_TEXT_SECTION(inittext_align) \ > . 
= ALIGN(inittext_align); \ >diff --git a/kernel/Makefile b/kernel/Makefile >index 320f1f3941b7..a2a23ef2b12f 100644 >--- a/kernel/Makefile >+++ b/kernel/Makefile >@@ -111,6 +111,7 @@ obj-$(CONFIG_BPF) += bpf/ > obj-$(CONFIG_KCSAN) += kcsan/ > obj-$(CONFIG_SHADOW_CALL_STACK) += scs.o > obj-$(CONFIG_HAVE_STATIC_CALL_INLINE) += static_call.o >+obj-$(CONFIG_PGO_CLANG) += pgo/ > > obj-$(CONFIG_PERF_EVENTS) += events/ > >diff --git a/kernel/pgo/Kconfig b/kernel/pgo/Kconfig >new file mode 100644 >index 000000000000..76a640b6cf6e >--- /dev/null >+++ b/kernel/pgo/Kconfig >@@ -0,0 +1,35 @@ >+# SPDX-License-Identifier: GPL-2.0-only >+menu "Profile Guided Optimization (PGO) (EXPERIMENTAL)" >+ >+config ARCH_SUPPORTS_PGO_CLANG >+ bool >+ >+config PGO_CLANG >+ bool "Enable clang's PGO-based kernel profiling" >+ depends on DEBUG_FS >+ depends on ARCH_SUPPORTS_PGO_CLANG >+ depends on CC_IS_CLANG && CLANG_VERSION >= 120000 >+ help >+ This option enables clang's PGO (Profile Guided Optimization) based >+ code profiling to better optimize the kernel. >+ >+ If unsure, say N. >+ >+ Run a representative workload for your application on a kernel >+ compiled with this option and download the raw profile file from >+ /sys/kernel/debug/pgo/profraw. This file needs to be processed with >+ llvm-profdata. It may be merged with other collected raw profiles. >+ >+ Copy the resulting profile file into vmlinux.profdata, and enable >+ KCFLAGS=-fprofile-use=vmlinux.profdata to produce an optimized >+ kernel. >+ >+ Note that a kernel compiled with profiling flags will be >+ significantly larger and run slower. Also be sure to exclude files >+ from profiling which are not linked to the kernel image to prevent >+ linker errors. >+ >+ Note that the debugfs filesystem has to be mounted to access >+ profiling data. >+ >+endmenu >diff --git a/kernel/pgo/Makefile b/kernel/pgo/Makefile >new file mode 100644 >index 000000000000..41e27cefd9a4 >--- /dev/null >+++ b/kernel/pgo/Makefile >@@ -0,0 +1,5 @@ >+# SPDX-License-Identifier: GPL-2.0 >+GCOV_PROFILE := n >+PGO_PROFILE := n >+ >+obj-y += fs.o instrument.o >diff --git a/kernel/pgo/fs.c b/kernel/pgo/fs.c >new file mode 100644 >index 000000000000..1678df3b7d64 >--- /dev/null >+++ b/kernel/pgo/fs.c >@@ -0,0 +1,389 @@ >+// SPDX-License-Identifier: GPL-2.0 >+/* >+ * Copyright (C) 2019 Google, Inc. >+ * >+ * Author: >+ * Sami Tolvanen <samitolvanen@google.com> >+ * >+ * This software is licensed under the terms of the GNU General Public >+ * License version 2, as published by the Free Software Foundation, and >+ * may be copied, distributed, and modified under those terms. >+ * >+ * This program is distributed in the hope that it will be useful, >+ * but WITHOUT ANY WARRANTY; without even the implied warranty of >+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the >+ * GNU General Public License for more details. 
>+ * >+ */ >+ >+#define pr_fmt(fmt) "pgo: " fmt >+ >+#include <linux/kernel.h> >+#include <linux/debugfs.h> >+#include <linux/fs.h> >+#include <linux/module.h> >+#include <linux/slab.h> >+#include <linux/vmalloc.h> >+#include "pgo.h" >+ >+static struct dentry *directory; >+ >+struct prf_private_data { >+ void *buffer; >+ unsigned long size; >+}; >+ >+/* >+ * Raw profile data format: >+ * >+ * - llvm_prf_header >+ * - __llvm_prf_data >+ * - __llvm_prf_cnts >+ * - __llvm_prf_names >+ * - zero padding to 8 bytes >+ * - for each llvm_prf_data in __llvm_prf_data: >+ * - llvm_prf_value_data >+ * - llvm_prf_value_record + site count array >+ * - llvm_prf_value_node_data >+ * ... >+ * ... >+ * ... >+ */ >+ >+static void prf_fill_header(void **buffer) >+{ >+ struct llvm_prf_header *header = *(struct llvm_prf_header **)buffer; >+ >+#ifdef CONFIG_64BIT >+ header->magic = LLVM_INSTR_PROF_RAW_MAGIC_64; >+#else >+ header->magic = LLVM_INSTR_PROF_RAW_MAGIC_32; >+#endif >+ header->version = LLVM_VARIANT_MASK_IR_PROF | LLVM_INSTR_PROF_RAW_VERSION; >+ header->data_size = prf_data_count(); >+ header->padding_bytes_before_counters = 0; >+ header->counters_size = prf_cnts_count(); >+ header->padding_bytes_after_counters = 0; >+ header->names_size = prf_names_count(); >+ header->counters_delta = (u64)__llvm_prf_cnts_start; >+ header->names_delta = (u64)__llvm_prf_names_start; >+ header->value_kind_last = LLVM_INSTR_PROF_IPVK_LAST; >+ >+ *buffer += sizeof(*header); >+} >+ >+/* >+ * Copy the source into the buffer, incrementing the pointer into buffer in the >+ * process. >+ */ >+static void prf_copy_to_buffer(void **buffer, void *src, unsigned long size) >+{ >+ memcpy(*buffer, src, size); >+ *buffer += size; >+} >+ >+static u32 __prf_get_value_size(struct llvm_prf_data *p, u32 *value_kinds) >+{ >+ struct llvm_prf_value_node **nodes = >+ (struct llvm_prf_value_node **)p->values; >+ u32 kinds = 0; >+ u32 size = 0; >+ unsigned int kind; >+ unsigned int n; >+ unsigned int s = 0; >+ >+ for (kind = 0; kind < ARRAY_SIZE(p->num_value_sites); kind++) { >+ unsigned int sites = p->num_value_sites[kind]; >+ >+ if (!sites) >+ continue; >+ >+ /* Record + site count array */ >+ size += prf_get_value_record_size(sites); >+ kinds++; >+ >+ if (!nodes) >+ continue; >+ >+ for (n = 0; n < sites; n++) { >+ u32 count = 0; >+ struct llvm_prf_value_node *site = nodes[s + n]; >+ >+ while (site && ++count <= U8_MAX) >+ site = site->next; >+ >+ size += count * >+ sizeof(struct llvm_prf_value_node_data); >+ } >+ >+ s += sites; >+ } >+ >+ if (size) >+ size += sizeof(struct llvm_prf_value_data); >+ >+ if (value_kinds) >+ *value_kinds = kinds; >+ >+ return size; >+} >+ >+static u32 prf_get_value_size(void) >+{ >+ u32 size = 0; >+ struct llvm_prf_data *p; >+ >+ for (p = __llvm_prf_data_start; p < __llvm_prf_data_end; p++) >+ size += __prf_get_value_size(p, NULL); >+ >+ return size; >+} >+ >+/* Serialize the profiling's value. */ >+static void prf_serialize_value(struct llvm_prf_data *p, void **buffer) >+{ >+ struct llvm_prf_value_data header; >+ struct llvm_prf_value_node **nodes = >+ (struct llvm_prf_value_node **)p->values; >+ unsigned int kind; >+ unsigned int n; >+ unsigned int s = 0; >+ >+ header.total_size = __prf_get_value_size(p, &header.num_value_kinds); >+ >+ if (!header.num_value_kinds) >+ /* Nothing to write. 
*/ >+ return; >+ >+ prf_copy_to_buffer(buffer, &header, sizeof(header)); >+ >+ for (kind = 0; kind < ARRAY_SIZE(p->num_value_sites); kind++) { >+ struct llvm_prf_value_record *record; >+ u8 *counts; >+ unsigned int sites = p->num_value_sites[kind]; >+ >+ if (!sites) >+ continue; >+ >+ /* Profiling value record. */ >+ record = *(struct llvm_prf_value_record **)buffer; >+ *buffer += prf_get_value_record_header_size(); >+ >+ record->kind = kind; >+ record->num_value_sites = sites; >+ >+ /* Site count array. */ >+ counts = *(u8 **)buffer; >+ *buffer += prf_get_value_record_site_count_size(sites); >+ >+ /* >+ * If we don't have nodes, we can skip updating the site count >+ * array, because the buffer is zero filled. >+ */ >+ if (!nodes) >+ continue; >+ >+ for (n = 0; n < sites; n++) { >+ u32 count = 0; >+ struct llvm_prf_value_node *site = nodes[s + n]; >+ >+ while (site && ++count <= U8_MAX) { >+ prf_copy_to_buffer(buffer, site, >+ sizeof(struct llvm_prf_value_node_data)); >+ site = site->next; >+ } >+ >+ counts[n] = (u8)count; >+ } >+ >+ s += sites; >+ } >+} >+ >+static void prf_serialize_values(void **buffer) >+{ >+ struct llvm_prf_data *p; >+ >+ for (p = __llvm_prf_data_start; p < __llvm_prf_data_end; p++) >+ prf_serialize_value(p, buffer); >+} >+ >+static inline unsigned long prf_get_padding(unsigned long size) >+{ >+ return 7 & (sizeof(u64) - size % sizeof(u64)); >+} >+ >+static unsigned long prf_buffer_size(void) >+{ >+ return sizeof(struct llvm_prf_header) + >+ prf_data_size() + >+ prf_cnts_size() + >+ prf_names_size() + >+ prf_get_padding(prf_names_size()) + >+ prf_get_value_size(); >+} >+ >+/* >+ * Serialize the profiling data into a format LLVM's tools can understand. >+ * Note: caller *must* hold pgo_lock. >+ */ >+static int prf_serialize(struct prf_private_data *p) >+{ >+ int err = 0; >+ void *buffer; >+ >+ p->size = prf_buffer_size(); >+ p->buffer = vzalloc(p->size); >+ >+ if (!p->buffer) { >+ err = -ENOMEM; >+ goto out; >+ } >+ >+ buffer = p->buffer; >+ >+ prf_fill_header(&buffer); >+ prf_copy_to_buffer(&buffer, __llvm_prf_data_start, prf_data_size()); >+ prf_copy_to_buffer(&buffer, __llvm_prf_cnts_start, prf_cnts_size()); >+ prf_copy_to_buffer(&buffer, __llvm_prf_names_start, prf_names_size()); >+ buffer += prf_get_padding(prf_names_size()); >+ >+ prf_serialize_values(&buffer); >+ >+out: >+ return err; >+} >+ >+/* open() implementation for PGO. Creates a copy of the profiling data set. */ >+static int prf_open(struct inode *inode, struct file *file) >+{ >+ struct prf_private_data *data; >+ unsigned long flags; >+ int err; >+ >+ data = kzalloc(sizeof(*data), GFP_KERNEL); >+ if (!data) { >+ err = -ENOMEM; >+ goto out; >+ } >+ >+ flags = prf_lock(); >+ >+ err = prf_serialize(data); >+ if (unlikely(err)) { >+ kfree(data); >+ goto out_unlock; >+ } >+ >+ file->private_data = data; >+ >+out_unlock: >+ prf_unlock(flags); >+out: >+ return err; >+} >+ >+/* read() implementation for PGO. */ >+static ssize_t prf_read(struct file *file, char __user *buf, size_t count, >+ loff_t *ppos) >+{ >+ struct prf_private_data *data = file->private_data; >+ >+ BUG_ON(!data); >+ >+ return simple_read_from_buffer(buf, count, ppos, data->buffer, >+ data->size); >+} >+ >+/* release() implementation for PGO. Release resources allocated by open(). 
*/ >+static int prf_release(struct inode *inode, struct file *file) >+{ >+ struct prf_private_data *data = file->private_data; >+ >+ if (data) { >+ vfree(data->buffer); >+ kfree(data); >+ } >+ >+ return 0; >+} >+ >+static const struct file_operations prf_fops = { >+ .owner = THIS_MODULE, >+ .open = prf_open, >+ .read = prf_read, >+ .llseek = default_llseek, >+ .release = prf_release >+}; >+ >+/* write() implementation for resetting PGO's profile data. */ >+static ssize_t reset_write(struct file *file, const char __user *addr, >+ size_t len, loff_t *pos) >+{ >+ struct llvm_prf_data *data; >+ >+ memset(__llvm_prf_cnts_start, 0, prf_cnts_size()); >+ >+ for (data = __llvm_prf_data_start; data < __llvm_prf_data_end; data++) { >+ struct llvm_prf_value_node **vnodes; >+ u64 current_vsite_count; >+ u32 i; >+ >+ if (!data->values) >+ continue; >+ >+ current_vsite_count = 0; >+ vnodes = (struct llvm_prf_value_node **)data->values; >+ >+ for (i = LLVM_INSTR_PROF_IPVK_FIRST; i <= LLVM_INSTR_PROF_IPVK_LAST; i++) >+ current_vsite_count += data->num_value_sites[i]; >+ >+ for (i = 0; i < current_vsite_count; i++) { >+ struct llvm_prf_value_node *current_vnode = vnodes[i]; >+ >+ while (current_vnode) { >+ current_vnode->count = 0; >+ current_vnode = current_vnode->next; >+ } >+ } >+ } >+ >+ return len; >+} >+ >+static const struct file_operations prf_reset_fops = { >+ .owner = THIS_MODULE, >+ .write = reset_write, >+ .llseek = noop_llseek, >+}; >+ >+/* Create debugfs entries. */ >+static int __init pgo_init(void) >+{ >+ directory = debugfs_create_dir("pgo", NULL); >+ if (!directory) >+ goto err_remove; >+ >+ if (!debugfs_create_file("profraw", 0600, directory, NULL, >+ &prf_fops)) >+ goto err_remove; >+ >+ if (!debugfs_create_file("reset", 0200, directory, NULL, >+ &prf_reset_fops)) >+ goto err_remove; >+ >+ return 0; >+ >+err_remove: >+ pr_err("initialization failed\n"); >+ return -EIO; >+} >+ >+/* Remove debugfs entries. */ >+static void __exit pgo_exit(void) >+{ >+ debugfs_remove_recursive(directory); >+} >+ >+module_init(pgo_init); >+module_exit(pgo_exit); >diff --git a/kernel/pgo/instrument.c b/kernel/pgo/instrument.c >new file mode 100644 >index 000000000000..62ff5cfce7b1 >--- /dev/null >+++ b/kernel/pgo/instrument.c >@@ -0,0 +1,189 @@ >+// SPDX-License-Identifier: GPL-2.0 >+/* >+ * Copyright (C) 2019 Google, Inc. >+ * >+ * Author: >+ * Sami Tolvanen <samitolvanen@google.com> >+ * >+ * This software is licensed under the terms of the GNU General Public >+ * License version 2, as published by the Free Software Foundation, and >+ * may be copied, distributed, and modified under those terms. >+ * >+ * This program is distributed in the hope that it will be useful, >+ * but WITHOUT ANY WARRANTY; without even the implied warranty of >+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the >+ * GNU General Public License for more details. >+ * >+ */ >+ >+#define pr_fmt(fmt) "pgo: " fmt >+ >+#include <linux/bitops.h> >+#include <linux/kernel.h> >+#include <linux/export.h> >+#include <linux/spinlock.h> >+#include <linux/types.h> >+#include "pgo.h" >+ >+/* >+ * This lock guards both profile count updating and serialization of the >+ * profiling data. Keeping both of these activities separate via locking >+ * ensures that we don't try to serialize data that's only partially updated. 
>+ */ >+static DEFINE_SPINLOCK(pgo_lock); >+static int current_node; >+ >+unsigned long prf_lock(void) >+{ >+ unsigned long flags; >+ >+ spin_lock_irqsave(&pgo_lock, flags); >+ >+ return flags; >+} >+ >+void prf_unlock(unsigned long flags) >+{ >+ spin_unlock_irqrestore(&pgo_lock, flags); >+} >+ >+/* >+ * Return a newly allocated profiling value node which contains the tracked >+ * value by the value profiler. >+ * Note: caller *must* hold pgo_lock. >+ */ >+static struct llvm_prf_value_node *allocate_node(struct llvm_prf_data *p, >+ u32 index, u64 value) >+{ >+ if (&__llvm_prf_vnds_start[current_node + 1] >= __llvm_prf_vnds_end) >+ return NULL; /* Out of nodes */ >+ >+ current_node++; >+ >+ /* Make sure the node is entirely within the section */ >+ if (&__llvm_prf_vnds_start[current_node] >= __llvm_prf_vnds_end || >+ &__llvm_prf_vnds_start[current_node + 1] > __llvm_prf_vnds_end) >+ return NULL; >+ >+ return &__llvm_prf_vnds_start[current_node]; >+} >+ >+/* >+ * Counts the number of times a target value is seen. >+ * >+ * Records the target value for the index if not seen before. Otherwise, >+ * increments the counter associated w/ the target value. >+ */ >+void __llvm_profile_instrument_target(u64 target_value, void *data, u32 index); >+void __llvm_profile_instrument_target(u64 target_value, void *data, u32 index) >+{ >+ struct llvm_prf_data *p = (struct llvm_prf_data *)data; >+ struct llvm_prf_value_node **counters; >+ struct llvm_prf_value_node *curr; >+ struct llvm_prf_value_node *min = NULL; >+ struct llvm_prf_value_node *prev = NULL; >+ u64 min_count = U64_MAX; >+ u8 values = 0; >+ unsigned long flags; >+ >+ if (!p || !p->values) >+ return; >+ >+ counters = (struct llvm_prf_value_node **)p->values; >+ curr = counters[index]; >+ >+ while (curr) { >+ if (target_value == curr->value) { >+ curr->count++; >+ return; >+ } >+ >+ if (curr->count < min_count) { >+ min_count = curr->count; >+ min = curr; >+ } >+ >+ prev = curr; >+ curr = curr->next; >+ values++; >+ } >+ >+ if (values >= LLVM_INSTR_PROF_MAX_NUM_VAL_PER_SITE) { >+ if (!min->count || !(--min->count)) { >+ curr = min; >+ curr->value = target_value; >+ curr->count++; >+ } >+ return; >+ } >+ >+ /* Lock when updating the value node structure. */ >+ flags = prf_lock(); >+ >+ curr = allocate_node(p, index, target_value); >+ if (!curr) >+ goto out; >+ >+ curr->value = target_value; >+ curr->count++; >+ >+ if (!counters[index]) >+ counters[index] = curr; >+ else if (prev && !prev->next) >+ prev->next = curr; >+ >+out: >+ prf_unlock(flags); >+} >+EXPORT_SYMBOL(__llvm_profile_instrument_target); >+ >+/* Counts the number of times a range of targets values are seen. */ >+void __llvm_profile_instrument_range(u64 target_value, void *data, >+ u32 index, s64 precise_start, >+ s64 precise_last, s64 large_value); >+void __llvm_profile_instrument_range(u64 target_value, void *data, >+ u32 index, s64 precise_start, >+ s64 precise_last, s64 large_value) >+{ >+ if (large_value != S64_MIN && (s64)target_value >= large_value) >+ target_value = large_value; >+ else if ((s64)target_value < precise_start || >+ (s64)target_value > precise_last) >+ target_value = precise_last + 1; >+ >+ __llvm_profile_instrument_target(target_value, data, index); >+} >+EXPORT_SYMBOL(__llvm_profile_instrument_range); >+ >+static u64 inst_prof_get_range_rep_value(u64 value) >+{ >+ if (value <= 8) >+ /* The first ranges are individually tracked, use it as is. */ >+ return value; >+ else if (value >= 513) >+ /* The last range is mapped to its lowest value. 
*/ >+ return 513; >+ else if (hweight64(value) == 1) >+ /* If it's a power of two, use it as is. */ >+ return value; >+ >+ /* Otherwise, take to the previous power of two + 1. */ >+ return (1 << (64 - __builtin_clzll(value) - 1)) + 1; >+} >+ >+/* >+ * The target values are partitioned into multiple ranges. The range spec is >+ * defined in compiler-rt/include/profile/InstrProfData.inc. >+ */ >+void __llvm_profile_instrument_memop(u64 target_value, void *data, >+ u32 counter_index); >+void __llvm_profile_instrument_memop(u64 target_value, void *data, >+ u32 counter_index) >+{ >+ u64 rep_value; >+ >+ /* Map the target value to the representative value of its range. */ >+ rep_value = inst_prof_get_range_rep_value(target_value); >+ __llvm_profile_instrument_target(rep_value, data, counter_index); >+} >+EXPORT_SYMBOL(__llvm_profile_instrument_memop); >diff --git a/kernel/pgo/pgo.h b/kernel/pgo/pgo.h >new file mode 100644 >index 000000000000..ddc8d3002fe5 >--- /dev/null >+++ b/kernel/pgo/pgo.h >@@ -0,0 +1,203 @@ >+/* SPDX-License-Identifier: GPL-2.0 */ >+/* >+ * Copyright (C) 2019 Google, Inc. >+ * >+ * Author: >+ * Sami Tolvanen <samitolvanen@google.com> >+ * >+ * This software is licensed under the terms of the GNU General Public >+ * License version 2, as published by the Free Software Foundation, and >+ * may be copied, distributed, and modified under those terms. >+ * >+ * This program is distributed in the hope that it will be useful, >+ * but WITHOUT ANY WARRANTY; without even the implied warranty of >+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the >+ * GNU General Public License for more details. >+ * >+ */ >+ >+#ifndef _PGO_H >+#define _PGO_H >+ >+/* >+ * Note: These internal LLVM definitions must match the compiler version. >+ * See llvm/include/llvm/ProfileData/InstrProfData.inc in LLVM's source code. >+ */ >+ >+#define LLVM_INSTR_PROF_RAW_MAGIC_64 \ >+ ((u64)255 << 56 | \ >+ (u64)'l' << 48 | \ >+ (u64)'p' << 40 | \ >+ (u64)'r' << 32 | \ >+ (u64)'o' << 24 | \ >+ (u64)'f' << 16 | \ >+ (u64)'r' << 8 | \ >+ (u64)129) >+#define LLVM_INSTR_PROF_RAW_MAGIC_32 \ >+ ((u64)255 << 56 | \ >+ (u64)'l' << 48 | \ >+ (u64)'p' << 40 | \ >+ (u64)'r' << 32 | \ >+ (u64)'o' << 24 | \ >+ (u64)'f' << 16 | \ >+ (u64)'R' << 8 | \ >+ (u64)129) >+ >+#define LLVM_INSTR_PROF_RAW_VERSION 5 >+#define LLVM_INSTR_PROF_DATA_ALIGNMENT 8 >+#define LLVM_INSTR_PROF_IPVK_FIRST 0 >+#define LLVM_INSTR_PROF_IPVK_LAST 1 >+#define LLVM_INSTR_PROF_MAX_NUM_VAL_PER_SITE 255 >+ >+#define LLVM_VARIANT_MASK_IR_PROF (0x1ULL << 56) >+#define LLVM_VARIANT_MASK_CSIR_PROF (0x1ULL << 57) >+ >+/** >+ * struct llvm_prf_header - represents the raw profile header data structure. >+ * @magic: the magic token for the file format. >+ * @version: the version of the file format. >+ * @data_size: the number of entries in the profile data section. >+ * @padding_bytes_before_counters: the number of padding bytes before the >+ * counters. >+ * @counters_size: the size in bytes of the LLVM profile section containing the >+ * counters. >+ * @padding_bytes_after_counters: the number of padding bytes after the >+ * counters. >+ * @names_size: the size in bytes of the LLVM profile section containing the >+ * counters' names. >+ * @counters_delta: the beginning of the LLMV profile counters section. >+ * @names_delta: the beginning of the LLMV profile names section. >+ * @value_kind_last: the last profile value kind. 
>+ */ >+struct llvm_prf_header { >+ u64 magic; >+ u64 version; >+ u64 data_size; >+ u64 padding_bytes_before_counters; >+ u64 counters_size; >+ u64 padding_bytes_after_counters; >+ u64 names_size; >+ u64 counters_delta; >+ u64 names_delta; >+ u64 value_kind_last; >+}; >+ >+/** >+ * struct llvm_prf_data - represents the per-function control structure. >+ * @name_ref: the reference to the function's name. >+ * @func_hash: the hash value of the function. >+ * @counter_ptr: a pointer to the profile counter. >+ * @function_ptr: a pointer to the function. >+ * @values: the profiling values associated with this function. >+ * @num_counters: the number of counters in the function. >+ * @num_value_sites: the number of value profile sites. >+ */ >+struct llvm_prf_data { >+ const u64 name_ref; >+ const u64 func_hash; >+ const void *counter_ptr; >+ const void *function_ptr; >+ void *values; >+ const u32 num_counters; >+ const u16 num_value_sites[LLVM_INSTR_PROF_IPVK_LAST + 1]; >+} __aligned(LLVM_INSTR_PROF_DATA_ALIGNMENT); >+ >+/** >+ * structure llvm_prf_value_node_data - represents the data part of the struct >+ * llvm_prf_value_node data structure. >+ * @value: the value counters. >+ * @count: the counters' count. >+ */ >+struct llvm_prf_value_node_data { >+ u64 value; >+ u64 count; >+}; >+ >+/** >+ * struct llvm_prf_value_node - represents an internal data structure used by >+ * the value profiler. >+ * @value: the value counters. >+ * @count: the counters' count. >+ * @next: the next value node. >+ */ >+struct llvm_prf_value_node { >+ u64 value; >+ u64 count; >+ struct llvm_prf_value_node *next; >+}; >+ >+/** >+ * struct llvm_prf_value_data - represents the value profiling data in indexed >+ * format. >+ * @total_size: the total size in bytes including this field. >+ * @num_value_kinds: the number of value profile kinds that has value profile >+ * data. >+ */ >+struct llvm_prf_value_data { >+ u32 total_size; >+ u32 num_value_kinds; >+}; >+ >+/** >+ * struct llvm_prf_value_record - represents the on-disk layout of the value >+ * profile data of a particular kind for one function. >+ * @kind: the kind of the value profile record. >+ * @num_value_sites: the number of value profile sites. >+ * @site_count_array: the first element of the array that stores the number >+ * of profiled values for each value site. 
>+ */ >+struct llvm_prf_value_record { >+ u32 kind; >+ u32 num_value_sites; >+ u8 site_count_array[]; >+}; >+ >+#define prf_get_value_record_header_size() \ >+ offsetof(struct llvm_prf_value_record, site_count_array) >+#define prf_get_value_record_site_count_size(sites) \ >+ roundup((sites), 8) >+#define prf_get_value_record_size(sites) \ >+ (prf_get_value_record_header_size() + \ >+ prf_get_value_record_site_count_size((sites))) >+ >+/* Data sections */ >+extern struct llvm_prf_data __llvm_prf_data_start[]; >+extern struct llvm_prf_data __llvm_prf_data_end[]; >+ >+extern u64 __llvm_prf_cnts_start[]; >+extern u64 __llvm_prf_cnts_end[]; >+ >+extern char __llvm_prf_names_start[]; >+extern char __llvm_prf_names_end[]; >+ >+extern struct llvm_prf_value_node __llvm_prf_vnds_start[]; >+extern struct llvm_prf_value_node __llvm_prf_vnds_end[]; >+ >+/* Locking for vnodes */ >+extern unsigned long prf_lock(void); >+extern void prf_unlock(unsigned long flags); >+ >+#define __DEFINE_PRF_SIZE(s) \ >+ static inline unsigned long prf_ ## s ## _size(void) \ >+ { \ >+ unsigned long start = \ >+ (unsigned long)__llvm_prf_ ## s ## _start; \ >+ unsigned long end = \ >+ (unsigned long)__llvm_prf_ ## s ## _end; \ >+ return roundup(end - start, \ >+ sizeof(__llvm_prf_ ## s ## _start[0])); \ >+ } \ >+ static inline unsigned long prf_ ## s ## _count(void) \ >+ { \ >+ return prf_ ## s ## _size() / \ >+ sizeof(__llvm_prf_ ## s ## _start[0]); \ >+ } >+ >+__DEFINE_PRF_SIZE(data); >+__DEFINE_PRF_SIZE(cnts); >+__DEFINE_PRF_SIZE(names); >+__DEFINE_PRF_SIZE(vnds); >+ >+#undef __DEFINE_PRF_SIZE >+ >+#endif /* _PGO_H */ >diff --git a/scripts/Makefile.lib b/scripts/Makefile.lib >index eee59184de64..48a65d092c5b 100644 >--- a/scripts/Makefile.lib >+++ b/scripts/Makefile.lib >@@ -139,6 +139,16 @@ _c_flags += $(if $(patsubst n%,, \ > $(CFLAGS_GCOV)) > endif > >+# >+# Enable clang's PGO profiling flags for a file or directory depending on >+# variables PGO_PROFILE_obj.o and PGO_PROFILE. >+# >+ifeq ($(CONFIG_PGO_CLANG),y) >+_c_flags += $(if $(patsubst n%,, \ >+ $(PGO_PROFILE_$(basetarget).o)$(PGO_PROFILE)y), \ >+ $(CFLAGS_PGO_CLANG)) >+endif >+ > # > # Enable address sanitizer flags for kernel except some files or directories > # we don't want to check (depends on variables KASAN_SANITIZE_obj.o, KASAN_SANITIZE) >-- >2.30.1.766.gb4fecdf3b7-goog > >-- >You received this message because you are subscribed to the Google Groups "Clang Built Linux" group. >To unsubscribe from this group and stop receiving emails from it, send an email to clang-built-linux+unsubscribe@googlegroups.com. >To view this discussion on the web visit https://groups.google.com/d/msgid/clang-built-linux/20210226222030.3718075-1-morbo%40google.com.
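As a quick way to check a collected profile before handing it to llvm-profdata, here is a minimal userspace sketch. It is not part of the patch: the magic constant mirrors LLVM_INSTR_PROF_RAW_MAGIC_64 from the quoted kernel/pgo/pgo.h, the version/flag interpretation follows LLVM_INSTR_PROF_RAW_VERSION and LLVM_VARIANT_MASK_IR_PROF, and the path is the debugfs file the patch documents. Build it with any hosted C compiler and run it against the copied profraw file.

/* Hypothetical standalone checker, not part of the patch: reads the raw
 * profile exported by the kernel and verifies the LLVM raw-profile magic
 * and version words described in the quoted kernel/pgo/pgo.h. */
#include <stdint.h>
#include <stdio.h>

/* Mirrors LLVM_INSTR_PROF_RAW_MAGIC_64. */
#define RAW_MAGIC_64 ((uint64_t)255 << 56 | (uint64_t)'l' << 48 | \
		      (uint64_t)'p' << 40 | (uint64_t)'r' << 32 | \
		      (uint64_t)'o' << 24 | (uint64_t)'f' << 16 | \
		      (uint64_t)'r' << 8  | (uint64_t)129)

int main(void)
{
	uint64_t hdr[2];	/* first two fields: magic, version */
	FILE *f = fopen("/sys/kernel/debug/pgo/profraw", "rb");

	if (!f) {
		perror("profraw");
		return 1;
	}
	if (fread(hdr, sizeof(hdr), 1, f) != 1) {
		fprintf(stderr, "short read\n");
		fclose(f);
		return 1;
	}
	fclose(f);

	if (hdr[0] != RAW_MAGIC_64) {
		fprintf(stderr, "unexpected magic 0x%llx\n",
			(unsigned long long)hdr[0]);
		return 1;
	}
	/* Low bits carry the raw-profile version (5 here); bit 56 is the
	 * IR-level instrumentation flag (LLVM_VARIANT_MASK_IR_PROF). */
	printf("raw profile version %llu, IR-level: %s\n",
	       (unsigned long long)(hdr[1] & 0xff),
	       (hdr[1] >> 56) & 1 ? "yes" : "no");
	return 0;
}

If the magic does not match, the kernel and the host llvm-profdata likely disagree on the raw profile layout, and the merge step described above will fail.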
On 2021-02-28, Fangrui Song wrote: >Reviewed-by: Fangrui Song <maskray@google.com> > >Some minor items below: > >On 2021-02-26, 'Bill Wendling' via Clang Built Linux wrote: >>From: Sami Tolvanen <samitolvanen@google.com> >> >>Enable the use of clang's Profile-Guided Optimization[1]. To generate a >>profile, the kernel is instrumented with PGO counters, a representative >>workload is run, and the raw profile data is collected from >>/sys/kernel/debug/pgo/profraw. >> >>The raw profile data must be processed by clang's "llvm-profdata" tool >>before it can be used during recompilation: >> >> $ cp /sys/kernel/debug/pgo/profraw vmlinux.profraw >> $ llvm-profdata merge --output=vmlinux.profdata vmlinux.profraw >> >>Multiple raw profiles may be merged during this step. >> >>The data can now be used by the compiler: >> >> $ make LLVM=1 KCFLAGS=-fprofile-use=vmlinux.profdata ... >> >>This initial submission is restricted to x86, as that's the platform we >>know works. This restriction can be lifted once other platforms have >>been verified to work with PGO. >> >>Note that this method of profiling the kernel is clang-native, unlike >>the clang support in kernel/gcov. >> >>[1] https://clang.llvm.org/docs/UsersManual.html#profile-guided-optimization >> >>Signed-off-by: Sami Tolvanen <samitolvanen@google.com> >>Co-developed-by: Bill Wendling <morbo@google.com> >>Signed-off-by: Bill Wendling <morbo@google.com> >>--- >>v8: - Rebased on top-of-tree. >>v7: - Fix minor build failure reported by Sedat. >>v6: - Add better documentation about the locking scheme and other things. >> - Rename macros to better match the same macros in LLVM's source code. >>v5: - Correct padding calculation, discovered by Nathan Chancellor. >>v4: - Remove non-x86 Makfile changes and se "hweight64" instead of using our >> own popcount implementation, based on Nick Desaulniers's comment. >>v3: - Added change log section based on Sedat Dilek's comments. >>v2: - Added "__llvm_profile_instrument_memop" based on Nathan Chancellor's >> testing. >> - Corrected documentation, re PGO flags when using LTO, based on Fangrui >> Song's comments. >>--- >>Documentation/dev-tools/index.rst | 1 + >>Documentation/dev-tools/pgo.rst | 127 +++++++++ >>MAINTAINERS | 9 + >>Makefile | 3 + >>arch/Kconfig | 1 + >>arch/x86/Kconfig | 1 + >>arch/x86/boot/Makefile | 1 + >>arch/x86/boot/compressed/Makefile | 1 + >>arch/x86/crypto/Makefile | 4 + >>arch/x86/entry/vdso/Makefile | 1 + >>arch/x86/kernel/vmlinux.lds.S | 2 + >>arch/x86/platform/efi/Makefile | 1 + >>arch/x86/purgatory/Makefile | 1 + >>arch/x86/realmode/rm/Makefile | 1 + >>arch/x86/um/vdso/Makefile | 1 + >>drivers/firmware/efi/libstub/Makefile | 1 + >>include/asm-generic/vmlinux.lds.h | 44 +++ >>kernel/Makefile | 1 + >>kernel/pgo/Kconfig | 35 +++ >>kernel/pgo/Makefile | 5 + >>kernel/pgo/fs.c | 389 ++++++++++++++++++++++++++ >>kernel/pgo/instrument.c | 189 +++++++++++++ >>kernel/pgo/pgo.h | 203 ++++++++++++++ >>scripts/Makefile.lib | 10 + >>24 files changed, 1032 insertions(+) >>create mode 100644 Documentation/dev-tools/pgo.rst >>create mode 100644 kernel/pgo/Kconfig >>create mode 100644 kernel/pgo/Makefile >>create mode 100644 kernel/pgo/fs.c >>create mode 100644 kernel/pgo/instrument.c >>create mode 100644 kernel/pgo/pgo.h >> >>diff --git a/Documentation/dev-tools/index.rst b/Documentation/dev-tools/index.rst >>index f7809c7b1ba9..8d6418e85806 100644 >>--- a/Documentation/dev-tools/index.rst >>+++ b/Documentation/dev-tools/index.rst >>@@ -26,6 +26,7 @@ whole; patches welcome! 
>> kgdb >> kselftest >> kunit/index >>+ pgo >> >> >>.. only:: subproject and html >>diff --git a/Documentation/dev-tools/pgo.rst b/Documentation/dev-tools/pgo.rst >>new file mode 100644 >>index 000000000000..b7f11d8405b7 >>--- /dev/null >>+++ b/Documentation/dev-tools/pgo.rst >>@@ -0,0 +1,127 @@ >>+.. SPDX-License-Identifier: GPL-2.0 >>+ >>+=============================== >>+Using PGO with the Linux kernel >>+=============================== >>+ >>+Clang's profiling kernel support (PGO_) enables profiling of the Linux kernel >>+when building with Clang. The profiling data is exported via the ``pgo`` >>+debugfs directory. >>+ >>+.. _PGO: https://clang.llvm.org/docs/UsersManual.html#profile-guided-optimization >>+ >>+ >>+Preparation >>+=========== >>+ >>+Configure the kernel with: >>+ >>+.. code-block:: make >>+ >>+ CONFIG_DEBUG_FS=y >>+ CONFIG_PGO_CLANG=y >>+ >>+Note that kernels compiled with profiling flags will be significantly larger >>+and run slower. >>+ >>+Profiling data will only become accessible once debugfs has been mounted: >>+ >>+.. code-block:: sh >>+ >>+ mount -t debugfs none /sys/kernel/debug >>+ >>+ >>+Customization >>+============= >>+ >>+You can enable or disable profiling for individual file and directories by >>+adding a line similar to the following to the respective kernel Makefile: >>+ >>+- For a single file (e.g. main.o) >>+ >>+ .. code-block:: make >>+ >>+ PGO_PROFILE_main.o := y >>+ >>+- For all files in one directory >>+ >>+ .. code-block:: make >>+ >>+ PGO_PROFILE := y >>+ >>+To exclude files from being profiled use >>+ >>+ .. code-block:: make >>+ >>+ PGO_PROFILE_main.o := n >>+ >>+and >>+ >>+ .. code-block:: make >>+ >>+ PGO_PROFILE := n >>+ >>+Only files which are linked to the main kernel image or are compiled as kernel >>+modules are supported by this mechanism. >>+ >>+ >>+Files >>+===== >>+ >>+The PGO kernel support creates the following files in debugfs: >>+ >>+``/sys/kernel/debug/pgo`` >>+ Parent directory for all PGO-related files. >>+ >>+``/sys/kernel/debug/pgo/reset`` >>+ Global reset file: resets all coverage data to zero when written to. >>+ >>+``/sys/kernel/debug/profraw`` >>+ The raw PGO data that must be processed with ``llvm_profdata``. >>+ >>+ >>+Workflow >>+======== >>+ >>+The PGO kernel can be run on the host or test machines. The data though should >>+be analyzed with Clang's tools from the same Clang version as the kernel was >>+compiled. Clang's tolerant of version skew, but it's easier to use the same >>+Clang version. >>+ >>+The profiling data is useful for optimizing the kernel, analyzing coverage, >>+etc. Clang offers tools to perform these tasks. >>+ >>+Here is an example workflow for profiling an instrumented kernel with PGO and >>+using the result to optimize the kernel: >>+ >>+1) Install the kernel on the TEST machine. >>+ >>+2) Reset the data counters right before running the load tests >>+ >>+ .. code-block:: sh >>+ >>+ $ echo 1 > /sys/kernel/debug/pgo/reset >>+ >>+3) Run the load tests. >>+ >>+4) Collect the raw profile data >>+ >>+ .. code-block:: sh >>+ >>+ $ cp -a /sys/kernel/debug/pgo/profraw /tmp/vmlinux.profraw >>+ >>+5) (Optional) Download the raw profile data to the HOST machine. >>+ >>+6) Process the raw profile data >>+ >>+ .. code-block:: sh >>+ >>+ $ llvm-profdata merge --output=vmlinux.profdata vmlinux.profraw >>+ >>+ Note that multiple raw profile data files can be merged during this step. >>+ >>+7) Rebuild the kernel using the profile data (PGO disabled) >>+ >>+ .. 
code-block:: sh >>+ >>+ $ make LLVM=1 KCFLAGS=-fprofile-use=vmlinux.profdata ... >>diff --git a/MAINTAINERS b/MAINTAINERS >>index c71664ca8bfd..3a6668792bc5 100644 >>--- a/MAINTAINERS >>+++ b/MAINTAINERS >>@@ -14019,6 +14019,15 @@ S: Maintained >>F: include/linux/personality.h >>F: include/uapi/linux/personality.h >> >>+PGO BASED KERNEL PROFILING >>+M: Sami Tolvanen <samitolvanen@google.com> >>+M: Bill Wendling <wcw@google.com> >>+R: Nathan Chancellor <natechancellor@gmail.com> >>+R: Nick Desaulniers <ndesaulniers@google.com> >>+S: Supported >>+F: Documentation/dev-tools/pgo.rst >>+F: kernel/pgo >>+ >>PHOENIX RC FLIGHT CONTROLLER ADAPTER >>M: Marcus Folkesson <marcus.folkesson@gmail.com> >>L: linux-input@vger.kernel.org >>diff --git a/Makefile b/Makefile >>index 6ecd0d22e608..b57d4d44c799 100644 >>--- a/Makefile >>+++ b/Makefile >>@@ -657,6 +657,9 @@ endif # KBUILD_EXTMOD >># Defaults to vmlinux, but the arch makefile usually adds further targets >>all: vmlinux >> >>+CFLAGS_PGO_CLANG := -fprofile-generate >>+export CFLAGS_PGO_CLANG >>+ >>CFLAGS_GCOV := -fprofile-arcs -ftest-coverage \ >> $(call cc-option,-fno-tree-loop-im) \ >> $(call cc-disable-warning,maybe-uninitialized,) >>diff --git a/arch/Kconfig b/arch/Kconfig >>index 2bb30673d8e6..111e642a2af7 100644 >>--- a/arch/Kconfig >>+++ b/arch/Kconfig >>@@ -1192,6 +1192,7 @@ config ARCH_HAS_ELFCORE_COMPAT >> bool >> >>source "kernel/gcov/Kconfig" >>+source "kernel/pgo/Kconfig" >> >>source "scripts/gcc-plugins/Kconfig" >> >>diff --git a/arch/x86/Kconfig b/arch/x86/Kconfig >>index cd4b9b1204a8..c9808583b528 100644 >>--- a/arch/x86/Kconfig >>+++ b/arch/x86/Kconfig >>@@ -99,6 +99,7 @@ config X86 >> select ARCH_SUPPORTS_KMAP_LOCAL_FORCE_MAP if NR_CPUS <= 4096 >> select ARCH_SUPPORTS_LTO_CLANG if X86_64 >> select ARCH_SUPPORTS_LTO_CLANG_THIN if X86_64 >>+ select ARCH_SUPPORTS_PGO_CLANG if X86_64 >> select ARCH_USE_BUILTIN_BSWAP >> select ARCH_USE_QUEUED_RWLOCKS >> select ARCH_USE_QUEUED_SPINLOCKS >>diff --git a/arch/x86/boot/Makefile b/arch/x86/boot/Makefile >>index fe605205b4ce..383853e32f67 100644 >>--- a/arch/x86/boot/Makefile >>+++ b/arch/x86/boot/Makefile >>@@ -71,6 +71,7 @@ KBUILD_AFLAGS := $(KBUILD_CFLAGS) -D__ASSEMBLY__ >>KBUILD_CFLAGS += $(call cc-option,-fmacro-prefix-map=$(srctree)/=) >>KBUILD_CFLAGS += -fno-asynchronous-unwind-tables >>GCOV_PROFILE := n >>+PGO_PROFILE := n >>UBSAN_SANITIZE := n >> >>$(obj)/bzImage: asflags-y := $(SVGA_MODE) >>diff --git a/arch/x86/boot/compressed/Makefile b/arch/x86/boot/compressed/Makefile >>index e0bc3988c3fa..ed12ab65f606 100644 >>--- a/arch/x86/boot/compressed/Makefile >>+++ b/arch/x86/boot/compressed/Makefile >>@@ -54,6 +54,7 @@ CFLAGS_sev-es.o += -I$(objtree)/arch/x86/lib/ >> >>KBUILD_AFLAGS := $(KBUILD_CFLAGS) -D__ASSEMBLY__ >>GCOV_PROFILE := n >>+PGO_PROFILE := n >>UBSAN_SANITIZE :=n >> >>KBUILD_LDFLAGS := -m elf_$(UTS_MACHINE) >>diff --git a/arch/x86/crypto/Makefile b/arch/x86/crypto/Makefile >>index b28e36b7c96b..4b2e9620c412 100644 >>--- a/arch/x86/crypto/Makefile >>+++ b/arch/x86/crypto/Makefile >>@@ -4,6 +4,10 @@ >> >>OBJECT_FILES_NON_STANDARD := y >> >>+# Disable PGO for curve25519-x86_64. With PGO enabled, clang runs out of >>+# registers for some of the functions. 
>>+PGO_PROFILE_curve25519-x86_64.o := n >>+ >>obj-$(CONFIG_CRYPTO_TWOFISH_586) += twofish-i586.o >>twofish-i586-y := twofish-i586-asm_32.o twofish_glue.o >>obj-$(CONFIG_CRYPTO_TWOFISH_X86_64) += twofish-x86_64.o >>diff --git a/arch/x86/entry/vdso/Makefile b/arch/x86/entry/vdso/Makefile >>index 05c4abc2fdfd..f7421e44725a 100644 >>--- a/arch/x86/entry/vdso/Makefile >>+++ b/arch/x86/entry/vdso/Makefile >>@@ -180,6 +180,7 @@ quiet_cmd_vdso = VDSO $@ >>VDSO_LDFLAGS = -shared --hash-style=both --build-id=sha1 \ >> $(call ld-option, --eh-frame-hdr) -Bsymbolic >>GCOV_PROFILE := n >>+PGO_PROFILE := n >> >>quiet_cmd_vdso_and_check = VDSO $@ >> cmd_vdso_and_check = $(cmd_vdso); $(cmd_vdso_check) >>diff --git a/arch/x86/kernel/vmlinux.lds.S b/arch/x86/kernel/vmlinux.lds.S >>index efd9e9ea17f2..f6cab2316c46 100644 >>--- a/arch/x86/kernel/vmlinux.lds.S >>+++ b/arch/x86/kernel/vmlinux.lds.S >>@@ -184,6 +184,8 @@ SECTIONS >> >> BUG_TABLE >> >>+ PGO_CLANG_DATA >>+ >> ORC_UNWIND_TABLE >> >> . = ALIGN(PAGE_SIZE); >>diff --git a/arch/x86/platform/efi/Makefile b/arch/x86/platform/efi/Makefile >>index 84b09c230cbd..5f22b31446ad 100644 >>--- a/arch/x86/platform/efi/Makefile >>+++ b/arch/x86/platform/efi/Makefile >>@@ -2,6 +2,7 @@ >>OBJECT_FILES_NON_STANDARD_efi_thunk_$(BITS).o := y >>KASAN_SANITIZE := n >>GCOV_PROFILE := n >>+PGO_PROFILE := n >> >>obj-$(CONFIG_EFI) += quirks.o efi.o efi_$(BITS).o efi_stub_$(BITS).o >>obj-$(CONFIG_EFI_MIXED) += efi_thunk_$(BITS).o >>diff --git a/arch/x86/purgatory/Makefile b/arch/x86/purgatory/Makefile >>index 95ea17a9d20c..36f20e99da0b 100644 >>--- a/arch/x86/purgatory/Makefile >>+++ b/arch/x86/purgatory/Makefile >>@@ -23,6 +23,7 @@ targets += purgatory.ro purgatory.chk >> >># Sanitizer, etc. runtimes are unavailable and cannot be linked here. >>GCOV_PROFILE := n >>+PGO_PROFILE := n >>KASAN_SANITIZE := n >>UBSAN_SANITIZE := n >>KCSAN_SANITIZE := n >>diff --git a/arch/x86/realmode/rm/Makefile b/arch/x86/realmode/rm/Makefile >>index 83f1b6a56449..21797192f958 100644 >>--- a/arch/x86/realmode/rm/Makefile >>+++ b/arch/x86/realmode/rm/Makefile >>@@ -76,4 +76,5 @@ KBUILD_CFLAGS := $(REALMODE_CFLAGS) -D_SETUP -D_WAKEUP \ >>KBUILD_AFLAGS := $(KBUILD_CFLAGS) -D__ASSEMBLY__ >>KBUILD_CFLAGS += -fno-asynchronous-unwind-tables >>GCOV_PROFILE := n >>+PGO_PROFILE := n >>UBSAN_SANITIZE := n >>diff --git a/arch/x86/um/vdso/Makefile b/arch/x86/um/vdso/Makefile >>index 5943387e3f35..54f5768f5853 100644 >>--- a/arch/x86/um/vdso/Makefile >>+++ b/arch/x86/um/vdso/Makefile >>@@ -64,6 +64,7 @@ quiet_cmd_vdso = VDSO $@ >> >>VDSO_LDFLAGS = -fPIC -shared -Wl,--hash-style=sysv >>GCOV_PROFILE := n >>+PGO_PROFILE := n >> >># >># Install the unstripped copy of vdso*.so listed in $(vdso-install-y). >>diff --git a/drivers/firmware/efi/libstub/Makefile b/drivers/firmware/efi/libstub/Makefile >>index c23466e05e60..724fb389bb9d 100644 >>--- a/drivers/firmware/efi/libstub/Makefile >>+++ b/drivers/firmware/efi/libstub/Makefile >>@@ -42,6 +42,7 @@ KBUILD_CFLAGS := $(filter-out $(CC_FLAGS_SCS), $(KBUILD_CFLAGS)) >>KBUILD_CFLAGS := $(filter-out $(CC_FLAGS_LTO), $(KBUILD_CFLAGS)) >> >>GCOV_PROFILE := n >>+PGO_PROFILE := n >># Sanitizer runtimes are unavailable and cannot be linked here. 
>>KASAN_SANITIZE := n >>KCSAN_SANITIZE := n >>diff --git a/include/asm-generic/vmlinux.lds.h b/include/asm-generic/vmlinux.lds.h >>index 6786f8c0182f..4a0c21b840b3 100644 >>--- a/include/asm-generic/vmlinux.lds.h >>+++ b/include/asm-generic/vmlinux.lds.h >>@@ -329,6 +329,49 @@ >>#define DTPM_TABLE() >>#endif >> >>+#ifdef CONFIG_PGO_CLANG >>+#define PGO_CLANG_DATA \ >>+ __llvm_prf_data : AT(ADDR(__llvm_prf_data) - LOAD_OFFSET) { \ >>+ . = ALIGN(8); \ >>+ __llvm_prf_start = .; \ >>+ __llvm_prf_data_start = .; \ >>+ KEEP(*(__llvm_prf_data)) \ >>+ . = ALIGN(8); \ >>+ __llvm_prf_data_end = .; \ >>+ } \ > >Some minor items on linker script usage. The end of a metadata section >usually does not need alignment. Does the . = ALIGN(8) have >significance? Ditto below. > > > >This is an item about LD_DEAD_CODE_DATA_ELIMINATION. Feel free to >postpone after this patch is in tree: > > KEEP(*(__llvm_prf_data)) > >KEEP should be dropped. > >I have been involved in improving GC (my recent interests on such >metadata sections :) >https://maskray.me/blog/2021-01-31-metadata-sections-comdat-and-shf-link-order) > >With LLVM>=13 (https://reviews.llvm.org/D96757), __llvm_prf_* associated >to non-COMDAT text sections can be GCed as well. KEEP would >unnecessarily retain them under LD_DEAD_CODE_DATA_ELIMINATION. > >For older releases (at least 10<=LLVM<13), such __llvm_prf_* sections >are not in zero flag section groups so they usually cannot be discarded. >So perhaps with KEEP or without KEEP, you won't find many size >differences. > >>+ __llvm_prf_cnts : AT(ADDR(__llvm_prf_cnts) - LOAD_OFFSET) { \ >>+ . = ALIGN(8); \ >>+ __llvm_prf_cnts_start = .; \ >>+ KEEP(*(__llvm_prf_cnts)) \ >>+ . = ALIGN(8); \ >>+ __llvm_prf_cnts_end = .; \ >>+ } \ >>+ __llvm_prf_names : AT(ADDR(__llvm_prf_names) - LOAD_OFFSET) { \ >>+ . = ALIGN(8); \ >>+ __llvm_prf_names_start = .; \ >>+ KEEP(*(__llvm_prf_names)) \ >>+ . = ALIGN(8); \ >>+ __llvm_prf_names_end = .; \ >>+ . = ALIGN(8); \ >>+ } \ > >__llvm_prf_names does not need alignment. >It is often 1 in userspace programs. > >>+ __llvm_prf_vals : AT(ADDR(__llvm_prf_vals) - LOAD_OFFSET) { \ >>+ __llvm_prf_vals_start = .; \ >>+ KEEP(*(__llvm_prf_vals)) \ >>+ . = ALIGN(8); \ >>+ __llvm_prf_vals_end = .; \ >>+ . = ALIGN(8); \ >>+ } \ >>+ __llvm_prf_vnds : AT(ADDR(__llvm_prf_vnds) - LOAD_OFFSET) { \ >>+ __llvm_prf_vnds_start = .; \ >>+ KEEP(*(__llvm_prf_vnds)) \ >>+ . = ALIGN(8); \ >>+ __llvm_prf_vnds_end = .; \ >>+ __llvm_prf_end = .; \ >>+ } > >In userspace PGO instrumentation, the start is often aligned by 16. >The end does not need alignment. Thinking more, my suggestion is to drop explicit alignment annotations entirely: __llvm_prf_vals : AT(ADDR(__llvm_prf_vals) - LOAD_OFFSET) { \ __llvm_prf_vals_start = .; \ *(__llvm_prf_vals) \ __llvm_prf_vals_end = .; \ } \ __llvm_prf_vnds : AT(ADDR(__llvm_prf_vnds) - LOAD_OFFSET) { \ __llvm_prf_vnds_start = .; \ *(__llvm_prf_vnds) \ __llvm_prf_vnds_end = .; \ __llvm_prf_end = .; \ } // _cnts, _names and _data are similar. Just delete all ALIGN. // I deleted KEEP above to facilitate --gc-sections as well. Let the linker figure out the alignments in input sections and the output section alignment (https://lld.llvm.org/ELF/linker_script.html#output-section-alignment). 
Omitting alignment is probably preferable in most cases, unless no input section is present (either not emitted at all or all discarded by ld --gc-sections) (very rare event, happened with commit 793f49a87aae ("firmware_loader: align .builtin_fw to 8"), but that case unlikely happens with PGO). > >>+#else >>+#define PGO_CLANG_DATA >>+#endif >>+ >>#define KERNEL_DTB() \ >> STRUCT_ALIGN(); \ >> __dtb_start = .; \ >>@@ -1105,6 +1148,7 @@ >> CONSTRUCTORS \ >> } \ >> BUG_TABLE \ >>+ PGO_CLANG_DATA >> >>#define INIT_TEXT_SECTION(inittext_align) \ >> . = ALIGN(inittext_align); \ >>diff --git a/kernel/Makefile b/kernel/Makefile >>index 320f1f3941b7..a2a23ef2b12f 100644 >>--- a/kernel/Makefile >>+++ b/kernel/Makefile >>@@ -111,6 +111,7 @@ obj-$(CONFIG_BPF) += bpf/ >>obj-$(CONFIG_KCSAN) += kcsan/ >>obj-$(CONFIG_SHADOW_CALL_STACK) += scs.o >>obj-$(CONFIG_HAVE_STATIC_CALL_INLINE) += static_call.o >>+obj-$(CONFIG_PGO_CLANG) += pgo/ >> >>obj-$(CONFIG_PERF_EVENTS) += events/ >> >>diff --git a/kernel/pgo/Kconfig b/kernel/pgo/Kconfig >>new file mode 100644 >>index 000000000000..76a640b6cf6e >>--- /dev/null >>+++ b/kernel/pgo/Kconfig >>@@ -0,0 +1,35 @@ >>+# SPDX-License-Identifier: GPL-2.0-only >>+menu "Profile Guided Optimization (PGO) (EXPERIMENTAL)" >>+ >>+config ARCH_SUPPORTS_PGO_CLANG >>+ bool >>+ >>+config PGO_CLANG >>+ bool "Enable clang's PGO-based kernel profiling" >>+ depends on DEBUG_FS >>+ depends on ARCH_SUPPORTS_PGO_CLANG >>+ depends on CC_IS_CLANG && CLANG_VERSION >= 120000 >>+ help >>+ This option enables clang's PGO (Profile Guided Optimization) based >>+ code profiling to better optimize the kernel. >>+ >>+ If unsure, say N. >>+ >>+ Run a representative workload for your application on a kernel >>+ compiled with this option and download the raw profile file from >>+ /sys/kernel/debug/pgo/profraw. This file needs to be processed with >>+ llvm-profdata. It may be merged with other collected raw profiles. >>+ >>+ Copy the resulting profile file into vmlinux.profdata, and enable >>+ KCFLAGS=-fprofile-use=vmlinux.profdata to produce an optimized >>+ kernel. >>+ >>+ Note that a kernel compiled with profiling flags will be >>+ significantly larger and run slower. Also be sure to exclude files >>+ from profiling which are not linked to the kernel image to prevent >>+ linker errors. >>+ >>+ Note that the debugfs filesystem has to be mounted to access >>+ profiling data. >>+ >>+endmenu >>diff --git a/kernel/pgo/Makefile b/kernel/pgo/Makefile >>new file mode 100644 >>index 000000000000..41e27cefd9a4 >>--- /dev/null >>+++ b/kernel/pgo/Makefile >>@@ -0,0 +1,5 @@ >>+# SPDX-License-Identifier: GPL-2.0 >>+GCOV_PROFILE := n >>+PGO_PROFILE := n >>+ >>+obj-y += fs.o instrument.o >>diff --git a/kernel/pgo/fs.c b/kernel/pgo/fs.c >>new file mode 100644 >>index 000000000000..1678df3b7d64 >>--- /dev/null >>+++ b/kernel/pgo/fs.c >>@@ -0,0 +1,389 @@ >>+// SPDX-License-Identifier: GPL-2.0 >>+/* >>+ * Copyright (C) 2019 Google, Inc. >>+ * >>+ * Author: >>+ * Sami Tolvanen <samitolvanen@google.com> >>+ * >>+ * This software is licensed under the terms of the GNU General Public >>+ * License version 2, as published by the Free Software Foundation, and >>+ * may be copied, distributed, and modified under those terms. >>+ * >>+ * This program is distributed in the hope that it will be useful, >>+ * but WITHOUT ANY WARRANTY; without even the implied warranty of >>+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the >>+ * GNU General Public License for more details. 
>>+ * >>+ */ >>+ >>+#define pr_fmt(fmt) "pgo: " fmt >>+ >>+#include <linux/kernel.h> >>+#include <linux/debugfs.h> >>+#include <linux/fs.h> >>+#include <linux/module.h> >>+#include <linux/slab.h> >>+#include <linux/vmalloc.h> >>+#include "pgo.h" >>+ >>+static struct dentry *directory; >>+ >>+struct prf_private_data { >>+ void *buffer; >>+ unsigned long size; >>+}; >>+ >>+/* >>+ * Raw profile data format: >>+ * >>+ * - llvm_prf_header >>+ * - __llvm_prf_data >>+ * - __llvm_prf_cnts >>+ * - __llvm_prf_names >>+ * - zero padding to 8 bytes >>+ * - for each llvm_prf_data in __llvm_prf_data: >>+ * - llvm_prf_value_data >>+ * - llvm_prf_value_record + site count array >>+ * - llvm_prf_value_node_data >>+ * ... >>+ * ... >>+ * ... >>+ */ >>+ >>+static void prf_fill_header(void **buffer) >>+{ >>+ struct llvm_prf_header *header = *(struct llvm_prf_header **)buffer; >>+ >>+#ifdef CONFIG_64BIT >>+ header->magic = LLVM_INSTR_PROF_RAW_MAGIC_64; >>+#else >>+ header->magic = LLVM_INSTR_PROF_RAW_MAGIC_32; >>+#endif >>+ header->version = LLVM_VARIANT_MASK_IR_PROF | LLVM_INSTR_PROF_RAW_VERSION; >>+ header->data_size = prf_data_count(); >>+ header->padding_bytes_before_counters = 0; >>+ header->counters_size = prf_cnts_count(); >>+ header->padding_bytes_after_counters = 0; >>+ header->names_size = prf_names_count(); >>+ header->counters_delta = (u64)__llvm_prf_cnts_start; >>+ header->names_delta = (u64)__llvm_prf_names_start; >>+ header->value_kind_last = LLVM_INSTR_PROF_IPVK_LAST; >>+ >>+ *buffer += sizeof(*header); >>+} >>+ >>+/* >>+ * Copy the source into the buffer, incrementing the pointer into buffer in the >>+ * process. >>+ */ >>+static void prf_copy_to_buffer(void **buffer, void *src, unsigned long size) >>+{ >>+ memcpy(*buffer, src, size); >>+ *buffer += size; >>+} >>+ >>+static u32 __prf_get_value_size(struct llvm_prf_data *p, u32 *value_kinds) >>+{ >>+ struct llvm_prf_value_node **nodes = >>+ (struct llvm_prf_value_node **)p->values; >>+ u32 kinds = 0; >>+ u32 size = 0; >>+ unsigned int kind; >>+ unsigned int n; >>+ unsigned int s = 0; >>+ >>+ for (kind = 0; kind < ARRAY_SIZE(p->num_value_sites); kind++) { >>+ unsigned int sites = p->num_value_sites[kind]; >>+ >>+ if (!sites) >>+ continue; >>+ >>+ /* Record + site count array */ >>+ size += prf_get_value_record_size(sites); >>+ kinds++; >>+ >>+ if (!nodes) >>+ continue; >>+ >>+ for (n = 0; n < sites; n++) { >>+ u32 count = 0; >>+ struct llvm_prf_value_node *site = nodes[s + n]; >>+ >>+ while (site && ++count <= U8_MAX) >>+ site = site->next; >>+ >>+ size += count * >>+ sizeof(struct llvm_prf_value_node_data); >>+ } >>+ >>+ s += sites; >>+ } >>+ >>+ if (size) >>+ size += sizeof(struct llvm_prf_value_data); >>+ >>+ if (value_kinds) >>+ *value_kinds = kinds; >>+ >>+ return size; >>+} >>+ >>+static u32 prf_get_value_size(void) >>+{ >>+ u32 size = 0; >>+ struct llvm_prf_data *p; >>+ >>+ for (p = __llvm_prf_data_start; p < __llvm_prf_data_end; p++) >>+ size += __prf_get_value_size(p, NULL); >>+ >>+ return size; >>+} >>+ >>+/* Serialize the profiling's value. */ >>+static void prf_serialize_value(struct llvm_prf_data *p, void **buffer) >>+{ >>+ struct llvm_prf_value_data header; >>+ struct llvm_prf_value_node **nodes = >>+ (struct llvm_prf_value_node **)p->values; >>+ unsigned int kind; >>+ unsigned int n; >>+ unsigned int s = 0; >>+ >>+ header.total_size = __prf_get_value_size(p, &header.num_value_kinds); >>+ >>+ if (!header.num_value_kinds) >>+ /* Nothing to write. 
*/ >>+ return; >>+ >>+ prf_copy_to_buffer(buffer, &header, sizeof(header)); >>+ >>+ for (kind = 0; kind < ARRAY_SIZE(p->num_value_sites); kind++) { >>+ struct llvm_prf_value_record *record; >>+ u8 *counts; >>+ unsigned int sites = p->num_value_sites[kind]; >>+ >>+ if (!sites) >>+ continue; >>+ >>+ /* Profiling value record. */ >>+ record = *(struct llvm_prf_value_record **)buffer; >>+ *buffer += prf_get_value_record_header_size(); >>+ >>+ record->kind = kind; >>+ record->num_value_sites = sites; >>+ >>+ /* Site count array. */ >>+ counts = *(u8 **)buffer; >>+ *buffer += prf_get_value_record_site_count_size(sites); >>+ >>+ /* >>+ * If we don't have nodes, we can skip updating the site count >>+ * array, because the buffer is zero filled. >>+ */ >>+ if (!nodes) >>+ continue; >>+ >>+ for (n = 0; n < sites; n++) { >>+ u32 count = 0; >>+ struct llvm_prf_value_node *site = nodes[s + n]; >>+ >>+ while (site && ++count <= U8_MAX) { >>+ prf_copy_to_buffer(buffer, site, >>+ sizeof(struct llvm_prf_value_node_data)); >>+ site = site->next; >>+ } >>+ >>+ counts[n] = (u8)count; >>+ } >>+ >>+ s += sites; >>+ } >>+} >>+ >>+static void prf_serialize_values(void **buffer) >>+{ >>+ struct llvm_prf_data *p; >>+ >>+ for (p = __llvm_prf_data_start; p < __llvm_prf_data_end; p++) >>+ prf_serialize_value(p, buffer); >>+} >>+ >>+static inline unsigned long prf_get_padding(unsigned long size) >>+{ >>+ return 7 & (sizeof(u64) - size % sizeof(u64)); >>+} >>+ >>+static unsigned long prf_buffer_size(void) >>+{ >>+ return sizeof(struct llvm_prf_header) + >>+ prf_data_size() + >>+ prf_cnts_size() + >>+ prf_names_size() + >>+ prf_get_padding(prf_names_size()) + >>+ prf_get_value_size(); >>+} >>+ >>+/* >>+ * Serialize the profiling data into a format LLVM's tools can understand. >>+ * Note: caller *must* hold pgo_lock. >>+ */ >>+static int prf_serialize(struct prf_private_data *p) >>+{ >>+ int err = 0; >>+ void *buffer; >>+ >>+ p->size = prf_buffer_size(); >>+ p->buffer = vzalloc(p->size); >>+ >>+ if (!p->buffer) { >>+ err = -ENOMEM; >>+ goto out; >>+ } >>+ >>+ buffer = p->buffer; >>+ >>+ prf_fill_header(&buffer); >>+ prf_copy_to_buffer(&buffer, __llvm_prf_data_start, prf_data_size()); >>+ prf_copy_to_buffer(&buffer, __llvm_prf_cnts_start, prf_cnts_size()); >>+ prf_copy_to_buffer(&buffer, __llvm_prf_names_start, prf_names_size()); >>+ buffer += prf_get_padding(prf_names_size()); >>+ >>+ prf_serialize_values(&buffer); >>+ >>+out: >>+ return err; >>+} >>+ >>+/* open() implementation for PGO. Creates a copy of the profiling data set. */ >>+static int prf_open(struct inode *inode, struct file *file) >>+{ >>+ struct prf_private_data *data; >>+ unsigned long flags; >>+ int err; >>+ >>+ data = kzalloc(sizeof(*data), GFP_KERNEL); >>+ if (!data) { >>+ err = -ENOMEM; >>+ goto out; >>+ } >>+ >>+ flags = prf_lock(); >>+ >>+ err = prf_serialize(data); >>+ if (unlikely(err)) { >>+ kfree(data); >>+ goto out_unlock; >>+ } >>+ >>+ file->private_data = data; >>+ >>+out_unlock: >>+ prf_unlock(flags); >>+out: >>+ return err; >>+} >>+ >>+/* read() implementation for PGO. */ >>+static ssize_t prf_read(struct file *file, char __user *buf, size_t count, >>+ loff_t *ppos) >>+{ >>+ struct prf_private_data *data = file->private_data; >>+ >>+ BUG_ON(!data); >>+ >>+ return simple_read_from_buffer(buf, count, ppos, data->buffer, >>+ data->size); >>+} >>+ >>+/* release() implementation for PGO. Release resources allocated by open(). 
*/ >>+static int prf_release(struct inode *inode, struct file *file) >>+{ >>+ struct prf_private_data *data = file->private_data; >>+ >>+ if (data) { >>+ vfree(data->buffer); >>+ kfree(data); >>+ } >>+ >>+ return 0; >>+} >>+ >>+static const struct file_operations prf_fops = { >>+ .owner = THIS_MODULE, >>+ .open = prf_open, >>+ .read = prf_read, >>+ .llseek = default_llseek, >>+ .release = prf_release >>+}; >>+ >>+/* write() implementation for resetting PGO's profile data. */ >>+static ssize_t reset_write(struct file *file, const char __user *addr, >>+ size_t len, loff_t *pos) >>+{ >>+ struct llvm_prf_data *data; >>+ >>+ memset(__llvm_prf_cnts_start, 0, prf_cnts_size()); >>+ >>+ for (data = __llvm_prf_data_start; data < __llvm_prf_data_end; data++) { >>+ struct llvm_prf_value_node **vnodes; >>+ u64 current_vsite_count; >>+ u32 i; >>+ >>+ if (!data->values) >>+ continue; >>+ >>+ current_vsite_count = 0; >>+ vnodes = (struct llvm_prf_value_node **)data->values; >>+ >>+ for (i = LLVM_INSTR_PROF_IPVK_FIRST; i <= LLVM_INSTR_PROF_IPVK_LAST; i++) >>+ current_vsite_count += data->num_value_sites[i]; >>+ >>+ for (i = 0; i < current_vsite_count; i++) { >>+ struct llvm_prf_value_node *current_vnode = vnodes[i]; >>+ >>+ while (current_vnode) { >>+ current_vnode->count = 0; >>+ current_vnode = current_vnode->next; >>+ } >>+ } >>+ } >>+ >>+ return len; >>+} >>+ >>+static const struct file_operations prf_reset_fops = { >>+ .owner = THIS_MODULE, >>+ .write = reset_write, >>+ .llseek = noop_llseek, >>+}; >>+ >>+/* Create debugfs entries. */ >>+static int __init pgo_init(void) >>+{ >>+ directory = debugfs_create_dir("pgo", NULL); >>+ if (!directory) >>+ goto err_remove; >>+ >>+ if (!debugfs_create_file("profraw", 0600, directory, NULL, >>+ &prf_fops)) >>+ goto err_remove; >>+ >>+ if (!debugfs_create_file("reset", 0200, directory, NULL, >>+ &prf_reset_fops)) >>+ goto err_remove; >>+ >>+ return 0; >>+ >>+err_remove: >>+ pr_err("initialization failed\n"); >>+ return -EIO; >>+} >>+ >>+/* Remove debugfs entries. */ >>+static void __exit pgo_exit(void) >>+{ >>+ debugfs_remove_recursive(directory); >>+} >>+ >>+module_init(pgo_init); >>+module_exit(pgo_exit); >>diff --git a/kernel/pgo/instrument.c b/kernel/pgo/instrument.c >>new file mode 100644 >>index 000000000000..62ff5cfce7b1 >>--- /dev/null >>+++ b/kernel/pgo/instrument.c >>@@ -0,0 +1,189 @@ >>+// SPDX-License-Identifier: GPL-2.0 >>+/* >>+ * Copyright (C) 2019 Google, Inc. >>+ * >>+ * Author: >>+ * Sami Tolvanen <samitolvanen@google.com> >>+ * >>+ * This software is licensed under the terms of the GNU General Public >>+ * License version 2, as published by the Free Software Foundation, and >>+ * may be copied, distributed, and modified under those terms. >>+ * >>+ * This program is distributed in the hope that it will be useful, >>+ * but WITHOUT ANY WARRANTY; without even the implied warranty of >>+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the >>+ * GNU General Public License for more details. >>+ * >>+ */ >>+ >>+#define pr_fmt(fmt) "pgo: " fmt >>+ >>+#include <linux/bitops.h> >>+#include <linux/kernel.h> >>+#include <linux/export.h> >>+#include <linux/spinlock.h> >>+#include <linux/types.h> >>+#include "pgo.h" >>+ >>+/* >>+ * This lock guards both profile count updating and serialization of the >>+ * profiling data. Keeping both of these activities separate via locking >>+ * ensures that we don't try to serialize data that's only partially updated. 
>>+ */ >>+static DEFINE_SPINLOCK(pgo_lock); >>+static int current_node; >>+ >>+unsigned long prf_lock(void) >>+{ >>+ unsigned long flags; >>+ >>+ spin_lock_irqsave(&pgo_lock, flags); >>+ >>+ return flags; >>+} >>+ >>+void prf_unlock(unsigned long flags) >>+{ >>+ spin_unlock_irqrestore(&pgo_lock, flags); >>+} >>+ >>+/* >>+ * Return a newly allocated profiling value node which contains the tracked >>+ * value by the value profiler. >>+ * Note: caller *must* hold pgo_lock. >>+ */ >>+static struct llvm_prf_value_node *allocate_node(struct llvm_prf_data *p, >>+ u32 index, u64 value) >>+{ >>+ if (&__llvm_prf_vnds_start[current_node + 1] >= __llvm_prf_vnds_end) >>+ return NULL; /* Out of nodes */ >>+ >>+ current_node++; >>+ >>+ /* Make sure the node is entirely within the section */ >>+ if (&__llvm_prf_vnds_start[current_node] >= __llvm_prf_vnds_end || >>+ &__llvm_prf_vnds_start[current_node + 1] > __llvm_prf_vnds_end) >>+ return NULL; >>+ >>+ return &__llvm_prf_vnds_start[current_node]; >>+} >>+ >>+/* >>+ * Counts the number of times a target value is seen. >>+ * >>+ * Records the target value for the index if not seen before. Otherwise, >>+ * increments the counter associated w/ the target value. >>+ */ >>+void __llvm_profile_instrument_target(u64 target_value, void *data, u32 index); >>+void __llvm_profile_instrument_target(u64 target_value, void *data, u32 index) >>+{ >>+ struct llvm_prf_data *p = (struct llvm_prf_data *)data; >>+ struct llvm_prf_value_node **counters; >>+ struct llvm_prf_value_node *curr; >>+ struct llvm_prf_value_node *min = NULL; >>+ struct llvm_prf_value_node *prev = NULL; >>+ u64 min_count = U64_MAX; >>+ u8 values = 0; >>+ unsigned long flags; >>+ >>+ if (!p || !p->values) >>+ return; >>+ >>+ counters = (struct llvm_prf_value_node **)p->values; >>+ curr = counters[index]; >>+ >>+ while (curr) { >>+ if (target_value == curr->value) { >>+ curr->count++; >>+ return; >>+ } >>+ >>+ if (curr->count < min_count) { >>+ min_count = curr->count; >>+ min = curr; >>+ } >>+ >>+ prev = curr; >>+ curr = curr->next; >>+ values++; >>+ } >>+ >>+ if (values >= LLVM_INSTR_PROF_MAX_NUM_VAL_PER_SITE) { >>+ if (!min->count || !(--min->count)) { >>+ curr = min; >>+ curr->value = target_value; >>+ curr->count++; >>+ } >>+ return; >>+ } >>+ >>+ /* Lock when updating the value node structure. */ >>+ flags = prf_lock(); >>+ >>+ curr = allocate_node(p, index, target_value); >>+ if (!curr) >>+ goto out; >>+ >>+ curr->value = target_value; >>+ curr->count++; >>+ >>+ if (!counters[index]) >>+ counters[index] = curr; >>+ else if (prev && !prev->next) >>+ prev->next = curr; >>+ >>+out: >>+ prf_unlock(flags); >>+} >>+EXPORT_SYMBOL(__llvm_profile_instrument_target); >>+ >>+/* Counts the number of times a range of targets values are seen. 
*/ >>+void __llvm_profile_instrument_range(u64 target_value, void *data, >>+ u32 index, s64 precise_start, >>+ s64 precise_last, s64 large_value); >>+void __llvm_profile_instrument_range(u64 target_value, void *data, >>+ u32 index, s64 precise_start, >>+ s64 precise_last, s64 large_value) >>+{ >>+ if (large_value != S64_MIN && (s64)target_value >= large_value) >>+ target_value = large_value; >>+ else if ((s64)target_value < precise_start || >>+ (s64)target_value > precise_last) >>+ target_value = precise_last + 1; >>+ >>+ __llvm_profile_instrument_target(target_value, data, index); >>+} >>+EXPORT_SYMBOL(__llvm_profile_instrument_range); >>+ >>+static u64 inst_prof_get_range_rep_value(u64 value) >>+{ >>+ if (value <= 8) >>+ /* The first ranges are individually tracked, use it as is. */ >>+ return value; >>+ else if (value >= 513) >>+ /* The last range is mapped to its lowest value. */ >>+ return 513; >>+ else if (hweight64(value) == 1) >>+ /* If it's a power of two, use it as is. */ >>+ return value; >>+ >>+ /* Otherwise, take to the previous power of two + 1. */ >>+ return (1 << (64 - __builtin_clzll(value) - 1)) + 1; >>+} `1 << ...` is another very minor issue. I sent https://reviews.llvm.org/D97640 to fix the upstream. The overflow won't happen in practice because the function is only used by the size parameter of memory operation (e.g. memcpy). >>+/* >>+ * The target values are partitioned into multiple ranges. The range spec is >>+ * defined in compiler-rt/include/profile/InstrProfData.inc. >>+ */ >>+void __llvm_profile_instrument_memop(u64 target_value, void *data, >>+ u32 counter_index); >>+void __llvm_profile_instrument_memop(u64 target_value, void *data, >>+ u32 counter_index) >>+{ >>+ u64 rep_value; >>+ >>+ /* Map the target value to the representative value of its range. */ >>+ rep_value = inst_prof_get_range_rep_value(target_value); >>+ __llvm_profile_instrument_target(rep_value, data, counter_index); >>+} >>+EXPORT_SYMBOL(__llvm_profile_instrument_memop); >>diff --git a/kernel/pgo/pgo.h b/kernel/pgo/pgo.h >>new file mode 100644 >>index 000000000000..ddc8d3002fe5 >>--- /dev/null >>+++ b/kernel/pgo/pgo.h >>@@ -0,0 +1,203 @@ >>+/* SPDX-License-Identifier: GPL-2.0 */ >>+/* >>+ * Copyright (C) 2019 Google, Inc. >>+ * >>+ * Author: >>+ * Sami Tolvanen <samitolvanen@google.com> >>+ * >>+ * This software is licensed under the terms of the GNU General Public >>+ * License version 2, as published by the Free Software Foundation, and >>+ * may be copied, distributed, and modified under those terms. >>+ * >>+ * This program is distributed in the hope that it will be useful, >>+ * but WITHOUT ANY WARRANTY; without even the implied warranty of >>+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the >>+ * GNU General Public License for more details. >>+ * >>+ */ >>+ >>+#ifndef _PGO_H >>+#define _PGO_H >>+ >>+/* >>+ * Note: These internal LLVM definitions must match the compiler version. >>+ * See llvm/include/llvm/ProfileData/InstrProfData.inc in LLVM's source code. 
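For illustration only (this is not part of the patch, nor necessarily how
https://reviews.llvm.org/D97640 fixes it): the `1 << ...` note refers to the
shift in inst_prof_get_range_rep_value(). Because the literal 1 is a plain
int, the shift would be undefined behavior for large shift amounts, although
the earlier `value >= 513` check keeps the shift amount at 8 or below, so it
cannot overflow here in practice. A minimal user-space sketch of the same
range mapping with a 64-bit shift constant, assuming only a compiler that
provides __builtin_clzll:

#include <stdint.h>
#include <stdio.h>

/*
 * Stand-alone sketch of the range mapping with a 64-bit shift constant
 * (1ULL). For values in (8, 513) the shift amount is at most 8, which is
 * why the kernel's plain "1 <<" never overflows in practice.
 */
static uint64_t range_rep_value(uint64_t value)
{
	if (value <= 8)
		return value;			/* tracked individually */
	if (value >= 513)
		return 513;			/* last range: lowest value */
	if ((value & (value - 1)) == 0)
		return value;			/* power of two: as is */
	/* previous power of two, plus one */
	return (1ULL << (63 - __builtin_clzll(value))) + 1;
}

int main(void)
{
	printf("%llu\n", (unsigned long long)range_rep_value(12));	/* 9 */
	printf("%llu\n", (unsigned long long)range_rep_value(100));	/* 65 */
	printf("%llu\n", (unsigned long long)range_rep_value(600));	/* 513 */
	return 0;
}

The actual InstrProfData.inc change may differ in detail; the point is only
that the shift operand is a 64-bit constant.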
diff --git a/Documentation/dev-tools/index.rst b/Documentation/dev-tools/index.rst
index f7809c7b1ba9..8d6418e85806 100644
--- a/Documentation/dev-tools/index.rst
+++ b/Documentation/dev-tools/index.rst
@@ -26,6 +26,7 @@ whole; patches welcome!
    kgdb
    kselftest
    kunit/index
+   pgo
 
 .. only::  subproject and html
 
diff --git a/Documentation/dev-tools/pgo.rst b/Documentation/dev-tools/pgo.rst
new file mode 100644
index 000000000000..b7f11d8405b7
--- /dev/null
+++ b/Documentation/dev-tools/pgo.rst
@@ -0,0 +1,127 @@
+.. SPDX-License-Identifier: GPL-2.0
+
+===============================
+Using PGO with the Linux kernel
+===============================
+
+Clang's kernel profiling support (PGO_) enables profiling of the Linux kernel
+when building with Clang. The profiling data is exported via the ``pgo``
+debugfs directory.
+
+.. _PGO: https://clang.llvm.org/docs/UsersManual.html#profile-guided-optimization
+
+
+Preparation
+===========
+
+Configure the kernel with:
+
+.. code-block:: make
+
+   CONFIG_DEBUG_FS=y
+   CONFIG_PGO_CLANG=y
+
+Note that kernels compiled with profiling flags will be significantly larger
+and run slower.
+
+Profiling data will only become accessible once debugfs has been mounted:
+
+.. code-block:: sh
+
+   mount -t debugfs none /sys/kernel/debug
+
+
+Customization
+=============
+
+You can enable or disable profiling for individual files and directories by
+adding a line similar to the following to the respective kernel Makefile:
+
+- For a single file (e.g. main.o)
+
+  .. code-block:: make
+
+     PGO_PROFILE_main.o := y
+
+- For all files in one directory
+
+  .. code-block:: make
+
+     PGO_PROFILE := y
+
+To exclude files from being profiled use
+
+  .. code-block:: make
+
+     PGO_PROFILE_main.o := n
+
+and
+
+  .. code-block:: make
+
+     PGO_PROFILE := n
+
+Only files which are linked to the main kernel image or are compiled as kernel
+modules are supported by this mechanism.
+
+
+Files
+=====
+
+The PGO kernel support creates the following files in debugfs:
+
+``/sys/kernel/debug/pgo``
+	Parent directory for all PGO-related files.
+
+``/sys/kernel/debug/pgo/reset``
+	Global reset file: resets all profiling data to zero when written to.
+
+``/sys/kernel/debug/pgo/profraw``
+	The raw PGO data that must be processed with ``llvm-profdata``.
+
+
+Workflow
+========
+
+The PGO kernel can be run on the host or on test machines. The profiling
+data, however, should be analyzed with tools from the same Clang version
+that was used to compile the kernel. Clang is tolerant of version skew, but
+it is easier to use the same Clang version throughout.
+
+The profiling data is useful for optimizing the kernel, analyzing coverage,
+etc. Clang offers tools to perform these tasks.
+
+Here is an example workflow for profiling an instrumented kernel with PGO and
+using the result to optimize the kernel:
+
+1) Install the kernel on the TEST machine.
+
+2) Reset the data counters right before running the load tests
+
+   .. code-block:: sh
+
+      $ echo 1 > /sys/kernel/debug/pgo/reset
+
+3) Run the load tests.
+
+4) Collect the raw profile data
+
+   .. code-block:: sh
+
+      $ cp -a /sys/kernel/debug/pgo/profraw /tmp/vmlinux.profraw
+
+5) (Optional) Download the raw profile data to the HOST machine.
+
+6) Process the raw profile data
+
+   .. code-block:: sh
+
+      $ llvm-profdata merge --output=vmlinux.profdata vmlinux.profraw
+
+   Note that multiple raw profile data files can be merged during this step.
+
+7) Rebuild the kernel using the profile data (PGO disabled)
+
+   .. code-block:: sh
+
+      $ make LLVM=1 KCFLAGS=-fprofile-use=vmlinux.profdata ...
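Taken together, the documented steps boil down to a short session on the TEST
machine (this consolidation is illustrative and not part of the patch; the
load-test script name is a placeholder, and `llvm-profdata show` is only an
optional sanity check of the merged profile):

# Assumes CONFIG_PGO_CLANG=y, debugfs mounted, and an instrumented kernel.
echo 1 > /sys/kernel/debug/pgo/reset           # zero all counters
./run-representative-workload.sh               # placeholder for the real load test
cp -a /sys/kernel/debug/pgo/profraw /tmp/vmlinux.profraw
llvm-profdata merge --output=vmlinux.profdata /tmp/vmlinux.profraw
llvm-profdata show vmlinux.profdata            # optional: print a profile summary
make LLVM=1 KCFLAGS=-fprofile-use=vmlinux.profdata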
diff --git a/MAINTAINERS b/MAINTAINERS index c71664ca8bfd..3a6668792bc5 100644 --- a/MAINTAINERS +++ b/MAINTAINERS @@ -14019,6 +14019,15 @@ S: Maintained F: include/linux/personality.h F: include/uapi/linux/personality.h +PGO BASED KERNEL PROFILING +M: Sami Tolvanen <samitolvanen@google.com> +M: Bill Wendling <wcw@google.com> +R: Nathan Chancellor <natechancellor@gmail.com> +R: Nick Desaulniers <ndesaulniers@google.com> +S: Supported +F: Documentation/dev-tools/pgo.rst +F: kernel/pgo + PHOENIX RC FLIGHT CONTROLLER ADAPTER M: Marcus Folkesson <marcus.folkesson@gmail.com> L: linux-input@vger.kernel.org diff --git a/Makefile b/Makefile index 6ecd0d22e608..b57d4d44c799 100644 --- a/Makefile +++ b/Makefile @@ -657,6 +657,9 @@ endif # KBUILD_EXTMOD # Defaults to vmlinux, but the arch makefile usually adds further targets all: vmlinux +CFLAGS_PGO_CLANG := -fprofile-generate +export CFLAGS_PGO_CLANG + CFLAGS_GCOV := -fprofile-arcs -ftest-coverage \ $(call cc-option,-fno-tree-loop-im) \ $(call cc-disable-warning,maybe-uninitialized,) diff --git a/arch/Kconfig b/arch/Kconfig index 2bb30673d8e6..111e642a2af7 100644 --- a/arch/Kconfig +++ b/arch/Kconfig @@ -1192,6 +1192,7 @@ config ARCH_HAS_ELFCORE_COMPAT bool source "kernel/gcov/Kconfig" +source "kernel/pgo/Kconfig" source "scripts/gcc-plugins/Kconfig" diff --git a/arch/x86/Kconfig b/arch/x86/Kconfig index cd4b9b1204a8..c9808583b528 100644 --- a/arch/x86/Kconfig +++ b/arch/x86/Kconfig @@ -99,6 +99,7 @@ config X86 select ARCH_SUPPORTS_KMAP_LOCAL_FORCE_MAP if NR_CPUS <= 4096 select ARCH_SUPPORTS_LTO_CLANG if X86_64 select ARCH_SUPPORTS_LTO_CLANG_THIN if X86_64 + select ARCH_SUPPORTS_PGO_CLANG if X86_64 select ARCH_USE_BUILTIN_BSWAP select ARCH_USE_QUEUED_RWLOCKS select ARCH_USE_QUEUED_SPINLOCKS diff --git a/arch/x86/boot/Makefile b/arch/x86/boot/Makefile index fe605205b4ce..383853e32f67 100644 --- a/arch/x86/boot/Makefile +++ b/arch/x86/boot/Makefile @@ -71,6 +71,7 @@ KBUILD_AFLAGS := $(KBUILD_CFLAGS) -D__ASSEMBLY__ KBUILD_CFLAGS += $(call cc-option,-fmacro-prefix-map=$(srctree)/=) KBUILD_CFLAGS += -fno-asynchronous-unwind-tables GCOV_PROFILE := n +PGO_PROFILE := n UBSAN_SANITIZE := n $(obj)/bzImage: asflags-y := $(SVGA_MODE) diff --git a/arch/x86/boot/compressed/Makefile b/arch/x86/boot/compressed/Makefile index e0bc3988c3fa..ed12ab65f606 100644 --- a/arch/x86/boot/compressed/Makefile +++ b/arch/x86/boot/compressed/Makefile @@ -54,6 +54,7 @@ CFLAGS_sev-es.o += -I$(objtree)/arch/x86/lib/ KBUILD_AFLAGS := $(KBUILD_CFLAGS) -D__ASSEMBLY__ GCOV_PROFILE := n +PGO_PROFILE := n UBSAN_SANITIZE :=n KBUILD_LDFLAGS := -m elf_$(UTS_MACHINE) diff --git a/arch/x86/crypto/Makefile b/arch/x86/crypto/Makefile index b28e36b7c96b..4b2e9620c412 100644 --- a/arch/x86/crypto/Makefile +++ b/arch/x86/crypto/Makefile @@ -4,6 +4,10 @@ OBJECT_FILES_NON_STANDARD := y +# Disable PGO for curve25519-x86_64. With PGO enabled, clang runs out of +# registers for some of the functions. 
+PGO_PROFILE_curve25519-x86_64.o := n + obj-$(CONFIG_CRYPTO_TWOFISH_586) += twofish-i586.o twofish-i586-y := twofish-i586-asm_32.o twofish_glue.o obj-$(CONFIG_CRYPTO_TWOFISH_X86_64) += twofish-x86_64.o diff --git a/arch/x86/entry/vdso/Makefile b/arch/x86/entry/vdso/Makefile index 05c4abc2fdfd..f7421e44725a 100644 --- a/arch/x86/entry/vdso/Makefile +++ b/arch/x86/entry/vdso/Makefile @@ -180,6 +180,7 @@ quiet_cmd_vdso = VDSO $@ VDSO_LDFLAGS = -shared --hash-style=both --build-id=sha1 \ $(call ld-option, --eh-frame-hdr) -Bsymbolic GCOV_PROFILE := n +PGO_PROFILE := n quiet_cmd_vdso_and_check = VDSO $@ cmd_vdso_and_check = $(cmd_vdso); $(cmd_vdso_check) diff --git a/arch/x86/kernel/vmlinux.lds.S b/arch/x86/kernel/vmlinux.lds.S index efd9e9ea17f2..f6cab2316c46 100644 --- a/arch/x86/kernel/vmlinux.lds.S +++ b/arch/x86/kernel/vmlinux.lds.S @@ -184,6 +184,8 @@ SECTIONS BUG_TABLE + PGO_CLANG_DATA + ORC_UNWIND_TABLE . = ALIGN(PAGE_SIZE); diff --git a/arch/x86/platform/efi/Makefile b/arch/x86/platform/efi/Makefile index 84b09c230cbd..5f22b31446ad 100644 --- a/arch/x86/platform/efi/Makefile +++ b/arch/x86/platform/efi/Makefile @@ -2,6 +2,7 @@ OBJECT_FILES_NON_STANDARD_efi_thunk_$(BITS).o := y KASAN_SANITIZE := n GCOV_PROFILE := n +PGO_PROFILE := n obj-$(CONFIG_EFI) += quirks.o efi.o efi_$(BITS).o efi_stub_$(BITS).o obj-$(CONFIG_EFI_MIXED) += efi_thunk_$(BITS).o diff --git a/arch/x86/purgatory/Makefile b/arch/x86/purgatory/Makefile index 95ea17a9d20c..36f20e99da0b 100644 --- a/arch/x86/purgatory/Makefile +++ b/arch/x86/purgatory/Makefile @@ -23,6 +23,7 @@ targets += purgatory.ro purgatory.chk # Sanitizer, etc. runtimes are unavailable and cannot be linked here. GCOV_PROFILE := n +PGO_PROFILE := n KASAN_SANITIZE := n UBSAN_SANITIZE := n KCSAN_SANITIZE := n diff --git a/arch/x86/realmode/rm/Makefile b/arch/x86/realmode/rm/Makefile index 83f1b6a56449..21797192f958 100644 --- a/arch/x86/realmode/rm/Makefile +++ b/arch/x86/realmode/rm/Makefile @@ -76,4 +76,5 @@ KBUILD_CFLAGS := $(REALMODE_CFLAGS) -D_SETUP -D_WAKEUP \ KBUILD_AFLAGS := $(KBUILD_CFLAGS) -D__ASSEMBLY__ KBUILD_CFLAGS += -fno-asynchronous-unwind-tables GCOV_PROFILE := n +PGO_PROFILE := n UBSAN_SANITIZE := n diff --git a/arch/x86/um/vdso/Makefile b/arch/x86/um/vdso/Makefile index 5943387e3f35..54f5768f5853 100644 --- a/arch/x86/um/vdso/Makefile +++ b/arch/x86/um/vdso/Makefile @@ -64,6 +64,7 @@ quiet_cmd_vdso = VDSO $@ VDSO_LDFLAGS = -fPIC -shared -Wl,--hash-style=sysv GCOV_PROFILE := n +PGO_PROFILE := n # # Install the unstripped copy of vdso*.so listed in $(vdso-install-y). diff --git a/drivers/firmware/efi/libstub/Makefile b/drivers/firmware/efi/libstub/Makefile index c23466e05e60..724fb389bb9d 100644 --- a/drivers/firmware/efi/libstub/Makefile +++ b/drivers/firmware/efi/libstub/Makefile @@ -42,6 +42,7 @@ KBUILD_CFLAGS := $(filter-out $(CC_FLAGS_SCS), $(KBUILD_CFLAGS)) KBUILD_CFLAGS := $(filter-out $(CC_FLAGS_LTO), $(KBUILD_CFLAGS)) GCOV_PROFILE := n +PGO_PROFILE := n # Sanitizer runtimes are unavailable and cannot be linked here. KASAN_SANITIZE := n KCSAN_SANITIZE := n diff --git a/include/asm-generic/vmlinux.lds.h b/include/asm-generic/vmlinux.lds.h index 6786f8c0182f..4a0c21b840b3 100644 --- a/include/asm-generic/vmlinux.lds.h +++ b/include/asm-generic/vmlinux.lds.h @@ -329,6 +329,49 @@ #define DTPM_TABLE() #endif +#ifdef CONFIG_PGO_CLANG +#define PGO_CLANG_DATA \ + __llvm_prf_data : AT(ADDR(__llvm_prf_data) - LOAD_OFFSET) { \ + . = ALIGN(8); \ + __llvm_prf_start = .; \ + __llvm_prf_data_start = .; \ + KEEP(*(__llvm_prf_data)) \ + . 
= ALIGN(8); \ + __llvm_prf_data_end = .; \ + } \ + __llvm_prf_cnts : AT(ADDR(__llvm_prf_cnts) - LOAD_OFFSET) { \ + . = ALIGN(8); \ + __llvm_prf_cnts_start = .; \ + KEEP(*(__llvm_prf_cnts)) \ + . = ALIGN(8); \ + __llvm_prf_cnts_end = .; \ + } \ + __llvm_prf_names : AT(ADDR(__llvm_prf_names) - LOAD_OFFSET) { \ + . = ALIGN(8); \ + __llvm_prf_names_start = .; \ + KEEP(*(__llvm_prf_names)) \ + . = ALIGN(8); \ + __llvm_prf_names_end = .; \ + . = ALIGN(8); \ + } \ + __llvm_prf_vals : AT(ADDR(__llvm_prf_vals) - LOAD_OFFSET) { \ + __llvm_prf_vals_start = .; \ + KEEP(*(__llvm_prf_vals)) \ + . = ALIGN(8); \ + __llvm_prf_vals_end = .; \ + . = ALIGN(8); \ + } \ + __llvm_prf_vnds : AT(ADDR(__llvm_prf_vnds) - LOAD_OFFSET) { \ + __llvm_prf_vnds_start = .; \ + KEEP(*(__llvm_prf_vnds)) \ + . = ALIGN(8); \ + __llvm_prf_vnds_end = .; \ + __llvm_prf_end = .; \ + } +#else +#define PGO_CLANG_DATA +#endif + #define KERNEL_DTB() \ STRUCT_ALIGN(); \ __dtb_start = .; \ @@ -1105,6 +1148,7 @@ CONSTRUCTORS \ } \ BUG_TABLE \ + PGO_CLANG_DATA #define INIT_TEXT_SECTION(inittext_align) \ . = ALIGN(inittext_align); \ diff --git a/kernel/Makefile b/kernel/Makefile index 320f1f3941b7..a2a23ef2b12f 100644 --- a/kernel/Makefile +++ b/kernel/Makefile @@ -111,6 +111,7 @@ obj-$(CONFIG_BPF) += bpf/ obj-$(CONFIG_KCSAN) += kcsan/ obj-$(CONFIG_SHADOW_CALL_STACK) += scs.o obj-$(CONFIG_HAVE_STATIC_CALL_INLINE) += static_call.o +obj-$(CONFIG_PGO_CLANG) += pgo/ obj-$(CONFIG_PERF_EVENTS) += events/ diff --git a/kernel/pgo/Kconfig b/kernel/pgo/Kconfig new file mode 100644 index 000000000000..76a640b6cf6e --- /dev/null +++ b/kernel/pgo/Kconfig @@ -0,0 +1,35 @@ +# SPDX-License-Identifier: GPL-2.0-only +menu "Profile Guided Optimization (PGO) (EXPERIMENTAL)" + +config ARCH_SUPPORTS_PGO_CLANG + bool + +config PGO_CLANG + bool "Enable clang's PGO-based kernel profiling" + depends on DEBUG_FS + depends on ARCH_SUPPORTS_PGO_CLANG + depends on CC_IS_CLANG && CLANG_VERSION >= 120000 + help + This option enables clang's PGO (Profile Guided Optimization) based + code profiling to better optimize the kernel. + + If unsure, say N. + + Run a representative workload for your application on a kernel + compiled with this option and download the raw profile file from + /sys/kernel/debug/pgo/profraw. This file needs to be processed with + llvm-profdata. It may be merged with other collected raw profiles. + + Copy the resulting profile file into vmlinux.profdata, and enable + KCFLAGS=-fprofile-use=vmlinux.profdata to produce an optimized + kernel. + + Note that a kernel compiled with profiling flags will be + significantly larger and run slower. Also be sure to exclude files + from profiling which are not linked to the kernel image to prevent + linker errors. + + Note that the debugfs filesystem has to be mounted to access + profiling data. + +endmenu diff --git a/kernel/pgo/Makefile b/kernel/pgo/Makefile new file mode 100644 index 000000000000..41e27cefd9a4 --- /dev/null +++ b/kernel/pgo/Makefile @@ -0,0 +1,5 @@ +# SPDX-License-Identifier: GPL-2.0 +GCOV_PROFILE := n +PGO_PROFILE := n + +obj-y += fs.o instrument.o diff --git a/kernel/pgo/fs.c b/kernel/pgo/fs.c new file mode 100644 index 000000000000..1678df3b7d64 --- /dev/null +++ b/kernel/pgo/fs.c @@ -0,0 +1,389 @@ +// SPDX-License-Identifier: GPL-2.0 +/* + * Copyright (C) 2019 Google, Inc. 
+ * + * Author: + * Sami Tolvanen <samitolvanen@google.com> + * + * This software is licensed under the terms of the GNU General Public + * License version 2, as published by the Free Software Foundation, and + * may be copied, distributed, and modified under those terms. + * + * This program is distributed in the hope that it will be useful, + * but WITHOUT ANY WARRANTY; without even the implied warranty of + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the + * GNU General Public License for more details. + * + */ + +#define pr_fmt(fmt) "pgo: " fmt + +#include <linux/kernel.h> +#include <linux/debugfs.h> +#include <linux/fs.h> +#include <linux/module.h> +#include <linux/slab.h> +#include <linux/vmalloc.h> +#include "pgo.h" + +static struct dentry *directory; + +struct prf_private_data { + void *buffer; + unsigned long size; +}; + +/* + * Raw profile data format: + * + * - llvm_prf_header + * - __llvm_prf_data + * - __llvm_prf_cnts + * - __llvm_prf_names + * - zero padding to 8 bytes + * - for each llvm_prf_data in __llvm_prf_data: + * - llvm_prf_value_data + * - llvm_prf_value_record + site count array + * - llvm_prf_value_node_data + * ... + * ... + * ... + */ + +static void prf_fill_header(void **buffer) +{ + struct llvm_prf_header *header = *(struct llvm_prf_header **)buffer; + +#ifdef CONFIG_64BIT + header->magic = LLVM_INSTR_PROF_RAW_MAGIC_64; +#else + header->magic = LLVM_INSTR_PROF_RAW_MAGIC_32; +#endif + header->version = LLVM_VARIANT_MASK_IR_PROF | LLVM_INSTR_PROF_RAW_VERSION; + header->data_size = prf_data_count(); + header->padding_bytes_before_counters = 0; + header->counters_size = prf_cnts_count(); + header->padding_bytes_after_counters = 0; + header->names_size = prf_names_count(); + header->counters_delta = (u64)__llvm_prf_cnts_start; + header->names_delta = (u64)__llvm_prf_names_start; + header->value_kind_last = LLVM_INSTR_PROF_IPVK_LAST; + + *buffer += sizeof(*header); +} + +/* + * Copy the source into the buffer, incrementing the pointer into buffer in the + * process. + */ +static void prf_copy_to_buffer(void **buffer, void *src, unsigned long size) +{ + memcpy(*buffer, src, size); + *buffer += size; +} + +static u32 __prf_get_value_size(struct llvm_prf_data *p, u32 *value_kinds) +{ + struct llvm_prf_value_node **nodes = + (struct llvm_prf_value_node **)p->values; + u32 kinds = 0; + u32 size = 0; + unsigned int kind; + unsigned int n; + unsigned int s = 0; + + for (kind = 0; kind < ARRAY_SIZE(p->num_value_sites); kind++) { + unsigned int sites = p->num_value_sites[kind]; + + if (!sites) + continue; + + /* Record + site count array */ + size += prf_get_value_record_size(sites); + kinds++; + + if (!nodes) + continue; + + for (n = 0; n < sites; n++) { + u32 count = 0; + struct llvm_prf_value_node *site = nodes[s + n]; + + while (site && ++count <= U8_MAX) + site = site->next; + + size += count * + sizeof(struct llvm_prf_value_node_data); + } + + s += sites; + } + + if (size) + size += sizeof(struct llvm_prf_value_data); + + if (value_kinds) + *value_kinds = kinds; + + return size; +} + +static u32 prf_get_value_size(void) +{ + u32 size = 0; + struct llvm_prf_data *p; + + for (p = __llvm_prf_data_start; p < __llvm_prf_data_end; p++) + size += __prf_get_value_size(p, NULL); + + return size; +} + +/* Serialize the profiling's value. 
*/ +static void prf_serialize_value(struct llvm_prf_data *p, void **buffer) +{ + struct llvm_prf_value_data header; + struct llvm_prf_value_node **nodes = + (struct llvm_prf_value_node **)p->values; + unsigned int kind; + unsigned int n; + unsigned int s = 0; + + header.total_size = __prf_get_value_size(p, &header.num_value_kinds); + + if (!header.num_value_kinds) + /* Nothing to write. */ + return; + + prf_copy_to_buffer(buffer, &header, sizeof(header)); + + for (kind = 0; kind < ARRAY_SIZE(p->num_value_sites); kind++) { + struct llvm_prf_value_record *record; + u8 *counts; + unsigned int sites = p->num_value_sites[kind]; + + if (!sites) + continue; + + /* Profiling value record. */ + record = *(struct llvm_prf_value_record **)buffer; + *buffer += prf_get_value_record_header_size(); + + record->kind = kind; + record->num_value_sites = sites; + + /* Site count array. */ + counts = *(u8 **)buffer; + *buffer += prf_get_value_record_site_count_size(sites); + + /* + * If we don't have nodes, we can skip updating the site count + * array, because the buffer is zero filled. + */ + if (!nodes) + continue; + + for (n = 0; n < sites; n++) { + u32 count = 0; + struct llvm_prf_value_node *site = nodes[s + n]; + + while (site && ++count <= U8_MAX) { + prf_copy_to_buffer(buffer, site, + sizeof(struct llvm_prf_value_node_data)); + site = site->next; + } + + counts[n] = (u8)count; + } + + s += sites; + } +} + +static void prf_serialize_values(void **buffer) +{ + struct llvm_prf_data *p; + + for (p = __llvm_prf_data_start; p < __llvm_prf_data_end; p++) + prf_serialize_value(p, buffer); +} + +static inline unsigned long prf_get_padding(unsigned long size) +{ + return 7 & (sizeof(u64) - size % sizeof(u64)); +} + +static unsigned long prf_buffer_size(void) +{ + return sizeof(struct llvm_prf_header) + + prf_data_size() + + prf_cnts_size() + + prf_names_size() + + prf_get_padding(prf_names_size()) + + prf_get_value_size(); +} + +/* + * Serialize the profiling data into a format LLVM's tools can understand. + * Note: caller *must* hold pgo_lock. + */ +static int prf_serialize(struct prf_private_data *p) +{ + int err = 0; + void *buffer; + + p->size = prf_buffer_size(); + p->buffer = vzalloc(p->size); + + if (!p->buffer) { + err = -ENOMEM; + goto out; + } + + buffer = p->buffer; + + prf_fill_header(&buffer); + prf_copy_to_buffer(&buffer, __llvm_prf_data_start, prf_data_size()); + prf_copy_to_buffer(&buffer, __llvm_prf_cnts_start, prf_cnts_size()); + prf_copy_to_buffer(&buffer, __llvm_prf_names_start, prf_names_size()); + buffer += prf_get_padding(prf_names_size()); + + prf_serialize_values(&buffer); + +out: + return err; +} + +/* open() implementation for PGO. Creates a copy of the profiling data set. */ +static int prf_open(struct inode *inode, struct file *file) +{ + struct prf_private_data *data; + unsigned long flags; + int err; + + data = kzalloc(sizeof(*data), GFP_KERNEL); + if (!data) { + err = -ENOMEM; + goto out; + } + + flags = prf_lock(); + + err = prf_serialize(data); + if (unlikely(err)) { + kfree(data); + goto out_unlock; + } + + file->private_data = data; + +out_unlock: + prf_unlock(flags); +out: + return err; +} + +/* read() implementation for PGO. */ +static ssize_t prf_read(struct file *file, char __user *buf, size_t count, + loff_t *ppos) +{ + struct prf_private_data *data = file->private_data; + + BUG_ON(!data); + + return simple_read_from_buffer(buf, count, ppos, data->buffer, + data->size); +} + +/* release() implementation for PGO. Release resources allocated by open(). 
*/ +static int prf_release(struct inode *inode, struct file *file) +{ + struct prf_private_data *data = file->private_data; + + if (data) { + vfree(data->buffer); + kfree(data); + } + + return 0; +} + +static const struct file_operations prf_fops = { + .owner = THIS_MODULE, + .open = prf_open, + .read = prf_read, + .llseek = default_llseek, + .release = prf_release +}; + +/* write() implementation for resetting PGO's profile data. */ +static ssize_t reset_write(struct file *file, const char __user *addr, + size_t len, loff_t *pos) +{ + struct llvm_prf_data *data; + + memset(__llvm_prf_cnts_start, 0, prf_cnts_size()); + + for (data = __llvm_prf_data_start; data < __llvm_prf_data_end; data++) { + struct llvm_prf_value_node **vnodes; + u64 current_vsite_count; + u32 i; + + if (!data->values) + continue; + + current_vsite_count = 0; + vnodes = (struct llvm_prf_value_node **)data->values; + + for (i = LLVM_INSTR_PROF_IPVK_FIRST; i <= LLVM_INSTR_PROF_IPVK_LAST; i++) + current_vsite_count += data->num_value_sites[i]; + + for (i = 0; i < current_vsite_count; i++) { + struct llvm_prf_value_node *current_vnode = vnodes[i]; + + while (current_vnode) { + current_vnode->count = 0; + current_vnode = current_vnode->next; + } + } + } + + return len; +} + +static const struct file_operations prf_reset_fops = { + .owner = THIS_MODULE, + .write = reset_write, + .llseek = noop_llseek, +}; + +/* Create debugfs entries. */ +static int __init pgo_init(void) +{ + directory = debugfs_create_dir("pgo", NULL); + if (!directory) + goto err_remove; + + if (!debugfs_create_file("profraw", 0600, directory, NULL, + &prf_fops)) + goto err_remove; + + if (!debugfs_create_file("reset", 0200, directory, NULL, + &prf_reset_fops)) + goto err_remove; + + return 0; + +err_remove: + pr_err("initialization failed\n"); + return -EIO; +} + +/* Remove debugfs entries. */ +static void __exit pgo_exit(void) +{ + debugfs_remove_recursive(directory); +} + +module_init(pgo_init); +module_exit(pgo_exit); diff --git a/kernel/pgo/instrument.c b/kernel/pgo/instrument.c new file mode 100644 index 000000000000..62ff5cfce7b1 --- /dev/null +++ b/kernel/pgo/instrument.c @@ -0,0 +1,189 @@ +// SPDX-License-Identifier: GPL-2.0 +/* + * Copyright (C) 2019 Google, Inc. + * + * Author: + * Sami Tolvanen <samitolvanen@google.com> + * + * This software is licensed under the terms of the GNU General Public + * License version 2, as published by the Free Software Foundation, and + * may be copied, distributed, and modified under those terms. + * + * This program is distributed in the hope that it will be useful, + * but WITHOUT ANY WARRANTY; without even the implied warranty of + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the + * GNU General Public License for more details. + * + */ + +#define pr_fmt(fmt) "pgo: " fmt + +#include <linux/bitops.h> +#include <linux/kernel.h> +#include <linux/export.h> +#include <linux/spinlock.h> +#include <linux/types.h> +#include "pgo.h" + +/* + * This lock guards both profile count updating and serialization of the + * profiling data. Keeping both of these activities separate via locking + * ensures that we don't try to serialize data that's only partially updated. 
+ */ +static DEFINE_SPINLOCK(pgo_lock); +static int current_node; + +unsigned long prf_lock(void) +{ + unsigned long flags; + + spin_lock_irqsave(&pgo_lock, flags); + + return flags; +} + +void prf_unlock(unsigned long flags) +{ + spin_unlock_irqrestore(&pgo_lock, flags); +} + +/* + * Return a newly allocated profiling value node which contains the tracked + * value by the value profiler. + * Note: caller *must* hold pgo_lock. + */ +static struct llvm_prf_value_node *allocate_node(struct llvm_prf_data *p, + u32 index, u64 value) +{ + if (&__llvm_prf_vnds_start[current_node + 1] >= __llvm_prf_vnds_end) + return NULL; /* Out of nodes */ + + current_node++; + + /* Make sure the node is entirely within the section */ + if (&__llvm_prf_vnds_start[current_node] >= __llvm_prf_vnds_end || + &__llvm_prf_vnds_start[current_node + 1] > __llvm_prf_vnds_end) + return NULL; + + return &__llvm_prf_vnds_start[current_node]; +} + +/* + * Counts the number of times a target value is seen. + * + * Records the target value for the index if not seen before. Otherwise, + * increments the counter associated w/ the target value. + */ +void __llvm_profile_instrument_target(u64 target_value, void *data, u32 index); +void __llvm_profile_instrument_target(u64 target_value, void *data, u32 index) +{ + struct llvm_prf_data *p = (struct llvm_prf_data *)data; + struct llvm_prf_value_node **counters; + struct llvm_prf_value_node *curr; + struct llvm_prf_value_node *min = NULL; + struct llvm_prf_value_node *prev = NULL; + u64 min_count = U64_MAX; + u8 values = 0; + unsigned long flags; + + if (!p || !p->values) + return; + + counters = (struct llvm_prf_value_node **)p->values; + curr = counters[index]; + + while (curr) { + if (target_value == curr->value) { + curr->count++; + return; + } + + if (curr->count < min_count) { + min_count = curr->count; + min = curr; + } + + prev = curr; + curr = curr->next; + values++; + } + + if (values >= LLVM_INSTR_PROF_MAX_NUM_VAL_PER_SITE) { + if (!min->count || !(--min->count)) { + curr = min; + curr->value = target_value; + curr->count++; + } + return; + } + + /* Lock when updating the value node structure. */ + flags = prf_lock(); + + curr = allocate_node(p, index, target_value); + if (!curr) + goto out; + + curr->value = target_value; + curr->count++; + + if (!counters[index]) + counters[index] = curr; + else if (prev && !prev->next) + prev->next = curr; + +out: + prf_unlock(flags); +} +EXPORT_SYMBOL(__llvm_profile_instrument_target); + +/* Counts the number of times a range of targets values are seen. */ +void __llvm_profile_instrument_range(u64 target_value, void *data, + u32 index, s64 precise_start, + s64 precise_last, s64 large_value); +void __llvm_profile_instrument_range(u64 target_value, void *data, + u32 index, s64 precise_start, + s64 precise_last, s64 large_value) +{ + if (large_value != S64_MIN && (s64)target_value >= large_value) + target_value = large_value; + else if ((s64)target_value < precise_start || + (s64)target_value > precise_last) + target_value = precise_last + 1; + + __llvm_profile_instrument_target(target_value, data, index); +} +EXPORT_SYMBOL(__llvm_profile_instrument_range); + +static u64 inst_prof_get_range_rep_value(u64 value) +{ + if (value <= 8) + /* The first ranges are individually tracked, use it as is. */ + return value; + else if (value >= 513) + /* The last range is mapped to its lowest value. */ + return 513; + else if (hweight64(value) == 1) + /* If it's a power of two, use it as is. 
*/ + return value; + + /* Otherwise, take to the previous power of two + 1. */ + return (1 << (64 - __builtin_clzll(value) - 1)) + 1; +} + +/* + * The target values are partitioned into multiple ranges. The range spec is + * defined in compiler-rt/include/profile/InstrProfData.inc. + */ +void __llvm_profile_instrument_memop(u64 target_value, void *data, + u32 counter_index); +void __llvm_profile_instrument_memop(u64 target_value, void *data, + u32 counter_index) +{ + u64 rep_value; + + /* Map the target value to the representative value of its range. */ + rep_value = inst_prof_get_range_rep_value(target_value); + __llvm_profile_instrument_target(rep_value, data, counter_index); +} +EXPORT_SYMBOL(__llvm_profile_instrument_memop); diff --git a/kernel/pgo/pgo.h b/kernel/pgo/pgo.h new file mode 100644 index 000000000000..ddc8d3002fe5 --- /dev/null +++ b/kernel/pgo/pgo.h @@ -0,0 +1,203 @@ +/* SPDX-License-Identifier: GPL-2.0 */ +/* + * Copyright (C) 2019 Google, Inc. + * + * Author: + * Sami Tolvanen <samitolvanen@google.com> + * + * This software is licensed under the terms of the GNU General Public + * License version 2, as published by the Free Software Foundation, and + * may be copied, distributed, and modified under those terms. + * + * This program is distributed in the hope that it will be useful, + * but WITHOUT ANY WARRANTY; without even the implied warranty of + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the + * GNU General Public License for more details. + * + */ + +#ifndef _PGO_H +#define _PGO_H + +/* + * Note: These internal LLVM definitions must match the compiler version. + * See llvm/include/llvm/ProfileData/InstrProfData.inc in LLVM's source code. + */ + +#define LLVM_INSTR_PROF_RAW_MAGIC_64 \ + ((u64)255 << 56 | \ + (u64)'l' << 48 | \ + (u64)'p' << 40 | \ + (u64)'r' << 32 | \ + (u64)'o' << 24 | \ + (u64)'f' << 16 | \ + (u64)'r' << 8 | \ + (u64)129) +#define LLVM_INSTR_PROF_RAW_MAGIC_32 \ + ((u64)255 << 56 | \ + (u64)'l' << 48 | \ + (u64)'p' << 40 | \ + (u64)'r' << 32 | \ + (u64)'o' << 24 | \ + (u64)'f' << 16 | \ + (u64)'R' << 8 | \ + (u64)129) + +#define LLVM_INSTR_PROF_RAW_VERSION 5 +#define LLVM_INSTR_PROF_DATA_ALIGNMENT 8 +#define LLVM_INSTR_PROF_IPVK_FIRST 0 +#define LLVM_INSTR_PROF_IPVK_LAST 1 +#define LLVM_INSTR_PROF_MAX_NUM_VAL_PER_SITE 255 + +#define LLVM_VARIANT_MASK_IR_PROF (0x1ULL << 56) +#define LLVM_VARIANT_MASK_CSIR_PROF (0x1ULL << 57) + +/** + * struct llvm_prf_header - represents the raw profile header data structure. + * @magic: the magic token for the file format. + * @version: the version of the file format. + * @data_size: the number of entries in the profile data section. + * @padding_bytes_before_counters: the number of padding bytes before the + * counters. + * @counters_size: the size in bytes of the LLVM profile section containing the + * counters. + * @padding_bytes_after_counters: the number of padding bytes after the + * counters. + * @names_size: the size in bytes of the LLVM profile section containing the + * counters' names. + * @counters_delta: the beginning of the LLMV profile counters section. + * @names_delta: the beginning of the LLMV profile names section. + * @value_kind_last: the last profile value kind. 
+ */ +struct llvm_prf_header { + u64 magic; + u64 version; + u64 data_size; + u64 padding_bytes_before_counters; + u64 counters_size; + u64 padding_bytes_after_counters; + u64 names_size; + u64 counters_delta; + u64 names_delta; + u64 value_kind_last; +}; + +/** + * struct llvm_prf_data - represents the per-function control structure. + * @name_ref: the reference to the function's name. + * @func_hash: the hash value of the function. + * @counter_ptr: a pointer to the profile counter. + * @function_ptr: a pointer to the function. + * @values: the profiling values associated with this function. + * @num_counters: the number of counters in the function. + * @num_value_sites: the number of value profile sites. + */ +struct llvm_prf_data { + const u64 name_ref; + const u64 func_hash; + const void *counter_ptr; + const void *function_ptr; + void *values; + const u32 num_counters; + const u16 num_value_sites[LLVM_INSTR_PROF_IPVK_LAST + 1]; +} __aligned(LLVM_INSTR_PROF_DATA_ALIGNMENT); + +/** + * structure llvm_prf_value_node_data - represents the data part of the struct + * llvm_prf_value_node data structure. + * @value: the value counters. + * @count: the counters' count. + */ +struct llvm_prf_value_node_data { + u64 value; + u64 count; +}; + +/** + * struct llvm_prf_value_node - represents an internal data structure used by + * the value profiler. + * @value: the value counters. + * @count: the counters' count. + * @next: the next value node. + */ +struct llvm_prf_value_node { + u64 value; + u64 count; + struct llvm_prf_value_node *next; +}; + +/** + * struct llvm_prf_value_data - represents the value profiling data in indexed + * format. + * @total_size: the total size in bytes including this field. + * @num_value_kinds: the number of value profile kinds that has value profile + * data. + */ +struct llvm_prf_value_data { + u32 total_size; + u32 num_value_kinds; +}; + +/** + * struct llvm_prf_value_record - represents the on-disk layout of the value + * profile data of a particular kind for one function. + * @kind: the kind of the value profile record. + * @num_value_sites: the number of value profile sites. + * @site_count_array: the first element of the array that stores the number + * of profiled values for each value site. 
+ */ +struct llvm_prf_value_record { + u32 kind; + u32 num_value_sites; + u8 site_count_array[]; +}; + +#define prf_get_value_record_header_size() \ + offsetof(struct llvm_prf_value_record, site_count_array) +#define prf_get_value_record_site_count_size(sites) \ + roundup((sites), 8) +#define prf_get_value_record_size(sites) \ + (prf_get_value_record_header_size() + \ + prf_get_value_record_site_count_size((sites))) + +/* Data sections */ +extern struct llvm_prf_data __llvm_prf_data_start[]; +extern struct llvm_prf_data __llvm_prf_data_end[]; + +extern u64 __llvm_prf_cnts_start[]; +extern u64 __llvm_prf_cnts_end[]; + +extern char __llvm_prf_names_start[]; +extern char __llvm_prf_names_end[]; + +extern struct llvm_prf_value_node __llvm_prf_vnds_start[]; +extern struct llvm_prf_value_node __llvm_prf_vnds_end[]; + +/* Locking for vnodes */ +extern unsigned long prf_lock(void); +extern void prf_unlock(unsigned long flags); + +#define __DEFINE_PRF_SIZE(s) \ + static inline unsigned long prf_ ## s ## _size(void) \ + { \ + unsigned long start = \ + (unsigned long)__llvm_prf_ ## s ## _start; \ + unsigned long end = \ + (unsigned long)__llvm_prf_ ## s ## _end; \ + return roundup(end - start, \ + sizeof(__llvm_prf_ ## s ## _start[0])); \ + } \ + static inline unsigned long prf_ ## s ## _count(void) \ + { \ + return prf_ ## s ## _size() / \ + sizeof(__llvm_prf_ ## s ## _start[0]); \ + } + +__DEFINE_PRF_SIZE(data); +__DEFINE_PRF_SIZE(cnts); +__DEFINE_PRF_SIZE(names); +__DEFINE_PRF_SIZE(vnds); + +#undef __DEFINE_PRF_SIZE + +#endif /* _PGO_H */ diff --git a/scripts/Makefile.lib b/scripts/Makefile.lib index eee59184de64..48a65d092c5b 100644 --- a/scripts/Makefile.lib +++ b/scripts/Makefile.lib @@ -139,6 +139,16 @@ _c_flags += $(if $(patsubst n%,, \ $(CFLAGS_GCOV)) endif +# +# Enable clang's PGO profiling flags for a file or directory depending on +# variables PGO_PROFILE_obj.o and PGO_PROFILE. +# +ifeq ($(CONFIG_PGO_CLANG),y) +_c_flags += $(if $(patsubst n%,, \ + $(PGO_PROFILE_$(basetarget).o)$(PGO_PROFILE)y), \ + $(CFLAGS_PGO_CLANG)) +endif + # # Enable address sanitizer flags for kernel except some files or directories # we don't want to check (depends on variables KASAN_SANITIZE_obj.o, KASAN_SANITIZE)
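To make the scripts/Makefile.lib hunk above a bit more concrete (the object
name below is made up, not from the patch): the concatenation
$(PGO_PROFILE_obj.o)$(PGO_PROFILE)y is non-empty unless it begins with "n",
so once CONFIG_PGO_CLANG=y profiling is applied by default and only an
explicit ":= n" per object or per directory strips the flag. A tiny
stand-alone Makefile sketch of that evaluation:

# Stand-alone mock of the PGO_PROFILE logic; "foo.o" is a hypothetical object.
CONFIG_PGO_CLANG := y
CFLAGS_PGO_CLANG := -fprofile-generate
basetarget := foo

# Per-object opt-out; comment this out and _c_flags gains -fprofile-generate.
PGO_PROFILE_foo.o := n

ifeq ($(CONFIG_PGO_CLANG),y)
_c_flags += $(if $(patsubst n%,, \
		$(PGO_PROFILE_$(basetarget).o)$(PGO_PROFILE)y), \
		$(CFLAGS_PGO_CLANG))
endif

show:
	@echo "_c_flags = $(_c_flags)"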