Context                        | Check   | Description
-------------------------------+---------+------------------------------------------------------------
bpf/vmtest-bpf-next-VM_Test-24 | success | Logs for x86_64-gcc / build-release
bpf/vmtest-bpf-next-VM_Test-6  | success | Logs for aarch64-gcc / build-release
bpf/vmtest-bpf-next-VM_Test-13 | success | Logs for s390x-gcc / GCC BPF
bpf/vmtest-bpf-next-VM_Test-14 | success | Logs for s390x-gcc / build / build for s390x with gcc
bpf/vmtest-bpf-next-VM_Test-19 | success | Logs for s390x-gcc / veristat-kernel
bpf/vmtest-bpf-next-VM_Test-0  | success | Logs for Lint
bpf/vmtest-bpf-next-VM_Test-23 | fail    | Logs for x86_64-gcc / build / build for x86_64 with gcc
bpf/vmtest-bpf-next-VM_Test-2  | success | Logs for Unittests
bpf/vmtest-bpf-next-VM_Test-3  | success | Logs for Validate matrix.py
bpf/vmtest-bpf-next-VM_Test-20 | success | Logs for s390x-gcc / veristat-meta
bpf/vmtest-bpf-next-VM_Test-12 | success | Logs for aarch64-gcc / veristat-meta
bpf/vmtest-bpf-next-VM_Test-11 | success | Logs for aarch64-gcc / veristat-kernel
bpf/vmtest-bpf-next-VM_Test-22 | success | Logs for x86_64-gcc / GCC BPF
bpf/vmtest-bpf-next-VM_Test-18 | success | Logs for s390x-gcc / test (test_verifier, false, 360) / test_verifier on s390x with gcc
bpf/vmtest-bpf-next-VM_Test-15 | success | Logs for s390x-gcc / build-release
bpf/vmtest-bpf-next-VM_Test-5  | success | Logs for aarch64-gcc / build / build for aarch64 with gcc
bpf/vmtest-bpf-next-VM_Test-26 | success | Logs for x86_64-gcc / veristat-kernel
bpf/vmtest-bpf-next-VM_Test-21 | success | Logs for set-matrix
bpf/vmtest-bpf-next-VM_Test-1  | success | Logs for ShellCheck
bpf/vmtest-bpf-next-VM_Test-25 | success | Logs for x86_64-gcc / test
bpf/vmtest-bpf-next-VM_Test-27 | success | Logs for x86_64-gcc / veristat-meta
bpf/vmtest-bpf-next-VM_Test-28 | success | Logs for x86_64-llvm-17 / GCC BPF
bpf/vmtest-bpf-next-VM_Test-29 | fail    | Logs for x86_64-llvm-17 / build / build for x86_64 with llvm-17
bpf/vmtest-bpf-next-VM_Test-30 | fail    | Logs for x86_64-llvm-17 / build-release / build for x86_64 with llvm-17-O2
bpf/vmtest-bpf-next-VM_Test-31 | success | Logs for x86_64-llvm-17 / test
bpf/vmtest-bpf-next-VM_Test-32 | success | Logs for x86_64-llvm-17 / veristat-kernel
bpf/vmtest-bpf-next-VM_Test-33 | success | Logs for x86_64-llvm-17 / veristat-meta
bpf/vmtest-bpf-next-VM_Test-34 | success | Logs for x86_64-llvm-18 / GCC BPF
bpf/vmtest-bpf-next-VM_Test-35 | fail    | Logs for x86_64-llvm-18 / build / build for x86_64 with llvm-18
bpf/vmtest-bpf-next-VM_Test-36 | fail    | Logs for x86_64-llvm-18 / build-release / build for x86_64 with llvm-18-O2
bpf/vmtest-bpf-next-VM_Test-38 | success | Logs for x86_64-llvm-18 / veristat-kernel
bpf/vmtest-bpf-next-VM_Test-37 | success | Logs for x86_64-llvm-18 / test
bpf/vmtest-bpf-next-VM_Test-4  | success | Logs for aarch64-gcc / GCC BPF
bpf/vmtest-bpf-next-VM_Test-39 | success | Logs for x86_64-llvm-18 / veristat-meta
bpf/vmtest-bpf-next-VM_Test-7  | success | Logs for aarch64-gcc / test (test_maps, false, 360) / test_maps on aarch64 with gcc
bpf/vmtest-bpf-next-VM_Test-10 | success | Logs for aarch64-gcc / test (test_verifier, false, 360) / test_verifier on aarch64 with gcc
bpf/vmtest-bpf-next-VM_Test-17 | success | Logs for s390x-gcc / test (test_progs_no_alu32, false, 360) / test_progs_no_alu32 on s390x with gcc
bpf/vmtest-bpf-next-VM_Test-16 | success | Logs for s390x-gcc / test (test_progs, false, 360) / test_progs on s390x with gcc
bpf/vmtest-bpf-next-VM_Test-8  | success | Logs for aarch64-gcc / test (test_progs, false, 360) / test_progs on aarch64 with gcc
bpf/vmtest-bpf-next-PR         | fail    | PR summary
bpf/vmtest-bpf-next-VM_Test-9  | success | Logs for aarch64-gcc / test (test_progs_no_alu32, false, 360) / test_progs_no_alu32 on aarch64 with gcc
netdev/series_format           | success | Posting correctly formatted
netdev/tree_selection          | success | Clearly marked for bpf-next
netdev/ynl                     | success | Generated files up to date; no warnings/errors; no diff in generated
netdev/fixes_present           | success | Fixes tag not required for -next series
netdev/header_inline           | success | No static functions without inline keyword in header files
netdev/build_32bit             | success | Errors and warnings before: 0 this patch: 0
netdev/build_tools             | success | Errors and warnings before: 26 (+0) this patch: 26 (+0)
netdev/cc_maintainers          | warning | 2 maintainers not CCed: justinstitt@google.com dan.j.williams@intel.com
netdev/build_clang             | success | Errors and warnings before: 1 this patch: 1
netdev/verify_signedoff        | success | Signed-off-by tag matches author and committer
netdev/deprecated_api          | success | None detected
netdev/check_selftest          | success | No net selftest shell script
netdev/verify_fixes            | success | No Fixes tag
netdev/build_allmodconfig_warn | fail    | Errors and warnings before: 16 this patch: 16
netdev/checkpatch              | warning | CHECK: Comparison to NULL could be written "!mds"
                               |         | CHECK: Please don't use multiple blank lines
                               |         | WARNING: From:/Signed-off-by: email address mismatch: 'From: Menglong Dong <menglong8.dong@gmail.com>' != 'Signed-off-by: Menglong Dong <dongml2@chinatelecom.cn>'
                               |         | WARNING: added, moved or deleted file(s), does MAINTAINERS need updating?
netdev/build_clang_rust        | success | No Rust files in patch. Skipping build
netdev/kdoc                    | success | Errors and warnings before: 0 this patch: 0
netdev/source_inline           | success | Was 0 now: 0
|
diff --git a/include/linux/kfunc_md.h b/include/linux/kfunc_md.h
new file mode 100644
--- /dev/null
+++ b/include/linux/kfunc_md.h
@@ -0,0 +1,25 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+#ifndef _LINUX_KFUNC_MD_H
+#define _LINUX_KFUNC_MD_H
+
+#include <linux/kernel.h>
+
+struct kfunc_md {
+ int users;
+ /* this field is reserved for later use; for now it keeps the
+ * struct 8-byte aligned.
+ */
+ int pad0;
+ void *func;
+};
+
+extern struct kfunc_md __rcu *kfunc_mds;
+
+struct kfunc_md *kfunc_md_find(void *ip);
+struct kfunc_md *kfunc_md_get(void *ip);
+void kfunc_md_put(struct kfunc_md *md);
+void kfunc_md_put_by_ip(void *ip);
+void kfunc_md_lock(void);
+void kfunc_md_unlock(void);
+
+#endif
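
As a quick orientation, here is a minimal usage sketch of the API declared above, based only on this patch. The caller and the function pointer are hypothetical stand-ins for a subsystem and a kernel function with the required padding; per the comments in kfunc_md.c, kfunc_md_get() must run under kfunc_md_lock(), while kfunc_md_put() takes the mutex internally.

	#include <linux/kfunc_md.h>

	static int attach_metadata(void *func)	/* hypothetical caller */
	{
		struct kfunc_md *md;

		kfunc_md_lock();
		md = kfunc_md_get(func);	/* find or create, takes a reference */
		kfunc_md_unlock();
		if (!md)
			return -ENOMEM;

		/* ... store per-function data in md ... */

		kfunc_md_put(md);		/* locks kfunc_md_mutex internally */
		return 0;
	}
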
diff --git a/kernel/Makefile b/kernel/Makefile
--- a/kernel/Makefile
+++ b/kernel/Makefile
@@ -108,6 +108,7 @@ obj-$(CONFIG_TRACE_CLOCK) += trace/
obj-$(CONFIG_RING_BUFFER) += trace/
obj-$(CONFIG_TRACEPOINTS) += trace/
obj-$(CONFIG_RETHOOK) += trace/
+obj-$(CONFIG_FUNCTION_METADATA) += trace/
obj-$(CONFIG_IRQ_WORK) += irq_work.o
obj-$(CONFIG_CPU_PM) += cpu_pm.o
obj-$(CONFIG_BPF) += bpf/
diff --git a/kernel/trace/Makefile b/kernel/trace/Makefile
--- a/kernel/trace/Makefile
+++ b/kernel/trace/Makefile
@@ -106,6 +106,7 @@ obj-$(CONFIG_FTRACE_RECORD_RECURSION) += trace_recursion_record.o
obj-$(CONFIG_FPROBE) += fprobe.o
obj-$(CONFIG_RETHOOK) += rethook.o
obj-$(CONFIG_FPROBE_EVENTS) += trace_fprobe.o
+obj-$(CONFIG_FUNCTION_METADATA) += kfunc_md.o
obj-$(CONFIG_TRACEPOINT_BENCHMARK) += trace_benchmark.o
obj-$(CONFIG_RV) += rv/
diff --git a/kernel/trace/kfunc_md.c b/kernel/trace/kfunc_md.c
new file mode 100644
--- /dev/null
+++ b/kernel/trace/kfunc_md.c
@@ -0,0 +1,239 @@
+// SPDX-License-Identifier: GPL-2.0
+
+#include <linux/slab.h>
+#include <linux/memory.h>
+#include <linux/rcupdate.h>
+#include <linux/ftrace.h>
+#include <linux/kfunc_md.h>
+
+#define ENTRIES_PER_PAGE (PAGE_SIZE / sizeof(struct kfunc_md))
+
+static u32 kfunc_md_count = ENTRIES_PER_PAGE, kfunc_md_used;
+struct kfunc_md __rcu *kfunc_mds;
+EXPORT_SYMBOL_GPL(kfunc_mds);
+
+static DEFINE_MUTEX(kfunc_md_mutex);
+
+
+void kfunc_md_unlock(void)
+{
+ mutex_unlock(&kfunc_md_mutex);
+}
+EXPORT_SYMBOL_GPL(kfunc_md_unlock);
+
+void kfunc_md_lock(void)
+{
+ mutex_lock(&kfunc_md_mutex);
+}
+EXPORT_SYMBOL_GPL(kfunc_md_lock);
+
+static u32 kfunc_md_get_index(void *ip)
+{
+ return *(u32 *)(ip - KFUNC_MD_DATA_OFFSET);
+}
+
+static void kfunc_md_init(struct kfunc_md *mds, u32 start, u32 end)
+{
+ u32 i;
+
+ for (i = start; i < end; i++)
+ mds[i].users = 0;
+}
+
+static int kfunc_md_page_order(void)
+{
+ return fls(DIV_ROUND_UP(kfunc_md_count, ENTRIES_PER_PAGE)) - 1;
+}
+
+/* Get the next usable function metadata. On success, return the
+ * kfunc_md and store its index in *index. If no usable kfunc_md is
+ * found in kfunc_mds, a larger array will be allocated.
+ */
+static struct kfunc_md *kfunc_md_get_next(u32 *index)
+{
+ struct kfunc_md *new_mds, *mds;
+ u32 i, order;
+
+ mds = rcu_dereference(kfunc_mds);
+ if (!mds) {
+ order = kfunc_md_page_order();
+ new_mds = (void *)__get_free_pages(GFP_KERNEL, order);
+ if (!new_mds)
+ return NULL;
+ kfunc_md_init(new_mds, 0, kfunc_md_count);
+ /* This is the first initialization of kfunc_mds, so it is
+ * not used anywhere yet and we can update it directly.
+ */
+ rcu_assign_pointer(kfunc_mds, new_mds);
+ mds = new_mds;
+ }
+
+ if (likely(kfunc_md_used < kfunc_md_count)) {
+ /* maybe we could manage the used function metadata entries
+ * with a bitmap?
+ */
+ for (i = 0; i < kfunc_md_count; i++) {
+ if (!mds[i].users) {
+ kfunc_md_used++;
+ *index = i;
+ mds[i].users++;
+ return mds + i;
+ }
+ }
+ }
+
+ order = kfunc_md_page_order();
+ /* no available function metadata, so allocate a bigger function
+ * metadata array.
+ */
+ new_mds = (void *)__get_free_pages(GFP_KERNEL, order + 1);
+ if (!new_mds)
+ return NULL;
+
+ memcpy(new_mds, mds, kfunc_md_count * sizeof(*new_mds));
+ kfunc_md_init(new_mds, kfunc_md_count, kfunc_md_count * 2);
+
+ rcu_assign_pointer(kfunc_mds, new_mds);
+ synchronize_rcu();
+ free_pages((unsigned long)mds, order);
+
+ mds = new_mds + kfunc_md_count;
+ *index = kfunc_md_count;
+ kfunc_md_count <<= 1;
+ kfunc_md_used++;
+ mds->users++;
+
+ return mds;
+}
+
+static int kfunc_md_text_poke(void *ip, void *insn, void *nop)
+{
+ void *target;
+ int ret = 0;
+ u8 *prog;
+
+ target = ip - KFUNC_MD_INSN_OFFSET;
+ mutex_lock(&text_mutex);
+ if (insn) {
+ if (!memcmp(target, insn, KFUNC_MD_INSN_SIZE))
+ goto out;
+
+ if (memcmp(target, nop, KFUNC_MD_INSN_SIZE)) {
+ ret = -EBUSY;
+ goto out;
+ }
+ prog = insn;
+ } else {
+ if (!memcmp(target, nop, KFUNC_MD_INSN_SIZE))
+ goto out;
+ prog = nop;
+ }
+
+ ret = kfunc_md_arch_poke(target, prog);
+out:
+ mutex_unlock(&text_mutex);
+ return ret;
+}
+
+static bool __kfunc_md_put(struct kfunc_md *md)
+{
+ u8 nop_insn[KFUNC_MD_INSN_SIZE];
+
+ if (WARN_ON_ONCE(md->users <= 0))
+ return false;
+
+ md->users--;
+ if (md->users > 0)
+ return false;
+
+ if (!kfunc_md_arch_exist(md->func))
+ return false;
+
+ kfunc_md_arch_nops(nop_insn);
+ /* release the metadata by restoring the function padding to NOPs */
+ kfunc_md_text_poke(md->func, NULL, nop_insn);
+ /* TODO: we need a way to shrink the array "kfunc_mds" */
+ kfunc_md_used--;
+
+ return true;
+}
+
+/* Decrease the reference count of the md, and release it once it drops to 0 */
+void kfunc_md_put(struct kfunc_md *md)
+{
+ mutex_lock(&kfunc_md_mutex);
+ __kfunc_md_put(md);
+ mutex_unlock(&kfunc_md_mutex);
+}
+EXPORT_SYMBOL_GPL(kfunc_md_put);
+
+/* Look up existing metadata by the function address; NULL is returned
+ * if no metadata exists.
+ *
+ * NOTE: the RCU read lock should be held while reading the metadata,
+ * and kfunc_md_lock should be held if any writing happens.
+ */
+struct kfunc_md *kfunc_md_find(void *ip)
+{
+ struct kfunc_md *md;
+ u32 index;
+
+ if (kfunc_md_arch_exist(ip)) {
+ index = kfunc_md_get_index(ip);
+ if (WARN_ON_ONCE(index >= kfunc_md_count))
+ return NULL;
+
+ md = rcu_dereference(kfunc_mds) + index;
+ return md;
+ }
+ return NULL;
+}
+EXPORT_SYMBOL_GPL(kfunc_md_find);
+
+void kfunc_md_put_by_ip(void *ip)
+{
+ struct kfunc_md *md;
+
+ mutex_lock(&kfunc_md_mutex);
+ md = kfunc_md_find(ip);
+ if (md)
+ __kfunc_md_put(md);
+ mutex_unlock(&kfunc_md_mutex);
+}
+EXPORT_SYMBOL_GPL(kfunc_md_put_by_ip);
+
+/* Get existing metadata by the function address, creating it if it
+ * does not exist yet. The metadata's reference count is increased by 1.
+ *
+ * NOTE: always call this function with kfunc_md_lock held, and hold
+ * kfunc_md_lock for any update to the metadata as well.
+ */
+struct kfunc_md *kfunc_md_get(void *ip)
+{
+ u8 nop_insn[KFUNC_MD_INSN_SIZE], insn[KFUNC_MD_INSN_SIZE];
+ struct kfunc_md *md;
+ u32 index;
+
+ md = kfunc_md_find(ip);
+ if (md) {
+ md->users++;
+ return md;
+ }
+
+ md = kfunc_md_get_next(&index);
+ if (!md)
+ return NULL;
+
+ kfunc_md_arch_pretend(insn, index);
+ kfunc_md_arch_nops(nop_insn);
+
+ if (kfunc_md_text_poke(ip, insn, nop_insn)) {
+ kfunc_md_used--;
+ md->users = 0;
+ return NULL;
+ }
+ md->func = ip;
+
+ return md;
+}
+EXPORT_SYMBOL_GPL(kfunc_md_get);
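
On the read side, the synchronize_rcu() in kfunc_md_get_next() pairs with the locking contract documented on kfunc_md_find(): the backing array can be swapped out and freed after a grace period, so a lookup must stay inside an RCU read-side critical section for as long as the returned pointer is used. A minimal reader sketch (the helper name is hypothetical):

	static void *kfunc_md_func_of(void *ip)
	{
		struct kfunc_md *md;
		void *func = NULL;

		rcu_read_lock();
		md = kfunc_md_find(ip);
		if (md)
			func = md->func;
		rcu_read_unlock();

		return func;
	}
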
For now, there is no way to set and get per-function metadata with low overhead, which is inconvenient in some situations. Take the BPF trampoline for example: we need to create a trampoline for each kernel function, as we have to store some information about the function in the trampoline, such as the attached BPF progs, the function argument count, etc. Creating all these trampolines increases both performance overhead and memory consumption. With support for per-function metadata storage, we can keep this information in the metadata instead and create one global BPF trampoline for all kernel functions. In the global trampoline, we get the information we need from the function metadata through the ip (function address) with almost no overhead.

Another beneficiary can be ftrace. For now, all kernel functions that are enabled by dynamic ftrace are added to a filter hash if there is more than one callback, and a hash lookup happens whenever a traced function is called, which has an impact on performance; see __ftrace_ops_list_func() -> ftrace_ops_test(). With per-function metadata support, we can instead store in the metadata whether the callback is enabled for a given kernel function.

Support per-function metadata storage in the function padding; previous discussion can be found in [1]. Generally speaking, we have two ways to implement this feature:

1. Create a function metadata array, and prepend an instruction that holds the index of the function's metadata in that array, storing the instruction in the function padding.

2. Allocate the function metadata with kmalloc(), and prepend an instruction that holds the pointer to the metadata, storing the instruction in the function padding.

Compared with way 2, way 1 consumes less space, but we need to do more work to manage the global function metadata array. This series implements way 1.

Link: https://lore.kernel.org/bpf/CADxym3anLzM6cAkn_z71GDd_VeKiqqk1ts=xuiP7pr4PO6USPA@mail.gmail.com/ [1]
Signed-off-by: Menglong Dong <dongml2@chinatelecom.cn>
---
v2:
- add support for arm64
- split out arch-relevant code
- refactor the commit log
---
 include/linux/kfunc_md.h |  25 ++++
 kernel/Makefile          |   1 +
 kernel/trace/Makefile    |   1 +
 kernel/trace/kfunc_md.c  | 239 +++++++++++++++++++++++++++
 4 files changed, 266 insertions(+)
 create mode 100644 include/linux/kfunc_md.h
 create mode 100644 kernel/trace/kfunc_md.c
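
To make the trade-off between the two approaches concrete, here is a schematic of the lookup in way 1, as implemented by kfunc_md_get_index() in this patch; KFUNC_MD_DATA_OFFSET is the arch-defined distance from the function entry to the stored value, and the way-2 variant is a sketch of the approach not taken:

	/* way 1 (this series): the padding holds a u32 index into the
	 * global array, costing 4 bytes of padding per function.
	 */
	u32 index = *(u32 *)(ip - KFUNC_MD_DATA_OFFSET);
	struct kfunc_md *md = rcu_dereference(kfunc_mds) + index;

	/* way 2 (not chosen) would store the kmalloc()'ed pointer itself,
	 * costing 8 bytes of padding on 64-bit:
	 *
	 *	struct kfunc_md *md = *(struct kfunc_md **)(ip - KFUNC_MD_DATA_OFFSET);
	 */
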