From patchwork Mon Mar 3 06:53:43 2025
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Menglong Dong
X-Patchwork-Id: 13998252
From: Menglong Dong
To: peterz@infradead.org, rostedt@goodmis.org, mark.rutland@arm.com,
 alexei.starovoitov@gmail.com
Cc: catalin.marinas@arm.com, will@kernel.org, mhiramat@kernel.org,
 tglx@linutronix.de, mingo@redhat.com, bp@alien8.de,
 dave.hansen@linux.intel.com, x86@kernel.org, hpa@zytor.com, ast@kernel.org,
 daniel@iogearbox.net, andrii@kernel.org, martin.lau@linux.dev,
 eddyz87@gmail.com, yonghong.song@linux.dev, john.fastabend@gmail.com,
 kpsingh@kernel.org, sdf@fomichev.me, jolsa@kernel.org, davem@davemloft.net,
 dsahern@kernel.org, mathieu.desnoyers@efficios.com, nathan@kernel.org,
 nick.desaulniers+lkml@gmail.com, morbo@google.com, samitolvanen@google.com,
 kees@kernel.org, dongml2@chinatelecom.cn, akpm@linux-foundation.org,
 riel@surriel.com, rppt@kernel.org, linux-arm-kernel@lists.infradead.org,
 linux-kernel@vger.kernel.org, linux-trace-kernel@vger.kernel.org,
 bpf@vger.kernel.org, netdev@vger.kernel.org, llvm@lists.linux.dev
Subject: [PATCH bpf-next v3 2/4] add per-function metadata storage support
Date: Mon, 3 Mar 2025 14:53:43 +0800
Message-Id: <20250303065345.229298-3-dongml2@chinatelecom.cn>
X-Mailer: git-send-email 2.39.5
In-Reply-To: <20250303065345.229298-1-dongml2@chinatelecom.cn>
References: <20250303065345.229298-1-dongml2@chinatelecom.cn>

Currently there is no low-overhead way to set and get per-function
metadata, which is inconvenient in some situations. Take the BPF
trampoline as an example: we need to create a trampoline for each kernel
function, because some information about the function, such as the BPF
progs and the function arg count, has to be stored in the trampoline.
Creating these trampolines can increase both the performance overhead and
the memory consumption.

With per-function metadata storage, we can store this information in the
metadata and create a single global BPF trampoline for all kernel
functions. In the global trampoline, we get the information that we need
from the function metadata through the ip (function address) with almost
no overhead.

Another beneficiary can be ftrace. Currently, all the kernel functions that
are enabled by dynamic ftrace are added to a filter hash if there is more
than one callback. A hash lookup then happens whenever a traced function is
called, which has an impact on performance; see
__ftrace_ops_list_func() -> ftrace_ops_test(). With per-function metadata
support, we can store in the metadata whether the callback is enabled on
the kernel function.

Support per-function metadata storage in the function padding. The previous
discussion can be found in [1]. Generally speaking, there are two ways to
implement this feature:

1. Create a function metadata array, and prepend an insn that holds the
   index of the function metadata in the array. Store the insn in the
   function padding.
2. Allocate the function metadata with kmalloc(), and prepend an insn that
   holds the pointer to the metadata. Store the insn in the function
   padding.

Compared with way 2, way 1 consumes less space, but requires more work to
manage the global function metadata array. This patch implements way 1.

Link: https://lore.kernel.org/bpf/CADxym3anLzM6cAkn_z71GDd_VeKiqqk1ts=xuiP7pr4PO6USPA@mail.gmail.com/ [1]
Signed-off-by: Menglong Dong
---
v2:
- add support for arm64
- split out arch relevant code
- refactor the commit log
---
 include/linux/kfunc_md.h |  25 ++++
 kernel/Makefile          |   1 +
 kernel/trace/Makefile    |   1 +
 kernel/trace/kfunc_md.c  | 239 +++++++++++++++++++++++++++++++++++++++
 4 files changed, 266 insertions(+)
 create mode 100644 include/linux/kfunc_md.h
 create mode 100644 kernel/trace/kfunc_md.c

diff --git a/include/linux/kfunc_md.h b/include/linux/kfunc_md.h
new file mode 100644
index 000000000000..df616f0fcb36
--- /dev/null
+++ b/include/linux/kfunc_md.h
@@ -0,0 +1,25 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+#ifndef _LINUX_KFUNC_MD_H
+#define _LINUX_KFUNC_MD_H
+
+#include
+
+struct kfunc_md {
+	int users;
+	/* we can use this field later, make sure it is 8-bytes aligned
+	 * for now.
+	 */
+	int pad0;
+	void *func;
+};
+
+extern struct kfunc_md *kfunc_mds;
+
+struct kfunc_md *kfunc_md_find(void *ip);
+struct kfunc_md *kfunc_md_get(void *ip);
+void kfunc_md_put(struct kfunc_md *meta);
+void kfunc_md_put_by_ip(void *ip);
+void kfunc_md_lock(void);
+void kfunc_md_unlock(void);
+
+#endif
diff --git a/kernel/Makefile b/kernel/Makefile
index 87866b037fbe..7435674d5da3 100644
--- a/kernel/Makefile
+++ b/kernel/Makefile
@@ -108,6 +108,7 @@ obj-$(CONFIG_TRACE_CLOCK) += trace/
 obj-$(CONFIG_RING_BUFFER) += trace/
 obj-$(CONFIG_TRACEPOINTS) += trace/
 obj-$(CONFIG_RETHOOK) += trace/
+obj-$(CONFIG_FUNCTION_METADATA) += trace/
 obj-$(CONFIG_IRQ_WORK) += irq_work.o
 obj-$(CONFIG_CPU_PM) += cpu_pm.o
 obj-$(CONFIG_BPF) += bpf/
diff --git a/kernel/trace/Makefile b/kernel/trace/Makefile
index 057cd975d014..9780ee3f8d8d 100644
--- a/kernel/trace/Makefile
+++ b/kernel/trace/Makefile
@@ -106,6 +106,7 @@ obj-$(CONFIG_FTRACE_RECORD_RECURSION) += trace_recursion_record.o
 obj-$(CONFIG_FPROBE) += fprobe.o
 obj-$(CONFIG_RETHOOK) += rethook.o
 obj-$(CONFIG_FPROBE_EVENTS) += trace_fprobe.o
+obj-$(CONFIG_FUNCTION_METADATA) += kfunc_md.o
 obj-$(CONFIG_TRACEPOINT_BENCHMARK) += trace_benchmark.o
 obj-$(CONFIG_RV) += rv/
 
diff --git a/kernel/trace/kfunc_md.c b/kernel/trace/kfunc_md.c
new file mode 100644
index 000000000000..7ec25bcf778d
--- /dev/null
+++ b/kernel/trace/kfunc_md.c
@@ -0,0 +1,239 @@
+// SPDX-License-Identifier: GPL-2.0
+
+#include
+#include
+#include
+#include
+#include
+
+#define ENTRIES_PER_PAGE (PAGE_SIZE / sizeof(struct kfunc_md))
+
+static u32 kfunc_md_count = ENTRIES_PER_PAGE, kfunc_md_used;
+struct kfunc_md __rcu *kfunc_mds;
+EXPORT_SYMBOL_GPL(kfunc_mds);
+
+static DEFINE_MUTEX(kfunc_md_mutex);
+
+
+void kfunc_md_unlock(void)
+{
+	mutex_unlock(&kfunc_md_mutex);
+}
+EXPORT_SYMBOL_GPL(kfunc_md_unlock);
+
+void kfunc_md_lock(void)
+{
+	mutex_lock(&kfunc_md_mutex);
+}
+EXPORT_SYMBOL_GPL(kfunc_md_lock);
+
+static u32 kfunc_md_get_index(void *ip)
+{
+	return *(u32 *)(ip - KFUNC_MD_DATA_OFFSET);
+}
+
+static void kfunc_md_init(struct kfunc_md *mds, u32 start, u32 end)
+{
+	u32 i;
+
+	for (i = start; i < end; i++)
+		mds[i].users = 0;
+}
+
+static int kfunc_md_page_order(void)
+{
+	return fls(DIV_ROUND_UP(kfunc_md_count, ENTRIES_PER_PAGE)) - 1;
+}
+
+/* Get the next usable function metadata. On success, return the usable
+ * kfunc_md and store its index in *index. If no usable kfunc_md is
+ * found in kfunc_mds, a larger array will be allocated.
+ */
+static struct kfunc_md *kfunc_md_get_next(u32 *index)
+{
+	struct kfunc_md *new_mds, *mds;
+	u32 i, order;
+
+	mds = rcu_dereference(kfunc_mds);
+	if (mds == NULL) {
+		order = kfunc_md_page_order();
+		new_mds = (void *)__get_free_pages(GFP_KERNEL, order);
+		if (!new_mds)
+			return NULL;
+		kfunc_md_init(new_mds, 0, kfunc_md_count);
+		/* This is the first initialization of kfunc_mds, so it is
+		 * not used anywhere yet and we can update it directly.
+		 */
+		rcu_assign_pointer(kfunc_mds, new_mds);
+		mds = new_mds;
+	}
+
+	if (likely(kfunc_md_used < kfunc_md_count)) {
+		/* maybe we can manage the used function metadata entries
+		 * with a bitmap?
+		 */
+		for (i = 0; i < kfunc_md_count; i++) {
+			if (!mds[i].users) {
+				kfunc_md_used++;
+				*index = i;
+				mds[i].users++;
+				return mds + i;
+			}
+		}
+	}
+
+	order = kfunc_md_page_order();
+	/* no available function metadata, so allocate a bigger function
+	 * metadata array.
+	 */
+	new_mds = (void *)__get_free_pages(GFP_KERNEL, order + 1);
+	if (!new_mds)
+		return NULL;
+
+	memcpy(new_mds, mds, kfunc_md_count * sizeof(*new_mds));
+	kfunc_md_init(new_mds, kfunc_md_count, kfunc_md_count * 2);
+
+	rcu_assign_pointer(kfunc_mds, new_mds);
+	synchronize_rcu();
+	free_pages((u64)mds, order);
+
+	mds = new_mds + kfunc_md_count;
+	*index = kfunc_md_count;
+	kfunc_md_count <<= 1;
+	kfunc_md_used++;
+	mds->users++;
+
+	return mds;
+}
+
+static int kfunc_md_text_poke(void *ip, void *insn, void *nop)
+{
+	void *target;
+	int ret = 0;
+	u8 *prog;
+
+	target = ip - KFUNC_MD_INSN_OFFSET;
+	mutex_lock(&text_mutex);
+	if (insn) {
+		if (!memcmp(target, insn, KFUNC_MD_INSN_SIZE))
+			goto out;
+
+		if (memcmp(target, nop, KFUNC_MD_INSN_SIZE)) {
+			ret = -EBUSY;
+			goto out;
+		}
+		prog = insn;
+	} else {
+		if (!memcmp(target, nop, KFUNC_MD_INSN_SIZE))
+			goto out;
+		prog = nop;
+	}
+
+	ret = kfunc_md_arch_poke(target, prog);
+out:
+	mutex_unlock(&text_mutex);
+	return ret;
+}
+
+static bool __kfunc_md_put(struct kfunc_md *md)
+{
+	u8 nop_insn[KFUNC_MD_INSN_SIZE];
+
+	if (WARN_ON_ONCE(md->users <= 0))
+		return false;
+
+	md->users--;
+	if (md->users > 0)
+		return false;
+
+	if (!kfunc_md_arch_exist(md->func))
+		return false;
+
+	kfunc_md_arch_nops(nop_insn);
+	/* release the metadata by restoring the function padding to NOPs */
+	kfunc_md_text_poke(md->func, NULL, nop_insn);
+	/* TODO: we need a way to shrink the array "kfunc_mds" */
+	kfunc_md_used--;
+
+	return true;
+}
+
+/* Decrease the reference of the md, release it if "md->users <= 0" */
+void kfunc_md_put(struct kfunc_md *md)
+{
+	mutex_lock(&kfunc_md_mutex);
+	__kfunc_md_put(md);
+	mutex_unlock(&kfunc_md_mutex);
+}
+EXPORT_SYMBOL_GPL(kfunc_md_put);
+
+/* Get an existing metadata by the function address; NULL is returned
+ * if it does not exist.
+ *
+ * NOTE: the RCU read lock should be held while reading the metadata, and
+ * kfunc_md_lock should be held if writing happens.
+ */
+struct kfunc_md *kfunc_md_find(void *ip)
+{
+	struct kfunc_md *md;
+	u32 index;
+
+	if (kfunc_md_arch_exist(ip)) {
+		index = kfunc_md_get_index(ip);
+		if (WARN_ON_ONCE(index >= kfunc_md_count))
+			return NULL;
+
+		md = rcu_dereference(kfunc_mds) + index;
+		return md;
+	}
+	return NULL;
+}
+EXPORT_SYMBOL_GPL(kfunc_md_find);
+
+void kfunc_md_put_by_ip(void *ip)
+{
+	struct kfunc_md *md;
+
+	mutex_lock(&kfunc_md_mutex);
+	md = kfunc_md_find(ip);
+	if (md)
+		__kfunc_md_put(md);
+	mutex_unlock(&kfunc_md_mutex);
+}
+EXPORT_SYMBOL_GPL(kfunc_md_put_by_ip);
+
+/* Get an existing metadata by the function address, and create one if it
+ * does not exist. The reference count of the metadata is increased by 1.
+ *
+ * NOTE: always call this function with kfunc_md_lock held, and all
+ * updates to the metadata should also hold the kfunc_md_lock.
+ */
+struct kfunc_md *kfunc_md_get(void *ip)
+{
+	u8 nop_insn[KFUNC_MD_INSN_SIZE], insn[KFUNC_MD_INSN_SIZE];
+	struct kfunc_md *md;
+	u32 index;
+
+	md = kfunc_md_find(ip);
+	if (md) {
+		md->users++;
+		return md;
+	}
+
+	md = kfunc_md_get_next(&index);
+	if (!md)
+		return NULL;
+
+	kfunc_md_arch_pretend(insn, index);
+	kfunc_md_arch_nops(nop_insn);
+
+	if (kfunc_md_text_poke(ip, insn, nop_insn)) {
+		kfunc_md_used--;
+		md->users = 0;
+		return NULL;
+	}
+	md->func = ip;
+
+	return md;
+}
+EXPORT_SYMBOL_GPL(kfunc_md_get);
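
For readers who want to experiment with the "way 1" scheme outside the kernel, the following is a minimal userspace sketch, not the patch's implementation. It assumes a hypothetical struct fake_func whose "pad" field stands in for the function padding, and a NO_MD marker that stands in for the NOPs. All names (md, fake_func, md_get, md_find, md_put) are made up for illustration. Locking, RCU and text poking are left out, and the array starts at 4 entries instead of one page.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Userspace stand-in for struct kfunc_md (illustrative only). */
struct md {
	int users;
	void *func;
};

/* Hypothetical stand-in for a kernel function: "pad" models the padding
 * in front of the function entry, and NO_MD models padding full of NOPs.
 */
#define NO_MD UINT32_MAX
struct fake_func {
	uint32_t pad;
};

static struct md *mds;
static uint32_t md_count, md_used;

/* Return a free slot and its index; double the array if none is free. */
static struct md *md_get_next(uint32_t *index)
{
	uint32_t i, new_count;
	struct md *new_mds;

	for (i = 0; i < md_count; i++) {
		if (!mds[i].users)
			goto found;
	}

	new_count = md_count ? md_count * 2 : 4;
	new_mds = calloc(new_count, sizeof(*new_mds));
	if (!new_mds)
		return NULL;
	if (mds)
		memcpy(new_mds, mds, md_count * sizeof(*mds));
	/* The kernel code waits for an RCU grace period before freeing. */
	free(mds);
	mds = new_mds;
	i = md_count;
	md_count = new_count;
found:
	mds[i].users = 1;
	md_used++;
	*index = i;
	return &mds[i];
}

/* Look up the metadata through the index stored in the padding. */
static struct md *md_find(struct fake_func *f)
{
	if (f->pad == NO_MD || f->pad >= md_count)
		return NULL;
	return &mds[f->pad];
}

/* Take a reference, creating the metadata on first use. */
static struct md *md_get(struct fake_func *f)
{
	struct md *md = md_find(f);
	uint32_t index;

	if (md) {
		md->users++;
		return md;
	}
	md = md_get_next(&index);
	if (!md)
		return NULL;
	f->pad = index;		/* models poking the insn into the padding */
	md->func = f;
	return md;
}

/* Drop a reference; the last put restores the padding to "NOPs". */
static void md_put(struct fake_func *f)
{
	struct md *md = md_find(f);

	if (!md || --md->users > 0)
		return;
	f->pad = NO_MD;
	md_used--;
}
```

The lookup in md_find() is a single array access through the stored index, which is the property the commit message relies on for the global trampoline. Growing the array only changes the base pointer, so the indices already written into the padding stay valid.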