From patchwork Mon Mar 3 06:53:42 2025
X-Patchwork-Id: 13998251
From: Menglong Dong
To: peterz@infradead.org, rostedt@goodmis.org, mark.rutland@arm.com, alexei.starovoitov@gmail.com
Cc: catalin.marinas@arm.com, will@kernel.org, mhiramat@kernel.org, tglx@linutronix.de, mingo@redhat.com, bp@alien8.de, dave.hansen@linux.intel.com, x86@kernel.org, hpa@zytor.com, ast@kernel.org, daniel@iogearbox.net, andrii@kernel.org, martin.lau@linux.dev, eddyz87@gmail.com, yonghong.song@linux.dev, john.fastabend@gmail.com, kpsingh@kernel.org, sdf@fomichev.me, jolsa@kernel.org, davem@davemloft.net, dsahern@kernel.org, mathieu.desnoyers@efficios.com, nathan@kernel.org, nick.desaulniers+lkml@gmail.com, morbo@google.com, samitolvanen@google.com, kees@kernel.org, dongml2@chinatelecom.cn, akpm@linux-foundation.org, riel@surriel.com, rppt@kernel.org, linux-arm-kernel@lists.infradead.org, linux-kernel@vger.kernel.org, linux-trace-kernel@vger.kernel.org, bpf@vger.kernel.org, netdev@vger.kernel.org, llvm@lists.linux.dev
Subject: [PATCH bpf-next v3 1/4] x86/ibt: factor out cfi and fineibt offset
Date: Mon, 3 Mar 2025 14:53:42 +0800
Message-Id: <20250303065345.229298-2-dongml2@chinatelecom.cn>
In-Reply-To: <20250303065345.229298-1-dongml2@chinatelecom.cn>
References: <20250303065345.229298-1-dongml2@chinatelecom.cn>

Currently, the layout of the CFI and FineIBT preambles is hard-coded, and the padding is fixed at 16 bytes. Factor out FINEIBT_INSN_OFFSET and CFI_INSN_OFFSET.

CFI_INSN_OFFSET is the offset of the kCFI preamble, which equals FUNCTION_ALIGNMENT when CALL_PADDING is enabled (and 5 otherwise). FINEIBT_INSN_OFFSET is the offset at which we place the FineIBT preamble, which is 16 for now.

When FUNCTION_ALIGNMENT is larger than 16, we place the FineIBT preamble in the last 16 bytes of the padding for better performance, which means the FineIBT preamble does not use the space that kCFI uses.

Signed-off-by: Menglong Dong
---
 arch/x86/include/asm/cfi.h    | 12 ++++++++----
 arch/x86/kernel/alternative.c | 27 ++++++++++++++++++++-------
 arch/x86/net/bpf_jit_comp.c   | 22 +++++++++++-----------
 3 files changed, 39 insertions(+), 22 deletions(-)

diff --git a/arch/x86/include/asm/cfi.h b/arch/x86/include/asm/cfi.h
index 31d19c815f99..ab51fa0ef6af 100644
--- a/arch/x86/include/asm/cfi.h
+++ b/arch/x86/include/asm/cfi.h
@@ -109,15 +109,19 @@ enum bug_trap_type handle_cfi_failure(struct pt_regs *regs);
 extern u32 cfi_bpf_hash;
 extern u32 cfi_bpf_subprog_hash;
 
+#ifdef CONFIG_CALL_PADDING
+#define FINEIBT_INSN_OFFSET	16
+#define CFI_INSN_OFFSET		CONFIG_FUNCTION_ALIGNMENT
+#else
+#define CFI_INSN_OFFSET		5
+#endif
+
 static inline int cfi_get_offset(void)
 {
 	switch (cfi_mode) {
 	case CFI_FINEIBT:
-		return 16;
 	case CFI_KCFI:
-		if (IS_ENABLED(CONFIG_CALL_PADDING))
-			return 16;
-		return 5;
+		return CFI_INSN_OFFSET;
 	default:
 		return 0;
 	}
diff --git a/arch/x86/kernel/alternative.c b/arch/x86/kernel/alternative.c
index c71b575bf229..ad050d09cb2b 100644
--- a/arch/x86/kernel/alternative.c
+++ b/arch/x86/kernel/alternative.c
@@ -908,7 +908,7 @@ void __init_or_module noinline apply_seal_endbr(s32 *start, s32 *end, struct mod
 
 		poison_endbr(addr, wr_addr, true);
 		if (IS_ENABLED(CONFIG_FINEIBT))
-			poison_cfi(addr - 16, wr_addr - 16);
+			poison_cfi(addr, wr_addr);
 	}
 }
 
@@ -974,12 +974,15 @@ u32 cfi_get_func_hash(void *func)
 {
 	u32 hash;
 
-	func -= cfi_get_offset();
 	switch (cfi_mode) {
+#ifdef CONFIG_FINEIBT
 	case CFI_FINEIBT:
+		func -= FINEIBT_INSN_OFFSET;
 		func += 7;
 		break;
+#endif
 	case CFI_KCFI:
+		func -= CFI_INSN_OFFSET;
 		func += 1;
 		break;
 	default:
@@ -1068,7 +1071,7 @@ early_param("cfi", cfi_parse_cmdline);
  *
  * caller:					caller:
  *	movl	$(-0x12345678),%r10d	 // 6	movl	$0x12345678,%r10d	// 6
- *	addl	$-15(%r11),%r10d	 // 4	sub	$16,%r11		// 4
+ *	addl	$-15(%r11),%r10d	 // 4	sub	$FINEIBT_INSN_OFFSET,%r11	// 4
  *	je	1f			 // 2	nop4				// 4
  *	ud2				 // 2
  * 1:	call	__x86_indirect_thunk_r11 // 5	call	*%r11; nop2;		// 5
@@ -1092,10 +1095,14 @@ extern u8 fineibt_preamble_end[];
 #define fineibt_preamble_size (fineibt_preamble_end - fineibt_preamble_start)
 #define fineibt_preamble_hash 7
 
+#define ___OFFSET_STR(x)	#x
+#define __OFFSET_STR(x)		___OFFSET_STR(x)
+#define OFFSET_STR		__OFFSET_STR(FINEIBT_INSN_OFFSET)
+
 asm(	".pushsection .rodata			\n"
 	"fineibt_caller_start:			\n"
 	"	movl	$0x12345678, %r10d	\n"
-	"	sub	$16, %r11		\n"
+	"	sub	$"OFFSET_STR", %r11	\n"
 	ASM_NOP4
 	"fineibt_caller_end:			\n"
 	".popsection				\n"
@@ -1225,6 +1232,7 @@ static int cfi_rewrite_preamble(s32 *start, s32 *end, struct module *mod)
 			 addr, addr, 5, addr))
 			return -EINVAL;
 
+		wr_addr += (CFI_INSN_OFFSET - FINEIBT_INSN_OFFSET);
 		text_poke_early(wr_addr, fineibt_preamble_start, fineibt_preamble_size);
 		WARN_ON(*(u32 *)(wr_addr + fineibt_preamble_hash) != 0x12345678);
 		text_poke_early(wr_addr + fineibt_preamble_hash, &hash, 4);
@@ -1241,7 +1249,8 @@ static void cfi_rewrite_endbr(s32 *start, s32 *end, struct module *mod)
 		void *addr = (void *)s + *s;
 		void *wr_addr = module_writable_address(mod, addr);
 
-		poison_endbr(addr + 16, wr_addr + 16, false);
+		poison_endbr(addr + CFI_INSN_OFFSET, wr_addr + CFI_INSN_OFFSET,
+			     false);
 	}
 }
 
@@ -1347,12 +1356,12 @@ static void __apply_fineibt(s32 *start_retpoline, s32 *end_retpoline,
 		return;
 
 	case CFI_FINEIBT:
-		/* place the FineIBT preamble at func()-16 */
+		/* place the FineIBT preamble at func()-FINEIBT_INSN_OFFSET */
 		ret = cfi_rewrite_preamble(start_cfi, end_cfi, mod);
 		if (ret)
 			goto err;
 
-		/* rewrite the callers to target func()-16 */
+		/* rewrite the callers to target func()-FINEIBT_INSN_OFFSET */
 		ret = cfi_rewrite_callers(start_retpoline, end_retpoline, mod);
 		if (ret)
 			goto err;
@@ -1381,6 +1390,8 @@ static void poison_cfi(void *addr, void *wr_addr)
 {
 	switch (cfi_mode) {
 	case CFI_FINEIBT:
+		addr -= FINEIBT_INSN_OFFSET;
+		wr_addr -= FINEIBT_INSN_OFFSET;
 		/*
 		 * __cfi_\func:
 		 *	osp nopl (%rax)
@@ -1394,6 +1405,8 @@ static void poison_cfi(void *addr, void *wr_addr)
 		break;
 
 	case CFI_KCFI:
+		addr -= CFI_INSN_OFFSET;
+		wr_addr -= CFI_INSN_OFFSET;
 		/*
 		 * __cfi_\func:
 		 *	movl	$0, %eax
diff --git a/arch/x86/net/bpf_jit_comp.c b/arch/x86/net/bpf_jit_comp.c
index a43fc5af973d..e0ddb0fd28e2 100644
--- a/arch/x86/net/bpf_jit_comp.c
+++ b/arch/x86/net/bpf_jit_comp.c
@@ -414,6 +414,12 @@ static void emit_nops(u8 **pprog, int len)
 static void emit_fineibt(u8 **pprog, u32 hash)
 {
 	u8 *prog = *pprog;
+#ifdef CONFIG_CALL_PADDING
+	int i;
+
+	for (i = 0; i < CFI_INSN_OFFSET - 16; i++)
+		EMIT1(0x90);
+#endif
 
 	EMIT_ENDBR();
 	EMIT3_off32(0x41, 0x81, 0xea, hash);		/* subl $hash, %r10d	*/
@@ -428,20 +434,14 @@ static void emit_fineibt(u8 **pprog, u32 hash)
 static void emit_kcfi(u8 **pprog, u32 hash)
 {
 	u8 *prog = *pprog;
+#ifdef CONFIG_CALL_PADDING
+	int i;
+#endif
 
 	EMIT1_off32(0xb8, hash);			/* movl $hash, %eax	*/
 #ifdef CONFIG_CALL_PADDING
-	EMIT1(0x90);
-	EMIT1(0x90);
-	EMIT1(0x90);
-	EMIT1(0x90);
-	EMIT1(0x90);
-	EMIT1(0x90);
-	EMIT1(0x90);
-	EMIT1(0x90);
-	EMIT1(0x90);
-	EMIT1(0x90);
-	EMIT1(0x90);
+	for (i = 0; i < CFI_INSN_OFFSET - 5; i++)
+		EMIT1(0x90);
 #endif
 	EMIT_ENDBR();
 

From patchwork Mon Mar 3 06:53:43 2025
X-Patchwork-Id: 13998252
From: Menglong Dong
To: peterz@infradead.org, rostedt@goodmis.org, mark.rutland@arm.com, alexei.starovoitov@gmail.com
Cc: catalin.marinas@arm.com, will@kernel.org, mhiramat@kernel.org, tglx@linutronix.de, mingo@redhat.com, bp@alien8.de, dave.hansen@linux.intel.com, x86@kernel.org, hpa@zytor.com, ast@kernel.org, daniel@iogearbox.net, andrii@kernel.org, martin.lau@linux.dev, eddyz87@gmail.com, yonghong.song@linux.dev, john.fastabend@gmail.com, kpsingh@kernel.org, sdf@fomichev.me, jolsa@kernel.org, davem@davemloft.net, dsahern@kernel.org, mathieu.desnoyers@efficios.com, nathan@kernel.org, nick.desaulniers+lkml@gmail.com, morbo@google.com, samitolvanen@google.com, kees@kernel.org, dongml2@chinatelecom.cn, akpm@linux-foundation.org, riel@surriel.com, rppt@kernel.org, linux-arm-kernel@lists.infradead.org, linux-kernel@vger.kernel.org, linux-trace-kernel@vger.kernel.org, bpf@vger.kernel.org, netdev@vger.kernel.org, llvm@lists.linux.dev
Subject: [PATCH bpf-next v3 2/4] add per-function metadata storage support
Date: Mon, 3 Mar 2025 14:53:43 +0800
Message-Id: <20250303065345.229298-3-dongml2@chinatelecom.cn>
In-Reply-To: <20250303065345.229298-1-dongml2@chinatelecom.cn>
References: <20250303065345.229298-1-dongml2@chinatelecom.cn>

Currently, there is no low-overhead way to set and get per-function metadata, which is inconvenient in some situations.

Take the BPF trampoline as an example: we need to create a trampoline for each kernel function, because we have to store some information about the function in the trampoline, such as the BPF progs, the function argument count, etc. Creating these trampolines costs performance and memory. With per-function metadata storage, we can store this information in the metadata and create a single global BPF trampoline for all kernel functions. In the global trampoline, we get the information we need from the function metadata through the ip (function address) with almost no overhead.

Another beneficiary could be ftrace. Currently, all kernel functions enabled by dynamic ftrace are added to a filter hash if there is more than one callback, and a hash lookup happens whenever a traced function is called, which hurts performance; see __ftrace_ops_list_func() -> ftrace_ops_test(). With per-function metadata, we can instead record in the metadata whether a callback is enabled for the kernel function.

Support per-function metadata storage in the function padding; the previous discussion can be found in [1]. Generally speaking, there are two ways to implement this feature:

1. Create a function metadata array, and prepend an insn that holds the index of the function's metadata in the array. Store the insn in the function padding.
2. Allocate the function metadata with kmalloc(), and prepend an insn that holds the pointer to the metadata. Store the insn in the function padding.

Compared with way 2, way 1 consumes less space, but requires more work to manage the global function metadata array. This patch implements way 1.

Link: https://lore.kernel.org/bpf/CADxym3anLzM6cAkn_z71GDd_VeKiqqk1ts=xuiP7pr4PO6USPA@mail.gmail.com/ [1]
Signed-off-by: Menglong Dong
---
v2:
- add support for arm64
- split out arch relevant code
- refactor the commit log
---
 include/linux/kfunc_md.h |  25 ++++
 kernel/Makefile          |   1 +
 kernel/trace/Makefile    |   1 +
 kernel/trace/kfunc_md.c  | 239 +++++++++++++++++++++++++++++++++++++++
 4 files changed, 266 insertions(+)
 create mode 100644 include/linux/kfunc_md.h
 create mode 100644 kernel/trace/kfunc_md.c

diff --git a/include/linux/kfunc_md.h b/include/linux/kfunc_md.h
new file mode 100644
index 000000000000..df616f0fcb36
--- /dev/null
+++ b/include/linux/kfunc_md.h
@@ -0,0 +1,25 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+#ifndef _LINUX_KFUNC_MD_H
+#define _LINUX_KFUNC_MD_H
+
+#include
+
+struct kfunc_md {
+	int users;
+	/* we can use this field later, make sure it is 8-bytes aligned
+	 * for now.
+	 */
+	int pad0;
+	void *func;
+};
+
+extern struct kfunc_md *kfunc_mds;
+
+struct kfunc_md *kfunc_md_find(void *ip);
+struct kfunc_md *kfunc_md_get(void *ip);
+void kfunc_md_put(struct kfunc_md *meta);
+void kfunc_md_put_by_ip(void *ip);
+void kfunc_md_lock(void);
+void kfunc_md_unlock(void);
+
+#endif
diff --git a/kernel/Makefile b/kernel/Makefile
index 87866b037fbe..7435674d5da3 100644
--- a/kernel/Makefile
+++ b/kernel/Makefile
@@ -108,6 +108,7 @@ obj-$(CONFIG_TRACE_CLOCK) += trace/
 obj-$(CONFIG_RING_BUFFER) += trace/
 obj-$(CONFIG_TRACEPOINTS) += trace/
 obj-$(CONFIG_RETHOOK) += trace/
+obj-$(CONFIG_FUNCTION_METADATA) += trace/
 obj-$(CONFIG_IRQ_WORK) += irq_work.o
 obj-$(CONFIG_CPU_PM) += cpu_pm.o
 obj-$(CONFIG_BPF) += bpf/
diff --git a/kernel/trace/Makefile b/kernel/trace/Makefile
index 057cd975d014..9780ee3f8d8d 100644
--- a/kernel/trace/Makefile
+++ b/kernel/trace/Makefile
@@ -106,6 +106,7 @@ obj-$(CONFIG_FTRACE_RECORD_RECURSION) += trace_recursion_record.o
 obj-$(CONFIG_FPROBE) += fprobe.o
 obj-$(CONFIG_RETHOOK) += rethook.o
 obj-$(CONFIG_FPROBE_EVENTS) += trace_fprobe.o
+obj-$(CONFIG_FUNCTION_METADATA) += kfunc_md.o
 obj-$(CONFIG_TRACEPOINT_BENCHMARK) += trace_benchmark.o
 obj-$(CONFIG_RV) += rv/
 
diff --git a/kernel/trace/kfunc_md.c b/kernel/trace/kfunc_md.c
new file mode 100644
index 000000000000..7ec25bcf778d
--- /dev/null
+++ b/kernel/trace/kfunc_md.c
@@ -0,0 +1,239 @@
+// SPDX-License-Identifier: GPL-2.0
+
+#include
+#include
+#include
+#include
+#include
+
+#define ENTRIES_PER_PAGE (PAGE_SIZE / sizeof(struct kfunc_md))
+
+static u32 kfunc_md_count = ENTRIES_PER_PAGE, kfunc_md_used;
+struct kfunc_md __rcu *kfunc_mds;
+EXPORT_SYMBOL_GPL(kfunc_mds);
+
+static DEFINE_MUTEX(kfunc_md_mutex);
+
+
+void kfunc_md_unlock(void)
+{
+	mutex_unlock(&kfunc_md_mutex);
+}
+EXPORT_SYMBOL_GPL(kfunc_md_unlock);
+
+void kfunc_md_lock(void)
+{
+	mutex_lock(&kfunc_md_mutex);
+}
+EXPORT_SYMBOL_GPL(kfunc_md_lock);
+
+static u32 kfunc_md_get_index(void *ip)
+{
+	return *(u32 *)(ip - KFUNC_MD_DATA_OFFSET);
+}
+
+static void kfunc_md_init(struct kfunc_md *mds, u32 start, u32 end)
+{
+	u32 i;
+
+	for (i = start; i < end; i++)
+		mds[i].users = 0;
+}
+
+static int kfunc_md_page_order(void)
+{
+	return fls(DIV_ROUND_UP(kfunc_md_count, ENTRIES_PER_PAGE)) - 1;
+}
+
+/* Get next usable function metadata. On success, return the usable
+ * kfunc_md and store the index of it to *index. If no usable kfunc_md is
+ * found in kfunc_mds, a larger array will be allocated.
+ */
+static struct kfunc_md *kfunc_md_get_next(u32 *index)
+{
+	struct kfunc_md *new_mds, *mds;
+	u32 i, order;
+
+	mds = rcu_dereference(kfunc_mds);
+	if (mds == NULL) {
+		order = kfunc_md_page_order();
+		new_mds = (void *)__get_free_pages(GFP_KERNEL, order);
+		if (!new_mds)
+			return NULL;
+		kfunc_md_init(new_mds, 0, kfunc_md_count);
+		/* The first time to initialize kfunc_mds, so it is not
+		 * used anywhere yet, and we can update it directly.
+		 */
+		rcu_assign_pointer(kfunc_mds, new_mds);
+		mds = new_mds;
+	}
+
+	if (likely(kfunc_md_used < kfunc_md_count)) {
+		/* maybe we can manage the used function metadata entry
+		 * with a bit map ?
+		 */
+		for (i = 0; i < kfunc_md_count; i++) {
+			if (!mds[i].users) {
+				kfunc_md_used++;
+				*index = i;
+				mds[i].users++;
+				return mds + i;
+			}
+		}
+	}
+
+	order = kfunc_md_page_order();
+	/* no available function metadata, so allocate a bigger function
+	 * metadata array.
+	 */
+	new_mds = (void *)__get_free_pages(GFP_KERNEL, order + 1);
+	if (!new_mds)
+		return NULL;
+
+	memcpy(new_mds, mds, kfunc_md_count * sizeof(*new_mds));
+	kfunc_md_init(new_mds, kfunc_md_count, kfunc_md_count * 2);
+
+	rcu_assign_pointer(kfunc_mds, new_mds);
+	synchronize_rcu();
+	free_pages((u64)mds, order);
+
+	mds = new_mds + kfunc_md_count;
+	*index = kfunc_md_count;
+	kfunc_md_count <<= 1;
+	kfunc_md_used++;
+	mds->users++;
+
+	return mds;
+}
+
+static int kfunc_md_text_poke(void *ip, void *insn, void *nop)
+{
+	void *target;
+	int ret = 0;
+	u8 *prog;
+
+	target = ip - KFUNC_MD_INSN_OFFSET;
+	mutex_lock(&text_mutex);
+	if (insn) {
+		if (!memcmp(target, insn, KFUNC_MD_INSN_SIZE))
+			goto out;
+
+		if (memcmp(target, nop, KFUNC_MD_INSN_SIZE)) {
+			ret = -EBUSY;
+			goto out;
+		}
+		prog = insn;
+	} else {
+		if (!memcmp(target, nop, KFUNC_MD_INSN_SIZE))
+			goto out;
+		prog = nop;
+	}
+
+	ret = kfunc_md_arch_poke(target, prog);
+out:
+	mutex_unlock(&text_mutex);
+	return ret;
+}
+
+static bool __kfunc_md_put(struct kfunc_md *md)
+{
+	u8 nop_insn[KFUNC_MD_INSN_SIZE];
+
+	if (WARN_ON_ONCE(md->users <= 0))
+		return false;
+
+	md->users--;
+	if (md->users > 0)
+		return false;
+
+	if (!kfunc_md_arch_exist(md->func))
+		return false;
+
+	kfunc_md_arch_nops(nop_insn);
+	/* release the metadata by recovering the function padding to NOPS */
+	kfunc_md_text_poke(md->func, NULL, nop_insn);
+	/* TODO: we need a way to shrink the array "kfunc_mds" */
+	kfunc_md_used--;
+
+	return true;
+}
+
+/* Decrease the reference of the md, release it if "md->users <= 0" */
+void kfunc_md_put(struct kfunc_md *md)
+{
+	mutex_lock(&kfunc_md_mutex);
+	__kfunc_md_put(md);
+	mutex_unlock(&kfunc_md_mutex);
+}
+EXPORT_SYMBOL_GPL(kfunc_md_put);
+
+/* Get a exist metadata by the function address, and NULL will be returned
+ * if not exist.
+ *
+ * NOTE: rcu lock should be held during reading the metadata, and
+ * kfunc_md_lock should be held if writing happens.
+ */ +struct kfunc_md *kfunc_md_find(void *ip) +{ + struct kfunc_md *md; + u32 index; + + if (kfunc_md_arch_exist(ip)) { + index = kfunc_md_get_index(ip); + if (WARN_ON_ONCE(index >= kfunc_md_count)) + return NULL; + + md = rcu_dereference(kfunc_mds) + index; + return md; + } + return NULL; +} +EXPORT_SYMBOL_GPL(kfunc_md_find); + +void kfunc_md_put_by_ip(void *ip) +{ + struct kfunc_md *md; + + mutex_lock(&kfunc_md_mutex); + md = kfunc_md_find(ip); + if (md) + __kfunc_md_put(md); + mutex_unlock(&kfunc_md_mutex); +} +EXPORT_SYMBOL_GPL(kfunc_md_put_by_ip); + +/* Get a exist metadata by the function address, and create one if not + * exist. Reference of the metadata will increase 1. + * + * NOTE: always call this function with kfunc_md_lock held, and all + * updating to metadata should also hold the kfunc_md_lock. + */ +struct kfunc_md *kfunc_md_get(void *ip) +{ + u8 nop_insn[KFUNC_MD_INSN_SIZE], insn[KFUNC_MD_INSN_SIZE]; + struct kfunc_md *md; + u32 index; + + md = kfunc_md_find(ip); + if (md) { + md->users++; + return md; + } + + md = kfunc_md_get_next(&index); + if (!md) + return NULL; + + kfunc_md_arch_pretend(insn, index); + kfunc_md_arch_nops(nop_insn); + + if (kfunc_md_text_poke(ip, insn, nop_insn)) { + kfunc_md_used--; + md->users = 0; + return NULL; + } + md->func = ip; + + return md; +} +EXPORT_SYMBOL_GPL(kfunc_md_get); From patchwork Mon Mar 3 06:53:44 2025 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Menglong Dong X-Patchwork-Id: 13998254 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from bombadil.infradead.org (bombadil.infradead.org [198.137.202.133]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.lore.kernel.org (Postfix) with ESMTPS id D90A6C282C5 for ; Mon, 3 Mar 2025 07:02:14 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; q=dns/txt; 
c=relaxed/relaxed; d=lists.infradead.org; s=bombadil.20210309; h=Sender:List-Subscribe:List-Help :List-Post:List-Archive:List-Unsubscribe:List-Id:Content-Transfer-Encoding: MIME-Version:References:In-Reply-To:Message-Id:Date:Subject:Cc:To:From: Reply-To:Content-Type:Content-ID:Content-Description:Resent-Date:Resent-From: Resent-Sender:Resent-To:Resent-Cc:Resent-Message-ID:List-Owner; bh=DMU16H6izdRqPTzQSvNq/kwALVMPh8dWr5JstZVUeSQ=; b=lqNMdy3HzUKif+95oZpogfffvT qopwI0k92gAhE8WlifN1B/de+N/3P2BaZZvz4wxXKEIYeCf3IEdUNmyaYjzLGYW9t8G8NN1sIXSgx +3Kssd/wMGUzJd5ypVfT+vp6R+mb+NlrEJbmw7RC/umV23r6Rh00dSQWuuBiTQ+YbEzcBfGvt1WQD zSRZxHH18l+kcFULPH+IdOeBYyxOkGlkVzrIMBqGrT1REemREtd24JIt9lZa1YKUS4c6Wpc8bqWf/ rnpBJXwC+JoDdJJtty9eElQLcypDuMJ2XnWHOLtefbgTMcxGWaGvH2dlgvg4nkUCmEreVF/v5jtLU PlOqWivw==; Received: from localhost ([::1] helo=bombadil.infradead.org) by bombadil.infradead.org with esmtp (Exim 4.98 #2 (Red Hat Linux)) id 1tozoS-0000000HNOH-2fca; Mon, 03 Mar 2025 07:02:04 +0000 Received: from mail-pj1-x1044.google.com ([2607:f8b0:4864:20::1044]) by bombadil.infradead.org with esmtps (Exim 4.98 #2 (Red Hat Linux)) id 1tozif-0000000HMZ9-0GPJ for linux-arm-kernel@lists.infradead.org; Mon, 03 Mar 2025 06:56:06 +0000 Received: by mail-pj1-x1044.google.com with SMTP id 98e67ed59e1d1-2fea8d8c322so7364839a91.2 for ; Sun, 02 Mar 2025 22:56:05 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20230601; t=1740984964; x=1741589764; darn=lists.infradead.org; h=content-transfer-encoding:mime-version:references:in-reply-to :message-id:date:subject:cc:to:from:from:to:cc:subject:date :message-id:reply-to; bh=DMU16H6izdRqPTzQSvNq/kwALVMPh8dWr5JstZVUeSQ=; b=BRVbkhZvYOt+j9VIiYg5GoBFvoH56taWUkh7MgUKBeChWg+JLzHKVM3VjgwyfS7P1j jBJfNz2l9GtGEmMp3elcERJiKGAUG2LBW+4AjpMCzayk0xv7Tg8clJ/kjbTGH0gKzkBN h7tIs7PASj9qDxNGKyAw/yRySA9LWy3FZKkJ61dj5J4ubJYHcXCuIkO5SeYw2aLERXV4 wmT5neM/6y7QUZUpTKjZn/DBKdeZNOqRfzn5nTveioa3r+KRyLJ662/B8SB8ZN002CbE 
kzF+pNB15CrmUKep9AMFmDhdVqLkI/EsYxKhvQtdo6yf0eaTtjKHKKDkQUDkM2LbHUwj cV4g== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20230601; t=1740984964; x=1741589764; h=content-transfer-encoding:mime-version:references:in-reply-to :message-id:date:subject:cc:to:from:x-gm-message-state:from:to:cc :subject:date:message-id:reply-to; bh=DMU16H6izdRqPTzQSvNq/kwALVMPh8dWr5JstZVUeSQ=; b=fEzMjI1D2lZkSaJqMHuhV8SiG9453XpZySVZF7W5UAN4rkHzac4jWTJizFQU0AG/pL KE6/AqmNIh9havDYesfnUCOREuHjt929ygR/D7BRUQc/lt2XLpoIeT4+QIP4tfTlZD3N 9L7P2YdZsgnDK7GgoXdA9vLZ7JFx8plcOOQroK3r6RrQBN8ytgQTtZfdFpnKfEIj6Xsw MJdYtH8rg+Z1b4pxyhfQ7FdXgTNp7CdOCLwFr03bSn6e9KytTKQENZeW++d57Ev4JVbd nL65bgVS09zTr5bkmeKVMrbprgnQv5Dr2IfR+7Dd8RKkqCG0yX+LcTNA7L2Gyc6eYccd maIg== X-Forwarded-Encrypted: i=1; AJvYcCUHQlmpLQdu3Z0Z/gJnRalcfJ4vjZjgNQBRlWBUIpQHgNr4KjEPaHtmyxlxQUJ+WxZNY2Lqu1m8QeiQxQp/Nl3k@lists.infradead.org X-Gm-Message-State: AOJu0YwCGd35daSo3xpl8DV0mR8aRaAJt3OhJII9zphpOZhxAvUsh0VU HCm58PCxn0FVKRh81Yy1R1xN6REf0xsM00e+ZqHCpiXbROF0sS5h X-Gm-Gg: ASbGnctxfBfX4lmrRdVOgs9g2ET4up27C7Ml5yFF0azy8Vbu2CUoHyNdsSMFa7Yfn1J MQqvndu85HVlvUbjDi03vECrOItiC9qx1YCCX9Os/RMplAqTKIRVyeFSvUTFUDwqNriFAkwL6By /Jt2CYbXZLUOsgj7NiTTY4F0C3c6TnqwJZQp4GYjNXSLl7gAVlDlOXlDpGKAI+Kz8j9hqZEjIe+ RygIo3lCgREn+s0ivxqTIR5PFrB3OfT8DIS/sB6o2abOuJr/Cj9svsWhLSfEvZ/B9AT/uYNMNWs /Up6Qn6sPLp/LlH0XADAVYqOipmEfTnhlvx6m8BXQnBtxAAvnJyZbfwRPL9Fig== X-Google-Smtp-Source: AGHT+IGyBy5lAIgE9/BJFGPHKWL9IGQJbSt/AMZr+VvaZ6EhLBWcU/x/8umKGWf6iOEq1B7Odhyfkg== X-Received: by 2002:a17:90b:1845:b0:2ee:d193:f3d5 with SMTP id 98e67ed59e1d1-2febab2ecbfmr20898385a91.7.1740984964540; Sun, 02 Mar 2025 22:56:04 -0800 (PST) Received: from localhost.localdomain ([43.129.244.20]) by smtp.gmail.com with ESMTPSA id 98e67ed59e1d1-2fea6769ad2sm8139575a91.11.2025.03.02.22.55.57 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Sun, 02 Mar 2025 22:56:04 -0800 (PST) From: Menglong Dong X-Google-Original-From: Menglong Dong To: peterz@infradead.org, 
 rostedt@goodmis.org, mark.rutland@arm.com, alexei.starovoitov@gmail.com
Cc: catalin.marinas@arm.com, will@kernel.org, mhiramat@kernel.org,
 tglx@linutronix.de, mingo@redhat.com, bp@alien8.de,
 dave.hansen@linux.intel.com, x86@kernel.org, hpa@zytor.com, ast@kernel.org,
 daniel@iogearbox.net, andrii@kernel.org, martin.lau@linux.dev,
 eddyz87@gmail.com, yonghong.song@linux.dev, john.fastabend@gmail.com,
 kpsingh@kernel.org, sdf@fomichev.me, jolsa@kernel.org, davem@davemloft.net,
 dsahern@kernel.org, mathieu.desnoyers@efficios.com, nathan@kernel.org,
 nick.desaulniers+lkml@gmail.com, morbo@google.com, samitolvanen@google.com,
 kees@kernel.org, dongml2@chinatelecom.cn, akpm@linux-foundation.org,
 riel@surriel.com, rppt@kernel.org, linux-arm-kernel@lists.infradead.org,
 linux-kernel@vger.kernel.org, linux-trace-kernel@vger.kernel.org,
 bpf@vger.kernel.org, netdev@vger.kernel.org, llvm@lists.linux.dev
Subject: [PATCH bpf-next v3 3/4] x86: implement per-function metadata storage for x86
Date: Mon, 3 Mar 2025 14:53:44 +0800
Message-Id: <20250303065345.229298-4-dongml2@chinatelecom.cn>
In-Reply-To: <20250303065345.229298-1-dongml2@chinatelecom.cn>
References: <20250303065345.229298-1-dongml2@chinatelecom.cn>

With CONFIG_CALL_PADDING enabled, there are 16 bytes of padding before
every kernel function. Several kernel features use this space, such as
MITIGATION_CALL_DEPTH_TRACKING, CFI_CLANG and FINEIBT.
In my research, MITIGATION_CALL_DEPTH_TRACKING consumes the tail 9 bytes
of the function padding, CFI_CLANG consumes the head 5 bytes, and FINEIBT
consumes all 16 bytes when it is enabled. So there is no space left for us
if MITIGATION_CALL_DEPTH_TRACKING and CFI_CLANG are both enabled, or if
FINEIBT is enabled.

On x86, we need 5 bytes to prepend a "mov $xxx, %eax" instruction, whose
32-bit immediate holds the 4-byte index. So we use the following logic:

1. use the head 5 bytes if CFI_CLANG is not enabled
2. use the tail 5 bytes if MITIGATION_CALL_DEPTH_TRACKING and FINEIBT are
   not enabled
3. otherwise, compile the kernel with FUNCTION_ALIGNMENT_32B

In the third case, kernel functions are 32-byte aligned, so there are
32 bytes of padding before each function. According to my testing, the
image size barely grows in this case, which is surprising.

With 16-byte padding:

-rwxr-xr-x 1 401190688 x86-dev/vmlinux*
-rw-r--r-- 1    251068 x86-dev/vmlinux.a
-rw-r--r-- 1 851892992 x86-dev/vmlinux.o
-rw-r--r-- 1  12395008 x86-dev/arch/x86/boot/bzImage

With 32-byte padding:

-rwxr-xr-x 1 401318128 x86-dev/vmlinux*
-rw-r--r-- 1    251154 x86-dev/vmlinux.a
-rw-r--r-- 1 853636704 x86-dev/vmlinux.o
-rw-r--r-- 1  12509696 x86-dev/arch/x86/boot/bzImage

That is about +124 KiB (~0.03%) for vmlinux and +112 KiB (~0.9%) for
bzImage. I believe the measurement is correct, which is good news for us.

In the third case, if FineIBT is enabled, the padding is laid out like
this:

__cfi_func:
	mov	-- 5	-- cfi, not used anymore
	nop
	nop
	nop
	mov	-- 5	-- function metadata
	nop
	nop
	nop
	fineibt	-- 16	-- fineibt
func:
	nopw	-- 4
	......

I tested FineIBT with the "cfi=fineibt" command line option, and it works
well together with FUNCTION_METADATA enabled. I also measured the cost of
setting metadata for every kernel function: it takes about 0.7s for 70k+
functions, which is acceptable.

I could not find a machine that supports IBT, so IBT itself is untested.
I'd appreciate it if someone with IBT-capable hardware could test it.

Signed-off-by: Menglong Dong
---
v3:
- select FUNCTION_ALIGNMENT_32B in case 3, instead of reserving an extra 5 bytes
---
 arch/x86/Kconfig              | 18 ++++++++++++
 arch/x86/include/asm/ftrace.h | 54 +++++++++++++++++++++++++++++++++++
 2 files changed, 72 insertions(+)

diff --git a/arch/x86/Kconfig b/arch/x86/Kconfig
index be2c311f5118..fe5a98401135 100644
--- a/arch/x86/Kconfig
+++ b/arch/x86/Kconfig
@@ -2509,6 +2509,24 @@ config PREFIX_SYMBOLS
 	def_bool y
 	depends on CALL_PADDING && !CFI_CLANG
 
+config FUNCTION_METADATA
+	bool "Per-function metadata storage support"
+	default y
+	depends on CC_HAS_ENTRY_PADDING && OBJTOOL
+	select CALL_PADDING
+	select FUNCTION_ALIGNMENT_32B if ((CFI_CLANG && CALL_THUNKS) || FINEIBT)
+	help
+	  Support per-function metadata storage for kernel functions, and
+	  get the metadata of the function by its address with almost no
+	  overhead.
+
+	  The index of the metadata will be stored in the function padding
+	  and consumes 5-bytes. FUNCTION_ALIGNMENT_32B will be selected if
+	  "(CFI_CLANG && CALL_THUNKS) || FINEIBT" to make sure there is
+	  enough available padding space for this function. However, it
+	  seems that the text size almost don't change, compare with
+	  FUNCTION_ALIGNMENT_16B.
+
 menuconfig CPU_MITIGATIONS
 	bool "Mitigations for CPU vulnerabilities"
 	default y
diff --git a/arch/x86/include/asm/ftrace.h b/arch/x86/include/asm/ftrace.h
index f9cb4d07df58..d5cbb8e18fd7 100644
--- a/arch/x86/include/asm/ftrace.h
+++ b/arch/x86/include/asm/ftrace.h
@@ -4,6 +4,28 @@
 
 #include
 
+#ifdef CONFIG_FUNCTION_METADATA
+#if (defined(CONFIG_CFI_CLANG) && defined(CONFIG_CALL_THUNKS)) || (defined(CONFIG_FINEIBT))
+	/* the CONFIG_FUNCTION_PADDING_BYTES is 32 in this case, use the
+	 * range: [align + 8, align + 13].
+	 */
+	#define KFUNC_MD_INSN_OFFSET (CONFIG_FUNCTION_PADDING_BYTES - 8)
+	#define KFUNC_MD_DATA_OFFSET (CONFIG_FUNCTION_PADDING_BYTES - 9)
+#else
+	#ifdef CONFIG_CFI_CLANG
+		/* use the space that CALL_THUNKS suppose to use */
+		#define KFUNC_MD_INSN_OFFSET (5)
+		#define KFUNC_MD_DATA_OFFSET (4)
+	#else
+		/* use the space that CFI_CLANG suppose to use */
+		#define KFUNC_MD_INSN_OFFSET (CONFIG_FUNCTION_PADDING_BYTES)
+		#define KFUNC_MD_DATA_OFFSET (CONFIG_FUNCTION_PADDING_BYTES - 1)
+	#endif
+#endif
+
+#define KFUNC_MD_INSN_SIZE (5)
+#endif
+
 #ifdef CONFIG_FUNCTION_TRACER
 #ifndef CC_USING_FENTRY
 # error Compiler does not support fentry?
@@ -168,4 +190,36 @@ static inline bool arch_trace_is_compat_syscall(struct pt_regs *regs)
 #endif /* !COMPILE_OFFSETS */
 #endif /* !__ASSEMBLY__ */
 
+#if !defined(__ASSEMBLY__) && defined(CONFIG_FUNCTION_METADATA)
+#include
+
+static inline bool kfunc_md_arch_exist(void *ip)
+{
+	return *(u8 *)(ip - KFUNC_MD_INSN_OFFSET) == 0xB8;
+}
+
+static inline void kfunc_md_arch_pretend(u8 *insn, u32 index)
+{
+	*insn = 0xB8;
+	*(u32 *)(insn + 1) = index;
+}
+
+static inline void kfunc_md_arch_nops(u8 *insn)
+{
+	*(insn++) = BYTES_NOP1;
+	*(insn++) = BYTES_NOP1;
+	*(insn++) = BYTES_NOP1;
+	*(insn++) = BYTES_NOP1;
+	*(insn++) = BYTES_NOP1;
+}
+
+static inline int kfunc_md_arch_poke(void *ip, u8 *insn)
+{
+	text_poke(ip, insn, KFUNC_MD_INSN_SIZE);
+	text_poke_sync();
+	return 0;
+}
+
+#endif
+
 #endif /* _ASM_X86_FTRACE_H */

From patchwork Mon Mar 3 06:53:45 2025
From: Menglong Dong
To: peterz@infradead.org, rostedt@goodmis.org, mark.rutland@arm.com,
 alexei.starovoitov@gmail.com
Cc: catalin.marinas@arm.com, will@kernel.org, mhiramat@kernel.org,
 tglx@linutronix.de, mingo@redhat.com, bp@alien8.de,
 dave.hansen@linux.intel.com, x86@kernel.org, hpa@zytor.com, ast@kernel.org,
 daniel@iogearbox.net, andrii@kernel.org, martin.lau@linux.dev,
 eddyz87@gmail.com, yonghong.song@linux.dev, john.fastabend@gmail.com,
 kpsingh@kernel.org, sdf@fomichev.me, jolsa@kernel.org, davem@davemloft.net,
 dsahern@kernel.org, mathieu.desnoyers@efficios.com, nathan@kernel.org,
 nick.desaulniers+lkml@gmail.com, morbo@google.com, samitolvanen@google.com,
 kees@kernel.org, dongml2@chinatelecom.cn, akpm@linux-foundation.org,
 riel@surriel.com, rppt@kernel.org, linux-arm-kernel@lists.infradead.org,
 linux-kernel@vger.kernel.org, linux-trace-kernel@vger.kernel.org,
 bpf@vger.kernel.org, netdev@vger.kernel.org, llvm@lists.linux.dev
Subject: [PATCH bpf-next v3 4/4] arm64: implement per-function metadata storage for arm64
Date: Mon, 3 Mar 2025 14:53:45 +0800
Message-Id: <20250303065345.229298-5-dongml2@chinatelecom.cn>
In-Reply-To: <20250303065345.229298-1-dongml2@chinatelecom.cn>
References: <20250303065345.229298-1-dongml2@chinatelecom.cn>

Per-function metadata storage is already used by ftrace when
CONFIG_DYNAMIC_FTRACE_WITH_CALL_OPS is enabled: since commit baaf553d3bc3
("arm64: Implement HAVE_DYNAMIC_FTRACE_WITH_CALL_OPS"), it stores the
callback pointer directly in the function padding, which consumes 8 bytes.
So we can store the index directly in the function padding too, without
prepending an instruction.

With CONFIG_DYNAMIC_FTRACE_WITH_CALL_OPS enabled, functions are 8-byte
aligned, so we compile the kernel with an extra 8 bytes (2 NOPs) of
padding. Otherwise, functions are 4-byte aligned and only an extra
4 bytes (1 NOP) are needed.

However, we run into the same problem Mark described in the commit above:
the function padding cannot be used together with CFI_CLANG, because it
can make clang compute a wrong offset to the pre-function type hash. Two
years ago he said that he was working with others on this problem.

Hi Mark, is there any progress on this problem?

Signed-off-by: Menglong Dong
---
 arch/arm64/Kconfig              | 15 +++++++++++++++
 arch/arm64/Makefile             | 23 ++++++++++++++++++++--
 arch/arm64/include/asm/ftrace.h | 34 +++++++++++++++++++++++++++++++++
 arch/arm64/kernel/ftrace.c      | 13 +++++++++++--
 4 files changed, 81 insertions(+), 4 deletions(-)

diff --git a/arch/arm64/Kconfig b/arch/arm64/Kconfig
index 940343beb3d4..7ed80f5eb267 100644
--- a/arch/arm64/Kconfig
+++ b/arch/arm64/Kconfig
@@ -1536,6 +1536,21 @@ config NODES_SHIFT
 	  Specify the maximum number of NUMA Nodes available on the target
 	  system. Increases memory reserved to accommodate various tables.
 
+config FUNCTION_METADATA
+	bool "Per-function metadata storage support"
+	default y
+	select HAVE_DYNAMIC_FTRACE_NO_PATCHABLE if !FTRACE_MCOUNT_USE_PATCHABLE_FUNCTION_ENTRY
+	depends on !CFI_CLANG
+	help
+	  Support per-function metadata storage for kernel functions, and
+	  get the metadata of the function by its address with almost no
+	  overhead.
+
+	  The index of the metadata will be stored in the function padding,
+	  which will consume 4-bytes. If FUNCTION_ALIGNMENT_8B is enabled,
+	  extra 8-bytes function padding will be reserved during compiling.
+	  Otherwise, only extra 4-bytes function padding is needed.
+
 source "kernel/Kconfig.hz"
 
 config ARCH_SPARSEMEM_ENABLE
diff --git a/arch/arm64/Makefile b/arch/arm64/Makefile
index 2b25d671365f..2df2b0f4dd90 100644
--- a/arch/arm64/Makefile
+++ b/arch/arm64/Makefile
@@ -144,12 +144,31 @@ endif
 
 CHECKFLAGS += -D__aarch64__
 
+ifeq ($(CONFIG_FUNCTION_METADATA),y)
+  ifeq ($(CONFIG_FUNCTION_ALIGNMENT_8B),y)
+    __padding_nops := 2
+  else
+    __padding_nops := 1
+  endif
+else
+  __padding_nops := 0
+endif
+
 ifeq ($(CONFIG_DYNAMIC_FTRACE_WITH_CALL_OPS),y)
+  __padding_nops := $(shell echo $(__padding_nops) + 2 | bc)
   KBUILD_CPPFLAGS += -DCC_USING_PATCHABLE_FUNCTION_ENTRY
-  CC_FLAGS_FTRACE := -fpatchable-function-entry=4,2
+  CC_FLAGS_FTRACE := -fpatchable-function-entry=$(shell echo $(__padding_nops) + 2 | bc),$(__padding_nops)
 else ifeq ($(CONFIG_DYNAMIC_FTRACE_WITH_ARGS),y)
+  CC_FLAGS_FTRACE := -fpatchable-function-entry=$(shell echo $(__padding_nops) + 2 | bc),$(__padding_nops)
   KBUILD_CPPFLAGS += -DCC_USING_PATCHABLE_FUNCTION_ENTRY
-  CC_FLAGS_FTRACE := -fpatchable-function-entry=2
+else ifeq ($(CONFIG_FUNCTION_METADATA),y)
+  CC_FLAGS_FTRACE += -fpatchable-function-entry=$(__padding_nops),$(__padding_nops)
+  ifneq ($(CONFIG_FUNCTION_TRACER),y)
+    KBUILD_CFLAGS += $(CC_FLAGS_FTRACE)
+    # some file need to remove this cflag when CONFIG_FUNCTION_TRACER
+    # is not enabled, so we need to export it here
+    export CC_FLAGS_FTRACE
+  endif
 endif
 
 ifeq ($(CONFIG_KASAN_SW_TAGS), y)
diff --git a/arch/arm64/include/asm/ftrace.h b/arch/arm64/include/asm/ftrace.h
index bfe3ce9df197..aa3eaa91bf82 100644
--- a/arch/arm64/include/asm/ftrace.h
+++ b/arch/arm64/include/asm/ftrace.h
@@ -24,6 +24,16 @@
 #define FTRACE_PLT_IDX 0
 #define NR_FTRACE_PLTS 1
 
+#ifdef CONFIG_FUNCTION_METADATA
+#ifdef CONFIG_DYNAMIC_FTRACE_WITH_CALL_OPS
+#define KFUNC_MD_DATA_OFFSET (AARCH64_INSN_SIZE * 3)
+#else
+#define KFUNC_MD_DATA_OFFSET AARCH64_INSN_SIZE
+#endif
+#define KFUNC_MD_INSN_SIZE AARCH64_INSN_SIZE
+#define KFUNC_MD_INSN_OFFSET KFUNC_MD_DATA_OFFSET
+#endif
+
 /*
  * Currently, gcc tends to save the link register after the local variables
  * on the stack. This causes the max stack tracer to report the function
@@ -216,6 +226,30 @@ static inline bool arch_syscall_match_sym_name(const char *sym,
 	 */
 	return !strcmp(sym + 8, name);
 }
+
+#ifdef CONFIG_FUNCTION_METADATA
+#include
+
+static inline bool kfunc_md_arch_exist(void *ip)
+{
+	return !aarch64_insn_is_nop(*(u32 *)(ip - KFUNC_MD_INSN_OFFSET));
+}
+
+static inline void kfunc_md_arch_pretend(u8 *insn, u32 index)
+{
+	*(u32 *)insn = index;
+}
+
+static inline void kfunc_md_arch_nops(u8 *insn)
+{
+	*(u32 *)insn = aarch64_insn_gen_nop();
+}
+
+static inline int kfunc_md_arch_poke(void *ip, u8 *insn)
+{
+	return aarch64_insn_patch_text_nosync(ip, *(u32 *)insn);
+}
+#endif
 #endif /* ifndef __ASSEMBLY__ */
 
 #ifndef __ASSEMBLY__
diff --git a/arch/arm64/kernel/ftrace.c b/arch/arm64/kernel/ftrace.c
index d7c0d023dfe5..4191ff0037f5 100644
--- a/arch/arm64/kernel/ftrace.c
+++ b/arch/arm64/kernel/ftrace.c
@@ -88,8 +88,10 @@ unsigned long ftrace_call_adjust(unsigned long addr)
 	 * to `BL `, which is at `addr + 4` bytes in either case.
 	 *
 	 */
-	if (!IS_ENABLED(CONFIG_DYNAMIC_FTRACE_WITH_CALL_OPS))
-		return addr + AARCH64_INSN_SIZE;
+	if (!IS_ENABLED(CONFIG_DYNAMIC_FTRACE_WITH_CALL_OPS)) {
+		addr += AARCH64_INSN_SIZE;
+		goto out;
+	}
 
 	/*
 	 * When using patchable-function-entry with pre-function NOPs, addr is
@@ -139,6 +141,13 @@ unsigned long ftrace_call_adjust(unsigned long addr)
 	/* Skip the first NOP after function entry */
 	addr += AARCH64_INSN_SIZE;
 
+out:
+	if (IS_ENABLED(CONFIG_FUNCTION_METADATA)) {
+		if (IS_ENABLED(CONFIG_FUNCTION_ALIGNMENT_8B))
+			addr += 2 * AARCH64_INSN_SIZE;
+		else
+			addr += AARCH64_INSN_SIZE;
+	}
 	return addr;
 }