From patchwork Mon Dec 25 04:42:06 2023
X-Patchwork-Submitter: Jisheng Zhang
X-Patchwork-Id: 13504457
From: Jisheng Zhang
To: Paul Walmsley, Palmer Dabbelt, Albert Ou
Cc: linux-riscv@lists.infradead.org, linux-kernel@vger.kernel.org,
 Eric Biggers, Conor Dooley, Qingfang DENG, Charlie Jenkins
Subject: [PATCH v4 1/2] riscv: introduce RISCV_EFFICIENT_UNALIGNED_ACCESS
Date: Mon, 25 Dec 2023 12:42:06 +0800
Message-Id: <20231225044207.3821-2-jszhang@kernel.org>
In-Reply-To: <20231225044207.3821-1-jszhang@kernel.org>
References: <20231225044207.3821-1-jszhang@kernel.org>

Some RISC-V implementations, such as T-HEAD's C906, C908, C910 and C920,
support efficient unaligned access; for performance reasons we want to
enable HAVE_EFFICIENT_UNALIGNED_ACCESS on these platforms. To avoid
performance regressions on platforms without efficient unaligned access,
HAVE_EFFICIENT_UNALIGNED_ACCESS can't be selected globally.

To solve this problem, runtime code patching based on the detected access
speed would be a good solution, but that's not easy: it involves lots of
work to modify various subsystems such as net, mm, lib and so on. This can
be done step by step. So let's take an easier solution: add support for
efficient unaligned access and hide it behind NONPORTABLE.

This patch introduces RISCV_EFFICIENT_UNALIGNED_ACCESS, which depends on
NONPORTABLE; if users know at config time that the kernel will only run on
hardware platforms with efficient unaligned access, they can enable it.
Obviously, a generic unified kernel Image shouldn't enable it.

Signed-off-by: Jisheng Zhang
Reviewed-by: Charlie Jenkins
Reviewed-by: Eric Biggers
---
 arch/riscv/Kconfig  | 13 +++++++++++++
 arch/riscv/Makefile |  2 ++
 2 files changed, 15 insertions(+)

diff --git a/arch/riscv/Kconfig b/arch/riscv/Kconfig
index 24c1799e2ec4..afcc5fdc16f7 100644
--- a/arch/riscv/Kconfig
+++ b/arch/riscv/Kconfig
@@ -651,6 +651,19 @@ config RISCV_MISALIGNED
 	  load/store for both kernel and userspace. When disable, misaligned
 	  accesses will generate SIGBUS in userspace and panic in kernel.
 
+config RISCV_EFFICIENT_UNALIGNED_ACCESS
+	bool "Assume the CPU supports fast unaligned memory accesses"
+	depends on NONPORTABLE
+	select HAVE_EFFICIENT_UNALIGNED_ACCESS
+	help
+	  Say Y here if you want the kernel to assume that the CPU supports
+	  efficient unaligned memory accesses. When enabled, this option
+	  improves the performance of the kernel on such CPUs. However, the
+	  kernel will run much more slowly, or will not be able to run at all,
+	  on CPUs that do not support efficient unaligned memory accesses.
+
+	  If unsure what to do here, say N.
+
 endmenu # "Platform type"
 
 menu "Kernel features"
diff --git a/arch/riscv/Makefile b/arch/riscv/Makefile
index a74be78678eb..ebbe02628a27 100644
--- a/arch/riscv/Makefile
+++ b/arch/riscv/Makefile
@@ -108,7 +108,9 @@ KBUILD_AFLAGS_MODULE += $(call as-option,-Wa$(comma)-mno-relax)
 # unaligned accesses. While unaligned accesses are explicitly allowed in the
 # RISC-V ISA, they're emulated by machine mode traps on all extant
 # architectures. It's faster to have GCC emit only aligned accesses.
+ifneq ($(CONFIG_HAVE_EFFICIENT_UNALIGNED_ACCESS),y)
 KBUILD_CFLAGS += $(call cc-option,-mstrict-align)
+endif
 
 ifeq ($(CONFIG_STACKPROTECTOR_PER_TASK),y)
 prepare: stack_protector_prepare
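
To illustrate what the new option (and the now-conditional -mstrict-align)
changes: generic kernel code reads potentially misaligned data through
helpers like get_unaligned(), whose portable idiom (a memcpy() or a __packed
access) lets the compiler pick the access sequence. A minimal userspace
sketch of that idiom follows; it is not kernel code, and the function name
is invented for illustration:

#include <stdint.h>
#include <string.h>

/*
 * Portable unaligned read: the compiler decides whether this memcpy()
 * becomes byte loads (what -mstrict-align forces it to assume) or a
 * single possibly-misaligned lw (what the new option declares cheap).
 */
static uint32_t read_u32_unaligned(const void *p)
{
	uint32_t v;

	memcpy(&v, p, sizeof(v));
	return v;
}

int main(void)
{
	unsigned char buf[8] = { 0x11, 0x22, 0x33, 0x44, 0x55, 0x66, 0x77, 0x88 };

	/* buf + 1 is not 4-byte aligned; 0x55443322 on little-endian RISC-V */
	return read_u32_unaligned(buf + 1) == 0x55443322u ? 0 : 1;
}

Comparing the code a RISC-V GCC generates for this with and without
-mstrict-align shows the trade-off the Kconfig help text describes.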

From patchwork Mon Dec 25 04:42:07 2023
X-Patchwork-Submitter: Jisheng Zhang
X-Patchwork-Id: 13504456
From: Jisheng Zhang
To: Paul Walmsley, Palmer Dabbelt, Albert Ou
Cc: linux-riscv@lists.infradead.org, linux-kernel@vger.kernel.org,
 Eric Biggers, Conor Dooley, Qingfang DENG
Subject: [PATCH v4 2/2] riscv: select DCACHE_WORD_ACCESS for efficient unaligned access HW
Date: Mon, 25 Dec 2023 12:42:07 +0800
Message-Id: <20231225044207.3821-3-jszhang@kernel.org>
In-Reply-To: <20231225044207.3821-1-jszhang@kernel.org>
References: <20231225044207.3821-1-jszhang@kernel.org>

DCACHE_WORD_ACCESS uses the word-at-a-time API for optimised string
comparisons in the vfs layer.

This patch implements support for load_unaligned_zeropad in much the same
way as has been done for arm64.

Here are the test program and steps:

$ cat tt.c
#include <sys/types.h>
#include <sys/stat.h>
#include <unistd.h>

#define ITERATIONS 1000000
#define PATH "123456781234567812345678123456781"

int main(void)
{
	unsigned long i;
	struct stat buf;

	for (i = 0; i < ITERATIONS; i++)
		stat(PATH, &buf);
	return 0;
}

$ gcc -O2 tt.c
$ touch 123456781234567812345678123456781
$ time ./a.out

Per my test on T-HEAD C910 platforms, the above test's performance is
improved by about 7.5%.

Signed-off-by: Jisheng Zhang
---
 arch/riscv/Kconfig                      |  1 +
 arch/riscv/include/asm/asm-extable.h    | 15 ++++++++++++
 arch/riscv/include/asm/word-at-a-time.h | 27 +++++++++++++++++++++
 arch/riscv/mm/extable.c                 | 31 +++++++++++++++++++++++++
 4 files changed, 74 insertions(+)

diff --git a/arch/riscv/Kconfig b/arch/riscv/Kconfig
index afcc5fdc16f7..e34863c5a8ed 100644
--- a/arch/riscv/Kconfig
+++ b/arch/riscv/Kconfig
@@ -654,6 +654,7 @@ config RISCV_MISALIGNED
 config RISCV_EFFICIENT_UNALIGNED_ACCESS
 	bool "Assume the CPU supports fast unaligned memory accesses"
 	depends on NONPORTABLE
+	select DCACHE_WORD_ACCESS if MMU
 	select HAVE_EFFICIENT_UNALIGNED_ACCESS
 	help
 	  Say Y here if you want the kernel to assume that the CPU supports
diff --git a/arch/riscv/include/asm/asm-extable.h b/arch/riscv/include/asm/asm-extable.h
index 00a96e7a9664..0c8bfd54fc4e 100644
--- a/arch/riscv/include/asm/asm-extable.h
+++ b/arch/riscv/include/asm/asm-extable.h
@@ -6,6 +6,7 @@
 #define EX_TYPE_FIXUP			1
 #define EX_TYPE_BPF			2
 #define EX_TYPE_UACCESS_ERR_ZERO	3
+#define EX_TYPE_LOAD_UNALIGNED_ZEROPAD	4
 
 #ifdef CONFIG_MMU
 
@@ -47,6 +48,11 @@
 #define EX_DATA_REG_ZERO_SHIFT	5
 #define EX_DATA_REG_ZERO	GENMASK(9, 5)
 
+#define EX_DATA_REG_DATA_SHIFT	0
+#define EX_DATA_REG_DATA	GENMASK(4, 0)
+#define EX_DATA_REG_ADDR_SHIFT	5
+#define EX_DATA_REG_ADDR	GENMASK(9, 5)
+
 #define EX_DATA_REG(reg, gpr)						\
 	"((.L__gpr_num_" #gpr ") << " __stringify(EX_DATA_REG_##reg##_SHIFT) ")"
 
@@ -62,6 +68,15 @@
 #define _ASM_EXTABLE_UACCESS_ERR(insn, fixup, err)			\
 	_ASM_EXTABLE_UACCESS_ERR_ZERO(insn, fixup, err, zero)
 
+#define _ASM_EXTABLE_LOAD_UNALIGNED_ZEROPAD(insn, fixup, data, addr)	\
+	__DEFINE_ASM_GPR_NUMS						\
+	__ASM_EXTABLE_RAW(#insn, #fixup,				\
+			  __stringify(EX_TYPE_LOAD_UNALIGNED_ZEROPAD),	\
+			  "("						\
+			    EX_DATA_REG(DATA, data) " | "		\
+			    EX_DATA_REG(ADDR, addr)			\
+			  ")")
+
 #endif /* __ASSEMBLY__ */
 
 #else /* CONFIG_MMU */
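
For reference, the _ASM_EXTABLE_LOAD_UNALIGNED_ZEROPAD macro above packs the
destination ("data") and address register numbers into the low ten bits of
the exception-table entry's data word, which the fixup handler later unpacks
with FIELD_GET(). Below is a rough userspace model of that encoding; the
macros and helper are hand-rolled stand-ins, not the kernel's
GENMASK()/FIELD_GET(), and exist only for illustration:

#include <assert.h>
#include <stdio.h>

/* Layout mirrors EX_DATA_REG_DATA (bits 4:0) and EX_DATA_REG_ADDR (bits 9:5) */
#define REG_DATA_SHIFT	0
#define REG_DATA_MASK	(0x1fu << REG_DATA_SHIFT)
#define REG_ADDR_SHIFT	5
#define REG_ADDR_MASK	(0x1fu << REG_ADDR_SHIFT)

static unsigned int pack_ex_data(unsigned int data_gpr, unsigned int addr_gpr)
{
	/* each GPR number is 0..31, so five bits per field */
	return (data_gpr << REG_DATA_SHIFT) | (addr_gpr << REG_ADDR_SHIFT);
}

int main(void)
{
	/* e.g. destination register x14, address register x15 */
	unsigned int data = pack_ex_data(14, 15);
	unsigned int reg_data = (data & REG_DATA_MASK) >> REG_DATA_SHIFT;
	unsigned int reg_addr = (data & REG_ADDR_MASK) >> REG_ADDR_SHIFT;

	assert(reg_data == 14 && reg_addr == 15);
	printf("ex->data = 0x%x -> data reg x%u, addr reg x%u\n",
	       data, reg_data, reg_addr);
	return 0;
}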
diff --git a/arch/riscv/include/asm/word-at-a-time.h b/arch/riscv/include/asm/word-at-a-time.h
index 7c086ac6ecd4..f3f031e34191 100644
--- a/arch/riscv/include/asm/word-at-a-time.h
+++ b/arch/riscv/include/asm/word-at-a-time.h
@@ -9,6 +9,7 @@
 #define _ASM_RISCV_WORD_AT_A_TIME_H
 
 
+#include <asm/asm-extable.h>
 #include <linux/kernel.h>
 
 struct word_at_a_time {
@@ -45,4 +46,30 @@ static inline unsigned long find_zero(unsigned long mask)
 /* The mask we created is directly usable as a bytemask */
 #define zero_bytemask(mask) (mask)
 
+#ifdef CONFIG_DCACHE_WORD_ACCESS
+
+/*
+ * Load an unaligned word from kernel space.
+ *
+ * In the (very unlikely) case of the word being a page-crosser
+ * and the next page not being mapped, take the exception and
+ * return zeroes in the non-existing part.
+ */
+static inline unsigned long load_unaligned_zeropad(const void *addr)
+{
+	unsigned long ret;
+
+	/* Load word from unaligned pointer addr */
+	asm(
+	"1:	" REG_L " %0, %2\n"
+	"2:\n"
+	_ASM_EXTABLE_LOAD_UNALIGNED_ZEROPAD(1b, 2b, %0, %1)
+	: "=&r" (ret)
+	: "r" (addr), "m" (*(unsigned long *)addr));
+
+	return ret;
+}
+
+#endif	/* CONFIG_DCACHE_WORD_ACCESS */
+
 #endif /* _ASM_RISCV_WORD_AT_A_TIME_H */
diff --git a/arch/riscv/mm/extable.c b/arch/riscv/mm/extable.c
index 35484d830fd6..dd1530af3ef1 100644
--- a/arch/riscv/mm/extable.c
+++ b/arch/riscv/mm/extable.c
@@ -27,6 +27,14 @@ static bool ex_handler_fixup(const struct exception_table_entry *ex,
 	return true;
 }
 
+static inline unsigned long regs_get_gpr(struct pt_regs *regs, unsigned int offset)
+{
+	if (unlikely(!offset || offset > MAX_REG_OFFSET))
+		return 0;
+
+	return *(unsigned long *)((unsigned long)regs + offset);
+}
+
 static inline void regs_set_gpr(struct pt_regs *regs, unsigned int offset,
 				unsigned long val)
 {
@@ -50,6 +58,27 @@ static bool ex_handler_uaccess_err_zero(const struct exception_table_entry *ex,
 	return true;
 }
 
+static bool
+ex_handler_load_unaligned_zeropad(const struct exception_table_entry *ex,
+				  struct pt_regs *regs)
+{
+	int reg_data = FIELD_GET(EX_DATA_REG_DATA, ex->data);
+	int reg_addr = FIELD_GET(EX_DATA_REG_ADDR, ex->data);
+	unsigned long data, addr, offset;
+
+	addr = regs_get_gpr(regs, reg_addr * sizeof(unsigned long));
+
+	offset = addr & 0x7UL;
+	addr &= ~0x7UL;
+
+	data = *(unsigned long *)addr >> (offset * 8);
+
+	regs_set_gpr(regs, reg_data * sizeof(unsigned long), data);
+
+	regs->epc = get_ex_fixup(ex);
+	return true;
+}
+
 bool fixup_exception(struct pt_regs *regs)
 {
 	const struct exception_table_entry *ex;
@@ -65,6 +94,8 @@ bool fixup_exception(struct pt_regs *regs)
 		return ex_handler_bpf(ex, regs);
 	case EX_TYPE_UACCESS_ERR_ZERO:
 		return ex_handler_uaccess_err_zero(ex, regs);
+	case EX_TYPE_LOAD_UNALIGNED_ZEROPAD:
+		return ex_handler_load_unaligned_zeropad(ex, regs);
 	}
 
 	BUG();
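
The comment and fixup handler above describe the page-crossing case: the
trapped REG_L is replayed as an aligned load of the doubleword containing
addr, and the result is shifted right by 8 * (addr & 7) bits, so the bytes
that would have come from the unmapped page read back as zeroes (the kernel
here is little-endian). A standalone userspace model of that arithmetic,
with invented names and data, purely to show the shift:

#include <assert.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/*
 * Pretend 'aligned' points at the last doubleword of a mapped page and the
 * original unaligned load at aligned+offset crossed into an unmapped page.
 * Reloading the aligned doubleword is always safe, and shifting right drops
 * the bytes below the requested address while leaving zeroes where the
 * unmapped bytes would have been (little-endian byte order assumed).
 */
static uint64_t zeropad_fixup(const uint64_t *aligned, unsigned int offset)
{
	return *aligned >> (offset * 8);
}

int main(void)
{
	const char tail[8] = { 'a', 'b', 'c', 'd', 'e', 'f', 'g', 'h' };
	uint64_t last_dword;
	unsigned int offset = 5;	/* faulting load began 5 bytes into the doubleword */

	memcpy(&last_dword, tail, sizeof(last_dword));

	uint64_t fixed = zeropad_fixup(&last_dword, offset);

	/* low three bytes are 'f', 'g', 'h'; the five bytes that would have
	 * come from the unmapped page read back as zero */
	assert((fixed & 0xff) == 'f');
	assert((fixed >> 24) == 0);
	printf("fixed-up value: 0x%016llx\n", (unsigned long long)fixed);
	return 0;
}

Zero padding is exactly what the word-at-a-time string code expects, since a
zero byte terminates its search anyway.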