From patchwork Wed May 10 03:52:41 2023
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: zhangfei
X-Patchwork-Id: 13236276
From: zhangfei
To: ajones@ventanamicro.com
Cc: aou@eecs.berkeley.edu, linux-kernel@vger.kernel.org,
 linux-riscv@lists.infradead.org, palmer@dabbelt.com,
 paul.walmsley@sifive.com, zhang_fei_0403@163.com,
 zhangfei@nj.iscas.ac.cn, Conor Dooley
Subject: [PATCH 1/2] RISC-V: lib: Improve memset assembler formatting
Date: Wed, 10 May 2023 11:52:41 +0800
Message-Id: <20230510035243.8586-2-zhang_fei_0403@163.com>
X-Mailer: git-send-email 2.34.1
In-Reply-To: <20230510035243.8586-1-zhang_fei_0403@163.com>
References: <20230509-b0dc346928ddc8d2b5690f67@orel>
 <20230510035243.8586-1-zhang_fei_0403@163.com>

From: Andrew Jones

Aligning the first operand of each instruction with a tab is a
typical style which improves readability. Apply it to memset.S.
While there, we also make a small grammar change to a comment.

No functional change intended.

Signed-off-by: Andrew Jones
Reviewed-by: Conor Dooley
---
 arch/riscv/lib/memset.S | 143 ++++++++++++++++++++--------------------
 1 file changed, 72 insertions(+), 71 deletions(-)

diff --git a/arch/riscv/lib/memset.S b/arch/riscv/lib/memset.S
index 34c5360c6705..e613c5c27998 100644
--- a/arch/riscv/lib/memset.S
+++ b/arch/riscv/lib/memset.S
@@ -3,111 +3,112 @@
  * Copyright (C) 2013 Regents of the University of California
  */
 
-
 #include <linux/linkage.h>
 #include <asm/asm.h>
 
 /* void *memset(void *, int, size_t) */
 ENTRY(__memset)
 WEAK(memset)
-	move t0, a0 /* Preserve return value */
+	move	t0, a0			/* Preserve return value */
 
 	/* Defer to byte-oriented fill for small sizes */
-	sltiu a3, a2, 16
-	bnez a3, 4f
+	sltiu	a3, a2, 16
+	bnez	a3, 4f
 
 	/*
 	 * Round to nearest XLEN-aligned address
-	 * greater than or equal to start address
+	 * greater than or equal to the start address.
 	 */
-	addi a3, t0, SZREG-1
-	andi a3, a3, ~(SZREG-1)
-	beq a3, t0, 2f /* Skip if already aligned */
+	addi	a3, t0, SZREG-1
+	andi	a3, a3, ~(SZREG-1)
+	beq	a3, t0, 2f		/* Skip if already aligned */
+
 	/* Handle initial misalignment */
-	sub a4, a3, t0
+	sub	a4, a3, t0
 1:
-	sb a1, 0(t0)
-	addi t0, t0, 1
-	bltu t0, a3, 1b
-	sub a2, a2, a4 /* Update count */
+	sb	a1, 0(t0)
+	addi	t0, t0, 1
+	bltu	t0, a3, 1b
+	sub	a2, a2, a4		/* Update count */
 
 2: /* Duff's device with 32 XLEN stores per iteration */
 	/* Broadcast value into all bytes */
-	andi a1, a1, 0xff
-	slli a3, a1, 8
-	or a1, a3, a1
-	slli a3, a1, 16
-	or a1, a3, a1
+	andi	a1, a1, 0xff
+	slli	a3, a1, 8
+	or	a1, a3, a1
+	slli	a3, a1, 16
+	or	a1, a3, a1
 #ifdef CONFIG_64BIT
-	slli a3, a1, 32
-	or a1, a3, a1
+	slli	a3, a1, 32
+	or	a1, a3, a1
 #endif
 
 	/* Calculate end address */
-	andi a4, a2, ~(SZREG-1)
-	add a3, t0, a4
+	andi	a4, a2, ~(SZREG-1)
+	add	a3, t0, a4
 
-	andi a4, a4, 31*SZREG /* Calculate remainder */
-	beqz a4, 3f /* Shortcut if no remainder */
-	neg a4, a4
-	addi a4, a4, 32*SZREG /* Calculate initial offset */
+	andi	a4, a4, 31*SZREG	/* Calculate remainder */
+	beqz	a4, 3f			/* Shortcut if no remainder */
+	neg	a4, a4
+	addi	a4, a4, 32*SZREG	/* Calculate initial offset */
 
 	/* Adjust start address with offset */
-	sub t0, t0, a4
+	sub	t0, t0, a4
 
 	/* Jump into loop body */
 	/* Assumes 32-bit instruction lengths */
-	la a5, 3f
+	la	a5, 3f
 #ifdef CONFIG_64BIT
-	srli a4, a4, 1
+	srli	a4, a4, 1
 #endif
-	add a5, a5, a4
-	jr a5
+	add	a5, a5, a4
+	jr	a5
 3:
-	REG_S a1, 0(t0)
-	REG_S a1, SZREG(t0)
-	REG_S a1, 2*SZREG(t0)
-	REG_S a1, 3*SZREG(t0)
-	REG_S a1, 4*SZREG(t0)
-	REG_S a1, 5*SZREG(t0)
-	REG_S a1, 6*SZREG(t0)
-	REG_S a1, 7*SZREG(t0)
-	REG_S a1, 8*SZREG(t0)
-	REG_S a1, 9*SZREG(t0)
-	REG_S a1, 10*SZREG(t0)
-	REG_S a1, 11*SZREG(t0)
-	REG_S a1, 12*SZREG(t0)
-	REG_S a1, 13*SZREG(t0)
-	REG_S a1, 14*SZREG(t0)
-	REG_S a1, 15*SZREG(t0)
-	REG_S a1, 16*SZREG(t0)
-	REG_S a1, 17*SZREG(t0)
-	REG_S a1, 18*SZREG(t0)
-	REG_S a1, 19*SZREG(t0)
-	REG_S a1, 20*SZREG(t0)
-	REG_S a1, 21*SZREG(t0)
-	REG_S a1, 22*SZREG(t0)
-	REG_S a1, 23*SZREG(t0)
-	REG_S a1, 24*SZREG(t0)
-	REG_S a1, 25*SZREG(t0)
-	REG_S a1, 26*SZREG(t0)
-	REG_S a1, 27*SZREG(t0)
-	REG_S a1, 28*SZREG(t0)
-	REG_S a1, 29*SZREG(t0)
-	REG_S a1, 30*SZREG(t0)
-	REG_S a1, 31*SZREG(t0)
-	addi t0, t0, 32*SZREG
-	bltu t0, a3, 3b
-	andi a2, a2, SZREG-1 /* Update count */
+	REG_S	a1, 0(t0)
+	REG_S	a1, SZREG(t0)
+	REG_S	a1, 2*SZREG(t0)
+	REG_S	a1, 3*SZREG(t0)
+	REG_S	a1, 4*SZREG(t0)
+	REG_S	a1, 5*SZREG(t0)
+	REG_S	a1, 6*SZREG(t0)
+	REG_S	a1, 7*SZREG(t0)
+	REG_S	a1, 8*SZREG(t0)
+	REG_S	a1, 9*SZREG(t0)
+	REG_S	a1, 10*SZREG(t0)
+	REG_S	a1, 11*SZREG(t0)
+	REG_S	a1, 12*SZREG(t0)
+	REG_S	a1, 13*SZREG(t0)
+	REG_S	a1, 14*SZREG(t0)
+	REG_S	a1, 15*SZREG(t0)
+	REG_S	a1, 16*SZREG(t0)
+	REG_S	a1, 17*SZREG(t0)
+	REG_S	a1, 18*SZREG(t0)
+	REG_S	a1, 19*SZREG(t0)
+	REG_S	a1, 20*SZREG(t0)
+	REG_S	a1, 21*SZREG(t0)
+	REG_S	a1, 22*SZREG(t0)
+	REG_S	a1, 23*SZREG(t0)
+	REG_S	a1, 24*SZREG(t0)
+	REG_S	a1, 25*SZREG(t0)
+	REG_S	a1, 26*SZREG(t0)
+	REG_S	a1, 27*SZREG(t0)
+	REG_S	a1, 28*SZREG(t0)
+	REG_S	a1, 29*SZREG(t0)
+	REG_S	a1, 30*SZREG(t0)
+	REG_S	a1, 31*SZREG(t0)
+
+	addi	t0, t0, 32*SZREG
+	bltu	t0, a3, 3b
+	andi	a2, a2, SZREG-1		/* Update count */
 
 4:
 	/* Handle trailing misalignment */
-	beqz a2, 6f
-	add a3, t0, a2
+	beqz	a2, 6f
+	add	a3, t0, a2
 5:
-	sb a1, 0(t0)
-	addi t0, t0, 1
-	bltu t0, a3, 5b
+	sb	a1, 0(t0)
+	addi	t0, t0, 1
+	bltu	t0, a3, 5b
 6:
 	ret
 END(__memset)

From patchwork Wed May 10 03:52:42 2023
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: zhangfei
X-Patchwork-Id: 13236277
From: zhangfei
To: ajones@ventanamicro.com
Cc: aou@eecs.berkeley.edu, linux-kernel@vger.kernel.org,
 linux-riscv@lists.infradead.org, palmer@dabbelt.com,
 paul.walmsley@sifive.com, zhang_fei_0403@163.com,
 zhangfei@nj.iscas.ac.cn
Subject: [PATCH 2/2] riscv: Optimize memset
Date: Wed, 10 May 2023 11:52:42 +0800
Message-Id: <20230510035243.8586-3-zhang_fei_0403@163.com>
X-Mailer: git-send-email 2.34.1
In-Reply-To: <20230510035243.8586-1-zhang_fei_0403@163.com>
References: <20230509-b0dc346928ddc8d2b5690f67@orel>
 <20230510035243.8586-1-zhang_fei_0403@163.com>

From: zhangfei

Optimize memset for data sizes of less than 16 bytes. Compared to
byte-by-byte stores, this significantly improves performance: the
store instructions can be executed in parallel and the number of
jumps is reduced.

Signed-off-by: Fei Zhang
---
 arch/riscv/lib/memset.S | 33 ++++++++++++++++++++++++++++++---
 1 file changed, 30 insertions(+), 3 deletions(-)

diff --git a/arch/riscv/lib/memset.S b/arch/riscv/lib/memset.S
index e613c5c27998..6113a2696e79 100644
--- a/arch/riscv/lib/memset.S
+++ b/arch/riscv/lib/memset.S
@@ -106,9 +106,36 @@ WEAK(memset)
 	beqz	a2, 6f
 	add	a3, t0, a2
 5:
-	sb	a1, 0(t0)
-	addi	t0, t0, 1
-	bltu	t0, a3, 5b
+	sb	a1, 0(t0)
+	sb	a1, -1(a3)
+	li	a4, 2
+	bgeu	a4, a2, 6f
+
+	sb	a1, 1(t0)
+	sb	a1, 2(t0)
+	sb	a1, -2(a3)
+	sb	a1, -3(a3)
+	li	a4, 6
+	bgeu	a4, a2, 6f
+
+	sb	a1, 3(t0)
+	sb	a1, -4(a3)
+	li	a4, 8
+	bgeu	a4, a2, 6f
+
+	sb	a1, 4(t0)
+	sb	a1, -5(a3)
+	li	a4, 10
+	bgeu	a4, a2, 6f
+
+	sb	a1, 5(t0)
+	sb	a1, 6(t0)
+	sb	a1, -6(a3)
+	sb	a1, -7(a3)
+	li	a4, 14
+	bgeu	a4, a2, 6f
+
+	sb	a1, 7(t0)
 6:
 	ret
 END(__memset)
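
For anyone checking the store pattern in patch 2, here is a minimal C
sketch of the new tail fill. It is only a model, not kernel code: the
name fill_small() is hypothetical, p and end stand in for t0 and a3, and
the n <= 15 precondition is an assumption drawn from the "sltiu a3, a2,
16" check on entry and from the word loop leaving fewer than SZREG bytes.
Each step stores from both ends of the buffer, so the stores overlap for
short lengths instead of needing one branch per byte.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Illustrative C model of the patched tail fill (labels 4: to 6: in
 * memset.S). fill_small() is a hypothetical name, not a kernel symbol.
 * Like the assembly path it models, it handles at most 15 bytes.
 */
void *fill_small(void *dst, int c, size_t n)
{
	unsigned char *p = dst;			/* t0: start of region */
	unsigned char *end = p + n;		/* a3: one past the end */
	unsigned char v = (unsigned char)c;	/* sb stores the low byte */

	assert(n <= 15);
	if (n == 0)				/* beqz a2, 6f */
		return dst;

	p[0] = v; end[-1] = v;
	if (n <= 2)				/* li a4, 2; bgeu a4, a2, 6f */
		return dst;

	p[1] = v; p[2] = v; end[-2] = v; end[-3] = v;
	if (n <= 6)
		return dst;

	p[3] = v; end[-4] = v;
	if (n <= 8)
		return dst;

	p[4] = v; end[-5] = v;
	if (n <= 10)
		return dst;

	p[5] = v; p[6] = v; end[-6] = v; end[-7] = v;
	if (n <= 14)
		return dst;

	p[7] = v;				/* n == 15: the middle byte */
	return dst;
}
```

With n = 5, for example, the stores land on offsets 0, 4, 1, 2, 3, 2:
every byte is written, offset 2 twice, and only two size comparisons are
made after the initial zero check.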