From patchwork Tue Mar 23 07:34:30 2021
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Yang Yingliang
X-Patchwork-Id: 12157075
From: Yang Yingliang
Subject: [PATCH 1/3] arm64: lib: introduce ldp2/stp2 macro
Date: Tue, 23 Mar 2021 15:34:30 +0800
Message-ID: <20210323073432.3422227-2-yangyingliang@huawei.com>
X-Mailer: git-send-email 2.25.1
In-Reply-To: <20210323073432.3422227-1-yangyingliang@huawei.com>
References: <20210323073432.3422227-1-yangyingliang@huawei.com>
X-BeenThere: linux-arm-kernel@lists.infradead.org

Introduce ldp2/stp2 macros, which load/store a register pair at explicit
offsets from src/dst without updating src/dst.

Signed-off-by: Yang Yingliang
---
 arch/arm64/include/asm/asm-uaccess.h | 16 ++++++++++++++++
 arch/arm64/lib/copy_from_user.S      |  8 ++++++++
 arch/arm64/lib/copy_in_user.S        |  8 ++++++++
 arch/arm64/lib/copy_to_user.S        |  8 ++++++++
 arch/arm64/lib/memcpy.S              |  8 ++++++++
 5 files changed, 48 insertions(+)

diff --git a/arch/arm64/include/asm/asm-uaccess.h b/arch/arm64/include/asm/asm-uaccess.h
index ccedf548dac9..129c08621df1 100644
--- a/arch/arm64/include/asm/asm-uaccess.h
+++ b/arch/arm64/include/asm/asm-uaccess.h
@@ -72,6 +72,14 @@ alternative_else_nop_endif
 		_asm_extable	8889b,\l;
 	.endm
 
+	.macro user_ldp2 l, reg1, reg2, addr, post_inc1, post_inc2
+8888:		ldtr	\reg1, [\addr, \post_inc1];
+8889:		ldtr	\reg2, [\addr, \post_inc2];
+
+		_asm_extable	8888b,\l;
+		_asm_extable	8889b,\l;
+	.endm
+
 	.macro user_stp l, reg1, reg2, addr, post_inc
 8888:		sttr	\reg1, [\addr];
 8889:		sttr	\reg2, [\addr, #8];
@@ -81,6 +89,14 @@ alternative_else_nop_endif
 		_asm_extable	8889b,\l;
 	.endm
 
+	.macro user_stp2 l, reg1, reg2, addr, post_inc1, post_inc2
+8888:		sttr	\reg1, [\addr, \post_inc1];
+8889:		sttr	\reg2, [\addr, \post_inc2];
+
+		_asm_extable	8888b,\l;
+		_asm_extable	8889b,\l;
+	.endm
+
 	.macro user_ldst l, inst, reg, addr, post_inc
 8888:		\inst		\reg, [\addr];
 		add		\addr, \addr, \post_inc;
diff --git a/arch/arm64/lib/copy_from_user.S b/arch/arm64/lib/copy_from_user.S
index 95cd62d67371..37308bcb338e 100644
--- a/arch/arm64/lib/copy_from_user.S
+++ b/arch/arm64/lib/copy_from_user.S
@@ -48,10 +48,18 @@
 	user_ldp 9998f, \reg1, \reg2, \ptr, \val
 	.endm
 
+	.macro ldp2 reg1, reg2, ptr, val1, val2
+	user_ldp2 9998f, \reg1, \reg2, \ptr, \val1, \val2
+	.endm
+
 	.macro stp1 reg1, reg2, ptr, val
 	stp \reg1, \reg2, [\ptr], \val
 	.endm
 
+	.macro stp2 reg1, reg2, ptr, val1, val2
+	stp \reg1, \reg2, [\ptr, \val1]
+	.endm
+
 end	.req	x5
 SYM_FUNC_START(__arch_copy_from_user)
 	add	end, x0, x2
diff --git a/arch/arm64/lib/copy_in_user.S b/arch/arm64/lib/copy_in_user.S
index 1f61cd0df062..5654f7098102 100644
--- a/arch/arm64/lib/copy_in_user.S
+++ b/arch/arm64/lib/copy_in_user.S
@@ -49,10 +49,18 @@
 	user_ldp 9998f, \reg1, \reg2, \ptr, \val
 	.endm
 
+	.macro ldp2 reg1, reg2, ptr, val1, val2
+	user_ldp2 9998f, \reg1, \reg2, \ptr, \val1, \val2
+	.endm
+
 	.macro stp1 reg1, reg2, ptr, val
 	user_stp 9998f, \reg1, \reg2, \ptr, \val
 	.endm
 
+	.macro stp2 reg1, reg2, ptr, val1, val2
+	user_stp2 9998f, \reg1, \reg2, \ptr, \val1, \val2
+	.endm
+
 end	.req	x5
 
 SYM_FUNC_START(__arch_copy_in_user)
diff --git a/arch/arm64/lib/copy_to_user.S b/arch/arm64/lib/copy_to_user.S
index 043da90f5dd7..a1f95169ce04 100644
--- a/arch/arm64/lib/copy_to_user.S
+++ b/arch/arm64/lib/copy_to_user.S
@@ -47,10 +47,18 @@
 	ldp \reg1, \reg2, [\ptr], \val
 	.endm
 
+	.macro ldp2 reg1, reg2, ptr, val1, val2
+	ldp \reg1, \reg2, [\ptr, \val1]
+	.endm
+
 	.macro stp1 reg1, reg2, ptr, val
 	user_stp 9998f, \reg1, \reg2, \ptr, \val
 	.endm
 
+	.macro stp2 reg1, reg2, ptr, val1, val2
+	user_stp2 9998f, \reg1, \reg2, \ptr, \val1, \val2
+	.endm
+
 end	.req	x5
 SYM_FUNC_START(__arch_copy_to_user)
 	add	end, x0, x2
diff --git a/arch/arm64/lib/memcpy.S b/arch/arm64/lib/memcpy.S
index dc8d2a216a6e..9e0bfefd2673 100644
--- a/arch/arm64/lib/memcpy.S
+++ b/arch/arm64/lib/memcpy.S
@@ -52,10 +52,18 @@
 	ldp \reg1, \reg2, [\ptr], \val
 	.endm
 
+	.macro ldp2 reg1, reg2, ptr, val1, val2
+	ldp \reg1, \reg2, [\ptr, \val1]
+	.endm
+
 	.macro stp1 reg1, reg2, ptr, val
 	stp \reg1, \reg2, [\ptr], \val
 	.endm
 
+	.macro stp2 reg1, reg2, ptr, val1, val2
+	stp \reg1, \reg2, [\ptr, \val1]
+	.endm
+
 SYM_FUNC_START_ALIAS(__memcpy)
 SYM_FUNC_START_WEAK_PI(memcpy)
 #include "copy_template.S"

From patchwork Tue Mar 23 07:34:31 2021
X-Patchwork-Submitter: Yang Yingliang
X-Patchwork-Id: 12157073
From: Yang Yingliang
Subject: [PATCH 2/3] arm64: lib: improve copy performance when size is ge 128 bytes
Date: Tue, 23 Mar 2021 15:34:31 +0800
Message-ID: <20210323073432.3422227-3-yangyingliang@huawei.com>
In-Reply-To: <20210323073432.3422227-1-yangyingliang@huawei.com>
References: <20210323073432.3422227-1-yangyingliang@huawei.com>

When copying 128 bytes or more, src/dst is updated after every ldp/stp
instruction, which costs extra time. To improve this, update src/dst only
once per 64 bytes loaded or stored.

Cost of copying 4096 bytes on Kunpeng920 (ms):
Without this patch:
memcpy: 143.85
copy_from_user: 172.69
copy_to_user: 199.23

With this patch:
memcpy: 107.12
copy_from_user: 157.50
copy_to_user: 198.85

It's about a 25% improvement in memcpy().

Signed-off-by: Yang Yingliang
---
 arch/arm64/lib/copy_template.S | 36 +++++++++++++++++++---------------
 1 file changed, 20 insertions(+), 16 deletions(-)

diff --git a/arch/arm64/lib/copy_template.S b/arch/arm64/lib/copy_template.S
index 488df234c49a..c3cd6f84c9c0 100644
--- a/arch/arm64/lib/copy_template.S
+++ b/arch/arm64/lib/copy_template.S
@@ -152,29 +152,33 @@ D_h	.req	x14
 	.p2align	L1_CACHE_SHIFT
 .Lcpy_body_large:
 	/* pre-get 64 bytes data. */
-	ldp1	A_l, A_h, src, #16
-	ldp1	B_l, B_h, src, #16
-	ldp1	C_l, C_h, src, #16
-	ldp1	D_l, D_h, src, #16
+	ldp2	A_l, A_h, src, #0, #8
+	ldp2	B_l, B_h, src, #16, #24
+	ldp2	C_l, C_h, src, #32, #40
+	ldp2	D_l, D_h, src, #48, #56
+	add	src, src, #64
 1:
 	/*
 	* interlace the load of next 64 bytes data block with store of the last
 	* loaded 64 bytes data.
 	*/
-	stp1	A_l, A_h, dst, #16
-	ldp1	A_l, A_h, src, #16
-	stp1	B_l, B_h, dst, #16
-	ldp1	B_l, B_h, src, #16
-	stp1	C_l, C_h, dst, #16
-	ldp1	C_l, C_h, src, #16
-	stp1	D_l, D_h, dst, #16
-	ldp1	D_l, D_h, src, #16
+	stp2	A_l, A_h, dst, #0, #8
+	ldp2	A_l, A_h, src, #0, #8
+	stp2	B_l, B_h, dst, #16, #24
+	ldp2	B_l, B_h, src, #16, #24
+	stp2	C_l, C_h, dst, #32, #40
+	ldp2	C_l, C_h, src, #32, #40
+	stp2	D_l, D_h, dst, #48, #56
+	ldp2	D_l, D_h, src, #48, #56
+	add	src, src, #64
+	add	dst, dst, #64
 	subs	count, count, #64
 	b.ge	1b
-	stp1	A_l, A_h, dst, #16
-	stp1	B_l, B_h, dst, #16
-	stp1	C_l, C_h, dst, #16
-	stp1	D_l, D_h, dst, #16
+	stp2	A_l, A_h, dst, #0, #8
+	stp2	B_l, B_h, dst, #16, #24
+	stp2	C_l, C_h, dst, #32, #40
+	stp2	D_l, D_h, dst, #48, #56
+	add	dst, dst, #64
 
 	tst	count, #0x3f
 	b.ne	.Ltail63

From patchwork Tue Mar 23 07:34:32 2021
X-Patchwork-Submitter: Yang Yingliang
X-Patchwork-Id: 12157077
From: Yang Yingliang
Subject: [PATCH 3/3] arm64: lib: improve copy performance when size is less than 128 and ge 64 bytes
Date: Tue, 23 Mar 2021 15:34:32 +0800
Message-ID: <20210323073432.3422227-4-yangyingliang@huawei.com>
In-Reply-To: <20210323073432.3422227-1-yangyingliang@huawei.com>
References: <20210323073432.3422227-1-yangyingliang@huawei.com>

When copying at least 64 but fewer than 128 bytes, update src/dst only
after all 64 bytes have been loaded and stored, to improve performance.

Cost of copying 127 bytes on Kunpeng920 (ms):
Without this patch:
memcpy: 14.62
copy_from_user: 14.23
copy_to_user: 14.42

With this patch:
memcpy: 13.85
copy_from_user: 13.26
copy_to_user: 13.84

It's about a 5.27% improvement in memcpy().

Signed-off-by: Yang Yingliang
---
 arch/arm64/lib/copy_template.S | 18 ++++++++++--------
 1 file changed, 10 insertions(+), 8 deletions(-)

diff --git a/arch/arm64/lib/copy_template.S b/arch/arm64/lib/copy_template.S
index c3cd6f84c9c0..a9cbd47473f0 100644
--- a/arch/arm64/lib/copy_template.S
+++ b/arch/arm64/lib/copy_template.S
@@ -132,14 +132,16 @@ D_h	.req	x14
 	* Less than 128 bytes to copy, so handle 64 here and then jump
 	* to the tail.
 	*/
-	ldp1	A_l, A_h, src, #16
-	stp1	A_l, A_h, dst, #16
-	ldp1	B_l, B_h, src, #16
-	ldp1	C_l, C_h, src, #16
-	stp1	B_l, B_h, dst, #16
-	stp1	C_l, C_h, dst, #16
-	ldp1	D_l, D_h, src, #16
-	stp1	D_l, D_h, dst, #16
+	ldp2	A_l, A_h, src, #0, #8
+	stp2	A_l, A_h, dst, #0, #8
+	ldp2	B_l, B_h, src, #16, #24
+	ldp2	C_l, C_h, src, #32, #40
+	stp2	B_l, B_h, dst, #16, #24
+	stp2	C_l, C_h, dst, #32, #40
+	ldp2	D_l, D_h, src, #48, #56
+	stp2	D_l, D_h, dst, #48, #56
+	add	src, src, #64
+	add	dst, dst, #64
 
 	tst	count, #0x3f
 	b.ne	.Ltail63
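
Note on the series as a whole: the three patches replace post-indexed
addressing (pointer advanced after each 16-byte pair) with fixed offsets
plus a single pointer update per 64-byte block. The C sketch below is only
a model of that change, not kernel code. The names copy_block_postinc,
copy_block_offsets and copy_schemes_match are hypothetical, and a 16-byte
memcpy() stands in for one ldp/stp pair. It checks that both schemes copy
the same bytes and leave the pointers in the same place; it says nothing
about performance or about fault handling in the uaccess variants.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define PAIR_BYTES  16	/* one ldp/stp moves two 8-byte registers */
#define BLOCK_BYTES 64	/* four pairs: A, B, C, D */

/* Models ldp1/stp1: both pointers advance after every 16-byte pair. */
static void copy_block_postinc(uint8_t **dst, const uint8_t **src)
{
	for (int pair = 0; pair < BLOCK_BYTES / PAIR_BYTES; pair++) {
		memcpy(*dst, *src, PAIR_BYTES);
		*src += PAIR_BYTES;
		*dst += PAIR_BYTES;
	}
}

/* Models ldp2/stp2: fixed offsets from unchanged bases, one update per block. */
static void copy_block_offsets(uint8_t **dst, const uint8_t **src)
{
	memcpy(*dst +  0, *src +  0, PAIR_BYTES);
	memcpy(*dst + 16, *src + 16, PAIR_BYTES);
	memcpy(*dst + 32, *src + 32, PAIR_BYTES);
	memcpy(*dst + 48, *src + 48, PAIR_BYTES);
	*src += BLOCK_BYTES;
	*dst += BLOCK_BYTES;
}

/*
 * Returns 1 if both schemes copy `blocks` 64-byte blocks identically and
 * end with matching pointers, 0 otherwise (including when the request
 * exceeds the 4096-byte scratch buffers).
 */
int copy_schemes_match(size_t blocks)
{
	uint8_t in[4096], out_a[4096] = {0}, out_b[4096] = {0};
	size_t len = blocks * BLOCK_BYTES;

	if (len > sizeof(in))
		return 0;

	/* Fill the source with a non-trivial byte pattern. */
	for (size_t i = 0; i < len; i++)
		in[i] = (uint8_t)(i * 7 + 1);

	const uint8_t *src_a = in, *src_b = in;
	uint8_t *dst_a = out_a, *dst_b = out_b;

	for (size_t i = 0; i < blocks; i++) {
		copy_block_postinc(&dst_a, &src_a);
		copy_block_offsets(&dst_b, &src_b);
	}

	return memcmp(out_a, in, len) == 0 &&
	       memcmp(out_b, in, len) == 0 &&
	       src_a == src_b &&
	       (size_t)(dst_a - out_a) == len &&
	       (size_t)(dst_b - out_b) == len;
}
```

Calling copy_schemes_match(64) exercises a 4096-byte copy, the size used
in the patch 2/3 measurements.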