From patchwork Wed Oct 14 08:32:52 2020
X-Patchwork-Submitter: Ankur Arora
X-Patchwork-Id: 11837053
From: Ankur Arora
To: linux-kernel@vger.kernel.org, linux-mm@kvack.org
Cc: kirill@shutemov.name, mhocko@kernel.org, boris.ostrovsky@oracle.com,
 konrad.wilk@oracle.com, Ankur Arora, Thomas Gleixner, Ingo Molnar,
 Borislav Petkov, x86@kernel.org, "H. Peter Anvin", Tony Luck,
 Pawan Gupta, Josh Poimboeuf, "Peter Zijlstra (Intel)", Mark Gross,
 Kim Phillips, Vineela Tummalapalli, Wei Huang
Subject: [PATCH 1/8] x86/cpuid: add X86_FEATURE_NT_GOOD
Date: Wed, 14 Oct 2020 01:32:52 -0700
Message-Id: <20201014083300.19077-2-ankur.a.arora@oracle.com>
In-Reply-To: <20201014083300.19077-1-ankur.a.arora@oracle.com>
References: <20201014083300.19077-1-ankur.a.arora@oracle.com>

Enabled on microarchitectures with performant non-temporal MOV (MOVNTI)
instruction.

Signed-off-by: Ankur Arora
---
 arch/x86/include/asm/cpufeatures.h | 1 +
 1 file changed, 1 insertion(+)

diff --git a/arch/x86/include/asm/cpufeatures.h b/arch/x86/include/asm/cpufeatures.h
index 7b0afd5e6c57..8bae38240346 100644
--- a/arch/x86/include/asm/cpufeatures.h
+++ b/arch/x86/include/asm/cpufeatures.h
@@ -289,6 +289,7 @@
 #define X86_FEATURE_FENCE_SWAPGS_KERNEL (11*32+ 5) /* "" LFENCE in kernel entry SWAPGS path */
 #define X86_FEATURE_SPLIT_LOCK_DETECT   (11*32+ 6) /* #AC for split lock */
 #define X86_FEATURE_PER_THREAD_MBA      (11*32+ 7) /* "" Per-thread Memory Bandwidth Allocation */
+#define X86_FEATURE_NT_GOOD             (11*32+ 8) /* Non-temporal instructions perform well */

 /* Intel-defined CPU features, CPUID level 0x00000007:1 (EAX), word 12 */
 #define X86_FEATURE_AVX512_BF16         (12*32+ 5) /* AVX512 BFLOAT16 instructions */

From patchwork Wed Oct 14 08:32:53 2020
X-Patchwork-Submitter: Ankur Arora
X-Patchwork-Id: 11837041
From: Ankur Arora
To: linux-kernel@vger.kernel.org, linux-mm@kvack.org
Cc: kirill@shutemov.name, mhocko@kernel.org, boris.ostrovsky@oracle.com,
 konrad.wilk@oracle.com, Ankur Arora, Thomas Gleixner, Ingo Molnar,
 Borislav Petkov, x86@kernel.org, "H. Peter Anvin", Jiri Slaby,
 Juergen Gross
Subject: [PATCH 2/8] x86/asm: add memset_movnti()
Date: Wed, 14 Oct 2020 01:32:53 -0700
Message-Id: <20201014083300.19077-3-ankur.a.arora@oracle.com>
In-Reply-To: <20201014083300.19077-1-ankur.a.arora@oracle.com>
References: <20201014083300.19077-1-ankur.a.arora@oracle.com>

Add a MOVNTI based implementation of memset().

memset_orig() and memset_movnti() only differ in the opcode used in the
inner loop, so move the memset_orig() logic into a macro, which gets
expanded into memset_movq() and memset_movnti().

Signed-off-by: Ankur Arora
---
 arch/x86/lib/memset_64.S | 68 +++++++++++++++++++++++++++---------------------
 1 file changed, 38 insertions(+), 30 deletions(-)

diff --git a/arch/x86/lib/memset_64.S b/arch/x86/lib/memset_64.S
index 9ff15ee404a4..79703cc04b6a 100644
--- a/arch/x86/lib/memset_64.S
+++ b/arch/x86/lib/memset_64.S
@@ -27,7 +27,7 @@ SYM_FUNC_START(__memset)
          *
          * Otherwise, use original memset function.
          */
-        ALTERNATIVE_2 "jmp memset_orig", "", X86_FEATURE_REP_GOOD, \
+        ALTERNATIVE_2 "jmp memset_movq", "", X86_FEATURE_REP_GOOD, \
                       "jmp memset_erms", X86_FEATURE_ERMS

         movq %rdi,%r9
@@ -68,7 +68,8 @@ SYM_FUNC_START_LOCAL(memset_erms)
         ret
 SYM_FUNC_END(memset_erms)

-SYM_FUNC_START_LOCAL(memset_orig)
+.macro MEMSET_MOV OP fence
+SYM_FUNC_START_LOCAL(memset_\OP)
         movq %rdi,%r10

         /* expand byte value */
@@ -79,64 +80,71 @@ SYM_FUNC_START_LOCAL(memset_orig)
         /* align dst */
         movl %edi,%r9d
         andl $7,%r9d
-        jnz .Lbad_alignment
-.Lafter_bad_alignment:
+        jnz .Lbad_alignment_\@
+.Lafter_bad_alignment_\@:

         movq %rdx,%rcx
         shrq $6,%rcx
-        jz .Lhandle_tail
+        jz .Lhandle_tail_\@

         .p2align 4
-.Lloop_64:
+.Lloop_64_\@:
         decq %rcx
-        movq %rax,(%rdi)
-        movq %rax,8(%rdi)
-        movq %rax,16(%rdi)
-        movq %rax,24(%rdi)
-        movq %rax,32(%rdi)
-        movq %rax,40(%rdi)
-        movq %rax,48(%rdi)
-        movq %rax,56(%rdi)
+        \OP %rax,(%rdi)
+        \OP %rax,8(%rdi)
+        \OP %rax,16(%rdi)
+        \OP %rax,24(%rdi)
+        \OP %rax,32(%rdi)
+        \OP %rax,40(%rdi)
+        \OP %rax,48(%rdi)
+        \OP %rax,56(%rdi)
         leaq 64(%rdi),%rdi
-        jnz .Lloop_64
+        jnz .Lloop_64_\@

         /* Handle tail in loops. The loops should be faster than hard
            to predict jump tables. */
         .p2align 4
-.Lhandle_tail:
+.Lhandle_tail_\@:
         movl %edx,%ecx
         andl $63&(~7),%ecx
-        jz .Lhandle_7
+        jz .Lhandle_7_\@
         shrl $3,%ecx
         .p2align 4
-.Lloop_8:
+.Lloop_8_\@:
         decl %ecx
-        movq %rax,(%rdi)
+        \OP %rax,(%rdi)
         leaq 8(%rdi),%rdi
-        jnz .Lloop_8
+        jnz .Lloop_8_\@

-.Lhandle_7:
+.Lhandle_7_\@:
         andl $7,%edx
-        jz .Lende
+        jz .Lende_\@
         .p2align 4
-.Lloop_1:
+.Lloop_1_\@:
         decl %edx
         movb %al,(%rdi)
         leaq 1(%rdi),%rdi
-        jnz .Lloop_1
+        jnz .Lloop_1_\@

-.Lende:
+.Lende_\@:
+        .if \fence
+        sfence
+        .endif
         movq %r10,%rax
         ret

-.Lbad_alignment:
+.Lbad_alignment_\@:
         cmpq $7,%rdx
-        jbe .Lhandle_7
+        jbe .Lhandle_7_\@
         movq %rax,(%rdi) /* unaligned store */
         movq $8,%r8
         subq %r9,%r8
         addq %r8,%rdi
         subq %r8,%rdx
-        jmp .Lafter_bad_alignment
-.Lfinal:
-SYM_FUNC_END(memset_orig)
+        jmp .Lafter_bad_alignment_\@
+.Lfinal_\@:
+SYM_FUNC_END(memset_\OP)
+.endm
+
+MEMSET_MOV OP=movq fence=0
+MEMSET_MOV OP=movnti fence=1

From patchwork Wed Oct 14 08:32:54 2020
X-Patchwork-Submitter: Ankur Arora
X-Patchwork-Id: 11837043
From: Ankur Arora
To: linux-kernel@vger.kernel.org, linux-mm@kvack.org
Cc: kirill@shutemov.name, mhocko@kernel.org, boris.ostrovsky@oracle.com,
 konrad.wilk@oracle.com, Ankur Arora, Peter Zijlstra, Ingo Molnar,
 Arnaldo Carvalho de Melo, Mark Rutland, Alexander Shishkin, Jiri Olsa,
 Namhyung Kim
Subject: [PATCH 3/8] perf bench: add memset_movnti()
Date: Wed, 14 Oct 2020 01:32:54 -0700
Message-Id: <20201014083300.19077-4-ankur.a.arora@oracle.com>
In-Reply-To: <20201014083300.19077-1-ankur.a.arora@oracle.com>
References: <20201014083300.19077-1-ankur.a.arora@oracle.com>

Clone memset_movnti() from arch/x86/lib/memset_64.S.

perf bench mem memset -f x86-64-movnt on Intel Broadwellx, Skylakex
and AMD Rome:

Intel Broadwellx:
$ for i in 2 8 32 128 512; do
        perf bench mem memset -f x86-64-movnt -s ${i}MB
  done

  # Output pruned.
  # Running 'mem/memset' benchmark:
  # function 'x86-64-movnt' (movnt-based memset() in arch/x86/lib/memset_64.S)
  # Copying 2MB bytes ...
        11.837121 GB/sec
  # Copying 8MB bytes ...
        11.783560 GB/sec
  # Copying 32MB bytes ...
        11.868591 GB/sec
  # Copying 128MB bytes ...
        11.865211 GB/sec
  # Copying 512MB bytes ...
        11.864085 GB/sec

Intel Skylakex:
$ for i in 2 8 32 128 512; do
        perf bench mem memset -f x86-64-movnt -s ${i}MB
  done

  # Running 'mem/memset' benchmark:
  # function 'x86-64-movnt' (movnt-based memset() in arch/x86/lib/memset_64.S)
  # Copying 2MB bytes ...
        6.361971 GB/sec
  # Copying 8MB bytes ...
        6.300403 GB/sec
  # Copying 32MB bytes ...
        6.288992 GB/sec
  # Copying 128MB bytes ...
        6.328793 GB/sec
  # Copying 512MB bytes ...
        6.324471 GB/sec

AMD Rome:
$ for i in 2 8 32 128 512; do
        perf bench mem memset -f x86-64-movnt -s ${i}MB
  done

  # Running 'mem/memset' benchmark:
  # function 'x86-64-movnt' (movnt-based memset() in arch/x86/lib/memset_64.S)
  # Copying 2MB bytes ...
        10.993199 GB/sec
  # Copying 8MB bytes ...
        14.221784 GB/sec
  # Copying 32MB bytes ...
        14.293337 GB/sec
  # Copying 128MB bytes ...
        15.238947 GB/sec
  # Copying 512MB bytes ...
        16.476093 GB/sec

Signed-off-by: Ankur Arora
---
 tools/arch/x86/lib/memset_64.S               | 68 ++++++++++++++++------------
 tools/perf/bench/mem-memset-x86-64-asm-def.h |  6 ++-
 2 files changed, 43 insertions(+), 31 deletions(-)

diff --git a/tools/arch/x86/lib/memset_64.S b/tools/arch/x86/lib/memset_64.S
index fd5d25a474b7..bfbf6d06f81e 100644
--- a/tools/arch/x86/lib/memset_64.S
+++ b/tools/arch/x86/lib/memset_64.S
@@ -26,7 +26,7 @@ SYM_FUNC_START(__memset)
          *
          * Otherwise, use original memset function.
          */
-        ALTERNATIVE_2 "jmp memset_orig", "", X86_FEATURE_REP_GOOD, \
+        ALTERNATIVE_2 "jmp memset_movq", "", X86_FEATURE_REP_GOOD, \
                       "jmp memset_erms", X86_FEATURE_ERMS

         movq %rdi,%r9
@@ -65,7 +65,8 @@ SYM_FUNC_START(memset_erms)
         ret
 SYM_FUNC_END(memset_erms)

-SYM_FUNC_START(memset_orig)
+.macro MEMSET_MOV OP fence
+SYM_FUNC_START(memset_\OP)
         movq %rdi,%r10

         /* expand byte value */
@@ -76,64 +77,71 @@ SYM_FUNC_START(memset_orig)
         /* align dst */
         movl %edi,%r9d
         andl $7,%r9d
-        jnz .Lbad_alignment
-.Lafter_bad_alignment:
+        jnz .Lbad_alignment_\@
+.Lafter_bad_alignment_\@:

         movq %rdx,%rcx
         shrq $6,%rcx
-        jz .Lhandle_tail
+        jz .Lhandle_tail_\@

         .p2align 4
-.Lloop_64:
+.Lloop_64_\@:
         decq %rcx
-        movq %rax,(%rdi)
-        movq %rax,8(%rdi)
-        movq %rax,16(%rdi)
-        movq %rax,24(%rdi)
-        movq %rax,32(%rdi)
-        movq %rax,40(%rdi)
-        movq %rax,48(%rdi)
-        movq %rax,56(%rdi)
+        \OP %rax,(%rdi)
+        \OP %rax,8(%rdi)
+        \OP %rax,16(%rdi)
+        \OP %rax,24(%rdi)
+        \OP %rax,32(%rdi)
+        \OP %rax,40(%rdi)
+        \OP %rax,48(%rdi)
+        \OP %rax,56(%rdi)
         leaq 64(%rdi),%rdi
-        jnz .Lloop_64
+        jnz .Lloop_64_\@

         /* Handle tail in loops. The loops should be faster than hard
            to predict jump tables. */
         .p2align 4
-.Lhandle_tail:
+.Lhandle_tail_\@:
         movl %edx,%ecx
         andl $63&(~7),%ecx
-        jz .Lhandle_7
+        jz .Lhandle_7_\@
         shrl $3,%ecx
         .p2align 4
-.Lloop_8:
+.Lloop_8_\@:
         decl %ecx
-        movq %rax,(%rdi)
+        \OP %rax,(%rdi)
         leaq 8(%rdi),%rdi
-        jnz .Lloop_8
+        jnz .Lloop_8_\@

-.Lhandle_7:
+.Lhandle_7_\@:
         andl $7,%edx
-        jz .Lende
+        jz .Lende_\@
         .p2align 4
-.Lloop_1:
+.Lloop_1_\@:
         decl %edx
         movb %al,(%rdi)
         leaq 1(%rdi),%rdi
-        jnz .Lloop_1
+        jnz .Lloop_1_\@

-.Lende:
+.Lende_\@:
+        .if \fence
+        sfence
+        .endif
         movq %r10,%rax
         ret

-.Lbad_alignment:
+.Lbad_alignment_\@:
         cmpq $7,%rdx
-        jbe .Lhandle_7
+        jbe .Lhandle_7_\@
         movq %rax,(%rdi) /* unaligned store */
         movq $8,%r8
         subq %r9,%r8
         addq %r8,%rdi
         subq %r8,%rdx
-        jmp .Lafter_bad_alignment
-.Lfinal:
-SYM_FUNC_END(memset_orig)
+        jmp .Lafter_bad_alignment_\@
+.Lfinal_\@:
+SYM_FUNC_END(memset_\OP)
+.endm
+
+MEMSET_MOV OP=movq fence=0
+MEMSET_MOV OP=movnti fence=1
diff --git a/tools/perf/bench/mem-memset-x86-64-asm-def.h b/tools/perf/bench/mem-memset-x86-64-asm-def.h
index dac6d2b7c39b..53ead7f91313 100644
--- a/tools/perf/bench/mem-memset-x86-64-asm-def.h
+++ b/tools/perf/bench/mem-memset-x86-64-asm-def.h
@@ -1,6 +1,6 @@
 /* SPDX-License-Identifier: GPL-2.0 */

-MEMSET_FN(memset_orig,
+MEMSET_FN(memset_movq,
         "x86-64-unrolled",
         "unrolled memset() in arch/x86/lib/memset_64.S")

@@ -11,3 +11,7 @@ MEMSET_FN(__memset,
 MEMSET_FN(memset_erms,
         "x86-64-stosb",
         "movsb-based memset() in arch/x86/lib/memset_64.S")
+
+MEMSET_FN(memset_movnti,
+        "x86-64-movnt",
+        "movnt-based memset() in arch/x86/lib/memset_64.S")

From patchwork Wed Oct 14 08:32:55 2020
X-Patchwork-Submitter: Ankur Arora
X-Patchwork-Id: 11837049
From: Ankur Arora
To: linux-kernel@vger.kernel.org, linux-mm@kvack.org
Cc: kirill@shutemov.name, mhocko@kernel.org, boris.ostrovsky@oracle.com,
 konrad.wilk@oracle.com, Ankur Arora, Thomas Gleixner, Ingo Molnar,
 Borislav Petkov, x86@kernel.org, "H. Peter Anvin", Jiri Slaby,
 Herbert Xu, "Rafael J. Wysocki"
Subject: [PATCH 4/8] x86/asm: add clear_page_nt()
Date: Wed, 14 Oct 2020 01:32:55 -0700
Message-Id: <20201014083300.19077-5-ankur.a.arora@oracle.com>
In-Reply-To: <20201014083300.19077-1-ankur.a.arora@oracle.com>
References: <20201014083300.19077-1-ankur.a.arora@oracle.com>

Add clear_page_nt() which is essentially an unrolled MOVNTI loop.
The unrolling keeps the inner loop similar to memset_movnti() which can
be exercised via perf bench mem memset. The caller needs to execute an
SFENCE when done.

MOVNTI, from the Intel SDM, Volume 2B, 4-101:
 "The non-temporal hint is implemented by using a write combining (WC)
  memory type protocol when writing the data to memory. Using this
  protocol, the processor does not write the data into the cache
  hierarchy, nor does it fetch the corresponding cache line from memory
  into the cache hierarchy."

The AMD Arch Manual has something similar to say as well.

This can potentially improve page-clearing bandwidth (see below for
performance numbers for two microarchitectures where it helps and one
where it doesn't) and can help indirectly by consuming fewer cache
resources.

Any performance benefit is expected only for extents of LLC size or
larger, where we are DRAM-BW constrained rather than cache-BW
constrained.

 # Intel Broadwellx
 # Performance comparison of 'perf bench mem memset -l 1' for x86-64-stosb
 # (X86_FEATURE_ERMS) and x86-64-movnt:

 System:           Oracle X6-2
 CPU:              2 nodes * 10 cores/node * 2 threads/core
                   Intel Xeon E5-2630 v4 (Broadwellx, 6:79:1)
 Memory:           256 GB evenly split between nodes
 Microcode:        0xb00002e
 scaling_governor: performance
 L3 size:          25MB
 intel_pstate/no_turbo: 1

                   x86-64-stosb (5 runs)     x86-64-movnt (5 runs)      speedup
                   -----------------------   -----------------------    -------
      size            BW     (   pstdev)        BW      (   pstdev)

      16MB      17.35 GB/s   ( +- 9.27%)    11.83 GB/s   ( +- 0.19%)    -31.81%
     128MB       5.31 GB/s   ( +- 0.13%)    11.72 GB/s   ( +- 0.44%)   +121.84%
    1024MB       5.42 GB/s   ( +- 0.13%)    11.78 GB/s   ( +- 0.03%)   +117.34%
    4096MB       5.41 GB/s   ( +- 0.41%)    11.76 GB/s   ( +- 0.07%)   +117.37%

Comparing perf stats for size=4096MB:

$ perf stat -r 5 --all-user -e ... perf bench mem memset -l 1 -s 4096MB -f x86-64-stosb
 # Running 'mem/memset' benchmark:
 # function 'x86-64-stosb' (movsb-based memset() in arch/x86/lib/memset_64.S)
 # Copying 4096MB bytes ...

       5.405362 GB/sec
       5.444229 GB/sec
       5.397943 GB/sec
       5.401012 GB/sec
       5.439320 GB/sec

 Performance counter stats for 'perf bench mem memset -l 1 -s 4096MB -f x86-64-stosb' (5 runs):

     2,064,476,092      cpu-cycles               # 1.087 GHz  ( +- 0.17% ) (22.19%)
         8,578,591      instructions             # 0.00 insn per cycle  ( +- 12.01% ) (27.79%)
       132,481,645      cache-references         # 69.730 M/sec  ( +- 0.20% ) (27.83%)
           157,710      cache-misses             # 0.119 % of all cache refs  ( +- 5.80% ) (27.84%)
         2,879,628      branch-instructions      # 1.516 M/sec  ( +- 0.21% ) (27.86%)
            80,581      branch-misses            # 2.80% of all branches  ( +- 13.15% ) (27.84%)
        94,401,869      bus-cycles               # 49.687 M/sec  ( +- 0.25% ) (22.21%)
       133,947,283      L1-dcache-load-misses    # 139717.91% of all L1-dcache accesses  ( +- 0.26% ) (22.21%)
            95,870      L1-dcache-loads          # 0.050 M/sec  ( +- 9.95% ) (22.21%)
             1,700      LLC-loads                # 0.895 K/sec  ( +- 6.50% ) (22.21%)
             1,410      LLC-load-misses          # 82.95% of all LL-cache accesses  ( +- 19.42% ) (22.21%)
       132,526,771      LLC-stores               # 69.754 M/sec  ( +- 0.65% ) (11.10%)
           101,145      LLC-store-misses         # 0.053 M/sec  ( +- 11.19% ) (11.10%)

           1.90238 +- 0.00358 seconds time elapsed  ( +- 0.19% )

$ perf stat -r 5 --all-user -e ... perf bench mem memset -l 1 -s 4096MB -f x86-64-movnt
 # Running 'mem/memset' benchmark:
 # function 'x86-64-movnt' (movnt-based memset() in arch/x86/lib/memset_64.S)
 # Copying 4096MB bytes ...

      11.774264 GB/sec
      11.758826 GB/sec
      11.774368 GB/sec
      11.758239 GB/sec
      11.760348 GB/sec

 Performance counter stats for 'perf bench mem memset -l 1 -s 4096MB -f x86-64-movnt' (5 runs):

     1,619,807,936      cpu-cycles               # 0.971 GHz  ( +- 0.24% ) (22.14%)
     1,481,306,856      instructions             # 0.91 insn per cycle  ( +- 0.33% ) (27.75%)
           163,086      cache-references         # 0.098 M/sec  ( +- 11.68% ) (27.79%)
            39,913      cache-misses             # 24.474 % of all cache refs  ( +- 26.45% ) (27.84%)
       135,741,931      branch-instructions      # 81.353 M/sec  ( +- 0.33% ) (27.89%)
            82,647      branch-misses            # 0.06% of all branches  ( +- 6.29% ) (27.90%)
        73,575,446      bus-cycles               # 44.095 M/sec  ( +- 0.28% ) (22.28%)
            27,834      L1-dcache-load-misses    # 68.42% of all L1-dcache accesses  ( +- 65.93% ) (22.28%)
            40,683      L1-dcache-loads          # 0.024 M/sec  ( +- 42.62% ) (22.27%)
             2,598      LLC-loads                # 0.002 M/sec  ( +- 22.66% ) (22.25%)
             1,523      LLC-load-misses          # 58.60% of all LL-cache accesses  ( +- 39.64% ) (22.22%)
                 2      LLC-stores               # 0.001 K/sec  ( +-100.00% ) (11.08%)
                 0      LLC-store-misses         # 0.000 K/sec  (11.07%)

           1.67003 +- 0.00169 seconds time elapsed  ( +- 0.10% )

The L1-dcache-load-misses (L1D.REPLACEMENT) counts are significantly
lower, which confirms that, unlike "REP; STOSB", MOVNTI does not result
in a write-allocate.
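
For readers who want to try the store pattern outside the kernel, here is a minimal userspace sketch of the same idea, not the kernel routine itself. It assumes an x86-64 compiler that provides the SSE2 intrinsics _mm_stream_si64() (which maps to MOVNTI) and _mm_sfence(); on other architectures it falls back to ordinary cached stores. The names clear_page_nt_sketch(), clear_page_nt_fence_sketch() and SKETCH_PAGE_SIZE are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#if defined(__x86_64__)
#include <emmintrin.h>	/* _mm_stream_si64(): maps to MOVNTI */
#include <xmmintrin.h>	/* _mm_sfence() */
#endif

#define SKETCH_PAGE_SIZE 4096	/* assumption: 4 KiB pages, as in the patch */

/*
 * Hypothetical userspace analogue of clear_page_nt(): zero one page,
 * 64 bytes (eight 8-byte stores) per outer iteration. Like the kernel
 * routine it does not fence; the caller must do that when done.
 */
static void clear_page_nt_sketch(void *page)
{
	long long *p = page;
	size_t i, j;

	/* A real page is always cacheline aligned; check that assumption. */
	assert(((uintptr_t)page & 63) == 0);

	for (i = 0; i < SKETCH_PAGE_SIZE / sizeof(*p); i += 8) {
		for (j = 0; j < 8; j++) {
#if defined(__x86_64__)
			_mm_stream_si64(&p[i + j], 0);	/* non-temporal store */
#else
			p[i + j] = 0;	/* fallback: ordinary cached store */
#endif
		}
	}
}

/* Order the preceding non-temporal stores before any later stores. */
static void clear_page_nt_fence_sketch(void)
{
#if defined(__x86_64__)
	_mm_sfence();
#endif
}
```

A caller would clear as many pages as it needs with clear_page_nt_sketch() and then call clear_page_nt_fence_sketch() once, mirroring the "caller needs to execute an SFENCE when done" contract above.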

 # AMD Rome
 # Performance comparison of 'perf bench mem memset -l 1' for x86-64-stosq
 # (X86_FEATURE_REP_GOOD) and x86-64-movnt:

 System:           Oracle E2-2c
 CPU:              2 nodes * 64 cores/node * 2 threads/core
                   AMD EPYC 7742 (Rome, 23:49:0)
 Memory:           2048 GB evenly split between nodes
 Microcode:        0x8301038
 scaling_governor: performance
 L3 size:          16 * 16MB
 cpufreq/boost:    0

                   x86-64-stosq (5 runs)     x86-64-movnt (5 runs)      speedup
                   -----------------------   -----------------------    -------
      size            BW     (   pstdev)        BW      (   pstdev)

      16MB      15.39 GB/s   ( +- 9.14%)    14.56 GB/s   ( +-19.43%)     -5.39%
     128MB      11.04 GB/s   ( +- 4.87%)    14.49 GB/s   ( +-13.22%)    +31.25%
    1024MB      11.86 GB/s   ( +- 0.83%)    16.54 GB/s   ( +- 0.04%)    +39.46%
    4096MB      11.89 GB/s   ( +- 0.61%)    16.49 GB/s   ( +- 0.28%)    +38.68%

Comparing perf stats for size=4096MB:

$ perf stat -r 5 --all-user -e ... perf bench mem memset -l 1 -s 4096MB -f x86-64-stosq
 # Running 'mem/memset' benchmark:
 # function 'x86-64-stosq' (movsq-based memset() in arch/x86/lib/memset_64.S)
 # Copying 4096MB bytes ...

      11.785122 GB/sec
      11.970851 GB/sec
      11.916821 GB/sec
      11.861517 GB/sec
      11.941867 GB/sec

 Performance counter stats for 'perf bench mem memset -l 1 -s 4096MB -f x86-64-stosq' (5 runs):

     1,014,645,096      cpu-cycles               # 1.264 GHz  ( +- 0.18% ) (45.28%)
         4,620,983      instructions             # 0.00 insn per cycle  ( +- 1.86% ) (45.37%)
       262,988,622      cache-references         # 327.723 M/sec  ( +- 0.21% ) (45.51%)
         6,312,740      cache-misses             # 2.400 % of all cache refs  ( +- 1.12% ) (45.56%)
         1,792,517      branch-instructions      # 2.234 M/sec  ( +- 0.20% ) (45.60%)
            54,095      branch-misses            # 3.02% of all branches  ( +- 2.99% ) (45.64%)
       133,710,131      L1-dcache-load-misses    # 363.51% of all L1-dcache accesses  ( +- 0.12% ) (45.55%)
        36,783,396      L1-dcache-loads          # 45.838 M/sec  ( +- 0.79% ) (45.46%)
        53,411,709      L1-dcache-prefetches     # 66.559 M/sec  ( +- 0.28% ) (45.39%)

           0.80303 +- 0.00117 seconds time elapsed  ( +- 0.15% )

$ perf stat -r 5 --all-user -e ... perf bench mem memset -l 1 -s 4096MB -f x86-64-movnt
 # Running 'mem/memset' benchmark:
 # function 'x86-64-movnt' (movnt-based memset() in arch/x86/lib/memset_64.S)
 # Copying 4096MB bytes ...

      16.533230 GB/sec
      16.496138 GB/sec
      16.480302 GB/sec
      16.478333 GB/sec
      16.474600 GB/sec

 Performance counter stats for 'perf bench mem memset -l 1 -s 4096MB -f x86-64-movnt' (5 runs):

     1,091,352,779      cpu-cycles               # 1.292 GHz  ( +- 0.32% ) (45.25%)
     1,483,248,390      instructions             # 1.36 insn per cycle  ( +- 0.14% ) (45.38%)
       134,114,985      cache-references         # 158.723 M/sec  ( +- 0.17% ) (45.51%)
           117,682      cache-misses             # 0.088 % of all cache refs  ( +- 0.99% ) (45.59%)
       135,009,275      branch-instructions      # 159.781 M/sec  ( +- 0.18% ) (45.68%)
            50,659      branch-misses            # 0.04% of all branches  ( +- 7.50% ) (45.66%)
            58,569      L1-dcache-load-misses    # 5.84% of all L1-dcache accesses  ( +- 6.04% ) (45.57%)
         1,002,657      L1-dcache-loads          # 1.187 M/sec  ( +- 15.40% ) (45.45%)
             3,111      L1-dcache-prefetches     # 0.004 M/sec  ( +- 31.21% ) (45.38%)

           0.84554 +- 0.00289 seconds time elapsed  ( +- 0.34% )

Similar to Intel Broadwellx, the L1-dcache-load-misses (L2$ access from
DC Miss) counts are significantly lower. The L1 prefetcher is also
fairly quiet.

 # Intel Skylakex
 # Performance comparison of 'perf bench mem memset -l 1' for x86-64-stosb
 # (X86_FEATURE_ERMS) and x86-64-movnt:

 System:           Oracle X8-2
 CPU:              2 nodes * 26 cores/node * 2 threads/core
                   Intel Xeon Platinum 8270CL (Skylakex, 6:85:7)
 Memory:           3TB evenly split between nodes
 Microcode:        0x5002f01
 scaling_governor: performance
 L3 size:          36MB
 intel_pstate/no_turbo: 1

                   x86-64-stosb (5 runs)     x86-64-movnt (5 runs)      speedup
                   -----------------------   -----------------------    -------
      size            BW     (   pstdev)        BW      (   pstdev)

      16MB      20.38 GB/s   ( +- 2.58%)     6.25 GB/s   ( +- 0.41%)    -69.28%
     128MB       6.52 GB/s   ( +- 0.14%)     6.31 GB/s   ( +- 0.47%)     -3.22%
    1024MB       6.48 GB/s   ( +- 0.31%)     6.24 GB/s   ( +- 0.00%)     -3.70%
    4096MB       6.51 GB/s   ( +- 0.01%)     6.27 GB/s   ( +- 0.42%)     -3.68%

Comparing perf stats for size=4096MB:

$ perf stat -r 5 --all-user -e ... perf bench mem memset -l 1 -s 4096MB -f x86-64-stosb
 # Running 'mem/memset' benchmark:
 # function 'x86-64-stosb' (movsb-based memset() in arch/x86/lib/memset_64.S)
 # Copying 4096MB bytes ...

       6.516972 GB/sec
       6.518756 GB/sec
       6.517620 GB/sec
       6.517598 GB/sec
       6.518799 GB/sec

 Performance counter stats for 'perf bench mem memset -l 1 -s 4096MB -f x86-64-stosb' (5 runs):

     3,357,373,317      cpu-cycles               # 1.133 GHz  ( +- 0.01% ) (29.38%)
       165,063,710      instructions             # 0.05 insn per cycle  ( +- 1.54% ) (35.29%)
           358,997      cache-references         # 0.121 M/sec  ( +- 0.89% ) (35.32%)
           205,420      cache-misses             # 57.221 % of all cache refs  ( +- 3.61% ) (35.36%)
         6,117,673      branch-instructions      # 2.065 M/sec  ( +- 1.48% ) (35.38%)
            58,309      branch-misses            # 0.95% of all branches  ( +- 1.30% ) (35.39%)
        31,329,466      bus-cycles               # 10.575 M/sec  ( +- 0.03% ) (23.56%)
        68,543,766      L1-dcache-load-misses    # 157.03% of all L1-dcache accesses  ( +- 0.02% ) (23.53%)
        43,648,909      L1-dcache-loads          # 14.734 M/sec  ( +- 0.50% ) (23.50%)
           137,498      LLC-loads                # 0.046 M/sec  ( +- 0.21% ) (23.49%)
            12,308      LLC-load-misses          # 8.95% of all LL-cache accesses  ( +- 2.52% ) (23.49%)
            26,335      LLC-stores               # 0.009 M/sec  ( +- 5.65% ) (11.75%)
            25,008      LLC-store-misses         # 0.008 M/sec  ( +- 3.42% ) (11.75%)

          2.962842 +- 0.000162 seconds time elapsed  ( +- 0.01% )

$ perf stat -r 5 --all-user -e ... perf bench mem memset -l 1 -s 4096MB -f x86-64-movnt
 # Running 'mem/memset' benchmark:
 # function 'x86-64-movnt' (movnt-based memset() in arch/x86/lib/memset_64.S)
 # Copying 4096MB bytes ...

       6.283420 GB/sec
       6.222843 GB/sec
       6.282976 GB/sec
       6.282828 GB/sec
       6.283173 GB/sec

 Performance counter stats for 'perf bench mem memset -l 1 -s 4096MB -f x86-64-movnt' (5 runs):

     4,462,272,094      cpu-cycles               # 1.322 GHz  ( +- 0.30% ) (29.38%)
     1,633,675,881      instructions             # 0.37 insn per cycle  ( +- 0.21% ) (35.28%)
           283,627      cache-references         # 0.084 M/sec  ( +- 0.58% ) (35.31%)
            28,824      cache-misses             # 10.163 % of all cache refs  ( +- 20.67% ) (35.34%)
       139,719,697      branch-instructions      # 41.407 M/sec  ( +- 0.16% ) (35.35%)
            58,062      branch-misses            # 0.04% of all branches  ( +- 1.49% ) (35.36%)
        41,760,350      bus-cycles               # 12.376 M/sec  ( +- 0.05% ) (23.55%)
           303,300      L1-dcache-load-misses    # 0.69% of all L1-dcache accesses  ( +- 2.08% ) (23.53%)
        43,769,498      L1-dcache-loads          # 12.972 M/sec  ( +- 0.54% ) (23.52%)
            99,570      LLC-loads                # 0.030 M/sec  ( +- 1.06% ) (23.52%)
             1,966      LLC-load-misses          # 1.97% of all LL-cache accesses  ( +- 6.17% ) (23.52%)
               129      LLC-stores               # 0.038 K/sec  ( +- 27.85% ) (11.75%)
                 7      LLC-store-misses         # 0.002 K/sec  ( +- 47.82% ) (11.75%)

           3.37465 +- 0.00474 seconds time elapsed  ( +- 0.14% )

The L1-dcache-load-misses (L1D.REPLACEMENT) count is much lower, just
as in the previous two cases. There is no performance improvement on
Skylakex, though.
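
The "speedup" column in the tables above is consistent with the relative change in bandwidth between the two memset variants. A minimal sketch of that arithmetic follows; the function name is illustrative, and since the tables print rounded bandwidths, values recomputed from them can differ from the printed column in the last digits.

```c
/*
 * Relative bandwidth change of `candidate` over `baseline`, in percent.
 * Positive means the candidate (e.g. x86-64-movnt) is faster.
 * Assumption: this is how the tables' speedup column was derived.
 */
static double speedup_percent(double baseline_gbps, double candidate_gbps)
{
	return (candidate_gbps / baseline_gbps - 1.0) * 100.0;
}
```

For example, the AMD Rome 128MB row (11.04 GB/s for stosq, 14.49 GB/s for movnt) gives 14.49 / 11.04 - 1 = 0.3125, matching the +31.25% shown.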

Signed-off-by: Ankur Arora
---
 arch/x86/include/asm/page_64.h |  1 +
 arch/x86/lib/clear_page_64.S   | 26 ++++++++++++++++++++++++++
 2 files changed, 27 insertions(+)

diff --git a/arch/x86/include/asm/page_64.h b/arch/x86/include/asm/page_64.h
index 939b1cff4a7b..bde3c2785ec4 100644
--- a/arch/x86/include/asm/page_64.h
+++ b/arch/x86/include/asm/page_64.h
@@ -43,6 +43,7 @@ extern unsigned long __phys_addr_symbol(unsigned long);
 void clear_page_orig(void *page);
 void clear_page_rep(void *page);
 void clear_page_erms(void *page);
+void clear_page_nt(void *page);
 
 static inline void clear_page(void *page)
 {
diff --git a/arch/x86/lib/clear_page_64.S b/arch/x86/lib/clear_page_64.S
index c4c7dd115953..f16bb753b236 100644
--- a/arch/x86/lib/clear_page_64.S
+++ b/arch/x86/lib/clear_page_64.S
@@ -50,3 +50,29 @@ SYM_FUNC_START(clear_page_erms)
 	ret
 SYM_FUNC_END(clear_page_erms)
 EXPORT_SYMBOL_GPL(clear_page_erms)
+
+/*
+ * Zero a page.
+ * %rdi - page
+ *
+ * Caller needs to issue a fence at the end.
+ */
+SYM_FUNC_START(clear_page_nt)
+	xorl	%eax,%eax
+	movl	$4096,%ecx
+
+	.p2align 4
+.Lstart:
+	movnti	%rax, 0x00(%rdi)
+	movnti	%rax, 0x08(%rdi)
+	movnti	%rax, 0x10(%rdi)
+	movnti	%rax, 0x18(%rdi)
+	movnti	%rax, 0x20(%rdi)
+	movnti	%rax, 0x28(%rdi)
+	movnti	%rax, 0x30(%rdi)
+	movnti	%rax, 0x38(%rdi)
+	addq	$0x40, %rdi
+	subl	$0x40, %ecx
+	ja	.Lstart
+	ret
+SYM_FUNC_END(clear_page_nt)

From patchwork Wed Oct 14 08:32:56 2020
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Ankur Arora
X-Patchwork-Id: 11837045
From: Ankur Arora
To: linux-kernel@vger.kernel.org, linux-mm@kvack.org
Cc: kirill@shutemov.name, mhocko@kernel.org, boris.ostrovsky@oracle.com,
 konrad.wilk@oracle.com, Ankur Arora, Thomas Gleixner, Ingo Molnar,
 Borislav Petkov, x86@kernel.org, "H. Peter Anvin", Arnd Bergmann,
 Andrew Morton, Ira Weiny, linux-arch@vger.kernel.org
Subject: [PATCH 5/8] x86/clear_page: add clear_page_uncached()
Date: Wed, 14 Oct 2020 01:32:56 -0700
Message-Id: <20201014083300.19077-6-ankur.a.arora@oracle.com>
X-Mailer: git-send-email 2.20.1
In-Reply-To: <20201014083300.19077-1-ankur.a.arora@oracle.com>
References: <20201014083300.19077-1-ankur.a.arora@oracle.com>

Define clear_page_uncached() as an alternative_call() to clear_page_nt()
if the CPU sets X86_FEATURE_NT_GOOD, and fall back to clear_page() if it
doesn't.

Similarly, define clear_page_uncached_flush(), which provides an SFENCE
if the CPU sets X86_FEATURE_NT_GOOD.

Also, add the glue interface clear_user_highpage_uncached().
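
The dispatch described above can be modelled in plain C as follows. This is only a sketch: the kernel patches the call site once at boot through the alternatives mechanism instead of testing a flag on each call, and every name here (cpu_nt_good, the *_model() functions, the counters) is a hypothetical stand-in. Both clearing paths use memset() so the model runs anywhere.

```c
#include <stdbool.h>
#include <string.h>

#define SKETCH_PAGE_SIZE 4096	/* assumption: 4 KiB pages */

/* Hypothetical flag modelling X86_FEATURE_NT_GOOD. */
static bool cpu_nt_good;

/* Counters so a caller can observe which path was taken. */
static int cached_clears, nt_clears, fences;

/* Stand-in for clear_page(). */
static void clear_page_model(void *page)
{
	memset(page, 0, SKETCH_PAGE_SIZE);
	cached_clears++;
}

/* Stand-in for clear_page_nt(); the real one uses MOVNTI stores. */
static void clear_page_nt_model(void *page)
{
	memset(page, 0, SKETCH_PAGE_SIZE);
	nt_clears++;
}

/* Non-temporal path if the CPU is flagged, cached path otherwise. */
static void clear_page_uncached_model(void *page)
{
	if (cpu_nt_good)
		clear_page_nt_model(page);
	else
		clear_page_model(page);
}

/* The fence is needed only when non-temporal stores were issued. */
static void clear_page_uncached_flush_model(void)
{
	if (cpu_nt_good)
		fences++;	/* the kernel version emits SFENCE here */
}
```

Callers always pair the two functions in the same way; on CPUs without the feature bit the flush is a no-op, so the pairing costs nothing there.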

Signed-off-by: Ankur Arora
Reported-by: kernel test robot
Reported-by: kernel test robot
---
 arch/x86/include/asm/page.h    |  6 ++++++
 arch/x86/include/asm/page_32.h |  9 +++++++++
 arch/x86/include/asm/page_64.h | 14 ++++++++++++++
 include/asm-generic/page.h     |  3 +++
 include/linux/highmem.h        | 10 ++++++++++
 5 files changed, 42 insertions(+)

diff --git a/arch/x86/include/asm/page.h b/arch/x86/include/asm/page.h
index 7555b48803a8..ca0aa379ac7f 100644
--- a/arch/x86/include/asm/page.h
+++ b/arch/x86/include/asm/page.h
@@ -28,6 +28,12 @@ static inline void clear_user_page(void *page, unsigned long vaddr,
 	clear_page(page);
 }
 
+static inline void clear_user_page_uncached(void *page, unsigned long vaddr,
+					    struct page *pg)
+{
+	clear_page_uncached(page);
+}
+
 static inline void copy_user_page(void *to, void *from, unsigned long vaddr,
 				  struct page *topage)
 {
diff --git a/arch/x86/include/asm/page_32.h b/arch/x86/include/asm/page_32.h
index 94dbd51df58f..7a03a274a9a4 100644
--- a/arch/x86/include/asm/page_32.h
+++ b/arch/x86/include/asm/page_32.h
@@ -39,6 +39,15 @@ static inline void clear_page(void *page)
 	memset(page, 0, PAGE_SIZE);
 }
 
+static inline void clear_page_uncached(void *page)
+{
+	clear_page(page);
+}
+
+static inline void clear_page_uncached_flush(void)
+{
+}
+
 static inline void copy_page(void *to, void *from)
 {
 	memcpy(to, from, PAGE_SIZE);
diff --git a/arch/x86/include/asm/page_64.h b/arch/x86/include/asm/page_64.h
index bde3c2785ec4..5897075e77dd 100644
--- a/arch/x86/include/asm/page_64.h
+++ b/arch/x86/include/asm/page_64.h
@@ -55,6 +55,20 @@ static inline void clear_page(void *page)
 			   : "cc", "memory", "rax", "rcx");
 }
 
+static inline void clear_page_uncached(void *page)
+{
+	alternative_call(clear_page,
+			 clear_page_nt, X86_FEATURE_NT_GOOD,
+			 "=D" (page),
+			 "0" (page)
+			 : "cc", "memory", "rax", "rcx");
+}
+
+static inline void clear_page_uncached_flush(void)
+{
+	alternative("", "sfence", X86_FEATURE_NT_GOOD);
+}
+
 void copy_page(void *to, void *from);
 
 #endif	/* !__ASSEMBLY__ */
diff --git a/include/asm-generic/page.h b/include/asm-generic/page.h
index fe801f01625e..60235a0cf24a 100644
--- a/include/asm-generic/page.h
+++ b/include/asm-generic/page.h
@@ -26,6 +26,9 @@
 #ifndef __ASSEMBLY__
 
 #define clear_page(page)	memset((page), 0, PAGE_SIZE)
+#define clear_page_uncached(page)	clear_page(page)
+#define clear_page_uncached_flush()	do { } while (0)
+
 #define copy_page(to,from)	memcpy((to), (from), PAGE_SIZE)
 
 #define clear_user_page(page, vaddr, pg)	clear_page(page)
diff --git a/include/linux/highmem.h b/include/linux/highmem.h
index 14e6202ce47f..f842593e2474 100644
--- a/include/linux/highmem.h
+++ b/include/linux/highmem.h
@@ -232,6 +232,16 @@ static inline void clear_user_highpage(struct page *page, unsigned long vaddr)
 }
 #endif
 
+#ifndef clear_user_highpage_uncached
+static inline void clear_user_highpage_uncached(struct page *page, unsigned long vaddr)
+{
+	void *addr = kmap_atomic(page);
+
+	clear_user_page_uncached(addr, vaddr, page);
+	kunmap_atomic(addr);
+}
+#endif
+
 #ifndef __HAVE_ARCH_ALLOC_ZEROED_USER_HIGHPAGE
 /**
  * __alloc_zeroed_user_highpage - Allocate a zeroed HIGHMEM page for a VMA with caller-specified movable GFP flags

From patchwork Wed Oct 14 08:32:57 2020
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Ankur Arora
X-Patchwork-Id: 11837055
From: Ankur Arora
To: linux-kernel@vger.kernel.org, linux-mm@kvack.org
Cc: kirill@shutemov.name, mhocko@kernel.org, boris.ostrovsky@oracle.com,
 konrad.wilk@oracle.com, Ankur Arora, Andrew Morton
Subject: [PATCH 6/8] mm, clear_huge_page: use clear_page_uncached() for
 gigantic pages
Date: Wed, 14 Oct 2020 01:32:57 -0700
Message-Id: <20201014083300.19077-7-ankur.a.arora@oracle.com>
X-Mailer: git-send-email 2.20.1
In-Reply-To: <20201014083300.19077-1-ankur.a.arora@oracle.com>
References: <20201014083300.19077-1-ankur.a.arora@oracle.com>

Uncached writes are suitable for circumstances where the region written
to is not expected to be read again soon, or the region written to is
large enough that there's no expectation that we will find the writes
in the cache.

Accordingly switch to using clear_page_uncached() for gigantic pages.
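
To make "gigantic" concrete: in this patch the uncached path is taken only when pages_per_huge_page exceeds MAX_ORDER_NR_PAGES. The sketch below works through that comparison under two assumptions that are not stated in the patch itself: 4 KiB base pages, and MAX_ORDER_NR_PAGES equal to 1024 (the value that follows from a MAX_ORDER of 11). The function and macro names are illustrative only.

```c
#include <stdbool.h>
#include <stddef.h>

#define SKETCH_PAGE_SHIFT		12		/* assumption: 4 KiB base pages */
#define SKETCH_MAX_ORDER_NR_PAGES	(1UL << 10)	/* assumption: MAX_ORDER of 11 */

/*
 * Mirrors the check in clear_huge_page(): only huge pages made of more
 * than MAX_ORDER_NR_PAGES base pages go through clear_gigantic_page(),
 * and therefore through the uncached clearing path.
 */
static bool takes_uncached_path(size_t huge_page_bytes)
{
	size_t pages_per_huge_page = huge_page_bytes >> SKETCH_PAGE_SHIFT;

	return pages_per_huge_page > SKETCH_MAX_ORDER_NR_PAGES;
}
```

Under these assumptions a 2MB huge page (512 base pages) keeps the cached path, while a 1GB page (262144 base pages) is cleared with uncached stores followed by a single flush.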
Signed-off-by: Ankur Arora
---
 mm/memory.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/mm/memory.c b/mm/memory.c
index eeae590e526a..4d2c58f83ab1 100644
--- a/mm/memory.c
+++ b/mm/memory.c
@@ -5092,7 +5092,7 @@ static void clear_gigantic_page(struct page *page,
 	for (i = 0; i < pages_per_huge_page;
 	     i++, p = mem_map_next(p, page, i)) {
 		cond_resched();
-		clear_user_highpage(p, addr + i * PAGE_SIZE);
+		clear_user_highpage_uncached(p, addr + i * PAGE_SIZE);
 	}
 }
 
@@ -5111,6 +5111,7 @@ void clear_huge_page(struct page *page,
 
 	if (unlikely(pages_per_huge_page > MAX_ORDER_NR_PAGES)) {
 		clear_gigantic_page(page, addr, pages_per_huge_page);
+		clear_page_uncached_flush();
 		return;
 	}
 

From patchwork Wed Oct 14 08:32:58 2020
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Ankur Arora
X-Patchwork-Id: 11837047
From: Ankur Arora
To: linux-kernel@vger.kernel.org, linux-mm@kvack.org
Cc: kirill@shutemov.name, mhocko@kernel.org, boris.ostrovsky@oracle.com, konrad.wilk@oracle.com, Ankur Arora, Thomas Gleixner, Ingo Molnar, Borislav Petkov, x86@kernel.org, "H.
Peter Anvin", Tony Luck, Sean Christopherson, Mike Rapoport, Xiaoyao Li, Fenghua Yu, "Peter Zijlstra (Intel)", Dave Hansen
Subject: [PATCH 7/8] x86/cpu/intel: enable X86_FEATURE_NT_GOOD on Intel Broadwellx
Date: Wed, 14 Oct 2020 01:32:58 -0700
Message-Id: <20201014083300.19077-8-ankur.a.arora@oracle.com>
X-Mailer: git-send-email 2.20.1
In-Reply-To: <20201014083300.19077-1-ankur.a.arora@oracle.com>
References: <20201014083300.19077-1-ankur.a.arora@oracle.com>
MIME-Version: 1.0

System:           Oracle X6-2
CPU:              2 nodes * 10 cores/node * 2 threads/core
                  Intel Xeon E5-2630 v4 (Broadwellx, 6:79:1)
Memory:           256 GB evenly split between nodes
Microcode:        0xb00002e
scaling_governor: performance
L3 size:          25MB
intel_pstate/no_turbo: 1

Performance comparison of 'perf bench mem memset -l 1' for x86-64-stosb
(X86_FEATURE_ERMS) and x86-64-movnt (X86_FEATURE_NT_GOOD):

              x86-64-stosb (5 runs)     x86-64-movnt (5 runs)      speedup
              -----------------------   -----------------------    -------
     size       BW        (   pstdev)     BW        (   pstdev)

     16MB      17.35 GB/s ( +- 9.27%)    11.83 GB/s ( +- 0.19%)    -31.81%
    128MB       5.31 GB/s ( +- 0.13%)    11.72 GB/s ( +- 0.44%)   +121.84%
   1024MB       5.42 GB/s ( +- 0.13%)    11.78 GB/s ( +- 0.03%)   +117.34%
   4096MB       5.41 GB/s ( +- 0.41%)    11.76 GB/s ( +- 0.07%)   +117.37%

The next workload exercises the page-clearing path directly by faulting over
an anonymous mmap region backed by 1GB pages. This workload is similar to the
creation phase of pinned guests in QEMU.

 $ cat pf-test.c
 #include <stdlib.h>
 #include <sys/mman.h>
 #include <linux/mman.h>

 #define HPAGE_BITS 30

 int main(int argc, char **argv)
 {
	int i;
	unsigned long len = atoi(argv[1]); /* In GB */
	unsigned long offset = 0;
	unsigned long numpages;
	char *base;

	len *= 1UL << 30;
	numpages = len >> HPAGE_BITS;

	base = mmap(NULL, len, PROT_READ|PROT_WRITE,
		    MAP_PRIVATE | MAP_ANONYMOUS |
		    MAP_HUGETLB | MAP_HUGE_1GB, 0, 0);

	for (i = 0; i < numpages; i++) {
		*((volatile char *)base + offset) = *(base + offset);
		offset += 1UL << HPAGE_BITS;
	}

	return 0;
 }

The specific test is for a 128GB region, but this is a single-threaded O(n)
workload, so the exact region size is not material.

Page-clearing throughput for clear_page_erms(): 3.72 GBps

 $ perf stat -r 5 --all-kernel -e ... bin/pf-test 128

 Performance counter stats for 'bin/pf-test 128' (5 runs):

    74,799,496,556  cpu-cycles             #    2.176 GHz                           ( +-  2.22% )  (29.41%)
     1,474,615,023  instructions           #    0.02  insn per cycle                ( +-  0.23% )  (35.29%)
     2,148,580,131  cache-references       #   62.502 M/sec                         ( +-  0.02% )  (35.29%)
        71,736,985  cache-misses           #    3.339 % of all cache refs           ( +-  0.94% )  (35.29%)
       433,713,165  branch-instructions    #   12.617 M/sec                         ( +-  0.15% )  (35.30%)
         1,008,251  branch-misses          #    0.23% of all branches               ( +-  1.88% )  (35.30%)
     3,406,821,966  bus-cycles             #   99.104 M/sec                         ( +-  2.22% )  (23.53%)
     2,156,059,110  L1-dcache-load-misses  #  445.35% of all L1-dcache accesses     ( +-  0.01% )  (23.53%)
       484,128,243  L1-dcache-loads        #   14.083 M/sec                         ( +-  0.22% )  (23.53%)
           944,216  LLC-loads              #    0.027 M/sec                         ( +-  7.41% )  (23.53%)
           537,989  LLC-load-misses        #   56.98% of all LL-cache accesses      ( +- 13.64% )  (23.53%)
     2,150,138,476  LLC-stores             #   62.547 M/sec                         ( +-  0.01% )  (11.76%)
        69,598,760  LLC-store-misses       #    2.025 M/sec                         ( +-  0.47% )  (11.76%)
       483,923,875  dTLB-loads             #   14.077 M/sec                         ( +-  0.21% )  (17.64%)
             1,892  dTLB-load-misses       #    0.00% of all dTLB cache accesses    ( +- 30.63% )  (23.53%)
     4,799,154,980  dTLB-stores            #  139.606 M/sec                         ( +-  0.03% )  (23.53%)
                90  dTLB-store-misses      #    0.003 K/sec                         ( +- 35.92% )  (23.53%)

            34.377 +- 0.760 seconds time elapsed  ( +-  2.21% )

Page-clearing throughput for clear_page_nt(): 11.78 GBps

 $ perf stat -r 5 --all-kernel -e ... bin/pf-test 128

 Performance counter stats for 'bin/pf-test 128' (5 runs):

    23,699,446,603  cpu-cycles             #    2.182 GHz                           ( +-  0.01% )  (23.53%)
    24,794,548,512  instructions           #    1.05  insn per cycle                ( +-  0.00% )  (29.41%)
           432,775  cache-references       #    0.040 M/sec                         ( +-  3.96% )  (29.41%)
            75,580  cache-misses           #   17.464 % of all cache refs           ( +- 51.42% )  (29.41%)
     2,492,858,290  branch-instructions    #  229.475 M/sec                         ( +-  0.00% )  (29.42%)
        34,016,826  branch-misses          #    1.36% of all branches               ( +-  0.04% )  (29.42%)
     1,078,468,643  bus-cycles             #   99.276 M/sec                         ( +-  0.01% )  (23.53%)
           717,228  L1-dcache-load-misses  #    0.20% of all L1-dcache accesses     ( +-  3.77% )  (23.53%)
       351,999,535  L1-dcache-loads        #   32.403 M/sec                         ( +-  0.04% )  (23.53%)
            75,988  LLC-loads              #    0.007 M/sec                         ( +-  4.20% )  (23.53%)
            24,503  LLC-load-misses        #   32.25% of all LL-cache accesses      ( +- 53.30% )  (23.53%)
            57,283  LLC-stores             #    0.005 M/sec                         ( +-  2.15% )  (11.76%)
            19,738  LLC-store-misses       #    0.002 M/sec                         ( +- 46.55% )  (11.76%)
       351,836,498  dTLB-loads             #   32.388 M/sec                         ( +-  0.04% )  (17.65%)
             1,171  dTLB-load-misses       #    0.00% of all dTLB cache accesses    ( +- 42.68% )  (23.53%)
    17,385,579,725  dTLB-stores            # 1600.392 M/sec                         ( +-  0.00% )  (23.53%)
               200  dTLB-store-misses      #    0.018 K/sec                         ( +- 10.63% )  (23.53%)

         10.863678 +- 0.000804 seconds time elapsed  ( +-  0.01% )

The L1-dcache-load-misses (L1D.REPLACEMENT) count is substantially lower,
which suggests that, as expected, we aren't doing write-allocate or RFO.

Note that the IPC and instruction counts are quite different, but that is
just an artifact of switching from a single 'REP; STOSB' per PAGE_SIZE region
to a MOVNTI loop.

The page-clearing bandwidth is substantially higher (~100% or more: 3.72 GBps
to 11.78 GBps in pf-test), so enable X86_FEATURE_NT_GOOD for Intel Broadwellx.
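For readers unfamiliar with the two clearing strategies, the MOVNTI loop
mentioned above can be approximated in userspace. The following is a minimal
sketch, not the kernel's clear_page_nt(): the names clear_page_nt_sketch(),
clear_region_nt_sketch() and SKETCH_PAGE_SIZE are hypothetical, a 4KB base
page is assumed, and issuing a single SFENCE after the last page is an
assumption of this sketch. On non-x86-64 builds it falls back to memset().

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#if defined(__x86_64__)
#include <emmintrin.h>	/* _mm_stream_si64(); pulls in _mm_sfence() */
#endif

#define SKETCH_PAGE_SIZE 4096UL	/* assumption: 4KB base pages */

/* Hypothetical helper: zero one page with 8-byte non-temporal stores. */
static void clear_page_nt_sketch(void *page)
{
	/* MOVNTI of a 64-bit value wants a naturally aligned destination. */
	assert(((uintptr_t)page & 7) == 0);
#if defined(__x86_64__)
	long long *p = page;
	size_t i;

	/* One MOVNTI per 8 bytes: 512 stores for a 4KB page. */
	for (i = 0; i < SKETCH_PAGE_SIZE / sizeof(*p); i++)
		_mm_stream_si64(&p[i], 0);
#else
	/* Portable fallback so the sketch still builds elsewhere. */
	memset(page, 0, SKETCH_PAGE_SIZE);
#endif
}

/* Hypothetical helper: clear npages pages, then order the NT stores. */
static void clear_region_nt_sketch(void *base, size_t npages)
{
	unsigned char *p = base;
	size_t i;

	for (i = 0; i < npages; i++)
		clear_page_nt_sketch(p + i * SKETCH_PAGE_SIZE);
#if defined(__x86_64__)
	/* Non-temporal stores are weakly ordered; fence once at the end. */
	_mm_sfence();
#endif
}
```

Issuing one 8-byte store per loop iteration, instead of one REP-prefixed
instruction per page, is why the instruction and branch counts rise even
though the elapsed time drops.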
Signed-off-by: Ankur Arora
---
 arch/x86/kernel/cpu/intel.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/arch/x86/kernel/cpu/intel.c b/arch/x86/kernel/cpu/intel.c
index 59a1e3ce3f14..161028c1dee0 100644
--- a/arch/x86/kernel/cpu/intel.c
+++ b/arch/x86/kernel/cpu/intel.c
@@ -662,6 +662,8 @@ static void init_intel(struct cpuinfo_x86 *c)
 		c->x86_cache_alignment = c->x86_clflush_size * 2;
 	if (c->x86 == 6)
 		set_cpu_cap(c, X86_FEATURE_REP_GOOD);
+	if (c->x86 == 6 && c->x86_model == INTEL_FAM6_BROADWELL_X)
+		set_cpu_cap(c, X86_FEATURE_NT_GOOD);
 #else
 	/*
 	 * Names for the Pentium II/Celeron processors

From patchwork Wed Oct 14 08:32:59 2020
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Ankur Arora
X-Patchwork-Id: 11837057
From: Ankur Arora
To: linux-kernel@vger.kernel.org, linux-mm@kvack.org
Cc: kirill@shutemov.name, mhocko@kernel.org, boris.ostrovsky@oracle.com, konrad.wilk@oracle.com, Ankur Arora, Thomas Gleixner, Ingo Molnar, Borislav Petkov, x86@kernel.org, "H.
Peter Anvin", Kim Phillips, Reinette Chatre, Tony Luck, Tom Lendacky, Wei Huang
Subject: [PATCH 8/8] x86/cpu/amd: enable X86_FEATURE_NT_GOOD on AMD Zen
Date: Wed, 14 Oct 2020 01:32:59 -0700
Message-Id: <20201014083300.19077-9-ankur.a.arora@oracle.com>
X-Mailer: git-send-email 2.20.1
In-Reply-To: <20201014083300.19077-1-ankur.a.arora@oracle.com>
References: <20201014083300.19077-1-ankur.a.arora@oracle.com>
MIME-Version: 1.0

System:           Oracle E2-2C
CPU:              2 nodes * 64 cores/node * 2 threads/core
                  AMD EPYC 7742 (Rome, 23:49:0)
Memory:           2048 GB evenly split between nodes
Microcode:        0x8301038
scaling_governor: performance
L3 size:          16 * 16MB
cpufreq/boost:    0

Performance comparison of 'perf bench mem memset -l 1' for x86-64-stosq
(X86_FEATURE_REP_GOOD) and x86-64-movnt (X86_FEATURE_NT_GOOD):

              x86-64-stosq (5 runs)     x86-64-movnt (5 runs)      speedup
              -----------------------   -----------------------    -------
     size       BW        (   pstdev)     BW        (   pstdev)

     16MB      15.39 GB/s ( +- 9.14%)    14.56 GB/s ( +-19.43%)     -5.39%
    128MB      11.04 GB/s ( +- 4.87%)    14.49 GB/s ( +-13.22%)    +31.25%
   1024MB      11.86 GB/s ( +- 0.83%)    16.54 GB/s ( +- 0.04%)    +39.46%
   4096MB      11.89 GB/s ( +- 0.61%)    16.49 GB/s ( +- 0.28%)    +38.68%

The next workload exercises the page-clearing path directly by faulting over
an anonymous mmap region backed by 1GB pages. This workload is similar to the
creation phase of pinned guests in QEMU.

 $ cat pf-test.c
 #include <stdlib.h>
 #include <sys/mman.h>
 #include <linux/mman.h>

 #define HPAGE_BITS 30

 int main(int argc, char **argv)
 {
	int i;
	unsigned long len = atoi(argv[1]); /* In GB */
	unsigned long offset = 0;
	unsigned long numpages;
	char *base;

	len *= 1UL << 30;
	numpages = len >> HPAGE_BITS;

	base = mmap(NULL, len, PROT_READ|PROT_WRITE,
		    MAP_PRIVATE | MAP_ANONYMOUS |
		    MAP_HUGETLB | MAP_HUGE_1GB, 0, 0);

	for (i = 0; i < numpages; i++) {
		*((volatile char *)base + offset) = *(base + offset);
		offset += 1UL << HPAGE_BITS;
	}

	return 0;
 }

The specific test is for a 128GB region, but this is a single-threaded O(n)
workload, so the exact region size is not material.

Page-clearing throughput for clear_page_rep(): 11.33 GBps

 $ perf stat -r 5 --all-kernel -e ... bin/pf-test 128

 Performance counter stats for 'bin/pf-test 128' (5 runs):

    25,130,082,910  cpu-cycles             #    2.226 GHz                           ( +-  0.44% )  (54.54%)
     1,368,762,311  instructions           #    0.05  insn per cycle                ( +-  0.02% )  (54.54%)
     4,265,726,534  cache-references       #  377.794 M/sec                         ( +-  0.02% )  (54.54%)
       119,021,793  cache-misses           #    2.790 % of all cache refs           ( +-  3.90% )  (54.55%)
       413,825,787  branch-instructions    #   36.650 M/sec                         ( +-  0.01% )  (54.55%)
           236,847  branch-misses          #    0.06% of all branches               ( +- 18.80% )  (54.56%)
     2,152,320,887  L1-dcache-load-misses  #   40.40% of all L1-dcache accesses     ( +-  0.01% )  (54.55%)
     5,326,873,560  L1-dcache-loads        #  471.775 M/sec                         ( +-  0.20% )  (54.55%)
       828,943,234  L1-dcache-prefetches   #   73.415 M/sec                         ( +-  0.55% )  (54.54%)
            18,914  dTLB-loads             #    0.002 M/sec                         ( +- 47.23% )  (54.54%)
             4,423  dTLB-load-misses       #   23.38% of all dTLB cache accesses    ( +- 27.75% )  (54.54%)

           11.2917 +- 0.0499 seconds time elapsed  ( +-  0.44% )

Page-clearing throughput for clear_page_nt(): 16.29 GBps

 $ perf stat -r 5 --all-kernel -e ... bin/pf-test 128

 Performance counter stats for 'bin/pf-test 128' (5 runs):

    17,523,166,924  cpu-cycles             #    2.230 GHz                           ( +-  0.03% )  (45.43%)
    24,801,270,826  instructions           #    1.42  insn per cycle                ( +-  0.01% )  (45.45%)
     2,151,391,033  cache-references       #  273.845 M/sec                         ( +-  0.01% )  (45.46%)
           168,555  cache-misses           #    0.008 % of all cache refs           ( +-  4.87% )  (45.47%)
     2,490,226,446  branch-instructions    #  316.974 M/sec                         ( +-  0.01% )  (45.48%)
           117,604  branch-misses          #    0.00% of all branches               ( +-  1.56% )  (45.48%)
           273,492  L1-dcache-load-misses  #    0.06% of all L1-dcache accesses     ( +-  2.14% )  (45.47%)
       490,340,458  L1-dcache-loads        #   62.414 M/sec                         ( +-  0.02% )  (45.45%)
            20,517  L1-dcache-prefetches   #    0.003 M/sec                         ( +-  9.61% )  (45.44%)
             7,413  dTLB-loads             #    0.944 K/sec                         ( +-  8.37% )  (45.44%)
             2,031  dTLB-load-misses       #   27.40% of all dTLB cache accesses    ( +-  8.30% )  (45.43%)

           7.85674 +- 0.00270 seconds time elapsed  ( +-  0.03% )

The L1-dcache-load-misses (L2$ access from DC Miss) count is substantially
lower, which suggests we aren't doing write-allocate or RFO. The
L1-dcache-prefetches count is also substantially lower.

Note that the IPC and instruction counts are quite different, but that is
just an artifact of switching from a single 'REP; STOSQ' per PAGE_SIZE region
to a MOVNTI loop.

The page-clearing bandwidth shows a ~40% improvement (11.33 GBps to
16.29 GBps). Additionally, a quick 'perf bench memset' comparison on AMD
Naples (AMD EPYC 7551) shows similar performance gains. So, enable
X86_FEATURE_NT_GOOD for AMD Zen.

Signed-off-by: Ankur Arora
---
 arch/x86/kernel/cpu/amd.c | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/arch/x86/kernel/cpu/amd.c b/arch/x86/kernel/cpu/amd.c
index dcc3d943c68f..c57eb6c28aa1 100644
--- a/arch/x86/kernel/cpu/amd.c
+++ b/arch/x86/kernel/cpu/amd.c
@@ -918,6 +918,9 @@ static void init_amd_zn(struct cpuinfo_x86 *c)
 {
 	set_cpu_cap(c, X86_FEATURE_ZEN);
 
+	if (c->x86 == 0x17)
+		set_cpu_cap(c, X86_FEATURE_NT_GOOD);
+
 #ifdef CONFIG_NUMA
 	node_reclaim_distance = 32;
 #endif
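
The family:model:stepping tuples quoted in the test setups (6:79:1 for
Broadwellx, 23:49:0 for Rome) and the c->x86 / c->x86_model checks in the
diffs all derive from CPUID leaf 1 EAX. A hedged sketch of that decoding
follows. struct cpu_sig, cpu_sig_decode() and nt_good_by_sig() are
hypothetical helper names, the two raw signature values are derived from the
tuples above rather than taken from the patches, and the vendor check (which
the kernel gets for free from its per-vendor init functions) is deliberately
left out.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical container for a decoded CPUID.1:EAX signature. */
struct cpu_sig {
	unsigned int family;
	unsigned int model;
	unsigned int stepping;
};

/* Decode family/model/stepping from a raw CPUID leaf 1 EAX value. */
static struct cpu_sig cpu_sig_decode(uint32_t eax)
{
	unsigned int base_family = (eax >> 8) & 0xf;
	unsigned int base_model = (eax >> 4) & 0xf;
	unsigned int ext_family = (eax >> 20) & 0xff;
	unsigned int ext_model = (eax >> 16) & 0xf;
	struct cpu_sig s;

	s.stepping = eax & 0xf;

	/* The extended family is only added when the base family is 0xf. */
	s.family = base_family;
	if (base_family == 0xf)
		s.family += ext_family;

	/* The extended model supplies the high nibble for families 6 and 0xf. */
	s.model = base_model;
	if (base_family == 0x6 || base_family == 0xf)
		s.model |= ext_model << 4;

	return s;
}

/* Mirrors the two checks in this series, ignoring the CPU vendor. */
static int nt_good_by_sig(uint32_t eax)
{
	struct cpu_sig s = cpu_sig_decode(eax);

	/* Model 79 (0x4f) is Broadwell-X; family 0x17 covers Naples and Rome. */
	return (s.family == 6 && s.model == 79) || s.family == 0x17;
}

/* Usage example: the two machines benchmarked in this series. */
static void cpu_sig_selfcheck(void)
{
	assert(cpu_sig_decode(0x000406f1u).model == 79);	/* 6:79:1 */
	assert(cpu_sig_decode(0x00830f10u).family == 23);	/* 23:49:0 */
}
```

Keying on family and model alone keeps the enablement narrow: only parts
with measurements behind them get the feature bit, and other models keep
their existing clearing routine.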