From patchwork Tue Aug 18 08:11:04 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Srikar Dronamraju X-Patchwork-Id: 11720297 Return-Path: Received: from mail.kernel.org (pdx-korg-mail-1.web.codeaurora.org [172.30.200.123]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id 2F0B2618 for ; Tue, 18 Aug 2020 08:12:34 +0000 (UTC) Received: from kanga.kvack.org (kanga.kvack.org [205.233.56.17]) by mail.kernel.org (Postfix) with ESMTP id E3E042076E for ; Tue, 18 Aug 2020 08:12:33 +0000 (UTC) Authentication-Results: mail.kernel.org; dkim=fail reason="signature verification failed" (2048-bit key) header.d=ibm.com header.i=@ibm.com header.b="sBg1Zh8V" DMARC-Filter: OpenDMARC Filter v1.3.2 mail.kernel.org E3E042076E Authentication-Results: mail.kernel.org; dmarc=fail (p=none dis=none) header.from=linux.vnet.ibm.com Authentication-Results: mail.kernel.org; spf=pass smtp.mailfrom=owner-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix) id E54646B0024; Tue, 18 Aug 2020 04:12:32 -0400 (EDT) Delivered-To: linux-mm-outgoing@kvack.org Received: by kanga.kvack.org (Postfix, from userid 40) id E2AF06B0025; Tue, 18 Aug 2020 04:12:32 -0400 (EDT) X-Original-To: int-list-linux-mm@kvack.org X-Delivered-To: int-list-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix, from userid 63042) id D19EA6B0026; Tue, 18 Aug 2020 04:12:32 -0400 (EDT) X-Original-To: linux-mm@kvack.org X-Delivered-To: linux-mm@kvack.org Received: from forelay.hostedemail.com (smtprelay0095.hostedemail.com [216.40.44.95]) by kanga.kvack.org (Postfix) with ESMTP id B825F6B0024 for ; Tue, 18 Aug 2020 04:12:32 -0400 (EDT) Received: from smtpin30.hostedemail.com (10.5.19.251.rfc1918.com [10.5.19.251]) by forelay01.hostedemail.com (Postfix) with ESMTP id 73B92180AD822 for ; Tue, 18 Aug 2020 08:12:32 +0000 (UTC) X-FDA: 77162972544.30.brain00_3e06bf22701d Received: from filter.hostedemail.com (10.5.16.251.rfc1918.com [10.5.16.251]) by smtpin30.hostedemail.com (Postfix) with ESMTP id 49B4E180B3C83 for ; Tue, 18 Aug 2020 08:12:32 +0000 (UTC) X-Spam-Summary: 1,0,0,cb5eb2b8f344ef73,d41d8cd98f00b204,srikar@linux.vnet.ibm.com,,RULES_HIT:2:41:69:355:379:541:800:960:966:973:982:988:989:1260:1261:1311:1314:1345:1359:1431:1437:1515:1535:1605:1606:1730:1747:1777:1792:2194:2196:2198:2199:2200:2201:2393:2526:2559:2562:2691:2693:2741:2892:3138:3139:3140:3141:3142:3865:3866:3867:3868:3870:3871:3872:3874:4120:4250:4321:4379:4385:4605:5007:6119:6261:6653:6742:7903:7904:8603:10004:10026:11026:11232:11657:11658:11914:12043:12294:12296:12297:12438:12555:12679:12740:12895:12986:13053:13161:13180:13229:13846:13870:13894:14096:14394:21063:21080:21324:21451:21483:21611:21627:21795:21939:21990:30012:30034:30051:30054:30075:30080,0,RBL:148.163.156.1:@linux.vnet.ibm.com:.lbl8.mailshell.net-62.8.0.100 64.201.201.201;04y8fnjxga6snwek3w1epehrzbmomypcdh3q84m6mrsh9bixbkhzm6mae84qmk7.1sufzndx4k46wgr9kunp9s417k6gwdxipuuqadra4smgkgwecqnyp3eoo3m6hyh.o-lbl8.mailshell.net-223.238.255.100,CacheIP:none,Bayesian:0.5,0.5,0.5,Netcheck:none,DomainCache:0 ,MSF:not X-HE-Tag: brain00_3e06bf22701d X-Filterd-Recvd-Size: 9345 Received: from mx0a-001b2d01.pphosted.com (mx0a-001b2d01.pphosted.com [148.163.156.1]) by imf18.hostedemail.com (Postfix) with ESMTP for ; Tue, 18 Aug 2020 08:12:31 +0000 (UTC) Received: from pps.filterd (m0098409.ppops.net [127.0.0.1]) by mx0a-001b2d01.pphosted.com (8.16.0.42/8.16.0.42) with SMTP id 07I82W47083897; Tue, 18 Aug 2020 04:12:28 -0400 DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=ibm.com; h=from : to : cc : subject : date : message-id : in-reply-to : references : mime-version : content-transfer-encoding; s=pp1; bh=n/JmR5KruIHZvbLzMxijcT999B5k7s3kVo6okQwOPlU=; b=sBg1Zh8V6RkmXAj4cHsxY+htM73BXoRP9g9eKan1dFICG1gI4J2la8t/7Vu4ox1junsb DbrBxD3NLfJgtomOrsisgxX9y9SH9+Mb82jmX9+MrXFy5u13neAiWKLjm6DTZl/QeoXf YYPkyEoFkXJ8ETE/P647C8dZODHQvG3XTF1rzMWeZK+YwYCiOSl2W79esg9I0jkqdwwA Q27KNl8G5A2WTSMjXK12l5UipLBNsA82flQ7ALbPR5L54nDF4OK9d/FlCK/9Z8dtVHfl tSBkaJo56ddPjgqa6WMdNLKC631wloQ4LqoHwdu7vKp9FY4vzRrYFIDWJcPda8owQb58 ow== Received: from ppma03fra.de.ibm.com (6b.4a.5195.ip4.static.sl-reverse.com [149.81.74.107]) by mx0a-001b2d01.pphosted.com with ESMTP id 3304r5hk4c-1 (version=TLSv1.2 cipher=ECDHE-RSA-AES256-GCM-SHA384 bits=256 verify=NOT); Tue, 18 Aug 2020 04:12:27 -0400 Received: from pps.filterd (ppma03fra.de.ibm.com [127.0.0.1]) by ppma03fra.de.ibm.com (8.16.0.42/8.16.0.42) with SMTP id 07I8AJVR026207; Tue, 18 Aug 2020 08:12:25 GMT Received: from b06cxnps4076.portsmouth.uk.ibm.com (d06relay13.portsmouth.uk.ibm.com [9.149.109.198]) by ppma03fra.de.ibm.com with ESMTP id 3304c907rw-1 (version=TLSv1.2 cipher=ECDHE-RSA-AES256-GCM-SHA384 bits=256 verify=NOT); Tue, 18 Aug 2020 08:12:25 +0000 Received: from d06av23.portsmouth.uk.ibm.com (d06av23.portsmouth.uk.ibm.com [9.149.105.59]) by b06cxnps4076.portsmouth.uk.ibm.com (8.14.9/8.14.9/NCO v10.0) with ESMTP id 07I8CMN325362724 (version=TLSv1/SSLv3 cipher=DHE-RSA-AES256-GCM-SHA384 bits=256 verify=OK); Tue, 18 Aug 2020 08:12:22 GMT Received: from d06av23.portsmouth.uk.ibm.com (unknown [127.0.0.1]) by IMSVA (Postfix) with ESMTP id DD10CA4057; Tue, 18 Aug 2020 08:12:21 +0000 (GMT) Received: from d06av23.portsmouth.uk.ibm.com (unknown [127.0.0.1]) by IMSVA (Postfix) with ESMTP id 5369DA4040; Tue, 18 Aug 2020 08:12:18 +0000 (GMT) Received: from srikart450.in.ibm.com (unknown [9.85.93.83]) by d06av23.portsmouth.uk.ibm.com (Postfix) with ESMTP; Tue, 18 Aug 2020 08:12:17 +0000 (GMT) From: Srikar Dronamraju To: Michael Ellerman Cc: linuxppc-dev , Srikar Dronamraju , linux-mm@kvack.org, Michal Hocko , Mel Gorman , Vlastimil Babka , Christopher Lameter , Andrew Morton , Linus Torvalds , Gautham R Shenoy , David Hildenbrand , Aneesh Kumar K V Subject: [PATCH v6 3/3] powerpc/numa: Offline memoryless cpuless node 0 Date: Tue, 18 Aug 2020 13:41:04 +0530 Message-Id: <20200818081104.57888-4-srikar@linux.vnet.ibm.com> X-Mailer: git-send-email 2.26.2 In-Reply-To: <20200818081104.57888-1-srikar@linux.vnet.ibm.com> References: <20200818081104.57888-1-srikar@linux.vnet.ibm.com> MIME-Version: 1.0 X-TM-AS-GCONF: 00 X-Proofpoint-Virus-Version: vendor=fsecure engine=2.50.10434:6.0.235,18.0.687 definitions=2020-08-18_04:2020-08-18,2020-08-18 signatures=0 X-Proofpoint-Spam-Details: rule=outbound_notspam policy=outbound score=0 adultscore=0 mlxlogscore=999 malwarescore=0 suspectscore=2 impostorscore=0 mlxscore=0 lowpriorityscore=0 spamscore=0 clxscore=1015 phishscore=0 bulkscore=0 priorityscore=1501 classifier=spam adjust=0 reason=mlx scancount=1 engine=8.12.0-2006250000 definitions=main-2008180051 X-Rspamd-Queue-Id: 49B4E180B3C83 X-Spamd-Result: default: False [0.00 / 100.00] X-Rspamd-Server: rspam04 X-Bogosity: Ham, tests=bogofilter, spamicity=0.000000, version=1.2.4 Sender: owner-linux-mm@kvack.org Precedence: bulk X-Loop: owner-majordomo@kvack.org List-ID: Currently Linux kernel with CONFIG_NUMA on a system with multiple possible nodes, marks node 0 as online at boot. However in practice, there are systems which have node 0 as memoryless and cpuless. This can cause numa_balancing to be enabled on systems with only one node with memory and CPUs. The existence of this dummy node which is cpuless and memoryless node can confuse users/scripts looking at output of lscpu / numactl. By marking, node 0 as offline, lets stop assuming that node 0 is always online. If node 0 has CPU or memory that are online, node 0 will again be set as online. v5.8 available: 2 nodes (0,2) node 0 cpus: node 0 size: 0 MB node 0 free: 0 MB node 2 cpus: 0 1 2 3 4 5 6 7 node 2 size: 32625 MB node 2 free: 31490 MB node distances: node 0 2 0: 10 20 2: 20 10 proc and sys files ------------------ /sys/devices/system/node/online: 0,2 /proc/sys/kernel/numa_balancing: 1 /sys/devices/system/node/has_cpu: 2 /sys/devices/system/node/has_memory: 2 /sys/devices/system/node/has_normal_memory: 2 /sys/devices/system/node/possible: 0-31 v5.8 + patch ------------------ available: 1 nodes (2) node 2 cpus: 0 1 2 3 4 5 6 7 node 2 size: 32625 MB node 2 free: 31487 MB node distances: node 2 2: 10 proc and sys files ------------------ /sys/devices/system/node/online: 2 /proc/sys/kernel/numa_balancing: 0 /sys/devices/system/node/has_cpu: 2 /sys/devices/system/node/has_memory: 2 /sys/devices/system/node/has_normal_memory: 2 /sys/devices/system/node/possible: 0-31 Example of a node with online CPUs/memory on node 0. (Same o/p with and without patch) numactl -H available: 4 nodes (0-3) node 0 cpus: 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 node 0 size: 32482 MB node 0 free: 22994 MB node 1 cpus: 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95 node 1 size: 0 MB node 1 free: 0 MB node 2 cpus: 96 97 98 99 100 101 102 103 104 105 106 107 108 109 110 111 112 113 114 115 116 117 118 119 120 121 122 123 124 125 126 127 128 129 130 131 132 133 134 135 136 137 138 139 140 141 142 143 node 2 size: 0 MB node 2 free: 0 MB node 3 cpus: 144 145 146 147 148 149 150 151 152 153 154 155 156 157 158 159 160 161 162 163 164 165 166 167 168 169 170 171 172 173 174 175 176 177 178 179 180 181 182 183 184 185 186 187 188 189 190 191 node 3 size: 0 MB node 3 free: 0 MB node distances: node 0 1 2 3 0: 10 20 40 40 1: 20 10 40 40 2: 40 40 10 20 3: 40 40 20 10 Note: On Powerpc, cpu_to_node of possible but not present cpus would previously return 0. Hence this commit depends on commit ("powerpc/numa: Set numa_node for all possible cpus") and commit ("powerpc/numa: Prefer node id queried from vphn"). Without the 2 commits, Powerpc system might crash. 1. User space applications like Numactl, lscpu, that parse the sysfs tend to believe there is an extra online node. This tends to confuse users and applications. Other user space applications start believing that system was not able to use all the resources (i.e missing resources) or the system was not setup correctly. 2. Also existence of dummy node also leads to inconsistent information. The number of online nodes is inconsistent with the information in the device-tree and resource-dump 3. When the dummy node is present, single node non-Numa systems end up showing up as NUMA systems and numa_balancing gets enabled. This will mean we take the hit from the unnecessary numa hinting faults. Cc: linuxppc-dev@lists.ozlabs.org Cc: linux-mm@kvack.org Cc: Michal Hocko Cc: Mel Gorman Cc: Vlastimil Babka Cc: Christopher Lameter Cc: Michael Ellerman Cc: Andrew Morton Cc: Linus Torvalds Cc: Gautham R Shenoy Cc: David Hildenbrand Cc: Aneesh Kumar K V Signed-off-by: Srikar Dronamraju --- Changelog: v5->v6: Moved fix from arch independent code to powerpc specific (Michal Hocko, Christopher Lamater) arch/powerpc/mm/numa.c | 10 ++++++++++ 1 file changed, 10 insertions(+) diff --git a/arch/powerpc/mm/numa.c b/arch/powerpc/mm/numa.c index 10c5064eeb88..0d72a7d4360e 100644 --- a/arch/powerpc/mm/numa.c +++ b/arch/powerpc/mm/numa.c @@ -924,6 +924,16 @@ void __init mem_topology_setup(void) { int cpu; + /* + * Linux/mm assumes node 0 to be online at boot. However this is not + * true on PowerPC, where node 0 is similar to any other node, it + * could be cpuless, memoryless node. So force node 0 to be offline + * for now. This will prevent cpuless, memoryless node 0 showing up + * unnecessarily as online. If a node has cpus or memory that need + * to be online, then node will anyway be marked online. + */ + node_set_offline(0); + if (parse_numa_properties()) setup_nonnuma();