From patchwork Mon Sep 14 16:50:39 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Laurent Dufour X-Patchwork-Id: 11774485 Return-Path: Received: from mail.kernel.org (pdx-korg-mail-1.web.codeaurora.org [172.30.200.123]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id A3E59112E for ; Mon, 14 Sep 2020 16:50:56 +0000 (UTC) Received: from kanga.kvack.org (kanga.kvack.org [205.233.56.17]) by mail.kernel.org (Postfix) with ESMTP id 5783B21BE5 for ; Mon, 14 Sep 2020 16:50:56 +0000 (UTC) Authentication-Results: mail.kernel.org; dkim=fail reason="signature verification failed" (2048-bit key) header.d=ibm.com header.i=@ibm.com header.b="gw8SIyU+" DMARC-Filter: OpenDMARC Filter v1.3.2 mail.kernel.org 5783B21BE5 Authentication-Results: mail.kernel.org; dmarc=fail (p=none dis=none) header.from=linux.ibm.com Authentication-Results: mail.kernel.org; spf=pass smtp.mailfrom=owner-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix) id 6B750900003; Mon, 14 Sep 2020 12:50:55 -0400 (EDT) Delivered-To: linux-mm-outgoing@kvack.org Received: by kanga.kvack.org (Postfix, from userid 40) id 6667E6B007B; Mon, 14 Sep 2020 12:50:55 -0400 (EDT) X-Original-To: int-list-linux-mm@kvack.org X-Delivered-To: int-list-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix, from userid 63042) id 555C2900003; Mon, 14 Sep 2020 12:50:55 -0400 (EDT) X-Original-To: linux-mm@kvack.org X-Delivered-To: linux-mm@kvack.org Received: from forelay.hostedemail.com (smtprelay0180.hostedemail.com [216.40.44.180]) by kanga.kvack.org (Postfix) with ESMTP id 4088B6B0078 for ; Mon, 14 Sep 2020 12:50:55 -0400 (EDT) Received: from smtpin19.hostedemail.com (10.5.19.251.rfc1918.com [10.5.19.251]) by forelay04.hostedemail.com (Postfix) with ESMTP id F16DF1730874 for ; Mon, 14 Sep 2020 16:50:54 +0000 (UTC) X-FDA: 77262256428.19.bit66_0c12ea02710a Received: from filter.hostedemail.com (10.5.16.251.rfc1918.com [10.5.16.251]) by smtpin19.hostedemail.com (Postfix) with ESMTP id B298B1AD1A3 for ; Mon, 14 Sep 2020 16:50:54 +0000 (UTC) X-Spam-Summary: 1,0,0,5d7dadd6dc7e1a83,d41d8cd98f00b204,ldufour@linux.ibm.com,,RULES_HIT:2:41:69:152:355:379:541:960:973:988:989:1260:1261:1277:1311:1313:1314:1345:1437:1515:1516:1518:1535:1593:1594:1605:1606:1730:1747:1777:1792:2194:2199:2393:2553:2559:2562:2741:2902:3138:3139:3140:3141:3142:3622:3865:3866:3867:3868:3870:3871:3874:4120:4250:4321:4362:4605:5007:6119:6238:6261:6653:7875:7903:8660:10004:11026:11473:11658:11914:12043:12048:12291:12296:12297:12438:12555:12679:12683:12740:12895:12986:13148:13161:13229:13230:13548:13870:13894:13972:14096:14097:14394:14659:21080:21433:21451:21524:21611:21627:21939:30003:30029:30045:30054:30056:30070:30090,0,RBL:148.163.158.5:@linux.ibm.com:.lbl8.mailshell.net-64.201.201.201 62.14.0.100;04y8kf5h3a41qfc53dq1qxor53kpzopb4trwo4jzruc8bu759qo1etdyanfyz3w.3em6xzfmitsry7ux91n594ifxmm6sf551zrpap9ga7b7d9uko97mapdegrszykq.r-lbl8.mailshell.net-223.238.255.100,CacheIP:none,Bayesian:0.5,0.5,0.5,Netcheck:none,DomainCache:0,MSF:not bulk,SPF:fp,MSBL :0,DNSBL X-HE-Tag: bit66_0c12ea02710a X-Filterd-Recvd-Size: 9578 Received: from mx0a-001b2d01.pphosted.com (mx0b-001b2d01.pphosted.com [148.163.158.5]) by imf01.hostedemail.com (Postfix) with ESMTP for ; Mon, 14 Sep 2020 16:50:54 +0000 (UTC) Received: from pps.filterd (m0098419.ppops.net [127.0.0.1]) by mx0b-001b2d01.pphosted.com (8.16.0.42/8.16.0.42) with SMTP id 08EGUqrn126648; Mon, 14 Sep 2020 12:50:50 -0400 DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=ibm.com; h=from : to : cc : subject : date : message-id : mime-version : content-type : content-transfer-encoding; s=pp1; bh=F+6vQedN52j8wgtFjBUAiJBhp+e0vxfPw/ZjD+hUdYk=; b=gw8SIyU+lBQfuXuu9NGvUbGkH3uqOnZfQ8/G0Z2nyn21hf+zAxg24vFq6JjLsTN9zKNd PwEXeyV/VYBFG+sH+fDqLCU9x59p+zzoxgb/cFCat4wKQE865xO6hr7HCEKYyr/QHAvm W+uXayO62Z4+pTywaTr/1iZk6clYjRgglFc/hGcXgmikPpND/JBB/yE+AlgX0N3T3cNm 3pBBLcUM8tPFRugcB5VYIoZq4PF7tO3f2q4mdXWTBEQAerMmNNBV9/tGQ7zn4NmvdFVX o2Ti1ikigkAvtMGsc1Sf0/A6oxnxM6l+8IWlciJ73k61BVXHOy7J4d7Iw7YmfJ+FMmej dw== Received: from pps.reinject (localhost [127.0.0.1]) by mx0b-001b2d01.pphosted.com with ESMTP id 33jbx7guyd-1 (version=TLSv1.2 cipher=ECDHE-RSA-AES256-GCM-SHA384 bits=256 verify=NOT); Mon, 14 Sep 2020 12:50:50 -0400 Received: from m0098419.ppops.net (m0098419.ppops.net [127.0.0.1]) by pps.reinject (8.16.0.36/8.16.0.36) with SMTP id 08EGV8je127370; Mon, 14 Sep 2020 12:50:49 -0400 Received: from ppma04ams.nl.ibm.com (63.31.33a9.ip4.static.sl-reverse.com [169.51.49.99]) by mx0b-001b2d01.pphosted.com with ESMTP id 33jbx7guxw-1 (version=TLSv1.2 cipher=ECDHE-RSA-AES256-GCM-SHA384 bits=256 verify=NOT); Mon, 14 Sep 2020 12:50:49 -0400 Received: from pps.filterd (ppma04ams.nl.ibm.com [127.0.0.1]) by ppma04ams.nl.ibm.com (8.16.0.42/8.16.0.42) with SMTP id 08EGlGAW015707; Mon, 14 Sep 2020 16:50:48 GMT Received: from b06cxnps3075.portsmouth.uk.ibm.com (d06relay10.portsmouth.uk.ibm.com [9.149.109.195]) by ppma04ams.nl.ibm.com with ESMTP id 33h2r99ybs-1 (version=TLSv1.2 cipher=ECDHE-RSA-AES256-GCM-SHA384 bits=256 verify=NOT); Mon, 14 Sep 2020 16:50:47 +0000 Received: from d06av23.portsmouth.uk.ibm.com (d06av23.portsmouth.uk.ibm.com [9.149.105.59]) by b06cxnps3075.portsmouth.uk.ibm.com (8.14.9/8.14.9/NCO v10.0) with ESMTP id 08EGojis21037426 (version=TLSv1/SSLv3 cipher=DHE-RSA-AES256-GCM-SHA384 bits=256 verify=OK); Mon, 14 Sep 2020 16:50:45 GMT Received: from d06av23.portsmouth.uk.ibm.com (unknown [127.0.0.1]) by IMSVA (Postfix) with ESMTP id F3EBCA4059; Mon, 14 Sep 2020 16:50:44 +0000 (GMT) Received: from d06av23.portsmouth.uk.ibm.com (unknown [127.0.0.1]) by IMSVA (Postfix) with ESMTP id 7F681A4055; Mon, 14 Sep 2020 16:50:44 +0000 (GMT) Received: from pomme.tlslab.ibm.com (unknown [9.145.51.157]) by d06av23.portsmouth.uk.ibm.com (Postfix) with ESMTP; Mon, 14 Sep 2020 16:50:44 +0000 (GMT) From: Laurent Dufour To: akpm@linux-foundation.org, David Hildenbrand , Oscar Salvador , mhocko@kernel.org, Greg Kroah-Hartman Cc: linux-mm@kvack.org, "Rafael J . Wysocki" , nathanl@linux.ibm.com, cheloha@linux.ibm.com, Tony Luck , Fenghua Yu , linux-ia64@vger.kernel.org, linux-kernel@vger.kernel.org, stable@vger.kernel.org Subject: [PATCH v2 0/3] mm: fix memory to node bad links in sysfs Date: Mon, 14 Sep 2020 18:50:39 +0200 Message-Id: <20200914165042.96218-1-ldufour@linux.ibm.com> X-Mailer: git-send-email 2.28.0 MIME-Version: 1.0 X-TM-AS-GCONF: 00 X-Proofpoint-Virus-Version: vendor=fsecure engine=2.50.10434:6.0.235,18.0.687 definitions=2020-09-14_06:2020-09-14,2020-09-14 signatures=0 X-Proofpoint-Spam-Details: rule=outbound_notspam policy=outbound score=0 adultscore=0 clxscore=1015 impostorscore=0 priorityscore=1501 lowpriorityscore=0 suspectscore=0 malwarescore=0 phishscore=0 spamscore=0 bulkscore=0 mlxscore=0 mlxlogscore=999 classifier=spam adjust=0 reason=mlx scancount=1 engine=8.12.0-2006250000 definitions=main-2009140129 X-Rspamd-Queue-Id: B298B1AD1A3 X-Spamd-Result: default: False [0.00 / 100.00] X-Rspamd-Server: rspam02 X-Bogosity: Ham, tests=bogofilter, spamicity=0.000000, version=1.2.4 Sender: owner-linux-mm@kvack.org Precedence: bulk X-Loop: owner-majordomo@kvack.org List-ID: Sometimes, firmware may expose interleaved memory layout like this: Early memory node ranges node 1: [mem 0x0000000000000000-0x000000011fffffff] node 2: [mem 0x0000000120000000-0x000000014fffffff] node 1: [mem 0x0000000150000000-0x00000001ffffffff] node 0: [mem 0x0000000200000000-0x000000048fffffff] node 2: [mem 0x0000000490000000-0x00000007ffffffff] In that case, we can see memory blocks assigned to multiple nodes in sysfs: $ ls -l /sys/devices/system/memory/memory21 total 0 lrwxrwxrwx 1 root root 0 Aug 24 05:27 node1 -> ../../node/node1 lrwxrwxrwx 1 root root 0 Aug 24 05:27 node2 -> ../../node/node2 -rw-r--r-- 1 root root 65536 Aug 24 05:27 online -r--r--r-- 1 root root 65536 Aug 24 05:27 phys_device -r--r--r-- 1 root root 65536 Aug 24 05:27 phys_index drwxr-xr-x 2 root root 0 Aug 24 05:27 power -r--r--r-- 1 root root 65536 Aug 24 05:27 removable -rw-r--r-- 1 root root 65536 Aug 24 05:27 state lrwxrwxrwx 1 root root 0 Aug 24 05:25 subsystem -> ../../../../bus/memory -rw-r--r-- 1 root root 65536 Aug 24 05:25 uevent -r--r--r-- 1 root root 65536 Aug 24 05:27 valid_zones The same applies in the node's directory with a memory21 link in both the node1 and node2's directory. This is wrong but doesn't prevent the system to run. However when later one of these memory blocks is hot-unplugged and then hot-plugged, the system is detecting an inconsistency in the sysfs layout and a BUG_ON() is raised: ------------[ cut here ]------------ kernel BUG at /Users/laurent/src/linux-ppc/mm/memory_hotplug.c:1084! Oops: Exception in kernel mode, sig: 5 [#1] LE PAGE_SIZE=64K MMU=Hash SMP NR_CPUS=2048 NUMA pSeries Modules linked in: rpadlpar_io rpaphp pseries_rng rng_core vmx_crypto gf128mul binfmt_misc ip_tables x_tables xfs libcrc32c crc32c_vpmsum autofs4 CPU: 8 PID: 10256 Comm: drmgr Not tainted 5.9.0-rc1+ #25 NIP: c000000000403f34 LR: c000000000403f2c CTR: 0000000000000000 REGS: c0000004876e3660 TRAP: 0700 Not tainted (5.9.0-rc1+) MSR: 800000000282b033 CR: 24000448 XER: 20040000 CFAR: c000000000846d20 IRQMASK: 0 GPR00: c000000000403f2c c0000004876e38f0 c0000000012f6f00 ffffffffffffffef GPR04: 0000000000000227 c0000004805ae680 0000000000000000 00000004886f0000 GPR08: 0000000000000226 0000000000000003 0000000000000002 fffffffffffffffd GPR12: 0000000088000484 c00000001ec96280 0000000000000000 0000000000000000 GPR16: 0000000000000000 0000000000000000 0000000000000004 0000000000000003 GPR20: c00000047814ffe0 c0000007ffff7c08 0000000000000010 c0000000013332c8 GPR24: 0000000000000000 c0000000011f6cc0 0000000000000000 0000000000000000 GPR28: ffffffffffffffef 0000000000000001 0000000150000000 0000000010000000 NIP [c000000000403f34] add_memory_resource+0x244/0x340 LR [c000000000403f2c] add_memory_resource+0x23c/0x340 Call Trace: [c0000004876e38f0] [c000000000403f2c] add_memory_resource+0x23c/0x340 (unreliable) [c0000004876e39c0] [c00000000040408c] __add_memory+0x5c/0xf0 [c0000004876e39f0] [c0000000000e2b94] dlpar_add_lmb+0x1b4/0x500 [c0000004876e3ad0] [c0000000000e3888] dlpar_memory+0x1f8/0xb80 [c0000004876e3b60] [c0000000000dc0d0] handle_dlpar_errorlog+0xc0/0x190 [c0000004876e3bd0] [c0000000000dc398] dlpar_store+0x198/0x4a0 [c0000004876e3c90] [c00000000072e630] kobj_attr_store+0x30/0x50 [c0000004876e3cb0] [c00000000051f954] sysfs_kf_write+0x64/0x90 [c0000004876e3cd0] [c00000000051ee40] kernfs_fop_write+0x1b0/0x290 [c0000004876e3d20] [c000000000438dd8] vfs_write+0xe8/0x290 [c0000004876e3d70] [c0000000004391ac] ksys_write+0xdc/0x130 [c0000004876e3dc0] [c000000000034e40] system_call_exception+0x160/0x270 [c0000004876e3e20] [c00000000000d740] system_call_common+0xf0/0x27c Instruction dump: 48442e35 60000000 0b030000 3cbe0001 7fa3eb78 7bc48402 38a5fffe 7ca5fa14 78a58402 48442db1 60000000 7c7c1b78 <0b030000> 7f23cb78 4bda371d 60000000 ---[ end trace 562fd6c109cd0fb2 ]--- This has been seen on PowerPC LPAR. The root cause of this issue is that when node's memory is registered, the range used can overlap another node's range, thus the memory block is registered to multiple nodes in sysfs. There are 2 issues here: a. The sysfs memory and node's layout are broken due to these multiple links b. The link errors in link_mem_sections() should not lead to a system panic. To address a. register_mem_sect_under_node should not rely on the system state to detect whether the link operation is triggered by a hot plug operation or not. This is addressed by the patches 1 and 2 of this series. The patch 3 is addressing the point b. Thanks, Laurent Since v1: - change context enum's name from Michal's comment - use 2 callbacks in link_mem_sections from David's comment - use dev_err_ratelimited from Greg's comment Laurent Dufour (3): mm: replace memmap_context by memplug_context mm: don't rely on system state to detect hot-plug operations mm: don't panic when links can't be created in sysfs arch/ia64/mm/init.c | 6 +-- drivers/base/node.c | 99 ++++++++++++++++++++++++++++-------------- include/linux/mm.h | 2 +- include/linux/mmzone.h | 11 +++-- include/linux/node.h | 13 +++--- mm/memory_hotplug.c | 6 +-- mm/page_alloc.c | 10 ++--- 7 files changed, 94 insertions(+), 53 deletions(-)