From patchwork Tue Apr 11 14:18:17 2023
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: "Aneesh Kumar K.V"
X-Patchwork-Id: 13207648
From: "Aneesh Kumar K.V"
To: linux-mm@kvack.org, akpm@linux-foundation.org
Cc: "Aneesh Kumar K.V", Joao Martins, Muchun Song, Dan Williams, Tarun Sahu
Subject: [PATCH v2 1/2] mm/vmemmap/devdax: Fix kernel crash when probing devdax devices
Date: Tue, 11 Apr 2023 19:48:17 +0530
Message-Id: <20230411141818.62152-1-aneesh.kumar@linux.ibm.com>
X-Mailer: git-send-email 2.39.2
Sender: owner-linux-mm@kvack.org
Precedence: bulk

Commit 4917f55b4ef9 ("mm/sparse-vmemmap: improve memory savings for compound
devmaps") added support for using optimized vmemmap for devdax devices. However,
how vmemmap mappings are created is architecture specific.
For example, powerpc with hash translation doesn't have vmemmap mappings in the
init_mm page table; instead, they are bolted entries in the hardware page table.

vmemmap_populate_compound_pages(), used by the vmemmap optimization code, is not
aware of these architecture-specific mappings. Hence, allow architectures to opt
in to this feature. I selected architectures supporting the
HUGETLB_PAGE_OPTIMIZE_VMEMMAP option as also supporting this feature.

This patch fixes the crash below on ppc64.

BUG: Unable to handle kernel data access on write at 0xc00c000100400038
Faulting instruction address: 0xc000000001269d90
Oops: Kernel access of bad area, sig: 11 [#1]
LE PAGE_SIZE=64K MMU=Hash SMP NR_CPUS=2048 NUMA pSeries
Modules linked in:
CPU: 7 PID: 1 Comm: swapper/0 Not tainted 6.3.0-rc5-150500.34-default+ #2 5c90a668b6bbd142599890245c2fb5de19d7d28a
Hardware name: IBM,9009-42G POWER9 (raw) 0x4e0202 0xf000005 of:IBM,FW950.40 (VL950_099) hv:phyp pSeries
NIP: c000000001269d90 LR: c0000000004c57d4 CTR: 0000000000000000
REGS: c000000003632c30 TRAP: 0300 Not tainted (6.3.0-rc5-150500.34-default+)
MSR: 8000000000009033 CR: 24842228 XER: 00000000
CFAR: c0000000004c57d0 DAR: c00c000100400038 DSISR: 42000000 IRQMASK: 0
....
NIP [c000000001269d90] __init_single_page.isra.74+0x14/0x4c
LR [c0000000004c57d4] __init_zone_device_page+0x44/0xd0
Call Trace:
[c000000003632ed0] [c000000003632f60] 0xc000000003632f60 (unreliable)
[c000000003632f10] [c0000000004c5ca0] memmap_init_zone_device+0x170/0x250
[c000000003632fe0] [c0000000005575f8] memremap_pages+0x2c8/0x7f0
[c0000000036330c0] [c000000000557b5c] devm_memremap_pages+0x3c/0xa0
[c000000003633100] [c000000000d458a8] dev_dax_probe+0x108/0x3e0
[c0000000036331a0] [c000000000d41430] dax_bus_probe+0xb0/0x140
[c0000000036331d0] [c000000000cef27c] really_probe+0x19c/0x520
[c000000003633260] [c000000000cef6b4] __driver_probe_device+0xb4/0x230
[c0000000036332e0] [c000000000cef888] driver_probe_device+0x58/0x120
[c000000003633320] [c000000000cefa6c] __device_attach_driver+0x11c/0x1e0
[c0000000036333a0] [c000000000cebc58] bus_for_each_drv+0xa8/0x130
[c000000003633400] [c000000000ceefcc] __device_attach+0x15c/0x250
[c0000000036334a0] [c000000000ced458] bus_probe_device+0x108/0x110
[c0000000036334f0] [c000000000ce92dc] device_add+0x7fc/0xa10
[c0000000036335b0] [c000000000d447c8] devm_create_dev_dax+0x1d8/0x530
[c000000003633640] [c000000000d46b60] __dax_pmem_probe+0x200/0x270
[c0000000036337b0] [c000000000d46bf0] dax_pmem_probe+0x20/0x70
[c0000000036337d0] [c000000000d2279c] nvdimm_bus_probe+0xac/0x2b0
[c000000003633860] [c000000000cef27c] really_probe+0x19c/0x520
[c0000000036338f0] [c000000000cef6b4] __driver_probe_device+0xb4/0x230
[c000000003633970] [c000000000cef888] driver_probe_device+0x58/0x120
[c0000000036339b0] [c000000000cefd08] __driver_attach+0x1d8/0x240
[c000000003633a30] [c000000000cebb04] bus_for_each_dev+0xb4/0x130
[c000000003633a90] [c000000000cee564] driver_attach+0x34/0x50
[c000000003633ab0] [c000000000ced878] bus_add_driver+0x218/0x300
[c000000003633b40] [c000000000cf1144] driver_register+0xa4/0x1b0
[c000000003633bb0] [c000000000d21a0c] __nd_driver_register+0x5c/0x100
[c000000003633c10] [c00000000206a2e8] dax_pmem_init+0x34/0x48
[c000000003633c30] [c0000000000132d0] do_one_initcall+0x60/0x320
[c000000003633d00] [c0000000020051b0] kernel_init_freeable+0x360/0x400
[c000000003633de0] [c000000000013764] kernel_init+0x34/0x1d0
[c000000003633e50] [c00000000000de14] ret_from_kernel_thread+0x5c/0x64

Fixes: 4917f55b4ef9 ("mm/sparse-vmemmap: improve memory savings for compound devmaps")
Cc: Joao Martins
Cc: Muchun Song
Cc: Dan Williams
Reported-by: Tarun Sahu
Signed-off-by: Aneesh Kumar K.V
---
Changes from v1:
* Only disable the memory-saving part of compound devmaps
* Update the Fixes: tag to the correct commit
* Add a patch to drop the HUGETLB-specific Kconfig

 include/linux/mm.h  | 16 ++++++++++++++++
 mm/page_alloc.c     |  9 ++++++---
 mm/sparse-vmemmap.c |  3 +--
 3 files changed, 23 insertions(+), 5 deletions(-)

diff --git a/include/linux/mm.h b/include/linux/mm.h
index 716d30d93616..c47f2186d2c2 100644
--- a/include/linux/mm.h
+++ b/include/linux/mm.h
@@ -3442,6 +3442,22 @@ void vmemmap_populate_print_last(void);
 void vmemmap_free(unsigned long start, unsigned long end,
 		struct vmem_altmap *altmap);
 #endif
+
+#ifdef CONFIG_ARCH_WANT_HUGETLB_PAGE_OPTIMIZE_VMEMMAP
+static inline bool vmemmap_can_optimize(struct vmem_altmap *altmap,
+					   struct dev_pagemap *pgmap)
+{
+	return is_power_of_2(sizeof(struct page)) &&
+		pgmap && (pgmap_vmemmap_nr(pgmap) > 1) && !altmap;
+}
+#else
+static inline bool vmemmap_can_optimize(struct vmem_altmap *altmap,
+					   struct dev_pagemap *pgmap)
+{
+	return false;
+}
+#endif
+
 void register_page_bootmem_memmap(unsigned long section_nr, struct page *map,
 				  unsigned long nr_pages);
 
diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index 3bb3484563ed..292411d8816f 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -6844,10 +6844,13 @@ static void __ref __init_zone_device_page(struct page *page, unsigned long pfn,
  * of an altmap. See vmemmap_populate_compound_pages().
  */
 static inline unsigned long compound_nr_pages(struct vmem_altmap *altmap,
+					      struct dev_pagemap *pgmap,
 					      unsigned long nr_pages)
 {
-	return is_power_of_2(sizeof(struct page)) &&
-		!altmap ? 2 * (PAGE_SIZE / sizeof(struct page)) : nr_pages;
+	if (vmemmap_can_optimize(altmap, pgmap))
+		return 2 * (PAGE_SIZE / sizeof(struct page));
+	else
+		return nr_pages;
 }
 
 static void __ref memmap_init_compound(struct page *head,
@@ -6912,7 +6915,7 @@ void __ref memmap_init_zone_device(struct zone *zone,
 			continue;
 
 		memmap_init_compound(page, pfn, zone_idx, nid, pgmap,
-				     compound_nr_pages(altmap, pfns_per_compound));
+				     compound_nr_pages(altmap, pgmap, pfns_per_compound));
 	}
 
 	pr_info("%s initialised %lu pages in %ums\n", __func__,
diff --git a/mm/sparse-vmemmap.c b/mm/sparse-vmemmap.c
index c5398a5960d0..10d73a0dfcec 100644
--- a/mm/sparse-vmemmap.c
+++ b/mm/sparse-vmemmap.c
@@ -458,8 +458,7 @@ struct page * __meminit __populate_section_memmap(unsigned long pfn,
 		!IS_ALIGNED(nr_pages, PAGES_PER_SUBSECTION)))
 		return NULL;
 
-	if (is_power_of_2(sizeof(struct page)) &&
-	    pgmap && pgmap_vmemmap_nr(pgmap) > 1 && !altmap)
+	if (vmemmap_can_optimize(altmap, pgmap))
 		r = vmemmap_populate_compound_pages(pfn, start, end, nid, pgmap);
 	else
 		r = vmemmap_populate(start, end, nid, altmap);
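
A minimal userspace sketch of the decision that this patch centralizes in
vmemmap_can_optimize(), offered only as an illustration. It assumes 4K base
pages and a 64-byte struct page; every sketch_* function and SKETCH_* macro is
a hypothetical stand-in rather than a kernel identifier, and the altmap and
pgmap pointers are reduced to plain flags.

```c
/* Userspace sketch only; not kernel code. All sketch_/SKETCH_ names are hypothetical. */
#include <assert.h>
#include <stdbool.h>

/* Assumed values for illustration: 4K base pages, 64-byte struct page. */
#define SKETCH_PAGE_SIZE        4096UL
#define SKETCH_STRUCT_PAGE_SIZE 64UL

/* Stand-in for the architecture opt-in that the patch tests via a Kconfig symbol. */
#ifndef SKETCH_ARCH_OPTIMIZES_VMEMMAP
#define SKETCH_ARCH_OPTIMIZES_VMEMMAP 1
#endif

static bool sketch_is_power_of_2(unsigned long n)
{
	return n != 0 && (n & (n - 1)) == 0;
}

/* Same conditions as vmemmap_can_optimize() in the patch, with pointers reduced to flags. */
static bool sketch_can_optimize(bool has_altmap, bool has_pgmap,
				unsigned long vmemmap_nr)
{
	if (!SKETCH_ARCH_OPTIMIZES_VMEMMAP)
		return false;
	return sketch_is_power_of_2(SKETCH_STRUCT_PAGE_SIZE) &&
	       has_pgmap && vmemmap_nr > 1 && !has_altmap;
}

/* Same shape as compound_nr_pages() after the patch. */
static unsigned long sketch_compound_nr_pages(bool has_altmap, bool has_pgmap,
					      unsigned long vmemmap_nr,
					      unsigned long nr_pages)
{
	if (sketch_can_optimize(has_altmap, has_pgmap, vmemmap_nr))
		return 2 * (SKETCH_PAGE_SIZE / SKETCH_STRUCT_PAGE_SIZE);
	return nr_pages;
}

/* Checks that hold when SKETCH_ARCH_OPTIMIZES_VMEMMAP is left at 1. */
static void sketch_self_check(void)
{
	/* 2M compound page = 512 base pages; optimized case covers 2 vmemmap pages. */
	assert(sketch_compound_nr_pages(false, true, 512, 512) == 128);
	/* An altmap disables the optimization, so all 512 are initialised. */
	assert(sketch_compound_nr_pages(true, true, 512, 512) == 512);
}
```

With these assumed sizes, a 2M compound page needs only 128 struct pages
initialised when the optimization applies. Building the sketch with
-DSKETCH_ARCH_OPTIMIZES_VMEMMAP=0 models an architecture that has not opted in:
sketch_compound_nr_pages() then returns nr_pages in every case, which is the
behaviour the #else branch in mm.h gives.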