From: "Aneesh Kumar K.V"
To: linux-mm@kvack.org, akpm@linux-foundation.org
Cc: "Aneesh Kumar K.V", Joao Martins, Muchun Song, Dan Williams, Mike Kravetz, Tarun Sahu
Subject: [PATCH v3 1/2] mm/vmemmap/devdax: Fix kernel crash when probing devdax devices
Date: Wed, 12 Apr 2023 10:30:24 +0530
Message-Id: <20230412050025.84346-1-aneesh.kumar@linux.ibm.com>
Commit 4917f55b4ef9 ("mm/sparse-vmemmap: improve memory savings for
compound devmaps") added support for using an optimized vmemmap for
devdax devices. However, how vmemmap mappings are created is
architecture-specific. For example, powerpc with hash translation does
not have vmemmap mappings in the init_mm page table; instead, they are
bolted entries in the hardware page table.

vmemmap_populate_compound_pages(), which is used by the vmemmap
optimization code, is not aware of these architecture-specific
mappings. Hence, allow architectures to opt in to this feature.
Architectures that support the HUGETLB_PAGE_OPTIMIZE_VMEMMAP option
(i.e. select ARCH_WANT_HUGETLB_PAGE_OPTIMIZE_VMEMMAP) are treated as
supporting this feature.

On ppc64 (pmem), where this is not supported, this fixes the following
crash:

BUG: Unable to handle kernel data access on write at 0xc00c000100400038
Faulting instruction address: 0xc000000001269d90
Oops: Kernel access of bad area, sig: 11 [#1]
LE PAGE_SIZE=64K MMU=Hash SMP NR_CPUS=2048 NUMA pSeries
Modules linked in:
CPU: 7 PID: 1 Comm: swapper/0 Not tainted 6.3.0-rc5-150500.34-default+ #2 5c90a668b6bbd142599890245c2fb5de19d7d28a
Hardware name: IBM,9009-42G POWER9 (raw) 0x4e0202 0xf000005 of:IBM,FW950.40 (VL950_099) hv:phyp pSeries
NIP: c000000001269d90 LR: c0000000004c57d4 CTR: 0000000000000000
REGS: c000000003632c30 TRAP: 0300 Not tainted (6.3.0-rc5-150500.34-default+)
MSR: 8000000000009033 CR: 24842228 XER: 00000000
CFAR: c0000000004c57d0 DAR: c00c000100400038 DSISR: 42000000 IRQMASK: 0
....
NIP [c000000001269d90] __init_single_page.isra.74+0x14/0x4c
LR [c0000000004c57d4] __init_zone_device_page+0x44/0xd0
Call Trace:
[c000000003632ed0] [c000000003632f60] 0xc000000003632f60 (unreliable)
[c000000003632f10] [c0000000004c5ca0] memmap_init_zone_device+0x170/0x250
[c000000003632fe0] [c0000000005575f8] memremap_pages+0x2c8/0x7f0
[c0000000036330c0] [c000000000557b5c] devm_memremap_pages+0x3c/0xa0
[c000000003633100] [c000000000d458a8] dev_dax_probe+0x108/0x3e0
[c0000000036331a0] [c000000000d41430] dax_bus_probe+0xb0/0x140
[c0000000036331d0] [c000000000cef27c] really_probe+0x19c/0x520
[c000000003633260] [c000000000cef6b4] __driver_probe_device+0xb4/0x230
[c0000000036332e0] [c000000000cef888] driver_probe_device+0x58/0x120
[c000000003633320] [c000000000cefa6c] __device_attach_driver+0x11c/0x1e0
[c0000000036333a0] [c000000000cebc58] bus_for_each_drv+0xa8/0x130
[c000000003633400] [c000000000ceefcc] __device_attach+0x15c/0x250
[c0000000036334a0] [c000000000ced458] bus_probe_device+0x108/0x110
[c0000000036334f0] [c000000000ce92dc] device_add+0x7fc/0xa10
[c0000000036335b0] [c000000000d447c8] devm_create_dev_dax+0x1d8/0x530
[c000000003633640] [c000000000d46b60] __dax_pmem_probe+0x200/0x270
[c0000000036337b0] [c000000000d46bf0] dax_pmem_probe+0x20/0x70
[c0000000036337d0] [c000000000d2279c] nvdimm_bus_probe+0xac/0x2b0
[c000000003633860] [c000000000cef27c] really_probe+0x19c/0x520
[c0000000036338f0] [c000000000cef6b4] __driver_probe_device+0xb4/0x230
[c000000003633970] [c000000000cef888] driver_probe_device+0x58/0x120
[c0000000036339b0] [c000000000cefd08] __driver_attach+0x1d8/0x240
[c000000003633a30] [c000000000cebb04] bus_for_each_dev+0xb4/0x130
[c000000003633a90] [c000000000cee564] driver_attach+0x34/0x50
[c000000003633ab0] [c000000000ced878] bus_add_driver+0x218/0x300
[c000000003633b40] [c000000000cf1144] driver_register+0xa4/0x1b0
[c000000003633bb0] [c000000000d21a0c] __nd_driver_register+0x5c/0x100
[c000000003633c10] [c00000000206a2e8] dax_pmem_init+0x34/0x48
[c000000003633c30] [c0000000000132d0] do_one_initcall+0x60/0x320
[c000000003633d00] [c0000000020051b0] kernel_init_freeable+0x360/0x400
[c000000003633de0] [c000000000013764] kernel_init+0x34/0x1d0
[c000000003633e50] [c00000000000de14] ret_from_kernel_thread+0x5c/0x64

Fixes: 4917f55b4ef9 ("mm/sparse-vmemmap: improve memory savings for compound devmaps")
Cc: Joao Martins
Cc: Muchun Song
Cc: Dan Williams
Cc: Mike Kravetz
Reported-by: Tarun Sahu
Signed-off-by: Aneesh Kumar K.V
Reviewed-by: Joao Martins
---
 include/linux/mm.h  | 16 ++++++++++++++++
 mm/page_alloc.c     | 11 +++++++----
 mm/sparse-vmemmap.c |  3 +--
 3 files changed, 24 insertions(+), 6 deletions(-)

diff --git a/include/linux/mm.h b/include/linux/mm.h
index 1f79667824eb..ced82b9c18e5 100644
--- a/include/linux/mm.h
+++ b/include/linux/mm.h
@@ -3425,6 +3425,22 @@ void vmemmap_populate_print_last(void);
 void vmemmap_free(unsigned long start, unsigned long end,
 		struct vmem_altmap *altmap);
 #endif
+
+#ifdef CONFIG_ARCH_WANT_HUGETLB_PAGE_OPTIMIZE_VMEMMAP
+static inline bool vmemmap_can_optimize(struct vmem_altmap *altmap,
+					struct dev_pagemap *pgmap)
+{
+	return is_power_of_2(sizeof(struct page)) &&
+		pgmap && (pgmap_vmemmap_nr(pgmap) > 1) && !altmap;
+}
+#else
+static inline bool vmemmap_can_optimize(struct vmem_altmap *altmap,
+					struct dev_pagemap *pgmap)
+{
+	return false;
+}
+#endif
+
 void register_page_bootmem_memmap(unsigned long section_nr, struct page *map,
 				  unsigned long nr_pages);
 
diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index 7136c36c5d01..cf9f9ddfbd19 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -6889,10 +6889,13 @@ static void __ref __init_zone_device_page(struct page *page, unsigned long pfn,
  * of an altmap. See vmemmap_populate_compound_pages().
  */
 static inline unsigned long compound_nr_pages(struct vmem_altmap *altmap,
-					      unsigned long nr_pages)
+					      struct dev_pagemap *pgmap)
 {
-	return is_power_of_2(sizeof(struct page)) &&
-		!altmap ? 2 * (PAGE_SIZE / sizeof(struct page)) : nr_pages;
+
+	if (!vmemmap_can_optimize(altmap, pgmap))
+		return pgmap_vmemmap_nr(pgmap);
+
+	return 2 * (PAGE_SIZE / sizeof(struct page));
 }
 
 static void __ref memmap_init_compound(struct page *head,
@@ -6957,7 +6960,7 @@ void __ref memmap_init_zone_device(struct zone *zone,
 			continue;
 
 		memmap_init_compound(page, pfn, zone_idx, nid, pgmap,
-				     compound_nr_pages(altmap, pfns_per_compound));
+				     compound_nr_pages(altmap, pgmap));
 	}
 
 	pr_info("%s initialised %lu pages in %ums\n", __func__,
diff --git a/mm/sparse-vmemmap.c b/mm/sparse-vmemmap.c
index c5398a5960d0..10d73a0dfcec 100644
--- a/mm/sparse-vmemmap.c
+++ b/mm/sparse-vmemmap.c
@@ -458,8 +458,7 @@ struct page * __meminit __populate_section_memmap(unsigned long pfn,
 		!IS_ALIGNED(nr_pages, PAGES_PER_SUBSECTION)))
 		return NULL;
 
-	if (is_power_of_2(sizeof(struct page)) &&
-	    pgmap && pgmap_vmemmap_nr(pgmap) > 1 && !altmap)
+	if (vmemmap_can_optimize(altmap, pgmap))
 		r = vmemmap_populate_compound_pages(pfn, start, end, nid, pgmap);
 	else
 		r = vmemmap_populate(start, end, nid, altmap);