From patchwork Fri Apr 7 12:23:53 2023
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: "Aneesh Kumar K.V"
X-Patchwork-Id: 13204718
From: "Aneesh Kumar K.V"
To: linux-mm@kvack.org, akpm@linux-foundation.org
Cc: "Aneesh Kumar K.V", Joao Martins, Muchun Song, Dan Williams, Tarun Sahu
Subject: [PATCH] mm/vmemmap/devdax: Fix kernel crash when probing devdax devices
Date: Fri, 7 Apr 2023 17:53:53 +0530
Message-Id: <20230407122353.12018-1-aneesh.kumar@linux.ibm.com>
X-Mailer: git-send-email 2.39.2
Sender: owner-linux-mm@kvack.org
Precedence: bulk

Commit c4386bd8ee3a ("mm/memremap: add ZONE_DEVICE support for compound
pages") added support for using optimized vmemmap for devdax devices. But
how vmemmap mappings are created is architecture specific.
For example, powerpc with hash translation doesn't have vmemmap mappings
in the init_mm page table; instead they are bolted entries in the hardware
page table. vmemmap_populate_compound_pages(), used by the vmemmap
optimization code, is not aware of these architecture-specific mappings.
Hence allow architectures to opt in to this feature. I selected the
architectures that support the HUGETLB_PAGE_OPTIMIZE_VMEMMAP option as
also supporting this feature.

I added vmemmap_can_optimize() even though the pgmap_vmemmap_nr(pgmap) > 1
check should already filter out architectures not supporting this. IMHO
that brings clarity to the code where we are populating vmemmap.

This patch fixes the below crash on ppc64.

BUG: Unable to handle kernel data access on write at 0xc00c000100400038
Faulting instruction address: 0xc000000001269d90
Oops: Kernel access of bad area, sig: 11 [#1]
LE PAGE_SIZE=64K MMU=Hash SMP NR_CPUS=2048 NUMA pSeries
Modules linked in:
CPU: 7 PID: 1 Comm: swapper/0 Not tainted 6.3.0-rc5-150500.34-default+ #2 5c90a668b6bbd142599890245c2fb5de19d7d28a
Hardware name: IBM,9009-42G POWER9 (raw) 0x4e0202 0xf000005 of:IBM,FW950.40 (VL950_099) hv:phyp pSeries
NIP: c000000001269d90 LR: c0000000004c57d4 CTR: 0000000000000000
REGS: c000000003632c30 TRAP: 0300 Not tainted (6.3.0-rc5-150500.34-default+)
MSR: 8000000000009033 CR: 24842228 XER: 00000000
CFAR: c0000000004c57d0 DAR: c00c000100400038 DSISR: 42000000 IRQMASK: 0
....
NIP [c000000001269d90] __init_single_page.isra.74+0x14/0x4c
LR [c0000000004c57d4] __init_zone_device_page+0x44/0xd0
Call Trace:
[c000000003632ed0] [c000000003632f60] 0xc000000003632f60 (unreliable)
[c000000003632f10] [c0000000004c5ca0] memmap_init_zone_device+0x170/0x250
[c000000003632fe0] [c0000000005575f8] memremap_pages+0x2c8/0x7f0
[c0000000036330c0] [c000000000557b5c] devm_memremap_pages+0x3c/0xa0
[c000000003633100] [c000000000d458a8] dev_dax_probe+0x108/0x3e0
[c0000000036331a0] [c000000000d41430] dax_bus_probe+0xb0/0x140
[c0000000036331d0] [c000000000cef27c] really_probe+0x19c/0x520
[c000000003633260] [c000000000cef6b4] __driver_probe_device+0xb4/0x230
[c0000000036332e0] [c000000000cef888] driver_probe_device+0x58/0x120
[c000000003633320] [c000000000cefa6c] __device_attach_driver+0x11c/0x1e0
[c0000000036333a0] [c000000000cebc58] bus_for_each_drv+0xa8/0x130
[c000000003633400] [c000000000ceefcc] __device_attach+0x15c/0x250
[c0000000036334a0] [c000000000ced458] bus_probe_device+0x108/0x110
[c0000000036334f0] [c000000000ce92dc] device_add+0x7fc/0xa10
[c0000000036335b0] [c000000000d447c8] devm_create_dev_dax+0x1d8/0x530
[c000000003633640] [c000000000d46b60] __dax_pmem_probe+0x200/0x270
[c0000000036337b0] [c000000000d46bf0] dax_pmem_probe+0x20/0x70
[c0000000036337d0] [c000000000d2279c] nvdimm_bus_probe+0xac/0x2b0
[c000000003633860] [c000000000cef27c] really_probe+0x19c/0x520
[c0000000036338f0] [c000000000cef6b4] __driver_probe_device+0xb4/0x230
[c000000003633970] [c000000000cef888] driver_probe_device+0x58/0x120
[c0000000036339b0] [c000000000cefd08] __driver_attach+0x1d8/0x240
[c000000003633a30] [c000000000cebb04] bus_for_each_dev+0xb4/0x130
[c000000003633a90] [c000000000cee564] driver_attach+0x34/0x50
[c000000003633ab0] [c000000000ced878] bus_add_driver+0x218/0x300
[c000000003633b40] [c000000000cf1144] driver_register+0xa4/0x1b0
[c000000003633bb0] [c000000000d21a0c] __nd_driver_register+0x5c/0x100
[c000000003633c10] [c00000000206a2e8] dax_pmem_init+0x34/0x48
[c000000003633c30] [c0000000000132d0] do_one_initcall+0x60/0x320
[c000000003633d00] [c0000000020051b0] kernel_init_freeable+0x360/0x400
[c000000003633de0] [c000000000013764] kernel_init+0x34/0x1d0
[c000000003633e50] [c00000000000de14] ret_from_kernel_thread+0x5c/0x64

Fixes: c4386bd8ee3a ("mm/memremap: add ZONE_DEVICE support for compound pages")
Cc: Joao Martins
Cc: Muchun Song
Cc: Dan Williams
Reported-by: Tarun Sahu
Signed-off-by: Aneesh Kumar K.V
Reviewed-by: Dan Williams
---
 arch/arm64/Kconfig     |  1 +
 arch/loongarch/Kconfig |  1 +
 arch/s390/Kconfig      |  1 +
 arch/x86/Kconfig       |  1 +
 drivers/dax/device.c   |  3 ++-
 include/linux/mm.h     | 16 ++++++++++++++++
 mm/Kconfig             |  3 +++
 mm/sparse-vmemmap.c    |  3 +--
 8 files changed, 26 insertions(+), 3 deletions(-)

diff --git a/arch/arm64/Kconfig b/arch/arm64/Kconfig
index 27b2592698b0..d3f5945f0aff 100644
--- a/arch/arm64/Kconfig
+++ b/arch/arm64/Kconfig
@@ -103,6 +103,7 @@ config ARM64
 	select ARCH_WANT_HUGETLB_PAGE_OPTIMIZE_VMEMMAP
 	select ARCH_WANT_LD_ORPHAN_WARN
 	select ARCH_WANTS_NO_INSTR
+	select ARCH_WANT_OPTIMIZE_VMEMMAP
 	select ARCH_WANTS_THP_SWAP if ARM64_4K_PAGES
 	select ARCH_HAS_UBSAN_SANITIZE_ALL
 	select ARM_AMBA
diff --git a/arch/loongarch/Kconfig b/arch/loongarch/Kconfig
index 9cc8b84f7eb0..ce5802066d0e 100644
--- a/arch/loongarch/Kconfig
+++ b/arch/loongarch/Kconfig
@@ -56,6 +56,7 @@ config LOONGARCH
 	select ARCH_WANT_HUGETLB_PAGE_OPTIMIZE_VMEMMAP
 	select ARCH_WANT_LD_ORPHAN_WARN
 	select ARCH_WANTS_NO_INSTR
+	select ARCH_WANT_OPTIMIZE_VMEMMAP
 	select BUILDTIME_TABLE_SORT
 	select COMMON_CLK
 	select CPU_PM
diff --git a/arch/s390/Kconfig b/arch/s390/Kconfig
index 933771b0b07a..abffccd937b2 100644
--- a/arch/s390/Kconfig
+++ b/arch/s390/Kconfig
@@ -127,6 +127,7 @@ config S390
 	select ARCH_WANT_DEFAULT_BPF_JIT
 	select ARCH_WANT_IPC_PARSE_VERSION
 	select ARCH_WANT_HUGETLB_PAGE_OPTIMIZE_VMEMMAP
+	select ARCH_WANT_OPTIMIZE_VMEMMAP
 	select BUILDTIME_TABLE_SORT
 	select CLONE_BACKWARDS2
 	select DMA_OPS if PCI
diff --git a/arch/x86/Kconfig b/arch/x86/Kconfig
index a825bf031f49..e8d66d834b4f 100644
--- a/arch/x86/Kconfig
+++ b/arch/x86/Kconfig
@@ -127,6 +127,7 @@ config X86
 	select ARCH_WANT_HUGE_PMD_SHARE
 	select ARCH_WANT_HUGETLB_PAGE_OPTIMIZE_VMEMMAP if X86_64
 	select ARCH_WANT_LD_ORPHAN_WARN
+	select ARCH_WANT_OPTIMIZE_VMEMMAP if X86_64
 	select ARCH_WANTS_THP_SWAP if X86_64
 	select ARCH_HAS_PARANOID_L1D_FLUSH
 	select BUILDTIME_TABLE_SORT
diff --git a/drivers/dax/device.c b/drivers/dax/device.c
index 5494d745ced5..05be8e79d64b 100644
--- a/drivers/dax/device.c
+++ b/drivers/dax/device.c
@@ -448,7 +448,8 @@ int dev_dax_probe(struct dev_dax *dev_dax)
 	}
 
 	pgmap->type = MEMORY_DEVICE_GENERIC;
-	if (dev_dax->align > PAGE_SIZE)
+	if (dev_dax->align > PAGE_SIZE &&
+	    IS_ENABLED(CONFIG_ARCH_WANT_OPTIMIZE_VMEMMAP))
 		pgmap->vmemmap_shift =
 			order_base_2(dev_dax->align >> PAGE_SHIFT);
 	addr = devm_memremap_pages(dev, pgmap);
diff --git a/include/linux/mm.h b/include/linux/mm.h
index 716d30d93616..fb71e21df23d 100644
--- a/include/linux/mm.h
+++ b/include/linux/mm.h
@@ -3442,6 +3442,22 @@ void vmemmap_populate_print_last(void);
 void vmemmap_free(unsigned long start, unsigned long end,
 		struct vmem_altmap *altmap);
 #endif
+
+#ifdef CONFIG_ARCH_WANT_OPTIMIZE_VMEMMAP
+static inline bool vmemmap_can_optimize(struct vmem_altmap *altmap,
+					   struct dev_pagemap *pgmap)
+{
+	return is_power_of_2(sizeof(struct page)) &&
+		pgmap && (pgmap_vmemmap_nr(pgmap) > 1) && !altmap;
+}
+#else
+static inline bool vmemmap_can_optimize(struct vmem_altmap *altmap,
+					   struct dev_pagemap *pgmap)
+{
+	return false;
+}
+#endif
+
 void register_page_bootmem_memmap(unsigned long section_nr, struct page *map,
 				  unsigned long nr_pages);
 
diff --git a/mm/Kconfig b/mm/Kconfig
index ff7b209dec05..99f87c1be1e8 100644
--- a/mm/Kconfig
+++ b/mm/Kconfig
@@ -461,6 +461,9 @@ config SPARSEMEM_VMEMMAP
 	  pfn_to_page and page_to_pfn operations.  This is the most
 	  efficient option when sufficient kernel resources are available.
 
+config ARCH_WANT_OPTIMIZE_VMEMMAP
+	bool
+
 config HAVE_MEMBLOCK_PHYS_MAP
 	bool
 
diff --git a/mm/sparse-vmemmap.c b/mm/sparse-vmemmap.c
index c5398a5960d0..10d73a0dfcec 100644
--- a/mm/sparse-vmemmap.c
+++ b/mm/sparse-vmemmap.c
@@ -458,8 +458,7 @@ struct page * __meminit __populate_section_memmap(unsigned long pfn,
 		!IS_ALIGNED(nr_pages, PAGES_PER_SUBSECTION)))
 		return NULL;
 
-	if (is_power_of_2(sizeof(struct page)) &&
-	    pgmap && pgmap_vmemmap_nr(pgmap) > 1 && !altmap)
+	if (vmemmap_can_optimize(altmap, pgmap))
 		r = vmemmap_populate_compound_pages(pfn, start, end, nid, pgmap);
 	else
 		r = vmemmap_populate(start, end, nid, altmap);
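
Illustration only, not part of the patch: a minimal userspace sketch of how
the two gates in this change interact, assuming 64K pages as in the ppc64
report. Every model_* name is a hypothetical stand-in for the kernel symbol
it mirrors, and the arch_wants_optimize flag stands in for
CONFIG_ARCH_WANT_OPTIMIZE_VMEMMAP, which the kernel resolves at build time
rather than at run time.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins for PAGE_SHIFT/PAGE_SIZE; 64K pages assumed. */
#define MODEL_PAGE_SHIFT 16UL
#define MODEL_PAGE_SIZE  (1UL << MODEL_PAGE_SHIFT)

/* Reduced model of struct dev_pagemap: only the field this patch touches. */
struct model_pgmap {
	unsigned long vmemmap_shift;
};

/* Smallest order with (1 << order) >= n; models order_base_2(). */
static unsigned long model_order_base_2(unsigned long n)
{
	unsigned long order = 0;

	while (order < sizeof(n) * 8 - 1 && (1UL << order) < n)
		order++;
	return order;
}

static bool model_is_power_of_2(unsigned long n)
{
	return n != 0 && (n & (n - 1)) == 0;
}

/* Models pgmap_vmemmap_nr(): base pages covered by one compound page. */
static unsigned long model_pgmap_vmemmap_nr(const struct model_pgmap *pgmap)
{
	return 1UL << pgmap->vmemmap_shift;
}

/* Models the dev_dax_probe() hunk: the shift is set only if the arch opted in. */
static void model_set_vmemmap_shift(struct model_pgmap *pgmap,
				    unsigned long align,
				    bool arch_wants_optimize)
{
	assert(pgmap != NULL);
	pgmap->vmemmap_shift = 0;
	if (align > MODEL_PAGE_SIZE && arch_wants_optimize)
		pgmap->vmemmap_shift =
			model_order_base_2(align >> MODEL_PAGE_SHIFT);
}

/* Models vmemmap_can_optimize(): always false when the arch did not opt in. */
static bool model_vmemmap_can_optimize(bool arch_wants_optimize,
				       const void *altmap,
				       const struct model_pgmap *pgmap,
				       size_t struct_page_size)
{
	if (!arch_wants_optimize)
		return false;
	return model_is_power_of_2(struct_page_size) &&
	       pgmap && model_pgmap_vmemmap_nr(pgmap) > 1 && !altmap;
}
```

With a 2M alignment and 64K pages, an opted-in architecture gets
vmemmap_shift = order_base_2(32) = 5 and takes the compound-page populate
path. An architecture that did not opt in keeps vmemmap_shift at 0, so
either gate alone is enough to fall back to vmemmap_populate().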