From patchwork Tue Apr 11 14:22:13 2023
X-Patchwork-Submitter: "Aneesh Kumar K.V"
X-Patchwork-Id: 13207651
From: "Aneesh Kumar K.V"
To: linux-mm@kvack.org, akpm@linux-foundation.org
Cc: "Aneesh Kumar K.V", Joao Martins, Muchun Song, Dan Williams, Tarun Sahu
Subject: [PATCH v2-updated 1/2] mm/vmemmap/devdax: Fix kernel crash when probing devdax devices
Date: Tue, 11 Apr 2023 19:52:13 +0530
Message-Id: <20230411142214.64464-1-aneesh.kumar@linux.ibm.com>

commit 4917f55b4ef9 ("mm/sparse-vmemmap: improve memory savings for compound
devmaps") added support for using optimized vmemmap for devdax devices. But how
vmemmap mappings are created is architecture specific.
For example, powerpc with hash translation doesn't have vmemmap mappings in
the init_mm page table; instead they are bolted table entries in the hardware
page table.

vmemmap_populate_compound_pages(), used by the vmemmap optimization code, is
not aware of these architecture-specific mappings. Hence allow architectures
to opt in to this feature. I selected architectures supporting the
HUGETLB_PAGE_OPTIMIZE_VMEMMAP option as also supporting this feature.

This patch fixes the below crash on ppc64.

BUG: Unable to handle kernel data access on write at 0xc00c000100400038
Faulting instruction address: 0xc000000001269d90
Oops: Kernel access of bad area, sig: 11 [#1]
LE PAGE_SIZE=64K MMU=Hash SMP NR_CPUS=2048 NUMA pSeries
Modules linked in:
CPU: 7 PID: 1 Comm: swapper/0 Not tainted 6.3.0-rc5-150500.34-default+ #2 5c90a668b6bbd142599890245c2fb5de19d7d28a
Hardware name: IBM,9009-42G POWER9 (raw) 0x4e0202 0xf000005 of:IBM,FW950.40 (VL950_099) hv:phyp pSeries
NIP: c000000001269d90 LR: c0000000004c57d4 CTR: 0000000000000000
REGS: c000000003632c30 TRAP: 0300 Not tainted (6.3.0-rc5-150500.34-default+)
MSR: 8000000000009033 CR: 24842228 XER: 00000000
CFAR: c0000000004c57d0 DAR: c00c000100400038 DSISR: 42000000 IRQMASK: 0
....
NIP [c000000001269d90] __init_single_page.isra.74+0x14/0x4c
LR [c0000000004c57d4] __init_zone_device_page+0x44/0xd0
Call Trace:
[c000000003632ed0] [c000000003632f60] 0xc000000003632f60 (unreliable)
[c000000003632f10] [c0000000004c5ca0] memmap_init_zone_device+0x170/0x250
[c000000003632fe0] [c0000000005575f8] memremap_pages+0x2c8/0x7f0
[c0000000036330c0] [c000000000557b5c] devm_memremap_pages+0x3c/0xa0
[c000000003633100] [c000000000d458a8] dev_dax_probe+0x108/0x3e0
[c0000000036331a0] [c000000000d41430] dax_bus_probe+0xb0/0x140
[c0000000036331d0] [c000000000cef27c] really_probe+0x19c/0x520
[c000000003633260] [c000000000cef6b4] __driver_probe_device+0xb4/0x230
[c0000000036332e0] [c000000000cef888] driver_probe_device+0x58/0x120
[c000000003633320] [c000000000cefa6c] __device_attach_driver+0x11c/0x1e0
[c0000000036333a0] [c000000000cebc58] bus_for_each_drv+0xa8/0x130
[c000000003633400] [c000000000ceefcc] __device_attach+0x15c/0x250
[c0000000036334a0] [c000000000ced458] bus_probe_device+0x108/0x110
[c0000000036334f0] [c000000000ce92dc] device_add+0x7fc/0xa10
[c0000000036335b0] [c000000000d447c8] devm_create_dev_dax+0x1d8/0x530
[c000000003633640] [c000000000d46b60] __dax_pmem_probe+0x200/0x270
[c0000000036337b0] [c000000000d46bf0] dax_pmem_probe+0x20/0x70
[c0000000036337d0] [c000000000d2279c] nvdimm_bus_probe+0xac/0x2b0
[c000000003633860] [c000000000cef27c] really_probe+0x19c/0x520
[c0000000036338f0] [c000000000cef6b4] __driver_probe_device+0xb4/0x230
[c000000003633970] [c000000000cef888] driver_probe_device+0x58/0x120
[c0000000036339b0] [c000000000cefd08] __driver_attach+0x1d8/0x240
[c000000003633a30] [c000000000cebb04] bus_for_each_dev+0xb4/0x130
[c000000003633a90] [c000000000cee564] driver_attach+0x34/0x50
[c000000003633ab0] [c000000000ced878] bus_add_driver+0x218/0x300
[c000000003633b40] [c000000000cf1144] driver_register+0xa4/0x1b0
[c000000003633bb0] [c000000000d21a0c] __nd_driver_register+0x5c/0x100
[c000000003633c10] [c00000000206a2e8] dax_pmem_init+0x34/0x48
[c000000003633c30] [c0000000000132d0] do_one_initcall+0x60/0x320
[c000000003633d00] [c0000000020051b0] kernel_init_freeable+0x360/0x400
[c000000003633de0] [c000000000013764] kernel_init+0x34/0x1d0
[c000000003633e50] [c00000000000de14] ret_from_kernel_thread+0x5c/0x64

Fixes: 4917f55b4ef9 ("mm/sparse-vmemmap: improve memory savings for compound devmaps")
Cc: Joao Martins
Cc: Muchun Song
Cc: Dan Williams
Reported-by: Tarun Sahu
Signed-off-by: Aneesh Kumar K.V
---
Changes from V1:
* Only disable memory saving part of compound devmaps
* Update the correct Fixes: commit
* Add patch to drop HUGETLB specific kconfig

 include/linux/mm.h  | 16 ++++++++++++++++
 mm/page_alloc.c     |  9 ++++++---
 mm/sparse-vmemmap.c |  3 +--
 3 files changed, 23 insertions(+), 5 deletions(-)

diff --git a/include/linux/mm.h b/include/linux/mm.h
index 716d30d93616..08799ad0cf0f 100644
--- a/include/linux/mm.h
+++ b/include/linux/mm.h
@@ -3442,6 +3442,22 @@ void vmemmap_populate_print_last(void);
 void vmemmap_free(unsigned long start, unsigned long end,
 		struct vmem_altmap *altmap);
 #endif
+
+#ifdef CONFIG_ARCH_WANT_HUGETLB_PAGE_OPTIMIZE_VMEMMAP
+static inline bool vmemmap_can_optimize(struct vmem_altmap *altmap,
+					   struct dev_pagemap *pgmap)
+{
+	return is_power_of_2(sizeof(struct page)) &&
+		pgmap && (pgmap_vmemmap_nr(pgmap) > 1) && !altmap;
+}
+#else
+static inline bool vmemmap_can_optimize(struct vmem_altmap *altmap,
+					   struct dev_pagemap *pgmap)
+{
+	return false;
+}
+#endif
+
 void register_page_bootmem_memmap(unsigned long section_nr, struct page *map,
 				  unsigned long nr_pages);

diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index 3bb3484563ed..292411d8816f 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -6844,10 +6844,13 @@ static void __ref __init_zone_device_page(struct page *page, unsigned long pfn,
  * of an altmap. See vmemmap_populate_compound_pages().
  */
 static inline unsigned long compound_nr_pages(struct vmem_altmap *altmap,
+					      struct dev_pagemap *pgmap,
 					      unsigned long nr_pages)
 {
-	return is_power_of_2(sizeof(struct page)) &&
-		!altmap ? 2 * (PAGE_SIZE / sizeof(struct page)) : nr_pages;
+	if (vmemmap_can_optimize(altmap, pgmap))
+		return 2 * (PAGE_SIZE / sizeof(struct page));
+	else
+		return nr_pages;
 }

 static void __ref memmap_init_compound(struct page *head,
@@ -6912,7 +6915,7 @@ void __ref memmap_init_zone_device(struct zone *zone,
 			continue;

 		memmap_init_compound(page, pfn, zone_idx, nid, pgmap,
-				     compound_nr_pages(altmap, pfns_per_compound));
+				     compound_nr_pages(altmap, pgmap, pfns_per_compound));
 	}

 	pr_info("%s initialised %lu pages in %ums\n", __func__,
diff --git a/mm/sparse-vmemmap.c b/mm/sparse-vmemmap.c
index c5398a5960d0..10d73a0dfcec 100644
--- a/mm/sparse-vmemmap.c
+++ b/mm/sparse-vmemmap.c
@@ -458,8 +458,7 @@ struct page * __meminit __populate_section_memmap(unsigned long pfn,
 		!IS_ALIGNED(nr_pages, PAGES_PER_SUBSECTION)))
 		return NULL;

-	if (is_power_of_2(sizeof(struct page)) &&
-	    pgmap && pgmap_vmemmap_nr(pgmap) > 1 && !altmap)
+	if (vmemmap_can_optimize(altmap, pgmap))
 		r = vmemmap_populate_compound_pages(pfn, start, end, nid, pgmap);
 	else
 		r = vmemmap_populate(start, end, nid, altmap);
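
Illustration only, not part of the patch: a minimal userspace sketch of the
decision that vmemmap_can_optimize() and compound_nr_pages() make after this
change. All model_* names are hypothetical. The architecture opt-in is modelled
as a runtime flag rather than the CONFIG_ARCH_WANT_HUGETLB_PAGE_OPTIMIZE_VMEMMAP
#ifdef, page size and struct page size are passed in as plain numbers, and
model_pgmap_vmemmap_nr() assumes the compound size is 1 << vmemmap_shift.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for struct dev_pagemap; only the field used here. */
struct model_pgmap {
	unsigned long vmemmap_shift;	/* assumed: compound size is 1 << shift pages */
};

/* True when exactly one bit is set. */
static bool model_is_power_of_2(unsigned long n)
{
	return n != 0 && (n & (n - 1)) == 0;
}

/* Assumed model of pgmap_vmemmap_nr(): pages per compound devmap page. */
static unsigned long model_pgmap_vmemmap_nr(const struct model_pgmap *pgmap)
{
	assert(pgmap != NULL);
	return 1UL << pgmap->vmemmap_shift;
}

/*
 * arch_opt_in models the Kconfig symbol as a runtime flag; in the patch
 * the !opt-in case is a separate stub that always returns false.
 */
static bool model_vmemmap_can_optimize(bool arch_opt_in, const void *altmap,
				       const struct model_pgmap *pgmap,
				       size_t struct_page_size)
{
	if (!arch_opt_in)
		return false;
	return model_is_power_of_2(struct_page_size) &&
		pgmap && model_pgmap_vmemmap_nr(pgmap) > 1 && !altmap;
}

/* Number of struct pages to initialise for one compound page. */
static unsigned long model_compound_nr_pages(bool arch_opt_in,
					     const void *altmap,
					     const struct model_pgmap *pgmap,
					     unsigned long page_size,
					     size_t struct_page_size,
					     unsigned long nr_pages)
{
	if (model_vmemmap_can_optimize(arch_opt_in, altmap, pgmap,
				       struct_page_size))
		return 2 * (page_size / struct_page_size);
	return nr_pages;
}
```

With the illustrative values of a 4096-byte page, a 64-byte struct page and a
512-page compound (vmemmap_shift = 9), the model returns 2 * (4096 / 64) = 128
when the architecture opts in and no altmap is used, and falls back to 512 when
the architecture does not opt in, when an altmap is present, or when pgmap is
NULL.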