From patchwork Wed Jan 25 20:23:46 2023
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Dan Williams
X-Patchwork-Id: 13116092
Subject: [PATCH v2] nvdimm: Support sizeof(struct page) > MAX_STRUCT_PAGE_SIZE
From: Dan Williams
To: nvdimm@lists.linux.dev
Cc: stable@vger.kernel.org, Alexander Potapenko, Marco Elver, Jeff Moyer,
 linux-mm@kvack.org, kasan-dev@googlegroups.com, linux-arch@vger.kernel.org,
 linux-kernel@vger.kernel.org, gregkh@linuxfoundation.org
Date: Wed, 25 Jan 2023 12:23:46 -0800
Message-ID: <167467815773.463042.7022545814443036382.stgit@dwillia2-xfh.jf.intel.com>
User-Agent: StGit/0.18-3-g996c

Commit 6e9f05dc66f9 ("libnvdimm/pfn_dev: increase MAX_STRUCT_PAGE_SIZE")

...updated MAX_STRUCT_PAGE_SIZE to account for sizeof(struct page)
potentially doubling in the case of CONFIG_KMSAN=y. Unfortunately this
doubles the amount of capacity stolen from user addressable capacity for
everyone, regardless of whether they are using the debug option. Revert
that change, mandate that MAX_STRUCT_PAGE_SIZE never exceed 64, but add
a compile option that lets debug scenarios proceed with creating
debug-sized page maps.

Note that this only applies to cases where the page map is permanent,
i.e. stored in a reservation of the pmem itself ("--map=dev" in "ndctl
create-namespace" terms). For the "--map=mem" case, since the allocation
is ephemeral for the lifespan of the namespace, there is no explicit
restriction. However, the implicit restriction of having enough
available "System RAM" to store the page map for the typically large
pmem still applies.

Fixes: 6e9f05dc66f9 ("libnvdimm/pfn_dev: increase MAX_STRUCT_PAGE_SIZE")
Cc:
Cc: Alexander Potapenko
Cc: Marco Elver
Reported-by: Jeff Moyer
Acked-by: Yu Zhao
Acked-by: Alexander Potapenko
---
Changes since v1 [1]:
* Replace the module option with a compile option and a description of
  the tradeoffs to consider when running with KMSAN enabled in the
  presence of NVDIMM namespaces and their local reservation of capacity
  for a 'struct page' memmap array.
  (Greg)

[1]: https://lore.kernel.org/all/63bc8fec4744a_5178e29467@dwillia2-xfh.jf.intel.com.notmuch/

 drivers/nvdimm/Kconfig    |   19 +++++++++++++++++++
 drivers/nvdimm/nd.h       |    2 +-
 drivers/nvdimm/pfn_devs.c |   42 +++++++++++++++++++++++++++---------------
 3 files changed, 47 insertions(+), 16 deletions(-)

diff --git a/drivers/nvdimm/Kconfig b/drivers/nvdimm/Kconfig
index 79d93126453d..77b06d54cc62 100644
--- a/drivers/nvdimm/Kconfig
+++ b/drivers/nvdimm/Kconfig
@@ -102,6 +102,25 @@ config NVDIMM_KEYS
 	depends on ENCRYPTED_KEYS
 	depends on (LIBNVDIMM=ENCRYPTED_KEYS) || LIBNVDIMM=m
 
+config NVDIMM_KMSAN
+	bool
+	depends on KMSAN
+	help
+	  KMSAN, and other memory debug facilities, increase the size of
+	  'struct page' to contain extra metadata. This collides with
+	  the NVDIMM capability to store a potentially
+	  larger-than-"System RAM" size 'struct page' array in a
+	  reservation of persistent memory rather than limited /
+	  precious DRAM. However, that reservation needs to persist for
+	  the life of the given NVDIMM namespace. If you are using KMSAN
+	  to debug an issue unrelated to NVDIMMs or DAX then say N to this
+	  option. Otherwise, say Y but understand that any namespaces
+	  (with the page array stored pmem) created with this build of
+	  the kernel will permanently reserve and strand excess
+	  capacity compared to the CONFIG_KMSAN=n case.
+
+	  Select N if unsure.
+
 config NVDIMM_TEST_BUILD
 	tristate "Build the unit test core"
 	depends on m
diff --git a/drivers/nvdimm/nd.h b/drivers/nvdimm/nd.h
index 85ca5b4da3cf..ec5219680092 100644
--- a/drivers/nvdimm/nd.h
+++ b/drivers/nvdimm/nd.h
@@ -652,7 +652,7 @@ void devm_namespace_disable(struct device *dev,
 		struct nd_namespace_common *ndns);
 #if IS_ENABLED(CONFIG_ND_CLAIM)
 /* max struct page size independent of kernel config */
-#define MAX_STRUCT_PAGE_SIZE 128
+#define MAX_STRUCT_PAGE_SIZE 64
 int nvdimm_setup_pfn(struct nd_pfn *nd_pfn, struct dev_pagemap *pgmap);
 #else
 static inline int nvdimm_setup_pfn(struct nd_pfn *nd_pfn,
diff --git a/drivers/nvdimm/pfn_devs.c b/drivers/nvdimm/pfn_devs.c
index 61af072ac98f..c7655a1fe38c 100644
--- a/drivers/nvdimm/pfn_devs.c
+++ b/drivers/nvdimm/pfn_devs.c
@@ -13,6 +13,8 @@
 #include "pfn.h"
 #include "nd.h"
 
+const static bool page_struct_override = IS_ENABLED(CONFIG_NVDIMM_KMSAN);
+
 static void nd_pfn_release(struct device *dev)
 {
 	struct nd_region *nd_region = to_nd_region(dev->parent);
@@ -758,12 +760,6 @@ static int nd_pfn_init(struct nd_pfn *nd_pfn)
 		return -ENXIO;
 	}
 
-	/*
-	 * Note, we use 64 here for the standard size of struct page,
-	 * debugging options may cause it to be larger in which case the
-	 * implementation will limit the pfns advertised through
-	 * ->direct_access() to those that are included in the memmap.
-	 */
 	start = nsio->res.start;
 	size = resource_size(&nsio->res);
 	npfns = PHYS_PFN(size - SZ_8K);
@@ -782,20 +778,33 @@ static int nd_pfn_init(struct nd_pfn *nd_pfn)
 	}
 	end_trunc = start + size - ALIGN_DOWN(start + size, align);
 	if (nd_pfn->mode == PFN_MODE_PMEM) {
+		unsigned long page_map_size = MAX_STRUCT_PAGE_SIZE * npfns;
+
 		/*
 		 * The altmap should be padded out to the block size used
 		 * when populating the vmemmap. This *should* be equal to
 		 * PMD_SIZE for most architectures.
 		 *
-		 * Also make sure size of struct page is less than 128. We
-		 * want to make sure we use large enough size here so that
-		 * we don't have a dynamic reserve space depending on
-		 * struct page size. But we also want to make sure we notice
-		 * when we end up adding new elements to struct page.
+		 * Also make sure size of struct page is less than
+		 * MAX_STRUCT_PAGE_SIZE. The goal here is compatibility in the
+		 * face of production kernel configurations that reduce the
+		 * 'struct page' size below MAX_STRUCT_PAGE_SIZE. For debug
+		 * kernel configurations that increase the 'struct page' size
+		 * above MAX_STRUCT_PAGE_SIZE, the page_struct_override allows
+		 * for continuing with the capacity that will be wasted when
+		 * reverting to a production kernel configuration. Otherwise,
+		 * those configurations are blocked by default.
 		 */
-		BUILD_BUG_ON(sizeof(struct page) > MAX_STRUCT_PAGE_SIZE);
-		offset = ALIGN(start + SZ_8K + MAX_STRUCT_PAGE_SIZE * npfns, align)
-			- start;
+		if (sizeof(struct page) > MAX_STRUCT_PAGE_SIZE) {
+			if (page_struct_override)
+				page_map_size = sizeof(struct page) * npfns;
+			else {
+				dev_err(&nd_pfn->dev,
+					"Memory debug options prevent using pmem for the page map\n");
+				return -EINVAL;
+			}
+		}
+		offset = ALIGN(start + SZ_8K + page_map_size, align) - start;
 	} else if (nd_pfn->mode == PFN_MODE_RAM)
 		offset = ALIGN(start + SZ_8K, align) - start;
 	else
@@ -818,7 +827,10 @@ static int nd_pfn_init(struct nd_pfn *nd_pfn)
 	pfn_sb->version_minor = cpu_to_le16(4);
 	pfn_sb->end_trunc = cpu_to_le32(end_trunc);
 	pfn_sb->align = cpu_to_le32(nd_pfn->align);
-	pfn_sb->page_struct_size = cpu_to_le16(MAX_STRUCT_PAGE_SIZE);
+	if (sizeof(struct page) > MAX_STRUCT_PAGE_SIZE && page_struct_override)
+		pfn_sb->page_struct_size = cpu_to_le16(sizeof(struct page));
+	else
+		pfn_sb->page_struct_size = cpu_to_le16(MAX_STRUCT_PAGE_SIZE);
 	pfn_sb->page_size = cpu_to_le32(PAGE_SIZE);
 	checksum = nd_sb_checksum((struct nd_gen_sb *) pfn_sb);
 	pfn_sb->checksum = cpu_to_le64(checksum);