From patchwork Sat Aug 1 03:26:01 2020
X-Patchwork-Submitter: Dan Williams
X-Patchwork-Id: 11695783
Subject: [PATCH v3 11/23] device-dax: Kill dax_kmem_res
From: Dan Williams
To: akpm@linux-foundation.org
Cc: David Hildenbrand, Vishal Verma, Dave Hansen, Pavel Tatashin, peterz@infradead.org, ard.biesheuvel@linaro.org, vishal.l.verma@intel.com, linux-mm@kvack.org, linux-nvdimm@lists.01.org, joao.m.martins@oracle.com, linux-kernel@vger.kernel.org, linux-acpi@vger.kernel.org, dri-devel@lists.freedesktop.org
Date: Fri, 31 Jul 2020 20:26:01 -0700
Message-ID: <159625236129.3040297.8607704947114784109.stgit@dwillia2-desk3.amr.corp.intel.com>
In-Reply-To: <159625229779.3040297.11363509688097221416.stgit@dwillia2-desk3.amr.corp.intel.com>

Several related issues around this unneeded attribute:

- The dax_kmem_res property allows the kmem driver to stash the adjusted
  resource range that was used for the hotplug operation, but that range
  can be recalculated from the original base range.

- kmem is using an open-coded release_resource() + kfree() when an
  idiomatic release_mem_region() is sufficient.

- The driver-managed resource need only manage the busy flag. Other flags
  are of no concern to the kmem driver. In fact, if kmem inherits some
  memory range that add_memory_driver_managed() rejects, that is a
  memory-hotplug-core policy that the driver is in no position to
  override.

- The implementation trusts that a failed remove_memory() leaves the
  entire resource range pinned busy. The driver need not make that
  layering-violation assumption; it can simply maintain the busy state
  in its local resource.

- The "Hot-remove not yet implemented." comment is stale since hotremove
  support is now included.
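The first point claims the adjusted hotplug range does not need to be stashed because it can be recomputed from the base range. A hedged userspace sketch of that claim follows; it is not kernel code. SKETCH_ALIGN, SKETCH_ALIGN_DOWN, struct sketch_range, kmem_range_new() and kmem_range_old() are hypothetical stand-ins for the kernel's ALIGN(), ALIGN_DOWN(), struct range, the new dax_kmem_range() and the removed kmem_start/kmem_size arithmetic. The block size is assumed to be a power of two (as memory_block_size_bytes() is), and only ranges that contain at least one whole block are considered, since both calculations wrap around otherwise.

```c
#include <stdint.h>

/* Hypothetical userspace stand-ins for the kernel's ALIGN() and
 * ALIGN_DOWN(); only valid when 'a' is a power of two. */
#define SKETCH_ALIGN(x, a)      (((x) + ((a) - 1)) & ~((uint64_t)(a) - 1))
#define SKETCH_ALIGN_DOWN(x, a) ((x) & ~((uint64_t)(a) - 1))

/* Inclusive [start, end] range, mirroring the kernel's struct range. */
struct sketch_range {
	uint64_t start;
	uint64_t end;
};

/* Post-patch style: align both ends inward to whole blocks. */
static struct sketch_range kmem_range_new(struct sketch_range r,
					  uint64_t block)
{
	struct sketch_range out;

	out.start = SKETCH_ALIGN(r.start, block);
	out.end = SKETCH_ALIGN_DOWN(r.end + 1, block) - 1;
	return out;
}

/* Pre-patch style: align the start up, shrink the size by the amount
 * the start moved, then round the size down to whole blocks. */
static struct sketch_range kmem_range_old(struct sketch_range r,
					  uint64_t block)
{
	uint64_t start = SKETCH_ALIGN(r.start, block);
	uint64_t size = r.end - r.start + 1;	/* like range_len() */
	struct sketch_range out;

	size -= start - r.start;
	size &= ~(block - 1);
	out.start = start;
	out.end = start + size - 1;
	return out;
}
```

For example, both helpers map [0x1234, 0x5678] with a 0x1000-byte block to [0x2000, 0x4fff]. The two agree because the aligned start is itself a block multiple, so rounding the trimmed size down lands on the same block boundary as rounding end + 1 down directly.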
Cc: David Hildenbrand
Cc: Vishal Verma
Cc: Dave Hansen
Cc: Pavel Tatashin
Signed-off-by: Dan Williams
---
 drivers/dax/dax-private.h |    3 -
 drivers/dax/kmem.c        |  123 +++++++++++++++++++++------------------------
 2 files changed, 58 insertions(+), 68 deletions(-)

diff --git a/drivers/dax/dax-private.h b/drivers/dax/dax-private.h
index 6779f683671d..12a2dbc43b40 100644
--- a/drivers/dax/dax-private.h
+++ b/drivers/dax/dax-private.h
@@ -42,8 +42,6 @@ struct dax_region {
  * @dev - device core
  * @pgmap - pgmap for memmap setup / lifetime (driver owned)
  * @range: resource range for the instance
- * @dax_mem_res: physical address range of hotadded DAX memory
- * @dax_mem_name: name for hotadded DAX memory via add_memory_driver_managed()
  */
 struct dev_dax {
 	struct dax_region *region;
@@ -52,7 +50,6 @@ struct dev_dax {
 	struct device dev;
 	struct dev_pagemap *pgmap;
 	struct range range;
-	struct resource *dax_kmem_res;
 };
 
 static inline u64 range_len(struct range *range)
diff --git a/drivers/dax/kmem.c b/drivers/dax/kmem.c
index 5bb133df147d..77e25361fbeb 100644
--- a/drivers/dax/kmem.c
+++ b/drivers/dax/kmem.c
@@ -19,16 +19,24 @@ static const char *kmem_name;
 /* Set if any memory will remain added when the driver will be unloaded. */
 static bool any_hotremove_failed;
 
+static struct range dax_kmem_range(struct dev_dax *dev_dax)
+{
+	struct range range;
+
+	/* memory-block align the hotplug range */
+	range.start = ALIGN(dev_dax->range.start, memory_block_size_bytes());
+	range.end = ALIGN_DOWN(dev_dax->range.end + 1,
+			memory_block_size_bytes()) - 1;
+	return range;
+}
+
 int dev_dax_kmem_probe(struct device *dev)
 {
 	struct dev_dax *dev_dax = to_dev_dax(dev);
-	struct range *range = &dev_dax->range;
-	resource_size_t kmem_start;
-	resource_size_t kmem_size;
-	resource_size_t kmem_end;
-	struct resource *new_res;
-	const char *new_res_name;
-	int numa_node;
+	struct range range = dax_kmem_range(dev_dax);
+	int numa_node = dev_dax->target_node;
+	struct resource *res;
+	char *res_name;
 	int rc;
 
 	/*
@@ -37,109 +45,94 @@ int dev_dax_kmem_probe(struct device *dev)
 	 * could be mixed in a node with faster memory, causing
 	 * unavoidable performance issues.
 	 */
-	numa_node = dev_dax->target_node;
 	if (numa_node < 0) {
 		dev_warn(dev, "rejecting DAX region with invalid node: %d\n",
 				numa_node);
 		return -EINVAL;
 	}
 
-	/* Hotplug starting at the beginning of the next block: */
-	kmem_start = ALIGN(range->start, memory_block_size_bytes());
-
-	kmem_size = range_len(range);
-	/* Adjust the size down to compensate for moving up kmem_start: */
-	kmem_size -= kmem_start - range->start;
-	/* Align the size down to cover only complete blocks: */
-	kmem_size &= ~(memory_block_size_bytes() - 1);
-	kmem_end = kmem_start + kmem_size;
-
-	new_res_name = kstrdup(dev_name(dev), GFP_KERNEL);
-	if (!new_res_name)
+	res_name = kstrdup(dev_name(dev), GFP_KERNEL);
+	if (!res_name)
 		return -ENOMEM;
 
-	/* Region is permanently reserved if hotremove fails. */
-	new_res = request_mem_region(kmem_start, kmem_size, new_res_name);
-	if (!new_res) {
-		dev_warn(dev, "could not reserve region [%pa-%pa]\n",
-			 &kmem_start, &kmem_end);
-		kfree(new_res_name);
+	res = request_mem_region(range.start, range_len(&range), res_name);
+	if (!res) {
+		dev_warn(dev, "could not reserve region [%#llx-%#llx]\n",
+				range.start, range.end);
+		kfree(res_name);
 		return -EBUSY;
 	}
 
 	/*
-	 * Set flags appropriate for System RAM.  Leave ..._BUSY clear
-	 * so that add_memory() can add a child resource.  Do not
-	 * inherit flags from the parent since it may set new flags
-	 * unknown to us that will break add_memory() below.
+	 * Temporarily clear busy to allow add_memory_driver_managed()
+	 * to claim it.
 	 */
-	new_res->flags = IORESOURCE_SYSTEM_RAM;
+	res->flags &= ~IORESOURCE_BUSY;
 
 	/*
 	 * Ensure that future kexec'd kernels will not treat this as RAM
 	 * automatically.
 	 */
-	rc = add_memory_driver_managed(numa_node, new_res->start,
-				       resource_size(new_res), kmem_name);
+	rc = add_memory_driver_managed(numa_node, res->start,
+			resource_size(res), kmem_name);
+
+	res->flags |= IORESOURCE_BUSY;
 	if (rc) {
-		release_resource(new_res);
-		kfree(new_res);
-		kfree(new_res_name);
+		release_mem_region(range.start, range_len(&range));
+		kfree(res_name);
 		return rc;
 	}
-	dev_dax->dax_kmem_res = new_res;
+
+	dev_set_drvdata(dev, res_name);
 
 	return 0;
 }
 
 #ifdef CONFIG_MEMORY_HOTREMOVE
-static int dev_dax_kmem_remove(struct device *dev)
+static void dax_kmem_release(struct dev_dax *dev_dax)
 {
-	struct dev_dax *dev_dax = to_dev_dax(dev);
-	struct resource *res = dev_dax->dax_kmem_res;
-	resource_size_t kmem_start = res->start;
-	resource_size_t kmem_size = resource_size(res);
-	const char *res_name = res->name;
 	int rc;
+	struct device *dev = &dev_dax->dev;
+	const char *res_name = dev_get_drvdata(dev);
+	struct range range = dax_kmem_range(dev_dax);
 
 	/*
 	 * We have one shot for removing memory, if some memory blocks were not
 	 * offline prior to calling this function remove_memory() will fail, and
 	 * there is no way to hotremove this memory until reboot because device
-	 * unbind will succeed even if we return failure.
+	 * unbind will proceed regardless of the remove_memory result.
 	 */
-	rc = remove_memory(dev_dax->target_node, kmem_start, kmem_size);
-	if (rc) {
-		any_hotremove_failed = true;
-		dev_err(dev,
-			"DAX region %pR cannot be hotremoved until the next reboot\n",
-			res);
-		return rc;
+	rc = remove_memory(dev_dax->target_node, range.start, range_len(&range));
+	if (rc == 0) {
+		release_mem_region(range.start, range_len(&range));
+		dev_set_drvdata(dev, NULL);
+		kfree(res_name);
+		return;
 	}
 
-	/* Release and free dax resources */
-	release_resource(res);
-	kfree(res);
-	kfree(res_name);
-	dev_dax->dax_kmem_res = NULL;
-
-	return 0;
+	any_hotremove_failed = true;
+	dev_err(dev, "%#llx-%#llx cannot be hotremoved until the next reboot\n",
+			range.start, range.end);
 }
 #else
-static int dev_dax_kmem_remove(struct device *dev)
+static void dax_kmem_release(struct dev_dax *dev_dax)
 {
 	/*
-	 * Without hotremove purposely leak the request_mem_region() for the
-	 * device-dax range and return '0' to ->remove() attempts. The removal
-	 * of the device from the driver always succeeds, but the region is
-	 * permanently pinned as reserved by the unreleased
-	 * request_mem_region().
+	 * Without hotremove purposely leak the request_mem_region() for
+	 * the device-dax range attempts. The removal of the device from
+	 * the driver always succeeds, but the region is permanently
+	 * pinned as reserved by the unreleased request_mem_region().
 	 */
 	any_hotremove_failed = true;
-	return 0;
 }
 #endif /* CONFIG_MEMORY_HOTREMOVE */
 
+static int dev_dax_kmem_remove(struct device *dev)
+{
+	dax_kmem_release(to_dev_dax(dev));
+	return 0;
+}
+
 static struct dax_device_driver device_dax_kmem_driver = {
 	.drv = {
 		.probe = dev_dax_kmem_probe,