From patchwork Mon Mar 23 23:54:36 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Dan Williams X-Patchwork-Id: 11454253 Return-Path: Received: from mail.kernel.org (pdx-korg-mail-1.web.codeaurora.org [172.30.200.123]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id BD1D2913 for ; Tue, 24 Mar 2020 00:10:48 +0000 (UTC) Received: from ml01.01.org (ml01.01.org [198.145.21.10]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by mail.kernel.org (Postfix) with ESMTPS id 7E2AC20788 for ; Tue, 24 Mar 2020 00:10:48 +0000 (UTC) DMARC-Filter: OpenDMARC Filter v1.3.2 mail.kernel.org 7E2AC20788 Authentication-Results: mail.kernel.org; dmarc=fail (p=none dis=none) header.from=intel.com Authentication-Results: mail.kernel.org; spf=none smtp.mailfrom=linux-nvdimm-bounces@lists.01.org Received: from ml01.vlan13.01.org (localhost [IPv6:::1]) by ml01.01.org (Postfix) with ESMTP id 3A18610FC388C; Mon, 23 Mar 2020 17:11:38 -0700 (PDT) Received-SPF: Pass (mailfrom) identity=mailfrom; client-ip=134.134.136.65; helo=mga03.intel.com; envelope-from=dan.j.williams@intel.com; receiver= Received: from mga03.intel.com (mga03.intel.com [134.134.136.65]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by ml01.01.org (Postfix) with ESMTPS id 808BB10FC3881 for ; Mon, 23 Mar 2020 17:11:35 -0700 (PDT) IronPort-SDR: w82thZgrLDi/W3H47QolllYVomGmOWMseRs8fymnuoLz2wgESRTloVWhf6eQDxvX7TnwxJ1fCg ZzOkuG5LcTIA== X-Amp-Result: SKIPPED(no attachment in message) X-Amp-File-Uploaded: False Received: from fmsmga006.fm.intel.com ([10.253.24.20]) by orsmga103.jf.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 23 Mar 2020 17:10:44 -0700 IronPort-SDR: FXlZeJCeC8XzZ3K98a5LQIxztMUNO9atLkmlVAgtVKu97Fkyytph/17UqZ0ET8vqCXr+/RFVHV 0IbaGsOux5KA== X-IronPort-AV: E=Sophos;i="5.72,298,1580803200"; d="scan'208";a="447678501" Received: from dwillia2-desk3.jf.intel.com (HELO dwillia2-desk3.amr.corp.intel.com) ([10.54.39.16]) by fmsmga006-auth.fm.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 23 Mar 2020 17:10:43 -0700 Subject: [PATCH 01/12] device-dax: Drop the dax_region.pfn_flags attribute From: Dan Williams To: linux-mm@kvack.org Date: Mon, 23 Mar 2020 16:54:36 -0700 Message-ID: <158500767684.2088294.8691316958585274856.stgit@dwillia2-desk3.amr.corp.intel.com> In-Reply-To: <158500767138.2088294.17131646259803932461.stgit@dwillia2-desk3.amr.corp.intel.com> References: <158500767138.2088294.17131646259803932461.stgit@dwillia2-desk3.amr.corp.intel.com> User-Agent: StGit/0.18-3-g996c MIME-Version: 1.0 Message-ID-Hash: 3MNG75X24U7EIHKLE3NKSSJJMKYXUKR5 X-Message-ID-Hash: 3MNG75X24U7EIHKLE3NKSSJJMKYXUKR5 X-MailFrom: dan.j.williams@intel.com X-Mailman-Rule-Misses: dmarc-mitigation; no-senders; approved; emergency; loop; banned-address; member-moderation; nonmember-moderation; administrivia; implicit-dest; max-recipients; max-size; news-moderation; no-subject; suspicious-header CC: dave.hansen@linux.intel.com, hch@lst.de, linux-nvdimm@lists.01.org, linux-kernel@vger.kernel.org X-Mailman-Version: 3.1.1 Precedence: list List-Id: "Linux-nvdimm developer list." Archived-At: List-Archive: List-Help: List-Post: List-Subscribe: List-Unsubscribe: All callers specify the same flags to alloc_dax_region(), so there is no need to allow for anything other than PFN_DEV|PFN_MAP, or carry a ->pfn_flags around on the region. Device-dax instances are always page backed. Signed-off-by: Dan Williams --- drivers/dax/bus.c | 4 +--- drivers/dax/bus.h | 3 +-- drivers/dax/dax-private.h | 2 -- drivers/dax/device.c | 26 +++----------------------- drivers/dax/hmem/hmem.c | 2 +- drivers/dax/pmem/core.c | 3 +-- 6 files changed, 7 insertions(+), 33 deletions(-) diff --git a/drivers/dax/bus.c b/drivers/dax/bus.c index 46e46047a1f7..5301c294e861 100644 --- a/drivers/dax/bus.c +++ b/drivers/dax/bus.c @@ -226,8 +226,7 @@ static void dax_region_unregister(void *region) } struct dax_region *alloc_dax_region(struct device *parent, int region_id, - struct resource *res, int target_node, unsigned int align, - unsigned long long pfn_flags) + struct resource *res, int target_node, unsigned int align) { struct dax_region *dax_region; @@ -251,7 +250,6 @@ struct dax_region *alloc_dax_region(struct device *parent, int region_id, dev_set_drvdata(parent, dax_region); memcpy(&dax_region->res, res, sizeof(*res)); - dax_region->pfn_flags = pfn_flags; kref_init(&dax_region->kref); dax_region->id = region_id; dax_region->align = align; diff --git a/drivers/dax/bus.h b/drivers/dax/bus.h index 9e4eba67e8b9..55577e9791da 100644 --- a/drivers/dax/bus.h +++ b/drivers/dax/bus.h @@ -10,8 +10,7 @@ struct dax_device; struct dax_region; void dax_region_put(struct dax_region *dax_region); struct dax_region *alloc_dax_region(struct device *parent, int region_id, - struct resource *res, int target_node, unsigned int align, - unsigned long long flags); + struct resource *res, int target_node, unsigned int align); enum dev_dax_subsys { DEV_DAX_BUS, diff --git a/drivers/dax/dax-private.h b/drivers/dax/dax-private.h index 3107ce80e809..28767dc3ec13 100644 --- a/drivers/dax/dax-private.h +++ b/drivers/dax/dax-private.h @@ -23,7 +23,6 @@ void dax_bus_exit(void); * @dev: parent device backing this region * @align: allocation and mapping alignment for child dax devices * @res: physical address range of the region - * @pfn_flags: identify whether the pfns are paged back or not */ struct dax_region { int id; @@ -32,7 +31,6 @@ struct dax_region { struct device *dev; unsigned int align; struct resource res; - unsigned long long pfn_flags; }; /** diff --git a/drivers/dax/device.c b/drivers/dax/device.c index 1af823b2fe6b..eaa651ddeccd 100644 --- a/drivers/dax/device.c +++ b/drivers/dax/device.c @@ -41,14 +41,6 @@ static int check_vma(struct dev_dax *dev_dax, struct vm_area_struct *vma, return -EINVAL; } - if ((dax_region->pfn_flags & (PFN_DEV|PFN_MAP)) == PFN_DEV - && (vma->vm_flags & VM_DONTCOPY) == 0) { - dev_info_ratelimited(dev, - "%s: %s: fail, dax range requires MADV_DONTFORK\n", - current->comm, func); - return -EINVAL; - } - if (!vma_is_dax(vma)) { dev_info_ratelimited(dev, "%s: %s: fail, vma is not DAX capable\n", @@ -102,7 +94,7 @@ static vm_fault_t __dev_dax_pte_fault(struct dev_dax *dev_dax, return VM_FAULT_SIGBUS; } - *pfn = phys_to_pfn_t(phys, dax_region->pfn_flags); + *pfn = phys_to_pfn_t(phys, PFN_DEV|PFN_MAP); return vmf_insert_mixed(vmf->vma, vmf->address, *pfn); } @@ -127,12 +119,6 @@ static vm_fault_t __dev_dax_pmd_fault(struct dev_dax *dev_dax, return VM_FAULT_SIGBUS; } - /* dax pmd mappings require pfn_t_devmap() */ - if ((dax_region->pfn_flags & (PFN_DEV|PFN_MAP)) != (PFN_DEV|PFN_MAP)) { - dev_dbg(dev, "region lacks devmap flags\n"); - return VM_FAULT_SIGBUS; - } - if (fault_size < dax_region->align) return VM_FAULT_SIGBUS; else if (fault_size > dax_region->align) @@ -150,7 +136,7 @@ static vm_fault_t __dev_dax_pmd_fault(struct dev_dax *dev_dax, return VM_FAULT_SIGBUS; } - *pfn = phys_to_pfn_t(phys, dax_region->pfn_flags); + *pfn = phys_to_pfn_t(phys, PFN_DEV|PFN_MAP); return vmf_insert_pfn_pmd(vmf, *pfn, vmf->flags & FAULT_FLAG_WRITE); } @@ -177,12 +163,6 @@ static vm_fault_t __dev_dax_pud_fault(struct dev_dax *dev_dax, return VM_FAULT_SIGBUS; } - /* dax pud mappings require pfn_t_devmap() */ - if ((dax_region->pfn_flags & (PFN_DEV|PFN_MAP)) != (PFN_DEV|PFN_MAP)) { - dev_dbg(dev, "region lacks devmap flags\n"); - return VM_FAULT_SIGBUS; - } - if (fault_size < dax_region->align) return VM_FAULT_SIGBUS; else if (fault_size > dax_region->align) @@ -200,7 +180,7 @@ static vm_fault_t __dev_dax_pud_fault(struct dev_dax *dev_dax, return VM_FAULT_SIGBUS; } - *pfn = phys_to_pfn_t(phys, dax_region->pfn_flags); + *pfn = phys_to_pfn_t(phys, PFN_DEV|PFN_MAP); return vmf_insert_pfn_pud(vmf, *pfn, vmf->flags & FAULT_FLAG_WRITE); } diff --git a/drivers/dax/hmem/hmem.c b/drivers/dax/hmem/hmem.c index 29ceb5795297..506893861253 100644 --- a/drivers/dax/hmem/hmem.c +++ b/drivers/dax/hmem/hmem.c @@ -22,7 +22,7 @@ static int dax_hmem_probe(struct platform_device *pdev) memcpy(&pgmap.res, res, sizeof(*res)); dax_region = alloc_dax_region(dev, pdev->id, res, mri->target_node, - PMD_SIZE, PFN_DEV|PFN_MAP); + PMD_SIZE); if (!dax_region) return -ENOMEM; diff --git a/drivers/dax/pmem/core.c b/drivers/dax/pmem/core.c index 2bedf8414fff..ea52bb77a294 100644 --- a/drivers/dax/pmem/core.c +++ b/drivers/dax/pmem/core.c @@ -53,8 +53,7 @@ struct dev_dax *__dax_pmem_probe(struct device *dev, enum dev_dax_subsys subsys) memcpy(&res, &pgmap.res, sizeof(res)); res.start += offset; dax_region = alloc_dax_region(dev, region_id, &res, - nd_region->target_node, le32_to_cpu(pfn_sb->align), - PFN_DEV|PFN_MAP); + nd_region->target_node, le32_to_cpu(pfn_sb->align)); if (!dax_region) return ERR_PTR(-ENOMEM); From patchwork Mon Mar 23 23:54:43 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Dan Williams X-Patchwork-Id: 11454257 Return-Path: Received: from mail.kernel.org (pdx-korg-mail-1.web.codeaurora.org [172.30.200.123]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id 4CABD14B4 for ; Tue, 24 Mar 2020 00:10:53 +0000 (UTC) Received: from ml01.01.org (ml01.01.org [198.145.21.10]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by mail.kernel.org (Postfix) with ESMTPS id 3509520787 for ; Tue, 24 Mar 2020 00:10:53 +0000 (UTC) DMARC-Filter: OpenDMARC Filter v1.3.2 mail.kernel.org 3509520787 Authentication-Results: mail.kernel.org; dmarc=fail (p=none dis=none) header.from=intel.com Authentication-Results: mail.kernel.org; spf=none smtp.mailfrom=linux-nvdimm-bounces@lists.01.org Received: from ml01.vlan13.01.org (localhost [IPv6:::1]) by ml01.01.org (Postfix) with ESMTP id 5C4F210FC3891; Mon, 23 Mar 2020 17:11:43 -0700 (PDT) Received-SPF: Pass (mailfrom) identity=mailfrom; client-ip=134.134.136.126; helo=mga18.intel.com; envelope-from=dan.j.williams@intel.com; receiver= Received: from mga18.intel.com (mga18.intel.com [134.134.136.126]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by ml01.01.org (Postfix) with ESMTPS id 34A0E10FC3886 for ; Mon, 23 Mar 2020 17:11:40 -0700 (PDT) IronPort-SDR: gIbA5nX93cnhnXr4oQpVli7c70PdSOYJ/D8FTGDBIrKYffCr+/UeSLJeotEnhhMj8f+BDGqb9z qYCgCT2dQMvw== X-Amp-Result: SKIPPED(no attachment in message) X-Amp-File-Uploaded: False Received: from orsmga007.jf.intel.com ([10.7.209.58]) by orsmga106.jf.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 23 Mar 2020 17:10:50 -0700 IronPort-SDR: wrCwhew8c8zRErxRyNieFX0ol7sog5GCQVQ8yk/r/ORZ3TRhUfhxM2/4Uq80SYz12ao5+PuO2Z kWhp5ZSU5Mww== X-IronPort-AV: E=Sophos;i="5.72,298,1580803200"; d="scan'208";a="235382018" Received: from dwillia2-desk3.jf.intel.com (HELO dwillia2-desk3.amr.corp.intel.com) ([10.54.39.16]) by orsmga007-auth.jf.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 23 Mar 2020 17:10:50 -0700 Subject: [PATCH 02/12] device-dax: Move instance creation parameters to 'struct dev_dax_data' From: Dan Williams To: linux-mm@kvack.org Date: Mon, 23 Mar 2020 16:54:43 -0700 Message-ID: <158500768347.2088294.5638260757259826816.stgit@dwillia2-desk3.amr.corp.intel.com> In-Reply-To: <158500767138.2088294.17131646259803932461.stgit@dwillia2-desk3.amr.corp.intel.com> References: <158500767138.2088294.17131646259803932461.stgit@dwillia2-desk3.amr.corp.intel.com> User-Agent: StGit/0.18-3-g996c MIME-Version: 1.0 Message-ID-Hash: QWYBV432BPNYVEHCAN4HMEXLOREYFZ5B X-Message-ID-Hash: QWYBV432BPNYVEHCAN4HMEXLOREYFZ5B X-MailFrom: dan.j.williams@intel.com X-Mailman-Rule-Misses: dmarc-mitigation; no-senders; approved; emergency; loop; banned-address; member-moderation; nonmember-moderation; administrivia; implicit-dest; max-recipients; max-size; news-moderation; no-subject; suspicious-header CC: dave.hansen@linux.intel.com, hch@lst.de, linux-nvdimm@lists.01.org, linux-kernel@vger.kernel.org X-Mailman-Version: 3.1.1 Precedence: list List-Id: "Linux-nvdimm developer list." Archived-At: List-Archive: List-Help: List-Post: List-Subscribe: List-Unsubscribe: In preparation for adding more parameters to instance creation, move existing parameters to a new struct. Signed-off-by: Dan Williams --- drivers/dax/bus.c | 14 +++++++------- drivers/dax/bus.h | 16 ++++++++-------- drivers/dax/hmem/hmem.c | 8 +++++++- drivers/dax/pmem/core.c | 9 ++++++++- 4 files changed, 30 insertions(+), 17 deletions(-) diff --git a/drivers/dax/bus.c b/drivers/dax/bus.c index 5301c294e861..a5b1b7a82a87 100644 --- a/drivers/dax/bus.c +++ b/drivers/dax/bus.c @@ -395,9 +395,9 @@ static void unregister_dev_dax(void *dev) put_device(dev); } -struct dev_dax *__devm_create_dev_dax(struct dax_region *dax_region, int id, - struct dev_pagemap *pgmap, enum dev_dax_subsys subsys) +struct dev_dax *devm_create_dev_dax(struct dev_dax_data *data) { + struct dax_region *dax_region = data->dax_region; struct device *parent = dax_region->dev; struct dax_device *dax_dev; struct dev_dax *dev_dax; @@ -405,14 +405,14 @@ struct dev_dax *__devm_create_dev_dax(struct dax_region *dax_region, int id, struct device *dev; int rc = -ENOMEM; - if (id < 0) + if (data->id < 0) return ERR_PTR(-EINVAL); dev_dax = kzalloc(sizeof(*dev_dax), GFP_KERNEL); if (!dev_dax) return ERR_PTR(-ENOMEM); - memcpy(&dev_dax->pgmap, pgmap, sizeof(*pgmap)); + memcpy(&dev_dax->pgmap, data->pgmap, sizeof(struct dev_pagemap)); /* * No 'host' or dax_operations since there is no access to this @@ -436,13 +436,13 @@ struct dev_dax *__devm_create_dev_dax(struct dax_region *dax_region, int id, inode = dax_inode(dax_dev); dev->devt = inode->i_rdev; - if (subsys == DEV_DAX_BUS) + if (data->subsys == DEV_DAX_BUS) dev->bus = &dax_bus_type; else dev->class = dax_class; dev->parent = parent; dev->type = &dev_dax_type; - dev_set_name(dev, "dax%d.%d", dax_region->id, id); + dev_set_name(dev, "dax%d.%d", dax_region->id, data->id); rc = device_add(dev); if (rc) { @@ -462,7 +462,7 @@ struct dev_dax *__devm_create_dev_dax(struct dax_region *dax_region, int id, return ERR_PTR(rc); } -EXPORT_SYMBOL_GPL(__devm_create_dev_dax); +EXPORT_SYMBOL_GPL(devm_create_dev_dax); static int match_always_count; diff --git a/drivers/dax/bus.h b/drivers/dax/bus.h index 55577e9791da..299c2e7fac09 100644 --- a/drivers/dax/bus.h +++ b/drivers/dax/bus.h @@ -13,18 +13,18 @@ struct dax_region *alloc_dax_region(struct device *parent, int region_id, struct resource *res, int target_node, unsigned int align); enum dev_dax_subsys { - DEV_DAX_BUS, + DEV_DAX_BUS = 0, /* zeroed dev_dax_data picks this by default */ DEV_DAX_CLASS, }; -struct dev_dax *__devm_create_dev_dax(struct dax_region *dax_region, int id, - struct dev_pagemap *pgmap, enum dev_dax_subsys subsys); +struct dev_dax_data { + struct dax_region *dax_region; + struct dev_pagemap *pgmap; + enum dev_dax_subsys subsys; + int id; +}; -static inline struct dev_dax *devm_create_dev_dax(struct dax_region *dax_region, - int id, struct dev_pagemap *pgmap) -{ - return __devm_create_dev_dax(dax_region, id, pgmap, DEV_DAX_BUS); -} +struct dev_dax *devm_create_dev_dax(struct dev_dax_data *data); /* to be deleted when DEV_DAX_CLASS is removed */ struct dev_dax *__dax_pmem_probe(struct device *dev, enum dev_dax_subsys subsys); diff --git a/drivers/dax/hmem/hmem.c b/drivers/dax/hmem/hmem.c index 506893861253..b84fe17178d8 100644 --- a/drivers/dax/hmem/hmem.c +++ b/drivers/dax/hmem/hmem.c @@ -11,6 +11,7 @@ static int dax_hmem_probe(struct platform_device *pdev) struct dev_pagemap pgmap = { }; struct dax_region *dax_region; struct memregion_info *mri; + struct dev_dax_data data; struct dev_dax *dev_dax; struct resource *res; @@ -26,7 +27,12 @@ static int dax_hmem_probe(struct platform_device *pdev) if (!dax_region) return -ENOMEM; - dev_dax = devm_create_dev_dax(dax_region, 0, &pgmap); + data = (struct dev_dax_data) { + .dax_region = dax_region, + .id = 0, + .pgmap = &pgmap, + }; + dev_dax = devm_create_dev_dax(&data); if (IS_ERR(dev_dax)) return PTR_ERR(dev_dax); diff --git a/drivers/dax/pmem/core.c b/drivers/dax/pmem/core.c index ea52bb77a294..08ee5947a49c 100644 --- a/drivers/dax/pmem/core.c +++ b/drivers/dax/pmem/core.c @@ -14,6 +14,7 @@ struct dev_dax *__dax_pmem_probe(struct device *dev, enum dev_dax_subsys subsys) resource_size_t offset; struct nd_pfn_sb *pfn_sb; struct dev_dax *dev_dax; + struct dev_dax_data data; struct nd_namespace_io *nsio; struct dax_region *dax_region; struct dev_pagemap pgmap = { }; @@ -57,7 +58,13 @@ struct dev_dax *__dax_pmem_probe(struct device *dev, enum dev_dax_subsys subsys) if (!dax_region) return ERR_PTR(-ENOMEM); - dev_dax = __devm_create_dev_dax(dax_region, id, &pgmap, subsys); + data = (struct dev_dax_data) { + .dax_region = dax_region, + .id = id, + .pgmap = &pgmap, + .subsys = subsys, + }; + dev_dax = devm_create_dev_dax(&data); /* child dev_dax instances now own the lifetime of the dax_region */ dax_region_put(dax_region); From patchwork Mon Mar 23 23:54:49 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Dan Williams X-Patchwork-Id: 11454261 Return-Path: Received: from mail.kernel.org (pdx-korg-mail-1.web.codeaurora.org [172.30.200.123]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id 52F4F913 for ; Tue, 24 Mar 2020 00:10:59 +0000 (UTC) Received: from ml01.01.org (ml01.01.org [198.145.21.10]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by mail.kernel.org (Postfix) with ESMTPS id 3C28720719 for ; Tue, 24 Mar 2020 00:10:59 +0000 (UTC) DMARC-Filter: OpenDMARC Filter v1.3.2 mail.kernel.org 3C28720719 Authentication-Results: mail.kernel.org; dmarc=fail (p=none dis=none) header.from=intel.com Authentication-Results: mail.kernel.org; spf=none smtp.mailfrom=linux-nvdimm-bounces@lists.01.org Received: from ml01.vlan13.01.org (localhost [IPv6:::1]) by ml01.01.org (Postfix) with ESMTP id 7798B10FC388F; Mon, 23 Mar 2020 17:11:49 -0700 (PDT) Received-SPF: Pass (mailfrom) identity=mailfrom; client-ip=192.55.52.88; helo=mga01.intel.com; envelope-from=dan.j.williams@intel.com; receiver= Received: from mga01.intel.com (mga01.intel.com [192.55.52.88]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by ml01.01.org (Postfix) with ESMTPS id C502210FC388E for ; Mon, 23 Mar 2020 17:11:46 -0700 (PDT) IronPort-SDR: OkswbQzaKmbDkbgHFF0DJyMGdP4WEEb6+YbYrFB4BZCBNmXfWsJ1w/wvMC0j0jn228WZkqL86U e/ClEVP5PM0A== X-Amp-Result: SKIPPED(no attachment in message) X-Amp-File-Uploaded: False Received: from orsmga002.jf.intel.com ([10.7.209.21]) by fmsmga101.fm.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 23 Mar 2020 17:10:55 -0700 IronPort-SDR: HC2pvlH4K1xumggL+mL/cTtBCBfjpjldtQ8eUk2tibvzVBEzxFD1/QkvkGnBgZEzkALc9U9hi8 qE4DW99hXhMw== X-IronPort-AV: E=Sophos;i="5.72,298,1580803200"; d="scan'208";a="264958610" Received: from dwillia2-desk3.jf.intel.com (HELO dwillia2-desk3.amr.corp.intel.com) ([10.54.39.16]) by orsmga002-auth.jf.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 23 Mar 2020 17:10:55 -0700 Subject: [PATCH 03/12] device-dax: Make pgmap optional for instance creation From: Dan Williams To: linux-mm@kvack.org Date: Mon, 23 Mar 2020 16:54:49 -0700 Message-ID: <158500768916.2088294.4064468957836652680.stgit@dwillia2-desk3.amr.corp.intel.com> In-Reply-To: <158500767138.2088294.17131646259803932461.stgit@dwillia2-desk3.amr.corp.intel.com> References: <158500767138.2088294.17131646259803932461.stgit@dwillia2-desk3.amr.corp.intel.com> User-Agent: StGit/0.18-3-g996c MIME-Version: 1.0 Message-ID-Hash: X6LXDUGVZ2S7DKMGU3XCCMDWZ3SCLB43 X-Message-ID-Hash: X6LXDUGVZ2S7DKMGU3XCCMDWZ3SCLB43 X-MailFrom: dan.j.williams@intel.com X-Mailman-Rule-Misses: dmarc-mitigation; no-senders; approved; emergency; loop; banned-address; member-moderation; nonmember-moderation; administrivia; implicit-dest; max-recipients; max-size; news-moderation; no-subject; suspicious-header CC: dave.hansen@linux.intel.com, hch@lst.de, linux-nvdimm@lists.01.org, linux-kernel@vger.kernel.org X-Mailman-Version: 3.1.1 Precedence: list List-Id: "Linux-nvdimm developer list." Archived-At: List-Archive: List-Help: List-Post: List-Subscribe: List-Unsubscribe: The passed in dev_pagemap is only required in the pmem case as the libnvdimm core may have reserved a vmem_altmap for dev_memremap_pages() to place the memmap in pmem directly. In the hmem case there is no agent reserving an altmap so it can all be handled by a core internal default. Pass the resource range via a new @range property of 'struct dev_dax_data'. Signed-off-by: Dan Williams --- drivers/dax/bus.c | 29 +++++++++++++++-------------- drivers/dax/bus.h | 2 ++ drivers/dax/dax-private.h | 9 ++++++++- drivers/dax/device.c | 28 +++++++++++++++++++--------- drivers/dax/hmem/hmem.c | 8 ++++---- drivers/dax/kmem.c | 12 ++++++------ drivers/dax/pmem/core.c | 4 ++++ tools/testing/nvdimm/dax-dev.c | 8 ++++---- 8 files changed, 62 insertions(+), 38 deletions(-) diff --git a/drivers/dax/bus.c b/drivers/dax/bus.c index a5b1b7a82a87..79e7514480b7 100644 --- a/drivers/dax/bus.c +++ b/drivers/dax/bus.c @@ -271,7 +271,7 @@ static ssize_t size_show(struct device *dev, struct device_attribute *attr, char *buf) { struct dev_dax *dev_dax = to_dev_dax(dev); - unsigned long long size = resource_size(&dev_dax->region->res); + unsigned long long size = range_len(&dev_dax->range); return sprintf(buf, "%llu\n", size); } @@ -293,19 +293,12 @@ static ssize_t target_node_show(struct device *dev, } static DEVICE_ATTR_RO(target_node); -static unsigned long long dev_dax_resource(struct dev_dax *dev_dax) -{ - struct dax_region *dax_region = dev_dax->region; - - return dax_region->res.start; -} - static ssize_t resource_show(struct device *dev, struct device_attribute *attr, char *buf) { struct dev_dax *dev_dax = to_dev_dax(dev); - return sprintf(buf, "%#llx\n", dev_dax_resource(dev_dax)); + return sprintf(buf, "%#llx\n", dev_dax->range.start); } static DEVICE_ATTR(resource, 0400, resource_show, NULL); @@ -376,6 +369,7 @@ static void dev_dax_release(struct device *dev) dax_region_put(dax_region); put_dax(dax_dev); + kfree(dev_dax->pgmap); kfree(dev_dax); } @@ -412,7 +406,12 @@ struct dev_dax *devm_create_dev_dax(struct dev_dax_data *data) if (!dev_dax) return ERR_PTR(-ENOMEM); - memcpy(&dev_dax->pgmap, data->pgmap, sizeof(struct dev_pagemap)); + if (data->pgmap) { + dev_dax->pgmap = kmemdup(data->pgmap, + sizeof(struct dev_pagemap), GFP_KERNEL); + if (!dev_dax->pgmap) + goto err_pgmap; + } /* * No 'host' or dax_operations since there is no access to this @@ -420,17 +419,18 @@ struct dev_dax *devm_create_dev_dax(struct dev_dax_data *data) */ dax_dev = alloc_dax(dev_dax, NULL, NULL, DAXDEV_F_SYNC); if (!dax_dev) - goto err; + goto err_alloc_dax; /* a device_dax instance is dead while the driver is not attached */ kill_dax(dax_dev); - /* from here on we're committed to teardown via dax_dev_release() */ + /* from here on we're committed to teardown via dev_dax_release() */ dev = &dev_dax->dev; device_initialize(dev); dev_dax->dax_dev = dax_dev; dev_dax->region = dax_region; + dev_dax->range = data->range; dev_dax->target_node = dax_region->target_node; kref_get(&dax_region->kref); @@ -456,8 +456,9 @@ struct dev_dax *devm_create_dev_dax(struct dev_dax_data *data) return ERR_PTR(rc); return dev_dax; - - err: +err_alloc_dax: + kfree(dev_dax->pgmap); +err_pgmap: kfree(dev_dax); return ERR_PTR(rc); diff --git a/drivers/dax/bus.h b/drivers/dax/bus.h index 299c2e7fac09..4aeb36da83a4 100644 --- a/drivers/dax/bus.h +++ b/drivers/dax/bus.h @@ -3,6 +3,7 @@ #ifndef __DAX_BUS_H__ #define __DAX_BUS_H__ #include +#include struct dev_dax; struct resource; @@ -21,6 +22,7 @@ struct dev_dax_data { struct dax_region *dax_region; struct dev_pagemap *pgmap; enum dev_dax_subsys subsys; + struct range range; int id; }; diff --git a/drivers/dax/dax-private.h b/drivers/dax/dax-private.h index 28767dc3ec13..22d43095559a 100644 --- a/drivers/dax/dax-private.h +++ b/drivers/dax/dax-private.h @@ -41,6 +41,7 @@ struct dax_region { * @target_node: effective numa node if dev_dax memory range is onlined * @dev - device core * @pgmap - pgmap for memmap setup / lifetime (driver owned) + * @range: resource range for the instance * @dax_mem_res: physical address range of hotadded DAX memory */ struct dev_dax { @@ -48,10 +49,16 @@ struct dev_dax { struct dax_device *dax_dev; int target_node; struct device dev; - struct dev_pagemap pgmap; + struct dev_pagemap *pgmap; + struct range range; struct resource *dax_kmem_res; }; +static inline u64 range_len(struct range *range) +{ + return range->end - range->start + 1; +} + static inline struct dev_dax *to_dev_dax(struct device *dev) { return container_of(dev, struct dev_dax, dev); diff --git a/drivers/dax/device.c b/drivers/dax/device.c index eaa651ddeccd..d631344fdadb 100644 --- a/drivers/dax/device.c +++ b/drivers/dax/device.c @@ -55,12 +55,12 @@ static int check_vma(struct dev_dax *dev_dax, struct vm_area_struct *vma, __weak phys_addr_t dax_pgoff_to_phys(struct dev_dax *dev_dax, pgoff_t pgoff, unsigned long size) { - struct resource *res = &dev_dax->region->res; + struct range *range = &dev_dax->range; phys_addr_t phys; - phys = pgoff * PAGE_SIZE + res->start; - if (phys >= res->start && phys <= res->end) { - if (phys + size - 1 <= res->end) + phys = pgoff * PAGE_SIZE + range->start; + if (phys >= range->start && phys <= range->end) { + if (phys + size - 1 <= range->end) return phys; } @@ -395,21 +395,31 @@ int dev_dax_probe(struct device *dev) { struct dev_dax *dev_dax = to_dev_dax(dev); struct dax_device *dax_dev = dev_dax->dax_dev; - struct resource *res = &dev_dax->region->res; + struct range *range = &dev_dax->range; + struct dev_pagemap *pgmap; struct inode *inode; struct cdev *cdev; void *addr; int rc; /* 1:1 map region resource range to device-dax instance range */ - if (!devm_request_mem_region(dev, res->start, resource_size(res), + if (!devm_request_mem_region(dev, range->start, range_len(range), dev_name(dev))) { - dev_warn(dev, "could not reserve region %pR\n", res); + dev_warn(dev, "could not reserve range: %#llx - %#llx\n", + range->start, range->end); return -EBUSY; } - dev_dax->pgmap.type = MEMORY_DEVICE_DEVDAX; - addr = devm_memremap_pages(dev, &dev_dax->pgmap); + pgmap = dev_dax->pgmap; + if (!pgmap) { + pgmap = devm_kzalloc(dev, sizeof(*pgmap), GFP_KERNEL); + if (!pgmap) + return -ENOMEM; + pgmap->res.start = range->start; + pgmap->res.end = range->end; + } + pgmap->type = MEMORY_DEVICE_DEVDAX; + addr = devm_memremap_pages(dev, pgmap); if (IS_ERR(addr)) return PTR_ERR(addr); diff --git a/drivers/dax/hmem/hmem.c b/drivers/dax/hmem/hmem.c index b84fe17178d8..af82d6ba820a 100644 --- a/drivers/dax/hmem/hmem.c +++ b/drivers/dax/hmem/hmem.c @@ -8,7 +8,6 @@ static int dax_hmem_probe(struct platform_device *pdev) { struct device *dev = &pdev->dev; - struct dev_pagemap pgmap = { }; struct dax_region *dax_region; struct memregion_info *mri; struct dev_dax_data data; @@ -20,8 +19,6 @@ static int dax_hmem_probe(struct platform_device *pdev) return -ENOMEM; mri = dev->platform_data; - memcpy(&pgmap.res, res, sizeof(*res)); - dax_region = alloc_dax_region(dev, pdev->id, res, mri->target_node, PMD_SIZE); if (!dax_region) @@ -30,7 +27,10 @@ static int dax_hmem_probe(struct platform_device *pdev) data = (struct dev_dax_data) { .dax_region = dax_region, .id = 0, - .pgmap = &pgmap, + .range = { + .start = res->start, + .end = res->end, + }, }; dev_dax = devm_create_dev_dax(&data); if (IS_ERR(dev_dax)) diff --git a/drivers/dax/kmem.c b/drivers/dax/kmem.c index 3d0a7e702c94..111e4c06ff49 100644 --- a/drivers/dax/kmem.c +++ b/drivers/dax/kmem.c @@ -17,7 +17,7 @@ int dev_dax_kmem_probe(struct device *dev) { struct dev_dax *dev_dax = to_dev_dax(dev); - struct resource *res = &dev_dax->region->res; + struct range *range = &dev_dax->range; resource_size_t kmem_start; resource_size_t kmem_size; resource_size_t kmem_end; @@ -33,17 +33,17 @@ int dev_dax_kmem_probe(struct device *dev) */ numa_node = dev_dax->target_node; if (numa_node < 0) { - dev_warn(dev, "rejecting DAX region %pR with invalid node: %d\n", - res, numa_node); + dev_warn(dev, "rejecting DAX region with invalid node: %d\n", + numa_node); return -EINVAL; } /* Hotplug starting at the beginning of the next block: */ - kmem_start = ALIGN(res->start, memory_block_size_bytes()); + kmem_start = ALIGN(range->start, memory_block_size_bytes()); - kmem_size = resource_size(res); + kmem_size = range_len(range); /* Adjust the size down to compensate for moving up kmem_start: */ - kmem_size -= kmem_start - res->start; + kmem_size -= kmem_start - range->start; /* Align the size down to cover only complete blocks: */ kmem_size &= ~(memory_block_size_bytes() - 1); kmem_end = kmem_start + kmem_size; diff --git a/drivers/dax/pmem/core.c b/drivers/dax/pmem/core.c index 08ee5947a49c..4fa81d3d2f65 100644 --- a/drivers/dax/pmem/core.c +++ b/drivers/dax/pmem/core.c @@ -63,6 +63,10 @@ struct dev_dax *__dax_pmem_probe(struct device *dev, enum dev_dax_subsys subsys) .id = id, .pgmap = &pgmap, .subsys = subsys, + .range = { + .start = res.start, + .end = res.end, + }, }; dev_dax = devm_create_dev_dax(&data); diff --git a/tools/testing/nvdimm/dax-dev.c b/tools/testing/nvdimm/dax-dev.c index 7e5d979e73cb..38d8e55c4a0d 100644 --- a/tools/testing/nvdimm/dax-dev.c +++ b/tools/testing/nvdimm/dax-dev.c @@ -9,12 +9,12 @@ phys_addr_t dax_pgoff_to_phys(struct dev_dax *dev_dax, pgoff_t pgoff, unsigned long size) { - struct resource *res = &dev_dax->region->res; + struct range *range = &dev_dax->range; phys_addr_t addr; - addr = pgoff * PAGE_SIZE + res->start; - if (addr >= res->start && addr <= res->end) { - if (addr + size - 1 <= res->end) { + addr = pgoff * PAGE_SIZE + range->start; + if (addr >= range->start && addr <= range->end) { + if (addr + size - 1 <= range->end) { if (get_nfit_res(addr)) { struct page *page; From patchwork Mon Mar 23 23:54:54 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Dan Williams X-Patchwork-Id: 11454265 Return-Path: Received: from mail.kernel.org (pdx-korg-mail-1.web.codeaurora.org [172.30.200.123]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id 778F61668 for ; Tue, 24 Mar 2020 00:11:05 +0000 (UTC) Received: from ml01.01.org (ml01.01.org [198.145.21.10]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by mail.kernel.org (Postfix) with ESMTPS id 61D6B20719 for ; Tue, 24 Mar 2020 00:11:05 +0000 (UTC) DMARC-Filter: OpenDMARC Filter v1.3.2 mail.kernel.org 61D6B20719 Authentication-Results: mail.kernel.org; dmarc=fail (p=none dis=none) header.from=intel.com Authentication-Results: mail.kernel.org; spf=none smtp.mailfrom=linux-nvdimm-bounces@lists.01.org Received: from ml01.vlan13.01.org (localhost [IPv6:::1]) by ml01.01.org (Postfix) with ESMTP id 8E9C710FC388D; Mon, 23 Mar 2020 17:11:55 -0700 (PDT) Received-SPF: Pass (mailfrom) identity=mailfrom; client-ip=134.134.136.20; helo=mga02.intel.com; envelope-from=dan.j.williams@intel.com; receiver= Received: from mga02.intel.com (mga02.intel.com [134.134.136.20]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by ml01.01.org (Postfix) with ESMTPS id 7706510FC388A for ; Mon, 23 Mar 2020 17:11:53 -0700 (PDT) IronPort-SDR: Kc9uonLJhSPiRh5X9/c86mB7QIXjmf9zKVIm4bNQ/3Uz3gdfDtofb1kyvVk9eBrHl7nykeAno1 qHNxNiwO8q2Q== X-Amp-Result: SKIPPED(no attachment in message) X-Amp-File-Uploaded: False Received: from fmsmga007.fm.intel.com ([10.253.24.52]) by orsmga101.jf.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 23 Mar 2020 17:11:02 -0700 IronPort-SDR: o7Q8jiHtwpNBIgs6DkiSX7vQ6yTPDOWz5sCBFIqFRyOToLtJ+htwlqGuSn/6rLEn1lW073F0dk E9Xol9yACspQ== X-IronPort-AV: E=Sophos;i="5.72,298,1580803200"; d="scan'208";a="238050189" Received: from dwillia2-desk3.jf.intel.com (HELO dwillia2-desk3.amr.corp.intel.com) ([10.54.39.16]) by fmsmga007-auth.fm.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 23 Mar 2020 17:11:01 -0700 Subject: [PATCH 04/12] device-dax: Kill dax_kmem_res From: Dan Williams To: linux-mm@kvack.org Date: Mon, 23 Mar 2020 16:54:54 -0700 Message-ID: <158500769438.2088294.11339134986537700302.stgit@dwillia2-desk3.amr.corp.intel.com> In-Reply-To: <158500767138.2088294.17131646259803932461.stgit@dwillia2-desk3.amr.corp.intel.com> References: <158500767138.2088294.17131646259803932461.stgit@dwillia2-desk3.amr.corp.intel.com> User-Agent: StGit/0.18-3-g996c MIME-Version: 1.0 Message-ID-Hash: G4T6OTGKIZ23EJGNKGPUHWHSPZYP7AKK X-Message-ID-Hash: G4T6OTGKIZ23EJGNKGPUHWHSPZYP7AKK X-MailFrom: dan.j.williams@intel.com X-Mailman-Rule-Misses: dmarc-mitigation; no-senders; approved; emergency; loop; banned-address; member-moderation; nonmember-moderation; administrivia; implicit-dest; max-recipients; max-size; news-moderation; no-subject; suspicious-header CC: David Hildenbrand , Dave Hansen , Pavel Tatashin , hch@lst.de, linux-nvdimm@lists.01.org, linux-kernel@vger.kernel.org X-Mailman-Version: 3.1.1 Precedence: list List-Id: "Linux-nvdimm developer list." Archived-At: List-Archive: List-Help: List-Post: List-Subscribe: List-Unsubscribe: Several related issues around this unneeded attribute: - The dax_kmem_res property allows the kmem driver to stash the adjusted resource range that was used for the hotplug operation, but that can be recalculated from the original base range. - kmem is using an open coded release_resource() + kfree() when an idiomatic release_mem_region() is sufficient. - The driver managed resource need only manage the busy flag. Other flags are of no concern to the kmem driver. In fact if kmem inherits some memory range that add_memory() rejects that is a memory-hotplug-core policy that the driver is in no position to override. - The implementation trusts that failed remove_memory() results in the entire resource range remaining pinned busy. The driver need not make that layering violation assumption and just maintain the busy state in its local resource. - The "Hot-remove not yet implemented." comment is stale as of commit 9f960da72b25 ("device-dax: "Hotremove" persistent memory that is used like normal RAM") Cc: David Hildenbrand Cc: Vishal Verma Cc: Dave Hansen Cc: Pavel Tatashin Signed-off-by: Dan Williams --- drivers/dax/dax-private.h | 2 - drivers/dax/kmem.c | 104 +++++++++++++++++++-------------------------- 2 files changed, 43 insertions(+), 63 deletions(-) diff --git a/drivers/dax/dax-private.h b/drivers/dax/dax-private.h index 22d43095559a..12a2dbc43b40 100644 --- a/drivers/dax/dax-private.h +++ b/drivers/dax/dax-private.h @@ -42,7 +42,6 @@ struct dax_region { * @dev - device core * @pgmap - pgmap for memmap setup / lifetime (driver owned) * @range: resource range for the instance - * @dax_mem_res: physical address range of hotadded DAX memory */ struct dev_dax { struct dax_region *region; @@ -51,7 +50,6 @@ struct dev_dax { struct device dev; struct dev_pagemap *pgmap; struct range range; - struct resource *dax_kmem_res; }; static inline u64 range_len(struct range *range) diff --git a/drivers/dax/kmem.c b/drivers/dax/kmem.c index 111e4c06ff49..2bb7fa0951ef 100644 --- a/drivers/dax/kmem.c +++ b/drivers/dax/kmem.c @@ -14,15 +14,23 @@ #include "dax-private.h" #include "bus.h" +static struct range dax_kmem_range(struct dev_dax *dev_dax) +{ + struct range range; + + /* memory-block align the hotplug range */ + range.start = ALIGN(dev_dax->range.start, memory_block_size_bytes()); + range.end = ALIGN_DOWN(dev_dax->range.end + 1, + memory_block_size_bytes()) - 1; + return range; +} + int dev_dax_kmem_probe(struct device *dev) { struct dev_dax *dev_dax = to_dev_dax(dev); - struct range *range = &dev_dax->range; - resource_size_t kmem_start; - resource_size_t kmem_size; - resource_size_t kmem_end; - struct resource *new_res; - int numa_node; + struct range range = dax_kmem_range(dev_dax); + int numa_node = dev_dax->target_node; + struct resource *res; int rc; /* @@ -31,95 +39,69 @@ int dev_dax_kmem_probe(struct device *dev) * could be mixed in a node with faster memory, causing * unavoidable performance issues. */ - numa_node = dev_dax->target_node; if (numa_node < 0) { dev_warn(dev, "rejecting DAX region with invalid node: %d\n", numa_node); return -EINVAL; } - /* Hotplug starting at the beginning of the next block: */ - kmem_start = ALIGN(range->start, memory_block_size_bytes()); - - kmem_size = range_len(range); - /* Adjust the size down to compensate for moving up kmem_start: */ - kmem_size -= kmem_start - range->start; - /* Align the size down to cover only complete blocks: */ - kmem_size &= ~(memory_block_size_bytes() - 1); - kmem_end = kmem_start + kmem_size; - - /* Region is permanently reserved. Hot-remove not yet implemented. */ - new_res = request_mem_region(kmem_start, kmem_size, dev_name(dev)); - if (!new_res) { - dev_warn(dev, "could not reserve region [%pa-%pa]\n", - &kmem_start, &kmem_end); + res = request_mem_region(range.start, range_len(&range), dev_name(dev)); + if (!res) { + dev_warn(dev, "could not reserve region [%#llx-%#llx]\n", + range.start, range.end); return -EBUSY; } - /* - * Set flags appropriate for System RAM. Leave ..._BUSY clear - * so that add_memory() can add a child resource. Do not - * inherit flags from the parent since it may set new flags - * unknown to us that will break add_memory() below. - */ - new_res->flags = IORESOURCE_SYSTEM_RAM; - new_res->name = dev_name(dev); - - rc = add_memory(numa_node, new_res->start, resource_size(new_res)); + /* Temporarily clear busy to allow add_memory() to claim it */ + res->flags &= ~IORESOURCE_BUSY; + rc = add_memory(numa_node, range.start, range_len(&range)); + res->flags |= IORESOURCE_BUSY; if (rc) { - release_resource(new_res); - kfree(new_res); + release_mem_region(range.start, range_len(&range)); return rc; } - dev_dax->dax_kmem_res = new_res; return 0; } #ifdef CONFIG_MEMORY_HOTREMOVE -static int dev_dax_kmem_remove(struct device *dev) +static void dax_kmem_release(struct dev_dax *dev_dax) { - struct dev_dax *dev_dax = to_dev_dax(dev); - struct resource *res = dev_dax->dax_kmem_res; - resource_size_t kmem_start = res->start; - resource_size_t kmem_size = resource_size(res); + struct range range = dax_kmem_range(dev_dax); int rc; /* * We have one shot for removing memory, if some memory blocks were not * offline prior to calling this function remove_memory() will fail, and * there is no way to hotremove this memory until reboot because device - * unbind will succeed even if we return failure. + * unbind will proceed regardless of the remove_memory result. */ - rc = remove_memory(dev_dax->target_node, kmem_start, kmem_size); - if (rc) { - dev_err(dev, - "DAX region %pR cannot be hotremoved until the next reboot\n", - res); - return rc; + rc = remove_memory(dev_dax->target_node, range.start, range_len(&range)); + if (rc == 0) { + release_mem_region(range.start, range_len(&range)); + return; } - - /* Release and free dax resources */ - release_resource(res); - kfree(res); - dev_dax->dax_kmem_res = NULL; - - return 0; + dev_err(&dev_dax->dev, "%#llx-%#llx cannot be hotremoved until the next reboot\n", + range.start, range.end); } #else -static int dev_dax_kmem_remove(struct device *dev) +static void dax_kmem_release(struct dev_dax *dev_dax) { /* - * Without hotremove purposely leak the request_mem_region() for the - * device-dax range and return '0' to ->remove() attempts. The removal - * of the device from the driver always succeeds, but the region is - * permanently pinned as reserved by the unreleased - * request_mem_region(). + * Without hotremove purposely leak the request_mem_region() for + * the device-dax range attempts. The removal of the device from + * the driver always succeeds, but the region is permanently + * pinned as reserved by the unreleased request_mem_region(). */ - return 0; } #endif /* CONFIG_MEMORY_HOTREMOVE */ +static int dev_dax_kmem_remove(struct device *dev) +{ + dax_kmem_release(to_dev_dax(dev)); + return 0; +} + static struct dax_device_driver device_dax_kmem_driver = { .drv = { .probe = dev_dax_kmem_probe, From patchwork Mon Mar 23 23:55:01 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Dan Williams X-Patchwork-Id: 11454269 Return-Path: Received: from mail.kernel.org (pdx-korg-mail-1.web.codeaurora.org [172.30.200.123]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id 99136913 for ; Tue, 24 Mar 2020 00:11:12 +0000 (UTC) Received: from ml01.01.org (ml01.01.org [198.145.21.10]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by mail.kernel.org (Postfix) with ESMTPS id 8559F20789 for ; Tue, 24 Mar 2020 00:11:12 +0000 (UTC) DMARC-Filter: OpenDMARC Filter v1.3.2 mail.kernel.org 8559F20789 Authentication-Results: mail.kernel.org; dmarc=fail (p=none dis=none) header.from=intel.com Authentication-Results: mail.kernel.org; spf=none smtp.mailfrom=linux-nvdimm-bounces@lists.01.org Received: from ml01.vlan13.01.org (localhost [IPv6:::1]) by ml01.01.org (Postfix) with ESMTP id B015B10FC3895; Mon, 23 Mar 2020 17:12:02 -0700 (PDT) Received-SPF: Pass (mailfrom) identity=mailfrom; client-ip=134.134.136.20; helo=mga02.intel.com; envelope-from=dan.j.williams@intel.com; receiver= Received: from mga02.intel.com (mga02.intel.com [134.134.136.20]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by ml01.01.org (Postfix) with ESMTPS id 2E9AD10FC388A for ; Mon, 23 Mar 2020 17:12:00 -0700 (PDT) IronPort-SDR: G+7e+liJwAbLpsLy7ltoy/TX9jCXtbf2ugSqs9qIsiutpAqoxUgti6UYMG2muquOhBXNz12sYZ k5NZpM2yPiHA== X-Amp-Result: SKIPPED(no attachment in message) X-Amp-File-Uploaded: False Received: from fmsmga002.fm.intel.com ([10.253.24.26]) by orsmga101.jf.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 23 Mar 2020 17:11:09 -0700 IronPort-SDR: g69I3EClV2BqXP/7TFvkmXZK5hNooGCwc/KKgW96VLfpSX/ZW7sRJUPtRGVY5ijctG25rKLv4q hGu0ra/c+g1g== X-IronPort-AV: E=Sophos;i="5.72,298,1580803200"; d="scan'208";a="281413251" Received: from dwillia2-desk3.jf.intel.com (HELO dwillia2-desk3.amr.corp.intel.com) ([10.54.39.16]) by fmsmga002-auth.fm.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 23 Mar 2020 17:11:08 -0700 Subject: [PATCH 05/12] device-dax: Add an allocation interface for device-dax instances From: Dan Williams To: linux-mm@kvack.org Date: Mon, 23 Mar 2020 16:55:01 -0700 Message-ID: <158500770126.2088294.8690264272015198775.stgit@dwillia2-desk3.amr.corp.intel.com> In-Reply-To: <158500767138.2088294.17131646259803932461.stgit@dwillia2-desk3.amr.corp.intel.com> References: <158500767138.2088294.17131646259803932461.stgit@dwillia2-desk3.amr.corp.intel.com> User-Agent: StGit/0.18-3-g996c MIME-Version: 1.0 Message-ID-Hash: WAFNYF2B5VRIXIAVMOFNTEBTOBVNXBVT X-Message-ID-Hash: WAFNYF2B5VRIXIAVMOFNTEBTOBVNXBVT X-MailFrom: dan.j.williams@intel.com X-Mailman-Rule-Misses: dmarc-mitigation; no-senders; approved; emergency; loop; banned-address; member-moderation; nonmember-moderation; administrivia; implicit-dest; max-recipients; max-size; news-moderation; no-subject; suspicious-header CC: dave.hansen@linux.intel.com, hch@lst.de, linux-nvdimm@lists.01.org, linux-kernel@vger.kernel.org X-Mailman-Version: 3.1.1 Precedence: list List-Id: "Linux-nvdimm developer list." Archived-At: List-Archive: List-Help: List-Post: List-Subscribe: List-Unsubscribe: In preparation for a facility that enables dax regions to be sub-divided, introduce infrastructure to track and allocate region capacity. The new dax_region/available_size attribute is only enabled for volatile hmem devices, not pmem devices that are defined by nvdimm namespace boundaries. This is per Jeff's feedback the last time dynamic device-dax capacity allocation support was discussed. Link: https://lore.kernel.org/linux-nvdimm/x49shpp3zn8.fsf@segfault.boston.devel.redhat.com Signed-off-by: Dan Williams --- drivers/dax/bus.c | 116 ++++++++++++++++++++++++++++++++++++++++++--- drivers/dax/bus.h | 7 ++- drivers/dax/dax-private.h | 2 - drivers/dax/hmem/hmem.c | 7 +-- drivers/dax/pmem/core.c | 8 +-- 5 files changed, 118 insertions(+), 22 deletions(-) diff --git a/drivers/dax/bus.c b/drivers/dax/bus.c index 79e7514480b7..94163bf000a4 100644 --- a/drivers/dax/bus.c +++ b/drivers/dax/bus.c @@ -130,6 +130,11 @@ ATTRIBUTE_GROUPS(dax_drv); static int dax_bus_match(struct device *dev, struct device_driver *drv); +static bool is_static(struct dax_region *dax_region) +{ + return (dax_region->res.flags & IORESOURCE_DAX_STATIC) != 0; +} + static struct bus_type dax_bus_type = { .name = "dax", .uevent = dax_bus_uevent, @@ -185,7 +190,48 @@ static ssize_t align_show(struct device *dev, } static DEVICE_ATTR_RO(align); +#define for_each_dax_region_resource(dax_region, res) \ + for (res = (dax_region)->res.child; res; res = res->sibling) + +static unsigned long long dax_region_avail_size(struct dax_region *dax_region) +{ + resource_size_t size = resource_size(&dax_region->res); + struct resource *res; + + device_lock_assert(dax_region->dev); + + for_each_dax_region_resource(dax_region, res) + size -= resource_size(res); + return size; +} + +static ssize_t available_size_show(struct device *dev, + struct device_attribute *attr, char *buf) +{ + struct dax_region *dax_region = dev_get_drvdata(dev); + unsigned long long size; + + device_lock(dev); + size = dax_region_avail_size(dax_region); + device_unlock(dev); + + return sprintf(buf, "%llu\n", size); +} +static DEVICE_ATTR_RO(available_size); + +static umode_t dax_region_visible(struct kobject *kobj, struct attribute *a, + int n) +{ + struct device *dev = container_of(kobj, struct device, kobj); + struct dax_region *dax_region = dev_get_drvdata(dev); + + if (is_static(dax_region) && a == &dev_attr_available_size.attr) + return 0; + return a->mode; +} + static struct attribute *dax_region_attributes[] = { + &dev_attr_available_size.attr, &dev_attr_region_size.attr, &dev_attr_align.attr, &dev_attr_id.attr, @@ -195,6 +241,7 @@ static struct attribute *dax_region_attributes[] = { static const struct attribute_group dax_region_attribute_group = { .name = "dax_region", .attrs = dax_region_attributes, + .is_visible = dax_region_visible, }; static const struct attribute_group *dax_region_attribute_groups[] = { @@ -226,7 +273,8 @@ static void dax_region_unregister(void *region) } struct dax_region *alloc_dax_region(struct device *parent, int region_id, - struct resource *res, int target_node, unsigned int align) + struct resource *res, int target_node, unsigned int align, + unsigned long flags) { struct dax_region *dax_region; @@ -249,12 +297,17 @@ struct dax_region *alloc_dax_region(struct device *parent, int region_id, return NULL; dev_set_drvdata(parent, dax_region); - memcpy(&dax_region->res, res, sizeof(*res)); kref_init(&dax_region->kref); dax_region->id = region_id; dax_region->align = align; dax_region->dev = parent; dax_region->target_node = target_node; + dax_region->res = (struct resource) { + .start = res->start, + .end = res->end, + .flags = IORESOURCE_MEM | flags, + }; + if (sysfs_create_groups(&parent->kobj, dax_region_attribute_groups)) { kfree(dax_region); return NULL; @@ -267,6 +320,32 @@ struct dax_region *alloc_dax_region(struct device *parent, int region_id, } EXPORT_SYMBOL_GPL(alloc_dax_region); +static int alloc_dev_dax_range(struct dev_dax *dev_dax, resource_size_t size) +{ + struct dax_region *dax_region = dev_dax->region; + struct resource *res = &dax_region->res; + struct device *dev = &dev_dax->dev; + struct resource *alloc; + + device_lock_assert(dax_region->dev); + + /* TODO: handle multiple allocations per region */ + if (res->child) + return -ENOMEM; + + alloc = __request_region(res, res->start, size, dev_name(dev), 0); + + if (!alloc) + return -ENOMEM; + + dev_dax->range = (struct range) { + .start = alloc->start, + .end = alloc->end, + }; + + return 0; +} + static ssize_t size_show(struct device *dev, struct device_attribute *attr, char *buf) { @@ -361,6 +440,15 @@ void kill_dev_dax(struct dev_dax *dev_dax) } EXPORT_SYMBOL_GPL(kill_dev_dax); +static void free_dev_dax_range(struct dev_dax *dev_dax) +{ + struct dax_region *dax_region = dev_dax->region; + struct range *range = &dev_dax->range; + + device_lock_assert(dax_region->dev); + __release_region(&dax_region->res, range->start, range_len(range)); +} + static void dev_dax_release(struct device *dev) { struct dev_dax *dev_dax = to_dev_dax(dev); @@ -385,6 +473,7 @@ static void unregister_dev_dax(void *dev) dev_dbg(dev, "%s\n", __func__); kill_dev_dax(dev_dax); + free_dev_dax_range(dev_dax); device_del(dev); put_device(dev); } @@ -397,7 +486,7 @@ struct dev_dax *devm_create_dev_dax(struct dev_dax_data *data) struct dev_dax *dev_dax; struct inode *inode; struct device *dev; - int rc = -ENOMEM; + int rc; if (data->id < 0) return ERR_PTR(-EINVAL); @@ -406,7 +495,19 @@ struct dev_dax *devm_create_dev_dax(struct dev_dax_data *data) if (!dev_dax) return ERR_PTR(-ENOMEM); + dev_dax->region = dax_region; + dev = &dev_dax->dev; + device_initialize(dev); + dev_set_name(dev, "dax%d.%d", dax_region->id, data->id); + + rc = alloc_dev_dax_range(dev_dax, data->size); + if (rc) + goto err_range; + if (data->pgmap) { + dev_WARN_ONCE(parent, !is_static(dax_region), + "custom dev_pagemap requires a static dax_region\n"); + dev_dax->pgmap = kmemdup(data->pgmap, sizeof(struct dev_pagemap), GFP_KERNEL); if (!dev_dax->pgmap) @@ -425,12 +526,7 @@ struct dev_dax *devm_create_dev_dax(struct dev_dax_data *data) kill_dax(dax_dev); /* from here on we're committed to teardown via dev_dax_release() */ - dev = &dev_dax->dev; - device_initialize(dev); - dev_dax->dax_dev = dax_dev; - dev_dax->region = dax_region; - dev_dax->range = data->range; dev_dax->target_node = dax_region->target_node; kref_get(&dax_region->kref); @@ -442,7 +538,6 @@ struct dev_dax *devm_create_dev_dax(struct dev_dax_data *data) dev->class = dax_class; dev->parent = parent; dev->type = &dev_dax_type; - dev_set_name(dev, "dax%d.%d", dax_region->id, data->id); rc = device_add(dev); if (rc) { @@ -456,9 +551,12 @@ struct dev_dax *devm_create_dev_dax(struct dev_dax_data *data) return ERR_PTR(rc); return dev_dax; + err_alloc_dax: kfree(dev_dax->pgmap); err_pgmap: + free_dev_dax_range(dev_dax); +err_range: kfree(dev_dax); return ERR_PTR(rc); diff --git a/drivers/dax/bus.h b/drivers/dax/bus.h index 4aeb36da83a4..44592a8cac0f 100644 --- a/drivers/dax/bus.h +++ b/drivers/dax/bus.h @@ -10,8 +10,11 @@ struct resource; struct dax_device; struct dax_region; void dax_region_put(struct dax_region *dax_region); + +#define IORESOURCE_DAX_STATIC (1UL << 0) struct dax_region *alloc_dax_region(struct device *parent, int region_id, - struct resource *res, int target_node, unsigned int align); + struct resource *res, int target_node, unsigned int align, + unsigned long flags); enum dev_dax_subsys { DEV_DAX_BUS = 0, /* zeroed dev_dax_data picks this by default */ @@ -22,7 +25,7 @@ struct dev_dax_data { struct dax_region *dax_region; struct dev_pagemap *pgmap; enum dev_dax_subsys subsys; - struct range range; + resource_size_t size; int id; }; diff --git a/drivers/dax/dax-private.h b/drivers/dax/dax-private.h index 12a2dbc43b40..99b1273bb232 100644 --- a/drivers/dax/dax-private.h +++ b/drivers/dax/dax-private.h @@ -22,7 +22,7 @@ void dax_bus_exit(void); * @kref: to pin while other agents have a need to do lookups * @dev: parent device backing this region * @align: allocation and mapping alignment for child dax devices - * @res: physical address range of the region + * @res: resource tree to track instance allocations */ struct dax_region { int id; diff --git a/drivers/dax/hmem/hmem.c b/drivers/dax/hmem/hmem.c index af82d6ba820a..e7b64539e23e 100644 --- a/drivers/dax/hmem/hmem.c +++ b/drivers/dax/hmem/hmem.c @@ -20,17 +20,14 @@ static int dax_hmem_probe(struct platform_device *pdev) mri = dev->platform_data; dax_region = alloc_dax_region(dev, pdev->id, res, mri->target_node, - PMD_SIZE); + PMD_SIZE, 0); if (!dax_region) return -ENOMEM; data = (struct dev_dax_data) { .dax_region = dax_region, .id = 0, - .range = { - .start = res->start, - .end = res->end, - }, + .size = resource_size(res), }; dev_dax = devm_create_dev_dax(&data); if (IS_ERR(dev_dax)) diff --git a/drivers/dax/pmem/core.c b/drivers/dax/pmem/core.c index 4fa81d3d2f65..4fe700884338 100644 --- a/drivers/dax/pmem/core.c +++ b/drivers/dax/pmem/core.c @@ -54,7 +54,8 @@ struct dev_dax *__dax_pmem_probe(struct device *dev, enum dev_dax_subsys subsys) memcpy(&res, &pgmap.res, sizeof(res)); res.start += offset; dax_region = alloc_dax_region(dev, region_id, &res, - nd_region->target_node, le32_to_cpu(pfn_sb->align)); + nd_region->target_node, le32_to_cpu(pfn_sb->align), + IORESOURCE_DAX_STATIC); if (!dax_region) return ERR_PTR(-ENOMEM); @@ -63,10 +64,7 @@ struct dev_dax *__dax_pmem_probe(struct device *dev, enum dev_dax_subsys subsys) .id = id, .pgmap = &pgmap, .subsys = subsys, - .range = { - .start = res.start, - .end = res.end, - }, + .size = resource_size(&res), }; dev_dax = devm_create_dev_dax(&data); From patchwork Mon Mar 23 23:55:08 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Dan Williams X-Patchwork-Id: 11454275 Return-Path: Received: from mail.kernel.org (pdx-korg-mail-1.web.codeaurora.org [172.30.200.123]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id A68B51668 for ; Tue, 24 Mar 2020 00:11:18 +0000 (UTC) Received: from ml01.01.org (ml01.01.org [198.145.21.10]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by mail.kernel.org (Postfix) with ESMTPS id 910C220787 for ; Tue, 24 Mar 2020 00:11:18 +0000 (UTC) DMARC-Filter: OpenDMARC Filter v1.3.2 mail.kernel.org 910C220787 Authentication-Results: mail.kernel.org; dmarc=fail (p=none dis=none) header.from=intel.com Authentication-Results: mail.kernel.org; spf=none smtp.mailfrom=linux-nvdimm-bounces@lists.01.org Received: from ml01.vlan13.01.org (localhost [IPv6:::1]) by ml01.01.org (Postfix) with ESMTP id C937E10FC3897; Mon, 23 Mar 2020 17:12:08 -0700 (PDT) Received-SPF: Pass (mailfrom) identity=mailfrom; client-ip=134.134.136.65; helo=mga03.intel.com; envelope-from=dan.j.williams@intel.com; receiver= Received: from mga03.intel.com (mga03.intel.com [134.134.136.65]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by ml01.01.org (Postfix) with ESMTPS id 330E510FC3893 for ; Mon, 23 Mar 2020 17:12:05 -0700 (PDT) IronPort-SDR: EeEekxQLBDKNupk2ZCKsYJVN4oN1h9iMu51ucA3ir5ZtzTgexkqJkjjoXu+3kG+O09HiT5qn6b 6ZCqnxl8qUpw== X-Amp-Result: SKIPPED(no attachment in message) X-Amp-File-Uploaded: False Received: from orsmga008.jf.intel.com ([10.7.209.65]) by orsmga103.jf.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 23 Mar 2020 17:11:14 -0700 IronPort-SDR: qUF3nrLiD+QIyUeehSgbfTyXgSHz2AzY0PLa1N5+F/msorQH07IqgcSgf/J82VNmx1td1EOdji fVEMheeMawrA== X-IronPort-AV: E=Sophos;i="5.72,298,1580803200"; d="scan'208";a="240091459" Received: from dwillia2-desk3.jf.intel.com (HELO dwillia2-desk3.amr.corp.intel.com) ([10.54.39.16]) by orsmga008-auth.jf.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 23 Mar 2020 17:11:14 -0700 Subject: [PATCH 06/12] device-dax: Introduce seed devices From: Dan Williams To: linux-mm@kvack.org Date: Mon, 23 Mar 2020 16:55:08 -0700 Message-ID: <158500770817.2088294.2401162451260434254.stgit@dwillia2-desk3.amr.corp.intel.com> In-Reply-To: <158500767138.2088294.17131646259803932461.stgit@dwillia2-desk3.amr.corp.intel.com> References: <158500767138.2088294.17131646259803932461.stgit@dwillia2-desk3.amr.corp.intel.com> User-Agent: StGit/0.18-3-g996c MIME-Version: 1.0 Message-ID-Hash: ITL2MHLV6S2IKU4CEQ737POBTDV7HQTR X-Message-ID-Hash: ITL2MHLV6S2IKU4CEQ737POBTDV7HQTR X-MailFrom: dan.j.williams@intel.com X-Mailman-Rule-Misses: dmarc-mitigation; no-senders; approved; emergency; loop; banned-address; member-moderation; nonmember-moderation; administrivia; implicit-dest; max-recipients; max-size; news-moderation; no-subject; suspicious-header CC: dave.hansen@linux.intel.com, hch@lst.de, linux-nvdimm@lists.01.org, linux-kernel@vger.kernel.org X-Mailman-Version: 3.1.1 Precedence: list List-Id: "Linux-nvdimm developer list." Archived-At: List-Archive: List-Help: List-Post: List-Subscribe: List-Unsubscribe: Add a seed device concept for dynamic dax regions to be able to split the region amongst multiple sub-instances. The seed device, similar to libnvdimm seed devices, is a device that starts with zero capacity allocated and unbound to a driver. In contrast to libnvdimm seed devices explicit 'create' and 'delete' interfaces are added to the region to trigger seeds to be created and unused devices to be reclaimed. The explicit create and delete replaces implicit create as a side effect of probe and implicit delete when writing 0 to the size. Only one seed can be created at a time and until that device has been successfully bound to a driver no more seeds can be added. Delete can be performed on any 0-sized and idle device. This avoids the gymnastics of needing to move device_unregister() to its own async context. Specifically, it avoids the deadlock of deleting a device via one of its own attributes. It is also less surprising to userspace which never sees an extra device it did not request. For now just add the device creation, teardown, and ->probe() prevention. A later patch will arrange for the 'dax/size' attribute to be writable to allocate capacity from the region. Signed-off-by: Dan Williams --- drivers/dax/bus.c | 296 +++++++++++++++++++++++++++++++++++++++------ drivers/dax/bus.h | 4 - drivers/dax/dax-private.h | 7 + drivers/dax/device.c | 12 +- drivers/dax/hmem/hmem.c | 2 drivers/dax/kmem.c | 14 +- drivers/dax/pmem/compat.c | 2 7 files changed, 281 insertions(+), 56 deletions(-) diff --git a/drivers/dax/bus.c b/drivers/dax/bus.c index 94163bf000a4..8db771feed3d 100644 --- a/drivers/dax/bus.c +++ b/drivers/dax/bus.c @@ -135,10 +135,46 @@ static bool is_static(struct dax_region *dax_region) return (dax_region->res.flags & IORESOURCE_DAX_STATIC) != 0; } +static int dax_bus_probe(struct device *dev) +{ + struct dax_device_driver *dax_drv = to_dax_drv(dev->driver); + struct dev_dax *dev_dax = to_dev_dax(dev); + struct dax_region *dax_region = dev_dax->region; + struct range *range = &dev_dax->range; + int rc; + + if (range_len(range) == 0 || dev_dax->id < 0) + return -ENXIO; + + rc = dax_drv->probe(dev_dax); + + if (rc || is_static(dax_region)) + return rc; + + /* + * Allow new seed creation after successful probe of the + * previous seed. + */ + if (dax_region->seed == dev) + dax_region->seed = NULL; + + return 0; +} + +static int dax_bus_remove(struct device *dev) +{ + struct dax_device_driver *dax_drv = to_dax_drv(dev->driver); + struct dev_dax *dev_dax = to_dev_dax(dev); + + return dax_drv->remove(dev_dax); +} + static struct bus_type dax_bus_type = { .name = "dax", .uevent = dax_bus_uevent, .match = dax_bus_match, + .probe = dax_bus_probe, + .remove = dax_bus_remove, .drv_groups = dax_drv_groups, }; @@ -219,14 +255,195 @@ static ssize_t available_size_show(struct device *dev, } static DEVICE_ATTR_RO(available_size); +static ssize_t seed_show(struct device *dev, + struct device_attribute *attr, char *buf) +{ + struct dax_region *dax_region = dev_get_drvdata(dev); + struct device *seed; + ssize_t rc; + + if (is_static(dax_region)) + return -EINVAL; + + device_lock(dev); + seed = dax_region->seed; + rc = sprintf(buf, "%s\n", seed ? dev_name(seed) : ""); + device_unlock(dev); + + return rc; +} +static DEVICE_ATTR_RO(seed); + +static ssize_t create_store(struct device *dev, struct device_attribute *attr, + const char *buf, size_t len) +{ + struct dax_region *dax_region = dev_get_drvdata(dev); + unsigned long long avail; + ssize_t rc; + int val; + + if (is_static(dax_region)) + return -EINVAL; + + rc = kstrtoint(buf, 0, &val); + if (rc) + return rc; + if (val != 1) + return -EINVAL; + + device_lock(dev); + avail = dax_region_avail_size(dax_region); + if (avail == 0) + rc = -ENOSPC; + else if (dax_region->seed) + rc = -EBUSY; + else { + struct dev_dax_data data = { + .dax_region = dax_region, + .size = 0, + .id = -1, + }; + struct dev_dax *dev_dax = devm_create_dev_dax(&data); + + if (IS_ERR(dev_dax)) + rc = PTR_ERR(dev_dax); + else { + /* + * This does not race ->probe() since the device + * will not succeed ->probe() until 'size' is + * non-zero and establishing 'size' locks + * against ->probe() + */ + dax_region->seed = &dev_dax->dev; + rc = len; + } + } + device_unlock(dev); + + return rc; +} +static DEVICE_ATTR_WO(create); + +void kill_dev_dax(struct dev_dax *dev_dax) +{ + struct dax_device *dax_dev = dev_dax->dax_dev; + struct inode *inode = dax_inode(dax_dev); + + kill_dax(dax_dev); + unmap_mapping_range(inode->i_mapping, 0, 0, 1); +} +EXPORT_SYMBOL_GPL(kill_dev_dax); + +static void free_dev_dax_range(struct dev_dax *dev_dax) +{ + struct dax_region *dax_region = dev_dax->region; + struct range *range = &dev_dax->range; + + device_lock_assert(dax_region->dev); + if (range_len(range)) + __release_region(&dax_region->res, range->start, + range_len(range)); +} + +static void unregister_dev_dax(void *dev) +{ + struct dev_dax *dev_dax = to_dev_dax(dev); + + dev_dbg(dev, "%s\n", __func__); + + kill_dev_dax(dev_dax); + free_dev_dax_range(dev_dax); + device_del(dev); + put_device(dev); +} + +/* a return value >= 0 indicates this invocation invalidated the id */ +static int __free_dev_dax_id(struct dev_dax *dev_dax) +{ + struct dax_region *dax_region = dev_dax->region; + struct device *dev = &dev_dax->dev; + int rc = dev_dax->id; + + device_lock_assert(dev); + + if (is_static(dax_region) || dev_dax->id < 0) + return -1; + ida_free(&dax_region->ida, dev_dax->id); + dev_dax->id = -1; + return rc; +} + +static int free_dev_dax_id(struct dev_dax *dev_dax) +{ + struct device *dev = &dev_dax->dev; + int rc; + + device_lock(dev); + rc = __free_dev_dax_id(dev_dax); + device_unlock(dev); + return rc; +} + +static ssize_t delete_store(struct device *dev, struct device_attribute *attr, + const char *buf, size_t len) +{ + struct dax_region *dax_region = dev_get_drvdata(dev); + struct dev_dax *dev_dax; + struct device *victim; + bool do_del = false; + int rc; + + if (is_static(dax_region)) + return -EINVAL; + + victim = device_find_child_by_name(dax_region->dev, buf); + if (!victim) + return -ENXIO; + + device_lock(dev); + device_lock(victim); + dev_dax = to_dev_dax(victim); + if (victim->driver || range_len(&dev_dax->range)) + rc = -EBUSY; + else { + /* + * Invalidate the device so it does not become active + * again, but always preserve device-id-0 so that + * /sys/bus/dax/ is guaranteed to be populated while any + * dax_region is registered. + */ + if (dev_dax->id > 0) { + do_del = __free_dev_dax_id(dev_dax) >= 0; + rc = len; + if (dax_region->seed == victim) + dax_region->seed = NULL; + } else + rc = -EBUSY; + } + device_unlock(victim); + + /* won the race to invalidate the device, clean it up */ + if (do_del) + devm_release_action(dev, unregister_dev_dax, victim); + device_unlock(dev); + put_device(victim); + + return rc; +} +static DEVICE_ATTR_WO(delete); + static umode_t dax_region_visible(struct kobject *kobj, struct attribute *a, int n) { struct device *dev = container_of(kobj, struct device, kobj); struct dax_region *dax_region = dev_get_drvdata(dev); - if (is_static(dax_region) && a == &dev_attr_available_size.attr) - return 0; + if (is_static(dax_region)) + if (a == &dev_attr_available_size.attr + || a == &dev_attr_seed.attr + || a == &dev_attr_create.attr + || a == &dev_attr_delete.attr) + return 0; return a->mode; } @@ -234,6 +451,9 @@ static struct attribute *dax_region_attributes[] = { &dev_attr_available_size.attr, &dev_attr_region_size.attr, &dev_attr_align.attr, + &dev_attr_create.attr, + &dev_attr_delete.attr, + &dev_attr_seed.attr, &dev_attr_id.attr, NULL, }; @@ -302,6 +522,7 @@ struct dax_region *alloc_dax_region(struct device *parent, int region_id, dax_region->align = align; dax_region->dev = parent; dax_region->target_node = target_node; + ida_init(&dax_region->ida); dax_region->res = (struct resource) { .start = res->start, .end = res->end, @@ -329,6 +550,15 @@ static int alloc_dev_dax_range(struct dev_dax *dev_dax, resource_size_t size) device_lock_assert(dax_region->dev); + /* handle the seed alloc special case */ + if (!size) { + dev_dax->range = (struct range) { + .start = res->start, + .end = res->start - 1, + }; + return 0; + } + /* TODO: handle multiple allocations per region */ if (res->child) return -ENOMEM; @@ -430,33 +660,15 @@ static const struct attribute_group *dax_attribute_groups[] = { NULL, }; -void kill_dev_dax(struct dev_dax *dev_dax) -{ - struct dax_device *dax_dev = dev_dax->dax_dev; - struct inode *inode = dax_inode(dax_dev); - - kill_dax(dax_dev); - unmap_mapping_range(inode->i_mapping, 0, 0, 1); -} -EXPORT_SYMBOL_GPL(kill_dev_dax); - -static void free_dev_dax_range(struct dev_dax *dev_dax) -{ - struct dax_region *dax_region = dev_dax->region; - struct range *range = &dev_dax->range; - - device_lock_assert(dax_region->dev); - __release_region(&dax_region->res, range->start, range_len(range)); -} - static void dev_dax_release(struct device *dev) { struct dev_dax *dev_dax = to_dev_dax(dev); struct dax_region *dax_region = dev_dax->region; struct dax_device *dax_dev = dev_dax->dax_dev; - dax_region_put(dax_region); put_dax(dax_dev); + free_dev_dax_id(dev_dax); + dax_region_put(dax_region); kfree(dev_dax->pgmap); kfree(dev_dax); } @@ -466,18 +678,6 @@ static const struct device_type dev_dax_type = { .groups = dax_attribute_groups, }; -static void unregister_dev_dax(void *dev) -{ - struct dev_dax *dev_dax = to_dev_dax(dev); - - dev_dbg(dev, "%s\n", __func__); - - kill_dev_dax(dev_dax); - free_dev_dax_range(dev_dax); - device_del(dev); - put_device(dev); -} - struct dev_dax *devm_create_dev_dax(struct dev_dax_data *data) { struct dax_region *dax_region = data->dax_region; @@ -488,17 +688,35 @@ struct dev_dax *devm_create_dev_dax(struct dev_dax_data *data) struct device *dev; int rc; - if (data->id < 0) - return ERR_PTR(-EINVAL); - dev_dax = kzalloc(sizeof(*dev_dax), GFP_KERNEL); if (!dev_dax) return ERR_PTR(-ENOMEM); + if (is_static(dax_region)) { + if (dev_WARN_ONCE(parent, data->id < 0, + "dynamic id specified to static region\n")) { + rc = -EINVAL; + goto err_id; + } + + dev_dax->id = data->id; + } else { + if (dev_WARN_ONCE(parent, data->id >= 0, + "static id specified to dynamic region\n")) { + rc = -EINVAL; + goto err_id; + } + + rc = ida_alloc(&dax_region->ida, GFP_KERNEL); + if (rc < 0) + goto err_id; + dev_dax->id = rc; + } + dev_dax->region = dax_region; dev = &dev_dax->dev; device_initialize(dev); - dev_set_name(dev, "dax%d.%d", dax_region->id, data->id); + dev_set_name(dev, "dax%d.%d", dax_region->id, dev_dax->id); rc = alloc_dev_dax_range(dev_dax, data->size); if (rc) @@ -557,6 +775,8 @@ struct dev_dax *devm_create_dev_dax(struct dev_dax_data *data) err_pgmap: free_dev_dax_range(dev_dax); err_range: + free_dev_dax_id(dev_dax); +err_id: kfree(dev_dax); return ERR_PTR(rc); diff --git a/drivers/dax/bus.h b/drivers/dax/bus.h index 44592a8cac0f..da27ea70a19a 100644 --- a/drivers/dax/bus.h +++ b/drivers/dax/bus.h @@ -38,6 +38,8 @@ struct dax_device_driver { struct device_driver drv; struct list_head ids; int match_always; + int (*probe)(struct dev_dax *dev); + int (*remove)(struct dev_dax *dev); }; int __dax_driver_register(struct dax_device_driver *dax_drv, @@ -48,7 +50,7 @@ void dax_driver_unregister(struct dax_device_driver *dax_drv); void kill_dev_dax(struct dev_dax *dev_dax); #if IS_ENABLED(CONFIG_DEV_DAX_PMEM_COMPAT) -int dev_dax_probe(struct device *dev); +int dev_dax_probe(struct dev_dax *dev_dax); #endif /* diff --git a/drivers/dax/dax-private.h b/drivers/dax/dax-private.h index 99b1273bb232..496a580384c5 100644 --- a/drivers/dax/dax-private.h +++ b/drivers/dax/dax-private.h @@ -7,6 +7,7 @@ #include #include +#include /* private routines between core files */ struct dax_device; @@ -22,6 +23,8 @@ void dax_bus_exit(void); * @kref: to pin while other agents have a need to do lookups * @dev: parent device backing this region * @align: allocation and mapping alignment for child dax devices + * @ida: instance id allocator + * @seed: for dynamic regions, the most recently created instance * @res: resource tree to track instance allocations */ struct dax_region { @@ -30,6 +33,8 @@ struct dax_region { struct kref kref; struct device *dev; unsigned int align; + struct ida ida; + struct device *seed; struct resource res; }; @@ -39,6 +44,7 @@ struct dax_region { * @region - parent region * @dax_dev - core dax functionality * @target_node: effective numa node if dev_dax memory range is onlined + * @id: ida allocated id * @dev - device core * @pgmap - pgmap for memmap setup / lifetime (driver owned) * @range: resource range for the instance @@ -47,6 +53,7 @@ struct dev_dax { struct dax_region *region; struct dax_device *dax_dev; int target_node; + int id; struct device dev; struct dev_pagemap *pgmap; struct range range; diff --git a/drivers/dax/device.c b/drivers/dax/device.c index d631344fdadb..accbffa07ef1 100644 --- a/drivers/dax/device.c +++ b/drivers/dax/device.c @@ -391,11 +391,11 @@ static void dev_dax_kill(void *dev_dax) kill_dev_dax(dev_dax); } -int dev_dax_probe(struct device *dev) +int dev_dax_probe(struct dev_dax *dev_dax) { - struct dev_dax *dev_dax = to_dev_dax(dev); struct dax_device *dax_dev = dev_dax->dax_dev; struct range *range = &dev_dax->range; + struct device *dev = &dev_dax->dev; struct dev_pagemap *pgmap; struct inode *inode; struct cdev *cdev; @@ -445,17 +445,15 @@ int dev_dax_probe(struct device *dev) } EXPORT_SYMBOL_GPL(dev_dax_probe); -static int dev_dax_remove(struct device *dev) +static int dev_dax_remove(struct dev_dax *dev_dax) { /* all probe actions are unwound by devm */ return 0; } static struct dax_device_driver device_dax_driver = { - .drv = { - .probe = dev_dax_probe, - .remove = dev_dax_remove, - }, + .probe = dev_dax_probe, + .remove = dev_dax_remove, .match_always = 1, }; diff --git a/drivers/dax/hmem/hmem.c b/drivers/dax/hmem/hmem.c index e7b64539e23e..aa260009dfc7 100644 --- a/drivers/dax/hmem/hmem.c +++ b/drivers/dax/hmem/hmem.c @@ -26,7 +26,7 @@ static int dax_hmem_probe(struct platform_device *pdev) data = (struct dev_dax_data) { .dax_region = dax_region, - .id = 0, + .id = -1, .size = resource_size(res), }; dev_dax = devm_create_dev_dax(&data); diff --git a/drivers/dax/kmem.c b/drivers/dax/kmem.c index 2bb7fa0951ef..d41f6ac8b56f 100644 --- a/drivers/dax/kmem.c +++ b/drivers/dax/kmem.c @@ -25,11 +25,11 @@ static struct range dax_kmem_range(struct dev_dax *dev_dax) return range; } -int dev_dax_kmem_probe(struct device *dev) +int dev_dax_kmem_probe(struct dev_dax *dev_dax) { - struct dev_dax *dev_dax = to_dev_dax(dev); struct range range = dax_kmem_range(dev_dax); int numa_node = dev_dax->target_node; + struct device *dev = &dev_dax->dev; struct resource *res; int rc; @@ -96,17 +96,15 @@ static void dax_kmem_release(struct dev_dax *dev_dax) } #endif /* CONFIG_MEMORY_HOTREMOVE */ -static int dev_dax_kmem_remove(struct device *dev) +static int dev_dax_kmem_remove(struct dev_dax *dev_dax) { - dax_kmem_release(to_dev_dax(dev)); + dax_kmem_release(dev_dax); return 0; } static struct dax_device_driver device_dax_kmem_driver = { - .drv = { - .probe = dev_dax_kmem_probe, - .remove = dev_dax_kmem_remove, - }, + .probe = dev_dax_kmem_probe, + .remove = dev_dax_kmem_remove, }; static int __init dax_kmem_init(void) diff --git a/drivers/dax/pmem/compat.c b/drivers/dax/pmem/compat.c index d7b15e6f30c5..863c114fd88c 100644 --- a/drivers/dax/pmem/compat.c +++ b/drivers/dax/pmem/compat.c @@ -22,7 +22,7 @@ static int dax_pmem_compat_probe(struct device *dev) return -ENOMEM; device_lock(&dev_dax->dev); - rc = dev_dax_probe(&dev_dax->dev); + rc = dev_dax_probe(dev_dax); device_unlock(&dev_dax->dev); devres_close_group(&dev_dax->dev, dev_dax); From patchwork Mon Mar 23 23:55:13 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Dan Williams X-Patchwork-Id: 11454277 Return-Path: Received: from mail.kernel.org (pdx-korg-mail-1.web.codeaurora.org [172.30.200.123]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id CB7B614B4 for ; Tue, 24 Mar 2020 00:11:21 +0000 (UTC) Received: from ml01.01.org (ml01.01.org [198.145.21.10]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by mail.kernel.org (Postfix) with ESMTPS id B629620719 for ; Tue, 24 Mar 2020 00:11:21 +0000 (UTC) DMARC-Filter: OpenDMARC Filter v1.3.2 mail.kernel.org B629620719 Authentication-Results: mail.kernel.org; dmarc=fail (p=none dis=none) header.from=intel.com Authentication-Results: mail.kernel.org; spf=none smtp.mailfrom=linux-nvdimm-bounces@lists.01.org Received: from ml01.vlan13.01.org (localhost [IPv6:::1]) by ml01.01.org (Postfix) with ESMTP id E8C4710FC389C; Mon, 23 Mar 2020 17:12:11 -0700 (PDT) Received-SPF: Pass (mailfrom) identity=mailfrom; client-ip=134.134.136.65; helo=mga03.intel.com; envelope-from=dan.j.williams@intel.com; receiver= Received: from mga03.intel.com (mga03.intel.com [134.134.136.65]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by ml01.01.org (Postfix) with ESMTPS id 1626A10FC3899 for ; Mon, 23 Mar 2020 17:12:10 -0700 (PDT) IronPort-SDR: GDWmmZlrmRRkr/YBRP1AY8j3NhuxbJOLPuvFMfHsL/ZfBrqt2VIbaX15XQV48nnu9QtOErLvRf TDZF6Wn7Lnvg== X-Amp-Result: SKIPPED(no attachment in message) X-Amp-File-Uploaded: False Received: from orsmga001.jf.intel.com ([10.7.209.18]) by orsmga103.jf.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 23 Mar 2020 17:11:19 -0700 IronPort-SDR: +yzgMZ4pMuQpT1NE7He3vuTOG2pgPJ7SLhUCST9bTczk+5BUCwe+yqr8TSGaKO3j4lsCilPm5N j5h/viUtpnPw== X-IronPort-AV: E=Sophos;i="5.72,298,1580803200"; d="scan'208";a="325739407" Received: from dwillia2-desk3.jf.intel.com (HELO dwillia2-desk3.amr.corp.intel.com) ([10.54.39.16]) by orsmga001-auth.jf.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 23 Mar 2020 17:11:19 -0700 Subject: [PATCH 07/12] drivers/base: Make device_find_child_by_name() compatible with sysfs inputs From: Dan Williams To: linux-mm@kvack.org Date: Mon, 23 Mar 2020 16:55:13 -0700 Message-ID: <158500771333.2088294.9851442753403203762.stgit@dwillia2-desk3.amr.corp.intel.com> In-Reply-To: <158500767138.2088294.17131646259803932461.stgit@dwillia2-desk3.amr.corp.intel.com> References: <158500767138.2088294.17131646259803932461.stgit@dwillia2-desk3.amr.corp.intel.com> User-Agent: StGit/0.18-3-g996c MIME-Version: 1.0 Message-ID-Hash: Q2Y2HQYWI7PQWGW3S52MCJ6JLFTN3WF5 X-Message-ID-Hash: Q2Y2HQYWI7PQWGW3S52MCJ6JLFTN3WF5 X-MailFrom: dan.j.williams@intel.com X-Mailman-Rule-Misses: dmarc-mitigation; no-senders; approved; emergency; loop; banned-address; member-moderation; nonmember-moderation; administrivia; implicit-dest; max-recipients; max-size; news-moderation; no-subject; suspicious-header CC: dave.hansen@linux.intel.com, hch@lst.de, linux-nvdimm@lists.01.org, linux-kernel@vger.kernel.org X-Mailman-Version: 3.1.1 Precedence: list List-Id: "Linux-nvdimm developer list." Archived-At: List-Archive: List-Help: List-Post: List-Subscribe: List-Unsubscribe: Use sysfs_streq() in device_find_child_by_name() to allow it to use a sysfs input string that might contain a trailing newline. The other "device by name" interfaces, {bus,driver,class}_find_device_by_name(), already account for sysfs strings. Signed-off-by: Dan Williams --- drivers/base/core.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/drivers/base/core.c b/drivers/base/core.c index 139cdf7e7327..4abfd4df42ec 100644 --- a/drivers/base/core.c +++ b/drivers/base/core.c @@ -2931,7 +2931,7 @@ struct device *device_find_child_by_name(struct device *parent, klist_iter_init(&parent->p->klist_children, &i); while ((child = next_device(&i))) - if (!strcmp(dev_name(child), name) && get_device(child)) + if (sysfs_streq(dev_name(child), name) && get_device(child)) break; klist_iter_exit(&i); return child; From patchwork Mon Mar 23 23:55:18 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Dan Williams X-Patchwork-Id: 11454281 Return-Path: Received: from mail.kernel.org (pdx-korg-mail-1.web.codeaurora.org [172.30.200.123]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id E3473913 for ; Tue, 24 Mar 2020 00:11:28 +0000 (UTC) Received: from ml01.01.org (ml01.01.org [198.145.21.10]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by mail.kernel.org (Postfix) with ESMTPS id CDE1F20788 for ; Tue, 24 Mar 2020 00:11:28 +0000 (UTC) DMARC-Filter: OpenDMARC Filter v1.3.2 mail.kernel.org CDE1F20788 Authentication-Results: mail.kernel.org; dmarc=fail (p=none dis=none) header.from=intel.com Authentication-Results: mail.kernel.org; spf=none smtp.mailfrom=linux-nvdimm-bounces@lists.01.org Received: from ml01.vlan13.01.org (localhost [IPv6:::1]) by ml01.01.org (Postfix) with ESMTP id 0C1FE10FC389B; Mon, 23 Mar 2020 17:12:19 -0700 (PDT) Received-SPF: Pass (mailfrom) identity=mailfrom; client-ip=192.55.52.88; helo=mga01.intel.com; envelope-from=dan.j.williams@intel.com; receiver= Received: from mga01.intel.com (mga01.intel.com [192.55.52.88]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by ml01.01.org (Postfix) with ESMTPS id DC16310FC3896 for ; Mon, 23 Mar 2020 17:12:16 -0700 (PDT) IronPort-SDR: 3A/YiZIz0ofNEhgwlV9XfBSeSmF2SueXfirJyNRC5sd1RGemb3IedQchLR0EgsNy9YPQNmDbbC va9ZF7gef/mA== X-Amp-Result: SKIPPED(no attachment in message) X-Amp-File-Uploaded: False Received: from fmsmga001.fm.intel.com ([10.253.24.23]) by fmsmga101.fm.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 23 Mar 2020 17:11:25 -0700 IronPort-SDR: FhihgXpXO734ms4pg404CTxhXTLQKf7BmV1kpTrW2gfHV9JrtYkgwks/y9x1Xlvxu+ddTK5E5o rzV4RXZa++BA== X-IronPort-AV: E=Sophos;i="5.72,298,1580803200"; d="scan'208";a="357253195" Received: from dwillia2-desk3.jf.intel.com (HELO dwillia2-desk3.amr.corp.intel.com) ([10.54.39.16]) by fmsmga001-auth.fm.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 23 Mar 2020 17:11:25 -0700 Subject: [PATCH 08/12] device-dax: Add resize support From: Dan Williams To: linux-mm@kvack.org Date: Mon, 23 Mar 2020 16:55:18 -0700 Message-ID: <158500771845.2088294.637621783660044227.stgit@dwillia2-desk3.amr.corp.intel.com> In-Reply-To: <158500767138.2088294.17131646259803932461.stgit@dwillia2-desk3.amr.corp.intel.com> References: <158500767138.2088294.17131646259803932461.stgit@dwillia2-desk3.amr.corp.intel.com> User-Agent: StGit/0.18-3-g996c MIME-Version: 1.0 Message-ID-Hash: NP7UY5O5DX6GQOJWWX6G2SAFBQK7JVBF X-Message-ID-Hash: NP7UY5O5DX6GQOJWWX6G2SAFBQK7JVBF X-MailFrom: dan.j.williams@intel.com X-Mailman-Rule-Misses: dmarc-mitigation; no-senders; approved; emergency; loop; banned-address; member-moderation; nonmember-moderation; administrivia; implicit-dest; max-recipients; max-size; news-moderation; no-subject; suspicious-header CC: dave.hansen@linux.intel.com, hch@lst.de, linux-nvdimm@lists.01.org, linux-kernel@vger.kernel.org X-Mailman-Version: 3.1.1 Precedence: list List-Id: "Linux-nvdimm developer list." Archived-At: List-Archive: List-Help: List-Post: List-Subscribe: List-Unsubscribe: Make the device-dax 'size' attribute writable to allow capacity to be split between multiple instances in a region. The intended consumers of this capability are users that want to split a scarce memory resource between device-dax and System-RAM access, or users that want to have multiple security domains for a large region. By default the hmem instance provider allocates an entire region to the first instance. The process of creating a new instance (assuming a region-id of 0) is find the region and trigger the 'create' attribute which yields an empty instance to configure. For example: cd /sys/bus/dax/devices echo dax0.0 > dax0.0/driver/unbind echo $new_size > dax0.0/size echo 1 > $(readlink -f dax0.0)../dax_region/create seed=$(cat $(readlink -f dax0.0)../dax_region/seed) echo $new_size > $seed/size echo dax0.0 > ../drivers/{device_dax,kmem}/bind echo dax0.1 > ../drivers/{device_dax,kmem}/bind Instances can be destroyed by: echo $device > $(readlink -f $device)../dax_region/delete Signed-off-by: Dan Williams --- drivers/dax/bus.c | 186 ++++++++++++++++++++++++++++++++++++++++++++++++++--- 1 file changed, 176 insertions(+), 10 deletions(-) diff --git a/drivers/dax/bus.c b/drivers/dax/bus.c index 8db771feed3d..6eb77127bb7d 100644 --- a/drivers/dax/bus.c +++ b/drivers/dax/bus.c @@ -6,6 +6,7 @@ #include #include #include +#include #include "dax-private.h" #include "bus.h" @@ -541,7 +542,8 @@ struct dax_region *alloc_dax_region(struct device *parent, int region_id, } EXPORT_SYMBOL_GPL(alloc_dax_region); -static int alloc_dev_dax_range(struct dev_dax *dev_dax, resource_size_t size) +static int __alloc_dev_dax_range(struct dev_dax *dev_dax, u64 start, + resource_size_t size) { struct dax_region *dax_region = dev_dax->region; struct resource *res = &dax_region->res; @@ -550,8 +552,34 @@ static int alloc_dev_dax_range(struct dev_dax *dev_dax, resource_size_t size) device_lock_assert(dax_region->dev); + if (dev_WARN_ONCE(&dev_dax->dev, !size, "non-zero size required\n")) + return -EINVAL; + + /* allow default @start when the resource tree is empty */ + if (start == U64_MAX && !res->child) + start = res->start; + if (start == U64_MAX) + return -EINVAL; + + alloc = __request_region(res, start, size, dev_name(dev), 0); + if (!alloc) + return -ENOMEM; + + dev_dax->range = (struct range) { + .start = alloc->start, + .end = alloc->end, + }; + + return 0; +} + +static int alloc_dev_dax_range(struct dev_dax *dev_dax, resource_size_t size) +{ /* handle the seed alloc special case */ if (!size) { + struct dax_region *dax_region = dev_dax->region; + struct resource *res = &dax_region->res; + dev_dax->range = (struct range) { .start = res->start, .end = res->start - 1, @@ -559,18 +587,29 @@ static int alloc_dev_dax_range(struct dev_dax *dev_dax, resource_size_t size) return 0; } - /* TODO: handle multiple allocations per region */ - if (res->child) - return -ENOMEM; + return __alloc_dev_dax_range(dev_dax, U64_MAX, size); +} - alloc = __request_region(res, res->start, size, dev_name(dev), 0); +static int __adjust_dev_dax_range(struct dev_dax *dev_dax, struct resource *res, + resource_size_t size) +{ + struct dax_region *dax_region = dev_dax->region; + struct range *range = &dev_dax->range; + int rc = 0; - if (!alloc) - return -ENOMEM; + device_lock_assert(dax_region->dev); + + if (size) + rc = adjust_resource(res, range->start, size); + else + __release_region(&dax_region->res, range->start, + range_len(range)); + if (rc) + return rc; dev_dax->range = (struct range) { - .start = alloc->start, - .end = alloc->end, + .start = range->start, + .end = range->start + size - 1, }; return 0; @@ -584,7 +623,131 @@ static ssize_t size_show(struct device *dev, return sprintf(buf, "%llu\n", size); } -static DEVICE_ATTR_RO(size); + +static bool alloc_is_aligned(struct dax_region *dax_region, + resource_size_t size) +{ + /* + * The minimum mapping granularity for a device instance is a + * single subsection, unless the arch says otherwise. + */ + return IS_ALIGNED(size, max_t(unsigned long, dax_region->align, + memremap_compat_align())); +} + +static int dev_dax_shrink(struct dev_dax *dev_dax, resource_size_t size) +{ + struct dax_region *dax_region = dev_dax->region; + struct range *range = &dev_dax->range; + struct resource *res, *adjust = NULL; + struct device *dev = &dev_dax->dev; + + for_each_dax_region_resource(dax_region, res) + if (strcmp(res->name, dev_name(dev)) == 0 + && res->start == range->start) { + adjust = res; + break; + } + + if (dev_WARN_ONCE(dev, !adjust, "failed to find matching resource\n")) + return -ENXIO; + return __adjust_dev_dax_range(dev_dax, adjust, size); +} + +static ssize_t dev_dax_resize(struct dax_region *dax_region, + struct dev_dax *dev_dax, resource_size_t size) +{ + resource_size_t avail = dax_region_avail_size(dax_region), to_alloc; + resource_size_t dev_size = range_len(&dev_dax->range); + struct resource *region_res = &dax_region->res; + struct device *dev = &dev_dax->dev; + const char *name = dev_name(dev); + struct resource *res, *first; + + if (dev->driver) + return -EBUSY; + if (size == dev_size) + return 0; + if (size > dev_size && size - dev_size > avail) + return -ENOSPC; + if (size < dev_size) + return dev_dax_shrink(dev_dax, size); + + to_alloc = size - dev_size; + if (dev_WARN_ONCE(dev, !alloc_is_aligned(dax_region, to_alloc), + "resize of %pa misaligned\n", &to_alloc)) + return -ENXIO; + + /* + * Expand the device into the unused portion of the region. This + * may involve adjusting the end of an existing resource, or + * allocating a new resource. + */ + first = region_res->child; + if (!first) + return __alloc_dev_dax_range(dev_dax, dax_region->res.start, + to_alloc); + for (res = first; to_alloc && res; res = res->sibling) { + struct resource *next = res->sibling; + resource_size_t free; + + /* space at the beginning of the region */ + free = 0; + if (res == first && res->start > dax_region->res.start) + free = res->start - dax_region->res.start; + if (free >= to_alloc && dev_size == 0) + return __alloc_dev_dax_range(dev_dax, + dax_region->res.start, to_alloc); + + free = 0; + /* space between allocations */ + if (next && next->start > res->end + 1) + free = next->start - res->end + 1; + + /* space at the end of the region */ + if (free < to_alloc && !next && res->end < region_res->end) + free = region_res->end - res->end; + + if (free >= to_alloc && strcmp(name, res->name) == 0) + return __adjust_dev_dax_range(dev_dax, res, + resource_size(res) + to_alloc); + else if (free >= to_alloc && dev_size == 0) + return __alloc_dev_dax_range(dev_dax, res->end + 1, + to_alloc); + } + return -ENOSPC; +} + +static ssize_t size_store(struct device *dev, struct device_attribute *attr, + const char *buf, size_t len) +{ + ssize_t rc; + unsigned long long val; + struct dev_dax *dev_dax = to_dev_dax(dev); + struct dax_region *dax_region = dev_dax->region; + + rc = kstrtoull(buf, 0, &val); + if (rc) + return rc; + + if (!alloc_is_aligned(dax_region, val)) { + dev_dbg(dev, "%s: size: %lld misaligned\n", __func__, val); + return -EINVAL; + } + + device_lock(dax_region->dev); + if (!dax_region->dev->driver) { + device_unlock(dax_region->dev); + return -ENXIO; + } + device_lock(dev); + rc = dev_dax_resize(dax_region, dev_dax, val); + device_unlock(dev); + device_unlock(dax_region->dev); + + return rc == 0 ? len : rc; +} +static DEVICE_ATTR_RW(size); static int dev_dax_target_node(struct dev_dax *dev_dax) { @@ -633,11 +796,14 @@ static umode_t dev_dax_visible(struct kobject *kobj, struct attribute *a, int n) { struct device *dev = container_of(kobj, struct device, kobj); struct dev_dax *dev_dax = to_dev_dax(dev); + struct dax_region *dax_region = dev_dax->region; if (a == &dev_attr_target_node.attr && dev_dax_target_node(dev_dax) < 0) return 0; if (a == &dev_attr_numa_node.attr && !IS_ENABLED(CONFIG_NUMA)) return 0; + if (a == &dev_attr_size.attr && is_static(dax_region)) + return 0444; return a->mode; } From patchwork Mon Mar 23 23:55:24 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Dan Williams X-Patchwork-Id: 11454285 Return-Path: Received: from mail.kernel.org (pdx-korg-mail-1.web.codeaurora.org [172.30.200.123]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id 1D87C913 for ; Tue, 24 Mar 2020 00:11:34 +0000 (UTC) Received: from ml01.01.org (ml01.01.org [198.145.21.10]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by mail.kernel.org (Postfix) with ESMTPS id 087B720787 for ; Tue, 24 Mar 2020 00:11:33 +0000 (UTC) DMARC-Filter: OpenDMARC Filter v1.3.2 mail.kernel.org 087B720787 Authentication-Results: mail.kernel.org; dmarc=fail (p=none dis=none) header.from=intel.com Authentication-Results: mail.kernel.org; spf=none smtp.mailfrom=linux-nvdimm-bounces@lists.01.org Received: from ml01.vlan13.01.org (localhost [IPv6:::1]) by ml01.01.org (Postfix) with ESMTP id 2AB5410FC388C; Mon, 23 Mar 2020 17:12:24 -0700 (PDT) Received-SPF: Pass (mailfrom) identity=mailfrom; client-ip=134.134.136.65; helo=mga03.intel.com; envelope-from=dan.j.williams@intel.com; receiver= Received: from mga03.intel.com (mga03.intel.com [134.134.136.65]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by ml01.01.org (Postfix) with ESMTPS id 7D2A510FC3883 for ; Mon, 23 Mar 2020 17:12:22 -0700 (PDT) IronPort-SDR: NWYm5hMYkvHNyuieHTk/u95NwrI3yGV0jGLu322JxWUrFKDrGrRKXUvNDMiiKV3io9xmPl+gec OhQ23sl4ofUQ== X-Amp-Result: SKIPPED(no attachment in message) X-Amp-File-Uploaded: False Received: from fmsmga008.fm.intel.com ([10.253.24.58]) by orsmga103.jf.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 23 Mar 2020 17:11:31 -0700 IronPort-SDR: 47HWtOBwYyHBN7VY3Lq+J6zBJaAvTrOOvEop78P7j4bHUAEq8TfshyVYicqnn/zAvUyxbWo1/o WyFJBNTko2mA== X-IronPort-AV: E=Sophos;i="5.72,298,1580803200"; d="scan'208";a="239647261" Received: from dwillia2-desk3.jf.intel.com (HELO dwillia2-desk3.amr.corp.intel.com) ([10.54.39.16]) by fmsmga008-auth.fm.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 23 Mar 2020 17:11:30 -0700 Subject: [PATCH 09/12] mm/memremap_pages: Convert to 'struct range' From: Dan Williams To: linux-mm@kvack.org Date: Mon, 23 Mar 2020 16:55:24 -0700 Message-ID: <158500772405.2088294.14397986267725316797.stgit@dwillia2-desk3.amr.corp.intel.com> In-Reply-To: <158500767138.2088294.17131646259803932461.stgit@dwillia2-desk3.amr.corp.intel.com> References: <158500767138.2088294.17131646259803932461.stgit@dwillia2-desk3.amr.corp.intel.com> User-Agent: StGit/0.18-3-g996c MIME-Version: 1.0 Message-ID-Hash: YXBXJNVSRLHIHUKVDBCRJEDRZPRGCHQO X-Message-ID-Hash: YXBXJNVSRLHIHUKVDBCRJEDRZPRGCHQO X-MailFrom: dan.j.williams@intel.com X-Mailman-Rule-Misses: dmarc-mitigation; no-senders; approved; emergency; loop; banned-address; member-moderation; nonmember-moderation; administrivia; implicit-dest; max-recipients; max-size; news-moderation; no-subject; suspicious-header CC: Jason Gunthorpe , Christoph Hellwig , Ben Skeggs , Paul Mackerras , Bjorn Helgaas , Michael Ellerman , dave.hansen@linux.intel.com, linux-nvdimm@lists.01.org, linux-kernel@vger.kernel.org X-Mailman-Version: 3.1.1 Precedence: list List-Id: "Linux-nvdimm developer list." Archived-At: List-Archive: List-Help: List-Post: List-Subscribe: List-Unsubscribe: The 'struct resource' in 'struct dev_pagemap' is only used for holding resource span information. The other fields, 'name', 'flags', 'desc', 'parent', 'sibling', and 'child' are all unused. This is in preparation for introducing a multi-range extension of devm_memremap_pages() where it would unfortunate to multiply that unused data by the number of extents being mapped. The bulk of this change is unwinding all the places internal to libnvdimm that used 'struct resource' unnecessarily. P2PDMA had a minor usage of the flags field, but only to report failures with "%pR". That is replaced with an open coded print of the range. Cc: Jason Gunthorpe Cc: Christoph Hellwig Cc: Ben Skeggs Cc: Ira Weiny Cc: Paul Mackerras Cc: Bjorn Helgaas Cc: Logan Gunthorpe Cc: Michael Ellerman Cc: Vishal Verma Signed-off-by: Dan Williams --- arch/powerpc/kvm/book3s_hv_uvmem.c | 13 +++-- drivers/dax/bus.c | 10 ++-- drivers/dax/bus.h | 2 - drivers/dax/dax-private.h | 5 -- drivers/dax/device.c | 3 - drivers/dax/hmem/hmem.c | 5 ++ drivers/dax/pmem/core.c | 12 ++--- drivers/gpu/drm/nouveau/nouveau_dmem.c | 3 + drivers/nvdimm/badrange.c | 26 +++++------ drivers/nvdimm/claim.c | 13 +++-- drivers/nvdimm/nd.h | 3 + drivers/nvdimm/pfn_devs.c | 12 ++--- drivers/nvdimm/pmem.c | 26 ++++++----- drivers/nvdimm/region.c | 21 +++++---- drivers/pci/p2pdma.c | 11 ++--- include/linux/memremap.h | 5 +- include/linux/range.h | 6 ++ mm/memremap.c | 77 ++++++++++++++++---------------- tools/testing/nvdimm/test/iomap.c | 2 - 19 files changed, 135 insertions(+), 120 deletions(-) diff --git a/arch/powerpc/kvm/book3s_hv_uvmem.c b/arch/powerpc/kvm/book3s_hv_uvmem.c index f44f6b27950f..dd8f2dbdbfaf 100644 --- a/arch/powerpc/kvm/book3s_hv_uvmem.c +++ b/arch/powerpc/kvm/book3s_hv_uvmem.c @@ -329,9 +329,9 @@ static struct page *kvmppc_uvmem_get_page(unsigned long gpa, struct kvm *kvm) struct kvmppc_uvmem_page_pvt *pvt; unsigned long pfn_last, pfn_first; - pfn_first = kvmppc_uvmem_pgmap.res.start >> PAGE_SHIFT; + pfn_first = kvmppc_uvmem_pgmap.range.start >> PAGE_SHIFT; pfn_last = pfn_first + - (resource_size(&kvmppc_uvmem_pgmap.res) >> PAGE_SHIFT); + (range_len(&kvmppc_uvmem_pgmap.range) >> PAGE_SHIFT); spin_lock(&kvmppc_uvmem_bitmap_lock); bit = find_first_zero_bit(kvmppc_uvmem_bitmap, @@ -647,7 +647,7 @@ static vm_fault_t kvmppc_uvmem_migrate_to_ram(struct vm_fault *vmf) static void kvmppc_uvmem_page_free(struct page *page) { unsigned long pfn = page_to_pfn(page) - - (kvmppc_uvmem_pgmap.res.start >> PAGE_SHIFT); + (kvmppc_uvmem_pgmap.range.start >> PAGE_SHIFT); struct kvmppc_uvmem_page_pvt *pvt; spin_lock(&kvmppc_uvmem_bitmap_lock); @@ -778,7 +778,8 @@ int kvmppc_uvmem_init(void) } kvmppc_uvmem_pgmap.type = MEMORY_DEVICE_PRIVATE; - kvmppc_uvmem_pgmap.res = *res; + kvmppc_uvmem_pgmap.range.start = res->start; + kvmppc_uvmem_pgmap.range.end = res->end; kvmppc_uvmem_pgmap.ops = &kvmppc_uvmem_ops; /* just one global instance: */ kvmppc_uvmem_pgmap.owner = &kvmppc_uvmem_pgmap; @@ -810,7 +811,7 @@ int kvmppc_uvmem_init(void) void kvmppc_uvmem_free(void) { memunmap_pages(&kvmppc_uvmem_pgmap); - release_mem_region(kvmppc_uvmem_pgmap.res.start, - resource_size(&kvmppc_uvmem_pgmap.res)); + release_mem_region(kvmppc_uvmem_pgmap.range.start, + range_len(&kvmppc_uvmem_pgmap.range)); kfree(kvmppc_uvmem_bitmap); } diff --git a/drivers/dax/bus.c b/drivers/dax/bus.c index 6eb77127bb7d..9695d70b1e80 100644 --- a/drivers/dax/bus.c +++ b/drivers/dax/bus.c @@ -494,7 +494,7 @@ static void dax_region_unregister(void *region) } struct dax_region *alloc_dax_region(struct device *parent, int region_id, - struct resource *res, int target_node, unsigned int align, + struct range *range, int target_node, unsigned int align, unsigned long flags) { struct dax_region *dax_region; @@ -509,8 +509,8 @@ struct dax_region *alloc_dax_region(struct device *parent, int region_id, return NULL; } - if (!IS_ALIGNED(res->start, align) - || !IS_ALIGNED(resource_size(res), align)) + if (!IS_ALIGNED(range->start, align) + || !IS_ALIGNED(range_len(range), align)) return NULL; dax_region = kzalloc(sizeof(*dax_region), GFP_KERNEL); @@ -525,8 +525,8 @@ struct dax_region *alloc_dax_region(struct device *parent, int region_id, dax_region->target_node = target_node; ida_init(&dax_region->ida); dax_region->res = (struct resource) { - .start = res->start, - .end = res->end, + .start = range->start, + .end = range->end, .flags = IORESOURCE_MEM | flags, }; diff --git a/drivers/dax/bus.h b/drivers/dax/bus.h index da27ea70a19a..72b92f95509f 100644 --- a/drivers/dax/bus.h +++ b/drivers/dax/bus.h @@ -13,7 +13,7 @@ void dax_region_put(struct dax_region *dax_region); #define IORESOURCE_DAX_STATIC (1UL << 0) struct dax_region *alloc_dax_region(struct device *parent, int region_id, - struct resource *res, int target_node, unsigned int align, + struct range *range, int target_node, unsigned int align, unsigned long flags); enum dev_dax_subsys { diff --git a/drivers/dax/dax-private.h b/drivers/dax/dax-private.h index 496a580384c5..17d3b5f94de8 100644 --- a/drivers/dax/dax-private.h +++ b/drivers/dax/dax-private.h @@ -59,11 +59,6 @@ struct dev_dax { struct range range; }; -static inline u64 range_len(struct range *range) -{ - return range->end - range->start + 1; -} - static inline struct dev_dax *to_dev_dax(struct device *dev) { return container_of(dev, struct dev_dax, dev); diff --git a/drivers/dax/device.c b/drivers/dax/device.c index accbffa07ef1..3c9c5136295c 100644 --- a/drivers/dax/device.c +++ b/drivers/dax/device.c @@ -415,8 +415,7 @@ int dev_dax_probe(struct dev_dax *dev_dax) pgmap = devm_kzalloc(dev, sizeof(*pgmap), GFP_KERNEL); if (!pgmap) return -ENOMEM; - pgmap->res.start = range->start; - pgmap->res.end = range->end; + pgmap->range = *range; } pgmap->type = MEMORY_DEVICE_DEVDAX; addr = devm_memremap_pages(dev, pgmap); diff --git a/drivers/dax/hmem/hmem.c b/drivers/dax/hmem/hmem.c index aa260009dfc7..1a3347bb6143 100644 --- a/drivers/dax/hmem/hmem.c +++ b/drivers/dax/hmem/hmem.c @@ -13,13 +13,16 @@ static int dax_hmem_probe(struct platform_device *pdev) struct dev_dax_data data; struct dev_dax *dev_dax; struct resource *res; + struct range range; res = platform_get_resource(pdev, IORESOURCE_MEM, 0); if (!res) return -ENOMEM; mri = dev->platform_data; - dax_region = alloc_dax_region(dev, pdev->id, res, mri->target_node, + range.start = res->start; + range.end = res->end; + dax_region = alloc_dax_region(dev, pdev->id, &range, mri->target_node, PMD_SIZE, 0); if (!dax_region) return -ENOMEM; diff --git a/drivers/dax/pmem/core.c b/drivers/dax/pmem/core.c index 4fe700884338..62b26bfceab1 100644 --- a/drivers/dax/pmem/core.c +++ b/drivers/dax/pmem/core.c @@ -9,7 +9,7 @@ struct dev_dax *__dax_pmem_probe(struct device *dev, enum dev_dax_subsys subsys) { - struct resource res; + struct range range; int rc, id, region_id; resource_size_t offset; struct nd_pfn_sb *pfn_sb; @@ -50,10 +50,10 @@ struct dev_dax *__dax_pmem_probe(struct device *dev, enum dev_dax_subsys subsys) if (rc != 2) return ERR_PTR(-EINVAL); - /* adjust the dax_region resource to the start of data */ - memcpy(&res, &pgmap.res, sizeof(res)); - res.start += offset; - dax_region = alloc_dax_region(dev, region_id, &res, + /* adjust the dax_region range to the start of data */ + range = pgmap.range; + range.start += offset, + dax_region = alloc_dax_region(dev, region_id, &range, nd_region->target_node, le32_to_cpu(pfn_sb->align), IORESOURCE_DAX_STATIC); if (!dax_region) @@ -64,7 +64,7 @@ struct dev_dax *__dax_pmem_probe(struct device *dev, enum dev_dax_subsys subsys) .id = id, .pgmap = &pgmap, .subsys = subsys, - .size = resource_size(&res), + .size = range_len(&range), }; dev_dax = devm_create_dev_dax(&data); diff --git a/drivers/gpu/drm/nouveau/nouveau_dmem.c b/drivers/gpu/drm/nouveau/nouveau_dmem.c index ad89e09a0be3..06faa819dbf3 100644 --- a/drivers/gpu/drm/nouveau/nouveau_dmem.c +++ b/drivers/gpu/drm/nouveau/nouveau_dmem.c @@ -526,7 +526,8 @@ nouveau_dmem_init(struct nouveau_drm *drm) if (IS_ERR(res)) goto out_free; drm->dmem->pagemap.type = MEMORY_DEVICE_PRIVATE; - drm->dmem->pagemap.res = *res; + drm->dmem->pagemap.range.start = res->start; + drm->dmem->pagemap.range.end = res->end; drm->dmem->pagemap.ops = &nouveau_dmem_pagemap_ops; drm->dmem->pagemap.owner = drm->dev; if (IS_ERR(devm_memremap_pages(device, &drm->dmem->pagemap))) diff --git a/drivers/nvdimm/badrange.c b/drivers/nvdimm/badrange.c index b9eeefa27e3a..aaf6e215a8c6 100644 --- a/drivers/nvdimm/badrange.c +++ b/drivers/nvdimm/badrange.c @@ -211,7 +211,7 @@ static void __add_badblock_range(struct badblocks *bb, u64 ns_offset, u64 len) } static void badblocks_populate(struct badrange *badrange, - struct badblocks *bb, const struct resource *res) + struct badblocks *bb, const struct range *range) { struct badrange_entry *bre; @@ -222,34 +222,34 @@ static void badblocks_populate(struct badrange *badrange, u64 bre_end = bre->start + bre->length - 1; /* Discard intervals with no intersection */ - if (bre_end < res->start) + if (bre_end < range->start) continue; - if (bre->start > res->end) + if (bre->start > range->end) continue; /* Deal with any overlap after start of the namespace */ - if (bre->start >= res->start) { + if (bre->start >= range->start) { u64 start = bre->start; u64 len; - if (bre_end <= res->end) + if (bre_end <= range->end) len = bre->length; else - len = res->start + resource_size(res) + len = range->start + range_len(range) - bre->start; - __add_badblock_range(bb, start - res->start, len); + __add_badblock_range(bb, start - range->start, len); continue; } /* * Deal with overlap for badrange starting before * the namespace. */ - if (bre->start < res->start) { + if (bre->start < range->start) { u64 len; - if (bre_end < res->end) - len = bre->start + bre->length - res->start; + if (bre_end < range->end) + len = bre->start + bre->length - range->start; else - len = resource_size(res); + len = range_len(range); __add_badblock_range(bb, 0, len); } } @@ -267,7 +267,7 @@ static void badblocks_populate(struct badrange *badrange, * and add badblocks entries for all matching sub-ranges */ void nvdimm_badblocks_populate(struct nd_region *nd_region, - struct badblocks *bb, const struct resource *res) + struct badblocks *bb, const struct range *range) { struct nvdimm_bus *nvdimm_bus; @@ -279,7 +279,7 @@ void nvdimm_badblocks_populate(struct nd_region *nd_region, nvdimm_bus = walk_to_nvdimm_bus(&nd_region->dev); nvdimm_bus_lock(&nvdimm_bus->dev); - badblocks_populate(&nvdimm_bus->badrange, bb, res); + badblocks_populate(&nvdimm_bus->badrange, bb, range); nvdimm_bus_unlock(&nvdimm_bus->dev); } EXPORT_SYMBOL_GPL(nvdimm_badblocks_populate); diff --git a/drivers/nvdimm/claim.c b/drivers/nvdimm/claim.c index 45964acba944..290267e1ff9f 100644 --- a/drivers/nvdimm/claim.c +++ b/drivers/nvdimm/claim.c @@ -303,13 +303,16 @@ static int nsio_rw_bytes(struct nd_namespace_common *ndns, int devm_nsio_enable(struct device *dev, struct nd_namespace_io *nsio, resource_size_t size) { - struct resource *res = &nsio->res; struct nd_namespace_common *ndns = &nsio->common; + struct range range = { + .start = nsio->res.start, + .end = nsio->res.end, + }; nsio->size = size; - if (!devm_request_mem_region(dev, res->start, size, + if (!devm_request_mem_region(dev, range.start, size, dev_name(&ndns->dev))) { - dev_warn(dev, "could not reserve region %pR\n", res); + dev_warn(dev, "could not reserve region %pR\n", &nsio->res); return -EBUSY; } @@ -317,9 +320,9 @@ int devm_nsio_enable(struct device *dev, struct nd_namespace_io *nsio, if (devm_init_badblocks(dev, &nsio->bb)) return -ENOMEM; nvdimm_badblocks_populate(to_nd_region(ndns->dev.parent), &nsio->bb, - &nsio->res); + &range); - nsio->addr = devm_memremap(dev, res->start, size, ARCH_MEMREMAP_PMEM); + nsio->addr = devm_memremap(dev, range.start, size, ARCH_MEMREMAP_PMEM); return PTR_ERR_OR_ZERO(nsio->addr); } diff --git a/drivers/nvdimm/nd.h b/drivers/nvdimm/nd.h index c4d69c1cce55..6d420d855097 100644 --- a/drivers/nvdimm/nd.h +++ b/drivers/nvdimm/nd.h @@ -377,8 +377,9 @@ int nvdimm_namespace_detach_btt(struct nd_btt *nd_btt); const char *nvdimm_namespace_disk_name(struct nd_namespace_common *ndns, char *name); unsigned int pmem_sector_size(struct nd_namespace_common *ndns); +struct range; void nvdimm_badblocks_populate(struct nd_region *nd_region, - struct badblocks *bb, const struct resource *res); + struct badblocks *bb, const struct range *range); int devm_namespace_enable(struct device *dev, struct nd_namespace_common *ndns, resource_size_t size); void devm_namespace_disable(struct device *dev, diff --git a/drivers/nvdimm/pfn_devs.c b/drivers/nvdimm/pfn_devs.c index 34db557dbad1..7ad9ea107810 100644 --- a/drivers/nvdimm/pfn_devs.c +++ b/drivers/nvdimm/pfn_devs.c @@ -672,7 +672,7 @@ static unsigned long init_altmap_reserve(resource_size_t base) static int __nvdimm_setup_pfn(struct nd_pfn *nd_pfn, struct dev_pagemap *pgmap) { - struct resource *res = &pgmap->res; + struct range *range = &pgmap->range; struct vmem_altmap *altmap = &pgmap->altmap; struct nd_pfn_sb *pfn_sb = nd_pfn->pfn_sb; u64 offset = le64_to_cpu(pfn_sb->dataoff); @@ -689,16 +689,16 @@ static int __nvdimm_setup_pfn(struct nd_pfn *nd_pfn, struct dev_pagemap *pgmap) .end_pfn = PHYS_PFN(end), }; - memcpy(res, &nsio->res, sizeof(*res)); - res->start += start_pad; - res->end -= end_trunc; - + *range = (struct range) { + .start = nsio->res.start + start_pad, + .end = nsio->res.end - end_trunc, + }; if (nd_pfn->mode == PFN_MODE_RAM) { if (offset < reserve) return -EINVAL; nd_pfn->npfns = le64_to_cpu(pfn_sb->npfns); } else if (nd_pfn->mode == PFN_MODE_PMEM) { - nd_pfn->npfns = PHYS_PFN((resource_size(res) - offset)); + nd_pfn->npfns = PHYS_PFN((range_len(range) - offset)); if (le64_to_cpu(nd_pfn->pfn_sb->npfns) > nd_pfn->npfns) dev_info(&nd_pfn->dev, "number of pfns truncated from %lld to %ld\n", diff --git a/drivers/nvdimm/pmem.c b/drivers/nvdimm/pmem.c index 4eae441f86c9..b292b6cb763c 100644 --- a/drivers/nvdimm/pmem.c +++ b/drivers/nvdimm/pmem.c @@ -349,7 +349,7 @@ static int pmem_attach_disk(struct device *dev, struct nd_region *nd_region = to_nd_region(dev->parent); int nid = dev_to_node(dev), fua; struct resource *res = &nsio->res; - struct resource bb_res; + struct range bb_range; struct nd_pfn *nd_pfn = NULL; struct dax_device *dax_dev; struct nd_pfn_sb *pfn_sb; @@ -408,24 +408,26 @@ static int pmem_attach_disk(struct device *dev, pfn_sb = nd_pfn->pfn_sb; pmem->data_offset = le64_to_cpu(pfn_sb->dataoff); pmem->pfn_pad = resource_size(res) - - resource_size(&pmem->pgmap.res); + range_len(&pmem->pgmap.range); pmem->pfn_flags |= PFN_MAP; - memcpy(&bb_res, &pmem->pgmap.res, sizeof(bb_res)); - bb_res.start += pmem->data_offset; + bb_range = pmem->pgmap.range; + bb_range.start += pmem->data_offset; } else if (pmem_should_map_pages(dev)) { - memcpy(&pmem->pgmap.res, &nsio->res, sizeof(pmem->pgmap.res)); + pmem->pgmap.range.start = res->start; + pmem->pgmap.range.end = res->end; pmem->pgmap.type = MEMORY_DEVICE_FS_DAX; pmem->pgmap.ops = &fsdax_pagemap_ops; addr = devm_memremap_pages(dev, &pmem->pgmap); pmem->pfn_flags |= PFN_MAP; - memcpy(&bb_res, &pmem->pgmap.res, sizeof(bb_res)); + bb_range = pmem->pgmap.range; } else { if (devm_add_action_or_reset(dev, pmem_release_queue, &pmem->pgmap)) return -ENOMEM; addr = devm_memremap(dev, pmem->phys_addr, pmem->size, ARCH_MEMREMAP_PMEM); - memcpy(&bb_res, &nsio->res, sizeof(bb_res)); + bb_range.start = res->start; + bb_range.end = res->end; } if (IS_ERR(addr)) @@ -456,7 +458,7 @@ static int pmem_attach_disk(struct device *dev, / 512); if (devm_init_badblocks(dev, &pmem->bb)) return -ENOMEM; - nvdimm_badblocks_populate(nd_region, &pmem->bb, &bb_res); + nvdimm_badblocks_populate(nd_region, &pmem->bb, &bb_range); disk->bb = &pmem->bb; if (is_nvdimm_sync(nd_region)) @@ -567,8 +569,8 @@ static void nd_pmem_notify(struct device *dev, enum nvdimm_event event) resource_size_t offset = 0, end_trunc = 0; struct nd_namespace_common *ndns; struct nd_namespace_io *nsio; - struct resource res; struct badblocks *bb; + struct range range; struct kernfs_node *bb_state; if (event != NVDIMM_REVALIDATE_POISON) @@ -604,9 +606,9 @@ static void nd_pmem_notify(struct device *dev, enum nvdimm_event event) nsio = to_nd_namespace_io(&ndns->dev); } - res.start = nsio->res.start + offset; - res.end = nsio->res.end - end_trunc; - nvdimm_badblocks_populate(nd_region, bb, &res); + range.start = nsio->res.start + offset; + range.end = nsio->res.end - end_trunc; + nvdimm_badblocks_populate(nd_region, bb, &range); if (bb_state) sysfs_notify_dirent(bb_state); } diff --git a/drivers/nvdimm/region.c b/drivers/nvdimm/region.c index 0f6978e72e7c..bfce87ed72ab 100644 --- a/drivers/nvdimm/region.c +++ b/drivers/nvdimm/region.c @@ -35,7 +35,10 @@ static int nd_region_probe(struct device *dev) return rc; if (is_memory(&nd_region->dev)) { - struct resource ndr_res; + struct range range = { + .start = nd_region->ndr_start, + .end = nd_region->ndr_start + nd_region->ndr_size - 1, + }; if (devm_init_badblocks(dev, &nd_region->bb)) return -ENODEV; @@ -44,9 +47,7 @@ static int nd_region_probe(struct device *dev) if (!nd_region->bb_state) dev_warn(&nd_region->dev, "'badblocks' notification disabled\n"); - ndr_res.start = nd_region->ndr_start; - ndr_res.end = nd_region->ndr_start + nd_region->ndr_size - 1; - nvdimm_badblocks_populate(nd_region, &nd_region->bb, &ndr_res); + nvdimm_badblocks_populate(nd_region, &nd_region->bb, &range); } rc = nd_region_register_namespaces(nd_region, &err); @@ -121,14 +122,16 @@ static void nd_region_notify(struct device *dev, enum nvdimm_event event) { if (event == NVDIMM_REVALIDATE_POISON) { struct nd_region *nd_region = to_nd_region(dev); - struct resource res; if (is_memory(&nd_region->dev)) { - res.start = nd_region->ndr_start; - res.end = nd_region->ndr_start + - nd_region->ndr_size - 1; + struct range range = { + .start = nd_region->ndr_start, + .end = nd_region->ndr_start + + nd_region->ndr_size - 1, + }; + nvdimm_badblocks_populate(nd_region, - &nd_region->bb, &res); + &nd_region->bb, &range); if (nd_region->bb_state) sysfs_notify_dirent(nd_region->bb_state); } diff --git a/drivers/pci/p2pdma.c b/drivers/pci/p2pdma.c index b73b10bce0df..39a7b32cc9ba 100644 --- a/drivers/pci/p2pdma.c +++ b/drivers/pci/p2pdma.c @@ -185,9 +185,8 @@ int pci_p2pdma_add_resource(struct pci_dev *pdev, int bar, size_t size, return -ENOMEM; pgmap = &p2p_pgmap->pgmap; - pgmap->res.start = pci_resource_start(pdev, bar) + offset; - pgmap->res.end = pgmap->res.start + size - 1; - pgmap->res.flags = pci_resource_flags(pdev, bar); + pgmap->range.start = pci_resource_start(pdev, bar) + offset; + pgmap->range.end = pgmap->range.start + size - 1; pgmap->type = MEMORY_DEVICE_PCI_P2PDMA; p2p_pgmap->provider = pdev; @@ -202,13 +201,13 @@ int pci_p2pdma_add_resource(struct pci_dev *pdev, int bar, size_t size, error = gen_pool_add_owner(pdev->p2pdma->pool, (unsigned long)addr, pci_bus_address(pdev, bar) + offset, - resource_size(&pgmap->res), dev_to_node(&pdev->dev), + range_len(&pgmap->range), dev_to_node(&pdev->dev), pgmap->ref); if (error) goto pages_free; - pci_info(pdev, "added peer-to-peer DMA memory %pR\n", - &pgmap->res); + pci_info(pdev, "added peer-to-peer DMA memory %#llx-%#llx\n", + pgmap->range.start, pgmap->range.end); return 0; diff --git a/include/linux/memremap.h b/include/linux/memremap.h index fac75e5a723b..6c21951bdb16 100644 --- a/include/linux/memremap.h +++ b/include/linux/memremap.h @@ -1,6 +1,7 @@ /* SPDX-License-Identifier: GPL-2.0 */ #ifndef _LINUX_MEMREMAP_H_ #define _LINUX_MEMREMAP_H_ +#include #include #include @@ -94,7 +95,7 @@ struct dev_pagemap_ops { /** * struct dev_pagemap - metadata for ZONE_DEVICE mappings * @altmap: pre-allocated/reserved memory for vmemmap allocations - * @res: physical address range covered by @ref + * @range: physical address range covered by @ref * @ref: reference count that pins the devm_memremap_pages() mapping * @internal_ref: internal reference if @ref is not provided by the caller * @done: completion for @internal_ref @@ -107,7 +108,7 @@ struct dev_pagemap_ops { */ struct dev_pagemap { struct vmem_altmap altmap; - struct resource res; + struct range range; struct percpu_ref *ref; struct percpu_ref internal_ref; struct completion done; diff --git a/include/linux/range.h b/include/linux/range.h index d1fbeb664012..274681cc3154 100644 --- a/include/linux/range.h +++ b/include/linux/range.h @@ -1,12 +1,18 @@ /* SPDX-License-Identifier: GPL-2.0 */ #ifndef _LINUX_RANGE_H #define _LINUX_RANGE_H +#include struct range { u64 start; u64 end; }; +static inline u64 range_len(const struct range *range) +{ + return range->end - range->start + 1; +} + int add_range(struct range *range, int az, int nr_range, u64 start, u64 end); diff --git a/mm/memremap.c b/mm/memremap.c index 8afcc54c8928..9979891fec78 100644 --- a/mm/memremap.c +++ b/mm/memremap.c @@ -70,24 +70,24 @@ static void devmap_managed_enable_put(void) } #endif /* CONFIG_DEV_PAGEMAP_OPS */ -static void pgmap_array_delete(struct resource *res) +static void pgmap_array_delete(struct range *range) { - xa_store_range(&pgmap_array, PHYS_PFN(res->start), PHYS_PFN(res->end), + xa_store_range(&pgmap_array, PHYS_PFN(range->start), PHYS_PFN(range->end), NULL, GFP_KERNEL); synchronize_rcu(); } static unsigned long pfn_first(struct dev_pagemap *pgmap) { - return PHYS_PFN(pgmap->res.start) + + return PHYS_PFN(pgmap->range.start) + vmem_altmap_offset(pgmap_altmap(pgmap)); } static unsigned long pfn_end(struct dev_pagemap *pgmap) { - const struct resource *res = &pgmap->res; + const struct range *range = &pgmap->range; - return (res->start + resource_size(res)) >> PAGE_SHIFT; + return (range->start + range_len(range)) >> PAGE_SHIFT; } static unsigned long pfn_next(unsigned long pfn) @@ -146,7 +146,7 @@ static void dev_pagemap_cleanup(struct dev_pagemap *pgmap) void memunmap_pages(struct dev_pagemap *pgmap) { - struct resource *res = &pgmap->res; + struct range *range = &pgmap->range; struct page *first_page; unsigned long pfn; int nid; @@ -163,20 +163,20 @@ void memunmap_pages(struct dev_pagemap *pgmap) nid = page_to_nid(first_page); mem_hotplug_begin(); - remove_pfn_range_from_zone(page_zone(first_page), PHYS_PFN(res->start), - PHYS_PFN(resource_size(res))); + remove_pfn_range_from_zone(page_zone(first_page), PHYS_PFN(range->start), + PHYS_PFN(range_len(range))); if (pgmap->type == MEMORY_DEVICE_PRIVATE) { - __remove_pages(PHYS_PFN(res->start), - PHYS_PFN(resource_size(res)), NULL); + __remove_pages(PHYS_PFN(range->start), + PHYS_PFN(range_len(range)), NULL); } else { - arch_remove_memory(nid, res->start, resource_size(res), + arch_remove_memory(nid, range->start, range_len(range), pgmap_altmap(pgmap)); - kasan_remove_zero_shadow(__va(res->start), resource_size(res)); + kasan_remove_zero_shadow(__va(range->start), range_len(range)); } mem_hotplug_done(); - untrack_pfn(NULL, PHYS_PFN(res->start), resource_size(res)); - pgmap_array_delete(res); + untrack_pfn(NULL, PHYS_PFN(range->start), range_len(range)); + pgmap_array_delete(range); WARN_ONCE(pgmap->altmap.alloc, "failed to free all reserved pages\n"); devmap_managed_enable_put(); } @@ -202,7 +202,7 @@ static void dev_pagemap_percpu_release(struct percpu_ref *ref) */ void *memremap_pages(struct dev_pagemap *pgmap, int nid) { - struct resource *res = &pgmap->res; + struct range *range = &pgmap->range; struct dev_pagemap *conflict_pgmap; struct mhp_params params = { /* @@ -271,7 +271,7 @@ void *memremap_pages(struct dev_pagemap *pgmap, int nid) return ERR_PTR(error); } - conflict_pgmap = get_dev_pagemap(PHYS_PFN(res->start), NULL); + conflict_pgmap = get_dev_pagemap(PHYS_PFN(range->start), NULL); if (conflict_pgmap) { WARN(1, "Conflicting mapping in same section\n"); put_dev_pagemap(conflict_pgmap); @@ -279,7 +279,7 @@ void *memremap_pages(struct dev_pagemap *pgmap, int nid) goto err_array; } - conflict_pgmap = get_dev_pagemap(PHYS_PFN(res->end), NULL); + conflict_pgmap = get_dev_pagemap(PHYS_PFN(range->end), NULL); if (conflict_pgmap) { WARN(1, "Conflicting mapping in same section\n"); put_dev_pagemap(conflict_pgmap); @@ -287,26 +287,27 @@ void *memremap_pages(struct dev_pagemap *pgmap, int nid) goto err_array; } - is_ram = region_intersects(res->start, resource_size(res), + is_ram = region_intersects(range->start, range_len(range), IORESOURCE_SYSTEM_RAM, IORES_DESC_NONE); if (is_ram != REGION_DISJOINT) { - WARN_ONCE(1, "%s attempted on %s region %pr\n", __func__, - is_ram == REGION_MIXED ? "mixed" : "ram", res); + WARN_ONCE(1, "attempted on %s region %#llx-%#llx\n", + is_ram == REGION_MIXED ? "mixed" : "ram", + range->start, range->end); error = -ENXIO; goto err_array; } - error = xa_err(xa_store_range(&pgmap_array, PHYS_PFN(res->start), - PHYS_PFN(res->end), pgmap, GFP_KERNEL)); + error = xa_err(xa_store_range(&pgmap_array, PHYS_PFN(range->start), + PHYS_PFN(range->end), pgmap, GFP_KERNEL)); if (error) goto err_array; if (nid < 0) nid = numa_mem_id(); - error = track_pfn_remap(NULL, ¶ms.pgprot, PHYS_PFN(res->start), - 0, resource_size(res)); + error = track_pfn_remap(NULL, ¶ms.pgprot, PHYS_PFN(range->start), 0, + range_len(range)); if (error) goto err_pfn_remap; @@ -324,16 +325,16 @@ void *memremap_pages(struct dev_pagemap *pgmap, int nid) * arch_add_memory(). */ if (pgmap->type == MEMORY_DEVICE_PRIVATE) { - error = add_pages(nid, PHYS_PFN(res->start), - PHYS_PFN(resource_size(res)), ¶ms); + error = add_pages(nid, PHYS_PFN(range->start), + PHYS_PFN(range_len(range)), ¶ms); } else { - error = kasan_add_zero_shadow(__va(res->start), resource_size(res)); + error = kasan_add_zero_shadow(__va(range->start), range_len(range)); if (error) { mem_hotplug_done(); goto err_kasan; } - error = arch_add_memory(nid, res->start, resource_size(res), + error = arch_add_memory(nid, range->start, range_len(range), ¶ms); } @@ -341,8 +342,8 @@ void *memremap_pages(struct dev_pagemap *pgmap, int nid) struct zone *zone; zone = &NODE_DATA(nid)->node_zones[ZONE_DEVICE]; - move_pfn_range_to_zone(zone, PHYS_PFN(res->start), - PHYS_PFN(resource_size(res)), params.altmap); + move_pfn_range_to_zone(zone, PHYS_PFN(range->start), + PHYS_PFN(range_len(range)), params.altmap); } mem_hotplug_done(); @@ -354,17 +355,17 @@ void *memremap_pages(struct dev_pagemap *pgmap, int nid) * to allow us to do the work while not holding the hotplug lock. */ memmap_init_zone_device(&NODE_DATA(nid)->node_zones[ZONE_DEVICE], - PHYS_PFN(res->start), - PHYS_PFN(resource_size(res)), pgmap); + PHYS_PFN(range->start), + PHYS_PFN(range_len(range)), pgmap); percpu_ref_get_many(pgmap->ref, pfn_end(pgmap) - pfn_first(pgmap)); - return __va(res->start); + return __va(range->start); err_add_memory: - kasan_remove_zero_shadow(__va(res->start), resource_size(res)); + kasan_remove_zero_shadow(__va(range->start), range_len(range)); err_kasan: - untrack_pfn(NULL, PHYS_PFN(res->start), resource_size(res)); + untrack_pfn(NULL, PHYS_PFN(range->start), range_len(range)); err_pfn_remap: - pgmap_array_delete(res); + pgmap_array_delete(range); err_array: dev_pagemap_kill(pgmap); dev_pagemap_cleanup(pgmap); @@ -389,7 +390,7 @@ EXPORT_SYMBOL_GPL(memremap_pages); * 'live' on entry and will be killed and reaped at * devm_memremap_pages_release() time, or if this routine fails. * - * 4/ res is expected to be a host memory range that could feasibly be + * 4/ range is expected to be a host memory range that could feasibly be * treated as a "System RAM" range, i.e. not a device mmio range, but * this is not enforced. */ @@ -446,7 +447,7 @@ struct dev_pagemap *get_dev_pagemap(unsigned long pfn, * In the cached case we're already holding a live reference. */ if (pgmap) { - if (phys >= pgmap->res.start && phys <= pgmap->res.end) + if (phys >= pgmap->range.start && phys <= pgmap->range.end) return pgmap; put_dev_pagemap(pgmap); } diff --git a/tools/testing/nvdimm/test/iomap.c b/tools/testing/nvdimm/test/iomap.c index 03e40b3b0106..c62d372d426f 100644 --- a/tools/testing/nvdimm/test/iomap.c +++ b/tools/testing/nvdimm/test/iomap.c @@ -126,7 +126,7 @@ static void dev_pagemap_percpu_release(struct percpu_ref *ref) void *__wrap_devm_memremap_pages(struct device *dev, struct dev_pagemap *pgmap) { int error; - resource_size_t offset = pgmap->res.start; + resource_size_t offset = pgmap->range.start; struct nfit_test_resource *nfit_res = get_nfit_res(offset); if (!nfit_res) From patchwork Mon Mar 23 23:55:30 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Dan Williams X-Patchwork-Id: 11454289 Return-Path: Received: from mail.kernel.org (pdx-korg-mail-1.web.codeaurora.org [172.30.200.123]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id 2BCB2913 for ; Tue, 24 Mar 2020 00:11:40 +0000 (UTC) Received: from ml01.01.org (ml01.01.org [198.145.21.10]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by mail.kernel.org (Postfix) with ESMTPS id 15D3A20719 for ; Tue, 24 Mar 2020 00:11:40 +0000 (UTC) DMARC-Filter: OpenDMARC Filter v1.3.2 mail.kernel.org 15D3A20719 Authentication-Results: mail.kernel.org; dmarc=fail (p=none dis=none) header.from=intel.com Authentication-Results: mail.kernel.org; spf=none smtp.mailfrom=linux-nvdimm-bounces@lists.01.org Received: from ml01.vlan13.01.org (localhost [IPv6:::1]) by ml01.01.org (Postfix) with ESMTP id 5131910FC389E; Mon, 23 Mar 2020 17:12:30 -0700 (PDT) Received-SPF: Pass (mailfrom) identity=mailfrom; client-ip=192.55.52.120; helo=mga04.intel.com; envelope-from=dan.j.williams@intel.com; receiver= Received: from mga04.intel.com (mga04.intel.com [192.55.52.120]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by ml01.01.org (Postfix) with ESMTPS id A3DA810FC3891 for ; Mon, 23 Mar 2020 17:12:27 -0700 (PDT) IronPort-SDR: 9JNo1nHGs4WlIcstDdjtvJUvja4pJxO3o9dQrHwDk9VoAQwACcI+LjnnYwZ0CK4K6X7DVaNI8r Pb68w8r9l82Q== X-Amp-Result: SKIPPED(no attachment in message) X-Amp-File-Uploaded: False Received: from orsmga005.jf.intel.com ([10.7.209.41]) by fmsmga104.fm.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 23 Mar 2020 17:11:36 -0700 IronPort-SDR: KjgHOrcd/oRwTcD/wVrHgqnToYQLG0WjTy3LApEVDaNydROuWzpRjrhaucp5PEaz9/f2wsFMXq +6/bHxlBDhEw== X-IronPort-AV: E=Sophos;i="5.72,298,1580803200"; d="scan'208";a="419689148" Received: from dwillia2-desk3.jf.intel.com (HELO dwillia2-desk3.amr.corp.intel.com) ([10.54.39.16]) by orsmga005-auth.jf.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 23 Mar 2020 17:11:36 -0700 Subject: [PATCH 10/12] mm/memremap_pages: Support multiple ranges per invocation From: Dan Williams To: linux-mm@kvack.org Date: Mon, 23 Mar 2020 16:55:30 -0700 Message-ID: <158500773040.2088294.10425417905649995321.stgit@dwillia2-desk3.amr.corp.intel.com> In-Reply-To: <158500767138.2088294.17131646259803932461.stgit@dwillia2-desk3.amr.corp.intel.com> References: <158500767138.2088294.17131646259803932461.stgit@dwillia2-desk3.amr.corp.intel.com> User-Agent: StGit/0.18-3-g996c MIME-Version: 1.0 Message-ID-Hash: JTNSHRLIF54SHWW5HRZAD2M7V3ZPEHEA X-Message-ID-Hash: JTNSHRLIF54SHWW5HRZAD2M7V3ZPEHEA X-MailFrom: dan.j.williams@intel.com X-Mailman-Rule-Misses: dmarc-mitigation; no-senders; approved; emergency; loop; banned-address; member-moderation; nonmember-moderation; administrivia; implicit-dest; max-recipients; max-size; news-moderation; no-subject; suspicious-header CC: Jason Gunthorpe , Christoph Hellwig , Ben Skeggs , Paul Mackerras , Bjorn Helgaas , Michael Ellerman , dave.hansen@linux.intel.com, linux-nvdimm@lists.01.org, linux-kernel@vger.kernel.org X-Mailman-Version: 3.1.1 Precedence: list List-Id: "Linux-nvdimm developer list." Archived-At: List-Archive: List-Help: List-Post: List-Subscribe: List-Unsubscribe: In support of device-dax growing the ability to front physically dis-contiguous ranges of memory, update devm_memremap_pages() to track multiple ranges with a single reference counter and devm instance. Cc: Jason Gunthorpe Cc: Christoph Hellwig Cc: Ben Skeggs Cc: Ira Weiny Cc: Paul Mackerras Cc: Bjorn Helgaas Cc: Logan Gunthorpe Cc: Michael Ellerman Cc: Vishal Verma Signed-off-by: Dan Williams --- arch/powerpc/kvm/book3s_hv_uvmem.c | 1 drivers/dax/device.c | 1 drivers/gpu/drm/nouveau/nouveau_dmem.c | 1 drivers/nvdimm/pfn_devs.c | 1 drivers/nvdimm/pmem.c | 1 drivers/pci/p2pdma.c | 1 include/linux/memremap.h | 8 + mm/memremap.c | 264 +++++++++++++++++++------------- 8 files changed, 167 insertions(+), 111 deletions(-) diff --git a/arch/powerpc/kvm/book3s_hv_uvmem.c b/arch/powerpc/kvm/book3s_hv_uvmem.c index dd8f2dbdbfaf..ee56512465ac 100644 --- a/arch/powerpc/kvm/book3s_hv_uvmem.c +++ b/arch/powerpc/kvm/book3s_hv_uvmem.c @@ -780,6 +780,7 @@ int kvmppc_uvmem_init(void) kvmppc_uvmem_pgmap.type = MEMORY_DEVICE_PRIVATE; kvmppc_uvmem_pgmap.range.start = res->start; kvmppc_uvmem_pgmap.range.end = res->end; + kvmppc_uvmem_pgmap.nr_range = 1; kvmppc_uvmem_pgmap.ops = &kvmppc_uvmem_ops; /* just one global instance: */ kvmppc_uvmem_pgmap.owner = &kvmppc_uvmem_pgmap; diff --git a/drivers/dax/device.c b/drivers/dax/device.c index 3c9c5136295c..9752f7b6cfe0 100644 --- a/drivers/dax/device.c +++ b/drivers/dax/device.c @@ -416,6 +416,7 @@ int dev_dax_probe(struct dev_dax *dev_dax) if (!pgmap) return -ENOMEM; pgmap->range = *range; + pgmap->nr_range = 1; } pgmap->type = MEMORY_DEVICE_DEVDAX; addr = devm_memremap_pages(dev, pgmap); diff --git a/drivers/gpu/drm/nouveau/nouveau_dmem.c b/drivers/gpu/drm/nouveau/nouveau_dmem.c index 06faa819dbf3..a1033034c67f 100644 --- a/drivers/gpu/drm/nouveau/nouveau_dmem.c +++ b/drivers/gpu/drm/nouveau/nouveau_dmem.c @@ -528,6 +528,7 @@ nouveau_dmem_init(struct nouveau_drm *drm) drm->dmem->pagemap.type = MEMORY_DEVICE_PRIVATE; drm->dmem->pagemap.range.start = res->start; drm->dmem->pagemap.range.end = res->end; + drm->dmem->pagemap.nr_range = 1; drm->dmem->pagemap.ops = &nouveau_dmem_pagemap_ops; drm->dmem->pagemap.owner = drm->dev; if (IS_ERR(devm_memremap_pages(device, &drm->dmem->pagemap))) diff --git a/drivers/nvdimm/pfn_devs.c b/drivers/nvdimm/pfn_devs.c index 7ad9ea107810..88a1aabe0657 100644 --- a/drivers/nvdimm/pfn_devs.c +++ b/drivers/nvdimm/pfn_devs.c @@ -693,6 +693,7 @@ static int __nvdimm_setup_pfn(struct nd_pfn *nd_pfn, struct dev_pagemap *pgmap) .start = nsio->res.start + start_pad, .end = nsio->res.end - end_trunc, }; + pgmap->nr_range = 1; if (nd_pfn->mode == PFN_MODE_RAM) { if (offset < reserve) return -EINVAL; diff --git a/drivers/nvdimm/pmem.c b/drivers/nvdimm/pmem.c index b292b6cb763c..699910b12300 100644 --- a/drivers/nvdimm/pmem.c +++ b/drivers/nvdimm/pmem.c @@ -415,6 +415,7 @@ static int pmem_attach_disk(struct device *dev, } else if (pmem_should_map_pages(dev)) { pmem->pgmap.range.start = res->start; pmem->pgmap.range.end = res->end; + pmem->pgmap.nr_range = 1; pmem->pgmap.type = MEMORY_DEVICE_FS_DAX; pmem->pgmap.ops = &fsdax_pagemap_ops; addr = devm_memremap_pages(dev, &pmem->pgmap); diff --git a/drivers/pci/p2pdma.c b/drivers/pci/p2pdma.c index 39a7b32cc9ba..c55de6619b91 100644 --- a/drivers/pci/p2pdma.c +++ b/drivers/pci/p2pdma.c @@ -187,6 +187,7 @@ int pci_p2pdma_add_resource(struct pci_dev *pdev, int bar, size_t size, pgmap = &p2p_pgmap->pgmap; pgmap->range.start = pci_resource_start(pdev, bar) + offset; pgmap->range.end = pgmap->range.start + size - 1; + pgmap->nr_range = 1; pgmap->type = MEMORY_DEVICE_PCI_P2PDMA; p2p_pgmap->provider = pdev; diff --git a/include/linux/memremap.h b/include/linux/memremap.h index 6c21951bdb16..cdd25c837056 100644 --- a/include/linux/memremap.h +++ b/include/linux/memremap.h @@ -95,7 +95,6 @@ struct dev_pagemap_ops { /** * struct dev_pagemap - metadata for ZONE_DEVICE mappings * @altmap: pre-allocated/reserved memory for vmemmap allocations - * @range: physical address range covered by @ref * @ref: reference count that pins the devm_memremap_pages() mapping * @internal_ref: internal reference if @ref is not provided by the caller * @done: completion for @internal_ref @@ -105,10 +104,10 @@ struct dev_pagemap_ops { * @owner: an opaque pointer identifying the entity that manages this * instance. Used by various helpers to make sure that no * foreign ZONE_DEVICE memory is accessed. + * @range(s): physical address range(s) covered by @ref */ struct dev_pagemap { struct vmem_altmap altmap; - struct range range; struct percpu_ref *ref; struct percpu_ref internal_ref; struct completion done; @@ -116,6 +115,11 @@ struct dev_pagemap { unsigned int flags; const struct dev_pagemap_ops *ops; void *owner; + int nr_range; + union { + struct range range; + struct range ranges[1]; + }; }; static inline struct vmem_altmap *pgmap_altmap(struct dev_pagemap *pgmap) diff --git a/mm/memremap.c b/mm/memremap.c index 9979891fec78..f2c1a20f12f8 100644 --- a/mm/memremap.c +++ b/mm/memremap.c @@ -77,15 +77,19 @@ static void pgmap_array_delete(struct range *range) synchronize_rcu(); } -static unsigned long pfn_first(struct dev_pagemap *pgmap) +static unsigned long pfn_first(struct dev_pagemap *pgmap, int range_id) { - return PHYS_PFN(pgmap->range.start) + - vmem_altmap_offset(pgmap_altmap(pgmap)); + struct range *range = &pgmap->ranges[range_id]; + unsigned long pfn = PHYS_PFN(range->start); + + if (range_id) + return pfn; + return pfn + vmem_altmap_offset(pgmap_altmap(pgmap)); } -static unsigned long pfn_end(struct dev_pagemap *pgmap) +static unsigned long pfn_end(struct dev_pagemap *pgmap, int range_id) { - const struct range *range = &pgmap->range; + const struct range *range = &pgmap->ranges[range_id]; return (range->start + range_len(range)) >> PAGE_SHIFT; } @@ -117,8 +121,9 @@ bool pfn_zone_device_reserved(unsigned long pfn) return ret; } -#define for_each_device_pfn(pfn, map) \ - for (pfn = pfn_first(map); pfn < pfn_end(map); pfn = pfn_next(pfn)) +#define for_each_device_pfn(pfn, map, i) \ + for (pfn = pfn_first(map, i); pfn < pfn_end(map, i); \ + pfn = pfn_next(pfn)) static void dev_pagemap_kill(struct dev_pagemap *pgmap) { @@ -144,20 +149,14 @@ static void dev_pagemap_cleanup(struct dev_pagemap *pgmap) pgmap->ref = NULL; } -void memunmap_pages(struct dev_pagemap *pgmap) +static void pageunmap_range(struct dev_pagemap *pgmap, int range_id) { - struct range *range = &pgmap->range; + struct range *range = &pgmap->ranges[range_id]; struct page *first_page; - unsigned long pfn; int nid; - dev_pagemap_kill(pgmap); - for_each_device_pfn(pfn, pgmap) - put_page(pfn_to_page(pfn)); - dev_pagemap_cleanup(pgmap); - /* make sure to access a memmap that was actually initialized */ - first_page = pfn_to_page(pfn_first(pgmap)); + first_page = pfn_to_page(pfn_first(pgmap, range_id)); /* pages are dead and unused, undo the arch mapping */ nid = page_to_nid(first_page); @@ -177,6 +176,22 @@ void memunmap_pages(struct dev_pagemap *pgmap) untrack_pfn(NULL, PHYS_PFN(range->start), range_len(range)); pgmap_array_delete(range); +} + +void memunmap_pages(struct dev_pagemap *pgmap) +{ + unsigned long pfn; + int i; + + dev_pagemap_kill(pgmap); + for (i = 0; i < pgmap->nr_range; i++) + for_each_device_pfn(pfn, pgmap, i) + put_page(pfn_to_page(pfn)); + dev_pagemap_cleanup(pgmap); + + for (i = 0; i < pgmap->nr_range; i++) + pageunmap_range(pgmap, i); + WARN_ONCE(pgmap->altmap.alloc, "failed to free all reserved pages\n"); devmap_managed_enable_put(); } @@ -195,96 +210,29 @@ static void dev_pagemap_percpu_release(struct percpu_ref *ref) complete(&pgmap->done); } -/* - * Not device managed version of dev_memremap_pages, undone by - * memunmap_pages(). Please use dev_memremap_pages if you have a struct - * device available. - */ -void *memremap_pages(struct dev_pagemap *pgmap, int nid) +static int pagemap_range(struct dev_pagemap *pgmap, struct mhp_params *params, + int range_id, int nid) { - struct range *range = &pgmap->range; + struct range *range = &pgmap->ranges[range_id]; struct dev_pagemap *conflict_pgmap; - struct mhp_params params = { - /* - * We do not want any optional features only our own memmap - */ - .altmap = pgmap_altmap(pgmap), - .pgprot = PAGE_KERNEL, - }; int error, is_ram; - bool need_devmap_managed = true; - - switch (pgmap->type) { - case MEMORY_DEVICE_PRIVATE: - if (!IS_ENABLED(CONFIG_DEVICE_PRIVATE)) { - WARN(1, "Device private memory not supported\n"); - return ERR_PTR(-EINVAL); - } - if (!pgmap->ops || !pgmap->ops->migrate_to_ram) { - WARN(1, "Missing migrate_to_ram method\n"); - return ERR_PTR(-EINVAL); - } - if (!pgmap->owner) { - WARN(1, "Missing owner\n"); - return ERR_PTR(-EINVAL); - } - break; - case MEMORY_DEVICE_FS_DAX: - if (!IS_ENABLED(CONFIG_ZONE_DEVICE) || - IS_ENABLED(CONFIG_FS_DAX_LIMITED)) { - WARN(1, "File system DAX not supported\n"); - return ERR_PTR(-EINVAL); - } - break; - case MEMORY_DEVICE_DEVDAX: - need_devmap_managed = false; - break; - case MEMORY_DEVICE_PCI_P2PDMA: - params.pgprot = pgprot_noncached(params.pgprot); - need_devmap_managed = false; - break; - default: - WARN(1, "Invalid pgmap type %d\n", pgmap->type); - break; - } - - if (!pgmap->ref) { - if (pgmap->ops && (pgmap->ops->kill || pgmap->ops->cleanup)) - return ERR_PTR(-EINVAL); - - init_completion(&pgmap->done); - error = percpu_ref_init(&pgmap->internal_ref, - dev_pagemap_percpu_release, 0, GFP_KERNEL); - if (error) - return ERR_PTR(error); - pgmap->ref = &pgmap->internal_ref; - } else { - if (!pgmap->ops || !pgmap->ops->kill || !pgmap->ops->cleanup) { - WARN(1, "Missing reference count teardown definition\n"); - return ERR_PTR(-EINVAL); - } - } - if (need_devmap_managed) { - error = devmap_managed_enable_get(pgmap); - if (error) - return ERR_PTR(error); - } + if (WARN_ONCE(pgmap_altmap(pgmap) && range_id > 0, + "altmap not supported for multiple ranges\n")) + return -EINVAL; conflict_pgmap = get_dev_pagemap(PHYS_PFN(range->start), NULL); if (conflict_pgmap) { WARN(1, "Conflicting mapping in same section\n"); put_dev_pagemap(conflict_pgmap); - error = -ENOMEM; - goto err_array; + return -ENOMEM; } conflict_pgmap = get_dev_pagemap(PHYS_PFN(range->end), NULL); if (conflict_pgmap) { WARN(1, "Conflicting mapping in same section\n"); put_dev_pagemap(conflict_pgmap); - error = -ENOMEM; - goto err_array; + return -ENOMEM; } is_ram = region_intersects(range->start, range_len(range), @@ -294,20 +242,19 @@ void *memremap_pages(struct dev_pagemap *pgmap, int nid) WARN_ONCE(1, "attempted on %s region %#llx-%#llx\n", is_ram == REGION_MIXED ? "mixed" : "ram", range->start, range->end); - error = -ENXIO; - goto err_array; + return -ENXIO; } error = xa_err(xa_store_range(&pgmap_array, PHYS_PFN(range->start), PHYS_PFN(range->end), pgmap, GFP_KERNEL)); if (error) - goto err_array; + return error; if (nid < 0) nid = numa_mem_id(); - error = track_pfn_remap(NULL, ¶ms.pgprot, PHYS_PFN(range->start), 0, - range_len(range)); + error = track_pfn_remap(NULL, ¶ms->pgprot, PHYS_PFN(range->start), + 0, range_len(range)); if (error) goto err_pfn_remap; @@ -326,7 +273,7 @@ void *memremap_pages(struct dev_pagemap *pgmap, int nid) */ if (pgmap->type == MEMORY_DEVICE_PRIVATE) { error = add_pages(nid, PHYS_PFN(range->start), - PHYS_PFN(range_len(range)), ¶ms); + PHYS_PFN(range_len(range)), params); } else { error = kasan_add_zero_shadow(__va(range->start), range_len(range)); if (error) { @@ -335,7 +282,7 @@ void *memremap_pages(struct dev_pagemap *pgmap, int nid) } error = arch_add_memory(nid, range->start, range_len(range), - ¶ms); + params); } if (!error) { @@ -343,7 +290,7 @@ void *memremap_pages(struct dev_pagemap *pgmap, int nid) zone = &NODE_DATA(nid)->node_zones[ZONE_DEVICE]; move_pfn_range_to_zone(zone, PHYS_PFN(range->start), - PHYS_PFN(range_len(range)), params.altmap); + PHYS_PFN(range_len(range)), params->altmap); } mem_hotplug_done(); @@ -357,20 +304,119 @@ void *memremap_pages(struct dev_pagemap *pgmap, int nid) memmap_init_zone_device(&NODE_DATA(nid)->node_zones[ZONE_DEVICE], PHYS_PFN(range->start), PHYS_PFN(range_len(range)), pgmap); - percpu_ref_get_many(pgmap->ref, pfn_end(pgmap) - pfn_first(pgmap)); - return __va(range->start); + percpu_ref_get_many(pgmap->ref, pfn_end(pgmap, range_id) + - pfn_first(pgmap, range_id)); + return 0; - err_add_memory: +err_add_memory: kasan_remove_zero_shadow(__va(range->start), range_len(range)); - err_kasan: +err_kasan: untrack_pfn(NULL, PHYS_PFN(range->start), range_len(range)); - err_pfn_remap: +err_pfn_remap: pgmap_array_delete(range); - err_array: - dev_pagemap_kill(pgmap); - dev_pagemap_cleanup(pgmap); - devmap_managed_enable_put(); - return ERR_PTR(error); + return error; +} + + +/* + * Not device managed version of dev_memremap_pages, undone by + * memunmap_pages(). Please use dev_memremap_pages if you have a struct + * device available. + */ +void *memremap_pages(struct dev_pagemap *pgmap, int nid) +{ + struct mhp_params params = { + /* + * We do not want any optional features only our own memmap + */ + .altmap = pgmap_altmap(pgmap), + .pgprot = PAGE_KERNEL, + }; + const int nr_range = pgmap->nr_range; + bool need_devmap_managed = true; + int error, i; + + if (WARN_ONCE(!nr_range, "nr_range must be specified\n")) + return ERR_PTR(-EINVAL); + + switch (pgmap->type) { + case MEMORY_DEVICE_PRIVATE: + if (!IS_ENABLED(CONFIG_DEVICE_PRIVATE)) { + WARN(1, "Device private memory not supported\n"); + return ERR_PTR(-EINVAL); + } + if (!pgmap->ops || !pgmap->ops->migrate_to_ram) { + WARN(1, "Missing migrate_to_ram method\n"); + return ERR_PTR(-EINVAL); + } + if (!pgmap->owner) { + WARN(1, "Missing owner\n"); + return ERR_PTR(-EINVAL); + } + break; + case MEMORY_DEVICE_FS_DAX: + if (!IS_ENABLED(CONFIG_ZONE_DEVICE) || + IS_ENABLED(CONFIG_FS_DAX_LIMITED)) { + WARN(1, "File system DAX not supported\n"); + return ERR_PTR(-EINVAL); + } + break; + case MEMORY_DEVICE_DEVDAX: + need_devmap_managed = false; + break; + case MEMORY_DEVICE_PCI_P2PDMA: + params.pgprot = pgprot_noncached(params.pgprot); + need_devmap_managed = false; + break; + default: + WARN(1, "Invalid pgmap type %d\n", pgmap->type); + break; + } + + if (!pgmap->ref) { + if (pgmap->ops && (pgmap->ops->kill || pgmap->ops->cleanup)) + return ERR_PTR(-EINVAL); + + init_completion(&pgmap->done); + error = percpu_ref_init(&pgmap->internal_ref, + dev_pagemap_percpu_release, 0, GFP_KERNEL); + if (error) + return ERR_PTR(error); + pgmap->ref = &pgmap->internal_ref; + } else { + if (!pgmap->ops || !pgmap->ops->kill || !pgmap->ops->cleanup) { + WARN(1, "Missing reference count teardown definition\n"); + return ERR_PTR(-EINVAL); + } + } + + if (need_devmap_managed) { + error = devmap_managed_enable_get(pgmap); + if (error) + return ERR_PTR(error); + } + + /* + * Clear the pgmap nr_range as it will be incremented for each + * successfully processed range. This communicates how many + * regions to unwind in the abort case. + */ + pgmap->nr_range = 0; + error = 0; + for (i = 0; i < nr_range; i++) { + error = pagemap_range(pgmap, ¶ms, i, nid); + if (error) + break; + pgmap->nr_range++; + } + + if (i < nr_range) { + memunmap_pages(pgmap); + pgmap->nr_range = nr_range; + return ERR_PTR(error); + } + + return __va(pgmap->ranges[0].start); } EXPORT_SYMBOL_GPL(memremap_pages); From patchwork Mon Mar 23 23:55:35 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Dan Williams X-Patchwork-Id: 11454293 Return-Path: Received: from mail.kernel.org (pdx-korg-mail-1.web.codeaurora.org [172.30.200.123]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id 46A3614B4 for ; Tue, 24 Mar 2020 00:11:44 +0000 (UTC) Received: from ml01.01.org (ml01.01.org [198.145.21.10]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by mail.kernel.org (Postfix) with ESMTPS id 3073D20719 for ; Tue, 24 Mar 2020 00:11:44 +0000 (UTC) DMARC-Filter: OpenDMARC Filter v1.3.2 mail.kernel.org 3073D20719 Authentication-Results: mail.kernel.org; dmarc=fail (p=none dis=none) header.from=intel.com Authentication-Results: mail.kernel.org; spf=none smtp.mailfrom=linux-nvdimm-bounces@lists.01.org Received: from ml01.vlan13.01.org (localhost [IPv6:::1]) by ml01.01.org (Postfix) with ESMTP id 6BA8910FC38A3; Mon, 23 Mar 2020 17:12:34 -0700 (PDT) Received-SPF: Pass (mailfrom) identity=mailfrom; client-ip=134.134.136.126; helo=mga18.intel.com; envelope-from=dan.j.williams@intel.com; receiver= Received: from mga18.intel.com (mga18.intel.com [134.134.136.126]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by ml01.01.org (Postfix) with ESMTPS id 561A710FC38A1 for ; Mon, 23 Mar 2020 17:12:32 -0700 (PDT) IronPort-SDR: lxxicFOxlQTfmkq1hlTniYXD7XnNfsfYCCo9hWfbkryv//45kw3QxHs7Ehwgyp4vkmgiTHJXxv LY1ekyBAVRIw== X-Amp-Result: SKIPPED(no attachment in message) X-Amp-File-Uploaded: False Received: from orsmga004.jf.intel.com ([10.7.209.38]) by orsmga106.jf.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 23 Mar 2020 17:11:41 -0700 IronPort-SDR: II9mWSQjXBt6vxQ3hwFbd6KPXmXcIQuLLceKQr4Fs/SI1UsS/7/LsFnK5CoGIYE6WDzohEY6Qc 2wY0cI0dxH5w== X-IronPort-AV: E=Sophos;i="5.72,298,1580803200"; d="scan'208";a="393091929" Received: from dwillia2-desk3.jf.intel.com (HELO dwillia2-desk3.amr.corp.intel.com) ([10.54.39.16]) by orsmga004-auth.jf.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 23 Mar 2020 17:11:41 -0700 Subject: [PATCH 11/12] device-dax: Add dis-contiguous resource support From: Dan Williams To: linux-mm@kvack.org Date: Mon, 23 Mar 2020 16:55:35 -0700 Message-ID: <158500773552.2088294.8756587190550753100.stgit@dwillia2-desk3.amr.corp.intel.com> In-Reply-To: <158500767138.2088294.17131646259803932461.stgit@dwillia2-desk3.amr.corp.intel.com> References: <158500767138.2088294.17131646259803932461.stgit@dwillia2-desk3.amr.corp.intel.com> User-Agent: StGit/0.18-3-g996c MIME-Version: 1.0 Message-ID-Hash: QPZMY5KK36LRWSUP5ZUCVNB535HTNIYK X-Message-ID-Hash: QPZMY5KK36LRWSUP5ZUCVNB535HTNIYK X-MailFrom: dan.j.williams@intel.com X-Mailman-Rule-Misses: dmarc-mitigation; no-senders; approved; emergency; loop; banned-address; member-moderation; nonmember-moderation; administrivia; implicit-dest; max-recipients; max-size; news-moderation; no-subject; suspicious-header CC: Joao Martins , dave.hansen@linux.intel.com, hch@lst.de, linux-nvdimm@lists.01.org, linux-kernel@vger.kernel.org X-Mailman-Version: 3.1.1 Precedence: list List-Id: "Linux-nvdimm developer list." Archived-At: List-Archive: List-Help: List-Post: List-Subscribe: List-Unsubscribe: Break the requirement that device-dax instances are physically contiguous. With this constraint removed it allows fragmented available capacity to be fully allocated. It can be used to help with memory-side-cache management for virtual machines, or any other scenario where a platform address boundary also designates a performance boundary. For example a direct mapped memory side cache might rotate cache colors at 1GB boundaries. With dis-contiguous allocations a device-dax instance could be configured to contain only 1 cache color. It also satisfies Joao's use case: Link: https://lore.kernel.org/lkml/20200110190313.17144-1-joao.m.martins@oracle.com/ Reported-by: Joao Martins Signed-off-by: Dan Williams --- drivers/dax/bus.c | 230 +++++++++++++++++++++++++++++----------- drivers/dax/dax-private.h | 9 +- drivers/dax/device.c | 55 ++++++---- drivers/dax/kmem.c | 98 ++++++++++++----- tools/testing/nvdimm/dax-dev.c | 20 ++- 5 files changed, 289 insertions(+), 123 deletions(-) diff --git a/drivers/dax/bus.c b/drivers/dax/bus.c index 9695d70b1e80..07aeb8fa9ee8 100644 --- a/drivers/dax/bus.c +++ b/drivers/dax/bus.c @@ -136,15 +136,27 @@ static bool is_static(struct dax_region *dax_region) return (dax_region->res.flags & IORESOURCE_DAX_STATIC) != 0; } +static u64 dev_dax_size(struct dev_dax *dev_dax) +{ + u64 size = 0; + int i; + + device_lock_assert(&dev_dax->dev); + + for (i = 0; i < dev_dax->nr_range; i++) + size += range_len(&dev_dax->ranges[i].range); + + return size; +} + static int dax_bus_probe(struct device *dev) { struct dax_device_driver *dax_drv = to_dax_drv(dev->driver); struct dev_dax *dev_dax = to_dev_dax(dev); struct dax_region *dax_region = dev_dax->region; - struct range *range = &dev_dax->range; int rc; - if (range_len(range) == 0 || dev_dax->id < 0) + if (dev_dax_size(dev_dax) == 0 || dev_dax->id < 0) return -ENXIO; rc = dax_drv->probe(dev_dax); @@ -335,15 +347,19 @@ void kill_dev_dax(struct dev_dax *dev_dax) } EXPORT_SYMBOL_GPL(kill_dev_dax); -static void free_dev_dax_range(struct dev_dax *dev_dax) +static void free_dev_dax_ranges(struct dev_dax *dev_dax) { struct dax_region *dax_region = dev_dax->region; - struct range *range = &dev_dax->range; + int i; device_lock_assert(dax_region->dev); - if (range_len(range)) + for (i = 0; i < dev_dax->nr_range; i++) { + struct range *range = &dev_dax->ranges[i].range; + __release_region(&dax_region->res, range->start, range_len(range)); + } + dev_dax->nr_range = 0; } static void unregister_dev_dax(void *dev) @@ -353,7 +369,7 @@ static void unregister_dev_dax(void *dev) dev_dbg(dev, "%s\n", __func__); kill_dev_dax(dev_dax); - free_dev_dax_range(dev_dax); + free_dev_dax_ranges(dev_dax); device_del(dev); put_device(dev); } @@ -404,7 +420,7 @@ static ssize_t delete_store(struct device *dev, struct device_attribute *attr, device_lock(dev); device_lock(victim); dev_dax = to_dev_dax(victim); - if (victim->driver || range_len(&dev_dax->range)) + if (victim->driver || dev_dax_size(dev_dax)) rc = -EBUSY; else { /* @@ -548,7 +564,10 @@ static int __alloc_dev_dax_range(struct dev_dax *dev_dax, u64 start, struct dax_region *dax_region = dev_dax->region; struct resource *res = &dax_region->res; struct device *dev = &dev_dax->dev; + struct dev_dax_range *ranges; + unsigned long pgoff = 0; struct resource *alloc; + int i; device_lock_assert(dax_region->dev); @@ -561,13 +580,26 @@ static int __alloc_dev_dax_range(struct dev_dax *dev_dax, u64 start, if (start == U64_MAX) return -EINVAL; + ranges = krealloc(dev_dax->ranges, sizeof(*ranges) + * (dev_dax->nr_range + 1), GFP_KERNEL); + if (!ranges) + return -ENOMEM; + alloc = __request_region(res, start, size, dev_name(dev), 0); - if (!alloc) + if (!alloc) { + kfree(ranges); return -ENOMEM; + } - dev_dax->range = (struct range) { - .start = alloc->start, - .end = alloc->end, + for (i = 0; i < dev_dax->nr_range; i++) + pgoff += PHYS_PFN(range_len(&ranges[i].range)); + dev_dax->ranges = ranges; + ranges[dev_dax->nr_range++] = (struct dev_dax_range) { + .pgoff = pgoff, + .range = { + .start = alloc->start, + .end = alloc->end, + }, }; return 0; @@ -577,13 +609,10 @@ static int alloc_dev_dax_range(struct dev_dax *dev_dax, resource_size_t size) { /* handle the seed alloc special case */ if (!size) { - struct dax_region *dax_region = dev_dax->region; - struct resource *res = &dax_region->res; - - dev_dax->range = (struct range) { - .start = res->start, - .end = res->start - 1, - }; + if (dev_WARN_ONCE(&dev_dax->dev, dev_dax->nr_range, + "0-size allocation must be first\n")) + return -EBUSY; + /* special case dev_dax->nr_range == 0 as 0 size */ return 0; } @@ -593,21 +622,23 @@ static int alloc_dev_dax_range(struct dev_dax *dev_dax, resource_size_t size) static int __adjust_dev_dax_range(struct dev_dax *dev_dax, struct resource *res, resource_size_t size) { + int last_range = dev_dax->nr_range - 1; + struct dev_dax_range *dax_range = &dev_dax->ranges[last_range]; struct dax_region *dax_region = dev_dax->region; - struct range *range = &dev_dax->range; - int rc = 0; + struct range *range = &dax_range->range; + int rc; device_lock_assert(dax_region->dev); - if (size) - rc = adjust_resource(res, range->start, size); - else - __release_region(&dax_region->res, range->start, - range_len(range)); + if (dev_WARN_ONCE(&dev_dax->dev, !size, + "deletion is handled by dev_dax_shrink\n")) + return -EINVAL; + + rc = adjust_resource(res, range->start, size); if (rc) return rc; - dev_dax->range = (struct range) { + *range = (struct range) { .start = range->start, .end = range->start + size - 1, }; @@ -619,7 +650,11 @@ static ssize_t size_show(struct device *dev, struct device_attribute *attr, char *buf) { struct dev_dax *dev_dax = to_dev_dax(dev); - unsigned long long size = range_len(&dev_dax->range); + unsigned long long size; + + device_lock(dev); + size = dev_dax_size(dev_dax); + device_unlock(dev); return sprintf(buf, "%llu\n", size); } @@ -637,32 +672,79 @@ static bool alloc_is_aligned(struct dax_region *dax_region, static int dev_dax_shrink(struct dev_dax *dev_dax, resource_size_t size) { + resource_size_t to_shrink = dev_dax_size(dev_dax) - size; struct dax_region *dax_region = dev_dax->region; - struct range *range = &dev_dax->range; - struct resource *res, *adjust = NULL; struct device *dev = &dev_dax->dev; - - for_each_dax_region_resource(dax_region, res) - if (strcmp(res->name, dev_name(dev)) == 0 - && res->start == range->start) { - adjust = res; - break; + int i; + + for (i = dev_dax->nr_range - 1; i >= 0; i--) { + struct range *range = &dev_dax->ranges[i].range; + struct resource *adjust = NULL, *res; + resource_size_t shrink; + + shrink = min(to_shrink, range_len(range)); + if (shrink >= range_len(range)) { + __release_region(&dax_region->res, range->start, + range_len(range)); + dev_dax->nr_range--; + to_shrink -= shrink; + if (!to_shrink) + break; + continue; } - if (dev_WARN_ONCE(dev, !adjust, "failed to find matching resource\n")) - return -ENXIO; - return __adjust_dev_dax_range(dev_dax, adjust, size); + for_each_dax_region_resource(dax_region, res) + if (strcmp(res->name, dev_name(dev)) == 0 + && res->start == range->start) { + adjust = res; + break; + } + + if (dev_WARN_ONCE(dev, !adjust || i != dev_dax->nr_range - 1, + "failed to find matching resource\n")) + return -ENXIO; + return __adjust_dev_dax_range(dev_dax, adjust, range_len(range) + - shrink); + } + return 0; +} + +/* + * Only allow adjustments that preserve the relative pgoff of existing + * allocations. I.e. the dev_dax->ranges array is ordered by increasing pgoff. + */ +static bool adjust_ok(struct dev_dax *dev_dax, struct resource *res) +{ + struct dev_dax_range *last; + int i; + + if (dev_dax->nr_range == 0) + return false; + if (strcmp(res->name, dev_name(&dev_dax->dev)) != 0) + return false; + last = &dev_dax->ranges[dev_dax->nr_range - 1]; + if (last->range.start != res->start || last->range.end != res->end) + return false; + for (i = 0; i < dev_dax->nr_range - 1; i++) { + struct dev_dax_range *dax_range = &dev_dax->ranges[i]; + + if (dax_range->pgoff > last->pgoff) + return false; + } + + return true; } static ssize_t dev_dax_resize(struct dax_region *dax_region, struct dev_dax *dev_dax, resource_size_t size) { resource_size_t avail = dax_region_avail_size(dax_region), to_alloc; - resource_size_t dev_size = range_len(&dev_dax->range); + resource_size_t dev_size = dev_dax_size(dev_dax); struct resource *region_res = &dax_region->res; struct device *dev = &dev_dax->dev; - const char *name = dev_name(dev); struct resource *res, *first; + resource_size_t alloc = 0; + int rc; if (dev->driver) return -EBUSY; @@ -684,38 +766,47 @@ static ssize_t dev_dax_resize(struct dax_region *dax_region, * allocating a new resource. */ first = region_res->child; - if (!first) - return __alloc_dev_dax_range(dev_dax, dax_region->res.start, - to_alloc); - for (res = first; to_alloc && res; res = res->sibling) { +retry: + rc = -ENOSPC; + for (res = first; res; res = res->sibling) { struct resource *next = res->sibling; - resource_size_t free; /* space at the beginning of the region */ - free = 0; - if (res == first && res->start > dax_region->res.start) - free = res->start - dax_region->res.start; - if (free >= to_alloc && dev_size == 0) - return __alloc_dev_dax_range(dev_dax, - dax_region->res.start, to_alloc); - - free = 0; + if (res == first && res->start > dax_region->res.start) { + alloc = min(res->start - dax_region->res.start, + to_alloc); + rc = __alloc_dev_dax_range(dev_dax, + dax_region->res.start, alloc); + break; + } + + alloc = 0; /* space between allocations */ if (next && next->start > res->end + 1) - free = next->start - res->end + 1; + alloc = min(next->start - (res->end + 1), to_alloc); /* space at the end of the region */ - if (free < to_alloc && !next && res->end < region_res->end) - free = region_res->end - res->end; - - if (free >= to_alloc && strcmp(name, res->name) == 0) - return __adjust_dev_dax_range(dev_dax, res, - resource_size(res) + to_alloc); - else if (free >= to_alloc && dev_size == 0) - return __alloc_dev_dax_range(dev_dax, res->end + 1, - to_alloc); + if (!alloc && !next && res->end < region_res->end) + alloc = min(region_res->end - res->end, to_alloc); + + if (!alloc) + continue; + + if (adjust_ok(dev_dax, res)) { + rc = __adjust_dev_dax_range(dev_dax, res, + resource_size(res) + alloc); + break; + } + rc = __alloc_dev_dax_range(dev_dax, res->end + 1, + alloc); + break; } - return -ENOSPC; + if (rc) + return rc; + to_alloc -= alloc; + if (to_alloc) + goto retry; + return 0; } static ssize_t size_store(struct device *dev, struct device_attribute *attr, @@ -769,8 +860,15 @@ static ssize_t resource_show(struct device *dev, struct device_attribute *attr, char *buf) { struct dev_dax *dev_dax = to_dev_dax(dev); + struct dax_region *dax_region = dev_dax->region; + unsigned long long start; + + if (dev_dax->nr_range < 1) + start = dax_region->res.start; + else + start = dev_dax->ranges[0].range.start; - return sprintf(buf, "%#llx\n", dev_dax->range.start); + return sprintf(buf, "%#llx\n", start); } static DEVICE_ATTR(resource, 0400, resource_show, NULL); @@ -939,7 +1037,7 @@ struct dev_dax *devm_create_dev_dax(struct dev_dax_data *data) err_alloc_dax: kfree(dev_dax->pgmap); err_pgmap: - free_dev_dax_range(dev_dax); + free_dev_dax_ranges(dev_dax); err_range: free_dev_dax_id(dev_dax); err_id: diff --git a/drivers/dax/dax-private.h b/drivers/dax/dax-private.h index 17d3b5f94de8..d6aea3eaea43 100644 --- a/drivers/dax/dax-private.h +++ b/drivers/dax/dax-private.h @@ -47,7 +47,8 @@ struct dax_region { * @id: ida allocated id * @dev - device core * @pgmap - pgmap for memmap setup / lifetime (driver owned) - * @range: resource range for the instance + * @nr_range: size of @ranges + * @ranges: resource-span + pgoff tuples for the instance */ struct dev_dax { struct dax_region *region; @@ -56,7 +57,11 @@ struct dev_dax { int id; struct device dev; struct dev_pagemap *pgmap; - struct range range; + int nr_range; + struct dev_dax_range { + unsigned long pgoff; + struct range range; + } *ranges; }; static inline struct dev_dax *to_dev_dax(struct device *dev) diff --git a/drivers/dax/device.c b/drivers/dax/device.c index 9752f7b6cfe0..b75e902d9c2f 100644 --- a/drivers/dax/device.c +++ b/drivers/dax/device.c @@ -55,15 +55,22 @@ static int check_vma(struct dev_dax *dev_dax, struct vm_area_struct *vma, __weak phys_addr_t dax_pgoff_to_phys(struct dev_dax *dev_dax, pgoff_t pgoff, unsigned long size) { - struct range *range = &dev_dax->range; - phys_addr_t phys; - - phys = pgoff * PAGE_SIZE + range->start; - if (phys >= range->start && phys <= range->end) { + int i; + + for (i = 0; i < dev_dax->nr_range; i++) { + struct dev_dax_range *dax_range = &dev_dax->ranges[i]; + struct range *range = &dax_range->range; + unsigned long long pgoff_end; + phys_addr_t phys; + + pgoff_end = dax_range->pgoff + PHYS_PFN(range_len(range)) - 1; + if (pgoff < dax_range->pgoff || pgoff > pgoff_end) + continue; + phys = PFN_PHYS(pgoff - dax_range->pgoff) + range->start; if (phys + size - 1 <= range->end) return phys; + break; } - return -1; } @@ -394,30 +401,40 @@ static void dev_dax_kill(void *dev_dax) int dev_dax_probe(struct dev_dax *dev_dax) { struct dax_device *dax_dev = dev_dax->dax_dev; - struct range *range = &dev_dax->range; struct device *dev = &dev_dax->dev; struct dev_pagemap *pgmap; struct inode *inode; struct cdev *cdev; void *addr; - int rc; - - /* 1:1 map region resource range to device-dax instance range */ - if (!devm_request_mem_region(dev, range->start, range_len(range), - dev_name(dev))) { - dev_warn(dev, "could not reserve range: %#llx - %#llx\n", - range->start, range->end); - return -EBUSY; - } + int rc, i; pgmap = dev_dax->pgmap; + if (dev_WARN_ONCE(dev, pgmap && dev_dax->nr_range > 1, + "static pgmap / multi-range device conflict\n")) + return -EINVAL; + if (!pgmap) { - pgmap = devm_kzalloc(dev, sizeof(*pgmap), GFP_KERNEL); + pgmap = devm_kzalloc(dev, sizeof(*pgmap) + sizeof(struct range) + * (dev_dax->nr_range - 1), GFP_KERNEL); if (!pgmap) return -ENOMEM; - pgmap->range = *range; - pgmap->nr_range = 1; + pgmap->nr_range = dev_dax->nr_range; + } + + for (i = 0; i < dev_dax->nr_range; i++) { + struct range *range = &dev_dax->ranges[i].range; + + if (!devm_request_mem_region(dev, range->start, + range_len(range), dev_name(dev))) { + dev_warn(dev, "mapping%d: %#llx-%#llx could not reserve range\n", + i, range->start, range->end); + return -EBUSY; + } + /* don't update the range for static pgmap */ + if (!dev_dax->pgmap) + pgmap->ranges[i] = *range; } + pgmap->type = MEMORY_DEVICE_DEVDAX; addr = devm_memremap_pages(dev, pgmap); if (IS_ERR(addr)) diff --git a/drivers/dax/kmem.c b/drivers/dax/kmem.c index d41f6ac8b56f..8af5ec234e30 100644 --- a/drivers/dax/kmem.c +++ b/drivers/dax/kmem.c @@ -14,24 +14,27 @@ #include "dax-private.h" #include "bus.h" -static struct range dax_kmem_range(struct dev_dax *dev_dax) +static int dax_kmem_range(struct dev_dax *dev_dax, int i, struct range *r) { - struct range range; + struct dev_dax_range *dax_range = &dev_dax->ranges[i]; + struct range *range = &dax_range->range; /* memory-block align the hotplug range */ - range.start = ALIGN(dev_dax->range.start, memory_block_size_bytes()); - range.end = ALIGN_DOWN(dev_dax->range.end + 1, - memory_block_size_bytes()) - 1; - return range; + r->start = ALIGN(range->start, memory_block_size_bytes()); + r->end = ALIGN_DOWN(range->end + 1, memory_block_size_bytes()) - 1; + if (r->start >= r->end) { + r->start = range->start; + r->end = range->end; + return -ENOSPC; + } + return 0; } int dev_dax_kmem_probe(struct dev_dax *dev_dax) { - struct range range = dax_kmem_range(dev_dax); int numa_node = dev_dax->target_node; struct device *dev = &dev_dax->dev; - struct resource *res; - int rc; + int i, mapped = 0; /* * Ensure good NUMA information for the persistent memory. @@ -45,20 +48,45 @@ int dev_dax_kmem_probe(struct dev_dax *dev_dax) return -EINVAL; } - res = request_mem_region(range.start, range_len(&range), dev_name(dev)); - if (!res) { - dev_warn(dev, "could not reserve region [%#llx-%#llx]\n", - range.start, range.end); - return -EBUSY; - } + for (i = 0; i < dev_dax->nr_range; i++) { + struct resource *res; + struct range range; + int rc; + + rc = dax_kmem_range(dev_dax, i, &range); + if (rc) { + dev_info(dev, "mapping%d: %#llx-%#llx too small after alignment\n", + i, range.start, range.end); + continue; + } - /* Temporarily clear busy to allow add_memory() to claim it */ - res->flags &= ~IORESOURCE_BUSY; - rc = add_memory(numa_node, range.start, range_len(&range)); - res->flags |= IORESOURCE_BUSY; - if (rc) { - release_mem_region(range.start, range_len(&range)); - return rc; + res = request_mem_region(range.start, range_len(&range), + dev_name(dev)); + if (!res) { + dev_warn(dev, "mapping%d: %#llx-%#llx could not reserve region\n", + i, range.start, range.end); + /* + * Once some memory has been onlined we can't + * assume that it can be un-onlined safely. + */ + if (mapped) + continue; + return -EBUSY; + } + + /* Temporarily clear busy to allow add_memory() to claim it */ + res->flags &= ~IORESOURCE_BUSY; + rc = add_memory(numa_node, range.start, range_len(&range)); + res->flags |= IORESOURCE_BUSY; + if (rc) { + dev_warn(dev, "mapping%d: %#llx-%#llx memory add failed\n", + i, range.start, range.end); + release_mem_region(range.start, range_len(&range)); + if (mapped) + continue; + return rc; + } + mapped++; } return 0; @@ -67,8 +95,7 @@ int dev_dax_kmem_probe(struct dev_dax *dev_dax) #ifdef CONFIG_MEMORY_HOTREMOVE static void dax_kmem_release(struct dev_dax *dev_dax) { - struct range range = dax_kmem_range(dev_dax); - int rc; + int i; /* * We have one shot for removing memory, if some memory blocks were not @@ -76,13 +103,24 @@ static void dax_kmem_release(struct dev_dax *dev_dax) * there is no way to hotremove this memory until reboot because device * unbind will proceed regardless of the remove_memory result. */ - rc = remove_memory(dev_dax->target_node, range.start, range_len(&range)); - if (rc == 0) { - release_mem_region(range.start, range_len(&range)); - return; + for (i = 0; i < dev_dax->nr_range; i++) { + struct range range; + int rc; + + rc = dax_kmem_range(dev_dax, i, &range); + if (rc) + continue; + + rc = remove_memory(dev_dax->target_node, range.start, + range_len(&range)); + if (rc == 0) { + release_mem_region(range.start, range_len(&range)); + continue; + } + dev_err(&dev_dax->dev, + "mapping%d: %#llx-%#llx cannot be hotremoved until the next reboot\n", + i, range.start, range.end); } - dev_err(&dev_dax->dev, "%#llx-%#llx cannot be hotremoved until the next reboot\n", - range.start, range.end); } #else static void dax_kmem_release(struct dev_dax *dev_dax) diff --git a/tools/testing/nvdimm/dax-dev.c b/tools/testing/nvdimm/dax-dev.c index 38d8e55c4a0d..fb342a8c98d3 100644 --- a/tools/testing/nvdimm/dax-dev.c +++ b/tools/testing/nvdimm/dax-dev.c @@ -9,11 +9,18 @@ phys_addr_t dax_pgoff_to_phys(struct dev_dax *dev_dax, pgoff_t pgoff, unsigned long size) { - struct range *range = &dev_dax->range; - phys_addr_t addr; + int i; - addr = pgoff * PAGE_SIZE + range->start; - if (addr >= range->start && addr <= range->end) { + for (i = 0; i < dev_dax->nr_range; i++) { + struct dev_dax_range *dax_range = &dev_dax->ranges[i]; + struct range *range = &dax_range->range; + unsigned long long pgoff_end; + phys_addr_t addr; + + pgoff_end = dax_range->pgoff + PHYS_PFN(range_len(range)) - 1; + if (pgoff < dax_range->pgoff || pgoff > pgoff_end) + continue; + addr = PFN_PHYS(pgoff - dax_range->pgoff) + range->start; if (addr + size - 1 <= range->end) { if (get_nfit_res(addr)) { struct page *page; @@ -23,9 +30,10 @@ phys_addr_t dax_pgoff_to_phys(struct dev_dax *dev_dax, pgoff_t pgoff, page = vmalloc_to_page((void *)addr); return PFN_PHYS(page_to_pfn(page)); - } else - return addr; + } + return addr; } + break; } return -1; } From patchwork Mon Mar 23 23:55:40 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Dan Williams X-Patchwork-Id: 11454297 Return-Path: Received: from mail.kernel.org (pdx-korg-mail-1.web.codeaurora.org [172.30.200.123]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id 6EBE514B4 for ; Tue, 24 Mar 2020 00:11:49 +0000 (UTC) Received: from ml01.01.org (ml01.01.org [198.145.21.10]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by mail.kernel.org (Postfix) with ESMTPS id 5905620719 for ; Tue, 24 Mar 2020 00:11:49 +0000 (UTC) DMARC-Filter: OpenDMARC Filter v1.3.2 mail.kernel.org 5905620719 Authentication-Results: mail.kernel.org; dmarc=fail (p=none dis=none) header.from=intel.com Authentication-Results: mail.kernel.org; spf=none smtp.mailfrom=linux-nvdimm-bounces@lists.01.org Received: from ml01.vlan13.01.org (localhost [IPv6:::1]) by ml01.01.org (Postfix) with ESMTP id 914D810FC38A2; Mon, 23 Mar 2020 17:12:39 -0700 (PDT) Received-SPF: Pass (mailfrom) identity=mailfrom; client-ip=192.55.52.136; helo=mga12.intel.com; envelope-from=dan.j.williams@intel.com; receiver= Received: from mga12.intel.com (mga12.intel.com [192.55.52.136]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by ml01.01.org (Postfix) with ESMTPS id 2B94710097DE7 for ; Mon, 23 Mar 2020 17:12:38 -0700 (PDT) IronPort-SDR: B4cjb1uHg8VaUztWa2nIdkoM4LcmTZXDwKZ5Lg4gY7mVY3lqZ7aDIyf5hD8Ytsfam/fKDYQ1Ch hI+Gr0x2xHiA== X-Amp-Result: SKIPPED(no attachment in message) X-Amp-File-Uploaded: False Received: from fmsmga005.fm.intel.com ([10.253.24.32]) by fmsmga106.fm.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 23 Mar 2020 17:11:47 -0700 IronPort-SDR: MPtOzZQ8O+DTJvhlxtI2NGD9SAPSdmJiGoUYjEsARfhJniew7lk8sJ1FqtGKtZNWSXtHrp2SGp KQ+onD5VBDuA== X-IronPort-AV: E=Sophos;i="5.72,298,1580803200"; d="scan'208";a="446014887" Received: from dwillia2-desk3.jf.intel.com (HELO dwillia2-desk3.amr.corp.intel.com) ([10.54.39.16]) by fmsmga005-auth.fm.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 23 Mar 2020 17:11:47 -0700 Subject: [PATCH 12/12] device-dax: Introduce 'mapping' devices From: Dan Williams To: linux-mm@kvack.org Date: Mon, 23 Mar 2020 16:55:40 -0700 Message-ID: <158500774067.2088294.1260962550701501447.stgit@dwillia2-desk3.amr.corp.intel.com> In-Reply-To: <158500767138.2088294.17131646259803932461.stgit@dwillia2-desk3.amr.corp.intel.com> References: <158500767138.2088294.17131646259803932461.stgit@dwillia2-desk3.amr.corp.intel.com> User-Agent: StGit/0.18-3-g996c MIME-Version: 1.0 Message-ID-Hash: YUTEJHT3XVVJY62V5TLMNNOTOXZIHSPO X-Message-ID-Hash: YUTEJHT3XVVJY62V5TLMNNOTOXZIHSPO X-MailFrom: dan.j.williams@intel.com X-Mailman-Rule-Misses: dmarc-mitigation; no-senders; approved; emergency; loop; banned-address; member-moderation; nonmember-moderation; administrivia; implicit-dest; max-recipients; max-size; news-moderation; no-subject; suspicious-header CC: dave.hansen@linux.intel.com, hch@lst.de, linux-nvdimm@lists.01.org, linux-kernel@vger.kernel.org X-Mailman-Version: 3.1.1 Precedence: list List-Id: "Linux-nvdimm developer list." Archived-At: List-Archive: List-Help: List-Post: List-Subscribe: List-Unsubscribe: In support of interrogating the physical address layout of a device with dis-contiguous ranges, introduce a sysfs directory with 'start', 'end', and 'page_offset' attributes. The alternative is trying to parse /proc/iomem, and that file will not reflect the extent layout until the device is enabled. Signed-off-by: Dan Williams --- drivers/dax/bus.c | 190 +++++++++++++++++++++++++++++++++++++++++++++ drivers/dax/dax-private.h | 14 +++ 2 files changed, 202 insertions(+), 2 deletions(-) diff --git a/drivers/dax/bus.c b/drivers/dax/bus.c index 07aeb8fa9ee8..645449a079bd 100644 --- a/drivers/dax/bus.c +++ b/drivers/dax/bus.c @@ -558,6 +558,167 @@ struct dax_region *alloc_dax_region(struct device *parent, int region_id, } EXPORT_SYMBOL_GPL(alloc_dax_region); +static void dax_mapping_release(struct device *dev) +{ + struct dax_mapping *mapping = to_dax_mapping(dev); + struct dev_dax *dev_dax = to_dev_dax(dev->parent); + + ida_free(&dev_dax->ida, mapping->id); + kfree(mapping); +} + +static void unregister_dax_mapping(void *data) +{ + struct device *dev = data; + struct dax_mapping *mapping = to_dax_mapping(dev); + struct dev_dax *dev_dax = to_dev_dax(dev->parent); + struct dax_region *dax_region = dev_dax->region; + + dev_dbg(dev, "%s\n", __func__); + + device_lock_assert(dax_region->dev); + + dev_dax->ranges[mapping->range_id].mapping = NULL; + mapping->range_id = -1; + + device_del(dev); + put_device(dev); +} + +static struct dev_dax_range *get_dax_range(struct device *dev) +{ + struct dax_mapping *mapping = to_dax_mapping(dev); + struct dev_dax *dev_dax = to_dev_dax(dev->parent); + struct dax_region *dax_region = dev_dax->region; + + device_lock(dax_region->dev); + if (mapping->range_id < 1) { + device_unlock(dax_region->dev); + return NULL; + } + + return &dev_dax->ranges[mapping->range_id]; +} + +static void put_dax_range(struct dev_dax_range *dax_range) +{ + struct dax_mapping *mapping = dax_range->mapping; + struct dev_dax *dev_dax = to_dev_dax(mapping->dev.parent); + struct dax_region *dax_region = dev_dax->region; + + device_unlock(dax_region->dev); +} + +static ssize_t start_show(struct device *dev, + struct device_attribute *attr, char *buf) +{ + struct dev_dax_range *dax_range; + ssize_t rc; + + dax_range = get_dax_range(dev); + if (!dax_range) + return -ENXIO; + rc = sprintf(buf, "%#llx\n", dax_range->range.start); + put_dax_range(dax_range); + + return rc; +} +static DEVICE_ATTR(start, 0400, start_show, NULL); + +static ssize_t end_show(struct device *dev, + struct device_attribute *attr, char *buf) +{ + struct dev_dax_range *dax_range; + ssize_t rc; + + dax_range = get_dax_range(dev); + if (!dax_range) + return -ENXIO; + rc = sprintf(buf, "%#llx\n", dax_range->range.end); + put_dax_range(dax_range); + + return rc; +} +static DEVICE_ATTR(end, 0400, end_show, NULL); + +static ssize_t pgoff_show(struct device *dev, + struct device_attribute *attr, char *buf) +{ + struct dev_dax_range *dax_range; + ssize_t rc; + + dax_range = get_dax_range(dev); + if (!dax_range) + return -ENXIO; + rc = sprintf(buf, "%#lx\n", dax_range->pgoff); + put_dax_range(dax_range); + + return rc; +} +static DEVICE_ATTR(page_offset, 0400, pgoff_show, NULL); + +static struct attribute *dax_mapping_attributes[] = { + &dev_attr_start.attr, + &dev_attr_end.attr, + &dev_attr_page_offset.attr, + NULL, +}; + +static const struct attribute_group dax_mapping_attribute_group = { + .attrs = dax_mapping_attributes, +}; + +static const struct attribute_group *dax_mapping_attribute_groups[] = { + &dax_mapping_attribute_group, + NULL, +}; + +static struct device_type dax_mapping_type = { + .release = dax_mapping_release, + .groups = dax_mapping_attribute_groups, +}; + +static int devm_register_dax_mapping(struct dev_dax *dev_dax, int range_id) +{ + struct dax_region *dax_region = dev_dax->region; + struct dax_mapping *mapping; + struct device *dev; + int rc; + + device_lock_assert(dax_region->dev); + + if (dev_WARN_ONCE(&dev_dax->dev, !dax_region->dev->driver, + "region disabled\n")) + return -ENXIO; + + mapping = kzalloc(sizeof(*mapping), GFP_KERNEL); + if (!mapping) + return -ENOMEM; + mapping->range_id = range_id; + mapping->id = ida_alloc(&dev_dax->ida, GFP_KERNEL); + if (mapping->id < 0) { + kfree(mapping); + return -ENOMEM; + } + dev_dax->ranges[range_id].mapping = mapping; + dev = &mapping->dev; + device_initialize(dev); + dev->parent = &dev_dax->dev; + dev->type = &dax_mapping_type; + dev_set_name(dev, "mapping%d", mapping->id); + rc = device_add(dev); + if (rc) { + put_device(dev); + return rc; + } + + rc = devm_add_action_or_reset(dax_region->dev, unregister_dax_mapping, + dev); + if (rc) + return rc; + return 0; +} + static int __alloc_dev_dax_range(struct dev_dax *dev_dax, u64 start, resource_size_t size) { @@ -567,7 +728,7 @@ static int __alloc_dev_dax_range(struct dev_dax *dev_dax, u64 start, struct dev_dax_range *ranges; unsigned long pgoff = 0; struct resource *alloc; - int i; + int i, rc; device_lock_assert(dax_region->dev); @@ -602,6 +763,21 @@ static int __alloc_dev_dax_range(struct dev_dax *dev_dax, u64 start, }, }; + /* + * A dev_dax instance must be registered before mapping device + * children can be added. Defer to devm_create_dev_dax() to add + * the initial mapping device. + */ + if (!device_is_registered(&dev_dax->dev)) + return 0; + + rc = devm_register_dax_mapping(dev_dax, dev_dax->nr_range - 1); + if (rc) { + dev_dax->nr_range--; + __release_region(res, alloc->start, resource_size(alloc)); + return rc; + } + return 0; } @@ -679,11 +855,14 @@ static int dev_dax_shrink(struct dev_dax *dev_dax, resource_size_t size) for (i = dev_dax->nr_range - 1; i >= 0; i--) { struct range *range = &dev_dax->ranges[i].range; + struct dax_mapping *mapping = dev_dax->ranges[i].mapping; struct resource *adjust = NULL, *res; resource_size_t shrink; shrink = min(to_shrink, range_len(range)); if (shrink >= range_len(range)) { + devm_release_action(dax_region->dev, + unregister_dax_mapping, &mapping->dev); __release_region(&dax_region->res, range->start, range_len(range)); dev_dax->nr_range--; @@ -1007,9 +1186,9 @@ struct dev_dax *devm_create_dev_dax(struct dev_dax_data *data) /* a device_dax instance is dead while the driver is not attached */ kill_dax(dax_dev); - /* from here on we're committed to teardown via dev_dax_release() */ dev_dax->dax_dev = dax_dev; dev_dax->target_node = dax_region->target_node; + ida_init(&dev_dax->ida); kref_get(&dax_region->kref); inode = dax_inode(dax_dev); @@ -1032,6 +1211,13 @@ struct dev_dax *devm_create_dev_dax(struct dev_dax_data *data) if (rc) return ERR_PTR(rc); + /* register mapping device for the initial allocation range */ + if (dev_dax->nr_range && range_len(&dev_dax->ranges[0].range)) { + rc = devm_register_dax_mapping(dev_dax, 0); + if (rc) + return ERR_PTR(rc); + } + return dev_dax; err_alloc_dax: diff --git a/drivers/dax/dax-private.h b/drivers/dax/dax-private.h index d6aea3eaea43..23f113e99c92 100644 --- a/drivers/dax/dax-private.h +++ b/drivers/dax/dax-private.h @@ -38,6 +38,12 @@ struct dax_region { struct resource res; }; +struct dax_mapping { + struct device dev; + int range_id; + int id; +}; + /** * struct dev_dax - instance data for a subdivision of a dax region, and * data while the device is activated in the driver. @@ -45,6 +51,7 @@ struct dax_region { * @dax_dev - core dax functionality * @target_node: effective numa node if dev_dax memory range is onlined * @id: ida allocated id + * @ida: mapping id allocator * @dev - device core * @pgmap - pgmap for memmap setup / lifetime (driver owned) * @nr_range: size of @ranges @@ -55,12 +62,14 @@ struct dev_dax { struct dax_device *dax_dev; int target_node; int id; + struct ida ida; struct device dev; struct dev_pagemap *pgmap; int nr_range; struct dev_dax_range { unsigned long pgoff; struct range range; + struct dax_mapping *mapping; } *ranges; }; @@ -68,4 +77,9 @@ static inline struct dev_dax *to_dev_dax(struct device *dev) { return container_of(dev, struct dev_dax, dev); } + +static inline struct dax_mapping *to_dax_mapping(struct device *dev) +{ + return container_of(dev, struct dax_mapping, dev); +} #endif