From patchwork Mon Oct 22 20:13:20 2018
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Dave Hansen
X-Patchwork-Id: 10652415
Subject: [PATCH 2/9] dax: kernel memory driver for mm ownership of DAX
To: linux-kernel@vger.kernel.org
From: Dave Hansen
Date: Mon, 22 Oct 2018 13:13:20 -0700
References: <20181022201317.8558C1D8@viggo.jf.intel.com>
In-Reply-To: <20181022201317.8558C1D8@viggo.jf.intel.com>
Message-Id: <20181022201320.45C9785C@viggo.jf.intel.com>
Cc: thomas.lendacky@amd.com, mhocko@suse.com, linux-nvdimm@lists.01.org,
 Dave Hansen, ying.huang@intel.com, linux-mm@kvack.org, zwisler@kernel.org,
 fengguang.wu@intel.com, akpm@linux-foundation.org

Add the actual driver which will own the DAX range.  This allows
very nice parity with the other possible "owners" of a DAX region:
device DAX and filesystem DAX.  It also greatly simplifies the
process of handing off control of the memory between the different
owners since it's just a matter of unbinding and rebinding the
device to different drivers.

I tried to do this all internally to the kernel and the locking
and "self-destruction" of the old device context was a nightmare.
Having userspace drive it is a wonderful simplification.

Cc: Dan Williams
Cc: Dave Jiang
Cc: Ross Zwisler
Cc: Vishal Verma
Cc: Tom Lendacky
Cc: Andrew Morton
Cc: Michal Hocko
Cc: linux-nvdimm@lists.01.org
Cc: linux-kernel@vger.kernel.org
Cc: linux-mm@kvack.org
Cc: Huang Ying
Cc: Fengguang Wu
---

 b/drivers/dax/kmem.c |  152 +++++++++++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 152 insertions(+)

diff -puN /dev/null drivers/dax/kmem.c
--- /dev/null	2018-09-18 12:39:53.059362935 -0700
+++ b/drivers/dax/kmem.c	2018-10-22 13:12:21.502930393 -0700
@@ -0,0 +1,152 @@
+// this is just a copy of drivers/dax/pmem.c with
+// s/dax_pmem/dax_kmem/ for now.
+//
+// need real license
+/*
+ * Copyright(c) 2016-2018 Intel Corporation. All rights reserved.
+ */
+#include <linux/percpu-refcount.h>
+#include <linux/memremap.h>
+#include <linux/module.h>
+#include <linux/pfn_t.h>
+#include "../nvdimm/pfn.h"
+#include "../nvdimm/nd.h"
+#include "device-dax.h"
+
+struct dax_kmem {
+	struct device *dev;
+	struct percpu_ref ref;
+	struct dev_pagemap pgmap;
+	struct completion cmp;
+};
+
+static struct dax_kmem *to_dax_kmem(struct percpu_ref *ref)
+{
+	return container_of(ref, struct dax_kmem, ref);
+}
+
+static void dax_kmem_percpu_release(struct percpu_ref *ref)
+{
+	struct dax_kmem *dax_kmem = to_dax_kmem(ref);
+
+	dev_dbg(dax_kmem->dev, "trace\n");
+	complete(&dax_kmem->cmp);
+}
+
+static void dax_kmem_percpu_exit(void *data)
+{
+	struct percpu_ref *ref = data;
+	struct dax_kmem *dax_kmem = to_dax_kmem(ref);
+
+	dev_dbg(dax_kmem->dev, "trace\n");
+	wait_for_completion(&dax_kmem->cmp);
+	percpu_ref_exit(ref);
+}
+
+static void dax_kmem_percpu_kill(void *data)
+{
+	struct percpu_ref *ref = data;
+	struct dax_kmem *dax_kmem = to_dax_kmem(ref);
+
+	dev_dbg(dax_kmem->dev, "trace\n");
+	percpu_ref_kill(ref);
+}
+
+static int dax_kmem_probe(struct device *dev)
+{
+	void *addr;
+	struct resource res;
+	int rc, id, region_id;
+	struct nd_pfn_sb *pfn_sb;
+	struct dev_dax *dev_dax;
+	struct dax_kmem *dax_kmem;
+	struct nd_namespace_io *nsio;
+	struct dax_region *dax_region;
+	struct nd_namespace_common *ndns;
+	struct nd_dax *nd_dax = to_nd_dax(dev);
+	struct nd_pfn *nd_pfn = &nd_dax->nd_pfn;
+
+	ndns = nvdimm_namespace_common_probe(dev);
+	if (IS_ERR(ndns))
+		return PTR_ERR(ndns);
+	nsio = to_nd_namespace_io(&ndns->dev);
+
+	dax_kmem = devm_kzalloc(dev, sizeof(*dax_kmem), GFP_KERNEL);
+	if (!dax_kmem)
+		return -ENOMEM;
+
+	/* parse the 'pfn' info block via ->rw_bytes */
+	rc = devm_nsio_enable(dev, nsio);
+	if (rc)
+		return rc;
+	rc = nvdimm_setup_pfn(nd_pfn, &dax_kmem->pgmap);
+	if (rc)
+		return rc;
+	devm_nsio_disable(dev, nsio);
+
+	pfn_sb = nd_pfn->pfn_sb;
+
+	if (!devm_request_mem_region(dev, nsio->res.start,
+				resource_size(&nsio->res),
+				dev_name(&ndns->dev))) {
+		dev_warn(dev, "could not reserve region %pR\n", &nsio->res);
+		return -EBUSY;
+	}
+
+	dax_kmem->dev = dev;
+	init_completion(&dax_kmem->cmp);
+	rc = percpu_ref_init(&dax_kmem->ref, dax_kmem_percpu_release, 0,
+			GFP_KERNEL);
+	if (rc)
+		return rc;
+
+	rc = devm_add_action_or_reset(dev, dax_kmem_percpu_exit,
+			&dax_kmem->ref);
+	if (rc)
+		return rc;
+
+	dax_kmem->pgmap.ref = &dax_kmem->ref;
+	addr = devm_memremap_pages(dev, &dax_kmem->pgmap);
+	if (IS_ERR(addr))
+		return PTR_ERR(addr);
+
+	rc = devm_add_action_or_reset(dev, dax_kmem_percpu_kill,
+			&dax_kmem->ref);
+	if (rc)
+		return rc;
+
+	/* adjust the dax_region resource to the start of data */
+	memcpy(&res, &dax_kmem->pgmap.res, sizeof(res));
+	res.start += le64_to_cpu(pfn_sb->dataoff);
+
+	rc = sscanf(dev_name(&ndns->dev), "namespace%d.%d", &region_id, &id);
+	if (rc != 2)
+		return -EINVAL;
+
+	dax_region = alloc_dax_region(dev, region_id, &res,
+			le32_to_cpu(pfn_sb->align), addr, PFN_DEV|PFN_MAP);
+	if (!dax_region)
+		return -ENOMEM;
+
+	/* TODO: support for subdividing a dax region... */
+	dev_dax = devm_create_dev_dax(dax_region, id, &res, 1);
+
+	/* child dev_dax instances now own the lifetime of the dax_region */
+	dax_region_put(dax_region);
+
+	return PTR_ERR_OR_ZERO(dev_dax);
+}
+
+static struct nd_device_driver dax_kmem_driver = {
+	.probe = dax_kmem_probe,
+	.drv = {
+		.name = "dax_kmem",
+	},
+	.type = ND_DRIVER_DAX_PMEM,
+};
+
+module_nd_driver(dax_kmem_driver);
+
+MODULE_LICENSE("GPL v2");
+MODULE_AUTHOR("Intel Corporation");
+MODULE_ALIAS_ND_DEVICE(ND_DEVICE_DAX_PMEM);
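
The changelog describes handing the memory between owners as "just a
matter of unbinding and rebinding the device to different drivers",
driven from userspace.  As an illustration only, here is a minimal
userspace sketch of that handoff using the generic driver-core
bind/unbind sysfs attributes.  It is not part of this patch.  The
helper names (build_attr_path, write_attr, rebind_device) are
hypothetical, and the bus directory, driver names and device name a
caller passes in are assumptions that depend on the running system.

```c
#include <fcntl.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>

/*
 * Build "<bus_dir>/drivers/<driver>/<attr>" into buf.
 * Returns 0 on success, -1 if the result does not fit.
 */
static int build_attr_path(char *buf, size_t len, const char *bus_dir,
			   const char *driver, const char *attr)
{
	int n = snprintf(buf, len, "%s/drivers/%s/%s", bus_dir, driver, attr);

	return (n < 0 || (size_t)n >= len) ? -1 : 0;
}

/* Write a device name into a bind or unbind attribute file. */
static int write_attr(const char *path, const char *value)
{
	size_t len = strlen(value);
	ssize_t ret;
	int fd = open(path, O_WRONLY);

	if (fd < 0)
		return -1;
	ret = write(fd, value, len);
	close(fd);
	return (ret == (ssize_t)len) ? 0 : -1;
}

/*
 * Unbind dev from the driver named 'from', then bind it to the
 * driver named 'to'.  Stops at the first failing step.
 */
static int rebind_device(const char *bus_dir, const char *dev,
			 const char *from, const char *to)
{
	char path[256];

	if (build_attr_path(path, sizeof(path), bus_dir, from, "unbind") ||
	    write_attr(path, dev))
		return -1;
	if (build_attr_path(path, sizeof(path), bus_dir, to, "bind") ||
	    write_attr(path, dev))
		return -1;
	return 0;
}
```

A caller might use rebind_device("/sys/bus/nd", "dax0.0", "dax_pmem",
"dax_kmem"), where "dax0.0" is a made-up device name and the bus path
assumes the libnvdimm bus is exposed at /sys/bus/nd.  If the bind step
fails the device is left unbound, so a real tool would want to report
that state rather than silently return.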