From patchwork Fri Nov 13 18:12:30 2015
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 8bit
X-Patchwork-Submitter: Dan Williams
X-Patchwork-Id: 7613401
From: "Williams, Dan J"
To: "torvalds@linux-foundation.org"
Subject: [GIT PULL] libnvdimm fixes for 4.4-rc1
Date: Fri, 13 Nov 2015 18:12:30 +0000
Message-ID: <1447438349.17460.8.camel@intel.com>
Cc: "linux-acpi@vger.kernel.org", "linux-nvdimm@lists.01.org"
List-Id: "Linux-nvdimm developer list."

Hi Linus, please pull from...

  git://git.kernel.org/pub/scm/linux/kernel/git/nvdimm/nvdimm libnvdimm-fixes

...to receive:

1/ 3 fixes tagged for -stable including a crash fix, simple performance
tweak, and an invalid i/o error.
2/ build regression fix for the nvdimm unit tests
3/ nvdimm documentation update

---

The following changes since commit 5d50ac70fe98518dbf620bfba8184254663125eb:

  Merge tag 'xfs-for-linus-4.4' of git://git.kernel.org/pub/scm/linux/kernel/git/dgc/linux-xfs (2015-11-11 20:18:48 -0800)

are available in the git repository at:

  git://git.kernel.org/pub/scm/linux/kernel/git/nvdimm/nvdimm libnvdimm-fixes

for you to fetch changes up to 152d7bd80dca5ce77ec2d7313149a2ab990e808e:

  dax: fix __dax_pmd_fault crash (2015-11-12 18:33:54 -0800)

----------------------------------------------------------------
Dan Williams (4):
      tools/testing/nvdimm, acpica: fix flag rename build breakage
      libnvdimm, e820: fix numa node for e820-type-12 pmem ranges
      libnvdimm, pmem: fix size trim in pmem_direct_access()
      dax: fix __dax_pmd_fault crash

Konrad Rzeszutek Wilk (1):
      libnvdimm: documentation clarifications

 Documentation/nvdimm/nvdimm.txt  | 49 +++++++++++++++++++++++-----------------
 drivers/nvdimm/e820.c            | 15 +++++++++++-
 drivers/nvdimm/pmem.c            | 15 ++----------
 fs/dax.c                         |  7 ++++++
 tools/testing/nvdimm/test/nfit.c |  2 +-
 5 files changed, 52 insertions(+), 36 deletions(-)

commit f42957967fb435aef6fc700fbbd9df89533b9a2e
Author: Dan Williams
Date:   Tue Nov 10 15:50:33 2015 -0800

    tools/testing/nvdimm, acpica: fix flag rename build breakage

    Commit ca321d1ca672 "ACPICA: Update NFIT table to rename a flags field"
    performed a tree-wide s/ACPI_NFIT_MEM_ARMED/ACPI_NFIT_MEM_NOT_ARMED/
    operation, but missed the tools/testing/nvdimm/ directory.

    Cc: Bob Moore
    Cc: Lv Zheng
    Acked-by: Rafael J. Wysocki
    Signed-off-by: Dan Williams

commit f7256dc0cdbc68903502997bde619f555a910f50
Author: Dan Williams
Date:   Wed Nov 11 16:46:33 2015 -0800

    libnvdimm, e820: fix numa node for e820-type-12 pmem ranges

    Rather than punt on the numa node for these e820 ranges try to find a
    better answer with memory_add_physaddr_to_nid() when it is available.

    Cc:
    Reported-by: Boaz Harrosh
    Tested-by: Boaz Harrosh
    Signed-off-by: Dan Williams

commit 589e75d15702dc720b363a92f984876704864946
Author: Dan Williams
Date:   Sat Oct 24 19:55:58 2015 -0700

    libnvdimm, pmem: fix size trim in pmem_direct_access()

    This masking prevents access to the end of the device via dax_do_io(),
    and is unnecessary as arch_add_memory() would have rejected an unaligned
    allocation.

    Cc:
    Cc: Ross Zwisler
    Signed-off-by: Dan Williams

commit 8de5dff8bae634497f4413bc3067389f2ed267da
Author: Konrad Rzeszutek Wilk
Date:   Tue Nov 10 16:10:45 2015 -0800

    libnvdimm: documentation clarifications

    A bunch of changes that I hope will help in understanding it
    better for first-time readers.

    Signed-off-by: Konrad Rzeszutek Wilk
    Signed-off-by: Dan Williams

commit 152d7bd80dca5ce77ec2d7313149a2ab990e808e
Author: Dan Williams
Date:   Thu Nov 12 18:33:54 2015 -0800

    dax: fix __dax_pmd_fault crash

    Since 4.3 introduced devm_memremap_pages() the pfns handled by DAX may
    optionally have a struct page backing.  When a mapped pfn reaches
    vmf_insert_pfn_pmd() it fails with a crash signature like the following:

     kernel BUG at mm/huge_memory.c:905!
     [..]
     Call Trace:
      [] __dax_pmd_fault+0x2ea/0x5b0
      [] xfs_filemap_pmd_fault+0x92/0x150 [xfs]
      [] handle_mm_fault+0x312/0x1b50

    Fix this by falling back to 4K mappings in the pfn_valid() case.  Longer
    term, vmf_insert_pfn_pmd() needs to grow support for architectures that
    can provide a 'pmd_special' capability.

    Cc:
    Cc: Andrew Morton
    Reported-by: Ross Zwisler
    Signed-off-by: Dan Williams

diff --git a/Documentation/nvdimm/nvdimm.txt b/Documentation/nvdimm/nvdimm.txt
index 197a0b6b0582..e894de69915a 100644
--- a/Documentation/nvdimm/nvdimm.txt
+++ b/Documentation/nvdimm/nvdimm.txt
@@ -62,6 +62,12 @@ DAX: File system extensions to bypass the page cache and block layer to
 mmap persistent memory, from a PMEM block device, directly into a
 process address space.
 
+DSM: Device Specific Method: ACPI method to to control specific
+device - in this case the firmware.
+
+DCR: NVDIMM Control Region Structure defined in ACPI 6 Section 5.2.25.5.
+It defines a vendor-id, device-id, and interface format for a given DIMM.
+
 BTT: Block Translation Table: Persistent memory is byte addressable.
 Existing software may have an expectation that the power-fail-atomicity
 of writes is at least one sector, 512 bytes.  The BTT is an indirection
@@ -133,16 +139,16 @@ device driver:
     registered, can be immediately attached to nd_pmem.
 
     2. BLK (nd_blk.ko): This driver performs I/O using a set of platform
-    defined apertures.  A set of apertures will all access just one DIMM.
-    Multiple windows allow multiple concurrent accesses, much like
+    defined apertures.  A set of apertures will access just one DIMM.
+    Multiple windows (apertures) allow multiple concurrent accesses, much like
     tagged-command-queuing, and would likely be used by different threads or
     different CPUs.
 
     The NFIT specification defines a standard format for a BLK-aperture, but
     the spec also allows for vendor specific layouts, and non-NFIT BLK
-    implementations may other designs for BLK I/O.  For this reason "nd_blk"
-    calls back into platform-specific code to perform the I/O.  One such
-    implementation is defined in the "Driver Writer's Guide" and "DSM
+    implementations may have other designs for BLK I/O.  For this reason
+    "nd_blk" calls back into platform-specific code to perform the I/O.
+    One such implementation is defined in the "Driver Writer's Guide" and "DSM
     Interface Example".
 
 
@@ -152,7 +158,7 @@ Why BLK?
 While PMEM provides direct byte-addressable CPU-load/store access to
 NVDIMM storage, it does not provide the best system RAS (recovery,
 availability, and serviceability) model.  An access to a corrupted
-system-physical-address address causes a cpu exception while an access
+system-physical-address address causes a CPU exception while an access
 to a corrupted address through an BLK-aperture causes that block window
 to raise an error status in a register.  The latter is more aligned with
 the standard error model that host-bus-adapter attached disks present.
@@ -162,7 +168,7 @@ data could be interleaved in an opaque hardware specific manner across
 several DIMMs.
 
 PMEM vs BLK
-BLK-apertures solve this RAS problem, but their presence is also the
+BLK-apertures solve these RAS problems, but their presence is also the
 major contributing factor to the complexity of the ND subsystem.  They
 complicate the implementation because PMEM and BLK alias in DPA space.
 Any given DIMM's DPA-range may contribute to one or more
@@ -220,8 +226,8 @@ socket.  Each unique interface (BLK or PMEM) to DPA space is identified
 by a region device with a dynamically assigned id (REGION0 - REGION5).
 
     1. The first portion of DIMM0 and DIMM1 are interleaved as REGION0. A
-    single PMEM namespace is created in the REGION0-SPA-range that spans
-    DIMM0 and DIMM1 with a user-specified name of "pm0.0". Some of that
+    single PMEM namespace is created in the REGION0-SPA-range that spans most
+    of DIMM0 and DIMM1 with a user-specified name of "pm0.0". Some of that
     interleaved system-physical-address range is reclaimed as BLK-aperture
     accessed space starting at DPA-offset (a) into each DIMM.  In that
     reclaimed space we create two BLK-aperture "namespaces" from REGION2 and
@@ -230,13 +236,13 @@ by a region device with a dynamically assigned id (REGION0 - REGION5).
 
     2. In the last portion of DIMM0 and DIMM1 we have an interleaved
     system-physical-address range, REGION1, that spans those two DIMMs as
-    well as DIMM2 and DIMM3.  Some of REGION1 allocated to a PMEM namespace
-    named "pm1.0" the rest is reclaimed in 4 BLK-aperture namespaces (for
+    well as DIMM2 and DIMM3.  Some of REGION1 is allocated to a PMEM namespace
+    named "pm1.0", the rest is reclaimed in 4 BLK-aperture namespaces (for
     each DIMM in the interleave set), "blk2.1", "blk3.1", "blk4.0", and
     "blk5.0".
 
     3. The portion of DIMM2 and DIMM3 that do not participate in the REGION1
-    interleaved system-physical-address range (i.e. the DPA address below
+    interleaved system-physical-address range (i.e. the DPA address past
     offset (b) are also included in the "blk4.0" and "blk5.0" namespaces.
     Note, that this example shows that BLK-aperture namespaces don't need to
     be contiguous in DPA-space.
@@ -252,15 +258,15 @@ LIBNVDIMM Kernel Device Model and LIBNDCTL Userspace API
 
 What follows is a description of the LIBNVDIMM sysfs layout and a
 corresponding object hierarchy diagram as viewed through the LIBNDCTL
-api.  The example sysfs paths and diagrams are relative to the Example
+API.  The example sysfs paths and diagrams are relative to the Example
 NVDIMM Platform which is also the LIBNVDIMM bus used in the LIBNDCTL unit
 test.
 
 LIBNDCTL: Context
-Every api call in the LIBNDCTL library requires a context that holds the
+Every API call in the LIBNDCTL library requires a context that holds the
 logging parameters and other library instance state.  The library is
 based on the libabc template:
-https://git.kernel.org/cgit/linux/kernel/git/kay/libabc.git/
+https://git.kernel.org/cgit/linux/kernel/git/kay/libabc.git
 
 LIBNDCTL: instantiate a new library context example
 
@@ -409,7 +415,7 @@ Bit 31:28 Reserved
 LIBNVDIMM/LIBNDCTL: Region
 ----------------------
 
-A generic REGION device is registered for each PMEM range orBLK-aperture
+A generic REGION device is registered for each PMEM range or BLK-aperture
 set.  Per the example there are 6 regions: 2 PMEM and 4 BLK-aperture
 sets on the "nfit_test.0" bus.  The primary role of regions are to be a
 container of "mappings".  A mapping is a tuple of REGION0 +---> NAMESPACE0.0 +--> PMEM8 "pm0.0" |
diff --git a/drivers/nvdimm/e820.c b/drivers/nvdimm/e820.c
index 8282db2ef99e..b0045a505dc8 100644
--- a/drivers/nvdimm/e820.c
+++ b/drivers/nvdimm/e820.c
@@ -3,6 +3,7 @@
  * Copyright (c) 2015, Intel Corporation.
  */
 #include
+#include
 #include
 #include
 
@@ -25,6 +26,18 @@ static int e820_pmem_remove(struct platform_device *pdev)
 	return 0;
 }
 
+#ifdef CONFIG_MEMORY_HOTPLUG
+static int e820_range_to_nid(resource_size_t addr)
+{
+	return memory_add_physaddr_to_nid(addr);
+}
+#else
+static int e820_range_to_nid(resource_size_t addr)
+{
+	return NUMA_NO_NODE;
+}
+#endif
+
 static int e820_pmem_probe(struct platform_device *pdev)
 {
 	static struct nvdimm_bus_descriptor nd_desc;
@@ -48,7 +61,7 @@ static int e820_pmem_probe(struct platform_device *pdev)
 	memset(&ndr_desc, 0, sizeof(ndr_desc));
 	ndr_desc.res = p;
 	ndr_desc.attr_groups = e820_pmem_region_attribute_groups;
-	ndr_desc.numa_node = NUMA_NO_NODE;
+	ndr_desc.numa_node = e820_range_to_nid(p->start);
 	set_bit(ND_REGION_PAGEMAP, &ndr_desc.flags);
 	if (!nvdimm_pmem_region_create(nvdimm_bus, &ndr_desc))
 		goto err;
diff --git a/drivers/nvdimm/pmem.c b/drivers/nvdimm/pmem.c
index 012e0649f1ac..8ee79893d2f5 100644
--- a/drivers/nvdimm/pmem.c
+++ b/drivers/nvdimm/pmem.c
@@ -105,22 +105,11 @@ static long pmem_direct_access(struct block_device *bdev, sector_t sector,
 {
 	struct pmem_device *pmem = bdev->bd_disk->private_data;
 	resource_size_t offset = sector * 512 + pmem->data_offset;
-	resource_size_t size;
-
-	if (pmem->data_offset) {
-		/*
-		 * Limit the direct_access() size to what is covered by
-		 * the memmap
-		 */
-		size = (pmem->size - offset) & ~ND_PFN_MASK;
-	} else
-		size = pmem->size - offset;
-
-	/* FIXME convert DAX to comprehend that this mapping has a lifetime */
+
 	*kaddr = pmem->virt_addr + offset;
 	*pfn = (pmem->phys_addr + offset) >> PAGE_SHIFT;
 
-	return size;
+	return pmem->size - offset;
 }
 
 static const struct block_device_operations pmem_fops = {
diff --git a/fs/dax.c b/fs/dax.c
index 131fd35ae39d..bff20cc56130 100644
--- a/fs/dax.c
+++ b/fs/dax.c
@@ -627,6 +627,13 @@ int __dax_pmd_fault(struct vm_area_struct *vma, unsigned long address,
 	if ((length < PMD_SIZE) || (pfn & PG_PMD_COLOUR))
 		goto fallback;
 
+	/*
+	 * TODO: teach vmf_insert_pfn_pmd() to support
+	 * 'pte_special' for pmds
+	 */
+	if (pfn_valid(pfn))
+		goto fallback;
+
 	if (buffer_unwritten(&bh) || buffer_new(&bh)) {
 		int i;
 		for (i = 0; i < PTRS_PER_PMD; i++)
diff --git a/tools/testing/nvdimm/test/nfit.c b/tools/testing/nvdimm/test/nfit.c
index dce346aa94ea..40ab4476c80a 100644
--- a/tools/testing/nvdimm/test/nfit.c
+++ b/tools/testing/nvdimm/test/nfit.c
@@ -1135,7 +1135,7 @@ static void nfit_test1_setup(struct nfit_test *t)
 	memdev->interleave_ways = 1;
 	memdev->flags = ACPI_NFIT_MEM_SAVE_FAILED | ACPI_NFIT_MEM_RESTORE_FAILED
 		| ACPI_NFIT_MEM_FLUSH_FAILED | ACPI_NFIT_MEM_HEALTH_OBSERVED
-		| ACPI_NFIT_MEM_ARMED;
+		| ACPI_NFIT_MEM_NOT_ARMED;
 
 	offset += sizeof(*memdev);
 	/* dcr-descriptor0 */