From patchwork Fri Oct 2 21:02:32 2015 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Ross Zwisler X-Patchwork-Id: 7318901 Return-Path: X-Original-To: patchwork-linux-nvdimm@patchwork.kernel.org Delivered-To: patchwork-parsemail@patchwork2.web.kernel.org Received: from mail.kernel.org (mail.kernel.org [198.145.29.136]) by patchwork2.web.kernel.org (Postfix) with ESMTP id E12C0BF90C for ; Fri, 2 Oct 2015 21:02:48 +0000 (UTC) Received: from mail.kernel.org (localhost [127.0.0.1]) by mail.kernel.org (Postfix) with ESMTP id DDD682087F for ; Fri, 2 Oct 2015 21:02:47 +0000 (UTC) Received: from ml01.01.org (ml01.01.org [198.145.21.10]) (using TLSv1.2 with cipher DHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by mail.kernel.org (Postfix) with ESMTPS id 8BDA120878 for ; Fri, 2 Oct 2015 21:02:46 +0000 (UTC) Received: from ml01.vlan14.01.org (localhost [IPv6:::1]) by ml01.01.org (Postfix) with ESMTP id E448261D48; Fri, 2 Oct 2015 14:02:45 -0700 (PDT) X-Original-To: linux-nvdimm@lists.01.org Delivered-To: linux-nvdimm@lists.01.org Received: from mga02.intel.com (mga02.intel.com [134.134.136.20]) by ml01.01.org (Postfix) with ESMTP id 6A85761D0A for ; Fri, 2 Oct 2015 14:02:45 -0700 (PDT) Received: from fmsmga002.fm.intel.com ([10.253.24.26]) by orsmga101.jf.intel.com with ESMTP; 02 Oct 2015 14:02:45 -0700 X-ExtLoop1: 1 X-IronPort-AV: E=Sophos;i="5.17,624,1437462000"; d="scan'208";a="818198831" Received: from theros.lm.intel.com ([10.232.112.146]) by fmsmga002.fm.intel.com with ESMTP; 02 Oct 2015 14:02:44 -0700 From: Ross Zwisler To: linux-kernel@vger.kernel.org Subject: [PATCH 3/3] Revert "dax: fix race between simultaneous faults" Date: Fri, 2 Oct 2015 15:02:32 -0600 Message-Id: <1443819752-17091-4-git-send-email-ross.zwisler@linux.intel.com> X-Mailer: git-send-email 2.1.0 In-Reply-To: <1443819752-17091-1-git-send-email-ross.zwisler@linux.intel.com> References: <1443819752-17091-1-git-send-email-ross.zwisler@linux.intel.com> Cc: linux-nvdimm@lists.01.org, Jan Kara , linux-mm@kvack.org, Alexander Viro , Dave Chinner , linux-fsdevel@vger.kernel.org, Andrew Morton , "Kirill A. Shutemov" X-BeenThere: linux-nvdimm@lists.01.org X-Mailman-Version: 2.1.17 Precedence: list List-Id: "Linux-nvdimm developer list." List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , MIME-Version: 1.0 Errors-To: linux-nvdimm-bounces@lists.01.org Sender: "Linux-nvdimm" X-Spam-Status: No, score=-2.6 required=5.0 tests=BAYES_00, RCVD_IN_DNSWL_LOW, T_RP_MATCHES_RCVD, UNPARSEABLE_RELAY autolearn=unavailable version=3.3.1 X-Spam-Checker-Version: SpamAssassin 3.3.1 (2010-03-16) on mail.kernel.org X-Virus-Scanned: ClamAV using ClamSMTP This reverts commit 843172978bb92997310d2f7fbc172ece423cfc02. The following two locking commits in the DAX code: commit 843172978bb9 ("dax: fix race between simultaneous faults") commit 46c043ede471 ("mm: take i_mmap_lock in unmap_mapping_range() for DAX") introduced a number of deadlocks and other issues, and need to be reverted for the v4.3 kernel. The list of issues in DAX after these commits (some newly introduced by the commits, some preexisting) can be found here: https://lkml.org/lkml/2015/9/25/602 Signed-off-by: Ross Zwisler --- fs/dax.c | 33 ++++++++++++++++----------------- mm/memory.c | 11 +++-------- 2 files changed, 19 insertions(+), 25 deletions(-) diff --git a/fs/dax.c b/fs/dax.c index de3f53e..f364c90 100644 --- a/fs/dax.c +++ b/fs/dax.c @@ -285,6 +285,7 @@ static int copy_user_bh(struct page *to, struct buffer_head *bh, static int dax_insert_mapping(struct inode *inode, struct buffer_head *bh, struct vm_area_struct *vma, struct vm_fault *vmf) { + struct address_space *mapping = inode->i_mapping; sector_t sector = bh->b_blocknr << (inode->i_blkbits - 9); unsigned long vaddr = (unsigned long)vmf->virtual_address; void __pmem *addr; @@ -292,6 +293,8 @@ static int dax_insert_mapping(struct inode *inode, struct buffer_head *bh, pgoff_t size; int error; + i_mmap_lock_read(mapping); + /* * Check truncate didn't happen while we were allocating a block. * If it did, this block may or may not be still allocated to the @@ -321,6 +324,8 @@ static int dax_insert_mapping(struct inode *inode, struct buffer_head *bh, error = vm_insert_mixed(vma, vaddr, pfn); out: + i_mmap_unlock_read(mapping); + return error; } @@ -382,17 +387,15 @@ int __dax_fault(struct vm_area_struct *vma, struct vm_fault *vmf, * from a read fault and we've raced with a truncate */ error = -EIO; - goto unlock; + goto unlock_page; } - } else { - i_mmap_lock_write(mapping); } error = get_block(inode, block, &bh, 0); if (!error && (bh.b_size < PAGE_SIZE)) error = -EIO; /* fs corruption? */ if (error) - goto unlock; + goto unlock_page; if (!buffer_mapped(&bh) && !buffer_unwritten(&bh) && !vmf->cow_page) { if (vmf->flags & FAULT_FLAG_WRITE) { @@ -403,9 +406,8 @@ int __dax_fault(struct vm_area_struct *vma, struct vm_fault *vmf, if (!error && (bh.b_size < PAGE_SIZE)) error = -EIO; if (error) - goto unlock; + goto unlock_page; } else { - i_mmap_unlock_write(mapping); return dax_load_hole(mapping, page, vmf); } } @@ -417,15 +419,17 @@ int __dax_fault(struct vm_area_struct *vma, struct vm_fault *vmf, else clear_user_highpage(new_page, vaddr); if (error) - goto unlock; + goto unlock_page; vmf->page = page; if (!page) { + i_mmap_lock_read(mapping); /* Check we didn't race with truncate */ size = (i_size_read(inode) + PAGE_SIZE - 1) >> PAGE_SHIFT; if (vmf->pgoff >= size) { + i_mmap_unlock_read(mapping); error = -EIO; - goto unlock; + goto out; } } return VM_FAULT_LOCKED; @@ -461,8 +465,6 @@ int __dax_fault(struct vm_area_struct *vma, struct vm_fault *vmf, WARN_ON_ONCE(!(vmf->flags & FAULT_FLAG_WRITE)); } - if (!page) - i_mmap_unlock_write(mapping); out: if (error == -ENOMEM) return VM_FAULT_OOM | major; @@ -471,14 +473,11 @@ int __dax_fault(struct vm_area_struct *vma, struct vm_fault *vmf, return VM_FAULT_SIGBUS | major; return VM_FAULT_NOPAGE | major; - unlock: + unlock_page: if (page) { unlock_page(page); page_cache_release(page); - } else { - i_mmap_unlock_write(mapping); } - goto out; } EXPORT_SYMBOL(__dax_fault); @@ -556,10 +555,10 @@ int __dax_pmd_fault(struct vm_area_struct *vma, unsigned long address, block = (sector_t)pgoff << (PAGE_SHIFT - blkbits); bh.b_size = PMD_SIZE; - i_mmap_lock_write(mapping); length = get_block(inode, block, &bh, write); if (length) return VM_FAULT_SIGBUS; + i_mmap_lock_read(mapping); /* * If the filesystem isn't willing to tell us the length of a hole, @@ -634,11 +633,11 @@ int __dax_pmd_fault(struct vm_area_struct *vma, unsigned long address, } out: + i_mmap_unlock_read(mapping); + if (buffer_unwritten(&bh)) complete_unwritten(&bh, !(result & VM_FAULT_ERROR)); - i_mmap_unlock_write(mapping); - return result; fallback: diff --git a/mm/memory.c b/mm/memory.c index 5ec066f..deb679c 100644 --- a/mm/memory.c +++ b/mm/memory.c @@ -2427,16 +2427,11 @@ void unmap_mapping_range(struct address_space *mapping, details.last_index = ULONG_MAX; - /* - * DAX already holds i_mmap_lock to serialise file truncate vs - * page fault and page fault vs page fault. - */ - if (!IS_DAX(mapping->host)) - i_mmap_lock_write(mapping); + /* DAX uses i_mmap_lock to serialise file truncate vs page fault */ + i_mmap_lock_write(mapping); if (unlikely(!RB_EMPTY_ROOT(&mapping->i_mmap))) unmap_mapping_range_tree(&mapping->i_mmap, &details); - if (!IS_DAX(mapping->host)) - i_mmap_unlock_write(mapping); + i_mmap_unlock_write(mapping); } EXPORT_SYMBOL(unmap_mapping_range);