From patchwork Wed Sep 23 04:42:06 2015
X-Patchwork-Submitter: Dan Williams
X-Patchwork-Id: 7247371
Subject: [PATCH 10/15] block, dax: fix lifetime of in-kernel dax mappings
From: Dan Williams
To: akpm@linux-foundation.org
Date: Wed, 23 Sep 2015 00:42:06 -0400
Message-ID: <20150923044206.36490.79829.stgit@dwillia2-desk3.jf.intel.com>
In-Reply-To: <20150923043737.36490.70547.stgit@dwillia2-desk3.jf.intel.com>
References: <20150923043737.36490.70547.stgit@dwillia2-desk3.jf.intel.com>
Cc: Jens Axboe, linux-nvdimm@lists.01.org, linux-kernel@vger.kernel.org,
 linux-mm@kvack.org, linux-fsdevel@vger.kernel.org, Christoph Hellwig

The DAX implementation needs to protect new calls to ->direct_access()
and usage of its return value against unbind of the underlying block
device.  Use blk_dax_{get|put}() to either prevent blk_cleanup_queue()
from proceeding, or fail the dax_map_bh() if the request_queue is being
torn down.

Cc: Jens Axboe
Cc: Christoph Hellwig
Cc: Boaz Harrosh
Cc: Ross Zwisler
Signed-off-by: Dan Williams
---
 fs/dax.c |  131 ++++++++++++++++++++++++++++++++++++++++----------------------
 1 file changed, 84 insertions(+), 47 deletions(-)

diff --git a/fs/dax.c b/fs/dax.c
index bcfb14bfc1e4..358eea39e982 100644
--- a/fs/dax.c
+++ b/fs/dax.c
@@ -63,12 +63,43 @@ int dax_clear_blocks(struct inode *inode, sector_t block, long size)
 }
 EXPORT_SYMBOL_GPL(dax_clear_blocks);
 
-static long dax_get_addr(struct buffer_head *bh, void __pmem **addr,
-		unsigned blkbits)
+static void __pmem *__dax_map_bh(const struct buffer_head *bh, unsigned blkbits,
+		unsigned long *pfn, long *len)
 {
-	unsigned long pfn;
+	long rc;
+	void __pmem *addr;
+	struct block_device *bdev = bh->b_bdev;
+	struct request_queue *q = bdev->bd_queue;
 	sector_t sector = bh->b_blocknr << (blkbits - 9);
-	return bdev_direct_access(bh->b_bdev, sector, addr, &pfn, bh->b_size);
+
+	rc = blk_dax_get(q);
+	if (rc < 0)
+		return (void __pmem *) ERR_PTR(rc);
+	rc = bdev_direct_access(bdev, sector, &addr, pfn, bh->b_size);
+	if (len)
+		*len = rc;
+	if (rc < 0) {
+		blk_dax_put(q);
+		return (void __pmem *) ERR_PTR(rc);
+	}
+	return addr;
+}
+
+static void __pmem *dax_map_bh(const struct buffer_head *bh, unsigned blkbits)
+{
+	unsigned long pfn;
+
+	return __dax_map_bh(bh, blkbits, &pfn, NULL);
+}
+
+static void dax_unmap_bh(const struct buffer_head *bh, void __pmem *addr)
+{
+	struct block_device *bdev = bh->b_bdev;
+	struct request_queue *q = bdev->bd_queue;
+
+	if (IS_ERR(addr))
+		return;
+	blk_dax_put(q);
 }
 
 /* the clear_pmem() calls are ordered by a wmb_pmem() in the caller */
@@ -104,15 +135,16 @@ static ssize_t dax_io(struct inode *inode, struct iov_iter *iter,
 		      loff_t start, loff_t end, get_block_t get_block,
 		      struct buffer_head *bh)
 {
-	ssize_t retval = 0;
-	loff_t pos = start;
-	loff_t max = start;
-	loff_t bh_max = start;
-	void __pmem *addr;
+	loff_t pos = start, max = start, bh_max = start;
+	int rw = iov_iter_rw(iter), rc;
+	long map_len = 0;
+	unsigned long pfn;
+	void __pmem *addr = NULL;
+	void __pmem *kmap = (void __pmem *) ERR_PTR(-EIO);
 	bool hole = false;
 	bool need_wmb = false;
 
-	if (iov_iter_rw(iter) != WRITE)
+	if (rw == READ)
 		end = min(end, i_size_read(inode));
 
 	while (pos < end) {
@@ -127,9 +159,8 @@ static ssize_t dax_io(struct inode *inode, struct iov_iter *iter,
 			if (pos == bh_max) {
 				bh->b_size = PAGE_ALIGN(end - pos);
 				bh->b_state = 0;
-				retval = get_block(inode, block, bh,
-						   iov_iter_rw(iter) == WRITE);
-				if (retval)
+				rc = get_block(inode, block, bh, rw == WRITE);
+				if (rc)
 					break;
 				if (!buffer_size_valid(bh))
 					bh->b_size = 1 << blkbits;
@@ -141,21 +172,25 @@ static ssize_t dax_io(struct inode *inode, struct iov_iter *iter,
 				bh->b_size -= done;
 			}
 
-			hole = iov_iter_rw(iter) != WRITE && !buffer_written(bh);
+			hole = rw == READ && !buffer_written(bh);
 			if (hole) {
 				addr = NULL;
 				size = bh->b_size - first;
 			} else {
-				retval = dax_get_addr(bh, &addr, blkbits);
-				if (retval < 0)
+				dax_unmap_bh(bh, kmap);
+				kmap = __dax_map_bh(bh, blkbits, &pfn, &map_len);
+				if (IS_ERR(kmap)) {
+					rc = PTR_ERR(kmap);
 					break;
+				}
+				addr = kmap;
 				if (buffer_unwritten(bh) || buffer_new(bh)) {
-					dax_new_buf(addr, retval, first, pos,
-									end);
+					dax_new_buf(addr, map_len, first, pos,
+							end);
 					need_wmb = true;
 				}
 				addr += first;
-				size = retval - first;
+				size = map_len - first;
 			}
 			max = min(pos + size, end);
 		}
@@ -178,8 +213,9 @@ static ssize_t dax_io(struct inode *inode, struct iov_iter *iter,
 
 	if (need_wmb)
 		wmb_pmem();
+	dax_unmap_bh(bh, kmap);
 
-	return (pos == start) ? retval : pos - start;
+	return (pos == start) ? rc : pos - start;
 }
 
 /**
@@ -274,21 +310,22 @@ static int copy_user_bh(struct page *to, struct buffer_head *bh,
 	void __pmem *vfrom;
 	void *vto;
 
-	if (dax_get_addr(bh, &vfrom, blkbits) < 0)
-		return -EIO;
+	vfrom = dax_map_bh(bh, blkbits);
+	if (IS_ERR(vfrom))
+		return PTR_ERR(vfrom);
 	vto = kmap_atomic(to);
 	copy_user_page(vto, (void __force *)vfrom, vaddr, to);
 	kunmap_atomic(vto);
+	dax_unmap_bh(bh, vfrom);
 	return 0;
 }
 
 static int dax_insert_mapping(struct inode *inode, struct buffer_head *bh,
 			struct vm_area_struct *vma, struct vm_fault *vmf)
 {
-	sector_t sector = bh->b_blocknr << (inode->i_blkbits - 9);
 	unsigned long vaddr = (unsigned long)vmf->virtual_address;
-	void __pmem *addr;
 	unsigned long pfn;
+	void __pmem *addr;
 	pgoff_t size;
 	int error;
 
@@ -305,11 +342,9 @@ static int dax_insert_mapping(struct inode *inode, struct buffer_head *bh,
 		goto out;
 	}
 
-	error = bdev_direct_access(bh->b_bdev, sector, &addr, &pfn, bh->b_size);
-	if (error < 0)
-		goto out;
-	if (error < PAGE_SIZE) {
-		error = -EIO;
+	addr = __dax_map_bh(bh, inode->i_blkbits, &pfn, NULL);
+	if (IS_ERR(addr)) {
+		error = PTR_ERR(addr);
 		goto out;
 	}
 
@@ -317,6 +352,7 @@ static int dax_insert_mapping(struct inode *inode, struct buffer_head *bh,
 		clear_pmem(addr, PAGE_SIZE);
 		wmb_pmem();
 	}
+	dax_unmap_bh(bh, addr);
 
 	error = vm_insert_mixed(vma, vaddr, pfn);
 
@@ -528,11 +564,8 @@ int __dax_pmd_fault(struct vm_area_struct *vma, unsigned long address,
 	unsigned blkbits = inode->i_blkbits;
 	unsigned long pmd_addr = address & PMD_MASK;
 	bool write = flags & FAULT_FLAG_WRITE;
-	long length;
-	void __pmem *kaddr;
 	pgoff_t size, pgoff;
-	sector_t block, sector;
-	unsigned long pfn;
+	sector_t block;
 	int result = 0;
 
 	/* Fall back to PTEs if we're going to COW */
@@ -557,8 +590,7 @@ int __dax_pmd_fault(struct vm_area_struct *vma, unsigned long address,
 
 	bh.b_size = PMD_SIZE;
 	i_mmap_lock_write(mapping);
-	length = get_block(inode, block, &bh, write);
-	if (length)
+	if (get_block(inode, block, &bh, write) != 0)
 		return VM_FAULT_SIGBUS;
 
 	/*
@@ -569,17 +601,17 @@ int __dax_pmd_fault(struct vm_area_struct *vma, unsigned long address,
 	if (!buffer_size_valid(&bh) || bh.b_size < PMD_SIZE)
 		goto fallback;
 
-	sector = bh.b_blocknr << (blkbits - 9);
-
 	if (buffer_unwritten(&bh) || buffer_new(&bh)) {
 		int i;
+		long length;
+		unsigned long pfn;
+		void __pmem *kaddr = __dax_map_bh(&bh, blkbits, &pfn, &length);
 
-		length = bdev_direct_access(bh.b_bdev, sector, &kaddr, &pfn,
-						bh.b_size);
-		if (length < 0) {
+		if (IS_ERR(kaddr)) {
 			result = VM_FAULT_SIGBUS;
 			goto out;
 		}
+
 		if ((length < PMD_SIZE) || (pfn & PG_PMD_COLOUR))
 			goto fallback;
 
@@ -589,6 +621,7 @@ int __dax_pmd_fault(struct vm_area_struct *vma, unsigned long address,
 		count_vm_event(PGMAJFAULT);
 		mem_cgroup_count_vm_event(vma->vm_mm, PGMAJFAULT);
 		result |= VM_FAULT_MAJOR;
+		dax_unmap_bh(&bh, kaddr);
 	}
 
 	/*
@@ -635,12 +668,15 @@ int __dax_pmd_fault(struct vm_area_struct *vma, unsigned long address,
 		result = VM_FAULT_NOPAGE;
 		spin_unlock(ptl);
 	} else {
-		length = bdev_direct_access(bh.b_bdev, sector, &kaddr, &pfn,
-						bh.b_size);
-		if (length < 0) {
+		long length;
+		unsigned long pfn;
+		void __pmem *kaddr = __dax_map_bh(&bh, blkbits, &pfn, &length);
+
+		if (IS_ERR(kaddr)) {
 			result = VM_FAULT_SIGBUS;
 			goto out;
 		}
+		dax_unmap_bh(&bh, kaddr);
 		if ((length < PMD_SIZE) || (pfn & PG_PMD_COLOUR))
 			goto fallback;
 
@@ -746,12 +782,13 @@ int dax_zero_page_range(struct inode *inode, loff_t from, unsigned length,
 	if (err < 0)
 		return err;
 	if (buffer_written(&bh)) {
-		void __pmem *addr;
-		err = dax_get_addr(&bh, &addr, inode->i_blkbits);
-		if (err < 0)
-			return err;
+		void __pmem *addr = dax_map_bh(&bh, inode->i_blkbits);
+
+		if (IS_ERR(addr))
+			return PTR_ERR(addr);
 		clear_pmem(addr + offset, length);
 		wmb_pmem();
+		dax_unmap_bh(&bh, addr);
 	}
 
 	return 0;
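
The get/put discipline this patch wraps around ->direct_access() can be illustrated outside the kernel. What follows is a minimal, single-threaded userspace sketch, not the kernel implementation: blk_dax_get() and blk_dax_put() come from elsewhere in this series and their bodies are not shown here, so every model_* name, the -ENODEV and -ERANGE error codes, and the plain integer counter are assumptions made purely for illustration.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for a request_queue; not the kernel structure. */
struct model_queue {
	bool dying;	/* set once teardown has started */
	int dax_refs;	/* mappings currently outstanding */
	char media[64];	/* stand-in for directly accessible memory */
};

/* Assumed contract: 0 on success, negative errno once the queue is dying. */
static int model_dax_get(struct model_queue *q)
{
	if (q->dying)
		return -ENODEV;	/* error code chosen for the sketch */
	q->dax_refs++;
	return 0;
}

static void model_dax_put(struct model_queue *q)
{
	assert(q->dax_refs > 0);
	q->dax_refs--;
}

/*
 * Mirrors the shape of __dax_map_bh(): take the reference first, then
 * look up the address, and drop the reference again if the lookup fails.
 * Returns NULL and sets *err on failure (the patch uses ERR_PTR instead).
 */
static char *model_map(struct model_queue *q, size_t off, int *err)
{
	int rc = model_dax_get(q);

	if (rc < 0) {
		*err = rc;
		return NULL;
	}
	if (off >= sizeof(q->media)) {
		model_dax_put(q);
		*err = -ERANGE;
		return NULL;
	}
	*err = 0;
	return q->media + off;
}

/* Mirrors dax_unmap_bh(): a failed mapping holds no reference to drop. */
static void model_unmap(struct model_queue *q, char *addr)
{
	if (!addr)
		return;
	model_dax_put(q);
}

/*
 * Marks the queue dying and reports whether teardown could finish now.
 * A real implementation would wait for outstanding users instead.
 */
static bool model_cleanup(struct model_queue *q)
{
	q->dying = true;
	return q->dax_refs == 0;
}
```

Under these assumptions, a caller pairs model_map() with model_unmap() the way dax_io() pairs __dax_map_bh() with dax_unmap_bh(). While a mapping is held, model_cleanup() reports that teardown cannot finish, and any model_map() attempted after teardown has begun fails instead of handing out an address.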