From patchwork Mon Jun 26 14:25:14 2017 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Andreas Gruenbacher X-Patchwork-Id: 9809785 Return-Path: Received: from mail.wl.linuxfoundation.org (pdx-wl-mail.web.codeaurora.org [172.30.200.125]) by pdx-korg-patchwork.web.codeaurora.org (Postfix) with ESMTP id 56BE560329 for ; Mon, 26 Jun 2017 14:25:51 +0000 (UTC) Received: from mail.wl.linuxfoundation.org (localhost [127.0.0.1]) by mail.wl.linuxfoundation.org (Postfix) with ESMTP id 5BF55285D1 for ; Mon, 26 Jun 2017 14:25:51 +0000 (UTC) Received: by mail.wl.linuxfoundation.org (Postfix, from userid 486) id 50E96285D8; Mon, 26 Jun 2017 14:25:51 +0000 (UTC) X-Spam-Checker-Version: SpamAssassin 3.3.1 (2010-03-16) on pdx-wl-mail.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-6.9 required=2.0 tests=BAYES_00,RCVD_IN_DNSWL_HI autolearn=unavailable version=3.3.1 Received: from vger.kernel.org (vger.kernel.org [209.132.180.67]) by mail.wl.linuxfoundation.org (Postfix) with ESMTP id E09FE285DA for ; Mon, 26 Jun 2017 14:25:50 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1751485AbdFZOZs (ORCPT ); Mon, 26 Jun 2017 10:25:48 -0400 Received: from mx1.redhat.com ([209.132.183.28]:34116 "EHLO mx1.redhat.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1750752AbdFZOZr (ORCPT ); Mon, 26 Jun 2017 10:25:47 -0400 Received: from smtp.corp.redhat.com (int-mx03.intmail.prod.int.phx2.redhat.com [10.5.11.13]) (using TLSv1.2 with cipher AECDH-AES256-SHA (256/256 bits)) (No client certificate requested) by mx1.redhat.com (Postfix) with ESMTPS id 28B773D974; Mon, 26 Jun 2017 14:25:26 +0000 (UTC) DMARC-Filter: OpenDMARC Filter v1.3.2 mx1.redhat.com 28B773D974 Authentication-Results: ext-mx06.extmail.prod.ext.phx2.redhat.com; dmarc=none (p=none dis=none) header.from=redhat.com Authentication-Results: ext-mx06.extmail.prod.ext.phx2.redhat.com; spf=pass smtp.mailfrom=agruenba@redhat.com DKIM-Filter: OpenDKIM Filter v2.11.0 mx1.redhat.com 28B773D974 Received: from nux.redhat.com (ovpn-116-233.ams2.redhat.com [10.36.116.233]) by smtp.corp.redhat.com (Postfix) with ESMTP id 5F3046BF6C; Mon, 26 Jun 2017 14:25:23 +0000 (UTC) From: Andreas Gruenbacher To: linux-fsdevel@vger.kernel.org Cc: Andreas Gruenbacher , linux-xfs@vger.kernel.org, linux-ext4@vger.kernel.org Subject: [PATCH v3 1/5] vfs: Add page_cache_seek_hole_data helper Date: Mon, 26 Jun 2017 16:25:14 +0200 Message-Id: <1498487118-6564-2-git-send-email-agruenba@redhat.com> In-Reply-To: <1498487118-6564-1-git-send-email-agruenba@redhat.com> References: <1498487118-6564-1-git-send-email-agruenba@redhat.com> X-Scanned-By: MIMEDefang 2.79 on 10.5.11.13 X-Greylist: Sender IP whitelisted, not delayed by milter-greylist-4.5.16 (mx1.redhat.com [10.5.110.30]); Mon, 26 Jun 2017 14:25:26 +0000 (UTC) Sender: linux-fsdevel-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: linux-fsdevel@vger.kernel.org X-Virus-Scanned: ClamAV using ClamSMTP Both ext4 and xfs implement seeking for the next hole or piece of data in unwritten extents by scanning the page cache, and both versions share the same bug when iterating the buffers of a page: the start offset into the page isn't taken into account, so when a page fits more than two filesystem blocks, things will go wrong. For example, on a filesystem with a block size of 1k, the following command will fail: xfs_io -f -c "falloc 0 4k" \ -c "pwrite 1k 1k" \ -c "pwrite 3k 1k" \ -c "seek -a -r 0" foo In this example, neither lseek(fd, 1024, SEEK_HOLE) nor lseek(fd, 2048, SEEK_DATA) will return the correct result. Introduce a generic vfs helper for seeking in the page cache that gets this right. The next commits will replace the filesystem specific implementations. Signed-off-by: Andreas Gruenbacher --- fs/buffer.c | 125 ++++++++++++++++++++++++++++++++++++++++++++ include/linux/buffer_head.h | 2 + 2 files changed, 127 insertions(+) diff --git a/fs/buffer.c b/fs/buffer.c index 4d5d03b..7702269 100644 --- a/fs/buffer.c +++ b/fs/buffer.c @@ -3485,6 +3485,131 @@ int bh_submit_read(struct buffer_head *bh) } EXPORT_SYMBOL(bh_submit_read); +/* + * Seek for SEEK_DATA / SEEK_HOLE within @page, starting at @lastoff. + * + * Returns the offset within the file on success, and -ENOENT otherwise. + */ +static loff_t +page_seek_hole_data(struct page *page, loff_t lastoff, int whence) +{ + loff_t offset = page_offset(page); + struct buffer_head *bh, *head; + bool seek_data = whence == SEEK_DATA; + + if (lastoff < offset) + lastoff = offset; + + bh = head = page_buffers(page); + do { + offset += bh->b_size; + if (lastoff >= offset) + continue; + + /* + * Unwritten extents that have data in the page cache covering + * them can be identified by the BH_Unwritten state flag. + * Pages with multiple buffers might have a mix of holes, data + * and unwritten extents - any buffer with valid data in it + * should have BH_Uptodate flag set on it. + */ + + if ((buffer_unwritten(bh) || buffer_uptodate(bh)) == seek_data) + return lastoff; + + lastoff = offset; + } while ((bh = bh->b_this_page) != head); + return -ENOENT; +} + +/* + * Seek for SEEK_DATA / SEEK_HOLE in the page cache. + * + * Within unwritten extents, the page cache determines which parts are holes + * and which are data: unwritten and uptodate buffer heads count as data; + * everything else counts as a hole. + * + * Returns the resulting offset on successs, and -ENOENT otherwise. + */ +loff_t +page_cache_seek_hole_data(struct inode *inode, loff_t offset, loff_t length, + int whence) +{ + pgoff_t index = offset >> PAGE_SHIFT; + pgoff_t end = DIV_ROUND_UP(offset + length, PAGE_SIZE); + loff_t lastoff = offset; + struct pagevec pvec; + + if (length <= 0) + return -ENOENT; + + pagevec_init(&pvec, 0); + + do { + unsigned want, nr_pages, i; + + want = min_t(unsigned, end - index, PAGEVEC_SIZE); + nr_pages = pagevec_lookup(&pvec, inode->i_mapping, index, want); + if (nr_pages == 0) + break; + + for (i = 0; i < nr_pages; i++) { + struct page *page = pvec.pages[i]; + + /* + * At this point, the page may be truncated or + * invalidated (changing page->mapping to NULL), or + * even swizzled back from swapper_space to tmpfs file + * mapping. However, page->index will not change + * because we have a reference on the page. + * + * If current page offset is beyond where we've ended, + * we've found a hole. + */ + if (whence == SEEK_HOLE && + lastoff < page_offset(page)) + goto check_range; + + /* Searching done if the page index is out of range. */ + if (page->index >= end) + goto not_found; + + lock_page(page); + if (likely(page->mapping == inode->i_mapping) && + page_has_buffers(page)) { + lastoff = page_seek_hole_data(page, lastoff, whence); + if (lastoff >= 0) { + unlock_page(page); + goto check_range; + } + } + unlock_page(page); + lastoff = page_offset(page) + PAGE_SIZE; + } + + /* Searching done if fewer pages returned than wanted. */ + if (nr_pages < want) + break; + + index = pvec.pages[i - 1]->index + 1; + pagevec_release(&pvec); + } while (index < end); + + /* When no page at lastoff and we are not done, we found a hole. */ + if (whence != SEEK_HOLE) + goto not_found; + +check_range: + if (lastoff < offset + length) + goto out; +not_found: + lastoff = -ENOENT; +out: + pagevec_release(&pvec); + return lastoff; +} +EXPORT_SYMBOL_GPL(page_cache_seek_hole_data); + void __init buffer_init(void) { unsigned long nrpages; diff --git a/include/linux/buffer_head.h b/include/linux/buffer_head.h index e0abeba..c8dae55 100644 --- a/include/linux/buffer_head.h +++ b/include/linux/buffer_head.h @@ -202,6 +202,8 @@ void write_boundary_block(struct block_device *bdev, sector_t bblock, unsigned blocksize); int bh_uptodate_or_lock(struct buffer_head *bh); int bh_submit_read(struct buffer_head *bh); +loff_t page_cache_seek_hole_data(struct inode *inode, loff_t offset, + loff_t length, int whence); extern int buffer_heads_over_limit;