From patchwork Wed May 23 14:43:39 2018
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Christoph Hellwig
X-Patchwork-Id: 10421581
From: Christoph Hellwig
To: linux-xfs@vger.kernel.org
Cc: linux-fsdevel@vger.kernel.org, linux-mm@kvack.org
Subject: [PATCH 16/34] iomap: add initial support for writes without buffer heads
Date: Wed, 23 May 2018 16:43:39 +0200
Message-Id: <20180523144357.18985-17-hch@lst.de>
X-Mailer: git-send-email 2.17.0
In-Reply-To: <20180523144357.18985-1-hch@lst.de>
References: <20180523144357.18985-1-hch@lst.de>

For now this is limited to blocksize == PAGE_SIZE, where we can simply
read in the full page in write_begin and just set the whole page dirty
after copying data into it.  This code is enabled by default, and XFS
will now be fed pages without buffer heads in ->writepage and
->writepages.

If a file system sets the IOMAP_F_BUFFER_HEAD flag on the iomap, the
old path will still be used; this both helps the transition in XFS and
prepares for the gfs2 migration to the iomap infrastructure.

Signed-off-by: Christoph Hellwig
Reviewed-by: Darrick J. Wong
---
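(Illustrative note, not part of the patch: the sketch below shows how the
opt-out is meant to be consumed, using a hypothetical example_iomap_begin()
callback with the actual mapping fill-in elided.  A file system that still
tracks sub-page state with buffer heads sets IOMAP_F_BUFFER_HEAD from its
->iomap_begin method, much like the xfs_file_iomap_begin() hunk further down;
leaving the flag clear selects the new page-based
__iomap_write_begin()/__iomap_write_end() path.)

	static int
	example_iomap_begin(struct inode *inode, loff_t pos, loff_t length,
			unsigned flags, struct iomap *iomap)
	{
		/*
		 * Hypothetical file system that still relies on buffer heads
		 * for sub-page dirty/uptodate tracking: keep the old
		 * __block_write_begin_int() based write path.
		 */
		iomap->flags |= IOMAP_F_BUFFER_HEAD;

		/* ... fill in iomap->type, iomap->offset, iomap->length ... */
		return 0;
	}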

 fs/iomap.c            | 129 ++++++++++++++++++++++++++++++++++++++----
 fs/xfs/xfs_iomap.c    |   6 +-
 include/linux/iomap.h |   2 +
 3 files changed, 124 insertions(+), 13 deletions(-)

diff --git a/fs/iomap.c b/fs/iomap.c
index 78259a2249f4..debb859a8a14 100644
--- a/fs/iomap.c
+++ b/fs/iomap.c
@@ -308,6 +308,49 @@ iomap_write_failed(struct inode *inode, loff_t pos, unsigned len)
 	truncate_pagecache_range(inode, max(pos, i_size), pos + len);
 }
 
+static int
+iomap_read_page_sync(struct inode *inode, loff_t block_start, struct page *page,
+		unsigned poff, unsigned plen, unsigned from, unsigned to,
+		struct iomap *iomap)
+{
+	struct bio_vec bvec;
+	struct bio bio;
+
+	if (iomap->type != IOMAP_MAPPED || block_start >= i_size_read(inode)) {
+		zero_user_segments(page, poff, from, to, poff + plen);
+		return 0;
+	}
+
+	bio_init(&bio, &bvec, 1);
+	bio.bi_opf = REQ_OP_READ;
+	bio.bi_iter.bi_sector = iomap_sector(iomap, block_start);
+	bio_set_dev(&bio, iomap->bdev);
+	__bio_add_page(&bio, page, plen, poff);
+	return submit_bio_wait(&bio);
+}
+
+static int
+__iomap_write_begin(struct inode *inode, loff_t pos, unsigned len,
+		struct page *page, struct iomap *iomap)
+{
+	loff_t block_size = i_blocksize(inode);
+	loff_t block_start = pos & ~(block_size - 1);
+	loff_t block_end = (pos + len + block_size - 1) & ~(block_size - 1);
+	unsigned poff = block_start & (PAGE_SIZE - 1);
+	unsigned plen = min_t(loff_t, PAGE_SIZE - poff, block_end - block_start);
+	unsigned from = pos & (PAGE_SIZE - 1);
+	unsigned to = from + len;
+
+	WARN_ON_ONCE(i_blocksize(inode) < PAGE_SIZE);
+
+	if (PageUptodate(page))
+		return 0;
+	if (from <= poff && to >= poff + plen)
+		return 0;
+	return iomap_read_page_sync(inode, block_start, page,
+			poff, plen, from, to, iomap);
+}
+
 static int
 iomap_write_begin(struct inode *inode, loff_t pos, unsigned len, unsigned flags,
 		struct page **pagep, struct iomap *iomap)
@@ -325,7 +368,10 @@ iomap_write_begin(struct inode *inode, loff_t pos, unsigned len, unsigned flags,
 	if (!page)
 		return -ENOMEM;
 
-	status = __block_write_begin_int(page, pos, len, NULL, iomap);
+	if (iomap->flags & IOMAP_F_BUFFER_HEAD)
+		status = __block_write_begin_int(page, pos, len, NULL, iomap);
+	else
+		status = __iomap_write_begin(inode, pos, len, page, iomap);
 	if (unlikely(status)) {
 		unlock_page(page);
 		put_page(page);
@@ -338,14 +384,69 @@ iomap_write_begin(struct inode *inode, loff_t pos, unsigned len, unsigned flags,
 	return status;
 }
 
+int
+iomap_set_page_dirty(struct page *page)
+{
+	struct address_space *mapping = page_mapping(page);
+	int newly_dirty;
+
+	if (unlikely(!mapping))
+		return !TestSetPageDirty(page);
+
+	/*
+	 * Lock out page->mem_cgroup migration to keep PageDirty
+	 * synchronized with per-memcg dirty page counters.
+	 */
+	lock_page_memcg(page);
+	newly_dirty = !TestSetPageDirty(page);
+	if (newly_dirty)
+		__set_page_dirty(page, mapping, 0);
+	unlock_page_memcg(page);
+
+	if (newly_dirty)
+		__mark_inode_dirty(mapping->host, I_DIRTY_PAGES);
+	return newly_dirty;
+}
+EXPORT_SYMBOL_GPL(iomap_set_page_dirty);
+
+static int
+__iomap_write_end(struct inode *inode, loff_t pos, unsigned len,
+		unsigned copied, struct page *page, struct iomap *iomap)
+{
+	flush_dcache_page(page);
+
+	/*
+	 * The blocks that were entirely written will now be uptodate, so we
+	 * don't have to worry about a readpage reading them and overwriting a
+	 * partial write. However if we have encountered a short write and only
+	 * partially written into a block, it will not be marked uptodate, so a
+	 * readpage might come in and destroy our partial write.
+	 *
+	 * Do the simplest thing, and just treat any short write to a non
+	 * uptodate page as a zero-length write, and force the caller to redo
+	 * the whole thing.
+	 */
+	if (unlikely(copied < len && !PageUptodate(page))) {
+		copied = 0;
+	} else {
+		SetPageUptodate(page);
+		iomap_set_page_dirty(page);
+	}
+	return __generic_write_end(inode, pos, copied, page);
+}
+
 static int
 iomap_write_end(struct inode *inode, loff_t pos, unsigned len,
-		unsigned copied, struct page *page)
+		unsigned copied, struct page *page, struct iomap *iomap)
 {
 	int ret;
 
-	ret = generic_write_end(NULL, inode->i_mapping, pos, len,
-			copied, page, NULL);
+	if (iomap->flags & IOMAP_F_BUFFER_HEAD)
+		ret = generic_write_end(NULL, inode->i_mapping, pos, len,
+				copied, page, NULL);
+	else
+		ret = __iomap_write_end(inode, pos, len, copied, page, iomap);
+
 	if (ret < len)
 		iomap_write_failed(inode, pos, len);
 	return ret;
@@ -400,7 +501,8 @@ iomap_write_actor(struct inode *inode, loff_t pos, loff_t length, void *data,
 
 		flush_dcache_page(page);
 
-		status = iomap_write_end(inode, pos, bytes, copied, page);
+		status = iomap_write_end(inode, pos, bytes, copied, page,
+				iomap);
 		if (unlikely(status < 0))
 			break;
 		copied = status;
@@ -494,7 +596,7 @@ iomap_dirty_actor(struct inode *inode, loff_t pos, loff_t length, void *data,
 
 		WARN_ON_ONCE(!PageUptodate(page));
 
-		status = iomap_write_end(inode, pos, bytes, bytes, page);
+		status = iomap_write_end(inode, pos, bytes, bytes, page, iomap);
 		if (unlikely(status <= 0)) {
 			if (WARN_ON_ONCE(status == 0))
 				return -EIO;
@@ -546,7 +648,7 @@ static int iomap_zero(struct inode *inode, loff_t pos, unsigned offset,
 	zero_user(page, offset, bytes);
 	mark_page_accessed(page);
 
-	return iomap_write_end(inode, pos, bytes, bytes, page);
+	return iomap_write_end(inode, pos, bytes, bytes, page, iomap);
 }
 
 static int iomap_dax_zero(loff_t pos, unsigned offset, unsigned bytes,
@@ -632,11 +734,16 @@ iomap_page_mkwrite_actor(struct inode *inode, loff_t pos, loff_t length,
 	struct page *page = data;
 	int ret;
 
-	ret = __block_write_begin_int(page, pos, length, NULL, iomap);
-	if (ret)
-		return ret;
+	if (iomap->flags & IOMAP_F_BUFFER_HEAD) {
+		ret = __block_write_begin_int(page, pos, length, NULL, iomap);
+		if (ret)
+			return ret;
+		block_commit_write(page, 0, length);
+	} else {
+		WARN_ON_ONCE(!PageUptodate(page));
+		WARN_ON_ONCE(i_blocksize(inode) < PAGE_SIZE);
+	}
 
-	block_commit_write(page, 0, length);
 	return length;
 }
 
diff --git a/fs/xfs/xfs_iomap.c b/fs/xfs/xfs_iomap.c
index c6ce6f9335b6..da6d1995e460 100644
--- a/fs/xfs/xfs_iomap.c
+++ b/fs/xfs/xfs_iomap.c
@@ -638,7 +638,7 @@ xfs_file_iomap_begin_delay(
 	 * Flag newly allocated delalloc blocks with IOMAP_F_NEW so we punch
 	 * them out if the write happens to fail.
 	 */
-	iomap->flags = IOMAP_F_NEW;
+	iomap->flags |= IOMAP_F_NEW;
 	trace_xfs_iomap_alloc(ip, offset, count, 0, &got);
 done:
 	if (isnullstartblock(got.br_startblock))
@@ -1031,6 +1031,8 @@ xfs_file_iomap_begin(
 	if (XFS_FORCED_SHUTDOWN(mp))
 		return -EIO;
 
+	iomap->flags |= IOMAP_F_BUFFER_HEAD;
+
 	if (((flags & (IOMAP_WRITE | IOMAP_DIRECT)) == IOMAP_WRITE) &&
 			!IS_DAX(inode) && !xfs_get_extsz_hint(ip)) {
 		/* Reserve delalloc blocks for regular writeback. */
@@ -1131,7 +1133,7 @@ xfs_file_iomap_begin(
 	if (error)
 		return error;
 
-	iomap->flags = IOMAP_F_NEW;
+	iomap->flags |= IOMAP_F_NEW;
 	trace_xfs_iomap_alloc(ip, offset, length, 0, &imap);
 
 out_finish:
diff --git a/include/linux/iomap.h b/include/linux/iomap.h
index 7300d30ca495..4d3d9d0cd69f 100644
--- a/include/linux/iomap.h
+++ b/include/linux/iomap.h
@@ -30,6 +30,7 @@ struct vm_fault;
  */
 #define IOMAP_F_NEW		0x01	/* blocks have been newly allocated */
 #define IOMAP_F_DIRTY		0x02	/* uncommitted metadata */
+#define IOMAP_F_BUFFER_HEAD	0x04	/* file system requires buffer heads */
 
 /*
  * Flags that only need to be reported for IOMAP_REPORT requests:
@@ -92,6 +93,7 @@ ssize_t iomap_file_buffered_write(struct kiocb *iocb, struct iov_iter *from,
 int iomap_readpage(struct page *page, const struct iomap_ops *ops);
 int iomap_readpages(struct address_space *mapping, struct list_head *pages,
 		unsigned nr_pages, const struct iomap_ops *ops);
+int iomap_set_page_dirty(struct page *page);
 int iomap_file_dirty(struct inode *inode, loff_t pos, loff_t len,
 		const struct iomap_ops *ops);
 int iomap_zero_range(struct inode *inode, loff_t pos, loff_t len,