From patchwork Tue Jul 21 18:31:55 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Christoph Hellwig X-Patchwork-Id: 11676399 Return-Path: Received: from mail.kernel.org (pdx-korg-mail-1.web.codeaurora.org [172.30.200.123]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id 5EA141510 for ; Tue, 21 Jul 2020 18:32:14 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by mail.kernel.org (Postfix) with ESMTP id 43E74207BB for ; Tue, 21 Jul 2020 18:32:14 +0000 (UTC) Authentication-Results: mail.kernel.org; dkim=fail reason="signature verification failed" (2048-bit key) header.d=infradead.org header.i=@infradead.org header.b="PERTZ1Jd" Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1730065AbgGUScL (ORCPT ); Tue, 21 Jul 2020 14:32:11 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:35772 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1726458AbgGUScG (ORCPT ); Tue, 21 Jul 2020 14:32:06 -0400 Received: from casper.infradead.org (casper.infradead.org [IPv6:2001:8b0:10b:1236::1]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 3F38AC061794; Tue, 21 Jul 2020 11:32:06 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; q=dns/txt; c=relaxed/relaxed; d=infradead.org; s=casper.20170209; h=Content-Transfer-Encoding:MIME-Version: References:In-Reply-To:Message-Id:Date:Subject:Cc:To:From:Sender:Reply-To: Content-Type:Content-ID:Content-Description; bh=GzIGQ/IDPs3sg9v+Bc2PW8SnyaJxpRjXtcrdrPZplIY=; b=PERTZ1JddArB5VG4b0YQjzt4lr 8BelUcw5MkPn9bek4/d71H7MarPZx++gJib+u66HHTa5NXAnp6oMXxumuklzs9lxfep1jMLcJEjBL q4BJI7w1ETZotvQFdJVhnd2oE8inbZd+qyfm1aHkoTKLMtAEdxDX08vz3Q+n7n1hSyvIsbtG+4Ygg wHcpoY8a+m1K85ixBP+LS4YTX8p8R1+eEFzRxdG0VdRJ781HCCqujBWi0U+9tIMzM72Uacfkr5A8x HneYl0O2yosFjGAaXX2+z6Z6VJpyEiBh72eXzRQvvJBG+0+sW8GttAzQj99w3+VJZtXmqgGoAYKId ss3ZaZrw==; Received: from [2001:4bb8:18c:2acc:5b1c:6483:bd6d:e406] (helo=localhost) by casper.infradead.org with esmtpsa (Exim 4.92.3 #3 (Red Hat Linux)) id 1jxx3Y-00062O-Pa; Tue, 21 Jul 2020 18:32:01 +0000 From: Christoph Hellwig To: Dave Chinner , Goldwyn Rodrigues Cc: Damien Le Moal , Naohiro Aota , Johannes Thumshirn , Matthew Wilcox , linux-btrfs@vger.kernel.org, linux-fsdevel@vger.kernel.org, cluster-devel@redhat.com, linux-ext4@vger.kernel.org, linux-xfs@vger.kernel.org Subject: [PATCH 1/3] xfs: use ENOTBLK for direct I/O to buffered I/O fallback Date: Tue, 21 Jul 2020 20:31:55 +0200 Message-Id: <20200721183157.202276-2-hch@lst.de> X-Mailer: git-send-email 2.27.0 In-Reply-To: <20200721183157.202276-1-hch@lst.de> References: <20200721183157.202276-1-hch@lst.de> MIME-Version: 1.0 X-SRS-Rewrite: SMTP reverse-path rewritten from by casper.infradead.org. See http://www.infradead.org/rpr.html Sender: linux-btrfs-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: linux-btrfs@vger.kernel.org This is what the classic fs/direct-io.c implementation and thuse other file systems use. Signed-off-by: Christoph Hellwig Reviewed-by: Darrick J. Wong --- fs/xfs/xfs_file.c | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/fs/xfs/xfs_file.c b/fs/xfs/xfs_file.c index 00db81eac80d6c..a6ef90457abf97 100644 --- a/fs/xfs/xfs_file.c +++ b/fs/xfs/xfs_file.c @@ -505,7 +505,7 @@ xfs_file_dio_aio_write( */ if (xfs_is_cow_inode(ip)) { trace_xfs_reflink_bounce_dio_write(ip, iocb->ki_pos, count); - return -EREMCHG; + return -ENOTBLK; } iolock = XFS_IOLOCK_EXCL; } else { @@ -714,7 +714,7 @@ xfs_file_write_iter( * allow an operation to fall back to buffered mode. */ ret = xfs_file_dio_aio_write(iocb, from); - if (ret != -EREMCHG) + if (ret != -ENOTBLK) return ret; } From patchwork Tue Jul 21 18:31:56 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Christoph Hellwig X-Patchwork-Id: 11676393 Return-Path: Received: from mail.kernel.org (pdx-korg-mail-1.web.codeaurora.org [172.30.200.123]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id 2578113A4 for ; Tue, 21 Jul 2020 18:32:12 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by mail.kernel.org (Postfix) with ESMTP id 0AA2D2072E for ; Tue, 21 Jul 2020 18:32:11 +0000 (UTC) Authentication-Results: mail.kernel.org; dkim=fail reason="signature verification failed" (2048-bit key) header.d=infradead.org header.i=@infradead.org header.b="YZLMGD/R" Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1730209AbgGUScL (ORCPT ); Tue, 21 Jul 2020 14:32:11 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:35778 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1729015AbgGUScI (ORCPT ); Tue, 21 Jul 2020 14:32:08 -0400 Received: from casper.infradead.org (casper.infradead.org [IPv6:2001:8b0:10b:1236::1]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 8FA2FC0619DB; Tue, 21 Jul 2020 11:32:07 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; q=dns/txt; c=relaxed/relaxed; d=infradead.org; s=casper.20170209; h=Content-Transfer-Encoding:MIME-Version: References:In-Reply-To:Message-Id:Date:Subject:Cc:To:From:Sender:Reply-To: Content-Type:Content-ID:Content-Description; bh=xU171t7ztfam8+xnaeFGlyAau2FdqaQtiVon3JxnpSU=; b=YZLMGD/R+ktbdwc4xiH9a29f5h hwNXGvtAxLPO3AL54x641XLSxIz2rAPmoxfMSymEfT/c1ybEz0H+UYEXhlx4jqYLPfZpn9mzRgkmP sOLGJlYpZlLHpzKqDOgKd73jxrcUgi20fcT3wOvb4j5jig1zCdMagouh5WCdWAWS0+AE+KlsR76tR NScrViP2XiQMiVgbjaUsnIaKp6UgUl7/WmwqvYfwxuPp5slvE/cB6izGeumc2S/5u/mgoAJHMfaTJ pNJI/8DhmehoXnCcedyqVLLW52JBBsLcWvvOCvMf9ukounNnrAttEfHX8A4w4YZQyCxIfk55ghgmh KluiZGhQ==; Received: from [2001:4bb8:18c:2acc:5b1c:6483:bd6d:e406] (helo=localhost) by casper.infradead.org with esmtpsa (Exim 4.92.3 #3 (Red Hat Linux)) id 1jxx3a-00062X-Oq; Tue, 21 Jul 2020 18:32:03 +0000 From: Christoph Hellwig To: Dave Chinner , Goldwyn Rodrigues Cc: Damien Le Moal , Naohiro Aota , Johannes Thumshirn , Matthew Wilcox , linux-btrfs@vger.kernel.org, linux-fsdevel@vger.kernel.org, cluster-devel@redhat.com, linux-ext4@vger.kernel.org, linux-xfs@vger.kernel.org, Dave Chinner , "Darrick J . Wong" Subject: [PATCH 2/3] iomap: Only invalidate page cache pages on direct IO writes Date: Tue, 21 Jul 2020 20:31:56 +0200 Message-Id: <20200721183157.202276-3-hch@lst.de> X-Mailer: git-send-email 2.27.0 In-Reply-To: <20200721183157.202276-1-hch@lst.de> References: <20200721183157.202276-1-hch@lst.de> MIME-Version: 1.0 X-SRS-Rewrite: SMTP reverse-path rewritten from by casper.infradead.org. See http://www.infradead.org/rpr.html Sender: linux-btrfs-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: linux-btrfs@vger.kernel.org From: Dave Chinner The historic requirement for XFS to invalidate cached pages on direct IO reads has been lost in the twisty pages of history - it was inherited from Irix, which implemented page cache invalidation on read as a method of working around problems synchronising page cache state with uncached IO. XFS has carried this ever since. In the initial linux ports it was necessary to get mmap and DIO to play "ok" together and not immediately corrupt data. This was the state of play until the linux kernel had infrastructure to track unwritten extents and synchronise page faults with allocations and unwritten extent conversions (->page_mkwrite infrastructure). IOws, the page cache invalidation on DIO read was necessary to prevent trivial data corruptions. This didn't solve all the problems, though. There were peformance problems if we didn't invalidate the entire page cache over the file on read - we couldn't easily determine if the cached pages were over the range of the IO, and invalidation required taking a serialising lock (i_mutex) on the inode. This serialising lock was an issue for XFS, as it was the only exclusive lock in the direct Io read path. Hence if there were any cached pages, we'd just invalidate the entire file in one go so that subsequent IOs didn't need to take the serialising lock. This was a problem that prevented ranged invalidation from being particularly useful for avoiding the remaining coherency issues. This was solved with the conversion of i_mutex to i_rwsem and the conversion of the XFS inode IO lock to use i_rwsem. Hence we could now just do ranged invalidation and the performance problem went away. However, page cache invalidation was still needed to serialise sub-page/sub-block zeroing via direct IO against buffered IO because bufferhead state attached to the cached page could get out of whack when direct IOs were issued. We've removed bufferheads from the XFS code, and we don't carry any extent state on the cached pages anymore, and so this problem has gone away, too. IOWs, it would appear that we don't have any good reason to be invalidating the page cache on DIO reads anymore. Hence remove the invalidation on read because it is unnecessary overhead, not needed to maintain coherency between mmap/buffered access and direct IO anymore, and prevents anyone from using direct IO reads from intentionally invalidating the page cache of a file. Signed-off-by: Dave Chinner Reviewed-by: Darrick J. Wong Reviewed-by: Matthew Wilcox (Oracle) Signed-off-by: Christoph Hellwig --- fs/iomap/direct-io.c | 31 +++++++++++++++---------------- 1 file changed, 15 insertions(+), 16 deletions(-) diff --git a/fs/iomap/direct-io.c b/fs/iomap/direct-io.c index ec7b78e6fecaf9..190967e87b69e4 100644 --- a/fs/iomap/direct-io.c +++ b/fs/iomap/direct-io.c @@ -475,23 +475,22 @@ iomap_dio_rw(struct kiocb *iocb, struct iov_iter *iter, if (ret) goto out_free_dio; - /* - * Try to invalidate cache pages for the range we're direct - * writing. If this invalidation fails, tough, the write will - * still work, but racing two incompatible write paths is a - * pretty crazy thing to do, so we don't support it 100%. - */ - ret = invalidate_inode_pages2_range(mapping, - pos >> PAGE_SHIFT, end >> PAGE_SHIFT); - if (ret) - dio_warn_stale_pagecache(iocb->ki_filp); - ret = 0; + if (iov_iter_rw(iter) == WRITE) { + /* + * Try to invalidate cache pages for the range we are writing. + * If this invalidation fails, tough, the write will still work, + * but racing two incompatible write paths is a pretty crazy + * thing to do, so we don't support it 100%. + */ + if (invalidate_inode_pages2_range(mapping, pos >> PAGE_SHIFT, + end >> PAGE_SHIFT)) + dio_warn_stale_pagecache(iocb->ki_filp); - if (iov_iter_rw(iter) == WRITE && !wait_for_completion && - !inode->i_sb->s_dio_done_wq) { - ret = sb_init_dio_done_wq(inode->i_sb); - if (ret < 0) - goto out_free_dio; + if (!wait_for_completion && !inode->i_sb->s_dio_done_wq) { + ret = sb_init_dio_done_wq(inode->i_sb); + if (ret < 0) + goto out_free_dio; + } } inode_dio_begin(inode); From patchwork Tue Jul 21 18:31:57 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Christoph Hellwig X-Patchwork-Id: 11676407 Return-Path: Received: from mail.kernel.org (pdx-korg-mail-1.web.codeaurora.org [172.30.200.123]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id C3E5913A4 for ; Tue, 21 Jul 2020 18:32:16 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by mail.kernel.org (Postfix) with ESMTP id A1EF12072E for ; Tue, 21 Jul 2020 18:32:16 +0000 (UTC) Authentication-Results: mail.kernel.org; dkim=fail reason="signature verification failed" (2048-bit key) header.d=infradead.org header.i=@infradead.org header.b="YCrFq48B" Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1730241AbgGUScQ (ORCPT ); Tue, 21 Jul 2020 14:32:16 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:35784 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1728577AbgGUScJ (ORCPT ); Tue, 21 Jul 2020 14:32:09 -0400 Received: from casper.infradead.org (casper.infradead.org [IPv6:2001:8b0:10b:1236::1]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 1D6E2C0619DC; Tue, 21 Jul 2020 11:32:09 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; q=dns/txt; c=relaxed/relaxed; d=infradead.org; s=casper.20170209; h=Content-Transfer-Encoding:MIME-Version: References:In-Reply-To:Message-Id:Date:Subject:Cc:To:From:Sender:Reply-To: Content-Type:Content-ID:Content-Description; bh=g1iU+g80X5NheLUj70Usw4wskv5FEEX9gbzjsPxiIG4=; b=YCrFq48BOZ+ZpLcUDlo3t/YS22 VhEtv/+6o60MkrpDfnDvCDmuIgcTTmA3psW4K9+lrJx8V7zgFmwewa5YW7a5s5uHA1ClLqUQiAi7D 4de1hqXxmVp6bDFwPv41ymH535PNS/2X7qMjalJyNwk8tZsHgdRY/btyGm/82XKa/7fj73Rr/SKwd 4mdb7PBCiAPtzyLbtQiwXwcHixdWI+/XaWgshBP/Yw3OaNe/3W6dx2etX7gQWy9OABLtTvqnwpOx9 fyjZUFBtLmEx6f9NNMRipziu+ngzNFBtpeAY+ULvC1stSgszyjhvqo+jsP9jCkktH04Wordcp6JvY E3hctjfw==; Received: from [2001:4bb8:18c:2acc:5b1c:6483:bd6d:e406] (helo=localhost) by casper.infradead.org with esmtpsa (Exim 4.92.3 #3 (Red Hat Linux)) id 1jxx3c-00062e-DZ; Tue, 21 Jul 2020 18:32:05 +0000 From: Christoph Hellwig To: Dave Chinner , Goldwyn Rodrigues Cc: Damien Le Moal , Naohiro Aota , Johannes Thumshirn , Matthew Wilcox , linux-btrfs@vger.kernel.org, linux-fsdevel@vger.kernel.org, cluster-devel@redhat.com, linux-ext4@vger.kernel.org, linux-xfs@vger.kernel.org, Dave Chinner , Goldwyn Rodrigues Subject: [PATCH 3/3] iomap: fall back to buffered writes for invalidation failures Date: Tue, 21 Jul 2020 20:31:57 +0200 Message-Id: <20200721183157.202276-4-hch@lst.de> X-Mailer: git-send-email 2.27.0 In-Reply-To: <20200721183157.202276-1-hch@lst.de> References: <20200721183157.202276-1-hch@lst.de> MIME-Version: 1.0 X-SRS-Rewrite: SMTP reverse-path rewritten from by casper.infradead.org. See http://www.infradead.org/rpr.html Sender: linux-btrfs-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: linux-btrfs@vger.kernel.org Failing to invalid the page cache means data in incoherent, which is a very bad state for the system. Always fall back to buffered I/O through the page cache if we can't invalidate mappings. Signed-off-by: Christoph Hellwig Acked-by: Dave Chinner Reviewed-by: Goldwyn Rodrigues Reviewed-by: Darrick J. Wong Acked-by: Damien Le Moal Acked-by: Bob Peterson Reviewed-by: Theodore Ts'o # for ext4 Reviewed-by: Ritesh Harjani Reviewed-by: Andreas Gruenbacher # for gfs2 --- fs/ext4/file.c | 2 ++ fs/gfs2/file.c | 3 ++- fs/iomap/direct-io.c | 16 +++++++++++----- fs/iomap/trace.h | 1 + fs/xfs/xfs_file.c | 4 ++-- fs/zonefs/super.c | 7 +++++-- 6 files changed, 23 insertions(+), 10 deletions(-) diff --git a/fs/ext4/file.c b/fs/ext4/file.c index 2a01e31a032c4c..129cc1dd6b7952 100644 --- a/fs/ext4/file.c +++ b/fs/ext4/file.c @@ -544,6 +544,8 @@ static ssize_t ext4_dio_write_iter(struct kiocb *iocb, struct iov_iter *from) iomap_ops = &ext4_iomap_overwrite_ops; ret = iomap_dio_rw(iocb, from, iomap_ops, &ext4_dio_write_ops, is_sync_kiocb(iocb) || unaligned_io || extend); + if (ret == -ENOTBLK) + ret = 0; if (extend) ret = ext4_handle_inode_extension(inode, offset, ret, count); diff --git a/fs/gfs2/file.c b/fs/gfs2/file.c index bebde537ac8cf2..b085a3bea4f0fd 100644 --- a/fs/gfs2/file.c +++ b/fs/gfs2/file.c @@ -835,7 +835,8 @@ static ssize_t gfs2_file_direct_write(struct kiocb *iocb, struct iov_iter *from) ret = iomap_dio_rw(iocb, from, &gfs2_iomap_ops, NULL, is_sync_kiocb(iocb)); - + if (ret == -ENOTBLK) + ret = 0; out: gfs2_glock_dq(&gh); out_uninit: diff --git a/fs/iomap/direct-io.c b/fs/iomap/direct-io.c index 190967e87b69e4..c1aafb2ab99072 100644 --- a/fs/iomap/direct-io.c +++ b/fs/iomap/direct-io.c @@ -10,6 +10,7 @@ #include #include #include +#include "trace.h" #include "../internal.h" @@ -401,6 +402,9 @@ iomap_dio_actor(struct inode *inode, loff_t pos, loff_t length, * can be mapped into multiple disjoint IOs and only a subset of the IOs issued * may be pure data writes. In that case, we still need to do a full data sync * completion. + * + * Returns -ENOTBLK In case of a page invalidation invalidation failure for + * writes. The callers needs to fall back to buffered I/O in this case. */ ssize_t iomap_dio_rw(struct kiocb *iocb, struct iov_iter *iter, @@ -478,13 +482,15 @@ iomap_dio_rw(struct kiocb *iocb, struct iov_iter *iter, if (iov_iter_rw(iter) == WRITE) { /* * Try to invalidate cache pages for the range we are writing. - * If this invalidation fails, tough, the write will still work, - * but racing two incompatible write paths is a pretty crazy - * thing to do, so we don't support it 100%. + * If this invalidation fails, let the caller fall back to + * buffered I/O. */ if (invalidate_inode_pages2_range(mapping, pos >> PAGE_SHIFT, - end >> PAGE_SHIFT)) - dio_warn_stale_pagecache(iocb->ki_filp); + end >> PAGE_SHIFT)) { + trace_iomap_dio_invalidate_fail(inode, pos, count); + ret = -ENOTBLK; + goto out_free_dio; + } if (!wait_for_completion && !inode->i_sb->s_dio_done_wq) { ret = sb_init_dio_done_wq(inode->i_sb); diff --git a/fs/iomap/trace.h b/fs/iomap/trace.h index 5693a39d52fb63..fdc7ae388476f5 100644 --- a/fs/iomap/trace.h +++ b/fs/iomap/trace.h @@ -74,6 +74,7 @@ DEFINE_EVENT(iomap_range_class, name, \ DEFINE_RANGE_EVENT(iomap_writepage); DEFINE_RANGE_EVENT(iomap_releasepage); DEFINE_RANGE_EVENT(iomap_invalidatepage); +DEFINE_RANGE_EVENT(iomap_dio_invalidate_fail); #define IOMAP_TYPE_STRINGS \ { IOMAP_HOLE, "HOLE" }, \ diff --git a/fs/xfs/xfs_file.c b/fs/xfs/xfs_file.c index a6ef90457abf97..1b4517fc55f1b9 100644 --- a/fs/xfs/xfs_file.c +++ b/fs/xfs/xfs_file.c @@ -553,8 +553,8 @@ xfs_file_dio_aio_write( xfs_iunlock(ip, iolock); /* - * No fallback to buffered IO on errors for XFS, direct IO will either - * complete fully or fail. + * No fallback to buffered IO after short writes for XFS, direct I/O + * will either complete fully or return an error. */ ASSERT(ret < 0 || ret == count); return ret; diff --git a/fs/zonefs/super.c b/fs/zonefs/super.c index 07bc42d62673ce..d0a04528a7e18e 100644 --- a/fs/zonefs/super.c +++ b/fs/zonefs/super.c @@ -786,8 +786,11 @@ static ssize_t zonefs_file_write_iter(struct kiocb *iocb, struct iov_iter *from) if (iocb->ki_pos >= ZONEFS_I(inode)->i_max_size) return -EFBIG; - if (iocb->ki_flags & IOCB_DIRECT) - return zonefs_file_dio_write(iocb, from); + if (iocb->ki_flags & IOCB_DIRECT) { + ssize_t ret = zonefs_file_dio_write(iocb, from); + if (ret != -ENOTBLK) + return ret; + } return zonefs_file_buffered_write(iocb, from); }