[01/10] ext4: write out dirty data before dropping pages

Message ID	20240830073800.2131781-2-yi.zhang@huaweicloud.com
State	New
Series	ext4: clean up and refactor fallocate
	[00/10] ext4: clean up and refactor fallocate
	[01/10] ext4: write out dirty data before dropping pages
	[02/10] ext4: don't explicit update times in ext4_fallocate()
	[03/10] ext4: drop ext4_update_disksize_before_punch()
	[04/10] ext4: refactor ext4_zero_range()
	[05/10] ext4: refactor ext4_punch_hole()
	[06/10] ext4: refactor ext4_collapse_range()
	[07/10] ext4: refactor ext4_insert_range()
	[08/10] ext4: factor out ext4_do_fallocate()
	[09/10] ext4: factor out the common checking part of all fallocate operations
	[10/10] ext4: factor out a common helper to lock and flush data before fallocate

Headers

From: Zhang Yi <yi.zhang@huaweicloud.com>
To: linux-ext4@vger.kernel.org
Cc: linux-fsdevel@vger.kernel.org,
	linux-kernel@vger.kernel.org,
	tytso@mit.edu,
	adilger.kernel@dilger.ca,
	jack@suse.cz,
	ritesh.list@gmail.com,
	yi.zhang@huawei.com,
	yi.zhang@huaweicloud.com,
	chengzhihao1@huawei.com,
	yukuai3@huawei.com
Subject: [PATCH 01/10] ext4: write out dirty data before dropping pages
Date: Fri, 30 Aug 2024 15:37:51 +0800
Message-Id: <20240830073800.2131781-2-yi.zhang@huaweicloud.com>
In-Reply-To: <20240830073800.2131781-1-yi.zhang@huaweicloud.com>
References: <20240830073800.2131781-1-yi.zhang@huaweicloud.com>
Precedence: bulk
MIME-Version: 1.0
Content-Transfer-Encoding: 8bit

Commit Message

Zhang Yi Aug. 30, 2024, 7:37 a.m. UTC

From: Zhang Yi <yi.zhang@huawei.com>

Currently, zero range, punch hole and collapse range share a common
potential data loss problem. In general, ext4_zero_range(),
ext4_collapse_range() and ext4_punch_hole() discard all page cache in
the operation range before converting the extent status. However, the
first two functions don't write back all dirty data in the range before
discarding the page cache, and ext4_punch_hole() writes it back at the
very beginning, without holding i_rwsem and the mapping invalidate
lock. Hence, if something goes wrong (e.g. EIO or ENOMEM) just after the
dirty page cache has been dropped, the operation fails and the user's
valid data in the dirty page cache is lost. Fix this by writing back all
dirty data under i_rwsem and the mapping invalidate lock before
discarding pages.
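
The failure mode above can be made concrete with a minimal user-space sketch. This is a toy model, not ext4 code. `struct toy_file`, `toy_writeback()`, `toy_drop_cache()`, `toy_zero_range()` and `toy_setup()` are hypothetical stand-ins for the page cache, filemap_write_and_wait_range(), truncate_pagecache_range() and the extent conversion. The conversion failure is injected by a flag rather than coming from real I/O.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <string.h>

#define NPAGES 4

/* Hypothetical toy file: one cached byte and one on-disk byte per "page". */
struct toy_file {
	char cache[NPAGES];
	bool cached[NPAGES];
	bool dirty[NPAGES];
	char disk[NPAGES];
};

/* Models filemap_write_and_wait_range(): persist dirty pages in [first, last]. */
static void toy_writeback(struct toy_file *f, int first, int last)
{
	assert(first >= 0 && last < NPAGES && first <= last);
	for (int i = first; i <= last; i++) {
		if (f->cached[i] && f->dirty[i]) {
			f->disk[i] = f->cache[i];
			f->dirty[i] = false;
		}
	}
}

/* Models truncate_pagecache_range(): drop pages, even dirty ones. */
static void toy_drop_cache(struct toy_file *f, int first, int last)
{
	assert(first >= 0 && last < NPAGES && first <= last);
	for (int i = first; i <= last; i++)
		f->cached[i] = f->dirty[i] = false;
}

/* A read returns the cached copy if present, else the on-disk copy. */
static char toy_read(const struct toy_file *f, int i)
{
	return f->cached[i] ? f->cache[i] : f->disk[i];
}

/*
 * Zero pages [first, last]. @flush_first selects the patched order
 * (write back, then drop). @convert_fails injects an error in the
 * step that would convert the extents.
 */
static int toy_zero_range(struct toy_file *f, int first, int last,
			  bool flush_first, bool convert_fails)
{
	if (flush_first)
		toy_writeback(f, first, last);
	toy_drop_cache(f, first, last);
	if (convert_fails)
		return -EIO;
	for (int i = first; i <= last; i++)
		f->disk[i] = 0;
	return 0;
}

/* Old data 'o' on disk; page 1 holds newer, not yet written data 'N'. */
static void toy_setup(struct toy_file *f)
{
	memset(f, 0, sizeof(*f));
	memset(f->disk, 'o', NPAGES);
	f->cache[1] = 'N';
	f->cached[1] = f->dirty[1] = true;
}
```

With the old order (`flush_first = false`), a failed conversion after the cache is dropped leaves page 1 reading the stale on-disk byte 'o', so the user's 'N' is lost. With the patched order, page 1 still reads 'N'. The model ignores locking on purpose and only shows why the writeback has to precede the drop.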

Signed-off-by: Zhang Yi <yi.zhang@huawei.com>
---
 fs/ext4/extents.c | 77 +++++++++++++++++------------------------------
 fs/ext4/inode.c   | 19 +++++-------
 2 files changed, 36 insertions(+), 60 deletions(-)

diff --git a/fs/ext4/extents.c b/fs/ext4/extents.c
index e067f2dd0335..7d5edfa2e630 100644
--- a/fs/ext4/extents.c
+++ b/fs/ext4/extents.c
@@ -4602,6 +4602,24 @@  static long ext4_zero_range(struct file *file, loff_t offset,
 	if (ret)
 		goto out_mutex;
 
+	/*
+	 * Prevent page faults from reinstantiating pages we have released
+	 * from page cache.
+	 */
+	filemap_invalidate_lock(mapping);
+
+	ret = ext4_break_layouts(inode);
+	if (ret)
+		goto out_invalidate_lock;
+
+	/*
+	 * Write data that will be zeroed to preserve them when successfully
+	 * discarding page cache below but fail to convert extents.
+	 */
+	ret = filemap_write_and_wait_range(mapping, start, end - 1);
+	if (ret)
+		goto out_invalidate_lock;
+
 	/* Preallocate the range including the unaligned edges */
 	if (partial_begin || partial_end) {
 		ret = ext4_alloc_file_blocks(file,
@@ -4610,7 +4628,7 @@  static long ext4_zero_range(struct file *file, loff_t offset,
 				 round_down(offset, 1 << blkbits)) >> blkbits,
 				new_size, flags);
 		if (ret)
-			goto out_mutex;
+			goto out_invalidate_lock;
 
 	}
 
@@ -4619,37 +4637,9 @@  static long ext4_zero_range(struct file *file, loff_t offset,
 		flags |= (EXT4_GET_BLOCKS_CONVERT_UNWRITTEN |
 			  EXT4_EX_NOCACHE);
 
-		/*
-		 * Prevent page faults from reinstantiating pages we have
-		 * released from page cache.
-		 */
-		filemap_invalidate_lock(mapping);
-
-		ret = ext4_break_layouts(inode);
-		if (ret) {
-			filemap_invalidate_unlock(mapping);
-			goto out_mutex;
-		}
-
 		ret = ext4_update_disksize_before_punch(inode, offset, len);
-		if (ret) {
-			filemap_invalidate_unlock(mapping);
-			goto out_mutex;
-		}
-
-		/*
-		 * For journalled data we need to write (and checkpoint) pages
-		 * before discarding page cache to avoid inconsitent data on
-		 * disk in case of crash before zeroing trans is committed.
-		 */
-		if (ext4_should_journal_data(inode)) {
-			ret = filemap_write_and_wait_range(mapping, start,
-							   end - 1);
-			if (ret) {
-				filemap_invalidate_unlock(mapping);
-				goto out_mutex;
-			}
-		}
+		if (ret)
+			goto out_invalidate_lock;
 
 		/* Now release the pages and zero block aligned part of pages */
 		truncate_pagecache_range(inode, start, end - 1);
@@ -4657,12 +4647,11 @@  static long ext4_zero_range(struct file *file, loff_t offset,
 
 		ret = ext4_alloc_file_blocks(file, lblk, max_blocks, new_size,
 					     flags);
-		filemap_invalidate_unlock(mapping);
 		if (ret)
-			goto out_mutex;
+			goto out_invalidate_lock;
 	}
 	if (!partial_begin && !partial_end)
-		goto out_mutex;
+		goto out_invalidate_lock;
 
 	/*
 	 * In worst case we have to writeout two nonadjacent unwritten
@@ -4675,7 +4664,7 @@  static long ext4_zero_range(struct file *file, loff_t offset,
 	if (IS_ERR(handle)) {
 		ret = PTR_ERR(handle);
 		ext4_std_error(inode->i_sb, ret);
-		goto out_mutex;
+		goto out_invalidate_lock;
 	}
 
 	inode_set_mtime_to_ts(inode, inode_set_ctime_current(inode));
@@ -4694,6 +4683,8 @@  static long ext4_zero_range(struct file *file, loff_t offset,
 
 out_handle:
 	ext4_journal_stop(handle);
+out_invalidate_lock:
+	filemap_invalidate_unlock(mapping);
 out_mutex:
 	inode_unlock(inode);
 	return ret;
@@ -5363,20 +5354,8 @@  static int ext4_collapse_range(struct file *file, loff_t offset, loff_t len)
 	 * for page size > block size.
 	 */
 	ioffset = round_down(offset, PAGE_SIZE);
-	/*
-	 * Write tail of the last page before removed range since it will get
-	 * removed from the page cache below.
-	 */
-	ret = filemap_write_and_wait_range(mapping, ioffset, offset);
-	if (ret)
-		goto out_mmap;
-	/*
-	 * Write data that will be shifted to preserve them when discarding
-	 * page cache below. We are also protected from pages becoming dirty
-	 * by i_rwsem and invalidate_lock.
-	 */
-	ret = filemap_write_and_wait_range(mapping, offset + len,
-					   LLONG_MAX);
+	/* Write out all dirty pages */
+	ret = filemap_write_and_wait_range(mapping, ioffset, LLONG_MAX);
 	if (ret)
 		goto out_mmap;
 	truncate_pagecache(inode, ioffset);
diff --git a/fs/ext4/inode.c b/fs/ext4/inode.c
index 941c1c0d5c6e..c3d7606a5315 100644
--- a/fs/ext4/inode.c
+++ b/fs/ext4/inode.c
@@ -3957,17 +3957,6 @@  int ext4_punch_hole(struct file *file, loff_t offset, loff_t length)
 
 	trace_ext4_punch_hole(inode, offset, length, 0);
 
-	/*
-	 * Write out all dirty pages to avoid race conditions
-	 * Then release them.
-	 */
-	if (mapping_tagged(mapping, PAGECACHE_TAG_DIRTY)) {
-		ret = filemap_write_and_wait_range(mapping, offset,
-						   offset + length - 1);
-		if (ret)
-			return ret;
-	}
-
 	inode_lock(inode);
 
 	/* No need to punch hole beyond i_size */
@@ -4021,6 +4010,14 @@  int ext4_punch_hole(struct file *file, loff_t offset, loff_t length)
 	if (ret)
 		goto out_dio;
 
+	/* Write out all dirty pages to avoid race conditions */
+	if (mapping_tagged(mapping, PAGECACHE_TAG_DIRTY)) {
+		ret = filemap_write_and_wait_range(mapping, offset,
+						   offset + length - 1);
+		if (ret)
+			goto out_dio;
+	}
+
 	first_block_offset = round_up(offset, sb->s_blocksize);
 	last_block_offset = round_down((offset + length), sb->s_blocksize) - 1;
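
The patch gives ext4_zero_range() a specific lock and cleanup ordering. It takes the invalidate lock once, writes back and drops pages while holding it, and unwinds through out_invalidate_lock and out_mutex. That ordering can be sketched in user space as follows. This is an assumption-laden model. POSIX mutexes stand in for i_rwsem (really an rw_semaphore) and the mapping's invalidate_lock. `struct toy_inode`, `toy_step()`, `toy_fallocate_op()` and the STEP_* values are hypothetical placeholders for the real calls such as ext4_break_layouts() and filemap_write_and_wait_range().

```c
#include <assert.h>
#include <errno.h>
#include <pthread.h>
#include <stdbool.h>

/* Hypothetical stand-ins for the two kernel locks. */
struct toy_inode {
	pthread_mutex_t i_rwsem;         /* models inode_lock()/inode_unlock() */
	pthread_mutex_t invalidate_lock; /* models filemap_invalidate_lock() */
	int steps;                       /* last step that completed */
};

/* Placeholder steps; 0 means "no injected failure". */
enum { STEP_CHECK = 1, STEP_BREAK_LAYOUTS, STEP_WRITEBACK,
       STEP_DROP_CACHE, STEP_CONVERT };

static void toy_inode_init(struct toy_inode *inode)
{
	pthread_mutex_init(&inode->i_rwsem, NULL);
	pthread_mutex_init(&inode->invalidate_lock, NULL);
	inode->steps = 0;
}

/* Returns 0 or -EIO; @fail_at picks the step that fails. */
static int toy_step(struct toy_inode *inode, int step, int fail_at)
{
	if (step == fail_at)
		return -EIO;
	inode->steps = step;
	return 0;
}

static int toy_fallocate_op(struct toy_inode *inode, int fail_at)
{
	int ret;

	inode->steps = 0;
	pthread_mutex_lock(&inode->i_rwsem);
	ret = toy_step(inode, STEP_CHECK, fail_at); /* e.g. size/flag checks */
	if (ret)
		goto out_mutex;

	/* Block page faults from re-instantiating pages we will drop. */
	pthread_mutex_lock(&inode->invalidate_lock);
	ret = toy_step(inode, STEP_BREAK_LAYOUTS, fail_at);
	if (ret)
		goto out_invalidate_lock;
	/* Write back under both locks so nothing can redirty the range. */
	ret = toy_step(inode, STEP_WRITEBACK, fail_at);
	if (ret)
		goto out_invalidate_lock;
	ret = toy_step(inode, STEP_DROP_CACHE, fail_at);
	if (ret)
		goto out_invalidate_lock;
	ret = toy_step(inode, STEP_CONVERT, fail_at);

out_invalidate_lock:
	pthread_mutex_unlock(&inode->invalidate_lock);
out_mutex:
	pthread_mutex_unlock(&inode->i_rwsem);
	return ret;
}

/* True if neither lock is held (trylock succeeds on both). */
static bool toy_locks_free(struct toy_inode *inode)
{
	if (pthread_mutex_trylock(&inode->i_rwsem))
		return false;
	pthread_mutex_unlock(&inode->i_rwsem);
	if (pthread_mutex_trylock(&inode->invalidate_lock))
		return false;
	pthread_mutex_unlock(&inode->invalidate_lock);
	return true;
}
```

Whichever step fails, the labels release the locks in the reverse order they were taken. In the patch, replacing the scattered filemap_invalidate_unlock() calls with a single out_invalidate_lock label is what achieves this.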
