zonefs: Fix O_APPEND async write handling

Message ID	20210311032230.159925-1-damien.lemoal@wdc.com (mailing list archive)
State	New, archived
Headers	show Return-Path: <linux-fsdevel-owner@kernel.org> IronPort-SDR: XpdIc0vLNU+ydtCSba3EgqluynA4Nr/LX3K2mbu9EFTt4AI47PWk0aKnuvF/VDu+m+xX74CgCy Nr3NV0TGkyytsQxRYoGe7l2y0Emp/VYdXVdZp/OKjl34AiJhlPmTvOqC2vj4pE5vPv8y3RoNql Zut5lplLZA8v7DaGmwN2Ni9hhvUNd0eZyFOiJWlbUxjrjupxCoTqS7oK2CHVmkyv1yD45pe5Xz CQ5+oJqeEFMNgByLV2sr9Waohqw+VG0ccYkT4HAKz+vwihxLe9CkFXpH9+OJu2/Ydw5XZTr8Q0 FD8= IronPort-SDR: 1nWsVF32sStDUeQPivZ+Rwcx2smPBqsnVkmp4ncRRdsIcTr+b+oBlghmbqXUnZIK8mkbbciWHr gAmwpNWLT2leMr8UYJLDY3Vnk3U1YsYdM2LXhV+Cj9K1m8Km6f0jLbeaM0QNHMj2rcaKp6j05G fRJAxjX9VjrxFswmXbtonXplet13YVIgTSa8UmYH5fL++QPWCq9DYXVrKqHBBdz3r1k+yQJrHs ywgAYuKozaoNMQN+N2f4OcK/saWQifFOVhYyXeeVkWOwOfpXJiMMlMZ+AD86k4X5h/ayg+ctnd 8WTIWfFpQvHIEQYsRAWb7si+ IronPort-SDR: jYyXaUwZDr3EzdMtGwePDQ9iCQBXzhkJjEdXLhL/KcAYsfhHuvWHAmMMKPVV7zwK7pXuXw1CPj 4D2oNmiOE9WnoDVkt5oOjVs9etVTNLD0t4uHiEuJX7kJXfRvTfmCouJa06GINE1pjFqdCSQpoo FcA4nfjNxaTNIlEwk3CFtyk8GVczGOGMsEl2jXbd/cAbGvAburcEXMuQu8youBUhGgQs51JQqw JJJ7TJJVL1J0flNKIZ6q4Hpkwccex7IRvat2pmuX57mVPYUbdUT9ftaa0T+k9bM+OqSAHyDYY3 5aw= WDCIronportException: Internal From: Damien Le Moal <damien.lemoal@wdc.com> To: linux-fsdevel@vger.kernel.org Cc: Johannes Thumshirn <johannes.thumshirn@wdc.com> Subject: [PATCH] zonefs: Fix O_APPEND async write handling Date: Thu, 11 Mar 2021 12:22:30 +0900 Message-Id: <20210311032230.159925-1-damien.lemoal@wdc.com> MIME-Version: 1.0 Content-Transfer-Encoding: 8bit Precedence: bulk
Series	zonefs: Fix O_APPEND async write handling \| expand zonefs: Fix O_APPEND async write handling

Message ID

20210311032230.159925-1-damien.lemoal@wdc.com (mailing list archive)

State

New, archived

Headers

IronPort-SDR: 
 XpdIc0vLNU+ydtCSba3EgqluynA4Nr/LX3K2mbu9EFTt4AI47PWk0aKnuvF/VDu+m+xX74CgCy
 Nr3NV0TGkyytsQxRYoGe7l2y0Emp/VYdXVdZp/OKjl34AiJhlPmTvOqC2vj4pE5vPv8y3RoNql
 Zut5lplLZA8v7DaGmwN2Ni9hhvUNd0eZyFOiJWlbUxjrjupxCoTqS7oK2CHVmkyv1yD45pe5Xz
 CQ5+oJqeEFMNgByLV2sr9Waohqw+VG0ccYkT4HAKz+vwihxLe9CkFXpH9+OJu2/Ydw5XZTr8Q0
 FD8=
IronPort-SDR: 
 1nWsVF32sStDUeQPivZ+Rwcx2smPBqsnVkmp4ncRRdsIcTr+b+oBlghmbqXUnZIK8mkbbciWHr
 gAmwpNWLT2leMr8UYJLDY3Vnk3U1YsYdM2LXhV+Cj9K1m8Km6f0jLbeaM0QNHMj2rcaKp6j05G
 fRJAxjX9VjrxFswmXbtonXplet13YVIgTSa8UmYH5fL++QPWCq9DYXVrKqHBBdz3r1k+yQJrHs
 ywgAYuKozaoNMQN+N2f4OcK/saWQifFOVhYyXeeVkWOwOfpXJiMMlMZ+AD86k4X5h/ayg+ctnd
 8WTIWfFpQvHIEQYsRAWb7si+
IronPort-SDR: 
 jYyXaUwZDr3EzdMtGwePDQ9iCQBXzhkJjEdXLhL/KcAYsfhHuvWHAmMMKPVV7zwK7pXuXw1CPj
 4D2oNmiOE9WnoDVkt5oOjVs9etVTNLD0t4uHiEuJX7kJXfRvTfmCouJa06GINE1pjFqdCSQpoo
 FcA4nfjNxaTNIlEwk3CFtyk8GVczGOGMsEl2jXbd/cAbGvAburcEXMuQu8youBUhGgQs51JQqw
 JJJ7TJJVL1J0flNKIZ6q4Hpkwccex7IRvat2pmuX57mVPYUbdUT9ftaa0T+k9bM+OqSAHyDYY3
 5aw=
WDCIronportException: Internal
From: Damien Le Moal <damien.lemoal@wdc.com>
To: linux-fsdevel@vger.kernel.org
Cc: Johannes Thumshirn <johannes.thumshirn@wdc.com>
Subject: [PATCH] zonefs: Fix O_APPEND async write handling
Date: Thu, 11 Mar 2021 12:22:30 +0900
Message-Id: <20210311032230.159925-1-damien.lemoal@wdc.com>
MIME-Version: 1.0
Content-Transfer-Encoding: 8bit
Precedence: bulk

Series

zonefs: Fix O_APPEND async write handling | expand

Commit Message

Damien Le Moal March 11, 2021, 3:22 a.m. UTC

zonefs updates the size of a sequential zone file inode only on
completion of direct writes. When executing asynchronous append writes
(with a file open with O_APPEND or using RWF_APPEND), the use of the
current inode size in generic_write_checks() to set an iocb offset thus
leads to unaligned write if an application issues an append write
operation with another write already being executed.

Fix this problem by introducing zonefs_write_checks() as a modified
version of generic_write_checks() using the file inode wp_offset for an
append write iocb offset. Also introduce zonefs_write_check_limits() to
replace generic_write_check_limits() call. This zonefs special helper
makes sure that the maximum file limit used is the maximum size of the
file being accessed.

Since zonefs_write_checks() already truncates the iov_iter, the calls
to iov_iter_truncate() in zonefs_file_dio_write() and
zonefs_file_buffered_write() are removed.

Fixes: 8dcc1a9d90c1 ("fs: New zonefs file system")
Cc: <stable@vger.kernel.org>
Signed-off-by: Damien Le Moal <damien.lemoal@wdc.com>
---
 fs/zonefs/super.c | 76 ++++++++++++++++++++++++++++++++++++++++-------
 1 file changed, 66 insertions(+), 10 deletions(-)

Comments

Darrick J. Wong March 11, 2021, 3:36 a.m. UTC | #1

On Thu, Mar 11, 2021 at 12:22:30PM +0900, Damien Le Moal wrote:
> zonefs updates the size of a sequential zone file inode only on
> completion of direct writes. When executing asynchronous append writes
> (with a file open with O_APPEND or using RWF_APPEND), the use of the
> current inode size in generic_write_checks() to set an iocb offset thus
> leads to unaligned write if an application issues an append write
> operation with another write already being executed.

Ah, I /had/ wondered if setting i_size to the zone size (instead of the
write pointer) would have side effects...

> Fix this problem by introducing zonefs_write_checks() as a modified
> version of generic_write_checks() using the file inode wp_offset for an
> append write iocb offset. Also introduce zonefs_write_check_limits() to
> replace generic_write_check_limits() call. This zonefs special helper
> makes sure that the maximum file limit used is the maximum size of the
> file being accessed.
> 
> Since zonefs_write_checks() already truncates the iov_iter, the calls
> to iov_iter_truncate() in zonefs_file_dio_write() and
> zonefs_file_buffered_write() are removed.
> 
> Fixes: 8dcc1a9d90c1 ("fs: New zonefs file system")
> Cc: <stable@vger.kernel.org>
> Signed-off-by: Damien Le Moal <damien.lemoal@wdc.com>
> ---
>  fs/zonefs/super.c | 76 ++++++++++++++++++++++++++++++++++++++++-------
>  1 file changed, 66 insertions(+), 10 deletions(-)
> 
> diff --git a/fs/zonefs/super.c b/fs/zonefs/super.c
> index b6ff4a21abac..11aa990b3a4c 100644
> --- a/fs/zonefs/super.c
> +++ b/fs/zonefs/super.c
> @@ -727,6 +727,68 @@ static ssize_t zonefs_file_dio_append(struct kiocb *iocb, struct iov_iter *from)
>  	return ret;
>  }
>  
> +/*
> + * Do not exceed the LFS limits nor the file zone size. If pos is under the
> + * limit it becomes a short access. If it exceeds the limit, return -EFBIG.
> + */
> +static loff_t zonefs_write_check_limits(struct file *file, loff_t pos,
> +					loff_t count)
> +{
> +	struct inode *inode = file_inode(file);
> +	struct zonefs_inode_info *zi = ZONEFS_I(inode);
> +	loff_t limit = rlimit(RLIMIT_FSIZE);
> +	loff_t max_size = zi->i_max_size;
> +
> +	if (limit != RLIM_INFINITY) {
> +		if (pos >= limit) {
> +			send_sig(SIGXFSZ, current, 0);
> +			return -EFBIG;
> +		}
> +		count = min(count, limit - pos);
> +	}
> +
> +	if (!(file->f_flags & O_LARGEFILE))
> +		max_size = min_t(loff_t, MAX_NON_LFS, max_size);
> +
> +	if (unlikely(pos >= max_size))
> +		return -EFBIG;
> +
> +	return min(count, max_size - pos);
> +}
> +
> +static ssize_t zonefs_write_checks(struct kiocb *iocb, struct iov_iter *from)
> +{
> +	struct file *file = iocb->ki_filp;
> +	struct inode *inode = file_inode(file);
> +	struct zonefs_inode_info *zi = ZONEFS_I(inode);
> +	loff_t count;
> +
> +	if (IS_SWAPFILE(inode))
> +		return -ETXTBSY;

...but can zonefs really do swap files now?

--D

> +
> +	if (!iov_iter_count(from))
> +		return 0;
> +
> +	if ((iocb->ki_flags & IOCB_NOWAIT) && !(iocb->ki_flags & IOCB_DIRECT))
> +		return -EINVAL;
> +
> +	if (iocb->ki_flags & IOCB_APPEND) {
> +		if (zi->i_ztype != ZONEFS_ZTYPE_SEQ)
> +			return -EINVAL;
> +		mutex_lock(&zi->i_truncate_mutex);
> +		iocb->ki_pos = zi->i_wpoffset;
> +		mutex_unlock(&zi->i_truncate_mutex);
> +	}
> +
> +	count = zonefs_write_check_limits(file, iocb->ki_pos,
> +					  iov_iter_count(from));
> +	if (count < 0)
> +		return count;
> +
> +	iov_iter_truncate(from, count);
> +	return iov_iter_count(from);
> +}
> +
>  /*
>   * Handle direct writes. For sequential zone files, this is the only possible
>   * write path. For these files, check that the user is issuing writes
> @@ -744,8 +806,7 @@ static ssize_t zonefs_file_dio_write(struct kiocb *iocb, struct iov_iter *from)
>  	struct super_block *sb = inode->i_sb;
>  	bool sync = is_sync_kiocb(iocb);
>  	bool append = false;
> -	size_t count;
> -	ssize_t ret;
> +	ssize_t ret, count;
>  
>  	/*
>  	 * For async direct IOs to sequential zone files, refuse IOCB_NOWAIT
> @@ -763,13 +824,10 @@ static ssize_t zonefs_file_dio_write(struct kiocb *iocb, struct iov_iter *from)
>  		inode_lock(inode);
>  	}
>  
> -	ret = generic_write_checks(iocb, from);
> -	if (ret <= 0)
> +	count = zonefs_write_checks(iocb, from);
> +	if (count <= 0)
>  		goto inode_unlock;
>  
> -	iov_iter_truncate(from, zi->i_max_size - iocb->ki_pos);
> -	count = iov_iter_count(from);
> -
>  	if ((iocb->ki_pos | count) & (sb->s_blocksize - 1)) {
>  		ret = -EINVAL;
>  		goto inode_unlock;
> @@ -828,12 +886,10 @@ static ssize_t zonefs_file_buffered_write(struct kiocb *iocb,
>  		inode_lock(inode);
>  	}
>  
> -	ret = generic_write_checks(iocb, from);
> +	ret = zonefs_write_checks(iocb, from);
>  	if (ret <= 0)
>  		goto inode_unlock;
>  
> -	iov_iter_truncate(from, zi->i_max_size - iocb->ki_pos);
> -
>  	ret = iomap_file_buffered_write(iocb, from, &zonefs_iomap_ops);
>  	if (ret > 0)
>  		iocb->ki_pos += ret;
> -- 
> 2.29.2
>

Damien Le Moal March 11, 2021, 5:01 a.m. UTC | #2

On 2021/03/11 12:36, Darrick J. Wong wrote:
> On Thu, Mar 11, 2021 at 12:22:30PM +0900, Damien Le Moal wrote:
>> zonefs updates the size of a sequential zone file inode only on
>> completion of direct writes. When executing asynchronous append writes
>> (with a file open with O_APPEND or using RWF_APPEND), the use of the
>> current inode size in generic_write_checks() to set an iocb offset thus
>> leads to unaligned write if an application issues an append write
>> operation with another write already being executed.
> 
> Ah, I /had/ wondered if setting i_size to the zone size (instead of the
> write pointer) would have side effects...

In retrospect, the problem is obvious :)
But a hole in the test suite let this one slip for some time. That is fixed now.

[...]
>> +static ssize_t zonefs_write_checks(struct kiocb *iocb, struct iov_iter *from)
>> +{
>> +	struct file *file = iocb->ki_filp;
>> +	struct inode *inode = file_inode(file);
>> +	struct zonefs_inode_info *zi = ZONEFS_I(inode);
>> +	loff_t count;
>> +
>> +	if (IS_SWAPFILE(inode))
>> +		return -ETXTBSY;
> 
> ...but can zonefs really do swap files now?

Conventional zone files could, I guess, but I have not tested that. Not entirely
sure about this as I am not familiar with the swap code. I think it would be
safer to disallow swapfile use with zonefs, similarly to what claim_swap() does
with zoned block devices. But I am not sure how to do that. Sequential zone
files definitely will not be able to handle swap.

diff --git a/fs/zonefs/super.c b/fs/zonefs/super.c
index b6ff4a21abac..11aa990b3a4c 100644
--- a/fs/zonefs/super.c
+++ b/fs/zonefs/super.c
@@ -727,6 +727,68 @@  static ssize_t zonefs_file_dio_append(struct kiocb *iocb, struct iov_iter *from)
 	return ret;
 }
 
+/*
+ * Do not exceed the LFS limits nor the file zone size. If pos is under the
+ * limit it becomes a short access. If it exceeds the limit, return -EFBIG.
+ */
+static loff_t zonefs_write_check_limits(struct file *file, loff_t pos,
+					loff_t count)
+{
+	struct inode *inode = file_inode(file);
+	struct zonefs_inode_info *zi = ZONEFS_I(inode);
+	loff_t limit = rlimit(RLIMIT_FSIZE);
+	loff_t max_size = zi->i_max_size;
+
+	if (limit != RLIM_INFINITY) {
+		if (pos >= limit) {
+			send_sig(SIGXFSZ, current, 0);
+			return -EFBIG;
+		}
+		count = min(count, limit - pos);
+	}
+
+	if (!(file->f_flags & O_LARGEFILE))
+		max_size = min_t(loff_t, MAX_NON_LFS, max_size);
+
+	if (unlikely(pos >= max_size))
+		return -EFBIG;
+
+	return min(count, max_size - pos);
+}
+
+static ssize_t zonefs_write_checks(struct kiocb *iocb, struct iov_iter *from)
+{
+	struct file *file = iocb->ki_filp;
+	struct inode *inode = file_inode(file);
+	struct zonefs_inode_info *zi = ZONEFS_I(inode);
+	loff_t count;
+
+	if (IS_SWAPFILE(inode))
+		return -ETXTBSY;
+
+	if (!iov_iter_count(from))
+		return 0;
+
+	if ((iocb->ki_flags & IOCB_NOWAIT) && !(iocb->ki_flags & IOCB_DIRECT))
+		return -EINVAL;
+
+	if (iocb->ki_flags & IOCB_APPEND) {
+		if (zi->i_ztype != ZONEFS_ZTYPE_SEQ)
+			return -EINVAL;
+		mutex_lock(&zi->i_truncate_mutex);
+		iocb->ki_pos = zi->i_wpoffset;
+		mutex_unlock(&zi->i_truncate_mutex);
+	}
+
+	count = zonefs_write_check_limits(file, iocb->ki_pos,
+					  iov_iter_count(from));
+	if (count < 0)
+		return count;
+
+	iov_iter_truncate(from, count);
+	return iov_iter_count(from);
+}
+
 /*
  * Handle direct writes. For sequential zone files, this is the only possible
  * write path. For these files, check that the user is issuing writes
@@ -744,8 +806,7 @@  static ssize_t zonefs_file_dio_write(struct kiocb *iocb, struct iov_iter *from)
 	struct super_block *sb = inode->i_sb;
 	bool sync = is_sync_kiocb(iocb);
 	bool append = false;
-	size_t count;
-	ssize_t ret;
+	ssize_t ret, count;
 
 	/*
 	 * For async direct IOs to sequential zone files, refuse IOCB_NOWAIT
@@ -763,13 +824,10 @@  static ssize_t zonefs_file_dio_write(struct kiocb *iocb, struct iov_iter *from)
 		inode_lock(inode);
 	}
 
-	ret = generic_write_checks(iocb, from);
-	if (ret <= 0)
+	count = zonefs_write_checks(iocb, from);
+	if (count <= 0)
 		goto inode_unlock;
 
-	iov_iter_truncate(from, zi->i_max_size - iocb->ki_pos);
-	count = iov_iter_count(from);
-
 	if ((iocb->ki_pos | count) & (sb->s_blocksize - 1)) {
 		ret = -EINVAL;
 		goto inode_unlock;
@@ -828,12 +886,10 @@  static ssize_t zonefs_file_buffered_write(struct kiocb *iocb,
 		inode_lock(inode);
 	}
 
-	ret = generic_write_checks(iocb, from);
+	ret = zonefs_write_checks(iocb, from);
 	if (ret <= 0)
 		goto inode_unlock;
 
-	iov_iter_truncate(from, zi->i_max_size - iocb->ki_pos);
-
 	ret = iomap_file_buffered_write(iocb, from, &zonefs_iomap_ops);
 	if (ret > 0)
 		iocb->ki_pos += ret;

zonefs: Fix O_APPEND async write handling

Commit Message

Comments

Patch