loop: Disable fallocate() zero and discard if not supported

Message ID	20240607091555.2504-1-chrubis@suse.cz (mailing list archive)
State	New, archived
Headers	show Received: from smtp-out1.suse.de (smtp-out1.suse.de [195.135.223.130]) (using TLSv1.2 with cipher ECDHE-RSA-AES128-GCM-SHA256 (128/128 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id A4571405F8; Fri, 7 Jun 2024 09:16:27 +0000 (UTC) From: Cyril Hrubis <chrubis@suse.cz> To: Jens Axboe <axboe@kernel.dk>, linux-block@vger.kernel.org, linux-kernel@vger.kernel.org Cc: Cyril Hrubis <chrubis@suse.cz>, Jan Kara <jack@suse.cz> Subject: [PATCH] loop: Disable fallocate() zero and discard if not supported Date: Fri, 7 Jun 2024 11:15:55 +0200 Message-ID: <20240607091555.2504-1-chrubis@suse.cz> Precedence: bulk MIME-Version: 1.0 Content-Transfer-Encoding: 8bit
Series	loop: Disable fallocate() zero and discard if not supported \| expand loop: Disable fallocate() zero and discard if not supported

Cyril Hrubis June 7, 2024, 9:15 a.m. UTC

If fallcate is implemented but zero and discard operations are not
supported by the filesystem the backing file is on we continue to fill
dmesg with errors from the blk_mq_end_request() since each time we call
fallocate() on the loop device the EOPNOTSUPP error from lo_fallocate()
ends up propagated into the block layer. In the end syscall succeeds
since the blkdev_issue_zeroout() falls back to writing zeroes which
makes the errors even more misleading and confusing.

How to reproduce:

1. make sure /tmp is mounted as tmpfs
2. dd if=/dev/zero of=/tmp/disk.img bs=1M count=100
3. losetup /dev/loop0 /tmp/disk.img
4. mkfs.ext2 /dev/loop0
5. dmesg |tail

[710690.898214] operation not supported error, dev loop0, sector 204672 op 0x9:(WRITE_ZEROES) flags 0x8000800 phys_seg 0 prio class 0
[710690.898279] operation not supported error, dev loop0, sector 522 op 0x9:(WRITE_ZEROES) flags 0x8000800 phys_seg 0 prio class 0
[710690.898603] operation not supported error, dev loop0, sector 16906 op 0x9:(WRITE_ZEROES) flags 0x8000800 phys_seg 0 prio class 0
[710690.898917] operation not supported error, dev loop0, sector 32774 op 0x9:(WRITE_ZEROES) flags 0x8000800 phys_seg 0 prio class 0
[710690.899218] operation not supported error, dev loop0, sector 49674 op 0x9:(WRITE_ZEROES) flags 0x8000800 phys_seg 0 prio class 0
[710690.899484] operation not supported error, dev loop0, sector 65542 op 0x9:(WRITE_ZEROES) flags 0x8000800 phys_seg 0 prio class 0
[710690.899743] operation not supported error, dev loop0, sector 82442 op 0x9:(WRITE_ZEROES) flags 0x8000800 phys_seg 0 prio class 0
[710690.900015] operation not supported error, dev loop0, sector 98310 op 0x9:(WRITE_ZEROES) flags 0x8000800 phys_seg 0 prio class 0
[710690.900276] operation not supported error, dev loop0, sector 115210 op 0x9:(WRITE_ZEROES) flags 0x8000800 phys_seg 0 prio class 0
[710690.900546] operation not supported error, dev loop0, sector 131078 op 0x9:(WRITE_ZEROES) flags 0x8000800 phys_seg 0 prio class 0

This patch changes the lo_fallocate() to clear the flags for zero and
discard operations if we get EOPNOTSUPP from the backing file fallocate
callback, that way we at least stop spewing errors after the first
unsuccessful try.

CC: Jan Kara <jack@suse.cz>
Signed-off-by: Cyril Hrubis <chrubis@suse.cz>
---
 drivers/block/loop.c | 15 +++++++++++++++
 1 file changed, 15 insertions(+)

Petr Vorel June 7, 2024, 3:06 p.m. UTC | #1

From: Cyril Hrubis <chrubis@suse.cz>

Hi Cyril, all,


> If fallcate is implemented but zero and discard operations are not
nit: s/fallocate/fallcate/

> supported by the filesystem the backing file is on we continue to fill
> dmesg with errors from the blk_mq_end_request() since each time we call
> fallocate() on the loop device the EOPNOTSUPP error from lo_fallocate()
> ends up propagated into the block layer. In the end syscall succeeds
> since the blkdev_issue_zeroout() falls back to writing zeroes which
> makes the errors even more misleading and confusing.

Thanks for looking into this!

FYI this has been in mainline for some time: mentioned for 4.16.0-rc7 [1], later
in v5.19 [2]. Cc also Dave, who back then suggested to remove the warning as not
being useful for users [3].

> How to reproduce:

> 1. make sure /tmp is mounted as tmpfs
> 2. dd if=/dev/zero of=/tmp/disk.img bs=1M count=100
> 3. losetup /dev/loop0 /tmp/disk.img
> 4. mkfs.ext2 /dev/loop0
> 5. dmesg |tail

> [710690.898214] operation not supported error, dev loop0, sector 204672 op 0x9:(WRITE_ZEROES) flags 0x8000800 phys_seg 0 prio class 0
> [710690.898279] operation not supported error, dev loop0, sector 522 op 0x9:(WRITE_ZEROES) flags 0x8000800 phys_seg 0 prio class 0
> [710690.898603] operation not supported error, dev loop0, sector 16906 op 0x9:(WRITE_ZEROES) flags 0x8000800 phys_seg 0 prio class 0
> [710690.898917] operation not supported error, dev loop0, sector 32774 op 0x9:(WRITE_ZEROES) flags 0x8000800 phys_seg 0 prio class 0
> [710690.899218] operation not supported error, dev loop0, sector 49674 op 0x9:(WRITE_ZEROES) flags 0x8000800 phys_seg 0 prio class 0
> [710690.899484] operation not supported error, dev loop0, sector 65542 op 0x9:(WRITE_ZEROES) flags 0x8000800 phys_seg 0 prio class 0
> [710690.899743] operation not supported error, dev loop0, sector 82442 op 0x9:(WRITE_ZEROES) flags 0x8000800 phys_seg 0 prio class 0
> [710690.900015] operation not supported error, dev loop0, sector 98310 op 0x9:(WRITE_ZEROES) flags 0x8000800 phys_seg 0 prio class 0
> [710690.900276] operation not supported error, dev loop0, sector 115210 op 0x9:(WRITE_ZEROES) flags 0x8000800 phys_seg 0 prio class 0
> [710690.900546] operation not supported error, dev loop0, sector 131078 op 0x9:(WRITE_ZEROES) flags 0x8000800 phys_seg 0 prio class 0

Kind regards,
Petr

[1] https://lore.kernel.org/all/10632862.17524551.1522402353418.JavaMail.zimbra@redhat.com/
[2] https://lore.kernel.org/all/YvZUfq+3HYwXEncw@pevik/
[3] https://lore.kernel.org/all/20220814224440.GR3600936@dread.disaster.area/

> This patch changes the lo_fallocate() to clear the flags for zero and
> discard operations if we get EOPNOTSUPP from the backing file fallocate
> callback, that way we at least stop spewing errors after the first
> unsuccessful try.

> CC: Jan Kara <jack@suse.cz>
> Signed-off-by: Cyril Hrubis <chrubis@suse.cz>
> ---
>  drivers/block/loop.c | 15 +++++++++++++++
>  1 file changed, 15 insertions(+)

> diff --git a/drivers/block/loop.c b/drivers/block/loop.c
> index 93780f41646b..315c76e3ef4a 100644
> --- a/drivers/block/loop.c
> +++ b/drivers/block/loop.c
> @@ -320,6 +320,21 @@ static int lo_fallocate(struct loop_device *lo, struct request *rq, loff_t pos,
>  	ret = file->f_op->fallocate(file, mode, pos, blk_rq_bytes(rq));
>  	if (unlikely(ret && ret != -EINVAL && ret != -EOPNOTSUPP))
>  		return -EIO;
> +
> +	if (ret == -EOPNOTSUPP) {
> +		struct queue_limits lim = queue_limits_start_update(lo->lo_queue);
> +
> +		if (mode & FALLOC_FL_ZERO_RANGE)
> +			lim.max_write_zeroes_sectors = 0;
> +
> +		if (mode & FALLOC_FL_PUNCH_HOLE) {
> +			lim.max_hw_discard_sectors = 0;
> +			lim.discard_granularity = 0;
> +		}
> +
> +		queue_limits_commit_update(lo->lo_queue, &lim);
> +	}
> +
>  	return ret;
>  }

Christoph Hellwig June 8, 2024, 5:42 a.m. UTC | #2

On Fri, Jun 07, 2024 at 11:15:55AM +0200, Cyril Hrubis wrote:
> If fallcate is implemented but zero and discard operations are not
> supported by the filesystem the backing file is on we continue to fill
> dmesg with errors from the blk_mq_end_request() since each time we call
> fallocate() on the loop device the EOPNOTSUPP error from lo_fallocate()
> ends up propagated into the block layer. In the end syscall succeeds
> since the blkdev_issue_zeroout() falls back to writing zeroes which
> makes the errors even more misleading and confusing.
> 
> How to reproduce:
> 
> 1. make sure /tmp is mounted as tmpfs
> 2. dd if=/dev/zero of=/tmp/disk.img bs=1M count=100
> 3. losetup /dev/loop0 /tmp/disk.img
> 4. mkfs.ext2 /dev/loop0
> 5. dmesg |tail

Can you wire this up for blktests?

> +	if (ret == -EOPNOTSUPP) {
> +		struct queue_limits lim = queue_limits_start_update(lo->lo_queue);
> +
> +		if (mode & FALLOC_FL_ZERO_RANGE)
> +			lim.max_write_zeroes_sectors = 0;
> +
> +		if (mode & FALLOC_FL_PUNCH_HOLE) {
> +			lim.max_hw_discard_sectors = 0;
> +			lim.discard_granularity = 0;
> +		}
> +
> +		queue_limits_commit_update(lo->lo_queue, &lim);

Please split this out into a separate helper to keep it out of the
main fast path I/O handling.  A little comment that we are
optimistically trying these if ->fallocate is support and might have
to paddle back here would also be useful.

(and maybe one day we figure out a way for the file system to
advertise what fallocate modes it actually supports..)

Chaitanya Kulkarni June 10, 2024, 2:47 a.m. UTC | #3

On 6/7/2024 10:42 PM, Christoph Hellwig wrote:
> On Fri, Jun 07, 2024 at 11:15:55AM +0200, Cyril Hrubis wrote:
>> If fallcate is implemented but zero and discard operations are not
>> supported by the filesystem the backing file is on we continue to fill
>> dmesg with errors from the blk_mq_end_request() since each time we call
>> fallocate() on the loop device the EOPNOTSUPP error from lo_fallocate()
>> ends up propagated into the block layer. In the end syscall succeeds
>> since the blkdev_issue_zeroout() falls back to writing zeroes which
>> makes the errors even more misleading and confusing.
>>
>> How to reproduce:
>>
>> 1. make sure /tmp is mounted as tmpfs
>> 2. dd if=/dev/zero of=/tmp/disk.img bs=1M count=100
>> 3. losetup /dev/loop0 /tmp/disk.img
>> 4. mkfs.ext2 /dev/loop0
>> 5. dmesg |tail
> Can you wire this up for blktests?

Please CC Shin'ichiro Kawasaki <shinichiro.kawasaki@wdc.com> and me when
you do that ..

-ck

Cyril Hrubis June 10, 2024, 2:13 p.m. UTC | #4

Hi!
> > How to reproduce:
> > 
> > 1. make sure /tmp is mounted as tmpfs
> > 2. dd if=/dev/zero of=/tmp/disk.img bs=1M count=100
> > 3. losetup /dev/loop0 /tmp/disk.img
> > 4. mkfs.ext2 /dev/loop0
> > 5. dmesg |tail
> 
> Can you wire this up for blktests?

Will try.

> > +	if (ret == -EOPNOTSUPP) {
> > +		struct queue_limits lim = queue_limits_start_update(lo->lo_queue);
> > +
> > +		if (mode & FALLOC_FL_ZERO_RANGE)
> > +			lim.max_write_zeroes_sectors = 0;
> > +
> > +		if (mode & FALLOC_FL_PUNCH_HOLE) {
> > +			lim.max_hw_discard_sectors = 0;
> > +			lim.discard_granularity = 0;
> > +		}
> > +
> > +		queue_limits_commit_update(lo->lo_queue, &lim);
> 
> Please split this out into a separate helper to keep it out of the
> main fast path I/O handling.  A little comment that we are
> optimistically trying these if ->fallocate is support and might have
> to paddle back here would also be useful.

Will do.

Do we need noinline attribute for the function as well or unlikely() in
the if condition?

> (and maybe one day we figure out a way for the file system to
> advertise what fallocate modes it actually supports..)

One of my ideas was to try fallocate() with zero size in the
loop_reconfigure_limits() to see if we get EOPNOTSUPP but for that to
work we would have to make sure that we do not bail early on zero size
in the vfs layer...

Christoph Hellwig June 10, 2024, 2:14 p.m. UTC | #5

On Mon, Jun 10, 2024 at 04:13:23PM +0200, Cyril Hrubis wrote:
> Do we need noinline attribute for the function as well or unlikely() in
> the if condition?

unlikely sounds like the right thing here.

> > (and maybe one day we figure out a way for the file system to
> > advertise what fallocate modes it actually supports..)
> 
> One of my ideas was to try fallocate() with zero size in the
> loop_reconfigure_limits() to see if we get EOPNOTSUPP but for that to
> work we would have to make sure that we do not bail early on zero size
> in the vfs layer...

And the VFS layer doesn't know, it has to go all the way into the
file system..

loop: Disable fallocate() zero and discard if not supported

Commit Message

Comments

Patch