Message ID | 20240607091555.2504-1-chrubis@suse.cz (mailing list archive) |
---|---|
State | New, archived |
Headers | show |
Series | loop: Disable fallocate() zero and discard if not supported | expand |
From: Cyril Hrubis <chrubis@suse.cz> Hi Cyril, all, > If fallcate is implemented but zero and discard operations are not nit: s/fallocate/fallcate/ > supported by the filesystem the backing file is on we continue to fill > dmesg with errors from the blk_mq_end_request() since each time we call > fallocate() on the loop device the EOPNOTSUPP error from lo_fallocate() > ends up propagated into the block layer. In the end syscall succeeds > since the blkdev_issue_zeroout() falls back to writing zeroes which > makes the errors even more misleading and confusing. Thanks for looking into this! FYI this has been in mainline for some time: mentioned for 4.16.0-rc7 [1], later in v5.19 [2]. Cc also Dave, who back then suggested to remove the warning as not being useful for users [3]. > How to reproduce: > 1. make sure /tmp is mounted as tmpfs > 2. dd if=/dev/zero of=/tmp/disk.img bs=1M count=100 > 3. losetup /dev/loop0 /tmp/disk.img > 4. mkfs.ext2 /dev/loop0 > 5. dmesg |tail > [710690.898214] operation not supported error, dev loop0, sector 204672 op 0x9:(WRITE_ZEROES) flags 0x8000800 phys_seg 0 prio class 0 > [710690.898279] operation not supported error, dev loop0, sector 522 op 0x9:(WRITE_ZEROES) flags 0x8000800 phys_seg 0 prio class 0 > [710690.898603] operation not supported error, dev loop0, sector 16906 op 0x9:(WRITE_ZEROES) flags 0x8000800 phys_seg 0 prio class 0 > [710690.898917] operation not supported error, dev loop0, sector 32774 op 0x9:(WRITE_ZEROES) flags 0x8000800 phys_seg 0 prio class 0 > [710690.899218] operation not supported error, dev loop0, sector 49674 op 0x9:(WRITE_ZEROES) flags 0x8000800 phys_seg 0 prio class 0 > [710690.899484] operation not supported error, dev loop0, sector 65542 op 0x9:(WRITE_ZEROES) flags 0x8000800 phys_seg 0 prio class 0 > [710690.899743] operation not supported error, dev loop0, sector 82442 op 0x9:(WRITE_ZEROES) flags 0x8000800 phys_seg 0 prio class 0 > [710690.900015] operation not supported error, dev loop0, sector 98310 op 0x9:(WRITE_ZEROES) flags 0x8000800 phys_seg 0 prio class 0 > [710690.900276] operation not supported error, dev loop0, sector 115210 op 0x9:(WRITE_ZEROES) flags 0x8000800 phys_seg 0 prio class 0 > [710690.900546] operation not supported error, dev loop0, sector 131078 op 0x9:(WRITE_ZEROES) flags 0x8000800 phys_seg 0 prio class 0 Kind regards, Petr [1] https://lore.kernel.org/all/10632862.17524551.1522402353418.JavaMail.zimbra@redhat.com/ [2] https://lore.kernel.org/all/YvZUfq+3HYwXEncw@pevik/ [3] https://lore.kernel.org/all/20220814224440.GR3600936@dread.disaster.area/ > This patch changes the lo_fallocate() to clear the flags for zero and > discard operations if we get EOPNOTSUPP from the backing file fallocate > callback, that way we at least stop spewing errors after the first > unsuccessful try. > CC: Jan Kara <jack@suse.cz> > Signed-off-by: Cyril Hrubis <chrubis@suse.cz> > --- > drivers/block/loop.c | 15 +++++++++++++++ > 1 file changed, 15 insertions(+) > diff --git a/drivers/block/loop.c b/drivers/block/loop.c > index 93780f41646b..315c76e3ef4a 100644 > --- a/drivers/block/loop.c > +++ b/drivers/block/loop.c > @@ -320,6 +320,21 @@ static int lo_fallocate(struct loop_device *lo, struct request *rq, loff_t pos, > ret = file->f_op->fallocate(file, mode, pos, blk_rq_bytes(rq)); > if (unlikely(ret && ret != -EINVAL && ret != -EOPNOTSUPP)) > return -EIO; > + > + if (ret == -EOPNOTSUPP) { > + struct queue_limits lim = queue_limits_start_update(lo->lo_queue); > + > + if (mode & FALLOC_FL_ZERO_RANGE) > + lim.max_write_zeroes_sectors = 0; > + > + if (mode & FALLOC_FL_PUNCH_HOLE) { > + lim.max_hw_discard_sectors = 0; > + lim.discard_granularity = 0; > + } > + > + queue_limits_commit_update(lo->lo_queue, &lim); > + } > + > return ret; > }
On Fri, Jun 07, 2024 at 11:15:55AM +0200, Cyril Hrubis wrote: > If fallcate is implemented but zero and discard operations are not > supported by the filesystem the backing file is on we continue to fill > dmesg with errors from the blk_mq_end_request() since each time we call > fallocate() on the loop device the EOPNOTSUPP error from lo_fallocate() > ends up propagated into the block layer. In the end syscall succeeds > since the blkdev_issue_zeroout() falls back to writing zeroes which > makes the errors even more misleading and confusing. > > How to reproduce: > > 1. make sure /tmp is mounted as tmpfs > 2. dd if=/dev/zero of=/tmp/disk.img bs=1M count=100 > 3. losetup /dev/loop0 /tmp/disk.img > 4. mkfs.ext2 /dev/loop0 > 5. dmesg |tail Can you wire this up for blktests? > + if (ret == -EOPNOTSUPP) { > + struct queue_limits lim = queue_limits_start_update(lo->lo_queue); > + > + if (mode & FALLOC_FL_ZERO_RANGE) > + lim.max_write_zeroes_sectors = 0; > + > + if (mode & FALLOC_FL_PUNCH_HOLE) { > + lim.max_hw_discard_sectors = 0; > + lim.discard_granularity = 0; > + } > + > + queue_limits_commit_update(lo->lo_queue, &lim); Please split this out into a separate helper to keep it out of the main fast path I/O handling. A little comment that we are optimistically trying these if ->fallocate is support and might have to paddle back here would also be useful. (and maybe one day we figure out a way for the file system to advertise what fallocate modes it actually supports..)
On 6/7/2024 10:42 PM, Christoph Hellwig wrote: > On Fri, Jun 07, 2024 at 11:15:55AM +0200, Cyril Hrubis wrote: >> If fallcate is implemented but zero and discard operations are not >> supported by the filesystem the backing file is on we continue to fill >> dmesg with errors from the blk_mq_end_request() since each time we call >> fallocate() on the loop device the EOPNOTSUPP error from lo_fallocate() >> ends up propagated into the block layer. In the end syscall succeeds >> since the blkdev_issue_zeroout() falls back to writing zeroes which >> makes the errors even more misleading and confusing. >> >> How to reproduce: >> >> 1. make sure /tmp is mounted as tmpfs >> 2. dd if=/dev/zero of=/tmp/disk.img bs=1M count=100 >> 3. losetup /dev/loop0 /tmp/disk.img >> 4. mkfs.ext2 /dev/loop0 >> 5. dmesg |tail > Can you wire this up for blktests? Please CC Shin'ichiro Kawasaki <shinichiro.kawasaki@wdc.com> and me when you do that .. -ck
Hi! > > How to reproduce: > > > > 1. make sure /tmp is mounted as tmpfs > > 2. dd if=/dev/zero of=/tmp/disk.img bs=1M count=100 > > 3. losetup /dev/loop0 /tmp/disk.img > > 4. mkfs.ext2 /dev/loop0 > > 5. dmesg |tail > > Can you wire this up for blktests? Will try. > > + if (ret == -EOPNOTSUPP) { > > + struct queue_limits lim = queue_limits_start_update(lo->lo_queue); > > + > > + if (mode & FALLOC_FL_ZERO_RANGE) > > + lim.max_write_zeroes_sectors = 0; > > + > > + if (mode & FALLOC_FL_PUNCH_HOLE) { > > + lim.max_hw_discard_sectors = 0; > > + lim.discard_granularity = 0; > > + } > > + > > + queue_limits_commit_update(lo->lo_queue, &lim); > > Please split this out into a separate helper to keep it out of the > main fast path I/O handling. A little comment that we are > optimistically trying these if ->fallocate is support and might have > to paddle back here would also be useful. Will do. Do we need noinline attribute for the function as well or unlikely() in the if condition? > (and maybe one day we figure out a way for the file system to > advertise what fallocate modes it actually supports..) One of my ideas was to try fallocate() with zero size in the loop_reconfigure_limits() to see if we get EOPNOTSUPP but for that to work we would have to make sure that we do not bail early on zero size in the vfs layer...
On Mon, Jun 10, 2024 at 04:13:23PM +0200, Cyril Hrubis wrote: > Do we need noinline attribute for the function as well or unlikely() in > the if condition? unlikely sounds like the right thing here. > > (and maybe one day we figure out a way for the file system to > > advertise what fallocate modes it actually supports..) > > One of my ideas was to try fallocate() with zero size in the > loop_reconfigure_limits() to see if we get EOPNOTSUPP but for that to > work we would have to make sure that we do not bail early on zero size > in the vfs layer... And the VFS layer doesn't know, it has to go all the way into the file system..
diff --git a/drivers/block/loop.c b/drivers/block/loop.c index 93780f41646b..315c76e3ef4a 100644 --- a/drivers/block/loop.c +++ b/drivers/block/loop.c @@ -320,6 +320,21 @@ static int lo_fallocate(struct loop_device *lo, struct request *rq, loff_t pos, ret = file->f_op->fallocate(file, mode, pos, blk_rq_bytes(rq)); if (unlikely(ret && ret != -EINVAL && ret != -EOPNOTSUPP)) return -EIO; + + if (ret == -EOPNOTSUPP) { + struct queue_limits lim = queue_limits_start_update(lo->lo_queue); + + if (mode & FALLOC_FL_ZERO_RANGE) + lim.max_write_zeroes_sectors = 0; + + if (mode & FALLOC_FL_PUNCH_HOLE) { + lim.max_hw_discard_sectors = 0; + lim.discard_granularity = 0; + } + + queue_limits_commit_update(lo->lo_queue, &lim); + } + return ret; }
If fallcate is implemented but zero and discard operations are not supported by the filesystem the backing file is on we continue to fill dmesg with errors from the blk_mq_end_request() since each time we call fallocate() on the loop device the EOPNOTSUPP error from lo_fallocate() ends up propagated into the block layer. In the end syscall succeeds since the blkdev_issue_zeroout() falls back to writing zeroes which makes the errors even more misleading and confusing. How to reproduce: 1. make sure /tmp is mounted as tmpfs 2. dd if=/dev/zero of=/tmp/disk.img bs=1M count=100 3. losetup /dev/loop0 /tmp/disk.img 4. mkfs.ext2 /dev/loop0 5. dmesg |tail [710690.898214] operation not supported error, dev loop0, sector 204672 op 0x9:(WRITE_ZEROES) flags 0x8000800 phys_seg 0 prio class 0 [710690.898279] operation not supported error, dev loop0, sector 522 op 0x9:(WRITE_ZEROES) flags 0x8000800 phys_seg 0 prio class 0 [710690.898603] operation not supported error, dev loop0, sector 16906 op 0x9:(WRITE_ZEROES) flags 0x8000800 phys_seg 0 prio class 0 [710690.898917] operation not supported error, dev loop0, sector 32774 op 0x9:(WRITE_ZEROES) flags 0x8000800 phys_seg 0 prio class 0 [710690.899218] operation not supported error, dev loop0, sector 49674 op 0x9:(WRITE_ZEROES) flags 0x8000800 phys_seg 0 prio class 0 [710690.899484] operation not supported error, dev loop0, sector 65542 op 0x9:(WRITE_ZEROES) flags 0x8000800 phys_seg 0 prio class 0 [710690.899743] operation not supported error, dev loop0, sector 82442 op 0x9:(WRITE_ZEROES) flags 0x8000800 phys_seg 0 prio class 0 [710690.900015] operation not supported error, dev loop0, sector 98310 op 0x9:(WRITE_ZEROES) flags 0x8000800 phys_seg 0 prio class 0 [710690.900276] operation not supported error, dev loop0, sector 115210 op 0x9:(WRITE_ZEROES) flags 0x8000800 phys_seg 0 prio class 0 [710690.900546] operation not supported error, dev loop0, sector 131078 op 0x9:(WRITE_ZEROES) flags 0x8000800 phys_seg 0 prio class 0 This patch changes the lo_fallocate() to clear the flags for zero and discard operations if we get EOPNOTSUPP from the backing file fallocate callback, that way we at least stop spewing errors after the first unsuccessful try. CC: Jan Kara <jack@suse.cz> Signed-off-by: Cyril Hrubis <chrubis@suse.cz> --- drivers/block/loop.c | 15 +++++++++++++++ 1 file changed, 15 insertions(+)