diff mbox series

[v8,3/5] loop: Add support for provision requests

Message ID 20231007012817.3052558-4-sarthakkukreti@chromium.org (mailing list archive)
State New, archived
Headers show
Series Introduce provisioning primitives | expand

Commit Message

Sarthak Kukreti Oct. 7, 2023, 1:28 a.m. UTC
Add support for provision requests to loopback devices.
Loop devices will configure provision support based on
whether the underlying block device/file can support
the provision request and upon receiving a provision bio,
will map it to the backing device/storage. For loop devices
over files, a REQ_OP_PROVISION request will translate to
an fallocate mode 0 call on the backing file.

Signed-off-by: Sarthak Kukreti <sarthakkukreti@chromium.org>
Signed-off-by: Mike Snitzer <snitzer@kernel.org>
---
 drivers/block/loop.c | 34 +++++++++++++++++++++++++++++++---
 1 file changed, 31 insertions(+), 3 deletions(-)

Comments

Dave Chinner Oct. 8, 2023, 11:37 p.m. UTC | #1
On Fri, Oct 06, 2023 at 06:28:15PM -0700, Sarthak Kukreti wrote:
> Add support for provision requests to loopback devices.
> Loop devices will configure provision support based on
> whether the underlying block device/file can support
> the provision request and upon receiving a provision bio,
> will map it to the backing device/storage. For loop devices
> over files, a REQ_OP_PROVISION request will translate to
> an fallocate mode 0 call on the backing file.
> 
> Signed-off-by: Sarthak Kukreti <sarthakkukreti@chromium.org>
> Signed-off-by: Mike Snitzer <snitzer@kernel.org>


Hmmmm.

This doesn't actually implement the required semantics of
REQ_PROVISION. Yes, it passes the command to the filesystem
fallocate() implementation, but fallocate() at the filesystem level
does not have the same semantics as REQ_PROVISION.

i.e. at the filesystem level, fallocate() only guarantees the next
write to the provisioned range will succeed without ENOSPC, it does
not guarantee *every* write to the range will succeed without
ENOSPC. If someone clones the loop file while it is in use (i.e.
snapshots it via cp --reflink) then all guarantees that the next
write to a provisioned LBA range will succeed without ENOSPC are
voided.

So while this will work for basic testing that the filesystem is
issuing REQ_PROVISION based IO correctly, it can't actually be used
for hosting production filesystems that need full REQ_PROVISION
guarantees when the loop device backing file is independently
shapshotted via FICLONE....

At minimuim, this set of implementation constraints needs tobe
documented somewhere...

-Dave.
Sarthak Kukreti Oct. 10, 2023, 10:43 p.m. UTC | #2
On Sun, Oct 8, 2023 at 4:37 PM Dave Chinner <david@fromorbit.com> wrote:
>
> On Fri, Oct 06, 2023 at 06:28:15PM -0700, Sarthak Kukreti wrote:
> > Add support for provision requests to loopback devices.
> > Loop devices will configure provision support based on
> > whether the underlying block device/file can support
> > the provision request and upon receiving a provision bio,
> > will map it to the backing device/storage. For loop devices
> > over files, a REQ_OP_PROVISION request will translate to
> > an fallocate mode 0 call on the backing file.
> >
> > Signed-off-by: Sarthak Kukreti <sarthakkukreti@chromium.org>
> > Signed-off-by: Mike Snitzer <snitzer@kernel.org>
>
>
> Hmmmm.
>
> This doesn't actually implement the required semantics of
> REQ_PROVISION. Yes, it passes the command to the filesystem
> fallocate() implementation, but fallocate() at the filesystem level
> does not have the same semantics as REQ_PROVISION.
>
> i.e. at the filesystem level, fallocate() only guarantees the next
> write to the provisioned range will succeed without ENOSPC, it does
> not guarantee *every* write to the range will succeed without
> ENOSPC. If someone clones the loop file while it is in use (i.e.
> snapshots it via cp --reflink) then all guarantees that the next
> write to a provisioned LBA range will succeed without ENOSPC are
> voided.
>
> So while this will work for basic testing that the filesystem is
> issuing REQ_PROVISION based IO correctly, it can't actually be used
> for hosting production filesystems that need full REQ_PROVISION
> guarantees when the loop device backing file is independently
> shapshotted via FICLONE....
>
> At minimuim, this set of implementation constraints needs tobe
> documented somewhere...
>
Fair point. I wanted to have a separate fallocate() mode
(FALLOC_FL_PROVISION) in the earlier series of the patchset so that we
can distinguish between a provision request and a regular fallocate()
call; I dropped it from the series after feedback that the default
case should suffice. But this might be one of the cases where we need
an explicit intent that we want to provision space.

Given a separate FALLOC_FL_PROVISION mode in the scenario you
mentioned, the filesystem could copy previously 'provisioned' blocks
to new blocks (which implicitly provisions them) or reserve blocks for
use (and passing through REQ_OP_PROVISION below). That also means that
the filesystem should track 'provisioned' blocks and take appropriate
actions to ensure the provisioning guarantees.

For filesystems without copy-on-write semantics (eg. ext4),
REQ_OP_PROVISION should still be equivalent to mode == 0.

For now, I'll add the details to the commit message and the loop
driver code (side note: should there be device documentation on the
loop device driver?). WDYT?

Best
Sarthak

> -Dave.
> --
> Dave Chinner
> david@fromorbit.com

--
dm-devel mailing list
dm-devel@redhat.com
https://listman.redhat.com/mailman/listinfo/dm-devel
Dave Chinner Oct. 10, 2023, 11:59 p.m. UTC | #3
On Tue, Oct 10, 2023 at 03:43:10PM -0700, Sarthak Kukreti wrote:
> On Sun, Oct 8, 2023 at 4:37 PM Dave Chinner <david@fromorbit.com> wrote:
> >
> > On Fri, Oct 06, 2023 at 06:28:15PM -0700, Sarthak Kukreti wrote:
> > > Add support for provision requests to loopback devices.
> > > Loop devices will configure provision support based on
> > > whether the underlying block device/file can support
> > > the provision request and upon receiving a provision bio,
> > > will map it to the backing device/storage. For loop devices
> > > over files, a REQ_OP_PROVISION request will translate to
> > > an fallocate mode 0 call on the backing file.
> > >
> > > Signed-off-by: Sarthak Kukreti <sarthakkukreti@chromium.org>
> > > Signed-off-by: Mike Snitzer <snitzer@kernel.org>
> >
> >
> > Hmmmm.
> >
> > This doesn't actually implement the required semantics of
> > REQ_PROVISION. Yes, it passes the command to the filesystem
> > fallocate() implementation, but fallocate() at the filesystem level
> > does not have the same semantics as REQ_PROVISION.
> >
> > i.e. at the filesystem level, fallocate() only guarantees the next
> > write to the provisioned range will succeed without ENOSPC, it does
> > not guarantee *every* write to the range will succeed without
> > ENOSPC. If someone clones the loop file while it is in use (i.e.
> > snapshots it via cp --reflink) then all guarantees that the next
> > write to a provisioned LBA range will succeed without ENOSPC are
> > voided.
> >
> > So while this will work for basic testing that the filesystem is
> > issuing REQ_PROVISION based IO correctly, it can't actually be used
> > for hosting production filesystems that need full REQ_PROVISION
> > guarantees when the loop device backing file is independently
> > shapshotted via FICLONE....
> >
> > At minimuim, this set of implementation constraints needs tobe
> > documented somewhere...
> >
> Fair point. I wanted to have a separate fallocate() mode
> (FALLOC_FL_PROVISION) in the earlier series of the patchset so that we
> can distinguish between a provision request and a regular fallocate()
> call; I dropped it from the series after feedback that the default
> case should suffice. But this might be one of the cases where we need
> an explicit intent that we want to provision space.

ISTR that I commented that filesystems like XFS can't implement
REQ_PROVISION semantics for extents without on-disk format
changes. Hence that needs to happen before we expose a new API to
userspace....

> Given a separate FALLOC_FL_PROVISION mode in the scenario you
> mentioned, the filesystem could copy previously 'provisioned' blocks
> to new blocks (which implicitly provisions them) or reserve blocks for
> use (and passing through REQ_OP_PROVISION below). That also means that
> the filesystem should track 'provisioned' blocks and take appropriate
> actions to ensure the provisioning guarantees.

Yes, tracking provisioned ranges persistently and the reservations
they require needs on-disk filesytem format changes compared to just
preallocating space.  None of this functionality currently exists in
any filesystem that supports shared extents, and it's a fairly
significant chunk of development work to support it.

Nobody has planned to do this sort of complex surgery to XFS at
this point in time. I doubt that anyone on the btrfs side of
things is really even following this discussion because this is
largely for block device thinp and snapshot support
and btrfs just doesn't care about that.

> For filesystems without copy-on-write semantics (eg. ext4),
> REQ_OP_PROVISION should still be equivalent to mode == 0.

Well, yes. This is the same situation as "for non-sparse block
devices, REQ_PROVISION can just be ignored." This is not an
interesting use case, nor a use case that the functionality or APIs
should be designed around.

-Dave.
diff mbox series

Patch

diff --git a/drivers/block/loop.c b/drivers/block/loop.c
index 9f2d412fc560..abb4dddbd4fd 100644
--- a/drivers/block/loop.c
+++ b/drivers/block/loop.c
@@ -311,16 +311,20 @@  static int lo_fallocate(struct loop_device *lo, struct request *rq, loff_t pos,
 {
 	/*
 	 * We use fallocate to manipulate the space mappings used by the image
-	 * a.k.a. discard/zerorange.
+	 * a.k.a. discard/provision/zerorange.
 	 */
 	struct file *file = lo->lo_backing_file;
 	int ret;
 
-	mode |= FALLOC_FL_KEEP_SIZE;
+	if (mode & (FALLOC_FL_PUNCH_HOLE | FALLOC_FL_ZERO_RANGE) &&
+	    !bdev_max_discard_sectors(lo->lo_device))
+		return -EOPNOTSUPP;
 
-	if (!bdev_max_discard_sectors(lo->lo_device))
+	if (mode == 0 && !bdev_max_provision_sectors(lo->lo_device))
 		return -EOPNOTSUPP;
 
+	mode |= FALLOC_FL_KEEP_SIZE;
+
 	ret = file->f_op->fallocate(file, mode, pos, blk_rq_bytes(rq));
 	if (unlikely(ret && ret != -EINVAL && ret != -EOPNOTSUPP))
 		return -EIO;
@@ -488,6 +492,8 @@  static int do_req_filebacked(struct loop_device *lo, struct request *rq)
 				FALLOC_FL_PUNCH_HOLE);
 	case REQ_OP_DISCARD:
 		return lo_fallocate(lo, rq, pos, FALLOC_FL_PUNCH_HOLE);
+	case REQ_OP_PROVISION:
+		return lo_fallocate(lo, rq, pos, 0);
 	case REQ_OP_WRITE:
 		if (cmd->use_aio)
 			return lo_rw_aio(lo, cmd, pos, ITER_SOURCE);
@@ -754,6 +760,25 @@  static void loop_sysfs_exit(struct loop_device *lo)
 				   &loop_attribute_group);
 }
 
+static void loop_config_provision(struct loop_device *lo)
+{
+	struct file *file = lo->lo_backing_file;
+	struct inode *inode = file->f_mapping->host;
+
+	/*
+	 * If the backing device is a block device, mirror its provisioning
+	 * capability.
+	 */
+	if (S_ISBLK(inode->i_mode)) {
+		blk_queue_max_provision_sectors(lo->lo_queue,
+			bdev_max_provision_sectors(I_BDEV(inode)));
+	} else if (file->f_op->fallocate) {
+		blk_queue_max_provision_sectors(lo->lo_queue, UINT_MAX >> 9);
+	} else {
+		blk_queue_max_provision_sectors(lo->lo_queue, 0);
+	}
+}
+
 static void loop_config_discard(struct loop_device *lo)
 {
 	struct file *file = lo->lo_backing_file;
@@ -1092,6 +1117,7 @@  static int loop_configure(struct loop_device *lo, blk_mode_t mode,
 	blk_queue_io_min(lo->lo_queue, bsize);
 
 	loop_config_discard(lo);
+	loop_config_provision(lo);
 	loop_update_rotational(lo);
 	loop_update_dio(lo);
 	loop_sysfs_init(lo);
@@ -1304,6 +1330,7 @@  loop_set_status(struct loop_device *lo, const struct loop_info64 *info)
 	}
 
 	loop_config_discard(lo);
+	loop_config_provision(lo);
 
 	/* update dio if lo_offset or transfer is changed */
 	__loop_update_dio(lo, lo->use_dio);
@@ -1857,6 +1884,7 @@  static blk_status_t loop_queue_rq(struct blk_mq_hw_ctx *hctx,
 	case REQ_OP_FLUSH:
 	case REQ_OP_DISCARD:
 	case REQ_OP_WRITE_ZEROES:
+	case REQ_OP_PROVISION:
 		cmd->use_aio = false;
 		break;
 	default: