
[v3,2/3] dm: Add support for block provisioning

Message ID 20230414000219.92640-3-sarthakkukreti@chromium.org (mailing list archive)
State Superseded, archived
Delegated to: Mike Snitzer
Series [v3,1/3] block: Introduce provisioning primitives

Commit Message

Sarthak Kukreti April 14, 2023, 12:02 a.m. UTC
Add support to dm devices for REQ_OP_PROVISION. The default mode
is to passthrough the request to the underlying device, if it
supports it. dm-thinpool uses the provision request to provision
blocks for a dm-thin device. dm-thinpool currently does not
pass through REQ_OP_PROVISION to underlying devices.

For shared blocks, provision requests will break sharing and copy the
contents of the entire block.

Signed-off-by: Sarthak Kukreti <sarthakkukreti@chromium.org>
---
 drivers/md/dm-crypt.c         |   4 +-
 drivers/md/dm-linear.c        |   1 +
 drivers/md/dm-snap.c          |   7 +++
 drivers/md/dm-table.c         |  25 ++++++++
 drivers/md/dm-thin.c          | 110 +++++++++++++++++++++++++++++++---
 drivers/md/dm.c               |   4 ++
 include/linux/device-mapper.h |  11 ++++
 7 files changed, 153 insertions(+), 9 deletions(-)

Comments

Joe Thornber April 14, 2023, 1:32 p.m. UTC | #1
On Fri, Apr 14, 2023 at 7:52 AM Sarthak Kukreti <sarthakkukreti@chromium.org>
wrote:

> Add support to dm devices for REQ_OP_PROVISION. The default mode
> is to passthrough the request to the underlying device, if it
> supports it. dm-thinpool uses the provision request to provision
> blocks for a dm-thin device. dm-thinpool currently does not
> pass through REQ_OP_PROVISION to underlying devices.
>
> For shared blocks, provision requests will break sharing and copy the
> contents of the entire block.
>

I see two issues with this patch:

i) You use get_bio_block_range() to see which blocks the provision bio
covers.  But this function only returns complete blocks that are covered
(it was designed for discard).  Unlike discard, provision is not a hint,
so those partial blocks will need to be provisioned too.
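
(For illustration only: a range helper that rounds outward, so partially
covered edge blocks are included, might look like the sketch below; the
name is hypothetical and the power-of-two fast path is omitted.)

	static void get_bio_block_range_inclusive(struct thin_c *tc, struct bio *bio,
						  dm_block_t *begin, dm_block_t *end)
	{
		struct pool *pool = tc->pool;
		sector_t b = bio->bi_iter.bi_sector;
		sector_t e = b + (bio->bi_iter.bi_size >> SECTOR_SHIFT);

		/* Round begin down and end up to whole thinp blocks. */
		e += pool->sectors_per_block - 1;
		(void) sector_div(b, pool->sectors_per_block);
		(void) sector_div(e, pool->sectors_per_block);

		*begin = b;
		*end = e;	/* exclusive */
	}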

ii) You are setting off multiple dm_thin_new_mapping operations in flight
at once.  Each of these receives the same virt_cell and frees it when it
completes.  So I think we have multiple frees occurring?  In addition, you
also release the cell yourself in process_provision_cell().  Fixing this is
not trivial: you'll need to reference count the cells, and aggregate the
mapping operation results.

I think it would be far easier to restrict the size of the provision bio
to be no bigger than one thinp block (as we do for normal io).  This way
dm core can split the bios and chain the child bios, rather than thinp
having to reference count mapping ops and aggregate the results.

- Joe
Mike Snitzer April 14, 2023, 6:14 p.m. UTC | #2
On Fri, Apr 14 2023 at  9:32P -0400,
Joe Thornber <thornber@redhat.com> wrote:

> On Fri, Apr 14, 2023 at 7:52 AM Sarthak Kukreti <sarthakkukreti@chromium.org>
> wrote:
> 
> > Add support to dm devices for REQ_OP_PROVISION. The default mode
> > is to passthrough the request to the underlying device, if it
> > supports it. dm-thinpool uses the provision request to provision
> > blocks for a dm-thin device. dm-thinpool currently does not
> > pass through REQ_OP_PROVISION to underlying devices.
> >
> > For shared blocks, provision requests will break sharing and copy the
> > contents of the entire block.
> >
> 
> I see two issue with this patch:
> 
> i) You use get_bio_block_range() to see which blocks the provision bio
> covers.  But this function only returns
> complete blocks that are covered (it was designed for discard).  Unlike
> discard, provision is not a hint so those
> partial blocks will need to be provisioned too.
> 
> ii) You are setting off multiple dm_thin_new_mapping operations in flight
> at once.  Each of these receives
> the same virt_cell and frees it  when it completes.  So I think we have
> multiple frees occuring?  In addition you also
> release the cell yourself in process_provision_cell().  Fixing this is not
> trivial, you'll need to reference count the cells,
> and aggregate the mapping operation results.
> 
> I think it would be far easier to restrict the size of the provision bio to
> be no bigger than one thinp block (as we do for normal io).  This way dm
> core can split the bios, chain the child bios rather than having to
> reference count mapping ops, and aggregate the results.

I happened to be looking at implementing WRITE_ZEROES support for DM
thinp yesterday and reached the same conclusion relative to it (both of
Joe's points above; for me, "ii)" was: the dm-bio-prison-v1 range locking
we do for discards needs work for other types of IO).

We can work to make REQ_OP_PROVISION spanning multiple thinp blocks
possible as a follow-on optimization; but in the near term DM thinp needs
REQ_OP_PROVISION to be split on a thinp block boundary.

This splitting can be assisted by block core in terms of a new
'provision_granularity' (which, for thinp, would be set to the thinp
block size).  But I don't know that we need to go that far (I'm thinking
it's fine to have DM do the splitting it needs and only elevate related
code to block core if/when needed in the future).

DM core can take on conditionally imposing its max_io_len() to handle
splitting REQ_OP_PROVISION as needed on a per-target basis. This DM
core commit I've staged for 6.4 makes this quite a simple change:
https://git.kernel.org/pub/scm/linux/kernel/git/device-mapper/linux-dm.git/commit/?h=dm-6.4&id=13f6facf3faeed34ca381aef4c9b153c7aed3972

So please rebase on linux-dm.git's dm-6.4 branch, and for
REQ_OP_PROVISION dm.c:__process_abnormal_io() you'd add this:

	case REQ_OP_PROVISION:
		num_bios = ti->num_provision_bios;
		if (ti->max_provision_granularity)
			max_granularity = limits->max_provision_sectors;
		break;
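
For context, the switch this slots into in dm-6.4's
dm.c:__process_abnormal_io() looks roughly like the following (a
paraphrased sketch of the staged code, not verbatim):

	static blk_status_t __process_abnormal_io(struct clone_info *ci,
						  struct dm_target *ti)
	{
		unsigned int num_bios = 0;
		unsigned int max_granularity = 0;
		struct queue_limits *limits = dm_get_queue_limits(ti->table->md);

		switch (bio_op(ci->bio)) {
		case REQ_OP_DISCARD:
			num_bios = ti->num_discard_bios;
			if (ti->max_discard_granularity)
				max_granularity = limits->max_discard_sectors;
			break;
		case REQ_OP_SECURE_ERASE:
			num_bios = ti->num_secure_erase_bios;
			break;
		case REQ_OP_WRITE_ZEROES:
			num_bios = ti->num_write_zeroes_bios;
			break;
		/* The REQ_OP_PROVISION case from above would be added here. */
		default:
			break;
		}

		/*
		 * Not every target supports every abnormal op; bail out if
		 * this one doesn't, otherwise let dm core clone and split
		 * the bio, honouring max_granularity when it is non-zero.
		 */
		if (unlikely(!num_bios))
			return BLK_STS_NOTSUPP;

		__send_changing_extent_only(ci, ti, num_bios, max_granularity);
		return BLK_STS_OK;
	}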

I'll reply again later today (to patch 2's actual code changes),
because I caught at least one other thing worth mentioning.

Thanks,
Mike

Mike Snitzer April 14, 2023, 9:58 p.m. UTC | #3
On Thu, Apr 13 2023 at  8:02P -0400,
Sarthak Kukreti <sarthakkukreti@chromium.org> wrote:

> Add support to dm devices for REQ_OP_PROVISION. The default mode
> is to passthrough the request to the underlying device, if it
> supports it. dm-thinpool uses the provision request to provision
> blocks for a dm-thin device. dm-thinpool currently does not
> pass through REQ_OP_PROVISION to underlying devices.
> 
> For shared blocks, provision requests will break sharing and copy the
> contents of the entire block.
> 
> Signed-off-by: Sarthak Kukreti <sarthakkukreti@chromium.org>
> ---
>  drivers/md/dm-crypt.c         |   4 +-
>  drivers/md/dm-linear.c        |   1 +
>  drivers/md/dm-snap.c          |   7 +++

Have you tested REQ_OP_PROVISION with these targets?  Just want to
make sure you have an explicit need (and vested interest) for them
passing through REQ_OP_PROVISION.

> diff --git a/drivers/md/dm-table.c b/drivers/md/dm-table.c
> index 2055a758541d..5985343384a7 100644
> --- a/drivers/md/dm-table.c
> +++ b/drivers/md/dm-table.c
> @@ -1850,6 +1850,26 @@ static bool dm_table_supports_write_zeroes(struct dm_table *t)
>  	return true;
>  }
>  
> +static int device_provision_capable(struct dm_target *ti, struct dm_dev *dev,
> +				    sector_t start, sector_t len, void *data)
> +{
> +	return !bdev_max_provision_sectors(dev->bdev);
> +}
> +
> +static bool dm_table_supports_provision(struct dm_table *t)
> +{
> +	for (unsigned int i = 0; i < t->num_targets; i++) {
> +		struct dm_target *ti = dm_table_get_target(t, i);
> +
> +		if (ti->provision_supported ||
> +		    (ti->type->iterate_devices &&
> +		    ti->type->iterate_devices(ti, device_provision_capable, NULL)))
> +			return true;
> +	}
> +
> +	return false;
> +}
> +
>  static int device_not_nowait_capable(struct dm_target *ti, struct dm_dev *dev,
>  				     sector_t start, sector_t len, void *data)
>  {
> @@ -1983,6 +2003,11 @@ int dm_table_set_restrictions(struct dm_table *t, struct request_queue *q,
>  	if (!dm_table_supports_write_zeroes(t))
>  		q->limits.max_write_zeroes_sectors = 0;
>  
> +	if (dm_table_supports_provision(t))
> +		blk_queue_max_provision_sectors(q, UINT_MAX >> 9);

This doesn't seem correct in that it'll override whatever
max_provision_sectors a target (like thinp) has set.

I think you only need the if (!dm_table_supports_provision(t)) case:

> +	else
> +		q->limits.max_provision_sectors = 0;
> +
>  	dm_table_verify_integrity(t);
>  
>  	/*
> diff --git a/drivers/md/dm-thin.c b/drivers/md/dm-thin.c
> index 13d4677baafd..b08b7ae617be 100644
> --- a/drivers/md/dm-thin.c
> +++ b/drivers/md/dm-thin.c

I think it'll make the most sense to split out the dm-thin.c changes
into a separate patch.

> @@ -909,7 +909,8 @@ static void __inc_remap_and_issue_cell(void *context,
>  	struct bio *bio;
>  
>  	while ((bio = bio_list_pop(&cell->bios))) {
> -		if (op_is_flush(bio->bi_opf) || bio_op(bio) == REQ_OP_DISCARD)
> +		if (op_is_flush(bio->bi_opf) || bio_op(bio) == REQ_OP_DISCARD ||
> +		    bio_op(bio) == REQ_OP_PROVISION)
>  			bio_list_add(&info->defer_bios, bio);
>  		else {
>  			inc_all_io_entry(info->tc->pool, bio);
> @@ -1013,6 +1014,15 @@ static void process_prepared_mapping(struct dm_thin_new_mapping *m)
>  		goto out;
>  	}
>  
> +	/*
> +	 * For provision requests, once the prepared block has been inserted
> +	 * into the mapping btree, return.
> +	 */
> +	if (bio && bio_op(bio) == REQ_OP_PROVISION) {
> +		bio_endio(bio);
> +		return;
> +	}
> +
>  	/*
>  	 * Release any bios held while the block was being provisioned.
>  	 * If we are processing a write bio that completely covers the block,
> @@ -1241,7 +1251,7 @@ static int io_overlaps_block(struct pool *pool, struct bio *bio)
>  
>  static int io_overwrites_block(struct pool *pool, struct bio *bio)
>  {
> -	return (bio_data_dir(bio) == WRITE) &&
> +	return (bio_data_dir(bio) == WRITE) && bio_op(bio) != REQ_OP_PROVISION &&
>  		io_overlaps_block(pool, bio);
>  }
>  
> @@ -1334,10 +1344,11 @@ static void schedule_copy(struct thin_c *tc, dm_block_t virt_block,
>  	/*
>  	 * IO to pool_dev remaps to the pool target's data_dev.
>  	 *
> -	 * If the whole block of data is being overwritten, we can issue the
> -	 * bio immediately. Otherwise we use kcopyd to clone the data first.
> +	 * If the whole block of data is being overwritten and if this is not a
> +	 * provision request, we can issue the bio immediately.
> +	 * Otherwise we use kcopyd to clone the data first.
>  	 */
> -	if (io_overwrites_block(pool, bio))
> +	if (io_overwrites_block(pool, bio) && bio_op(bio) != REQ_OP_PROVISION)
>  		remap_and_issue_overwrite(tc, bio, data_dest, m);
>  	else {
>  		struct dm_io_region from, to;
> @@ -1356,7 +1367,8 @@ static void schedule_copy(struct thin_c *tc, dm_block_t virt_block,
>  		/*
>  		 * Do we need to zero a tail region?
>  		 */
> -		if (len < pool->sectors_per_block && pool->pf.zero_new_blocks) {
> +		if (len < pool->sectors_per_block && pool->pf.zero_new_blocks &&
> +		    bio_op(bio) != REQ_OP_PROVISION) {
>  			atomic_inc(&m->prepare_actions);
>  			ll_zero(tc, m,
>  				data_dest * pool->sectors_per_block + len,
> @@ -1390,6 +1402,10 @@ static void schedule_zero(struct thin_c *tc, dm_block_t virt_block,
>  	m->data_block = data_block;
>  	m->cell = cell;
>  
> +	/* Provision requests are chained on the original bio. */
> +	if (bio && bio_op(bio) == REQ_OP_PROVISION)
> +		m->bio = bio;
> +
>  	/*
>  	 * If the whole block of data is being overwritten or we are not
>  	 * zeroing pre-existing data, we can issue the bio immediately.
> @@ -1865,7 +1881,8 @@ static void process_shared_bio(struct thin_c *tc, struct bio *bio,
>  
>  	if (bio_data_dir(bio) == WRITE && bio->bi_iter.bi_size) {
>  		break_sharing(tc, bio, block, &key, lookup_result, data_cell);
> -		cell_defer_no_holder(tc, virt_cell);
> +		if (bio_op(bio) != REQ_OP_PROVISION)
> +			cell_defer_no_holder(tc, virt_cell);
>  	} else {
>  		struct dm_thin_endio_hook *h = dm_per_bio_data(bio, sizeof(struct dm_thin_endio_hook));
>  

I'm not confident in the above changes given the request that we only
handle REQ_OP_PROVISION one thinp block at a time.  So I'll gloss over
them for now.

> @@ -1982,6 +1999,73 @@ static void process_cell(struct thin_c *tc, struct dm_bio_prison_cell *cell)
>  	}
>  }
>  
> +static void process_provision_cell(struct thin_c *tc, struct dm_bio_prison_cell *cell)
> +{
> +	int r;
> +	struct pool *pool = tc->pool;
> +	struct bio *bio = cell->holder;
> +	dm_block_t begin, end;
> +	struct dm_thin_lookup_result lookup_result;
> +
> +	if (tc->requeue_mode) {
> +		cell_requeue(pool, cell);
> +		return;
> +	}
> +
> +	get_bio_block_range(tc, bio, &begin, &end);
> +
> +	while (begin != end) {
> +		r = ensure_next_mapping(pool);
> +		if (r)
> +			/* we did our best */
> +			return;
> +
> +		r = dm_thin_find_block(tc->td, begin, 1, &lookup_result);
> +		switch (r) {
> +		case 0:
> +			if (lookup_result.shared)
> +				process_shared_bio(tc, bio, begin,
> +						   &lookup_result, cell);
> +			begin++;
> +			break;
> +		case -ENODATA:
> +			bio_inc_remaining(bio);
> +			provision_block(tc, bio, begin, cell);
> +			begin++;
> +			break;
> +		default:
> +			DMERR_LIMIT(
> +				"%s: dm_thin_find_block() failed: error = %d",
> +				__func__, r);
> +			cell_defer_no_holder(tc, cell);
> +			bio_io_error(bio);
> +			begin++;
> +			break;
> +		}
> +	}
> +	bio_endio(bio);
> +	cell_defer_no_holder(tc, cell);
> +}
> +
> +static void process_provision_bio(struct thin_c *tc, struct bio *bio)
> +{
> +	dm_block_t begin, end;
> +	struct dm_cell_key virt_key;
> +	struct dm_bio_prison_cell *virt_cell;
> +
> +	get_bio_block_range(tc, bio, &begin, &end);
> +	if (begin == end) {
> +		bio_endio(bio);
> +		return;
> +	}

As Joe mentioned, this pattern was fine for discards because they
are advisory/optional.  But we need to make sure we don't truncate
REQ_OP_PROVISION -- so we need to round up if we partially bleed into
the blocks to the left or right.

BUT ranged REQ_OP_PROVISION support is for later; this can be dealt
with more simply in that each REQ_OP_PROVISION will be handled a block
at a time initially.  So you'll want to honor _all_ REQ_OP_PROVISION,
never returning early.

> +
> +	build_key(tc->td, VIRTUAL, begin, end, &virt_key);
> +	if (bio_detain(tc->pool, &virt_key, bio, &virt_cell))
> +		return;
> +
> +	process_provision_cell(tc, virt_cell);
> +}
> +
>  static void process_bio(struct thin_c *tc, struct bio *bio)
>  {
>  	struct pool *pool = tc->pool;
> @@ -2202,6 +2286,8 @@ static void process_thin_deferred_bios(struct thin_c *tc)
>  
>  		if (bio_op(bio) == REQ_OP_DISCARD)
>  			pool->process_discard(tc, bio);
> +		else if (bio_op(bio) == REQ_OP_PROVISION)
> +			process_provision_bio(tc, bio);

This should be pool->process_provision() (or ->process_provision_bio
if you like).  Point is, you need to be switching these methods
if/when the pool_mode transitions in set_pool_mode().
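
For reference, that would mirror how the existing handlers are switched
per pool mode; a rough sketch (the process_provision member and the
choice of fallbacks are assumptions, not the actual patch):

	/* In set_pool_mode(), alongside process_bio/process_discard: */
	switch (new_mode) {
	case PM_FAIL:
		pool->process_bio = process_bio_fail;
		pool->process_discard = process_bio_fail;
		pool->process_provision = process_bio_fail;
		break;
	case PM_READ_ONLY:
		pool->process_bio = process_bio_read_only;
		pool->process_discard = process_bio_success;
		pool->process_provision = process_bio_fail;	/* cannot allocate */
		break;
	case PM_WRITE:
		pool->process_bio = process_bio;
		pool->process_discard = process_discard_bio;
		pool->process_provision = process_provision_bio;
		break;
	/* PM_OUT_OF_DATA_SPACE etc. elided. */
	}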

>  		else
>  			pool->process_bio(tc, bio);
>  
> @@ -2723,7 +2809,8 @@ static int thin_bio_map(struct dm_target *ti, struct bio *bio)
>  		return DM_MAPIO_SUBMITTED;
>  	}
>  
> -	if (op_is_flush(bio->bi_opf) || bio_op(bio) == REQ_OP_DISCARD) {
> +	if (op_is_flush(bio->bi_opf) || bio_op(bio) == REQ_OP_DISCARD ||
> +	    bio_op(bio) == REQ_OP_PROVISION) {
>  		thin_defer_bio_with_throttle(tc, bio);
>  		return DM_MAPIO_SUBMITTED;
>  	}
> @@ -3370,6 +3457,8 @@ static int pool_ctr(struct dm_target *ti, unsigned int argc, char **argv)
>  	pt->adjusted_pf = pt->requested_pf = pf;
>  	ti->num_flush_bios = 1;
>  	ti->limit_swap_bios = true;
> +	ti->num_provision_bios = 1;
> +	ti->provision_supported = true;
>  
>  	/*
>  	 * Only need to enable discards if the pool should pass
> @@ -4068,6 +4157,7 @@ static void pool_io_hints(struct dm_target *ti, struct queue_limits *limits)
>  		blk_limits_io_opt(limits, pool->sectors_per_block << SECTOR_SHIFT);
>  	}
>  
> +

Please fix this extra whitespace damage.

>  	/*
>  	 * pt->adjusted_pf is a staging area for the actual features to use.
>  	 * They get transferred to the live pool in bind_control_target()
> @@ -4261,6 +4351,9 @@ static int thin_ctr(struct dm_target *ti, unsigned int argc, char **argv)
>  		ti->num_discard_bios = 1;
>  	}
>  
> +	ti->num_provision_bios = 1;
> +	ti->provision_supported = true;
> +
>  	mutex_unlock(&dm_thin_pool_table.mutex);
>  
>  	spin_lock_irq(&tc->pool->lock);
> @@ -4475,6 +4568,7 @@ static void thin_io_hints(struct dm_target *ti, struct queue_limits *limits)
>  
>  	limits->discard_granularity = pool->sectors_per_block << SECTOR_SHIFT;
>  	limits->max_discard_sectors = 2048 * 1024 * 16; /* 16G */
> +	limits->max_provision_sectors = 2048 * 1024 * 16; /* 16G */

Building on my previous reply (with the suggested update to
dm.c:__process_abnormal_io()), once you rebase on dm-6.4's dm-thin.c
you'll instead want:

limits->max_provision_sectors = pool->sectors_per_block << SECTOR_SHIFT;

And you'll want to drop any of the above code that deals with handling
bio-prison range locking and processing of REQ_OP_PROVISION for
multiple thinp blocks at once.

Start with simple REQ_OP_PROVISION processing, one thinp block at a
time; then we can worry about handling REQ_OP_PROVISION that spans
blocks later.
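
(For illustration, the single-block provision path could then reduce to
something like the sketch below; process_provision_cell() is assumed to
be simplified accordingly so it only handles the one detained block.)

	static void process_provision_bio(struct thin_c *tc, struct bio *bio)
	{
		struct pool *pool = tc->pool;
		dm_block_t block = get_bio_block(tc, bio); /* bio covers at most one block */
		struct dm_cell_key key;
		struct dm_bio_prison_cell *cell;

		build_virtual_key(tc->td, block, &key);
		if (bio_detain(pool, &key, bio, &cell))
			return;

		process_provision_cell(tc, cell);
	}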

>  static struct target_type thin_target = {
> diff --git a/drivers/md/dm.c b/drivers/md/dm.c
> index dfde0088147a..d8f1803062b7 100644
> --- a/drivers/md/dm.c
> +++ b/drivers/md/dm.c
> @@ -1593,6 +1593,7 @@ static bool is_abnormal_io(struct bio *bio)
>  		case REQ_OP_DISCARD:
>  		case REQ_OP_SECURE_ERASE:
>  		case REQ_OP_WRITE_ZEROES:
> +		case REQ_OP_PROVISION:
>  			return true;
>  		default:
>  			break;
> @@ -1617,6 +1618,9 @@ static blk_status_t __process_abnormal_io(struct clone_info *ci,
>  	case REQ_OP_WRITE_ZEROES:
>  		num_bios = ti->num_write_zeroes_bios;
>  		break;
> +	case REQ_OP_PROVISION:
> +		num_bios = ti->num_provision_bios;
> +		break;
>  	default:
>  		break;
>  	}

Please be sure to include my suggested __process_abnormal_io change
from my previous reply.

> diff --git a/include/linux/device-mapper.h b/include/linux/device-mapper.h
> index 7975483816e4..e9f687521ae6 100644
> --- a/include/linux/device-mapper.h
> +++ b/include/linux/device-mapper.h
> @@ -334,6 +334,12 @@ struct dm_target {
>  	 */
>  	unsigned int num_write_zeroes_bios;
>  
> +	/*
> +	 * The number of PROVISION bios that will be submitted to the target.
> +	 * The bio number can be accessed with dm_bio_get_target_bio_nr.
> +	 */
> +	unsigned int num_provision_bios;
> +
>  	/*
>  	 * The minimum number of extra bytes allocated in each io for the
>  	 * target to use.
> @@ -358,6 +364,11 @@ struct dm_target {
>  	 */
>  	bool discards_supported:1;
>  
> +	/* Set if this target needs to receive provision requests regardless of
> +	 * whether or not its underlying devices have support.
> +	 */
> +	bool provision_supported:1;
> +
>  	/*
>  	 * Set if we need to limit the number of in-flight bios when swapping.
>  	 */

You'll need to add a max_provision_granularity bool too (as implied by
the dm.c:__process_abnormal_io() change suggested in my first reply to
this patch).
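
i.e. something along these lines next to provision_supported (a sketch
modeled on the max_discard_granularity flag in dm-6.4):

	/*
	 * Set if provision bios must be split on 'max_provision_sectors'
	 * boundaries (e.g. one thinp block).
	 */
	bool max_provision_granularity:1;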

I'm happy to wait for you to consume the v3 feedback we've provided so
you can create a v4.  I'm thinking I can base dm-thin.c's WRITE_ZEROES
support on top of your REQ_OP_PROVISION v4 changes -- they should be
complementary.

Thanks,
Mike

Sarthak Kukreti April 18, 2023, 10:13 p.m. UTC | #4
On Fri, Apr 14, 2023 at 2:58 PM Mike Snitzer <snitzer@kernel.org> wrote:
>
> On Thu, Apr 13 2023 at  8:02P -0400,
> Sarthak Kukreti <sarthakkukreti@chromium.org> wrote:
>
> > Add support to dm devices for REQ_OP_PROVISION. The default mode
> > is to passthrough the request to the underlying device, if it
> > supports it. dm-thinpool uses the provision request to provision
> > blocks for a dm-thin device. dm-thinpool currently does not
> > pass through REQ_OP_PROVISION to underlying devices.
> >
> > For shared blocks, provision requests will break sharing and copy the
> > contents of the entire block.
> >
> > Signed-off-by: Sarthak Kukreti <sarthakkukreti@chromium.org>
> > ---
> >  drivers/md/dm-crypt.c         |   4 +-
> >  drivers/md/dm-linear.c        |   1 +
> >  drivers/md/dm-snap.c          |   7 +++
>
> Have you tested REQ_OP_PROVISION with these targets?  Just want to
> make sure you have an explicit need (and vested interest) for them
> passing through REQ_OP_PROVISION.
>

Yes. I have a vested interest in dm-linear and dm-crypt; I kept
dm-snap support mostly for consistency with thin snapshots.

> > diff --git a/drivers/md/dm-table.c b/drivers/md/dm-table.c
> > index 2055a758541d..5985343384a7 100644
> > --- a/drivers/md/dm-table.c
> > +++ b/drivers/md/dm-table.c
> > @@ -1850,6 +1850,26 @@ static bool dm_table_supports_write_zeroes(struct dm_table *t)
> >       return true;
> >  }
> >
> > +static int device_provision_capable(struct dm_target *ti, struct dm_dev *dev,
> > +                                 sector_t start, sector_t len, void *data)
> > +{
> > +     return !bdev_max_provision_sectors(dev->bdev);
> > +}
> > +
> > +static bool dm_table_supports_provision(struct dm_table *t)
> > +{
> > +     for (unsigned int i = 0; i < t->num_targets; i++) {
> > +             struct dm_target *ti = dm_table_get_target(t, i);
> > +
> > +             if (ti->provision_supported ||
> > +                 (ti->type->iterate_devices &&
> > +                 ti->type->iterate_devices(ti, device_provision_capable, NULL)))
> > +                     return true;
> > +     }
> > +
> > +     return false;
> > +}
> > +
> >  static int device_not_nowait_capable(struct dm_target *ti, struct dm_dev *dev,
> >                                    sector_t start, sector_t len, void *data)
> >  {
> > @@ -1983,6 +2003,11 @@ int dm_table_set_restrictions(struct dm_table *t, struct request_queue *q,
> >       if (!dm_table_supports_write_zeroes(t))
> >               q->limits.max_write_zeroes_sectors = 0;
> >
> > +     if (dm_table_supports_provision(t))
> > +             blk_queue_max_provision_sectors(q, UINT_MAX >> 9);
>
> This doesn't seem correct in that it'll override whatever
> max_provision_sectors was set by a target (like thinp).
>
> I think you only need the if (!dm_table_supports_provision)) case:
>
Done
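
i.e., presumably keeping only the clearing branch:

	if (!dm_table_supports_provision(t))
		q->limits.max_provision_sectors = 0;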

> > +     else
> > +             q->limits.max_provision_sectors = 0;
> > +
> >       dm_table_verify_integrity(t);
> >
> >       /*
> > diff --git a/drivers/md/dm-thin.c b/drivers/md/dm-thin.c
> > index 13d4677baafd..b08b7ae617be 100644
> > --- a/drivers/md/dm-thin.c
> > +++ b/drivers/md/dm-thin.c
>
> I think it'll make the most sense to split out the dm-thin.c changes
> in a separate patch.
>
Split the dm-thin changes into a separate patch that follows this one in v4.

> > @@ -909,7 +909,8 @@ static void __inc_remap_and_issue_cell(void *context,
> >       struct bio *bio;
> >
> >       while ((bio = bio_list_pop(&cell->bios))) {
> > -             if (op_is_flush(bio->bi_opf) || bio_op(bio) == REQ_OP_DISCARD)
> > +             if (op_is_flush(bio->bi_opf) || bio_op(bio) == REQ_OP_DISCARD ||
> > +                 bio_op(bio) == REQ_OP_PROVISION)
> >                       bio_list_add(&info->defer_bios, bio);
> >               else {
> >                       inc_all_io_entry(info->tc->pool, bio);
> > @@ -1013,6 +1014,15 @@ static void process_prepared_mapping(struct dm_thin_new_mapping *m)
> >               goto out;
> >       }
> >
> > +     /*
> > +      * For provision requests, once the prepared block has been inserted
> > +      * into the mapping btree, return.
> > +      */
> > +     if (bio && bio_op(bio) == REQ_OP_PROVISION) {
> > +             bio_endio(bio);
> > +             return;
> > +     }
> > +
> >       /*
> >        * Release any bios held while the block was being provisioned.
> >        * If we are processing a write bio that completely covers the block,
> > @@ -1241,7 +1251,7 @@ static int io_overlaps_block(struct pool *pool, struct bio *bio)
> >
> >  static int io_overwrites_block(struct pool *pool, struct bio *bio)
> >  {
> > -     return (bio_data_dir(bio) == WRITE) &&
> > +     return (bio_data_dir(bio) == WRITE) && bio_op(bio) != REQ_OP_PROVISION &&
> >               io_overlaps_block(pool, bio);
> >  }
> >
> > @@ -1334,10 +1344,11 @@ static void schedule_copy(struct thin_c *tc, dm_block_t virt_block,
> >       /*
> >        * IO to pool_dev remaps to the pool target's data_dev.
> >        *
> > -      * If the whole block of data is being overwritten, we can issue the
> > -      * bio immediately. Otherwise we use kcopyd to clone the data first.
> > +      * If the whole block of data is being overwritten and if this is not a
> > +      * provision request, we can issue the bio immediately.
> > +      * Otherwise we use kcopyd to clone the data first.
> >        */
> > -     if (io_overwrites_block(pool, bio))
> > +     if (io_overwrites_block(pool, bio) && bio_op(bio) != REQ_OP_PROVISION)
> >               remap_and_issue_overwrite(tc, bio, data_dest, m);
> >       else {
> >               struct dm_io_region from, to;
> > @@ -1356,7 +1367,8 @@ static void schedule_copy(struct thin_c *tc, dm_block_t virt_block,
> >               /*
> >                * Do we need to zero a tail region?
> >                */
> > -             if (len < pool->sectors_per_block && pool->pf.zero_new_blocks) {
> > +             if (len < pool->sectors_per_block && pool->pf.zero_new_blocks &&
> > +                 bio_op(bio) != REQ_OP_PROVISION) {
> >                       atomic_inc(&m->prepare_actions);
> >                       ll_zero(tc, m,
> >                               data_dest * pool->sectors_per_block + len,
> > @@ -1390,6 +1402,10 @@ static void schedule_zero(struct thin_c *tc, dm_block_t virt_block,
> >       m->data_block = data_block;
> >       m->cell = cell;
> >
> > +     /* Provision requests are chained on the original bio. */
> > +     if (bio && bio_op(bio) == REQ_OP_PROVISION)
> > +             m->bio = bio;
> > +
> >       /*
> >        * If the whole block of data is being overwritten or we are not
> >        * zeroing pre-existing data, we can issue the bio immediately.
> > @@ -1865,7 +1881,8 @@ static void process_shared_bio(struct thin_c *tc, struct bio *bio,
> >
> >       if (bio_data_dir(bio) == WRITE && bio->bi_iter.bi_size) {
> >               break_sharing(tc, bio, block, &key, lookup_result, data_cell);
> > -             cell_defer_no_holder(tc, virt_cell);
> > +             if (bio_op(bio) != REQ_OP_PROVISION)
> > +                     cell_defer_no_holder(tc, virt_cell);
> >       } else {
> >               struct dm_thin_endio_hook *h = dm_per_bio_data(bio, sizeof(struct dm_thin_endio_hook));
> >
>
> Not confident in the above changes given the request that we only
> handle REQ_OP_PROVISION one thinp block at a time.  So I'll gloss over
> them for now.
>
Yeah, the majority of this got removed in v4. I added a check in
io_overwrites_block() to return false for all provision requests.

> > @@ -1982,6 +1999,73 @@ static void process_cell(struct thin_c *tc, struct dm_bio_prison_cell *cell)
> >       }
> >  }
> >
> > +static void process_provision_cell(struct thin_c *tc, struct dm_bio_prison_cell *cell)
> > +{
> > +     int r;
> > +     struct pool *pool = tc->pool;
> > +     struct bio *bio = cell->holder;
> > +     dm_block_t begin, end;
> > +     struct dm_thin_lookup_result lookup_result;
> > +
> > +     if (tc->requeue_mode) {
> > +             cell_requeue(pool, cell);
> > +             return;
> > +     }
> > +
> > +     get_bio_block_range(tc, bio, &begin, &end);
> > +
> > +     while (begin != end) {
> > +             r = ensure_next_mapping(pool);
> > +             if (r)
> > +                     /* we did our best */
> > +                     return;
> > +
> > +             r = dm_thin_find_block(tc->td, begin, 1, &lookup_result);
> > +             switch (r) {
> > +             case 0:
> > +                     if (lookup_result.shared)
> > +                             process_shared_bio(tc, bio, begin,
> > +                                                &lookup_result, cell);
> > +                     begin++;
> > +                     break;
> > +             case -ENODATA:
> > +                     bio_inc_remaining(bio);
> > +                     provision_block(tc, bio, begin, cell);
> > +                     begin++;
> > +                     break;
> > +             default:
> > +                     DMERR_LIMIT(
> > +                             "%s: dm_thin_find_block() failed: error = %d",
> > +                             __func__, r);
> > +                     cell_defer_no_holder(tc, cell);
> > +                     bio_io_error(bio);
> > +                     begin++;
> > +                     break;
> > +             }
> > +     }
> > +     bio_endio(bio);
> > +     cell_defer_no_holder(tc, cell);
> > +}
> > +
> > +static void process_provision_bio(struct thin_c *tc, struct bio *bio)
> > +{
> > +     dm_block_t begin, end;
> > +     struct dm_cell_key virt_key;
> > +     struct dm_bio_prison_cell *virt_cell;
> > +
> > +     get_bio_block_range(tc, bio, &begin, &end);
> > +     if (begin == end) {
> > +             bio_endio(bio);
> > +             return;
> > +     }
>
> Like Joe mentioned, this pattern was fine for discards because they
> are advisory/optional.  But we need to make sure we don't truncate
> REQ_OP_PROVISION -- so we need to round up if we partially bleed into
> the blocks to the left or right.
>
> BUT ranged REQ_OP_PROVISION support is for later, this can be dealt
> with more simply in that each REQ_OP_PROVISION will be handled a block
> at a time initially.  SO you'll want to honor _all_ REQ_OP_PROVISION,
> never returning early.
>
Thanks. The next patch in the series has the simplified version. It had
a lot in common with process_bio(), so the two code paths could have been
merged, but I opted to keep it like this to make ranged handling and
passdown support easier to implement.

> > +
> > +     build_key(tc->td, VIRTUAL, begin, end, &virt_key);
> > +     if (bio_detain(tc->pool, &virt_key, bio, &virt_cell))
> > +             return;
> > +
> > +     process_provision_cell(tc, virt_cell);
> > +}
> > +
> >  static void process_bio(struct thin_c *tc, struct bio *bio)
> >  {
> >       struct pool *pool = tc->pool;
> > @@ -2202,6 +2286,8 @@ static void process_thin_deferred_bios(struct thin_c *tc)
> >
> >               if (bio_op(bio) == REQ_OP_DISCARD)
> >                       pool->process_discard(tc, bio);
> > +             else if (bio_op(bio) == REQ_OP_PROVISION)
> > +                     process_provision_bio(tc, bio);
>
> This should be pool->process_provision() (or ->process_provision_bio
> if you like).  Point is, you need to be switching these methods
> if/when the pool_mode transitions in set_pool_mode().
>
Done

> >               else
> >                       pool->process_bio(tc, bio);
> >
> > @@ -2723,7 +2809,8 @@ static int thin_bio_map(struct dm_target *ti, struct bio *bio)
> >               return DM_MAPIO_SUBMITTED;
> >       }
> >
> > -     if (op_is_flush(bio->bi_opf) || bio_op(bio) == REQ_OP_DISCARD) {
> > +     if (op_is_flush(bio->bi_opf) || bio_op(bio) == REQ_OP_DISCARD ||
> > +         bio_op(bio) == REQ_OP_PROVISION) {
> >               thin_defer_bio_with_throttle(tc, bio);
> >               return DM_MAPIO_SUBMITTED;
> >       }
> > @@ -3370,6 +3457,8 @@ static int pool_ctr(struct dm_target *ti, unsigned int argc, char **argv)
> >       pt->adjusted_pf = pt->requested_pf = pf;
> >       ti->num_flush_bios = 1;
> >       ti->limit_swap_bios = true;
> > +     ti->num_provision_bios = 1;
> > +     ti->provision_supported = true;
> >
> >       /*
> >        * Only need to enable discards if the pool should pass
> > @@ -4068,6 +4157,7 @@ static void pool_io_hints(struct dm_target *ti, struct queue_limits *limits)
> >               blk_limits_io_opt(limits, pool->sectors_per_block << SECTOR_SHIFT);
> >       }
> >
> > +
>
> Please fix this extra whitespace damage.
>
Done

> >       /*
> >        * pt->adjusted_pf is a staging area for the actual features to use.
> >        * They get transferred to the live pool in bind_control_target()
> > @@ -4261,6 +4351,9 @@ static int thin_ctr(struct dm_target *ti, unsigned int argc, char **argv)
> >               ti->num_discard_bios = 1;
> >       }
> >
> > +     ti->num_provision_bios = 1;
> > +     ti->provision_supported = true;
> > +
> >       mutex_unlock(&dm_thin_pool_table.mutex);
> >
> >       spin_lock_irq(&tc->pool->lock);
> > @@ -4475,6 +4568,7 @@ static void thin_io_hints(struct dm_target *ti, struct queue_limits *limits)
> >
> >       limits->discard_granularity = pool->sectors_per_block << SECTOR_SHIFT;
> >       limits->max_discard_sectors = 2048 * 1024 * 16; /* 16G */
> > +     limits->max_provision_sectors = 2048 * 1024 * 16; /* 16G */
>
> Building on my previous reply, with suggested update to
> dm.c:__process_abnormal_io(), once you rebase on dm-6.4's dm-thin.c
> you'll want to instead:
>
> limits->max_provision_sectors = pool->sectors_per_block << SECTOR_SHIFT;
>
> And you'll want to drop any of the above code that deals with handling
> bio-prison range locking and processing of REQ_OP_PROVISION for
> multiple thinp blocks at once.
>
> Simple REQ_OP_PROVISION processing one thinp block at a time first and
> then we can worry about handling REQ_OP_PROVISION that span blocks
> later.
>
Thanks, done in v4.

> >  static struct target_type thin_target = {
> > diff --git a/drivers/md/dm.c b/drivers/md/dm.c
> > index dfde0088147a..d8f1803062b7 100644
> > --- a/drivers/md/dm.c
> > +++ b/drivers/md/dm.c
> > @@ -1593,6 +1593,7 @@ static bool is_abnormal_io(struct bio *bio)
> >               case REQ_OP_DISCARD:
> >               case REQ_OP_SECURE_ERASE:
> >               case REQ_OP_WRITE_ZEROES:
> > +             case REQ_OP_PROVISION:
> >                       return true;
> >               default:
> >                       break;
> > @@ -1617,6 +1618,9 @@ static blk_status_t __process_abnormal_io(struct clone_info *ci,
> >       case REQ_OP_WRITE_ZEROES:
> >               num_bios = ti->num_write_zeroes_bios;
> >               break;
> > +     case REQ_OP_PROVISION:
> > +             num_bios = ti->num_provision_bios;
> > +             break;
> >       default:
> >               break;
> >       }
>
> Please be sure to include my suggested __process_abnormal_io change
> from my previous reply.
>
Done.

> > diff --git a/include/linux/device-mapper.h b/include/linux/device-mapper.h
> > index 7975483816e4..e9f687521ae6 100644
> > --- a/include/linux/device-mapper.h
> > +++ b/include/linux/device-mapper.h
> > @@ -334,6 +334,12 @@ struct dm_target {
> >        */
> >       unsigned int num_write_zeroes_bios;
> >
> > +     /*
> > +      * The number of PROVISION bios that will be submitted to the target.
> > +      * The bio number can be accessed with dm_bio_get_target_bio_nr.
> > +      */
> > +     unsigned int num_provision_bios;
> > +
> >       /*
> >        * The minimum number of extra bytes allocated in each io for the
> >        * target to use.
> > @@ -358,6 +364,11 @@ struct dm_target {
> >        */
> >       bool discards_supported:1;
> >
> > +     /* Set if this target needs to receive provision requests regardless of
> > +      * whether or not its underlying devices have support.
> > +      */
> > +     bool provision_supported:1;
> > +
> >       /*
> >        * Set if we need to limit the number of in-flight bios when swapping.
> >        */
>
> You'll need to add max_provision_granularity bool too (as implied by
> the dm.c:__process_abnormal_io() change suggested in my first reply to
> this patch).
>
> I'm happy to wait for you to consume the v3 feedback we've provided so
> you can create a v4.  I'm thinking I can base dm-thin.c's WRITE_ZEROES
> support ontop of your REQ_OP_PROVISION v4 changes -- they should be
> complementary.
>
Done. Thanks for the review and guidance!

Best
Sarthak

> Thanks,
> Mike


Patch

diff --git a/drivers/md/dm-crypt.c b/drivers/md/dm-crypt.c
index 3ba53dc3cc3f..5c655bfd4686 100644
--- a/drivers/md/dm-crypt.c
+++ b/drivers/md/dm-crypt.c
@@ -3087,6 +3087,8 @@  static int crypt_ctr_optional(struct dm_target *ti, unsigned int argc, char **ar
 	if (ret)
 		return ret;
 
+	ti->num_provision_bios = 1;
+
 	while (opt_params--) {
 		opt_string = dm_shift_arg(&as);
 		if (!opt_string) {
@@ -3390,7 +3392,7 @@  static int crypt_map(struct dm_target *ti, struct bio *bio)
 	 * - for REQ_OP_DISCARD caller must use flush if IO ordering matters
 	 */
 	if (unlikely(bio->bi_opf & REQ_PREFLUSH ||
-	    bio_op(bio) == REQ_OP_DISCARD)) {
+	    bio_op(bio) == REQ_OP_DISCARD || bio_op(bio) == REQ_OP_PROVISION)) {
 		bio_set_dev(bio, cc->dev->bdev);
 		if (bio_sectors(bio))
 			bio->bi_iter.bi_sector = cc->start +
diff --git a/drivers/md/dm-linear.c b/drivers/md/dm-linear.c
index 3e622dcc9dbd..7843e548e850 100644
--- a/drivers/md/dm-linear.c
+++ b/drivers/md/dm-linear.c
@@ -62,6 +62,7 @@  static int linear_ctr(struct dm_target *ti, unsigned int argc, char **argv)
 	ti->num_discard_bios = 1;
 	ti->num_secure_erase_bios = 1;
 	ti->num_write_zeroes_bios = 1;
+	ti->num_provision_bios = 1;
 	ti->private = lc;
 	return 0;
 
diff --git a/drivers/md/dm-snap.c b/drivers/md/dm-snap.c
index f766c21408f1..f6b224a12000 100644
--- a/drivers/md/dm-snap.c
+++ b/drivers/md/dm-snap.c
@@ -1358,6 +1358,7 @@  static int snapshot_ctr(struct dm_target *ti, unsigned int argc, char **argv)
 	if (s->discard_zeroes_cow)
 		ti->num_discard_bios = (s->discard_passdown_origin ? 2 : 1);
 	ti->per_io_data_size = sizeof(struct dm_snap_tracked_chunk);
+	ti->num_provision_bios = 1;
 
 	/* Add snapshot to the list of snapshots for this origin */
 	/* Exceptions aren't triggered till snapshot_resume() is called */
@@ -2003,6 +2004,11 @@  static int snapshot_map(struct dm_target *ti, struct bio *bio)
 	/* If the block is already remapped - use that, else remap it */
 	e = dm_lookup_exception(&s->complete, chunk);
 	if (e) {
+		if (unlikely(bio_op(bio) == REQ_OP_PROVISION)) {
+			bio_endio(bio);
+			r = DM_MAPIO_SUBMITTED;
+			goto out_unlock;
+		}
 		remap_exception(s, e, bio, chunk);
 		if (unlikely(bio_op(bio) == REQ_OP_DISCARD) &&
 		    io_overlaps_chunk(s, bio)) {
@@ -2413,6 +2419,7 @@  static void snapshot_io_hints(struct dm_target *ti, struct queue_limits *limits)
 		/* All discards are split on chunk_size boundary */
 		limits->discard_granularity = snap->store->chunk_size;
 		limits->max_discard_sectors = snap->store->chunk_size;
+		limits->max_provision_sectors = snap->store->chunk_size;
 
 		up_read(&_origins_lock);
 	}
diff --git a/drivers/md/dm-table.c b/drivers/md/dm-table.c
index 2055a758541d..5985343384a7 100644
--- a/drivers/md/dm-table.c
+++ b/drivers/md/dm-table.c
@@ -1850,6 +1850,26 @@  static bool dm_table_supports_write_zeroes(struct dm_table *t)
 	return true;
 }
 
+static int device_provision_capable(struct dm_target *ti, struct dm_dev *dev,
+				    sector_t start, sector_t len, void *data)
+{
+	return !bdev_max_provision_sectors(dev->bdev);
+}
+
+static bool dm_table_supports_provision(struct dm_table *t)
+{
+	for (unsigned int i = 0; i < t->num_targets; i++) {
+		struct dm_target *ti = dm_table_get_target(t, i);
+
+		if (ti->provision_supported ||
+		    (ti->type->iterate_devices &&
+		    ti->type->iterate_devices(ti, device_provision_capable, NULL)))
+			return true;
+	}
+
+	return false;
+}
+
 static int device_not_nowait_capable(struct dm_target *ti, struct dm_dev *dev,
 				     sector_t start, sector_t len, void *data)
 {
@@ -1983,6 +2003,11 @@  int dm_table_set_restrictions(struct dm_table *t, struct request_queue *q,
 	if (!dm_table_supports_write_zeroes(t))
 		q->limits.max_write_zeroes_sectors = 0;
 
+	if (dm_table_supports_provision(t))
+		blk_queue_max_provision_sectors(q, UINT_MAX >> 9);
+	else
+		q->limits.max_provision_sectors = 0;
+
 	dm_table_verify_integrity(t);
 
 	/*
diff --git a/drivers/md/dm-thin.c b/drivers/md/dm-thin.c
index 13d4677baafd..b08b7ae617be 100644
--- a/drivers/md/dm-thin.c
+++ b/drivers/md/dm-thin.c
@@ -909,7 +909,8 @@  static void __inc_remap_and_issue_cell(void *context,
 	struct bio *bio;
 
 	while ((bio = bio_list_pop(&cell->bios))) {
-		if (op_is_flush(bio->bi_opf) || bio_op(bio) == REQ_OP_DISCARD)
+		if (op_is_flush(bio->bi_opf) || bio_op(bio) == REQ_OP_DISCARD ||
+		    bio_op(bio) == REQ_OP_PROVISION)
 			bio_list_add(&info->defer_bios, bio);
 		else {
 			inc_all_io_entry(info->tc->pool, bio);
@@ -1013,6 +1014,15 @@  static void process_prepared_mapping(struct dm_thin_new_mapping *m)
 		goto out;
 	}
 
+	/*
+	 * For provision requests, once the prepared block has been inserted
+	 * into the mapping btree, return.
+	 */
+	if (bio && bio_op(bio) == REQ_OP_PROVISION) {
+		bio_endio(bio);
+		return;
+	}
+
 	/*
 	 * Release any bios held while the block was being provisioned.
 	 * If we are processing a write bio that completely covers the block,
@@ -1241,7 +1251,7 @@  static int io_overlaps_block(struct pool *pool, struct bio *bio)
 
 static int io_overwrites_block(struct pool *pool, struct bio *bio)
 {
-	return (bio_data_dir(bio) == WRITE) &&
+	return (bio_data_dir(bio) == WRITE) && bio_op(bio) != REQ_OP_PROVISION &&
 		io_overlaps_block(pool, bio);
 }
 
@@ -1334,10 +1344,11 @@  static void schedule_copy(struct thin_c *tc, dm_block_t virt_block,
 	/*
 	 * IO to pool_dev remaps to the pool target's data_dev.
 	 *
-	 * If the whole block of data is being overwritten, we can issue the
-	 * bio immediately. Otherwise we use kcopyd to clone the data first.
+	 * If the whole block of data is being overwritten and if this is not a
+	 * provision request, we can issue the bio immediately.
+	 * Otherwise we use kcopyd to clone the data first.
 	 */
-	if (io_overwrites_block(pool, bio))
+	if (io_overwrites_block(pool, bio) && bio_op(bio) != REQ_OP_PROVISION)
 		remap_and_issue_overwrite(tc, bio, data_dest, m);
 	else {
 		struct dm_io_region from, to;
@@ -1356,7 +1367,8 @@  static void schedule_copy(struct thin_c *tc, dm_block_t virt_block,
 		/*
 		 * Do we need to zero a tail region?
 		 */
-		if (len < pool->sectors_per_block && pool->pf.zero_new_blocks) {
+		if (len < pool->sectors_per_block && pool->pf.zero_new_blocks &&
+		    bio_op(bio) != REQ_OP_PROVISION) {
 			atomic_inc(&m->prepare_actions);
 			ll_zero(tc, m,
 				data_dest * pool->sectors_per_block + len,
@@ -1390,6 +1402,10 @@  static void schedule_zero(struct thin_c *tc, dm_block_t virt_block,
 	m->data_block = data_block;
 	m->cell = cell;
 
+	/* Provision requests are chained on the original bio. */
+	if (bio && bio_op(bio) == REQ_OP_PROVISION)
+		m->bio = bio;
+
 	/*
 	 * If the whole block of data is being overwritten or we are not
 	 * zeroing pre-existing data, we can issue the bio immediately.
@@ -1865,7 +1881,8 @@  static void process_shared_bio(struct thin_c *tc, struct bio *bio,
 
 	if (bio_data_dir(bio) == WRITE && bio->bi_iter.bi_size) {
 		break_sharing(tc, bio, block, &key, lookup_result, data_cell);
-		cell_defer_no_holder(tc, virt_cell);
+		if (bio_op(bio) != REQ_OP_PROVISION)
+			cell_defer_no_holder(tc, virt_cell);
 	} else {
 		struct dm_thin_endio_hook *h = dm_per_bio_data(bio, sizeof(struct dm_thin_endio_hook));
 
@@ -1982,6 +1999,73 @@  static void process_cell(struct thin_c *tc, struct dm_bio_prison_cell *cell)
 	}
 }
 
+static void process_provision_cell(struct thin_c *tc, struct dm_bio_prison_cell *cell)
+{
+	int r;
+	struct pool *pool = tc->pool;
+	struct bio *bio = cell->holder;
+	dm_block_t begin, end;
+	struct dm_thin_lookup_result lookup_result;
+
+	if (tc->requeue_mode) {
+		cell_requeue(pool, cell);
+		return;
+	}
+
+	get_bio_block_range(tc, bio, &begin, &end);
+
+	while (begin != end) {
+		r = ensure_next_mapping(pool);
+		if (r)
+			/* we did our best */
+			return;
+
+		r = dm_thin_find_block(tc->td, begin, 1, &lookup_result);
+		switch (r) {
+		case 0:
+			if (lookup_result.shared)
+				process_shared_bio(tc, bio, begin,
+						   &lookup_result, cell);
+			begin++;
+			break;
+		case -ENODATA:
+			bio_inc_remaining(bio);
+			provision_block(tc, bio, begin, cell);
+			begin++;
+			break;
+		default:
+			DMERR_LIMIT(
+				"%s: dm_thin_find_block() failed: error = %d",
+				__func__, r);
+			cell_defer_no_holder(tc, cell);
+			bio_io_error(bio);
+			begin++;
+			break;
+		}
+	}
+	bio_endio(bio);
+	cell_defer_no_holder(tc, cell);
+}
+
+static void process_provision_bio(struct thin_c *tc, struct bio *bio)
+{
+	dm_block_t begin, end;
+	struct dm_cell_key virt_key;
+	struct dm_bio_prison_cell *virt_cell;
+
+	get_bio_block_range(tc, bio, &begin, &end);
+	if (begin == end) {
+		bio_endio(bio);
+		return;
+	}
+
+	build_key(tc->td, VIRTUAL, begin, end, &virt_key);
+	if (bio_detain(tc->pool, &virt_key, bio, &virt_cell))
+		return;
+
+	process_provision_cell(tc, virt_cell);
+}
+
 static void process_bio(struct thin_c *tc, struct bio *bio)
 {
 	struct pool *pool = tc->pool;
@@ -2202,6 +2286,8 @@  static void process_thin_deferred_bios(struct thin_c *tc)
 
 		if (bio_op(bio) == REQ_OP_DISCARD)
 			pool->process_discard(tc, bio);
+		else if (bio_op(bio) == REQ_OP_PROVISION)
+			process_provision_bio(tc, bio);
 		else
 			pool->process_bio(tc, bio);
 
@@ -2723,7 +2809,8 @@  static int thin_bio_map(struct dm_target *ti, struct bio *bio)
 		return DM_MAPIO_SUBMITTED;
 	}
 
-	if (op_is_flush(bio->bi_opf) || bio_op(bio) == REQ_OP_DISCARD) {
+	if (op_is_flush(bio->bi_opf) || bio_op(bio) == REQ_OP_DISCARD ||
+	    bio_op(bio) == REQ_OP_PROVISION) {
 		thin_defer_bio_with_throttle(tc, bio);
 		return DM_MAPIO_SUBMITTED;
 	}
@@ -3370,6 +3457,8 @@  static int pool_ctr(struct dm_target *ti, unsigned int argc, char **argv)
 	pt->adjusted_pf = pt->requested_pf = pf;
 	ti->num_flush_bios = 1;
 	ti->limit_swap_bios = true;
+	ti->num_provision_bios = 1;
+	ti->provision_supported = true;
 
 	/*
 	 * Only need to enable discards if the pool should pass
@@ -4068,6 +4157,7 @@  static void pool_io_hints(struct dm_target *ti, struct queue_limits *limits)
 		blk_limits_io_opt(limits, pool->sectors_per_block << SECTOR_SHIFT);
 	}
 
+
 	/*
 	 * pt->adjusted_pf is a staging area for the actual features to use.
 	 * They get transferred to the live pool in bind_control_target()
@@ -4261,6 +4351,9 @@  static int thin_ctr(struct dm_target *ti, unsigned int argc, char **argv)
 		ti->num_discard_bios = 1;
 	}
 
+	ti->num_provision_bios = 1;
+	ti->provision_supported = true;
+
 	mutex_unlock(&dm_thin_pool_table.mutex);
 
 	spin_lock_irq(&tc->pool->lock);
@@ -4475,6 +4568,7 @@  static void thin_io_hints(struct dm_target *ti, struct queue_limits *limits)
 
 	limits->discard_granularity = pool->sectors_per_block << SECTOR_SHIFT;
 	limits->max_discard_sectors = 2048 * 1024 * 16; /* 16G */
+	limits->max_provision_sectors = 2048 * 1024 * 16; /* 16G */
 }
 
 static struct target_type thin_target = {
diff --git a/drivers/md/dm.c b/drivers/md/dm.c
index dfde0088147a..d8f1803062b7 100644
--- a/drivers/md/dm.c
+++ b/drivers/md/dm.c
@@ -1593,6 +1593,7 @@  static bool is_abnormal_io(struct bio *bio)
 		case REQ_OP_DISCARD:
 		case REQ_OP_SECURE_ERASE:
 		case REQ_OP_WRITE_ZEROES:
+		case REQ_OP_PROVISION:
 			return true;
 		default:
 			break;
@@ -1617,6 +1618,9 @@  static blk_status_t __process_abnormal_io(struct clone_info *ci,
 	case REQ_OP_WRITE_ZEROES:
 		num_bios = ti->num_write_zeroes_bios;
 		break;
+	case REQ_OP_PROVISION:
+		num_bios = ti->num_provision_bios;
+		break;
 	default:
 		break;
 	}
diff --git a/include/linux/device-mapper.h b/include/linux/device-mapper.h
index 7975483816e4..e9f687521ae6 100644
--- a/include/linux/device-mapper.h
+++ b/include/linux/device-mapper.h
@@ -334,6 +334,12 @@  struct dm_target {
 	 */
 	unsigned int num_write_zeroes_bios;
 
+	/*
+	 * The number of PROVISION bios that will be submitted to the target.
+	 * The bio number can be accessed with dm_bio_get_target_bio_nr.
+	 */
+	unsigned int num_provision_bios;
+
 	/*
 	 * The minimum number of extra bytes allocated in each io for the
 	 * target to use.
@@ -358,6 +364,11 @@  struct dm_target {
 	 */
 	bool discards_supported:1;
 
+	/* Set if this target needs to receive provision requests regardless of
+	 * whether or not its underlying devices have support.
+	 */
+	bool provision_supported:1;
+
 	/*
 	 * Set if we need to limit the number of in-flight bios when swapping.
 	 */