Message ID | 20230518223326.18744-5-sarthakkukreti@chromium.org (mailing list archive) |
---|---|
State | New, archived |
Headers | show |
Series | Introduce provisioning primitives | expand |
On Thu, May 18 2023 at 6:33P -0400, Sarthak Kukreti <sarthakkukreti@chromium.org> wrote: > dm-thinpool uses the provision request to provision > blocks for a dm-thin device. dm-thinpool currently does not > pass through REQ_OP_PROVISION to underlying devices. > > For shared blocks, provision requests will break sharing and copy the > contents of the entire block. Additionally, if 'skip_block_zeroing' > is not set, dm-thin will opt to zero out the entire range as a part > of provisioning. > > Signed-off-by: Sarthak Kukreti <sarthakkukreti@chromium.org> > --- > drivers/md/dm-thin.c | 74 +++++++++++++++++++++++++++++++++++++++++--- > 1 file changed, 70 insertions(+), 4 deletions(-) > > diff --git a/drivers/md/dm-thin.c b/drivers/md/dm-thin.c > index 2b13c949bd72..f1b68b558cf0 100644 > --- a/drivers/md/dm-thin.c > +++ b/drivers/md/dm-thin.c > @@ -1245,8 +1247,8 @@ static int io_overlaps_block(struct pool *pool, struct bio *bio) > > static int io_overwrites_block(struct pool *pool, struct bio *bio) > { > - return (bio_data_dir(bio) == WRITE) && > - io_overlaps_block(pool, bio); > + return (bio_data_dir(bio) == WRITE) && io_overlaps_block(pool, bio) && > + bio_op(bio) != REQ_OP_PROVISION; > } > > static void save_and_set_endio(struct bio *bio, bio_end_io_t **save, > @@ -1394,6 +1396,9 @@ static void schedule_zero(struct thin_c *tc, dm_block_t virt_block, > m->data_block = data_block; > m->cell = cell; > > + if (bio && bio_op(bio) == REQ_OP_PROVISION) > + m->bio = bio; > + > /* > * If the whole block of data is being overwritten or we are not > * zeroing pre-existing data, we can issue the bio immediately. This doesn't seem like the best way to address avoiding passdown of provision bios (relying on process_prepared_mapping's implementation that happens to do the right thing if m->bio set). Doing so cascades into relying on complete_overwrite_bio() happening to _not_ actually being specific to "overwrite" bios. I don't have a better suggestion yet but will look closer. Just think this needs to be formalized a bit more rather than it happening to "just work". Cc'ing Joe to see what he thinks too. This is something we can clean up with a follow-on patch though, so not a show-stopper for this series. Mike
On Fri, May 19 2023 at 11:23P -0400, Mike Snitzer <snitzer@kernel.org> wrote: > On Thu, May 18 2023 at 6:33P -0400, > Sarthak Kukreti <sarthakkukreti@chromium.org> wrote: > > > dm-thinpool uses the provision request to provision > > blocks for a dm-thin device. dm-thinpool currently does not > > pass through REQ_OP_PROVISION to underlying devices. > > > > For shared blocks, provision requests will break sharing and copy the > > contents of the entire block. Additionally, if 'skip_block_zeroing' > > is not set, dm-thin will opt to zero out the entire range as a part > > of provisioning. > > > > Signed-off-by: Sarthak Kukreti <sarthakkukreti@chromium.org> > > --- > > drivers/md/dm-thin.c | 74 +++++++++++++++++++++++++++++++++++++++++--- > > 1 file changed, 70 insertions(+), 4 deletions(-) > > > > diff --git a/drivers/md/dm-thin.c b/drivers/md/dm-thin.c > > index 2b13c949bd72..f1b68b558cf0 100644 > > --- a/drivers/md/dm-thin.c > > +++ b/drivers/md/dm-thin.c > > @@ -1245,8 +1247,8 @@ static int io_overlaps_block(struct pool *pool, struct bio *bio) > > > > static int io_overwrites_block(struct pool *pool, struct bio *bio) > > { > > - return (bio_data_dir(bio) == WRITE) && > > - io_overlaps_block(pool, bio); > > + return (bio_data_dir(bio) == WRITE) && io_overlaps_block(pool, bio) && > > + bio_op(bio) != REQ_OP_PROVISION; > > } > > > > static void save_and_set_endio(struct bio *bio, bio_end_io_t **save, > > @@ -1394,6 +1396,9 @@ static void schedule_zero(struct thin_c *tc, dm_block_t virt_block, > > m->data_block = data_block; > > m->cell = cell; > > > > + if (bio && bio_op(bio) == REQ_OP_PROVISION) > > + m->bio = bio; > > + > > /* > > * If the whole block of data is being overwritten or we are not > > * zeroing pre-existing data, we can issue the bio immediately. > > This doesn't seem like the best way to address avoiding passdown of > provision bios (relying on process_prepared_mapping's implementation > that happens to do the right thing if m->bio set). Doing so cascades > into relying on complete_overwrite_bio() happening to _not_ actually > being specific to "overwrite" bios. > > I don't have a better suggestion yet but will look closer. Just think > this needs to be formalized a bit more rather than it happening to > "just work". > > Cc'ing Joe to see what he thinks too. This is something we can clean > up with a follow-on patch though, so not a show-stopper for this > series. I haven't circled back to look close enough at this but REQ_OP_PROVISION bios _are_ being passed down to the thin-pool's underlying data device. Brian Foster reported that if he issues a REQ_OP_PROVISION to a thin device after a snapshot (to break sharing), it'll fail with -EOPNOTSUPP (response is from bio being passed down to device that doesn't support it). I was able to reproduce with: # fallocate --offset 0 --length 1048576 /dev/test/thin # lvcreate -n snap --snapshot test/thin # fallocate --offset 0 --length 1048576 /dev/test/thin fallocate: fallocate failed: Operation not supported But reports success when retried: # fallocate --offset 0 --length 1048576 /dev/test/thin # echo $? 0 It's somewhat moot in that Joe will be reimplementing handling for REQ_OP_PROVISION _but_ in the meantime it'd be nice to have a version of this patch that doesn't error (due to passdown of REQ_OP_PROVISION) when breaking sharing. Primarily so the XFS guys (Dave and Brian) can make progress. I'll take a closer look tomorrow but figured I'd let you know. Mike
On Thu, Jun 08 2023 at 5:24P -0400, Mike Snitzer <snitzer@kernel.org> wrote: > On Fri, May 19 2023 at 11:23P -0400, > Mike Snitzer <snitzer@kernel.org> wrote: > > > On Thu, May 18 2023 at 6:33P -0400, > > Sarthak Kukreti <sarthakkukreti@chromium.org> wrote: > > > > > dm-thinpool uses the provision request to provision > > > blocks for a dm-thin device. dm-thinpool currently does not > > > pass through REQ_OP_PROVISION to underlying devices. > > > > > > For shared blocks, provision requests will break sharing and copy the > > > contents of the entire block. Additionally, if 'skip_block_zeroing' > > > is not set, dm-thin will opt to zero out the entire range as a part > > > of provisioning. > > > > > > Signed-off-by: Sarthak Kukreti <sarthakkukreti@chromium.org> > > > --- > > > drivers/md/dm-thin.c | 74 +++++++++++++++++++++++++++++++++++++++++--- > > > 1 file changed, 70 insertions(+), 4 deletions(-) > > > > > > diff --git a/drivers/md/dm-thin.c b/drivers/md/dm-thin.c > > > index 2b13c949bd72..f1b68b558cf0 100644 > > > --- a/drivers/md/dm-thin.c > > > +++ b/drivers/md/dm-thin.c > > > @@ -1245,8 +1247,8 @@ static int io_overlaps_block(struct pool *pool, struct bio *bio) > > > > > > static int io_overwrites_block(struct pool *pool, struct bio *bio) > > > { > > > - return (bio_data_dir(bio) == WRITE) && > > > - io_overlaps_block(pool, bio); > > > + return (bio_data_dir(bio) == WRITE) && io_overlaps_block(pool, bio) && > > > + bio_op(bio) != REQ_OP_PROVISION; > > > } > > > > > > static void save_and_set_endio(struct bio *bio, bio_end_io_t **save, > > > @@ -1394,6 +1396,9 @@ static void schedule_zero(struct thin_c *tc, dm_block_t virt_block, > > > m->data_block = data_block; > > > m->cell = cell; > > > > > > + if (bio && bio_op(bio) == REQ_OP_PROVISION) > > > + m->bio = bio; > > > + > > > /* > > > * If the whole block of data is being overwritten or we are not > > > * zeroing pre-existing data, we can issue the bio immediately. > > > > This doesn't seem like the best way to address avoiding passdown of > > provision bios (relying on process_prepared_mapping's implementation > > that happens to do the right thing if m->bio set). Doing so cascades > > into relying on complete_overwrite_bio() happening to _not_ actually > > being specific to "overwrite" bios. > > > > I don't have a better suggestion yet but will look closer. Just think > > this needs to be formalized a bit more rather than it happening to > > "just work". > > > > Cc'ing Joe to see what he thinks too. This is something we can clean > > up with a follow-on patch though, so not a show-stopper for this > > series. > > I haven't circled back to look close enough at this but > REQ_OP_PROVISION bios _are_ being passed down to the thin-pool's > underlying data device. > > Brian Foster reported that if he issues a REQ_OP_PROVISION to a thin > device after a snapshot (to break sharing), it'll fail with > -EOPNOTSUPP (response is from bio being passed down to device that > doesn't support it). I was able to reproduce with: > > # fallocate --offset 0 --length 1048576 /dev/test/thin > # lvcreate -n snap --snapshot test/thin > > # fallocate --offset 0 --length 1048576 /dev/test/thin > fallocate: fallocate failed: Operation not supported > > But reports success when retried: > # fallocate --offset 0 --length 1048576 /dev/test/thin > # echo $? > 0 > > It's somewhat moot in that Joe will be reimplementing handling for > REQ_OP_PROVISION _but_ in the meantime it'd be nice to have a version > of this patch that doesn't error (due to passdown of REQ_OP_PROVISION) > when breaking sharing. Primarily so the XFS guys (Dave and Brian) can > make progress. > > I'll take a closer look tomorrow but figured I'd let you know. This fixes the issue for me (causes process_prepared_mapping to end the bio without REQ_OP_PROVISION passdown). But like I said above back on May 19: needs cleanup to make it less of a hack for the REQ_OP_PROVISION case. At a minimum complete_overwrite_bio() would need renaming. diff --git a/drivers/md/dm-thin.c b/drivers/md/dm-thin.c index 43a6702f9efe..434a3b37af2f 100644 --- a/drivers/md/dm-thin.c +++ b/drivers/md/dm-thin.c @@ -1324,6 +1324,9 @@ static void schedule_copy(struct thin_c *tc, dm_block_t virt_block, m->data_block = data_dest; m->cell = cell; + if (bio_op(bio) == REQ_OP_PROVISION) + m->bio = bio; + /* * quiesce action + copy action + an extra reference held for the * duration of this function (we may need to inc later for a
diff --git a/drivers/md/dm-thin.c b/drivers/md/dm-thin.c index 2b13c949bd72..f1b68b558cf0 100644 --- a/drivers/md/dm-thin.c +++ b/drivers/md/dm-thin.c @@ -274,6 +274,7 @@ struct pool { process_bio_fn process_bio; process_bio_fn process_discard; + process_bio_fn process_provision; process_cell_fn process_cell; process_cell_fn process_discard_cell; @@ -913,7 +914,8 @@ static void __inc_remap_and_issue_cell(void *context, struct bio *bio; while ((bio = bio_list_pop(&cell->bios))) { - if (op_is_flush(bio->bi_opf) || bio_op(bio) == REQ_OP_DISCARD) + if (op_is_flush(bio->bi_opf) || bio_op(bio) == REQ_OP_DISCARD || + bio_op(bio) == REQ_OP_PROVISION) bio_list_add(&info->defer_bios, bio); else { inc_all_io_entry(info->tc->pool, bio); @@ -1245,8 +1247,8 @@ static int io_overlaps_block(struct pool *pool, struct bio *bio) static int io_overwrites_block(struct pool *pool, struct bio *bio) { - return (bio_data_dir(bio) == WRITE) && - io_overlaps_block(pool, bio); + return (bio_data_dir(bio) == WRITE) && io_overlaps_block(pool, bio) && + bio_op(bio) != REQ_OP_PROVISION; } static void save_and_set_endio(struct bio *bio, bio_end_io_t **save, @@ -1394,6 +1396,9 @@ static void schedule_zero(struct thin_c *tc, dm_block_t virt_block, m->data_block = data_block; m->cell = cell; + if (bio && bio_op(bio) == REQ_OP_PROVISION) + m->bio = bio; + /* * If the whole block of data is being overwritten or we are not * zeroing pre-existing data, we can issue the bio immediately. @@ -1953,6 +1958,51 @@ static void provision_block(struct thin_c *tc, struct bio *bio, dm_block_t block } } +static void process_provision_bio(struct thin_c *tc, struct bio *bio) +{ + int r; + struct pool *pool = tc->pool; + dm_block_t block = get_bio_block(tc, bio); + struct dm_bio_prison_cell *cell; + struct dm_cell_key key; + struct dm_thin_lookup_result lookup_result; + + /* + * If cell is already occupied, then the block is already + * being provisioned so we have nothing further to do here. + */ + build_virtual_key(tc->td, block, &key); + if (bio_detain(pool, &key, bio, &cell)) + return; + + if (tc->requeue_mode) { + cell_requeue(pool, cell); + return; + } + + r = dm_thin_find_block(tc->td, block, 1, &lookup_result); + switch (r) { + case 0: + if (lookup_result.shared) { + process_shared_bio(tc, bio, block, &lookup_result, cell); + } else { + bio_endio(bio); + cell_defer_no_holder(tc, cell); + } + break; + case -ENODATA: + provision_block(tc, bio, block, cell); + break; + + default: + DMERR_LIMIT("%s: dm_thin_find_block() failed: error = %d", + __func__, r); + cell_defer_no_holder(tc, cell); + bio_io_error(bio); + break; + } +} + static void process_cell(struct thin_c *tc, struct dm_bio_prison_cell *cell) { int r; @@ -2228,6 +2278,8 @@ static void process_thin_deferred_bios(struct thin_c *tc) if (bio_op(bio) == REQ_OP_DISCARD) pool->process_discard(tc, bio); + else if (bio_op(bio) == REQ_OP_PROVISION) + pool->process_provision(tc, bio); else pool->process_bio(tc, bio); @@ -2579,6 +2631,7 @@ static void set_pool_mode(struct pool *pool, enum pool_mode new_mode) dm_pool_metadata_read_only(pool->pmd); pool->process_bio = process_bio_fail; pool->process_discard = process_bio_fail; + pool->process_provision = process_bio_fail; pool->process_cell = process_cell_fail; pool->process_discard_cell = process_cell_fail; pool->process_prepared_mapping = process_prepared_mapping_fail; @@ -2592,6 +2645,7 @@ static void set_pool_mode(struct pool *pool, enum pool_mode new_mode) dm_pool_metadata_read_only(pool->pmd); pool->process_bio = process_bio_read_only; pool->process_discard = process_bio_success; + pool->process_provision = process_bio_fail; pool->process_cell = process_cell_read_only; pool->process_discard_cell = process_cell_success; pool->process_prepared_mapping = process_prepared_mapping_fail; @@ -2612,6 +2666,7 @@ static void set_pool_mode(struct pool *pool, enum pool_mode new_mode) pool->out_of_data_space = true; pool->process_bio = process_bio_read_only; pool->process_discard = process_discard_bio; + pool->process_provision = process_bio_fail; pool->process_cell = process_cell_read_only; pool->process_prepared_mapping = process_prepared_mapping; set_discard_callbacks(pool); @@ -2628,6 +2683,7 @@ static void set_pool_mode(struct pool *pool, enum pool_mode new_mode) dm_pool_metadata_read_write(pool->pmd); pool->process_bio = process_bio; pool->process_discard = process_discard_bio; + pool->process_provision = process_provision_bio; pool->process_cell = process_cell; pool->process_prepared_mapping = process_prepared_mapping; set_discard_callbacks(pool); @@ -2749,7 +2805,8 @@ static int thin_bio_map(struct dm_target *ti, struct bio *bio) return DM_MAPIO_SUBMITTED; } - if (op_is_flush(bio->bi_opf) || bio_op(bio) == REQ_OP_DISCARD) { + if (op_is_flush(bio->bi_opf) || bio_op(bio) == REQ_OP_DISCARD || + bio_op(bio) == REQ_OP_PROVISION) { thin_defer_bio_with_throttle(tc, bio); return DM_MAPIO_SUBMITTED; } @@ -3396,6 +3453,9 @@ static int pool_ctr(struct dm_target *ti, unsigned int argc, char **argv) pt->adjusted_pf = pt->requested_pf = pf; ti->num_flush_bios = 1; ti->limit_swap_bios = true; + ti->num_provision_bios = 1; + ti->provision_supported = true; + ti->max_provision_granularity = true; /* * Only need to enable discards if the pool should pass @@ -4094,6 +4154,8 @@ static void pool_io_hints(struct dm_target *ti, struct queue_limits *limits) blk_limits_io_opt(limits, pool->sectors_per_block << SECTOR_SHIFT); } + limits->max_provision_sectors = pool->sectors_per_block; + /* * pt->adjusted_pf is a staging area for the actual features to use. * They get transferred to the live pool in bind_control_target() @@ -4288,6 +4350,10 @@ static int thin_ctr(struct dm_target *ti, unsigned int argc, char **argv) ti->max_discard_granularity = true; } + ti->num_provision_bios = 1; + ti->provision_supported = true; + ti->max_provision_granularity = true; + mutex_unlock(&dm_thin_pool_table.mutex); spin_lock_irq(&tc->pool->lock);
dm-thinpool uses the provision request to provision blocks for a dm-thin device. dm-thinpool currently does not pass through REQ_OP_PROVISION to underlying devices. For shared blocks, provision requests will break sharing and copy the contents of the entire block. Additionally, if 'skip_block_zeroing' is not set, dm-thin will opt to zero out the entire range as a part of provisioning. Signed-off-by: Sarthak Kukreti <sarthakkukreti@chromium.org> --- drivers/md/dm-thin.c | 74 +++++++++++++++++++++++++++++++++++++++++--- 1 file changed, 70 insertions(+), 4 deletions(-)