Message ID | 20240416152842.13933-1-snitzer@kernel.org |
---|---|
State | Accepted, archived |
Delegated to: | Mike Snitzer |
Series | [v2] dm: restore synchronous close of device mapper block device |
On Tue, Apr 16, 2024 at 11:28:42AM -0400, Mike Snitzer wrote:
> From: Ming Lei <ming.lei@redhat.com>
>
> 'dmsetup remove' and 'dmsetup remove_all' require synchronous bdev
> release. Otherwise dm_lock_for_deletion() may return -EBUSY if the open
> count is > 0, because the open count is dropped in dm_blk_close()
> which occurs after fput() completes.
>
> So if dm_blk_close() is delayed because of asynchronous fput(), this
> device mapper device is skipped during remove, which is a regression.
>
> Fix the issue by using __fput_sync().
>
> Also: DM device removal has long supported being made asynchronous by
> setting the DMF_DEFERRED_REMOVE flag on the DM device. So leverage
> using async fput() in close_table_device() if DMF_DEFERRED_REMOVE flag
> is set.

IMO, this isn't necessary, because the patch is a bug fix, and we are
supposed to restore the exact behavior from before commit a28d893eb327
("md: port block device access to file") to minimize regression risk.

But the extra change seems to work.

thanks,
Ming

On Wed, Apr 17, 2024 at 09:32:55AM +0800, Ming Lei wrote:
> On Tue, Apr 16, 2024 at 11:28:42AM -0400, Mike Snitzer wrote:
> > From: Ming Lei <ming.lei@redhat.com>
> >
> > 'dmsetup remove' and 'dmsetup remove_all' require synchronous bdev
> > release. Otherwise dm_lock_for_deletion() may return -EBUSY if the open
> > count is > 0, because the open count is dropped in dm_blk_close()
> > which occurs after fput() completes.
> >
> > So if dm_blk_close() is delayed because of asynchronous fput(), this
> > device mapper device is skipped during remove, which is a regression.
> >
> > Fix the issue by using __fput_sync().
> >
> > Also: DM device removal has long supported being made asynchronous by
> > setting the DMF_DEFERRED_REMOVE flag on the DM device. So leverage
> > using async fput() in close_table_device() if DMF_DEFERRED_REMOVE flag
> > is set.
>
> IMO, this isn't necessary, because the patch is a bug fix, and we are
> supposed to restore the exact behavior from before commit a28d893eb327
> ("md: port block device access to file") to minimize regression risk.
>
> But the extra change seems to work.

I normally would agree but I see no real reason to avoid leveraging
async fput() for the async DM device removal use-case ;)

Mike

On Tue, Apr 16, 2024 at 11:29 PM Mike Snitzer <snitzer@kernel.org> wrote:
>
> From: Ming Lei <ming.lei@redhat.com>
>
> 'dmsetup remove' and 'dmsetup remove_all' require synchronous bdev
> release. Otherwise dm_lock_for_deletion() may return -EBUSY if the open
> count is > 0, because the open count is dropped in dm_blk_close()
> which occurs after fput() completes.
>
> So if dm_blk_close() is delayed because of asynchronous fput(), this
> device mapper device is skipped during remove, which is a regression.
>
> Fix the issue by using __fput_sync().
>
> Also: DM device removal has long supported being made asynchronous by
> setting the DMF_DEFERRED_REMOVE flag on the DM device. So leverage
> using async fput() in close_table_device() if DMF_DEFERRED_REMOVE flag
> is set.
>
> Reported-by: Zhong Changhui <czhong@redhat.com>
> Fixes: a28d893eb327 ("md: port block device access to file")
> Suggested-by: Christian Brauner <brauner@kernel.org>
> Signed-off-by: Ming Lei <ming.lei@redhat.com>
> [snitzer: editted commit header, use fput() if DMF_DEFERRED_REMOVE set]
> Signed-off-by: Mike Snitzer <snitzer@kernel.org>
> ---
>  drivers/md/dm.c | 10 ++++++++--
>  1 file changed, 8 insertions(+), 2 deletions(-)
>
> diff --git a/drivers/md/dm.c b/drivers/md/dm.c
> index 56aa2a8b9d71..7d0746b37c8e 100644
> --- a/drivers/md/dm.c
> +++ b/drivers/md/dm.c
> @@ -765,7 +765,7 @@ static struct table_device *open_table_device(struct mapped_device *md,
>  	return td;
>
>  out_blkdev_put:
> -	fput(bdev_file);
> +	__fput_sync(bdev_file);
>  out_free_td:
>  	kfree(td);
>  	return ERR_PTR(r);
> @@ -778,7 +778,13 @@ static void close_table_device(struct table_device *td, struct mapped_device *md
>  {
>  	if (md->disk->slave_dir)
>  		bd_unlink_disk_holder(td->dm_dev.bdev, md->disk);
> -	fput(td->dm_dev.bdev_file);
> +
> +	/* Leverage async fput() if DMF_DEFERRED_REMOVE set */
> +	if (unlikely(test_bit(DMF_DEFERRED_REMOVE, &md->flags)))
> +		fput(td->dm_dev.bdev_file);
> +	else
> +		__fput_sync(td->dm_dev.bdev_file);
> +
>  	put_dax(td->dm_dev.dax_dev);
>  	list_del(&td->list);
>  	kfree(td);
> --
> 2.40.0
>

I applied this patch, and it looks like the issue is solved by it.

diff --git a/drivers/md/dm.c b/drivers/md/dm.c
index 56aa2a8b9d71..7d0746b37c8e 100644
--- a/drivers/md/dm.c
+++ b/drivers/md/dm.c
@@ -765,7 +765,7 @@ static struct table_device *open_table_device(struct mapped_device *md,
 	return td;
 
 out_blkdev_put:
-	fput(bdev_file);
+	__fput_sync(bdev_file);
 out_free_td:
 	kfree(td);
 	return ERR_PTR(r);
@@ -778,7 +778,13 @@ static void close_table_device(struct table_device *td, struct mapped_device *md
 {
 	if (md->disk->slave_dir)
 		bd_unlink_disk_holder(td->dm_dev.bdev, md->disk);
-	fput(td->dm_dev.bdev_file);
+
+	/* Leverage async fput() if DMF_DEFERRED_REMOVE set */
+	if (unlikely(test_bit(DMF_DEFERRED_REMOVE, &md->flags)))
+		fput(td->dm_dev.bdev_file);
+	else
+		__fput_sync(td->dm_dev.bdev_file);
+
 	put_dax(td->dm_dev.dax_dev);
 	list_del(&td->list);
 	kfree(td);
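
The ordering problem described in the commit message can be illustrated with a small user-space model. This is only a sketch, not kernel code: every `model_*` name below is hypothetical and merely stands in for the roles that fput(), __fput_sync(), dm_blk_close() and dm_lock_for_deletion() play in the description above. It assumes a single-threaded setting where deferred releases run only when the queue is explicitly drained.

```c
/* User-space model only: none of these are the kernel's real functions. */
#include <assert.h>
#include <errno.h>
#include <stddef.h>

#define MAX_DEFERRED 8

struct model_dev {
	int open_count;		/* stands in for the device's open count */
};

/* Releases that have been postponed, like delayed fput() work. */
static struct model_dev *deferred[MAX_DEFERRED];
static size_t n_deferred;

static void model_open(struct model_dev *d)
{
	d->open_count++;
}

/* Final release: the only place where the open count drops. */
static void model_release(struct model_dev *d)
{
	assert(d->open_count > 0);
	d->open_count--;
}

/* Synchronous put: the release has happened by the time this returns. */
static void model_fput_sync(struct model_dev *d)
{
	model_release(d);
}

/* Asynchronous put: the release is queued and happens on a later drain. */
static void model_fput_async(struct model_dev *d)
{
	if (n_deferred < MAX_DEFERRED)
		deferred[n_deferred++] = d;
	else
		model_release(d);	/* queue full: fall back to sync */
}

/* Drain the queue, performing every postponed release. */
static void model_run_deferred(void)
{
	for (size_t i = 0; i < n_deferred; i++)
		model_release(deferred[i]);
	n_deferred = 0;
}

/* Refuse removal while the device is still counted as open. */
static int model_lock_for_deletion(const struct model_dev *d)
{
	return d->open_count > 0 ? -EBUSY : 0;
}
```

In this model, calling model_lock_for_deletion() right after model_fput_async() returns -EBUSY until model_run_deferred() has run, whereas after model_fput_sync() it returns 0 immediately. That mirrors why a remove that expects synchronous behavior can skip a device when the put is asynchronous.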