make autoclear operation synchronous again

Message ID	03f43407-c34b-b7b2-68cd-d4ca93a993b8@i-love.sakura.ne.jp (mailing list archive)
State	New, archived
Headers	show Return-Path: <linux-block-owner@kernel.org> Message-ID: <03f43407-c34b-b7b2-68cd-d4ca93a993b8@i-love.sakura.ne.jp> Date: Sun, 26 Dec 2021 16:06:27 +0900 MIME-Version: 1.0 User-Agent: Mozilla/5.0 (Windows NT 6.3; Win64; x64; rv:91.0) Gecko/20100101 Thunderbird/91.4.1 Content-Language: en-US To: Jens Axboe <axboe@kernel.dk>, Jan Kara <jack@suse.cz>, Dan Schatzberg <schatzberg.dan@gmail.com>, Christoph Hellwig <hch@lst.de>, linux-block <linux-block@vger.kernel.org> From: Tetsuo Handa <penguin-kernel@i-love.sakura.ne.jp> Subject: [PATCH] make autoclear operation synchronous again Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 7bit Precedence: bulk
Series	make autoclear operation synchronous again \| expand make autoclear operation synchronous again

Tetsuo Handa Dec. 26, 2021, 7:06 a.m. UTC

The kernel test robot is reporting that xfstest can fail at

  umount ext2 on xfs
  umount xfs

sequence, for commit 322c4293ecc58110 ("loop: make autoclear operation
asynchronous") broke what commit ("loop: Make explicit loop device
destruction lazy") wanted to achieve.

Although we cannot guarantee that nobody is holding a reference when
"umount xfs" is called, we should try to close a race window opened
by asynchronous autoclear operation.

It turned out that there is no need to call flush_workqueue() from
__loop_clr_fd(), for blk_mq_freeze_queue() blocks until all pending
"struct work_struct" are processed by loop_process_work().

Revert commit 322c4293ecc58110 ("loop: make autoclear operation
asynchronous"), and simply defer calling destroy_workqueue() till
loop_remove().

Since a loop device is likely reused after once created thanks to
ioctl(LOOP_CTL_GET_FREE), being unable to destroy corresponding WQ
when ioctl(LOOP_CLR_FD) is called should be an acceptable waste.

Reported-by: kernel test robot <oliver.sang@intel.com>
Suggested-by: Christoph Hellwig <hch@lst.de>
Tested-by: syzbot <syzbot+643e4ce4b6ad1347d372@syzkaller.appspotmail.com>
Signed-off-by: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
---

 drivers/block/loop.c | 76 ++++++++++++++++++++------------------------
 drivers/block/loop.h |  1 -
 2 files changed, 35 insertions(+), 42 deletions(-)

Christoph Hellwig Dec. 29, 2021, 5:29 p.m. UTC | #1

The subject line is missing the loop: prefix.

This approach looks very reasonable, but a few comments below:

> -	lo->workqueue = alloc_workqueue("loop%d",
> -					WQ_UNBOUND | WQ_FREEZABLE,
> -					0,
> -					lo->lo_number);
> +	if (!lo->workqueue)
> +		lo->workqueue = alloc_workqueue("loop%d",
> +						WQ_UNBOUND | WQ_FREEZABLE,
> +						0, lo->lo_number);

Instead of having to deal with sometimes present workqueues, why
not move the workqueue allocation to loop_add?

> +	/* This is safe: open() is still holding a reference. */
> +	module_put(THIS_MODULE);

Any reason to move the module_put here insted of keeping it at the
end of the function as the old code did?

Tetsuo Handa Dec. 30, 2021, 10:52 a.m. UTC | #2

On 2021/12/30 2:29, Christoph Hellwig wrote:
> The subject line is missing the loop: prefix.

Oops.

> 
> This approach looks very reasonable, but a few comments below:

I prefer task_work approach if it is safe, for it can call destroy_workqueue()
without global locks held. Note that __loop_clear_fd() is also called from
ioctl(LOOP_CLR_FD), and creating locking dependency chain for both with and without
global locks held looks stupid. If we can do without global locks held, why bother
to do with global locks held?

> 
>> -	lo->workqueue = alloc_workqueue("loop%d",
>> -					WQ_UNBOUND | WQ_FREEZABLE,
>> -					0,
>> -					lo->lo_number);
>> +	if (!lo->workqueue)
>> +		lo->workqueue = alloc_workqueue("loop%d",
>> +						WQ_UNBOUND | WQ_FREEZABLE,
>> +						0, lo->lo_number);
> 
> Instead of having to deal with sometimes present workqueues, why
> not move the workqueue allocation to loop_add?

A bit of worrisome thing is that destroy_workqueue() can be called with
major_names_lock held, for loop_add() may be called as probe function from
blk_request_module(). Some unexpected dependency might bite us in future.

We can avoid destroy_workqueue() from loop_add() if we call alloc_workqueue()
after add_disk() succeeded. But in that case calling alloc_workqueue() from
loop_configure() (which is called without global locks like major_names_lock)
sounds safer.

> 
>> +	/* This is safe: open() is still holding a reference. */
>> +	module_put(THIS_MODULE);
> 
> Any reason to move the module_put here insted of keeping it at the
> end of the function as the old code did?

OK. Two patches shown below. Are these look reasonable?



From 1409a49181efcc474fbae2cf4e60cbc37adf34aa Mon Sep 17 00:00:00 2001
From: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
Date: Thu, 30 Dec 2021 18:41:05 +0900
Subject: [PATCH 1/2] loop: Revert "loop: make autoclear operation asynchronous"

The kernel test robot is reporting that xfstest can fail at

  umount ext2 on xfs
  umount xfs

sequence, for commit 322c4293ecc58110 ("loop: make autoclear operation
asynchronous") broke what commit ("loop: Make explicit loop device
destruction lazy") wanted to achieve.

Although we cannot guarantee that nobody is holding a reference when
"umount xfs" is called, we should try to close a race window opened
by asynchronous autoclear operation.

Reported-by: kernel test robot <oliver.sang@intel.com>
Signed-off-by: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
---
 drivers/block/loop.c | 65 ++++++++++++++++++++------------------------
 drivers/block/loop.h |  1 -
 2 files changed, 29 insertions(+), 37 deletions(-)

diff --git a/drivers/block/loop.c b/drivers/block/loop.c
index b1b05c45c07c..e52a8a5e8cbc 100644
--- a/drivers/block/loop.c
+++ b/drivers/block/loop.c
@@ -1082,7 +1082,7 @@ static int loop_configure(struct loop_device *lo, fmode_t mode,
 	return error;
 }
 
-static void __loop_clr_fd(struct loop_device *lo)
+static void __loop_clr_fd(struct loop_device *lo, bool release)
 {
 	struct file *filp;
 	gfp_t gfp = lo->old_gfp_mask;
@@ -1144,6 +1144,8 @@ static void __loop_clr_fd(struct loop_device *lo)
 	/* let user-space know about this change */
 	kobject_uevent(&disk_to_dev(lo->lo_disk)->kobj, KOBJ_CHANGE);
 	mapping_set_gfp_mask(filp->f_mapping, gfp);
+	/* This is safe: open() is still holding a reference. */
+	module_put(THIS_MODULE);
 	blk_mq_unfreeze_queue(lo->lo_queue);
 
 	disk_force_media_change(lo->lo_disk, DISK_EVENT_MEDIA_CHANGE);
@@ -1151,52 +1153,44 @@ static void __loop_clr_fd(struct loop_device *lo)
 	if (lo->lo_flags & LO_FLAGS_PARTSCAN) {
 		int err;
 
-		mutex_lock(&lo->lo_disk->open_mutex);
+		/*
+		 * open_mutex has been held already in release path, so don't
+		 * acquire it if this function is called in such case.
+		 *
+		 * If the reread partition isn't from release path, lo_refcnt
+		 * must be at least one and it can only become zero when the
+		 * current holder is released.
+		 */
+		if (!release)
+			mutex_lock(&lo->lo_disk->open_mutex);
 		err = bdev_disk_changed(lo->lo_disk, false);
-		mutex_unlock(&lo->lo_disk->open_mutex);
+		if (!release)
+			mutex_unlock(&lo->lo_disk->open_mutex);
 		if (err)
 			pr_warn("%s: partition scan of loop%d failed (rc=%d)\n",
 				__func__, lo->lo_number, err);
 		/* Device is gone, no point in returning error */
 	}
 
+	/*
+	 * lo->lo_state is set to Lo_unbound here after above partscan has
+	 * finished. There cannot be anybody else entering __loop_clr_fd() as
+	 * Lo_rundown state protects us from all the other places trying to
+	 * change the 'lo' device.
+	 */
 	lo->lo_flags = 0;
 	if (!part_shift)
 		lo->lo_disk->flags |= GENHD_FL_NO_PART;
-
-	fput(filp);
-}
-
-static void loop_rundown_completed(struct loop_device *lo)
-{
 	mutex_lock(&lo->lo_mutex);
 	lo->lo_state = Lo_unbound;
 	mutex_unlock(&lo->lo_mutex);
-	module_put(THIS_MODULE);
-}
-
-static void loop_rundown_workfn(struct work_struct *work)
-{
-	struct loop_device *lo = container_of(work, struct loop_device,
-					      rundown_work);
-	struct block_device *bdev = lo->lo_device;
-	struct gendisk *disk = lo->lo_disk;
-
-	__loop_clr_fd(lo);
-	kobject_put(&bdev->bd_device.kobj);
-	module_put(disk->fops->owner);
-	loop_rundown_completed(lo);
-}
 
-static void loop_schedule_rundown(struct loop_device *lo)
-{
-	struct block_device *bdev = lo->lo_device;
-	struct gendisk *disk = lo->lo_disk;
-
-	__module_get(disk->fops->owner);
-	kobject_get(&bdev->bd_device.kobj);
-	INIT_WORK(&lo->rundown_work, loop_rundown_workfn);
-	queue_work(system_long_wq, &lo->rundown_work);
+	/*
+	 * Need not hold lo_mutex to fput backing file. Calling fput holding
+	 * lo_mutex triggers a circular lock dependency possibility warning as
+	 * fput can take open_mutex which is usually taken before lo_mutex.
+	 */
+	fput(filp);
 }
 
 static int loop_clr_fd(struct loop_device *lo)
@@ -1228,8 +1222,7 @@ static int loop_clr_fd(struct loop_device *lo)
 	lo->lo_state = Lo_rundown;
 	mutex_unlock(&lo->lo_mutex);
 
-	__loop_clr_fd(lo);
-	loop_rundown_completed(lo);
+	__loop_clr_fd(lo, false);
 	return 0;
 }
 
@@ -1754,7 +1747,7 @@ static void lo_release(struct gendisk *disk, fmode_t mode)
 		 * In autoclear mode, stop the loop thread
 		 * and remove configuration after last close.
 		 */
-		loop_schedule_rundown(lo);
+		__loop_clr_fd(lo, true);
 		return;
 	} else if (lo->lo_state == Lo_bound) {
 		/*
diff --git a/drivers/block/loop.h b/drivers/block/loop.h
index 918a7a2dc025..082d4b6bfc6a 100644
--- a/drivers/block/loop.h
+++ b/drivers/block/loop.h
@@ -56,7 +56,6 @@ struct loop_device {
 	struct gendisk		*lo_disk;
 	struct mutex		lo_mutex;
 	bool			idr_visible;
-	struct work_struct      rundown_work;
 };
 
 struct loop_cmd {

Christoph Hellwig Jan. 3, 2022, 8:33 a.m. UTC | #3

On Thu, Dec 30, 2021 at 07:52:34PM +0900, Tetsuo Handa wrote:
> > Instead of having to deal with sometimes present workqueues, why
> > not move the workqueue allocation to loop_add?
> 
> A bit of worrisome thing is that destroy_workqueue() can be called with
> major_names_lock held, for loop_add() may be called as probe function from
> blk_request_module(). Some unexpected dependency might bite us in future.
> 
> We can avoid destroy_workqueue() from loop_add() if we call alloc_workqueue()
> after add_disk() succeeded. But in that case calling alloc_workqueue() from
> loop_configure() (which is called without global locks like major_names_lock)
> sounds safer.

Ok.

> OK. Two patches shown below. Are these look reasonable?

They do look reasonable to me based on a quick glance, but please post
them one patch per mail in a separate thread for proper review.

Jan Stancek Jan. 5, 2022, 6:02 a.m. UTC | #4

On Thu, Dec 30, 2021 at 07:52:34PM +0900, Tetsuo Handa wrote:
>OK. Two patches shown below. Are these look reasonable?
>
>
>
>>From 1409a49181efcc474fbae2cf4e60cbc37adf34aa Mon Sep 17 00:00:00 2001
>From: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
>Date: Thu, 30 Dec 2021 18:41:05 +0900
>Subject: [PATCH 1/2] loop: Revert "loop: make autoclear operation asynchronous"
>

Thanks, the revert fixes failures we saw recently in LTP tests,
which do mount/umount in close succession:

# for i in `seq 1 2`;do mount -t iso9660 -o loop /root/isofs.iso /mnt/isofs; umount /mnt/isofs/; done
mount: /mnt/isofs: WARNING: source write-protected, mounted read-only.
mount: /mnt/isofs: wrong fs type, bad option, bad superblock on /dev/loop0, missing codepage or helper program, or other error.
umount: /mnt/isofs/: not mounted.

>The kernel test robot is reporting that xfstest can fail at
>
>  umount ext2 on xfs
>  umount xfs
>
>sequence, for commit 322c4293ecc58110 ("loop: make autoclear operation
>asynchronous") broke what commit ("loop: Make explicit loop device
>destruction lazy") wanted to achieve.
>
>Although we cannot guarantee that nobody is holding a reference when
>"umount xfs" is called, we should try to close a race window opened
>by asynchronous autoclear operation.
>
>Reported-by: kernel test robot <oliver.sang@intel.com>
>Signed-off-by: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
>---
> drivers/block/loop.c | 65 ++++++++++++++++++++------------------------
> drivers/block/loop.h |  1 -
> 2 files changed, 29 insertions(+), 37 deletions(-)
>
>diff --git a/drivers/block/loop.c b/drivers/block/loop.c
>index b1b05c45c07c..e52a8a5e8cbc 100644
>--- a/drivers/block/loop.c
>+++ b/drivers/block/loop.c
>@@ -1082,7 +1082,7 @@ static int loop_configure(struct loop_device *lo, fmode_t mode,
> 	return error;
> }
>
>-static void __loop_clr_fd(struct loop_device *lo)
>+static void __loop_clr_fd(struct loop_device *lo, bool release)
> {
> 	struct file *filp;
> 	gfp_t gfp = lo->old_gfp_mask;
>@@ -1144,6 +1144,8 @@ static void __loop_clr_fd(struct loop_device *lo)
> 	/* let user-space know about this change */
> 	kobject_uevent(&disk_to_dev(lo->lo_disk)->kobj, KOBJ_CHANGE);
> 	mapping_set_gfp_mask(filp->f_mapping, gfp);
>+	/* This is safe: open() is still holding a reference. */
>+	module_put(THIS_MODULE);
> 	blk_mq_unfreeze_queue(lo->lo_queue);
>
> 	disk_force_media_change(lo->lo_disk, DISK_EVENT_MEDIA_CHANGE);
>@@ -1151,52 +1153,44 @@ static void __loop_clr_fd(struct loop_device *lo)
> 	if (lo->lo_flags & LO_FLAGS_PARTSCAN) {
> 		int err;
>
>-		mutex_lock(&lo->lo_disk->open_mutex);
>+		/*
>+		 * open_mutex has been held already in release path, so don't
>+		 * acquire it if this function is called in such case.
>+		 *
>+		 * If the reread partition isn't from release path, lo_refcnt
>+		 * must be at least one and it can only become zero when the
>+		 * current holder is released.
>+		 */
>+		if (!release)
>+			mutex_lock(&lo->lo_disk->open_mutex);
> 		err = bdev_disk_changed(lo->lo_disk, false);
>-		mutex_unlock(&lo->lo_disk->open_mutex);
>+		if (!release)
>+			mutex_unlock(&lo->lo_disk->open_mutex);
> 		if (err)
> 			pr_warn("%s: partition scan of loop%d failed (rc=%d)\n",
> 				__func__, lo->lo_number, err);
> 		/* Device is gone, no point in returning error */
> 	}
>
>+	/*
>+	 * lo->lo_state is set to Lo_unbound here after above partscan has
>+	 * finished. There cannot be anybody else entering __loop_clr_fd() as
>+	 * Lo_rundown state protects us from all the other places trying to
>+	 * change the 'lo' device.
>+	 */
> 	lo->lo_flags = 0;
> 	if (!part_shift)
> 		lo->lo_disk->flags |= GENHD_FL_NO_PART;
>-
>-	fput(filp);
>-}
>-
>-static void loop_rundown_completed(struct loop_device *lo)
>-{
> 	mutex_lock(&lo->lo_mutex);
> 	lo->lo_state = Lo_unbound;
> 	mutex_unlock(&lo->lo_mutex);
>-	module_put(THIS_MODULE);
>-}
>-
>-static void loop_rundown_workfn(struct work_struct *work)
>-{
>-	struct loop_device *lo = container_of(work, struct loop_device,
>-					      rundown_work);
>-	struct block_device *bdev = lo->lo_device;
>-	struct gendisk *disk = lo->lo_disk;
>-
>-	__loop_clr_fd(lo);
>-	kobject_put(&bdev->bd_device.kobj);
>-	module_put(disk->fops->owner);
>-	loop_rundown_completed(lo);
>-}
>
>-static void loop_schedule_rundown(struct loop_device *lo)
>-{
>-	struct block_device *bdev = lo->lo_device;
>-	struct gendisk *disk = lo->lo_disk;
>-
>-	__module_get(disk->fops->owner);
>-	kobject_get(&bdev->bd_device.kobj);
>-	INIT_WORK(&lo->rundown_work, loop_rundown_workfn);
>-	queue_work(system_long_wq, &lo->rundown_work);
>+	/*
>+	 * Need not hold lo_mutex to fput backing file. Calling fput holding
>+	 * lo_mutex triggers a circular lock dependency possibility warning as
>+	 * fput can take open_mutex which is usually taken before lo_mutex.
>+	 */
>+	fput(filp);
> }
>
> static int loop_clr_fd(struct loop_device *lo)
>@@ -1228,8 +1222,7 @@ static int loop_clr_fd(struct loop_device *lo)
> 	lo->lo_state = Lo_rundown;
> 	mutex_unlock(&lo->lo_mutex);
>
>-	__loop_clr_fd(lo);
>-	loop_rundown_completed(lo);
>+	__loop_clr_fd(lo, false);
> 	return 0;
> }
>
>@@ -1754,7 +1747,7 @@ static void lo_release(struct gendisk *disk, fmode_t mode)
> 		 * In autoclear mode, stop the loop thread
> 		 * and remove configuration after last close.
> 		 */
>-		loop_schedule_rundown(lo);
>+		__loop_clr_fd(lo, true);
> 		return;
> 	} else if (lo->lo_state == Lo_bound) {
> 		/*
>diff --git a/drivers/block/loop.h b/drivers/block/loop.h
>index 918a7a2dc025..082d4b6bfc6a 100644
>--- a/drivers/block/loop.h
>+++ b/drivers/block/loop.h
>@@ -56,7 +56,6 @@ struct loop_device {
> 	struct gendisk		*lo_disk;
> 	struct mutex		lo_mutex;
> 	bool			idr_visible;
>-	struct work_struct      rundown_work;
> };
>
> struct loop_cmd {
>-- 
>2.32.0
>

Tetsuo Handa Jan. 5, 2022, 6:10 a.m. UTC | #5

On 2022/01/05 15:02, Jan Stancek wrote:
> On Thu, Dec 30, 2021 at 07:52:34PM +0900, Tetsuo Handa wrote:
>> OK. Two patches shown below. Are these look reasonable?
>>
>>
>>
>>> From 1409a49181efcc474fbae2cf4e60cbc37adf34aa Mon Sep 17 00:00:00 2001
>> From: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
>> Date: Thu, 30 Dec 2021 18:41:05 +0900
>> Subject: [PATCH 1/2] loop: Revert "loop: make autoclear operation asynchronous"
>>
> 
> Thanks, the revert fixes failures we saw recently in LTP tests,
> which do mount/umount in close succession:
> 
> # for i in `seq 1 2`;do mount -t iso9660 -o loop /root/isofs.iso /mnt/isofs; umount /mnt/isofs/; done
> mount: /mnt/isofs: WARNING: source write-protected, mounted read-only.
> mount: /mnt/isofs: wrong fs type, bad option, bad superblock on /dev/loop0, missing codepage or helper program, or other error.
> umount: /mnt/isofs/: not mounted.
> 

I'm waiting for Jens to come back to
https://lkml.kernel.org/r/c205dcd2-db55-a35c-e2ef-20193b5ac0da@i-love.sakura.ne.jp .

Christoph Hellwig Jan. 20, 2022, 8:17 a.m. UTC | #6

On Wed, Jan 05, 2022 at 03:10:23PM +0900, Tetsuo Handa wrote:
> On 2022/01/05 15:02, Jan Stancek wrote:
> > On Thu, Dec 30, 2021 at 07:52:34PM +0900, Tetsuo Handa wrote:
> >> OK. Two patches shown below. Are these look reasonable?
> >>
> >>
> >>
> >>> From 1409a49181efcc474fbae2cf4e60cbc37adf34aa Mon Sep 17 00:00:00 2001
> >> From: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
> >> Date: Thu, 30 Dec 2021 18:41:05 +0900
> >> Subject: [PATCH 1/2] loop: Revert "loop: make autoclear operation asynchronous"
> >>
> > 
> > Thanks, the revert fixes failures we saw recently in LTP tests,
> > which do mount/umount in close succession:
> > 
> > # for i in `seq 1 2`;do mount -t iso9660 -o loop /root/isofs.iso /mnt/isofs; umount /mnt/isofs/; done
> > mount: /mnt/isofs: WARNING: source write-protected, mounted read-only.
> > mount: /mnt/isofs: wrong fs type, bad option, bad superblock on /dev/loop0, missing codepage or helper program, or other error.
> > umount: /mnt/isofs/: not mounted.
> > 
> 
> I'm waiting for Jens to come back to
> https://lkml.kernel.org/r/c205dcd2-db55-a35c-e2ef-20193b5ac0da@i-love.sakura.ne.jp .

So I think the version in this thread is what we should merge.

On top of that we should:

 - remove the probe mechanism (patch already sent)
 - stop taking open_mutex in del_gendisk and bdev_disk_changed (I have
   a series for that now)

make autoclear operation synchronous again

Commit Message

Comments

Patch