
[2/2] loop: use task_work for autoclear operation

Message ID 9eff2034-2f32-54a3-e476-d0f609ab49c0@i-love.sakura.ne.jp (mailing list archive)
State New, archived
Series [1/2] block: export task_work_add()

Commit Message

Tetsuo Handa Dec. 22, 2021, 3:27 p.m. UTC
The kernel test robot is reporting that xfstests can fail at the

  umount ext2 on xfs
  umount xfs

sequence, because commit 322c4293ecc58110 ("loop: make autoclear operation
asynchronous") broke what commit ("loop: Make explicit loop device
destruction lazy") wanted to achieve.

Although we cannot guarantee that nobody is holding a reference when
"umount xfs" is called, we should try to close the race window opened
by the asynchronous autoclear operation.

Try to make the autoclear operation upon close() synchronous, by calling
__loop_clr_fd() from the current thread's task work rather than from a WQ thread.

Reported-by: kernel test robot <oliver.sang@intel.com>
Signed-off-by: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
---
 drivers/block/loop.c | 25 ++++++++++++++++++++-----
 drivers/block/loop.h |  5 ++++-
 2 files changed, 24 insertions(+), 6 deletions(-)
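
For illustration, a minimal userspace sketch of the window being closed here,
assuming a free /dev/loop0 and a backing image that lives on the xfs mount
(the paths and the explicit LO_FLAGS_AUTOCLEAR setup are assumptions made for
the sketch, not taken from the report):

  #include <fcntl.h>
  #include <stdio.h>
  #include <string.h>
  #include <unistd.h>
  #include <sys/ioctl.h>
  #include <sys/mount.h>
  #include <linux/loop.h>

  int main(void)
  {
          /* The backing image lives on the xfs filesystem we unmount below. */
          int backing_fd = open("/mnt/xfs/ext2.img", O_RDWR);
          int loop_fd = open("/dev/loop0", O_RDWR);
          struct loop_info64 info;

          /* Attach the backing file and request autoclear on the last close. */
          ioctl(loop_fd, LOOP_SET_FD, backing_fd);
          memset(&info, 0, sizeof(info));
          info.lo_flags = LO_FLAGS_AUTOCLEAR;
          ioctl(loop_fd, LOOP_SET_STATUS64, &info);
          close(backing_fd);

          /* ... mount /dev/loop0, use the ext2 filesystem, unmount it ... */

          /*
           * Last close: autoclear detaches the backing file. If the detach is
           * asynchronous, the backing file may still be open when umount()
           * runs, so umount() can fail with EBUSY. Running __loop_clr_fd()
           * from task work completes the detach before close() returns.
           */
          close(loop_fd);
          if (umount("/mnt/xfs"))
                  perror("umount /mnt/xfs");
          return 0;
  }

With the change, the loop driver's own reference to the backing file is gone
by the time close() returns to userspace, so the final umount() no longer
trips over it.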

Comments

Jens Axboe Dec. 22, 2021, 3:56 p.m. UTC | #1
On 12/22/21 8:27 AM, Tetsuo Handa wrote:
> The kernel test robot is reporting that xfstests can fail at the
> 
>   umount ext2 on xfs
>   umount xfs
> 
> sequence, because commit 322c4293ecc58110 ("loop: make autoclear operation
> asynchronous") broke what commit ("loop: Make explicit loop device
> destruction lazy") wanted to achieve.
> 
> Although we cannot guarantee that nobody is holding a reference when
> "umount xfs" is called, we should try to close the race window opened
> by the asynchronous autoclear operation.
> 
> Try to make the autoclear operation upon close() synchronous, by calling
> __loop_clr_fd() from the current thread's task work rather than from a WQ thread.

Doesn't this potentially race with fput?
Tetsuo Handa Dec. 23, 2021, 7:01 a.m. UTC | #2
On 2021/12/23 0:56, Jens Axboe wrote:
> On 12/22/21 8:27 AM, Tetsuo Handa wrote:
>> The kernel test robot is reporting that xfstests can fail at the
>>
>>   umount ext2 on xfs
>>   umount xfs
>>
>> sequence, because commit 322c4293ecc58110 ("loop: make autoclear operation
>> asynchronous") broke what commit ("loop: Make explicit loop device
>> destruction lazy") wanted to achieve.
>>
>> Although we cannot guarantee that nobody is holding a reference when
>> "umount xfs" is called, we should try to close the race window opened
>> by the asynchronous autoclear operation.
>>
>> Try to make the autoclear operation upon close() synchronous, by calling
>> __loop_clr_fd() from the current thread's task work rather than from a WQ thread.
> 
> Doesn't this potentially race with fput?
> 

What race?

loop_schedule_rundown() is called from lo_release() from blkdev_put() from
blkdev_close() from __fput() from task_work_run(). And loop_schedule_rundown()
calls kobject_get(&bdev->bd_device.kobj) before put_device() (from
blkdev_put_no_open() from blkdev_put() from blkdev_close() from __fput() from
task_work_run()) calls kobject_put(&bdev->bd_device.kobj).
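
In other words, the deferred rundown pins the object it will touch before the
caller's own reference can go away. A rough generic sketch of that pattern,
using a kref in place of the kobject reference the driver actually takes (the
names here are illustrative, not the driver's code):

  #include <linux/kref.h>
  #include <linux/sched.h>
  #include <linux/slab.h>
  #include <linux/task_work.h>

  struct deferred_obj {
          struct kref             ref;
          struct callback_head    cb;
  };

  static void deferred_obj_release(struct kref *ref)
  {
          kfree(container_of(ref, struct deferred_obj, ref));
  }

  static void deferred_rundown(struct callback_head *cb)
  {
          struct deferred_obj *obj = container_of(cb, struct deferred_obj, cb);

          /* ... teardown that dereferences obj goes here ... */
          kref_put(&obj->ref, deferred_obj_release);  /* pairs with the get below */
  }

  static void schedule_deferred_rundown(struct deferred_obj *obj)
  {
          kref_get(&obj->ref);    /* pin obj until the callback has run */
          init_task_work(&obj->cb, deferred_rundown);
          if (task_work_add(current, &obj->cb, TWA_RESUME))
                  kref_put(&obj->ref, deferred_obj_release);  /* caller is exiting */
  }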
Jens Axboe Dec. 23, 2021, 2:13 p.m. UTC | #3
On 12/23/21 12:01 AM, Tetsuo Handa wrote:
> On 2021/12/23 0:56, Jens Axboe wrote:
>> On 12/22/21 8:27 AM, Tetsuo Handa wrote:
>>> The kernel test robot is reporting that xfstests can fail at the
>>>
>>>   umount ext2 on xfs
>>>   umount xfs
>>>
>>> sequence, because commit 322c4293ecc58110 ("loop: make autoclear operation
>>> asynchronous") broke what commit ("loop: Make explicit loop device
>>> destruction lazy") wanted to achieve.
>>>
>>> Although we cannot guarantee that nobody is holding a reference when
>>> "umount xfs" is called, we should try to close the race window opened
>>> by the asynchronous autoclear operation.
>>>
>>> Try to make the autoclear operation upon close() synchronous, by calling
>>> __loop_clr_fd() from the current thread's task work rather than from a WQ thread.
>>
>> Doesn't this potentially race with fput?
>>
> 
> What race?
> 
> loop_schedule_rundown() is called from lo_release() from blkdev_put()
> from blkdev_close() from __fput() from task_work_run(). And
> loop_schedule_rundown() calls kobject_get(&bdev->bd_device.kobj)
> before put_device() (from blkdev_put_no_open() from blkdev_put() from
> blkdev_close() from __fput() from task_work_run()) calls
> kobject_put(&bdev->bd_device.kobj).

Race is the wrong word, I mean ordering issues. If the fput happens
before you queue your task_work, then it'll be run before that.
Tetsuo Handa Dec. 23, 2021, 3:28 p.m. UTC | #4
On 2021/12/23 23:13, Jens Axboe wrote:
> Race is the wrong word, I mean ordering issues. If the fput happens
> before you queue your task_work, then it'll be run before that.

There are two fput() calls related to this patch:
fput(file_of_loop_device) and fput(file_of_backing_file). Which one do you mean?

loop_schedule_rundown() is called from lo_release() from blkdev_put() from
blkdev_close() from __fput() from task_work_run(), after fput(file_of_loop_device)
called task_work_add(TWA_RESUME) for scheduling __fput().

__loop_clr_fd() is called from loop_rundown_callbackfn() from task_work_run(),
after loop_schedule_rundown() called task_work_add(TWA_RESUME) for scheduling
loop_rundown_callbackfn().

fput(file_of_backing_file) is called from __loop_clr_fd(), and
fput(file_of_backing_file) calls task_work_add(TWA_RESUME) for scheduling
__fput().

And __fput() from task_work_run() calls the release callback for file_of_backing_file.

fput(file_of_loop_device) happens before loop_schedule_rundown() calls
task_work_add(TWA_RESUME).

fput(file_of_backing_file) happens after loop_schedule_rundown() called
task_work_add(TWA_RESUME).

The release callback for file_of_backing_file will be completed before the
close() syscall which triggered fput(file_of_loop_device) returns to
userspace, for these are a chain of task_work_add() calls.

As a total sequence, I think there are no ordering issues.
Or am I missing a limitation of task work usage?
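
The argument above relies on task_work_run() draining callbacks that are added
while it is running, so a callback queued with TWA_RESUME from inside another
callback still executes before the task returns to userspace. A simplified
model of that behaviour (not a verbatim copy of kernel/task_work.c, which also
handles task exit and ordering details):

  #include <linux/atomic.h>
  #include <linux/sched.h>
  #include <linux/task_work.h>

  /* Simplified model of task_work_run(); the real code also handles exit. */
  static void task_work_run_sketch(void)
  {
          struct task_struct *task = current;
          struct callback_head *work;

          for (;;) {
                  /* Atomically take everything queued on this task so far. */
                  work = xchg(&task->task_works, NULL);
                  if (!work)
                          break;          /* queue drained: return to userspace */

                  while (work) {
                          struct callback_head *next = work->next;

                          /* May call task_work_add(); the outer loop picks it up. */
                          work->func(work);
                          work = next;
                  }
          }
  }

So the fput(file_of_backing_file) queued from loop_rundown_callbackfn() is
picked up by the same task_work_run() pass, before the close() syscall
returns.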

Patch

diff --git a/drivers/block/loop.c b/drivers/block/loop.c
index b1b05c45c07c..814605e2cbef 100644
--- a/drivers/block/loop.c
+++ b/drivers/block/loop.c
@@ -1175,10 +1175,8 @@  static void loop_rundown_completed(struct loop_device *lo)
 	module_put(THIS_MODULE);
 }
 
-static void loop_rundown_workfn(struct work_struct *work)
+static void loop_rundown_start(struct loop_device *lo)
 {
-	struct loop_device *lo = container_of(work, struct loop_device,
-					      rundown_work);
 	struct block_device *bdev = lo->lo_device;
 	struct gendisk *disk = lo->lo_disk;
 
@@ -1188,6 +1186,18 @@  static void loop_rundown_workfn(struct work_struct *work)
 	loop_rundown_completed(lo);
 }
 
+static void loop_rundown_callbackfn(struct callback_head *callback)
+{
+	loop_rundown_start(container_of(callback, struct loop_device,
+					rundown.callback));
+}
+
+static void loop_rundown_workfn(struct work_struct *work)
+{
+	loop_rundown_start(container_of(work, struct loop_device,
+					rundown.work));
+}
+
 static void loop_schedule_rundown(struct loop_device *lo)
 {
 	struct block_device *bdev = lo->lo_device;
@@ -1195,8 +1205,13 @@  static void loop_schedule_rundown(struct loop_device *lo)
 
 	__module_get(disk->fops->owner);
 	kobject_get(&bdev->bd_device.kobj);
-	INIT_WORK(&lo->rundown_work, loop_rundown_workfn);
-	queue_work(system_long_wq, &lo->rundown_work);
+	if (!(current->flags & PF_KTHREAD)) {
+		init_task_work(&lo->rundown.callback, loop_rundown_callbackfn);
+		if (!task_work_add(current, &lo->rundown.callback, TWA_RESUME))
+			return;
+	}
+	INIT_WORK(&lo->rundown.work, loop_rundown_workfn);
+	queue_work(system_long_wq, &lo->rundown.work);
 }
 
 static int loop_clr_fd(struct loop_device *lo)
diff --git a/drivers/block/loop.h b/drivers/block/loop.h
index 918a7a2dc025..596472f9cde3 100644
--- a/drivers/block/loop.h
+++ b/drivers/block/loop.h
@@ -56,7 +56,10 @@  struct loop_device {
 	struct gendisk		*lo_disk;
 	struct mutex		lo_mutex;
 	bool			idr_visible;
-	struct work_struct      rundown_work;
+	union {
+		struct work_struct   work;
+		struct callback_head callback;
+	} rundown;
 };
 
 struct loop_cmd {