Message ID: 9eff2034-2f32-54a3-e476-d0f609ab49c0@i-love.sakura.ne.jp
State: New, archived
Series: [1/2] block: export task_work_add()
On 12/22/21 8:27 AM, Tetsuo Handa wrote:
> The kernel test robot is reporting that xfstest can fail at
>
>   umount ext2 on xfs
>   umount xfs
>
> sequence, for commit 322c4293ecc58110 ("loop: make autoclear operation
> asynchronous") broke what commit ("loop: Make explicit loop device
> destruction lazy") wanted to achieve.
>
> Although we cannot guarantee that nobody is holding a reference when
> "umount xfs" is called, we should try to close a race window opened
> by asynchronous autoclear operation.
>
> Try to make the autoclear operation upon close() synchronous, by calling
> __loop_clr_fd() from current thread's task work rather than a WQ thread.

Doesn't this potentially race with fput?
On 2021/12/23 0:56, Jens Axboe wrote:
> On 12/22/21 8:27 AM, Tetsuo Handa wrote:
>> The kernel test robot is reporting that xfstest can fail at
>>
>>   umount ext2 on xfs
>>   umount xfs
>>
>> sequence, for commit 322c4293ecc58110 ("loop: make autoclear operation
>> asynchronous") broke what commit ("loop: Make explicit loop device
>> destruction lazy") wanted to achieve.
>>
>> Although we cannot guarantee that nobody is holding a reference when
>> "umount xfs" is called, we should try to close a race window opened
>> by asynchronous autoclear operation.
>>
>> Try to make the autoclear operation upon close() synchronous, by calling
>> __loop_clr_fd() from current thread's task work rather than a WQ thread.
>
> Doesn't this potentially race with fput?

What race?

loop_schedule_rundown() is called from lo_release() from blkdev_put()
from blkdev_close() from __fput() from task_work_run(). And
loop_schedule_rundown() calls kobject_get(&bdev->bd_device.kobj)
before put_device() from blkdev_put_no_open() from blkdev_put() from
blkdev_close() from __fput() from task_work_run() calls
kobject_put(&bdev->bd_device.kobj).
On 12/23/21 12:01 AM, Tetsuo Handa wrote:
> On 2021/12/23 0:56, Jens Axboe wrote:
>> On 12/22/21 8:27 AM, Tetsuo Handa wrote:
>>> The kernel test robot is reporting that xfstest can fail at
>>>
>>>   umount ext2 on xfs
>>>   umount xfs
>>>
>>> sequence, for commit 322c4293ecc58110 ("loop: make autoclear operation
>>> asynchronous") broke what commit ("loop: Make explicit loop device
>>> destruction lazy") wanted to achieve.
>>>
>>> Although we cannot guarantee that nobody is holding a reference when
>>> "umount xfs" is called, we should try to close a race window opened
>>> by asynchronous autoclear operation.
>>>
>>> Try to make the autoclear operation upon close() synchronous, by calling
>>> __loop_clr_fd() from current thread's task work rather than a WQ thread.
>>
>> Doesn't this potentially race with fput?
>
> What race?
>
> loop_schedule_rundown() is called from lo_release() from blkdev_put()
> from blkdev_close() from __fput() from task_work_run(). And
> loop_schedule_rundown() calls kobject_get(&bdev->bd_device.kobj)
> before put_device() from blkdev_put_no_open() from blkdev_put() from
> blkdev_close() from __fput() from task_work_run() calls
> kobject_put(&bdev->bd_device.kobj).

Race is the wrong word, I mean ordering issues. If the fput happens
before you queue your task_work, then it'll be run before that.
On 2021/12/23 23:13, Jens Axboe wrote:
> Race is the wrong word, I mean ordering issues. If the fput happens
> before you queue your task_work, then it'll be run before that.

There are two fput() calls related to this patch:
fput(file_of_loop_device) and fput(file_of_backing_file). Which one?

loop_schedule_rundown() is called from lo_release() from blkdev_put()
from blkdev_close() from __fput() from task_work_run(), after
fput(file_of_loop_device) called task_work_add(TWA_RESUME) for
scheduling __fput().

__loop_clr_fd() is called from loop_rundown_callbackfn() from
task_work_run(), after loop_schedule_rundown() called
task_work_add(TWA_RESUME) for scheduling loop_rundown_callbackfn().

fput(file_of_backing_file) is called from __loop_clr_fd(), and
fput(file_of_backing_file) calls task_work_add(TWA_RESUME) for
scheduling __fput(). And __fput() from task_work_run() calls the
release callback for file_of_backing_file.

fput(file_of_loop_device) happens before loop_schedule_rundown()
calls task_work_add(TWA_RESUME). fput(file_of_backing_file) happens
after loop_schedule_rundown() called task_work_add(TWA_RESUME). The
release callback for file_of_backing_file will be completed before
the close() syscall which triggered fput(file_of_loop_device) returns
to userspace, for these are a chain of task_work_add() calls.

As a total sequence, I think there are no ordering issues. Or am I
missing a limitation of task work usage?
diff --git a/drivers/block/loop.c b/drivers/block/loop.c
index b1b05c45c07c..814605e2cbef 100644
--- a/drivers/block/loop.c
+++ b/drivers/block/loop.c
@@ -1175,10 +1175,8 @@ static void loop_rundown_completed(struct loop_device *lo)
 	module_put(THIS_MODULE);
 }
 
-static void loop_rundown_workfn(struct work_struct *work)
+static void loop_rundown_start(struct loop_device *lo)
 {
-	struct loop_device *lo = container_of(work, struct loop_device,
-					      rundown_work);
 	struct block_device *bdev = lo->lo_device;
 	struct gendisk *disk = lo->lo_disk;
 
@@ -1188,6 +1186,18 @@ static void loop_rundown_workfn(struct work_struct *work)
 	loop_rundown_completed(lo);
 }
 
+static void loop_rundown_callbackfn(struct callback_head *callback)
+{
+	loop_rundown_start(container_of(callback, struct loop_device,
+					rundown.callback));
+}
+
+static void loop_rundown_workfn(struct work_struct *work)
+{
+	loop_rundown_start(container_of(work, struct loop_device,
+					rundown.work));
+}
+
 static void loop_schedule_rundown(struct loop_device *lo)
 {
 	struct block_device *bdev = lo->lo_device;
@@ -1195,8 +1205,13 @@ static void loop_schedule_rundown(struct loop_device *lo)
 
 	__module_get(disk->fops->owner);
 	kobject_get(&bdev->bd_device.kobj);
-	INIT_WORK(&lo->rundown_work, loop_rundown_workfn);
-	queue_work(system_long_wq, &lo->rundown_work);
+	if (!(current->flags & PF_KTHREAD)) {
+		init_task_work(&lo->rundown.callback, loop_rundown_callbackfn);
+		if (!task_work_add(current, &lo->rundown.callback, TWA_RESUME))
+			return;
+	}
+	INIT_WORK(&lo->rundown.work, loop_rundown_workfn);
+	queue_work(system_long_wq, &lo->rundown.work);
 }
 
 static int loop_clr_fd(struct loop_device *lo)
diff --git a/drivers/block/loop.h b/drivers/block/loop.h
index 918a7a2dc025..596472f9cde3 100644
--- a/drivers/block/loop.h
+++ b/drivers/block/loop.h
@@ -56,7 +56,10 @@ struct loop_device {
 	struct gendisk		*lo_disk;
 	struct mutex		lo_mutex;
 	bool			idr_visible;
-	struct work_struct      rundown_work;
+	union {
+		struct work_struct	work;
+		struct callback_head	callback;
+	} rundown;
 };
 
 struct loop_cmd {
The kernel test robot is reporting that xfstest can fail at

  umount ext2 on xfs
  umount xfs

sequence, for commit 322c4293ecc58110 ("loop: make autoclear operation
asynchronous") broke what commit ("loop: Make explicit loop device
destruction lazy") wanted to achieve.

Although we cannot guarantee that nobody is holding a reference when
"umount xfs" is called, we should try to close a race window opened
by asynchronous autoclear operation.

Try to make the autoclear operation upon close() synchronous, by calling
__loop_clr_fd() from current thread's task work rather than a WQ thread.

Reported-by: kernel test robot <oliver.sang@intel.com>
Signed-off-by: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
---
 drivers/block/loop.c | 25 ++++++++++++++++++++-----
 drivers/block/loop.h |  5 ++++-
 2 files changed, 24 insertions(+), 6 deletions(-)