Message ID | 20240529200240.133331-1-gulam.mohamed@oracle.com (mailing list archive) |
---|---|
State | New, archived |
Headers | show |
Series | [V3,for-6.10/block] loop: Fix a race between loop detach and loop open | expand |
Hi, 在 2024/05/30 4:02, Gulam Mohamed 写道: > 1. Userspace sends the command "losetup -d" which uses the open() call > to open the device > 2. Kernel receives the ioctl command "LOOP_CLR_FD" which calls the > function loop_clr_fd() > 3. If LOOP_CLR_FD is the first command received at the time, then the > AUTOCLEAR flag is not set and deletion of the > loop device proceeds ahead and scans the partitions (drop/add > partitions) > > if (disk_openers(lo->lo_disk) > 1) { > lo->lo_flags |= LO_FLAGS_AUTOCLEAR; > loop_global_unlock(lo, true); > return 0; > } > > 4. Before scanning partitions, it will check to see if any partition of > the loop device is currently opened > 5. If any partition is opened, then it will return EBUSY: > > if (disk->open_partitions) > return -EBUSY; > 6. So, after receiving the "LOOP_CLR_FD" command and just before the above > check for open_partitions, if any other command > (like blkid) opens any partition of the loop device, then the partition > scan will not proceed and EBUSY is returned as shown in above code > 7. But in "__loop_clr_fd()", this EBUSY error is not propagated > 8. We have noticed that this is causing the partitions of the loop to > remain stale even after the loop device is detached resulting in the > IO errors on the partitions > > Fix: > Re-introduce the lo_open() call to restrict any process to open the loop > device when its being detached > > Test case involves the following two scripts: > > script1.sh: > > while [ 1 ]; > do > losetup -P -f /home/opt/looptest/test10.img > blkid /dev/loop0p1 > done > > script2.sh: > > while [ 1 ]; > do > losetup -d /dev/loop0 > done > > Without fix, the following IO errors have been observed: > > kernel: __loop_clr_fd: partition scan of loop0 failed (rc=-16) > kernel: I/O error, dev loop0, sector 20971392 op 0x0:(READ) flags 0x80700 > phys_seg 1 prio class 0 > kernel: I/O error, dev loop0, sector 108868 op 0x0:(READ) flags 0x0 > phys_seg 1 prio class 0 > kernel: Buffer I/O error on dev loop0p1, logical block 27201, async page > read > > Signed-off-by: Gulam Mohamed <gulam.mohamed@oracle.com> > --- > v3<-v2: > Re-introduced the loop->lo_refcnt to take care of the case where we race > when the Lo_rundown is set after the lo_open() function releases the > lo_mutex lock > > drivers/block/loop.c | 31 ++++++++++++++++++++++++++----- > 1 file changed, 26 insertions(+), 5 deletions(-) > > diff --git a/drivers/block/loop.c b/drivers/block/loop.c > index 28a95fd366fe..60f61bf8dbd1 100644 > --- a/drivers/block/loop.c > +++ b/drivers/block/loop.c > @@ -49,6 +49,7 @@ struct loop_func_table; > > struct loop_device { > int lo_number; > + atomic_t lo_refcnt; > loff_t lo_offset; > loff_t lo_sizelimit; > int lo_flags; > @@ -1242,7 +1243,7 @@ static int loop_clr_fd(struct loop_device *lo) > * <dev>/do something like mkfs/losetup -d <dev> causing the losetup -d > * command to fail with EBUSY. > */ > - if (disk_openers(lo->lo_disk) > 1) { > + if (atomic_read(&lo->lo_refcnt) > 1) { > lo->lo_flags |= LO_FLAGS_AUTOCLEAR; > loop_global_unlock(lo, true); > return 0; > @@ -1717,14 +1718,31 @@ static int lo_compat_ioctl(struct block_device *bdev, blk_mode_t mode, > } > #endif > > -static void lo_release(struct gendisk *disk) > +static int lo_open(struct gendisk *disk, blk_mode_t mode) > { > struct loop_device *lo = disk->private_data; > + int err; > > - if (disk_openers(disk) > 0) > - return; > + err = mutex_lock_killable(&lo->lo_mutex); > + if (err) > + return err; > + > + if (lo->lo_state == Lo_deleting || lo->lo_state == Lo_rundown) > + err = -ENXIO; > + else > + atomic_inc(&lo->lo_refcnt); > + mutex_unlock(&lo->lo_mutex); > + return err; > +} > + > +static void lo_release(struct gendisk *disk) > +{ > + struct loop_device *lo = disk->private_data; > > mutex_lock(&lo->lo_mutex); > + if (atomic_dec_return(&lo->lo_refcnt)) > + goto out_unlock; So, both add, dec and test are inside the lo_mutex, then there is no need to use atomic value. Thanks, Kuai > + > if (lo->lo_state == Lo_bound && (lo->lo_flags & LO_FLAGS_AUTOCLEAR)) { > lo->lo_state = Lo_rundown; > mutex_unlock(&lo->lo_mutex); > @@ -1735,6 +1753,7 @@ static void lo_release(struct gendisk *disk) > __loop_clr_fd(lo, true); > return; > } > +out_unlock: > mutex_unlock(&lo->lo_mutex); > } > > @@ -1752,6 +1771,7 @@ static void lo_free_disk(struct gendisk *disk) > > static const struct block_device_operations lo_fops = { > .owner = THIS_MODULE, > + .open = lo_open, > .release = lo_release, > .ioctl = lo_ioctl, > #ifdef CONFIG_COMPAT > @@ -2064,6 +2084,7 @@ static int loop_add(int i) > */ > if (!part_shift) > set_bit(GD_SUPPRESS_PART_SCAN, &disk->state); > + atomic_set(&lo->lo_refcnt, 0); > mutex_init(&lo->lo_mutex); > lo->lo_number = i; > spin_lock_init(&lo->lo_lock); > @@ -2158,7 +2179,7 @@ static int loop_control_remove(int idx) > ret = mutex_lock_killable(&lo->lo_mutex); > if (ret) > goto mark_visible; > - if (lo->lo_state != Lo_unbound || disk_openers(lo->lo_disk) > 0) { > + if (lo->lo_state != Lo_unbound || atomic_read(&lo->lo_refcnt) > 0) { > mutex_unlock(&lo->lo_mutex); > ret = -EBUSY; > goto mark_visible; >
Hi Kuai, > -----Original Message----- > From: Yu Kuai <yukuai1@huaweicloud.com> > Sent: Thursday, May 30, 2024 7:36 AM > To: Gulam Mohamed <gulam.mohamed@oracle.com>; linux- > block@vger.kernel.org; linux-kernel@vger.kernel.org > Cc: yukuai1@huaweicloud.com; hch@lst.de; axboe@kernel.dk; yukuai (C) > <yukuai3@huawei.com> > Subject: Re: [PATCH V3 for-6.10/block] loop: Fix a race between loop detach > and loop open > > Hi, > > 在 2024/05/30 4:02, Gulam Mohamed 写道: > > 1. Userspace sends the command "losetup -d" which uses the open() call > > to open the device > > 2. Kernel receives the ioctl command "LOOP_CLR_FD" which calls the > > function loop_clr_fd() > > 3. If LOOP_CLR_FD is the first command received at the time, then the > > AUTOCLEAR flag is not set and deletion of the > > loop device proceeds ahead and scans the partitions (drop/add > > partitions) > > > > if (disk_openers(lo->lo_disk) > 1) { > > lo->lo_flags |= LO_FLAGS_AUTOCLEAR; > > loop_global_unlock(lo, true); > > return 0; > > } > > > > 4. Before scanning partitions, it will check to see if any partition of > > the loop device is currently opened > > 5. If any partition is opened, then it will return EBUSY: > > > > if (disk->open_partitions) > > return -EBUSY; > > 6. So, after receiving the "LOOP_CLR_FD" command and just before the > above > > check for open_partitions, if any other command > > (like blkid) opens any partition of the loop device, then the partition > > scan will not proceed and EBUSY is returned as shown in above code > > 7. But in "__loop_clr_fd()", this EBUSY error is not propagated > > 8. We have noticed that this is causing the partitions of the loop to > > remain stale even after the loop device is detached resulting in the > > IO errors on the partitions > > > > Fix: > > Re-introduce the lo_open() call to restrict any process to open the > > loop device when its being detached > > > > Test case involves the following two scripts: > > > > script1.sh: > > > > while [ 1 ]; > > do > > losetup -P -f /home/opt/looptest/test10.img > > blkid /dev/loop0p1 > > done > > > > script2.sh: > > > > while [ 1 ]; > > do > > losetup -d /dev/loop0 > > done > > > > Without fix, the following IO errors have been observed: > > > > kernel: __loop_clr_fd: partition scan of loop0 failed (rc=-16) > > kernel: I/O error, dev loop0, sector 20971392 op 0x0:(READ) flags 0x80700 > > phys_seg 1 prio class 0 > > kernel: I/O error, dev loop0, sector 108868 op 0x0:(READ) flags 0x0 > > phys_seg 1 prio class 0 > > kernel: Buffer I/O error on dev loop0p1, logical block 27201, async page > > read > > > > Signed-off-by: Gulam Mohamed <gulam.mohamed@oracle.com> > > --- > > v3<-v2: > > Re-introduced the loop->lo_refcnt to take care of the case where we > > race when the Lo_rundown is set after the lo_open() function releases > > the lo_mutex lock > > > > drivers/block/loop.c | 31 ++++++++++++++++++++++++++----- > > 1 file changed, 26 insertions(+), 5 deletions(-) > > > > diff --git a/drivers/block/loop.c b/drivers/block/loop.c index > > 28a95fd366fe..60f61bf8dbd1 100644 > > --- a/drivers/block/loop.c > > +++ b/drivers/block/loop.c > > @@ -49,6 +49,7 @@ struct loop_func_table; > > > > struct loop_device { > > int lo_number; > > + atomic_t lo_refcnt; > > loff_t lo_offset; > > loff_t lo_sizelimit; > > int lo_flags; > > @@ -1242,7 +1243,7 @@ static int loop_clr_fd(struct loop_device *lo) > > * <dev>/do something like mkfs/losetup -d <dev> causing the losetup > -d > > * command to fail with EBUSY. > > */ > > - if (disk_openers(lo->lo_disk) > 1) { > > + if (atomic_read(&lo->lo_refcnt) > 1) { > > lo->lo_flags |= LO_FLAGS_AUTOCLEAR; > > loop_global_unlock(lo, true); > > return 0; > > @@ -1717,14 +1718,31 @@ static int lo_compat_ioctl(struct block_device > *bdev, blk_mode_t mode, > > } > > #endif > > > > -static void lo_release(struct gendisk *disk) > > +static int lo_open(struct gendisk *disk, blk_mode_t mode) > > { > > struct loop_device *lo = disk->private_data; > > + int err; > > > > - if (disk_openers(disk) > 0) > > - return; > > + err = mutex_lock_killable(&lo->lo_mutex); > > + if (err) > > + return err; > > + > > + if (lo->lo_state == Lo_deleting || lo->lo_state == Lo_rundown) > > + err = -ENXIO; > > + else > > + atomic_inc(&lo->lo_refcnt); > > + mutex_unlock(&lo->lo_mutex); > > + return err; > > +} > > + > > +static void lo_release(struct gendisk *disk) { > > + struct loop_device *lo = disk->private_data; > > > > mutex_lock(&lo->lo_mutex); > > + if (atomic_dec_return(&lo->lo_refcnt)) > > + goto out_unlock; > > So, both add, dec and test are inside the lo_mutex, then there is no need to > use atomic value. Yes, you are correct. Will change this to "int" in next version. Thanks & Regards, Gulam Mohamed. > > Thanks, > Kuai > > + > > if (lo->lo_state == Lo_bound && (lo->lo_flags & > LO_FLAGS_AUTOCLEAR)) { > > lo->lo_state = Lo_rundown; > > mutex_unlock(&lo->lo_mutex); > > @@ -1735,6 +1753,7 @@ static void lo_release(struct gendisk *disk) > > __loop_clr_fd(lo, true); > > return; > > } > > +out_unlock: > > mutex_unlock(&lo->lo_mutex); > > } > > > > @@ -1752,6 +1771,7 @@ static void lo_free_disk(struct gendisk *disk) > > > > static const struct block_device_operations lo_fops = { > > .owner = THIS_MODULE, > > + .open = lo_open, > > .release = lo_release, > > .ioctl = lo_ioctl, > > #ifdef CONFIG_COMPAT > > @@ -2064,6 +2084,7 @@ static int loop_add(int i) > > */ > > if (!part_shift) > > set_bit(GD_SUPPRESS_PART_SCAN, &disk->state); > > + atomic_set(&lo->lo_refcnt, 0); > > mutex_init(&lo->lo_mutex); > > lo->lo_number = i; > > spin_lock_init(&lo->lo_lock); > > @@ -2158,7 +2179,7 @@ static int loop_control_remove(int idx) > > ret = mutex_lock_killable(&lo->lo_mutex); > > if (ret) > > goto mark_visible; > > - if (lo->lo_state != Lo_unbound || disk_openers(lo->lo_disk) > 0) { > > + if (lo->lo_state != Lo_unbound || atomic_read(&lo->lo_refcnt) > 0) { > > mutex_unlock(&lo->lo_mutex); > > ret = -EBUSY; > > goto mark_visible; > >
On Wed, May 29, 2024 at 08:02:40PM +0000, Gulam Mohamed wrote: > Test case involves the following two scripts: > > script1.sh: > > while [ 1 ]; > do > losetup -P -f /home/opt/looptest/test10.img > blkid /dev/loop0p1 > done > > script2.sh: > > while [ 1 ]; > do > losetup -d /dev/loop0 > done When just running these in the background, I just get spam of losetup errors. Maybe you can turn this into a proper blktests test case with a sensible timeout? A for the fix: I suspect it is bette to simply always defer the actual work of disconnecting the backing device, as that avoid the race entirely, and simplifies the code in nbd by removing special cases. Something like this: diff --git a/drivers/block/loop.c b/drivers/block/loop.c index 93780f41646b75..c2238c1e2ad68d 100644 --- a/drivers/block/loop.c +++ b/drivers/block/loop.c @@ -1131,7 +1131,7 @@ static int loop_configure(struct loop_device *lo, blk_mode_t mode, return error; } -static void __loop_clr_fd(struct loop_device *lo, bool release) +static void __loop_clr_fd(struct loop_device *lo) { struct file *filp; gfp_t gfp = lo->old_gfp_mask; @@ -1139,14 +1139,6 @@ static void __loop_clr_fd(struct loop_device *lo, bool release) if (test_bit(QUEUE_FLAG_WC, &lo->lo_queue->queue_flags)) blk_queue_write_cache(lo->lo_queue, false, false); - /* - * Freeze the request queue when unbinding on a live file descriptor and - * thus an open device. When called from ->release we are guaranteed - * that there is no I/O in progress already. - */ - if (!release) - blk_mq_freeze_queue(lo->lo_queue); - spin_lock_irq(&lo->lo_lock); filp = lo->lo_backing_file; lo->lo_backing_file = NULL; @@ -1164,8 +1156,6 @@ static void __loop_clr_fd(struct loop_device *lo, bool release) mapping_set_gfp_mask(filp->f_mapping, gfp); /* This is safe: open() is still holding a reference. */ module_put(THIS_MODULE); - if (!release) - blk_mq_unfreeze_queue(lo->lo_queue); disk_force_media_change(lo->lo_disk); @@ -1180,11 +1170,7 @@ static void __loop_clr_fd(struct loop_device *lo, bool release) * must be at least one and it can only become zero when the * current holder is released. */ - if (!release) - mutex_lock(&lo->lo_disk->open_mutex); err = bdev_disk_changed(lo->lo_disk, false); - if (!release) - mutex_unlock(&lo->lo_disk->open_mutex); if (err) pr_warn("%s: partition scan of loop%d failed (rc=%d)\n", __func__, lo->lo_number, err); @@ -1232,25 +1218,16 @@ static int loop_clr_fd(struct loop_device *lo) loop_global_unlock(lo, true); return -ENXIO; } + /* - * If we've explicitly asked to tear down the loop device, - * and it has an elevated reference count, set it for auto-teardown when - * the last reference goes away. This stops $!~#$@ udev from - * preventing teardown because it decided that it needs to run blkid on - * the loopback device whenever they appear. xfstests is notorious for - * failing tests because blkid via udev races with a losetup - * <dev>/do something like mkfs/losetup -d <dev> causing the losetup -d - * command to fail with EBUSY. + * Mark the device for removing the backing device on last close. + * If we are the only opener, also switch the state to roundown here to + * prevent new openers from coming in. */ - if (disk_openers(lo->lo_disk) > 1) { - lo->lo_flags |= LO_FLAGS_AUTOCLEAR; - loop_global_unlock(lo, true); - return 0; - } - lo->lo_state = Lo_rundown; + lo->lo_flags |= LO_FLAGS_AUTOCLEAR; + if (disk_openers(lo->lo_disk) == 1) + lo->lo_state = Lo_rundown; loop_global_unlock(lo, true); - - __loop_clr_fd(lo, false); return 0; } @@ -1720,22 +1697,24 @@ static int lo_compat_ioctl(struct block_device *bdev, blk_mode_t mode, static void lo_release(struct gendisk *disk) { struct loop_device *lo = disk->private_data; + bool need_clear; if (disk_openers(disk) > 0) return; + /* + * Clear the backing device information if this is the last close of + * a device that's been marked for auto clear, or on which LOOP_CLR_FD + * has been called. + */ mutex_lock(&lo->lo_mutex); - if (lo->lo_state == Lo_bound && (lo->lo_flags & LO_FLAGS_AUTOCLEAR)) { + if (lo->lo_state == Lo_bound && (lo->lo_flags & LO_FLAGS_AUTOCLEAR)) lo->lo_state = Lo_rundown; - mutex_unlock(&lo->lo_mutex); - /* - * In autoclear mode, stop the loop thread - * and remove configuration after last close. - */ - __loop_clr_fd(lo, true); - return; - } + need_clear = (lo->lo_state == Lo_rundown); mutex_unlock(&lo->lo_mutex); + + if (need_clear) + __loop_clr_fd(lo); } static void lo_free_disk(struct gendisk *disk)
Hi , > -----Original Message----- > From: Christoph Hellwig <hch@lst.de> > Sent: Friday, May 31, 2024 5:27 PM > To: Gulam Mohamed <gulam.mohamed@oracle.com> > Cc: linux-block@vger.kernel.org; linux-kernel@vger.kernel.org; > yukuai1@huaweicloud.com; hch@lst.de; axboe@kernel.dk > Subject: Re: [PATCH V3 for-6.10/block] loop: Fix a race between loop detach > and loop open > > On Wed, May 29, 2024 at 08:02:40PM +0000, Gulam Mohamed wrote: > > Test case involves the following two scripts: > > > > script1.sh: > > > > while [ 1 ]; > > do > > losetup -P -f /home/opt/looptest/test10.img > > blkid /dev/loop0p1 > > done > > > > script2.sh: > > > > while [ 1 ]; > > do > > losetup -d /dev/loop0 > > done > > When just running these in the background, I just get spam of losetup errors. > Maybe you can turn this into a proper blktests test case with a sensible > timeout? > Yes, the blktests test case for this is already in its final stage of the review > A for the fix: I suspect it is bette to simply always defer the actual work of > disconnecting the backing device, as that avoid the race entirely, and > simplifies the code in nbd by removing special cases. Something like this: I agree. This looks to be a good solution. Will implement this and send V4 for review Thanks & Regards, Gulam Mohamed. > > > diff --git a/drivers/block/loop.c b/drivers/block/loop.c index > 93780f41646b75..c2238c1e2ad68d 100644 > --- a/drivers/block/loop.c > +++ b/drivers/block/loop.c > @@ -1131,7 +1131,7 @@ static int loop_configure(struct loop_device *lo, > blk_mode_t mode, > return error; > } > > -static void __loop_clr_fd(struct loop_device *lo, bool release) > +static void __loop_clr_fd(struct loop_device *lo) > { > struct file *filp; > gfp_t gfp = lo->old_gfp_mask; > @@ -1139,14 +1139,6 @@ static void __loop_clr_fd(struct loop_device *lo, > bool release) > if (test_bit(QUEUE_FLAG_WC, &lo->lo_queue->queue_flags)) > blk_queue_write_cache(lo->lo_queue, false, false); > > - /* > - * Freeze the request queue when unbinding on a live file descriptor > and > - * thus an open device. When called from ->release we are > guaranteed > - * that there is no I/O in progress already. > - */ > - if (!release) > - blk_mq_freeze_queue(lo->lo_queue); > - > spin_lock_irq(&lo->lo_lock); > filp = lo->lo_backing_file; > lo->lo_backing_file = NULL; > @@ -1164,8 +1156,6 @@ static void __loop_clr_fd(struct loop_device *lo, > bool release) > mapping_set_gfp_mask(filp->f_mapping, gfp); > /* This is safe: open() is still holding a reference. */ > module_put(THIS_MODULE); > - if (!release) > - blk_mq_unfreeze_queue(lo->lo_queue); > > disk_force_media_change(lo->lo_disk); > > @@ -1180,11 +1170,7 @@ static void __loop_clr_fd(struct loop_device *lo, > bool release) > * must be at least one and it can only become zero when the > * current holder is released. > */ > - if (!release) > - mutex_lock(&lo->lo_disk->open_mutex); > err = bdev_disk_changed(lo->lo_disk, false); > - if (!release) > - mutex_unlock(&lo->lo_disk->open_mutex); > if (err) > pr_warn("%s: partition scan of loop%d failed > (rc=%d)\n", > __func__, lo->lo_number, err); > @@ -1232,25 +1218,16 @@ static int loop_clr_fd(struct loop_device *lo) > loop_global_unlock(lo, true); > return -ENXIO; > } > + > /* > - * If we've explicitly asked to tear down the loop device, > - * and it has an elevated reference count, set it for auto-teardown > when > - * the last reference goes away. This stops $!~#$@ udev from > - * preventing teardown because it decided that it needs to run blkid > on > - * the loopback device whenever they appear. xfstests is notorious for > - * failing tests because blkid via udev races with a losetup > - * <dev>/do something like mkfs/losetup -d <dev> causing the losetup > -d > - * command to fail with EBUSY. > + * Mark the device for removing the backing device on last close. > + * If we are the only opener, also switch the state to roundown here > to > + * prevent new openers from coming in. > */ > - if (disk_openers(lo->lo_disk) > 1) { > - lo->lo_flags |= LO_FLAGS_AUTOCLEAR; > - loop_global_unlock(lo, true); > - return 0; > - } > - lo->lo_state = Lo_rundown; > + lo->lo_flags |= LO_FLAGS_AUTOCLEAR; > + if (disk_openers(lo->lo_disk) == 1) > + lo->lo_state = Lo_rundown; > loop_global_unlock(lo, true); > - > - __loop_clr_fd(lo, false); > return 0; > } > > @@ -1720,22 +1697,24 @@ static int lo_compat_ioctl(struct block_device > *bdev, blk_mode_t mode, static void lo_release(struct gendisk *disk) { > struct loop_device *lo = disk->private_data; > + bool need_clear; > > if (disk_openers(disk) > 0) > return; > > + /* > + * Clear the backing device information if this is the last close of > + * a device that's been marked for auto clear, or on which > LOOP_CLR_FD > + * has been called. > + */ > mutex_lock(&lo->lo_mutex); > - if (lo->lo_state == Lo_bound && (lo->lo_flags & > LO_FLAGS_AUTOCLEAR)) { > + if (lo->lo_state == Lo_bound && (lo->lo_flags & > LO_FLAGS_AUTOCLEAR)) > lo->lo_state = Lo_rundown; > - mutex_unlock(&lo->lo_mutex); > - /* > - * In autoclear mode, stop the loop thread > - * and remove configuration after last close. > - */ > - __loop_clr_fd(lo, true); > - return; > - } > + need_clear = (lo->lo_state == Lo_rundown); > mutex_unlock(&lo->lo_mutex); > + > + if (need_clear) > + __loop_clr_fd(lo); > } > > static void lo_free_disk(struct gendisk *disk)
diff --git a/drivers/block/loop.c b/drivers/block/loop.c index 28a95fd366fe..60f61bf8dbd1 100644 --- a/drivers/block/loop.c +++ b/drivers/block/loop.c @@ -49,6 +49,7 @@ struct loop_func_table; struct loop_device { int lo_number; + atomic_t lo_refcnt; loff_t lo_offset; loff_t lo_sizelimit; int lo_flags; @@ -1242,7 +1243,7 @@ static int loop_clr_fd(struct loop_device *lo) * <dev>/do something like mkfs/losetup -d <dev> causing the losetup -d * command to fail with EBUSY. */ - if (disk_openers(lo->lo_disk) > 1) { + if (atomic_read(&lo->lo_refcnt) > 1) { lo->lo_flags |= LO_FLAGS_AUTOCLEAR; loop_global_unlock(lo, true); return 0; @@ -1717,14 +1718,31 @@ static int lo_compat_ioctl(struct block_device *bdev, blk_mode_t mode, } #endif -static void lo_release(struct gendisk *disk) +static int lo_open(struct gendisk *disk, blk_mode_t mode) { struct loop_device *lo = disk->private_data; + int err; - if (disk_openers(disk) > 0) - return; + err = mutex_lock_killable(&lo->lo_mutex); + if (err) + return err; + + if (lo->lo_state == Lo_deleting || lo->lo_state == Lo_rundown) + err = -ENXIO; + else + atomic_inc(&lo->lo_refcnt); + mutex_unlock(&lo->lo_mutex); + return err; +} + +static void lo_release(struct gendisk *disk) +{ + struct loop_device *lo = disk->private_data; mutex_lock(&lo->lo_mutex); + if (atomic_dec_return(&lo->lo_refcnt)) + goto out_unlock; + if (lo->lo_state == Lo_bound && (lo->lo_flags & LO_FLAGS_AUTOCLEAR)) { lo->lo_state = Lo_rundown; mutex_unlock(&lo->lo_mutex); @@ -1735,6 +1753,7 @@ static void lo_release(struct gendisk *disk) __loop_clr_fd(lo, true); return; } +out_unlock: mutex_unlock(&lo->lo_mutex); } @@ -1752,6 +1771,7 @@ static void lo_free_disk(struct gendisk *disk) static const struct block_device_operations lo_fops = { .owner = THIS_MODULE, + .open = lo_open, .release = lo_release, .ioctl = lo_ioctl, #ifdef CONFIG_COMPAT @@ -2064,6 +2084,7 @@ static int loop_add(int i) */ if (!part_shift) set_bit(GD_SUPPRESS_PART_SCAN, &disk->state); + atomic_set(&lo->lo_refcnt, 0); mutex_init(&lo->lo_mutex); lo->lo_number = i; spin_lock_init(&lo->lo_lock); @@ -2158,7 +2179,7 @@ static int loop_control_remove(int idx) ret = mutex_lock_killable(&lo->lo_mutex); if (ret) goto mark_visible; - if (lo->lo_state != Lo_unbound || disk_openers(lo->lo_disk) > 0) { + if (lo->lo_state != Lo_unbound || atomic_read(&lo->lo_refcnt) > 0) { mutex_unlock(&lo->lo_mutex); ret = -EBUSY; goto mark_visible;
1. Userspace sends the command "losetup -d" which uses the open() call to open the device 2. Kernel receives the ioctl command "LOOP_CLR_FD" which calls the function loop_clr_fd() 3. If LOOP_CLR_FD is the first command received at the time, then the AUTOCLEAR flag is not set and deletion of the loop device proceeds ahead and scans the partitions (drop/add partitions) if (disk_openers(lo->lo_disk) > 1) { lo->lo_flags |= LO_FLAGS_AUTOCLEAR; loop_global_unlock(lo, true); return 0; } 4. Before scanning partitions, it will check to see if any partition of the loop device is currently opened 5. If any partition is opened, then it will return EBUSY: if (disk->open_partitions) return -EBUSY; 6. So, after receiving the "LOOP_CLR_FD" command and just before the above check for open_partitions, if any other command (like blkid) opens any partition of the loop device, then the partition scan will not proceed and EBUSY is returned as shown in above code 7. But in "__loop_clr_fd()", this EBUSY error is not propagated 8. We have noticed that this is causing the partitions of the loop to remain stale even after the loop device is detached resulting in the IO errors on the partitions Fix: Re-introduce the lo_open() call to restrict any process to open the loop device when its being detached Test case involves the following two scripts: script1.sh: while [ 1 ]; do losetup -P -f /home/opt/looptest/test10.img blkid /dev/loop0p1 done script2.sh: while [ 1 ]; do losetup -d /dev/loop0 done Without fix, the following IO errors have been observed: kernel: __loop_clr_fd: partition scan of loop0 failed (rc=-16) kernel: I/O error, dev loop0, sector 20971392 op 0x0:(READ) flags 0x80700 phys_seg 1 prio class 0 kernel: I/O error, dev loop0, sector 108868 op 0x0:(READ) flags 0x0 phys_seg 1 prio class 0 kernel: Buffer I/O error on dev loop0p1, logical block 27201, async page read Signed-off-by: Gulam Mohamed <gulam.mohamed@oracle.com> --- v3<-v2: Re-introduced the loop->lo_refcnt to take care of the case where we race when the Lo_rundown is set after the lo_open() function releases the lo_mutex lock drivers/block/loop.c | 31 ++++++++++++++++++++++++++----- 1 file changed, 26 insertions(+), 5 deletions(-)