Message ID | 20171129030556.47833-1-yanaijie@huawei.com (mailing list archive) |
---|---|
State | Changes Requested |
On 11/29/2017 04:05 AM, Jason Yan wrote:
> [...]
Reviewed-by: Hannes Reinecke <hare@suse.com>
Cheers,
Hannes
On Wed, 2017-11-29 at 11:05 +0800, Jason Yan wrote:
> [...]
Hi Greg,
As the above patch description shows it can happen that the SCSI core calls
get_device() after the device reference count has reached zero and before
the memory for struct device is freed. Although the above patch looks fine
to me, would you consider it acceptable to modify get_device() such that it
uses kobject_get_unless_zero() instead of kobject_get()? I'm asking this
because that change would help to reduce the complexity of the already too
complicated SCSI core.
Thanks,
Bart.
On Wed, Nov 29, 2017 at 04:18:30PM +0000, Bart Van Assche wrote:
> As the above patch description shows it can happen that the SCSI core calls
> get_device() after the device reference count has reached zero and before
> the memory for struct device is freed. Although the above patch looks fine
> to me, would you consider it acceptable to modify get_device() such that it
> uses kobject_get_unless_zero() instead of kobject_get()? I'm asking this
> because that change would help to reduce the complexity of the already too
> complicated SCSI core.
I don't think we can just modify get_device, but we can add a new
get_device_unless_zero. In fact I have an open coded variant of that
in nvme, and was planning to submit one for the current merge window..
On Wed, 2017-11-29 at 11:05 +0800, Jason Yan wrote:
> [...]
> CPU0                                                CPU1
> __scsi_remove_target()
>   ->iterate shost->__devices
>   ->scsi_remove_device()
>   ->put_device()
>       someone still hold a refcount
>                                                     sd_release()
>                                                       ->scsi_disk_put()
>                                                       ->put_device() last put and trigger the device release
>
>   ->goto restart
>   ->iterate shost->__devices and got the same device
>   ->get_device() while refcount is 0
This analysis fails here: get_device() on something with refcount 0
returns NULL. That triggers the if clause to ignore this device. We may
have a more complex way of triggering a dual put race as the trace
implies, but I don't think this is it.
[...]
> +static int scsi_device_get_not_deleted(struct scsi_device *sdev)
> +{
> +	if (sdev->sdev_state == SDEV_DEL || sdev->sdev_state == SDEV_CANCEL)
> +		return -ENXIO;
> +	if (!get_device(&sdev->sdev_gendev))
> +		return -ENXIO;
> +	return 0;
> +}
This is pretty much scsi_device_get() without the try_module get, so
they should probably be combined.
James
On Wed, Nov 29, 2017 at 08:31:48AM -0800, James Bottomley wrote:
> This analysis fails here: get_device() on something with refcount 0
> returns NULL. That triggers the if clause to ignore this device.
No, it doesn't. Take a look at the get_device and kobject_get
implementations,
On Wed, 2017-11-29 at 17:34 +0100, Christoph Hellwig wrote:
> On Wed, Nov 29, 2017 at 08:31:48AM -0800, James Bottomley wrote:
> > This analysis fails here: get_device() on something with refcount 0
> > returns NULL. That triggers the if clause to ignore this device.
>
> No, it doesn't. Take a look at the get_device and kobject_get
> implementations,
Hm, so why doesn't get_device use kref_get_unless_zero()?
James
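The disagreement above comes down to the difference between an unconditional "get" and a "get unless zero". A minimal userspace sketch of that difference follows, using C11 atomics rather than kernel primitives. `struct obj`, `obj_get()`, `obj_get_unless_zero()` and `obj_put()` are hypothetical names invented for this illustration; they only model the behavior described in this thread and are not the kernel's implementation.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for an object with an embedded reference count. */
struct obj {
	atomic_int refs;
};

/*
 * Unconditional get: always increments and hands the object back,
 * even if the count had already dropped to 0.
 */
static struct obj *obj_get(struct obj *o)
{
	if (o)
		atomic_fetch_add(&o->refs, 1);
	return o;
}

/* Conditional get: takes a reference only while the count is nonzero. */
static struct obj *obj_get_unless_zero(struct obj *o)
{
	int old;

	if (!o)
		return NULL;
	old = atomic_load(&o->refs);
	while (old != 0) {
		/* On failure, 'old' is reloaded with the current count. */
		if (atomic_compare_exchange_weak(&o->refs, &old, old + 1))
			return o;
	}
	return NULL;
}

/* Drop a reference; returns true when the caller must run the release. */
static bool obj_put(struct obj *o)
{
	int old = atomic_fetch_sub(&o->refs, 1);

	assert(old > 0);	/* dropping below zero would be a bug */
	return old == 1;
}
```

In this model, once the last `obj_put()` has returned true, a later `obj_get()` followed by `obj_put()` reports "last reference" a second time, which is the double release discussed in this thread. `obj_get_unless_zero()` returns NULL in the same situation, so the caller can skip the object.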
On Wed, Nov 29, 2017 at 04:18:30PM +0000, Bart Van Assche wrote:
> On Wed, 2017-11-29 at 11:05 +0800, Jason Yan wrote:
> > [...]
>
> Hi Greg,
>
> As the above patch description shows it can happen that the SCSI core calls
> get_device() after the device reference count has reached zero and before
> the memory for struct device is freed. Although the above patch looks fine
> to me, would you consider it acceptable to modify get_device() such that it
> uses kobject_get_unless_zero() instead of kobject_get()? I'm asking this
> because that change would help to reduce the complexity of the already too
> complicated SCSI core.
Shouldn't there be a bus lock somewhere preventing this race? Having an
open-coded put call isn't good, as you see here.
thanks,
greg k-h
On Wed, Nov 29, 2017 at 05:20:50PM +0100, hch@lst.de wrote:
> On Wed, Nov 29, 2017 at 04:18:30PM +0000, Bart Van Assche wrote:
> > As the above patch description shows it can happen that the SCSI core calls
> > get_device() after the device reference count has reached zero and before
> > the memory for struct device is freed. Although the above patch looks fine
> > to me, would you consider it acceptable to modify get_device() such that it
> > uses kobject_get_unless_zero() instead of kobject_get()? I'm asking this
> > because that change would help to reduce the complexity of the already too
> > complicated SCSI core.
>
> I don't think we can just modify get_device, but we can add a new
> get_device_unless_zero. In fact I have an open coded variant of that
> in nvme, and was planning to submit one for the current merge window..
I feel like that is just delaying the real fix, shouldn't there be a bus
lock somewhere on the put_device path for this bus to prevent this?
thanks,
greg k-h
On Wed, 2017-11-29 at 17:39 +0000, gregkh@linuxfoundation.org wrote:
> On Wed, Nov 29, 2017 at 04:18:30PM +0000, Bart Van Assche wrote:
> > As the above patch description shows it can happen that the SCSI core calls
> > get_device() after the device reference count has reached zero and before
> > the memory for struct device is freed. Although the above patch looks fine
> > to me, would you consider it acceptable to modify get_device() such that it
> > uses kobject_get_unless_zero() instead of kobject_get()? I'm asking this
> > because that change would help to reduce the complexity of the already too
> > complicated SCSI core.
>
> Shouldn't there be a bus lock somewhere preventing this race? Having an
> open-coded put call isn't good, as you see here.
Hello Greg,
The get_device() call occurs with the SCSI host lock held. The SCSI host lock
serializes iteration over the sibling list by the get_device() caller and
removal of the SCSI host from the SCSI device sibling list by
scsi_device_dev_release_usercontext(). If you have a look at
__scsi_remove_target() then you will see that the host lock has to be released
after a matching SCSI target has been found and before scsi_remove_device() is
called, because the latter function may sleep.
Bart.
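The locking pattern described in the message above (find a match under the host lock, pin it, drop the lock for the removal, then restart the scan) can be modeled in userspace. The sketch below assumes a pthread mutex in place of the host spinlock and a fixed array in place of the sibling list. `struct dev`, `remove_device()` and `remove_target()` are hypothetical names for this illustration only. The state check in the loop mirrors the one the patch restores; in this model, where a removed device never leaves the array, dropping that check would make the scan pick the same device forever.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

/* Simplified stand-ins for the SDEV_* states mentioned in the thread. */
enum dev_state { DEV_RUNNING, DEV_CANCEL, DEV_DEL };

/* Hypothetical model of a device that stays on the list after removal. */
struct dev {
	int target_id;
	enum dev_state state;
	int refs;		/* protected by host_lock in this model */
};

#define NDEV 3
static struct dev devices[NDEV] = {
	{ 0, DEV_RUNNING, 1 },
	{ 1, DEV_RUNNING, 1 },
	{ 0, DEV_RUNNING, 1 },
};
static pthread_mutex_t host_lock = PTHREAD_MUTEX_INITIALIZER;

/* Stands in for a removal routine that may sleep: call it unlocked. */
static void remove_device(struct dev *d)
{
	pthread_mutex_lock(&host_lock);
	assert(d->refs > 0);	/* the caller must have pinned the device */
	d->state = DEV_DEL;
	pthread_mutex_unlock(&host_lock);
}

/* Remove every live device of one target; returns how many were removed. */
static int remove_target(int target_id)
{
	int removed = 0;
	size_t i;

	pthread_mutex_lock(&host_lock);
restart:
	for (i = 0; i < NDEV; i++) {
		struct dev *d = &devices[i];

		/* Skip other targets and devices already being deleted. */
		if (d->target_id != target_id ||
		    d->state == DEV_CANCEL || d->state == DEV_DEL)
			continue;
		d->refs++;			/* pin before dropping the lock */
		pthread_mutex_unlock(&host_lock);
		remove_device(d);
		pthread_mutex_lock(&host_lock);
		d->refs--;			/* unpin; the list may have changed */
		removed++;
		goto restart;
	}
	pthread_mutex_unlock(&host_lock);
	return removed;
}
```

The restart after every removal is needed because anything may have happened to the list while the lock was dropped, so the scan position can no longer be trusted.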
On Wed, 2017-11-29 at 17:39 +0000, gregkh@linuxfoundation.org wrote:
> On Wed, Nov 29, 2017 at 05:20:50PM +0100, hch@lst.de wrote:
> > On Wed, Nov 29, 2017 at 04:18:30PM +0000, Bart Van Assche wrote:
> > > As the above patch description shows it can happen that the SCSI core calls
> > > get_device() after the device reference count has reached zero and before
> > > the memory for struct device is freed. Although the above patch looks fine
> > > to me, would you consider it acceptable to modify get_device() such that it
> > > uses kobject_get_unless_zero() instead of kobject_get()? I'm asking this
> > > because that change would help to reduce the complexity of the already too
> > > complicated SCSI core.
> >
> > I don't think we can just modify get_device, but we can add a new
> > get_device_unless_zero. In fact I have an open coded variant of that
> > in nvme, and was planning to submit one for the current merge window..
>
> I feel like that is just delaying the real fix, shouldn't there be a bus
> lock somewhere on the put_device path for this bus to prevent this?
>
> thanks,
>
> greg k-h
Why is it that clients of the kobject code have to have their own
lock / state checking to prevent a duplicate destructor callback?
It seems to me like this is something the core functionality should
provide, because a get inside a destructor would *always* be wrong, no?
It looks like:
void refcount_inc(refcount_t *r)
{
	WARN_ONCE(!refcount_inc_not_zero(r), "refcount_t: increment on 0; use-after-free.\n");
}
would have warned if CONFIG_REFCOUNT_FULL was on, I/we don't normally
enable that though.
-Ewan
On Wed, 2017-11-29 at 11:05 +0800, Jason Yan wrote:
> [...]
See subsequent discussion, however, we have a reproducible case here and
the patch does appear to fix the issue (500+ iterations).
Reviewed-by: Ewan D. Milne <emilne@redhat.com>
On Wed, 2017-11-29 at 13:49 -0500, Ewan D. Milne wrote:
> because a get inside a destructor would *always* be wrong, no?
Hello Ewan,
That's not what we are discussing. What can happen with the SCSI core is that
get_device() is called concurrently with the destructor. get_device() can be
called concurrently with the destructor because the destructor removes a
device from the siblings list and because the SCSI core can call get_device()
for devices it finds on the siblings list. Personally I think that design is
superior compared to removing a SCSI device from the sibling list before the
last put_device() call because the approach followed in the SCSI core leads to
a simpler implementation. However, it seems like the current get_device()
implementation does not yet support the SCSI core design ...
Bart.
On Wed, 2017-11-29 at 19:11 +0000, Bart Van Assche wrote:
> On Wed, 2017-11-29 at 13:49 -0500, Ewan D. Milne wrote:
> > because a get inside a destructor would *always* be wrong, no?
>
> Hello Ewan,
>
> That's not what we are discussing. What can happen with the SCSI core is that
> get_device() is called concurrently with the destructor. get_device() can be
> called concurrently with the destructor because the destructor removes a
> device from the siblings list and because the SCSI core can call get_device()
> for devices it finds on the siblings list. Personally I think that design is
> superior compared to removing a SCSI device from the sibling list before the
> last put_device() call because the approach followed in the SCSI core leads to
> a simpler implementation. However, it seems like the current get_device()
> implementation does not yet support the SCSI core design ...
>
> Bart.
OK, well, I think the point still stands, though, once the refcount
goes to zero and the destructor is invoked, a get that then increments
the refcount seems fundamentally wrong to me. Especially if a subsequent
put causes the destructor to be invoked *simultaneously* *on another thread*.
The locking has to happen somewhere, why isn't this done by the kobject?
Relying on the client code to get this right means that there are opportunities
all over the kernel for problems like this to happen, just like here, where
we inadvertently removed the state check that prevented the get_device() call.
-Ewan
On Wed, 2017-11-29 at 14:20 -0500, Ewan D. Milne wrote:
> OK, well, I think the point still stands, though, once the refcount
> goes to zero and the destructor is invoked, a get that then increments
> the refcount seems fundamentally wrong to me.
I agree that incrementing a reference count that has dropped to zero is wrong.
However, that's what happens currently. That behavior has been reported as a
bug. We need to fix this behavior, either through the patch at the start of
this thread or by using code that avoids incrementing a zero reference count,
e.g. kobject_get_unless_zero().
Bart.
diff --git a/drivers/scsi/scsi_sysfs.c b/drivers/scsi/scsi_sysfs.c
index 50e7d7e..d398894 100644
--- a/drivers/scsi/scsi_sysfs.c
+++ b/drivers/scsi/scsi_sysfs.c
@@ -1398,6 +1398,15 @@ void scsi_remove_device(struct scsi_device *sdev)
 }
 EXPORT_SYMBOL(scsi_remove_device);
 
+static int scsi_device_get_not_deleted(struct scsi_device *sdev)
+{
+	if (sdev->sdev_state == SDEV_DEL || sdev->sdev_state == SDEV_CANCEL)
+		return -ENXIO;
+	if (!get_device(&sdev->sdev_gendev))
+		return -ENXIO;
+	return 0;
+}
+
 static void __scsi_remove_target(struct scsi_target *starget)
 {
 	struct Scsi_Host *shost = dev_to_shost(starget->dev.parent);
@@ -1415,7 +1424,7 @@ static void __scsi_remove_target(struct scsi_target *starget)
 		 */
 		if (sdev->channel != starget->channel ||
 		    sdev->id != starget->id ||
-		    !get_device(&sdev->sdev_gendev))
+		    scsi_device_get_not_deleted(sdev))
 			continue;
 		spin_unlock_irqrestore(shost->host_lock, flags);
 		scsi_remove_device(sdev);
In commit fbce4d97fd43 ("scsi: fixup kernel warning during rmmod()"), we
removed scsi_device_get() and directly called get_device() to increase
the refcount of the device. But actually scsi_device_get() will fail in
three cases:
1. the scsi device is in SDEV_DEL or SDEV_CANCEL state
2. get_device() fails
3. the module is not alive

The intended purpose was to remove the check that the module is alive.
Unfortunately the check of the device state was dropped too. And this
introduced a race condition like this:

CPU0                                                CPU1
__scsi_remove_target()
  ->iterate shost->__devices
  ->scsi_remove_device()
  ->put_device()
      someone still holds a refcount
                                                    sd_release()
                                                      ->scsi_disk_put()
                                                      ->put_device() last put and trigger the device release

  ->goto restart
  ->iterate shost->__devices and got the same device
  ->get_device() while refcount is 0
  ->scsi_remove_device()
  ->put_device() refcount decreased to 0 again
  ->scsi_device_dev_release()
  ->scsi_device_dev_release_usercontext()

                                                      ->scsi_device_dev_release()
                                                      ->scsi_device_dev_release_usercontext()

The same scsi device will be found again because it stays in the
shost->__devices list until scsi_device_dev_release_usercontext() is called,
although the device state was set to SDEV_DEL after the first
scsi_remove_device().

Finally we got an oops in scsi_device_dev_release_usercontext() when it is
called the second time.

Call trace:
[<ffff0000086bc624>] scsi_device_dev_release_usercontext+0x7c/0x1c0
[<ffff0000080f1f90>] execute_in_process_context+0x70/0x80
[<ffff0000086bc598>] scsi_device_dev_release+0x28/0x38
[<ffff0000086662cc>] device_release+0x3c/0xa0
[<ffff000008c2e780>] kobject_put+0x80/0xf0
[<ffff0000086666fc>] put_device+0x24/0x30
[<ffff0000086aeee0>] scsi_device_put+0x30/0x40
[<ffff000008704894>] scsi_disk_put+0x44/0x60
[<ffff000008704a50>] sd_release+0x50/0x80
[<ffff0000082bc704>] __blkdev_put+0x21c/0x230
[<ffff0000082bcb2c>] blkdev_put+0x54/0x118
[<ffff0000082bcc1c>] blkdev_close+0x2c/0x40
[<ffff000008279b64>] __fput+0x94/0x1d8
[<ffff000008279d20>] ____fput+0x20/0x30
[<ffff0000080f6f54>] task_work_run+0x9c/0xb8
[<ffff0000080dba64>] do_exit+0x2b4/0x9f8
[<ffff0000080dc234>] do_group_exit+0x3c/0xa0
[<ffff0000080dc2b8>] __wake_up_parent+0x0/0x40

And sometimes __scsi_remove_target() will loop for a long time
removing the same device, if someone else is holding a refcount, until the
last refcount is released.

Notice that if CONFIG_REFCOUNT_FULL is enabled this race won't be triggered
because the full refcount implementation will prevent the refcount from
increasing when it is 0.

Fix this by checking the sdev_state again like we did before in
scsi_device_get(). Then when iterating shost again we will skip the deleted
device because scsi_remove_device() will set the device state to
SDEV_CANCEL or SDEV_DEL.

Fixes: fbce4d97fd43 ("scsi: fixup kernel warning during rmmod()")
Signed-off-by: Jason Yan <yanaijie@huawei.com>
CC: Hannes Reinecke <hare@suse.de>
CC: Christoph Hellwig <hch@lst.de>
CC: Johannes Thumshirn <jthumshirn@suse.de>
CC: Zhaohongjiang <zhaohongjiang@huawei.com>
CC: Miao Xie <miaoxie@huawei.com>
---
 drivers/scsi/scsi_sysfs.c | 11 ++++++++++-
 1 file changed, 10 insertions(+), 1 deletion(-)