Message ID | 20151019143546.GA7486@lst.de (mailing list archive) |
---|---|
State | New, archived |
Headers | show |
On 10/19/2015 07:35 AM, Christoph Hellwig wrote: > When dropping a lock while iterating a list we must restart the search > as other threads could have manipulated the list under us. Without this > we can get stuck in an endless loop. Hello Christoph, Thanks for looking into this. However, I think we need a motivation in the patch description why this patch does not reintroduce the soft lockup documented in patch "scsi_remove_target: fix softlockup regression on hot remove" (commit bc3f02a795d3). Thanks, Bart. -- To unsubscribe from this list: send the line "unsubscribe linux-scsi" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On Mon, Oct 19, 2015 at 08:36:23AM -0700, Bart Van Assche wrote: > Thanks for looking into this. However, I think we need a motivation in the > patch description why this patch does not reintroduce the soft lockup > documented in patch "scsi_remove_target: fix softlockup regression on hot > remove" (commit bc3f02a795d3). Interesting. I tried to find the original report and what state changes would cause an endless loop here. Dan, do you remember any details about this bug report? -- To unsubscribe from this list: send the line "unsubscribe linux-scsi" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On Mon, 2015-10-19 at 17:56 +0200, Christoph Hellwig wrote: > On Mon, Oct 19, 2015 at 08:36:23AM -0700, Bart Van Assche wrote: > > Thanks for looking into this. However, I think we need a motivation in the > > patch description why this patch does not reintroduce the soft lockup > > documented in patch "scsi_remove_target: fix softlockup regression on hot > > remove" (commit bc3f02a795d3). > > Interesting. I tried to find the original report and what state > changes would cause an endless loop here. The original report is this one: http://thread.gmane.org/gmane.linux.kernel/1348679 James > Dan, do you remember any > details about this bug report? > -- > To unsubscribe from this list: send the line "unsubscribe linux-scsi" in > the body of a message to majordomo@vger.kernel.org > More majordomo info at http://vger.kernel.org/majordomo-info.html > -- To unsubscribe from this list: send the line "unsubscribe linux-scsi" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On Mon, Oct 19, 2015 at 8:56 AM, Christoph Hellwig <hch@lst.de> wrote: > On Mon, Oct 19, 2015 at 08:36:23AM -0700, Bart Van Assche wrote: >> Thanks for looking into this. However, I think we need a motivation in the >> patch description why this patch does not reintroduce the soft lockup >> documented in patch "scsi_remove_target: fix softlockup regression on hot >> remove" (commit bc3f02a795d3). > > Interesting. I tried to find the original report and what state > changes would cause an endless loop here. Dan, do you remember any > details about this bug report? I believe the issue I was seeing back then might have been fixed or at least modulated by "f2495e228fce [SCSI] dual scan thread bug fix" which came a few years later. The original problem was hot-remove racing hot-add and that scsi_target_reap() was not guaranteed to advance the state of the target if it was in the process of being scanned when a removal event arrived. However the comment in that change: + /* + * if we get here and the target is still in the CREATED state that + * means it was allocated but never made visible (because a scan + * turned up no LUNs), so don't call device_del() on it. + */ ...is not what I was seeing. The target was in the CREATED state because it had not yet completed the initial scan before tear down was initiated. -- To unsubscribe from this list: send the line "unsubscribe linux-scsi" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On Mon, 2015-10-19 at 16:35 +0200, Christoph Hellwig wrote: > When dropping a lock while iterating a list we must restart the > search > as other threads could have manipulated the list under us. Without > this > we can get stuck in an endless loop. > > Reported-by: Johannes Thumshirn <jthumshirn@suse.de> > Signed-off-by: Christoph Hellwig <hch@lst.de> > > diff --git a/drivers/scsi/scsi_sysfs.c b/drivers/scsi/scsi_sysfs.c > index b333389f..d3b34d8 100644 > --- a/drivers/scsi/scsi_sysfs.c > +++ b/drivers/scsi/scsi_sysfs.c > @@ -1158,31 +1158,23 @@ static void __scsi_remove_target(struct > scsi_target *starget) > void scsi_remove_target(struct device *dev) > { > struct Scsi_Host *shost = dev_to_shost(dev->parent); > - struct scsi_target *starget, *last = NULL; > + struct scsi_target *starget; > unsigned long flags; > > - /* remove targets being careful to lookup next entry before > - * deleting the last > - */ > +restart: > spin_lock_irqsave(shost->host_lock, flags); > list_for_each_entry(starget, &shost->__targets, siblings) { > if (starget->state == STARGET_DEL) > continue; > if (starget->dev.parent == dev || &starget->dev == > dev) { > - /* assuming new targets arrive at the end */ > kref_get(&starget->reap_ref); > spin_unlock_irqrestore(shost->host_lock, > flags); > - if (last) > - scsi_target_reap(last); > - last = starget; > __scsi_remove_target(starget); > - spin_lock_irqsave(shost->host_lock, flags); > + scsi_target_reap(starget); > + goto restart; > } > } > spin_unlock_irqrestore(shost->host_lock, flags); > - > - if (last) > - scsi_target_reap(last); > } > EXPORT_SYMBOL(scsi_remove_target); Hi Christoph, I haven't heard anything from the original reporter of the lockup but my test's went all O.K., so Tested-by: Johannes Thumshirn <jthumshirn@suse.de> Reviewed-by: Johannes Thumshirn <jthumshirn@suse.de> > > -- > To unsubscribe from this list: send the line "unsubscribe linux-scsi" > in > the body of a message to majordomo@vger.kernel.org > More majordomo info at http://vger.kernel.org/majordomo-info.html -- To unsubscribe from this list: send the line "unsubscribe linux-scsi" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On Mon, 2015-10-19 at 16:35 +0200, Christoph Hellwig wrote: > When dropping a lock while iterating a list we must restart the > search > as other threads could have manipulated the list under us. Without > this > we can get stuck in an endless loop. > > Reported-by: Johannes Thumshirn <jthumshirn@suse.de> > Signed-off-by: Christoph Hellwig <hch@lst.de> I have an official acknowledge from the original author that a 48h long term test on their side didn't reproduce the lockup as well. James can we please get the patch included in 4.4 and stable? Thanks -- To unsubscribe from this list: send the line "unsubscribe linux-scsi" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
diff --git a/drivers/scsi/scsi_sysfs.c b/drivers/scsi/scsi_sysfs.c index b333389f..d3b34d8 100644 --- a/drivers/scsi/scsi_sysfs.c +++ b/drivers/scsi/scsi_sysfs.c @@ -1158,31 +1158,23 @@ static void __scsi_remove_target(struct scsi_target *starget) void scsi_remove_target(struct device *dev) { struct Scsi_Host *shost = dev_to_shost(dev->parent); - struct scsi_target *starget, *last = NULL; + struct scsi_target *starget; unsigned long flags; - /* remove targets being careful to lookup next entry before - * deleting the last - */ +restart: spin_lock_irqsave(shost->host_lock, flags); list_for_each_entry(starget, &shost->__targets, siblings) { if (starget->state == STARGET_DEL) continue; if (starget->dev.parent == dev || &starget->dev == dev) { - /* assuming new targets arrive at the end */ kref_get(&starget->reap_ref); spin_unlock_irqrestore(shost->host_lock, flags); - if (last) - scsi_target_reap(last); - last = starget; __scsi_remove_target(starget); - spin_lock_irqsave(shost->host_lock, flags); + scsi_target_reap(starget); + goto restart; } } spin_unlock_irqrestore(shost->host_lock, flags); - - if (last) - scsi_target_reap(last); } EXPORT_SYMBOL(scsi_remove_target);
When dropping a lock while iterating a list we must restart the search as other threads could have manipulated the list under us. Without this we can get stuck in an endless loop. Reported-by: Johannes Thumshirn <jthumshirn@suse.de> Signed-off-by: Christoph Hellwig <hch@lst.de> -- To unsubscribe from this list: send the line "unsubscribe linux-scsi" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html