Message ID | 20230912005655.368075-8-dlemoal@kernel.org (mailing list archive) |
---|---|
State | Superseded |
Headers | show |
Series | Fix libata suspend/resume handling and code cleanup | expand |
On 9/11/23 17:56, Damien Le Moal wrote: > If an error occurs when resuming a host adapter before the devices > attached to the adapter are resumed, the adapter low level driver may > remove the scsi host, resulting in a call to sd_remove() for the > disks of the host. However, since this function calls sd_shutdown(), > a synchronize cache command and a start stop unit may be issued with the > drive still sleeping and the HBA non-functional. This causes PM resume > to hang, forcing a reset of the machine to recover. > > Fix this by checking a device host state in sd_shutdown() and by > returning early doing nothing if the host state is not SHOST_RUNNING. > > Cc: stable@vger.kernel.org > Signed-off-by: Damien Le Moal <dlemoal@kernel.org> > Reviewed-by: Hannes Reinecke <hare@suse.de> > --- > drivers/scsi/sd.c | 3 ++- > 1 file changed, 2 insertions(+), 1 deletion(-) > > diff --git a/drivers/scsi/sd.c b/drivers/scsi/sd.c > index c92a317ba547..a415abb721d3 100644 > --- a/drivers/scsi/sd.c > +++ b/drivers/scsi/sd.c > @@ -3763,7 +3763,8 @@ static void sd_shutdown(struct device *dev) > if (!sdkp) > return; /* this can happen */ > > - if (pm_runtime_suspended(dev)) > + if (pm_runtime_suspended(dev) || > + sdkp->device->host->shost_state != SHOST_RUNNING) > return; > > if (sdkp->WCE && sdkp->media_present) { The above seems wrong to me because no SYNCHRONIZE CACHE command will be sent even if the device is still present, if the SCSI error handler is active and if it will succeed at a later time. How about replacing the above patch with something like the untested patch below? Thanks, Bart. diff --git a/drivers/scsi/sd.c b/drivers/scsi/sd.c index 4cd281368826..c0e069d9d58e 100644 --- a/drivers/scsi/sd.c +++ b/drivers/scsi/sd.c @@ -3689,12 +3689,14 @@ static int sd_probe(struct device *dev) static int sd_remove(struct device *dev) { struct scsi_disk *sdkp = dev_get_drvdata(dev); + int rpm_get_res; - scsi_autopm_get_device(sdkp->device); + rpm_get_res = scsi_autopm_get_device(sdkp->device); device_del(&sdkp->disk_dev); del_gendisk(sdkp->disk); - sd_shutdown(dev); + if (rpm_get_res >= 0) + sd_shutdown(dev); put_disk(sdkp->disk); return 0;
On 9/14/23 23:48, Bart Van Assche wrote: > On 9/11/23 17:56, Damien Le Moal wrote: >> If an error occurs when resuming a host adapter before the devices >> attached to the adapter are resumed, the adapter low level driver may >> remove the scsi host, resulting in a call to sd_remove() for the >> disks of the host. However, since this function calls sd_shutdown(), >> a synchronize cache command and a start stop unit may be issued with the >> drive still sleeping and the HBA non-functional. This causes PM resume >> to hang, forcing a reset of the machine to recover. >> >> Fix this by checking a device host state in sd_shutdown() and by >> returning early doing nothing if the host state is not SHOST_RUNNING. >> >> Cc: stable@vger.kernel.org >> Signed-off-by: Damien Le Moal <dlemoal@kernel.org> >> Reviewed-by: Hannes Reinecke <hare@suse.de> >> --- >> drivers/scsi/sd.c | 3 ++- >> 1 file changed, 2 insertions(+), 1 deletion(-) >> >> diff --git a/drivers/scsi/sd.c b/drivers/scsi/sd.c >> index c92a317ba547..a415abb721d3 100644 >> --- a/drivers/scsi/sd.c >> +++ b/drivers/scsi/sd.c >> @@ -3763,7 +3763,8 @@ static void sd_shutdown(struct device *dev) >> if (!sdkp) >> return; /* this can happen */ >> >> - if (pm_runtime_suspended(dev)) >> + if (pm_runtime_suspended(dev) || >> + sdkp->device->host->shost_state != SHOST_RUNNING) >> return; >> >> if (sdkp->WCE && sdkp->media_present) { > > The above seems wrong to me because no SYNCHRONIZE CACHE command will be > sent even if the device is still present, if the SCSI error handler is > active and if it will succeed at a later time. How about replacing the > above patch with something like the untested patch below? > > Thanks, > > Bart. > > > diff --git a/drivers/scsi/sd.c b/drivers/scsi/sd.c > index 4cd281368826..c0e069d9d58e 100644 > --- a/drivers/scsi/sd.c > +++ b/drivers/scsi/sd.c > @@ -3689,12 +3689,14 @@ static int sd_probe(struct device *dev) > static int sd_remove(struct device *dev) > { > struct scsi_disk *sdkp = dev_get_drvdata(dev); > + int rpm_get_res; > > - scsi_autopm_get_device(sdkp->device); > + rpm_get_res = scsi_autopm_get_device(sdkp->device); > > device_del(&sdkp->disk_dev); > del_gendisk(sdkp->disk); > - sd_shutdown(dev); > + if (rpm_get_res >= 0) > + sd_shutdown(dev); > > put_disk(sdkp->disk); > return 0; OK. Let me try this.
diff --git a/drivers/scsi/sd.c b/drivers/scsi/sd.c index c92a317ba547..a415abb721d3 100644 --- a/drivers/scsi/sd.c +++ b/drivers/scsi/sd.c @@ -3763,7 +3763,8 @@ static void sd_shutdown(struct device *dev) if (!sdkp) return; /* this can happen */ - if (pm_runtime_suspended(dev)) + if (pm_runtime_suspended(dev) || + sdkp->device->host->shost_state != SHOST_RUNNING) return; if (sdkp->WCE && sdkp->media_present) {