diff mbox series

[v2,07/21] scsi: sd: Do not issue commands to suspended disks on remove

Message ID 20230912005655.368075-8-dlemoal@kernel.org (mailing list archive)
State Superseded
Headers show
Series Fix libata suspend/resume handling and code cleanup | expand

Commit Message

Damien Le Moal Sept. 12, 2023, 12:56 a.m. UTC
If an error occurs when resuming a host adapter before the devices
attached to the adapter are resumed, the adapter low level driver may
remove the scsi host, resulting in a call to sd_remove() for the
disks of the host. However, since this function calls sd_shutdown(),
a synchronize cache command and a start stop unit may be issued with the
drive still sleeping and the HBA non-functional. This causes PM resume
to hang, forcing a reset of the machine to recover.

Fix this by checking a device host state in sd_shutdown() and by
returning early doing nothing if the host state is not SHOST_RUNNING.

Cc: stable@vger.kernel.org
Signed-off-by: Damien Le Moal <dlemoal@kernel.org>
Reviewed-by: Hannes Reinecke <hare@suse.de>
---
 drivers/scsi/sd.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

Comments

Bart Van Assche Sept. 14, 2023, 2:48 p.m. UTC | #1
On 9/11/23 17:56, Damien Le Moal wrote:
> If an error occurs when resuming a host adapter before the devices
> attached to the adapter are resumed, the adapter low level driver may
> remove the scsi host, resulting in a call to sd_remove() for the
> disks of the host. However, since this function calls sd_shutdown(),
> a synchronize cache command and a start stop unit may be issued with the
> drive still sleeping and the HBA non-functional. This causes PM resume
> to hang, forcing a reset of the machine to recover.
> 
> Fix this by checking a device host state in sd_shutdown() and by
> returning early doing nothing if the host state is not SHOST_RUNNING.
> 
> Cc: stable@vger.kernel.org
> Signed-off-by: Damien Le Moal <dlemoal@kernel.org>
> Reviewed-by: Hannes Reinecke <hare@suse.de>
> ---
>   drivers/scsi/sd.c | 3 ++-
>   1 file changed, 2 insertions(+), 1 deletion(-)
> 
> diff --git a/drivers/scsi/sd.c b/drivers/scsi/sd.c
> index c92a317ba547..a415abb721d3 100644
> --- a/drivers/scsi/sd.c
> +++ b/drivers/scsi/sd.c
> @@ -3763,7 +3763,8 @@ static void sd_shutdown(struct device *dev)
>   	if (!sdkp)
>   		return;         /* this can happen */
>   
> -	if (pm_runtime_suspended(dev))
> +	if (pm_runtime_suspended(dev) ||
> +	    sdkp->device->host->shost_state != SHOST_RUNNING)
>   		return;
>   
>   	if (sdkp->WCE && sdkp->media_present) {

The above seems wrong to me because no SYNCHRONIZE CACHE command will be
sent even if the device is still present, if the SCSI error handler is
active and if it will succeed at a later time. How about replacing the
above patch with something like the untested patch below?

Thanks,

Bart.


diff --git a/drivers/scsi/sd.c b/drivers/scsi/sd.c
index 4cd281368826..c0e069d9d58e 100644
--- a/drivers/scsi/sd.c
+++ b/drivers/scsi/sd.c
@@ -3689,12 +3689,14 @@ static int sd_probe(struct device *dev)
  static int sd_remove(struct device *dev)
  {
  	struct scsi_disk *sdkp = dev_get_drvdata(dev);
+	int rpm_get_res;

-	scsi_autopm_get_device(sdkp->device);
+	rpm_get_res = scsi_autopm_get_device(sdkp->device);

  	device_del(&sdkp->disk_dev);
  	del_gendisk(sdkp->disk);
-	sd_shutdown(dev);
+	if (rpm_get_res >= 0)
+		sd_shutdown(dev);

  	put_disk(sdkp->disk);
  	return 0;
Damien Le Moal Sept. 14, 2023, 10:06 p.m. UTC | #2
On 9/14/23 23:48, Bart Van Assche wrote:
> On 9/11/23 17:56, Damien Le Moal wrote:
>> If an error occurs when resuming a host adapter before the devices
>> attached to the adapter are resumed, the adapter low level driver may
>> remove the scsi host, resulting in a call to sd_remove() for the
>> disks of the host. However, since this function calls sd_shutdown(),
>> a synchronize cache command and a start stop unit may be issued with the
>> drive still sleeping and the HBA non-functional. This causes PM resume
>> to hang, forcing a reset of the machine to recover.
>>
>> Fix this by checking a device host state in sd_shutdown() and by
>> returning early doing nothing if the host state is not SHOST_RUNNING.
>>
>> Cc: stable@vger.kernel.org
>> Signed-off-by: Damien Le Moal <dlemoal@kernel.org>
>> Reviewed-by: Hannes Reinecke <hare@suse.de>
>> ---
>>   drivers/scsi/sd.c | 3 ++-
>>   1 file changed, 2 insertions(+), 1 deletion(-)
>>
>> diff --git a/drivers/scsi/sd.c b/drivers/scsi/sd.c
>> index c92a317ba547..a415abb721d3 100644
>> --- a/drivers/scsi/sd.c
>> +++ b/drivers/scsi/sd.c
>> @@ -3763,7 +3763,8 @@ static void sd_shutdown(struct device *dev)
>>   	if (!sdkp)
>>   		return;         /* this can happen */
>>   
>> -	if (pm_runtime_suspended(dev))
>> +	if (pm_runtime_suspended(dev) ||
>> +	    sdkp->device->host->shost_state != SHOST_RUNNING)
>>   		return;
>>   
>>   	if (sdkp->WCE && sdkp->media_present) {
> 
> The above seems wrong to me because no SYNCHRONIZE CACHE command will be
> sent even if the device is still present, if the SCSI error handler is
> active and if it will succeed at a later time. How about replacing the
> above patch with something like the untested patch below?
> 
> Thanks,
> 
> Bart.
> 
> 
> diff --git a/drivers/scsi/sd.c b/drivers/scsi/sd.c
> index 4cd281368826..c0e069d9d58e 100644
> --- a/drivers/scsi/sd.c
> +++ b/drivers/scsi/sd.c
> @@ -3689,12 +3689,14 @@ static int sd_probe(struct device *dev)
>   static int sd_remove(struct device *dev)
>   {
>   	struct scsi_disk *sdkp = dev_get_drvdata(dev);
> +	int rpm_get_res;
> 
> -	scsi_autopm_get_device(sdkp->device);
> +	rpm_get_res = scsi_autopm_get_device(sdkp->device);
> 
>   	device_del(&sdkp->disk_dev);
>   	del_gendisk(sdkp->disk);
> -	sd_shutdown(dev);
> +	if (rpm_get_res >= 0)
> +		sd_shutdown(dev);
> 
>   	put_disk(sdkp->disk);
>   	return 0;

OK. Let me try this.
diff mbox series

Patch

diff --git a/drivers/scsi/sd.c b/drivers/scsi/sd.c
index c92a317ba547..a415abb721d3 100644
--- a/drivers/scsi/sd.c
+++ b/drivers/scsi/sd.c
@@ -3763,7 +3763,8 @@  static void sd_shutdown(struct device *dev)
 	if (!sdkp)
 		return;         /* this can happen */
 
-	if (pm_runtime_suspended(dev))
+	if (pm_runtime_suspended(dev) ||
+	    sdkp->device->host->shost_state != SHOST_RUNNING)
 		return;
 
 	if (sdkp->WCE && sdkp->media_present) {