diff mbox series

[v3,02/23] ata: libata-core: Fix port and device removal

Message ID 20230915081507.761711-3-dlemoal@kernel.org (mailing list archive)
State Superseded
Headers show
Series Fix libata suspend/resume handling and code cleanup | expand

Commit Message

Damien Le Moal Sept. 15, 2023, 8:14 a.m. UTC
Whenever an ATA adapter driver is removed (e.g. rmmod),
ata_port_detach() is called repeatedly for all the adapter ports to
remove (unload) the devices attached to the port and delete the port
device itself. Removing of devices is done using libata EH with the
ATA_PFLAG_UNLOADING port flag set. This causes libata EH to execute
ata_eh_unload() which disables all devices attached to the port.

ata_port_detach() finishes by calling scsi_remove_host() to remove the
scsi host associated with the port. This function will trigger the
removal of all scsi devices attached to the host and in the case of
disks, calls to sd_shutdown() which will flush the device write cache
and stop the device. However, given that the devices were already
disabled by ata_eh_unload(), the synchronize write cache command and
start stop unit commands fail. E.g. running "rmmod ahci" with first
removing sd_mod results in error messages like:

ata13.00: disable device
sd 0:0:0:0: [sda] Synchronizing SCSI cache
sd 0:0:0:0: [sda] Synchronize Cache(10) failed: Result: hostbyte=DID_BAD_TARGET driverbyte=DRIVER_OK
sd 0:0:0:0: [sda] Stopping disk
sd 0:0:0:0: [sda] Start/Stop Unit failed: Result: hostbyte=DID_BAD_TARGET driverbyte=DRIVER_OK

Fix this by removing all scsi devices of the ata devices connected to
the port before scheduling libata EH to disable the ATA devices.

Fixes: 720ba12620ee ("[PATCH] libata-hp: update unload-unplug")
Cc: stable@vger.kernel.org
Signed-off-by: Damien Le Moal <dlemoal@kernel.org>
Reviewed-by: Hannes Reinecke <hare@suse.de>
Tested-by: Chia-Lin Kao (AceLan) <acelan.kao@canonical.com>
---
 drivers/ata/libata-core.c | 21 ++++++++++++++++++++-
 1 file changed, 20 insertions(+), 1 deletion(-)

Comments

Niklas Cassel Sept. 19, 2023, 1:21 p.m. UTC | #1
On Fri, Sep 15, 2023 at 05:14:46PM +0900, Damien Le Moal wrote:
> Whenever an ATA adapter driver is removed (e.g. rmmod),
> ata_port_detach() is called repeatedly for all the adapter ports to
> remove (unload) the devices attached to the port and delete the port
> device itself. Removing of devices is done using libata EH with the
> ATA_PFLAG_UNLOADING port flag set. This causes libata EH to execute
> ata_eh_unload() which disables all devices attached to the port.
> 
> ata_port_detach() finishes by calling scsi_remove_host() to remove the
> scsi host associated with the port. This function will trigger the
> removal of all scsi devices attached to the host and in the case of
> disks, calls to sd_shutdown() which will flush the device write cache
> and stop the device. However, given that the devices were already
> disabled by ata_eh_unload(), the synchronize write cache command and
> start stop unit commands fail. E.g. running "rmmod ahci" with first
> removing sd_mod results in error messages like:
> 
> ata13.00: disable device
> sd 0:0:0:0: [sda] Synchronizing SCSI cache
> sd 0:0:0:0: [sda] Synchronize Cache(10) failed: Result: hostbyte=DID_BAD_TARGET driverbyte=DRIVER_OK
> sd 0:0:0:0: [sda] Stopping disk
> sd 0:0:0:0: [sda] Start/Stop Unit failed: Result: hostbyte=DID_BAD_TARGET driverbyte=DRIVER_OK
> 
> Fix this by removing all scsi devices of the ata devices connected to
> the port before scheduling libata EH to disable the ATA devices.
> 
> Fixes: 720ba12620ee ("[PATCH] libata-hp: update unload-unplug")
> Cc: stable@vger.kernel.org
> Signed-off-by: Damien Le Moal <dlemoal@kernel.org>
> Reviewed-by: Hannes Reinecke <hare@suse.de>
> Tested-by: Chia-Lin Kao (AceLan) <acelan.kao@canonical.com>
> ---
>  drivers/ata/libata-core.c | 21 ++++++++++++++++++++-
>  1 file changed, 20 insertions(+), 1 deletion(-)
> 
> diff --git a/drivers/ata/libata-core.c b/drivers/ata/libata-core.c
> index c4898483d716..693cb3cd70cd 100644
> --- a/drivers/ata/libata-core.c
> +++ b/drivers/ata/libata-core.c
> @@ -5952,11 +5952,30 @@ static void ata_port_detach(struct ata_port *ap)
>  	struct ata_link *link;
>  	struct ata_device *dev;
>  
> -	/* tell EH we're leaving & flush EH */
> +	/* Wait for any ongoing EH */
> +	ata_port_wait_eh(ap);
> +
> +	mutex_lock(&ap->scsi_scan_mutex);
>  	spin_lock_irqsave(ap->lock, flags);
> +
> +	/* Remove scsi devices */
> +	ata_for_each_link(link, ap, HOST_FIRST) {
> +		ata_for_each_dev(dev, link, ALL) {
> +			if (dev->sdev) {
> +				spin_unlock_irqrestore(ap->lock, flags);
> +				scsi_remove_device(dev->sdev);
> +				spin_lock_irqsave(ap->lock, flags);
> +				dev->sdev = NULL;
> +			}
> +		}
> +	}
> +
> +	/* Tell EH to disable all devices */
>  	ap->pflags |= ATA_PFLAG_UNLOADING;
>  	ata_port_schedule_eh(ap);
> +
>  	spin_unlock_irqrestore(ap->lock, flags);
> +	mutex_unlock(&ap->scsi_scan_mutex);
>  
>  	/* wait till EH commits suicide */
>  	ata_port_wait_eh(ap);
> -- 
> 2.41.0
> 

Looks good:
Reviewed-by: Niklas Cassel <niklas.cassel@wdc.com>

Could you perhaps also fix the WARN_ON(!(ap->pflags & ATA_PFLAG_UNLOADED));
in this very same function.

The pflag is checked without holding the lock.


Kind regards,
Niklas
Damien Le Moal Sept. 19, 2023, 5:42 p.m. UTC | #2
On 2023/09/19 6:21, Niklas Cassel wrote:
> On Fri, Sep 15, 2023 at 05:14:46PM +0900, Damien Le Moal wrote:
>> Whenever an ATA adapter driver is removed (e.g. rmmod),
>> ata_port_detach() is called repeatedly for all the adapter ports to
>> remove (unload) the devices attached to the port and delete the port
>> device itself. Removing of devices is done using libata EH with the
>> ATA_PFLAG_UNLOADING port flag set. This causes libata EH to execute
>> ata_eh_unload() which disables all devices attached to the port.
>>
>> ata_port_detach() finishes by calling scsi_remove_host() to remove the
>> scsi host associated with the port. This function will trigger the
>> removal of all scsi devices attached to the host and in the case of
>> disks, calls to sd_shutdown() which will flush the device write cache
>> and stop the device. However, given that the devices were already
>> disabled by ata_eh_unload(), the synchronize write cache command and
>> start stop unit commands fail. E.g. running "rmmod ahci" with first
>> removing sd_mod results in error messages like:
>>
>> ata13.00: disable device
>> sd 0:0:0:0: [sda] Synchronizing SCSI cache
>> sd 0:0:0:0: [sda] Synchronize Cache(10) failed: Result: hostbyte=DID_BAD_TARGET driverbyte=DRIVER_OK
>> sd 0:0:0:0: [sda] Stopping disk
>> sd 0:0:0:0: [sda] Start/Stop Unit failed: Result: hostbyte=DID_BAD_TARGET driverbyte=DRIVER_OK
>>
>> Fix this by removing all scsi devices of the ata devices connected to
>> the port before scheduling libata EH to disable the ATA devices.
>>
>> Fixes: 720ba12620ee ("[PATCH] libata-hp: update unload-unplug")
>> Cc: stable@vger.kernel.org
>> Signed-off-by: Damien Le Moal <dlemoal@kernel.org>
>> Reviewed-by: Hannes Reinecke <hare@suse.de>
>> Tested-by: Chia-Lin Kao (AceLan) <acelan.kao@canonical.com>
>> ---
>>  drivers/ata/libata-core.c | 21 ++++++++++++++++++++-
>>  1 file changed, 20 insertions(+), 1 deletion(-)
>>
>> diff --git a/drivers/ata/libata-core.c b/drivers/ata/libata-core.c
>> index c4898483d716..693cb3cd70cd 100644
>> --- a/drivers/ata/libata-core.c
>> +++ b/drivers/ata/libata-core.c
>> @@ -5952,11 +5952,30 @@ static void ata_port_detach(struct ata_port *ap)
>>  	struct ata_link *link;
>>  	struct ata_device *dev;
>>  
>> -	/* tell EH we're leaving & flush EH */
>> +	/* Wait for any ongoing EH */
>> +	ata_port_wait_eh(ap);
>> +
>> +	mutex_lock(&ap->scsi_scan_mutex);
>>  	spin_lock_irqsave(ap->lock, flags);
>> +
>> +	/* Remove scsi devices */
>> +	ata_for_each_link(link, ap, HOST_FIRST) {
>> +		ata_for_each_dev(dev, link, ALL) {
>> +			if (dev->sdev) {
>> +				spin_unlock_irqrestore(ap->lock, flags);
>> +				scsi_remove_device(dev->sdev);
>> +				spin_lock_irqsave(ap->lock, flags);
>> +				dev->sdev = NULL;
>> +			}
>> +		}
>> +	}
>> +
>> +	/* Tell EH to disable all devices */
>>  	ap->pflags |= ATA_PFLAG_UNLOADING;
>>  	ata_port_schedule_eh(ap);
>> +
>>  	spin_unlock_irqrestore(ap->lock, flags);
>> +	mutex_unlock(&ap->scsi_scan_mutex);
>>  
>>  	/* wait till EH commits suicide */
>>  	ata_port_wait_eh(ap);
>> -- 
>> 2.41.0
>>
> 
> Looks good:
> Reviewed-by: Niklas Cassel <niklas.cassel@wdc.com>
> 
> Could you perhaps also fix the WARN_ON(!(ap->pflags & ATA_PFLAG_UNLOADED));
> in this very same function.
> 
> The pflag is checked without holding the lock.

It does not matter much at that point since EH is dead, or supposed to be, which
is what the WARN_ON() checks.

> 
> 
> Kind regards,
> Niklas
diff mbox series

Patch

diff --git a/drivers/ata/libata-core.c b/drivers/ata/libata-core.c
index c4898483d716..693cb3cd70cd 100644
--- a/drivers/ata/libata-core.c
+++ b/drivers/ata/libata-core.c
@@ -5952,11 +5952,30 @@  static void ata_port_detach(struct ata_port *ap)
 	struct ata_link *link;
 	struct ata_device *dev;
 
-	/* tell EH we're leaving & flush EH */
+	/* Wait for any ongoing EH */
+	ata_port_wait_eh(ap);
+
+	mutex_lock(&ap->scsi_scan_mutex);
 	spin_lock_irqsave(ap->lock, flags);
+
+	/* Remove scsi devices */
+	ata_for_each_link(link, ap, HOST_FIRST) {
+		ata_for_each_dev(dev, link, ALL) {
+			if (dev->sdev) {
+				spin_unlock_irqrestore(ap->lock, flags);
+				scsi_remove_device(dev->sdev);
+				spin_lock_irqsave(ap->lock, flags);
+				dev->sdev = NULL;
+			}
+		}
+	}
+
+	/* Tell EH to disable all devices */
 	ap->pflags |= ATA_PFLAG_UNLOADING;
 	ata_port_schedule_eh(ap);
+
 	spin_unlock_irqrestore(ap->lock, flags);
+	mutex_unlock(&ap->scsi_scan_mutex);
 
 	/* wait till EH commits suicide */
 	ata_port_wait_eh(ap);