diff mbox series

[02/19] ata: libata-core: Fix port and device removal

Message ID 20230911040217.253905-3-dlemoal@kernel.org (mailing list archive)
State Superseded
Headers show
Series Fix libata suspend/resume handling and code cleanup | expand

Commit Message

Damien Le Moal Sept. 11, 2023, 4:02 a.m. UTC
Whenever an ATA adapter driver is removed (e.g. rmmod),
ata_port_detach() is called repeatedly for all the adapter ports to
remove (unload) the devices attached to the port and delete the port
device itself. Removing of devices is done using libata EH with the
ATA_PFLAG_UNLOADING port flag set. This causes libata EH to execute
ata_eh_unload() which disables all devices attached to the port.

ata_port_detach() finishes by calling scsi_remove_host() to remove the
scsi host associated with the port. This function will trigger the
removal of all scsi devices attached to the host and in the case of
disks, calls to sd_shutdown() which will flush the device write cache
and stop the device. However, given that the devices were already
disabled by ata_eh_unload(), the synchronize write cache command and
start stop unit commands fail. E.g. running "rmmod ahci" with first
removing sd_mod results in error messages like:

ata13.00: disable device
sd 0:0:0:0: [sda] Synchronizing SCSI cache
sd 0:0:0:0: [sda] Synchronize Cache(10) failed: Result: hostbyte=DID_BAD_TARGET driverbyte=DRIVER_OK
sd 0:0:0:0: [sda] Stopping disk
sd 0:0:0:0: [sda] Start/Stop Unit failed: Result: hostbyte=DID_BAD_TARGET driverbyte=DRIVER_OK

Fix this by removing all scsi devices of the ata devices connected to
the port before scheduling libata EH to disable the ATA devices.

Fixes: 720ba12620ee ("[PATCH] libata-hp: update unload-unplug")
Cc: stable@vger.kernel.org
Signed-off-by: Damien Le Moal <dlemoal@kernel.org>
---
 drivers/ata/libata-core.c | 21 ++++++++++++++++++++-
 1 file changed, 20 insertions(+), 1 deletion(-)

Comments

Hannes Reinecke Sept. 11, 2023, 6:37 a.m. UTC | #1
On 9/11/23 06:02, Damien Le Moal wrote:
> Whenever an ATA adapter driver is removed (e.g. rmmod),
> ata_port_detach() is called repeatedly for all the adapter ports to
> remove (unload) the devices attached to the port and delete the port
> device itself. Removing of devices is done using libata EH with the
> ATA_PFLAG_UNLOADING port flag set. This causes libata EH to execute
> ata_eh_unload() which disables all devices attached to the port.
> 
> ata_port_detach() finishes by calling scsi_remove_host() to remove the
> scsi host associated with the port. This function will trigger the
> removal of all scsi devices attached to the host and in the case of
> disks, calls to sd_shutdown() which will flush the device write cache
> and stop the device. However, given that the devices were already
> disabled by ata_eh_unload(), the synchronize write cache command and
> start stop unit commands fail. E.g. running "rmmod ahci" with first
> removing sd_mod results in error messages like:
> 
> ata13.00: disable device
> sd 0:0:0:0: [sda] Synchronizing SCSI cache
> sd 0:0:0:0: [sda] Synchronize Cache(10) failed: Result: hostbyte=DID_BAD_TARGET driverbyte=DRIVER_OK
> sd 0:0:0:0: [sda] Stopping disk
> sd 0:0:0:0: [sda] Start/Stop Unit failed: Result: hostbyte=DID_BAD_TARGET driverbyte=DRIVER_OK
> 
> Fix this by removing all scsi devices of the ata devices connected to
> the port before scheduling libata EH to disable the ATA devices.
> 
> Fixes: 720ba12620ee ("[PATCH] libata-hp: update unload-unplug")
> Cc: stable@vger.kernel.org
> Signed-off-by: Damien Le Moal <dlemoal@kernel.org>
> ---
>   drivers/ata/libata-core.c | 21 ++++++++++++++++++++-
>   1 file changed, 20 insertions(+), 1 deletion(-)
> 
> diff --git a/drivers/ata/libata-core.c b/drivers/ata/libata-core.c
> index c4898483d716..693cb3cd70cd 100644
> --- a/drivers/ata/libata-core.c
> +++ b/drivers/ata/libata-core.c
> @@ -5952,11 +5952,30 @@ static void ata_port_detach(struct ata_port *ap)
>   	struct ata_link *link;
>   	struct ata_device *dev;
>   
> -	/* tell EH we're leaving & flush EH */
> +	/* Wait for any ongoing EH */
> +	ata_port_wait_eh(ap);
> +
> +	mutex_lock(&ap->scsi_scan_mutex);
>   	spin_lock_irqsave(ap->lock, flags);
> +
> +	/* Remove scsi devices */
> +	ata_for_each_link(link, ap, HOST_FIRST) {
> +		ata_for_each_dev(dev, link, ALL) {
> +			if (dev->sdev) {
> +				spin_unlock_irqrestore(ap->lock, flags);
> +				scsi_remove_device(dev->sdev);
> +				spin_lock_irqsave(ap->lock, flags);
> +				dev->sdev = NULL;
> +			}
> +		}
> +	}
> +
> +	/* Tell EH to disable all devices */
>   	ap->pflags |= ATA_PFLAG_UNLOADING;
>   	ata_port_schedule_eh(ap);
> +
>   	spin_unlock_irqrestore(ap->lock, flags);
> +	mutex_unlock(&ap->scsi_scan_mutex);
>   
Are you sure about releasing scan_mutex after ata_port_schedule_eh()?
Technically ata_port_schedule_eh() will be calling into SCSI rescan, 
which would take scan_mutex ...
Not that it matter much, seeing that we've disconnected all devices, but 
still ...

Cheers,

Hannes
Damien Le Moal Sept. 11, 2023, 6:44 a.m. UTC | #2
On 9/11/23 15:37, Hannes Reinecke wrote:
> On 9/11/23 06:02, Damien Le Moal wrote:
>> Whenever an ATA adapter driver is removed (e.g. rmmod),
>> ata_port_detach() is called repeatedly for all the adapter ports to
>> remove (unload) the devices attached to the port and delete the port
>> device itself. Removing of devices is done using libata EH with the
>> ATA_PFLAG_UNLOADING port flag set. This causes libata EH to execute
>> ata_eh_unload() which disables all devices attached to the port.
>>
>> ata_port_detach() finishes by calling scsi_remove_host() to remove the
>> scsi host associated with the port. This function will trigger the
>> removal of all scsi devices attached to the host and in the case of
>> disks, calls to sd_shutdown() which will flush the device write cache
>> and stop the device. However, given that the devices were already
>> disabled by ata_eh_unload(), the synchronize write cache command and
>> start stop unit commands fail. E.g. running "rmmod ahci" with first
>> removing sd_mod results in error messages like:
>>
>> ata13.00: disable device
>> sd 0:0:0:0: [sda] Synchronizing SCSI cache
>> sd 0:0:0:0: [sda] Synchronize Cache(10) failed: Result:
>> hostbyte=DID_BAD_TARGET driverbyte=DRIVER_OK
>> sd 0:0:0:0: [sda] Stopping disk
>> sd 0:0:0:0: [sda] Start/Stop Unit failed: Result: hostbyte=DID_BAD_TARGET
>> driverbyte=DRIVER_OK
>>
>> Fix this by removing all scsi devices of the ata devices connected to
>> the port before scheduling libata EH to disable the ATA devices.
>>
>> Fixes: 720ba12620ee ("[PATCH] libata-hp: update unload-unplug")
>> Cc: stable@vger.kernel.org
>> Signed-off-by: Damien Le Moal <dlemoal@kernel.org>
>> ---
>>   drivers/ata/libata-core.c | 21 ++++++++++++++++++++-
>>   1 file changed, 20 insertions(+), 1 deletion(-)
>>
>> diff --git a/drivers/ata/libata-core.c b/drivers/ata/libata-core.c
>> index c4898483d716..693cb3cd70cd 100644
>> --- a/drivers/ata/libata-core.c
>> +++ b/drivers/ata/libata-core.c
>> @@ -5952,11 +5952,30 @@ static void ata_port_detach(struct ata_port *ap)
>>       struct ata_link *link;
>>       struct ata_device *dev;
>>   -    /* tell EH we're leaving & flush EH */
>> +    /* Wait for any ongoing EH */
>> +    ata_port_wait_eh(ap);
>> +
>> +    mutex_lock(&ap->scsi_scan_mutex);
>>       spin_lock_irqsave(ap->lock, flags);
>> +
>> +    /* Remove scsi devices */
>> +    ata_for_each_link(link, ap, HOST_FIRST) {
>> +        ata_for_each_dev(dev, link, ALL) {
>> +            if (dev->sdev) {
>> +                spin_unlock_irqrestore(ap->lock, flags);
>> +                scsi_remove_device(dev->sdev);
>> +                spin_lock_irqsave(ap->lock, flags);
>> +                dev->sdev = NULL;
>> +            }
>> +        }
>> +    }
>> +
>> +    /* Tell EH to disable all devices */
>>       ap->pflags |= ATA_PFLAG_UNLOADING;
>>       ata_port_schedule_eh(ap);
>> +
>>       spin_unlock_irqrestore(ap->lock, flags);
>> +    mutex_unlock(&ap->scsi_scan_mutex);
>>   
> Are you sure about releasing scan_mutex after ata_port_schedule_eh()?
> Technically ata_port_schedule_eh() will be calling into SCSI rescan, which
> would take scan_mutex ...
> Not that it matter much, seeing that we've disconnected all devices, but still ...

UNLOADING is set and in that case, EH does nothing else than removing the
devices. So I think this is OK.

> 
> Cheers,
> 
> Hannes
Hannes Reinecke Sept. 11, 2023, 7:07 a.m. UTC | #3
On 9/11/23 08:44, Damien Le Moal wrote:
> On 9/11/23 15:37, Hannes Reinecke wrote:
>> On 9/11/23 06:02, Damien Le Moal wrote:
>>> Whenever an ATA adapter driver is removed (e.g. rmmod),
>>> ata_port_detach() is called repeatedly for all the adapter ports to
>>> remove (unload) the devices attached to the port and delete the port
>>> device itself. Removing of devices is done using libata EH with the
>>> ATA_PFLAG_UNLOADING port flag set. This causes libata EH to execute
>>> ata_eh_unload() which disables all devices attached to the port.
>>>
>>> ata_port_detach() finishes by calling scsi_remove_host() to remove the
>>> scsi host associated with the port. This function will trigger the
>>> removal of all scsi devices attached to the host and in the case of
>>> disks, calls to sd_shutdown() which will flush the device write cache
>>> and stop the device. However, given that the devices were already
>>> disabled by ata_eh_unload(), the synchronize write cache command and
>>> start stop unit commands fail. E.g. running "rmmod ahci" with first
>>> removing sd_mod results in error messages like:
>>>
>>> ata13.00: disable device
>>> sd 0:0:0:0: [sda] Synchronizing SCSI cache
>>> sd 0:0:0:0: [sda] Synchronize Cache(10) failed: Result:
>>> hostbyte=DID_BAD_TARGET driverbyte=DRIVER_OK
>>> sd 0:0:0:0: [sda] Stopping disk
>>> sd 0:0:0:0: [sda] Start/Stop Unit failed: Result: hostbyte=DID_BAD_TARGET
>>> driverbyte=DRIVER_OK
>>>
>>> Fix this by removing all scsi devices of the ata devices connected to
>>> the port before scheduling libata EH to disable the ATA devices.
>>>
>>> Fixes: 720ba12620ee ("[PATCH] libata-hp: update unload-unplug")
>>> Cc: stable@vger.kernel.org
>>> Signed-off-by: Damien Le Moal <dlemoal@kernel.org>
>>> ---
>>>    drivers/ata/libata-core.c | 21 ++++++++++++++++++++-
>>>    1 file changed, 20 insertions(+), 1 deletion(-)
>>>
>>> diff --git a/drivers/ata/libata-core.c b/drivers/ata/libata-core.c
>>> index c4898483d716..693cb3cd70cd 100644
>>> --- a/drivers/ata/libata-core.c
>>> +++ b/drivers/ata/libata-core.c
>>> @@ -5952,11 +5952,30 @@ static void ata_port_detach(struct ata_port *ap)
>>>        struct ata_link *link;
>>>        struct ata_device *dev;
>>>    -    /* tell EH we're leaving & flush EH */
>>> +    /* Wait for any ongoing EH */
>>> +    ata_port_wait_eh(ap);
>>> +
>>> +    mutex_lock(&ap->scsi_scan_mutex);
>>>        spin_lock_irqsave(ap->lock, flags);
>>> +
>>> +    /* Remove scsi devices */
>>> +    ata_for_each_link(link, ap, HOST_FIRST) {
>>> +        ata_for_each_dev(dev, link, ALL) {
>>> +            if (dev->sdev) {
>>> +                spin_unlock_irqrestore(ap->lock, flags);
>>> +                scsi_remove_device(dev->sdev);
>>> +                spin_lock_irqsave(ap->lock, flags);
>>> +                dev->sdev = NULL;
>>> +            }
>>> +        }
>>> +    }
>>> +
>>> +    /* Tell EH to disable all devices */
>>>        ap->pflags |= ATA_PFLAG_UNLOADING;
>>>        ata_port_schedule_eh(ap);
>>> +
>>>        spin_unlock_irqrestore(ap->lock, flags);
>>> +    mutex_unlock(&ap->scsi_scan_mutex);
>>>    
>> Are you sure about releasing scan_mutex after ata_port_schedule_eh()?
>> Technically ata_port_schedule_eh() will be calling into SCSI rescan, which
>> would take scan_mutex ...
>> Not that it matter much, seeing that we've disconnected all devices, but still ...
> 
> UNLOADING is set and in that case, EH does nothing else than removing the
> devices. So I think this is OK.
> 
Yeah, thought so. Just wanted to mention it.

Reviewed-by: Hannes Reinecke <hare@suse.de>

Cheers,

Hannes
Chia-Lin Kao (AceLan) Sept. 13, 2023, 1:43 a.m. UTC | #4
On Mon, Sep 11, 2023 at 01:02:00PM +0900, Damien Le Moal wrote:
> Whenever an ATA adapter driver is removed (e.g. rmmod),
> ata_port_detach() is called repeatedly for all the adapter ports to
> remove (unload) the devices attached to the port and delete the port
> device itself. Removing of devices is done using libata EH with the
> ATA_PFLAG_UNLOADING port flag set. This causes libata EH to execute
> ata_eh_unload() which disables all devices attached to the port.
> 
> ata_port_detach() finishes by calling scsi_remove_host() to remove the
> scsi host associated with the port. This function will trigger the
> removal of all scsi devices attached to the host and in the case of
> disks, calls to sd_shutdown() which will flush the device write cache
> and stop the device. However, given that the devices were already
> disabled by ata_eh_unload(), the synchronize write cache command and
> start stop unit commands fail. E.g. running "rmmod ahci" with first
> removing sd_mod results in error messages like:
> 
> ata13.00: disable device
> sd 0:0:0:0: [sda] Synchronizing SCSI cache
> sd 0:0:0:0: [sda] Synchronize Cache(10) failed: Result: hostbyte=DID_BAD_TARGET driverbyte=DRIVER_OK
> sd 0:0:0:0: [sda] Stopping disk
> sd 0:0:0:0: [sda] Start/Stop Unit failed: Result: hostbyte=DID_BAD_TARGET driverbyte=DRIVER_OK
> 
> Fix this by removing all scsi devices of the ata devices connected to
> the port before scheduling libata EH to disable the ATA devices.
> 
> Fixes: 720ba12620ee ("[PATCH] libata-hp: update unload-unplug")
> Cc: stable@vger.kernel.org
> Signed-off-by: Damien Le Moal <dlemoal@kernel.org>
Tested-by: Chia-Lin Kao (AceLan) <acelan.kao@canonical.com>
diff mbox series

Patch

diff --git a/drivers/ata/libata-core.c b/drivers/ata/libata-core.c
index c4898483d716..693cb3cd70cd 100644
--- a/drivers/ata/libata-core.c
+++ b/drivers/ata/libata-core.c
@@ -5952,11 +5952,30 @@  static void ata_port_detach(struct ata_port *ap)
 	struct ata_link *link;
 	struct ata_device *dev;
 
-	/* tell EH we're leaving & flush EH */
+	/* Wait for any ongoing EH */
+	ata_port_wait_eh(ap);
+
+	mutex_lock(&ap->scsi_scan_mutex);
 	spin_lock_irqsave(ap->lock, flags);
+
+	/* Remove scsi devices */
+	ata_for_each_link(link, ap, HOST_FIRST) {
+		ata_for_each_dev(dev, link, ALL) {
+			if (dev->sdev) {
+				spin_unlock_irqrestore(ap->lock, flags);
+				scsi_remove_device(dev->sdev);
+				spin_lock_irqsave(ap->lock, flags);
+				dev->sdev = NULL;
+			}
+		}
+	}
+
+	/* Tell EH to disable all devices */
 	ap->pflags |= ATA_PFLAG_UNLOADING;
 	ata_port_schedule_eh(ap);
+
 	spin_unlock_irqrestore(ap->lock, flags);
+	mutex_unlock(&ap->scsi_scan_mutex);
 
 	/* wait till EH commits suicide */
 	ata_port_wait_eh(ap);