diff mbox

[1/1] scsi subsystem : fix function __scsi_device_lookup

Message ID 1444894715-4906-1-git-send-email-johnzzpcrystal@gmail.com (mailing list archive)
State New, archived
Headers show

Commit Message

Zhengping Zhou Oct. 15, 2015, 7:38 a.m. UTC
when a scsi_device is unpluged from scsi controller, if the
scsi_device is still be used by application layer,it won't be
released until users release it. In this case, scsi_device_remove just set
the scsi_device's state to be SDEV_DEL. But if you plug the disk
just before the old scsi_device is released, then there will be two
scsi_device structures in scsi_host->__devices. when the next unpluging 
event happens,some low-level drivers will check whether the scsi_device 
has been added to host (for example, the megaraid sas series controller) 
by calling scsi_device_lookup(call __scsi_device_lookup). 
__scsi_device_lookup will return the first scsi_device. Because its 
state is SDEV_DEL, the scsi_device_lookup will return NULL finally, 
making the low-level driver assume that the scsi_device has been 
removed,and won't call scsi_device_remove,which will lead the 
failure of hot swap.
Signed-off-by: Zhengping Zhou <johnzzpcrystal@gmail.com>
---
Hi all:
	I'm sorry to bother again,that's my second time to send 
	this patch.
	I find a bug about the failure of hot swap when I am using 
	megaraid sas series controller. Finally I have found that 
	when controller receives the event of hot swap, it will firstly 
	check whether the device is added to the system/host by calling 
	scsi_device_lookup.The logics in function megasas_aen_polling 
	is as follows:
            case MR_EVT_PD_REMOVED:
                    if (megasas_get_pd_list(instance) == 0) { 
                    for (i = 0; i < MEGASAS_MAX_PD_CHANNELS; i++) {
                            for (j = 0; 
                            j < MEGASAS_MAX_DEV_PER_CHANNEL;
                            j++) {

                            pd_index =
                            (i * MEGASAS_MAX_DEV_PER_CHANNEL) + j; 

                            sdev1 = scsi_device_lookup(host, i, j, 0);

                            if (instance->pd_list[pd_index].driveState
                                    == MR_PD_STATE_SYSTEM) {
                                    if (sdev1)
                                            scsi_device_put(sdev1);
                            } else {
                                    if (sdev1) {
                                            scsi_remove_device(sdev1);
                                            scsi_device_put(sdev1);
                                    }    
                            }    
                            }    
                    }    
                    }    
	If the previous scsi_device is not released, this will lead the 
	appearance of two scsi_devices which correspond with the same disk.
	And when the disk is unpluged afterwards, the controller will assume
	that this disk has never been added into the system/host. Thus it won't 
	call scsi_device_remove. When I finish this modification, this problem
	is fixed.So far, I have successfully test PCI_DEVICE_ID_LSI_SAS0073SKINNY 
	and PCI_DEVICE_ID_LSI_FURY.
Thanks
Zhengping
---
 drivers/scsi/scsi.c | 2 ++
 1 file changed, 2 insertions(+)

Comments

Hannes Reinecke Oct. 15, 2015, 7:53 a.m. UTC | #1
On 10/15/2015 09:38 AM, Zhengping Zhou wrote:
> when a scsi_device is unpluged from scsi controller, if the
> scsi_device is still be used by application layer,it won't be
> released until users release it. In this case, scsi_device_remove just set
> the scsi_device's state to be SDEV_DEL. But if you plug the disk
> just before the old scsi_device is released, then there will be two
> scsi_device structures in scsi_host->__devices. when the next unpluging 
> event happens,some low-level drivers will check whether the scsi_device 
> has been added to host (for example, the megaraid sas series controller) 
> by calling scsi_device_lookup(call __scsi_device_lookup). 
> __scsi_device_lookup will return the first scsi_device. Because its 
> state is SDEV_DEL, the scsi_device_lookup will return NULL finally, 
> making the low-level driver assume that the scsi_device has been 
> removed,and won't call scsi_device_remove,which will lead the 
> failure of hot swap.
> Signed-off-by: Zhengping Zhou <johnzzpcrystal@gmail.com>
> ---
> Hi all:
> 	I'm sorry to bother again,that's my second time to send 
> 	this patch.
> 	I find a bug about the failure of hot swap when I am using 
> 	megaraid sas series controller. Finally I have found that 
> 	when controller receives the event of hot swap, it will firstly 
> 	check whether the device is added to the system/host by calling 
> 	scsi_device_lookup.The logics in function megasas_aen_polling 
> 	is as follows:
>             case MR_EVT_PD_REMOVED:
>                     if (megasas_get_pd_list(instance) == 0) { 
>                     for (i = 0; i < MEGASAS_MAX_PD_CHANNELS; i++) {
>                             for (j = 0; 
>                             j < MEGASAS_MAX_DEV_PER_CHANNEL;
>                             j++) {
> 
>                             pd_index =
>                             (i * MEGASAS_MAX_DEV_PER_CHANNEL) + j; 
> 
>                             sdev1 = scsi_device_lookup(host, i, j, 0);
> 
>                             if (instance->pd_list[pd_index].driveState
>                                     == MR_PD_STATE_SYSTEM) {
>                                     if (sdev1)
>                                             scsi_device_put(sdev1);
>                             } else {
>                                     if (sdev1) {
>                                             scsi_remove_device(sdev1);
>                                             scsi_device_put(sdev1);
>                                     }    
>                             }    
>                             }    
>                     }    
>                     }    
> 	If the previous scsi_device is not released, this will lead the 
> 	appearance of two scsi_devices which correspond with the same disk.
> 	And when the disk is unpluged afterwards, the controller will assume
> 	that this disk has never been added into the system/host. Thus it won't 
> 	call scsi_device_remove. When I finish this modification, this problem
> 	is fixed.So far, I have successfully test PCI_DEVICE_ID_LSI_SAS0073SKINNY 
> 	and PCI_DEVICE_ID_LSI_FURY.
> Thanks
> Zhengping
> ---
>  drivers/scsi/scsi.c | 2 ++
>  1 file changed, 2 insertions(+)
> 
> diff --git a/drivers/scsi/scsi.c b/drivers/scsi/scsi.c
> index 207d6a7..5251d6d 100644
> --- a/drivers/scsi/scsi.c
> +++ b/drivers/scsi/scsi.c
> @@ -1118,6 +1118,8 @@ struct scsi_device *__scsi_device_lookup(struct Scsi_Host *shost,
>  	struct scsi_device *sdev;
>  
>  	list_for_each_entry(sdev, &shost->__devices, siblings) {
> +		if (sdev->sdev_state == SDEV_DEL)
> +			continue;
>  		if (sdev->channel == channel && sdev->id == id &&
>  				sdev->lun ==lun)
>  			return sdev;
> 
Ho-hum.

So lookup will return NULL, which then will cause the subsequent
functions to assume the scsi_device is not present, right?

And if you're _really_ unlucky it'll continue to add this device
(with the same LUN, target, bus, and host number!) to the list,
resulting in us having _two_ devices with the same number on the list.

Happy lookup.

I guess this calls for the lock rework from Johannes ...

Cheers,

Hannes
Zhengping Zhou Oct. 15, 2015, 8:22 a.m. UTC | #2
Yes

But it's interesting ,when you continue to add this device (with the
same LUN, target, bus, and host number!) to the list,scsi_add_device
will call scsi_probe_and_add_lun,and it will call
scsi_device_lookup_by_target(starget, lun) to recheck its existence
finally.but scsi_device_lookup_by_target doesn't has the problem  what
I mentioned in this patch .So it's OK when you add this device again.

Forgive my poor English!
2015-10-15 15:53 GMT+08:00 Hannes Reinecke <hare@suse.de>:
> On 10/15/2015 09:38 AM, Zhengping Zhou wrote:
>> when a scsi_device is unpluged from scsi controller, if the
>> scsi_device is still be used by application layer,it won't be
>> released until users release it. In this case, scsi_device_remove just set
>> the scsi_device's state to be SDEV_DEL. But if you plug the disk
>> just before the old scsi_device is released, then there will be two
>> scsi_device structures in scsi_host->__devices. when the next unpluging
>> event happens,some low-level drivers will check whether the scsi_device
>> has been added to host (for example, the megaraid sas series controller)
>> by calling scsi_device_lookup(call __scsi_device_lookup).
>> __scsi_device_lookup will return the first scsi_device. Because its
>> state is SDEV_DEL, the scsi_device_lookup will return NULL finally,
>> making the low-level driver assume that the scsi_device has been
>> removed,and won't call scsi_device_remove,which will lead the
>> failure of hot swap.
>> Signed-off-by: Zhengping Zhou <johnzzpcrystal@gmail.com>
>> ---
>> Hi all:
>>       I'm sorry to bother again,that's my second time to send
>>       this patch.
>>       I find a bug about the failure of hot swap when I am using
>>       megaraid sas series controller. Finally I have found that
>>       when controller receives the event of hot swap, it will firstly
>>       check whether the device is added to the system/host by calling
>>       scsi_device_lookup.The logics in function megasas_aen_polling
>>       is as follows:
>>             case MR_EVT_PD_REMOVED:
>>                     if (megasas_get_pd_list(instance) == 0) {
>>                     for (i = 0; i < MEGASAS_MAX_PD_CHANNELS; i++) {
>>                             for (j = 0;
>>                             j < MEGASAS_MAX_DEV_PER_CHANNEL;
>>                             j++) {
>>
>>                             pd_index =
>>                             (i * MEGASAS_MAX_DEV_PER_CHANNEL) + j;
>>
>>                             sdev1 = scsi_device_lookup(host, i, j, 0);
>>
>>                             if (instance->pd_list[pd_index].driveState
>>                                     == MR_PD_STATE_SYSTEM) {
>>                                     if (sdev1)
>>                                             scsi_device_put(sdev1);
>>                             } else {
>>                                     if (sdev1) {
>>                                             scsi_remove_device(sdev1);
>>                                             scsi_device_put(sdev1);
>>                                     }
>>                             }
>>                             }
>>                     }
>>                     }
>>       If the previous scsi_device is not released, this will lead the
>>       appearance of two scsi_devices which correspond with the same disk.
>>       And when the disk is unpluged afterwards, the controller will assume
>>       that this disk has never been added into the system/host. Thus it won't
>>       call scsi_device_remove. When I finish this modification, this problem
>>       is fixed.So far, I have successfully test PCI_DEVICE_ID_LSI_SAS0073SKINNY
>>       and PCI_DEVICE_ID_LSI_FURY.
>> Thanks
>> Zhengping
>> ---
>>  drivers/scsi/scsi.c | 2 ++
>>  1 file changed, 2 insertions(+)
>>
>> diff --git a/drivers/scsi/scsi.c b/drivers/scsi/scsi.c
>> index 207d6a7..5251d6d 100644
>> --- a/drivers/scsi/scsi.c
>> +++ b/drivers/scsi/scsi.c
>> @@ -1118,6 +1118,8 @@ struct scsi_device *__scsi_device_lookup(struct Scsi_Host *shost,
>>       struct scsi_device *sdev;
>>
>>       list_for_each_entry(sdev, &shost->__devices, siblings) {
>> +             if (sdev->sdev_state == SDEV_DEL)
>> +                     continue;
>>               if (sdev->channel == channel && sdev->id == id &&
>>                               sdev->lun ==lun)
>>                       return sdev;
>>
> Ho-hum.
>
> So lookup will return NULL, which then will cause the subsequent
> functions to assume the scsi_device is not present, right?
>
> And if you're _really_ unlucky it'll continue to add this device
> (with the same LUN, target, bus, and host number!) to the list,
> resulting in us having _two_ devices with the same number on the list.
>
> Happy lookup.
>
> I guess this calls for the lock rework from Johannes ...
>
> Cheers,
>
> Hannes
> --
> Dr. Hannes Reinecke                            zSeries & Storage
> hare@suse.de                                   +49 911 74053 688
> SUSE LINUX GmbH, Maxfeldstr. 5, 90409 Nürnberg
> GF: F. Imendörffer, J. Smithard, J. Guild, D. Upmanyu, G. Norton
> HRB 21284 (AG Nürnberg)
--
To unsubscribe from this list: send the line "unsubscribe linux-scsi" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Zhengping Zhou Oct. 16, 2015, 12:08 p.m. UTC | #3
2015-10-15 16:22 GMT+08:00 Zhengping Zhou <johnzzpcrystal@gmail.com>:
> Yes
>
> But it's interesting ,when you continue to add this device (with the
> same LUN, target, bus, and host number!) to the list,scsi_add_device
> will call scsi_probe_and_add_lun,and it will call
> scsi_device_lookup_by_target(starget, lun) to recheck its existence
> finally.but scsi_device_lookup_by_target doesn't has the problem  what
> I mentioned in this patch .So it's OK when you add this device again.
>
> Forgive my poor English!
> 2015-10-15 15:53 GMT+08:00 Hannes Reinecke <hare@suse.de>:
>> On 10/15/2015 09:38 AM, Zhengping Zhou wrote:
>>> when a scsi_device is unpluged from scsi controller, if the
>>> scsi_device is still be used by application layer,it won't be
>>> released until users release it. In this case, scsi_device_remove just set
>>> the scsi_device's state to be SDEV_DEL. But if you plug the disk
>>> just before the old scsi_device is released, then there will be two
>>> scsi_device structures in scsi_host->__devices. when the next unpluging
>>> event happens,some low-level drivers will check whether the scsi_device
>>> has been added to host (for example, the megaraid sas series controller)
>>> by calling scsi_device_lookup(call __scsi_device_lookup).
>>> __scsi_device_lookup will return the first scsi_device. Because its
>>> state is SDEV_DEL, the scsi_device_lookup will return NULL finally,
>>> making the low-level driver assume that the scsi_device has been
>>> removed,and won't call scsi_device_remove,which will lead the
>>> failure of hot swap.
>>> Signed-off-by: Zhengping Zhou <johnzzpcrystal@gmail.com>
>>> ---
>>> Hi all:
>>>       I'm sorry to bother again,that's my second time to send
>>>       this patch.
>>>       I find a bug about the failure of hot swap when I am using
>>>       megaraid sas series controller. Finally I have found that
>>>       when controller receives the event of hot swap, it will firstly
>>>       check whether the device is added to the system/host by calling
>>>       scsi_device_lookup.The logics in function megasas_aen_polling
>>>       is as follows:
>>>             case MR_EVT_PD_REMOVED:
>>>                     if (megasas_get_pd_list(instance) == 0) {
>>>                     for (i = 0; i < MEGASAS_MAX_PD_CHANNELS; i++) {
>>>                             for (j = 0;
>>>                             j < MEGASAS_MAX_DEV_PER_CHANNEL;
>>>                             j++) {
>>>
>>>                             pd_index =
>>>                             (i * MEGASAS_MAX_DEV_PER_CHANNEL) + j;
>>>
>>>                             sdev1 = scsi_device_lookup(host, i, j, 0);
>>>
>>>                             if (instance->pd_list[pd_index].driveState
>>>                                     == MR_PD_STATE_SYSTEM) {
>>>                                     if (sdev1)
>>>                                             scsi_device_put(sdev1);
>>>                             } else {
>>>                                     if (sdev1) {
>>>                                             scsi_remove_device(sdev1);
>>>                                             scsi_device_put(sdev1);
>>>                                     }
>>>                             }
>>>                             }
>>>                     }
>>>                     }
>>>       If the previous scsi_device is not released, this will lead the
>>>       appearance of two scsi_devices which correspond with the same disk.
>>>       And when the disk is unpluged afterwards, the controller will assume
>>>       that this disk has never been added into the system/host. Thus it won't
>>>       call scsi_device_remove. When I finish this modification, this problem
>>>       is fixed.So far, I have successfully test PCI_DEVICE_ID_LSI_SAS0073SKINNY
>>>       and PCI_DEVICE_ID_LSI_FURY.
>>> Thanks
>>> Zhengping
>>> ---
>>>  drivers/scsi/scsi.c | 2 ++
>>>  1 file changed, 2 insertions(+)
>>>
>>> diff --git a/drivers/scsi/scsi.c b/drivers/scsi/scsi.c
>>> index 207d6a7..5251d6d 100644
>>> --- a/drivers/scsi/scsi.c
>>> +++ b/drivers/scsi/scsi.c
>>> @@ -1118,6 +1118,8 @@ struct scsi_device *__scsi_device_lookup(struct Scsi_Host *shost,
>>>       struct scsi_device *sdev;
>>>
>>>       list_for_each_entry(sdev, &shost->__devices, siblings) {
>>> +             if (sdev->sdev_state == SDEV_DEL)
>>> +                     continue;
>>>               if (sdev->channel == channel && sdev->id == id &&
>>>                               sdev->lun ==lun)
>>>                       return sdev;
>>>
>> Ho-hum.
>>
>> So lookup will return NULL, which then will cause the subsequent
>> functions to assume the scsi_device is not present, right?
>>
>> And if you're _really_ unlucky it'll continue to add this device
>> (with the same LUN, target, bus, and host number!) to the list,
>> resulting in us having _two_ devices with the same number on the list.
>>
>> Happy lookup.
>>
>> I guess this calls for the lock rework from Johannes ...
>>
>> Cheers,
>>
>> Hannes
>> --
>> Dr. Hannes Reinecke                            zSeries & Storage
>> hare@suse.de                                   +49 911 74053 688
>> SUSE LINUX GmbH, Maxfeldstr. 5, 90409 Nürnberg
>> GF: F. Imendörffer, J. Smithard, J. Guild, D. Upmanyu, G. Norton
>> HRB 21284 (AG Nürnberg)




You can compare the function __scsi_device_lookup_by_target and the
function __scsi_device_lookup.
It will be more easy to figure out this patch.
--
To unsubscribe from this list: send the line "unsubscribe linux-scsi" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
diff mbox

Patch

diff --git a/drivers/scsi/scsi.c b/drivers/scsi/scsi.c
index 207d6a7..5251d6d 100644
--- a/drivers/scsi/scsi.c
+++ b/drivers/scsi/scsi.c
@@ -1118,6 +1118,8 @@  struct scsi_device *__scsi_device_lookup(struct Scsi_Host *shost,
 	struct scsi_device *sdev;
 
 	list_for_each_entry(sdev, &shost->__devices, siblings) {
+		if (sdev->sdev_state == SDEV_DEL)
+			continue;
 		if (sdev->channel == channel && sdev->id == id &&
 				sdev->lun ==lun)
 			return sdev;