diff mbox

mpt2sas: mpt3sas: Fix memory corruption during initialization

Message ID 1430751931-14002-1-git-send-email-Sreekanth.Reddy@avagotech.com (mailing list archive)
State New, archived
Headers show

Commit Message

Sreekanth Reddy May 4, 2015, 3:05 p.m. UTC
I have applied this patch on the latest upstream mpt3sas driver, then I have compiled and loaded the driver.
In the driver logs I didn't see any attached drives are added to the OS, 'fdisk -l' command also doesn't list
 the drives which are actually attached to the HBA.

When I debug this issue then I see that in '_scsih_target_alloc'
 driver is searching for sas_device from the lists 'sas_device_init_list' & 'sas_device_list'
 based on the device sas address using the function mpt3sas_scsih_sas_device_find_by_sas_address(),
 since this device is not in the 'sas_device_init_list' (as it is moved it to head list) driver exit
 from this function without updating the required device addition information.

To solve the original problem (i.e memory corruption), here I have attached the patch,
 in this patch I have added one atomic flag is_on_sas_device_init_list in _sas_device_structure
 and I followed below algorithm.

1. when ever a device is added to sas_device_init_list then driver will set this atomic flag of this device to one.

2. And during the addition of this device to SCSI mid layer,
        if the device is successfully added to the OS then driver will move this device list in to sas_device_list list from sas_device_init_list list and at this time driver will reset this flag to zero.
        if device is failed to register with SCSI mid layer then also driver will reset this flag to zero in function _scsih_sas_device_remove and will remove the device entry from sas_device_init_list and will free the device structure.

3. Now when a device is removed then driver will receive target not responding event and in the function _scsih_device_remove_by_handle,
         a. driver will check whether addition of discovered devices to SML process is currently running or not,
               i. if addition (or registration) of discovered devices to SML process is running then driver will check whether device is in sas_device_init_list or not (by reading the atomic flag)?.
                    if it is in a sas_device_init_list then driver will ignore this device removal event (since device registration with SML will fail and it is removed in function _scsih_sas_device_remove as mentioned in step 2).
             ii. if the device is not in a sas_device_init_list or addition (or registration) of discovered devices to SML process is already completed then device structure is removed from this function and this device entry is removed from sas_device_list.

4. if the device removal event is received after device structure is freed due to failure of device registration with SML them in the function _scsih_device_remove_by_handle driver won't find this device in the sas_device_list or in a sas_device_init_list and so driver will ignore this  device removal event.

Signed-off-by: Sreekanth Reddy <Sreekanth.Reddy@avagotech.com>
---
 drivers/scsi/mpt2sas/mpt2sas_base.h  |  2 ++
 drivers/scsi/mpt2sas/mpt2sas_scsih.c | 45 +++++++++++++++++++++++++++---------
 drivers/scsi/mpt3sas/mpt3sas_base.h  |  2 ++
 drivers/scsi/mpt3sas/mpt3sas_scsih.c | 43 ++++++++++++++++++++++++++--------
 4 files changed, 71 insertions(+), 21 deletions(-)

Comments

Tomas Henzl May 5, 2015, 3:35 p.m. UTC | #1
On 05/04/2015 05:05 PM, Sreekanth Reddy wrote:
> I have applied this patch on the latest upstream mpt3sas driver, then I have compiled and loaded the driver.
> In the driver logs I didn't see any attached drives are added to the OS, 'fdisk -l' command also doesn't list
>  the drives which are actually attached to the HBA.
>
> When I debug this issue then I see that in '_scsih_target_alloc'
>  driver is searching for sas_device from the lists 'sas_device_init_list' & 'sas_device_list'
>  based on the device sas address using the function mpt3sas_scsih_sas_device_find_by_sas_address(),
>  since this device is not in the 'sas_device_init_list' (as it is moved it to head list) driver exit
>  from this function without updating the required device addition information.
>
> To solve the original problem (i.e memory corruption), here I have attached the patch,
>  in this patch I have added one atomic flag is_on_sas_device_init_list in _sas_device_structure
>  and I followed below algorithm.
>
> 1. when ever a device is added to sas_device_init_list then driver will set this atomic flag of this device to one.
>
> 2. And during the addition of this device to SCSI mid layer,
>         if the device is successfully added to the OS then driver will move this device list in to sas_device_list list from sas_device_init_list list and at this time driver will reset this flag to zero.
>         if device is failed to register with SCSI mid layer then also driver will reset this flag to zero in function _scsih_sas_device_remove and will remove the device entry from sas_device_init_list and will free the device structure.
>
> 3. Now when a device is removed then driver will receive target not responding event and in the function _scsih_device_remove_by_handle,
>          a. driver will check whether addition of discovered devices to SML process is currently running or not,
>                i. if addition (or registration) of discovered devices to SML process is running then driver will check whether device is in sas_device_init_list or not (by reading the atomic flag)?.
>                     if it is in a sas_device_init_list then driver will ignore this device removal event (since device registration with SML will fail and it is removed in function _scsih_sas_device_remove as mentioned in step 2).
>              ii. if the device is not in a sas_device_init_list or addition (or registration) of discovered devices to SML process is already completed then device structure is removed from this function and this device entry is removed from sas_device_list.
>
> 4. if the device removal event is received after device structure is freed due to failure of device registration with SML them in the function _scsih_device_remove_by_handle driver won't find this device in the sas_device_list or in a sas_device_init_list and so driver will ignore this  device removal event.
>
> Signed-off-by: Sreekanth Reddy <Sreekanth.Reddy@avagotech.com>
> ---
>  drivers/scsi/mpt2sas/mpt2sas_base.h  |  2 ++
>  drivers/scsi/mpt2sas/mpt2sas_scsih.c | 45 +++++++++++++++++++++++++++---------
>  drivers/scsi/mpt3sas/mpt3sas_base.h  |  2 ++
>  drivers/scsi/mpt3sas/mpt3sas_scsih.c | 43 ++++++++++++++++++++++++++--------
>  4 files changed, 71 insertions(+), 21 deletions(-)
>
> diff --git a/drivers/scsi/mpt2sas/mpt2sas_base.h b/drivers/scsi/mpt2sas/mpt2sas_base.h
> index caff8d1..1aa10d2 100644
> --- a/drivers/scsi/mpt2sas/mpt2sas_base.h
> +++ b/drivers/scsi/mpt2sas/mpt2sas_base.h
> @@ -376,6 +376,7 @@ struct _sas_device {
>  	u8	phy;
>  	u8	responding;
>  	u8	pfa_led_on;
> +	atomic_t is_on_sas_device_init_list;

Hi Sreekanth,
when is_on_sas_device_init_list is used it's protected
by ioc->sas_device_lock - why do you need a atomic_t ?
There is one exception, but easily fixable. 

>  };
>  
>  /**
> @@ -833,6 +834,7 @@ struct MPT2SAS_ADAPTER {
>  	u8		broadcast_aen_busy;
>  	u16		broadcast_aen_pending;
>  	u8		shost_recovery;
> +	u8		discovered_device_addition_on;
>  
>  	struct mutex	reset_in_progress_mutex;
>  	spinlock_t 	ioc_reset_in_progress_lock;
> diff --git a/drivers/scsi/mpt2sas/mpt2sas_scsih.c b/drivers/scsi/mpt2sas/mpt2sas_scsih.c
> index 3f26147..2a61286 100644
> --- a/drivers/scsi/mpt2sas/mpt2sas_scsih.c
> +++ b/drivers/scsi/mpt2sas/mpt2sas_scsih.c
> @@ -590,13 +590,20 @@ _scsih_sas_device_remove(struct MPT2SAS_ADAPTER *ioc,
>      struct _sas_device *sas_device)
>  {
>  	unsigned long flags;
> +	struct _sas_device *same_sas_device;
>  
>  	if (!sas_device)
>  		return;
>  
>  	spin_lock_irqsave(&ioc->sas_device_lock, flags);
> -	list_del(&sas_device->list);
> -	kfree(sas_device);
> +	same_sas_device = _scsih_sas_device_find_by_handle(ioc,
> +						sas_device->handle);

Is it possible that when same_sas_device is not null, that the
value is not the same as for the sas_device ? 

> +	if (same_sas_device) {
> +		list_del(&same_sas_device->list);
> +		if (atomic_read(&sas_device->is_on_sas_device_init_list))

Seems easier to just set the variable without a test.

> +			atomic_set(&sas_device->is_on_sas_device_init_list, 0);
> +		kfree(same_sas_device);
> +	}
>  	spin_unlock_irqrestore(&ioc->sas_device_lock, flags);
>  }
>  
> @@ -658,6 +664,7 @@ _scsih_sas_device_init_add(struct MPT2SAS_ADAPTER *ioc,
>  	    "(0x%04x), sas_addr(0x%016llx)\n", ioc->name, __func__,
>  	    sas_device->handle, (unsigned long long)sas_device->sas_address));
>  
> +	atomic_set(&sas_device->is_on_sas_device_init_list, 1);
>  	spin_lock_irqsave(&ioc->sas_device_lock, flags);
>  	list_add_tail(&sas_device->list, &ioc->sas_device_init_list);
>  	_scsih_determine_boot_device(ioc, sas_device, 0);
> @@ -5364,8 +5371,14 @@ _scsih_device_remove_by_handle(struct MPT2SAS_ADAPTER *ioc, u16 handle)
>  
>  	spin_lock_irqsave(&ioc->sas_device_lock, flags);
>  	sas_device = _scsih_sas_device_find_by_handle(ioc, handle);
> -	if (sas_device)
> -		list_del(&sas_device->list);
> +	if (sas_device) {
> +		if (ioc->discovered_device_addition_on &&
> +		    atomic_read(&sas_device->is_on_sas_device_init_list)) {
> +			spin_unlock_irqrestore(&ioc->sas_device_lock, flags);
> +			return;
> +		} else
> +			list_del(&sas_device->list);
> +	}
>  	spin_unlock_irqrestore(&ioc->sas_device_lock, flags);
>  	if (sas_device)
>  		_scsih_remove_device(ioc, sas_device);
> @@ -5391,8 +5404,14 @@ mpt2sas_device_remove_by_sas_address(struct MPT2SAS_ADAPTER *ioc,
>  	spin_lock_irqsave(&ioc->sas_device_lock, flags);
>  	sas_device = mpt2sas_scsih_sas_device_find_by_sas_address(ioc,
>  	    sas_address);
> -	if (sas_device)
> -		list_del(&sas_device->list);
> +	if (sas_device) {
> +		if (ioc->discovered_device_addition_on &&
> +		    atomic_read(&sas_device->is_on_sas_device_init_list)) {
> +			spin_unlock_irqrestore(&ioc->sas_device_lock, flags);
> +			return;
> +		} else
> +			list_del(&sas_device->list);
> +	}
>  	spin_unlock_irqrestore(&ioc->sas_device_lock, flags);
>  	if (sas_device)
>  		_scsih_remove_device(ioc, sas_device);
> @@ -7978,32 +7997,36 @@ _scsih_probe_sas(struct MPT2SAS_ADAPTER *ioc)
>  	struct _sas_device *sas_device, *next;
>  	unsigned long flags;
>  
> +	ioc->discovered_device_addition_on = 1;
>  	/* SAS Device List */
>  	list_for_each_entry_safe(sas_device, next, &ioc->sas_device_init_list,
>  	    list) {
>  
>  		if (ioc->hide_drives)
>  			continue;
> -
> +
>  		if (!mpt2sas_transport_port_add(ioc, sas_device->handle,
>  		    sas_device->sas_address_parent)) {
> -			list_del(&sas_device->list);
> -			kfree(sas_device);
> +			mpt2sas_transport_port_remove(ioc,
> +					sas_device->sas_address,
> +					sas_device->sas_address_parent);
> +			_scsih_sas_device_remove(ioc, sas_device);
>  			continue;
>  		} else if (!sas_device->starget) {
>  			if (!ioc->is_driver_loading) {
>  				mpt2sas_transport_port_remove(ioc,
>  					sas_device->sas_address,
>  					sas_device->sas_address_parent);
> -				list_del(&sas_device->list);
> -				kfree(sas_device);
> +				_scsih_sas_device_remove(ioc, sas_device);
>  				continue;
>  			}
>  		}
>  		spin_lock_irqsave(&ioc->sas_device_lock, flags);
>  		list_move_tail(&sas_device->list, &ioc->sas_device_list);
> +		atomic_dec(&sas_device->is_on_sas_device_init_list);

Why not 'atomic_set(&sas_device->is_on_sas_device_init_list, 0);' ?
There is no place where you set the value of is_on_sas_device_init_list
higher than '1'.

Cheers,
Tomas

>  		spin_unlock_irqrestore(&ioc->sas_device_lock, flags);
>  	}
> +	ioc->discovered_device_addition_on = 0;
>  }
>  
>  /**
> diff --git a/drivers/scsi/mpt3sas/mpt3sas_base.h b/drivers/scsi/mpt3sas/mpt3sas_base.h
> index afa8816..6188490 100644
> --- a/drivers/scsi/mpt3sas/mpt3sas_base.h
> +++ b/drivers/scsi/mpt3sas/mpt3sas_base.h
> @@ -315,6 +315,7 @@ struct _sas_device {
>  	u8	responding;
>  	u8	fast_path;
>  	u8	pfa_led_on;
> +	atomic_t is_on_sas_device_init_list;
>  };
>  
>  /**
> @@ -766,6 +767,7 @@ struct MPT3SAS_ADAPTER {
>  	u8		broadcast_aen_busy;
>  	u16		broadcast_aen_pending;
>  	u8		shost_recovery;
> +	u8		discovered_device_addition_on;
>  
>  	struct mutex	reset_in_progress_mutex;
>  	spinlock_t	ioc_reset_in_progress_lock;
> diff --git a/drivers/scsi/mpt3sas/mpt3sas_scsih.c b/drivers/scsi/mpt3sas/mpt3sas_scsih.c
> index 5a97e32..53cc9ea 100644
> --- a/drivers/scsi/mpt3sas/mpt3sas_scsih.c
> +++ b/drivers/scsi/mpt3sas/mpt3sas_scsih.c
> @@ -582,13 +582,20 @@ _scsih_sas_device_remove(struct MPT3SAS_ADAPTER *ioc,
>  	struct _sas_device *sas_device)
>  {
>  	unsigned long flags;
> +	struct _sas_device *same_sas_device;
>  
>  	if (!sas_device)
>  		return;
>  
>  	spin_lock_irqsave(&ioc->sas_device_lock, flags);
> -	list_del(&sas_device->list);
> -	kfree(sas_device);
> +	same_sas_device = _scsih_sas_device_find_by_handle(ioc,
> +						sas_device->handle);
> +	if (same_sas_device) {
> +		list_del(&same_sas_device->list);
> +		if (atomic_read(&sas_device->is_on_sas_device_init_list))
> +			atomic_set(&sas_device->is_on_sas_device_init_list, 0);
> +		kfree(same_sas_device);
> +	}
>  	spin_unlock_irqrestore(&ioc->sas_device_lock, flags);
>  }
>  
> @@ -610,8 +616,14 @@ _scsih_device_remove_by_handle(struct MPT3SAS_ADAPTER *ioc, u16 handle)
>  
>  	spin_lock_irqsave(&ioc->sas_device_lock, flags);
>  	sas_device = _scsih_sas_device_find_by_handle(ioc, handle);
> -	if (sas_device)
> -		list_del(&sas_device->list);
> +	if (sas_device) {
> +		if (ioc->discovered_device_addition_on &&
> +		    atomic_read(&sas_device->is_on_sas_device_init_list)) {
> +			spin_unlock_irqrestore(&ioc->sas_device_lock, flags);
> +			return;
> +		} else
> +			list_del(&sas_device->list);
> +	}
>  	spin_unlock_irqrestore(&ioc->sas_device_lock, flags);
>  	if (sas_device)
>  		_scsih_remove_device(ioc, sas_device);
> @@ -637,8 +649,14 @@ mpt3sas_device_remove_by_sas_address(struct MPT3SAS_ADAPTER *ioc,
>  	spin_lock_irqsave(&ioc->sas_device_lock, flags);
>  	sas_device = mpt3sas_scsih_sas_device_find_by_sas_address(ioc,
>  	    sas_address);
> -	if (sas_device)
> -		list_del(&sas_device->list);
> +	if (sas_device) {
> +		if (ioc->discovered_device_addition_on &&
> +		    atomic_read(&sas_device->is_on_sas_device_init_list)) {
> +			spin_unlock_irqrestore(&ioc->sas_device_lock, flags);
> +			return;
> +		} else
> +			list_del(&sas_device->list);
> +	}
>  	spin_unlock_irqrestore(&ioc->sas_device_lock, flags);
>  	if (sas_device)
>  		_scsih_remove_device(ioc, sas_device);
> @@ -663,6 +681,7 @@ _scsih_sas_device_add(struct MPT3SAS_ADAPTER *ioc,
>  		ioc->name, __func__, sas_device->handle,
>  		(unsigned long long)sas_device->sas_address));
>  
> +	atomic_set(&sas_device->is_on_sas_device_init_list, 1);
>  	spin_lock_irqsave(&ioc->sas_device_lock, flags);
>  	list_add_tail(&sas_device->list, &ioc->sas_device_list);
>  	spin_unlock_irqrestore(&ioc->sas_device_lock, flags);
> @@ -7610,14 +7629,17 @@ _scsih_probe_sas(struct MPT3SAS_ADAPTER *ioc)
>  	struct _sas_device *sas_device, *next;
>  	unsigned long flags;
>  
> +	ioc->discovered_device_addition_on = 1;
>  	/* SAS Device List */
>  	list_for_each_entry_safe(sas_device, next, &ioc->sas_device_init_list,
>  	    list) {
>  
>  		if (!mpt3sas_transport_port_add(ioc, sas_device->handle,
>  		    sas_device->sas_address_parent)) {
> -			list_del(&sas_device->list);
> -			kfree(sas_device);
> +			mpt3sas_transport_port_remove(ioc,
> +					sas_device->sas_address,
> +					sas_device->sas_address_parent);
> +			_scsih_sas_device_remove(ioc, sas_device);
>  			continue;
>  		} else if (!sas_device->starget) {
>  			/*
> @@ -7630,16 +7652,17 @@ _scsih_probe_sas(struct MPT3SAS_ADAPTER *ioc)
>  				mpt3sas_transport_port_remove(ioc,
>  				    sas_device->sas_address,
>  				    sas_device->sas_address_parent);
> -				list_del(&sas_device->list);
> -				kfree(sas_device);
> +				_scsih_sas_device_remove(ioc, sas_device);
>  				continue;
>  			}
>  		}
>  
>  		spin_lock_irqsave(&ioc->sas_device_lock, flags);
>  		list_move_tail(&sas_device->list, &ioc->sas_device_list);
> +		atomic_dec(&sas_device->is_on_sas_device_init_list);
>  		spin_unlock_irqrestore(&ioc->sas_device_lock, flags);
>  	}
> +	ioc->discovered_device_addition_on = 0;
>  }
>  
>  /**

--
To unsubscribe from this list: send the line "unsubscribe linux-scsi" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Calvin Owens May 6, 2015, 6:48 p.m. UTC | #2
On Monday 05/04 at 20:35 +0530, Sreekanth Reddy wrote:
> I have applied this patch on the latest upstream mpt3sas driver, then
> I have compiled and loaded the driver.  In the driver logs I didn't
> see any attached drives are added to the OS, 'fdisk -l' command also
> doesn't list the drives which are actually attached to the HBA.
>
> When I debug this issue then I see that in '_scsih_target_alloc'
> driver is searching for sas_device from the lists
> 'sas_device_init_list' & 'sas_device_list' based on the device sas
> address using the function
> mpt3sas_scsih_sas_device_find_by_sas_address(), since this device is
> not in the 'sas_device_init_list' (as it is moved it to head list)
> driver exit from this function without updating the required device
> addition information.

Yes, I misunderstood that the initialization depended on the devices
still being on the init_list.

What's interesting about this is that when I tested it, it still worked.
I think the MPT2SAS_PORT_ENABLE_COMPLETE fw_event might zero
ioc->start_scan and allow scsih_scan_finished() to start probing devices
before all the devices are actually on the init_list. It seems to be
very repeatable per-machine whether or not it works.

But in any case, my patch was wrong.

> To solve the original problem (i.e memory corruption), here I have
> attached the patch, in this patch I have added one atomic flag
> is_on_sas_device_init_list in _sas_device_structure and I followed
> below algorithm.

The problem is that this only solves a single case. There isn't anything
to enforce that this or a similar chain of events can't happen elsewhere
in the code.

I think the best general solution would be to add a refcount to these
objects.  They sit on a list that can be concurrently accessed from
multiple threads, so I think a refcount is the best way to ensure that
objects aren't freed out from under other users.

I'm working on a patchset that does this. I'm starting by adding a
refcount to the sas_device object only, and refactoring the code in
mpt2sas_scsih.c to use it. I should be able to send up a first version
of that pretty soon to get some feedback.

Thanks,
Calvin

 
> 1. when ever a device is added to sas_device_init_list then driver
> will set this atomic flag of this device to one.
> 
> 2. And during the addition of this device to SCSI mid layer, if the
> device is successfully added to the OS then driver will move this
> device list in to sas_device_list list from sas_device_init_list list
> and at this time driver will reset this flag to zero.  if device is
> failed to register with SCSI mid layer then also driver will reset
> this flag to zero in function _scsih_sas_device_remove and will remove
> the device entry from sas_device_init_list and will free the device
> structure.
> 
> 3. Now when a device is removed then driver will receive target not responding event and in the function _scsih_device_remove_by_handle,
>          a. driver will check whether addition of discovered devices to SML process is currently running or not,
>                i. if addition (or registration) of discovered devices to SML process is running then driver will check whether device is in sas_device_init_list or not (by reading the atomic flag)?.
>                     if it is in a sas_device_init_list then driver will ignore this device removal event (since device registration with SML will fail and it is removed in function _scsih_sas_device_remove as mentioned in step 2).
>              ii. if the device is not in a sas_device_init_list or addition (or registration) of discovered devices to SML process is already completed then device structure is removed from this function and this device entry is removed from sas_device_list.
> 
> 4. if the device removal event is received after device structure is freed due to failure of device registration with SML them in the function _scsih_device_remove_by_handle driver won't find this device in the sas_device_list or in a sas_device_init_list and so driver will ignore this  device removal event.
> 
> Signed-off-by: Sreekanth Reddy <Sreekanth.Reddy@avagotech.com>
> ---
>  drivers/scsi/mpt2sas/mpt2sas_base.h  |  2 ++
>  drivers/scsi/mpt2sas/mpt2sas_scsih.c | 45 +++++++++++++++++++++++++++---------
>  drivers/scsi/mpt3sas/mpt3sas_base.h  |  2 ++
>  drivers/scsi/mpt3sas/mpt3sas_scsih.c | 43 ++++++++++++++++++++++++++--------
>  4 files changed, 71 insertions(+), 21 deletions(-)
> 
> diff --git a/drivers/scsi/mpt2sas/mpt2sas_base.h b/drivers/scsi/mpt2sas/mpt2sas_base.h
> index caff8d1..1aa10d2 100644
> --- a/drivers/scsi/mpt2sas/mpt2sas_base.h
> +++ b/drivers/scsi/mpt2sas/mpt2sas_base.h
> @@ -376,6 +376,7 @@ struct _sas_device {
>  	u8	phy;
>  	u8	responding;
>  	u8	pfa_led_on;
> +	atomic_t is_on_sas_device_init_list;
>  };
>  
>  /**
> @@ -833,6 +834,7 @@ struct MPT2SAS_ADAPTER {
>  	u8		broadcast_aen_busy;
>  	u16		broadcast_aen_pending;
>  	u8		shost_recovery;
> +	u8		discovered_device_addition_on;
>  
>  	struct mutex	reset_in_progress_mutex;
>  	spinlock_t 	ioc_reset_in_progress_lock;
> diff --git a/drivers/scsi/mpt2sas/mpt2sas_scsih.c b/drivers/scsi/mpt2sas/mpt2sas_scsih.c
> index 3f26147..2a61286 100644
> --- a/drivers/scsi/mpt2sas/mpt2sas_scsih.c
> +++ b/drivers/scsi/mpt2sas/mpt2sas_scsih.c
> @@ -590,13 +590,20 @@ _scsih_sas_device_remove(struct MPT2SAS_ADAPTER *ioc,
>      struct _sas_device *sas_device)
>  {
>  	unsigned long flags;
> +	struct _sas_device *same_sas_device;
>  
>  	if (!sas_device)
>  		return;
>  
>  	spin_lock_irqsave(&ioc->sas_device_lock, flags);
> -	list_del(&sas_device->list);
> -	kfree(sas_device);
> +	same_sas_device = _scsih_sas_device_find_by_handle(ioc,
> +						sas_device->handle);
> +	if (same_sas_device) {
> +		list_del(&same_sas_device->list);
> +		if (atomic_read(&sas_device->is_on_sas_device_init_list))
> +			atomic_set(&sas_device->is_on_sas_device_init_list, 0);
> +		kfree(same_sas_device);
> +	}
>  	spin_unlock_irqrestore(&ioc->sas_device_lock, flags);
>  }
>  
> @@ -658,6 +664,7 @@ _scsih_sas_device_init_add(struct MPT2SAS_ADAPTER *ioc,
>  	    "(0x%04x), sas_addr(0x%016llx)\n", ioc->name, __func__,
>  	    sas_device->handle, (unsigned long long)sas_device->sas_address));
>  
> +	atomic_set(&sas_device->is_on_sas_device_init_list, 1);
>  	spin_lock_irqsave(&ioc->sas_device_lock, flags);
>  	list_add_tail(&sas_device->list, &ioc->sas_device_init_list);
>  	_scsih_determine_boot_device(ioc, sas_device, 0);
> @@ -5364,8 +5371,14 @@ _scsih_device_remove_by_handle(struct MPT2SAS_ADAPTER *ioc, u16 handle)
>  
>  	spin_lock_irqsave(&ioc->sas_device_lock, flags);
>  	sas_device = _scsih_sas_device_find_by_handle(ioc, handle);
> -	if (sas_device)
> -		list_del(&sas_device->list);
> +	if (sas_device) {
> +		if (ioc->discovered_device_addition_on &&
> +		    atomic_read(&sas_device->is_on_sas_device_init_list)) {
> +			spin_unlock_irqrestore(&ioc->sas_device_lock, flags);
> +			return;
> +		} else
> +			list_del(&sas_device->list);
> +	}
>  	spin_unlock_irqrestore(&ioc->sas_device_lock, flags);
>  	if (sas_device)
>  		_scsih_remove_device(ioc, sas_device);
> @@ -5391,8 +5404,14 @@ mpt2sas_device_remove_by_sas_address(struct MPT2SAS_ADAPTER *ioc,
>  	spin_lock_irqsave(&ioc->sas_device_lock, flags);
>  	sas_device = mpt2sas_scsih_sas_device_find_by_sas_address(ioc,
>  	    sas_address);
> -	if (sas_device)
> -		list_del(&sas_device->list);
> +	if (sas_device) {
> +		if (ioc->discovered_device_addition_on &&
> +		    atomic_read(&sas_device->is_on_sas_device_init_list)) {
> +			spin_unlock_irqrestore(&ioc->sas_device_lock, flags);
> +			return;
> +		} else
> +			list_del(&sas_device->list);
> +	}
>  	spin_unlock_irqrestore(&ioc->sas_device_lock, flags);
>  	if (sas_device)
>  		_scsih_remove_device(ioc, sas_device);
> @@ -7978,32 +7997,36 @@ _scsih_probe_sas(struct MPT2SAS_ADAPTER *ioc)
>  	struct _sas_device *sas_device, *next;
>  	unsigned long flags;
>  
> +	ioc->discovered_device_addition_on = 1;
>  	/* SAS Device List */
>  	list_for_each_entry_safe(sas_device, next, &ioc->sas_device_init_list,
>  	    list) {
>  
>  		if (ioc->hide_drives)
>  			continue;
> -
> +
>  		if (!mpt2sas_transport_port_add(ioc, sas_device->handle,
>  		    sas_device->sas_address_parent)) {
> -			list_del(&sas_device->list);
> -			kfree(sas_device);
> +			mpt2sas_transport_port_remove(ioc,
> +					sas_device->sas_address,
> +					sas_device->sas_address_parent);
> +			_scsih_sas_device_remove(ioc, sas_device);
>  			continue;
>  		} else if (!sas_device->starget) {
>  			if (!ioc->is_driver_loading) {
>  				mpt2sas_transport_port_remove(ioc,
>  					sas_device->sas_address,
>  					sas_device->sas_address_parent);
> -				list_del(&sas_device->list);
> -				kfree(sas_device);
> +				_scsih_sas_device_remove(ioc, sas_device);
>  				continue;
>  			}
>  		}
>  		spin_lock_irqsave(&ioc->sas_device_lock, flags);
>  		list_move_tail(&sas_device->list, &ioc->sas_device_list);
> +		atomic_dec(&sas_device->is_on_sas_device_init_list);
>  		spin_unlock_irqrestore(&ioc->sas_device_lock, flags);
>  	}
> +	ioc->discovered_device_addition_on = 0;
>  }
>  
>  /**
> diff --git a/drivers/scsi/mpt3sas/mpt3sas_base.h b/drivers/scsi/mpt3sas/mpt3sas_base.h
> index afa8816..6188490 100644
> --- a/drivers/scsi/mpt3sas/mpt3sas_base.h
> +++ b/drivers/scsi/mpt3sas/mpt3sas_base.h
> @@ -315,6 +315,7 @@ struct _sas_device {
>  	u8	responding;
>  	u8	fast_path;
>  	u8	pfa_led_on;
> +	atomic_t is_on_sas_device_init_list;
>  };
>  
>  /**
> @@ -766,6 +767,7 @@ struct MPT3SAS_ADAPTER {
>  	u8		broadcast_aen_busy;
>  	u16		broadcast_aen_pending;
>  	u8		shost_recovery;
> +	u8		discovered_device_addition_on;
>  
>  	struct mutex	reset_in_progress_mutex;
>  	spinlock_t	ioc_reset_in_progress_lock;
> diff --git a/drivers/scsi/mpt3sas/mpt3sas_scsih.c b/drivers/scsi/mpt3sas/mpt3sas_scsih.c
> index 5a97e32..53cc9ea 100644
> --- a/drivers/scsi/mpt3sas/mpt3sas_scsih.c
> +++ b/drivers/scsi/mpt3sas/mpt3sas_scsih.c
> @@ -582,13 +582,20 @@ _scsih_sas_device_remove(struct MPT3SAS_ADAPTER *ioc,
>  	struct _sas_device *sas_device)
>  {
>  	unsigned long flags;
> +	struct _sas_device *same_sas_device;
>  
>  	if (!sas_device)
>  		return;
>  
>  	spin_lock_irqsave(&ioc->sas_device_lock, flags);
> -	list_del(&sas_device->list);
> -	kfree(sas_device);
> +	same_sas_device = _scsih_sas_device_find_by_handle(ioc,
> +						sas_device->handle);
> +	if (same_sas_device) {
> +		list_del(&same_sas_device->list);
> +		if (atomic_read(&sas_device->is_on_sas_device_init_list))
> +			atomic_set(&sas_device->is_on_sas_device_init_list, 0);
> +		kfree(same_sas_device);
> +	}
>  	spin_unlock_irqrestore(&ioc->sas_device_lock, flags);
>  }
>  
> @@ -610,8 +616,14 @@ _scsih_device_remove_by_handle(struct MPT3SAS_ADAPTER *ioc, u16 handle)
>  
>  	spin_lock_irqsave(&ioc->sas_device_lock, flags);
>  	sas_device = _scsih_sas_device_find_by_handle(ioc, handle);
> -	if (sas_device)
> -		list_del(&sas_device->list);
> +	if (sas_device) {
> +		if (ioc->discovered_device_addition_on &&
> +		    atomic_read(&sas_device->is_on_sas_device_init_list)) {
> +			spin_unlock_irqrestore(&ioc->sas_device_lock, flags);
> +			return;
> +		} else
> +			list_del(&sas_device->list);
> +	}
>  	spin_unlock_irqrestore(&ioc->sas_device_lock, flags);
>  	if (sas_device)
>  		_scsih_remove_device(ioc, sas_device);
> @@ -637,8 +649,14 @@ mpt3sas_device_remove_by_sas_address(struct MPT3SAS_ADAPTER *ioc,
>  	spin_lock_irqsave(&ioc->sas_device_lock, flags);
>  	sas_device = mpt3sas_scsih_sas_device_find_by_sas_address(ioc,
>  	    sas_address);
> -	if (sas_device)
> -		list_del(&sas_device->list);
> +	if (sas_device) {
> +		if (ioc->discovered_device_addition_on &&
> +		    atomic_read(&sas_device->is_on_sas_device_init_list)) {
> +			spin_unlock_irqrestore(&ioc->sas_device_lock, flags);
> +			return;
> +		} else
> +			list_del(&sas_device->list);
> +	}
>  	spin_unlock_irqrestore(&ioc->sas_device_lock, flags);
>  	if (sas_device)
>  		_scsih_remove_device(ioc, sas_device);
> @@ -663,6 +681,7 @@ _scsih_sas_device_add(struct MPT3SAS_ADAPTER *ioc,
>  		ioc->name, __func__, sas_device->handle,
>  		(unsigned long long)sas_device->sas_address));
>  
> +	atomic_set(&sas_device->is_on_sas_device_init_list, 1);
>  	spin_lock_irqsave(&ioc->sas_device_lock, flags);
>  	list_add_tail(&sas_device->list, &ioc->sas_device_list);
>  	spin_unlock_irqrestore(&ioc->sas_device_lock, flags);
> @@ -7610,14 +7629,17 @@ _scsih_probe_sas(struct MPT3SAS_ADAPTER *ioc)
>  	struct _sas_device *sas_device, *next;
>  	unsigned long flags;
>  
> +	ioc->discovered_device_addition_on = 1;
>  	/* SAS Device List */
>  	list_for_each_entry_safe(sas_device, next, &ioc->sas_device_init_list,
>  	    list) {
>  
>  		if (!mpt3sas_transport_port_add(ioc, sas_device->handle,
>  		    sas_device->sas_address_parent)) {
> -			list_del(&sas_device->list);
> -			kfree(sas_device);
> +			mpt3sas_transport_port_remove(ioc,
> +					sas_device->sas_address,
> +					sas_device->sas_address_parent);
> +			_scsih_sas_device_remove(ioc, sas_device);
>  			continue;
>  		} else if (!sas_device->starget) {
>  			/*
> @@ -7630,16 +7652,17 @@ _scsih_probe_sas(struct MPT3SAS_ADAPTER *ioc)
>  				mpt3sas_transport_port_remove(ioc,
>  				    sas_device->sas_address,
>  				    sas_device->sas_address_parent);
> -				list_del(&sas_device->list);
> -				kfree(sas_device);
> +				_scsih_sas_device_remove(ioc, sas_device);
>  				continue;
>  			}
>  		}
>  
>  		spin_lock_irqsave(&ioc->sas_device_lock, flags);
>  		list_move_tail(&sas_device->list, &ioc->sas_device_list);
> +		atomic_dec(&sas_device->is_on_sas_device_init_list);
>  		spin_unlock_irqrestore(&ioc->sas_device_lock, flags);
>  	}
> +	ioc->discovered_device_addition_on = 0;
>  }
>  
>  /**
> -- 
> 2.0.2
> 
--
To unsubscribe from this list: send the line "unsubscribe linux-scsi" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Sreekanth Reddy May 12, 2015, 9:38 a.m. UTC | #3
HI Tomas & Calvin,

Thanks for reviewing this patch.

There is some problem with this patch, In this patch as the driver is
ignoring the device removal event (when ioc's
discovered_device_addition_on flag and device's
is_on_sas_device_init_list is one) so driver not freeing the
sas_device structure from the sas_device_init_list list.

Due to this when ever same device is hot plugged then the device
addition to the SML is not happing.

I have one more patch to fix the original issue and I will post it today.

Regards,
Sreekanth

On Tue, May 5, 2015 at 9:05 PM, Tomas Henzl <thenzl@redhat.com> wrote:
>
> On 05/04/2015 05:05 PM, Sreekanth Reddy wrote:
> > I have applied this patch on the latest upstream mpt3sas driver, then I have compiled and loaded the driver.
> > In the driver logs I didn't see any attached drives are added to the OS, 'fdisk -l' command also doesn't list
> >  the drives which are actually attached to the HBA.
> >
> > When I debug this issue then I see that in '_scsih_target_alloc'
> >  driver is searching for sas_device from the lists 'sas_device_init_list' & 'sas_device_list'
> >  based on the device sas address using the function mpt3sas_scsih_sas_device_find_by_sas_address(),
> >  since this device is not in the 'sas_device_init_list' (as it is moved it to head list) driver exit
> >  from this function without updating the required device addition information.
> >
> > To solve the original problem (i.e memory corruption), here I have attached the patch,
> >  in this patch I have added one atomic flag is_on_sas_device_init_list in _sas_device_structure
> >  and I followed below algorithm.
> >
> > 1. when ever a device is added to sas_device_init_list then driver will set this atomic flag of this device to one.
> >
> > 2. And during the addition of this device to SCSI mid layer,
> >         if the device is successfully added to the OS then driver will move this device list in to sas_device_list list from sas_device_init_list list and at this time driver will reset this flag to zero.
> >         if device is failed to register with SCSI mid layer then also driver will reset this flag to zero in function _scsih_sas_device_remove and will remove the device entry from sas_device_init_list and will free the device structure.
> >
> > 3. Now when a device is removed then driver will receive target not responding event and in the function _scsih_device_remove_by_handle,
> >          a. driver will check whether addition of discovered devices to SML process is currently running or not,
> >                i. if addition (or registration) of discovered devices to SML process is running then driver will check whether device is in sas_device_init_list or not (by reading the atomic flag)?.
> >                     if it is in a sas_device_init_list then driver will ignore this device removal event (since device registration with SML will fail and it is removed in function _scsih_sas_device_remove as mentioned in step 2).
> >              ii. if the device is not in a sas_device_init_list or addition (or registration) of discovered devices to SML process is already completed then device structure is removed from this function and this device entry is removed from sas_device_list.
> >
> > 4. if the device removal event is received after device structure is freed due to failure of device registration with SML them in the function _scsih_device_remove_by_handle driver won't find this device in the sas_device_list or in a sas_device_init_list and so driver will ignore this  device removal event.
> >
> > Signed-off-by: Sreekanth Reddy <Sreekanth.Reddy@avagotech.com>
> > ---
> >  drivers/scsi/mpt2sas/mpt2sas_base.h  |  2 ++
> >  drivers/scsi/mpt2sas/mpt2sas_scsih.c | 45 +++++++++++++++++++++++++++---------
> >  drivers/scsi/mpt3sas/mpt3sas_base.h  |  2 ++
> >  drivers/scsi/mpt3sas/mpt3sas_scsih.c | 43 ++++++++++++++++++++++++++--------
> >  4 files changed, 71 insertions(+), 21 deletions(-)
> >
> > diff --git a/drivers/scsi/mpt2sas/mpt2sas_base.h b/drivers/scsi/mpt2sas/mpt2sas_base.h
> > index caff8d1..1aa10d2 100644
> > --- a/drivers/scsi/mpt2sas/mpt2sas_base.h
> > +++ b/drivers/scsi/mpt2sas/mpt2sas_base.h
> > @@ -376,6 +376,7 @@ struct _sas_device {
> >       u8      phy;
> >       u8      responding;
> >       u8      pfa_led_on;
> > +     atomic_t is_on_sas_device_init_list;
>
> Hi Sreekanth,
> when is_on_sas_device_init_list is used it's protected
> by ioc->sas_device_lock - why do you need a atomic_t ?
> There is one exception, but easily fixable.
>
> >  };
> >
> >  /**
> > @@ -833,6 +834,7 @@ struct MPT2SAS_ADAPTER {
> >       u8              broadcast_aen_busy;
> >       u16             broadcast_aen_pending;
> >       u8              shost_recovery;
> > +     u8              discovered_device_addition_on;
> >
> >       struct mutex    reset_in_progress_mutex;
> >       spinlock_t      ioc_reset_in_progress_lock;
> > diff --git a/drivers/scsi/mpt2sas/mpt2sas_scsih.c b/drivers/scsi/mpt2sas/mpt2sas_scsih.c
> > index 3f26147..2a61286 100644
> > --- a/drivers/scsi/mpt2sas/mpt2sas_scsih.c
> > +++ b/drivers/scsi/mpt2sas/mpt2sas_scsih.c
> > @@ -590,13 +590,20 @@ _scsih_sas_device_remove(struct MPT2SAS_ADAPTER *ioc,
> >      struct _sas_device *sas_device)
> >  {
> >       unsigned long flags;
> > +     struct _sas_device *same_sas_device;
> >
> >       if (!sas_device)
> >               return;
> >
> >       spin_lock_irqsave(&ioc->sas_device_lock, flags);
> > -     list_del(&sas_device->list);
> > -     kfree(sas_device);
> > +     same_sas_device = _scsih_sas_device_find_by_handle(ioc,
> > +                                             sas_device->handle);
>
> Is it possible that when same_sas_device is not null, that the
> value is not the same as for the sas_device ?
>
> > +     if (same_sas_device) {
> > +             list_del(&same_sas_device->list);
> > +             if (atomic_read(&sas_device->is_on_sas_device_init_list))
>
> Seems easier to just set the variable without a test.
>
> > +                     atomic_set(&sas_device->is_on_sas_device_init_list, 0);
> > +             kfree(same_sas_device);
> > +     }
> >       spin_unlock_irqrestore(&ioc->sas_device_lock, flags);
> >  }
> >
> > @@ -658,6 +664,7 @@ _scsih_sas_device_init_add(struct MPT2SAS_ADAPTER *ioc,
> >           "(0x%04x), sas_addr(0x%016llx)\n", ioc->name, __func__,
> >           sas_device->handle, (unsigned long long)sas_device->sas_address));
> >
> > +     atomic_set(&sas_device->is_on_sas_device_init_list, 1);
> >       spin_lock_irqsave(&ioc->sas_device_lock, flags);
> >       list_add_tail(&sas_device->list, &ioc->sas_device_init_list);
> >       _scsih_determine_boot_device(ioc, sas_device, 0);
> > @@ -5364,8 +5371,14 @@ _scsih_device_remove_by_handle(struct MPT2SAS_ADAPTER *ioc, u16 handle)
> >
> >       spin_lock_irqsave(&ioc->sas_device_lock, flags);
> >       sas_device = _scsih_sas_device_find_by_handle(ioc, handle);
> > -     if (sas_device)
> > -             list_del(&sas_device->list);
> > +     if (sas_device) {
> > +             if (ioc->discovered_device_addition_on &&
> > +                 atomic_read(&sas_device->is_on_sas_device_init_list)) {
> > +                     spin_unlock_irqrestore(&ioc->sas_device_lock, flags);
> > +                     return;
> > +             } else
> > +                     list_del(&sas_device->list);
> > +     }
> >       spin_unlock_irqrestore(&ioc->sas_device_lock, flags);
> >       if (sas_device)
> >               _scsih_remove_device(ioc, sas_device);
> > @@ -5391,8 +5404,14 @@ mpt2sas_device_remove_by_sas_address(struct MPT2SAS_ADAPTER *ioc,
> >       spin_lock_irqsave(&ioc->sas_device_lock, flags);
> >       sas_device = mpt2sas_scsih_sas_device_find_by_sas_address(ioc,
> >           sas_address);
> > -     if (sas_device)
> > -             list_del(&sas_device->list);
> > +     if (sas_device) {
> > +             if (ioc->discovered_device_addition_on &&
> > +                 atomic_read(&sas_device->is_on_sas_device_init_list)) {
> > +                     spin_unlock_irqrestore(&ioc->sas_device_lock, flags);
> > +                     return;
> > +             } else
> > +                     list_del(&sas_device->list);
> > +     }
> >       spin_unlock_irqrestore(&ioc->sas_device_lock, flags);
> >       if (sas_device)
> >               _scsih_remove_device(ioc, sas_device);
> > @@ -7978,32 +7997,36 @@ _scsih_probe_sas(struct MPT2SAS_ADAPTER *ioc)
> >       struct _sas_device *sas_device, *next;
> >       unsigned long flags;
> >
> > +     ioc->discovered_device_addition_on = 1;
> >       /* SAS Device List */
> >       list_for_each_entry_safe(sas_device, next, &ioc->sas_device_init_list,
> >           list) {
> >
> >               if (ioc->hide_drives)
> >                       continue;
> > -
> > +
> >               if (!mpt2sas_transport_port_add(ioc, sas_device->handle,
> >                   sas_device->sas_address_parent)) {
> > -                     list_del(&sas_device->list);
> > -                     kfree(sas_device);
> > +                     mpt2sas_transport_port_remove(ioc,
> > +                                     sas_device->sas_address,
> > +                                     sas_device->sas_address_parent);
> > +                     _scsih_sas_device_remove(ioc, sas_device);
> >                       continue;
> >               } else if (!sas_device->starget) {
> >                       if (!ioc->is_driver_loading) {
> >                               mpt2sas_transport_port_remove(ioc,
> >                                       sas_device->sas_address,
> >                                       sas_device->sas_address_parent);
> > -                             list_del(&sas_device->list);
> > -                             kfree(sas_device);
> > +                             _scsih_sas_device_remove(ioc, sas_device);
> >                               continue;
> >                       }
> >               }
> >               spin_lock_irqsave(&ioc->sas_device_lock, flags);
> >               list_move_tail(&sas_device->list, &ioc->sas_device_list);
> > +             atomic_dec(&sas_device->is_on_sas_device_init_list);
>
> Why not 'atomic_set(&sas_device->is_on_sas_device_init_list, 0);' ?
> There is no place where you set the value of is_on_sas_device_init_list
> higher than '1'.
>
> Cheers,
> Tomas
>
> >               spin_unlock_irqrestore(&ioc->sas_device_lock, flags);
> >       }
> > +     ioc->discovered_device_addition_on = 0;
> >  }
> >
> >  /**
> > diff --git a/drivers/scsi/mpt3sas/mpt3sas_base.h b/drivers/scsi/mpt3sas/mpt3sas_base.h
> > index afa8816..6188490 100644
> > --- a/drivers/scsi/mpt3sas/mpt3sas_base.h
> > +++ b/drivers/scsi/mpt3sas/mpt3sas_base.h
> > @@ -315,6 +315,7 @@ struct _sas_device {
> >       u8      responding;
> >       u8      fast_path;
> >       u8      pfa_led_on;
> > +     atomic_t is_on_sas_device_init_list;
> >  };
> >
> >  /**
> > @@ -766,6 +767,7 @@ struct MPT3SAS_ADAPTER {
> >       u8              broadcast_aen_busy;
> >       u16             broadcast_aen_pending;
> >       u8              shost_recovery;
> > +     u8              discovered_device_addition_on;
> >
> >       struct mutex    reset_in_progress_mutex;
> >       spinlock_t      ioc_reset_in_progress_lock;
> > diff --git a/drivers/scsi/mpt3sas/mpt3sas_scsih.c b/drivers/scsi/mpt3sas/mpt3sas_scsih.c
> > index 5a97e32..53cc9ea 100644
> > --- a/drivers/scsi/mpt3sas/mpt3sas_scsih.c
> > +++ b/drivers/scsi/mpt3sas/mpt3sas_scsih.c
> > @@ -582,13 +582,20 @@ _scsih_sas_device_remove(struct MPT3SAS_ADAPTER *ioc,
> >       struct _sas_device *sas_device)
> >  {
> >       unsigned long flags;
> > +     struct _sas_device *same_sas_device;
> >
> >       if (!sas_device)
> >               return;
> >
> >       spin_lock_irqsave(&ioc->sas_device_lock, flags);
> > -     list_del(&sas_device->list);
> > -     kfree(sas_device);
> > +     same_sas_device = _scsih_sas_device_find_by_handle(ioc,
> > +                                             sas_device->handle);
> > +     if (same_sas_device) {
> > +             list_del(&same_sas_device->list);
> > +             if (atomic_read(&sas_device->is_on_sas_device_init_list))
> > +                     atomic_set(&sas_device->is_on_sas_device_init_list, 0);
> > +             kfree(same_sas_device);
> > +     }
> >       spin_unlock_irqrestore(&ioc->sas_device_lock, flags);
> >  }
> >
> > @@ -610,8 +616,14 @@ _scsih_device_remove_by_handle(struct MPT3SAS_ADAPTER *ioc, u16 handle)
> >
> >       spin_lock_irqsave(&ioc->sas_device_lock, flags);
> >       sas_device = _scsih_sas_device_find_by_handle(ioc, handle);
> > -     if (sas_device)
> > -             list_del(&sas_device->list);
> > +     if (sas_device) {
> > +             if (ioc->discovered_device_addition_on &&
> > +                 atomic_read(&sas_device->is_on_sas_device_init_list)) {
> > +                     spin_unlock_irqrestore(&ioc->sas_device_lock, flags);
> > +                     return;
> > +             } else
> > +                     list_del(&sas_device->list);
> > +     }
> >       spin_unlock_irqrestore(&ioc->sas_device_lock, flags);
> >       if (sas_device)
> >               _scsih_remove_device(ioc, sas_device);
> > @@ -637,8 +649,14 @@ mpt3sas_device_remove_by_sas_address(struct MPT3SAS_ADAPTER *ioc,
> >       spin_lock_irqsave(&ioc->sas_device_lock, flags);
> >       sas_device = mpt3sas_scsih_sas_device_find_by_sas_address(ioc,
> >           sas_address);
> > -     if (sas_device)
> > -             list_del(&sas_device->list);
> > +     if (sas_device) {
> > +             if (ioc->discovered_device_addition_on &&
> > +                 atomic_read(&sas_device->is_on_sas_device_init_list)) {
> > +                     spin_unlock_irqrestore(&ioc->sas_device_lock, flags);
> > +                     return;
> > +             } else
> > +                     list_del(&sas_device->list);
> > +     }
> >       spin_unlock_irqrestore(&ioc->sas_device_lock, flags);
> >       if (sas_device)
> >               _scsih_remove_device(ioc, sas_device);
> > @@ -663,6 +681,7 @@ _scsih_sas_device_add(struct MPT3SAS_ADAPTER *ioc,
> >               ioc->name, __func__, sas_device->handle,
> >               (unsigned long long)sas_device->sas_address));
> >
> > +     atomic_set(&sas_device->is_on_sas_device_init_list, 1);
> >       spin_lock_irqsave(&ioc->sas_device_lock, flags);
> >       list_add_tail(&sas_device->list, &ioc->sas_device_list);
> >       spin_unlock_irqrestore(&ioc->sas_device_lock, flags);
> > @@ -7610,14 +7629,17 @@ _scsih_probe_sas(struct MPT3SAS_ADAPTER *ioc)
> >       struct _sas_device *sas_device, *next;
> >       unsigned long flags;
> >
> > +     ioc->discovered_device_addition_on = 1;
> >       /* SAS Device List */
> >       list_for_each_entry_safe(sas_device, next, &ioc->sas_device_init_list,
> >           list) {
> >
> >               if (!mpt3sas_transport_port_add(ioc, sas_device->handle,
> >                   sas_device->sas_address_parent)) {
> > -                     list_del(&sas_device->list);
> > -                     kfree(sas_device);
> > +                     mpt3sas_transport_port_remove(ioc,
> > +                                     sas_device->sas_address,
> > +                                     sas_device->sas_address_parent);
> > +                     _scsih_sas_device_remove(ioc, sas_device);
> >                       continue;
> >               } else if (!sas_device->starget) {
> >                       /*
> > @@ -7630,16 +7652,17 @@ _scsih_probe_sas(struct MPT3SAS_ADAPTER *ioc)
> >                               mpt3sas_transport_port_remove(ioc,
> >                                   sas_device->sas_address,
> >                                   sas_device->sas_address_parent);
> > -                             list_del(&sas_device->list);
> > -                             kfree(sas_device);
> > +                             _scsih_sas_device_remove(ioc, sas_device);
> >                               continue;
> >                       }
> >               }
> >
> >               spin_lock_irqsave(&ioc->sas_device_lock, flags);
> >               list_move_tail(&sas_device->list, &ioc->sas_device_list);
> > +             atomic_dec(&sas_device->is_on_sas_device_init_list);
> >               spin_unlock_irqrestore(&ioc->sas_device_lock, flags);
> >       }
> > +     ioc->discovered_device_addition_on = 0;
> >  }
> >
> >  /**
>
--
To unsubscribe from this list: send the line "unsubscribe linux-scsi" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Calvin Owens May 15, 2015, 3:41 a.m. UTC | #4
Hello all,

This patchset attempts to address problems we've been having with
panics due to memory corruption from the mpt2sas driver.

I will provide a similar set of fixes for mpt3sas, since we see
similar issues there as well. "Porting" this to mpt3sas will be
trivial since the part of the driver I'm touching is nearly identical
between the two, so I thought it would be simpler to review a patch
against mpt2sas alone at first.

I've tested this for a few days on a big storage box that seemed to be
very susceptible to the panics, and so far it seems to have eliminated
them.

Thanks,
Calvin


Total diffstat:

 drivers/scsi/mpt2sas/mpt2sas_base.h      |  20 +-
 drivers/scsi/mpt2sas/mpt2sas_scsih.c     | 482 +++++++++++++++++++++----------
 drivers/scsi/mpt2sas/mpt2sas_transport.c |  12 +-
 3 files changed, 359 insertions(+), 155 deletions(-)

Patches:

* [PATCH 1/6] Add refcount to sas_device struct
* [PATCH 2/6] Refactor code to use new sas_device refcount
* [PATCH 3/6] Fix unsafe sas_device_list usage
* [PATCH 4/6] Add refcount to fw_event_work struct
* [PATCH 5/6] Refactor code to use new fw_event refcount
* [PATCH 6/6] Fix unsafe fw_event_list usage
--
To unsubscribe from this list: send the line "unsubscribe linux-scsi" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Calvin Owens June 9, 2015, 3:50 a.m. UTC | #5
Hello all,

This patchset attempts to address problems we've been having with
panics due to memory corruption from the mpt2sas driver.

I will provide a similar set of fixes for mpt3sas, since we see
similar issues there as well. "Porting" this to mpt3sas will be
trivial since the part of the driver I'm touching is nearly identical
between the two, so I thought it would be simpler to review a patch
against mpt2sas alone at first.

I've tested this on a handful of large storage boxes over the past few
weeks, so far it seems to have completely eliminated the memory
corruption panics.

Thanks,
Calvin


Total diffstat:

 drivers/scsi/mpt2sas/mpt2sas_base.h      |  20 +-
 drivers/scsi/mpt2sas/mpt2sas_scsih.c     | 482 +++++++++++++++++++++----------
 drivers/scsi/mpt2sas/mpt2sas_transport.c |  12 +-
 3 files changed, 359 insertions(+), 155 deletions(-)

Patches:

* [PATCH 1/6] Add refcount to sas_device struct
* [PATCH 2/6] Refactor code to use new sas_device refcount
* [PATCH 3/6] Fix unsafe sas_device_list usage
* [PATCH 4/6] Add refcount to fw_event_work struct
* [PATCH 5/6] Refactor code to use new fw_event refcount
* [PATCH 6/6] Fix unsafe fw_event_list usage
--
To unsubscribe from this list: send the line "unsubscribe linux-scsi" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Jens Axboe July 2, 2015, 7:22 p.m. UTC | #6
On 05/14/2015 09:41 PM, Calvin Owens wrote:
> Hello all,
>
> This patchset attempts to address problems we've been having with
> panics due to memory corruption from the mpt2sas driver.
>
> I will provide a similar set of fixes for mpt3sas, since we see
> similar issues there as well. "Porting" this to mpt3sas will be
> trivial since the part of the driver I'm touching is nearly identical
> between the two, so I thought it would be simpler to review a patch
> against mpt2sas alone at first.
>
> I've tested this for a few days on a big storage box that seemed to be
> very susceptible to the panics, and so far it seems to have eliminated
> them.

Guys, can someone outside of FB please review this? We're hitting random 
memory corruptions without these fixes.
Bart Van Assche July 2, 2015, 8:15 p.m. UTC | #7
On 06/08/2015 08:50 PM, Calvin Owens wrote:
> This patchset attempts to address problems we've been having with
> panics due to memory corruption from the mpt2sas driver.
>
> I will provide a similar set of fixes for mpt3sas, since we see
> similar issues there as well. "Porting" this to mpt3sas will be
> trivial since the part of the driver I'm touching is nearly identical
> between the two, so I thought it would be simpler to review a patch
> against mpt2sas alone at first.
>
> I've tested this on a handful of large storage boxes over the past few
> weeks, so far it seems to have completely eliminated the memory
> corruption panics.

If you have to repost this series please convert 
BUG_ON(!spin_is_locked(&ioc->sas_device_lock)); into 
lockdep_is_held(...). Otherwise, for the whole series:

Reviewed-by: Bart Van Assche <bart.vanassche@sandisk.com>
--
To unsubscribe from this list: send the line "unsubscribe linux-scsi" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Calvin Owens July 12, 2015, 4:24 a.m. UTC | #8
Hello all,

This patchset attempts to address problems we've been having with
panics due to memory corruption from the mpt2sas driver.

Thanks,
Calvin

Patches in this series:
[PATCH 1/2] mpt2sas: Refcount sas_device objects and fix unsafe list usage
[PATCH 2/2] mpt2sas: Refcount fw_events and fix unsafe list usage

Changes since v1:
	* Squished patches 1-3 and 4-6 into two patches
	* s/BUG_ON(!spin_is_locked/assert_spin_locked/g
	* Use more succinct fuction names
	* Store a pointer to the sas_device object in ->hostdata to eliminate
	  the need for several lookups on the lists.
	* Remove the fw_event from fw_event_list at the start of
	  _firmware_event_work()
	* Explicitly separate fw_event_list removal from fw_event freeing

Total diffstat:

 drivers/scsi/mpt2sas/mpt2sas_base.h      |  22 +-
 drivers/scsi/mpt2sas/mpt2sas_scsih.c     | 535 +++++++++++++++++++++----------
 drivers/scsi/mpt2sas/mpt2sas_transport.c |  12 +-
 3 files changed, 396 insertions(+), 173 deletions(-)

Diff showing changes v1 => v2:
	http://jcalvinowens.github.io/stuff/mpt2sas-patchset-v1v2.patch
--
To unsubscribe from this list: send the line "unsubscribe linux-scsi" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Calvin Owens Aug. 1, 2015, 5:02 a.m. UTC | #9
Hello all,

This patchset attempts to address problems we've been having with
panics due to memory corruption from the mpt2sas driver.

Changes are noted in the individual patches, I realized putting them in the
cover was probably a bit confusing.

Thanks,
Calvin


Patches in this series:
 [PATCH v3 1/2] mpt2sas: Refcount sas_device objects and fix unsafe list
 [PATCH v3 2/2] mpt2sas: Refcount fw_events and fix unsafe list usage

Total diffstat:
 drivers/scsi/mpt2sas/mpt2sas_base.h      |  22 +-
 drivers/scsi/mpt2sas/mpt2sas_scsih.c     | 579 ++++++++++++++++++++++---------
 drivers/scsi/mpt2sas/mpt2sas_transport.c |  12 +-
 3 files changed, 439 insertions(+), 174 deletions(-)

Diff showing changes v2 => v3:
	http://jcalvinowens.github.io/stuff/mpt2sas-patchset-v2v3.patch

Diff showing changes v1 => v2:
	http://jcalvinowens.github.io/stuff/mpt2sas-patchset-v1v2.patch
--
To unsubscribe from this list: send the line "unsubscribe linux-scsi" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Calvin Owens Aug. 14, 2015, 1:48 a.m. UTC | #10
Hello all,

This patchset attempts to address problems we've been having with
panics due to memory corruption from the mpt2sas driver.

Thanks,
Calvin


[PATCH v4 1/2] mpt2sas: Refcount sas_device objects and fix unsafe list
[PATCH v4 2/2] mpt2sas: Refcount fw_events and fix unsafe list usage

Total diffstat:
 drivers/scsi/mpt2sas/mpt2sas_base.h      |  22 +-
 drivers/scsi/mpt2sas/mpt2sas_scsih.c     | 592 ++++++++++++++++++++++---------
 drivers/scsi/mpt2sas/mpt2sas_transport.c |  12 +-
 3 files changed, 451 insertions(+), 175 deletions(-)

Diff showing changes v3 => v4:
	http://jcalvinowens.github.io/stuff/mpt2sas-patchset-v3v4.patch

Diff showing changes v2 => v3:
	http://jcalvinowens.github.io/stuff/mpt2sas-patchset-v2v3.patch

Diff showing changes v1 => v2:
	http://jcalvinowens.github.io/stuff/mpt2sas-patchset-v1v2.patch
--
To unsubscribe from this list: send the line "unsubscribe linux-scsi" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Nicholas A. Bellinger Aug. 25, 2015, 9:21 p.m. UTC | #11
On Thu, 2015-08-13 at 18:48 -0700, Calvin Owens wrote:
> Hello all,
> 
> This patchset attempts to address problems we've been having with
> panics due to memory corruption from the mpt2sas driver.
> 
> Thanks,
> Calvin
> 
> 
> [PATCH v4 1/2] mpt2sas: Refcount sas_device objects and fix unsafe list
> [PATCH v4 2/2] mpt2sas: Refcount fw_events and fix unsafe list usage
> 
> Total diffstat:
>  drivers/scsi/mpt2sas/mpt2sas_base.h      |  22 +-
>  drivers/scsi/mpt2sas/mpt2sas_scsih.c     | 592 ++++++++++++++++++++++---------
>  drivers/scsi/mpt2sas/mpt2sas_transport.c |  12 +-
>  3 files changed, 451 insertions(+), 175 deletions(-)
> 
> Diff showing changes v3 => v4:
> 	http://jcalvinowens.github.io/stuff/mpt2sas-patchset-v3v4.patch
> 
> Diff showing changes v2 => v3:
> 	http://jcalvinowens.github.io/stuff/mpt2sas-patchset-v2v3.patch
> 
> Diff showing changes v1 => v2:
> 	http://jcalvinowens.github.io/stuff/mpt2sas-patchset-v1v2.patch
> --

(Adding JEJB CC')

James, please considering pick this up for v4.3-rc1.

Btw, I'm seeing the same type of issues on mpt3sas, and unless someone
at Avago is already working on a similar patch series, I'll end up
forward porting these to mpt3sas code.

Thank you,

--nab

--
To unsubscribe from this list: send the line "unsubscribe linux-scsi" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
diff mbox

Patch

diff --git a/drivers/scsi/mpt2sas/mpt2sas_base.h b/drivers/scsi/mpt2sas/mpt2sas_base.h
index caff8d1..1aa10d2 100644
--- a/drivers/scsi/mpt2sas/mpt2sas_base.h
+++ b/drivers/scsi/mpt2sas/mpt2sas_base.h
@@ -376,6 +376,7 @@  struct _sas_device {
 	u8	phy;
 	u8	responding;
 	u8	pfa_led_on;
+	atomic_t is_on_sas_device_init_list;
 };
 
 /**
@@ -833,6 +834,7 @@  struct MPT2SAS_ADAPTER {
 	u8		broadcast_aen_busy;
 	u16		broadcast_aen_pending;
 	u8		shost_recovery;
+	u8		discovered_device_addition_on;
 
 	struct mutex	reset_in_progress_mutex;
 	spinlock_t 	ioc_reset_in_progress_lock;
diff --git a/drivers/scsi/mpt2sas/mpt2sas_scsih.c b/drivers/scsi/mpt2sas/mpt2sas_scsih.c
index 3f26147..2a61286 100644
--- a/drivers/scsi/mpt2sas/mpt2sas_scsih.c
+++ b/drivers/scsi/mpt2sas/mpt2sas_scsih.c
@@ -590,13 +590,20 @@  _scsih_sas_device_remove(struct MPT2SAS_ADAPTER *ioc,
     struct _sas_device *sas_device)
 {
 	unsigned long flags;
+	struct _sas_device *same_sas_device;
 
 	if (!sas_device)
 		return;
 
 	spin_lock_irqsave(&ioc->sas_device_lock, flags);
-	list_del(&sas_device->list);
-	kfree(sas_device);
+	same_sas_device = _scsih_sas_device_find_by_handle(ioc,
+						sas_device->handle);
+	if (same_sas_device) {
+		list_del(&same_sas_device->list);
+		if (atomic_read(&sas_device->is_on_sas_device_init_list))
+			atomic_set(&sas_device->is_on_sas_device_init_list, 0);
+		kfree(same_sas_device);
+	}
 	spin_unlock_irqrestore(&ioc->sas_device_lock, flags);
 }
 
@@ -658,6 +664,7 @@  _scsih_sas_device_init_add(struct MPT2SAS_ADAPTER *ioc,
 	    "(0x%04x), sas_addr(0x%016llx)\n", ioc->name, __func__,
 	    sas_device->handle, (unsigned long long)sas_device->sas_address));
 
+	atomic_set(&sas_device->is_on_sas_device_init_list, 1);
 	spin_lock_irqsave(&ioc->sas_device_lock, flags);
 	list_add_tail(&sas_device->list, &ioc->sas_device_init_list);
 	_scsih_determine_boot_device(ioc, sas_device, 0);
@@ -5364,8 +5371,14 @@  _scsih_device_remove_by_handle(struct MPT2SAS_ADAPTER *ioc, u16 handle)
 
 	spin_lock_irqsave(&ioc->sas_device_lock, flags);
 	sas_device = _scsih_sas_device_find_by_handle(ioc, handle);
-	if (sas_device)
-		list_del(&sas_device->list);
+	if (sas_device) {
+		if (ioc->discovered_device_addition_on &&
+		    atomic_read(&sas_device->is_on_sas_device_init_list)) {
+			spin_unlock_irqrestore(&ioc->sas_device_lock, flags);
+			return;
+		} else
+			list_del(&sas_device->list);
+	}
 	spin_unlock_irqrestore(&ioc->sas_device_lock, flags);
 	if (sas_device)
 		_scsih_remove_device(ioc, sas_device);
@@ -5391,8 +5404,14 @@  mpt2sas_device_remove_by_sas_address(struct MPT2SAS_ADAPTER *ioc,
 	spin_lock_irqsave(&ioc->sas_device_lock, flags);
 	sas_device = mpt2sas_scsih_sas_device_find_by_sas_address(ioc,
 	    sas_address);
-	if (sas_device)
-		list_del(&sas_device->list);
+	if (sas_device) {
+		if (ioc->discovered_device_addition_on &&
+		    atomic_read(&sas_device->is_on_sas_device_init_list)) {
+			spin_unlock_irqrestore(&ioc->sas_device_lock, flags);
+			return;
+		} else
+			list_del(&sas_device->list);
+	}
 	spin_unlock_irqrestore(&ioc->sas_device_lock, flags);
 	if (sas_device)
 		_scsih_remove_device(ioc, sas_device);
@@ -7978,32 +7997,36 @@  _scsih_probe_sas(struct MPT2SAS_ADAPTER *ioc)
 	struct _sas_device *sas_device, *next;
 	unsigned long flags;
 
+	ioc->discovered_device_addition_on = 1;
 	/* SAS Device List */
 	list_for_each_entry_safe(sas_device, next, &ioc->sas_device_init_list,
 	    list) {
 
 		if (ioc->hide_drives)
 			continue;
-
+
 		if (!mpt2sas_transport_port_add(ioc, sas_device->handle,
 		    sas_device->sas_address_parent)) {
-			list_del(&sas_device->list);
-			kfree(sas_device);
+			mpt2sas_transport_port_remove(ioc,
+					sas_device->sas_address,
+					sas_device->sas_address_parent);
+			_scsih_sas_device_remove(ioc, sas_device);
 			continue;
 		} else if (!sas_device->starget) {
 			if (!ioc->is_driver_loading) {
 				mpt2sas_transport_port_remove(ioc,
 					sas_device->sas_address,
 					sas_device->sas_address_parent);
-				list_del(&sas_device->list);
-				kfree(sas_device);
+				_scsih_sas_device_remove(ioc, sas_device);
 				continue;
 			}
 		}
 		spin_lock_irqsave(&ioc->sas_device_lock, flags);
 		list_move_tail(&sas_device->list, &ioc->sas_device_list);
+		atomic_dec(&sas_device->is_on_sas_device_init_list);
 		spin_unlock_irqrestore(&ioc->sas_device_lock, flags);
 	}
+	ioc->discovered_device_addition_on = 0;
 }
 
 /**
diff --git a/drivers/scsi/mpt3sas/mpt3sas_base.h b/drivers/scsi/mpt3sas/mpt3sas_base.h
index afa8816..6188490 100644
--- a/drivers/scsi/mpt3sas/mpt3sas_base.h
+++ b/drivers/scsi/mpt3sas/mpt3sas_base.h
@@ -315,6 +315,7 @@  struct _sas_device {
 	u8	responding;
 	u8	fast_path;
 	u8	pfa_led_on;
+	atomic_t is_on_sas_device_init_list;
 };
 
 /**
@@ -766,6 +767,7 @@  struct MPT3SAS_ADAPTER {
 	u8		broadcast_aen_busy;
 	u16		broadcast_aen_pending;
 	u8		shost_recovery;
+	u8		discovered_device_addition_on;
 
 	struct mutex	reset_in_progress_mutex;
 	spinlock_t	ioc_reset_in_progress_lock;
diff --git a/drivers/scsi/mpt3sas/mpt3sas_scsih.c b/drivers/scsi/mpt3sas/mpt3sas_scsih.c
index 5a97e32..53cc9ea 100644
--- a/drivers/scsi/mpt3sas/mpt3sas_scsih.c
+++ b/drivers/scsi/mpt3sas/mpt3sas_scsih.c
@@ -582,13 +582,20 @@  _scsih_sas_device_remove(struct MPT3SAS_ADAPTER *ioc,
 	struct _sas_device *sas_device)
 {
 	unsigned long flags;
+	struct _sas_device *same_sas_device;
 
 	if (!sas_device)
 		return;
 
 	spin_lock_irqsave(&ioc->sas_device_lock, flags);
-	list_del(&sas_device->list);
-	kfree(sas_device);
+	same_sas_device = _scsih_sas_device_find_by_handle(ioc,
+						sas_device->handle);
+	if (same_sas_device) {
+		list_del(&same_sas_device->list);
+		if (atomic_read(&sas_device->is_on_sas_device_init_list))
+			atomic_set(&sas_device->is_on_sas_device_init_list, 0);
+		kfree(same_sas_device);
+	}
 	spin_unlock_irqrestore(&ioc->sas_device_lock, flags);
 }
 
@@ -610,8 +616,14 @@  _scsih_device_remove_by_handle(struct MPT3SAS_ADAPTER *ioc, u16 handle)
 
 	spin_lock_irqsave(&ioc->sas_device_lock, flags);
 	sas_device = _scsih_sas_device_find_by_handle(ioc, handle);
-	if (sas_device)
-		list_del(&sas_device->list);
+	if (sas_device) {
+		if (ioc->discovered_device_addition_on &&
+		    atomic_read(&sas_device->is_on_sas_device_init_list)) {
+			spin_unlock_irqrestore(&ioc->sas_device_lock, flags);
+			return;
+		} else
+			list_del(&sas_device->list);
+	}
 	spin_unlock_irqrestore(&ioc->sas_device_lock, flags);
 	if (sas_device)
 		_scsih_remove_device(ioc, sas_device);
@@ -637,8 +649,14 @@  mpt3sas_device_remove_by_sas_address(struct MPT3SAS_ADAPTER *ioc,
 	spin_lock_irqsave(&ioc->sas_device_lock, flags);
 	sas_device = mpt3sas_scsih_sas_device_find_by_sas_address(ioc,
 	    sas_address);
-	if (sas_device)
-		list_del(&sas_device->list);
+	if (sas_device) {
+		if (ioc->discovered_device_addition_on &&
+		    atomic_read(&sas_device->is_on_sas_device_init_list)) {
+			spin_unlock_irqrestore(&ioc->sas_device_lock, flags);
+			return;
+		} else
+			list_del(&sas_device->list);
+	}
 	spin_unlock_irqrestore(&ioc->sas_device_lock, flags);
 	if (sas_device)
 		_scsih_remove_device(ioc, sas_device);
@@ -663,6 +681,7 @@  _scsih_sas_device_add(struct MPT3SAS_ADAPTER *ioc,
 		ioc->name, __func__, sas_device->handle,
 		(unsigned long long)sas_device->sas_address));
 
+	atomic_set(&sas_device->is_on_sas_device_init_list, 1);
 	spin_lock_irqsave(&ioc->sas_device_lock, flags);
 	list_add_tail(&sas_device->list, &ioc->sas_device_list);
 	spin_unlock_irqrestore(&ioc->sas_device_lock, flags);
@@ -7610,14 +7629,17 @@  _scsih_probe_sas(struct MPT3SAS_ADAPTER *ioc)
 	struct _sas_device *sas_device, *next;
 	unsigned long flags;
 
+	ioc->discovered_device_addition_on = 1;
 	/* SAS Device List */
 	list_for_each_entry_safe(sas_device, next, &ioc->sas_device_init_list,
 	    list) {
 
 		if (!mpt3sas_transport_port_add(ioc, sas_device->handle,
 		    sas_device->sas_address_parent)) {
-			list_del(&sas_device->list);
-			kfree(sas_device);
+			mpt3sas_transport_port_remove(ioc,
+					sas_device->sas_address,
+					sas_device->sas_address_parent);
+			_scsih_sas_device_remove(ioc, sas_device);
 			continue;
 		} else if (!sas_device->starget) {
 			/*
@@ -7630,16 +7652,17 @@  _scsih_probe_sas(struct MPT3SAS_ADAPTER *ioc)
 				mpt3sas_transport_port_remove(ioc,
 				    sas_device->sas_address,
 				    sas_device->sas_address_parent);
-				list_del(&sas_device->list);
-				kfree(sas_device);
+				_scsih_sas_device_remove(ioc, sas_device);
 				continue;
 			}
 		}
 
 		spin_lock_irqsave(&ioc->sas_device_lock, flags);
 		list_move_tail(&sas_device->list, &ioc->sas_device_list);
+		atomic_dec(&sas_device->is_on_sas_device_init_list);
 		spin_unlock_irqrestore(&ioc->sas_device_lock, flags);
 	}
+	ioc->discovered_device_addition_on = 0;
 }
 
 /**