diff mbox

[for,4.1,resend] libsas: fix "sysfs group not found" warnings at port teardown time

Message ID 20150618032204.29990.87007.stgit@dwillia2-desk3.amr.corp.intel.com (mailing list archive)
State New, archived
Headers show

Commit Message

Dan Williams June 18, 2015, 3:22 a.m. UTC
Praveen reports:

    After some debugging this is what I have found

    sas_phye_loss_of_signal gets triggered on phy_event from mvsas
        sas_phye_loss_of_signal calls sas_deform_port
             sas_deform_port posts a DISCE_DESTRUCT event (sas_unregister_domain_devices-> sas_unregister_dev)
             sas_deform_port calls sas_port_delete
                     sas_port_delete calls sas_port_delete_link
                             sysfs_remove_group: kobject 'port-X:Y'
                     sas_port_delete calls device_del
                             sysfs_remove_group: kobject 'port-X:Y'

    sas_destruct_devices gets triggered for the destruct event (DISCE_DESTRUCT)
        sas_destruct_devices calls sas_rphy_delete
            sas_rphy_delete calls scsi_remove_device
                 scsi_remove_device calls __scsi_remove_device
                         __scsi_remove_device calls bsg_unregister_queue
                                 bsg_unregister_queue -> device_unregister -> device_del -> sysfs_remove_group: kobject 'X:0:0:0'

    Since X:0:0:0 falls under port-X:Y (which got deleted during
    sas_port_delete), this call results in the warning. All the later
    warnings in the dmesg output I sent earlier are trying to delete objects
    under port-X:Y. Since port-X:Y got recursively deleted, all these calls
    result in warnings. Since, the PHY and DISC events are processed in two
    different work queues (and one triggers the other), is there any way
    other than checking if the object exists in sysfs (in device_del) before
    deleting?

    WARNING: CPU: 2 PID: 6 at fs/sysfs/group.c:219 device_del+0x40/0x1c0()
    sysfs group ffffffff818b97e0 not found for kobject '2:0:4:0'
    [..]
    CPU: 2 PID: 6 Comm: kworker/u8:0 Tainted: P        W  O  3.16.7-ckt9-logicube-ng.3 #1
    Hardware name: To be filled by O.E.M. To be filled by O.E.M./VT6085, BIOS 4.6.5 01/23/2015
    Workqueue: scsi_wq_2 sas_destruct_devices [libsas]
     0000000000000009 ffffffff8151cd18 ffff88011b35bcd8 ffffffff810687b7
     ffff88011a661400 ffff88011b35bd28 ffff8800c6e5e968 ffff880000028810
     ffff8800c89f2c00 ffffffff8106881c ffffffff81733b68 0000000000000028
    Call Trace:
     [<ffffffff8151cd18>] ? dump_stack+0x41/0x51
     [<ffffffff810687b7>] ? warn_slowpath_common+0x77/0x90
     [<ffffffff8106881c>] ? warn_slowpath_fmt+0x4c/0x50
     [<ffffffff813ad2d0>] ? device_del+0x40/0x1c0
     [<ffffffff813ad46a>] ? device_unregister+0x1a/0x70
     [<ffffffff812a535e>] ? bsg_unregister_queue+0x5e/0xb0
     [<ffffffffa00781a9>] ? __scsi_remove_device+0xa9/0xd0 [scsi_mod]

It appears we've always been double deleting the devices below sas_port,
but recent sysfs changes now exposes this problem.  Libsas should delete
all the devices from rphy down before deleting the parent port.

Cc: <stable@vger.kernel.org>
Reported-by: Praveen Murali <pmurali@logicube.com>
Tested-by: Praveen Murali <pmurali@logicube.com>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
---
 drivers/scsi/libsas/sas_discover.c |    6 +++---
 drivers/scsi/libsas/sas_port.c     |    1 -
 2 files changed, 3 insertions(+), 4 deletions(-)


--
To unsubscribe from this list: send the line "unsubscribe linux-scsi" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Comments

Hannes Reinecke June 21, 2015, 2:27 p.m. UTC | #1
On 06/18/2015 05:22 AM, Dan Williams wrote:
> Praveen reports:
> 
>     After some debugging this is what I have found
> 
>     sas_phye_loss_of_signal gets triggered on phy_event from mvsas
>         sas_phye_loss_of_signal calls sas_deform_port
>              sas_deform_port posts a DISCE_DESTRUCT event (sas_unregister_domain_devices-> sas_unregister_dev)
>              sas_deform_port calls sas_port_delete
>                      sas_port_delete calls sas_port_delete_link
>                              sysfs_remove_group: kobject 'port-X:Y'
>                      sas_port_delete calls device_del
>                              sysfs_remove_group: kobject 'port-X:Y'
> 
>     sas_destruct_devices gets triggered for the destruct event (DISCE_DESTRUCT)
>         sas_destruct_devices calls sas_rphy_delete
>             sas_rphy_delete calls scsi_remove_device
>                  scsi_remove_device calls __scsi_remove_device
>                          __scsi_remove_device calls bsg_unregister_queue
>                                  bsg_unregister_queue -> device_unregister -> device_del -> sysfs_remove_group: kobject 'X:0:0:0'
> 
>     Since X:0:0:0 falls under port-X:Y (which got deleted during
>     sas_port_delete), this call results in the warning. All the later
>     warnings in the dmesg output I sent earlier are trying to delete objects
>     under port-X:Y. Since port-X:Y got recursively deleted, all these calls
>     result in warnings. Since, the PHY and DISC events are processed in two
>     different work queues (and one triggers the other), is there any way
>     other than checking if the object exists in sysfs (in device_del) before
>     deleting?
> 
>     WARNING: CPU: 2 PID: 6 at fs/sysfs/group.c:219 device_del+0x40/0x1c0()
>     sysfs group ffffffff818b97e0 not found for kobject '2:0:4:0'
>     [..]
>     CPU: 2 PID: 6 Comm: kworker/u8:0 Tainted: P        W  O  3.16.7-ckt9-logicube-ng.3 #1
>     Hardware name: To be filled by O.E.M. To be filled by O.E.M./VT6085, BIOS 4.6.5 01/23/2015
>     Workqueue: scsi_wq_2 sas_destruct_devices [libsas]
>      0000000000000009 ffffffff8151cd18 ffff88011b35bcd8 ffffffff810687b7
>      ffff88011a661400 ffff88011b35bd28 ffff8800c6e5e968 ffff880000028810
>      ffff8800c89f2c00 ffffffff8106881c ffffffff81733b68 0000000000000028
>     Call Trace:
>      [<ffffffff8151cd18>] ? dump_stack+0x41/0x51
>      [<ffffffff810687b7>] ? warn_slowpath_common+0x77/0x90
>      [<ffffffff8106881c>] ? warn_slowpath_fmt+0x4c/0x50
>      [<ffffffff813ad2d0>] ? device_del+0x40/0x1c0
>      [<ffffffff813ad46a>] ? device_unregister+0x1a/0x70
>      [<ffffffff812a535e>] ? bsg_unregister_queue+0x5e/0xb0
>      [<ffffffffa00781a9>] ? __scsi_remove_device+0xa9/0xd0 [scsi_mod]
> 
> It appears we've always been double deleting the devices below sas_port,
> but recent sysfs changes now exposes this problem.  Libsas should delete
> all the devices from rphy down before deleting the parent port.
> 
> Cc: <stable@vger.kernel.org>
> Reported-by: Praveen Murali <pmurali@logicube.com>
> Tested-by: Praveen Murali <pmurali@logicube.com>
> Signed-off-by: Dan Williams <dan.j.williams@intel.com>
Reviewed-by: Hannes Reinecke <hare@suse.de>

Cheers,

Hannes
James Bottomley July 22, 2015, 6:28 p.m. UTC | #2
On Wed, 2015-06-17 at 23:22 -0400, Dan Williams wrote:
> Praveen reports:
> 
>     After some debugging this is what I have found
> 
>     sas_phye_loss_of_signal gets triggered on phy_event from mvsas
>         sas_phye_loss_of_signal calls sas_deform_port
>              sas_deform_port posts a DISCE_DESTRUCT event (sas_unregister_domain_devices-> sas_unregister_dev)
>              sas_deform_port calls sas_port_delete
>                      sas_port_delete calls sas_port_delete_link
>                              sysfs_remove_group: kobject 'port-X:Y'
>                      sas_port_delete calls device_del
>                              sysfs_remove_group: kobject 'port-X:Y'
> 
>     sas_destruct_devices gets triggered for the destruct event (DISCE_DESTRUCT)
>         sas_destruct_devices calls sas_rphy_delete
>             sas_rphy_delete calls scsi_remove_device
>                  scsi_remove_device calls __scsi_remove_device
>                          __scsi_remove_device calls bsg_unregister_queue
>                                  bsg_unregister_queue -> device_unregister -> device_del -> sysfs_remove_group: kobject 'X:0:0:0'
> 
>     Since X:0:0:0 falls under port-X:Y (which got deleted during
>     sas_port_delete), this call results in the warning. All the later
>     warnings in the dmesg output I sent earlier are trying to delete objects
>     under port-X:Y. Since port-X:Y got recursively deleted, all these calls
>     result in warnings. Since, the PHY and DISC events are processed in two
>     different work queues (and one triggers the other), is there any way
>     other than checking if the object exists in sysfs (in device_del) before
>     deleting?
> 
>     WARNING: CPU: 2 PID: 6 at fs/sysfs/group.c:219 device_del+0x40/0x1c0()
>     sysfs group ffffffff818b97e0 not found for kobject '2:0:4:0'
>     [..]
>     CPU: 2 PID: 6 Comm: kworker/u8:0 Tainted: P        W  O  3.16.7-ckt9-logicube-ng.3 #1
>     Hardware name: To be filled by O.E.M. To be filled by O.E.M./VT6085, BIOS 4.6.5 01/23/2015
>     Workqueue: scsi_wq_2 sas_destruct_devices [libsas]
>      0000000000000009 ffffffff8151cd18 ffff88011b35bcd8 ffffffff810687b7
>      ffff88011a661400 ffff88011b35bd28 ffff8800c6e5e968 ffff880000028810
>      ffff8800c89f2c00 ffffffff8106881c ffffffff81733b68 0000000000000028
>     Call Trace:
>      [<ffffffff8151cd18>] ? dump_stack+0x41/0x51
>      [<ffffffff810687b7>] ? warn_slowpath_common+0x77/0x90
>      [<ffffffff8106881c>] ? warn_slowpath_fmt+0x4c/0x50
>      [<ffffffff813ad2d0>] ? device_del+0x40/0x1c0
>      [<ffffffff813ad46a>] ? device_unregister+0x1a/0x70
>      [<ffffffff812a535e>] ? bsg_unregister_queue+0x5e/0xb0
>      [<ffffffffa00781a9>] ? __scsi_remove_device+0xa9/0xd0 [scsi_mod]
> 
> It appears we've always been double deleting the devices below sas_port,
> but recent sysfs changes now exposes this problem.  Libsas should delete
> all the devices from rphy down before deleting the parent port.

There's a missing description of the fix here.

        So we make the DISCE_DESTROY event delete the port as well as
        all the underlying devices
?

> Cc: <stable@vger.kernel.org>
> Reported-by: Praveen Murali <pmurali@logicube.com>
> Tested-by: Praveen Murali <pmurali@logicube.com>
> Signed-off-by: Dan Williams <dan.j.williams@intel.com>
> ---
>  drivers/scsi/libsas/sas_discover.c |    6 +++---
>  drivers/scsi/libsas/sas_port.c     |    1 -
>  2 files changed, 3 insertions(+), 4 deletions(-)
> 
> diff --git a/drivers/scsi/libsas/sas_discover.c b/drivers/scsi/libsas/sas_discover.c
> index 60de66252fa2..a4db770fe8b0 100644
> --- a/drivers/scsi/libsas/sas_discover.c
> +++ b/drivers/scsi/libsas/sas_discover.c
> @@ -362,11 +362,14 @@ static void sas_destruct_devices(struct work_struct *work)
>  	clear_bit(DISCE_DESTRUCT, &port->disc.pending);
>  
>  	list_for_each_entry_safe(dev, n, &port->destroy_list, disco_list_node) {
> +		struct sas_port *sas_port = dev_to_sas_port(dev->rphy->dev.parent);
> +

Do you need this? isn't what you've elaborately got here as sas_port,
simply port->port?  Assuming you don't NULL that out (see below) all
this goes away.

>  		list_del_init(&dev->disco_list_node);
>  
>  		sas_remove_children(&dev->rphy->dev);
>  		sas_rphy_delete(dev->rphy);
>  		sas_unregister_common_dev(port, dev);
> +		sas_port_delete(sas_port);

So this becomes sas_port_delete(port->port);

>  	}
>  }
>  
> @@ -400,9 +403,6 @@ void sas_unregister_domain_devices(struct asd_sas_port *port, int gone)
>  
>  	list_for_each_entry_safe(dev, n, &port->disco_list, disco_list_node)
>  		sas_unregister_dev(port, dev);
> -
> -	port->port->rphy = NULL;
> -

Why does this line need removing.  It's only used by ATA devices on an
expander, but it's logical that it removes the visibility of the device
being destroyed.

>  }
>  
>  void sas_device_set_phy(struct domain_device *dev, struct sas_port *port)
> diff --git a/drivers/scsi/libsas/sas_port.c b/drivers/scsi/libsas/sas_port.c
> index d3c5297c6c89..9a25ae3a52a4 100644
> --- a/drivers/scsi/libsas/sas_port.c
> +++ b/drivers/scsi/libsas/sas_port.c
> @@ -219,7 +219,6 @@ void sas_deform_port(struct asd_sas_phy *phy, int gone)
>  
>  	if (port->num_phys == 1) {
>  		sas_unregister_domain_devices(port, gone);
> -		sas_port_delete(port->port);
>  		port->port = NULL;
>  	} else {
>  		sas_port_delete_phy(port->port, phy->phy);
> 

This should become

if (port->num_phys == 1)
	sas_unregister_domain_device(port, gone);

sas_port_delete_phy(port->port, phy->phy);

So we end up with a port scheduled for destruction with no phys rather
than making the last phy association hang around until the DISCE
workqueue runs.

James


--
To unsubscribe from this list: send the line "unsubscribe linux-scsi" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Dan Williams July 22, 2015, 9:08 p.m. UTC | #3
On Wed, Jul 22, 2015 at 11:28 AM, James Bottomley
<James.Bottomley@hansenpartnership.com> wrote:
> On Wed, 2015-06-17 at 23:22 -0400, Dan Williams wrote:
>> Praveen reports:
>>
>>     After some debugging this is what I have found
>>
>>     sas_phye_loss_of_signal gets triggered on phy_event from mvsas
>>         sas_phye_loss_of_signal calls sas_deform_port
>>              sas_deform_port posts a DISCE_DESTRUCT event (sas_unregister_domain_devices-> sas_unregister_dev)
>>              sas_deform_port calls sas_port_delete
>>                      sas_port_delete calls sas_port_delete_link
>>                              sysfs_remove_group: kobject 'port-X:Y'
>>                      sas_port_delete calls device_del
>>                              sysfs_remove_group: kobject 'port-X:Y'
>>
>>     sas_destruct_devices gets triggered for the destruct event (DISCE_DESTRUCT)
>>         sas_destruct_devices calls sas_rphy_delete
>>             sas_rphy_delete calls scsi_remove_device
>>                  scsi_remove_device calls __scsi_remove_device
>>                          __scsi_remove_device calls bsg_unregister_queue
>>                                  bsg_unregister_queue -> device_unregister -> device_del -> sysfs_remove_group: kobject 'X:0:0:0'
>>
>>     Since X:0:0:0 falls under port-X:Y (which got deleted during
>>     sas_port_delete), this call results in the warning. All the later
>>     warnings in the dmesg output I sent earlier are trying to delete objects
>>     under port-X:Y. Since port-X:Y got recursively deleted, all these calls
>>     result in warnings. Since, the PHY and DISC events are processed in two
>>     different work queues (and one triggers the other), is there any way
>>     other than checking if the object exists in sysfs (in device_del) before
>>     deleting?
>>
>>     WARNING: CPU: 2 PID: 6 at fs/sysfs/group.c:219 device_del+0x40/0x1c0()
>>     sysfs group ffffffff818b97e0 not found for kobject '2:0:4:0'
>>     [..]
>>     CPU: 2 PID: 6 Comm: kworker/u8:0 Tainted: P        W  O  3.16.7-ckt9-logicube-ng.3 #1
>>     Hardware name: To be filled by O.E.M. To be filled by O.E.M./VT6085, BIOS 4.6.5 01/23/2015
>>     Workqueue: scsi_wq_2 sas_destruct_devices [libsas]
>>      0000000000000009 ffffffff8151cd18 ffff88011b35bcd8 ffffffff810687b7
>>      ffff88011a661400 ffff88011b35bd28 ffff8800c6e5e968 ffff880000028810
>>      ffff8800c89f2c00 ffffffff8106881c ffffffff81733b68 0000000000000028
>>     Call Trace:
>>      [<ffffffff8151cd18>] ? dump_stack+0x41/0x51
>>      [<ffffffff810687b7>] ? warn_slowpath_common+0x77/0x90
>>      [<ffffffff8106881c>] ? warn_slowpath_fmt+0x4c/0x50
>>      [<ffffffff813ad2d0>] ? device_del+0x40/0x1c0
>>      [<ffffffff813ad46a>] ? device_unregister+0x1a/0x70
>>      [<ffffffff812a535e>] ? bsg_unregister_queue+0x5e/0xb0
>>      [<ffffffffa00781a9>] ? __scsi_remove_device+0xa9/0xd0 [scsi_mod]
>>
>> It appears we've always been double deleting the devices below sas_port,
>> but recent sysfs changes now exposes this problem.  Libsas should delete
>> all the devices from rphy down before deleting the parent port.
>
> There's a missing description of the fix here.
>
>         So we make the DISCE_DESTROY event delete the port as well as
>         all the underlying devices
> ?

"We make DISCE_DESTROY responsible for deleting the child devices as
well as the port." == "Libsas should delete all the devices from rphy
down before deleting the parent port."

>> Cc: <stable@vger.kernel.org>
>> Reported-by: Praveen Murali <pmurali@logicube.com>
>> Tested-by: Praveen Murali <pmurali@logicube.com>
>> Signed-off-by: Dan Williams <dan.j.williams@intel.com>
>> ---
>>  drivers/scsi/libsas/sas_discover.c |    6 +++---
>>  drivers/scsi/libsas/sas_port.c     |    1 -
>>  2 files changed, 3 insertions(+), 4 deletions(-)
>>
>> diff --git a/drivers/scsi/libsas/sas_discover.c b/drivers/scsi/libsas/sas_discover.c
>> index 60de66252fa2..a4db770fe8b0 100644
>> --- a/drivers/scsi/libsas/sas_discover.c
>> +++ b/drivers/scsi/libsas/sas_discover.c
>> @@ -362,11 +362,14 @@ static void sas_destruct_devices(struct work_struct *work)
>>       clear_bit(DISCE_DESTRUCT, &port->disc.pending);
>>
>>       list_for_each_entry_safe(dev, n, &port->destroy_list, disco_list_node) {
>> +             struct sas_port *sas_port = dev_to_sas_port(dev->rphy->dev.parent);
>> +
>
> Do you need this? isn't what you've elaborately got here as sas_port,
> simply port->port?

Yes, it's just an elaborate workaround since port->port is already torn down.

> Assuming you don't NULL that out (see below) all
> this goes away.

Not sure, I'd have to go look if libsas is prepared to have port->port
being valid for longer.

>
>>               list_del_init(&dev->disco_list_node);
>>
>>               sas_remove_children(&dev->rphy->dev);
>>               sas_rphy_delete(dev->rphy);
>>               sas_unregister_common_dev(port, dev);
>> +             sas_port_delete(sas_port);
>
> So this becomes sas_port_delete(port->port);
>
>>       }
>>  }
>>
>> @@ -400,9 +403,6 @@ void sas_unregister_domain_devices(struct asd_sas_port *port, int gone)
>>
>>       list_for_each_entry_safe(dev, n, &port->disco_list, disco_list_node)
>>               sas_unregister_dev(port, dev);
>> -
>> -     port->port->rphy = NULL;
>> -
>
> Why does this line need removing.  It's only used by ATA devices on an
> expander, but it's logical that it removes the visibility of the device
> being destroyed.

It's already performed by sas_port_delete()

>
>>  }
>>
>>  void sas_device_set_phy(struct domain_device *dev, struct sas_port *port)
>> diff --git a/drivers/scsi/libsas/sas_port.c b/drivers/scsi/libsas/sas_port.c
>> index d3c5297c6c89..9a25ae3a52a4 100644
>> --- a/drivers/scsi/libsas/sas_port.c
>> +++ b/drivers/scsi/libsas/sas_port.c
>> @@ -219,7 +219,6 @@ void sas_deform_port(struct asd_sas_phy *phy, int gone)
>>
>>       if (port->num_phys == 1) {
>>               sas_unregister_domain_devices(port, gone);
>> -             sas_port_delete(port->port);
>>               port->port = NULL;
>>       } else {
>>               sas_port_delete_phy(port->port, phy->phy);
>>
>
> This should become
>
> if (port->num_phys == 1)
>         sas_unregister_domain_device(port, gone);
>
> sas_port_delete_phy(port->port, phy->phy);
>
> So we end up with a port scheduled for destruction with no phys rather
> than making the last phy association hang around until the DISCE
> workqueue runs.

Sounds ok in theory.  I don't have a libsas environment handy, I
worked with Praveen to validate the version as submitted if you want
to re-work it.
--
To unsubscribe from this list: send the line "unsubscribe linux-scsi" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
James Bottomley July 27, 2015, 3:17 p.m. UTC | #4
On Wed, 2015-07-22 at 14:08 -0700, Dan Williams wrote:
> On Wed, Jul 22, 2015 at 11:28 AM, James Bottomley
> <James.Bottomley@hansenpartnership.com> wrote:
> >
> >>  }
> >>
> >>  void sas_device_set_phy(struct domain_device *dev, struct sas_port *port)
> >> diff --git a/drivers/scsi/libsas/sas_port.c b/drivers/scsi/libsas/sas_port.c
> >> index d3c5297c6c89..9a25ae3a52a4 100644
> >> --- a/drivers/scsi/libsas/sas_port.c
> >> +++ b/drivers/scsi/libsas/sas_port.c
> >> @@ -219,7 +219,6 @@ void sas_deform_port(struct asd_sas_phy *phy, int gone)
> >>
> >>       if (port->num_phys == 1) {
> >>               sas_unregister_domain_devices(port, gone);
> >> -             sas_port_delete(port->port);
> >>               port->port = NULL;
> >>       } else {
> >>               sas_port_delete_phy(port->port, phy->phy);
> >>
> >
> > This should become
> >
> > if (port->num_phys == 1)
> >         sas_unregister_domain_device(port, gone);
> >
> > sas_port_delete_phy(port->port, phy->phy);
> >
> > So we end up with a port scheduled for destruction with no phys rather
> > than making the last phy association hang around until the DISCE
> > workqueue runs.
> 
> Sounds ok in theory.

It's not really a choice.  The specific problem you've introduced with
this patch is failure to cope with link flutter: a deform and form event
queued sequentially.  In the new scheme you're trying to introduce, the
destruct event gets queued from the deform but behind the form and the
link flutter results in a dead link.  I thought just forcing a zero phy
port would fix this, but it won't, either the destruct has to run in the
context of the deform event or the form has to be queued later than the
destruct.  I think coupled with the changes above, there needs to be

if (port->port) {
	/* dying port, requeue form event */
	resend the PORTE_BYTES_DMAED event
	return
}

inside the unmatched port loop in sas_port_form() if nothing is found as
well to close this.

>   I don't have a libsas environment handy, I worked with Praveen to
> validate the version as submitted if you want to re-work it.

A couple of days ago, this was so urgent as to have to go outside the
usual patch process ... now it's not important enough for you to bother
working on it; which is it?

James


--
To unsubscribe from this list: send the line "unsubscribe linux-scsi" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Dan Williams July 27, 2015, 3:48 p.m. UTC | #5
On Mon, Jul 27, 2015 at 8:17 AM, James Bottomley
<James.Bottomley@hansenpartnership.com> wrote:
> On Wed, 2015-07-22 at 14:08 -0700, Dan Williams wrote:
>> On Wed, Jul 22, 2015 at 11:28 AM, James Bottomley
>> <James.Bottomley@hansenpartnership.com> wrote:
>> >
>> >>  }
>> >>
>> >>  void sas_device_set_phy(struct domain_device *dev, struct sas_port *port)
>> >> diff --git a/drivers/scsi/libsas/sas_port.c b/drivers/scsi/libsas/sas_port.c
>> >> index d3c5297c6c89..9a25ae3a52a4 100644
>> >> --- a/drivers/scsi/libsas/sas_port.c
>> >> +++ b/drivers/scsi/libsas/sas_port.c
>> >> @@ -219,7 +219,6 @@ void sas_deform_port(struct asd_sas_phy *phy, int gone)
>> >>
>> >>       if (port->num_phys == 1) {
>> >>               sas_unregister_domain_devices(port, gone);
>> >> -             sas_port_delete(port->port);
>> >>               port->port = NULL;
>> >>       } else {
>> >>               sas_port_delete_phy(port->port, phy->phy);
>> >>
>> >
>> > This should become
>> >
>> > if (port->num_phys == 1)
>> >         sas_unregister_domain_device(port, gone);
>> >
>> > sas_port_delete_phy(port->port, phy->phy);
>> >
>> > So we end up with a port scheduled for destruction with no phys rather
>> > than making the last phy association hang around until the DISCE
>> > workqueue runs.
>>
>> Sounds ok in theory.
>
> It's not really a choice.  The specific problem you've introduced with
> this patch is failure to cope with link flutter: a deform and form event
> queued sequentially.  In the new scheme you're trying to introduce, the
> destruct event gets queued from the deform but behind the form and the
> link flutter results in a dead link.  I thought just forcing a zero phy
> port would fix this, but it won't, either the destruct has to run in the
> context of the deform event or the form has to be queued later than the
> destruct.  I think coupled with the changes above, there needs to be
>
> if (port->port) {
>         /* dying port, requeue form event */
>         resend the PORTE_BYTES_DMAED event
>         return
> }
>
> inside the unmatched port loop in sas_port_form() if nothing is found as
> well to close this.

I think it's too late.  Once the lldd has triggered libsas to start
tear down I seem to recall the lldd has the expectation that a new
PORTE_BYTES_DMAED triggers the creation of a new port instance for
that phy.  Once the flutter reaches libsas the race is already lost
and the port needs to be torn down, but I would need to take a closer
look.

>>   I don't have a libsas environment handy, I worked with Praveen to
>> validate the version as submitted if you want to re-work it.
>
> A couple of days ago, this was so urgent as to have to go outside the
> usual patch process ... now it's not important enough for you to bother
> working on it; which is it?

Neither, it was a reviewed patch that was idling in the process.  I'm
still of the opinion that pinging Andrew in a case like this *is* the
expected process, unless there's a place I can check that a patch is
still in the application queue?
--
To unsubscribe from this list: send the line "unsubscribe linux-scsi" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
James Bottomley July 27, 2015, 5:11 p.m. UTC | #6
On Mon, 2015-07-27 at 08:48 -0700, Dan Williams wrote:
> On Mon, Jul 27, 2015 at 8:17 AM, James Bottomley
> <James.Bottomley@hansenpartnership.com> wrote:
> > On Wed, 2015-07-22 at 14:08 -0700, Dan Williams wrote:
> >> On Wed, Jul 22, 2015 at 11:28 AM, James Bottomley
> >> <James.Bottomley@hansenpartnership.com> wrote:
> >> >
> >> >>  }
> >> >>
> >> >>  void sas_device_set_phy(struct domain_device *dev, struct sas_port *port)
> >> >> diff --git a/drivers/scsi/libsas/sas_port.c b/drivers/scsi/libsas/sas_port.c
> >> >> index d3c5297c6c89..9a25ae3a52a4 100644
> >> >> --- a/drivers/scsi/libsas/sas_port.c
> >> >> +++ b/drivers/scsi/libsas/sas_port.c
> >> >> @@ -219,7 +219,6 @@ void sas_deform_port(struct asd_sas_phy *phy, int gone)
> >> >>
> >> >>       if (port->num_phys == 1) {
> >> >>               sas_unregister_domain_devices(port, gone);
> >> >> -             sas_port_delete(port->port);
> >> >>               port->port = NULL;
> >> >>       } else {
> >> >>               sas_port_delete_phy(port->port, phy->phy);
> >> >>
> >> >
> >> > This should become
> >> >
> >> > if (port->num_phys == 1)
> >> >         sas_unregister_domain_device(port, gone);
> >> >
> >> > sas_port_delete_phy(port->port, phy->phy);
> >> >
> >> > So we end up with a port scheduled for destruction with no phys rather
> >> > than making the last phy association hang around until the DISCE
> >> > workqueue runs.
> >>
> >> Sounds ok in theory.
> >
> > It's not really a choice.  The specific problem you've introduced with
> > this patch is failure to cope with link flutter: a deform and form event
> > queued sequentially.  In the new scheme you're trying to introduce, the
> > destruct event gets queued from the deform but behind the form and the
> > link flutter results in a dead link.  I thought just forcing a zero phy
> > port would fix this, but it won't, either the destruct has to run in the
> > context of the deform event or the form has to be queued later than the
> > destruct.  I think coupled with the changes above, there needs to be
> >
> > if (port->port) {
> >         /* dying port, requeue form event */
> >         resend the PORTE_BYTES_DMAED event
> >         return
> > }
> >
> > inside the unmatched port loop in sas_port_form() if nothing is found as
> > well to close this.
> 
> I think it's too late.  Once the lldd has triggered libsas to start
> tear down I seem to recall the lldd has the expectation that a new
> PORTE_BYTES_DMAED triggers the creation of a new port instance for
> that phy.  Once the flutter reaches libsas the race is already lost
> and the port needs to be torn down, but I would need to take a closer
> look.

I don't understand your reasoning.  The expectation is that
PORTE_BYTES_DMAED leads to port formation.  The proposal detects that
this event precedes DISCE_DESTRUCT for the port and requeues the event,
now after DISC_DESTRUCT, so it gets acted on.  Where is the problem?

James


--
To unsubscribe from this list: send the line "unsubscribe linux-scsi" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Dan Williams July 27, 2015, 5:55 p.m. UTC | #7
On Mon, Jul 27, 2015 at 10:11 AM, James Bottomley
<James.Bottomley@hansenpartnership.com> wrote:
> On Mon, 2015-07-27 at 08:48 -0700, Dan Williams wrote:
>> On Mon, Jul 27, 2015 at 8:17 AM, James Bottomley
>> <James.Bottomley@hansenpartnership.com> wrote:
>> > On Wed, 2015-07-22 at 14:08 -0700, Dan Williams wrote:
>> >> On Wed, Jul 22, 2015 at 11:28 AM, James Bottomley
>> >> <James.Bottomley@hansenpartnership.com> wrote:
>> >> >
>> >> >>  }
>> >> >>
>> >> >>  void sas_device_set_phy(struct domain_device *dev, struct sas_port *port)
>> >> >> diff --git a/drivers/scsi/libsas/sas_port.c b/drivers/scsi/libsas/sas_port.c
>> >> >> index d3c5297c6c89..9a25ae3a52a4 100644
>> >> >> --- a/drivers/scsi/libsas/sas_port.c
>> >> >> +++ b/drivers/scsi/libsas/sas_port.c
>> >> >> @@ -219,7 +219,6 @@ void sas_deform_port(struct asd_sas_phy *phy, int gone)
>> >> >>
>> >> >>       if (port->num_phys == 1) {
>> >> >>               sas_unregister_domain_devices(port, gone);
>> >> >> -             sas_port_delete(port->port);
>> >> >>               port->port = NULL;
>> >> >>       } else {
>> >> >>               sas_port_delete_phy(port->port, phy->phy);
>> >> >>
>> >> >
>> >> > This should become
>> >> >
>> >> > if (port->num_phys == 1)
>> >> >         sas_unregister_domain_device(port, gone);
>> >> >
>> >> > sas_port_delete_phy(port->port, phy->phy);
>> >> >
>> >> > So we end up with a port scheduled for destruction with no phys rather
>> >> > than making the last phy association hang around until the DISCE
>> >> > workqueue runs.
>> >>
>> >> Sounds ok in theory.
>> >
>> > It's not really a choice.  The specific problem you've introduced with
>> > this patch is failure to cope with link flutter: a deform and form event
>> > queued sequentially.  In the new scheme you're trying to introduce, the
>> > destruct event gets queued from the deform but behind the form and the
>> > link flutter results in a dead link.  I thought just forcing a zero phy
>> > port would fix this, but it won't, either the destruct has to run in the
>> > context of the deform event or the form has to be queued later than the
>> > destruct.  I think coupled with the changes above, there needs to be
>> >
>> > if (port->port) {
>> >         /* dying port, requeue form event */
>> >         resend the PORTE_BYTES_DMAED event
>> >         return
>> > }
>> >
>> > inside the unmatched port loop in sas_port_form() if nothing is found as
>> > well to close this.
>>
>> I think it's too late.  Once the lldd has triggered libsas to start
>> tear down I seem to recall the lldd has the expectation that a new
>> PORTE_BYTES_DMAED triggers the creation of a new port instance for
>> that phy.  Once the flutter reaches libsas the race is already lost
>> and the port needs to be torn down, but I would need to take a closer
>> look.
>
> I don't understand your reasoning.  The expectation is that
> PORTE_BYTES_DMAED leads to port formation.  The proposal detects that
> this event precedes DISCE_DESTRUCT for the port and requeues the event,
> now after DISC_DESTRUCT, so it gets acted on.  Where is the problem?
>

Ah, I read "flutter" and mistakenly thought "debounce".  You're
addressing the case where we're fully committed to the port going
down, but a "port up" event is injected between the "port down" and
"destruct" events.  Yes, I agree re-queuing needs to happen, however
don't we now have queue re-order problem with respect to a new "port
down" event?  It seems events need to be held off in-order while the
subordinate DISCE events are processed.
--
To unsubscribe from this list: send the line "unsubscribe linux-scsi" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
James Bottomley July 27, 2015, 6:07 p.m. UTC | #8
On Mon, 2015-07-27 at 08:48 -0700, Dan Williams wrote:
> On Mon, Jul 27, 2015 at 8:17 AM, James Bottomley
> <James.Bottomley@hansenpartnership.com> wrote:
> > On Wed, 2015-07-22 at 14:08 -0700, Dan Williams wrote:
> >>   I don't have a libsas environment handy, I worked with Praveen to
> >> validate the version as submitted if you want to re-work it.
> >
> > A couple of days ago, this was so urgent as to have to go outside the
> > usual patch process ... now it's not important enough for you to bother
> > working on it; which is it?
> 
> Neither, it was a reviewed patch that was idling in the process.  I'm
> still of the opinion that pinging Andrew in a case like this *is* the
> expected process, unless there's a place I can check that a patch is
> still in the application queue?

I didn't ask you to justify your process, I asked you how important you
thought the patch was mainly because of the conflicting signals you've
sent.  I get that you think I should treat all your patches as important
whether you do or not, but the world doesn't quite work like that: patch
application is a process of triage.  Patches, like this, which have
timing related issues potentially leading to races get looked at by me
as the last reviewer.  The speed of review depends on several factors,
but one of which is what type of user visible issue is this causing.
The user visible effects of this are a nasty warning message and nothing
more, I believe?  A useful indicator in this triage is how important the
submitter thinks the patch is, which was originally why I asked.

James


--
To unsubscribe from this list: send the line "unsubscribe linux-scsi" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Dan Williams July 27, 2015, 6:24 p.m. UTC | #9
On Mon, Jul 27, 2015 at 11:07 AM, James Bottomley
<James.Bottomley@hansenpartnership.com> wrote:
> On Mon, 2015-07-27 at 08:48 -0700, Dan Williams wrote:
>> On Mon, Jul 27, 2015 at 8:17 AM, James Bottomley
>> <James.Bottomley@hansenpartnership.com> wrote:
>> > On Wed, 2015-07-22 at 14:08 -0700, Dan Williams wrote:
>> >>   I don't have a libsas environment handy, I worked with Praveen to
>> >> validate the version as submitted if you want to re-work it.
>> >
>> > A couple of days ago, this was so urgent as to have to go outside the
>> > usual patch process ... now it's not important enough for you to bother
>> > working on it; which is it?
>>
>> Neither, it was a reviewed patch that was idling in the process.  I'm
>> still of the opinion that pinging Andrew in a case like this *is* the
>> expected process, unless there's a place I can check that a patch is
>> still in the application queue?
>
> I didn't ask you to justify your process, I asked you how important you
> thought the patch was mainly because of the conflicting signals you've
> sent.  I get that you think I should treat all your patches as important
> whether you do or not, but the world doesn't quite work like that: patch
> application is a process of triage.  Patches, like this, which have
> timing related issues potentially leading to races get looked at by me
> as the last reviewer.  The speed of review depends on several factors,
> but one of which is what type of user visible issue is this causing.
> The user visible effects of this are a nasty warning message and nothing
> more, I believe?  A useful indicator in this triage is how important the
> submitter thinks the patch is, which was originally why I asked.
>

That would be a question to Praveen.  It wasn't clear to me whether
this sysfs backtrace was a simply a warning or eventually fatal to the
box.
--
To unsubscribe from this list: send the line "unsubscribe linux-scsi" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
James Bottomley July 27, 2015, 6:38 p.m. UTC | #10
On Mon, 2015-07-27 at 10:55 -0700, Dan Williams wrote:
> On Mon, Jul 27, 2015 at 10:11 AM, James Bottomley
> <James.Bottomley@hansenpartnership.com> wrote:
> > On Mon, 2015-07-27 at 08:48 -0700, Dan Williams wrote:
> >> On Mon, Jul 27, 2015 at 8:17 AM, James Bottomley
> >> <James.Bottomley@hansenpartnership.com> wrote:
> >> > On Wed, 2015-07-22 at 14:08 -0700, Dan Williams wrote:
> >> >> On Wed, Jul 22, 2015 at 11:28 AM, James Bottomley
> >> >> <James.Bottomley@hansenpartnership.com> wrote:
> >> >> >
> >> >> >>  }
> >> >> >>
> >> >> >>  void sas_device_set_phy(struct domain_device *dev, struct sas_port *port)
> >> >> >> diff --git a/drivers/scsi/libsas/sas_port.c b/drivers/scsi/libsas/sas_port.c
> >> >> >> index d3c5297c6c89..9a25ae3a52a4 100644
> >> >> >> --- a/drivers/scsi/libsas/sas_port.c
> >> >> >> +++ b/drivers/scsi/libsas/sas_port.c
> >> >> >> @@ -219,7 +219,6 @@ void sas_deform_port(struct asd_sas_phy *phy, int gone)
> >> >> >>
> >> >> >>       if (port->num_phys == 1) {
> >> >> >>               sas_unregister_domain_devices(port, gone);
> >> >> >> -             sas_port_delete(port->port);
> >> >> >>               port->port = NULL;
> >> >> >>       } else {
> >> >> >>               sas_port_delete_phy(port->port, phy->phy);
> >> >> >>
> >> >> >
> >> >> > This should become
> >> >> >
> >> >> > if (port->num_phys == 1)
> >> >> >         sas_unregister_domain_device(port, gone);
> >> >> >
> >> >> > sas_port_delete_phy(port->port, phy->phy);
> >> >> >
> >> >> > So we end up with a port scheduled for destruction with no phys rather
> >> >> > than making the last phy association hang around until the DISCE
> >> >> > workqueue runs.
> >> >>
> >> >> Sounds ok in theory.
> >> >
> >> > It's not really a choice.  The specific problem you've introduced with
> >> > this patch is failure to cope with link flutter: a deform and form event
> >> > queued sequentially.  In the new scheme you're trying to introduce, the
> >> > destruct event gets queued from the deform but behind the form and the
> >> > link flutter results in a dead link.  I thought just forcing a zero phy
> >> > port would fix this, but it won't, either the destruct has to run in the
> >> > context of the deform event or the form has to be queued later than the
> >> > destruct.  I think coupled with the changes above, there needs to be
> >> >
> >> > if (port->port) {
> >> >         /* dying port, requeue form event */
> >> >         resend the PORTE_BYTES_DMAED event
> >> >         return
> >> > }
> >> >
> >> > inside the unmatched port loop in sas_port_form() if nothing is found as
> >> > well to close this.
> >>
> >> I think it's too late.  Once the lldd has triggered libsas to start
> >> tear down I seem to recall the lldd has the expectation that a new
> >> PORTE_BYTES_DMAED triggers the creation of a new port instance for
> >> that phy.  Once the flutter reaches libsas the race is already lost
> >> and the port needs to be torn down, but I would need to take a closer
> >> look.
> >
> > I don't understand your reasoning.  The expectation is that
> > PORTE_BYTES_DMAED leads to port formation.  The proposal detects that
> > this event precedes DISCE_DESTRUCT for the port and requeues the event,
> > now after DISC_DESTRUCT, so it gets acted on.  Where is the problem?
> >
> 
> Ah, I read "flutter" and mistakenly thought "debounce".  You're
> addressing the case where we're fully committed to the port going
> down, but a "port up" event is injected between the "port down" and
> "destruct" events.  Yes, I agree re-queuing needs to happen, however
> don't we now have queue re-order problem with respect to a new "port
> down" event?  It seems events need to be held off in-order while the
> subordinate DISCE events are processed.

No, that seems to be the intent of the prior code.  The reason port
visibility goes immediately (along with all associated phys), is that
the port is ready for reuse as soon as sas_deform_port() returns.
Destroying the subtree is left to DISCE_DESTRUCT, but since the subtree
is not visible, a new port can form even as the elements associated with
the old port are being destroyed.  the DISCE_DISCOVER that populates it
will queue behind the destruct, so everything just works today (modulo
the new warning).

It would be ideal if we could detect in the destruct queue that the
visibility is already gone and make use of it, but we can't.  The patch
that causes this warning induces a mismatch between the state of the
kernfs tree and the kobjects ... we still have to call device_del which
leads to kobject_del (which triggers the warning) to bring this state
back into alignment.  So the disconnected subtree we originally used to
make all this work is what's suddenly been rendered invalid.

James


--
To unsubscribe from this list: send the line "unsubscribe linux-scsi" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Praveen Murali July 27, 2015, 7:53 p.m. UTC | #11
PiAtLS0tLU9yaWdpbmFsIE1lc3NhZ2UtLS0tLQ0KPiBGcm9tOiBEYW4gV2lsbGlhbXMgW21haWx0
bzpkYW4uai53aWxsaWFtc0BpbnRlbC5jb21dDQo+IFNlbnQ6IE1vbmRheSwgSnVseSAyNywgMjAx
NSAxMToyNSBBTQ0KPiBUbzogSmFtZXMgQm90dG9tbGV5DQo+IENjOiBDaHJpc3RvcGggSGVsbHdp
ZzsgUHJhdmVlbiBNdXJhbGk7IGxpbnV4LXNjc2k7IHN0YWJsZUB2Z2VyLmtlcm5lbC5vcmcNCj4g
U3ViamVjdDogUmU6IFtmb3IgNC4xIFBBVENIIHJlc2VuZF0gbGlic2FzOiBmaXggInN5c2ZzIGdy
b3VwIG5vdCBmb3VuZCIgd2FybmluZ3MNCj4gYXQgcG9ydCB0ZWFyZG93biB0aW1lDQo+IA0KPiA+
IEkgZGlkbid0IGFzayB5b3UgdG8ganVzdGlmeSB5b3VyIHByb2Nlc3MsIEkgYXNrZWQgeW91IGhv
dyBpbXBvcnRhbnQgeW91DQo+ID4gdGhvdWdodCB0aGUgcGF0Y2ggd2FzIG1haW5seSBiZWNhdXNl
IG9mIHRoZSBjb25mbGljdGluZyBzaWduYWxzIHlvdSd2ZQ0KPiA+IHNlbnQuICBJIGdldCB0aGF0
IHlvdSB0aGluayBJIHNob3VsZCB0cmVhdCBhbGwgeW91ciBwYXRjaGVzIGFzIGltcG9ydGFudA0K
PiA+IHdoZXRoZXIgeW91IGRvIG9yIG5vdCwgYnV0IHRoZSB3b3JsZCBkb2Vzbid0IHF1aXRlIHdv
cmsgbGlrZSB0aGF0OiBwYXRjaA0KPiA+IGFwcGxpY2F0aW9uIGlzIGEgcHJvY2VzcyBvZiB0cmlh
Z2UuICBQYXRjaGVzLCBsaWtlIHRoaXMsIHdoaWNoIGhhdmUNCj4gPiB0aW1pbmcgcmVsYXRlZCBp
c3N1ZXMgcG90ZW50aWFsbHkgbGVhZGluZyB0byByYWNlcyBnZXQgbG9va2VkIGF0IGJ5IG1lDQo+
ID4gYXMgdGhlIGxhc3QgcmV2aWV3ZXIuICBUaGUgc3BlZWQgb2YgcmV2aWV3IGRlcGVuZHMgb24g
c2V2ZXJhbCBmYWN0b3JzLA0KPiA+IGJ1dCBvbmUgb2Ygd2hpY2ggaXMgd2hhdCB0eXBlIG9mIHVz
ZXIgdmlzaWJsZSBpc3N1ZSBpcyB0aGlzIGNhdXNpbmcuDQo+ID4gVGhlIHVzZXIgdmlzaWJsZSBl
ZmZlY3RzIG9mIHRoaXMgYXJlIGEgbmFzdHkgd2FybmluZyBtZXNzYWdlIGFuZCBub3RoaW5nDQo+
ID4gbW9yZSwgSSBiZWxpZXZlPyAgQSB1c2VmdWwgaW5kaWNhdG9yIGluIHRoaXMgdHJpYWdlIGlz
IGhvdyBpbXBvcnRhbnQgdGhlDQo+ID4gc3VibWl0dGVyIHRoaW5rcyB0aGUgcGF0Y2ggaXMsIHdo
aWNoIHdhcyBvcmlnaW5hbGx5IHdoeSBJIGFza2VkLg0KPiA+DQo+IA0KPiBUaGF0IHdvdWxkIGJl
IGEgcXVlc3Rpb24gdG8gUHJhdmVlbi4gIEl0IHdhc24ndCBjbGVhciB0byBtZSB3aGV0aGVyDQo+
IHRoaXMgc3lzZnMgYmFja3RyYWNlIHdhcyBhIHNpbXBseSBhIHdhcm5pbmcgb3IgZXZlbnR1YWxs
eSBmYXRhbCB0byB0aGUNCj4gYm94Lg0KQXMgZmFyIGFzIEkgcmVtZW1iZXIsIHRoZSBpc3N1ZSB3
YXMgbW9zdGx5IHdpdGggdGhlIHN5c2ZzIGJhY2t0cmFjZXMuIEkgZG9u4oCZdA0KcmVtZW1iZXIg
aXQgY2F1c2luZyBhIGZhdGFsIGVycm9yOyBidXQgdGhhdCBjb3VsZCB2ZXJ5IHdlbGwgYmUgYmVj
YXVzZSBJIGRpZCBub3QNCnJ1biBpdCBsb25nIGVub3VnaC4NCg0K
--
To unsubscribe from this list: send the line "unsubscribe linux-scsi" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Dan Williams July 27, 2015, 8:52 p.m. UTC | #12
On Mon, Jul 27, 2015 at 11:38 AM, James Bottomley
<James.Bottomley@hansenpartnership.com> wrote:
> No, that seems to be the intent of the prior code.  The reason port
> visibility goes immediately (along with all associated phys), is that
> the port is ready for reuse as soon as sas_deform_port() returns.
> Destroying the subtree is left to DISCE_DESTRUCT, but since the subtree
> is not visible, a new port can form even as the elements associated with
> the old port are being destroyed.  the DISCE_DISCOVER that populates it
> will queue behind the destruct, so everything just works today (modulo
> the new warning).
>
> It would be ideal if we could detect in the destruct queue that the
> visibility is already gone and make use of it, but we can't.  The patch
> that causes this warning induces a mismatch between the state of the
> kernfs tree and the kobjects ... we still have to call device_del which
> leads to kobject_del (which triggers the warning) to bring this state
> back into alignment.  So the disconnected subtree we originally used to
> make all this work is what's suddenly been rendered invalid.
>

Looking at the original report this seemed new for the 3.16 kernel.
However, looking closer, that warning in sysfs_remove_group() has been
present for a long time before that.  I don't recall seeing it back
when we were doing focused hotplug testing with the isci driver for
the 3.5 / 3.6 kernels, so I believe it is newly uncovered since then.
--
To unsubscribe from this list: send the line "unsubscribe linux-scsi" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
diff mbox

Patch

diff --git a/drivers/scsi/libsas/sas_discover.c b/drivers/scsi/libsas/sas_discover.c
index 60de66252fa2..a4db770fe8b0 100644
--- a/drivers/scsi/libsas/sas_discover.c
+++ b/drivers/scsi/libsas/sas_discover.c
@@ -362,11 +362,14 @@  static void sas_destruct_devices(struct work_struct *work)
 	clear_bit(DISCE_DESTRUCT, &port->disc.pending);
 
 	list_for_each_entry_safe(dev, n, &port->destroy_list, disco_list_node) {
+		struct sas_port *sas_port = dev_to_sas_port(dev->rphy->dev.parent);
+
 		list_del_init(&dev->disco_list_node);
 
 		sas_remove_children(&dev->rphy->dev);
 		sas_rphy_delete(dev->rphy);
 		sas_unregister_common_dev(port, dev);
+		sas_port_delete(sas_port);
 	}
 }
 
@@ -400,9 +403,6 @@  void sas_unregister_domain_devices(struct asd_sas_port *port, int gone)
 
 	list_for_each_entry_safe(dev, n, &port->disco_list, disco_list_node)
 		sas_unregister_dev(port, dev);
-
-	port->port->rphy = NULL;
-
 }
 
 void sas_device_set_phy(struct domain_device *dev, struct sas_port *port)
diff --git a/drivers/scsi/libsas/sas_port.c b/drivers/scsi/libsas/sas_port.c
index d3c5297c6c89..9a25ae3a52a4 100644
--- a/drivers/scsi/libsas/sas_port.c
+++ b/drivers/scsi/libsas/sas_port.c
@@ -219,7 +219,6 @@  void sas_deform_port(struct asd_sas_phy *phy, int gone)
 
 	if (port->num_phys == 1) {
 		sas_unregister_domain_devices(port, gone);
-		sas_port_delete(port->port);
 		port->port = NULL;
 	} else {
 		sas_port_delete_phy(port->port, phy->phy);