Message ID | 20240416030727.17074-1-yangxingui@huawei.com (mailing list archive) |
---|---|
State | Superseded |
Series | scsi: libsas: Fix exp-attached end device cannot be scanned in again after probe failed |
Hi Xingui,

On 2024/4/16 11:07, Xingui Yang wrote:
> We found that it is judged as broadcast flutter and exits directly when the
> exp-attached end device reconnects after the end device probe failed.

Can you please describe how to reproduce this issue in detail?

Thanks,
Jason

[...]

> +void sas_ex_unregister_end_dev(struct domain_device *dev)
> +{
> +	struct domain_device *parent = dev->parent;
> +	struct expander_device *parent_ex = &parent->ex_dev;
> +	int i;
> +
> +	for (i = 0; i < parent_ex->num_phys; i++) {
> +		struct ex_phy *phy = &parent_ex->ex_phy[i];
> +
> +		if (sas_phy_match_dev_addr(dev, phy)) {
> +			sas_unregister_devs_sas_addr(parent, i, true);
> +			break;
> +		}
> +	}

Did you mean this end device is a wide-port end device ? How could this happen?

> +}

[...]

On 2024/4/17 9:46, Jason Yan wrote:
> Hi Xingui,
>
> On 2024/4/16 11:07, Xingui Yang wrote:
>> We found that it is judged as broadcast flutter and exits directly
>> when the
>> exp-attached end device reconnects after the end device probe failed.
>
> Can you please describe how to reproduce this issue in detail?

The test steps we currently construct are to simulate link abnormalities and adjust the rate of the remote phy when running IO on all disks.

When the sata disk is probed and the IDENTIFY command is sent to the disk, the expander return rate is abnormal, causing sata disk probe fail. But there may be many reasons for device probe failure, including expander or disk instability or link abnormalities.

> Thanks,
> Jason

[...]

>> +	for (i = 0; i < parent_ex->num_phys; i++) {
>> +		struct ex_phy *phy = &parent_ex->ex_phy[i];
>> +
>> +		if (sas_phy_match_dev_addr(dev, phy)) {
>> +			sas_unregister_devs_sas_addr(parent, i, true);
>> +			break;
>> +		}
>> +	}
>
> Did you mean this end device is a wide-port end device ? How could this
> happen?

No, the end device described here is a non-expander device. Such as: sata/sas disk. But these devices are exp-attached.

Thanks.
Xingui

On 2024/4/17 15:47, yangxingui wrote:

[...]

>>> +	for (i = 0; i < parent_ex->num_phys; i++) {
>>> +		struct ex_phy *phy = &parent_ex->ex_phy[i];
>>> +
>>> +		if (sas_phy_match_dev_addr(dev, phy)) {
>>> +			sas_unregister_devs_sas_addr(parent, i, true);
>>> +			break;
>>> +		}
>>> +	}
>>
>> Did you mean this end device is a wide-port end device ? How could
>> this happen?
>
> No, the end device described here is a non-expander device. Such as:
> sata/sas disk. But these devices are exp-attached.

If it is not a wide port, why do they have the same sas address here? Why do you add this function to unregister these PHYs? And the last parameter of sas_unregister_devs_sas_addr() means the last PHY of the wide port, you just all passed true, it is irrational.

> Thanks.
> Xingui

Hi Jason,

On 2024/4/18 9:46, Jason Yan wrote:

[...]

>>> Did you mean this end device is a wide-port end device ? How could
>>> this happen?
>>
>> No, the end device described here is a non-expander device. Such as:
>> sata/sas disk. But these devices are exp-attached.
>
> If it is not a wide port, why do they have the same sas address here?
> Why do you add this function to unregister these PHYs? And the last
> parameter of sas_unregister_devs_sas_addr() means the last PHY of the
> wide port, you just all passed true, it is irrational.

The non-expander end device does not have a wide port, such as a sata disk, and there is only one ex_phy corresponding to it. This function finds the ex_phy corresponding to the end dev through dev->sas_addr, then clears the ex_phy information and unregister the end device.

Thanks,
Xingui

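The lookup described in the reply above (walk the parent expander's phys, match on the end device's SAS address, clear that phy's record) can be sketched outside the kernel. This is only an illustrative model, not the libsas code: the `model_ex_phy` and `model_expander` structures and both `model_*` functions are hypothetical stand-ins, and the sketch assumes that exactly one phy records the address of a narrow-port end device.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define SAS_ADDR_SIZE 8
#define MAX_PHYS 8

/* Hypothetical, simplified stand-ins for the libsas structures. */
struct model_ex_phy {
	uint8_t attached_sas_addr[SAS_ADDR_SIZE];
	int attached;	/* nonzero while the phy records an attached device */
};

struct model_expander {
	int num_phys;
	struct model_ex_phy ex_phy[MAX_PHYS];
};

/* Return the index of the first phy whose recorded address equals
 * dev_addr, or -1 if no phy records that address. */
static int model_find_phy(const struct model_expander *ex,
			  const uint8_t *dev_addr)
{
	for (int i = 0; i < ex->num_phys; i++) {
		const struct model_ex_phy *phy = &ex->ex_phy[i];

		if (phy->attached &&
		    memcmp(phy->attached_sas_addr, dev_addr, SAS_ADDR_SIZE) == 0)
			return i;
	}
	return -1;
}

/* Clear the record of the phy the end device is attached to.
 * Returns the cleared phy index, or -1 if nothing matched. */
static int model_unregister_end_dev(struct model_expander *ex,
				    const uint8_t *dev_addr)
{
	int i;

	assert(ex->num_phys <= MAX_PHYS);
	i = model_find_phy(ex, dev_addr);
	if (i >= 0)
		memset(&ex->ex_phy[i], 0, sizeof(ex->ex_phy[i]));
	return i;
}
```

In this model the loop stops at the first match, which mirrors the `break` in the proposed function. Whether stopping at the first match is correct for a wide-port device is exactly the question raised in the review, and the sketch does not settle it.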
diff --git a/drivers/scsi/libsas/sas_discover.c b/drivers/scsi/libsas/sas_discover.c
index 8fb7c41c0962..aae90153f4c6 100644
--- a/drivers/scsi/libsas/sas_discover.c
+++ b/drivers/scsi/libsas/sas_discover.c
@@ -517,6 +517,8 @@ static void sas_revalidate_domain(struct work_struct *work)
 	struct sas_ha_struct *ha = port->ha;
 	struct domain_device *ddev = port->port_dev;
 
+	sas_destruct_devices(port);
+	sas_destruct_ports(port);
 	/* prevent revalidation from finding sata links in recovery */
 	mutex_lock(&ha->disco_mutex);
 	if (test_bit(SAS_HA_ATA_EH_ACTIVE, &ha->state)) {
diff --git a/drivers/scsi/libsas/sas_expander.c b/drivers/scsi/libsas/sas_expander.c
index f6e6db8b8aba..6ae1f4aaaf61 100644
--- a/drivers/scsi/libsas/sas_expander.c
+++ b/drivers/scsi/libsas/sas_expander.c
@@ -1856,6 +1856,22 @@ static void sas_unregister_devs_sas_addr(struct domain_device *parent,
 	}
 }
 
+void sas_ex_unregister_end_dev(struct domain_device *dev)
+{
+	struct domain_device *parent = dev->parent;
+	struct expander_device *parent_ex = &parent->ex_dev;
+	int i;
+
+	for (i = 0; i < parent_ex->num_phys; i++) {
+		struct ex_phy *phy = &parent_ex->ex_phy[i];
+
+		if (sas_phy_match_dev_addr(dev, phy)) {
+			sas_unregister_devs_sas_addr(parent, i, true);
+			break;
+		}
+	}
+}
+
 static int sas_discover_bfs_by_root_level(struct domain_device *root,
 					  const int level)
 {
diff --git a/drivers/scsi/libsas/sas_internal.h b/drivers/scsi/libsas/sas_internal.h
index 3804aef165ad..434f928c2ed8 100644
--- a/drivers/scsi/libsas/sas_internal.h
+++ b/drivers/scsi/libsas/sas_internal.h
@@ -50,6 +50,7 @@ void sas_discover_event(struct asd_sas_port *port, enum discover_event ev);
 
 void sas_init_dev(struct domain_device *dev);
 void sas_unregister_dev(struct asd_sas_port *port, struct domain_device *dev);
+void sas_ex_unregister_end_dev(struct domain_device *dev);
 
 void sas_scsi_recover_host(struct Scsi_Host *shost);
 
@@ -145,7 +146,10 @@ static inline void sas_fail_probe(struct domain_device *dev, const char *func, i
 		  func, dev->parent ? "exp-attached" :
 		  "direct-attached",
 		  SAS_ADDR(dev->sas_addr), err);
-	sas_unregister_dev(dev->port, dev);
+	if (dev->parent && !dev_is_expander(dev->dev_type))
+		sas_ex_unregister_end_dev(dev);
+	else
+		sas_unregister_dev(dev->port, dev);
 }
 
 static inline void sas_fill_in_rphy(struct domain_device *dev,

We found that it is judged as broadcast flutter and exits directly when the exp-attached end device reconnects after the end device probe failed.

[78779.654026] sas: broadcast received: 0
[78779.654037] sas: REVALIDATING DOMAIN on port 0, pid:10
[78779.654680] sas: ex 500e004aaaaaaa1f phy05 change count has changed
[78779.662977] sas: ex 500e004aaaaaaa1f phy05 originated BROADCAST(CHANGE)
[78779.662986] sas: ex 500e004aaaaaaa1f phy05 new device attached
[78779.663079] sas: ex 500e004aaaaaaa1f phy05:U:8 attached: 500e004aaaaaaa05 (stp)
[78779.693542] hisi_sas_v3_hw 0000:b4:02.0: dev[16:5] found
[78779.701155] sas: done REVALIDATING DOMAIN on port 0, pid:10, res 0x0
[78779.707864] sas: Enter sas_scsi_recover_host busy: 0 failed: 0
...
[78835.161307] sas: --- Exit sas_scsi_recover_host: busy: 0 failed: 0 tries: 1
[78835.171344] sas: sas_probe_sata: for exp-attached device 500e004aaaaaaa05 returned -19
[78835.180879] hisi_sas_v3_hw 0000:b4:02.0: dev[16:5] is gone
[78835.187487] sas: broadcast received: 0
[78835.187504] sas: REVALIDATING DOMAIN on port 0, pid:10
[78835.188263] sas: ex 500e004aaaaaaa1f phy05 change count has changed
[78835.195870] sas: ex 500e004aaaaaaa1f phy05 originated BROADCAST(CHANGE)
[78835.195875] sas: ex 500e004aaaaaaa1f rediscovering phy05
[78835.196022] sas: ex 500e004aaaaaaa1f phy05:U:A attached: 500e004aaaaaaa05 (stp)
[78835.196026] sas: ex 500e004aaaaaaa1f phy05 broadcast flutter
[78835.197615] sas: done REVALIDATING DOMAIN on port 0, pid:10, res 0x0

The cause of the problem is that the related ex_phy information was not cleared after the end device probe failed. To solve this, a function sas_ex_unregister_end_dev() is defined to clear the ex_phy information and unregister the end device when the probe of an exp-attached end device fails.

Because SATA devices are probed asynchronously, a SATA device probe may fail after "done REVALIDATING DOMAIN". The port is then added to sas_port_del_list, but it is not deleted until sas_destruct_ports() is called at the end of the next REVALIDATING DOMAIN. As a result, a warning about creating a duplicate port occurs in the new REVALIDATING DOMAIN when the end device reconnects. Therefore, the previous destroy_list and sas_port_del_list should be handled before REVALIDATING DOMAIN.

Signed-off-by: Xingui Yang <yangxingui@huawei.com>
---
 drivers/scsi/libsas/sas_discover.c |  2 ++
 drivers/scsi/libsas/sas_expander.c | 16 ++++++++++++++++
 drivers/scsi/libsas/sas_internal.h |  6 +++++-
 3 files changed, 23 insertions(+), 1 deletion(-)
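
The failure mode in the commit message can be illustrated with a small standalone model. It rests on one assumption that the log suggests but that is not taken from the libsas source: rediscovery treats a phy whose reported attached address equals the recorded one as "broadcast flutter" and does nothing further. All names (`model_phy`, `model_rediscover`, `model_probe_failed`) are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

enum model_result { MODEL_NEW_DEVICE, MODEL_FLUTTER };

/* Hypothetical, simplified record kept per expander phy. */
struct model_phy {
	uint64_t attached_sas_addr;	/* 0 means nothing is recorded */
	int dev_registered;		/* nonzero while a device is registered */
};

/* Model of rediscovery: an unchanged address is treated as flutter and
 * skipped; any other address registers a new device. */
static enum model_result model_rediscover(struct model_phy *phy,
					  uint64_t reported_addr)
{
	assert(reported_addr != 0);
	if (phy->attached_sas_addr == reported_addr)
		return MODEL_FLUTTER;	/* nothing appears to have changed */
	phy->attached_sas_addr = reported_addr;
	phy->dev_registered = 1;
	return MODEL_NEW_DEVICE;
}

/* Model of a probe failure: the device is always unregistered, and the
 * phy record is cleared only when clear_phy_info is nonzero. */
static void model_probe_failed(struct model_phy *phy, int clear_phy_info)
{
	phy->dev_registered = 0;
	if (clear_phy_info)
		phy->attached_sas_addr = 0;
}
```

Under this model, a probe failure that leaves the phy record in place makes the next rediscovery of the same address return `MODEL_FLUTTER`, so the device stays unregistered. Clearing the record on failure lets the same address be picked up again as `MODEL_NEW_DEVICE`, which is the behavior the patch aims for.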