From patchwork Wed Jan 30 08:24:06 2019 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Jason Yan X-Patchwork-Id: 10787861 Return-Path: Received: from mail.wl.linuxfoundation.org (pdx-wl-mail.web.codeaurora.org [172.30.200.125]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id A68616C2 for ; Wed, 30 Jan 2019 08:26:35 +0000 (UTC) Received: from mail.wl.linuxfoundation.org (localhost [127.0.0.1]) by mail.wl.linuxfoundation.org (Postfix) with ESMTP id 96A7D1FE82 for ; Wed, 30 Jan 2019 08:26:35 +0000 (UTC) Received: by mail.wl.linuxfoundation.org (Postfix, from userid 486) id 8B31D27EE2; Wed, 30 Jan 2019 08:26:35 +0000 (UTC) X-Spam-Checker-Version: SpamAssassin 3.3.1 (2010-03-16) on pdx-wl-mail.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-7.9 required=2.0 tests=BAYES_00,MAILING_LIST_MULTI, RCVD_IN_DNSWL_HI autolearn=ham version=3.3.1 Received: from vger.kernel.org (vger.kernel.org [209.132.180.67]) by mail.wl.linuxfoundation.org (Postfix) with ESMTP id 2661A1FE82 for ; Wed, 30 Jan 2019 08:26:35 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1730193AbfA3IZy (ORCPT ); Wed, 30 Jan 2019 03:25:54 -0500 Received: from szxga04-in.huawei.com ([45.249.212.190]:2699 "EHLO huawei.com" rhost-flags-OK-OK-OK-FAIL) by vger.kernel.org with ESMTP id S1725834AbfA3IZy (ORCPT ); Wed, 30 Jan 2019 03:25:54 -0500 Received: from DGGEMS409-HUB.china.huawei.com (unknown [172.30.72.59]) by Forcepoint Email with ESMTP id C3DC751258FFD29DE41B; Wed, 30 Jan 2019 16:25:51 +0800 (CST) Received: from huawei.com (10.175.124.28) by DGGEMS409-HUB.china.huawei.com (10.3.19.209) with Microsoft SMTP Server id 14.3.408.0; Wed, 30 Jan 2019 16:25:45 +0800 From: Jason Yan To: , CC: , , , , , , , , , , , , , Jason Yan , Ewan Milne , Tomas Henzl Subject: [PATCH v2 1/7] scsi: libsas: reset the negotiated_linkrate when phy is down Date: Wed, 30 Jan 2019 16:24:06 +0800 
Message-ID: <20190130082412.9357-2-yanaijie@huawei.com> X-Mailer: git-send-email 2.14.4 In-Reply-To: <20190130082412.9357-1-yanaijie@huawei.com> References: <20190130082412.9357-1-yanaijie@huawei.com> MIME-Version: 1.0 X-Originating-IP: [10.175.124.28] X-CFilter-Loop: Reflected Sender: linux-scsi-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: linux-scsi@vger.kernel.org X-Virus-Scanned: ClamAV using ClamSMTP If the device is unplugged or disconnected, the negotiated_linkrate still can be seen from the userspace by sysfs. This makes people confused and leaks information of the device last used. So let's reset the negotiated_linkrate after the phy is down. Signed-off-by: Jason Yan CC: John Garry CC: Johannes Thumshirn CC: Ewan Milne CC: Christoph Hellwig CC: Tomas Henzl CC: Dan Williams CC: Hannes Reinecke --- drivers/scsi/libsas/sas_expander.c | 2 ++ include/scsi/libsas.h | 3 +++ 2 files changed, 5 insertions(+) diff --git a/drivers/scsi/libsas/sas_expander.c b/drivers/scsi/libsas/sas_expander.c index 17eb4185f29d..7b0e6dcef6e6 100644 --- a/drivers/scsi/libsas/sas_expander.c +++ b/drivers/scsi/libsas/sas_expander.c @@ -1904,6 +1904,8 @@ static void sas_unregister_devs_sas_addr(struct domain_device *parent, &parent->port->sas_port_del_list); phy->port = NULL; } + if (phy->phy) + phy->phy->negotiated_linkrate = SAS_LINK_RATE_UNKNOWN; } static int sas_discover_bfs_by_root_level(struct domain_device *root, diff --git a/include/scsi/libsas.h b/include/scsi/libsas.h index 3de3b10da19a..420156cea3ee 100644 --- a/include/scsi/libsas.h +++ b/include/scsi/libsas.h @@ -448,6 +448,9 @@ static inline void sas_phy_disconnected(struct asd_sas_phy *phy) { phy->oob_mode = OOB_NOT_CONNECTED; phy->linkrate = SAS_LINK_RATE_UNKNOWN; + + if (phy->phy) + phy->phy->negotiated_linkrate = SAS_LINK_RATE_UNKNOWN; } static inline unsigned int to_sas_gpio_od(int device, int bit) From patchwork Wed Jan 30 08:24:07 2019 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 
Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Jason Yan X-Patchwork-Id: 10787849 Return-Path: Received: from mail.wl.linuxfoundation.org (pdx-wl-mail.web.codeaurora.org [172.30.200.125]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id 21BF76C2 for ; Wed, 30 Jan 2019 08:26:01 +0000 (UTC) Received: from mail.wl.linuxfoundation.org (localhost [127.0.0.1]) by mail.wl.linuxfoundation.org (Postfix) with ESMTP id 121911FE82 for ; Wed, 30 Jan 2019 08:26:01 +0000 (UTC) Received: by mail.wl.linuxfoundation.org (Postfix, from userid 486) id 0600F27EE2; Wed, 30 Jan 2019 08:26:01 +0000 (UTC) X-Spam-Checker-Version: SpamAssassin 3.3.1 (2010-03-16) on pdx-wl-mail.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-7.9 required=2.0 tests=BAYES_00,MAILING_LIST_MULTI, RCVD_IN_DNSWL_HI autolearn=ham version=3.3.1 Received: from vger.kernel.org (vger.kernel.org [209.132.180.67]) by mail.wl.linuxfoundation.org (Postfix) with ESMTP id 8E8051FE82 for ; Wed, 30 Jan 2019 08:26:00 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1730283AbfA3IZ7 (ORCPT ); Wed, 30 Jan 2019 03:25:59 -0500 Received: from szxga04-in.huawei.com ([45.249.212.190]:2703 "EHLO huawei.com" rhost-flags-OK-OK-OK-FAIL) by vger.kernel.org with ESMTP id S1730256AbfA3IZ6 (ORCPT ); Wed, 30 Jan 2019 03:25:58 -0500 Received: from DGGEMS409-HUB.china.huawei.com (unknown [172.30.72.60]) by Forcepoint Email with ESMTP id E19F0B1B2C21C26959D1; Wed, 30 Jan 2019 16:25:56 +0800 (CST) Received: from huawei.com (10.175.124.28) by DGGEMS409-HUB.china.huawei.com (10.3.19.209) with Microsoft SMTP Server id 14.3.408.0; Wed, 30 Jan 2019 16:25:46 +0800 From: Jason Yan To: , CC: , , , , , , , , , , , , , Jason Yan , Ewan Milne , Tomas Henzl Subject: [PATCH v2 2/7] scsi: libsas: only clear phy->in_shutdown after shutdown event done Date: Wed, 30 Jan 2019 16:24:07 +0800 Message-ID: <20190130082412.9357-3-yanaijie@huawei.com> X-Mailer: git-send-email 2.14.4 In-Reply-To: 
<20190130082412.9357-1-yanaijie@huawei.com> References: <20190130082412.9357-1-yanaijie@huawei.com> MIME-Version: 1.0 X-Originating-IP: [10.175.124.28] X-CFilter-Loop: Reflected Sender: linux-scsi-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: linux-scsi@vger.kernel.org X-Virus-Scanned: ClamAV using ClamSMTP When the event queue is full of phy up and down events and reaches the threshold, we queue a shutdown-event and set phy->in_shutdown so that we do not queue a shutdown-event again. But before the shutdown-event can be executed, every phy-down event clears phy->in_shutdown, so a new shutdown-event is queued each time and the queue fills up with these shutdown-events. Fix this by only clearing phy->in_shutdown in sas_phye_shutdown(), that is, after the first shutdown-event has been executed. Fixes: f12486e06ae8 ("scsi: libsas: shut down the PHY if events reached the threshold") Signed-off-by: Jason Yan CC: John Garry CC: Johannes Thumshirn CC: Ewan Milne CC: Christoph Hellwig CC: Tomas Henzl CC: Dan Williams CC: Hannes Reinecke Reviewed-by: John Garry --- drivers/scsi/libsas/sas_phy.c | 3 +-- 1 file changed, 1 insertion(+), 2 deletions(-) diff --git a/drivers/scsi/libsas/sas_phy.c b/drivers/scsi/libsas/sas_phy.c index 0374243c85d0..762bb13cca74 100644 --- a/drivers/scsi/libsas/sas_phy.c +++ b/drivers/scsi/libsas/sas_phy.c @@ -35,7 +35,6 @@ static void sas_phye_loss_of_signal(struct work_struct *work) struct asd_sas_event *ev = to_asd_sas_event(work); struct asd_sas_phy *phy = ev->phy; - phy->in_shutdown = 0; phy->error = 0; sas_deform_port(phy, 1); } @@ -45,7 +44,6 @@ static void sas_phye_oob_done(struct work_struct *work) struct asd_sas_event *ev = to_asd_sas_event(work); struct asd_sas_phy *phy = ev->phy; - phy->in_shutdown = 0; phy->error = 0; } @@ -127,6 +125,7 @@ static void sas_phye_shutdown(struct work_struct *work) } else pr_notice("phy%02d is not enabled, cannot shutdown\n", phy->id); + phy->in_shutdown = 0; } /* ---------- Phy class
registration ---------- */ From patchwork Wed Jan 30 08:24:08 2019 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Jason Yan X-Patchwork-Id: 10787857 Return-Path: Received: from mail.wl.linuxfoundation.org (pdx-wl-mail.web.codeaurora.org [172.30.200.125]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id B4D39746 for ; Wed, 30 Jan 2019 08:26:32 +0000 (UTC) Received: from mail.wl.linuxfoundation.org (localhost [127.0.0.1]) by mail.wl.linuxfoundation.org (Postfix) with ESMTP id A54AF1FE82 for ; Wed, 30 Jan 2019 08:26:32 +0000 (UTC) Received: by mail.wl.linuxfoundation.org (Postfix, from userid 486) id 9879127EE2; Wed, 30 Jan 2019 08:26:32 +0000 (UTC) X-Spam-Checker-Version: SpamAssassin 3.3.1 (2010-03-16) on pdx-wl-mail.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-7.9 required=2.0 tests=BAYES_00,MAILING_LIST_MULTI, RCVD_IN_DNSWL_HI autolearn=ham version=3.3.1 Received: from vger.kernel.org (vger.kernel.org [209.132.180.67]) by mail.wl.linuxfoundation.org (Postfix) with ESMTP id 123531FE82 for ; Wed, 30 Jan 2019 08:26:32 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1730422AbfA3I01 (ORCPT ); Wed, 30 Jan 2019 03:26:27 -0500 Received: from szxga04-in.huawei.com ([45.249.212.190]:2702 "EHLO huawei.com" rhost-flags-OK-OK-OK-FAIL) by vger.kernel.org with ESMTP id S1730255AbfA3IZ7 (ORCPT ); Wed, 30 Jan 2019 03:25:59 -0500 Received: from DGGEMS409-HUB.china.huawei.com (unknown [172.30.72.60]) by Forcepoint Email with ESMTP id DA34F614C3A639B2FB62; Wed, 30 Jan 2019 16:25:56 +0800 (CST) Received: from huawei.com (10.175.124.28) by DGGEMS409-HUB.china.huawei.com (10.3.19.209) with Microsoft SMTP Server id 14.3.408.0; Wed, 30 Jan 2019 16:25:47 +0800 From: Jason Yan To: , CC: , , , , , , , , , , , , , Jason Yan , Ewan Milne , Tomas Henzl Subject: [PATCH v2 3/7] scsi: libsas: optimize the debug print of the revalidate process Date: Wed, 30 Jan 
2019 16:24:08 +0800 Message-ID: <20190130082412.9357-4-yanaijie@huawei.com> X-Mailer: git-send-email 2.14.4 In-Reply-To: <20190130082412.9357-1-yanaijie@huawei.com> References: <20190130082412.9357-1-yanaijie@huawei.com> MIME-Version: 1.0 X-Originating-IP: [10.175.124.28] X-CFilter-Loop: Reflected Sender: linux-scsi-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: linux-scsi@vger.kernel.org X-Virus-Scanned: ClamAV using ClamSMTP sas_rediscover() returns an error code if discovery fails for an expander phy, and sas_ex_revalidate_domain() only returns the last phy's error code. So when sas_revalidate_domain() prints the return value of the discover process, we cannot tell whether revalidation succeeded for every phy. We only know whether the last bcast phy revalidation succeeded. There is no need for sas_ex_revalidate_domain() and sas_rediscover() to return an error code. Instead, print the debug log for each bcast phy directly in sas_rediscover(). Signed-off-by: Jason Yan CC: John Garry CC: Johannes Thumshirn CC: Ewan Milne CC: Christoph Hellwig CC: Tomas Henzl CC: Dan Williams CC: Hannes Reinecke --- drivers/scsi/libsas/sas_discover.c | 7 +++---- drivers/scsi/libsas/sas_expander.c | 11 ++++++----- include/scsi/libsas.h | 2 +- 3 files changed, 10 insertions(+), 10 deletions(-) diff --git a/drivers/scsi/libsas/sas_discover.c b/drivers/scsi/libsas/sas_discover.c index 726ada9b8c79..ffc571a12916 100644 --- a/drivers/scsi/libsas/sas_discover.c +++ b/drivers/scsi/libsas/sas_discover.c @@ -500,7 +500,6 @@ static void sas_discover_domain(struct work_struct *work) static void sas_revalidate_domain(struct work_struct *work) { - int res = 0; struct sas_discovery_event *ev = to_sas_discovery_event(work); struct asd_sas_port *port = ev->port; struct sas_ha_struct *ha = port->ha; @@ -521,10 +520,10 @@ static void sas_revalidate_domain(struct work_struct *work) if (ddev && (ddev->dev_type == SAS_FANOUT_EXPANDER_DEVICE || ddev->dev_type == SAS_EDGE_EXPANDER_DEVICE)) -
res = sas_ex_revalidate_domain(ddev); + sas_ex_revalidate_domain(ddev); - pr_debug("done REVALIDATING DOMAIN on port %d, pid:%d, res 0x%x\n", - port->id, task_pid_nr(current), res); + pr_debug("done REVALIDATING DOMAIN on port %d, pid:%d\n", + port->id, task_pid_nr(current)); out: mutex_unlock(&ha->disco_mutex); diff --git a/drivers/scsi/libsas/sas_expander.c b/drivers/scsi/libsas/sas_expander.c index 7b0e6dcef6e6..5cd720f93f96 100644 --- a/drivers/scsi/libsas/sas_expander.c +++ b/drivers/scsi/libsas/sas_expander.c @@ -2062,7 +2062,7 @@ static int sas_rediscover_dev(struct domain_device *dev, int phy_id, bool last) * first phy,for other phys in this port, we add it to the port to * forming the wide-port. */ -static int sas_rediscover(struct domain_device *dev, const int phy_id) +static void sas_rediscover(struct domain_device *dev, const int phy_id) { struct expander_device *ex = &dev->ex_dev; struct ex_phy *changed_phy = &ex->ex_phy[phy_id]; @@ -2090,7 +2090,9 @@ static int sas_rediscover(struct domain_device *dev, const int phy_id) res = sas_rediscover_dev(dev, phy_id, last); } else res = sas_discover_new(dev, phy_id); - return res; + + pr_debug("ex %016llx phy%d discover returned 0x%x\n", + SAS_ADDR(dev->sas_addr), phy_id, res); } /** @@ -2102,7 +2104,7 @@ static int sas_rediscover(struct domain_device *dev, const int phy_id) * Discover process only interrogates devices in order to discover the * domain. 
*/ -int sas_ex_revalidate_domain(struct domain_device *port_dev) +void sas_ex_revalidate_domain(struct domain_device *port_dev) { int res; struct domain_device *dev = NULL; @@ -2117,11 +2119,10 @@ int sas_ex_revalidate_domain(struct domain_device *port_dev) res = sas_find_bcast_phy(dev, &phy_id, i, true); if (phy_id == -1) break; - res = sas_rediscover(dev, phy_id); + sas_rediscover(dev, phy_id); i = phy_id + 1; } while (i < ex->num_phys); } - return res; } void sas_smp_handler(struct bsg_job *job, struct Scsi_Host *shost, diff --git a/include/scsi/libsas.h b/include/scsi/libsas.h index 420156cea3ee..e557bcb0c266 100644 --- a/include/scsi/libsas.h +++ b/include/scsi/libsas.h @@ -692,7 +692,7 @@ int sas_discover_root_expander(struct domain_device *); void sas_init_ex_attr(void); -int sas_ex_revalidate_domain(struct domain_device *); +void sas_ex_revalidate_domain(struct domain_device *); void sas_unregister_domain_devices(struct asd_sas_port *port, int gone); void sas_init_disc(struct sas_discovery *disc, struct asd_sas_port *); From patchwork Wed Jan 30 08:24:09 2019 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Jason Yan X-Patchwork-Id: 10787855 Return-Path: Received: from mail.wl.linuxfoundation.org (pdx-wl-mail.web.codeaurora.org [172.30.200.125]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id C0B45746 for ; Wed, 30 Jan 2019 08:26:27 +0000 (UTC) Received: from mail.wl.linuxfoundation.org (localhost [127.0.0.1]) by mail.wl.linuxfoundation.org (Postfix) with ESMTP id AE88E1FE82 for ; Wed, 30 Jan 2019 08:26:27 +0000 (UTC) Received: by mail.wl.linuxfoundation.org (Postfix, from userid 486) id 9F6C027EE2; Wed, 30 Jan 2019 08:26:27 +0000 (UTC) X-Spam-Checker-Version: SpamAssassin 3.3.1 (2010-03-16) on pdx-wl-mail.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-7.9 required=2.0 tests=BAYES_00,MAILING_LIST_MULTI, RCVD_IN_DNSWL_HI autolearn=ham version=3.3.1 Received: from 
vger.kernel.org (vger.kernel.org [209.132.180.67]) by mail.wl.linuxfoundation.org (Postfix) with ESMTP id 03E231FE82 for ; Wed, 30 Jan 2019 08:26:27 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1730275AbfA3IZ7 (ORCPT ); Wed, 30 Jan 2019 03:25:59 -0500 Received: from szxga04-in.huawei.com ([45.249.212.190]:2705 "EHLO huawei.com" rhost-flags-OK-OK-OK-FAIL) by vger.kernel.org with ESMTP id S1730249AbfA3IZ6 (ORCPT ); Wed, 30 Jan 2019 03:25:58 -0500 Received: from DGGEMS409-HUB.china.huawei.com (unknown [172.30.72.60]) by Forcepoint Email with ESMTP id F0827A44E62DB6304100; Wed, 30 Jan 2019 16:25:56 +0800 (CST) Received: from huawei.com (10.175.124.28) by DGGEMS409-HUB.china.huawei.com (10.3.19.209) with Microsoft SMTP Server id 14.3.408.0; Wed, 30 Jan 2019 16:25:48 +0800 From: Jason Yan To: , CC: , , , , , , , , , , , , , Jason Yan , Ewan Milne , Tomas Henzl Subject: [PATCH v2 4/7] scsi: libsas: split the replacement of sas disks in two steps Date: Wed, 30 Jan 2019 16:24:09 +0800 Message-ID: <20190130082412.9357-5-yanaijie@huawei.com> X-Mailer: git-send-email 2.14.4 In-Reply-To: <20190130082412.9357-1-yanaijie@huawei.com> References: <20190130082412.9357-1-yanaijie@huawei.com> MIME-Version: 1.0 X-Originating-IP: [10.175.124.28] X-CFilter-Loop: Reflected Sender: linux-scsi-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: linux-scsi@vger.kernel.org X-Virus-Scanned: ClamAV using ClamSMTP Currently, if a new device replaces an old device, the sas address changes. We unregister the old device and discover the new device in one revalidation process. But since sas_port_delete() is now deferred, the old sas port is not yet deleted when we register the new port and device. The new sas port cannot be added because its name is the same as the old one. Fix this by doing the replacement in two steps. The first revalidation only deletes the old device and triggers a new revalidation.
The second revalidation discovers the new device. To keep the event processing synchronised to the original event, we wrap the revalidation in a loop and add a new parameter that indicates whether we should revalidate again. Signed-off-by: Jason Yan CC: chenxiang CC: John Garry CC: Johannes Thumshirn CC: Ewan Milne CC: Christoph Hellwig CC: Tomas Henzl CC: Dan Williams CC: Hannes Reinecke --- drivers/scsi/libsas/sas_discover.c | 20 +++++++++++++++----- drivers/scsi/libsas/sas_expander.c | 20 ++++++++++++++------ include/scsi/libsas.h | 2 +- 3 files changed, 30 insertions(+), 12 deletions(-) diff --git a/drivers/scsi/libsas/sas_discover.c b/drivers/scsi/libsas/sas_discover.c index ffc571a12916..c825c89fbddd 100644 --- a/drivers/scsi/libsas/sas_discover.c +++ b/drivers/scsi/libsas/sas_discover.c @@ -498,12 +498,10 @@ static void sas_discover_domain(struct work_struct *work) task_pid_nr(current), error); } -static void sas_revalidate_domain(struct work_struct *work) +static void sas_do_revalidate_domain(struct asd_sas_port *port, bool *retry) { - struct sas_discovery_event *ev = to_sas_discovery_event(work); - struct asd_sas_port *port = ev->port; - struct sas_ha_struct *ha = port->ha; struct domain_device *ddev = port->port_dev; + struct sas_ha_struct *ha = port->ha; /* prevent revalidation from finding sata links in recovery */ mutex_lock(&ha->disco_mutex); @@ -520,7 +518,7 @@ static void sas_revalidate_domain(struct work_struct *work) if (ddev && (ddev->dev_type == SAS_FANOUT_EXPANDER_DEVICE || ddev->dev_type == SAS_EDGE_EXPANDER_DEVICE)) - sas_ex_revalidate_domain(ddev); + sas_ex_revalidate_domain(ddev, retry); pr_debug("done REVALIDATING DOMAIN on port %d, pid:%d\n", port->id, task_pid_nr(current)); @@ -532,6 +530,18 @@ static void sas_revalidate_domain(struct work_struct *work) sas_probe_devices(port); } +static void sas_revalidate_domain(struct work_struct *work) +{ + struct sas_discovery_event *ev = to_sas_discovery_event(work); + struct asd_sas_port *port = ev->port; + bool retry; + + do
{ + retry = false; + sas_do_revalidate_domain(port, &retry); + } while (retry); +} + /* ---------- Events ---------- */ static void sas_chain_work(struct sas_ha_struct *ha, struct sas_work *sw) diff --git a/drivers/scsi/libsas/sas_expander.c b/drivers/scsi/libsas/sas_expander.c index 5cd720f93f96..cdbf8d8a28bf 100644 --- a/drivers/scsi/libsas/sas_expander.c +++ b/drivers/scsi/libsas/sas_expander.c @@ -1994,7 +1994,8 @@ static bool dev_type_flutter(enum sas_device_type new, enum sas_device_type old) return false; } -static int sas_rediscover_dev(struct domain_device *dev, int phy_id, bool last) +static int sas_unregister(struct domain_device *dev, int phy_id, bool last, + bool *retry) { struct expander_device *ex = &dev->ex_dev; struct ex_phy *phy = &ex->ex_phy[phy_id]; @@ -2045,7 +2046,12 @@ static int sas_rediscover_dev(struct domain_device *dev, int phy_id, bool last) SAS_ADDR(phy->attached_sas_addr)); sas_unregister_devs_sas_addr(dev, phy_id, last); - return sas_discover_new(dev, phy_id); + /* force the next revalidation find this phy and bring it up */ + phy->phy_change_count = -1; + ex->ex_change_count = -1; + *retry = true; + + return 0; } /** @@ -2062,7 +2068,8 @@ static int sas_rediscover_dev(struct domain_device *dev, int phy_id, bool last) * first phy,for other phys in this port, we add it to the port to * forming the wide-port. 
*/ -static void sas_rediscover(struct domain_device *dev, const int phy_id) +static void sas_rediscover(struct domain_device *dev, const int phy_id, + bool *retry) { struct expander_device *ex = &dev->ex_dev; struct ex_phy *changed_phy = &ex->ex_phy[phy_id]; @@ -2087,7 +2094,7 @@ static void sas_rediscover(struct domain_device *dev, const int phy_id) break; } } - res = sas_rediscover_dev(dev, phy_id, last); + res = sas_unregister(dev, phy_id, last, retry); } else res = sas_discover_new(dev, phy_id); @@ -2098,13 +2105,14 @@ static void sas_rediscover(struct domain_device *dev, const int phy_id) /** * sas_ex_revalidate_domain - revalidate the domain * @port_dev: port domain device. + * @retry: do we need to revalidate again * * NOTE: this process _must_ quit (return) as soon as any connection * errors are encountered. Connection recovery is done elsewhere. * Discover process only interrogates devices in order to discover the * domain. */ -void sas_ex_revalidate_domain(struct domain_device *port_dev) +void sas_ex_revalidate_domain(struct domain_device *port_dev, bool *retry) { int res; struct domain_device *dev = NULL; @@ -2119,7 +2127,7 @@ void sas_ex_revalidate_domain(struct domain_device *port_dev) res = sas_find_bcast_phy(dev, &phy_id, i, true); if (phy_id == -1) break; - sas_rediscover(dev, phy_id); + sas_rediscover(dev, phy_id, retry); i = phy_id + 1; } while (i < ex->num_phys); } diff --git a/include/scsi/libsas.h b/include/scsi/libsas.h index e557bcb0c266..deb75765e555 100644 --- a/include/scsi/libsas.h +++ b/include/scsi/libsas.h @@ -692,7 +692,7 @@ int sas_discover_root_expander(struct domain_device *); void sas_init_ex_attr(void); -void sas_ex_revalidate_domain(struct domain_device *); +void sas_ex_revalidate_domain(struct domain_device *port_dev, bool *retry); void sas_unregister_domain_devices(struct asd_sas_port *port, int gone); void sas_init_disc(struct sas_discovery *disc, struct asd_sas_port *); From patchwork Wed Jan 30 08:24:10 2019 Content-Type: 
text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Jason Yan X-Patchwork-Id: 10787851 Return-Path: Received: from mail.wl.linuxfoundation.org (pdx-wl-mail.web.codeaurora.org [172.30.200.125]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id E21BD746 for ; Wed, 30 Jan 2019 08:26:21 +0000 (UTC) Received: from mail.wl.linuxfoundation.org (localhost [127.0.0.1]) by mail.wl.linuxfoundation.org (Postfix) with ESMTP id D1FDD26E82 for ; Wed, 30 Jan 2019 08:26:21 +0000 (UTC) Received: by mail.wl.linuxfoundation.org (Postfix, from userid 486) id C5534285A8; Wed, 30 Jan 2019 08:26:21 +0000 (UTC) X-Spam-Checker-Version: SpamAssassin 3.3.1 (2010-03-16) on pdx-wl-mail.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-7.9 required=2.0 tests=BAYES_00,MAILING_LIST_MULTI, RCVD_IN_DNSWL_HI autolearn=ham version=3.3.1 Received: from vger.kernel.org (vger.kernel.org [209.132.180.67]) by mail.wl.linuxfoundation.org (Postfix) with ESMTP id 15FFD26E82 for ; Wed, 30 Jan 2019 08:26:21 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1730373AbfA3I0K (ORCPT ); Wed, 30 Jan 2019 03:26:10 -0500 Received: from szxga04-in.huawei.com ([45.249.212.190]:2706 "EHLO huawei.com" rhost-flags-OK-OK-OK-FAIL) by vger.kernel.org with ESMTP id S1730310AbfA3I0B (ORCPT ); Wed, 30 Jan 2019 03:26:01 -0500 Received: from DGGEMS409-HUB.china.huawei.com (unknown [172.30.72.59]) by Forcepoint Email with ESMTP id 0FC7C59403CD7288EAD6; Wed, 30 Jan 2019 16:25:57 +0800 (CST) Received: from huawei.com (10.175.124.28) by DGGEMS409-HUB.china.huawei.com (10.3.19.209) with Microsoft SMTP Server id 14.3.408.0; Wed, 30 Jan 2019 16:25:49 +0800 From: Jason Yan To: , CC: , , , , , , , , , , , , , Jason Yan , Ewan Milne , Tomas Henzl Subject: [PATCH v2 5/7] scsi: libsas: check if the same device when flutter Date: Wed, 30 Jan 2019 16:24:10 +0800 Message-ID: <20190130082412.9357-6-yanaijie@huawei.com> X-Mailer: 
git-send-email 2.14.4 In-Reply-To: <20190130082412.9357-1-yanaijie@huawei.com> References: <20190130082412.9357-1-yanaijie@huawei.com> MIME-Version: 1.0 X-Originating-IP: [10.175.124.28] X-CFilter-Loop: Reflected Sender: linux-scsi-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: linux-scsi@vger.kernel.org X-Virus-Scanned: ClamAV using ClamSMTP An ata device does not have a real sas address. If an ata device is replaced with another one, the sas address stays the same. libsas currently treats this scenario as flutter, so it neither deletes the old device nor discovers the new one. As a result, data can be read from or written to the wrong device. In addition, when hotplugging a sata device, libsas sometimes enters the flutter case and finds that the phy attached address is abnormal. The log looks like this: sas: ex 500e004aaaaaaa1f phy6 originated BROADCAST(CHANGE) sas: ex 500e004aaaaaaa1f phy06:U:0 attached: 0000000000000000 (no device) sas: ex 500e004aaaaaaa1f phy 0x6 broadcast flutter Fix this issue by checking whether the phy attached address and the ata device's class and id are the same as the original ones. The ata class and id are read in the ata EH process. When ata EH is scheduled, revalidation is deferred and a new bcast is raised.
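As an illustration only, the check described above can be sketched outside the kernel roughly as follows. This is a minimal sketch under stated assumptions, not the code in this patch: struct phy_snapshot, ID_WORDS and same_device_after_flutter() are hypothetical names, and the full comparison of the identify data is a simplification of what libata's ata_dev_same_device() does.

```c
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical, simplified stand-ins; these are not the kernel's structures. */
#define ID_WORDS 256

struct phy_snapshot {
	uint64_t attached_sas_addr;	/* 0 means "no device attached" */
	int linkrate;			/* negotiated link rate */
	bool is_ata;			/* true if an ata device sits behind the phy */
	unsigned int ata_class;		/* ata device class */
	uint16_t ata_id[ID_WORDS];	/* ata identify data */
};

/*
 * Return true only if the phy merely fluttered, i.e. the device seen
 * now is the one that was there before. Any difference means the old
 * device has to be unregistered and the new one discovered.
 */
bool same_device_after_flutter(const struct phy_snapshot *old,
			       const struct phy_snapshot *cur)
{
	/* A changed link rate is treated as a real change. */
	if (cur->linkrate != old->linkrate)
		return false;

	/* An all-zero or different attached address is abnormal. */
	if (cur->attached_sas_addr == 0 ||
	    cur->attached_sas_addr != old->attached_sas_addr)
		return false;

	/*
	 * An ata device has no real sas address of its own, so also
	 * compare class and identify data (simplification: whole buffer).
	 */
	if (old->is_ata) {
		if (cur->ata_class != old->ata_class)
			return false;
		if (memcmp(cur->ata_id, old->ata_id, sizeof(old->ata_id)) != 0)
			return false;
	}

	return true;
}
```

In this sketch, a replaced ata disk with an unchanged attached address is caught only by the identify-data comparison, which is the case the address check alone misses.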
Tested-by: Chen Liangfei Signed-off-by: Jason Yan Reviewed-by: John Garry CC: chenxiang CC: John Garry CC: Johannes Thumshirn CC: Ewan Milne CC: Christoph Hellwig CC: Tomas Henzl CC: Dan Williams CC: Tejun Heo CC: Hannes Reinecke --- drivers/ata/libata-core.c | 3 +- drivers/scsi/libsas/sas_ata.c | 18 ++++++++++ drivers/scsi/libsas/sas_expander.c | 67 ++++++++++++++++++++++++++++++++------ include/linux/libata.h | 2 ++ include/scsi/libsas.h | 1 + 5 files changed, 80 insertions(+), 11 deletions(-) diff --git a/drivers/ata/libata-core.c b/drivers/ata/libata-core.c index b8c3f9e6af89..67e77fa3c63a 100644 --- a/drivers/ata/libata-core.c +++ b/drivers/ata/libata-core.c @@ -4225,7 +4225,7 @@ void ata_std_postreset(struct ata_link *link, unsigned int *classes) * RETURNS: * 1 if @dev matches @new_class and @new_id, 0 otherwise. */ -static int ata_dev_same_device(struct ata_device *dev, unsigned int new_class, +int ata_dev_same_device(struct ata_device *dev, unsigned int new_class, const u16 *new_id) { const u16 *old_id = dev->id; @@ -7400,6 +7400,7 @@ EXPORT_SYMBOL_GPL(ata_eh_analyze_ncq_error); EXPORT_SYMBOL_GPL(ata_do_eh); EXPORT_SYMBOL_GPL(ata_std_error_handler); +EXPORT_SYMBOL_GPL(ata_dev_same_device); EXPORT_SYMBOL_GPL(ata_cable_40wire); EXPORT_SYMBOL_GPL(ata_cable_80wire); EXPORT_SYMBOL_GPL(ata_cable_unknown); diff --git a/drivers/scsi/libsas/sas_ata.c b/drivers/scsi/libsas/sas_ata.c index 6f93fee2b21b..a9f5523f0347 100644 --- a/drivers/scsi/libsas/sas_ata.c +++ b/drivers/scsi/libsas/sas_ata.c @@ -625,6 +625,22 @@ static int sas_get_ata_command_set(struct domain_device *dev) return ata_dev_classify(&tf); } +static void sas_ata_store_id(struct domain_device *dev) +{ + struct ata_device *ata_dev = sas_to_ata_dev(dev); + unsigned char model[ATA_ID_PROD_LEN + 1]; + unsigned char serial[ATA_ID_SERNO_LEN + 1]; + + /* store the ata device's class and id */ + memcpy(dev->sata_dev.id, ata_dev->id, ATA_ID_WORDS); + dev->sata_dev.class = ata_dev->class; + + 
ata_id_c_string(ata_dev->id, model, ATA_ID_PROD, sizeof(model)); + ata_id_c_string(ata_dev->id, serial, ATA_ID_SERNO, sizeof(serial)); + + sas_ata_printk(KERN_INFO, dev, "model:%s serial:%s\n", model, serial); +} + void sas_probe_sata(struct asd_sas_port *port) { struct domain_device *dev, *n; @@ -649,6 +665,8 @@ void sas_probe_sata(struct asd_sas_port *port) */ if (!ata_dev_enabled(sas_to_ata_dev(dev))) sas_fail_probe(dev, __func__, -ENODEV); + else + sas_ata_store_id(dev); } } diff --git a/drivers/scsi/libsas/sas_expander.c b/drivers/scsi/libsas/sas_expander.c index cdbf8d8a28bf..6e56ebdc2148 100644 --- a/drivers/scsi/libsas/sas_expander.c +++ b/drivers/scsi/libsas/sas_expander.c @@ -1994,6 +1994,61 @@ static bool dev_type_flutter(enum sas_device_type new, enum sas_device_type old) return false; } +/* + * we think the device is fluttering so just read the phy state and update + * some information of the device, but if some important things changed + * such as the sas address, or the linkrate, or the ata devices id and class, + * we have to unregister the device and re-probe it. 
+ */ +static bool sas_process_flutter(struct domain_device *dev, struct ex_phy *phy, + int phy_id, u8 *sas_addr) +{ + struct domain_device *ata_dev = sas_ex_to_ata(dev, phy_id); + enum sas_linkrate linkrate = phy->linkrate; + char *action = ""; + + sas_ex_phy_discover(dev, phy_id); + + if (ata_dev && phy->attached_dev_type == SAS_SATA_PENDING) + action = ", needs recovery"; + pr_debug("ex %016llx phy%d broadcast flutter%s\n", + SAS_ADDR(dev->sas_addr), phy_id, action); + + if (linkrate != phy->linkrate) { + pr_debug("ex %016llx phy%d linkrate changed from %d to %d\n", + SAS_ADDR(dev->sas_addr), phy_id, + linkrate, phy->linkrate); + return false; + } + + /* the phy attached address will be updated by sas_ex_phy_discover() + * and sometimes become abnormal + */ + if (SAS_ADDR(phy->attached_sas_addr) != SAS_ADDR(sas_addr) || + SAS_ADDR(phy->attached_sas_addr) == 0) { + /* if attached_sas_addr become abnormal, we must set the + * original address back so that the device can be unregistered + */ + memcpy(phy->attached_sas_addr, sas_addr, SAS_ADDR_SIZE); + pr_debug("phy address(%016llx) abnormal, origin:%016llx\n", + SAS_ADDR(phy->attached_sas_addr), + SAS_ADDR(sas_addr)); + return false; + } + + if (ata_dev) { + struct ata_device *adev = sas_to_ata_dev(ata_dev); + unsigned int class = ata_dev->sata_dev.class; + u16 *id = ata_dev->sata_dev.id; + + /* to see if the disk is replaced with another one */ + if (!ata_dev_same_device(adev, class, id)) + return false; + } + + return true; +} + static int sas_unregister(struct domain_device *dev, int phy_id, bool last, bool *retry) { @@ -2028,16 +2083,8 @@ static int sas_unregister(struct domain_device *dev, int phy_id, bool last, return res; } else if (SAS_ADDR(sas_addr) == SAS_ADDR(phy->attached_sas_addr) && dev_type_flutter(type, phy->attached_dev_type)) { - struct domain_device *ata_dev = sas_ex_to_ata(dev, phy_id); - char *action = ""; - - sas_ex_phy_discover(dev, phy_id); - - if (ata_dev && phy->attached_dev_type == 
SAS_SATA_PENDING) - action = ", needs recovery"; - pr_debug("ex %016llx phy 0x%x broadcast flutter%s\n", - SAS_ADDR(dev->sas_addr), phy_id, action); - return res; + if (sas_process_flutter(dev, phy, phy_id, sas_addr)) + return res; } /* we always have to delete the old device when we went here */ diff --git a/include/linux/libata.h b/include/linux/libata.h index 68133842e6d7..b253cfcdd6ae 100644 --- a/include/linux/libata.h +++ b/include/linux/libata.h @@ -1144,6 +1144,8 @@ extern int sata_scr_write(struct ata_link *link, int reg, u32 val); extern int sata_scr_write_flush(struct ata_link *link, int reg, u32 val); extern bool ata_link_online(struct ata_link *link); extern bool ata_link_offline(struct ata_link *link); +extern int ata_dev_same_device(struct ata_device *dev, unsigned int new_class, + const u16 *new_id); #ifdef CONFIG_PM extern int ata_host_suspend(struct ata_host *host, pm_message_t mesg); extern void ata_host_resume(struct ata_host *host); diff --git a/include/scsi/libsas.h b/include/scsi/libsas.h index deb75765e555..2da5d085b11a 100644 --- a/include/scsi/libsas.h +++ b/include/scsi/libsas.h @@ -164,6 +164,7 @@ struct sata_device { struct ata_host *ata_host; struct smp_resp rps_resp ____cacheline_aligned; /* report_phy_sata_resp */ u8 fis[ATA_RESP_FIS_SIZE]; + u16 id[ATA_ID_WORDS]; }; struct ssp_device { From patchwork Wed Jan 30 08:24:11 2019 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Jason Yan X-Patchwork-Id: 10787853 Return-Path: Received: from mail.wl.linuxfoundation.org (pdx-wl-mail.web.codeaurora.org [172.30.200.125]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id 14BB26C2 for ; Wed, 30 Jan 2019 08:26:25 +0000 (UTC) Received: from mail.wl.linuxfoundation.org (localhost [127.0.0.1]) by mail.wl.linuxfoundation.org (Postfix) with ESMTP id 05A3226E82 for ; Wed, 30 Jan 2019 08:26:25 +0000 (UTC) Received: by mail.wl.linuxfoundation.org (Postfix, from userid 
486) id ED635285A8; Wed, 30 Jan 2019 08:26:24 +0000 (UTC) X-Spam-Checker-Version: SpamAssassin 3.3.1 (2010-03-16) on pdx-wl-mail.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-7.9 required=2.0 tests=BAYES_00,MAILING_LIST_MULTI, RCVD_IN_DNSWL_HI autolearn=ham version=3.3.1 Received: from vger.kernel.org (vger.kernel.org [209.132.180.67]) by mail.wl.linuxfoundation.org (Postfix) with ESMTP id 93BAC26E82 for ; Wed, 30 Jan 2019 08:26:24 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1730296AbfA3IZ7 (ORCPT ); Wed, 30 Jan 2019 03:25:59 -0500 Received: from szxga04-in.huawei.com ([45.249.212.190]:2701 "EHLO huawei.com" rhost-flags-OK-OK-OK-FAIL) by vger.kernel.org with ESMTP id S1725834AbfA3IZ6 (ORCPT ); Wed, 30 Jan 2019 03:25:58 -0500 Received: from DGGEMS409-HUB.china.huawei.com (unknown [172.30.72.60]) by Forcepoint Email with ESMTP id D39B661D8BCC729635C2; Wed, 30 Jan 2019 16:25:56 +0800 (CST) Received: from huawei.com (10.175.124.28) by DGGEMS409-HUB.china.huawei.com (10.3.19.209) with Microsoft SMTP Server id 14.3.408.0; Wed, 30 Jan 2019 16:25:50 +0800 From: Jason Yan To: , CC: , , , , , , , , , , , , , Jason Yan , Xiaofei Tan , Ewan Milne , Tomas Henzl Subject: [PATCH v2 6/7] scsi: libsas: reset the phy address if discover failed Date: Wed, 30 Jan 2019 16:24:11 +0800 Message-ID: <20190130082412.9357-7-yanaijie@huawei.com> X-Mailer: git-send-email 2.14.4 In-Reply-To: <20190130082412.9357-1-yanaijie@huawei.com> References: <20190130082412.9357-1-yanaijie@huawei.com> MIME-Version: 1.0 X-Originating-IP: [10.175.124.28] X-CFilter-Loop: Reflected Sender: linux-scsi-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: linux-scsi@vger.kernel.org X-Virus-Scanned: ClamAV using ClamSMTP When we failed to discover the device, the phy address is still kept in ex_phy. 
So the next time we revalidate this phy, the address and device type are the same, so it is considered flutter and is not discovered again, and the device is never brought up. Fix this by resetting the phy address to its initial value. Then the device will be discovered again in the next revalidation. Tested-by: Chen Liangfei Signed-off-by: Jason Yan CC: Xiaofei Tan CC: John Garry CC: Johannes Thumshirn CC: Ewan Milne CC: Christoph Hellwig CC: Tomas Henzl CC: Dan Williams CC: Hannes Reinecke Reviewed-by: John Garry --- drivers/scsi/libsas/sas_expander.c | 7 +++++++ 1 file changed, 7 insertions(+) diff --git a/drivers/scsi/libsas/sas_expander.c b/drivers/scsi/libsas/sas_expander.c index 6e56ebdc2148..e781941a7088 100644 --- a/drivers/scsi/libsas/sas_expander.c +++ b/drivers/scsi/libsas/sas_expander.c @@ -1100,6 +1100,13 @@ static int sas_ex_discover_dev(struct domain_device *dev, int phy_id) i, SAS_ADDR(ex->ex_phy[i].attached_sas_addr)); } } + } else { + /* if we failed to discover this device, we have to + * reset the expander phy attached address so that we + * will not treat the phy as flutter in the next + * revalidation + */ + memset(ex_phy->attached_sas_addr, 0, SAS_ADDR_SIZE); } return res; From patchwork Wed Jan 30 08:24:12 2019 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Jason Yan X-Patchwork-Id: 10787859 Return-Path: Received: from mail.wl.linuxfoundation.org (pdx-wl-mail.web.codeaurora.org [172.30.200.125]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id 4177A6C2 for ; Wed, 30 Jan 2019 08:26:34 +0000 (UTC) Received: from mail.wl.linuxfoundation.org (localhost [127.0.0.1]) by mail.wl.linuxfoundation.org (Postfix) with ESMTP id 2E2831FE82 for ; Wed, 30 Jan 2019 08:26:34 +0000 (UTC) Received: by mail.wl.linuxfoundation.org (Postfix, from userid 486) id 21FE427EE2; Wed, 30 Jan 2019 08:26:34 +0000 (UTC) X-Spam-Checker-Version: SpamAssassin 3.3.1 (2010-03-16) on 
pdx-wl-mail.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-7.9 required=2.0 tests=BAYES_00,MAILING_LIST_MULTI, RCVD_IN_DNSWL_HI autolearn=ham version=3.3.1 Received: from vger.kernel.org (vger.kernel.org [209.132.180.67]) by mail.wl.linuxfoundation.org (Postfix) with ESMTP id 5C7031FE82 for ; Wed, 30 Jan 2019 08:26:33 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1730260AbfA3IZ7 (ORCPT ); Wed, 30 Jan 2019 03:25:59 -0500 Received: from szxga04-in.huawei.com ([45.249.212.190]:2704 "EHLO huawei.com" rhost-flags-OK-OK-OK-FAIL) by vger.kernel.org with ESMTP id S1730251AbfA3IZ7 (ORCPT ); Wed, 30 Jan 2019 03:25:59 -0500 Received: from DGGEMS409-HUB.china.huawei.com (unknown [172.30.72.60]) by Forcepoint Email with ESMTP id E8684CA78E3B31BB3F4E; Wed, 30 Jan 2019 16:25:56 +0800 (CST) Received: from huawei.com (10.175.124.28) by DGGEMS409-HUB.china.huawei.com (10.3.19.209) with Microsoft SMTP Server id 14.3.408.0; Wed, 30 Jan 2019 16:25:51 +0800 From: Jason Yan To: , CC: , , , , , , , , , , , , , Jason Yan , Xiaofei Tan , Ewan Milne , Tomas Henzl Subject: [PATCH v2 7/7] scsi: libsas: fix issue of swapping two sas disks Date: Wed, 30 Jan 2019 16:24:12 +0800 Message-ID: <20190130082412.9357-8-yanaijie@huawei.com> X-Mailer: git-send-email 2.14.4 In-Reply-To: <20190130082412.9357-1-yanaijie@huawei.com> References: <20190130082412.9357-1-yanaijie@huawei.com> MIME-Version: 1.0 X-Originating-IP: [10.175.124.28] X-CFilter-Loop: Reflected Sender: linux-scsi-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: linux-scsi@vger.kernel.org X-Virus-Scanned: ClamAV using ClamSMTP Revalidation currently scans the expander phys in phy order and checks whether each phy has changed. This leads to an issue when swapping two SAS disks on one expander. 
Assume we have two SAS disks, connected to expander phy10 and phy11:

phy10: 5000cca04eb1001d port-0:0:10
phy11: 5000cca04eb043ad port-0:0:11

Swap these two disks, and imagine the following scenario:

revalidation 1:
-->phy10: 0 --> delete phy10 domain device
-->phy11: 5000cca04eb043ad (no change)
revalidation done

revalidation 2:
-->step 1, check phy10:
-->phy10: 5000cca04eb043ad --> add to wide port(port-0:0:11) (phy11 address is still 5000cca04eb043ad now)
-->step 2, check phy11:
-->phy11: 0 --> phy11 address is 0 now, but it's part of wide port(port-0:0:11), the domain device will not be deleted.
revalidation done

revalidation 3:
-->phy10, 5000cca04eb043ad (no change)
-->phy11: 5000cca04eb1001d --> try to add port-0:0:11 but failed, port-0:0:11 already exists, trigger a warning as follows
revalidation done

[14790.189699] sysfs: cannot create duplicate filename '/devices/pci0000:74/0000:74:02.0/host0/port-0:0/expander-0:0/port-0:0:11'
[14790.201081] CPU: 25 PID: 5031 Comm: kworker/u192:3 Not tainted 4.16.0-rc1-191134-g138f084-dirty #228
[14790.210199] Hardware name: Huawei D06/D06, BIOS Hisilicon D06 EC UEFI Nemo 2.0 RC0 - B303 05/16/2018
[14790.219323] Workqueue: 0000:74:02.0_disco_q sas_revalidate_domain
[14790.225404] Call trace:
[14790.227842] dump_backtrace+0x0/0x18c
[14790.231492] show_stack+0x14/0x1c
[14790.234798] dump_stack+0x88/0xac
[14790.238101] sysfs_warn_dup+0x64/0x7c
[14790.241751] sysfs_create_dir_ns+0x90/0xa0
[14790.245835] kobject_add_internal+0xa0/0x284
[14790.250092] kobject_add+0xb8/0x11c
[14790.253570] device_add+0xe8/0x598
[14790.256960] sas_port_add+0x24/0x50
[14790.260436] sas_ex_discover_devices+0xb10/0xc30
[14790.265040] sas_ex_revalidate_domain+0x1d8/0x518
[14790.269731] sas_revalidate_domain+0x12c/0x154
[14790.274163] process_one_work+0x128/0x2b0
[14790.278160] worker_thread+0x14c/0x408
[14790.281897] kthread+0xfc/0x128
[14790.285026] ret_from_fork+0x10/0x18
[14790.288598] ------------[ cut here ]------------

In the end, the disk
5000cca04eb1001d is lost. The basic idea of the fix is to have revalidation first scan all phys and then unregister devices. Only when no devices need to be unregistered does it go on to discover new devices. If any devices need to be unregistered, unregister them and tell the revalidation event processor to retry. The next revalidation will then discover the new devices. Tested-by: Chen Liangfei Signed-off-by: Jason Yan CC: Xiaofei Tan CC: chenxiang CC: John Garry CC: Johannes Thumshirn CC: Ewan Milne CC: Christoph Hellwig CC: Tomas Henzl CC: Dan Williams CC: Hannes Reinecke --- drivers/scsi/libsas/sas_expander.c | 146 +++++++++++++++++++++++++------------ 1 file changed, 100 insertions(+), 46 deletions(-) diff --git a/drivers/scsi/libsas/sas_expander.c b/drivers/scsi/libsas/sas_expander.c index e781941a7088..073bb5c6e353 100644 --- a/drivers/scsi/libsas/sas_expander.c +++ b/drivers/scsi/libsas/sas_expander.c @@ -2056,7 +2056,7 @@ static bool sas_process_flutter(struct domain_device *dev, struct ex_phy *phy, return true; } -static int sas_unregister(struct domain_device *dev, int phy_id, bool last, +static int sas_ex_unregister(struct domain_device *dev, int phy_id, bool last, bool *retry) { struct expander_device *ex = &dev->ex_dev; @@ -2108,21 +2108,8 @@ static int sas_unregister(struct domain_device *dev, int phy_id, bool last, return 0; } -/** - * sas_rediscover - revalidate the domain. - * @dev:domain device to be detect. - * @phy_id: the phy id will be detected. - * - * NOTE: this process _must_ quit (return) as soon as any connection - * errors are encountered. Connection recovery is done elsewhere. 
- * Discover process only interrogates devices in order to discover the - * domain.For plugging out, we un-register the device only when it is - * the last phy in the port, for other phys in this port, we just delete it - * from the port.For inserting, we do discovery when it is the - * first phy,for other phys in this port, we add it to the port to - * forming the wide-port. - */ -static void sas_rediscover(struct domain_device *dev, const int phy_id, + +static void sas_ex_unregister_device(struct domain_device *dev, const int phy_id, bool *retry) { struct expander_device *ex = &dev->ex_dev; @@ -2131,31 +2118,70 @@ static void sas_rediscover(struct domain_device *dev, const int phy_id, int i; bool last = true; /* is this the last phy of the port */ - pr_debug("ex %016llx phy%d originated BROADCAST(CHANGE)\n", - SAS_ADDR(dev->sas_addr), phy_id); - - if (SAS_ADDR(changed_phy->attached_sas_addr) != 0) { - for (i = 0; i < ex->num_phys; i++) { - struct ex_phy *phy = &ex->ex_phy[i]; + for (i = 0; i < ex->num_phys; i++) { + struct ex_phy *phy = &ex->ex_phy[i]; - if (i == phy_id) - continue; - if (SAS_ADDR(phy->attached_sas_addr) == - SAS_ADDR(changed_phy->attached_sas_addr)) { - pr_debug("phy%d part of wide port with phy%d\n", - phy_id, i); - last = false; - break; - } + if (i == phy_id) + continue; + if (SAS_ADDR(phy->attached_sas_addr) == + SAS_ADDR(changed_phy->attached_sas_addr)) { + pr_debug("phy%d part of wide port with phy%d\n", + phy_id, i); + last = false; + break; } - res = sas_unregister(dev, phy_id, last, retry); - } else - res = sas_discover_new(dev, phy_id); + } + res = sas_ex_unregister(dev, phy_id, last, retry); pr_debug("ex %016llx phy%d discover returned 0x%x\n", SAS_ADDR(dev->sas_addr), phy_id, res); } +static int sas_ex_try_unregister(struct domain_device *dev, u8 *changed_phy, + int nr, bool *retry) +{ + struct expander_device *ex = &dev->ex_dev; + int unregistered = 0; + struct ex_phy *phy; + int i; + + for (i = 0; i < nr; i++) { + pr_debug("ex 
%016llx phy%d originated BROADCAST(CHANGE)\n", + SAS_ADDR(dev->sas_addr), changed_phy[i]); + + phy = &ex->ex_phy[changed_phy[i]]; + + if (SAS_ADDR(phy->attached_sas_addr) == 0) + continue; + + sas_ex_unregister_device(dev, changed_phy[i], retry); + changed_phy[i] = 0xff; + unregistered++; + } + return unregistered; +} + +static void sas_ex_register(struct domain_device *dev, u8 *changed_phy, + int nr) +{ + struct expander_device *ex = &dev->ex_dev; + struct ex_phy *phy; + int res = 0; + int i; + + for (i = 0; i < nr; i++) { + if (changed_phy[i] == 0xff) + continue; + + phy = &ex->ex_phy[changed_phy[i]]; + + res = sas_discover_new(dev, changed_phy[i]); + + pr_debug("ex %016llx phy%d register returned 0x%x\n", + SAS_ADDR(dev->sas_addr), changed_phy[i], res); + } +} + /** * sas_ex_revalidate_domain - revalidate the domain * @port_dev: port domain device. @@ -2170,21 +2196,49 @@ void sas_ex_revalidate_domain(struct domain_device *port_dev, bool *retry) { int res; struct domain_device *dev = NULL; + u8 changed_phy[MAX_EXPANDER_PHYS]; + struct expander_device *ex; + int unregistered = 0; + int phy_id; + int nr = 0; + int i = 0; res = sas_find_bcast_dev(port_dev, &dev); - if (res == 0 && dev) { - struct expander_device *ex = &dev->ex_dev; - int i = 0, phy_id; - - do { - phy_id = -1; - res = sas_find_bcast_phy(dev, &phy_id, i, true); - if (phy_id == -1) - break; - sas_rediscover(dev, phy_id, retry); - i = phy_id + 1; - } while (i < ex->num_phys); + if (res || !dev) + return; + + memset(changed_phy, 0xff, MAX_EXPANDER_PHYS); + ex = &dev->ex_dev; + + do { + phy_id = -1; + res = sas_find_bcast_phy(dev, &phy_id, i, true); + if (phy_id == -1) + break; + changed_phy[nr++] = phy_id; + i = phy_id + 1; + } while (i < dev->ex_dev.num_phys); + + if (nr == 0) + return; + + unregistered = sas_ex_try_unregister(dev, changed_phy, nr, retry); + + if (unregistered > 0) { + struct ex_phy *phy; + + for (i = 0; i < nr; i++) { + if (changed_phy[i] == 0xff) + continue; + phy = 
&ex->ex_phy[changed_phy[i]]; + phy->phy_change_count = -1; + } + ex->ex_change_count = -1; + *retry = true; + return; } + + sas_ex_register(dev, changed_phy, nr); } void sas_smp_handler(struct bsg_job *job, struct Scsi_Host *shost,