Message ID | 1537935759-14754-4-git-send-email-suganath-prabu.subramani@broadcom.com (mailing list archive) |
---|---|
State | Changes Requested |
Headers | show |
Series | mpt3sas: Hot-Plug Surprise removal support on IOC. | expand |
On Wed, Sep 26, 2018 at 09:52:36AM +0530, Suganath Prabu S wrote: > The code for getting shost and IOC is redundant so > moved that to function "scsih_get_shost_and_ioc". > Also checks for NULL are added to IOC and shost. > > Signed-off-by: Suganath Prabu S <suganath-prabu.subramani@broadcom.com> > --- > drivers/scsi/mpt3sas/mpt3sas_scsih.c | 98 ++++++++++++++++++++++++++++++------ > 1 file changed, 82 insertions(+), 16 deletions(-) > > diff --git a/drivers/scsi/mpt3sas/mpt3sas_scsih.c b/drivers/scsi/mpt3sas/mpt3sas_scsih.c > index 566a550..f6e92eb 100644 > --- a/drivers/scsi/mpt3sas/mpt3sas_scsih.c > +++ b/drivers/scsi/mpt3sas/mpt3sas_scsih.c > @@ -9809,6 +9809,35 @@ _scsih_ir_shutdown(struct MPT3SAS_ADAPTER *ioc) > } > > /** > + * _scsih_get_shost_and_ioc - get shost and ioc > + * and verify whether they are NULL or not > + * @pdev: PCI device struct > + * @shost: address of scsi host pointer > + * @ioc: address of HBA adapter pointer > + * > + * Return zero if *shost and *ioc are not NULL otherwise return error number. > + */ > +static int > +_scsih_get_shost_and_ioc(struct pci_dev *pdev, > + struct Scsi_Host **shost, struct MPT3SAS_ADAPTER **ioc) > +{ > + *shost = pci_get_drvdata(pdev); > + if (*shost == NULL) { > + dev_err(&pdev->dev, "pdev's driver data is null\n"); > + return -ENXIO; > + } > + > + *ioc = shost_priv(*shost); > + if (*ioc == NULL) { > + dev_err(&pdev->dev, "shost's private data is null\n"); > + return -ENXIO; I think it's better to omit NULL pointer checks like these because there should not be a path where we can execute this code when these pointers are NULL. If there *is* such a path, I think that's a serious bug and it's better to oops when we try to dereference the NULL pointer. If we just return an error code, it's likely the bug will be ignored and never fixed. Bjorn
Trivial nits/questions. In subject: s/Introdude/Introduce/ s/ / / (remove double space) s/\.// (remove trailing period, also appears in patches 4, 5, 6) On Wed, Sep 26, 2018 at 09:52:36AM +0530, Suganath Prabu S wrote: > The code for getting shost and IOC is redundant so > moved that to function "scsih_get_shost_and_ioc". > Also checks for NULL are added to IOC and shost. > > Signed-off-by: Suganath Prabu S <suganath-prabu.subramani@broadcom.com> Per [1], it's good to use full names, e.g., spell out "S" when that makes sense. [1] https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/Documentation/process/submitting-patches.rst#n456 > --- > drivers/scsi/mpt3sas/mpt3sas_scsih.c | 98 ++++++++++++++++++++++++++++++------ > 1 file changed, 82 insertions(+), 16 deletions(-) > > diff --git a/drivers/scsi/mpt3sas/mpt3sas_scsih.c b/drivers/scsi/mpt3sas/mpt3sas_scsih.c > index 566a550..f6e92eb 100644 > --- a/drivers/scsi/mpt3sas/mpt3sas_scsih.c > +++ b/drivers/scsi/mpt3sas/mpt3sas_scsih.c > @@ -9809,6 +9809,35 @@ _scsih_ir_shutdown(struct MPT3SAS_ADAPTER *ioc) > } > > /** > + * _scsih_get_shost_and_ioc - get shost and ioc > + * and verify whether they are NULL or not > + * @pdev: PCI device struct > + * @shost: address of scsi host pointer There seems to be a convention to capitalize "IOC" and "SCSI" when used in English text, i.e., when they're not variable names or part of a function name. > + * @ioc: address of HBA adapter pointer > + * > + * Return zero if *shost and *ioc are not NULL otherwise return error number. > + */ > +static int > +_scsih_get_shost_and_ioc(struct pci_dev *pdev, > + struct Scsi_Host **shost, struct MPT3SAS_ADAPTER **ioc) > +{ > + *shost = pci_get_drvdata(pdev); > + if (*shost == NULL) { > + dev_err(&pdev->dev, "pdev's driver data is null\n"); I don't see any other uses of dev_printk() in this file; existing output seems to use sdev_printk(), pr_info(), etc. Just pointing this out to make sure dev_printk() is really what you want here. > + return -ENXIO; > + } Bjorn
On Thu, Sep 27, 2018 at 2:39 AM Bjorn Helgaas <helgaas@kernel.org> wrote: > > On Wed, Sep 26, 2018 at 09:52:36AM +0530, Suganath Prabu S wrote: > > The code for getting shost and IOC is redundant so > > moved that to function "scsih_get_shost_and_ioc". > > Also checks for NULL are added to IOC and shost. > > > > Signed-off-by: Suganath Prabu S <suganath-prabu.subramani@broadcom.com> > > --- > > drivers/scsi/mpt3sas/mpt3sas_scsih.c | 98 ++++++++++++++++++++++++++++++------ > > 1 file changed, 82 insertions(+), 16 deletions(-) > > > > diff --git a/drivers/scsi/mpt3sas/mpt3sas_scsih.c b/drivers/scsi/mpt3sas/mpt3sas_scsih.c > > index 566a550..f6e92eb 100644 > > --- a/drivers/scsi/mpt3sas/mpt3sas_scsih.c > > +++ b/drivers/scsi/mpt3sas/mpt3sas_scsih.c > > @@ -9809,6 +9809,35 @@ _scsih_ir_shutdown(struct MPT3SAS_ADAPTER *ioc) > > } > > > > /** > > + * _scsih_get_shost_and_ioc - get shost and ioc > > + * and verify whether they are NULL or not > > + * @pdev: PCI device struct > > + * @shost: address of scsi host pointer > > + * @ioc: address of HBA adapter pointer > > + * > > + * Return zero if *shost and *ioc are not NULL otherwise return error number. > > + */ > > +static int > > +_scsih_get_shost_and_ioc(struct pci_dev *pdev, > > + struct Scsi_Host **shost, struct MPT3SAS_ADAPTER **ioc) > > +{ > > + *shost = pci_get_drvdata(pdev); > > + if (*shost == NULL) { > > + dev_err(&pdev->dev, "pdev's driver data is null\n"); > > + return -ENXIO; > > + } > > + > > + *ioc = shost_priv(*shost); > > + if (*ioc == NULL) { > > + dev_err(&pdev->dev, "shost's private data is null\n"); > > + return -ENXIO; > > I think it's better to omit NULL pointer checks like these because > there should not be a path where we can execute this code when these > pointers are NULL. > > If there *is* such a path, I think that's a serious bug and it's > better to oops when we try to dereference the NULL pointer. If we > just return an error code, it's likely the bug will be ignored and > never fixed. > We have added the ioc and shost checks, because of kernel panic in below scenario. Have 3 HBA's in system and perform below operation. 1) Run “poweroff”. 2) Immediate hot unplug HBA. We have observed, kernel's shutdown process has removed all the 3 HBA devices smoothly, and also user have unplugged the HBA device during this time. PCI subsystem's hot-plug thread is also trying to remove the hot plugged PCI device which is already removed/cleaned by the shutdown process. (Which is not expected for the already removed device) Due to this kernel panic is observed. And we are not sure whether it has to fixed from pciehp layer, so we added sanity checks in driver. [ 1745.605510] BUG: unable to handle kernel NULL pointer dereference at 0000000000000a98 [ 1745.606554] IP: [<ffffffffa054c750>] scsih_remove+0x20/0x2d0 [mpt3sas] [ 1745.607609] PGD 0 [ 1745.608621] Oops: 0000 [#1] SMP [ 1745.622989] CPU: 0 PID: 668 Comm: kworker/0:2 Tainted: G O 4.4.55-1.el7.elrepo.x86_64 #1 [ 1745.624800] Hardware name: PRO-MNU65930231 PRO-NME69559126/BRD-PRO55212588, BIOS 0.51e 05/08/2017 [ 1745.626673] Workqueue: pciehp-3 pciehp_power_thread [ 1745.628566] task: ffff881fe50dd880 ti: ffff881fe88e4000 task.ti: ffff881fe88e4000 [ 1745.630530] RIP: 0010:[<ffffffffa054c750>] [<ffffffffa054c750>] scsih_remove+0x20/0x2d0 [mpt3sas] [ 1745.632577] RSP: 0018:ffff881fe88e7c98 EFLAGS: 00010292 [ 1745.634639] RAX: 0000000000000001 RBX: ffff881feef5c000 RCX: 0000000000000000 [ 1745.636718] RDX: 0000000000000000 RSI: 0000000000000202 RDI: ffff881feef5c000 [ 1745.638832] RBP: ffff881fe88e7cc8 R08: 0000000000000000 R09: 0000000180220002 [ 1745.640972] R10: 00000000eaf4a401 R11: ffffea007fabd280 R12: 0000000000000000 [ 1745.643136] R13: ffffffffa0576020 R14: ffff881fe9af8240 R15: 0000000000000000 [ 1745.645320] FS: 0000000000000000(0000) GS:ffff881ffde00000(0000) knlGS:0000000000000000 [ 1745.647572] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [ 1745.649833] CR2: 0000000000000a98 CR3: 0000001fe76df000 CR4: 00000000003406f0 [ 1745.652147] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 [ 1745.654476] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 [ 1745.656825] Stack: [ 1745.659138] ffff881fe88e7cc8 ffff881feef5c098 ffff881feef5c000 ffffffffa0576020 [ 1745.661562] ffff881fe9af8240 0000000000000000 ffff881fe88e7cf0 ffffffff8137f9d9 [ 1745.663990] ffff881feef5c098 ffffffffa0576088 ffff881feef5c000 ffff881fe88e7d10 [ 1745.666428] Call Trace: [ 1745.668830] [<ffffffff8137f9d9>] pci_device_remove+0x39/0xc0 [ 1745.671256] [<ffffffff8147db36>] __device_release_driver+0x96/0x130 [ 1745.673664] [<ffffffff8147dbf3>] device_release_driver+0x23/0x30 [ 1745.676071] [<ffffffff8137851c>] pci_stop_bus_device+0x8c/0xa0 [ 1745.678485] [<ffffffff81378612>] pci_stop_and_remove_bus_device+0x12/0x20 [ 1745.680909] [<ffffffff81392eea>] pciehp_unconfigure_device+0xaa/0x1b0 [ 1745.683331] [<ffffffff813929e2>] pciehp_disable_slot+0x52/0xd0 [ 1745.685767] [<ffffffff81392aed>] pciehp_power_thread+0x8d/0xb0 [ 1745.688210] [<ffffffff8109728f>] process_one_work+0x14f/0x400 [ 1745.690633] [<ffffffff81097b04>] worker_thread+0x114/0x470 [ 1745.693080] [<ffffffff810979f0>] ? rescuer_thread+0x310/0x310 [ 1745.695518] [<ffffffff8109d648>] kthread+0xd8/0xf0 [ 1745.697936] [<ffffffff8109d570>] ? kthread_park+0x60/0x60 [ 1745.700359] [<ffffffff816f854f>] ret_from_fork+0x3f/0x70 [ 1745.702747] [<ffffffff8109d570>] ? kthread_park+0x60/0x60 > Bjorn
[+cc LKML] On Mon, Oct 01, 2018 at 12:57:01PM +0530, Suganath Prabu Subramani wrote: > On Thu, Sep 27, 2018 at 2:39 AM Bjorn Helgaas <helgaas@kernel.org> wrote: > > On Wed, Sep 26, 2018 at 09:52:36AM +0530, Suganath Prabu S wrote: > > > The code for getting shost and IOC is redundant so > > > moved that to function "scsih_get_shost_and_ioc". > > > Also checks for NULL are added to IOC and shost. > > > > > > Signed-off-by: Suganath Prabu S <suganath-prabu.subramani@broadcom.com> > > > --- > > > drivers/scsi/mpt3sas/mpt3sas_scsih.c | 98 ++++++++++++++++++++++++++++++------ > > > 1 file changed, 82 insertions(+), 16 deletions(-) > > > > > > diff --git a/drivers/scsi/mpt3sas/mpt3sas_scsih.c b/drivers/scsi/mpt3sas/mpt3sas_scsih.c > > > index 566a550..f6e92eb 100644 > > > --- a/drivers/scsi/mpt3sas/mpt3sas_scsih.c > > > +++ b/drivers/scsi/mpt3sas/mpt3sas_scsih.c > > > @@ -9809,6 +9809,35 @@ _scsih_ir_shutdown(struct MPT3SAS_ADAPTER *ioc) > > > } > > > > > > /** > > > + * _scsih_get_shost_and_ioc - get shost and ioc > > > + * and verify whether they are NULL or not > > > + * @pdev: PCI device struct > > > + * @shost: address of scsi host pointer > > > + * @ioc: address of HBA adapter pointer > > > + * > > > + * Return zero if *shost and *ioc are not NULL otherwise return error number. > > > + */ > > > +static int > > > +_scsih_get_shost_and_ioc(struct pci_dev *pdev, > > > + struct Scsi_Host **shost, struct MPT3SAS_ADAPTER **ioc) > > > +{ > > > + *shost = pci_get_drvdata(pdev); > > > + if (*shost == NULL) { > > > + dev_err(&pdev->dev, "pdev's driver data is null\n"); > > > + return -ENXIO; > > > + } > > > + > > > + *ioc = shost_priv(*shost); > > > + if (*ioc == NULL) { > > > + dev_err(&pdev->dev, "shost's private data is null\n"); > > > + return -ENXIO; > > > > I think it's better to omit NULL pointer checks like these because > > there should not be a path where we can execute this code when these > > pointers are NULL. > > > > If there *is* such a path, I think that's a serious bug and it's > > better to oops when we try to dereference the NULL pointer. If we > > just return an error code, it's likely the bug will be ignored and > > never fixed. > > > We have added the ioc and shost checks, because of kernel panic in > below scenario. > Have 3 HBA's in system and perform below operation. > 1) Run “poweroff”. > 2) Immediate hot unplug HBA. > We have observed, kernel's shutdown process has removed all the 3 > HBA devices smoothly, and also user have unplugged the HBA device > during this time. PCI subsystem's hot-plug thread is also trying to > remove the hot plugged PCI device which is already removed/cleaned > by the shutdown process. (Which is not expected for the already > removed device) Due to this kernel panic is observed. And we are not > sure whether it has to fixed from pciehp layer, so we added sanity > checks in driver. This is a great example. scsih_remove() is the mpt3sas_driver.remove method. It sounds like it's getting called twice for the same HBA. I think that's a PCI core defect and we should fix it there instead of trying to work around it in every driver. The trace below is from v4.4.55. Can you reproduce the same problem with a current tree, e.g., v4.19-rc? There have been many changes since v4.4 that may have fixed this problem. > [ 1745.605510] BUG: unable to handle kernel NULL pointer dereference > at 0000000000000a98 > [ 1745.606554] IP: [<ffffffffa054c750>] scsih_remove+0x20/0x2d0 [mpt3sas] > [ 1745.607609] PGD 0 > [ 1745.608621] Oops: 0000 [#1] SMP > [ 1745.622989] CPU: 0 PID: 668 Comm: kworker/0:2 Tainted: G > O 4.4.55-1.el7.elrepo.x86_64 #1 > [ 1745.624800] Hardware name: PRO-MNU65930231 > PRO-NME69559126/BRD-PRO55212588, BIOS 0.51e 05/08/2017 > [ 1745.626673] Workqueue: pciehp-3 pciehp_power_thread > [ 1745.628566] task: ffff881fe50dd880 ti: ffff881fe88e4000 task.ti: > ffff881fe88e4000 > [ 1745.630530] RIP: 0010:[<ffffffffa054c750>] [<ffffffffa054c750>] > scsih_remove+0x20/0x2d0 [mpt3sas] > [ 1745.632577] RSP: 0018:ffff881fe88e7c98 EFLAGS: 00010292 > [ 1745.634639] RAX: 0000000000000001 RBX: ffff881feef5c000 RCX: 0000000000000000 > [ 1745.636718] RDX: 0000000000000000 RSI: 0000000000000202 RDI: ffff881feef5c000 > [ 1745.638832] RBP: ffff881fe88e7cc8 R08: 0000000000000000 R09: 0000000180220002 > [ 1745.640972] R10: 00000000eaf4a401 R11: ffffea007fabd280 R12: 0000000000000000 > [ 1745.643136] R13: ffffffffa0576020 R14: ffff881fe9af8240 R15: 0000000000000000 > [ 1745.645320] FS: 0000000000000000(0000) GS:ffff881ffde00000(0000) > knlGS:0000000000000000 > [ 1745.647572] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 > [ 1745.649833] CR2: 0000000000000a98 CR3: 0000001fe76df000 CR4: 00000000003406f0 > [ 1745.652147] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 > [ 1745.654476] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 > [ 1745.656825] Stack: > [ 1745.659138] ffff881fe88e7cc8 ffff881feef5c098 ffff881feef5c000 > ffffffffa0576020 > [ 1745.661562] ffff881fe9af8240 0000000000000000 ffff881fe88e7cf0 > ffffffff8137f9d9 > [ 1745.663990] ffff881feef5c098 ffffffffa0576088 ffff881feef5c000 > ffff881fe88e7d10 > [ 1745.666428] Call Trace: > [ 1745.668830] [<ffffffff8137f9d9>] pci_device_remove+0x39/0xc0 > [ 1745.671256] [<ffffffff8147db36>] __device_release_driver+0x96/0x130 > [ 1745.673664] [<ffffffff8147dbf3>] device_release_driver+0x23/0x30 > [ 1745.676071] [<ffffffff8137851c>] pci_stop_bus_device+0x8c/0xa0 > [ 1745.678485] [<ffffffff81378612>] pci_stop_and_remove_bus_device+0x12/0x20 > [ 1745.680909] [<ffffffff81392eea>] pciehp_unconfigure_device+0xaa/0x1b0 > [ 1745.683331] [<ffffffff813929e2>] pciehp_disable_slot+0x52/0xd0 > [ 1745.685767] [<ffffffff81392aed>] pciehp_power_thread+0x8d/0xb0 > [ 1745.688210] [<ffffffff8109728f>] process_one_work+0x14f/0x400 > [ 1745.690633] [<ffffffff81097b04>] worker_thread+0x114/0x470 > [ 1745.693080] [<ffffffff810979f0>] ? rescuer_thread+0x310/0x310 > [ 1745.695518] [<ffffffff8109d648>] kthread+0xd8/0xf0 > [ 1745.697936] [<ffffffff8109d570>] ? kthread_park+0x60/0x60 > [ 1745.700359] [<ffffffff816f854f>] ret_from_fork+0x3f/0x70 > [ 1745.702747] [<ffffffff8109d570>] ? kthread_park+0x60/0x60
Hi Bjorn, We tried to recreate the issue with latest kernel multiple times.(Without including this patch,) but we were not able to recreate it. On 4.4 kernel we hit this issue time long back. So we are not sure that this issue is really fixed with latest kernel or not. We prefer to keep this patch as this only adds sanity check. We are also okay, if you suggest us to completely remove this patch. Thanks, Suganath Prabu On Mon, Oct 1, 2018 at 7:11 PM Bjorn Helgaas <helgaas@kernel.org> wrote: > > [+cc LKML] > > On Mon, Oct 01, 2018 at 12:57:01PM +0530, Suganath Prabu Subramani wrote: > > On Thu, Sep 27, 2018 at 2:39 AM Bjorn Helgaas <helgaas@kernel.org> wrote: > > > On Wed, Sep 26, 2018 at 09:52:36AM +0530, Suganath Prabu S wrote: > > > > The code for getting shost and IOC is redundant so > > > > moved that to function "scsih_get_shost_and_ioc". > > > > Also checks for NULL are added to IOC and shost. > > > > > > > > Signed-off-by: Suganath Prabu S <suganath-prabu.subramani@broadcom.com> > > > > --- > > > > drivers/scsi/mpt3sas/mpt3sas_scsih.c | 98 ++++++++++++++++++++++++++++++------ > > > > 1 file changed, 82 insertions(+), 16 deletions(-) > > > > > > > > diff --git a/drivers/scsi/mpt3sas/mpt3sas_scsih.c b/drivers/scsi/mpt3sas/mpt3sas_scsih.c > > > > index 566a550..f6e92eb 100644 > > > > --- a/drivers/scsi/mpt3sas/mpt3sas_scsih.c > > > > +++ b/drivers/scsi/mpt3sas/mpt3sas_scsih.c > > > > @@ -9809,6 +9809,35 @@ _scsih_ir_shutdown(struct MPT3SAS_ADAPTER *ioc) > > > > } > > > > > > > > /** > > > > + * _scsih_get_shost_and_ioc - get shost and ioc > > > > + * and verify whether they are NULL or not > > > > + * @pdev: PCI device struct > > > > + * @shost: address of scsi host pointer > > > > + * @ioc: address of HBA adapter pointer > > > > + * > > > > + * Return zero if *shost and *ioc are not NULL otherwise return error number. > > > > + */ > > > > +static int > > > > +_scsih_get_shost_and_ioc(struct pci_dev *pdev, > > > > + struct Scsi_Host **shost, struct MPT3SAS_ADAPTER **ioc) > > > > +{ > > > > + *shost = pci_get_drvdata(pdev); > > > > + if (*shost == NULL) { > > > > + dev_err(&pdev->dev, "pdev's driver data is null\n"); > > > > + return -ENXIO; > > > > + } > > > > + > > > > + *ioc = shost_priv(*shost); > > > > + if (*ioc == NULL) { > > > > + dev_err(&pdev->dev, "shost's private data is null\n"); > > > > + return -ENXIO; > > > > > > I think it's better to omit NULL pointer checks like these because > > > there should not be a path where we can execute this code when these > > > pointers are NULL. > > > > > > If there *is* such a path, I think that's a serious bug and it's > > > better to oops when we try to dereference the NULL pointer. If we > > > just return an error code, it's likely the bug will be ignored and > > > never fixed. > > > > > We have added the ioc and shost checks, because of kernel panic in > > below scenario. > > Have 3 HBA's in system and perform below operation. > > 1) Run “poweroff”. > > 2) Immediate hot unplug HBA. > > We have observed, kernel's shutdown process has removed all the 3 > > HBA devices smoothly, and also user have unplugged the HBA device > > during this time. PCI subsystem's hot-plug thread is also trying to > > remove the hot plugged PCI device which is already removed/cleaned > > by the shutdown process. (Which is not expected for the already > > removed device) Due to this kernel panic is observed. And we are not > > sure whether it has to fixed from pciehp layer, so we added sanity > > checks in driver. > > This is a great example. scsih_remove() is the mpt3sas_driver.remove > method. It sounds like it's getting called twice for the same HBA. > I think that's a PCI core defect and we should fix it there instead of > trying to work around it in every driver. > > The trace below is from v4.4.55. Can you reproduce the same problem > with a current tree, e.g., v4.19-rc? There have been many changes > since v4.4 that may have fixed this problem. > > > [ 1745.605510] BUG: unable to handle kernel NULL pointer dereference > > at 0000000000000a98 > > [ 1745.606554] IP: [<ffffffffa054c750>] scsih_remove+0x20/0x2d0 [mpt3sas] > > [ 1745.607609] PGD 0 > > [ 1745.608621] Oops: 0000 [#1] SMP > > [ 1745.622989] CPU: 0 PID: 668 Comm: kworker/0:2 Tainted: G > > O 4.4.55-1.el7.elrepo.x86_64 #1 > > [ 1745.624800] Hardware name: PRO-MNU65930231 > > PRO-NME69559126/BRD-PRO55212588, BIOS 0.51e 05/08/2017 > > [ 1745.626673] Workqueue: pciehp-3 pciehp_power_thread > > [ 1745.628566] task: ffff881fe50dd880 ti: ffff881fe88e4000 task.ti: > > ffff881fe88e4000 > > [ 1745.630530] RIP: 0010:[<ffffffffa054c750>] [<ffffffffa054c750>] > > scsih_remove+0x20/0x2d0 [mpt3sas] > > [ 1745.632577] RSP: 0018:ffff881fe88e7c98 EFLAGS: 00010292 > > [ 1745.634639] RAX: 0000000000000001 RBX: ffff881feef5c000 RCX: 0000000000000000 > > [ 1745.636718] RDX: 0000000000000000 RSI: 0000000000000202 RDI: ffff881feef5c000 > > [ 1745.638832] RBP: ffff881fe88e7cc8 R08: 0000000000000000 R09: 0000000180220002 > > [ 1745.640972] R10: 00000000eaf4a401 R11: ffffea007fabd280 R12: 0000000000000000 > > [ 1745.643136] R13: ffffffffa0576020 R14: ffff881fe9af8240 R15: 0000000000000000 > > [ 1745.645320] FS: 0000000000000000(0000) GS:ffff881ffde00000(0000) > > knlGS:0000000000000000 > > [ 1745.647572] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 > > [ 1745.649833] CR2: 0000000000000a98 CR3: 0000001fe76df000 CR4: 00000000003406f0 > > [ 1745.652147] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 > > [ 1745.654476] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 > > [ 1745.656825] Stack: > > [ 1745.659138] ffff881fe88e7cc8 ffff881feef5c098 ffff881feef5c000 > > ffffffffa0576020 > > [ 1745.661562] ffff881fe9af8240 0000000000000000 ffff881fe88e7cf0 > > ffffffff8137f9d9 > > [ 1745.663990] ffff881feef5c098 ffffffffa0576088 ffff881feef5c000 > > ffff881fe88e7d10 > > [ 1745.666428] Call Trace: > > [ 1745.668830] [<ffffffff8137f9d9>] pci_device_remove+0x39/0xc0 > > [ 1745.671256] [<ffffffff8147db36>] __device_release_driver+0x96/0x130 > > [ 1745.673664] [<ffffffff8147dbf3>] device_release_driver+0x23/0x30 > > [ 1745.676071] [<ffffffff8137851c>] pci_stop_bus_device+0x8c/0xa0 > > [ 1745.678485] [<ffffffff81378612>] pci_stop_and_remove_bus_device+0x12/0x20 > > [ 1745.680909] [<ffffffff81392eea>] pciehp_unconfigure_device+0xaa/0x1b0 > > [ 1745.683331] [<ffffffff813929e2>] pciehp_disable_slot+0x52/0xd0 > > [ 1745.685767] [<ffffffff81392aed>] pciehp_power_thread+0x8d/0xb0 > > [ 1745.688210] [<ffffffff8109728f>] process_one_work+0x14f/0x400 > > [ 1745.690633] [<ffffffff81097b04>] worker_thread+0x114/0x470 > > [ 1745.693080] [<ffffffff810979f0>] ? rescuer_thread+0x310/0x310 > > [ 1745.695518] [<ffffffff8109d648>] kthread+0xd8/0xf0 > > [ 1745.697936] [<ffffffff8109d570>] ? kthread_park+0x60/0x60 > > [ 1745.700359] [<ffffffff816f854f>] ret_from_fork+0x3f/0x70 > > [ 1745.702747] [<ffffffff8109d570>] ? kthread_park+0x60/0x60
diff --git a/drivers/scsi/mpt3sas/mpt3sas_scsih.c b/drivers/scsi/mpt3sas/mpt3sas_scsih.c index 566a550..f6e92eb 100644 --- a/drivers/scsi/mpt3sas/mpt3sas_scsih.c +++ b/drivers/scsi/mpt3sas/mpt3sas_scsih.c @@ -9809,6 +9809,35 @@ _scsih_ir_shutdown(struct MPT3SAS_ADAPTER *ioc) } /** + * _scsih_get_shost_and_ioc - get shost and ioc + * and verify whether they are NULL or not + * @pdev: PCI device struct + * @shost: address of scsi host pointer + * @ioc: address of HBA adapter pointer + * + * Return zero if *shost and *ioc are not NULL otherwise return error number. + */ +static int +_scsih_get_shost_and_ioc(struct pci_dev *pdev, + struct Scsi_Host **shost, struct MPT3SAS_ADAPTER **ioc) +{ + *shost = pci_get_drvdata(pdev); + if (*shost == NULL) { + dev_err(&pdev->dev, "pdev's driver data is null\n"); + return -ENXIO; + } + + *ioc = shost_priv(*shost); + if (*ioc == NULL) { + dev_err(&pdev->dev, "shost's private data is null\n"); + return -ENXIO; + } + + return 0; +} + + +/** * scsih_remove - detach and remove add host * @pdev: PCI device struct * @@ -9816,8 +9845,8 @@ _scsih_ir_shutdown(struct MPT3SAS_ADAPTER *ioc) */ static void scsih_remove(struct pci_dev *pdev) { - struct Scsi_Host *shost = pci_get_drvdata(pdev); - struct MPT3SAS_ADAPTER *ioc = shost_priv(shost); + struct Scsi_Host *shost = NULL; + struct MPT3SAS_ADAPTER *ioc = NULL; struct _sas_port *mpt3sas_port, *next_port; struct _raid_device *raid_device, *next; struct MPT3SAS_TARGET *sas_target_priv_data; @@ -9825,6 +9854,10 @@ static void scsih_remove(struct pci_dev *pdev) struct workqueue_struct *wq; unsigned long flags; + if (_scsih_get_shost_and_ioc(pdev, &shost, &ioc)) { + dev_err(&pdev->dev, "unable to remove device\n"); + return; + } ioc->remove_host = 1; mpt3sas_wait_for_commands_to_complete(ioc); @@ -9898,11 +9931,16 @@ static void scsih_remove(struct pci_dev *pdev) static void scsih_shutdown(struct pci_dev *pdev) { - struct Scsi_Host *shost = pci_get_drvdata(pdev); - struct MPT3SAS_ADAPTER *ioc = shost_priv(shost); + struct Scsi_Host *shost = NULL; + struct MPT3SAS_ADAPTER *ioc = NULL; struct workqueue_struct *wq; unsigned long flags; + if (_scsih_get_shost_and_ioc(pdev, &shost, &ioc)) { + dev_err(&pdev->dev, "unable to shutdown device\n"); + return; + } + ioc->remove_host = 1; mpt3sas_wait_for_commands_to_complete(ioc); @@ -10727,10 +10765,16 @@ out_add_shost_fail: static int scsih_suspend(struct pci_dev *pdev, pm_message_t state) { - struct Scsi_Host *shost = pci_get_drvdata(pdev); - struct MPT3SAS_ADAPTER *ioc = shost_priv(shost); + struct Scsi_Host *shost = NULL; + struct MPT3SAS_ADAPTER *ioc = NULL; pci_power_t device_state; + int rc; + rc = _scsih_get_shost_and_ioc(pdev, &shost, &ioc); + if (rc) { + dev_err(&pdev->dev, "unable to suspend device\n"); + return rc; + } mpt3sas_base_stop_watchdog(ioc); flush_scheduled_work(); scsi_block_requests(shost); @@ -10754,11 +10798,17 @@ scsih_suspend(struct pci_dev *pdev, pm_message_t state) static int scsih_resume(struct pci_dev *pdev) { - struct Scsi_Host *shost = pci_get_drvdata(pdev); - struct MPT3SAS_ADAPTER *ioc = shost_priv(shost); + struct Scsi_Host *shost = NULL; + struct MPT3SAS_ADAPTER *ioc = NULL; pci_power_t device_state = pdev->current_state; int r; + r = _scsih_get_shost_and_ioc(pdev, &shost, &ioc); + if (r) { + dev_err(&pdev->dev, "unable to resume device\n"); + return r; + } + pr_info(MPT3SAS_FMT "pdev=0x%p, slot=%s, previous operating state [D%d]\n", ioc->name, pdev, pci_name(pdev), device_state); @@ -10790,9 +10840,13 @@ scsih_resume(struct pci_dev *pdev) static pci_ers_result_t scsih_pci_error_detected(struct pci_dev *pdev, pci_channel_state_t state) { - struct Scsi_Host *shost = pci_get_drvdata(pdev); - struct MPT3SAS_ADAPTER *ioc = shost_priv(shost); + struct Scsi_Host *shost = NULL; + struct MPT3SAS_ADAPTER *ioc = NULL; + if (_scsih_get_shost_and_ioc(pdev, &shost, &ioc)) { + dev_err(&pdev->dev, "device unavailable\n"); + return PCI_ERS_RESULT_DISCONNECT; + } pr_info(MPT3SAS_FMT "PCI error: detected callback, state(%d)!!\n", ioc->name, state); @@ -10827,10 +10881,14 @@ scsih_pci_error_detected(struct pci_dev *pdev, pci_channel_state_t state) static pci_ers_result_t scsih_pci_slot_reset(struct pci_dev *pdev) { - struct Scsi_Host *shost = pci_get_drvdata(pdev); - struct MPT3SAS_ADAPTER *ioc = shost_priv(shost); + struct Scsi_Host *shost = NULL; + struct MPT3SAS_ADAPTER *ioc = NULL; int rc; + if (_scsih_get_shost_and_ioc(pdev, &shost, &ioc)) { + dev_err(&pdev->dev, "unable to perform slot reset\n"); + return PCI_ERS_RESULT_DISCONNECT; + } pr_info(MPT3SAS_FMT "PCI error: slot reset callback!!\n", ioc->name); @@ -10863,9 +10921,13 @@ scsih_pci_slot_reset(struct pci_dev *pdev) static void scsih_pci_resume(struct pci_dev *pdev) { - struct Scsi_Host *shost = pci_get_drvdata(pdev); - struct MPT3SAS_ADAPTER *ioc = shost_priv(shost); + struct Scsi_Host *shost = NULL; + struct MPT3SAS_ADAPTER *ioc = NULL; + if (_scsih_get_shost_and_ioc(pdev, &shost, &ioc)) { + dev_err(&pdev->dev, "unable to resume device\n"); + return; + } pr_info(MPT3SAS_FMT "PCI error: resume callback!!\n", ioc->name); pci_cleanup_aer_uncorrect_error_status(pdev); @@ -10880,9 +10942,13 @@ scsih_pci_resume(struct pci_dev *pdev) static pci_ers_result_t scsih_pci_mmio_enabled(struct pci_dev *pdev) { - struct Scsi_Host *shost = pci_get_drvdata(pdev); - struct MPT3SAS_ADAPTER *ioc = shost_priv(shost); + struct Scsi_Host *shost = NULL; + struct MPT3SAS_ADAPTER *ioc = NULL; + if (_scsih_get_shost_and_ioc(pdev, &shost, &ioc)) { + dev_err(&pdev->dev, "unable to enable mmio\n"); + return PCI_ERS_RESULT_DISCONNECT; + } pr_info(MPT3SAS_FMT "PCI error: mmio enabled callback!!\n", ioc->name);
The code for getting shost and IOC is redundant so moved that to function "scsih_get_shost_and_ioc". Also checks for NULL are added to IOC and shost. Signed-off-by: Suganath Prabu S <suganath-prabu.subramani@broadcom.com> --- drivers/scsi/mpt3sas/mpt3sas_scsih.c | 98 ++++++++++++++++++++++++++++++------ 1 file changed, 82 insertions(+), 16 deletions(-)