diff mbox

pci/shpchp: no claim on pcie port device

Message ID 1528785733-19442-1-git-send-email-kernelfans@gmail.com (mailing list archive)
State New, archived
Delegated to: Bjorn Helgaas
Headers show

Commit Message

Pingfan Liu June 12, 2018, 6:42 a.m. UTC
The Linux Device Driver Model allows a physical device to be handled
by only a single driver. But at present, both shpchp and portdrv_pci
claim PCI_CLASS_BRIDGE_PCI, and touch devices_kset. This causes a
few problems, one is the wrong shutdown seq of devices, owing to the
broken devices_kset.

I hit this bug on a Power9 machine, when "kexec -e", and see a ata-disk
behind a bridge can not write back buffer in flight due to the former
shutdown of the bridge which clears the BusMaster bit.

Note the device involved:
0004:00:00.0 PCI bridge: IBM Device 04c1 (prog-if 00 [Normal decode])
        Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR+ FastB2B- DisINTx-
        Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx-
        Latency: 0
        NUMA node: 0
        Bus: primary=00, secondary=01, subordinate=12, sec-latency=0
        I/O behind bridge: 00000000-00000fff
        Memory behind bridge: 80000000-ffefffff
        Prefetchable memory behind bridge: 0006024000000000-0006027f7fffffff
        Secondary status: 66MHz- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- <SERR- <PERR-
        BridgeCtl: Parity- SERR+ NoISA- VGA- MAbort- >Reset- FastB2B-
                PriDiscTmr- SecDiscTmr- DiscTmrStat- DiscTmrSERREn-
        Capabilities: [40] Power Management version 3
                Flags: PMEClk- DSI- D1- D2- AuxCurrent=0mA PME(D0+,D1-,D2-,D3hot+,D3cold+)
                Status: D0 NoSoftRst- PME-Enable- DSel=0 DScale=0 PME-
        Capabilities: [48] Express (v2) Root Port (Slot-), MSI 00
                DevCap: MaxPayload 512 bytes, PhantFunc 0
                        ExtTag- RBE+
                DevCtl: Report errors: Correctable- Non-Fatal- Fatal- Unsupported-
                        RlxdOrd- ExtTag- PhantFunc- AuxPwr- NoSnoop-
                        MaxPayload 256 bytes, MaxReadReq 128 bytes
                DevSta: CorrErr- UncorrErr- FatalErr- UnsuppReq- AuxPwr- TransPend-
                LnkCap: Port #0, Speed 8GT/s, Width x8, ASPM not supported, Exit Latency L0s <64ns, L1 <1us
                        ClockPM- Surprise- LLActRep+ BwNot+ ASPMOptComp-
                LnkCtl: ASPM Disabled; RCB 128 bytes Disabled- CommClk-
                        ExtSynch- ClockPM- AutWidDis- BWInt- AutBWInt-
                LnkSta: Speed 8GT/s, Width x4, TrErr- Train- SlotClk- DLActive+ BWMgmt- ABWMgmt+
                RootCtl: ErrCorrectable- ErrNon-Fatal- ErrFatal- PMEIntEna- CRSVisible-
                RootCap: CRSVisible-
                RootSta: PME ReqID 0000, PMEStatus- PMEPending-
                DevCap2: Completion Timeout: Range ABCD, TimeoutDis+, LTR-, OBFF Not Supported ARIFwd+
                DevCtl2: Completion Timeout: 16ms to 55ms, TimeoutDis+, LTR-, OBFF Disabled ARIFwd+
                LnkCtl2: Target Link Speed: 8GT/s, EnterCompliance- SpeedDis-
                         Transmit Margin: Normal Operating Range, EnterModifiedCompliance- ComplianceSOS-
                         Compliance De-emphasis: -6dB
                LnkSta2: Current De-emphasis Level: -3.5dB, EqualizationComplete+, EqualizationPhase1+
                         EqualizationPhase2+, EqualizationPhase3+, LinkEqualizationRequest-
        Capabilities: [100 v1] Advanced Error Reporting
                UESta:  DLP- SDES- TLP- FCP- CmpltTO- CmpltAbrt- UnxCmplt- RxOF- MalfTLP- ECRC- UnsupReq- ACSViol-
                UEMsk:  DLP- SDES- TLP+ FCP- CmpltTO+ CmpltAbrt+ UnxCmplt- RxOF- MalfTLP- ECRC+ UnsupReq- ACSViol-
                UESvrt: DLP- SDES- TLP- FCP- CmpltTO- CmpltAbrt- UnxCmplt- RxOF- MalfTLP- ECRC- UnsupReq- ACSViol-
                CESta:  RxErr- BadTLP- BadDLLP- Rollover- Timeout- NonFatalErr-
                CEMsk:  RxErr- BadTLP- BadDLLP- Rollover- Timeout- NonFatalErr-
                AERCap: First Error Pointer: 00, GenCap+ CGenEn+ ChkCap+ ChkEn+
        Capabilities: [148 v1] #19

Signed-off-by: Pingfan Liu <kernelfans@gmail.com>
---
 drivers/pci/hotplug/shpchp_core.c | 7 +++++++
 1 file changed, 7 insertions(+)

Comments

Christoph Hellwig June 12, 2018, 6:57 a.m. UTC | #1
On Tue, Jun 12, 2018 at 02:42:13PM +0800, Pingfan Liu wrote:
> The Linux Device Driver Model allows a physical device to be handled
> by only a single driver. But at present, both shpchp and portdrv_pci
> claim PCI_CLASS_BRIDGE_PCI, and touch devices_kset. This causes a
> few problems, one is the wrong shutdown seq of devices, owing to the
> broken devices_kset.

How can they both touch devices_kset?  Once one driver has claimed the
device it should not be available to others.

> +	/* do not claim pcie port device */
> +	if (pci_is_pcie(dev) &&
> +	    ((pci_pcie_type(dev) == PCI_EXP_TYPE_ROOT_PORT) ||
> +	     (pci_pcie_type(dev) == PCI_EXP_TYPE_UPSTREAM) ||
> +	     (pci_pcie_type(dev) == PCI_EXP_TYPE_DOWNSTREAM)))
> +		return -ENODEV;

No need for the inner braces.
kernel test robot June 12, 2018, 6:57 a.m. UTC | #2
Hi Pingfan,

Thank you for the patch! Yet something to improve:

[auto build test ERROR on v4.17]
[cannot apply to pci/next next-20180612]
[if your patch is applied to the wrong git tree, please drop us a note to help improve the system]

url:    https://github.com/0day-ci/linux/commits/Pingfan-Liu/pci-shpchp-no-claim-on-pcie-port-device/20180612-144419
config: i386-randconfig-x009-201823 (attached as .config)
compiler: gcc-7 (Debian 7.3.0-16) 7.3.0
reproduce:
        # save the attached .config to linux build tree
        make ARCH=i386 

All errors (new ones prefixed by >>):

   drivers/pci/hotplug/shpchp_core.c: In function 'shpc_probe':
>> drivers/pci/hotplug/shpchp_core.c:291:18: error: 'dev' undeclared (first use in this function); did you mean 'pdev'?
     if (pci_is_pcie(dev) &&
                     ^~~
                     pdev
   drivers/pci/hotplug/shpchp_core.c:291:18: note: each undeclared identifier is reported only once for each function it appears in

vim +291 drivers/pci/hotplug/shpchp_core.c

   284	
   285	static int shpc_probe(struct pci_dev *pdev, const struct pci_device_id *ent)
   286	{
   287		int rc;
   288		struct controller *ctrl;
   289	
   290		/* do not claim pcie port device */
 > 291		if (pci_is_pcie(dev) &&
   292		    ((pci_pcie_type(dev) == PCI_EXP_TYPE_ROOT_PORT) ||
   293		     (pci_pcie_type(dev) == PCI_EXP_TYPE_UPSTREAM) ||
   294		     (pci_pcie_type(dev) == PCI_EXP_TYPE_DOWNSTREAM)))
   295			return -ENODEV;
   296	
   297		if (!is_shpc_capable(pdev))
   298			return -ENODEV;
   299	
   300		ctrl = kzalloc(sizeof(*ctrl), GFP_KERNEL);
   301		if (!ctrl)
   302			goto err_out_none;
   303	
   304		INIT_LIST_HEAD(&ctrl->slot_list);
   305	
   306		rc = shpc_init(ctrl, pdev);
   307		if (rc) {
   308			ctrl_dbg(ctrl, "Controller initialization failed\n");
   309			goto err_out_free_ctrl;
   310		}
   311	
   312		pci_set_drvdata(pdev, ctrl);
   313	
   314		/* Setup the slot information structures */
   315		rc = init_slots(ctrl);
   316		if (rc) {
   317			ctrl_err(ctrl, "Slot initialization failed\n");
   318			goto err_out_release_ctlr;
   319		}
   320	
   321		rc = shpchp_create_ctrl_files(ctrl);
   322		if (rc)
   323			goto err_cleanup_slots;
   324	
   325		return 0;
   326	
   327	err_cleanup_slots:
   328		cleanup_slots(ctrl);
   329	err_out_release_ctlr:
   330		ctrl->hpc_ops->release_ctlr(ctrl);
   331	err_out_free_ctrl:
   332		kfree(ctrl);
   333	err_out_none:
   334		return -ENODEV;
   335	}
   336	

---
0-DAY kernel test infrastructure                Open Source Technology Center
https://lists.01.org/pipermail/kbuild-all                   Intel Corporation
Pingfan Liu June 12, 2018, 7:14 a.m. UTC | #3
On Tue, Jun 12, 2018 at 2:42 PM Pingfan Liu <kernelfans@gmail.com> wrote:
>
> The Linux Device Driver Model allows a physical device to be handled
> by only a single driver. But at present, both shpchp and portdrv_pci
> claim PCI_CLASS_BRIDGE_PCI, and touch devices_kset. This causes a
> few problems, one is the wrong shutdown seq of devices, owing to the
> broken devices_kset.
>
> I hit this bug on a Power9 machine, when "kexec -e", and see a ata-disk
> behind a bridge can not write back buffer in flight due to the former
> shutdown of the bridge which clears the BusMaster bit.
>
> Note the device involved:
> 0004:00:00.0 PCI bridge: IBM Device 04c1 (prog-if 00 [Normal decode])
>         Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR+ FastB2B- DisINTx-
>         Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx-
>         Latency: 0
>         NUMA node: 0
>         Bus: primary=00, secondary=01, subordinate=12, sec-latency=0
>         I/O behind bridge: 00000000-00000fff
>         Memory behind bridge: 80000000-ffefffff
>         Prefetchable memory behind bridge: 0006024000000000-0006027f7fffffff
>         Secondary status: 66MHz- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- <SERR- <PERR-
>         BridgeCtl: Parity- SERR+ NoISA- VGA- MAbort- >Reset- FastB2B-
>                 PriDiscTmr- SecDiscTmr- DiscTmrStat- DiscTmrSERREn-
>         Capabilities: [40] Power Management version 3
>                 Flags: PMEClk- DSI- D1- D2- AuxCurrent=0mA PME(D0+,D1-,D2-,D3hot+,D3cold+)
>                 Status: D0 NoSoftRst- PME-Enable- DSel=0 DScale=0 PME-
>         Capabilities: [48] Express (v2) Root Port (Slot-), MSI 00
>                 DevCap: MaxPayload 512 bytes, PhantFunc 0
>                         ExtTag- RBE+
>                 DevCtl: Report errors: Correctable- Non-Fatal- Fatal- Unsupported-
>                         RlxdOrd- ExtTag- PhantFunc- AuxPwr- NoSnoop-
>                         MaxPayload 256 bytes, MaxReadReq 128 bytes
>                 DevSta: CorrErr- UncorrErr- FatalErr- UnsuppReq- AuxPwr- TransPend-
>                 LnkCap: Port #0, Speed 8GT/s, Width x8, ASPM not supported, Exit Latency L0s <64ns, L1 <1us
>                         ClockPM- Surprise- LLActRep+ BwNot+ ASPMOptComp-
>                 LnkCtl: ASPM Disabled; RCB 128 bytes Disabled- CommClk-
>                         ExtSynch- ClockPM- AutWidDis- BWInt- AutBWInt-
>                 LnkSta: Speed 8GT/s, Width x4, TrErr- Train- SlotClk- DLActive+ BWMgmt- ABWMgmt+
>                 RootCtl: ErrCorrectable- ErrNon-Fatal- ErrFatal- PMEIntEna- CRSVisible-
>                 RootCap: CRSVisible-
>                 RootSta: PME ReqID 0000, PMEStatus- PMEPending-
>                 DevCap2: Completion Timeout: Range ABCD, TimeoutDis+, LTR-, OBFF Not Supported ARIFwd+
>                 DevCtl2: Completion Timeout: 16ms to 55ms, TimeoutDis+, LTR-, OBFF Disabled ARIFwd+
>                 LnkCtl2: Target Link Speed: 8GT/s, EnterCompliance- SpeedDis-
>                          Transmit Margin: Normal Operating Range, EnterModifiedCompliance- ComplianceSOS-
>                          Compliance De-emphasis: -6dB
>                 LnkSta2: Current De-emphasis Level: -3.5dB, EqualizationComplete+, EqualizationPhase1+
>                          EqualizationPhase2+, EqualizationPhase3+, LinkEqualizationRequest-
>         Capabilities: [100 v1] Advanced Error Reporting
>                 UESta:  DLP- SDES- TLP- FCP- CmpltTO- CmpltAbrt- UnxCmplt- RxOF- MalfTLP- ECRC- UnsupReq- ACSViol-
>                 UEMsk:  DLP- SDES- TLP+ FCP- CmpltTO+ CmpltAbrt+ UnxCmplt- RxOF- MalfTLP- ECRC+ UnsupReq- ACSViol-
>                 UESvrt: DLP- SDES- TLP- FCP- CmpltTO- CmpltAbrt- UnxCmplt- RxOF- MalfTLP- ECRC- UnsupReq- ACSViol-
>                 CESta:  RxErr- BadTLP- BadDLLP- Rollover- Timeout- NonFatalErr-
>                 CEMsk:  RxErr- BadTLP- BadDLLP- Rollover- Timeout- NonFatalErr-
>                 AERCap: First Error Pointer: 00, GenCap+ CGenEn+ ChkCap+ ChkEn+
>         Capabilities: [148 v1] #19
>
> Signed-off-by: Pingfan Liu <kernelfans@gmail.com>
> ---
>  drivers/pci/hotplug/shpchp_core.c | 7 +++++++
>  1 file changed, 7 insertions(+)
>
> diff --git a/drivers/pci/hotplug/shpchp_core.c b/drivers/pci/hotplug/shpchp_core.c
> index 1f0f969..2cd8e19 100644
> --- a/drivers/pci/hotplug/shpchp_core.c
> +++ b/drivers/pci/hotplug/shpchp_core.c
> @@ -287,6 +287,13 @@ static int shpc_probe(struct pci_dev *pdev, const struct pci_device_id *ent)
>         int rc;
>         struct controller *ctrl;
>
> +       /* do not claim pcie port device */
> +       if (pci_is_pcie(dev) &&
> +           ((pci_pcie_type(dev) == PCI_EXP_TYPE_ROOT_PORT) ||
> +            (pci_pcie_type(dev) == PCI_EXP_TYPE_UPSTREAM) ||
> +            (pci_pcie_type(dev) == PCI_EXP_TYPE_DOWNSTREAM)))
> +               return -ENODEV;
> +
Sorry, should s/dev/pdev/.  I will try to loan a power9 to test it.
NACK it, I will send out v2.

Thanks,
Pingfan
>         if (!is_shpc_capable(pdev))
>                 return -ENODEV;
>
> --
> 2.7.4
>
Pingfan Liu June 12, 2018, 7:20 a.m. UTC | #4
On Tue, Jun 12, 2018 at 2:57 PM Christoph Hellwig <hch@infradead.org> wrote:
>
> On Tue, Jun 12, 2018 at 02:42:13PM +0800, Pingfan Liu wrote:
> > The Linux Device Driver Model allows a physical device to be handled
> > by only a single driver. But at present, both shpchp and portdrv_pci
> > claim PCI_CLASS_BRIDGE_PCI, and touch devices_kset. This causes a
> > few problems, one is the wrong shutdown seq of devices, owing to the
> > broken devices_kset.
>
> How can they both touch devices_kset?  Once one driver has claimed the
> device it should not be available to others.
>
Unfortunately, it could be.  There are two code path for do_one_initcall
kernel_init->..->do_one_initcall->pcie_portdrv_init
And
load_module->do_init_module->do_one_initcall->shpcd_init
> > +     /* do not claim pcie port device */
> > +     if (pci_is_pcie(dev) &&
> > +         ((pci_pcie_type(dev) == PCI_EXP_TYPE_ROOT_PORT) ||
> > +          (pci_pcie_type(dev) == PCI_EXP_TYPE_UPSTREAM) ||
> > +          (pci_pcie_type(dev) == PCI_EXP_TYPE_DOWNSTREAM)))
> > +             return -ENODEV;
>
> No need for the inner braces.

OK.

Thanks,
Pingfan
kernel test robot June 12, 2018, 7:37 a.m. UTC | #5
Hi Pingfan,

Thank you for the patch! Yet something to improve:

[auto build test ERROR on v4.17]
[cannot apply to pci/next next-20180612]
[if your patch is applied to the wrong git tree, please drop us a note to help improve the system]

url:    https://github.com/0day-ci/linux/commits/Pingfan-Liu/pci-shpchp-no-claim-on-pcie-port-device/20180612-144419
config: i386-randconfig-a1-201823 (attached as .config)
compiler: gcc-4.9 (Debian 4.9.4-2) 4.9.4
reproduce:
        # save the attached .config to linux build tree
        make ARCH=i386 

All errors (new ones prefixed by >>):

   drivers/pci/hotplug/shpchp_core.c: In function 'shpc_probe':
>> drivers/pci/hotplug/shpchp_core.c:291:18: error: 'dev' undeclared (first use in this function)
     if (pci_is_pcie(dev) &&
                     ^
   drivers/pci/hotplug/shpchp_core.c:291:18: note: each undeclared identifier is reported only once for each function it appears in

vim +/dev +291 drivers/pci/hotplug/shpchp_core.c

   284	
   285	static int shpc_probe(struct pci_dev *pdev, const struct pci_device_id *ent)
   286	{
   287		int rc;
   288		struct controller *ctrl;
   289	
   290		/* do not claim pcie port device */
 > 291		if (pci_is_pcie(dev) &&
   292		    ((pci_pcie_type(dev) == PCI_EXP_TYPE_ROOT_PORT) ||
   293		     (pci_pcie_type(dev) == PCI_EXP_TYPE_UPSTREAM) ||
   294		     (pci_pcie_type(dev) == PCI_EXP_TYPE_DOWNSTREAM)))
   295			return -ENODEV;
   296	
   297		if (!is_shpc_capable(pdev))
   298			return -ENODEV;
   299	
   300		ctrl = kzalloc(sizeof(*ctrl), GFP_KERNEL);
   301		if (!ctrl)
   302			goto err_out_none;
   303	
   304		INIT_LIST_HEAD(&ctrl->slot_list);
   305	
   306		rc = shpc_init(ctrl, pdev);
   307		if (rc) {
   308			ctrl_dbg(ctrl, "Controller initialization failed\n");
   309			goto err_out_free_ctrl;
   310		}
   311	
   312		pci_set_drvdata(pdev, ctrl);
   313	
   314		/* Setup the slot information structures */
   315		rc = init_slots(ctrl);
   316		if (rc) {
   317			ctrl_err(ctrl, "Slot initialization failed\n");
   318			goto err_out_release_ctlr;
   319		}
   320	
   321		rc = shpchp_create_ctrl_files(ctrl);
   322		if (rc)
   323			goto err_cleanup_slots;
   324	
   325		return 0;
   326	
   327	err_cleanup_slots:
   328		cleanup_slots(ctrl);
   329	err_out_release_ctlr:
   330		ctrl->hpc_ops->release_ctlr(ctrl);
   331	err_out_free_ctrl:
   332		kfree(ctrl);
   333	err_out_none:
   334		return -ENODEV;
   335	}
   336	

---
0-DAY kernel test infrastructure                Open Source Technology Center
https://lists.01.org/pipermail/kbuild-all                   Intel Corporation
diff mbox

Patch

diff --git a/drivers/pci/hotplug/shpchp_core.c b/drivers/pci/hotplug/shpchp_core.c
index 1f0f969..2cd8e19 100644
--- a/drivers/pci/hotplug/shpchp_core.c
+++ b/drivers/pci/hotplug/shpchp_core.c
@@ -287,6 +287,13 @@  static int shpc_probe(struct pci_dev *pdev, const struct pci_device_id *ent)
 	int rc;
 	struct controller *ctrl;
 
+	/* do not claim pcie port device */
+	if (pci_is_pcie(dev) &&
+	    ((pci_pcie_type(dev) == PCI_EXP_TYPE_ROOT_PORT) ||
+	     (pci_pcie_type(dev) == PCI_EXP_TYPE_UPSTREAM) ||
+	     (pci_pcie_type(dev) == PCI_EXP_TYPE_DOWNSTREAM)))
+		return -ENODEV;
+
 	if (!is_shpc_capable(pdev))
 		return -ENODEV;