Message ID | 1503331124-25029-2-git-send-email-oza.oza@broadcom.com (mailing list archive) |
---|---|
State | New, archived |
Delegated to: | Bjorn Helgaas |
Headers | show |
[+cc Sinan, Timur, Alex] Hi Oza, On Mon, Aug 21, 2017 at 09:28:43PM +0530, Oza Pawandeep wrote: > PCIe spec r3.1, sec 2.3.2 > If CRS software visibility is not enabled, the RC must reissue the > config request as a new request. > > - If CRS software visibility is enabled, > - for a config read of Vendor ID, the RC must return 0x0001 data > - for all other config reads/writes, the RC must reissue the > request You mentioned early on that this fixes an issue with NVMe after a reset. Is that reset an FLR? I'm wondering if this overlaps with the work Sinan is doing: http://lkml.kernel.org/r/20170818212310.15145.21732.stgit@bhelgaas-glaptop.roam.corp.google.com With the current code in the tree, pci_flr_wait() probably waits only 100ms on your system. It currently reads PCI_COMMAND and probably sees 0xffff0001 because the NVMe device returns a CRS completion (which this iproc RC incorrectly turns into 0xffff0001 data), so we exit the loop after a single msleep(100). Sinan is extending pci_flr_wait() to read the Vendor ID and look for the CRS SV indication. If this is where you're seeing the NVMe issue, I bet we can fix this by simply changing iproc_pcie_config_read() to return the correct data when it sees a CRS completion. Then the retry loop in pci_flr_wait() will work as intended. Sec 2.3.2 says the RC may limit the number of retries, and it doesn't say anything about how many retries the RC should do or what interval should be between them. So I think it should be legal to say "this RC does zero retries". I think an RC that does zero retries and doesn't support CRS SV would always handle a CRS completion as a "failed transaction" so it would synthesize 0xffffffff data (and, I assume, set an error bit like Received Master Abort). If we treated this controller as though it doesn't support CRS SV, the workaround in iproc_pcie_config_read() would be simply: data = readl(cfg_data_p); if (data == CFG_RETRY_STATUS) /* 0xffff0001 */ data = 0xffffffff; That would make the loop in pci_flr_wait() that reads PCI_COMMAND work correctly. It would get 0xffffffff data as long as the device returned CRS completions. It wouldn't make Sinan's new CRS SV code work, but that's OK: that all has to work correctly even on systems that don't support CRS SV. The only problem is that when we read a non-Vendor ID register that happens to really be 0xffff0001, we *should* return 0xffff0001, but we incorrectly return 0xffffffff. But we already know we just have to live with that issue. One minor refactoring comment below. > iproc PCIe Controller spec: > 4.7.3.3. Retry Status On Configuration Cycle > Endpoints are allowed to generate retry status on configuration > cycles. In this case, the RC needs to re-issue the request. The IP > does not handle this because the number of configuration cycles needed > will probably be less than the total number of non-posted operations > needed. > > When a retry status is received on the User RX interface for a > configuration request that was sent on the User TX interface, > it will be indicated with a completion with the CMPL_STATUS field set > to 2=CRS, and the user will have to find the address and data values > and send a new transaction on the User TX interface. > When the internal configuration space returns a retry status during a > configuration cycle (user_cscfg = 1) on the Command/Status interface, > the pcie_cscrs will assert with the pcie_csack signal to indicate the > CRS status. > When the CRS Software Visibility Enable register in the Root Control > register is enabled, the IP will return the data value to 0x0001 for > the Vendor ID value and 0xffff (all 1’s) for the rest of the data in > the request for reads of offset 0 that return with CRS status. This > is true for both the User RX Interface and for the Command/Status > interface. When CRS Software Visibility is enabled, the CMPL_STATUS > field of the completion on the User RX Interface will not be 2=CRS and > the pcie_cscrs signal will not assert on the Command/Status interface. > > Per PCIe r3.1, sec 2.3.2, config requests that receive completions > with Configuration Request Retry Status (CRS) should be reissued by > the hardware except reads of the Vendor ID when CRS Software > Visibility is enabled. > > This hardware never reissues configuration requests when it receives > CRS completions. > Note that, neither PCIe host bridge nor PCIe core re-issues the > request for any configuration offset. > As a result of the fact, PCIe RC driver (sw) should take care of > retrying. > > For config writes, there is no way for this driver to learn about > CRS completions. A write that receives a CRS completion will not be > retried and the data will be discarded. This is a potential data > corruption. > > For config reads, this hardware returns CFG_RETRY_STATUS data when > it receives a CRS completion for a config read, regardless of the > address of the read or the CRS Software Visibility Enable bit. As a > partial workaround for this, we retry in software any read that > returns CFG_RETRY_STATUS. > > It implements iproc_pcie_config_read which gets called for Stingray. > For other iproc based SOC, it falls back to PCI generic APIs. > > Signed-off-by: Oza Pawandeep <oza.oza@broadcom.com> > Reviewed-by: Ray Jui <ray.jui@broadcom.com> > Reviewed-by: Scott Branden <scott.branden@broadcom.com> > > diff --git a/drivers/pci/host/pcie-iproc.c b/drivers/pci/host/pcie-iproc.c > index c574863..a3edae4 100644 > --- a/drivers/pci/host/pcie-iproc.c > +++ b/drivers/pci/host/pcie-iproc.c > @@ -68,6 +68,9 @@ > #define APB_ERR_EN_SHIFT 0 > #define APB_ERR_EN BIT(APB_ERR_EN_SHIFT) > > +#define CFG_RETRY_STATUS 0xffff0001 > +#define CFG_RETRY_STATUS_TIMEOUT_US 500000 /* 500 milli-seconds. */ > + > /* derive the enum index of the outbound/inbound mapping registers */ > #define MAP_REG(base_reg, index) ((base_reg) + (index) * 2) > > @@ -448,6 +451,89 @@ static inline void iproc_pcie_apb_err_disable(struct pci_bus *bus, > } > } > > +static unsigned int iproc_pcie_cfg_retry(void __iomem *cfg_data_p) > +{ > + int timeout = CFG_RETRY_STATUS_TIMEOUT_US; > + unsigned int data; > + > + /* > + * As per PCIe spec r3.1, sec 2.3.2, CRS Software > + * Visibility only affects config read of the Vendor ID. > + * For config write or any other config read the Root must > + * automatically re-issue configuration request again as a > + * new request. > + * > + * For config reads, this hardware returns CFG_RETRY_STATUS data when > + * it receives a CRS completion for a config read, regardless of the > + * address of the read or the CRS Software Visibility Enable bit. As a > + * partial workaround for this, we retry in software any read that > + * returns CFG_RETRY_STATUS. > + */ > + data = readl(cfg_data_p); > + while (data == CFG_RETRY_STATUS && timeout--) { > + udelay(1); > + data = readl(cfg_data_p); > + } > + > + if (data == CFG_RETRY_STATUS) > + data = 0xffffffff; > + > + return data; > +} > + > +static void __iomem *iproc_pcie_map_ep_cfg_reg(struct iproc_pcie *pcie, > + unsigned int busno, > + unsigned int slot, > + unsigned int fn, > + int where) > +{ > + u16 offset; > + u32 val; > + > + /* EP device access */ > + val = (busno << CFG_ADDR_BUS_NUM_SHIFT) | > + (slot << CFG_ADDR_DEV_NUM_SHIFT) | > + (fn << CFG_ADDR_FUNC_NUM_SHIFT) | > + (where & CFG_ADDR_REG_NUM_MASK) | > + (1 & CFG_ADDR_CFG_TYPE_MASK); > + > + iproc_pcie_write_reg(pcie, IPROC_PCIE_CFG_ADDR, val); > + offset = iproc_pcie_reg_offset(pcie, IPROC_PCIE_CFG_DATA); > + > + if (iproc_pcie_reg_is_invalid(offset)) > + return NULL; > + > + return (pcie->base + offset); > +} > + > +static int iproc_pcie_config_read(struct pci_bus *bus, unsigned int devfn, > + int where, int size, u32 *val) > +{ > + struct iproc_pcie *pcie = iproc_data(bus); > + unsigned int slot = PCI_SLOT(devfn); > + unsigned int fn = PCI_FUNC(devfn); > + unsigned int busno = bus->number; > + void __iomem *cfg_data_p; > + u32 data; > + > + /* root complex access. */ > + if (busno == 0) > + return pci_generic_config_read32(bus, devfn, where, size, val); > + > + cfg_data_p = iproc_pcie_map_ep_cfg_reg(pcie, busno, slot, fn, where); > + > + if (!cfg_data_p) > + return PCIBIOS_DEVICE_NOT_FOUND; > + > + data = iproc_pcie_cfg_retry(cfg_data_p); > + > + *val = data; > + if (size <= 2) > + *val = (data >> (8 * (where & 3))) & ((1 << (size * 8)) - 1); > + > + return PCIBIOS_SUCCESSFUL; > +} > + > /** > * Note access to the configuration registers are protected at the higher layer > * by 'pci_lock' in drivers/pci/access.c > @@ -459,7 +545,6 @@ static void __iomem *iproc_pcie_map_cfg_bus(struct iproc_pcie *pcie, > { > unsigned slot = PCI_SLOT(devfn); > unsigned fn = PCI_FUNC(devfn); > - u32 val; > u16 offset; > > /* root complex access */ > @@ -484,18 +569,7 @@ static void __iomem *iproc_pcie_map_cfg_bus(struct iproc_pcie *pcie, > if (slot > 0) > return NULL; > > - /* EP device access */ > - val = (busno << CFG_ADDR_BUS_NUM_SHIFT) | > - (slot << CFG_ADDR_DEV_NUM_SHIFT) | > - (fn << CFG_ADDR_FUNC_NUM_SHIFT) | > - (where & CFG_ADDR_REG_NUM_MASK) | > - (1 & CFG_ADDR_CFG_TYPE_MASK); > - iproc_pcie_write_reg(pcie, IPROC_PCIE_CFG_ADDR, val); > - offset = iproc_pcie_reg_offset(pcie, IPROC_PCIE_CFG_DATA); > - if (iproc_pcie_reg_is_invalid(offset)) > - return NULL; > - else > - return (pcie->base + offset); > + return iproc_pcie_map_ep_cfg_reg(pcie, busno, slot, fn, where); Thanks for factoring this out. Can you please split that refactor into a separate patch? That will make this CRS-handling patch smaller and simpler. > } > > static void __iomem *iproc_pcie_bus_map_cfg_bus(struct pci_bus *bus, > @@ -554,8 +628,13 @@ static int iproc_pcie_config_read32(struct pci_bus *bus, unsigned int devfn, > int where, int size, u32 *val) > { > int ret; > + struct iproc_pcie *pcie = iproc_data(bus); > > iproc_pcie_apb_err_disable(bus, true); > + if (pcie->type == IPROC_PCIE_PAXB_V2) > + ret = iproc_pcie_config_read(bus, devfn, where, size, val); > + else > + ret = pci_generic_config_read32(bus, devfn, where, size, val); > ret = pci_generic_config_read32(bus, devfn, where, size, val); > iproc_pcie_apb_err_disable(bus, false); > > -- > 1.9.1 >
On Wed, Aug 23, 2017 at 2:03 AM, Bjorn Helgaas <helgaas@kernel.org> wrote: > [+cc Sinan, Timur, Alex] > > Hi Oza, > > On Mon, Aug 21, 2017 at 09:28:43PM +0530, Oza Pawandeep wrote: >> PCIe spec r3.1, sec 2.3.2 >> If CRS software visibility is not enabled, the RC must reissue the >> config request as a new request. >> >> - If CRS software visibility is enabled, >> - for a config read of Vendor ID, the RC must return 0x0001 data >> - for all other config reads/writes, the RC must reissue the >> request > > You mentioned early on that this fixes an issue with NVMe after a > reset. Is that reset an FLR? I'm wondering if this overlaps with the > work Sinan is doing: > > http://lkml.kernel.org/r/20170818212310.15145.21732.stgit@bhelgaas-glaptop.roam.corp.google.com > > With the current code in the tree, pci_flr_wait() probably waits only > 100ms on your system. It currently reads PCI_COMMAND and probably > sees 0xffff0001 because the NVMe device returns a CRS completion > (which this iproc RC incorrectly turns into 0xffff0001 data), so we > exit the loop after a single msleep(100). > > Sinan is extending pci_flr_wait() to read the Vendor ID and look for > the CRS SV indication. If this is where you're seeing the NVMe issue, > I bet we can fix this by simply changing iproc_pcie_config_read() to > return the correct data when it sees a CRS completion. Then the > retry loop in pci_flr_wait() will work as intended. > The issue with User Space poll mode drivers resulting into CRS. root@bcm958742k:~# /usr/share/spdk/examples/nvme/perf -r 'trtype:PCIe traddr:0000:01:00.0' -q 128 -s 4096 -w read -d 2048 -t 30 -c 0x1 Starting DPDK 16.11.1 initialization... [ DPDK EAL parameters: perf -c 1 -m 2048 --file-prefix=spdk_pid2202 ] EAL: Detected 8 lcore(s) EAL: Probing VFIO support... EAL: VFIO support initialized EAL: cannot open /proc/self/numa_maps, consider that all memory is in socket_id 0 Initializing NVMe Controllers EAL: PCI device 0000:01:00.0 on NUMA socket 0 EAL: probe driver: 8086:953 spdk_nvme EAL: using IOMMU type 1 (Type 1) [ 45.811353] vfio_ecap_init: 0000:01:00.0 hiding ecap 0x19@0x2a0 Attaching to NVMe Controller at 0000:01:00.0 [8086:0953] [ 46.486601] vfio_bar_restore: 0000:01:00.0 reset recovery - restoring bars nvme_ctrlr.c:1237:nvme_ctrlr_process_init: ***ERROR*** Initialization timed out in state 3 nvme_ctrlr.c: 414:nvme_ctrlr_shutdown: ***ERROR*** did not shutdown within 5 seconds EAL: Hotplug doesn't support vfio yet spdk_nvme_probe() failed for transport address '0000:01:00.0' /usr/share/spdk/examples/nvme/perf: errors occured ok, so this is in the context of [ 52.500579] [<ffff000008088498>] dump_backtrace+0x0/0x208 [ 52.506147] [<ffff0000080886b4>] show_stack+0x14/0x20 [ 52.511357] [<ffff0000083ace34>] dump_stack+0x8c/0xb0 [ 52.516567] [<ffff00000840a550>] pci_flr_wait+0x18/0xc8 [ 52.521955] [<ffff00000840bf9c>] pcie_flr+0x34/0x68 [ 52.526986] [<ffff00000840e3d0>] __pci_dev_reset+0x100/0x2c0 [ 52.532823] [<ffff00000840e60c>] pci_try_reset_function+0x64/0x80 [ 52.539108] [<ffff00000856e4b8>] vfio_pci_ioctl+0x398/0xb78 [ 52.544857] [<ffff000008568378>] vfio_device_fops_unl_ioctl+0x20/0x30 [ 52.551501] [<ffff0000081d4e24>] do_vfs_ioctl+0xa4/0x738 [ 52.556980] [<ffff0000081d5544>] SyS_ioctl+0x8c/0xa0 [ 52.562100] [<ffff000008082f8c>] __sys_trace_return+0x0/0x4 So, I have two things to do here. 1) remove retry funciton and just do following. data = readl(cfg_data_p); if (data == CFG_RETRY_STATUS) /* 0xffff0001 */ data = 0xffffffff; 2) refactor the code with separate patch. but for that Sinan's patch is required. then with both the patches I think, things will work out correctly for iproc SOC. Let me know how this sounds and if you agree with above two points, I will be posting final set of patches. let me know if you want me to change anything in the commit description. in my opinion I should remove config write description which is irrelevant here. Regards, Oza. > Sec 2.3.2 says the RC may limit the number of retries, and it doesn't > say anything about how many retries the RC should do or what interval > should be between them. So I think it should be legal to say "this RC > does zero retries". > > I think an RC that does zero retries and doesn't support CRS SV would > always handle a CRS completion as a "failed transaction" so it would > synthesize 0xffffffff data (and, I assume, set an error bit like > Received Master Abort). If we treated this controller as though it > doesn't support CRS SV, the workaround in iproc_pcie_config_read() > would be simply: > > data = readl(cfg_data_p); > if (data == CFG_RETRY_STATUS) /* 0xffff0001 */ > data = 0xffffffff; > > That would make the loop in pci_flr_wait() that reads PCI_COMMAND work > correctly. It would get 0xffffffff data as long as the device > returned CRS completions. > > It wouldn't make Sinan's new CRS SV code work, but that's OK: that all > has to work correctly even on systems that don't support CRS SV. > > The only problem is that when we read a non-Vendor ID register that > happens to really be 0xffff0001, we *should* return 0xffff0001, but we > incorrectly return 0xffffffff. But we already know we just have to > live with that issue. > > One minor refactoring comment below. > >> iproc PCIe Controller spec: >> 4.7.3.3. Retry Status On Configuration Cycle >> Endpoints are allowed to generate retry status on configuration >> cycles. In this case, the RC needs to re-issue the request. The IP >> does not handle this because the number of configuration cycles needed >> will probably be less than the total number of non-posted operations >> needed. >> >> When a retry status is received on the User RX interface for a >> configuration request that was sent on the User TX interface, >> it will be indicated with a completion with the CMPL_STATUS field set >> to 2=CRS, and the user will have to find the address and data values >> and send a new transaction on the User TX interface. >> When the internal configuration space returns a retry status during a >> configuration cycle (user_cscfg = 1) on the Command/Status interface, >> the pcie_cscrs will assert with the pcie_csack signal to indicate the >> CRS status. >> When the CRS Software Visibility Enable register in the Root Control >> register is enabled, the IP will return the data value to 0x0001 for >> the Vendor ID value and 0xffff (all 1’s) for the rest of the data in >> the request for reads of offset 0 that return with CRS status. This >> is true for both the User RX Interface and for the Command/Status >> interface. When CRS Software Visibility is enabled, the CMPL_STATUS >> field of the completion on the User RX Interface will not be 2=CRS and >> the pcie_cscrs signal will not assert on the Command/Status interface. >> >> Per PCIe r3.1, sec 2.3.2, config requests that receive completions >> with Configuration Request Retry Status (CRS) should be reissued by >> the hardware except reads of the Vendor ID when CRS Software >> Visibility is enabled. >> >> This hardware never reissues configuration requests when it receives >> CRS completions. >> Note that, neither PCIe host bridge nor PCIe core re-issues the >> request for any configuration offset. >> As a result of the fact, PCIe RC driver (sw) should take care of >> retrying. >> >> For config writes, there is no way for this driver to learn about >> CRS completions. A write that receives a CRS completion will not be >> retried and the data will be discarded. This is a potential data >> corruption. >> >> For config reads, this hardware returns CFG_RETRY_STATUS data when >> it receives a CRS completion for a config read, regardless of the >> address of the read or the CRS Software Visibility Enable bit. As a >> partial workaround for this, we retry in software any read that >> returns CFG_RETRY_STATUS. >> >> It implements iproc_pcie_config_read which gets called for Stingray. >> For other iproc based SOC, it falls back to PCI generic APIs. >> >> Signed-off-by: Oza Pawandeep <oza.oza@broadcom.com> >> Reviewed-by: Ray Jui <ray.jui@broadcom.com> >> Reviewed-by: Scott Branden <scott.branden@broadcom.com> >> >> diff --git a/drivers/pci/host/pcie-iproc.c b/drivers/pci/host/pcie-iproc.c >> index c574863..a3edae4 100644 >> --- a/drivers/pci/host/pcie-iproc.c >> +++ b/drivers/pci/host/pcie-iproc.c >> @@ -68,6 +68,9 @@ >> #define APB_ERR_EN_SHIFT 0 >> #define APB_ERR_EN BIT(APB_ERR_EN_SHIFT) >> >> +#define CFG_RETRY_STATUS 0xffff0001 >> +#define CFG_RETRY_STATUS_TIMEOUT_US 500000 /* 500 milli-seconds. */ >> + >> /* derive the enum index of the outbound/inbound mapping registers */ >> #define MAP_REG(base_reg, index) ((base_reg) + (index) * 2) >> >> @@ -448,6 +451,89 @@ static inline void iproc_pcie_apb_err_disable(struct pci_bus *bus, >> } >> } >> >> +static unsigned int iproc_pcie_cfg_retry(void __iomem *cfg_data_p) >> +{ >> + int timeout = CFG_RETRY_STATUS_TIMEOUT_US; >> + unsigned int data; >> + >> + /* >> + * As per PCIe spec r3.1, sec 2.3.2, CRS Software >> + * Visibility only affects config read of the Vendor ID. >> + * For config write or any other config read the Root must >> + * automatically re-issue configuration request again as a >> + * new request. >> + * >> + * For config reads, this hardware returns CFG_RETRY_STATUS data when >> + * it receives a CRS completion for a config read, regardless of the >> + * address of the read or the CRS Software Visibility Enable bit. As a >> + * partial workaround for this, we retry in software any read that >> + * returns CFG_RETRY_STATUS. >> + */ >> + data = readl(cfg_data_p); >> + while (data == CFG_RETRY_STATUS && timeout--) { >> + udelay(1); >> + data = readl(cfg_data_p); >> + } >> + >> + if (data == CFG_RETRY_STATUS) >> + data = 0xffffffff; >> + >> + return data; >> +} >> + >> +static void __iomem *iproc_pcie_map_ep_cfg_reg(struct iproc_pcie *pcie, >> + unsigned int busno, >> + unsigned int slot, >> + unsigned int fn, >> + int where) >> +{ >> + u16 offset; >> + u32 val; >> + >> + /* EP device access */ >> + val = (busno << CFG_ADDR_BUS_NUM_SHIFT) | >> + (slot << CFG_ADDR_DEV_NUM_SHIFT) | >> + (fn << CFG_ADDR_FUNC_NUM_SHIFT) | >> + (where & CFG_ADDR_REG_NUM_MASK) | >> + (1 & CFG_ADDR_CFG_TYPE_MASK); >> + >> + iproc_pcie_write_reg(pcie, IPROC_PCIE_CFG_ADDR, val); >> + offset = iproc_pcie_reg_offset(pcie, IPROC_PCIE_CFG_DATA); >> + >> + if (iproc_pcie_reg_is_invalid(offset)) >> + return NULL; >> + >> + return (pcie->base + offset); >> +} >> + >> +static int iproc_pcie_config_read(struct pci_bus *bus, unsigned int devfn, >> + int where, int size, u32 *val) >> +{ >> + struct iproc_pcie *pcie = iproc_data(bus); >> + unsigned int slot = PCI_SLOT(devfn); >> + unsigned int fn = PCI_FUNC(devfn); >> + unsigned int busno = bus->number; >> + void __iomem *cfg_data_p; >> + u32 data; >> + >> + /* root complex access. */ >> + if (busno == 0) >> + return pci_generic_config_read32(bus, devfn, where, size, val); >> + >> + cfg_data_p = iproc_pcie_map_ep_cfg_reg(pcie, busno, slot, fn, where); >> + >> + if (!cfg_data_p) >> + return PCIBIOS_DEVICE_NOT_FOUND; >> + >> + data = iproc_pcie_cfg_retry(cfg_data_p); >> + >> + *val = data; >> + if (size <= 2) >> + *val = (data >> (8 * (where & 3))) & ((1 << (size * 8)) - 1); >> + >> + return PCIBIOS_SUCCESSFUL; >> +} >> + >> /** >> * Note access to the configuration registers are protected at the higher layer >> * by 'pci_lock' in drivers/pci/access.c >> @@ -459,7 +545,6 @@ static void __iomem *iproc_pcie_map_cfg_bus(struct iproc_pcie *pcie, >> { >> unsigned slot = PCI_SLOT(devfn); >> unsigned fn = PCI_FUNC(devfn); >> - u32 val; >> u16 offset; >> >> /* root complex access */ >> @@ -484,18 +569,7 @@ static void __iomem *iproc_pcie_map_cfg_bus(struct iproc_pcie *pcie, >> if (slot > 0) >> return NULL; >> >> - /* EP device access */ >> - val = (busno << CFG_ADDR_BUS_NUM_SHIFT) | >> - (slot << CFG_ADDR_DEV_NUM_SHIFT) | >> - (fn << CFG_ADDR_FUNC_NUM_SHIFT) | >> - (where & CFG_ADDR_REG_NUM_MASK) | >> - (1 & CFG_ADDR_CFG_TYPE_MASK); >> - iproc_pcie_write_reg(pcie, IPROC_PCIE_CFG_ADDR, val); >> - offset = iproc_pcie_reg_offset(pcie, IPROC_PCIE_CFG_DATA); >> - if (iproc_pcie_reg_is_invalid(offset)) >> - return NULL; >> - else >> - return (pcie->base + offset); >> + return iproc_pcie_map_ep_cfg_reg(pcie, busno, slot, fn, where); > > Thanks for factoring this out. Can you please split that refactor > into a separate patch? That will make this CRS-handling patch smaller > and simpler. > >> } >> >> static void __iomem *iproc_pcie_bus_map_cfg_bus(struct pci_bus *bus, >> @@ -554,8 +628,13 @@ static int iproc_pcie_config_read32(struct pci_bus *bus, unsigned int devfn, >> int where, int size, u32 *val) >> { >> int ret; >> + struct iproc_pcie *pcie = iproc_data(bus); >> >> iproc_pcie_apb_err_disable(bus, true); >> + if (pcie->type == IPROC_PCIE_PAXB_V2) >> + ret = iproc_pcie_config_read(bus, devfn, where, size, val); >> + else >> + ret = pci_generic_config_read32(bus, devfn, where, size, val); >> ret = pci_generic_config_read32(bus, devfn, where, size, val); >> iproc_pcie_apb_err_disable(bus, false); >> >> -- >> 1.9.1 >>
On Wed, Aug 23, 2017 at 03:57:02PM +0530, Oza Oza wrote: > On Wed, Aug 23, 2017 at 2:03 AM, Bjorn Helgaas <helgaas@kernel.org> wrote: > > [+cc Sinan, Timur, Alex] > > > > Hi Oza, > > > > On Mon, Aug 21, 2017 at 09:28:43PM +0530, Oza Pawandeep wrote: > >> PCIe spec r3.1, sec 2.3.2 > >> If CRS software visibility is not enabled, the RC must reissue the > >> config request as a new request. > >> > >> - If CRS software visibility is enabled, > >> - for a config read of Vendor ID, the RC must return 0x0001 data > >> - for all other config reads/writes, the RC must reissue the > >> request > > > > You mentioned early on that this fixes an issue with NVMe after a > > reset. Is that reset an FLR? I'm wondering if this overlaps with the > > work Sinan is doing: > > > > http://lkml.kernel.org/r/20170818212310.15145.21732.stgit@bhelgaas-glaptop.roam.corp.google.com > > > > With the current code in the tree, pci_flr_wait() probably waits only > > 100ms on your system. It currently reads PCI_COMMAND and probably > > sees 0xffff0001 because the NVMe device returns a CRS completion > > (which this iproc RC incorrectly turns into 0xffff0001 data), so we > > exit the loop after a single msleep(100). > > > > Sinan is extending pci_flr_wait() to read the Vendor ID and look for > > the CRS SV indication. If this is where you're seeing the NVMe issue, > > I bet we can fix this by simply changing iproc_pcie_config_read() to > > return the correct data when it sees a CRS completion. Then the > > retry loop in pci_flr_wait() will work as intended. > > > > The issue with User Space poll mode drivers resulting into CRS. > > > root@bcm958742k:~# /usr/share/spdk/examples/nvme/perf -r 'trtype:PCIe > traddr:0000:01:00.0' -q 128 -s 4096 -w read -d 2048 -t 30 -c 0x1 > Starting DPDK 16.11.1 initialization... > [ DPDK EAL parameters: perf -c 1 -m 2048 --file-prefix=spdk_pid2202 ] > EAL: Detected 8 lcore(s) > EAL: Probing VFIO support... > EAL: VFIO support initialized > EAL: cannot open /proc/self/numa_maps, consider that all memory is in > socket_id 0 > Initializing NVMe Controllers > EAL: PCI device 0000:01:00.0 on NUMA socket 0 > EAL: probe driver: 8086:953 spdk_nvme > EAL: using IOMMU type 1 (Type 1) > [ 45.811353] vfio_ecap_init: 0000:01:00.0 hiding ecap 0x19@0x2a0 > Attaching to NVMe Controller at 0000:01:00.0 [8086:0953] > [ 46.486601] vfio_bar_restore: 0000:01:00.0 reset recovery - restoring bars > nvme_ctrlr.c:1237:nvme_ctrlr_process_init: ***ERROR*** Initialization > timed out in state 3 > nvme_ctrlr.c: 414:nvme_ctrlr_shutdown: ***ERROR*** did not shutdown > within 5 seconds > EAL: Hotplug doesn't support vfio yet > spdk_nvme_probe() failed for transport address '0000:01:00.0' > /usr/share/spdk/examples/nvme/perf: errors occured > > ok, so this is in the context of > [ 52.500579] [<ffff000008088498>] dump_backtrace+0x0/0x208 > [ 52.506147] [<ffff0000080886b4>] show_stack+0x14/0x20 > [ 52.511357] [<ffff0000083ace34>] dump_stack+0x8c/0xb0 > [ 52.516567] [<ffff00000840a550>] pci_flr_wait+0x18/0xc8 > [ 52.521955] [<ffff00000840bf9c>] pcie_flr+0x34/0x68 > [ 52.526986] [<ffff00000840e3d0>] __pci_dev_reset+0x100/0x2c0 > [ 52.532823] [<ffff00000840e60c>] pci_try_reset_function+0x64/0x80 > [ 52.539108] [<ffff00000856e4b8>] vfio_pci_ioctl+0x398/0xb78 > [ 52.544857] [<ffff000008568378>] vfio_device_fops_unl_ioctl+0x20/0x30 > [ 52.551501] [<ffff0000081d4e24>] do_vfs_ioctl+0xa4/0x738 > [ 52.556980] [<ffff0000081d5544>] SyS_ioctl+0x8c/0xa0 > [ 52.562100] [<ffff000008082f8c>] __sys_trace_return+0x0/0x4 > > > So, I have two things to do here. > > 1) remove retry funciton and just do following. > data = readl(cfg_data_p); > if (data == CFG_RETRY_STATUS) /* 0xffff0001 */ > data = 0xffffffff; > > 2) refactor the code with separate patch. > > but for that Sinan's patch is required. > then with both the patches I think, things will work out correctly for > iproc SOC. It looks like you are using the pci_flr_wait() path as I suspected. If you do the check as you mentioned in 1), this should work even without Sinan's patch. It's possible you'd have to increase the pci_flr_wait() timeout (currently 1sec total). I think we'll end up doing that as part of Sinan's work, so we use the same timeout whether or not the RC supports CRS SV. > > Let me know how this sounds and if you agree with above two points, I > will be posting final set of patches. > > let me know if you want me to change anything in the commit description. > in my opinion I should remove config write description which is irrelevant here. I agree. I don't think you need to say anything special about write, since I think an RC is probably allowed to drop (with no retries) a config write that receives a CRS completion. In this case, I expect the RC would log an error, so you should see something in PCI_STATUS.
On Wed, Aug 23, 2017 at 7:21 PM, Bjorn Helgaas <helgaas@kernel.org> wrote: > On Wed, Aug 23, 2017 at 03:57:02PM +0530, Oza Oza wrote: >> On Wed, Aug 23, 2017 at 2:03 AM, Bjorn Helgaas <helgaas@kernel.org> wrote: >> > [+cc Sinan, Timur, Alex] >> > >> > Hi Oza, >> > >> > On Mon, Aug 21, 2017 at 09:28:43PM +0530, Oza Pawandeep wrote: >> >> PCIe spec r3.1, sec 2.3.2 >> >> If CRS software visibility is not enabled, the RC must reissue the >> >> config request as a new request. >> >> >> >> - If CRS software visibility is enabled, >> >> - for a config read of Vendor ID, the RC must return 0x0001 data >> >> - for all other config reads/writes, the RC must reissue the >> >> request >> > >> > You mentioned early on that this fixes an issue with NVMe after a >> > reset. Is that reset an FLR? I'm wondering if this overlaps with the >> > work Sinan is doing: >> > >> > http://lkml.kernel.org/r/20170818212310.15145.21732.stgit@bhelgaas-glaptop.roam.corp.google.com >> > >> > With the current code in the tree, pci_flr_wait() probably waits only >> > 100ms on your system. It currently reads PCI_COMMAND and probably >> > sees 0xffff0001 because the NVMe device returns a CRS completion >> > (which this iproc RC incorrectly turns into 0xffff0001 data), so we >> > exit the loop after a single msleep(100). >> > >> > Sinan is extending pci_flr_wait() to read the Vendor ID and look for >> > the CRS SV indication. If this is where you're seeing the NVMe issue, >> > I bet we can fix this by simply changing iproc_pcie_config_read() to >> > return the correct data when it sees a CRS completion. Then the >> > retry loop in pci_flr_wait() will work as intended. >> > >> >> The issue with User Space poll mode drivers resulting into CRS. >> >> >> root@bcm958742k:~# /usr/share/spdk/examples/nvme/perf -r 'trtype:PCIe >> traddr:0000:01:00.0' -q 128 -s 4096 -w read -d 2048 -t 30 -c 0x1 >> Starting DPDK 16.11.1 initialization... >> [ DPDK EAL parameters: perf -c 1 -m 2048 --file-prefix=spdk_pid2202 ] >> EAL: Detected 8 lcore(s) >> EAL: Probing VFIO support... >> EAL: VFIO support initialized >> EAL: cannot open /proc/self/numa_maps, consider that all memory is in >> socket_id 0 >> Initializing NVMe Controllers >> EAL: PCI device 0000:01:00.0 on NUMA socket 0 >> EAL: probe driver: 8086:953 spdk_nvme >> EAL: using IOMMU type 1 (Type 1) >> [ 45.811353] vfio_ecap_init: 0000:01:00.0 hiding ecap 0x19@0x2a0 >> Attaching to NVMe Controller at 0000:01:00.0 [8086:0953] >> [ 46.486601] vfio_bar_restore: 0000:01:00.0 reset recovery - restoring bars >> nvme_ctrlr.c:1237:nvme_ctrlr_process_init: ***ERROR*** Initialization >> timed out in state 3 >> nvme_ctrlr.c: 414:nvme_ctrlr_shutdown: ***ERROR*** did not shutdown >> within 5 seconds >> EAL: Hotplug doesn't support vfio yet >> spdk_nvme_probe() failed for transport address '0000:01:00.0' >> /usr/share/spdk/examples/nvme/perf: errors occured >> >> ok, so this is in the context of >> [ 52.500579] [<ffff000008088498>] dump_backtrace+0x0/0x208 >> [ 52.506147] [<ffff0000080886b4>] show_stack+0x14/0x20 >> [ 52.511357] [<ffff0000083ace34>] dump_stack+0x8c/0xb0 >> [ 52.516567] [<ffff00000840a550>] pci_flr_wait+0x18/0xc8 >> [ 52.521955] [<ffff00000840bf9c>] pcie_flr+0x34/0x68 >> [ 52.526986] [<ffff00000840e3d0>] __pci_dev_reset+0x100/0x2c0 >> [ 52.532823] [<ffff00000840e60c>] pci_try_reset_function+0x64/0x80 >> [ 52.539108] [<ffff00000856e4b8>] vfio_pci_ioctl+0x398/0xb78 >> [ 52.544857] [<ffff000008568378>] vfio_device_fops_unl_ioctl+0x20/0x30 >> [ 52.551501] [<ffff0000081d4e24>] do_vfs_ioctl+0xa4/0x738 >> [ 52.556980] [<ffff0000081d5544>] SyS_ioctl+0x8c/0xa0 >> [ 52.562100] [<ffff000008082f8c>] __sys_trace_return+0x0/0x4 >> >> >> So, I have two things to do here. >> >> 1) remove retry funciton and just do following. >> data = readl(cfg_data_p); >> if (data == CFG_RETRY_STATUS) /* 0xffff0001 */ >> data = 0xffffffff; >> >> 2) refactor the code with separate patch. >> >> but for that Sinan's patch is required. >> then with both the patches I think, things will work out correctly for >> iproc SOC. > > It looks like you are using the pci_flr_wait() path as I suspected. > > If you do the check as you mentioned in 1), this should work even > without Sinan's patch. looks like this can not work. If I add folowing in iproc_pcie_config_read() if (data == CFG_RETRY_STATUS) /* 0xffff0001 */ data = 0xffffffff; because pci_bus_read_dev_vendor_id returns false /* some broken boards return 0 or ~0 if a slot is empty: */ if (*l == 0xffffffff || *l == 0x00000000 || *l == 0x0000ffff || *l == 0xffff0000) return false; In working Enumuration case I get following: [ 9.125976] pci 0000:00:00.0: bridge configuration invalid ([bus 00-00]), re-configuring [ 9.134267] where=0x0 val=0xffff0001 [ 9.146946] where=0x0 val=0xffff0001 [ 9.158943] where=0x0 val=0xffff0001 [ 9.170945] where=0x0 val=0xffff0001 [ 9.186945] where=0x0 val=0xffff0001 [ 9.210944] where=0x0 val=0xffff0001 [ 9.250943] where=0x0 val=0xffff0001 [ 9.322942] where=0x0 val=0xffff0001 [ 9.458943] where=0x0 val=0xffff0001 [ 9.726942] where=0x0 val=0x9538086 >> actual vendor and device id. so I think I have to retry in RC driver, so the old code still holds good. except that I have to do factoring out. please let me know If I missed anything, or you want me to try anything else. Regards, Oza. > > It's possible you'd have to increase the pci_flr_wait() timeout > (currently 1sec total). I think we'll end up doing that as part of > Sinan's work, so we use the same timeout whether or not the RC > supports CRS SV. > >> >> Let me know how this sounds and if you agree with above two points, I >> will be posting final set of patches. >> >> let me know if you want me to change anything in the commit description. >> in my opinion I should remove config write description which is irrelevant here. > > I agree. I don't think you need to say anything special about write, > since I think an RC is probably allowed to drop (with no retries) a > config write that receives a CRS completion. In this case, I expect > the RC would log an error, so you should see something in PCI_STATUS.
Hi Oza, > In working Enumuration case I get following: > [ 9.125976] pci 0000:00:00.0: bridge configuration invalid ([bus > 00-00]), re-configuring > [ 9.134267] where=0x0 val=0xffff0001 > [ 9.146946] where=0x0 val=0xffff0001 > [ 9.158943] where=0x0 val=0xffff0001 > [ 9.170945] where=0x0 val=0xffff0001 > [ 9.186945] where=0x0 val=0xffff0001 > [ 9.210944] where=0x0 val=0xffff0001 > [ 9.250943] where=0x0 val=0xffff0001 > [ 9.322942] where=0x0 val=0xffff0001 > [ 9.458943] where=0x0 val=0xffff0001 > [ 9.726942] where=0x0 val=0x9538086 >> actual vendor and device id. > > so I think I have to retry in RC driver, so the old code still holds good. > except that I have to do factoring out You need to return 0xFFFF0001 for vendor ID register and return 0xFFFFFFFF for other registers like COMMAND register during the CRS period. > > please let me know If I missed anything, or you want me to try anything else. Sinan
On Wed, Aug 23, 2017 at 9:22 PM, Sinan Kaya <okaya@codeaurora.org> wrote: > Hi Oza, > >> In working Enumuration case I get following: >> [ 9.125976] pci 0000:00:00.0: bridge configuration invalid ([bus >> 00-00]), re-configuring >> [ 9.134267] where=0x0 val=0xffff0001 >> [ 9.146946] where=0x0 val=0xffff0001 >> [ 9.158943] where=0x0 val=0xffff0001 >> [ 9.170945] where=0x0 val=0xffff0001 >> [ 9.186945] where=0x0 val=0xffff0001 >> [ 9.210944] where=0x0 val=0xffff0001 >> [ 9.250943] where=0x0 val=0xffff0001 >> [ 9.322942] where=0x0 val=0xffff0001 >> [ 9.458943] where=0x0 val=0xffff0001 >> [ 9.726942] where=0x0 val=0x9538086 >> actual vendor and device id. >> >> so I think I have to retry in RC driver, so the old code still holds good. >> except that I have to do factoring out > You need to return 0xFFFF0001 for vendor ID register and return 0xFFFFFFFF for > other registers like COMMAND register during the CRS period. > Hi Sinan, Although I don't disagree entirely with your suggestion, but at the same time, I am not sure, in RC driver I should have special case for vendor id and device id, and have another case for rest of the configuration access. my thinking is, the RC driver should closely reflect the PCI iproc HW behaviour (in a generic way, rather than handling vendor/device id case separately) Bjorn, please comment as how do you see it being handled. Regards, Oza. >> >> please let me know If I missed anything, or you want me to try anything else. > > Sinan
On Wed, Aug 23, 2017 at 09:32:06PM +0530, Oza Oza wrote: > On Wed, Aug 23, 2017 at 9:22 PM, Sinan Kaya <okaya@codeaurora.org> wrote: > > Hi Oza, > > > >> In working Enumuration case I get following: > >> [ 9.125976] pci 0000:00:00.0: bridge configuration invalid ([bus > >> 00-00]), re-configuring > >> [ 9.134267] where=0x0 val=0xffff0001 > >> [ 9.146946] where=0x0 val=0xffff0001 > >> [ 9.158943] where=0x0 val=0xffff0001 > >> [ 9.170945] where=0x0 val=0xffff0001 > >> [ 9.186945] where=0x0 val=0xffff0001 > >> [ 9.210944] where=0x0 val=0xffff0001 > >> [ 9.250943] where=0x0 val=0xffff0001 > >> [ 9.322942] where=0x0 val=0xffff0001 > >> [ 9.458943] where=0x0 val=0xffff0001 > >> [ 9.726942] where=0x0 val=0x9538086 >> actual vendor and device id. > >> > >> so I think I have to retry in RC driver, so the old code still holds good. > >> except that I have to do factoring out > > You need to return 0xFFFF0001 for vendor ID register and return 0xFFFFFFFF for > > other registers like COMMAND register during the CRS period. The proposal we're trying to implement is to handle this controller as an RP that does not support CRS SV. Such an RP would never return 0xffff0001 for the vendor/device ID. If it never got a successful completion, it would return 0xffffffff. So I think you do have to either retry (as in your v7 patch) or add a delay before we start enumeration. It looks like we waited somewhere between 320ms and 590ms before this device became ready. The PCI specs do require a delay after a reset (PCIe r3.1, sec 6.6.1) before software sends a config request. I don't think there's anywhere in the PCI core where we delay before we first enumerate devices under a host bridge. I'm not sure we'd *want* such a delay in the PCI core, because on many systems the BIOS has already initialized the PCI controller, and an additional delay would be unnecessary and would only slow down boot. But it might make sense to add a delay in the PCI controller driver -- it knows when the reset occurs and probably knows more about the firmware environment. I haven't tried to analyze all of sec 6.6.1, but this section: Unless Readiness Notifications mechanisms are used (see Section 6.23), the Root Complex and/or system software must allow at least 1.0 s after a Conventional Reset of a device, before it may determine that a device which fails to return a Successful Completion status for a valid Configuration Request is a broken device. This period is independent of how quickly Link training completes. Note: This delay is analogous to the Trhfa parameter specified for PCI/PCI-X, and is intended to allow an adequate amount of time for devices which require self initialization. makes it sound like a 1sec delay might be needed. That sounds like an awful lot, but this device did take close to that amount of time. I don't care which way you go. You've already implemented the delay in the v7 patch, and that's fine with me. Bjorn
On Wed, Aug 23, 2017 at 11:30 PM, Bjorn Helgaas <helgaas@kernel.org> wrote: > On Wed, Aug 23, 2017 at 09:32:06PM +0530, Oza Oza wrote: >> On Wed, Aug 23, 2017 at 9:22 PM, Sinan Kaya <okaya@codeaurora.org> wrote: >> > Hi Oza, >> > >> >> In working Enumuration case I get following: >> >> [ 9.125976] pci 0000:00:00.0: bridge configuration invalid ([bus >> >> 00-00]), re-configuring >> >> [ 9.134267] where=0x0 val=0xffff0001 >> >> [ 9.146946] where=0x0 val=0xffff0001 >> >> [ 9.158943] where=0x0 val=0xffff0001 >> >> [ 9.170945] where=0x0 val=0xffff0001 >> >> [ 9.186945] where=0x0 val=0xffff0001 >> >> [ 9.210944] where=0x0 val=0xffff0001 >> >> [ 9.250943] where=0x0 val=0xffff0001 >> >> [ 9.322942] where=0x0 val=0xffff0001 >> >> [ 9.458943] where=0x0 val=0xffff0001 >> >> [ 9.726942] where=0x0 val=0x9538086 >> actual vendor and device id. >> >> >> >> so I think I have to retry in RC driver, so the old code still holds good. >> >> except that I have to do factoring out >> > You need to return 0xFFFF0001 for vendor ID register and return 0xFFFFFFFF for >> > other registers like COMMAND register during the CRS period. > > The proposal we're trying to implement is to handle this controller as > an RP that does not support CRS SV. Such an RP would never return > 0xffff0001 for the vendor/device ID. If it never got a successful > completion, it would return 0xffffffff. > > So I think you do have to either retry (as in your v7 patch) or add a > delay before we start enumeration. > > It looks like we waited somewhere between 320ms and 590ms before this > device became ready. > > The PCI specs do require a delay after a reset (PCIe r3.1, sec 6.6.1) > before software sends a config request. I don't think there's > anywhere in the PCI core where we delay before we first enumerate > devices under a host bridge. I'm not sure we'd *want* such a delay in > the PCI core, because on many systems the BIOS has already initialized > the PCI controller, and an additional delay would be unnecessary and > would only slow down boot. > > But it might make sense to add a delay in the PCI controller driver -- > it knows when the reset occurs and probably knows more about the > firmware environment. I haven't tried to analyze all of sec 6.6.1, > but this section: > > Unless Readiness Notifications mechanisms are used (see Section > 6.23), the Root Complex and/or system software must allow at least > 1.0 s after a Conventional Reset of a device, before it may > determine that a device which fails to return a Successful > Completion status for a valid Configuration Request is a broken > device. This period is independent of how quickly Link training > completes. > > Note: This delay is analogous to the Trhfa parameter specified for > PCI/PCI-X, and is intended to allow an adequate amount of time for > devices which require self initialization. > > makes it sound like a 1sec delay might be needed. That sounds like an > awful lot, but this device did take close to that amount of time. > > I don't care which way you go. You've already implemented the delay > in the v7 patch, and that's fine with me. > Thanks for your inputs Bjorn. I will have v8 patch which will have; factored out separate patch + retry implementation of v7 patch. > Bjorn
diff --git a/drivers/pci/host/pcie-iproc.c b/drivers/pci/host/pcie-iproc.c index c574863..a3edae4 100644 --- a/drivers/pci/host/pcie-iproc.c +++ b/drivers/pci/host/pcie-iproc.c @@ -68,6 +68,9 @@ #define APB_ERR_EN_SHIFT 0 #define APB_ERR_EN BIT(APB_ERR_EN_SHIFT) +#define CFG_RETRY_STATUS 0xffff0001 +#define CFG_RETRY_STATUS_TIMEOUT_US 500000 /* 500 milli-seconds. */ + /* derive the enum index of the outbound/inbound mapping registers */ #define MAP_REG(base_reg, index) ((base_reg) + (index) * 2) @@ -448,6 +451,89 @@ static inline void iproc_pcie_apb_err_disable(struct pci_bus *bus, } } +static unsigned int iproc_pcie_cfg_retry(void __iomem *cfg_data_p) +{ + int timeout = CFG_RETRY_STATUS_TIMEOUT_US; + unsigned int data; + + /* + * As per PCIe spec r3.1, sec 2.3.2, CRS Software + * Visibility only affects config read of the Vendor ID. + * For config write or any other config read the Root must + * automatically re-issue configuration request again as a + * new request. + * + * For config reads, this hardware returns CFG_RETRY_STATUS data when + * it receives a CRS completion for a config read, regardless of the + * address of the read or the CRS Software Visibility Enable bit. As a + * partial workaround for this, we retry in software any read that + * returns CFG_RETRY_STATUS. + */ + data = readl(cfg_data_p); + while (data == CFG_RETRY_STATUS && timeout--) { + udelay(1); + data = readl(cfg_data_p); + } + + if (data == CFG_RETRY_STATUS) + data = 0xffffffff; + + return data; +} + +static void __iomem *iproc_pcie_map_ep_cfg_reg(struct iproc_pcie *pcie, + unsigned int busno, + unsigned int slot, + unsigned int fn, + int where) +{ + u16 offset; + u32 val; + + /* EP device access */ + val = (busno << CFG_ADDR_BUS_NUM_SHIFT) | + (slot << CFG_ADDR_DEV_NUM_SHIFT) | + (fn << CFG_ADDR_FUNC_NUM_SHIFT) | + (where & CFG_ADDR_REG_NUM_MASK) | + (1 & CFG_ADDR_CFG_TYPE_MASK); + + iproc_pcie_write_reg(pcie, IPROC_PCIE_CFG_ADDR, val); + offset = iproc_pcie_reg_offset(pcie, IPROC_PCIE_CFG_DATA); + + if (iproc_pcie_reg_is_invalid(offset)) + return NULL; + + return (pcie->base + offset); +} + +static int iproc_pcie_config_read(struct pci_bus *bus, unsigned int devfn, + int where, int size, u32 *val) +{ + struct iproc_pcie *pcie = iproc_data(bus); + unsigned int slot = PCI_SLOT(devfn); + unsigned int fn = PCI_FUNC(devfn); + unsigned int busno = bus->number; + void __iomem *cfg_data_p; + u32 data; + + /* root complex access. */ + if (busno == 0) + return pci_generic_config_read32(bus, devfn, where, size, val); + + cfg_data_p = iproc_pcie_map_ep_cfg_reg(pcie, busno, slot, fn, where); + + if (!cfg_data_p) + return PCIBIOS_DEVICE_NOT_FOUND; + + data = iproc_pcie_cfg_retry(cfg_data_p); + + *val = data; + if (size <= 2) + *val = (data >> (8 * (where & 3))) & ((1 << (size * 8)) - 1); + + return PCIBIOS_SUCCESSFUL; +} + /** * Note access to the configuration registers are protected at the higher layer * by 'pci_lock' in drivers/pci/access.c @@ -459,7 +545,6 @@ static void __iomem *iproc_pcie_map_cfg_bus(struct iproc_pcie *pcie, { unsigned slot = PCI_SLOT(devfn); unsigned fn = PCI_FUNC(devfn); - u32 val; u16 offset; /* root complex access */ @@ -484,18 +569,7 @@ static void __iomem *iproc_pcie_map_cfg_bus(struct iproc_pcie *pcie, if (slot > 0) return NULL; - /* EP device access */ - val = (busno << CFG_ADDR_BUS_NUM_SHIFT) | - (slot << CFG_ADDR_DEV_NUM_SHIFT) | - (fn << CFG_ADDR_FUNC_NUM_SHIFT) | - (where & CFG_ADDR_REG_NUM_MASK) | - (1 & CFG_ADDR_CFG_TYPE_MASK); - iproc_pcie_write_reg(pcie, IPROC_PCIE_CFG_ADDR, val); - offset = iproc_pcie_reg_offset(pcie, IPROC_PCIE_CFG_DATA); - if (iproc_pcie_reg_is_invalid(offset)) - return NULL; - else - return (pcie->base + offset); + return iproc_pcie_map_ep_cfg_reg(pcie, busno, slot, fn, where); } static void __iomem *iproc_pcie_bus_map_cfg_bus(struct pci_bus *bus, @@ -554,8 +628,13 @@ static int iproc_pcie_config_read32(struct pci_bus *bus, unsigned int devfn, int where, int size, u32 *val) { int ret; + struct iproc_pcie *pcie = iproc_data(bus); iproc_pcie_apb_err_disable(bus, true); + if (pcie->type == IPROC_PCIE_PAXB_V2) + ret = iproc_pcie_config_read(bus, devfn, where, size, val); + else + ret = pci_generic_config_read32(bus, devfn, where, size, val); ret = pci_generic_config_read32(bus, devfn, where, size, val); iproc_pcie_apb_err_disable(bus, false);