Message ID | 1501861696-25767-2-git-send-email-oza.oza@broadcom.com (mailing list archive) |
---|---|
State | New, archived |
Delegated to: | Bjorn Helgaas |
Headers | show |
On Fri, Aug 04, 2017 at 09:18:15PM +0530, Oza Pawandeep wrote: > PCIe spec r3.1, sec 2.3.2 > If CRS software visibility is not enabled, the RC must reissue the > config request as a new request. > > - If CRS software visibility is enabled, > - for a config read of Vendor ID, the RC must return 0x0001 data > - for all other config reads/writes, the RC must reissue the > request > > iproc PCIe Controller spec: > 4.7.3.3. Retry Status On Configuration Cycle > Endpoints are allowed to generate retry status on configuration > cycles. In this case, the RC needs to re-issue the request. The IP > does not handle this because the number of configuration cycles needed > will probably be less than the total number of non-posted operations > needed. > > When a retry status is received on the User RX interface for a > configuration request that was sent on the User TX interface, > it will be indicated with a completion with the CMPL_STATUS field set > to 2=CRS, and the user will have to find the address and data values > and send a new transaction on the User TX interface. > When the internal configuration space returns a retry status during a > configuration cycle (user_cscfg = 1) on the Command/Status interface, > the pcie_cscrs will assert with the pcie_csack signal to indicate the > CRS status. > When the CRS Software Visibility Enable register in the Root Control > register is enabled, the IP will return the data value to 0x0001 for > the Vendor ID value and 0xffff (all 1’s) for the rest of the data in > the request for reads of offset 0 that return with CRS status. This > is true for both the User RX Interface and for the Command/Status > interface. When CRS Software Visibility is enabled, the CMPL_STATUS > field of the completion on the User RX Interface will not be 2=CRS and > the pcie_cscrs signal will not assert on the Command/Status interface. > > Note that, neither PCIe host bridge nor PCIe core re-issues the > request for any configuration offset. > > As a result of the fact, PCIe RC driver (sw) should take care of > retrying. > This patch fixes the problem, and attempts to read config space again > in case of PCIe code forwarding the CRS back to CPU. I think "fixes the problem" is a little bit of an overstatement. I think it covers up some cases of the problem, but if I understand correctly, we're still susceptible to data corruptions on both read and write: Per PCIe r3.1, sec 2.3.2, config requests that receive completions with Configuration Request Retry Status (CRS) should be reissued by the hardware *except* reads of the Vendor ID when CRS Software Visibility is enabled. This hardware never reissues configuration requests when it receives CRS completions. For config writes, there is no way for this driver to learn about CRS completions. A write that receives a CRS completion will not be retried and the data will be discarded. This is a potential data corruption. For config reads, this hardware returns CFG_RETRY_STATUS data when it receives a CRS completion for a config read, regardless of the address of the read or the CRS Software Visibility Enable bit. As a partial workaround for this, we retry in software any read that returns CFG_RETRY_STATUS. CFG_RETRY_STATUS could be valid data for other config registers. It is impossible to read such registers correctly because we can't tell whether the hardware (1) received a CRS completion and synthesized data of CFG_RETRY_STATUS or (2) received a successful completion with data of CFG_RETRY_STATUS. The only thing we can really do is return 0xffffffff data, which indicates a config read failure in the first case but is a data corruption in the second case. The write case is probably not that important because we have a similar situation on other systems. On other systems, the root complex automatically reissues config writes with CRS completions, but it may limit the number of retries. In that case, the failed write is probably logged as a Target Abort error, but nobody pays attention to that, so this isn't really much worse. > It implements iproc_pcie_config_read which gets called for Stingray, > Otherwise it falls back to PCI generic APIs. > > Signed-off-by: Oza Pawandeep <oza.oza@broadcom.com> > Reviewed-by: Ray Jui <ray.jui@broadcom.com> > Reviewed-by: Scott Branden <scott.branden@broadcom.com> > > diff --git a/drivers/pci/host/pcie-iproc.c b/drivers/pci/host/pcie-iproc.c > index 0f39bd2..583cee0 100644 > --- a/drivers/pci/host/pcie-iproc.c > +++ b/drivers/pci/host/pcie-iproc.c > @@ -68,6 +68,9 @@ > #define APB_ERR_EN_SHIFT 0 > #define APB_ERR_EN BIT(APB_ERR_EN_SHIFT) > > +#define CFG_RETRY_STATUS 0xffff0001 > +#define CFG_RETRY_STATUS_TIMEOUT_US 500000 /* 500 milli-seconds. */ > + > /* derive the enum index of the outbound/inbound mapping registers */ > #define MAP_REG(base_reg, index) ((base_reg) + (index) * 2) > > @@ -448,6 +451,66 @@ static inline void iproc_pcie_apb_err_disable(struct pci_bus *bus, > } > } > > +static int iproc_pcie_cfg_retry(void __iomem *cfg_data_p) > +{ > + int timeout = CFG_RETRY_STATUS_TIMEOUT_US; > + unsigned int ret; > + > + /* > + * As per PCIe spec r3.1, sec 2.3.2, CRS Software > + * Visibility only affects config read of the Vendor ID. > + * For config write or any other config read the Root must > + * automatically re-issue configuration request again as a > + * new request. > + * > + * Iproc based PCIe RC (hw) does not retry > + * request on its own, so handle it here. > + * Note that, AXI data 0xffff0001 returned by PAXB could be > + * valid data in configuration space, for e.g. BAR requesting > + * 64bit IO memory, so we retry till timeout. > + * And leave to be handled by upper layers. > + * > + * Also, iproc PCIe RC does not have any effect on > + * CRS Software visibility bit, irrespective if it > + * set or unset, the RC will never re-issue any configuration > + * access request. > + */ > + do { > + ret = readl(cfg_data_p); > + if (ret == CFG_RETRY_STATUS) > + udelay(1); > + else > + return PCIBIOS_SUCCESSFUL; > + } while (timeout--); > + > + return PCIBIOS_DEVICE_NOT_FOUND; > +} > + > +static void __iomem *iproc_pcie_map_ep_cfg_reg(struct iproc_pcie *pcie, > + unsigned int busno, > + unsigned int slot, > + unsigned int fn, > + int where) > +{ > + u16 offset; > + u32 val; > + > + /* EP device access */ > + val = (busno << CFG_ADDR_BUS_NUM_SHIFT) | > + (slot << CFG_ADDR_DEV_NUM_SHIFT) | > + (fn << CFG_ADDR_FUNC_NUM_SHIFT) | > + (where & CFG_ADDR_REG_NUM_MASK) | > + (1 & CFG_ADDR_CFG_TYPE_MASK); > + > + iproc_pcie_write_reg(pcie, IPROC_PCIE_CFG_ADDR, val); > + offset = iproc_pcie_reg_offset(pcie, IPROC_PCIE_CFG_DATA); > + > + if (iproc_pcie_reg_is_invalid(offset)) > + return NULL; > + > + return (pcie->base + offset); > +} This is a duplicate of most of the code in iproc_pcie_map_cfg_bus(). Can you use that directly, or factor this out somehow so it's not repeated? > + > /** > * Note access to the configuration registers are protected at the higher layer > * by 'pci_lock' in drivers/pci/access.c > @@ -499,13 +562,48 @@ static void __iomem *iproc_pcie_map_cfg_bus(struct pci_bus *bus, > return (pcie->base + offset); > } > > +static int iproc_pcie_config_read(struct pci_bus *bus, unsigned int devfn, > + int where, int size, u32 *val) > +{ > + struct iproc_pcie *pcie = iproc_data(bus); > + unsigned int slot = PCI_SLOT(devfn); > + unsigned int fn = PCI_FUNC(devfn); > + unsigned int busno = bus->number; > + void __iomem *cfg_data_p; > + int ret; > + > + /* root complex access. */ > + if (busno == 0) > + return pci_generic_config_read32(bus, devfn, where, size, val); > + > + cfg_data_p = iproc_pcie_map_ep_cfg_reg(pcie, busno, slot, fn, where); > + > + if (!cfg_data_p) > + return PCIBIOS_DEVICE_NOT_FOUND; > + > + ret = iproc_pcie_cfg_retry(cfg_data_p); > + if (ret) This relies on the fact that PCIBIOS_SUCCESSFUL == 0, which defeats the whole point of having a #define. You could test for PCIBIOS_SUCCESSFUL (but I think the restructuring below would be even better). > + return ret; > + > + *val = readl(cfg_data_p); This is an extra read. iproc_pcie_cfg_retry() read it once and got valid data (something other than CFG_RETRY_STATUS), returned PCIBIOS_SUCCESSFUL, and now you're reading it again. I think you should do something like this instead so you don't do the MMIO read any more times than necessary: static u32 iproc_pcie_cfg_retry(void __iomem *cfg_data_p) { u32 data; int timeout = CFG_RETRY_STATUS_TIMEOUT_US; data = readl(cfg_data_p); while (data == CFG_RETRY_STATUS && timeout--) { udelay(1); data = readl(cfg_data_p); } if (data == CFG_RETRY_STATUS) data = 0xffffffff; return data; } static int iproc_pcie_config_read(...) { u32 data; ... data = iproc_pcie_cfg_retry(cfg_data_p); if (size <= 2) *val = (data >> (8 * (where & 3))) & ((1 << (size * 8)) - 1); In the event of a timeout, this returns PCIBIOS_SUCCESSFUL and 0xffffffff data. That's what most other platforms do, and most callers of the PCI config accessors check for that data instead of checking the return code to see whether the access was successful. For example, pci_flr_wait() assumes that if a read of PCI_COMMAND returns ~0, it's because the device isn't ready yet, and we should wait and retry. > + > + if (size <= 2) > + *val = (*val >> (8 * (where & 3))) & ((1 << (size * 8)) - 1); > + > + return PCIBIOS_SUCCESSFUL; > +} > + > static int iproc_pcie_config_read32(struct pci_bus *bus, unsigned int devfn, > int where, int size, u32 *val) > { > int ret; > + struct iproc_pcie *pcie = iproc_data(bus); > > iproc_pcie_apb_err_disable(bus, true); > - ret = pci_generic_config_read32(bus, devfn, where, size, val); > + if (pcie->type == IPROC_PCIE_PAXB_V2) > + ret = iproc_pcie_config_read(bus, devfn, where, size, val); > + else > + ret = pci_generic_config_read32(bus, devfn, where, size, val); > iproc_pcie_apb_err_disable(bus, false); > > return ret; > -- > 1.9.1 >
On Sat, Aug 19, 2017 at 11:56 PM, Bjorn Helgaas <helgaas@kernel.org> wrote: > On Fri, Aug 04, 2017 at 09:18:15PM +0530, Oza Pawandeep wrote: >> PCIe spec r3.1, sec 2.3.2 >> If CRS software visibility is not enabled, the RC must reissue the >> config request as a new request. >> >> - If CRS software visibility is enabled, >> - for a config read of Vendor ID, the RC must return 0x0001 data >> - for all other config reads/writes, the RC must reissue the >> request >> >> iproc PCIe Controller spec: >> 4.7.3.3. Retry Status On Configuration Cycle >> Endpoints are allowed to generate retry status on configuration >> cycles. In this case, the RC needs to re-issue the request. The IP >> does not handle this because the number of configuration cycles needed >> will probably be less than the total number of non-posted operations >> needed. >> >> When a retry status is received on the User RX interface for a >> configuration request that was sent on the User TX interface, >> it will be indicated with a completion with the CMPL_STATUS field set >> to 2=CRS, and the user will have to find the address and data values >> and send a new transaction on the User TX interface. >> When the internal configuration space returns a retry status during a >> configuration cycle (user_cscfg = 1) on the Command/Status interface, >> the pcie_cscrs will assert with the pcie_csack signal to indicate the >> CRS status. >> When the CRS Software Visibility Enable register in the Root Control >> register is enabled, the IP will return the data value to 0x0001 for >> the Vendor ID value and 0xffff (all 1’s) for the rest of the data in >> the request for reads of offset 0 that return with CRS status. This >> is true for both the User RX Interface and for the Command/Status >> interface. When CRS Software Visibility is enabled, the CMPL_STATUS >> field of the completion on the User RX Interface will not be 2=CRS and >> the pcie_cscrs signal will not assert on the Command/Status interface. >> >> Note that, neither PCIe host bridge nor PCIe core re-issues the >> request for any configuration offset. >> >> As a result of the fact, PCIe RC driver (sw) should take care of >> retrying. >> This patch fixes the problem, and attempts to read config space again >> in case of PCIe code forwarding the CRS back to CPU. > > I think "fixes the problem" is a little bit of an overstatement. I > think it covers up some cases of the problem, but if I understand > correctly, we're still susceptible to data corruptions on both read > and write: > > Per PCIe r3.1, sec 2.3.2, config requests that receive completions > with Configuration Request Retry Status (CRS) should be reissued by > the hardware *except* reads of the Vendor ID when CRS Software > Visibility is enabled. > > This hardware never reissues configuration requests when it receives > CRS completions. > > For config writes, there is no way for this driver to learn about > CRS completions. A write that receives a CRS completion will not be > retried and the data will be discarded. This is a potential data > corruption. > > For config reads, this hardware returns CFG_RETRY_STATUS data when > it receives a CRS completion for a config read, regardless of the > address of the read or the CRS Software Visibility Enable bit. As a > partial workaround for this, we retry in software any read that > returns CFG_RETRY_STATUS. > > CFG_RETRY_STATUS could be valid data for other config registers. It > is impossible to read such registers correctly because we can't tell > whether the hardware (1) received a CRS completion and synthesized > data of CFG_RETRY_STATUS or (2) received a successful completion > with data of CFG_RETRY_STATUS. > this will be fixed in our SOC in next revision, we will have separate register in host bridge space to tell us, whether it has to be retried. so there will not be any data corruption. it will be minor change (instead of checking for CFG_RETRY_STATUS, we will directly check that register) so that will be clean. > The only thing we can really do is return 0xffffffff data, which > indicates a config read failure in the first case but is a data > corruption in the second case. Instead of returning PCIBIOS_DEVICE_NOT_FOUND, you want me to return 0xffffffff data, and I suppose that you have suggsted in the code at the end also. Is my understanding right on this comment ? And do you want me to add all the texts in commit description, which you have written above regarding config read and write handling ? instead of "fixes the problem" It will be quiet long commit description :), but that's okay. > > The write case is probably not that important because we have a > similar situation on other systems. On other systems, the root > complex automatically reissues config writes with CRS completions, but > it may limit the number of retries. In that case, the failed write is > probably logged as a Target Abort error, but nobody pays attention to > that, so this isn't really much worse. > yes, we will not be handling write. >> It implements iproc_pcie_config_read which gets called for Stingray, >> Otherwise it falls back to PCI generic APIs. >> >> Signed-off-by: Oza Pawandeep <oza.oza@broadcom.com> >> Reviewed-by: Ray Jui <ray.jui@broadcom.com> >> Reviewed-by: Scott Branden <scott.branden@broadcom.com> >> >> diff --git a/drivers/pci/host/pcie-iproc.c b/drivers/pci/host/pcie-iproc.c >> index 0f39bd2..583cee0 100644 >> --- a/drivers/pci/host/pcie-iproc.c >> +++ b/drivers/pci/host/pcie-iproc.c >> @@ -68,6 +68,9 @@ >> #define APB_ERR_EN_SHIFT 0 >> #define APB_ERR_EN BIT(APB_ERR_EN_SHIFT) >> >> +#define CFG_RETRY_STATUS 0xffff0001 >> +#define CFG_RETRY_STATUS_TIMEOUT_US 500000 /* 500 milli-seconds. */ >> + >> /* derive the enum index of the outbound/inbound mapping registers */ >> #define MAP_REG(base_reg, index) ((base_reg) + (index) * 2) >> >> @@ -448,6 +451,66 @@ static inline void iproc_pcie_apb_err_disable(struct pci_bus *bus, >> } >> } >> >> +static int iproc_pcie_cfg_retry(void __iomem *cfg_data_p) >> +{ >> + int timeout = CFG_RETRY_STATUS_TIMEOUT_US; >> + unsigned int ret; >> + >> + /* >> + * As per PCIe spec r3.1, sec 2.3.2, CRS Software >> + * Visibility only affects config read of the Vendor ID. >> + * For config write or any other config read the Root must >> + * automatically re-issue configuration request again as a >> + * new request. >> + * >> + * Iproc based PCIe RC (hw) does not retry >> + * request on its own, so handle it here. >> + * Note that, AXI data 0xffff0001 returned by PAXB could be >> + * valid data in configuration space, for e.g. BAR requesting >> + * 64bit IO memory, so we retry till timeout. >> + * And leave to be handled by upper layers. >> + * >> + * Also, iproc PCIe RC does not have any effect on >> + * CRS Software visibility bit, irrespective if it >> + * set or unset, the RC will never re-issue any configuration >> + * access request. >> + */ >> + do { >> + ret = readl(cfg_data_p); >> + if (ret == CFG_RETRY_STATUS) >> + udelay(1); >> + else >> + return PCIBIOS_SUCCESSFUL; >> + } while (timeout--); >> + >> + return PCIBIOS_DEVICE_NOT_FOUND; >> +} >> + >> +static void __iomem *iproc_pcie_map_ep_cfg_reg(struct iproc_pcie *pcie, >> + unsigned int busno, >> + unsigned int slot, >> + unsigned int fn, >> + int where) >> +{ >> + u16 offset; >> + u32 val; >> + >> + /* EP device access */ >> + val = (busno << CFG_ADDR_BUS_NUM_SHIFT) | >> + (slot << CFG_ADDR_DEV_NUM_SHIFT) | >> + (fn << CFG_ADDR_FUNC_NUM_SHIFT) | >> + (where & CFG_ADDR_REG_NUM_MASK) | >> + (1 & CFG_ADDR_CFG_TYPE_MASK); >> + >> + iproc_pcie_write_reg(pcie, IPROC_PCIE_CFG_ADDR, val); >> + offset = iproc_pcie_reg_offset(pcie, IPROC_PCIE_CFG_DATA); >> + >> + if (iproc_pcie_reg_is_invalid(offset)) >> + return NULL; >> + >> + return (pcie->base + offset); >> +} > > This is a duplicate of most of the code in iproc_pcie_map_cfg_bus(). > Can you use that directly, or factor this out somehow so it's not > repeated? this whole things has to be aligned to Lorenzo's patch, because I see his patches are in. > >> + >> /** >> * Note access to the configuration registers are protected at the higher layer >> * by 'pci_lock' in drivers/pci/access.c >> @@ -499,13 +562,48 @@ static void __iomem *iproc_pcie_map_cfg_bus(struct pci_bus *bus, >> return (pcie->base + offset); >> } >> >> +static int iproc_pcie_config_read(struct pci_bus *bus, unsigned int devfn, >> + int where, int size, u32 *val) >> +{ >> + struct iproc_pcie *pcie = iproc_data(bus); >> + unsigned int slot = PCI_SLOT(devfn); >> + unsigned int fn = PCI_FUNC(devfn); >> + unsigned int busno = bus->number; >> + void __iomem *cfg_data_p; >> + int ret; >> + >> + /* root complex access. */ >> + if (busno == 0) >> + return pci_generic_config_read32(bus, devfn, where, size, val); >> + >> + cfg_data_p = iproc_pcie_map_ep_cfg_reg(pcie, busno, slot, fn, where); >> + >> + if (!cfg_data_p) >> + return PCIBIOS_DEVICE_NOT_FOUND; >> + >> + ret = iproc_pcie_cfg_retry(cfg_data_p); >> + if (ret) > > This relies on the fact that PCIBIOS_SUCCESSFUL == 0, which defeats > the whole point of having a #define. You could test for > PCIBIOS_SUCCESSFUL (but I think the restructuring below would be even > better). > >> + return ret; >> + >> + *val = readl(cfg_data_p); > > This is an extra read. iproc_pcie_cfg_retry() read it once and got > valid data (something other than CFG_RETRY_STATUS), returned > PCIBIOS_SUCCESSFUL, and now you're reading it again. > > I think you should do something like this instead so you don't do the > MMIO read any more times than necessary: > > static u32 iproc_pcie_cfg_retry(void __iomem *cfg_data_p) > { > u32 data; > int timeout = CFG_RETRY_STATUS_TIMEOUT_US; > > data = readl(cfg_data_p); > while (data == CFG_RETRY_STATUS && timeout--) { > udelay(1); > data = readl(cfg_data_p); > } > > if (data == CFG_RETRY_STATUS) > data = 0xffffffff; > return data; > } > > static int iproc_pcie_config_read(...) > { > u32 data; > > ... > data = iproc_pcie_cfg_retry(cfg_data_p); > if (size <= 2) > *val = (data >> (8 * (where & 3))) & ((1 << (size * 8)) - 1); > > In the event of a timeout, this returns PCIBIOS_SUCCESSFUL and > 0xffffffff data. That's what most other platforms do, and most > callers of the PCI config accessors check for that data instead of > checking the return code to see whether the access was successful. > > For example, pci_flr_wait() assumes that if a read of PCI_COMMAND > returns ~0, it's because the device isn't ready yet, and we should > wait and retry. > >> + >> + if (size <= 2) >> + *val = (*val >> (8 * (where & 3))) & ((1 << (size * 8)) - 1); >> + >> + return PCIBIOS_SUCCESSFUL; >> +} >> + >> static int iproc_pcie_config_read32(struct pci_bus *bus, unsigned int devfn, >> int where, int size, u32 *val) >> { >> int ret; >> + struct iproc_pcie *pcie = iproc_data(bus); >> >> iproc_pcie_apb_err_disable(bus, true); >> - ret = pci_generic_config_read32(bus, devfn, where, size, val); >> + if (pcie->type == IPROC_PCIE_PAXB_V2) >> + ret = iproc_pcie_config_read(bus, devfn, where, size, val); >> + else >> + ret = pci_generic_config_read32(bus, devfn, where, size, val); >> iproc_pcie_apb_err_disable(bus, false); >> >> return ret; >> -- >> 1.9.1 >> ok I will take care of your comments. and change the code accordingly. and re-submit the patch-set again. Regards Oza.
On Sat, Aug 19, 2017 at 11:56 PM, Bjorn Helgaas <helgaas@kernel.org> wrote: > On Fri, Aug 04, 2017 at 09:18:15PM +0530, Oza Pawandeep wrote: >> PCIe spec r3.1, sec 2.3.2 >> If CRS software visibility is not enabled, the RC must reissue the >> config request as a new request. >> >> - If CRS software visibility is enabled, >> - for a config read of Vendor ID, the RC must return 0x0001 data >> - for all other config reads/writes, the RC must reissue the >> request >> >> iproc PCIe Controller spec: >> 4.7.3.3. Retry Status On Configuration Cycle >> Endpoints are allowed to generate retry status on configuration >> cycles. In this case, the RC needs to re-issue the request. The IP >> does not handle this because the number of configuration cycles needed >> will probably be less than the total number of non-posted operations >> needed. >> >> When a retry status is received on the User RX interface for a >> configuration request that was sent on the User TX interface, >> it will be indicated with a completion with the CMPL_STATUS field set >> to 2=CRS, and the user will have to find the address and data values >> and send a new transaction on the User TX interface. >> When the internal configuration space returns a retry status during a >> configuration cycle (user_cscfg = 1) on the Command/Status interface, >> the pcie_cscrs will assert with the pcie_csack signal to indicate the >> CRS status. >> When the CRS Software Visibility Enable register in the Root Control >> register is enabled, the IP will return the data value to 0x0001 for >> the Vendor ID value and 0xffff (all 1’s) for the rest of the data in >> the request for reads of offset 0 that return with CRS status. This >> is true for both the User RX Interface and for the Command/Status >> interface. When CRS Software Visibility is enabled, the CMPL_STATUS >> field of the completion on the User RX Interface will not be 2=CRS and >> the pcie_cscrs signal will not assert on the Command/Status interface. >> >> Note that, neither PCIe host bridge nor PCIe core re-issues the >> request for any configuration offset. >> >> As a result of the fact, PCIe RC driver (sw) should take care of >> retrying. >> This patch fixes the problem, and attempts to read config space again >> in case of PCIe code forwarding the CRS back to CPU. > > I think "fixes the problem" is a little bit of an overstatement. I > think it covers up some cases of the problem, but if I understand > correctly, we're still susceptible to data corruptions on both read > and write: > > Per PCIe r3.1, sec 2.3.2, config requests that receive completions > with Configuration Request Retry Status (CRS) should be reissued by > the hardware *except* reads of the Vendor ID when CRS Software > Visibility is enabled. > > This hardware never reissues configuration requests when it receives > CRS completions. > > For config writes, there is no way for this driver to learn about > CRS completions. A write that receives a CRS completion will not be > retried and the data will be discarded. This is a potential data > corruption. > > For config reads, this hardware returns CFG_RETRY_STATUS data when > it receives a CRS completion for a config read, regardless of the > address of the read or the CRS Software Visibility Enable bit. As a > partial workaround for this, we retry in software any read that > returns CFG_RETRY_STATUS. > > CFG_RETRY_STATUS could be valid data for other config registers. It > is impossible to read such registers correctly because we can't tell > whether the hardware (1) received a CRS completion and synthesized > data of CFG_RETRY_STATUS or (2) received a successful completion > with data of CFG_RETRY_STATUS. > > The only thing we can really do is return 0xffffffff data, which > indicates a config read failure in the first case but is a data > corruption in the second case. > > The write case is probably not that important because we have a > similar situation on other systems. On other systems, the root > complex automatically reissues config writes with CRS completions, but > it may limit the number of retries. In that case, the failed write is > probably logged as a Target Abort error, but nobody pays attention to > that, so this isn't really much worse. > >> It implements iproc_pcie_config_read which gets called for Stingray, >> Otherwise it falls back to PCI generic APIs. >> >> Signed-off-by: Oza Pawandeep <oza.oza@broadcom.com> >> Reviewed-by: Ray Jui <ray.jui@broadcom.com> >> Reviewed-by: Scott Branden <scott.branden@broadcom.com> >> >> diff --git a/drivers/pci/host/pcie-iproc.c b/drivers/pci/host/pcie-iproc.c >> index 0f39bd2..583cee0 100644 >> --- a/drivers/pci/host/pcie-iproc.c >> +++ b/drivers/pci/host/pcie-iproc.c >> @@ -68,6 +68,9 @@ >> #define APB_ERR_EN_SHIFT 0 >> #define APB_ERR_EN BIT(APB_ERR_EN_SHIFT) >> >> +#define CFG_RETRY_STATUS 0xffff0001 >> +#define CFG_RETRY_STATUS_TIMEOUT_US 500000 /* 500 milli-seconds. */ >> + >> /* derive the enum index of the outbound/inbound mapping registers */ >> #define MAP_REG(base_reg, index) ((base_reg) + (index) * 2) >> >> @@ -448,6 +451,66 @@ static inline void iproc_pcie_apb_err_disable(struct pci_bus *bus, >> } >> } >> >> +static int iproc_pcie_cfg_retry(void __iomem *cfg_data_p) >> +{ >> + int timeout = CFG_RETRY_STATUS_TIMEOUT_US; >> + unsigned int ret; >> + >> + /* >> + * As per PCIe spec r3.1, sec 2.3.2, CRS Software >> + * Visibility only affects config read of the Vendor ID. >> + * For config write or any other config read the Root must >> + * automatically re-issue configuration request again as a >> + * new request. >> + * >> + * Iproc based PCIe RC (hw) does not retry >> + * request on its own, so handle it here. >> + * Note that, AXI data 0xffff0001 returned by PAXB could be >> + * valid data in configuration space, for e.g. BAR requesting >> + * 64bit IO memory, so we retry till timeout. >> + * And leave to be handled by upper layers. >> + * >> + * Also, iproc PCIe RC does not have any effect on >> + * CRS Software visibility bit, irrespective if it >> + * set or unset, the RC will never re-issue any configuration >> + * access request. >> + */ >> + do { >> + ret = readl(cfg_data_p); >> + if (ret == CFG_RETRY_STATUS) >> + udelay(1); >> + else >> + return PCIBIOS_SUCCESSFUL; >> + } while (timeout--); >> + >> + return PCIBIOS_DEVICE_NOT_FOUND; >> +} >> + >> +static void __iomem *iproc_pcie_map_ep_cfg_reg(struct iproc_pcie *pcie, >> + unsigned int busno, >> + unsigned int slot, >> + unsigned int fn, >> + int where) >> +{ >> + u16 offset; >> + u32 val; >> + >> + /* EP device access */ >> + val = (busno << CFG_ADDR_BUS_NUM_SHIFT) | >> + (slot << CFG_ADDR_DEV_NUM_SHIFT) | >> + (fn << CFG_ADDR_FUNC_NUM_SHIFT) | >> + (where & CFG_ADDR_REG_NUM_MASK) | >> + (1 & CFG_ADDR_CFG_TYPE_MASK); >> + >> + iproc_pcie_write_reg(pcie, IPROC_PCIE_CFG_ADDR, val); >> + offset = iproc_pcie_reg_offset(pcie, IPROC_PCIE_CFG_DATA); >> + >> + if (iproc_pcie_reg_is_invalid(offset)) >> + return NULL; >> + >> + return (pcie->base + offset); >> +} > > This is a duplicate of most of the code in iproc_pcie_map_cfg_bus(). > Can you use that directly, or factor this out somehow so it's not > repeated? > >> + >> /** >> * Note access to the configuration registers are protected at the higher layer >> * by 'pci_lock' in drivers/pci/access.c >> @@ -499,13 +562,48 @@ static void __iomem *iproc_pcie_map_cfg_bus(struct pci_bus *bus, >> return (pcie->base + offset); >> } >> >> +static int iproc_pcie_config_read(struct pci_bus *bus, unsigned int devfn, >> + int where, int size, u32 *val) >> +{ >> + struct iproc_pcie *pcie = iproc_data(bus); >> + unsigned int slot = PCI_SLOT(devfn); >> + unsigned int fn = PCI_FUNC(devfn); >> + unsigned int busno = bus->number; >> + void __iomem *cfg_data_p; >> + int ret; >> + >> + /* root complex access. */ >> + if (busno == 0) >> + return pci_generic_config_read32(bus, devfn, where, size, val); >> + >> + cfg_data_p = iproc_pcie_map_ep_cfg_reg(pcie, busno, slot, fn, where); >> + >> + if (!cfg_data_p) >> + return PCIBIOS_DEVICE_NOT_FOUND; >> + >> + ret = iproc_pcie_cfg_retry(cfg_data_p); >> + if (ret) > > This relies on the fact that PCIBIOS_SUCCESSFUL == 0, which defeats > the whole point of having a #define. You could test for > PCIBIOS_SUCCESSFUL (but I think the restructuring below would be even > better). > >> + return ret; >> + >> + *val = readl(cfg_data_p); > > This is an extra read. iproc_pcie_cfg_retry() read it once and got > valid data (something other than CFG_RETRY_STATUS), returned > PCIBIOS_SUCCESSFUL, and now you're reading it again. > > I think you should do something like this instead so you don't do the > MMIO read any more times than necessary: > > static u32 iproc_pcie_cfg_retry(void __iomem *cfg_data_p) > { > u32 data; > int timeout = CFG_RETRY_STATUS_TIMEOUT_US; > > data = readl(cfg_data_p); > while (data == CFG_RETRY_STATUS && timeout--) { > udelay(1); > data = readl(cfg_data_p); > } > > if (data == CFG_RETRY_STATUS) > data = 0xffffffff; > return data; > } > > static int iproc_pcie_config_read(...) > { > u32 data; > > ... > data = iproc_pcie_cfg_retry(cfg_data_p); > if (size <= 2) > *val = (data >> (8 * (where & 3))) & ((1 << (size * 8)) - 1); > > In the event of a timeout, this returns PCIBIOS_SUCCESSFUL and > 0xffffffff data. That's what most other platforms do, and most > callers of the PCI config accessors check for that data instead of > checking the return code to see whether the access was successful. > I see one problem with this. we have Samsung NVMe which exposes 64-bit IO BAR. 0xffff0001. and if you see __pci_read_base which does following in sequence. pci_read_config_dword(dev, pos, &l); pci_write_config_dword(dev, pos, l | mask); pci_read_config_dword(dev, pos, &sz); pci_write_config_dword(dev, pos, l); returning 0xffffffff would not be correct in that case. even if callers retry, they will end up in a loop if caller retries. hence I was returning 0xffff0001, it is upto the upper layer to treat it as data or not. let me know if this sounds like a problem to you as well. so in my opinion returning 0xffffffff is not an option. > For example, pci_flr_wait() assumes that if a read of PCI_COMMAND > returns ~0, it's because the device isn't ready yet, and we should > wait and retry. > >> + >> + if (size <= 2) >> + *val = (*val >> (8 * (where & 3))) & ((1 << (size * 8)) - 1); >> + >> + return PCIBIOS_SUCCESSFUL; >> +} >> + >> static int iproc_pcie_config_read32(struct pci_bus *bus, unsigned int devfn, >> int where, int size, u32 *val) >> { >> int ret; >> + struct iproc_pcie *pcie = iproc_data(bus); >> >> iproc_pcie_apb_err_disable(bus, true); >> - ret = pci_generic_config_read32(bus, devfn, where, size, val); >> + if (pcie->type == IPROC_PCIE_PAXB_V2) >> + ret = iproc_pcie_config_read(bus, devfn, where, size, val); >> + else >> + ret = pci_generic_config_read32(bus, devfn, where, size, val); >> iproc_pcie_apb_err_disable(bus, false); >> >> return ret; >> -- >> 1.9.1 >>
On Sun, Aug 20, 2017 at 01:02:09AM +0530, Oza Oza wrote: > On Sat, Aug 19, 2017 at 11:56 PM, Bjorn Helgaas <helgaas@kernel.org> wrote: > > I think you should do something like this instead so you don't do the > > MMIO read any more times than necessary: > > > > static u32 iproc_pcie_cfg_retry(void __iomem *cfg_data_p) > > { > > u32 data; > > int timeout = CFG_RETRY_STATUS_TIMEOUT_US; > > > > data = readl(cfg_data_p); > > while (data == CFG_RETRY_STATUS && timeout--) { > > udelay(1); > > data = readl(cfg_data_p); > > } > > > > if (data == CFG_RETRY_STATUS) > > data = 0xffffffff; > > return data; > > } > > > > static int iproc_pcie_config_read(...) > > { > > u32 data; > > > > ... > > data = iproc_pcie_cfg_retry(cfg_data_p); > > if (size <= 2) > > *val = (data >> (8 * (where & 3))) & ((1 << (size * 8)) - 1); > > > > In the event of a timeout, this returns PCIBIOS_SUCCESSFUL and > > 0xffffffff data. That's what most other platforms do, and most > > callers of the PCI config accessors check for that data instead of > > checking the return code to see whether the access was successful. > > > I see one problem with this. > we have Samsung NVMe which exposes 64-bit IO BAR. > 0xffff0001. > > and if you see __pci_read_base which does following in sequence. > > pci_read_config_dword(dev, pos, &l); > pci_write_config_dword(dev, pos, l | mask); > pci_read_config_dword(dev, pos, &sz); > pci_write_config_dword(dev, pos, l); > > returning 0xffffffff would not be correct in that case. > even if callers retry, they will end up in a loop if caller retries. > > hence I was returning 0xffff0001, it is upto the upper layer to treat > it as data or not. In your patch, I don't think the upper layer will ever see 0xffff0001. iproc_pcie_config_read() only updates *data if iproc_pcie_cfg_retry() returns PCIBIOS_SUCCESSFUL. And iproc_pcie_cfg_retry() only returns PCIBIOS_SUCCESSFUL if it reads something other than 0xffff0001. Even if you *did* return 0xffff0001 to upper layers, I think we'd have a problem. Here's an example scenario. If we do a Function-Level Reset, pcie_flr() starts the reset, then calls pci_flr_wait() to wait until the device becomes ready again. pci_flr_wait() thinks that any PCI_COMMAND value other than 0xffffffff means the device is ready. If the device returns CRS status and you return 0xffff0001 when pci_flr_wait() reads PCI_COMMAND, it thinks that means the device is ready, but it's not. > let me know if this sounds like a problem to you as well. > > so in my opinion returning 0xffffffff is not an option. > > > For example, pci_flr_wait() assumes that if a read of PCI_COMMAND > > returns ~0, it's because the device isn't ready yet, and we should > > wait and retry.
On Sun, Aug 20, 2017 at 1:56 AM, Bjorn Helgaas <helgaas@kernel.org> wrote: > On Sun, Aug 20, 2017 at 01:02:09AM +0530, Oza Oza wrote: >> On Sat, Aug 19, 2017 at 11:56 PM, Bjorn Helgaas <helgaas@kernel.org> wrote: > >> > I think you should do something like this instead so you don't do the >> > MMIO read any more times than necessary: >> > >> > static u32 iproc_pcie_cfg_retry(void __iomem *cfg_data_p) >> > { >> > u32 data; >> > int timeout = CFG_RETRY_STATUS_TIMEOUT_US; >> > >> > data = readl(cfg_data_p); >> > while (data == CFG_RETRY_STATUS && timeout--) { >> > udelay(1); >> > data = readl(cfg_data_p); >> > } >> > >> > if (data == CFG_RETRY_STATUS) >> > data = 0xffffffff; >> > return data; >> > } >> > >> > static int iproc_pcie_config_read(...) >> > { >> > u32 data; >> > >> > ... >> > data = iproc_pcie_cfg_retry(cfg_data_p); >> > if (size <= 2) >> > *val = (data >> (8 * (where & 3))) & ((1 << (size * 8)) - 1); >> > >> > In the event of a timeout, this returns PCIBIOS_SUCCESSFUL and >> > 0xffffffff data. That's what most other platforms do, and most >> > callers of the PCI config accessors check for that data instead of >> > checking the return code to see whether the access was successful. >> > >> I see one problem with this. >> we have Samsung NVMe which exposes 64-bit IO BAR. >> 0xffff0001. >> >> and if you see __pci_read_base which does following in sequence. >> >> pci_read_config_dword(dev, pos, &l); >> pci_write_config_dword(dev, pos, l | mask); >> pci_read_config_dword(dev, pos, &sz); >> pci_write_config_dword(dev, pos, l); >> >> returning 0xffffffff would not be correct in that case. >> even if callers retry, they will end up in a loop if caller retries. >> >> hence I was returning 0xffff0001, it is upto the upper layer to treat >> it as data or not. > > In your patch, I don't think the upper layer will ever see 0xffff0001. > iproc_pcie_config_read() only updates *data if iproc_pcie_cfg_retry() > returns PCIBIOS_SUCCESSFUL. And iproc_pcie_cfg_retry() only returns > PCIBIOS_SUCCESSFUL if it reads something other than 0xffff0001. > > Even if you *did* return 0xffff0001 to upper layers, I think we'd have > a problem. Here's an example scenario. If we do a Function-Level > Reset, pcie_flr() starts the reset, then calls pci_flr_wait() to wait > until the device becomes ready again. pci_flr_wait() thinks that any > PCI_COMMAND value other than 0xffffffff means the device is ready. > > If the device returns CRS status and you return 0xffff0001 when > pci_flr_wait() reads PCI_COMMAND, it thinks that means the device is > ready, but it's not. > >> let me know if this sounds like a problem to you as well. >> >> so in my opinion returning 0xffffffff is not an option. >> >> > For example, pci_flr_wait() assumes that if a read of PCI_COMMAND >> > returns ~0, it's because the device isn't ready yet, and we should >> > wait and retry. yes that will be a problem if I return 0xffff0001. ok let me revise the patch by returning 0xffffffff as of now. Regards, Oza.
diff --git a/drivers/pci/host/pcie-iproc.c b/drivers/pci/host/pcie-iproc.c index 0f39bd2..583cee0 100644 --- a/drivers/pci/host/pcie-iproc.c +++ b/drivers/pci/host/pcie-iproc.c @@ -68,6 +68,9 @@ #define APB_ERR_EN_SHIFT 0 #define APB_ERR_EN BIT(APB_ERR_EN_SHIFT) +#define CFG_RETRY_STATUS 0xffff0001 +#define CFG_RETRY_STATUS_TIMEOUT_US 500000 /* 500 milli-seconds. */ + /* derive the enum index of the outbound/inbound mapping registers */ #define MAP_REG(base_reg, index) ((base_reg) + (index) * 2) @@ -448,6 +451,66 @@ static inline void iproc_pcie_apb_err_disable(struct pci_bus *bus, } } +static int iproc_pcie_cfg_retry(void __iomem *cfg_data_p) +{ + int timeout = CFG_RETRY_STATUS_TIMEOUT_US; + unsigned int ret; + + /* + * As per PCIe spec r3.1, sec 2.3.2, CRS Software + * Visibility only affects config read of the Vendor ID. + * For config write or any other config read the Root must + * automatically re-issue configuration request again as a + * new request. + * + * Iproc based PCIe RC (hw) does not retry + * request on its own, so handle it here. + * Note that, AXI data 0xffff0001 returned by PAXB could be + * valid data in configuration space, for e.g. BAR requesting + * 64bit IO memory, so we retry till timeout. + * And leave to be handled by upper layers. + * + * Also, iproc PCIe RC does not have any effect on + * CRS Software visibility bit, irrespective if it + * set or unset, the RC will never re-issue any configuration + * access request. + */ + do { + ret = readl(cfg_data_p); + if (ret == CFG_RETRY_STATUS) + udelay(1); + else + return PCIBIOS_SUCCESSFUL; + } while (timeout--); + + return PCIBIOS_DEVICE_NOT_FOUND; +} + +static void __iomem *iproc_pcie_map_ep_cfg_reg(struct iproc_pcie *pcie, + unsigned int busno, + unsigned int slot, + unsigned int fn, + int where) +{ + u16 offset; + u32 val; + + /* EP device access */ + val = (busno << CFG_ADDR_BUS_NUM_SHIFT) | + (slot << CFG_ADDR_DEV_NUM_SHIFT) | + (fn << CFG_ADDR_FUNC_NUM_SHIFT) | + (where & CFG_ADDR_REG_NUM_MASK) | + (1 & CFG_ADDR_CFG_TYPE_MASK); + + iproc_pcie_write_reg(pcie, IPROC_PCIE_CFG_ADDR, val); + offset = iproc_pcie_reg_offset(pcie, IPROC_PCIE_CFG_DATA); + + if (iproc_pcie_reg_is_invalid(offset)) + return NULL; + + return (pcie->base + offset); +} + /** * Note access to the configuration registers are protected at the higher layer * by 'pci_lock' in drivers/pci/access.c @@ -499,13 +562,48 @@ static void __iomem *iproc_pcie_map_cfg_bus(struct pci_bus *bus, return (pcie->base + offset); } +static int iproc_pcie_config_read(struct pci_bus *bus, unsigned int devfn, + int where, int size, u32 *val) +{ + struct iproc_pcie *pcie = iproc_data(bus); + unsigned int slot = PCI_SLOT(devfn); + unsigned int fn = PCI_FUNC(devfn); + unsigned int busno = bus->number; + void __iomem *cfg_data_p; + int ret; + + /* root complex access. */ + if (busno == 0) + return pci_generic_config_read32(bus, devfn, where, size, val); + + cfg_data_p = iproc_pcie_map_ep_cfg_reg(pcie, busno, slot, fn, where); + + if (!cfg_data_p) + return PCIBIOS_DEVICE_NOT_FOUND; + + ret = iproc_pcie_cfg_retry(cfg_data_p); + if (ret) + return ret; + + *val = readl(cfg_data_p); + + if (size <= 2) + *val = (*val >> (8 * (where & 3))) & ((1 << (size * 8)) - 1); + + return PCIBIOS_SUCCESSFUL; +} + static int iproc_pcie_config_read32(struct pci_bus *bus, unsigned int devfn, int where, int size, u32 *val) { int ret; + struct iproc_pcie *pcie = iproc_data(bus); iproc_pcie_apb_err_disable(bus, true); - ret = pci_generic_config_read32(bus, devfn, where, size, val); + if (pcie->type == IPROC_PCIE_PAXB_V2) + ret = iproc_pcie_config_read(bus, devfn, where, size, val); + else + ret = pci_generic_config_read32(bus, devfn, where, size, val); iproc_pcie_apb_err_disable(bus, false); return ret;