Message ID | 20230428223500.23337-4-jim2101024@gmail.com
---|---
State | New, archived
Series | PCI: brcmstb: Configure appropriate HW CLKREQ# mode
On Fri, Apr 28, 2023 at 06:34:57PM -0400, Jim Quinlan wrote:
> Since the STB PCIe HW will cause a CPU abort on a PCIe transaction
> completion timeout abort, we might as well extend the default timeout
> limit. Further, different devices and systems may requires a larger or
> smaller amount commensurate with their L1SS exit time, so the property
> "brcm,completion-timeout-us" may be used to set a custom timeout value.

s/requires/require/

AFAIK, other platforms do not tweak Configuration Timeout values based
on L1SS exit time. Why is brcm different?

Bjorn
On Sun, Apr 30, 2023 at 3:13 PM Bjorn Helgaas <helgaas@kernel.org> wrote:
>
> On Fri, Apr 28, 2023 at 06:34:57PM -0400, Jim Quinlan wrote:
> > Since the STB PCIe HW will cause a CPU abort on a PCIe transaction
> > completion timeout abort, we might as well extend the default timeout
> > limit. Further, different devices and systems may requires a larger or
> > smaller amount commensurate with their L1SS exit time, so the property
> > "brcm,completion-timeout-us" may be used to set a custom timeout value.
>
> s/requires/require/
>
> AFAIK, other platforms do not tweak Configuration Timeout values based
> on L1SS exit time. Why is brcm different?

Keep in mind that our Brcm PCIe HW signals a CPU abort on a PCIe
completion timeout. Other PCIe HW just returns 0xffffffff.

I've been maintaining this driver for over eight years or so and we've
done fine with the HW default completion timeout value. Only recently
has a major customer requested that this timeout value be changed, and
their reason was so they could avoid a CPU abort when using L1SS.

Now we could set this value to a big number for all cases and not
require "brcm,completion-timeout-us". I cannot see any downsides, other
than another customer coming along asking us to double the default or
lessen it.

But I'm certainly willing to do that -- would that be acceptable?

Regards,
Jim

>
> Bjorn
On Sun, Apr 30, 2023 at 05:24:26PM -0400, Jim Quinlan wrote:
> On Sun, Apr 30, 2023 at 3:13 PM Bjorn Helgaas <helgaas@kernel.org> wrote:
> > On Fri, Apr 28, 2023 at 06:34:57PM -0400, Jim Quinlan wrote:
> > > Since the STB PCIe HW will cause a CPU abort on a PCIe transaction
> > > completion timeout abort, we might as well extend the default timeout
> > > limit. Further, different devices and systems may requires a larger or
> > > smaller amount commensurate with their L1SS exit time, so the property
> > > "brcm,completion-timeout-us" may be used to set a custom timeout value.
> >
> > s/requires/require/
> >
> > AFAIK, other platforms do not tweak Configuration Timeout values based
> > on L1SS exit time. Why is brcm different?
>
> Keep in mind that our Brcm PCIe HW signals a CPU abort on a PCIe
> completion timeout. Other PCIe HW just returns 0xffffffff.

Most does, but I'm pretty sure there are other controllers used on
arm64 that signal CPU aborts, e.g., imx6q_pcie_abort_handler() seems
similar.

> I've been maintaining this driver for over eight years or so and we've
> done fine with the HW default completion timeout value.
> Only recently has a major customer requested that this timeout value
> be changed, and their reason was so they could
> avoid a CPU abort when using L1SS.
>
> Now we could set this value to a big number for all cases and not
> require "brcm,completion-timeout-us". I cannot see any
> downsides, other than another customer coming along asking us to
> double the default or lessen it.
>
> But I'm certainly willing to do that -- would that be acceptable?

That would be fine with me.

Bjorn
Hi Jim,

On 29.04.23 at 00:34, Jim Quinlan wrote:
> Since the STB PCIe HW will cause a CPU abort on a PCIe transaction
> completion timeout abort, we might as well extend the default timeout
> limit. Further, different devices and systems may requires a larger or
> smaller amount commensurate with their L1SS exit time, so the property
> "brcm,completion-timeout-us" may be used to set a custom timeout value.
>
> Tested-by: Florian Fainelli <f.fainelli@gmail.com>
> Signed-off-by: Jim Quinlan <jim2101024@gmail.com>
> ---
>  drivers/pci/controller/pcie-brcmstb.c | 30 +++++++++++++++++++++++++++
>  1 file changed, 30 insertions(+)
>
> diff --git a/drivers/pci/controller/pcie-brcmstb.c b/drivers/pci/controller/pcie-brcmstb.c
> index c4b076ea5180..c2cb683447ac 100644
> --- a/drivers/pci/controller/pcie-brcmstb.c
> +++ b/drivers/pci/controller/pcie-brcmstb.c
> @@ -1080,6 +1080,35 @@ static void brcm_config_clkreq(struct brcm_pcie *pcie)
>  	writel(clkreq_set, pcie->base + PCIE_MISC_HARD_PCIE_HARD_DEBUG);
>  }
>
> +static void brcm_config_completion_timeout(struct brcm_pcie *pcie)
> +{
> +	/* TIMEOUT register is two registers before RGR1_SW_INIT_1 */
> +	const char *fmt = "brcm,completion-timeout-us clamped to region [%u..%u]\n";
> +	const unsigned int REG_OFFSET = PCIE_RGR1_SW_INIT_1(pcie) - 8;
> +	const u32 timeout_us_min = 16;
> +	const u32 timeout_us_max = 19884107;
> +	u32 timeout_us = 1000000; /* Our default, 1 second */
> +	int rval, ret;
> +
> +	ret = of_property_read_u32(pcie->np, "brcm,completion-timeout-us",
> +				   &timeout_us);
> +	if (ret && ret != -EINVAL)
> +		dev_err(pcie->dev, "malformed/invalid 'brcm,completion-timeout-us'\n");
> +
> +	/* If needed, clamp the requested timeout value and issue a warning */
> +	if (timeout_us < timeout_us_min) {
> +		timeout_us = timeout_us_min;
> +		dev_warn(pcie->dev, fmt, timeout_us_min, timeout_us_max);
> +	} else if (timeout_us > timeout_us_max) {
> +		timeout_us = timeout_us_max;
> +		dev_warn(pcie->dev, fmt, timeout_us_min, timeout_us_max);
> +	}
> +
> +	/* Each unit in timeout register is 1/216,000,000 seconds */
> +	rval = 216 * timeout_us;
> +	writel(rval, pcie->base + REG_OFFSET);

I don't think "int" is the proper type for rval here.

> +}
> +
>  static int brcm_pcie_start_link(struct brcm_pcie *pcie)
>  {
>  	struct device *dev = pcie->dev;
> @@ -1110,6 +1139,7 @@ static int brcm_pcie_start_link(struct brcm_pcie *pcie)
>  		return -ENODEV;
>  	}
>
> +	brcm_config_completion_timeout(pcie);
>  	brcm_config_clkreq(pcie);
>
>  	if (pcie->gen)
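[Editorial note: a follow-up along the lines of Stefan's comment could be
as small as the illustrative hunk below (not part of the posted series).
At timeout_us_max, 216 * timeout_us is roughly 4.29e9, which no longer
fits in a signed 32-bit int, so the tick count should stay unsigned:]

-	int rval, ret;
+	u32 rval;	/* tick count can exceed INT_MAX near timeout_us_max */
+	int ret;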
On Mon, May 1, 2023 at 4:55 PM Lukas Wunner <lukas@wunner.de> wrote:
>
> On Sun, Apr 30, 2023 at 05:24:26PM -0400, Jim Quinlan wrote:
> > I've been maintaining this driver for over eight years or so and we've
> > done fine with the HW default completion timeout value.
> > Only recently has a major customer requested that this timeout value
> > be changed, and their reason was so they could
> > avoid a CPU abort when using L1SS.
> >
> > Now we could set this value to a big number for all cases and not
> > require "brcm,completion-timeout-us". I cannot see any
> > downsides, other than another customer coming along asking us to
> > double the default or lessen it.
>
> The Completion Timeout is configurable in the Device Control 2 Register
> (PCIe r2.1 sec 7.8.16). Your controller does expose that register:
>
>   DevCtl2: Completion Timeout: 50us to 50ms, TimeoutDis- LTR+ OBFF Disabled, ARIFwd-
>   AtomicOpsCtl: ReqEn- EgressBlck-
>
> Why does your controller allow configuration of the timeout in a
> proprietary register in addition to DevCtl2?
>
> If you make the Completion Timeout configurable, please do so in
> a spec-compliant way, i.e. via DevCtl2, so that it works for
> other products as well.
>
> If the proprietary register has advantages over DevCtl2 (e.g.
> finer granularity), perhaps you can divert accesses to the
> Completion Timeout Value in DevCtl2 to your proprietary register.

Hello Lukas & Bjorn,

Yes, you are right. I was under the (mis)understanding that writing this
register is (a) necessary for long L1SS periods and (b) overrides or
updates the value in the CFG register you mentioned. It turns out that
only (a) is true.

After communicating with the HW engineer, it appears this register is
for an internal bus timeout. Further, this bus timeout can occur when
there is no PCIe access involved. However, the error appears as a
completion timeout, which I am also investigating.

At any rate, I am dropping the "brcm,completion-timeout-us" property
completely and V5 will just set this internal timeout to a higher
default if "brcm,enable-l1ss" is present.

Thanks & regards,
Jim

>
> Thanks,
>
> Lukas
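[Editorial note: for reference, the spec-compliant route Lukas describes
can be taken with the kernel's standard capability accessors. A minimal
sketch follows; the function name and the chosen range encoding are
illustrative, and a real implementation would first check which ranges
the device advertises in Device Capabilities 2:]

#include <linux/pci.h>

/*
 * Sketch: program the Completion Timeout Value field (bits 3:0) of the
 * Device Control 2 register instead of a proprietary register.  The
 * encoding 0b0110 requests the 65 ms - 210 ms range (PCIe r2.1 sec
 * 7.8.16).  This is the alternative suggested in review, not what the
 * posted patch does.
 */
static int brcm_set_comp_timeout_65ms_210ms(struct pci_dev *pdev)
{
	return pcie_capability_clear_and_set_word(pdev, PCI_EXP_DEVCTL2,
						  PCI_EXP_DEVCTL2_COMP_TIMEOUT,
						  0x6);
}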
diff --git a/drivers/pci/controller/pcie-brcmstb.c b/drivers/pci/controller/pcie-brcmstb.c
index c4b076ea5180..c2cb683447ac 100644
--- a/drivers/pci/controller/pcie-brcmstb.c
+++ b/drivers/pci/controller/pcie-brcmstb.c
@@ -1080,6 +1080,35 @@ static void brcm_config_clkreq(struct brcm_pcie *pcie)
 	writel(clkreq_set, pcie->base + PCIE_MISC_HARD_PCIE_HARD_DEBUG);
 }
 
+static void brcm_config_completion_timeout(struct brcm_pcie *pcie)
+{
+	/* TIMEOUT register is two registers before RGR1_SW_INIT_1 */
+	const char *fmt = "brcm,completion-timeout-us clamped to region [%u..%u]\n";
+	const unsigned int REG_OFFSET = PCIE_RGR1_SW_INIT_1(pcie) - 8;
+	const u32 timeout_us_min = 16;
+	const u32 timeout_us_max = 19884107;
+	u32 timeout_us = 1000000; /* Our default, 1 second */
+	int rval, ret;
+
+	ret = of_property_read_u32(pcie->np, "brcm,completion-timeout-us",
+				   &timeout_us);
+	if (ret && ret != -EINVAL)
+		dev_err(pcie->dev, "malformed/invalid 'brcm,completion-timeout-us'\n");
+
+	/* If needed, clamp the requested timeout value and issue a warning */
+	if (timeout_us < timeout_us_min) {
+		timeout_us = timeout_us_min;
+		dev_warn(pcie->dev, fmt, timeout_us_min, timeout_us_max);
+	} else if (timeout_us > timeout_us_max) {
+		timeout_us = timeout_us_max;
+		dev_warn(pcie->dev, fmt, timeout_us_min, timeout_us_max);
+	}
+
+	/* Each unit in timeout register is 1/216,000,000 seconds */
+	rval = 216 * timeout_us;
+	writel(rval, pcie->base + REG_OFFSET);
+}
+
 static int brcm_pcie_start_link(struct brcm_pcie *pcie)
 {
 	struct device *dev = pcie->dev;
@@ -1110,6 +1139,7 @@ static int brcm_pcie_start_link(struct brcm_pcie *pcie)
 		return -ENODEV;
 	}
 
+	brcm_config_completion_timeout(pcie);
 	brcm_config_clkreq(pcie);
 
 	if (pcie->gen)
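[Editorial note: the otherwise magic-looking 19884107 upper clamp follows
from the tick rate and register width, assuming the TIMEOUT register is
32 bits wide as the patch implies. A standalone check:]

#include <stdint.h>
#include <stdio.h>

/*
 * At 216 ticks per microsecond (1/216,000,000 s per unit), the largest
 * timeout whose tick count still fits in a 32-bit register is
 * UINT32_MAX / 216 = 19884107 us (~19.9 s), matching timeout_us_max.
 */
int main(void)
{
	uint32_t max_us = UINT32_MAX / 216;	/* 19884107 */

	printf("max timeout: %u us -> %llu ticks (U32_MAX = %u)\n",
	       (unsigned)max_us, 216ULL * max_us, (unsigned)UINT32_MAX);
	return 0;
}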