Message ID | 20200727213045.2117855-1-ian.kumlien@gmail.com (mailing list archive) |
---|---|
State | Superseded, archived |
Delegated to: | Bjorn Helgaas |
Headers | show |
Series | Use maximum latency when determining L1 ASPM | expand |
On Mon, Jul 27, 2020 at 11:30:45PM +0200, Ian Kumlien wrote: > Currently we check the maximum latency of upstream and downstream > per link, not the maximum for the path > > This would work if all links have the same latency, but: > endpoint -> c -> b -> a -> root (in the order we walk the path) > > If c or b has the higest latency, it will not register > > Fix this by maintaining the maximum latency value for the path > > This change fixes a regression introduced by: > 66ff14e59e8a (PCI/ASPM: Allow ASPM on links to PCIe-to-PCI/PCI-X Bridges) Hi Ian, Sorry about the regression, and thank you very much for doing the hard work of debugging and fixing it! My guess is that 66ff14e59e8a isn't itself buggy, but it allowed ASPM to be enabled on a longer path, and we weren't computing the maximum latency correctly, so ASPM on that longer path exceeded the amount we could tolerate. If that's the case, 66ff14e59e8a probably just exposed an existing problem that could occur in other topologies even without 66ff14e59e8a. I'd like to work through this latency code with concrete examples. Can you collect the "sudo lspci -vv" output and attach it to an entry at https://bugzilla.kernel.org? If it's convenient, it would be really nice to compare it with similar output from before this patch. Bjorn > Signed-off-by: Ian Kumlien <ian.kumlien@gmail.com> > --- > drivers/pci/pcie/aspm.c | 5 +++-- > 1 file changed, 3 insertions(+), 2 deletions(-) > > diff --git a/drivers/pci/pcie/aspm.c b/drivers/pci/pcie/aspm.c > index b17e5ffd31b1..bd53fba7f382 100644 > --- a/drivers/pci/pcie/aspm.c > +++ b/drivers/pci/pcie/aspm.c > @@ -434,7 +434,7 @@ static void pcie_get_aspm_reg(struct pci_dev *pdev, > > static void pcie_aspm_check_latency(struct pci_dev *endpoint) > { > - u32 latency, l1_switch_latency = 0; > + u32 latency, l1_max_latency = 0, l1_switch_latency = 0; > struct aspm_latency *acceptable; > struct pcie_link_state *link; > > @@ -470,8 +470,9 @@ static void pcie_aspm_check_latency(struct pci_dev *endpoint) > * substate latencies (and hence do not do any check). > */ > latency = max_t(u32, link->latency_up.l1, link->latency_dw.l1); > + l1_max_latency = max_t(u32, latency, l1_max_latency); > if ((link->aspm_capable & ASPM_STATE_L1) && > - (latency + l1_switch_latency > acceptable->l1)) > + (l1_max_latency + l1_switch_latency > acceptable->l1)) > link->aspm_capable &= ~ASPM_STATE_L1; > l1_switch_latency += 1000; > > -- > 2.27.0 >
On Thu, Jul 30, 2020 at 12:28 AM Bjorn Helgaas <helgaas@kernel.org> wrote: > On Mon, Jul 27, 2020 at 11:30:45PM +0200, Ian Kumlien wrote: > > Currently we check the maximum latency of upstream and downstream > > per link, not the maximum for the path > > > > This would work if all links have the same latency, but: > > endpoint -> c -> b -> a -> root (in the order we walk the path) > > > > If c or b has the higest latency, it will not register > > > > Fix this by maintaining the maximum latency value for the path > > > > This change fixes a regression introduced by: > > 66ff14e59e8a (PCI/ASPM: Allow ASPM on links to PCIe-to-PCI/PCI-X Bridges) > > Hi Ian, > > Sorry about the regression, and thank you very much for doing the > hard work of debugging and fixing it! > > My guess is that 66ff14e59e8a isn't itself buggy, but it allowed ASPM > to be enabled on a longer path, and we weren't computing the maximum > latency correctly, so ASPM on that longer path exceeded the amount we > could tolerate. If that's the case, 66ff14e59e8a probably just > exposed an existing problem that could occur in other topologies even > without 66ff14e59e8a. I agree, this is why I didn't do fixes:. but it does fix a regression - and it's hard to say exactly what - I'd like it to go in to stable when accepted... But we can rewrite it anyway you see fit :) > I'd like to work through this latency code with concrete examples. > Can you collect the "sudo lspci -vv" output and attach it to an entry > at https://bugzilla.kernel.org? If it's convenient, it would be > really nice to compare it with similar output from before this patch. I can cut from the mails i had in the conversation with Alexander Duyck I submitted it against PCI, you can find it here: https://bugzilla.kernel.org/show_bug.cgi?id=208741 Still filling in the data > Bjorn > > > Signed-off-by: Ian Kumlien <ian.kumlien@gmail.com> > > --- > > drivers/pci/pcie/aspm.c | 5 +++-- > > 1 file changed, 3 insertions(+), 2 deletions(-) > > > > diff --git a/drivers/pci/pcie/aspm.c b/drivers/pci/pcie/aspm.c > > index b17e5ffd31b1..bd53fba7f382 100644 > > --- a/drivers/pci/pcie/aspm.c > > +++ b/drivers/pci/pcie/aspm.c > > @@ -434,7 +434,7 @@ static void pcie_get_aspm_reg(struct pci_dev *pdev, > > > > static void pcie_aspm_check_latency(struct pci_dev *endpoint) > > { > > - u32 latency, l1_switch_latency = 0; > > + u32 latency, l1_max_latency = 0, l1_switch_latency = 0; > > struct aspm_latency *acceptable; > > struct pcie_link_state *link; > > > > @@ -470,8 +470,9 @@ static void pcie_aspm_check_latency(struct pci_dev *endpoint) > > * substate latencies (and hence do not do any check). > > */ > > latency = max_t(u32, link->latency_up.l1, link->latency_dw.l1); > > + l1_max_latency = max_t(u32, latency, l1_max_latency); > > if ((link->aspm_capable & ASPM_STATE_L1) && > > - (latency + l1_switch_latency > acceptable->l1)) > > + (l1_max_latency + l1_switch_latency > acceptable->l1)) > > link->aspm_capable &= ~ASPM_STATE_L1; > > l1_switch_latency += 1000; > > > > -- > > 2.27.0 > >
diff --git a/drivers/pci/pcie/aspm.c b/drivers/pci/pcie/aspm.c index b17e5ffd31b1..bd53fba7f382 100644 --- a/drivers/pci/pcie/aspm.c +++ b/drivers/pci/pcie/aspm.c @@ -434,7 +434,7 @@ static void pcie_get_aspm_reg(struct pci_dev *pdev, static void pcie_aspm_check_latency(struct pci_dev *endpoint) { - u32 latency, l1_switch_latency = 0; + u32 latency, l1_max_latency = 0, l1_switch_latency = 0; struct aspm_latency *acceptable; struct pcie_link_state *link; @@ -470,8 +470,9 @@ static void pcie_aspm_check_latency(struct pci_dev *endpoint) * substate latencies (and hence do not do any check). */ latency = max_t(u32, link->latency_up.l1, link->latency_dw.l1); + l1_max_latency = max_t(u32, latency, l1_max_latency); if ((link->aspm_capable & ASPM_STATE_L1) && - (latency + l1_switch_latency > acceptable->l1)) + (l1_max_latency + l1_switch_latency > acceptable->l1)) link->aspm_capable &= ~ASPM_STATE_L1; l1_switch_latency += 1000;
Currently we check the maximum latency of upstream and downstream per link, not the maximum for the path This would work if all links have the same latency, but: endpoint -> c -> b -> a -> root (in the order we walk the path) If c or b has the higest latency, it will not register Fix this by maintaining the maximum latency value for the path This change fixes a regression introduced by: 66ff14e59e8a (PCI/ASPM: Allow ASPM on links to PCIe-to-PCI/PCI-X Bridges) Signed-off-by: Ian Kumlien <ian.kumlien@gmail.com> --- drivers/pci/pcie/aspm.c | 5 +++-- 1 file changed, 3 insertions(+), 2 deletions(-)