Message ID | 20210722025858.220064-1-alexander.deucher@amd.com |
---|---|
State | Accepted |
Delegated to: | Bjorn Helgaas |
Series | PCI: quirks: Quirk PCI d3hot delay for AMD xhci |
On Wed, Jul 21, 2021 at 10:58:58PM -0400, Alex Deucher wrote:
> From: Marcin Bachry <hegel666@gmail.com>
>
> Renoir needs a similar delay.
>
> [Alex: I talked to the AMD USB hardware team and the
> AMD Windows team and they are not aware of any HW
> errata or specific issues. The HW works fine in
> Windows. I was told Windows uses a rather generous
> default delay of 100ms for PCI state transitions.]
>
> Signed-off-by: Marcin Bachry <hegel666@gmail.com>
> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

Added stable tag and applied to pci/pm for v5.15, thanks!

> Cc: mario.limonciello@amd.com
> Cc: prike.liang@amd.com
> Cc: shyam-sundar.s-k@amd.com
> ---
>
> Bjorn,
>
> With the above comment in mind, would you consider this patch
> or would you prefer to increase the default timeout on Linux?
> 100ms seems a bit long and most devices seem to work within
> that limit. Additionally, this patch doesn't seem to be
> required on all AMD platforms with the affected USB controller,
> so I suspect the current timeout on Linux is probably about
> right. Increasing it seems to fix some of the marginal cases.
>
> Alex
On 7/29/2021 15:39, Bjorn Helgaas wrote:
> On Wed, Jul 21, 2021 at 10:58:58PM -0400, Alex Deucher wrote:
>> From: Marcin Bachry <hegel666@gmail.com>
>>
>> Renoir needs a similar delay.
>>
>> [Alex: I talked to the AMD USB hardware team and the
>> AMD Windows team and they are not aware of any HW
>> errata or specific issues. The HW works fine in
>> Windows. I was told Windows uses a rather generous
>> default delay of 100ms for PCI state transitions.]
>>
>> Signed-off-by: Marcin Bachry <hegel666@gmail.com>
>> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
>
> Added stable tag and applied to pci/pm for v5.15, thanks!

Thanks Bjorn!

Given how small/harmless this is and 5.14 isn't cut yet, any chance this
could still make one of the -rcX rather than wait for 5.14.1 instead?
On Thu, Jul 29, 2021 at 03:42:58PM -0500, Limonciello, Mario wrote:
> On 7/29/2021 15:39, Bjorn Helgaas wrote:
> > On Wed, Jul 21, 2021 at 10:58:58PM -0400, Alex Deucher wrote:
> > > From: Marcin Bachry <hegel666@gmail.com>
> > >
> > > Renoir needs a similar delay.
> > >
> > > [Alex: I talked to the AMD USB hardware team and the
> > > AMD Windows team and they are not aware of any HW
> > > errata or specific issues. The HW works fine in
> > > Windows. I was told Windows uses a rather generous
> > > default delay of 100ms for PCI state transitions.]
> > >
> > > Signed-off-by: Marcin Bachry <hegel666@gmail.com>
> > > Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
> >
> > Added stable tag and applied to pci/pm for v5.15, thanks!
>
> Thanks Bjorn!
>
> Given how small/harmless this is and 5.14 isn't cut yet, any chance this
> could still make one of the -rcX rather than wait for 5.14.1 instead?

Done.

What's the rest of the story here? Are we working around a defect in
these XHCI controllers? A defect in Linux? Obviously nobody wants to
have to add a quirk for every new Device ID. It's not like this
should be hard to figure out for your hardware guys in the lab, and if
it turns out to be a Linux problem, we should fix it so everybody
benefits.
On 7/29/2021 16:06, Bjorn Helgaas wrote:
> On Thu, Jul 29, 2021 at 03:42:58PM -0500, Limonciello, Mario wrote:
>> On 7/29/2021 15:39, Bjorn Helgaas wrote:
>>> On Wed, Jul 21, 2021 at 10:58:58PM -0400, Alex Deucher wrote:
>>>> From: Marcin Bachry <hegel666@gmail.com>
>>>>
>>>> Renoir needs a similar delay.
>>>>
>>>> [Alex: I talked to the AMD USB hardware team and the
>>>> AMD Windows team and they are not aware of any HW
>>>> errata or specific issues. The HW works fine in
>>>> Windows. I was told Windows uses a rather generous
>>>> default delay of 100ms for PCI state transitions.]
>>>>
>>>> Signed-off-by: Marcin Bachry <hegel666@gmail.com>
>>>> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
>>>
>>> Added stable tag and applied to pci/pm for v5.15, thanks!
>>
>> Thanks Bjorn!
>>
>> Given how small/harmless this is and 5.14 isn't cut yet, any chance this
>> could still make one of the -rcX rather than wait for 5.14.1 instead?
>
> Done.

Thanks!

>
> What's the rest of the story here? Are we working around a defect in
> these XHCI controllers? A defect in Linux? Obviously nobody wants to
> have to add a quirk for every new Device ID. It's not like this
> should be hard to figure out for your hardware guys in the lab, and if
> it turns out to be a Linux problem, we should fix it so everybody
> benefits.
>

Maybe you missed the embedded message from Alex above. We had a discussion
with our internal team that works with Windows on this, and they told us the
default delay is significantly more generous on Windows.
On Thu, Jul 29, 2021 at 04:09:50PM -0500, Limonciello, Mario wrote:
> On 7/29/2021 16:06, Bjorn Helgaas wrote:
> > On Thu, Jul 29, 2021 at 03:42:58PM -0500, Limonciello, Mario wrote:
> > > On 7/29/2021 15:39, Bjorn Helgaas wrote:
> > > > On Wed, Jul 21, 2021 at 10:58:58PM -0400, Alex Deucher wrote:
> > > > > From: Marcin Bachry <hegel666@gmail.com>
> > > > >
> > > > > Renoir needs a similar delay.
> > > > >
> > > > > [Alex: I talked to the AMD USB hardware team and the
> > > > > AMD Windows team and they are not aware of any HW
> > > > > errata or specific issues. The HW works fine in
> > > > > Windows. I was told Windows uses a rather generous
> > > > > default delay of 100ms for PCI state transitions.]
> > > > >
> > > > > Signed-off-by: Marcin Bachry <hegel666@gmail.com>
> > > > > Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
> > > >
> > > > Added stable tag and applied to pci/pm for v5.15, thanks!
> > >
> > > Thanks Bjorn!
> > >
> > > Given how small/harmless this is and 5.14 isn't cut yet, any chance this
> > > could still make one of the -rcX rather than wait for 5.14.1 instead?
> >
> > Done.
>
> Thanks!
>
> > What's the rest of the story here? Are we working around a defect in
> > these XHCI controllers? A defect in Linux? Obviously nobody wants to
> > have to add a quirk for every new Device ID. It's not like this
> > should be hard to figure out for your hardware guys in the lab, and if
> > it turns out to be a Linux problem, we should fix it so everybody
> > benefits.
>
> Maybe you missed the embedded message from Alex above. We had a discussion
> with our internal team that works with Windows on this, and they told us the
> default delay is significantly more generous on Windows.

I did see Alex's message, but it didn't answer the question of whether
this is a hardware defect or a Linux defect. "It works fine in
Windows" doesn't mean the hardware conforms to the spec.

PCIe r5.0, sec 5.3.1.4 says "... System Software must allow a minimum
recovery time following a D3Hot → D0 transition of at least 10 ms (see
Section 7.9.17), prior to accessing the Function."

If the hardware isn't ready in 10ms, I'd claim that's a hardware
defect.

If Linux isn't waiting the 10ms, I'd claim that's a Linux defect.

If things work by waiting 100ms, that's nice, but what's the point of
specs if we have to increase the time and penalize everybody just to
accommodate some oddball device?
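Bjorn's 10 ms argument can be illustrated with a small user-space model, assuming the spec minimum from PCIe r5.0 sec 5.3.1.4 acts as the default wait and a quirk may only request a longer per-device value. `struct model_pci_dev`, `D3HOT_SPEC_MIN_MS`, `model_d3hot_wait_ms()` and `model_d3hot_penalty_ms()` are hypothetical names for this sketch; the real Linux PCI core logic differs in detail.

```c
/*
 * Sketch only (hypothetical names): models how a D3hot -> D0 wait
 * could be chosen. PCIe r5.0 sec 5.3.1.4 requires at least 10 ms
 * before software accesses the Function; a quirk may ask for more.
 */
#define D3HOT_SPEC_MIN_MS 10u

/* Hypothetical stand-in for a PCI device (not the kernel's struct pci_dev). */
struct model_pci_dev {
	unsigned short vendor;
	unsigned short device;
	unsigned int d3hot_delay_ms; /* per-device request; 0 = none */
};

/* Wait the spec minimum, or longer if the device asked for it. */
unsigned int model_d3hot_wait_ms(const struct model_pci_dev *dev)
{
	if (dev->d3hot_delay_ms > D3HOT_SPEC_MIN_MS)
		return dev->d3hot_delay_ms;
	return D3HOT_SPEC_MIN_MS;
}

/* Extra time, beyond the spec minimum, spent on one D3hot -> D0 transition. */
unsigned int model_d3hot_penalty_ms(const struct model_pci_dev *dev)
{
	return model_d3hot_wait_ms(dev) - D3HOT_SPEC_MIN_MS;
}
```

For a device that requests 100 ms (the Windows default Alex was told about), `model_d3hot_penalty_ms()` returns 90: 90 ms of extra wait per D3hot → D0 transition compared with the spec minimum, while a device with no request pays nothing extra. That per-transition cost is what Bjorn does not want to impose on every device.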
On Thu, Jul 29, 2021 at 04:30:28PM -0500, Bjorn Helgaas wrote:
> On Thu, Jul 29, 2021 at 04:09:50PM -0500, Limonciello, Mario wrote:
> > On 7/29/2021 16:06, Bjorn Helgaas wrote:
> > > On Thu, Jul 29, 2021 at 03:42:58PM -0500, Limonciello, Mario wrote:
> > > > On 7/29/2021 15:39, Bjorn Helgaas wrote:
> > > > > On Wed, Jul 21, 2021 at 10:58:58PM -0400, Alex Deucher wrote:
> > > > > > From: Marcin Bachry <hegel666@gmail.com>
> > > > > >
> > > > > > Renoir needs a similar delay.
> > > > > >
> > > > > > [Alex: I talked to the AMD USB hardware team and the
> > > > > > AMD Windows team and they are not aware of any HW
> > > > > > errata or specific issues. The HW works fine in
> > > > > > Windows. I was told Windows uses a rather generous
> > > > > > default delay of 100ms for PCI state transitions.]
> > > > > >
> > > > > > Signed-off-by: Marcin Bachry <hegel666@gmail.com>
> > > > > > Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
> > > > >
> > > > > Added stable tag and applied to pci/pm for v5.15, thanks!
> > > >
> > > > Thanks Bjorn!
> > > >
> > > > Given how small/harmless this is and 5.14 isn't cut yet, any chance this
> > > > could still make one of the -rcX rather than wait for 5.14.1 instead?
> > >
> > > Done.
> >
> > Thanks!
> >
> > > What's the rest of the story here? Are we working around a defect in
> > > these XHCI controllers? A defect in Linux? Obviously nobody wants to
> > > have to add a quirk for every new Device ID. It's not like this
> > > should be hard to figure out for your hardware guys in the lab, and if
> > > it turns out to be a Linux problem, we should fix it so everybody
> > > benefits.
> >
> > Maybe you missed the embedded message from Alex above. We had a discussion
> > with our internal team that works with Windows on this, and they told us the
> > default delay is significantly more generous on Windows.
>
> I did see Alex's message, but it didn't answer the question of whether
> this is a hardware defect or a Linux defect. "It works fine in
> Windows" doesn't mean the hardware conforms to the spec.
>
> PCIe r5.0, sec 5.3.1.4 says "... System Software must allow a minimum
> recovery time following a D3Hot → D0 transition of at least 10 ms (see
> Section 7.9.17), prior to accessing the Function."
>
> If the hardware isn't ready in 10ms, I'd claim that's a hardware
> defect.
>
> If Linux isn't waiting the 10ms, I'd claim that's a Linux defect.
>
> If things work by waiting 100ms, that's nice, but what's the point of
> specs if we have to increase the time and penalize everybody just to
> accommodate some oddball device?

10ms after hitting "send" it occurred to me that since all of these
quirks are for AMD devices, we could just make the quirk generic so we
wait 100ms for *all* AMD devices. Then AMD boxes would resume a little
slower than everybody else, but some of the maintenance burden would go
away.

I'm only half joking, and I would take that patch if you sent it.

Bjorn
[Public]

> -----Original Message-----
> From: Bjorn Helgaas <helgaas@kernel.org>
> Sent: Thursday, July 29, 2021 5:34 PM
> To: Limonciello, Mario <Mario.Limonciello@amd.com>
> Cc: Deucher, Alexander <Alexander.Deucher@amd.com>;
> bhelgaas@google.com; linux-pci@vger.kernel.org; Marcin Bachry
> <hegel666@gmail.com>; Liang, Prike <Prike.Liang@amd.com>;
> S-k, Shyam-sundar <Shyam-sundar.S-k@amd.com>
> Subject: Re: [PATCH] PCI: quirks: Quirk PCI d3hot delay for AMD xhci
>
> On Thu, Jul 29, 2021 at 04:30:28PM -0500, Bjorn Helgaas wrote:
> > On Thu, Jul 29, 2021 at 04:09:50PM -0500, Limonciello, Mario wrote:
> > > On 7/29/2021 16:06, Bjorn Helgaas wrote:
> > > > On Thu, Jul 29, 2021 at 03:42:58PM -0500, Limonciello, Mario wrote:
> > > > > On 7/29/2021 15:39, Bjorn Helgaas wrote:
> > > > > > On Wed, Jul 21, 2021 at 10:58:58PM -0400, Alex Deucher wrote:
> > > > > > > From: Marcin Bachry <hegel666@gmail.com>
> > > > > > >
> > > > > > > Renoir needs a similar delay.
> > > > > > >
> > > > > > > [Alex: I talked to the AMD USB hardware team and the
> > > > > > > AMD Windows team and they are not aware of any HW
> > > > > > > errata or specific issues. The HW works fine in
> > > > > > > Windows. I was told Windows uses a rather generous
> > > > > > > default delay of 100ms for PCI state transitions.]
> > > > > > >
> > > > > > > Signed-off-by: Marcin Bachry <hegel666@gmail.com>
> > > > > > > Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
> > > > > >
> > > > > > Added stable tag and applied to pci/pm for v5.15, thanks!
> > > > >
> > > > > Thanks Bjorn!
> > > > >
> > > > > Given how small/harmless this is and 5.14 isn't cut yet, any
> > > > > chance this could still make one of the -rcX rather than wait for
> > > > > 5.14.1 instead?
> > > >
> > > > Done.
> > >
> > > Thanks!
> > >
> > > > What's the rest of the story here? Are we working around a
> > > > defect in these XHCI controllers? A defect in Linux? Obviously
> > > > nobody wants to have to add a quirk for every new Device ID. It's
> > > > not like this should be hard to figure out for your hardware guys
> > > > in the lab, and if it turns out to be a Linux problem, we should
> > > > fix it so everybody benefits.
> > >
> > > Maybe you missed the embedded message from Alex above. We had a
> > > discussion with our internal team that works with Windows on this,
> > > and they told us the default delay is significantly more generous on
> > > Windows.
> >
> > I did see Alex's message, but it didn't answer the question of whether
> > this is a hardware defect or a Linux defect. "It works fine in
> > Windows" doesn't mean the hardware conforms to the spec.
> >
> > PCIe r5.0, sec 5.3.1.4 says "... System Software must allow a minimum
> > recovery time following a D3Hot → D0 transition of at least 10 ms (see
> > Section 7.9.17), prior to accessing the Function."
> >
> > If the hardware isn't ready in 10ms, I'd claim that's a hardware
> > defect.
> >
> > If Linux isn't waiting the 10ms, I'd claim that's a Linux defect.
> >
> > If things work by waiting 100ms, that's nice, but what's the point of
> > specs if we have to increase the time and penalize everybody just to
> > accommodate some oddball device?
>
> 10ms after hitting "send" it occurred to me that since all of these quirks are
> for AMD devices, we could just make the quirk generic so we wait 100ms for
> *all* AMD devices. Then AMD boxes would resume a little slower than
> everybody else, but some of the maintenance burden would go away.
>

We probably only need a slight increase. As I said in the comment on the
patch, it seems to affect only a small percentage of boards. For the most
part, 10ms seems to be fine. It's more of a corner case, maybe specific to
certain platforms. It doesn't show up in silicon validation on our
reference boards, and presumably doesn't show up in Windows due to the
increased timeout.

I'll keep this in mind on the next platform, and I'll consider a patch to
generically increase the timeout for AMD if it proves to be an issue in
the wild again. So far our upcoming platforms (at least our internal
engineering platforms) don't exhibit this. That said, I don't recall us
seeing this issue on any of our reference platforms in the past.

Thanks,

Alex

> I'm only half joking, and I would take that patch if you sent it.
>
> Bjorn
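Bjorn's half-joking alternative, and the generic AMD timeout increase Alex says he would consider, both amount to replacing per-Device-ID entries with one vendor-wide entry. The following is a minimal user-space model of that idea, assuming a fixup table whose device field accepts a wildcard. It is not kernel code: `MODEL_ANY_ID` stands in for the kernel's `PCI_ANY_ID`, 0x1022 is AMD's PCI vendor ID, and the 100 ms value follows Bjorn's message rather than a measured requirement (Alex expects a slight increase to be enough). All `model_*` names are hypothetical.

```c
#include <stddef.h>

#define MODEL_VENDOR_AMD 0x1022u /* AMD's PCI vendor ID */
#define MODEL_ANY_ID 0xffffu     /* hypothetical wildcard, in the spirit of PCI_ANY_ID */

/* Hypothetical stand-in for a PCI device. */
struct model_pci_dev {
	unsigned short vendor;
	unsigned short device;
	unsigned int d3hot_delay_ms;
};

/* Hypothetical fixup table entry: match on vendor/device, then run hook. */
struct model_fixup {
	unsigned short vendor;
	unsigned short device; /* MODEL_ANY_ID matches any device */
	void (*hook)(struct model_pci_dev *dev);
};

/* Raise (never lower) the D3hot delay to 100 ms. */
static void model_quirk_amd_d3hot_100ms(struct model_pci_dev *dev)
{
	if (dev->d3hot_delay_ms < 100)
		dev->d3hot_delay_ms = 100;
}

/* A single entry covers every AMD device, present and future. */
static const struct model_fixup model_fixups[] = {
	{ MODEL_VENDOR_AMD, MODEL_ANY_ID, model_quirk_amd_d3hot_100ms },
};

/* Run every hook whose vendor matches and whose device matches or is the wildcard. */
void model_apply_fixups(struct model_pci_dev *dev)
{
	size_t n = sizeof(model_fixups) / sizeof(model_fixups[0]);

	for (size_t i = 0; i < n; i++) {
		const struct model_fixup *f = &model_fixups[i];

		if (f->vendor == dev->vendor &&
		    (f->device == MODEL_ANY_ID || f->device == dev->device))
			f->hook(dev);
	}
}
```

The trade-off Bjorn describes is visible here: the table never grows with new Device IDs, but every AMD device, including ones that were fine with 10 ms, now waits 100 ms.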
diff --git a/drivers/pci/quirks.c b/drivers/pci/quirks.c
index 22b2bb1109c9..dea10d62d5b9 100644
--- a/drivers/pci/quirks.c
+++ b/drivers/pci/quirks.c
@@ -1899,6 +1899,7 @@ static void quirk_ryzen_xhci_d3hot(struct pci_dev *dev)
 }
 DECLARE_PCI_FIXUP_FINAL(PCI_VENDOR_ID_AMD, 0x15e0, quirk_ryzen_xhci_d3hot);
 DECLARE_PCI_FIXUP_FINAL(PCI_VENDOR_ID_AMD, 0x15e1, quirk_ryzen_xhci_d3hot);
+DECLARE_PCI_FIXUP_FINAL(PCI_VENDOR_ID_AMD, 0x1639, quirk_ryzen_xhci_d3hot);
 
 #ifdef CONFIG_X86_IO_APIC
 static int dmi_disable_ioapicreroute(const struct dmi_system_id *d)
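The patch only adds a third `DECLARE_PCI_FIXUP_FINAL()` entry; the body of `quirk_ryzen_xhci_d3hot()` lies outside the hunk. The following rough user-space model shows how such per-ID entries behave. It assumes the fixup raises the device's D3hot delay to 20 ms, which is an assumption for this sketch (check `drivers/pci/quirks.c` for the actual value). All `model_*` names are hypothetical, and 0x1022 is AMD's PCI vendor ID.

```c
#include <stddef.h>

#define MODEL_VENDOR_AMD 0x1022u          /* AMD's PCI vendor ID */
#define MODEL_RYZEN_XHCI_DELAY_MS 20u     /* assumed value, not taken from the hunk */

/* Hypothetical stand-in for a PCI device. */
struct model_pci_dev {
	unsigned short vendor;
	unsigned short device;
	unsigned int d3hot_delay_ms;
};

/* Hypothetical fixup table entry with an exact vendor/device match. */
struct model_fixup {
	unsigned short vendor;
	unsigned short device;
	void (*hook)(struct model_pci_dev *dev);
};

/* Assumed behaviour of quirk_ryzen_xhci_d3hot(): lengthen the delay, never shorten it. */
static void model_quirk_ryzen_xhci_d3hot(struct model_pci_dev *dev)
{
	if (dev->d3hot_delay_ms < MODEL_RYZEN_XHCI_DELAY_MS)
		dev->d3hot_delay_ms = MODEL_RYZEN_XHCI_DELAY_MS;
}

/* One entry per DECLARE_PCI_FIXUP_FINAL() line in the hunk. */
static const struct model_fixup model_fixups[] = {
	{ MODEL_VENDOR_AMD, 0x15e0, model_quirk_ryzen_xhci_d3hot },
	{ MODEL_VENDOR_AMD, 0x15e1, model_quirk_ryzen_xhci_d3hot },
	{ MODEL_VENDOR_AMD, 0x1639, model_quirk_ryzen_xhci_d3hot }, /* added by this patch */
};

/* Run the hook only for an exact vendor and Device ID match. */
void model_apply_fixups(struct model_pci_dev *dev)
{
	size_t n = sizeof(model_fixups) / sizeof(model_fixups[0]);

	for (size_t i = 0; i < n; i++) {
		if (model_fixups[i].vendor == dev->vendor &&
		    model_fixups[i].device == dev->device)
			model_fixups[i].hook(dev);
	}
}
```

Under this model, a Renoir xHCI controller (Device ID 0x1639) starting from a 10 ms default ends up at 20 ms, while an AMD device whose ID is not in the table keeps 10 ms. This is why, as Bjorn points out, each newly affected Device ID needs its own line.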