Message ID | 20190122064544.27426-1-kai.heng.feng@canonical.com (mailing list archive) |
---|---|
State | Not Applicable, archived |
Series | PCI / ACPI: Don't clear pme_poll on device that has unreliable ACPI wake |
> On Jan 22, 2019, at 14:45, Kai-Heng Feng <kai.heng.feng@canonical.com> wrote:
>
> There are some e1000e devices can only be woken up from D3 one time, by
> plugging ethernet cable. Subsequent cable plugging does set PME bit
> correctly, but it still doesn't get woken up.
>
> Since e1000e connects to the root complex directly, we rely on ACPI to
> wake it up. In this case, the GPE from _PRW only works once and stops
> working after that.
>
> So introduce a new PCI quirk, to avoid clearing pme_poll flag for buggy
> platform firmwares that have unreliable GPE wake.

Forgot this:
Bugzilla: https://bugzilla.kernel.org/show_bug.cgi?id=202181

>
> Signed-off-by: Kai-Heng Feng <kai.heng.feng@canonical.com>
> ---
>  drivers/pci/pci-acpi.c | 2 +-
>  drivers/pci/quirks.c   | 8 ++++++++
>  include/linux/pci.h    | 1 +
>  3 files changed, 10 insertions(+), 1 deletion(-)
>
> diff --git a/drivers/pci/pci-acpi.c b/drivers/pci/pci-acpi.c
> index e1949f7efd9c..184e2fc8a294 100644
> --- a/drivers/pci/pci-acpi.c
> +++ b/drivers/pci/pci-acpi.c
> @@ -430,7 +430,7 @@ static void pci_acpi_wake_dev(struct acpi_device_wakeup_context *context)
>
>  	pci_dev = to_pci_dev(context->dev);
>
> -	if (pci_dev->pme_poll)
> +	if (pci_dev->pme_poll && !pci_dev->unreliable_acpi_wake)
>  		pci_dev->pme_poll = false;
>
>  	if (pci_dev->current_state == PCI_D3cold) {
> diff --git a/drivers/pci/quirks.c b/drivers/pci/quirks.c
> index b0a413f3f7ca..ed4863496fa8 100644
> --- a/drivers/pci/quirks.c
> +++ b/drivers/pci/quirks.c
> @@ -4948,6 +4948,14 @@ DECLARE_PCI_FIXUP_CLASS_FINAL(PCI_VENDOR_ID_AMD, PCI_ANY_ID,
>  DECLARE_PCI_FIXUP_CLASS_FINAL(PCI_VENDOR_ID_NVIDIA, PCI_ANY_ID,
>  			      PCI_CLASS_MULTIMEDIA_HD_AUDIO, 8, quirk_gpu_hda);
>
> +static void quirk_unreliable_acpi_wake(struct pci_dev *pdev)
> +{
> +	pci_info(pdev, "ACPI Wake unreliable, always poll PME\n");
> +	pdev->unreliable_acpi_wake = 1;
> +}
> +DECLARE_PCI_FIXUP_FINAL(PCI_VENDOR_ID_INTEL, 0x15bb, quirk_unreliable_acpi_wake);
> +DECLARE_PCI_FIXUP_FINAL(PCI_VENDOR_ID_INTEL, 0x15bd, quirk_unreliable_acpi_wake);
> +
>  /*
>   * Some IDT switches incorrectly flag an ACS Source Validation error on
>   * completions for config read requests even though PCIe r4.0, sec
> diff --git a/include/linux/pci.h b/include/linux/pci.h
> index 65f1d8c2f082..d22065c1576f 100644
> --- a/include/linux/pci.h
> +++ b/include/linux/pci.h
> @@ -331,6 +331,7 @@ struct pci_dev {
>  	unsigned int	pme_support:5;	/* Bitmask of states from which PME#
>  					   can be generated */
>  	unsigned int	pme_poll:1;	/* Poll device's PME status bit */
> +	unsigned int	unreliable_acpi_wake:1;	/* ACPI Wake doesn't always work */
>  	unsigned int	d1_support:1;	/* Low power state D1 is supported */
>  	unsigned int	d2_support:1;	/* Low power state D2 is supported */
>  	unsigned int	no_d1d2:1;	/* D1 and D2 are forbidden */
> --
> 2.17.1
>
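The behavioural change in the patch is a single condition in pci_acpi_wake_dev(). The following is only a minimal userspace sketch of that condition, not kernel code: struct wake_dev_model and model_acpi_wake_notify() are hypothetical names invented for illustration, and the two bitfields merely mirror the pme_poll and unreliable_acpi_wake fields of struct pci_dev shown in the diff.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for the two struct pci_dev fields the patch uses. */
struct wake_dev_model {
	unsigned int pme_poll:1;             /* poll the device's PME status bit */
	unsigned int unreliable_acpi_wake:1; /* set by the proposed quirk */
};

/*
 * Models only the changed condition: an ACPI wake notification normally
 * stops PME polling, but polling stays on when the quirk flag is set.
 */
static void model_acpi_wake_notify(struct wake_dev_model *dev)
{
	assert(dev != NULL);
	if (dev->pme_poll && !dev->unreliable_acpi_wake)
		dev->pme_poll = false;
}
```

With the flag clear, the first wake notification turns polling off, so later wakeups depend entirely on the GPE. With the flag set, polling survives the notification, which is how the patch works around a GPE that fires only once.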
On Tue, Jan 22, 2019 at 02:45:44PM +0800, Kai-Heng Feng wrote:
> There are some e1000e devices can only be woken up from D3 one time, by
> plugging ethernet cable. Subsequent cable plugging does set PME bit
> correctly, but it still doesn't get woken up.
>
> Since e1000e connects to the root complex directly, we rely on ACPI to
> wake it up. In this case, the GPE from _PRW only works once and stops
> working after that.
>
> So introduce a new PCI quirk, to avoid clearing pme_poll flag for buggy
> platform firmwares that have unreliable GPE wake.

This quirk applies to all 0x15bb (E1000_DEV_ID_PCH_CNP_I219_LM7) and
0x15bd (E1000_DEV_ID_PCH_CNP_I219_LM6) devices. The e1000e driver
claims about a zillion different device IDs.

I would be surprised if these two devices are defective but all the
others work correctly. Could it be that there is a problem with the
wiring on this particular motherboard or with the ACPI _PRW methods
(or the way Linux interprets them) in this firmware?

Would you mind attaching a complete dmesg log and "sudo lspci -vvv"
output to the bugzilla, please?

> On Jan 23, 2019, at 7:51 AM, Bjorn Helgaas <helgaas@kernel.org> wrote:
>
> On Tue, Jan 22, 2019 at 02:45:44PM +0800, Kai-Heng Feng wrote:
>> There are some e1000e devices can only be woken up from D3 one time, by
>> plugging ethernet cable. Subsequent cable plugging does set PME bit
>> correctly, but it still doesn't get woken up.
>>
>> Since e1000e connects to the root complex directly, we rely on ACPI to
>> wake it up. In this case, the GPE from _PRW only works once and stops
>> working after that.
>>
>> So introduce a new PCI quirk, to avoid clearing pme_poll flag for buggy
>> platform firmwares that have unreliable GPE wake.
>
> This quirk applies to all 0x15bb (E1000_DEV_ID_PCH_CNP_I219_LM7) and
> 0x15bd (E1000_DEV_ID_PCH_CNP_I219_LM6) devices. The e1000e driver
> claims about a zillion different device IDs.
>
> I would be surprised if these two devices are defective but all the
> others work correctly. Could it be that there is a problem with the
> wiring on this particular motherboard or with the ACPI _PRW methods
> (or the way Linux interprets them) in this firmware?

If this is a motherboard issue or platform specific, do you prefer to use
DMI matches here?

As for _PRW, it’s shared by USB controller, Audio controller and ethernet.
Only the ethernet (e1000e) has this issue.

When this issue happens, the e1000e doesn’t get woken up by ethernet cable
plugging, but inserting a USB device or plugging audio jack can wake up all
three devices. So I think Linux interprets ACPI correctly here.

Their _PRW here:

USB controller:
Scope (_SB.PCI0)
{
    Device (XDCI)
    {
        Method (_PRW, 0, NotSerialized)  // _PRW: Power Resources for Wake
        {
            Return (GPRW (0x6D, 0x04))
        }

Audio controller:
Scope (_SB.PCI0)
{
    Device (HDAS)
    {
        …
        Method (_PRW, 0, NotSerialized)  // _PRW: Power Resources for Wake
        {
            Return (GPRW (0x6D, 0x04))
        }

Ethernet controller:
Scope (_SB.PCI0)
{
    Device (GLAN)
    {
        …
        Method (_PRW, 0, NotSerialized)  // _PRW: Power Resources for Wake
        {
            Return (GPRW (0x6D, 0x04))
        }
    }
}

>
> Would you mind attaching a complete dmesg log and "sudo lspci -vvv"
> output to the bugzilla, please?

Sure.

Kai-Heng

On Wed, Jan 23, 2019 at 03:17:37PM +0800, Kai Heng Feng wrote:
> > On Jan 23, 2019, at 7:51 AM, Bjorn Helgaas <helgaas@kernel.org> wrote:
> > On Tue, Jan 22, 2019 at 02:45:44PM +0800, Kai-Heng Feng wrote:
> >> There are some e1000e devices can only be woken up from D3 one time, by
> >> plugging ethernet cable. Subsequent cable plugging does set PME bit
> >> correctly, but it still doesn't get woken up.
> >>
> >> Since e1000e connects to the root complex directly, we rely on ACPI to
> >> wake it up. In this case, the GPE from _PRW only works once and stops
> >> working after that.
> >>
> >> So introduce a new PCI quirk, to avoid clearing pme_poll flag for buggy
> >> platform firmwares that have unreliable GPE wake.
> >
> > This quirk applies to all 0x15bb (E1000_DEV_ID_PCH_CNP_I219_LM7) and
> > 0x15bd (E1000_DEV_ID_PCH_CNP_I219_LM6) devices. The e1000e driver
> > claims about a zillion different device IDs.
> >
> > I would be surprised if these two devices are defective but all the
> > others work correctly. Could it be that there is a problem with the
> > wiring on this particular motherboard or with the ACPI _PRW methods
> > (or the way Linux interprets them) in this firmware?
>
> If this is a motherboard issue or platform specific, do you prefer to use
> DMI matches here?

I'm not sure what the problem is yet, so let's hold off on the exact
structure of the fix.

If I understand correctly, e1000e wakeup works once, but doesn't work
after that. Your lspci (from after that first wakeup, from
https://bugzilla.kernel.org/attachment.cgi?id=280691) shows this:

  00:14.0 XHC XHCI USB
    Flags: PMEClk- DSI- D1- D2- ... PME(D0-,D1-,D2-,D3hot+,D3cold+)
    Status: D3 NoSoftRst+ PME-Enable+ DSel=0 DScale=0 PME-
  00:1f.3 HDAS audio
    Flags: PMEClk- DSI- D1- D2- ... PME(D0-,D1-,D2-,D3hot+,D3cold+)
    Status: D3 NoSoftRst+ PME-Enable+ DSel=0 DScale=0 PME-
  00:1f.6 GLAN e1000e
    Flags: PMEClk- DSI+ D1- D2- ... PME(D0+,D1-,D2-,D3hot+,D3cold+)
    Status: D3 NoSoftRst+ PME-Enable+ DSel=0 DScale=1 PME+

So the e1000e PME_Status bit is still set, which means it probably
won't generate another PME interrupt, which would explain why wakeup
doesn't work. To test this theory, can you try this:

  - sleep
  - wakeup via e1000e
    # DEV=00:1f.6
    # lspci -vvs $DEV
    # setpci -s $DEV CAP_PM+4.W
    # setpci -s $DEV CAP_PM+4.W=0x8100
  - sleep
  - attempt another wakeup via e1000e

If this second wakeup works, it would suggest that PME_Status isn't
being cleared correctly. I see code, e.g., in
acpi_setup_gpe_for_wake(), that *looks* like it would arrange to clear
it, but I'm not very familiar with it. Maybe there's some issue with
multiple devices sharing an "implicit notification" situation like
this.

> As for _PRW, it’s shared by USB controller, Audio controller and ethernet.
> Only the ethernet (e1000e) has this issue.
>
> When this issue happens, the e1000e doesn’t get woken up by ethernet cable
> plugging, but inserting a USB device or plugging audio jack can wake up all
> three devices. So I think Linux interprets ACPI correctly here.
>
> Their _PRW here:
> USB controller:
> Scope (_SB.PCI0)
> {
>     Device (XDCI)
>     {
>         Method (_PRW, 0, NotSerialized)  // _PRW: Power Resources for Wake
>         {
>             Return (GPRW (0x6D, 0x04))
>         }
>
> Audio controller:
> Scope (_SB.PCI0)
> {
>     Device (HDAS)
>     {
>         …
>         Method (_PRW, 0, NotSerialized)  // _PRW: Power Resources for Wake
>         {
>             Return (GPRW (0x6D, 0x04))
>         }
>
> Ethernet controller:
> Scope (_SB.PCI0)
> {
>     Device (GLAN)
>     {
>         …
>         Method (_PRW, 0, NotSerialized)  // _PRW: Power Resources for Wake
>         {
>             Return (GPRW (0x6D, 0x04))
>         }
>     }
> }
>
> >
> > Would you mind attaching a complete dmesg log and "sudo lspci -vvv"
> > output to the bugzilla, please?
>
> Sure.
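The setpci experiment above writes 0x8100 to the Power Management Control/Status register (PMCSR) at CAP_PM+4. As a reading aid, here is a small sketch that decodes such a value. The bit positions match the kernel's PCI_PM_CTRL_STATE_MASK (0x0003), PCI_PM_CTRL_PME_ENABLE (0x0100) and PCI_PM_CTRL_PME_STATUS (0x8000); the PMCSR_* macro names, struct pmcsr_fields and pmcsr_decode() are hypothetical names used only for this illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Bit layout of the PCI PM Control/Status register (PMCSR, at CAP_PM+4). */
#define PMCSR_STATE_MASK 0x0003u /* bits 1:0, power state D0..D3hot */
#define PMCSR_PME_ENABLE 0x0100u /* bit 8, shown by lspci as PME-Enable */
#define PMCSR_PME_STATUS 0x8000u /* bit 15, PME_Status, write 1 to clear */

/* Hypothetical container for the fields discussed in the thread. */
struct pmcsr_fields {
	unsigned int d_state; /* 0 = D0 ... 3 = D3hot */
	bool pme_enable;
	bool pme_status;
};

/* Split a raw 16-bit PMCSR value into the three fields above. */
static void pmcsr_decode(uint16_t pmcsr, struct pmcsr_fields *out)
{
	assert(out != NULL);
	out->d_state = pmcsr & PMCSR_STATE_MASK;
	out->pme_enable = (pmcsr & PMCSR_PME_ENABLE) != 0;
	out->pme_status = (pmcsr & PMCSR_PME_STATUS) != 0;
}
```

Decoding 0x8100 this way gives a power state field of 0 with both PME bits set. Because PME_Status is write-1-to-clear, writing that value asks the device to clear PME_Status while leaving PME-Enable on, which is the intent of the manual test.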
> On Jan 24, 2019, at 11:15 PM, Bjorn Helgaas <helgaas@kernel.org> wrote: > > On Wed, Jan 23, 2019 at 03:17:37PM +0800, Kai Heng Feng wrote: >>> On Jan 23, 2019, at 7:51 AM, Bjorn Helgaas <helgaas@kernel.org> wrote: >>> On Tue, Jan 22, 2019 at 02:45:44PM +0800, Kai-Heng Feng wrote: >>>> There are some e1000e devices can only be woken up from D3 one time, by >>>> plugging ethernet cable. Subsequent cable plugging does set PME bit >>>> correctly, but it still doesn't get woken up. >>>> >>>> Since e1000e connects to the root complex directly, we rely on ACPI to >>>> wake it up. In this case, the GPE from _PRW only works once and stops >>>> working after that. >>>> >>>> So introduce a new PCI quirk, to avoid clearing pme_poll flag for buggy >>>> platform firmwares that have unreliable GPE wake. >>> >>> This quirk applies to all 0x15bb (E1000_DEV_ID_PCH_CNP_I219_LM7) and >>> 0x15bd (E1000_DEV_ID_PCH_CNP_I219_LM6) devices. The e1000e driver >>> claims about a zillion different device IDs. >>> >>> I would be surprised if these two devices are defective but all the >>> others work correctly. Could it be that there is a problem with the >>> wiring on this particular motherboard or with the ACPI _PRW methods >>> (or the way Linux interprets them) in this firmware? >> >> If this is a motherboard issue or platform specific, do you prefer to use >> DMI matches here? > > I'm not sure what the problem is yet, so let's hold off on the exact > structure of the fix. I think DMI table can put in e1000e driver instead of PCI quirk. > > If I understand correctly, e1000e wakeup works once, but doesn't work > after that. Your lspci (from after that first wakeup, from > https://bugzilla.kernel.org/attachment.cgi?id=280691) shows this: > > 00:14.0 XHC XHCI USB > Flags: PMEClk- DSI- D1- D2- ... PME(D0-,D1-,D2-,D3hot+,D3cold+) > Status: D3 NoSoftRst+ PME-Enable+ DSel=0 DScale=0 PME- > 00:1f.3 HDAS audio > Flags: PMEClk- DSI- D1- D2- ... 
PME(D0-,D1-,D2-,D3hot+,D3cold+) > Status: D3 NoSoftRst+ PME-Enable+ DSel=0 DScale=0 PME- > 00:1f.6 GLAN e1000e > Flags: PMEClk- DSI+ D1- D2- ... PME(D0+,D1-,D2-,D3hot+,D3cold+) > Status: D3 NoSoftRst+ PME-Enable+ DSel=0 DScale=1 PME+ > > So the e1000e PME_Status bit is still set, which means it probably > won't generate another PME interrupt, which would explain why wakeup > doesn't work. To test this theory, can you try this: > > - sleep > - wakeup via e1000e > # DEV=00:1f.6 > # lspci -vvs $DEV > # setpci -s $DEV CAP_PM+4.W > # setpci -s $DEV CAP_PM+4.W=0x8100 > - sleep > - attempt another wakeup via e1000e > > If this second wakeup works, it would suggest that PME_Status isn't > being cleared correctly. I see code, e.g., in > acpi_setup_gpe_for_wake(), that *looks* like it would arrange to clear > it, but I'm not very familiar with it. Maybe there's some issue with > multiple devices sharing an "implicit notification" situation like > this. The PME status is being cleared correctly. The lspci is captured after I plugged the ethernet cable second time, i.e. PME is set but not being woken up. Kai-Heng > >> As for _PRW, it’s shared by USB controller, Audio controller and ethernet. >> Only the ethernet (e1000e) has this issue. >> >> When this issue happens, the e1000e doesn’t get woken up by ethernet cable >> plugging, but inserting a USB device or plugging audio jack can wake up all >> three devices. So I think Linux interprets ACPI correctly here. 
On Thu, Jan 24, 2019 at 11:29:37PM +0800, Kai Heng Feng wrote: > > On Jan 24, 2019, at 11:15 PM, Bjorn Helgaas <helgaas@kernel.org> wrote: > > On Wed, Jan 23, 2019 at 03:17:37PM +0800, Kai Heng Feng wrote: > >>> On Jan 23, 2019, at 7:51 AM, Bjorn Helgaas <helgaas@kernel.org> wrote: > >>> On Tue, Jan 22, 2019 at 02:45:44PM +0800, Kai-Heng Feng wrote: > >>>> There are some e1000e devices can only be woken up from D3 one time, by > >>>> plugging ethernet cable. Subsequent cable plugging does set PME bit > >>>> correctly, but it still doesn't get woken up. > >>>> > >>>> Since e1000e connects to the root complex directly, we rely on ACPI to > >>>> wake it up. In this case, the GPE from _PRW only works once and stops > >>>> working after that. > >>>> > >>>> So introduce a new PCI quirk, to avoid clearing pme_poll flag for buggy > >>>> platform firmwares that have unreliable GPE wake. > >>> > >>> This quirk applies to all 0x15bb (E1000_DEV_ID_PCH_CNP_I219_LM7) and > >>> 0x15bd (E1000_DEV_ID_PCH_CNP_I219_LM6) devices. The e1000e driver > >>> claims about a zillion different device IDs. > >>> > >>> I would be surprised if these two devices are defective but all the > >>> others work correctly. Could it be that there is a problem with the > >>> wiring on this particular motherboard or with the ACPI _PRW methods > >>> (or the way Linux interprets them) in this firmware? > >> > >> If this is a motherboard issue or platform specific, do you prefer to use > >> DMI matches here? > > > > I'm not sure what the problem is yet, so let's hold off on the exact > > structure of the fix. > > I think DMI table can put in e1000e driver instead of PCI quirk. I don't think we should add a quirk or DMI table yet because we haven't gotten to the root cause of this problem. If the root cause is a problem in the Linux code, adding a quirk will mask the problem for this specific system, but will leave other systems with similar problems. 
> > If I understand correctly, e1000e wakeup works once, but doesn't work > > after that. Your lspci (from after that first wakeup, from > > https://bugzilla.kernel.org/attachment.cgi?id=280691) shows this: > > > > 00:14.0 XHC XHCI USB > > Flags: PMEClk- DSI- D1- D2- ... PME(D0-,D1-,D2-,D3hot+,D3cold+) > > Status: D3 NoSoftRst+ PME-Enable+ DSel=0 DScale=0 PME- > > 00:1f.3 HDAS audio > > Flags: PMEClk- DSI- D1- D2- ... PME(D0-,D1-,D2-,D3hot+,D3cold+) > > Status: D3 NoSoftRst+ PME-Enable+ DSel=0 DScale=0 PME- > > 00:1f.6 GLAN e1000e > > Flags: PMEClk- DSI+ D1- D2- ... PME(D0+,D1-,D2-,D3hot+,D3cold+) > > Status: D3 NoSoftRst+ PME-Enable+ DSel=0 DScale=1 PME+ > > > > So the e1000e PME_Status bit is still set, which means it probably > > won't generate another PME interrupt, which would explain why wakeup > > doesn't work. To test this theory, can you try this: > > > > - sleep > > - wakeup via e1000e > > # DEV=00:1f.6 > > # lspci -vvs $DEV > > # setpci -s $DEV CAP_PM+4.W > > # setpci -s $DEV CAP_PM+4.W=0x8100 > > - sleep > > - attempt another wakeup via e1000e > > > > If this second wakeup works, it would suggest that PME_Status isn't > > being cleared correctly. I see code, e.g., in > > acpi_setup_gpe_for_wake(), that *looks* like it would arrange to clear > > it, but I'm not very familiar with it. Maybe there's some issue with > > multiple devices sharing an "implicit notification" situation like > > this. > > The PME status is being cleared correctly. I was hoping to understand this better via the experiment above, but I'm still confused. 
Here's the scenario as I understand it:

  0) fresh boot
  1) e1000e PME_Status should be 0
  2) sleep
  3) wakeup via e1000e succeeds
  4) e1000e PME_Status should be 0
  5) sleep
  6) wakeup via e1000e fails
  7) wakeup via USB succeeds
  8) e1000e PME_Status should be 0, but is actually 1

If I understand correctly, the bugzilla lspci
(https://bugzilla.kernel.org/attachment.cgi?id=280691) was collected
at point 8, and it shows PME_Status=1 when it should be 0.

If we write a 1 to PME_Status to clear it, and it remains set, that's
obviously a hardware defect, and Intel should document that in an
erratum, and a quirk would be the appropriate way to work around it.
But I doubt that's what's happening.

If e1000e changes PME_Status from 0 to 1 and we don't get an interrupt
(in this case, an SCI triggering GPE 0x6d), the problem is something
in the path outside e1000e. Since the wakeup works the first time, we
know e1000e is *capable* of generating an interrupt, and the problem
is probably something in the way we acknowledge it.

I want to make sure we're executing the code that clears
PCI_PM_CTRL_PME_STATUS. Would you mind applying the patch below,
turning on CONFIG_ACPI_DEBUG, booting with

  acpi.debug_layer=0x00000004 acpi.debug_level=0x00000004

and collecting the dmesg log? I think you can also set these debug
flags via /sys/module/acpi/parameters/debug_* if that's easier.

> The lspci is captured after I plugged the ethernet cable second time,
> i.e. PME is set but not being woken up.
>
> Kai-Heng

Bjorn

diff --git a/drivers/pci/pci.c b/drivers/pci/pci.c
index c9d8e3c837de..91a70ffdd3a1 100644
--- a/drivers/pci/pci.c
+++ b/drivers/pci/pci.c
@@ -1927,7 +1927,7 @@ void pcie_clear_root_pme_status(struct pci_dev *dev)
 bool pci_check_pme_status(struct pci_dev *dev)
 {
 	int pmcsr_pos;
-	u16 pmcsr;
+	u16 pmcsr, pmcsr_orig;
 	bool ret = false;

 	if (!dev->pm_cap)
@@ -1939,6 +1939,7 @@ bool pci_check_pme_status(struct pci_dev *dev)
 		return false;

 	/* Clear PME status. */
+	pmcsr_orig = pmcsr;
 	pmcsr |= PCI_PM_CTRL_PME_STATUS;
 	if (pmcsr & PCI_PM_CTRL_PME_ENABLE) {
 		/* Disable PME to avoid interrupt flood. */
@@ -1948,6 +1949,9 @@ bool pci_check_pme_status(struct pci_dev *dev)

 	pci_write_config_word(dev, pmcsr_pos, pmcsr);

+	pci_read_config_word(dev, pmcsr_pos, &pmcsr);
+	pci_info(dev, "%s PM_CTRL %#06x -> %#06x\n", __func__, pmcsr_orig, pmcsr);
+
 	return ret;
 }

@@ -2027,18 +2031,22 @@ static void pci_pme_list_scan(struct work_struct *work)

 static void __pci_pme_active(struct pci_dev *dev, bool enable)
 {
-	u16 pmcsr;
+	u16 pmcsr, pmcsr_orig;

 	if (!dev->pme_support)
 		return;

 	pci_read_config_word(dev, dev->pm_cap + PCI_PM_CTRL, &pmcsr);
+	pmcsr_orig = pmcsr;
 	/* Clear PME_Status by writing 1 to it and enable PME# */
 	pmcsr |= PCI_PM_CTRL_PME_STATUS | PCI_PM_CTRL_PME_ENABLE;
 	if (!enable)
 		pmcsr &= ~PCI_PM_CTRL_PME_ENABLE;

 	pci_write_config_word(dev, dev->pm_cap + PCI_PM_CTRL, pmcsr);
+
+	pci_read_config_word(dev, dev->pm_cap + PCI_PM_CTRL, &pmcsr);
+	pci_info(dev, "%s PM_CTRL %#06x -> %#06x\n", __func__, pmcsr_orig, pmcsr);
 }

 /**
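The debug patch reads PMCSR back after each write to check that PME_Status really clears. The sketch below models what a correctly behaving device would show, assuming write-1-to-clear semantics for PME_Status and plain read/write semantics for PME-Enable. It is a userspace toy, not kernel code: struct pmcsr_model, model_read(), model_write() and model_pme_active() are hypothetical names, and only the two PME bits are modelled.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define PMCSR_PME_ENABLE 0x0100u /* bit 8 */
#define PMCSR_PME_STATUS 0x8000u /* bit 15, write 1 to clear */

/* Hypothetical model of a device's PMCSR; only the two PME bits exist. */
struct pmcsr_model {
	uint16_t value;
};

static uint16_t model_read(const struct pmcsr_model *reg)
{
	assert(reg != NULL);
	return reg->value;
}

/* Config write: PME_Status is write-1-to-clear, PME_Enable is read/write. */
static void model_write(struct pmcsr_model *reg, uint16_t written)
{
	assert(reg != NULL);
	uint16_t status = reg->value & PMCSR_PME_STATUS;

	if (written & PMCSR_PME_STATUS)
		status = 0; /* writing 1 clears the sticky status bit */
	reg->value = status | (written & PMCSR_PME_ENABLE);
}

/*
 * Same read-modify-write sequence as __pci_pme_active(); returns the
 * read-back value that the debug patch would print after the write.
 */
static uint16_t model_pme_active(struct pmcsr_model *reg, int enable)
{
	uint16_t pmcsr = model_read(reg);

	pmcsr |= PMCSR_PME_STATUS | PMCSR_PME_ENABLE;
	if (!enable)
		pmcsr &= (uint16_t)~PMCSR_PME_ENABLE;
	model_write(reg, pmcsr);
	return model_read(reg);
}
```

In this model a register holding 0x8100 reads back as 0x0100 after the sequence. A log line from the patch that still showed bit 15 set after the write would point at the hardware-defect case described above, while a cleared bit would point at the interrupt path outside the device.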
> On Jan 25, 2019, at 4:05 AM, Bjorn Helgaas <helgaas@kernel.org> wrote: > > On Thu, Jan 24, 2019 at 11:29:37PM +0800, Kai Heng Feng wrote: >>> On Jan 24, 2019, at 11:15 PM, Bjorn Helgaas <helgaas@kernel.org> wrote: >>> On Wed, Jan 23, 2019 at 03:17:37PM +0800, Kai Heng Feng wrote: >>>>> On Jan 23, 2019, at 7:51 AM, Bjorn Helgaas <helgaas@kernel.org> wrote: >>>>> On Tue, Jan 22, 2019 at 02:45:44PM +0800, Kai-Heng Feng wrote: >>>>>> There are some e1000e devices can only be woken up from D3 one time, by >>>>>> plugging ethernet cable. Subsequent cable plugging does set PME bit >>>>>> correctly, but it still doesn't get woken up. >>>>>> >>>>>> Since e1000e connects to the root complex directly, we rely on ACPI to >>>>>> wake it up. In this case, the GPE from _PRW only works once and stops >>>>>> working after that. >>>>>> >>>>>> So introduce a new PCI quirk, to avoid clearing pme_poll flag for buggy >>>>>> platform firmwares that have unreliable GPE wake. >>>>> >>>>> This quirk applies to all 0x15bb (E1000_DEV_ID_PCH_CNP_I219_LM7) and >>>>> 0x15bd (E1000_DEV_ID_PCH_CNP_I219_LM6) devices. The e1000e driver >>>>> claims about a zillion different device IDs. >>>>> >>>>> I would be surprised if these two devices are defective but all the >>>>> others work correctly. Could it be that there is a problem with the >>>>> wiring on this particular motherboard or with the ACPI _PRW methods >>>>> (or the way Linux interprets them) in this firmware? >>>> >>>> If this is a motherboard issue or platform specific, do you prefer to use >>>> DMI matches here? >>> >>> I'm not sure what the problem is yet, so let's hold off on the exact >>> structure of the fix. >> >> I think DMI table can put in e1000e driver instead of PCI quirk. > > I don't think we should add a quirk or DMI table yet because we > haven't gotten to the root cause of this problem. 
If the root cause > is a problem in the Linux code, adding a quirk will mask the problem > for this specific system, but will leave other systems with similar > problems. > >>> If I understand correctly, e1000e wakeup works once, but doesn't work >>> after that. Your lspci (from after that first wakeup, from >>> https://bugzilla.kernel.org/attachment.cgi?id=280691) shows this: >>> >>> 00:14.0 XHC XHCI USB >>> Flags: PMEClk- DSI- D1- D2- ... PME(D0-,D1-,D2-,D3hot+,D3cold+) >>> Status: D3 NoSoftRst+ PME-Enable+ DSel=0 DScale=0 PME- >>> 00:1f.3 HDAS audio >>> Flags: PMEClk- DSI- D1- D2- ... PME(D0-,D1-,D2-,D3hot+,D3cold+) >>> Status: D3 NoSoftRst+ PME-Enable+ DSel=0 DScale=0 PME- >>> 00:1f.6 GLAN e1000e >>> Flags: PMEClk- DSI+ D1- D2- ... PME(D0+,D1-,D2-,D3hot+,D3cold+) >>> Status: D3 NoSoftRst+ PME-Enable+ DSel=0 DScale=1 PME+ >>> >>> So the e1000e PME_Status bit is still set, which means it probably >>> won't generate another PME interrupt, which would explain why wakeup >>> doesn't work. To test this theory, can you try this: >>> >>> - sleep >>> - wakeup via e1000e >>> # DEV=00:1f.6 >>> # lspci -vvs $DEV >>> # setpci -s $DEV CAP_PM+4.W >>> # setpci -s $DEV CAP_PM+4.W=0x8100 >>> - sleep >>> - attempt another wakeup via e1000e >>> >>> If this second wakeup works, it would suggest that PME_Status isn't >>> being cleared correctly. I see code, e.g., in >>> acpi_setup_gpe_for_wake(), that *looks* like it would arrange to clear >>> it, but I'm not very familiar with it. Maybe there's some issue with >>> multiple devices sharing an "implicit notification" situation like >>> this. >> >> The PME status is being cleared correctly. > > I was hoping to understand this better via the experiment above, but > I'm still confused. 
> Here's the scenario as I understand it:
>
>   0) fresh boot
>   1) e1000e PME_Status should be 0
>   2) sleep
>   3) wakeup via e1000e succeeds
>   4) e1000e PME_Status should be 0
>   5) sleep
>   6) wakeup via e1000e fails
>   7) wakeup via USB succeeds
>   8) e1000e PME_Status should be 0, but is actually 1

Sorry for not describing the scenario more clearly. Here’s the test scenario:

0) fresh boot
1) no ethernet cable plugged
2) e1000e runtime suspend
3) PME_Status is 0
4) plug ethernet cable
5) e1000e gets woken up by ACPI wakeup
6) network connection established
7) unplug the ethernet cable
8) e1000e runtime suspend
9) plug ethernet cable again
10) PME_Status=1 but it’s not woken up, stays suspended
11) plug a USB device, e1000e wakes up

This shows that the ACPI GPE somehow still works for the USB controller but
not for the ethernet device.

>
> If I understand correctly, the bugzilla lspci
> (https://bugzilla.kernel.org/attachment.cgi?id=280691) was collected
> at point 8, and it shows PME_Status=1 when it should be 0.
>
> If we write a 1 to PME_Status to clear it, and it remains set, that's
> obviously a hardware defect, and Intel should document that in an
> erratum, and a quirk would be the appropriate way to work around it.
> But I doubt that's what's happening.

I’ll ask them if they can provide an erratum.

>
> If e1000e changes PME_Status from 0 to 1 and we don't get an interrupt
> (in this case, an SCI triggering GPE 0x6d), the problem is something
> in the path outside e1000e. Since the wakeup works the first time, we
> know e1000e is *capable* of generating an interrupt, and the problem
> is probably something in the way we acknowledge it.
>
> I want to make sure we're executing the code that clears
> PCI_PM_CTRL_PME_STATUS. Would you mind applying the patch below,
> turning on CONFIG_ACPI_DEBUG, booting with
>
>   acpi.debug_layer=0x00000004 acpi.debug_level=0x00000004
>
> and collecting the dmesg log?
> I think you can also set these debug flags via
> /sys/module/acpi/parameters/debug_* if that's easier.

Dmesg with the debug patch applied:
https://bugzilla.kernel.org/attachment.cgi?id=280819

At the end, I re-plugged the ethernet cable, but nothing shows up.

Kai-Heng

>
>> The lspci is captured after I plugged the ethernet cable second time,
>> i.e. PME is set but not being woken up.
>>
>> Kai-Heng
>
> Bjorn
>
> diff --git a/drivers/pci/pci.c b/drivers/pci/pci.c
> index c9d8e3c837de..91a70ffdd3a1 100644
> --- a/drivers/pci/pci.c
> +++ b/drivers/pci/pci.c
> @@ -1927,7 +1927,7 @@ void pcie_clear_root_pme_status(struct pci_dev *dev)
>  bool pci_check_pme_status(struct pci_dev *dev)
>  {
>  	int pmcsr_pos;
> -	u16 pmcsr;
> +	u16 pmcsr, pmcsr_orig;
>  	bool ret = false;
>
>  	if (!dev->pm_cap)
> @@ -1939,6 +1939,7 @@ bool pci_check_pme_status(struct pci_dev *dev)
>  		return false;
>
>  	/* Clear PME status. */
> +	pmcsr_orig = pmcsr;
>  	pmcsr |= PCI_PM_CTRL_PME_STATUS;
>  	if (pmcsr & PCI_PM_CTRL_PME_ENABLE) {
>  		/* Disable PME to avoid interrupt flood. */
> @@ -1948,6 +1949,9 @@ bool pci_check_pme_status(struct pci_dev *dev)
>
>  	pci_write_config_word(dev, pmcsr_pos, pmcsr);
>
> +	pci_read_config_word(dev, pmcsr_pos, &pmcsr);
> +	pci_info(dev, "%s PM_CTRL %#06x -> %#06x\n", __func__, pmcsr_orig, pmcsr);
> +
>  	return ret;
>  }
>
> @@ -2027,18 +2031,22 @@ static void pci_pme_list_scan(struct work_struct *work)
>
>  static void __pci_pme_active(struct pci_dev *dev, bool enable)
>  {
> -	u16 pmcsr;
> +	u16 pmcsr, pmcsr_orig;
>
>  	if (!dev->pme_support)
>  		return;
>
>  	pci_read_config_word(dev, dev->pm_cap + PCI_PM_CTRL, &pmcsr);
> +	pmcsr_orig = pmcsr;
>  	/* Clear PME_Status by writing 1 to it and enable PME# */
>  	pmcsr |= PCI_PM_CTRL_PME_STATUS | PCI_PM_CTRL_PME_ENABLE;
>  	if (!enable)
>  		pmcsr &= ~PCI_PM_CTRL_PME_ENABLE;
>
>  	pci_write_config_word(dev, dev->pm_cap + PCI_PM_CTRL, pmcsr);
> +
> +	pci_read_config_word(dev, dev->pm_cap + PCI_PM_CTRL, &pmcsr);
> +	pci_info(dev, "%s PM_CTRL %#06x -> %#06x\n", __func__, pmcsr_orig, pmcsr);
>  }
>
>  /**
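
The debug patch above logs raw PM_CTRL values before and after each write. As a reading aid, here is a minimal userspace sketch of how such a value breaks down. The mask values match the kernel's PCI_PM_CTRL_* definitions; struct pmcsr_fields, pmcsr_decode() and pmcsr_print() are hypothetical names made up for this illustration and are not part of the patch.

```c
#include <stdint.h>
#include <stdio.h>

/* Bits of the PCI PM Control/Status register (PMCSR); the values are the
 * same as the kernel's PCI_PM_CTRL_* masks. */
#define PM_CTRL_STATE_MASK  0x0003  /* current power state, D0..D3hot */
#define PM_CTRL_PME_ENABLE  0x0100  /* PME_En */
#define PM_CTRL_PME_STATUS  0x8000  /* PME_Status, write 1 to clear */

/* Hypothetical helper type: decoded view of one PMCSR value. */
struct pmcsr_fields {
	unsigned int d_state;    /* 0..3 for D0..D3hot */
	unsigned int pme_enable; /* 1 if PME_En is set */
	unsigned int pme_status; /* 1 if PME_Status is set */
};

/* Split a raw 16-bit PMCSR value into the fields discussed in the thread. */
struct pmcsr_fields pmcsr_decode(uint16_t pmcsr)
{
	struct pmcsr_fields f;

	f.d_state = pmcsr & PM_CTRL_STATE_MASK;
	f.pme_enable = (pmcsr & PM_CTRL_PME_ENABLE) != 0;
	f.pme_status = (pmcsr & PM_CTRL_PME_STATUS) != 0;
	return f;
}

/* Print one value in a style loosely modeled on lspci's "Status:" line. */
void pmcsr_print(uint16_t pmcsr)
{
	struct pmcsr_fields f = pmcsr_decode(pmcsr);

	printf("%#06x: D%u PME-Enable%c PME%c\n", (unsigned int)pmcsr,
	       f.d_state, f.pme_enable ? '+' : '-', f.pme_status ? '+' : '-');
}
```

For example, pmcsr_decode(0x8100), the value written by the setpci command suggested earlier in the thread, reports state D0 with both PME_En and PME_Status set in the written value. Since PME_Status is write-1-to-clear, that write asks the device to clear the bit while keeping PME enabled.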
Hi Bjorn,

> On Jan 28, 2019, at 3:51 PM, Kai Heng Feng <kai.heng.feng@canonical.com> wrote:

[snipped]

>> If I understand correctly, the bugzilla lspci
>> (https://bugzilla.kernel.org/attachment.cgi?id=280691) was collected
>> at point 8, and it shows PME_Status=1 when it should be 0.
>>
>> If we write a 1 to PME_Status to clear it, and it remains set, that's
>> obviously a hardware defect, and Intel should document that in an
>> erratum, and a quirk would be the appropriate way to work around it.
>> But I doubt that's what's happening.
>
> I’ll ask them if they can provide an erratum.

I got confirmation from the e1000e folks: I219 (the device in question)
doesn’t really support runtime D3.

I also checked the behavior of the device under Windows, and it stays at D0
all the time even when it’s not in use.

So I sent a patch [1] to disable it.

[1] https://lkml.org/lkml/2019/2/2/200

Kai-Heng
On Sun, Feb 03, 2019 at 01:46:50AM +0800, Kai Heng Feng wrote:
> > On Jan 28, 2019, at 3:51 PM, Kai Heng Feng <kai.heng.feng@canonical.com> wrote:
>
> >> If I understand correctly, the bugzilla lspci
> >> (https://bugzilla.kernel.org/attachment.cgi?id=280691) was collected
> >> at point 8, and it shows PME_Status=1 when it should be 0.
> >>
> >> If we write a 1 to PME_Status to clear it, and it remains set, that's
> >> obviously a hardware defect, and Intel should document that in an
> >> erratum, and a quirk would be the appropriate way to work around it.
> >> But I doubt that's what's happening.
> >
> > I’ll ask them if they can provide an erratum.
>
> Got confirmed with e1000e folks, I219 (the device in question) doesn’t
> really support runtime D3.

Did you get a reference, e.g., an intel.com URL for that? Intel
usually publishes errata for hardware defects, which is nice because
it means every customer doesn't have to experimentally rediscover
them.

> I also checked the behavior of the device under Windows, and it
> stays at D0 all the time even when it’s not in use.

I think there are two possible explanations for this:

  1) This device requires a Windows or a driver update with a
     device-specific quirk similar to what you're proposing for Linux.

  2) Windows correctly detects that this device doesn't support D3,
     and Linux has a bug and does not detect that.

Obviously nobody wants to require OS or driver updates just for minor
device changes, and the PCI and ACPI specs are designed to allow
generic, non device-specific code to detect D3 support, so the first
case should be a result of a hardware defect.

> So I sent a patch [1] to disable it.
>
> [1] https://lkml.org/lkml/2019/2/2/200

OK. Since that's in drivers/net/..., I have no objection and the
e1000e maintainers would deal with that.

Bjorn
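
The "write a 1 to PME_Status to clear it" behavior discussed in this thread can be illustrated with a small software model. This is only a sketch under simplifying assumptions: it treats every PMCSR bit other than PME_Status as plain read/write (real devices also have read-only fields), and pmcsr_model_write() and pme_active_model() are hypothetical names. The second function follows the read-modify-write sequence shown for __pci_pme_active() in the debug patch.

```c
#include <stdint.h>

#define PM_CTRL_PME_ENABLE  0x0100  /* PME_En, modeled as read/write */
#define PM_CTRL_PME_STATUS  0x8000  /* PME_Status, write 1 to clear */

/* Hypothetical model of a config write to PMCSR on a conforming device:
 * cur is the register content before the write, val is the value written,
 * and the return value is the content after the write. */
uint16_t pmcsr_model_write(uint16_t cur, uint16_t val)
{
	/* Ordinary bits simply take the written value. */
	uint16_t next = val & (uint16_t)~PM_CTRL_PME_STATUS;

	/* PME_Status stays set only if it was set and a 0 was written. */
	if ((cur & PM_CTRL_PME_STATUS) && !(val & PM_CTRL_PME_STATUS))
		next |= PM_CTRL_PME_STATUS;

	return next;
}

/* Same sequence as __pci_pme_active(): always write 1 to PME_Status,
 * then set or clear PME_En depending on enable. */
uint16_t pme_active_model(uint16_t cur, int enable)
{
	uint16_t val = cur | PM_CTRL_PME_STATUS | PM_CTRL_PME_ENABLE;

	if (!enable)
		val &= (uint16_t)~PM_CTRL_PME_ENABLE;

	return pmcsr_model_write(cur, val);
}
```

In this model a device that starts with PME_Status set always reads back with the bit clear after pme_active_model(). A read-back that still shows the bit set, which is what the pci_info() lines in the debug patch are meant to expose, would either be a new PME event or the hardware defect Bjorn describes.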
> On Feb 4, 2019, at 6:20 PM, Bjorn Helgaas <helgaas@kernel.org> wrote:
>
> On Sun, Feb 03, 2019 at 01:46:50AM +0800, Kai Heng Feng wrote:
>>> On Jan 28, 2019, at 3:51 PM, Kai Heng Feng <kai.heng.feng@canonical.com> wrote:
>>
>>>> If I understand correctly, the bugzilla lspci
>>>> (https://bugzilla.kernel.org/attachment.cgi?id=280691) was collected
>>>> at point 8, and it shows PME_Status=1 when it should be 0.
>>>>
>>>> If we write a 1 to PME_Status to clear it, and it remains set, that's
>>>> obviously a hardware defect, and Intel should document that in an
>>>> erratum, and a quirk would be the appropriate way to work around it.
>>>> But I doubt that's what's happening.
>>>
>>> I’ll ask them if they can provide an erratum.
>>
>> Got confirmed with e1000e folks, I219 (the device in question) doesn’t
>> really support runtime D3.
>
> Did you get a reference, e.g., an intel.com URL for that? Intel
> usually publishes errata for hardware defects, which is nice because
> it means every customer doesn't have to experimentally rediscover
> them.

Unfortunately no.

>
>> I also checked the behavior of the device under Windows, and it
>> stays at D0 all the time even when it’s not in use.
>
> I think there are two possible explanations for this:
>
>  1) This device requires a Windows or a driver update with a
>     device-specific quirk similar to what you're proposing for Linux.

I am sure the latest driver is loaded under Windows.

>
>  2) Windows correctly detects that this device doesn't support D3,
>     and Linux has a bug and does not detect that.

I think that’s the case.

>
> Obviously nobody wants to require OS or driver updates just for minor
> device changes, and the PCI and ACPI specs are designed to allow
> generic, non device-specific code to detect D3 support, so the first
> case should be a result of a hardware defect.

Yeah, that’s why my original idea was to work around it in PCI/ACPI.

>
>> So I sent a patch [1] to disable it.
>>
>> [1] https://lkml.org/lkml/2019/2/2/200
>
> OK. Since that's in drivers/net/..., I have no objection and the
> e1000e maintainers would deal with that.

Thanks.

Kai-Heng

>
> Bjorn
diff --git a/drivers/pci/pci-acpi.c b/drivers/pci/pci-acpi.c
index e1949f7efd9c..184e2fc8a294 100644
--- a/drivers/pci/pci-acpi.c
+++ b/drivers/pci/pci-acpi.c
@@ -430,7 +430,7 @@ static void pci_acpi_wake_dev(struct acpi_device_wakeup_context *context)
 
 	pci_dev = to_pci_dev(context->dev);
 
-	if (pci_dev->pme_poll)
+	if (pci_dev->pme_poll && !pci_dev->unreliable_acpi_wake)
 		pci_dev->pme_poll = false;
 
 	if (pci_dev->current_state == PCI_D3cold) {
diff --git a/drivers/pci/quirks.c b/drivers/pci/quirks.c
index b0a413f3f7ca..ed4863496fa8 100644
--- a/drivers/pci/quirks.c
+++ b/drivers/pci/quirks.c
@@ -4948,6 +4948,14 @@ DECLARE_PCI_FIXUP_CLASS_FINAL(PCI_VENDOR_ID_AMD, PCI_ANY_ID,
 DECLARE_PCI_FIXUP_CLASS_FINAL(PCI_VENDOR_ID_NVIDIA, PCI_ANY_ID,
 			      PCI_CLASS_MULTIMEDIA_HD_AUDIO, 8, quirk_gpu_hda);
 
+static void quirk_unreliable_acpi_wake(struct pci_dev *pdev)
+{
+	pci_info(pdev, "ACPI Wake unreliable, always poll PME\n");
+	pdev->unreliable_acpi_wake = 1;
+}
+DECLARE_PCI_FIXUP_FINAL(PCI_VENDOR_ID_INTEL, 0x15bb, quirk_unreliable_acpi_wake);
+DECLARE_PCI_FIXUP_FINAL(PCI_VENDOR_ID_INTEL, 0x15bd, quirk_unreliable_acpi_wake);
+
 /*
  * Some IDT switches incorrectly flag an ACS Source Validation error on
  * completions for config read requests even though PCIe r4.0, sec
diff --git a/include/linux/pci.h b/include/linux/pci.h
index 65f1d8c2f082..d22065c1576f 100644
--- a/include/linux/pci.h
+++ b/include/linux/pci.h
@@ -331,6 +331,7 @@ struct pci_dev {
 	unsigned int	pme_support:5;	/* Bitmask of states from which PME#
 					   can be generated */
 	unsigned int	pme_poll:1;	/* Poll device's PME status bit */
+	unsigned int	unreliable_acpi_wake:1;	/* ACPI Wake doesn't always work */
 	unsigned int	d1_support:1;	/* Low power state D1 is supported */
 	unsigned int	d2_support:1;	/* Low power state D2 is supported */
 	unsigned int	no_d1d2:1;	/* D1 and D2 are forbidden */
Some e1000e devices can only be woken up from D3 once, by plugging in the
ethernet cable. Subsequent cable plugging does set the PME bit correctly,
but the device still doesn't get woken up.

Since e1000e connects to the root complex directly, we rely on ACPI to
wake it up. In this case, the GPE from _PRW only works once and stops
working after that.

So introduce a new PCI quirk to avoid clearing the pme_poll flag on
platforms with buggy firmware that has unreliable GPE wake.

Signed-off-by: Kai-Heng Feng <kai.heng.feng@canonical.com>
---
 drivers/pci/pci-acpi.c | 2 +-
 drivers/pci/quirks.c   | 8 ++++++++
 include/linux/pci.h    | 1 +
 3 files changed, 10 insertions(+), 1 deletion(-)
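
To make the intent of the proposed quirk concrete, the following userspace sketch models only the two flags involved. It assumes, as the pci.h comment on pme_poll suggests, that a device with pme_poll set keeps having its PME status bit polled, so a wake event can still be noticed when the GPE does not fire. struct wake_model_dev and both functions are hypothetical names for illustration and are not kernel API.

```c
#include <stdbool.h>

/* Hypothetical stand-in for the two struct pci_dev flags the patch uses. */
struct wake_model_dev {
	bool pme_poll;             /* poll the device's PME status bit */
	bool unreliable_acpi_wake; /* set by the proposed quirk */
};

/* Mirrors the patched condition in pci_acpi_wake_dev(): a successful ACPI
 * wake notification turns polling off, unless the quirk flag is set. */
void wake_model_acpi_notify(struct wake_model_dev *dev)
{
	if (dev->pme_poll && !dev->unreliable_acpi_wake)
		dev->pme_poll = false;
}

/* In this model, a later PME can only be noticed without a working GPE
 * if polling is still enabled. */
bool wake_model_survives_lost_gpe(const struct wake_model_dev *dev)
{
	return dev->pme_poll;
}
```

After one call to wake_model_acpi_notify(), a device without the quirk has pme_poll cleared and depends entirely on the GPE for the next wakeup, which matches the "works once" symptom reported here. A device with unreliable_acpi_wake set keeps polling. As the thread shows, this approach was not pursued in PCI/ACPI and a separate e1000e patch was sent instead.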