Message ID | 20140801161107.GA4553@pd.tnic (mailing list archive) |
---|---|
State | New, archived |
Delegated to: | Bjorn Helgaas |
On Fri, Aug 01, 2014 at 06:11:08PM +0200, Borislav Petkov wrote:
> [ 89.040795] pcieport 0000:00:04.0: System wakeup enabled by ACPI
> [ 89.061697] AMD-Vi: Event logged [IO_PAGE_FAULT device=01:00.0 domain=0x0014 address=0x0000000020001000 flags=0x0000]
> [ 89.071871] ACPI: Preparing to enter system sleep state S5
> [ 89.072117] [Firmware Bug]: ACPI: BIOS _OSI(Linux) query honored via cmdline
> [ 89.089832] AMD-Vi: Event logged [IO_PAGE_FAULT device=00:12.0 domain=0x0009 address=0x0000000000000080 flags=0x0020]
> [ 89.102239] AMD-Vi: Event logged [IO_PAGE_FAULT device=00:12.0 domain=0x0009 address=0x0000000000000000 flags=0x0000]
> [ 89.114684] AMD-Vi: Event logged [IO_PAGE_FAULT device=00:12.0 domain=0x0009 address=0x00000000ffffffc0 flags=0x0010]
> [ 89.127162] AMD-Vi: Event logged [IO_PAGE_FAULT device=00:12.0 domain=0x0009 address=0x00000000ffffffc0 flags=0x0010]
> [ 89.139576] AMD-Vi: Event logged [IO_PAGE_FAULT device=00:12.0 domain=0x0009 address=0x00000000ffffffc0 flags=0x0010]
> [ 89.152017] AMD-Vi: Event logged [IO_PAGE_FAULT device=00:12.0 domain=0x0009 address=0x00000000ffffffc0 flags=0x0010]
> [ 89.164481] AMD-Vi: Event logged [IO_PAGE_FAULT device=00:12.0 domain=0x0009 address=0x00000000ffffffc0 flags=0x0010]
> [ 89.176994] AMD-Vi: Event logged [[ 89.177657] reboot: Power down
> [ 89.185286] acpi_power_off called
>
> Now this device 00:12.0 is that OHCI thing for which we have the
> hcd-pci.c hunk applied above, AFAICT:
>
> 00:12.0 USB controller: Advanced Micro Devices, Inc. [AMD/ATI] SB7x0/SB8x0/SB9x0 USB OHCI0 Controller
>
> so it must be still some timing issue there after disabling the device
> and *before* disabling the IOMMU.

My guess is that the firmware takes back the device after the OS
released it and now the legacy emulation tries to do DMA with it. But
since there is an IOMMU, the physical addresses it tries to DMA to are
not mapped, and it generates IO page faults.

In a perfect world the BIOS would define Unity Mapping regions in the
IVRS ACPI table for the USB controller so that the IOMMU driver would
keep these regions mapped. But I have never seen those regions defined
by any BIOS of an AMD machine, so this is probably why you are seeing
these IO page faults.

But this doesn't explain the GPU faults, though.

	Joerg
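For anyone who wants to tally which devices fault in a log like the one quoted above, here is a minimal sketch of a parser for the `IO_PAGE_FAULT` lines. It only assumes the line format shown in this thread; `struct io_page_fault` and `parse_io_page_fault()` are hypothetical names made up for illustration, not kernel interfaces, and the sketch does not try to interpret the `flags` bits.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical container for the fields of one IO_PAGE_FAULT log line. */
struct io_page_fault {
	unsigned int bus, dev, fn;	/* PCI device as bus:dev.fn */
	unsigned int domain;		/* IOMMU domain id */
	unsigned long long address;	/* faulting DMA address */
	unsigned int flags;		/* raw flags value, not decoded here */
};

/*
 * Parse a line of the form
 *   "... [IO_PAGE_FAULT device=00:12.0 domain=0x0009 address=0x... flags=0x...]"
 * Returns 0 on success, -1 if the line holds no complete IO_PAGE_FAULT event
 * (for example the truncated "Event logged [[" line in the quoted log).
 */
int parse_io_page_fault(const char *line, struct io_page_fault *f)
{
	const char *p = strstr(line, "IO_PAGE_FAULT device=");
	int n;

	if (!p)
		return -1;

	/* %x and %llx accept an optional "0x" prefix, as in the log. */
	n = sscanf(p,
		   "IO_PAGE_FAULT device=%x:%x.%x domain=%x address=%llx flags=%x",
		   &f->bus, &f->dev, &f->fn, &f->domain, &f->address, &f->flags);

	return n == 6 ? 0 : -1;
}
```

Feeding each dmesg line through `parse_io_page_fault()` and grouping on `bus:dev.fn` would separate the 00:12.0 (OHCI) faults from the single 01:00.0 one.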
On Sat, Aug 02, 2014 at 12:14:10AM +0200, Jörg Rödel wrote:
> My guess is that the firmware takes back the device after the OS
> released it and now the legacy emulation tries to do DMA with it. But
> since there is an IOMMU, the physical addresses it tries to DMA to are
> not mapped, and it generates IO page faults.
>
> In a perfect world the BIOS would define Unity Mapping regions in the
> IVRS ACPI table for the USB controller so that the IOMMU driver would
> keep these regions mapped. But I have never seen those regions defined
> by any BIOS of an AMD machine, so this is probably why you are seeing
> these IO page faults.

Ok but what changed? Apparently we didn't have that small window earlier
as I have never seen those IOMMU PFs before. So Jiang's stuff simply
opens that hole now. And I bet this whole work is aiming at physical
hotplug, which is all fine and dandy, but it shouldn't cause regressions.

So, IIUC, the dynamic IOAPIC stuff of freeing an irq number during
suspend opens this hole. Which leads me to the naive thinking that maybe
this new behavior should be configurable so that systems can choose. And
I bet I won't be the last one to trigger those when those changes hit
3.17...

> But this doesn't explain the GPU faults, though.

I think this got fixed by

https://lkml.kernel.org/r/1406766807-5745-1-git-send-email-jiang.liu@linux.intel.com

which keeps the IRQ numbers across S/R. I'll watch out for those though.

diff --git a/drivers/usb/core/hcd-pci.c b/drivers/usb/core/hcd-pci.c
index 82044b5d6113..efc953119ce2 100644
--- a/drivers/usb/core/hcd-pci.c
+++ b/drivers/usb/core/hcd-pci.c
@@ -380,6 +380,8 @@ void usb_hcd_pci_shutdown(struct pci_dev *dev)
 	if (test_bit(HCD_FLAG_HW_ACCESSIBLE, &hcd->flags) &&
 			hcd->driver->shutdown) {
 		hcd->driver->shutdown(hcd);
+		if (usb_hcd_is_primary_hcd(hcd) && hcd->irq > 0)
+			free_irq(hcd->irq, hcd);
 		pci_disable_device(dev);
 	}
 }
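
To make the ordering in the hunk explicit, here is a small user-space model of the control flow. It is only a sketch: `struct model_hcd`, `enum step` and `model_pci_shutdown()` are hypothetical stand-ins invented for illustration, nothing here calls into the kernel, and the NULL check is a defensive addition in the model that the hunk itself does not show.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for the few pieces of state the hunk looks at. */
struct model_hcd {
	bool hw_accessible;	/* models test_bit(HCD_FLAG_HW_ACCESSIBLE, ...) */
	bool has_shutdown;	/* models hcd->driver->shutdown != NULL */
	bool primary;		/* models usb_hcd_is_primary_hcd(hcd) */
	int irq;		/* models hcd->irq */
};

enum step {
	STEP_DRIVER_SHUTDOWN,	/* hcd->driver->shutdown(hcd) */
	STEP_FREE_IRQ,		/* free_irq(hcd->irq, hcd) */
	STEP_PCI_DISABLE,	/* pci_disable_device(dev) */
};

/*
 * Record, in order, the steps the patched shutdown path would take.
 * Returns the number of steps written to out[] (at most 3).
 */
size_t model_pci_shutdown(const struct model_hcd *hcd, enum step out[3])
{
	size_t n = 0;

	/* Defensive check added in this model only. */
	if (!hcd)
		return 0;

	if (hcd->hw_accessible && hcd->has_shutdown) {
		out[n++] = STEP_DRIVER_SHUTDOWN;
		/* The two added lines: release the IRQ before disabling. */
		if (hcd->primary && hcd->irq > 0)
			out[n++] = STEP_FREE_IRQ;
		out[n++] = STEP_PCI_DISABLE;
	}

	return n;
}
```

In this model the IRQ is released only for a primary HCD with a positive IRQ number, and always after the driver's own shutdown hook and before the PCI device is disabled, which is the ordering the hunk introduces.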