diff mbox

PCI: Revert "PCI: Add runtime PM support for PCIe ports"

Message ID 20161227235737.GB24366@bhelgaas-glaptop.roam.corp.google.com (mailing list archive)
State New, archived
Delegated to: Bjorn Helgaas
Headers show

Commit Message

Bjorn Helgaas Dec. 27, 2016, 11:57 p.m. UTC
Hi Killian,

Thanks for the report (https://bugzilla.kernel.org/show_bug.cgi?id=190861)
and all the debugging you've done.  Below is a revert of the troublesome
commit.  Can you test it and verify that it also fixes the problem?

I assume Mika is looking at this and will have a better solution soon.
But if not, I'll queue this up for v4.10.


commit e648b1ca2b94d207289fedc2538d33c57cdbc4de
Author: Bjorn Helgaas <bhelgaas@google.com>
Date:   Tue Dec 27 17:27:30 2016 -0600

    Revert "PCI: Add runtime PM support for PCIe ports"
    
    Revert 006d44e49a25 ("PCI: Add runtime PM support for PCIe ports").
    
    Killian reported that on a Lenovo W54l with i7-4810MQ, Intel HD Graphics
    4600, and NVIDIA Quadro® K1100M, locking the screen kills all keyboard and
    mouse interaction.  Reverting 006d44e49a25 fixes the problem.
    
    Link: https://bugzilla.kernel.org/show_bug.cgi?id=190861
    Reported-by: kilian.singer@quantumtechnology.info
    Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
    CC: stable@vger.kernel.org	# v4.8+
    CC: Mika Westerberg <mika.westerberg@linux.intel.com>

--
To unsubscribe from this list: send the line "unsubscribe linux-pci" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Comments

Mika Westerberg Dec. 28, 2016, 9:17 a.m. UTC | #1
On Tue, Dec 27, 2016 at 05:57:37PM -0600, Bjorn Helgaas wrote:
> I assume Mika is looking at this and will have a better solution soon.

I'll take a look at the issue next week. I'm currently on vacation.
--
To unsubscribe from this list: send the line "unsubscribe linux-pci" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Lukas Wunner Dec. 28, 2016, 11:29 a.m. UTC | #2
On Tue, Dec 27, 2016 at 05:57:37PM -0600, Bjorn Helgaas wrote:
> Thanks for the report (https://bugzilla.kernel.org/show_bug.cgi?id=190861)
> and all the debugging you've done.  Below is a revert of the troublesome
> commit.  Can you test it and verify that it also fixes the problem?
> 
> I assume Mika is looking at this and will have a better solution soon.
> But if not, I'll queue this up for v4.10.

@Kilian:  Are you using the proprietary nvidia driver?  If so,
does the issue go away if you blacklist that driver or use nouveau
instead?


With a bit of googling I found dmesg and lspci output for this model:
https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1437386

The keyboard and mouse seem to be PS/2 devices, accessed via I/O ports
0x60, 0x64.  I assume they're located behind the LPC bridge?

The proprietary nvidia driver has a bug, it locks the legacy PCI VGA
registers with vga_tryget() but never releases that lock.  Intel
chipsets have a quirk wherein I/O ports are routed to the bus to
which the legacy PCI VGA registers are locked.  So once vga_tryget()
is called by the nvidia driver, access to the keyboard and mouse is
routed to bus 01 (on which the Nvidia card resides) and not to bus 00
(on which the LPC bridge resides).

My theory would be that if you lock the screen, the Nvidia card
runtime suspends, allowing the root port above it to suspend,
and then the I/O ports are no longer accessible.

We have a similar issue on dual GPU MacBook Pros:  Backlight brightness
is adjusted by writing to I/O ports of a gmux controller situated below
the LPC bridge.  The nvidia driver locks the legacy VGA registers and
from that point reads from the I/O ports always return 0xff.  Commit
4eebd5a4e726 ("apple-gmux: lock iGP IO to protect from vgaarb changes")
sought to fix it but caused other breakage which remains unfixed so far:
https://bugzilla.kernel.org/show_bug.cgi?id=105051
https://bugzilla.kernel.org/show_bug.cgi?id=88861#c11

I've always wondered if the Intel chipsets would behave more sensibly
if the LPC bridge had BARs specifying the I/O regions used by devices
below it.

Reverting runtime suspend for PCIe ports is not a good solution as it's
needed for Thunderbolt runtime PM on Macs.

Thanks,

Lukas
--
To unsubscribe from this list: send the line "unsubscribe linux-pci" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Bjorn Helgaas Dec. 28, 2016, 4:18 p.m. UTC | #3
On Wed, Dec 28, 2016 at 12:29:54PM +0100, Lukas Wunner wrote:
> On Tue, Dec 27, 2016 at 05:57:37PM -0600, Bjorn Helgaas wrote:
> > Thanks for the report (https://bugzilla.kernel.org/show_bug.cgi?id=190861)
> > and all the debugging you've done.  Below is a revert of the troublesome
> > commit.  Can you test it and verify that it also fixes the problem?
> > 
> > I assume Mika is looking at this and will have a better solution soon.
> > But if not, I'll queue this up for v4.10.
> 
> @Kilian:  Are you using the proprietary nvidia driver?  If so,
> does the issue go away if you blacklist that driver or use nouveau
> instead?
> 
> 
> With a bit of googling I found dmesg and lspci output for this model:
> https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1437386
> 
> The keyboard and mouse seem to be PS/2 devices, accessed via I/O ports
> 0x60, 0x64.  I assume they're located behind the LPC bridge?
> 
> The proprietary nvidia driver has a bug, it locks the legacy PCI VGA
> registers with vga_tryget() but never releases that lock.  Intel
> chipsets have a quirk wherein I/O ports are routed to the bus to
> which the legacy PCI VGA registers are locked.  So once vga_tryget()
> is called by the nvidia driver, access to the keyboard and mouse is
> routed to bus 01 (on which the Nvidia card resides) and not to bus 00
> (on which the LPC bridge resides).

Interesting.  A spec reference would be a good addition to whatever
fix is proposed.

> My theory would be that if you lock the screen, the Nvidia card
> runtime suspends, allowing the root port above it to suspend,
> and then the I/O ports are no longer accessible.
> 
> We have a similar issue on dual GPU MacBook Pros:  Backlight brightness
> is adjusted by writing to I/O ports of a gmux controller situated below
> the LPC bridge.  The nvidia driver locks the legacy VGA registers and
> from that point reads from the I/O ports always return 0xff.  Commit
> 4eebd5a4e726 ("apple-gmux: lock iGP IO to protect from vgaarb changes")
> sought to fix it but caused other breakage which remains unfixed so far:
> https://bugzilla.kernel.org/show_bug.cgi?id=105051
> https://bugzilla.kernel.org/show_bug.cgi?id=88861#c11
> 
> I've always wondered if the Intel chipsets would behave more sensibly
> if the LPC bridge had BARs specifying the I/O regions used by devices
> below it.
> 
> Reverting runtime suspend for PCIe ports is not a good solution as it's
> needed for Thunderbolt runtime PM on Macs.

The choices are:

  1) Fix the regression and preserve runtime PM for PCIe ports
  2) Fix the regression by reverting runtime PM for PCIe ports

Obviously we hope for 1).  Preserving runtime PM without fixing the
regression isn't even on the list.  I know this is Linux 101, so I
apologize for restating the obvious.

Bjorn
--
To unsubscribe from this list: send the line "unsubscribe linux-pci" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Kilian Singer Dec. 29, 2016, 9:58 a.m. UTC | #4
Dear Lukas and Bjorn,
I am using the nouveau driver.
dmesg and lspci --vv is attached as a comment to the bug rebort
https://bugzilla.kernel.org/show_bug.cgi?id=190861#c15
https://bugzilla.kernel.org/show_bug.cgi?id=190861#c16

I could completely deactivate nvidia in the bios...

Best regards
Kilian

----- Original Message -----
From: "Bjorn Helgaas" <helgaas@kernel.org>
To: "Lukas Wunner" <lukas@wunner.de>
Cc: "Kilian Singer" <kilian.singer@quantumtechnology.info>, linux-pci@vger.kernel.org, "Mika Westerberg" <mika.westerberg@linux.intel.com>, "Rafael J. Wysocki" <rafael.j.wysocki@intel.com>
Sent: Wednesday, December 28, 2016 5:18:16 PM
Subject: PCI: Revert "PCI: Add runtime PM support for PCIe ports"

On Wed, Dec 28, 2016 at 12:29:54PM +0100, Lukas Wunner wrote:
> On Tue, Dec 27, 2016 at 05:57:37PM -0600, Bjorn Helgaas wrote:
> > Thanks for the report (https://bugzilla.kernel.org/show_bug.cgi?id=190861)
> > and all the debugging you've done.  Below is a revert of the troublesome
> > commit.  Can you test it and verify that it also fixes the problem?
> > 
> > I assume Mika is looking at this and will have a better solution soon.
> > But if not, I'll queue this up for v4.10.
> 
> @Kilian:  Are you using the proprietary nvidia driver?  If so,
> does the issue go away if you blacklist that driver or use nouveau
> instead?
> 
> 
> With a bit of googling I found dmesg and lspci output for this model:
> https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1437386
> 
> The keyboard and mouse seem to be PS/2 devices, accessed via I/O ports
> 0x60, 0x64.  I assume they're located behind the LPC bridge?
> 
> The proprietary nvidia driver has a bug, it locks the legacy PCI VGA
> registers with vga_tryget() but never releases that lock.  Intel
> chipsets have a quirk wherein I/O ports are routed to the bus to
> which the legacy PCI VGA registers are locked.  So once vga_tryget()
> is called by the nvidia driver, access to the keyboard and mouse is
> routed to bus 01 (on which the Nvidia card resides) and not to bus 00
> (on which the LPC bridge resides).

Interesting.  A spec reference would be a good addition to whatever
fix is proposed.

> My theory would be that if you lock the screen, the Nvidia card
> runtime suspends, allowing the root port above it to suspend,
> and then the I/O ports are no longer accessible.
> 
> We have a similar issue on dual GPU MacBook Pros:  Backlight brightness
> is adjusted by writing to I/O ports of a gmux controller situated below
> the LPC bridge.  The nvidia driver locks the legacy VGA registers and
> from that point reads from the I/O ports always return 0xff.  Commit
> 4eebd5a4e726 ("apple-gmux: lock iGP IO to protect from vgaarb changes")
> sought to fix it but caused other breakage which remains unfixed so far:
> https://bugzilla.kernel.org/show_bug.cgi?id=105051
> https://bugzilla.kernel.org/show_bug.cgi?id=88861#c11
> 
> I've always wondered if the Intel chipsets would behave more sensibly
> if the LPC bridge had BARs specifying the I/O regions used by devices
> below it.
> 
> Reverting runtime suspend for PCIe ports is not a good solution as it's
> needed for Thunderbolt runtime PM on Macs.

The choices are:

  1) Fix the regression and preserve runtime PM for PCIe ports
  2) Fix the regression by reverting runtime PM for PCIe ports

Obviously we hope for 1).  Preserving runtime PM without fixing the
regression isn't even on the list.  I know this is Linux 101, so I
apologize for restating the obvious.

Bjorn
--
To unsubscribe from this list: send the line "unsubscribe linux-pci" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Kilian Singer Dec. 29, 2016, 4:02 p.m. UTC | #5
One thing that was always weird in my debian system is, 
that even with working lock screen on the 4.7.0-1 version.
The lock screen is not a black screen but instead seems to
be a static screenshot of the desktop.

I currently have no system to compare with but this might be 
an abnormal behavior.

----- Original Message -----
From: "Kilian Singer" <kilian.singer@quantumtechnology.info>
To: "Bjorn Helgaas" <helgaas@kernel.org>
Cc: "Lukas Wunner" <lukas@wunner.de>, "linux-pci" <linux-pci@vger.kernel.org>, "Mika Westerberg" <mika.westerberg@linux.intel.com>, "Rafael J. Wysocki" <rafael.j.wysocki@intel.com>
Sent: Thursday, December 29, 2016 10:58:44 AM
Subject: Re: PCI: Revert "PCI: Add runtime PM support for PCIe ports"

Dear Lukas and Bjorn,
I am using the nouveau driver.
dmesg and lspci --vv is attached as a comment to the bug rebort
https://bugzilla.kernel.org/show_bug.cgi?id=190861#c15
https://bugzilla.kernel.org/show_bug.cgi?id=190861#c16

I could completely deactivate nvidia in the bios...

Best regards
Kilian

----- Original Message -----
From: "Bjorn Helgaas" <helgaas@kernel.org>
To: "Lukas Wunner" <lukas@wunner.de>
Cc: "Kilian Singer" <kilian.singer@quantumtechnology.info>, linux-pci@vger.kernel.org, "Mika Westerberg" <mika.westerberg@linux.intel.com>, "Rafael J. Wysocki" <rafael.j.wysocki@intel.com>
Sent: Wednesday, December 28, 2016 5:18:16 PM
Subject: PCI: Revert "PCI: Add runtime PM support for PCIe ports"

On Wed, Dec 28, 2016 at 12:29:54PM +0100, Lukas Wunner wrote:
> On Tue, Dec 27, 2016 at 05:57:37PM -0600, Bjorn Helgaas wrote:
> > Thanks for the report (https://bugzilla.kernel.org/show_bug.cgi?id=190861)
> > and all the debugging you've done.  Below is a revert of the troublesome
> > commit.  Can you test it and verify that it also fixes the problem?
> > 
> > I assume Mika is looking at this and will have a better solution soon.
> > But if not, I'll queue this up for v4.10.
> 
> @Kilian:  Are you using the proprietary nvidia driver?  If so,
> does the issue go away if you blacklist that driver or use nouveau
> instead?
> 
> 
> With a bit of googling I found dmesg and lspci output for this model:
> https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1437386
> 
> The keyboard and mouse seem to be PS/2 devices, accessed via I/O ports
> 0x60, 0x64.  I assume they're located behind the LPC bridge?
> 
> The proprietary nvidia driver has a bug, it locks the legacy PCI VGA
> registers with vga_tryget() but never releases that lock.  Intel
> chipsets have a quirk wherein I/O ports are routed to the bus to
> which the legacy PCI VGA registers are locked.  So once vga_tryget()
> is called by the nvidia driver, access to the keyboard and mouse is
> routed to bus 01 (on which the Nvidia card resides) and not to bus 00
> (on which the LPC bridge resides).

Interesting.  A spec reference would be a good addition to whatever
fix is proposed.

> My theory would be that if you lock the screen, the Nvidia card
> runtime suspends, allowing the root port above it to suspend,
> and then the I/O ports are no longer accessible.
> 
> We have a similar issue on dual GPU MacBook Pros:  Backlight brightness
> is adjusted by writing to I/O ports of a gmux controller situated below
> the LPC bridge.  The nvidia driver locks the legacy VGA registers and
> from that point reads from the I/O ports always return 0xff.  Commit
> 4eebd5a4e726 ("apple-gmux: lock iGP IO to protect from vgaarb changes")
> sought to fix it but caused other breakage which remains unfixed so far:
> https://bugzilla.kernel.org/show_bug.cgi?id=105051
> https://bugzilla.kernel.org/show_bug.cgi?id=88861#c11
> 
> I've always wondered if the Intel chipsets would behave more sensibly
> if the LPC bridge had BARs specifying the I/O regions used by devices
> below it.
> 
> Reverting runtime suspend for PCIe ports is not a good solution as it's
> needed for Thunderbolt runtime PM on Macs.

The choices are:

  1) Fix the regression and preserve runtime PM for PCIe ports
  2) Fix the regression by reverting runtime PM for PCIe ports

Obviously we hope for 1).  Preserving runtime PM without fixing the
regression isn't even on the list.  I know this is Linux 101, so I
apologize for restating the obvious.

Bjorn
--
To unsubscribe from this list: send the line "unsubscribe linux-pci" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Kilian Singer Dec. 29, 2016, 4:20 p.m. UTC | #6
I know it is a repetition of what I have written above but this behaviour (comment 19) should be contrasted to the behaviour on the 4.8 and 4.9 kernel which make my system unresponsive:
Here the desktop is non static. I can see xclock ticking. The mouse
moves. But any keyboard interaction or mouse click is not possible anymore.

I checked that this behavior is not related to nvidia and nouveau kernel driver. By blacklisting them.


----- Original Message -----
From: "Kilian Singer" <kilian.singer@quantumtechnology.info>
To: "Bjorn Helgaas" <helgaas@kernel.org>
Cc: "Lukas Wunner" <lukas@wunner.de>, "linux-pci" <linux-pci@vger.kernel.org>, "Mika Westerberg" <mika.westerberg@linux.intel.com>, "Rafael J. Wysocki" <rafael.j.wysocki@intel.com>
Sent: Thursday, December 29, 2016 5:02:30 PM
Subject: Re: PCI: Revert "PCI: Add runtime PM support for PCIe ports"

One thing that was always weird in my debian system is, 
that even with working lock screen on the 4.7.0-1 version.
The lock screen is not a black screen but instead seems to
be a static screenshot of the desktop.

I currently have no system to compare with but this might be 
an abnormal behavior.

----- Original Message -----
From: "Kilian Singer" <kilian.singer@quantumtechnology.info>
To: "Bjorn Helgaas" <helgaas@kernel.org>
Cc: "Lukas Wunner" <lukas@wunner.de>, "linux-pci" <linux-pci@vger.kernel.org>, "Mika Westerberg" <mika.westerberg@linux.intel.com>, "Rafael J. Wysocki" <rafael.j.wysocki@intel.com>
Sent: Thursday, December 29, 2016 10:58:44 AM
Subject: Re: PCI: Revert "PCI: Add runtime PM support for PCIe ports"

Dear Lukas and Bjorn,
I am using the nouveau driver.
dmesg and lspci --vv is attached as a comment to the bug rebort
https://bugzilla.kernel.org/show_bug.cgi?id=190861#c15
https://bugzilla.kernel.org/show_bug.cgi?id=190861#c16

I could completely deactivate nvidia in the bios...

Best regards
Kilian

----- Original Message -----
From: "Bjorn Helgaas" <helgaas@kernel.org>
To: "Lukas Wunner" <lukas@wunner.de>
Cc: "Kilian Singer" <kilian.singer@quantumtechnology.info>, linux-pci@vger.kernel.org, "Mika Westerberg" <mika.westerberg@linux.intel.com>, "Rafael J. Wysocki" <rafael.j.wysocki@intel.com>
Sent: Wednesday, December 28, 2016 5:18:16 PM
Subject: PCI: Revert "PCI: Add runtime PM support for PCIe ports"

On Wed, Dec 28, 2016 at 12:29:54PM +0100, Lukas Wunner wrote:
> On Tue, Dec 27, 2016 at 05:57:37PM -0600, Bjorn Helgaas wrote:
> > Thanks for the report (https://bugzilla.kernel.org/show_bug.cgi?id=190861)
> > and all the debugging you've done.  Below is a revert of the troublesome
> > commit.  Can you test it and verify that it also fixes the problem?
> > 
> > I assume Mika is looking at this and will have a better solution soon.
> > But if not, I'll queue this up for v4.10.
> 
> @Kilian:  Are you using the proprietary nvidia driver?  If so,
> does the issue go away if you blacklist that driver or use nouveau
> instead?
> 
> 
> With a bit of googling I found dmesg and lspci output for this model:
> https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1437386
> 
> The keyboard and mouse seem to be PS/2 devices, accessed via I/O ports
> 0x60, 0x64.  I assume they're located behind the LPC bridge?
> 
> The proprietary nvidia driver has a bug, it locks the legacy PCI VGA
> registers with vga_tryget() but never releases that lock.  Intel
> chipsets have a quirk wherein I/O ports are routed to the bus to
> which the legacy PCI VGA registers are locked.  So once vga_tryget()
> is called by the nvidia driver, access to the keyboard and mouse is
> routed to bus 01 (on which the Nvidia card resides) and not to bus 00
> (on which the LPC bridge resides).

Interesting.  A spec reference would be a good addition to whatever
fix is proposed.

> My theory would be that if you lock the screen, the Nvidia card
> runtime suspends, allowing the root port above it to suspend,
> and then the I/O ports are no longer accessible.
> 
> We have a similar issue on dual GPU MacBook Pros:  Backlight brightness
> is adjusted by writing to I/O ports of a gmux controller situated below
> the LPC bridge.  The nvidia driver locks the legacy VGA registers and
> from that point reads from the I/O ports always return 0xff.  Commit
> 4eebd5a4e726 ("apple-gmux: lock iGP IO to protect from vgaarb changes")
> sought to fix it but caused other breakage which remains unfixed so far:
> https://bugzilla.kernel.org/show_bug.cgi?id=105051
> https://bugzilla.kernel.org/show_bug.cgi?id=88861#c11
> 
> I've always wondered if the Intel chipsets would behave more sensibly
> if the LPC bridge had BARs specifying the I/O regions used by devices
> below it.
> 
> Reverting runtime suspend for PCIe ports is not a good solution as it's
> needed for Thunderbolt runtime PM on Macs.

The choices are:

  1) Fix the regression and preserve runtime PM for PCIe ports
  2) Fix the regression by reverting runtime PM for PCIe ports

Obviously we hope for 1).  Preserving runtime PM without fixing the
regression isn't even on the list.  I know this is Linux 101, so I
apologize for restating the obvious.

Bjorn
--
To unsubscribe from this list: send the line "unsubscribe linux-pci" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Rafael J. Wysocki Dec. 30, 2016, 12:19 a.m. UTC | #7
On Wednesday, December 28, 2016 10:18:16 AM Bjorn Helgaas wrote:
> On Wed, Dec 28, 2016 at 12:29:54PM +0100, Lukas Wunner wrote:
> > On Tue, Dec 27, 2016 at 05:57:37PM -0600, Bjorn Helgaas wrote:
> > > Thanks for the report (https://bugzilla.kernel.org/show_bug.cgi?id=190861)
> > > and all the debugging you've done.  Below is a revert of the troublesome
> > > commit.  Can you test it and verify that it also fixes the problem?
> > > 
> > > I assume Mika is looking at this and will have a better solution soon.
> > > But if not, I'll queue this up for v4.10.
> > 
> > @Kilian:  Are you using the proprietary nvidia driver?  If so,
> > does the issue go away if you blacklist that driver or use nouveau
> > instead?
> > 
> > 
> > With a bit of googling I found dmesg and lspci output for this model:
> > https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1437386
> > 
> > The keyboard and mouse seem to be PS/2 devices, accessed via I/O ports
> > 0x60, 0x64.  I assume they're located behind the LPC bridge?
> > 
> > The proprietary nvidia driver has a bug, it locks the legacy PCI VGA
> > registers with vga_tryget() but never releases that lock.  Intel
> > chipsets have a quirk wherein I/O ports are routed to the bus to
> > which the legacy PCI VGA registers are locked.  So once vga_tryget()
> > is called by the nvidia driver, access to the keyboard and mouse is
> > routed to bus 01 (on which the Nvidia card resides) and not to bus 00
> > (on which the LPC bridge resides).
> 
> Interesting.  A spec reference would be a good addition to whatever
> fix is proposed.
> 
> > My theory would be that if you lock the screen, the Nvidia card
> > runtime suspends, allowing the root port above it to suspend,
> > and then the I/O ports are no longer accessible.
> > 
> > We have a similar issue on dual GPU MacBook Pros:  Backlight brightness
> > is adjusted by writing to I/O ports of a gmux controller situated below
> > the LPC bridge.  The nvidia driver locks the legacy VGA registers and
> > from that point reads from the I/O ports always return 0xff.  Commit
> > 4eebd5a4e726 ("apple-gmux: lock iGP IO to protect from vgaarb changes")
> > sought to fix it but caused other breakage which remains unfixed so far:
> > https://bugzilla.kernel.org/show_bug.cgi?id=105051
> > https://bugzilla.kernel.org/show_bug.cgi?id=88861#c11
> > 
> > I've always wondered if the Intel chipsets would behave more sensibly
> > if the LPC bridge had BARs specifying the I/O regions used by devices
> > below it.
> > 
> > Reverting runtime suspend for PCIe ports is not a good solution as it's
> > needed for Thunderbolt runtime PM on Macs.
> 
> The choices are:
> 
>   1) Fix the regression and preserve runtime PM for PCIe ports
>   2) Fix the regression by reverting runtime PM for PCIe ports
> 
> Obviously we hope for 1).  Preserving runtime PM without fixing the
> regression isn't even on the list.  I know this is Linux 101, so I
> apologize for restating the obvious.

There is a couple of obvious things we can do other than reverting, though.

Like for example changing the cutoff date we have in there to cover the Kilian's
system.

Thanks,
Rafael

--
To unsubscribe from this list: send the line "unsubscribe linux-pci" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Rafael J. Wysocki Dec. 30, 2016, 2:48 p.m. UTC | #8
On Friday, December 30, 2016 01:19:14 AM Rafael J. Wysocki wrote:
> On Wednesday, December 28, 2016 10:18:16 AM Bjorn Helgaas wrote:
> > On Wed, Dec 28, 2016 at 12:29:54PM +0100, Lukas Wunner wrote:
> > > On Tue, Dec 27, 2016 at 05:57:37PM -0600, Bjorn Helgaas wrote:
> > > > Thanks for the report (https://bugzilla.kernel.org/show_bug.cgi?id=190861)
> > > > and all the debugging you've done.  Below is a revert of the troublesome
> > > > commit.  Can you test it and verify that it also fixes the problem?
> > > > 
> > > > I assume Mika is looking at this and will have a better solution soon.
> > > > But if not, I'll queue this up for v4.10.
> > > 
> > > @Kilian:  Are you using the proprietary nvidia driver?  If so,
> > > does the issue go away if you blacklist that driver or use nouveau
> > > instead?
> > > 
> > > 
> > > With a bit of googling I found dmesg and lspci output for this model:
> > > https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1437386
> > > 
> > > The keyboard and mouse seem to be PS/2 devices, accessed via I/O ports
> > > 0x60, 0x64.  I assume they're located behind the LPC bridge?
> > > 
> > > The proprietary nvidia driver has a bug, it locks the legacy PCI VGA
> > > registers with vga_tryget() but never releases that lock.  Intel
> > > chipsets have a quirk wherein I/O ports are routed to the bus to
> > > which the legacy PCI VGA registers are locked.  So once vga_tryget()
> > > is called by the nvidia driver, access to the keyboard and mouse is
> > > routed to bus 01 (on which the Nvidia card resides) and not to bus 00
> > > (on which the LPC bridge resides).
> > 
> > Interesting.  A spec reference would be a good addition to whatever
> > fix is proposed.
> > 
> > > My theory would be that if you lock the screen, the Nvidia card
> > > runtime suspends, allowing the root port above it to suspend,
> > > and then the I/O ports are no longer accessible.
> > > 
> > > We have a similar issue on dual GPU MacBook Pros:  Backlight brightness
> > > is adjusted by writing to I/O ports of a gmux controller situated below
> > > the LPC bridge.  The nvidia driver locks the legacy VGA registers and
> > > from that point reads from the I/O ports always return 0xff.  Commit
> > > 4eebd5a4e726 ("apple-gmux: lock iGP IO to protect from vgaarb changes")
> > > sought to fix it but caused other breakage which remains unfixed so far:
> > > https://bugzilla.kernel.org/show_bug.cgi?id=105051
> > > https://bugzilla.kernel.org/show_bug.cgi?id=88861#c11
> > > 
> > > I've always wondered if the Intel chipsets would behave more sensibly
> > > if the LPC bridge had BARs specifying the I/O regions used by devices
> > > below it.
> > > 
> > > Reverting runtime suspend for PCIe ports is not a good solution as it's
> > > needed for Thunderbolt runtime PM on Macs.
> > 
> > The choices are:
> > 
> >   1) Fix the regression and preserve runtime PM for PCIe ports
> >   2) Fix the regression by reverting runtime PM for PCIe ports
> > 
> > Obviously we hope for 1).  Preserving runtime PM without fixing the
> > regression isn't even on the list.  I know this is Linux 101, so I
> > apologize for restating the obvious.
> 
> There is a couple of obvious things we can do other than reverting, though.
> 
> Like for example changing the cutoff date we have in there to cover the Kilian's
> system.

And I hope you realize that this revert isn't even sufficient to fix the Kilian's
machine entirely (system suspend/resume issues will remain after it).

Thanks,
Rafael

--
To unsubscribe from this list: send the line "unsubscribe linux-pci" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Bjorn Helgaas Jan. 17, 2017, 2:56 p.m. UTC | #9
On Tue, Dec 27, 2016 at 05:57:37PM -0600, Bjorn Helgaas wrote:
> Hi Killian,
> 
> Thanks for the report (https://bugzilla.kernel.org/show_bug.cgi?id=190861)
> and all the debugging you've done.  Below is a revert of the troublesome
> commit.  Can you test it and verify that it also fixes the problem?
> 
> I assume Mika is looking at this and will have a better solution soon.
> But if not, I'll queue this up for v4.10.

Can somebody please summarize the current state of this issue?  I
assume somebody has already posted a better patch that should replace
this naive revert, but I haven't been following the whole thread.

> commit e648b1ca2b94d207289fedc2538d33c57cdbc4de
> Author: Bjorn Helgaas <bhelgaas@google.com>
> Date:   Tue Dec 27 17:27:30 2016 -0600
> 
>     Revert "PCI: Add runtime PM support for PCIe ports"
>     
>     Revert 006d44e49a25 ("PCI: Add runtime PM support for PCIe ports").
>     
>     Killian reported that on a Lenovo W54l with i7-4810MQ, Intel HD Graphics
>     4600, and NVIDIA Quadro® K1100M, locking the screen kills all keyboard and
>     mouse interaction.  Reverting 006d44e49a25 fixes the problem.
>     
>     Link: https://bugzilla.kernel.org/show_bug.cgi?id=190861
>     Reported-by: kilian.singer@quantumtechnology.info
>     Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
>     CC: stable@vger.kernel.org	# v4.8+
>     CC: Mika Westerberg <mika.westerberg@linux.intel.com>
--
To unsubscribe from this list: send the line "unsubscribe linux-pci" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Kilian Singer Jan. 17, 2017, 3:49 p.m. UTC | #10
Dear Bjorn,

I got two patches to test but they fail because I need to know the git
version against which to patch.

Also a manual patch was not possible because the source code differed
too much.

So currently I am waiting for that info.

Best regards

Kilian


On 17-Jan-17 15:56, Bjorn Helgaas wrote:
> On Tue, Dec 27, 2016 at 05:57:37PM -0600, Bjorn Helgaas wrote:
>> Hi Killian,
>>
>> Thanks for the report (https://bugzilla.kernel.org/show_bug.cgi?id=190861)
>> and all the debugging you've done.  Below is a revert of the troublesome
>> commit.  Can you test it and verify that it also fixes the problem?
>>
>> I assume Mika is looking at this and will have a better solution soon.
>> But if not, I'll queue this up for v4.10.
> Can somebody please summarize the current state of this issue?  I
> assume somebody has already posted a better patch that should replace
> this naive revert, but I haven't been following the whole thread.
>
>> commit e648b1ca2b94d207289fedc2538d33c57cdbc4de
>> Author: Bjorn Helgaas <bhelgaas@google.com>
>> Date:   Tue Dec 27 17:27:30 2016 -0600
>>
>>     Revert "PCI: Add runtime PM support for PCIe ports"
>>     
>>     Revert 006d44e49a25 ("PCI: Add runtime PM support for PCIe ports").
>>     
>>     Killian reported that on a Lenovo W54l with i7-4810MQ, Intel HD Graphics
>>     4600, and NVIDIA Quadro® K1100M, locking the screen kills all keyboard and
>>     mouse interaction.  Reverting 006d44e49a25 fixes the problem.
>>     
>>     Link: https://bugzilla.kernel.org/show_bug.cgi?id=190861
>>     Reported-by: kilian.singer@quantumtechnology.info
>>     Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
>>     CC: stable@vger.kernel.org	# v4.8+
>>     CC: Mika Westerberg <mika.westerberg@linux.intel.com>

--
To unsubscribe from this list: send the line "unsubscribe linux-pci" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Bjorn Helgaas Jan. 23, 2017, 8:33 p.m. UTC | #11
On Tue, Jan 17, 2017 at 08:56:28AM -0600, Bjorn Helgaas wrote:
> On Tue, Dec 27, 2016 at 05:57:37PM -0600, Bjorn Helgaas wrote:
> > Hi Killian,
> > 
> > Thanks for the report (https://bugzilla.kernel.org/show_bug.cgi?id=190861)
> > and all the debugging you've done.  Below is a revert of the troublesome
> > commit.  Can you test it and verify that it also fixes the problem?
> > 
> > I assume Mika is looking at this and will have a better solution soon.
> > But if not, I'll queue this up for v4.10.
> 
> Can somebody please summarize the current state of this issue?  I
> assume somebody has already posted a better patch that should replace
> this naive revert, but I haven't been following the whole thread.

This is somewhat frustrating.  Is there a better patch than the revert
mentioned below?  There was a lot of hullabaloo when I first posted
it, but I haven't seen a good alternative yet.  I intended the revert
as a worst-case scenario fix, with the expectation that somebody would
fix the problem or at least avoid it without having to do the revert.
Maybe somebody posted that better fix and I just missed it?

From my perspective (and I have not followed the whole 100 message
thread), the bare bones of the situation are that 006d44e49a25 ("PCI:
Add runtime PM support for PCIe ports") probably reduced power
consumption on some machines.  But it also made Kilian's system
unresponsive when locking the screen.

Given only those assumptions, a revert seems like a reasonable
approach.  I understand and agree that we want to save power, but
not at the expense of making systems unresponsive.

Maybe 006d44e49a25 actually fixed a functional problem in addition to
saving power?  I don't think the changelog mentions anything like
that, but if that's the case, we should certainly consider that.

We're at -rc5 already, so if we want something other than a revert,
now is the time to propose it.

> > commit e648b1ca2b94d207289fedc2538d33c57cdbc4de
> > Author: Bjorn Helgaas <bhelgaas@google.com>
> > Date:   Tue Dec 27 17:27:30 2016 -0600
> > 
> >     Revert "PCI: Add runtime PM support for PCIe ports"
> >     
> >     Revert 006d44e49a25 ("PCI: Add runtime PM support for PCIe ports").
> >     
> >     Killian reported that on a Lenovo W54l with i7-4810MQ, Intel HD Graphics
> >     4600, and NVIDIA Quadro® K1100M, locking the screen kills all keyboard and
> >     mouse interaction.  Reverting 006d44e49a25 fixes the problem.
> >     
> >     Link: https://bugzilla.kernel.org/show_bug.cgi?id=190861
> >     Reported-by: kilian.singer@quantumtechnology.info
> >     Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
> >     CC: stable@vger.kernel.org	# v4.8+
> >     CC: Mika Westerberg <mika.westerberg@linux.intel.com>
> --
> To unsubscribe from this list: send the line "unsubscribe linux-pci" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
--
To unsubscribe from this list: send the line "unsubscribe linux-pci" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Mika Westerberg Jan. 23, 2017, 9:12 p.m. UTC | #12
On Mon, Jan 23, 2017 at 02:33:35PM -0600, Bjorn Helgaas wrote:
> On Tue, Jan 17, 2017 at 08:56:28AM -0600, Bjorn Helgaas wrote:
> > On Tue, Dec 27, 2016 at 05:57:37PM -0600, Bjorn Helgaas wrote:
> > > Hi Killian,
> > > 
> > > Thanks for the report (https://bugzilla.kernel.org/show_bug.cgi?id=190861)
> > > and all the debugging you've done.  Below is a revert of the troublesome
> > > commit.  Can you test it and verify that it also fixes the problem?
> > > 
> > > I assume Mika is looking at this and will have a better solution soon.
> > > But if not, I'll queue this up for v4.10.
> > 
> > Can somebody please summarize the current state of this issue?  I
> > assume somebody has already posted a better patch that should replace
> > this naive revert, but I haven't been following the whole thread.
> 
> This is somewhat frustrating.  Is there a better patch than the revert
> mentioned below?  There was a lot of hullabaloo when I first posted
> it, but I haven't seen a good alternative yet.  I intended the revert
> as a worst-case scenario fix, with the expectation that somebody would
> fix the problem or at least avoid it without having to do the revert.
> Maybe somebody posted that better fix and I just missed it?

I understood that there is a patch here:

https://patchwork.freedesktop.org/patch/132478/

that is supposed to fix the issue. I'm waiting Kilian to test it.

> >From my perspective (and I have not followed the whole 100 message
> thread), the bare bones of the situation are that 006d44e49a25 ("PCI:
> Add runtime PM support for PCIe ports") probably reduced power
> consumption on some machines.  But it also made Kilian's system
> unresponsive when locking the screen.
> 
> Given only those assumptions, a revert seems like a reasonable
> approach.  I understand and agree that we want to save power, but
> not at the expense of making systems unresponsive.

But even if you revert the runtime PM commit, the same thing happens
when the system is suspended.

> Maybe 006d44e49a25 actually fixed a functional problem in addition to
> saving power?  I don't think the changelog mentions anything like
> that, but if that's the case, we should certainly consider that.
> 
> We're at -rc5 already, so if we want something other than a revert,
> now is the time to propose it.

Hmm, runtime PM patches went in for 4.9 IIRC. It is not a regression
introduced in v4.10 release cycle so I'm not sure why we are in such
hurry here?

I understand that the issue should be fixed but not why it should be
fixed for v4.10 as it is not a regression introduced by v4.10-rc1.
--
To unsubscribe from this list: send the line "unsubscribe linux-pci" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Lukas Wunner Jan. 24, 2017, 4:53 a.m. UTC | #13
On Mon, Jan 23, 2017 at 11:12:47PM +0200, Mika Westerberg wrote:
> On Mon, Jan 23, 2017 at 02:33:35PM -0600, Bjorn Helgaas wrote:
> > On Tue, Jan 17, 2017 at 08:56:28AM -0600, Bjorn Helgaas wrote:
> > > On Tue, Dec 27, 2016 at 05:57:37PM -0600, Bjorn Helgaas wrote:
> > > > Hi Killian,
> > > > 
> > > > Thanks for the report (https://bugzilla.kernel.org/show_bug.cgi?id=190861)
> > > > and all the debugging you've done.  Below is a revert of the troublesome
> > > > commit.  Can you test it and verify that it also fixes the problem?
> > > > 
> > > > I assume Mika is looking at this and will have a better solution soon.
> > > > But if not, I'll queue this up for v4.10.
> > > 
> > > Can somebody please summarize the current state of this issue?  I
> > > assume somebody has already posted a better patch that should replace
> > > this naive revert, but I haven't been following the whole thread.
> > 
> > This is somewhat frustrating.  Is there a better patch than the revert
> > mentioned below?  There was a lot of hullabaloo when I first posted
> > it, but I haven't seen a good alternative yet.  I intended the revert
> > as a worst-case scenario fix, with the expectation that somebody would
> > fix the problem or at least avoid it without having to do the revert.
> > Maybe somebody posted that better fix and I just missed it?
> 
> I understood that there is a patch here:
> 
> https://patchwork.freedesktop.org/patch/132478/
> 
> that is supposed to fix the issue. I'm waiting Kilian to test it.

That patch landed in Linus' tree tonight (commit 3846fd9b8600,
merge commit 3258943ddb90).

@Kilian: Could you retest with the tip of Linus' master branch?

Thanks,

Lukas
--
To unsubscribe from this list: send the line "unsubscribe linux-pci" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Bjorn Helgaas Jan. 24, 2017, 8:01 p.m. UTC | #14
On Mon, Jan 23, 2017 at 11:12:47PM +0200, Mika Westerberg wrote:
> On Mon, Jan 23, 2017 at 02:33:35PM -0600, Bjorn Helgaas wrote:
> > From my perspective (and I have not followed the whole 100 message
> > thread), the bare bones of the situation are that 006d44e49a25 ("PCI:
> > Add runtime PM support for PCIe ports") probably reduced power
> > consumption on some machines.  But it also made Kilian's system
> > unresponsive when locking the screen.
> > 
> > Given only those assumptions, a revert seems like a reasonable
> > approach.  I understand and agree that we want to save power, but
> > not at the expense of making systems unresponsive.
> 
> But even if you revert the runtime PM commit, the same thing happens
> when the system is suspended.

In other words, we always had bug A, and after adding 006d44e49a25, we
have bug A and bug B.  It is worthwhile to avoid B even if A still
exists.

Kilian tripped over B, and no doubt others have as well.  Most others
will be frustrated and unable to work around it.  We're lucky Kilian
was patient and sophisticated enough to track it down.

> > Maybe 006d44e49a25 actually fixed a functional problem in addition to
> > saving power?  I don't think the changelog mentions anything like
> > that, but if that's the case, we should certainly consider that.
> > 
> > We're at -rc5 already, so if we want something other than a revert,
> > now is the time to propose it.
> 
> Hmm, runtime PM patches went in for 4.9 IIRC. It is not a regression
> introduced in v4.10 release cycle so I'm not sure why we are in such
> hurry here?
> 
> I understand that the issue should be fixed but not why it should be
> fixed for v4.10 as it is not a regression introduced by v4.10-rc1.

As far as I can tell, the downside of a revert is only that we'll
consume a little more power.  I'm not sure why we would release v4.10
with a known issue that we can easily avoid.

Bjorn
--
To unsubscribe from this list: send the line "unsubscribe linux-pci" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Mika Westerberg Jan. 25, 2017, 9:48 a.m. UTC | #15
On Tue, Jan 24, 2017 at 02:01:03PM -0600, Bjorn Helgaas wrote:
> On Mon, Jan 23, 2017 at 11:12:47PM +0200, Mika Westerberg wrote:
> > On Mon, Jan 23, 2017 at 02:33:35PM -0600, Bjorn Helgaas wrote:
> > > From my perspective (and I have not followed the whole 100 message
> > > thread), the bare bones of the situation are that 006d44e49a25 ("PCI:
> > > Add runtime PM support for PCIe ports") probably reduced power
> > > consumption on some machines.  But it also made Kilian's system
> > > unresponsive when locking the screen.
> > > 
> > > Given only those assumptions, a revert seems like a reasonable
> > > approach.  I understand and agree that we want to save power, but
> > > not at the expense of making systems unresponsive.
> > 
> > But even if you revert the runtime PM commit, the same thing happens
> > when the system is suspended.
> 
> In other words, we always had bug A, and after adding 006d44e49a25, we
> have bug A and bug B.  It is worthwhile to avoid B even if A still
> exists.

I meant the same PCI PM series also added support for powering down PCI
bridges when the system is suspended. So the same issue happens when the
system is suspended even if the runtime PM patch is reverted.

> Kilian tripped over B, and no doubt others have as well.  Most others
> will be frustrated and unable to work around it.  We're lucky Kilian
> was patient and sophisticated enough to track it down.

I agree. Thanks Kilian for the patience :)

> > > Maybe 006d44e49a25 actually fixed a functional problem in addition to
> > > saving power?  I don't think the changelog mentions anything like
> > > that, but if that's the case, we should certainly consider that.
> > > 
> > > We're at -rc5 already, so if we want something other than a revert,
> > > now is the time to propose it.
> > 
> > Hmm, runtime PM patches went in for 4.9 IIRC. It is not a regression
> > introduced in v4.10 release cycle so I'm not sure why we are in such
> > hurry here?
> > 
> > I understand that the issue should be fixed but not why it should be
> > fixed for v4.10 as it is not a regression introduced by v4.10-rc1.
> 
> As far as I can tell, the downside of a revert is only that we'll
> consume a little more power.  I'm not sure why we would release v4.10
> with a known issue that we can easily avoid.

I've been told that reverting the nouveau driver back to use ACPI _DSM
method causes other issues.

I would rather try to understand what is actually going on and why this
happens in the first place, even if it takes longer than when v4.10 is
released, if 3846fd9b8600 ("drm/probe-helpers: Drop locking from
poll_enable") does not solve the issue.
--
To unsubscribe from this list: send the line "unsubscribe linux-pci" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Kilian Singer Jan. 25, 2017, 4:05 p.m. UTC | #16
Dear Mika,
just came back from my lecture. Booted into the new kernel runtime suspend works!
Thanks for all the support. I really enjoyed working with you all.
It was very interesting to see how bugs are fixed :)

Greetings
Kilian

----- Original Message -----
From: "Mika Westerberg" <mika.westerberg@linux.intel.com>
To: "Bjorn Helgaas" <helgaas@kernel.org>
Cc: "Kilian Singer" <kilian.singer@quantumtechnology.info>, "linux-pci" <linux-pci@vger.kernel.org>, "Lukas Wunner" <lukas@wunner.de>, "Rafael J. Wysocki" <rafael.j.wysocki@intel.com>
Sent: Wednesday, January 25, 2017 10:48:31 AM
Subject: Re: PCI: Revert "PCI: Add runtime PM support for PCIe ports"

On Tue, Jan 24, 2017 at 02:01:03PM -0600, Bjorn Helgaas wrote:
> On Mon, Jan 23, 2017 at 11:12:47PM +0200, Mika Westerberg wrote:
> > On Mon, Jan 23, 2017 at 02:33:35PM -0600, Bjorn Helgaas wrote:
> > > From my perspective (and I have not followed the whole 100 message
> > > thread), the bare bones of the situation are that 006d44e49a25 ("PCI:
> > > Add runtime PM support for PCIe ports") probably reduced power
> > > consumption on some machines.  But it also made Kilian's system
> > > unresponsive when locking the screen.
> > > 
> > > Given only those assumptions, a revert seems like a reasonable
> > > approach.  I understand and agree that we want to save power, but
> > > not at the expense of making systems unresponsive.
> > 
> > But even if you revert the runtime PM commit, the same thing happens
> > when the system is suspended.
> 
> In other words, we always had bug A, and after adding 006d44e49a25, we
> have bug A and bug B.  It is worthwhile to avoid B even if A still
> exists.

I meant the same PCI PM series also added support for powering down PCI
bridges when the system is suspended. So the same issue happens when the
system is suspended even if the runtime PM patch is reverted.

> Kilian tripped over B, and no doubt others have as well.  Most others
> will be frustrated and unable to work around it.  We're lucky Kilian
> was patient and sophisticated enough to track it down.

I agree. Thanks Kilian for the patience :)

> > > Maybe 006d44e49a25 actually fixed a functional problem in addition to
> > > saving power?  I don't think the changelog mentions anything like
> > > that, but if that's the case, we should certainly consider that.
> > > 
> > > We're at -rc5 already, so if we want something other than a revert,
> > > now is the time to propose it.
> > 
> > Hmm, runtime PM patches went in for 4.9 IIRC. It is not a regression
> > introduced in v4.10 release cycle so I'm not sure why we are in such
> > hurry here?
> > 
> > I understand that the issue should be fixed but not why it should be
> > fixed for v4.10 as it is not a regression introduced by v4.10-rc1.
> 
> As far as I can tell, the downside of a revert is only that we'll
> consume a little more power.  I'm not sure why we would release v4.10
> with a known issue that we can easily avoid.

I've been told that reverting the nouveau driver back to use ACPI _DSM
method causes other issues.

I would rather try to understand what is actually going on and why this
happens in the first place, even if it takes longer than when v4.10 is
released, if 3846fd9b8600 ("drm/probe-helpers: Drop locking from
poll_enable") does not solve the issue.
--
To unsubscribe from this list: send the line "unsubscribe linux-pci" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Mika Westerberg Jan. 25, 2017, 4:31 p.m. UTC | #17
On Wed, Jan 25, 2017 at 05:05:29PM +0100, Kilian Singer wrote:
> Dear Mika,
> just came back from my lecture. Booted into the new kernel runtime suspend works!
> Thanks for all the support. I really enjoyed working with you all.
> It was very interesting to see how bugs are fixed :)

OK, thanks a lot for your patience :)

So it was 3846fd9b8600 ("drm/probe-helpers: Drop locking from
poll_enable") that actually fixed the issue for Kilian. I hope we do not
need to revert the runtime PM patch anymore ;-)

Thanks for everyone who participated in this.
--
To unsubscribe from this list: send the line "unsubscribe linux-pci" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Bjorn Helgaas Jan. 25, 2017, 5:58 p.m. UTC | #18
[+cc Ben, since this bug was reported against a Debian stretch kernel,
Daniel, Dave, Chris, Lyude]

On Tue, Dec 27, 2016 at 05:57:37PM -0600, Bjorn Helgaas wrote:
> Hi Killian,
> 
> Thanks for the report (https://bugzilla.kernel.org/show_bug.cgi?id=190861)
> and all the debugging you've done.  Below is a revert of the troublesome
> commit.  Can you test it and verify that it also fixes the problem?
> 
> I assume Mika is looking at this and will have a better solution soon.
> But if not, I'll queue this up for v4.10.
> 
> 
> commit e648b1ca2b94d207289fedc2538d33c57cdbc4de
> Author: Bjorn Helgaas <bhelgaas@google.com>
> Date:   Tue Dec 27 17:27:30 2016 -0600
> 
>     Revert "PCI: Add runtime PM support for PCIe ports"
>     
>     Revert 006d44e49a25 ("PCI: Add runtime PM support for PCIe ports").
>     
>     Killian reported that on a Lenovo W54l with i7-4810MQ, Intel HD Graphics
>     4600, and NVIDIA Quadro® K1100M, locking the screen kills all keyboard and
>     mouse interaction.  Reverting 006d44e49a25 fixes the problem.
>     
>     Link: https://bugzilla.kernel.org/show_bug.cgi?id=190861
>     Reported-by: kilian.singer@quantumtechnology.info
>     Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
>     CC: stable@vger.kernel.org	# v4.8+
>     CC: Mika Westerberg <mika.westerberg@linux.intel.com>

I dropped this revert, since Kilian has confirmed that 3846fd9b8600
("drm/probe-helpers: Drop locking from poll_enable"), which is already
in Linus' tree, fixes the problem.

Unfortunately the 3846fd9b8600 changelog does not mention this problem and
I don't know what's required to backport the fix to v4.9.

> diff --git a/drivers/pci/pcie/portdrv_core.c b/drivers/pci/pcie/portdrv_core.c
> index 9698289..dcb185c 100644
> --- a/drivers/pci/pcie/portdrv_core.c
> +++ b/drivers/pci/pcie/portdrv_core.c
> @@ -11,7 +11,6 @@
>  #include <linux/kernel.h>
>  #include <linux/errno.h>
>  #include <linux/pm.h>
> -#include <linux/pm_runtime.h>
>  #include <linux/string.h>
>  #include <linux/slab.h>
>  #include <linux/pcieport_if.h>
> @@ -343,8 +342,6 @@ static int pcie_device_init(struct pci_dev *pdev, int service, int irq)
>  		return retval;
>  	}
>  
> -	pm_runtime_no_callbacks(device);
> -
>  	return 0;
>  }
>  
> diff --git a/drivers/pci/pcie/portdrv_pci.c b/drivers/pci/pcie/portdrv_pci.c
> index 8aa3f14..d3af264 100644
> --- a/drivers/pci/pcie/portdrv_pci.c
> +++ b/drivers/pci/pcie/portdrv_pci.c
> @@ -85,26 +85,6 @@ static int pcie_port_resume_noirq(struct device *dev)
>  	return 0;
>  }
>  
> -static int pcie_port_runtime_suspend(struct device *dev)
> -{
> -	return to_pci_dev(dev)->bridge_d3 ? 0 : -EBUSY;
> -}
> -
> -static int pcie_port_runtime_resume(struct device *dev)
> -{
> -	return 0;
> -}
> -
> -static int pcie_port_runtime_idle(struct device *dev)
> -{
> -	/*
> -	 * Assume the PCI core has set bridge_d3 whenever it thinks the port
> -	 * should be good to go to D3.  Everything else, including moving
> -	 * the port to D3, is handled by the PCI core.
> -	 */
> -	return to_pci_dev(dev)->bridge_d3 ? 0 : -EBUSY;
> -}
> -
>  static const struct dev_pm_ops pcie_portdrv_pm_ops = {
>  	.suspend	= pcie_port_device_suspend,
>  	.resume		= pcie_port_device_resume,
> @@ -113,9 +93,6 @@ static const struct dev_pm_ops pcie_portdrv_pm_ops = {
>  	.poweroff	= pcie_port_device_suspend,
>  	.restore	= pcie_port_device_resume,
>  	.resume_noirq	= pcie_port_resume_noirq,
> -	.runtime_suspend = pcie_port_runtime_suspend,
> -	.runtime_resume	= pcie_port_runtime_resume,
> -	.runtime_idle	= pcie_port_runtime_idle,
>  };
>  
>  #define PCIE_PORTDRV_PM_OPS	(&pcie_portdrv_pm_ops)
> @@ -149,31 +126,11 @@ static int pcie_portdrv_probe(struct pci_dev *dev,
>  		return status;
>  
>  	pci_save_state(dev);
> -
> -	if (pci_bridge_d3_possible(dev)) {
> -		/*
> -		 * Keep the port resumed 100ms to make sure things like
> -		 * config space accesses from userspace (lspci) will not
> -		 * cause the port to repeatedly suspend and resume.
> -		 */
> -		pm_runtime_set_autosuspend_delay(&dev->dev, 100);
> -		pm_runtime_use_autosuspend(&dev->dev);
> -		pm_runtime_mark_last_busy(&dev->dev);
> -		pm_runtime_put_autosuspend(&dev->dev);
> -		pm_runtime_allow(&dev->dev);
> -	}
> -
>  	return 0;
>  }
>  
>  static void pcie_portdrv_remove(struct pci_dev *dev)
>  {
> -	if (pci_bridge_d3_possible(dev)) {
> -		pm_runtime_forbid(&dev->dev);
> -		pm_runtime_get_noresume(&dev->dev);
> -		pm_runtime_dont_use_autosuspend(&dev->dev);
> -	}
> -
>  	pcie_port_device_remove(dev);
>  }
>  
> --
> To unsubscribe from this list: send the line "unsubscribe linux-pci" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
--
To unsubscribe from this list: send the line "unsubscribe linux-pci" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
diff mbox

Patch

diff --git a/drivers/pci/pcie/portdrv_core.c b/drivers/pci/pcie/portdrv_core.c
index 9698289..dcb185c 100644
--- a/drivers/pci/pcie/portdrv_core.c
+++ b/drivers/pci/pcie/portdrv_core.c
@@ -11,7 +11,6 @@ 
 #include <linux/kernel.h>
 #include <linux/errno.h>
 #include <linux/pm.h>
-#include <linux/pm_runtime.h>
 #include <linux/string.h>
 #include <linux/slab.h>
 #include <linux/pcieport_if.h>
@@ -343,8 +342,6 @@  static int pcie_device_init(struct pci_dev *pdev, int service, int irq)
 		return retval;
 	}
 
-	pm_runtime_no_callbacks(device);
-
 	return 0;
 }
 
diff --git a/drivers/pci/pcie/portdrv_pci.c b/drivers/pci/pcie/portdrv_pci.c
index 8aa3f14..d3af264 100644
--- a/drivers/pci/pcie/portdrv_pci.c
+++ b/drivers/pci/pcie/portdrv_pci.c
@@ -85,26 +85,6 @@  static int pcie_port_resume_noirq(struct device *dev)
 	return 0;
 }
 
-static int pcie_port_runtime_suspend(struct device *dev)
-{
-	return to_pci_dev(dev)->bridge_d3 ? 0 : -EBUSY;
-}
-
-static int pcie_port_runtime_resume(struct device *dev)
-{
-	return 0;
-}
-
-static int pcie_port_runtime_idle(struct device *dev)
-{
-	/*
-	 * Assume the PCI core has set bridge_d3 whenever it thinks the port
-	 * should be good to go to D3.  Everything else, including moving
-	 * the port to D3, is handled by the PCI core.
-	 */
-	return to_pci_dev(dev)->bridge_d3 ? 0 : -EBUSY;
-}
-
 static const struct dev_pm_ops pcie_portdrv_pm_ops = {
 	.suspend	= pcie_port_device_suspend,
 	.resume		= pcie_port_device_resume,
@@ -113,9 +93,6 @@  static const struct dev_pm_ops pcie_portdrv_pm_ops = {
 	.poweroff	= pcie_port_device_suspend,
 	.restore	= pcie_port_device_resume,
 	.resume_noirq	= pcie_port_resume_noirq,
-	.runtime_suspend = pcie_port_runtime_suspend,
-	.runtime_resume	= pcie_port_runtime_resume,
-	.runtime_idle	= pcie_port_runtime_idle,
 };
 
 #define PCIE_PORTDRV_PM_OPS	(&pcie_portdrv_pm_ops)
@@ -149,31 +126,11 @@  static int pcie_portdrv_probe(struct pci_dev *dev,
 		return status;
 
 	pci_save_state(dev);
-
-	if (pci_bridge_d3_possible(dev)) {
-		/*
-		 * Keep the port resumed 100ms to make sure things like
-		 * config space accesses from userspace (lspci) will not
-		 * cause the port to repeatedly suspend and resume.
-		 */
-		pm_runtime_set_autosuspend_delay(&dev->dev, 100);
-		pm_runtime_use_autosuspend(&dev->dev);
-		pm_runtime_mark_last_busy(&dev->dev);
-		pm_runtime_put_autosuspend(&dev->dev);
-		pm_runtime_allow(&dev->dev);
-	}
-
 	return 0;
 }
 
 static void pcie_portdrv_remove(struct pci_dev *dev)
 {
-	if (pci_bridge_d3_possible(dev)) {
-		pm_runtime_forbid(&dev->dev);
-		pm_runtime_get_noresume(&dev->dev);
-		pm_runtime_dont_use_autosuspend(&dev->dev);
-	}
-
 	pcie_port_device_remove(dev);
 }