diff mbox

Revert "apple-gmux: lock iGP IO to protect from vgaarb changes"

Message ID 3671c3081a0ea3226dbb4c04e780ad7b6be0f201.1516793407.git.lukas@wunner.de (mailing list archive)
State Accepted, archived
Delegated to: Darren Hart
Headers show

Commit Message

Lukas Wunner Jan. 24, 2018, 6:35 p.m. UTC
Commit 4eebd5a4e726 ("apple-gmux: lock iGP IO to protect from vgaarb
changes") amended this driver's ->probe hook to lock decoding of normal
(non-legacy) I/O space accesses to the integrated GPU on dual-GPU
MacBook Pros.  The lock stays in place until the driver is unbound.

The change was made to work around an issue with the out-of-tree nvidia
graphics driver (available at http://www.nvidia.com/object/unix.html).
It contains the following sequence in nvidia/nv.c:

	#if defined(CONFIG_VGA_ARB) && !defined(NVCPU_PPC64LE)
	#if defined(VGA_DEFAULT_DEVICE)
	    vga_tryget(VGA_DEFAULT_DEVICE, VGA_RSRC_LEGACY_MASK);
	#endif
	    vga_set_legacy_decoding(dev, VGA_RSRC_NONE);
	#endif

This code was reported to cause deadlocks with VFIO already in 2013:
https://devtalk.nvidia.com/default/topic/545560

I've reported the issue to Nvidia developers once more in 2017:
https://www.spinics.net/lists/dri-devel/msg138754.html

On the MacBookPro10,1, this code apparently breaks backlight control
(which is handled by apple-gmux via an I/O region starting at 0x700),
as reported by Petri Hodju:
https://bugzilla.kernel.org/show_bug.cgi?id=86121

I tried to replicate Petri's observations on my MacBook9,1, which uses
the same Intel Ivy Bridge + Nvidia GeForce GT 650M architecture, to no
avail.  On my machine apple-gmux' I/O region remains accessible even
with the nvidia driver loaded and commit 4eebd5a4e726 reverted.
Petri reported that apple-gmux becomes accessible again after a
suspend/resume cycle because the BIOS changed the VGA routing on the
root port to the Nvidia GPU.  Perhaps this is a BIOS issue after all
that can be fixed with an update?

In any case, the change made by commit 4eebd5a4e726 has turned out to
cause two new issues:

* Wilfried Klaebe reports a deadlock when launching Xorg because it
  opens /dev/vga_arbiter and calls vga_get(), but apple-gmux is holding
  a lock on I/O space indefinitely.  It looks like apple-gmux' current
  behavior is an abuse of the vgaarb API as locks are not meant to be
  held for longer periods:
  https://bugzilla.kernel.org/show_bug.cgi?id=88861#c11
  https://bugzilla.kernel.org/attachment.cgi?id=217541

* On dual GPU MacBook Pros introduced since 2013, the integrated GPU is
  powergated on boot und thus becomes invisible to Linux unless a custom
  EFI protocol is used to leave it powered on.  (A patch exists but is
  not in mainline yet due to several negative side effects.)  On these
  machines, locking I/O to the integrated GPU (as done by 4eebd5a4e726)
  fails and backlight control is therefore broken:
  https://bugzilla.kernel.org/show_bug.cgi?id=105051

So let's revert commit 4eebd5a4e726 please.  Users experiencing the
issue with the proprietary nvidia driver can comment out the above-
quoted problematic code as a workaround (or try updating the BIOS).

Cc: Petri Hodju <petrihodju@yahoo.com>
Cc: Bjorn Helgaas <bhelgaas@google.com>
Cc: Bruno Prémont <bonbons@linux-vserver.org>
Cc: Andy Ritger <aritger@nvidia.com>
Cc: Ronald Tschalär <ronald@innovation.ch>
Tested-by: Wilfried Klaebe <linux-kernel@lebenslange-mailadresse.de>
Signed-off-by: Lukas Wunner <lukas@wunner.de>
---
 drivers/platform/x86/apple-gmux.c | 48 +--------------------------------------
 1 file changed, 1 insertion(+), 47 deletions(-)

Comments

Wilfried Klaebe Jan. 25, 2018, 8:10 p.m. UTC | #1
On Wed, Jan 24, 2018 at 07:35:45PM +0100, Lukas Wunner wrote:
> * Wilfried Klaebe reports a deadlock when launching Xorg because it
>   opens /dev/vga_arbiter and calls vga_get(), but apple-gmux is holding
>   a lock on I/O space indefinitely.  It looks like apple-gmux' current
>   behavior is an abuse of the vgaarb API as locks are not meant to be
>   held for longer periods:
>   https://bugzilla.kernel.org/show_bug.cgi?id=88861#c11
>   https://bugzilla.kernel.org/attachment.cgi?id=217541

I can confirm that every kernel including 4eebd5a4e726 that I tried
needed that commit reverted. Currently I run 4.15-rc9 and 4.14.14
(without 4eebd5a4e726, of course). Somewhere inbetween, using "git
revert 4eebd5a4e726" started acting up and I needed to work on it
manually, but the diff I currently apply is identical to what Lukas
submitted. 

I see warnings about possible circular locking regarding vgasr_mutex,
but haven't had a deadlock yet. I'd be happy to provide more information
as timely as job and family allow.

Best regards,
Wilfried
Lukas Wunner Jan. 25, 2018, 8:25 p.m. UTC | #2
On Thu, Jan 25, 2018 at 09:10:48PM +0100, Wilfried Klaebe wrote:
> On Wed, Jan 24, 2018 at 07:35:45PM +0100, Lukas Wunner wrote:
> > * Wilfried Klaebe reports a deadlock when launching Xorg because it
> >   opens /dev/vga_arbiter and calls vga_get(), but apple-gmux is holding
> >   a lock on I/O space indefinitely.  It looks like apple-gmux' current
> >   behavior is an abuse of the vgaarb API as locks are not meant to be
> >   held for longer periods:
> >   https://bugzilla.kernel.org/show_bug.cgi?id=88861#c11
> >   https://bugzilla.kernel.org/attachment.cgi?id=217541
[...]
> I see warnings about possible circular locking regarding vgasr_mutex,
> but haven't had a deadlock yet. I'd be happy to provide more information
> as timely as job and family allow.

Could you attach dmesg output with that lockdep splat to the
above-quoted bugzilla entry?

Thanks,

Lukas
Darren Hart Jan. 25, 2018, 10:38 p.m. UTC | #3
On Wed, Jan 24, 2018 at 07:35:45PM +0100, Lukas Wunner wrote:
> Commit 4eebd5a4e726 ("apple-gmux: lock iGP IO to protect from vgaarb
> changes") amended this driver's ->probe hook to lock decoding of normal
> (non-legacy) I/O space accesses to the integrated GPU on dual-GPU
> MacBook Pros.  The lock stays in place until the driver is unbound.
> 
...
> 
> So let's revert commit 4eebd5a4e726 please.  Users experiencing the
> issue with the proprietary nvidia driver can comment out the above-
> quoted problematic code as a workaround (or try updating the BIOS).
> 
> Cc: Petri Hodju <petrihodju@yahoo.com>
> Cc: Bjorn Helgaas <bhelgaas@google.com>
> Cc: Bruno Prémont <bonbons@linux-vserver.org>
> Cc: Andy Ritger <aritger@nvidia.com>
> Cc: Ronald Tschalär <ronald@innovation.ch>

I presume we'd like to see this applied to linux-stable as well?
Lukas Wunner Jan. 26, 2018, 12:57 a.m. UTC | #4
On Thu, Jan 25, 2018 at 02:38:00PM -0800, Darren Hart wrote:
> On Wed, Jan 24, 2018 at 07:35:45PM +0100, Lukas Wunner wrote:
> > Commit 4eebd5a4e726 ("apple-gmux: lock iGP IO to protect from vgaarb
> > changes") amended this driver's ->probe hook to lock decoding of normal
> > (non-legacy) I/O space accesses to the integrated GPU on dual-GPU
> > MacBook Pros.  The lock stays in place until the driver is unbound.
> > 
> ...
> > 
> > So let's revert commit 4eebd5a4e726 please.  Users experiencing the
> > issue with the proprietary nvidia driver can comment out the above-
> > quoted problematic code as a workaround (or try updating the BIOS).
> > 
> > Cc: Petri Hodju <petrihodju@yahoo.com>
> > Cc: Bjorn Helgaas <bhelgaas@google.com>
> > Cc: Bruno Prémont <bonbons@linux-vserver.org>
> > Cc: Andy Ritger <aritger@nvidia.com>
> > Cc: Ronald Tschalär <ronald@innovation.ch>
> 
> I presume we'd like to see this applied to linux-stable as well?

Fine by me.  Can you add the stable designation when applying or
would you prefer me to respin?

Thanks,

Lukas
Wilfried Klaebe Jan. 26, 2018, 6:53 p.m. UTC | #5
Am Thu, Jan 25, 2018 at 09:25:30PM +0100 schrieb Lukas Wunner:
> On Thu, Jan 25, 2018 at 09:10:48PM +0100, Wilfried Klaebe wrote:
> > On Wed, Jan 24, 2018 at 07:35:45PM +0100, Lukas Wunner wrote:
> > > * Wilfried Klaebe reports a deadlock when launching Xorg because it
> > >   opens /dev/vga_arbiter and calls vga_get(), but apple-gmux is holding
> > >   a lock on I/O space indefinitely.  It looks like apple-gmux' current
> > >   behavior is an abuse of the vgaarb API as locks are not meant to be
> > >   held for longer periods:
> > >   https://bugzilla.kernel.org/show_bug.cgi?id=88861#c11
> > >   https://bugzilla.kernel.org/attachment.cgi?id=217541
> [...]
> > I see warnings about possible circular locking regarding vgasr_mutex,
> > but haven't had a deadlock yet. I'd be happy to provide more information
> > as timely as job and family allow.
> 
> Could you attach dmesg output with that lockdep splat to the
> above-quoted bugzilla entry?

Done.

Regards,
Wilfried
Darren Hart Jan. 31, 2018, 6:39 p.m. UTC | #6
On Thu, Jan 25, 2018 at 02:38:00PM -0800, Darren Hart wrote:
> On Wed, Jan 24, 2018 at 07:35:45PM +0100, Lukas Wunner wrote:
> > Commit 4eebd5a4e726 ("apple-gmux: lock iGP IO to protect from vgaarb
> > changes") amended this driver's ->probe hook to lock decoding of normal
> > (non-legacy) I/O space accesses to the integrated GPU on dual-GPU
> > MacBook Pros.  The lock stays in place until the driver is unbound.
> > 
> ...
> > 
> > So let's revert commit 4eebd5a4e726 please.  Users experiencing the
> > issue with the proprietary nvidia driver can comment out the above-
> > quoted problematic code as a workaround (or try updating the BIOS).
> > 
> > Cc: Petri Hodju <petrihodju@yahoo.com>
> > Cc: Bjorn Helgaas <bhelgaas@google.com>
> > Cc: Bruno Prémont <bonbons@linux-vserver.org>
> > Cc: Andy Ritger <aritger@nvidia.com>
> > Cc: Ronald Tschalär <ronald@innovation.ch>
> 
> I presume we'd like to see this applied to linux-stable as well?

I haven't heard back, but from what I've seen in this thread, this appears to me
to be in need of porting back to stable releases. I'm pushing this to testing,
and then shortly to for-next, and then to Linus.

Please let me know if this isn't what everyone is expecting.
Lukas Wunner Jan. 31, 2018, 7:44 p.m. UTC | #7
On Wed, Jan 31, 2018 at 10:39:25AM -0800, Darren Hart wrote:
> On Thu, Jan 25, 2018 at 02:38:00PM -0800, Darren Hart wrote:
> > On Wed, Jan 24, 2018 at 07:35:45PM +0100, Lukas Wunner wrote:
> > > Commit 4eebd5a4e726 ("apple-gmux: lock iGP IO to protect from vgaarb
> > > changes") amended this driver's ->probe hook to lock decoding of normal
> > > (non-legacy) I/O space accesses to the integrated GPU on dual-GPU
> > > MacBook Pros.  The lock stays in place until the driver is unbound.
> > > 
> > ...
> > > 
> > > So let's revert commit 4eebd5a4e726 please.  Users experiencing the
> > > issue with the proprietary nvidia driver can comment out the above-
> > > quoted problematic code as a workaround (or try updating the BIOS).
> > > 
> > > Cc: Petri Hodju <petrihodju@yahoo.com>
> > > Cc: Bjorn Helgaas <bhelgaas@google.com>
> > > Cc: Bruno Prémont <bonbons@linux-vserver.org>
> > > Cc: Andy Ritger <aritger@nvidia.com>
> > > Cc: Ronald Tschalär <ronald@innovation.ch>
> > 
> > I presume we'd like to see this applied to linux-stable as well?
> 
> I haven't heard back

I did reply to your e-mail on Jan 26, quote:
"Fine by me.  Can you add the stable designation when applying or
 would you prefer me to respin?"

https://www.spinics.net/lists/platform-driver-x86/msg14690.html


> but from what I've seen in this thread, this appears to me
> to be in need of porting back to stable releases.
> I'm pushing this to testing, and then shortly to for-next, and then to Linus.
> Please let me know if this isn't what everyone is expecting.

WFM, thanks!

Lukas
Andy Ritger Feb. 1, 2018, 1:31 a.m. UTC | #8
On Wed, Jan 31, 2018 at 08:44:20PM +0100, Lukas Wunner wrote:
> On Wed, Jan 31, 2018 at 10:39:25AM -0800, Darren Hart wrote:
> > On Thu, Jan 25, 2018 at 02:38:00PM -0800, Darren Hart wrote:
> > > On Wed, Jan 24, 2018 at 07:35:45PM +0100, Lukas Wunner wrote:
> > > > Commit 4eebd5a4e726 ("apple-gmux: lock iGP IO to protect from vgaarb
> > > > changes") amended this driver's ->probe hook to lock decoding of normal
> > > > (non-legacy) I/O space accesses to the integrated GPU on dual-GPU
> > > > MacBook Pros.  The lock stays in place until the driver is unbound.
> > > > 
> > > ...
> > > > 
> > > > So let's revert commit 4eebd5a4e726 please.  Users experiencing the
> > > > issue with the proprietary nvidia driver can comment out the above-
> > > > quoted problematic code as a workaround (or try updating the BIOS).
> > > > 
> > > > Cc: Petri Hodju <petrihodju@yahoo.com>
> > > > Cc: Bjorn Helgaas <bhelgaas@google.com>
> > > > Cc: Bruno Prémont <bonbons@linux-vserver.org>
> > > > Cc: Andy Ritger <aritger@nvidia.com>
> > > > Cc: Ronald Tschalär <ronald@innovation.ch>
> > > 
> > > I presume we'd like to see this applied to linux-stable as well?
> > 
> > I haven't heard back
> 
> I did reply to your e-mail on Jan 26, quote:
> "Fine by me.  Can you add the stable designation when applying or
>  would you prefer me to respin?"
> 
> https://www.spinics.net/lists/platform-driver-x86/msg14690.html
> 
> 
> > but from what I've seen in this thread, this appears to me
> > to be in need of porting back to stable releases.
> > I'm pushing this to testing, and then shortly to for-next, and then to Linus.
> > Please let me know if this isn't what everyone is expecting.
> 
> WFM, thanks!
> 
> Lukas

Sorry for responding slowly to the thread.  Yes, if this reintroduces
interaction problems for users of the nvidia proprietary driver, then
commenting out the code you noted may be the best workaround for them
in the short term.

Anyway, the patch, for whichever kernel trees, is Acked-by: Andy Ritger <aritger@nvidia.com>
diff mbox

Patch

diff --git a/drivers/platform/x86/apple-gmux.c b/drivers/platform/x86/apple-gmux.c
index 623d322447a2..7c4eb86c851e 100644
--- a/drivers/platform/x86/apple-gmux.c
+++ b/drivers/platform/x86/apple-gmux.c
@@ -24,7 +24,6 @@ 
 #include <linux/delay.h>
 #include <linux/pci.h>
 #include <linux/vga_switcheroo.h>
-#include <linux/vgaarb.h>
 #include <acpi/video.h>
 #include <asm/io.h>
 
@@ -54,7 +53,6 @@  struct apple_gmux_data {
 	bool indexed;
 	struct mutex index_lock;
 
-	struct pci_dev *pdev;
 	struct backlight_device *bdev;
 
 	/* switcheroo data */
@@ -599,23 +597,6 @@  static int gmux_resume(struct device *dev)
 	return 0;
 }
 
-static struct pci_dev *gmux_get_io_pdev(void)
-{
-	struct pci_dev *pdev = NULL;
-
-	while ((pdev = pci_get_class(PCI_CLASS_DISPLAY_VGA << 8, pdev))) {
-		u16 cmd;
-
-		pci_read_config_word(pdev, PCI_COMMAND, &cmd);
-		if (!(cmd & PCI_COMMAND_IO))
-			continue;
-
-		return pdev;
-	}
-
-	return NULL;
-}
-
 static int is_thunderbolt(struct device *dev, void *data)
 {
 	return to_pci_dev(dev)->is_thunderbolt;
@@ -631,7 +612,6 @@  static int gmux_probe(struct pnp_dev *pnp, const struct pnp_device_id *id)
 	int ret = -ENXIO;
 	acpi_status status;
 	unsigned long long gpe;
-	struct pci_dev *pdev = NULL;
 
 	if (apple_gmux_data)
 		return -EBUSY;
@@ -682,7 +662,7 @@  static int gmux_probe(struct pnp_dev *pnp, const struct pnp_device_id *id)
 			ver_minor = (version >> 16) & 0xff;
 			ver_release = (version >> 8) & 0xff;
 		} else {
-			pr_info("gmux device not present or IO disabled\n");
+			pr_info("gmux device not present\n");
 			ret = -ENODEV;
 			goto err_release;
 		}
@@ -690,23 +670,6 @@  static int gmux_probe(struct pnp_dev *pnp, const struct pnp_device_id *id)
 	pr_info("Found gmux version %d.%d.%d [%s]\n", ver_major, ver_minor,
 		ver_release, (gmux_data->indexed ? "indexed" : "classic"));
 
-	/*
-	 * Apple systems with gmux are EFI based and normally don't use
-	 * VGA. In addition changing IO+MEM ownership between IGP and dGPU
-	 * disables IO/MEM used for backlight control on some systems.
-	 * Lock IO+MEM to GPU with active IO to prevent switch.
-	 */
-	pdev = gmux_get_io_pdev();
-	if (pdev && vga_tryget(pdev,
-			       VGA_RSRC_NORMAL_IO | VGA_RSRC_NORMAL_MEM)) {
-		pr_err("IO+MEM vgaarb-locking for PCI:%s failed\n",
-			pci_name(pdev));
-		ret = -EBUSY;
-		goto err_release;
-	} else if (pdev)
-		pr_info("locked IO for PCI:%s\n", pci_name(pdev));
-	gmux_data->pdev = pdev;
-
 	memset(&props, 0, sizeof(props));
 	props.type = BACKLIGHT_PLATFORM;
 	props.max_brightness = gmux_read32(gmux_data, GMUX_PORT_MAX_BRIGHTNESS);
@@ -822,10 +785,6 @@  static int gmux_probe(struct pnp_dev *pnp, const struct pnp_device_id *id)
 err_notify:
 	backlight_device_unregister(bdev);
 err_release:
-	if (gmux_data->pdev)
-		vga_put(gmux_data->pdev,
-			VGA_RSRC_NORMAL_IO | VGA_RSRC_NORMAL_MEM);
-	pci_dev_put(pdev);
 	release_region(gmux_data->iostart, gmux_data->iolen);
 err_free:
 	kfree(gmux_data);
@@ -845,11 +804,6 @@  static void gmux_remove(struct pnp_dev *pnp)
 					   &gmux_notify_handler);
 	}
 
-	if (gmux_data->pdev) {
-		vga_put(gmux_data->pdev,
-			VGA_RSRC_NORMAL_IO | VGA_RSRC_NORMAL_MEM);
-		pci_dev_put(gmux_data->pdev);
-	}
 	backlight_device_unregister(gmux_data->bdev);
 
 	release_region(gmux_data->iostart, gmux_data->iolen);