Message ID | 20160218203431.GA19065@wunner.de (mailing list archive) |
---|---|
State | New, archived |
Headers | show |
On Thu, Feb 18, 2016 at 9:34 PM, Lukas Wunner <lukas@wunner.de> wrote: > >> Ok, makes sense. I still think adding the check to the client_register >> function would be good, just as a safety measure. > > Hm, the idea of calling vga_switcheroo_client_probe_defer() twice > causes me pain in the stomach. It's surprising for drivers which > just don't need it at the moment (amdgpu and snd_hda_intel) and > it feels like overengineering and pampering driver developers > beyond reasonable measure. Also while the single existing check is > cheap, we might later on add checks that take more time and slow > things down. It's motivated by Rusty's API Manifesto: http://sweng.the-davies.net/Home/rustys-api-design-manifesto With the mandatory check in _register we reach level 5 - it'll blow up at runtime when we try to register it. Without that the failure is completely silent, and you need to read the right mailing list thread (level 1), but at least the kerneldocs lift it up to level 3. For more context: We have tons of fun with EPROBE_DEFER handling between i915 and snd-hda, and I'm looking into all possible means to make any api/subsystem using deferred probing as robust as possible by default. One of the ideas is to inject deferred probe failures at runtime, but that's kinda hard to do in a generic way. At least making it as close to impossible to abuse as feasible is the next best option. -Daniel
Hi, On Thu, Feb 18, 2016 at 10:39:05PM +0100, Daniel Vetter wrote: > On Thu, Feb 18, 2016 at 9:34 PM, Lukas Wunner <lukas@wunner.de> wrote: > > > >> Ok, makes sense. I still think adding the check to the client_register > >> function would be good, just as a safety measure. > > > > Hm, the idea of calling vga_switcheroo_client_probe_defer() twice > > causes me pain in the stomach. It's surprising for drivers which > > just don't need it at the moment (amdgpu and snd_hda_intel) and > > it feels like overengineering and pampering driver developers > > beyond reasonable measure. Also while the single existing check is > > cheap, we might later on add checks that take more time and slow > > things down. > > It's motivated by Rusty's API Manifesto: > > http://sweng.the-davies.net/Home/rustys-api-design-manifesto Interesting, thank you. > With the mandatory check in _register we reach level 5 - it'll blow up > at runtime when we try to register it. The manifesto says "5. Do it right or it will always break at runtime". However even if we add a WARN_ON(vga_switcheroo_client_probe_defer(pdev)) to register_client(), it will not *always* spew a stacktrace but only on the machines this concerns (MacBook Pros). Since failure to defer breaks GPU switching, level 5 is already reached. Chances are this won't go unnoticed by the user. > For more context: We have tons of fun with EPROBE_DEFER handling > between i915 and snd-hda I don't understand, there is currently not a single occurrence of EPROBE_DEFER in i915, apart from the one I added. In sound/ there are 88 occurrences of EPROBE_DEFER in soc/, plus 1 in ppc/ and that's it. So not a single one in pci/hda/ where hda_intel.c resides. Is the fun with EPROBE_DEFER handling caused by the lack thereof? Best regards, Lukas
On Thu, Feb 18, 2016 at 11:20 PM, Lukas Wunner <lukas@wunner.de> wrote: > Hi, > > On Thu, Feb 18, 2016 at 10:39:05PM +0100, Daniel Vetter wrote: >> On Thu, Feb 18, 2016 at 9:34 PM, Lukas Wunner <lukas@wunner.de> wrote: >> > >> >> Ok, makes sense. I still think adding the check to the client_register >> >> function would be good, just as a safety measure. >> > >> > Hm, the idea of calling vga_switcheroo_client_probe_defer() twice >> > causes me pain in the stomach. It's surprising for drivers which >> > just don't need it at the moment (amdgpu and snd_hda_intel) and >> > it feels like overengineering and pampering driver developers >> > beyond reasonable measure. Also while the single existing check is >> > cheap, we might later on add checks that take more time and slow >> > things down. >> >> It's motivated by Rusty's API Manifesto: >> >> http://sweng.the-davies.net/Home/rustys-api-design-manifesto > > Interesting, thank you. > > >> With the mandatory check in _register we reach level 5 - it'll blow up >> at runtime when we try to register it. > > The manifesto says "5. Do it right or it will always break at runtime". > > However even if we add a WARN_ON(vga_switcheroo_client_probe_defer(pdev)) > to register_client(), it will not *always* spew a stacktrace but only on > the machines this concerns (MacBook Pros). Since failure to defer breaks > GPU switching, level 5 is already reached. Chances are this won't go > unnoticed by the user. If we fail the register hopefully the driver checks for that and might blow up somewhere in untested error handling code. But there's a good chance it'll fail (we can encourage that more by adding must_check to the function declaration). In that case you get a nice bug report with splat from users hitting this. Without this it'll silently work, and all the reports you get is "linux is shit, gpu switching doesn't work". In both cases it can sometimes succeed, which is not great indeed. But I'm trying to fix that by injection EDEFER points artificially somehow. Not yet figured out that one. But irrespective of the precise failure mode making the defer check mandatory by just including it in _register() is better since it makes it impossible to forget to call it when its needed. So imo clearly the more robust API. And that's my metric for evaluating new API - how easy/hard is it to abuse/get wrong. >> For more context: We have tons of fun with EPROBE_DEFER handling >> between i915 and snd-hda > > I don't understand, there is currently not a single occurrence of > EPROBE_DEFER in i915, apart from the one I added. > > In sound/ there are 88 occurrences of EPROBE_DEFER in soc/, plus 1 in > ppc/ and that's it. So not a single one in pci/hda/ where hda_intel.c > resides. > > Is the fun with EPROBE_DEFER handling caused by the lack thereof? Yes, there's one instance where i915 shoudl defer missing. The real trouble is that snd-hda has some really close ties with i915, and resolves those with probe-defer. And blows up all the time since we started using this, and with hdmi/dp you really always have to test both together in CI, snd-hda is pretty much a part of the intel gfx driver nowadays. Deferred probing is ime real trouble. -Daniel
> -----Original Message----- > From: daniel.vetter@ffwll.ch [mailto:daniel.vetter@ffwll.ch] On Behalf Of > Daniel Vetter > Sent: Thursday, February 18, 2016 6:11 PM > To: Lukas Wunner > Cc: dri-devel; platform-driver-x86@vger.kernel.org; intel-gfx; Ben Skeggs; > Deucher, Alexander > Subject: Re: [PATCH v5 10/12] drm/i915: Defer probe if gmux is present but > its driver isn't > > On Thu, Feb 18, 2016 at 11:20 PM, Lukas Wunner <lukas@wunner.de> wrote: > > Hi, > > > > On Thu, Feb 18, 2016 at 10:39:05PM +0100, Daniel Vetter wrote: > >> On Thu, Feb 18, 2016 at 9:34 PM, Lukas Wunner <lukas@wunner.de> > wrote: > >> > > >> >> Ok, makes sense. I still think adding the check to the client_register > >> >> function would be good, just as a safety measure. > >> > > >> > Hm, the idea of calling vga_switcheroo_client_probe_defer() twice > >> > causes me pain in the stomach. It's surprising for drivers which > >> > just don't need it at the moment (amdgpu and snd_hda_intel) and > >> > it feels like overengineering and pampering driver developers > >> > beyond reasonable measure. Also while the single existing check is > >> > cheap, we might later on add checks that take more time and slow > >> > things down. > >> > >> It's motivated by Rusty's API Manifesto: > >> > >> http://sweng.the-davies.net/Home/rustys-api-design-manifesto > > > > Interesting, thank you. > > > > > >> With the mandatory check in _register we reach level 5 - it'll blow up > >> at runtime when we try to register it. > > > > The manifesto says "5. Do it right or it will always break at runtime". > > > > However even if we add a > WARN_ON(vga_switcheroo_client_probe_defer(pdev)) > > to register_client(), it will not *always* spew a stacktrace but only on > > the machines this concerns (MacBook Pros). Since failure to defer breaks > > GPU switching, level 5 is already reached. Chances are this won't go > > unnoticed by the user. > > If we fail the register hopefully the driver checks for that and might > blow up somewhere in untested error handling code. But there's a good > chance it'll fail (we can encourage that more by adding must_check to > the function declaration). In that case you get a nice bug report with > splat from users hitting this. > > Without this it'll silently work, and all the reports you get is > "linux is shit, gpu switching doesn't work". > > In both cases it can sometimes succeed, which is not great indeed. But > I'm trying to fix that by injection EDEFER points artificially > somehow. Not yet figured out that one. > > But irrespective of the precise failure mode making the defer check > mandatory by just including it in _register() is better since it makes > it impossible to forget to call it when its needed. So imo clearly the > more robust API. And that's my metric for evaluating new API - how > easy/hard is it to abuse/get wrong. > > >> For more context: We have tons of fun with EPROBE_DEFER handling > >> between i915 and snd-hda > > > > I don't understand, there is currently not a single occurrence of > > EPROBE_DEFER in i915, apart from the one I added. > > > > In sound/ there are 88 occurrences of EPROBE_DEFER in soc/, plus 1 in > > ppc/ and that's it. So not a single one in pci/hda/ where hda_intel.c > > resides. > > > > Is the fun with EPROBE_DEFER handling caused by the lack thereof? > > Yes, there's one instance where i915 shoudl defer missing. The real > trouble is that snd-hda has some really close ties with i915, and > resolves those with probe-defer. And blows up all the time since we > started using this, and with hdmi/dp you really always have to test > both together in CI, snd-hda is pretty much a part of the intel gfx > driver nowadays. Deferred probing is ime real trouble. To further complicate things, AMD dGPUs have HDA audio on board as well. Alex
diff --git a/drivers/gpu/drm/i915/i915_drv.c b/drivers/gpu/drm/i915/i915_drv.c index 44912ec..80cfd73 100644 --- a/drivers/gpu/drm/i915/i915_drv.c +++ b/drivers/gpu/drm/i915/i915_drv.c @@ -35,11 +35,9 @@ #include "i915_trace.h" #include "intel_drv.h" -#include <linux/apple-gmux.h> #include <linux/console.h> #include <linux/module.h> #include <linux/pm_runtime.h> -#include <linux/vgaarb.h> #include <linux/vga_switcheroo.h> #include <drm/drm_crtc_helper.h> @@ -972,13 +970,7 @@ static int i915_pci_probe(struct pci_dev *pdev, const struct pci_device_id *ent) if (PCI_FUNC(pdev->devfn)) return -ENODEV; - /* - * apple-gmux is needed on dual GPU MacBook Pro - * to probe the panel if we're the inactive GPU. - */ - if (IS_ENABLED(CONFIG_VGA_ARB) && IS_ENABLED(CONFIG_VGA_SWITCHEROO) && - apple_gmux_present() && pdev != vga_default_device() && - !vga_switcheroo_handler_flags()) + if (vga_switcheroo_client_probe_defer(pdev)) return -EPROBE_DEFER; return drm_get_pci_dev(pdev, ent, &driver); diff --git a/drivers/gpu/drm/nouveau/nouveau_drm.c b/drivers/gpu/drm/nouveau/nouveau_drm.c index bb8498c..9141bcd 100644 --- a/drivers/gpu/drm/nouveau/nouveau_drm.c +++ b/drivers/gpu/drm/nouveau/nouveau_drm.c @@ -22,13 +22,11 @@ * Authors: Ben Skeggs */ -#include <linux/apple-gmux.h> #include <linux/console.h> #include <linux/delay.h> #include <linux/module.h> #include <linux/pci.h> #include <linux/pm_runtime.h> -#include <linux/vgaarb.h> #include <linux/vga_switcheroo.h> #include "drmP.h" @@ -314,13 +312,7 @@ static int nouveau_drm_probe(struct pci_dev *pdev, bool boot = false; int ret; - /* - * apple-gmux is needed on dual GPU MacBook Pro - * to probe the panel if we're the inactive GPU. - */ - if (IS_ENABLED(CONFIG_VGA_ARB) && IS_ENABLED(CONFIG_VGA_SWITCHEROO) && - apple_gmux_present() && pdev != vga_default_device() && - !vga_switcheroo_handler_flags()) + if (vga_switcheroo_client_probe_defer(pdev)) return -EPROBE_DEFER; /* remove conflicting drivers (vesafb, efifb etc) */ diff --git a/drivers/gpu/drm/radeon/radeon_drv.c b/drivers/gpu/drm/radeon/radeon_drv.c index cad2555..7be0c38 100644 --- a/drivers/gpu/drm/radeon/radeon_drv.c +++ b/drivers/gpu/drm/radeon/radeon_drv.c @@ -34,11 +34,9 @@ #include "radeon_drv.h" #include <drm/drm_pciids.h> -#include <linux/apple-gmux.h> #include <linux/console.h> #include <linux/module.h> #include <linux/pm_runtime.h> -#include <linux/vgaarb.h> #include <linux/vga_switcheroo.h> #include <drm/drm_gem.h> @@ -321,13 +319,7 @@ static int radeon_pci_probe(struct pci_dev *pdev, { int ret; - /* - * apple-gmux is needed on dual GPU MacBook Pro - * to probe the panel if we're the inactive GPU. - */ - if (IS_ENABLED(CONFIG_VGA_ARB) && IS_ENABLED(CONFIG_VGA_SWITCHEROO) && - apple_gmux_present() && pdev != vga_default_device() && - !vga_switcheroo_handler_flags()) + if (vga_switcheroo_client_probe_defer(pdev)) return -EPROBE_DEFER; /* Get rid of things like offb */ diff --git a/drivers/gpu/vga/vga_switcheroo.c b/drivers/gpu/vga/vga_switcheroo.c index cbd7c98..101e14c 100644 --- a/drivers/gpu/vga/vga_switcheroo.c +++ b/drivers/gpu/vga/vga_switcheroo.c @@ -30,6 +30,7 @@ #define pr_fmt(fmt) "vga_switcheroo: " fmt +#include <linux/apple-gmux.h> #include <linux/console.h> #include <linux/debugfs.h> #include <linux/fb.h> @@ -308,7 +309,8 @@ static int register_client(struct pci_dev *pdev, * * Register vga client (GPU). Enable vga_switcheroo if another GPU and a * handler have already registered. The power state of the client is assumed - * to be ON. + * to be ON. Beforehand, vga_switcheroo_client_probe_defer() shall be called + * to ensure that all prerequisites are met. * * Return: 0 on success, -ENOMEM on memory allocation error. */ @@ -329,7 +331,8 @@ EXPORT_SYMBOL(vga_switcheroo_register_client); * @id: client identifier * * Register audio client (audio device on a GPU). The power state of the - * client is assumed to be ON. + * client is assumed to be ON. Beforehand, vga_switcheroo_client_probe_defer() + * shall be called to ensure that all prerequisites are met. * * Return: 0 on success, -ENOMEM on memory allocation error. */ @@ -376,6 +379,33 @@ find_active_client(struct list_head *head) } /** + * vga_switcheroo_client_probe_defer() - whether to defer probing a given client + * @pdev: client pci device + * + * Determine whether any prerequisites are not fulfilled to probe a given + * client. Drivers shall invoke this early on in their ->probe callback + * and return %-EPROBE_DEFER if it evaluates to %true. Thou shalt not + * register the client ere thou hast called this. + * + * Return: %true if probing should be deferred, otherwise %false. + */ +bool vga_switcheroo_client_probe_defer(struct pci_dev *pdev) +{ + if ((pdev->class >> 8) == PCI_CLASS_DISPLAY_VGA) { + /* + * apple-gmux is needed on pre-retina MacBook Pro + * to probe the panel if pdev is the inactive GPU. + */ + if (apple_gmux_present() && pdev != vga_default_device() && + !vgasr_priv.handler_flags) + return true; + } + + return false; +} +EXPORT_SYMBOL(vga_switcheroo_client_probe_defer); + +/** * vga_switcheroo_get_client_state() - obtain power state of a given client * @pdev: client pci device * diff --git a/include/linux/vga_switcheroo.h b/include/linux/vga_switcheroo.h index b39a5f3..960bedb 100644 --- a/include/linux/vga_switcheroo.h +++ b/include/linux/vga_switcheroo.h @@ -165,6 +165,7 @@ int vga_switcheroo_unlock_ddc(struct pci_dev *pdev); int vga_switcheroo_process_delayed_switch(void); +bool vga_switcheroo_client_probe_defer(struct pci_dev *pdev); enum vga_switcheroo_state vga_switcheroo_get_client_state(struct pci_dev *dev); void vga_switcheroo_set_dynamic_switch(struct pci_dev *pdev, enum vga_switcheroo_state dynamic); @@ -188,6 +189,7 @@ static inline enum vga_switcheroo_handler_flags_t vga_switcheroo_handler_flags(v static inline int vga_switcheroo_lock_ddc(struct pci_dev *pdev) { return -ENODEV; } static inline int vga_switcheroo_unlock_ddc(struct pci_dev *pdev) { return -ENODEV; } static inline int vga_switcheroo_process_delayed_switch(void) { return 0; } +static inline bool vga_switcheroo_client_probe_defer(struct pci_dev *pdev) { return false; } static inline enum vga_switcheroo_state vga_switcheroo_get_client_state(struct pci_dev *dev) { return VGA_SWITCHEROO_ON; } static inline void vga_switcheroo_set_dynamic_switch(struct pci_dev *pdev, enum vga_switcheroo_state dynamic) {}