Message ID | 20230404201842.567344-7-daniel.vetter@ffwll.ch (mailing list archive) |
---|---|
State | New, archived |
Headers | show |
Series | [1/8] drm/gma500: Use drm_aperture_remove_conflicting_pci_framebuffers | expand |
On 4/4/23 1:18 PM, Daniel Vetter wrote: > Instead of calling aperture_remove_conflicting_devices() to remove the > conflicting devices, just call to aperture_detach_devices() to detach > the device that matches the same PCI BAR / aperture range. Since the > former is just a wrapper of the latter plus a sysfb_disable() call, > and now that's done in this function but only for the primary devices. > > This fixes a regression introduced by ee7a69aa38d8 ("fbdev: Disable > sysfb device registration when removing conflicting FBs"), where we > remove the sysfb when loading a driver for an unrelated pci device, > resulting in the user loosing their efifb console or similar. > > Note that in practice this only is a problem with the nvidia blob, > because that's the only gpu driver people might install which does not > come with an fbdev driver of it's own. For everyone else the real gpu > driver will restore a working console. It might be worth noting that this also affects devices that have no driver installed, or where the driver failed to initialize or was configured not to set a mode. E.g. I reproduced this problem on a laptop with i915.modeset=0 and an NVIDIA driver that calls drm_fbdev_generic_setup. It would also reproduce on a system that sets modeset=0 (or has a GPU that's too new for its corresponding kernel driver) and that passes an NVIDIA GPU through to a VM using vfio-pci since that also calls aperture_remove_conflicting_pci_devices. I agree that in practice this will mostly affect people with our driver until I get my changes to add drm_fbdev_generic_setup checked in. But these other cases don't seem all that unlikely to me. -- Aaron > Also note that in the referenced bug there's confusion that this same > bug also happens on amdgpu. But that was just another amdgpu specific > regression, which just happened to happen at roughly the same time and > with the same user-observable symptoms. That bug is fixed now, see > https://bugzilla.kernel.org/show_bug.cgi?id=216331#c15 > > Note that we should not have any such issues on non-pci multi-gpu > issues, because I could only find two such cases: > - SoC with some external panel over spi or similar. These panel > drivers do not use drm_aperture_remove_conflicting_framebuffers(), > so no problem. > - vga+mga, which is a direct console driver and entirely bypasses all > this. > > For the above reasons the cc: stable is just notionally, this patch > will need a backport and that's up to nvidia if they care enough. > > v2: > - Explain a bit better why other multi-gpu that aren't pci shouldn't > have any issues with making all this fully pci specific. > > v3 > - polish commit message (Javier) > > Fixes: ee7a69aa38d8 ("fbdev: Disable sysfb device registration when removing conflicting FBs") > Tested-by: Aaron Plattner <aplattner@nvidia.com> > Reviewed-by: Javier Martinez Canillas <javierm@redhat.com> > References: https://bugzilla.kernel.org/show_bug.cgi?id=216303#c28 > Signed-off-by: Daniel Vetter <daniel.vetter@intel.com> > Cc: Aaron Plattner <aplattner@nvidia.com> > Cc: Javier Martinez Canillas <javierm@redhat.com> > Cc: Thomas Zimmermann <tzimmermann@suse.de> > Cc: Helge Deller <deller@gmx.de> > Cc: Sam Ravnborg <sam@ravnborg.org> > Cc: Alex Deucher <alexander.deucher@amd.com> > Cc: <stable@vger.kernel.org> # v5.19+ (if someone else does the backport) > --- > drivers/video/aperture.c | 7 ++++--- > 1 file changed, 4 insertions(+), 3 deletions(-) > > diff --git a/drivers/video/aperture.c b/drivers/video/aperture.c > index 8f1437339e49..2394c2d310f8 100644 > --- a/drivers/video/aperture.c > +++ b/drivers/video/aperture.c > @@ -321,15 +321,16 @@ int aperture_remove_conflicting_pci_devices(struct pci_dev *pdev, const char *na > > primary = pdev == vga_default_device(); > > + if (primary) > + sysfb_disable(); > + > for (bar = 0; bar < PCI_STD_NUM_BARS; ++bar) { > if (!(pci_resource_flags(pdev, bar) & IORESOURCE_MEM)) > continue; > > base = pci_resource_start(pdev, bar); > size = pci_resource_len(pdev, bar); > - ret = aperture_remove_conflicting_devices(base, size, name); > - if (ret) > - return ret; > + aperture_detach_devices(base, size); > } > > if (primary) {
On Tue, Apr 04, 2023 at 01:59:33PM -0700, Aaron Plattner wrote: > On 4/4/23 1:18 PM, Daniel Vetter wrote: > > Instead of calling aperture_remove_conflicting_devices() to remove the > > conflicting devices, just call to aperture_detach_devices() to detach > > the device that matches the same PCI BAR / aperture range. Since the > > former is just a wrapper of the latter plus a sysfb_disable() call, > > and now that's done in this function but only for the primary devices. > > > > This fixes a regression introduced by ee7a69aa38d8 ("fbdev: Disable > > sysfb device registration when removing conflicting FBs"), where we > > remove the sysfb when loading a driver for an unrelated pci device, > > resulting in the user loosing their efifb console or similar. > > > > Note that in practice this only is a problem with the nvidia blob, > > because that's the only gpu driver people might install which does not > > come with an fbdev driver of it's own. For everyone else the real gpu > > driver will restore a working console. > > It might be worth noting that this also affects devices that have no driver > installed, or where the driver failed to initialize or was configured not to > set a mode. E.g. I reproduced this problem on a laptop with i915.modeset=0 > and an NVIDIA driver that calls drm_fbdev_generic_setup. It would also > reproduce on a system that sets modeset=0 (or has a GPU that's too new for > its corresponding kernel driver) and that passes an NVIDIA GPU through to a > VM using vfio-pci since that also calls > aperture_remove_conflicting_pci_devices. > > I agree that in practice this will mostly affect people with our driver > until I get my changes to add drm_fbdev_generic_setup checked in. But these > other cases don't seem all that unlikely to me. Thomas Z. refactored the entire modeset=0 handling to be more consistent across drivers, so I think in practice it'll again only happen with the nvidia blob driver (unless you patch in the call to drm_firmware_drivers_only()). Or if you dont use nomodeset or similar and instead use a driver-specific module option, which isn't what howtos in distros recommend. I can add this to the commit message if you want? -Daniel > > -- Aaron > > > Also note that in the referenced bug there's confusion that this same > > bug also happens on amdgpu. But that was just another amdgpu specific > > regression, which just happened to happen at roughly the same time and > > with the same user-observable symptoms. That bug is fixed now, see > > https://bugzilla.kernel.org/show_bug.cgi?id=216331#c15 > > > > Note that we should not have any such issues on non-pci multi-gpu > > issues, because I could only find two such cases: > > - SoC with some external panel over spi or similar. These panel > > drivers do not use drm_aperture_remove_conflicting_framebuffers(), > > so no problem. > > - vga+mga, which is a direct console driver and entirely bypasses all > > this. > > > > For the above reasons the cc: stable is just notionally, this patch > > will need a backport and that's up to nvidia if they care enough. > > > > v2: > > - Explain a bit better why other multi-gpu that aren't pci shouldn't > > have any issues with making all this fully pci specific. > > > > v3 > > - polish commit message (Javier) > > > > Fixes: ee7a69aa38d8 ("fbdev: Disable sysfb device registration when removing conflicting FBs") > > Tested-by: Aaron Plattner <aplattner@nvidia.com> > > Reviewed-by: Javier Martinez Canillas <javierm@redhat.com> > > References: https://bugzilla.kernel.org/show_bug.cgi?id=216303#c28 > > Signed-off-by: Daniel Vetter <daniel.vetter@intel.com> > > Cc: Aaron Plattner <aplattner@nvidia.com> > > Cc: Javier Martinez Canillas <javierm@redhat.com> > > Cc: Thomas Zimmermann <tzimmermann@suse.de> > > Cc: Helge Deller <deller@gmx.de> > > Cc: Sam Ravnborg <sam@ravnborg.org> > > Cc: Alex Deucher <alexander.deucher@amd.com> > > Cc: <stable@vger.kernel.org> # v5.19+ (if someone else does the backport) > > --- > > drivers/video/aperture.c | 7 ++++--- > > 1 file changed, 4 insertions(+), 3 deletions(-) > > > > diff --git a/drivers/video/aperture.c b/drivers/video/aperture.c > > index 8f1437339e49..2394c2d310f8 100644 > > --- a/drivers/video/aperture.c > > +++ b/drivers/video/aperture.c > > @@ -321,15 +321,16 @@ int aperture_remove_conflicting_pci_devices(struct pci_dev *pdev, const char *na > > primary = pdev == vga_default_device(); > > + if (primary) > > + sysfb_disable(); > > + > > for (bar = 0; bar < PCI_STD_NUM_BARS; ++bar) { > > if (!(pci_resource_flags(pdev, bar) & IORESOURCE_MEM)) > > continue; > > base = pci_resource_start(pdev, bar); > > size = pci_resource_len(pdev, bar); > > - ret = aperture_remove_conflicting_devices(base, size, name); > > - if (ret) > > - return ret; > > + aperture_detach_devices(base, size); > > } > > if (primary) {
Daniel Vetter <daniel.vetter@ffwll.ch> writes: > Instead of calling aperture_remove_conflicting_devices() to remove the > conflicting devices, just call to aperture_detach_devices() to detach > the device that matches the same PCI BAR / aperture range. Since the > former is just a wrapper of the latter plus a sysfb_disable() call, > and now that's done in this function but only for the primary devices. > > This fixes a regression introduced by ee7a69aa38d8 ("fbdev: Disable > sysfb device registration when removing conflicting FBs"), where we > remove the sysfb when loading a driver for an unrelated pci device, > resulting in the user loosing their efifb console or similar. > > Note that in practice this only is a problem with the nvidia blob, > because that's the only gpu driver people might install which does not > come with an fbdev driver of it's own. For everyone else the real gpu > driver will restore a working console. > > Also note that in the referenced bug there's confusion that this same > bug also happens on amdgpu. But that was just another amdgpu specific > regression, which just happened to happen at roughly the same time and > with the same user-observable symptoms. That bug is fixed now, see > https://bugzilla.kernel.org/show_bug.cgi?id=216331#c15 > > Note that we should not have any such issues on non-pci multi-gpu > issues, because I could only find two such cases: > - SoC with some external panel over spi or similar. These panel > drivers do not use drm_aperture_remove_conflicting_framebuffers(), > so no problem. > - vga+mga, which is a direct console driver and entirely bypasses all > this. > > For the above reasons the cc: stable is just notionally, this patch > will need a backport and that's up to nvidia if they care enough. > > v2: > - Explain a bit better why other multi-gpu that aren't pci shouldn't > have any issues with making all this fully pci specific. > > v3 > - polish commit message (Javier) > > Fixes: ee7a69aa38d8 ("fbdev: Disable sysfb device registration when removing conflicting FBs") > Tested-by: Aaron Plattner <aplattner@nvidia.com> > Reviewed-by: Javier Martinez Canillas <javierm@redhat.com> > References: https://bugzilla.kernel.org/show_bug.cgi?id=216303#c28 > Signed-off-by: Daniel Vetter <daniel.vetter@intel.com> > Cc: Aaron Plattner <aplattner@nvidia.com> > Cc: Javier Martinez Canillas <javierm@redhat.com> > Cc: Thomas Zimmermann <tzimmermann@suse.de> > Cc: Helge Deller <deller@gmx.de> > Cc: Sam Ravnborg <sam@ravnborg.org> > Cc: Alex Deucher <alexander.deucher@amd.com> > Cc: <stable@vger.kernel.org> # v5.19+ (if someone else does the backport) > --- > drivers/video/aperture.c | 7 ++++--- > 1 file changed, 4 insertions(+), 3 deletions(-) Reviewed-by: Javier Martinez Canillas <javierm@redhat.com>
diff --git a/drivers/video/aperture.c b/drivers/video/aperture.c index 8f1437339e49..2394c2d310f8 100644 --- a/drivers/video/aperture.c +++ b/drivers/video/aperture.c @@ -321,15 +321,16 @@ int aperture_remove_conflicting_pci_devices(struct pci_dev *pdev, const char *na primary = pdev == vga_default_device(); + if (primary) + sysfb_disable(); + for (bar = 0; bar < PCI_STD_NUM_BARS; ++bar) { if (!(pci_resource_flags(pdev, bar) & IORESOURCE_MEM)) continue; base = pci_resource_start(pdev, bar); size = pci_resource_len(pdev, bar); - ret = aperture_remove_conflicting_devices(base, size, name); - if (ret) - return ret; + aperture_detach_devices(base, size); } if (primary) {