Message ID | 165453797543.3592816.6381793341352595461.stgit@omen (mailing list archive) |
---|---|
Series | Improve vfio-pci primary GPU assignment behavior |

Hello Alex,

On 6/6/22 19:53, Alex Williamson wrote:
> Users attempting to enable vfio PCI device assignment with a GPU will
> often block the default PCI driver from the device to avoid conflicts
> with the device initialization or release path. This means that
> vfio-pci is sometimes the first PCI driver to bind to the device. In
> the case of assigning the primary graphics device, low-level console
> drivers may still generate resource conflicts. Users often employ
> kernel command line arguments to disable conflicting drivers or
> perform unbinding in userspace to avoid this, but the actual solution
> is often distribution/kernel config specific based on the included
> drivers.
>
> We can instead allow vfio-pci to copy the behavior of
> drm_aperture_remove_conflicting_pci_framebuffers() in order to remove
> these low-level drivers with conflicting resources. vfio-pci is not
> however a DRM driver, nor does vfio-pci depend on DRM config options,
> thus we split out and export the necessary DRM aperture support and
> mirror the framebuffer and VGA support.
>
> I'd be happy to pull this series in through the vfio branch if
> approved by the DRM maintainers. Thanks,
>

I understand your issue but I really don't think that using this helper
is the correct thing to do. We already have some races with the current
aperture infrastructure. As an example you can look at [0].

The agreement on the mentioned thread is that we want to unify the fbdev
and DRM drivers apertures into a single list, and ideally move everything
to the Linux device model to handle the removal of conflicting devices.

That's why I don't feel that leaking the DRM aperture helper to another
subsystem is desirable, since it would make it even harder to clean this
up later.

But also, this issue isn't something that only affects graphic devices,
right? AFAIU from [1] and [2], the same issue happens if a PCI device
has to be bound to vfio-pci but already was bound to a host driver.

The fact that DRM happens to have some infrastructure to remove devices
that conflict with an aperture is just a coincidence, since it is used
to remove devices bound to drivers that make use of the firmware-provided
system framebuffer.

The series [0] mentioned above adds a sysfb_disable() that disables the
Generic System Framebuffer logic, which is what registers the framebuffer
devices that are bound to these generic video drivers. On disable, the
devices registered by sysfb are also unregistered.

Would it be enough for your use case to use that helper function if it
lands, or do you really need to look at the apertures? That is, do you
want to remove the {vesa,efi,simple}fb and simpledrm drivers or is there
a need to also remove real fbdev and DRM drivers?

[0]: https://lore.kernel.org/lkml/YnvrxICnisXU6I1y@ravnborg.org/T/
[1]: https://www.ibm.com/docs/en/linux-on-systems?topic=through-pci
[2]: https://www.kernel.org/doc/Documentation/vfio.txt

On Tue, 7 Jun 2022 19:40:40 +0200
Javier Martinez Canillas <javierm@redhat.com> wrote:

> Hello Alex,
>
> On 6/6/22 19:53, Alex Williamson wrote:
> > Users attempting to enable vfio PCI device assignment with a GPU will
> > often block the default PCI driver from the device to avoid conflicts
> > with the device initialization or release path. This means that
> > vfio-pci is sometimes the first PCI driver to bind to the device. In
> > the case of assigning the primary graphics device, low-level console
> > drivers may still generate resource conflicts. Users often employ
> > kernel command line arguments to disable conflicting drivers or
> > perform unbinding in userspace to avoid this, but the actual solution
> > is often distribution/kernel config specific based on the included
> > drivers.
> >
> > We can instead allow vfio-pci to copy the behavior of
> > drm_aperture_remove_conflicting_pci_framebuffers() in order to remove
> > these low-level drivers with conflicting resources. vfio-pci is not
> > however a DRM driver, nor does vfio-pci depend on DRM config options,
> > thus we split out and export the necessary DRM aperture support and
> > mirror the framebuffer and VGA support.
> >
> > I'd be happy to pull this series in through the vfio branch if
> > approved by the DRM maintainers. Thanks,
> >
>
> I understand your issue but I really don't think that using this helper
> is the correct thing to do. We already have some races with the current
> aperture infrastructure. As an example you can look at [0].
>
> The agreement on the mentioned thread is that we want to unify the fbdev
> and DRM drivers apertures into a single list, and ideally move everything
> to the Linux device model to handle the removal of conflicting devices.
>
> That's why I don't feel that leaking the DRM aperture helper to another
> subsystem is desirable, since it would make it even harder to clean this
> up later.

OTOH, this doesn't really make the problem worse and it identifies
another stakeholder to a full solution.

> But also, this issue isn't something that only affects graphic devices,
> right? AFAIU from [1] and [2], the same issue happens if a PCI device
> has to be bound to vfio-pci but already was bound to a host driver.

When we're shuffling between PCI drivers, we expect the unbind of the
previous driver to have released all the claimed resources. If there
were a previously attached PCI graphics driver, then the code added in
patch 2/ is simply redundant since that PCI graphics driver must have
already performed similar actions. The primary use case of this series
is where there is no previous PCI graphics driver and we have no
visibility into platform drivers carving chunks of the PCI resources
into different subsystems. AFAIK, this is unique to graphics devices.

> The fact that DRM happens to have some infrastructure to remove devices
> that conflict with an aperture is just a coincidence. Since this is used
> to remove devices bound to drivers that make use of the firmware-provided
> system framebuffer.

It seems not so much a coincidence as an artifact of the exact problem
both PCI graphics drivers and now vfio-pci face. We've created platform
devices to manage sub-ranges of resources, where the actual parent of
those resources is only discovered later and we don't automatically
resolve the resource conflict between that parent device and these
platform devices when binding the parent driver.

> The series [0] mentioned above, adds a sysfb_disable() that disables the
> Generic System Framebuffer logic that is what registers the framebuffer
> devices that are bound to these generic video drivers. On disable, the
> devices registered by sysfb are also unregistered.
>
> Would be enough for your use case to use that helper function if it lands
> or do you really need to look at the apertures? That is, do you want to
> remove the {vesa,efi,simple}fb and simpledrm drivers or is there a need
> to also remove real fbdev and DRM drivers?

It's not clear to me how this helps. I infer that sysfb_disable() is
intended to be used by, for example, a PCI console driver which would
be taking over the console and can therefore make a clear decision to
end sysfb support. The vfio-pci driver is not a console driver so we
can't make any sort of blind assertion regarding sysfb. We might be
binding to a secondary graphics card which has no sysfb drivers
attached. I'm a lot more comfortable wielding an interface that intends
to disable drivers/devices relative to the resources of a given device
rather than a blanket means to disable a subsystem.

I wonder if the race issues aren't better solved by avoiding the
creation of platform devices that expose resource conflicts with known
devices, especially when those existing devices have drivers attached.
Thanks,

Alex
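
Alex's distinction between a removal scoped to one device's resources and a blanket sysfb disable can be illustrated with a small user-space model. This is a sketch, not kernel code: `Region`, `FirmwareFb`, and `conflicting()` are hypothetical names and the addresses are made up. It only shows the range-overlap test that finds firmware framebuffers sitting inside one device's BARs while leaving unrelated devices alone.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Region:
    """A physical address range [start, start + size)."""
    start: int
    size: int

    @property
    def end(self) -> int:
        return self.start + self.size  # exclusive upper bound


@dataclass(frozen=True)
class FirmwareFb:
    """Hypothetical stand-in for a platform device such as efifb or simplefb."""
    name: str
    aperture: Region


def overlaps(a: Region, b: Region) -> bool:
    # Two half-open ranges overlap when each starts before the other ends.
    return a.start < b.end and b.start < a.end


def conflicting(bars, framebuffers):
    """Return the framebuffers whose aperture overlaps any BAR of one device."""
    return [fb for fb in framebuffers
            if any(overlaps(fb.aperture, bar) for bar in bars)]


# Made-up layout: the firmware framebuffer lives inside the primary GPU's BAR.
MIB = 1 << 20
primary_bars = [Region(0xE0000000, 256 * MIB)]
secondary_bars = [Region(0xC0000000, 256 * MIB)]
boot_fbs = [FirmwareFb("efi-framebuffer", Region(0xE0000000, 8 * MIB))]

print([fb.name for fb in conflicting(primary_bars, boot_fbs)])    # ['efi-framebuffer']
print([fb.name for fb in conflicting(secondary_bars, boot_fbs)])  # []
```

In this model, binding to the secondary GPU finds nothing to remove, whereas a blanket disable would also have torn down the boot framebuffer backing the primary GPU's console.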

Hi,

> But also, this issue isn't something that only affects graphic devices,
> right? AFAIU from [1] and [2], the same issue happens if a PCI device
> has to be bound to vfio-pci but already was bound to a host driver.

Nope. There is a standard procedure to bind and unbind pci drivers via
sysfs, using /sys/bus/pci/drivers/$name/{bind,unbind}.

> The fact that DRM happens to have some infrastructure to remove devices
> that conflict with an aperture is just a coincidence.

No. It's a consequence of firmware framebuffers not being linked to the
pci device actually backing them, so some other way is needed to find
and solve conflicts.

> The series [0] mentioned above, adds a sysfb_disable() that disables the
> Generic System Framebuffer logic that is what registers the framebuffer
> devices that are bound to these generic video drivers. On disable, the
> devices registered by sysfb are also unregistered.

As Alex already mentioned this might not have the desired effect on
systems with multiple GPUs (I think even without considering vfio-pci).

> That is, do you want to remove the {vesa,efi,simple}fb and simpledrm
> drivers or is there a need to also remove real fbdev and DRM drivers?

Boot framebuffers are the problem because they are neither visible nor
manageable in /sys/bus/pci. For real fbdev/drm drivers the standard pci
unbind can be used.

take care,
Gerd
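
The standard sysfs procedure Gerd refers to amounts to writing the device's PCI address into the `unbind` and `bind` files under `/sys/bus/pci/drivers/$name/`. A minimal sketch follows. The helper names `driver_control()` and `rebind()` and the `sysfs_root` parameter are assumptions made for illustration (the parameter lets the code be exercised against a scratch directory). On a real system this needs root, and the new driver must also be willing to claim the device (for example through the device's `driver_override` attribute), which is not shown here.

```python
from pathlib import Path


def driver_control(driver: str, action: str, sysfs_root: str = "/sys") -> Path:
    """Return the path /sys/bus/pci/drivers/<driver>/{bind,unbind}."""
    if action not in ("bind", "unbind"):
        raise ValueError(f"unsupported action: {action!r}")
    return Path(sysfs_root) / "bus" / "pci" / "drivers" / driver / action


def rebind(address: str, old_driver: str, new_driver: str,
           sysfs_root: str = "/sys") -> None:
    """Detach a PCI device from one driver and attach it to another."""
    # Writing the device address (e.g. "0000:01:00.0") to "unbind" asks the
    # current driver to release the device and its resources.
    driver_control(old_driver, "unbind", sysfs_root).write_text(address)
    # Writing the same address to "bind" asks the new driver to claim it.
    driver_control(new_driver, "bind", sysfs_root).write_text(address)
```

This only works because the PCI address identifies the device. A boot framebuffer has no such entry under `/sys/bus/pci`, which is the gap the thread is about.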

Hello Gerd and Alex,

On 6/8/22 09:43, Gerd Hoffmann wrote:
> Hi,
>
>> But also, this issue isn't something that only affects graphic devices,
>> right? AFAIU from [1] and [2], the same issue happens if a PCI device
>> has to be bound to vfio-pci but already was bound to a host driver.
>
> Nope. There is a standard procedure to bind and unbind pci drivers via
> sysfs, using /sys/bus/pci/drivers/$name/{bind,unbind}.
>

Yes, but the cover letter says:

"Users often employ kernel command line arguments to disable conflicting
drivers or perform unbinding in userspace to avoid this"

So I misunderstood that the goal was to avoid the need to do this via
sysfs in user-space.

I understand now that the problem is that for real PCI devices bound to
a driver, you know the PCI device ID and bus so that you can use it, but
with platform devices bound to drivers that just use a firmware-provided
framebuffer you don't have that information to unbind them.

Because you could use the standard sysfs bind/unbind interface for this
too, but don't have a way to know if the "simple-framebuffer" or
"efi-framebuffer" is associated with a PCI device that you want to pass
through or another one.

The only information that could tell you that is the I/O memory resource
that is associated with the platform device registered, and that's why
you want to use the drm_aperture_remove_conflicting_pci_framebuffers()
helper.

>> The fact that DRM happens to have some infrastructure to remove devices
>> that conflict with an aperture is just a coincidence.
>
> No. It's a consequence of firmware framebuffers not being linked to the
> pci device actually backing them, so some other way is needed to find
> and solve conflicts.
>

Right, it's clear to me now. As mentioned I misunderstood your problem.

>> The series [0] mentioned above, adds a sysfb_disable() that disables the
>> Generic System Framebuffer logic that is what registers the framebuffer
>> devices that are bound to these generic video drivers. On disable, the
>> devices registered by sysfb are also unregistered.
>
> As Alex already mentioned this might not have the desired effect on
> systems with multiple GPUs (I think even without considering vfio-pci).
>

That's correct, although the firmware framebuffer drivers are just a
best effort to allow having some display output even if there's no real
video driver (or if the user prevented them from loading with
"nomodeset").

We have talked about improving this, by unifying fbdev and DRM apertures
in a single list that could track all the devices registered and their
requested aperture so that all subsystems could use it.

The reason why I was pushing back on using the DRM aperture helper is
that it would make it more complicated to do this refactoring later as
more subsystems use the current API.

But as Alex said, it wouldn't make the problem worse so I'm OK with this
if others agree that's the correct thing to do.

>> That is, do you want to remove the {vesa,efi,simple}fb and simpledrm
>> drivers or is there a need to also remove real fbdev and DRM drivers?
>
> Boot framebuffers are the problem because they are neither visible nor
> manageable in /sys/bus/pci. For real fbdev/drm drivers the standard pci
> unbind can be used.
>

Yes. Honestly I believe all this should be handled by the Linux device
model. That is, drivers could just do pci_request_region() /
request_mem_region() and drivers that want to unbind another bound device
could do something like pci_request_region_force() /
request_mem_region_force() to kick them out.
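
The "forced request" idea floated above can be modelled in a few lines of user-space code. This is purely an illustrative sketch: the `*_force()` functions are only a suggestion in this thread, and `ResourceMap`, `request()`, `request_force()`, and the `on_evict` callback are hypothetical names that do not correspond to any kernel API. The model keeps a list of claimed ranges, rejects an ordinary request that overlaps an existing claim, and lets a forced request evict the current holders first.

```python
class ResourceBusy(Exception):
    """Raised when a plain request hits a range that is already claimed."""


class ResourceMap:
    def __init__(self):
        self._claims = []  # list of (start, end, owner), end is exclusive

    def _conflicts(self, start, end):
        # Half-open ranges overlap when each starts before the other ends.
        return [c for c in self._claims if start < c[1] and c[0] < end]

    def request(self, start, size, owner):
        """Claim a range, failing if anyone already holds part of it."""
        hits = self._conflicts(start, start + size)
        if hits:
            raise ResourceBusy(", ".join(c[2] for c in hits))
        self._claims.append((start, start + size, owner))

    def request_force(self, start, size, owner, on_evict=lambda name: None):
        """Claim a range, evicting current holders first (hypothetical)."""
        hits = self._conflicts(start, start + size)
        for claim in hits:
            on_evict(claim[2])  # stand-in for unbinding the old device
            self._claims.remove(claim)
        self._claims.append((start, start + size, owner))
        return [c[2] for c in hits]

    def owners(self):
        return sorted(c[2] for c in self._claims)
```

In this model a forced request for a GPU's BAR evicts only the claims that overlap it, so an unrelated device elsewhere in the address space keeps its range.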

Hi,

> >> But also, this issue isn't something that only affects graphic devices,
> >> right? AFAIU from [1] and [2], the same issue happens if a PCI device
> >> has to be bound to vfio-pci but already was bound to a host driver.
> >
> > Nope. There is a standard procedure to bind and unbind pci drivers via
> > sysfs, using /sys/bus/pci/drivers/$name/{bind,unbind}.
> >
>
> Yes, but the cover letter says:
>
> "Users often employ kernel command line arguments to disable conflicting
> drivers or perform unbinding in userspace to avoid this"

That's helpful at times to deal with driver and/or hardware quirks.
Example: Years ago drm drivers used to be horrible when it came to
unbind, leaving oopses and panics left & right when you tried (luckily
it works much better these days).

[ leaving this here for completeness, snipping the remaining reply,
  noting that we are on the same page now ]

thanks & take care,
Gerd