@@ -2488,12 +2488,12 @@ static bool vfio_pci_dev_set_needs_reset(struct vfio_device_set *dev_set)
struct vfio_pci_core_device *cur;
bool needs_reset = false;
- list_for_each_entry(cur, &dev_set->device_list, vdev.dev_set_list) {
- /* No VFIO device in the set can have an open device FD */
- if (cur->vdev.open_count)
- return false;
+ /* No other VFIO device in the set can be open. */
+ if (dev_set->open_count > 1)
+ return false;
+
+ list_for_each_entry(cur, &dev_set->device_list, vdev.dev_set_list)
needs_reset |= cur->needs_reset;
- }
return needs_reset;
}
vfio_pci_dev_set_needs_reset() inspects the open_count of every device in the set to determine whether a reset is needed. The current device always has open_count == 1 within vfio_pci_core_disable(), effectively disabling the reset logic. This field is also documented as private in vfio_device, so it should not be used to determine whether other devices in the set are open. Checking for open_count > 1 on the device set fixes both issues. After commit 2cd8b14aaa66 ("vfio/pci: Move to the device set infrastructure"), failure to create a new file for a device would cause the reset to be skipped due to open_count being decremented after calling close_device() in the error path. After commit eadd86f835c6 ("vfio: Remove calls to vfio_group_add_container_user()"), releasing a device would always skip the reset due to an ordering change in vfio_device_fops_release(). Failing to reset the device leaves it in an unknown state, potentially causing errors when it is bound to a different driver. This issue was observed with a Radeon RX Vega 56 [1002:687f] (rev c3) assigned to a Windows guest. After shutting down the guest, unbinding the device from vfio-pci, and binding the device to amdgpu: [ 548.007102] [drm:psp_hw_start [amdgpu]] *ERROR* PSP create ring failed! [ 548.027174] [drm:psp_hw_init [amdgpu]] *ERROR* PSP firmware loading failed [ 548.027242] [drm:amdgpu_device_fw_loading [amdgpu]] *ERROR* hw_init of IP block <psp> failed -22 [ 548.027306] amdgpu 0000:0a:00.0: amdgpu: amdgpu_device_ip_init failed [ 548.027308] amdgpu 0000:0a:00.0: amdgpu: Fatal error during GPU init Fixes: 2cd8b14aaa66 ("vfio/pci: Move to the device set infrastructure") Fixes: eadd86f835c6 ("vfio: Remove calls to vfio_group_add_container_user()") Signed-off-by: Anthony DeRossi <ajderossi@gmail.com> --- drivers/vfio/pci/vfio_pci_core.c | 10 +++++----- 1 file changed, 5 insertions(+), 5 deletions(-)