Message ID | 20240503145142.2806030-1-vkale@nvidia.com (mailing list archive) |
---|---|
State | New, archived |
Series | [v4] vfio/pci: migration: Skip config space check for Vendor Specific Information in VSC during restore/load |
On 5/3/24 16:51, Vinayak Kale wrote:
> [...]
> Signed-off-by: Vinayak Kale <vkale@nvidia.com>
> Cc: Alex Williamson <alex.williamson@redhat.com>
> Cc: Michael S. Tsirkin <mst@redhat.com>
> Cc: Cédric Le Goater <clg@redhat.com>

LGTM,

Reviewed-by: Cédric Le Goater <clg@redhat.com>

Thanks,

C.
On 5/3/24 16:51, Vinayak Kale wrote:
> [...]
> Signed-off-by: Vinayak Kale <vkale@nvidia.com>
> Cc: Alex Williamson <alex.williamson@redhat.com>
> Cc: Michael S. Tsirkin <mst@redhat.com>
> Cc: Cédric Le Goater <clg@redhat.com>

Applied to vfio-next.

Thanks,

C.
diff --git a/hw/core/machine.c b/hw/core/machine.c
index 4ff60911e7..fc3eb5115f 100644
--- a/hw/core/machine.c
+++ b/hw/core/machine.c
@@ -35,6 +35,7 @@
 
 GlobalProperty hw_compat_9_0[] = {
     {"arm-cpu", "backcompat-cntfrq", "true" },
+    {"vfio-pci", "skip-vsc-check", "false" },
 };
 const size_t hw_compat_9_0_len = G_N_ELEMENTS(hw_compat_9_0);
 
diff --git a/hw/vfio/pci.c b/hw/vfio/pci.c
index 64780d1b79..2ece9407cc 100644
--- a/hw/vfio/pci.c
+++ b/hw/vfio/pci.c
@@ -2134,6 +2134,28 @@ static void vfio_check_af_flr(VFIOPCIDevice *vdev, uint8_t pos)
     }
 }
 
+static int vfio_add_vendor_specific_cap(VFIOPCIDevice *vdev, int pos,
+                                        uint8_t size, Error **errp)
+{
+    PCIDevice *pdev = &vdev->pdev;
+
+    pos = pci_add_capability(pdev, PCI_CAP_ID_VNDR, pos, size, errp);
+    if (pos < 0) {
+        return pos;
+    }
+
+    /*
+     * Exempt config space check for Vendor Specific Information during
+     * restore/load.
+     * Config space check is still enforced for 3 byte VSC header.
+     */
+    if (vdev->skip_vsc_check && size > 3) {
+        memset(pdev->cmask + pos + 3, 0, size - 3);
+    }
+
+    return pos;
+}
+
 static int vfio_add_std_cap(VFIOPCIDevice *vdev, uint8_t pos, Error **errp)
 {
     ERRP_GUARD();
@@ -2202,6 +2224,9 @@ static int vfio_add_std_cap(VFIOPCIDevice *vdev, uint8_t pos, Error **errp)
         vfio_check_af_flr(vdev, pos);
         ret = pci_add_capability(pdev, cap_id, pos, size, errp);
         break;
+    case PCI_CAP_ID_VNDR:
+        ret = vfio_add_vendor_specific_cap(vdev, pos, size, errp);
+        break;
     default:
         ret = pci_add_capability(pdev, cap_id, pos, size, errp);
         break;
@@ -3390,6 +3415,7 @@ static Property vfio_pci_dev_properties[] = {
     DEFINE_PROP_LINK("iommufd", VFIOPCIDevice, vbasedev.iommufd,
                      TYPE_IOMMUFD_BACKEND, IOMMUFDBackend *),
 #endif
+    DEFINE_PROP_BOOL("skip-vsc-check", VFIOPCIDevice, skip_vsc_check, true),
     DEFINE_PROP_END_OF_LIST(),
 };
 
diff --git a/hw/vfio/pci.h b/hw/vfio/pci.h
index 6e64a2654e..92cd62d115 100644
--- a/hw/vfio/pci.h
+++ b/hw/vfio/pci.h
@@ -177,6 +177,7 @@ struct VFIOPCIDevice {
     OnOffAuto ramfb_migrate;
     bool defer_kvm_irq_routing;
     bool clear_parent_atomics_on_exit;
+    bool skip_vsc_check;
     VFIODisplay *dpy;
     Notifier irqchip_change_notifier;
 };
During a migration restore, QEMU compares the PCI device's config space with the config space captured in the migration stream during save. If the config space data does not match, the restore operation fails.

The config space check is done in get_pci_config_device(). By default, the VSC (vendor-specific capability) in config space is checked.

Because of QEMU's config space check for the VSC, live migration is broken across NVIDIA vGPU devices when the source and destination host drivers differ. In that situation, the Vendor Specific Information in the VSC varies on the destination to ensure that the vGPU feature capabilities exposed to the guest driver are compatible with the destination host.

If a vfio-pci device is migration capable and the vfio-pci vendor driver is OK with volatile Vendor Specific Info in the VSC, then QEMU should exempt the Vendor Specific Info from the config space check. It is the vendor driver's responsibility to ensure that the VSC is consistent across migration. Here, consistency could mean that the VSC format is the same on source and destination, even though the actual Vendor Specific Info may not be byte-for-byte identical.

This patch skips the check for Vendor Specific Information in the VSC for VFIO-PCI devices by clearing the corresponding pdev->cmask[] offsets. The config space check is still enforced for the 3-byte VSC header. If cmask[] is not set for an offset, QEMU skips the config space check for that offset.

The VSC check is skipped for machine types >= 9.1. The check is still enforced on older machine types (<= 9.0).

Signed-off-by: Vinayak Kale <vkale@nvidia.com>
Cc: Alex Williamson <alex.williamson@redhat.com>
Cc: Michael S. Tsirkin <mst@redhat.com>
Cc: Cédric Le Goater <clg@redhat.com>
---
Version History
v3->v4:
- VSC check is skipped for machine types >= 9.1. The check is still enforced on older machine types (<= 9.0).
v2->v3:
- Config space check is skipped only for Vendor Specific Info in the VSC; the check is still enforced for the 3-byte VSC header.
- Updated commit description with the live migration failure scenario.
v1->v2:
- Limited scope of the change to vfio-pci devices instead of all PCI devices.

 hw/core/machine.c |  1 +
 hw/vfio/pci.c     | 26 ++++++++++++++++++++++++++
 hw/vfio/pci.h     |  1 +
 3 files changed, 28 insertions(+)