Message ID | 20230830103754.36461-16-zhenzhong.duan@intel.com (mailing list archive) |
---|---|
State | New, archived |
Headers | show |
Series | vfio: Adopt iommufd | expand |
On 8/30/23 12:37, Zhenzhong Duan wrote: > This adds "--enable-iommufd/--disable-iommufd" to enable or disable > iommufd support, enabled by default. Why would someone want to disable support at compile time ? It might have been useful for dev but now QEMU should self-adjust at runtime depending only on the host capabilities AFAIUI. Am I missing something ? Thanks, C. > > Signed-off-by: Zhenzhong Duan <zhenzhong.duan@intel.com> > --- > meson.build | 6 ++++++ > meson_options.txt | 2 ++ > scripts/meson-buildoptions.sh | 3 +++ > 3 files changed, 11 insertions(+) > > diff --git a/meson.build b/meson.build > index 98e68ef0b1..6526d8cc9b 100644 > --- a/meson.build > +++ b/meson.build > @@ -574,6 +574,10 @@ have_tpm = get_option('tpm') \ > .require(targetos != 'windows', error_message: 'TPM emulation only available on POSIX systems') \ > .allowed() > > +have_iommufd = get_option('iommufd') \ > + .require(targetos == 'linux', error_message: 'iommufd is supported only on Linux') \ > + .allowed() > + > # vhost > have_vhost_user = get_option('vhost_user') \ > .disable_auto_if(targetos != 'linux') \ > @@ -2129,6 +2133,7 @@ endif > config_host_data.set('CONFIG_SNAPPY', snappy.found()) > config_host_data.set('CONFIG_TPM', have_tpm) > config_host_data.set('CONFIG_TSAN', get_option('tsan')) > +config_host_data.set('CONFIG_IOMMUFD', have_iommufd) > config_host_data.set('CONFIG_USB_LIBUSB', libusb.found()) > config_host_data.set('CONFIG_VDE', vde.found()) > config_host_data.set('CONFIG_VHOST_NET', have_vhost_net) > @@ -4051,6 +4056,7 @@ summary_info += {'vhost-user-crypto support': have_vhost_user_crypto} > summary_info += {'vhost-user-blk server support': have_vhost_user_blk_server} > summary_info += {'vhost-vdpa support': have_vhost_vdpa} > summary_info += {'build guest agent': have_ga} > +summary_info += {'iommufd support': have_iommufd} > summary(summary_info, bool_yn: true, section: 'Configurable features') > > # Compilation information > diff --git a/meson_options.txt b/meson_options.txt > index aaea5ddd77..aed91d173b 100644 > --- a/meson_options.txt > +++ b/meson_options.txt > @@ -105,6 +105,8 @@ option('dbus_display', type: 'feature', value: 'auto', > description: '-display dbus support') > option('tpm', type : 'feature', value : 'auto', > description: 'TPM support') > +option('iommufd', type : 'feature', value : 'auto', > + description: 'iommufd support') > > # Do not enable it by default even for Mingw32, because it doesn't > # work on Wine. > diff --git a/scripts/meson-buildoptions.sh b/scripts/meson-buildoptions.sh > index 9da3fe299b..719401ffb0 100644 > --- a/scripts/meson-buildoptions.sh > +++ b/scripts/meson-buildoptions.sh > @@ -113,6 +113,7 @@ meson_options_help() { > printf "%s\n" ' hax HAX acceleration support' > printf "%s\n" ' hvf HVF acceleration support' > printf "%s\n" ' iconv Font glyph conversion support' > + printf "%s\n" ' iommufd iommufd support' > printf "%s\n" ' jack JACK sound support' > printf "%s\n" ' keyring Linux keyring support' > printf "%s\n" ' kvm KVM acceleration support' > @@ -325,6 +326,8 @@ _meson_option_parse() { > --enable-install-blobs) printf "%s" -Dinstall_blobs=true ;; > --disable-install-blobs) printf "%s" -Dinstall_blobs=false ;; > --interp-prefix=*) quote_sh "-Dinterp_prefix=$2" ;; > + --enable-iommufd) printf "%s" -Diommufd=enabled ;; > + --disable-iommufd) printf "%s" -Diommufd=disabled ;; > --enable-jack) printf "%s" -Djack=enabled ;; > --disable-jack) printf "%s" -Djack=disabled ;; > --enable-keyring) printf "%s" -Dkeyring=enabled ;;
>-----Original Message----- >From: Cédric Le Goater <clg@redhat.com> >Sent: Wednesday, September 20, 2023 1:08 AM >Subject: Re: [PATCH v1 15/22] Add iommufd configure option > >On 8/30/23 12:37, Zhenzhong Duan wrote: >> This adds "--enable-iommufd/--disable-iommufd" to enable or disable >> iommufd support, enabled by default. > >Why would someone want to disable support at compile time ? It might For those users who only want to support legacy container feature? Let me know if you still prefer to drop this patch, I'm fine with that. >have been useful for dev but now QEMU should self-adjust at runtime >depending only on the host capabilities AFAIUI. Am I missing something ? IOMMUFD doesn't support all features of legacy container, so QEMU doesn't self-adjust at runtime by checking if host supports IOMMUFD. We need to specify it explicitly to use IOMMUFD as below: -object iommufd,id=iommufd0 -device vfio-pci,host=0000:02:00.0,iommufd=iommufd0 Thanks Zhenzhong
On 9/20/23 05:42, Duan, Zhenzhong wrote: > > >> -----Original Message----- >> From: Cédric Le Goater <clg@redhat.com> >> Sent: Wednesday, September 20, 2023 1:08 AM >> Subject: Re: [PATCH v1 15/22] Add iommufd configure option >> >> On 8/30/23 12:37, Zhenzhong Duan wrote: >>> This adds "--enable-iommufd/--disable-iommufd" to enable or disable >>> iommufd support, enabled by default. >> >> Why would someone want to disable support at compile time ? It might > > For those users who only want to support legacy container feature? > Let me know if you still prefer to drop this patch, I'm fine with that. I think it is too early. >> have been useful for dev but now QEMU should self-adjust at runtime >> depending only on the host capabilities AFAIUI. Am I missing something ? > > IOMMUFD doesn't support all features of legacy container, so QEMU > doesn't self-adjust at runtime by checking if host supports IOMMUFD. > We need to specify it explicitly to use IOMMUFD as below: > > -object iommufd,id=iommufd0 > -device vfio-pci,host=0000:02:00.0,iommufd=iommufd0 OK. I am not sure this is the correct interface yet. At first glance, I wouldn't introduce a new object for a simple backend depending on a kernel interface. I would tend to prefer a "iommu-something" property of the vfio-pci device with string values: "legacy", "iommufd", "default" and define the various interfaces (the ops you proposed) for each depending on the user preference and the capabilities of the host and possibly the device. I might be wrong and this might have been discussed before. If so, it should go in the cover letter with other things : what is this patchset providing to VFIO (multiple iommu backends), how it is reaching that goal, how is it organized, how do we deal with the special case (spapr), what's the user interface, etc. Thanks, C. > Thanks > Zhenzhong >
On Wed, Sep 20, 2023 at 02:19:42PM +0200, Cédric Le Goater wrote: > On 9/20/23 05:42, Duan, Zhenzhong wrote: > > > > > > > -----Original Message----- > > > From: Cédric Le Goater <clg@redhat.com> > > > Sent: Wednesday, September 20, 2023 1:08 AM > > > Subject: Re: [PATCH v1 15/22] Add iommufd configure option > > > > > > On 8/30/23 12:37, Zhenzhong Duan wrote: > > > > This adds "--enable-iommufd/--disable-iommufd" to enable or disable > > > > iommufd support, enabled by default. > > > > > > Why would someone want to disable support at compile time ? It might > > > > For those users who only want to support legacy container feature? > > Let me know if you still prefer to drop this patch, I'm fine with that. > > I think it is too early. > > > > have been useful for dev but now QEMU should self-adjust at runtime > > > depending only on the host capabilities AFAIUI. Am I missing something ? > > > > IOMMUFD doesn't support all features of legacy container, so QEMU > > doesn't self-adjust at runtime by checking if host supports IOMMUFD. > > We need to specify it explicitly to use IOMMUFD as below: > > > > -object iommufd,id=iommufd0 > > -device vfio-pci,host=0000:02:00.0,iommufd=iommufd0 > > OK. I am not sure this is the correct interface yet. At first glance, > I wouldn't introduce a new object for a simple backend depending on a > kernel interface. I would tend to prefer a "iommu-something" property > of the vfio-pci device with string values: "legacy", "iommufd", "default" > and define the various interfaces (the ops you proposed) for each > depending on the user preference and the capabilities of the host and > possibly the device. I think the idea came from Alex? The major point is to be able to have libvirt open /dev/iommufd and FD pass it into qemu and then share that single FD across all VFIOs. qemu will typically not be able to self-open /dev/iommufd as it is root-only. So the object is not exactly for the backend, the object is for the file descriptor. Adding a legacy/iommufd option to the vfio-pci device string doesn't address these needs. Jason
On Wed, Sep 20, 2023 at 09:51:03AM -0300, Jason Gunthorpe wrote: > On Wed, Sep 20, 2023 at 02:19:42PM +0200, Cédric Le Goater wrote: > > On 9/20/23 05:42, Duan, Zhenzhong wrote: > > > > > > > > > > -----Original Message----- > > > > From: Cédric Le Goater <clg@redhat.com> > > > > Sent: Wednesday, September 20, 2023 1:08 AM > > > > Subject: Re: [PATCH v1 15/22] Add iommufd configure option > > > > > > > > On 8/30/23 12:37, Zhenzhong Duan wrote: > > > > > This adds "--enable-iommufd/--disable-iommufd" to enable or disable > > > > > iommufd support, enabled by default. > > > > > > > > Why would someone want to disable support at compile time ? It might > > > > > > For those users who only want to support legacy container feature? > > > Let me know if you still prefer to drop this patch, I'm fine with that. > > > > I think it is too early. > > > > > > have been useful for dev but now QEMU should self-adjust at runtime > > > > depending only on the host capabilities AFAIUI. Am I missing something ? > > > > > > IOMMUFD doesn't support all features of legacy container, so QEMU > > > doesn't self-adjust at runtime by checking if host supports IOMMUFD. > > > We need to specify it explicitly to use IOMMUFD as below: > > > > > > -object iommufd,id=iommufd0 > > > -device vfio-pci,host=0000:02:00.0,iommufd=iommufd0 > > > > OK. I am not sure this is the correct interface yet. At first glance, > > I wouldn't introduce a new object for a simple backend depending on a > > kernel interface. I would tend to prefer a "iommu-something" property > > of the vfio-pci device with string values: "legacy", "iommufd", "default" > > and define the various interfaces (the ops you proposed) for each > > depending on the user preference and the capabilities of the host and > > possibly the device. > > I think the idea came from Alex? The major point is to be able to have > libvirt open /dev/iommufd and FD pass it into qemu and then share that > single FD across all VFIOs. qemu will typically not be able to > self-open /dev/iommufd as it is root-only. > > So the object is not exactly for the backend, the object is for the > file descriptor. Assuming we must have the exact same FD used for all vfio-pci devices, then using -object iommufd is the least worst way to get that FD injected into QEMU from libvirt. It is a little sucky in that when hotplugging/unplugging devices, libvirt has to think about whether or not it has to object_add/object_del the iommufd> again I don't see better options considering the need to have a single global FD. With regards, Daniel
On 9/20/23 14:51, Jason Gunthorpe wrote: > On Wed, Sep 20, 2023 at 02:19:42PM +0200, Cédric Le Goater wrote: >> On 9/20/23 05:42, Duan, Zhenzhong wrote: >>> >>> >>>> -----Original Message----- >>>> From: Cédric Le Goater <clg@redhat.com> >>>> Sent: Wednesday, September 20, 2023 1:08 AM >>>> Subject: Re: [PATCH v1 15/22] Add iommufd configure option >>>> >>>> On 8/30/23 12:37, Zhenzhong Duan wrote: >>>>> This adds "--enable-iommufd/--disable-iommufd" to enable or disable >>>>> iommufd support, enabled by default. >>>> >>>> Why would someone want to disable support at compile time ? It might >>> >>> For those users who only want to support legacy container feature? >>> Let me know if you still prefer to drop this patch, I'm fine with that. >> >> I think it is too early. >> >>>> have been useful for dev but now QEMU should self-adjust at runtime >>>> depending only on the host capabilities AFAIUI. Am I missing something ? >>> >>> IOMMUFD doesn't support all features of legacy container, so QEMU >>> doesn't self-adjust at runtime by checking if host supports IOMMUFD. >>> We need to specify it explicitly to use IOMMUFD as below: >>> >>> -object iommufd,id=iommufd0 >>> -device vfio-pci,host=0000:02:00.0,iommufd=iommufd0 >> >> OK. I am not sure this is the correct interface yet. At first glance, >> I wouldn't introduce a new object for a simple backend depending on a >> kernel interface. I would tend to prefer a "iommu-something" property >> of the vfio-pci device with string values: "legacy", "iommufd", "default" >> and define the various interfaces (the ops you proposed) for each >> depending on the user preference and the capabilities of the host and >> possibly the device. > > I think the idea came from Alex? The major point is to be able to have > libvirt open /dev/iommufd and FD pass it into qemu ok. > and then share that single FD across all VFIOs. I will ask Alex to help me catch up on the topic. > qemu will typically not be able to > self-open /dev/iommufd as it is root-only. I don't understand, we open multiple fds to KVM devices. This is the same. > > So the object is not exactly for the backend, the object is for the > file descriptor. got it. > > Adding a legacy/iommufd option to the vfio-pci device string doesn't > address these needs. I agree. Thanks, C. > Jason >
On Wed, Sep 20, 2023 at 02:01:39PM +0100, Daniel P. Berrangé wrote: > Assuming we must have the exact same FD used for all vfio-pci devices, > then using -object iommufd is the least worst way to get that FD > injected into QEMU from libvirt. Yes, same FD. It is a shared resource. Jason
On 9/20/23 15:02, Cédric Le Goater wrote: > On 9/20/23 14:51, Jason Gunthorpe wrote: >> On Wed, Sep 20, 2023 at 02:19:42PM +0200, Cédric Le Goater wrote: >>> On 9/20/23 05:42, Duan, Zhenzhong wrote: >>>> >>>> >>>>> -----Original Message----- >>>>> From: Cédric Le Goater <clg@redhat.com> >>>>> Sent: Wednesday, September 20, 2023 1:08 AM >>>>> Subject: Re: [PATCH v1 15/22] Add iommufd configure option >>>>> >>>>> On 8/30/23 12:37, Zhenzhong Duan wrote: >>>>>> This adds "--enable-iommufd/--disable-iommufd" to enable or disable >>>>>> iommufd support, enabled by default. >>>>> >>>>> Why would someone want to disable support at compile time ? It might >>>> >>>> For those users who only want to support legacy container feature? >>>> Let me know if you still prefer to drop this patch, I'm fine with >>>> that. >>> >>> I think it is too early. >>> >>>>> have been useful for dev but now QEMU should self-adjust at runtime >>>>> depending only on the host capabilities AFAIUI. Am I missing >>>>> something ? >>>> >>>> IOMMUFD doesn't support all features of legacy container, so QEMU >>>> doesn't self-adjust at runtime by checking if host supports IOMMUFD. >>>> We need to specify it explicitly to use IOMMUFD as below: >>>> >>>> -object iommufd,id=iommufd0 >>>> -device vfio-pci,host=0000:02:00.0,iommufd=iommufd0 >>> >>> OK. I am not sure this is the correct interface yet. At first glance, >>> I wouldn't introduce a new object for a simple backend depending on a >>> kernel interface. I would tend to prefer a "iommu-something" property >>> of the vfio-pci device with string values: "legacy", "iommufd", >>> "default" >>> and define the various interfaces (the ops you proposed) for each >>> depending on the user preference and the capabilities of the host and >>> possibly the device. >> >> I think the idea came from Alex? The major point is to be able to have >> libvirt open /dev/iommufd and FD pass it into qemu > > ok. > >> and then share that single FD across all VFIOs. > > I will ask Alex to help me catch up on the topic. > >> qemu will typically not be able to >> self-open /dev/iommufd as it is root-only. > > I don't understand, we open multiple fds to KVM devices. This is the > same. Actually qemu opens the /dev/iommu in case no fd is passed along with the iommufd object. This is done in [PATCH v1 16/22] backends/iommufd: Introduce the iommufd object, in iommufd_backend_connect(). I don't understand either. Thanks Eric > >> >> So the object is not exactly for the backend, the object is for the >> file descriptor. > got it. > >> >> Adding a legacy/iommufd option to the vfio-pci device string doesn't >> address these needs. > > I agree. > > Thanks, > > C. > >> Jason >> >
On Wed, Sep 20, 2023 at 07:37:53PM +0200, Eric Auger wrote: > >> qemu will typically not be able to > >> self-open /dev/iommufd as it is root-only. > > > > I don't understand, we open multiple fds to KVM devices. This is the > > same. > Actually qemu opens the /dev/iommu in case no fd is passed along with > the iommufd object. This is done in > [PATCH v1 16/22] backends/iommufd: Introduce the iommufd object, in > > iommufd_backend_connect(). I don't understand either. The char dev node is root only so this automatic behvaior is fine but not useful if qmeu is running in a sandbox. I'm not sure what "multiple fds to KVM devices" means, I don't know anything about kvm devices.. The iommufd design requires one open of the /dev/iommu to be shared across all the vfios. Jason
On Wed, 20 Sep 2023 03:42:20 +0000 "Duan, Zhenzhong" <zhenzhong.duan@intel.com> wrote: > >-----Original Message----- > >From: Cédric Le Goater <clg@redhat.com> > >Sent: Wednesday, September 20, 2023 1:08 AM > >Subject: Re: [PATCH v1 15/22] Add iommufd configure option > > > >On 8/30/23 12:37, Zhenzhong Duan wrote: > >> This adds "--enable-iommufd/--disable-iommufd" to enable or disable > >> iommufd support, enabled by default. > > > >Why would someone want to disable support at compile time ? It might > > For those users who only want to support legacy container feature? > Let me know if you still prefer to drop this patch, I'm fine with that. > > >have been useful for dev but now QEMU should self-adjust at runtime > >depending only on the host capabilities AFAIUI. Am I missing something ? > > IOMMUFD doesn't support all features of legacy container, so QEMU > doesn't self-adjust at runtime by checking if host supports IOMMUFD. > We need to specify it explicitly to use IOMMUFD as below: > > -object iommufd,id=iommufd0 > -device vfio-pci,host=0000:02:00.0,iommufd=iommufd0 There's an important point here that maybe we've let slip for too long. Laine had asked in an internal forum whether the switch to IOMMUFD was visible to the guest. I replied that it wasn't, but this note about IOMMUFD vs container features jogged my memory that I think we still lack p2p support with IOMMUFD, ie. IOMMU mapping of device MMIO. It seemed like there was something else too, but I don't recall without some research. Ideally we'd have feature parity and libvirt could simply use the native IOMMUFD interface whenever both the kernel and QEMU support it. Without that parity, when does libvirt decide to use IOMMUFD? How would libvirt know if some future IOMMUFD does have parity? Does the XML direct this through some new interpretation of the driver field? ex. "vfio-container" vs "vfio-iommufd" where "vfio" becomes an alias or priority preference. Thanks, Alex
On Wed, Sep 20, 2023 at 12:01:42PM -0600, Alex Williamson wrote: > On Wed, 20 Sep 2023 03:42:20 +0000 > "Duan, Zhenzhong" <zhenzhong.duan@intel.com> wrote: > > > >-----Original Message----- > > >From: Cédric Le Goater <clg@redhat.com> > > >Sent: Wednesday, September 20, 2023 1:08 AM > > >Subject: Re: [PATCH v1 15/22] Add iommufd configure option > > > > > >On 8/30/23 12:37, Zhenzhong Duan wrote: > > >> This adds "--enable-iommufd/--disable-iommufd" to enable or disable > > >> iommufd support, enabled by default. > > > > > >Why would someone want to disable support at compile time ? It might > > > > For those users who only want to support legacy container feature? > > Let me know if you still prefer to drop this patch, I'm fine with that. > > > > >have been useful for dev but now QEMU should self-adjust at runtime > > >depending only on the host capabilities AFAIUI. Am I missing something ? > > > > IOMMUFD doesn't support all features of legacy container, so QEMU > > doesn't self-adjust at runtime by checking if host supports IOMMUFD. > > We need to specify it explicitly to use IOMMUFD as below: > > > > -object iommufd,id=iommufd0 > > -device vfio-pci,host=0000:02:00.0,iommufd=iommufd0 > > There's an important point here that maybe we've let slip for too long. > Laine had asked in an internal forum whether the switch to IOMMUFD was > visible to the guest. I replied that it wasn't, but this note about > IOMMUFD vs container features jogged my memory that I think we still > lack p2p support with IOMMUFD, ie. IOMMU mapping of device MMIO. It > seemed like there was something else too, but I don't recall without > some research. I think p2p is the only guest visible one. I still expect to solve it :\ > Ideally we'd have feature parity and libvirt could simply use the > native IOMMUFD interface whenever both the kernel and QEMU support it. > > Without that parity, when does libvirt decide to use IOMMUFD? > > How would libvirt know if some future IOMMUFD does have parity? At this point I think it is reasonable that iommufd is explicitly opted into. The next step would be automatic for single PCI device VMs (p2p is not relavent) The final step would be automatic if kernel supports P2P. I expect libvirt will be able to detect this from an open'd /dev/iommu. Jason
On Wed, Sep 20, 2023 at 12:01:42PM -0600, Alex Williamson wrote: > On Wed, 20 Sep 2023 03:42:20 +0000 > "Duan, Zhenzhong" <zhenzhong.duan@intel.com> wrote: > > > >-----Original Message----- > > >From: Cédric Le Goater <clg@redhat.com> > > >Sent: Wednesday, September 20, 2023 1:08 AM > > >Subject: Re: [PATCH v1 15/22] Add iommufd configure option > > > > > >On 8/30/23 12:37, Zhenzhong Duan wrote: > > >> This adds "--enable-iommufd/--disable-iommufd" to enable or disable > > >> iommufd support, enabled by default. > > > > > >Why would someone want to disable support at compile time ? It might > > > > For those users who only want to support legacy container feature? > > Let me know if you still prefer to drop this patch, I'm fine with that. > > > > >have been useful for dev but now QEMU should self-adjust at runtime > > >depending only on the host capabilities AFAIUI. Am I missing something ? > > > > IOMMUFD doesn't support all features of legacy container, so QEMU > > doesn't self-adjust at runtime by checking if host supports IOMMUFD. > > We need to specify it explicitly to use IOMMUFD as below: > > > > -object iommufd,id=iommufd0 > > -device vfio-pci,host=0000:02:00.0,iommufd=iommufd0 > > There's an important point here that maybe we've let slip for too long. > Laine had asked in an internal forum whether the switch to IOMMUFD was > visible to the guest. I replied that it wasn't, but this note about > IOMMUFD vs container features jogged my memory that I think we still > lack p2p support with IOMMUFD, ie. IOMMU mapping of device MMIO. It > seemed like there was something else too, but I don't recall without > some research. > > Ideally we'd have feature parity and libvirt could simply use the > native IOMMUFD interface whenever both the kernel and QEMU support it. > > Without that parity, when does libvirt decide to use IOMMUFD? > > How would libvirt know if some future IOMMUFD does have parity? > > Does the XML direct this through some new interpretation of the driver > field? ex. "vfio-container" vs "vfio-iommufd" where "vfio" becomes an > alias or priority preference. Thanks, Right now a host device would have <hostdev mode='subsystem' type='mdev' model='vfio-pci'> ... </hostdev> where model could also accept 'vfio-ccw' / 'vfio-ap' on s390x IIUC. If the use of IOMMUFD has guest ABI feature differences, then we would need to treat this as a new device model in libvirt, ie add vfio-iommu-pci model. Does thos iommufd work with vfio-ccw / vfio-ap too ? If so we'd need new models for those too in libvirt. The downside of this is that it means no appication is going to use iommufd mode without having explicit coding done to make it aware of the new model in libvirt. If we /want/ apps to move over to iommufd approach in a finite short timeframe then IMHO achieving feature parity is critical as feature partiy would let libvirt switch over to it automatically and avoid the pain of updating any apps. This would be my preference, as exposing the iommufd concept to apps feels wrong - this is an internal impl detail ideally. Again we must have feature parity for this to work though. With regards, Daniel
On Wed, 20 Sep 2023 14:49:19 -0300 Jason Gunthorpe <jgg@nvidia.com> wrote: > On Wed, Sep 20, 2023 at 07:37:53PM +0200, Eric Auger wrote: > > > >> qemu will typically not be able to > > >> self-open /dev/iommufd as it is root-only. > > > > > > I don't understand, we open multiple fds to KVM devices. This is the > > > same. > > Actually qemu opens the /dev/iommu in case no fd is passed along with > > the iommufd object. This is done in > > [PATCH v1 16/22] backends/iommufd: Introduce the iommufd object, in > > > > iommufd_backend_connect(). I don't understand either. > > The char dev node is root only so this automatic behvaior is fine > but not useful if qmeu is running in a sandbox. > > I'm not sure what "multiple fds to KVM devices" means, I don't know > anything about kvm devices.. Looking at a local VM, the only kvm related open file is /dev/kvm, which kvm_init() does directly open. The other tun/tap/vhost files are all passed by fd. We have a bunch of anon_inodes representing eventfds and vcpu source from /dev/kvm, but the only other direct files are disk images and the created pid file. > The iommufd design requires one open of the /dev/iommu to be shared > across all the vfios. "requires"? It's certainly of limited value to have multiple iommufd instances rather than create multiple address spaces within a single iommufd, but what exactly precludes an iommufd per device if QEMU, or any other userspace so desired? Thanks, Alex
On Wed, Sep 20, 2023 at 12:17:24PM -0600, Alex Williamson wrote: > > The iommufd design requires one open of the /dev/iommu to be shared > > across all the vfios. > > "requires"? It's certainly of limited value to have multiple iommufd > instances rather than create multiple address spaces within a single > iommufd, but what exactly precludes an iommufd per device if QEMU, or > any other userspace so desired? Thanks, From the kernel side requires is too strong I suppose Not sure about these qemu patches though? Jason
On Wed, 20 Sep 2023 15:12:59 -0300 Jason Gunthorpe <jgg@nvidia.com> wrote: > On Wed, Sep 20, 2023 at 12:01:42PM -0600, Alex Williamson wrote: > > On Wed, 20 Sep 2023 03:42:20 +0000 > > "Duan, Zhenzhong" <zhenzhong.duan@intel.com> wrote: > > > > > >-----Original Message----- > > > >From: Cédric Le Goater <clg@redhat.com> > > > >Sent: Wednesday, September 20, 2023 1:08 AM > > > >Subject: Re: [PATCH v1 15/22] Add iommufd configure option > > > > > > > >On 8/30/23 12:37, Zhenzhong Duan wrote: > > > >> This adds "--enable-iommufd/--disable-iommufd" to enable or disable > > > >> iommufd support, enabled by default. > > > > > > > >Why would someone want to disable support at compile time ? It might > > > > > > For those users who only want to support legacy container feature? > > > Let me know if you still prefer to drop this patch, I'm fine with that. > > > > > > >have been useful for dev but now QEMU should self-adjust at runtime > > > >depending only on the host capabilities AFAIUI. Am I missing something ? > > > > > > IOMMUFD doesn't support all features of legacy container, so QEMU > > > doesn't self-adjust at runtime by checking if host supports IOMMUFD. > > > We need to specify it explicitly to use IOMMUFD as below: > > > > > > -object iommufd,id=iommufd0 > > > -device vfio-pci,host=0000:02:00.0,iommufd=iommufd0 > > > > There's an important point here that maybe we've let slip for too long. > > Laine had asked in an internal forum whether the switch to IOMMUFD was > > visible to the guest. I replied that it wasn't, but this note about > > IOMMUFD vs container features jogged my memory that I think we still > > lack p2p support with IOMMUFD, ie. IOMMU mapping of device MMIO. It > > seemed like there was something else too, but I don't recall without > > some research. > > I think p2p is the only guest visible one. > > I still expect to solve it :\ > > > Ideally we'd have feature parity and libvirt could simply use the > > native IOMMUFD interface whenever both the kernel and QEMU support it. > > > > Without that parity, when does libvirt decide to use IOMMUFD? > > > > How would libvirt know if some future IOMMUFD does have parity? > > At this point I think it is reasonable that iommufd is explicitly > opted into. > > The next step would be automatic for single PCI device VMs (p2p is not > relavent) And when a second PCI device is hot-plugged into the VM and it behaves differently from a VM with multiple statically attached devices? Seems like it's an opt-in until full p2p support, then an opt-out for potential bugs. Thanks, Alex > The final step would be automatic if kernel supports P2P. I expect > libvirt will be able to detect this from an open'd /dev/iommu. > > Jason >
>-----Original Message----- >From: Cédric Le Goater <clg@redhat.com> >Sent: Wednesday, September 20, 2023 8:20 PM >Subject: Re: [PATCH v1 15/22] Add iommufd configure option > >On 9/20/23 05:42, Duan, Zhenzhong wrote: >> >> >>> -----Original Message----- >>> From: Cédric Le Goater <clg@redhat.com> >>> Sent: Wednesday, September 20, 2023 1:08 AM >>> Subject: Re: [PATCH v1 15/22] Add iommufd configure option >>> >>> On 8/30/23 12:37, Zhenzhong Duan wrote: >>>> This adds "--enable-iommufd/--disable-iommufd" to enable or disable >>>> iommufd support, enabled by default. >>> >>> Why would someone want to disable support at compile time ? It might >> >> For those users who only want to support legacy container feature? >> Let me know if you still prefer to drop this patch, I'm fine with that. > >I think it is too early. > >>> have been useful for dev but now QEMU should self-adjust at runtime >>> depending only on the host capabilities AFAIUI. Am I missing something ? >> >> IOMMUFD doesn't support all features of legacy container, so QEMU >> doesn't self-adjust at runtime by checking if host supports IOMMUFD. >> We need to specify it explicitly to use IOMMUFD as below: >> >> -object iommufd,id=iommufd0 >> -device vfio-pci,host=0000:02:00.0,iommufd=iommufd0 > >OK. I am not sure this is the correct interface yet. At first glance, >I wouldn't introduce a new object for a simple backend depending on a >kernel interface. I would tend to prefer a "iommu-something" property >of the vfio-pci device with string values: "legacy", "iommufd", "default" >and define the various interfaces (the ops you proposed) for each >depending on the user preference and the capabilities of the host and >possibly the device. > >I might be wrong and this might have been discussed before. If so, it >should go in the cover letter with other things : what is this patchset >providing to VFIO (multiple iommu backends), how it is reaching that >goal, how is it organized, how do we deal with the special case (spapr), >what's the user interface, etc. Got it, I'll add " how is it organized, how do we deal with the special case (spapr)" part, other parts seems already in cover letter, there is a diagram showing the architecture of VFIO/legacy BE/IOMMUFD BE, etc. Thanks Zhenzhong
>-----Original Message----- >From: Jason Gunthorpe <jgg@nvidia.com> >Sent: Thursday, September 21, 2023 2:20 AM >Subject: Re: [PATCH v1 15/22] Add iommufd configure option > >On Wed, Sep 20, 2023 at 12:17:24PM -0600, Alex Williamson wrote: > >> > The iommufd design requires one open of the /dev/iommu to be shared >> > across all the vfios. >> >> "requires"? It's certainly of limited value to have multiple iommufd >> instances rather than create multiple address spaces within a single >> iommufd, but what exactly precludes an iommufd per device if QEMU, or >> any other userspace so desired? Thanks, > >From the kernel side requires is too strong I suppose > >Not sure about these qemu patches though? I had ever tested with multiple IOMMUFDs and mix of IOMMUFD/legacy BE linking to different VFIO devices with this series, all works fine. Thanks Zhenzhong
>-----Original Message----- >From: Cédric Le Goater <clg@redhat.com> >Sent: Wednesday, September 20, 2023 9:02 PM >Subject: Re: [PATCH v1 15/22] Add iommufd configure option > >On 9/20/23 14:51, Jason Gunthorpe wrote: >> On Wed, Sep 20, 2023 at 02:19:42PM +0200, Cédric Le Goater wrote: >>> On 9/20/23 05:42, Duan, Zhenzhong wrote: >>>> >>>> >>>>> -----Original Message----- >>>>> From: Cédric Le Goater <clg@redhat.com> >>>>> Sent: Wednesday, September 20, 2023 1:08 AM >>>>> Subject: Re: [PATCH v1 15/22] Add iommufd configure option >>>>> >>>>> On 8/30/23 12:37, Zhenzhong Duan wrote: >>>>>> This adds "--enable-iommufd/--disable-iommufd" to enable or disable >>>>>> iommufd support, enabled by default. >>>>> >>>>> Why would someone want to disable support at compile time ? It might >>>> >>>> For those users who only want to support legacy container feature? >>>> Let me know if you still prefer to drop this patch, I'm fine with that. >>> >>> I think it is too early. >>> >>>>> have been useful for dev but now QEMU should self-adjust at runtime >>>>> depending only on the host capabilities AFAIUI. Am I missing something ? >>>> >>>> IOMMUFD doesn't support all features of legacy container, so QEMU >>>> doesn't self-adjust at runtime by checking if host supports IOMMUFD. >>>> We need to specify it explicitly to use IOMMUFD as below: >>>> >>>> -object iommufd,id=iommufd0 >>>> -device vfio-pci,host=0000:02:00.0,iommufd=iommufd0 >>> >>> OK. I am not sure this is the correct interface yet. At first glance, >>> I wouldn't introduce a new object for a simple backend depending on a >>> kernel interface. I would tend to prefer a "iommu-something" property >>> of the vfio-pci device with string values: "legacy", "iommufd", "default" >>> and define the various interfaces (the ops you proposed) for each >>> depending on the user preference and the capabilities of the host and >>> possibly the device. >> >> I think the idea came from Alex? The major point is to be able to have >> libvirt open /dev/iommufd and FD pass it into qemu > >ok. > >> and then share that single FD across all VFIOs. > >I will ask Alex to help me catch up on the topic. > >> qemu will typically not be able to >> self-open /dev/iommufd as it is root-only. > >I don't understand, we open multiple fds to KVM devices. This is the same. There are two slight differences: 1. Different group: $ ll /dev/kvm crw-rw----+ 1 root kvm 10, 232 9月 18 14:23 /dev/kvm $ ll /dev/iommu crw-rw---- 1 root root 10, 124 9月 12 14:14 /dev/iommu 2. Default cgroup device allowed list: #cgroup_device_acl = [ # "/dev/null", "/dev/full", "/dev/zero", # "/dev/random", "/dev/urandom", # "/dev/ptmx", "/dev/kvm" #] By default, libvirt creates qemu instance with usr/group libvirt-qemu/kvm So qemu has permission to open /dev/kvm, but not for /dev/iommu. When a general user wants to open /dev/kvm, it's not permitted: duan@duan-server-S2600BT:~$ qemu-system-x86_64 -accel kvm Could not access KVM kernel module: Permission denied qemu-system-x86_64: -accel kvm: failed to initialize kvm: Permission denied Thanks Zhenzhong
> From: Jason Gunthorpe <jgg@nvidia.com> > Sent: Thursday, September 21, 2023 2:20 AM > > On Wed, Sep 20, 2023 at 12:17:24PM -0600, Alex Williamson wrote: > > > > The iommufd design requires one open of the /dev/iommu to be shared > > > across all the vfios. > > > > "requires"? It's certainly of limited value to have multiple iommufd > > instances rather than create multiple address spaces within a single > > iommufd, but what exactly precludes an iommufd per device if QEMU, or > > any other userspace so desired? Thanks, > > From the kernel side requires is too strong I suppose > Agree. But with limited value let's stay with one iommufd per qemu instance to reduce the maintenance burden. It is also more future-friendly towards nested translation or the need of centralized PASID tracking when supporting SIOV/ENQCMD, etc. Supporting those new features between multiple iommufd's would incur unnecessary complexities.
diff --git a/meson.build b/meson.build index 98e68ef0b1..6526d8cc9b 100644 --- a/meson.build +++ b/meson.build @@ -574,6 +574,10 @@ have_tpm = get_option('tpm') \ .require(targetos != 'windows', error_message: 'TPM emulation only available on POSIX systems') \ .allowed() +have_iommufd = get_option('iommufd') \ + .require(targetos == 'linux', error_message: 'iommufd is supported only on Linux') \ + .allowed() + # vhost have_vhost_user = get_option('vhost_user') \ .disable_auto_if(targetos != 'linux') \ @@ -2129,6 +2133,7 @@ endif config_host_data.set('CONFIG_SNAPPY', snappy.found()) config_host_data.set('CONFIG_TPM', have_tpm) config_host_data.set('CONFIG_TSAN', get_option('tsan')) +config_host_data.set('CONFIG_IOMMUFD', have_iommufd) config_host_data.set('CONFIG_USB_LIBUSB', libusb.found()) config_host_data.set('CONFIG_VDE', vde.found()) config_host_data.set('CONFIG_VHOST_NET', have_vhost_net) @@ -4051,6 +4056,7 @@ summary_info += {'vhost-user-crypto support': have_vhost_user_crypto} summary_info += {'vhost-user-blk server support': have_vhost_user_blk_server} summary_info += {'vhost-vdpa support': have_vhost_vdpa} summary_info += {'build guest agent': have_ga} +summary_info += {'iommufd support': have_iommufd} summary(summary_info, bool_yn: true, section: 'Configurable features') # Compilation information diff --git a/meson_options.txt b/meson_options.txt index aaea5ddd77..aed91d173b 100644 --- a/meson_options.txt +++ b/meson_options.txt @@ -105,6 +105,8 @@ option('dbus_display', type: 'feature', value: 'auto', description: '-display dbus support') option('tpm', type : 'feature', value : 'auto', description: 'TPM support') +option('iommufd', type : 'feature', value : 'auto', + description: 'iommufd support') # Do not enable it by default even for Mingw32, because it doesn't # work on Wine. diff --git a/scripts/meson-buildoptions.sh b/scripts/meson-buildoptions.sh index 9da3fe299b..719401ffb0 100644 --- a/scripts/meson-buildoptions.sh +++ b/scripts/meson-buildoptions.sh @@ -113,6 +113,7 @@ meson_options_help() { printf "%s\n" ' hax HAX acceleration support' printf "%s\n" ' hvf HVF acceleration support' printf "%s\n" ' iconv Font glyph conversion support' + printf "%s\n" ' iommufd iommufd support' printf "%s\n" ' jack JACK sound support' printf "%s\n" ' keyring Linux keyring support' printf "%s\n" ' kvm KVM acceleration support' @@ -325,6 +326,8 @@ _meson_option_parse() { --enable-install-blobs) printf "%s" -Dinstall_blobs=true ;; --disable-install-blobs) printf "%s" -Dinstall_blobs=false ;; --interp-prefix=*) quote_sh "-Dinterp_prefix=$2" ;; + --enable-iommufd) printf "%s" -Diommufd=enabled ;; + --disable-iommufd) printf "%s" -Diommufd=disabled ;; --enable-jack) printf "%s" -Djack=enabled ;; --disable-jack) printf "%s" -Djack=disabled ;; --enable-keyring) printf "%s" -Dkeyring=enabled ;;
This adds "--enable-iommufd/--disable-iommufd" to enable or disable iommufd support, enabled by default. Signed-off-by: Zhenzhong Duan <zhenzhong.duan@intel.com> --- meson.build | 6 ++++++ meson_options.txt | 2 ++ scripts/meson-buildoptions.sh | 3 +++ 3 files changed, 11 insertions(+)