[v1,15/22] Add iommufd configure option

Message ID	20230830103754.36461-16-zhenzhong.duan@intel.com (mailing list archive)
State	New, archived
Headers	show Return-Path: <qemu-devel-bounces+qemu-devel=archiver.kernel.org@nongnu.org> From: Zhenzhong Duan <zhenzhong.duan@intel.com> To: qemu-devel@nongnu.org Cc: alex.williamson@redhat.com, clg@redhat.com, jgg@nvidia.com, nicolinc@nvidia.com, joao.m.martins@oracle.com, eric.auger@redhat.com, peterx@redhat.com, jasowang@redhat.com, kevin.tian@intel.com, yi.l.liu@intel.com, yi.y.sun@intel.com, chao.p.peng@intel.com, Zhenzhong Duan <zhenzhong.duan@intel.com>, Paolo Bonzini <pbonzini@redhat.com>, =?utf-8?q?Marc-Andr=C3=A9_Lureau?= <marcandre.lureau@redhat.com>, =?utf-8?q?Daniel_P=2E_Berrang=C3=A9?= <berrange@redhat.com>, Thomas Huth <thuth@redhat.com>, =?utf-8?q?Philippe_Mathieu-Daud=C3=A9?= <philmd@linaro.org> Subject: [PATCH v1 15/22] Add iommufd configure option Date: Wed, 30 Aug 2023 18:37:47 +0800 Message-Id: <20230830103754.36461-16-zhenzhong.duan@intel.com> In-Reply-To: <20230830103754.36461-1-zhenzhong.duan@intel.com> References: <20230830103754.36461-1-zhenzhong.duan@intel.com> MIME-Version: 1.0 Content-Transfer-Encoding: 8bit Received-SPF: pass client-ip=192.55.52.93; envelope-from=zhenzhong.duan@intel.com; helo=mgamail.intel.com X-Spam_score_int: -43 X-Spam_score: -4.4 X-Spam_bar: ---- X-Spam_report: (-4.4 / 5.0 requ) BAYES_00=-1.9, DKIMWL_WL_HIGH=-0.001, DKIM_SIGNED=0.1, DKIM_VALID=-0.1, DKIM_VALID_AU=-0.1, DKIM_VALID_EF=-0.1, RCVD_IN_DNSWL_MED=-2.3, SPF_HELO_NONE=0.001, SPF_PASS=-0.001 autolearn=ham autolearn_force=no X-Spam_action: no action Precedence: list Errors-To: qemu-devel-bounces+qemu-devel=archiver.kernel.org@nongnu.org Sender: qemu-devel-bounces+qemu-devel=archiver.kernel.org@nongnu.org
Series	vfio: Adopt iommufd \| expand [v1,00/22] vfio: Adopt iommufd [v1,01/22] scripts/update-linux-headers: Add iommufd.h [v1,02/22] Update linux-header to support iommufd cdev and hwpt alloc [v1,03/22] vfio/common: Move IOMMU agnostic helpers to a separate file [v1,04/22] vfio/common: Introduce vfio_container_add\|del_section_window() [v1,05/22] vfio/common: Extract out vfio_kvm_device_[add/del]_fd [v1,06/22] vfio/common: Add a vfio device iterator [v1,07/22] vfio/common: Refactor vfio_viommu_preset() to be group agnostic [v1,08/22] vfio/common: Move legacy VFIO backend code into separate container.c [v1,09/22] vfio/container: Introduce vfio_[attach/detach]_device [v1,10/22] vfio/platform: Use vfio_[attach/detach]_device [v1,11/22] vfio/ap: Use vfio_[attach/detach]_device [v1,12/22] vfio/ccw: Use vfio_[attach/detach]_device [v1,13/22] vfio: Add base container [v1,14/22] vfio/common: Simplify vfio_viommu_preset() [v1,15/22] Add iommufd configure option [v1,16/22] backends/iommufd: Introduce the iommufd object [v1,17/22] util/char_dev: Add open_cdev() [v1,18/22] vfio/iommufd: Implement the iommufd backend [v1,19/22] vfio/iommufd: Add vfio device iterator callback for iommufd [v1,20/22] vfio/pci: Adapt vfio pci hot reset support with iommufd BE [v1,21/22] vfio/pci: Allow the selection of a given iommu backend [v1,22/22] vfio/pci: Make vfio cdev pre-openable by passing a file handle

Duan, Zhenzhong Aug. 30, 2023, 10:37 a.m. UTC

This adds "--enable-iommufd/--disable-iommufd" to enable or disable
iommufd support, enabled by default.

Signed-off-by: Zhenzhong Duan <zhenzhong.duan@intel.com>
---
 meson.build                   | 6 ++++++
 meson_options.txt             | 2 ++
 scripts/meson-buildoptions.sh | 3 +++
 3 files changed, 11 insertions(+)

Cédric Le Goater Sept. 19, 2023, 5:07 p.m. UTC | #1

On 8/30/23 12:37, Zhenzhong Duan wrote:
> This adds "--enable-iommufd/--disable-iommufd" to enable or disable
> iommufd support, enabled by default.

Why would someone want to disable support at compile time ? It might
have been useful for dev but now QEMU should self-adjust at runtime
depending only on the host capabilities AFAIUI. Am I missing something ?

Thanks,

C.


> 
> Signed-off-by: Zhenzhong Duan <zhenzhong.duan@intel.com>
> ---
>   meson.build                   | 6 ++++++
>   meson_options.txt             | 2 ++
>   scripts/meson-buildoptions.sh | 3 +++
>   3 files changed, 11 insertions(+)
> 
> diff --git a/meson.build b/meson.build
> index 98e68ef0b1..6526d8cc9b 100644
> --- a/meson.build
> +++ b/meson.build
> @@ -574,6 +574,10 @@ have_tpm = get_option('tpm') \
>     .require(targetos != 'windows', error_message: 'TPM emulation only available on POSIX systems') \
>     .allowed()
>   
> +have_iommufd = get_option('iommufd') \
> +  .require(targetos == 'linux', error_message: 'iommufd is supported only on Linux') \
> +  .allowed()
> +
>   # vhost
>   have_vhost_user = get_option('vhost_user') \
>     .disable_auto_if(targetos != 'linux') \
> @@ -2129,6 +2133,7 @@ endif
>   config_host_data.set('CONFIG_SNAPPY', snappy.found())
>   config_host_data.set('CONFIG_TPM', have_tpm)
>   config_host_data.set('CONFIG_TSAN', get_option('tsan'))
> +config_host_data.set('CONFIG_IOMMUFD', have_iommufd)
>   config_host_data.set('CONFIG_USB_LIBUSB', libusb.found())
>   config_host_data.set('CONFIG_VDE', vde.found())
>   config_host_data.set('CONFIG_VHOST_NET', have_vhost_net)
> @@ -4051,6 +4056,7 @@ summary_info += {'vhost-user-crypto support': have_vhost_user_crypto}
>   summary_info += {'vhost-user-blk server support': have_vhost_user_blk_server}
>   summary_info += {'vhost-vdpa support': have_vhost_vdpa}
>   summary_info += {'build guest agent': have_ga}
> +summary_info += {'iommufd support': have_iommufd}
>   summary(summary_info, bool_yn: true, section: 'Configurable features')
>   
>   # Compilation information
> diff --git a/meson_options.txt b/meson_options.txt
> index aaea5ddd77..aed91d173b 100644
> --- a/meson_options.txt
> +++ b/meson_options.txt
> @@ -105,6 +105,8 @@ option('dbus_display', type: 'feature', value: 'auto',
>          description: '-display dbus support')
>   option('tpm', type : 'feature', value : 'auto',
>          description: 'TPM support')
> +option('iommufd', type : 'feature', value : 'auto',
> +       description: 'iommufd support')
>   
>   # Do not enable it by default even for Mingw32, because it doesn't
>   # work on Wine.
> diff --git a/scripts/meson-buildoptions.sh b/scripts/meson-buildoptions.sh
> index 9da3fe299b..719401ffb0 100644
> --- a/scripts/meson-buildoptions.sh
> +++ b/scripts/meson-buildoptions.sh
> @@ -113,6 +113,7 @@ meson_options_help() {
>     printf "%s\n" '  hax             HAX acceleration support'
>     printf "%s\n" '  hvf             HVF acceleration support'
>     printf "%s\n" '  iconv           Font glyph conversion support'
> +  printf "%s\n" '  iommufd         iommufd support'
>     printf "%s\n" '  jack            JACK sound support'
>     printf "%s\n" '  keyring         Linux keyring support'
>     printf "%s\n" '  kvm             KVM acceleration support'
> @@ -325,6 +326,8 @@ _meson_option_parse() {
>       --enable-install-blobs) printf "%s" -Dinstall_blobs=true ;;
>       --disable-install-blobs) printf "%s" -Dinstall_blobs=false ;;
>       --interp-prefix=*) quote_sh "-Dinterp_prefix=$2" ;;
> +    --enable-iommufd) printf "%s" -Diommufd=enabled ;;
> +    --disable-iommufd) printf "%s" -Diommufd=disabled ;;
>       --enable-jack) printf "%s" -Djack=enabled ;;
>       --disable-jack) printf "%s" -Djack=disabled ;;
>       --enable-keyring) printf "%s" -Dkeyring=enabled ;;

Duan, Zhenzhong Sept. 20, 2023, 3:42 a.m. UTC | #2

>-----Original Message-----
>From: Cédric Le Goater <clg@redhat.com>
>Sent: Wednesday, September 20, 2023 1:08 AM
>Subject: Re: [PATCH v1 15/22] Add iommufd configure option
>
>On 8/30/23 12:37, Zhenzhong Duan wrote:
>> This adds "--enable-iommufd/--disable-iommufd" to enable or disable
>> iommufd support, enabled by default.
>
>Why would someone want to disable support at compile time ? It might

For those users who only want to support legacy container feature?
Let me know if you still prefer to drop this patch, I'm fine with that.

>have been useful for dev but now QEMU should self-adjust at runtime
>depending only on the host capabilities AFAIUI. Am I missing something ?

IOMMUFD doesn't support all features of legacy container, so QEMU
doesn't self-adjust at runtime by checking if host supports IOMMUFD.
We need to specify it explicitly to use IOMMUFD as below:

    -object iommufd,id=iommufd0
    -device vfio-pci,host=0000:02:00.0,iommufd=iommufd0

Thanks
Zhenzhong

Cédric Le Goater Sept. 20, 2023, 12:19 p.m. UTC | #3

On 9/20/23 05:42, Duan, Zhenzhong wrote:
> 
> 
>> -----Original Message-----
>> From: Cédric Le Goater <clg@redhat.com>
>> Sent: Wednesday, September 20, 2023 1:08 AM
>> Subject: Re: [PATCH v1 15/22] Add iommufd configure option
>>
>> On 8/30/23 12:37, Zhenzhong Duan wrote:
>>> This adds "--enable-iommufd/--disable-iommufd" to enable or disable
>>> iommufd support, enabled by default.
>>
>> Why would someone want to disable support at compile time ? It might
> 
> For those users who only want to support legacy container feature?
> Let me know if you still prefer to drop this patch, I'm fine with that.

I think it is too early.

>> have been useful for dev but now QEMU should self-adjust at runtime
>> depending only on the host capabilities AFAIUI. Am I missing something ?
> 
> IOMMUFD doesn't support all features of legacy container, so QEMU
> doesn't self-adjust at runtime by checking if host supports IOMMUFD.
> We need to specify it explicitly to use IOMMUFD as below:
> 
>      -object iommufd,id=iommufd0
>      -device vfio-pci,host=0000:02:00.0,iommufd=iommufd0

OK. I am not sure this is the correct interface yet. At first glance,
I wouldn't introduce a new object for a simple backend depending on a
kernel interface. I would tend to prefer a "iommu-something" property
of the vfio-pci device with string values: "legacy", "iommufd", "default"
and define the various interfaces (the ops you proposed) for each
depending on the user preference and the capabilities of the host and
possibly the device.

I might be wrong and this might have been discussed before. If so, it
should go in the cover letter with other things : what is this patchset
providing to VFIO (multiple iommu backends), how it is reaching that
goal, how is it organized, how do we deal with the special case (spapr),
what's the user interface, etc.

Thanks,

C.

> Thanks
> Zhenzhong
>

Jason Gunthorpe Sept. 20, 2023, 12:51 p.m. UTC | #4

On Wed, Sep 20, 2023 at 02:19:42PM +0200, Cédric Le Goater wrote:
> On 9/20/23 05:42, Duan, Zhenzhong wrote:
> > 
> > 
> > > -----Original Message-----
> > > From: Cédric Le Goater <clg@redhat.com>
> > > Sent: Wednesday, September 20, 2023 1:08 AM
> > > Subject: Re: [PATCH v1 15/22] Add iommufd configure option
> > > 
> > > On 8/30/23 12:37, Zhenzhong Duan wrote:
> > > > This adds "--enable-iommufd/--disable-iommufd" to enable or disable
> > > > iommufd support, enabled by default.
> > > 
> > > Why would someone want to disable support at compile time ? It might
> > 
> > For those users who only want to support legacy container feature?
> > Let me know if you still prefer to drop this patch, I'm fine with that.
> 
> I think it is too early.
> 
> > > have been useful for dev but now QEMU should self-adjust at runtime
> > > depending only on the host capabilities AFAIUI. Am I missing something ?
> > 
> > IOMMUFD doesn't support all features of legacy container, so QEMU
> > doesn't self-adjust at runtime by checking if host supports IOMMUFD.
> > We need to specify it explicitly to use IOMMUFD as below:
> > 
> >      -object iommufd,id=iommufd0
> >      -device vfio-pci,host=0000:02:00.0,iommufd=iommufd0
> 
> OK. I am not sure this is the correct interface yet. At first glance,
> I wouldn't introduce a new object for a simple backend depending on a
> kernel interface. I would tend to prefer a "iommu-something" property
> of the vfio-pci device with string values: "legacy", "iommufd", "default"
> and define the various interfaces (the ops you proposed) for each
> depending on the user preference and the capabilities of the host and
> possibly the device.

I think the idea came from Alex? The major point is to be able to have
libvirt open /dev/iommufd and FD pass it into qemu and then share that
single FD across all VFIOs. qemu will typically not be able to
self-open /dev/iommufd as it is root-only.

So the object is not exactly for the backend, the object is for the
file descriptor.

Adding a legacy/iommufd option to the vfio-pci device string doesn't
address these needs.

Jason

Daniel P. Berrangé Sept. 20, 2023, 1:01 p.m. UTC | #5

On Wed, Sep 20, 2023 at 09:51:03AM -0300, Jason Gunthorpe wrote:
> On Wed, Sep 20, 2023 at 02:19:42PM +0200, Cédric Le Goater wrote:
> > On 9/20/23 05:42, Duan, Zhenzhong wrote:
> > > 
> > > 
> > > > -----Original Message-----
> > > > From: Cédric Le Goater <clg@redhat.com>
> > > > Sent: Wednesday, September 20, 2023 1:08 AM
> > > > Subject: Re: [PATCH v1 15/22] Add iommufd configure option
> > > > 
> > > > On 8/30/23 12:37, Zhenzhong Duan wrote:
> > > > > This adds "--enable-iommufd/--disable-iommufd" to enable or disable
> > > > > iommufd support, enabled by default.
> > > > 
> > > > Why would someone want to disable support at compile time ? It might
> > > 
> > > For those users who only want to support legacy container feature?
> > > Let me know if you still prefer to drop this patch, I'm fine with that.
> > 
> > I think it is too early.
> > 
> > > > have been useful for dev but now QEMU should self-adjust at runtime
> > > > depending only on the host capabilities AFAIUI. Am I missing something ?
> > > 
> > > IOMMUFD doesn't support all features of legacy container, so QEMU
> > > doesn't self-adjust at runtime by checking if host supports IOMMUFD.
> > > We need to specify it explicitly to use IOMMUFD as below:
> > > 
> > >      -object iommufd,id=iommufd0
> > >      -device vfio-pci,host=0000:02:00.0,iommufd=iommufd0
> > 
> > OK. I am not sure this is the correct interface yet. At first glance,
> > I wouldn't introduce a new object for a simple backend depending on a
> > kernel interface. I would tend to prefer a "iommu-something" property
> > of the vfio-pci device with string values: "legacy", "iommufd", "default"
> > and define the various interfaces (the ops you proposed) for each
> > depending on the user preference and the capabilities of the host and
> > possibly the device.
> 
> I think the idea came from Alex? The major point is to be able to have
> libvirt open /dev/iommufd and FD pass it into qemu and then share that
> single FD across all VFIOs. qemu will typically not be able to
> self-open /dev/iommufd as it is root-only.
> 
> So the object is not exactly for the backend, the object is for the
> file descriptor.

Assuming we must have the exact same FD used for all vfio-pci devices,
then using -object iommufd is the least worst way to get that FD
injected into QEMU from libvirt. It is a little sucky in that when
hotplugging/unplugging devices, libvirt has to think about whether or
not it has to object_add/object_del  the iommufd> again I don't see
better options considering the need to have a single global FD.

With regards,
Daniel

Cédric Le Goater Sept. 20, 2023, 1:02 p.m. UTC | #6

On 9/20/23 14:51, Jason Gunthorpe wrote:
> On Wed, Sep 20, 2023 at 02:19:42PM +0200, Cédric Le Goater wrote:
>> On 9/20/23 05:42, Duan, Zhenzhong wrote:
>>>
>>>
>>>> -----Original Message-----
>>>> From: Cédric Le Goater <clg@redhat.com>
>>>> Sent: Wednesday, September 20, 2023 1:08 AM
>>>> Subject: Re: [PATCH v1 15/22] Add iommufd configure option
>>>>
>>>> On 8/30/23 12:37, Zhenzhong Duan wrote:
>>>>> This adds "--enable-iommufd/--disable-iommufd" to enable or disable
>>>>> iommufd support, enabled by default.
>>>>
>>>> Why would someone want to disable support at compile time ? It might
>>>
>>> For those users who only want to support legacy container feature?
>>> Let me know if you still prefer to drop this patch, I'm fine with that.
>>
>> I think it is too early.
>>
>>>> have been useful for dev but now QEMU should self-adjust at runtime
>>>> depending only on the host capabilities AFAIUI. Am I missing something ?
>>>
>>> IOMMUFD doesn't support all features of legacy container, so QEMU
>>> doesn't self-adjust at runtime by checking if host supports IOMMUFD.
>>> We need to specify it explicitly to use IOMMUFD as below:
>>>
>>>       -object iommufd,id=iommufd0
>>>       -device vfio-pci,host=0000:02:00.0,iommufd=iommufd0
>>
>> OK. I am not sure this is the correct interface yet. At first glance,
>> I wouldn't introduce a new object for a simple backend depending on a
>> kernel interface. I would tend to prefer a "iommu-something" property
>> of the vfio-pci device with string values: "legacy", "iommufd", "default"
>> and define the various interfaces (the ops you proposed) for each
>> depending on the user preference and the capabilities of the host and
>> possibly the device.
> 
> I think the idea came from Alex? The major point is to be able to have
> libvirt open /dev/iommufd and FD pass it into qemu 

ok.

> and then share that single FD across all VFIOs. 

I will ask Alex to help me catch up on the topic.

> qemu will typically not be able to
> self-open /dev/iommufd as it is root-only.

I don't understand, we open multiple fds to KVM devices. This is the same.

> 
> So the object is not exactly for the backend, the object is for the
> file descriptor.
got it.

> 
> Adding a legacy/iommufd option to the vfio-pci device string doesn't
> address these needs.

I agree.

Thanks,

C.

> Jason
>

Jason Gunthorpe Sept. 20, 2023, 1:07 p.m. UTC | #7

On Wed, Sep 20, 2023 at 02:01:39PM +0100, Daniel P. Berrangé wrote:

> Assuming we must have the exact same FD used for all vfio-pci devices,
> then using -object iommufd is the least worst way to get that FD
> injected into QEMU from libvirt. 

Yes, same FD. It is a shared resource.

Jason

Eric Auger Sept. 20, 2023, 5:37 p.m. UTC | #8

On 9/20/23 15:02, Cédric Le Goater wrote:
> On 9/20/23 14:51, Jason Gunthorpe wrote:
>> On Wed, Sep 20, 2023 at 02:19:42PM +0200, Cédric Le Goater wrote:
>>> On 9/20/23 05:42, Duan, Zhenzhong wrote:
>>>>
>>>>
>>>>> -----Original Message-----
>>>>> From: Cédric Le Goater <clg@redhat.com>
>>>>> Sent: Wednesday, September 20, 2023 1:08 AM
>>>>> Subject: Re: [PATCH v1 15/22] Add iommufd configure option
>>>>>
>>>>> On 8/30/23 12:37, Zhenzhong Duan wrote:
>>>>>> This adds "--enable-iommufd/--disable-iommufd" to enable or disable
>>>>>> iommufd support, enabled by default.
>>>>>
>>>>> Why would someone want to disable support at compile time ? It might
>>>>
>>>> For those users who only want to support legacy container feature?
>>>> Let me know if you still prefer to drop this patch, I'm fine with
>>>> that.
>>>
>>> I think it is too early.
>>>
>>>>> have been useful for dev but now QEMU should self-adjust at runtime
>>>>> depending only on the host capabilities AFAIUI. Am I missing
>>>>> something ?
>>>>
>>>> IOMMUFD doesn't support all features of legacy container, so QEMU
>>>> doesn't self-adjust at runtime by checking if host supports IOMMUFD.
>>>> We need to specify it explicitly to use IOMMUFD as below:
>>>>
>>>>       -object iommufd,id=iommufd0
>>>>       -device vfio-pci,host=0000:02:00.0,iommufd=iommufd0
>>>
>>> OK. I am not sure this is the correct interface yet. At first glance,
>>> I wouldn't introduce a new object for a simple backend depending on a
>>> kernel interface. I would tend to prefer a "iommu-something" property
>>> of the vfio-pci device with string values: "legacy", "iommufd",
>>> "default"
>>> and define the various interfaces (the ops you proposed) for each
>>> depending on the user preference and the capabilities of the host and
>>> possibly the device.
>>
>> I think the idea came from Alex? The major point is to be able to have
>> libvirt open /dev/iommufd and FD pass it into qemu 
>
> ok.
>
>> and then share that single FD across all VFIOs. 
>
> I will ask Alex to help me catch up on the topic.
>
>> qemu will typically not be able to
>> self-open /dev/iommufd as it is root-only.
>
> I don't understand, we open multiple fds to KVM devices. This is the
> same.
Actually qemu opens the /dev/iommu in case no fd is passed along with
the iommufd object. This is done in
[PATCH v1 16/22] backends/iommufd: Introduce the iommufd object, in

iommufd_backend_connect(). I don't understand either.

Thanks

Eric

>
>>
>> So the object is not exactly for the backend, the object is for the
>> file descriptor.
> got it.
>
>>
>> Adding a legacy/iommufd option to the vfio-pci device string doesn't
>> address these needs.
>
> I agree.
>
> Thanks,
>
> C.
>
>> Jason
>>
>

Jason Gunthorpe Sept. 20, 2023, 5:49 p.m. UTC | #9

On Wed, Sep 20, 2023 at 07:37:53PM +0200, Eric Auger wrote:

> >> qemu will typically not be able to
> >> self-open /dev/iommufd as it is root-only.
> >
> > I don't understand, we open multiple fds to KVM devices. This is the
> > same.
> Actually qemu opens the /dev/iommu in case no fd is passed along with
> the iommufd object. This is done in
> [PATCH v1 16/22] backends/iommufd: Introduce the iommufd object, in
> 
> iommufd_backend_connect(). I don't understand either.

The char dev node is root only so this automatic behvaior is fine
but not useful if qmeu is running in a sandbox.

I'm not sure what "multiple fds to KVM devices" means, I don't know
anything about kvm devices..

The iommufd design requires one open of the /dev/iommu to be shared
across all the vfios.

Jason

Alex Williamson Sept. 20, 2023, 6:01 p.m. UTC | #10

On Wed, 20 Sep 2023 03:42:20 +0000
"Duan, Zhenzhong" <zhenzhong.duan@intel.com> wrote:

> >-----Original Message-----
> >From: Cédric Le Goater <clg@redhat.com>
> >Sent: Wednesday, September 20, 2023 1:08 AM
> >Subject: Re: [PATCH v1 15/22] Add iommufd configure option
> >
> >On 8/30/23 12:37, Zhenzhong Duan wrote:  
> >> This adds "--enable-iommufd/--disable-iommufd" to enable or disable
> >> iommufd support, enabled by default.  
> >
> >Why would someone want to disable support at compile time ? It might  
> 
> For those users who only want to support legacy container feature?
> Let me know if you still prefer to drop this patch, I'm fine with that.
> 
> >have been useful for dev but now QEMU should self-adjust at runtime
> >depending only on the host capabilities AFAIUI. Am I missing something ?  
> 
> IOMMUFD doesn't support all features of legacy container, so QEMU
> doesn't self-adjust at runtime by checking if host supports IOMMUFD.
> We need to specify it explicitly to use IOMMUFD as below:
> 
>     -object iommufd,id=iommufd0
>     -device vfio-pci,host=0000:02:00.0,iommufd=iommufd0

There's an important point here that maybe we've let slip for too long.
Laine had asked in an internal forum whether the switch to IOMMUFD was
visible to the guest.  I replied that it wasn't, but this note about
IOMMUFD vs container features jogged my memory that I think we still
lack p2p support with IOMMUFD, ie. IOMMU mapping of device MMIO.  It
seemed like there was something else too, but I don't recall without
some research.

Ideally we'd have feature parity and libvirt could simply use the
native IOMMUFD interface whenever both the kernel and QEMU support it.

Without that parity, when does libvirt decide to use IOMMUFD?

How would libvirt know if some future IOMMUFD does have parity?

Does the XML direct this through some new interpretation of the driver
field? ex. "vfio-container" vs "vfio-iommufd" where "vfio" becomes an
alias or priority preference.  Thanks,

Alex

Jason Gunthorpe Sept. 20, 2023, 6:12 p.m. UTC | #11

On Wed, Sep 20, 2023 at 12:01:42PM -0600, Alex Williamson wrote:
> On Wed, 20 Sep 2023 03:42:20 +0000
> "Duan, Zhenzhong" <zhenzhong.duan@intel.com> wrote:
> 
> > >-----Original Message-----
> > >From: Cédric Le Goater <clg@redhat.com>
> > >Sent: Wednesday, September 20, 2023 1:08 AM
> > >Subject: Re: [PATCH v1 15/22] Add iommufd configure option
> > >
> > >On 8/30/23 12:37, Zhenzhong Duan wrote:  
> > >> This adds "--enable-iommufd/--disable-iommufd" to enable or disable
> > >> iommufd support, enabled by default.  
> > >
> > >Why would someone want to disable support at compile time ? It might  
> > 
> > For those users who only want to support legacy container feature?
> > Let me know if you still prefer to drop this patch, I'm fine with that.
> > 
> > >have been useful for dev but now QEMU should self-adjust at runtime
> > >depending only on the host capabilities AFAIUI. Am I missing something ?  
> > 
> > IOMMUFD doesn't support all features of legacy container, so QEMU
> > doesn't self-adjust at runtime by checking if host supports IOMMUFD.
> > We need to specify it explicitly to use IOMMUFD as below:
> > 
> >     -object iommufd,id=iommufd0
> >     -device vfio-pci,host=0000:02:00.0,iommufd=iommufd0
> 
> There's an important point here that maybe we've let slip for too long.
> Laine had asked in an internal forum whether the switch to IOMMUFD was
> visible to the guest.  I replied that it wasn't, but this note about
> IOMMUFD vs container features jogged my memory that I think we still
> lack p2p support with IOMMUFD, ie. IOMMU mapping of device MMIO.  It
> seemed like there was something else too, but I don't recall without
> some research.

I think p2p is the only guest visible one.

I still expect to solve it :\

> Ideally we'd have feature parity and libvirt could simply use the
> native IOMMUFD interface whenever both the kernel and QEMU support it.
> 
> Without that parity, when does libvirt decide to use IOMMUFD?
> 
> How would libvirt know if some future IOMMUFD does have parity?

At this point I think it is reasonable that iommufd is explicitly
opted into.

The next step would be automatic for single PCI device VMs (p2p is not
relavent)

The final step would be automatic if kernel supports P2P. I expect
libvirt will be able to detect this from an open'd /dev/iommu.

Jason

Daniel P. Berrangé Sept. 20, 2023, 6:15 p.m. UTC | #12

On Wed, Sep 20, 2023 at 12:01:42PM -0600, Alex Williamson wrote:
> On Wed, 20 Sep 2023 03:42:20 +0000
> "Duan, Zhenzhong" <zhenzhong.duan@intel.com> wrote:
> 
> > >-----Original Message-----
> > >From: Cédric Le Goater <clg@redhat.com>
> > >Sent: Wednesday, September 20, 2023 1:08 AM
> > >Subject: Re: [PATCH v1 15/22] Add iommufd configure option
> > >
> > >On 8/30/23 12:37, Zhenzhong Duan wrote:  
> > >> This adds "--enable-iommufd/--disable-iommufd" to enable or disable
> > >> iommufd support, enabled by default.  
> > >
> > >Why would someone want to disable support at compile time ? It might  
> > 
> > For those users who only want to support legacy container feature?
> > Let me know if you still prefer to drop this patch, I'm fine with that.
> > 
> > >have been useful for dev but now QEMU should self-adjust at runtime
> > >depending only on the host capabilities AFAIUI. Am I missing something ?  
> > 
> > IOMMUFD doesn't support all features of legacy container, so QEMU
> > doesn't self-adjust at runtime by checking if host supports IOMMUFD.
> > We need to specify it explicitly to use IOMMUFD as below:
> > 
> >     -object iommufd,id=iommufd0
> >     -device vfio-pci,host=0000:02:00.0,iommufd=iommufd0
> 
> There's an important point here that maybe we've let slip for too long.
> Laine had asked in an internal forum whether the switch to IOMMUFD was
> visible to the guest.  I replied that it wasn't, but this note about
> IOMMUFD vs container features jogged my memory that I think we still
> lack p2p support with IOMMUFD, ie. IOMMU mapping of device MMIO.  It
> seemed like there was something else too, but I don't recall without
> some research.
> 
> Ideally we'd have feature parity and libvirt could simply use the
> native IOMMUFD interface whenever both the kernel and QEMU support it.
> 
> Without that parity, when does libvirt decide to use IOMMUFD?
> 
> How would libvirt know if some future IOMMUFD does have parity?
> 
> Does the XML direct this through some new interpretation of the driver
> field? ex. "vfio-container" vs "vfio-iommufd" where "vfio" becomes an
> alias or priority preference.  Thanks,

Right now a host device would have

  <hostdev mode='subsystem' type='mdev' model='vfio-pci'>
   ...
  </hostdev>

where model could also accept 'vfio-ccw' / 'vfio-ap' on s390x IIUC.

If the use of IOMMUFD has guest ABI feature differences, then we
would need to treat this as a new device model in libvirt, ie add
vfio-iommu-pci model.   Does thos iommufd work with vfio-ccw / vfio-ap
too ? If so we'd need new models for those too in libvirt.

The downside of this is that it means no appication is going to
use iommufd mode without having explicit coding done to make it
aware of the new model in libvirt.

If we /want/ apps to move over to iommufd approach in a finite
short timeframe then IMHO achieving feature parity is critical
as feature partiy would let libvirt switch over to it automatically
and avoid the pain of updating any apps. This would be my preference,
as exposing the iommufd concept to apps feels wrong - this is an
internal impl detail ideally. Again we must have feature parity
for this to work though.

With regards,
Daniel

Alex Williamson Sept. 20, 2023, 6:17 p.m. UTC | #13

On Wed, 20 Sep 2023 14:49:19 -0300
Jason Gunthorpe <jgg@nvidia.com> wrote:

> On Wed, Sep 20, 2023 at 07:37:53PM +0200, Eric Auger wrote:
> 
> > >> qemu will typically not be able to
> > >> self-open /dev/iommufd as it is root-only.  
> > >
> > > I don't understand, we open multiple fds to KVM devices. This is the
> > > same.  
> > Actually qemu opens the /dev/iommu in case no fd is passed along with
> > the iommufd object. This is done in
> > [PATCH v1 16/22] backends/iommufd: Introduce the iommufd object, in
> > 
> > iommufd_backend_connect(). I don't understand either.  
> 
> The char dev node is root only so this automatic behvaior is fine
> but not useful if qmeu is running in a sandbox.
> 
> I'm not sure what "multiple fds to KVM devices" means, I don't know
> anything about kvm devices..

Looking at a local VM, the only kvm related open file is /dev/kvm,
which kvm_init() does directly open.  The other tun/tap/vhost files are
all passed by fd.  We have a bunch of anon_inodes representing eventfds
and vcpu source from /dev/kvm, but the only other direct files are disk
images and the created pid file.

> The iommufd design requires one open of the /dev/iommu to be shared
> across all the vfios.

"requires"?  It's certainly of limited value to have multiple iommufd
instances rather than create multiple address spaces within a single
iommufd, but what exactly precludes an iommufd per device if QEMU, or
any other userspace so desired?  Thanks,

Alex

Jason Gunthorpe Sept. 20, 2023, 6:19 p.m. UTC | #14

On Wed, Sep 20, 2023 at 12:17:24PM -0600, Alex Williamson wrote:

> > The iommufd design requires one open of the /dev/iommu to be shared
> > across all the vfios.
> 
> "requires"?  It's certainly of limited value to have multiple iommufd
> instances rather than create multiple address spaces within a single
> iommufd, but what exactly precludes an iommufd per device if QEMU, or
> any other userspace so desired?  Thanks,

From the kernel side requires is too strong I suppose

Not sure about these qemu patches though?

Jason

Alex Williamson Sept. 20, 2023, 8:29 p.m. UTC | #15

On Wed, 20 Sep 2023 15:12:59 -0300
Jason Gunthorpe <jgg@nvidia.com> wrote:

> On Wed, Sep 20, 2023 at 12:01:42PM -0600, Alex Williamson wrote:
> > On Wed, 20 Sep 2023 03:42:20 +0000
> > "Duan, Zhenzhong" <zhenzhong.duan@intel.com> wrote:
> >   
> > > >-----Original Message-----
> > > >From: Cédric Le Goater <clg@redhat.com>
> > > >Sent: Wednesday, September 20, 2023 1:08 AM
> > > >Subject: Re: [PATCH v1 15/22] Add iommufd configure option
> > > >
> > > >On 8/30/23 12:37, Zhenzhong Duan wrote:    
> > > >> This adds "--enable-iommufd/--disable-iommufd" to enable or disable
> > > >> iommufd support, enabled by default.    
> > > >
> > > >Why would someone want to disable support at compile time ? It might    
> > > 
> > > For those users who only want to support legacy container feature?
> > > Let me know if you still prefer to drop this patch, I'm fine with that.
> > >   
> > > >have been useful for dev but now QEMU should self-adjust at runtime
> > > >depending only on the host capabilities AFAIUI. Am I missing something ?    
> > > 
> > > IOMMUFD doesn't support all features of legacy container, so QEMU
> > > doesn't self-adjust at runtime by checking if host supports IOMMUFD.
> > > We need to specify it explicitly to use IOMMUFD as below:
> > > 
> > >     -object iommufd,id=iommufd0
> > >     -device vfio-pci,host=0000:02:00.0,iommufd=iommufd0  
> > 
> > There's an important point here that maybe we've let slip for too long.
> > Laine had asked in an internal forum whether the switch to IOMMUFD was
> > visible to the guest.  I replied that it wasn't, but this note about
> > IOMMUFD vs container features jogged my memory that I think we still
> > lack p2p support with IOMMUFD, ie. IOMMU mapping of device MMIO.  It
> > seemed like there was something else too, but I don't recall without
> > some research.  
> 
> I think p2p is the only guest visible one.
> 
> I still expect to solve it :\
> 
> > Ideally we'd have feature parity and libvirt could simply use the
> > native IOMMUFD interface whenever both the kernel and QEMU support it.
> > 
> > Without that parity, when does libvirt decide to use IOMMUFD?
> > 
> > How would libvirt know if some future IOMMUFD does have parity?  
> 
> At this point I think it is reasonable that iommufd is explicitly
> opted into.
> 
> The next step would be automatic for single PCI device VMs (p2p is not
> relavent)

And when a second PCI device is hot-plugged into the VM and it behaves
differently from a VM with multiple statically attached devices?  Seems
like it's an opt-in until full p2p support, then an opt-out for
potential bugs.  Thanks,

Alex

> The final step would be automatic if kernel supports P2P. I expect
> libvirt will be able to detect this from an open'd /dev/iommu.
> 
> Jason
>

Duan, Zhenzhong Sept. 21, 2023, 2:11 a.m. UTC | #16

>-----Original Message-----
>From: Cédric Le Goater <clg@redhat.com>
>Sent: Wednesday, September 20, 2023 8:20 PM
>Subject: Re: [PATCH v1 15/22] Add iommufd configure option
>
>On 9/20/23 05:42, Duan, Zhenzhong wrote:
>>
>>
>>> -----Original Message-----
>>> From: Cédric Le Goater <clg@redhat.com>
>>> Sent: Wednesday, September 20, 2023 1:08 AM
>>> Subject: Re: [PATCH v1 15/22] Add iommufd configure option
>>>
>>> On 8/30/23 12:37, Zhenzhong Duan wrote:
>>>> This adds "--enable-iommufd/--disable-iommufd" to enable or disable
>>>> iommufd support, enabled by default.
>>>
>>> Why would someone want to disable support at compile time ? It might
>>
>> For those users who only want to support legacy container feature?
>> Let me know if you still prefer to drop this patch, I'm fine with that.
>
>I think it is too early.
>
>>> have been useful for dev but now QEMU should self-adjust at runtime
>>> depending only on the host capabilities AFAIUI. Am I missing something ?
>>
>> IOMMUFD doesn't support all features of legacy container, so QEMU
>> doesn't self-adjust at runtime by checking if host supports IOMMUFD.
>> We need to specify it explicitly to use IOMMUFD as below:
>>
>>      -object iommufd,id=iommufd0
>>      -device vfio-pci,host=0000:02:00.0,iommufd=iommufd0
>
>OK. I am not sure this is the correct interface yet. At first glance,
>I wouldn't introduce a new object for a simple backend depending on a
>kernel interface. I would tend to prefer a "iommu-something" property
>of the vfio-pci device with string values: "legacy", "iommufd", "default"
>and define the various interfaces (the ops you proposed) for each
>depending on the user preference and the capabilities of the host and
>possibly the device.
>
>I might be wrong and this might have been discussed before. If so, it
>should go in the cover letter with other things : what is this patchset
>providing to VFIO (multiple iommu backends), how it is reaching that
>goal, how is it organized, how do we deal with the special case (spapr),
>what's the user interface, etc.

Got it, I'll add " how is it organized, how do we deal with the special case (spapr)"
part, other parts seems already in cover letter, there is a diagram showing
the architecture of VFIO/legacy BE/IOMMUFD BE, etc.

Thanks
Zhenzhong

Duan, Zhenzhong Sept. 21, 2023, 3:43 a.m. UTC | #17

>-----Original Message-----
>From: Jason Gunthorpe <jgg@nvidia.com>
>Sent: Thursday, September 21, 2023 2:20 AM
>Subject: Re: [PATCH v1 15/22] Add iommufd configure option
>
>On Wed, Sep 20, 2023 at 12:17:24PM -0600, Alex Williamson wrote:
>
>> > The iommufd design requires one open of the /dev/iommu to be shared
>> > across all the vfios.
>>
>> "requires"?  It's certainly of limited value to have multiple iommufd
>> instances rather than create multiple address spaces within a single
>> iommufd, but what exactly precludes an iommufd per device if QEMU, or
>> any other userspace so desired?  Thanks,
>
>From the kernel side requires is too strong I suppose
>
>Not sure about these qemu patches though?

I had ever tested with multiple IOMMUFDs and mix of IOMMUFD/legacy BE linking to different VFIO devices with this series,  all works fine.

Thanks
Zhenzhong

Duan, Zhenzhong Sept. 21, 2023, 4 a.m. UTC | #18

>-----Original Message-----
>From: Cédric Le Goater <clg@redhat.com>
>Sent: Wednesday, September 20, 2023 9:02 PM
>Subject: Re: [PATCH v1 15/22] Add iommufd configure option
>
>On 9/20/23 14:51, Jason Gunthorpe wrote:
>> On Wed, Sep 20, 2023 at 02:19:42PM +0200, Cédric Le Goater wrote:
>>> On 9/20/23 05:42, Duan, Zhenzhong wrote:
>>>>
>>>>
>>>>> -----Original Message-----
>>>>> From: Cédric Le Goater <clg@redhat.com>
>>>>> Sent: Wednesday, September 20, 2023 1:08 AM
>>>>> Subject: Re: [PATCH v1 15/22] Add iommufd configure option
>>>>>
>>>>> On 8/30/23 12:37, Zhenzhong Duan wrote:
>>>>>> This adds "--enable-iommufd/--disable-iommufd" to enable or disable
>>>>>> iommufd support, enabled by default.
>>>>>
>>>>> Why would someone want to disable support at compile time ? It might
>>>>
>>>> For those users who only want to support legacy container feature?
>>>> Let me know if you still prefer to drop this patch, I'm fine with that.
>>>
>>> I think it is too early.
>>>
>>>>> have been useful for dev but now QEMU should self-adjust at runtime
>>>>> depending only on the host capabilities AFAIUI. Am I missing something ?
>>>>
>>>> IOMMUFD doesn't support all features of legacy container, so QEMU
>>>> doesn't self-adjust at runtime by checking if host supports IOMMUFD.
>>>> We need to specify it explicitly to use IOMMUFD as below:
>>>>
>>>>       -object iommufd,id=iommufd0
>>>>       -device vfio-pci,host=0000:02:00.0,iommufd=iommufd0
>>>
>>> OK. I am not sure this is the correct interface yet. At first glance,
>>> I wouldn't introduce a new object for a simple backend depending on a
>>> kernel interface. I would tend to prefer a "iommu-something" property
>>> of the vfio-pci device with string values: "legacy", "iommufd", "default"
>>> and define the various interfaces (the ops you proposed) for each
>>> depending on the user preference and the capabilities of the host and
>>> possibly the device.
>>
>> I think the idea came from Alex? The major point is to be able to have
>> libvirt open /dev/iommufd and FD pass it into qemu
>
>ok.
>
>> and then share that single FD across all VFIOs.
>
>I will ask Alex to help me catch up on the topic.
>
>> qemu will typically not be able to
>> self-open /dev/iommufd as it is root-only.
>
>I don't understand, we open multiple fds to KVM devices. This is the same.

There are two slight differences:

1. Different group:
$ ll /dev/kvm
crw-rw----+ 1 root kvm 10, 232  9月 18 14:23 /dev/kvm
$ ll /dev/iommu
crw-rw---- 1 root root 10, 124  9月 12 14:14 /dev/iommu

2. Default cgroup device allowed list:
#cgroup_device_acl = [
#    "/dev/null", "/dev/full", "/dev/zero",
#    "/dev/random", "/dev/urandom",
#    "/dev/ptmx", "/dev/kvm"
#]

By default, libvirt creates qemu instance with usr/group libvirt-qemu/kvm
So qemu has permission to open /dev/kvm, but not for /dev/iommu.

When a general user wants to open /dev/kvm, it's not permitted:

duan@duan-server-S2600BT:~$ qemu-system-x86_64 -accel kvm
Could not access KVM kernel module: Permission denied
qemu-system-x86_64: -accel kvm: failed to initialize kvm: Permission denied

Thanks
Zhenzhong

Tian, Kevin Sept. 26, 2023, 6:05 a.m. UTC | #19

> From: Jason Gunthorpe <jgg@nvidia.com>
> Sent: Thursday, September 21, 2023 2:20 AM
> 
> On Wed, Sep 20, 2023 at 12:17:24PM -0600, Alex Williamson wrote:
> 
> > > The iommufd design requires one open of the /dev/iommu to be shared
> > > across all the vfios.
> >
> > "requires"?  It's certainly of limited value to have multiple iommufd
> > instances rather than create multiple address spaces within a single
> > iommufd, but what exactly precludes an iommufd per device if QEMU, or
> > any other userspace so desired?  Thanks,
> 
> From the kernel side requires is too strong I suppose
> 

Agree. But with limited value let's stay with one iommufd per qemu
instance to reduce the maintenance burden.

It is also more future-friendly towards nested translation or the
need of centralized PASID tracking when supporting SIOV/ENQCMD, etc.

Supporting those new features between multiple iommufd's would
incur unnecessary complexities.

[v1,15/22] Add iommufd configure option

Commit Message

Comments

Patch