[v11,20/23] vfio: Add VFIO_DEVICE_[AT|DE]TACH_IOMMUFD_PT

Message ID	20230513132827.39066-21-yi.l.liu@intel.com (mailing list archive)

Yi Liu May 13, 2023, 1:28 p.m. UTC

This adds ioctls for userspace to attach a device cdev fd to, and detach
it from, an IOAS/hw_pagetable managed by iommufd.

    VFIO_DEVICE_ATTACH_IOMMUFD_PT: attach vfio device to an IOAS or hw_pagetable
				   managed by iommufd. Attach can be
				   undone by VFIO_DEVICE_DETACH_IOMMUFD_PT
				   or device fd close.
    VFIO_DEVICE_DETACH_IOMMUFD_PT: detach vfio device from the currently attached
				   IOAS or hw_pagetable managed by iommufd.
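
For readers following the thread, here is a minimal userspace sketch of how
the two ioctls described above might be called. It assumes the device fd was
opened from the vfio device cdev and has already been bound with
VFIO_DEVICE_BIND_IOMMUFD. The sketch_ names are hypothetical; the struct
layouts and ioctl numbers are local mirrors of what this v11 patch proposes
(with VFIO_TYPE and VFIO_BASE as defined in include/uapi/linux/vfio.h), not
taken from an installed header, and they may differ from what is merged.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>
#include <string.h>
#include <sys/ioctl.h>

/* Local mirrors of the uAPI proposed in this patch (assumption: v11 layout). */
#define SKETCH_VFIO_TYPE		(';')
#define SKETCH_VFIO_BASE		100
#define SKETCH_ATTACH_IOMMUFD_PT	_IO(SKETCH_VFIO_TYPE, SKETCH_VFIO_BASE + 20)
#define SKETCH_DETACH_IOMMUFD_PT	_IO(SKETCH_VFIO_TYPE, SKETCH_VFIO_BASE + 21)

struct sketch_attach_pt {
	uint32_t argsz;
	uint32_t flags;
	uint32_t pt_id;
};

struct sketch_detach_pt {
	uint32_t argsz;
	uint32_t flags;
};

static_assert(sizeof(struct sketch_attach_pt) == 12, "three packed u32 fields");
static_assert(sizeof(struct sketch_detach_pt) == 8, "two packed u32 fields");

/*
 * Attach the device behind device_fd to the IOAS or hwpt named by *pt_id.
 * On success the id reported back by the kernel is stored in *pt_id.
 * Returns 0 on success, -errno on failure.
 */
int sketch_attach(int device_fd, uint32_t *pt_id)
{
	struct sketch_attach_pt attach;

	memset(&attach, 0, sizeof(attach));
	attach.argsz = sizeof(attach);	/* flags stays 0, as the patch requires */
	attach.pt_id = *pt_id;

	if (ioctl(device_fd, SKETCH_ATTACH_IOMMUFD_PT, &attach) < 0)
		return -errno;

	*pt_id = attach.pt_id;
	return 0;
}

/* Undo a previous attach. Returns 0 on success, -errno on failure. */
int sketch_detach(int device_fd)
{
	struct sketch_detach_pt detach;

	memset(&detach, 0, sizeof(detach));
	detach.argsz = sizeof(detach);

	if (ioctl(device_fd, SKETCH_DETACH_IOMMUFD_PT, &detach) < 0)
		return -errno;

	return 0;
}
```

The helpers leave *pt_id untouched on failure, so a caller can retry with the
same IOAS id.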

Tested-by: Yanting Jiang <yanting.jiang@intel.com>
Tested-by: Shameer Kolothum <shameerali.kolothum.thodi@huawei.com>
Signed-off-by: Yi Liu <yi.l.liu@intel.com>
---
 drivers/vfio/device_cdev.c | 66 ++++++++++++++++++++++++++++++++++++++
 drivers/vfio/iommufd.c     | 18 +++++++++++
 drivers/vfio/vfio.h        | 18 +++++++++++
 drivers/vfio/vfio_main.c   |  8 +++++
 include/uapi/linux/vfio.h  | 52 ++++++++++++++++++++++++++++++
 5 files changed, 162 insertions(+)

Alex Williamson May 22, 2023, 10:15 p.m. UTC | #1

On Sat, 13 May 2023 06:28:24 -0700
Yi Liu <yi.l.liu@intel.com> wrote:

> This adds ioctl for userspace to attach device cdev fd to and detach
> from IOAS/hw_pagetable managed by iommufd.
> 
>     VFIO_DEVICE_ATTACH_IOMMUFD_PT: attach vfio device to IOAS, hw_pagetable
> 				   managed by iommufd. Attach can be
> 				   undo by VFIO_DEVICE_DETACH_IOMMUFD_PT
> 				   or device fd close.
>     VFIO_DEVICE_DETACH_IOMMUFD_PT: detach vfio device from the current attached
> 				   IOAS or hw_pagetable managed by iommufd.
> 
> Tested-by: Yanting Jiang <yanting.jiang@intel.com>
> Tested-by: Shameer Kolothum <shameerali.kolothum.thodi@huawei.com>
> Signed-off-by: Yi Liu <yi.l.liu@intel.com>
> ---
>  drivers/vfio/device_cdev.c | 66 ++++++++++++++++++++++++++++++++++++++
>  drivers/vfio/iommufd.c     | 18 +++++++++++
>  drivers/vfio/vfio.h        | 18 +++++++++++
>  drivers/vfio/vfio_main.c   |  8 +++++
>  include/uapi/linux/vfio.h  | 52 ++++++++++++++++++++++++++++++
>  5 files changed, 162 insertions(+)
> 
> diff --git a/drivers/vfio/device_cdev.c b/drivers/vfio/device_cdev.c
> index 291cc678a18b..3f14edb80a93 100644
> --- a/drivers/vfio/device_cdev.c
> +++ b/drivers/vfio/device_cdev.c
> @@ -174,6 +174,72 @@ long vfio_device_ioctl_bind_iommufd(struct vfio_device_file *df,
>  	return ret;
>  }
>  
> +int vfio_ioctl_device_attach(struct vfio_device_file *df,
> +			     struct vfio_device_attach_iommufd_pt __user *arg)
> +{
> +	struct vfio_device *device = df->device;
> +	struct vfio_device_attach_iommufd_pt attach;
> +	unsigned long minsz;
> +	int ret;
> +
> +	minsz = offsetofend(struct vfio_device_attach_iommufd_pt, pt_id);
> +
> +	if (copy_from_user(&attach, arg, minsz))
> +		return -EFAULT;
> +
> +	if (attach.argsz < minsz || attach.flags)
> +		return -EINVAL;
> +
> +	/* ATTACH only allowed for cdev fds */
> +	if (df->group)
> +		return -EINVAL;
> +
> +	mutex_lock(&device->dev_set->lock);
> +	ret = vfio_iommufd_attach(device, &attach.pt_id);
> +	if (ret)
> +		goto out_unlock;
> +
> +	ret = copy_to_user(&arg->pt_id, &attach.pt_id,
> +			   sizeof(attach.pt_id)) ? -EFAULT : 0;
> +	if (ret)
> +		goto out_detach;
> +	mutex_unlock(&device->dev_set->lock);
> +
> +	return 0;
> +
> +out_detach:
> +	vfio_iommufd_detach(device);
> +out_unlock:
> +	mutex_unlock(&device->dev_set->lock);
> +	return ret;
> +}
> +
> +int vfio_ioctl_device_detach(struct vfio_device_file *df,
> +			     struct vfio_device_detach_iommufd_pt __user *arg)
> +{
> +	struct vfio_device *device = df->device;
> +	struct vfio_device_detach_iommufd_pt detach;
> +	unsigned long minsz;
> +
> +	minsz = offsetofend(struct vfio_device_detach_iommufd_pt, flags);
> +
> +	if (copy_from_user(&detach, arg, minsz))
> +		return -EFAULT;
> +
> +	if (detach.argsz < minsz || detach.flags)
> +		return -EINVAL;
> +
> +	/* DETACH only allowed for cdev fds */
> +	if (df->group)
> +		return -EINVAL;
> +
> +	mutex_lock(&device->dev_set->lock);
> +	vfio_iommufd_detach(device);
> +	mutex_unlock(&device->dev_set->lock);
> +
> +	return 0;
> +}
> +
>  static char *vfio_device_devnode(const struct device *dev, umode_t *mode)
>  {
>  	return kasprintf(GFP_KERNEL, "vfio/devices/%s", dev_name(dev));
> diff --git a/drivers/vfio/iommufd.c b/drivers/vfio/iommufd.c
> index 83575b65ea01..799ea322a7d4 100644
> --- a/drivers/vfio/iommufd.c
> +++ b/drivers/vfio/iommufd.c
> @@ -112,6 +112,24 @@ void vfio_iommufd_unbind(struct vfio_device_file *df)
>  		vdev->ops->unbind_iommufd(vdev);
>  }
>  
> +int vfio_iommufd_attach(struct vfio_device *vdev, u32 *pt_id)
> +{
> +	lockdep_assert_held(&vdev->dev_set->lock);
> +
> +	if (vfio_device_is_noiommu(vdev))
> +		return 0;

Isn't this an invalid operation for a noiommu cdev, ie. -EINVAL?  We
return success and copy back the provided pt_id, why would a user not
consider it a bug that they can't use whatever value was there with
iommufd?
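
To make the concern concrete, here is a small self-contained model (not
kernel code) contrasting the behaviour as posted with the stricter behaviour
the question above points toward. struct mock_vdev and both functions are
hypothetical stand-ins; only the noiommu early return is taken from the
patch, and the -EINVAL return is the value suggested in the question.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for the parts of struct vfio_device that matter. */
struct mock_vdev {
	bool noiommu;		/* models vfio_device_is_noiommu() */
	bool attached;		/* set once an attach really happened */
	uint32_t hwpt_id;	/* id the mock "kernel" reports back */
};

/* Models the v11 patch: noiommu returns success without doing anything. */
int mock_attach_v11(struct mock_vdev *vdev, uint32_t *pt_id)
{
	if (vdev->noiommu)
		return 0;	/* *pt_id is echoed back unchanged */

	vdev->attached = true;
	*pt_id = vdev->hwpt_id;
	return 0;
}

/* Models the alternative raised in review: reject attach for noiommu. */
int mock_attach_strict(struct mock_vdev *vdev, uint32_t *pt_id)
{
	if (vdev->noiommu)
		return -EINVAL;

	vdev->attached = true;
	*pt_id = vdev->hwpt_id;
	return 0;
}
```

With the v11 model, a caller sees 0 and an unchanged pt_id even though no
association exists; with the strict model, the caller gets an error it can
act on.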

> +
> +	return vdev->ops->attach_ioas(vdev, pt_id);
> +}
> +
> +void vfio_iommufd_detach(struct vfio_device *vdev)
> +{
> +	lockdep_assert_held(&vdev->dev_set->lock);
> +
> +	if (!vfio_device_is_noiommu(vdev))
> +		vdev->ops->detach_ioas(vdev);
> +}
> +
>  struct iommufd_ctx *vfio_iommufd_physical_ictx(struct vfio_device *vdev)
>  {
>  	if (vdev->iommufd_device)
> diff --git a/drivers/vfio/vfio.h b/drivers/vfio/vfio.h
> index 8b359a7794be..50553f67600f 100644
> --- a/drivers/vfio/vfio.h
> +++ b/drivers/vfio/vfio.h
> @@ -241,6 +241,8 @@ int vfio_iommufd_bind(struct vfio_device_file *df);
>  void vfio_iommufd_unbind(struct vfio_device_file *df);
>  int vfio_iommufd_compat_attach_ioas(struct vfio_device *device,
>  				    struct iommufd_ctx *ictx);
> +int vfio_iommufd_attach(struct vfio_device *vdev, u32 *pt_id);
> +void vfio_iommufd_detach(struct vfio_device *vdev);
>  #else
>  static inline int
>  vfio_iommufd_compat_probe_noiommu(struct vfio_device *device,
> @@ -282,6 +284,10 @@ int vfio_device_fops_cdev_open(struct inode *inode, struct file *filep);
>  void vfio_device_cdev_close(struct vfio_device_file *df);
>  long vfio_device_ioctl_bind_iommufd(struct vfio_device_file *df,
>  				    struct vfio_device_bind_iommufd __user *arg);
> +int vfio_ioctl_device_attach(struct vfio_device_file *df,
> +			     struct vfio_device_attach_iommufd_pt __user *arg);
> +int vfio_ioctl_device_detach(struct vfio_device_file *df,
> +			     struct vfio_device_detach_iommufd_pt __user *arg);
>  int vfio_cdev_init(struct class *device_class);
>  void vfio_cdev_cleanup(void);
>  #else
> @@ -315,6 +321,18 @@ static inline long vfio_device_ioctl_bind_iommufd(struct vfio_device_file *df,
>  	return -EOPNOTSUPP;
>  }
>  
> +static inline int vfio_ioctl_device_attach(struct vfio_device_file *df,
> +					   struct vfio_device_attach_iommufd_pt __user *arg)
> +{
> +	return -EOPNOTSUPP;
> +}
> +
> +static inline int vfio_ioctl_device_detach(struct vfio_device_file *df,
> +					   struct vfio_device_detach_iommufd_pt __user *arg)
> +{
> +	return -EOPNOTSUPP;
> +}
> +
>  static inline int vfio_cdev_init(struct class *device_class)
>  {
>  	return 0;
> diff --git a/drivers/vfio/vfio_main.c b/drivers/vfio/vfio_main.c
> index c9fa39ac4b02..8c3f26b4929b 100644
> --- a/drivers/vfio/vfio_main.c
> +++ b/drivers/vfio/vfio_main.c
> @@ -1165,6 +1165,14 @@ static long vfio_device_fops_unl_ioctl(struct file *filep,
>  		ret = vfio_ioctl_device_feature(device, (void __user *)arg);
>  		break;
>  
> +	case VFIO_DEVICE_ATTACH_IOMMUFD_PT:
> +		ret = vfio_ioctl_device_attach(df, (void __user *)arg);
> +		break;
> +
> +	case VFIO_DEVICE_DETACH_IOMMUFD_PT:
> +		ret = vfio_ioctl_device_detach(df, (void __user *)arg);
> +		break;
> +
>  	default:
>  		if (unlikely(!device->ops->ioctl))
>  			ret = -EINVAL;
> diff --git a/include/uapi/linux/vfio.h b/include/uapi/linux/vfio.h
> index 07c917de31e9..770f5f949929 100644
> --- a/include/uapi/linux/vfio.h
> +++ b/include/uapi/linux/vfio.h
> @@ -222,6 +222,58 @@ struct vfio_device_bind_iommufd {
>  
>  #define VFIO_DEVICE_BIND_IOMMUFD	_IO(VFIO_TYPE, VFIO_BASE + 19)
>  
> +/*
> + * VFIO_DEVICE_ATTACH_IOMMUFD_PT - _IOW(VFIO_TYPE, VFIO_BASE + 20,
> + *					struct vfio_device_attach_iommufd_pt)
> + *
> + * Attach a vfio device to an iommufd address space specified by IOAS
> + * id or hw_pagetable (hwpt) id.
> + *
> + * Available only after a device has been bound to iommufd via
> + * VFIO_DEVICE_BIND_IOMMUFD
> + *
> + * Undo by VFIO_DEVICE_DETACH_IOMMUFD_PT or device fd close.
> + *
> + * @argsz:	User filled size of this data.
> + * @flags:	Must be 0.
> + * @pt_id:	Input the target id which can represent an ioas or a hwpt
> + *		allocated via iommufd subsystem.
> + *		Output the input ioas id or the attached hwpt id which could
> + *		be the specified hwpt itself or a hwpt automatically created
> + *		for the specified ioas by kernel during the attachment.
> + *
> + * Return: 0 on success, -errno on failure.
> + */
> +struct vfio_device_attach_iommufd_pt {
> +	__u32	argsz;
> +	__u32	flags;
> +	__u32	pt_id;
> +};
> +
> +#define VFIO_DEVICE_ATTACH_IOMMUFD_PT		_IO(VFIO_TYPE, VFIO_BASE + 20)
> +
> +/*
> + * VFIO_DEVICE_DETACH_IOMMUFD_PT - _IOW(VFIO_TYPE, VFIO_BASE + 21,
> + *					struct vfio_device_detach_iommufd_pt)
> + *
> + * Detach a vfio device from the iommufd address space it has been
> + * attached to. After it, device should be in a blocking DMA state.
> + *
> + * Available only after a device has been bound to iommufd via
> + * VFIO_DEVICE_BIND_IOMMUFD.

These "[a]vailable only after" comments are meaningless; if the user
has the file descriptor, the ioctl is available.  We can say that ATTACH
should be used after BIND to associate the device with an address space
within the bound iommufd and DETACH removes that association, but the
user is welcome to call everything in the wrong order and we need to be
prepared for that anyway.  Thanks,

Alex
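
The ordering point above can be illustrated with a small self-contained
model (not kernel code). Every ioctl is callable as soon as the fd exists,
so each handler has to check state for itself rather than rely on the
caller. struct mock_df and the mock_ functions are hypothetical, and the
choice of -EINVAL for out-of-order calls is an assumption of this sketch,
not something the patch specifies. Re-attach handling is not modelled.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* Hypothetical per-fd state; the real series tracks this differently. */
struct mock_df {
	bool bound;	/* BIND_IOMMUFD has succeeded on this fd */
	bool attached;	/* ATTACH_IOMMUFD_PT has succeeded on this fd */
};

int mock_bind(struct mock_df *df)
{
	if (df->bound)
		return -EINVAL;	/* a second bind on the same fd is refused */
	df->bound = true;
	return 0;
}

int mock_attach(struct mock_df *df)
{
	if (!df->bound)
		return -EINVAL;	/* called out of order: fail, do not crash */
	df->attached = true;
	return 0;
}

int mock_detach(struct mock_df *df)
{
	if (!df->bound)
		return -EINVAL;
	df->attached = false;	/* harmless if nothing was attached */
	return 0;
}
```

In this model the documentation would describe ATTACH as something to use
after BIND, while the code still tolerates any call order.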

> + *
> + * @argsz:	User filled size of this data.
> + * @flags:	Must be 0.
> + *
> + * Return: 0 on success, -errno on failure.
> + */
> +struct vfio_device_detach_iommufd_pt {
> +	__u32	argsz;
> +	__u32	flags;
> +};
> +
> +#define VFIO_DEVICE_DETACH_IOMMUFD_PT		_IO(VFIO_TYPE, VFIO_BASE + 21)
> +
>  /**
>   * VFIO_DEVICE_GET_INFO - _IOR(VFIO_TYPE, VFIO_BASE + 7,
>   *						struct vfio_device_info)

Yi Liu May 23, 2023, 1:20 a.m. UTC | #2

> From: Alex Williamson <alex.williamson@redhat.com>
> Sent: Tuesday, May 23, 2023 6:16 AM
> 
> On Sat, 13 May 2023 06:28:24 -0700
> Yi Liu <yi.l.liu@intel.com> wrote:
> 
> > This adds ioctl for userspace to attach device cdev fd to and detach
> > from IOAS/hw_pagetable managed by iommufd.
> >
> >     VFIO_DEVICE_ATTACH_IOMMUFD_PT: attach vfio device to IOAS, hw_pagetable
> > 				   managed by iommufd. Attach can be
> > 				   undo by VFIO_DEVICE_DETACH_IOMMUFD_PT
> > 				   or device fd close.
> >     VFIO_DEVICE_DETACH_IOMMUFD_PT: detach vfio device from the current attached
> > 				   IOAS or hw_pagetable managed by iommufd.
> >
> > Tested-by: Yanting Jiang <yanting.jiang@intel.com>
> > Tested-by: Shameer Kolothum <shameerali.kolothum.thodi@huawei.com>
> > Signed-off-by: Yi Liu <yi.l.liu@intel.com>
> > ---
> >  drivers/vfio/device_cdev.c | 66 ++++++++++++++++++++++++++++++++++++++
> >  drivers/vfio/iommufd.c     | 18 +++++++++++
> >  drivers/vfio/vfio.h        | 18 +++++++++++
> >  drivers/vfio/vfio_main.c   |  8 +++++
> >  include/uapi/linux/vfio.h  | 52 ++++++++++++++++++++++++++++++
> >  5 files changed, 162 insertions(+)
> >
> > diff --git a/drivers/vfio/device_cdev.c b/drivers/vfio/device_cdev.c
> > index 291cc678a18b..3f14edb80a93 100644
> > --- a/drivers/vfio/device_cdev.c
> > +++ b/drivers/vfio/device_cdev.c
> > @@ -174,6 +174,72 @@ long vfio_device_ioctl_bind_iommufd(struct vfio_device_file *df,
> >  	return ret;
> >  }
> >
> > +int vfio_ioctl_device_attach(struct vfio_device_file *df,
> > +			     struct vfio_device_attach_iommufd_pt __user *arg)
> > +{
> > +	struct vfio_device *device = df->device;
> > +	struct vfio_device_attach_iommufd_pt attach;
> > +	unsigned long minsz;
> > +	int ret;
> > +
> > +	minsz = offsetofend(struct vfio_device_attach_iommufd_pt, pt_id);
> > +
> > +	if (copy_from_user(&attach, arg, minsz))
> > +		return -EFAULT;
> > +
> > +	if (attach.argsz < minsz || attach.flags)
> > +		return -EINVAL;
> > +
> > +	/* ATTACH only allowed for cdev fds */
> > +	if (df->group)
> > +		return -EINVAL;
> > +
> > +	mutex_lock(&device->dev_set->lock);
> > +	ret = vfio_iommufd_attach(device, &attach.pt_id);
> > +	if (ret)
> > +		goto out_unlock;
> > +
> > +	ret = copy_to_user(&arg->pt_id, &attach.pt_id,
> > +			   sizeof(attach.pt_id)) ? -EFAULT : 0;
> > +	if (ret)
> > +		goto out_detach;
> > +	mutex_unlock(&device->dev_set->lock);
> > +
> > +	return 0;
> > +
> > +out_detach:
> > +	vfio_iommufd_detach(device);
> > +out_unlock:
> > +	mutex_unlock(&device->dev_set->lock);
> > +	return ret;
> > +}
> > +
> > +int vfio_ioctl_device_detach(struct vfio_device_file *df,
> > +			     struct vfio_device_detach_iommufd_pt __user *arg)
> > +{
> > +	struct vfio_device *device = df->device;
> > +	struct vfio_device_detach_iommufd_pt detach;
> > +	unsigned long minsz;
> > +
> > +	minsz = offsetofend(struct vfio_device_detach_iommufd_pt, flags);
> > +
> > +	if (copy_from_user(&detach, arg, minsz))
> > +		return -EFAULT;
> > +
> > +	if (detach.argsz < minsz || detach.flags)
> > +		return -EINVAL;
> > +
> > +	/* DETACH only allowed for cdev fds */
> > +	if (df->group)
> > +		return -EINVAL;
> > +
> > +	mutex_lock(&device->dev_set->lock);
> > +	vfio_iommufd_detach(device);
> > +	mutex_unlock(&device->dev_set->lock);
> > +
> > +	return 0;
> > +}
> > +
> >  static char *vfio_device_devnode(const struct device *dev, umode_t *mode)
> >  {
> >  	return kasprintf(GFP_KERNEL, "vfio/devices/%s", dev_name(dev));
> > diff --git a/drivers/vfio/iommufd.c b/drivers/vfio/iommufd.c
> > index 83575b65ea01..799ea322a7d4 100644
> > --- a/drivers/vfio/iommufd.c
> > +++ b/drivers/vfio/iommufd.c
> > @@ -112,6 +112,24 @@ void vfio_iommufd_unbind(struct vfio_device_file *df)
> >  		vdev->ops->unbind_iommufd(vdev);
> >  }
> >
> > +int vfio_iommufd_attach(struct vfio_device *vdev, u32 *pt_id)
> > +{
> > +	lockdep_assert_held(&vdev->dev_set->lock);
> > +
> > +	if (vfio_device_is_noiommu(vdev))
> > +		return 0;
> 
> Isn't this an invalid operation for a noiommu cdev, ie. -EINVAL?  We
> return success and copy back the provided pt_id, why would a user not
> consider it a bug that they can't use whatever value was there with
> iommufd?

Yes, this is the question I asked in [1]. At that time, it appeared to me
that it was better to allow it [2]. Maybe it's more suitable to ask it here.

[1] https://lore.kernel.org/kvm/c203f11f-4d9f-cf43-03ab-e41a858bdd92@intel.com/
[2] https://lore.kernel.org/kvm/ZFFUyhqID+LtUB%2FD@nvidia.com/

> > +
> > +	return vdev->ops->attach_ioas(vdev, pt_id);
> > +}
> > +
> > +void vfio_iommufd_detach(struct vfio_device *vdev)
> > +{
> > +	lockdep_assert_held(&vdev->dev_set->lock);
> > +
> > +	if (!vfio_device_is_noiommu(vdev))
> > +		vdev->ops->detach_ioas(vdev);
> > +}
> > +
> >  struct iommufd_ctx *vfio_iommufd_physical_ictx(struct vfio_device *vdev)
> >  {
> >  	if (vdev->iommufd_device)
> > diff --git a/drivers/vfio/vfio.h b/drivers/vfio/vfio.h
> > index 8b359a7794be..50553f67600f 100644
> > --- a/drivers/vfio/vfio.h
> > +++ b/drivers/vfio/vfio.h
> > @@ -241,6 +241,8 @@ int vfio_iommufd_bind(struct vfio_device_file *df);
> >  void vfio_iommufd_unbind(struct vfio_device_file *df);
> >  int vfio_iommufd_compat_attach_ioas(struct vfio_device *device,
> >  				    struct iommufd_ctx *ictx);
> > +int vfio_iommufd_attach(struct vfio_device *vdev, u32 *pt_id);
> > +void vfio_iommufd_detach(struct vfio_device *vdev);
> >  #else
> >  static inline int
> >  vfio_iommufd_compat_probe_noiommu(struct vfio_device *device,
> > @@ -282,6 +284,10 @@ int vfio_device_fops_cdev_open(struct inode *inode, struct file *filep);
> >  void vfio_device_cdev_close(struct vfio_device_file *df);
> >  long vfio_device_ioctl_bind_iommufd(struct vfio_device_file *df,
> >  				    struct vfio_device_bind_iommufd __user *arg);
> > +int vfio_ioctl_device_attach(struct vfio_device_file *df,
> > +			     struct vfio_device_attach_iommufd_pt __user *arg);
> > +int vfio_ioctl_device_detach(struct vfio_device_file *df,
> > +			     struct vfio_device_detach_iommufd_pt __user *arg);
> >  int vfio_cdev_init(struct class *device_class);
> >  void vfio_cdev_cleanup(void);
> >  #else
> > @@ -315,6 +321,18 @@ static inline long vfio_device_ioctl_bind_iommufd(struct vfio_device_file *df,
> >  	return -EOPNOTSUPP;
> >  }
> >
> > +static inline int vfio_ioctl_device_attach(struct vfio_device_file *df,
> > +					   struct vfio_device_attach_iommufd_pt __user *arg)
> > +{
> > +	return -EOPNOTSUPP;
> > +}
> > +
> > +static inline int vfio_ioctl_device_detach(struct vfio_device_file *df,
> > +					   struct vfio_device_detach_iommufd_pt __user *arg)
> > +{
> > +	return -EOPNOTSUPP;
> > +}
> > +
> >  static inline int vfio_cdev_init(struct class *device_class)
> >  {
> >  	return 0;
> > diff --git a/drivers/vfio/vfio_main.c b/drivers/vfio/vfio_main.c
> > index c9fa39ac4b02..8c3f26b4929b 100644
> > --- a/drivers/vfio/vfio_main.c
> > +++ b/drivers/vfio/vfio_main.c
> > @@ -1165,6 +1165,14 @@ static long vfio_device_fops_unl_ioctl(struct file *filep,
> >  		ret = vfio_ioctl_device_feature(device, (void __user *)arg);
> >  		break;
> >
> > +	case VFIO_DEVICE_ATTACH_IOMMUFD_PT:
> > +		ret = vfio_ioctl_device_attach(df, (void __user *)arg);
> > +		break;
> > +
> > +	case VFIO_DEVICE_DETACH_IOMMUFD_PT:
> > +		ret = vfio_ioctl_device_detach(df, (void __user *)arg);
> > +		break;
> > +
> >  	default:
> >  		if (unlikely(!device->ops->ioctl))
> >  			ret = -EINVAL;
> > diff --git a/include/uapi/linux/vfio.h b/include/uapi/linux/vfio.h
> > index 07c917de31e9..770f5f949929 100644
> > --- a/include/uapi/linux/vfio.h
> > +++ b/include/uapi/linux/vfio.h
> > @@ -222,6 +222,58 @@ struct vfio_device_bind_iommufd {
> >
> >  #define VFIO_DEVICE_BIND_IOMMUFD	_IO(VFIO_TYPE, VFIO_BASE + 19)
> >
> > +/*
> > + * VFIO_DEVICE_ATTACH_IOMMUFD_PT - _IOW(VFIO_TYPE, VFIO_BASE + 20,
> > + *					struct vfio_device_attach_iommufd_pt)
> > + *
> > + * Attach a vfio device to an iommufd address space specified by IOAS
> > + * id or hw_pagetable (hwpt) id.
> > + *
> > + * Available only after a device has been bound to iommufd via
> > + * VFIO_DEVICE_BIND_IOMMUFD
> > + *
> > + * Undo by VFIO_DEVICE_DETACH_IOMMUFD_PT or device fd close.
> > + *
> > + * @argsz:	User filled size of this data.
> > + * @flags:	Must be 0.
> > + * @pt_id:	Input the target id which can represent an ioas or a hwpt
> > + *		allocated via iommufd subsystem.
> > + *		Output the input ioas id or the attached hwpt id which could
> > + *		be the specified hwpt itself or a hwpt automatically created
> > + *		for the specified ioas by kernel during the attachment.
> > + *
> > + * Return: 0 on success, -errno on failure.
> > + */
> > +struct vfio_device_attach_iommufd_pt {
> > +	__u32	argsz;
> > +	__u32	flags;
> > +	__u32	pt_id;
> > +};
> > +
> > +#define VFIO_DEVICE_ATTACH_IOMMUFD_PT		_IO(VFIO_TYPE, VFIO_BASE + 20)
> > +
> > +/*
> > + * VFIO_DEVICE_DETACH_IOMMUFD_PT - _IOW(VFIO_TYPE, VFIO_BASE + 21,
> > + *					struct vfio_device_detach_iommufd_pt)
> > + *
> > + * Detach a vfio device from the iommufd address space it has been
> > + * attached to. After it, device should be in a blocking DMA state.
> > + *
> > + * Available only after a device has been bound to iommufd via
> > + * VFIO_DEVICE_BIND_IOMMUFD.
> 
> These "[a]vailable only after" comments are meaningless, if the user
> has the file descriptor the ioctl is available.  We can say that ATTACH
> should be used after BIND to associate the device with an address space
> within the bound iommufd and DETACH removes that association, but the
> user is welcome to call everything in the wrong order and we need to be
> prepared for that anyway.  Thanks,

Oh, yes. It's available as long as the user has the fd, but it is expected
to fail if the ordering is not followed. That is what the comment really
means to convey. I will have a look at the other ioctls as well.

Regards,
Yi Liu

> 
> Alex
> 
> > + *
> > + * @argsz:	User filled size of this data.
> > + * @flags:	Must be 0.
> > + *
> > + * Return: 0 on success, -errno on failure.
> > + */
> > +struct vfio_device_detach_iommufd_pt {
> > +	__u32	argsz;
> > +	__u32	flags;
> > +};
> > +
> > +#define VFIO_DEVICE_DETACH_IOMMUFD_PT		_IO(VFIO_TYPE, VFIO_BASE + 21)
> > +
> >  /**
> >   * VFIO_DEVICE_GET_INFO - _IOR(VFIO_TYPE, VFIO_BASE + 7,
> >   *						struct vfio_device_info)

Alex Williamson May 23, 2023, 3:50 p.m. UTC | #3

On Tue, 23 May 2023 01:20:17 +0000
"Liu, Yi L" <yi.l.liu@intel.com> wrote:

> > From: Alex Williamson <alex.williamson@redhat.com>
> > Sent: Tuesday, May 23, 2023 6:16 AM
> > 
> > On Sat, 13 May 2023 06:28:24 -0700
> > Yi Liu <yi.l.liu@intel.com> wrote:
> >   
> > > This adds ioctl for userspace to attach device cdev fd to and detach
> > > from IOAS/hw_pagetable managed by iommufd.
> > >
> > >     VFIO_DEVICE_ATTACH_IOMMUFD_PT: attach vfio device to IOAS, hw_pagetable
> > > 				   managed by iommufd. Attach can be
> > > 				   undo by VFIO_DEVICE_DETACH_IOMMUFD_PT
> > > 				   or device fd close.
> > >     VFIO_DEVICE_DETACH_IOMMUFD_PT: detach vfio device from the current attached
> > > 				   IOAS or hw_pagetable managed by iommufd.
> > >
> > > Tested-by: Yanting Jiang <yanting.jiang@intel.com>
> > > Tested-by: Shameer Kolothum <shameerali.kolothum.thodi@huawei.com>
> > > Signed-off-by: Yi Liu <yi.l.liu@intel.com>
> > > ---
> > >  drivers/vfio/device_cdev.c | 66 ++++++++++++++++++++++++++++++++++++++
> > >  drivers/vfio/iommufd.c     | 18 +++++++++++
> > >  drivers/vfio/vfio.h        | 18 +++++++++++
> > >  drivers/vfio/vfio_main.c   |  8 +++++
> > >  include/uapi/linux/vfio.h  | 52 ++++++++++++++++++++++++++++++
> > >  5 files changed, 162 insertions(+)
> > >
> > > diff --git a/drivers/vfio/device_cdev.c b/drivers/vfio/device_cdev.c
> > > index 291cc678a18b..3f14edb80a93 100644
> > > --- a/drivers/vfio/device_cdev.c
> > > +++ b/drivers/vfio/device_cdev.c
> > > @@ -174,6 +174,72 @@ long vfio_device_ioctl_bind_iommufd(struct vfio_device_file *df,
> > >  	return ret;
> > >  }
> > >
> > > +int vfio_ioctl_device_attach(struct vfio_device_file *df,
> > > +			     struct vfio_device_attach_iommufd_pt __user *arg)
> > > +{
> > > +	struct vfio_device *device = df->device;
> > > +	struct vfio_device_attach_iommufd_pt attach;
> > > +	unsigned long minsz;
> > > +	int ret;
> > > +
> > > +	minsz = offsetofend(struct vfio_device_attach_iommufd_pt, pt_id);
> > > +
> > > +	if (copy_from_user(&attach, arg, minsz))
> > > +		return -EFAULT;
> > > +
> > > +	if (attach.argsz < minsz || attach.flags)
> > > +		return -EINVAL;
> > > +
> > > +	/* ATTACH only allowed for cdev fds */
> > > +	if (df->group)
> > > +		return -EINVAL;
> > > +
> > > +	mutex_lock(&device->dev_set->lock);
> > > +	ret = vfio_iommufd_attach(device, &attach.pt_id);
> > > +	if (ret)
> > > +		goto out_unlock;
> > > +
> > > +	ret = copy_to_user(&arg->pt_id, &attach.pt_id,
> > > +			   sizeof(attach.pt_id)) ? -EFAULT : 0;
> > > +	if (ret)
> > > +		goto out_detach;
> > > +	mutex_unlock(&device->dev_set->lock);
> > > +
> > > +	return 0;
> > > +
> > > +out_detach:
> > > +	vfio_iommufd_detach(device);
> > > +out_unlock:
> > > +	mutex_unlock(&device->dev_set->lock);
> > > +	return ret;
> > > +}
> > > +
> > > +int vfio_ioctl_device_detach(struct vfio_device_file *df,
> > > +			     struct vfio_device_detach_iommufd_pt __user *arg)
> > > +{
> > > +	struct vfio_device *device = df->device;
> > > +	struct vfio_device_detach_iommufd_pt detach;
> > > +	unsigned long minsz;
> > > +
> > > +	minsz = offsetofend(struct vfio_device_detach_iommufd_pt, flags);
> > > +
> > > +	if (copy_from_user(&detach, arg, minsz))
> > > +		return -EFAULT;
> > > +
> > > +	if (detach.argsz < minsz || detach.flags)
> > > +		return -EINVAL;
> > > +
> > > +	/* DETACH only allowed for cdev fds */
> > > +	if (df->group)
> > > +		return -EINVAL;
> > > +
> > > +	mutex_lock(&device->dev_set->lock);
> > > +	vfio_iommufd_detach(device);
> > > +	mutex_unlock(&device->dev_set->lock);
> > > +
> > > +	return 0;
> > > +}
> > > +
> > >  static char *vfio_device_devnode(const struct device *dev, umode_t *mode)
> > >  {
> > >  	return kasprintf(GFP_KERNEL, "vfio/devices/%s", dev_name(dev));
> > > diff --git a/drivers/vfio/iommufd.c b/drivers/vfio/iommufd.c
> > > index 83575b65ea01..799ea322a7d4 100644
> > > --- a/drivers/vfio/iommufd.c
> > > +++ b/drivers/vfio/iommufd.c
> > > @@ -112,6 +112,24 @@ void vfio_iommufd_unbind(struct vfio_device_file *df)
> > >  		vdev->ops->unbind_iommufd(vdev);
> > >  }
> > >
> > > +int vfio_iommufd_attach(struct vfio_device *vdev, u32 *pt_id)
> > > +{
> > > +	lockdep_assert_held(&vdev->dev_set->lock);
> > > +
> > > +	if (vfio_device_is_noiommu(vdev))
> > > +		return 0;  
> > 
> > Isn't this an invalid operation for a noiommu cdev, ie. -EINVAL?  We
> > return success and copy back the provided pt_id, why would a user not
> > consider it a bug that they can't use whatever value was there with
> > iommufd?  
> 
> Yes, this is the question I asked in [1]. At that time, it appears to me
> that better to allow it [2]. Maybe it's more suitable to ask it here.

From an API perspective it seems wrong.  We return success without
doing anything.  A user would be right to consider it a bug that the
attach operation works but there's not actually any association to the
IOAS.  Thanks,

Alex


> [1] https://lore.kernel.org/kvm/c203f11f-4d9f-cf43-03ab-e41a858bdd92@intel.com/
> [2] https://lore.kernel.org/kvm/ZFFUyhqID+LtUB%2FD@nvidia.com/
> 
> > > +
> > > +	return vdev->ops->attach_ioas(vdev, pt_id);
> > > +}
> > > +
> > > +void vfio_iommufd_detach(struct vfio_device *vdev)
> > > +{
> > > +	lockdep_assert_held(&vdev->dev_set->lock);
> > > +
> > > +	if (!vfio_device_is_noiommu(vdev))
> > > +		vdev->ops->detach_ioas(vdev);
> > > +}
> > > +
> > >  struct iommufd_ctx *vfio_iommufd_physical_ictx(struct vfio_device *vdev)
> > >  {
> > >  	if (vdev->iommufd_device)
> > > diff --git a/drivers/vfio/vfio.h b/drivers/vfio/vfio.h
> > > index 8b359a7794be..50553f67600f 100644
> > > --- a/drivers/vfio/vfio.h
> > > +++ b/drivers/vfio/vfio.h
> > > @@ -241,6 +241,8 @@ int vfio_iommufd_bind(struct vfio_device_file *df);
> > >  void vfio_iommufd_unbind(struct vfio_device_file *df);
> > >  int vfio_iommufd_compat_attach_ioas(struct vfio_device *device,
> > >  				    struct iommufd_ctx *ictx);
> > > +int vfio_iommufd_attach(struct vfio_device *vdev, u32 *pt_id);
> > > +void vfio_iommufd_detach(struct vfio_device *vdev);
> > >  #else
> > >  static inline int
> > >  vfio_iommufd_compat_probe_noiommu(struct vfio_device *device,
> > > @@ -282,6 +284,10 @@ int vfio_device_fops_cdev_open(struct inode *inode, struct file *filep);
> > >  void vfio_device_cdev_close(struct vfio_device_file *df);
> > >  long vfio_device_ioctl_bind_iommufd(struct vfio_device_file *df,
> > >  				    struct vfio_device_bind_iommufd __user *arg);
> > > +int vfio_ioctl_device_attach(struct vfio_device_file *df,
> > > +			     struct vfio_device_attach_iommufd_pt __user *arg);
> > > +int vfio_ioctl_device_detach(struct vfio_device_file *df,
> > > +			     struct vfio_device_detach_iommufd_pt __user *arg);
> > >  int vfio_cdev_init(struct class *device_class);
> > >  void vfio_cdev_cleanup(void);
> > >  #else
> > > @@ -315,6 +321,18 @@ static inline long vfio_device_ioctl_bind_iommufd(struct vfio_device_file *df,
> > >  	return -EOPNOTSUPP;
> > >  }
> > >
> > > +static inline int vfio_ioctl_device_attach(struct vfio_device_file *df,
> > > +					   struct vfio_device_attach_iommufd_pt __user *arg)
> > > +{
> > > +	return -EOPNOTSUPP;
> > > +}
> > > +
> > > +static inline int vfio_ioctl_device_detach(struct vfio_device_file *df,
> > > +					   struct vfio_device_detach_iommufd_pt __user *arg)
> > > +{
> > > +	return -EOPNOTSUPP;
> > > +}
> > > +
> > >  static inline int vfio_cdev_init(struct class *device_class)
> > >  {
> > >  	return 0;
> > > diff --git a/drivers/vfio/vfio_main.c b/drivers/vfio/vfio_main.c
> > > index c9fa39ac4b02..8c3f26b4929b 100644
> > > --- a/drivers/vfio/vfio_main.c
> > > +++ b/drivers/vfio/vfio_main.c
> > > @@ -1165,6 +1165,14 @@ static long vfio_device_fops_unl_ioctl(struct file *filep,
> > >  		ret = vfio_ioctl_device_feature(device, (void __user *)arg);
> > >  		break;
> > >
> > > +	case VFIO_DEVICE_ATTACH_IOMMUFD_PT:
> > > +		ret = vfio_ioctl_device_attach(df, (void __user *)arg);
> > > +		break;
> > > +
> > > +	case VFIO_DEVICE_DETACH_IOMMUFD_PT:
> > > +		ret = vfio_ioctl_device_detach(df, (void __user *)arg);
> > > +		break;
> > > +
> > >  	default:
> > >  		if (unlikely(!device->ops->ioctl))
> > >  			ret = -EINVAL;
> > > diff --git a/include/uapi/linux/vfio.h b/include/uapi/linux/vfio.h
> > > index 07c917de31e9..770f5f949929 100644
> > > --- a/include/uapi/linux/vfio.h
> > > +++ b/include/uapi/linux/vfio.h
> > > @@ -222,6 +222,58 @@ struct vfio_device_bind_iommufd {
> > >
> > >  #define VFIO_DEVICE_BIND_IOMMUFD	_IO(VFIO_TYPE, VFIO_BASE + 19)
> > >
> > > +/*
> > > + * VFIO_DEVICE_ATTACH_IOMMUFD_PT - _IOW(VFIO_TYPE, VFIO_BASE + 20,
> > > + *					struct vfio_device_attach_iommufd_pt)
> > > + *
> > > + * Attach a vfio device to an iommufd address space specified by IOAS
> > > + * id or hw_pagetable (hwpt) id.
> > > + *
> > > + * Available only after a device has been bound to iommufd via
> > > + * VFIO_DEVICE_BIND_IOMMUFD
> > > + *
> > > + * Undo by VFIO_DEVICE_DETACH_IOMMUFD_PT or device fd close.
> > > + *
> > > + * @argsz:	User filled size of this data.
> > > + * @flags:	Must be 0.
> > > + * @pt_id:	Input the target id which can represent an ioas or a hwpt
> > > + *		allocated via iommufd subsystem.
> > > + *		Output the input ioas id or the attached hwpt id which could
> > > + *		be the specified hwpt itself or a hwpt automatically created
> > > + *		for the specified ioas by kernel during the attachment.
> > > + *
> > > + * Return: 0 on success, -errno on failure.
> > > + */
> > > +struct vfio_device_attach_iommufd_pt {
> > > +	__u32	argsz;
> > > +	__u32	flags;
> > > +	__u32	pt_id;
> > > +};
> > > +
> > > +#define VFIO_DEVICE_ATTACH_IOMMUFD_PT		_IO(VFIO_TYPE, VFIO_BASE + 20)
> > > +
> > > +/*
> > > + * VFIO_DEVICE_DETACH_IOMMUFD_PT - _IOW(VFIO_TYPE, VFIO_BASE + 21,
> > > + *					struct vfio_device_detach_iommufd_pt)
> > > + *
> > > + * Detach a vfio device from the iommufd address space it has been
> > > + * attached to. After it, device should be in a blocking DMA state.
> > > + *
> > > + * Available only after a device has been bound to iommufd via
> > > + * VFIO_DEVICE_BIND_IOMMUFD.  
> > 
> > These "[a]vailable only after" comments are meaningless, if the user
> > has the file descriptor the ioctl is available.  We can say that ATTACH
> > should be used after BIND to associate the device with an address space
> > within the bound iommufd and DETACH removes that association, but the
> > user is welcome to call everything in the wrong order and we need to be
> > prepared for that anyway.  Thanks,  
> 
> Oh, yes. it's available as long as FD is got. But it is expected to fail if
> the order is not met. This should be what the comment really wants
> to deliver. Will have a look at other ioctls as well.
> 
> Regards,
> Yi Liu
> 
> > 
> > Alex
> >   
> > > + *
> > > + * @argsz:	User filled size of this data.
> > > + * @flags:	Must be 0.
> > > + *
> > > + * Return: 0 on success, -errno on failure.
> > > + */
> > > +struct vfio_device_detach_iommufd_pt {
> > > +	__u32	argsz;
> > > +	__u32	flags;
> > > +};
> > > +
> > > +#define VFIO_DEVICE_DETACH_IOMMUFD_PT		_IO(VFIO_TYPE, VFIO_BASE + 21)
> > > +
> > >  /**
> > >   * VFIO_DEVICE_GET_INFO - _IOR(VFIO_TYPE, VFIO_BASE + 7,
> > >   *						struct vfio_device_info)  
>

Yi Liu May 24, 2023, 2:12 a.m. UTC | #4

> From: Alex Williamson <alex.williamson@redhat.com>
> Sent: Tuesday, May 23, 2023 11:50 PM
> 
> On Tue, 23 May 2023 01:20:17 +0000
> "Liu, Yi L" <yi.l.liu@intel.com> wrote:
> 
> > > From: Alex Williamson <alex.williamson@redhat.com>
> > > Sent: Tuesday, May 23, 2023 6:16 AM
> > >
> > > On Sat, 13 May 2023 06:28:24 -0700
> > > Yi Liu <yi.l.liu@intel.com> wrote:
> > >
> > > > This adds ioctl for userspace to attach device cdev fd to and detach
> > > > from IOAS/hw_pagetable managed by iommufd.
> > > >
> > > >     VFIO_DEVICE_ATTACH_IOMMUFD_PT: attach vfio device to IOAS, hw_pagetable
> > > > 				   managed by iommufd. Attach can be
> > > > 				   undo by VFIO_DEVICE_DETACH_IOMMUFD_PT
> > > > 				   or device fd close.
> > > > >     VFIO_DEVICE_DETACH_IOMMUFD_PT: detach vfio device from the current attached
> > > > > 				   IOAS or hw_pagetable managed by iommufd.
> > > >
> > > > Tested-by: Yanting Jiang <yanting.jiang@intel.com>
> > > > Tested-by: Shameer Kolothum <shameerali.kolothum.thodi@huawei.com>
> > > > Signed-off-by: Yi Liu <yi.l.liu@intel.com>
> > > > ---
> > > >  drivers/vfio/device_cdev.c | 66 ++++++++++++++++++++++++++++++++++++++
> > > >  drivers/vfio/iommufd.c     | 18 +++++++++++
> > > >  drivers/vfio/vfio.h        | 18 +++++++++++
> > > >  drivers/vfio/vfio_main.c   |  8 +++++
> > > >  include/uapi/linux/vfio.h  | 52 ++++++++++++++++++++++++++++++
> > > >  5 files changed, 162 insertions(+)
> > > >
> > > > diff --git a/drivers/vfio/device_cdev.c b/drivers/vfio/device_cdev.c
> > > > index 291cc678a18b..3f14edb80a93 100644
> > > > --- a/drivers/vfio/device_cdev.c
> > > > +++ b/drivers/vfio/device_cdev.c
> > > > @@ -174,6 +174,72 @@ long vfio_device_ioctl_bind_iommufd(struct
> vfio_device_file
> > > *df,
> > > >  	return ret;
> > > >  }
> > > >
> > > > +int vfio_ioctl_device_attach(struct vfio_device_file *df,
> > > > +			     struct vfio_device_attach_iommufd_pt __user *arg)
> > > > +{
> > > > +	struct vfio_device *device = df->device;
> > > > +	struct vfio_device_attach_iommufd_pt attach;
> > > > +	unsigned long minsz;
> > > > +	int ret;
> > > > +
> > > > +	minsz = offsetofend(struct vfio_device_attach_iommufd_pt, pt_id);
> > > > +
> > > > +	if (copy_from_user(&attach, arg, minsz))
> > > > +		return -EFAULT;
> > > > +
> > > > +	if (attach.argsz < minsz || attach.flags)
> > > > +		return -EINVAL;
> > > > +
> > > > +	/* ATTACH only allowed for cdev fds */
> > > > +	if (df->group)
> > > > +		return -EINVAL;
> > > > +
> > > > +	mutex_lock(&device->dev_set->lock);
> > > > +	ret = vfio_iommufd_attach(device, &attach.pt_id);
> > > > +	if (ret)
> > > > +		goto out_unlock;
> > > > +
> > > > +	ret = copy_to_user(&arg->pt_id, &attach.pt_id,
> > > > +			   sizeof(attach.pt_id)) ? -EFAULT : 0;
> > > > +	if (ret)
> > > > +		goto out_detach;
> > > > +	mutex_unlock(&device->dev_set->lock);
> > > > +
> > > > +	return 0;
> > > > +
> > > > +out_detach:
> > > > +	vfio_iommufd_detach(device);
> > > > +out_unlock:
> > > > +	mutex_unlock(&device->dev_set->lock);
> > > > +	return ret;
> > > > +}
> > > > +
> > > > +int vfio_ioctl_device_detach(struct vfio_device_file *df,
> > > > +			     struct vfio_device_detach_iommufd_pt __user *arg)
> > > > +{
> > > > +	struct vfio_device *device = df->device;
> > > > +	struct vfio_device_detach_iommufd_pt detach;
> > > > +	unsigned long minsz;
> > > > +
> > > > +	minsz = offsetofend(struct vfio_device_detach_iommufd_pt, flags);
> > > > +
> > > > +	if (copy_from_user(&detach, arg, minsz))
> > > > +		return -EFAULT;
> > > > +
> > > > +	if (detach.argsz < minsz || detach.flags)
> > > > +		return -EINVAL;
> > > > +
> > > > +	/* DETACH only allowed for cdev fds */
> > > > +	if (df->group)
> > > > +		return -EINVAL;
> > > > +
> > > > +	mutex_lock(&device->dev_set->lock);
> > > > +	vfio_iommufd_detach(device);
> > > > +	mutex_unlock(&device->dev_set->lock);
> > > > +
> > > > +	return 0;
> > > > +}
> > > > +
> > > >  static char *vfio_device_devnode(const struct device *dev, umode_t *mode)
> > > >  {
> > > >  	return kasprintf(GFP_KERNEL, "vfio/devices/%s", dev_name(dev));
> > > > diff --git a/drivers/vfio/iommufd.c b/drivers/vfio/iommufd.c
> > > > index 83575b65ea01..799ea322a7d4 100644
> > > > --- a/drivers/vfio/iommufd.c
> > > > +++ b/drivers/vfio/iommufd.c
> > > > @@ -112,6 +112,24 @@ void vfio_iommufd_unbind(struct vfio_device_file *df)
> > > >  		vdev->ops->unbind_iommufd(vdev);
> > > >  }
> > > >
> > > > +int vfio_iommufd_attach(struct vfio_device *vdev, u32 *pt_id)
> > > > +{
> > > > +	lockdep_assert_held(&vdev->dev_set->lock);
> > > > +
> > > > +	if (vfio_device_is_noiommu(vdev))
> > > > +		return 0;
> > >
> > > Isn't this an invalid operation for a noiommu cdev, ie. -EINVAL?  We
> > > return success and copy back the provided pt_id, why would a user not
> > > consider it a bug that they can't use whatever value was there with
> > > iommufd?
> >
> > Yes, this is the question I asked in [1]. At that time, it appears to me
> > that better to allow it [2]. Maybe it's more suitable to ask it here.
> 
> From an API perspective it seems wrong.  We return success without
> doing anything.  A user would be right to consider it a bug that the
> attach operation works but there's not actually any association to the
> IOAS.  Thanks,

The current version is a tradeoff based on the remarks I received when
I asked the question earlier. Per the prior comment [2], it appeared to
me that attach should succeed for noiommu devices as well, but per your
remark that is not the plan. So we can just fail attach/detach for
noiommu devices. Is that right?

btw. Should we document it somewhere as well? E.g. that attach/detach
is not supported for noiommu devices? Userspace should know when it is
opening noiommu devices.

Regards,
Yi Liu
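
For illustration only, the behaviour proposed above (fail attach for
noiommu devices instead of returning success without doing anything) can
be modelled in plain userspace C. This is a sketch, not kernel code:
struct model_device, struct model_attach_arg, MODEL_OFFSETOFEND() and
model_attach() are hypothetical names made up for the example. Only the
order of the checks (argsz/flags, group fd, then noiommu) follows the
quoted patch, with the noiommu case changed to -EINVAL as suggested in
the review.

```c
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for struct vfio_device_attach_iommufd_pt. */
struct model_attach_arg {
	uint32_t argsz;
	uint32_t flags;
	uint32_t pt_id;
};

/* Hypothetical stand-in for the state the kernel path looks at. */
struct model_device {
	bool opened_via_group;	/* models df->group being set */
	bool noiommu;		/* models vfio_device_is_noiommu() */
	bool attached;
	uint32_t hwpt_id;	/* id a successful attach would report */
};

/* Userspace equivalent of the kernel's offsetofend() helper. */
#define MODEL_OFFSETOFEND(type, member) \
	(offsetof(type, member) + sizeof(((type *)0)->member))

static int model_attach(struct model_device *dev,
			struct model_attach_arg *arg)
{
	size_t minsz = MODEL_OFFSETOFEND(struct model_attach_arg, pt_id);

	/* Same sanity checks as the quoted ioctl handler. */
	if (arg->argsz < minsz || arg->flags)
		return -EINVAL;

	/* ATTACH is only allowed for cdev fds. */
	if (dev->opened_via_group)
		return -EINVAL;

	/* Assumed outcome of this review: noiommu attach fails. */
	if (dev->noiommu)
		return -EINVAL;

	dev->attached = true;
	arg->pt_id = dev->hwpt_id;	/* report the attached hwpt id */
	return 0;
}
```

With this model, a caller never sees a successful attach whose pt_id
has no meaning in iommufd: either the device is attached and pt_id is
updated, or the call fails and pt_id is left untouched.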

Alex Williamson May 24, 2023, 3:31 p.m. UTC | #5

On Wed, 24 May 2023 02:12:14 +0000
"Liu, Yi L" <yi.l.liu@intel.com> wrote:

> > From: Alex Williamson <alex.williamson@redhat.com>
> > Sent: Tuesday, May 23, 2023 11:50 PM
> > 
> > On Tue, 23 May 2023 01:20:17 +0000
> > "Liu, Yi L" <yi.l.liu@intel.com> wrote:
> >   
> > > > From: Alex Williamson <alex.williamson@redhat.com>
> > > > Sent: Tuesday, May 23, 2023 6:16 AM
> > > >
> > > > On Sat, 13 May 2023 06:28:24 -0700
> > > > Yi Liu <yi.l.liu@intel.com> wrote:
> > > >  
> > > > > This adds ioctl for userspace to attach device cdev fd to and detach
> > > > > from IOAS/hw_pagetable managed by iommufd.
> > > > >
> > > > >     VFIO_DEVICE_ATTACH_IOMMUFD_PT: attach vfio device to IOAS, hw_pagetable
> > > > > 				   managed by iommufd. Attach can be
> > > > > 				   undo by VFIO_DEVICE_DETACH_IOMMUFD_PT
> > > > > 				   or device fd close.
> > > > > >     VFIO_DEVICE_DETACH_IOMMUFD_PT: detach vfio device from the current attached
> > > > > > 				   IOAS or hw_pagetable managed by iommufd.
> > > > >
> > > > > Tested-by: Yanting Jiang <yanting.jiang@intel.com>
> > > > > Tested-by: Shameer Kolothum <shameerali.kolothum.thodi@huawei.com>
> > > > > Signed-off-by: Yi Liu <yi.l.liu@intel.com>
> > > > > ---
> > > > >  drivers/vfio/device_cdev.c | 66 ++++++++++++++++++++++++++++++++++++++
> > > > >  drivers/vfio/iommufd.c     | 18 +++++++++++
> > > > >  drivers/vfio/vfio.h        | 18 +++++++++++
> > > > >  drivers/vfio/vfio_main.c   |  8 +++++
> > > > >  include/uapi/linux/vfio.h  | 52 ++++++++++++++++++++++++++++++
> > > > >  5 files changed, 162 insertions(+)
> > > > >
> > > > > diff --git a/drivers/vfio/device_cdev.c b/drivers/vfio/device_cdev.c
> > > > > index 291cc678a18b..3f14edb80a93 100644
> > > > > --- a/drivers/vfio/device_cdev.c
> > > > > +++ b/drivers/vfio/device_cdev.c
> > > > > @@ -174,6 +174,72 @@ long vfio_device_ioctl_bind_iommufd(struct  
> > vfio_device_file  
> > > > *df,  
> > > > >  	return ret;
> > > > >  }
> > > > >
> > > > > +int vfio_ioctl_device_attach(struct vfio_device_file *df,
> > > > > +			     struct vfio_device_attach_iommufd_pt __user *arg)
> > > > > +{
> > > > > +	struct vfio_device *device = df->device;
> > > > > +	struct vfio_device_attach_iommufd_pt attach;
> > > > > +	unsigned long minsz;
> > > > > +	int ret;
> > > > > +
> > > > > +	minsz = offsetofend(struct vfio_device_attach_iommufd_pt, pt_id);
> > > > > +
> > > > > +	if (copy_from_user(&attach, arg, minsz))
> > > > > +		return -EFAULT;
> > > > > +
> > > > > +	if (attach.argsz < minsz || attach.flags)
> > > > > +		return -EINVAL;
> > > > > +
> > > > > +	/* ATTACH only allowed for cdev fds */
> > > > > +	if (df->group)
> > > > > +		return -EINVAL;
> > > > > +
> > > > > +	mutex_lock(&device->dev_set->lock);
> > > > > +	ret = vfio_iommufd_attach(device, &attach.pt_id);
> > > > > +	if (ret)
> > > > > +		goto out_unlock;
> > > > > +
> > > > > +	ret = copy_to_user(&arg->pt_id, &attach.pt_id,
> > > > > +			   sizeof(attach.pt_id)) ? -EFAULT : 0;
> > > > > +	if (ret)
> > > > > +		goto out_detach;
> > > > > +	mutex_unlock(&device->dev_set->lock);
> > > > > +
> > > > > +	return 0;
> > > > > +
> > > > > +out_detach:
> > > > > +	vfio_iommufd_detach(device);
> > > > > +out_unlock:
> > > > > +	mutex_unlock(&device->dev_set->lock);
> > > > > +	return ret;
> > > > > +}
> > > > > +
> > > > > +int vfio_ioctl_device_detach(struct vfio_device_file *df,
> > > > > +			     struct vfio_device_detach_iommufd_pt __user *arg)
> > > > > +{
> > > > > +	struct vfio_device *device = df->device;
> > > > > +	struct vfio_device_detach_iommufd_pt detach;
> > > > > +	unsigned long minsz;
> > > > > +
> > > > > +	minsz = offsetofend(struct vfio_device_detach_iommufd_pt, flags);
> > > > > +
> > > > > +	if (copy_from_user(&detach, arg, minsz))
> > > > > +		return -EFAULT;
> > > > > +
> > > > > +	if (detach.argsz < minsz || detach.flags)
> > > > > +		return -EINVAL;
> > > > > +
> > > > > +	/* DETACH only allowed for cdev fds */
> > > > > +	if (df->group)
> > > > > +		return -EINVAL;
> > > > > +
> > > > > +	mutex_lock(&device->dev_set->lock);
> > > > > +	vfio_iommufd_detach(device);
> > > > > +	mutex_unlock(&device->dev_set->lock);
> > > > > +
> > > > > +	return 0;
> > > > > +}
> > > > > +
> > > > >  static char *vfio_device_devnode(const struct device *dev, umode_t *mode)
> > > > >  {
> > > > >  	return kasprintf(GFP_KERNEL, "vfio/devices/%s", dev_name(dev));
> > > > > diff --git a/drivers/vfio/iommufd.c b/drivers/vfio/iommufd.c
> > > > > index 83575b65ea01..799ea322a7d4 100644
> > > > > --- a/drivers/vfio/iommufd.c
> > > > > +++ b/drivers/vfio/iommufd.c
> > > > > @@ -112,6 +112,24 @@ void vfio_iommufd_unbind(struct vfio_device_file *df)
> > > > >  		vdev->ops->unbind_iommufd(vdev);
> > > > >  }
> > > > >
> > > > > +int vfio_iommufd_attach(struct vfio_device *vdev, u32 *pt_id)
> > > > > +{
> > > > > +	lockdep_assert_held(&vdev->dev_set->lock);
> > > > > +
> > > > > +	if (vfio_device_is_noiommu(vdev))
> > > > > +		return 0;  
> > > >
> > > > Isn't this an invalid operation for a noiommu cdev, ie. -EINVAL?  We
> > > > return success and copy back the provided pt_id, why would a user not
> > > > consider it a bug that they can't use whatever value was there with
> > > > iommufd?  
> > >
> > > Yes, this is the question I asked in [1]. At that time, it appears to me
> > > that better to allow it [2]. Maybe it's more suitable to ask it here.  
> > 
> > From an API perspective it seems wrong.  We return success without
> > doing anything.  A user would be right to consider it a bug that the
> > attach operation works but there's not actually any association to the
> > IOAS.  Thanks,  
> 
> The current version is kind of tradeoff based on prior remarks when
> I asked the question. As prior comment[2], it appears to me the attach
> shall success for noiommu devices as well, but per your remark it seems
> not in plan. So anyway, we may just fail the attach/detach for noiommu
> devices. Is it?

If a user creates an ioas within an iommufd, attaches a device to that
ioas and populates it with mappings, wouldn't the user expect the
device to have access to and honor those mappings?  I think that's the
path we're headed down if we report a successful attach of a noiommu
device to an ioas.

We need to keep in mind that noiommu was meant to be a minimally
intrusive mechanism to provide a dummy vfio IOMMU backend and satisfy
the group requirements, solely for the purpose of making use of the
vfio device interface and without providing any DMA mapping services or
expectations.  IMO, an argument that we need the attach op to succeed in
order to avoid too much disruption in userspace code is nonsense.  On
the contrary, userspace needs to be very aware of this difference and
we shouldn't invest effort trying to make noiommu more convenient to
use.  It's inherently unsafe.

I'm not fond of what a mess noiommu has become with cdev, we're well
beyond the minimal code trickery of the legacy implementation.  I hate
to ask, but could we reiterate our requirements for noiommu as a part of
the native iommufd interface for vfio?  The nested userspace requirement
is gone now that hypervisors have vIOMMU support, so my assumption is
that this is only for bare metal systems without an IOMMU, which
ideally are less and less prevalent.  Are there any noiommu userspaces
that are actually going to adopt the noiommu cdev interface?  What
terrible things happen if noiommu only exists in the vfio group compat
interface to iommufd and at some distant point in the future dies when
that gets disabled?

> btw. Should we document it somewhere as well? E.g. noiommu userspace
> does not support attach/detach? Userspace should know it is opening
> noiommu devices.

Documentation never hurts.  This is such a specialized use case I'm not
sure we've bothered to do much documentation for noiommu previously.
Thanks,

Alex
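
As a small sketch of how userspace could stay "very aware" that it is
opening a noiommu device: patch 16/23 of this series names noiommu
vfio_devices with a "noiommu-" prefix, and the devnode is created under
vfio/devices/. Assuming that naming reaches the cdev path unchanged
(the helper name cdev_name_is_noiommu() and the example paths used with
it are hypothetical), a check could look like this:

```c
#include <stdbool.h>
#include <string.h>

/* Prefix taken from the title of patch 16/23 in this series. */
#define NOIOMMU_PREFIX "noiommu-"

/*
 * Return true if the last path component starts with "noiommu-".
 * Accepts either a bare device name or a full path to the cdev.
 */
static bool cdev_name_is_noiommu(const char *path)
{
	const char *base = strrchr(path, '/');

	base = base ? base + 1 : path;
	return strncmp(base, NOIOMMU_PREFIX, strlen(NOIOMMU_PREFIX)) == 0;
}
```

A caller could use this before deciding whether to issue ATTACH at all,
for example skipping it for a path such as
/dev/vfio/devices/noiommu-vfio0 (an assumed name) and issuing it for
/dev/vfio/devices/vfio0.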

Yi Liu May 25, 2023, 3:03 a.m. UTC | #6

> From: Alex Williamson <alex.williamson@redhat.com>
> Sent: Wednesday, May 24, 2023 11:32 PM
> 
> On Wed, 24 May 2023 02:12:14 +0000
> "Liu, Yi L" <yi.l.liu@intel.com> wrote:
> 
> > > From: Alex Williamson <alex.williamson@redhat.com>
> > > Sent: Tuesday, May 23, 2023 11:50 PM
> > >
> > > On Tue, 23 May 2023 01:20:17 +0000
> > > "Liu, Yi L" <yi.l.liu@intel.com> wrote:
> > >
> > > > > From: Alex Williamson <alex.williamson@redhat.com>
> > > > > Sent: Tuesday, May 23, 2023 6:16 AM
> > > > >
> > > > > On Sat, 13 May 2023 06:28:24 -0700
> > > > > Yi Liu <yi.l.liu@intel.com> wrote:
> > > > >
> > > > > >  	return kasprintf(GFP_KERNEL, "vfio/devices/%s", dev_name(dev));
> > > > > > diff --git a/drivers/vfio/iommufd.c b/drivers/vfio/iommufd.c
> > > > > > index 83575b65ea01..799ea322a7d4 100644
> > > > > > --- a/drivers/vfio/iommufd.c
> > > > > > +++ b/drivers/vfio/iommufd.c
> > > > > > @@ -112,6 +112,24 @@ void vfio_iommufd_unbind(struct vfio_device_file *df)
> > > > > >  		vdev->ops->unbind_iommufd(vdev);
> > > > > >  }
> > > > > >
> > > > > > +int vfio_iommufd_attach(struct vfio_device *vdev, u32 *pt_id)
> > > > > > +{
> > > > > > +	lockdep_assert_held(&vdev->dev_set->lock);
> > > > > > +
> > > > > > +	if (vfio_device_is_noiommu(vdev))
> > > > > > +		return 0;
> > > > >
> > > > > Isn't this an invalid operation for a noiommu cdev, ie. -EINVAL?  We
> > > > > return success and copy back the provided pt_id, why would a user not
> > > > > consider it a bug that they can't use whatever value was there with
> > > > > iommufd?
> > > >
> > > > Yes, this is the question I asked in [1]. At that time, it appears to me
> > > > that better to allow it [2]. Maybe it's more suitable to ask it here.
> > >
> > > From an API perspective it seems wrong.  We return success without
> > > doing anything.  A user would be right to consider it a bug that the
> > > attach operation works but there's not actually any association to the
> > > IOAS.  Thanks,
> >
> > The current version is kind of tradeoff based on prior remarks when
> > I asked the question. As prior comment[2], it appears to me the attach
> > shall success for noiommu devices as well, but per your remark it seems
> > not in plan. So anyway, we may just fail the attach/detach for noiommu
> > devices. Is it?
> 
> If a user creates an ioas within an iommufd, attaches a device to that
> ioas and populates it with mappings, wouldn't the user expect the
> device to have access to and honor those mappings?  I think that's the
> path we're headed down if we report a successful attach of a noiommu
> device to an ioas.

Makes sense. Let's just fail attach/detach for noiommu devices.

> 
> We need to keep in mind that noiommu was meant to be a minimally
> intrusive mechanism to provide a dummy vfio IOMMU backend and satisfy
> the group requirements, solely for the purpose of making use of the
> vfio device interface and without providing any DMA mapping services or
> expectations.  IMO, an argument that we need the attach op to succeed in
> order to avoid too much disruption in userspace code is nonsense.  On
> the contrary, userspace needs to be very aware of this difference and
> we shouldn't invest effort trying to make noiommu more convenient to
> use.  It's inherently unsafe.
> 
> I'm not fond of what a mess noiommu has become with cdev, we're well
> beyond the minimal code trickery of the legacy implementation.  I hate
> to ask, but could we reiterate our requirements for noiommu as a part of
> the native iommufd interface for vfio?  The nested userspace requirement
> is gone now that hypervisors have vIOMMU support, so my assumption is
> that this is only for bare metal systems without an IOMMU, which
> ideally are less and less prevalent.  Are there any noiommu userspaces
> that are actually going to adopt the noiommu cdev interface?  What
> terrible things happen if noiommu only exists in the vfio group compat
> interface to iommufd and at some distant point in the future dies when
> that gets disabled?

vIOMMU may introduce some performance degradation if there
are frequent map/unmap operations. As far as I know, some cloud
service providers are more willing to use noiommu mode within a VM.
Besides the performance consideration, booting a VM without
vIOMMU is supposed to be more robust. But I'm not sure whether
the noiommu userspace will adapt to cdev noiommu. Perhaps yes,
if group is deprecated in the future.

> > btw. Should we document it somewhere as well? E.g. noiommu userspace
> > does not support attach/detach? Userspace should know it is opening
> > noiommu devices.
> 
> Documentation never hurts.  This is such a specialized use case I'm not
> sure we've bothered to do much documentation for noiommu previously.

It seems not; I didn't find any dedicated documentation for noiommu.
Perhaps a comment in the source code is enough. It depends on your
preference.

Regards,
Yi Liu

Alex Williamson May 25, 2023, 3:59 p.m. UTC | #7

On Thu, 25 May 2023 03:03:54 +0000
"Liu, Yi L" <yi.l.liu@intel.com> wrote:

> > From: Alex Williamson <alex.williamson@redhat.com>
> > Sent: Wednesday, May 24, 2023 11:32 PM
> > 
> > On Wed, 24 May 2023 02:12:14 +0000
> > "Liu, Yi L" <yi.l.liu@intel.com> wrote:
> >   
> > > > From: Alex Williamson <alex.williamson@redhat.com>
> > > > Sent: Tuesday, May 23, 2023 11:50 PM
> > > >
> > > > On Tue, 23 May 2023 01:20:17 +0000
> > > > "Liu, Yi L" <yi.l.liu@intel.com> wrote:
> > > >  
> > > > > > From: Alex Williamson <alex.williamson@redhat.com>
> > > > > > Sent: Tuesday, May 23, 2023 6:16 AM
> > > > > >
> > > > > > On Sat, 13 May 2023 06:28:24 -0700
> > > > > > Yi Liu <yi.l.liu@intel.com> wrote:
> > > > > >  
> > > > > > >  	return kasprintf(GFP_KERNEL, "vfio/devices/%s", dev_name(dev));
> > > > > > > diff --git a/drivers/vfio/iommufd.c b/drivers/vfio/iommufd.c
> > > > > > > index 83575b65ea01..799ea322a7d4 100644
> > > > > > > --- a/drivers/vfio/iommufd.c
> > > > > > > +++ b/drivers/vfio/iommufd.c
> > > > > > > @@ -112,6 +112,24 @@ void vfio_iommufd_unbind(struct vfio_device_file *df)
> > > > > > >  		vdev->ops->unbind_iommufd(vdev);
> > > > > > >  }
> > > > > > >
> > > > > > > +int vfio_iommufd_attach(struct vfio_device *vdev, u32 *pt_id)
> > > > > > > +{
> > > > > > > +	lockdep_assert_held(&vdev->dev_set->lock);
> > > > > > > +
> > > > > > > +	if (vfio_device_is_noiommu(vdev))
> > > > > > > +		return 0;  
> > > > > >
> > > > > > Isn't this an invalid operation for a noiommu cdev, ie. -EINVAL?  We
> > > > > > return success and copy back the provided pt_id, why would a user not
> > > > > > consider it a bug that they can't use whatever value was there with
> > > > > > iommufd?  
> > > > >
> > > > > Yes, this is the question I asked in [1]. At that time, it appears to me
> > > > > that better to allow it [2]. Maybe it's more suitable to ask it here.  
> > > >
> > > > From an API perspective it seems wrong.  We return success without
> > > > doing anything.  A user would be right to consider it a bug that the
> > > > attach operation works but there's not actually any association to the
> > > > IOAS.  Thanks,  
> > >
> > > The current version is kind of tradeoff based on prior remarks when
> > > I asked the question. As prior comment[2], it appears to me the attach
> > > shall success for noiommu devices as well, but per your remark it seems
> > > not in plan. So anyway, we may just fail the attach/detach for noiommu
> > > devices. Is it?  
> > 
> > If a user creates an ioas within an iommufd, attaches a device to that
> > ioas and populates it with mappings, wouldn't the user expect the
> > device to have access to and honor those mappings?  I think that's the
> > path we're headed down if we report a successful attach of a noiommu
> > device to an ioas.  
> 
> makes sense. Let's just fail attach/detach for noiommu devices.
> 
> > 
> > We need to keep in mind that noiommu was meant to be a minimally
> > intrusive mechanism to provide a dummy vfio IOMMU backend and satisfy
> > the group requirements, solely for the purpose of making use of the
> > vfio device interface and without providing any DMA mapping services or
> > expectations.  IMO, an argument that we need the attach op to succeed in
> > order to avoid too much disruption in userspace code is nonsense.  On
> > the contrary, userspace needs to be very aware of this difference and
> > we shouldn't invest effort trying to make noiommu more convenient to
> > use.  It's inherently unsafe.
> > 
> > I'm not fond of what a mess noiommu has become with cdev, we're well
> > beyond the minimal code trickery of the legacy implementation.  I hate
> > to ask, but could we reiterate our requirements for noiommu as a part of
> > the native iommufd interface for vfio?  The nested userspace requirement
> > is gone now that hypervisors have vIOMMU support, so my assumption is
> > that this is only for bare metal systems without an IOMMU, which
> > ideally are less and less prevalent.  Are there any noiommu userspaces
> > that are actually going to adopt the noiommu cdev interface?  What
> > terrible things happen if noiommu only exists in the vfio group compat
> > interface to iommufd and at some distant point in the future dies when
> > that gets disabled?  
> 
> vIOMMU may introduce some performance deduction if there
> are frequent map/unmap.

We use passthrough mode of the vIOMMU to negate that overhead for guest
drivers and vfio drivers have typically learned by now that dynamic
mappings using the vfio type1 mapping API are a bad idea.

> As far as I know, some cloud service
> providers are more willing to use noiommu mode within VM.

Sure, the VM itself is still isolated by the host IOMMU, but it's
clearly an extra maintenance and development burden when we should
instead be encouraging those use cases to use vIOMMU rather than
porting to a different noiommu uAPI.  Even if the host is not exposed,
any sort of security and support best practices in the guest should
favor a vIOMMU solution.

> Besides the performance consideration, using a booting a VM
> without vIOMMU is supposed to be more robust. But I'm not

What claims do you have to support lack of robustness in vIOMMU?  Can
they be fixed?

> sure if the noiommu userspace will adapt to cdev noiommu.
> Perhaps yes if group may be deprecated in future.

Deprecation is going to take a long time.  IMO, the VM use cases should
all be encouraged to adopt a vIOMMU solution rather than port to a new
cdev noiommu interface.  The question then is whether there are ongoing
bare metal noiommu use cases and how long those will drag out the vfio
group deprecation.  We could always add noiommu to the native vfio cdev
interface later if there's still demand.

> > > btw. Should we document it somewhere as well? E.g. noiommu userspace
> > > does not support attach/detach? Userspace should know it is opening
> > > noiommu devices.  
> > 
> > Documentation never hurts.  This is such a specialized use case I'm not
> > sure we've bothered to do much documentation for noiommu previously.  
> 
> Seems no, I didn't find special documentation for noiommu. Perhaps
> a comment in the source code is enough. Depends on your taste.

If we're only continuing the group compat noiommu support, I can't very
well require new documentation.  We have a simple model there, noiommu
devices only support the noiommu container type and provide no mapping
interfaces.  The iommufd interface relative to noiommu seems more
nuanced and probably needs documentation should we decide to pursue
it. Thanks,

Alex

Yi Liu May 26, 2023, 8:38 a.m. UTC | #8

> From: Alex Williamson <alex.williamson@redhat.com>
> Sent: Friday, May 26, 2023 12:00 AM
> 
> On Thu, 25 May 2023 03:03:54 +0000
> "Liu, Yi L" <yi.l.liu@intel.com> wrote:
> 
> > > From: Alex Williamson <alex.williamson@redhat.com>
> > > Sent: Wednesday, May 24, 2023 11:32 PM
> > >
> > > On Wed, 24 May 2023 02:12:14 +0000
> > > "Liu, Yi L" <yi.l.liu@intel.com> wrote:
> > >
> > > > > From: Alex Williamson <alex.williamson@redhat.com>
> > > > > Sent: Tuesday, May 23, 2023 11:50 PM
> > > > >
> > > > > On Tue, 23 May 2023 01:20:17 +0000
> > > > > "Liu, Yi L" <yi.l.liu@intel.com> wrote:
> > > > >
> > > > > > > From: Alex Williamson <alex.williamson@redhat.com>
> > > > > > > Sent: Tuesday, May 23, 2023 6:16 AM
> > > > > > >
> > > > > > > On Sat, 13 May 2023 06:28:24 -0700
> > > > > > > Yi Liu <yi.l.liu@intel.com> wrote:
> > > > > > >
> > > > > > > >  	return kasprintf(GFP_KERNEL, "vfio/devices/%s", dev_name(dev));
> > > > > > > > diff --git a/drivers/vfio/iommufd.c b/drivers/vfio/iommufd.c
> > > > > > > > index 83575b65ea01..799ea322a7d4 100644
> > > > > > > > --- a/drivers/vfio/iommufd.c
> > > > > > > > +++ b/drivers/vfio/iommufd.c
> > > > > > > > @@ -112,6 +112,24 @@ void vfio_iommufd_unbind(struct vfio_device_file
> *df)
> > > > > > > >  		vdev->ops->unbind_iommufd(vdev);
> > > > > > > >  }
> > > > > > > >
> > > > > > > > +int vfio_iommufd_attach(struct vfio_device *vdev, u32 *pt_id)
> > > > > > > > +{
> > > > > > > > +	lockdep_assert_held(&vdev->dev_set->lock);
> > > > > > > > +
> > > > > > > > +	if (vfio_device_is_noiommu(vdev))
> > > > > > > > +		return 0;
> > > > > > >
> > > > > > > Isn't this an invalid operation for a noiommu cdev, ie. -EINVAL?  We
> > > > > > > return success and copy back the provided pt_id, why would a user not
> > > > > > > consider it a bug that they can't use whatever value was there with
> > > > > > > iommufd?
> > > > > >
> > > > > > Yes, this is the question I asked in [1]. At that time, it appears to me
> > > > > > that better to allow it [2]. Maybe it's more suitable to ask it here.
> > > > >
> > > > > From an API perspective it seems wrong.  We return success without
> > > > > doing anything.  A user would be right to consider it a bug that the
> > > > > attach operation works but there's not actually any association to the
> > > > > IOAS.  Thanks,
> > > >
> > > > The current version is kind of tradeoff based on prior remarks when
> > > > I asked the question. As prior comment[2], it appears to me the attach
> > > > shall success for noiommu devices as well, but per your remark it seems
> > > > not in plan. So anyway, we may just fail the attach/detach for noiommu
> > > > devices. Is it?
> > >
> > > If a user creates an ioas within an iommufd, attaches a device to that
> > > ioas and populates it with mappings, wouldn't the user expect the
> > > device to have access to and honor those mappings?  I think that's the
> > > path we're headed down if we report a successful attach of a noiommu
> > > device to an ioas.
> >
> > makes sense. Let's just fail attach/detach for noiommu devices.
> >
> > >
> > > We need to keep in mind that noiommu was meant to be a minimally
> > > intrusive mechanism to provide a dummy vfio IOMMU backend and satisfy
> > > the group requirements, solely for the purpose of making use of the
> > > vfio device interface and without providing any DMA mapping services or
> > > expectations.  IMO, an argument that we need the attach op to succeed in
> > > order to avoid too much disruption in userspace code is nonsense.  On
> > > the contrary, userspace needs to be very aware of this difference and
> > > we shouldn't invest effort trying to make noiommu more convenient to
> > > use.  It's inherently unsafe.
> > >
> > > I'm not fond of what a mess noiommu has become with cdev, we're well
> > > beyond the minimal code trickery of the legacy implementation.  I hate
> > > to ask, but could we reiterate our requirements for noiommu as a part of
> > > the native iommufd interface for vfio?  The nested userspace requirement
> > > is gone now that hypervisors have vIOMMU support, so my assumption is
> > > that this is only for bare metal systems without an IOMMU, which
> > > ideally are less and less prevalent.  Are there any noiommu userspaces
> > > that are actually going to adopt the noiommu cdev interface?  What
> > > terrible things happen if noiommu only exists in the vfio group compat
> > > interface to iommufd and at some distant point in the future dies when
> > > that gets disabled?
> >
> > vIOMMU may introduce some performance degradation if there
> > are frequent map/unmap operations.
> 
> We use passthrough mode of the vIOMMU to negate that overhead for guest
> drivers and vfio drivers have typically learned by now that dynamic
> mappings using the vfio type1 mapping API are a bad idea.

Yes, this can avoid this overhead.

> 
> > As far as I know, some cloud service
> > providers are more willing to use noiommu mode within VM.
> 
> Sure, the VM itself is still isolated by the host IOMMU, but it's
> clearly an extra maintenance and development burden when we should
> instead be encouraging those use cases to use vIOMMU rather than
> porting to a different noiommu uAPI.  Even if the host is not exposed,
> any sort of security and support best practices in the guest should
> favor a vIOMMU solution.
> 
> > Besides the performance consideration, booting a VM
> > without vIOMMU is supposed to be more robust. But I'm not
> 
> What claims do you have to support lack of robustness in vIOMMU?  Can
> they be fixed?

Without a vIOMMU, the QEMU logic is simpler, so there is less chance of
errors. That's what I heard.

> > sure if the noiommu userspace will adapt to cdev noiommu.
> > Perhaps yes if group may be deprecated in future.
> 
> Deprecation is going to take a long time.  IMO, the VM use cases should
> all be encouraged to adopt a vIOMMU solution rather than port to a new
> cdev noiommu interface.  The question then is whether there are ongoing
> bare metal noiommu use cases and how long those will drag out the vfio
> group deprecation. We could always add noiommu to the native vfio cdev
> interface later if there's still demand.

So we hope there is no noiommu userspace app left after deprecating
vfio_group. But if it is still needed, then we add it to cdev. Is that
right? Sounds like a plan, as vfio_noiommu was also not there from vfio day 1.
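
The outcome discussed above (attach/detach should fail for noiommu devices rather than report success without creating any association) can be illustrated with a small userspace model. This is only a sketch: `struct toy_device`, `toy_attach_pt()` and `toy_detach_pt()` are hypothetical names, not the kernel's structures or the ioctl implementation, and the choice of `-EINVAL` is an assumption since the thread does not name an errno.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified stand-in for a vfio device (not the kernel struct). */
struct toy_device {
    bool noiommu;   /* device is operated without IOMMU backing */
    bool attached;  /* currently attached to a page table / IOAS */
    uint32_t pt_id; /* ID of the attached page table, valid if attached */
};

/* Attach: returns 0 on success or a negative errno. */
int toy_attach_pt(struct toy_device *dev, uint32_t pt_id)
{
    assert(dev != NULL);
    /*
     * Refuse noiommu devices instead of returning success without
     * doing anything. -EINVAL is an assumed error code.
     */
    if (dev->noiommu)
        return -EINVAL;
    if (dev->attached)
        return -EBUSY;
    dev->attached = true;
    dev->pt_id = pt_id;
    return 0;
}

/* Detach: fails the same way for noiommu devices. */
int toy_detach_pt(struct toy_device *dev)
{
    assert(dev != NULL);
    if (dev->noiommu)
        return -EINVAL;
    dev->attached = false;
    return 0;
}
```

With this shape a noiommu userspace sees an explicit error from attach and cannot mistake it for a working IOAS association.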

Jason Gunthorpe June 6, 2023, 2:40 p.m. UTC | #9

On Wed, May 24, 2023 at 09:31:42AM -0600, Alex Williamson wrote:

> If a user creates an ioas within an iommufd, attaches a device to that
> ioas and populates it with mappings, wouldn't the user expect the
> device to have access to and honor those mappings?  I think that's the
> path we're headed down if we report a successful attach of a noiommu
> device to an ioas.

I understand we are going to drop no-iommu from this series, so what
follows is not relevant.

But to clarify my general design idea here again:

The IOAS contains the mappings that userspace would like to use with
no-iommu. Userspace would use a new IOCTL to pin and return the DMA
addresses of those exact mappings.

So attaching a noiommu device to an IOAS is a necessary operation that should
succeed. It doesn't make full API sense until we also get an ioctl to
return the dma_addr_t lists.

What is special about no-iommu is that the mappings have to go
through the special ioctl API to pin and translate; the IOVA cannot be
used natively as a dma_addr. The IOAS is still used and still related
to the device, it is just used for pinning and dma_addr generation, not
HW isolation.
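
The pin-and-translate idea described above can be sketched as a toy userspace model. Everything here is an assumption for illustration: no such ioctl exists in this series, `struct toy_ioas` and `toy_ioas_pin_translate()` are hypothetical names, and the `dma_base` field merely stands in for whatever address real pinning would produce.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

/* One IOVA range in a toy IOAS (hypothetical, not the iommufd structures). */
struct toy_mapping {
    uint64_t iova;     /* start of the range as seen by userspace */
    uint64_t length;   /* size of the range in bytes */
    uint64_t dma_base; /* stand-in for the address obtained by pinning */
};

struct toy_ioas {
    const struct toy_mapping *maps;
    size_t nr_maps;
};

/*
 * Hypothetical "pin and translate" step: the IOVA is not usable as a
 * DMA address by itself, so it is looked up in the IOAS and the
 * matching DMA address is returned. Returns 0 or a negative errno.
 */
int toy_ioas_pin_translate(const struct toy_ioas *ioas, uint64_t iova,
                           uint64_t length, uint64_t *dma_addr)
{
    assert(ioas != NULL && dma_addr != NULL);
    if (length == 0)
        return -EINVAL;
    for (size_t i = 0; i < ioas->nr_maps; i++) {
        const struct toy_mapping *m = &ioas->maps[i];

        /* The request must lie fully inside one mapping (overflow-safe). */
        if (iova >= m->iova && length <= m->length &&
            iova - m->iova <= m->length - length) {
            *dma_addr = m->dma_base + (iova - m->iova);
            return 0;
        }
    }
    return -ENOENT; /* no mapping covers the requested range */
}
```

In this model the IOAS still describes which memory the device may use, while the translated address, not the IOVA, is what would be programmed into the device.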

> We need to keep in mind that noiommu was meant to be a minimally
> intrusive mechanism to provide a dummy vfio IOMMU backend and satisfy
> the group requirements, solely for the purpose of making use of the
> vfio device interface and without providing any DMA mapping services or
> expectations.  

Well, no-iommu turned into a total hack job as soon as it wrongly
relied on mlock() and /proc/ files to function. Even within its
defined limitations this is an incorrect way to use the mm and DMA
APIs. Memory under DMA must be locked using pin_user_pages(); mlock() is
not a substitute.

I expect this is functionally broken these days, under some workloads,
on certain kernel configurations.

Even if we don't fully implement it, I prefer to imagine a design
where no-iommu is implemented correctly and orient things toward that.

> beyond the minimal code trickery of the legacy implementation.  I hate
> to ask, but could we reiterate our requirements for noiommu as a part of
> the native iommufd interface for vfio?  The nested userspace requirement
> is gone now that hypervisors have vIOMMU support, so my assumption is
> that this is only for bare metal systems without an IOMMU, which
> ideally are less and less prevalent.  

I understood there was some desire for DPDK users to do this for
higher performance on some systems.

> that are actually going to adopt the noiommu cdev interface?  What
> terrible things happen if noiommu only exists in the vfio group compat
> interface to iommufd and at some distant point in the future dies when
> that gets disabled?

I think it is fine, it is only for DPDK and if DPDK people really
really care about this then they can implement it properly someday.

I'm quite happy if we say we will not put no-iommu into the device
cdev until it is put in fully correctly without relying on mlock/etc.

Then the API construction would make a lot more sense.

Jason

Jason Gunthorpe June 6, 2023, 2:42 p.m. UTC | #10

On Thu, May 25, 2023 at 03:03:54AM +0000, Liu, Yi L wrote:

> vIOMMU may introduce some performance degradation if there
> are frequent map/unmap operations.

DPDK doesn't do that.

And once you turn on the HW IOMMU you negate a lot of the micro
performance wins of bypassing. Maybe there is still some argument
about giant/huge pages or something.

> without vIOMMU is supposed to be more robust. But I'm not
> sure if the noiommu userspace will adapt to cdev noiommu.
> Perhaps yes if group may be deprecated in future.

I think that is more a distro question, and we don't have to answer it
fully now.

Jason
