Message ID | 20230308132903.465159-22-yi.l.liu@intel.com (mailing list archive) |
---|---|
State | New, archived |
Headers | show |
Series | cover-letter: Add vfio_device cdev for iommufd support | expand |
> From: Liu, Yi L <yi.l.liu@intel.com> > Sent: Wednesday, March 8, 2023 9:29 PM > + > +static int vfio_device_cdev_probe_noiommu(struct vfio_device *device) > +{ > + struct iommu_group *iommu_group; > + int ret = 0; > + > + if (!IS_ENABLED(CONFIG_VFIO_NOIOMMU) || !vfio_noiommu) > + return -EINVAL; > + > + if (!capable(CAP_SYS_RAWIO)) > + return -EPERM; > + > + iommu_group = iommu_group_get(device->dev); > + if (!iommu_group) > + return 0; > + > + /* > + * We cannot support noiommu mode for devices that are protected > + * by IOMMU. So check the iommu_group, if it is a no-iommu group > + * created by VFIO, we support. If not, we refuse. > + */ > + if (!vfio_group_find_noiommu_group_from_iommu(iommu_group)) > + ret = -EINVAL; > + iommu_group_put(iommu_group); > + return ret; can check whether group->name == "vfio-noiommu"?
> From: Tian, Kevin <kevin.tian@intel.com> > Sent: Friday, March 10, 2023 5:02 PM > > > From: Liu, Yi L <yi.l.liu@intel.com> > > Sent: Wednesday, March 8, 2023 9:29 PM > > + > > +static int vfio_device_cdev_probe_noiommu(struct vfio_device *device) > > +{ > > + struct iommu_group *iommu_group; > > + int ret = 0; > > + > > + if (!IS_ENABLED(CONFIG_VFIO_NOIOMMU) || !vfio_noiommu) > > + return -EINVAL; > > + > > + if (!capable(CAP_SYS_RAWIO)) > > + return -EPERM; > > + > > + iommu_group = iommu_group_get(device->dev); > > + if (!iommu_group) > > + return 0; > > + > > + /* > > + * We cannot support noiommu mode for devices that are > protected > > + * by IOMMU. So check the iommu_group, if it is a no-iommu group > > + * created by VFIO, we support. If not, we refuse. > > + */ > > + if > (!vfio_group_find_noiommu_group_from_iommu(iommu_group)) > > + ret = -EINVAL; > > + iommu_group_put(iommu_group); > > + return ret; > > can check whether group->name == "vfio-noiommu"? But VFIO names it to be "vfio-noiommu" for both VFIO_EMULATED_IOMMU and VFIO_NO_IOMMU. And we don't support no-iommu mode for emulated devices since VFIO_MAP/UNMAP, pin_page(), dam_rw() won't work in the no-iommu mode. So maybe something like below in drivers/vfio/vfio.h. It can be used to replace the code from iommu_group_get() to vfio_group_find_noiommu_group_from_iommu() In my patch. #if IS_ENABLED(CONFIG_VFIO_GROUP) static inline bool vfio_device_group_allow_noiommu(struct vfio_device *device) { lockdep_assert_held(&device->dev_set->lock); return device->group->type == VFIO_NO_IOMMU; } #else static inline bool vfio_device_group_allow_noiommu(struct vfio_device *device) { struct iommu_group *iommu_group; lockdep_assert_held(&device->dev_set->lock); iommu_group = iommu_group_get(device->dev); if (iommu_group) iommu_group_put(iommu_group); return !iommu_group; } #endif
> From: Liu, Yi L <yi.l.liu@intel.com> > Sent: Friday, March 10, 2023 5:58 PM > > > From: Tian, Kevin <kevin.tian@intel.com> > > Sent: Friday, March 10, 2023 5:02 PM > > > > > From: Liu, Yi L <yi.l.liu@intel.com> > > > Sent: Wednesday, March 8, 2023 9:29 PM > > > + > > > +static int vfio_device_cdev_probe_noiommu(struct vfio_device *device) > > > +{ > > > + struct iommu_group *iommu_group; > > > + int ret = 0; > > > + > > > + if (!IS_ENABLED(CONFIG_VFIO_NOIOMMU) || !vfio_noiommu) > > > + return -EINVAL; > > > + > > > + if (!capable(CAP_SYS_RAWIO)) > > > + return -EPERM; > > > + > > > + iommu_group = iommu_group_get(device->dev); > > > + if (!iommu_group) > > > + return 0; > > > + > > > + /* > > > + * We cannot support noiommu mode for devices that are > > protected > > > + * by IOMMU. So check the iommu_group, if it is a no-iommu group > > > + * created by VFIO, we support. If not, we refuse. > > > + */ > > > + if > > (!vfio_group_find_noiommu_group_from_iommu(iommu_group)) > > > + ret = -EINVAL; > > > + iommu_group_put(iommu_group); > > > + return ret; > > > > can check whether group->name == "vfio-noiommu"? > > But VFIO names it to be "vfio-noiommu" for both VFIO_EMULATED_IOMMU > and VFIO_NO_IOMMU. And we don't support no-iommu mode for emulated > devices since VFIO_MAP/UNMAP, pin_page(), dam_rw() won't work in the > no-iommu mode. correct. > > So maybe something like below in drivers/vfio/vfio.h. It can be used > to replace the code from iommu_group_get() to > vfio_group_find_noiommu_group_from_iommu() In my patch. > > #if IS_ENABLED(CONFIG_VFIO_GROUP) > static inline bool vfio_device_group_allow_noiommu(struct vfio_device > *device) > { > lockdep_assert_held(&device->dev_set->lock); > > return device->group->type == VFIO_NO_IOMMU; > } > #else > static inline bool vfio_device_group_allow_noiommu(struct vfio_device > *device) > { > struct iommu_group *iommu_group; > > lockdep_assert_held(&device->dev_set->lock); > > iommu_group = iommu_group_get(device->dev); > if (iommu_group) > iommu_group_put(iommu_group); > > return !iommu_group; > } > #endif this makes sense.
> From: Tian, Kevin <kevin.tian@intel.com> > Sent: Friday, March 10, 2023 6:07 PM > > > From: Liu, Yi L <yi.l.liu@intel.com> > > Sent: Friday, March 10, 2023 5:58 PM > > > > > From: Tian, Kevin <kevin.tian@intel.com> > > > Sent: Friday, March 10, 2023 5:02 PM > > > > > > > From: Liu, Yi L <yi.l.liu@intel.com> > > > > Sent: Wednesday, March 8, 2023 9:29 PM > > > > + > > > > +static int vfio_device_cdev_probe_noiommu(struct vfio_device > *device) > > > > +{ > > > > + struct iommu_group *iommu_group; > > > > + int ret = 0; > > > > + > > > > + if (!IS_ENABLED(CONFIG_VFIO_NOIOMMU) || !vfio_noiommu) > > > > + return -EINVAL; > > > > + > > > > + if (!capable(CAP_SYS_RAWIO)) > > > > + return -EPERM; > > > > + > > > > + iommu_group = iommu_group_get(device->dev); > > > > + if (!iommu_group) > > > > + return 0; > > > > + > > > > + /* > > > > + * We cannot support noiommu mode for devices that are > > > protected > > > > + * by IOMMU. So check the iommu_group, if it is a no-iommu group > > > > + * created by VFIO, we support. If not, we refuse. > > > > + */ > > > > + if > > > (!vfio_group_find_noiommu_group_from_iommu(iommu_group)) > > > > + ret = -EINVAL; > > > > + iommu_group_put(iommu_group); > > > > + return ret; > > > > > > can check whether group->name == "vfio-noiommu"? > > > > But VFIO names it to be "vfio-noiommu" for both > VFIO_EMULATED_IOMMU > > and VFIO_NO_IOMMU. And we don't support no-iommu mode for > emulated > > devices since VFIO_MAP/UNMAP, pin_page(), dam_rw() won't work in > the > > no-iommu mode. > > correct. > > > > > So maybe something like below in drivers/vfio/vfio.h. It can be used > > to replace the code from iommu_group_get() to > > vfio_group_find_noiommu_group_from_iommu() In my patch. > > > > #if IS_ENABLED(CONFIG_VFIO_GROUP) > > static inline bool vfio_device_group_allow_noiommu(struct vfio_device > > *device) > > { > > lockdep_assert_held(&device->dev_set->lock); > > > > return device->group->type == VFIO_NO_IOMMU; > > } > > #else > > static inline bool vfio_device_group_allow_noiommu(struct vfio_device > > *device) > > { > > struct iommu_group *iommu_group; > > > > lockdep_assert_held(&device->dev_set->lock); > > > > iommu_group = iommu_group_get(device->dev); > > if (iommu_group) > > iommu_group_put(iommu_group); > > > > return !iommu_group; > > } > > #endif > > this makes sense. Just have one more think. vfio_device_is_noiommu() is already able to cover above vfio_device_group_allow_noiommu(), just needs to make it work when !VFIO_GROUP. In the group code, group->type == VFIO_NO_IOMMU means vfio_noiommu==true. So no need to check it. While in the case !VFIO_GROUP, needs to check it. So the code is as below. I can use vfio_device_is_noiommu() in cdev path. # if IS_ENABLED(CONFIG_VFIO_GROUP) static inline bool vfio_device_is_noiommu(struct vfio_device *vdev) { return IS_ENABLED(CONFIG_VFIO_NOIOMMU) && vdev->group->type == VFIO_NO_IOMMU; } #else static inline bool vfio_device_is_noiommu(struct vfio_device *vdev) { struct iommu_group *iommu_group; if (!IS_ENABLED(CONFIG_VFIO_NOIOMMU) || !vfio_noiommu) return -EINVAL; iommu_group = iommu_group_get(vdev->dev); if (iommu_group) iommu_group_put(iommu_group); return !iommu_group; } #endif
> From: Liu, Yi L <yi.l.liu@intel.com> > Sent: Wednesday, March 15, 2023 12:40 PM > > > From: Tian, Kevin <kevin.tian@intel.com> > > Sent: Friday, March 10, 2023 6:07 PM > > > > > From: Liu, Yi L <yi.l.liu@intel.com> > > > Sent: Friday, March 10, 2023 5:58 PM > > > > > > > From: Tian, Kevin <kevin.tian@intel.com> > > > > Sent: Friday, March 10, 2023 5:02 PM > > > > > > > > > From: Liu, Yi L <yi.l.liu@intel.com> > > > > > Sent: Wednesday, March 8, 2023 9:29 PM > > > > > + > > > > > +static int vfio_device_cdev_probe_noiommu(struct vfio_device > > *device) > > > > > +{ > > > > > + struct iommu_group *iommu_group; > > > > > + int ret = 0; > > > > > + > > > > > + if (!IS_ENABLED(CONFIG_VFIO_NOIOMMU) > || !vfio_noiommu) > > > > > + return -EINVAL; > > > > > + > > > > > + if (!capable(CAP_SYS_RAWIO)) > > > > > + return -EPERM; > > > > > + > > > > > + iommu_group = iommu_group_get(device->dev); > > > > > + if (!iommu_group) > > > > > + return 0; > > > > > + > > > > > + /* > > > > > + * We cannot support noiommu mode for devices that are > > > > protected > > > > > + * by IOMMU. So check the iommu_group, if it is a no-iommu > group > > > > > + * created by VFIO, we support. If not, we refuse. > > > > > + */ > > > > > + if > > > > (!vfio_group_find_noiommu_group_from_iommu(iommu_group)) > > > > > + ret = -EINVAL; > > > > > + iommu_group_put(iommu_group); > > > > > + return ret; > > > > > > > > can check whether group->name == "vfio-noiommu"? > > > > > > But VFIO names it to be "vfio-noiommu" for both > > VFIO_EMULATED_IOMMU > > > and VFIO_NO_IOMMU. And we don't support no-iommu mode for > > emulated > > > devices since VFIO_MAP/UNMAP, pin_page(), dam_rw() won't work in > > the > > > no-iommu mode. > > > > correct. > > > > > > > > So maybe something like below in drivers/vfio/vfio.h. It can be used > > > to replace the code from iommu_group_get() to > > > vfio_group_find_noiommu_group_from_iommu() In my patch. > > > > > > #if IS_ENABLED(CONFIG_VFIO_GROUP) > > > static inline bool vfio_device_group_allow_noiommu(struct vfio_device > > > *device) > > > { > > > lockdep_assert_held(&device->dev_set->lock); > > > > > > return device->group->type == VFIO_NO_IOMMU; > > > } > > > #else > > > static inline bool vfio_device_group_allow_noiommu(struct vfio_device > > > *device) > > > { > > > struct iommu_group *iommu_group; > > > > > > lockdep_assert_held(&device->dev_set->lock); > > > > > > iommu_group = iommu_group_get(device->dev); > > > if (iommu_group) > > > iommu_group_put(iommu_group); > > > > > > return !iommu_group; > > > } > > > #endif > > > > this makes sense. > > Just have one more think. vfio_device_is_noiommu() is already able > to cover above vfio_device_group_allow_noiommu(), just needs > to make it work when !VFIO_GROUP. In the group code, group->type > == VFIO_NO_IOMMU means vfio_noiommu==true. So no need to > check it. While in the case !VFIO_GROUP, needs to check it. So the > code is as below. I can use vfio_device_is_noiommu() in cdev path. > > # if IS_ENABLED(CONFIG_VFIO_GROUP) > static inline bool vfio_device_is_noiommu(struct vfio_device *vdev) > { > return IS_ENABLED(CONFIG_VFIO_NOIOMMU) && > vdev->group->type == VFIO_NO_IOMMU; > } > #else > static inline bool vfio_device_is_noiommu(struct vfio_device *vdev) > { > struct iommu_group *iommu_group; > > if (!IS_ENABLED(CONFIG_VFIO_NOIOMMU) || !vfio_noiommu) > return -EINVAL; > > iommu_group = iommu_group_get(vdev->dev); > if (iommu_group) > iommu_group_put(iommu_group); > > return !iommu_group; > } > #endif works for me.
On Wed, Mar 15, 2023 at 04:40:19AM +0000, Liu, Yi L wrote: > # if IS_ENABLED(CONFIG_VFIO_GROUP) > static inline bool vfio_device_is_noiommu(struct vfio_device *vdev) > { > return IS_ENABLED(CONFIG_VFIO_NOIOMMU) && > vdev->group->type == VFIO_NO_IOMMU; > } > #else > static inline bool vfio_device_is_noiommu(struct vfio_device *vdev) > { > struct iommu_group *iommu_group; > > if (!IS_ENABLED(CONFIG_VFIO_NOIOMMU) || !vfio_noiommu) > return -EINVAL; > > iommu_group = iommu_group_get(vdev->dev); > if (iommu_group) > iommu_group_put(iommu_group); > > return !iommu_group; If we don't have VFIO_GROUP then no-iommu is signaled by a NULL iommu_ctx pointer in the vdev, don't mess with groups Jason
On 2023/3/20 22:09, Jason Gunthorpe wrote: > On Wed, Mar 15, 2023 at 04:40:19AM +0000, Liu, Yi L wrote: > >> # if IS_ENABLED(CONFIG_VFIO_GROUP) >> static inline bool vfio_device_is_noiommu(struct vfio_device *vdev) >> { >> return IS_ENABLED(CONFIG_VFIO_NOIOMMU) && >> vdev->group->type == VFIO_NO_IOMMU; >> } >> #else >> static inline bool vfio_device_is_noiommu(struct vfio_device *vdev) >> { >> struct iommu_group *iommu_group; >> >> if (!IS_ENABLED(CONFIG_VFIO_NOIOMMU) || !vfio_noiommu) >> return -EINVAL; >> >> iommu_group = iommu_group_get(vdev->dev); >> if (iommu_group) >> iommu_group_put(iommu_group); >> >> return !iommu_group; > > If we don't have VFIO_GROUP then no-iommu is signaled by a NULL > iommu_ctx pointer in the vdev, don't mess with groups yes, NULL iommufd_ctx pointer would be set in vdev and passed to the vfio_device_open(). But here, we want to use this helper to check if user can use noiommu mode. This is before calling vfio_device_open(). e.g. if the device is protected by iommu, then user cannot use noiommu mode on it.
On Mon, Mar 20, 2023 at 10:31:53PM +0800, Yi Liu wrote: > On 2023/3/20 22:09, Jason Gunthorpe wrote: > > On Wed, Mar 15, 2023 at 04:40:19AM +0000, Liu, Yi L wrote: > > > > > # if IS_ENABLED(CONFIG_VFIO_GROUP) > > > static inline bool vfio_device_is_noiommu(struct vfio_device *vdev) > > > { > > > return IS_ENABLED(CONFIG_VFIO_NOIOMMU) && > > > vdev->group->type == VFIO_NO_IOMMU; > > > } > > > #else > > > static inline bool vfio_device_is_noiommu(struct vfio_device *vdev) > > > { > > > struct iommu_group *iommu_group; > > > > > > if (!IS_ENABLED(CONFIG_VFIO_NOIOMMU) || !vfio_noiommu) > > > return -EINVAL; > > > > > > iommu_group = iommu_group_get(vdev->dev); > > > if (iommu_group) > > > iommu_group_put(iommu_group); > > > > > > return !iommu_group; > > > > If we don't have VFIO_GROUP then no-iommu is signaled by a NULL > > iommu_ctx pointer in the vdev, don't mess with groups > > yes, NULL iommufd_ctx pointer would be set in vdev and passed to the > vfio_device_open(). But here, we want to use this helper to check if > user can use noiommu mode. This is before calling vfio_device_open(). > e.g. if the device is protected by iommu, then user cannot use noiommu > mode on it. Why not allow it? If the admin has enabled this mode we may as well let it be used. You explicitly ask for no-iommu mode by passing -1 for the iommufd parameter. If the module parameter says it is allowed then that is all you need. Jason
> From: Jason Gunthorpe <jgg@nvidia.com> > Sent: Tuesday, March 21, 2023 1:17 AM > > On Mon, Mar 20, 2023 at 10:31:53PM +0800, Yi Liu wrote: > > On 2023/3/20 22:09, Jason Gunthorpe wrote: > > > On Wed, Mar 15, 2023 at 04:40:19AM +0000, Liu, Yi L wrote: > > > > > > > # if IS_ENABLED(CONFIG_VFIO_GROUP) > > > > static inline bool vfio_device_is_noiommu(struct vfio_device *vdev) > > > > { > > > > return IS_ENABLED(CONFIG_VFIO_NOIOMMU) && > > > > vdev->group->type == VFIO_NO_IOMMU; > > > > } > > > > #else > > > > static inline bool vfio_device_is_noiommu(struct vfio_device *vdev) > > > > { > > > > struct iommu_group *iommu_group; > > > > > > > > if (!IS_ENABLED(CONFIG_VFIO_NOIOMMU) || !vfio_noiommu) > > > > return -EINVAL; > > > > > > > > iommu_group = iommu_group_get(vdev->dev); > > > > if (iommu_group) > > > > iommu_group_put(iommu_group); > > > > > > > > return !iommu_group; > > > > > > If we don't have VFIO_GROUP then no-iommu is signaled by a NULL > > > iommu_ctx pointer in the vdev, don't mess with groups > > > > yes, NULL iommufd_ctx pointer would be set in vdev and passed to the > > vfio_device_open(). But here, we want to use this helper to check if > > user can use noiommu mode. This is before calling vfio_device_open(). > > e.g. if the device is protected by iommu, then user cannot use noiommu > > mode on it. > > Why not allow it? > > If the admin has enabled this mode we may as well let it be used. > > You explicitly ask for no-iommu mode by passing -1 for the iommufd > parameter. If the module parameter says it is allowed then that is all > you need. > IMHO we should disallow noiommu on a device which already has a iommu group. This is how noiommu works with vfio group. I don't think it's a good idea to further relax it in cdev.
On Tue, Mar 21, 2023 at 01:30:34AM +0000, Tian, Kevin wrote: > > From: Jason Gunthorpe <jgg@nvidia.com> > > Sent: Tuesday, March 21, 2023 1:17 AM > > > > On Mon, Mar 20, 2023 at 10:31:53PM +0800, Yi Liu wrote: > > > On 2023/3/20 22:09, Jason Gunthorpe wrote: > > > > On Wed, Mar 15, 2023 at 04:40:19AM +0000, Liu, Yi L wrote: > > > > > > > > > # if IS_ENABLED(CONFIG_VFIO_GROUP) > > > > > static inline bool vfio_device_is_noiommu(struct vfio_device *vdev) > > > > > { > > > > > return IS_ENABLED(CONFIG_VFIO_NOIOMMU) && > > > > > vdev->group->type == VFIO_NO_IOMMU; > > > > > } > > > > > #else > > > > > static inline bool vfio_device_is_noiommu(struct vfio_device *vdev) > > > > > { > > > > > struct iommu_group *iommu_group; > > > > > > > > > > if (!IS_ENABLED(CONFIG_VFIO_NOIOMMU) || !vfio_noiommu) > > > > > return -EINVAL; > > > > > > > > > > iommu_group = iommu_group_get(vdev->dev); > > > > > if (iommu_group) > > > > > iommu_group_put(iommu_group); > > > > > > > > > > return !iommu_group; > > > > > > > > If we don't have VFIO_GROUP then no-iommu is signaled by a NULL > > > > iommu_ctx pointer in the vdev, don't mess with groups > > > > > > yes, NULL iommufd_ctx pointer would be set in vdev and passed to the > > > vfio_device_open(). But here, we want to use this helper to check if > > > user can use noiommu mode. This is before calling vfio_device_open(). > > > e.g. if the device is protected by iommu, then user cannot use noiommu > > > mode on it. > > > > Why not allow it? > > > > If the admin has enabled this mode we may as well let it be used. > > > > You explicitly ask for no-iommu mode by passing -1 for the iommufd > > parameter. If the module parameter says it is allowed then that is all > > you need. > > > > IMHO we should disallow noiommu on a device which already has > a iommu group. This is how noiommu works with vfio group. I don't > think it's a good idea to further relax it in cdev. This isn't the same thing, this will trigger for mdevs and stuff that should not be noiommu as well. If you want to copy what the group code does then noiommu needs to be statically determined at physical vfio device allocation time. Jason
> From: Jason Gunthorpe <jgg@nvidia.com> > Sent: Tuesday, March 21, 2023 8:01 PM > > On Tue, Mar 21, 2023 at 01:30:34AM +0000, Tian, Kevin wrote: > > > From: Jason Gunthorpe <jgg@nvidia.com> > > > Sent: Tuesday, March 21, 2023 1:17 AM > > > > > > On Mon, Mar 20, 2023 at 10:31:53PM +0800, Yi Liu wrote: > > > > On 2023/3/20 22:09, Jason Gunthorpe wrote: > > > > > On Wed, Mar 15, 2023 at 04:40:19AM +0000, Liu, Yi L wrote: > > > > > > > > > > > # if IS_ENABLED(CONFIG_VFIO_GROUP) > > > > > > static inline bool vfio_device_is_noiommu(struct vfio_device > *vdev) > > > > > > { > > > > > > return IS_ENABLED(CONFIG_VFIO_NOIOMMU) && > > > > > > vdev->group->type == VFIO_NO_IOMMU; > > > > > > } > > > > > > #else > > > > > > static inline bool vfio_device_is_noiommu(struct vfio_device > *vdev) > > > > > > { > > > > > > struct iommu_group *iommu_group; > > > > > > > > > > > > if (!IS_ENABLED(CONFIG_VFIO_NOIOMMU) > || !vfio_noiommu) > > > > > > return -EINVAL; > > > > > > > > > > > > iommu_group = iommu_group_get(vdev->dev); > > > > > > if (iommu_group) > > > > > > iommu_group_put(iommu_group); > > > > > > > > > > > > return !iommu_group; > > > > > > > > > > If we don't have VFIO_GROUP then no-iommu is signaled by a NULL > > > > > iommu_ctx pointer in the vdev, don't mess with groups > > > > > > > > yes, NULL iommufd_ctx pointer would be set in vdev and passed to > the > > > > vfio_device_open(). But here, we want to use this helper to check if > > > > user can use noiommu mode. This is before calling vfio_device_open(). > > > > e.g. if the device is protected by iommu, then user cannot use > noiommu > > > > mode on it. > > > > > > Why not allow it? > > > > > > If the admin has enabled this mode we may as well let it be used. > > > > > > You explicitly ask for no-iommu mode by passing -1 for the iommufd > > > parameter. If the module parameter says it is allowed then that is all > > > you need. > > > > > > > IMHO we should disallow noiommu on a device which already has > > a iommu group. This is how noiommu works with vfio group. I don't > > think it's a good idea to further relax it in cdev. > > This isn't the same thing, this will trigger for mdevs and stuff that > should not be noiommu as well. But the group path does disallow noiommu usage if the device has a real iommu_group (the one created by VFIO code is not real). Would it be better to keep it consistent from this angle? > If you want to copy what the group code does then noiommu needs to be > statically determined at physical vfio device allocation time. There is another reason which may not that strong. For devices protected by iommu, user needs to program IOVA mappings in order to do DMA. Such device has a real iommu_group. So if we allow using noiommu mode for such devices, DMA would be blocked by iommu. Perhaps users that use noiommu mode should not do DMA at the first place. Regards, Yi Liu
On Tue, Mar 21, 2023 at 02:37:58PM +0000, Liu, Yi L wrote: > > From: Jason Gunthorpe <jgg@nvidia.com> > > Sent: Tuesday, March 21, 2023 8:01 PM > > > > On Tue, Mar 21, 2023 at 01:30:34AM +0000, Tian, Kevin wrote: > > > > From: Jason Gunthorpe <jgg@nvidia.com> > > > > Sent: Tuesday, March 21, 2023 1:17 AM > > > > > > > > On Mon, Mar 20, 2023 at 10:31:53PM +0800, Yi Liu wrote: > > > > > On 2023/3/20 22:09, Jason Gunthorpe wrote: > > > > > > On Wed, Mar 15, 2023 at 04:40:19AM +0000, Liu, Yi L wrote: > > > > > > > > > > > > > # if IS_ENABLED(CONFIG_VFIO_GROUP) > > > > > > > static inline bool vfio_device_is_noiommu(struct vfio_device > > *vdev) > > > > > > > { > > > > > > > return IS_ENABLED(CONFIG_VFIO_NOIOMMU) && > > > > > > > vdev->group->type == VFIO_NO_IOMMU; > > > > > > > } > > > > > > > #else > > > > > > > static inline bool vfio_device_is_noiommu(struct vfio_device > > *vdev) > > > > > > > { > > > > > > > struct iommu_group *iommu_group; > > > > > > > > > > > > > > if (!IS_ENABLED(CONFIG_VFIO_NOIOMMU) > > || !vfio_noiommu) > > > > > > > return -EINVAL; > > > > > > > > > > > > > > iommu_group = iommu_group_get(vdev->dev); > > > > > > > if (iommu_group) > > > > > > > iommu_group_put(iommu_group); > > > > > > > > > > > > > > return !iommu_group; > > > > > > > > > > > > If we don't have VFIO_GROUP then no-iommu is signaled by a NULL > > > > > > iommu_ctx pointer in the vdev, don't mess with groups > > > > > > > > > > yes, NULL iommufd_ctx pointer would be set in vdev and passed to > > the > > > > > vfio_device_open(). But here, we want to use this helper to check if > > > > > user can use noiommu mode. This is before calling vfio_device_open(). > > > > > e.g. if the device is protected by iommu, then user cannot use > > noiommu > > > > > mode on it. > > > > > > > > Why not allow it? > > > > > > > > If the admin has enabled this mode we may as well let it be used. > > > > > > > > You explicitly ask for no-iommu mode by passing -1 for the iommufd > > > > parameter. If the module parameter says it is allowed then that is all > > > > you need. > > > > > > > > > > IMHO we should disallow noiommu on a device which already has > > > a iommu group. This is how noiommu works with vfio group. I don't > > > think it's a good idea to further relax it in cdev. > > > > This isn't the same thing, this will trigger for mdevs and stuff that > > should not be noiommu as well. > > But the group path does disallow noiommu usage if the device has > a real iommu_group (the one created by VFIO code is not real). Would > it be better to keep it consistent from this angle? > > > If you want to copy what the group code does then noiommu needs to be > > statically determined at physical vfio device allocation time. > > There is another reason which may not that strong. For devices protected > by iommu, user needs to program IOVA mappings in order to do DMA. Such > device has a real iommu_group. Oh that is a good reason for sure But still, this check should be done at device creation time just like in group mode, not during each attach call. Jason
> From: Jason Gunthorpe <jgg@nvidia.com> > Sent: Tuesday, March 21, 2023 10:41 PM > > On Tue, Mar 21, 2023 at 02:37:58PM +0000, Liu, Yi L wrote: > > > From: Jason Gunthorpe <jgg@nvidia.com> > > > Sent: Tuesday, March 21, 2023 8:01 PM > > > > > > On Tue, Mar 21, 2023 at 01:30:34AM +0000, Tian, Kevin wrote: > > > > > From: Jason Gunthorpe <jgg@nvidia.com> > > > > > Sent: Tuesday, March 21, 2023 1:17 AM > > > > > > > > > > On Mon, Mar 20, 2023 at 10:31:53PM +0800, Yi Liu wrote: > > > > > > On 2023/3/20 22:09, Jason Gunthorpe wrote: > > > > > > > On Wed, Mar 15, 2023 at 04:40:19AM +0000, Liu, Yi L wrote: > > > > > > > > > > > > > > > # if IS_ENABLED(CONFIG_VFIO_GROUP) > > > > > > > > static inline bool vfio_device_is_noiommu(struct vfio_device > > > *vdev) > > > > > > > > { > > > > > > > > return IS_ENABLED(CONFIG_VFIO_NOIOMMU) && > > > > > > > > vdev->group->type == VFIO_NO_IOMMU; > > > > > > > > } > > > > > > > > #else > > > > > > > > static inline bool vfio_device_is_noiommu(struct vfio_device > > > *vdev) > > > > > > > > { > > > > > > > > struct iommu_group *iommu_group; > > > > > > > > > > > > > > > > if (!IS_ENABLED(CONFIG_VFIO_NOIOMMU) > > > || !vfio_noiommu) > > > > > > > > return -EINVAL; > > > > > > > > > > > > > > > > iommu_group = iommu_group_get(vdev->dev); > > > > > > > > if (iommu_group) > > > > > > > > iommu_group_put(iommu_group); > > > > > > > > > > > > > > > > return !iommu_group; > > > > > > > > > > > > > > If we don't have VFIO_GROUP then no-iommu is signaled by a > NULL > > > > > > > iommu_ctx pointer in the vdev, don't mess with groups > > > > > > > > > > > > yes, NULL iommufd_ctx pointer would be set in vdev and passed > to > > > the > > > > > > vfio_device_open(). But here, we want to use this helper to check > if > > > > > > user can use noiommu mode. This is before calling > vfio_device_open(). > > > > > > e.g. if the device is protected by iommu, then user cannot use > > > noiommu > > > > > > mode on it. > > > > > > > > > > Why not allow it? > > > > > > > > > > If the admin has enabled this mode we may as well let it be used. > > > > > > > > > > You explicitly ask for no-iommu mode by passing -1 for the iommufd > > > > > parameter. If the module parameter says it is allowed then that is all > > > > > you need. > > > > > > > > > > > > > IMHO we should disallow noiommu on a device which already has > > > > a iommu group. This is how noiommu works with vfio group. I don't > > > > think it's a good idea to further relax it in cdev. > > > > > > This isn't the same thing, this will trigger for mdevs and stuff that > > > should not be noiommu as well. > > > > But the group path does disallow noiommu usage if the device has > > a real iommu_group (the one created by VFIO code is not real). Would > > it be better to keep it consistent from this angle? > > > > > If you want to copy what the group code does then noiommu needs to > be > > > statically determined at physical vfio device allocation time. > > > > There is another reason which may not that strong. For devices protected > > by iommu, user needs to program IOVA mappings in order to do DMA. > Such > > device has a real iommu_group. > > Oh that is a good reason for sure > > But still, this check should be done at device creation time just like > in group mode, not during each attach call. Seems like requiring a noiommu_capable flag in vfio_device. We set this flag only when the vfio_group->type==VFIO_NO_IOMMU or no real iommu_group (for the case in which vfio group code is compiled out). Perhaps the below check should be added as well. Then in the time of bind, just check the noiommu_capable flag and capable(CAP_SYS_RAWIO). if (!IS_ENABLED(CONFIG_VFIO_NOIOMMU) || !vfio_noiommu) Regards, Yi Liu
On Tue, Mar 21, 2023 at 02:51:20PM +0000, Liu, Yi L wrote: > > But still, this check should be done at device creation time just like > > in group mode, not during each attach call. > > Seems like requiring a noiommu_capable flag in vfio_device. We > set this flag only when the vfio_group->type==VFIO_NO_IOMMU > or no real iommu_group (for the case in which vfio group code is > compiled out). Perhaps the below check should be added as well. > Then in the time of bind, just check the noiommu_capable flag > and capable(CAP_SYS_RAWIO). > > if (!IS_ENABLED(CONFIG_VFIO_NOIOMMU) || !vfio_noiommu) Yes, and also only for physical devices Jason
> From: Jason Gunthorpe <jgg@nvidia.com> > Sent: Tuesday, March 21, 2023 10:58 PM > > On Tue, Mar 21, 2023 at 02:51:20PM +0000, Liu, Yi L wrote: > > > But still, this check should be done at device creation time just like > > > in group mode, not during each attach call. > > > > Seems like requiring a noiommu_capable flag in vfio_device. We > > set this flag only when the vfio_group->type==VFIO_NO_IOMMU > > or no real iommu_group (for the case in which vfio group code is > > compiled out). Perhaps the below check should be added as well. > > Then in the time of bind, just check the noiommu_capable flag > > and capable(CAP_SYS_RAWIO). > > > > if (!IS_ENABLED(CONFIG_VFIO_NOIOMMU) || !vfio_noiommu) > > Yes, and also only for physical devices Sure. BTW. in the time of creation, there is no vfio group yet. So may just check if the device has a real iommu_group. Another alternative is to set this flag at the time of vfio_device registration. Physical device driver and emulated device driver uses different register APIs. Hence they can be distinguished easily. Regards, Yi Liu
On Tue, Mar 21, 2023 at 03:10:50PM +0000, Liu, Yi L wrote: > > From: Jason Gunthorpe <jgg@nvidia.com> > > Sent: Tuesday, March 21, 2023 10:58 PM > > > > On Tue, Mar 21, 2023 at 02:51:20PM +0000, Liu, Yi L wrote: > > > > But still, this check should be done at device creation time just like > > > > in group mode, not during each attach call. > > > > > > Seems like requiring a noiommu_capable flag in vfio_device. We > > > set this flag only when the vfio_group->type==VFIO_NO_IOMMU > > > or no real iommu_group (for the case in which vfio group code is > > > compiled out). Perhaps the below check should be added as well. > > > Then in the time of bind, just check the noiommu_capable flag > > > and capable(CAP_SYS_RAWIO). > > > > > > if (!IS_ENABLED(CONFIG_VFIO_NOIOMMU) || !vfio_noiommu) > > > > Yes, and also only for physical devices > > Sure. BTW. in the time of creation, there is no vfio group yet. So may > just check if the device has a real iommu_group. Another alternative > is to set this flag at the time of vfio_device registration. Physical > device driver and emulated device driver uses different register APIs. > Hence they can be distinguished easily. Yes, at registration. In group mode it should copy the flag from the vfio_group, in non-group mode it should query the iommu_group Jason
diff --git a/drivers/vfio/device_cdev.c b/drivers/vfio/device_cdev.c index 1c640016a824..568cc9da16c7 100644 --- a/drivers/vfio/device_cdev.c +++ b/drivers/vfio/device_cdev.c @@ -3,6 +3,7 @@ * Copyright (c) 2023 Intel Corporation. */ #include <linux/vfio.h> +#include <linux/iommufd.h> #include "vfio.h" @@ -44,6 +45,171 @@ int vfio_device_fops_cdev_open(struct inode *inode, struct file *filep) return ret; } +static void vfio_device_get_kvm_safe(struct vfio_device_file *df) +{ + spin_lock(&df->kvm_ref_lock); + if (df->kvm) + _vfio_device_get_kvm_safe(df->device, df->kvm); + spin_unlock(&df->kvm_ref_lock); +} + +void vfio_device_cdev_close(struct vfio_device_file *df) +{ + struct vfio_device *device = df->device; + + /* + * As df->access_granted writer is under dev_set->lock as well, + * so this read no need to use smp_load_acquire() to pair with + * smp_store_release() in the caller of vfio_device_open(). + */ + if (!df->access_granted) + return; + + mutex_lock(&device->dev_set->lock); + vfio_device_close(df); + vfio_device_put_kvm(device); + if (df->iommufd) + iommufd_ctx_put(df->iommufd); + mutex_unlock(&device->dev_set->lock); + vfio_device_unblock_group(device); +} + +static int vfio_device_cdev_probe_noiommu(struct vfio_device *device) +{ + struct iommu_group *iommu_group; + int ret = 0; + + if (!IS_ENABLED(CONFIG_VFIO_NOIOMMU) || !vfio_noiommu) + return -EINVAL; + + if (!capable(CAP_SYS_RAWIO)) + return -EPERM; + + iommu_group = iommu_group_get(device->dev); + if (!iommu_group) + return 0; + + /* + * We cannot support noiommu mode for devices that are protected + * by IOMMU. So check the iommu_group, if it is a no-iommu group + * created by VFIO, we support. If not, we refuse. + */ + if (!vfio_group_find_noiommu_group_from_iommu(iommu_group)) + ret = -EINVAL; + iommu_group_put(iommu_group); + return ret; +} + +static struct iommufd_ctx *vfio_get_iommufd_from_fd(int fd) +{ + struct fd f; + struct iommufd_ctx *iommufd; + + f = fdget(fd); + if (!f.file) + return ERR_PTR(-EBADF); + + iommufd = iommufd_ctx_from_file(f.file); + + fdput(f); + return iommufd; +} + +long vfio_device_ioctl_bind_iommufd(struct vfio_device_file *df, + struct vfio_device_bind_iommufd __user *arg) +{ + struct vfio_device *device = df->device; + struct vfio_device_bind_iommufd bind; + struct iommufd_ctx *iommufd = NULL; + unsigned long minsz; + int ret; + + static_assert(__same_type(arg->out_devid, bind.out_devid)); + + minsz = offsetofend(struct vfio_device_bind_iommufd, out_devid); + + if (copy_from_user(&bind, arg, minsz)) + return -EFAULT; + + if (bind.argsz < minsz || bind.flags) + return -EINVAL; + + if (!device->ops->bind_iommufd) + return -ENODEV; + + ret = vfio_device_block_group(device); + if (ret) + return ret; + + mutex_lock(&device->dev_set->lock); + /* If already got access, should fail it. */ + if (df->access_granted) { + ret = -EINVAL; + goto out_unlock; + } + + /* iommufd < 0 means noiommu mode */ + if (bind.iommufd < 0) { + ret = vfio_device_cdev_probe_noiommu(device); + if (ret) + goto out_unlock; + } else { + iommufd = vfio_get_iommufd_from_fd(bind.iommufd); + if (IS_ERR(iommufd)) { + ret = PTR_ERR(iommufd); + goto out_unlock; + } + } + + /* + * Before the device open, get the KVM pointer currently + * associated with the device file (if there is) and obtain + * a reference. This reference is held until device closed. + * Save the pointer in the device for use by drivers. + */ + vfio_device_get_kvm_safe(df); + + df->iommufd = iommufd; + ret = vfio_device_open(df); + if (ret) + goto out_put_kvm; + + if (df->iommufd) + bind.out_devid = df->devid; + else + bind.out_devid = IOMMUFD_INVALID_ID; + + ret = copy_to_user(&arg->out_devid, &bind.out_devid, + sizeof(bind.out_devid)) ? -EFAULT : 0; + if (ret) + goto out_close_device; + + if (bind.iommufd < 0) + dev_warn(device->dev, "device is bound to vfio-noiommu by user " + "(%s:%d)\n", current->comm, task_pid_nr(current)); + + /* + * Paired with smp_load_acquire() in vfio_device_fops::ioctl/ + * read/write/mmap + */ + smp_store_release(&df->access_granted, true); + mutex_unlock(&device->dev_set->lock); + + return 0; + +out_close_device: + vfio_device_close(df); +out_put_kvm: + df->iommufd = NULL; + vfio_device_put_kvm(device); + if (iommufd) + iommufd_ctx_put(iommufd); +out_unlock: + mutex_unlock(&device->dev_set->lock); + vfio_device_unblock_group(device); + return ret; +} + static char *vfio_device_devnode(const struct device *dev, umode_t *mode) { return kasprintf(GFP_KERNEL, "vfio/devices/%s", dev_name(dev)); diff --git a/drivers/vfio/group.c b/drivers/vfio/group.c index 51c027134814..fc49f2459b1a 100644 --- a/drivers/vfio/group.c +++ b/drivers/vfio/group.c @@ -701,6 +701,21 @@ static struct vfio_group *vfio_group_find_or_alloc(struct device *dev) return group; } +struct vfio_group * +vfio_group_find_noiommu_group_from_iommu(struct iommu_group *iommu_group) +{ + struct vfio_group *group; + bool found = false; + + mutex_lock(&vfio.group_lock); + group = vfio_group_find_from_iommu(iommu_group); + if (group && group->type == VFIO_NO_IOMMU) + found = true; + mutex_unlock(&vfio.group_lock); + + return found ? group : NULL; +} + int vfio_device_set_group(struct vfio_device *device, enum vfio_group_type type) { diff --git a/drivers/vfio/vfio.h b/drivers/vfio/vfio.h index 3f359f04b754..5df737b24102 100644 --- a/drivers/vfio/vfio.h +++ b/drivers/vfio/vfio.h @@ -91,6 +91,8 @@ struct vfio_group { int vfio_device_block_group(struct vfio_device *device); void vfio_device_unblock_group(struct vfio_device *device); +struct vfio_group * +vfio_group_find_noiommu_group_from_iommu(struct iommu_group *iommu_group); int vfio_device_set_group(struct vfio_device *device, enum vfio_group_type type); void vfio_device_remove_group(struct vfio_device *device); @@ -273,6 +275,9 @@ static inline void vfio_device_del(struct vfio_device *device) void vfio_init_device_cdev(struct vfio_device *device); int vfio_device_fops_cdev_open(struct inode *inode, struct file *filep); +void vfio_device_cdev_close(struct vfio_device_file *df); +long vfio_device_ioctl_bind_iommufd(struct vfio_device_file *df, + struct vfio_device_bind_iommufd __user *arg); int vfio_cdev_init(struct class *device_class); void vfio_cdev_cleanup(void); #else @@ -296,6 +301,16 @@ static inline int vfio_device_fops_cdev_open(struct inode *inode, return 0; } +static inline void vfio_device_cdev_close(struct vfio_device_file *df) +{ +} + +static inline long vfio_device_ioctl_bind_iommufd(struct vfio_device_file *df, + struct vfio_device_bind_iommufd __user *arg) +{ + return -EOPNOTSUPP; +} + static inline int vfio_cdev_init(struct class *device_class) { return 0; diff --git a/drivers/vfio/vfio_main.c b/drivers/vfio/vfio_main.c index b0c2a7544524..08bb1705d02d 100644 --- a/drivers/vfio/vfio_main.c +++ b/drivers/vfio/vfio_main.c @@ -575,6 +575,8 @@ static int vfio_device_fops_release(struct inode *inode, struct file *filep) if (df->group) vfio_device_group_close(df); + else + vfio_device_cdev_close(df); vfio_device_put_registration(device); @@ -1148,7 +1150,14 @@ static long vfio_device_fops_unl_ioctl(struct file *filep, struct vfio_device *device = df->device; int ret; - /* Paired with smp_store_release() in vfio_device_group_open() */ + if (cmd == VFIO_DEVICE_BIND_IOMMUFD) + return vfio_device_ioctl_bind_iommufd(df, (void __user *)arg); + + /* + * Paired with smp_store_release() in the caller of + * vfio_device_open(). e.g. vfio_device_group_open() + * and vfio_device_ioctl_bind_iommufd() + */ if (!smp_load_acquire(&df->access_granted)) return -EINVAL; @@ -1179,7 +1188,11 @@ static ssize_t vfio_device_fops_read(struct file *filep, char __user *buf, struct vfio_device_file *df = filep->private_data; struct vfio_device *device = df->device; - /* Paired with smp_store_release() in vfio_device_group_open() */ + /* + * Paired with smp_store_release() in the caller of + * vfio_device_open(). e.g. vfio_device_group_open() + * and vfio_device_ioctl_bind_iommufd() + */ if (!smp_load_acquire(&df->access_granted)) return -EINVAL; @@ -1196,7 +1209,11 @@ static ssize_t vfio_device_fops_write(struct file *filep, struct vfio_device_file *df = filep->private_data; struct vfio_device *device = df->device; - /* Paired with smp_store_release() in vfio_device_group_open() */ + /* + * Paired with smp_store_release() in the caller of + * vfio_device_open(). e.g. vfio_device_group_open() + * and vfio_device_ioctl_bind_iommufd() + */ if (!smp_load_acquire(&df->access_granted)) return -EINVAL; @@ -1211,7 +1228,11 @@ static int vfio_device_fops_mmap(struct file *filep, struct vm_area_struct *vma) struct vfio_device_file *df = filep->private_data; struct vfio_device *device = df->device; - /* Paired with smp_store_release() in vfio_device_group_open() */ + /* + * Paired with smp_store_release() in the caller of + * vfio_device_open(). e.g. vfio_device_group_open() + * and vfio_device_ioctl_bind_iommufd() + */ if (!smp_load_acquire(&df->access_granted)) return -EINVAL; diff --git a/include/uapi/linux/vfio.h b/include/uapi/linux/vfio.h index 382d95455f89..a53afe349a34 100644 --- a/include/uapi/linux/vfio.h +++ b/include/uapi/linux/vfio.h @@ -194,6 +194,43 @@ struct vfio_group_status { /* --------------- IOCTLs for DEVICE file descriptors --------------- */ +/* + * VFIO_DEVICE_BIND_IOMMUFD - _IOR(VFIO_TYPE, VFIO_BASE + 19, + * struct vfio_device_bind_iommufd) + * + * Bind a vfio_device to the specified iommufd. + * + * The user should provide a device cookie when calling this ioctl. The + * cookie is carried only in event e.g. I/O fault reported to userspace + * via iommufd. The user should use devid returned by this ioctl to mark + * the target device in other ioctls (e.g. iommu hardware infomration query + * via iommufd, and etc.). + * + * User is not allowed to access the device before the binding operation + * is completed. + * + * Unbind is automatically conducted when device fd is closed. + * + * @argsz: user filled size of this data. + * @flags: reserved for future extension. + * @dev_cookie: a per device cookie provided by userspace. + * @iommufd: iommufd to bind. a negative value means noiommu. + * @out_devid: the device id generated by this bind. This field is valid + * as long as the input @iommufd is valid. Otherwise, it is + * meaningless. + * + * Return: 0 on success, -errno on failure. + */ +struct vfio_device_bind_iommufd { + __u32 argsz; + __u32 flags; + __aligned_u64 dev_cookie; + __s32 iommufd; + __u32 out_devid; +}; + +#define VFIO_DEVICE_BIND_IOMMUFD _IO(VFIO_TYPE, VFIO_BASE + 19) + /** * VFIO_DEVICE_GET_INFO - _IOR(VFIO_TYPE, VFIO_BASE + 7, * struct vfio_device_info)
This adds ioctl for userspace to bind device cdev fd to iommufd. VFIO_DEVICE_BIND_IOMMUFD: bind device to an iommufd, hence gain DMA control provided by the iommufd. open_device op is called after bind_iommufd op. VFIO no iommu mode is indicated by passing a negative iommufd value. Signed-off-by: Yi Liu <yi.l.liu@intel.com> --- drivers/vfio/device_cdev.c | 166 +++++++++++++++++++++++++++++++++++++ drivers/vfio/group.c | 15 ++++ drivers/vfio/vfio.h | 15 ++++ drivers/vfio/vfio_main.c | 29 ++++++- include/uapi/linux/vfio.h | 37 +++++++++ 5 files changed, 258 insertions(+), 4 deletions(-)