Message ID | 6348cc7a72ce9f2ac0e9caf9737e70177a01eb74.1724776335.git.nicolinc@nvidia.com (mailing list archive) |
---|---|
State | New |
Headers | show |
Series | iommufd: Add VIOMMU infrastructure (Part-1) | expand |
On Tue, Aug 27, 2024 at 09:59:43AM -0700, Nicolin Chen wrote: > Introduce a pair of new ioctls to set/unset a per-viommu virtual device id > that should be linked to a physical device id via an idev pointer. Given some of the other discussions around CC I suspect we should rename these to 'create/destroy virtual device' with an eye that eventually they would be extended like other ops with per-CC platform data. ie this would be the interface to tell the CC trusted world that a secure device is being added to a VM with some additional flags.. Right now it only conveys the vRID parameter of the virtual device being created. A following question is if these objects should have their own IDs in the iommufd space too, and then unset is not unset but just a normal destroy object. If so then the thing you put in the ids xarray would also just be a normal object struct. This is probably worth doing if this is going to grow more CC stuff later. Jason
On Thu, Sep 05, 2024 at 01:03:53PM -0300, Jason Gunthorpe wrote: > On Tue, Aug 27, 2024 at 09:59:43AM -0700, Nicolin Chen wrote: > > Introduce a pair of new ioctls to set/unset a per-viommu virtual device id > > that should be linked to a physical device id via an idev pointer. > > Given some of the other discussions around CC I suspect we should > rename these to 'create/destroy virtual device' with an eye that > eventually they would be extended like other ops with per-CC platform > data. > > ie this would be the interface to tell the CC trusted world that a > secure device is being added to a VM with some additional flags.. > > Right now it only conveys the vRID parameter of the virtual device > being created. > > A following question is if these objects should have their own IDs in > the iommufd space too, and then unset is not unset but just a normal > destroy object. If so then the thing you put in the ids xarray would > also just be a normal object struct. > > This is probably worth doing if this is going to grow more CC stuff > later. Having to admit that I have been struggling to find a better name than set_vdev_id, I also thought about something similar to that "create/destroy virtual device', yet was not that confident since we only have virtual device ID in its data structure. Also, the virtual device sounds a bit confusing, given we already have idev. That being said, if we have a clear picture that in the long term we would extend it to hold more information, I think it could be a smart move. Perhaps virtual device can have its own "attach" to vIOMMU? Or would you still prefer attaching via proxy hwpt_nested? Thanks Nicolin
On Thu, Sep 05, 2024 at 10:37:45AM -0700, Nicolin Chen wrote: > we only have virtual device ID in its data structure. Also, the > virtual device sounds a bit confusing, given we already have idev. idev is "iommufd device" which is the physical device The virtual device is the host side handle of a device in a VM. > That being said, if we have a clear picture that in the long term > we would extend it to hold more information, I think it could be > a smart move. > > Perhaps virtual device can have its own "attach" to vIOMMU? Or > would you still prefer attaching via proxy hwpt_nested? I was thinking just creating it against a vIOMMU is an effective "attach" and the virtual device is permanently tied to the vIOMMU at creation time. Is there more to attach? I think some CC stuff had a few more verbs in the lifecycle though Jason
On Thu, Sep 05, 2024 at 02:43:26PM -0300, Jason Gunthorpe wrote: > On Thu, Sep 05, 2024 at 10:37:45AM -0700, Nicolin Chen wrote: > > > we only have virtual device ID in its data structure. Also, the > > virtual device sounds a bit confusing, given we already have idev. > > idev is "iommufd device" which is the physical device > > The virtual device is the host side handle of a device in a VM. Yea, we need that narrative in kdoc to clearly separate them. > > That being said, if we have a clear picture that in the long term > > we would extend it to hold more information, I think it could be > > a smart move. > > > > Perhaps virtual device can have its own "attach" to vIOMMU? Or > > would you still prefer attaching via proxy hwpt_nested? > > I was thinking just creating it against a vIOMMU is an effective > "attach" and the virtual device is permanently tied to the vIOMMU at > creation time. Ah, right! The create is per-viommu, so it's being attached. Nicolin
> From: Nicolin Chen <nicolinc@nvidia.com> > Sent: Friday, September 6, 2024 4:15 AM > > On Thu, Sep 05, 2024 at 02:43:26PM -0300, Jason Gunthorpe wrote: > > On Thu, Sep 05, 2024 at 10:37:45AM -0700, Nicolin Chen wrote: > > > That being said, if we have a clear picture that in the long term > > > we would extend it to hold more information, I think it could be > > > a smart move. > > > > > > Perhaps virtual device can have its own "attach" to vIOMMU? Or > > > would you still prefer attaching via proxy hwpt_nested? > > > > I was thinking just creating it against a vIOMMU is an effective > > "attach" and the virtual device is permanently tied to the vIOMMU at > > creation time. > > Ah, right! The create is per-viommu, so it's being attached. > presumably we also need check compatibility between the idev which the virtual device is created against and the stage-2 pgtable, as a normal attach required?
On Wed, Sep 11, 2024 at 06:19:10AM +0000, Tian, Kevin wrote: > > From: Nicolin Chen <nicolinc@nvidia.com> > > Sent: Friday, September 6, 2024 4:15 AM > > > > On Thu, Sep 05, 2024 at 02:43:26PM -0300, Jason Gunthorpe wrote: > > > On Thu, Sep 05, 2024 at 10:37:45AM -0700, Nicolin Chen wrote: > > > > That being said, if we have a clear picture that in the long term > > > > we would extend it to hold more information, I think it could be > > > > a smart move. > > > > > > > > Perhaps virtual device can have its own "attach" to vIOMMU? Or > > > > would you still prefer attaching via proxy hwpt_nested? > > > > > > I was thinking just creating it against a vIOMMU is an effective > > > "attach" and the virtual device is permanently tied to the vIOMMU at > > > creation time. > > > > Ah, right! The create is per-viommu, so it's being attached. > > > > presumably we also need check compatibility between the idev > which the virtual device is created against and the stage-2 pgtable, > as a normal attach required? If that's required, it can be a part of "create virtual device", where idev and viommu (holding s2 hwpt) would be all available? Thanks Nicolin
> From: Nicolin Chen <nicolinc@nvidia.com> > Sent: Wednesday, September 11, 2024 3:12 PM > > On Wed, Sep 11, 2024 at 06:19:10AM +0000, Tian, Kevin wrote: > > > From: Nicolin Chen <nicolinc@nvidia.com> > > > Sent: Friday, September 6, 2024 4:15 AM > > > > > > On Thu, Sep 05, 2024 at 02:43:26PM -0300, Jason Gunthorpe wrote: > > > > On Thu, Sep 05, 2024 at 10:37:45AM -0700, Nicolin Chen wrote: > > > > > That being said, if we have a clear picture that in the long term > > > > > we would extend it to hold more information, I think it could be > > > > > a smart move. > > > > > > > > > > Perhaps virtual device can have its own "attach" to vIOMMU? Or > > > > > would you still prefer attaching via proxy hwpt_nested? > > > > > > > > I was thinking just creating it against a vIOMMU is an effective > > > > "attach" and the virtual device is permanently tied to the vIOMMU at > > > > creation time. > > > > > > Ah, right! The create is per-viommu, so it's being attached. > > > > > > > presumably we also need check compatibility between the idev > > which the virtual device is created against and the stage-2 pgtable, > > as a normal attach required? > > If that's required, it can be a part of "create virtual device", > where idev and viommu (holding s2 hwpt) would be all available? > yes
On Wed, Sep 11, 2024 at 07:18:52AM +0000, Tian, Kevin wrote: > > From: Nicolin Chen <nicolinc@nvidia.com> > > Sent: Wednesday, September 11, 2024 3:12 PM > > > > On Wed, Sep 11, 2024 at 06:19:10AM +0000, Tian, Kevin wrote: > > > > From: Nicolin Chen <nicolinc@nvidia.com> > > > > Sent: Friday, September 6, 2024 4:15 AM > > > > > > > > On Thu, Sep 05, 2024 at 02:43:26PM -0300, Jason Gunthorpe wrote: > > > > > On Thu, Sep 05, 2024 at 10:37:45AM -0700, Nicolin Chen wrote: > > > > > > That being said, if we have a clear picture that in the long term > > > > > > we would extend it to hold more information, I think it could be > > > > > > a smart move. > > > > > > > > > > > > Perhaps virtual device can have its own "attach" to vIOMMU? Or > > > > > > would you still prefer attaching via proxy hwpt_nested? > > > > > > > > > > I was thinking just creating it against a vIOMMU is an effective > > > > > "attach" and the virtual device is permanently tied to the vIOMMU at > > > > > creation time. > > > > > > > > Ah, right! The create is per-viommu, so it's being attached. > > > > > > > > > > presumably we also need check compatibility between the idev > > > which the virtual device is created against and the stage-2 pgtable, > > > as a normal attach required? > > > > If that's required, it can be a part of "create virtual device", > > where idev and viommu (holding s2 hwpt) would be all available? > > > > yes Oh, I misread your question actually. I think it's about a matching validation between dev->iommu->iommu_dev and vIOMMU->iommu_dev. Thanks Nicolin
Hi Nicolin, On Tue, Aug 27, 2024 at 09:59:43AM -0700, Nicolin Chen wrote: > Introduce a pair of new ioctls to set/unset a per-viommu virtual device id > that should be linked to a physical device id via an idev pointer. > > Continue the support IOMMU_VIOMMU_TYPE_DEFAULT for a core-managed viommu. > Provide a lookup function for drivers to load device pointer by a virtual > device id. > > Add a rw_semaphore protection around the vdev_id list. Any future ioctl > handlers that potentially access the list must grab the lock too. > > Signed-off-by: Nicolin Chen <nicolinc@nvidia.com> > --- > drivers/iommu/iommufd/device.c | 12 +++ > drivers/iommu/iommufd/iommufd_private.h | 21 ++++ > drivers/iommu/iommufd/main.c | 6 ++ > drivers/iommu/iommufd/viommu.c | 121 ++++++++++++++++++++++++ > include/uapi/linux/iommufd.h | 40 ++++++++ > 5 files changed, 200 insertions(+) > > diff --git a/drivers/iommu/iommufd/device.c b/drivers/iommu/iommufd/device.c > index 5fd3dd420290..3ad759971b32 100644 > --- a/drivers/iommu/iommufd/device.c > +++ b/drivers/iommu/iommufd/device.c > @@ -136,6 +136,18 @@ void iommufd_device_destroy(struct iommufd_object *obj) > struct iommufd_device *idev = > container_of(obj, struct iommufd_device, obj); > > + /* Unlocked since there should be no race in a destroy() */ > + if (idev->vdev_id) { > + struct iommufd_vdev_id *vdev_id = idev->vdev_id; > + struct iommufd_viommu *viommu = vdev_id->viommu; > + struct iommufd_vdev_id *old; > + > + old = xa_cmpxchg(&viommu->vdev_ids, vdev_id->id, vdev_id, NULL, > + GFP_KERNEL); > + WARN_ON(old != vdev_id); > + kfree(vdev_id); > + idev->vdev_id = NULL; > + } > iommu_device_release_dma_owner(idev->dev); > iommufd_put_group(idev->igroup); > if (!iommufd_selftest_is_mock_dev(idev->dev)) > diff --git a/drivers/iommu/iommufd/iommufd_private.h b/drivers/iommu/iommufd/iommufd_private.h > index 1f2a1c133b9a..2c6e168c5300 100644 > --- a/drivers/iommu/iommufd/iommufd_private.h > +++ b/drivers/iommu/iommufd/iommufd_private.h > @@ -416,6 +416,7 @@ struct iommufd_device { > struct iommufd_object obj; > struct iommufd_ctx *ictx; > struct iommufd_group *igroup; > + struct iommufd_vdev_id *vdev_id; > struct list_head group_item; > /* always the physical device */ > struct device *dev; > @@ -533,11 +534,31 @@ struct iommufd_viommu { > struct iommufd_ctx *ictx; > struct iommufd_hwpt_paging *hwpt; > > + /* The locking order is vdev_ids_rwsem -> igroup::lock */ > + struct rw_semaphore vdev_ids_rwsem; > + struct xarray vdev_ids; > + > unsigned int type; > }; > > +struct iommufd_vdev_id { > + struct iommufd_viommu *viommu; > + struct iommufd_device *idev; > + u64 id; > +}; > + > +static inline struct iommufd_viommu * > +iommufd_get_viommu(struct iommufd_ucmd *ucmd, u32 id) > +{ > + return container_of(iommufd_get_object(ucmd->ictx, id, > + IOMMUFD_OBJ_VIOMMU), > + struct iommufd_viommu, obj); > +} > + > int iommufd_viommu_alloc_ioctl(struct iommufd_ucmd *ucmd); > void iommufd_viommu_destroy(struct iommufd_object *obj); > +int iommufd_viommu_set_vdev_id(struct iommufd_ucmd *ucmd); > +int iommufd_viommu_unset_vdev_id(struct iommufd_ucmd *ucmd); > > #ifdef CONFIG_IOMMUFD_TEST > int iommufd_test(struct iommufd_ucmd *ucmd); > diff --git a/drivers/iommu/iommufd/main.c b/drivers/iommu/iommufd/main.c > index 288ee51b6829..199ad90fa36b 100644 > --- a/drivers/iommu/iommufd/main.c > +++ b/drivers/iommu/iommufd/main.c > @@ -334,6 +334,8 @@ union ucmd_buffer { > struct iommu_option option; > struct iommu_vfio_ioas vfio_ioas; > struct iommu_viommu_alloc viommu; > + struct iommu_viommu_set_vdev_id set_vdev_id; > + struct iommu_viommu_unset_vdev_id unset_vdev_id; > #ifdef CONFIG_IOMMUFD_TEST > struct iommu_test_cmd test; > #endif > @@ -387,6 +389,10 @@ static const struct iommufd_ioctl_op iommufd_ioctl_ops[] = { > __reserved), > IOCTL_OP(IOMMU_VIOMMU_ALLOC, iommufd_viommu_alloc_ioctl, > struct iommu_viommu_alloc, out_viommu_id), > + IOCTL_OP(IOMMU_VIOMMU_SET_VDEV_ID, iommufd_viommu_set_vdev_id, > + struct iommu_viommu_set_vdev_id, vdev_id), > + IOCTL_OP(IOMMU_VIOMMU_UNSET_VDEV_ID, iommufd_viommu_unset_vdev_id, > + struct iommu_viommu_unset_vdev_id, vdev_id), > #ifdef CONFIG_IOMMUFD_TEST > IOCTL_OP(IOMMU_TEST_CMD, iommufd_test, struct iommu_test_cmd, last), > #endif > diff --git a/drivers/iommu/iommufd/viommu.c b/drivers/iommu/iommufd/viommu.c > index 200653a4bf57..8ffcd72b16b8 100644 > --- a/drivers/iommu/iommufd/viommu.c > +++ b/drivers/iommu/iommufd/viommu.c > @@ -8,6 +8,15 @@ void iommufd_viommu_destroy(struct iommufd_object *obj) > { > struct iommufd_viommu *viommu = > container_of(obj, struct iommufd_viommu, obj); > + struct iommufd_vdev_id *vdev_id; > + unsigned long index; > + > + xa_for_each(&viommu->vdev_ids, index, vdev_id) { > + /* Unlocked since there should be no race in a destroy() */ > + vdev_id->idev->vdev_id = NULL; > + kfree(vdev_id); > + } > + xa_destroy(&viommu->vdev_ids); > > refcount_dec(&viommu->hwpt->common.obj.users); > } > @@ -53,6 +62,9 @@ int iommufd_viommu_alloc_ioctl(struct iommufd_ucmd *ucmd) > viommu->ictx = ucmd->ictx; > viommu->hwpt = hwpt_paging; > > + xa_init(&viommu->vdev_ids); > + init_rwsem(&viommu->vdev_ids_rwsem); > + > refcount_inc(&viommu->hwpt->common.obj.users); > > cmd->out_viommu_id = viommu->obj.id; > @@ -70,3 +82,112 @@ int iommufd_viommu_alloc_ioctl(struct iommufd_ucmd *ucmd) > iommufd_put_object(ucmd->ictx, &idev->obj); > return rc; > } > + > +int iommufd_viommu_set_vdev_id(struct iommufd_ucmd *ucmd) > +{ > + struct iommu_viommu_set_vdev_id *cmd = ucmd->cmd; > + struct iommufd_vdev_id *vdev_id, *curr; > + struct iommufd_viommu *viommu; > + struct iommufd_device *idev; > + int rc = 0; > + > + if (cmd->vdev_id > ULONG_MAX) > + return -EINVAL; > + > + viommu = iommufd_get_viommu(ucmd, cmd->viommu_id); > + if (IS_ERR(viommu)) > + return PTR_ERR(viommu); > + > + idev = iommufd_get_device(ucmd, cmd->dev_id); > + if (IS_ERR(idev)) { > + rc = PTR_ERR(idev); > + goto out_put_viommu; > + } > + > + down_write(&viommu->vdev_ids_rwsem); > + mutex_lock(&idev->igroup->lock); > + if (idev->vdev_id) { > + rc = -EEXIST; > + goto out_unlock_igroup; > + } > + > + vdev_id = kzalloc(sizeof(*vdev_id), GFP_KERNEL); > + if (!vdev_id) { > + rc = -ENOMEM; > + goto out_unlock_igroup; > + } > + > + vdev_id->idev = idev; > + vdev_id->viommu = viommu; > + vdev_id->id = cmd->vdev_id; My understanding of IOMMUFD is very little, but AFAICT, that means that it’s assumed that each device can only have one stream ID(RID)? As I can see in patch 17 in arm_smmu_convert_viommu_vdev_id(), it converts the virtual ID to a physical one using master->streams[0].id. Is that correct or am I missing something? As I am looking at similar problem for paravirtual IOMMU with pKVM, where the UAPI would be something similar to: GET_NUM_END_POINTS(dev) => nr_sids SET_END_POINT_VSID(dev, sid_index, vsid) Similar to what VFIO does with IRQs. As a device can have many SIDs. Thanks, Mostafa > + > + curr = xa_cmpxchg(&viommu->vdev_ids, cmd->vdev_id, NULL, vdev_id, > + GFP_KERNEL); > + if (curr) { > + rc = xa_err(curr) ? : -EBUSY; > + goto out_free; > + } > + > + idev->vdev_id = vdev_id; > + goto out_unlock_igroup; > + > +out_free: > + kfree(vdev_id); > +out_unlock_igroup: > + mutex_unlock(&idev->igroup->lock); > + up_write(&viommu->vdev_ids_rwsem); > + iommufd_put_object(ucmd->ictx, &idev->obj); > +out_put_viommu: > + iommufd_put_object(ucmd->ictx, &viommu->obj); > + return rc; > +} > + > +int iommufd_viommu_unset_vdev_id(struct iommufd_ucmd *ucmd) > +{ > + struct iommu_viommu_unset_vdev_id *cmd = ucmd->cmd; > + struct iommufd_viommu *viommu; > + struct iommufd_vdev_id *old; > + struct iommufd_device *idev; > + int rc = 0; > + > + if (cmd->vdev_id > ULONG_MAX) > + return -EINVAL; > + > + viommu = iommufd_get_viommu(ucmd, cmd->viommu_id); > + if (IS_ERR(viommu)) > + return PTR_ERR(viommu); > + > + idev = iommufd_get_device(ucmd, cmd->dev_id); > + if (IS_ERR(idev)) { > + rc = PTR_ERR(idev); > + goto out_put_viommu; > + } > + > + down_write(&viommu->vdev_ids_rwsem); > + mutex_lock(&idev->igroup->lock); > + if (!idev->vdev_id) { > + rc = -ENOENT; > + goto out_unlock_igroup; > + } > + if (idev->vdev_id->id != cmd->vdev_id) { > + rc = -EINVAL; > + goto out_unlock_igroup; > + } > + > + old = xa_cmpxchg(&viommu->vdev_ids, idev->vdev_id->id, > + idev->vdev_id, NULL, GFP_KERNEL); > + if (xa_is_err(old)) { > + rc = xa_err(old); > + goto out_unlock_igroup; > + } > + kfree(old); > + idev->vdev_id = NULL; > + > +out_unlock_igroup: > + mutex_unlock(&idev->igroup->lock); > + up_write(&viommu->vdev_ids_rwsem); > + iommufd_put_object(ucmd->ictx, &idev->obj); > +out_put_viommu: > + iommufd_put_object(ucmd->ictx, &viommu->obj); > + return rc; > +} > diff --git a/include/uapi/linux/iommufd.h b/include/uapi/linux/iommufd.h > index 51ce6a019c34..1816e89c922d 100644 > --- a/include/uapi/linux/iommufd.h > +++ b/include/uapi/linux/iommufd.h > @@ -52,6 +52,8 @@ enum { > IOMMUFD_CMD_HWPT_INVALIDATE = 0x8d, > IOMMUFD_CMD_FAULT_QUEUE_ALLOC = 0x8e, > IOMMUFD_CMD_VIOMMU_ALLOC = 0x8f, > + IOMMUFD_CMD_VIOMMU_SET_VDEV_ID = 0x90, > + IOMMUFD_CMD_VIOMMU_UNSET_VDEV_ID = 0x91, > }; > > /** > @@ -882,4 +884,42 @@ struct iommu_viommu_alloc { > __u32 out_viommu_id; > }; > #define IOMMU_VIOMMU_ALLOC _IO(IOMMUFD_TYPE, IOMMUFD_CMD_VIOMMU_ALLOC) > + > +/** > + * struct iommu_viommu_set_vdev_id - ioctl(IOMMU_VIOMMU_SET_VDEV_ID) > + * @size: sizeof(struct iommu_viommu_set_vdev_id) > + * @viommu_id: viommu ID to associate with the device to store its virtual ID > + * @dev_id: device ID to set its virtual ID > + * @__reserved: Must be 0 > + * @vdev_id: Virtual device ID > + * > + * Set a viommu-specific virtual ID of a device > + */ > +struct iommu_viommu_set_vdev_id { > + __u32 size; > + __u32 viommu_id; > + __u32 dev_id; > + __u32 __reserved; > + __aligned_u64 vdev_id; > +}; > +#define IOMMU_VIOMMU_SET_VDEV_ID _IO(IOMMUFD_TYPE, IOMMUFD_CMD_VIOMMU_SET_VDEV_ID) > + > +/** > + * struct iommu_viommu_unset_vdev_id - ioctl(IOMMU_VIOMMU_UNSET_VDEV_ID) > + * @size: sizeof(struct iommu_viommu_unset_vdev_id) > + * @viommu_id: viommu ID associated with the device to delete its virtual ID > + * @dev_id: device ID to unset its virtual ID > + * @__reserved: Must be 0 > + * @vdev_id: Virtual device ID (for verification) > + * > + * Unset a viommu-specific virtual ID of a device > + */ > +struct iommu_viommu_unset_vdev_id { > + __u32 size; > + __u32 viommu_id; > + __u32 dev_id; > + __u32 __reserved; > + __aligned_u64 vdev_id; > +}; > +#define IOMMU_VIOMMU_UNSET_VDEV_ID _IO(IOMMUFD_TYPE, IOMMUFD_CMD_VIOMMU_UNSET_VDEV_ID) > #endif > -- > 2.43.0 >
On Fri, Sep 27, 2024 at 01:50:52PM +0000, Mostafa Saleh wrote: > My understanding of IOMMUFD is very little, but AFAICT, that means that > it’s assumed that each device can only have one stream ID(RID)? > > As I can see in patch 17 in arm_smmu_convert_viommu_vdev_id(), it > converts the virtual ID to a physical one using master->streams[0].id. > > Is that correct or am I missing something? > > As I am looking at similar problem for paravirtual IOMMU with pKVM, where > the UAPI would be something similar to: > > GET_NUM_END_POINTS(dev) => nr_sids > > SET_END_POINT_VSID(dev, sid_index, vsid) > > Similar to what VFIO does with IRQs. > > As a device can have many SIDs. We don't support multi SID through this interface, at least in this version. To do multi-sid you have to inform the VM of all the different pSIDs the device has and then setup the vSID/pSID translation to map them all to the HW invalidation logic. Which is alot more steps, and we have no use case right now. Multi-sid is also not something I expect to see in any modern PCI device, and this is VFIO PCI... Jason
On Fri, Sep 27, 2024 at 11:01:41AM -0300, Jason Gunthorpe wrote: > On Fri, Sep 27, 2024 at 01:50:52PM +0000, Mostafa Saleh wrote: > > > My understanding of IOMMUFD is very little, but AFAICT, that means that > > it’s assumed that each device can only have one stream ID(RID)? > > > > As I can see in patch 17 in arm_smmu_convert_viommu_vdev_id(), it > > converts the virtual ID to a physical one using master->streams[0].id. > > > > Is that correct or am I missing something? > > > > As I am looking at similar problem for paravirtual IOMMU with pKVM, where > > the UAPI would be something similar to: > > > > GET_NUM_END_POINTS(dev) => nr_sids > > > > SET_END_POINT_VSID(dev, sid_index, vsid) > > > > Similar to what VFIO does with IRQs. > > > > As a device can have many SIDs. > > We don't support multi SID through this interface, at least in this > version. > > To do multi-sid you have to inform the VM of all the different pSIDs > the device has and then setup the vSID/pSID translation to map them > all to the HW invalidation logic. Why would the VM need to know the pSID? The way I view this is quite close to how irq works, the VM only views the GSI which is the virtualized number. The VMM then would need to configure vSID->pSID translation, also without knowing the actual pSID, just how many SIDs are there per-device; very similar to how it configures IRQs through VFIO_DEVICE_GET_INFO/VFIO_DEVICE_SET_IRQS. And as long as we only allow 1:1 vSID to pSID mapping, I guess it would be easy to implement. > > Which is alot more steps, and we have no use case right now. Multi-sid > is also not something I expect to see in any modern PCI device, and > this is VFIO PCI... > Ah, I thought IOMMUFD would be used instead of VFIO_TYPE1*, which should cover platform devices (VFIO-platform) or am I missing something? And multi-SIDs is common in platform devices and this would be quite restricting, and I was hoping to support the pKVM vIOMMU through IOMMUFD interface. If possible, can the UAPI be designed with this in mind, even if not implemented now? Thanks, Mostafa > Jason
On Fri, Sep 27, 2024 at 02:22:20PM +0000, Mostafa Saleh wrote: > > We don't support multi SID through this interface, at least in this > > version. > > > > To do multi-sid you have to inform the VM of all the different pSIDs > > the device has and then setup the vSID/pSID translation to map them > > all to the HW invalidation logic. > > Why would the VM need to know the pSID? It doesn't need to know the pSID exactly, but it needs to know all the pSIDs that exist and have them be labeled with vSIDs. With cmdq direct assignment the VM has to issue an ATS invalidation for each and every physical device using its vSID. There is no HW path to handle a 1:N v/p SID relationship. > Ah, I thought IOMMUFD would be used instead of VFIO_TYPE1*, which should cover > platform devices (VFIO-platform) or am I missing something? It does work with platform, but AFAIK nobody is interested in that so it hasn't been any focus. Enabling multi-sid nesting, sub stream ids, etc is some additional work I think. But what I mean is the really hard use case for the vSID/pSID mapping is ATS invalidation and you won't use ATS invalidation on platform so multi-sid likely doesn't matter. STE/CD invalidation could possibly be pushed down through the per-domain ioctl and replicated to all domain attachments. We don't have code in the series to do that, but it could work from a uAPI perspective. > If possible, can the UAPI be designed with this in mind, even if not > implemented now? It is reasonable to ask. I think things are extensible enough. I'd imagine we can add a flag 'secondary ID' and then a new field 'secondary ID index' to the vdev operations when someone wants to take this on. Jason
On Fri, Sep 27, 2024 at 3:58 PM Jason Gunthorpe <jgg@nvidia.com> wrote: > > On Fri, Sep 27, 2024 at 02:22:20PM +0000, Mostafa Saleh wrote: > > > > We don't support multi SID through this interface, at least in this > > > version. > > > > > > To do multi-sid you have to inform the VM of all the different pSIDs > > > the device has and then setup the vSID/pSID translation to map them > > > all to the HW invalidation logic. > > > > Why would the VM need to know the pSID? > > It doesn't need to know the pSID exactly, but it needs to know all the > pSIDs that exist and have them be labeled with vSIDs. > > With cmdq direct assignment the VM has to issue an ATS invalidation > for each and every physical device using its vSID. There is no HW path > to handle a 1:N v/p SID relationship. > I see, that's for the cmdq assignment. > > Ah, I thought IOMMUFD would be used instead of VFIO_TYPE1*, which should cover > > platform devices (VFIO-platform) or am I missing something? > > It does work with platform, but AFAIK nobody is interested in that so > it hasn't been any focus. Enabling multi-sid nesting, sub stream ids, > etc is some additional work I think. > > But what I mean is the really hard use case for the vSID/pSID mapping > is ATS invalidation and you won't use ATS invalidation on platform so > multi-sid likely doesn't matter. > > STE/CD invalidation could possibly be pushed down through the > per-domain ioctl and replicated to all domain attachments. We don't > have code in the series to do that, but it could work from a uAPI > perspective. > > > If possible, can the UAPI be designed with this in mind, even if not > > implemented now? > > It is reasonable to ask. I think things are extensible enough. I'd > imagine we can add a flag 'secondary ID' and then a new field > 'secondary ID index' to the vdev operations when someone wants to take > this on. > Makes sense, I can take this when I start doing the pKVM work with IOMMUFD, in case it wasn't supported by then. Thanks, Mostafa > Jason
On Thu, Sep 05, 2024 at 10:38:23AM -0700, Nicolin Chen wrote: > On Thu, Sep 05, 2024 at 01:03:53PM -0300, Jason Gunthorpe wrote: > > On Tue, Aug 27, 2024 at 09:59:43AM -0700, Nicolin Chen wrote: > > > Introduce a pair of new ioctls to set/unset a per-viommu virtual device id > > > that should be linked to a physical device id via an idev pointer. > > > > Given some of the other discussions around CC I suspect we should > > rename these to 'create/destroy virtual device' with an eye that > > eventually they would be extended like other ops with per-CC platform > > data. > > > > ie this would be the interface to tell the CC trusted world that a > > secure device is being added to a VM with some additional flags.. > > > > Right now it only conveys the vRID parameter of the virtual device > > being created. > > > > A following question is if these objects should have their own IDs in > > the iommufd space too, and then unset is not unset but just a normal > > destroy object. If so then the thing you put in the ids xarray would > > also just be a normal object struct. I found that adding it as a new object makes things a lot of easier since a vdevice can take refcounts of both viommu and idev. So both destroy() callbacks wouldn't be bothered. While confirming if I am missing something from the review comments, I am not quite sure what is "the thing you put in the ids xarray".. I only added a vRID xarray per viommu, yet that doesn't seem to be able to merge into the normal object struct. Mind elaborating? Thanks Nicolin > > This is probably worth doing if this is going to grow more CC stuff > > later. > > Having to admit that I have been struggling to find a better name > than set_vdev_id, I also thought about something similar to that > "create/destroy virtual device', yet was not that confident since > we only have virtual device ID in its data structure. Also, the > virtual device sounds a bit confusing, given we already have idev. > > That being said, if we have a clear picture that in the long term > we would extend it to hold more information, I think it could be > a smart move. > > Perhaps virtual device can have its own "attach" to vIOMMU? Or > would you still prefer attaching via proxy hwpt_nested? > > Thanks > Nicolin
On Tue, Oct 01, 2024 at 01:54:05AM -0700, Nicolin Chen wrote: > On Thu, Sep 05, 2024 at 10:38:23AM -0700, Nicolin Chen wrote: > > On Thu, Sep 05, 2024 at 01:03:53PM -0300, Jason Gunthorpe wrote: > > > On Tue, Aug 27, 2024 at 09:59:43AM -0700, Nicolin Chen wrote: > > > > Introduce a pair of new ioctls to set/unset a per-viommu virtual device id > > > > that should be linked to a physical device id via an idev pointer. > > > > > > Given some of the other discussions around CC I suspect we should > > > rename these to 'create/destroy virtual device' with an eye that > > > eventually they would be extended like other ops with per-CC platform > > > data. > > > > > > ie this would be the interface to tell the CC trusted world that a > > > secure device is being added to a VM with some additional flags.. > > > > > > Right now it only conveys the vRID parameter of the virtual device > > > being created. > > > > > > A following question is if these objects should have their own IDs in > > > the iommufd space too, and then unset is not unset but just a normal > > > destroy object. If so then the thing you put in the ids xarray would > > > also just be a normal object struct. > > I found that adding it as a new object makes things a lot of easier > since a vdevice can take refcounts of both viommu and idev. So both > destroy() callbacks wouldn't be bothered. > > While confirming if I am missing something from the review comments, > I am not quite sure what is "the thing you put in the ids xarray".. > I only added a vRID xarray per viommu, yet that doesn't seem to be > able to merge into the normal object struct. Mind elaborating? I would think to point the vRID xarray directly to the new iommufd vdevice object Jason
On Tue, Oct 01, 2024 at 10:46:20AM -0300, Jason Gunthorpe wrote: > On Tue, Oct 01, 2024 at 01:54:05AM -0700, Nicolin Chen wrote: > > On Thu, Sep 05, 2024 at 10:38:23AM -0700, Nicolin Chen wrote: > > > On Thu, Sep 05, 2024 at 01:03:53PM -0300, Jason Gunthorpe wrote: > > > > On Tue, Aug 27, 2024 at 09:59:43AM -0700, Nicolin Chen wrote: > > > > > Introduce a pair of new ioctls to set/unset a per-viommu virtual device id > > > > > that should be linked to a physical device id via an idev pointer. > > > > > > > > Given some of the other discussions around CC I suspect we should > > > > rename these to 'create/destroy virtual device' with an eye that > > > > eventually they would be extended like other ops with per-CC platform > > > > data. > > > > > > > > ie this would be the interface to tell the CC trusted world that a > > > > secure device is being added to a VM with some additional flags.. > > > > > > > > Right now it only conveys the vRID parameter of the virtual device > > > > being created. > > > > > > > > A following question is if these objects should have their own IDs in > > > > the iommufd space too, and then unset is not unset but just a normal > > > > destroy object. If so then the thing you put in the ids xarray would > > > > also just be a normal object struct. > > > > I found that adding it as a new object makes things a lot of easier > > since a vdevice can take refcounts of both viommu and idev. So both > > destroy() callbacks wouldn't be bothered. > > > > While confirming if I am missing something from the review comments, > > I am not quite sure what is "the thing you put in the ids xarray".. > > I only added a vRID xarray per viommu, yet that doesn't seem to be > > able to merge into the normal object struct. Mind elaborating? > > I would think to point the vRID xarray directly to the new iommufd > vdevice object Oh, I think I already have that then: ictx->xarray: objIds <-> { IOAS|HWPT|VIOMMU|VDEVICE|.. }->objs viommu->xarray: vRids <-> { VDEVICE } pointers Thanks Nicolin
On 28/8/24 02:59, Nicolin Chen wrote: > Introduce a pair of new ioctls to set/unset a per-viommu virtual device id > that should be linked to a physical device id via an idev pointer. > > Continue the support IOMMU_VIOMMU_TYPE_DEFAULT for a core-managed viommu. > Provide a lookup function for drivers to load device pointer by a virtual > device id. > > Add a rw_semaphore protection around the vdev_id list. Any future ioctl > handlers that potentially access the list must grab the lock too. > > Signed-off-by: Nicolin Chen <nicolinc@nvidia.com> > --- > drivers/iommu/iommufd/device.c | 12 +++ > drivers/iommu/iommufd/iommufd_private.h | 21 ++++ > drivers/iommu/iommufd/main.c | 6 ++ > drivers/iommu/iommufd/viommu.c | 121 ++++++++++++++++++++++++ > include/uapi/linux/iommufd.h | 40 ++++++++ > 5 files changed, 200 insertions(+) > > diff --git a/drivers/iommu/iommufd/device.c b/drivers/iommu/iommufd/device.c > index 5fd3dd420290..3ad759971b32 100644 > --- a/drivers/iommu/iommufd/device.c > +++ b/drivers/iommu/iommufd/device.c > @@ -136,6 +136,18 @@ void iommufd_device_destroy(struct iommufd_object *obj) > struct iommufd_device *idev = > container_of(obj, struct iommufd_device, obj); > > + /* Unlocked since there should be no race in a destroy() */ > + if (idev->vdev_id) { > + struct iommufd_vdev_id *vdev_id = idev->vdev_id; > + struct iommufd_viommu *viommu = vdev_id->viommu; > + struct iommufd_vdev_id *old; > + > + old = xa_cmpxchg(&viommu->vdev_ids, vdev_id->id, vdev_id, NULL, > + GFP_KERNEL); > + WARN_ON(old != vdev_id); > + kfree(vdev_id); > + idev->vdev_id = NULL; > + } > iommu_device_release_dma_owner(idev->dev); > iommufd_put_group(idev->igroup); > if (!iommufd_selftest_is_mock_dev(idev->dev)) > diff --git a/drivers/iommu/iommufd/iommufd_private.h b/drivers/iommu/iommufd/iommufd_private.h > index 1f2a1c133b9a..2c6e168c5300 100644 > --- a/drivers/iommu/iommufd/iommufd_private.h > +++ b/drivers/iommu/iommufd/iommufd_private.h > @@ -416,6 +416,7 @@ struct iommufd_device { > struct iommufd_object obj; > struct iommufd_ctx *ictx; > struct iommufd_group *igroup; > + struct iommufd_vdev_id *vdev_id; > struct list_head group_item; > /* always the physical device */ > struct device *dev; > @@ -533,11 +534,31 @@ struct iommufd_viommu { > struct iommufd_ctx *ictx; > struct iommufd_hwpt_paging *hwpt; > > + /* The locking order is vdev_ids_rwsem -> igroup::lock */ > + struct rw_semaphore vdev_ids_rwsem; > + struct xarray vdev_ids; > + > unsigned int type; > }; > > +struct iommufd_vdev_id { > + struct iommufd_viommu *viommu; > + struct iommufd_device *idev; > + u64 id; > +}; > + > +static inline struct iommufd_viommu * > +iommufd_get_viommu(struct iommufd_ucmd *ucmd, u32 id) > +{ > + return container_of(iommufd_get_object(ucmd->ictx, id, > + IOMMUFD_OBJ_VIOMMU), > + struct iommufd_viommu, obj); > +} > + > int iommufd_viommu_alloc_ioctl(struct iommufd_ucmd *ucmd); > void iommufd_viommu_destroy(struct iommufd_object *obj); > +int iommufd_viommu_set_vdev_id(struct iommufd_ucmd *ucmd); > +int iommufd_viommu_unset_vdev_id(struct iommufd_ucmd *ucmd); > > #ifdef CONFIG_IOMMUFD_TEST > int iommufd_test(struct iommufd_ucmd *ucmd); > diff --git a/drivers/iommu/iommufd/main.c b/drivers/iommu/iommufd/main.c > index 288ee51b6829..199ad90fa36b 100644 > --- a/drivers/iommu/iommufd/main.c > +++ b/drivers/iommu/iommufd/main.c > @@ -334,6 +334,8 @@ union ucmd_buffer { > struct iommu_option option; > struct iommu_vfio_ioas vfio_ioas; > struct iommu_viommu_alloc viommu; > + struct iommu_viommu_set_vdev_id set_vdev_id; > + struct iommu_viommu_unset_vdev_id unset_vdev_id; > #ifdef CONFIG_IOMMUFD_TEST > struct iommu_test_cmd test; > #endif > @@ -387,6 +389,10 @@ static const struct iommufd_ioctl_op iommufd_ioctl_ops[] = { > __reserved), > IOCTL_OP(IOMMU_VIOMMU_ALLOC, iommufd_viommu_alloc_ioctl, > struct iommu_viommu_alloc, out_viommu_id), > + IOCTL_OP(IOMMU_VIOMMU_SET_VDEV_ID, iommufd_viommu_set_vdev_id, > + struct iommu_viommu_set_vdev_id, vdev_id), > + IOCTL_OP(IOMMU_VIOMMU_UNSET_VDEV_ID, iommufd_viommu_unset_vdev_id, > + struct iommu_viommu_unset_vdev_id, vdev_id), > #ifdef CONFIG_IOMMUFD_TEST > IOCTL_OP(IOMMU_TEST_CMD, iommufd_test, struct iommu_test_cmd, last), > #endif > diff --git a/drivers/iommu/iommufd/viommu.c b/drivers/iommu/iommufd/viommu.c > index 200653a4bf57..8ffcd72b16b8 100644 > --- a/drivers/iommu/iommufd/viommu.c > +++ b/drivers/iommu/iommufd/viommu.c > @@ -8,6 +8,15 @@ void iommufd_viommu_destroy(struct iommufd_object *obj) > { > struct iommufd_viommu *viommu = > container_of(obj, struct iommufd_viommu, obj); > + struct iommufd_vdev_id *vdev_id; > + unsigned long index; > + > + xa_for_each(&viommu->vdev_ids, index, vdev_id) { > + /* Unlocked since there should be no race in a destroy() */ > + vdev_id->idev->vdev_id = NULL; > + kfree(vdev_id); > + } > + xa_destroy(&viommu->vdev_ids); > > refcount_dec(&viommu->hwpt->common.obj.users); > } > @@ -53,6 +62,9 @@ int iommufd_viommu_alloc_ioctl(struct iommufd_ucmd *ucmd) > viommu->ictx = ucmd->ictx; > viommu->hwpt = hwpt_paging; > > + xa_init(&viommu->vdev_ids); > + init_rwsem(&viommu->vdev_ids_rwsem); > + > refcount_inc(&viommu->hwpt->common.obj.users); > > cmd->out_viommu_id = viommu->obj.id; > @@ -70,3 +82,112 @@ int iommufd_viommu_alloc_ioctl(struct iommufd_ucmd *ucmd) > iommufd_put_object(ucmd->ictx, &idev->obj); > return rc; > } > + > +int iommufd_viommu_set_vdev_id(struct iommufd_ucmd *ucmd) > +{ > + struct iommu_viommu_set_vdev_id *cmd = ucmd->cmd; > + struct iommufd_vdev_id *vdev_id, *curr; > + struct iommufd_viommu *viommu; > + struct iommufd_device *idev; > + int rc = 0; > + > + if (cmd->vdev_id > ULONG_MAX) > + return -EINVAL; > + > + viommu = iommufd_get_viommu(ucmd, cmd->viommu_id); > + if (IS_ERR(viommu)) > + return PTR_ERR(viommu); > + > + idev = iommufd_get_device(ucmd, cmd->dev_id); > + if (IS_ERR(idev)) { > + rc = PTR_ERR(idev); > + goto out_put_viommu; > + } > + > + down_write(&viommu->vdev_ids_rwsem); > + mutex_lock(&idev->igroup->lock); > + if (idev->vdev_id) { > + rc = -EEXIST; > + goto out_unlock_igroup; > + } > + > + vdev_id = kzalloc(sizeof(*vdev_id), GFP_KERNEL); > + if (!vdev_id) { > + rc = -ENOMEM; > + goto out_unlock_igroup; > + } > + > + vdev_id->idev = idev; > + vdev_id->viommu = viommu; > + vdev_id->id = cmd->vdev_id; > + > + curr = xa_cmpxchg(&viommu->vdev_ids, cmd->vdev_id, NULL, vdev_id, > + GFP_KERNEL); > + if (curr) { > + rc = xa_err(curr) ? : -EBUSY; > + goto out_free; > + } > + > + idev->vdev_id = vdev_id; > + goto out_unlock_igroup; > + > +out_free: > + kfree(vdev_id); > +out_unlock_igroup: > + mutex_unlock(&idev->igroup->lock); > + up_write(&viommu->vdev_ids_rwsem); > + iommufd_put_object(ucmd->ictx, &idev->obj); > +out_put_viommu: > + iommufd_put_object(ucmd->ictx, &viommu->obj); > + return rc; > +} > + > +int iommufd_viommu_unset_vdev_id(struct iommufd_ucmd *ucmd) > +{ > + struct iommu_viommu_unset_vdev_id *cmd = ucmd->cmd; > + struct iommufd_viommu *viommu; > + struct iommufd_vdev_id *old; > + struct iommufd_device *idev; > + int rc = 0; > + > + if (cmd->vdev_id > ULONG_MAX) > + return -EINVAL; > + > + viommu = iommufd_get_viommu(ucmd, cmd->viommu_id); > + if (IS_ERR(viommu)) > + return PTR_ERR(viommu); > + > + idev = iommufd_get_device(ucmd, cmd->dev_id); > + if (IS_ERR(idev)) { > + rc = PTR_ERR(idev); > + goto out_put_viommu; > + } > + > + down_write(&viommu->vdev_ids_rwsem); > + mutex_lock(&idev->igroup->lock); > + if (!idev->vdev_id) { > + rc = -ENOENT; > + goto out_unlock_igroup; > + } > + if (idev->vdev_id->id != cmd->vdev_id) { > + rc = -EINVAL; > + goto out_unlock_igroup; > + } > + > + old = xa_cmpxchg(&viommu->vdev_ids, idev->vdev_id->id, > + idev->vdev_id, NULL, GFP_KERNEL); > + if (xa_is_err(old)) { > + rc = xa_err(old); > + goto out_unlock_igroup; > + } > + kfree(old); > + idev->vdev_id = NULL; > + > +out_unlock_igroup: > + mutex_unlock(&idev->igroup->lock); > + up_write(&viommu->vdev_ids_rwsem); > + iommufd_put_object(ucmd->ictx, &idev->obj); > +out_put_viommu: > + iommufd_put_object(ucmd->ictx, &viommu->obj); > + return rc; > +} > diff --git a/include/uapi/linux/iommufd.h b/include/uapi/linux/iommufd.h > index 51ce6a019c34..1816e89c922d 100644 > --- a/include/uapi/linux/iommufd.h > +++ b/include/uapi/linux/iommufd.h > @@ -52,6 +52,8 @@ enum { > IOMMUFD_CMD_HWPT_INVALIDATE = 0x8d, > IOMMUFD_CMD_FAULT_QUEUE_ALLOC = 0x8e, > IOMMUFD_CMD_VIOMMU_ALLOC = 0x8f, > + IOMMUFD_CMD_VIOMMU_SET_VDEV_ID = 0x90, > + IOMMUFD_CMD_VIOMMU_UNSET_VDEV_ID = 0x91, > }; > > /** > @@ -882,4 +884,42 @@ struct iommu_viommu_alloc { > __u32 out_viommu_id; > }; > #define IOMMU_VIOMMU_ALLOC _IO(IOMMUFD_TYPE, IOMMUFD_CMD_VIOMMU_ALLOC) > + > +/** > + * struct iommu_viommu_set_vdev_id - ioctl(IOMMU_VIOMMU_SET_VDEV_ID) > + * @size: sizeof(struct iommu_viommu_set_vdev_id) > + * @viommu_id: viommu ID to associate with the device to store its virtual ID > + * @dev_id: device ID to set its virtual ID > + * @__reserved: Must be 0 > + * @vdev_id: Virtual device ID > + * > + * Set a viommu-specific virtual ID of a device > + */ > +struct iommu_viommu_set_vdev_id { > + __u32 size; > + __u32 viommu_id; > + __u32 dev_id; Is this ID from vfio_device_bind_iommufd.out_devid? > + __u32 __reserved; > + __aligned_u64 vdev_id; What is the nature of this id? It is not the guest's BDFn, is it? The code suggests it is ARM's "SID" == "stream ID" and "a device might be able to generate multiple StreamIDs" (how, why?)
On Fri, Oct 04, 2024 at 02:32:28PM +1000, Alexey Kardashevskiy wrote: > > +/** > > + * struct iommu_viommu_set_vdev_id - ioctl(IOMMU_VIOMMU_SET_VDEV_ID) > > + * @size: sizeof(struct iommu_viommu_set_vdev_id) > > + * @viommu_id: viommu ID to associate with the device to store its virtual ID > > + * @dev_id: device ID to set its virtual ID > > + * @__reserved: Must be 0 > > + * @vdev_id: Virtual device ID > > + * > > + * Set a viommu-specific virtual ID of a device > > + */ > > +struct iommu_viommu_set_vdev_id { > > + __u32 size; > > + __u32 viommu_id; > > + __u32 dev_id; > > Is this ID from vfio_device_bind_iommufd.out_devid? Yes. > > + __u32 __reserved; > > + __aligned_u64 vdev_id; > > What is the nature of this id? It is not the guest's BDFn, is it? The Not exactly but certainly can be related. Explaining below.. > code suggests it is ARM's "SID" == "stream ID" and " Yes. That's the first use case of that. > a device might be > able to generate multiple StreamIDs" (how, why?)
On Fri, Oct 04, 2024 at 02:32:28PM +1000, Alexey Kardashevskiy wrote: > > + __u32 __reserved; > > + __aligned_u64 vdev_id; > > What is the nature of this id? It should be the vIOMMU's HW representation for the virtual device. On ARM it is the stream id, the index into the Stream Table On AMD it would be the "DeviceID" the index in the Device Table On Intel it is an index into the context table The primary usage is to transform virtual invalidations from the guest into physical invalidations. > For my SEV-TIO exercise ("trusted IO"), I am looking for a kernel interface > to pass the guest's BDFs for a specific host device (which is passed > through) and nothing in the kernel has any knowledge of it atm, is this the > right place, or another ioctl() is needed here? We probably need to add the vRID as well to this struct for that reason. The vdev_id is the iommu handle, and there is a platform specific transformation between Bus/Device/Function and the iommu handle. In some cases this is math, in some cases it is ACPI/DT tables or something. So I think the kernel should not make an assumption about the relationship. Jason
On Fri, Oct 04, 2024 at 08:41:47AM -0300, Jason Gunthorpe wrote: > On Fri, Oct 04, 2024 at 02:32:28PM +1000, Alexey Kardashevskiy wrote: > > For my SEV-TIO exercise ("trusted IO"), I am looking for a kernel interface > > to pass the guest's BDFs for a specific host device (which is passed > > through) and nothing in the kernel has any knowledge of it atm, is this the > > right place, or another ioctl() is needed here? > > We probably need to add the vRID as well to this struct for that > reason. "vRID"/"vBDF" doesn't sound very generic to me to put in this structure, though PCI devices are and very likely will be the only users of this Virtual Device for a while. Any good idea? Also, I am wondering if the uAPI structure of Virtual Device should have a driver-specific data structure. And the vdev_id should be in the driver-specific struct. So, it could stay in corresponding naming, "Stream ID", "Device ID" or "Context ID" v.s. a generic "Virtual ID" in the top-level structure? Then, other info like CCA can be put in the driver-level structure of SMMU's. > The vdev_id is the iommu handle, and there is a platform specific > transformation between Bus/Device/Function and the iommu handle. In > some cases this is math, in some cases it is ACPI/DT tables or > something. > So I think the kernel should not make an assumption about the > relationship. Agreed. That also implies that a vRID is quite independent to the IOMMU right? So, I think that the reason of adding a vRID to the virtual deivce uAPI/structure should be IOMMU requiring it? Thanks Nicolin
On Fri, Oct 04, 2024 at 11:13:46AM -0700, Nicolin Chen wrote: > On Fri, Oct 04, 2024 at 08:41:47AM -0300, Jason Gunthorpe wrote: > > On Fri, Oct 04, 2024 at 02:32:28PM +1000, Alexey Kardashevskiy wrote: > > > For my SEV-TIO exercise ("trusted IO"), I am looking for a kernel interface > > > to pass the guest's BDFs for a specific host device (which is passed > > > through) and nothing in the kernel has any knowledge of it atm, is this the > > > right place, or another ioctl() is needed here? > > > > We probably need to add the vRID as well to this struct for that > > reason. > > "vRID"/"vBDF" doesn't sound very generic to me to put in this > structure, though PCI devices are and very likely will be the > only users of this Virtual Device for a while. Any good idea? It isn't necessarily bad to have a pci field as long as we can somehow understand when it is used. > Also, I am wondering if the uAPI structure of Virtual Device > should have a driver-specific data structure. And the vdev_id > should be in the driver-specific struct. So, it could stay in > corresponding naming, "Stream ID", "Device ID" or "Context ID" > v.s. a generic "Virtual ID" in the top-level structure? Then, > other info like CCA can be put in the driver-level structure > of SMMU's. I'd to avoid a iommu-driver specific structure here, but I fear we will have a "lowervisor" (sigh) specific structure for the widely varied CC/pkvm/etc world. > Agreed. That also implies that a vRID is quite independent to > the IOMMU right? So, I think that the reason of adding a vRID > to the virtual deivce uAPI/structure should be IOMMU requiring > it? I would like to use this API to link in the CC/pkvm/etc world, and use it to create not just the vIOMMU components but link up to the "lowervisor" components as well, since it is all the same stuff basically. Jason
On Fri, Oct 04, 2024 at 03:50:19PM -0300, Jason Gunthorpe wrote: > On Fri, Oct 04, 2024 at 11:13:46AM -0700, Nicolin Chen wrote: > > On Fri, Oct 04, 2024 at 08:41:47AM -0300, Jason Gunthorpe wrote: > > > On Fri, Oct 04, 2024 at 02:32:28PM +1000, Alexey Kardashevskiy wrote: > > > > For my SEV-TIO exercise ("trusted IO"), I am looking for a kernel interface > > > > to pass the guest's BDFs for a specific host device (which is passed > > > > through) and nothing in the kernel has any knowledge of it atm, is this the > > > > right place, or another ioctl() is needed here? > > > > > > We probably need to add the vRID as well to this struct for that > > > reason. > > > > "vRID"/"vBDF" doesn't sound very generic to me to put in this > > structure, though PCI devices are and very likely will be the > > only users of this Virtual Device for a while. Any good idea? > > It isn't necessarily bad to have a pci field as long as we can > somehow understand when it is used. OK. > > Also, I am wondering if the uAPI structure of Virtual Device > > should have a driver-specific data structure. And the vdev_id > > should be in the driver-specific struct. So, it could stay in > > corresponding naming, "Stream ID", "Device ID" or "Context ID" > > v.s. a generic "Virtual ID" in the top-level structure? Then, > > other info like CCA can be put in the driver-level structure > > of SMMU's. > > I'd to avoid a iommu-driver specific structure here, but I fear we > will have a "lowervisor" (sigh) specific structure for the widely > varied CC/pkvm/etc world. The design of the structure also impacts how we implement the API between iommufd and the drivers. Right now, forwarding the ID via a function parameter is fine, but we would need a user structure once we have more stuff to forward. With that, I wonder what is better for the initial version of this structure, a generic virtual ID or a driver-named ID like "Stream ID"? The latter might be more understandable/flexible, so we won't need to justify a generic virtual ID along the way if something changes in the nature? > > Agreed. That also implies that a vRID is quite independent to > > the IOMMU right? So, I think that the reason of adding a vRID > > to the virtual deivce uAPI/structure should be IOMMU requiring > > it? > > I would like to use this API to link in the CC/pkvm/etc world, and use > it to create not just the vIOMMU components but link up to the > "lowervisor" components as well, since it is all the same stuff > basically. That sounds wider than what I defined it for in my patch: * struct iommu_vdevice_alloc - ioctl(IOMMU_VDEVICE_ALLOC) * ... * Allocate a virtual device instance (for a physical device) against a vIOMMU. * This instance holds the device's information in a VM, related to its vIOMMU. Would you please help rephrase it? It'd be also helpful for me to update the doc. Though I feel slightly odd if we define it wider than "vIOMMU" since this is an iommufd header... Thanks Nicolin
On Fri, Oct 04, 2024 at 12:25:19PM -0700, Nicolin Chen wrote: > With that, I wonder what is better for the initial version of > this structure, a generic virtual ID or a driver-named ID like > "Stream ID"? The latter might be more understandable/flexible, > so we won't need to justify a generic virtual ID along the way > if something changes in the nature? I think the name could be a bit more specific "viommu_device_id" maybe? And elaborate in the kdoc that this is about the identifier that the iommu HW itself uses. > That sounds wider than what I defined it for in my patch: > * struct iommu_vdevice_alloc - ioctl(IOMMU_VDEVICE_ALLOC) > * ... > * Allocate a virtual device instance (for a physical device) against a vIOMMU. > * This instance holds the device's information in a VM, related to its vIOMMU. > > Would you please help rephrase it? It'd be also helpful for me > to update the doc. I think that is still OK for the moment. > Though I feel slightly odd if we define it wider than "vIOMMU" > since this is an iommufd header... The notion I have is that vIOMMU would expand to encompass not just the physical hypervisor controled vIOMMU but also the vIOMMU controlled by the trusted "lowervisor" in a pkvm/cc/whatever world. Alexey is working on vIOMMU support in CC which has the trusted world do some of the trusted vIOMMU components. I'm hoping the other people in this area will look at his design and make it fit nicely to everyone. Jason
On Fri, Oct 04, 2024 at 05:17:46PM -0300, Jason Gunthorpe wrote: > On Fri, Oct 04, 2024 at 12:25:19PM -0700, Nicolin Chen wrote: > > > With that, I wonder what is better for the initial version of > > this structure, a generic virtual ID or a driver-named ID like > > "Stream ID"? The latter might be more understandable/flexible, > > so we won't need to justify a generic virtual ID along the way > > if something changes in the nature? > > I think the name could be a bit more specific "viommu_device_id" > maybe? And elaborate in the kdoc that this is about the identifier > that the iommu HW itself uses. A "vIOMMU Device ID" might sound a bit confusing with an ID of a vIOMMU itself :-/ At this moment, I named it "virt_id" in uAPI with a description: " * @virt_id: Virtual device ID per vIOMMU" I could add to that (or just in the documentation): "e.g. ARM's Stream ID, AMD's DeviceID, Intel's Context Table ID" to clarify it further. > > That sounds wider than what I defined it for in my patch: > > * struct iommu_vdevice_alloc - ioctl(IOMMU_VDEVICE_ALLOC) > > * ... > > * Allocate a virtual device instance (for a physical device) against a vIOMMU. > > * This instance holds the device's information in a VM, related to its vIOMMU. > > > > Would you please help rephrase it? It'd be also helpful for me > > to update the doc. > > I think that is still OK for the moment. > > > Though I feel slightly odd if we define it wider than "vIOMMU" > > since this is an iommufd header... > > The notion I have is that vIOMMU would expand to encompass not just > the physical hypervisor controled vIOMMU but also the vIOMMU > controlled by the trusted "lowervisor" in a pkvm/cc/whatever world. > > Alexey is working on vIOMMU support in CC which has the trusted world > do some of the trusted vIOMMU components. I'm hoping the other people > in this area will look at his design and make it fit nicely to > everyone. Oh, I didn't connect the dots that lowervisor must rely on the vIOMMU too -- I'd need to check the CC stuff in detail. In that case, having it in vIOMMU uAPI totally makes sense. Thanks Nicolin
On Fri, Oct 04, 2024 at 01:33:33PM -0700, Nicolin Chen wrote: > On Fri, Oct 04, 2024 at 05:17:46PM -0300, Jason Gunthorpe wrote: > > On Fri, Oct 04, 2024 at 12:25:19PM -0700, Nicolin Chen wrote: > > > > > With that, I wonder what is better for the initial version of > > > this structure, a generic virtual ID or a driver-named ID like > > > "Stream ID"? The latter might be more understandable/flexible, > > > so we won't need to justify a generic virtual ID along the way > > > if something changes in the nature? > > > > I think the name could be a bit more specific "viommu_device_id" > > maybe? And elaborate in the kdoc that this is about the identifier > > that the iommu HW itself uses. > > A "vIOMMU Device ID" might sound a bit confusing with an ID of a > vIOMMU itself :-/ > > At this moment, I named it "virt_id" in uAPI with a description: > " * @virt_id: Virtual device ID per vIOMMU" > > I could add to that (or just in the documentation): > "e.g. ARM's Stream ID, AMD's DeviceID, Intel's Context Table ID" > to clarify it further. Yeah probably best > > Alexey is working on vIOMMU support in CC which has the trusted world > > do some of the trusted vIOMMU components. I'm hoping the other people > > in this area will look at his design and make it fit nicely to > > everyone. > > Oh, I didn't connect the dots that lowervisor must rely on the > vIOMMU too -- I'd need to check the CC stuff in detail. In that > case, having it in vIOMMU uAPI totally makes sense. I think we are still getting through this, and I do wonder how to manage it with so much stuff still in flux and sometimes private. At least for AMD's CC case where there is an entanglement with the physical IOMMU it seems like it makes sense. Even in the pKVM type case I think you end up ripping the translation away from the "physical" iommu and there should be some coordination to ensure this handover is clean and undone when iommufd is closed. But I wonder how the S2 works there.. Jason
diff --git a/drivers/iommu/iommufd/device.c b/drivers/iommu/iommufd/device.c index 5fd3dd420290..3ad759971b32 100644 --- a/drivers/iommu/iommufd/device.c +++ b/drivers/iommu/iommufd/device.c @@ -136,6 +136,18 @@ void iommufd_device_destroy(struct iommufd_object *obj) struct iommufd_device *idev = container_of(obj, struct iommufd_device, obj); + /* Unlocked since there should be no race in a destroy() */ + if (idev->vdev_id) { + struct iommufd_vdev_id *vdev_id = idev->vdev_id; + struct iommufd_viommu *viommu = vdev_id->viommu; + struct iommufd_vdev_id *old; + + old = xa_cmpxchg(&viommu->vdev_ids, vdev_id->id, vdev_id, NULL, + GFP_KERNEL); + WARN_ON(old != vdev_id); + kfree(vdev_id); + idev->vdev_id = NULL; + } iommu_device_release_dma_owner(idev->dev); iommufd_put_group(idev->igroup); if (!iommufd_selftest_is_mock_dev(idev->dev)) diff --git a/drivers/iommu/iommufd/iommufd_private.h b/drivers/iommu/iommufd/iommufd_private.h index 1f2a1c133b9a..2c6e168c5300 100644 --- a/drivers/iommu/iommufd/iommufd_private.h +++ b/drivers/iommu/iommufd/iommufd_private.h @@ -416,6 +416,7 @@ struct iommufd_device { struct iommufd_object obj; struct iommufd_ctx *ictx; struct iommufd_group *igroup; + struct iommufd_vdev_id *vdev_id; struct list_head group_item; /* always the physical device */ struct device *dev; @@ -533,11 +534,31 @@ struct iommufd_viommu { struct iommufd_ctx *ictx; struct iommufd_hwpt_paging *hwpt; + /* The locking order is vdev_ids_rwsem -> igroup::lock */ + struct rw_semaphore vdev_ids_rwsem; + struct xarray vdev_ids; + unsigned int type; }; +struct iommufd_vdev_id { + struct iommufd_viommu *viommu; + struct iommufd_device *idev; + u64 id; +}; + +static inline struct iommufd_viommu * +iommufd_get_viommu(struct iommufd_ucmd *ucmd, u32 id) +{ + return container_of(iommufd_get_object(ucmd->ictx, id, + IOMMUFD_OBJ_VIOMMU), + struct iommufd_viommu, obj); +} + int iommufd_viommu_alloc_ioctl(struct iommufd_ucmd *ucmd); void iommufd_viommu_destroy(struct iommufd_object *obj); +int iommufd_viommu_set_vdev_id(struct iommufd_ucmd *ucmd); +int iommufd_viommu_unset_vdev_id(struct iommufd_ucmd *ucmd); #ifdef CONFIG_IOMMUFD_TEST int iommufd_test(struct iommufd_ucmd *ucmd); diff --git a/drivers/iommu/iommufd/main.c b/drivers/iommu/iommufd/main.c index 288ee51b6829..199ad90fa36b 100644 --- a/drivers/iommu/iommufd/main.c +++ b/drivers/iommu/iommufd/main.c @@ -334,6 +334,8 @@ union ucmd_buffer { struct iommu_option option; struct iommu_vfio_ioas vfio_ioas; struct iommu_viommu_alloc viommu; + struct iommu_viommu_set_vdev_id set_vdev_id; + struct iommu_viommu_unset_vdev_id unset_vdev_id; #ifdef CONFIG_IOMMUFD_TEST struct iommu_test_cmd test; #endif @@ -387,6 +389,10 @@ static const struct iommufd_ioctl_op iommufd_ioctl_ops[] = { __reserved), IOCTL_OP(IOMMU_VIOMMU_ALLOC, iommufd_viommu_alloc_ioctl, struct iommu_viommu_alloc, out_viommu_id), + IOCTL_OP(IOMMU_VIOMMU_SET_VDEV_ID, iommufd_viommu_set_vdev_id, + struct iommu_viommu_set_vdev_id, vdev_id), + IOCTL_OP(IOMMU_VIOMMU_UNSET_VDEV_ID, iommufd_viommu_unset_vdev_id, + struct iommu_viommu_unset_vdev_id, vdev_id), #ifdef CONFIG_IOMMUFD_TEST IOCTL_OP(IOMMU_TEST_CMD, iommufd_test, struct iommu_test_cmd, last), #endif diff --git a/drivers/iommu/iommufd/viommu.c b/drivers/iommu/iommufd/viommu.c index 200653a4bf57..8ffcd72b16b8 100644 --- a/drivers/iommu/iommufd/viommu.c +++ b/drivers/iommu/iommufd/viommu.c @@ -8,6 +8,15 @@ void iommufd_viommu_destroy(struct iommufd_object *obj) { struct iommufd_viommu *viommu = container_of(obj, struct iommufd_viommu, obj); + struct iommufd_vdev_id *vdev_id; + unsigned long index; + + xa_for_each(&viommu->vdev_ids, index, vdev_id) { + /* Unlocked since there should be no race in a destroy() */ + vdev_id->idev->vdev_id = NULL; + kfree(vdev_id); + } + xa_destroy(&viommu->vdev_ids); refcount_dec(&viommu->hwpt->common.obj.users); } @@ -53,6 +62,9 @@ int iommufd_viommu_alloc_ioctl(struct iommufd_ucmd *ucmd) viommu->ictx = ucmd->ictx; viommu->hwpt = hwpt_paging; + xa_init(&viommu->vdev_ids); + init_rwsem(&viommu->vdev_ids_rwsem); + refcount_inc(&viommu->hwpt->common.obj.users); cmd->out_viommu_id = viommu->obj.id; @@ -70,3 +82,112 @@ int iommufd_viommu_alloc_ioctl(struct iommufd_ucmd *ucmd) iommufd_put_object(ucmd->ictx, &idev->obj); return rc; } + +int iommufd_viommu_set_vdev_id(struct iommufd_ucmd *ucmd) +{ + struct iommu_viommu_set_vdev_id *cmd = ucmd->cmd; + struct iommufd_vdev_id *vdev_id, *curr; + struct iommufd_viommu *viommu; + struct iommufd_device *idev; + int rc = 0; + + if (cmd->vdev_id > ULONG_MAX) + return -EINVAL; + + viommu = iommufd_get_viommu(ucmd, cmd->viommu_id); + if (IS_ERR(viommu)) + return PTR_ERR(viommu); + + idev = iommufd_get_device(ucmd, cmd->dev_id); + if (IS_ERR(idev)) { + rc = PTR_ERR(idev); + goto out_put_viommu; + } + + down_write(&viommu->vdev_ids_rwsem); + mutex_lock(&idev->igroup->lock); + if (idev->vdev_id) { + rc = -EEXIST; + goto out_unlock_igroup; + } + + vdev_id = kzalloc(sizeof(*vdev_id), GFP_KERNEL); + if (!vdev_id) { + rc = -ENOMEM; + goto out_unlock_igroup; + } + + vdev_id->idev = idev; + vdev_id->viommu = viommu; + vdev_id->id = cmd->vdev_id; + + curr = xa_cmpxchg(&viommu->vdev_ids, cmd->vdev_id, NULL, vdev_id, + GFP_KERNEL); + if (curr) { + rc = xa_err(curr) ? : -EBUSY; + goto out_free; + } + + idev->vdev_id = vdev_id; + goto out_unlock_igroup; + +out_free: + kfree(vdev_id); +out_unlock_igroup: + mutex_unlock(&idev->igroup->lock); + up_write(&viommu->vdev_ids_rwsem); + iommufd_put_object(ucmd->ictx, &idev->obj); +out_put_viommu: + iommufd_put_object(ucmd->ictx, &viommu->obj); + return rc; +} + +int iommufd_viommu_unset_vdev_id(struct iommufd_ucmd *ucmd) +{ + struct iommu_viommu_unset_vdev_id *cmd = ucmd->cmd; + struct iommufd_viommu *viommu; + struct iommufd_vdev_id *old; + struct iommufd_device *idev; + int rc = 0; + + if (cmd->vdev_id > ULONG_MAX) + return -EINVAL; + + viommu = iommufd_get_viommu(ucmd, cmd->viommu_id); + if (IS_ERR(viommu)) + return PTR_ERR(viommu); + + idev = iommufd_get_device(ucmd, cmd->dev_id); + if (IS_ERR(idev)) { + rc = PTR_ERR(idev); + goto out_put_viommu; + } + + down_write(&viommu->vdev_ids_rwsem); + mutex_lock(&idev->igroup->lock); + if (!idev->vdev_id) { + rc = -ENOENT; + goto out_unlock_igroup; + } + if (idev->vdev_id->id != cmd->vdev_id) { + rc = -EINVAL; + goto out_unlock_igroup; + } + + old = xa_cmpxchg(&viommu->vdev_ids, idev->vdev_id->id, + idev->vdev_id, NULL, GFP_KERNEL); + if (xa_is_err(old)) { + rc = xa_err(old); + goto out_unlock_igroup; + } + kfree(old); + idev->vdev_id = NULL; + +out_unlock_igroup: + mutex_unlock(&idev->igroup->lock); + up_write(&viommu->vdev_ids_rwsem); + iommufd_put_object(ucmd->ictx, &idev->obj); +out_put_viommu: + iommufd_put_object(ucmd->ictx, &viommu->obj); + return rc; +} diff --git a/include/uapi/linux/iommufd.h b/include/uapi/linux/iommufd.h index 51ce6a019c34..1816e89c922d 100644 --- a/include/uapi/linux/iommufd.h +++ b/include/uapi/linux/iommufd.h @@ -52,6 +52,8 @@ enum { IOMMUFD_CMD_HWPT_INVALIDATE = 0x8d, IOMMUFD_CMD_FAULT_QUEUE_ALLOC = 0x8e, IOMMUFD_CMD_VIOMMU_ALLOC = 0x8f, + IOMMUFD_CMD_VIOMMU_SET_VDEV_ID = 0x90, + IOMMUFD_CMD_VIOMMU_UNSET_VDEV_ID = 0x91, }; /** @@ -882,4 +884,42 @@ struct iommu_viommu_alloc { __u32 out_viommu_id; }; #define IOMMU_VIOMMU_ALLOC _IO(IOMMUFD_TYPE, IOMMUFD_CMD_VIOMMU_ALLOC) + +/** + * struct iommu_viommu_set_vdev_id - ioctl(IOMMU_VIOMMU_SET_VDEV_ID) + * @size: sizeof(struct iommu_viommu_set_vdev_id) + * @viommu_id: viommu ID to associate with the device to store its virtual ID + * @dev_id: device ID to set its virtual ID + * @__reserved: Must be 0 + * @vdev_id: Virtual device ID + * + * Set a viommu-specific virtual ID of a device + */ +struct iommu_viommu_set_vdev_id { + __u32 size; + __u32 viommu_id; + __u32 dev_id; + __u32 __reserved; + __aligned_u64 vdev_id; +}; +#define IOMMU_VIOMMU_SET_VDEV_ID _IO(IOMMUFD_TYPE, IOMMUFD_CMD_VIOMMU_SET_VDEV_ID) + +/** + * struct iommu_viommu_unset_vdev_id - ioctl(IOMMU_VIOMMU_UNSET_VDEV_ID) + * @size: sizeof(struct iommu_viommu_unset_vdev_id) + * @viommu_id: viommu ID associated with the device to delete its virtual ID + * @dev_id: device ID to unset its virtual ID + * @__reserved: Must be 0 + * @vdev_id: Virtual device ID (for verification) + * + * Unset a viommu-specific virtual ID of a device + */ +struct iommu_viommu_unset_vdev_id { + __u32 size; + __u32 viommu_id; + __u32 dev_id; + __u32 __reserved; + __aligned_u64 vdev_id; +}; +#define IOMMU_VIOMMU_UNSET_VDEV_ID _IO(IOMMUFD_TYPE, IOMMUFD_CMD_VIOMMU_UNSET_VDEV_ID) #endif
Introduce a pair of new ioctls to set/unset a per-viommu virtual device id that should be linked to a physical device id via an idev pointer. Continue the support IOMMU_VIOMMU_TYPE_DEFAULT for a core-managed viommu. Provide a lookup function for drivers to load device pointer by a virtual device id. Add a rw_semaphore protection around the vdev_id list. Any future ioctl handlers that potentially access the list must grab the lock too. Signed-off-by: Nicolin Chen <nicolinc@nvidia.com> --- drivers/iommu/iommufd/device.c | 12 +++ drivers/iommu/iommufd/iommufd_private.h | 21 ++++ drivers/iommu/iommufd/main.c | 6 ++ drivers/iommu/iommufd/viommu.c | 121 ++++++++++++++++++++++++ include/uapi/linux/iommufd.h | 40 ++++++++ 5 files changed, 200 insertions(+)