Message ID | 19e20e54d41a0c1ab7403264e1016c4b19293135.1730313494.git.nicolinc@nvidia.com
---|---
State | New
Series | iommufd: Add vIOMMU infrastructure (Part-2: vDEVICE)
On Wed, Oct 30, 2024 at 02:35:27PM -0700, Nicolin Chen wrote:
> +void iommufd_vdevice_destroy(struct iommufd_object *obj)
> +{
> +        struct iommufd_vdevice *vdev =
> +                container_of(obj, struct iommufd_vdevice, obj);
> +        struct iommufd_viommu *viommu = vdev->viommu;
> +
> +        /* xa_cmpxchg is okay to fail if alloc returned -EEXIST previously */
> +        xa_cmpxchg(&viommu->vdevs, vdev->id, vdev, NULL, GFP_KERNEL);

There are crazy races that would cause this not to work. Another
thread could have successfully destroyed whatever caused EEXIST and
then successfully registered this same vdev to the same id. Then this
will wrongly erase the other thread's entry.

It would be better to skip the erase directly if the EEXIST unwind is
being taken.

Jason
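A minimal illustrative sketch (not code from this thread) of the "skip the
erase on the -EEXIST unwind" idea suggested above. The in_xarray flag is
hypothetical: the alloc path would set it only after its xa_cmpxchg()
succeeds, so the destroy path never erases an entry owned by another thread:

-------------------------------------------------------------------
void iommufd_vdevice_destroy(struct iommufd_object *obj)
{
        struct iommufd_vdevice *vdev =
                container_of(obj, struct iommufd_vdevice, obj);
        struct iommufd_viommu *viommu = vdev->viommu;

        /* in_xarray is a hypothetical flag, set only after xa_cmpxchg() wins */
        if (vdev->in_xarray)
                xa_erase(&viommu->vdevs, vdev->id);
        refcount_dec(&viommu->obj.users);
        put_device(vdev->dev);
}
-------------------------------------------------------------------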
On Thu, Oct 31, 2024 at 10:29:41AM -0300, Jason Gunthorpe wrote:
> On Wed, Oct 30, 2024 at 02:35:27PM -0700, Nicolin Chen wrote:
> > +void iommufd_vdevice_destroy(struct iommufd_object *obj)
> > +{
> > +        struct iommufd_vdevice *vdev =
> > +                container_of(obj, struct iommufd_vdevice, obj);
> > +        struct iommufd_viommu *viommu = vdev->viommu;
> > +
> > +        /* xa_cmpxchg is okay to fail if alloc returned -EEXIST previously */
> > +        xa_cmpxchg(&viommu->vdevs, vdev->id, vdev, NULL, GFP_KERNEL);
>
> There are crazy races that would cause this not to work. Another
> thread could have successfully destroyed whatever caused EEXIST and
> then successfully registered this same vdev to the same id. Then this
> will wrongly erase the other thread's entry.
>
> It would be better to skip the erase directly if the EEXIST unwind is
> being taken.

Hmm, is the "another thread" an alloc() or a destroy()?

It doesn't seem to me that there could be another destroy() on the
same object, since this current destroy() is the abort of an
unfinalized object. And it doesn't seem that another alloc() will
get the same vdev ptr, since every vdev allocation in the alloc()
will be different?

That being said, I think we could play safer with the following:
-------------------------------------------------------------------
@@ -88,7 +88,6 @@ void iommufd_vdevice_destroy(struct iommufd_object *obj)
                 container_of(obj, struct iommufd_vdevice, obj);
         struct iommufd_viommu *viommu = vdev->viommu;
 
-        /* xa_cmpxchg is okay to fail if alloc returned -EEXIST previously */
         xa_cmpxchg(&viommu->vdevs, vdev->id, vdev, NULL, GFP_KERNEL);
         refcount_dec(&viommu->obj.users);
         put_device(vdev->dev);
@@ -128,18 +127,19 @@ int iommufd_vdevice_alloc_ioctl(struct iommufd_ucmd *ucmd)
                 goto out_put_idev;
         }
 
+        curr = xa_cmpxchg(&viommu->vdevs, virt_id, NULL, vdev, GFP_KERNEL);
+        if (curr) {
+                iommufd_object_abort(ucmd->ictx, &vdev->obj);
+                rc = xa_err(curr) ?: -EEXIST;
+                goto out_put_idev;
+        }
+
         vdev->id = virt_id;
         vdev->dev = idev->dev;
         get_device(idev->dev);
         vdev->viommu = viommu;
         refcount_inc(&viommu->obj.users);
 
-        curr = xa_cmpxchg(&viommu->vdevs, virt_id, NULL, vdev, GFP_KERNEL);
-        if (curr) {
-                rc = xa_err(curr) ?: -EEXIST;
-                goto out_abort;
-        }
-
         cmd->out_vdevice_id = vdev->obj.id;
         rc = iommufd_ucmd_respond(ucmd, sizeof(*cmd));
         if (rc)
-------------------------------------------------------------------

Thanks
Nicolin
On Thu, Oct 31, 2024 at 09:56:37AM -0700, Nicolin Chen wrote:
> On Thu, Oct 31, 2024 at 10:29:41AM -0300, Jason Gunthorpe wrote:
> > On Wed, Oct 30, 2024 at 02:35:27PM -0700, Nicolin Chen wrote:
> > > +void iommufd_vdevice_destroy(struct iommufd_object *obj)
> > > +{
> > > +        struct iommufd_vdevice *vdev =
> > > +                container_of(obj, struct iommufd_vdevice, obj);
> > > +        struct iommufd_viommu *viommu = vdev->viommu;
> > > +
> > > +        /* xa_cmpxchg is okay to fail if alloc returned -EEXIST previously */
> > > +        xa_cmpxchg(&viommu->vdevs, vdev->id, vdev, NULL, GFP_KERNEL);
> >
> > There are crazy races that would cause this not to work. Another
> > thread could have successfully destroyed whatever caused EEXIST and
> > then successfully registered this same vdev to the same id. Then this
> > will wrongly erase the other thread's entry.
> >
> > It would be better to skip the erase directly if the EEXIST unwind is
> > being taken.
>
> Hmm, is the "another thread" an alloc() or a destroy()?

I was thinking both.

> It doesn't seem to me that there could be another destroy() on the
> same object, since this current destroy() is the abort of an
> unfinalized object. And it doesn't seem that another alloc() will
> get the same vdev ptr, since every vdev allocation in the alloc()
> will be different?

Ah, so you are saying that since the vdev 'old' is local to this thread
it can't possibly be aliased by another?

I was worried the id could be aliased, but yes, that seems right that
the vdev cmpxchg would reject that.

So let's leave it.

Jason
On Thu, Oct 31, 2024 at 02:04:46PM -0300, Jason Gunthorpe wrote:
> On Thu, Oct 31, 2024 at 09:56:37AM -0700, Nicolin Chen wrote:
> > On Thu, Oct 31, 2024 at 10:29:41AM -0300, Jason Gunthorpe wrote:
> > > On Wed, Oct 30, 2024 at 02:35:27PM -0700, Nicolin Chen wrote:
> > > > +void iommufd_vdevice_destroy(struct iommufd_object *obj)
> > > > +{
> > > > +        struct iommufd_vdevice *vdev =
> > > > +                container_of(obj, struct iommufd_vdevice, obj);
> > > > +        struct iommufd_viommu *viommu = vdev->viommu;
> > > > +
> > > > +        /* xa_cmpxchg is okay to fail if alloc returned -EEXIST previously */
> > > > +        xa_cmpxchg(&viommu->vdevs, vdev->id, vdev, NULL, GFP_KERNEL);
> > >
> > > There are crazy races that would cause this not to work. Another
> > > thread could have successfully destroyed whatever caused EEXIST and
> > > then successfully registered this same vdev to the same id. Then this
> > > will wrongly erase the other thread's entry.
> > >
> > > It would be better to skip the erase directly if the EEXIST unwind is
> > > being taken.
> >
> > Hmm, is the "another thread" an alloc() or a destroy()?
>
> I was thinking both.
>
> > It doesn't seem to me that there could be another destroy() on the
> > same object, since this current destroy() is the abort of an
> > unfinalized object. And it doesn't seem that another alloc() will
> > get the same vdev ptr, since every vdev allocation in the alloc()
> > will be different?
>
> Ah, so you are saying that since the vdev 'old' is local to this thread
> it can't possibly be aliased by another?
>
> I was worried the id could be aliased, but yes, that seems right that
> the vdev cmpxchg would reject that.
>
> So let's leave it.

Ack. I'll still update the comment, since xa_cmpxchg can return other
errnos:

-        /* xa_cmpxchg is okay to fail if alloc returned -EEXIST previously */
+        /* xa_cmpxchg is okay to fail if alloc failed xa_cmpxchg previously */

Thanks
Nicolin
On 31/10/24 08:35, Nicolin Chen wrote:
> Introduce a new IOMMUFD_OBJ_VDEVICE to represent a physical device (struct
> device) against a vIOMMU (struct iommufd_viommu) object in a VM.
>
> This vDEVICE object (and its structure) holds all the info and attributes
> in the VM regarding the device related to the vIOMMU.
>
> As an initial patch, add a per-vIOMMU virtual ID. This can be:
>  - Virtual StreamID on a nested ARM SMMUv3, an index to a Stream Table
>  - Virtual DeviceID on a nested AMD IOMMU, an index to a Device Table
>  - Virtual RID on a nested Intel VT-D IOMMU, an index to a Context Table
> Potentially, this vDEVICE structure would hold some vData for Confidential
> Compute Architecture (CCA). Use this virtual ID to index a "vdevs" xarray
> that belongs to a vIOMMU object.
>
> Add a new ioctl for vDEVICE allocations. Since a vDEVICE is a connection
> of a device object and an iommufd_viommu object, take two refcounts in the
> ioctl handler.
>
> Signed-off-by: Nicolin Chen <nicolinc@nvidia.com>
> ---
>  drivers/iommu/iommufd/iommufd_private.h | 18 ++++++
>  include/linux/iommufd.h                 |  4 ++
>  include/uapi/linux/iommufd.h            | 22 +++++++
>  drivers/iommu/iommufd/main.c            |  6 ++
>  drivers/iommu/iommufd/viommu.c          | 76 +++++++++++++++++++++++++
>  5 files changed, 126 insertions(+)
>
> diff --git a/drivers/iommu/iommufd/iommufd_private.h b/drivers/iommu/iommufd/iommufd_private.h
> index e8f5ef550cc9..062656c19a07 100644
> --- a/drivers/iommu/iommufd/iommufd_private.h
> +++ b/drivers/iommu/iommufd/iommufd_private.h
> @@ -507,8 +507,26 @@ static inline int iommufd_hwpt_replace_device(struct iommufd_device *idev,
>          return iommu_group_replace_domain(idev->igroup->group, hwpt->domain);
>  }
>
> +static inline struct iommufd_viommu *
> +iommufd_get_viommu(struct iommufd_ucmd *ucmd, u32 id)
> +{
> +        return container_of(iommufd_get_object(ucmd->ictx, id,
> +                                               IOMMUFD_OBJ_VIOMMU),
> +                            struct iommufd_viommu, obj);
> +}
> +
>  int iommufd_viommu_alloc_ioctl(struct iommufd_ucmd *ucmd);
>  void iommufd_viommu_destroy(struct iommufd_object *obj);
> +int iommufd_vdevice_alloc_ioctl(struct iommufd_ucmd *ucmd);
> +void iommufd_vdevice_destroy(struct iommufd_object *obj);
> +
> +struct iommufd_vdevice {
> +        struct iommufd_object obj;
> +        struct iommufd_ctx *ictx;
> +        struct iommufd_viommu *viommu;
> +        struct device *dev;
> +        u64 id; /* per-vIOMMU virtual ID */
> +};
>
>  #ifdef CONFIG_IOMMUFD_TEST
>  int iommufd_test(struct iommufd_ucmd *ucmd);
> diff --git a/include/linux/iommufd.h b/include/linux/iommufd.h
> index f03c75410938..ee58c5c573ec 100644
> --- a/include/linux/iommufd.h
> +++ b/include/linux/iommufd.h
> @@ -10,6 +10,7 @@
>  #include <linux/errno.h>
>  #include <linux/refcount.h>
>  #include <linux/types.h>
> +#include <linux/xarray.h>
>
>  struct device;
>  struct file;
> @@ -31,6 +32,7 @@ enum iommufd_object_type {
>          IOMMUFD_OBJ_ACCESS,
>          IOMMUFD_OBJ_FAULT,
>          IOMMUFD_OBJ_VIOMMU,
> +        IOMMUFD_OBJ_VDEVICE,
>  #ifdef CONFIG_IOMMUFD_TEST
>          IOMMUFD_OBJ_SELFTEST,
>  #endif
> @@ -89,6 +91,8 @@ struct iommufd_viommu {
>
>          const struct iommufd_viommu_ops *ops;
>
> +        struct xarray vdevs;
> +
>          unsigned int type;
>  };
>
> diff --git a/include/uapi/linux/iommufd.h b/include/uapi/linux/iommufd.h
> index a498d4838f9a..9b5236004b8e 100644
> --- a/include/uapi/linux/iommufd.h
> +++ b/include/uapi/linux/iommufd.h
> @@ -53,6 +53,7 @@ enum {
>          IOMMUFD_CMD_FAULT_QUEUE_ALLOC = 0x8e,
>          IOMMUFD_CMD_IOAS_MAP_FILE = 0x8f,
>          IOMMUFD_CMD_VIOMMU_ALLOC = 0x90,
> +        IOMMUFD_CMD_VDEVICE_ALLOC = 0x91,
>  };
>
>  /**
> @@ -864,4 +865,25 @@ struct iommu_viommu_alloc {
>          __u32 out_viommu_id;
>  };
>  #define IOMMU_VIOMMU_ALLOC _IO(IOMMUFD_TYPE, IOMMUFD_CMD_VIOMMU_ALLOC)
> +
> +/**
> + * struct iommu_vdevice_alloc - ioctl(IOMMU_VDEVICE_ALLOC)
> + * @size: sizeof(struct iommu_vdevice_alloc)
> + * @viommu_id: vIOMMU ID to associate with the virtual device
> + * @dev_id: The physical device to allocate a virtual instance on the vIOMMU
> + * @out_vdevice_id: Object handle for the vDevice. Pass to IOMMU_DESTROY
> + * @virt_id: Virtual device ID per vIOMMU, e.g. vSID of ARM SMMUv3, vDeviceID
> + *           of AMD IOMMU, and vRID of a nested Intel VT-d to a Context Table

So it is one vdevice per passed-through device (say, a network
adapter), right? I am asking as there are passed-through devices and
IOMMU devices, and (at least on AMD) IOMMUs look like PCI devices, both
in hosts and guests. For example, from the above: "@dev_id: The physical
device ..." - both a network card and an IOMMU are physical, so is dev_id
a NIC or an IOMMU? I assume that is a NIC (but it is a source of constant
confusion).

Is there any plan to add guest device BDFn as well, or can I add one
here for my TEE-IO exercise, if this is the right place? It is the same as
vDeviceID for AMD but I am not sure about the others, hence the
question. Thanks,

> + *
> + * Allocate a virtual device instance (for a physical device) against a vIOMMU.
> + * This instance holds the device's information (related to its vIOMMU) in a VM.
> + */
> +struct iommu_vdevice_alloc {
> +        __u32 size;
> +        __u32 viommu_id;
> +        __u32 dev_id;
> +        __u32 out_vdevice_id;
> +        __aligned_u64 virt_id;
> +};
> +#define IOMMU_VDEVICE_ALLOC _IO(IOMMUFD_TYPE, IOMMUFD_CMD_VDEVICE_ALLOC)
>  #endif
> diff --git a/drivers/iommu/iommufd/main.c b/drivers/iommu/iommufd/main.c
> index cc514f9bc3e6..d735fe04197f 100644
> --- a/drivers/iommu/iommufd/main.c
> +++ b/drivers/iommu/iommufd/main.c
> @@ -308,6 +308,7 @@ union ucmd_buffer {
>          struct iommu_option option;
>          struct iommu_vfio_ioas vfio_ioas;
>          struct iommu_viommu_alloc viommu;
> +        struct iommu_vdevice_alloc vdev;
>  #ifdef CONFIG_IOMMUFD_TEST
>          struct iommu_test_cmd test;
>  #endif
> @@ -363,6 +364,8 @@ static const struct iommufd_ioctl_op iommufd_ioctl_ops[] = {
>                   __reserved),
>          IOCTL_OP(IOMMU_VIOMMU_ALLOC, iommufd_viommu_alloc_ioctl,
>                   struct iommu_viommu_alloc, out_viommu_id),
> +        IOCTL_OP(IOMMU_VDEVICE_ALLOC, iommufd_vdevice_alloc_ioctl,
> +                 struct iommu_vdevice_alloc, virt_id),
>  #ifdef CONFIG_IOMMUFD_TEST
>          IOCTL_OP(IOMMU_TEST_CMD, iommufd_test, struct iommu_test_cmd, last),
>  #endif
> @@ -501,6 +504,9 @@ static const struct iommufd_object_ops iommufd_object_ops[] = {
>          [IOMMUFD_OBJ_VIOMMU] = {
>                  .destroy = iommufd_viommu_destroy,
>          },
> +        [IOMMUFD_OBJ_VDEVICE] = {
> +                .destroy = iommufd_vdevice_destroy,
> +        },
>  #ifdef CONFIG_IOMMUFD_TEST
>          [IOMMUFD_OBJ_SELFTEST] = {
>                  .destroy = iommufd_selftest_destroy,
> diff --git a/drivers/iommu/iommufd/viommu.c b/drivers/iommu/iommufd/viommu.c
> index 888239b78667..c82b4a07a4b1 100644
> --- a/drivers/iommu/iommufd/viommu.c
> +++ b/drivers/iommu/iommufd/viommu.c
> @@ -11,6 +11,7 @@ void iommufd_viommu_destroy(struct iommufd_object *obj)
>          if (viommu->ops && viommu->ops->destroy)
>                  viommu->ops->destroy(viommu);
>          refcount_dec(&viommu->hwpt->common.obj.users);
> +        xa_destroy(&viommu->vdevs);
>  }
>
>  int iommufd_viommu_alloc_ioctl(struct iommufd_ucmd *ucmd)
> @@ -53,6 +54,7 @@ int iommufd_viommu_alloc_ioctl(struct iommufd_ucmd *ucmd)
>                  goto out_put_hwpt;
>          }
>
> +        xa_init(&viommu->vdevs);
>          viommu->type = cmd->type;
>          viommu->ictx = ucmd->ictx;
>          viommu->hwpt = hwpt_paging;
> @@ -79,3 +81,77 @@ int iommufd_viommu_alloc_ioctl(struct iommufd_ucmd *ucmd)
>          iommufd_put_object(ucmd->ictx, &idev->obj);
>          return rc;
>  }
> +
> +void iommufd_vdevice_destroy(struct iommufd_object *obj)
> +{
> +        struct iommufd_vdevice *vdev =
> +                container_of(obj, struct iommufd_vdevice, obj);
> +        struct iommufd_viommu *viommu = vdev->viommu;
> +
> +        /* xa_cmpxchg is okay to fail if alloc returned -EEXIST previously */
> +        xa_cmpxchg(&viommu->vdevs, vdev->id, vdev, NULL, GFP_KERNEL);
> +        refcount_dec(&viommu->obj.users);
> +        put_device(vdev->dev);
> +}
> +
> +int iommufd_vdevice_alloc_ioctl(struct iommufd_ucmd *ucmd)
> +{
> +        struct iommu_vdevice_alloc *cmd = ucmd->cmd;
> +        struct iommufd_vdevice *vdev, *curr;
> +        struct iommufd_viommu *viommu;
> +        struct iommufd_device *idev;
> +        u64 virt_id = cmd->virt_id;
> +        int rc = 0;
> +
> +        /* virt_id indexes an xarray */
> +        if (virt_id > ULONG_MAX)
> +                return -EINVAL;
> +
> +        viommu = iommufd_get_viommu(ucmd, cmd->viommu_id);
> +        if (IS_ERR(viommu))
> +                return PTR_ERR(viommu);
> +
> +        idev = iommufd_get_device(ucmd, cmd->dev_id);
> +        if (IS_ERR(idev)) {
> +                rc = PTR_ERR(idev);
> +                goto out_put_viommu;
> +        }
> +
> +        if (viommu->iommu_dev != __iommu_get_iommu_dev(idev->dev)) {
> +                rc = -EINVAL;
> +                goto out_put_idev;
> +        }
> +
> +        vdev = iommufd_object_alloc(ucmd->ictx, vdev, IOMMUFD_OBJ_VDEVICE);
> +        if (IS_ERR(vdev)) {
> +                rc = PTR_ERR(vdev);
> +                goto out_put_idev;
> +        }
> +
> +        vdev->id = virt_id;
> +        vdev->dev = idev->dev;
> +        get_device(idev->dev);
> +        vdev->viommu = viommu;
> +        refcount_inc(&viommu->obj.users);
> +
> +        curr = xa_cmpxchg(&viommu->vdevs, virt_id, NULL, vdev, GFP_KERNEL);
> +        if (curr) {
> +                rc = xa_err(curr) ?: -EEXIST;
> +                goto out_abort;
> +        }
> +
> +        cmd->out_vdevice_id = vdev->obj.id;
> +        rc = iommufd_ucmd_respond(ucmd, sizeof(*cmd));
> +        if (rc)
> +                goto out_abort;
> +        iommufd_object_finalize(ucmd->ictx, &vdev->obj);
> +        goto out_put_idev;
> +
> +out_abort:
> +        iommufd_object_abort_and_destroy(ucmd->ictx, &vdev->obj);
> +out_put_idev:
> +        iommufd_put_object(ucmd->ictx, &idev->obj);
> +out_put_viommu:
> +        iommufd_put_object(ucmd->ictx, &viommu->obj);
> +        return rc;
> +}
On Thu, Nov 07, 2024 at 09:11:27PM +1100, Alexey Kardashevskiy wrote:
> On 31/10/24 08:35, Nicolin Chen wrote:
> > Introduce a new IOMMUFD_OBJ_VDEVICE to represent a physical device (struct
> > device) against a vIOMMU (struct iommufd_viommu) object in a VM.
> >
> > This vDEVICE object (and its structure) holds all the info and attributes
> > in the VM regarding the device related to the vIOMMU.
> >
> > As an initial patch, add a per-vIOMMU virtual ID. This can be:
> >  - Virtual StreamID on a nested ARM SMMUv3, an index to a Stream Table
> >  - Virtual DeviceID on a nested AMD IOMMU, an index to a Device Table
> >  - Virtual RID on a nested Intel VT-D IOMMU, an index to a Context Table
> > Potentially, this vDEVICE structure would hold some vData for Confidential
> > Compute Architecture (CCA). Use this virtual ID to index a "vdevs" xarray
> > that belongs to a vIOMMU object.
> >
> > Add a new ioctl for vDEVICE allocations. Since a vDEVICE is a connection
> > of a device object and an iommufd_viommu object, take two refcounts in the
> > ioctl handler.

> > +/**
> > + * struct iommu_vdevice_alloc - ioctl(IOMMU_VDEVICE_ALLOC)
> > + * @size: sizeof(struct iommu_vdevice_alloc)
> > + * @viommu_id: vIOMMU ID to associate with the virtual device
> > + * @dev_id: The physical device to allocate a virtual instance on the vIOMMU
> > + * @out_vdevice_id: Object handle for the vDevice. Pass to IOMMU_DESTROY
> > + * @virt_id: Virtual device ID per vIOMMU, e.g. vSID of ARM SMMUv3, vDeviceID
> > + *           of AMD IOMMU, and vRID of a nested Intel VT-d to a Context Table
>
> So it is one vdevice per passed-through device (say, a network
> adapter), right?

Yes. It's per iommufd_device per iommufd_viommu.

> I am asking as there are passed-through devices and
> IOMMU devices, and (at least on AMD) IOMMUs look like PCI devices, both
> in hosts and guests. For example, from the above: "@dev_id: The physical
> device ..." - both a network card and an IOMMU are physical, so is dev_id
> a NIC or an IOMMU? I assume that is a NIC (but it is a source of constant
> confusion).

In that case, dev_id is the NIC and viommu_id is the IOMMU. First, the
VMM should allocate a vIOMMU using the dev_id (NIC) to get a viommu_id,
and then use this viommu_id and dev_id to allocate a vDEVICE.

It might sound redundant in this case because this AMD IOMMU is
exclusive to the NIC. But an ARM/Intel IOMMU can be shared among
devices, so the VMM can allocate a vIOMMU with device1 and then
allocate vDEVICEs for device1, device2, device3, and so on.

> Is there any plan to add guest device BDFn as well, or can I add one
> here for my TEE-IO exercise, if this is the right place? It is the same as
> vDeviceID for AMD but I am not sure about the others, hence the
> question. Thanks,

Generally speaking, adding vRID isn't a problem so long as there is a
legit reason/usecase. That being said, if it is the same as the
@virt_id for AMD, why not just pass it via @virt_id vs. adding a new
vRID/vBDF field?

Thanks
Nicolin
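To make the flow above concrete, here is a hedged userspace sketch: one
vIOMMU is allocated first (IOMMU_VIOMMU_ALLOC, not shown), then
IOMMU_VDEVICE_ALLOC is called once per passed-through device sharing that
vIOMMU. The helper name and the fd/ID variables are placeholders, and
<linux/iommufd.h> is assumed to carry the uAPI added by this patch:

-------------------------------------------------------------------
#include <sys/ioctl.h>
#include <linux/iommufd.h>      /* assumes the headers from this series */

/* Placeholder helper: returns 0 on success and fills *out_vdevice_id */
static int vdevice_alloc(int iommufd, __u32 viommu_id, __u32 dev_id,
                         __u64 virt_id, __u32 *out_vdevice_id)
{
        struct iommu_vdevice_alloc cmd = {
                .size = sizeof(cmd),
                .viommu_id = viommu_id,
                .dev_id = dev_id,
                .virt_id = virt_id,     /* vSID/vDeviceID/vRID seen by the guest */
        };
        int rc = ioctl(iommufd, IOMMU_VDEVICE_ALLOC, &cmd);

        if (!rc)
                *out_vdevice_id = cmd.out_vdevice_id;
        return rc;
}

/* e.g. on ARM/Intel, several devices may share one vIOMMU:
 *      vdevice_alloc(fd, viommu_id, dev1_id, vsid1, &vdev1_id);
 *      vdevice_alloc(fd, viommu_id, dev2_id, vsid2, &vdev2_id);
 */
-------------------------------------------------------------------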
On Thu, Nov 07, 2024 at 08:31:39AM -0800, Nicolin Chen wrote:
> > Is there any plan to add guest device BDFn as well, or can I add one
> > here for my TEE-IO exercise, if this is the right place? It is the same as
> > vDeviceID for AMD but I am not sure about the others, hence the
> > question. Thanks,
>
> Generally speaking, adding vRID isn't a problem so long as there is a
> legit reason/usecase. That being said, if it is the same as the
> @virt_id for AMD, why not just pass it via @virt_id vs. adding a new
> vRID/vBDF field?

For CC I'm expecting you to add an AMD CC specific data blob to the
vIOMMU allocation that conveys all the specific details that the CC
world needs to define the vPCI device. This might include vRID.

It depends how you define vdeviceid - for AMD, vdeviceid is the index
into the DTE table. vRID, especially if it includes a segment id, may
be a larger value and should be separate.

Jason
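Purely to illustrate the distinction drawn above, and not a proposed uAPI: a
CC-specific per-device data blob could carry the guest RID, including its
segment, separately from the DTE-index style virt_id. The struct name and
layout below are invented for this example:

-------------------------------------------------------------------
struct iommu_vdevice_data_amd_cc {      /* hypothetical, illustration only */
        __u32 vrid;             /* guest segment:bus:dev.fn */
        __u32 vdevice_id;       /* index into the guest Device Table */
};
-------------------------------------------------------------------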
diff --git a/drivers/iommu/iommufd/iommufd_private.h b/drivers/iommu/iommufd/iommufd_private.h
index e8f5ef550cc9..062656c19a07 100644
--- a/drivers/iommu/iommufd/iommufd_private.h
+++ b/drivers/iommu/iommufd/iommufd_private.h
@@ -507,8 +507,26 @@ static inline int iommufd_hwpt_replace_device(struct iommufd_device *idev,
         return iommu_group_replace_domain(idev->igroup->group, hwpt->domain);
 }
 
+static inline struct iommufd_viommu *
+iommufd_get_viommu(struct iommufd_ucmd *ucmd, u32 id)
+{
+        return container_of(iommufd_get_object(ucmd->ictx, id,
+                                               IOMMUFD_OBJ_VIOMMU),
+                            struct iommufd_viommu, obj);
+}
+
 int iommufd_viommu_alloc_ioctl(struct iommufd_ucmd *ucmd);
 void iommufd_viommu_destroy(struct iommufd_object *obj);
+int iommufd_vdevice_alloc_ioctl(struct iommufd_ucmd *ucmd);
+void iommufd_vdevice_destroy(struct iommufd_object *obj);
+
+struct iommufd_vdevice {
+        struct iommufd_object obj;
+        struct iommufd_ctx *ictx;
+        struct iommufd_viommu *viommu;
+        struct device *dev;
+        u64 id; /* per-vIOMMU virtual ID */
+};
 
 #ifdef CONFIG_IOMMUFD_TEST
 int iommufd_test(struct iommufd_ucmd *ucmd);
diff --git a/include/linux/iommufd.h b/include/linux/iommufd.h
index f03c75410938..ee58c5c573ec 100644
--- a/include/linux/iommufd.h
+++ b/include/linux/iommufd.h
@@ -10,6 +10,7 @@
 #include <linux/errno.h>
 #include <linux/refcount.h>
 #include <linux/types.h>
+#include <linux/xarray.h>
 
 struct device;
 struct file;
@@ -31,6 +32,7 @@ enum iommufd_object_type {
         IOMMUFD_OBJ_ACCESS,
         IOMMUFD_OBJ_FAULT,
         IOMMUFD_OBJ_VIOMMU,
+        IOMMUFD_OBJ_VDEVICE,
 #ifdef CONFIG_IOMMUFD_TEST
         IOMMUFD_OBJ_SELFTEST,
 #endif
@@ -89,6 +91,8 @@ struct iommufd_viommu {
 
         const struct iommufd_viommu_ops *ops;
 
+        struct xarray vdevs;
+
         unsigned int type;
 };
 
diff --git a/include/uapi/linux/iommufd.h b/include/uapi/linux/iommufd.h
index a498d4838f9a..9b5236004b8e 100644
--- a/include/uapi/linux/iommufd.h
+++ b/include/uapi/linux/iommufd.h
@@ -53,6 +53,7 @@ enum {
         IOMMUFD_CMD_FAULT_QUEUE_ALLOC = 0x8e,
         IOMMUFD_CMD_IOAS_MAP_FILE = 0x8f,
         IOMMUFD_CMD_VIOMMU_ALLOC = 0x90,
+        IOMMUFD_CMD_VDEVICE_ALLOC = 0x91,
 };
 
 /**
@@ -864,4 +865,25 @@ struct iommu_viommu_alloc {
         __u32 out_viommu_id;
 };
 #define IOMMU_VIOMMU_ALLOC _IO(IOMMUFD_TYPE, IOMMUFD_CMD_VIOMMU_ALLOC)
+
+/**
+ * struct iommu_vdevice_alloc - ioctl(IOMMU_VDEVICE_ALLOC)
+ * @size: sizeof(struct iommu_vdevice_alloc)
+ * @viommu_id: vIOMMU ID to associate with the virtual device
+ * @dev_id: The physical device to allocate a virtual instance on the vIOMMU
+ * @out_vdevice_id: Object handle for the vDevice. Pass to IOMMU_DESTROY
+ * @virt_id: Virtual device ID per vIOMMU, e.g. vSID of ARM SMMUv3, vDeviceID
+ *           of AMD IOMMU, and vRID of a nested Intel VT-d to a Context Table
+ *
+ * Allocate a virtual device instance (for a physical device) against a vIOMMU.
+ * This instance holds the device's information (related to its vIOMMU) in a VM.
+ */
+struct iommu_vdevice_alloc {
+        __u32 size;
+        __u32 viommu_id;
+        __u32 dev_id;
+        __u32 out_vdevice_id;
+        __aligned_u64 virt_id;
+};
+#define IOMMU_VDEVICE_ALLOC _IO(IOMMUFD_TYPE, IOMMUFD_CMD_VDEVICE_ALLOC)
 #endif
diff --git a/drivers/iommu/iommufd/main.c b/drivers/iommu/iommufd/main.c
index cc514f9bc3e6..d735fe04197f 100644
--- a/drivers/iommu/iommufd/main.c
+++ b/drivers/iommu/iommufd/main.c
@@ -308,6 +308,7 @@ union ucmd_buffer {
         struct iommu_option option;
         struct iommu_vfio_ioas vfio_ioas;
         struct iommu_viommu_alloc viommu;
+        struct iommu_vdevice_alloc vdev;
 #ifdef CONFIG_IOMMUFD_TEST
         struct iommu_test_cmd test;
 #endif
@@ -363,6 +364,8 @@ static const struct iommufd_ioctl_op iommufd_ioctl_ops[] = {
                  __reserved),
         IOCTL_OP(IOMMU_VIOMMU_ALLOC, iommufd_viommu_alloc_ioctl,
                  struct iommu_viommu_alloc, out_viommu_id),
+        IOCTL_OP(IOMMU_VDEVICE_ALLOC, iommufd_vdevice_alloc_ioctl,
+                 struct iommu_vdevice_alloc, virt_id),
 #ifdef CONFIG_IOMMUFD_TEST
         IOCTL_OP(IOMMU_TEST_CMD, iommufd_test, struct iommu_test_cmd, last),
 #endif
@@ -501,6 +504,9 @@ static const struct iommufd_object_ops iommufd_object_ops[] = {
         [IOMMUFD_OBJ_VIOMMU] = {
                 .destroy = iommufd_viommu_destroy,
         },
+        [IOMMUFD_OBJ_VDEVICE] = {
+                .destroy = iommufd_vdevice_destroy,
+        },
 #ifdef CONFIG_IOMMUFD_TEST
         [IOMMUFD_OBJ_SELFTEST] = {
                 .destroy = iommufd_selftest_destroy,
diff --git a/drivers/iommu/iommufd/viommu.c b/drivers/iommu/iommufd/viommu.c
index 888239b78667..c82b4a07a4b1 100644
--- a/drivers/iommu/iommufd/viommu.c
+++ b/drivers/iommu/iommufd/viommu.c
@@ -11,6 +11,7 @@ void iommufd_viommu_destroy(struct iommufd_object *obj)
         if (viommu->ops && viommu->ops->destroy)
                 viommu->ops->destroy(viommu);
         refcount_dec(&viommu->hwpt->common.obj.users);
+        xa_destroy(&viommu->vdevs);
 }
 
 int iommufd_viommu_alloc_ioctl(struct iommufd_ucmd *ucmd)
@@ -53,6 +54,7 @@ int iommufd_viommu_alloc_ioctl(struct iommufd_ucmd *ucmd)
                 goto out_put_hwpt;
         }
 
+        xa_init(&viommu->vdevs);
         viommu->type = cmd->type;
         viommu->ictx = ucmd->ictx;
         viommu->hwpt = hwpt_paging;
@@ -79,3 +81,77 @@ int iommufd_viommu_alloc_ioctl(struct iommufd_ucmd *ucmd)
         iommufd_put_object(ucmd->ictx, &idev->obj);
         return rc;
 }
+
+void iommufd_vdevice_destroy(struct iommufd_object *obj)
+{
+        struct iommufd_vdevice *vdev =
+                container_of(obj, struct iommufd_vdevice, obj);
+        struct iommufd_viommu *viommu = vdev->viommu;
+
+        /* xa_cmpxchg is okay to fail if alloc returned -EEXIST previously */
+        xa_cmpxchg(&viommu->vdevs, vdev->id, vdev, NULL, GFP_KERNEL);
+        refcount_dec(&viommu->obj.users);
+        put_device(vdev->dev);
+}
+
+int iommufd_vdevice_alloc_ioctl(struct iommufd_ucmd *ucmd)
+{
+        struct iommu_vdevice_alloc *cmd = ucmd->cmd;
+        struct iommufd_vdevice *vdev, *curr;
+        struct iommufd_viommu *viommu;
+        struct iommufd_device *idev;
+        u64 virt_id = cmd->virt_id;
+        int rc = 0;
+
+        /* virt_id indexes an xarray */
+        if (virt_id > ULONG_MAX)
+                return -EINVAL;
+
+        viommu = iommufd_get_viommu(ucmd, cmd->viommu_id);
+        if (IS_ERR(viommu))
+                return PTR_ERR(viommu);
+
+        idev = iommufd_get_device(ucmd, cmd->dev_id);
+        if (IS_ERR(idev)) {
+                rc = PTR_ERR(idev);
+                goto out_put_viommu;
+        }
+
+        if (viommu->iommu_dev != __iommu_get_iommu_dev(idev->dev)) {
+                rc = -EINVAL;
+                goto out_put_idev;
+        }
+
+        vdev = iommufd_object_alloc(ucmd->ictx, vdev, IOMMUFD_OBJ_VDEVICE);
+        if (IS_ERR(vdev)) {
+                rc = PTR_ERR(vdev);
+                goto out_put_idev;
+        }
+
+        vdev->id = virt_id;
+        vdev->dev = idev->dev;
+        get_device(idev->dev);
+        vdev->viommu = viommu;
+        refcount_inc(&viommu->obj.users);
+
+        curr = xa_cmpxchg(&viommu->vdevs, virt_id, NULL, vdev, GFP_KERNEL);
+        if (curr) {
+                rc = xa_err(curr) ?: -EEXIST;
+                goto out_abort;
+        }
+
+        cmd->out_vdevice_id = vdev->obj.id;
+        rc = iommufd_ucmd_respond(ucmd, sizeof(*cmd));
+        if (rc)
+                goto out_abort;
+        iommufd_object_finalize(ucmd->ictx, &vdev->obj);
+        goto out_put_idev;
+
+out_abort:
+        iommufd_object_abort_and_destroy(ucmd->ictx, &vdev->obj);
+out_put_idev:
+        iommufd_put_object(ucmd->ictx, &idev->obj);
+out_put_viommu:
+        iommufd_put_object(ucmd->ictx, &viommu->obj);
+        return rc;
+}
Introduce a new IOMMUFD_OBJ_VDEVICE to represent a physical device (struct
device) against a vIOMMU (struct iommufd_viommu) object in a VM.

This vDEVICE object (and its structure) holds all the info and attributes
in the VM regarding the device related to the vIOMMU.

As an initial patch, add a per-vIOMMU virtual ID. This can be:
 - Virtual StreamID on a nested ARM SMMUv3, an index to a Stream Table
 - Virtual DeviceID on a nested AMD IOMMU, an index to a Device Table
 - Virtual RID on a nested Intel VT-D IOMMU, an index to a Context Table
Potentially, this vDEVICE structure would hold some vData for Confidential
Compute Architecture (CCA). Use this virtual ID to index a "vdevs" xarray
that belongs to a vIOMMU object.

Add a new ioctl for vDEVICE allocations. Since a vDEVICE is a connection
of a device object and an iommufd_viommu object, take two refcounts in the
ioctl handler.

Signed-off-by: Nicolin Chen <nicolinc@nvidia.com>
---
 drivers/iommu/iommufd/iommufd_private.h | 18 ++++++
 include/linux/iommufd.h                 |  4 ++
 include/uapi/linux/iommufd.h            | 22 +++++++
 drivers/iommu/iommufd/main.c            |  6 ++
 drivers/iommu/iommufd/viommu.c          | 76 +++++++++++++++++++++++++
 5 files changed, 126 insertions(+)
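As a hedged illustration of what the per-vIOMMU "vdevs" xarray enables, here
is a kernel-side sketch of translating a guest-visible virtual ID back to the
physical device. The helper name and placement are hypothetical (not part of
this patch), and the caller is assumed to hold the xarray lock:

-------------------------------------------------------------------
static struct device *
viommu_virt_id_to_dev(struct iommufd_viommu *viommu, unsigned long vdev_id)
{
        struct iommufd_vdevice *vdev;

        lockdep_assert_held(&viommu->vdevs.xa_lock);

        vdev = xa_load(&viommu->vdevs, vdev_id);
        return vdev ? vdev->dev : NULL;
}
-------------------------------------------------------------------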