Message ID | 20230816121349.104436-4-yi.l.liu@intel.com (mailing list archive) |
---|---|
State | New |
Series | iommufd: Add iommu hardware info reporting |
> From: Liu, Yi L <yi.l.liu@intel.com>
> Sent: Wednesday, August 16, 2023 8:14 PM
>
> Under nested IOMMU translation, userspace owns the stage-1 translation
> table (e.g. the stage-1 page table of Intel VT-d or the context table of
> ARM SMMUv3, and etc.). Stage-1 translation tables are vendor specific, and
> need to be compatible with the underlying IOMMU hardware. Hence, userspace
> should know the IOMMU hardware capability before creating and configuring
> the stage-1 translation table to kernel.
>
> This adds IOMMU_GET_HW_INFO ioctl to query the IOMMU hardware information
> (a.k.a capability) for a given device. The returned data is vendor
> specific, userspace needs to decode it with the structure by the output
> @out_data_type field.

"The format of the returned data is vendor specific and must be decoded
according to @out_data_type field".

> +
> +int iommufd_get_hw_info(struct iommufd_ucmd *ucmd)
> +{
> +	struct iommu_hw_info *cmd = ucmd->cmd;
> +	void __user *user_ptr = u64_to_user_ptr(cmd->data_uptr);
> +	const struct iommu_ops *ops;
> +	struct iommufd_device *idev;
> +	unsigned int data_len;
> +	unsigned int copy_len;
> +	void *data = NULL;
> +	int rc;
> +
> +	if (cmd->flags || cmd->__reserved)
> +		return -EOPNOTSUPP;
> +
> +	idev = iommufd_get_device(ucmd, cmd->dev_id);
> +	if (IS_ERR(idev))
> +		return PTR_ERR(idev);
> +
> +	ops = dev_iommu_ops(idev->dev);
> +	if (ops->hw_info) {
> +		data = ops->hw_info(idev->dev, &data_len, &cmd->out_data_type);
> +		if (IS_ERR(data)) {
> +			rc = PTR_ERR(data);
> +			goto err_put;
> +		}
> +
> +		/*
> +		 * drivers that have hw_info callback should have a unique
> +		 * iommu_hw_info_type.
> +		 */
> +		if (WARN_ON_ONCE(cmd->out_data_type ==
> +				 IOMMU_HW_INFO_TYPE_NONE)) {
> +			rc = -ENODEV;
> +			goto out;
> +		}
> +	} else {
> +		cmd->out_data_type = IOMMU_HW_INFO_TYPE_NONE;
> +		data_len = 0;
> +		data = NULL;

data is already initialized as NULL.

> +
> +	/*
> +	 * We return the length the kernel supports so userspace may know what
> +	 * the kernel capability is. It could be larger than the input buffer.
> +	 */
> +	cmd->data_len = data_len;
> +
> +	rc = iommufd_ucmd_respond(ucmd, sizeof(*cmd));
> +out:

out_free:

> +	kfree(data);
> +err_put:

out_put: (since this is also used in the success path)

> + * To capture an iommu type specific hardware information data, @data_uptr and
> + * its length @data_len must be provided. Trailing bytes will be zeroed if the
> + * user buffer is larger than the data that kernel has. Otherwise, kernel only
> + * fills the buffer using the given length in @data_len. If the ioctl succeeds,
> + * @data_len will be updated to the length that kernel actually supports,
> + * @out_data_type will be filled to decode the data filled in the buffer
> + * pointed by @data_uptr. Input @data_len == zero is allowed, no information
> + * data will be filled to user, but user space could get the iommu_hw_info_type
> + * filled in @out_data_type and the iommu hardware information data length
> + * supported by kernel filled in @data_len.

I'd just keep "Input @data_len == zero is allowed" and remove all the
trailing words which just duplicate with the former context.

Reviewed-by: Kevin Tian <kevin.tian@intel.com>

Looks like Yi's latest code has not addressed these comments.

On Thu, Aug 17, 2023 at 07:31:42AM +0000, Tian, Kevin wrote:
> > From: Liu, Yi L <yi.l.liu@intel.com>
> > Sent: Wednesday, August 16, 2023 8:14 PM
> >
> > Under nested IOMMU translation, userspace owns the stage-1 translation
> > table (e.g. the stage-1 page table of Intel VT-d or the context table of
> > ARM SMMUv3, and etc.). Stage-1 translation tables are vendor specific, and
> > need to be compatible with the underlying IOMMU hardware. Hence, userspace
> > should know the IOMMU hardware capability before creating and configuring
> > the stage-1 translation table to kernel.
> >
> > This adds IOMMU_GET_HW_INFO ioctl to query the IOMMU hardware information
> > (a.k.a capability) for a given device. The returned data is vendor
> > specific, userspace needs to decode it with the structure by the output
> > @out_data_type field.
>
> "The format of the returned data is vendor specific and must be decoded
> according to @out_data_type field".

Ack.

> > +int iommufd_get_hw_info(struct iommufd_ucmd *ucmd)
> > +{
> > +	struct iommu_hw_info *cmd = ucmd->cmd;
> > +	void __user *user_ptr = u64_to_user_ptr(cmd->data_uptr);
> > +	const struct iommu_ops *ops;
> > +	struct iommufd_device *idev;
> > +	unsigned int data_len;
> > +	unsigned int copy_len;
> > +	void *data = NULL;
[..]
> > +	} else {
> > +		cmd->out_data_type = IOMMU_HW_INFO_TYPE_NONE;
> > +		data_len = 0;
> > +		data = NULL;
>
> data is already initialized as NULL.

Will drop.

> > +
> > +	/*
> > +	 * We return the length the kernel supports so userspace may know what
> > +	 * the kernel capability is. It could be larger than the input buffer.
> > +	 */
> > +	cmd->data_len = data_len;
> > +
> > +	rc = iommufd_ucmd_respond(ucmd, sizeof(*cmd));
> > +out:
>
> out_free:
>
> > +	kfree(data);
> > +err_put:
>
> out_put: (since this is also used in the success path)

Ack for both.

> > + * To capture an iommu type specific hardware information data, @data_uptr and
> > + * its length @data_len must be provided. Trailing bytes will be zeroed if the
> > + * user buffer is larger than the data that kernel has. Otherwise, kernel only
> > + * fills the buffer using the given length in @data_len. If the ioctl succeeds,
> > + * @data_len will be updated to the length that kernel actually supports,
> > + * @out_data_type will be filled to decode the data filled in the buffer
> > + * pointed by @data_uptr. Input @data_len == zero is allowed, no information
> > + * data will be filled to user, but user space could get the iommu_hw_info_type
> > + * filled in @out_data_type and the iommu hardware information data length
> > + * supported by kernel filled in @data_len.
>
> I'd just keep "Input @data_len == zero is allowed" and remove all the
> trailing words which just duplicate with the former context.

Will do.

> Reviewed-by: Kevin Tian <kevin.tian@intel.com>

Adding this.

Thanks
Nic

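For readers following the label discussion: the suggested names (out_free:, out_put:) reflect that both labels are reached on the success path as well as on errors. Below is a minimal userspace sketch of that unwind ordering. Everything in it (demo_obj, demo_get, demo_put, demo_query) is a hypothetical stand-in, not iommufd code, and malloc()/free() merely take the place of the driver allocation and kfree().

```c
#include <errno.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for a refcounted object; not an iommufd type. */
struct demo_obj {
	int refs;
};

static void demo_get(struct demo_obj *obj) { obj->refs++; }
static void demo_put(struct demo_obj *obj) { obj->refs--; }

/*
 * Returns 0 on success or a negative errno. Both labels are also
 * reached on success, hence out_* rather than err_* names.
 */
static int demo_query(struct demo_obj *obj, size_t len, int fail_late)
{
	char *data;
	int rc;

	demo_get(obj);

	data = malloc(len ? len : 1);
	if (!data) {
		rc = -ENOMEM;
		goto out_put;	/* nothing allocated yet, only drop the ref */
	}
	memset(data, 0, len);

	if (fail_late) {
		rc = -ENODEV;
		goto out_free;	/* buffer exists, free it and drop the ref */
	}

	rc = 0;
out_free:
	free(data);
out_put:
	demo_put(obj);
	return rc;
}
```

Each label undoes exactly one acquisition, in reverse order, so a jump target can be chosen by looking only at what has been acquired so far.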
> From: Nicolin Chen <nicolinc@nvidia.com>
> Sent: Friday, August 18, 2023 5:08 AM
>
> Looks like Yi's latest code has not addressed these comments.

Yeah. Not yet. In progress to incorporate them.

On Fri, Aug 18, 2023 at 12:04:29AM +0000, Liu, Yi L wrote:
> > > > +int iommufd_get_hw_info(struct iommufd_ucmd *ucmd)
> > > > +{
> > > > +	struct iommu_hw_info *cmd = ucmd->cmd;
> > > > +	void __user *user_ptr = u64_to_user_ptr(cmd->data_uptr);
> > > > +	const struct iommu_ops *ops;
> > > > +	struct iommufd_device *idev;
> > > > +	unsigned int data_len;
> > > > +	unsigned int copy_len;
> > > > +	void *data = NULL;
> > [..]
> > > > +	} else {
> > > > +		cmd->out_data_type = IOMMU_HW_INFO_TYPE_NONE;
> > > > +		data_len = 0;
> > > > +		data = NULL;
> > >
> > > data is already initialized as NULL.
>
> Probably we can set data_len = 0 and the out_data_type to _NONE is
> the top as well. Any preference?

I think it is clear to remove the variable initialization so this
branch is more explicit

Jason

diff --git a/drivers/iommu/iommufd/device.c b/drivers/iommu/iommufd/device.c
index 90f88c295ce0..36dff7ca3ae4 100644
--- a/drivers/iommu/iommufd/device.c
+++ b/drivers/iommu/iommufd/device.c
@@ -4,6 +4,7 @@
 #include <linux/iommufd.h>
 #include <linux/slab.h>
 #include <linux/iommu.h>
+#include <uapi/linux/iommufd.h>
 #include "../iommu-priv.h"
 
 #include "io_pagetable.h"
@@ -1119,3 +1120,75 @@ int iommufd_access_rw(struct iommufd_access *access, unsigned long iova,
 	return rc;
 }
 EXPORT_SYMBOL_NS_GPL(iommufd_access_rw, IOMMUFD);
+
+int iommufd_get_hw_info(struct iommufd_ucmd *ucmd)
+{
+	struct iommu_hw_info *cmd = ucmd->cmd;
+	void __user *user_ptr = u64_to_user_ptr(cmd->data_uptr);
+	const struct iommu_ops *ops;
+	struct iommufd_device *idev;
+	unsigned int data_len;
+	unsigned int copy_len;
+	void *data = NULL;
+	int rc;
+
+	if (cmd->flags || cmd->__reserved)
+		return -EOPNOTSUPP;
+
+	idev = iommufd_get_device(ucmd, cmd->dev_id);
+	if (IS_ERR(idev))
+		return PTR_ERR(idev);
+
+	ops = dev_iommu_ops(idev->dev);
+	if (ops->hw_info) {
+		data = ops->hw_info(idev->dev, &data_len, &cmd->out_data_type);
+		if (IS_ERR(data)) {
+			rc = PTR_ERR(data);
+			goto err_put;
+		}
+
+		/*
+		 * drivers that have hw_info callback should have a unique
+		 * iommu_hw_info_type.
+		 */
+		if (WARN_ON_ONCE(cmd->out_data_type ==
+				 IOMMU_HW_INFO_TYPE_NONE)) {
+			rc = -ENODEV;
+			goto out;
+		}
+	} else {
+		cmd->out_data_type = IOMMU_HW_INFO_TYPE_NONE;
+		data_len = 0;
+		data = NULL;
+	}
+
+	copy_len = min(cmd->data_len, data_len);
+	if (copy_to_user(user_ptr, data, copy_len)) {
+		rc = -EFAULT;
+		goto out;
+	}
+
+	/*
+	 * Zero the trailing bytes if the user buffer is bigger than the
+	 * data size kernel actually has.
+	 */
+	if (copy_len < cmd->data_len) {
+		if (clear_user(user_ptr + copy_len, cmd->data_len - copy_len)) {
+			rc = -EFAULT;
+			goto out;
+		}
+	}
+
+	/*
+	 * We return the length the kernel supports so userspace may know what
+	 * the kernel capability is. It could be larger than the input buffer.
+	 */
+	cmd->data_len = data_len;
+
+	rc = iommufd_ucmd_respond(ucmd, sizeof(*cmd));
+out:
+	kfree(data);
+err_put:
+	iommufd_put_object(&idev->obj);
+	return rc;
+}
diff --git a/drivers/iommu/iommufd/iommufd_private.h b/drivers/iommu/iommufd/iommufd_private.h
index 5a45b8ba2e26..2c58670011fe 100644
--- a/drivers/iommu/iommufd/iommufd_private.h
+++ b/drivers/iommu/iommufd/iommufd_private.h
@@ -296,6 +296,7 @@ iommufd_get_device(struct iommufd_ucmd *ucmd, u32 id)
 }
 
 void iommufd_device_destroy(struct iommufd_object *obj);
+int iommufd_get_hw_info(struct iommufd_ucmd *ucmd);
 
 struct iommufd_access {
 	struct iommufd_object obj;
diff --git a/drivers/iommu/iommufd/main.c b/drivers/iommu/iommufd/main.c
index 5f7e9fa45502..e71523cbd0de 100644
--- a/drivers/iommu/iommufd/main.c
+++ b/drivers/iommu/iommufd/main.c
@@ -305,6 +305,7 @@ static int iommufd_option(struct iommufd_ucmd *ucmd)
 
 union ucmd_buffer {
 	struct iommu_destroy destroy;
+	struct iommu_hw_info info;
 	struct iommu_hwpt_alloc hwpt;
 	struct iommu_ioas_alloc alloc;
 	struct iommu_ioas_allow_iovas allow_iovas;
@@ -337,6 +338,8 @@ struct iommufd_ioctl_op {
 	}
 static const struct iommufd_ioctl_op iommufd_ioctl_ops[] = {
 	IOCTL_OP(IOMMU_DESTROY, iommufd_destroy, struct iommu_destroy, id),
+	IOCTL_OP(IOMMU_GET_HW_INFO, iommufd_get_hw_info, struct iommu_hw_info,
+		 __reserved),
 	IOCTL_OP(IOMMU_HWPT_ALLOC, iommufd_hwpt_alloc, struct iommu_hwpt_alloc,
 		 __reserved),
 	IOCTL_OP(IOMMU_IOAS_ALLOC, iommufd_ioas_alloc_ioctl,
diff --git a/include/uapi/linux/iommufd.h b/include/uapi/linux/iommufd.h
index ac11ace21edb..09d5e9cff7b3 100644
--- a/include/uapi/linux/iommufd.h
+++ b/include/uapi/linux/iommufd.h
@@ -46,6 +46,7 @@ enum {
 	IOMMUFD_CMD_OPTION,
 	IOMMUFD_CMD_VFIO_IOAS,
 	IOMMUFD_CMD_HWPT_ALLOC,
+	IOMMUFD_CMD_GET_HW_INFO,
 };
 
 /**
@@ -379,4 +380,45 @@ struct iommu_hwpt_alloc {
 enum iommu_hw_info_type {
 	IOMMU_HW_INFO_TYPE_NONE,
 };
+
+/**
+ * struct iommu_hw_info - ioctl(IOMMU_GET_HW_INFO)
+ * @size: sizeof(struct iommu_hw_info)
+ * @flags: Must be 0
+ * @dev_id: The device bound to the iommufd
+ * @data_len: Input the length of a user buffer in bytes. Output the length of
+ *            data that kernel supports
+ * @data_uptr: User pointer to a user-space buffer used by the kernel to fill
+ *             the iommu type specific hardware information data
+ * @out_data_type: Output the iommu hardware info type as defined in the enum
+ *                 iommu_hw_info_type.
+ * @__reserved: Must be 0
+ *
+ * Query an iommu type specific hardware information data from an iommu behind
+ * a given device that has been bound to iommufd. This hardware info data will
+ * be used to sync capabilities between the virtual iommu and the physical
+ * iommu, e.g. a nested translation setup needs to check the hardware info, so
+ * a guest stage-1 page table can be compatible with the physical iommu.
+ *
+ * To capture an iommu type specific hardware information data, @data_uptr and
+ * its length @data_len must be provided. Trailing bytes will be zeroed if the
+ * user buffer is larger than the data that kernel has. Otherwise, kernel only
+ * fills the buffer using the given length in @data_len. If the ioctl succeeds,
+ * @data_len will be updated to the length that kernel actually supports,
+ * @out_data_type will be filled to decode the data filled in the buffer
+ * pointed by @data_uptr. Input @data_len == zero is allowed, no information
+ * data will be filled to user, but user space could get the iommu_hw_info_type
+ * filled in @out_data_type and the iommu hardware information data length
+ * supported by kernel filled in @data_len.
+ */
+struct iommu_hw_info {
+	__u32 size;
+	__u32 flags;
+	__u32 dev_id;
+	__u32 data_len;
+	__aligned_u64 data_uptr;
+	__u32 out_data_type;
+	__u32 __reserved;
+};
+#define IOMMU_GET_HW_INFO _IO(IOMMUFD_TYPE, IOMMUFD_CMD_GET_HW_INFO)
 #endif
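
The @data_len contract documented above (copy at most the smaller of the two lengths, zero any trailing user bytes, report the kernel-side length back) can be modelled in plain userspace C. This is only an illustrative sketch: model_fill() and its parameters are hypothetical names, and memcpy()/memset() stand in for the kernel's copy_to_user()/clear_user(), so fault handling is not modelled.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/*
 * Userspace model of the copy-then-zero step. All names here are
 * hypothetical; the kernel code uses copy_to_user()/clear_user().
 * Returns the kernel-side length, which the ioctl reports in @data_len.
 */
static uint32_t model_fill(void *user_buf, uint32_t user_len,
			   const void *kdata, uint32_t kernel_len)
{
	uint32_t copy_len = user_len < kernel_len ? user_len : kernel_len;

	/* Copy only what fits in the user buffer. */
	if (copy_len)
		memcpy(user_buf, kdata, copy_len);

	/* Zero the trailing bytes when the user buffer is larger. */
	if (copy_len < user_len)
		memset((char *)user_buf + copy_len, 0, user_len - copy_len);

	return kernel_len;
}
```

Because the reported length does not depend on the input length, a caller could first pass a zero-length buffer to learn the size and type, then allocate and query again, which matches the "Input @data_len == zero is allowed" case in the comment above.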