
[v8,3/5] iommufd: Add IOMMU_GET_HW_INFO

Message ID 20230816121349.104436-4-yi.l.liu@intel.com
State New, archived
Series iommufd: Add iommu hardware info reporting

Commit Message

Yi Liu Aug. 16, 2023, 12:13 p.m. UTC
Under nested IOMMU translation, userspace owns the stage-1 translation
table (e.g. the stage-1 page table of Intel VT-d or the context table of
ARM SMMUv3, and etc.). Stage-1 translation tables are vendor specific, and
need to be compatible with the underlying IOMMU hardware. Hence, userspace
should know the IOMMU hardware capability before creating and configuring
the stage-1 translation table to kernel.

This adds IOMMU_GET_HW_INFO ioctl to query the IOMMU hardware information
(a.k.a capability) for a given device. The returned data is vendor
specific, userspace needs to decode it with the structure by the output
@out_data_type field.

Since only physical devices have IOMMU hardware, this returns an error if
the given device is not a physical device.

Reviewed-by: Lu Baolu <baolu.lu@linux.intel.com>
Co-developed-by: Nicolin Chen <nicolinc@nvidia.com>
Signed-off-by: Nicolin Chen <nicolinc@nvidia.com>
Signed-off-by: Yi Liu <yi.l.liu@intel.com>
---
 drivers/iommu/iommufd/device.c          | 73 +++++++++++++++++++++++++
 drivers/iommu/iommufd/iommufd_private.h |  1 +
 drivers/iommu/iommufd/main.c            |  3 +
 include/uapi/linux/iommufd.h            | 42 ++++++++++++++
 4 files changed, 119 insertions(+)

Comments

Tian, Kevin Aug. 17, 2023, 7:31 a.m. UTC | #1
> From: Liu, Yi L <yi.l.liu@intel.com>
> Sent: Wednesday, August 16, 2023 8:14 PM
> 
> Under nested IOMMU translation, userspace owns the stage-1 translation
> table (e.g. the stage-1 page table of Intel VT-d or the context table of
> ARM SMMUv3, and etc.). Stage-1 translation tables are vendor specific, and
> need to be compatible with the underlying IOMMU hardware. Hence, userspace
> should know the IOMMU hardware capability before creating and configuring
> the stage-1 translation table to kernel.
> 
> This adds IOMMU_GET_HW_INFO ioctl to query the IOMMU hardware information
> (a.k.a capability) for a given device. The returned data is vendor
> specific, userspace needs to decode it with the structure by the output
> @out_data_type field.

"The format of the returned data is vendor specific and must be decoded
according to @out_data_type field".

> +
> +int iommufd_get_hw_info(struct iommufd_ucmd *ucmd)
> +{
> +	struct iommu_hw_info *cmd = ucmd->cmd;
> +	void __user *user_ptr = u64_to_user_ptr(cmd->data_uptr);
> +	const struct iommu_ops *ops;
> +	struct iommufd_device *idev;
> +	unsigned int data_len;
> +	unsigned int copy_len;
> +	void *data = NULL;
> +	int rc;
> +
> +	if (cmd->flags || cmd->__reserved)
> +		return -EOPNOTSUPP;
> +
> +	idev = iommufd_get_device(ucmd, cmd->dev_id);
> +	if (IS_ERR(idev))
> +		return PTR_ERR(idev);
> +
> +	ops = dev_iommu_ops(idev->dev);
> +	if (ops->hw_info) {
> +		data = ops->hw_info(idev->dev, &data_len, &cmd->out_data_type);
> +		if (IS_ERR(data)) {
> +			rc = PTR_ERR(data);
> +			goto err_put;
> +		}
> +
> +		/*
> +		 * drivers that have hw_info callback should have a unique
> +		 * iommu_hw_info_type.
> +		 */
> +		if (WARN_ON_ONCE(cmd->out_data_type ==
> +				 IOMMU_HW_INFO_TYPE_NONE)) {
> +			rc = -ENODEV;
> +			goto out;
> +		}
> +	} else {
> +		cmd->out_data_type = IOMMU_HW_INFO_TYPE_NONE;
> +		data_len = 0;
> +		data = NULL;

data is already initialized as NULL.

> +
> +	/*
> +	 * We return the length the kernel supports so userspace may know what
> +	 * the kernel capability is. It could be larger than the input buffer.
> +	 */
> +	cmd->data_len = data_len;
> +
> +	rc = iommufd_ucmd_respond(ucmd, sizeof(*cmd));
> +out:

out_free:

> +	kfree(data);
> +err_put:

out_put: (since this is also used in the success path)

> + * To capture an iommu type specific hardware information data, @data_uptr and
> + * its length @data_len must be provided. Trailing bytes will be zeroed if the
> + * user buffer is larger than the data that kernel has. Otherwise, kernel only
> + * fills the buffer using the given length in @data_len. If the ioctl succeeds,
> + * @data_len will be updated to the length that kernel actually supports,
> + * @out_data_type will be filled to decode the data filled in the buffer
> + * pointed by @data_uptr. Input @data_len == zero is allowed, no information
> + * data will be filled to user, but user space could get the iommu_hw_info_type
> + * filled in @out_data_type and the iommu hardware information data length
> + * supported by kernel filled in @data_len.

I'd just keep "Input @data_len == zero is allowed" and remove all the
trailing words which just duplicate with the former context.

Reviewed-by: Kevin Tian <kevin.tian@intel.com>
Nicolin Chen Aug. 17, 2023, 9:07 p.m. UTC | #2
Looks like Yi's latest code has not addressed these comments.

On Thu, Aug 17, 2023 at 07:31:42AM +0000, Tian, Kevin wrote:
 
> > From: Liu, Yi L <yi.l.liu@intel.com>
> > Sent: Wednesday, August 16, 2023 8:14 PM
> >
> > Under nested IOMMU translation, userspace owns the stage-1 translation
> > table (e.g. the stage-1 page table of Intel VT-d or the context table of
> > ARM SMMUv3, and etc.). Stage-1 translation tables are vendor specific, and
> > need to be compatible with the underlying IOMMU hardware. Hence, userspace
> > should know the IOMMU hardware capability before creating and configuring
> > the stage-1 translation table to kernel.
> >
> > This adds IOMMU_GET_HW_INFO ioctl to query the IOMMU hardware information
> > (a.k.a capability) for a given device. The returned data is vendor
> > specific, userspace needs to decode it with the structure by the output
> > @out_data_type field.
> 
> "The format of the returned data is vendor specific and must be decoded
> according to @out_data_type field".

Ack.

> > +int iommufd_get_hw_info(struct iommufd_ucmd *ucmd)
> > +{
> > +     struct iommu_hw_info *cmd = ucmd->cmd;
> > +     void __user *user_ptr = u64_to_user_ptr(cmd->data_uptr);
> > +     const struct iommu_ops *ops;
> > +     struct iommufd_device *idev;
> > +     unsigned int data_len;
> > +     unsigned int copy_len;
> > +     void *data = NULL;
[..]
> > +     } else {
> > +             cmd->out_data_type = IOMMU_HW_INFO_TYPE_NONE;
> > +             data_len = 0;
> > +             data = NULL;
> 
> data is already initialized as NULL.

Will drop.

> > +
> > +     /*
> > +      * We return the length the kernel supports so userspace may know what
> > +      * the kernel capability is. It could be larger than the input buffer.
> > +      */
> > +     cmd->data_len = data_len;
> > +
> > +     rc = iommufd_ucmd_respond(ucmd, sizeof(*cmd));
> > +out:
> 
> out_free:
> 
> > +     kfree(data);
> > +err_put:
> 
> out_put: (since this is also used in the success path)

Ack for both.

> > + * To capture an iommu type specific hardware information data, @data_uptr and
> > + * its length @data_len must be provided. Trailing bytes will be zeroed if the
> > + * user buffer is larger than the data that kernel has. Otherwise, kernel only
> > + * fills the buffer using the given length in @data_len. If the ioctl succeeds,
> > + * @data_len will be updated to the length that kernel actually supports,
> > + * @out_data_type will be filled to decode the data filled in the buffer
> > + * pointed by @data_uptr. Input @data_len == zero is allowed, no information
> > + * data will be filled to user, but user space could get the iommu_hw_info_type
> > + * filled in @out_data_type and the iommu hardware information data length
> > + * supported by kernel filled in @data_len.
> 
> I'd just keep "Input @data_len == zero is allowed" and remove all the
> trailing words which just duplicate with the former context.

Will do.

> Reviewed-by: Kevin Tian <kevin.tian@intel.com>

Adding this.

Thanks
Nic
Yi Liu Aug. 18, 2023, 12:04 a.m. UTC | #3
> From: Nicolin Chen <nicolinc@nvidia.com>
> Sent: Friday, August 18, 2023 5:08 AM
> 
> Looks like Yi's latest code has not addressed these comments.

Yeah. Not yet. In progress to incorporate them. 
Nicolin Chen Aug. 18, 2023, 12:08 a.m. UTC | #4
On Fri, Aug 18, 2023 at 12:04:29AM +0000, Liu, Yi L wrote:

> > From: Nicolin Chen <nicolinc@nvidia.com>
> > Sent: Friday, August 18, 2023 5:08 AM
> >
> > Looks like Yi's latest code has not addressed these comments.
> 
> Yeah. Not yet. In progress to incorporate them. 
Nicolin Chen Aug. 18, 2023, 12:21 a.m. UTC | #5
On Thu, Aug 17, 2023 at 05:08:34PM -0700, Nicolin Chen wrote:
> On Fri, Aug 18, 2023 at 12:04:29AM +0000, Liu, Yi L wrote:
> 
> > > From: Nicolin Chen <nicolinc@nvidia.com>
> > > Sent: Friday, August 18, 2023 5:08 AM
> > >
> > > Looks like Yi's latest code has not addressed these comments.
> > 
> > Yeah. Not yet. In progress to incorporate them. 
Jason Gunthorpe Aug. 18, 2023, 12:54 a.m. UTC | #6
On Fri, Aug 18, 2023 at 12:04:29AM +0000, Liu, Yi L wrote:
> > > > +int iommufd_get_hw_info(struct iommufd_ucmd *ucmd)
> > > > +{
> > > > +     struct iommu_hw_info *cmd = ucmd->cmd;
> > > > +     void __user *user_ptr = u64_to_user_ptr(cmd->data_uptr);
> > > > +     const struct iommu_ops *ops;
> > > > +     struct iommufd_device *idev;
> > > > +     unsigned int data_len;
> > > > +     unsigned int copy_len;
> > > > +     void *data = NULL;
> > [..]
> > > > +     } else {
> > > > +             cmd->out_data_type = IOMMU_HW_INFO_TYPE_NONE;
> > > > +             data_len = 0;
> > > > +             data = NULL;
> > >
> > > data is already initialized as NULL.
> 
> Probably we can set data_len = 0 and the out_data_type to _NONE at
> the top as well. Any preference?

I think it is clear to remove the variable initialization so this
branch is more explicit

Jason
Nicolin Chen Aug. 18, 2023, 1:30 a.m. UTC | #7
On Thu, Aug 17, 2023 at 05:21:43PM -0700, Nicolin Chen wrote:
> On Thu, Aug 17, 2023 at 05:08:34PM -0700, Nicolin Chen wrote:
> > On Fri, Aug 18, 2023 at 12:04:29AM +0000, Liu, Yi L wrote:
> > 
> > > > From: Nicolin Chen <nicolinc@nvidia.com>
> > > > Sent: Friday, August 18, 2023 5:08 AM
> > > >
> > > > Looks like Yi's latest code has not addressed these comments.
> > > 
> > > Yeah. Not yet. In progress to incorporate them. 
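Taken together, the review asks for two changes to the error handling in iommufd_get_hw_info(): unwind labels named after the cleanup they perform (out_free, out_put), and an explicit assignment of data in each branch instead of a NULL initializer. The following is a hypothetical userspace model of that shape, not kernel code: fake_dev, fake_hw_info() and model_get_hw_info() are invented for illustration, malloc()/free() stand in for the driver allocation and kfree(), a NULL return stands in for an ERR_PTR, and a plain counter stands in for the reference dropped by iommufd_put_object().

```c
#include <assert.h>
#include <errno.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for the iommufd_device and its driver. */
struct fake_dev {
	int refs;        /* models the reference taken by iommufd_get_device() */
	int has_hw_info; /* models ops->hw_info != NULL */
	int fail_info;   /* makes the fake callback fail */
};

/*
 * Fake driver callback: returns a 16-byte allocation, or NULL on failure
 * (the real kernel callback returns an ERR_PTR instead).
 */
static void *fake_hw_info(struct fake_dev *dev, unsigned int *len)
{
	void *p;

	if (dev->fail_info)
		return NULL;
	p = malloc(16);
	if (p) {
		memset(p, 0xab, 16);
		*len = 16;
	}
	return p;
}

static int model_get_hw_info(struct fake_dev *dev, void *ubuf,
			     unsigned int ulen)
{
	unsigned int data_len, copy_len;
	void *data; /* no initializer: every branch below assigns it */
	int rc;

	dev->refs++; /* "get": must be undone on every exit path */

	if (dev->has_hw_info) {
		data = fake_hw_info(dev, &data_len);
		if (!data) {
			rc = -ENOMEM;
			goto out_put; /* nothing to free yet, only drop the ref */
		}
	} else {
		data = NULL;
		data_len = 0;
	}

	copy_len = ulen < data_len ? ulen : data_len;
	if (ulen && !ubuf) {
		rc = -EFAULT; /* models a failed copy_to_user() */
		goto out_free;
	}
	assert(copy_len <= ulen);
	if (copy_len)
		memcpy(ubuf, data, copy_len);
	if (copy_len < ulen) /* zero the trailing bytes of the user buffer */
		memset((char *)ubuf + copy_len, 0, ulen - copy_len);
	rc = 0;
out_free:
	free(data); /* free(NULL) is a no-op, like kfree(NULL) */
out_put:
	dev->refs--;
	return rc;
}
```

Each label undoes exactly one acquisition, in reverse order: a failure before the allocation jumps to out_put, a later failure jumps to out_free, and the success path falls through both. This is why "out_" rather than "err_" fits the second label, and it matches the kernel coding-style advice to choose label names that say what the goto does.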

Patch

diff --git a/drivers/iommu/iommufd/device.c b/drivers/iommu/iommufd/device.c
index 90f88c295ce0..36dff7ca3ae4 100644
--- a/drivers/iommu/iommufd/device.c
+++ b/drivers/iommu/iommufd/device.c
@@ -4,6 +4,7 @@ 
 #include <linux/iommufd.h>
 #include <linux/slab.h>
 #include <linux/iommu.h>
+#include <uapi/linux/iommufd.h>
 #include "../iommu-priv.h"
 
 #include "io_pagetable.h"
@@ -1119,3 +1120,75 @@  int iommufd_access_rw(struct iommufd_access *access, unsigned long iova,
 	return rc;
 }
 EXPORT_SYMBOL_NS_GPL(iommufd_access_rw, IOMMUFD);
+
+int iommufd_get_hw_info(struct iommufd_ucmd *ucmd)
+{
+	struct iommu_hw_info *cmd = ucmd->cmd;
+	void __user *user_ptr = u64_to_user_ptr(cmd->data_uptr);
+	const struct iommu_ops *ops;
+	struct iommufd_device *idev;
+	unsigned int data_len;
+	unsigned int copy_len;
+	void *data = NULL;
+	int rc;
+
+	if (cmd->flags || cmd->__reserved)
+		return -EOPNOTSUPP;
+
+	idev = iommufd_get_device(ucmd, cmd->dev_id);
+	if (IS_ERR(idev))
+		return PTR_ERR(idev);
+
+	ops = dev_iommu_ops(idev->dev);
+	if (ops->hw_info) {
+		data = ops->hw_info(idev->dev, &data_len, &cmd->out_data_type);
+		if (IS_ERR(data)) {
+			rc = PTR_ERR(data);
+			goto err_put;
+		}
+
+		/*
+		 * drivers that have hw_info callback should have a unique
+		 * iommu_hw_info_type.
+		 */
+		if (WARN_ON_ONCE(cmd->out_data_type ==
+				 IOMMU_HW_INFO_TYPE_NONE)) {
+			rc = -ENODEV;
+			goto out;
+		}
+	} else {
+		cmd->out_data_type = IOMMU_HW_INFO_TYPE_NONE;
+		data_len = 0;
+		data = NULL;
+	}
+
+	copy_len = min(cmd->data_len, data_len);
+	if (copy_to_user(user_ptr, data, copy_len)) {
+		rc = -EFAULT;
+		goto out;
+	}
+
+	/*
+	 * Zero the trailing bytes if the user buffer is bigger than the
+	 * data size kernel actually has.
+	 */
+	if (copy_len < cmd->data_len) {
+		if (clear_user(user_ptr + copy_len, cmd->data_len - copy_len)) {
+			rc = -EFAULT;
+			goto out;
+		}
+	}
+
+	/*
+	 * We return the length the kernel supports so userspace may know what
+	 * the kernel capability is. It could be larger than the input buffer.
+	 */
+	cmd->data_len = data_len;
+
+	rc = iommufd_ucmd_respond(ucmd, sizeof(*cmd));
+out:
+	kfree(data);
+err_put:
+	iommufd_put_object(&idev->obj);
+	return rc;
+}
diff --git a/drivers/iommu/iommufd/iommufd_private.h b/drivers/iommu/iommufd/iommufd_private.h
index 5a45b8ba2e26..2c58670011fe 100644
--- a/drivers/iommu/iommufd/iommufd_private.h
+++ b/drivers/iommu/iommufd/iommufd_private.h
@@ -296,6 +296,7 @@  iommufd_get_device(struct iommufd_ucmd *ucmd, u32 id)
 }
 
 void iommufd_device_destroy(struct iommufd_object *obj);
+int iommufd_get_hw_info(struct iommufd_ucmd *ucmd);
 
 struct iommufd_access {
 	struct iommufd_object obj;
diff --git a/drivers/iommu/iommufd/main.c b/drivers/iommu/iommufd/main.c
index 5f7e9fa45502..e71523cbd0de 100644
--- a/drivers/iommu/iommufd/main.c
+++ b/drivers/iommu/iommufd/main.c
@@ -305,6 +305,7 @@  static int iommufd_option(struct iommufd_ucmd *ucmd)
 
 union ucmd_buffer {
 	struct iommu_destroy destroy;
+	struct iommu_hw_info info;
 	struct iommu_hwpt_alloc hwpt;
 	struct iommu_ioas_alloc alloc;
 	struct iommu_ioas_allow_iovas allow_iovas;
@@ -337,6 +338,8 @@  struct iommufd_ioctl_op {
 	}
 static const struct iommufd_ioctl_op iommufd_ioctl_ops[] = {
 	IOCTL_OP(IOMMU_DESTROY, iommufd_destroy, struct iommu_destroy, id),
+	IOCTL_OP(IOMMU_GET_HW_INFO, iommufd_get_hw_info, struct iommu_hw_info,
+		 __reserved),
 	IOCTL_OP(IOMMU_HWPT_ALLOC, iommufd_hwpt_alloc, struct iommu_hwpt_alloc,
 		 __reserved),
 	IOCTL_OP(IOMMU_IOAS_ALLOC, iommufd_ioas_alloc_ioctl,
diff --git a/include/uapi/linux/iommufd.h b/include/uapi/linux/iommufd.h
index ac11ace21edb..09d5e9cff7b3 100644
--- a/include/uapi/linux/iommufd.h
+++ b/include/uapi/linux/iommufd.h
@@ -46,6 +46,7 @@  enum {
 	IOMMUFD_CMD_OPTION,
 	IOMMUFD_CMD_VFIO_IOAS,
 	IOMMUFD_CMD_HWPT_ALLOC,
+	IOMMUFD_CMD_GET_HW_INFO,
 };
 
 /**
@@ -379,4 +380,45 @@  struct iommu_hwpt_alloc {
 enum iommu_hw_info_type {
 	IOMMU_HW_INFO_TYPE_NONE,
 };
+
+/**
+ * struct iommu_hw_info - ioctl(IOMMU_GET_HW_INFO)
+ * @size: sizeof(struct iommu_hw_info)
+ * @flags: Must be 0
+ * @dev_id: The device bound to the iommufd
+ * @data_len: Input the length of a user buffer in bytes. Output the length of
+ *            data that kernel supports
+ * @data_uptr: User pointer to a user-space buffer used by the kernel to fill
+ *             the iommu type specific hardware information data
+ * @out_data_type: Output the iommu hardware info type as defined in the enum
+ *                 iommu_hw_info_type.
+ * @__reserved: Must be 0
+ *
+ * Query an iommu type specific hardware information data from an iommu behind
+ * a given device that has been bound to iommufd. This hardware info data will
+ * be used to sync capabilities between the virtual iommu and the physical
+ * iommu, e.g. a nested translation setup needs to check the hardware info, so
+ * a guest stage-1 page table can be compatible with the physical iommu.
+ *
+ * To capture an iommu type specific hardware information data, @data_uptr and
+ * its length @data_len must be provided. Trailing bytes will be zeroed if the
+ * user buffer is larger than the data that kernel has. Otherwise, kernel only
+ * fills the buffer using the given length in @data_len. If the ioctl succeeds,
+ * @data_len will be updated to the length that kernel actually supports,
+ * @out_data_type will be filled to decode the data filled in the buffer
+ * pointed by @data_uptr. Input @data_len == zero is allowed, no information
+ * data will be filled to user, but user space could get the iommu_hw_info_type
+ * filled in @out_data_type and the iommu hardware information data length
+ * supported by kernel filled in @data_len.
+ */
+struct iommu_hw_info {
+	__u32 size;
+	__u32 flags;
+	__u32 dev_id;
+	__u32 data_len;
+	__aligned_u64 data_uptr;
+	__u32 out_data_type;
+	__u32 __reserved;
+};
+#define IOMMU_GET_HW_INFO _IO(IOMMUFD_TYPE, IOMMUFD_CMD_GET_HW_INFO)
 #endif
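The @data_len contract in the struct iommu_hw_info comment above (input is the user buffer size, output is the size the kernel supports, and zero is a valid input) suggests a two-call pattern in userspace: ask for the length first, then allocate and fetch. The sketch below is hypothetical and rests on stated assumptions: struct hw_info_args mirrors the layout of struct iommu_hw_info from this patch (a real program would include <linux/iommufd.h> from headers that carry it), and mock_get_hw_info() is an invented stand-in for ioctl(fd, IOMMU_GET_HW_INFO, &args) that returns a negative errno directly rather than -1 with errno set. The 24-byte payload and the type value 1 are made up.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Illustrative mirror of struct iommu_hw_info as defined in this patch. */
struct hw_info_args {
	uint32_t size;
	uint32_t flags;
	uint32_t dev_id;
	uint32_t data_len;      /* in: user buffer size, out: kernel data size */
	uint64_t data_uptr;     /* user buffer address */
	uint32_t out_data_type; /* out: how to decode the buffer */
	uint32_t reserved;
};

/* Hypothetical transport; a real one would wrap the ioctl() call. */
typedef int (*hw_info_ioctl_fn)(struct hw_info_args *args);

/* Invented payload standing in for vendor-specific hardware info. */
static const unsigned char mock_hw_data[24] = {
	1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12,
	13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24,
};

/* Mock following the semantics documented for IOMMU_GET_HW_INFO. */
static int mock_get_hw_info(struct hw_info_args *a)
{
	unsigned char *ubuf = (unsigned char *)(uintptr_t)a->data_uptr;
	uint32_t klen = sizeof(mock_hw_data);
	uint32_t copy_len = a->data_len < klen ? a->data_len : klen;

	if (a->flags || a->reserved)
		return -EOPNOTSUPP;
	if (a->data_len && !ubuf)
		return -EFAULT;
	if (copy_len)
		memcpy(ubuf, mock_hw_data, copy_len);
	if (copy_len < a->data_len) /* zero the trailing bytes */
		memset(ubuf + copy_len, 0, a->data_len - copy_len);
	a->data_len = klen;   /* report the length the "kernel" supports */
	a->out_data_type = 1; /* invented non-NONE type value */
	return 0;
}

/* Two-call pattern: learn the length with data_len == 0, then fetch. */
static int query_hw_info(hw_info_ioctl_fn do_ioctl, uint32_t dev_id,
			 void **out_buf, uint32_t *out_len, uint32_t *out_type)
{
	struct hw_info_args a = { .size = sizeof(a), .dev_id = dev_id };
	uint32_t want;
	void *buf;
	int rc;

	*out_buf = NULL;
	rc = do_ioctl(&a); /* flags and reserved are zero via the initializer */
	if (rc)
		return rc;
	*out_type = a.out_data_type;
	want = a.data_len;
	if (want == 0) { /* no vendor data to fetch */
		*out_len = 0;
		return 0;
	}

	buf = calloc(1, want);
	if (!buf)
		return -ENOMEM;
	a.data_len = want;
	a.data_uptr = (uint64_t)(uintptr_t)buf;
	rc = do_ioctl(&a);
	if (rc) {
		free(buf);
		return rc;
	}
	/* Only min(want, reported length) bytes were written by the callee. */
	*out_len = want < a.data_len ? want : a.data_len;
	*out_type = a.out_data_type;
	*out_buf = buf;
	return 0;
}
```

Because the trailing part of an oversized user buffer is zeroed, a program built against a larger vendor structure than the kernel fills still sees well-defined zero values in the fields the kernel did not write.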