diff mbox series

[v7,3/4] iommufd: Add IOMMU_GET_HW_INFO

Message ID 20230811071501.4126-4-yi.l.liu@intel.com (mailing list archive)
State New, archived
Headers show
Series iommufd: Add iommu hardware info reporting | expand

Commit Message

Yi Liu Aug. 11, 2023, 7:15 a.m. UTC
Under nested IOMMU translation, userspace owns the stage-1 translation
table (e.g. the stage-1 page table of Intel VT-d or the context table
of ARM SMMUv3, and etc.). Stage-1 translation tables are vendor specific,
and need to be compatible with the underlying IOMMU hardware. Hence,
userspace should know the IOMMU hardware capability before creating and
configuring the stage-1 translation table to kernel.

This adds IOMMU_GET_HW_INFO ioctl to query the IOMMU hardware information
(a.k.a capability) for a given device. The returned data is vendor specific,
userspace needs to decode it with the structure mapped by the @out_data_type
field.

As only physical devices have IOMMU hardware, so this will return error
if the given device is not a physical device.

Reviewed-by: Lu Baolu <baolu.lu@linux.intel.com>
Co-developed-by: Nicolin Chen <nicolinc@nvidia.com>
Signed-off-by: Nicolin Chen <nicolinc@nvidia.com>
Signed-off-by: Yi Liu <yi.l.liu@intel.com>
---
 drivers/iommu/iommufd/main.c | 85 ++++++++++++++++++++++++++++++++++++
 include/uapi/linux/iommufd.h | 36 +++++++++++++++
 2 files changed, 121 insertions(+)

Comments

Jason Gunthorpe Aug. 15, 2023, 4:32 p.m. UTC | #1
On Fri, Aug 11, 2023 at 12:15:00AM -0700, Yi Liu wrote:

> +static int iommufd_get_hw_info(struct iommufd_ucmd *ucmd)
> +{
> +	struct iommu_hw_info *cmd = ucmd->cmd;
> +	unsigned int length = cmd->data_len;
> +	struct iommufd_device *idev;
> +	void __user *user_ptr;
> +	u32 hw_info_type;
> +	int rc = 0;
> +
> +	if (cmd->flags || cmd->__reserved || !cmd->data_len)
> +		return -EOPNOTSUPP;

Is there a reason to block 0 data_len? I think this should work. The
code looks OK?

Jason
Jason Gunthorpe Aug. 15, 2023, 4:42 p.m. UTC | #2
On Fri, Aug 11, 2023 at 12:15:00AM -0700, Yi Liu wrote:
> Under nested IOMMU translation, userspace owns the stage-1 translation
> table (e.g. the stage-1 page table of Intel VT-d or the context table
> of ARM SMMUv3, and etc.). Stage-1 translation tables are vendor specific,
> and need to be compatible with the underlying IOMMU hardware. Hence,
> userspace should know the IOMMU hardware capability before creating and
> configuring the stage-1 translation table to kernel.
> 
> This adds IOMMU_GET_HW_INFO ioctl to query the IOMMU hardware information
> (a.k.a capability) for a given device. The returned data is vendor specific,
> userspace needs to decode it with the structure mapped by the @out_data_type
> field.
> 
> As only physical devices have IOMMU hardware, so this will return error
> if the given device is not a physical device.
> 
> Reviewed-by: Lu Baolu <baolu.lu@linux.intel.com>
> Co-developed-by: Nicolin Chen <nicolinc@nvidia.com>
> Signed-off-by: Nicolin Chen <nicolinc@nvidia.com>
> Signed-off-by: Yi Liu <yi.l.liu@intel.com>
> ---
>  drivers/iommu/iommufd/main.c | 85 ++++++++++++++++++++++++++++++++++++
>  include/uapi/linux/iommufd.h | 36 +++++++++++++++
>  2 files changed, 121 insertions(+)
> 
> diff --git a/drivers/iommu/iommufd/main.c b/drivers/iommu/iommufd/main.c
> index 94c498b8fdf6..d459811c5381 100644
> --- a/drivers/iommu/iommufd/main.c
> +++ b/drivers/iommu/iommufd/main.c
> @@ -17,6 +17,7 @@

I was looking at this more and this code should be in device.c:
  
> +static int iommufd_fill_hw_info(struct device *dev, void __user *user_ptr,
> +				unsigned int *length, u32 *type)
> +{

Since it is working on devices

main.c is primarily for context related stuff

Jason
Nicolin Chen Aug. 15, 2023, 5:31 p.m. UTC | #3
On Tue, Aug 15, 2023 at 01:32:01PM -0300, Jason Gunthorpe wrote:
> On Fri, Aug 11, 2023 at 12:15:00AM -0700, Yi Liu wrote:
> 
> > +static int iommufd_get_hw_info(struct iommufd_ucmd *ucmd)
> > +{
> > +	struct iommu_hw_info *cmd = ucmd->cmd;
> > +	unsigned int length = cmd->data_len;
> > +	struct iommufd_device *idev;
> > +	void __user *user_ptr;
> > +	u32 hw_info_type;
> > +	int rc = 0;
> > +
> > +	if (cmd->flags || cmd->__reserved || !cmd->data_len)
> > +		return -EOPNOTSUPP;
> 
> Is there a reason to block 0 data_len? I think this should work. The
> code looks OK?

I did a quick test passing !data_len and !data_ptr. And it works
by returning the type only.

Yet, in that case, should we mention this in the uAPI kdoc? It
feels to me that the uAPI always expects user space to read out
a length of data.

Thanks
Nic
Nicolin Chen Aug. 15, 2023, 5:36 p.m. UTC | #4
On Tue, Aug 15, 2023 at 01:42:35PM -0300, Jason Gunthorpe wrote:
> On Fri, Aug 11, 2023 at 12:15:00AM -0700, Yi Liu wrote:
> > Under nested IOMMU translation, userspace owns the stage-1 translation
> > table (e.g. the stage-1 page table of Intel VT-d or the context table
> > of ARM SMMUv3, and etc.). Stage-1 translation tables are vendor specific,
> > and need to be compatible with the underlying IOMMU hardware. Hence,
> > userspace should know the IOMMU hardware capability before creating and
> > configuring the stage-1 translation table to kernel.
> > 
> > This adds IOMMU_GET_HW_INFO ioctl to query the IOMMU hardware information
> > (a.k.a capability) for a given device. The returned data is vendor specific,
> > userspace needs to decode it with the structure mapped by the @out_data_type
> > field.
> > 
> > As only physical devices have IOMMU hardware, so this will return error
> > if the given device is not a physical device.
> > 
> > Reviewed-by: Lu Baolu <baolu.lu@linux.intel.com>
> > Co-developed-by: Nicolin Chen <nicolinc@nvidia.com>
> > Signed-off-by: Nicolin Chen <nicolinc@nvidia.com>
> > Signed-off-by: Yi Liu <yi.l.liu@intel.com>
> > ---
> >  drivers/iommu/iommufd/main.c | 85 ++++++++++++++++++++++++++++++++++++
> >  include/uapi/linux/iommufd.h | 36 +++++++++++++++
> >  2 files changed, 121 insertions(+)
> > 
> > diff --git a/drivers/iommu/iommufd/main.c b/drivers/iommu/iommufd/main.c
> > index 94c498b8fdf6..d459811c5381 100644
> > --- a/drivers/iommu/iommufd/main.c
> > +++ b/drivers/iommu/iommufd/main.c
> > @@ -17,6 +17,7 @@
> 
> I was looking at this more and this code should be in device.c:
>   
> > +static int iommufd_fill_hw_info(struct device *dev, void __user *user_ptr,
> > +				unsigned int *length, u32 *type)
> > +{
> 
> Since it is working on devices
> 
> main.c is primarily for context related stuff

Ack for that. We'd make similar changes to the other handlers too.

Thanks
Nic
Jason Gunthorpe Aug. 15, 2023, 6:29 p.m. UTC | #5
On Tue, Aug 15, 2023 at 10:31:09AM -0700, Nicolin Chen wrote:
> On Tue, Aug 15, 2023 at 01:32:01PM -0300, Jason Gunthorpe wrote:
> > On Fri, Aug 11, 2023 at 12:15:00AM -0700, Yi Liu wrote:
> > 
> > > +static int iommufd_get_hw_info(struct iommufd_ucmd *ucmd)
> > > +{
> > > +	struct iommu_hw_info *cmd = ucmd->cmd;
> > > +	unsigned int length = cmd->data_len;
> > > +	struct iommufd_device *idev;
> > > +	void __user *user_ptr;
> > > +	u32 hw_info_type;
> > > +	int rc = 0;
> > > +
> > > +	if (cmd->flags || cmd->__reserved || !cmd->data_len)
> > > +		return -EOPNOTSUPP;
> > 
> > Is there a reason to block 0 data_len? I think this should work. The
> > code looks OK?
> 
> I did a quick test passing !data_len and !data_ptr. And it works
> by returning the type only.
> 
> Yet, in that case, should we mention this in the uAPI kdoc? It
> feels to me that the uAPI always expects user space to read out
> a length of data.

Well the way it ought to work is that userspace can pass in 0 length
and the kernel will return the correct length

So maybe this does need resending with this removed:

	*length = min(*length, data_len);

Also I see clear_user is called wrong, it doesn't return errno.

Please check and repost it ASAP I will update the branch. Probably
needs some doc adjusting too.

I came up with this:

int iommufd_get_hw_info(struct iommufd_ucmd *ucmd)
{
	struct iommu_hw_info *cmd = ucmd->cmd;
	void __user *user_ptr = u64_to_user_ptr(cmd->data_ptr);
	const struct iommu_ops *ops;
	struct iommufd_device *idev;
	unsigned int data_len;
	unsigned int copy_len;
	void *data = NULL;
	int rc;

	if (cmd->flags || cmd->__reserved)
		return -EOPNOTSUPP;

	idev = iommufd_get_device(ucmd, cmd->dev_id);
	if (IS_ERR(idev))
		return PTR_ERR(idev);

	ops = dev_iommu_ops(idev->dev);
	if (!ops->hw_info) {
		data = ops->hw_info(idev->dev, &data_len, &cmd->out_data_type);
		if (IS_ERR(data)) {
			rc = PTR_ERR(data);
			goto err_put;
		}

		/*
		 * drivers that have hw_info callback should have a unique
		 * iommu_hw_info_type.
		 */
		if (WARN_ON_ONCE(cmd->out_data_type ==
				 IOMMU_HW_INFO_TYPE_NONE)) {
			rc = -ENODEV;
			goto out;
		}
	} else {
		cmd->out_data_type = IOMMU_HW_INFO_TYPE_NONE;
		data_len = 0;
		data = NULL;
	}

	copy_len = min(cmd->data_len, data_len);
	if (copy_to_user(user_ptr, data, copy_len)) {
		rc = -EFAULT;
		goto out;
	}

	/*
	 * Zero the trailing bytes if the user buffer is bigger than the
	 * data size kernel actually has.
	 */
	if (copy_len < cmd->data_len) {
		if (clear_user(user_ptr + copy_len, cmd->data_len - copy_len)) {
			rc = -EFAULT;
			goto out;
		}
	}

	/*
	 * We return the length the kernel supports so userspace may know what
	 * the kernel capability is. It could be larger than the input buffer.
	 */
	cmd->data_len = data_len;

	rc = iommufd_ucmd_respond(ucmd, sizeof(*cmd));
out:
	kfree(data);
err_put:
	iommufd_put_object(&idev->obj);
	return rc;
}
Nicolin Chen Aug. 15, 2023, 6:53 p.m. UTC | #6
On Tue, Aug 15, 2023 at 03:29:21PM -0300, Jason Gunthorpe wrote:

> Well the way it ought to work is that userspace can pass in 0 length
> and the kernel will return the correct length
> 
> So maybe this does need resending with this removed:
> 
> 	*length = min(*length, data_len);

That "length" is 0 (copying the value of cmd->data_len), so it
should be 0 even having this line?
 
> Also I see clear_user is called wrong, it doesn't return errno.

Oh, right.

> Please check and repost it ASAP I will update the branch. Probably
> needs some doc adjusting too.

I think your version should be good. I can update the series for
the doc part. Yi can confirm tonight and report in his time zone.
And it should be available for you to take tomorrow.

> 	ops = dev_iommu_ops(idev->dev);
> 	if (!ops->hw_info) {
> 		data = ops->hw_info(idev->dev, &data_len, &cmd->out_data_type);

It should be:
 	if (ops->hw_info) {

Thanks
Nic
Jason Gunthorpe Aug. 15, 2023, 6:56 p.m. UTC | #7
On Tue, Aug 15, 2023 at 11:53:26AM -0700, Nicolin Chen wrote:
> > 	ops = dev_iommu_ops(idev->dev);
> > 	if (!ops->hw_info) {
> > 		data = ops->hw_info(idev->dev, &data_len, &cmd->out_data_type);
> 
> It should be:
>  	if (ops->hw_info) {

Hmm, the test suite probably needs some more stuff then too since it
passed like that :)

Jason
Nicolin Chen Aug. 15, 2023, 6:58 p.m. UTC | #8
On Tue, Aug 15, 2023 at 03:56:37PM -0300, Jason Gunthorpe wrote:
> On Tue, Aug 15, 2023 at 11:53:26AM -0700, Nicolin Chen wrote:
> > > 	ops = dev_iommu_ops(idev->dev);
> > > 	if (!ops->hw_info) {
> > > 		data = ops->hw_info(idev->dev, &data_len, &cmd->out_data_type);
> > 
> > It should be:
> >  	if (ops->hw_info) {
> 
> Hmm, the test suite probably needs some more stuff then too since it
> passed like that :)

Ack. I will see what I can do.
Nicolin Chen Aug. 16, 2023, 1:07 a.m. UTC | #9
On Tue, Aug 15, 2023 at 11:58:04AM -0700, Nicolin Chen wrote:
> On Tue, Aug 15, 2023 at 03:56:37PM -0300, Jason Gunthorpe wrote:
> > On Tue, Aug 15, 2023 at 11:53:26AM -0700, Nicolin Chen wrote:
> > > > 	ops = dev_iommu_ops(idev->dev);
> > > > 	if (!ops->hw_info) {
> > > > 		data = ops->hw_info(idev->dev, &data_len, &cmd->out_data_type);
> > > 
> > > It should be:
> > >  	if (ops->hw_info) {
> > 
> > Hmm, the test suite probably needs some more stuff then too since it
> > passed like that :)
> 
> Ack. I will see what I can do.

It actually reports errors when hw_info is defined (and it would
get an IOMMU_HW_INFO_TYPE_NONE.

#ok 62 iommufd_ioas.two_mock_domain.ioas_area_auto_destroy
# #  RUN           iommufd_ioas.two_mock_domain.get_hw_info ...
# iommufd: iommufd_utils.h:368: _test_cmd_get_hw_info: Assertion `cmd.out_data_type == IOMMU_HW_INFO_TYPE_SELFTEST' failed.
# # get_hw_info: Test terminated by assertion

By removing mock_domain_hw_info() to test the other path, simply
there would be a kernel crash.

So, I think that we are fine.

Thanks
Nicolin
diff mbox series

Patch

diff --git a/drivers/iommu/iommufd/main.c b/drivers/iommu/iommufd/main.c
index 94c498b8fdf6..d459811c5381 100644
--- a/drivers/iommu/iommufd/main.c
+++ b/drivers/iommu/iommufd/main.c
@@ -17,6 +17,7 @@ 
 #include <linux/bug.h>
 #include <uapi/linux/iommufd.h>
 #include <linux/iommufd.h>
+#include "../iommu-priv.h"
 
 #include "io_pagetable.h"
 #include "iommufd_private.h"
@@ -177,6 +178,87 @@  static int iommufd_destroy(struct iommufd_ucmd *ucmd)
 	return 0;
 }
 
+static int iommufd_fill_hw_info(struct device *dev, void __user *user_ptr,
+				unsigned int *length, u32 *type)
+{
+	const struct iommu_ops *ops;
+	unsigned int data_len;
+	void *data;
+	int rc = 0;
+
+	ops = dev_iommu_ops(dev);
+	if (!ops->hw_info) {
+		*length = 0;
+		*type = IOMMU_HW_INFO_TYPE_NONE;
+		return 0;
+	}
+
+	data = ops->hw_info(dev, &data_len, type);
+	if (IS_ERR(data))
+		return PTR_ERR(data);
+
+	/*
+	 * drivers that have hw_info callback should have a unique
+	 * iommu_hw_info_type.
+	 */
+	if (WARN_ON_ONCE(*type == IOMMU_HW_INFO_TYPE_NONE)) {
+		rc = -ENODEV;
+		goto err_free;
+	}
+
+	*length = min(*length, data_len);
+	if (copy_to_user(user_ptr, data, *length)) {
+		rc = -EFAULT;
+		goto err_free;
+	}
+
+err_free:
+	kfree(data);
+	return rc;
+}
+
+static int iommufd_get_hw_info(struct iommufd_ucmd *ucmd)
+{
+	struct iommu_hw_info *cmd = ucmd->cmd;
+	unsigned int length = cmd->data_len;
+	struct iommufd_device *idev;
+	void __user *user_ptr;
+	u32 hw_info_type;
+	int rc = 0;
+
+	if (cmd->flags || cmd->__reserved || !cmd->data_len)
+		return -EOPNOTSUPP;
+
+	idev = iommufd_get_device(ucmd, cmd->dev_id);
+	if (IS_ERR(idev))
+		return PTR_ERR(idev);
+
+	user_ptr = u64_to_user_ptr(cmd->data_ptr);
+
+	rc = iommufd_fill_hw_info(idev->dev, user_ptr,
+				  &length, &hw_info_type);
+	if (rc)
+		goto err_put;
+
+	/*
+	 * Zero the trailing bytes if the user buffer is bigger than the
+	 * data size kernel actually has.
+	 */
+	if (length < cmd->data_len) {
+		rc = clear_user(user_ptr + length, cmd->data_len - length);
+		if (rc)
+			goto err_put;
+	}
+
+	cmd->data_len = length;
+	cmd->out_data_type = hw_info_type;
+	rc = iommufd_ucmd_respond(ucmd, sizeof(*cmd));
+
+err_put:
+	iommufd_put_object(&idev->obj);
+	return rc;
+}
+
 static int iommufd_fops_open(struct inode *inode, struct file *filp)
 {
 	struct iommufd_ctx *ictx;
@@ -265,6 +347,7 @@  static int iommufd_option(struct iommufd_ucmd *ucmd)
 
 union ucmd_buffer {
 	struct iommu_destroy destroy;
+	struct iommu_hw_info info;
 	struct iommu_hwpt_alloc hwpt;
 	struct iommu_ioas_alloc alloc;
 	struct iommu_ioas_allow_iovas allow_iovas;
@@ -297,6 +380,8 @@  struct iommufd_ioctl_op {
 	}
 static const struct iommufd_ioctl_op iommufd_ioctl_ops[] = {
 	IOCTL_OP(IOMMU_DESTROY, iommufd_destroy, struct iommu_destroy, id),
+	IOCTL_OP(IOMMU_GET_HW_INFO, iommufd_get_hw_info, struct iommu_hw_info,
+		 __reserved),
 	IOCTL_OP(IOMMU_HWPT_ALLOC, iommufd_hwpt_alloc, struct iommu_hwpt_alloc,
 		 __reserved),
 	IOCTL_OP(IOMMU_IOAS_ALLOC, iommufd_ioas_alloc_ioctl,
diff --git a/include/uapi/linux/iommufd.h b/include/uapi/linux/iommufd.h
index ac11ace21edb..4a00f8fb2d54 100644
--- a/include/uapi/linux/iommufd.h
+++ b/include/uapi/linux/iommufd.h
@@ -46,6 +46,7 @@  enum {
 	IOMMUFD_CMD_OPTION,
 	IOMMUFD_CMD_VFIO_IOAS,
 	IOMMUFD_CMD_HWPT_ALLOC,
+	IOMMUFD_CMD_GET_HW_INFO,
 };
 
 /**
@@ -379,4 +380,39 @@  struct iommu_hwpt_alloc {
 enum iommu_hw_info_type {
 	IOMMU_HW_INFO_TYPE_NONE,
 };
+
+/**
+ * struct iommu_hw_info - ioctl(IOMMU_GET_HW_INFO)
+ * @size: sizeof(struct iommu_hw_info)
+ * @flags: Must be 0
+ * @dev_id: The device bound to the iommufd
+ * @data_len: Input the length of the user buffer in bytes. Output the length
+ *            of data filled in the user buffer.
+ * @data_ptr: Pointer to the user buffer
+ * @out_data_type: Output the iommu hardware info type as defined in the enum
+ *                 iommu_hw_info_type.
+ * @__reserved: Must be 0
+ *
+ * Query the hardware information from an iommu behind a given device that has
+ * been bound to iommufd. @data_len is the size of the buffer, which captures an
+ * iommu type specific input data and a filled output data. Trailing bytes will
+ * be zeroed if the user buffer is larger than the data kernel has.
+ *
+ * The type specific data would be used to sync capabilities between the virtual
+ * IOMMU and the hardware IOMMU, e.g. a nested translation setup needs to check
+ * the hardware information, so the guest stage-1 page table will be compatible.
+ *
+ * The @out_data_type will be filled if the ioctl succeeds. It would be used to
+ * decode the data filled in the buffer pointed by @data_ptr.
+ */
+struct iommu_hw_info {
+	__u32 size;
+	__u32 flags;
+	__u32 dev_id;
+	__u32 data_len;
+	__aligned_u64 data_ptr;
+	__u32 out_data_type;
+	__u32 __reserved;
+};
+#define IOMMU_GET_HW_INFO _IO(IOMMUFD_TYPE, IOMMUFD_CMD_GET_HW_INFO)
 #endif