
[01/11] iommu: Add device dma ownership set/release interfaces

Message ID 20211115020552.2378167-2-baolu.lu@linux.intel.com (mailing list archive)
State New, archived
Series Fix BUG_ON in vfio_iommu_group_notifier()

Commit Message

Baolu Lu Nov. 15, 2021, 2:05 a.m. UTC
From the perspective of who initiates the device to do DMA, device
DMA can be divided into the following types:

        DMA_OWNER_KERNEL: a kernel device driver initiates the DMA
        DMA_OWNER_USER: a userspace device driver initiates the DMA

DMA_OWNER_KERNEL and DMA_OWNER_USER are mutually exclusive for all
devices in the same iommu group, as an iommu group is the smallest
granularity of device isolation and protection that the IOMMU subsystem
can guarantee. This series extends the iommu core to enforce this
exclusion when devices are assigned to userspace.

Basically two new interfaces are provided:

        int iommu_device_set_dma_owner(struct device *dev,
                enum iommu_dma_owner mode, struct file *user_file);
        void iommu_device_release_dma_owner(struct device *dev,
                enum iommu_dma_owner mode);

Although the above interfaces are per-device, the DMA owner is tracked
per group under the hood. An iommu group cannot have both DMA_OWNER_KERNEL
and DMA_OWNER_USER set at the same time. Violating this assumption makes
iommu_device_set_dma_owner() fail.

Kernel drivers that do DMA have DMA_OWNER_KERNEL automatically
set/released in the driver binding process (see the next patch).

Kernel drivers that don't do DMA should not set the owner type (via a
new suppress flag in the next patch). A device bound to such a driver is
treated the same as a driver-less device, which is compatible with all
owner types.

Userspace driver frameworks (e.g. vfio) should set DMA_OWNER_USER for
a device before userspace is allowed to access it, plus a file pointer to
mark the user identity so that a single group cannot be operated by
multiple users simultaneously. Conversely, the owner type should be
released after the user access permission is withdrawn.
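The claim/release rules described above can be modeled outside the kernel. The sketch below is a hypothetical userspace model of the per-group ownership state machine; all names (`model_group`, `model_set_dma_owner`, etc.) are illustrative stand-ins, not the proposed kernel API:

```c
#include <assert.h>
#include <stddef.h>

/* Illustrative model of per-group DMA ownership, not kernel code. */
enum dma_owner { DMA_OWNER_NONE, DMA_OWNER_KERNEL, DMA_OWNER_USER };

struct model_group {
	enum dma_owner owner;
	unsigned int owner_cnt;
	void *user_file;	/* stands in for the struct file * identity */
};

/*
 * Claim ownership: succeeds when the group is unowned, or when the same
 * owner type (and, for USER, the same file identity) claims it again.
 * KERNEL and USER ownership are mutually exclusive for the whole group.
 */
int model_set_dma_owner(struct model_group *g, enum dma_owner owner,
			void *user_file)
{
	if (g->owner != DMA_OWNER_NONE && g->owner != owner)
		return -1;	/* KERNEL vs USER conflict */
	if (owner == DMA_OWNER_USER && g->owner_cnt &&
	    g->user_file != user_file)
		return -1;	/* a different user already owns the group */
	g->owner = owner;
	g->user_file = (owner == DMA_OWNER_USER) ? user_file : NULL;
	g->owner_cnt++;
	return 0;
}

/* Release: the group returns to DMA_OWNER_NONE when the last owner drops. */
void model_release_dma_owner(struct model_group *g)
{
	if (--g->owner_cnt == 0) {
		g->owner = DMA_OWNER_NONE;
		g->user_file = NULL;
	}
}
```

For example, once one user has claimed a group, a kernel claim or a claim by a second user identity fails until the first releases it.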

Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
Signed-off-by: Kevin Tian <kevin.tian@intel.com>
Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com>
---
 include/linux/iommu.h |  31 ++++++++++++
 drivers/iommu/iommu.c | 106 ++++++++++++++++++++++++++++++++++++++++++
 2 files changed, 137 insertions(+)

Comments

Christoph Hellwig Nov. 15, 2021, 1:14 p.m. UTC | #1
On Mon, Nov 15, 2021 at 10:05:42AM +0800, Lu Baolu wrote:
> +enum iommu_dma_owner {
> +	DMA_OWNER_NONE,
> +	DMA_OWNER_KERNEL,
> +	DMA_OWNER_USER,
> +};
> +

> +	enum iommu_dma_owner dma_owner;
> +	refcount_t owner_cnt;
> +	struct file *owner_user_file;

I'd just overload the ownership into owner_user_file,

 NULL			-> no owner
 (struct file *)1UL)	-> kernel
 real pointer		-> user

Which could simplify a lot of the code dealing with the owner.
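For illustration, the overloaded encoding Christoph suggests might look like this sketch (a hypothetical userspace model; `struct file` is an opaque stand-in and the helper names are made up):

```c
#include <assert.h>
#include <stddef.h>
#include <stdbool.h>

struct file;	/* opaque stand-in for the kernel's struct file */

/* Sentinel: a non-NULL, never-dereferenced pointer marking kernel ownership */
#define KERNEL_OWNER	((struct file *)1UL)

/* NULL -> no owner */
static bool owner_is_none(struct file *owner_user_file)
{
	return owner_user_file == NULL;
}

/* (struct file *)1UL -> kernel owner */
static bool owner_is_kernel(struct file *owner_user_file)
{
	return owner_user_file == KERNEL_OWNER;
}

/* any real pointer -> user owner */
static bool owner_is_user(struct file *owner_user_file)
{
	return owner_user_file != NULL && owner_user_file != KERNEL_OWNER;
}
```

A single pointer field then encodes all three states, removing the separate enum.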
Bjorn Helgaas Nov. 15, 2021, 8:38 p.m. UTC | #2
On Mon, Nov 15, 2021 at 10:05:42AM +0800, Lu Baolu wrote:
> From the perspective of who is initiating the device to do DMA, device
> DMA could be divided into the following types:
> 
>         DMA_OWNER_KERNEL: kernel device driver intiates the DMA
>         DMA_OWNER_USER: userspace device driver intiates the DMA

s/intiates/initiates/ (twice)

As your first sentence suggests, the driver doesn't actually
*initiate* the DMA in either case.  One of the drivers programs the
device, and the *device* initiates the DMA.

> DMA_OWNER_KERNEL and DMA_OWNER_USER are exclusive for all devices in
> same iommu group as an iommu group is the smallest granularity of device
> isolation and protection that the IOMMU subsystem can guarantee.

I think this basically says DMA_OWNER_KERNEL and DMA_OWNER_USER are
attributes of the iommu_group (not an individual device), and it
applies to all devices in the iommu_group.  Below, you allude to the
fact that the interfaces are per-device.  It's not clear to me why you
made a per-device interface instead of a per-group interface.

> This
> extends the iommu core to enforce this exclusion when devices are
> assigned to userspace.
> 
> Basically two new interfaces are provided:
> 
>         int iommu_device_set_dma_owner(struct device *dev,
>                 enum iommu_dma_owner mode, struct file *user_file);
>         void iommu_device_release_dma_owner(struct device *dev,
>                 enum iommu_dma_owner mode);
> 
> Although above interfaces are per-device, DMA owner is tracked per group
> under the hood. An iommu group cannot have both DMA_OWNER_KERNEL
> and DMA_OWNER_USER set at the same time. Violation of this assumption
> fails iommu_device_set_dma_owner().
> 
> Kernel driver which does DMA have DMA_OWNER_KENREL automatically
> set/released in the driver binding process (see next patch).

s/DMA_OWNER_KENREL/DMA_OWNER_KERNEL/

> Kernel driver which doesn't do DMA should not set the owner type (via a
> new suppress flag in next patch). Device bound to such driver is considered
> same as a driver-less device which is compatible to all owner types.
> 
> Userspace driver framework (e.g. vfio) should set DMA_OWNER_USER for
> a device before the userspace is allowed to access it, plus a fd pointer to
> mark the user identity so a single group cannot be operated by multiple
> users simultaneously. Vice versa, the owner type should be released after
> the user access permission is withdrawn.
Baolu Lu Nov. 16, 2021, 1:52 a.m. UTC | #3
Hi Bjorn,

On 11/16/21 4:38 AM, Bjorn Helgaas wrote:
> On Mon, Nov 15, 2021 at 10:05:42AM +0800, Lu Baolu wrote:
>>  From the perspective of who is initiating the device to do DMA, device
>> DMA could be divided into the following types:
>>
>>          DMA_OWNER_KERNEL: kernel device driver intiates the DMA
>>          DMA_OWNER_USER: userspace device driver intiates the DMA
> 
> s/intiates/initiates/ (twice)

Yes.

> 
> As your first sentence suggests, the driver doesn't actually
> *initiate* the DMA in either case.  One of the drivers programs the
> device, and the *device* initiates the DMA.

You are right. I could rephrase it as:

"From the perspective of who is controlling the device to do DMA ..."

> 
>> DMA_OWNER_KERNEL and DMA_OWNER_USER are exclusive for all devices in
>> same iommu group as an iommu group is the smallest granularity of device
>> isolation and protection that the IOMMU subsystem can guarantee.
> 
> I think this basically says DMA_OWNER_KERNEL and DMA_OWNER_USER are
> attributes of the iommu_group (not an individual device), and it
> applies to all devices in the iommu_group.  Below, you allude to the
> fact that the interfaces are per-device.  It's not clear to me why you
> made a per-device interface instead of a per-group interface.

Yes, the attributes belong to the iommu_group. We have both per-device and
per-iommu_group interfaces. The former is for device drivers and the
latter is only for vfio, which has an iommu_group-based iommu abstraction.

>> This
>> extends the iommu core to enforce this exclusion when devices are
>> assigned to userspace.
>>
>> Basically two new interfaces are provided:
>>
>>          int iommu_device_set_dma_owner(struct device *dev,
>>                  enum iommu_dma_owner mode, struct file *user_file);
>>          void iommu_device_release_dma_owner(struct device *dev,
>>                  enum iommu_dma_owner mode);
>>
>> Although above interfaces are per-device, DMA owner is tracked per group
>> under the hood. An iommu group cannot have both DMA_OWNER_KERNEL
>> and DMA_OWNER_USER set at the same time. Violation of this assumption
>> fails iommu_device_set_dma_owner().
>>
>> Kernel driver which does DMA have DMA_OWNER_KENREL automatically
>> set/released in the driver binding process (see next patch).
> 
> s/DMA_OWNER_KENREL/DMA_OWNER_KERNEL/

Yes. Sorry for the typo.

> 
>> Kernel driver which doesn't do DMA should not set the owner type (via a
>> new suppress flag in next patch). Device bound to such driver is considered
>> same as a driver-less device which is compatible to all owner types.
>>
>> Userspace driver framework (e.g. vfio) should set DMA_OWNER_USER for
>> a device before the userspace is allowed to access it, plus a fd pointer to
>> mark the user identity so a single group cannot be operated by multiple
>> users simultaneously. Vice versa, the owner type should be released after
>> the user access permission is withdrawn.

Best regards,
baolu
Baolu Lu Nov. 16, 2021, 1:57 a.m. UTC | #4
Hi Christoph,

On 11/15/21 9:14 PM, Christoph Hellwig wrote:
> On Mon, Nov 15, 2021 at 10:05:42AM +0800, Lu Baolu wrote:
>> +enum iommu_dma_owner {
>> +	DMA_OWNER_NONE,
>> +	DMA_OWNER_KERNEL,
>> +	DMA_OWNER_USER,
>> +};
>> +
> 
>> +	enum iommu_dma_owner dma_owner;
>> +	refcount_t owner_cnt;
>> +	struct file *owner_user_file;
> 
> I'd just overload the ownership into owner_user_file,
> 
>   NULL			-> no owner
>   (struct file *)1UL)	-> kernel
>   real pointer		-> user
> 
> Which could simplify a lot of the code dealing with the owner.
> 

Yeah! Sounds reasonable. I will make this in the next version.

Best regards,
baolu
Jason Gunthorpe Nov. 16, 2021, 1:46 p.m. UTC | #5
On Tue, Nov 16, 2021 at 09:57:30AM +0800, Lu Baolu wrote:
> Hi Christoph,
> 
> On 11/15/21 9:14 PM, Christoph Hellwig wrote:
> > On Mon, Nov 15, 2021 at 10:05:42AM +0800, Lu Baolu wrote:
> > > +enum iommu_dma_owner {
> > > +	DMA_OWNER_NONE,
> > > +	DMA_OWNER_KERNEL,
> > > +	DMA_OWNER_USER,
> > > +};
> > > +
> > 
> > > +	enum iommu_dma_owner dma_owner;
> > > +	refcount_t owner_cnt;
> > > +	struct file *owner_user_file;
> > 
> > I'd just overload the ownership into owner_user_file,
> > 
> >   NULL			-> no owner
> >   (struct file *)1UL)	-> kernel
> >   real pointer		-> user
> > 
> > Which could simplify a lot of the code dealing with the owner.
> > 
> 
> Yeah! Sounds reasonable. I will make this in the next version.

It would be good to figure out how to make iommu_attach_device()
enforce no other driver binding as a kernel user without a file *, as
Robin pointed to, before optimizing this.

This fixes an existing bug where iommu_attach_device() only checks the
group size and is vulnerable to a hotplug increasing the group size
after it returns. That check should be replaced by this series's logic
instead.

Jason
Baolu Lu Nov. 17, 2021, 5:22 a.m. UTC | #6
Hi Jason,

On 11/16/21 9:46 PM, Jason Gunthorpe wrote:
> On Tue, Nov 16, 2021 at 09:57:30AM +0800, Lu Baolu wrote:
>> Hi Christoph,
>>
>> On 11/15/21 9:14 PM, Christoph Hellwig wrote:
>>> On Mon, Nov 15, 2021 at 10:05:42AM +0800, Lu Baolu wrote:
>>>> +enum iommu_dma_owner {
>>>> +	DMA_OWNER_NONE,
>>>> +	DMA_OWNER_KERNEL,
>>>> +	DMA_OWNER_USER,
>>>> +};
>>>> +
>>>
>>>> +	enum iommu_dma_owner dma_owner;
>>>> +	refcount_t owner_cnt;
>>>> +	struct file *owner_user_file;
>>>
>>> I'd just overload the ownership into owner_user_file,
>>>
>>>    NULL			-> no owner
>>>    (struct file *)1UL)	-> kernel
>>>    real pointer		-> user
>>>
>>> Which could simplify a lot of the code dealing with the owner.
>>>
>>
>> Yeah! Sounds reasonable. I will make this in the next version.
> 
> It would be good to figure out how to make iommu_attach_device()
> enforce no other driver binding as a kernel user without a file *, as
> Robin pointed to, before optimizing this.
> 
> This fixes an existing bug where iommu_attach_device() only checks the
> group size and is vunerable to a hot plug increasing the group size
> after it returns. That check should be replaced by this series's logic
> instead.

In my understanding, the essence of this problem is that only the
user owner of the iommu_group may attach an UNMANAGED domain to it.
If I understand it right, how about introducing a new interface to
allocate a user-managed domain and storing the user file pointer in it.

--- a/include/linux/iommu.h
+++ b/include/linux/iommu.h
@@ -94,6 +94,7 @@ struct iommu_domain {
         void *handler_token;
         struct iommu_domain_geometry geometry;
         struct iommu_dma_cookie *iova_cookie;
+       struct file *owner_user_file;
  };
--- a/drivers/iommu/iommu.c
+++ b/drivers/iommu/iommu.c
@@ -1902,6 +1902,18 @@ struct iommu_domain *iommu_domain_alloc(struct 
bus_type *bus)
  }
  EXPORT_SYMBOL_GPL(iommu_domain_alloc);

+struct iommu_domain *iommu_domain_alloc_user(struct bus_type *bus,
+                                            struct file *filep)
+{
+       struct iommu_domain *domain;
+
+       domain = __iommu_domain_alloc(bus, IOMMU_DOMAIN_UNMANAGED);
+       if (domain)
+               domain->owner_user_file = filep;
+
+       return domain;
+}

When attaching a domain to a user-owned iommu_group, both the group and
the domain should have a matching user fd.
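That matching rule might be sketched as follows (an illustrative userspace model, not the proposed kernel code; every name here is hypothetical):

```c
#include <assert.h>
#include <stddef.h>

/*
 * Model: a domain allocated via the proposed iommu_domain_alloc_user()
 * records its owner's file identity; attach succeeds only when it
 * matches the group's recorded user owner.
 */
struct model_domain  { void *owner_user_file; };
struct model_ugroup  { void *owner_user_file; };

int model_attach_user_domain(struct model_ugroup *g, struct model_domain *d)
{
	if (!g->owner_user_file || !d->owner_user_file)
		return -1;	/* both sides must be user-owned */
	if (g->owner_user_file != d->owner_user_file)
		return -1;	/* different user identities: refuse attach */
	return 0;		/* same user owns both: attach allowed */
}
```

So a second user's domain could never be attached to a group already claimed by the first.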

Does above help here?

Best regards,
baolu
Jason Gunthorpe Nov. 17, 2021, 1:35 p.m. UTC | #7
On Wed, Nov 17, 2021 at 01:22:19PM +0800, Lu Baolu wrote:
> Hi Jason,
> 
> On 11/16/21 9:46 PM, Jason Gunthorpe wrote:
> > On Tue, Nov 16, 2021 at 09:57:30AM +0800, Lu Baolu wrote:
> > > Hi Christoph,
> > > 
> > > On 11/15/21 9:14 PM, Christoph Hellwig wrote:
> > > > On Mon, Nov 15, 2021 at 10:05:42AM +0800, Lu Baolu wrote:
> > > > > +enum iommu_dma_owner {
> > > > > +	DMA_OWNER_NONE,
> > > > > +	DMA_OWNER_KERNEL,
> > > > > +	DMA_OWNER_USER,
> > > > > +};
> > > > > +
> > > > 
> > > > > +	enum iommu_dma_owner dma_owner;
> > > > > +	refcount_t owner_cnt;
> > > > > +	struct file *owner_user_file;
> > > > 
> > > > I'd just overload the ownership into owner_user_file,
> > > > 
> > > >    NULL			-> no owner
> > > >    (struct file *)1UL)	-> kernel
> > > >    real pointer		-> user
> > > > 
> > > > Which could simplify a lot of the code dealing with the owner.
> > > > 
> > > 
> > > Yeah! Sounds reasonable. I will make this in the next version.
> > 
> > It would be good to figure out how to make iommu_attach_device()
> > enforce no other driver binding as a kernel user without a file *, as
> > Robin pointed to, before optimizing this.
> > 
> > This fixes an existing bug where iommu_attach_device() only checks the
> > group size and is vunerable to a hot plug increasing the group size
> > after it returns. That check should be replaced by this series's logic
> > instead.
> 
> As my my understanding, the essence of this problem is that only the
> user owner of the iommu_group could attach an UNMANAGED domain to it.
> If I understand it right, how about introducing a new interface to
> allocate a user managed domain and storing the user file pointer in it.

For iommu_attach_device() the semantic is simple non-sharing, so there
is no need for the file * at all, it can just be NULL.

> Does above help here?

No, iommu_attach_device() is kernel only and should not interact with
userspace.

I'm also going to see if I can learn what Tegra is doing with
iommu_attach_group()

Jason
Baolu Lu Nov. 18, 2021, 1:12 a.m. UTC | #8
On 11/17/21 9:35 PM, Jason Gunthorpe wrote:
> On Wed, Nov 17, 2021 at 01:22:19PM +0800, Lu Baolu wrote:
>> Hi Jason,
>>
>> On 11/16/21 9:46 PM, Jason Gunthorpe wrote:
>>> On Tue, Nov 16, 2021 at 09:57:30AM +0800, Lu Baolu wrote:
>>>> Hi Christoph,
>>>>
>>>> On 11/15/21 9:14 PM, Christoph Hellwig wrote:
>>>>> On Mon, Nov 15, 2021 at 10:05:42AM +0800, Lu Baolu wrote:
>>>>>> +enum iommu_dma_owner {
>>>>>> +	DMA_OWNER_NONE,
>>>>>> +	DMA_OWNER_KERNEL,
>>>>>> +	DMA_OWNER_USER,
>>>>>> +};
>>>>>> +
>>>>>
>>>>>> +	enum iommu_dma_owner dma_owner;
>>>>>> +	refcount_t owner_cnt;
>>>>>> +	struct file *owner_user_file;
>>>>>
>>>>> I'd just overload the ownership into owner_user_file,
>>>>>
>>>>>     NULL			-> no owner
>>>>>     (struct file *)1UL)	-> kernel
>>>>>     real pointer		-> user
>>>>>
>>>>> Which could simplify a lot of the code dealing with the owner.
>>>>>
>>>>
>>>> Yeah! Sounds reasonable. I will make this in the next version.
>>>
>>> It would be good to figure out how to make iommu_attach_device()
>>> enforce no other driver binding as a kernel user without a file *, as
>>> Robin pointed to, before optimizing this.
>>>
>>> This fixes an existing bug where iommu_attach_device() only checks the
>>> group size and is vunerable to a hot plug increasing the group size
>>> after it returns. That check should be replaced by this series's logic
>>> instead.
>>
>> As my my understanding, the essence of this problem is that only the
>> user owner of the iommu_group could attach an UNMANAGED domain to it.
>> If I understand it right, how about introducing a new interface to
>> allocate a user managed domain and storing the user file pointer in it.
> 
> For iommu_attach_device() the semantic is simple non-sharing, so there
> is no need for the file * at all, it can just be NULL.

The file * being NULL means the device is owned only by the kernel
driver. Perhaps we can check this pointer in iommu_attach_device() to
avoid using it for user domain attachment.

> 
>> Does above help here?
> 
> No, iommu_attach_device() is kernel only and should not interact with
> userspace.

The existing iommu_attach_device() allows only singleton groups. As
we have added a group ownership attribute, we can restrict this interface
to kernel domain usage only.

> 
> I'm also going to see if I can learn what Tegra is doing with
> iommu_attach_group()

Okay! Thank you!

> 
> Jason
> 

Best regards,
baolu
Tian, Kevin Nov. 18, 2021, 2:39 a.m. UTC | #9
> From: Jason Gunthorpe <jgg@nvidia.com>
> Sent: Tuesday, November 16, 2021 9:46 PM
> 
> On Tue, Nov 16, 2021 at 09:57:30AM +0800, Lu Baolu wrote:
> > Hi Christoph,
> >
> > On 11/15/21 9:14 PM, Christoph Hellwig wrote:
> > > On Mon, Nov 15, 2021 at 10:05:42AM +0800, Lu Baolu wrote:
> > > > +enum iommu_dma_owner {
> > > > +	DMA_OWNER_NONE,
> > > > +	DMA_OWNER_KERNEL,
> > > > +	DMA_OWNER_USER,
> > > > +};
> > > > +
> > >
> > > > +	enum iommu_dma_owner dma_owner;
> > > > +	refcount_t owner_cnt;
> > > > +	struct file *owner_user_file;
> > >
> > > I'd just overload the ownership into owner_user_file,
> > >
> > >   NULL			-> no owner
> > >   (struct file *)1UL)	-> kernel
> > >   real pointer		-> user
> > >
> > > Which could simplify a lot of the code dealing with the owner.
> > >
> >
> > Yeah! Sounds reasonable. I will make this in the next version.
> 
> It would be good to figure out how to make iommu_attach_device()
> enforce no other driver binding as a kernel user without a file *, as
> Robin pointed to, before optimizing this.
> 
> This fixes an existing bug where iommu_attach_device() only checks the
> group size and is vunerable to a hot plug increasing the group size
> after it returns. That check should be replaced by this series's logic
> instead.
> 

I think this existing bug in iommu_attach_device() is different from
what this series is attempting to solve. To avoid breaking the singleton
group assumption there, the ideal band-aid is to fail device hotplug.
Otherwise some IOVA ranges which are supposed to go upstream
to the IOMMU may be considered as p2p and routed to the hotplugged
device instead. In concept a singleton group is different from a
multi-device group which has only one device bound to a driver...

This series aims to avoid the conflict of having both user and kernel
drivers mixed in a multi-device group.

Thanks
Kevin
Jason Gunthorpe Nov. 18, 2021, 1:33 p.m. UTC | #10
On Thu, Nov 18, 2021 at 02:39:45AM +0000, Tian, Kevin wrote:
> > From: Jason Gunthorpe <jgg@nvidia.com>
> > Sent: Tuesday, November 16, 2021 9:46 PM
> > 
> > On Tue, Nov 16, 2021 at 09:57:30AM +0800, Lu Baolu wrote:
> > > Hi Christoph,
> > >
> > > On 11/15/21 9:14 PM, Christoph Hellwig wrote:
> > > > On Mon, Nov 15, 2021 at 10:05:42AM +0800, Lu Baolu wrote:
> > > > > +enum iommu_dma_owner {
> > > > > +	DMA_OWNER_NONE,
> > > > > +	DMA_OWNER_KERNEL,
> > > > > +	DMA_OWNER_USER,
> > > > > +};
> > > > > +
> > > >
> > > > > +	enum iommu_dma_owner dma_owner;
> > > > > +	refcount_t owner_cnt;
> > > > > +	struct file *owner_user_file;
> > > >
> > > > I'd just overload the ownership into owner_user_file,
> > > >
> > > >   NULL			-> no owner
> > > >   (struct file *)1UL)	-> kernel
> > > >   real pointer		-> user
> > > >
> > > > Which could simplify a lot of the code dealing with the owner.
> > > >
> > >
> > > Yeah! Sounds reasonable. I will make this in the next version.
> > 
> > It would be good to figure out how to make iommu_attach_device()
> > enforce no other driver binding as a kernel user without a file *, as
> > Robin pointed to, before optimizing this.
> > 
> > This fixes an existing bug where iommu_attach_device() only checks the
> > group size and is vunerable to a hot plug increasing the group size
> > after it returns. That check should be replaced by this series's logic
> > instead.
> > 
> 
> I think this existing bug in iommu_attach_devce() is different from 
> what this series is attempting to solve. To avoid breaking singleton
> group assumption there the ideal band-aid is to fail device hotplug.
> Otherwise some IOVA ranges which are supposed to go upstream 
> to IOMMU may be considered as p2p and routed to the hotplugged
> device instead.

Yes, but the instability of the reserved regions during hotplug with
!ACS seems like an entirely different problem. It affects everything,
including VFIO and multi-device groups. Certainly it has nothing to do
with this series.

> In concept a singleton group is different from a
> multi-devices group which has only one device bound to driver...

Really? Why? I don't see it that way..

A singleton group is just a multi-device group that hasn't been
hotplugged yet.

We don't seem to have the concept of a "true" singleton group which is
permanently single due to HW features.

> This series aims to avoid conflict having both user and kernel drivers
> mixed in a multi-devices group.

I see this series about bringing order to all the places that want to
use a non-default domain - in-kernel or user doesn't really matter.

ie why shouldn't iommu_attach_device() work in a group that has a PCI
bridge, just like VFIO does?

The only thing that is special about VFIO vs a kernel driver is that we
want a little help to track userspace ownership, and VFIO opens
userspace up to the P2P attack.

The way I see it, the num device == 1 test in iommu_attach_device() is
an imperfect way of controlling driver binding, and we can do better
by using the mechanism in this series.

Jason
Jason Gunthorpe Nov. 18, 2021, 2:10 p.m. UTC | #11
On Thu, Nov 18, 2021 at 09:12:41AM +0800, Lu Baolu wrote:
> The existing iommu_attach_device() allows only for singleton group. As
> we have added group ownership attribute, we can enforce this interface
> only for kernel domain usage.

Below is what I came up with.
 - Replace the file * with a simple void *

 - Use owner_count == 0 <-> dma_owner == DMA_OWNER_NONE to simplify
    the logic and remove levels of indent

 - Add a kernel state DMA_OWNER_PRIVATE_DOMAIN

 - Rename the user state to DMA_OWNER_PRIVATE_DOMAIN_USER

   It differs from the above because it does extra work to keep the
   group isolated that kernel users do not need to do.
 
 - Rename the kernel state to DMA_OWNER_DMA_API to better reflect
   its purpose. Inspired by Robin's point that a lot of this is
   indirectly coupled to the domain pointer.

 - Have iommu_attach_device() atomically swap from DMA_OWNER_DMA_API
   to DMA_OWNER_PRIVATE_DOMAIN - replaces the group size check.

When we figure out tegra we can add an WARN_ON to iommu_attach_group()
that dma_owner != DMA_OWNER_NONE || DMA_OWNER_DMA_API

Then the whole thing makes some general sense..

Jason

diff --git a/drivers/iommu/iommu.c b/drivers/iommu/iommu.c
index 064d0679906afd..4cafe074775e30 100644
--- a/drivers/iommu/iommu.c
+++ b/drivers/iommu/iommu.c
@@ -49,7 +49,7 @@ struct iommu_group {
 	struct list_head entry;
 	enum iommu_dma_owner dma_owner;
 	refcount_t owner_cnt;
-	struct file *owner_user_file;
+	void *owner_cookie;
 };
 
 struct group_device {
@@ -1937,12 +1937,18 @@ int iommu_attach_device(struct iommu_domain *domain, struct device *dev)
 	 * change while we are attaching
 	 */
 	mutex_lock(&group->mutex);
-	ret = -EINVAL;
-	if (iommu_group_device_count(group) != 1)
+	if (group->dma_owner != DMA_OWNER_DMA_API ||
+	    refcount_read(&group->owner_cnt) != 1) {
+		ret = -EBUSY;
 		goto out_unlock;
+	}
 
 	ret = __iommu_attach_group(domain, group);
+	if (ret)
+		goto out_unlock;
 
+	group->dma_owner = DMA_OWNER_PRIVATE_DOMAIN;
+	group->owner_cookie = domain;
 out_unlock:
 	mutex_unlock(&group->mutex);
 	iommu_group_put(group);
@@ -2193,14 +2199,11 @@ void iommu_detach_device(struct iommu_domain *domain, struct device *dev)
 		return;
 
 	mutex_lock(&group->mutex);
-	if (iommu_group_device_count(group) != 1) {
-		WARN_ON(1);
-		goto out_unlock;
-	}
-
+	WARN_ON(group->dma_owner != DMA_OWNER_PRIVATE_DOMAIN ||
+		refcount_read(&group->owner_cnt) != 1 ||
+		group->owner_cookie != domain);
+	group->dma_owner = DMA_OWNER_DMA_API;
 	__iommu_detach_group(domain, group);
-
-out_unlock:
 	mutex_unlock(&group->mutex);
 	iommu_group_put(group);
 }
@@ -3292,44 +3295,33 @@ static ssize_t iommu_group_store_type(struct iommu_group *group,
 
 static int __iommu_group_set_dma_owner(struct iommu_group *group,
 				       enum iommu_dma_owner owner,
-				       struct file *user_file)
+				       void *owner_cookie)
 {
-	if (group->dma_owner != DMA_OWNER_NONE && group->dma_owner != owner)
-		return -EBUSY;
-
-	if (owner == DMA_OWNER_USER) {
-		if (!user_file)
-			return -EINVAL;
-
-		if (group->owner_user_file && group->owner_user_file != user_file)
-			return -EPERM;
+	if (refcount_inc_not_zero(&group->owner_cnt)) {
+		if (group->dma_owner != owner ||
+		    group->owner_cookie != owner_cookie) {
+			refcount_dec(&group->owner_cnt);
+			return -EBUSY;
+		}
+		return 0;
 	}
 
-	if (!refcount_inc_not_zero(&group->owner_cnt)) {
-		group->dma_owner = owner;
-		refcount_set(&group->owner_cnt, 1);
-
-		if (owner == DMA_OWNER_USER) {
-			/*
-			 * The UNMANAGED domain shouldn't be attached before
-			 * claiming the USER ownership for the first time.
-			 */
-			if (group->domain) {
-				if (group->domain != group->default_domain) {
-					group->dma_owner = DMA_OWNER_NONE;
-					refcount_set(&group->owner_cnt, 0);
-
-					return -EBUSY;
-				}
-
-				__iommu_detach_group(group->domain, group);
-			}
-
-			get_file(user_file);
-			group->owner_user_file = user_file;
+	/*
+	 * We must ensure that any device DMAs issued after this call
+	 * are discarded. DMAs can only reach real memory once someone
+	 * has attached a real domain.
+	 */
+	if (owner == DMA_OWNER_PRIVATE_DOMAIN_USER) {
+		if (group->domain) {
+			if (group->domain != group->default_domain)
+				return -EBUSY;
+			__iommu_detach_group(group->domain, group);
 		}
 	}
 
+	group->dma_owner = owner;
+	group->owner_cookie = owner_cookie;
+	refcount_set(&group->owner_cnt, 1);
 	return 0;
 }
 
@@ -3339,20 +3331,18 @@ static void __iommu_group_release_dma_owner(struct iommu_group *group,
 	if (WARN_ON(group->dma_owner != owner))
 		return;
 
-	if (refcount_dec_and_test(&group->owner_cnt)) {
-		group->dma_owner = DMA_OWNER_NONE;
+	if (!refcount_dec_and_test(&group->owner_cnt))
+		return;
 
-		if (owner == DMA_OWNER_USER) {
-			fput(group->owner_user_file);
-			group->owner_user_file = NULL;
+	group->dma_owner = DMA_OWNER_NONE;
 
-			/*
-			 * The UNMANAGED domain should be detached before all USER
-			 * owners have been released.
-			 */
-			if (!WARN_ON(group->domain) && group->default_domain)
-				__iommu_attach_group(group->default_domain, group);
-		}
+	/*
+	 * The UNMANAGED domain should be detached before all USER
+	 * owners have been released.
+	 */
+	if (owner == DMA_OWNER_PRIVATE_DOMAIN_USER) {
+		if (!WARN_ON(group->domain) && group->default_domain)
+			__iommu_attach_group(group->default_domain, group);
 	}
 }
 
diff --git a/include/linux/iommu.h b/include/linux/iommu.h
index d8946f22edd5df..7f50dfa7207e9c 100644
--- a/include/linux/iommu.h
+++ b/include/linux/iommu.h
@@ -164,14 +164,21 @@ enum iommu_dev_features {
 
 /**
  * enum iommu_dma_owner - IOMMU DMA ownership
- * @DMA_OWNER_NONE: No DMA ownership
- * @DMA_OWNER_KERNEL: Device DMAs are initiated by a kernel driver
- * @DMA_OWNER_USER: Device DMAs are initiated by a userspace driver
+ * @DMA_OWNER_NONE:
+ *  No DMA ownership
+ * @DMA_OWNER_DMA_API:
+ *  Device DMAs are initiated by a kernel driver through the DMA API
+ * @DMA_OWNER_PRIVATE_DOMAIN:
+ *  Device DMAs are initiated by a kernel driver
+ * @DMA_OWNER_PRIVATE_DOMAIN_USER:
+ *  Device DMAs are initiated by userspace, kernel ensures that DMAs
+ *  never go to kernel memory.
  */
 enum iommu_dma_owner {
 	DMA_OWNER_NONE,
-	DMA_OWNER_KERNEL,
-	DMA_OWNER_USER,
+	DMA_OWNER_DMA_API,
+	DMA_OWNER_PRIVATE_DOMAIN,
+	DMA_OWNER_PRIVATE_DOMAIN_USER,
 };
 
 #define IOMMU_PASID_INVALID	(-1U)
Tian, Kevin Nov. 19, 2021, 5:44 a.m. UTC | #12
> From: Jason Gunthorpe <jgg@nvidia.com>
> Sent: Thursday, November 18, 2021 9:33 PM
> 
> > In concept a singleton group is different from a
> > multi-devices group which has only one device bound to driver...
> 
> Really? Why? I don't see it that way..
> 
> A singleton group is just a multi-device group that hasn't been
> hotplugged yet.
> 
> We don't seem to have the concept of a "true" singleton group which is
> permanently single due to HW features.
> 
> > This series aims to avoid conflict having both user and kernel drivers
> > mixed in a multi-devices group.

Well, the difference is just literal. I don't know the background
of why the existing iommu_attach_device() users want to do it this
way. But given the condition in iommu_attach_device(), it could
in theory imply some unknown hardware-level side effect which
may break the desired functionality once the group size grows
beyond a singleton. Is it a real case? I don't know...

You are now redefining that condition from a singleton group to a
multi-device group with a single driver bound. As long as there is no
objection from existing driver users, I'm fine with it. But I still want
to raise awareness, as it does change the existing semantics (though the
old check might be considered an imperfect way).

Thanks
Kevin
Baolu Lu Nov. 19, 2021, 11:14 a.m. UTC | #13
On 2021/11/19 13:44, Tian, Kevin wrote:
>> From: Jason Gunthorpe <jgg@nvidia.com>
>> Sent: Thursday, November 18, 2021 9:33 PM
>>
>>> In concept a singleton group is different from a
>>> multi-devices group which has only one device bound to driver...
>>
>> Really? Why? I don't see it that way..
>>
>> A singleton group is just a multi-device group that hasn't been
>> hotplugged yet.
>>
>> We don't seem to have the concept of a "true" singleton group which is
>> permanently single due to HW features.
>>
>>> This series aims to avoid conflict having both user and kernel drivers
>>> mixed in a multi-devices group.
> 
> Well, the difference is just in literal. I don't know the background
> why the existing iommu_attach_device() users want to do it this
> way. But given the condition in iommu_attach_device() it could
> in theory imply some unknown hardware-level side effect which
> may break the desired functionality once the group size grows
> beyond singleton. Is it a real case? I don't know...
> 
> You are now redefining that condition from singleton group to
> multi-devices group with single driver bound. As long as no object
> from existing driver users, I'm fine with it. But still want to raise
> awareness as it does change the existing semantics (though might
> be considered as an imperfect way).

The singleton group requirement for iommu_attach/detach_device() was
added by the commit below:

commit 426a273834eae65abcfc7132a21a85b3151e0bce
Author: Joerg Roedel <jroedel@suse.de>
Date:   Thu May 28 18:41:30 2015 +0200

     iommu: Limit iommu_attach/detach_device to devices with their own group

     This patch changes the behavior of the iommu_attach_device
     and iommu_detach_device functions. With this change these
     functions only work on devices that have their own group.
     For all other devices the iommu_group_attach/detach
     functions must be used.

     Signed-off-by: Joerg Roedel <jroedel@suse.de>

Joerg, can you please shed some light on the background of this
requirement? Does the above idea of transitioning from a singleton group
to a group with a single driver bound make sense to you?

Best regards,
baolu
Jason Gunthorpe Nov. 19, 2021, 12:56 p.m. UTC | #14
On Fri, Nov 19, 2021 at 05:44:35AM +0000, Tian, Kevin wrote:

> Well, the difference is just in wording. I don't know the background
> why the existing iommu_attach_device() users want to do it this
> way. But given the condition in iommu_attach_device() it could
> in theory imply some unknown hardware-level side effect which 
> may break the desired functionality once the group size grows 
> beyond singleton. Is it a real case? I don't know...

No, this must not be.

Any device can have VFIO attached to it, and VFIO does the equivalent
of iommu_attach_device(). We cannot have "unknown hardware-level side
effects" when changing the domains of multi-device groups and also have
working VFIO.

If these HW problems exist they are a separate issue and the iommu
core should flag such groups as being unable to change their domains
at all.

Jason
Joerg Roedel Nov. 19, 2021, 3:06 p.m. UTC | #15
On Fri, Nov 19, 2021 at 07:14:10PM +0800, Lu Baolu wrote:
> The singleton group requirement for iommu_attach/detach_device() was
> added by below commit:
> 
> commit 426a273834eae65abcfc7132a21a85b3151e0bce
> Author: Joerg Roedel <jroedel@suse.de>
> Date:   Thu May 28 18:41:30 2015 +0200
> 
>     iommu: Limit iommu_attach/detach_device to devices with their own group
> 
>     This patch changes the behavior of the iommu_attach_device
>     and iommu_detach_device functions. With this change these
>     functions only work on devices that have their own group.
>     For all other devices the iommu_group_attach/detach
>     functions must be used.
> 
>     Signed-off-by: Joerg Roedel <jroedel@suse.de>
> 
> Joerg,can you please shed some light on the background of this
> requirement? Does above idea of transition from singleton group
> to group with single driver bound make sense to you?

This change came to be because the iommu_attach/detach_device()
interface doesn't fit well into a world with iommu-groups. Devices
within a group are by definition not isolated between each other, so
they must all be in the same address space (== iommu_domain). So it
doesn't make sense to allow attaching a single device within a group to
a different iommu_domain.

I know that in theory it is safe to allow devices within a group to be
in different domains, because iommu-groups catch multiple
non-isolation cases:

	1) Devices behind a non-ACS capable bridge or multiple functions
	   of a PCI device. Here it is safe to put the devices into
	   different iommu-domains as long as all affected devices are
	   controlled by the same owner.

	2) Devices which share a single request-id and can't be
	   differentiated by the IOMMU hardware. These always need to be
	   in the same iommu_domain.

To lift the single-domain-per-group requirement the iommu core code
needs to learn the difference between the two cases above.

Regards,

	Joerg
Jason Gunthorpe Nov. 19, 2021, 3:43 p.m. UTC | #16
On Fri, Nov 19, 2021 at 04:06:12PM +0100, Jörg Rödel wrote:

> This change came to be because the iommu_attach/detach_device()
> interface doesn't fit well into a world with iommu-groups. Devices
> within a group are by definition not isolated between each other, so
> they must all be in the same address space (== iommu_domain). So it
> doesn't make sense to allow attaching a single device within a group to
> a different iommu_domain.

It is the same problem VFIO has. It changes the iommu_domain of a
group while it only has a single driver bound to one device in the
group.

Robin is also right to point out there is no guarantee that a single
device group will remain a single device group after a hot plug
event. This is something VFIO is also able to handle today.

So, I think the solution of this series applies equally well to this
problem. Let's see it in v2.

> I know that in theory it is safe to allow devices within a group to be
> in different domains, because iommu-groups catch multiple
> non-isolation cases:
> 
> 	1) Devices behind a non-ACS capable bridge or multiple functions
> 	   of a PCI device. Here it is safe to put the devices into
> 	   different iommu-domains as long as all affected devices are
> 	   controlled by the same owner.
> 
> 	2) Devices which share a single request-id and can't be
> 	   differentiated by the IOMMU hardware. These always need to be
> 	   in the same iommu_domain.

> To lift the single-domain-per-group requirement the iommu core code
> needs to learn the difference between the two cases above.

We had a long talk about this a while back, and nobody came up with
compelling arguments to justify doing this work. I've just been using
it as a guidepost for building APIs. If the API can accommodate #1 then
it is a better design than one that cannot.

Jason
Baolu Lu Nov. 20, 2021, 11:16 a.m. UTC | #17
Hi Joerg,

On 11/19/21 11:06 PM, Jörg Rödel wrote:
> On Fri, Nov 19, 2021 at 07:14:10PM +0800, Lu Baolu wrote:
>> The singleton group requirement for iommu_attach/detach_device() was
>> added by below commit:
>>
>> commit 426a273834eae65abcfc7132a21a85b3151e0bce
>> Author: Joerg Roedel <jroedel@suse.de>
>> Date:   Thu May 28 18:41:30 2015 +0200
>>
>>      iommu: Limit iommu_attach/detach_device to devices with their own group
>>
>>      This patch changes the behavior of the iommu_attach_device
>>      and iommu_detach_device functions. With this change these
>>      functions only work on devices that have their own group.
>>      For all other devices the iommu_group_attach/detach
>>      functions must be used.
>>
>>      Signed-off-by: Joerg Roedel <jroedel@suse.de>
>>
>> Joerg, can you please shed some light on the background of this
>> requirement? Does above idea of transition from singleton group
>> to group with single driver bound make sense to you?
> 
> This change came to be because the iommu_attach/detach_device()
> interface doesn't fit well into a world with iommu-groups. Devices
> within a group are by definition not isolated between each other, so
> they must all be in the same address space (== iommu_domain). So it
> doesn't make sense to allow attaching a single device within a group to
> a different iommu_domain.

Thanks for the explanation. It's very helpful. There seems to be a lot
of discussions around this, but I didn't see any meaningful reasons to
break the assumption of "all devices in a group being in the same address
space".

Best regards,
baolu

> 
> I know that in theory it is safe to allow devices within a group to be
> in different domains, because iommu-groups catch multiple
> non-isolation cases:
> 
> 	1) Devices behind a non-ACS capable bridge or multiple functions
> 	   of a PCI device. Here it is safe to put the devices into
> 	   different iommu-domains as long as all affected devices are
> 	   controlled by the same owner.
> 
> 	2) Devices which share a single request-id and can't be
> 	   differentiated by the IOMMU hardware. These always need to be
> 	   in the same iommu_domain.
> 
> To lift the single-domain-per-group requirement the iommu core code
> needs to learn the difference between the two cases above.
> 
> Regards,
> 
> 	Joerg
>

Patch

diff --git a/include/linux/iommu.h b/include/linux/iommu.h
index d2f3435e7d17..f77eb9e7788a 100644
--- a/include/linux/iommu.h
+++ b/include/linux/iommu.h
@@ -162,6 +162,18 @@  enum iommu_dev_features {
 	IOMMU_DEV_FEAT_IOPF,
 };
 
+/**
+ * enum iommu_dma_owner - IOMMU DMA ownership
+ * @DMA_OWNER_NONE: No DMA ownership
+ * @DMA_OWNER_KERNEL: Device DMAs are initiated by a kernel driver
+ * @DMA_OWNER_USER: Device DMAs are initiated by a userspace driver
+ */
+enum iommu_dma_owner {
+	DMA_OWNER_NONE,
+	DMA_OWNER_KERNEL,
+	DMA_OWNER_USER,
+};
+
 #define IOMMU_PASID_INVALID	(-1U)
 
 #ifdef CONFIG_IOMMU_API
@@ -681,6 +693,10 @@  struct iommu_sva *iommu_sva_bind_device(struct device *dev,
 void iommu_sva_unbind_device(struct iommu_sva *handle);
 u32 iommu_sva_get_pasid(struct iommu_sva *handle);
 
+int iommu_device_set_dma_owner(struct device *dev, enum iommu_dma_owner owner,
+			       struct file *user_file);
+void iommu_device_release_dma_owner(struct device *dev, enum iommu_dma_owner owner);
+
 #else /* CONFIG_IOMMU_API */
 
 struct iommu_ops {};
@@ -1081,6 +1097,21 @@  static inline struct iommu_fwspec *dev_iommu_fwspec_get(struct device *dev)
 {
 	return NULL;
 }
+
+static inline int iommu_device_set_dma_owner(struct device *dev,
+					     enum iommu_dma_owner owner,
+					     struct file *user_file)
+{
+	if (owner != DMA_OWNER_KERNEL)
+		return -EINVAL;
+
+	return 0;
+}
+
+static inline void iommu_device_release_dma_owner(struct device *dev,
+						  enum iommu_dma_owner owner)
+{
+}
 #endif /* CONFIG_IOMMU_API */
 
 /**
diff --git a/drivers/iommu/iommu.c b/drivers/iommu/iommu.c
index 8b86406b7162..39493b1b3edf 100644
--- a/drivers/iommu/iommu.c
+++ b/drivers/iommu/iommu.c
@@ -26,6 +26,7 @@ 
 #include <linux/fsl/mc.h>
 #include <linux/module.h>
 #include <linux/cc_platform.h>
+#include <linux/file.h>
 #include <trace/events/iommu.h>
 
 static struct kset *iommu_group_kset;
@@ -48,6 +49,9 @@  struct iommu_group {
 	struct iommu_domain *default_domain;
 	struct iommu_domain *domain;
 	struct list_head entry;
+	enum iommu_dma_owner dma_owner;
+	refcount_t owner_cnt;
+	struct file *owner_user_file;
 };
 
 struct group_device {
@@ -621,6 +625,7 @@  struct iommu_group *iommu_group_alloc(void)
 	INIT_LIST_HEAD(&group->devices);
 	INIT_LIST_HEAD(&group->entry);
 	BLOCKING_INIT_NOTIFIER_HEAD(&group->notifier);
+	group->dma_owner = DMA_OWNER_NONE;
 
 	ret = ida_simple_get(&iommu_group_ida, 0, 0, GFP_KERNEL);
 	if (ret < 0) {
@@ -3351,3 +3356,104 @@  static ssize_t iommu_group_store_type(struct iommu_group *group,
 
 	return ret;
 }
+
+static int __iommu_group_set_dma_owner(struct iommu_group *group,
+				       enum iommu_dma_owner owner,
+				       struct file *user_file)
+{
+	if (group->dma_owner != DMA_OWNER_NONE && group->dma_owner != owner)
+		return -EBUSY;
+
+	if (owner == DMA_OWNER_USER) {
+		if (!user_file)
+			return -EINVAL;
+
+		if (group->owner_user_file && group->owner_user_file != user_file)
+			return -EPERM;
+	}
+
+	if (!refcount_inc_not_zero(&group->owner_cnt)) {
+		group->dma_owner = owner;
+		refcount_set(&group->owner_cnt, 1);
+
+		if (owner == DMA_OWNER_USER) {
+			get_file(user_file);
+			group->owner_user_file = user_file;
+		}
+	}
+
+	return 0;
+}
+
+static void __iommu_group_release_dma_owner(struct iommu_group *group,
+					    enum iommu_dma_owner owner)
+{
+	if (WARN_ON(group->dma_owner != owner))
+		return;
+
+	if (refcount_dec_and_test(&group->owner_cnt)) {
+		group->dma_owner = DMA_OWNER_NONE;
+
+		if (owner == DMA_OWNER_USER) {
+			fput(group->owner_user_file);
+			group->owner_user_file = NULL;
+		}
+	}
+}
+
+/**
+ * iommu_device_set_dma_owner() - Set DMA ownership of a device
+ * @dev: The device.
+ * @owner: DMA_OWNER_KERNEL or DMA_OWNER_USER.
+ * @user_file: The device fd when DMA_OWNER_USER is about to set.
+ *
+ * Set the DMA ownership of a device. The KERNEL and USER ownership are
+ * exclusive. For DMA_OWNER_USER, the caller should also specify the fd
+ * through which the I/O address spaces are managed for the target device.
+ * This interface guarantees that the USER DMA ownership is only assigned
+ * to the same fd.
+ */
+int iommu_device_set_dma_owner(struct device *dev, enum iommu_dma_owner owner,
+			       struct file *user_file)
+{
+	struct iommu_group *group = iommu_group_get(dev);
+	int ret;
+
+	if (!group) {
+		if (owner == DMA_OWNER_KERNEL)
+			return 0;
+		else
+			return -ENODEV;
+	}
+
+	mutex_lock(&group->mutex);
+	ret = __iommu_group_set_dma_owner(group, owner, user_file);
+	mutex_unlock(&group->mutex);
+	iommu_group_put(group);
+
+	return ret;
+}
+EXPORT_SYMBOL_GPL(iommu_device_set_dma_owner);
+
+/**
+ * iommu_device_release_dma_owner() - Release DMA ownership of a device
+ * @dev: The device.
+ * @owner: DMA_OWNER_KERNEL or DMA_OWNER_USER.
+ *
+ * Release the DMA ownership claimed by iommu_device_set_dma_owner().
+ */
+void iommu_device_release_dma_owner(struct device *dev, enum iommu_dma_owner owner)
+{
+	struct iommu_group *group = iommu_group_get(dev);
+
+	if (!group) {
+		WARN_ON(owner != DMA_OWNER_KERNEL);
+		return;
+	}
+
+	mutex_lock(&group->mutex);
+	__iommu_group_release_dma_owner(group, owner);
+	mutex_unlock(&group->mutex);
+	iommu_group_put(group);
+}
+EXPORT_SYMBOL_GPL(iommu_device_release_dma_owner);