
[v1,3/8] iommu: Extend iommu_at[de]tach_device() for multi-device groups

Message ID 20220106022053.2406748-4-baolu.lu@linux.intel.com (mailing list archive)
State Handled Elsewhere
Series Scrap iommu_attach/detach_group() interfaces

Commit Message

Baolu Lu Jan. 6, 2022, 2:20 a.m. UTC
The iommu_attach/detach_device() interfaces were exposed for device
drivers to attach/detach their own domains. Commit 426a273834eae
("iommu: Limit iommu_attach/detach_device to device with their own group")
restricted them to singleton groups to avoid different devices in a group
attaching different domains.

Now that device DMA ownership has been introduced into the iommu core, we
can extend these interfaces to multiple-device groups while still
guaranteeing that "all devices are in the same address space".

For multiple devices belonging to the same group, iommu_device_use_dma_api()
and iommu_attach_device() are mutually exclusive. Therefore, when drivers
decide to use iommu_attach_device(), they cannot call
iommu_device_use_dma_api() at the same time.

Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com>
---
 drivers/iommu/iommu.c | 79 +++++++++++++++++++++++++++++++++----------
 1 file changed, 62 insertions(+), 17 deletions(-)
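
As a usage sketch of the intended calling convention (a minimal,
illustrative example assuming this series is applied; the probe callback
name is made up and error handling is simplified):

	static int my_driver_probe(struct device *dev)
	{
		struct iommu_domain *domain;
		int ret;

		/* iommu_domain_alloc() returns an IOMMU_DOMAIN_UNMANAGED domain */
		domain = iommu_domain_alloc(dev->bus);
		if (!domain)
			return -ENOMEM;

		/*
		 * This now works even if dev shares its group with other
		 * devices, provided every driver bound in the group opted
		 * out of the kernel DMA API and attaches this same domain.
		 */
		ret = iommu_attach_device(domain, dev);
		if (ret) {
			iommu_domain_free(domain);
			return ret;
		}

		/* the driver now owns the I/O space: iommu_map()/iommu_unmap() */
		return 0;
	}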

Comments

Jason Gunthorpe Jan. 6, 2022, 5:22 p.m. UTC | #1
On Thu, Jan 06, 2022 at 10:20:48AM +0800, Lu Baolu wrote:
> The iommu_attach/detach_device() interfaces were exposed for device
> drivers to attach/detach their own domains. Commit 426a273834eae
> ("iommu: Limit iommu_attach/detach_device to device with their own group")
> restricted them to singleton groups to avoid different devices in a group
> attaching different domains.
> 
> Now that device DMA ownership has been introduced into the iommu core, we
> can extend these interfaces to multiple-device groups while still
> guaranteeing that "all devices are in the same address space".
> 
> For multiple devices belonging to the same group, iommu_device_use_dma_api()
> and iommu_attach_device() are mutually exclusive. Therefore, when drivers
> decide to use iommu_attach_device(), they cannot call
> iommu_device_use_dma_api() at the same time.
> 
> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
> Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com>
>  drivers/iommu/iommu.c | 79 +++++++++++++++++++++++++++++++++----------
>  1 file changed, 62 insertions(+), 17 deletions(-)
> 
> diff --git a/drivers/iommu/iommu.c b/drivers/iommu/iommu.c
> index ab8ab95969f5..2c9efd85e447 100644
> --- a/drivers/iommu/iommu.c
> +++ b/drivers/iommu/iommu.c
> @@ -47,6 +47,7 @@ struct iommu_group {
>  	struct iommu_domain *domain;
>  	struct list_head entry;
>  	unsigned int owner_cnt;
> +	unsigned int attach_cnt;

Why did we suddenly need another counter? None of the prior versions
needed this. I suppose this is being used as a flag to indicate whether
owner_cnt == 1 or owner_cnt == 0 should restore the default domain?
I would rather have a flag 'auto_no_kernel_dma_api_compat' or something.


> +/**
> + * iommu_attach_device() - attach external or UNMANAGED domain to device
> + * @domain: the domain about to attach
> + * @dev: the device about to be attached
> + *
> + * For devices belonging to the same group, iommu_device_use_dma_api() and
> + * iommu_attach_device() are exclusive. Therefore, when drivers decide to
> + * use iommu_attach_device(), they cannot call iommu_device_use_dma_api()
> + * at the same time.
> + */
>  int iommu_attach_device(struct iommu_domain *domain, struct device *dev)
>  {
>  	struct iommu_group *group;
> +	int ret = 0;
> +
> +	if (domain->type != IOMMU_DOMAIN_UNMANAGED)
> +		return -EINVAL;
>  
>  	group = iommu_group_get(dev);
>  	if (!group)
>  		return -ENODEV;
>  
> +	if (group->owner_cnt) {
> +		/*
> +		 * Group has been used for kernel-api dma or claimed explicitly
> +		 * for exclusive occupation. For backward compatibility, device
> +		 * in a singleton group is allowed to ignore setting the
> +		 * drv.no_kernel_api_dma field.

BTW why is this called 'no kernel api dma'? That reads backwards; 'no
kernel dma api', right?

Another appeal of putting no_kernel_api_dma in the struct device_driver
is that this code could simply do 'dev->driver->no_kernel_api_dma' to
figure out how it is being called and avoid this messy implicitness.

Once we know our calling context we can always automatically switch from
DMA API mode to another domain without any trouble or special
counters:

if (!dev->driver->no_kernel_api_dma) {
    if (group->owner_cnt > 1 || group->owner)
        return -EBUSY;
    return __iommu_attach_group(domain, group);
}

if (!group->owner_cnt) {
    ret = __iommu_attach_group(domain, group);
    if (ret)
        return ret;
} else if (group->owner || group->domain != domain)
    return -EBUSY;
group->owner_cnt++;

Right?

> +	if (!group->attach_cnt) {
> +		ret = __iommu_attach_group(domain, group);

How come we don't have to detach the default domain here? Doesn't
that mean that the iommu_replace_group could also just call attach
directly without going through detach?

Jason
Baolu Lu Jan. 7, 2022, 1:14 a.m. UTC | #2
Hi Jason,

On 1/7/22 1:22 AM, Jason Gunthorpe wrote:
> On Thu, Jan 06, 2022 at 10:20:48AM +0800, Lu Baolu wrote:
>> The iommu_attach/detach_device() interfaces were exposed for device
>> drivers to attach/detach their own domains. Commit 426a273834eae
>> ("iommu: Limit iommu_attach/detach_device to device with their own group")
>> restricted them to singleton groups to avoid different devices in a group
>> attaching different domains.
>>
>> Now that device DMA ownership has been introduced into the iommu core, we
>> can extend these interfaces to multiple-device groups while still
>> guaranteeing that "all devices are in the same address space".
>>
>> For multiple devices belonging to the same group, iommu_device_use_dma_api()
>> and iommu_attach_device() are mutually exclusive. Therefore, when drivers
>> decide to use iommu_attach_device(), they cannot call
>> iommu_device_use_dma_api() at the same time.
>>
>> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
>> Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com>
>>   drivers/iommu/iommu.c | 79 +++++++++++++++++++++++++++++++++----------
>>   1 file changed, 62 insertions(+), 17 deletions(-)
>>
>> diff --git a/drivers/iommu/iommu.c b/drivers/iommu/iommu.c
>> index ab8ab95969f5..2c9efd85e447 100644
>> --- a/drivers/iommu/iommu.c
>> +++ b/drivers/iommu/iommu.c
>> @@ -47,6 +47,7 @@ struct iommu_group {
>>   	struct iommu_domain *domain;
>>   	struct list_head entry;
>>   	unsigned int owner_cnt;
>> +	unsigned int attach_cnt;
> 
> Why did we suddenly need another counter? None of the prior versions
> needed this. I suppose this is being used as a flag to indicate whether
> owner_cnt == 1 or owner_cnt == 0 should restore the default domain?

Yes, exactly.

> I would rather have a flag 'auto_no_kernel_dma_api_compat' or something.

Adding a flag also works.

> 
> 
>> +/**
>> + * iommu_attach_device() - attach external or UNMANAGED domain to device
>> + * @domain: the domain about to attach
>> + * @dev: the device about to be attached
>> + *
>> + * For devices belonging to the same group, iommu_device_use_dma_api() and
>> + * iommu_attach_device() are exclusive. Therefore, when drivers decide to
>> + * use iommu_attach_device(), they cannot call iommu_device_use_dma_api()
>> + * at the same time.
>> + */
>>   int iommu_attach_device(struct iommu_domain *domain, struct device *dev)
>>   {
>>   	struct iommu_group *group;
>> +	int ret = 0;
>> +
>> +	if (domain->type != IOMMU_DOMAIN_UNMANAGED)
>> +		return -EINVAL;
>>   
>>   	group = iommu_group_get(dev);
>>   	if (!group)
>>   		return -ENODEV;
>>   
>> +	if (group->owner_cnt) {
>> +		/*
>> +		 * Group has been used for kernel-api dma or claimed explicitly
>> +		 * for exclusive occupation. For backward compatibility, device
>> +		 * in a singleton group is allowed to ignore setting the
>> +		 * drv.no_kernel_api_dma field.
> 
> BTW why is this called 'no kernel api dma'? That reads backwards; 'no
> kernel dma api', right?

Yes. Need to rephrase this wording.

> 
> Another appeal of putting no_kernel_api_dma in the struct device_driver
> is that this code could simply do 'dev->driver->no_kernel_api_dma' to
> figure out how it is being called and avoid this messy implicitness.

Yes.

> 
> Once we know our calling context we can always automatically switch from
> DMA API mode to another domain without any trouble or special
> counters:
> 
> if (!dev->driver->no_kernel_api_dma) {
>      if (group->owner_cnt > 1 || group->owner)
>          return -EBUSY;
>      return __iommu_attach_group(domain, group);
> }

Is there any lock issue when referencing dev->driver here? I guess this
requires iommu_attach_device() only being called during the driver's
lifetime (i.e., between driver .probe and .release).

> 
> if (!group->owner_cnt) {
>      ret = __iommu_attach_group(domain, group);
>      if (ret)
>          return ret;
> } else if (group->owner || group->domain != domain)
>      return -EBUSY;
> group->owner_cnt++;
> 
> Right?

Yes. It's more straightforward if there's no issue around referencing
dev->driver.

> 
>> +	if (!group->attach_cnt) {
>> +		ret = __iommu_attach_group(domain, group);
> 
> How come we don't have to detach the default domain here? Doesn't
> that mean that the iommu_replace_group could also just call attach
> directly without going through detach?

__iommu_attach_group() allows replacing the default domain with a
private domain. The corresponding __iommu_detach_group() automatically
replaces the private domain with the default domain.

The auto-switch logic should not apply to iommu_group_replace_domain(),
which is designed for components that have called iommu_set_dma_owner().
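
In other words, a simplified sketch of the existing behavior (not new
code in this patch):

	/* first iommu_attach_device() in the group */
	__iommu_attach_group(domain, group);	/* default domain -> private domain */

	/* last iommu_detach_device() in the group */
	__iommu_detach_group(domain, group);	/* private domain -> default domain */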

> 
> Jason
> 

Best regards,
baolu
Jason Gunthorpe Jan. 7, 2022, 1:19 a.m. UTC | #3
On Fri, Jan 07, 2022 at 09:14:38AM +0800, Lu Baolu wrote:

> > Once we know our calling context we can always automatically switch from
> > DMA API mode to another domain without any trouble or special
> > counters:
> > 
> > if (!dev->driver->no_kernel_api_dma) {
> >      if (group->owner_cnt > 1 || group->owner)
> >          return -EBUSY;
> >      return __iommu_attach_group(domain, group);
> > }
> 
> Is there any lock issue when referencing dev->driver here? I guess this
> requires iommu_attach_device() only being called during the driver's
> lifetime (i.e., between driver .probe and .release).

Yes, that is correct. That would need to be documented.

It is for the same reason that the routine was able to get the group
from the dev. The dev's group must be stable so long as a driver is
attached, or everything is broken :)

Much of the group refcounting code is useless for this reason. The
group simply cannot be concurrently destroyed in these contexts.

Jason
Joerg Roedel Feb. 14, 2022, 11:39 a.m. UTC | #4
On Thu, Jan 06, 2022 at 10:20:48AM +0800, Lu Baolu wrote:
>  int iommu_attach_device(struct iommu_domain *domain, struct device *dev)
>  {
>  	struct iommu_group *group;
> -	int ret;
> +	int ret = 0;
> +
> +	if (domain->type != IOMMU_DOMAIN_UNMANAGED)
> +		return -EINVAL;
>  
>  	group = iommu_group_get(dev);
>  	if (!group)
>  		return -ENODEV;
>  
> -	/*
> -	 * Lock the group to make sure the device-count doesn't
> -	 * change while we are attaching
> -	 */
>  	mutex_lock(&group->mutex);
> -	ret = -EINVAL;
> -	if (iommu_group_device_count(group) != 1)
> -		goto out_unlock;
> +	if (group->owner_cnt) {
> +		/*
> +		 * Group has been used for kernel-api dma or claimed explicitly
> +		 * for exclusive occupation. For backward compatibility, device
> +		 * in a singleton group is allowed to ignore setting the
> +		 * drv.no_kernel_api_dma field.
> +		 */
> +		if ((group->domain == group->default_domain &&
> +		     iommu_group_device_count(group) != 1) ||
> +		    group->owner) {
> +			ret = -EBUSY;
> +			goto unlock_out;
> +		}
> +	}
>  
> -	ret = __iommu_attach_group(domain, group);
> +	if (!group->attach_cnt) {
> +		ret = __iommu_attach_group(domain, group);
> +		if (ret)
> +			goto unlock_out;
> +	} else {
> +		if (group->domain != domain) {
> +			ret = -EPERM;
> +			goto unlock_out;
> +		}
> +	}
>  
> -out_unlock:
> +	group->owner_cnt++;
> +	group->attach_cnt++;
> +
> +unlock_out:
>  	mutex_unlock(&group->mutex);
>  	iommu_group_put(group);

This extends iommu_attach_device() to behave as iommu_attach_group(),
changing the domain for the whole group. Wouldn't it be better to scrap
the iommu_attach_device() interface instead and only rely on
iommu_attach_group()? This way it is clear that a call changes the whole
group.

IIUC this work is heading towards allowing multiple domains in one group
as long as the group is owned by one entity. That is a valid
requirement, but in my eyes the way to get there is:

	1) Introduce a concept of a sub-group (or whatever we want to
	   call it), which groups devices together which must be in the
	   same domain because they use the same request ID and thus
	   look all the same to the IOMMU.

	2) Keep todays IOMMU groups to group devices together which can
	   bypass the IOMMU when talking to each other, like
	   multi-function devices and devices behind a no-ACS bridge.

	3) Rework group->domain and group->default_domain, eventually
	   moving them to sub-groups.

This is an important distinction to make and also the reason the
iommu_attach/detach_device() interface will always be misleading. Item
1) in this list will also be beneficial to other parts of the iommu
code, namely iommu-dma code which can have finer-grained DMA-API domains
with sub-groups instead of groups.
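
For illustration only, a sub-group along these lines might look roughly
like this (every name here is hypothetical; nothing like it exists in
the tree today):

	struct iommu_subgroup {
		struct iommu_group *group;		/* isolation group, item 2) */
		struct list_head devices;		/* same request ID, item 1) */
		struct iommu_domain *domain;		/* per-sub-group, item 3) */
		struct iommu_domain *default_domain;	/* per-sub-group, item 3) */
	};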

Regards,

	Joerg
Jason Gunthorpe Feb. 14, 2022, 1:03 p.m. UTC | #5
On Mon, Feb 14, 2022 at 12:39:36PM +0100, Joerg Roedel wrote:

> This extends iommu_attach_device() to behave as iommu_attach_group(),
> changing the domain for the whole group. 

Of course, the only action to take is to change the domain of a
group.

> Wouldn't it be better to scrap the iommu_attach_device() interface
> instead and only rely on iommu_attach_group()? This way it is clear
> that a call changes the whole group.

From an API design perspective, drivers should never touch groups -
they have struct devices, so they should have a clean struct device
based API.

Groups should disappear into an internal implementation detail, not be
so prominent in the API.

> IIUC this work is heading towards allowing multiple domains in one group
> as long as the group is owned by one entity.

No, it isn't. This work is only about properly arbitrating which
single domain is attached to an entire group.

> 	1) Introduce a concept of a sub-group (or whatever we want to
> 	   call it), which groups devices together which must be in the
> 	   same domain because they use the same request ID and thus
> 	   look all the same to the IOMMU.
>
> 	2) Keep todays IOMMU groups to group devices together which can
> 	   bypass the IOMMU when talking to each other, like
> 	   multi-function devices and devices behind a no-ACS bridge.

We've talked about all these details before and nobody has thought
they are important enough to implement. This distinction is not the
goal of this series.

I think if someone did want to do this there is room in the API to
allow the distinction between 1 (must share) and 2 (sharing is
insecure). eg by checking owner and blocking mixing user/kernel. 

This is another reason to stick with the device-centric API: if we
did someday want multi-domain groups then the device input is still
the correct input and the iommu code can figure out what sub-groups or
whatever transparently.

Jason
Joerg Roedel Feb. 14, 2022, 2:39 p.m. UTC | #6
On Mon, Feb 14, 2022 at 09:03:13AM -0400, Jason Gunthorpe wrote:
> Groups should disappear into an internal implementation detail, not be
> so prominent in the API.

Not going to happen, IOMMU groups are ABI and today's device assignment
code, including user-space, relies on them.

Groups implement an important aspect of hardware IOMMUs that the API
cannot abstract away: that there are devices which share the same
request-id.

This is not an issue for devices concerned by iommufd, but for legacy
device assignment it is. The IOMMU-API needs to handle both in a clean
API, even if it means that drivers need to look up the sub-group of a
device first.

And I don't see how a per-device API can handle both in a device-centric
way. For sure the answer is not to make it 'device-centric but operating
on groups under the hood'.

Regards,

	Joerg
Robin Murphy Feb. 14, 2022, 3:18 p.m. UTC | #7
On 2022-02-14 14:39, Joerg Roedel wrote:
> On Mon, Feb 14, 2022 at 09:03:13AM -0400, Jason Gunthorpe wrote:
>> Groups should disappear into an internal implementation detail, not be
>> so prominent in the API.
> 
> Not going to happen, IOMMU groups are ABI and today's device assignment
> code, including user-space, relies on them.
> 
> Groups implement an important aspect of hardware IOMMUs that the API
> cannot abstract away: that there are devices which share the same
> request-id.
> 
> This is not an issue for devices concerned by iommufd, but for legacy
> device assignment it is. The IOMMU-API needs to handle both in a clean
> API, even if it means that drivers need to look up the sub-group of a
> device first.
> 
> And I don't see how a per-device API can handle both in a device-centric
> way. For sure the answer is not to make it 'device-centric but operating
> on groups under the hood'.

Arguably, iommu_attach_device() could be renamed something like 
iommu_attach_group_for_dev(), since that's effectively the semantic that 
all the existing API users want anyway (even VFIO at the high level - 
the group is the means for the user to assign their GPU/NIC/whatever 
device to their process, not the end in itself). That's just a lot more 
churn.

It's not that callers should be blind to the entire concept of groups 
altogether - they remain a significant reason why iommu_attach_device() 
might fail, for one thing - however what callers really shouldn't need 
to be bothered with is the exact *implementation* of groups. I do 
actually quite like the idea of refining the group abstraction into 
isolation groups as a superset of alias groups, but if anything that's a 
further argument for not having the guts of the current abstraction 
exposed in places that don't need to care - otherwise that would be 
liable to be a microcosm of this series in itself: widespread churn vs. 
"same name, new meaning" compromises.

Robin.
Jason Gunthorpe Feb. 14, 2022, 3:46 p.m. UTC | #8
On Mon, Feb 14, 2022 at 03:18:31PM +0000, Robin Murphy wrote:

> Arguably, iommu_attach_device() could be renamed something like
> iommu_attach_group_for_dev(), since that's effectively the semantic that all
> the existing API users want anyway (even VFIO at the high level - the group
> is the means for the user to assign their GPU/NIC/whatever device to their
> process, not the end in itself). That's just a lot more churn.

Right
 
> It's not that callers should be blind to the entire concept of groups
> altogether - they remain a significant reason why iommu_attach_device()
> might fail, for one thing - however what callers really shouldn't need to be
> bothered with is the exact *implementation* of groups. I do actually quite
> like the idea of refining the group abstraction into isolation groups as a
> superset of alias groups, but if anything that's a further argument for not
> having the guts of the current abstraction exposed in places that don't need
> to care - otherwise that would be liable to be a microcosm of this series in
> itself: widespread churn vs. "same name, new meaning" compromises.

Exactly, groups should not leak out through the abstraction more than
necessary. If the caller can't do anything with the group information
then it shouldn't touch it.

VFIO needs them because its uAPI is tied to them, but even so we keep
talking about ways to narrow the amount of group API it consumes.

We should not set the recommended/good kAPI based on VFIO's uAPI
design.

Jason
Joerg Roedel Feb. 15, 2022, 8:58 a.m. UTC | #9
On Mon, Feb 14, 2022 at 11:46:26AM -0400, Jason Gunthorpe wrote:
> On Mon, Feb 14, 2022 at 03:18:31PM +0000, Robin Murphy wrote:
> 
> > Arguably, iommu_attach_device() could be renamed something like
> > iommu_attach_group_for_dev(), since that's effectively the semantic that all
> > the existing API users want anyway (even VFIO at the high level - the group
> > is the means for the user to assign their GPU/NIC/whatever device to their
> > process, not the end in itself). That's just a lot more churn.
> 
> Right

Okay, good point. I can live with an iommu_attach_group_for_dev()
interface; it is still better than making iommu_attach_device() silently
operate on whole groups.

> VFIO needs them because its uAPI is tied to them, but even so we keep
> talking about ways to narrow the amount of group API it consumes.
> 
> We should not set the recommended/good kAPI based on VFIO's uAPI
> design.

Agree here too. The current way groups are implemented can be turned
into a VFIO-specific interface to keep its semantics and kABI. But the
IOMMU core code still needs the concept of alias groups.

Regards,

	Joerg
Jason Gunthorpe Feb. 15, 2022, 1:47 p.m. UTC | #10
On Tue, Feb 15, 2022 at 09:58:13AM +0100, Joerg Roedel wrote:
> On Mon, Feb 14, 2022 at 11:46:26AM -0400, Jason Gunthorpe wrote:
> > On Mon, Feb 14, 2022 at 03:18:31PM +0000, Robin Murphy wrote:
> > 
> > > Arguably, iommu_attach_device() could be renamed something like
> > > iommu_attach_group_for_dev(), since that's effectively the semantic that all
> > > the existing API users want anyway (even VFIO at the high level - the group
> > > is the means for the user to assign their GPU/NIC/whatever device to their
> > > process, not the end in itself). That's just a lot more churn.
> > 
> > Right
> 
> Okay, good point. I can live with an iommu_attach_group_for_dev()
> interface; it is still better than making iommu_attach_device() silently
> operate on whole groups.

I think this is what Lu's series currently does, it just doesn't do
the rename churn, as Robin noted. Lu, why not add a note to the kdoc,
as Robin explained, so it is clear this API impacts the whole group?

There is no argument that the internal operation of the iommu layer
should always be using groups - we are just presenting a simplified
API toward drivers.

Jason
Baolu Lu Feb. 16, 2022, 6:28 a.m. UTC | #11
On 2/15/22 9:47 PM, Jason Gunthorpe via iommu wrote:
> On Tue, Feb 15, 2022 at 09:58:13AM +0100, Joerg Roedel wrote:
>> On Mon, Feb 14, 2022 at 11:46:26AM -0400, Jason Gunthorpe wrote:
>>> On Mon, Feb 14, 2022 at 03:18:31PM +0000, Robin Murphy wrote:
>>>
>>>> Arguably, iommu_attach_device() could be renamed something like
>>>> iommu_attach_group_for_dev(), since that's effectively the semantic that all
>>>> the existing API users want anyway (even VFIO at the high level - the group
>>>> is the means for the user to assign their GPU/NIC/whatever device to their
>>>> process, not the end in itself). That's just a lot more churn.
>>>
>>> Right
>>
>> Okay, good point. I can live with an iommu_attach_group_for_dev()
>> interface; it is still better than making iommu_attach_device() silently
>> operate on whole groups.
> 
> I think this is what Lu's series currently does, it just doesn't do
> the rename churn, as Robin noted. Lu, why not add a note to the kdoc,
> as Robin explained, so it is clear this API impacts the whole group?

I feel that the debate here is not about the API name, but about how
iommu_attach/detach_device() should be implemented and used.

It seems everyone agrees that for device assignment (where the I/O
address space is owned by the user-space application), the
iommu_group-based APIs should always be used. Otherwise, the isolation
and protection are not guaranteed.

For kernel DMA (where the I/O address space is owned by the kernel
drivers), the device-driver-oriented interface should meet the following
expectations:

  - the concept of iommu_group should be transparent to the device
    drivers;
  - but internally, the iommu core only allows a single domain to be
    attached to a group.

If the device driver uses the default domain, the above expectations are
naturally met. But when the driver wants to attach its own domain, the
problem arises.

This series tries to use the DMA ownership mechanism to solve this. The
device drivers explicitly declare that:

  - I know that the device I am driving shares the iommu_group with
    others;
  - Other device drivers with the same awareness can only be bound to the
    devices in the shared group;
  - We can sync with each other so that only a shared domain can be
    attached to the devices in the group.
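
As a sketch of that contract from a driver's point of view (assuming the
drv.no_kernel_api_dma flag discussed earlier in this thread; the driver
name and probe callback are illustrative):

	static struct platform_driver my_driver = {
		.driver = {
			.name = "my_driver",
			/* declare: we manage the IOMMU ourselves and may
			 * share the group with like-minded drivers */
			.no_kernel_api_dma = true,
		},
		.probe = my_probe,	/* attaches the shared UNMANAGED domain */
	};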

Another proposal (as suggested by Joerg) is to introduce the concept of
a "sub-group". An iommu group could have one or multiple sub-groups, with
non-aliased devices sitting in different sub-groups and using different
domains.

The above is what I have gathered so far. If there's any
misunderstanding, please help correct it.

Best regards,
baolu
Jason Gunthorpe Feb. 16, 2022, 1:54 p.m. UTC | #12
On Wed, Feb 16, 2022 at 02:28:09PM +0800, Lu Baolu wrote:

> It seems everyone agrees that for device assignment (where the I/O
> address space is owned by the user-space application), the
> iommu_group-based APIs should always be used. Otherwise, the isolation
> and protection are not guaranteed.

This group/device split is all just driven by VFIO. There is nothing
preventing a struct device * API from being used with user-space, and
Robin has been pushing that way. With enough fixing of VFIO we can do
it.

eg the device-centric VFIO patches should be able to eventually work
entirely on an iommu device API.

> Another proposal (as suggested by Joerg) is to introduce the concept of
> a "sub-group". An iommu group could have one or multiple sub-groups, with
> non-aliased devices sitting in different sub-groups and using different
> domains.

I still don't see how sub-groups help or really change anything here.

The API already has the concept of 'ownership' separated from the
concept of 'attach a domain to a device'.

Ownership works on the ACS group and attach works on the 'same RID'
group.

The API can take in the struct device and select which internal group
to use based on which action is being done.
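
Sketched out, that device-centric dispatch could look like this (purely
illustrative; the alias/isolation lookup helpers are hypothetical):

	int iommu_attach_device(struct iommu_domain *domain, struct device *dev)
	{
		/* attach arbitrates on the 'same RID' (alias) group */
		return attach_alias_group(domain, dev_to_alias_group(dev));
	}

	int iommu_device_claim_dma_owner(struct device *dev, void *owner)
	{
		/* ownership arbitrates on the isolation (ACS) group */
		return claim_group_ownership(dev_to_isolation_group(dev), owner);
	}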

Jason

Patch

diff --git a/drivers/iommu/iommu.c b/drivers/iommu/iommu.c
index ab8ab95969f5..2c9efd85e447 100644
--- a/drivers/iommu/iommu.c
+++ b/drivers/iommu/iommu.c
@@ -47,6 +47,7 @@  struct iommu_group {
 	struct iommu_domain *domain;
 	struct list_head entry;
 	unsigned int owner_cnt;
+	unsigned int attach_cnt;
 	void *owner;
 };
 
@@ -1921,27 +1922,59 @@  static int __iommu_attach_device(struct iommu_domain *domain,
 	return ret;
 }
 
+/**
+ * iommu_attach_device() - attach external or UNMANAGED domain to device
+ * @domain: the domain about to attach
+ * @dev: the device about to be attached
+ *
+ * For devices belonging to the same group, iommu_device_use_dma_api() and
+ * iommu_attach_device() are exclusive. Therefore, when drivers decide to
+ * use iommu_attach_device(), they cannot call iommu_device_use_dma_api()
+ * at the same time.
+ */
 int iommu_attach_device(struct iommu_domain *domain, struct device *dev)
 {
 	struct iommu_group *group;
-	int ret;
+	int ret = 0;
+
+	if (domain->type != IOMMU_DOMAIN_UNMANAGED)
+		return -EINVAL;
 
 	group = iommu_group_get(dev);
 	if (!group)
 		return -ENODEV;
 
-	/*
-	 * Lock the group to make sure the device-count doesn't
-	 * change while we are attaching
-	 */
 	mutex_lock(&group->mutex);
-	ret = -EINVAL;
-	if (iommu_group_device_count(group) != 1)
-		goto out_unlock;
+	if (group->owner_cnt) {
+		/*
+		 * Group has been used for kernel-api dma or claimed explicitly
+		 * for exclusive occupation. For backward compatibility, device
+		 * in a singleton group is allowed to ignore setting the
+		 * drv.no_kernel_api_dma field.
+		 */
+		if ((group->domain == group->default_domain &&
+		     iommu_group_device_count(group) != 1) ||
+		    group->owner) {
+			ret = -EBUSY;
+			goto unlock_out;
+		}
+	}
 
-	ret = __iommu_attach_group(domain, group);
+	if (!group->attach_cnt) {
+		ret = __iommu_attach_group(domain, group);
+		if (ret)
+			goto unlock_out;
+	} else {
+		if (group->domain != domain) {
+			ret = -EPERM;
+			goto unlock_out;
+		}
+	}
 
-out_unlock:
+	group->owner_cnt++;
+	group->attach_cnt++;
+
+unlock_out:
 	mutex_unlock(&group->mutex);
 	iommu_group_put(group);
 
@@ -2182,23 +2215,35 @@  static void __iommu_detach_device(struct iommu_domain *domain,
 	trace_detach_device_from_domain(dev);
 }
 
+/**
+ * iommu_detach_device() - detach external or UNMANAGED domain from device
+ * @domain: the domain about to detach
+ * @dev: the device about to be detached
+ *
+ * Paired with iommu_attach_device(), it detaches the domain from the device.
+ */
 void iommu_detach_device(struct iommu_domain *domain, struct device *dev)
 {
 	struct iommu_group *group;
 
+	if (WARN_ON(domain->type != IOMMU_DOMAIN_UNMANAGED))
+		return;
+
 	group = iommu_group_get(dev);
-	if (!group)
+	if (WARN_ON(!group))
 		return;
 
 	mutex_lock(&group->mutex);
-	if (iommu_group_device_count(group) != 1) {
-		WARN_ON(1);
-		goto out_unlock;
-	}
+	if (WARN_ON(!group->attach_cnt || !group->owner_cnt ||
+		    group->domain != domain))
+		goto unlock_out;
 
-	__iommu_detach_group(domain, group);
+	group->attach_cnt--;
+	group->owner_cnt--;
+	if (!group->attach_cnt)
+		__iommu_detach_group(domain, group);
 
-out_unlock:
+unlock_out:
 	mutex_unlock(&group->mutex);
 	iommu_group_put(group);
 }
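
With this patch applied, the semantics for two devices sharing one group
can be sketched as follows (devA/devB are illustrative; both drivers
have opted out of the kernel DMA API):

	/* first attach replaces the group's default domain */
	ret = iommu_attach_device(domain, devA);	/* attach_cnt 0 -> 1 */

	/* further attaches of the same domain only take a reference;
	 * a different domain would be refused with -EPERM */
	ret = iommu_attach_device(domain, devB);	/* attach_cnt 1 -> 2 */

	iommu_detach_device(domain, devB);	/* attach_cnt 2 -> 1 */
	iommu_detach_device(domain, devA);	/* attach_cnt 1 -> 0; default
						 * domain restored */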