mbox series

[RFC,v1,0/6] Resolve unwanted DMA backing with IOMMU

Message ID 20180726231624.21084-1-digetx@gmail.com (mailing list archive)
Headers show
Series Resolve unwanted DMA backing with IOMMU | expand

Message

Dmitry Osipenko July 26, 2018, 11:16 p.m. UTC
Hello,

There is a trouble on ARM with DMA allocations made by device drivers,
the trouble is that DMA allocations are getting implicitly backed with
IOMMU mapping by the driver core if IOMMU presents in a system and IOMMU
could handle device. This is an undesired behaviour for drivers that
manage IOMMU by themselves, like NVIDIA Tegra GPU driver. On arm32 the
implicit backing happens if CONFIG_ARM_DMA_USE_IOMMU=y (multiplatform
kernel configuration), on arm64 it happens if IOMMU domain type for a
device is equal to IOMMU_DOMAIN_DMA.

The proposed solution adds a new option to the base device driver
structure that allows device drivers to explicitly convey to the drivers
core that the implicit IOMMU backing for devices must not happen.

Dmitry Osipenko (6):
  driver core: Add option for disabling of backing devices DMA with
    IOMMU
  of/device: Don't back devices DMA with IOMMU if that's undesired by
    driver
  drm/tegra: Avoid implicit DMA backing with IOMMU
  gpu: host1x: Avoid implicit DMA backing with IOMMU
  drm/nouveau: tegra: Universally avoid implicit DMA backing with IOMMU
  Revert "drm/nouveau: tegra: Detach from ARM DMA/IOMMU mapping"

 drivers/gpu/drm/nouveau/nouveau_platform.c         |  1 +
 drivers/gpu/drm/nouveau/nvkm/engine/device/tegra.c | 13 -------------
 drivers/gpu/drm/tegra/dc.c                         |  1 +
 drivers/gpu/drm/tegra/gr2d.c                       |  1 +
 drivers/gpu/drm/tegra/gr3d.c                       |  1 +
 drivers/gpu/drm/tegra/vic.c                        |  1 +
 drivers/gpu/host1x/dev.c                           |  1 +
 drivers/of/device.c                                |  7 +++++++
 include/linux/device.h                             |  2 ++
 9 files changed, 15 insertions(+), 13 deletions(-)

Comments

Joerg Roedel July 27, 2018, 8:25 a.m. UTC | #1
On Fri, Jul 27, 2018 at 02:16:18AM +0300, Dmitry Osipenko wrote:
> The proposed solution adds a new option to the base device driver
> structure that allows device drivers to explicitly convey to the drivers
> core that the implicit IOMMU backing for devices must not happen.

Why is IOMMU mapping a problem for the Tegra GPU driver?

If we add something like this then it should not be the choice of the
device driver, but of the user and/or the firmware.


	Joerg
Will Deacon July 27, 2018, 9:03 a.m. UTC | #2
On Fri, Jul 27, 2018 at 10:25:13AM +0200, Joerg Roedel wrote:
> On Fri, Jul 27, 2018 at 02:16:18AM +0300, Dmitry Osipenko wrote:
> > The proposed solution adds a new option to the base device driver
> > structure that allows device drivers to explicitly convey to the drivers
> > core that the implicit IOMMU backing for devices must not happen.
> 
> Why is IOMMU mapping a problem for the Tegra GPU driver?
> 
> If we add something like this then it should not be the choice of the
> device driver, but of the user and/or the firmware.

Agreed, and it would still need somebody to configure an identity domain so
that transactions aren't aborted immediately. We currently allow the
identity domain to be used by default via a command-line option, so I guess
we'd need a way for firmware to request that on a per-device basis.

Will
Dmitry Osipenko July 27, 2018, 2:10 p.m. UTC | #3
On Friday, 27 July 2018 12:03:28 MSK Will Deacon wrote:
> On Fri, Jul 27, 2018 at 10:25:13AM +0200, Joerg Roedel wrote:
> > On Fri, Jul 27, 2018 at 02:16:18AM +0300, Dmitry Osipenko wrote:
> > > The proposed solution adds a new option to the base device driver
> > > structure that allows device drivers to explicitly convey to the drivers
> > > core that the implicit IOMMU backing for devices must not happen.
> > 
> > Why is IOMMU mapping a problem for the Tegra GPU driver?
> > 
> > If we add something like this then it should not be the choice of the
> > device driver, but of the user and/or the firmware.
> 
> Agreed, and it would still need somebody to configure an identity domain so
> that transactions aren't aborted immediately. We currently allow the
> identity domain to be used by default via a command-line option, so I guess
> we'd need a way for firmware to request that on a per-device basis.

The IOMMU mapping itself is not a problem, the problem is the management of 
the IOMMU. For Tegra we don't want anything to intrude into the IOMMU 
activities because:

1) GPU HW require additional configuration for the IOMMU usage and dumb 
mapping of the allocations simply doesn't work.

2) Older Tegra generations have a limited resource and capabilities in regards 
to IOMMU usage, allocating IOMMU domain per-device is just impossible for  
example.

3) HW performs context switches and so particular allocations have to be 
assigned to a particular contexts IOMMU domain.

Some of the above is due to a SW driver model (and its work-in-progress 
status), other is due to a HW model. So essentially we need a way for a driver 
to tell the core not to mess with IOMMU stuff of drivers device behind the 
drivers back.

I'm not sure what you guys are meaning by the "firmware", could you elaborate 
please? Do you mean the Open Firmware and hence the devicetree or what?
Joerg Roedel July 27, 2018, 2:20 p.m. UTC | #4
On Fri, Jul 27, 2018 at 05:10:22PM +0300, Dmitry Osipenko wrote:
> I'm not sure what you guys are meaning by the "firmware", could you elaborate 
> please? Do you mean the Open Firmware and hence the devicetree or what?

Yes, I think the best way to request this is using a device-tree
property. Letting the device driver request it is definitly a bad idea.


	Joerg
Robin Murphy July 27, 2018, 4:02 p.m. UTC | #5
On 27/07/18 15:10, Dmitry Osipenko wrote:
> On Friday, 27 July 2018 12:03:28 MSK Will Deacon wrote:
>> On Fri, Jul 27, 2018 at 10:25:13AM +0200, Joerg Roedel wrote:
>>> On Fri, Jul 27, 2018 at 02:16:18AM +0300, Dmitry Osipenko wrote:
>>>> The proposed solution adds a new option to the base device driver
>>>> structure that allows device drivers to explicitly convey to the drivers
>>>> core that the implicit IOMMU backing for devices must not happen.
>>>
>>> Why is IOMMU mapping a problem for the Tegra GPU driver?
>>>
>>> If we add something like this then it should not be the choice of the
>>> device driver, but of the user and/or the firmware.
>>
>> Agreed, and it would still need somebody to configure an identity domain so
>> that transactions aren't aborted immediately. We currently allow the
>> identity domain to be used by default via a command-line option, so I guess
>> we'd need a way for firmware to request that on a per-device basis.
> 
> The IOMMU mapping itself is not a problem, the problem is the management of
> the IOMMU. For Tegra we don't want anything to intrude into the IOMMU
> activities because:
> 
> 1) GPU HW require additional configuration for the IOMMU usage and dumb
> mapping of the allocations simply doesn't work.

Generally, that's already handled by the DRM drivers allocating their 
own unmanaged domains. The only problem we really need to solve in that 
regard is that currently the device DMA ops don't get updated when 
moving away from the managed domain. That's been OK for the VFIO case 
where the device is bound to a different driver which we know won't make 
any explicit DMA API calls, but for the more general case of IOMMU-aware 
drivers we could certainly do with a bit of cooperation between the 
IOMMU API, DMA API, and arch code to update the DMA ops dynamically to 
cope with intermediate subsystems making DMA API calls on behalf of 
devices they don't know the intimate details of.

> 2) Older Tegra generations have a limited resource and capabilities in regards
> to IOMMU usage, allocating IOMMU domain per-device is just impossible for
> example.
> 
> 3) HW performs context switches and so particular allocations have to be
> assigned to a particular contexts IOMMU domain.

I understand Qualcomm SoCs have a similar thing too, and AFAICS that 
case just doesn't fit into the current API model at all. We need the 
IOMMU driver to somehow know about the specific details of which devices 
have magic associations with specific contexts, and we almost certainly 
need a more expressive interface than iommu_domain_alloc() to have any 
hope of reliable results.

Robin.

> Some of the above is due to a SW driver model (and its work-in-progress
> status), other is due to a HW model. So essentially we need a way for a driver
> to tell the core not to mess with IOMMU stuff of drivers device behind the
> drivers back.
> 
> I'm not sure what you guys are meaning by the "firmware", could you elaborate
> please? Do you mean the Open Firmware and hence the devicetree or what?
> 
> 
> 
> _______________________________________________
> iommu mailing list
> iommu@lists.linux-foundation.org
> https://lists.linuxfoundation.org/mailman/listinfo/iommu
>
Jordan Crouse July 27, 2018, 5:03 p.m. UTC | #6
On Fri, Jul 27, 2018 at 05:02:37PM +0100, Robin Murphy wrote:
> On 27/07/18 15:10, Dmitry Osipenko wrote:
> >On Friday, 27 July 2018 12:03:28 MSK Will Deacon wrote:
> >>On Fri, Jul 27, 2018 at 10:25:13AM +0200, Joerg Roedel wrote:
> >>>On Fri, Jul 27, 2018 at 02:16:18AM +0300, Dmitry Osipenko wrote:
> >>>>The proposed solution adds a new option to the base device driver
> >>>>structure that allows device drivers to explicitly convey to the drivers
> >>>>core that the implicit IOMMU backing for devices must not happen.
> >>>
> >>>Why is IOMMU mapping a problem for the Tegra GPU driver?
> >>>
> >>>If we add something like this then it should not be the choice of the
> >>>device driver, but of the user and/or the firmware.
> >>
> >>Agreed, and it would still need somebody to configure an identity domain so
> >>that transactions aren't aborted immediately. We currently allow the
> >>identity domain to be used by default via a command-line option, so I guess
> >>we'd need a way for firmware to request that on a per-device basis.
> >
> >The IOMMU mapping itself is not a problem, the problem is the management of
> >the IOMMU. For Tegra we don't want anything to intrude into the IOMMU
> >activities because:
> >
> >1) GPU HW require additional configuration for the IOMMU usage and dumb
> >mapping of the allocations simply doesn't work.
> 
> Generally, that's already handled by the DRM drivers allocating
> their own unmanaged domains. The only problem we really need to
> solve in that regard is that currently the device DMA ops don't get
> updated when moving away from the managed domain. That's been OK for
> the VFIO case where the device is bound to a different driver which
> we know won't make any explicit DMA API calls, but for the more
> general case of IOMMU-aware drivers we could certainly do with a bit
> of cooperation between the IOMMU API, DMA API, and arch code to
> update the DMA ops dynamically to cope with intermediate subsystems
> making DMA API calls on behalf of devices they don't know the
> intimate details of.
> 
> >2) Older Tegra generations have a limited resource and capabilities in regards
> >to IOMMU usage, allocating IOMMU domain per-device is just impossible for
> >example.
> >
> >3) HW performs context switches and so particular allocations have to be
> >assigned to a particular contexts IOMMU domain.
> 
> I understand Qualcomm SoCs have a similar thing too, and AFAICS that
> case just doesn't fit into the current API model at all. We need the
> IOMMU driver to somehow know about the specific details of which
> devices have magic associations with specific contexts, and we
> almost certainly need a more expressive interface than
> iommu_domain_alloc() to have any hope of reliable results.

This is correct for Qualcomm GPUs - The GPU hardware context switching 
requires a specific context and there are some restrictions around
secure contexts as well.

We don't really care if the DMA attaches to a context just as long as it
doesn't attach to the one(s) we care about. Perhaps a "valid context" mask would
work in from the DT or the device struct to give the subsystems a clue as to
which domains they were allowed to use. I recognize that there isn't a
one-size-fits-all solution to this problem so I'm open to different ideas.

Jordan
Rob Herring July 27, 2018, 5:13 p.m. UTC | #7
On Fri, Jul 27, 2018 at 8:20 AM Joerg Roedel <joro@8bytes.org> wrote:
>
> On Fri, Jul 27, 2018 at 05:10:22PM +0300, Dmitry Osipenko wrote:
> > I'm not sure what you guys are meaning by the "firmware", could you elaborate
> > please? Do you mean the Open Firmware and hence the devicetree or what?
>
> Yes, I think the best way to request this is using a device-tree
> property. Letting the device driver request it is definitly a bad idea.

I don't follow why we need a property rather than being implied by the
device's (the GPU) compatible string.

Rob
Dmitry Osipenko July 27, 2018, 5:16 p.m. UTC | #8
On Friday, 27 July 2018 20:03:26 MSK Jordan Crouse wrote:
> On Fri, Jul 27, 2018 at 05:02:37PM +0100, Robin Murphy wrote:
> > On 27/07/18 15:10, Dmitry Osipenko wrote:
> > >On Friday, 27 July 2018 12:03:28 MSK Will Deacon wrote:
> > >>On Fri, Jul 27, 2018 at 10:25:13AM +0200, Joerg Roedel wrote:
> > >>>On Fri, Jul 27, 2018 at 02:16:18AM +0300, Dmitry Osipenko wrote:
> > >>>>The proposed solution adds a new option to the base device driver
> > >>>>structure that allows device drivers to explicitly convey to the
> > >>>>drivers
> > >>>>core that the implicit IOMMU backing for devices must not happen.
> > >>>
> > >>>Why is IOMMU mapping a problem for the Tegra GPU driver?
> > >>>
> > >>>If we add something like this then it should not be the choice of the
> > >>>device driver, but of the user and/or the firmware.
> > >>
> > >>Agreed, and it would still need somebody to configure an identity domain
> > >>so
> > >>that transactions aren't aborted immediately. We currently allow the
> > >>identity domain to be used by default via a command-line option, so I
> > >>guess
> > >>we'd need a way for firmware to request that on a per-device basis.
> > >
> > >The IOMMU mapping itself is not a problem, the problem is the management
> > >of
> > >the IOMMU. For Tegra we don't want anything to intrude into the IOMMU
> > >activities because:
> > >
> > >1) GPU HW require additional configuration for the IOMMU usage and dumb
> > >mapping of the allocations simply doesn't work.
> > 
> > Generally, that's already handled by the DRM drivers allocating
> > their own unmanaged domains. The only problem we really need to
> > solve in that regard is that currently the device DMA ops don't get
> > updated when moving away from the managed domain. That's been OK for
> > the VFIO case where the device is bound to a different driver which
> > we know won't make any explicit DMA API calls, but for the more
> > general case of IOMMU-aware drivers we could certainly do with a bit
> > of cooperation between the IOMMU API, DMA API, and arch code to
> > update the DMA ops dynamically to cope with intermediate subsystems
> > making DMA API calls on behalf of devices they don't know the
> > intimate details of.
> > 
> > >2) Older Tegra generations have a limited resource and capabilities in
> > >regards to IOMMU usage, allocating IOMMU domain per-device is just
> > >impossible for example.
> > >
> > >3) HW performs context switches and so particular allocations have to be
> > >assigned to a particular contexts IOMMU domain.
> > 
> > I understand Qualcomm SoCs have a similar thing too, and AFAICS that
> > case just doesn't fit into the current API model at all. We need the
> > IOMMU driver to somehow know about the specific details of which
> > devices have magic associations with specific contexts, and we
> > almost certainly need a more expressive interface than
> > iommu_domain_alloc() to have any hope of reliable results.
> 
> This is correct for Qualcomm GPUs - The GPU hardware context switching
> requires a specific context and there are some restrictions around
> secure contexts as well.
> 
> We don't really care if the DMA attaches to a context just as long as it
> doesn't attach to the one(s) we care about. Perhaps a "valid context" mask
> would work in from the DT or the device struct to give the subsystems a
> clue as to which domains they were allowed to use. I recognize that there
> isn't a one-size-fits-all solution to this problem so I'm open to different
> ideas.

Designating whether implicit IOMMU backing is appropriate for a device via 
device-tree property sounds a bit awkward because that will be a kinda 
software description (of a custom Linux driver model), while device-tree is 
supposed to describe HW.

What about to grant IOMMU drivers with ability to decide whether the implicit 
backing for a device is appropriate? Like this:

bool implicit_iommu_for_dma_is_allowed(struct device *dev)
{
	const struct iommu_ops *ops = dev->bus->iommu_ops;
	struct iommu_group *group;

	group = iommu_group_get(dev);
	if (!group)
		return NULL;

	iommu_group_put(group);

	if (!ops->implicit_iommu_for_dma_is_allowed)
		return true;

	return ops->implicit_iommu_for_dma_is_allowed(dev);
}

Then arch_setup_dma_ops() could have a clue whether implicit IOMMU backing for 
a device is appropriate.
Joerg Roedel July 27, 2018, 6:31 p.m. UTC | #9
On Fri, Jul 27, 2018 at 11:13:31AM -0600, Rob Herring wrote:
> I don't follow why we need a property rather than being implied by the
> device's (the GPU) compatible string.

There might be devices where either setup works, with or without IOMMU
translation, and the firmware can set the property depending on whether
the user wants more performance or more security.

If we have a whitelist in the kernel this gets more complicated, we
probably need additional kernel-parameters to overwrite those whitelist
entries. Having a property in the device-tree seems to be a better way
here, imho.


	Joerg
Dmitry Osipenko July 27, 2018, 8:02 p.m. UTC | #10
On Friday, 27 July 2018 21:31:34 MSK Joerg Roedel wrote:
> On Fri, Jul 27, 2018 at 11:13:31AM -0600, Rob Herring wrote:
> > I don't follow why we need a property rather than being implied by the
> > device's (the GPU) compatible string.
> 
> There might be devices where either setup works, with or without IOMMU
> translation, and the firmware can set the property depending on whether
> the user wants more performance or more security.
> 
> If we have a whitelist in the kernel this gets more complicated, we
> probably need additional kernel-parameters to overwrite those whitelist
> entries. Having a property in the device-tree seems to be a better way
> here, imho.

IIUC, device-tree should be considered to be "written in stone" for a consumer 
device and hence firmware property isn't something that could be easily 
changed. The kernel-parameter will be much more universal. Anyway the global 
whitelisting should be a different topic for discussion, right now we need a 
kind of private whitelisting that is internal to kernel.
Dmitry Osipenko July 28, 2018, 11:40 a.m. UTC | #11
On Friday, 27 July 2018 20:16:53 MSK Dmitry Osipenko wrote:
> On Friday, 27 July 2018 20:03:26 MSK Jordan Crouse wrote:
> > On Fri, Jul 27, 2018 at 05:02:37PM +0100, Robin Murphy wrote:
> > > On 27/07/18 15:10, Dmitry Osipenko wrote:
> > > >On Friday, 27 July 2018 12:03:28 MSK Will Deacon wrote:
> > > >>On Fri, Jul 27, 2018 at 10:25:13AM +0200, Joerg Roedel wrote:
> > > >>>On Fri, Jul 27, 2018 at 02:16:18AM +0300, Dmitry Osipenko wrote:
> > > >>>>The proposed solution adds a new option to the base device driver
> > > >>>>structure that allows device drivers to explicitly convey to the
> > > >>>>drivers
> > > >>>>core that the implicit IOMMU backing for devices must not happen.
> > > >>>
> > > >>>Why is IOMMU mapping a problem for the Tegra GPU driver?
> > > >>>
> > > >>>If we add something like this then it should not be the choice of the
> > > >>>device driver, but of the user and/or the firmware.
> > > >>
> > > >>Agreed, and it would still need somebody to configure an identity
> > > >>domain
> > > >>so
> > > >>that transactions aren't aborted immediately. We currently allow the
> > > >>identity domain to be used by default via a command-line option, so I
> > > >>guess
> > > >>we'd need a way for firmware to request that on a per-device basis.
> > > >
> > > >The IOMMU mapping itself is not a problem, the problem is the
> > > >management
> > > >of
> > > >the IOMMU. For Tegra we don't want anything to intrude into the IOMMU
> > > >activities because:
> > > >
> > > >1) GPU HW require additional configuration for the IOMMU usage and dumb
> > > >mapping of the allocations simply doesn't work.
> > > 
> > > Generally, that's already handled by the DRM drivers allocating
> > > their own unmanaged domains. The only problem we really need to
> > > solve in that regard is that currently the device DMA ops don't get
> > > updated when moving away from the managed domain. That's been OK for
> > > the VFIO case where the device is bound to a different driver which
> > > we know won't make any explicit DMA API calls, but for the more
> > > general case of IOMMU-aware drivers we could certainly do with a bit
> > > of cooperation between the IOMMU API, DMA API, and arch code to
> > > update the DMA ops dynamically to cope with intermediate subsystems
> > > making DMA API calls on behalf of devices they don't know the
> > > intimate details of.
> > > 
> > > >2) Older Tegra generations have a limited resource and capabilities in
> > > >regards to IOMMU usage, allocating IOMMU domain per-device is just
> > > >impossible for example.
> > > >
> > > >3) HW performs context switches and so particular allocations have to
> > > >be
> > > >assigned to a particular contexts IOMMU domain.
> > > 
> > > I understand Qualcomm SoCs have a similar thing too, and AFAICS that
> > > case just doesn't fit into the current API model at all. We need the
> > > IOMMU driver to somehow know about the specific details of which
> > > devices have magic associations with specific contexts, and we
> > > almost certainly need a more expressive interface than
> > > iommu_domain_alloc() to have any hope of reliable results.
> > 
> > This is correct for Qualcomm GPUs - The GPU hardware context switching
> > requires a specific context and there are some restrictions around
> > secure contexts as well.
> > 
> > We don't really care if the DMA attaches to a context just as long as it
> > doesn't attach to the one(s) we care about. Perhaps a "valid context" mask
> > would work in from the DT or the device struct to give the subsystems a
> > clue as to which domains they were allowed to use. I recognize that there
> > isn't a one-size-fits-all solution to this problem so I'm open to
> > different
> > ideas.
> 
> Designating whether implicit IOMMU backing is appropriate for a device via
[snip]

[I've noticed that this should've been an answer to different message in the 
thread..]

As of the domain type, I don't think that any domain requirements should be 
defined via the DT, that should be purely internal to the kernel drivers. 
Maybe iommu_domain_alloc() could get an additional argument, like some 
platform-specific domain descriptor.

Anyway a custom IOMMU domain type requirements should be a different topic for 
discussion, at least for now (AFAIK) there is no need for that on Tegra and I 
can't suggest anything really constructive about it. Though maybe Mikko could 
give a comment from the Tegra perspective about whether custom domain type 
requirements could be ever needed on Tegra, what those requirements are and 
how it could be implemented.
Dmitry Osipenko Aug. 2, 2018, 6:24 p.m. UTC | #12
On Friday, 27 July 2018 20:16:53 MSK Dmitry Osipenko wrote:
> On Friday, 27 July 2018 20:03:26 MSK Jordan Crouse wrote:
> > On Fri, Jul 27, 2018 at 05:02:37PM +0100, Robin Murphy wrote:
> > > On 27/07/18 15:10, Dmitry Osipenko wrote:
> > > >On Friday, 27 July 2018 12:03:28 MSK Will Deacon wrote:
> > > >>On Fri, Jul 27, 2018 at 10:25:13AM +0200, Joerg Roedel wrote:
> > > >>>On Fri, Jul 27, 2018 at 02:16:18AM +0300, Dmitry Osipenko wrote:
> > > >>>>The proposed solution adds a new option to the base device driver
> > > >>>>structure that allows device drivers to explicitly convey to the
> > > >>>>drivers
> > > >>>>core that the implicit IOMMU backing for devices must not happen.
> > > >>>
> > > >>>Why is IOMMU mapping a problem for the Tegra GPU driver?
> > > >>>
> > > >>>If we add something like this then it should not be the choice of the
> > > >>>device driver, but of the user and/or the firmware.
> > > >>
> > > >>Agreed, and it would still need somebody to configure an identity
> > > >>domain
> > > >>so
> > > >>that transactions aren't aborted immediately. We currently allow the
> > > >>identity domain to be used by default via a command-line option, so I
> > > >>guess
> > > >>we'd need a way for firmware to request that on a per-device basis.
> > > >
> > > >The IOMMU mapping itself is not a problem, the problem is the
> > > >management
> > > >of
> > > >the IOMMU. For Tegra we don't want anything to intrude into the IOMMU
> > > >activities because:
> > > >
> > > >1) GPU HW require additional configuration for the IOMMU usage and dumb
> > > >mapping of the allocations simply doesn't work.
> > > 
> > > Generally, that's already handled by the DRM drivers allocating
> > > their own unmanaged domains. The only problem we really need to
> > > solve in that regard is that currently the device DMA ops don't get
> > > updated when moving away from the managed domain. That's been OK for
> > > the VFIO case where the device is bound to a different driver which
> > > we know won't make any explicit DMA API calls, but for the more
> > > general case of IOMMU-aware drivers we could certainly do with a bit
> > > of cooperation between the IOMMU API, DMA API, and arch code to
> > > update the DMA ops dynamically to cope with intermediate subsystems
> > > making DMA API calls on behalf of devices they don't know the
> > > intimate details of.
> > > 
> > > >2) Older Tegra generations have a limited resource and capabilities in
> > > >regards to IOMMU usage, allocating IOMMU domain per-device is just
> > > >impossible for example.
> > > >
> > > >3) HW performs context switches and so particular allocations have to
> > > >be
> > > >assigned to a particular contexts IOMMU domain.
> > > 
> > > I understand Qualcomm SoCs have a similar thing too, and AFAICS that
> > > case just doesn't fit into the current API model at all. We need the
> > > IOMMU driver to somehow know about the specific details of which
> > > devices have magic associations with specific contexts, and we
> > > almost certainly need a more expressive interface than
> > > iommu_domain_alloc() to have any hope of reliable results.
> > 
> > This is correct for Qualcomm GPUs - The GPU hardware context switching
> > requires a specific context and there are some restrictions around
> > secure contexts as well.
> > 
> > We don't really care if the DMA attaches to a context just as long as it
> > doesn't attach to the one(s) we care about. Perhaps a "valid context" mask
> > would work in from the DT or the device struct to give the subsystems a
> > clue as to which domains they were allowed to use. I recognize that there
> > isn't a one-size-fits-all solution to this problem so I'm open to
> > different
> > ideas.
> 
> Designating whether implicit IOMMU backing is appropriate for a device via
> device-tree property sounds a bit awkward because that will be a kinda
> software description (of a custom Linux driver model), while device-tree is
> supposed to describe HW.
> 
> What about to grant IOMMU drivers with ability to decide whether the
> implicit backing for a device is appropriate? Like this:
> 
> bool implicit_iommu_for_dma_is_allowed(struct device *dev)
> {
> 	const struct iommu_ops *ops = dev->bus->iommu_ops;
> 	struct iommu_group *group;
> 
> 	group = iommu_group_get(dev);
> 	if (!group)
> 		return NULL;
> 
> 	iommu_group_put(group);
> 
> 	if (!ops->implicit_iommu_for_dma_is_allowed)
> 		return true;
> 
> 	return ops->implicit_iommu_for_dma_is_allowed(dev);
> }
> 
> Then arch_setup_dma_ops() could have a clue whether implicit IOMMU backing
> for a device is appropriate.

Guys, does it sound good to you or maybe you have something else on your mind? 
Even if it's not an ideal solution, it fixes the immediate problem and should 
be good enough for the starter.
Robin Murphy Aug. 3, 2018, 3:43 p.m. UTC | #13
On 02/08/18 19:24, Dmitry Osipenko wrote:
> On Friday, 27 July 2018 20:16:53 MSK Dmitry Osipenko wrote:
>> On Friday, 27 July 2018 20:03:26 MSK Jordan Crouse wrote:
>>> On Fri, Jul 27, 2018 at 05:02:37PM +0100, Robin Murphy wrote:
>>>> On 27/07/18 15:10, Dmitry Osipenko wrote:
>>>>> On Friday, 27 July 2018 12:03:28 MSK Will Deacon wrote:
>>>>>> On Fri, Jul 27, 2018 at 10:25:13AM +0200, Joerg Roedel wrote:
>>>>>>> On Fri, Jul 27, 2018 at 02:16:18AM +0300, Dmitry Osipenko wrote:
>>>>>>>> The proposed solution adds a new option to the base device driver
>>>>>>>> structure that allows device drivers to explicitly convey to the
>>>>>>>> drivers
>>>>>>>> core that the implicit IOMMU backing for devices must not happen.
>>>>>>>
>>>>>>> Why is IOMMU mapping a problem for the Tegra GPU driver?
>>>>>>>
>>>>>>> If we add something like this then it should not be the choice of the
>>>>>>> device driver, but of the user and/or the firmware.
>>>>>>
>>>>>> Agreed, and it would still need somebody to configure an identity
>>>>>> domain
>>>>>> so
>>>>>> that transactions aren't aborted immediately. We currently allow the
>>>>>> identity domain to be used by default via a command-line option, so I
>>>>>> guess
>>>>>> we'd need a way for firmware to request that on a per-device basis.
>>>>>
>>>>> The IOMMU mapping itself is not a problem, the problem is the
>>>>> management
>>>>> of
>>>>> the IOMMU. For Tegra we don't want anything to intrude into the IOMMU
>>>>> activities because:
>>>>>
>>>>> 1) GPU HW require additional configuration for the IOMMU usage and dumb
>>>>> mapping of the allocations simply doesn't work.
>>>>
>>>> Generally, that's already handled by the DRM drivers allocating
>>>> their own unmanaged domains. The only problem we really need to
>>>> solve in that regard is that currently the device DMA ops don't get
>>>> updated when moving away from the managed domain. That's been OK for
>>>> the VFIO case where the device is bound to a different driver which
>>>> we know won't make any explicit DMA API calls, but for the more
>>>> general case of IOMMU-aware drivers we could certainly do with a bit
>>>> of cooperation between the IOMMU API, DMA API, and arch code to
>>>> update the DMA ops dynamically to cope with intermediate subsystems
>>>> making DMA API calls on behalf of devices they don't know the
>>>> intimate details of.
>>>>
>>>>> 2) Older Tegra generations have a limited resource and capabilities in
>>>>> regards to IOMMU usage, allocating IOMMU domain per-device is just
>>>>> impossible for example.
>>>>>
>>>>> 3) HW performs context switches and so particular allocations have to
>>>>> be
>>>>> assigned to a particular contexts IOMMU domain.
>>>>
>>>> I understand Qualcomm SoCs have a similar thing too, and AFAICS that
>>>> case just doesn't fit into the current API model at all. We need the
>>>> IOMMU driver to somehow know about the specific details of which
>>>> devices have magic associations with specific contexts, and we
>>>> almost certainly need a more expressive interface than
>>>> iommu_domain_alloc() to have any hope of reliable results.
>>>
>>> This is correct for Qualcomm GPUs - The GPU hardware context switching
>>> requires a specific context and there are some restrictions around
>>> secure contexts as well.
>>>
>>> We don't really care if the DMA attaches to a context just as long as it
>>> doesn't attach to the one(s) we care about. Perhaps a "valid context" mask
>>> would work in from the DT or the device struct to give the subsystems a
>>> clue as to which domains they were allowed to use. I recognize that there
>>> isn't a one-size-fits-all solution to this problem so I'm open to
>>> different
>>> ideas.
>>
>> Designating whether implicit IOMMU backing is appropriate for a device via
>> device-tree property sounds a bit awkward because that will be a kinda
>> software description (of a custom Linux driver model), while device-tree is
>> supposed to describe HW.
>>
>> What about to grant IOMMU drivers with ability to decide whether the
>> implicit backing for a device is appropriate? Like this:
>>
>> bool implicit_iommu_for_dma_is_allowed(struct device *dev)
>> {
>> 	const struct iommu_ops *ops = dev->bus->iommu_ops;
>> 	struct iommu_group *group;
>>
>> 	group = iommu_group_get(dev);
>> 	if (!group)
>> 		return NULL;
>>
>> 	iommu_group_put(group);
>>
>> 	if (!ops->implicit_iommu_for_dma_is_allowed)
>> 		return true;
>>
>> 	return ops->implicit_iommu_for_dma_is_allowed(dev);
>> }
>>
>> Then arch_setup_dma_ops() could have a clue whether implicit IOMMU backing
>> for a device is appropriate.
> 
> Guys, does it sound good to you or maybe you have something else on your mind?
> Even if it's not an ideal solution, it fixes the immediate problem and should
> be good enough for the starter.

To me that looks like a step ion the wrong direction that won't help at 
all in actually addressing the underlying issues.

If the GPU driver wants to explicitly control IOMMU mappings instead of 
relying on the IOMMU_DOMAIN_DMA abstraction, then it should use its own 
unmanaged domain. At that point it shouldn't matter if a DMA ops domain 
was allocated, since the GPU device will no longer be attached to it. 
Yes, there may be some improvements to make like having unused domains 
not consume hardware contexts, but that's internal to the relevant IOMMU 
drivers. If moving in and out of DMA ops domains leaves the actual 
dma_ops broken, that's already a problem between the IOMMU API and the 
arch DMA code as I've mentioned before.

Furthermore, given what the example above is trying to do, 
arch_setup_dma_ops() is way too late to do it - the default domain was 
already set up in iommu_group_get_for_dev() when the IOMMU driver first 
saw that device. An "opt-out" mechanism that doesn't actually opt out 
and just bodges around being opted-in after the fact doesn't strike me 
as something which can grow to be robust and maintainable.

For the case where  a device has some special hardware relationship with 
a particular IOMMU context, the IOMMU driver *has* to be completely 
aware of that, i.e. it needs to be described in DT/ACPI, either via some 
explicit binding or at least inferred from some SoC/instance-specific 
IOMMU compatible. Then the IOMMU driver needs to know when the driver 
for that device is requesting its special domain so that it provide the 
correct context (and *not* allocate that context for other uses). 
Anything which just relies on the order in which things currently happen 
to be allocated is far too fragile long-term.

Robin.
Jordan Crouse Aug. 3, 2018, 5 p.m. UTC | #14
On Fri, Aug 03, 2018 at 04:43:41PM +0100, Robin Murphy wrote:
> On 02/08/18 19:24, Dmitry Osipenko wrote:
> >On Friday, 27 July 2018 20:16:53 MSK Dmitry Osipenko wrote:
> >>On Friday, 27 July 2018 20:03:26 MSK Jordan Crouse wrote:
> >>>On Fri, Jul 27, 2018 at 05:02:37PM +0100, Robin Murphy wrote:
> >>>>On 27/07/18 15:10, Dmitry Osipenko wrote:
> >>>>>On Friday, 27 July 2018 12:03:28 MSK Will Deacon wrote:
> >>>>>>On Fri, Jul 27, 2018 at 10:25:13AM +0200, Joerg Roedel wrote:
> >>>>>>>On Fri, Jul 27, 2018 at 02:16:18AM +0300, Dmitry Osipenko wrote:
> >>>>>>>>The proposed solution adds a new option to the base device driver
> >>>>>>>>structure that allows device drivers to explicitly convey to the
> >>>>>>>>drivers
> >>>>>>>>core that the implicit IOMMU backing for devices must not happen.
> >>>>>>>
> >>>>>>>Why is IOMMU mapping a problem for the Tegra GPU driver?
> >>>>>>>
> >>>>>>>If we add something like this then it should not be the choice of the
> >>>>>>>device driver, but of the user and/or the firmware.
> >>>>>>
> >>>>>>Agreed, and it would still need somebody to configure an identity
> >>>>>>domain
> >>>>>>so
> >>>>>>that transactions aren't aborted immediately. We currently allow the
> >>>>>>identity domain to be used by default via a command-line option, so I
> >>>>>>guess
> >>>>>>we'd need a way for firmware to request that on a per-device basis.
> >>>>>
> >>>>>The IOMMU mapping itself is not a problem, the problem is the
> >>>>>management
> >>>>>of
> >>>>>the IOMMU. For Tegra we don't want anything to intrude into the IOMMU
> >>>>>activities because:
> >>>>>
> >>>>>1) GPU HW require additional configuration for the IOMMU usage and dumb
> >>>>>mapping of the allocations simply doesn't work.
> >>>>
> >>>>Generally, that's already handled by the DRM drivers allocating
> >>>>their own unmanaged domains. The only problem we really need to
> >>>>solve in that regard is that currently the device DMA ops don't get
> >>>>updated when moving away from the managed domain. That's been OK for
> >>>>the VFIO case where the device is bound to a different driver which
> >>>>we know won't make any explicit DMA API calls, but for the more
> >>>>general case of IOMMU-aware drivers we could certainly do with a bit
> >>>>of cooperation between the IOMMU API, DMA API, and arch code to
> >>>>update the DMA ops dynamically to cope with intermediate subsystems
> >>>>making DMA API calls on behalf of devices they don't know the
> >>>>intimate details of.
> >>>>
> >>>>>2) Older Tegra generations have a limited resource and capabilities in
> >>>>>regards to IOMMU usage, allocating IOMMU domain per-device is just
> >>>>>impossible for example.
> >>>>>
> >>>>>3) HW performs context switches and so particular allocations have to
> >>>>>be
> >>>>>assigned to a particular contexts IOMMU domain.
> >>>>
> >>>>I understand Qualcomm SoCs have a similar thing too, and AFAICS that
> >>>>case just doesn't fit into the current API model at all. We need the
> >>>>IOMMU driver to somehow know about the specific details of which
> >>>>devices have magic associations with specific contexts, and we
> >>>>almost certainly need a more expressive interface than
> >>>>iommu_domain_alloc() to have any hope of reliable results.
> >>>
> >>>This is correct for Qualcomm GPUs - The GPU hardware context switching
> >>>requires a specific context and there are some restrictions around
> >>>secure contexts as well.
> >>>
> >>>We don't really care if the DMA attaches to a context just as long as it
> >>>doesn't attach to the one(s) we care about. Perhaps a "valid context" mask
> >>>would work in from the DT or the device struct to give the subsystems a
> >>>clue as to which domains they were allowed to use. I recognize that there
> >>>isn't a one-size-fits-all solution to this problem so I'm open to
> >>>different
> >>>ideas.
> >>
> >>Designating whether implicit IOMMU backing is appropriate for a device via
> >>device-tree property sounds a bit awkward because that will be a kinda
> >>software description (of a custom Linux driver model), while device-tree is
> >>supposed to describe HW.
> >>
> >>What about to grant IOMMU drivers with ability to decide whether the
> >>implicit backing for a device is appropriate? Like this:
> >>
> >>bool implicit_iommu_for_dma_is_allowed(struct device *dev)
> >>{
> >>	const struct iommu_ops *ops = dev->bus->iommu_ops;
> >>	struct iommu_group *group;
> >>
> >>	group = iommu_group_get(dev);
> >>	if (!group)
> >>		return NULL;
> >>
> >>	iommu_group_put(group);
> >>
> >>	if (!ops->implicit_iommu_for_dma_is_allowed)
> >>		return true;
> >>
> >>	return ops->implicit_iommu_for_dma_is_allowed(dev);
> >>}
> >>
> >>Then arch_setup_dma_ops() could have a clue whether implicit IOMMU backing
> >>for a device is appropriate.
> >
> >Guys, does it sound good to you or maybe you have something else on your mind?
> >Even if it's not an ideal solution, it fixes the immediate problem and should
> >be good enough for the starter.
> 
> To me that looks like a step ion the wrong direction that won't help
> at all in actually addressing the underlying issues.
> 
> If the GPU driver wants to explicitly control IOMMU mappings instead
> of relying on the IOMMU_DOMAIN_DMA abstraction, then it should use
> its own unmanaged domain. At that point it shouldn't matter if a DMA
> ops domain was allocated, since the GPU device will no longer be
> attached to it. Yes, there may be some improvements to make like
> having unused domains not consume hardware contexts, but that's
> internal to the relevant IOMMU drivers. If moving in and out of DMA
> ops domains leaves the actual dma_ops broken, that's already a
> problem between the IOMMU API and the arch DMA code as I've
> mentioned before.
> 
> Furthermore, given what the example above is trying to do,
> arch_setup_dma_ops() is way too late to do it - the default domain
> was already set up in iommu_group_get_for_dev() when the IOMMU
> driver first saw that device. An "opt-out" mechanism that doesn't
> actually opt out and just bodges around being opted-in after the
> fact doesn't strike me as something which can grow to be robust and
> maintainable.
> 
> For the case where  a device has some special hardware relationship
> with a particular IOMMU context, the IOMMU driver *has* to be
> completely aware of that, i.e. it needs to be described in DT/ACPI,
> either via some explicit binding or at least inferred from some
> SoC/instance-specific IOMMU compatible. Then the IOMMU driver needs
> to know when the driver for that device is requesting its special
> domain so that it provide the correct context (and *not* allocate
> that context for other uses). Anything which just relies on the
> order in which things currently happen to be allocated is far too
> fragile long-term.

Speculating what this would look like for arm-smmu-v2. Thinking we
would add either a mask or a bit to the per-device smmu data to identify
the "available" contexts for dma (or alternatively, the "reserved" domains
for  the device) and then change up _arm_smmu_alloc_bitmap() to hand out
the right numbers accordingly.

Then that leaves the interface for the device to request a specific context
- I guess a DOMAIN_ATTR would work for that in lieu of changing
iommu_alloc_domain() and friends.

Would this match up with your thinking?

Jordan
Dmitry Osipenko Aug. 15, 2018, 7:56 p.m. UTC | #15
On Friday, 3 August 2018 18:43:41 MSK Robin Murphy wrote:
> On 02/08/18 19:24, Dmitry Osipenko wrote:
> > On Friday, 27 July 2018 20:16:53 MSK Dmitry Osipenko wrote:
> >> On Friday, 27 July 2018 20:03:26 MSK Jordan Crouse wrote:
> >>> On Fri, Jul 27, 2018 at 05:02:37PM +0100, Robin Murphy wrote:
> >>>> On 27/07/18 15:10, Dmitry Osipenko wrote:
> >>>>> On Friday, 27 July 2018 12:03:28 MSK Will Deacon wrote:
> >>>>>> On Fri, Jul 27, 2018 at 10:25:13AM +0200, Joerg Roedel wrote:
> >>>>>>> On Fri, Jul 27, 2018 at 02:16:18AM +0300, Dmitry Osipenko wrote:
> >>>>>>>> The proposed solution adds a new option to the base device driver
> >>>>>>>> structure that allows device drivers to explicitly convey to the
> >>>>>>>> drivers
> >>>>>>>> core that the implicit IOMMU backing for devices must not happen.
> >>>>>>> 
> >>>>>>> Why is IOMMU mapping a problem for the Tegra GPU driver?
> >>>>>>> 
> >>>>>>> If we add something like this then it should not be the choice of
> >>>>>>> the
> >>>>>>> device driver, but of the user and/or the firmware.
> >>>>>> 
> >>>>>> Agreed, and it would still need somebody to configure an identity
> >>>>>> domain
> >>>>>> so
> >>>>>> that transactions aren't aborted immediately. We currently allow the
> >>>>>> identity domain to be used by default via a command-line option, so I
> >>>>>> guess
> >>>>>> we'd need a way for firmware to request that on a per-device basis.
> >>>>> 
> >>>>> The IOMMU mapping itself is not a problem, the problem is the
> >>>>> management
> >>>>> of
> >>>>> the IOMMU. For Tegra we don't want anything to intrude into the IOMMU
> >>>>> activities because:
> >>>>> 
> >>>>> 1) GPU HW require additional configuration for the IOMMU usage and
> >>>>> dumb
> >>>>> mapping of the allocations simply doesn't work.
> >>>> 
> >>>> Generally, that's already handled by the DRM drivers allocating
> >>>> their own unmanaged domains. The only problem we really need to
> >>>> solve in that regard is that currently the device DMA ops don't get
> >>>> updated when moving away from the managed domain. That's been OK for
> >>>> the VFIO case where the device is bound to a different driver which
> >>>> we know won't make any explicit DMA API calls, but for the more
> >>>> general case of IOMMU-aware drivers we could certainly do with a bit
> >>>> of cooperation between the IOMMU API, DMA API, and arch code to
> >>>> update the DMA ops dynamically to cope with intermediate subsystems
> >>>> making DMA API calls on behalf of devices they don't know the
> >>>> intimate details of.
> >>>> 
> >>>>> 2) Older Tegra generations have a limited resource and capabilities in
> >>>>> regards to IOMMU usage, allocating IOMMU domain per-device is just
> >>>>> impossible for example.
> >>>>> 
> >>>>> 3) HW performs context switches and so particular allocations have to
> >>>>> be
> >>>>> assigned to a particular contexts IOMMU domain.
> >>>> 
> >>>> I understand Qualcomm SoCs have a similar thing too, and AFAICS that
> >>>> case just doesn't fit into the current API model at all. We need the
> >>>> IOMMU driver to somehow know about the specific details of which
> >>>> devices have magic associations with specific contexts, and we
> >>>> almost certainly need a more expressive interface than
> >>>> iommu_domain_alloc() to have any hope of reliable results.
> >>> 
> >>> This is correct for Qualcomm GPUs - The GPU hardware context switching
> >>> requires a specific context and there are some restrictions around
> >>> secure contexts as well.
> >>> 
> >>> We don't really care if the DMA attaches to a context just as long as it
> >>> doesn't attach to the one(s) we care about. Perhaps a "valid context"
> >>> mask
> >>> would work in from the DT or the device struct to give the subsystems a
> >>> clue as to which domains they were allowed to use. I recognize that
> >>> there
> >>> isn't a one-size-fits-all solution to this problem so I'm open to
> >>> different
> >>> ideas.
> >> 
> >> Designating whether implicit IOMMU backing is appropriate for a device
> >> via
> >> device-tree property sounds a bit awkward because that will be a kinda
> >> software description (of a custom Linux driver model), while device-tree
> >> is
> >> supposed to describe HW.
> >> 
> >> What about to grant IOMMU drivers with ability to decide whether the
> >> implicit backing for a device is appropriate? Like this:
> >> 
> >> bool implicit_iommu_for_dma_is_allowed(struct device *dev)
> >> {
> >> 
> >> 	const struct iommu_ops *ops = dev->bus->iommu_ops;
> >> 	struct iommu_group *group;
> >> 	
> >> 	group = iommu_group_get(dev);
> >> 	if (!group)
> >> 	
> >> 		return NULL;
> >> 	
> >> 	iommu_group_put(group);
> >> 	
> >> 	if (!ops->implicit_iommu_for_dma_is_allowed)
> >> 	
> >> 		return true;
> >> 	
> >> 	return ops->implicit_iommu_for_dma_is_allowed(dev);
> >> 
> >> }
> >> 
> >> Then arch_setup_dma_ops() could have a clue whether implicit IOMMU
> >> backing
> >> for a device is appropriate.
> > 
> > Guys, does it sound good to you or maybe you have something else on your
> > mind? Even if it's not an ideal solution, it fixes the immediate problem
> > and should be good enough for the starter.
> 
> To me that looks like a step ion the wrong direction that won't help at
> all in actually addressing the underlying issues.
> 
> If the GPU driver wants to explicitly control IOMMU mappings instead of
> relying on the IOMMU_DOMAIN_DMA abstraction, then it should use its own
> unmanaged domain. At that point it shouldn't matter if a DMA ops domain
> was allocated, since the GPU device will no longer be attached to it.

It is not obvious to me what solution you are proposing.. 

Are you saying that the detaching from the DMA IOMMU domain that is provided 
by dma_ops() implementer (ARM32 arch for example) should be generalized and 
hence there should be something like:

	dma_detach_device_from_iommu_dma_domain(dev);

that drivers will have to invoke.

And hence there will be dma_map_ops.iommu_detach_device() that dma_ops() 
provider will have to implement. Thereby provider will detach device from DMA 
domain, destroy the domain and update the DMA ops of the device.

> Yes, there may be some improvements to make like having unused domains
> not consume hardware contexts, but that's internal to the relevant IOMMU
> drivers. If moving in and out of DMA ops domains leaves the actual
> dma_ops broken, that's already a problem between the IOMMU API and the
> arch DMA code as I've mentioned before.
> 
> Furthermore, given what the example above is trying to do,
> arch_setup_dma_ops() is way too late to do it - the default domain was
> already set up in iommu_group_get_for_dev() when the IOMMU driver first
> saw that device. An "opt-out" mechanism that doesn't actually opt out
> and just bodges around being opted-in after the fact doesn't strike me
> as something which can grow to be robust and maintainable.
> 
> For the case where  a device has some special hardware relationship with
> a particular IOMMU context, the IOMMU driver *has* to be completely
> aware of that, i.e. it needs to be described in DT/ACPI, either via some
> explicit binding or at least inferred from some SoC/instance-specific
> IOMMU compatible. Then the IOMMU driver needs to know when the driver
> for that device is requesting its special domain so that it provide the
> correct context (and *not* allocate that context for other uses).
> Anything which just relies on the order in which things currently happen
> to be allocated is far too fragile long-term.

If hardware has some restrictions, then that should be reflected in the 
hardware description. But that's not what we are trying to solve, at least 
there is no such problem right now for NVIDIA Tegra.
Robin Murphy Aug. 16, 2018, 5:23 p.m. UTC | #16
On 15/08/18 20:56, Dmitry Osipenko wrote:
> On Friday, 3 August 2018 18:43:41 MSK Robin Murphy wrote:
>> On 02/08/18 19:24, Dmitry Osipenko wrote:
>>> On Friday, 27 July 2018 20:16:53 MSK Dmitry Osipenko wrote:
>>>> On Friday, 27 July 2018 20:03:26 MSK Jordan Crouse wrote:
>>>>> On Fri, Jul 27, 2018 at 05:02:37PM +0100, Robin Murphy wrote:
>>>>>> On 27/07/18 15:10, Dmitry Osipenko wrote:
>>>>>>> On Friday, 27 July 2018 12:03:28 MSK Will Deacon wrote:
>>>>>>>> On Fri, Jul 27, 2018 at 10:25:13AM +0200, Joerg Roedel wrote:
>>>>>>>>> On Fri, Jul 27, 2018 at 02:16:18AM +0300, Dmitry Osipenko wrote:
>>>>>>>>>> The proposed solution adds a new option to the base device driver
>>>>>>>>>> structure that allows device drivers to explicitly convey to the
>>>>>>>>>> drivers
>>>>>>>>>> core that the implicit IOMMU backing for devices must not happen.
>>>>>>>>>
>>>>>>>>> Why is IOMMU mapping a problem for the Tegra GPU driver?
>>>>>>>>>
>>>>>>>>> If we add something like this then it should not be the choice of
>>>>>>>>> the
>>>>>>>>> device driver, but of the user and/or the firmware.
>>>>>>>>
>>>>>>>> Agreed, and it would still need somebody to configure an identity
>>>>>>>> domain
>>>>>>>> so
>>>>>>>> that transactions aren't aborted immediately. We currently allow the
>>>>>>>> identity domain to be used by default via a command-line option, so I
>>>>>>>> guess
>>>>>>>> we'd need a way for firmware to request that on a per-device basis.
>>>>>>>
>>>>>>> The IOMMU mapping itself is not a problem, the problem is the
>>>>>>> management
>>>>>>> of
>>>>>>> the IOMMU. For Tegra we don't want anything to intrude into the IOMMU
>>>>>>> activities because:
>>>>>>>
>>>>>>> 1) GPU HW require additional configuration for the IOMMU usage and
>>>>>>> dumb
>>>>>>> mapping of the allocations simply doesn't work.
>>>>>>
>>>>>> Generally, that's already handled by the DRM drivers allocating
>>>>>> their own unmanaged domains. The only problem we really need to
>>>>>> solve in that regard is that currently the device DMA ops don't get
>>>>>> updated when moving away from the managed domain. That's been OK for
>>>>>> the VFIO case where the device is bound to a different driver which
>>>>>> we know won't make any explicit DMA API calls, but for the more
>>>>>> general case of IOMMU-aware drivers we could certainly do with a bit
>>>>>> of cooperation between the IOMMU API, DMA API, and arch code to
>>>>>> update the DMA ops dynamically to cope with intermediate subsystems
>>>>>> making DMA API calls on behalf of devices they don't know the
>>>>>> intimate details of.
>>>>>>
>>>>>>> 2) Older Tegra generations have a limited resource and capabilities in
>>>>>>> regards to IOMMU usage, allocating IOMMU domain per-device is just
>>>>>>> impossible for example.
>>>>>>>
>>>>>>> 3) HW performs context switches and so particular allocations have to
>>>>>>> be
>>>>>>> assigned to a particular contexts IOMMU domain.
>>>>>>
>>>>>> I understand Qualcomm SoCs have a similar thing too, and AFAICS that
>>>>>> case just doesn't fit into the current API model at all. We need the
>>>>>> IOMMU driver to somehow know about the specific details of which
>>>>>> devices have magic associations with specific contexts, and we
>>>>>> almost certainly need a more expressive interface than
>>>>>> iommu_domain_alloc() to have any hope of reliable results.
>>>>>
>>>>> This is correct for Qualcomm GPUs - The GPU hardware context switching
>>>>> requires a specific context and there are some restrictions around
>>>>> secure contexts as well.
>>>>>
>>>>> We don't really care if the DMA attaches to a context just as long as it
>>>>> doesn't attach to the one(s) we care about. Perhaps a "valid context"
>>>>> mask
>>>>> would work in from the DT or the device struct to give the subsystems a
>>>>> clue as to which domains they were allowed to use. I recognize that
>>>>> there
>>>>> isn't a one-size-fits-all solution to this problem so I'm open to
>>>>> different
>>>>> ideas.
>>>>
>>>> Designating whether implicit IOMMU backing is appropriate for a device
>>>> via
>>>> device-tree property sounds a bit awkward because that will be a kinda
>>>> software description (of a custom Linux driver model), while device-tree
>>>> is
>>>> supposed to describe HW.
>>>>
>>>> What about to grant IOMMU drivers with ability to decide whether the
>>>> implicit backing for a device is appropriate? Like this:
>>>>
>>>> bool implicit_iommu_for_dma_is_allowed(struct device *dev)
>>>> {
>>>>
>>>> 	const struct iommu_ops *ops = dev->bus->iommu_ops;
>>>> 	struct iommu_group *group;
>>>> 	
>>>> 	group = iommu_group_get(dev);
>>>> 	if (!group)
>>>> 	
>>>> 		return NULL;
>>>> 	
>>>> 	iommu_group_put(group);
>>>> 	
>>>> 	if (!ops->implicit_iommu_for_dma_is_allowed)
>>>> 	
>>>> 		return true;
>>>> 	
>>>> 	return ops->implicit_iommu_for_dma_is_allowed(dev);
>>>>
>>>> }
>>>>
>>>> Then arch_setup_dma_ops() could have a clue whether implicit IOMMU
>>>> backing
>>>> for a device is appropriate.
>>>
>>> Guys, does it sound good to you or maybe you have something else on your
>>> mind? Even if it's not an ideal solution, it fixes the immediate problem
>>> and should be good enough for the starter.
>>
>> To me that looks like a step ion the wrong direction that won't help at
>> all in actually addressing the underlying issues.
>>
>> If the GPU driver wants to explicitly control IOMMU mappings instead of
>> relying on the IOMMU_DOMAIN_DMA abstraction, then it should use its own
>> unmanaged domain. At that point it shouldn't matter if a DMA ops domain
>> was allocated, since the GPU device will no longer be attached to it.
> 
> It is not obvious to me what solution you are proposing..
> 
> Are you saying that the detaching from the DMA IOMMU domain that is provided
> by dma_ops() implementer (ARM32 arch for example) should be generalized and
> hence there should be something like:
> 
> 	dma_detach_device_from_iommu_dma_domain(dev);
> 
> that drivers will have to invoke.

No, I mean that drivers should not have to care at all. If the device 
has been given a set of DMA ops which rely on it being attached to a 
default DMA domain, that's not the driver's fault and it's not something 
the driver should have deal with. Either the DMA ops themselves should 
be robust and provide a non-IOMMU fallback if they detect that the 
device is currently attached to a different domain, or the attach 
operation (ideally in the IOMMU core, but at worst in the IOMMU driver's 
.attach_dev callback) should automatically tell the arch code to update 
the device's DMA ops appropriately for the target domain. There are 
already examples of both approaches dotted around arch-specific code, so 
the question is which particular solution is most appropriate to 
standardise on in what is intended to be generic code.

> And hence there will be dma_map_ops.iommu_detach_device() that dma_ops()
> provider will have to implement. Thereby provider will detach device from DMA
> domain, destroy the domain and update the DMA ops of the device.
> 
>> Yes, there may be some improvements to make like having unused domains
>> not consume hardware contexts, but that's internal to the relevant IOMMU
>> drivers. If moving in and out of DMA ops domains leaves the actual
>> dma_ops broken, that's already a problem between the IOMMU API and the
>> arch DMA code as I've mentioned before.
>>
>> Furthermore, given what the example above is trying to do,
>> arch_setup_dma_ops() is way too late to do it - the default domain was
>> already set up in iommu_group_get_for_dev() when the IOMMU driver first
>> saw that device. An "opt-out" mechanism that doesn't actually opt out
>> and just bodges around being opted-in after the fact doesn't strike me
>> as something which can grow to be robust and maintainable.
>>
>> For the case where  a device has some special hardware relationship with
>> a particular IOMMU context, the IOMMU driver *has* to be completely
>> aware of that, i.e. it needs to be described in DT/ACPI, either via some
>> explicit binding or at least inferred from some SoC/instance-specific
>> IOMMU compatible. Then the IOMMU driver needs to know when the driver
>> for that device is requesting its special domain so that it provide the
>> correct context (and *not* allocate that context for other uses).
>> Anything which just relies on the order in which things currently happen
>> to be allocated is far too fragile long-term.
> 
> If hardware has some restrictions, then that should be reflected in the
> hardware description. But that's not what we are trying to solve, at least
> there is no such problem right now for NVIDIA Tegra.

OK, maybe I misunderstood "HW performs context switches and so 
particular allocations have to be assigned to a particular contexts 
IOMMU domain." - is it that the domain can be backed by any hardware 
context and the Tegra GPU driver only needs to know *which* one, rather 
then needing a specific hard-wired context to be allocated as in the 
Qcom case?

Robin.
Dmitry Osipenko Sept. 20, 2018, 5:09 p.m. UTC | #17
On 8/16/18 8:23 PM, Robin Murphy wrote:
> On 15/08/18 20:56, Dmitry Osipenko wrote:
>> On Friday, 3 August 2018 18:43:41 MSK Robin Murphy wrote:
>>> On 02/08/18 19:24, Dmitry Osipenko wrote:
>>>> On Friday, 27 July 2018 20:16:53 MSK Dmitry Osipenko wrote:
>>>>> On Friday, 27 July 2018 20:03:26 MSK Jordan Crouse wrote:
>>>>>> On Fri, Jul 27, 2018 at 05:02:37PM +0100, Robin Murphy wrote:
>>>>>>> On 27/07/18 15:10, Dmitry Osipenko wrote:
>>>>>>>> On Friday, 27 July 2018 12:03:28 MSK Will Deacon wrote:
>>>>>>>>> On Fri, Jul 27, 2018 at 10:25:13AM +0200, Joerg Roedel wrote:
>>>>>>>>>> On Fri, Jul 27, 2018 at 02:16:18AM +0300, Dmitry Osipenko wrote:
>>>>>>>>>>> The proposed solution adds a new option to the base device driver
>>>>>>>>>>> structure that allows device drivers to explicitly convey to the
>>>>>>>>>>> drivers
>>>>>>>>>>> core that the implicit IOMMU backing for devices must not happen.
>>>>>>>>>>
>>>>>>>>>> Why is IOMMU mapping a problem for the Tegra GPU driver?
>>>>>>>>>>
>>>>>>>>>> If we add something like this then it should not be the choice of
>>>>>>>>>> the
>>>>>>>>>> device driver, but of the user and/or the firmware.
>>>>>>>>>
>>>>>>>>> Agreed, and it would still need somebody to configure an identity
>>>>>>>>> domain
>>>>>>>>> so
>>>>>>>>> that transactions aren't aborted immediately. We currently allow the
>>>>>>>>> identity domain to be used by default via a command-line option, so I
>>>>>>>>> guess
>>>>>>>>> we'd need a way for firmware to request that on a per-device basis.
>>>>>>>>
>>>>>>>> The IOMMU mapping itself is not a problem, the problem is the
>>>>>>>> management
>>>>>>>> of
>>>>>>>> the IOMMU. For Tegra we don't want anything to intrude into the IOMMU
>>>>>>>> activities because:
>>>>>>>>
>>>>>>>> 1) GPU HW require additional configuration for the IOMMU usage and
>>>>>>>> dumb
>>>>>>>> mapping of the allocations simply doesn't work.
>>>>>>>
>>>>>>> Generally, that's already handled by the DRM drivers allocating
>>>>>>> their own unmanaged domains. The only problem we really need to
>>>>>>> solve in that regard is that currently the device DMA ops don't get
>>>>>>> updated when moving away from the managed domain. That's been OK for
>>>>>>> the VFIO case where the device is bound to a different driver which
>>>>>>> we know won't make any explicit DMA API calls, but for the more
>>>>>>> general case of IOMMU-aware drivers we could certainly do with a bit
>>>>>>> of cooperation between the IOMMU API, DMA API, and arch code to
>>>>>>> update the DMA ops dynamically to cope with intermediate subsystems
>>>>>>> making DMA API calls on behalf of devices they don't know the
>>>>>>> intimate details of.
>>>>>>>
>>>>>>>> 2) Older Tegra generations have a limited resource and capabilities in
>>>>>>>> regards to IOMMU usage, allocating IOMMU domain per-device is just
>>>>>>>> impossible for example.
>>>>>>>>
>>>>>>>> 3) HW performs context switches and so particular allocations have to
>>>>>>>> be
>>>>>>>> assigned to a particular contexts IOMMU domain.
>>>>>>>
>>>>>>> I understand Qualcomm SoCs have a similar thing too, and AFAICS that
>>>>>>> case just doesn't fit into the current API model at all. We need the
>>>>>>> IOMMU driver to somehow know about the specific details of which
>>>>>>> devices have magic associations with specific contexts, and we
>>>>>>> almost certainly need a more expressive interface than
>>>>>>> iommu_domain_alloc() to have any hope of reliable results.
>>>>>>
>>>>>> This is correct for Qualcomm GPUs - The GPU hardware context switching
>>>>>> requires a specific context and there are some restrictions around
>>>>>> secure contexts as well.
>>>>>>
>>>>>> We don't really care if the DMA attaches to a context just as long as it
>>>>>> doesn't attach to the one(s) we care about. Perhaps a "valid context"
>>>>>> mask
>>>>>> would work in from the DT or the device struct to give the subsystems a
>>>>>> clue as to which domains they were allowed to use. I recognize that
>>>>>> there
>>>>>> isn't a one-size-fits-all solution to this problem so I'm open to
>>>>>> different
>>>>>> ideas.
>>>>>
>>>>> Designating whether implicit IOMMU backing is appropriate for a device
>>>>> via
>>>>> device-tree property sounds a bit awkward because that will be a kinda
>>>>> software description (of a custom Linux driver model), while device-tree
>>>>> is
>>>>> supposed to describe HW.
>>>>>
>>>>> What about to grant IOMMU drivers with ability to decide whether the
>>>>> implicit backing for a device is appropriate? Like this:
>>>>>
>>>>> bool implicit_iommu_for_dma_is_allowed(struct device *dev)
>>>>> {
>>>>>
>>>>>     const struct iommu_ops *ops = dev->bus->iommu_ops;
>>>>>     struct iommu_group *group;
>>>>>
>>>>>     group = iommu_group_get(dev);
>>>>>     if (!group)
>>>>>
>>>>>         return NULL;
>>>>>
>>>>>     iommu_group_put(group);
>>>>>
>>>>>     if (!ops->implicit_iommu_for_dma_is_allowed)
>>>>>
>>>>>         return true;
>>>>>
>>>>>     return ops->implicit_iommu_for_dma_is_allowed(dev);
>>>>>
>>>>> }
>>>>>
>>>>> Then arch_setup_dma_ops() could have a clue whether implicit IOMMU
>>>>> backing
>>>>> for a device is appropriate.
>>>>
>>>> Guys, does it sound good to you or maybe you have something else on your
>>>> mind? Even if it's not an ideal solution, it fixes the immediate problem
>>>> and should be good enough for the starter.
>>>
>>> To me that looks like a step ion the wrong direction that won't help at
>>> all in actually addressing the underlying issues.
>>>
>>> If the GPU driver wants to explicitly control IOMMU mappings instead of
>>> relying on the IOMMU_DOMAIN_DMA abstraction, then it should use its own
>>> unmanaged domain. At that point it shouldn't matter if a DMA ops domain
>>> was allocated, since the GPU device will no longer be attached to it.
>>
>> It is not obvious to me what solution you are proposing..
>>
>> Are you saying that the detaching from the DMA IOMMU domain that is provided
>> by dma_ops() implementer (ARM32 arch for example) should be generalized and
>> hence there should be something like:
>>
>>     dma_detach_device_from_iommu_dma_domain(dev);
>>
>> that drivers will have to invoke.
> 
> No, I mean that drivers should not have to care at all. If the device has been 
> given a set of DMA ops which rely on it being attached to a default DMA domain, 
> that's not the driver's fault and it's not something the driver should have deal 
> with. Either the DMA ops themselves should be robust and provide a non-IOMMU 
> fallback if they detect that the device is currently attached to a different 
> domain, or the attach operation (ideally in the IOMMU core, but at worst in the 
> IOMMU driver's .attach_dev callback) should automatically tell the arch code to 
> update the device's DMA ops appropriately for the target domain. There are 
> already examples of both approaches dotted around arch-specific code, so the 
> question is which particular solution is most appropriate to standardise on in 
> what is intended to be generic code.
Okay, thank you for the clarification. It will be better to start with a 
workaround within the driver, maybe later we could come up with a universal 
solution. Is there any chance that you or Joerg could help with the 
standardization in the future? I don't feel that I have enough expertise and 
capacity to do that.

>> And hence there will be dma_map_ops.iommu_detach_device() that dma_ops()
>> provider will have to implement. Thereby provider will detach device from DMA
>> domain, destroy the domain and update the DMA ops of the device.
>>
>>> Yes, there may be some improvements to make like having unused domains
>>> not consume hardware contexts, but that's internal to the relevant IOMMU
>>> drivers. If moving in and out of DMA ops domains leaves the actual
>>> dma_ops broken, that's already a problem between the IOMMU API and the
>>> arch DMA code as I've mentioned before.
>>>
>>> Furthermore, given what the example above is trying to do,
>>> arch_setup_dma_ops() is way too late to do it - the default domain was
>>> already set up in iommu_group_get_for_dev() when the IOMMU driver first
>>> saw that device. An "opt-out" mechanism that doesn't actually opt out
>>> and just bodges around being opted-in after the fact doesn't strike me
>>> as something which can grow to be robust and maintainable.
>>>
>>> For the case where  a device has some special hardware relationship with
>>> a particular IOMMU context, the IOMMU driver *has* to be completely
>>> aware of that, i.e. it needs to be described in DT/ACPI, either via some
>>> explicit binding or at least inferred from some SoC/instance-specific
>>> IOMMU compatible. Then the IOMMU driver needs to know when the driver
>>> for that device is requesting its special domain so that it provide the
>>> correct context (and *not* allocate that context for other uses).
>>> Anything which just relies on the order in which things currently happen
>>> to be allocated is far too fragile long-term.
>>
>> If hardware has some restrictions, then that should be reflected in the
>> hardware description. But that's not what we are trying to solve, at least
>> there is no such problem right now for NVIDIA Tegra.
> 
> OK, maybe I misunderstood "HW performs context switches and so particular 
> allocations have to be assigned to a particular contexts IOMMU domain." - is it 
> that the domain can be backed by any hardware context and the Tegra GPU driver 
> only needs to know *which* one, rather then needing a specific hard-wired 
> context to be allocated as in the Qcom case?

Yes, I can't recall that Tegra has any limitations like Qcom has.