diff mbox

[v6,8/8] arm: dma-mapping: plumb our iommu mapping ops into arch_setup_dma_ops

Message ID 54B63028.3090701@nvidia.com (mailing list archive)
State New, archived
Headers show

Commit Message

Alexandre Courbot Jan. 14, 2015, 9 a.m. UTC
On 12/02/2014 01:57 AM, Will Deacon wrote:
> This patch plumbs the existing ARM IOMMU DMA infrastructure (which isn't
> actually called outside of a few drivers) into arch_setup_dma_ops, so
> that we can use IOMMUs for DMA transfers in a more generic fashion.
>
> Since this significantly complicates the arch_setup_dma_ops function,
> it is moved out of line into dma-mapping.c. If CONFIG_ARM_DMA_USE_IOMMU
> is not set, the iommu parameter is ignored and the normal ops are used
> instead.

A series for IOMMU support with Tegra/Nouveau ceased to work after this 
patch. The Tegra IOMMU is not registered by the time the DT is parsed, 
and thus all devices end up without the proper DMA ops set up because 
the phandle to the IOMMU cannot be resolved. Subsequently calling 
arm_iommu_create_mapping() and arm_iommu_attach_device() from the driver 
(as I used to do until 3.18) does not help since the call to 
set_dma_ops() has been moved out of arm_iommu_attach_device(). Therefore 
there seems to be no way for a device to gets its correct DMA ops unless 
the IOMMU is ready by the time the DT is parsed.

Also potentially affected by this are the Rockchip DRM and OMAP3 ISP 
drivers, which follow the same pattern.

This raises the following questions:

1) Why are arm_iommu_create_mapping() and arm_iommu_attach_device() 
still public since they cannot set the DMA ops and thus seem to be 
useless outside of arch_setup_dma_ops()?

2) Say you want to use the IOMMU API in your driver, and have an iommu 
property in your device's DT node. If by chance your IOMMU is registered 
early, you will already have a mapping automatically created even before 
your probe function is called. Can this be avoided? Is it even safe?

The issue is workarounded (at least for me) with the following patch:

         return 0;

This allows arm_iommu_create_mapping() and arm_iommu_attach_device() to 
set the correct DMA ops when called from the driver. But it doesn't look 
like mergeable material and I'd like to know whether there is a better 
way to handle this, or whether I should just use the IOMMU API directly. 
Knowing that even this might not be safe if ARM_DMA_USE_IOMMU is enabled 
because of point 2) above.

So basically I'm afraid I might not even be able to use the IOMMU safely 
after this. Or am I missing anything?

Thanks,
Alex.

Comments

Will Deacon Jan. 14, 2015, 10:46 a.m. UTC | #1
Hi Alex,

On Wed, Jan 14, 2015 at 09:00:24AM +0000, Alexandre Courbot wrote:
> On 12/02/2014 01:57 AM, Will Deacon wrote:
> > This patch plumbs the existing ARM IOMMU DMA infrastructure (which isn't
> > actually called outside of a few drivers) into arch_setup_dma_ops, so
> > that we can use IOMMUs for DMA transfers in a more generic fashion.
> >
> > Since this significantly complicates the arch_setup_dma_ops function,
> > it is moved out of line into dma-mapping.c. If CONFIG_ARM_DMA_USE_IOMMU
> > is not set, the iommu parameter is ignored and the normal ops are used
> > instead.
> 
> A series for IOMMU support with Tegra/Nouveau ceased to work after this 
> patch.

Which series? This code shouldn't even be executed unless you start using
of_xlate and the generic bindings.

> The Tegra IOMMU is not registered by the time the DT is parsed, 
> and thus all devices end up without the proper DMA ops set up because 
> the phandle to the IOMMU cannot be resolved.

You might want to look at the patches posted for the exynos, renesas and ARM
SMMUs for some hints in how to use the new API.

> Subsequently calling arm_iommu_create_mapping() and
> arm_iommu_attach_device() from the driver (as I used to do until 3.18)
> does not help since the call to set_dma_ops() has been moved out of
> arm_iommu_attach_device(). Therefore there seems to be no way for a device
> to gets its correct DMA ops unless the IOMMU is ready by the time the DT
> is parsed.
> 
> Also potentially affected by this are the Rockchip DRM and OMAP3 ISP 
> drivers, which follow the same pattern.

I don't understand why any code currently in mainline should be affected.
Please can you elaborate on the failure case?

> This raises the following questions:
> 
> 1) Why are arm_iommu_create_mapping() and arm_iommu_attach_device() 
> still public since they cannot set the DMA ops and thus seem to be 
> useless outside of arch_setup_dma_ops()?

It has callers outside of the file. I'd like to make it static, but that
means doing some non-trivial porting of all the callers, which I'm also
unable to test.

> 2) Say you want to use the IOMMU API in your driver, and have an iommu 
> property in your device's DT node. If by chance your IOMMU is registered 
> early, you will already have a mapping automatically created even before 
> your probe function is called. Can this be avoided? Is it even safe?

Currently, I think you have to either teardown the ops manually or return
an error from of_xlate. Thierry was also looking at this sort of thing,
so it might be worth talking to him.

Will
Heiko Stuebner Jan. 14, 2015, 1:51 p.m. UTC | #2
Hi Will, Alexandre, Daniel,

Am Mittwoch, 14. Januar 2015, 10:46:10 schrieb Will Deacon:
> Hi Alex,
> 
> On Wed, Jan 14, 2015 at 09:00:24AM +0000, Alexandre Courbot wrote:
> > On 12/02/2014 01:57 AM, Will Deacon wrote:
> > > This patch plumbs the existing ARM IOMMU DMA infrastructure (which isn't
> > > actually called outside of a few drivers) into arch_setup_dma_ops, so
> > > that we can use IOMMUs for DMA transfers in a more generic fashion.
> > > 
> > > Since this significantly complicates the arch_setup_dma_ops function,
> > > it is moved out of line into dma-mapping.c. If CONFIG_ARM_DMA_USE_IOMMU
> > > is not set, the iommu parameter is ignored and the normal ops are used
> > > instead.
> > 
> > A series for IOMMU support with Tegra/Nouveau ceased to work after this
> > patch.
> 
> Which series? This code shouldn't even be executed unless you start using
> of_xlate and the generic bindings.
> 
> > The Tegra IOMMU is not registered by the time the DT is parsed,
> > and thus all devices end up without the proper DMA ops set up because
> > the phandle to the IOMMU cannot be resolved.
> 
> You might want to look at the patches posted for the exynos, renesas and ARM
> SMMUs for some hints in how to use the new API.
> 
> > Subsequently calling arm_iommu_create_mapping() and
> > arm_iommu_attach_device() from the driver (as I used to do until 3.18)
> > does not help since the call to set_dma_ops() has been moved out of
> > arm_iommu_attach_device(). Therefore there seems to be no way for a device
> > to gets its correct DMA ops unless the IOMMU is ready by the time the DT
> > is parsed.
> > 
> > Also potentially affected by this are the Rockchip DRM and OMAP3 ISP
> > drivers, which follow the same pattern.
> 
> I don't understand why any code currently in mainline should be affected.
> Please can you elaborate on the failure case?

As Alexandre suspected the new Rockchip drm code seems to be affected by
this. I hadn't played with the drm code before last weekend and was then
stumbling over different iommu related issues. As I hadn't to much contact
with iommus till now I didn't get very far.

But with Alexandre's bandaid patch of adding
	set_dma_ops(dev, &iommu_ops);
to arm_iommu_attach_device both problems go away.


So to elaborate on the two failure cases:

When attaching either hdmi or vga connectors at runtime, I get

[drm:drm_mode_debug_printmodeline] Modeline 26:"1024x768" 85 94500 1024 1072 1168 1376 768 769 772 808 0x40 0x5
[drm:drm_crtc_helper_set_mode] [CRTC:14]
[drm:vop_crtc_dpms] crtc[14] mode[0]
rockchip-vop ff930000.vop: Attached to iommu domain
[drm] processing encoder TMDS-18
[drm] processing encoder DAC-22
[drm] processing connector HDMI-A-1
[drm] processing connector VGA-1
[drm:vop_win_update] [PLANE:12] [FB:-1->24] update
rk_iommu ff930300.iommu: Page fault at 0x2d400500 of type read
rk_iommu ff930300.iommu: iova = 0x2d400500: dte_index: 0xb5 pte_index: 0x0 page_offset: 0x500
rk_iommu ff930300.iommu: mmu_dte_addr: 0x2e3b3000 dte@0x2e3b32d4: 0x000000 valid: 0 pte@0x00000000: 0x000000 valid: 0 page@0x00000000 flags: 0x0
[drm:drm_crtc_helper_set_mode] [ENCODER:22:DAC-22] set [MODE:26:1024x768]

===============
When my wip vga-connector is plugged in at boot, I get


[drm:drm_target_preferred] found mode 1280x1024
[drm:drm_setup_crtcs] picking CRTCs for 4096x4096 config
[drm:drm_setup_crtcs] desired mode 1280x1024 set on crtc 14 (0,0)
------------[ cut here ]------------
WARNING: CPU: 1 PID: 33 at mm/page_alloc.c:2645 __alloc_pages_nodemask+0x18c/0x6a8()
Modules linked in:
CPU: 1 PID: 33 Comm: kworker/u8:1 Not tainted 3.19.0-rc1+ #1512
Hardware name: Rockchip Cortex-A9 (Device Tree)
Workqueue: deferwq deferred_probe_work_func
[<c00148e4>] (unwind_backtrace) from [<c00111e0>] (show_stack+0x10/0x14)
[<c00111e0>] (show_stack) from [<c0426564>] (dump_stack+0x6c/0x84)
[<c0426564>] (dump_stack) from [<c0021f34>] (warn_slowpath_common+0x80/0xac)
[<c0021f34>] (warn_slowpath_common) from [<c0021f78>] (warn_slowpath_null+0x18/0x1c)
[<c0021f78>] (warn_slowpath_null) from [<c008b114>] (__alloc_pages_nodemask+0x18c/0x6a8)
[<c008b114>] (__alloc_pages_nodemask) from [<c001a274>] (__dma_alloc_buffer.isra.18+0x2c/0x80)
[<c001a274>] (__dma_alloc_buffer.isra.18) from [<c001a2dc>] (__alloc_remap_buffer.isra.22+0x14/0x5c)
[<c001a2dc>] (__alloc_remap_buffer.isra.22) from [<c001a490>] (__dma_alloc+0x16c/0x1d8)
[<c001a490>] (__dma_alloc) from [<c001a614>] (arm_dma_alloc+0x84/0x90)
[<c001a614>] (arm_dma_alloc) from [<c022f610>] (rockchip_gem_create_object+0x8c/0xc4)
[<c022f610>] (rockchip_gem_create_object) from [<c022f158>] (rockchip_drm_fbdev_create+0x6c/0x1ec)
[<c022f158>] (rockchip_drm_fbdev_create) from [<c021499c>] (drm_fb_helper_initial_config+0x230/0x328)
[<c021499c>] (drm_fb_helper_initial_config) from [<c022f388>] (rockchip_drm_fbdev_init+0xa4/0xc0)
[<c022f388>] (rockchip_drm_fbdev_init) from [<c022ea18>] (rockchip_drm_load+0x1b8/0x1f4)
[<c022ea18>] (rockchip_drm_load) from [<c021c218>] (drm_dev_register+0x80/0x100)
[<c021c218>] (drm_dev_register) from [<c022e77c>] (rockchip_drm_bind+0x48/0x74)
[<c022e77c>] (rockchip_drm_bind) from [<c0234a58>] (try_to_bring_up_master.part.2+0xa4/0xf4)
[<c0234a58>] (try_to_bring_up_master.part.2) from [<c0234c58>] (component_add+0x9c/0x104)
[<c0234c58>] (component_add) from [<c023a1e4>] (platform_drv_probe+0x48/0x90)
[<c023a1e4>] (platform_drv_probe) from [<c0238c30>] (driver_probe_device+0x130/0x340)
[<c0238c30>] (driver_probe_device) from [<c0237270>] (bus_for_each_drv+0x70/0x84)
[<c0237270>] (bus_for_each_drv) from [<c0238a8c>] (device_attach+0x64/0x88)
[<c0238a8c>] (device_attach) from [<c0238090>] (bus_probe_device+0x28/0x98)
[<c0238090>] (bus_probe_device) from [<c0238548>] (deferred_probe_work_func+0x78/0xa4)
[<c0238548>] (deferred_probe_work_func) from [<c0034008>] (process_one_work+0x1c8/0x2f4)
[<c0034008>] (process_one_work) from [<c0034448>] (worker_thread+0x2e8/0x450)
[<c0034448>] (worker_thread) from [<c00380a8>] (kthread+0xdc/0xf0)
[<c00380a8>] (kthread) from [<c000e8b8>] (ret_from_fork+0x14/0x3c)
---[ end trace ce23b3730f8c0d32 ]---
[drm:rockchip_gem_create_object] *ERROR* failed to allocate 0x500000 byte dma buffer


where Daniel Kurtz already deducted in private:

"But, more importantly, this call stack has "arm_dma_alloc", which
suggests that are not actually using the iommu dma allocators.
The allocation should have been handled by arm_iommu_alloc_attrs(),
but for some reason, it is not.
The iommu allocator should have been installed as the allocator for
the drm device by the call to arm_iommu_attach_device() in
rockchip_drm_load."


> > This raises the following questions:
> > 
> > 1) Why are arm_iommu_create_mapping() and arm_iommu_attach_device()
> > still public since they cannot set the DMA ops and thus seem to be
> > useless outside of arch_setup_dma_ops()?
> 
> It has callers outside of the file. I'd like to make it static, but that
> means doing some non-trivial porting of all the callers, which I'm also
> unable to test.
> 
> > 2) Say you want to use the IOMMU API in your driver, and have an iommu
> > property in your device's DT node. If by chance your IOMMU is registered
> > early, you will already have a mapping automatically created even before
> > your probe function is called. Can this be avoided? Is it even safe?
> 
> Currently, I think you have to either teardown the ops manually or return
> an error from of_xlate. Thierry was also looking at this sort of thing,
> so it might be worth talking to him.
> 
> Will
[drm:output_poll_execute] [CONNECTOR:23:VGA-1] status updated from disconnected to connected
[drm:drm_fb_helper_hotplug_event] 
[drm:drm_helper_probe_single_connector_modes_merge_bits] [CONNECTOR:20:HDMI-A-1]
[drm:drm_helper_probe_single_connector_modes_merge_bits] [CONNECTOR:20:HDMI-A-1] disconnected
[drm:drm_helper_probe_single_connector_modes_merge_bits] [CONNECTOR:23:VGA-1]
[drm:drm_mode_debug_printmodeline] Modeline 26:"1280x1024" 60 108000 1280 1328 1440 1688 1024 1025 1028 1066 0x48 0x5
[drm:drm_mode_prune_invalid] Not using 1280x1024 mode 12
[drm:drm_mode_debug_printmodeline] Modeline 27:"1152x864" 0 108000 1152 1216 1344 1600 864 865 868 900 0x40 0x5
[drm:drm_mode_prune_invalid] Not using 1152x864 mode 12
[drm:drm_mode_debug_printmodeline] Modeline 38:"1280x1024" 0 135000 1280 1296 1440 1688 1024 1025 1028 1066 0x40 0x5
[drm:drm_mode_prune_invalid] Not using 1280x1024 mode 12
[drm:drm_helper_probe_single_connector_modes_merge_bits] [CONNECTOR:23:VGA-1] probed modes :
[drm:drm_mode_debug_printmodeline] Modeline 28:"1024x768" 85 94500 1024 1072 1168 1376 768 769 772 808 0x40 0x5
[drm:drm_mode_debug_printmodeline] Modeline 39:"1024x768" 75 78800 1024 1040 1136 1312 768 769 772 800 0x40 0x5
[drm:drm_mode_debug_printmodeline] Modeline 40:"1024x768" 70 75000 1024 1048 1184 1328 768 771 777 806 0x40 0xa
[drm:drm_mode_debug_printmodeline] Modeline 41:"1024x768" 60 65000 1024 1048 1184 1344 768 771 777 806 0x40 0xa
[drm:drm_mode_debug_printmodeline] Modeline 42:"832x624" 75 57284 832 864 928 1152 624 625 628 667 0x40 0xa
[drm:drm_mode_debug_printmodeline] Modeline 29:"800x600" 85 56250 800 832 896 1048 600 601 604 631 0x40 0x5
[drm:drm_mode_debug_printmodeline] Modeline 43:"800x600" 75 49500 800 816 896 1056 600 601 604 625 0x40 0x5
[drm:drm_mode_debug_printmodeline] Modeline 44:"800x600" 72 50000 800 856 976 1040 600 637 643 666 0x40 0x5
[drm:drm_mode_debug_printmodeline] Modeline 31:"800x600" 60 40000 800 840 968 1056 600 601 605 628 0x40 0x5
[drm:drm_mode_debug_printmodeline] Modeline 32:"800x600" 56 36000 800 824 896 1024 600 601 603 625 0x40 0x5
[drm:drm_mode_debug_printmodeline] Modeline 30:"640x480" 85 36000 640 696 752 832 480 481 484 509 0x40 0xa
[drm:drm_mode_debug_printmodeline] Modeline 33:"640x480" 75 31500 640 656 720 840 480 481 484 500 0x40 0xa
[drm:drm_mode_debug_printmodeline] Modeline 34:"640x480" 73 31500 640 664 704 832 480 489 491 520 0x40 0xa
[drm:drm_mode_debug_printmodeline] Modeline 35:"640x480" 67 30240 640 704 768 864 480 483 486 525 0x40 0xa
[drm:drm_mode_debug_printmodeline] Modeline 36:"640x480" 60 25200 640 656 752 800 480 490 492 525 0x40 0xa
[drm:drm_mode_debug_printmodeline] Modeline 37:"720x400" 70 28320 720 738 846 900 400 412 414 449 0x40 0x6
[drm:drm_setup_crtcs] 
[drm:drm_enable_connectors] connector 20 enabled? no
[drm:drm_enable_connectors] connector 23 enabled? yes
[drm:drm_target_preferred] looking for cmdline mode on connector 23
[drm:drm_target_preferred] looking for preferred mode on connector 23 0
[drm:drm_target_preferred] found mode 1024x768
[drm:drm_setup_crtcs] picking CRTCs for 4096x4096 config
[drm:drm_setup_crtcs] desired mode 1024x768 set on crtc 14 (0,0)
[drm:drm_crtc_helper_set_config] 
[drm:drm_crtc_helper_set_config] [CRTC:8] [NOFB]
[drm:vop_crtc_dpms] crtc[8] mode[3]
[drm:vop_crtc_dpms] desired dpms mode is same as previous one.
[drm:vop_crtc_dpms] crtc[14] mode[3]
[drm:vop_crtc_dpms] desired dpms mode is same as previous one.
[drm:drm_crtc_helper_set_config] 
[drm:drm_crtc_helper_set_config] [CRTC:14] [FB:24] #connectors=1 (x y) (0 0)
[drm:drm_crtc_helper_set_config] crtc has no fb, full mode set
[drm:drm_crtc_helper_set_config] modes are different, full mode set
[drm:drm_mode_debug_printmodeline] Modeline 0:"" 0 0 0 0 0 0 0 0 0 0 0x0 0x0
[drm:drm_mode_debug_printmodeline] Modeline 26:"1024x768" 85 94500 1024 1072 1168 1376 768 769 772 808 0x40 0x5
[drm:drm_crtc_helper_set_config] encoder changed, full mode switch
[drm:drm_crtc_helper_set_config] [CONNECTOR:20:HDMI-A-1] to [NOCRTC]
[drm:drm_crtc_helper_set_config] crtc changed, full mode switch
[drm:drm_crtc_helper_set_config] [CONNECTOR:23:VGA-1] to [CRTC:14]
[drm:drm_crtc_helper_set_config] attempting to set mode from userspace
[drm:drm_mode_debug_printmodeline] Modeline 26:"1024x768" 85 94500 1024 1072 1168 1376 768 769 772 808 0x40 0x5
[drm:drm_crtc_helper_set_mode] [CRTC:14]
[drm:vop_crtc_dpms] crtc[14] mode[0]
rockchip-vop ff930000.vop: Attached to iommu domain
[drm] processing encoder TMDS-18
[drm] processing encoder DAC-22
[drm] processing connector HDMI-A-1
[drm] processing connector VGA-1
[drm:vop_win_update] [PLANE:12] [FB:-1->24] update
rk_iommu ff930300.iommu: Page fault at 0x2d400500 of type read
rk_iommu ff930300.iommu: iova = 0x2d400500: dte_index: 0xb5 pte_index: 0x0 page_offset: 0x500
rk_iommu ff930300.iommu: mmu_dte_addr: 0x2e3b3000 dte@0x2e3b32d4: 0x000000 valid: 0 pte@0x00000000: 0x000000 valid: 0 page@0x00000000 flags: 0x0
[drm:drm_crtc_helper_set_mode] [ENCODER:22:DAC-22] set [MODE:26:1024x768]
[drm:drm_crtc_helper_set_config] Setting connector DPMS state to on
[drm:drm_crtc_helper_set_config] 	[CONNECTOR:23:VGA-1] set DPMS on
[drm:vop_crtc_dpms] crtc[8] mode[3]
[drm:vop_crtc_dpms] desired dpms mode is same as previous one.
[drm:drm_do_probe_ddc_edid] drm: skipping non-existent adapter rk3x-i2c
[drm:output_poll_execute] [CONNECTOR:23:VGA-1] status updated from connected to disconnected
[drm:drm_fb_helper_hotplug_event] 
[drm:drm_helper_probe_single_connector_modes_merge_bits] [CONNECTOR:20:HDMI-A-1]
[drm:drm_helper_probe_single_connector_modes_merge_bits] [CONNECTOR:20:HDMI-A-1] disconnected
[drm:drm_helper_probe_single_connector_modes_merge_bits] [CONNECTOR:23:VGA-1]
[drm:drm_do_probe_ddc_edid] drm: skipping non-existent adapter rk3x-i2c
[drm:drm_helper_probe_single_connector_modes_merge_bits] [CONNECTOR:23:VGA-1] disconnected
[drm:drm_setup_crtcs] 
[drm:drm_enable_connectors] connector 20 enabled? no
[drm:drm_enable_connectors] connector 23 enabled? no
[drm:drm_setup_crtcs] picking CRTCs for 4096x4096 config
[drm:drm_crtc_helper_set_config] 
[drm:drm_crtc_helper_set_config] [CRTC:8] [NOFB]
[drm:vop_crtc_dpms] crtc[8] mode[3]
[drm:vop_crtc_dpms] desired dpms mode is same as previous one.
[drm:drm_crtc_helper_set_config] 
[drm:drm_crtc_helper_set_config] [CRTC:14] [NOFB]
[drm:vop_crtc_dpms] crtc[8] mode[3]
[drm:vop_crtc_dpms] desired dpms mode is same as previous one.
[drm:vop_crtc_dpms] crtc[14] mode[3]
rk_iommu ff930300.iommu: Enable stall request timed out, status: 0x00004b
rk_iommu ff930300.iommu: Disable paging request timed out, status: 0x00004b
rockchip-vop ff930000.vop: Detached from iommu domain
rockchip-drm display-subsystem: bound ff940000.vop (ops vop_component_ops)
rockchip-drm display-subsystem: bound ff930000.vop (ops vop_component_ops)
dwhdmi-rockchip ff980000.hdmi: Detected HDMI controller 0x20:0xa:0xa0:0xc1
rockchip-drm display-subsystem: bound ff980000.hdmi (ops dw_hdmi_rockchip_ops)
rockchip-drm display-subsystem: bound vga-bridge (ops vga_bridge_ops)
[drm] Supports vblank timestamp caching Rev 2 (21.10.2013).
[drm] No driver support for vblank timestamp query.
[drm:vop_crtc_dpms] crtc[8] mode[3]
[drm:vop_crtc_dpms] desired dpms mode is same as previous one.
[drm:vop_crtc_dpms] crtc[14] mode[3]
[drm:vop_crtc_dpms] desired dpms mode is same as previous one.
[drm:drm_helper_probe_single_connector_modes_merge_bits] [CONNECTOR:20:HDMI-A-1]
[drm:drm_helper_probe_single_connector_modes_merge_bits] [CONNECTOR:20:HDMI-A-1] disconnected
[drm:drm_helper_probe_single_connector_modes_merge_bits] [CONNECTOR:23:VGA-1]
[drm:drm_helper_probe_single_connector_modes_merge_bits] [CONNECTOR:23:VGA-1] probed modes :
[drm:drm_mode_debug_printmodeline] Modeline 25:"1280x1024" 60 108000 1280 1328 1440 1688 1024 1025 1028 1066 0x48 0x5
[drm:drm_mode_debug_printmodeline] Modeline 37:"1280x1024" 75 135000 1280 1296 1440 1688 1024 1025 1028 1066 0x40 0x5
[drm:drm_mode_debug_printmodeline] Modeline 26:"1152x864" 75 108000 1152 1216 1344 1600 864 865 868 900 0x40 0x5
[drm:drm_mode_debug_printmodeline] Modeline 27:"1024x768" 85 94500 1024 1072 1168 1376 768 769 772 808 0x40 0x5
[drm:drm_mode_debug_printmodeline] Modeline 38:"1024x768" 75 78800 1024 1040 1136 1312 768 769 772 800 0x40 0x5
[drm:drm_mode_debug_printmodeline] Modeline 39:"1024x768" 70 75000 1024 1048 1184 1328 768 771 777 806 0x40 0xa
[drm:drm_mode_debug_printmodeline] Modeline 40:"1024x768" 60 65000 1024 1048 1184 1344 768 771 777 806 0x40 0xa
[drm:drm_mode_debug_printmodeline] Modeline 41:"832x624" 75 57284 832 864 928 1152 624 625 628 667 0x40 0xa
[drm:drm_mode_debug_printmodeline] Modeline 28:"800x600" 85 56250 800 832 896 1048 600 601 604 631 0x40 0x5
[drm:drm_mode_debug_printmodeline] Modeline 42:"800x600" 75 49500 800 816 896 1056 600 601 604 625 0x40 0x5
[drm:drm_mode_debug_printmodeline] Modeline 43:"800x600" 72 50000 800 856 976 1040 600 637 643 666 0x40 0x5
[drm:drm_mode_debug_printmodeline] Modeline 30:"800x600" 60 40000 800 840 968 1056 600 601 605 628 0x40 0x5
[drm:drm_mode_debug_printmodeline] Modeline 31:"800x600" 56 36000 800 824 896 1024 600 601 603 625 0x40 0x5
[drm:drm_mode_debug_printmodeline] Modeline 29:"640x480" 85 36000 640 696 752 832 480 481 484 509 0x40 0xa
[drm:drm_mode_debug_printmodeline] Modeline 32:"640x480" 75 31500 640 656 720 840 480 481 484 500 0x40 0xa
[drm:drm_mode_debug_printmodeline] Modeline 33:"640x480" 73 31500 640 664 704 832 480 489 491 520 0x40 0xa
[drm:drm_mode_debug_printmodeline] Modeline 34:"640x480" 67 30240 640 704 768 864 480 483 486 525 0x40 0xa
[drm:drm_mode_debug_printmodeline] Modeline 35:"640x480" 60 25200 640 656 752 800 480 490 492 525 0x40 0xa
[drm:drm_mode_debug_printmodeline] Modeline 36:"720x400" 70 28320 720 738 846 900 400 412 414 449 0x40 0x6
[drm:drm_setup_crtcs] 
[drm:drm_enable_connectors] connector 20 enabled? no
[drm:drm_enable_connectors] connector 23 enabled? yes
[drm:drm_target_preferred] looking for cmdline mode on connector 23
[drm:drm_target_preferred] looking for preferred mode on connector 23 0
[drm:drm_target_preferred] found mode 1280x1024
[drm:drm_setup_crtcs] picking CRTCs for 4096x4096 config
[drm:drm_setup_crtcs] desired mode 1280x1024 set on crtc 14 (0,0)
------------[ cut here ]------------
WARNING: CPU: 1 PID: 33 at mm/page_alloc.c:2645 __alloc_pages_nodemask+0x18c/0x6a8()
Modules linked in:
CPU: 1 PID: 33 Comm: kworker/u8:1 Not tainted 3.19.0-rc1+ #1512
Hardware name: Rockchip Cortex-A9 (Device Tree)
Workqueue: deferwq deferred_probe_work_func
[<c00148e4>] (unwind_backtrace) from [<c00111e0>] (show_stack+0x10/0x14)
[<c00111e0>] (show_stack) from [<c0426564>] (dump_stack+0x6c/0x84)
[<c0426564>] (dump_stack) from [<c0021f34>] (warn_slowpath_common+0x80/0xac)
[<c0021f34>] (warn_slowpath_common) from [<c0021f78>] (warn_slowpath_null+0x18/0x1c)
[<c0021f78>] (warn_slowpath_null) from [<c008b114>] (__alloc_pages_nodemask+0x18c/0x6a8)
[<c008b114>] (__alloc_pages_nodemask) from [<c001a274>] (__dma_alloc_buffer.isra.18+0x2c/0x80)
[<c001a274>] (__dma_alloc_buffer.isra.18) from [<c001a2dc>] (__alloc_remap_buffer.isra.22+0x14/0x5c)
[<c001a2dc>] (__alloc_remap_buffer.isra.22) from [<c001a490>] (__dma_alloc+0x16c/0x1d8)
[<c001a490>] (__dma_alloc) from [<c001a614>] (arm_dma_alloc+0x84/0x90)
[<c001a614>] (arm_dma_alloc) from [<c022f610>] (rockchip_gem_create_object+0x8c/0xc4)
[<c022f610>] (rockchip_gem_create_object) from [<c022f158>] (rockchip_drm_fbdev_create+0x6c/0x1ec)
[<c022f158>] (rockchip_drm_fbdev_create) from [<c021499c>] (drm_fb_helper_initial_config+0x230/0x328)
[<c021499c>] (drm_fb_helper_initial_config) from [<c022f388>] (rockchip_drm_fbdev_init+0xa4/0xc0)
[<c022f388>] (rockchip_drm_fbdev_init) from [<c022ea18>] (rockchip_drm_load+0x1b8/0x1f4)
[<c022ea18>] (rockchip_drm_load) from [<c021c218>] (drm_dev_register+0x80/0x100)
[<c021c218>] (drm_dev_register) from [<c022e77c>] (rockchip_drm_bind+0x48/0x74)
[<c022e77c>] (rockchip_drm_bind) from [<c0234a58>] (try_to_bring_up_master.part.2+0xa4/0xf4)
[<c0234a58>] (try_to_bring_up_master.part.2) from [<c0234c58>] (component_add+0x9c/0x104)
[<c0234c58>] (component_add) from [<c023a1e4>] (platform_drv_probe+0x48/0x90)
[<c023a1e4>] (platform_drv_probe) from [<c0238c30>] (driver_probe_device+0x130/0x340)
[<c0238c30>] (driver_probe_device) from [<c0237270>] (bus_for_each_drv+0x70/0x84)
[<c0237270>] (bus_for_each_drv) from [<c0238a8c>] (device_attach+0x64/0x88)
[<c0238a8c>] (device_attach) from [<c0238090>] (bus_probe_device+0x28/0x98)
[<c0238090>] (bus_probe_device) from [<c0238548>] (deferred_probe_work_func+0x78/0xa4)
[<c0238548>] (deferred_probe_work_func) from [<c0034008>] (process_one_work+0x1c8/0x2f4)
[<c0034008>] (process_one_work) from [<c0034448>] (worker_thread+0x2e8/0x450)
[<c0034448>] (worker_thread) from [<c00380a8>] (kthread+0xdc/0xf0)
[<c00380a8>] (kthread) from [<c000e8b8>] (ret_from_fork+0x14/0x3c)
---[ end trace ce23b3730f8c0d32 ]---
[drm:rockchip_gem_create_object] *ERROR* failed to allocate 0x500000 byte dma buffer
Will Deacon Jan. 14, 2015, 7:17 p.m. UTC | #3
On Wed, Jan 14, 2015 at 01:51:36PM +0000, Heiko Stübner wrote:
> Am Mittwoch, 14. Januar 2015, 10:46:10 schrieb Will Deacon:
> > On Wed, Jan 14, 2015 at 09:00:24AM +0000, Alexandre Courbot wrote:
> > > On 12/02/2014 01:57 AM, Will Deacon wrote:
> > > > This patch plumbs the existing ARM IOMMU DMA infrastructure (which isn't
> > > > actually called outside of a few drivers) into arch_setup_dma_ops, so
> > > > that we can use IOMMUs for DMA transfers in a more generic fashion.
> > > > 
> > > > Since this significantly complicates the arch_setup_dma_ops function,
> > > > it is moved out of line into dma-mapping.c. If CONFIG_ARM_DMA_USE_IOMMU
> > > > is not set, the iommu parameter is ignored and the normal ops are used
> > > > instead.
> > > 
> > > A series for IOMMU support with Tegra/Nouveau ceased to work after this
> > > patch.
> > 
> > Which series? This code shouldn't even be executed unless you start using
> > of_xlate and the generic bindings.
> > 
> > > The Tegra IOMMU is not registered by the time the DT is parsed,
> > > and thus all devices end up without the proper DMA ops set up because
> > > the phandle to the IOMMU cannot be resolved.
> > 
> > You might want to look at the patches posted for the exynos, renesas and ARM
> > SMMUs for some hints in how to use the new API.
> > 
> > > Subsequently calling arm_iommu_create_mapping() and
> > > arm_iommu_attach_device() from the driver (as I used to do until 3.18)
> > > does not help since the call to set_dma_ops() has been moved out of
> > > arm_iommu_attach_device(). Therefore there seems to be no way for a device
> > > to gets its correct DMA ops unless the IOMMU is ready by the time the DT
> > > is parsed.
> > > 
> > > Also potentially affected by this are the Rockchip DRM and OMAP3 ISP
> > > drivers, which follow the same pattern.
> > 
> > I don't understand why any code currently in mainline should be affected.
> > Please can you elaborate on the failure case?
> 
> As Alexandre suspected the new Rockchip drm code seems to be affected by
> this. I hadn't played with the drm code before last weekend and was then
> stumbling over different iommu related issues. As I hadn't to much contact
> with iommus till now I didn't get very far.
> 
> But with Alexandre's bandaid patch of adding
> 	set_dma_ops(dev, &iommu_ops);
> to arm_iommu_attach_device both problems go away.
> 
> 
> So to elaborate on the two failure cases:

Aha, I see what you mean now -- the issue is that attaching to an IOMMU
domain no longer swizzles the DMA ops. Furthermore, we also need to take
into account the coherency of the device, which we didn't do before (and
assumedly "worked" because all of the users happened to be non-coherent).

Maybe we just need to add something like arm_iommu_swizzle_dma_ops, so
that direct users of the arm_iommu_* API can manage things manually?

Will
Alexandre Courbot Jan. 15, 2015, 2:57 a.m. UTC | #4
On 01/14/2015 07:46 PM, Will Deacon wrote:
> Hi Alex,
>
> On Wed, Jan 14, 2015 at 09:00:24AM +0000, Alexandre Courbot wrote:
>> On 12/02/2014 01:57 AM, Will Deacon wrote:
>>> This patch plumbs the existing ARM IOMMU DMA infrastructure (which isn't
>>> actually called outside of a few drivers) into arch_setup_dma_ops, so
>>> that we can use IOMMUs for DMA transfers in a more generic fashion.
>>>
>>> Since this significantly complicates the arch_setup_dma_ops function,
>>> it is moved out of line into dma-mapping.c. If CONFIG_ARM_DMA_USE_IOMMU
>>> is not set, the iommu parameter is ignored and the normal ops are used
>>> instead.
>>
>> A series for IOMMU support with Tegra/Nouveau ceased to work after this
>> patch.
>
> Which series? This code shouldn't even be executed unless you start using
> of_xlate and the generic bindings.
>
>> The Tegra IOMMU is not registered by the time the DT is parsed,
>> and thus all devices end up without the proper DMA ops set up because
>> the phandle to the IOMMU cannot be resolved.
>
> You might want to look at the patches posted for the exynos, renesas and ARM
> SMMUs for some hints in how to use the new API.
>
>> Subsequently calling arm_iommu_create_mapping() and
>> arm_iommu_attach_device() from the driver (as I used to do until 3.18)
>> does not help since the call to set_dma_ops() has been moved out of
>> arm_iommu_attach_device(). Therefore there seems to be no way for a device
>> to gets its correct DMA ops unless the IOMMU is ready by the time the DT
>> is parsed.
>>
>> Also potentially affected by this are the Rockchip DRM and OMAP3 ISP
>> drivers, which follow the same pattern.
>
> I don't understand why any code currently in mainline should be affected.
> Please can you elaborate on the failure case?

Here is the sequence of events:

1) DT is populated and of_dma_configure() is called for every device. 
Here is the stack trace:

of_dma_configure
of_platform_device_create_pdata
of_platform_bus_create
of_platform_bus_create
of_platform_populate
customize_machine
do_one_initcall

of_dma_configure() is supposed to set the iommu_ops for every device 
(and this is currently the only place in the kernel where this can 
happen), but since the IOMMU is not ready yet it falls back to the 
default arm_dma_ops.

2) Driver calls arm_iommu_create_mapping() and arm_iommu_attach_device() 
(which were already called during step 1 btw). Both calls succeed, but 
since arm_iommu_attach_device() does not change the ops anymore we are 
still left with arm_dma_ops.

3) Things don't go well. :)

There are several problems with the way things are currently setup IMHO:

1) CONFIG_ARM_DMA_USE_IOMMU forcibly makes the DMA API go through the 
IOMMU. This effectively forbids the simultaneous use of the IOMMU API 
and DMA API because of address-space collisions issue. Being a kernel 
config option, there is no way to turn this behavior off which will 
likely become a problem for multi-configs where some platforms might 
want this and some other not.

2) IIUC arm_iommu_*() are now not supposed to be called directly by 
drivers anymore. At least that's what their current behavior makes me 
think, as well as the fact that they are not defined if 
CONFIG_ARM_DMA_USE_IOMMU is not set. Yet they are still public and 
drivers that use them have not been updated consequently.

3) This is a longer-standing issue, but IIUC the fact that the IOMMU VM 
used by the DMA API is not available effectively prevents anyone from 
using the DMA API behind a IOMMU and the IOMMU API simultaneously (which 
you now have to assume is always the case because of 1)).

These are huge issues - under the current conditions the only safe path 
for me is to eschew the DMA API completely and port my work to the IOMMU 
API, and I suspect this will affect other people as well.

>
>> This raises the following questions:
>>
>> 1) Why are arm_iommu_create_mapping() and arm_iommu_attach_device()
>> still public since they cannot set the DMA ops and thus seem to be
>> useless outside of arch_setup_dma_ops()?
>
> It has callers outside of the file. I'd like to make it static, but that
> means doing some non-trivial porting of all the callers, which I'm also
> unable to test.

Yeah, but all the callers of these functions are broken anyway as of 
3.18-rc4...

>
>> 2) Say you want to use the IOMMU API in your driver, and have an iommu
>> property in your device's DT node. If by chance your IOMMU is registered
>> early, you will already have a mapping automatically created even before
>> your probe function is called. Can this be avoided? Is it even safe?
>
> Currently, I think you have to either teardown the ops manually or return
> an error from of_xlate. Thierry was also looking at this sort of thing,
> so it might be worth talking to him.

Tearing down the ops manually does not sound like something drivers 
should do - how can they even be aware of the private dma_ops anyway?

Thanks,
Alex.
Thierry Reding Jan. 15, 2015, 8:28 a.m. UTC | #5
On Wed, Jan 14, 2015 at 10:46:10AM +0000, Will Deacon wrote:
> On Wed, Jan 14, 2015 at 09:00:24AM +0000, Alexandre Courbot wrote:
[...]
> > 2) Say you want to use the IOMMU API in your driver, and have an iommu 
> > property in your device's DT node. If by chance your IOMMU is registered 
> > early, you will already have a mapping automatically created even before 
> > your probe function is called. Can this be avoided? Is it even safe?
> 
> Currently, I think you have to either teardown the ops manually or return
> an error from of_xlate. Thierry was also looking at this sort of thing,
> so it might be worth talking to him.

I already explained in earlier threads why I think this is a bad idea.
It's completely unnatural for any driver to manually tear down something
that it didn't want set up in the first place. It also means that you
have to carefully audit any users of these IOMMU APIs to make sure that
they do tear down. That doesn't sound like a good incremental approach,
as evidenced by the breakage that Alex and Heiko have encountered.

The solution for me has been to completely side-step the issue and not
register the IOMMU with the new mechanism at all. That is, there's no
.of_xlate() implementation, which means that the ARM DMA API glue won't
try to be smart and use the IOMMU in ways it's not meant to be used.
This has several advantages, such as that I can also use the regular
driver model for suspend/resume of the IOMMU, and I get to enjoy the
benefits of devres in the IOMMU driver. Probe ordering is still a tiny
issue, but we can easily solve that using explicit initcall ordering
(which really isn't any worse than IOMMU_OF_DECLARE()).

Ideally of course I could use deferred probing to solve that. Hiroshi
and I posted patches to implement that over a year ago[0], but it was
mostly ignored after some, in my opinion, fruitful discussion.

That patch requires drivers to explicitly call iommu_attach() to
advertise that they want to use the IOMMU. This was done because a
previous attempt to call iommu_attach() from the driver core was
rejected on the sole argument that "it doesn't belong in the core",
without further explanation.

Note that with that approach things all happen at driver probe time, so
we can simply return -EPROBE_DEFER when the IOMMU isn't ready yet. At
the same time the IOMMU can be a regular driver just like any of the
other resource providers like clocks, regulators, etc.

That all said, I'm fine with the current situation, too. Nobody can
force the Tegra SMMU driver to register via IOMMU_OF_DECLARE() and I'm
actively recommending the use of the IOMMU API directly. That way we
get the control we need for drivers where it matters most (i.e. where
close control of the address space and switching of address spaces per
process are needed). Any other drivers, like USB or SDMMC simply use
the DMA API and get their buffers from a contiguous pool.

Thierry

[0]: https://lkml.org/lkml/2014/6/26/476
Thierry Reding Jan. 15, 2015, 8:30 a.m. UTC | #6
On Wed, Jan 14, 2015 at 07:17:50PM +0000, Will Deacon wrote:
> On Wed, Jan 14, 2015 at 01:51:36PM +0000, Heiko Stübner wrote:
> > Am Mittwoch, 14. Januar 2015, 10:46:10 schrieb Will Deacon:
> > > On Wed, Jan 14, 2015 at 09:00:24AM +0000, Alexandre Courbot wrote:
> > > > On 12/02/2014 01:57 AM, Will Deacon wrote:
> > > > > This patch plumbs the existing ARM IOMMU DMA infrastructure (which isn't
> > > > > actually called outside of a few drivers) into arch_setup_dma_ops, so
> > > > > that we can use IOMMUs for DMA transfers in a more generic fashion.
> > > > > 
> > > > > Since this significantly complicates the arch_setup_dma_ops function,
> > > > > it is moved out of line into dma-mapping.c. If CONFIG_ARM_DMA_USE_IOMMU
> > > > > is not set, the iommu parameter is ignored and the normal ops are used
> > > > > instead.
> > > > 
> > > > A series for IOMMU support with Tegra/Nouveau ceased to work after this
> > > > patch.
> > > 
> > > Which series? This code shouldn't even be executed unless you start using
> > > of_xlate and the generic bindings.
> > > 
> > > > The Tegra IOMMU is not registered by the time the DT is parsed,
> > > > and thus all devices end up without the proper DMA ops set up because
> > > > the phandle to the IOMMU cannot be resolved.
> > > 
> > > You might want to look at the patches posted for the exynos, renesas and ARM
> > > SMMUs for some hints in how to use the new API.
> > > 
> > > > Subsequently calling arm_iommu_create_mapping() and
> > > > arm_iommu_attach_device() from the driver (as I used to do until 3.18)
> > > > does not help since the call to set_dma_ops() has been moved out of
> > > > arm_iommu_attach_device(). Therefore there seems to be no way for a device
> > > > to gets its correct DMA ops unless the IOMMU is ready by the time the DT
> > > > is parsed.
> > > > 
> > > > Also potentially affected by this are the Rockchip DRM and OMAP3 ISP
> > > > drivers, which follow the same pattern.
> > > 
> > > I don't understand why any code currently in mainline should be affected.
> > > Please can you elaborate on the failure case?
> > 
> > As Alexandre suspected the new Rockchip drm code seems to be affected by
> > this. I hadn't played with the drm code before last weekend and was then
> > stumbling over different iommu related issues. As I hadn't to much contact
> > with iommus till now I didn't get very far.
> > 
> > But with Alexandre's bandaid patch of adding
> > 	set_dma_ops(dev, &iommu_ops);
> > to arm_iommu_attach_device both problems go away.
> > 
> > 
> > So to elaborate on the two failure cases:
> 
> Aha, I see what you mean now -- the issue is that attaching to an IOMMU
> domain no longer swizzles the DMA ops. Furthermore, we also need to take
> into account the coherency of the device, which we didn't do before (and
> assumedly "worked" because all of the users happened to be non-coherent).
> 
> Maybe we just need to add something like arm_iommu_swizzle_dma_ops, so
> that direct users of the arm_iommu_* API can manage things manually?

Why does this even have to be an ARM-specific API? Couldn't this equally
well be generic across all platforms so that we get unified handling of
IOMMU through the DMA API?

Thierry
Will Deacon Jan. 15, 2015, 11:12 a.m. UTC | #7
On Thu, Jan 15, 2015 at 08:28:44AM +0000, Thierry Reding wrote:
> On Wed, Jan 14, 2015 at 10:46:10AM +0000, Will Deacon wrote:
> > On Wed, Jan 14, 2015 at 09:00:24AM +0000, Alexandre Courbot wrote:
> [...]
> > > 2) Say you want to use the IOMMU API in your driver, and have an iommu 
> > > property in your device's DT node. If by chance your IOMMU is registered 
> > > early, you will already have a mapping automatically created even before 
> > > your probe function is called. Can this be avoided? Is it even safe?
> > 
> > Currently, I think you have to either teardown the ops manually or return
> > an error from of_xlate. Thierry was also looking at this sort of thing,
> > so it might be worth talking to him.
> 
> I already explained in earlier threads why I think this is a bad idea.
> It's completely unnatural for any driver to manually tear down something
> that it didn't want set up in the first place. It also means that you
> have to carefully audit any users of these IOMMU APIs to make sure that
> they do tear down. That doesn't sound like a good incremental approach,
> as evidenced by the breakage that Alex and Heiko have encountered.

Well, perhaps we hide that behind a get_iommu API or something. We *do*
need this manual teardown step to support things like VFIO, so it makes
sense to reuse it for other users too imo.

> The solution for me has been to completely side-step the issue and not
> register the IOMMU with the new mechanism at all. That is, there's no
> .of_xlate() implementation, which means that the ARM DMA API glue won't
> try to be smart and use the IOMMU in ways it's not meant to be used.
> This has several advantages, such as that I can also use the regular
> driver model for suspend/resume of the IOMMU, and I get to enjoy the
> benefits of devres in the IOMMU driver. Probe ordering is still a tiny
> issue, but we can easily solve that using explicit initcall ordering
> (which really isn't any worse than IOMMU_OF_DECLARE()).

That's a pity. I'd much rather extend what we currently have to satisfy
your use-case. Ho-hum.

Will
Will Deacon Jan. 15, 2015, 11:13 a.m. UTC | #8
On Thu, Jan 15, 2015 at 08:30:06AM +0000, Thierry Reding wrote:
> On Wed, Jan 14, 2015 at 07:17:50PM +0000, Will Deacon wrote:
> > On Wed, Jan 14, 2015 at 01:51:36PM +0000, Heiko Stübner wrote:
> > > As Alexandre suspected the new Rockchip drm code seems to be affected by
> > > this. I hadn't played with the drm code before last weekend and was then
> > > stumbling over different iommu related issues. As I hadn't to much contact
> > > with iommus till now I didn't get very far.
> > > 
> > > But with Alexandre's bandaid patch of adding
> > > 	set_dma_ops(dev, &iommu_ops);
> > > to arm_iommu_attach_device both problems go away.
> > > 
> > > 
> > > So to elaborate on the two failure cases:
> > 
> > Aha, I see what you mean now -- the issue is that attaching to an IOMMU
> > domain no longer swizzles the DMA ops. Furthermore, we also need to take
> > into account the coherency of the device, which we didn't do before (and
> > assumedly "worked" because all of the users happened to be non-coherent).
> > 
> > Maybe we just need to add something like arm_iommu_swizzle_dma_ops, so
> > that direct users of the arm_iommu_* API can manage things manually?
> 
> Why does this even have to be an ARM-specific API? Couldn't this equally
> well be generic across all platforms so that we get unified handling of
> IOMMU through the DMA API?

It's already an ARM-specific API. Having something generic would be great,
but I was thinking more about a short-term fix for the current issue rather
than implementing something brand new.

Will
Laurent Pinchart Jan. 15, 2015, 11:18 p.m. UTC | #9
On Thursday 15 January 2015 11:12:17 Will Deacon wrote:
> On Thu, Jan 15, 2015 at 08:28:44AM +0000, Thierry Reding wrote:
> > On Wed, Jan 14, 2015 at 10:46:10AM +0000, Will Deacon wrote:
> > > On Wed, Jan 14, 2015 at 09:00:24AM +0000, Alexandre Courbot wrote:
> > [...]
> > 
> >>> 2) Say you want to use the IOMMU API in your driver, and have an iommu
> >>> property in your device's DT node. If by chance your IOMMU is
> >>> registered early, you will already have a mapping automatically
> >>> created even before your probe function is called. Can this be
> >>> avoided? Is it even safe?
> >> 
> >> Currently, I think you have to either teardown the ops manually or
> >> return an error from of_xlate. Thierry was also looking at this sort of
> >> thing, so it might be worth talking to him.
> > 
> > I already explained in earlier threads why I think this is a bad idea.
> > It's completely unnatural for any driver to manually tear down something
> > that it didn't want set up in the first place. It also means that you
> > have to carefully audit any users of these IOMMU APIs to make sure that
> > they do tear down. That doesn't sound like a good incremental approach,
> > as evidenced by the breakage that Alex and Heiko have encountered.
> 
> Well, perhaps we hide that behind a get_iommu API or something. We *do*
> need this manual teardown step to support things like VFIO, so it makes
> sense to reuse it for other users too imo.
> 
> > The solution for me has been to completely side-step the issue and not
> > register the IOMMU with the new mechanism at all. That is, there's no
> > .of_xlate() implementation, which means that the ARM DMA API glue won't
> > try to be smart and use the IOMMU in ways it's not meant to be used.

That will break when someone will want to use the same IOMMU type for devices 
that use the DMA mapping API to hide the IOMMU. That might not be the case for 
your IOMMU today, but it's pretty fragile, we need to fix it.

> > This has several advantages, such as that I can also use the regular
> > driver model for suspend/resume of the IOMMU, and I get to enjoy the
> > benefits of devres in the IOMMU driver. Probe ordering is still a tiny
> > issue, but we can easily solve that using explicit initcall ordering
> > (which really isn't any worse than IOMMU_OF_DECLARE()).
> 
> That's a pity. I'd much rather extend what we currently have to satisfy
> your use-case. Ho-hum.

Assuming we want the IOMMU to be handled transparently for the majority of 
devices I only see two ways to fix this,

The first way is to create a default DMA mapping unconditionally and let 
drivers that can't live with it tear it down. That's what is implemented 
today.

The second way is to implement a mechanism to let drivers signal that they 
want to handle DMA mappings themselves. As the mappings need in the general 
case to be created before the probe function is called we can't signal this by 
calling a function in probe(). A new flag field for struct device_driver is a 
possible solution. This would however require delaying the creation of DMA 
mappings until right before probe time. Attaching to the IOMMU could be pushed 
to right before probe() as well, which would have the added benefit of making 
IOMMU driver implementable as real platform drivers.
Alexandre Courbot Jan. 18, 2015, 6:54 a.m. UTC | #10
On 01/16/2015 08:18 AM, Laurent Pinchart wrote:
> On Thursday 15 January 2015 11:12:17 Will Deacon wrote:
>> On Thu, Jan 15, 2015 at 08:28:44AM +0000, Thierry Reding wrote:
>>> On Wed, Jan 14, 2015 at 10:46:10AM +0000, Will Deacon wrote:
>>>> On Wed, Jan 14, 2015 at 09:00:24AM +0000, Alexandre Courbot wrote:
>>> [...]
>>>
>>>>> 2) Say you want to use the IOMMU API in your driver, and have an iommu
>>>>> property in your device's DT node. If by chance your IOMMU is
>>>>> registered early, you will already have a mapping automatically
>>>>> created even before your probe function is called. Can this be
>>>>> avoided? Is it even safe?
>>>>
>>>> Currently, I think you have to either teardown the ops manually or
>>>> return an error from of_xlate. Thierry was also looking at this sort of
>>>> thing, so it might be worth talking to him.
>>>
>>> I already explained in earlier threads why I think this is a bad idea.
>>> It's completely unnatural for any driver to manually tear down something
>>> that it didn't want set up in the first place. It also means that you
>>> have to carefully audit any users of these IOMMU APIs to make sure that
>>> they do tear down. That doesn't sound like a good incremental approach,
>>> as evidenced by the breakage that Alex and Heiko have encountered.
>>
>> Well, perhaps we hide that behind a get_iommu API or something. We *do*
>> need this manual teardown step to support things like VFIO, so it makes
>> sense to reuse it for other users too imo.
>>
>>> The solution for me has been to completely side-step the issue and not
>>> register the IOMMU with the new mechanism at all. That is, there's no
>>> .of_xlate() implementation, which means that the ARM DMA API glue won't
>>> try to be smart and use the IOMMU in ways it's not meant to be used.
>
> That will break when someone will want to use the same IOMMU type for devices
> that use the DMA mapping API to hide the IOMMU. That might not be the case for
> your IOMMU today, but it's pretty fragile, we need to fix it.
>
>>> This has several advantages, such as that I can also use the regular
>>> driver model for suspend/resume of the IOMMU, and I get to enjoy the
>>> benefits of devres in the IOMMU driver. Probe ordering is still a tiny
>>> issue, but we can easily solve that using explicit initcall ordering
>>> (which really isn't any worse than IOMMU_OF_DECLARE()).
>>
>> That's a pity. I'd much rather extend what we currently have to satisfy
>> your use-case. Ho-hum.
>
> Assuming we want the IOMMU to be handled transparently for the majority of
> devices I only see two ways to fix this,
>
> The first way is to create a default DMA mapping unconditionally and let
> drivers that can't live with it tear it down. That's what is implemented
> today.

I strongly support Thierry's point that drivers should not have to tear 
down things they don't need. The issue we are facing today is a very 
good illustration of why one should not have to do this.

Everybody hates to receive unsollicited email with a link that says "to 
unsubscribe, click here". Let's not import that unpleasant culture into 
the kernel.

I am arriving late in this discussion, but what is wrong with asking 
drivers to explicitly state that they want the DMA API to be backed by 
the IOMMU instead of forcibly making it work that way?

>
> The second way is to implement a mechanism to let drivers signal that they
> want to handle DMA mappings themselves. As the mappings need in the general
> case to be created before the probe function  is called

Sorry for being ignorant here, but why is that?

 > we can't signal this by
> calling a function in probe(). A new flag field for struct device_driver is a
> possible solution. This would however require delaying the creation of DMA
> mappings until right before probe time. Attaching to the IOMMU could be pushed
> to right before probe() as well, which would have the added benefit of making
> IOMMU driver implementable as real platform drivers.

Keeping the ability to write IOMMU drivers as platform drivers would be 
nice indeed.

The problem with the opt-out flag though is that one will need to check 
every single driver out there to see whether it stills behave correctly 
if its hardware is suddently put behind a IOMMU. Doing it the other way 
(a flag that enables IOMMU if available) sounds safer to me.

What we have right now is a mechanism that basically makes it impossible 
to use the DMA API on many ARM platforms if ARM_DMA_USE_IOMMU is set 
(and I suspect it would also make the IOMMU unusable as well, without 
any way to fix things). This is quite concerning.

Even more concerning is that -rc5 is about to be released and we have 
in-tree drivers (Rockchip DRM) that are not working as they should 
because of this patch. Will, what is your plan to fix this? Do we have 
stuff that absolutely depends on this patch? If not, can we just revert 
it until all these issues are solved?
Laurent Pinchart Jan. 18, 2015, 11:18 a.m. UTC | #11
Hi Alexandre,

On Sunday 18 January 2015 15:54:34 Alexandre Courbot wrote:
> On 01/16/2015 08:18 AM, Laurent Pinchart wrote:
> > On Thursday 15 January 2015 11:12:17 Will Deacon wrote:
> >> On Thu, Jan 15, 2015 at 08:28:44AM +0000, Thierry Reding wrote:
> >>> On Wed, Jan 14, 2015 at 10:46:10AM +0000, Will Deacon wrote:
> >>>> On Wed, Jan 14, 2015 at 09:00:24AM +0000, Alexandre Courbot wrote:
> >>> [...]
> >>> 
> >>>>> 2) Say you want to use the IOMMU API in your driver, and have an iommu
> >>>>> property in your device's DT node. If by chance your IOMMU is
> >>>>> registered early, you will already have a mapping automatically
> >>>>> created even before your probe function is called. Can this be
> >>>>> avoided? Is it even safe?
> >>>> 
> >>>> Currently, I think you have to either teardown the ops manually or
> >>>> return an error from of_xlate. Thierry was also looking at this sort of
> >>>> thing, so it might be worth talking to him.
> >>> 
> >>> I already explained in earlier threads why I think this is a bad idea.
> >>> It's completely unnatural for any driver to manually tear down something
> >>> that it didn't want set up in the first place. It also means that you
> >>> have to carefully audit any users of these IOMMU APIs to make sure that
> >>> they do tear down. That doesn't sound like a good incremental approach,
> >>> as evidenced by the breakage that Alex and Heiko have encountered.
> >> 
> >> Well, perhaps we hide that behind a get_iommu API or something. We *do*
> >> need this manual teardown step to support things like VFIO, so it makes
> >> sense to reuse it for other users too imo.
> >> 
> >>> The solution for me has been to completely side-step the issue and not
> >>> register the IOMMU with the new mechanism at all. That is, there's no
> >>> .of_xlate() implementation, which means that the ARM DMA API glue won't
> >>> try to be smart and use the IOMMU in ways it's not meant to be used.
> > 
> > That will break when someone will want to use the same IOMMU type for
> > devices that use the DMA mapping API to hide the IOMMU. That might not be
> > the case for your IOMMU today, but it's pretty fragile, we need to fix
> > it.
> > 
> >>> This has several advantages, such as that I can also use the regular
> >>> driver model for suspend/resume of the IOMMU, and I get to enjoy the
> >>> benefits of devres in the IOMMU driver. Probe ordering is still a tiny
> >>> issue, but we can easily solve that using explicit initcall ordering
> >>> (which really isn't any worse than IOMMU_OF_DECLARE()).
> >> 
> >> That's a pity. I'd much rather extend what we currently have to satisfy
> >> your use-case. Ho-hum.
> > 
> > Assuming we want the IOMMU to be handled transparently for the majority of
> > devices I only see two ways to fix this,
> > 
> > The first way is to create a default DMA mapping unconditionally and let
> > drivers that can't live with it tear it down. That's what is implemented
> > today.
> 
> I strongly support Thierry's point that drivers should not have to tear
> down things they don't need. The issue we are facing today is a very
> good illustration of why one should not have to do this.
> 
> Everybody hates to receive unsollicited email with a link that says "to
> unsubscribe, click here". Let's not import that unpleasant culture into
> the kernel.
> 
> I am arriving late in this discussion, but what is wrong with asking
> drivers to explicitly state that they want the DMA API to be backed by
> the IOMMU instead of forcibly making it work that way?

The vast majority of the drivers are not IOMMU-aware. We would thus need to 
add a call at the beginning of the probe function of nearly every driver that 
can perform DMA to state that the driver doesn't need to handle any IOMMU that 
might be present in the system itself. I don't think that's a better solution.

Explicitly tearing down mappings in drivers that want to manage IOMMUs isn't a 
solution I like either. A possibly better solution would be to call a function 
to state that the DMA mapping API shouldn't not handle IOMMUs. Something like

	dma_mapping_ignore_iommu(dev);

at the beginning of the probe function of such drivers could do. The function 
would perform behind the scene all operations needed to tear down everything 
that shouldn't have been set up.

> > The second way is to implement a mechanism to let drivers signal that they
> > want to handle DMA mappings themselves. As the mappings need in the
> > general case to be created before the probe function  is called
> 
> Sorry for being ignorant here, but why is that?

Because a driver can call the DMA mapping API in its probe function, to 
allocate DMA coherent memory for instance. We need to ensure that the DMA 
mapping IOMMU has set up the required IOMMU ops by that time. As explained 
above I don't like the idea of sprinkling explicit calls to initialize IOMMU 
support in the vast majority of drivers, especially when they shouldn't be 
IOMMU-aware, so we then need to initialize everything that is needed before 
the first call to the DMA mapping API.

> > we can't signal this by calling a function in probe(). A new flag field
> > for struct device_driver is a possible solution. This would however
> > require delaying the creation of DMA mappings until right before probe
> > time. Attaching to the IOMMU could be pushed to right before probe() as
> > well, which would have the added benefit of making IOMMU driver
> > implementable as real platform drivers.
> 
> Keeping the ability to write IOMMU drivers as platform drivers would be
> nice indeed.
> 
> The problem with the opt-out flag though is that one will need to check
> every single driver out there to see whether it stills behave correctly
> if its hardware is suddently put behind a IOMMU.

That's actually my default assumption :-) On ARM platforms at least, for many 
devices, whether an IOMMU is present or not is an integration property, not a 
property of the device. The same way a device shouldn't care about the exact 
bus topology that connects it to the CPU and memory, it shouldn't care about 
whether an IOMMU is present on that bus, except in special cases.

> Doing it the other way (a flag that enables IOMMU if available) sounds safer
> to me.
>
> What we have right now is a mechanism that basically makes it impossible
> to use the DMA API on many ARM platforms if ARM_DMA_USE_IOMMU is set
> (and I suspect it would also make the IOMMU unusable as well, without
> any way to fix things). This is quite concerning.
> 
> Even more concerning is that -rc5 is about to be released and we have
> in-tree drivers (Rockchip DRM) that are not working as they should
> because of this patch. Will, what is your plan to fix this? Do we have
> stuff that absolutely depends on this patch? If not, can we just revert
> it until all these issues are solved?
Will Deacon Jan. 19, 2015, 11:12 a.m. UTC | #12
On Sun, Jan 18, 2015 at 11:18:51AM +0000, Laurent Pinchart wrote:
> On Sunday 18 January 2015 15:54:34 Alexandre Courbot wrote:
> > On 01/16/2015 08:18 AM, Laurent Pinchart wrote:
> > > On Thursday 15 January 2015 11:12:17 Will Deacon wrote:
> > >> On Thu, Jan 15, 2015 at 08:28:44AM +0000, Thierry Reding wrote:
> > >>> On Wed, Jan 14, 2015 at 10:46:10AM +0000, Will Deacon wrote:
> > >>>> On Wed, Jan 14, 2015 at 09:00:24AM +0000, Alexandre Courbot wrote:
> > >>> [...]
> > >>> 
> > >>>>> 2) Say you want to use the IOMMU API in your driver, and have an iommu
> > >>>>> property in your device's DT node. If by chance your IOMMU is
> > >>>>> registered early, you will already have a mapping automatically
> > >>>>> created even before your probe function is called. Can this be
> > >>>>> avoided? Is it even safe?
> > >>>> 
> > >>>> Currently, I think you have to either teardown the ops manually or
> > >>>> return an error from of_xlate. Thierry was also looking at this sort of
> > >>>> thing, so it might be worth talking to him.
> > >>> 
> > >>> I already explained in earlier threads why I think this is a bad idea.
> > >>> It's completely unnatural for any driver to manually tear down something
> > >>> that it didn't want set up in the first place. It also means that you
> > >>> have to carefully audit any users of these IOMMU APIs to make sure that
> > >>> they do tear down. That doesn't sound like a good incremental approach,
> > >>> as evidenced by the breakage that Alex and Heiko have encountered.
> > >> 
> > >> Well, perhaps we hide that behind a get_iommu API or something. We *do*
> > >> need this manual teardown step to support things like VFIO, so it makes
> > >> sense to reuse it for other users too imo.
> > >> 
> > >>> The solution for me has been to completely side-step the issue and not
> > >>> register the IOMMU with the new mechanism at all. That is, there's no
> > >>> .of_xlate() implementation, which means that the ARM DMA API glue won't
> > >>> try to be smart and use the IOMMU in ways it's not meant to be used.
> > > 
> > > That will break when someone will want to use the same IOMMU type for
> > > devices that use the DMA mapping API to hide the IOMMU. That might not be
> > > the case for your IOMMU today, but it's pretty fragile, we need to fix
> > > it.
> > > 
> > >>> This has several advantages, such as that I can also use the regular
> > >>> driver model for suspend/resume of the IOMMU, and I get to enjoy the
> > >>> benefits of devres in the IOMMU driver. Probe ordering is still a tiny
> > >>> issue, but we can easily solve that using explicit initcall ordering
> > >>> (which really isn't any worse than IOMMU_OF_DECLARE()).
> > >> 
> > >> That's a pity. I'd much rather extend what we currently have to satisfy
> > >> your use-case. Ho-hum.
> > > 
> > > Assuming we want the IOMMU to be handled transparently for the majority of
> > > devices I only see two ways to fix this,
> > > 
> > > The first way is to create a default DMA mapping unconditionally and let
> > > drivers that can't live with it tear it down. That's what is implemented
> > > today.
> > 
> > I strongly support Thierry's point that drivers should not have to tear
> > down things they don't need. The issue we are facing today is a very
> > good illustration of why one should not have to do this.
> > 
> > Everybody hates to receive unsollicited email with a link that says "to
> > unsubscribe, click here". Let's not import that unpleasant culture into
> > the kernel.
> > 
> > I am arriving late in this discussion, but what is wrong with asking
> > drivers to explicitly state that they want the DMA API to be backed by
> > the IOMMU instead of forcibly making it work that way?
> 
> The vast majority of the drivers are not IOMMU-aware. We would thus need to 
> add a call at the beginning of the probe function of nearly every driver that 
> can perform DMA to state that the driver doesn't need to handle any IOMMU that 
> might be present in the system itself. I don't think that's a better solution.
> 
> Explicitly tearing down mappings in drivers that want to manage IOMMUs isn't a 
> solution I like either. A possibly better solution would be to call a function 
> to state that the DMA mapping API shouldn't not handle IOMMUs. Something like
> 
> 	dma_mapping_ignore_iommu(dev);
> 
> at the beginning of the probe function of such drivers could do. The function 
> would perform behind the scene all operations needed to tear down everything 
> that shouldn't have been set up.

An alternative would be to add a flag to platform_driver, like we have for
"prevent_deferred_probe" which is something like "prevent_dma_configure".

For the moment, that would actually teardown the DMA configuration in
platform_drv_probe, but if things are reordering in future then we can avoid
setting up the ops altogether without an API change.

Will
Laurent Pinchart Jan. 19, 2015, 11:34 a.m. UTC | #13
Hi Will,

On Monday 19 January 2015 11:12:02 Will Deacon wrote:
> On Sun, Jan 18, 2015 at 11:18:51AM +0000, Laurent Pinchart wrote:
> > On Sunday 18 January 2015 15:54:34 Alexandre Courbot wrote:
> >> On 01/16/2015 08:18 AM, Laurent Pinchart wrote:
> >>> On Thursday 15 January 2015 11:12:17 Will Deacon wrote:
> >>>> On Thu, Jan 15, 2015 at 08:28:44AM +0000, Thierry Reding wrote:
> >>>>> On Wed, Jan 14, 2015 at 10:46:10AM +0000, Will Deacon wrote:
> >>>>>> On Wed, Jan 14, 2015 at 09:00:24AM +0000, Alexandre Courbot wrote:
> >>>>> [...]
> >>>>> 
> >>>>>>> 2) Say you want to use the IOMMU API in your driver, and have an
> >>>>>>> iommu property in your device's DT node. If by chance your IOMMU
> >>>>>>> is registered early, you will already have a mapping automatically
> >>>>>>> created even before your probe function is called. Can this be
> >>>>>>> avoided? Is it even safe?
> >>>>>> 
> >>>>>> Currently, I think you have to either teardown the ops manually or
> >>>>>> return an error from of_xlate. Thierry was also looking at this
> >>>>>> sort of thing, so it might be worth talking to him.
> >>>>> 
> >>>>> I already explained in earlier threads why I think this is a bad
> >>>>> idea. It's completely unnatural for any driver to manually tear down
> >>>>> something that it didn't want set up in the first place. It also
> >>>>> means that you have to carefully audit any users of these IOMMU APIs
> >>>>> to make sure that they do tear down. That doesn't sound like a good
> >>>>> incremental approach, as evidenced by the breakage that Alex and
> >>>>> Heiko have encountered.
> >>>> 
> >>>> Well, perhaps we hide that behind a get_iommu API or something. We
> >>>> *do* need this manual teardown step to support things like VFIO, so
> >>>> it makes sense to reuse it for other users too imo.
> >>>> 
> >>>>> The solution for me has been to completely side-step the issue and
> >>>>> not register the IOMMU with the new mechanism at all. That is,
> >>>>> there's no .of_xlate() implementation, which means that the ARM DMA
> >>>>> API glue won't try to be smart and use the IOMMU in ways it's not
> >>>>> meant to be used.
> >>> 
> >>> That will break when someone will want to use the same IOMMU type for
> >>> devices that use the DMA mapping API to hide the IOMMU. That might not
> >>> be the case for your IOMMU today, but it's pretty fragile, we need to
> >>> fix it.
> >>> 
> >>>>> This has several advantages, such as that I can also use the regular
> >>>>> driver model for suspend/resume of the IOMMU, and I get to enjoy the
> >>>>> benefits of devres in the IOMMU driver. Probe ordering is still a
> >>>>> tiny issue, but we can easily solve that using explicit initcall
> >>>>> ordering (which really isn't any worse than IOMMU_OF_DECLARE()).
> >>>> 
> >>>> That's a pity. I'd much rather extend what we currently have to
> >>>> satisfy your use-case. Ho-hum.
> >>> 
> >>> Assuming we want the IOMMU to be handled transparently for the
> >>> majority of devices I only see two ways to fix this,
> >>> 
> >>> The first way is to create a default DMA mapping unconditionally and
> >>> let drivers that can't live with it tear it down. That's what is
> >>> implemented today.
> >> 
> >> I strongly support Thierry's point that drivers should not have to tear
> >> down things they don't need. The issue we are facing today is a very
> >> good illustration of why one should not have to do this.
> >> 
> >> Everybody hates to receive unsollicited email with a link that says "to
> >> unsubscribe, click here". Let's not import that unpleasant culture into
> >> the kernel.
> >> 
> >> I am arriving late in this discussion, but what is wrong with asking
> >> drivers to explicitly state that they want the DMA API to be backed by
> >> the IOMMU instead of forcibly making it work that way?
> > 
> > The vast majority of the drivers are not IOMMU-aware. We would thus need
> > to add a call at the beginning of the probe function of nearly every
> > driver that can perform DMA to state that the driver doesn't need to
> > handle any IOMMU that might be present in the system itself. I don't think
> > that's a better solution.
> > 
> > Explicitly tearing down mappings in drivers that want to manage IOMMUs
> > isn't a solution I like either. A possibly better solution would be to
> > call a function to state that the DMA mapping API shouldn't not handle
> > IOMMUs. Something like
> >
> > 	dma_mapping_ignore_iommu(dev);
> > 
> > at the beginning of the probe function of such drivers could do. The
> > function would perform behind the scene all operations needed to tear
> > down everything that shouldn't have been set up.
> 
> An alternative would be to add a flag to platform_driver, like we have for
> "prevent_deferred_probe" which is something like "prevent_dma_configure".

That's a solution I have proposed (albeit as a struct device_driver field, but 
that's a small detail), so I'm fine with it :-)

> For the moment, that would actually teardown the DMA configuration in
> platform_drv_probe, but if things are reordering in future then we can avoid
> setting up the ops altogether without an API change.
Thierry Reding Jan. 19, 2015, 12:31 p.m. UTC | #14
On Mon, Jan 19, 2015 at 01:34:24PM +0200, Laurent Pinchart wrote:
> Hi Will,
> 
> On Monday 19 January 2015 11:12:02 Will Deacon wrote:
> > On Sun, Jan 18, 2015 at 11:18:51AM +0000, Laurent Pinchart wrote:
> > > On Sunday 18 January 2015 15:54:34 Alexandre Courbot wrote:
> > >> On 01/16/2015 08:18 AM, Laurent Pinchart wrote:
> > >>> On Thursday 15 January 2015 11:12:17 Will Deacon wrote:
> > >>>> On Thu, Jan 15, 2015 at 08:28:44AM +0000, Thierry Reding wrote:
> > >>>>> On Wed, Jan 14, 2015 at 10:46:10AM +0000, Will Deacon wrote:
> > >>>>>> On Wed, Jan 14, 2015 at 09:00:24AM +0000, Alexandre Courbot wrote:
> > >>>>> [...]
> > >>>>> 
> > >>>>>>> 2) Say you want to use the IOMMU API in your driver, and have an
> > >>>>>>> iommu property in your device's DT node. If by chance your IOMMU
> > >>>>>>> is registered early, you will already have a mapping automatically
> > >>>>>>> created even before your probe function is called. Can this be
> > >>>>>>> avoided? Is it even safe?
> > >>>>>> 
> > >>>>>> Currently, I think you have to either teardown the ops manually or
> > >>>>>> return an error from of_xlate. Thierry was also looking at this
> > >>>>>> sort of thing, so it might be worth talking to him.
> > >>>>> 
> > >>>>> I already explained in earlier threads why I think this is a bad
> > >>>>> idea. It's completely unnatural for any driver to manually tear down
> > >>>>> something that it didn't want set up in the first place. It also
> > >>>>> means that you have to carefully audit any users of these IOMMU APIs
> > >>>>> to make sure that they do tear down. That doesn't sound like a good
> > >>>>> incremental approach, as evidenced by the breakage that Alex and
> > >>>>> Heiko have encountered.
> > >>>> 
> > >>>> Well, perhaps we hide that behind a get_iommu API or something. We
> > >>>> *do* need this manual teardown step to support things like VFIO, so
> > >>>> it makes sense to reuse it for other users too imo.
> > >>>> 
> > >>>>> The solution for me has been to completely side-step the issue and
> > >>>>> not register the IOMMU with the new mechanism at all. That is,
> > >>>>> there's no .of_xlate() implementation, which means that the ARM DMA
> > >>>>> API glue won't try to be smart and use the IOMMU in ways it's not
> > >>>>> meant to be used.
> > >>> 
> > >>> That will break when someone will want to use the same IOMMU type for
> > >>> devices that use the DMA mapping API to hide the IOMMU. That might not
> > >>> be the case for your IOMMU today, but it's pretty fragile, we need to
> > >>> fix it.
> > >>> 
> > >>>>> This has several advantages, such as that I can also use the regular
> > >>>>> driver model for suspend/resume of the IOMMU, and I get to enjoy the
> > >>>>> benefits of devres in the IOMMU driver. Probe ordering is still a
> > >>>>> tiny issue, but we can easily solve that using explicit initcall
> > >>>>> ordering (which really isn't any worse than IOMMU_OF_DECLARE()).
> > >>>> 
> > >>>> That's a pity. I'd much rather extend what we currently have to
> > >>>> satisfy your use-case. Ho-hum.
> > >>> 
> > >>> Assuming we want the IOMMU to be handled transparently for the
> > >>> majority of devices I only see two ways to fix this,
> > >>> 
> > >>> The first way is to create a default DMA mapping unconditionally and
> > >>> let drivers that can't live with it tear it down. That's what is
> > >>> implemented today.
> > >> 
> > >> I strongly support Thierry's point that drivers should not have to tear
> > >> down things they don't need. The issue we are facing today is a very
> > >> good illustration of why one should not have to do this.
> > >> 
> > >> Everybody hates to receive unsollicited email with a link that says "to
> > >> unsubscribe, click here". Let's not import that unpleasant culture into
> > >> the kernel.
> > >> 
> > >> I am arriving late in this discussion, but what is wrong with asking
> > >> drivers to explicitly state that they want the DMA API to be backed by
> > >> the IOMMU instead of forcibly making it work that way?
> > > 
> > > The vast majority of the drivers are not IOMMU-aware. We would thus need
> > > to add a call at the beginning of the probe function of nearly every
> > > driver that can perform DMA to state that the driver doesn't need to
> > > handle any IOMMU that might be present in the system itself. I don't think
> > > that's a better solution.
> > > 
> > > Explicitly tearing down mappings in drivers that want to manage IOMMUs
> > > isn't a solution I like either. A possibly better solution would be to
> > > call a function to state that the DMA mapping API shouldn't not handle
> > > IOMMUs. Something like
> > >
> > > 	dma_mapping_ignore_iommu(dev);
> > > 
> > > at the beginning of the probe function of such drivers could do. The
> > > function would perform behind the scene all operations needed to tear
> > > down everything that shouldn't have been set up.
> > 
> > An alternative would be to add a flag to platform_driver, like we have for
> > "prevent_deferred_probe" which is something like "prevent_dma_configure".
> 
> That's a solution I have proposed (albeit as a struct device_driver field, but 
> that's a small detail), so I'm fine with it :-)

I think Marek had proposed something similar initially as well. I don't
see an immediate downside to that solution. It's still somewhat ugly in
that a lot of stuff is set up before it's known to actually be used at
all, but it seems like there's some consensus that this can be improved
later on, so I have no objections to such a patch.

Of course that doesn't solve the current breakage for the Rockchip DRM
and OMAP ISP drivers.

There's an additional issue with this automatic type of setting up
mapping: on Tegra quite a few devices can use the IOMMU. Among those are
SD/MMC controllers, HDA, SATA and things like the AHB bus. Creating a
separate domain (== address space) for each of those devices is wasteful
and there's currently no way to make them use a single virtual address
space. Each of them really has no use for a full 4 GiB of address space.
My earliest attempt to implement that was as a policy decision within
the IOMMU driver, but that wasn't very well received.

I'm now wondering whether the same could be done using this new type of
flag. Perhaps it can be assumed that every driver that doesn't want its
IOMMU master hooked up to the DMA API can (and will) manually manage the
virtual address space. Conversely every driver that wants to use the DMA
API to abstract away the existence of an IOMMU could be considered part
of the group where a separate domain isn't necessary.

Thierry
Thierry Reding Jan. 19, 2015, 12:36 p.m. UTC | #15
On Fri, Jan 16, 2015 at 01:18:21AM +0200, Laurent Pinchart wrote:
> On Thursday 15 January 2015 11:12:17 Will Deacon wrote:
> > On Thu, Jan 15, 2015 at 08:28:44AM +0000, Thierry Reding wrote:
> > > On Wed, Jan 14, 2015 at 10:46:10AM +0000, Will Deacon wrote:
> > > > On Wed, Jan 14, 2015 at 09:00:24AM +0000, Alexandre Courbot wrote:
> > > [...]
> > > 
> > >>> 2) Say you want to use the IOMMU API in your driver, and have an iommu
> > >>> property in your device's DT node. If by chance your IOMMU is
> > >>> registered early, you will already have a mapping automatically
> > >>> created even before your probe function is called. Can this be
> > >>> avoided? Is it even safe?
> > >> 
> > >> Currently, I think you have to either teardown the ops manually or
> > >> return an error from of_xlate. Thierry was also looking at this sort of
> > >> thing, so it might be worth talking to him.
> > > 
> > > I already explained in earlier threads why I think this is a bad idea.
> > > It's completely unnatural for any driver to manually tear down something
> > > that it didn't want set up in the first place. It also means that you
> > > have to carefully audit any users of these IOMMU APIs to make sure that
> > > they do tear down. That doesn't sound like a good incremental approach,
> > > as evidenced by the breakage that Alex and Heiko have encountered.
> > 
> > Well, perhaps we hide that behind a get_iommu API or something. We *do*
> > need this manual teardown step to support things like VFIO, so it makes
> > sense to reuse it for other users too imo.
> > 
> > > The solution for me has been to completely side-step the issue and not
> > > register the IOMMU with the new mechanism at all. That is, there's no
> > > .of_xlate() implementation, which means that the ARM DMA API glue won't
> > > try to be smart and use the IOMMU in ways it's not meant to be used.
> 
> That will break when someone will want to use the same IOMMU type for devices 
> that use the DMA mapping API to hide the IOMMU. That might not be the case for 
> your IOMMU today, but it's pretty fragile, we need to fix it.

No, there's absolutely no issue here. It simply means that you can't do
this on Tegra. So far I'm not sure I even see an advantage in using the
IOMMU for devices that don't care about it anyway. Consider the example
of the SD/MMC or HDA. They typically allocate fairly small buffers, the
order of a single page typically. They can simply use memory handed out
by the CMA.

So as long as we don't add a .of_xlate() implementation or instantiate
via the IOMMU_OF_DECLARE() mechanism we simply don't support IOMMU-over-
DMA on Tegra.

Thierry
Thierry Reding Jan. 19, 2015, 12:43 p.m. UTC | #16
On Sun, Jan 18, 2015 at 01:18:51PM +0200, Laurent Pinchart wrote:
> Hi Alexandre,
> 
> On Sunday 18 January 2015 15:54:34 Alexandre Courbot wrote:
> > On 01/16/2015 08:18 AM, Laurent Pinchart wrote:
[...]
> > > The second way is to implement a mechanism to let drivers signal that they
> > > want to handle DMA mappings themselves. As the mappings need in the
> > > general case to be created before the probe function  is called
> > 
> > Sorry for being ignorant here, but why is that?
> 
> Because a driver can call the DMA mapping API in its probe function, to 
> allocate DMA coherent memory for instance. We need to ensure that the DMA 
> mapping IOMMU has set up the required IOMMU ops by that time. As explained 
> above I don't like the idea of sprinkling explicit calls to initialize IOMMU 
> support in the vast majority of drivers, especially when they shouldn't be 
> IOMMU-aware, so we then need to initialize everything that is needed before 
> the first call to the DMA mapping API.

The original patch that Hiroshi posted based on my comments was to have
the driver core call iommu_attach(), which would then set up everything
needed right before calling into the driver's ->probe(). This works
quite nicely because by definition the driver can't allocate any DMA
before ->probe(). And, like you said, this allows deferred probe to be
used.

To me it's so obviously the right solution that I remain flabbergasted
with how much resistance it's received (or how much it's being ignored).

Thierry
Thierry Reding Jan. 19, 2015, 12:49 p.m. UTC | #17
On Fri, Jan 16, 2015 at 01:18:21AM +0200, Laurent Pinchart wrote:
[...]
> The second way is to implement a mechanism to let drivers signal that they 
> want to handle DMA mappings themselves. As the mappings need in the general 
> case to be created before the probe function is called we can't signal this by 
> calling a function in probe(). A new flag field for struct device_driver is a 
> possible solution. This would however require delaying the creation of DMA 
> mappings until right before probe time. Attaching to the IOMMU could be pushed 
> to right before probe() as well, which would have the added benefit of making 
> IOMMU driver implementable as real platform drivers.

Right. This is a pretty important point, too. One of the things that
we've been working on is suspend/resume. Now if you don't have a struct
device you can't easily implement suspend/resume. You'd have to play
tricks like using syscore_ops, which then leads to potentially problems
with suspend/resume ordering. It also means we have to keep around
global variables for driver-private data because there's no struct
device to attach it to.

By properly encoding the dependencies via deferred probe we get the
proper ordering and we can use the regular driver model with all the
goodies that we've come up with over the years.

Thierry
Will Deacon Jan. 19, 2015, 12:50 p.m. UTC | #18
On Mon, Jan 19, 2015 at 12:43:06PM +0000, Thierry Reding wrote:
> On Sun, Jan 18, 2015 at 01:18:51PM +0200, Laurent Pinchart wrote:
> > On Sunday 18 January 2015 15:54:34 Alexandre Courbot wrote:
> > > On 01/16/2015 08:18 AM, Laurent Pinchart wrote:
> [...]
> > > > The second way is to implement a mechanism to let drivers signal that they
> > > > want to handle DMA mappings themselves. As the mappings need in the
> > > > general case to be created before the probe function  is called
> > > 
> > > Sorry for being ignorant here, but why is that?
> > 
> > Because a driver can call the DMA mapping API in its probe function, to 
> > allocate DMA coherent memory for instance. We need to ensure that the DMA 
> > mapping IOMMU has set up the required IOMMU ops by that time. As explained 
> > above I don't like the idea of sprinkling explicit calls to initialize IOMMU 
> > support in the vast majority of drivers, especially when they shouldn't be 
> > IOMMU-aware, so we then need to initialize everything that is needed before 
> > the first call to the DMA mapping API.
> 
> The original patch that Hiroshi posted based on my comments was to have
> the driver core call iommu_attach(), which would then set up everything
> needed right before calling into the driver's ->probe(). This works
> quite nicely because by definition the driver can't allocate any DMA
> before ->probe(). And, like you said, this allows deferred probe to be
> used.
> 
> To me it's so obviously the right solution that I remain flabbergasted
> with how much resistance it's received (or how much it's being ignored).

Have you considered reposting the patches based on what we currently have
(which has the advantage of identifying a specific IOMMU instance)?

Will
Thierry Reding Jan. 19, 2015, 1:36 p.m. UTC | #19
On Mon, Jan 19, 2015 at 12:50:52PM +0000, Will Deacon wrote:
> On Mon, Jan 19, 2015 at 12:43:06PM +0000, Thierry Reding wrote:
> > On Sun, Jan 18, 2015 at 01:18:51PM +0200, Laurent Pinchart wrote:
> > > On Sunday 18 January 2015 15:54:34 Alexandre Courbot wrote:
> > > > On 01/16/2015 08:18 AM, Laurent Pinchart wrote:
> > [...]
> > > > > The second way is to implement a mechanism to let drivers signal that they
> > > > > want to handle DMA mappings themselves. As the mappings need in the
> > > > > general case to be created before the probe function  is called
> > > > 
> > > > Sorry for being ignorant here, but why is that?
> > > 
> > > Because a driver can call the DMA mapping API in its probe function, to 
> > > allocate DMA coherent memory for instance. We need to ensure that the DMA 
> > > mapping IOMMU has set up the required IOMMU ops by that time. As explained 
> > > above I don't like the idea of sprinkling explicit calls to initialize IOMMU 
> > > support in the vast majority of drivers, especially when they shouldn't be 
> > > IOMMU-aware, so we then need to initialize everything that is needed before 
> > > the first call to the DMA mapping API.
> > 
> > The original patch that Hiroshi posted based on my comments was to have
> > the driver core call iommu_attach(), which would then set up everything
> > needed right before calling into the driver's ->probe(). This works
> > quite nicely because by definition the driver can't allocate any DMA
> > before ->probe(). And, like you said, this allows deferred probe to be
> > used.
> > 
> > To me it's so obviously the right solution that I remain flabbergasted
> > with how much resistance it's received (or how much it's being ignored).
> 
> Have you considered reposting the patches based on what we currently have
> (which has the advantage of identifying a specific IOMMU instance)?

No, I hadn't. Initially my patches included a solution for identifying
individual IOMMU instances, too. There was a small registry with a list
of struct iommus. That was supposed to get used to store per-instance
data (by being embedded in a driver-specific structure). I'd need to
look in more detail how that could be done with the infrastructure that
your patchset creates. I'm somewhat burried below other tasks right now
so don't expect to have any time to look into this anytime before -rc6
or -rc7 at the earliest.

Thierry
Arnd Bergmann Jan. 19, 2015, 3:52 p.m. UTC | #20
On Monday 19 January 2015 13:36:24 Thierry Reding wrote:
> On Fri, Jan 16, 2015 at 01:18:21AM +0200, Laurent Pinchart wrote:
> > On Thursday 15 January 2015 11:12:17 Will Deacon wrote:
> > > On Thu, Jan 15, 2015 at 08:28:44AM +0000, Thierry Reding wrote:
> > > > On Wed, Jan 14, 2015 at 10:46:10AM +0000, Will Deacon wrote:
> > > > > On Wed, Jan 14, 2015 at 09:00:24AM +0000, Alexandre Courbot wrote:
> > > > [...]
> > > > 
> > > >>> 2) Say you want to use the IOMMU API in your driver, and have an iommu
> > > >>> property in your device's DT node. If by chance your IOMMU is
> > > >>> registered early, you will already have a mapping automatically
> > > >>> created even before your probe function is called. Can this be
> > > >>> avoided? Is it even safe?
> > > >> 
> > > >> Currently, I think you have to either teardown the ops manually or
> > > >> return an error from of_xlate. Thierry was also looking at this sort of
> > > >> thing, so it might be worth talking to him.
> > > > 
> > > > I already explained in earlier threads why I think this is a bad idea.
> > > > It's completely unnatural for any driver to manually tear down something
> > > > that it didn't want set up in the first place. It also means that you
> > > > have to carefully audit any users of these IOMMU APIs to make sure that
> > > > they do tear down. That doesn't sound like a good incremental approach,
> > > > as evidenced by the breakage that Alex and Heiko have encountered.
> > > 
> > > Well, perhaps we hide that behind a get_iommu API or something. We *do*
> > > need this manual teardown step to support things like VFIO, so it makes
> > > sense to reuse it for other users too imo.
> > > 
> > > > The solution for me has been to completely side-step the issue and not
> > > > register the IOMMU with the new mechanism at all. That is, there's no
> > > > .of_xlate() implementation, which means that the ARM DMA API glue won't
> > > > try to be smart and use the IOMMU in ways it's not meant to be used.
> > 
> > That will break when someone will want to use the same IOMMU type for devices 
> > that use the DMA mapping API to hide the IOMMU. That might not be the case for 
> > your IOMMU today, but it's pretty fragile, we need to fix it.
> 
> No, there's absolutely no issue here. It simply means that you can't do
> this on Tegra. So far I'm not sure I even see an advantage in using the
> IOMMU for devices that don't care about it anyway. Consider the example
> of the SD/MMC or HDA. They typically allocate fairly small buffers, the
> order of a single page typically. They can simply use memory handed out
> by the CMA.
> 
> So as long as we don't add a .of_xlate() implementation or instantiate
> via the IOMMU_OF_DECLARE() mechanism we simply don't support IOMMU-over-
> DMA on Tegra.

It breaks as soon as you have a system with memory above the 4GB boundary,
which is the whole point of iommus for most users.

CMA does not work for streaming mappings, only for the coherent API.

	Arnd
Arnd Bergmann Jan. 19, 2015, 4:13 p.m. UTC | #21
On Sunday 18 January 2015 13:18:51 Laurent Pinchart wrote:
> On Sunday 18 January 2015 15:54:34 Alexandre Courbot wrote:
> > On 01/16/2015 08:18 AM, Laurent Pinchart wrote:
> > > On Thursday 15 January 2015 11:12:17 Will Deacon wrote:
> > >> On Thu, Jan 15, 2015 at 08:28:44AM +0000, Thierry Reding wrote:
> > >>> This has several advantages, such as that I can also use the regular
> > >>> driver model for suspend/resume of the IOMMU, and I get to enjoy the
> > >>> benefits of devres in the IOMMU driver. Probe ordering is still a tiny
> > >>> issue, but we can easily solve that using explicit initcall ordering
> > >>> (which really isn't any worse than IOMMU_OF_DECLARE()).
> > >> 
> > >> That's a pity. I'd much rather extend what we currently have to satisfy
> > >> your use-case. Ho-hum.
> > > 
> > > Assuming we want the IOMMU to be handled transparently for the majority of
> > > devices I only see two ways to fix this,
> > > 
> > > The first way is to create a default DMA mapping unconditionally and let
> > > drivers that can't live with it tear it down. That's what is implemented
> > > today.
> > 
> > I strongly support Thierry's point that drivers should not have to tear
> > down things they don't need. The issue we are facing today is a very
> > good illustration of why one should not have to do this.
> > 
> > Everybody hates to receive unsollicited email with a link that says "to
> > unsubscribe, click here". Let's not import that unpleasant culture into
> > the kernel.
> > 
> > I am arriving late in this discussion, but what is wrong with asking
> > drivers to explicitly state that they want the DMA API to be backed by
> > the IOMMU instead of forcibly making it work that way?
> 
> The vast majority of the drivers are not IOMMU-aware. We would thus need to 
> add a call at the beginning of the probe function of nearly every driver that 
> can perform DMA to state that the driver doesn't need to handle any IOMMU that 
> might be present in the system itself. I don't think that's a better solution.

Right, abstracting the presence of an IOMMU (along with things like
cache management) is the whole point for having the dma-mapping API.

The iommu driver should instead be able to make the decision on whether
the device uses the iommu for DMA or not. In some cases, it's not an
option because the iommu is mandatory for all DMA and there is no working
passthrough mode. In other cases, it depends on the dma mask: as long as
the device's dma_mask covers all of RAM, we can avoid using the IOMMU
and get better performance (and also avoid setting up tables that may
need to be removed again), but when the dma mask is too small, we have
to use the iommu or fall back to swiotlb (which is currently not implemeted
on arm32).

> > > we can't signal this by calling a function in probe(). A new flag field
> > > for struct device_driver is a possible solution. This would however
> > > require delaying the creation of DMA mappings until right before probe
> > > time. Attaching to the IOMMU could be pushed to right before probe() as
> > > well, which would have the added benefit of making IOMMU driver
> > > implementable as real platform drivers.
> > 
> > Keeping the ability to write IOMMU drivers as platform drivers would be
> > nice indeed.
> > 
> > The problem with the opt-out flag though is that one will need to check
> > every single driver out there to see whether it stills behave correctly
> > if its hardware is suddently put behind a IOMMU.
> 
> That's actually my default assumption :-) On ARM platforms at least, for many 
> devices, whether an IOMMU is present or not is an integration property, not a 
> property of the device. The same way a device shouldn't care about the exact 
> bus topology that connects it to the CPU and memory, it shouldn't care about 
> whether an IOMMU is present on that bus, except in special cases.

Agreed. This must work by default, or basically all arm64 machines are
broken. At the moment, arm64 does not support IOMMUs properly and uses
uses swiotlb instead, but it's a huge performance bottleneck. On arm32,
very few systems need an IOMMU at the moment, but it's getting more common.

> > Doing it the other way (a flag that enables IOMMU if available) sounds safer
> > to me.
> >
> > What we have right now is a mechanism that basically makes it impossible
> > to use the DMA API on many ARM platforms if ARM_DMA_USE_IOMMU is set
> > (and I suspect it would also make the IOMMU unusable as well, without
> > any way to fix things). This is quite concerning.
> > 
> > Even more concerning is that -rc5 is about to be released and we have
> > in-tree drivers (Rockchip DRM) that are not working as they should
> > because of this patch. Will, what is your plan to fix this? Do we have
> > stuff that absolutely depends on this patch? If not, can we just revert
> > it until all these issues are solved?

keystone, shmobile, mvebu and highbank all have PCI buses that are unable
to access all of RAM, with different kinds of hacks to work around that.

	Arnd
Thierry Reding Jan. 19, 2015, 4:21 p.m. UTC | #22
On Mon, Jan 19, 2015 at 04:52:41PM +0100, Arnd Bergmann wrote:
> On Monday 19 January 2015 13:36:24 Thierry Reding wrote:
> > On Fri, Jan 16, 2015 at 01:18:21AM +0200, Laurent Pinchart wrote:
> > > On Thursday 15 January 2015 11:12:17 Will Deacon wrote:
> > > > On Thu, Jan 15, 2015 at 08:28:44AM +0000, Thierry Reding wrote:
> > > > > On Wed, Jan 14, 2015 at 10:46:10AM +0000, Will Deacon wrote:
> > > > > > On Wed, Jan 14, 2015 at 09:00:24AM +0000, Alexandre Courbot wrote:
> > > > > [...]
> > > > > 
> > > > >>> 2) Say you want to use the IOMMU API in your driver, and have an iommu
> > > > >>> property in your device's DT node. If by chance your IOMMU is
> > > > >>> registered early, you will already have a mapping automatically
> > > > >>> created even before your probe function is called. Can this be
> > > > >>> avoided? Is it even safe?
> > > > >> 
> > > > >> Currently, I think you have to either teardown the ops manually or
> > > > >> return an error from of_xlate. Thierry was also looking at this sort of
> > > > >> thing, so it might be worth talking to him.
> > > > > 
> > > > > I already explained in earlier threads why I think this is a bad idea.
> > > > > It's completely unnatural for any driver to manually tear down something
> > > > > that it didn't want set up in the first place. It also means that you
> > > > > have to carefully audit any users of these IOMMU APIs to make sure that
> > > > > they do tear down. That doesn't sound like a good incremental approach,
> > > > > as evidenced by the breakage that Alex and Heiko have encountered.
> > > > 
> > > > Well, perhaps we hide that behind a get_iommu API or something. We *do*
> > > > need this manual teardown step to support things like VFIO, so it makes
> > > > sense to reuse it for other users too imo.
> > > > 
> > > > > The solution for me has been to completely side-step the issue and not
> > > > > register the IOMMU with the new mechanism at all. That is, there's no
> > > > > .of_xlate() implementation, which means that the ARM DMA API glue won't
> > > > > try to be smart and use the IOMMU in ways it's not meant to be used.
> > > 
> > > That will break when someone will want to use the same IOMMU type for devices 
> > > that use the DMA mapping API to hide the IOMMU. That might not be the case for 
> > > your IOMMU today, but it's pretty fragile, we need to fix it.
> > 
> > No, there's absolutely no issue here. It simply means that you can't do
> > this on Tegra. So far I'm not sure I even see an advantage in using the
> > IOMMU for devices that don't care about it anyway. Consider the example
> > of the SD/MMC or HDA. They typically allocate fairly small buffers, the
> > order of a single page typically. They can simply use memory handed out
> > by the CMA.
> > 
> > So as long as we don't add a .of_xlate() implementation or instantiate
> > via the IOMMU_OF_DECLARE() mechanism we simply don't support IOMMU-over-
> > DMA on Tegra.
> 
> It breaks as soon as you have a system with memory above the 4GB boundary,
> which is the whole point of iommus for most users.

Why does it break? The IOMMU API simply gets a list of pages and gets
the physical addresses from those pages when it maps them to the IO
virtual addresses. How is .of_xlate() or of_iommu_configure() related?

> CMA does not work for streaming mappings, only for the coherent API.

Why not? And if it doesn't I'm not sure we currently care on Tegra since
we've gotten away with using CMA just fine so far.

Thierry
Arnd Bergmann Jan. 19, 2015, 5:02 p.m. UTC | #23
On Monday 19 January 2015 17:21:14 Thierry Reding wrote:
> On Mon, Jan 19, 2015 at 04:52:41PM +0100, Arnd Bergmann wrote:
> > On Monday 19 January 2015 13:36:24 Thierry Reding wrote:
> > > On Fri, Jan 16, 2015 at 01:18:21AM +0200, Laurent Pinchart wrote:
> > > > On Thursday 15 January 2015 11:12:17 Will Deacon wrote:
> > > > > On Thu, Jan 15, 2015 at 08:28:44AM +0000, Thierry Reding wrote:
> > > > > > On Wed, Jan 14, 2015 at 10:46:10AM +0000, Will Deacon wrote:
> > > > > > > On Wed, Jan 14, 2015 at 09:00:24AM +0000, Alexandre Courbot wrote:
> > > > > > [...]
> > > > > > 
> > > > > >>> 2) Say you want to use the IOMMU API in your driver, and have an iommu
> > > > > >>> property in your device's DT node. If by chance your IOMMU is
> > > > > >>> registered early, you will already have a mapping automatically
> > > > > >>> created even before your probe function is called. Can this be
> > > > > >>> avoided? Is it even safe?
> > > > > >> 
> > > > > >> Currently, I think you have to either teardown the ops manually or
> > > > > >> return an error from of_xlate. Thierry was also looking at this sort of
> > > > > >> thing, so it might be worth talking to him.
> > > > > > 
> > > > > > I already explained in earlier threads why I think this is a bad idea.
> > > > > > It's completely unnatural for any driver to manually tear down something
> > > > > > that it didn't want set up in the first place. It also means that you
> > > > > > have to carefully audit any users of these IOMMU APIs to make sure that
> > > > > > they do tear down. That doesn't sound like a good incremental approach,
> > > > > > as evidenced by the breakage that Alex and Heiko have encountered.
> > > > > 
> > > > > Well, perhaps we hide that behind a get_iommu API or something. We *do*
> > > > > need this manual teardown step to support things like VFIO, so it makes
> > > > > sense to reuse it for other users too imo.
> > > > > 
> > > > > > The solution for me has been to completely side-step the issue and not
> > > > > > register the IOMMU with the new mechanism at all. That is, there's no
> > > > > > .of_xlate() implementation, which means that the ARM DMA API glue won't
> > > > > > try to be smart and use the IOMMU in ways it's not meant to be used.
> > > > 
> > > > That will break when someone will want to use the same IOMMU type for devices 
> > > > that use the DMA mapping API to hide the IOMMU. That might not be the case for 
> > > > your IOMMU today, but it's pretty fragile, we need to fix it.
> > > 
> > > No, there's absolutely no issue here. It simply means that you can't do
> > > this on Tegra. So far I'm not sure I even see an advantage in using the
> > > IOMMU for devices that don't care about it anyway. Consider the example
> > > of the SD/MMC or HDA. They typically allocate fairly small buffers, the
> > > order of a single page typically. They can simply use memory handed out
> > > by the CMA.
> > > 
> > > So as long as we don't add a .of_xlate() implementation or instantiate
> > > via the IOMMU_OF_DECLARE() mechanism we simply don't support IOMMU-over-
> > > DMA on Tegra.
> > 
> > It breaks as soon as you have a system with memory above the 4GB boundary,
> > which is the whole point of iommus for most users.
> 
> Why does it break? The IOMMU API simply gets a list of pages and gets
> the physical addresses from those pages when it maps them to the IO
> virtual addresses. How is .of_xlate() or of_iommu_configure() related?

Because almost no drivers use the IOMMU API, and they expect the dma-mapping
API to do the right thing independent of the hardware configuration.

> > CMA does not work for streaming mappings, only for the coherent API.
> 
> Why not? And if it doesn't I'm not sure we currently care on Tegra since
> we've gotten away with using CMA just fine so far.

CMA is used as the backing for the dma_alloc_* API. It's impossible to
use it for streaming data that originates from arbitrary page cache
pages, like a send() operation on a user space network socket, or
writing to disk.

	Arnd
Laurent Pinchart Jan. 20, 2015, 1:47 p.m. UTC | #24
Hi Thierry,

On Monday 19 January 2015 17:21:14 Thierry Reding wrote:
> On Mon, Jan 19, 2015 at 04:52:41PM +0100, Arnd Bergmann wrote:
> > On Monday 19 January 2015 13:36:24 Thierry Reding wrote:
> >> On Fri, Jan 16, 2015 at 01:18:21AM +0200, Laurent Pinchart wrote:
> >>> On Thursday 15 January 2015 11:12:17 Will Deacon wrote:
> >>>> On Thu, Jan 15, 2015 at 08:28:44AM +0000, Thierry Reding wrote:

[...]

> >>>>> The solution for me has been to completely side-step the issue and
> >>>>> not register the IOMMU with the new mechanism at all. That is,
> >>>>> there's no .of_xlate() implementation, which means that the ARM
> >>>>> DMA API glue won't try to be smart and use the IOMMU in ways it's
> >>>>> not meant to be used.
> >>> 
> >>> That will break when someone will want to use the same IOMMU type for
> >>> devices that use the DMA mapping API to hide the IOMMU. That might
> >>> not be the case for your IOMMU today, but it's pretty fragile, we
> >>> need to fix it.
> >> 
> >> No, there's absolutely no issue here. It simply means that you can't do
> >> this on Tegra. So far I'm not sure I even see an advantage in using the
> >> IOMMU for devices that don't care about it anyway. Consider the example
> >> of the SD/MMC or HDA. They typically allocate fairly small buffers, the
> >> order of a single page typically. They can simply use memory handed out
> >> by the CMA.
> >> 
> >> So as long as we don't add a .of_xlate() implementation or instantiate
> >> via the IOMMU_OF_DECLARE() mechanism we simply don't support IOMMU-over-
> >> DMA on Tegra.
> > 
> > It breaks as soon as you have a system with memory above the 4GB boundary,
> > which is the whole point of iommus for most users.
> 
> Why does it break? The IOMMU API simply gets a list of pages and gets
> the physical addresses from those pages when it maps them to the IO
> virtual addresses. How is .of_xlate() or of_iommu_configure() related?

Arnd's point was that using the IOMMU for devices that deal with small buffers 
is mandatory in case those devices only support a 32-bit DMA address space, 
and the system has memory over 4GB that you want to use for DMA purpose. 
That's the case of Renesas' platforms with LPAE support for instance.

Another use case for IOMMUs is device isolation, which is, depending on your 
use cases, something you might want for devices that are not IOMMU-aware.

It should also be noted that even DRM drivers don't need to be explicit IOMMU 
users. The rcar-du driver allocates large frame buffers through the DMA 
mapping API, without caring about whether an IOMMU is present or not.

> > CMA does not work for streaming mappings, only for the coherent API.
> 
> Why not? And if it doesn't I'm not sure we currently care on Tegra since
> we've gotten away with using CMA just fine so far.
Laurent Pinchart Jan. 20, 2015, 1:50 p.m. UTC | #25
Hi Thierry,

On Monday 19 January 2015 14:36:38 Thierry Reding wrote:
> On Mon, Jan 19, 2015 at 12:50:52PM +0000, Will Deacon wrote:
> > On Mon, Jan 19, 2015 at 12:43:06PM +0000, Thierry Reding wrote:
> >> On Sun, Jan 18, 2015 at 01:18:51PM +0200, Laurent Pinchart wrote:
> >> > On Sunday 18 January 2015 15:54:34 Alexandre Courbot wrote:
> >> > > On 01/16/2015 08:18 AM, Laurent Pinchart wrote:
> >> [...]
> >> 
> >>>>> The second way is to implement a mechanism to let drivers signal
> >>>>> that they want to handle DMA mappings themselves. As the mappings
> >>>>> need in the general case to be created before the probe function 
> >>>>> is called
> >>>> 
> >>>> Sorry for being ignorant here, but why is that?
> >>> 
> >>> Because a driver can call the DMA mapping API in its probe function,
> >>> to allocate DMA coherent memory for instance. We need to ensure that
> >>> the DMA mapping IOMMU has set up the required IOMMU ops by that time.
> >>> As explained above I don't like the idea of sprinkling explicit calls
> >>> to initialize IOMMU support in the vast majority of drivers,
> >>> especially when they shouldn't be IOMMU-aware, so we then need to
> >>> initialize everything that is needed before the first call to the DMA
> >>> mapping API.
> >> 
> >> The original patch that Hiroshi posted based on my comments was to have
> >> the driver core call iommu_attach(), which would then set up everything
> >> needed right before calling into the driver's ->probe(). This works
> >> quite nicely because by definition the driver can't allocate any DMA
> >> before ->probe(). And, like you said, this allows deferred probe to be
> >> used.
> >> 
> >> To me it's so obviously the right solution that I remain flabbergasted
> >> with how much resistance it's received (or how much it's being ignored).
> > 
> > Have you considered reposting the patches based on what we currently have
> > (which has the advantage of identifying a specific IOMMU instance)?
> 
> No, I hadn't. Initially my patches included a solution for identifying
> individual IOMMU instances, too. There was a small registry with a list
> of struct iommus. That was supposed to get used to store per-instance
> data (by being embedded in a driver-specific structure). I'd need to
> look in more detail how that could be done with the infrastructure that
> your patchset creates. I'm somewhat burried below other tasks right now
> so don't expect to have any time to look into this anytime before -rc6
> or -rc7 at the earliest.

I'm pretty buried below other tasks as well, but if you find time to work on 
this I'll try to prioritize testing and review.
Laurent Pinchart Jan. 20, 2015, 2:05 p.m. UTC | #26
Hi Thierry,

On Monday 19 January 2015 13:49:36 Thierry Reding wrote:
> On Fri, Jan 16, 2015 at 01:18:21AM +0200, Laurent Pinchart wrote:
> [...]
> 
> > The second way is to implement a mechanism to let drivers signal that they
> > want to handle DMA mappings themselves. As the mappings need in the
> > general case to be created before the probe function is called we can't
> > signal this by calling a function in probe(). A new flag field for struct
> > device_driver is a possible solution. This would however require delaying
> > the creation of DMA mappings until right before probe time. Attaching to
> > the IOMMU could be pushed to right before probe() as well, which would
> > have the added benefit of making IOMMU driver implementable as real
> > platform drivers.
> 
> Right. This is a pretty important point, too. One of the things that
> we've been working on is suspend/resume. Now if you don't have a struct
> device you can't easily implement suspend/resume. You'd have to play
> tricks like using syscore_ops, which then leads to potentially problems
> with suspend/resume ordering. It also means we have to keep around
> global variables for driver-private data because there's no struct
> device to attach it to.
> 
> By properly encoding the dependencies via deferred probe we get the
> proper ordering and we can use the regular driver model with all the
> goodies that we've come up with over the years.

Marek's patch set to port the Exynos IOMMU driver on Will's patches uses 
of_platform_device_create() in the init handler registered with 
IOMMU_OF_DECLARE() to create a platform device for the IOMMU. I've initially 
considered this as a dubious hack, but on the other hand it avoids modifying 
the whole IOMMU driver to get rid of struct device, making it easier to move 
to a deferred probe approach later if needed (and possible and desired).
Laurent Pinchart Jan. 20, 2015, 3:14 p.m. UTC | #27
Hi Thierry and Will,

On Monday 19 January 2015 13:31:00 Thierry Reding wrote:
> On Mon, Jan 19, 2015 at 01:34:24PM +0200, Laurent Pinchart wrote:
> > On Monday 19 January 2015 11:12:02 Will Deacon wrote:
> >> On Sun, Jan 18, 2015 at 11:18:51AM +0000, Laurent Pinchart wrote:
> >>> On Sunday 18 January 2015 15:54:34 Alexandre Courbot wrote:
> >>>> On 01/16/2015 08:18 AM, Laurent Pinchart wrote:
> >>>>> On Thursday 15 January 2015 11:12:17 Will Deacon wrote:

[snip]

> >>>> I am arriving late in this discussion, but what is wrong with asking
> >>>> drivers to explicitly state that they want the DMA API to be backed
> >>>> by the IOMMU instead of forcibly making it work that way?
> >>> 
> >>> The vast majority of the drivers are not IOMMU-aware. We would thus
> >>> need to add a call at the beginning of the probe function of nearly
> >>> every driver that can perform DMA to state that the driver doesn't need
> >>> to handle any IOMMU that might be present in the system itself. I don't
> >>> think that's a better solution.
> >>> 
> >>> Explicitly tearing down mappings in drivers that want to manage IOMMUs
> >>> isn't a solution I like either. A possibly better solution would be to
> >>> call a function to state that the DMA mapping API shouldn't not handle
> >>> IOMMUs. Something like
> >>> 
> >>> 	dma_mapping_ignore_iommu(dev);
> >>> 
> >>> at the beginning of the probe function of such drivers could do. The
> >>> function would perform behind the scene all operations needed to tear
> >>> down everything that shouldn't have been set up.
> >> 
> >> An alternative would be to add a flag to platform_driver, like we have
> >> for "prevent_deferred_probe" which is something like
> >> "prevent_dma_configure".
> > 
> > That's a solution I have proposed (albeit as a struct device_driver field,
> > but that's a small detail), so I'm fine with it :-)
> 
> I think Marek had proposed something similar initially as well. I don't
> see an immediate downside to that solution. It's still somewhat ugly in
> that a lot of stuff is set up before it's known to actually be used at
> all, but it seems like there's some consensus that this can be improved
> later on, so I have no objections to such a patch.
> 
> Of course that doesn't solve the current breakage for the Rockchip DRM
> and OMAP ISP drivers.

And, as I came to realize after a long bisect yesternight, the Renesas IPMMU 
driver :-/ Basically any platform that relied on arm_iommu_attach_device() to 
set the IOMMU DMA ops is now broken.

> There's an additional issue with this automatic type of setting up
> mapping: on Tegra quite a few devices can use the IOMMU. Among those are
> SD/MMC controllers, HDA, SATA and things like the AHB bus. Creating a
> separate domain (== address space) for each of those devices is wasteful
> and there's currently no way to make them use a single virtual address
> space. Each of them really has no use for a full 4 GiB of address space.
> My earliest attempt to implement that was as a policy decision within
> the IOMMU driver, but that wasn't very well received.

A similar discussion started at http://www.spinics.net/lists/arm-kernel/msg385805.html but didn't go very far.

In the end it's really a policy decision. The question is how to express that 
policy. As policies in DT are frowned upon we have several subsystems 
currently hacking around similar issues by implementing heuristics or defaults 
that nobody really complained about so far. We might be able to do the same 
for IOMMUs as a first step.

> I'm now wondering whether the same could be done using this new type of
> flag. Perhaps it can be assumed that every driver that doesn't want its
> IOMMU master hooked up to the DMA API can (and will) manually manage the
> virtual address space. Conversely every driver that wants to use the DMA
> API to abstract away the existence of an IOMMU could be considered part
> of the group where a separate domain isn't necessary.
Will Deacon Jan. 20, 2015, 3:19 p.m. UTC | #28
On Tue, Jan 20, 2015 at 03:14:01PM +0000, Laurent Pinchart wrote:
> On Monday 19 January 2015 13:31:00 Thierry Reding wrote:
> > On Mon, Jan 19, 2015 at 01:34:24PM +0200, Laurent Pinchart wrote:
> > > On Monday 19 January 2015 11:12:02 Will Deacon wrote:
> > >> On Sun, Jan 18, 2015 at 11:18:51AM +0000, Laurent Pinchart wrote:
> > >>> On Sunday 18 January 2015 15:54:34 Alexandre Courbot wrote:
> > >>>> On 01/16/2015 08:18 AM, Laurent Pinchart wrote:
> > >>>>> On Thursday 15 January 2015 11:12:17 Will Deacon wrote:
> 
> [snip]
> 
> > >>>> I am arriving late in this discussion, but what is wrong with asking
> > >>>> drivers to explicitly state that they want the DMA API to be backed
> > >>>> by the IOMMU instead of forcibly making it work that way?
> > >>> 
> > >>> The vast majority of the drivers are not IOMMU-aware. We would thus
> > >>> need to add a call at the beginning of the probe function of nearly
> > >>> every driver that can perform DMA to state that the driver doesn't need
> > >>> to handle any IOMMU that might be present in the system itself. I don't
> > >>> think that's a better solution.
> > >>> 
> > >>> Explicitly tearing down mappings in drivers that want to manage IOMMUs
> > >>> isn't a solution I like either. A possibly better solution would be to
> > >>> call a function to state that the DMA mapping API shouldn't not handle
> > >>> IOMMUs. Something like
> > >>> 
> > >>> 	dma_mapping_ignore_iommu(dev);
> > >>> 
> > >>> at the beginning of the probe function of such drivers could do. The
> > >>> function would perform behind the scene all operations needed to tear
> > >>> down everything that shouldn't have been set up.
> > >> 
> > >> An alternative would be to add a flag to platform_driver, like we have
> > >> for "prevent_deferred_probe" which is something like
> > >> "prevent_dma_configure".
> > > 
> > > That's a solution I have proposed (albeit as a struct device_driver field,
> > > but that's a small detail), so I'm fine with it :-)
> > 
> > I think Marek had proposed something similar initially as well. I don't
> > see an immediate downside to that solution. It's still somewhat ugly in
> > that a lot of stuff is set up before it's known to actually be used at
> > all, but it seems like there's some consensus that this can be improved
> > later on, so I have no objections to such a patch.
> > 
> > Of course that doesn't solve the current breakage for the Rockchip DRM
> > and OMAP ISP drivers.
> 
> And, as I came to realize after a long bisect yesternight, the Renesas IPMMU 
> driver :-/ Basically any platform that relied on arm_iommu_attach_device() to 
> set the IOMMU DMA ops is now broken.

We could restore the set_dma_ops call in arm_iommu_attach_device as a
temporary hack (along with a big fat comment), since arch_setup_dma_ops
actually sets the ops correct *after* the call to
arm_get_iommu_dma_map_ops...

It doesn't provide any motivation for people to consider moving over to the
new framework, but it fixes the current issues affecting mainline.

Will
Will Deacon Jan. 20, 2015, 3:21 p.m. UTC | #29
On Tue, Jan 20, 2015 at 03:19:10PM +0000, Will Deacon wrote:
> We could restore the set_dma_ops call in arm_iommu_attach_device as a
> temporary hack (along with a big fat comment), since arch_setup_dma_ops
> actually sets the ops correct *after* the call to
> arm_get_iommu_dma_map_ops...

s/arm_get_iommu_dma_map_ops/arm_setup_iommu_dma_ops/

Will
Laurent Pinchart Jan. 20, 2015, 3:35 p.m. UTC | #30
Hi Will,

On Tuesday 20 January 2015 15:19:11 Will Deacon wrote:
> On Tue, Jan 20, 2015 at 03:14:01PM +0000, Laurent Pinchart wrote:
> > On Monday 19 January 2015 13:31:00 Thierry Reding wrote:
> >> On Mon, Jan 19, 2015 at 01:34:24PM +0200, Laurent Pinchart wrote:
> >>> On Monday 19 January 2015 11:12:02 Will Deacon wrote:
> >>>> On Sun, Jan 18, 2015 at 11:18:51AM +0000, Laurent Pinchart wrote:
> >>>>> On Sunday 18 January 2015 15:54:34 Alexandre Courbot wrote:
> >>>>>> On 01/16/2015 08:18 AM, Laurent Pinchart wrote:
> >>>>>>> On Thursday 15 January 2015 11:12:17 Will Deacon wrote:
> >
> > [snip]
> > 
> >>>>>> I am arriving late in this discussion, but what is wrong with
> >>>>>> asking drivers to explicitly state that they want the DMA API to be
> >>>>>> backed by the IOMMU instead of forcibly making it work that way?
> >>>>> 
> >>>>> The vast majority of the drivers are not IOMMU-aware. We would thus
> >>>>> need to add a call at the beginning of the probe function of nearly
> >>>>> every driver that can perform DMA to state that the driver doesn't
> >>>>> need to handle any IOMMU that might be present in the system itself.
> >>>>> I don't think that's a better solution.
> >>>>> 
> >>>>> Explicitly tearing down mappings in drivers that want to manage
> >>>>> IOMMUs isn't a solution I like either. A possibly better solution
> >>>>> would be to call a function to state that the DMA mapping API
> >>>>> shouldn't not handle IOMMUs. Something like
> >>>>> 
> >>>>> 	dma_mapping_ignore_iommu(dev);
> >>>>> 
> >>>>> at the beginning of the probe function of such drivers could do. The
> >>>>> function would perform behind the scene all operations needed to
> >>>>> tear down everything that shouldn't have been set up.
> >>>> 
> >>>> An alternative would be to add a flag to platform_driver, like we
> >>>> have for "prevent_deferred_probe" which is something like
> >>>> "prevent_dma_configure".
> >>> 
> >>> That's a solution I have proposed (albeit as a struct device_driver
> >>> field, but that's a small detail), so I'm fine with it :-)
> >> 
> >> I think Marek had proposed something similar initially as well. I don't
> >> see an immediate downside to that solution. It's still somewhat ugly in
> >> that a lot of stuff is set up before it's known to actually be used at
> >> all, but it seems like there's some consensus that this can be improved
> >> later on, so I have no objections to such a patch.
> >> 
> >> Of course that doesn't solve the current breakage for the Rockchip DRM
> >> and OMAP ISP drivers.
> > 
> > And, as I came to realize after a long bisect yesternight, the Renesas
> > IPMMU driver :-/ Basically any platform that relied on
> > arm_iommu_attach_device() to set the IOMMU DMA ops is now broken.
> 
> We could restore the set_dma_ops call in arm_iommu_attach_device as a
> temporary hack (along with a big fat comment), since arch_setup_dma_ops
> actually sets the ops correct *after* the call to
> arm_get_iommu_dma_map_ops...
> 
> It doesn't provide any motivation for people to consider moving over to the
> new framework, but it fixes the current issues affecting mainline.

I'm all for incentives, but I think avoiding a major v3.19 regression would be 
good, too :-) I wanted to test your LPAE page table allocator yesterday with 
the Renesas IPMMU driver, and ended up spending the whole night bisecting the 
regression instead.
Laurent Pinchart Jan. 20, 2015, 4:41 p.m. UTC | #31
Hi Arnd,

On Monday 19 January 2015 17:13:14 Arnd Bergmann wrote:
> On Sunday 18 January 2015 13:18:51 Laurent Pinchart wrote:
> > On Sunday 18 January 2015 15:54:34 Alexandre Courbot wrote:
> >> On 01/16/2015 08:18 AM, Laurent Pinchart wrote:
> >>> On Thursday 15 January 2015 11:12:17 Will Deacon wrote:
> >>>> On Thu, Jan 15, 2015 at 08:28:44AM +0000, Thierry Reding wrote:
> >>>>> This has several advantages, such as that I can also use the regular
> >>>>> driver model for suspend/resume of the IOMMU, and I get to enjoy the
> >>>>> benefits of devres in the IOMMU driver. Probe ordering is still a
> >>>>> tiny issue, but we can easily solve that using explicit initcall
> >>>>> ordering (which really isn't any worse than IOMMU_OF_DECLARE()).
> >>>> 
> >>>> That's a pity. I'd much rather extend what we currently have to
> >>>> satisfy your use-case. Ho-hum.
> >>> 
> >>> Assuming we want the IOMMU to be handled transparently for the
> >>> majority of devices I only see two ways to fix this,
> >>> 
> >>> The first way is to create a default DMA mapping unconditionally and
> >>> let drivers that can't live with it tear it down. That's what is
> >>> implemented today.
> >> 
> >> I strongly support Thierry's point that drivers should not have to tear
> >> down things they don't need. The issue we are facing today is a very
> >> good illustration of why one should not have to do this.
> >> 
> >> Everybody hates to receive unsollicited email with a link that says "to
> >> unsubscribe, click here". Let's not import that unpleasant culture into
> >> the kernel.
> >> 
> >> I am arriving late in this discussion, but what is wrong with asking
> >> drivers to explicitly state that they want the DMA API to be backed by
> >> the IOMMU instead of forcibly making it work that way?
> > 
> > The vast majority of the drivers are not IOMMU-aware. We would thus need
> > to add a call at the beginning of the probe function of nearly every
> > driver that can perform DMA to state that the driver doesn't need to
> > handle any IOMMU that might be present in the system itself. I don't think
> > that's a better solution.
>
> Right, abstracting the presence of an IOMMU (along with things like
> cache management) is the whole point for having the dma-mapping API.
> 
> The iommu driver should instead be able to make the decision on whether
> the device uses the iommu for DMA or not. In some cases, it's not an
> option because the iommu is mandatory for all DMA and there is no working
> passthrough mode. In other cases, it depends on the dma mask: as long as
> the device's dma_mask covers all of RAM, we can avoid using the IOMMU
> and get better performance (and also avoid setting up tables that may
> need to be removed again), but when the dma mask is too small, we have
> to use the iommu or fall back to swiotlb (which is currently not implemeted
> on arm32).
> 
> >>> we can't signal this by calling a function in probe(). A new flag
> >>> field for struct device_driver is a possible solution. This would
> >>> however require delaying the creation of DMA mappings until right
> >>> before probe time. Attaching to the IOMMU could be pushed to right
> >>> before probe() as well, which would have the added benefit of making
> >>> IOMMU driver implementable as real platform drivers.
> >> 
> >> Keeping the ability to write IOMMU drivers as platform drivers would be
> >> nice indeed.
> >> 
> >> The problem with the opt-out flag though is that one will need to check
> >> every single driver out there to see whether it stills behave correctly
> >> if its hardware is suddently put behind a IOMMU.
> > 
> > That's actually my default assumption :-) On ARM platforms at least, for
> > many devices, whether an IOMMU is present or not is an integration
> > property, not a property of the device. The same way a device shouldn't
> > care about the exact bus topology that connects it to the CPU and memory,
> > it shouldn't care about whether an IOMMU is present on that bus, except
> > in special cases.
> 
> Agreed. This must work by default, or basically all arm64 machines are
> broken. At the moment, arm64 does not support IOMMUs properly and uses
> uses swiotlb instead, but it's a huge performance bottleneck. On arm32,
> very few systems need an IOMMU at the moment, but it's getting more common.
> 
> >> Doing it the other way (a flag that enables IOMMU if available) sounds
> >> safer to me.
> >> 
> >> What we have right now is a mechanism that basically makes it impossible
> >> to use the DMA API on many ARM platforms if ARM_DMA_USE_IOMMU is set
> >> (and I suspect it would also make the IOMMU unusable as well, without
> >> any way to fix things). This is quite concerning.
> >> 
> >> Even more concerning is that -rc5 is about to be released and we have
> >> in-tree drivers (Rockchip DRM) that are not working as they should
> >> because of this patch. Will, what is your plan to fix this? Do we have
> >> stuff that absolutely depends on this patch? If not, can we just revert
> >> it until all these issues are solved?
> 
> keystone, shmobile, mvebu and highbank all have PCI buses that are unable
> to access all of RAM, with different kinds of hacks to work around that.

But Will's series doesn't fix that, does it ?
diff mbox

Patch

--- a/arch/arm/mm/dma-mapping.c
+++ b/arch/arm/mm/dma-mapping.c
@@ -1961,6 +1961,7 @@  int arm_iommu_attach_device(struct device *dev,

         kref_get(&mapping->kref);
         dev->archdata.mapping = mapping;
+       set_dma_ops(dev, &iommu_ops);

         pr_debug("Attached IOMMU controller to %s device.\n", 
dev_name(dev));