Message ID | 20211123112104.3530135-1-hsinyi@chromium.org (mailing list archive) |
---|---|
Series | Allow restricted-dma-pool to customize IO_TLB_SEGSIZE |
On 2021-11-23 11:21, Hsin-Yi Wang wrote:
> Default IO_TLB_SEGSIZE (128) slabs may be not enough for some use cases.
> This series adds support to customize io_tlb_segsize for each
> restricted-dma-pool.
>
> Example use case:
>
> mtk-isp drivers[1] are controlled by mtk-scp[2] and allocate memory through
> mtk-scp. In order to use the noncontiguous DMA API[3], we need to use
> the swiotlb pool. mtk-scp needs to allocate memory with 2560 slabs.
> mtk-isp drivers also needs to allocate memory with 200+ slabs. Both are
> larger than the default IO_TLB_SEGSIZE (128) slabs.

Are drivers really doing streaming DMA mappings that large? If so, that
seems like it might be worth trying to address in its own right for the
sake of efficiency - allocating ~5MB of memory twice and copying it back
and forth doesn't sound like the ideal thing to do.

If it's really about coherent DMA buffer allocation, I thought the plan
was that devices which expect to use a significant amount and/or size of
coherent buffers would continue to use a shared-dma-pool for that? It's
still what the binding implies. My understanding was that
swiotlb_alloc() is mostly just a fallback for the sake of drivers which
mostly do streaming DMA but may allocate a handful of pages worth of
coherent buffers here and there. Certainly looking at the mtk_scp
driver, that seems like it shouldn't be going anywhere near SWIOTLB at all.

Robin.

> [1] (not in upstream) https://patchwork.kernel.org/project/linux-media/cover/20190611035344.29814-1-jungo.lin@mediatek.com/
> [2] https://elixir.bootlin.com/linux/latest/source/drivers/remoteproc/mtk_scp.c
> [3] https://patchwork.kernel.org/project/linux-media/cover/20210909112430.61243-1-senozhatsky@chromium.org/
>
> Hsin-Yi Wang (3):
>   dma: swiotlb: Allow restricted-dma-pool to customize IO_TLB_SEGSIZE
>   dt-bindings: Add io-tlb-segsize property for restricted-dma-pool
>   arm64: dts: mt8183: use restricted swiotlb for scp mem
>
>  .../reserved-memory/shared-dma-pool.yaml      |  8 +++++
>  .../arm64/boot/dts/mediatek/mt8183-kukui.dtsi |  4 +--
>  include/linux/swiotlb.h                       |  1 +
>  kernel/dma/swiotlb.c                          | 34 ++++++++++++++-----
>  4 files changed, 37 insertions(+), 10 deletions(-)

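[Editorial note] The slab counts in the cover letter and the "~5MB" figure in the reply can be cross-checked with a little arithmetic. The sketch below is userspace C, not kernel code. It assumes the values of IO_TLB_SHIFT (11, so 2 KiB per slab) and IO_TLB_SEGSIZE (128) from include/linux/swiotlb.h as of this thread; the helper names are hypothetical and exist only for illustration.

```c
#include <assert.h>
#include <stddef.h>

/* Assumed to match include/linux/swiotlb.h at the time of this thread. */
#define IO_TLB_SHIFT   11                          /* one slab is 2 KiB */
#define IO_TLB_SIZE    ((size_t)1 << IO_TLB_SHIFT)
#define IO_TLB_SEGSIZE 128                         /* max contiguous slabs */

/* Hypothetical helper: number of slabs needed to hold `bytes`, rounded up. */
static size_t slabs_needed(size_t bytes)
{
	return (bytes + IO_TLB_SIZE - 1) >> IO_TLB_SHIFT;
}

/* Hypothetical helper: largest allocation one default-sized segment can hold. */
static size_t max_segment_bytes(void)
{
	return (size_t)IO_TLB_SEGSIZE * IO_TLB_SIZE;
}

/* Hypothetical helper: nonzero if `bytes` fits in a single default segment. */
static int fits_in_segment(size_t bytes)
{
	assert(bytes > 0);
	return slabs_needed(bytes) <= IO_TLB_SEGSIZE;
}
```

Under these assumptions a default segment covers 128 * 2 KiB = 256 KiB, while the 2560 slabs quoted for mtk-scp correspond to 5 MiB, which is why such a request cannot be satisfied from one default-sized segment.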
On Tue, Nov 23, 2021 at 7:58 PM Robin Murphy <robin.murphy@arm.com> wrote:
>
> Are drivers really doing streaming DMA mappings that large? If so, that
> seems like it might be worth trying to address in its own right for the
> sake of efficiency - allocating ~5MB of memory twice and copying it back
> and forth doesn't sound like the ideal thing to do.
>
> If it's really about coherent DMA buffer allocation, I thought the plan
> was that devices which expect to use a significant amount and/or size of
> coherent buffers would continue to use a shared-dma-pool for that? It's
> still what the binding implies. My understanding was that
> swiotlb_alloc() is mostly just a fallback for the sake of drivers which
> mostly do streaming DMA but may allocate a handful of pages worth of
> coherent buffers here and there. Certainly looking at the mtk_scp
> driver, that seems like it shouldn't be going anywhere near SWIOTLB at all.
>

mtk_scp on its own can use the shared-dma-pool, which it currently uses.
The reason we switched to restricted-dma-pool is that we want to use
the noncontiguous DMA API for mtk-isp. The noncontiguous DMA API is
designed for devices with iommu, and if a device doesn't have an
iommu, it will fallback using swiotlb. But currently noncontiguous DMA
API doesn't work with the shared-dma-pool.

vb2_dc_alloc() -> dma_alloc_noncontiguous() -> alloc_single_sgt() ->
__dma_alloc_pages() -> dma_direct_alloc_pages() ->
__dma_direct_alloc_pages() -> swiotlb_alloc().

On 2021-11-24 03:55, Hsin-Yi Wang wrote:
> mtk_scp on its own can use the shared-dma-pool, which it currently uses.
> The reason we switched to restricted-dma-pool is that we want to use
> the noncontiguous DMA API for mtk-isp. The noncontiguous DMA API is
> designed for devices with iommu, and if a device doesn't have an
> iommu, it will fallback using swiotlb. But currently noncontiguous DMA
> API doesn't work with the shared-dma-pool.
>
> vb2_dc_alloc() -> dma_alloc_noncontiguous() -> alloc_single_sgt() ->
> __dma_alloc_pages() -> dma_direct_alloc_pages() ->
> __dma_direct_alloc_pages() -> swiotlb_alloc().

OK, thanks for clarifying. My gut feeling is that drivers should
probably only be calling the noncontiguous API when they *know* that
they have a scatter-gather-capable device or IOMMU that can cope with
it, but either way I'm still not convinced that it makes sense to hack
up SWIOTLB with DT ABI baggage for an obscure fallback case.

It would seem a lot more sensible to fix alloc_single_sgt() to not
ignore per-device pools once it has effectively fallen back to the
normal dma_alloc_attrs() flow, but I guess that's not technically
guaranteed to uphold the assumption that we can allocate
struct-page-backed memory.

Still, if we've got to the point of needing to use a SWIOTLB pool as
nothing more than a bad reinvention of CMA, rather than an actual
bounce buffer, that reeks of a fundamental design issue and adding more
hacks on top to bodge around it is not the right way to go - we need to
take a step back and properly reconsider how dma_alloc_noncontiguous()
is supposed to interact with DMA protection schemes.

Thanks,
Robin.

Hi Robin,

On Tue, Nov 23, 2021 at 8:59 PM Robin Murphy <robin.murphy@arm.com> wrote:
>
> Are drivers really doing streaming DMA mappings that large? If so, that
> seems like it might be worth trying to address in its own right for the
> sake of efficiency - allocating ~5MB of memory twice and copying it back
> and forth doesn't sound like the ideal thing to do.
>
> If it's really about coherent DMA buffer allocation, I thought the plan
> was that devices which expect to use a significant amount and/or size of
> coherent buffers would continue to use a shared-dma-pool for that? It's
> still what the binding implies. My understanding was that
> swiotlb_alloc() is mostly just a fallback for the sake of drivers which
> mostly do streaming DMA but may allocate a handful of pages worth of
> coherent buffers here and there. Certainly looking at the mtk_scp
> driver, that seems like it shouldn't be going anywhere near SWIOTLB at all.

First, thanks a lot for taking a look at this patch series.

The drivers would do streaming DMA within a reserved region that is
the only memory accessible to them for security reasons. This seems to
exactly match the definition of the restricted pool as merged
recently.

The new dma_alloc_noncontiguous() API would allow allocating suitable
memory directly from the pool, which would eliminate the need to copy.

However, for a restricted pool, this would exercise the SWIOTLB
allocator, which currently suffers from the limitation as described by
Hsin-Yi. Since the allocator in general is quite general purpose and
already used for coherent allocations as per the current restricted
pool implementation, I think it indeed makes sense to lift the
limitation, rather than trying to come up with yet another thing.

Best regards,
Tomasz

On 2021-11-25 07:35, Tomasz Figa wrote:
> First, thanks a lot for taking a look at this patch series.
>
> The drivers would do streaming DMA within a reserved region that is
> the only memory accessible to them for security reasons. This seems to
> exactly match the definition of the restricted pool as merged
> recently.

Huh? Of the drivers indicated, the SCP driver is doing nothing but
coherent allocations, and I'm not entirely sure what those ISP driver
patches are supposed to be doing but I suspect it's probably just
buffer allocation too. I don't see any actual streaming DMA anywhere :/

> The new dma_alloc_noncontiguous() API would allow allocating suitable
> memory directly from the pool, which would eliminate the need to copy.

Can you clarify what's being copied, and where? I'm not all that
familiar with the media APIs, but I thought it was all based around
preallocated DMA buffers (the whole dedicated "videobuf" thing)? The
few instances of actual streaming DMA I can see in drivers/media/ look
to be mostly PCI drivers mapping private descriptors, whereas the MTK
ISP appears to be entirely register-based.

> However, for a restricted pool, this would exercise the SWIOTLB
> allocator, which currently suffers from the limitation as described by
> Hsin-Yi. Since the allocator in general is quite general purpose and
> already used for coherent allocations as per the current restricted
> pool implementation, I think it indeed makes sense to lift the
> limitation, rather than trying to come up with yet another thing.

No, just fix the dma_alloc_noncontiguous() fallback case to split the
allocation into dma_max_mapping_size() chunks. *That* makes sense.

Thanks,
Robin.
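
[Editorial note] The splitting suggested in the last reply can be illustrated with a small userspace sketch. It is not the kernel implementation and not a proposed patch: split_into_chunks() is a hypothetical helper, and the limit is passed in as a plain number, whereas in the kernel it would come from dma_max_mapping_size() for the device. The 256 KiB value mentioned after the code is an assumption derived from the default 128 slabs of 2 KiB each.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Hypothetical helper, not kernel code: split a request of `total` bytes
 * into chunks of at most `max_chunk` bytes. Chunk sizes are written to
 * `out`, which has room for `cap` entries. Returns the number of chunks,
 * or 0 if `out` is too small (or `total` is 0).
 */
static size_t split_into_chunks(size_t total, size_t max_chunk,
				size_t *out, size_t cap)
{
	size_t n = 0;

	assert(max_chunk > 0);
	while (total > 0) {
		size_t len = total < max_chunk ? total : max_chunk;

		if (n == cap)
			return 0;	/* caller's array cannot hold all chunks */
		out[n++] = len;
		total -= len;
	}
	return n;
}
```

With an assumed limit of 256 KiB, a 5 MiB request becomes 20 chunks that each fit a default segment. Each chunk would then be a separate scatterlist entry, so this only helps devices that can handle non-contiguous buffers, which is the case the noncontiguous API targets.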