Message ID | 1495621472-9323-5-git-send-email-vladimir.murzin@arm.com (mailing list archive) |
---|---|
State | New, archived |
Headers | show |
On Wed, May 24, 2017 at 11:24:29AM +0100, Vladimir Murzin wrote: > This patch introduces default coherent DMA pool similar to default CMA > area concept. To keep other users safe code kept under CONFIG_ARM. I don't see a CONFIG_ARM in the code, although parts of it are added under CONFIG_OF_RESERVED_MEM. But overall this code look a bit odd to me. As far as I can tell the dma-coherent.c code is for the case where we have a special piece of coherent memory close to a device. If you're allocating out of the global allocator the memory should come from the normal dma_ops ->alloc allocator - and also take the attrs into account (e.g. for DMA_ATTR_NON_CONSISTENT or DMA_ATTR_NO_KERNEL_MAPPING requests you don't need coherent memory)
On 20/06/17 14:49, Christoph Hellwig wrote: > On Wed, May 24, 2017 at 11:24:29AM +0100, Vladimir Murzin wrote: >> This patch introduces default coherent DMA pool similar to default CMA >> area concept. To keep other users safe code kept under CONFIG_ARM. > > I don't see a CONFIG_ARM in the code, although parts of it are added > under CONFIG_OF_RESERVED_MEM. It's in rmem_dma_setup() (line 325) currently enforcing no-map for ARM. > But overall this code look a bit odd to me. As far as I can tell > the dma-coherent.c code is for the case where we have a special > piece of coherent memory close to a device. True, but the case here is where we need a special piece of coherent memory for *all* devices, and it was more complicated *not* to reuse the existing infrastructure. This would already be achievable by specifying a separate rmem carveout per device, but the shared pool just makes life easier, and mirrors the functionality dma-contiguous already supports. > If you're allocating out of the global allocator the memory should > come from the normal dma_ops ->alloc allocator - and also take > the attrs into account (e.g. for DMA_ATTR_NON_CONSISTENT or > DMA_ATTR_NO_KERNEL_MAPPING requests you don't need coherent memory) The context here is noMMU but with caches - the problem being that the normal allocator will give back kernel memory, and there's no way to make that coherent with devices short of not enabling the caches in the first place, which is obviously undesirable. The trick is that RAM is aliased (in hardware) at two addresses, one of which makes CPU accesses non-cacheable, so by only ever accessing the RAM set aside for the coherent DMA pool using the non-cacheable alias (represented by the dma_pfn_offset) we can achieve DMA coherency. It perhaps seems a bit backwards, but we do actually end up honouring DMA_ATTR_NON_CONSISTENT to a degree in patch #5, as such requests are the only ones allowed to fall back to the normal dma_ops allocator. Robin.
On 20/06/17 14:49, Christoph Hellwig wrote: > On Wed, May 24, 2017 at 11:24:29AM +0100, Vladimir Murzin wrote: >> This patch introduces default coherent DMA pool similar to default CMA >> area concept. To keep other users safe code kept under CONFIG_ARM. > > I don't see a CONFIG_ARM in the code, although parts of it are added > under CONFIG_OF_RESERVED_MEM. It should look like that: #ifdef CONFIG_ARM if (!of_get_flat_dt_prop(node, "no-map", NULL)) { pr_err("Reserved memory: regions without no-map are not yet supported\n"); return -EINVAL; } + + if (of_get_flat_dt_prop(node, "linux,dma-default", NULL)) { + WARN(dma_reserved_default_memory, + "Reserved memory: region for default DMA coherent area is redefined\n"); + dma_reserved_default_memory = rmem; + } #endif > > But overall this code look a bit odd to me. As far as I can tell > the dma-coherent.c code is for the case where we have a special > piece of coherent memory close to a device. > > If you're allocating out of the global allocator the memory should > come from the normal dma_ops ->alloc allocator - and also take > the attrs into account (e.g. for DMA_ATTR_NON_CONSISTENT or > DMA_ATTR_NO_KERNEL_MAPPING requests you don't need coherent memory) > It is how it has been started [1] - defining memory which is not cacheable (i.e. suitable for coherent allocations) and building custom allocator on top of it, like it was done for c6x and blackfin. The annoying thing was that we needed to advertise such memory via command line parameter plus some "mem=" adjustment to hide coherent memory from buddy allocator. So it was suggested to use reserved memory and this makes things look much better, but on the other hand require changes on dts side to "bind" devices with reserved memory - default DMA pool removes such drawback. [1] https://marc.info/?l=linux-arm-kernel&m=148163694824475&w=2 Cheers Vladimir
On Tue, Jun 20, 2017 at 03:24:21PM +0100, Robin Murphy wrote: > True, but the case here is where we need a special piece of coherent > memory for *all* devices, and it was more complicated *not* to reuse the > existing infrastructure. This would already be achievable by specifying > a separate rmem carveout per device, but the shared pool just makes life > easier, and mirrors the functionality dma-contiguous already supports. І'm really worried about the code in dma-coherent.c - the original version clearly intends to have a coherent pool per device, declared in the driver. Then Marek added the reserved_mem interface, and now we get another variant of it. Conceptually the per-device and global pool are very different, and to me it seems like the reserved mem should be a different interface. > > If you're allocating out of the global allocator the memory should > > come from the normal dma_ops ->alloc allocator - and also take > > the attrs into account (e.g. for DMA_ATTR_NON_CONSISTENT or > > DMA_ATTR_NO_KERNEL_MAPPING requests you don't need coherent memory) > > The context here is noMMU but with caches - the problem being that the > normal allocator will give back kernel memory, and there's no way to > make that coherent with devices short of not enabling the caches in the > first place, which is obviously undesirable. The trick is that RAM is > aliased (in hardware) at two addresses, one of which makes CPU accesses > non-cacheable, so by only ever accessing the RAM set aside for the > coherent DMA pool using the non-cacheable alias (represented by the > dma_pfn_offset) we can achieve DMA coherency. Yes, and I think this is something we already have to deal with for example on mips. A simple genalloc allocator from your pool in the normal dma_ops implementation should do the work just fine.
On Thu, Jun 22, 2017 at 02:18:48PM +0100, Vladimir Murzin wrote: > It is how it has been started [1] - defining memory which is not cacheable > (i.e. suitable for coherent allocations) and building custom allocator on top > of it, like it was done for c6x and blackfin. The annoying thing was that we > needed to advertise such memory via command line parameter plus some "mem=" > adjustment to hide coherent memory from buddy allocator. So it was suggested > to use reserved memory and this makes things look much better, but on the > other hand require changes on dts side to "bind" devices with reserved memory > - default DMA pool removes such drawback. I like the idea in general, I'm just worried about the overlap with the per-device coherent memory, especially when we have slight semantic mismatches like the one about the physical (or rather dma) address earlier.
On 26/06/17 10:42, Christoph Hellwig wrote: > On Tue, Jun 20, 2017 at 03:24:21PM +0100, Robin Murphy wrote: >> True, but the case here is where we need a special piece of coherent >> memory for *all* devices, and it was more complicated *not* to reuse the >> existing infrastructure. This would already be achievable by specifying >> a separate rmem carveout per device, but the shared pool just makes life >> easier, and mirrors the functionality dma-contiguous already supports. > > І'm really worried about the code in dma-coherent.c - the original > version clearly intends to have a coherent pool per device, declared > in the driver. Then Marek added the reserved_mem interface, and > now we get another variant of it. Conceptually the per-device > and global pool are very different, and to me it seems like the > reserved mem should be a different interface. > >>> If you're allocating out of the global allocator the memory should >>> come from the normal dma_ops ->alloc allocator - and also take >>> the attrs into account (e.g. for DMA_ATTR_NON_CONSISTENT or >>> DMA_ATTR_NO_KERNEL_MAPPING requests you don't need coherent memory) >> >> The context here is noMMU but with caches - the problem being that the >> normal allocator will give back kernel memory, and there's no way to >> make that coherent with devices short of not enabling the caches in the >> first place, which is obviously undesirable. The trick is that RAM is >> aliased (in hardware) at two addresses, one of which makes CPU accesses >> non-cacheable, so by only ever accessing the RAM set aside for the >> coherent DMA pool using the non-cacheable alias (represented by the >> dma_pfn_offset) we can achieve DMA coherency. > > Yes, and I think this is something we already have to deal with > for example on mips. A simple genalloc allocator from your pool > in the normal dma_ops implementation should do the work just fine. > Are you proposing keeping pool handling under arch? Cheers Vladimir
On 26/06/17 10:42, Christoph Hellwig wrote: > On Tue, Jun 20, 2017 at 03:24:21PM +0100, Robin Murphy wrote: >> True, but the case here is where we need a special piece of coherent >> memory for *all* devices, and it was more complicated *not* to reuse the >> existing infrastructure. This would already be achievable by specifying >> a separate rmem carveout per device, but the shared pool just makes life >> easier, and mirrors the functionality dma-contiguous already supports. > > І'm really worried about the code in dma-coherent.c - the original > version clearly intends to have a coherent pool per device, declared > in the driver. Then Marek added the reserved_mem interface, and > now we get another variant of it. Conceptually the per-device > and global pool are very different, and to me it seems like the > reserved mem should be a different interface. Per-device reserved mem is still a private per-device pool though, it's just discovered and declared by common firmware code rather than in some device-specific way by driver code - once it's assigned there's no distinction. The global/per-device issue is essentially entirely orthogonal, and has the dubious pleasure of being a massive conceptual difference yet a much smaller implementation difference. >>> If you're allocating out of the global allocator the memory should >>> come from the normal dma_ops ->alloc allocator - and also take >>> the attrs into account (e.g. for DMA_ATTR_NON_CONSISTENT or >>> DMA_ATTR_NO_KERNEL_MAPPING requests you don't need coherent memory) >> >> The context here is noMMU but with caches - the problem being that the >> normal allocator will give back kernel memory, and there's no way to >> make that coherent with devices short of not enabling the caches in the >> first place, which is obviously undesirable. The trick is that RAM is >> aliased (in hardware) at two addresses, one of which makes CPU accesses >> non-cacheable, so by only ever accessing the RAM set aside for the >> coherent DMA pool using the non-cacheable alias (represented by the >> dma_pfn_offset) we can achieve DMA coherency. > > Yes, and I think this is something we already have to deal with > for example on mips. A simple genalloc allocator from your pool > in the normal dma_ops implementation should do the work just fine. I admit I'm almost in agreement, were it not for the fact that dma-contiguous already supports all four combinations of both per-device and global pools, and both reserved mem and direct declarations from arch/platform code, all through the same interface to boot, and nobody's complaining about that. The only real difference for dma-coherent seems to be the way it's baked into the existing API. If it is just a matter of interfaces, I'd have no objection to exporting a separate e.g. dma_alloc_from_global_coherent() or somesuch as a conceptually separate interface to dma_coherent_default_memory, which the arch code can then call from ->alloc in the same manner they currently call dma_alloc_from_contiguous(). That seems like a reasonable way to keep the per-device and global pools conceptually distinct without needlessly duplicating implementations. In fact, I'm now wondering if the regular arm/arm64 atomic pools couldn't also make use of such a thing as well... Robin.
On Tue, Jun 27, 2017 at 03:36:16PM +0100, Robin Murphy wrote: > I admit I'm almost in agreement, were it not for the fact that > dma-contiguous already supports all four combinations of both per-device > and global pools, and both reserved mem and direct declarations from > arch/platform code, all through the same interface to boot, and nobody's > complaining about that. The only real difference for dma-coherent seems > to be the way it's baked into the existing API. > > If it is just a matter of interfaces, I'd have no objection to exporting > a separate e.g. dma_alloc_from_global_coherent() or somesuch as a > conceptually separate interface to dma_coherent_default_memory, which > the arch code can then call from ->alloc in the same manner they > currently call dma_alloc_from_contiguous(). That seems like a reasonable > way to keep the per-device and global pools conceptually distinct > without needlessly duplicating implementations. In fact, I'm now > wondering if the regular arm/arm64 atomic pools couldn't also make use > of such a thing as well... Ok. I think I'll just go ahead with the current patches, and then we'll try to come up with something better later. I really don't want it in actual arch code, but I want it controlled from the dma_map_ops instance instead of from generic code. There will be a lot of churn in this area if my plans go ahead, so I think we can handle it then.
diff --git a/Documentation/devicetree/bindings/reserved-memory/reserved-memory.txt b/Documentation/devicetree/bindings/reserved-memory/reserved-memory.txt index 3da0ebd..16291f2 100644 --- a/Documentation/devicetree/bindings/reserved-memory/reserved-memory.txt +++ b/Documentation/devicetree/bindings/reserved-memory/reserved-memory.txt @@ -68,6 +68,9 @@ Linux implementation note: - If a "linux,cma-default" property is present, then Linux will use the region for the default pool of the contiguous memory allocator. +- If a "linux,dma-default" property is present, then Linux will use the + region for the default pool of the consistent DMA allocator. + Device node references to reserved memory ----------------------------------------- Regions in the /reserved-memory node may be referenced by other device diff --git a/drivers/base/dma-coherent.c b/drivers/base/dma-coherent.c index 99c9695..2ae24c2 100644 --- a/drivers/base/dma-coherent.c +++ b/drivers/base/dma-coherent.c @@ -19,6 +19,15 @@ struct dma_coherent_mem { bool use_dev_dma_pfn_offset; }; +static struct dma_coherent_mem *dma_coherent_default_memory __ro_after_init; + +static inline struct dma_coherent_mem *dev_get_coherent_memory(struct device *dev) +{ + if (dev && dev->dma_mem) + return dev->dma_mem; + return dma_coherent_default_memory; +} + static inline dma_addr_t dma_get_device_base(struct device *dev, struct dma_coherent_mem * mem) { @@ -93,6 +102,9 @@ static void dma_release_coherent_memory(struct dma_coherent_mem *mem) static int dma_assign_coherent_memory(struct device *dev, struct dma_coherent_mem *mem) { + if (!dev) + return -ENODEV; + if (dev->dma_mem) return -EBUSY; @@ -171,15 +183,12 @@ EXPORT_SYMBOL(dma_mark_declared_memory_occupied); int dma_alloc_from_coherent(struct device *dev, ssize_t size, dma_addr_t *dma_handle, void **ret) { - struct dma_coherent_mem *mem; + struct dma_coherent_mem *mem = dev_get_coherent_memory(dev); int order = get_order(size); unsigned long flags; int pageno; int dma_memory_map; - if (!dev) - return 0; - mem = dev->dma_mem; if (!mem) return 0; @@ -233,7 +242,7 @@ EXPORT_SYMBOL(dma_alloc_from_coherent); */ int dma_release_from_coherent(struct device *dev, int order, void *vaddr) { - struct dma_coherent_mem *mem = dev ? dev->dma_mem : NULL; + struct dma_coherent_mem *mem = dev_get_coherent_memory(dev); if (mem && vaddr >= mem->virt_base && vaddr < (mem->virt_base + (mem->size << PAGE_SHIFT))) { @@ -267,7 +276,7 @@ EXPORT_SYMBOL(dma_release_from_coherent); int dma_mmap_from_coherent(struct device *dev, struct vm_area_struct *vma, void *vaddr, size_t size, int *ret) { - struct dma_coherent_mem *mem = dev ? dev->dma_mem : NULL; + struct dma_coherent_mem *mem = dev_get_coherent_memory(dev); if (mem && vaddr >= mem->virt_base && vaddr + size <= (mem->virt_base + (mem->size << PAGE_SHIFT))) { @@ -297,6 +306,8 @@ EXPORT_SYMBOL(dma_mmap_from_coherent); #include <linux/of_fdt.h> #include <linux/of_reserved_mem.h> +static struct reserved_mem *dma_reserved_default_memory __initdata; + static int rmem_dma_device_init(struct reserved_mem *rmem, struct device *dev) { struct dma_coherent_mem *mem = rmem->priv; @@ -318,7 +329,8 @@ static int rmem_dma_device_init(struct reserved_mem *rmem, struct device *dev) static void rmem_dma_device_release(struct reserved_mem *rmem, struct device *dev) { - dev->dma_mem = NULL; + if (dev) + dev->dma_mem = NULL; } static const struct reserved_mem_ops rmem_dma_ops = { @@ -338,6 +350,12 @@ static int __init rmem_dma_setup(struct reserved_mem *rmem) pr_err("Reserved memory: regions without no-map are not yet supported\n"); return -EINVAL; } + + if (of_get_flat_dt_prop(node, "linux,dma-default", NULL)) { + WARN(dma_reserved_default_memory, + "Reserved memory: region for default DMA coherent area is redefined\n"); + dma_reserved_default_memory = rmem; + } #endif rmem->ops = &rmem_dma_ops; @@ -345,5 +363,32 @@ static int __init rmem_dma_setup(struct reserved_mem *rmem) &rmem->base, (unsigned long)rmem->size / SZ_1M); return 0; } + +static int __init dma_init_reserved_memory(void) +{ + const struct reserved_mem_ops *ops; + int ret; + + if (!dma_reserved_default_memory) + return -ENOMEM; + + ops = dma_reserved_default_memory->ops; + + /* + * We rely on rmem_dma_device_init() does not propagate error of + * dma_assign_coherent_memory() for "NULL" device. + */ + ret = ops->device_init(dma_reserved_default_memory, NULL); + + if (!ret) { + dma_coherent_default_memory = dma_reserved_default_memory->priv; + pr_info("DMA: default coherent area is set\n"); + } + + return ret; +} + +core_initcall(dma_init_reserved_memory); + RESERVEDMEM_OF_DECLARE(dma, "shared-dma-pool", rmem_dma_setup); #endif