Message ID | 20190614134726.3827-17-hch@lst.de (mailing list archive)
---|---
State | Not Applicable
Delegated to: | Johannes Berg
Series | [01/16] media: videobuf-dma-contig: use dma_mmap_coherent
From: Christoph Hellwig
> Sent: 14 June 2019 14:47
>
> Many architectures (e.g. arm, m68k and sh) have always used exact
> allocation in their dma coherent allocator, which avoids a lot of
> memory waste especially for larger allocations.  Lift this behavior
> into the generic allocator so that dma-direct and the generic IOMMU
> code benefit from this behavior as well.
>
> Signed-off-by: Christoph Hellwig <hch@lst.de>
> ---
>  include/linux/dma-contiguous.h |  8 +++++---
>  kernel/dma/contiguous.c        | 17 +++++++++++------
>  2 files changed, 16 insertions(+), 9 deletions(-)
>
> diff --git a/include/linux/dma-contiguous.h b/include/linux/dma-contiguous.h
> index c05d4e661489..2e542e314acf 100644
> --- a/include/linux/dma-contiguous.h
> +++ b/include/linux/dma-contiguous.h
> @@ -161,15 +161,17 @@ static inline struct page *dma_alloc_contiguous(struct device *dev, size_t size,
>  		gfp_t gfp)
>  {
>  	int node = dev ? dev_to_node(dev) : NUMA_NO_NODE;
> -	size_t align = get_order(PAGE_ALIGN(size));
> +	void *cpu_addr = alloc_pages_exact_node(node, size, gfp);
>
> -	return alloc_pages_node(node, gfp, align);
> +	if (!cpu_addr)
> +		return NULL;
> +	return virt_to_page(p);
>  }

Does this still guarantee that requests for 16k will not cross a 16k boundary?
It looks like you are losing the alignment parameter.

There may be drivers and hardware that also require 12k allocations to not
cross 16k boundaries (etc).

	David

-
Registered Address Lakeside, Bramley Road, Mount Farm, Milton Keynes, MK1 1PT, UK
Registration No: 1397386 (Wales)
On Fri, Jun 14, 2019 at 02:15:44PM +0000, David Laight wrote:
> Does this still guarantee that requests for 16k will not cross a 16k boundary?
> It looks like you are losing the alignment parameter.

The DMA API never gave you alignment guarantees to start with,
and you can get not naturally aligned memory from many of our
current implementations.
From: 'Christoph Hellwig'
> Sent: 14 June 2019 15:50
> To: David Laight
>
> On Fri, Jun 14, 2019 at 02:15:44PM +0000, David Laight wrote:
> > Does this still guarantee that requests for 16k will not cross a 16k boundary?
> > It looks like you are losing the alignment parameter.
>
> The DMA API never gave you alignment guarantees to start with,
> and you can get not naturally aligned memory from many of our
> current implementations.

Hmmm... I thought that was even documented.
I'm pretty sure there is a lot of code out there that makes that assumption.

Without it many drivers will have to allocate almost double the
amount of memory they actually need in order to get the required alignment.
So instead of saving memory you'll actually make more be used.

	David
On 14/06/2019 15:50, 'Christoph Hellwig' wrote:
> On Fri, Jun 14, 2019 at 02:15:44PM +0000, David Laight wrote:
>> Does this still guarantee that requests for 16k will not cross a 16k boundary?
>> It looks like you are losing the alignment parameter.
>
> The DMA API never gave you alignment guarantees to start with,
> and you can get not naturally aligned memory from many of our
> current implementations.

Well, apart from the bit in DMA-API-HOWTO which has said this since
forever (well, before Git history, at least):

"The CPU virtual address and the DMA address are both
guaranteed to be aligned to the smallest PAGE_SIZE order which
is greater than or equal to the requested size.  This invariant
exists (for example) to guarantee that if you allocate a chunk
which is smaller than or equal to 64 kilobytes, the extent of the
buffer you receive will not cross a 64K boundary."

That said, I don't believe this particular patch should make any
appreciable difference - alloc_pages_exact() is still going to give back
the same base address as the rounded-up over-allocation would, and
PAGE_ALIGN()ing the size passed to get_order() already seemed to be
pointless.

Robin.
On Fri, Jun 14, 2019 at 03:01:22PM +0000, David Laight wrote:
> I'm pretty sure there is a lot of code out there that makes that assumption.
> Without it many drivers will have to allocate almost double the
> amount of memory they actually need in order to get the required alignment.
> So instead of saving memory you'll actually make more be used.

That code would already be broken on a lot of Linux platforms.
On Fri, Jun 14, 2019 at 04:05:33PM +0100, Robin Murphy wrote:
> That said, I don't believe this particular patch should make any
> appreciable difference - alloc_pages_exact() is still going to give back
> the same base address as the rounded-up over-allocation would, and
> PAGE_ALIGN()ing the size passed to get_order() already seemed to be
> pointless.

True, we actually do get the right alignment just about anywhere.  Not
100% sure about the various static pool implementations, but we can make
sure that if any didn't, we'll do the right thing once those get
consolidated.
From: Robin Murphy
> Sent: 14 June 2019 16:06
...
> Well, apart from the bit in DMA-API-HOWTO which has said this since
> forever (well, before Git history, at least):
>
> "The CPU virtual address and the DMA address are both
> guaranteed to be aligned to the smallest PAGE_SIZE order which
> is greater than or equal to the requested size.  This invariant
> exists (for example) to guarantee that if you allocate a chunk
> which is smaller than or equal to 64 kilobytes, the extent of the
> buffer you receive will not cross a 64K boundary."

I knew it was somewhere :-)

Interestingly that also implies that the address returned for a size of
(say) 128 bytes will also be page aligned.
In that case 128-byte alignment should probably be ok - but it is still
an API change that could have horrid consequences.

	David
diff --git a/include/linux/dma-contiguous.h b/include/linux/dma-contiguous.h
index c05d4e661489..2e542e314acf 100644
--- a/include/linux/dma-contiguous.h
+++ b/include/linux/dma-contiguous.h
@@ -161,15 +161,17 @@ static inline struct page *dma_alloc_contiguous(struct device *dev, size_t size,
 		gfp_t gfp)
 {
 	int node = dev ? dev_to_node(dev) : NUMA_NO_NODE;
-	size_t align = get_order(PAGE_ALIGN(size));
+	void *cpu_addr = alloc_pages_exact_node(node, size, gfp);
 
-	return alloc_pages_node(node, gfp, align);
+	if (!cpu_addr)
+		return NULL;
+	return virt_to_page(p);
 }
 
 static inline void dma_free_contiguous(struct device *dev, struct page *page,
 		size_t size)
 {
-	__free_pages(page, get_order(size));
+	free_pages_exact(page_address(page), get_order(size));
 }
 
 #endif
 
diff --git a/kernel/dma/contiguous.c b/kernel/dma/contiguous.c
index bfc0c17f2a3d..84f41eea2741 100644
--- a/kernel/dma/contiguous.c
+++ b/kernel/dma/contiguous.c
@@ -232,9 +232,8 @@ struct page *dma_alloc_contiguous(struct device *dev, size_t size, gfp_t gfp)
 {
 	int node = dev ? dev_to_node(dev) : NUMA_NO_NODE;
 	size_t count = PAGE_ALIGN(size) >> PAGE_SHIFT;
-	size_t align = get_order(PAGE_ALIGN(size));
-	struct page *page = NULL;
 	struct cma *cma = NULL;
+	void *cpu_addr;
 
 	if (dev && dev->cma_area)
 		cma = dev->cma_area;
@@ -243,14 +242,20 @@ struct page *dma_alloc_contiguous(struct device *dev, size_t size, gfp_t gfp)
 
 	/* CMA can be used only in the context which permits sleeping */
 	if (cma && gfpflags_allow_blocking(gfp)) {
+		size_t align = get_order(PAGE_ALIGN(size));
+		struct page *page;
+
 		align = min_t(size_t, align, CONFIG_CMA_ALIGNMENT);
 		page = cma_alloc(cma, count, align, gfp & __GFP_NOWARN);
+		if (page)
+			return page;
 	}
 
 	/* Fallback allocation of normal pages */
-	if (!page)
-		page = alloc_pages_node(node, gfp, align);
-	return page;
+	cpu_addr = alloc_pages_exact_node(node, size, gfp);
+	if (!cpu_addr)
+		return NULL;
+	return virt_to_page(cpu_addr);
 }
 
 /**
@@ -267,7 +272,7 @@ struct page *dma_alloc_contiguous(struct device *dev, size_t size, gfp_t gfp)
 void dma_free_contiguous(struct device *dev, struct page *page, size_t size)
 {
 	if (!cma_release(dev_get_cma_area(dev), page, size >> PAGE_SHIFT))
-		__free_pages(page, get_order(size));
+		free_pages_exact(page_address(page), get_order(size));
 }
 
 /*
Many architectures (e.g. arm, m68k and sh) have always used exact
allocation in their dma coherent allocator, which avoids a lot of
memory waste especially for larger allocations.  Lift this behavior
into the generic allocator so that dma-direct and the generic IOMMU
code benefit from this behavior as well.

Signed-off-by: Christoph Hellwig <hch@lst.de>
---
 include/linux/dma-contiguous.h |  8 +++++---
 kernel/dma/contiguous.c        | 17 +++++++++++------
 2 files changed, 16 insertions(+), 9 deletions(-)