Message ID | 1428590049-20357-2-git-send-email-thierry.reding@gmail.com (mailing list archive) |
---|---|
State | New, archived |
On Thu, Apr 09, 2015 at 04:34:05PM +0200, Thierry Reding wrote:
> From: Thierry Reding <treding@nvidia.com>
>
> Add implementations for drm_clflush_*() on ARM by borrowing code from
> the DMA mapping API implementation. Unfortunately ARM doesn't export an
> API to flush caches on a page by page basis, so this replicates most of
> the code.

I'm _really_ not happy with this, because it's poking about in ARM
internal implementation details of the DMA API. It's also not going to
work too well on aliasing caches either - especially when you consider
that the userspace mapping of a page may have no relationship to the
address you get from kmap.

For an aliasing cache, the way things work with the DMA API, we ensure
that the kernel alias is clean whenever pages are un-kmapped, which
means that unmapped highmem pages never have L1 cache lines associated
with the kernel alias of them. The user aliases are handled separately
via the normal flush_dcache_page()/flush_anon_page() calls.

None of this exists here...

It gets even more hairy on older ARM CPUs - but I hope no one is
expecting to support DRM's clflush there - we should make that explicit
though, and ensure that clflush support returns an error there.

That aside, we have most of this logic already inside
dma_cache_maint_page(), and even if it were the right thing to be doing,
we shouldn't be duplicating this architecture-specific code inside a
driver.

So you can take that as a NAK on this.
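(Editorial aside, not part of the thread: a minimal, hypothetical illustration of the generic helper Russell refers to for keeping user aliases coherent. The function name is invented; only flush_dcache_page() is a real kernel API.)

```c
#include <linux/highmem.h>
#include <asm/cacheflush.h>

/*
 * Hypothetical sketch: after the CPU writes to a page through a kernel
 * mapping, flush_dcache_page() is the generic hook that lets the
 * architecture deal with any user-visible aliases of that page.
 */
static void example_after_cpu_write(struct page *page)
{
	flush_dcache_page(page);
}
```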
On Fri, Apr 10, 2015 at 01:03:13PM +0100, Russell King - ARM Linux wrote:
> On Thu, Apr 09, 2015 at 04:34:05PM +0200, Thierry Reding wrote:
> > From: Thierry Reding <treding@nvidia.com>
> >
> > Add implementations for drm_clflush_*() on ARM by borrowing code from
> > the DMA mapping API implementation. Unfortunately ARM doesn't export an
> > API to flush caches on a page by page basis, so this replicates most of
> > the code.
>
> I'm _really_ not happy with this, because it's poking about in ARM
> internal implementation details of the DMA API. It's also not going
> to work too well on aliasing caches either - especially when you
> consider that the userspace mapping of a page may have no relationship
> to the address you get from kmap.
>
> For an aliasing cache, the way things work with the DMA API, we ensure
> that the kernel alias is clean whenever pages are un-kmapped, which
> means that unmapped highmem pages never have L1 cache lines associated
> with the kernel alias of them. The user aliases are handled separately
> via the normal flush_dcache_page()/flush_anon_page() calls.
>
> None of this exists here...
>
> It gets even more hairy on older ARM CPUs - but I hope no one is
> expecting to support DRM's clflush there - we should make that explicit
> though, and ensure that clflush support returns an error there.
>
> That aside, we have most of this logic already inside
> dma_cache_maint_page(), and even if it were the right thing to be doing,
> we shouldn't be duplicating this architecture-specific code inside a
> driver.
>
> So you can take that as a NAK on this.

Perhaps I should take a step back and explain what I'm trying to solve;
maybe that'll allow us to come up with a more proper solution.

Both the MSM and Tegra drivers allocate framebuffers from shmem in the
presence of an IOMMU. The problem with allocating pages from shmem is
that the pages end up not being flushed, resulting in visual artifacts
on the screen (horizontal black lines) when cachelines from earlier
allocations of these pages get flushed. The drivers also use the IOMMU
API directly to manage the IOVA space.

Currently both drivers work around this by faking up an sg_table and
calling dma_map_sg(), which ends up doing the right thing.
Unfortunately this only works if nothing backs the DMA API and we end
up getting the regular arm_dma_ops. If an IOMMU registers with the
ARM/DMA glue, we end up getting a different set of dma_ops and things
break (buffers are mapped through the IOMMU twice, ...).

Still, this has worked fine on Tegra (and I assume the same is true for
MSM) so far because we don't register an IOMMU with the ARM/DMA glue.
However, while porting this to 64-bit ARM I started seeing failures to
map buffers because SWIOTLB is enabled by default and gets in the way.

With regards to code duplication, I suspect we could call something
like arm_dma_ops.sync_sg_for_device() directly, but that raises the
question of which device to pass in. It isn't clear at allocation time
which device will use the buffer. It may in fact be used by multiple
devices at once.

Do you have an alternative suggestion on how to fix this?

Thierry
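(Editorial aside: a minimal sketch, not taken verbatim from either driver, of the "fake up an sg_table and call dma_map_sg()" workaround Thierry describes. The function and parameter names are invented, and it assumes a struct device is available at flush time, which is exactly the part Thierry points out is awkward.)

```c
#include <linux/dma-mapping.h>
#include <linux/scatterlist.h>

/* Hypothetical sketch of the workaround: the mapping itself is not
 * wanted, dma_map_sg() is called purely for its cache-maintenance
 * side effect on the shmem-backed pages. */
static int example_flush_shmem_pages(struct device *dev,
				     struct page **pages,
				     unsigned int num_pages)
{
	struct sg_table sgt;
	int err;

	err = sg_alloc_table_from_pages(&sgt, pages, num_pages, 0,
					(unsigned long)num_pages << PAGE_SHIFT,
					GFP_KERNEL);
	if (err < 0)
		return err;

	if (!dma_map_sg(dev, sgt.sgl, sgt.nents, DMA_TO_DEVICE)) {
		sg_free_table(&sgt);
		return -ENOMEM;
	}

	/* Tear the temporary mapping down again immediately. */
	dma_unmap_sg(dev, sgt.sgl, sgt.nents, DMA_TO_DEVICE);
	sg_free_table(&sgt);
	return 0;
}
```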
```diff
diff --git a/drivers/gpu/drm/drm_cache.c b/drivers/gpu/drm/drm_cache.c
index 9a62d7a53553..200d86c3d72d 100644
--- a/drivers/gpu/drm/drm_cache.c
+++ b/drivers/gpu/drm/drm_cache.c
@@ -67,6 +67,41 @@ static void drm_cache_flush_clflush(struct page *pages[],
 }
 #endif
 
+#if defined(CONFIG_ARM)
+
+#include <asm/cacheflush.h>
+#include <asm/cachetype.h>
+#include <asm/highmem.h>
+#include <asm/outercache.h>
+
+static void drm_clflush_page(struct page *page)
+{
+	enum dma_data_direction dir = DMA_TO_DEVICE;
+	phys_addr_t phys = page_to_phys(page);
+	size_t size = PAGE_SIZE;
+	void *virt;
+
+	if (PageHighMem(page)) {
+		if (cache_is_vipt_nonaliasing()) {
+			virt = kmap_atomic(page);
+			dmac_map_area(virt, size, dir);
+			kunmap_atomic(virt);
+		} else {
+			virt = kmap_high_get(page);
+			if (virt) {
+				dmac_map_area(virt, size, dir);
+				kunmap_high(page);
+			}
+		}
+	} else {
+		virt = page_address(page);
+		dmac_map_area(virt, size, dir);
+	}
+
+	outer_flush_range(phys, phys + PAGE_SIZE);
+}
+#endif
+
 void
 drm_clflush_pages(struct page *pages[], unsigned long num_pages)
 {
@@ -94,6 +129,11 @@ drm_clflush_pages(struct page *pages[], unsigned long num_pages)
 				   (unsigned long)page_virtual + PAGE_SIZE);
 		kunmap_atomic(page_virtual);
 	}
+#elif defined(CONFIG_ARM)
+	unsigned long i;
+
+	for (i = 0; i < num_pages; i++)
+		drm_clflush_page(pages[i]);
 #else
 	printk(KERN_ERR "Architecture has no drm_cache.c support\n");
 	WARN_ON_ONCE(1);
@@ -118,6 +158,11 @@ drm_clflush_sg(struct sg_table *st)
 
 	if (wbinvd_on_all_cpus())
 		printk(KERN_ERR "Timed out waiting for cache flush.\n");
+#elif defined(CONFIG_ARM)
+	struct sg_page_iter sg_iter;
+
+	for_each_sg_page(st->sgl, &sg_iter, st->nents, 0)
+		drm_clflush_page(sg_page_iter_page(&sg_iter));
 #else
 	printk(KERN_ERR "Architecture has no drm_cache.c support\n");
 	WARN_ON_ONCE(1);
```
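(Editorial aside: a hypothetical caller of the helper the patch wires up on ARM. The function name is invented; a real driver would flush once at allocation time, before mapping the pages through the IOMMU.)

```c
#include <drm/drm_cache.h>

/* Hypothetical usage sketch: with this patch, on CONFIG_ARM each page
 * passed here goes through drm_clflush_page(), i.e. dmac_map_area()
 * followed by outer_flush_range(). */
static void example_prepare_scanout_pages(struct page **pages,
					  unsigned long num_pages)
{
	drm_clflush_pages(pages, num_pages);
}
```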