From patchwork Mon Jan 16 11:58:04 2017
X-Patchwork-Submitter: Vladimir Murzin
X-Patchwork-Id: 9518545
Subject: Re: [RFC PATCH v4 0/5] ARM: Fix dma_alloc_coherent() and friends
 for NOMMU
From: Vladimir Murzin
To: Robin Murphy, Benjamin Gaignard
Cc: sza@esh.hu, Alexandre Torgue, linux@armlinux.org.uk, kbuild-all@01.org,
 linux-arm-kernel@lists.infradead.org, Marek Szyprowski
Date: Mon, 16 Jan 2017 11:58:04 +0000
Message-ID: <1c47a9f3-5167-9605-9878-aa539af75445@arm.com>
In-Reply-To: <6dfa7f5a-712e-3457-305b-7e1b7f6a9b3c@arm.com>
References: <1484057925-23586-1-git-send-email-vladimir.murzin@arm.com>
 <1908ef72-beb0-1855-3c49-d5a37f014c17@arm.com>
 <78458e5f-6b30-ab9b-1226-83fe3a844e3a@arm.com>
 <8a428278-fe0f-ca1f-2b6a-96fdbb363db4@arm.com>
 <5c3476e2-fae4-677c-3ed9-7a4eb4263ecb@arm.com>
 <6dfa7f5a-712e-3457-305b-7e1b7f6a9b3c@arm.com>

On 13/01/17 12:40, Robin Murphy wrote:
> On 13/01/17 09:12, Vladimir Murzin wrote:
>> On 12/01/17 18:07, Robin Murphy wrote:
>>> On 12/01/17 17:15, Vladimir Murzin wrote:
>>>> On 12/01/17 17:04, Robin Murphy wrote:
>>>>> On 12/01/17 16:52, Vladimir Murzin wrote:
>>>>>> On 12/01/17 10:55, Benjamin Gaignard wrote:
>>>>>>> 2017-01-12 11:35 GMT+01:00 Benjamin Gaignard:
>>>>>>>> 2017-01-11 15:34 GMT+01:00 Vladimir Murzin:
>>>>>>>>> On 11/01/17 13:17, Benjamin Gaignard wrote:
>>>>>>>>>> 2017-01-10 15:18 GMT+01:00 Vladimir Murzin:
>>>>>>>>>>> Hi,
>>>>>>>>>>>
>>>>>>>>>>> It seems that the addition of cache support for M-class CPUs has
>>>>>>>>>>> uncovered a latent bug in DMA usage.
>>>>>>>>>>> The NOMMU memory model has been treated as always consistent;
>>>>>>>>>>> however, for the R and M classes of CPU, memory can be covered by
>>>>>>>>>>> an MPU, which in turn might configure RAM as Normal, i.e.
>>>>>>>>>>> bufferable and cacheable. That breaks dma_alloc_coherent() and
>>>>>>>>>>> friends, since data can now get stuck in the caches or be buffered.
>>>>>>>>>>>
>>>>>>>>>>> This patch set tries to address the issue by providing a region of
>>>>>>>>>>> memory suitable for consistent DMA operations. Such a region is
>>>>>>>>>>> assumed to be marked non-cacheable by the MPU. Robin suggested
>>>>>>>>>>> advertising such memory as a reserved shared-dma-pool, rather than
>>>>>>>>>>> using a homebrew command line option, and extending dma-coherent
>>>>>>>>>>> to provide a default DMA area in a similar way to what is done for
>>>>>>>>>>> CMA (PATCH 2/5). That allows us to offload all the bookkeeping
>>>>>>>>>>> onto the generic coherent DMA framework, and it seems it might be
>>>>>>>>>>> reused by other architectures like c6x and blackfin.
>>>>>>>>>>>
>>>>>>>>>>> A dedicated DMA region is required for all cases except:
>>>>>>>>>>> - the MMU/MPU is off
>>>>>>>>>>> - the cpu is v7m w/o cache support
>>>>>>>>>>> - the device is coherent
>>>>>>>>>>>
>>>>>>>>>>> If one of the above conditions is true, dma operations are forced
>>>>>>>>>>> to be coherent and are wired up with dma_noop_ops.
>>>>>>>>>>>
>>>>>>>>>>> To make life easier, NOMMU dma operations are kept in a separate
>>>>>>>>>>> compilation unit.
>>>>>>>>>>>
>>>>>>>>>>> Since the issue was reported at the same time as Benjamin sent his
>>>>>>>>>>> patch [1] to allow mmap for NOMMU, his case is also addressed in
>>>>>>>>>>> this series (PATCH 1/5 and PATCH 3/5).
>>>>>>>>>>>
>>>>>>>>>>> Thanks!
>>>>>>>>>>
>>>>>>>>>> I have tested this v4 on my setup (stm32f4, no cache, no MPU) and
>>>>>>>>>> unfortunately it doesn't work with my drm/kms driver.
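For reference, the reserved shared-dma-pool mentioned above would be declared in the device tree along these lines. This is a sketch only: the label, unit address and size are made up, and linux,dma-default is the marker proposed for selecting the pool as the default one for coherent allocations.

```dts
/ {
	reserved-memory {
		#address-cells = <1>;
		#size-cells = <1>;
		ranges;

		/* Hypothetical region the MPU is expected to map non-cacheable */
		dma_pool: dma@60000000 {
			compatible = "shared-dma-pool";
			reg = <0x60000000 0x100000>;	/* base and 1 MiB size: example values */
			linux,dma-default;		/* use as the default coherent pool */
		};
	};
};
```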
>>>>>>>>>
>>>>>>>>> I guess the same goes for fbmem, but it would be better to have
>>>>>>>>> confirmation, since the amba-clcd I use has not been ported to
>>>>>>>>> drm/kms (yet), so I can't test that.
>>>>>>>>>
>>>>>>>>>> I don't get any errors, but nothing is displayed, unlike what I
>>>>>>>>>> have when using the current dma-mapping code.
>>>>>>>>>> I guess the issue is coming from dma-noop, where __get_free_pages()
>>>>>>>>>> is used instead of alloc_pages() as in dma-mapping.
>>>>>>>>>
>>>>>>>>> Unless I've missed something, below is a call stack for both:
>>>>>>>>>
>>>>>>>>> #1
>>>>>>>>> __alloc_simple_buffer
>>>>>>>>> __dma_alloc_buffer
>>>>>>>>> alloc_pages
>>>>>>>>> split_page
>>>>>>>>> __dma_clear_buffer
>>>>>>>>> memset
>>>>>>>>> page_address
>>>>>>>>>
>>>>>>>>> #2
>>>>>>>>> __get_free_pages
>>>>>>>>> alloc_pages
>>>>>>>>> page_address
>>>>>>>>>
>>>>>>>>> So the difference is that the nommu case in dma-mapping.c memzeros
>>>>>>>>> the memory, handles DMA_ATTR_NO_KERNEL_MAPPING and optimises memory
>>>>>>>>> usage.
>>>>>>>>>
>>>>>>>>> Is something from the above critical for your driver?
>>>>>>>>
>>>>>>>> I have removed all the differences (split_page, __dma_clear_buffer,
>>>>>>>> memset) from #1 and it is still working.
>>>>>>>> The DMA_ATTR_NO_KERNEL_MAPPING flag is not set when allocating the
>>>>>>>> buffer.
>>>>>>>>
>>>>>>>> I have investigated further and found that dma-noop doesn't take
>>>>>>>> care of the "dma-ranges" property which is set in DT.
>>>>>>>> I believe that is the root cause of my problem with your patches.
>>>>>>>
>>>>>>> After testing, changing virt_to_phys to virt_to_dma in dma-noop.c
>>>>>>> fixes the issue; modetest and fbdemo are functional again.
>>>>>>>
>>>>>>
>>>>>> Thanks for narrowing it down! I had not noticed that stm32f4 remaps
>>>>>> its memory, so the dma-ranges property is in use.
>>>>>>
>>>>>> It looks like virt_to_dma is ARM-specific, so I probably have to
>>>>>> discard the idea of reusing dma-noop-ops and switch the logic in
>>>>>> dma-mapping-nommu.c to be based on an is_device_dma_coherent(dev)
>>>>>> check.
>>>>>
>>>>> dma_pfn_offset is a member of struct device, so it should be OK for
>>>>> dma_noop_ops to also make reference to it (and assume it's zero if not
>>>>> explicitly set).
>>>>>
>>>>>> Meanwhile, I'm quite puzzled about how such memory remapping is
>>>>>> supposed to work together with reserved memory. It seems dma-ranges is
>>>>>> accounted for neither while reserving the memory (it is too early) nor
>>>>>> while allocating/mapping/etc.
>>>>>
>>>>> The reserved memory is described in terms of CPU physical addresses, so
>>>>> a device offset shouldn't matter from that perspective. It only comes
>>>>> into play at the point you generate the dma_addr_t to hand off to the
>>>>> device - only then do you need to transform the CPU physical address of
>>>>> the allocated/mapped page into the device's view of that page (i.e.
>>>>> subtract the offset).
>>>>
>>>> Thanks for the explanation! So dma-coherent.c should be modified,
>>>> right? I see that some architectures provide phys_to_dma/dma_to_phys
>>>> helpers, primarily for swiotlb; is it safe to reuse them, given that a
>>>> default implementation is provided? Nothing under Documentation
>>>> explains how they are supposed to be used - sorry if I'm asking a
>>>> stupid question.
>>>
>>> Those are essentially SWIOTLB-specific, so can't be universally relied
>>> upon. I think something like this ought to suffice:
>>
>> Yup, but what about dma-coherent.c? Currently it has
>>
>> int dma_alloc_from_coherent(struct device *dev, ssize_t size,
>>			    dma_addr_t *dma_handle, void **ret)
>> {
>>	...
>>	*dma_handle = mem->device_base + (pageno << PAGE_SHIFT);
>>	*ret = mem->virt_base + (pageno << PAGE_SHIFT);
>>	...
>> }
>>
>> In case the reserved memory is described in terms of CPU phys addresses,
>> wouldn't we need to take dma_pfn_offset into account?
What am I missing?

> Ah yes, I overlooked that one. AFAICS, that's intended to be accounted
> for when calling dma_init_coherent_memory (i.e. phys_addr vs.
> device_addr), but that's a bit awkward for a global pool.
>
> How utterly disgusting do you think this (or some variant thereof) looks?
>
>	/* Apply device-specific offset for the global pool */
>	if (mem == dma_coherent_default_memory)
>		*handle += dev->dma_pfn_offset << PAGE_SHIFT;

It'd work for the default DMA region, but IMO the issue is wider here...
does the following look better?

Cheers
Vladimir

>
> Robin.
>
>> Thanks
>> Vladimir
>>
>>>
>>> ---8<---
>>> diff --git a/lib/dma-noop.c b/lib/dma-noop.c
>>> index 3d766e78fbe2..fbb1b37750d5 100644
>>> --- a/lib/dma-noop.c
>>> +++ b/lib/dma-noop.c
>>> @@ -8,6 +8,11 @@
>>>  #include
>>>  #include
>>>
>>> +static dma_addr_t dma_noop_dev_offset(struct device *dev)
>>> +{
>>> +	return (dma_addr_t)dev->dma_pfn_offset << PAGE_SHIFT;
>>> +}
>>> +
>>>  static void *dma_noop_alloc(struct device *dev, size_t size,
>>>  			    dma_addr_t *dma_handle, gfp_t gfp,
>>>  			    unsigned long attrs)
>>> @@ -16,7 +21,7 @@ static void *dma_noop_alloc(struct device *dev, size_t size,
>>>
>>>  	ret = (void *)__get_free_pages(gfp, get_order(size));
>>>  	if (ret)
>>> -		*dma_handle = virt_to_phys(ret);
>>> +		*dma_handle = virt_to_phys(ret) - dma_noop_dev_offset(dev);
>>>  	return ret;
>>>  }
>>>
>>> @@ -32,7 +37,7 @@ static dma_addr_t dma_noop_map_page(struct device *dev, struct page *page,
>>>  			    enum dma_data_direction dir,
>>>  			    unsigned long attrs)
>>>  {
>>> -	return page_to_phys(page) + offset;
>>> +	return page_to_phys(page) + offset - dma_noop_dev_offset(dev);
>>>  }
>>>
>>>  static int dma_noop_map_sg(struct device *dev, struct scatterlist *sgl, int nents,
>>> @@ -47,7 +52,8 @@ static int dma_noop_map_sg(struct device *dev, struct scatterlist *sgl, int nent
>>>
>>>  		BUG_ON(!sg_page(sg));
>>>  		va = sg_virt(sg);
>>> -		sg_dma_address(sg) = (dma_addr_t)virt_to_phys(va);
>>> +		sg_dma_address(sg) = (dma_addr_t)virt_to_phys(va) -
>>> +				     dma_noop_dev_offset(dev);
>>>  		sg_dma_len(sg) = sg->length;
>>>  	}
>>> --->8---
>>>
>>> intentionally whitespace-damaged by copy-pasting off my terminal to
>>> emphasise how utterly untested it is ;)
>>>
>>> Robin.
>>>
>>
>
>

diff --git a/drivers/base/dma-coherent.c b/drivers/base/dma-coherent.c
index b52ba27..22daa4c 100644
--- a/drivers/base/dma-coherent.c
+++ b/drivers/base/dma-coherent.c
@@ -27,6 +27,15 @@ static inline struct dma_coherent_mem *dev_get_coherent_memory(struct device *de
 	return dma_coherent_default_memory;
 }
 
+static inline dma_addr_t dma_get_device_base(struct device *dev,
+					     struct dma_coherent_mem *mem)
+{
+	if (!dev)
+		return mem->pfn_base << PAGE_SHIFT;
+
+	return (mem->pfn_base - dev->dma_pfn_offset) << PAGE_SHIFT;
+}
+
 static bool dma_init_coherent_memory(
 	phys_addr_t phys_addr, dma_addr_t device_addr, size_t size, int flags,
 	struct dma_coherent_mem **mem)
@@ -92,13 +101,19 @@ static void dma_release_coherent_memory(struct dma_coherent_mem *mem)
 static int dma_assign_coherent_memory(struct device *dev,
 				      struct dma_coherent_mem *mem)
 {
+	unsigned long dma_pfn_offset = mem->pfn_base - PFN_DOWN(mem->device_base);
+
 	if (!dev)
 		return -ENODEV;
 
 	if (dev->dma_mem)
 		return -EBUSY;
 
+	if (dev->dma_pfn_offset)
+		WARN_ON(dev->dma_pfn_offset != dma_pfn_offset);
+
 	dev->dma_mem = mem;
+	dev->dma_pfn_offset = dma_pfn_offset;
 
 	/* FIXME: this routine just ignores DMA_MEMORY_INCLUDES_CHILDREN */
 	return 0;
@@ -145,7 +160,7 @@ void *dma_mark_declared_memory_occupied(struct device *dev,
 		return ERR_PTR(-EINVAL);
 
 	spin_lock_irqsave(&mem->spinlock, flags);
-	pos = (device_addr - mem->device_base) >> PAGE_SHIFT;
+	pos = PFN_DOWN(device_addr - dma_get_device_base(dev, mem));
 	err = bitmap_allocate_region(mem->bitmap, pos, get_order(size));
 	spin_unlock_irqrestore(&mem->spinlock, flags);
 
@@ -195,8 +210,9 @@ int dma_alloc_from_coherent(struct device *dev, ssize_t size,
 	/*
 	 * Memory was found in the per-device area.
 	 */
-	*dma_handle = mem->device_base + (pageno << PAGE_SHIFT);
+	*dma_handle = dma_get_device_base(dev, mem) + (pageno << PAGE_SHIFT);
 	*ret = mem->virt_base + (pageno << PAGE_SHIFT);
+	dma_memory_map = (mem->flags & DMA_MEMORY_MAP);
 	spin_unlock_irqrestore(&mem->spinlock, flags);
 
 	if (dma_memory_map)