From patchwork Mon Dec 8 08:19:03 2014
Message-ID: <54855EF7.8050900@nvidia.com>
Date: Mon, 8 Dec 2014 17:19:03 +0900
From: Alexandre Courbot
Organization: NVIDIA
To: Thierry Reding
Subject: Re: [RFC] drm/ttm: dma: Fixes for 32-bit and 64-bit ARM
References: <1415795945-17575-1-git-send-email-thierry.reding@gmail.com>
 <54854FE7.8000604@nvidia.com> <54855D4B.50307@nvidia.com>
In-Reply-To: <54855D4B.50307@nvidia.com>
Cc: Alexandre Courbot, Thomas Hellstrom, Allen Martin, Arnd Bergmann,
 Konrad Rzeszutek Wilk
List-Id: Direct Rendering Infrastructure - Development

On 12/08/2014 05:11 PM, Alexandre Courbot wrote:
> On 12/08/2014 04:14 PM, Alexandre Courbot wrote:
>> On 11/12/2014 09:39 PM, Thierry Reding wrote:
>>> From: Thierry Reding
>>>
>>> dma_alloc_coherent() returns a kernel virtual address that is not part
>>> of the linear range. Passing such an address to virt_to_page() is
>>> illegal on non-coherent architectures. This causes the kernel to oops
>>> on 64-bit ARM because the struct page * obtained from virt_to_page()
>>> points to unmapped memory.
>>>
>>> This commit fixes this by using phys_to_page() since we get a physical
>>> address from dma_alloc_coherent().
>>> Note that this is not a proper fix because if an IOMMU is set up to
>>> translate addresses for the GPU this address will be an I/O virtual
>>> address rather than a physical one. The proper fix probably involves
>>> not getting a pointer to the struct page in the first place, but that
>>> would be a much more intrusive change, if at all possible.
>>>
>>> Until that time, this temporary fix will allow TTM to work on 32-bit
>>> and 64-bit ARM as well, provided that no IOMMU translations are enabled
>>> for the GPU.
>>>
>>> Signed-off-by: Thierry Reding
>>> ---
>>> Arnd, I realize that this isn't a proper fix according to what we
>>> discussed on IRC yesterday, but I can't see a way to remove access to
>>> the pages array that would be as simple as this. I've marked this as
>>> RFC in the hope that it will trigger some discussion that will lead to
>>> a proper solution.
>>>
>>>  drivers/gpu/drm/ttm/ttm_page_alloc_dma.c | 4 ++++
>>>  1 file changed, 4 insertions(+)
>>>
>>> diff --git a/drivers/gpu/drm/ttm/ttm_page_alloc_dma.c b/drivers/gpu/drm/ttm/ttm_page_alloc_dma.c
>>> index c96db433f8af..d7993985752c 100644
>>> --- a/drivers/gpu/drm/ttm/ttm_page_alloc_dma.c
>>> +++ b/drivers/gpu/drm/ttm/ttm_page_alloc_dma.c
>>> @@ -343,7 +343,11 @@ static struct dma_page *__ttm_dma_alloc_page(struct dma_pool *pool)
>>>  					    &d_page->dma,
>>>  					    pool->gfp_flags);
>>>  	if (d_page->vaddr)
>>> +#if defined(CONFIG_ARM) || defined(CONFIG_ARM64)
>>> +		d_page->p = phys_to_page(d_page->dma);
>>> +#else
>>>  		d_page->p = virt_to_page(d_page->vaddr);
>>> +#endif
>>
>> Since I am messing with the IOMMU I just happened to have hit the issue
>> you are mentioning.
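[The distinction the patch relies on can be illustrated with a toy model of the kernel's address arithmetic. The constants and helper names below are illustrative only, not the kernel's actual macros: in the linear range a page-frame number (PFN) can be derived from either the virtual or the physical address, but for a remapped (vmalloc/IOMMU-space) allocation the virtual-address arithmetic yields a meaningless PFN, which is the oops described above.]

```c
#include <stdint.h>

/* Toy model, for illustration only. In the linear map, virtual and
 * physical addresses differ by a constant offset, so a PFN can be
 * computed from either. Outside the linear map this arithmetic is
 * meaningless, which is why virt_to_page() must not be used there. */
#define PAGE_SHIFT  12
#define PAGE_OFFSET 0xC0000000UL  /* illustrative 32-bit kernel split */

/* virt_to_page()-style lookup: valid only for linearly mapped vaddrs. */
static unsigned long virt_to_pfn(uintptr_t vaddr)
{
    return (vaddr - PAGE_OFFSET) >> PAGE_SHIFT;
}

/* phys_to_page()-style lookup: valid for any CPU physical address. */
static unsigned long phys_to_pfn(uintptr_t paddr)
{
    return paddr >> PAGE_SHIFT;
}
```

[For a linear address the two agree: a buffer at physical 0x5000 appears at PAGE_OFFSET + 0x5000 and both helpers yield PFN 5. For a vmalloc-range address there is no such relationship.]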
>> Wouldn't the following work:
>>
>> -	if (d_page->vaddr)
>> -#if defined(CONFIG_ARM) || defined(CONFIG_ARM64)
>> -		d_page->p = phys_to_page(d_page->dma);
>> -#else
>> -		d_page->p = virt_to_page(d_page->vaddr);
>> -#endif
>> -	else {
>> +	if (d_page->vaddr) {
>> +		if (is_vmalloc_addr(d_page->vaddr)) {
>> +			d_page->p = vmalloc_to_page(d_page->vaddr);
>> +		} else {
>> +			d_page->p = virt_to_page(d_page->vaddr);
>> +		}
>> +	} else {
>>
>> A remapped page will end up in the vmalloc range of the address space,
>> and in this case we can use vmalloc_to_page() to get the right page.
>> Pages outside of this range are part of the linear mapping and can be
>> resolved using virt_to_page().
>>
>> Jetson seems to be mostly happy with this, although I sometimes get the
>> following trace:
>>
>> [   13.174763] kernel BUG at ../mm/slab.c:2593!
>> [   13.174767] Internal error: Oops - BUG: 0 [#1] PREEMPT SMP ARM
>> [   13.174790] Modules linked in: nouveau_platform(O+) nouveau(O)
>> cfbfillrect cfbimgblt cfbcopyarea ttm
>> ...
>> [   13.175234] [] (cache_alloc_refill) from [] (__kmalloc+0x100/0x13c)
>> [   13.175247] [] (__kmalloc) from [] (arm_iommu_alloc_attrs+0x94/0x3a8)
>> [   13.175269] [] (arm_iommu_alloc_attrs) from [] (ttm_dma_populate+0x498/0x76c [ttm])
>> [   13.175294] [] (ttm_dma_populate [ttm]) from [] (ttm_tt_bind+0x38/0x68 [ttm])
>> [   13.175315] [] (ttm_tt_bind [ttm]) from [] (ttm_bo_handle_move_mem+0x408/0x47c [ttm])
>> [   13.175337] [] (ttm_bo_handle_move_mem [ttm]) from [] (ttm_bo_validate+0x220/0x22c [ttm])
>> [   13.175359] [] (ttm_bo_validate [ttm]) from [] (ttm_bo_init+0x220/0x338 [ttm])
>> [   13.175480] [] (ttm_bo_init [ttm]) from [] (nouveau_bo_new+0x1c0/0x294 [nouveau])
>> [   13.175688] [] (nouveau_bo_new [nouveau]) from [] (nv84_fence_create+0x1cc/0x240 [nouveau])
>> [   13.175891] [] (nv84_fence_create [nouveau]) from [] (nvc0_fence_create+0xc/0x24 [nouveau])
>> [   13.176094] [] (nvc0_fence_create [nouveau]) from [] (nouveau_accel_init+0xec/0x450 [nouveau])
>>
>> I suspect this is related to this change, but it might also be the
>> side-effect of another bug in my code.
>
> Ok, I finally understand where this trace comes from.
>
> slab.c:2593:
> 	BUG_ON(flags & GFP_SLAB_BUG_MASK);
>
> gfp.h:
> 	#define GFP_SLAB_BUG_MASK (__GFP_DMA32|__GFP_HIGHMEM|~__GFP_BITS_MASK)
>
> and ttm_dma_populate() calls ttm_dma_pool_init() with gfp_flags ==
> GFP_HIGHUSER, which includes __GFP_HIGHMEM. Somehow these flags are
> propagated through arm_iommu_alloc_attrs() up to the slab allocator,
> which meets this BUG_ON() when it needs to grow its cache.
>
> Note that ttm_dma_pool_init() can also be called with GFP_DMA32, which
> will trigger the same error.
>
> So although I am still not sure how this should be fixed, it seems like
> this has nothing to do with the change I am proposing.
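[The arithmetic behind that BUG_ON can be checked in isolation. The flag values below are modeled on the gfp.h of kernels of that era but are illustrative; the exact bit layout varies between versions.]

```c
/* Simplified GFP flag values (illustrative, not authoritative). */
#define ___GFP_DMA       0x01u
#define ___GFP_HIGHMEM   0x02u
#define ___GFP_DMA32     0x04u
#define ___GFP_WAIT      0x10u
#define ___GFP_IO        0x40u
#define ___GFP_FS        0x80u
#define ___GFP_HARDWALL  0x20000u

#define __GFP_BITS_SHIFT 25
#define __GFP_BITS_MASK  ((1u << __GFP_BITS_SHIFT) - 1)

#define GFP_USER     (___GFP_WAIT | ___GFP_IO | ___GFP_FS | ___GFP_HARDWALL)
#define GFP_HIGHUSER (GFP_USER | ___GFP_HIGHMEM)

/* The check slab performs when growing a cache: zone modifiers it
 * cannot honor (highmem or DMA32 pages cannot back kmalloc memory)
 * trip the BUG_ON. */
#define GFP_SLAB_BUG_MASK (___GFP_DMA32 | ___GFP_HIGHMEM | ~__GFP_BITS_MASK)
```

[GFP_HIGHUSER carries ___GFP_HIGHMEM, and GFP_USER | ___GFP_DMA32 carries ___GFP_DMA32, so both intersect the mask; plain GFP_USER does not, matching the two failing cases described above.]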
> Temporary workaround:
>
> --- a/drivers/gpu/drm/ttm/ttm_page_alloc_dma.c
> +++ b/drivers/gpu/drm/ttm/ttm_page_alloc_dma.c
> @@ -882,9 +882,9 @@ int ttm_dma_populate(struct ttm_dma_tt *ttm_dma, struct device *dev)
>
>  	type = ttm_to_type(ttm->page_flags, ttm->caching_state);
>  	if (ttm->page_flags & TTM_PAGE_FLAG_DMA32)
> -		gfp_flags = GFP_USER | GFP_DMA32;
> +		gfp_flags = GFP_USER;
>  	else
> -		gfp_flags = GFP_HIGHUSER;
> +		gfp_flags = GFP_USER;
>  	if (ttm->page_flags & TTM_PAGE_FLAG_ZERO_ALLOC)
>  		gfp_flags |= __GFP_ZERO;
>
> ... which obviously doesn't look right.

This, however, looks better:

diff --git a/arch/arm/mm/dma-mapping.c b/arch/arm/mm/dma-mapping.c
index e8907117861e..bc495354c802 100644
--- a/arch/arm/mm/dma-mapping.c
+++ b/arch/arm/mm/dma-mapping.c
@@ -1106,7 +1106,7 @@ static struct page **__iommu_alloc_buffer(struct device *dev, size_t size,
 	int i = 0;
 
 	if (array_size <= PAGE_SIZE)
-		pages = kzalloc(array_size, gfp);
+		pages = kzalloc(array_size, GFP_KERNEL);
 	else
 		pages = vzalloc(array_size);
 	if (!pages)

I don't see any reason why the array of pages should be allocated with
the same properties as the buffer itself. Looks like this is a DMA
allocator bug.

Sorry about the noise.

Alex.
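[A minimal sketch of the rationale behind that last hunk. The helper below is hypothetical (the actual patch simply hardcodes GFP_KERNEL for the kzalloc() call): the struct page array is bookkeeping metadata, so zone modifiers that only make sense for the DMA buffer itself, and which trip slab's BUG_ON, never need to reach the metadata allocation.]

```c
/* Hypothetical helper illustrating the idea; flag values are
 * illustrative, not the kernel's. Given the gfp mask requested for
 * the DMA buffer, derive a mask safe for allocating its metadata. */
#define GFP_HIGHMEM_BIT 0x02u
#define GFP_DMA32_BIT   0x04u

static unsigned int metadata_gfp(unsigned int buffer_gfp)
{
    /* Strip zone modifiers the slab allocator refuses to honor. */
    return buffer_gfp & ~(GFP_HIGHMEM_BIT | GFP_DMA32_BIT);
}
```

[With this, a GFP_HIGHUSER-style buffer request degrades to a plain kernel allocation for the pages array, which is what passing GFP_KERNEL achieves directly.]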