Message ID | 570B50B4.4020304@nvidia.com (mailing list archive) |
---|---|
State | New, archived |
On 04/11/2016 04:22 PM, Alexandre Courbot wrote:
> ... or maybe we could just unconditionally sync all buffers and let the
> DMA API abstract this away. My concern is that on coherent architectures
> we would still need to loop over all the pages for nothing, as I don't
> think the loop (see e.g. nouveau_bo_sync_for_cpu in nouveau_bo.c) can be
> optimized away by the compiler.

Looking at the code it actually turns out we are already calling the
sync functions on coherent buses anyway, so maybe we have little reason
to keep this at all?
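The trade-off discussed in this message can be sketched in plain user-space C. This is an illustration only, not Nouveau code: `struct fake_bo`, `fake_sync_page()`, `sync_unconditional()` and `sync_if_needed()` are hypothetical names, and the `sync_calls` counter merely stands in for the cost of the per-page calls that a real sync loop would make.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for a buffer object: only a page count. */
struct fake_bo {
	size_t num_pages;
};

/* Number of per-page sync calls issued so far (illustration only). */
size_t sync_calls;

/* Stand-in for a per-page DMA sync call. On a coherent bus the real
 * call may do nothing, but the caller still pays for the call itself. */
void fake_sync_page(size_t page)
{
	(void)page;
	sync_calls++;
}

/* Option A: always walk every page and let the lower layer decide. */
void sync_unconditional(const struct fake_bo *bo)
{
	assert(bo != NULL);
	for (size_t i = 0; i < bo->num_pages; i++)
		fake_sync_page(i);
}

/* Option B: keep a coherency flag and skip the walk entirely. */
void sync_if_needed(const struct fake_bo *bo, bool coherent)
{
	if (coherent)
		return;
	sync_unconditional(bo);
}
```

With an 8-page object, `sync_if_needed(&bo, true)` issues no calls while `sync_unconditional(&bo)` issues 8, which is the "loop over all the pages for nothing" cost the message is concerned about.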
On 04/11/2016 04:22 PM, Alexandre Courbot wrote:
> Hi Robin,
>
> On 04/09/2016 03:46 AM, Robin Murphy wrote:
>> Hi Alex,
>>
>> On 08/04/16 05:47, Alexandre Courbot wrote:
>>> Hi Robin,
>>>
>>> On 04/07/2016 08:50 PM, Robin Murphy wrote:
>>>> Hello,
>>>>
>>>> With 4.6-rc2 (and -rc1) I'm seeing Nouveau blowing up at boot, from the
>>>> look of it by dereferencing some offset from NULL inside
>>>> nouveau_fbcon_imageblit(). My setup is an old XFX 7600GT card plugged
>>>> into an ARM Juno r1 board, which works fine with 4.5 and earlier.
>>>>
>>>> Attached are a couple of logs from booting arm64 defconfig plus DRM and
>>>> Nouveau enabled - the second also has framebuffer console rotation
>>>> turned on, which interestingly seems to move the point of failure, and
>>>> the display does eventually come up to show the tail end of the
>>>> panic in that case.
>>>>
>>>> I might be able to find time for a full bisection next week if isn't
>>>> something sufficiently obvious to anyone who knows this driver.
>>>
>>> Looking at the log it is not clear to me what could be causing this. I
>>> can boot 4.6-rc2 with a GM206 card without any issue. A bisect would
>>> indeed be useful here.
>>
>> OK, turns out the lure of writing something to remotely drive a Juno and
>> parse kernel bootlogs through an automatic bisection was too great to
>> resist on a Friday afternoon :D
>>
>> Bisection came down to 1733a2ad3674("drm/nouveau/device/pci: set as
>> non-CPU-coherent on ARM64"), and sure enough reverting that removes the
>> crash.
>
> Thanks for taking the time to bisect this. And apologies as it seems my
> commit is the reason for your troubles.
>
> The CPU coherency flag is used for two things: explicitly sync buffers
> pages when required, and allocating buffers that are not explicitly
> synced (like fences or pushbuffers) using the DMA API. For this latter
> use, it also accesses the buffer's content using the mapping provided by
> dma_alloc_coherent() instead of creating a new one. All nouveau_bos are
> supposed to be written using nouveau_bo_rd32(), and this function
> handles the case of an DMA-API allocated object by detecting that the
> result of ttm_kmap_obj_virtual() is NULL.
>
> But as it turns out, OUT_RINGp() also calls ttm_kmap_obj_virtual() in
> order to perform a memcpy and uses its result directly - which means we
> are doing memcpy on a NULL pointer. We never caught this because we
> typically do not use Nouveau's fbcon with an ARM setup.
>
> I don't really like this special access for coherent objects, and
> actually had a patch in my tree to attempt to remove it (attached).
> Although it is not the whole solution (see below), the issue should at
> least not be visible with it applied - could you confirm?

Hi Robin, could you confirm whether the attached patch in my previous
mail helps with your problem? Thanks!
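The failure mode explained in the quoted text can be modelled in user-space C. The sketch below is an assumption-laden mock, not driver code: `struct fake_bo`, `fake_mem_index()`, `fake_rd32()` and `fake_copy_direct()` are hypothetical names, and `FAKE_PAGE_SIZE` assumes 4 KiB pages. The lookup follows the shape of the `_nouveau_bo_mem_index()` helper that the attached patch removes: a NULL mapping means "look the address up per page", while a caller that uses the raw mapping directly has nothing valid to copy to.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define FAKE_PAGE_SIZE 4096u	/* assumption: 4 KiB pages */

/* Hypothetical mock of a buffer object. */
struct fake_bo {
	void *kmap_virtual;	/* NULL models a DMA-API allocated object */
	uint8_t **cpu_address;	/* per-page CPU addresses for that case */
};

/* Resolve element `index` of size `sz`, handling both mapping styles. */
void *fake_mem_index(const struct fake_bo *bo, size_t index, size_t sz)
{
	uint8_t *m = bo->kmap_virtual;
	size_t off = index * sz;

	if (m)
		return m + off;	/* contiguous mapping: plain offset */

	/* per-page mapping: pick the page, then the offset inside it */
	return bo->cpu_address[off / FAKE_PAGE_SIZE] + off % FAKE_PAGE_SIZE;
}

/* Accessor that copes with a NULL mapping via the lookup above. */
uint32_t fake_rd32(const struct fake_bo *bo, size_t index)
{
	uint32_t v;

	assert(bo != NULL);
	memcpy(&v, fake_mem_index(bo, index, sizeof(v)), sizeof(v));
	return v;
}

/* Caller that uses the raw mapping directly, as the message describes
 * for OUT_RINGp(). The mock returns -1 instead of writing through NULL;
 * this check is added here only so the sketch does not crash. */
int fake_copy_direct(const struct fake_bo *bo, size_t index,
		     const uint32_t *src, size_t n)
{
	uint32_t *dst = bo->kmap_virtual;

	if (!dst)
		return -1;
	memcpy(dst + index, src, n * sizeof(*src));
	return 0;
}
```

For an object with `kmap_virtual == NULL`, `fake_rd32()` still finds the right word through `cpu_address`, whereas `fake_copy_direct()` has only the NULL base plus an offset to work with, which matches the "offset from NULL" dereference in the original report.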
Hi Alex,

On 20/04/16 05:35, Alexandre Courbot wrote:
[...]
>>> Bisection came down to 1733a2ad3674("drm/nouveau/device/pci: set as
>>> non-CPU-coherent on ARM64"), and sure enough reverting that removes the
>>> crash.
>>
>> Thanks for taking the time to bisect this. And apologies as it seems my
>> commit is the reason for your troubles.
>>
>> The CPU coherency flag is used for two things: explicitly sync buffers
>> pages when required, and allocating buffers that are not explicitly
>> synced (like fences or pushbuffers) using the DMA API. For this latter
>> use, it also accesses the buffer's content using the mapping provided by
>> dma_alloc_coherent() instead of creating a new one. All nouveau_bos are
>> supposed to be written using nouveau_bo_rd32(), and this function
>> handles the case of an DMA-API allocated object by detecting that the
>> result of ttm_kmap_obj_virtual() is NULL.
>>
>> But as it turns out, OUT_RINGp() also calls ttm_kmap_obj_virtual() in
>> order to perform a memcpy and uses its result directly - which means we
>> are doing memcpy on a NULL pointer. We never caught this because we
>> typically do not use Nouveau's fbcon with an ARM setup.
>>
>> I don't really like this special access for coherent objects, and
>> actually had a patch in my tree to attempt to remove it (attached).
>> Although it is not the whole solution (see below), the issue should at
>> least not be visible with it applied - could you confirm?
>
> Hi Robin, could you confirm whether the attached patch in my previous
> mail helps with your problem?

With that patch on top of -rc4, it's conjuring up something that looks
somewhat more like a real address on top of the offset, as it now
crashes with "Unable to handle kernel paging request at virtual address
ffffff8008f841ac", rather than the previous "Unable to handle kernel
NULL pointer dereference at virtual address 000001ac".

That does of course mean it still crashes in the same place, though :(

Robin.

IMPORTANT NOTICE: The contents of this email and any attachments are confidential and may also be privileged. If you are not the intended recipient, please notify the sender immediately and do not disclose the contents to any other person, use it for any purpose, or store or copy the information in any medium. Thank you.
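The observation in Robin's report, that the new faulting address looks like "a real address on top of the offset", can be checked with a little arithmetic. The sketch below is illustrative only: `page_offset()`, `page_base()` and `check_reported_addresses()` are hypothetical helper names, and the 4 KiB page size is an assumption about the kernel configuration in use. The two addresses are the ones quoted in the report.

```c
#include <assert.h>
#include <stdint.h>

#define FAKE_PAGE_MASK 0xfffull	/* assumption: 4 KiB pages */

/* Low bits of an address: the offset within its page. */
uint64_t page_offset(uint64_t addr)
{
	return addr & FAKE_PAGE_MASK;
}

/* High bits of an address: the page it falls in. */
uint64_t page_base(uint64_t addr)
{
	return addr & ~FAKE_PAGE_MASK;
}

/* Compare the two faulting addresses from the report. */
void check_reported_addresses(void)
{
	const uint64_t before = 0x00000000000001acull;	/* NULL + offset */
	const uint64_t after  = 0xffffff8008f841acull;	/* with the patch */

	/* Same offset in both crashes... */
	assert(page_offset(before) == page_offset(after));
	/* ...but only the first one sits in the zero page. */
	assert(page_base(before) == 0);
	assert(page_base(after) != 0);
}
```

Both addresses share the in-page offset 0x1ac, which is consistent with the same access being made at the same offset from a different, non-NULL base. It does not by itself say why that base is still unmapped.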
On 20/04/16 11:44, Robin Murphy wrote:
> Hi Alex,
>
> On 20/04/16 05:35, Alexandre Courbot wrote:
> [...]
>>>> Bisection came down to 1733a2ad3674("drm/nouveau/device/pci: set as
>>>> non-CPU-coherent on ARM64"), and sure enough reverting that removes the
>>>> crash.
>>>
>>> Thanks for taking the time to bisect this. And apologies as it seems my
>>> commit is the reason for your troubles.
>>>
>>> The CPU coherency flag is used for two things: explicitly sync buffers
>>> pages when required, and allocating buffers that are not explicitly
>>> synced (like fences or pushbuffers) using the DMA API. For this latter
>>> use, it also accesses the buffer's content using the mapping provided by
>>> dma_alloc_coherent() instead of creating a new one. All nouveau_bos are
>>> supposed to be written using nouveau_bo_rd32(), and this function
>>> handles the case of an DMA-API allocated object by detecting that the
>>> result of ttm_kmap_obj_virtual() is NULL.
>>>
>>> But as it turns out, OUT_RINGp() also calls ttm_kmap_obj_virtual() in
>>> order to perform a memcpy and uses its result directly - which means we
>>> are doing memcpy on a NULL pointer. We never caught this because we
>>> typically do not use Nouveau's fbcon with an ARM setup.
>>>
>>> I don't really like this special access for coherent objects, and
>>> actually had a patch in my tree to attempt to remove it (attached).
>>> Although it is not the whole solution (see below), the issue should at
>>> least not be visible with it applied - could you confirm?
>>
>> Hi Robin, could you confirm whether the attached patch in my previous
>> mail helps with your problem?
>
> With that patch on top of -rc4, it's conjuring up something that looks
> somewhat more like a real address on top of the offset, as it now
> crashes with "Unable to handle kernel paging request at virtual address
> ffffff8008f841ac", rather than the previous "Unable to handle kernel
> NULL pointer dereference at virtual address 000001ac".
>
> That does of course mean it still crashes in the same place, though :(
>
> Robin.
> IMPORTANT NOTICE: The contents of this email and any attachments are
> confidential and may also be privileged. If you are not the intended
> recipient, please notify the sender immediately and do not disclose the
> contents to any other person, use it for any purpose, or store or copy
> the information in any medium. Thank you.

And since I intentionally sent this to the lists, anyone reading that
_is_ an intended recipient, so it's all good, I promise!

[sorry, SMTP server mixup on my end... *berates self*]

Robin.

>
>
> _______________________________________________
> linux-arm-kernel mailing list
> linux-arm-kernel@lists.infradead.org
> http://lists.infradead.org/mailman/listinfo/linux-arm-kernel

From 6199967b4f690e5ca7f404ebb9d1d8840024b5b7 Mon Sep 17 00:00:00 2001
From: Alexandre Courbot <acourbot@nvidia.com>
Date: Thu, 3 Mar 2016 12:49:28 +0900
Subject: [PATCH] WIP: no dma api for coherent gpuobjs

X-NVConfidentiality: public
---
 drivers/gpu/drm/nouveau/nouveau_bo.c | 61 +++---------------------------------
 1 file changed, 5 insertions(+), 56 deletions(-)

diff --git a/drivers/gpu/drm/nouveau/nouveau_bo.c b/drivers/gpu/drm/nouveau/nouveau_bo.c
index db2a81461e0f..1112209ca871 100644
--- a/drivers/gpu/drm/nouveau/nouveau_bo.c
+++ b/drivers/gpu/drm/nouveau/nouveau_bo.c
@@ -424,13 +424,7 @@ nouveau_bo_map(struct nouveau_bo *nvbo)
 	if (ret)
 		return ret;
 
-	/*
-	 * TTM buffers allocated using the DMA API already have a mapping, let's
-	 * use it instead.
-	 */
-	if (!nvbo->force_coherent)
-		ret = ttm_bo_kmap(&nvbo->bo, 0, nvbo->bo.mem.num_pages,
-				  &nvbo->kmap);
+	ret = ttm_bo_kmap(&nvbo->bo, 0, nvbo->bo.mem.num_pages, &nvbo->kmap);
 
 	ttm_bo_unreserve(&nvbo->bo);
 	return ret;
@@ -442,12 +436,7 @@ nouveau_bo_unmap(struct nouveau_bo *nvbo)
 	if (!nvbo)
 		return;
 
-	/*
-	 * TTM buffers allocated using the DMA API already had a coherent
-	 * mapping which we used, no need to unmap.
-	 */
-	if (!nvbo->force_coherent)
-		ttm_bo_kunmap(&nvbo->kmap);
+	ttm_bo_kunmap(&nvbo->kmap);
 }
 
 void
@@ -514,35 +503,13 @@ nouveau_bo_validate(struct nouveau_bo *nvbo, bool interruptible,
 	return 0;
 }
 
-static inline void *
-_nouveau_bo_mem_index(struct nouveau_bo *nvbo, unsigned index, void *mem, u8 sz)
-{
-	struct ttm_dma_tt *dma_tt;
-	u8 *m = mem;
-
-	index *= sz;
-
-	if (m) {
-		/* kmap'd address, return the corresponding offset */
-		m += index;
-	} else {
-		/* DMA-API mapping, lookup the right address */
-		dma_tt = (struct ttm_dma_tt *)nvbo->bo.ttm;
-		m = dma_tt->cpu_address[index / PAGE_SIZE];
-		m += index % PAGE_SIZE;
-	}
-
-	return m;
-}
-#define nouveau_bo_mem_index(o, i, m) _nouveau_bo_mem_index(o, i, m, sizeof(*m))
-
 void
 nouveau_bo_wr16(struct nouveau_bo *nvbo, unsigned index, u16 val)
 {
 	bool is_iomem;
 	u16 *mem = ttm_kmap_obj_virtual(&nvbo->kmap, &is_iomem);
 
-	mem = nouveau_bo_mem_index(nvbo, index, mem);
+	mem += index;
 
 	if (is_iomem)
 		iowrite16_native(val, (void __force __iomem *)mem);
@@ -556,7 +523,7 @@ nouveau_bo_rd32(struct nouveau_bo *nvbo, unsigned index)
 	bool is_iomem;
 	u32 *mem = ttm_kmap_obj_virtual(&nvbo->kmap, &is_iomem);
 
-	mem = nouveau_bo_mem_index(nvbo, index, mem);
+	mem += index;
 
 	if (is_iomem)
 		return ioread32_native((void __force __iomem *)mem);
@@ -570,7 +537,7 @@ nouveau_bo_wr32(struct nouveau_bo *nvbo, unsigned index, u32 val)
 	bool is_iomem;
 	u32 *mem = ttm_kmap_obj_virtual(&nvbo->kmap, &is_iomem);
 
-	mem = nouveau_bo_mem_index(nvbo, index, mem);
+	mem += index;
 
 	if (is_iomem)
 		iowrite32_native(val, (void __force __iomem *)mem);
@@ -1496,14 +1463,6 @@ nouveau_ttm_tt_populate(struct ttm_tt *ttm)
 	dev = drm->dev;
 	pdev = device->dev;
 
-	/*
-	 * Objects matching this condition have been marked as force_coherent,
-	 * so use the DMA API for them.
-	 */
-	if (!nvxx_device(&drm->device)->func->cpu_coherent &&
-	    ttm->caching_state == tt_uncached)
-		return ttm_dma_populate(ttm_dma, dev->dev);
-
 #if IS_ENABLED(CONFIG_AGP)
 	if (drm->agp.bridge) {
 		return ttm_agp_tt_populate(ttm);
@@ -1561,16 +1520,6 @@ nouveau_ttm_tt_unpopulate(struct ttm_tt *ttm)
 	dev = drm->dev;
 	pdev = device->dev;
 
-	/*
-	 * Objects matching this condition have been marked as force_coherent,
-	 * so use the DMA API for them.
-	 */
-	if (!nvxx_device(&drm->device)->func->cpu_coherent &&
-	    ttm->caching_state == tt_uncached) {
-		ttm_dma_unpopulate(ttm_dma, dev->dev);
-		return;
-	}
-
 #if IS_ENABLED(CONFIG_AGP)
 	if (drm->agp.bridge) {
 		ttm_agp_tt_unpopulate(ttm);
-- 
2.8.0