Message ID | 622c9ac3784170880dbde6900146e68631df958a.1523977133.git.robin.murphy@arm.com (mailing list archive) |
---|---|
State | New, archived |
Headers | show |
On 4/17/2018 11:58 AM, Robin Murphy wrote: > The amdgpu driver doesn't appear to directly use the scatterlist mapped > by amdgpu_ttm_tt_pin_userptr(), it merely hands it off to > drm_prime_sg_to_page_addr_arrays() to generate the dma_address array > which it actually cares about. Now that the latter can cope with > dma_map_sg() coalescing dma-contiguous segments such that it returns > 0 < count < nents, we can relax the current count == nents check to > only consider genuine failure as other drivers do. > > This avoids a false negative on arm64 systems with an IOMMU. > > Reported-by: Sinan Kaya <okaya@codeaurora.org> > Signed-off-by: Robin Murphy <robin.murphy@arm.com> > --- > > v2: Expand commit message to clarify > > drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c | 2 +- > 1 file changed, 1 insertion(+), 1 deletion(-) > > diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c > index 205da3ff9cd0..f81e96a4242f 100644 > --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c > +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c > @@ -813,7 +813,7 @@ static int amdgpu_ttm_tt_pin_userptr(struct ttm_tt *ttm) > > r = -ENOMEM; > nents = dma_map_sg(adev->dev, ttm->sg->sgl, ttm->sg->nents, direction); > - if (nents != ttm->sg->nents) > + if (nents == 0) > goto release_sg; > > drm_prime_sg_to_page_addr_arrays(ttm->sg, ttm->pages, > Tested-by: Sinan Kaya <okaya@codeaurora.org> using QDF2400 and XFX Vega64 GPU for the first two patches. ./builddir/tests/amdgpu/amdgpu_test -s 1 Suite: Basic Tests Test: Userptr Test ...passed Userptr Test fails without this patch.
On 4/17/2018 5:13 PM, Sinan Kaya wrote: > Tested-by: Sinan Kaya <okaya@codeaurora.org> > > using QDF2400 and XFX Vega64 GPU for the first two patches. > > ./builddir/tests/amdgpu/amdgpu_test -s 1 > > Suite: Basic Tests > Test: Userptr Test ...passed > > Userptr Test fails without this patch. I'm taking this back. I observed a crash with the HSA applications: ubuntu@ubuntu:~/amdgpu$_./vectoradd_hip.exe [ 834.002206] create_process:620 [ 837.413021] Unable to handle kernel NULL pointer dereference at virtual address 00000018 [ 837.414097] user pgtable: 4k pages, 48-bit VAs, pgd = ffff80000d448000 [ 837.427034] [0000000000000018] *pgd=000000000a424003, *pud=000000000e0b3003, *pmd=0000000000000000 [ 837.427414] Internal error: Oops: 96000006 [#1] SMP [ 837.427744] Modules linked in: [ 837.457060] CPU: 3 PID: 2321 Comm: vectoradd_hip.e Not tainted 4.13.0 #5 [ 837.463076] task: ffff80000dfb0d80 task.stack: ffff80000e17c000 [ 837.473795] PC is at drm_prime_sg_to_page_addr_arrays+0xac/0xec [ 837.482877] LR is at drm_prime_sg_to_page_addr_arrays+0xac/0xec [ 837.491910] pc : [<ffff0000084877e8>] lr : [<ffff0000084877e8>] pstate: 80400149 [ 837.492022] sp : ffff80000e17f850 [ 837.492115] x29: ffff80000e17f850 x28: ffff80000d586700 [ 837.516635] x27: 0000000000000000 x26: 0000e10410004000 [ 837.526444] x25: ffff80000cb91880 x24: 0000000000000002 [ 837.534974] x23: ffff80000cb91910 x22: ffff80000cb91900 [ 837.535178] x21: 0000000000000002 x20: 0000000000000000 [ 837.535340] x19: ffff80000cb91880 x18: 0000ffffffffd278 [ 837.560498] x17: 0000ffffbef39240 x16: ffff0000081bb868 [ 837.560684] x15: 0000ffffbf6fe000 x14: 0000000000000000 [ 837.574764] x13: 0000000000000000 x12: 0000000000000000 [ 837.588881] x11: 0000000000000001 x10: ffff80000a449038 [ 837.593181] x9 : 0000000000000000 x8 : ffff80000cb91980 [ 837.604606] x7 : 0000000000000000 x6 : 000000000000003f [ 837.612801] x5 : 0000000000000040 x4 : 0000000000000000 [ 837.617425] x3 : 0000000000000002 x2 : 0000000000000000 [ 837.625768] x1 : 0000000000000002 x0 : 0000000000000000 [ 838.516100] [<ffff0000084877e8>] drm_prime_sg_to_page_addr_arrays+0xac/0xec [ 838.516385] [<ffff0000084c4df4>] amdgpu_ttm_tt_populate+0x80/0xe8 [ 838.545137] [<ffff000008498fa4>] ttm_tt_bind+0x3c/0x7c [ 838.558468] [<ffff00000849abb4>] ttm_bo_handle_move_mem+0x12c/0x340 [ 838.562518] [<ffff00000849b988>] ttm_bo_validate+0x90/0x100 [ 838.572370] [<ffff00000849bc54>] ttm_bo_init_reserved+0x25c/0x324 [ 838.582103] [<ffff0000084c82b4>] amdgpu_bo_do_create+0x140/0x3e4 [ 838.591609] [<ffff0000084c8598>] amdgpu_bo_create+0x40/0x15c [ 838.601034] [<ffff00000856abc4>] amdgpu_amdkfd_gpuvm_alloc_memory_of_gpu+0x36c/0x80c [ 838.609631] [<ffff0000084a3224>] kfd_ioctl_alloc_memory_of_gpu+0xfc/0x180 [ 838.621500] [<ffff0000084a49c0>] kfd_ioctl+0x144/0x1e8 [ 838.632253] [<ffff0000081bb0e8>] vfs_ioctl+0x18/0x40 [ 838.641592] [<ffff0000081bb758>] do_vfs_ioctl+0x5ac/0x6bc [ 838.649349] [<ffff0000081bb8c4>] SyS_ioctl+0x5c/0x8c [ 838.649609] [<ffff000008082bf0>] el0_svc_naked+0x24/0x28 [ 838.649776] Code: 17fffff1 350000d4 aa1903e0 97fab3f6 (b9401814) [ 838.672742] ---[ end trace fb2627bd4d4c9818 ]--- Robin, if you want to debug this; feel free to send me a debug patch.
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c index 205da3ff9cd0..f81e96a4242f 100644 --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c @@ -813,7 +813,7 @@ static int amdgpu_ttm_tt_pin_userptr(struct ttm_tt *ttm) r = -ENOMEM; nents = dma_map_sg(adev->dev, ttm->sg->sgl, ttm->sg->nents, direction); - if (nents != ttm->sg->nents) + if (nents == 0) goto release_sg; drm_prime_sg_to_page_addr_arrays(ttm->sg, ttm->pages,
The amdgpu driver doesn't appear to directly use the scatterlist mapped by amdgpu_ttm_tt_pin_userptr(), it merely hands it off to drm_prime_sg_to_page_addr_arrays() to generate the dma_address array which it actually cares about. Now that the latter can cope with dma_map_sg() coalescing dma-contiguous segments such that it returns 0 < count < nents, we can relax the current count == nents check to only consider genuine failure as other drivers do. This avoids a false negative on arm64 systems with an IOMMU. Reported-by: Sinan Kaya <okaya@codeaurora.org> Signed-off-by: Robin Murphy <robin.murphy@arm.com> --- v2: Expand commit message to clarify drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-)