Message ID | 20200408115331.5529-1-thomas_os@shipmail.org (mailing list archive) |
---|---|
State | New, archived |
Headers | show |
Series | drm/ttm: Temporarily disable the huge_fault() callback | expand |
Hi, Christian, On 4/8/20 1:53 PM, Thomas Hellström (VMware) wrote: > From: "Thomas Hellstrom (VMware)" <thomas_os@shipmail.org> > > With amdgpu and CONFIG_TRANSPARENT_HUGEPAGE_ALWAYS=y, there are > errors like: > BUG: non-zero pgtables_bytes on freeing mm > and: > BUG: Bad rss-counter state > with TTM transparent huge-pages. > Until we've figured out what other TTM drivers do differently compared to > vmwgfx, disable the huge_fault() callback, eliminating transhuge > page-table entries. > > Cc: Christian König <christian.koenig@amd.com> > Signed-off-by: Thomas Hellstrom (VMware) <thomas_os@shipmail.org> > Reported-by: Alex Xu (Hello71) <alex_y_xu@yahoo.ca> > Tested-by: Alex Xu (Hello71) <alex_y_xu@yahoo.ca> > --- Without being able to test and track this down on amdgpu there's little more than this I can do at the moment. Hopefully I'll be able to test on nouveau/ttm after getting back from vacation to see if I can reproduce. It looks like some part of the kernel mistakes a huge page-table entry for a page directory, and that would be a path that is not hit with vmwgfx. /Thomas
Am 08.04.20 um 14:01 schrieb Thomas Hellström (VMware): > Hi, Christian, > > On 4/8/20 1:53 PM, Thomas Hellström (VMware) wrote: >> From: "Thomas Hellstrom (VMware)" <thomas_os@shipmail.org> >> >> With amdgpu and CONFIG_TRANSPARENT_HUGEPAGE_ALWAYS=y, there are >> errors like: >> BUG: non-zero pgtables_bytes on freeing mm >> and: >> BUG: Bad rss-counter state >> with TTM transparent huge-pages. >> Until we've figured out what other TTM drivers do differently >> compared to >> vmwgfx, disable the huge_fault() callback, eliminating transhuge >> page-table entries. >> >> Cc: Christian König <christian.koenig@amd.com> >> Signed-off-by: Thomas Hellstrom (VMware) <thomas_os@shipmail.org> >> Reported-by: Alex Xu (Hello71) <alex_y_xu@yahoo.ca> >> Tested-by: Alex Xu (Hello71) <alex_y_xu@yahoo.ca> Acked-by: Christian König <christian.koenig@amd.com> >> --- > > Without being able to test and track this down on amdgpu there's > little more than this I can do at the moment. Hopefully I'll be able > to test on nouveau/ttm after getting back from vacation to see if I > can reproduce. > > It looks like some part of the kernel mistakes a huge page-table entry > for a page directory, and that would be a path that is not hit with > vmwgfx. Well that looks like an ugly one and I don't know enough about the page table handling to hunt this down either. BTW: Have you seen the coverity warning about "WARN_ON_ONCE(ret = VM_FAULT_FALLBACK);"? Regards, Christian. > > /Thomas > >
On 4/8/20 2:19 PM, Christian König wrote: > Am 08.04.20 um 14:01 schrieb Thomas Hellström (VMware): >> Hi, Christian, >> >> On 4/8/20 1:53 PM, Thomas Hellström (VMware) wrote: >>> From: "Thomas Hellstrom (VMware)" <thomas_os@shipmail.org> >>> >>> With amdgpu and CONFIG_TRANSPARENT_HUGEPAGE_ALWAYS=y, there are >>> errors like: >>> BUG: non-zero pgtables_bytes on freeing mm >>> and: >>> BUG: Bad rss-counter state >>> with TTM transparent huge-pages. >>> Until we've figured out what other TTM drivers do differently >>> compared to >>> vmwgfx, disable the huge_fault() callback, eliminating transhuge >>> page-table entries. >>> >>> Cc: Christian König <christian.koenig@amd.com> >>> Signed-off-by: Thomas Hellstrom (VMware) <thomas_os@shipmail.org> >>> Reported-by: Alex Xu (Hello71) <alex_y_xu@yahoo.ca> >>> Tested-by: Alex Xu (Hello71) <alex_y_xu@yahoo.ca> > > Acked-by: Christian König <christian.koenig@amd.com> > >>> --- >> >> Without being able to test and track this down on amdgpu there's >> little more than this I can do at the moment. Hopefully I'll be able >> to test on nouveau/ttm after getting back from vacation to see if I >> can reproduce. >> >> It looks like some part of the kernel mistakes a huge page-table >> entry for a page directory, and that would be a path that is not hit >> with vmwgfx. > > Well that looks like an ugly one and I don't know enough about the > page table handling to hunt this down either. > > BTW: Have you seen the coverity warning about "WARN_ON_ONCE(ret = > VM_FAULT_FALLBACK);"? Yes, that's a false warning but it might be that it should be rewritten for clarity like so: ret = VM_FAULT_FALLBACK; WARN_ON_ONCE(true); /Thomas > > Regards, > Christian. > >> >> /Thomas >> >>
Am 08.04.20 um 15:49 schrieb Thomas Hellström (VMware): > On 4/8/20 2:19 PM, Christian König wrote: >> Am 08.04.20 um 14:01 schrieb Thomas Hellström (VMware): >>> Hi, Christian, >>> >>> On 4/8/20 1:53 PM, Thomas Hellström (VMware) wrote: >>>> From: "Thomas Hellstrom (VMware)" <thomas_os@shipmail.org> >>>> >>>> With amdgpu and CONFIG_TRANSPARENT_HUGEPAGE_ALWAYS=y, there are >>>> errors like: >>>> BUG: non-zero pgtables_bytes on freeing mm >>>> and: >>>> BUG: Bad rss-counter state >>>> with TTM transparent huge-pages. >>>> Until we've figured out what other TTM drivers do differently >>>> compared to >>>> vmwgfx, disable the huge_fault() callback, eliminating transhuge >>>> page-table entries. >>>> >>>> Cc: Christian König <christian.koenig@amd.com> >>>> Signed-off-by: Thomas Hellstrom (VMware) <thomas_os@shipmail.org> >>>> Reported-by: Alex Xu (Hello71) <alex_y_xu@yahoo.ca> >>>> Tested-by: Alex Xu (Hello71) <alex_y_xu@yahoo.ca> >> >> Acked-by: Christian König <christian.koenig@amd.com> >> >>>> --- >>> >>> Without being able to test and track this down on amdgpu there's >>> little more than this I can do at the moment. Hopefully I'll be able >>> to test on nouveau/ttm after getting back from vacation to see if I >>> can reproduce. >>> >>> It looks like some part of the kernel mistakes a huge page-table >>> entry for a page directory, and that would be a path that is not hit >>> with vmwgfx. >> >> Well that looks like an ugly one and I don't know enough about the >> page table handling to hunt this down either. >> >> BTW: Have you seen the coverity warning about "WARN_ON_ONCE(ret = >> VM_FAULT_FALLBACK);"? > > Yes, that's a false warning but it might be that it should be > rewritten for clarity like so: > > ret = VM_FAULT_FALLBACK; > WARN_ON_ONCE(true); Using an additional () also usually works, but I'm not even sure if WARN_ON_ONCE() isn't compiled to a no-op under some circumstances. So better save than sorry, Christian. > > /Thomas > > > >> >> Regards, >> Christian. >> >>> >>> /Thomas >>> >>> >
diff --git a/drivers/gpu/drm/ttm/ttm_bo_vm.c b/drivers/gpu/drm/ttm/ttm_bo_vm.c index 6ee3b96f0d13..0ad30b112982 100644 --- a/drivers/gpu/drm/ttm/ttm_bo_vm.c +++ b/drivers/gpu/drm/ttm/ttm_bo_vm.c @@ -442,66 +442,6 @@ vm_fault_t ttm_bo_vm_fault(struct vm_fault *vmf) } EXPORT_SYMBOL(ttm_bo_vm_fault); -#ifdef CONFIG_TRANSPARENT_HUGEPAGE -/** - * ttm_pgprot_is_wrprotecting - Is a page protection value write-protecting? - * @prot: The page protection value - * - * Return: true if @prot is write-protecting. false otherwise. - */ -static bool ttm_pgprot_is_wrprotecting(pgprot_t prot) -{ - /* - * This is meant to say "pgprot_wrprotect(prot) == prot" in a generic - * way. Unfortunately there is no generic pgprot_wrprotect. - */ - return pte_val(pte_wrprotect(__pte(pgprot_val(prot)))) == - pgprot_val(prot); -} - -static vm_fault_t ttm_bo_vm_huge_fault(struct vm_fault *vmf, - enum page_entry_size pe_size) -{ - struct vm_area_struct *vma = vmf->vma; - pgprot_t prot; - struct ttm_buffer_object *bo = vma->vm_private_data; - vm_fault_t ret; - pgoff_t fault_page_size = 0; - bool write = vmf->flags & FAULT_FLAG_WRITE; - - switch (pe_size) { - case PE_SIZE_PMD: - fault_page_size = HPAGE_PMD_SIZE >> PAGE_SHIFT; - break; -#ifdef CONFIG_HAVE_ARCH_TRANSPARENT_HUGEPAGE_PUD - case PE_SIZE_PUD: - fault_page_size = HPAGE_PUD_SIZE >> PAGE_SHIFT; - break; -#endif - default: - WARN_ON_ONCE(1); - return VM_FAULT_FALLBACK; - } - - /* Fallback on write dirty-tracking or COW */ - if (write && ttm_pgprot_is_wrprotecting(vma->vm_page_prot)) - return VM_FAULT_FALLBACK; - - ret = ttm_bo_vm_reserve(bo, vmf); - if (ret) - return ret; - - prot = vm_get_page_prot(vma->vm_flags); - ret = ttm_bo_vm_fault_reserved(vmf, prot, 1, fault_page_size); - if (ret == VM_FAULT_RETRY && !(vmf->flags & FAULT_FLAG_RETRY_NOWAIT)) - return ret; - - dma_resv_unlock(bo->base.resv); - - return ret; -} -#endif - void ttm_bo_vm_open(struct vm_area_struct *vma) { struct ttm_buffer_object *bo = vma->vm_private_data; @@ -604,9 +544,6 @@ static const struct vm_operations_struct ttm_bo_vm_ops = { .open = ttm_bo_vm_open, .close = ttm_bo_vm_close, .access = ttm_bo_vm_access, -#ifdef CONFIG_TRANSPARENT_HUGEPAGE - .huge_fault = ttm_bo_vm_huge_fault, -#endif }; static struct ttm_buffer_object *ttm_bo_vm_lookup(struct ttm_bo_device *bdev,