diff mbox series

drm/ttm: Temporarily disable the huge_fault() callback

Message ID 20200408115331.5529-1-thomas_os@shipmail.org (mailing list archive)
State New, archived
Headers show
Series drm/ttm: Temporarily disable the huge_fault() callback | expand

Commit Message

Thomas Hellström (Intel) April 8, 2020, 11:53 a.m. UTC
From: "Thomas Hellstrom (VMware)" <thomas_os@shipmail.org>

With amdgpu and CONFIG_TRANSPARENT_HUGEPAGE_ALWAYS=y, there are
errors like:
BUG: non-zero pgtables_bytes on freeing mm
and:
BUG: Bad rss-counter state
with TTM transparent huge-pages.
Until we've figured out what other TTM drivers do differently compared to
vmwgfx, disable the huge_fault() callback, eliminating transhuge
page-table entries.

Cc: Christian König <christian.koenig@amd.com>
Signed-off-by: Thomas Hellstrom (VMware) <thomas_os@shipmail.org>
Reported-by: Alex Xu (Hello71) <alex_y_xu@yahoo.ca>
Tested-by: Alex Xu (Hello71) <alex_y_xu@yahoo.ca>
---
 drivers/gpu/drm/ttm/ttm_bo_vm.c | 63 ---------------------------------
 1 file changed, 63 deletions(-)

Comments

Thomas Hellström (Intel) April 8, 2020, 12:01 p.m. UTC | #1
Hi, Christian,

On 4/8/20 1:53 PM, Thomas Hellström (VMware) wrote:
> From: "Thomas Hellstrom (VMware)" <thomas_os@shipmail.org>
>
> With amdgpu and CONFIG_TRANSPARENT_HUGEPAGE_ALWAYS=y, there are
> errors like:
> BUG: non-zero pgtables_bytes on freeing mm
> and:
> BUG: Bad rss-counter state
> with TTM transparent huge-pages.
> Until we've figured out what other TTM drivers do differently compared to
> vmwgfx, disable the huge_fault() callback, eliminating transhuge
> page-table entries.
>
> Cc: Christian König <christian.koenig@amd.com>
> Signed-off-by: Thomas Hellstrom (VMware) <thomas_os@shipmail.org>
> Reported-by: Alex Xu (Hello71) <alex_y_xu@yahoo.ca>
> Tested-by: Alex Xu (Hello71) <alex_y_xu@yahoo.ca>
> ---

Without being able to test and track this down on amdgpu there's little 
more than this I can do at the moment. Hopefully I'll be able to test on 
nouveau/ttm after getting back from vacation to see if I can reproduce.

It looks like some part of the kernel mistakes a huge page-table entry 
for a page directory, and that would be a path that is not hit with vmwgfx.

/Thomas
Christian König April 8, 2020, 12:19 p.m. UTC | #2
Am 08.04.20 um 14:01 schrieb Thomas Hellström (VMware):
> Hi, Christian,
>
> On 4/8/20 1:53 PM, Thomas Hellström (VMware) wrote:
>> From: "Thomas Hellstrom (VMware)" <thomas_os@shipmail.org>
>>
>> With amdgpu and CONFIG_TRANSPARENT_HUGEPAGE_ALWAYS=y, there are
>> errors like:
>> BUG: non-zero pgtables_bytes on freeing mm
>> and:
>> BUG: Bad rss-counter state
>> with TTM transparent huge-pages.
>> Until we've figured out what other TTM drivers do differently 
>> compared to
>> vmwgfx, disable the huge_fault() callback, eliminating transhuge
>> page-table entries.
>>
>> Cc: Christian König <christian.koenig@amd.com>
>> Signed-off-by: Thomas Hellstrom (VMware) <thomas_os@shipmail.org>
>> Reported-by: Alex Xu (Hello71) <alex_y_xu@yahoo.ca>
>> Tested-by: Alex Xu (Hello71) <alex_y_xu@yahoo.ca>

Acked-by: Christian König <christian.koenig@amd.com>

>> ---
>
> Without being able to test and track this down on amdgpu there's 
> little more than this I can do at the moment. Hopefully I'll be able 
> to test on nouveau/ttm after getting back from vacation to see if I 
> can reproduce.
>
> It looks like some part of the kernel mistakes a huge page-table entry 
> for a page directory, and that would be a path that is not hit with 
> vmwgfx.

Well that looks like an ugly one and I don't know enough about the page 
table handling to hunt this down either.

BTW: Have you seen the coverity warning about  "WARN_ON_ONCE(ret = 
VM_FAULT_FALLBACK);"?

Regards,
Christian.

>
> /Thomas
>
>
Thomas Hellström (Intel) April 8, 2020, 1:49 p.m. UTC | #3
On 4/8/20 2:19 PM, Christian König wrote:
> Am 08.04.20 um 14:01 schrieb Thomas Hellström (VMware):
>> Hi, Christian,
>>
>> On 4/8/20 1:53 PM, Thomas Hellström (VMware) wrote:
>>> From: "Thomas Hellstrom (VMware)" <thomas_os@shipmail.org>
>>>
>>> With amdgpu and CONFIG_TRANSPARENT_HUGEPAGE_ALWAYS=y, there are
>>> errors like:
>>> BUG: non-zero pgtables_bytes on freeing mm
>>> and:
>>> BUG: Bad rss-counter state
>>> with TTM transparent huge-pages.
>>> Until we've figured out what other TTM drivers do differently 
>>> compared to
>>> vmwgfx, disable the huge_fault() callback, eliminating transhuge
>>> page-table entries.
>>>
>>> Cc: Christian König <christian.koenig@amd.com>
>>> Signed-off-by: Thomas Hellstrom (VMware) <thomas_os@shipmail.org>
>>> Reported-by: Alex Xu (Hello71) <alex_y_xu@yahoo.ca>
>>> Tested-by: Alex Xu (Hello71) <alex_y_xu@yahoo.ca>
>
> Acked-by: Christian König <christian.koenig@amd.com>
>
>>> ---
>>
>> Without being able to test and track this down on amdgpu there's 
>> little more than this I can do at the moment. Hopefully I'll be able 
>> to test on nouveau/ttm after getting back from vacation to see if I 
>> can reproduce.
>>
>> It looks like some part of the kernel mistakes a huge page-table 
>> entry for a page directory, and that would be a path that is not hit 
>> with vmwgfx.
>
> Well that looks like an ugly one and I don't know enough about the 
> page table handling to hunt this down either.
>
> BTW: Have you seen the coverity warning about  "WARN_ON_ONCE(ret = 
> VM_FAULT_FALLBACK);"?

Yes, that's a false warning but it might be that it should be rewritten 
for clarity like so:

ret = VM_FAULT_FALLBACK;
WARN_ON_ONCE(true);

/Thomas



>
> Regards,
> Christian.
>
>>
>> /Thomas
>>
>>
Christian König April 8, 2020, 2:04 p.m. UTC | #4
Am 08.04.20 um 15:49 schrieb Thomas Hellström (VMware):
> On 4/8/20 2:19 PM, Christian König wrote:
>> Am 08.04.20 um 14:01 schrieb Thomas Hellström (VMware):
>>> Hi, Christian,
>>>
>>> On 4/8/20 1:53 PM, Thomas Hellström (VMware) wrote:
>>>> From: "Thomas Hellstrom (VMware)" <thomas_os@shipmail.org>
>>>>
>>>> With amdgpu and CONFIG_TRANSPARENT_HUGEPAGE_ALWAYS=y, there are
>>>> errors like:
>>>> BUG: non-zero pgtables_bytes on freeing mm
>>>> and:
>>>> BUG: Bad rss-counter state
>>>> with TTM transparent huge-pages.
>>>> Until we've figured out what other TTM drivers do differently 
>>>> compared to
>>>> vmwgfx, disable the huge_fault() callback, eliminating transhuge
>>>> page-table entries.
>>>>
>>>> Cc: Christian König <christian.koenig@amd.com>
>>>> Signed-off-by: Thomas Hellstrom (VMware) <thomas_os@shipmail.org>
>>>> Reported-by: Alex Xu (Hello71) <alex_y_xu@yahoo.ca>
>>>> Tested-by: Alex Xu (Hello71) <alex_y_xu@yahoo.ca>
>>
>> Acked-by: Christian König <christian.koenig@amd.com>
>>
>>>> ---
>>>
>>> Without being able to test and track this down on amdgpu there's 
>>> little more than this I can do at the moment. Hopefully I'll be able 
>>> to test on nouveau/ttm after getting back from vacation to see if I 
>>> can reproduce.
>>>
>>> It looks like some part of the kernel mistakes a huge page-table 
>>> entry for a page directory, and that would be a path that is not hit 
>>> with vmwgfx.
>>
>> Well that looks like an ugly one and I don't know enough about the 
>> page table handling to hunt this down either.
>>
>> BTW: Have you seen the coverity warning about  "WARN_ON_ONCE(ret = 
>> VM_FAULT_FALLBACK);"?
>
> Yes, that's a false warning but it might be that it should be 
> rewritten for clarity like so:
>
> ret = VM_FAULT_FALLBACK;
> WARN_ON_ONCE(true);

Using an additional () also usually works, but I'm not even sure if 
WARN_ON_ONCE() isn't compiled to a no-op under some circumstances.

So better save than sorry,
Christian.

>
> /Thomas
>
>
>
>>
>> Regards,
>> Christian.
>>
>>>
>>> /Thomas
>>>
>>>
>
diff mbox series

Patch

diff --git a/drivers/gpu/drm/ttm/ttm_bo_vm.c b/drivers/gpu/drm/ttm/ttm_bo_vm.c
index 6ee3b96f0d13..0ad30b112982 100644
--- a/drivers/gpu/drm/ttm/ttm_bo_vm.c
+++ b/drivers/gpu/drm/ttm/ttm_bo_vm.c
@@ -442,66 +442,6 @@  vm_fault_t ttm_bo_vm_fault(struct vm_fault *vmf)
 }
 EXPORT_SYMBOL(ttm_bo_vm_fault);
 
-#ifdef CONFIG_TRANSPARENT_HUGEPAGE
-/**
- * ttm_pgprot_is_wrprotecting - Is a page protection value write-protecting?
- * @prot: The page protection value
- *
- * Return: true if @prot is write-protecting. false otherwise.
- */
-static bool ttm_pgprot_is_wrprotecting(pgprot_t prot)
-{
-	/*
-	 * This is meant to say "pgprot_wrprotect(prot) == prot" in a generic
-	 * way. Unfortunately there is no generic pgprot_wrprotect.
-	 */
-	return pte_val(pte_wrprotect(__pte(pgprot_val(prot)))) ==
-		pgprot_val(prot);
-}
-
-static vm_fault_t ttm_bo_vm_huge_fault(struct vm_fault *vmf,
-				       enum page_entry_size pe_size)
-{
-	struct vm_area_struct *vma = vmf->vma;
-	pgprot_t prot;
-	struct ttm_buffer_object *bo = vma->vm_private_data;
-	vm_fault_t ret;
-	pgoff_t fault_page_size = 0;
-	bool write = vmf->flags & FAULT_FLAG_WRITE;
-
-	switch (pe_size) {
-	case PE_SIZE_PMD:
-		fault_page_size = HPAGE_PMD_SIZE >> PAGE_SHIFT;
-		break;
-#ifdef CONFIG_HAVE_ARCH_TRANSPARENT_HUGEPAGE_PUD
-	case PE_SIZE_PUD:
-		fault_page_size = HPAGE_PUD_SIZE >> PAGE_SHIFT;
-		break;
-#endif
-	default:
-		WARN_ON_ONCE(1);
-		return VM_FAULT_FALLBACK;
-	}
-
-	/* Fallback on write dirty-tracking or COW */
-	if (write && ttm_pgprot_is_wrprotecting(vma->vm_page_prot))
-		return VM_FAULT_FALLBACK;
-
-	ret = ttm_bo_vm_reserve(bo, vmf);
-	if (ret)
-		return ret;
-
-	prot = vm_get_page_prot(vma->vm_flags);
-	ret = ttm_bo_vm_fault_reserved(vmf, prot, 1, fault_page_size);
-	if (ret == VM_FAULT_RETRY && !(vmf->flags & FAULT_FLAG_RETRY_NOWAIT))
-		return ret;
-
-	dma_resv_unlock(bo->base.resv);
-
-	return ret;
-}
-#endif
-
 void ttm_bo_vm_open(struct vm_area_struct *vma)
 {
 	struct ttm_buffer_object *bo = vma->vm_private_data;
@@ -604,9 +544,6 @@  static const struct vm_operations_struct ttm_bo_vm_ops = {
 	.open = ttm_bo_vm_open,
 	.close = ttm_bo_vm_close,
 	.access = ttm_bo_vm_access,
-#ifdef CONFIG_TRANSPARENT_HUGEPAGE
-	.huge_fault = ttm_bo_vm_huge_fault,
-#endif
 };
 
 static struct ttm_buffer_object *ttm_bo_vm_lookup(struct ttm_bo_device *bdev,