diff mbox

drm/i915: Before pageflip, also wait for shared dmabuf fences.

Message ID 7eb19a73-a558-d2e6-bd8d-34fe95045dfd@gmail.com (mailing list archive)
State New, archived
Headers show

Commit Message

Mario Kleiner Oct. 28, 2016, 5:37 p.m. UTC
On 10/28/2016 03:34 AM, Michel Dänzer wrote:
> On 27/10/16 10:33 PM, Mike Lothian wrote:
>>
>> Just another gentle ping to see where you are with this?
>
> I haven't got a chance to look into this any further.
>
>

Fwiw., as a proof of concept, the attached experimental patch does work 
as tested on Intel HD Haswell + AMD R9 380 Tonga under amdgpu and 
DRI3/Present when applied to drm-next (updated from a few days ago). 
With DRI_PRIME=1 tearing for page-flipped fullscreen windows is gone 
under all loads. The tearing with "windowed" windows now looks as 
expected for regular tearing not related to Prime.

ftrace confirms the i915 driver's pageflip function is waiting on the 
fence in reservation_object_wait_timeout_rcu() as it should.

That entry->tv.shared needs to be set false for such buffers in 
amdgpu_bo_list_set() makes sense to me, as that is part of the buffer 
validation for command stream submission. There are other places in the 
driver where tv.shared is set, which i didn't check so far.

I don't know which of these would need to be updated with a "exported 
bo" check as well, e.g., for video decoding or maybe gpu compute? Adding 
or removing the check to amdgpu_gem_va_update_vm(), e.g., made no 
difference. I assume that makes sense because that functions seems to 
deal with amdgpu internal vm page tables or page table entries for such 
a bo, not with something visible to external clients?

All i can say is it fixes 3D rendering under DRI3 + Prime + pageflipping 
without causing any obvious new problems.

-mario

Comments

Christian König Oct. 28, 2016, 5:48 p.m. UTC | #1
Am 28.10.2016 um 19:37 schrieb Mario Kleiner:
>
>
> On 10/28/2016 03:34 AM, Michel Dänzer wrote:
>> On 27/10/16 10:33 PM, Mike Lothian wrote:
>>>
>>> Just another gentle ping to see where you are with this?
>>
>> I haven't got a chance to look into this any further.
>>
>>
>
> Fwiw., as a proof of concept, the attached experimental patch does 
> work as tested on Intel HD Haswell + AMD R9 380 Tonga under amdgpu and 
> DRI3/Present when applied to drm-next (updated from a few days ago). 
> With DRI_PRIME=1 tearing for page-flipped fullscreen windows is gone 
> under all loads. The tearing with "windowed" windows now looks as 
> expected for regular tearing not related to Prime.

Yeah, that's pretty much what I had in mind as well. You additionally 
need to wait for the shared fences when you export the BO for the first 
time, but that's only a nitpick.

>
> ftrace confirms the i915 driver's pageflip function is waiting on the 
> fence in reservation_object_wait_timeout_rcu() as it should.
>
> That entry->tv.shared needs to be set false for such buffers in 
> amdgpu_bo_list_set() makes sense to me, as that is part of the buffer 
> validation for command stream submission. There are other places in 
> the driver where tv.shared is set, which i didn't check so far.
>
> I don't know which of these would need to be updated with a "exported 
> bo" check as well, e.g., for video decoding or maybe gpu compute? 
> Adding or removing the check to amdgpu_gem_va_update_vm(), e.g., made 
> no difference. I assume that makes sense because that functions seems 
> to deal with amdgpu internal vm page tables or page table entries for 
> such a bo, not with something visible to external clients?

Yes, exactly. VM updates doesn't matter for anybody else than amdgpu. 
Additional to that we don't even add a fence to the shared slot we 
reserve (could probably drop that for optimization).

Please remove the debugging stuff and the extra code on the VM updates 
and add a reservation_object_wait_timeout_rcu() to the export path and 
we should be good to go.

Regards,
Christian.

>
> All i can say is it fixes 3D rendering under DRI3 + Prime + 
> pageflipping without causing any obvious new problems.
>
> -mario
Michel Dänzer Oct. 31, 2016, 6:41 a.m. UTC | #2
On 29/10/16 02:37 AM, Mario Kleiner wrote:
> On 10/28/2016 03:34 AM, Michel Dänzer wrote:
>> On 27/10/16 10:33 PM, Mike Lothian wrote:
>>>
>>> Just another gentle ping to see where you are with this?
>>
>> I haven't got a chance to look into this any further.
> 
> Fwiw., as a proof of concept, the attached experimental patch does work
> as tested on Intel HD Haswell + AMD R9 380 Tonga under amdgpu and
> DRI3/Present when applied to drm-next (updated from a few days ago).
> With DRI_PRIME=1 tearing for page-flipped fullscreen windows is gone
> under all loads. The tearing with "windowed" windows now looks as
> expected for regular tearing not related to Prime.
> 
> ftrace confirms the i915 driver's pageflip function is waiting on the
> fence in reservation_object_wait_timeout_rcu() as it should.
> 
> That entry->tv.shared needs to be set false for such buffers in
> amdgpu_bo_list_set() makes sense to me, as that is part of the buffer
> validation for command stream submission. There are other places in the
> driver where tv.shared is set, which i didn't check so far.
> 
> I don't know which of these would need to be updated with a "exported
> bo" check as well, e.g., for video decoding or maybe gpu compute? Adding
> or removing the check to amdgpu_gem_va_update_vm(), e.g., made no
> difference. I assume that makes sense because that functions seems to
> deal with amdgpu internal vm page tables or page table entries for such
> a bo, not with something visible to external clients?
> 
> All i can say is it fixes 3D rendering under DRI3 + Prime + pageflipping
> without causing any obvious new problems.

[...]

> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_prime.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_prime.c
> index 7700dc2..bfbfeb9 100644
> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_prime.c
> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_prime.c
> @@ -121,5 +121,8 @@ struct dma_buf *amdgpu_gem_prime_export(struct drm_device *dev,
>  	if (amdgpu_ttm_tt_get_usermm(bo->tbo.ttm))
>  		return ERR_PTR(-EPERM);
>  
> +	bo->prime_exported = true;
> +	DRM_DEBUG_PRIME("Exporting prime bo %p\n", bo);
> +
>  	return drm_gem_prime_export(dev, gobj, flags);
>  }
> 

This will take effect in non-PRIME cases as well, at least DRI3 and
GL<->[other API] interop off the top of my head. Is that okay?
Mario Kleiner Nov. 5, 2016, 1:17 a.m. UTC | #3
On 10/28/2016 07:48 PM, Christian König wrote:
> Am 28.10.2016 um 19:37 schrieb Mario Kleiner:
>>
>>
>> On 10/28/2016 03:34 AM, Michel Dänzer wrote:
>>> On 27/10/16 10:33 PM, Mike Lothian wrote:
>>>>
>>>> Just another gentle ping to see where you are with this?
>>>
>>> I haven't got a chance to look into this any further.
>>>
>>>
>>
>> Fwiw., as a proof of concept, the attached experimental patch does
>> work as tested on Intel HD Haswell + AMD R9 380 Tonga under amdgpu and
>> DRI3/Present when applied to drm-next (updated from a few days ago).
>> With DRI_PRIME=1 tearing for page-flipped fullscreen windows is gone
>> under all loads. The tearing with "windowed" windows now looks as
>> expected for regular tearing not related to Prime.
>
> Yeah, that's pretty much what I had in mind as well. You additionally
> need to wait for the shared fences when you export the BO for the first
> time, but that's only a nitpick.
>
>>
>> ftrace confirms the i915 driver's pageflip function is waiting on the
>> fence in reservation_object_wait_timeout_rcu() as it should.
>>
>> That entry->tv.shared needs to be set false for such buffers in
>> amdgpu_bo_list_set() makes sense to me, as that is part of the buffer
>> validation for command stream submission. There are other places in
>> the driver where tv.shared is set, which i didn't check so far.
>>
>> I don't know which of these would need to be updated with a "exported
>> bo" check as well, e.g., for video decoding or maybe gpu compute?
>> Adding or removing the check to amdgpu_gem_va_update_vm(), e.g., made
>> no difference. I assume that makes sense because that functions seems
>> to deal with amdgpu internal vm page tables or page table entries for
>> such a bo, not with something visible to external clients?
>
> Yes, exactly. VM updates doesn't matter for anybody else than amdgpu.
> Additional to that we don't even add a fence to the shared slot we
> reserve (could probably drop that for optimization).
>
> Please remove the debugging stuff and the extra code on the VM updates
> and add a reservation_object_wait_timeout_rcu() to the export path and
> we should be good to go.
>

Done. Patch v2 just sent out after retesting with Tonga only and Intel + 
Tonga renderoffload. Would be cool if we could get this into 4.9-rc.

Ideally also backported to 4.8, given it is a simple change, as that 
would be the next official kernel for *buntu 16.04-LTS and derivatives, 
but that's probably breaking the rules as it doesn't fix a regression?

thanks,
-mario


> Regards,
> Christian.
>
>>
>> All i can say is it fixes 3D rendering under DRI3 + Prime +
>> pageflipping without causing any obvious new problems.
>>
>> -mario
>
>
diff mbox

Patch

From 2a8d7fcd36da30305fa675df311c697162792597 Mon Sep 17 00:00:00 2001
From: Mario Kleiner <mario.kleiner.de@gmail.com>
Date: Wed, 26 Oct 2016 10:58:00 +0200
Subject: [PATCH] drm/amdgpu: Attach exclusive fence to prime exported bo's.

External clients which import our bo's wait only
for exclusive dmabuf-fences, not on shared ones,
so attach fences on such exported buffers as
exclusive ones, not shared ones.

-> Backup commit. Work in progress.

Signed-off-by: Mario Kleiner <mario.kleiner.de@gmail.com>
---
 drivers/gpu/drm/amd/amdgpu/amdgpu.h         | 1 +
 drivers/gpu/drm/amd/amdgpu/amdgpu_bo_list.c | 5 ++++-
 drivers/gpu/drm/amd/amdgpu/amdgpu_gem.c     | 6 ++++++
 drivers/gpu/drm/amd/amdgpu/amdgpu_prime.c   | 3 +++
 4 files changed, 14 insertions(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu.h b/drivers/gpu/drm/amd/amdgpu/amdgpu.h
index 039b57e..a337d56 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu.h
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu.h
@@ -459,6 +459,7 @@  struct amdgpu_bo {
 	u64				metadata_flags;
 	void				*metadata;
 	u32				metadata_size;
+	bool				prime_exported;
 	/* list of all virtual address to which this bo
 	 * is associated to
 	 */
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_bo_list.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_bo_list.c
index 651115d..6e1d7b3 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_bo_list.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_bo_list.c
@@ -132,7 +132,10 @@  static int amdgpu_bo_list_set(struct amdgpu_device *adev,
 		entry->priority = min(info[i].bo_priority,
 				      AMDGPU_BO_LIST_MAX_PRIORITY);
 		entry->tv.bo = &entry->robj->tbo;
-		entry->tv.shared = true;
+		entry->tv.shared = !entry->robj->prime_exported;
+
+		if (entry->robj->prime_exported)
+		    DRM_DEBUG_PRIME("Exclusive fence for exported prime bo %p\n", entry->robj);
 
 		if (entry->robj->prefered_domains == AMDGPU_GEM_DOMAIN_GDS)
 			gds_obj = entry->robj;
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_gem.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_gem.c
index a7ea9a3..730a68e 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_gem.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_gem.c
@@ -494,6 +494,12 @@  static void amdgpu_gem_va_update_vm(struct amdgpu_device *adev,
 
 	tv.bo = &bo_va->bo->tbo;
 	tv.shared = true;
+
+	if (bo_va->bo->prime_exported) {
+	    DRM_DEBUG_PRIME("Update for exported prime bo %p\n", bo_va->bo);
+	    /* tv.shared = false; */
+	}
+
 	list_add(&tv.head, &list);
 
 	amdgpu_vm_get_pd_bo(bo_va->vm, &list, &vm_pd);
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_prime.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_prime.c
index 7700dc2..bfbfeb9 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_prime.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_prime.c
@@ -121,5 +121,8 @@  struct dma_buf *amdgpu_gem_prime_export(struct drm_device *dev,
 	if (amdgpu_ttm_tt_get_usermm(bo->tbo.ttm))
 		return ERR_PTR(-EPERM);
 
+	bo->prime_exported = true;
+	DRM_DEBUG_PRIME("Exporting prime bo %p\n", bo);
+
 	return drm_gem_prime_export(dev, gobj, flags);
 }
-- 
2.7.4