[1/2] drm/amd/display: Fix race between vblank irq and pageflip irq.

Message ID 1492791786-24543-2-git-send-email-mario.kleiner.de@gmail.com (mailing list archive)
State New, archived

Commit Message

Mario Kleiner April 21, 2017, 4:23 p.m. UTC
Since DC now uses CRTC_VERTICAL_INTERRUPT0 as VBLANK irq trigger
and vblank interrupts actually happen earliest at start of vblank,
instead of a bit before vblank, we no longer need some of the
fudging logic to deal with too early vblank irq handling (grep for
lb_vblank_lead_lines). This itself fixes a pageflip scheduling
bug in DC, caused by uninitialized use of lb_vblank_lead_lines,
with a wrong startup value of 0. Thanks to the new vblank irq
trigger this value of zero is now actually correct for DC :).

A new problem is that vblank irqs race against pflip irqs,
and as both can fire at the first line of vblank, it is no longer
guaranteed that vblank irq handling (therefore -> drm_handle_vblank()
-> drm_update_vblank_count()) executes before pflip irq handling
for a given vblank interval when a pageflip completes. Therefore
the vblank count and timestamps emitted to user-space as part of
the pageflip completion event will often be stale and cause new
timestamping and swap scheduling errors in user-space.

This was observed frequently on R9 380 Tonga Pro.

Fix this by enforcing a vblank count+timestamp update right
before emitting the pageflip completion event from the pflip
irq handler. The logic in core drm_update_vblank_count() makes
sure that no redundant or conflicting updates happen, i.e. the
call turns into a no-op if it wasn't needed for that vblank,
at the cost of a few microseconds of cpu time.

Successfully tested on AMD R9 380 "Tonga Pro" (VI/DCE 10)
with DC enabled on the current DC staging branch. Independent
measurement of pageflip completion timing with special hardware
measurement equipment now confirms correct pageflip timestamps
and counts in the pageflip completion events.

Note that there is another unresolved pageflip bug present in current
dc staging, which causes pageflips to complete one vblank too early
when the pageflip ioctl gets called while in vblank. Something seems
to be amiss in the way amdgpu_dm_do_flip() handles 'target_vblank',
or how amdgpu_dm_atomic_commit_tail() computes 'target' for calling
amdgpu_dm_do_flip().

Signed-off-by: Mario Kleiner <mario.kleiner.de@gmail.com>
Cc: Harry Wentland <Harry.Wentland@amd.com>
Cc: Andrey Grodzovsky <Andrey.Grodzovsky@amd.com>
Cc: Alex Deucher <alexander.deucher@amd.com>
Cc: Michel Dänzer <michel.daenzer@amd.com>
---
 drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c | 3 +++
 1 file changed, 3 insertions(+)

Comments

Michel Dänzer April 24, 2017, 7:17 a.m. UTC | #1
On 22/04/17 01:23 AM, Mario Kleiner wrote:
> Since DC now uses CRTC_VERTICAL_INTERRUPT0 as VBLANK irq trigger
> and vblank interrupts actually happen earliest at start of vblank,
> instead of a bit before vblank, we no longer need some of the
> fudging logic to deal with too early vblank irq handling (grep for
> lb_vblank_lead_lines). This itself fixes a pageflip scheduling
> bug in DC, caused by uninitialized  use of lb_vblank_lead_lines,
> with a wrong startup value of 0. Thanks to the new vblank irq
> trigger this value of zero is now actually correct for DC :).
> 
> A new problem is that vblank irq's race against pflip irq's,
> and as both can fire at first line of vblank, it is no longer
> guaranteed that vblank irq handling (therefore -> drm_handle_vblank()
> -> drm_update_vblank_count()) executes before pflip irq handling
> for a given vblank interval when a pageflip completes. Therefore
> the vblank count and timestamps emitted to user-space as part of
> the pageflip completion event will be often stale and cause new
> timestamping and swap scheduling errors in user-space.
> 
> This was observed with large frequency on R9 380 Tonga Pro.
> 
> Fix this by enforcing a vblank count+timestamp update right
> before emitting the pageflip completion event from the pflip
> irq handler. The logic in core drm_update_vblank_count() makes
> sure that no redundant or conflicting updates happen, iow. the
> call turns into a no-op if it wasn't needed for that vblank,
> burning a few microseconds of cpu time though.
> 
> Successfully tested on AMD R9 380 "Tonga Pro" (VI/DCE 10)
> with DC enabled on the current DC staging branch. Independent
> measurement of pageflip completion timing with special hardware
> measurement equipment now confirms correct pageflip timestamps
> and counts in the pageflip completion events.
> 
> Note that there is another unresolved pageflip bug present in current
> dc staging, which causes pageflips to complete one vblank too early
> when the pageflip ioctl gets called while in vblank. Something seems
> to be amiss in the way amdgpu_dm_do_flip() handles 'target_vblank',
> or how amdgpu_dm_atomic_commit_tail() computes 'target' for calling
> amdgpu_dm_do_flip().

If the last paragraph refers to the problem fixed by patch 2, I'd drop
that paragraph. With that,

Reviewed-by: Michel Dänzer <michel.daenzer@amd.com>

Patch

diff --git a/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c b/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c
index 794362e..0d77b0a 100644
--- a/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c
+++ b/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c
@@ -208,6 +208,9 @@  static void dm_pflip_high_irq(void *interrupt_params)
 	if (amdgpu_crtc->event
 			&& amdgpu_crtc->event->event.base.type
 			== DRM_EVENT_FLIP_COMPLETE) {
+		/* Update to correct count/ts if racing with vblank irq */
+		drm_accurate_vblank_count(&amdgpu_crtc->base);
+
 		drm_crtc_send_vblank_event(&amdgpu_crtc->base, amdgpu_crtc->event);
 		/* page flip completed. clean up */
 		amdgpu_crtc->event = NULL;
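
To illustrate the race described in the commit message, here is a minimal user-space sketch in C. It is a toy model only: struct model_crtc and all model_* functions are hypothetical names invented for this illustration, and the counter logic is a simplification, not the actual implementation of drm_update_vblank_count(). It assumes a hardware frame counter that increments at the start of vblank and a software count that is only brought up to date when an update function runs.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical model of one CRTC; not the kernel's data structures. */
struct model_crtc {
	uint32_t hw_frame; /* frame counter as the hardware reports it */
	uint32_t last_hw;  /* hardware counter value at the last update */
	uint64_t sw_count; /* software vblank count reported to user space */
};

/*
 * Bring the software count up to date with the hardware counter.
 * Returns the number of vblanks added, which is 0 if the count was
 * already current, so a second call in the same vblank changes nothing.
 */
static uint32_t model_update_vblank_count(struct model_crtc *c)
{
	uint32_t diff;

	assert(c);
	diff = c->hw_frame - c->last_hw; /* unsigned, tolerates wraparound */
	c->last_hw = c->hw_frame;
	c->sw_count += diff;
	return diff;
}

/* The hardware enters a new vblank interval. */
static void model_hw_enter_vblank(struct model_crtc *c)
{
	c->hw_frame++;
}

/* Stand-in for the vblank irq handler: it updates the count. */
static void model_vblank_irq(struct model_crtc *c)
{
	model_update_vblank_count(c);
}

/*
 * Stand-in for the pflip irq handler. Returns the count that would be
 * placed in the pageflip completion event. With force_update set, it
 * refreshes the count first, as the patch does.
 */
static uint64_t model_pflip_irq(struct model_crtc *c, bool force_update)
{
	if (force_update)
		model_update_vblank_count(c);
	return c->sw_count;
}
```

In this model, if model_pflip_irq() runs before model_vblank_irq() in the same vblank without the forced update, the event carries the previous count. With the forced update the event carries the current count, and the later model_vblank_irq() call adds nothing, which mirrors the no-op behaviour the commit message relies on.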