Message ID | 20211020161724.1.I67612ea073c3306c71b46a87be894f79707082df@changeid (mailing list archive) |
---|---|
State | New, archived |
Headers | show |
Series | drm/bridge: analogix_dp: Make PSR-disable non-blocking | expand |
On Wed, Oct 20, 2021 at 04:17:28PM -0700, Brian Norris wrote: > Prior to commit 6c836d965bad ("drm/rockchip: Use the helpers for PSR"), > "PSR disable" used non-blocking analogix_dp_send_psr_spd(). The refactor > accidentally (?) set blocking=true. IIRC this wasn't accidental. The reason it became synchronous was: - To avoid racing a subsequent PSR entry (if exit takes a long time) - To avoid racing disable/modeset - We're not displaying new content while exiting PSR anyways, so there is minimal utility in allowing frames to be submitted - We're lying to userspace telling them frames are on the screen when we're just dropping them on the floor The actual latency gains from doing this synchronously are minimal since the panel will display new content as soon as it can regardless of whether the kernel is blocking. There is likely a perceptual difference, but that's only because kernel is lying to userspace and skipping frames without consent. Going back to the first line, it's entirely possible my memory is failing and this was accidental! Sean > > This can cause upwards of 60-100ms of unneeded latency when exiting > self-refresh, which can cause very noticeable lag when, say, moving a > cursor. > > Presumbaly it's OK to let the display finish exiting refresh in parallel > with clocking out the next video frames, so we shouldn't hold up the > atomic_enable() step. This also brings behavior in line with the > downstream ("mainline-derived") variant of the driver currently deployed > to Chrome OS Rockchip systems. > > Tested on a Samsung Chromebook Plus (i.e., Rockchip RK3399 Gru Kevin). > > Fixes: 6c836d965bad ("drm/rockchip: Use the helpers for PSR") > Cc: <stable@vger.kernel.org> > Cc: Zain Wang <wzz@rock-chips.com> > Cc: Tomasz Figa <tfiga@chromium.org> > Cc: Heiko Stuebner <heiko@sntech.de> > Cc: Sean Paul <seanpaul@chromium.org> > Signed-off-by: Brian Norris <briannorris@chromium.org> > --- > CC list is partially constructed from the commit message of the Fixed > commit > > drivers/gpu/drm/bridge/analogix/analogix_dp_core.c | 2 +- > 1 file changed, 1 insertion(+), 1 deletion(-) > > diff --git a/drivers/gpu/drm/bridge/analogix/analogix_dp_core.c b/drivers/gpu/drm/bridge/analogix/analogix_dp_core.c > index b7d2e4449cfa..fbe6eb9df310 100644 > --- a/drivers/gpu/drm/bridge/analogix/analogix_dp_core.c > +++ b/drivers/gpu/drm/bridge/analogix/analogix_dp_core.c > @@ -1055,7 +1055,7 @@ static int analogix_dp_disable_psr(struct analogix_dp_device *dp) > psr_vsc.db[0] = 0; > psr_vsc.db[1] = 0; > > - return analogix_dp_send_psr_spd(dp, &psr_vsc, true); > + return analogix_dp_send_psr_spd(dp, &psr_vsc, false); > } > > /* > -- > 2.33.0.1079.g6e70778dc9-goog >
On Wed, Oct 20, 2021 at 5:40 PM Sean Paul <sean@poorly.run> wrote: > On Wed, Oct 20, 2021 at 04:17:28PM -0700, Brian Norris wrote: > > Prior to commit 6c836d965bad ("drm/rockchip: Use the helpers for PSR"), > > "PSR disable" used non-blocking analogix_dp_send_psr_spd(). The refactor > > accidentally (?) set blocking=true. > > IIRC this wasn't accidental. > > The reason it became synchronous was: > - To avoid racing a subsequent PSR entry (if exit takes a long time) How did this work pre-commit-6c836d965bad then? I don't see any provision for avoiding subsequent PSR entry. Or I guess that was implicitly covered by PSR_FLUSH_TIMEOUT_MS, which means we allowed at least 100ms between exit/entry each time, which was good enough? And in the "new" implementation, that turned into a running average that gets measured on each commit? So we're no longer guaranteed 100ms, and it's even worse if we cheat the timing measurement? I'm still not sure that "race" is truly a problem without consulting some kind of hardware documentation though. It wouldn't surprise me if these things are cancelable. > - To avoid racing disable/modeset > - We're not displaying new content while exiting PSR anyways, so there is > minimal utility in allowing frames to be submitted > - We're lying to userspace telling them frames are on the screen when we're > just dropping them on the floor > > The actual latency gains from doing this synchronously are minimal since the > panel will display new content as soon as it can regardless of whether the > kernel is blocking. There is likely a perceptual difference, but that's only > because kernel is lying to userspace and skipping frames without consent. Hmm, you might well be right about some of the first points (I'm still learning the DRM framework), but I'm a bit skeptical that the perceptual difference is "only" because we're cheating in some way. I'm not doing science here, and it's certainly not a blinded test, but I'm nearly certain this patch cuts out approx 50-80% of the cursor lag I see without this patch (relative to the current Chrome OS kernel). I don't see how cheating would produce a smoother cursor movement -- we'd still be dropping frames, and the movement would appear jumpy somewhere along the way. In any case, I'm absolutely certain that mainline Linux performs much much worse with PSR than the current CrOS kernel, but there are some other potential reasons for that, such as the lack of an input-notifier [1]. > Going back to the first line, it's entirely possible my memory is failing > and this was accidental! Well either way, thanks for the notes. I'll see if I can get anywhere on proving/disproving that they are relevant, or if they can be worked around some other way; or perhaps I can regain the lost performance some other way. It'll be a few days before I get around to that. Brian [1] This got locked up in "controversy": https://patchwork.kernel.org/project/linux-arm-kernel/patch/20180405095000.9756-25-enric.balletbo@collabora.com/
(Dropping Andrzej, because that address keeps bouncing. Does MAINTAINERS and/or .mailmap need updating?) Apologies for the double reply here, but I forgot to mention one last thing for now: On Wed, Oct 20, 2021 at 5:40 PM Sean Paul <sean@poorly.run> wrote: > > On Wed, Oct 20, 2021 at 04:17:28PM -0700, Brian Norris wrote: > > Prior to commit 6c836d965bad ("drm/rockchip: Use the helpers for PSR"), > > "PSR disable" used non-blocking analogix_dp_send_psr_spd(). The refactor > > accidentally (?) set blocking=true. > > IIRC this wasn't accidental. One other tip that made me think it was accidental was that today, the |blocking| argument to analogix_dp_send_psr_spd() is always true. If non-blocking support was intentionally dropped, it seemed like you should have dropped the non-blocking code too. But that's a weak proof of your intentions :) Brian
On Wed, Oct 20, 2021 at 06:23:35PM -0700, Brian Norris wrote: > On Wed, Oct 20, 2021 at 5:40 PM Sean Paul <sean@poorly.run> wrote: > > The actual latency gains from doing this synchronously are minimal since the > > panel will display new content as soon as it can regardless of whether the > > kernel is blocking. There is likely a perceptual difference, but that's only > > because kernel is lying to userspace and skipping frames without consent. > > Hmm, you might well be right about some of the first points (I'm still > learning the DRM framework), but I'm a bit skeptical that the > perceptual difference is "only" because we're cheating in some way. > I'm not doing science here, and it's certainly not a blinded test, but > I'm nearly certain this patch cuts out approx 50-80% of the cursor lag > I see without this patch (relative to the current Chrome OS kernel). I > don't see how cheating would produce a smoother cursor movement -- > we'd still be dropping frames, and the movement would appear jumpy > somewhere along the way. Aha, so I think I found {a,the} reason for some disagreement here: looking at the eDP PSR spec, I see that while the current implementation is looking for psr_state==DP_PSR_SINK_INACTIVE to signal PSR-exit completion, the spec shows an intermediate state (DP_PSR_SINK_ACTIVE_RESYNC == 4), where among other things, "the Sink device must display the incoming active frames from the Source device with no visible glitches and/or artifacts." And it happens that we move to DP_PSR_SINK_ACTIVE_RESYNC somewhat quickly (on the order of 20-40ms), while the move to DP_PSR_SINK_INACTIVE is a good chunk longer (approx 60ms more). So pre-commit-6c836d965bad might have been cheating a little (we'd claim we're "done" about 20-40ms too early), but post-commit-6c836d965bad we're waiting about 60ms too long. I'll send v2 to make this block for DP_PSR_SINK_ACTIVE_RESYNC || DP_PSR_SINK_INACTIVE, which gets much or all of the same latency win, and I'll try to document the reasons, etc., better. I'll probably also include a patch to drop the 'blocking' parameter, since it's unused, and gives the wrong idea about this state machine. > In any case, I'm absolutely certain that mainline Linux performs much > much worse with PSR than the current CrOS kernel, but there are some > other potential reasons for that, such as the lack of an > input-notifier [1]. ... > [1] This got locked up in "controversy": > https://patchwork.kernel.org/project/linux-arm-kernel/patch/20180405095000.9756-25-enric.balletbo@collabora.com/ While I'm here: I also played with this a bit, and I still haven't gotten all the details right, but I don't believe this alone will get the latency wins we'd like. We still need something like the above. Brian
diff --git a/drivers/gpu/drm/bridge/analogix/analogix_dp_core.c b/drivers/gpu/drm/bridge/analogix/analogix_dp_core.c index b7d2e4449cfa..fbe6eb9df310 100644 --- a/drivers/gpu/drm/bridge/analogix/analogix_dp_core.c +++ b/drivers/gpu/drm/bridge/analogix/analogix_dp_core.c @@ -1055,7 +1055,7 @@ static int analogix_dp_disable_psr(struct analogix_dp_device *dp) psr_vsc.db[0] = 0; psr_vsc.db[1] = 0; - return analogix_dp_send_psr_spd(dp, &psr_vsc, true); + return analogix_dp_send_psr_spd(dp, &psr_vsc, false); } /*
Prior to commit 6c836d965bad ("drm/rockchip: Use the helpers for PSR"), "PSR disable" used non-blocking analogix_dp_send_psr_spd(). The refactor accidentally (?) set blocking=true. This can cause upwards of 60-100ms of unneeded latency when exiting self-refresh, which can cause very noticeable lag when, say, moving a cursor. Presumbaly it's OK to let the display finish exiting refresh in parallel with clocking out the next video frames, so we shouldn't hold up the atomic_enable() step. This also brings behavior in line with the downstream ("mainline-derived") variant of the driver currently deployed to Chrome OS Rockchip systems. Tested on a Samsung Chromebook Plus (i.e., Rockchip RK3399 Gru Kevin). Fixes: 6c836d965bad ("drm/rockchip: Use the helpers for PSR") Cc: <stable@vger.kernel.org> Cc: Zain Wang <wzz@rock-chips.com> Cc: Tomasz Figa <tfiga@chromium.org> Cc: Heiko Stuebner <heiko@sntech.de> Cc: Sean Paul <seanpaul@chromium.org> Signed-off-by: Brian Norris <briannorris@chromium.org> --- CC list is partially constructed from the commit message of the Fixed commit drivers/gpu/drm/bridge/analogix/analogix_dp_core.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-)