sna: avoid negative timeouts

Message ID	1380781769-7805-1-git-send-email-felipe.contreras@gmail.com (mailing list archive)
State	New, archived
Headers	show Return-Path: <intel-gfx-bounces+patchwork-intel-gfx=patchwork.kernel.org@lists.freedesktop.org> From: Felipe Contreras <felipe.contreras@gmail.com> To: xorg-devel@lists.x.org Date: Thu, 3 Oct 2013 01:29:29 -0500 Message-Id: <1380781769-7805-1-git-send-email-felipe.contreras@gmail.com> Cc: intel-gfx@lists.freedesktop.org, Felipe Contreras <felipe.contreras@gmail.com> Subject: [Intel-gfx] [PATCH] sna: avoid negative timeouts Precedence: list MIME-Version: 1.0 Content-Type: text/plain; charset="us-ascii" Content-Transfer-Encoding: 7bit Sender: intel-gfx-bounces+patchwork-intel-gfx=patchwork.kernel.org@lists.freedesktop.org Errors-To: intel-gfx-bounces+patchwork-intel-gfx=patchwork.kernel.org@lists.freedesktop.org

Felipe Contreras Oct. 3, 2013, 6:29 a.m. UTC

It's nice to avoid X server crashes (by not passing negative values to
select(3)).

For more information:
http://article.gmane.org/gmane.comp.freedesktop.xorg.devel/37388

Signed-off-by: Felipe Contreras <felipe.contreras@gmail.com>
---
 src/sna/sna_accel.c | 3 +++
 1 file changed, 3 insertions(+)

Felipe Contreras Oct. 3, 2013, 8:06 a.m. UTC | #1

On Thu, Oct 3, 2013 at 2:36 AM, Chris Wilson <chris@chris-wilson.co.uk> wrote:
> On Thu, Oct 03, 2013 at 01:29:29AM -0500, Felipe Contreras wrote:
>> It's nice to avoid X server crashes (by not passing negative values to
>> select(3)).
>>
>> For more information:
>> http://article.gmane.org/gmane.comp.freedesktop.xorg.devel/37388
>>
>> Signed-off-by: Felipe Contreras <felipe.contreras@gmail.com>
>
> Thanks for the patch, I pushed a slightly different version to consume
> the expired flush timeout immediately.
>
> However, in order for this to happen something within the BlockHandler
> must have run for at least 4ms. Which is itself very worrying - the two
> candidates are the throttle or lock contention. (The throttle is also
> prone to being influenced by a third party.) Such stalls are likely to
> be noticeable as jitter or judder, so if you can spot their source
> hopefully we can tackle the root cause.

What exactly do you mean by third party? I wouldn't notice if there
was jitter because it's a loading screen, and there's absolutely no
updates on the screen; it's static.

Do you mean you would like me to add debugging in
sna_accel_block_handler() to figure exactly which block is taking too
long to complete?

Felipe Contreras Oct. 3, 2013, 9:09 a.m. UTC | #2

On Thu, Oct 3, 2013 at 3:17 AM, Chris Wilson <chris@chris-wilson.co.uk> wrote:
> On Thu, Oct 03, 2013 at 03:06:03AM -0500, Felipe Contreras wrote:
>> On Thu, Oct 3, 2013 at 2:36 AM, Chris Wilson <chris@chris-wilson.co.uk> wrote:
>> > On Thu, Oct 03, 2013 at 01:29:29AM -0500, Felipe Contreras wrote:
>> >> It's nice to avoid X server crashes (by not passing negative values to
>> >> select(3)).
>> >>
>> >> For more information:
>> >> http://article.gmane.org/gmane.comp.freedesktop.xorg.devel/37388
>> >>
>> >> Signed-off-by: Felipe Contreras <felipe.contreras@gmail.com>
>> >
>> > Thanks for the patch, I pushed a slightly different version to consume
>> > the expired flush timeout immediately.
>> >
>> > However, in order for this to happen something within the BlockHandler
>> > must have run for at least 4ms. Which is itself very worrying - the two
>> > candidates are the throttle or lock contention. (The throttle is also
>> > prone to being influenced by a third party.) Such stalls are likely to
>> > be noticeable as jitter or judder, so if you can spot their source
>> > hopefully we can tackle the root cause.
>>
>> What exactly do you mean by third party? I wouldn't notice if there
>> was jitter because it's a loading screen, and there's absolutely no
>> updates on the screen; it's static.
>
> Other users of /dev/dri/card0. The most likely cause for the delay here
> is waiting for the GPU to complete an operation or for waiting for
> access to the GPU.
>
>> Do you mean you would like me to add debugging in
>> sna_accel_block_handler() to figure exactly which block is taking too
>> long to complete?
>
> Right, if you do something like:
>
> diff --git a/src/sna/sna_accel.c b/src/sna/sna_accel.c
> index 21cd342..31ea1a3 100644
> @@ -15931,6 +15931,8 @@ void sna_accel_close(struct sna *sna)
>
>  void sna_accel_block_handler(struct sna *sna, struct timeval **tv)
>  {
> +       uint32_t duration[NUM_TIMERS] = { 0 };
> +
>         sigtrap_assert();
>
>         if (sna->kgem.need_retire)
> @@ -15947,19 +15949,28 @@ void sna_accel_block_handler(struct sna *sna, struct timeval **tv)
>         }
>
>  restart:
> -       if (sna_accel_do_flush(sna))
> +       if (sna_accel_do_flush(sna)) {
> +               duration[FLUSH_TIMER] = TIME;
>                 sna_accel_flush(sna);
> +               duration[FLUSH_TIMER] = TIME - duration[FLUSH_TIMER];
> +       }
>         assert(sna_accel_scanout(sna) == NULL ||
>                sna_accel_scanout(sna)->gpu_bo->exec == NULL ||
>                sna->timer_active & (1<<(FLUSH_TIMER)));
>
> -       if (sna_accel_do_throttle(sna))
> +       if (sna_accel_do_throttle(sna)) {
> +               duration[THROTTLE_TIMER] = TIME;
>                 sna_accel_throttle(sna);
> +               duration[THROTTLE_TIMER] = TIME - duration[THROTTLE_TIMER];
> +       }
>         assert(!sna->kgem.need_retire ||
>                sna->timer_active & (1<<(THROTTLE_TIMER)));
>
> -       if (sna_accel_do_expire(sna))
> +       if (sna_accel_do_expire(sna)) {
> +               duration[EXPIRE_TIMER] = TIME;
>                 sna_accel_expire(sna);
> +               duration[EXPIRE_TIMER] = TIME - duration[EXPIRE_TIMER];
> +       }
>         assert(!sna->kgem.need_expire ||
>                sna->timer_active & (1<<(EXPIRE_TIMER)));
>
> @@ -15981,8 +15992,14 @@ restart:
>                 timeout = sna->timer_expire[FLUSH_TIMER] - TIME;
>                 DBG(("%s: flush timer expires in %d [%d]\n",
>                      __FUNCTION__, timeout, sna->timer_expire[FLUSH_TIMER]));
> -               if (timeout < 3)
> +               if (timeout < 3) {
> +                       ErrorF("restarting timeout[%d] (%d, %d, %d)\n",
> +                              timeout,
> +                              duration[FLUSH_TIMER],
> +                              duration[THROTTLE_TIMER],
> +                              duration[EXPIRE_TIMER]);
>                         goto restart;
> +               }
>
>                 if (*tv == NULL) {
>                         *tv = &sna->timer_tv;
>
> That will hopefully catch which path is consuming too much time

This is what I got:

restarting timeout[2] (14, 0, 0)
restarting timeout[0] (16, 0, 0)
restarting timeout[0] (16, 0, 0)
restarting timeout[2] (14, 0, 0)
restarting timeout[1] (15, 0, 0)
restarting timeout[0] (16, 0, 0)
restarting timeout[0] (16, 0, 0)
restarting timeout[1] (15, 0, 0)
restarting timeout[1] (15, 0, 0)
restarting timeout[-1] (17, 0, 0)

Felipe Contreras Oct. 3, 2013, 3:56 p.m. UTC | #3

On Thu, Oct 3, 2013 at 4:15 AM, Chris Wilson <chris@chris-wilson.co.uk> wrote:
> On Thu, Oct 03, 2013 at 04:09:32AM -0500, Felipe Contreras wrote:
>> On Thu, Oct 3, 2013 at 3:17 AM, Chris Wilson <chris@chris-wilson.co.uk> wrote:
>> > That will hopefully catch which path is consuming too much time
>>
>> This is what I got:
>>
>> restarting timeout[2] (14, 0, 0)
>> restarting timeout[0] (16, 0, 0)
>> restarting timeout[0] (16, 0, 0)
>> restarting timeout[2] (14, 0, 0)
>> restarting timeout[1] (15, 0, 0)
>> restarting timeout[0] (16, 0, 0)
>> restarting timeout[0] (16, 0, 0)
>> restarting timeout[1] (15, 0, 0)
>> restarting timeout[1] (15, 0, 0)
>> restarting timeout[-1] (17, 0, 0)
>
> Suspiciously like we block for an entire vrefresh to submit one batch.
> Hmm, do you mind sending me your complete Xorg.0.log?

Here you go.

BTW, the game is free to play, you just need Steam.

Felipe Contreras Oct. 3, 2013, 4:19 p.m. UTC | #4

On Thu, Oct 3, 2013 at 11:01 AM, Chris Wilson <chris@chris-wilson.co.uk> wrote:
> On Thu, Oct 03, 2013 at 10:56:43AM -0500, Felipe Contreras wrote:
>> On Thu, Oct 3, 2013 at 4:15 AM, Chris Wilson <chris@chris-wilson.co.uk> wrote:
>> > On Thu, Oct 03, 2013 at 04:09:32AM -0500, Felipe Contreras wrote:
>> >> On Thu, Oct 3, 2013 at 3:17 AM, Chris Wilson <chris@chris-wilson.co.uk> wrote:
>> >> > That will hopefully catch which path is consuming too much time
>> >>
>> >> This is what I got:
>> >>
>> >> restarting timeout[2] (14, 0, 0)
>> >> restarting timeout[0] (16, 0, 0)
>> >> restarting timeout[0] (16, 0, 0)
>> >> restarting timeout[2] (14, 0, 0)
>> >> restarting timeout[1] (15, 0, 0)
>> >> restarting timeout[0] (16, 0, 0)
>> >> restarting timeout[0] (16, 0, 0)
>> >> restarting timeout[1] (15, 0, 0)
>> >> restarting timeout[1] (15, 0, 0)
>> >> restarting timeout[-1] (17, 0, 0)
>> >
>> > Suspiciously like we block for an entire vrefresh to submit one batch.
>> > Hmm, do you mind sending me your complete Xorg.0.log?
>>
>> Here you go.
>
> Oh, TearFree. That explains it. I plan to fix that up very soon.

Cool. Any idea why this happens only in this game, and only in that
exact loading screen?

Chris Wilson Oct. 3, 2013, 4:31 p.m. UTC | #5

On Thu, Oct 03, 2013 at 11:19:56AM -0500, Felipe Contreras wrote:
> On Thu, Oct 3, 2013 at 11:01 AM, Chris Wilson <chris@chris-wilson.co.uk> wrote:
> > Oh, TearFree. That explains it. I plan to fix that up very soon.
> 
> Cool. Any idea why this happens only in this game, and only in that
> exact loading screen?

No. Just a perfect storm I would guess.
-Chris

sna: avoid negative timeouts

Commit Message

Comments

Patch