diff mbox

sna: avoid negative timeouts

Message ID 1380781769-7805-1-git-send-email-felipe.contreras@gmail.com (mailing list archive)
State New, archived
Headers show

Commit Message

Felipe Contreras Oct. 3, 2013, 6:29 a.m. UTC
It's nice to avoid X server crashes (by not passing negative values to
select(3)).

For more information:
http://article.gmane.org/gmane.comp.freedesktop.xorg.devel/37388

Signed-off-by: Felipe Contreras <felipe.contreras@gmail.com>
---
 src/sna/sna_accel.c | 3 +++
 1 file changed, 3 insertions(+)

Comments

Felipe Contreras Oct. 3, 2013, 8:06 a.m. UTC | #1
On Thu, Oct 3, 2013 at 2:36 AM, Chris Wilson <chris@chris-wilson.co.uk> wrote:
> On Thu, Oct 03, 2013 at 01:29:29AM -0500, Felipe Contreras wrote:
>> It's nice to avoid X server crashes (by not passing negative values to
>> select(3)).
>>
>> For more information:
>> http://article.gmane.org/gmane.comp.freedesktop.xorg.devel/37388
>>
>> Signed-off-by: Felipe Contreras <felipe.contreras@gmail.com>
>
> Thanks for the patch, I pushed a slightly different version to consume
> the expired flush timeout immediately.
>
> However, in order for this to happen something within the BlockHandler
> must have run for at least 4ms. Which is itself very worrying - the two
> candidates are the throttle or lock contention. (The throttle is also
> prone to being influenced by a third party.) Such stalls are likely to
> be noticeable as jitter or judder, so if you can spot their source
> hopefully we can tackle the root cause.

What exactly do you mean by third party? I wouldn't notice if there
was jitter because it's a loading screen, and there's absolutely no
updates on the screen; it's static.

Do you mean you would like me to add debugging in
sna_accel_block_handler() to figure exactly which block is taking too
long to complete?
Felipe Contreras Oct. 3, 2013, 9:09 a.m. UTC | #2
On Thu, Oct 3, 2013 at 3:17 AM, Chris Wilson <chris@chris-wilson.co.uk> wrote:
> On Thu, Oct 03, 2013 at 03:06:03AM -0500, Felipe Contreras wrote:
>> On Thu, Oct 3, 2013 at 2:36 AM, Chris Wilson <chris@chris-wilson.co.uk> wrote:
>> > On Thu, Oct 03, 2013 at 01:29:29AM -0500, Felipe Contreras wrote:
>> >> It's nice to avoid X server crashes (by not passing negative values to
>> >> select(3)).
>> >>
>> >> For more information:
>> >> http://article.gmane.org/gmane.comp.freedesktop.xorg.devel/37388
>> >>
>> >> Signed-off-by: Felipe Contreras <felipe.contreras@gmail.com>
>> >
>> > Thanks for the patch, I pushed a slightly different version to consume
>> > the expired flush timeout immediately.
>> >
>> > However, in order for this to happen something within the BlockHandler
>> > must have run for at least 4ms. Which is itself very worrying - the two
>> > candidates are the throttle or lock contention. (The throttle is also
>> > prone to being influenced by a third party.) Such stalls are likely to
>> > be noticeable as jitter or judder, so if you can spot their source
>> > hopefully we can tackle the root cause.
>>
>> What exactly do you mean by third party? I wouldn't notice if there
>> was jitter because it's a loading screen, and there's absolutely no
>> updates on the screen; it's static.
>
> Other users of /dev/dri/card0. The most likely cause for the delay here
> is waiting for the GPU to complete an operation or for waiting for
> access to the GPU.
>
>> Do you mean you would like me to add debugging in
>> sna_accel_block_handler() to figure exactly which block is taking too
>> long to complete?
>
> Right, if you do something like:
>
> diff --git a/src/sna/sna_accel.c b/src/sna/sna_accel.c
> index 21cd342..31ea1a3 100644
> @@ -15931,6 +15931,8 @@ void sna_accel_close(struct sna *sna)
>
>  void sna_accel_block_handler(struct sna *sna, struct timeval **tv)
>  {
> +       uint32_t duration[NUM_TIMERS] = { 0 };
> +
>         sigtrap_assert();
>
>         if (sna->kgem.need_retire)
> @@ -15947,19 +15949,28 @@ void sna_accel_block_handler(struct sna *sna, struct timeval **tv)
>         }
>
>  restart:
> -       if (sna_accel_do_flush(sna))
> +       if (sna_accel_do_flush(sna)) {
> +               duration[FLUSH_TIMER] = TIME;
>                 sna_accel_flush(sna);
> +               duration[FLUSH_TIMER] = TIME - duration[FLUSH_TIMER];
> +       }
>         assert(sna_accel_scanout(sna) == NULL ||
>                sna_accel_scanout(sna)->gpu_bo->exec == NULL ||
>                sna->timer_active & (1<<(FLUSH_TIMER)));
>
> -       if (sna_accel_do_throttle(sna))
> +       if (sna_accel_do_throttle(sna)) {
> +               duration[THROTTLE_TIMER] = TIME;
>                 sna_accel_throttle(sna);
> +               duration[THROTTLE_TIMER] = TIME - duration[THROTTLE_TIMER];
> +       }
>         assert(!sna->kgem.need_retire ||
>                sna->timer_active & (1<<(THROTTLE_TIMER)));
>
> -       if (sna_accel_do_expire(sna))
> +       if (sna_accel_do_expire(sna)) {
> +               duration[EXPIRE_TIMER] = TIME;
>                 sna_accel_expire(sna);
> +               duration[EXPIRE_TIMER] = TIME - duration[EXPIRE_TIMER];
> +       }
>         assert(!sna->kgem.need_expire ||
>                sna->timer_active & (1<<(EXPIRE_TIMER)));
>
> @@ -15981,8 +15992,14 @@ restart:
>                 timeout = sna->timer_expire[FLUSH_TIMER] - TIME;
>                 DBG(("%s: flush timer expires in %d [%d]\n",
>                      __FUNCTION__, timeout, sna->timer_expire[FLUSH_TIMER]));
> -               if (timeout < 3)
> +               if (timeout < 3) {
> +                       ErrorF("restarting timeout[%d] (%d, %d, %d)\n",
> +                              timeout,
> +                              duration[FLUSH_TIMER],
> +                              duration[THROTTLE_TIMER],
> +                              duration[EXPIRE_TIMER]);
>                         goto restart;
> +               }
>
>                 if (*tv == NULL) {
>                         *tv = &sna->timer_tv;
>
> That will hopefully catch which path is consuming too much time

This is what I got:

restarting timeout[2] (14, 0, 0)
restarting timeout[0] (16, 0, 0)
restarting timeout[0] (16, 0, 0)
restarting timeout[2] (14, 0, 0)
restarting timeout[1] (15, 0, 0)
restarting timeout[0] (16, 0, 0)
restarting timeout[0] (16, 0, 0)
restarting timeout[1] (15, 0, 0)
restarting timeout[1] (15, 0, 0)
restarting timeout[-1] (17, 0, 0)
Felipe Contreras Oct. 3, 2013, 3:56 p.m. UTC | #3
On Thu, Oct 3, 2013 at 4:15 AM, Chris Wilson <chris@chris-wilson.co.uk> wrote:
> On Thu, Oct 03, 2013 at 04:09:32AM -0500, Felipe Contreras wrote:
>> On Thu, Oct 3, 2013 at 3:17 AM, Chris Wilson <chris@chris-wilson.co.uk> wrote:
>> > That will hopefully catch which path is consuming too much time
>>
>> This is what I got:
>>
>> restarting timeout[2] (14, 0, 0)
>> restarting timeout[0] (16, 0, 0)
>> restarting timeout[0] (16, 0, 0)
>> restarting timeout[2] (14, 0, 0)
>> restarting timeout[1] (15, 0, 0)
>> restarting timeout[0] (16, 0, 0)
>> restarting timeout[0] (16, 0, 0)
>> restarting timeout[1] (15, 0, 0)
>> restarting timeout[1] (15, 0, 0)
>> restarting timeout[-1] (17, 0, 0)
>
> Suspiciously like we block for an entire vrefresh to submit one batch.
> Hmm, do you mind sending me your complete Xorg.0.log?

Here you go.

BTW, the game is free to play, you just need Steam.
Felipe Contreras Oct. 3, 2013, 4:19 p.m. UTC | #4
On Thu, Oct 3, 2013 at 11:01 AM, Chris Wilson <chris@chris-wilson.co.uk> wrote:
> On Thu, Oct 03, 2013 at 10:56:43AM -0500, Felipe Contreras wrote:
>> On Thu, Oct 3, 2013 at 4:15 AM, Chris Wilson <chris@chris-wilson.co.uk> wrote:
>> > On Thu, Oct 03, 2013 at 04:09:32AM -0500, Felipe Contreras wrote:
>> >> On Thu, Oct 3, 2013 at 3:17 AM, Chris Wilson <chris@chris-wilson.co.uk> wrote:
>> >> > That will hopefully catch which path is consuming too much time
>> >>
>> >> This is what I got:
>> >>
>> >> restarting timeout[2] (14, 0, 0)
>> >> restarting timeout[0] (16, 0, 0)
>> >> restarting timeout[0] (16, 0, 0)
>> >> restarting timeout[2] (14, 0, 0)
>> >> restarting timeout[1] (15, 0, 0)
>> >> restarting timeout[0] (16, 0, 0)
>> >> restarting timeout[0] (16, 0, 0)
>> >> restarting timeout[1] (15, 0, 0)
>> >> restarting timeout[1] (15, 0, 0)
>> >> restarting timeout[-1] (17, 0, 0)
>> >
>> > Suspiciously like we block for an entire vrefresh to submit one batch.
>> > Hmm, do you mind sending me your complete Xorg.0.log?
>>
>> Here you go.
>
> Oh, TearFree. That explains it. I plan to fix that up very soon.

Cool. Any idea why this happens only in this game, and only in that
exact loading screen?
Chris Wilson Oct. 3, 2013, 4:31 p.m. UTC | #5
On Thu, Oct 03, 2013 at 11:19:56AM -0500, Felipe Contreras wrote:
> On Thu, Oct 3, 2013 at 11:01 AM, Chris Wilson <chris@chris-wilson.co.uk> wrote:
> > Oh, TearFree. That explains it. I plan to fix that up very soon.
> 
> Cool. Any idea why this happens only in this game, and only in that
> exact loading screen?

No. Just a perfect storm I would guess.
-Chris
diff mbox

Patch

diff --git a/src/sna/sna_accel.c b/src/sna/sna_accel.c
index 4838371..ce6da7d 100644
--- a/src/sna/sna_accel.c
+++ b/src/sna/sna_accel.c
@@ -15978,6 +15978,9 @@  void sna_accel_block_handler(struct sna *sna, struct timeval **tv)
 		     __FUNCTION__, sna->timer_active));
 
 		timeout = sna->timer_expire[0] - TIME;
+		if (timeout < 0)
+			timeout = 0;
+
 		DBG(("%s: flush timer expires in %d [%d]\n",
 		     __FUNCTION__, timeout, sna->timer_expire[0]));