diff mbox

Funky new vblank counter regressions in Linux 4.4-rc1

Message ID 564E0AF4.3050406@gmail.com (mailing list archive)
State New, archived
Headers show

Commit Message

Mario Kleiner Nov. 19, 2015, 5:46 p.m. UTC
Hi Alex and Michel and Ville,

it's "fix vblank stuff" time again ;-)

Ville's changes to the DRM's drm_handle_vblank() / 
drm_update_vblank_count() code in Linux 4.4 not only made that code more 
elegant, but also removed the robustness against the vblank irq quirks 
in AMD hw and similar hardware. So now i get tons of off-by-one errors and

"[   432.345] (WW) RADEON(1): radeon_dri2_flip_event_handler: Pageflip 
completion event has impossible msc 24803 < target_msc 24804" XOrg 
messages from that kernel.

One of the reasons for trouble is that AMD hw quirk where the hw fires 
an extra vblank irq shortly after vblank irq's get enabled, not 
synchronized to vblank, but typically in the middle of active scanout, 
so we get a redundant call to drm_handle_vblank in the middle of scanout.

To fix that i have a minor patch to make drm_update_vblank_count() again 
robust against such redundant calls, which i will send out later to the 
mailing list. Diff attached for reference.

The second quirk of AMD hw is that the vblank interrupt fires a few 
scanlines before start of vblank, so drm_handle_vblank -> 
drm_update_vblank_count() -> dev->driver->get_vblank_counter() gets 
called before the start of the vblank for which the new vblank count 
should be queried.

The third problem is that the DRM vblank handling always had the 
assumption that hardware vblank counters would essentially increment at 
leading edge of vblank - basically in sync with the firing of the vblank 
irq, so that a hw counter readout from within the vblank irq handler 
would always deliver the new incremented value. If this assumption is 
violated then the counting by use of the hw counter gets unreliable, 
because depending on random small delays in irq handling the code may 
end up sampling the hw counter pre- or post-increment, leading to 
inconsistent updating and funky bugs. It just so happens that AMD 
hardware doesn't increment the hw counter at leading edge of vblank, so 
stuff falls apart.

So to fix those two problems i'm tinkering with cooking the hw vblank 
counter value returned by radeon_get_vblank_counter_kms() to make it 
appear as if the counter incremented at leading edge of vblank in sync 
with vblank irq.

It almost sort of works on the rs600 code path, but i need a bit of info 
from you:

1. There's this register from the old specs for m76.pdf, which is not 
part of the current register defines for radeon-kms:

"D1CRTC_STATUS_VF_COUNT - RW - 32 bits - [GpuF0MMReg:0x60A8]"

It contains the lower 16 bits of framecounter and the 13 bits of 
vertical scanout position. It seems to give the same readings as the 24 
bit R_0060A4_D1CRTC_STATUS_FRAME_COUNT we use for the hw counter. This 
would come handy.

Does Evergreen and later have a same/similar register and where is it?

2. The hw framecounter seems to increment when the vertical scanout 
position wraps back from (VTOTAL - 1) to 0, at least on the one DCE-3 
gpu i tested so far. Is this so on all asics? And is the hw counter 
increment happening exactly at the moment that vertical scanout position 
jumps back to zero, ie. both events are driven by the same signal? Or is 
the framecounter increment just happening somewhere inside either 
scanline VTOTAL-1 or scanline 0?


If we can fix this and get it into rc2 or rc3 then we could avoid a bad 
regression and with a bit of luck at the same time improve by being able 
to set dev->vblank_disable_immediate = true then and allow vblank irqs 
to get turned off more aggressively for a bit of extra power saving.

thanks,
-mario

Comments

Ville Syrjala Nov. 19, 2015, 6:20 p.m. UTC | #1
On Thu, Nov 19, 2015 at 06:46:28PM +0100, Mario Kleiner wrote:
> Hi Alex and Michel and Ville,
> 
> it's "fix vblank stuff" time again ;-)
> 
> Ville's changes to the DRM's drm_handle_vblank() / 
> drm_update_vblank_count() code in Linux 4.4 not only made that code more 
> elegant, but also removed the robustness against the vblank irq quirks 
> in AMD hw and similar hardware. So now i get tons of off-by-one errors and
> 
> "[   432.345] (WW) RADEON(1): radeon_dri2_flip_event_handler: Pageflip 
> completion event has impossible msc 24803 < target_msc 24804" XOrg 
> messages from that kernel.

Argh. Sorry about that.

> 
> One of the reasons for trouble is that AMD hw quirk where the hw fires 
> an extra vblank irq shortly after vblank irq's get enabled, not 
> synchronized to vblank, but typically in the middle of active scanout, 
> so we get a redundant call to drm_handle_vblank in the middle of scanout.

I think that should be fine as such. The code should ignore redudntant
vbl irqs. Well, assuming you have a reliable hw counter or you use the
timestamp guesstimate mechanism and your scanout position is reported
accurately. But I guess you have a bit of problem with both.

> 
> To fix that i have a minor patch to make drm_update_vblank_count() again 
> robust against such redundant calls, which i will send out later to the 
> mailing list. Diff attached for reference.
> 
> The second quirk of AMD hw is that the vblank interrupt fires a few 
> scanlines before start of vblank, so drm_handle_vblank -> 
> drm_update_vblank_count() -> dev->driver->get_vblank_counter() gets 
> called before the start of the vblank for which the new vblank count 
> should be queried.

Does it fire too soon, or is the scanout position register value(s)
just offset by a few lines perhaps?

We have that with i915 and I simply fix up the value when reading it
out. Fortunately for us the offset is constant (or at least seems to
be) for a given platform/connector combo.

> 
> The third problem is that the DRM vblank handling always had the 
> assumption that hardware vblank counters would essentially increment at 
> leading edge of vblank - basically in sync with the firing of the vblank 
> irq, so that a hw counter readout from within the vblank irq handler 
> would always deliver the new incremented value. If this assumption is 
> violated then the counting by use of the hw counter gets unreliable, 
> because depending on random small delays in irq handling the code may 
> end up sampling the hw counter pre- or post-increment, leading to 
> inconsistent updating and funky bugs. It just so happens that AMD 
> hardware doesn't increment the hw counter at leading edge of vblank, so 
> stuff falls apart.
> 
> So to fix those two problems i'm tinkering with cooking the hw vblank 
> counter value returned by radeon_get_vblank_counter_kms() to make it 
> appear as if the counter incremented at leading edge of vblank in sync 
> with vblank irq.

Yeah, I do that in i915 too. Well, on some platforms where the
counter increments at the leading edge of active.

> 
> It almost sort of works on the rs600 code path, but i need a bit of info 
> from you:
> 
> 1. There's this register from the old specs for m76.pdf, which is not 
> part of the current register defines for radeon-kms:
> 
> "D1CRTC_STATUS_VF_COUNT - RW - 32 bits - [GpuF0MMReg:0x60A8]"
> 
> It contains the lower 16 bits of framecounter and the 13 bits of 
> vertical scanout position. It seems to give the same readings as the 24 
> bit R_0060A4_D1CRTC_STATUS_FRAME_COUNT we use for the hw counter. This 
> would come handy.
> 
> Does Evergreen and later have a same/similar register and where is it?
> 
> 2. The hw framecounter seems to increment when the vertical scanout 
> position wraps back from (VTOTAL - 1) to 0, at least on the one DCE-3 
> gpu i tested so far. Is this so on all asics? And is the hw counter 
> increment happening exactly at the moment that vertical scanout position 
> jumps back to zero, ie. both events are driven by the same signal? Or is 
> the framecounter increment just happening somewhere inside either 
> scanline VTOTAL-1 or scanline 0?
> 
> 
> If we can fix this and get it into rc2 or rc3 then we could avoid a bad 
> regression and with a bit of luck at the same time improve by being able 
> to set dev->vblank_disable_immediate = true then and allow vblank irqs 
> to get turned off more aggressively for a bit of extra power saving.
> 
> thanks,
> -mario

> -- a/drivers/gpu/drm/drm_irq.c
> +++ b/drivers/gpu/drm/drm_irq.c
> @@ -172,9 +172,11 @@ static void drm_update_vblank_count(struct drm_device *dev, unsigned int pipe,
>                                     unsigned long flags)
>  {
>         struct drm_vblank_crtc *vblank = &dev->vblank[pipe];
> -       u32 cur_vblank, diff;
> +       u32 cur_vblank, diff = 0;
>         bool rc;
>         struct timeval t_vblank;
> +       const struct timeval *t_old;
> +       u64 diff_ns;
>         int count = DRM_TIMESTAMP_MAXRETRIES;
>         int framedur_ns = vblank->framedur_ns;
> 
> @@ -195,13 +197,15 @@ static void drm_update_vblank_count(struct drm_device *dev, unsigned int pipe,
>                 rc = drm_get_last_vbltimestamp(dev, pipe, &t_vblank, flags);
>         } while (cur_vblank != dev->driver->get_vblank_counter(dev, pipe) && --count > 0);
> 
> -       if (dev->max_vblank_count != 0) {
> -               /* trust the hw counter when it's around */
> -               diff = (cur_vblank - vblank->last) & dev->max_vblank_count;
> -       } else if (rc && framedur_ns) {
> -               const struct timeval *t_old;
> -               u64 diff_ns;
> -
> +       /*
> +        * Always use vblank timestamping based method if supported to reject
> +        * redundant vblank irqs. E.g., AMD hardware needs this to not screw up
> +        * due to some irq handling quirk.

Hmm. I'm thinking it would be better to simply not claim that there's a hw
counter if it isn't reliable.

> +        *
> +        * This also sets the diff value for use as fallback below in case the
> +        * hw does not support a suitable hw vblank counter.
> +        */
> +       if (rc && framedur_ns) {
>                 t_old = &vblanktimestamp(dev, pipe, vblank->count);
>                 diff_ns = timeval_to_ns(&t_vblank) - timeval_to_ns(t_old);
> 
> @@ -212,11 +216,28 @@ static void drm_update_vblank_count(struct drm_device *dev, unsigned int pipe,
>                  */
>                 diff = DIV_ROUND_CLOSEST_ULL(diff_ns, framedur_ns);
> 
> -               if (diff == 0 && flags & DRM_CALLED_FROM_VBLIRQ)
> -                       DRM_DEBUG_VBL("crtc %u: Redundant vblirq ignored."
> -                                     " diff_ns = %lld, framedur_ns = %d)\n",
> -                                     pipe, (long long) diff_ns, framedur_ns);
> -       } else {
> +               if (diff == 0 && flags & DRM_CALLED_FROM_VBLIRQ) {
> +                   DRM_DEBUG_VBL("crtc %u: Redundant vblirq ignored."
> +                   " diff_ns = %lld, framedur_ns = %d)\n",
> +                   pipe, (long long) diff_ns, framedur_ns);
> +
> +                   /* Treat this redundant invocation as no-op. */
> +                   WARN_ON_ONCE(cur_vblank != vblank->last);
> +                   return;
> +               }
> +       }
> +
> +       /*
> +        * hw counters, if marked as supported via max_vblank_count != 0,
> +        * *must* increment at leading edge of vblank and in sync with
> +        * the firing of the hw vblank irq, otherwise we have a race here
> +        * between gpu and us, where small variations in our execution
> +        * timing can cause off-by-one miscounting of vblanks.
> +        */
> +       if (dev->max_vblank_count != 0) {
> +               /* trust the hw counter when it's around */
> +               diff = (cur_vblank - vblank->last) & dev->max_vblank_count;
> +       } else if (!(rc && framedur_ns)) {
>                 /* some kind of default for drivers w/o accurate vbl timestamping */
>                 diff = (flags & DRM_CALLED_FROM_VBLIRQ) != 0;
>         }
Mario Kleiner Nov. 19, 2015, 7:12 p.m. UTC | #2
On 11/19/2015 07:20 PM, Ville Syrjälä wrote:
> On Thu, Nov 19, 2015 at 06:46:28PM +0100, Mario Kleiner wrote:
>> Hi Alex and Michel and Ville,
>>
>> it's "fix vblank stuff" time again ;-)
>>
>> Ville's changes to the DRM's drm_handle_vblank() /
>> drm_update_vblank_count() code in Linux 4.4 not only made that code more
>> elegant, but also removed the robustness against the vblank irq quirks
>> in AMD hw and similar hardware. So now i get tons of off-by-one errors and
>>
>> "[   432.345] (WW) RADEON(1): radeon_dri2_flip_event_handler: Pageflip
>> completion event has impossible msc 24803 < target_msc 24804" XOrg
>> messages from that kernel.
>
> Argh. Sorry about that.
>

On the plus side, your "vblank timestamp deltas as fake vblank counters" 
code seems to work nicely on nouveau-kms, as far as testing with three 
Nvidia's went so far :). And both Intel gpu's (HD Ironlake, and 
Ivybridge) i tested checked out nicely.

And at least the recent nv50+ NVidia Tesla also have 16 bit vblank 
counters which we could implement in nouveau, maybe with the same 
trickery to allow long trouble-free vblank off periods, hopefully that 
would also apply to the Tegra-4 and later Kepler based parts. Tegra-3 
will probably also work. I think i read in the Tegra-3 PRM that the sync 
points they use to implement vblank counters do increment at leading 
edge of vblank.

The only problem we may have is that some of the embedded gpus may not 
have compliant vblank counters and they probably also lack vblank 
timestamping, so it might be a good idea to rather not use vblank 
counters at all in those drivers - patch their kms drivers to 
max_vblank_count = 0;

>>
>> One of the reasons for trouble is that AMD hw quirk where the hw fires
>> an extra vblank irq shortly after vblank irq's get enabled, not
>> synchronized to vblank, but typically in the middle of active scanout,
>> so we get a redundant call to drm_handle_vblank in the middle of scanout.
>
> I think that should be fine as such. The code should ignore redudntant
> vbl irqs. Well, assuming you have a reliable hw counter or you use the
> timestamp guesstimate mechanism and your scanout position is reported
> accurately. But I guess you have a bit of problem with both.
>

The problem is i'll need to treat calls to radeon kms 
driver->get_vblank_counter differently, depending if the function gets 
called from vblank irq, or from regular code, so that hw quirk that 
causes spontaneous misfiring of the vblank irq in the middle of scanout 
would confuse my hw vblank counter cooking method to produce a fake hw 
vblank counter increment. That's why i moved the filtering for redundant 
irqs based on vblank timestamps in drm_vblank_update() around to always 
apply. Makes us robust against that type of hw quirk in general and 
makes life for the vblank counter cooking so much easier.

It's a beautiful collaboration of different hw bugs to make things 
interesting :)

>>
>> To fix that i have a minor patch to make drm_update_vblank_count() again
>> robust against such redundant calls, which i will send out later to the
>> mailing list. Diff attached for reference.
>>
>> The second quirk of AMD hw is that the vblank interrupt fires a few
>> scanlines before start of vblank, so drm_handle_vblank ->
>> drm_update_vblank_count() -> dev->driver->get_vblank_counter() gets
>> called before the start of the vblank for which the new vblank count
>> should be queried.
>
> Does it fire too soon, or is the scanout position register value(s)
> just offset by a few lines perhaps?
>
> We have that with i915 and I simply fix up the value when reading it
> out. Fortunately for us the offset is constant (or at least seems to
> be) for a given platform/connector combo.
>

I think they fire too soon, from all i've seen so far on a few cards.

>>
>> The third problem is that the DRM vblank handling always had the
>> assumption that hardware vblank counters would essentially increment at
>> leading edge of vblank - basically in sync with the firing of the vblank
>> irq, so that a hw counter readout from within the vblank irq handler
>> would always deliver the new incremented value. If this assumption is
>> violated then the counting by use of the hw counter gets unreliable,
>> because depending on random small delays in irq handling the code may
>> end up sampling the hw counter pre- or post-increment, leading to
>> inconsistent updating and funky bugs. It just so happens that AMD
>> hardware doesn't increment the hw counter at leading edge of vblank, so
>> stuff falls apart.
>>
>> So to fix those two problems i'm tinkering with cooking the hw vblank
>> counter value returned by radeon_get_vblank_counter_kms() to make it
>> appear as if the counter incremented at leading edge of vblank in sync
>> with vblank irq.
>
> Yeah, I do that in i915 too. Well, on some platforms where the
> counter increments at the leading edge of active.
>

Yep. I think that's key to make the vblank instant off stuff work reliably.

-mario

>>
>> It almost sort of works on the rs600 code path, but i need a bit of info
>> from you:
>>
>> 1. There's this register from the old specs for m76.pdf, which is not
>> part of the current register defines for radeon-kms:
>>
>> "D1CRTC_STATUS_VF_COUNT - RW - 32 bits - [GpuF0MMReg:0x60A8]"
>>
>> It contains the lower 16 bits of framecounter and the 13 bits of
>> vertical scanout position. It seems to give the same readings as the 24
>> bit R_0060A4_D1CRTC_STATUS_FRAME_COUNT we use for the hw counter. This
>> would come handy.
>>
>> Does Evergreen and later have a same/similar register and where is it?
>>
>> 2. The hw framecounter seems to increment when the vertical scanout
>> position wraps back from (VTOTAL - 1) to 0, at least on the one DCE-3
>> gpu i tested so far. Is this so on all asics? And is the hw counter
>> increment happening exactly at the moment that vertical scanout position
>> jumps back to zero, ie. both events are driven by the same signal? Or is
>> the framecounter increment just happening somewhere inside either
>> scanline VTOTAL-1 or scanline 0?
>>
>>
>> If we can fix this and get it into rc2 or rc3 then we could avoid a bad
>> regression and with a bit of luck at the same time improve by being able
>> to set dev->vblank_disable_immediate = true then and allow vblank irqs
>> to get turned off more aggressively for a bit of extra power saving.
>>
>> thanks,
>> -mario
>
>> -- a/drivers/gpu/drm/drm_irq.c
>> +++ b/drivers/gpu/drm/drm_irq.c
>> @@ -172,9 +172,11 @@ static void drm_update_vblank_count(struct drm_device *dev, unsigned int pipe,
>>                                      unsigned long flags)
>>   {
>>          struct drm_vblank_crtc *vblank = &dev->vblank[pipe];
>> -       u32 cur_vblank, diff;
>> +       u32 cur_vblank, diff = 0;
>>          bool rc;
>>          struct timeval t_vblank;
>> +       const struct timeval *t_old;
>> +       u64 diff_ns;
>>          int count = DRM_TIMESTAMP_MAXRETRIES;
>>          int framedur_ns = vblank->framedur_ns;
>>
>> @@ -195,13 +197,15 @@ static void drm_update_vblank_count(struct drm_device *dev, unsigned int pipe,
>>                  rc = drm_get_last_vbltimestamp(dev, pipe, &t_vblank, flags);
>>          } while (cur_vblank != dev->driver->get_vblank_counter(dev, pipe) && --count > 0);
>>
>> -       if (dev->max_vblank_count != 0) {
>> -               /* trust the hw counter when it's around */
>> -               diff = (cur_vblank - vblank->last) & dev->max_vblank_count;
>> -       } else if (rc && framedur_ns) {
>> -               const struct timeval *t_old;
>> -               u64 diff_ns;
>> -
>> +       /*
>> +        * Always use vblank timestamping based method if supported to reject
>> +        * redundant vblank irqs. E.g., AMD hardware needs this to not screw up
>> +        * due to some irq handling quirk.
>
> Hmm. I'm thinking it would be better to simply not claim that there's a hw
> counter if it isn't reliable.
>
>> +        *
>> +        * This also sets the diff value for use as fallback below in case the
>> +        * hw does not support a suitable hw vblank counter.
>> +        */
>> +       if (rc && framedur_ns) {
>>                  t_old = &vblanktimestamp(dev, pipe, vblank->count);
>>                  diff_ns = timeval_to_ns(&t_vblank) - timeval_to_ns(t_old);
>>
>> @@ -212,11 +216,28 @@ static void drm_update_vblank_count(struct drm_device *dev, unsigned int pipe,
>>                   */
>>                  diff = DIV_ROUND_CLOSEST_ULL(diff_ns, framedur_ns);
>>
>> -               if (diff == 0 && flags & DRM_CALLED_FROM_VBLIRQ)
>> -                       DRM_DEBUG_VBL("crtc %u: Redundant vblirq ignored."
>> -                                     " diff_ns = %lld, framedur_ns = %d)\n",
>> -                                     pipe, (long long) diff_ns, framedur_ns);
>> -       } else {
>> +               if (diff == 0 && flags & DRM_CALLED_FROM_VBLIRQ) {
>> +                   DRM_DEBUG_VBL("crtc %u: Redundant vblirq ignored."
>> +                   " diff_ns = %lld, framedur_ns = %d)\n",
>> +                   pipe, (long long) diff_ns, framedur_ns);
>> +
>> +                   /* Treat this redundant invocation as no-op. */
>> +                   WARN_ON_ONCE(cur_vblank != vblank->last);
>> +                   return;
>> +               }
>> +       }
>> +
>> +       /*
>> +        * hw counters, if marked as supported via max_vblank_count != 0,
>> +        * *must* increment at leading edge of vblank and in sync with
>> +        * the firing of the hw vblank irq, otherwise we have a race here
>> +        * between gpu and us, where small variations in our execution
>> +        * timing can cause off-by-one miscounting of vblanks.
>> +        */
>> +       if (dev->max_vblank_count != 0) {
>> +               /* trust the hw counter when it's around */
>> +               diff = (cur_vblank - vblank->last) & dev->max_vblank_count;
>> +       } else if (!(rc && framedur_ns)) {
>>                  /* some kind of default for drivers w/o accurate vbl timestamping */
>>                  diff = (flags & DRM_CALLED_FROM_VBLIRQ) != 0;
>>          }
>
>
Ville Syrjala Nov. 19, 2015, 7:45 p.m. UTC | #3
On Thu, Nov 19, 2015 at 08:12:24PM +0100, Mario Kleiner wrote:
> On 11/19/2015 07:20 PM, Ville Syrjälä wrote:
> > On Thu, Nov 19, 2015 at 06:46:28PM +0100, Mario Kleiner wrote:
> >> Hi Alex and Michel and Ville,
> >>
> >> it's "fix vblank stuff" time again ;-)
> >>
> >> Ville's changes to the DRM's drm_handle_vblank() /
> >> drm_update_vblank_count() code in Linux 4.4 not only made that code more
> >> elegant, but also removed the robustness against the vblank irq quirks
> >> in AMD hw and similar hardware. So now i get tons of off-by-one errors and
> >>
> >> "[   432.345] (WW) RADEON(1): radeon_dri2_flip_event_handler: Pageflip
> >> completion event has impossible msc 24803 < target_msc 24804" XOrg
> >> messages from that kernel.
> >
> > Argh. Sorry about that.
> >
> 
> On the plus side, your "vblank timestamp deltas as fake vblank counters" 
> code seems to work nicely on nouveau-kms, as far as testing with three 
> Nvidia's went so far :). And both Intel gpu's (HD Ironlake, and 
> Ivybridge) i tested checked out nicely.
> 
> And at least the recent nv50+ NVidia Tesla also have 16 bit vblank 
> counters which we could implement in nouveau, maybe with the same 
> trickery to allow long trouble-free vblank off periods, hopefully that 
> would also apply to the Tegra-4 and later Kepler based parts. Tegra-3 
> will probably also work. I think i read in the Tegra-3 PRM that the sync 
> points they use to implement vblank counters do increment at leading 
> edge of vblank.
> 
> The only problem we may have is that some of the embedded gpus may not 
> have compliant vblank counters and they probably also lack vblank 
> timestamping, so it might be a good idea to rather not use vblank 
> counters at all in those drivers - patch their kms drivers to 
> max_vblank_count = 0;
> 
> >>
> >> One of the reasons for trouble is that AMD hw quirk where the hw fires
> >> an extra vblank irq shortly after vblank irq's get enabled, not
> >> synchronized to vblank, but typically in the middle of active scanout,
> >> so we get a redundant call to drm_handle_vblank in the middle of scanout.
> >
> > I think that should be fine as such. The code should ignore redudntant
> > vbl irqs. Well, assuming you have a reliable hw counter or you use the
> > timestamp guesstimate mechanism and your scanout position is reported
> > accurately. But I guess you have a bit of problem with both.
> >
> 
> The problem is i'll need to treat calls to radeon kms 
> driver->get_vblank_counter differently, depending if the function gets 
> called from vblank irq, or from regular code, so that hw quirk that 
> causes spontaneous misfiring of the vblank irq in the middle of scanout 
> would confuse my hw vblank counter cooking method to produce a fake hw 
> vblank counter increment. That's why i moved the filtering for redundant 
> irqs based on vblank timestamps in drm_vblank_update() around to always 
> apply. Makes us robust against that type of hw quirk in general and 
> makes life for the vblank counter cooking so much easier.
> 
> It's a beautiful collaboration of different hw bugs to make things 
> interesting :)
> 
> >>
> >> To fix that i have a minor patch to make drm_update_vblank_count() again
> >> robust against such redundant calls, which i will send out later to the
> >> mailing list. Diff attached for reference.
> >>
> >> The second quirk of AMD hw is that the vblank interrupt fires a few
> >> scanlines before start of vblank, so drm_handle_vblank ->
> >> drm_update_vblank_count() -> dev->driver->get_vblank_counter() gets
> >> called before the start of the vblank for which the new vblank count
> >> should be queried.
> >
> > Does it fire too soon, or is the scanout position register value(s)
> > just offset by a few lines perhaps?
> >
> > We have that with i915 and I simply fix up the value when reading it
> > out. Fortunately for us the offset is constant (or at least seems to
> > be) for a given platform/connector combo.
> >
> 
> I think they fire too soon, from all i've seen so far on a few cards.

That's unfortunate. Firing a bit too late would be perfectly fine for
most things. And that's actually what happens on older intel hw. Firing
too soon opens up some more races, as in you may have to wait a bit
more after getting woken up to make sure you've crossed into the
freame you were waiting for.

Also not sure how to deal with that sort of thing in a reasonable way in
.get_vblank_timestamp(). DRM_CALLED_FROM_VBLIRQ gets passed there, so it
could fudge things a bit when the early irq arrives, but then if someone
calls drm_update_vblank_count() after the irq was handled but before
start of vblank you'll end up with a -1 diff.

Maybe something like:
.get_scanout_positon()
{
	read frame counter and scaline position

	if ((flags & DRM_CALLED_FROM_VBLIRQ || frame == last_fudge_frame) &&
	     scanline_slightly_below_start_of_vblank)) {
		last_fudge_frame = frame;
		scanline = start_of_vblank;
	} else
		last_fudge_frame = frame - 1;
}

could be done. So it would pretend the scanline has already crossed into
the vblank when the interrupt arrives, and it should keep doing that
until the frame counter indicates that you really crossed there.

> 
> >>
> >> The third problem is that the DRM vblank handling always had the
> >> assumption that hardware vblank counters would essentially increment at
> >> leading edge of vblank - basically in sync with the firing of the vblank
> >> irq, so that a hw counter readout from within the vblank irq handler
> >> would always deliver the new incremented value. If this assumption is
> >> violated then the counting by use of the hw counter gets unreliable,
> >> because depending on random small delays in irq handling the code may
> >> end up sampling the hw counter pre- or post-increment, leading to
> >> inconsistent updating and funky bugs. It just so happens that AMD
> >> hardware doesn't increment the hw counter at leading edge of vblank, so
> >> stuff falls apart.
> >>
> >> So to fix those two problems i'm tinkering with cooking the hw vblank
> >> counter value returned by radeon_get_vblank_counter_kms() to make it
> >> appear as if the counter incremented at leading edge of vblank in sync
> >> with vblank irq.
> >
> > Yeah, I do that in i915 too. Well, on some platforms where the
> > counter increments at the leading edge of active.
> >
> 
> Yep. I think that's key to make the vblank instant off stuff work reliably.

I think it should work OK with the timestamp based stuff too, assuming
you have accurate scanout position reporting. So having a reliable hw
counter shouldn't be a requirement for that.

> 
> -mario
> 
> >>
> >> It almost sort of works on the rs600 code path, but i need a bit of info
> >> from you:
> >>
> >> 1. There's this register from the old specs for m76.pdf, which is not
> >> part of the current register defines for radeon-kms:
> >>
> >> "D1CRTC_STATUS_VF_COUNT - RW - 32 bits - [GpuF0MMReg:0x60A8]"
> >>
> >> It contains the lower 16 bits of framecounter and the 13 bits of
> >> vertical scanout position. It seems to give the same readings as the 24
> >> bit R_0060A4_D1CRTC_STATUS_FRAME_COUNT we use for the hw counter. This
> >> would come handy.
> >>
> >> Does Evergreen and later have a same/similar register and where is it?
> >>
> >> 2. The hw framecounter seems to increment when the vertical scanout
> >> position wraps back from (VTOTAL - 1) to 0, at least on the one DCE-3
> >> gpu i tested so far. Is this so on all asics? And is the hw counter
> >> increment happening exactly at the moment that vertical scanout position
> >> jumps back to zero, ie. both events are driven by the same signal? Or is
> >> the framecounter increment just happening somewhere inside either
> >> scanline VTOTAL-1 or scanline 0?
> >>
> >>
> >> If we can fix this and get it into rc2 or rc3 then we could avoid a bad
> >> regression and with a bit of luck at the same time improve by being able
> >> to set dev->vblank_disable_immediate = true then and allow vblank irqs
> >> to get turned off more aggressively for a bit of extra power saving.
> >>
> >> thanks,
> >> -mario
> >
> >> -- a/drivers/gpu/drm/drm_irq.c
> >> +++ b/drivers/gpu/drm/drm_irq.c
> >> @@ -172,9 +172,11 @@ static void drm_update_vblank_count(struct drm_device *dev, unsigned int pipe,
> >>                                      unsigned long flags)
> >>   {
> >>          struct drm_vblank_crtc *vblank = &dev->vblank[pipe];
> >> -       u32 cur_vblank, diff;
> >> +       u32 cur_vblank, diff = 0;
> >>          bool rc;
> >>          struct timeval t_vblank;
> >> +       const struct timeval *t_old;
> >> +       u64 diff_ns;
> >>          int count = DRM_TIMESTAMP_MAXRETRIES;
> >>          int framedur_ns = vblank->framedur_ns;
> >>
> >> @@ -195,13 +197,15 @@ static void drm_update_vblank_count(struct drm_device *dev, unsigned int pipe,
> >>                  rc = drm_get_last_vbltimestamp(dev, pipe, &t_vblank, flags);
> >>          } while (cur_vblank != dev->driver->get_vblank_counter(dev, pipe) && --count > 0);
> >>
> >> -       if (dev->max_vblank_count != 0) {
> >> -               /* trust the hw counter when it's around */
> >> -               diff = (cur_vblank - vblank->last) & dev->max_vblank_count;
> >> -       } else if (rc && framedur_ns) {
> >> -               const struct timeval *t_old;
> >> -               u64 diff_ns;
> >> -
> >> +       /*
> >> +        * Always use vblank timestamping based method if supported to reject
> >> +        * redundant vblank irqs. E.g., AMD hardware needs this to not screw up
> >> +        * due to some irq handling quirk.
> >
> > Hmm. I'm thinking it would be better to simply not claim that there's a hw
> > counter if it isn't reliable.
> >
> >> +        *
> >> +        * This also sets the diff value for use as fallback below in case the
> >> +        * hw does not support a suitable hw vblank counter.
> >> +        */
> >> +       if (rc && framedur_ns) {
> >>                  t_old = &vblanktimestamp(dev, pipe, vblank->count);
> >>                  diff_ns = timeval_to_ns(&t_vblank) - timeval_to_ns(t_old);
> >>
> >> @@ -212,11 +216,28 @@ static void drm_update_vblank_count(struct drm_device *dev, unsigned int pipe,
> >>                   */
> >>                  diff = DIV_ROUND_CLOSEST_ULL(diff_ns, framedur_ns);
> >>
> >> -               if (diff == 0 && flags & DRM_CALLED_FROM_VBLIRQ)
> >> -                       DRM_DEBUG_VBL("crtc %u: Redundant vblirq ignored."
> >> -                                     " diff_ns = %lld, framedur_ns = %d)\n",
> >> -                                     pipe, (long long) diff_ns, framedur_ns);
> >> -       } else {
> >> +               if (diff == 0 && flags & DRM_CALLED_FROM_VBLIRQ) {
> >> +                   DRM_DEBUG_VBL("crtc %u: Redundant vblirq ignored."
> >> +                   " diff_ns = %lld, framedur_ns = %d)\n",
> >> +                   pipe, (long long) diff_ns, framedur_ns);
> >> +
> >> +                   /* Treat this redundant invocation as no-op. */
> >> +                   WARN_ON_ONCE(cur_vblank != vblank->last);
> >> +                   return;
> >> +               }
> >> +       }
> >> +
> >> +       /*
> >> +        * hw counters, if marked as supported via max_vblank_count != 0,
> >> +        * *must* increment at leading edge of vblank and in sync with
> >> +        * the firing of the hw vblank irq, otherwise we have a race here
> >> +        * between gpu and us, where small variations in our execution
> >> +        * timing can cause off-by-one miscounting of vblanks.
> >> +        */
> >> +       if (dev->max_vblank_count != 0) {
> >> +               /* trust the hw counter when it's around */
> >> +               diff = (cur_vblank - vblank->last) & dev->max_vblank_count;
> >> +       } else if (!(rc && framedur_ns)) {
> >>                  /* some kind of default for drivers w/o accurate vbl timestamping */
> >>                  diff = (flags & DRM_CALLED_FROM_VBLIRQ) != 0;
> >>          }
> >
> >
Alex Deucher Nov. 20, 2015, 3:42 a.m. UTC | #4
On Thu, Nov 19, 2015 at 12:46 PM, Mario Kleiner
<mario.kleiner.de@gmail.com> wrote:
> Hi Alex and Michel and Ville,
>
> it's "fix vblank stuff" time again ;-)

Adding Harry from our display team.  He might be able to fill in the
blanks of on some of this better than I can.  It might also be worth
checking to see how our DAL (our new display code which is being
developed directly by our display team) code handles this.  It could
be that we are just missing register settings:
http://cgit.freedesktop.org/~agd5f/linux/log/?h=DAL-wip
Additionally we've published full registers headers for the display block:
http://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/tree/drivers/gpu/drm/amd/include/asic_reg/dce
The DCE8 stuff should generally apply back to DCE4.  If you have
questions about registers older asics not covered in the hw docs, let
me know.  Note the new headers are dword aligned rather than byte
aligned.

>
> Ville's changes to the DRM's drm_handle_vblank() / drm_update_vblank_count()
> code in Linux 4.4 not only made that code more elegant, but also removed the
> robustness against the vblank irq quirks in AMD hw and similar hardware. So
> now i get tons of off-by-one errors and
>
> "[   432.345] (WW) RADEON(1): radeon_dri2_flip_event_handler: Pageflip
> completion event has impossible msc 24803 < target_msc 24804" XOrg messages
> from that kernel.
>
> One of the reasons for trouble is that AMD hw quirk where the hw fires an
> extra vblank irq shortly after vblank irq's get enabled, not synchronized to
> vblank, but typically in the middle of active scanout, so we get a redundant
> call to drm_handle_vblank in the middle of scanout.
>
> To fix that i have a minor patch to make drm_update_vblank_count() again
> robust against such redundant calls, which i will send out later to the
> mailing list. Diff attached for reference.
>
> The second quirk of AMD hw is that the vblank interrupt fires a few
> scanlines before start of vblank, so drm_handle_vblank ->
> drm_update_vblank_count() -> dev->driver->get_vblank_counter() gets called
> before the start of the vblank for which the new vblank count should be
> queried.
>
> The third problem is that the DRM vblank handling always had the assumption
> that hardware vblank counters would essentially increment at leading edge of
> vblank - basically in sync with the firing of the vblank irq, so that a hw
> counter readout from within the vblank irq handler would always deliver the
> new incremented value. If this assumption is violated then the counting by
> use of the hw counter gets unreliable, because depending on random small
> delays in irq handling the code may end up sampling the hw counter pre- or
> post-increment, leading to inconsistent updating and funky bugs. It just so
> happens that AMD hardware doesn't increment the hw counter at leading edge
> of vblank, so stuff falls apart.
>
> So to fix those two problems i'm tinkering with cooking the hw vblank
> counter value returned by radeon_get_vblank_counter_kms() to make it appear
> as if the counter incremented at leading edge of vblank in sync with vblank
> irq.
>
> It almost sort of works on the rs600 code path, but i need a bit of info
> from you:
>
> 1. There's this register from the old specs for m76.pdf, which is not part
> of the current register defines for radeon-kms:
>
> "D1CRTC_STATUS_VF_COUNT - RW - 32 bits - [GpuF0MMReg:0x60A8]"
>
> It contains the lower 16 bits of framecounter and the 13 bits of vertical
> scanout position. It seems to give the same readings as the 24 bit
> R_0060A4_D1CRTC_STATUS_FRAME_COUNT we use for the hw counter. This would
> come handy.
>
> Does Evergreen and later have a same/similar register and where is it?

Yes.  CRTC_STATUS_VF_COUNT

CRTC:CRTC_STATUS_VF_COUNT  ·  [R/W]  ·  32 bits  ·  Access: 8/16/32  ·
 [INST0] GpuF0MMReg:0x6e9c, [INST1] GpuF0MMReg:0x7a9c, [INST2]
GpuF0MMReg:0x1069c, [INST3] GpuF0MMReg:0x1129c, [INST4]
GpuF0MMReg:0x11e9c, [INST5] GpuF0MMReg:0x12a9c
DESCRIPTION: Current composite vertical and frame count for CRTC
Field Name   Bits    Default   Description
CRTC_VF_COUNT
(Access: R)    28:0    0x0    Reports current vertical and frame count

>
> 2. The hw framecounter seems to increment when the vertical scanout position
> wraps back from (VTOTAL - 1) to 0, at least on the one DCE-3 gpu i tested so
> far. Is this so on all asics? And is the hw counter increment happening
> exactly at the moment that vertical scanout position jumps back to zero, ie.
> both events are driven by the same signal? Or is the framecounter increment
> just happening somewhere inside either scanline VTOTAL-1 or scanline 0?

Unless Harry knows, we can ask the hw team, but I doubt they would
have changed it.  I think it's tied to the start line which is
controlled by this register:
CRTC:CRTC_START_LINE_CONTROL  ·  [R/W]  ·  32 bits  ·  Access: 8/16/32
 ·  [INST0] GpuF0MMReg:0x6ecc, [INST1] GpuF0MMReg:0x7acc, [INST2]
GpuF0MMReg:0x106cc, [INST3] GpuF0MMReg:0x112cc, [INST4]
GpuF0MMReg:0x11ecc, [INST5] GpuF0MMReg:0x12acc
DESCRIPTION: move start_line signal earlier by 1 line in CRTC
Field Name   Bits          Default         Description
CRTC_PROGRESSIVE_START_LINE_EARLY         0    0x0    move start_line
signal by 1 line eariler in progressive mode

CRTC_INTERLACE_START_LINE_EARLY         8     0x1     move start_line
signal by 1 line earlier in interlaced timing mode

CRTC_ADVANCED_START_LINE_POSITION         19:16     0x4  Advanced
Start Line position respect to not VBLANK. Default of 4 means the
Advanced Start Line is 4 lines before the first non VBLANK line

The following info I dug up internally may be useful:

The position of the CRTC output timing generator when the “start of
frame” indicator occurs depends on several conditions.  These
conditions are whether the timing generator is outputting timing for a
progressive or interlaced display mode, whether the scaler is enabled,
and if so, whether the register to select an earlier than normal
“start of frame” indicator is set.  The “start of frame” indicator
typically occurs 2 lines before the vertical blank end position
(DxCRTC_V_BLANK_END) is reached

When interlaced output display modes are enabled
(DxCRTC_INTERLACE_ENABLE = 1) and the CRTC timing generator is enabled
(DxCRTC_MASTER_EN = 1), the timing generator’s vertical counter counts
by 2 for both the even fields and odd fields of each frame.  For both
the even fields and the odd fields of interlaced output modes, the
“start of frame” indicator occurs 2 lines before the end of the
vertical blank occurs (DxCRTC_V_BLANK_END – 4 or DxCRTC_V_BLANK_END –
3 depending on the field since the vertical counter counts by 2 in
interlaced), since the vertical counter counts by 2 in this mode).
There is one exception to the previous statement.  If scaling is
enabled (DxSCL_SCALE_EN = 1) and the early start line is enabled
(DxCRTC_INTERLACE_START_LINE_EARLY = 1) in interlaced output mode then
the “start of frame” indicator happens 3 lines before the end of the
vertical blank (DxCRTC_V_BLANK_END – 6 or DxCRTC_V_BLANK_END – 5
depending on the field since the vertical counter counts by 2 in
interlaced).

When progressive output display modes are enabled
(DxCRTC_INTERLACE_ENABLE = 0) and the CRTC timing generator is enabled
(DxCRTC_MASTER_EN = 1), the “start of frame” indicator occurs 2 lines
before the end of the vertical blank occurs (DxCRTC_V_BLANK_END - 2).
Similar to interlaced output mode, there is one exception to the
previous statement.  If scaling is enabled (DxSCL_SCALE_EN = 1) and
the early start line is enabled (DxCRTC_PROGRESSIVE_START_LINE_EARLY =
1) in progressive output mode then the “start of frame” indicator
happens 3 lines before the end of the vertical blank
(DxCRTC_V_BLANK_END – 3)

I hope this helps,

Alex

>
>
> If we can fix this and get it into rc2 or rc3 then we could avoid a bad
> regression and with a bit of luck at the same time improve by being able to
> set dev->vblank_disable_immediate = true then and allow vblank irqs to get
> turned off more aggressively for a bit of extra power saving.
>
> thanks,
> -mario
Mario Kleiner Nov. 20, 2015, 3:24 p.m. UTC | #5
On 11/19/2015 08:45 PM, Ville Syrjälä wrote:
> On Thu, Nov 19, 2015 at 08:12:24PM +0100, Mario Kleiner wrote:
>> On 11/19/2015 07:20 PM, Ville Syrjälä wrote:
>>> On Thu, Nov 19, 2015 at 06:46:28PM +0100, Mario Kleiner wrote:
>>>> Hi Alex and Michel and Ville,
>>>>
>>>> it's "fix vblank stuff" time again ;-)
>>>>
>>>> Ville's changes to the DRM's drm_handle_vblank() /
>>>> drm_update_vblank_count() code in Linux 4.4 not only made that code more
>>>> elegant, but also removed the robustness against the vblank irq quirks
>>>> in AMD hw and similar hardware. So now i get tons of off-by-one errors and
>>>>
>>>> "[   432.345] (WW) RADEON(1): radeon_dri2_flip_event_handler: Pageflip
>>>> completion event has impossible msc 24803 < target_msc 24804" XOrg
>>>> messages from that kernel.
>>>
>>> Argh. Sorry about that.
>>>
>>
>> On the plus side, your "vblank timestamp deltas as fake vblank counters"
>> code seems to work nicely on nouveau-kms, as far as testing with three
>> Nvidia's went so far :). And both Intel gpu's (HD Ironlake, and
>> Ivybridge) i tested checked out nicely.
>>
>> And at least the recent nv50+ NVidia Tesla also have 16 bit vblank
>> counters which we could implement in nouveau, maybe with the same
>> trickery to allow long trouble-free vblank off periods, hopefully that
>> would also apply to the Tegra-4 and later Kepler based parts. Tegra-3
>> will probably also work. I think i read in the Tegra-3 PRM that the sync
>> points they use to implement vblank counters do increment at leading
>> edge of vblank.
>>
>> The only problem we may have is that some of the embedded gpus may not
>> have compliant vblank counters and they probably also lack vblank
>> timestamping, so it might be a good idea to rather not use vblank
>> counters at all in those drivers - patch their kms drivers to
>> max_vblank_count = 0;
>>
>>>>
>>>> One of the reasons for trouble is that AMD hw quirk where the hw fires
>>>> an extra vblank irq shortly after vblank irq's get enabled, not
>>>> synchronized to vblank, but typically in the middle of active scanout,
>>>> so we get a redundant call to drm_handle_vblank in the middle of scanout.
>>>
>>> I think that should be fine as such. The code should ignore redudntant
>>> vbl irqs. Well, assuming you have a reliable hw counter or you use the
>>> timestamp guesstimate mechanism and your scanout position is reported
>>> accurately. But I guess you have a bit of problem with both.
>>>
>>
>> The problem is i'll need to treat calls to radeon kms
>> driver->get_vblank_counter differently, depending if the function gets
>> called from vblank irq, or from regular code, so that hw quirk that
>> causes spontaneous misfiring of the vblank irq in the middle of scanout
>> would confuse my hw vblank counter cooking method to produce a fake hw
>> vblank counter increment. That's why i moved the filtering for redundant
>> irqs based on vblank timestamps in drm_vblank_update() around to always
>> apply. Makes us robust against that type of hw quirk in general and
>> makes life for the vblank counter cooking so much easier.
>>
>> It's a beautiful collaboration of different hw bugs to make things
>> interesting :)
>>
>>>>
>>>> To fix that i have a minor patch to make drm_update_vblank_count() again
>>>> robust against such redundant calls, which i will send out later to the
>>>> mailing list. Diff attached for reference.
>>>>
>>>> The second quirk of AMD hw is that the vblank interrupt fires a few
>>>> scanlines before start of vblank, so drm_handle_vblank ->
>>>> drm_update_vblank_count() -> dev->driver->get_vblank_counter() gets
>>>> called before the start of the vblank for which the new vblank count
>>>> should be queried.
>>>
>>> Does it fire too soon, or is the scanout position register value(s)
>>> just offset by a few lines perhaps?
>>>
>>> We have that with i915 and I simply fix up the value when reading it
>>> out. Fortunately for us the offset is constant (or at least seems to
>>> be) for a given platform/connector combo.
>>>
>>
>> I think they fire too soon, from all i've seen so far on a few cards.
>
> That's unfortunate. Firing a bit too late would be perfectly fine for
> most things. And that's actually what happens on older intel hw. Firing
> too soon opens up some more races, as in you may have to wait a bit
> more after getting woken up to make sure you've crossed into the
> freame you were waiting for.
>
> Also not sure how to deal with that sort of thing in a reasonable way in
> .get_vblank_timestamp(). DRM_CALLED_FROM_VBLIRQ gets passed there, so it
> could fudge things a bit when the early irq arrives, but then if someone
> calls drm_update_vblank_count() after the irq was handled but before
> start of vblank you'll end up with a -1 diff.
>
> Maybe something like:
> .get_scanout_positon()
> {
> 	read frame counter and scaline position
>
> 	if ((flags & DRM_CALLED_FROM_VBLIRQ || frame == last_fudge_frame) &&
> 	     scanline_slightly_below_start_of_vblank)) {
> 		last_fudge_frame = frame;
> 		scanline = start_of_vblank;
> 	} else
> 		last_fudge_frame = frame - 1;
> }
>
> could be done. So it would pretend the scanline has already crossed into
> the vblank when the interrupt arrives, and it should keep doing that
> until the frame counter indicates that you really crossed there.
>

What we do in radeon-kms is similar. If DRM_CALLED_FROM_VBLIRQ and we 
are no more than 1% of the display height away from start of vblank we 
fudge scanout position in a way so that the timestamp gets computed for 
the soon-to-begin vblank, not the old one. For the vblank counter 
cooking atm. i check for in_irq(), assuming that means the same as 
DRM_CALLED_FROM_VBLIRQ, given that is currently the only call from a hw 
irq path, and then i test scanout position for 
no-more-than-10-scanlines-to-vblank, to decide if i have to bump the 
returned vblank count. On the sample of gpus i tested the vblank irq 
fired somewhere between 2 and 4 scanlines too early, so 10 scanlines 
margin should be ok. So it's not firing much too early, just a little bit.

>>
>>>>
>>>> The third problem is that the DRM vblank handling always had the
>>>> assumption that hardware vblank counters would essentially increment at
>>>> leading edge of vblank - basically in sync with the firing of the vblank
>>>> irq, so that a hw counter readout from within the vblank irq handler
>>>> would always deliver the new incremented value. If this assumption is
>>>> violated then the counting by use of the hw counter gets unreliable,
>>>> because depending on random small delays in irq handling the code may
>>>> end up sampling the hw counter pre- or post-increment, leading to
>>>> inconsistent updating and funky bugs. It just so happens that AMD
>>>> hardware doesn't increment the hw counter at leading edge of vblank, so
>>>> stuff falls apart.
>>>>
>>>> So to fix those two problems i'm tinkering with cooking the hw vblank
>>>> counter value returned by radeon_get_vblank_counter_kms() to make it
>>>> appear as if the counter incremented at leading edge of vblank in sync
>>>> with vblank irq.
>>>
>>> Yeah, I do that in i915 too. Well, on some platforms where the
>>> counter increments at the leading edge of active.
>>>
>>
>> Yep. I think that's key to make the vblank instant off stuff work reliably.
>
> I think it should work OK with the timestamp based stuff too, assuming
> you have accurate scanout position reporting. So having a reliable hw
> counter shouldn't be a requirement for that.
>

Yes, should work for the instant off/on itself. But the limitation of 
that is that with the timestamp based stuff we can only be accurate for 
vblank off periods of a few seconds into the future. The framedur_ns we 
calculate from hw mode and the refresh duration computed from the 
timestamps as measured by the host clock can easily deviate by, e.g., 10 
usecs due to difference and drift between the gpu pll and host clock, so 
for each passing refresh cycle with vblank off we'd accumulate ~10 usecs 
of error. After accumulating half a refresh duration of error we'd round 
to the wrong count and be off by one. So at 60 Hz refresh we'd have 
about 16666 usecs / 2 = 8333 usecs headroom / 10 usecs per frame error ~ 
833 refresh cycles before errors happen. At 60 Hz we'd have 833 / 60 ~ 
14 seconds of vblank off. For rates of 120 Hz gamer panels or some 200 
Hz neuro-science applications we'd only get a few seconds off error free 
vblank off without real hardware counters.

So it's good enough for typical desktop 
applications/compositors/games/media players, and a nice improvement 
over the previous state, but not quite sufficient for applications that 
need long time consistent vblank counts for long waits of multiple 
seconds or even minutes. So it's still good to have hw counters if we 
can get some that are reliable enough.

-mario


>>
>> -mario
>>
>>>>
>>>> It almost sort of works on the rs600 code path, but i need a bit of info
>>>> from you:
>>>>
>>>> 1. There's this register from the old specs for m76.pdf, which is not
>>>> part of the current register defines for radeon-kms:
>>>>
>>>> "D1CRTC_STATUS_VF_COUNT - RW - 32 bits - [GpuF0MMReg:0x60A8]"
>>>>
>>>> It contains the lower 16 bits of framecounter and the 13 bits of
>>>> vertical scanout position. It seems to give the same readings as the 24
>>>> bit R_0060A4_D1CRTC_STATUS_FRAME_COUNT we use for the hw counter. This
>>>> would come handy.
>>>>
>>>> Does Evergreen and later have a same/similar register and where is it?
>>>>
>>>> 2. The hw framecounter seems to increment when the vertical scanout
>>>> position wraps back from (VTOTAL - 1) to 0, at least on the one DCE-3
>>>> gpu i tested so far. Is this so on all asics? And is the hw counter
>>>> increment happening exactly at the moment that vertical scanout position
>>>> jumps back to zero, ie. both events are driven by the same signal? Or is
>>>> the framecounter increment just happening somewhere inside either
>>>> scanline VTOTAL-1 or scanline 0?
>>>>
>>>>
>>>> If we can fix this and get it into rc2 or rc3 then we could avoid a bad
>>>> regression and with a bit of luck at the same time improve by being able
>>>> to set dev->vblank_disable_immediate = true then and allow vblank irqs
>>>> to get turned off more aggressively for a bit of extra power saving.
>>>>
>>>> thanks,
>>>> -mario
>>>
>>>> -- a/drivers/gpu/drm/drm_irq.c
>>>> +++ b/drivers/gpu/drm/drm_irq.c
>>>> @@ -172,9 +172,11 @@ static void drm_update_vblank_count(struct drm_device *dev, unsigned int pipe,
>>>>                                       unsigned long flags)
>>>>    {
>>>>           struct drm_vblank_crtc *vblank = &dev->vblank[pipe];
>>>> -       u32 cur_vblank, diff;
>>>> +       u32 cur_vblank, diff = 0;
>>>>           bool rc;
>>>>           struct timeval t_vblank;
>>>> +       const struct timeval *t_old;
>>>> +       u64 diff_ns;
>>>>           int count = DRM_TIMESTAMP_MAXRETRIES;
>>>>           int framedur_ns = vblank->framedur_ns;
>>>>
>>>> @@ -195,13 +197,15 @@ static void drm_update_vblank_count(struct drm_device *dev, unsigned int pipe,
>>>>                   rc = drm_get_last_vbltimestamp(dev, pipe, &t_vblank, flags);
>>>>           } while (cur_vblank != dev->driver->get_vblank_counter(dev, pipe) && --count > 0);
>>>>
>>>> -       if (dev->max_vblank_count != 0) {
>>>> -               /* trust the hw counter when it's around */
>>>> -               diff = (cur_vblank - vblank->last) & dev->max_vblank_count;
>>>> -       } else if (rc && framedur_ns) {
>>>> -               const struct timeval *t_old;
>>>> -               u64 diff_ns;
>>>> -
>>>> +       /*
>>>> +        * Always use vblank timestamping based method if supported to reject
>>>> +        * redundant vblank irqs. E.g., AMD hardware needs this to not screw up
>>>> +        * due to some irq handling quirk.
>>>
>>> Hmm. I'm thinking it would be better to simply not claim that there's a hw
>>> counter if it isn't reliable.
>>>
>>>> +        *
>>>> +        * This also sets the diff value for use as fallback below in case the
>>>> +        * hw does not support a suitable hw vblank counter.
>>>> +        */
>>>> +       if (rc && framedur_ns) {
>>>>                   t_old = &vblanktimestamp(dev, pipe, vblank->count);
>>>>                   diff_ns = timeval_to_ns(&t_vblank) - timeval_to_ns(t_old);
>>>>
>>>> @@ -212,11 +216,28 @@ static void drm_update_vblank_count(struct drm_device *dev, unsigned int pipe,
>>>>                    */
>>>>                   diff = DIV_ROUND_CLOSEST_ULL(diff_ns, framedur_ns);
>>>>
>>>> -               if (diff == 0 && flags & DRM_CALLED_FROM_VBLIRQ)
>>>> -                       DRM_DEBUG_VBL("crtc %u: Redundant vblirq ignored."
>>>> -                                     " diff_ns = %lld, framedur_ns = %d)\n",
>>>> -                                     pipe, (long long) diff_ns, framedur_ns);
>>>> -       } else {
>>>> +               if (diff == 0 && flags & DRM_CALLED_FROM_VBLIRQ) {
>>>> +                   DRM_DEBUG_VBL("crtc %u: Redundant vblirq ignored."
>>>> +                   " diff_ns = %lld, framedur_ns = %d)\n",
>>>> +                   pipe, (long long) diff_ns, framedur_ns);
>>>> +
>>>> +                   /* Treat this redundant invocation as no-op. */
>>>> +                   WARN_ON_ONCE(cur_vblank != vblank->last);
>>>> +                   return;
>>>> +               }
>>>> +       }
>>>> +
>>>> +       /*
>>>> +        * hw counters, if marked as supported via max_vblank_count != 0,
>>>> +        * *must* increment at leading edge of vblank and in sync with
>>>> +        * the firing of the hw vblank irq, otherwise we have a race here
>>>> +        * between gpu and us, where small variations in our execution
>>>> +        * timing can cause off-by-one miscounting of vblanks.
>>>> +        */
>>>> +       if (dev->max_vblank_count != 0) {
>>>> +               /* trust the hw counter when it's around */
>>>> +               diff = (cur_vblank - vblank->last) & dev->max_vblank_count;
>>>> +       } else if (!(rc && framedur_ns)) {
>>>>                   /* some kind of default for drivers w/o accurate vbl timestamping */
>>>>                   diff = (flags & DRM_CALLED_FROM_VBLIRQ) != 0;
>>>>           }
>>>
>>>
>
Ville Syrjala Nov. 20, 2015, 3:34 p.m. UTC | #6
On Fri, Nov 20, 2015 at 04:24:50PM +0100, Mario Kleiner wrote:
> 
> 
> On 11/19/2015 08:45 PM, Ville Syrjälä wrote:
> > On Thu, Nov 19, 2015 at 08:12:24PM +0100, Mario Kleiner wrote:
> >> On 11/19/2015 07:20 PM, Ville Syrjälä wrote:
> >>> On Thu, Nov 19, 2015 at 06:46:28PM +0100, Mario Kleiner wrote:
> >>>> Hi Alex and Michel and Ville,
> >>>>
> >>>> it's "fix vblank stuff" time again ;-)
> >>>>
> >>>> Ville's changes to the DRM's drm_handle_vblank() /
> >>>> drm_update_vblank_count() code in Linux 4.4 not only made that code more
> >>>> elegant, but also removed the robustness against the vblank irq quirks
> >>>> in AMD hw and similar hardware. So now i get tons of off-by-one errors and
> >>>>
> >>>> "[   432.345] (WW) RADEON(1): radeon_dri2_flip_event_handler: Pageflip
> >>>> completion event has impossible msc 24803 < target_msc 24804" XOrg
> >>>> messages from that kernel.
> >>>
> >>> Argh. Sorry about that.
> >>>
> >>
> >> On the plus side, your "vblank timestamp deltas as fake vblank counters"
> >> code seems to work nicely on nouveau-kms, as far as testing with three
> >> Nvidia's went so far :). And both Intel gpu's (HD Ironlake, and
> >> Ivybridge) i tested checked out nicely.
> >>
> >> And at least the recent nv50+ NVidia Tesla also have 16 bit vblank
> >> counters which we could implement in nouveau, maybe with the same
> >> trickery to allow long trouble-free vblank off periods, hopefully that
> >> would also apply to the Tegra-4 and later Kepler based parts. Tegra-3
> >> will probably also work. I think i read in the Tegra-3 PRM that the sync
> >> points they use to implement vblank counters do increment at leading
> >> edge of vblank.
> >>
> >> The only problem we may have is that some of the embedded gpus may not
> >> have compliant vblank counters and they probably also lack vblank
> >> timestamping, so it might be a good idea to rather not use vblank
> >> counters at all in those drivers - patch their kms drivers to
> >> max_vblank_count = 0;
> >>
> >>>>
> >>>> One of the reasons for trouble is that AMD hw quirk where the hw fires
> >>>> an extra vblank irq shortly after vblank irq's get enabled, not
> >>>> synchronized to vblank, but typically in the middle of active scanout,
> >>>> so we get a redundant call to drm_handle_vblank in the middle of scanout.
> >>>
> >>> I think that should be fine as such. The code should ignore redudntant
> >>> vbl irqs. Well, assuming you have a reliable hw counter or you use the
> >>> timestamp guesstimate mechanism and your scanout position is reported
> >>> accurately. But I guess you have a bit of problem with both.
> >>>
> >>
> >> The problem is i'll need to treat calls to radeon kms
> >> driver->get_vblank_counter differently, depending if the function gets
> >> called from vblank irq, or from regular code, so that hw quirk that
> >> causes spontaneous misfiring of the vblank irq in the middle of scanout
> >> would confuse my hw vblank counter cooking method to produce a fake hw
> >> vblank counter increment. That's why i moved the filtering for redundant
> >> irqs based on vblank timestamps in drm_vblank_update() around to always
> >> apply. Makes us robust against that type of hw quirk in general and
> >> makes life for the vblank counter cooking so much easier.
> >>
> >> It's a beautiful collaboration of different hw bugs to make things
> >> interesting :)
> >>
> >>>>
> >>>> To fix that i have a minor patch to make drm_update_vblank_count() again
> >>>> robust against such redundant calls, which i will send out later to the
> >>>> mailing list. Diff attached for reference.
> >>>>
> >>>> The second quirk of AMD hw is that the vblank interrupt fires a few
> >>>> scanlines before start of vblank, so drm_handle_vblank ->
> >>>> drm_update_vblank_count() -> dev->driver->get_vblank_counter() gets
> >>>> called before the start of the vblank for which the new vblank count
> >>>> should be queried.
> >>>
> >>> Does it fire too soon, or is the scanout position register value(s)
> >>> just offset by a few lines perhaps?
> >>>
> >>> We have that with i915 and I simply fix up the value when reading it
> >>> out. Fortunately for us the offset is constant (or at least seems to
> >>> be) for a given platform/connector combo.
> >>>
> >>
> >> I think they fire too soon, from all i've seen so far on a few cards.
> >
> > That's unfortunate. Firing a bit too late would be perfectly fine for
> > most things. And that's actually what happens on older intel hw. Firing
> > too soon opens up some more races, as in you may have to wait a bit
> > more after getting woken up to make sure you've crossed into the
> > freame you were waiting for.
> >
> > Also not sure how to deal with that sort of thing in a reasonable way in
> > .get_vblank_timestamp(). DRM_CALLED_FROM_VBLIRQ gets passed there, so it
> > could fudge things a bit when the early irq arrives, but then if someone
> > calls drm_update_vblank_count() after the irq was handled but before
> > start of vblank you'll end up with a -1 diff.
> >
> > Maybe something like:
> > .get_scanout_positon()
> > {
> > 	read frame counter and scaline position
> >
> > 	if ((flags & DRM_CALLED_FROM_VBLIRQ || frame == last_fudge_frame) &&
> > 	     scanline_slightly_below_start_of_vblank)) {
> > 		last_fudge_frame = frame;
> > 		scanline = start_of_vblank;
> > 	} else
> > 		last_fudge_frame = frame - 1;
> > }
> >
> > could be done. So it would pretend the scanline has already crossed into
> > the vblank when the interrupt arrives, and it should keep doing that
> > until the frame counter indicates that you really crossed there.
> >
> 
> What we do in radeon-kms is similar. If DRM_CALLED_FROM_VBLIRQ and we 
> are no more than 1% of the display height away from start of vblank we 
> fudge scanout position in a way so that the timestamp gets computed for 
> the soon-to-begin vblank, not the old one.

The problem with basing that fudging purely on DRM_CALLED_FROM_VBLIRQ is
that if you call .get_scanout_positon() from a non-irq context between
the irq firing and start of vblank, you'll think you're still in the
previous frame. Hence my suggestion to note down the frame counter when
called from the irq, and then keep doing the fudging until you've truly
crossed the start of vblank.

> For the vblank counter 
> cooking atm. i check for in_irq(), assuming that means the same as 
> DRM_CALLED_FROM_VBLIRQ, given that is currently the only call from a hw 
> irq path, and then i test scanout position for 
> no-more-than-10-scanlines-to-vblank, to decide if i have to bump the 
> returned vblank count. On the sample of gpus i tested the vblank irq 
> fired somewhere between 2 and 4 scanlines too early, so 10 scanlines 
> margin should be ok. So it's not firing much too early, just a little bit.
> 
> >>
> >>>>
> >>>> The third problem is that the DRM vblank handling always had the
> >>>> assumption that hardware vblank counters would essentially increment at
> >>>> leading edge of vblank - basically in sync with the firing of the vblank
> >>>> irq, so that a hw counter readout from within the vblank irq handler
> >>>> would always deliver the new incremented value. If this assumption is
> >>>> violated then the counting by use of the hw counter gets unreliable,
> >>>> because depending on random small delays in irq handling the code may
> >>>> end up sampling the hw counter pre- or post-increment, leading to
> >>>> inconsistent updating and funky bugs. It just so happens that AMD
> >>>> hardware doesn't increment the hw counter at leading edge of vblank, so
> >>>> stuff falls apart.
> >>>>
> >>>> So to fix those two problems i'm tinkering with cooking the hw vblank
> >>>> counter value returned by radeon_get_vblank_counter_kms() to make it
> >>>> appear as if the counter incremented at leading edge of vblank in sync
> >>>> with vblank irq.
> >>>
> >>> Yeah, I do that in i915 too. Well, on some platforms where the
> >>> counter increments at the leading edge of active.
> >>>
> >>
> >> Yep. I think that's key to make the vblank instant off stuff work reliably.
> >
> > I think it should work OK with the timestamp based stuff too, assuming
> > you have accurate scanout position reporting. So having a reliable hw
> > counter shouldn't be a requirement for that.
> >
> 
> Yes, should work for the instant off/on itself. But the limitation of 
> that is that with the timestamp based stuff we can only be accurate for 
> vblank off periods of a few seconds into the future. The framedur_ns we 
> calculate from hw mode and the refresh duration computed from the 
> timestamps as measured by the host clock can easily deviate by, e.g., 10 
> usecs due to difference and drift between the gpu pll and host clock, so 
> for each passing refresh cycle with vblank off we'd accumulate ~10 usecs 
> of error. After accumulating half a refresh duration of error we'd round 
> to the wrong count and be off by one. So at 60 Hz refresh we'd have 
> about 16666 usecs / 2 = 8333 usecs headroom / 10 usecs per frame error ~ 
> 833 refresh cycles before errors happen. At 60 Hz we'd have 833 / 60 ~ 
> 14 seconds of vblank off. For rates of 120 Hz gamer panels or some 200 
> Hz neuro-science applications we'd only get a few seconds off error free 
> vblank off without real hardware counters.
> 
> So it's good enough for typical desktop 
> applications/compositors/games/media players, and a nice improvement 
> over the previous state, but not quite sufficient for applications that 
> need long time consistent vblank counts for long waits of multiple 
> seconds or even minutes. So it's still good to have hw counters if we 
> can get some that are reliable enough.

Ah, I didn't realize you care about small errors in the counter for
such long periods of vblank off.

> 
> -mario
> 
> 
> >>
> >> -mario
> >>
> >>>>
> >>>> It almost sort of works on the rs600 code path, but i need a bit of info
> >>>> from you:
> >>>>
> >>>> 1. There's this register from the old specs for m76.pdf, which is not
> >>>> part of the current register defines for radeon-kms:
> >>>>
> >>>> "D1CRTC_STATUS_VF_COUNT - RW - 32 bits - [GpuF0MMReg:0x60A8]"
> >>>>
> >>>> It contains the lower 16 bits of framecounter and the 13 bits of
> >>>> vertical scanout position. It seems to give the same readings as the 24
> >>>> bit R_0060A4_D1CRTC_STATUS_FRAME_COUNT we use for the hw counter. This
> >>>> would come handy.
> >>>>
> >>>> Does Evergreen and later have a same/similar register and where is it?
> >>>>
> >>>> 2. The hw framecounter seems to increment when the vertical scanout
> >>>> position wraps back from (VTOTAL - 1) to 0, at least on the one DCE-3
> >>>> gpu i tested so far. Is this so on all asics? And is the hw counter
> >>>> increment happening exactly at the moment that vertical scanout position
> >>>> jumps back to zero, ie. both events are driven by the same signal? Or is
> >>>> the framecounter increment just happening somewhere inside either
> >>>> scanline VTOTAL-1 or scanline 0?
> >>>>
> >>>>
> >>>> If we can fix this and get it into rc2 or rc3 then we could avoid a bad
> >>>> regression and with a bit of luck at the same time improve by being able
> >>>> to set dev->vblank_disable_immediate = true then and allow vblank irqs
> >>>> to get turned off more aggressively for a bit of extra power saving.
> >>>>
> >>>> thanks,
> >>>> -mario
> >>>
> >>>> -- a/drivers/gpu/drm/drm_irq.c
> >>>> +++ b/drivers/gpu/drm/drm_irq.c
> >>>> @@ -172,9 +172,11 @@ static void drm_update_vblank_count(struct drm_device *dev, unsigned int pipe,
> >>>>                                       unsigned long flags)
> >>>>    {
> >>>>           struct drm_vblank_crtc *vblank = &dev->vblank[pipe];
> >>>> -       u32 cur_vblank, diff;
> >>>> +       u32 cur_vblank, diff = 0;
> >>>>           bool rc;
> >>>>           struct timeval t_vblank;
> >>>> +       const struct timeval *t_old;
> >>>> +       u64 diff_ns;
> >>>>           int count = DRM_TIMESTAMP_MAXRETRIES;
> >>>>           int framedur_ns = vblank->framedur_ns;
> >>>>
> >>>> @@ -195,13 +197,15 @@ static void drm_update_vblank_count(struct drm_device *dev, unsigned int pipe,
> >>>>                   rc = drm_get_last_vbltimestamp(dev, pipe, &t_vblank, flags);
> >>>>           } while (cur_vblank != dev->driver->get_vblank_counter(dev, pipe) && --count > 0);
> >>>>
> >>>> -       if (dev->max_vblank_count != 0) {
> >>>> -               /* trust the hw counter when it's around */
> >>>> -               diff = (cur_vblank - vblank->last) & dev->max_vblank_count;
> >>>> -       } else if (rc && framedur_ns) {
> >>>> -               const struct timeval *t_old;
> >>>> -               u64 diff_ns;
> >>>> -
> >>>> +       /*
> >>>> +        * Always use vblank timestamping based method if supported to reject
> >>>> +        * redundant vblank irqs. E.g., AMD hardware needs this to not screw up
> >>>> +        * due to some irq handling quirk.
> >>>
> >>> Hmm. I'm thinking it would be better to simply not claim that there's a hw
> >>> counter if it isn't reliable.
> >>>
> >>>> +        *
> >>>> +        * This also sets the diff value for use as fallback below in case the
> >>>> +        * hw does not support a suitable hw vblank counter.
> >>>> +        */
> >>>> +       if (rc && framedur_ns) {
> >>>>                   t_old = &vblanktimestamp(dev, pipe, vblank->count);
> >>>>                   diff_ns = timeval_to_ns(&t_vblank) - timeval_to_ns(t_old);
> >>>>
> >>>> @@ -212,11 +216,28 @@ static void drm_update_vblank_count(struct drm_device *dev, unsigned int pipe,
> >>>>                    */
> >>>>                   diff = DIV_ROUND_CLOSEST_ULL(diff_ns, framedur_ns);
> >>>>
> >>>> -               if (diff == 0 && flags & DRM_CALLED_FROM_VBLIRQ)
> >>>> -                       DRM_DEBUG_VBL("crtc %u: Redundant vblirq ignored."
> >>>> -                                     " diff_ns = %lld, framedur_ns = %d)\n",
> >>>> -                                     pipe, (long long) diff_ns, framedur_ns);
> >>>> -       } else {
> >>>> +               if (diff == 0 && flags & DRM_CALLED_FROM_VBLIRQ) {
> >>>> +                   DRM_DEBUG_VBL("crtc %u: Redundant vblirq ignored."
> >>>> +                   " diff_ns = %lld, framedur_ns = %d)\n",
> >>>> +                   pipe, (long long) diff_ns, framedur_ns);
> >>>> +
> >>>> +                   /* Treat this redundant invocation as no-op. */
> >>>> +                   WARN_ON_ONCE(cur_vblank != vblank->last);
> >>>> +                   return;
> >>>> +               }
> >>>> +       }
> >>>> +
> >>>> +       /*
> >>>> +        * hw counters, if marked as supported via max_vblank_count != 0,
> >>>> +        * *must* increment at leading edge of vblank and in sync with
> >>>> +        * the firing of the hw vblank irq, otherwise we have a race here
> >>>> +        * between gpu and us, where small variations in our execution
> >>>> +        * timing can cause off-by-one miscounting of vblanks.
> >>>> +        */
> >>>> +       if (dev->max_vblank_count != 0) {
> >>>> +               /* trust the hw counter when it's around */
> >>>> +               diff = (cur_vblank - vblank->last) & dev->max_vblank_count;
> >>>> +       } else if (!(rc && framedur_ns)) {
> >>>>                   /* some kind of default for drivers w/o accurate vbl timestamping */
> >>>>                   diff = (flags & DRM_CALLED_FROM_VBLIRQ) != 0;
> >>>>           }
> >>>
> >>>
> >
Mario Kleiner Nov. 23, 2015, 3:23 p.m. UTC | #7
On 11/20/2015 04:34 PM, Ville Syrjälä wrote:
> On Fri, Nov 20, 2015 at 04:24:50PM +0100, Mario Kleiner wrote:

...

>> What we do in radeon-kms is similar. If DRM_CALLED_FROM_VBLIRQ and we
>> are no more than 1% of the display height away from start of vblank we
>> fudge scanout position in a way so that the timestamp gets computed for
>> the soon-to-begin vblank, not the old one.
>
> The problem with basing that fudging purely on DRM_CALLED_FROM_VBLIRQ is
> that if you call .get_scanout_positon() from a non-irq context between
> the irq firing and start of vblank, you'll think you're still in the
> previous frame. Hence my suggestion to note down the frame counter when
> called from the irq, and then keep doing the fudging until you've truly
> crossed the start of vblank.
>

Ok, but why would that be a bad thing? I think we want it to think it is 
in the previous frame if it is called outside the vblank irq context. 
The only reason we fudge it to the next frames vblank if i vblank irq is 
because we know the vblank irq handler we are executing atm. was meant 
to execute within the upcoming vblank for the next frame, so we fudge 
the scanout positions and thereby timestamp to correspond to that new 
frame. But if something called outside irq context it should get a 
scanout position/timestamp that corresponds to "reality".

It would maybe be a problem to get different answers for a query at the 
same scanout position if we used .get_scanout_position() for something 
else than for calculating vblank timestamps, but we don't do that atm.

Maybe i'm overlooking something here related to the somewhat rewritten 
code from the last year or so? But in the original design this would be 
exactly what i intended?

...

>> So it's good enough for typical desktop
>> applications/compositors/games/media players, and a nice improvement
>> over the previous state, but not quite sufficient for applications that
>> need long time consistent vblank counts for long waits of multiple
>> seconds or even minutes. So it's still good to have hw counters if we
>> can get some that are reliable enough.
>
> Ah, I didn't realize you care about small errors in the counter for
> such long periods of vblank off.
>

Actually, you are right, i was stupid - not enough sleep last friday. I 
do care about such small errors, but the long vblank off periods don't 
matter at all the way my software works. I query the current count and 
timestamp (glXGetSyncValuesOML), calculate a target count based on those 
and then schedule a swap for that target count via glXSwapBuffersMscOML. 
That swapbuffers call will keep vblank irqs on until the kms pageflip is 
queued. So i only care about vblank counts/timestamps being consistent 
for short amounts of time, typically 1 video refresh from vblank query 
to queueing a vblank event, and then from reception of that 
event/queuing the pageflip to pageflip completion event. So even if the 
system would be heavily loaded and my code would have big preemption 
delays i think counts that are consistent over a few seconds would be 
enough to keep things working. Otherwise it wouldn't work now either 
with a vblank off after 5 seconds and nouveau not having vblank hw counters.

-mario


>>
>> -mario
>>
>>
>>>>
>>>> -mario
>>>>
>>>>>>
>>>>>> It almost sort of works on the rs600 code path, but i need a bit of info
>>>>>> from you:
>>>>>>
>>>>>> 1. There's this register from the old specs for m76.pdf, which is not
>>>>>> part of the current register defines for radeon-kms:
>>>>>>
>>>>>> "D1CRTC_STATUS_VF_COUNT - RW - 32 bits - [GpuF0MMReg:0x60A8]"
>>>>>>
>>>>>> It contains the lower 16 bits of framecounter and the 13 bits of
>>>>>> vertical scanout position. It seems to give the same readings as the 24
>>>>>> bit R_0060A4_D1CRTC_STATUS_FRAME_COUNT we use for the hw counter. This
>>>>>> would come handy.
>>>>>>
>>>>>> Does Evergreen and later have a same/similar register and where is it?
>>>>>>
>>>>>> 2. The hw framecounter seems to increment when the vertical scanout
>>>>>> position wraps back from (VTOTAL - 1) to 0, at least on the one DCE-3
>>>>>> gpu i tested so far. Is this so on all asics? And is the hw counter
>>>>>> increment happening exactly at the moment that vertical scanout position
>>>>>> jumps back to zero, ie. both events are driven by the same signal? Or is
>>>>>> the framecounter increment just happening somewhere inside either
>>>>>> scanline VTOTAL-1 or scanline 0?
>>>>>>
>>>>>>
>>>>>> If we can fix this and get it into rc2 or rc3 then we could avoid a bad
>>>>>> regression and with a bit of luck at the same time improve by being able
>>>>>> to set dev->vblank_disable_immediate = true then and allow vblank irqs
>>>>>> to get turned off more aggressively for a bit of extra power saving.
>>>>>>
>>>>>> thanks,
>>>>>> -mario
>>>>>
>>>>>> -- a/drivers/gpu/drm/drm_irq.c
>>>>>> +++ b/drivers/gpu/drm/drm_irq.c
>>>>>> @@ -172,9 +172,11 @@ static void drm_update_vblank_count(struct drm_device *dev, unsigned int pipe,
>>>>>>                                        unsigned long flags)
>>>>>>     {
>>>>>>            struct drm_vblank_crtc *vblank = &dev->vblank[pipe];
>>>>>> -       u32 cur_vblank, diff;
>>>>>> +       u32 cur_vblank, diff = 0;
>>>>>>            bool rc;
>>>>>>            struct timeval t_vblank;
>>>>>> +       const struct timeval *t_old;
>>>>>> +       u64 diff_ns;
>>>>>>            int count = DRM_TIMESTAMP_MAXRETRIES;
>>>>>>            int framedur_ns = vblank->framedur_ns;
>>>>>>
>>>>>> @@ -195,13 +197,15 @@ static void drm_update_vblank_count(struct drm_device *dev, unsigned int pipe,
>>>>>>                    rc = drm_get_last_vbltimestamp(dev, pipe, &t_vblank, flags);
>>>>>>            } while (cur_vblank != dev->driver->get_vblank_counter(dev, pipe) && --count > 0);
>>>>>>
>>>>>> -       if (dev->max_vblank_count != 0) {
>>>>>> -               /* trust the hw counter when it's around */
>>>>>> -               diff = (cur_vblank - vblank->last) & dev->max_vblank_count;
>>>>>> -       } else if (rc && framedur_ns) {
>>>>>> -               const struct timeval *t_old;
>>>>>> -               u64 diff_ns;
>>>>>> -
>>>>>> +       /*
>>>>>> +        * Always use vblank timestamping based method if supported to reject
>>>>>> +        * redundant vblank irqs. E.g., AMD hardware needs this to not screw up
>>>>>> +        * due to some irq handling quirk.
>>>>>
>>>>> Hmm. I'm thinking it would be better to simply not claim that there's a hw
>>>>> counter if it isn't reliable.
>>>>>
>>>>>> +        *
>>>>>> +        * This also sets the diff value for use as fallback below in case the
>>>>>> +        * hw does not support a suitable hw vblank counter.
>>>>>> +        */
>>>>>> +       if (rc && framedur_ns) {
>>>>>>                    t_old = &vblanktimestamp(dev, pipe, vblank->count);
>>>>>>                    diff_ns = timeval_to_ns(&t_vblank) - timeval_to_ns(t_old);
>>>>>>
>>>>>> @@ -212,11 +216,28 @@ static void drm_update_vblank_count(struct drm_device *dev, unsigned int pipe,
>>>>>>                     */
>>>>>>                    diff = DIV_ROUND_CLOSEST_ULL(diff_ns, framedur_ns);
>>>>>>
>>>>>> -               if (diff == 0 && flags & DRM_CALLED_FROM_VBLIRQ)
>>>>>> -                       DRM_DEBUG_VBL("crtc %u: Redundant vblirq ignored."
>>>>>> -                                     " diff_ns = %lld, framedur_ns = %d)\n",
>>>>>> -                                     pipe, (long long) diff_ns, framedur_ns);
>>>>>> -       } else {
>>>>>> +               if (diff == 0 && flags & DRM_CALLED_FROM_VBLIRQ) {
>>>>>> +                   DRM_DEBUG_VBL("crtc %u: Redundant vblirq ignored."
>>>>>> +                   " diff_ns = %lld, framedur_ns = %d)\n",
>>>>>> +                   pipe, (long long) diff_ns, framedur_ns);
>>>>>> +
>>>>>> +                   /* Treat this redundant invocation as no-op. */
>>>>>> +                   WARN_ON_ONCE(cur_vblank != vblank->last);
>>>>>> +                   return;
>>>>>> +               }
>>>>>> +       }
>>>>>> +
>>>>>> +       /*
>>>>>> +        * hw counters, if marked as supported via max_vblank_count != 0,
>>>>>> +        * *must* increment at leading edge of vblank and in sync with
>>>>>> +        * the firing of the hw vblank irq, otherwise we have a race here
>>>>>> +        * between gpu and us, where small variations in our execution
>>>>>> +        * timing can cause off-by-one miscounting of vblanks.
>>>>>> +        */
>>>>>> +       if (dev->max_vblank_count != 0) {
>>>>>> +               /* trust the hw counter when it's around */
>>>>>> +               diff = (cur_vblank - vblank->last) & dev->max_vblank_count;
>>>>>> +       } else if (!(rc && framedur_ns)) {
>>>>>>                    /* some kind of default for drivers w/o accurate vbl timestamping */
>>>>>>                    diff = (flags & DRM_CALLED_FROM_VBLIRQ) != 0;
>>>>>>            }
>>>>>
>>>>>
>>>
>
Ville Syrjala Nov. 23, 2015, 3:51 p.m. UTC | #8
On Mon, Nov 23, 2015 at 04:23:21PM +0100, Mario Kleiner wrote:
> On 11/20/2015 04:34 PM, Ville Syrjälä wrote:
> > On Fri, Nov 20, 2015 at 04:24:50PM +0100, Mario Kleiner wrote:
> 
> ...
> 
> >> What we do in radeon-kms is similar. If DRM_CALLED_FROM_VBLIRQ and we
> >> are no more than 1% of the display height away from start of vblank we
> >> fudge scanout position in a way so that the timestamp gets computed for
> >> the soon-to-begin vblank, not the old one.
> >
> > The problem with basing that fudging purely on DRM_CALLED_FROM_VBLIRQ is
> > that if you call .get_scanout_positon() from a non-irq context between
> > the irq firing and start of vblank, you'll think you're still in the
> > previous frame. Hence my suggestion to note down the frame counter when
> > called from the irq, and then keep doing the fudging until you've truly
> > crossed the start of vblank.
> >
> 
> Ok, but why would that be a bad thing? I think we want it to think it is 
> in the previous frame if it is called outside the vblank irq context. 
> The only reason we fudge it to the next frames vblank if i vblank irq is 
> because we know the vblank irq handler we are executing atm. was meant 
> to execute within the upcoming vblank for the next frame, so we fudge 
> the scanout positions and thereby timestamp to correspond to that new 
> frame. But if something called outside irq context it should get a 
> scanout position/timestamp that corresponds to "reality".

It would be a bad thing since it would cause the timestamp to jump
backwards, and that would also cause the frame count guesstimate to go
backwards.

> 
> It would maybe be a problem to get different answers for a query at the 
> same scanout position if we used .get_scanout_position() for something 
> else than for calculating vblank timestamps, but we don't do that atm.
> 
> Maybe i'm overlooking something here related to the somewhat rewritten 
> code from the last year or so? But in the original design this would be 
> exactly what i intended?
> 
> ...
> 
> >> So it's good enough for typical desktop
> >> applications/compositors/games/media players, and a nice improvement
> >> over the previous state, but not quite sufficient for applications that
> >> need long time consistent vblank counts for long waits of multiple
> >> seconds or even minutes. So it's still good to have hw counters if we
> >> can get some that are reliable enough.
> >
> > Ah, I didn't realize you care about small errors in the counter for
> > such long periods of vblank off.
> >
> 
> Actually, you are right, i was stupid - not enough sleep last friday. I 
> do care about such small errors, but the long vblank off periods don't 
> matter at all the way my software works. I query the current count and 
> timestamp (glXGetSyncValuesOML), calculate a target count based on those 
> and then schedule a swap for that target count via glXSwapBuffersMscOML. 
> That swapbuffers call will keep vblank irqs on until the kms pageflip is 
> queued. So i only care about vblank counts/timestamps being consistent 
> for short amounts of time, typically 1 video refresh from vblank query 
> to queueing a vblank event, and then from reception of that 
> event/queuing the pageflip to pageflip completion event. So even if the 
> system would be heavily loaded and my code would have big preemption 
> delays i think counts that are consistent over a few seconds would be 
> enough to keep things working. Otherwise it wouldn't work now either 
> with a vblank off after 5 seconds and nouveau not having vblank hw counters.
> 
> -mario
> 
> 
> >>
> >> -mario
> >>
> >>
> >>>>
> >>>> -mario
> >>>>
> >>>>>>
> >>>>>> It almost sort of works on the rs600 code path, but i need a bit of info
> >>>>>> from you:
> >>>>>>
> >>>>>> 1. There's this register from the old specs for m76.pdf, which is not
> >>>>>> part of the current register defines for radeon-kms:
> >>>>>>
> >>>>>> "D1CRTC_STATUS_VF_COUNT - RW - 32 bits - [GpuF0MMReg:0x60A8]"
> >>>>>>
> >>>>>> It contains the lower 16 bits of framecounter and the 13 bits of
> >>>>>> vertical scanout position. It seems to give the same readings as the 24
> >>>>>> bit R_0060A4_D1CRTC_STATUS_FRAME_COUNT we use for the hw counter. This
> >>>>>> would come handy.
> >>>>>>
> >>>>>> Does Evergreen and later have a same/similar register and where is it?
> >>>>>>
> >>>>>> 2. The hw framecounter seems to increment when the vertical scanout
> >>>>>> position wraps back from (VTOTAL - 1) to 0, at least on the one DCE-3
> >>>>>> gpu i tested so far. Is this so on all asics? And is the hw counter
> >>>>>> increment happening exactly at the moment that vertical scanout position
> >>>>>> jumps back to zero, ie. both events are driven by the same signal? Or is
> >>>>>> the framecounter increment just happening somewhere inside either
> >>>>>> scanline VTOTAL-1 or scanline 0?
> >>>>>>
> >>>>>>
> >>>>>> If we can fix this and get it into rc2 or rc3 then we could avoid a bad
> >>>>>> regression and with a bit of luck at the same time improve by being able
> >>>>>> to set dev->vblank_disable_immediate = true then and allow vblank irqs
> >>>>>> to get turned off more aggressively for a bit of extra power saving.
> >>>>>>
> >>>>>> thanks,
> >>>>>> -mario
> >>>>>
> >>>>>> -- a/drivers/gpu/drm/drm_irq.c
> >>>>>> +++ b/drivers/gpu/drm/drm_irq.c
> >>>>>> @@ -172,9 +172,11 @@ static void drm_update_vblank_count(struct drm_device *dev, unsigned int pipe,
> >>>>>>                                        unsigned long flags)
> >>>>>>     {
> >>>>>>            struct drm_vblank_crtc *vblank = &dev->vblank[pipe];
> >>>>>> -       u32 cur_vblank, diff;
> >>>>>> +       u32 cur_vblank, diff = 0;
> >>>>>>            bool rc;
> >>>>>>            struct timeval t_vblank;
> >>>>>> +       const struct timeval *t_old;
> >>>>>> +       u64 diff_ns;
> >>>>>>            int count = DRM_TIMESTAMP_MAXRETRIES;
> >>>>>>            int framedur_ns = vblank->framedur_ns;
> >>>>>>
> >>>>>> @@ -195,13 +197,15 @@ static void drm_update_vblank_count(struct drm_device *dev, unsigned int pipe,
> >>>>>>                    rc = drm_get_last_vbltimestamp(dev, pipe, &t_vblank, flags);
> >>>>>>            } while (cur_vblank != dev->driver->get_vblank_counter(dev, pipe) && --count > 0);
> >>>>>>
> >>>>>> -       if (dev->max_vblank_count != 0) {
> >>>>>> -               /* trust the hw counter when it's around */
> >>>>>> -               diff = (cur_vblank - vblank->last) & dev->max_vblank_count;
> >>>>>> -       } else if (rc && framedur_ns) {
> >>>>>> -               const struct timeval *t_old;
> >>>>>> -               u64 diff_ns;
> >>>>>> -
> >>>>>> +       /*
> >>>>>> +        * Always use vblank timestamping based method if supported to reject
> >>>>>> +        * redundant vblank irqs. E.g., AMD hardware needs this to not screw up
> >>>>>> +        * due to some irq handling quirk.
> >>>>>
> >>>>> Hmm. I'm thinking it would be better to simply not claim that there's a hw
> >>>>> counter if it isn't reliable.
> >>>>>
> >>>>>> +        *
> >>>>>> +        * This also sets the diff value for use as fallback below in case the
> >>>>>> +        * hw does not support a suitable hw vblank counter.
> >>>>>> +        */
> >>>>>> +       if (rc && framedur_ns) {
> >>>>>>                    t_old = &vblanktimestamp(dev, pipe, vblank->count);
> >>>>>>                    diff_ns = timeval_to_ns(&t_vblank) - timeval_to_ns(t_old);
> >>>>>>
> >>>>>> @@ -212,11 +216,28 @@ static void drm_update_vblank_count(struct drm_device *dev, unsigned int pipe,
> >>>>>>                     */
> >>>>>>                    diff = DIV_ROUND_CLOSEST_ULL(diff_ns, framedur_ns);
> >>>>>>
> >>>>>> -               if (diff == 0 && flags & DRM_CALLED_FROM_VBLIRQ)
> >>>>>> -                       DRM_DEBUG_VBL("crtc %u: Redundant vblirq ignored."
> >>>>>> -                                     " diff_ns = %lld, framedur_ns = %d)\n",
> >>>>>> -                                     pipe, (long long) diff_ns, framedur_ns);
> >>>>>> -       } else {
> >>>>>> +               if (diff == 0 && flags & DRM_CALLED_FROM_VBLIRQ) {
> >>>>>> +                   DRM_DEBUG_VBL("crtc %u: Redundant vblirq ignored."
> >>>>>> +                   " diff_ns = %lld, framedur_ns = %d)\n",
> >>>>>> +                   pipe, (long long) diff_ns, framedur_ns);
> >>>>>> +
> >>>>>> +                   /* Treat this redundant invocation as no-op. */
> >>>>>> +                   WARN_ON_ONCE(cur_vblank != vblank->last);
> >>>>>> +                   return;
> >>>>>> +               }
> >>>>>> +       }
> >>>>>> +
> >>>>>> +       /*
> >>>>>> +        * hw counters, if marked as supported via max_vblank_count != 0,
> >>>>>> +        * *must* increment at leading edge of vblank and in sync with
> >>>>>> +        * the firing of the hw vblank irq, otherwise we have a race here
> >>>>>> +        * between gpu and us, where small variations in our execution
> >>>>>> +        * timing can cause off-by-one miscounting of vblanks.
> >>>>>> +        */
> >>>>>> +       if (dev->max_vblank_count != 0) {
> >>>>>> +               /* trust the hw counter when it's around */
> >>>>>> +               diff = (cur_vblank - vblank->last) & dev->max_vblank_count;
> >>>>>> +       } else if (!(rc && framedur_ns)) {
> >>>>>>                    /* some kind of default for drivers w/o accurate vbl timestamping */
> >>>>>>                    diff = (flags & DRM_CALLED_FROM_VBLIRQ) != 0;
> >>>>>>            }
> >>>>>
> >>>>>
> >>>
> >
Mario Kleiner Nov. 23, 2015, 5:58 p.m. UTC | #9
On 11/23/2015 04:51 PM, Ville Syrjälä wrote:
> On Mon, Nov 23, 2015 at 04:23:21PM +0100, Mario Kleiner wrote:
>> On 11/20/2015 04:34 PM, Ville Syrjälä wrote:
>>> On Fri, Nov 20, 2015 at 04:24:50PM +0100, Mario Kleiner wrote:
>>
>> ...
>> Ok, but why would that be a bad thing? I think we want it to think it is
>> in the previous frame if it is called outside the vblank irq context.
>> The only reason we fudge it to the next frames vblank if i vblank irq is
>> because we know the vblank irq handler we are executing atm. was meant
>> to execute within the upcoming vblank for the next frame, so we fudge
>> the scanout positions and thereby timestamp to correspond to that new
>> frame. But if something called outside irq context it should get a
>> scanout position/timestamp that corresponds to "reality".
>
> It would be a bad thing since it would cause the timestamp to jump
> backwards, and that would also cause the frame count guesstimate to go
> backwards.
>

But only if we don't use the dev->driver->get_vblank_counter() method, 
which we try to use on AMD. Also dev->vblank_time_lock should protect us 
from concurrent execution of the vblank counting/timestamping in irq 
context and non-irq context? At least that was one of the purposes of 
that lock in the past?

-mario

>>
>> It would maybe be a problem to get different answers for a query at the
>> same scanout position if we used .get_scanout_position() for something
>> else than for calculating vblank timestamps, but we don't do that atm.
>>
>> Maybe i'm overlooking something here related to the somewhat rewritten
>> code from the last year or so? But in the original design this would be
>> exactly what i intended?
>>
>> ...
>>
>>>> So it's good enough for typical desktop
>>>> applications/compositors/games/media players, and a nice improvement
>>>> over the previous state, but not quite sufficient for applications that
>>>> need long time consistent vblank counts for long waits of multiple
>>>> seconds or even minutes. So it's still good to have hw counters if we
>>>> can get some that are reliable enough.
>>>
>>> Ah, I didn't realize you care about small errors in the counter for
>>> such long periods of vblank off.
>>>
>>
>> Actually, you are right, i was stupid - not enough sleep last friday. I
>> do care about such small errors, but the long vblank off periods don't
>> matter at all the way my software works. I query the current count and
>> timestamp (glXGetSyncValuesOML), calculate a target count based on those
>> and then schedule a swap for that target count via glXSwapBuffersMscOML.
>> That swapbuffers call will keep vblank irqs on until the kms pageflip is
>> queued. So i only care about vblank counts/timestamps being consistent
>> for short amounts of time, typically 1 video refresh from vblank query
>> to queueing a vblank event, and then from reception of that
>> event/queuing the pageflip to pageflip completion event. So even if the
>> system would be heavily loaded and my code would have big preemption
>> delays i think counts that are consistent over a few seconds would be
>> enough to keep things working. Otherwise it wouldn't work now either
>> with a vblank off after 5 seconds and nouveau not having vblank hw counters.
>>
>> -mario
>>
>>
>>>>
>>>> -mario
>>>>
>>>>
>>>>>>
>>>>>> -mario
>>>>>>
>>>>>>>>
>>>>>>>> It almost sort of works on the rs600 code path, but i need a bit of info
>>>>>>>> from you:
>>>>>>>>
>>>>>>>> 1. There's this register from the old specs for m76.pdf, which is not
>>>>>>>> part of the current register defines for radeon-kms:
>>>>>>>>
>>>>>>>> "D1CRTC_STATUS_VF_COUNT - RW - 32 bits - [GpuF0MMReg:0x60A8]"
>>>>>>>>
>>>>>>>> It contains the lower 16 bits of framecounter and the 13 bits of
>>>>>>>> vertical scanout position. It seems to give the same readings as the 24
>>>>>>>> bit R_0060A4_D1CRTC_STATUS_FRAME_COUNT we use for the hw counter. This
>>>>>>>> would come handy.
>>>>>>>>
>>>>>>>> Does Evergreen and later have a same/similar register and where is it?
>>>>>>>>
>>>>>>>> 2. The hw framecounter seems to increment when the vertical scanout
>>>>>>>> position wraps back from (VTOTAL - 1) to 0, at least on the one DCE-3
>>>>>>>> gpu i tested so far. Is this so on all asics? And is the hw counter
>>>>>>>> increment happening exactly at the moment that vertical scanout position
>>>>>>>> jumps back to zero, ie. both events are driven by the same signal? Or is
>>>>>>>> the framecounter increment just happening somewhere inside either
>>>>>>>> scanline VTOTAL-1 or scanline 0?
>>>>>>>>
>>>>>>>>
>>>>>>>> If we can fix this and get it into rc2 or rc3 then we could avoid a bad
>>>>>>>> regression and with a bit of luck at the same time improve by being able
>>>>>>>> to set dev->vblank_disable_immediate = true then and allow vblank irqs
>>>>>>>> to get turned off more aggressively for a bit of extra power saving.
>>>>>>>>
>>>>>>>> thanks,
>>>>>>>> -mario
>>>>>>>
>>>>>>>> -- a/drivers/gpu/drm/drm_irq.c
>>>>>>>> +++ b/drivers/gpu/drm/drm_irq.c
>>>>>>>> @@ -172,9 +172,11 @@ static void drm_update_vblank_count(struct drm_device *dev, unsigned int pipe,
>>>>>>>>                                         unsigned long flags)
>>>>>>>>      {
>>>>>>>>             struct drm_vblank_crtc *vblank = &dev->vblank[pipe];
>>>>>>>> -       u32 cur_vblank, diff;
>>>>>>>> +       u32 cur_vblank, diff = 0;
>>>>>>>>             bool rc;
>>>>>>>>             struct timeval t_vblank;
>>>>>>>> +       const struct timeval *t_old;
>>>>>>>> +       u64 diff_ns;
>>>>>>>>             int count = DRM_TIMESTAMP_MAXRETRIES;
>>>>>>>>             int framedur_ns = vblank->framedur_ns;
>>>>>>>>
>>>>>>>> @@ -195,13 +197,15 @@ static void drm_update_vblank_count(struct drm_device *dev, unsigned int pipe,
>>>>>>>>                     rc = drm_get_last_vbltimestamp(dev, pipe, &t_vblank, flags);
>>>>>>>>             } while (cur_vblank != dev->driver->get_vblank_counter(dev, pipe) && --count > 0);
>>>>>>>>
>>>>>>>> -       if (dev->max_vblank_count != 0) {
>>>>>>>> -               /* trust the hw counter when it's around */
>>>>>>>> -               diff = (cur_vblank - vblank->last) & dev->max_vblank_count;
>>>>>>>> -       } else if (rc && framedur_ns) {
>>>>>>>> -               const struct timeval *t_old;
>>>>>>>> -               u64 diff_ns;
>>>>>>>> -
>>>>>>>> +       /*
>>>>>>>> +        * Always use vblank timestamping based method if supported to reject
>>>>>>>> +        * redundant vblank irqs. E.g., AMD hardware needs this to not screw up
>>>>>>>> +        * due to some irq handling quirk.
>>>>>>>
>>>>>>> Hmm. I'm thinking it would be better to simply not claim that there's a hw
>>>>>>> counter if it isn't reliable.
>>>>>>>
>>>>>>>> +        *
>>>>>>>> +        * This also sets the diff value for use as fallback below in case the
>>>>>>>> +        * hw does not support a suitable hw vblank counter.
>>>>>>>> +        */
>>>>>>>> +       if (rc && framedur_ns) {
>>>>>>>>                     t_old = &vblanktimestamp(dev, pipe, vblank->count);
>>>>>>>>                     diff_ns = timeval_to_ns(&t_vblank) - timeval_to_ns(t_old);
>>>>>>>>
>>>>>>>> @@ -212,11 +216,28 @@ static void drm_update_vblank_count(struct drm_device *dev, unsigned int pipe,
>>>>>>>>                      */
>>>>>>>>                     diff = DIV_ROUND_CLOSEST_ULL(diff_ns, framedur_ns);
>>>>>>>>
>>>>>>>> -               if (diff == 0 && flags & DRM_CALLED_FROM_VBLIRQ)
>>>>>>>> -                       DRM_DEBUG_VBL("crtc %u: Redundant vblirq ignored."
>>>>>>>> -                                     " diff_ns = %lld, framedur_ns = %d)\n",
>>>>>>>> -                                     pipe, (long long) diff_ns, framedur_ns);
>>>>>>>> -       } else {
>>>>>>>> +               if (diff == 0 && flags & DRM_CALLED_FROM_VBLIRQ) {
>>>>>>>> +                   DRM_DEBUG_VBL("crtc %u: Redundant vblirq ignored."
>>>>>>>> +                   " diff_ns = %lld, framedur_ns = %d)\n",
>>>>>>>> +                   pipe, (long long) diff_ns, framedur_ns);
>>>>>>>> +
>>>>>>>> +                   /* Treat this redundant invocation as no-op. */
>>>>>>>> +                   WARN_ON_ONCE(cur_vblank != vblank->last);
>>>>>>>> +                   return;
>>>>>>>> +               }
>>>>>>>> +       }
>>>>>>>> +
>>>>>>>> +       /*
>>>>>>>> +        * hw counters, if marked as supported via max_vblank_count != 0,
>>>>>>>> +        * *must* increment at leading edge of vblank and in sync with
>>>>>>>> +        * the firing of the hw vblank irq, otherwise we have a race here
>>>>>>>> +        * between gpu and us, where small variations in our execution
>>>>>>>> +        * timing can cause off-by-one miscounting of vblanks.
>>>>>>>> +        */
>>>>>>>> +       if (dev->max_vblank_count != 0) {
>>>>>>>> +               /* trust the hw counter when it's around */
>>>>>>>> +               diff = (cur_vblank - vblank->last) & dev->max_vblank_count;
>>>>>>>> +       } else if (!(rc && framedur_ns)) {
>>>>>>>>                     /* some kind of default for drivers w/o accurate vbl timestamping */
>>>>>>>>                     diff = (flags & DRM_CALLED_FROM_VBLIRQ) != 0;
>>>>>>>>             }
>>>>>>>
>>>>>>>
>>>>>
>>>
>
Harry Wentland Nov. 23, 2015, 8:04 p.m. UTC | #10
Hi Mario,

when we've had issues with this on amdgpu Christian fixed it by enabling 
page flip irq all the time, rather than turning it on when usermode 
request a flip and turning it back off after we handled it. I believe 
that fix exists on radeon already. Michel should have more info on that.

See other comments inline.

Thanks,
Harry


On 2015-11-23 11:02 AM, Mario Kleiner wrote:
> On 11/20/2015 04:42 AM, Alex Deucher wrote:
>> On Thu, Nov 19, 2015 at 12:46 PM, Mario Kleiner
>> <mario.kleiner.de@gmail.com> wrote:
>>> Hi Alex and Michel and Ville,
>>>
>>> it's "fix vblank stuff" time again ;-)
>>
>> Adding Harry from our display team.  He might be able to fill in the
>> blanks of on some of this better than I can.  It might also be worth
>> checking to see how our DAL (our new display code which is being
>> developed directly by our display team) code handles this.  It could
>> be that we are just missing register settings:
>
> Thanks Alex! And hello Harry :)
>
>> http://cgit.freedesktop.org/~agd5f/linux/log/?h=DAL-wip
>
> I'll have a look at this.
>
>> Additionally we've published full registers headers for the display 
>> block:
>> http://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/tree/drivers/gpu/drm/amd/include/asic_reg/dce 
>>
>> The DCE8 stuff should generally apply back to DCE4.  If you have
>> questions about registers older asics not covered in the hw docs, let
>> me know.  Note the new headers are dword aligned rather than byte
>> aligned.
>>
>
> I've tested now with two different progressive modes on DCE3 and one 
> progressive mode on DCE4, the only cards i have atm. So far it seems 
> that the framecounter indeed increments when the vpos from the scanout 
> position query jumps back to zero. Attached for reference is my 
> current patch to radeon-kms. This one seems to work reliably so far, 
> also if i enable the immediate vblank irq disable which we so far only 
> used on intel-kms.
>
> But according to this patch the framecounter increment happens 
> somewhere in the middle of vblank.
>
> Essentially from the vpos extracted from 
> EVERGREEN_CRTC_STATUS_POSITION which defines start of vblank ("Start" 
> part of EVERGREEN_CRTC_V_BLANK_START_END) until maximum ie. VTOTAL-1 
> the framecounter stays on the count of the old previous scanout cycle. 
> Then when vpos wraps to zero the framecounter increments by 1. And 
> then we have another couple of dozen lines inside vblank until 
> reaching the "End" part of EVERGREEN_CRTC_V_BLANK_START_END and 
> entering active scanout for the new frame.
>
> So the position of observed framecounter increment seems to be not 
> close to the end of vblank ("start of frame" indicator?), but a couple 
> of scanlines after start of vblank.
>
> E.g., for a 2560x1440 video mode at 60 Hz, start of vblank is 1478, 
> vtotal is 1481, end of vblank is 38. So i enter the vblank and see the 
> old framecounter for vpos = 1478, 1479, 1480, then it wraps to 0 and 
> the framecounter increments by 1, then 38 scanlines later the vblank 
> ends.
>
> So i seem to have something that seems to work in practice and this 
> "increment framecounter if vpos wraps back to zero" behavior makes 
> some sense. It just doesn't conform to what those descriptions for 
> start_line and "start of frame" indicator describe?
>
This is correct. Our HW doesn't really have a vblank counter but a frame 
counter. The framecounter increments at the start of vsync, which is 
when we wrap to zero which doesn't coincide with the start of vblank.

What we're trying to do with get_vblank_counter isn't the same as what 
framecount gives us, but we could probably do something like:

if (get_scanout_pos > vblank_start)
   return frame_count + 1;
else
   return frame_count;

Harry

> I'll test with a few more video modes.
>
> thanks,
> -mario
>
>
>>>
>>> Ville's changes to the DRM's drm_handle_vblank() / 
>>> drm_update_vblank_count()
>>> code in Linux 4.4 not only made that code more elegant, but also 
>>> removed the
>>> robustness against the vblank irq quirks in AMD hw and similar 
>>> hardware. So
>>> now i get tons of off-by-one errors and
>>>
>>> "[   432.345] (WW) RADEON(1): radeon_dri2_flip_event_handler: Pageflip
>>> completion event has impossible msc 24803 < target_msc 24804" XOrg 
>>> messages
>>> from that kernel.
>>>
>>> One of the reasons for trouble is that AMD hw quirk where the hw 
>>> fires an
>>> extra vblank irq shortly after vblank irq's get enabled, not 
>>> synchronized to
>>> vblank, but typically in the middle of active scanout, so we get a 
>>> redundant
>>> call to drm_handle_vblank in the middle of scanout.
>>>
>>> To fix that i have a minor patch to make drm_update_vblank_count() 
>>> again
>>> robust against such redundant calls, which i will send out later to the
>>> mailing list. Diff attached for reference.
>>>
>>> The second quirk of AMD hw is that the vblank interrupt fires a few
>>> scanlines before start of vblank, so drm_handle_vblank ->
>>> drm_update_vblank_count() -> dev->driver->get_vblank_counter() gets 
>>> called
>>> before the start of the vblank for which the new vblank count should be
>>> queried.
>>>
>>> The third problem is that the DRM vblank handling always had the 
>>> assumption
>>> that hardware vblank counters would essentially increment at leading 
>>> edge of
>>> vblank - basically in sync with the firing of the vblank irq, so 
>>> that a hw
>>> counter readout from within the vblank irq handler would always 
>>> deliver the
>>> new incremented value. If this assumption is violated then the 
>>> counting by
>>> use of the hw counter gets unreliable, because depending on random 
>>> small
>>> delays in irq handling the code may end up sampling the hw counter 
>>> pre- or
>>> post-increment, leading to inconsistent updating and funky bugs. It 
>>> just so
>>> happens that AMD hardware doesn't increment the hw counter at 
>>> leading edge
>>> of vblank, so stuff falls apart.
>>>
>>> So to fix those two problems i'm tinkering with cooking the hw vblank
>>> counter value returned by radeon_get_vblank_counter_kms() to make it 
>>> appear
>>> as if the counter incremented at leading edge of vblank in sync with 
>>> vblank
>>> irq.
>>>
>>> It almost sort of works on the rs600 code path, but i need a bit of 
>>> info
>>> from you:
>>>
>>> 1. There's this register from the old specs for m76.pdf, which is 
>>> not part
>>> of the current register defines for radeon-kms:
>>>
>>> "D1CRTC_STATUS_VF_COUNT - RW - 32 bits - [GpuF0MMReg:0x60A8]"
>>>
>>> It contains the lower 16 bits of framecounter and the 13 bits of 
>>> vertical
>>> scanout position. It seems to give the same readings as the 24 bit
>>> R_0060A4_D1CRTC_STATUS_FRAME_COUNT we use for the hw counter. This 
>>> would
>>> come handy.
>>>
>>> Does Evergreen and later have a same/similar register and where is it?
>>
>> Yes.  CRTC_STATUS_VF_COUNT
>>
>> CRTC:CRTC_STATUS_VF_COUNT  ·  [R/W]  ·  32 bits  ·  Access: 8/16/32  ·
>>   [INST0] GpuF0MMReg:0x6e9c, [INST1] GpuF0MMReg:0x7a9c, [INST2]
>> GpuF0MMReg:0x1069c, [INST3] GpuF0MMReg:0x1129c, [INST4]
>> GpuF0MMReg:0x11e9c, [INST5] GpuF0MMReg:0x12a9c
>> DESCRIPTION: Current composite vertical and frame count for CRTC
>> Field Name   Bits    Default   Description
>> CRTC_VF_COUNT
>> (Access: R)    28:0    0x0    Reports current vertical and frame count
>>
>>>
>>> 2. The hw framecounter seems to increment when the vertical scanout 
>>> position
>>> wraps back from (VTOTAL - 1) to 0, at least on the one DCE-3 gpu i 
>>> tested so
>>> far. Is this so on all asics? And is the hw counter increment happening
>>> exactly at the moment that vertical scanout position jumps back to 
>>> zero, ie.
>>> both events are driven by the same signal? Or is the framecounter 
>>> increment
>>> just happening somewhere inside either scanline VTOTAL-1 or scanline 0?
>>
>> Unless Harry knows, we can ask the hw team, but I doubt they would
>> have changed it.  I think it's tied to the start line which is
>> controlled by this register:
>> CRTC:CRTC_START_LINE_CONTROL  ·  [R/W]  ·  32 bits  ·  Access: 8/16/32
>>   ·  [INST0] GpuF0MMReg:0x6ecc, [INST1] GpuF0MMReg:0x7acc, [INST2]
>> GpuF0MMReg:0x106cc, [INST3] GpuF0MMReg:0x112cc, [INST4]
>> GpuF0MMReg:0x11ecc, [INST5] GpuF0MMReg:0x12acc
>> DESCRIPTION: move start_line signal earlier by 1 line in CRTC
>> Field Name   Bits          Default         Description
>> CRTC_PROGRESSIVE_START_LINE_EARLY         0    0x0    move start_line
>> signal by 1 line eariler in progressive mode
>>
>> CRTC_INTERLACE_START_LINE_EARLY         8     0x1     move start_line
>> signal by 1 line earlier in interlaced timing mode
>>
>> CRTC_ADVANCED_START_LINE_POSITION         19:16     0x4 Advanced
>> Start Line position respect to not VBLANK. Default of 4 means the
>> Advanced Start Line is 4 lines before the first non VBLANK line
>>
>> The following info I dug up internally may be useful:
>>
>> The position of the CRTC output timing generator when the “start of
>> frame” indicator occurs depends on several conditions.  These
>> conditions are whether the timing generator is outputting timing for a
>> progressive or interlaced display mode, whether the scaler is enabled,
>> and if so, whether the register to select an earlier than normal
>> “start of frame” indicator is set.  The “start of frame” indicator
>> typically occurs 2 lines before the vertical blank end position
>> (DxCRTC_V_BLANK_END) is reached
>>
>> When interlaced output display modes are enabled
>> (DxCRTC_INTERLACE_ENABLE = 1) and the CRTC timing generator is enabled
>> (DxCRTC_MASTER_EN = 1), the timing generator’s vertical counter counts
>> by 2 for both the even fields and odd fields of each frame.  For both
>> the even fields and the odd fields of interlaced output modes, the
>> “start of frame” indicator occurs 2 lines before the end of the
>> vertical blank occurs (DxCRTC_V_BLANK_END – 4 or DxCRTC_V_BLANK_END –
>> 3 depending on the field since the vertical counter counts by 2 in
>> interlaced), since the vertical counter counts by 2 in this mode).
>> There is one exception to the previous statement.  If scaling is
>> enabled (DxSCL_SCALE_EN = 1) and the early start line is enabled
>> (DxCRTC_INTERLACE_START_LINE_EARLY = 1) in interlaced output mode then
>> the “start of frame” indicator happens 3 lines before the end of the
>> vertical blank (DxCRTC_V_BLANK_END – 6 or DxCRTC_V_BLANK_END – 5
>> depending on the field since the vertical counter counts by 2 in
>> interlaced).
>>
>> When progressive output display modes are enabled
>> (DxCRTC_INTERLACE_ENABLE = 0) and the CRTC timing generator is enabled
>> (DxCRTC_MASTER_EN = 1), the “start of frame” indicator occurs 2 lines
>> before the end of the vertical blank occurs (DxCRTC_V_BLANK_END - 2).
>> Similar to interlaced output mode, there is one exception to the
>> previous statement.  If scaling is enabled (DxSCL_SCALE_EN = 1) and
>> the early start line is enabled (DxCRTC_PROGRESSIVE_START_LINE_EARLY =
>> 1) in progressive output mode then the “start of frame” indicator
>> happens 3 lines before the end of the vertical blank
>> (DxCRTC_V_BLANK_END – 3)
>>
>> I hope this helps,
>>
>> Alex
>>
>>>
>>>
>>> If we can fix this and get it into rc2 or rc3 then we could avoid a bad
>>> regression and with a bit of luck at the same time improve by being 
>>> able to
>>> set dev->vblank_disable_immediate = true then and allow vblank irqs 
>>> to get
>>> turned off more aggressively for a bit of extra power saving.
>>>
>>> thanks,
>>> -mario
Ville Syrjala Nov. 23, 2015, 8:24 p.m. UTC | #11
On Mon, Nov 23, 2015 at 06:58:34PM +0100, Mario Kleiner wrote:
> 
> 
> On 11/23/2015 04:51 PM, Ville Syrjälä wrote:
> > On Mon, Nov 23, 2015 at 04:23:21PM +0100, Mario Kleiner wrote:
> >> On 11/20/2015 04:34 PM, Ville Syrjälä wrote:
> >>> On Fri, Nov 20, 2015 at 04:24:50PM +0100, Mario Kleiner wrote:
> >>
> >> ...
> >> Ok, but why would that be a bad thing? I think we want it to think it is
> >> in the previous frame if it is called outside the vblank irq context.
> >> The only reason we fudge it to the next frames vblank if i vblank irq is
> >> because we know the vblank irq handler we are executing atm. was meant
> >> to execute within the upcoming vblank for the next frame, so we fudge
> >> the scanout positions and thereby timestamp to correspond to that new
> >> frame. But if something called outside irq context it should get a
> >> scanout position/timestamp that corresponds to "reality".
> >
> > It would be a bad thing since it would cause the timestamp to jump
> > backwards, and that would also cause the frame count guesstimate to go
> > backwards.
> >
> 
> But only if we don't use the dev->driver->get_vblank_counter() method, 
> which we try to use on AMD.

Well, if you do it that way then you have the problem of the hw counter
seeming to jump forward by one after crossing the start of vblank (when
compared to the value you sampled when you processed the early vblank
interrupt).

I guess one silly idea would be to defer the vblank interrupt processing
to a timer, and just schedule it a bit into the future from the actual
interrupt handler.

> Also dev->vblank_time_lock should protect us 
> from concurrent execution of the vblank counting/timestamping in irq 
> context and non-irq context? At least that was one of the purposes of 
> that lock in the past?

The scenarion I outlined doesn't require anything to happen
concurrently.

> 
> -mario
> 
> >>
> >> It would maybe be a problem to get different answers for a query at the
> >> same scanout position if we used .get_scanout_position() for something
> >> else than for calculating vblank timestamps, but we don't do that atm.
> >>
> >> Maybe i'm overlooking something here related to the somewhat rewritten
> >> code from the last year or so? But in the original design this would be
> >> exactly what i intended?
> >>
> >> ...
> >>
> >>>> So it's good enough for typical desktop
> >>>> applications/compositors/games/media players, and a nice improvement
> >>>> over the previous state, but not quite sufficient for applications that
> >>>> need long time consistent vblank counts for long waits of multiple
> >>>> seconds or even minutes. So it's still good to have hw counters if we
> >>>> can get some that are reliable enough.
> >>>
> >>> Ah, I didn't realize you care about small errors in the counter for
> >>> such long periods of vblank off.
> >>>
> >>
> >> Actually, you are right, i was stupid - not enough sleep last friday. I
> >> do care about such small errors, but the long vblank off periods don't
> >> matter at all the way my software works. I query the current count and
> >> timestamp (glXGetSyncValuesOML), calculate a target count based on those
> >> and then schedule a swap for that target count via glXSwapBuffersMscOML.
> >> That swapbuffers call will keep vblank irqs on until the kms pageflip is
> >> queued. So i only care about vblank counts/timestamps being consistent
> >> for short amounts of time, typically 1 video refresh from vblank query
> >> to queueing a vblank event, and then from reception of that
> >> event/queuing the pageflip to pageflip completion event. So even if the
> >> system would be heavily loaded and my code would have big preemption
> >> delays i think counts that are consistent over a few seconds would be
> >> enough to keep things working. Otherwise it wouldn't work now either
> >> with a vblank off after 5 seconds and nouveau not having vblank hw counters.
> >>
> >> -mario
> >>
> >>
> >>>>
> >>>> -mario
> >>>>
> >>>>
> >>>>>>
> >>>>>> -mario
> >>>>>>
> >>>>>>>>
> >>>>>>>> It almost sort of works on the rs600 code path, but i need a bit of info
> >>>>>>>> from you:
> >>>>>>>>
> >>>>>>>> 1. There's this register from the old specs for m76.pdf, which is not
> >>>>>>>> part of the current register defines for radeon-kms:
> >>>>>>>>
> >>>>>>>> "D1CRTC_STATUS_VF_COUNT - RW - 32 bits - [GpuF0MMReg:0x60A8]"
> >>>>>>>>
> >>>>>>>> It contains the lower 16 bits of framecounter and the 13 bits of
> >>>>>>>> vertical scanout position. It seems to give the same readings as the 24
> >>>>>>>> bit R_0060A4_D1CRTC_STATUS_FRAME_COUNT we use for the hw counter. This
> >>>>>>>> would come handy.
> >>>>>>>>
> >>>>>>>> Does Evergreen and later have a same/similar register and where is it?
> >>>>>>>>
> >>>>>>>> 2. The hw framecounter seems to increment when the vertical scanout
> >>>>>>>> position wraps back from (VTOTAL - 1) to 0, at least on the one DCE-3
> >>>>>>>> gpu i tested so far. Is this so on all asics? And is the hw counter
> >>>>>>>> increment happening exactly at the moment that vertical scanout position
> >>>>>>>> jumps back to zero, ie. both events are driven by the same signal? Or is
> >>>>>>>> the framecounter increment just happening somewhere inside either
> >>>>>>>> scanline VTOTAL-1 or scanline 0?
> >>>>>>>>
> >>>>>>>>
> >>>>>>>> If we can fix this and get it into rc2 or rc3 then we could avoid a bad
> >>>>>>>> regression and with a bit of luck at the same time improve by being able
> >>>>>>>> to set dev->vblank_disable_immediate = true then and allow vblank irqs
> >>>>>>>> to get turned off more aggressively for a bit of extra power saving.
> >>>>>>>>
> >>>>>>>> thanks,
> >>>>>>>> -mario
> >>>>>>>
> >>>>>>>> -- a/drivers/gpu/drm/drm_irq.c
> >>>>>>>> +++ b/drivers/gpu/drm/drm_irq.c
> >>>>>>>> @@ -172,9 +172,11 @@ static void drm_update_vblank_count(struct drm_device *dev, unsigned int pipe,
> >>>>>>>>                                         unsigned long flags)
> >>>>>>>>      {
> >>>>>>>>             struct drm_vblank_crtc *vblank = &dev->vblank[pipe];
> >>>>>>>> -       u32 cur_vblank, diff;
> >>>>>>>> +       u32 cur_vblank, diff = 0;
> >>>>>>>>             bool rc;
> >>>>>>>>             struct timeval t_vblank;
> >>>>>>>> +       const struct timeval *t_old;
> >>>>>>>> +       u64 diff_ns;
> >>>>>>>>             int count = DRM_TIMESTAMP_MAXRETRIES;
> >>>>>>>>             int framedur_ns = vblank->framedur_ns;
> >>>>>>>>
> >>>>>>>> @@ -195,13 +197,15 @@ static void drm_update_vblank_count(struct drm_device *dev, unsigned int pipe,
> >>>>>>>>                     rc = drm_get_last_vbltimestamp(dev, pipe, &t_vblank, flags);
> >>>>>>>>             } while (cur_vblank != dev->driver->get_vblank_counter(dev, pipe) && --count > 0);
> >>>>>>>>
> >>>>>>>> -       if (dev->max_vblank_count != 0) {
> >>>>>>>> -               /* trust the hw counter when it's around */
> >>>>>>>> -               diff = (cur_vblank - vblank->last) & dev->max_vblank_count;
> >>>>>>>> -       } else if (rc && framedur_ns) {
> >>>>>>>> -               const struct timeval *t_old;
> >>>>>>>> -               u64 diff_ns;
> >>>>>>>> -
> >>>>>>>> +       /*
> >>>>>>>> +        * Always use vblank timestamping based method if supported to reject
> >>>>>>>> +        * redundant vblank irqs. E.g., AMD hardware needs this to not screw up
> >>>>>>>> +        * due to some irq handling quirk.
> >>>>>>>
> >>>>>>> Hmm. I'm thinking it would be better to simply not claim that there's a hw
> >>>>>>> counter if it isn't reliable.
> >>>>>>>
> >>>>>>>> +        *
> >>>>>>>> +        * This also sets the diff value for use as fallback below in case the
> >>>>>>>> +        * hw does not support a suitable hw vblank counter.
> >>>>>>>> +        */
> >>>>>>>> +       if (rc && framedur_ns) {
> >>>>>>>>                     t_old = &vblanktimestamp(dev, pipe, vblank->count);
> >>>>>>>>                     diff_ns = timeval_to_ns(&t_vblank) - timeval_to_ns(t_old);
> >>>>>>>>
> >>>>>>>> @@ -212,11 +216,28 @@ static void drm_update_vblank_count(struct drm_device *dev, unsigned int pipe,
> >>>>>>>>                      */
> >>>>>>>>                     diff = DIV_ROUND_CLOSEST_ULL(diff_ns, framedur_ns);
> >>>>>>>>
> >>>>>>>> -               if (diff == 0 && flags & DRM_CALLED_FROM_VBLIRQ)
> >>>>>>>> -                       DRM_DEBUG_VBL("crtc %u: Redundant vblirq ignored."
> >>>>>>>> -                                     " diff_ns = %lld, framedur_ns = %d)\n",
> >>>>>>>> -                                     pipe, (long long) diff_ns, framedur_ns);
> >>>>>>>> -       } else {
> >>>>>>>> +               if (diff == 0 && flags & DRM_CALLED_FROM_VBLIRQ) {
> >>>>>>>> +                   DRM_DEBUG_VBL("crtc %u: Redundant vblirq ignored."
> >>>>>>>> +                   " diff_ns = %lld, framedur_ns = %d)\n",
> >>>>>>>> +                   pipe, (long long) diff_ns, framedur_ns);
> >>>>>>>> +
> >>>>>>>> +                   /* Treat this redundant invocation as no-op. */
> >>>>>>>> +                   WARN_ON_ONCE(cur_vblank != vblank->last);
> >>>>>>>> +                   return;
> >>>>>>>> +               }
> >>>>>>>> +       }
> >>>>>>>> +
> >>>>>>>> +       /*
> >>>>>>>> +        * hw counters, if marked as supported via max_vblank_count != 0,
> >>>>>>>> +        * *must* increment at leading edge of vblank and in sync with
> >>>>>>>> +        * the firing of the hw vblank irq, otherwise we have a race here
> >>>>>>>> +        * between gpu and us, where small variations in our execution
> >>>>>>>> +        * timing can cause off-by-one miscounting of vblanks.
> >>>>>>>> +        */
> >>>>>>>> +       if (dev->max_vblank_count != 0) {
> >>>>>>>> +               /* trust the hw counter when it's around */
> >>>>>>>> +               diff = (cur_vblank - vblank->last) & dev->max_vblank_count;
> >>>>>>>> +       } else if (!(rc && framedur_ns)) {
> >>>>>>>>                     /* some kind of default for drivers w/o accurate vbl timestamping */
> >>>>>>>>                     diff = (flags & DRM_CALLED_FROM_VBLIRQ) != 0;
> >>>>>>>>             }
> >>>>>>>
> >>>>>>>
> >>>>>
> >>>
> >
Mario Kleiner Nov. 23, 2015, 9:20 p.m. UTC | #12
On 11/23/2015 09:04 PM, Harry Wentland wrote:
> Hi Mario,
>
> when we've had issues with this on amdgpu Christian fixed it by enabling
> page flip irq all the time, rather than turning it on when usermode
> request a flip and turning it back off after we handled it. I believe
> that fix exists on radeon already. Michel should have more info on that.
>
> See other comments inline.
>
> Thanks,
> Harry
>
>
> On 2015-11-23 11:02 AM, Mario Kleiner wrote:
>> On 11/20/2015 04:42 AM, Alex Deucher wrote:
>>> On Thu, Nov 19, 2015 at 12:46 PM, Mario Kleiner
>>> <mario.kleiner.de@gmail.com> wrote:
>>>> Hi Alex and Michel and Ville,
>>>>
>>>> it's "fix vblank stuff" time again ;-)
>>>
>>> Adding Harry from our display team.  He might be able to fill in the
>>> blanks of on some of this better than I can.  It might also be worth
>>> checking to see how our DAL (our new display code which is being
>>> developed directly by our display team) code handles this.  It could
>>> be that we are just missing register settings:
>>
>> Thanks Alex! And hello Harry :)
>>
>>> http://cgit.freedesktop.org/~agd5f/linux/log/?h=DAL-wip
>>
>> I'll have a look at this.
>>
>>> Additionally we've published full registers headers for the display
>>> block:
>>> http://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/tree/drivers/gpu/drm/amd/include/asic_reg/dce
>>>
>>> The DCE8 stuff should generally apply back to DCE4.  If you have
>>> questions about registers older asics not covered in the hw docs, let
>>> me know.  Note the new headers are dword aligned rather than byte
>>> aligned.
>>>
>>
>> I've tested now with two different progressive modes on DCE3 and one
>> progressive mode on DCE4, the only cards i have atm. So far it seems
>> that the framecounter indeed increments when the vpos from the scanout
>> position query jumps back to zero. Attached for reference is my
>> current patch to radeon-kms. This one seems to work reliably so far,
>> also if i enable the immediate vblank irq disable which we so far only
>> used on intel-kms.
>>
>> But according to this patch the framecounter increment happens
>> somewhere in the middle of vblank.
>>
>> Essentially from the vpos extracted from
>> EVERGREEN_CRTC_STATUS_POSITION which defines start of vblank ("Start"
>> part of EVERGREEN_CRTC_V_BLANK_START_END) until maximum ie. VTOTAL-1
>> the framecounter stays on the count of the old previous scanout cycle.
>> Then when vpos wraps to zero the framecounter increments by 1. And
>> then we have another couple of dozen lines inside vblank until
>> reaching the "End" part of EVERGREEN_CRTC_V_BLANK_START_END and
>> entering active scanout for the new frame.
>>
>> So the position of observed framecounter increment seems to be not
>> close to the end of vblank ("start of frame" indicator?), but a couple
>> of scanlines after start of vblank.
>>
>> E.g., for a 2560x1440 video mode at 60 Hz, start of vblank is 1478,
>> vtotal is 1481, end of vblank is 38. So i enter the vblank and see the
>> old framecounter for vpos = 1478, 1479, 1480, then it wraps to 0 and
>> the framecounter increments by 1, then 38 scanlines later the vblank
>> ends.
>>
>> So i seem to have something that seems to work in practice and this
>> "increment framecounter if vpos wraps back to zero" behavior makes
>> some sense. It just doesn't conform to what those descriptions for
>> start_line and "start of frame" indicator describe?
>>
> This is correct. Our HW doesn't really have a vblank counter but a frame
> counter. The framecounter increments at the start of vsync, which is
> when we wrap to zero which doesn't coincide with the start of vblank.
>
> What we're trying to do with get_vblank_counter isn't the same as what
> framecount gives us, but we could probably do something like:
>
> if (get_scanout_pos > vblank_start)
>    return frame_count + 1;
> else
>    return frame_count;
>

Great! That's what my current patch does and it seems to work well with 
different video modes on both DCE3 and DCE4. So theory agrees with 
practice again :) - thanks for clarifying this.

So the other problem we have since forever is the vblank irq firing 
before the start of vblank. We are typically 1-2 scanlines before 
vblank_start when we sample the scanout position and framecounter. This 
needs a slightly ugly workaround. Is there a way, maybe some config 
register, to fire the irq at leading edge of vblank instead of a bit too 
early?

thanks,
-mario

> Harry
>
>> I'll test with a few more video modes.
>>
>> thanks,
>> -mario
>>
>>
>>>>
>>>> Ville's changes to the DRM's drm_handle_vblank() /
>>>> drm_update_vblank_count()
>>>> code in Linux 4.4 not only made that code more elegant, but also
>>>> removed the
>>>> robustness against the vblank irq quirks in AMD hw and similar
>>>> hardware. So
>>>> now i get tons of off-by-one errors and
>>>>
>>>> "[   432.345] (WW) RADEON(1): radeon_dri2_flip_event_handler: Pageflip
>>>> completion event has impossible msc 24803 < target_msc 24804" XOrg
>>>> messages
>>>> from that kernel.
>>>>
>>>> One of the reasons for trouble is that AMD hw quirk where the hw
>>>> fires an
>>>> extra vblank irq shortly after vblank irq's get enabled, not
>>>> synchronized to
>>>> vblank, but typically in the middle of active scanout, so we get a
>>>> redundant
>>>> call to drm_handle_vblank in the middle of scanout.
>>>>
>>>> To fix that i have a minor patch to make drm_update_vblank_count()
>>>> again
>>>> robust against such redundant calls, which i will send out later to the
>>>> mailing list. Diff attached for reference.
>>>>
>>>> The second quirk of AMD hw is that the vblank interrupt fires a few
>>>> scanlines before start of vblank, so drm_handle_vblank ->
>>>> drm_update_vblank_count() -> dev->driver->get_vblank_counter() gets
>>>> called
>>>> before the start of the vblank for which the new vblank count should be
>>>> queried.
>>>>
>>>> The third problem is that the DRM vblank handling always had the
>>>> assumption
>>>> that hardware vblank counters would essentially increment at leading
>>>> edge of
>>>> vblank - basically in sync with the firing of the vblank irq, so
>>>> that a hw
>>>> counter readout from within the vblank irq handler would always
>>>> deliver the
>>>> new incremented value. If this assumption is violated then the
>>>> counting by
>>>> use of the hw counter gets unreliable, because depending on random
>>>> small
>>>> delays in irq handling the code may end up sampling the hw counter
>>>> pre- or
>>>> post-increment, leading to inconsistent updating and funky bugs. It
>>>> just so
>>>> happens that AMD hardware doesn't increment the hw counter at
>>>> leading edge
>>>> of vblank, so stuff falls apart.
>>>>
>>>> So to fix those two problems i'm tinkering with cooking the hw vblank
>>>> counter value returned by radeon_get_vblank_counter_kms() to make it
>>>> appear
>>>> as if the counter incremented at leading edge of vblank in sync with
>>>> vblank
>>>> irq.
>>>>
>>>> It almost sort of works on the rs600 code path, but i need a bit of
>>>> info
>>>> from you:
>>>>
>>>> 1. There's this register from the old specs for m76.pdf, which is
>>>> not part
>>>> of the current register defines for radeon-kms:
>>>>
>>>> "D1CRTC_STATUS_VF_COUNT - RW - 32 bits - [GpuF0MMReg:0x60A8]"
>>>>
>>>> It contains the lower 16 bits of framecounter and the 13 bits of
>>>> vertical
>>>> scanout position. It seems to give the same readings as the 24 bit
>>>> R_0060A4_D1CRTC_STATUS_FRAME_COUNT we use for the hw counter. This
>>>> would
>>>> come handy.
>>>>
>>>> Does Evergreen and later have a same/similar register and where is it?
>>>
>>> Yes.  CRTC_STATUS_VF_COUNT
>>>
>>> CRTC:CRTC_STATUS_VF_COUNT  ·  [R/W]  ·  32 bits  ·  Access: 8/16/32  ·
>>>   [INST0] GpuF0MMReg:0x6e9c, [INST1] GpuF0MMReg:0x7a9c, [INST2]
>>> GpuF0MMReg:0x1069c, [INST3] GpuF0MMReg:0x1129c, [INST4]
>>> GpuF0MMReg:0x11e9c, [INST5] GpuF0MMReg:0x12a9c
>>> DESCRIPTION: Current composite vertical and frame count for CRTC
>>> Field Name   Bits    Default   Description
>>> CRTC_VF_COUNT
>>> (Access: R)    28:0    0x0    Reports current vertical and frame count
>>>
>>>>
>>>> 2. The hw framecounter seems to increment when the vertical scanout
>>>> position
>>>> wraps back from (VTOTAL - 1) to 0, at least on the one DCE-3 gpu i
>>>> tested so
>>>> far. Is this so on all asics? And is the hw counter increment happening
>>>> exactly at the moment that vertical scanout position jumps back to
>>>> zero, ie.
>>>> both events are driven by the same signal? Or is the framecounter
>>>> increment
>>>> just happening somewhere inside either scanline VTOTAL-1 or scanline 0?
>>>
>>> Unless Harry knows, we can ask the hw team, but I doubt they would
>>> have changed it.  I think it's tied to the start line which is
>>> controlled by this register:
>>> CRTC:CRTC_START_LINE_CONTROL  ·  [R/W]  ·  32 bits  ·  Access: 8/16/32
>>>   ·  [INST0] GpuF0MMReg:0x6ecc, [INST1] GpuF0MMReg:0x7acc, [INST2]
>>> GpuF0MMReg:0x106cc, [INST3] GpuF0MMReg:0x112cc, [INST4]
>>> GpuF0MMReg:0x11ecc, [INST5] GpuF0MMReg:0x12acc
>>> DESCRIPTION: move start_line signal earlier by 1 line in CRTC
>>> Field Name   Bits          Default         Description
>>> CRTC_PROGRESSIVE_START_LINE_EARLY         0    0x0    move start_line
>>> signal by 1 line eariler in progressive mode
>>>
>>> CRTC_INTERLACE_START_LINE_EARLY         8     0x1     move start_line
>>> signal by 1 line earlier in interlaced timing mode
>>>
>>> CRTC_ADVANCED_START_LINE_POSITION         19:16     0x4 Advanced
>>> Start Line position respect to not VBLANK. Default of 4 means the
>>> Advanced Start Line is 4 lines before the first non VBLANK line
>>>
>>> The following info I dug up internally may be useful:
>>>
>>> The position of the CRTC output timing generator when the “start of
>>> frame” indicator occurs depends on several conditions.  These
>>> conditions are whether the timing generator is outputting timing for a
>>> progressive or interlaced display mode, whether the scaler is enabled,
>>> and if so, whether the register to select an earlier than normal
>>> “start of frame” indicator is set.  The “start of frame” indicator
>>> typically occurs 2 lines before the vertical blank end position
>>> (DxCRTC_V_BLANK_END) is reached
>>>
>>> When interlaced output display modes are enabled
>>> (DxCRTC_INTERLACE_ENABLE = 1) and the CRTC timing generator is enabled
>>> (DxCRTC_MASTER_EN = 1), the timing generator’s vertical counter counts
>>> by 2 for both the even fields and odd fields of each frame.  For both
>>> the even fields and the odd fields of interlaced output modes, the
>>> “start of frame” indicator occurs 2 lines before the end of the
>>> vertical blank occurs (DxCRTC_V_BLANK_END – 4 or DxCRTC_V_BLANK_END –
>>> 3 depending on the field since the vertical counter counts by 2 in
>>> interlaced), since the vertical counter counts by 2 in this mode).
>>> There is one exception to the previous statement.  If scaling is
>>> enabled (DxSCL_SCALE_EN = 1) and the early start line is enabled
>>> (DxCRTC_INTERLACE_START_LINE_EARLY = 1) in interlaced output mode then
>>> the “start of frame” indicator happens 3 lines before the end of the
>>> vertical blank (DxCRTC_V_BLANK_END – 6 or DxCRTC_V_BLANK_END – 5
>>> depending on the field since the vertical counter counts by 2 in
>>> interlaced).
>>>
>>> When progressive output display modes are enabled
>>> (DxCRTC_INTERLACE_ENABLE = 0) and the CRTC timing generator is enabled
>>> (DxCRTC_MASTER_EN = 1), the “start of frame” indicator occurs 2 lines
>>> before the end of the vertical blank occurs (DxCRTC_V_BLANK_END - 2).
>>> Similar to interlaced output mode, there is one exception to the
>>> previous statement.  If scaling is enabled (DxSCL_SCALE_EN = 1) and
>>> the early start line is enabled (DxCRTC_PROGRESSIVE_START_LINE_EARLY =
>>> 1) in progressive output mode then the “start of frame” indicator
>>> happens 3 lines before the end of the vertical blank
>>> (DxCRTC_V_BLANK_END – 3)
>>>
>>> I hope this helps,
>>>
>>> Alex
>>>
>>>>
>>>>
>>>> If we can fix this and get it into rc2 or rc3 then we could avoid a bad
>>>> regression and with a bit of luck at the same time improve by being
>>>> able to
>>>> set dev->vblank_disable_immediate = true then and allow vblank irqs
>>>> to get
>>>> turned off more aggressively for a bit of extra power saving.
>>>>
>>>> thanks,
>>>> -mario
>
Mario Kleiner Nov. 25, 2015, 5:24 p.m. UTC | #13
On 11/23/2015 09:24 PM, Ville Syrjälä wrote:
> On Mon, Nov 23, 2015 at 06:58:34PM +0100, Mario Kleiner wrote:
>>
>>
>> On 11/23/2015 04:51 PM, Ville Syrjälä wrote:
>>> On Mon, Nov 23, 2015 at 04:23:21PM +0100, Mario Kleiner wrote:
>>>> On 11/20/2015 04:34 PM, Ville Syrjälä wrote:
>>>>> On Fri, Nov 20, 2015 at 04:24:50PM +0100, Mario Kleiner wrote:
>>>>
>>>> ...
>>>> Ok, but why would that be a bad thing? I think we want it to think it is
>>>> in the previous frame if it is called outside the vblank irq context.
>>>> The only reason we fudge it to the next frames vblank if i vblank irq is
>>>> because we know the vblank irq handler we are executing atm. was meant
>>>> to execute within the upcoming vblank for the next frame, so we fudge
>>>> the scanout positions and thereby timestamp to correspond to that new
>>>> frame. But if something called outside irq context it should get a
>>>> scanout position/timestamp that corresponds to "reality".
>>>
>>> It would be a bad thing since it would cause the timestamp to jump
>>> backwards, and that would also cause the frame count guesstimate to go
>>> backwards.
>>>
>>
>> But only if we don't use the dev->driver->get_vblank_counter() method,
>> which we try to use on AMD.
>
> Well, if you do it that way then you have the problem of the hw counter
> seeming to jump forward by one after crossing the start of vblank (when
> compared to the value you sampled when you processed the early vblank
> interrupt).
>

Ok, finally i see the bad scenario that wouldn't get prevented by our 
current locking with the new vblank counting in the core. The vblank 
enable path is safe due to locking and discounting of redundant 
timestamps etc. But the disable path could go wrong:

1. Vblank irq fires, drm_handle_vblank() -> drm_update_vblank_count(),
updates timestamps and counts "as if" in vblank -> incremented vblank 
count and timestamp now set in the future.

2. After vblank irq finishes, but just before leading edge of vblank, 
vblank_disable_and_save() executes, doesn't get bumped timestamp or 
count because before vblank and not in vblank irq. Now 
drm_update_vblank_count() would process a
"new" timestamp and count from the past and we'd have time and counts 
going backwards, and bad things would happen.

I haven't observed such a thing happening during testing so far, 
probably because the time window in which it could happen is tiny, but 
given how awfully bad it would be, it needs to be prevented.

I had a look at the description of the Vblank irq in the "M76 Register 
Reference Guide" for older asics and the description suggests that the 
vblank irq fires when the crtc's line buffer is finished reading pixel 
data from the scanout buffer in memory for a frame, ie., when the line 
buffer read "enters" vblank. That would explain why the irq happens a 
few scanlines before actual vblank, because line buffer refills must 
obviously happen before the crtc can send pixel data from the line 
buffer to the encoders, so it would lead a bit in time. That also means 
we can't delay the vblank irq to actually happen at start of vblank and 
have to deal with the early vblank irq.

> I guess one silly idea would be to defer the vblank interrupt processing
> to a timer, and just schedule it a bit into the future from the actual
> interrupt handler.
>

Timer would be bad because then we get problems with the pageflip 
completion irq sometimes being processed before the vblank irq, and 
because we want to be fast in vblank irq handling, delivering vblank 
events etc. I wouldn't trust a timer to be reliable enough for such 
short waits. Busy waiting wouldn't be great either in irq.

So what about this relatively simple one?

1. In radeon_get_crtc_scanoutpos() we artifically define the 
vblank_start line to be, e.g, 5 scanlines before the true start of 
vblank, so for the purpose of vblank counter queries and timestamping 
our "vblank" would start a bit earlier and the vblank irq would always 
execute in "vblank". Non-Irq invocations like vblank_disable_and_save() 
would also be treated to this early vblank start and so what the DRM 
core observes would always be consistent.

2. At least in radeon-kms we also use radeon_get_crtc_scanoutpos() 
internally for "dynpm" dynamic power management/reclocking, and to 
implement pageflip completion detection on asics older than DCE3 which 
don't have pageflip interrupts. For those cases we need to use the true 
start of vblank, so for this internal use we pass in some special flag 
into radeon_get_crtc_scanoutpos() to tell it to not shift the vblank 
start around.

3. I've added another check to my patch for drm_update_vblank_count() to 
catch timestamps going backwards and discounting such cases, for a bit 
of extra robustness against driver trouble.

Any ideas why this would be a stupid idea? I'll try this now and just 
hope meanwhile nobody finds a reason why this would be bad.

thanks,
-mario

>> Also dev->vblank_time_lock should protect us
>> from concurrent execution of the vblank counting/timestamping in irq
>> context and non-irq context? At least that was one of the purposes of
>> that lock in the past?
>
> The scenarion I outlined doesn't require anything to happen
> concurrently.
>
>>
>> -mario
>>
>>>>
>>>> It would maybe be a problem to get different answers for a query at the
>>>> same scanout position if we used .get_scanout_position() for something
>>>> else than for calculating vblank timestamps, but we don't do that atm.
>>>>
>>>> Maybe i'm overlooking something here related to the somewhat rewritten
>>>> code from the last year or so? But in the original design this would be
>>>> exactly what i intended?
>>>>
>>>> ...
>>>>
>>>>>> So it's good enough for typical desktop
>>>>>> applications/compositors/games/media players, and a nice improvement
>>>>>> over the previous state, but not quite sufficient for applications that
>>>>>> need long time consistent vblank counts for long waits of multiple
>>>>>> seconds or even minutes. So it's still good to have hw counters if we
>>>>>> can get some that are reliable enough.
>>>>>
>>>>> Ah, I didn't realize you care about small errors in the counter for
>>>>> such long periods of vblank off.
>>>>>
>>>>
>>>> Actually, you are right, i was stupid - not enough sleep last friday. I
>>>> do care about such small errors, but the long vblank off periods don't
>>>> matter at all the way my software works. I query the current count and
>>>> timestamp (glXGetSyncValuesOML), calculate a target count based on those
>>>> and then schedule a swap for that target count via glXSwapBuffersMscOML.
>>>> That swapbuffers call will keep vblank irqs on until the kms pageflip is
>>>> queued. So i only care about vblank counts/timestamps being consistent
>>>> for short amounts of time, typically 1 video refresh from vblank query
>>>> to queueing a vblank event, and then from reception of that
>>>> event/queuing the pageflip to pageflip completion event. So even if the
>>>> system would be heavily loaded and my code would have big preemption
>>>> delays i think counts that are consistent over a few seconds would be
>>>> enough to keep things working. Otherwise it wouldn't work now either
>>>> with a vblank off after 5 seconds and nouveau not having vblank hw counters.
>>>>
>>>> -mario
>>>>
>>>>
>>>>>>
>>>>>> -mario
>>>>>>
>>>>>>
>>>>>>>>
>>>>>>>> -mario
>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> It almost sort of works on the rs600 code path, but i need a bit of info
>>>>>>>>>> from you:
>>>>>>>>>>
>>>>>>>>>> 1. There's this register from the old specs for m76.pdf, which is not
>>>>>>>>>> part of the current register defines for radeon-kms:
>>>>>>>>>>
>>>>>>>>>> "D1CRTC_STATUS_VF_COUNT - RW - 32 bits - [GpuF0MMReg:0x60A8]"
>>>>>>>>>>
>>>>>>>>>> It contains the lower 16 bits of framecounter and the 13 bits of
>>>>>>>>>> vertical scanout position. It seems to give the same readings as the 24
>>>>>>>>>> bit R_0060A4_D1CRTC_STATUS_FRAME_COUNT we use for the hw counter. This
>>>>>>>>>> would come handy.
>>>>>>>>>>
>>>>>>>>>> Does Evergreen and later have a same/similar register and where is it?
>>>>>>>>>>
>>>>>>>>>> 2. The hw framecounter seems to increment when the vertical scanout
>>>>>>>>>> position wraps back from (VTOTAL - 1) to 0, at least on the one DCE-3
>>>>>>>>>> gpu i tested so far. Is this so on all asics? And is the hw counter
>>>>>>>>>> increment happening exactly at the moment that vertical scanout position
>>>>>>>>>> jumps back to zero, ie. both events are driven by the same signal? Or is
>>>>>>>>>> the framecounter increment just happening somewhere inside either
>>>>>>>>>> scanline VTOTAL-1 or scanline 0?
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> If we can fix this and get it into rc2 or rc3 then we could avoid a bad
>>>>>>>>>> regression and with a bit of luck at the same time improve by being able
>>>>>>>>>> to set dev->vblank_disable_immediate = true then and allow vblank irqs
>>>>>>>>>> to get turned off more aggressively for a bit of extra power saving.
>>>>>>>>>>
>>>>>>>>>> thanks,
>>>>>>>>>> -mario
>>>>>>>>>
>>>>>>>>>> -- a/drivers/gpu/drm/drm_irq.c
>>>>>>>>>> +++ b/drivers/gpu/drm/drm_irq.c
>>>>>>>>>> @@ -172,9 +172,11 @@ static void drm_update_vblank_count(struct drm_device *dev, unsigned int pipe,
>>>>>>>>>>                                          unsigned long flags)
>>>>>>>>>>       {
>>>>>>>>>>              struct drm_vblank_crtc *vblank = &dev->vblank[pipe];
>>>>>>>>>> -       u32 cur_vblank, diff;
>>>>>>>>>> +       u32 cur_vblank, diff = 0;
>>>>>>>>>>              bool rc;
>>>>>>>>>>              struct timeval t_vblank;
>>>>>>>>>> +       const struct timeval *t_old;
>>>>>>>>>> +       u64 diff_ns;
>>>>>>>>>>              int count = DRM_TIMESTAMP_MAXRETRIES;
>>>>>>>>>>              int framedur_ns = vblank->framedur_ns;
>>>>>>>>>>
>>>>>>>>>> @@ -195,13 +197,15 @@ static void drm_update_vblank_count(struct drm_device *dev, unsigned int pipe,
>>>>>>>>>>                      rc = drm_get_last_vbltimestamp(dev, pipe, &t_vblank, flags);
>>>>>>>>>>              } while (cur_vblank != dev->driver->get_vblank_counter(dev, pipe) && --count > 0);
>>>>>>>>>>
>>>>>>>>>> -       if (dev->max_vblank_count != 0) {
>>>>>>>>>> -               /* trust the hw counter when it's around */
>>>>>>>>>> -               diff = (cur_vblank - vblank->last) & dev->max_vblank_count;
>>>>>>>>>> -       } else if (rc && framedur_ns) {
>>>>>>>>>> -               const struct timeval *t_old;
>>>>>>>>>> -               u64 diff_ns;
>>>>>>>>>> -
>>>>>>>>>> +       /*
>>>>>>>>>> +        * Always use vblank timestamping based method if supported to reject
>>>>>>>>>> +        * redundant vblank irqs. E.g., AMD hardware needs this to not screw up
>>>>>>>>>> +        * due to some irq handling quirk.
>>>>>>>>>
>>>>>>>>> Hmm. I'm thinking it would be better to simply not claim that there's a hw
>>>>>>>>> counter if it isn't reliable.
>>>>>>>>>
>>>>>>>>>> +        *
>>>>>>>>>> +        * This also sets the diff value for use as fallback below in case the
>>>>>>>>>> +        * hw does not support a suitable hw vblank counter.
>>>>>>>>>> +        */
>>>>>>>>>> +       if (rc && framedur_ns) {
>>>>>>>>>>                      t_old = &vblanktimestamp(dev, pipe, vblank->count);
>>>>>>>>>>                      diff_ns = timeval_to_ns(&t_vblank) - timeval_to_ns(t_old);
>>>>>>>>>>
>>>>>>>>>> @@ -212,11 +216,28 @@ static void drm_update_vblank_count(struct drm_device *dev, unsigned int pipe,
>>>>>>>>>>                       */
>>>>>>>>>>                      diff = DIV_ROUND_CLOSEST_ULL(diff_ns, framedur_ns);
>>>>>>>>>>
>>>>>>>>>> -               if (diff == 0 && flags & DRM_CALLED_FROM_VBLIRQ)
>>>>>>>>>> -                       DRM_DEBUG_VBL("crtc %u: Redundant vblirq ignored."
>>>>>>>>>> -                                     " diff_ns = %lld, framedur_ns = %d)\n",
>>>>>>>>>> -                                     pipe, (long long) diff_ns, framedur_ns);
>>>>>>>>>> -       } else {
>>>>>>>>>> +               if (diff == 0 && flags & DRM_CALLED_FROM_VBLIRQ) {
>>>>>>>>>> +                   DRM_DEBUG_VBL("crtc %u: Redundant vblirq ignored."
>>>>>>>>>> +                   " diff_ns = %lld, framedur_ns = %d)\n",
>>>>>>>>>> +                   pipe, (long long) diff_ns, framedur_ns);
>>>>>>>>>> +
>>>>>>>>>> +                   /* Treat this redundant invocation as no-op. */
>>>>>>>>>> +                   WARN_ON_ONCE(cur_vblank != vblank->last);
>>>>>>>>>> +                   return;
>>>>>>>>>> +               }
>>>>>>>>>> +       }
>>>>>>>>>> +
>>>>>>>>>> +       /*
>>>>>>>>>> +        * hw counters, if marked as supported via max_vblank_count != 0,
>>>>>>>>>> +        * *must* increment at leading edge of vblank and in sync with
>>>>>>>>>> +        * the firing of the hw vblank irq, otherwise we have a race here
>>>>>>>>>> +        * between gpu and us, where small variations in our execution
>>>>>>>>>> +        * timing can cause off-by-one miscounting of vblanks.
>>>>>>>>>> +        */
>>>>>>>>>> +       if (dev->max_vblank_count != 0) {
>>>>>>>>>> +               /* trust the hw counter when it's around */
>>>>>>>>>> +               diff = (cur_vblank - vblank->last) & dev->max_vblank_count;
>>>>>>>>>> +       } else if (!(rc && framedur_ns)) {
>>>>>>>>>>                      /* some kind of default for drivers w/o accurate vbl timestamping */
>>>>>>>>>>                      diff = (flags & DRM_CALLED_FROM_VBLIRQ) != 0;
>>>>>>>>>>              }
>>>>>>>>>
>>>>>>>>>
>>>>>>>
>>>>>
>>>
>
Ville Syrjala Nov. 25, 2015, 5:46 p.m. UTC | #14
On Wed, Nov 25, 2015 at 06:24:13PM +0100, Mario Kleiner wrote:
> On 11/23/2015 09:24 PM, Ville Syrjälä wrote:
> > On Mon, Nov 23, 2015 at 06:58:34PM +0100, Mario Kleiner wrote:
> >>
> >>
> >> On 11/23/2015 04:51 PM, Ville Syrjälä wrote:
> >>> On Mon, Nov 23, 2015 at 04:23:21PM +0100, Mario Kleiner wrote:
> >>>> On 11/20/2015 04:34 PM, Ville Syrjälä wrote:
> >>>>> On Fri, Nov 20, 2015 at 04:24:50PM +0100, Mario Kleiner wrote:
> >>>>
> >>>> ...
> >>>> Ok, but why would that be a bad thing? I think we want it to think it is
> >>>> in the previous frame if it is called outside the vblank irq context.
> >>>> The only reason we fudge it to the next frames vblank if i vblank irq is
> >>>> because we know the vblank irq handler we are executing atm. was meant
> >>>> to execute within the upcoming vblank for the next frame, so we fudge
> >>>> the scanout positions and thereby timestamp to correspond to that new
> >>>> frame. But if something called outside irq context it should get a
> >>>> scanout position/timestamp that corresponds to "reality".
> >>>
> >>> It would be a bad thing since it would cause the timestamp to jump
> >>> backwards, and that would also cause the frame count guesstimate to go
> >>> backwards.
> >>>
> >>
> >> But only if we don't use the dev->driver->get_vblank_counter() method,
> >> which we try to use on AMD.
> >
> > Well, if you do it that way then you have the problem of the hw counter
> > seeming to jump forward by one after crossing the start of vblank (when
> > compared to the value you sampled when you processed the early vblank
> > interrupt).
> >
> 
> Ok, finally i see the bad scenario that wouldn't get prevented by our 
> current locking with the new vblank counting in the core. The vblank 
> enable path is safe due to locking and discounting of redundant 
> timestamps etc. But the disable path could go wrong:
> 
> 1. Vblank irq fires, drm_handle_vblank() -> drm_update_vblank_count(),
> updates timestamps and counts "as if" in vblank -> incremented vblank 
> count and timestamp now set in the future.
> 
> 2. After vblank irq finishes, but just before leading edge of vblank, 
> vblank_disable_and_save() executes, doesn't get bumped timestamp or 
> count because before vblank and not in vblank irq. Now 
> drm_update_vblank_count() would process a
> "new" timestamp and count from the past and we'd have time and counts 
> going backwards, and bad things would happen.
> 
> I haven't observed such a thing happening during testing so far, 
> probably because the time window in which it could happen is tiny, but 
> given how awfully bad it would be, it needs to be prevented.
> 
> I had a look at the description of the Vblank irq in the "M76 Register 
> Reference Guide" for older asics and the description suggests that the 
> vblank irq fires when the crtc's line buffer is finished reading pixel 
> data from the scanout buffer in memory for a frame, ie., when the line 
> buffer read "enters" vblank. That would explain why the irq happens a 
> few scanlines before actual vblank, because line buffer refills must 
> obviously happen before the crtc can send pixel data from the line 
> buffer to the encoders, so it would lead a bit in time. That also means 
> we can't delay the vblank irq to actually happen at start of vblank and 
> have to deal with the early vblank irq.
> 
> > I guess one silly idea would be to defer the vblank interrupt processing
> > to a timer, and just schedule it a bit into the future from the actual
> > interrupt handler.
> >
> 
> Timer would be bad because then we get problems with the pageflip 
> completion irq sometimes being processed before the vblank irq,

You you'd need to move page flip completion to happen from the vblank
timer too I suppose.

> and 
> because we want to be fast in vblank irq handling, delivering vblank 
> events etc. I wouldn't trust a timer to be reliable enough for such 
> short waits.

hrtimers should be accurate. But maybe more expensive than the timer
wheel.

> Busy waiting wouldn't be great either in irq.
> 
> So what about this relatively simple one?
> 
> 1. In radeon_get_crtc_scanoutpos() we artifically define the 
> vblank_start line to be, e.g, 5 scanlines before the true start of 
> vblank, so for the purpose of vblank counter queries and timestamping 
> our "vblank" would start a bit earlier and the vblank irq would always 
> execute in "vblank". Non-Irq invocations like vblank_disable_and_save() 
> would also be treated to this early vblank start and so what the DRM 
> core observes would always be consistent.
> 
> 2. At least in radeon-kms we also use radeon_get_crtc_scanoutpos() 
> internally for "dynpm" dynamic power management/reclocking, and to 
> implement pageflip completion detection on asics older than DCE3 which 
> don't have pageflip interrupts. For those cases we need to use the true 
> start of vblank, so for this internal use we pass in some special flag 
> into radeon_get_crtc_scanoutpos() to tell it to not shift the vblank 
> start around.
> 
> 3. I've added another check to my patch for drm_update_vblank_count() to 
> catch timestamps going backwards and discounting such cases, for a bit 
> of extra robustness against driver trouble.
> 
> Any ideas why this would be a stupid idea? I'll try this now and just 
> hope meanwhile nobody finds a reason why this would be bad.

What I don't like is leaking any of this into the core code. All the
hacks should live in the hw specific code. We've managed to keep all
the i915 hacks hidden that way.

So if you would:
- fudge your get_scanout_position as you suggested
- _not_ expose the hw counter to the vblank core since it
  disagrees with the scanout position
- keep the internal get_scanout_position variant/flag
  purely internal

then I think I might like it. That way working hardware doesn't have to
be exposed to these hacks, and there's possible a bit less danger that it
all gets broken again next time the core needs some work.

One problem with that approach could be that the vblank event and page
flip event would be delievered at different times if you keep using the
page flip interrupt, but I'm not sure that would be a problem. At least
they should both have the same timestamp and counter value.
Ville Syrjala Nov. 25, 2015, 5:58 p.m. UTC | #15
On Wed, Nov 25, 2015 at 06:24:13PM +0100, Mario Kleiner wrote:
> On 11/23/2015 09:24 PM, Ville Syrjälä wrote:
> > On Mon, Nov 23, 2015 at 06:58:34PM +0100, Mario Kleiner wrote:
> >>
> >>
> >> On 11/23/2015 04:51 PM, Ville Syrjälä wrote:
> >>> On Mon, Nov 23, 2015 at 04:23:21PM +0100, Mario Kleiner wrote:
> >>>> On 11/20/2015 04:34 PM, Ville Syrjälä wrote:
> >>>>> On Fri, Nov 20, 2015 at 04:24:50PM +0100, Mario Kleiner wrote:
> >>>>
> >>>> ...
> >>>> Ok, but why would that be a bad thing? I think we want it to think it is
> >>>> in the previous frame if it is called outside the vblank irq context.
> >>>> The only reason we fudge it to the next frames vblank if i vblank irq is
> >>>> because we know the vblank irq handler we are executing atm. was meant
> >>>> to execute within the upcoming vblank for the next frame, so we fudge
> >>>> the scanout positions and thereby timestamp to correspond to that new
> >>>> frame. But if something called outside irq context it should get a
> >>>> scanout position/timestamp that corresponds to "reality".
> >>>
> >>> It would be a bad thing since it would cause the timestamp to jump
> >>> backwards, and that would also cause the frame count guesstimate to go
> >>> backwards.
> >>>
> >>
> >> But only if we don't use the dev->driver->get_vblank_counter() method,
> >> which we try to use on AMD.
> >
> > Well, if you do it that way then you have the problem of the hw counter
> > seeming to jump forward by one after crossing the start of vblank (when
> > compared to the value you sampled when you processed the early vblank
> > interrupt).
> >
> 
> Ok, finally i see the bad scenario that wouldn't get prevented by our 
> current locking with the new vblank counting in the core. The vblank 
> enable path is safe due to locking and discounting of redundant 
> timestamps etc. But the disable path could go wrong:
> 
> 1. Vblank irq fires, drm_handle_vblank() -> drm_update_vblank_count(),
> updates timestamps and counts "as if" in vblank -> incremented vblank 
> count and timestamp now set in the future.
> 
> 2. After vblank irq finishes, but just before leading edge of vblank, 
> vblank_disable_and_save() executes, doesn't get bumped timestamp or 
> count because before vblank and not in vblank irq. Now 
> drm_update_vblank_count() would process a
> "new" timestamp and count from the past and we'd have time and counts 
> going backwards, and bad things would happen.
> 
> I haven't observed such a thing happening during testing so far, 
> probably because the time window in which it could happen is tiny, but 
> given how awfully bad it would be, it needs to be prevented.
> 
> I had a look at the description of the Vblank irq in the "M76 Register 
> Reference Guide" for older asics and the description suggests that the 
> vblank irq fires when the crtc's line buffer is finished reading pixel 
> data from the scanout buffer in memory for a frame, ie., when the line 
> buffer read "enters" vblank.

Hmm. Does that mean there's always at least one fullscreen plane enabled
in the hw? As in you can't turn off the primary plane or make it smaller
than the active video area? Othwewise it sounds like you'd could either
not get it at all, or get it somewhere in the middle of the screen.
Mario Kleiner Nov. 25, 2015, 6:21 p.m. UTC | #16
On 11/25/2015 06:58 PM, Ville Syrjälä wrote:
> On Wed, Nov 25, 2015 at 06:24:13PM +0100, Mario Kleiner wrote:
>> On 11/23/2015 09:24 PM, Ville Syrjälä wrote:
>>> On Mon, Nov 23, 2015 at 06:58:34PM +0100, Mario Kleiner wrote:
>>>>
>>>>
>>>> On 11/23/2015 04:51 PM, Ville Syrjälä wrote:
>>>>> On Mon, Nov 23, 2015 at 04:23:21PM +0100, Mario Kleiner wrote:
>>>>>> On 11/20/2015 04:34 PM, Ville Syrjälä wrote:
>>>>>>> On Fri, Nov 20, 2015 at 04:24:50PM +0100, Mario Kleiner wrote:
>>>>>>
>>>>>> ...
>>>>>> Ok, but why would that be a bad thing? I think we want it to think it is
>>>>>> in the previous frame if it is called outside the vblank irq context.
>>>>>> The only reason we fudge it to the next frames vblank if i vblank irq is
>>>>>> because we know the vblank irq handler we are executing atm. was meant
>>>>>> to execute within the upcoming vblank for the next frame, so we fudge
>>>>>> the scanout positions and thereby timestamp to correspond to that new
>>>>>> frame. But if something called outside irq context it should get a
>>>>>> scanout position/timestamp that corresponds to "reality".
>>>>>
>>>>> It would be a bad thing since it would cause the timestamp to jump
>>>>> backwards, and that would also cause the frame count guesstimate to go
>>>>> backwards.
>>>>>
>>>>
>>>> But only if we don't use the dev->driver->get_vblank_counter() method,
>>>> which we try to use on AMD.
>>>
>>> Well, if you do it that way then you have the problem of the hw counter
>>> seeming to jump forward by one after crossing the start of vblank (when
>>> compared to the value you sampled when you processed the early vblank
>>> interrupt).
>>>
>>
>> Ok, finally i see the bad scenario that wouldn't get prevented by our
>> current locking with the new vblank counting in the core. The vblank
>> enable path is safe due to locking and discounting of redundant
>> timestamps etc. But the disable path could go wrong:
>>
>> 1. Vblank irq fires, drm_handle_vblank() -> drm_update_vblank_count(),
>> updates timestamps and counts "as if" in vblank -> incremented vblank
>> count and timestamp now set in the future.
>>
>> 2. After vblank irq finishes, but just before leading edge of vblank,
>> vblank_disable_and_save() executes, doesn't get bumped timestamp or
>> count because before vblank and not in vblank irq. Now
>> drm_update_vblank_count() would process a
>> "new" timestamp and count from the past and we'd have time and counts
>> going backwards, and bad things would happen.
>>
>> I haven't observed such a thing happening during testing so far,
>> probably because the time window in which it could happen is tiny, but
>> given how awfully bad it would be, it needs to be prevented.
>>
>> I had a look at the description of the Vblank irq in the "M76 Register
>> Reference Guide" for older asics and the description suggests that the
>> vblank irq fires when the crtc's line buffer is finished reading pixel
>> data from the scanout buffer in memory for a frame, ie., when the line
>> buffer read "enters" vblank.
>
> Hmm. Does that mean there's always at least one fullscreen plane enabled
> in the hw? As in you can't turn off the primary plane or make it smaller
> than the active video area? Othwewise it sounds like you'd could either
> not get it at all, or get it somewhere in the middle of the screen.
>

It says "Interrupt that can be programmed to be generated by the
primary display controller's line buffer logic either when the
source image line counter is not requesting any active
display data (i.e. in the vertical blank) or the output CRTC
timing generator is within the vertical blanking region."

So my statements were my interpretation of this quote, so i can make 
some sense out of the vblank irq behaviour. I guess Alex or Harry would 
know? The M76 reference refers to some older asics, i just assume it is 
the same for the current ones, given that observed behaviour would be 
consistent with the line buffer causing this lead of a couple of 
scanlines. I see about 2 scanlines on DCE4 and about 3 scanlines on 
DCE3. I don't know how big the line buffer is, how quickly it refills 
etc., but it sounds reasonable.

-mario
Ville Syrjala Nov. 25, 2015, 7:36 p.m. UTC | #17
On Wed, Nov 25, 2015 at 08:04:26PM +0100, Mario Kleiner wrote:
> On 11/25/2015 06:46 PM, Ville Syrjälä wrote:
> > On Wed, Nov 25, 2015 at 06:24:13PM +0100, Mario Kleiner wrote:
> >> On 11/23/2015 09:24 PM, Ville Syrjälä wrote:
> >>> On Mon, Nov 23, 2015 at 06:58:34PM +0100, Mario Kleiner wrote:
> >>>>
> >>>>
> >>>> On 11/23/2015 04:51 PM, Ville Syrjälä wrote:
> >>>>> On Mon, Nov 23, 2015 at 04:23:21PM +0100, Mario Kleiner wrote:
> >>>>>> On 11/20/2015 04:34 PM, Ville Syrjälä wrote:
> >>>>>>> On Fri, Nov 20, 2015 at 04:24:50PM +0100, Mario Kleiner wrote:
> >>>>>>
> >>>>>> ...
> >>>>>> Ok, but why would that be a bad thing? I think we want it to think it is
> >>>>>> in the previous frame if it is called outside the vblank irq context.
> >>>>>> The only reason we fudge it to the next frames vblank if i vblank irq is
> >>>>>> because we know the vblank irq handler we are executing atm. was meant
> >>>>>> to execute within the upcoming vblank for the next frame, so we fudge
> >>>>>> the scanout positions and thereby timestamp to correspond to that new
> >>>>>> frame. But if something called outside irq context it should get a
> >>>>>> scanout position/timestamp that corresponds to "reality".
> >>>>>
> >>>>> It would be a bad thing since it would cause the timestamp to jump
> >>>>> backwards, and that would also cause the frame count guesstimate to go
> >>>>> backwards.
> >>>>>
> >>>>
> >>>> But only if we don't use the dev->driver->get_vblank_counter() method,
> >>>> which we try to use on AMD.
> >>>
> >>> Well, if you do it that way then you have the problem of the hw counter
> >>> seeming to jump forward by one after crossing the start of vblank (when
> >>> compared to the value you sampled when you processed the early vblank
> >>> interrupt).
> >>>
> >>
> >> Ok, finally i see the bad scenario that wouldn't get prevented by our
> >> current locking with the new vblank counting in the core. The vblank
> >> enable path is safe due to locking and discounting of redundant
> >> timestamps etc. But the disable path could go wrong:
> >>
> >> 1. Vblank irq fires, drm_handle_vblank() -> drm_update_vblank_count(),
> >> updates timestamps and counts "as if" in vblank -> incremented vblank
> >> count and timestamp now set in the future.
> >>
> >> 2. After vblank irq finishes, but just before leading edge of vblank,
> >> vblank_disable_and_save() executes, doesn't get bumped timestamp or
> >> count because before vblank and not in vblank irq. Now
> >> drm_update_vblank_count() would process a
> >> "new" timestamp and count from the past and we'd have time and counts
> >> going backwards, and bad things would happen.
> >>
> >> I haven't observed such a thing happening during testing so far,
> >> probably because the time window in which it could happen is tiny, but
> >> given how awfully bad it would be, it needs to be prevented.
> >>
> >> I had a look at the description of the Vblank irq in the "M76 Register
> >> Reference Guide" for older asics and the description suggests that the
> >> vblank irq fires when the crtc's line buffer is finished reading pixel
> >> data from the scanout buffer in memory for a frame, ie., when the line
> >> buffer read "enters" vblank. That would explain why the irq happens a
> >> few scanlines before actual vblank, because line buffer refills must
> >> obviously happen before the crtc can send pixel data from the line
> >> buffer to the encoders, so it would lead a bit in time. That also means
> >> we can't delay the vblank irq to actually happen at start of vblank and
> >> have to deal with the early vblank irq.
> >>
> >>> I guess one silly idea would be to defer the vblank interrupt processing
> >>> to a timer, and just schedule it a bit into the future from the actual
> >>> interrupt handler.
> >>>
> >>
> >> Timer would be bad because then we get problems with the pageflip
> >> completion irq sometimes being processed before the vblank irq,
> >
> > You you'd need to move page flip completion to happen from the vblank
> > timer too I suppose.
> >
> >> and
> >> because we want to be fast in vblank irq handling, delivering vblank
> >> events etc. I wouldn't trust a timer to be reliable enough for such
> >> short waits.
> >
> > hrtimers should be accurate. But maybe more expensive than the timer
> > wheel.
> >
> 
> Sounds all a bit complex and fraught with new possible complications. I 
> can't spend much more time on this than i did so far, and we need to get 
> this into 4.4-rc, so i am aiming for the most simple solution that would 
> work.
> 
> >> Busy waiting wouldn't be great either in irq.
> >>
> >> So what about this relatively simple one?
> >>
> >> 1. In radeon_get_crtc_scanoutpos() we artifically define the
> >> vblank_start line to be, e.g, 5 scanlines before the true start of
> >> vblank, so for the purpose of vblank counter queries and timestamping
> >> our "vblank" would start a bit earlier and the vblank irq would always
> >> execute in "vblank". Non-Irq invocations like vblank_disable_and_save()
> >> would also be treated to this early vblank start and so what the DRM
> >> core observes would always be consistent.
> >>
> >> 2. At least in radeon-kms we also use radeon_get_crtc_scanoutpos()
> >> internally for "dynpm" dynamic power management/reclocking, and to
> >> implement pageflip completion detection on asics older than DCE3 which
> >> don't have pageflip interrupts. For those cases we need to use the true
> >> start of vblank, so for this internal use we pass in some special flag
> >> into radeon_get_crtc_scanoutpos() to tell it to not shift the vblank
> >> start around.
> >>
> >> 3. I've added another check to my patch for drm_update_vblank_count() to
> >> catch timestamps going backwards and discounting such cases, for a bit
> >> of extra robustness against driver trouble.
> >>
> >> Any ideas why this would be a stupid idea? I'll try this now and just
> >> hope meanwhile nobody finds a reason why this would be bad.
> >
> > What I don't like is leaking any of this into the core code. All the
> > hacks should live in the hw specific code. We've managed to keep all
> > the i915 hacks hidden that way.
> >
> > So if you would:
> > - fudge your get_scanout_position as you suggested
> > - _not_ expose the hw counter to the vblank core since it
> >    disagrees with the scanout position
> > - keep the internal get_scanout_position variant/flag
> >    purely internal
> >
> > then I think I might like it. That way working hardware doesn't have to
> > be exposed to these hacks, and there's possible a bit less danger that it
> > all gets broken again next time the core needs some work.
> >
> 
> Ok. Exposing the fudged hw counter shouldn't be a problem though, given 
> that it would be fudged in a consistent way with the fudged scanout 
> positions and to have incremented at time of drm_handle_vblank(). We'd 
> bump the hw counter to the count of the new vblank at the same time when 
> the scanout positions would start counting backwards from minus 
> something to 0, showing how many vblank lines to go until start of 
> scanout, etc. The only difference to reality would be that this 
> simulated vblank would start 5 scanlines earlier than the true hw 
> vblank, but i can't see how the core would care about that.
> 
> 
> > One problem with that approach could be that the vblank event and page
> > flip event would be delievered at different times if you keep using the
> > page flip interrupt, but I'm not sure that would be a problem. At least
> > they should both have the same timestamp and counter value.
> >
> 
> That's the same now, and the timestamps and counts be the same. I'll 
> check with my measurement equipment that the timestamps will be still 
> accurate wrt. to reality.
> 
> Attached is my current patch i wanted to submit for the drm core's 
> drm_update_vblank_count(). I think it's good to make the core somewhat 
> robust against potential kms driver bugs or glitches. But if you 
> wouldn't like that patch, there wouldn't be much of a point sending it 
> out at all.
> 
> thanks,
> -mario
> 

> >From 2d5d58a1c575ad002ce2cb643f395d0e4757d959 Mon Sep 17 00:00:00 2001
> From: Mario Kleiner <mario.kleiner.de@gmail.com>
> Date: Wed, 25 Nov 2015 18:48:31 +0100
> Subject: [PATCH] drm/irq: Make drm_update_vblank_count() more robust.
> 
> The changes to drm_update_vblank_count() for Linux 4.4-rc
> made the function more fragile wrt. some hw quirks. E.g.,
> at dev->driver->enable_vblank(), AMD gpu's fire a spurious
> redundant vblank irq shortly after enabling vblank irqs, not
> locked to vblank. This causes a redundant call which needs
> to be suppressed to avoid miscounting.
> 
> To increase robustness, shuffle things around a bit:
> 
> On drivers with high precision vblank timestamping always
> evaluate the timestamp difference between current timestamp
> and previously recorded timestamp to detect such redundant
> invocations and no-op in that case.
> 
> Also detect and warn about timestamps going backwards to
> catch potential kms driver bugs.
> 
> This patch is meant for Linux 4.4-rc and later.
> 
> Signed-off-by: Mario Kleiner <mario.kleiner.de@gmail.com>
> ---
>  drivers/gpu/drm/drm_irq.c | 53 ++++++++++++++++++++++++++++++++++++-----------
>  1 file changed, 41 insertions(+), 12 deletions(-)
> 
> diff --git a/drivers/gpu/drm/drm_irq.c b/drivers/gpu/drm/drm_irq.c
> index 819b8c1..8728c3c 100644
> --- a/drivers/gpu/drm/drm_irq.c
> +++ b/drivers/gpu/drm/drm_irq.c
> @@ -172,9 +172,11 @@ static void drm_update_vblank_count(struct drm_device *dev, unsigned int pipe,
>  				    unsigned long flags)
>  {
>  	struct drm_vblank_crtc *vblank = &dev->vblank[pipe];
> -	u32 cur_vblank, diff;
> +	u32 cur_vblank, diff = 0;
>  	bool rc;
>  	struct timeval t_vblank;
> +	const struct timeval *t_old;
> +	u64 diff_ns;
>  	int count = DRM_TIMESTAMP_MAXRETRIES;
>  	int framedur_ns = vblank->framedur_ns;
>  
> @@ -195,13 +197,15 @@ static void drm_update_vblank_count(struct drm_device *dev, unsigned int pipe,
>  		rc = drm_get_last_vbltimestamp(dev, pipe, &t_vblank, flags);
>  	} while (cur_vblank != dev->driver->get_vblank_counter(dev, pipe) && --count > 0);
>  
> -	if (dev->max_vblank_count != 0) {
> -		/* trust the hw counter when it's around */
> -		diff = (cur_vblank - vblank->last) & dev->max_vblank_count;
> -	} else if (rc && framedur_ns) {
> -		const struct timeval *t_old;
> -		u64 diff_ns;
> -
> +	/*
> +	 * Always use vblank timestamping based method if supported to reject
> +	 * redundant vblank irqs. E.g., AMD hardware needs this to not screw up
> +	 * due to some irq handling quirk.
> +	 *
> +	 * This also sets the diff value for use as fallback below in case the
> +	 * hw does not support a suitable hw vblank counter.
> +	 */
> +	if (rc && framedur_ns) {

If you fudged everything properly why do you still need this? With
working hw counter there should be no need to do this stuff.

>  		t_old = &vblanktimestamp(dev, pipe, vblank->count);
>  		diff_ns = timeval_to_ns(&t_vblank) - timeval_to_ns(t_old);
>  
> @@ -212,11 +216,36 @@ static void drm_update_vblank_count(struct drm_device *dev, unsigned int pipe,
>  		 */
>  		diff = DIV_ROUND_CLOSEST_ULL(diff_ns, framedur_ns);
>  
> -		if (diff == 0 && flags & DRM_CALLED_FROM_VBLIRQ)
> +		/* Catch driver timestamping bugs and prevent worse. */
> +		if ((s32) diff < 0) {
> +			DRM_DEBUG_VBL("crtc %u: Timestamp going backward!"
> +			" diff_ns = %lld, framedur_ns = %d)\n",
> +			pipe, (long long) diff_ns, framedur_ns);
> +			return;
> +		}
> +
> +		if (diff == 0 && flags & DRM_CALLED_FROM_VBLIRQ) {
>  			DRM_DEBUG_VBL("crtc %u: Redundant vblirq ignored."
> -				      " diff_ns = %lld, framedur_ns = %d)\n",
> -				      pipe, (long long) diff_ns, framedur_ns);
> -	} else {
> +			" diff_ns = %lld, framedur_ns = %d)\n",
> +			pipe, (long long) diff_ns, framedur_ns);
> +
> +			/* Treat this redundant invocation as no-op. */
> +			WARN_ON_ONCE(cur_vblank != vblank->last);
> +			return;
> +		}
> +	}
> +
> +	/*
> +	 * hw counters, if marked as supported via max_vblank_count != 0,
> +	 * *must* increment at leading edge of vblank or at least before
> +	 * the firing of the hw vblank irq, otherwise we have a race here
> +	 * between gpu and us, where small variations in our execution
> +	 * timing can cause off-by-one miscounting of vblanks.
> +	 */
> +	if (dev->max_vblank_count != 0) {
> +		/* trust the hw counter when it's around */
> +		diff = (cur_vblank - vblank->last) & dev->max_vblank_count;
> +	} else if (!(rc && framedur_ns)) {
>  		/* some kind of default for drivers w/o accurate vbl timestamping */
>  		diff = (flags & DRM_CALLED_FROM_VBLIRQ) != 0;
>  	}
> -- 
> 1.9.1
>
Alex Deucher Nov. 25, 2015, 7:38 p.m. UTC | #18
On Wed, Nov 25, 2015 at 1:21 PM, Mario Kleiner
<mario.kleiner.de@gmail.com> wrote:
> On 11/25/2015 06:58 PM, Ville Syrjälä wrote:
>>
>> On Wed, Nov 25, 2015 at 06:24:13PM +0100, Mario Kleiner wrote:
>>>
>>> On 11/23/2015 09:24 PM, Ville Syrjälä wrote:
>>>>
>>>> On Mon, Nov 23, 2015 at 06:58:34PM +0100, Mario Kleiner wrote:
>>>>>
>>>>>
>>>>>
>>>>> On 11/23/2015 04:51 PM, Ville Syrjälä wrote:
>>>>>>
>>>>>> On Mon, Nov 23, 2015 at 04:23:21PM +0100, Mario Kleiner wrote:
>>>>>>>
>>>>>>> On 11/20/2015 04:34 PM, Ville Syrjälä wrote:
>>>>>>>>
>>>>>>>> On Fri, Nov 20, 2015 at 04:24:50PM +0100, Mario Kleiner wrote:
>>>>>>>
>>>>>>>
>>>>>>> ...
>>>>>>> Ok, but why would that be a bad thing? I think we want it to think it
>>>>>>> is
>>>>>>> in the previous frame if it is called outside the vblank irq context.
>>>>>>> The only reason we fudge it to the next frames vblank if i vblank irq
>>>>>>> is
>>>>>>> because we know the vblank irq handler we are executing atm. was
>>>>>>> meant
>>>>>>> to execute within the upcoming vblank for the next frame, so we fudge
>>>>>>> the scanout positions and thereby timestamp to correspond to that new
>>>>>>> frame. But if something called outside irq context it should get a
>>>>>>> scanout position/timestamp that corresponds to "reality".
>>>>>>
>>>>>>
>>>>>> It would be a bad thing since it would cause the timestamp to jump
>>>>>> backwards, and that would also cause the frame count guesstimate to go
>>>>>> backwards.
>>>>>>
>>>>>
>>>>> But only if we don't use the dev->driver->get_vblank_counter() method,
>>>>> which we try to use on AMD.
>>>>
>>>>
>>>> Well, if you do it that way then you have the problem of the hw counter
>>>> seeming to jump forward by one after crossing the start of vblank (when
>>>> compared to the value you sampled when you processed the early vblank
>>>> interrupt).
>>>>
>>>
>>> Ok, finally i see the bad scenario that wouldn't get prevented by our
>>> current locking with the new vblank counting in the core. The vblank
>>> enable path is safe due to locking and discounting of redundant
>>> timestamps etc. But the disable path could go wrong:
>>>
>>> 1. Vblank irq fires, drm_handle_vblank() -> drm_update_vblank_count(),
>>> updates timestamps and counts "as if" in vblank -> incremented vblank
>>> count and timestamp now set in the future.
>>>
>>> 2. After vblank irq finishes, but just before leading edge of vblank,
>>> vblank_disable_and_save() executes, doesn't get bumped timestamp or
>>> count because before vblank and not in vblank irq. Now
>>> drm_update_vblank_count() would process a
>>> "new" timestamp and count from the past and we'd have time and counts
>>> going backwards, and bad things would happen.
>>>
>>> I haven't observed such a thing happening during testing so far,
>>> probably because the time window in which it could happen is tiny, but
>>> given how awfully bad it would be, it needs to be prevented.
>>>
>>> I had a look at the description of the Vblank irq in the "M76 Register
>>> Reference Guide" for older asics and the description suggests that the
>>> vblank irq fires when the crtc's line buffer is finished reading pixel
>>> data from the scanout buffer in memory for a frame, ie., when the line
>>> buffer read "enters" vblank.
>>
>>
>> Hmm. Does that mean there's always at least one fullscreen plane enabled
>> in the hw? As in you can't turn off the primary plane or make it smaller
>> than the active video area? Othwewise it sounds like you'd could either
>> not get it at all, or get it somewhere in the middle of the screen.
>>
>
> It says "Interrupt that can be programmed to be generated by the
> primary display controller's line buffer logic either when the
> source image line counter is not requesting any active
> display data (i.e. in the vertical blank) or the output CRTC
> timing generator is within the vertical blanking region."
>
> So my statements were my interpretation of this quote, so i can make some
> sense out of the vblank irq behaviour. I guess Alex or Harry would know? The
> M76 reference refers to some older asics, i just assume it is the same for
> the current ones, given that observed behaviour would be consistent with the
> line buffer causing this lead of a couple of scanlines. I see about 2
> scanlines on DCE4 and about 3 scanlines on DCE3. I don't know how big the
> line buffer is, how quickly it refills etc., but it sounds reasonable.

The size of the line buffer varies by generation, but the LB logic is
still responsible for generating the vblank interrupt even on newer
hw.

Alex
Mario Kleiner Nov. 27, 2015, 2:09 p.m. UTC | #19
On 11/25/2015 08:36 PM, Ville Syrjälä wrote:
> On Wed, Nov 25, 2015 at 08:04:26PM +0100, Mario Kleiner wrote:
>> On 11/25/2015 06:46 PM, Ville Syrjälä wrote:

...

>> Attached is my current patch i wanted to submit for the drm core's
>> drm_update_vblank_count(). I think it's good to make the core somewhat
>> robust against potential kms driver bugs or glitches. But if you
>> wouldn't like that patch, there wouldn't be much of a point sending it
>> out at all.
>>
>> thanks,
>> -mario
>>
>
>> >From 2d5d58a1c575ad002ce2cb643f395d0e4757d959 Mon Sep 17 00:00:00 2001
>> From: Mario Kleiner <mario.kleiner.de@gmail.com>
>> Date: Wed, 25 Nov 2015 18:48:31 +0100
>> Subject: [PATCH] drm/irq: Make drm_update_vblank_count() more robust.
>>
>> The changes to drm_update_vblank_count() for Linux 4.4-rc
>> made the function more fragile wrt. some hw quirks. E.g.,
>> at dev->driver->enable_vblank(), AMD gpu's fire a spurious
>> redundant vblank irq shortly after enabling vblank irqs, not
>> locked to vblank. This causes a redundant call which needs
>> to be suppressed to avoid miscounting.
>>
>> To increase robustness, shuffle things around a bit:
>>
>> On drivers with high precision vblank timestamping always
>> evaluate the timestamp difference between current timestamp
>> and previously recorded timestamp to detect such redundant
>> invocations and no-op in that case.
>>
>> Also detect and warn about timestamps going backwards to
>> catch potential kms driver bugs.
>>
>> This patch is meant for Linux 4.4-rc and later.
>>
>> Signed-off-by: Mario Kleiner <mario.kleiner.de@gmail.com>
>> ---
>>   drivers/gpu/drm/drm_irq.c | 53 ++++++++++++++++++++++++++++++++++++-----------
>>   1 file changed, 41 insertions(+), 12 deletions(-)
>>
>> diff --git a/drivers/gpu/drm/drm_irq.c b/drivers/gpu/drm/drm_irq.c
>> index 819b8c1..8728c3c 100644
>> --- a/drivers/gpu/drm/drm_irq.c
>> +++ b/drivers/gpu/drm/drm_irq.c
>> @@ -172,9 +172,11 @@ static void drm_update_vblank_count(struct drm_device *dev, unsigned int pipe,
>>   				    unsigned long flags)
>>   {
>>   	struct drm_vblank_crtc *vblank = &dev->vblank[pipe];
>> -	u32 cur_vblank, diff;
>> +	u32 cur_vblank, diff = 0;
>>   	bool rc;
>>   	struct timeval t_vblank;
>> +	const struct timeval *t_old;
>> +	u64 diff_ns;
>>   	int count = DRM_TIMESTAMP_MAXRETRIES;
>>   	int framedur_ns = vblank->framedur_ns;
>>
>> @@ -195,13 +197,15 @@ static void drm_update_vblank_count(struct drm_device *dev, unsigned int pipe,
>>   		rc = drm_get_last_vbltimestamp(dev, pipe, &t_vblank, flags);
>>   	} while (cur_vblank != dev->driver->get_vblank_counter(dev, pipe) && --count > 0);
>>
>> -	if (dev->max_vblank_count != 0) {
>> -		/* trust the hw counter when it's around */
>> -		diff = (cur_vblank - vblank->last) & dev->max_vblank_count;
>> -	} else if (rc && framedur_ns) {
>> -		const struct timeval *t_old;
>> -		u64 diff_ns;
>> -
>> +	/*
>> +	 * Always use vblank timestamping based method if supported to reject
>> +	 * redundant vblank irqs. E.g., AMD hardware needs this to not screw up
>> +	 * due to some irq handling quirk.
>> +	 *
>> +	 * This also sets the diff value for use as fallback below in case the
>> +	 * hw does not support a suitable hw vblank counter.
>> +	 */
>> +	if (rc && framedur_ns) {
>
> If you fudged everything properly why do you still need this? With
> working hw counter there should be no need to do this stuff.
>

As far as testing on one DCE4 card goes, i don't need it anymore with my 
fudged hw counters and timestamps. The fudging so far seems to work 
nicely. I just wanted to have a bit of extra robustness and a bit of 
extra available debug output against future or other broken drivers, or 
mistakes in fudging on the current driver, e.g., against things like 
timestamps going backwards. Especially since i can only test on two AMD 
cards atm., quite a limited sample. There are 3 display engine 
generations before and 5 generations after my test sample.

-mario


>>   		t_old = &vblanktimestamp(dev, pipe, vblank->count);
>>   		diff_ns = timeval_to_ns(&t_vblank) - timeval_to_ns(t_old);
>>
>> @@ -212,11 +216,36 @@ static void drm_update_vblank_count(struct drm_device *dev, unsigned int pipe,
>>   		 */
>>   		diff = DIV_ROUND_CLOSEST_ULL(diff_ns, framedur_ns);
>>
>> -		if (diff == 0 && flags & DRM_CALLED_FROM_VBLIRQ)
>> +		/* Catch driver timestamping bugs and prevent worse. */
>> +		if ((s32) diff < 0) {
>> +			DRM_DEBUG_VBL("crtc %u: Timestamp going backward!"
>> +			" diff_ns = %lld, framedur_ns = %d)\n",
>> +			pipe, (long long) diff_ns, framedur_ns);
>> +			return;
>> +		}
>> +
>> +		if (diff == 0 && flags & DRM_CALLED_FROM_VBLIRQ) {
>>   			DRM_DEBUG_VBL("crtc %u: Redundant vblirq ignored."
>> -				      " diff_ns = %lld, framedur_ns = %d)\n",
>> -				      pipe, (long long) diff_ns, framedur_ns);
>> -	} else {
>> +			" diff_ns = %lld, framedur_ns = %d)\n",
>> +			pipe, (long long) diff_ns, framedur_ns);
>> +
>> +			/* Treat this redundant invocation as no-op. */
>> +			WARN_ON_ONCE(cur_vblank != vblank->last);
>> +			return;
>> +		}
>> +	}
>> +
>> +	/*
>> +	 * hw counters, if marked as supported via max_vblank_count != 0,
>> +	 * *must* increment at leading edge of vblank or at least before
>> +	 * the firing of the hw vblank irq, otherwise we have a race here
>> +	 * between gpu and us, where small variations in our execution
>> +	 * timing can cause off-by-one miscounting of vblanks.
>> +	 */
>> +	if (dev->max_vblank_count != 0) {
>> +		/* trust the hw counter when it's around */
>> +		diff = (cur_vblank - vblank->last) & dev->max_vblank_count;
>> +	} else if (!(rc && framedur_ns)) {
>>   		/* some kind of default for drivers w/o accurate vbl timestamping */
>>   		diff = (flags & DRM_CALLED_FROM_VBLIRQ) != 0;
>>   	}
>> --
>> 1.9.1
>>
>
>
Mario Kleiner Nov. 27, 2015, 2:36 p.m. UTC | #20
On 11/25/2015 08:38 PM, Alex Deucher wrote:
> On Wed, Nov 25, 2015 at 1:21 PM, Mario Kleiner
> <mario.kleiner.de@gmail.com> wrote:
>> On 11/25/2015 06:58 PM, Ville Syrjälä wrote:
>>>
>>> On Wed, Nov 25, 2015 at 06:24:13PM +0100, Mario Kleiner wrote:
>>>>
>>>> On 11/23/2015 09:24 PM, Ville Syrjälä wrote:
>>>>>
>>>>> On Mon, Nov 23, 2015 at 06:58:34PM +0100, Mario Kleiner wrote:
>>>>>>
>>>>>>
>>>>>>
>>>>>> On 11/23/2015 04:51 PM, Ville Syrjälä wrote:
>>>>>>>
>>>>>>> On Mon, Nov 23, 2015 at 04:23:21PM +0100, Mario Kleiner wrote:
>>>>>>>>
>>>>>>>> On 11/20/2015 04:34 PM, Ville Syrjälä wrote:
>>>>>>>>>
>>>>>>>>> On Fri, Nov 20, 2015 at 04:24:50PM +0100, Mario Kleiner wrote:
>>>>>>>>
>>>>>>>>
>>>>>>>> ...
>>>>>>>> Ok, but why would that be a bad thing? I think we want it to think it
>>>>>>>> is
>>>>>>>> in the previous frame if it is called outside the vblank irq context.
>>>>>>>> The only reason we fudge it to the next frames vblank if i vblank irq
>>>>>>>> is
>>>>>>>> because we know the vblank irq handler we are executing atm. was
>>>>>>>> meant
>>>>>>>> to execute within the upcoming vblank for the next frame, so we fudge
>>>>>>>> the scanout positions and thereby timestamp to correspond to that new
>>>>>>>> frame. But if something called outside irq context it should get a
>>>>>>>> scanout position/timestamp that corresponds to "reality".
>>>>>>>
>>>>>>>
>>>>>>> It would be a bad thing since it would cause the timestamp to jump
>>>>>>> backwards, and that would also cause the frame count guesstimate to go
>>>>>>> backwards.
>>>>>>>
>>>>>>
>>>>>> But only if we don't use the dev->driver->get_vblank_counter() method,
>>>>>> which we try to use on AMD.
>>>>>
>>>>>
>>>>> Well, if you do it that way then you have the problem of the hw counter
>>>>> seeming to jump forward by one after crossing the start of vblank (when
>>>>> compared to the value you sampled when you processed the early vblank
>>>>> interrupt).
>>>>>
>>>>
>>>> Ok, finally i see the bad scenario that wouldn't get prevented by our
>>>> current locking with the new vblank counting in the core. The vblank
>>>> enable path is safe due to locking and discounting of redundant
>>>> timestamps etc. But the disable path could go wrong:
>>>>
>>>> 1. Vblank irq fires, drm_handle_vblank() -> drm_update_vblank_count(),
>>>> updates timestamps and counts "as if" in vblank -> incremented vblank
>>>> count and timestamp now set in the future.
>>>>
>>>> 2. After vblank irq finishes, but just before leading edge of vblank,
>>>> vblank_disable_and_save() executes, doesn't get bumped timestamp or
>>>> count because before vblank and not in vblank irq. Now
>>>> drm_update_vblank_count() would process a
>>>> "new" timestamp and count from the past and we'd have time and counts
>>>> going backwards, and bad things would happen.
>>>>
>>>> I haven't observed such a thing happening during testing so far,
>>>> probably because the time window in which it could happen is tiny, but
>>>> given how awfully bad it would be, it needs to be prevented.
>>>>
>>>> I had a look at the description of the Vblank irq in the "M76 Register
>>>> Reference Guide" for older asics and the description suggests that the
>>>> vblank irq fires when the crtc's line buffer is finished reading pixel
>>>> data from the scanout buffer in memory for a frame, ie., when the line
>>>> buffer read "enters" vblank.
>>>
>>>
>>> Hmm. Does that mean there's always at least one fullscreen plane enabled
>>> in the hw? As in you can't turn off the primary plane or make it smaller
>>> than the active video area? Othwewise it sounds like you'd could either
>>> not get it at all, or get it somewhere in the middle of the screen.
>>>
>>
>> It says "Interrupt that can be programmed to be generated by the
>> primary display controller's line buffer logic either when the
>> source image line counter is not requesting any active
>> display data (i.e. in the vertical blank) or the output CRTC
>> timing generator is within the vertical blanking region."
>>
>> So my statements were my interpretation of this quote, so i can make some
>> sense out of the vblank irq behaviour. I guess Alex or Harry would know? The
>> M76 reference refers to some older asics, i just assume it is the same for
>> the current ones, given that observed behaviour would be consistent with the
>> line buffer causing this lead of a couple of scanlines. I see about 2
>> scanlines on DCE4 and about 3 scanlines on DCE3. I don't know how big the
>> line buffer is, how quickly it refills etc., but it sounds reasonable.
>
> The size of the line buffer varies by generation, but the LB logic is
> still responsible for generating the vblank interrupt even on newer
> hw.
>
> Alex
>

Thanks for the pointer. I digged through the 
evergreen_line_buffer_adjust() function and its siblings from classic to 
DCE11. Seems the line buffer capacity goes up to a max 16384 pixels on 
single display, e.g., evergreen, and more like 8192 pixels on DCE8+? 
It's expressed as 8192 * 2 and talking about different line buffer 
partitions in some places. At least for DCE4+ i can see the sizes from 
the code, but not for earlier gens.

To find the proper value of how much earlier i need to fudge/place the 
start of vblank wrt. the true start of vblank in the code, i think i'll 
have to adjust for line buffer size.

The worst case would be that the line buffer can be ahead of our scanout 
positions by the full height of the line buffer a la: fudgelines = 
lb_size / mode->crtc_hdisplay

I think i could calculate this in the xxx_line_buffer_adjust() functions 
and store it in radeon_crtc for use in the fudging code. Atm. i'm just 
using a hard-coded "10" which so far worked for typical display modes. 
Assuming something like a 640x480 mode i'd need to set a constant of 26 
if i'd want to account for full line buffer size as worst case. So i 
wonder what a good value would be to be safe but at the same time to 
keep this extended vblank small? E.g., maybe a smaller value than full 
line buffer height would do because due to the way watermarks for lb 
refill are set it can't get ahead by more than a few scanlines << total 
size?

I also realized that we need to extend the virtual vblank by the same 
fudged number of scanlines when programming the page flip. Our hw 
settings make it flip at leading edge of true vblank if the new bo 
address is latched in time. To keep all the logic in userspace working 
properly we need to guarantee that now it will flip at the start of the 
fudged vblank at latest, otherwise delays the flip one frame. So i'd add 
a check for being in the fudged "vblank before the true vblank" to the 
radeon_flip_work_func() before programming the flip and udelay() if we'd 
end up in this forbidden zone. This problem already exists in the old 
code, but probably doesn't happen frequently (or ever) in practice, at 
least i couldn't easily provoke it, because usually we are too slow in 
scheduling the flip to fit into the time window for this race. But 
future optimizations or faster machines might break it.

Thoughts?
-mario
Ville Syrjala Nov. 27, 2015, 2:55 p.m. UTC | #21
On Wed, Nov 25, 2015 at 02:38:04PM -0500, Alex Deucher wrote:
> On Wed, Nov 25, 2015 at 1:21 PM, Mario Kleiner
> <mario.kleiner.de@gmail.com> wrote:
> > On 11/25/2015 06:58 PM, Ville Syrjälä wrote:
> >>
> >> On Wed, Nov 25, 2015 at 06:24:13PM +0100, Mario Kleiner wrote:
> >>>
> >>> On 11/23/2015 09:24 PM, Ville Syrjälä wrote:
> >>>>
> >>>> On Mon, Nov 23, 2015 at 06:58:34PM +0100, Mario Kleiner wrote:
> >>>>>
> >>>>>
> >>>>>
> >>>>> On 11/23/2015 04:51 PM, Ville Syrjälä wrote:
> >>>>>>
> >>>>>> On Mon, Nov 23, 2015 at 04:23:21PM +0100, Mario Kleiner wrote:
> >>>>>>>
> >>>>>>> On 11/20/2015 04:34 PM, Ville Syrjälä wrote:
> >>>>>>>>
> >>>>>>>> On Fri, Nov 20, 2015 at 04:24:50PM +0100, Mario Kleiner wrote:
> >>>>>>>
> >>>>>>>
> >>>>>>> ...
> >>>>>>> Ok, but why would that be a bad thing? I think we want it to think it
> >>>>>>> is
> >>>>>>> in the previous frame if it is called outside the vblank irq context.
> >>>>>>> The only reason we fudge it to the next frames vblank if i vblank irq
> >>>>>>> is
> >>>>>>> because we know the vblank irq handler we are executing atm. was
> >>>>>>> meant
> >>>>>>> to execute within the upcoming vblank for the next frame, so we fudge
> >>>>>>> the scanout positions and thereby timestamp to correspond to that new
> >>>>>>> frame. But if something called outside irq context it should get a
> >>>>>>> scanout position/timestamp that corresponds to "reality".
> >>>>>>
> >>>>>>
> >>>>>> It would be a bad thing since it would cause the timestamp to jump
> >>>>>> backwards, and that would also cause the frame count guesstimate to go
> >>>>>> backwards.
> >>>>>>
> >>>>>
> >>>>> But only if we don't use the dev->driver->get_vblank_counter() method,
> >>>>> which we try to use on AMD.
> >>>>
> >>>>
> >>>> Well, if you do it that way then you have the problem of the hw counter
> >>>> seeming to jump forward by one after crossing the start of vblank (when
> >>>> compared to the value you sampled when you processed the early vblank
> >>>> interrupt).
> >>>>
> >>>
> >>> Ok, finally i see the bad scenario that wouldn't get prevented by our
> >>> current locking with the new vblank counting in the core. The vblank
> >>> enable path is safe due to locking and discounting of redundant
> >>> timestamps etc. But the disable path could go wrong:
> >>>
> >>> 1. Vblank irq fires, drm_handle_vblank() -> drm_update_vblank_count(),
> >>> updates timestamps and counts "as if" in vblank -> incremented vblank
> >>> count and timestamp now set in the future.
> >>>
> >>> 2. After vblank irq finishes, but just before leading edge of vblank,
> >>> vblank_disable_and_save() executes, doesn't get bumped timestamp or
> >>> count because before vblank and not in vblank irq. Now
> >>> drm_update_vblank_count() would process a
> >>> "new" timestamp and count from the past and we'd have time and counts
> >>> going backwards, and bad things would happen.
> >>>
> >>> I haven't observed such a thing happening during testing so far,
> >>> probably because the time window in which it could happen is tiny, but
> >>> given how awfully bad it would be, it needs to be prevented.
> >>>
> >>> I had a look at the description of the Vblank irq in the "M76 Register
> >>> Reference Guide" for older asics and the description suggests that the
> >>> vblank irq fires when the crtc's line buffer is finished reading pixel
> >>> data from the scanout buffer in memory for a frame, ie., when the line
> >>> buffer read "enters" vblank.
> >>
> >>
> >> Hmm. Does that mean there's always at least one fullscreen plane enabled
> >> in the hw? As in you can't turn off the primary plane or make it smaller
> >> than the active video area? Othwewise it sounds like you'd could either
> >> not get it at all, or get it somewhere in the middle of the screen.
> >>
> >
> > It says "Interrupt that can be programmed to be generated by the
> > primary display controller's line buffer logic either when the
> > source image line counter is not requesting any active
> > display data (i.e. in the vertical blank) or the output CRTC
> > timing generator is within the vertical blanking region."
> >
> > So my statements were my interpretation of this quote, so i can make some
> > sense out of the vblank irq behaviour. I guess Alex or Harry would know? The
> > M76 reference refers to some older asics, i just assume it is the same for
> > the current ones, given that observed behaviour would be consistent with the
> > line buffer causing this lead of a couple of scanlines. I see about 2
> > scanlines on DCE4 and about 3 scanlines on DCE3. I don't know how big the
> > line buffer is, how quickly it refills etc., but it sounds reasonable.
> 
> The size of the line buffer varies by generation, but the LB logic is
> still responsible for generating the vblank interrupt even on newer
> hw.

So there's no actual vblank interrupt available from the timing
generator?
diff mbox

Patch

-- a/drivers/gpu/drm/drm_irq.c
+++ b/drivers/gpu/drm/drm_irq.c
@@ -172,9 +172,11 @@  static void drm_update_vblank_count(struct drm_device *dev, unsigned int pipe,
                                    unsigned long flags)
 {
        struct drm_vblank_crtc *vblank = &dev->vblank[pipe];
-       u32 cur_vblank, diff;
+       u32 cur_vblank, diff = 0;
        bool rc;
        struct timeval t_vblank;
+       const struct timeval *t_old;
+       u64 diff_ns;
        int count = DRM_TIMESTAMP_MAXRETRIES;
        int framedur_ns = vblank->framedur_ns;

@@ -195,13 +197,15 @@  static void drm_update_vblank_count(struct drm_device *dev, unsigned int pipe,
                rc = drm_get_last_vbltimestamp(dev, pipe, &t_vblank, flags);
        } while (cur_vblank != dev->driver->get_vblank_counter(dev, pipe) && --count > 0);

-       if (dev->max_vblank_count != 0) {
-               /* trust the hw counter when it's around */
-               diff = (cur_vblank - vblank->last) & dev->max_vblank_count;
-       } else if (rc && framedur_ns) {
-               const struct timeval *t_old;
-               u64 diff_ns;
-
+       /*
+        * Always use vblank timestamping based method if supported to reject
+        * redundant vblank irqs. E.g., AMD hardware needs this to not screw up
+        * due to some irq handling quirk.
+        *
+        * This also sets the diff value for use as fallback below in case the
+        * hw does not support a suitable hw vblank counter.
+        */
+       if (rc && framedur_ns) {
                t_old = &vblanktimestamp(dev, pipe, vblank->count);
                diff_ns = timeval_to_ns(&t_vblank) - timeval_to_ns(t_old);

@@ -212,11 +216,28 @@  static void drm_update_vblank_count(struct drm_device *dev, unsigned int pipe,
                 */
                diff = DIV_ROUND_CLOSEST_ULL(diff_ns, framedur_ns);

-               if (diff == 0 && flags & DRM_CALLED_FROM_VBLIRQ)
-                       DRM_DEBUG_VBL("crtc %u: Redundant vblirq ignored."
-                                     " diff_ns = %lld, framedur_ns = %d)\n",
-                                     pipe, (long long) diff_ns, framedur_ns);
-       } else {
+               if (diff == 0 && flags & DRM_CALLED_FROM_VBLIRQ) {
+                   DRM_DEBUG_VBL("crtc %u: Redundant vblirq ignored."
+                   " diff_ns = %lld, framedur_ns = %d)\n",
+                   pipe, (long long) diff_ns, framedur_ns);
+
+                   /* Treat this redundant invocation as no-op. */
+                   WARN_ON_ONCE(cur_vblank != vblank->last);
+                   return;
+               }
+       }
+
+       /*
+        * hw counters, if marked as supported via max_vblank_count != 0,
+        * *must* increment at leading edge of vblank and in sync with
+        * the firing of the hw vblank irq, otherwise we have a race here
+        * between gpu and us, where small variations in our execution
+        * timing can cause off-by-one miscounting of vblanks.
+        */
+       if (dev->max_vblank_count != 0) {
+               /* trust the hw counter when it's around */
+               diff = (cur_vblank - vblank->last) & dev->max_vblank_count;
+       } else if (!(rc && framedur_ns)) {
                /* some kind of default for drivers w/o accurate vbl timestamping */
                diff = (flags & DRM_CALLED_FROM_VBLIRQ) != 0;
        }