Message ID | 564E0AF4.3050406@gmail.com (mailing list archive) |
---|---|
State | New, archived |
Headers | show |
On Thu, Nov 19, 2015 at 06:46:28PM +0100, Mario Kleiner wrote: > Hi Alex and Michel and Ville, > > it's "fix vblank stuff" time again ;-) > > Ville's changes to the DRM's drm_handle_vblank() / > drm_update_vblank_count() code in Linux 4.4 not only made that code more > elegant, but also removed the robustness against the vblank irq quirks > in AMD hw and similar hardware. So now i get tons of off-by-one errors and > > "[ 432.345] (WW) RADEON(1): radeon_dri2_flip_event_handler: Pageflip > completion event has impossible msc 24803 < target_msc 24804" XOrg > messages from that kernel. Argh. Sorry about that. > > One of the reasons for trouble is that AMD hw quirk where the hw fires > an extra vblank irq shortly after vblank irq's get enabled, not > synchronized to vblank, but typically in the middle of active scanout, > so we get a redundant call to drm_handle_vblank in the middle of scanout. I think that should be fine as such. The code should ignore redudntant vbl irqs. Well, assuming you have a reliable hw counter or you use the timestamp guesstimate mechanism and your scanout position is reported accurately. But I guess you have a bit of problem with both. > > To fix that i have a minor patch to make drm_update_vblank_count() again > robust against such redundant calls, which i will send out later to the > mailing list. Diff attached for reference. > > The second quirk of AMD hw is that the vblank interrupt fires a few > scanlines before start of vblank, so drm_handle_vblank -> > drm_update_vblank_count() -> dev->driver->get_vblank_counter() gets > called before the start of the vblank for which the new vblank count > should be queried. Does it fire too soon, or is the scanout position register value(s) just offset by a few lines perhaps? We have that with i915 and I simply fix up the value when reading it out. Fortunately for us the offset is constant (or at least seems to be) for a given platform/connector combo. > > The third problem is that the DRM vblank handling always had the > assumption that hardware vblank counters would essentially increment at > leading edge of vblank - basically in sync with the firing of the vblank > irq, so that a hw counter readout from within the vblank irq handler > would always deliver the new incremented value. If this assumption is > violated then the counting by use of the hw counter gets unreliable, > because depending on random small delays in irq handling the code may > end up sampling the hw counter pre- or post-increment, leading to > inconsistent updating and funky bugs. It just so happens that AMD > hardware doesn't increment the hw counter at leading edge of vblank, so > stuff falls apart. > > So to fix those two problems i'm tinkering with cooking the hw vblank > counter value returned by radeon_get_vblank_counter_kms() to make it > appear as if the counter incremented at leading edge of vblank in sync > with vblank irq. Yeah, I do that in i915 too. Well, on some platforms where the counter increments at the leading edge of active. > > It almost sort of works on the rs600 code path, but i need a bit of info > from you: > > 1. There's this register from the old specs for m76.pdf, which is not > part of the current register defines for radeon-kms: > > "D1CRTC_STATUS_VF_COUNT - RW - 32 bits - [GpuF0MMReg:0x60A8]" > > It contains the lower 16 bits of framecounter and the 13 bits of > vertical scanout position. It seems to give the same readings as the 24 > bit R_0060A4_D1CRTC_STATUS_FRAME_COUNT we use for the hw counter. This > would come handy. > > Does Evergreen and later have a same/similar register and where is it? > > 2. The hw framecounter seems to increment when the vertical scanout > position wraps back from (VTOTAL - 1) to 0, at least on the one DCE-3 > gpu i tested so far. Is this so on all asics? And is the hw counter > increment happening exactly at the moment that vertical scanout position > jumps back to zero, ie. both events are driven by the same signal? Or is > the framecounter increment just happening somewhere inside either > scanline VTOTAL-1 or scanline 0? > > > If we can fix this and get it into rc2 or rc3 then we could avoid a bad > regression and with a bit of luck at the same time improve by being able > to set dev->vblank_disable_immediate = true then and allow vblank irqs > to get turned off more aggressively for a bit of extra power saving. > > thanks, > -mario > -- a/drivers/gpu/drm/drm_irq.c > +++ b/drivers/gpu/drm/drm_irq.c > @@ -172,9 +172,11 @@ static void drm_update_vblank_count(struct drm_device *dev, unsigned int pipe, > unsigned long flags) > { > struct drm_vblank_crtc *vblank = &dev->vblank[pipe]; > - u32 cur_vblank, diff; > + u32 cur_vblank, diff = 0; > bool rc; > struct timeval t_vblank; > + const struct timeval *t_old; > + u64 diff_ns; > int count = DRM_TIMESTAMP_MAXRETRIES; > int framedur_ns = vblank->framedur_ns; > > @@ -195,13 +197,15 @@ static void drm_update_vblank_count(struct drm_device *dev, unsigned int pipe, > rc = drm_get_last_vbltimestamp(dev, pipe, &t_vblank, flags); > } while (cur_vblank != dev->driver->get_vblank_counter(dev, pipe) && --count > 0); > > - if (dev->max_vblank_count != 0) { > - /* trust the hw counter when it's around */ > - diff = (cur_vblank - vblank->last) & dev->max_vblank_count; > - } else if (rc && framedur_ns) { > - const struct timeval *t_old; > - u64 diff_ns; > - > + /* > + * Always use vblank timestamping based method if supported to reject > + * redundant vblank irqs. E.g., AMD hardware needs this to not screw up > + * due to some irq handling quirk. Hmm. I'm thinking it would be better to simply not claim that there's a hw counter if it isn't reliable. > + * > + * This also sets the diff value for use as fallback below in case the > + * hw does not support a suitable hw vblank counter. > + */ > + if (rc && framedur_ns) { > t_old = &vblanktimestamp(dev, pipe, vblank->count); > diff_ns = timeval_to_ns(&t_vblank) - timeval_to_ns(t_old); > > @@ -212,11 +216,28 @@ static void drm_update_vblank_count(struct drm_device *dev, unsigned int pipe, > */ > diff = DIV_ROUND_CLOSEST_ULL(diff_ns, framedur_ns); > > - if (diff == 0 && flags & DRM_CALLED_FROM_VBLIRQ) > - DRM_DEBUG_VBL("crtc %u: Redundant vblirq ignored." > - " diff_ns = %lld, framedur_ns = %d)\n", > - pipe, (long long) diff_ns, framedur_ns); > - } else { > + if (diff == 0 && flags & DRM_CALLED_FROM_VBLIRQ) { > + DRM_DEBUG_VBL("crtc %u: Redundant vblirq ignored." > + " diff_ns = %lld, framedur_ns = %d)\n", > + pipe, (long long) diff_ns, framedur_ns); > + > + /* Treat this redundant invocation as no-op. */ > + WARN_ON_ONCE(cur_vblank != vblank->last); > + return; > + } > + } > + > + /* > + * hw counters, if marked as supported via max_vblank_count != 0, > + * *must* increment at leading edge of vblank and in sync with > + * the firing of the hw vblank irq, otherwise we have a race here > + * between gpu and us, where small variations in our execution > + * timing can cause off-by-one miscounting of vblanks. > + */ > + if (dev->max_vblank_count != 0) { > + /* trust the hw counter when it's around */ > + diff = (cur_vblank - vblank->last) & dev->max_vblank_count; > + } else if (!(rc && framedur_ns)) { > /* some kind of default for drivers w/o accurate vbl timestamping */ > diff = (flags & DRM_CALLED_FROM_VBLIRQ) != 0; > }
On 11/19/2015 07:20 PM, Ville Syrjälä wrote: > On Thu, Nov 19, 2015 at 06:46:28PM +0100, Mario Kleiner wrote: >> Hi Alex and Michel and Ville, >> >> it's "fix vblank stuff" time again ;-) >> >> Ville's changes to the DRM's drm_handle_vblank() / >> drm_update_vblank_count() code in Linux 4.4 not only made that code more >> elegant, but also removed the robustness against the vblank irq quirks >> in AMD hw and similar hardware. So now i get tons of off-by-one errors and >> >> "[ 432.345] (WW) RADEON(1): radeon_dri2_flip_event_handler: Pageflip >> completion event has impossible msc 24803 < target_msc 24804" XOrg >> messages from that kernel. > > Argh. Sorry about that. > On the plus side, your "vblank timestamp deltas as fake vblank counters" code seems to work nicely on nouveau-kms, as far as testing with three Nvidia's went so far :). And both Intel gpu's (HD Ironlake, and Ivybridge) i tested checked out nicely. And at least the recent nv50+ NVidia Tesla also have 16 bit vblank counters which we could implement in nouveau, maybe with the same trickery to allow long trouble-free vblank off periods, hopefully that would also apply to the Tegra-4 and later Kepler based parts. Tegra-3 will probably also work. I think i read in the Tegra-3 PRM that the sync points they use to implement vblank counters do increment at leading edge of vblank. The only problem we may have is that some of the embedded gpus may not have compliant vblank counters and they probably also lack vblank timestamping, so it might be a good idea to rather not use vblank counters at all in those drivers - patch their kms drivers to max_vblank_count = 0; >> >> One of the reasons for trouble is that AMD hw quirk where the hw fires >> an extra vblank irq shortly after vblank irq's get enabled, not >> synchronized to vblank, but typically in the middle of active scanout, >> so we get a redundant call to drm_handle_vblank in the middle of scanout. > > I think that should be fine as such. The code should ignore redudntant > vbl irqs. Well, assuming you have a reliable hw counter or you use the > timestamp guesstimate mechanism and your scanout position is reported > accurately. But I guess you have a bit of problem with both. > The problem is i'll need to treat calls to radeon kms driver->get_vblank_counter differently, depending if the function gets called from vblank irq, or from regular code, so that hw quirk that causes spontaneous misfiring of the vblank irq in the middle of scanout would confuse my hw vblank counter cooking method to produce a fake hw vblank counter increment. That's why i moved the filtering for redundant irqs based on vblank timestamps in drm_vblank_update() around to always apply. Makes us robust against that type of hw quirk in general and makes life for the vblank counter cooking so much easier. It's a beautiful collaboration of different hw bugs to make things interesting :) >> >> To fix that i have a minor patch to make drm_update_vblank_count() again >> robust against such redundant calls, which i will send out later to the >> mailing list. Diff attached for reference. >> >> The second quirk of AMD hw is that the vblank interrupt fires a few >> scanlines before start of vblank, so drm_handle_vblank -> >> drm_update_vblank_count() -> dev->driver->get_vblank_counter() gets >> called before the start of the vblank for which the new vblank count >> should be queried. > > Does it fire too soon, or is the scanout position register value(s) > just offset by a few lines perhaps? > > We have that with i915 and I simply fix up the value when reading it > out. Fortunately for us the offset is constant (or at least seems to > be) for a given platform/connector combo. > I think they fire too soon, from all i've seen so far on a few cards. >> >> The third problem is that the DRM vblank handling always had the >> assumption that hardware vblank counters would essentially increment at >> leading edge of vblank - basically in sync with the firing of the vblank >> irq, so that a hw counter readout from within the vblank irq handler >> would always deliver the new incremented value. If this assumption is >> violated then the counting by use of the hw counter gets unreliable, >> because depending on random small delays in irq handling the code may >> end up sampling the hw counter pre- or post-increment, leading to >> inconsistent updating and funky bugs. It just so happens that AMD >> hardware doesn't increment the hw counter at leading edge of vblank, so >> stuff falls apart. >> >> So to fix those two problems i'm tinkering with cooking the hw vblank >> counter value returned by radeon_get_vblank_counter_kms() to make it >> appear as if the counter incremented at leading edge of vblank in sync >> with vblank irq. > > Yeah, I do that in i915 too. Well, on some platforms where the > counter increments at the leading edge of active. > Yep. I think that's key to make the vblank instant off stuff work reliably. -mario >> >> It almost sort of works on the rs600 code path, but i need a bit of info >> from you: >> >> 1. There's this register from the old specs for m76.pdf, which is not >> part of the current register defines for radeon-kms: >> >> "D1CRTC_STATUS_VF_COUNT - RW - 32 bits - [GpuF0MMReg:0x60A8]" >> >> It contains the lower 16 bits of framecounter and the 13 bits of >> vertical scanout position. It seems to give the same readings as the 24 >> bit R_0060A4_D1CRTC_STATUS_FRAME_COUNT we use for the hw counter. This >> would come handy. >> >> Does Evergreen and later have a same/similar register and where is it? >> >> 2. The hw framecounter seems to increment when the vertical scanout >> position wraps back from (VTOTAL - 1) to 0, at least on the one DCE-3 >> gpu i tested so far. Is this so on all asics? And is the hw counter >> increment happening exactly at the moment that vertical scanout position >> jumps back to zero, ie. both events are driven by the same signal? Or is >> the framecounter increment just happening somewhere inside either >> scanline VTOTAL-1 or scanline 0? >> >> >> If we can fix this and get it into rc2 or rc3 then we could avoid a bad >> regression and with a bit of luck at the same time improve by being able >> to set dev->vblank_disable_immediate = true then and allow vblank irqs >> to get turned off more aggressively for a bit of extra power saving. >> >> thanks, >> -mario > >> -- a/drivers/gpu/drm/drm_irq.c >> +++ b/drivers/gpu/drm/drm_irq.c >> @@ -172,9 +172,11 @@ static void drm_update_vblank_count(struct drm_device *dev, unsigned int pipe, >> unsigned long flags) >> { >> struct drm_vblank_crtc *vblank = &dev->vblank[pipe]; >> - u32 cur_vblank, diff; >> + u32 cur_vblank, diff = 0; >> bool rc; >> struct timeval t_vblank; >> + const struct timeval *t_old; >> + u64 diff_ns; >> int count = DRM_TIMESTAMP_MAXRETRIES; >> int framedur_ns = vblank->framedur_ns; >> >> @@ -195,13 +197,15 @@ static void drm_update_vblank_count(struct drm_device *dev, unsigned int pipe, >> rc = drm_get_last_vbltimestamp(dev, pipe, &t_vblank, flags); >> } while (cur_vblank != dev->driver->get_vblank_counter(dev, pipe) && --count > 0); >> >> - if (dev->max_vblank_count != 0) { >> - /* trust the hw counter when it's around */ >> - diff = (cur_vblank - vblank->last) & dev->max_vblank_count; >> - } else if (rc && framedur_ns) { >> - const struct timeval *t_old; >> - u64 diff_ns; >> - >> + /* >> + * Always use vblank timestamping based method if supported to reject >> + * redundant vblank irqs. E.g., AMD hardware needs this to not screw up >> + * due to some irq handling quirk. > > Hmm. I'm thinking it would be better to simply not claim that there's a hw > counter if it isn't reliable. > >> + * >> + * This also sets the diff value for use as fallback below in case the >> + * hw does not support a suitable hw vblank counter. >> + */ >> + if (rc && framedur_ns) { >> t_old = &vblanktimestamp(dev, pipe, vblank->count); >> diff_ns = timeval_to_ns(&t_vblank) - timeval_to_ns(t_old); >> >> @@ -212,11 +216,28 @@ static void drm_update_vblank_count(struct drm_device *dev, unsigned int pipe, >> */ >> diff = DIV_ROUND_CLOSEST_ULL(diff_ns, framedur_ns); >> >> - if (diff == 0 && flags & DRM_CALLED_FROM_VBLIRQ) >> - DRM_DEBUG_VBL("crtc %u: Redundant vblirq ignored." >> - " diff_ns = %lld, framedur_ns = %d)\n", >> - pipe, (long long) diff_ns, framedur_ns); >> - } else { >> + if (diff == 0 && flags & DRM_CALLED_FROM_VBLIRQ) { >> + DRM_DEBUG_VBL("crtc %u: Redundant vblirq ignored." >> + " diff_ns = %lld, framedur_ns = %d)\n", >> + pipe, (long long) diff_ns, framedur_ns); >> + >> + /* Treat this redundant invocation as no-op. */ >> + WARN_ON_ONCE(cur_vblank != vblank->last); >> + return; >> + } >> + } >> + >> + /* >> + * hw counters, if marked as supported via max_vblank_count != 0, >> + * *must* increment at leading edge of vblank and in sync with >> + * the firing of the hw vblank irq, otherwise we have a race here >> + * between gpu and us, where small variations in our execution >> + * timing can cause off-by-one miscounting of vblanks. >> + */ >> + if (dev->max_vblank_count != 0) { >> + /* trust the hw counter when it's around */ >> + diff = (cur_vblank - vblank->last) & dev->max_vblank_count; >> + } else if (!(rc && framedur_ns)) { >> /* some kind of default for drivers w/o accurate vbl timestamping */ >> diff = (flags & DRM_CALLED_FROM_VBLIRQ) != 0; >> } > >
On Thu, Nov 19, 2015 at 08:12:24PM +0100, Mario Kleiner wrote: > On 11/19/2015 07:20 PM, Ville Syrjälä wrote: > > On Thu, Nov 19, 2015 at 06:46:28PM +0100, Mario Kleiner wrote: > >> Hi Alex and Michel and Ville, > >> > >> it's "fix vblank stuff" time again ;-) > >> > >> Ville's changes to the DRM's drm_handle_vblank() / > >> drm_update_vblank_count() code in Linux 4.4 not only made that code more > >> elegant, but also removed the robustness against the vblank irq quirks > >> in AMD hw and similar hardware. So now i get tons of off-by-one errors and > >> > >> "[ 432.345] (WW) RADEON(1): radeon_dri2_flip_event_handler: Pageflip > >> completion event has impossible msc 24803 < target_msc 24804" XOrg > >> messages from that kernel. > > > > Argh. Sorry about that. > > > > On the plus side, your "vblank timestamp deltas as fake vblank counters" > code seems to work nicely on nouveau-kms, as far as testing with three > Nvidia's went so far :). And both Intel gpu's (HD Ironlake, and > Ivybridge) i tested checked out nicely. > > And at least the recent nv50+ NVidia Tesla also have 16 bit vblank > counters which we could implement in nouveau, maybe with the same > trickery to allow long trouble-free vblank off periods, hopefully that > would also apply to the Tegra-4 and later Kepler based parts. Tegra-3 > will probably also work. I think i read in the Tegra-3 PRM that the sync > points they use to implement vblank counters do increment at leading > edge of vblank. > > The only problem we may have is that some of the embedded gpus may not > have compliant vblank counters and they probably also lack vblank > timestamping, so it might be a good idea to rather not use vblank > counters at all in those drivers - patch their kms drivers to > max_vblank_count = 0; > > >> > >> One of the reasons for trouble is that AMD hw quirk where the hw fires > >> an extra vblank irq shortly after vblank irq's get enabled, not > >> synchronized to vblank, but typically in the middle of active scanout, > >> so we get a redundant call to drm_handle_vblank in the middle of scanout. > > > > I think that should be fine as such. The code should ignore redudntant > > vbl irqs. Well, assuming you have a reliable hw counter or you use the > > timestamp guesstimate mechanism and your scanout position is reported > > accurately. But I guess you have a bit of problem with both. > > > > The problem is i'll need to treat calls to radeon kms > driver->get_vblank_counter differently, depending if the function gets > called from vblank irq, or from regular code, so that hw quirk that > causes spontaneous misfiring of the vblank irq in the middle of scanout > would confuse my hw vblank counter cooking method to produce a fake hw > vblank counter increment. That's why i moved the filtering for redundant > irqs based on vblank timestamps in drm_vblank_update() around to always > apply. Makes us robust against that type of hw quirk in general and > makes life for the vblank counter cooking so much easier. > > It's a beautiful collaboration of different hw bugs to make things > interesting :) > > >> > >> To fix that i have a minor patch to make drm_update_vblank_count() again > >> robust against such redundant calls, which i will send out later to the > >> mailing list. Diff attached for reference. > >> > >> The second quirk of AMD hw is that the vblank interrupt fires a few > >> scanlines before start of vblank, so drm_handle_vblank -> > >> drm_update_vblank_count() -> dev->driver->get_vblank_counter() gets > >> called before the start of the vblank for which the new vblank count > >> should be queried. > > > > Does it fire too soon, or is the scanout position register value(s) > > just offset by a few lines perhaps? > > > > We have that with i915 and I simply fix up the value when reading it > > out. Fortunately for us the offset is constant (or at least seems to > > be) for a given platform/connector combo. > > > > I think they fire too soon, from all i've seen so far on a few cards. That's unfortunate. Firing a bit too late would be perfectly fine for most things. And that's actually what happens on older intel hw. Firing too soon opens up some more races, as in you may have to wait a bit more after getting woken up to make sure you've crossed into the freame you were waiting for. Also not sure how to deal with that sort of thing in a reasonable way in .get_vblank_timestamp(). DRM_CALLED_FROM_VBLIRQ gets passed there, so it could fudge things a bit when the early irq arrives, but then if someone calls drm_update_vblank_count() after the irq was handled but before start of vblank you'll end up with a -1 diff. Maybe something like: .get_scanout_positon() { read frame counter and scaline position if ((flags & DRM_CALLED_FROM_VBLIRQ || frame == last_fudge_frame) && scanline_slightly_below_start_of_vblank)) { last_fudge_frame = frame; scanline = start_of_vblank; } else last_fudge_frame = frame - 1; } could be done. So it would pretend the scanline has already crossed into the vblank when the interrupt arrives, and it should keep doing that until the frame counter indicates that you really crossed there. > > >> > >> The third problem is that the DRM vblank handling always had the > >> assumption that hardware vblank counters would essentially increment at > >> leading edge of vblank - basically in sync with the firing of the vblank > >> irq, so that a hw counter readout from within the vblank irq handler > >> would always deliver the new incremented value. If this assumption is > >> violated then the counting by use of the hw counter gets unreliable, > >> because depending on random small delays in irq handling the code may > >> end up sampling the hw counter pre- or post-increment, leading to > >> inconsistent updating and funky bugs. It just so happens that AMD > >> hardware doesn't increment the hw counter at leading edge of vblank, so > >> stuff falls apart. > >> > >> So to fix those two problems i'm tinkering with cooking the hw vblank > >> counter value returned by radeon_get_vblank_counter_kms() to make it > >> appear as if the counter incremented at leading edge of vblank in sync > >> with vblank irq. > > > > Yeah, I do that in i915 too. Well, on some platforms where the > > counter increments at the leading edge of active. > > > > Yep. I think that's key to make the vblank instant off stuff work reliably. I think it should work OK with the timestamp based stuff too, assuming you have accurate scanout position reporting. So having a reliable hw counter shouldn't be a requirement for that. > > -mario > > >> > >> It almost sort of works on the rs600 code path, but i need a bit of info > >> from you: > >> > >> 1. There's this register from the old specs for m76.pdf, which is not > >> part of the current register defines for radeon-kms: > >> > >> "D1CRTC_STATUS_VF_COUNT - RW - 32 bits - [GpuF0MMReg:0x60A8]" > >> > >> It contains the lower 16 bits of framecounter and the 13 bits of > >> vertical scanout position. It seems to give the same readings as the 24 > >> bit R_0060A4_D1CRTC_STATUS_FRAME_COUNT we use for the hw counter. This > >> would come handy. > >> > >> Does Evergreen and later have a same/similar register and where is it? > >> > >> 2. The hw framecounter seems to increment when the vertical scanout > >> position wraps back from (VTOTAL - 1) to 0, at least on the one DCE-3 > >> gpu i tested so far. Is this so on all asics? And is the hw counter > >> increment happening exactly at the moment that vertical scanout position > >> jumps back to zero, ie. both events are driven by the same signal? Or is > >> the framecounter increment just happening somewhere inside either > >> scanline VTOTAL-1 or scanline 0? > >> > >> > >> If we can fix this and get it into rc2 or rc3 then we could avoid a bad > >> regression and with a bit of luck at the same time improve by being able > >> to set dev->vblank_disable_immediate = true then and allow vblank irqs > >> to get turned off more aggressively for a bit of extra power saving. > >> > >> thanks, > >> -mario > > > >> -- a/drivers/gpu/drm/drm_irq.c > >> +++ b/drivers/gpu/drm/drm_irq.c > >> @@ -172,9 +172,11 @@ static void drm_update_vblank_count(struct drm_device *dev, unsigned int pipe, > >> unsigned long flags) > >> { > >> struct drm_vblank_crtc *vblank = &dev->vblank[pipe]; > >> - u32 cur_vblank, diff; > >> + u32 cur_vblank, diff = 0; > >> bool rc; > >> struct timeval t_vblank; > >> + const struct timeval *t_old; > >> + u64 diff_ns; > >> int count = DRM_TIMESTAMP_MAXRETRIES; > >> int framedur_ns = vblank->framedur_ns; > >> > >> @@ -195,13 +197,15 @@ static void drm_update_vblank_count(struct drm_device *dev, unsigned int pipe, > >> rc = drm_get_last_vbltimestamp(dev, pipe, &t_vblank, flags); > >> } while (cur_vblank != dev->driver->get_vblank_counter(dev, pipe) && --count > 0); > >> > >> - if (dev->max_vblank_count != 0) { > >> - /* trust the hw counter when it's around */ > >> - diff = (cur_vblank - vblank->last) & dev->max_vblank_count; > >> - } else if (rc && framedur_ns) { > >> - const struct timeval *t_old; > >> - u64 diff_ns; > >> - > >> + /* > >> + * Always use vblank timestamping based method if supported to reject > >> + * redundant vblank irqs. E.g., AMD hardware needs this to not screw up > >> + * due to some irq handling quirk. > > > > Hmm. I'm thinking it would be better to simply not claim that there's a hw > > counter if it isn't reliable. > > > >> + * > >> + * This also sets the diff value for use as fallback below in case the > >> + * hw does not support a suitable hw vblank counter. > >> + */ > >> + if (rc && framedur_ns) { > >> t_old = &vblanktimestamp(dev, pipe, vblank->count); > >> diff_ns = timeval_to_ns(&t_vblank) - timeval_to_ns(t_old); > >> > >> @@ -212,11 +216,28 @@ static void drm_update_vblank_count(struct drm_device *dev, unsigned int pipe, > >> */ > >> diff = DIV_ROUND_CLOSEST_ULL(diff_ns, framedur_ns); > >> > >> - if (diff == 0 && flags & DRM_CALLED_FROM_VBLIRQ) > >> - DRM_DEBUG_VBL("crtc %u: Redundant vblirq ignored." > >> - " diff_ns = %lld, framedur_ns = %d)\n", > >> - pipe, (long long) diff_ns, framedur_ns); > >> - } else { > >> + if (diff == 0 && flags & DRM_CALLED_FROM_VBLIRQ) { > >> + DRM_DEBUG_VBL("crtc %u: Redundant vblirq ignored." > >> + " diff_ns = %lld, framedur_ns = %d)\n", > >> + pipe, (long long) diff_ns, framedur_ns); > >> + > >> + /* Treat this redundant invocation as no-op. */ > >> + WARN_ON_ONCE(cur_vblank != vblank->last); > >> + return; > >> + } > >> + } > >> + > >> + /* > >> + * hw counters, if marked as supported via max_vblank_count != 0, > >> + * *must* increment at leading edge of vblank and in sync with > >> + * the firing of the hw vblank irq, otherwise we have a race here > >> + * between gpu and us, where small variations in our execution > >> + * timing can cause off-by-one miscounting of vblanks. > >> + */ > >> + if (dev->max_vblank_count != 0) { > >> + /* trust the hw counter when it's around */ > >> + diff = (cur_vblank - vblank->last) & dev->max_vblank_count; > >> + } else if (!(rc && framedur_ns)) { > >> /* some kind of default for drivers w/o accurate vbl timestamping */ > >> diff = (flags & DRM_CALLED_FROM_VBLIRQ) != 0; > >> } > > > >
On Thu, Nov 19, 2015 at 12:46 PM, Mario Kleiner <mario.kleiner.de@gmail.com> wrote: > Hi Alex and Michel and Ville, > > it's "fix vblank stuff" time again ;-) Adding Harry from our display team. He might be able to fill in the blanks of on some of this better than I can. It might also be worth checking to see how our DAL (our new display code which is being developed directly by our display team) code handles this. It could be that we are just missing register settings: http://cgit.freedesktop.org/~agd5f/linux/log/?h=DAL-wip Additionally we've published full registers headers for the display block: http://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/tree/drivers/gpu/drm/amd/include/asic_reg/dce The DCE8 stuff should generally apply back to DCE4. If you have questions about registers older asics not covered in the hw docs, let me know. Note the new headers are dword aligned rather than byte aligned. > > Ville's changes to the DRM's drm_handle_vblank() / drm_update_vblank_count() > code in Linux 4.4 not only made that code more elegant, but also removed the > robustness against the vblank irq quirks in AMD hw and similar hardware. So > now i get tons of off-by-one errors and > > "[ 432.345] (WW) RADEON(1): radeon_dri2_flip_event_handler: Pageflip > completion event has impossible msc 24803 < target_msc 24804" XOrg messages > from that kernel. > > One of the reasons for trouble is that AMD hw quirk where the hw fires an > extra vblank irq shortly after vblank irq's get enabled, not synchronized to > vblank, but typically in the middle of active scanout, so we get a redundant > call to drm_handle_vblank in the middle of scanout. > > To fix that i have a minor patch to make drm_update_vblank_count() again > robust against such redundant calls, which i will send out later to the > mailing list. Diff attached for reference. > > The second quirk of AMD hw is that the vblank interrupt fires a few > scanlines before start of vblank, so drm_handle_vblank -> > drm_update_vblank_count() -> dev->driver->get_vblank_counter() gets called > before the start of the vblank for which the new vblank count should be > queried. > > The third problem is that the DRM vblank handling always had the assumption > that hardware vblank counters would essentially increment at leading edge of > vblank - basically in sync with the firing of the vblank irq, so that a hw > counter readout from within the vblank irq handler would always deliver the > new incremented value. If this assumption is violated then the counting by > use of the hw counter gets unreliable, because depending on random small > delays in irq handling the code may end up sampling the hw counter pre- or > post-increment, leading to inconsistent updating and funky bugs. It just so > happens that AMD hardware doesn't increment the hw counter at leading edge > of vblank, so stuff falls apart. > > So to fix those two problems i'm tinkering with cooking the hw vblank > counter value returned by radeon_get_vblank_counter_kms() to make it appear > as if the counter incremented at leading edge of vblank in sync with vblank > irq. > > It almost sort of works on the rs600 code path, but i need a bit of info > from you: > > 1. There's this register from the old specs for m76.pdf, which is not part > of the current register defines for radeon-kms: > > "D1CRTC_STATUS_VF_COUNT - RW - 32 bits - [GpuF0MMReg:0x60A8]" > > It contains the lower 16 bits of framecounter and the 13 bits of vertical > scanout position. It seems to give the same readings as the 24 bit > R_0060A4_D1CRTC_STATUS_FRAME_COUNT we use for the hw counter. This would > come handy. > > Does Evergreen and later have a same/similar register and where is it? Yes. CRTC_STATUS_VF_COUNT CRTC:CRTC_STATUS_VF_COUNT · [R/W] · 32 bits · Access: 8/16/32 · [INST0] GpuF0MMReg:0x6e9c, [INST1] GpuF0MMReg:0x7a9c, [INST2] GpuF0MMReg:0x1069c, [INST3] GpuF0MMReg:0x1129c, [INST4] GpuF0MMReg:0x11e9c, [INST5] GpuF0MMReg:0x12a9c DESCRIPTION: Current composite vertical and frame count for CRTC Field Name Bits Default Description CRTC_VF_COUNT (Access: R) 28:0 0x0 Reports current vertical and frame count > > 2. The hw framecounter seems to increment when the vertical scanout position > wraps back from (VTOTAL - 1) to 0, at least on the one DCE-3 gpu i tested so > far. Is this so on all asics? And is the hw counter increment happening > exactly at the moment that vertical scanout position jumps back to zero, ie. > both events are driven by the same signal? Or is the framecounter increment > just happening somewhere inside either scanline VTOTAL-1 or scanline 0? Unless Harry knows, we can ask the hw team, but I doubt they would have changed it. I think it's tied to the start line which is controlled by this register: CRTC:CRTC_START_LINE_CONTROL · [R/W] · 32 bits · Access: 8/16/32 · [INST0] GpuF0MMReg:0x6ecc, [INST1] GpuF0MMReg:0x7acc, [INST2] GpuF0MMReg:0x106cc, [INST3] GpuF0MMReg:0x112cc, [INST4] GpuF0MMReg:0x11ecc, [INST5] GpuF0MMReg:0x12acc DESCRIPTION: move start_line signal earlier by 1 line in CRTC Field Name Bits Default Description CRTC_PROGRESSIVE_START_LINE_EARLY 0 0x0 move start_line signal by 1 line eariler in progressive mode CRTC_INTERLACE_START_LINE_EARLY 8 0x1 move start_line signal by 1 line earlier in interlaced timing mode CRTC_ADVANCED_START_LINE_POSITION 19:16 0x4 Advanced Start Line position respect to not VBLANK. Default of 4 means the Advanced Start Line is 4 lines before the first non VBLANK line The following info I dug up internally may be useful: The position of the CRTC output timing generator when the “start of frame” indicator occurs depends on several conditions. These conditions are whether the timing generator is outputting timing for a progressive or interlaced display mode, whether the scaler is enabled, and if so, whether the register to select an earlier than normal “start of frame” indicator is set. The “start of frame” indicator typically occurs 2 lines before the vertical blank end position (DxCRTC_V_BLANK_END) is reached When interlaced output display modes are enabled (DxCRTC_INTERLACE_ENABLE = 1) and the CRTC timing generator is enabled (DxCRTC_MASTER_EN = 1), the timing generator’s vertical counter counts by 2 for both the even fields and odd fields of each frame. For both the even fields and the odd fields of interlaced output modes, the “start of frame” indicator occurs 2 lines before the end of the vertical blank occurs (DxCRTC_V_BLANK_END – 4 or DxCRTC_V_BLANK_END – 3 depending on the field since the vertical counter counts by 2 in interlaced), since the vertical counter counts by 2 in this mode). There is one exception to the previous statement. If scaling is enabled (DxSCL_SCALE_EN = 1) and the early start line is enabled (DxCRTC_INTERLACE_START_LINE_EARLY = 1) in interlaced output mode then the “start of frame” indicator happens 3 lines before the end of the vertical blank (DxCRTC_V_BLANK_END – 6 or DxCRTC_V_BLANK_END – 5 depending on the field since the vertical counter counts by 2 in interlaced). When progressive output display modes are enabled (DxCRTC_INTERLACE_ENABLE = 0) and the CRTC timing generator is enabled (DxCRTC_MASTER_EN = 1), the “start of frame” indicator occurs 2 lines before the end of the vertical blank occurs (DxCRTC_V_BLANK_END - 2). Similar to interlaced output mode, there is one exception to the previous statement. If scaling is enabled (DxSCL_SCALE_EN = 1) and the early start line is enabled (DxCRTC_PROGRESSIVE_START_LINE_EARLY = 1) in progressive output mode then the “start of frame” indicator happens 3 lines before the end of the vertical blank (DxCRTC_V_BLANK_END – 3) I hope this helps, Alex > > > If we can fix this and get it into rc2 or rc3 then we could avoid a bad > regression and with a bit of luck at the same time improve by being able to > set dev->vblank_disable_immediate = true then and allow vblank irqs to get > turned off more aggressively for a bit of extra power saving. > > thanks, > -mario
On 11/19/2015 08:45 PM, Ville Syrjälä wrote: > On Thu, Nov 19, 2015 at 08:12:24PM +0100, Mario Kleiner wrote: >> On 11/19/2015 07:20 PM, Ville Syrjälä wrote: >>> On Thu, Nov 19, 2015 at 06:46:28PM +0100, Mario Kleiner wrote: >>>> Hi Alex and Michel and Ville, >>>> >>>> it's "fix vblank stuff" time again ;-) >>>> >>>> Ville's changes to the DRM's drm_handle_vblank() / >>>> drm_update_vblank_count() code in Linux 4.4 not only made that code more >>>> elegant, but also removed the robustness against the vblank irq quirks >>>> in AMD hw and similar hardware. So now i get tons of off-by-one errors and >>>> >>>> "[ 432.345] (WW) RADEON(1): radeon_dri2_flip_event_handler: Pageflip >>>> completion event has impossible msc 24803 < target_msc 24804" XOrg >>>> messages from that kernel. >>> >>> Argh. Sorry about that. >>> >> >> On the plus side, your "vblank timestamp deltas as fake vblank counters" >> code seems to work nicely on nouveau-kms, as far as testing with three >> Nvidia's went so far :). And both Intel gpu's (HD Ironlake, and >> Ivybridge) i tested checked out nicely. >> >> And at least the recent nv50+ NVidia Tesla also have 16 bit vblank >> counters which we could implement in nouveau, maybe with the same >> trickery to allow long trouble-free vblank off periods, hopefully that >> would also apply to the Tegra-4 and later Kepler based parts. Tegra-3 >> will probably also work. I think i read in the Tegra-3 PRM that the sync >> points they use to implement vblank counters do increment at leading >> edge of vblank. >> >> The only problem we may have is that some of the embedded gpus may not >> have compliant vblank counters and they probably also lack vblank >> timestamping, so it might be a good idea to rather not use vblank >> counters at all in those drivers - patch their kms drivers to >> max_vblank_count = 0; >> >>>> >>>> One of the reasons for trouble is that AMD hw quirk where the hw fires >>>> an extra vblank irq shortly after vblank irq's get enabled, not >>>> synchronized to vblank, but typically in the middle of active scanout, >>>> so we get a redundant call to drm_handle_vblank in the middle of scanout. >>> >>> I think that should be fine as such. The code should ignore redudntant >>> vbl irqs. Well, assuming you have a reliable hw counter or you use the >>> timestamp guesstimate mechanism and your scanout position is reported >>> accurately. But I guess you have a bit of problem with both. >>> >> >> The problem is i'll need to treat calls to radeon kms >> driver->get_vblank_counter differently, depending if the function gets >> called from vblank irq, or from regular code, so that hw quirk that >> causes spontaneous misfiring of the vblank irq in the middle of scanout >> would confuse my hw vblank counter cooking method to produce a fake hw >> vblank counter increment. That's why i moved the filtering for redundant >> irqs based on vblank timestamps in drm_vblank_update() around to always >> apply. Makes us robust against that type of hw quirk in general and >> makes life for the vblank counter cooking so much easier. >> >> It's a beautiful collaboration of different hw bugs to make things >> interesting :) >> >>>> >>>> To fix that i have a minor patch to make drm_update_vblank_count() again >>>> robust against such redundant calls, which i will send out later to the >>>> mailing list. Diff attached for reference. >>>> >>>> The second quirk of AMD hw is that the vblank interrupt fires a few >>>> scanlines before start of vblank, so drm_handle_vblank -> >>>> drm_update_vblank_count() -> dev->driver->get_vblank_counter() gets >>>> called before the start of the vblank for which the new vblank count >>>> should be queried. >>> >>> Does it fire too soon, or is the scanout position register value(s) >>> just offset by a few lines perhaps? >>> >>> We have that with i915 and I simply fix up the value when reading it >>> out. Fortunately for us the offset is constant (or at least seems to >>> be) for a given platform/connector combo. >>> >> >> I think they fire too soon, from all i've seen so far on a few cards. > > That's unfortunate. Firing a bit too late would be perfectly fine for > most things. And that's actually what happens on older intel hw. Firing > too soon opens up some more races, as in you may have to wait a bit > more after getting woken up to make sure you've crossed into the > freame you were waiting for. > > Also not sure how to deal with that sort of thing in a reasonable way in > .get_vblank_timestamp(). DRM_CALLED_FROM_VBLIRQ gets passed there, so it > could fudge things a bit when the early irq arrives, but then if someone > calls drm_update_vblank_count() after the irq was handled but before > start of vblank you'll end up with a -1 diff. > > Maybe something like: > .get_scanout_positon() > { > read frame counter and scaline position > > if ((flags & DRM_CALLED_FROM_VBLIRQ || frame == last_fudge_frame) && > scanline_slightly_below_start_of_vblank)) { > last_fudge_frame = frame; > scanline = start_of_vblank; > } else > last_fudge_frame = frame - 1; > } > > could be done. So it would pretend the scanline has already crossed into > the vblank when the interrupt arrives, and it should keep doing that > until the frame counter indicates that you really crossed there. > What we do in radeon-kms is similar. If DRM_CALLED_FROM_VBLIRQ and we are no more than 1% of the display height away from start of vblank we fudge scanout position in a way so that the timestamp gets computed for the soon-to-begin vblank, not the old one. For the vblank counter cooking atm. i check for in_irq(), assuming that means the same as DRM_CALLED_FROM_VBLIRQ, given that is currently the only call from a hw irq path, and then i test scanout position for no-more-than-10-scanlines-to-vblank, to decide if i have to bump the returned vblank count. On the sample of gpus i tested the vblank irq fired somewhere between 2 and 4 scanlines too early, so 10 scanlines margin should be ok. So it's not firing much too early, just a little bit. >> >>>> >>>> The third problem is that the DRM vblank handling always had the >>>> assumption that hardware vblank counters would essentially increment at >>>> leading edge of vblank - basically in sync with the firing of the vblank >>>> irq, so that a hw counter readout from within the vblank irq handler >>>> would always deliver the new incremented value. If this assumption is >>>> violated then the counting by use of the hw counter gets unreliable, >>>> because depending on random small delays in irq handling the code may >>>> end up sampling the hw counter pre- or post-increment, leading to >>>> inconsistent updating and funky bugs. It just so happens that AMD >>>> hardware doesn't increment the hw counter at leading edge of vblank, so >>>> stuff falls apart. >>>> >>>> So to fix those two problems i'm tinkering with cooking the hw vblank >>>> counter value returned by radeon_get_vblank_counter_kms() to make it >>>> appear as if the counter incremented at leading edge of vblank in sync >>>> with vblank irq. >>> >>> Yeah, I do that in i915 too. Well, on some platforms where the >>> counter increments at the leading edge of active. >>> >> >> Yep. I think that's key to make the vblank instant off stuff work reliably. > > I think it should work OK with the timestamp based stuff too, assuming > you have accurate scanout position reporting. So having a reliable hw > counter shouldn't be a requirement for that. > Yes, should work for the instant off/on itself. But the limitation of that is that with the timestamp based stuff we can only be accurate for vblank off periods of a few seconds into the future. The framedur_ns we calculate from hw mode and the refresh duration computed from the timestamps as measured by the host clock can easily deviate by, e.g., 10 usecs due to difference and drift between the gpu pll and host clock, so for each passing refresh cycle with vblank off we'd accumulate ~10 usecs of error. After accumulating half a refresh duration of error we'd round to the wrong count and be off by one. So at 60 Hz refresh we'd have about 16666 usecs / 2 = 8333 usecs headroom / 10 usecs per frame error ~ 833 refresh cycles before errors happen. At 60 Hz we'd have 833 / 60 ~ 14 seconds of vblank off. For rates of 120 Hz gamer panels or some 200 Hz neuro-science applications we'd only get a few seconds off error free vblank off without real hardware counters. So it's good enough for typical desktop applications/compositors/games/media players, and a nice improvement over the previous state, but not quite sufficient for applications that need long time consistent vblank counts for long waits of multiple seconds or even minutes. So it's still good to have hw counters if we can get some that are reliable enough. -mario >> >> -mario >> >>>> >>>> It almost sort of works on the rs600 code path, but i need a bit of info >>>> from you: >>>> >>>> 1. There's this register from the old specs for m76.pdf, which is not >>>> part of the current register defines for radeon-kms: >>>> >>>> "D1CRTC_STATUS_VF_COUNT - RW - 32 bits - [GpuF0MMReg:0x60A8]" >>>> >>>> It contains the lower 16 bits of framecounter and the 13 bits of >>>> vertical scanout position. It seems to give the same readings as the 24 >>>> bit R_0060A4_D1CRTC_STATUS_FRAME_COUNT we use for the hw counter. This >>>> would come handy. >>>> >>>> Does Evergreen and later have a same/similar register and where is it? >>>> >>>> 2. The hw framecounter seems to increment when the vertical scanout >>>> position wraps back from (VTOTAL - 1) to 0, at least on the one DCE-3 >>>> gpu i tested so far. Is this so on all asics? And is the hw counter >>>> increment happening exactly at the moment that vertical scanout position >>>> jumps back to zero, ie. both events are driven by the same signal? Or is >>>> the framecounter increment just happening somewhere inside either >>>> scanline VTOTAL-1 or scanline 0? >>>> >>>> >>>> If we can fix this and get it into rc2 or rc3 then we could avoid a bad >>>> regression and with a bit of luck at the same time improve by being able >>>> to set dev->vblank_disable_immediate = true then and allow vblank irqs >>>> to get turned off more aggressively for a bit of extra power saving. >>>> >>>> thanks, >>>> -mario >>> >>>> -- a/drivers/gpu/drm/drm_irq.c >>>> +++ b/drivers/gpu/drm/drm_irq.c >>>> @@ -172,9 +172,11 @@ static void drm_update_vblank_count(struct drm_device *dev, unsigned int pipe, >>>> unsigned long flags) >>>> { >>>> struct drm_vblank_crtc *vblank = &dev->vblank[pipe]; >>>> - u32 cur_vblank, diff; >>>> + u32 cur_vblank, diff = 0; >>>> bool rc; >>>> struct timeval t_vblank; >>>> + const struct timeval *t_old; >>>> + u64 diff_ns; >>>> int count = DRM_TIMESTAMP_MAXRETRIES; >>>> int framedur_ns = vblank->framedur_ns; >>>> >>>> @@ -195,13 +197,15 @@ static void drm_update_vblank_count(struct drm_device *dev, unsigned int pipe, >>>> rc = drm_get_last_vbltimestamp(dev, pipe, &t_vblank, flags); >>>> } while (cur_vblank != dev->driver->get_vblank_counter(dev, pipe) && --count > 0); >>>> >>>> - if (dev->max_vblank_count != 0) { >>>> - /* trust the hw counter when it's around */ >>>> - diff = (cur_vblank - vblank->last) & dev->max_vblank_count; >>>> - } else if (rc && framedur_ns) { >>>> - const struct timeval *t_old; >>>> - u64 diff_ns; >>>> - >>>> + /* >>>> + * Always use vblank timestamping based method if supported to reject >>>> + * redundant vblank irqs. E.g., AMD hardware needs this to not screw up >>>> + * due to some irq handling quirk. >>> >>> Hmm. I'm thinking it would be better to simply not claim that there's a hw >>> counter if it isn't reliable. >>> >>>> + * >>>> + * This also sets the diff value for use as fallback below in case the >>>> + * hw does not support a suitable hw vblank counter. >>>> + */ >>>> + if (rc && framedur_ns) { >>>> t_old = &vblanktimestamp(dev, pipe, vblank->count); >>>> diff_ns = timeval_to_ns(&t_vblank) - timeval_to_ns(t_old); >>>> >>>> @@ -212,11 +216,28 @@ static void drm_update_vblank_count(struct drm_device *dev, unsigned int pipe, >>>> */ >>>> diff = DIV_ROUND_CLOSEST_ULL(diff_ns, framedur_ns); >>>> >>>> - if (diff == 0 && flags & DRM_CALLED_FROM_VBLIRQ) >>>> - DRM_DEBUG_VBL("crtc %u: Redundant vblirq ignored." >>>> - " diff_ns = %lld, framedur_ns = %d)\n", >>>> - pipe, (long long) diff_ns, framedur_ns); >>>> - } else { >>>> + if (diff == 0 && flags & DRM_CALLED_FROM_VBLIRQ) { >>>> + DRM_DEBUG_VBL("crtc %u: Redundant vblirq ignored." >>>> + " diff_ns = %lld, framedur_ns = %d)\n", >>>> + pipe, (long long) diff_ns, framedur_ns); >>>> + >>>> + /* Treat this redundant invocation as no-op. */ >>>> + WARN_ON_ONCE(cur_vblank != vblank->last); >>>> + return; >>>> + } >>>> + } >>>> + >>>> + /* >>>> + * hw counters, if marked as supported via max_vblank_count != 0, >>>> + * *must* increment at leading edge of vblank and in sync with >>>> + * the firing of the hw vblank irq, otherwise we have a race here >>>> + * between gpu and us, where small variations in our execution >>>> + * timing can cause off-by-one miscounting of vblanks. >>>> + */ >>>> + if (dev->max_vblank_count != 0) { >>>> + /* trust the hw counter when it's around */ >>>> + diff = (cur_vblank - vblank->last) & dev->max_vblank_count; >>>> + } else if (!(rc && framedur_ns)) { >>>> /* some kind of default for drivers w/o accurate vbl timestamping */ >>>> diff = (flags & DRM_CALLED_FROM_VBLIRQ) != 0; >>>> } >>> >>> >
On Fri, Nov 20, 2015 at 04:24:50PM +0100, Mario Kleiner wrote: > > > On 11/19/2015 08:45 PM, Ville Syrjälä wrote: > > On Thu, Nov 19, 2015 at 08:12:24PM +0100, Mario Kleiner wrote: > >> On 11/19/2015 07:20 PM, Ville Syrjälä wrote: > >>> On Thu, Nov 19, 2015 at 06:46:28PM +0100, Mario Kleiner wrote: > >>>> Hi Alex and Michel and Ville, > >>>> > >>>> it's "fix vblank stuff" time again ;-) > >>>> > >>>> Ville's changes to the DRM's drm_handle_vblank() / > >>>> drm_update_vblank_count() code in Linux 4.4 not only made that code more > >>>> elegant, but also removed the robustness against the vblank irq quirks > >>>> in AMD hw and similar hardware. So now i get tons of off-by-one errors and > >>>> > >>>> "[ 432.345] (WW) RADEON(1): radeon_dri2_flip_event_handler: Pageflip > >>>> completion event has impossible msc 24803 < target_msc 24804" XOrg > >>>> messages from that kernel. > >>> > >>> Argh. Sorry about that. > >>> > >> > >> On the plus side, your "vblank timestamp deltas as fake vblank counters" > >> code seems to work nicely on nouveau-kms, as far as testing with three > >> Nvidia's went so far :). And both Intel gpu's (HD Ironlake, and > >> Ivybridge) i tested checked out nicely. > >> > >> And at least the recent nv50+ NVidia Tesla also have 16 bit vblank > >> counters which we could implement in nouveau, maybe with the same > >> trickery to allow long trouble-free vblank off periods, hopefully that > >> would also apply to the Tegra-4 and later Kepler based parts. Tegra-3 > >> will probably also work. I think i read in the Tegra-3 PRM that the sync > >> points they use to implement vblank counters do increment at leading > >> edge of vblank. > >> > >> The only problem we may have is that some of the embedded gpus may not > >> have compliant vblank counters and they probably also lack vblank > >> timestamping, so it might be a good idea to rather not use vblank > >> counters at all in those drivers - patch their kms drivers to > >> max_vblank_count = 0; > >> > >>>> > >>>> One of the reasons for trouble is that AMD hw quirk where the hw fires > >>>> an extra vblank irq shortly after vblank irq's get enabled, not > >>>> synchronized to vblank, but typically in the middle of active scanout, > >>>> so we get a redundant call to drm_handle_vblank in the middle of scanout. > >>> > >>> I think that should be fine as such. The code should ignore redudntant > >>> vbl irqs. Well, assuming you have a reliable hw counter or you use the > >>> timestamp guesstimate mechanism and your scanout position is reported > >>> accurately. But I guess you have a bit of problem with both. > >>> > >> > >> The problem is i'll need to treat calls to radeon kms > >> driver->get_vblank_counter differently, depending if the function gets > >> called from vblank irq, or from regular code, so that hw quirk that > >> causes spontaneous misfiring of the vblank irq in the middle of scanout > >> would confuse my hw vblank counter cooking method to produce a fake hw > >> vblank counter increment. That's why i moved the filtering for redundant > >> irqs based on vblank timestamps in drm_vblank_update() around to always > >> apply. Makes us robust against that type of hw quirk in general and > >> makes life for the vblank counter cooking so much easier. > >> > >> It's a beautiful collaboration of different hw bugs to make things > >> interesting :) > >> > >>>> > >>>> To fix that i have a minor patch to make drm_update_vblank_count() again > >>>> robust against such redundant calls, which i will send out later to the > >>>> mailing list. Diff attached for reference. > >>>> > >>>> The second quirk of AMD hw is that the vblank interrupt fires a few > >>>> scanlines before start of vblank, so drm_handle_vblank -> > >>>> drm_update_vblank_count() -> dev->driver->get_vblank_counter() gets > >>>> called before the start of the vblank for which the new vblank count > >>>> should be queried. > >>> > >>> Does it fire too soon, or is the scanout position register value(s) > >>> just offset by a few lines perhaps? > >>> > >>> We have that with i915 and I simply fix up the value when reading it > >>> out. Fortunately for us the offset is constant (or at least seems to > >>> be) for a given platform/connector combo. > >>> > >> > >> I think they fire too soon, from all i've seen so far on a few cards. > > > > That's unfortunate. Firing a bit too late would be perfectly fine for > > most things. And that's actually what happens on older intel hw. Firing > > too soon opens up some more races, as in you may have to wait a bit > > more after getting woken up to make sure you've crossed into the > > freame you were waiting for. > > > > Also not sure how to deal with that sort of thing in a reasonable way in > > .get_vblank_timestamp(). DRM_CALLED_FROM_VBLIRQ gets passed there, so it > > could fudge things a bit when the early irq arrives, but then if someone > > calls drm_update_vblank_count() after the irq was handled but before > > start of vblank you'll end up with a -1 diff. > > > > Maybe something like: > > .get_scanout_positon() > > { > > read frame counter and scaline position > > > > if ((flags & DRM_CALLED_FROM_VBLIRQ || frame == last_fudge_frame) && > > scanline_slightly_below_start_of_vblank)) { > > last_fudge_frame = frame; > > scanline = start_of_vblank; > > } else > > last_fudge_frame = frame - 1; > > } > > > > could be done. So it would pretend the scanline has already crossed into > > the vblank when the interrupt arrives, and it should keep doing that > > until the frame counter indicates that you really crossed there. > > > > What we do in radeon-kms is similar. If DRM_CALLED_FROM_VBLIRQ and we > are no more than 1% of the display height away from start of vblank we > fudge scanout position in a way so that the timestamp gets computed for > the soon-to-begin vblank, not the old one. The problem with basing that fudging purely on DRM_CALLED_FROM_VBLIRQ is that if you call .get_scanout_positon() from a non-irq context between the irq firing and start of vblank, you'll think you're still in the previous frame. Hence my suggestion to note down the frame counter when called from the irq, and then keep doing the fudging until you've truly crossed the start of vblank. > For the vblank counter > cooking atm. i check for in_irq(), assuming that means the same as > DRM_CALLED_FROM_VBLIRQ, given that is currently the only call from a hw > irq path, and then i test scanout position for > no-more-than-10-scanlines-to-vblank, to decide if i have to bump the > returned vblank count. On the sample of gpus i tested the vblank irq > fired somewhere between 2 and 4 scanlines too early, so 10 scanlines > margin should be ok. So it's not firing much too early, just a little bit. > > >> > >>>> > >>>> The third problem is that the DRM vblank handling always had the > >>>> assumption that hardware vblank counters would essentially increment at > >>>> leading edge of vblank - basically in sync with the firing of the vblank > >>>> irq, so that a hw counter readout from within the vblank irq handler > >>>> would always deliver the new incremented value. If this assumption is > >>>> violated then the counting by use of the hw counter gets unreliable, > >>>> because depending on random small delays in irq handling the code may > >>>> end up sampling the hw counter pre- or post-increment, leading to > >>>> inconsistent updating and funky bugs. It just so happens that AMD > >>>> hardware doesn't increment the hw counter at leading edge of vblank, so > >>>> stuff falls apart. > >>>> > >>>> So to fix those two problems i'm tinkering with cooking the hw vblank > >>>> counter value returned by radeon_get_vblank_counter_kms() to make it > >>>> appear as if the counter incremented at leading edge of vblank in sync > >>>> with vblank irq. > >>> > >>> Yeah, I do that in i915 too. Well, on some platforms where the > >>> counter increments at the leading edge of active. > >>> > >> > >> Yep. I think that's key to make the vblank instant off stuff work reliably. > > > > I think it should work OK with the timestamp based stuff too, assuming > > you have accurate scanout position reporting. So having a reliable hw > > counter shouldn't be a requirement for that. > > > > Yes, should work for the instant off/on itself. But the limitation of > that is that with the timestamp based stuff we can only be accurate for > vblank off periods of a few seconds into the future. The framedur_ns we > calculate from hw mode and the refresh duration computed from the > timestamps as measured by the host clock can easily deviate by, e.g., 10 > usecs due to difference and drift between the gpu pll and host clock, so > for each passing refresh cycle with vblank off we'd accumulate ~10 usecs > of error. After accumulating half a refresh duration of error we'd round > to the wrong count and be off by one. So at 60 Hz refresh we'd have > about 16666 usecs / 2 = 8333 usecs headroom / 10 usecs per frame error ~ > 833 refresh cycles before errors happen. At 60 Hz we'd have 833 / 60 ~ > 14 seconds of vblank off. For rates of 120 Hz gamer panels or some 200 > Hz neuro-science applications we'd only get a few seconds off error free > vblank off without real hardware counters. > > So it's good enough for typical desktop > applications/compositors/games/media players, and a nice improvement > over the previous state, but not quite sufficient for applications that > need long time consistent vblank counts for long waits of multiple > seconds or even minutes. So it's still good to have hw counters if we > can get some that are reliable enough. Ah, I didn't realize you care about small errors in the counter for such long periods of vblank off. > > -mario > > > >> > >> -mario > >> > >>>> > >>>> It almost sort of works on the rs600 code path, but i need a bit of info > >>>> from you: > >>>> > >>>> 1. There's this register from the old specs for m76.pdf, which is not > >>>> part of the current register defines for radeon-kms: > >>>> > >>>> "D1CRTC_STATUS_VF_COUNT - RW - 32 bits - [GpuF0MMReg:0x60A8]" > >>>> > >>>> It contains the lower 16 bits of framecounter and the 13 bits of > >>>> vertical scanout position. It seems to give the same readings as the 24 > >>>> bit R_0060A4_D1CRTC_STATUS_FRAME_COUNT we use for the hw counter. This > >>>> would come handy. > >>>> > >>>> Does Evergreen and later have a same/similar register and where is it? > >>>> > >>>> 2. The hw framecounter seems to increment when the vertical scanout > >>>> position wraps back from (VTOTAL - 1) to 0, at least on the one DCE-3 > >>>> gpu i tested so far. Is this so on all asics? And is the hw counter > >>>> increment happening exactly at the moment that vertical scanout position > >>>> jumps back to zero, ie. both events are driven by the same signal? Or is > >>>> the framecounter increment just happening somewhere inside either > >>>> scanline VTOTAL-1 or scanline 0? > >>>> > >>>> > >>>> If we can fix this and get it into rc2 or rc3 then we could avoid a bad > >>>> regression and with a bit of luck at the same time improve by being able > >>>> to set dev->vblank_disable_immediate = true then and allow vblank irqs > >>>> to get turned off more aggressively for a bit of extra power saving. > >>>> > >>>> thanks, > >>>> -mario > >>> > >>>> -- a/drivers/gpu/drm/drm_irq.c > >>>> +++ b/drivers/gpu/drm/drm_irq.c > >>>> @@ -172,9 +172,11 @@ static void drm_update_vblank_count(struct drm_device *dev, unsigned int pipe, > >>>> unsigned long flags) > >>>> { > >>>> struct drm_vblank_crtc *vblank = &dev->vblank[pipe]; > >>>> - u32 cur_vblank, diff; > >>>> + u32 cur_vblank, diff = 0; > >>>> bool rc; > >>>> struct timeval t_vblank; > >>>> + const struct timeval *t_old; > >>>> + u64 diff_ns; > >>>> int count = DRM_TIMESTAMP_MAXRETRIES; > >>>> int framedur_ns = vblank->framedur_ns; > >>>> > >>>> @@ -195,13 +197,15 @@ static void drm_update_vblank_count(struct drm_device *dev, unsigned int pipe, > >>>> rc = drm_get_last_vbltimestamp(dev, pipe, &t_vblank, flags); > >>>> } while (cur_vblank != dev->driver->get_vblank_counter(dev, pipe) && --count > 0); > >>>> > >>>> - if (dev->max_vblank_count != 0) { > >>>> - /* trust the hw counter when it's around */ > >>>> - diff = (cur_vblank - vblank->last) & dev->max_vblank_count; > >>>> - } else if (rc && framedur_ns) { > >>>> - const struct timeval *t_old; > >>>> - u64 diff_ns; > >>>> - > >>>> + /* > >>>> + * Always use vblank timestamping based method if supported to reject > >>>> + * redundant vblank irqs. E.g., AMD hardware needs this to not screw up > >>>> + * due to some irq handling quirk. > >>> > >>> Hmm. I'm thinking it would be better to simply not claim that there's a hw > >>> counter if it isn't reliable. > >>> > >>>> + * > >>>> + * This also sets the diff value for use as fallback below in case the > >>>> + * hw does not support a suitable hw vblank counter. > >>>> + */ > >>>> + if (rc && framedur_ns) { > >>>> t_old = &vblanktimestamp(dev, pipe, vblank->count); > >>>> diff_ns = timeval_to_ns(&t_vblank) - timeval_to_ns(t_old); > >>>> > >>>> @@ -212,11 +216,28 @@ static void drm_update_vblank_count(struct drm_device *dev, unsigned int pipe, > >>>> */ > >>>> diff = DIV_ROUND_CLOSEST_ULL(diff_ns, framedur_ns); > >>>> > >>>> - if (diff == 0 && flags & DRM_CALLED_FROM_VBLIRQ) > >>>> - DRM_DEBUG_VBL("crtc %u: Redundant vblirq ignored." > >>>> - " diff_ns = %lld, framedur_ns = %d)\n", > >>>> - pipe, (long long) diff_ns, framedur_ns); > >>>> - } else { > >>>> + if (diff == 0 && flags & DRM_CALLED_FROM_VBLIRQ) { > >>>> + DRM_DEBUG_VBL("crtc %u: Redundant vblirq ignored." > >>>> + " diff_ns = %lld, framedur_ns = %d)\n", > >>>> + pipe, (long long) diff_ns, framedur_ns); > >>>> + > >>>> + /* Treat this redundant invocation as no-op. */ > >>>> + WARN_ON_ONCE(cur_vblank != vblank->last); > >>>> + return; > >>>> + } > >>>> + } > >>>> + > >>>> + /* > >>>> + * hw counters, if marked as supported via max_vblank_count != 0, > >>>> + * *must* increment at leading edge of vblank and in sync with > >>>> + * the firing of the hw vblank irq, otherwise we have a race here > >>>> + * between gpu and us, where small variations in our execution > >>>> + * timing can cause off-by-one miscounting of vblanks. > >>>> + */ > >>>> + if (dev->max_vblank_count != 0) { > >>>> + /* trust the hw counter when it's around */ > >>>> + diff = (cur_vblank - vblank->last) & dev->max_vblank_count; > >>>> + } else if (!(rc && framedur_ns)) { > >>>> /* some kind of default for drivers w/o accurate vbl timestamping */ > >>>> diff = (flags & DRM_CALLED_FROM_VBLIRQ) != 0; > >>>> } > >>> > >>> > >
On 11/20/2015 04:34 PM, Ville Syrjälä wrote: > On Fri, Nov 20, 2015 at 04:24:50PM +0100, Mario Kleiner wrote: ... >> What we do in radeon-kms is similar. If DRM_CALLED_FROM_VBLIRQ and we >> are no more than 1% of the display height away from start of vblank we >> fudge scanout position in a way so that the timestamp gets computed for >> the soon-to-begin vblank, not the old one. > > The problem with basing that fudging purely on DRM_CALLED_FROM_VBLIRQ is > that if you call .get_scanout_positon() from a non-irq context between > the irq firing and start of vblank, you'll think you're still in the > previous frame. Hence my suggestion to note down the frame counter when > called from the irq, and then keep doing the fudging until you've truly > crossed the start of vblank. > Ok, but why would that be a bad thing? I think we want it to think it is in the previous frame if it is called outside the vblank irq context. The only reason we fudge it to the next frames vblank if i vblank irq is because we know the vblank irq handler we are executing atm. was meant to execute within the upcoming vblank for the next frame, so we fudge the scanout positions and thereby timestamp to correspond to that new frame. But if something called outside irq context it should get a scanout position/timestamp that corresponds to "reality". It would maybe be a problem to get different answers for a query at the same scanout position if we used .get_scanout_position() for something else than for calculating vblank timestamps, but we don't do that atm. Maybe i'm overlooking something here related to the somewhat rewritten code from the last year or so? But in the original design this would be exactly what i intended? ... >> So it's good enough for typical desktop >> applications/compositors/games/media players, and a nice improvement >> over the previous state, but not quite sufficient for applications that >> need long time consistent vblank counts for long waits of multiple >> seconds or even minutes. So it's still good to have hw counters if we >> can get some that are reliable enough. > > Ah, I didn't realize you care about small errors in the counter for > such long periods of vblank off. > Actually, you are right, i was stupid - not enough sleep last friday. I do care about such small errors, but the long vblank off periods don't matter at all the way my software works. I query the current count and timestamp (glXGetSyncValuesOML), calculate a target count based on those and then schedule a swap for that target count via glXSwapBuffersMscOML. That swapbuffers call will keep vblank irqs on until the kms pageflip is queued. So i only care about vblank counts/timestamps being consistent for short amounts of time, typically 1 video refresh from vblank query to queueing a vblank event, and then from reception of that event/queuing the pageflip to pageflip completion event. So even if the system would be heavily loaded and my code would have big preemption delays i think counts that are consistent over a few seconds would be enough to keep things working. Otherwise it wouldn't work now either with a vblank off after 5 seconds and nouveau not having vblank hw counters. -mario >> >> -mario >> >> >>>> >>>> -mario >>>> >>>>>> >>>>>> It almost sort of works on the rs600 code path, but i need a bit of info >>>>>> from you: >>>>>> >>>>>> 1. There's this register from the old specs for m76.pdf, which is not >>>>>> part of the current register defines for radeon-kms: >>>>>> >>>>>> "D1CRTC_STATUS_VF_COUNT - RW - 32 bits - [GpuF0MMReg:0x60A8]" >>>>>> >>>>>> It contains the lower 16 bits of framecounter and the 13 bits of >>>>>> vertical scanout position. It seems to give the same readings as the 24 >>>>>> bit R_0060A4_D1CRTC_STATUS_FRAME_COUNT we use for the hw counter. This >>>>>> would come handy. >>>>>> >>>>>> Does Evergreen and later have a same/similar register and where is it? >>>>>> >>>>>> 2. The hw framecounter seems to increment when the vertical scanout >>>>>> position wraps back from (VTOTAL - 1) to 0, at least on the one DCE-3 >>>>>> gpu i tested so far. Is this so on all asics? And is the hw counter >>>>>> increment happening exactly at the moment that vertical scanout position >>>>>> jumps back to zero, ie. both events are driven by the same signal? Or is >>>>>> the framecounter increment just happening somewhere inside either >>>>>> scanline VTOTAL-1 or scanline 0? >>>>>> >>>>>> >>>>>> If we can fix this and get it into rc2 or rc3 then we could avoid a bad >>>>>> regression and with a bit of luck at the same time improve by being able >>>>>> to set dev->vblank_disable_immediate = true then and allow vblank irqs >>>>>> to get turned off more aggressively for a bit of extra power saving. >>>>>> >>>>>> thanks, >>>>>> -mario >>>>> >>>>>> -- a/drivers/gpu/drm/drm_irq.c >>>>>> +++ b/drivers/gpu/drm/drm_irq.c >>>>>> @@ -172,9 +172,11 @@ static void drm_update_vblank_count(struct drm_device *dev, unsigned int pipe, >>>>>> unsigned long flags) >>>>>> { >>>>>> struct drm_vblank_crtc *vblank = &dev->vblank[pipe]; >>>>>> - u32 cur_vblank, diff; >>>>>> + u32 cur_vblank, diff = 0; >>>>>> bool rc; >>>>>> struct timeval t_vblank; >>>>>> + const struct timeval *t_old; >>>>>> + u64 diff_ns; >>>>>> int count = DRM_TIMESTAMP_MAXRETRIES; >>>>>> int framedur_ns = vblank->framedur_ns; >>>>>> >>>>>> @@ -195,13 +197,15 @@ static void drm_update_vblank_count(struct drm_device *dev, unsigned int pipe, >>>>>> rc = drm_get_last_vbltimestamp(dev, pipe, &t_vblank, flags); >>>>>> } while (cur_vblank != dev->driver->get_vblank_counter(dev, pipe) && --count > 0); >>>>>> >>>>>> - if (dev->max_vblank_count != 0) { >>>>>> - /* trust the hw counter when it's around */ >>>>>> - diff = (cur_vblank - vblank->last) & dev->max_vblank_count; >>>>>> - } else if (rc && framedur_ns) { >>>>>> - const struct timeval *t_old; >>>>>> - u64 diff_ns; >>>>>> - >>>>>> + /* >>>>>> + * Always use vblank timestamping based method if supported to reject >>>>>> + * redundant vblank irqs. E.g., AMD hardware needs this to not screw up >>>>>> + * due to some irq handling quirk. >>>>> >>>>> Hmm. I'm thinking it would be better to simply not claim that there's a hw >>>>> counter if it isn't reliable. >>>>> >>>>>> + * >>>>>> + * This also sets the diff value for use as fallback below in case the >>>>>> + * hw does not support a suitable hw vblank counter. >>>>>> + */ >>>>>> + if (rc && framedur_ns) { >>>>>> t_old = &vblanktimestamp(dev, pipe, vblank->count); >>>>>> diff_ns = timeval_to_ns(&t_vblank) - timeval_to_ns(t_old); >>>>>> >>>>>> @@ -212,11 +216,28 @@ static void drm_update_vblank_count(struct drm_device *dev, unsigned int pipe, >>>>>> */ >>>>>> diff = DIV_ROUND_CLOSEST_ULL(diff_ns, framedur_ns); >>>>>> >>>>>> - if (diff == 0 && flags & DRM_CALLED_FROM_VBLIRQ) >>>>>> - DRM_DEBUG_VBL("crtc %u: Redundant vblirq ignored." >>>>>> - " diff_ns = %lld, framedur_ns = %d)\n", >>>>>> - pipe, (long long) diff_ns, framedur_ns); >>>>>> - } else { >>>>>> + if (diff == 0 && flags & DRM_CALLED_FROM_VBLIRQ) { >>>>>> + DRM_DEBUG_VBL("crtc %u: Redundant vblirq ignored." >>>>>> + " diff_ns = %lld, framedur_ns = %d)\n", >>>>>> + pipe, (long long) diff_ns, framedur_ns); >>>>>> + >>>>>> + /* Treat this redundant invocation as no-op. */ >>>>>> + WARN_ON_ONCE(cur_vblank != vblank->last); >>>>>> + return; >>>>>> + } >>>>>> + } >>>>>> + >>>>>> + /* >>>>>> + * hw counters, if marked as supported via max_vblank_count != 0, >>>>>> + * *must* increment at leading edge of vblank and in sync with >>>>>> + * the firing of the hw vblank irq, otherwise we have a race here >>>>>> + * between gpu and us, where small variations in our execution >>>>>> + * timing can cause off-by-one miscounting of vblanks. >>>>>> + */ >>>>>> + if (dev->max_vblank_count != 0) { >>>>>> + /* trust the hw counter when it's around */ >>>>>> + diff = (cur_vblank - vblank->last) & dev->max_vblank_count; >>>>>> + } else if (!(rc && framedur_ns)) { >>>>>> /* some kind of default for drivers w/o accurate vbl timestamping */ >>>>>> diff = (flags & DRM_CALLED_FROM_VBLIRQ) != 0; >>>>>> } >>>>> >>>>> >>> >
On Mon, Nov 23, 2015 at 04:23:21PM +0100, Mario Kleiner wrote: > On 11/20/2015 04:34 PM, Ville Syrjälä wrote: > > On Fri, Nov 20, 2015 at 04:24:50PM +0100, Mario Kleiner wrote: > > ... > > >> What we do in radeon-kms is similar. If DRM_CALLED_FROM_VBLIRQ and we > >> are no more than 1% of the display height away from start of vblank we > >> fudge scanout position in a way so that the timestamp gets computed for > >> the soon-to-begin vblank, not the old one. > > > > The problem with basing that fudging purely on DRM_CALLED_FROM_VBLIRQ is > > that if you call .get_scanout_positon() from a non-irq context between > > the irq firing and start of vblank, you'll think you're still in the > > previous frame. Hence my suggestion to note down the frame counter when > > called from the irq, and then keep doing the fudging until you've truly > > crossed the start of vblank. > > > > Ok, but why would that be a bad thing? I think we want it to think it is > in the previous frame if it is called outside the vblank irq context. > The only reason we fudge it to the next frames vblank if i vblank irq is > because we know the vblank irq handler we are executing atm. was meant > to execute within the upcoming vblank for the next frame, so we fudge > the scanout positions and thereby timestamp to correspond to that new > frame. But if something called outside irq context it should get a > scanout position/timestamp that corresponds to "reality". It would be a bad thing since it would cause the timestamp to jump backwards, and that would also cause the frame count guesstimate to go backwards. > > It would maybe be a problem to get different answers for a query at the > same scanout position if we used .get_scanout_position() for something > else than for calculating vblank timestamps, but we don't do that atm. > > Maybe i'm overlooking something here related to the somewhat rewritten > code from the last year or so? But in the original design this would be > exactly what i intended? > > ... > > >> So it's good enough for typical desktop > >> applications/compositors/games/media players, and a nice improvement > >> over the previous state, but not quite sufficient for applications that > >> need long time consistent vblank counts for long waits of multiple > >> seconds or even minutes. So it's still good to have hw counters if we > >> can get some that are reliable enough. > > > > Ah, I didn't realize you care about small errors in the counter for > > such long periods of vblank off. > > > > Actually, you are right, i was stupid - not enough sleep last friday. I > do care about such small errors, but the long vblank off periods don't > matter at all the way my software works. I query the current count and > timestamp (glXGetSyncValuesOML), calculate a target count based on those > and then schedule a swap for that target count via glXSwapBuffersMscOML. > That swapbuffers call will keep vblank irqs on until the kms pageflip is > queued. So i only care about vblank counts/timestamps being consistent > for short amounts of time, typically 1 video refresh from vblank query > to queueing a vblank event, and then from reception of that > event/queuing the pageflip to pageflip completion event. So even if the > system would be heavily loaded and my code would have big preemption > delays i think counts that are consistent over a few seconds would be > enough to keep things working. Otherwise it wouldn't work now either > with a vblank off after 5 seconds and nouveau not having vblank hw counters. > > -mario > > > >> > >> -mario > >> > >> > >>>> > >>>> -mario > >>>> > >>>>>> > >>>>>> It almost sort of works on the rs600 code path, but i need a bit of info > >>>>>> from you: > >>>>>> > >>>>>> 1. There's this register from the old specs for m76.pdf, which is not > >>>>>> part of the current register defines for radeon-kms: > >>>>>> > >>>>>> "D1CRTC_STATUS_VF_COUNT - RW - 32 bits - [GpuF0MMReg:0x60A8]" > >>>>>> > >>>>>> It contains the lower 16 bits of framecounter and the 13 bits of > >>>>>> vertical scanout position. It seems to give the same readings as the 24 > >>>>>> bit R_0060A4_D1CRTC_STATUS_FRAME_COUNT we use for the hw counter. This > >>>>>> would come handy. > >>>>>> > >>>>>> Does Evergreen and later have a same/similar register and where is it? > >>>>>> > >>>>>> 2. The hw framecounter seems to increment when the vertical scanout > >>>>>> position wraps back from (VTOTAL - 1) to 0, at least on the one DCE-3 > >>>>>> gpu i tested so far. Is this so on all asics? And is the hw counter > >>>>>> increment happening exactly at the moment that vertical scanout position > >>>>>> jumps back to zero, ie. both events are driven by the same signal? Or is > >>>>>> the framecounter increment just happening somewhere inside either > >>>>>> scanline VTOTAL-1 or scanline 0? > >>>>>> > >>>>>> > >>>>>> If we can fix this and get it into rc2 or rc3 then we could avoid a bad > >>>>>> regression and with a bit of luck at the same time improve by being able > >>>>>> to set dev->vblank_disable_immediate = true then and allow vblank irqs > >>>>>> to get turned off more aggressively for a bit of extra power saving. > >>>>>> > >>>>>> thanks, > >>>>>> -mario > >>>>> > >>>>>> -- a/drivers/gpu/drm/drm_irq.c > >>>>>> +++ b/drivers/gpu/drm/drm_irq.c > >>>>>> @@ -172,9 +172,11 @@ static void drm_update_vblank_count(struct drm_device *dev, unsigned int pipe, > >>>>>> unsigned long flags) > >>>>>> { > >>>>>> struct drm_vblank_crtc *vblank = &dev->vblank[pipe]; > >>>>>> - u32 cur_vblank, diff; > >>>>>> + u32 cur_vblank, diff = 0; > >>>>>> bool rc; > >>>>>> struct timeval t_vblank; > >>>>>> + const struct timeval *t_old; > >>>>>> + u64 diff_ns; > >>>>>> int count = DRM_TIMESTAMP_MAXRETRIES; > >>>>>> int framedur_ns = vblank->framedur_ns; > >>>>>> > >>>>>> @@ -195,13 +197,15 @@ static void drm_update_vblank_count(struct drm_device *dev, unsigned int pipe, > >>>>>> rc = drm_get_last_vbltimestamp(dev, pipe, &t_vblank, flags); > >>>>>> } while (cur_vblank != dev->driver->get_vblank_counter(dev, pipe) && --count > 0); > >>>>>> > >>>>>> - if (dev->max_vblank_count != 0) { > >>>>>> - /* trust the hw counter when it's around */ > >>>>>> - diff = (cur_vblank - vblank->last) & dev->max_vblank_count; > >>>>>> - } else if (rc && framedur_ns) { > >>>>>> - const struct timeval *t_old; > >>>>>> - u64 diff_ns; > >>>>>> - > >>>>>> + /* > >>>>>> + * Always use vblank timestamping based method if supported to reject > >>>>>> + * redundant vblank irqs. E.g., AMD hardware needs this to not screw up > >>>>>> + * due to some irq handling quirk. > >>>>> > >>>>> Hmm. I'm thinking it would be better to simply not claim that there's a hw > >>>>> counter if it isn't reliable. > >>>>> > >>>>>> + * > >>>>>> + * This also sets the diff value for use as fallback below in case the > >>>>>> + * hw does not support a suitable hw vblank counter. > >>>>>> + */ > >>>>>> + if (rc && framedur_ns) { > >>>>>> t_old = &vblanktimestamp(dev, pipe, vblank->count); > >>>>>> diff_ns = timeval_to_ns(&t_vblank) - timeval_to_ns(t_old); > >>>>>> > >>>>>> @@ -212,11 +216,28 @@ static void drm_update_vblank_count(struct drm_device *dev, unsigned int pipe, > >>>>>> */ > >>>>>> diff = DIV_ROUND_CLOSEST_ULL(diff_ns, framedur_ns); > >>>>>> > >>>>>> - if (diff == 0 && flags & DRM_CALLED_FROM_VBLIRQ) > >>>>>> - DRM_DEBUG_VBL("crtc %u: Redundant vblirq ignored." > >>>>>> - " diff_ns = %lld, framedur_ns = %d)\n", > >>>>>> - pipe, (long long) diff_ns, framedur_ns); > >>>>>> - } else { > >>>>>> + if (diff == 0 && flags & DRM_CALLED_FROM_VBLIRQ) { > >>>>>> + DRM_DEBUG_VBL("crtc %u: Redundant vblirq ignored." > >>>>>> + " diff_ns = %lld, framedur_ns = %d)\n", > >>>>>> + pipe, (long long) diff_ns, framedur_ns); > >>>>>> + > >>>>>> + /* Treat this redundant invocation as no-op. */ > >>>>>> + WARN_ON_ONCE(cur_vblank != vblank->last); > >>>>>> + return; > >>>>>> + } > >>>>>> + } > >>>>>> + > >>>>>> + /* > >>>>>> + * hw counters, if marked as supported via max_vblank_count != 0, > >>>>>> + * *must* increment at leading edge of vblank and in sync with > >>>>>> + * the firing of the hw vblank irq, otherwise we have a race here > >>>>>> + * between gpu and us, where small variations in our execution > >>>>>> + * timing can cause off-by-one miscounting of vblanks. > >>>>>> + */ > >>>>>> + if (dev->max_vblank_count != 0) { > >>>>>> + /* trust the hw counter when it's around */ > >>>>>> + diff = (cur_vblank - vblank->last) & dev->max_vblank_count; > >>>>>> + } else if (!(rc && framedur_ns)) { > >>>>>> /* some kind of default for drivers w/o accurate vbl timestamping */ > >>>>>> diff = (flags & DRM_CALLED_FROM_VBLIRQ) != 0; > >>>>>> } > >>>>> > >>>>> > >>> > >
On 11/23/2015 04:51 PM, Ville Syrjälä wrote: > On Mon, Nov 23, 2015 at 04:23:21PM +0100, Mario Kleiner wrote: >> On 11/20/2015 04:34 PM, Ville Syrjälä wrote: >>> On Fri, Nov 20, 2015 at 04:24:50PM +0100, Mario Kleiner wrote: >> >> ... >> Ok, but why would that be a bad thing? I think we want it to think it is >> in the previous frame if it is called outside the vblank irq context. >> The only reason we fudge it to the next frames vblank if i vblank irq is >> because we know the vblank irq handler we are executing atm. was meant >> to execute within the upcoming vblank for the next frame, so we fudge >> the scanout positions and thereby timestamp to correspond to that new >> frame. But if something called outside irq context it should get a >> scanout position/timestamp that corresponds to "reality". > > It would be a bad thing since it would cause the timestamp to jump > backwards, and that would also cause the frame count guesstimate to go > backwards. > But only if we don't use the dev->driver->get_vblank_counter() method, which we try to use on AMD. Also dev->vblank_time_lock should protect us from concurrent execution of the vblank counting/timestamping in irq context and non-irq context? At least that was one of the purposes of that lock in the past? -mario >> >> It would maybe be a problem to get different answers for a query at the >> same scanout position if we used .get_scanout_position() for something >> else than for calculating vblank timestamps, but we don't do that atm. >> >> Maybe i'm overlooking something here related to the somewhat rewritten >> code from the last year or so? But in the original design this would be >> exactly what i intended? >> >> ... >> >>>> So it's good enough for typical desktop >>>> applications/compositors/games/media players, and a nice improvement >>>> over the previous state, but not quite sufficient for applications that >>>> need long time consistent vblank counts for long waits of multiple >>>> seconds or even minutes. So it's still good to have hw counters if we >>>> can get some that are reliable enough. >>> >>> Ah, I didn't realize you care about small errors in the counter for >>> such long periods of vblank off. >>> >> >> Actually, you are right, i was stupid - not enough sleep last friday. I >> do care about such small errors, but the long vblank off periods don't >> matter at all the way my software works. I query the current count and >> timestamp (glXGetSyncValuesOML), calculate a target count based on those >> and then schedule a swap for that target count via glXSwapBuffersMscOML. >> That swapbuffers call will keep vblank irqs on until the kms pageflip is >> queued. So i only care about vblank counts/timestamps being consistent >> for short amounts of time, typically 1 video refresh from vblank query >> to queueing a vblank event, and then from reception of that >> event/queuing the pageflip to pageflip completion event. So even if the >> system would be heavily loaded and my code would have big preemption >> delays i think counts that are consistent over a few seconds would be >> enough to keep things working. Otherwise it wouldn't work now either >> with a vblank off after 5 seconds and nouveau not having vblank hw counters. >> >> -mario >> >> >>>> >>>> -mario >>>> >>>> >>>>>> >>>>>> -mario >>>>>> >>>>>>>> >>>>>>>> It almost sort of works on the rs600 code path, but i need a bit of info >>>>>>>> from you: >>>>>>>> >>>>>>>> 1. There's this register from the old specs for m76.pdf, which is not >>>>>>>> part of the current register defines for radeon-kms: >>>>>>>> >>>>>>>> "D1CRTC_STATUS_VF_COUNT - RW - 32 bits - [GpuF0MMReg:0x60A8]" >>>>>>>> >>>>>>>> It contains the lower 16 bits of framecounter and the 13 bits of >>>>>>>> vertical scanout position. It seems to give the same readings as the 24 >>>>>>>> bit R_0060A4_D1CRTC_STATUS_FRAME_COUNT we use for the hw counter. This >>>>>>>> would come handy. >>>>>>>> >>>>>>>> Does Evergreen and later have a same/similar register and where is it? >>>>>>>> >>>>>>>> 2. The hw framecounter seems to increment when the vertical scanout >>>>>>>> position wraps back from (VTOTAL - 1) to 0, at least on the one DCE-3 >>>>>>>> gpu i tested so far. Is this so on all asics? And is the hw counter >>>>>>>> increment happening exactly at the moment that vertical scanout position >>>>>>>> jumps back to zero, ie. both events are driven by the same signal? Or is >>>>>>>> the framecounter increment just happening somewhere inside either >>>>>>>> scanline VTOTAL-1 or scanline 0? >>>>>>>> >>>>>>>> >>>>>>>> If we can fix this and get it into rc2 or rc3 then we could avoid a bad >>>>>>>> regression and with a bit of luck at the same time improve by being able >>>>>>>> to set dev->vblank_disable_immediate = true then and allow vblank irqs >>>>>>>> to get turned off more aggressively for a bit of extra power saving. >>>>>>>> >>>>>>>> thanks, >>>>>>>> -mario >>>>>>> >>>>>>>> -- a/drivers/gpu/drm/drm_irq.c >>>>>>>> +++ b/drivers/gpu/drm/drm_irq.c >>>>>>>> @@ -172,9 +172,11 @@ static void drm_update_vblank_count(struct drm_device *dev, unsigned int pipe, >>>>>>>> unsigned long flags) >>>>>>>> { >>>>>>>> struct drm_vblank_crtc *vblank = &dev->vblank[pipe]; >>>>>>>> - u32 cur_vblank, diff; >>>>>>>> + u32 cur_vblank, diff = 0; >>>>>>>> bool rc; >>>>>>>> struct timeval t_vblank; >>>>>>>> + const struct timeval *t_old; >>>>>>>> + u64 diff_ns; >>>>>>>> int count = DRM_TIMESTAMP_MAXRETRIES; >>>>>>>> int framedur_ns = vblank->framedur_ns; >>>>>>>> >>>>>>>> @@ -195,13 +197,15 @@ static void drm_update_vblank_count(struct drm_device *dev, unsigned int pipe, >>>>>>>> rc = drm_get_last_vbltimestamp(dev, pipe, &t_vblank, flags); >>>>>>>> } while (cur_vblank != dev->driver->get_vblank_counter(dev, pipe) && --count > 0); >>>>>>>> >>>>>>>> - if (dev->max_vblank_count != 0) { >>>>>>>> - /* trust the hw counter when it's around */ >>>>>>>> - diff = (cur_vblank - vblank->last) & dev->max_vblank_count; >>>>>>>> - } else if (rc && framedur_ns) { >>>>>>>> - const struct timeval *t_old; >>>>>>>> - u64 diff_ns; >>>>>>>> - >>>>>>>> + /* >>>>>>>> + * Always use vblank timestamping based method if supported to reject >>>>>>>> + * redundant vblank irqs. E.g., AMD hardware needs this to not screw up >>>>>>>> + * due to some irq handling quirk. >>>>>>> >>>>>>> Hmm. I'm thinking it would be better to simply not claim that there's a hw >>>>>>> counter if it isn't reliable. >>>>>>> >>>>>>>> + * >>>>>>>> + * This also sets the diff value for use as fallback below in case the >>>>>>>> + * hw does not support a suitable hw vblank counter. >>>>>>>> + */ >>>>>>>> + if (rc && framedur_ns) { >>>>>>>> t_old = &vblanktimestamp(dev, pipe, vblank->count); >>>>>>>> diff_ns = timeval_to_ns(&t_vblank) - timeval_to_ns(t_old); >>>>>>>> >>>>>>>> @@ -212,11 +216,28 @@ static void drm_update_vblank_count(struct drm_device *dev, unsigned int pipe, >>>>>>>> */ >>>>>>>> diff = DIV_ROUND_CLOSEST_ULL(diff_ns, framedur_ns); >>>>>>>> >>>>>>>> - if (diff == 0 && flags & DRM_CALLED_FROM_VBLIRQ) >>>>>>>> - DRM_DEBUG_VBL("crtc %u: Redundant vblirq ignored." >>>>>>>> - " diff_ns = %lld, framedur_ns = %d)\n", >>>>>>>> - pipe, (long long) diff_ns, framedur_ns); >>>>>>>> - } else { >>>>>>>> + if (diff == 0 && flags & DRM_CALLED_FROM_VBLIRQ) { >>>>>>>> + DRM_DEBUG_VBL("crtc %u: Redundant vblirq ignored." >>>>>>>> + " diff_ns = %lld, framedur_ns = %d)\n", >>>>>>>> + pipe, (long long) diff_ns, framedur_ns); >>>>>>>> + >>>>>>>> + /* Treat this redundant invocation as no-op. */ >>>>>>>> + WARN_ON_ONCE(cur_vblank != vblank->last); >>>>>>>> + return; >>>>>>>> + } >>>>>>>> + } >>>>>>>> + >>>>>>>> + /* >>>>>>>> + * hw counters, if marked as supported via max_vblank_count != 0, >>>>>>>> + * *must* increment at leading edge of vblank and in sync with >>>>>>>> + * the firing of the hw vblank irq, otherwise we have a race here >>>>>>>> + * between gpu and us, where small variations in our execution >>>>>>>> + * timing can cause off-by-one miscounting of vblanks. >>>>>>>> + */ >>>>>>>> + if (dev->max_vblank_count != 0) { >>>>>>>> + /* trust the hw counter when it's around */ >>>>>>>> + diff = (cur_vblank - vblank->last) & dev->max_vblank_count; >>>>>>>> + } else if (!(rc && framedur_ns)) { >>>>>>>> /* some kind of default for drivers w/o accurate vbl timestamping */ >>>>>>>> diff = (flags & DRM_CALLED_FROM_VBLIRQ) != 0; >>>>>>>> } >>>>>>> >>>>>>> >>>>> >>> >
Hi Mario, when we've had issues with this on amdgpu Christian fixed it by enabling page flip irq all the time, rather than turning it on when usermode request a flip and turning it back off after we handled it. I believe that fix exists on radeon already. Michel should have more info on that. See other comments inline. Thanks, Harry On 2015-11-23 11:02 AM, Mario Kleiner wrote: > On 11/20/2015 04:42 AM, Alex Deucher wrote: >> On Thu, Nov 19, 2015 at 12:46 PM, Mario Kleiner >> <mario.kleiner.de@gmail.com> wrote: >>> Hi Alex and Michel and Ville, >>> >>> it's "fix vblank stuff" time again ;-) >> >> Adding Harry from our display team. He might be able to fill in the >> blanks of on some of this better than I can. It might also be worth >> checking to see how our DAL (our new display code which is being >> developed directly by our display team) code handles this. It could >> be that we are just missing register settings: > > Thanks Alex! And hello Harry :) > >> http://cgit.freedesktop.org/~agd5f/linux/log/?h=DAL-wip > > I'll have a look at this. > >> Additionally we've published full registers headers for the display >> block: >> http://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/tree/drivers/gpu/drm/amd/include/asic_reg/dce >> >> The DCE8 stuff should generally apply back to DCE4. If you have >> questions about registers older asics not covered in the hw docs, let >> me know. Note the new headers are dword aligned rather than byte >> aligned. >> > > I've tested now with two different progressive modes on DCE3 and one > progressive mode on DCE4, the only cards i have atm. So far it seems > that the framecounter indeed increments when the vpos from the scanout > position query jumps back to zero. Attached for reference is my > current patch to radeon-kms. This one seems to work reliably so far, > also if i enable the immediate vblank irq disable which we so far only > used on intel-kms. > > But according to this patch the framecounter increment happens > somewhere in the middle of vblank. > > Essentially from the vpos extracted from > EVERGREEN_CRTC_STATUS_POSITION which defines start of vblank ("Start" > part of EVERGREEN_CRTC_V_BLANK_START_END) until maximum ie. VTOTAL-1 > the framecounter stays on the count of the old previous scanout cycle. > Then when vpos wraps to zero the framecounter increments by 1. And > then we have another couple of dozen lines inside vblank until > reaching the "End" part of EVERGREEN_CRTC_V_BLANK_START_END and > entering active scanout for the new frame. > > So the position of observed framecounter increment seems to be not > close to the end of vblank ("start of frame" indicator?), but a couple > of scanlines after start of vblank. > > E.g., for a 2560x1440 video mode at 60 Hz, start of vblank is 1478, > vtotal is 1481, end of vblank is 38. So i enter the vblank and see the > old framecounter for vpos = 1478, 1479, 1480, then it wraps to 0 and > the framecounter increments by 1, then 38 scanlines later the vblank > ends. > > So i seem to have something that seems to work in practice and this > "increment framecounter if vpos wraps back to zero" behavior makes > some sense. It just doesn't conform to what those descriptions for > start_line and "start of frame" indicator describe? > This is correct. Our HW doesn't really have a vblank counter but a frame counter. The framecounter increments at the start of vsync, which is when we wrap to zero which doesn't coincide with the start of vblank. What we're trying to do with get_vblank_counter isn't the same as what framecount gives us, but we could probably do something like: if (get_scanout_pos > vblank_start) return frame_count + 1; else return frame_count; Harry > I'll test with a few more video modes. > > thanks, > -mario > > >>> >>> Ville's changes to the DRM's drm_handle_vblank() / >>> drm_update_vblank_count() >>> code in Linux 4.4 not only made that code more elegant, but also >>> removed the >>> robustness against the vblank irq quirks in AMD hw and similar >>> hardware. So >>> now i get tons of off-by-one errors and >>> >>> "[ 432.345] (WW) RADEON(1): radeon_dri2_flip_event_handler: Pageflip >>> completion event has impossible msc 24803 < target_msc 24804" XOrg >>> messages >>> from that kernel. >>> >>> One of the reasons for trouble is that AMD hw quirk where the hw >>> fires an >>> extra vblank irq shortly after vblank irq's get enabled, not >>> synchronized to >>> vblank, but typically in the middle of active scanout, so we get a >>> redundant >>> call to drm_handle_vblank in the middle of scanout. >>> >>> To fix that i have a minor patch to make drm_update_vblank_count() >>> again >>> robust against such redundant calls, which i will send out later to the >>> mailing list. Diff attached for reference. >>> >>> The second quirk of AMD hw is that the vblank interrupt fires a few >>> scanlines before start of vblank, so drm_handle_vblank -> >>> drm_update_vblank_count() -> dev->driver->get_vblank_counter() gets >>> called >>> before the start of the vblank for which the new vblank count should be >>> queried. >>> >>> The third problem is that the DRM vblank handling always had the >>> assumption >>> that hardware vblank counters would essentially increment at leading >>> edge of >>> vblank - basically in sync with the firing of the vblank irq, so >>> that a hw >>> counter readout from within the vblank irq handler would always >>> deliver the >>> new incremented value. If this assumption is violated then the >>> counting by >>> use of the hw counter gets unreliable, because depending on random >>> small >>> delays in irq handling the code may end up sampling the hw counter >>> pre- or >>> post-increment, leading to inconsistent updating and funky bugs. It >>> just so >>> happens that AMD hardware doesn't increment the hw counter at >>> leading edge >>> of vblank, so stuff falls apart. >>> >>> So to fix those two problems i'm tinkering with cooking the hw vblank >>> counter value returned by radeon_get_vblank_counter_kms() to make it >>> appear >>> as if the counter incremented at leading edge of vblank in sync with >>> vblank >>> irq. >>> >>> It almost sort of works on the rs600 code path, but i need a bit of >>> info >>> from you: >>> >>> 1. There's this register from the old specs for m76.pdf, which is >>> not part >>> of the current register defines for radeon-kms: >>> >>> "D1CRTC_STATUS_VF_COUNT - RW - 32 bits - [GpuF0MMReg:0x60A8]" >>> >>> It contains the lower 16 bits of framecounter and the 13 bits of >>> vertical >>> scanout position. It seems to give the same readings as the 24 bit >>> R_0060A4_D1CRTC_STATUS_FRAME_COUNT we use for the hw counter. This >>> would >>> come handy. >>> >>> Does Evergreen and later have a same/similar register and where is it? >> >> Yes. CRTC_STATUS_VF_COUNT >> >> CRTC:CRTC_STATUS_VF_COUNT · [R/W] · 32 bits · Access: 8/16/32 · >> [INST0] GpuF0MMReg:0x6e9c, [INST1] GpuF0MMReg:0x7a9c, [INST2] >> GpuF0MMReg:0x1069c, [INST3] GpuF0MMReg:0x1129c, [INST4] >> GpuF0MMReg:0x11e9c, [INST5] GpuF0MMReg:0x12a9c >> DESCRIPTION: Current composite vertical and frame count for CRTC >> Field Name Bits Default Description >> CRTC_VF_COUNT >> (Access: R) 28:0 0x0 Reports current vertical and frame count >> >>> >>> 2. The hw framecounter seems to increment when the vertical scanout >>> position >>> wraps back from (VTOTAL - 1) to 0, at least on the one DCE-3 gpu i >>> tested so >>> far. Is this so on all asics? And is the hw counter increment happening >>> exactly at the moment that vertical scanout position jumps back to >>> zero, ie. >>> both events are driven by the same signal? Or is the framecounter >>> increment >>> just happening somewhere inside either scanline VTOTAL-1 or scanline 0? >> >> Unless Harry knows, we can ask the hw team, but I doubt they would >> have changed it. I think it's tied to the start line which is >> controlled by this register: >> CRTC:CRTC_START_LINE_CONTROL · [R/W] · 32 bits · Access: 8/16/32 >> · [INST0] GpuF0MMReg:0x6ecc, [INST1] GpuF0MMReg:0x7acc, [INST2] >> GpuF0MMReg:0x106cc, [INST3] GpuF0MMReg:0x112cc, [INST4] >> GpuF0MMReg:0x11ecc, [INST5] GpuF0MMReg:0x12acc >> DESCRIPTION: move start_line signal earlier by 1 line in CRTC >> Field Name Bits Default Description >> CRTC_PROGRESSIVE_START_LINE_EARLY 0 0x0 move start_line >> signal by 1 line eariler in progressive mode >> >> CRTC_INTERLACE_START_LINE_EARLY 8 0x1 move start_line >> signal by 1 line earlier in interlaced timing mode >> >> CRTC_ADVANCED_START_LINE_POSITION 19:16 0x4 Advanced >> Start Line position respect to not VBLANK. Default of 4 means the >> Advanced Start Line is 4 lines before the first non VBLANK line >> >> The following info I dug up internally may be useful: >> >> The position of the CRTC output timing generator when the “start of >> frame” indicator occurs depends on several conditions. These >> conditions are whether the timing generator is outputting timing for a >> progressive or interlaced display mode, whether the scaler is enabled, >> and if so, whether the register to select an earlier than normal >> “start of frame” indicator is set. The “start of frame” indicator >> typically occurs 2 lines before the vertical blank end position >> (DxCRTC_V_BLANK_END) is reached >> >> When interlaced output display modes are enabled >> (DxCRTC_INTERLACE_ENABLE = 1) and the CRTC timing generator is enabled >> (DxCRTC_MASTER_EN = 1), the timing generator’s vertical counter counts >> by 2 for both the even fields and odd fields of each frame. For both >> the even fields and the odd fields of interlaced output modes, the >> “start of frame” indicator occurs 2 lines before the end of the >> vertical blank occurs (DxCRTC_V_BLANK_END – 4 or DxCRTC_V_BLANK_END – >> 3 depending on the field since the vertical counter counts by 2 in >> interlaced), since the vertical counter counts by 2 in this mode). >> There is one exception to the previous statement. If scaling is >> enabled (DxSCL_SCALE_EN = 1) and the early start line is enabled >> (DxCRTC_INTERLACE_START_LINE_EARLY = 1) in interlaced output mode then >> the “start of frame” indicator happens 3 lines before the end of the >> vertical blank (DxCRTC_V_BLANK_END – 6 or DxCRTC_V_BLANK_END – 5 >> depending on the field since the vertical counter counts by 2 in >> interlaced). >> >> When progressive output display modes are enabled >> (DxCRTC_INTERLACE_ENABLE = 0) and the CRTC timing generator is enabled >> (DxCRTC_MASTER_EN = 1), the “start of frame” indicator occurs 2 lines >> before the end of the vertical blank occurs (DxCRTC_V_BLANK_END - 2). >> Similar to interlaced output mode, there is one exception to the >> previous statement. If scaling is enabled (DxSCL_SCALE_EN = 1) and >> the early start line is enabled (DxCRTC_PROGRESSIVE_START_LINE_EARLY = >> 1) in progressive output mode then the “start of frame” indicator >> happens 3 lines before the end of the vertical blank >> (DxCRTC_V_BLANK_END – 3) >> >> I hope this helps, >> >> Alex >> >>> >>> >>> If we can fix this and get it into rc2 or rc3 then we could avoid a bad >>> regression and with a bit of luck at the same time improve by being >>> able to >>> set dev->vblank_disable_immediate = true then and allow vblank irqs >>> to get >>> turned off more aggressively for a bit of extra power saving. >>> >>> thanks, >>> -mario
On Mon, Nov 23, 2015 at 06:58:34PM +0100, Mario Kleiner wrote: > > > On 11/23/2015 04:51 PM, Ville Syrjälä wrote: > > On Mon, Nov 23, 2015 at 04:23:21PM +0100, Mario Kleiner wrote: > >> On 11/20/2015 04:34 PM, Ville Syrjälä wrote: > >>> On Fri, Nov 20, 2015 at 04:24:50PM +0100, Mario Kleiner wrote: > >> > >> ... > >> Ok, but why would that be a bad thing? I think we want it to think it is > >> in the previous frame if it is called outside the vblank irq context. > >> The only reason we fudge it to the next frames vblank if i vblank irq is > >> because we know the vblank irq handler we are executing atm. was meant > >> to execute within the upcoming vblank for the next frame, so we fudge > >> the scanout positions and thereby timestamp to correspond to that new > >> frame. But if something called outside irq context it should get a > >> scanout position/timestamp that corresponds to "reality". > > > > It would be a bad thing since it would cause the timestamp to jump > > backwards, and that would also cause the frame count guesstimate to go > > backwards. > > > > But only if we don't use the dev->driver->get_vblank_counter() method, > which we try to use on AMD. Well, if you do it that way then you have the problem of the hw counter seeming to jump forward by one after crossing the start of vblank (when compared to the value you sampled when you processed the early vblank interrupt). I guess one silly idea would be to defer the vblank interrupt processing to a timer, and just schedule it a bit into the future from the actual interrupt handler. > Also dev->vblank_time_lock should protect us > from concurrent execution of the vblank counting/timestamping in irq > context and non-irq context? At least that was one of the purposes of > that lock in the past? The scenarion I outlined doesn't require anything to happen concurrently. > > -mario > > >> > >> It would maybe be a problem to get different answers for a query at the > >> same scanout position if we used .get_scanout_position() for something > >> else than for calculating vblank timestamps, but we don't do that atm. > >> > >> Maybe i'm overlooking something here related to the somewhat rewritten > >> code from the last year or so? But in the original design this would be > >> exactly what i intended? > >> > >> ... > >> > >>>> So it's good enough for typical desktop > >>>> applications/compositors/games/media players, and a nice improvement > >>>> over the previous state, but not quite sufficient for applications that > >>>> need long time consistent vblank counts for long waits of multiple > >>>> seconds or even minutes. So it's still good to have hw counters if we > >>>> can get some that are reliable enough. > >>> > >>> Ah, I didn't realize you care about small errors in the counter for > >>> such long periods of vblank off. > >>> > >> > >> Actually, you are right, i was stupid - not enough sleep last friday. I > >> do care about such small errors, but the long vblank off periods don't > >> matter at all the way my software works. I query the current count and > >> timestamp (glXGetSyncValuesOML), calculate a target count based on those > >> and then schedule a swap for that target count via glXSwapBuffersMscOML. > >> That swapbuffers call will keep vblank irqs on until the kms pageflip is > >> queued. So i only care about vblank counts/timestamps being consistent > >> for short amounts of time, typically 1 video refresh from vblank query > >> to queueing a vblank event, and then from reception of that > >> event/queuing the pageflip to pageflip completion event. So even if the > >> system would be heavily loaded and my code would have big preemption > >> delays i think counts that are consistent over a few seconds would be > >> enough to keep things working. Otherwise it wouldn't work now either > >> with a vblank off after 5 seconds and nouveau not having vblank hw counters. > >> > >> -mario > >> > >> > >>>> > >>>> -mario > >>>> > >>>> > >>>>>> > >>>>>> -mario > >>>>>> > >>>>>>>> > >>>>>>>> It almost sort of works on the rs600 code path, but i need a bit of info > >>>>>>>> from you: > >>>>>>>> > >>>>>>>> 1. There's this register from the old specs for m76.pdf, which is not > >>>>>>>> part of the current register defines for radeon-kms: > >>>>>>>> > >>>>>>>> "D1CRTC_STATUS_VF_COUNT - RW - 32 bits - [GpuF0MMReg:0x60A8]" > >>>>>>>> > >>>>>>>> It contains the lower 16 bits of framecounter and the 13 bits of > >>>>>>>> vertical scanout position. It seems to give the same readings as the 24 > >>>>>>>> bit R_0060A4_D1CRTC_STATUS_FRAME_COUNT we use for the hw counter. This > >>>>>>>> would come handy. > >>>>>>>> > >>>>>>>> Does Evergreen and later have a same/similar register and where is it? > >>>>>>>> > >>>>>>>> 2. The hw framecounter seems to increment when the vertical scanout > >>>>>>>> position wraps back from (VTOTAL - 1) to 0, at least on the one DCE-3 > >>>>>>>> gpu i tested so far. Is this so on all asics? And is the hw counter > >>>>>>>> increment happening exactly at the moment that vertical scanout position > >>>>>>>> jumps back to zero, ie. both events are driven by the same signal? Or is > >>>>>>>> the framecounter increment just happening somewhere inside either > >>>>>>>> scanline VTOTAL-1 or scanline 0? > >>>>>>>> > >>>>>>>> > >>>>>>>> If we can fix this and get it into rc2 or rc3 then we could avoid a bad > >>>>>>>> regression and with a bit of luck at the same time improve by being able > >>>>>>>> to set dev->vblank_disable_immediate = true then and allow vblank irqs > >>>>>>>> to get turned off more aggressively for a bit of extra power saving. > >>>>>>>> > >>>>>>>> thanks, > >>>>>>>> -mario > >>>>>>> > >>>>>>>> -- a/drivers/gpu/drm/drm_irq.c > >>>>>>>> +++ b/drivers/gpu/drm/drm_irq.c > >>>>>>>> @@ -172,9 +172,11 @@ static void drm_update_vblank_count(struct drm_device *dev, unsigned int pipe, > >>>>>>>> unsigned long flags) > >>>>>>>> { > >>>>>>>> struct drm_vblank_crtc *vblank = &dev->vblank[pipe]; > >>>>>>>> - u32 cur_vblank, diff; > >>>>>>>> + u32 cur_vblank, diff = 0; > >>>>>>>> bool rc; > >>>>>>>> struct timeval t_vblank; > >>>>>>>> + const struct timeval *t_old; > >>>>>>>> + u64 diff_ns; > >>>>>>>> int count = DRM_TIMESTAMP_MAXRETRIES; > >>>>>>>> int framedur_ns = vblank->framedur_ns; > >>>>>>>> > >>>>>>>> @@ -195,13 +197,15 @@ static void drm_update_vblank_count(struct drm_device *dev, unsigned int pipe, > >>>>>>>> rc = drm_get_last_vbltimestamp(dev, pipe, &t_vblank, flags); > >>>>>>>> } while (cur_vblank != dev->driver->get_vblank_counter(dev, pipe) && --count > 0); > >>>>>>>> > >>>>>>>> - if (dev->max_vblank_count != 0) { > >>>>>>>> - /* trust the hw counter when it's around */ > >>>>>>>> - diff = (cur_vblank - vblank->last) & dev->max_vblank_count; > >>>>>>>> - } else if (rc && framedur_ns) { > >>>>>>>> - const struct timeval *t_old; > >>>>>>>> - u64 diff_ns; > >>>>>>>> - > >>>>>>>> + /* > >>>>>>>> + * Always use vblank timestamping based method if supported to reject > >>>>>>>> + * redundant vblank irqs. E.g., AMD hardware needs this to not screw up > >>>>>>>> + * due to some irq handling quirk. > >>>>>>> > >>>>>>> Hmm. I'm thinking it would be better to simply not claim that there's a hw > >>>>>>> counter if it isn't reliable. > >>>>>>> > >>>>>>>> + * > >>>>>>>> + * This also sets the diff value for use as fallback below in case the > >>>>>>>> + * hw does not support a suitable hw vblank counter. > >>>>>>>> + */ > >>>>>>>> + if (rc && framedur_ns) { > >>>>>>>> t_old = &vblanktimestamp(dev, pipe, vblank->count); > >>>>>>>> diff_ns = timeval_to_ns(&t_vblank) - timeval_to_ns(t_old); > >>>>>>>> > >>>>>>>> @@ -212,11 +216,28 @@ static void drm_update_vblank_count(struct drm_device *dev, unsigned int pipe, > >>>>>>>> */ > >>>>>>>> diff = DIV_ROUND_CLOSEST_ULL(diff_ns, framedur_ns); > >>>>>>>> > >>>>>>>> - if (diff == 0 && flags & DRM_CALLED_FROM_VBLIRQ) > >>>>>>>> - DRM_DEBUG_VBL("crtc %u: Redundant vblirq ignored." > >>>>>>>> - " diff_ns = %lld, framedur_ns = %d)\n", > >>>>>>>> - pipe, (long long) diff_ns, framedur_ns); > >>>>>>>> - } else { > >>>>>>>> + if (diff == 0 && flags & DRM_CALLED_FROM_VBLIRQ) { > >>>>>>>> + DRM_DEBUG_VBL("crtc %u: Redundant vblirq ignored." > >>>>>>>> + " diff_ns = %lld, framedur_ns = %d)\n", > >>>>>>>> + pipe, (long long) diff_ns, framedur_ns); > >>>>>>>> + > >>>>>>>> + /* Treat this redundant invocation as no-op. */ > >>>>>>>> + WARN_ON_ONCE(cur_vblank != vblank->last); > >>>>>>>> + return; > >>>>>>>> + } > >>>>>>>> + } > >>>>>>>> + > >>>>>>>> + /* > >>>>>>>> + * hw counters, if marked as supported via max_vblank_count != 0, > >>>>>>>> + * *must* increment at leading edge of vblank and in sync with > >>>>>>>> + * the firing of the hw vblank irq, otherwise we have a race here > >>>>>>>> + * between gpu and us, where small variations in our execution > >>>>>>>> + * timing can cause off-by-one miscounting of vblanks. > >>>>>>>> + */ > >>>>>>>> + if (dev->max_vblank_count != 0) { > >>>>>>>> + /* trust the hw counter when it's around */ > >>>>>>>> + diff = (cur_vblank - vblank->last) & dev->max_vblank_count; > >>>>>>>> + } else if (!(rc && framedur_ns)) { > >>>>>>>> /* some kind of default for drivers w/o accurate vbl timestamping */ > >>>>>>>> diff = (flags & DRM_CALLED_FROM_VBLIRQ) != 0; > >>>>>>>> } > >>>>>>> > >>>>>>> > >>>>> > >>> > >
On 11/23/2015 09:04 PM, Harry Wentland wrote: > Hi Mario, > > when we've had issues with this on amdgpu Christian fixed it by enabling > page flip irq all the time, rather than turning it on when usermode > request a flip and turning it back off after we handled it. I believe > that fix exists on radeon already. Michel should have more info on that. > > See other comments inline. > > Thanks, > Harry > > > On 2015-11-23 11:02 AM, Mario Kleiner wrote: >> On 11/20/2015 04:42 AM, Alex Deucher wrote: >>> On Thu, Nov 19, 2015 at 12:46 PM, Mario Kleiner >>> <mario.kleiner.de@gmail.com> wrote: >>>> Hi Alex and Michel and Ville, >>>> >>>> it's "fix vblank stuff" time again ;-) >>> >>> Adding Harry from our display team. He might be able to fill in the >>> blanks of on some of this better than I can. It might also be worth >>> checking to see how our DAL (our new display code which is being >>> developed directly by our display team) code handles this. It could >>> be that we are just missing register settings: >> >> Thanks Alex! And hello Harry :) >> >>> http://cgit.freedesktop.org/~agd5f/linux/log/?h=DAL-wip >> >> I'll have a look at this. >> >>> Additionally we've published full registers headers for the display >>> block: >>> http://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/tree/drivers/gpu/drm/amd/include/asic_reg/dce >>> >>> The DCE8 stuff should generally apply back to DCE4. If you have >>> questions about registers older asics not covered in the hw docs, let >>> me know. Note the new headers are dword aligned rather than byte >>> aligned. >>> >> >> I've tested now with two different progressive modes on DCE3 and one >> progressive mode on DCE4, the only cards i have atm. So far it seems >> that the framecounter indeed increments when the vpos from the scanout >> position query jumps back to zero. Attached for reference is my >> current patch to radeon-kms. This one seems to work reliably so far, >> also if i enable the immediate vblank irq disable which we so far only >> used on intel-kms. >> >> But according to this patch the framecounter increment happens >> somewhere in the middle of vblank. >> >> Essentially from the vpos extracted from >> EVERGREEN_CRTC_STATUS_POSITION which defines start of vblank ("Start" >> part of EVERGREEN_CRTC_V_BLANK_START_END) until maximum ie. VTOTAL-1 >> the framecounter stays on the count of the old previous scanout cycle. >> Then when vpos wraps to zero the framecounter increments by 1. And >> then we have another couple of dozen lines inside vblank until >> reaching the "End" part of EVERGREEN_CRTC_V_BLANK_START_END and >> entering active scanout for the new frame. >> >> So the position of observed framecounter increment seems to be not >> close to the end of vblank ("start of frame" indicator?), but a couple >> of scanlines after start of vblank. >> >> E.g., for a 2560x1440 video mode at 60 Hz, start of vblank is 1478, >> vtotal is 1481, end of vblank is 38. So i enter the vblank and see the >> old framecounter for vpos = 1478, 1479, 1480, then it wraps to 0 and >> the framecounter increments by 1, then 38 scanlines later the vblank >> ends. >> >> So i seem to have something that seems to work in practice and this >> "increment framecounter if vpos wraps back to zero" behavior makes >> some sense. It just doesn't conform to what those descriptions for >> start_line and "start of frame" indicator describe? >> > This is correct. Our HW doesn't really have a vblank counter but a frame > counter. The framecounter increments at the start of vsync, which is > when we wrap to zero which doesn't coincide with the start of vblank. > > What we're trying to do with get_vblank_counter isn't the same as what > framecount gives us, but we could probably do something like: > > if (get_scanout_pos > vblank_start) > return frame_count + 1; > else > return frame_count; > Great! That's what my current patch does and it seems to work well with different video modes on both DCE3 and DCE4. So theory agrees with practice again :) - thanks for clarifying this. So the other problem we have since forever is the vblank irq firing before the start of vblank. We are typically 1-2 scanlines before vblank_start when we sample the scanout position and framecounter. This needs a slightly ugly workaround. Is there a way, maybe some config register, to fire the irq at leading edge of vblank instead of a bit too early? thanks, -mario > Harry > >> I'll test with a few more video modes. >> >> thanks, >> -mario >> >> >>>> >>>> Ville's changes to the DRM's drm_handle_vblank() / >>>> drm_update_vblank_count() >>>> code in Linux 4.4 not only made that code more elegant, but also >>>> removed the >>>> robustness against the vblank irq quirks in AMD hw and similar >>>> hardware. So >>>> now i get tons of off-by-one errors and >>>> >>>> "[ 432.345] (WW) RADEON(1): radeon_dri2_flip_event_handler: Pageflip >>>> completion event has impossible msc 24803 < target_msc 24804" XOrg >>>> messages >>>> from that kernel. >>>> >>>> One of the reasons for trouble is that AMD hw quirk where the hw >>>> fires an >>>> extra vblank irq shortly after vblank irq's get enabled, not >>>> synchronized to >>>> vblank, but typically in the middle of active scanout, so we get a >>>> redundant >>>> call to drm_handle_vblank in the middle of scanout. >>>> >>>> To fix that i have a minor patch to make drm_update_vblank_count() >>>> again >>>> robust against such redundant calls, which i will send out later to the >>>> mailing list. Diff attached for reference. >>>> >>>> The second quirk of AMD hw is that the vblank interrupt fires a few >>>> scanlines before start of vblank, so drm_handle_vblank -> >>>> drm_update_vblank_count() -> dev->driver->get_vblank_counter() gets >>>> called >>>> before the start of the vblank for which the new vblank count should be >>>> queried. >>>> >>>> The third problem is that the DRM vblank handling always had the >>>> assumption >>>> that hardware vblank counters would essentially increment at leading >>>> edge of >>>> vblank - basically in sync with the firing of the vblank irq, so >>>> that a hw >>>> counter readout from within the vblank irq handler would always >>>> deliver the >>>> new incremented value. If this assumption is violated then the >>>> counting by >>>> use of the hw counter gets unreliable, because depending on random >>>> small >>>> delays in irq handling the code may end up sampling the hw counter >>>> pre- or >>>> post-increment, leading to inconsistent updating and funky bugs. It >>>> just so >>>> happens that AMD hardware doesn't increment the hw counter at >>>> leading edge >>>> of vblank, so stuff falls apart. >>>> >>>> So to fix those two problems i'm tinkering with cooking the hw vblank >>>> counter value returned by radeon_get_vblank_counter_kms() to make it >>>> appear >>>> as if the counter incremented at leading edge of vblank in sync with >>>> vblank >>>> irq. >>>> >>>> It almost sort of works on the rs600 code path, but i need a bit of >>>> info >>>> from you: >>>> >>>> 1. There's this register from the old specs for m76.pdf, which is >>>> not part >>>> of the current register defines for radeon-kms: >>>> >>>> "D1CRTC_STATUS_VF_COUNT - RW - 32 bits - [GpuF0MMReg:0x60A8]" >>>> >>>> It contains the lower 16 bits of framecounter and the 13 bits of >>>> vertical >>>> scanout position. It seems to give the same readings as the 24 bit >>>> R_0060A4_D1CRTC_STATUS_FRAME_COUNT we use for the hw counter. This >>>> would >>>> come handy. >>>> >>>> Does Evergreen and later have a same/similar register and where is it? >>> >>> Yes. CRTC_STATUS_VF_COUNT >>> >>> CRTC:CRTC_STATUS_VF_COUNT · [R/W] · 32 bits · Access: 8/16/32 · >>> [INST0] GpuF0MMReg:0x6e9c, [INST1] GpuF0MMReg:0x7a9c, [INST2] >>> GpuF0MMReg:0x1069c, [INST3] GpuF0MMReg:0x1129c, [INST4] >>> GpuF0MMReg:0x11e9c, [INST5] GpuF0MMReg:0x12a9c >>> DESCRIPTION: Current composite vertical and frame count for CRTC >>> Field Name Bits Default Description >>> CRTC_VF_COUNT >>> (Access: R) 28:0 0x0 Reports current vertical and frame count >>> >>>> >>>> 2. The hw framecounter seems to increment when the vertical scanout >>>> position >>>> wraps back from (VTOTAL - 1) to 0, at least on the one DCE-3 gpu i >>>> tested so >>>> far. Is this so on all asics? And is the hw counter increment happening >>>> exactly at the moment that vertical scanout position jumps back to >>>> zero, ie. >>>> both events are driven by the same signal? Or is the framecounter >>>> increment >>>> just happening somewhere inside either scanline VTOTAL-1 or scanline 0? >>> >>> Unless Harry knows, we can ask the hw team, but I doubt they would >>> have changed it. I think it's tied to the start line which is >>> controlled by this register: >>> CRTC:CRTC_START_LINE_CONTROL · [R/W] · 32 bits · Access: 8/16/32 >>> · [INST0] GpuF0MMReg:0x6ecc, [INST1] GpuF0MMReg:0x7acc, [INST2] >>> GpuF0MMReg:0x106cc, [INST3] GpuF0MMReg:0x112cc, [INST4] >>> GpuF0MMReg:0x11ecc, [INST5] GpuF0MMReg:0x12acc >>> DESCRIPTION: move start_line signal earlier by 1 line in CRTC >>> Field Name Bits Default Description >>> CRTC_PROGRESSIVE_START_LINE_EARLY 0 0x0 move start_line >>> signal by 1 line eariler in progressive mode >>> >>> CRTC_INTERLACE_START_LINE_EARLY 8 0x1 move start_line >>> signal by 1 line earlier in interlaced timing mode >>> >>> CRTC_ADVANCED_START_LINE_POSITION 19:16 0x4 Advanced >>> Start Line position respect to not VBLANK. Default of 4 means the >>> Advanced Start Line is 4 lines before the first non VBLANK line >>> >>> The following info I dug up internally may be useful: >>> >>> The position of the CRTC output timing generator when the “start of >>> frame” indicator occurs depends on several conditions. These >>> conditions are whether the timing generator is outputting timing for a >>> progressive or interlaced display mode, whether the scaler is enabled, >>> and if so, whether the register to select an earlier than normal >>> “start of frame” indicator is set. The “start of frame” indicator >>> typically occurs 2 lines before the vertical blank end position >>> (DxCRTC_V_BLANK_END) is reached >>> >>> When interlaced output display modes are enabled >>> (DxCRTC_INTERLACE_ENABLE = 1) and the CRTC timing generator is enabled >>> (DxCRTC_MASTER_EN = 1), the timing generator’s vertical counter counts >>> by 2 for both the even fields and odd fields of each frame. For both >>> the even fields and the odd fields of interlaced output modes, the >>> “start of frame” indicator occurs 2 lines before the end of the >>> vertical blank occurs (DxCRTC_V_BLANK_END – 4 or DxCRTC_V_BLANK_END – >>> 3 depending on the field since the vertical counter counts by 2 in >>> interlaced), since the vertical counter counts by 2 in this mode). >>> There is one exception to the previous statement. If scaling is >>> enabled (DxSCL_SCALE_EN = 1) and the early start line is enabled >>> (DxCRTC_INTERLACE_START_LINE_EARLY = 1) in interlaced output mode then >>> the “start of frame” indicator happens 3 lines before the end of the >>> vertical blank (DxCRTC_V_BLANK_END – 6 or DxCRTC_V_BLANK_END – 5 >>> depending on the field since the vertical counter counts by 2 in >>> interlaced). >>> >>> When progressive output display modes are enabled >>> (DxCRTC_INTERLACE_ENABLE = 0) and the CRTC timing generator is enabled >>> (DxCRTC_MASTER_EN = 1), the “start of frame” indicator occurs 2 lines >>> before the end of the vertical blank occurs (DxCRTC_V_BLANK_END - 2). >>> Similar to interlaced output mode, there is one exception to the >>> previous statement. If scaling is enabled (DxSCL_SCALE_EN = 1) and >>> the early start line is enabled (DxCRTC_PROGRESSIVE_START_LINE_EARLY = >>> 1) in progressive output mode then the “start of frame” indicator >>> happens 3 lines before the end of the vertical blank >>> (DxCRTC_V_BLANK_END – 3) >>> >>> I hope this helps, >>> >>> Alex >>> >>>> >>>> >>>> If we can fix this and get it into rc2 or rc3 then we could avoid a bad >>>> regression and with a bit of luck at the same time improve by being >>>> able to >>>> set dev->vblank_disable_immediate = true then and allow vblank irqs >>>> to get >>>> turned off more aggressively for a bit of extra power saving. >>>> >>>> thanks, >>>> -mario >
On 11/23/2015 09:24 PM, Ville Syrjälä wrote: > On Mon, Nov 23, 2015 at 06:58:34PM +0100, Mario Kleiner wrote: >> >> >> On 11/23/2015 04:51 PM, Ville Syrjälä wrote: >>> On Mon, Nov 23, 2015 at 04:23:21PM +0100, Mario Kleiner wrote: >>>> On 11/20/2015 04:34 PM, Ville Syrjälä wrote: >>>>> On Fri, Nov 20, 2015 at 04:24:50PM +0100, Mario Kleiner wrote: >>>> >>>> ... >>>> Ok, but why would that be a bad thing? I think we want it to think it is >>>> in the previous frame if it is called outside the vblank irq context. >>>> The only reason we fudge it to the next frames vblank if i vblank irq is >>>> because we know the vblank irq handler we are executing atm. was meant >>>> to execute within the upcoming vblank for the next frame, so we fudge >>>> the scanout positions and thereby timestamp to correspond to that new >>>> frame. But if something called outside irq context it should get a >>>> scanout position/timestamp that corresponds to "reality". >>> >>> It would be a bad thing since it would cause the timestamp to jump >>> backwards, and that would also cause the frame count guesstimate to go >>> backwards. >>> >> >> But only if we don't use the dev->driver->get_vblank_counter() method, >> which we try to use on AMD. > > Well, if you do it that way then you have the problem of the hw counter > seeming to jump forward by one after crossing the start of vblank (when > compared to the value you sampled when you processed the early vblank > interrupt). > Ok, finally i see the bad scenario that wouldn't get prevented by our current locking with the new vblank counting in the core. The vblank enable path is safe due to locking and discounting of redundant timestamps etc. But the disable path could go wrong: 1. Vblank irq fires, drm_handle_vblank() -> drm_update_vblank_count(), updates timestamps and counts "as if" in vblank -> incremented vblank count and timestamp now set in the future. 2. After vblank irq finishes, but just before leading edge of vblank, vblank_disable_and_save() executes, doesn't get bumped timestamp or count because before vblank and not in vblank irq. Now drm_update_vblank_count() would process a "new" timestamp and count from the past and we'd have time and counts going backwards, and bad things would happen. I haven't observed such a thing happening during testing so far, probably because the time window in which it could happen is tiny, but given how awfully bad it would be, it needs to be prevented. I had a look at the description of the Vblank irq in the "M76 Register Reference Guide" for older asics and the description suggests that the vblank irq fires when the crtc's line buffer is finished reading pixel data from the scanout buffer in memory for a frame, ie., when the line buffer read "enters" vblank. That would explain why the irq happens a few scanlines before actual vblank, because line buffer refills must obviously happen before the crtc can send pixel data from the line buffer to the encoders, so it would lead a bit in time. That also means we can't delay the vblank irq to actually happen at start of vblank and have to deal with the early vblank irq. > I guess one silly idea would be to defer the vblank interrupt processing > to a timer, and just schedule it a bit into the future from the actual > interrupt handler. > Timer would be bad because then we get problems with the pageflip completion irq sometimes being processed before the vblank irq, and because we want to be fast in vblank irq handling, delivering vblank events etc. I wouldn't trust a timer to be reliable enough for such short waits. Busy waiting wouldn't be great either in irq. So what about this relatively simple one? 1. In radeon_get_crtc_scanoutpos() we artifically define the vblank_start line to be, e.g, 5 scanlines before the true start of vblank, so for the purpose of vblank counter queries and timestamping our "vblank" would start a bit earlier and the vblank irq would always execute in "vblank". Non-Irq invocations like vblank_disable_and_save() would also be treated to this early vblank start and so what the DRM core observes would always be consistent. 2. At least in radeon-kms we also use radeon_get_crtc_scanoutpos() internally for "dynpm" dynamic power management/reclocking, and to implement pageflip completion detection on asics older than DCE3 which don't have pageflip interrupts. For those cases we need to use the true start of vblank, so for this internal use we pass in some special flag into radeon_get_crtc_scanoutpos() to tell it to not shift the vblank start around. 3. I've added another check to my patch for drm_update_vblank_count() to catch timestamps going backwards and discounting such cases, for a bit of extra robustness against driver trouble. Any ideas why this would be a stupid idea? I'll try this now and just hope meanwhile nobody finds a reason why this would be bad. thanks, -mario >> Also dev->vblank_time_lock should protect us >> from concurrent execution of the vblank counting/timestamping in irq >> context and non-irq context? At least that was one of the purposes of >> that lock in the past? > > The scenarion I outlined doesn't require anything to happen > concurrently. > >> >> -mario >> >>>> >>>> It would maybe be a problem to get different answers for a query at the >>>> same scanout position if we used .get_scanout_position() for something >>>> else than for calculating vblank timestamps, but we don't do that atm. >>>> >>>> Maybe i'm overlooking something here related to the somewhat rewritten >>>> code from the last year or so? But in the original design this would be >>>> exactly what i intended? >>>> >>>> ... >>>> >>>>>> So it's good enough for typical desktop >>>>>> applications/compositors/games/media players, and a nice improvement >>>>>> over the previous state, but not quite sufficient for applications that >>>>>> need long time consistent vblank counts for long waits of multiple >>>>>> seconds or even minutes. So it's still good to have hw counters if we >>>>>> can get some that are reliable enough. >>>>> >>>>> Ah, I didn't realize you care about small errors in the counter for >>>>> such long periods of vblank off. >>>>> >>>> >>>> Actually, you are right, i was stupid - not enough sleep last friday. I >>>> do care about such small errors, but the long vblank off periods don't >>>> matter at all the way my software works. I query the current count and >>>> timestamp (glXGetSyncValuesOML), calculate a target count based on those >>>> and then schedule a swap for that target count via glXSwapBuffersMscOML. >>>> That swapbuffers call will keep vblank irqs on until the kms pageflip is >>>> queued. So i only care about vblank counts/timestamps being consistent >>>> for short amounts of time, typically 1 video refresh from vblank query >>>> to queueing a vblank event, and then from reception of that >>>> event/queuing the pageflip to pageflip completion event. So even if the >>>> system would be heavily loaded and my code would have big preemption >>>> delays i think counts that are consistent over a few seconds would be >>>> enough to keep things working. Otherwise it wouldn't work now either >>>> with a vblank off after 5 seconds and nouveau not having vblank hw counters. >>>> >>>> -mario >>>> >>>> >>>>>> >>>>>> -mario >>>>>> >>>>>> >>>>>>>> >>>>>>>> -mario >>>>>>>> >>>>>>>>>> >>>>>>>>>> It almost sort of works on the rs600 code path, but i need a bit of info >>>>>>>>>> from you: >>>>>>>>>> >>>>>>>>>> 1. There's this register from the old specs for m76.pdf, which is not >>>>>>>>>> part of the current register defines for radeon-kms: >>>>>>>>>> >>>>>>>>>> "D1CRTC_STATUS_VF_COUNT - RW - 32 bits - [GpuF0MMReg:0x60A8]" >>>>>>>>>> >>>>>>>>>> It contains the lower 16 bits of framecounter and the 13 bits of >>>>>>>>>> vertical scanout position. It seems to give the same readings as the 24 >>>>>>>>>> bit R_0060A4_D1CRTC_STATUS_FRAME_COUNT we use for the hw counter. This >>>>>>>>>> would come handy. >>>>>>>>>> >>>>>>>>>> Does Evergreen and later have a same/similar register and where is it? >>>>>>>>>> >>>>>>>>>> 2. The hw framecounter seems to increment when the vertical scanout >>>>>>>>>> position wraps back from (VTOTAL - 1) to 0, at least on the one DCE-3 >>>>>>>>>> gpu i tested so far. Is this so on all asics? And is the hw counter >>>>>>>>>> increment happening exactly at the moment that vertical scanout position >>>>>>>>>> jumps back to zero, ie. both events are driven by the same signal? Or is >>>>>>>>>> the framecounter increment just happening somewhere inside either >>>>>>>>>> scanline VTOTAL-1 or scanline 0? >>>>>>>>>> >>>>>>>>>> >>>>>>>>>> If we can fix this and get it into rc2 or rc3 then we could avoid a bad >>>>>>>>>> regression and with a bit of luck at the same time improve by being able >>>>>>>>>> to set dev->vblank_disable_immediate = true then and allow vblank irqs >>>>>>>>>> to get turned off more aggressively for a bit of extra power saving. >>>>>>>>>> >>>>>>>>>> thanks, >>>>>>>>>> -mario >>>>>>>>> >>>>>>>>>> -- a/drivers/gpu/drm/drm_irq.c >>>>>>>>>> +++ b/drivers/gpu/drm/drm_irq.c >>>>>>>>>> @@ -172,9 +172,11 @@ static void drm_update_vblank_count(struct drm_device *dev, unsigned int pipe, >>>>>>>>>> unsigned long flags) >>>>>>>>>> { >>>>>>>>>> struct drm_vblank_crtc *vblank = &dev->vblank[pipe]; >>>>>>>>>> - u32 cur_vblank, diff; >>>>>>>>>> + u32 cur_vblank, diff = 0; >>>>>>>>>> bool rc; >>>>>>>>>> struct timeval t_vblank; >>>>>>>>>> + const struct timeval *t_old; >>>>>>>>>> + u64 diff_ns; >>>>>>>>>> int count = DRM_TIMESTAMP_MAXRETRIES; >>>>>>>>>> int framedur_ns = vblank->framedur_ns; >>>>>>>>>> >>>>>>>>>> @@ -195,13 +197,15 @@ static void drm_update_vblank_count(struct drm_device *dev, unsigned int pipe, >>>>>>>>>> rc = drm_get_last_vbltimestamp(dev, pipe, &t_vblank, flags); >>>>>>>>>> } while (cur_vblank != dev->driver->get_vblank_counter(dev, pipe) && --count > 0); >>>>>>>>>> >>>>>>>>>> - if (dev->max_vblank_count != 0) { >>>>>>>>>> - /* trust the hw counter when it's around */ >>>>>>>>>> - diff = (cur_vblank - vblank->last) & dev->max_vblank_count; >>>>>>>>>> - } else if (rc && framedur_ns) { >>>>>>>>>> - const struct timeval *t_old; >>>>>>>>>> - u64 diff_ns; >>>>>>>>>> - >>>>>>>>>> + /* >>>>>>>>>> + * Always use vblank timestamping based method if supported to reject >>>>>>>>>> + * redundant vblank irqs. E.g., AMD hardware needs this to not screw up >>>>>>>>>> + * due to some irq handling quirk. >>>>>>>>> >>>>>>>>> Hmm. I'm thinking it would be better to simply not claim that there's a hw >>>>>>>>> counter if it isn't reliable. >>>>>>>>> >>>>>>>>>> + * >>>>>>>>>> + * This also sets the diff value for use as fallback below in case the >>>>>>>>>> + * hw does not support a suitable hw vblank counter. >>>>>>>>>> + */ >>>>>>>>>> + if (rc && framedur_ns) { >>>>>>>>>> t_old = &vblanktimestamp(dev, pipe, vblank->count); >>>>>>>>>> diff_ns = timeval_to_ns(&t_vblank) - timeval_to_ns(t_old); >>>>>>>>>> >>>>>>>>>> @@ -212,11 +216,28 @@ static void drm_update_vblank_count(struct drm_device *dev, unsigned int pipe, >>>>>>>>>> */ >>>>>>>>>> diff = DIV_ROUND_CLOSEST_ULL(diff_ns, framedur_ns); >>>>>>>>>> >>>>>>>>>> - if (diff == 0 && flags & DRM_CALLED_FROM_VBLIRQ) >>>>>>>>>> - DRM_DEBUG_VBL("crtc %u: Redundant vblirq ignored." >>>>>>>>>> - " diff_ns = %lld, framedur_ns = %d)\n", >>>>>>>>>> - pipe, (long long) diff_ns, framedur_ns); >>>>>>>>>> - } else { >>>>>>>>>> + if (diff == 0 && flags & DRM_CALLED_FROM_VBLIRQ) { >>>>>>>>>> + DRM_DEBUG_VBL("crtc %u: Redundant vblirq ignored." >>>>>>>>>> + " diff_ns = %lld, framedur_ns = %d)\n", >>>>>>>>>> + pipe, (long long) diff_ns, framedur_ns); >>>>>>>>>> + >>>>>>>>>> + /* Treat this redundant invocation as no-op. */ >>>>>>>>>> + WARN_ON_ONCE(cur_vblank != vblank->last); >>>>>>>>>> + return; >>>>>>>>>> + } >>>>>>>>>> + } >>>>>>>>>> + >>>>>>>>>> + /* >>>>>>>>>> + * hw counters, if marked as supported via max_vblank_count != 0, >>>>>>>>>> + * *must* increment at leading edge of vblank and in sync with >>>>>>>>>> + * the firing of the hw vblank irq, otherwise we have a race here >>>>>>>>>> + * between gpu and us, where small variations in our execution >>>>>>>>>> + * timing can cause off-by-one miscounting of vblanks. >>>>>>>>>> + */ >>>>>>>>>> + if (dev->max_vblank_count != 0) { >>>>>>>>>> + /* trust the hw counter when it's around */ >>>>>>>>>> + diff = (cur_vblank - vblank->last) & dev->max_vblank_count; >>>>>>>>>> + } else if (!(rc && framedur_ns)) { >>>>>>>>>> /* some kind of default for drivers w/o accurate vbl timestamping */ >>>>>>>>>> diff = (flags & DRM_CALLED_FROM_VBLIRQ) != 0; >>>>>>>>>> } >>>>>>>>> >>>>>>>>> >>>>>>> >>>>> >>> >
On Wed, Nov 25, 2015 at 06:24:13PM +0100, Mario Kleiner wrote: > On 11/23/2015 09:24 PM, Ville Syrjälä wrote: > > On Mon, Nov 23, 2015 at 06:58:34PM +0100, Mario Kleiner wrote: > >> > >> > >> On 11/23/2015 04:51 PM, Ville Syrjälä wrote: > >>> On Mon, Nov 23, 2015 at 04:23:21PM +0100, Mario Kleiner wrote: > >>>> On 11/20/2015 04:34 PM, Ville Syrjälä wrote: > >>>>> On Fri, Nov 20, 2015 at 04:24:50PM +0100, Mario Kleiner wrote: > >>>> > >>>> ... > >>>> Ok, but why would that be a bad thing? I think we want it to think it is > >>>> in the previous frame if it is called outside the vblank irq context. > >>>> The only reason we fudge it to the next frames vblank if i vblank irq is > >>>> because we know the vblank irq handler we are executing atm. was meant > >>>> to execute within the upcoming vblank for the next frame, so we fudge > >>>> the scanout positions and thereby timestamp to correspond to that new > >>>> frame. But if something called outside irq context it should get a > >>>> scanout position/timestamp that corresponds to "reality". > >>> > >>> It would be a bad thing since it would cause the timestamp to jump > >>> backwards, and that would also cause the frame count guesstimate to go > >>> backwards. > >>> > >> > >> But only if we don't use the dev->driver->get_vblank_counter() method, > >> which we try to use on AMD. > > > > Well, if you do it that way then you have the problem of the hw counter > > seeming to jump forward by one after crossing the start of vblank (when > > compared to the value you sampled when you processed the early vblank > > interrupt). > > > > Ok, finally i see the bad scenario that wouldn't get prevented by our > current locking with the new vblank counting in the core. The vblank > enable path is safe due to locking and discounting of redundant > timestamps etc. But the disable path could go wrong: > > 1. Vblank irq fires, drm_handle_vblank() -> drm_update_vblank_count(), > updates timestamps and counts "as if" in vblank -> incremented vblank > count and timestamp now set in the future. > > 2. After vblank irq finishes, but just before leading edge of vblank, > vblank_disable_and_save() executes, doesn't get bumped timestamp or > count because before vblank and not in vblank irq. Now > drm_update_vblank_count() would process a > "new" timestamp and count from the past and we'd have time and counts > going backwards, and bad things would happen. > > I haven't observed such a thing happening during testing so far, > probably because the time window in which it could happen is tiny, but > given how awfully bad it would be, it needs to be prevented. > > I had a look at the description of the Vblank irq in the "M76 Register > Reference Guide" for older asics and the description suggests that the > vblank irq fires when the crtc's line buffer is finished reading pixel > data from the scanout buffer in memory for a frame, ie., when the line > buffer read "enters" vblank. That would explain why the irq happens a > few scanlines before actual vblank, because line buffer refills must > obviously happen before the crtc can send pixel data from the line > buffer to the encoders, so it would lead a bit in time. That also means > we can't delay the vblank irq to actually happen at start of vblank and > have to deal with the early vblank irq. > > > I guess one silly idea would be to defer the vblank interrupt processing > > to a timer, and just schedule it a bit into the future from the actual > > interrupt handler. > > > > Timer would be bad because then we get problems with the pageflip > completion irq sometimes being processed before the vblank irq, You you'd need to move page flip completion to happen from the vblank timer too I suppose. > and > because we want to be fast in vblank irq handling, delivering vblank > events etc. I wouldn't trust a timer to be reliable enough for such > short waits. hrtimers should be accurate. But maybe more expensive than the timer wheel. > Busy waiting wouldn't be great either in irq. > > So what about this relatively simple one? > > 1. In radeon_get_crtc_scanoutpos() we artifically define the > vblank_start line to be, e.g, 5 scanlines before the true start of > vblank, so for the purpose of vblank counter queries and timestamping > our "vblank" would start a bit earlier and the vblank irq would always > execute in "vblank". Non-Irq invocations like vblank_disable_and_save() > would also be treated to this early vblank start and so what the DRM > core observes would always be consistent. > > 2. At least in radeon-kms we also use radeon_get_crtc_scanoutpos() > internally for "dynpm" dynamic power management/reclocking, and to > implement pageflip completion detection on asics older than DCE3 which > don't have pageflip interrupts. For those cases we need to use the true > start of vblank, so for this internal use we pass in some special flag > into radeon_get_crtc_scanoutpos() to tell it to not shift the vblank > start around. > > 3. I've added another check to my patch for drm_update_vblank_count() to > catch timestamps going backwards and discounting such cases, for a bit > of extra robustness against driver trouble. > > Any ideas why this would be a stupid idea? I'll try this now and just > hope meanwhile nobody finds a reason why this would be bad. What I don't like is leaking any of this into the core code. All the hacks should live in the hw specific code. We've managed to keep all the i915 hacks hidden that way. So if you would: - fudge your get_scanout_position as you suggested - _not_ expose the hw counter to the vblank core since it disagrees with the scanout position - keep the internal get_scanout_position variant/flag purely internal then I think I might like it. That way working hardware doesn't have to be exposed to these hacks, and there's possible a bit less danger that it all gets broken again next time the core needs some work. One problem with that approach could be that the vblank event and page flip event would be delievered at different times if you keep using the page flip interrupt, but I'm not sure that would be a problem. At least they should both have the same timestamp and counter value.
On Wed, Nov 25, 2015 at 06:24:13PM +0100, Mario Kleiner wrote: > On 11/23/2015 09:24 PM, Ville Syrjälä wrote: > > On Mon, Nov 23, 2015 at 06:58:34PM +0100, Mario Kleiner wrote: > >> > >> > >> On 11/23/2015 04:51 PM, Ville Syrjälä wrote: > >>> On Mon, Nov 23, 2015 at 04:23:21PM +0100, Mario Kleiner wrote: > >>>> On 11/20/2015 04:34 PM, Ville Syrjälä wrote: > >>>>> On Fri, Nov 20, 2015 at 04:24:50PM +0100, Mario Kleiner wrote: > >>>> > >>>> ... > >>>> Ok, but why would that be a bad thing? I think we want it to think it is > >>>> in the previous frame if it is called outside the vblank irq context. > >>>> The only reason we fudge it to the next frames vblank if i vblank irq is > >>>> because we know the vblank irq handler we are executing atm. was meant > >>>> to execute within the upcoming vblank for the next frame, so we fudge > >>>> the scanout positions and thereby timestamp to correspond to that new > >>>> frame. But if something called outside irq context it should get a > >>>> scanout position/timestamp that corresponds to "reality". > >>> > >>> It would be a bad thing since it would cause the timestamp to jump > >>> backwards, and that would also cause the frame count guesstimate to go > >>> backwards. > >>> > >> > >> But only if we don't use the dev->driver->get_vblank_counter() method, > >> which we try to use on AMD. > > > > Well, if you do it that way then you have the problem of the hw counter > > seeming to jump forward by one after crossing the start of vblank (when > > compared to the value you sampled when you processed the early vblank > > interrupt). > > > > Ok, finally i see the bad scenario that wouldn't get prevented by our > current locking with the new vblank counting in the core. The vblank > enable path is safe due to locking and discounting of redundant > timestamps etc. But the disable path could go wrong: > > 1. Vblank irq fires, drm_handle_vblank() -> drm_update_vblank_count(), > updates timestamps and counts "as if" in vblank -> incremented vblank > count and timestamp now set in the future. > > 2. After vblank irq finishes, but just before leading edge of vblank, > vblank_disable_and_save() executes, doesn't get bumped timestamp or > count because before vblank and not in vblank irq. Now > drm_update_vblank_count() would process a > "new" timestamp and count from the past and we'd have time and counts > going backwards, and bad things would happen. > > I haven't observed such a thing happening during testing so far, > probably because the time window in which it could happen is tiny, but > given how awfully bad it would be, it needs to be prevented. > > I had a look at the description of the Vblank irq in the "M76 Register > Reference Guide" for older asics and the description suggests that the > vblank irq fires when the crtc's line buffer is finished reading pixel > data from the scanout buffer in memory for a frame, ie., when the line > buffer read "enters" vblank. Hmm. Does that mean there's always at least one fullscreen plane enabled in the hw? As in you can't turn off the primary plane or make it smaller than the active video area? Othwewise it sounds like you'd could either not get it at all, or get it somewhere in the middle of the screen.
On 11/25/2015 06:58 PM, Ville Syrjälä wrote: > On Wed, Nov 25, 2015 at 06:24:13PM +0100, Mario Kleiner wrote: >> On 11/23/2015 09:24 PM, Ville Syrjälä wrote: >>> On Mon, Nov 23, 2015 at 06:58:34PM +0100, Mario Kleiner wrote: >>>> >>>> >>>> On 11/23/2015 04:51 PM, Ville Syrjälä wrote: >>>>> On Mon, Nov 23, 2015 at 04:23:21PM +0100, Mario Kleiner wrote: >>>>>> On 11/20/2015 04:34 PM, Ville Syrjälä wrote: >>>>>>> On Fri, Nov 20, 2015 at 04:24:50PM +0100, Mario Kleiner wrote: >>>>>> >>>>>> ... >>>>>> Ok, but why would that be a bad thing? I think we want it to think it is >>>>>> in the previous frame if it is called outside the vblank irq context. >>>>>> The only reason we fudge it to the next frames vblank if i vblank irq is >>>>>> because we know the vblank irq handler we are executing atm. was meant >>>>>> to execute within the upcoming vblank for the next frame, so we fudge >>>>>> the scanout positions and thereby timestamp to correspond to that new >>>>>> frame. But if something called outside irq context it should get a >>>>>> scanout position/timestamp that corresponds to "reality". >>>>> >>>>> It would be a bad thing since it would cause the timestamp to jump >>>>> backwards, and that would also cause the frame count guesstimate to go >>>>> backwards. >>>>> >>>> >>>> But only if we don't use the dev->driver->get_vblank_counter() method, >>>> which we try to use on AMD. >>> >>> Well, if you do it that way then you have the problem of the hw counter >>> seeming to jump forward by one after crossing the start of vblank (when >>> compared to the value you sampled when you processed the early vblank >>> interrupt). >>> >> >> Ok, finally i see the bad scenario that wouldn't get prevented by our >> current locking with the new vblank counting in the core. The vblank >> enable path is safe due to locking and discounting of redundant >> timestamps etc. But the disable path could go wrong: >> >> 1. Vblank irq fires, drm_handle_vblank() -> drm_update_vblank_count(), >> updates timestamps and counts "as if" in vblank -> incremented vblank >> count and timestamp now set in the future. >> >> 2. After vblank irq finishes, but just before leading edge of vblank, >> vblank_disable_and_save() executes, doesn't get bumped timestamp or >> count because before vblank and not in vblank irq. Now >> drm_update_vblank_count() would process a >> "new" timestamp and count from the past and we'd have time and counts >> going backwards, and bad things would happen. >> >> I haven't observed such a thing happening during testing so far, >> probably because the time window in which it could happen is tiny, but >> given how awfully bad it would be, it needs to be prevented. >> >> I had a look at the description of the Vblank irq in the "M76 Register >> Reference Guide" for older asics and the description suggests that the >> vblank irq fires when the crtc's line buffer is finished reading pixel >> data from the scanout buffer in memory for a frame, ie., when the line >> buffer read "enters" vblank. > > Hmm. Does that mean there's always at least one fullscreen plane enabled > in the hw? As in you can't turn off the primary plane or make it smaller > than the active video area? Othwewise it sounds like you'd could either > not get it at all, or get it somewhere in the middle of the screen. > It says "Interrupt that can be programmed to be generated by the primary display controller's line buffer logic either when the source image line counter is not requesting any active display data (i.e. in the vertical blank) or the output CRTC timing generator is within the vertical blanking region." So my statements were my interpretation of this quote, so i can make some sense out of the vblank irq behaviour. I guess Alex or Harry would know? The M76 reference refers to some older asics, i just assume it is the same for the current ones, given that observed behaviour would be consistent with the line buffer causing this lead of a couple of scanlines. I see about 2 scanlines on DCE4 and about 3 scanlines on DCE3. I don't know how big the line buffer is, how quickly it refills etc., but it sounds reasonable. -mario
On Wed, Nov 25, 2015 at 08:04:26PM +0100, Mario Kleiner wrote: > On 11/25/2015 06:46 PM, Ville Syrjälä wrote: > > On Wed, Nov 25, 2015 at 06:24:13PM +0100, Mario Kleiner wrote: > >> On 11/23/2015 09:24 PM, Ville Syrjälä wrote: > >>> On Mon, Nov 23, 2015 at 06:58:34PM +0100, Mario Kleiner wrote: > >>>> > >>>> > >>>> On 11/23/2015 04:51 PM, Ville Syrjälä wrote: > >>>>> On Mon, Nov 23, 2015 at 04:23:21PM +0100, Mario Kleiner wrote: > >>>>>> On 11/20/2015 04:34 PM, Ville Syrjälä wrote: > >>>>>>> On Fri, Nov 20, 2015 at 04:24:50PM +0100, Mario Kleiner wrote: > >>>>>> > >>>>>> ... > >>>>>> Ok, but why would that be a bad thing? I think we want it to think it is > >>>>>> in the previous frame if it is called outside the vblank irq context. > >>>>>> The only reason we fudge it to the next frames vblank if i vblank irq is > >>>>>> because we know the vblank irq handler we are executing atm. was meant > >>>>>> to execute within the upcoming vblank for the next frame, so we fudge > >>>>>> the scanout positions and thereby timestamp to correspond to that new > >>>>>> frame. But if something called outside irq context it should get a > >>>>>> scanout position/timestamp that corresponds to "reality". > >>>>> > >>>>> It would be a bad thing since it would cause the timestamp to jump > >>>>> backwards, and that would also cause the frame count guesstimate to go > >>>>> backwards. > >>>>> > >>>> > >>>> But only if we don't use the dev->driver->get_vblank_counter() method, > >>>> which we try to use on AMD. > >>> > >>> Well, if you do it that way then you have the problem of the hw counter > >>> seeming to jump forward by one after crossing the start of vblank (when > >>> compared to the value you sampled when you processed the early vblank > >>> interrupt). > >>> > >> > >> Ok, finally i see the bad scenario that wouldn't get prevented by our > >> current locking with the new vblank counting in the core. The vblank > >> enable path is safe due to locking and discounting of redundant > >> timestamps etc. But the disable path could go wrong: > >> > >> 1. Vblank irq fires, drm_handle_vblank() -> drm_update_vblank_count(), > >> updates timestamps and counts "as if" in vblank -> incremented vblank > >> count and timestamp now set in the future. > >> > >> 2. After vblank irq finishes, but just before leading edge of vblank, > >> vblank_disable_and_save() executes, doesn't get bumped timestamp or > >> count because before vblank and not in vblank irq. Now > >> drm_update_vblank_count() would process a > >> "new" timestamp and count from the past and we'd have time and counts > >> going backwards, and bad things would happen. > >> > >> I haven't observed such a thing happening during testing so far, > >> probably because the time window in which it could happen is tiny, but > >> given how awfully bad it would be, it needs to be prevented. > >> > >> I had a look at the description of the Vblank irq in the "M76 Register > >> Reference Guide" for older asics and the description suggests that the > >> vblank irq fires when the crtc's line buffer is finished reading pixel > >> data from the scanout buffer in memory for a frame, ie., when the line > >> buffer read "enters" vblank. That would explain why the irq happens a > >> few scanlines before actual vblank, because line buffer refills must > >> obviously happen before the crtc can send pixel data from the line > >> buffer to the encoders, so it would lead a bit in time. That also means > >> we can't delay the vblank irq to actually happen at start of vblank and > >> have to deal with the early vblank irq. > >> > >>> I guess one silly idea would be to defer the vblank interrupt processing > >>> to a timer, and just schedule it a bit into the future from the actual > >>> interrupt handler. > >>> > >> > >> Timer would be bad because then we get problems with the pageflip > >> completion irq sometimes being processed before the vblank irq, > > > > You you'd need to move page flip completion to happen from the vblank > > timer too I suppose. > > > >> and > >> because we want to be fast in vblank irq handling, delivering vblank > >> events etc. I wouldn't trust a timer to be reliable enough for such > >> short waits. > > > > hrtimers should be accurate. But maybe more expensive than the timer > > wheel. > > > > Sounds all a bit complex and fraught with new possible complications. I > can't spend much more time on this than i did so far, and we need to get > this into 4.4-rc, so i am aiming for the most simple solution that would > work. > > >> Busy waiting wouldn't be great either in irq. > >> > >> So what about this relatively simple one? > >> > >> 1. In radeon_get_crtc_scanoutpos() we artifically define the > >> vblank_start line to be, e.g, 5 scanlines before the true start of > >> vblank, so for the purpose of vblank counter queries and timestamping > >> our "vblank" would start a bit earlier and the vblank irq would always > >> execute in "vblank". Non-Irq invocations like vblank_disable_and_save() > >> would also be treated to this early vblank start and so what the DRM > >> core observes would always be consistent. > >> > >> 2. At least in radeon-kms we also use radeon_get_crtc_scanoutpos() > >> internally for "dynpm" dynamic power management/reclocking, and to > >> implement pageflip completion detection on asics older than DCE3 which > >> don't have pageflip interrupts. For those cases we need to use the true > >> start of vblank, so for this internal use we pass in some special flag > >> into radeon_get_crtc_scanoutpos() to tell it to not shift the vblank > >> start around. > >> > >> 3. I've added another check to my patch for drm_update_vblank_count() to > >> catch timestamps going backwards and discounting such cases, for a bit > >> of extra robustness against driver trouble. > >> > >> Any ideas why this would be a stupid idea? I'll try this now and just > >> hope meanwhile nobody finds a reason why this would be bad. > > > > What I don't like is leaking any of this into the core code. All the > > hacks should live in the hw specific code. We've managed to keep all > > the i915 hacks hidden that way. > > > > So if you would: > > - fudge your get_scanout_position as you suggested > > - _not_ expose the hw counter to the vblank core since it > > disagrees with the scanout position > > - keep the internal get_scanout_position variant/flag > > purely internal > > > > then I think I might like it. That way working hardware doesn't have to > > be exposed to these hacks, and there's possible a bit less danger that it > > all gets broken again next time the core needs some work. > > > > Ok. Exposing the fudged hw counter shouldn't be a problem though, given > that it would be fudged in a consistent way with the fudged scanout > positions and to have incremented at time of drm_handle_vblank(). We'd > bump the hw counter to the count of the new vblank at the same time when > the scanout positions would start counting backwards from minus > something to 0, showing how many vblank lines to go until start of > scanout, etc. The only difference to reality would be that this > simulated vblank would start 5 scanlines earlier than the true hw > vblank, but i can't see how the core would care about that. > > > > One problem with that approach could be that the vblank event and page > > flip event would be delievered at different times if you keep using the > > page flip interrupt, but I'm not sure that would be a problem. At least > > they should both have the same timestamp and counter value. > > > > That's the same now, and the timestamps and counts be the same. I'll > check with my measurement equipment that the timestamps will be still > accurate wrt. to reality. > > Attached is my current patch i wanted to submit for the drm core's > drm_update_vblank_count(). I think it's good to make the core somewhat > robust against potential kms driver bugs or glitches. But if you > wouldn't like that patch, there wouldn't be much of a point sending it > out at all. > > thanks, > -mario > > >From 2d5d58a1c575ad002ce2cb643f395d0e4757d959 Mon Sep 17 00:00:00 2001 > From: Mario Kleiner <mario.kleiner.de@gmail.com> > Date: Wed, 25 Nov 2015 18:48:31 +0100 > Subject: [PATCH] drm/irq: Make drm_update_vblank_count() more robust. > > The changes to drm_update_vblank_count() for Linux 4.4-rc > made the function more fragile wrt. some hw quirks. E.g., > at dev->driver->enable_vblank(), AMD gpu's fire a spurious > redundant vblank irq shortly after enabling vblank irqs, not > locked to vblank. This causes a redundant call which needs > to be suppressed to avoid miscounting. > > To increase robustness, shuffle things around a bit: > > On drivers with high precision vblank timestamping always > evaluate the timestamp difference between current timestamp > and previously recorded timestamp to detect such redundant > invocations and no-op in that case. > > Also detect and warn about timestamps going backwards to > catch potential kms driver bugs. > > This patch is meant for Linux 4.4-rc and later. > > Signed-off-by: Mario Kleiner <mario.kleiner.de@gmail.com> > --- > drivers/gpu/drm/drm_irq.c | 53 ++++++++++++++++++++++++++++++++++++----------- > 1 file changed, 41 insertions(+), 12 deletions(-) > > diff --git a/drivers/gpu/drm/drm_irq.c b/drivers/gpu/drm/drm_irq.c > index 819b8c1..8728c3c 100644 > --- a/drivers/gpu/drm/drm_irq.c > +++ b/drivers/gpu/drm/drm_irq.c > @@ -172,9 +172,11 @@ static void drm_update_vblank_count(struct drm_device *dev, unsigned int pipe, > unsigned long flags) > { > struct drm_vblank_crtc *vblank = &dev->vblank[pipe]; > - u32 cur_vblank, diff; > + u32 cur_vblank, diff = 0; > bool rc; > struct timeval t_vblank; > + const struct timeval *t_old; > + u64 diff_ns; > int count = DRM_TIMESTAMP_MAXRETRIES; > int framedur_ns = vblank->framedur_ns; > > @@ -195,13 +197,15 @@ static void drm_update_vblank_count(struct drm_device *dev, unsigned int pipe, > rc = drm_get_last_vbltimestamp(dev, pipe, &t_vblank, flags); > } while (cur_vblank != dev->driver->get_vblank_counter(dev, pipe) && --count > 0); > > - if (dev->max_vblank_count != 0) { > - /* trust the hw counter when it's around */ > - diff = (cur_vblank - vblank->last) & dev->max_vblank_count; > - } else if (rc && framedur_ns) { > - const struct timeval *t_old; > - u64 diff_ns; > - > + /* > + * Always use vblank timestamping based method if supported to reject > + * redundant vblank irqs. E.g., AMD hardware needs this to not screw up > + * due to some irq handling quirk. > + * > + * This also sets the diff value for use as fallback below in case the > + * hw does not support a suitable hw vblank counter. > + */ > + if (rc && framedur_ns) { If you fudged everything properly why do you still need this? With working hw counter there should be no need to do this stuff. > t_old = &vblanktimestamp(dev, pipe, vblank->count); > diff_ns = timeval_to_ns(&t_vblank) - timeval_to_ns(t_old); > > @@ -212,11 +216,36 @@ static void drm_update_vblank_count(struct drm_device *dev, unsigned int pipe, > */ > diff = DIV_ROUND_CLOSEST_ULL(diff_ns, framedur_ns); > > - if (diff == 0 && flags & DRM_CALLED_FROM_VBLIRQ) > + /* Catch driver timestamping bugs and prevent worse. */ > + if ((s32) diff < 0) { > + DRM_DEBUG_VBL("crtc %u: Timestamp going backward!" > + " diff_ns = %lld, framedur_ns = %d)\n", > + pipe, (long long) diff_ns, framedur_ns); > + return; > + } > + > + if (diff == 0 && flags & DRM_CALLED_FROM_VBLIRQ) { > DRM_DEBUG_VBL("crtc %u: Redundant vblirq ignored." > - " diff_ns = %lld, framedur_ns = %d)\n", > - pipe, (long long) diff_ns, framedur_ns); > - } else { > + " diff_ns = %lld, framedur_ns = %d)\n", > + pipe, (long long) diff_ns, framedur_ns); > + > + /* Treat this redundant invocation as no-op. */ > + WARN_ON_ONCE(cur_vblank != vblank->last); > + return; > + } > + } > + > + /* > + * hw counters, if marked as supported via max_vblank_count != 0, > + * *must* increment at leading edge of vblank or at least before > + * the firing of the hw vblank irq, otherwise we have a race here > + * between gpu and us, where small variations in our execution > + * timing can cause off-by-one miscounting of vblanks. > + */ > + if (dev->max_vblank_count != 0) { > + /* trust the hw counter when it's around */ > + diff = (cur_vblank - vblank->last) & dev->max_vblank_count; > + } else if (!(rc && framedur_ns)) { > /* some kind of default for drivers w/o accurate vbl timestamping */ > diff = (flags & DRM_CALLED_FROM_VBLIRQ) != 0; > } > -- > 1.9.1 >
On Wed, Nov 25, 2015 at 1:21 PM, Mario Kleiner <mario.kleiner.de@gmail.com> wrote: > On 11/25/2015 06:58 PM, Ville Syrjälä wrote: >> >> On Wed, Nov 25, 2015 at 06:24:13PM +0100, Mario Kleiner wrote: >>> >>> On 11/23/2015 09:24 PM, Ville Syrjälä wrote: >>>> >>>> On Mon, Nov 23, 2015 at 06:58:34PM +0100, Mario Kleiner wrote: >>>>> >>>>> >>>>> >>>>> On 11/23/2015 04:51 PM, Ville Syrjälä wrote: >>>>>> >>>>>> On Mon, Nov 23, 2015 at 04:23:21PM +0100, Mario Kleiner wrote: >>>>>>> >>>>>>> On 11/20/2015 04:34 PM, Ville Syrjälä wrote: >>>>>>>> >>>>>>>> On Fri, Nov 20, 2015 at 04:24:50PM +0100, Mario Kleiner wrote: >>>>>>> >>>>>>> >>>>>>> ... >>>>>>> Ok, but why would that be a bad thing? I think we want it to think it >>>>>>> is >>>>>>> in the previous frame if it is called outside the vblank irq context. >>>>>>> The only reason we fudge it to the next frames vblank if i vblank irq >>>>>>> is >>>>>>> because we know the vblank irq handler we are executing atm. was >>>>>>> meant >>>>>>> to execute within the upcoming vblank for the next frame, so we fudge >>>>>>> the scanout positions and thereby timestamp to correspond to that new >>>>>>> frame. But if something called outside irq context it should get a >>>>>>> scanout position/timestamp that corresponds to "reality". >>>>>> >>>>>> >>>>>> It would be a bad thing since it would cause the timestamp to jump >>>>>> backwards, and that would also cause the frame count guesstimate to go >>>>>> backwards. >>>>>> >>>>> >>>>> But only if we don't use the dev->driver->get_vblank_counter() method, >>>>> which we try to use on AMD. >>>> >>>> >>>> Well, if you do it that way then you have the problem of the hw counter >>>> seeming to jump forward by one after crossing the start of vblank (when >>>> compared to the value you sampled when you processed the early vblank >>>> interrupt). >>>> >>> >>> Ok, finally i see the bad scenario that wouldn't get prevented by our >>> current locking with the new vblank counting in the core. The vblank >>> enable path is safe due to locking and discounting of redundant >>> timestamps etc. But the disable path could go wrong: >>> >>> 1. Vblank irq fires, drm_handle_vblank() -> drm_update_vblank_count(), >>> updates timestamps and counts "as if" in vblank -> incremented vblank >>> count and timestamp now set in the future. >>> >>> 2. After vblank irq finishes, but just before leading edge of vblank, >>> vblank_disable_and_save() executes, doesn't get bumped timestamp or >>> count because before vblank and not in vblank irq. Now >>> drm_update_vblank_count() would process a >>> "new" timestamp and count from the past and we'd have time and counts >>> going backwards, and bad things would happen. >>> >>> I haven't observed such a thing happening during testing so far, >>> probably because the time window in which it could happen is tiny, but >>> given how awfully bad it would be, it needs to be prevented. >>> >>> I had a look at the description of the Vblank irq in the "M76 Register >>> Reference Guide" for older asics and the description suggests that the >>> vblank irq fires when the crtc's line buffer is finished reading pixel >>> data from the scanout buffer in memory for a frame, ie., when the line >>> buffer read "enters" vblank. >> >> >> Hmm. Does that mean there's always at least one fullscreen plane enabled >> in the hw? As in you can't turn off the primary plane or make it smaller >> than the active video area? Othwewise it sounds like you'd could either >> not get it at all, or get it somewhere in the middle of the screen. >> > > It says "Interrupt that can be programmed to be generated by the > primary display controller's line buffer logic either when the > source image line counter is not requesting any active > display data (i.e. in the vertical blank) or the output CRTC > timing generator is within the vertical blanking region." > > So my statements were my interpretation of this quote, so i can make some > sense out of the vblank irq behaviour. I guess Alex or Harry would know? The > M76 reference refers to some older asics, i just assume it is the same for > the current ones, given that observed behaviour would be consistent with the > line buffer causing this lead of a couple of scanlines. I see about 2 > scanlines on DCE4 and about 3 scanlines on DCE3. I don't know how big the > line buffer is, how quickly it refills etc., but it sounds reasonable. The size of the line buffer varies by generation, but the LB logic is still responsible for generating the vblank interrupt even on newer hw. Alex
On 11/25/2015 08:36 PM, Ville Syrjälä wrote: > On Wed, Nov 25, 2015 at 08:04:26PM +0100, Mario Kleiner wrote: >> On 11/25/2015 06:46 PM, Ville Syrjälä wrote: ... >> Attached is my current patch i wanted to submit for the drm core's >> drm_update_vblank_count(). I think it's good to make the core somewhat >> robust against potential kms driver bugs or glitches. But if you >> wouldn't like that patch, there wouldn't be much of a point sending it >> out at all. >> >> thanks, >> -mario >> > >> >From 2d5d58a1c575ad002ce2cb643f395d0e4757d959 Mon Sep 17 00:00:00 2001 >> From: Mario Kleiner <mario.kleiner.de@gmail.com> >> Date: Wed, 25 Nov 2015 18:48:31 +0100 >> Subject: [PATCH] drm/irq: Make drm_update_vblank_count() more robust. >> >> The changes to drm_update_vblank_count() for Linux 4.4-rc >> made the function more fragile wrt. some hw quirks. E.g., >> at dev->driver->enable_vblank(), AMD gpu's fire a spurious >> redundant vblank irq shortly after enabling vblank irqs, not >> locked to vblank. This causes a redundant call which needs >> to be suppressed to avoid miscounting. >> >> To increase robustness, shuffle things around a bit: >> >> On drivers with high precision vblank timestamping always >> evaluate the timestamp difference between current timestamp >> and previously recorded timestamp to detect such redundant >> invocations and no-op in that case. >> >> Also detect and warn about timestamps going backwards to >> catch potential kms driver bugs. >> >> This patch is meant for Linux 4.4-rc and later. >> >> Signed-off-by: Mario Kleiner <mario.kleiner.de@gmail.com> >> --- >> drivers/gpu/drm/drm_irq.c | 53 ++++++++++++++++++++++++++++++++++++----------- >> 1 file changed, 41 insertions(+), 12 deletions(-) >> >> diff --git a/drivers/gpu/drm/drm_irq.c b/drivers/gpu/drm/drm_irq.c >> index 819b8c1..8728c3c 100644 >> --- a/drivers/gpu/drm/drm_irq.c >> +++ b/drivers/gpu/drm/drm_irq.c >> @@ -172,9 +172,11 @@ static void drm_update_vblank_count(struct drm_device *dev, unsigned int pipe, >> unsigned long flags) >> { >> struct drm_vblank_crtc *vblank = &dev->vblank[pipe]; >> - u32 cur_vblank, diff; >> + u32 cur_vblank, diff = 0; >> bool rc; >> struct timeval t_vblank; >> + const struct timeval *t_old; >> + u64 diff_ns; >> int count = DRM_TIMESTAMP_MAXRETRIES; >> int framedur_ns = vblank->framedur_ns; >> >> @@ -195,13 +197,15 @@ static void drm_update_vblank_count(struct drm_device *dev, unsigned int pipe, >> rc = drm_get_last_vbltimestamp(dev, pipe, &t_vblank, flags); >> } while (cur_vblank != dev->driver->get_vblank_counter(dev, pipe) && --count > 0); >> >> - if (dev->max_vblank_count != 0) { >> - /* trust the hw counter when it's around */ >> - diff = (cur_vblank - vblank->last) & dev->max_vblank_count; >> - } else if (rc && framedur_ns) { >> - const struct timeval *t_old; >> - u64 diff_ns; >> - >> + /* >> + * Always use vblank timestamping based method if supported to reject >> + * redundant vblank irqs. E.g., AMD hardware needs this to not screw up >> + * due to some irq handling quirk. >> + * >> + * This also sets the diff value for use as fallback below in case the >> + * hw does not support a suitable hw vblank counter. >> + */ >> + if (rc && framedur_ns) { > > If you fudged everything properly why do you still need this? With > working hw counter there should be no need to do this stuff. > As far as testing on one DCE4 card goes, i don't need it anymore with my fudged hw counters and timestamps. The fudging so far seems to work nicely. I just wanted to have a bit of extra robustness and a bit of extra available debug output against future or other broken drivers, or mistakes in fudging on the current driver, e.g., against things like timestamps going backwards. Especially since i can only test on two AMD cards atm., quite a limited sample. There are 3 display engine generations before and 5 generations after my test sample. -mario >> t_old = &vblanktimestamp(dev, pipe, vblank->count); >> diff_ns = timeval_to_ns(&t_vblank) - timeval_to_ns(t_old); >> >> @@ -212,11 +216,36 @@ static void drm_update_vblank_count(struct drm_device *dev, unsigned int pipe, >> */ >> diff = DIV_ROUND_CLOSEST_ULL(diff_ns, framedur_ns); >> >> - if (diff == 0 && flags & DRM_CALLED_FROM_VBLIRQ) >> + /* Catch driver timestamping bugs and prevent worse. */ >> + if ((s32) diff < 0) { >> + DRM_DEBUG_VBL("crtc %u: Timestamp going backward!" >> + " diff_ns = %lld, framedur_ns = %d)\n", >> + pipe, (long long) diff_ns, framedur_ns); >> + return; >> + } >> + >> + if (diff == 0 && flags & DRM_CALLED_FROM_VBLIRQ) { >> DRM_DEBUG_VBL("crtc %u: Redundant vblirq ignored." >> - " diff_ns = %lld, framedur_ns = %d)\n", >> - pipe, (long long) diff_ns, framedur_ns); >> - } else { >> + " diff_ns = %lld, framedur_ns = %d)\n", >> + pipe, (long long) diff_ns, framedur_ns); >> + >> + /* Treat this redundant invocation as no-op. */ >> + WARN_ON_ONCE(cur_vblank != vblank->last); >> + return; >> + } >> + } >> + >> + /* >> + * hw counters, if marked as supported via max_vblank_count != 0, >> + * *must* increment at leading edge of vblank or at least before >> + * the firing of the hw vblank irq, otherwise we have a race here >> + * between gpu and us, where small variations in our execution >> + * timing can cause off-by-one miscounting of vblanks. >> + */ >> + if (dev->max_vblank_count != 0) { >> + /* trust the hw counter when it's around */ >> + diff = (cur_vblank - vblank->last) & dev->max_vblank_count; >> + } else if (!(rc && framedur_ns)) { >> /* some kind of default for drivers w/o accurate vbl timestamping */ >> diff = (flags & DRM_CALLED_FROM_VBLIRQ) != 0; >> } >> -- >> 1.9.1 >> > >
On 11/25/2015 08:38 PM, Alex Deucher wrote: > On Wed, Nov 25, 2015 at 1:21 PM, Mario Kleiner > <mario.kleiner.de@gmail.com> wrote: >> On 11/25/2015 06:58 PM, Ville Syrjälä wrote: >>> >>> On Wed, Nov 25, 2015 at 06:24:13PM +0100, Mario Kleiner wrote: >>>> >>>> On 11/23/2015 09:24 PM, Ville Syrjälä wrote: >>>>> >>>>> On Mon, Nov 23, 2015 at 06:58:34PM +0100, Mario Kleiner wrote: >>>>>> >>>>>> >>>>>> >>>>>> On 11/23/2015 04:51 PM, Ville Syrjälä wrote: >>>>>>> >>>>>>> On Mon, Nov 23, 2015 at 04:23:21PM +0100, Mario Kleiner wrote: >>>>>>>> >>>>>>>> On 11/20/2015 04:34 PM, Ville Syrjälä wrote: >>>>>>>>> >>>>>>>>> On Fri, Nov 20, 2015 at 04:24:50PM +0100, Mario Kleiner wrote: >>>>>>>> >>>>>>>> >>>>>>>> ... >>>>>>>> Ok, but why would that be a bad thing? I think we want it to think it >>>>>>>> is >>>>>>>> in the previous frame if it is called outside the vblank irq context. >>>>>>>> The only reason we fudge it to the next frames vblank if i vblank irq >>>>>>>> is >>>>>>>> because we know the vblank irq handler we are executing atm. was >>>>>>>> meant >>>>>>>> to execute within the upcoming vblank for the next frame, so we fudge >>>>>>>> the scanout positions and thereby timestamp to correspond to that new >>>>>>>> frame. But if something called outside irq context it should get a >>>>>>>> scanout position/timestamp that corresponds to "reality". >>>>>>> >>>>>>> >>>>>>> It would be a bad thing since it would cause the timestamp to jump >>>>>>> backwards, and that would also cause the frame count guesstimate to go >>>>>>> backwards. >>>>>>> >>>>>> >>>>>> But only if we don't use the dev->driver->get_vblank_counter() method, >>>>>> which we try to use on AMD. >>>>> >>>>> >>>>> Well, if you do it that way then you have the problem of the hw counter >>>>> seeming to jump forward by one after crossing the start of vblank (when >>>>> compared to the value you sampled when you processed the early vblank >>>>> interrupt). >>>>> >>>> >>>> Ok, finally i see the bad scenario that wouldn't get prevented by our >>>> current locking with the new vblank counting in the core. The vblank >>>> enable path is safe due to locking and discounting of redundant >>>> timestamps etc. But the disable path could go wrong: >>>> >>>> 1. Vblank irq fires, drm_handle_vblank() -> drm_update_vblank_count(), >>>> updates timestamps and counts "as if" in vblank -> incremented vblank >>>> count and timestamp now set in the future. >>>> >>>> 2. After vblank irq finishes, but just before leading edge of vblank, >>>> vblank_disable_and_save() executes, doesn't get bumped timestamp or >>>> count because before vblank and not in vblank irq. Now >>>> drm_update_vblank_count() would process a >>>> "new" timestamp and count from the past and we'd have time and counts >>>> going backwards, and bad things would happen. >>>> >>>> I haven't observed such a thing happening during testing so far, >>>> probably because the time window in which it could happen is tiny, but >>>> given how awfully bad it would be, it needs to be prevented. >>>> >>>> I had a look at the description of the Vblank irq in the "M76 Register >>>> Reference Guide" for older asics and the description suggests that the >>>> vblank irq fires when the crtc's line buffer is finished reading pixel >>>> data from the scanout buffer in memory for a frame, ie., when the line >>>> buffer read "enters" vblank. >>> >>> >>> Hmm. Does that mean there's always at least one fullscreen plane enabled >>> in the hw? As in you can't turn off the primary plane or make it smaller >>> than the active video area? Othwewise it sounds like you'd could either >>> not get it at all, or get it somewhere in the middle of the screen. >>> >> >> It says "Interrupt that can be programmed to be generated by the >> primary display controller's line buffer logic either when the >> source image line counter is not requesting any active >> display data (i.e. in the vertical blank) or the output CRTC >> timing generator is within the vertical blanking region." >> >> So my statements were my interpretation of this quote, so i can make some >> sense out of the vblank irq behaviour. I guess Alex or Harry would know? The >> M76 reference refers to some older asics, i just assume it is the same for >> the current ones, given that observed behaviour would be consistent with the >> line buffer causing this lead of a couple of scanlines. I see about 2 >> scanlines on DCE4 and about 3 scanlines on DCE3. I don't know how big the >> line buffer is, how quickly it refills etc., but it sounds reasonable. > > The size of the line buffer varies by generation, but the LB logic is > still responsible for generating the vblank interrupt even on newer > hw. > > Alex > Thanks for the pointer. I digged through the evergreen_line_buffer_adjust() function and its siblings from classic to DCE11. Seems the line buffer capacity goes up to a max 16384 pixels on single display, e.g., evergreen, and more like 8192 pixels on DCE8+? It's expressed as 8192 * 2 and talking about different line buffer partitions in some places. At least for DCE4+ i can see the sizes from the code, but not for earlier gens. To find the proper value of how much earlier i need to fudge/place the start of vblank wrt. the true start of vblank in the code, i think i'll have to adjust for line buffer size. The worst case would be that the line buffer can be ahead of our scanout positions by the full height of the line buffer a la: fudgelines = lb_size / mode->crtc_hdisplay I think i could calculate this in the xxx_line_buffer_adjust() functions and store it in radeon_crtc for use in the fudging code. Atm. i'm just using a hard-coded "10" which so far worked for typical display modes. Assuming something like a 640x480 mode i'd need to set a constant of 26 if i'd want to account for full line buffer size as worst case. So i wonder what a good value would be to be safe but at the same time to keep this extended vblank small? E.g., maybe a smaller value than full line buffer height would do because due to the way watermarks for lb refill are set it can't get ahead by more than a few scanlines << total size? I also realized that we need to extend the virtual vblank by the same fudged number of scanlines when programming the page flip. Our hw settings make it flip at leading edge of true vblank if the new bo address is latched in time. To keep all the logic in userspace working properly we need to guarantee that now it will flip at the start of the fudged vblank at latest, otherwise delays the flip one frame. So i'd add a check for being in the fudged "vblank before the true vblank" to the radeon_flip_work_func() before programming the flip and udelay() if we'd end up in this forbidden zone. This problem already exists in the old code, but probably doesn't happen frequently (or ever) in practice, at least i couldn't easily provoke it, because usually we are too slow in scheduling the flip to fit into the time window for this race. But future optimizations or faster machines might break it. Thoughts? -mario
On Wed, Nov 25, 2015 at 02:38:04PM -0500, Alex Deucher wrote: > On Wed, Nov 25, 2015 at 1:21 PM, Mario Kleiner > <mario.kleiner.de@gmail.com> wrote: > > On 11/25/2015 06:58 PM, Ville Syrjälä wrote: > >> > >> On Wed, Nov 25, 2015 at 06:24:13PM +0100, Mario Kleiner wrote: > >>> > >>> On 11/23/2015 09:24 PM, Ville Syrjälä wrote: > >>>> > >>>> On Mon, Nov 23, 2015 at 06:58:34PM +0100, Mario Kleiner wrote: > >>>>> > >>>>> > >>>>> > >>>>> On 11/23/2015 04:51 PM, Ville Syrjälä wrote: > >>>>>> > >>>>>> On Mon, Nov 23, 2015 at 04:23:21PM +0100, Mario Kleiner wrote: > >>>>>>> > >>>>>>> On 11/20/2015 04:34 PM, Ville Syrjälä wrote: > >>>>>>>> > >>>>>>>> On Fri, Nov 20, 2015 at 04:24:50PM +0100, Mario Kleiner wrote: > >>>>>>> > >>>>>>> > >>>>>>> ... > >>>>>>> Ok, but why would that be a bad thing? I think we want it to think it > >>>>>>> is > >>>>>>> in the previous frame if it is called outside the vblank irq context. > >>>>>>> The only reason we fudge it to the next frames vblank if i vblank irq > >>>>>>> is > >>>>>>> because we know the vblank irq handler we are executing atm. was > >>>>>>> meant > >>>>>>> to execute within the upcoming vblank for the next frame, so we fudge > >>>>>>> the scanout positions and thereby timestamp to correspond to that new > >>>>>>> frame. But if something called outside irq context it should get a > >>>>>>> scanout position/timestamp that corresponds to "reality". > >>>>>> > >>>>>> > >>>>>> It would be a bad thing since it would cause the timestamp to jump > >>>>>> backwards, and that would also cause the frame count guesstimate to go > >>>>>> backwards. > >>>>>> > >>>>> > >>>>> But only if we don't use the dev->driver->get_vblank_counter() method, > >>>>> which we try to use on AMD. > >>>> > >>>> > >>>> Well, if you do it that way then you have the problem of the hw counter > >>>> seeming to jump forward by one after crossing the start of vblank (when > >>>> compared to the value you sampled when you processed the early vblank > >>>> interrupt). > >>>> > >>> > >>> Ok, finally i see the bad scenario that wouldn't get prevented by our > >>> current locking with the new vblank counting in the core. The vblank > >>> enable path is safe due to locking and discounting of redundant > >>> timestamps etc. But the disable path could go wrong: > >>> > >>> 1. Vblank irq fires, drm_handle_vblank() -> drm_update_vblank_count(), > >>> updates timestamps and counts "as if" in vblank -> incremented vblank > >>> count and timestamp now set in the future. > >>> > >>> 2. After vblank irq finishes, but just before leading edge of vblank, > >>> vblank_disable_and_save() executes, doesn't get bumped timestamp or > >>> count because before vblank and not in vblank irq. Now > >>> drm_update_vblank_count() would process a > >>> "new" timestamp and count from the past and we'd have time and counts > >>> going backwards, and bad things would happen. > >>> > >>> I haven't observed such a thing happening during testing so far, > >>> probably because the time window in which it could happen is tiny, but > >>> given how awfully bad it would be, it needs to be prevented. > >>> > >>> I had a look at the description of the Vblank irq in the "M76 Register > >>> Reference Guide" for older asics and the description suggests that the > >>> vblank irq fires when the crtc's line buffer is finished reading pixel > >>> data from the scanout buffer in memory for a frame, ie., when the line > >>> buffer read "enters" vblank. > >> > >> > >> Hmm. Does that mean there's always at least one fullscreen plane enabled > >> in the hw? As in you can't turn off the primary plane or make it smaller > >> than the active video area? Othwewise it sounds like you'd could either > >> not get it at all, or get it somewhere in the middle of the screen. > >> > > > > It says "Interrupt that can be programmed to be generated by the > > primary display controller's line buffer logic either when the > > source image line counter is not requesting any active > > display data (i.e. in the vertical blank) or the output CRTC > > timing generator is within the vertical blanking region." > > > > So my statements were my interpretation of this quote, so i can make some > > sense out of the vblank irq behaviour. I guess Alex or Harry would know? The > > M76 reference refers to some older asics, i just assume it is the same for > > the current ones, given that observed behaviour would be consistent with the > > line buffer causing this lead of a couple of scanlines. I see about 2 > > scanlines on DCE4 and about 3 scanlines on DCE3. I don't know how big the > > line buffer is, how quickly it refills etc., but it sounds reasonable. > > The size of the line buffer varies by generation, but the LB logic is > still responsible for generating the vblank interrupt even on newer > hw. So there's no actual vblank interrupt available from the timing generator?
-- a/drivers/gpu/drm/drm_irq.c +++ b/drivers/gpu/drm/drm_irq.c @@ -172,9 +172,11 @@ static void drm_update_vblank_count(struct drm_device *dev, unsigned int pipe, unsigned long flags) { struct drm_vblank_crtc *vblank = &dev->vblank[pipe]; - u32 cur_vblank, diff; + u32 cur_vblank, diff = 0; bool rc; struct timeval t_vblank; + const struct timeval *t_old; + u64 diff_ns; int count = DRM_TIMESTAMP_MAXRETRIES; int framedur_ns = vblank->framedur_ns; @@ -195,13 +197,15 @@ static void drm_update_vblank_count(struct drm_device *dev, unsigned int pipe, rc = drm_get_last_vbltimestamp(dev, pipe, &t_vblank, flags); } while (cur_vblank != dev->driver->get_vblank_counter(dev, pipe) && --count > 0); - if (dev->max_vblank_count != 0) { - /* trust the hw counter when it's around */ - diff = (cur_vblank - vblank->last) & dev->max_vblank_count; - } else if (rc && framedur_ns) { - const struct timeval *t_old; - u64 diff_ns; - + /* + * Always use vblank timestamping based method if supported to reject + * redundant vblank irqs. E.g., AMD hardware needs this to not screw up + * due to some irq handling quirk. + * + * This also sets the diff value for use as fallback below in case the + * hw does not support a suitable hw vblank counter. + */ + if (rc && framedur_ns) { t_old = &vblanktimestamp(dev, pipe, vblank->count); diff_ns = timeval_to_ns(&t_vblank) - timeval_to_ns(t_old); @@ -212,11 +216,28 @@ static void drm_update_vblank_count(struct drm_device *dev, unsigned int pipe, */ diff = DIV_ROUND_CLOSEST_ULL(diff_ns, framedur_ns); - if (diff == 0 && flags & DRM_CALLED_FROM_VBLIRQ) - DRM_DEBUG_VBL("crtc %u: Redundant vblirq ignored." - " diff_ns = %lld, framedur_ns = %d)\n", - pipe, (long long) diff_ns, framedur_ns); - } else { + if (diff == 0 && flags & DRM_CALLED_FROM_VBLIRQ) { + DRM_DEBUG_VBL("crtc %u: Redundant vblirq ignored." + " diff_ns = %lld, framedur_ns = %d)\n", + pipe, (long long) diff_ns, framedur_ns); + + /* Treat this redundant invocation as no-op. */ + WARN_ON_ONCE(cur_vblank != vblank->last); + return; + } + } + + /* + * hw counters, if marked as supported via max_vblank_count != 0, + * *must* increment at leading edge of vblank and in sync with + * the firing of the hw vblank irq, otherwise we have a race here + * between gpu and us, where small variations in our execution + * timing can cause off-by-one miscounting of vblanks. + */ + if (dev->max_vblank_count != 0) { + /* trust the hw counter when it's around */ + diff = (cur_vblank - vblank->last) & dev->max_vblank_count; + } else if (!(rc && framedur_ns)) { /* some kind of default for drivers w/o accurate vbl timestamping */ diff = (flags & DRM_CALLED_FROM_VBLIRQ) != 0; }