Message ID | 1349591890-13732-3-git-send-email-mario.kleiner@tuebingen.mpg.de (mailing list archive) |
---|---|
State | New, archived |
Headers | show |
> On 15.10.12 16:52, Chris Wilson wrote: > > On Mon, 15 Oct 2012 16:46:52 +0200, Mario Kleiner > <mario.kleiner@tuebingen.mpg.de> wrote: > >> Hi Chris, > >> > >> can you please check & merge at least the first two patches of the > >> series into the intel ddx? > > Thanks for the quick reply. > > > > > The first is along the right path, but I think you want to base the > > split on divisor==0. > > I don't think so, unless i misunderstand what you mean? The optimization > of I830DRI2ScheduleFlip()'ing immediately should only happen in the case > where current_msc is >= target_msc, ie., if the client really requests > swap at next possible vblank, regardless what divisor is, once we've > entered the whole ... > > if (divisor == 0 || current_msc < *target_msc) { > > ... block. Checking for divisor == 0 would be a nice little cleanup if > only either (divisor == 0) or (current_msc < *target_msc) could be true. > But it can happen that both (divisor == 0) and (current_msc < > *target_msc) and then a check for (divisor == 0) wouldn't tell you if > target_msc is in the future and the kernel vblank mechanism should > schedule swap in the future, or if it is time for an immediate flip. > > Also i tested with various distances between successive swap and with > divisor 0 vs. non-zero, so at least it works as advertised with the > current patch. > > So that patch should be ok. Yeah I don't understand the flip schedule at the top there either; if target_msc is out in the future, why would we schedule a page flip immediately just because divisor == 0? Maybe it should look like this instead? if (divisor == 0 || current_msc < *target_msc) { if (divisor == 0 && current_msc >= *target_msc) if (flip && I830DRI2ScheduleFlip(intel, draw, swap_info)) return TRUE; (could clean that up a little but it captures the optimization I think and avoids an extra vblank and potential frame delay.) > The second is broken in the DRI2 layer, as the > > SwapLimit mechanism exposes the multi-client race to a single client > > (i.e. there is no serialisation between the InvalidateEvent and the > > GetBuffers, and the InvalidateEvent is always broadcast too early.) > > Can you explain in more detail? I know there were many problems with the > invalidate events, but thought that's ok now? Multi-client race case > happens when a OpenGL app and a compositor is at work? So basically, the > whole DRI2 setup is fundamentally broken for any swaplimit other than 1 > and we have to wait for a future DRI3 to fix this properly? > > I didn't see any weird behaviour during testing under both compiz+unity > or without compositor with fullscreen OpenGL + pageflipping. Is this > race common or was i just somehow lucky? I'm not sure I understand Chris's comment either, however in the redirected case the compositor is ultimately responsible for when things show up on the screen, so timestamping is meaningless in that case. However for fullscreen, unredirected windows (sounds like that's what you tested), we should be able to properly queue and stamp things as they complete AIUI. > The > > third is just plain wrong, as pageflip may be a blocking ioctl (and will > > impose client blocking) so anybody who wants to override SwapBuffersWait > > will also want to prevent the inherent wait due to the flip. In any > > event that option is to only paper over the loose DRI2 specification and > > the lack of client control... > > I don't think so: If you want to run non-vsynced/tearing with max. > swaprate for benchmarks etc., then you must set "SwapBuffersWait" "off" > and set the drawables swapinterval to zero via some .drmrc setting or > the app calling glXSwapInterval etc. But once you've set the > swapinterval to zero, the x-server *always* does an immediate copy blit > and bypasses the whole vblank events / kms-pageflip mechanism. See I think this comes down to what people expect of the "SwapbuffersWait" option. Our man page indicates it's simply an anti-tearing feature, so if you disable it I think users would expect swaps to occur as soon as possible or when scheduled, regardless of potential tearing. If we leave flipping enabled, I think we'll often hit cases where a swap will be delayed until the next vblank (when the flip is latched into the display engine) rather than occurring immediately or when the vblank event for it fires, with likely tearing. Or do you think the current code handles this case well enough that we can guarantee we won't be delaying things by a frame sometimes? If so, you can argue that this isn't a performance advantage, but then it's a rather lame option anyway... So if/until we have async page flips, I think the current behavior is probably fine. Mario, do we have some small tests we can add to piglit for the timestamping cases? Looking at simple client code always helps me understand the complex DDX and server interaction better... Also, I'm assuming all this works ok with triple buffering disabled? What about with SNA? Thanks,
On Mon, 10 Dec 2012 10:48:29 -0800 Jesse Barnes <jbarnes@virtuousgeek.org> wrote: > > On 15.10.12 16:52, Chris Wilson wrote: > > > On Mon, 15 Oct 2012 16:46:52 +0200, Mario Kleiner > > <mario.kleiner@tuebingen.mpg.de> wrote: > > >> Hi Chris, > > >> > > >> can you please check & merge at least the first two patches of the > > >> series into the intel ddx? > > > > Thanks for the quick reply. > > > > > > > > The first is along the right path, but I think you want to base the > > > split on divisor==0. > > > > I don't think so, unless i misunderstand what you mean? The optimization > > of I830DRI2ScheduleFlip()'ing immediately should only happen in the case > > where current_msc is >= target_msc, ie., if the client really requests > > swap at next possible vblank, regardless what divisor is, once we've > > entered the whole ... > > > > if (divisor == 0 || current_msc < *target_msc) { > > > > ... block. Checking for divisor == 0 would be a nice little cleanup if > > only either (divisor == 0) or (current_msc < *target_msc) could be true. > > But it can happen that both (divisor == 0) and (current_msc < > > *target_msc) and then a check for (divisor == 0) wouldn't tell you if > > target_msc is in the future and the kernel vblank mechanism should > > schedule swap in the future, or if it is time for an immediate flip. > > > > Also i tested with various distances between successive swap and with > > divisor 0 vs. non-zero, so at least it works as advertised with the > > current patch. > > > > So that patch should be ok. > > Yeah I don't understand the flip schedule at the top there either; if > target_msc is out in the future, why would we schedule a page flip > immediately just because divisor == 0? > > Maybe it should look like this instead? > > if (divisor == 0 || current_msc < *target_msc) { > if (divisor == 0 && current_msc >= *target_msc) > if (flip && I830DRI2ScheduleFlip(intel, draw, swap_info)) > return TRUE; commit 2ab29a1688cd313768d928e87e145570f35b4a70 Author: Jesse Barnes <jbarnes@virtuousgeek.org> Date: Mon Dec 10 14:55:32 2012 -0800 dri2: don't schedule a flip prematurely at ScheduleSwap time Mario, can you make sure this works for you? Thanks,
On 11.12.12 00:00, Jesse Barnes wrote: > On Mon, 10 Dec 2012 10:48:29 -0800 > Jesse Barnes <jbarnes@virtuousgeek.org> wrote: > >>> On 15.10.12 16:52, Chris Wilson wrote: >>> > On Mon, 15 Oct 2012 16:46:52 +0200, Mario Kleiner >>> <mario.kleiner@tuebingen.mpg.de> wrote: >>> >> Hi Chris, >>> >> >>> >> can you please check & merge at least the first two patches of the >>> >> series into the intel ddx? >>> >>> Thanks for the quick reply. >>> >>> > >>> > The first is along the right path, but I think you want to base the >>> > split on divisor==0. >>> >>> I don't think so, unless i misunderstand what you mean? The optimization >>> of I830DRI2ScheduleFlip()'ing immediately should only happen in the case >>> where current_msc is >= target_msc, ie., if the client really requests >>> swap at next possible vblank, regardless what divisor is, once we've >>> entered the whole ... >>> >>> if (divisor == 0 || current_msc < *target_msc) { >>> >>> ... block. Checking for divisor == 0 would be a nice little cleanup if >>> only either (divisor == 0) or (current_msc < *target_msc) could be true. >>> But it can happen that both (divisor == 0) and (current_msc < >>> *target_msc) and then a check for (divisor == 0) wouldn't tell you if >>> target_msc is in the future and the kernel vblank mechanism should >>> schedule swap in the future, or if it is time for an immediate flip. >>> >>> Also i tested with various distances between successive swap and with >>> divisor 0 vs. non-zero, so at least it works as advertised with the >>> current patch. >>> >>> So that patch should be ok. >> >> Yeah I don't understand the flip schedule at the top there either; if >> target_msc is out in the future, why would we schedule a page flip >> immediately just because divisor == 0? >> >> Maybe it should look like this instead? >> >> if (divisor == 0 || current_msc < *target_msc) { >> if (divisor == 0 && current_msc >= *target_msc) >> if (flip && I830DRI2ScheduleFlip(intel, draw, swap_info)) >> return TRUE; > > commit 2ab29a1688cd313768d928e87e145570f35b4a70 > Author: Jesse Barnes <jbarnes@virtuousgeek.org> > Date: Mon Dec 10 14:55:32 2012 -0800 > > dri2: don't schedule a flip prematurely at ScheduleSwap time > > Mario, can you make sure this works for you? > Looks good for my purposes, ie., should fix glXSwapBuffersMscOML(), which was my main point of grief, thanks a lot! I'll retest soonish for extra paranoia. The divisor == 0 check is not really needed for the logic to work, but doesn't hurt and maybe still nice to leave there for readability to clarify the condition when the optimization should work. However, it doesn't fix some cases for normal glXSwapBuffers() with a swapinterval > 1. That requires always updating *target_msc properly, and the early exit if the optimization is taken prevents that. Also the swap_info->frame doesn't get updated in this case, which effectively disables/skips some correctness test in I830DRI2FlipEventHandler(). If you want to fix those as well you'll basically end up with my original patch, which moves the optimization a few lines down in the function and adds updating of those two variables, minus lots of comments in the code and commit message. But i'm happy, i mostly need glXSwapBuffersMscOML() and the pageflip timestamping to work properly. -mario
On 10.12.12 19:48, Jesse Barnes wrote: >> On 15.10.12 16:52, Chris Wilson wrote: >> > On Mon, 15 Oct 2012 16:46:52 +0200, Mario Kleiner >> <mario.kleiner@tuebingen.mpg.de> wrote: >> >> Hi Chris, >> >> >> >> can you please check & merge at least the first two patches of the >> >> series into the intel ddx? >> >> Thanks for the quick reply. >> >> > >> > The first is along the right path, but I think you want to base the >> > split on divisor==0. >> >> I don't think so, unless i misunderstand what you mean? The optimization >> of I830DRI2ScheduleFlip()'ing immediately should only happen in the case >> where current_msc is >= target_msc, ie., if the client really requests >> swap at next possible vblank, regardless what divisor is, once we've >> entered the whole ... >> >> if (divisor == 0 || current_msc < *target_msc) { >> >> ... block. Checking for divisor == 0 would be a nice little cleanup if >> only either (divisor == 0) or (current_msc < *target_msc) could be true. >> But it can happen that both (divisor == 0) and (current_msc < >> *target_msc) and then a check for (divisor == 0) wouldn't tell you if >> target_msc is in the future and the kernel vblank mechanism should >> schedule swap in the future, or if it is time for an immediate flip. >> >> Also i tested with various distances between successive swap and with >> divisor 0 vs. non-zero, so at least it works as advertised with the >> current patch. >> >> So that patch should be ok. > > Yeah I don't understand the flip schedule at the top there either; if > target_msc is out in the future, why would we schedule a page flip > immediately just because divisor == 0? > > Maybe it should look like this instead? > > if (divisor == 0 || current_msc < *target_msc) { > if (divisor == 0 && current_msc >= *target_msc) > if (flip && I830DRI2ScheduleFlip(intel, draw, swap_info)) > return TRUE; > > (could clean that up a little but it captures the optimization I think > and avoids an extra vblank and potential frame delay.) > >> The second is broken in the DRI2 layer, as the >> > SwapLimit mechanism exposes the multi-client race to a single client >> > (i.e. there is no serialisation between the InvalidateEvent and the >> > GetBuffers, and the InvalidateEvent is always broadcast too early.) >> >> Can you explain in more detail? I know there were many problems with the >> invalidate events, but thought that's ok now? Multi-client race case >> happens when a OpenGL app and a compositor is at work? So basically, the >> whole DRI2 setup is fundamentally broken for any swaplimit other than 1 >> and we have to wait for a future DRI3 to fix this properly? >> >> I didn't see any weird behaviour during testing under both compiz+unity >> or without compositor with fullscreen OpenGL + pageflipping. Is this >> race common or was i just somehow lucky? > > I'm not sure I understand Chris's comment either, however in the > redirected case the compositor is ultimately responsible for when > things show up on the screen, so timestamping is meaningless in that > case. However for fullscreen, unredirected windows (sounds like that's > what you tested), we should be able to properly queue and stamp things > as they complete AIUI. > Yes. I need it to work for fullscreen, unredirected windows. This is the only case where timestamping works reliably atm. My patch only applies to page-flipped swaps. In this cases (no compositor, fullscreen or unredirected with compositor) i didn't see problems on my system. But if it is a race it could be that my app is just not triggering it in typical use. It would be good to know when and when not exactly these races with invalidate events can happen? >> The >> > third is just plain wrong, as pageflip may be a blocking ioctl (and will >> > impose client blocking) so anybody who wants to override SwapBuffersWait >> > will also want to prevent the inherent wait due to the flip. In any >> > event that option is to only paper over the loose DRI2 specification and >> > the lack of client control... >> >> I don't think so: If you want to run non-vsynced/tearing with max. >> swaprate for benchmarks etc., then you must set "SwapBuffersWait" "off" >> and set the drawables swapinterval to zero via some .drmrc setting or >> the app calling glXSwapInterval etc. But once you've set the >> swapinterval to zero, the x-server *always* does an immediate copy blit >> and bypasses the whole vblank events / kms-pageflip mechanism. See > > I think this comes down to what people expect of the "SwapbuffersWait" > option. Our man page indicates it's simply an anti-tearing feature, so > if you disable it I think users would expect swaps to occur as soon as > possible or when scheduled, regardless of potential tearing. > > If we leave flipping enabled, I think we'll often hit cases where a > swap will be delayed until the next vblank (when the flip is latched > into the display engine) rather than occurring immediately or when the > vblank event for it fires, with likely tearing. Or do you think the > current code handles this case well enough that we can guarantee we > won't be delaying things by a frame sometimes? > My assumption was that the use case is getting maximum framerate for benchmarking, or for gamers who happily trade tearing for performance. In both cases your app probably just calls glXSwapBuffers asap and doesn't use deferred swaps. Also to avoid being throttled to refresh rate by use of vblank events, the app/user would set the swapinterval to zero -- But in this case, the x-server will skip the whole vblank event swap path anyway and always call DRI2CopyRegion() to do a blit copy, and those are not vsync'ed if "SwapBuffersWait" "off". My argument is that in the case where you don't want vsync the current code doesn't make a difference because vsync'ed pageflips weren't used anyway, but in the case you want vsync'ed swaps via non-zero swapinterval, you'll lose pageflipping and introduce uneccessary tearing and lower performance. > If so, you can argue that this isn't a performance advantage, but then > it's a rather lame option anyway... So if/until we have async page > flips, I think the current behavior is probably fine. > I don't really care about this case one way or the other for my purpose, i just thought the old behavior made more sense, assuming people use it the way i think they'd use it. But didn't you submit some async-flip ioctl() patches recently to allow non-vsync'ed kms flips? I also remember Chris posting a patch set for the x-server that allowed to communicate this properly to the ddx? Those together would solve it properly without "SwapBuffersWait" hacks. > Mario, do we have some small tests we can add to piglit for the > timestamping cases? Looking at simple client code always helps me > understand the complex DDX and server interaction better... > I'll write some, hopefully i'll find some time under the christmas tree... > Also, I'm assuming all this works ok with triple buffering disabled? Yes, works with double-buffering, that path is unchanged. Atm. that's what works correctly with timestamping. All my users have to disable triple-buffering to get a useable setup. My hope was to spare them the editing of xorg.conf in the default triple-buffering case, but we can live with that inconvenience as long as the xorg.conf option works. > What about with SNA? I think i don't understand the relationship between OpenGL and SNA? I'm somewhat lost there. As far as i can see there are completely separate implementations for SNA and OpenGL use? As far as i understood from following bits on the mailing list and on Phoronix, SNA is all about accelerating 2D operations, not 3D operations? But then i can see that quite some of the swap scheduling etc. is replicated in the SNA code, even the use of target_msc, divisor, remainder stuff for deferred swaps, and it is not clear how this OpenGL concepts are related to a 2D acceleration architecture? Enlight me a bit if you can, thanks, -mario
diff --git a/src/intel_dri.c b/src/intel_dri.c index 126acb2..8786bf4 100644 --- a/src/intel_dri.c +++ b/src/intel_dri.c @@ -1275,9 +1275,6 @@ I830DRI2ScheduleSwap(ClientPtr client, DrawablePtr draw, DRI2BufferPtr front, * the swap. */ if (divisor == 0 || current_msc < *target_msc) { - if (flip && I830DRI2ScheduleFlip(intel, draw, swap_info)) - return TRUE; - vbl.request.type = DRM_VBLANK_ABSOLUTE | DRM_VBLANK_EVENT | pipe_select(pipe); @@ -1295,6 +1292,25 @@ I830DRI2ScheduleSwap(ClientPtr client, DrawablePtr draw, DRI2BufferPtr front, if (current_msc >= *target_msc) *target_msc = current_msc; + /* If pageflip is requested for the next possible vblank, + * then avoid the roundtrip to the kernels vblank event + * delivery and schedule the pageflip for next vblank + * directly. This can't be done for any other case, as + * it would violate the OML_sync_control spec and + * SGI/MESA/GLX_swap_control spec! + */ + if (flip && (current_msc == *target_msc) && + I830DRI2ScheduleFlip(intel, draw, swap_info)) { + /* Create approximate target vblank of flip-completion, + * so basic consistency checks and swap_interval still + * work correctly. + */ + *target_msc += flip; + swap_info->frame = *target_msc; + + return TRUE; + } + vbl.request.sequence = *target_msc; vbl.request.signal = (unsigned long)swap_info; ret = drmWaitVBlank(intel->drmSubFD, &vbl);
Commit 7538be3315b8683b05e8f6b22023baadcc0bc4da together with DRI2/OpenGL triple-buffering support added an optimization which breaks kms pageflip swap scheduling for most meaningful use cases and thereby violates the OML_sync_control, GLX_SGI_swap_control, GLX_swap_control and MESA_swap_control specs, except for the trivial case of a glXSwapBuffers call with swap_interval == 1. This small modification allows to keep the optimization in the intended way, while removing the hilarious side effects for timing sensitive applications. Signed-off-by: Mario Kleiner <mario.kleiner@tuebingen.mpg.de> --- src/intel_dri.c | 22 +++++++++++++++++++--- 1 file changed, 19 insertions(+), 3 deletions(-)