Message ID | 1372428931-24144-1-git-send-email-vijay.a.purushothaman@intel.com (mailing list archive) |
---|---|
State | New, archived |
Headers | show |
> -----Original Message----- > From: intel-gfx-bounces+shobhit.kumar=intel.com@lists.freedesktop.org > [mailto:intel-gfx-bounces+shobhit.kumar=intel.com@lists.freedesktop.org] > On Behalf Of Vijay Purushothaman > Sent: Friday, June 28, 2013 7:46 PM > To: Intel Graphics > Subject: [Intel-gfx] [PATCH] drm/i915: Don't wait for vblank for sprite plane > flips > > Since the sprite planes are using synchronized MMIO based flip, no need to > wait for vblank. Removing this wait allows us to get a nice performance boost > to both 3D & media workloads based on sprite (~60 fps from ~20 fps) > > Signed-off-by: Vijay Purushothaman <vijay.a.purushothaman@intel.com> > Signed-off-by: Gary Smith <gary.k.smith@intel.com> > --- > drivers/gpu/drm/i915/intel_sprite.c | 14 +------------- > 1 file changed, 1 insertion(+), 13 deletions(-) > > diff --git a/drivers/gpu/drm/i915/intel_sprite.c > b/drivers/gpu/drm/i915/intel_sprite.c > index 1fa5612..1d14fc0 100644 > --- a/drivers/gpu/drm/i915/intel_sprite.c > +++ b/drivers/gpu/drm/i915/intel_sprite.c > @@ -828,20 +828,8 @@ intel_update_plane(struct drm_plane *plane, struct > drm_crtc *crtc, > intel_disable_primary(crtc); > > /* Unpin old obj after new one is active to avoid ugliness */ > - if (old_obj) { > - /* > - * It's fairly common to simply update the position of > - * an existing object. In that case, we don't need to > - * wait for vblank to avoid ugliness, we only need to > - * do the pin & ref bookkeeping. > - */ > - if (old_obj != obj) { > - mutex_unlock(&dev->struct_mutex); > - intel_wait_for_vblank(dev, to_intel_crtc(crtc)- > >pipe); > - mutex_lock(&dev->struct_mutex); > - } > + if (old_obj) > intel_unpin_fb_obj(old_obj); > - } > > out_unlock: > mutex_unlock(&dev->struct_mutex); > -- > 1.7.9.5 Tested on VLV. Works fine Tested-by: Shobhit Kumar <Shobhit.kumar@intel.com>
On Fri, Jun 28, 2013 at 07:45:31PM +0530, Vijay Purushothaman wrote: > Since the sprite planes are using synchronized MMIO based flip, no need > to wait for vblank. Removing this wait allows us to get a nice > performance boost to both 3D & media workloads based on sprite (~60 fps > from ~20 fps) Nak. We can't unpin the buffer until the hardware has finished reading from it. The proper fix is to do the unpin asynchronously after the flip has completed. That's one part of the bigger atomic pageflip story. > > Signed-off-by: Vijay Purushothaman <vijay.a.purushothaman@intel.com> > Signed-off-by: Gary Smith <gary.k.smith@intel.com> > --- > drivers/gpu/drm/i915/intel_sprite.c | 14 +------------- > 1 file changed, 1 insertion(+), 13 deletions(-) > > diff --git a/drivers/gpu/drm/i915/intel_sprite.c b/drivers/gpu/drm/i915/intel_sprite.c > index 1fa5612..1d14fc0 100644 > --- a/drivers/gpu/drm/i915/intel_sprite.c > +++ b/drivers/gpu/drm/i915/intel_sprite.c > @@ -828,20 +828,8 @@ intel_update_plane(struct drm_plane *plane, struct drm_crtc *crtc, > intel_disable_primary(crtc); > > /* Unpin old obj after new one is active to avoid ugliness */ > - if (old_obj) { > - /* > - * It's fairly common to simply update the position of > - * an existing object. In that case, we don't need to > - * wait for vblank to avoid ugliness, we only need to > - * do the pin & ref bookkeeping. > - */ > - if (old_obj != obj) { > - mutex_unlock(&dev->struct_mutex); > - intel_wait_for_vblank(dev, to_intel_crtc(crtc)->pipe); > - mutex_lock(&dev->struct_mutex); > - } > + if (old_obj) > intel_unpin_fb_obj(old_obj); > - } > > out_unlock: > mutex_unlock(&dev->struct_mutex); > -- > 1.7.9.5 > > _______________________________________________ > Intel-gfx mailing list > Intel-gfx@lists.freedesktop.org > http://lists.freedesktop.org/mailman/listinfo/intel-gfx
On Fri, Jun 28, 2013 at 05:24:50PM +0300, Ville Syrjälä wrote: > On Fri, Jun 28, 2013 at 07:45:31PM +0530, Vijay Purushothaman wrote: > > Since the sprite planes are using synchronized MMIO based flip, no need > > to wait for vblank. Removing this wait allows us to get a nice > > performance boost to both 3D & media workloads based on sprite (~60 fps > > from ~20 fps) > > Nak. We can't unpin the buffer until the hardware has finished reading > from it. > > The proper fix is to do the unpin asynchronously after the flip has > completed. That's one part of the bigger atomic pageflip story. The interested reader is invited to review the patches to do async unpinning here and in set-base sent many, many moons ago. -Chris
On 6/28/2013 7:54 PM, Ville Syrjälä wrote: > On Fri, Jun 28, 2013 at 07:45:31PM +0530, Vijay Purushothaman wrote: >> Since the sprite planes are using synchronized MMIO based flip, no need >> to wait for vblank. Removing this wait allows us to get a nice >> performance boost to both 3D & media workloads based on sprite (~60 fps >> from ~20 fps) > > Nak. We can't unpin the buffer until the hardware has finished reading > from it. Thanks much for the review feedback. We have removed this check in our android production branch so far we have not seen any side effect or artifact. Is this a conservative check or do we have any use case which will fail without this piece of code? Apparently the windows driver team have been using MMIO flips for Sprite and the windows driver also didn't wait for vblank for such flips all along. Could you please help me understand this a bit better? This wait reduces the perf to ~20 fps and thus prevent us from using sprite for any OGL layer mapping in hwcomposer and we are losing significant amount of power. For video content playback the problem is not that bad. > > The proper fix is to do the unpin asynchronously after the flip has > completed. That's one part of the bigger atomic pageflip story. > Any idea when are we planning to merge this atomic flip in d-i-n-q? Thanks, Vijay >> >> Signed-off-by: Vijay Purushothaman <vijay.a.purushothaman@intel.com> >> Signed-off-by: Gary Smith <gary.k.smith@intel.com> >> --- >> drivers/gpu/drm/i915/intel_sprite.c | 14 +------------- >> 1 file changed, 1 insertion(+), 13 deletions(-) >> >> diff --git a/drivers/gpu/drm/i915/intel_sprite.c b/drivers/gpu/drm/i915/intel_sprite.c >> index 1fa5612..1d14fc0 100644 >> --- a/drivers/gpu/drm/i915/intel_sprite.c >> +++ b/drivers/gpu/drm/i915/intel_sprite.c >> @@ -828,20 +828,8 @@ intel_update_plane(struct drm_plane *plane, struct drm_crtc *crtc, >> intel_disable_primary(crtc); >> >> /* Unpin old obj after new one is active to avoid ugliness */ >> - if (old_obj) { >> - /* >> - * It's fairly common to simply update the position of >> - * an existing object. In that case, we don't need to >> - * wait for vblank to avoid ugliness, we only need to >> - * do the pin & ref bookkeeping. >> - */ >> - if (old_obj != obj) { >> - mutex_unlock(&dev->struct_mutex); >> - intel_wait_for_vblank(dev, to_intel_crtc(crtc)->pipe); >> - mutex_lock(&dev->struct_mutex); >> - } >> + if (old_obj) >> intel_unpin_fb_obj(old_obj); >> - } >> >> out_unlock: >> mutex_unlock(&dev->struct_mutex); >> -- >> 1.7.9.5 >> >> _______________________________________________ >> Intel-gfx mailing list >> Intel-gfx@lists.freedesktop.org >> http://lists.freedesktop.org/mailman/listinfo/intel-gfx >
On 6/28/2013 9:35 PM, Chris Wilson wrote: > On Fri, Jun 28, 2013 at 05:24:50PM +0300, Ville Syrjälä wrote: >> On Fri, Jun 28, 2013 at 07:45:31PM +0530, Vijay Purushothaman wrote: >>> Since the sprite planes are using synchronized MMIO based flip, no need >>> to wait for vblank. Removing this wait allows us to get a nice >>> performance boost to both 3D & media workloads based on sprite (~60 fps >>> from ~20 fps) >> >> Nak. We can't unpin the buffer until the hardware has finished reading >> from it. >> >> The proper fix is to do the unpin asynchronously after the flip has >> completed. That's one part of the bigger atomic pageflip story. > > The interested reader is invited to review the patches to do async > unpinning here and in set-base sent many, many moons ago. > -Chris > Thanks Chris. I will try to dig for those patches. In case if you have the pointers handy, would you mind pointing me to the patch series? Thanks, Vijay
On Mon, Jul 01, 2013 at 10:56:06AM +0530, Vijay Purushothaman wrote: > On 6/28/2013 7:54 PM, Ville Syrjälä wrote: > > On Fri, Jun 28, 2013 at 07:45:31PM +0530, Vijay Purushothaman wrote: > >> Since the sprite planes are using synchronized MMIO based flip, no need > >> to wait for vblank. Removing this wait allows us to get a nice > >> performance boost to both 3D & media workloads based on sprite (~60 fps > >> from ~20 fps) > > > > Nak. We can't unpin the buffer until the hardware has finished reading > > from it. > > Thanks much for the review feedback. We have removed this check in our > android production branch so far we have not seen any side effect or > artifact. Is this a conservative check or do we have any use case which > will fail without this piece of code? > > Apparently the windows driver team have been using MMIO flips for Sprite > and the windows driver also didn't wait for vblank for such flips all along. > > Could you please help me understand this a bit better? This wait reduces > the perf to ~20 fps and thus prevent us from using sprite for any OGL > layer mapping in hwcomposer and we are losing significant amount of > power. For video content playback the problem is not that bad. I'd say trying to implement hwcomposer w/o atomic pageflip is anyway a doomed idea. You will see ugly glitches if/when the sprite turns on/off too late/soon. I actually tried it myself a few years ago, and then realized that it just won't cut it, which is why I started writing some of the early atomic page flip code. > > The proper fix is to do the unpin asynchronously after the flip has > > completed. That's one part of the bigger atomic pageflip story. > > > > Any idea when are we planning to merge this atomic flip in d-i-n-q? After someone has time to rewrite a bunch of the code. My rough plan: - Fix watermarks. I've been working on it, and I hope to have something reasonable ready before my summer vacation starts after this week. Then there's still the pre-pch watermark code to fix. - Implement drm_plane support for primary planes. I posted some gen2-4 sprite C code a while back which should serve as a starting point. I still haven't quite figured out how we're going to handle compatibility w/ old userland though. We may not want to expose these planes unless we know userland can handle them. I want to do this before we merge any new atomic page flip ioctl, so that we don't have to carry the legacy baggage in the new API. - Cook up some plane_config type of stuff to pre-compute the plane state - Plug in the real atomic stuff Another big issues is the locking. We really want to get fine grained locking for the atomic page flip too, so someone has to figure out how to do it. Daniel's idea of using the ww_mutex stuff for this sounds like a reasonable approach to me, but I haven't really thought about it that much yet. Currently even the non-atomic plane ioctls take the big kms lock, which plainly sucks. There are a bunch of API issues left as well, like defining new properties, figuring out the event stuff (I know some people would prefer a single even for the whole crtc, which could be done, except we have to carefully define how it behaves in certain special cases), and probably some other things I forgot already. One other interesting idea would be having some swap interval or swap target timestamp stuff in the kernel, so that userspace could even schedule multiple atomic updates in advance. This may turn out rather problematic when performing operations on multiple crtcs, since there could be several different states in the pipeline for one crtc, so the state of another crtc would possibly need to be checked against all of them, so that we can be sure whether the operation will succeed regardless of which order they end up being pushed to the hardware. Maybe we'd just limit such heavy pipelining to simple page flip scenarios, which could neatly avoid some of the problems. But work on this stuff can easily wait until the basic atomic stuff is in.
diff --git a/drivers/gpu/drm/i915/intel_sprite.c b/drivers/gpu/drm/i915/intel_sprite.c index 1fa5612..1d14fc0 100644 --- a/drivers/gpu/drm/i915/intel_sprite.c +++ b/drivers/gpu/drm/i915/intel_sprite.c @@ -828,20 +828,8 @@ intel_update_plane(struct drm_plane *plane, struct drm_crtc *crtc, intel_disable_primary(crtc); /* Unpin old obj after new one is active to avoid ugliness */ - if (old_obj) { - /* - * It's fairly common to simply update the position of - * an existing object. In that case, we don't need to - * wait for vblank to avoid ugliness, we only need to - * do the pin & ref bookkeeping. - */ - if (old_obj != obj) { - mutex_unlock(&dev->struct_mutex); - intel_wait_for_vblank(dev, to_intel_crtc(crtc)->pipe); - mutex_lock(&dev->struct_mutex); - } + if (old_obj) intel_unpin_fb_obj(old_obj); - } out_unlock: mutex_unlock(&dev->struct_mutex);