Message ID | 1389266799-13317-3-git-send-email-akash.goel@intel.com (mailing list archive) |
---|---|
State | New, archived |
Headers | show |
On Thu, Jan 09, 2014 at 04:56:39PM +0530, akash.goel@intel.com wrote: > From: Akash Goel <akash.goel@intel.com> > > Using MMIO based flips now on VLV for Media power well residency optimization. > The blitter ring is currently being used just for the command streamer based > flip calls. For pure 3D workloads, with MMIO flips, there will be no use of > blitter ring and this will ensure the 100% residency in D0i3 for Media well. > The other alternative of having Render ring based flip calls is not being used, > as that option adversly affects the performance (FPS) of certain 3D Apps Rather exporting deep magic from i915_gem, just emit the request after the mmio flip and use the normal signalling mechanisms. There are other users who could also use a request after a flip. -Chris
>> Rather exporting deep magic from i915_gem, just emit the request after the mmio flip and use the normal signalling mechanisms. There are other users who could also use a request after a flip. Sorry, couldn't understand your point. With Command streamer based flips, we could use the cross ring MBOX mechanism at Hw level, to ensure that buffer is flipped only when the rendering is completed. But with MMIO flips, need to ensure that we somehow introduce a wait for the rendering to complete, before updating the Display Surface Address register, to effect the flip. Best Regards Akash -----Original Message----- From: Chris Wilson [mailto:chris@chris-wilson.co.uk] Sent: Thursday, January 09, 2014 5:02 PM To: Goel, Akash Cc: intel-gfx@lists.freedesktop.org Subject: Re: [Intel-gfx] [PATCH 2/2] drm/i915/vlv: Replaced Blitter ring based flips with MMIO Flips for VLV. On Thu, Jan 09, 2014 at 04:56:39PM +0530, akash.goel@intel.com wrote: > From: Akash Goel <akash.goel@intel.com> > > Using MMIO based flips now on VLV for Media power well residency optimization. > The blitter ring is currently being used just for the command streamer > based flip calls. For pure 3D workloads, with MMIO flips, there will > be no use of blitter ring and this will ensure the 100% residency in D0i3 for Media well. > The other alternative of having Render ring based flip calls is not > being used, as that option adversly affects the performance (FPS) of > certain 3D Apps Rather exporting deep magic from i915_gem, just emit the request after the mmio flip and use the normal signalling mechanisms. There are other users who could also use a request after a flip. -Chris -- Chris Wilson, Intel Open Source Technology Centre
Please could you kindly elaborate here, it will help us to proceed further with this patch. Best Regards Akash -----Original Message----- From: Goel, Akash Sent: Monday, January 13, 2014 3:17 PM To: Chris Wilson Cc: intel-gfx@lists.freedesktop.org Subject: RE: [Intel-gfx] [PATCH 2/2] drm/i915/vlv: Replaced Blitter ring based flips with MMIO Flips for VLV. >> Rather exporting deep magic from i915_gem, just emit the request after the mmio flip and use the normal signalling mechanisms. There are other users who could also use a request after a flip. Sorry, couldn't understand your point. With Command streamer based flips, we could use the cross ring MBOX mechanism at Hw level, to ensure that buffer is flipped only when the rendering is completed. But with MMIO flips, need to ensure that we somehow introduce a wait for the rendering to complete, before updating the Display Surface Address register, to effect the flip. Best Regards Akash -----Original Message----- From: Chris Wilson [mailto:chris@chris-wilson.co.uk] Sent: Thursday, January 09, 2014 5:02 PM To: Goel, Akash Cc: intel-gfx@lists.freedesktop.org Subject: Re: [Intel-gfx] [PATCH 2/2] drm/i915/vlv: Replaced Blitter ring based flips with MMIO Flips for VLV. On Thu, Jan 09, 2014 at 04:56:39PM +0530, akash.goel@intel.com wrote: > From: Akash Goel <akash.goel@intel.com> > > Using MMIO based flips now on VLV for Media power well residency optimization. > The blitter ring is currently being used just for the command streamer > based flip calls. For pure 3D workloads, with MMIO flips, there will > be no use of blitter ring and this will ensure the 100% residency in D0i3 for Media well. > The other alternative of having Render ring based flip calls is not > being used, as that option adversly affects the performance (FPS) of > certain 3D Apps Rather exporting deep magic from i915_gem, just emit the request after the mmio flip and use the normal signalling mechanisms. There are other users who could also use a request after a flip. -Chris -- Chris Wilson, Intel Open Source Technology Centre
On Fri, Feb 07, 2014 at 11:59:29AM +0000, Goel, Akash wrote: > Please could you kindly elaborate here, it will help us to proceed further with this patch. As Chris said, instead of rolling your own code to track when flips are emitted to the ring, simply add a real request (with the add_request function) like the execbuf paths. Then add any additional trackin you need to our request structure. -Daniel > > Best Regards > Akash > > -----Original Message----- > From: Goel, Akash > Sent: Monday, January 13, 2014 3:17 PM > To: Chris Wilson > Cc: intel-gfx@lists.freedesktop.org > Subject: RE: [Intel-gfx] [PATCH 2/2] drm/i915/vlv: Replaced Blitter ring based flips with MMIO Flips for VLV. > > >> Rather exporting deep magic from i915_gem, just emit the request after the mmio flip and use the normal signalling mechanisms. There are other users who could also use a request after a flip. > > Sorry, couldn't understand your point. > > With Command streamer based flips, we could use the cross ring MBOX mechanism at Hw level, to ensure that buffer is flipped only when the rendering is completed. > > But with MMIO flips, need to ensure that we somehow introduce a wait for the rendering to complete, before updating the Display Surface Address register, to effect the flip. > > Best Regards > Akash > > -----Original Message----- > From: Chris Wilson [mailto:chris@chris-wilson.co.uk] > Sent: Thursday, January 09, 2014 5:02 PM > To: Goel, Akash > Cc: intel-gfx@lists.freedesktop.org > Subject: Re: [Intel-gfx] [PATCH 2/2] drm/i915/vlv: Replaced Blitter ring based flips with MMIO Flips for VLV. > > On Thu, Jan 09, 2014 at 04:56:39PM +0530, akash.goel@intel.com wrote: > > From: Akash Goel <akash.goel@intel.com> > > > > Using MMIO based flips now on VLV for Media power well residency optimization. > > The blitter ring is currently being used just for the command streamer > > based flip calls. For pure 3D workloads, with MMIO flips, there will > > be no use of blitter ring and this will ensure the 100% residency in D0i3 for Media well. > > The other alternative of having Render ring based flip calls is not > > being used, as that option adversly affects the performance (FPS) of > > certain 3D Apps > > Rather exporting deep magic from i915_gem, just emit the request after the mmio flip and use the normal signalling mechanisms. There are other users who could also use a request after a flip. > -Chris > > -- > Chris Wilson, Intel Open Source Technology Centre > _______________________________________________ > Intel-gfx mailing list > Intel-gfx@lists.freedesktop.org > http://lists.freedesktop.org/mailman/listinfo/intel-gfx
>> As Chris said, instead of rolling your own code to track when flips are emitted to the ring, simply add a real request (with the add_request function) >> like the execbuf paths. Then add any additional trackin you need to our request structure. >> -Daniel I am really sorry, but I am still not able to get it. Earlier when we were adding a MI_DISPLAY_FLIP command in the blitter ring, the cross ring MBOX synchronization at Hw level ensured that Flip command will get parsed/executed by BCS, only after the rendering has completed on RCS. But now with MMIO based flips, before updating the Display Surface register, we somehow need to wait for the rendering to get completed. But we want to avoid synchronous wait in the Page flip call. So we call the 'wait_seqno' from the context of a worker thread & once the wait is completed we update the plane register to effect a flip. I am not able to understand that how the 'add_request' will help here. Best Regards Akash -----Original Message----- From: Daniel Vetter [mailto:daniel.vetter@ffwll.ch] On Behalf Of Daniel Vetter Sent: Friday, February 07, 2014 8:17 PM To: Goel, Akash Cc: 'Chris Wilson'; 'intel-gfx@lists.freedesktop.org' Subject: Re: [Intel-gfx] [PATCH 2/2] drm/i915/vlv: Replaced Blitter ring based flips with MMIO Flips for VLV. On Fri, Feb 07, 2014 at 11:59:29AM +0000, Goel, Akash wrote: > Please could you kindly elaborate here, it will help us to proceed further with this patch. As Chris said, instead of rolling your own code to track when flips are emitted to the ring, simply add a real request (with the add_request function) like the execbuf paths. Then add any additional trackin you need to our request structure. -Daniel > > Best Regards > Akash > > -----Original Message----- > From: Goel, Akash > Sent: Monday, January 13, 2014 3:17 PM > To: Chris Wilson > Cc: intel-gfx@lists.freedesktop.org > Subject: RE: [Intel-gfx] [PATCH 2/2] drm/i915/vlv: Replaced Blitter ring based flips with MMIO Flips for VLV. > > >> Rather exporting deep magic from i915_gem, just emit the request after the mmio flip and use the normal signalling mechanisms. There are other users who could also use a request after a flip. > > Sorry, couldn't understand your point. > > With Command streamer based flips, we could use the cross ring MBOX mechanism at Hw level, to ensure that buffer is flipped only when the rendering is completed. > > But with MMIO flips, need to ensure that we somehow introduce a wait for the rendering to complete, before updating the Display Surface Address register, to effect the flip. > > Best Regards > Akash > > -----Original Message----- > From: Chris Wilson [mailto:chris@chris-wilson.co.uk] > Sent: Thursday, January 09, 2014 5:02 PM > To: Goel, Akash > Cc: intel-gfx@lists.freedesktop.org > Subject: Re: [Intel-gfx] [PATCH 2/2] drm/i915/vlv: Replaced Blitter ring based flips with MMIO Flips for VLV. > > On Thu, Jan 09, 2014 at 04:56:39PM +0530, akash.goel@intel.com wrote: > > From: Akash Goel <akash.goel@intel.com> > > > > Using MMIO based flips now on VLV for Media power well residency optimization. > > The blitter ring is currently being used just for the command > > streamer based flip calls. For pure 3D workloads, with MMIO flips, > > there will be no use of blitter ring and this will ensure the 100% residency in D0i3 for Media well. > > The other alternative of having Render ring based flip calls is not > > being used, as that option adversly affects the performance (FPS) of > > certain 3D Apps > > Rather exporting deep magic from i915_gem, just emit the request after the mmio flip and use the normal signalling mechanisms. There are other users who could also use a request after a flip. > -Chris > > -- > Chris Wilson, Intel Open Source Technology Centre > _______________________________________________ > Intel-gfx mailing list > Intel-gfx@lists.freedesktop.org > http://lists.freedesktop.org/mailman/listinfo/intel-gfx -- Daniel Vetter Software Engineer, Intel Corporation +41 (0) 79 365 57 48 - http://blog.ffwll.ch
Hi Daniel/Chris, > As Chris said, instead of rolling your own code to track when flips are emitted to the ring simply add a real request (with the add_request function) like the execbuf paths.Then add any additional trackin you need to our request structure. We can use the 'add_request' function. But since we already have & can track the seqno of the object (for which we want the rendering to complete), we think there shall be no additional benefit in adding a new request & then tracking the same. > Rather exporting deep magic from i915_gem, just emit the request after the mmio flip and use the normal signalling mechanisms. There are other users who could also use a request after a flip. We had the following point in mind, when implementing the Mmio based Page flips : We wanted to completely avoid locking of the device mutex from the flip path. As we had seen sometimes the flips getting delayed because of concurrent exec buffers processing, while we are waiting for them to release the mutex. Since the public functions (i915_wait_seqno ) require mutex to be taken beforehand, we had no choice but to expose the private __wait_seqno function in order to do so. Also, we couldn't find any other signaling mechanism (other than wait_seqno type of functions) to do so. Can you please provide your feedback on the above points. Regards, Sourab -----Original Message----- From: Daniel Vetter [mailto:daniel.vetter@ffwll.ch] On Behalf Of Daniel Vetter Sent: Friday, February 07, 2014 8:17 PM To: Goel, Akash Cc: 'Chris Wilson'; 'intel-gfx@lists.freedesktop.org' Subject: Re: [Intel-gfx] [PATCH 2/2] drm/i915/vlv: Replaced Blitter ring based flips with MMIO Flips for VLV. On Fri, Feb 07, 2014 at 11:59:29AM +0000, Goel, Akash wrote: > Please could you kindly elaborate here, it will help us to proceed further with this patch. As Chris said, instead of rolling your own code to track when flips are emitted to the ring, simply add a real request (with the add_request function) like the execbuf paths. Then add any additional trackin you need to our request structure. -Daniel > > Best Regards > Akash > > -----Original Message----- > From: Goel, Akash > Sent: Monday, January 13, 2014 3:17 PM > To: Chris Wilson > Cc: intel-gfx@lists.freedesktop.org > Subject: RE: [Intel-gfx] [PATCH 2/2] drm/i915/vlv: Replaced Blitter ring based flips with MMIO Flips for VLV. > > >> Rather exporting deep magic from i915_gem, just emit the request after the mmio flip and use the normal signalling mechanisms. There are other users who could also use a request after a flip. > > Sorry, couldn't understand your point. > > With Command streamer based flips, we could use the cross ring MBOX mechanism at Hw level, to ensure that buffer is flipped only when the rendering is completed. > > But with MMIO flips, need to ensure that we somehow introduce a wait for the rendering to complete, before updating the Display Surface Address register, to effect the flip. > > Best Regards > Akash > > -----Original Message----- > From: Chris Wilson [mailto:chris@chris-wilson.co.uk] > Sent: Thursday, January 09, 2014 5:02 PM > To: Goel, Akash > Cc: intel-gfx@lists.freedesktop.org > Subject: Re: [Intel-gfx] [PATCH 2/2] drm/i915/vlv: Replaced Blitter ring based flips with MMIO Flips for VLV. > > On Thu, Jan 09, 2014 at 04:56:39PM +0530, akash.goel@intel.com wrote: > > From: Akash Goel <akash.goel@intel.com> > > > > Using MMIO based flips now on VLV for Media power well residency optimization. > > The blitter ring is currently being used just for the command > > streamer based flip calls. For pure 3D workloads, with MMIO flips, > > there will be no use of blitter ring and this will ensure the 100% residency in D0i3 for Media well. > > The other alternative of having Render ring based flip calls is not > > being used, as that option adversly affects the performance (FPS) of > > certain 3D Apps > > Rather exporting deep magic from i915_gem, just emit the request after the mmio flip and use the normal signalling mechanisms. There are other users who could also use a request after a flip. > -Chris > > -- > Chris Wilson, Intel Open Source Technology Centre > _______________________________________________ > Intel-gfx mailing list > Intel-gfx@lists.freedesktop.org > http://lists.freedesktop.org/mailman/listinfo/intel-gfx -- Daniel Vetter Software Engineer, Intel Corporation +41 (0) 79 365 57 48 - http://blog.ffwll.ch
On Fri, Mar 07, 2014 at 01:17:17PM +0000, Gupta, Sourab wrote: > Hi Daniel/Chris, > > > As Chris said, instead of rolling your own code to track when flips are emitted to the ring simply add a real request (with the add_request function) like the execbuf paths.Then add any additional trackin you need to our request structure. > > We can use the 'add_request' function. But since we already have & can track the seqno of the object (for which we want the rendering to complete), we think there shall be no additional benefit in adding a new request & then tracking the same. > > > Rather exporting deep magic from i915_gem, just emit the request after the mmio flip and use the normal signalling mechanisms. There are other users who could also use a request after a flip. > > We had the following point in mind, when implementing the Mmio based Page flips : > We wanted to completely avoid locking of the device mutex from the flip path. As we had seen sometimes the flips getting delayed because of concurrent exec buffers processing, while we are waiting for them to release the mutex. > Since the public functions (i915_wait_seqno ) require mutex to be taken beforehand, we had no choice but to expose the private __wait_seqno function in order to do so. > Also, we couldn't find any other signaling mechanism (other than wait_seqno type of functions) to do so. > > Can you please provide your feedback on the above points. I'd like to move this towards the atomic age. In my atomic branch I had an interrupt driven mechanism for issuing the flips when the target seqno(s) is/are reached. So my hope would be to move towards something similar so that it can be easily used for the nuclear flip later. Here's my latest code for that: https://gitorious.org/vsyrjala/linux/commit/4cad93ab1ac09d4649419789a80408c5d8505cdc
diff --git a/drivers/gpu/drm/i915/i915_drv.h b/drivers/gpu/drm/i915/i915_drv.h index 2e22430..6d1e496 100644 --- a/drivers/gpu/drm/i915/i915_drv.h +++ b/drivers/gpu/drm/i915/i915_drv.h @@ -74,6 +74,7 @@ enum plane { PLANE_A = 0, PLANE_B, PLANE_C, + I915_MAX_PLANES }; #define plane_name(p) ((p) + 'A') @@ -2114,6 +2115,7 @@ bool i915_gem_retire_requests(struct drm_device *dev); void i915_gem_retire_requests_ring(struct intel_ring_buffer *ring); int __must_check i915_gem_check_wedge(struct i915_gpu_error *error, bool interruptible); +int i915_gem_check_olr(struct intel_ring_buffer *ring, u32 seqno); static inline bool i915_reset_in_progress(struct i915_gpu_error *error) { return unlikely(atomic_read(&error->reset_counter) diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c index 656406d..1a33501 100644 --- a/drivers/gpu/drm/i915/i915_gem.c +++ b/drivers/gpu/drm/i915/i915_gem.c @@ -957,7 +957,7 @@ i915_gem_check_wedge(struct i915_gpu_error *error, * Compare seqno against outstanding lazy request. Emit a request if they are * equal. */ -static int +int i915_gem_check_olr(struct intel_ring_buffer *ring, u32 seqno) { int ret; @@ -1008,7 +1008,7 @@ static bool can_wait_boost(struct drm_i915_file_private *file_priv) * Returns 0 if the seqno was found within the alloted time. Else returns the * errno with remaining time filled in timeout argument. */ -static int __wait_seqno(struct intel_ring_buffer *ring, u32 seqno, +int __wait_seqno(struct intel_ring_buffer *ring, u32 seqno, unsigned reset_counter, bool interruptible, struct timespec *timeout, diff --git a/drivers/gpu/drm/i915/intel_display.c b/drivers/gpu/drm/i915/intel_display.c index 4d1357a..25aa3a8 100644 --- a/drivers/gpu/drm/i915/intel_display.c +++ b/drivers/gpu/drm/i915/intel_display.c @@ -52,6 +52,9 @@ static void ironlake_pch_clock_get(struct intel_crtc *crtc, static int intel_set_mode(struct drm_crtc *crtc, struct drm_display_mode *mode, int x, int y, struct drm_framebuffer *old_fb); +int __wait_seqno(struct intel_ring_buffer *ring, u32 seqno, unsigned reset_counter, + bool interruptible, struct timespec *timeout, + struct drm_i915_file_private *file_priv); typedef struct { int min, max; @@ -68,6 +71,24 @@ struct intel_limit { intel_p2_t p2; }; +struct i915_flip_data { + struct drm_crtc *crtc; + u32 seqno; + u32 ring_id; +}; + +struct i915_flip_work { + struct i915_flip_data flipdata; + struct work_struct work; +}; + +/* + * Need one work item only for each primary plane, + * as we support only one outstanding flip request + * on each plane at a time. + */ +static struct i915_flip_work flip_works[I915_MAX_PLANES]; + int intel_pch_rawclk(struct drm_device *dev) { @@ -8588,6 +8609,123 @@ err: return ret; } +static void intel_gen7_queue_mmio_flip_work(struct work_struct *__work) +{ + struct i915_flip_work *flipwork = + container_of(__work, struct i915_flip_work, work); + int ret = 0; + unsigned int reset_counter; + struct drm_crtc *crtc = flipwork->flipdata.crtc; + struct intel_crtc *intel_crtc = to_intel_crtc(crtc); + struct drm_device *dev = crtc->dev; + struct drm_i915_private *dev_priv = dev->dev_private; + struct intel_ring_buffer *ring = + &dev_priv->ring[flipwork->flipdata.ring_id]; + + if (dev_priv->ums.mm_suspended || (ring->obj == NULL)) { + DRM_ERROR("flip attempted while the ring is not ready\n"); + return; + } + + /* + * Wait is needed only for nonZero Seqnos, as zero Seqno indicates + * that either the rendering on the object (through GPU) is already + * completed or not intiated at all + */ + if (flipwork->flipdata.seqno > 0) { + reset_counter = + atomic_read(&dev_priv->gpu_error.reset_counter); + /* sleep wait until the seqno has passed */ + ret = __wait_seqno(ring, flipwork->flipdata.seqno, + reset_counter, true, NULL, NULL); + if (ret) + DRM_ERROR("wait_seqno failed on seqno 0x%x(%d)\n", + flipwork->flipdata.seqno, + flipwork->flipdata.ring_id); + } + + intel_mark_page_flip_active(intel_crtc); + i9xx_update_plane(crtc, crtc->fb, 0, 0); +} + +/* + * Using MMIO based flips starting from VLV, for Media power well + * residency optimization. The other alternative of having Render + * ring based flip calls is not being used, as that option adversly + * affects the performance (FPS) of certain 3D Apps. + */ +static int intel_gen7_queue_mmio_flip(struct drm_device *dev, + struct drm_crtc *crtc, + struct drm_framebuffer *fb, + struct drm_i915_gem_object *obj, + uint32_t flags) +{ + struct drm_i915_private *dev_priv = dev->dev_private; + struct intel_crtc *intel_crtc = to_intel_crtc(crtc); + struct i915_flip_work *work = &flip_works[intel_crtc->plane]; + int ret; + + ret = intel_pin_and_fence_fb_obj(dev, obj, obj->ring); + if (ret) + goto err; + + switch (intel_crtc->plane) { + case PLANE_A: + case PLANE_B: + case PLANE_C: + break; + default: + WARN_ONCE(1, "unknown plane in flip command\n"); + ret = -ENODEV; + goto err_unpin; + } + + work->flipdata.crtc = crtc; + work->flipdata.seqno = obj->last_write_seqno; + work->flipdata.ring_id = RCS; + + if (obj->last_write_seqno > 0) { + if (obj->ring) { + work->flipdata.ring_id = obj->ring->id; + /* + * Check if there is a need to add the request + * in the ring to emit the seqno for this fb obj + */ + ret = i915_gem_check_olr(obj->ring, + obj->last_write_seqno); + if (ret) + goto err_unpin; + } else { + DRM_ERROR("NULL ring for active obj with seqno %x\n", + obj->last_write_seqno); + ret = -EINVAL; + goto err_unpin; + } + } + + /* + * Flush the work as it could be running now. + * Although this could only happen in a particular, + * very rare condition, as we allow only 1 outstanding + * flip at a time through the 'unpin_work' variable. + * To be checked, if its really needed or not + */ + flush_work(&work->work); + + /* + * Queue the MMIO flip work in our private workqueue. + */ + queue_work(dev_priv->flipwq, &work->work); + + return 0; + +err_unpin: + intel_unpin_fb_obj(obj); +err: + return ret; +} + + static int intel_default_queue_flip(struct drm_device *dev, struct drm_crtc *crtc, struct drm_framebuffer *fb, @@ -10171,6 +10309,12 @@ static void intel_crtc_init(struct drm_device *dev, int pipe) dev_priv->pipe_to_crtc_mapping[intel_crtc->pipe] = &intel_crtc->base; drm_crtc_helper_add(&intel_crtc->base, &intel_helper_funcs); + + /* + * Initialize the flip work item (one per primary plane) + */ + INIT_WORK(&flip_works[intel_crtc->plane].work, + intel_gen7_queue_mmio_flip_work); } enum pipe intel_get_pipe_from_connector(struct intel_connector *connector) @@ -10675,6 +10819,9 @@ static void intel_init_display(struct drm_device *dev) break; } + if (IS_VALLEYVIEW(dev)) + dev_priv->display.queue_flip = intel_gen7_queue_mmio_flip; + intel_panel_init_backlight_funcs(dev); }