Message ID | 1445349004-16409-16-git-send-email-paulo.r.zanoni@intel.com (mailing list archive) |
---|---|
State | New, archived |
Headers | show |
On Tue, Oct 20, 2015 at 11:50:01AM -0200, Paulo Zanoni wrote: > One of the problems with the current code is that it frees the CFB and > releases its drm_mm node as soon as we flip FBC's enable bit. This is > bad because after we disbale FBC the hardware may still use the CFB > for the rest of the frame, so in theory we should only release the > drm_mm node one frame after we disable FBC. Otherwise, a stolen memory > allocation done right after an FBC disable may result in either > corrupted memory for the new owner of that memory region or corrupted > screen/underruns in case the new owner changes it while the hardware > is still reading it. This case is not exactly easy to reproduce since > we currently don't do a lot of stolen memory allocations, but I see > patches on the mailing list trying to expose stolen memory to user > space, so races will be possible. > > I thought about three different approaches to solve this, and they all > have downsides. > > The first approach would be to simply use multiple drm_mm nodes and > freeing the unused ones only after a frame has passed. The problem > with this approach is that since stolen memory is rather small, > there's a risk we just won't be able to allocate a new CFB from stolen > if the previous one was not freed yet. This could happen in case we > quickly disable FBC from pipe A and decide to enable it on pipe B, or > just if we change pipe A's fb stride while FBC is enabled. > > The second approach would be similar to the first one, but maintaining > a single drm_mm node and keeping track of when it can be reused. This > would remove the disadvantage of not having enough space for two > nodes, but would create the new problem where we may not be able to > enable FBC at the point intel_fbc_update() is called, so we would have > to add more code to retry updating FBC after the time has passed. And > that can quickly get too complex since we can get invalidate, flush, > flip_prepare, disable and other calls in the middle of the wait. > > Both solutions above - and also the current code - have the problem > that we unnecessarily free+realloc FBC during invalidate+flush > operations even if the CFB size doesn't change. > > The third option would be to move the allocation/deallocation to > enable/disable. This makes sure that the pipe is always disabled when > we allocate/deallocate the CFB, so there's no risk that the FBC > hardware may read or write to the memory right after it is freed from > drm_mm. The downside is that it is possible for user space to change > the buffer stride without triggering a disable/enable - only > deactivate/activate -, so we'll have to handle this case somehow, even > though it is uncommon - see igt's kms_frontbuffer_tracking test, > fbc-stridechange subtest. It could be possible to implement a way to > free+alloc the CFB during said stride change, but it would involve a > lot of book-keeping - exactly as mentioned above - just for a rare > case, so for now I'll keep it simple and just deactivate FBC. Besides, > we may not even need to disable FBC since we do CFB over-allocation. > > Signed-off-by: Paulo Zanoni <paulo.r.zanoni@intel.com> > --- > drivers/gpu/drm/i915/intel_fbc.c | 132 ++++++++++++++++++++------------------- > 1 file changed, 68 insertions(+), 64 deletions(-) > > diff --git a/drivers/gpu/drm/i915/intel_fbc.c b/drivers/gpu/drm/i915/intel_fbc.c > index bf855b2..48d8cfc 100644 > --- a/drivers/gpu/drm/i915/intel_fbc.c > +++ b/drivers/gpu/drm/i915/intel_fbc.c > @@ -59,6 +59,45 @@ static unsigned int get_crtc_fence_y_offset(struct intel_crtc *crtc) > return crtc->base.y - crtc->adjusted_y; > } > > +/* > + * For SKL+, the plane source size used by the hardware is based on the value we > + * write to the PLANE_SIZE register. For BDW-, the hardware looks at the value > + * we wrote to PIPESRC. > + */ > +static void intel_fbc_get_plane_source_size(struct intel_crtc *crtc, > + int *width, int *height) > +{ > + struct intel_plane_state *plane_state = > + to_intel_plane_state(crtc->base.primary->state); > + int w, h; > + > + if (intel_rotation_90_or_270(plane_state->base.rotation)) { > + w = drm_rect_height(&plane_state->src) >> 16; > + h = drm_rect_width(&plane_state->src) >> 16; > + } else { > + w = drm_rect_width(&plane_state->src) >> 16; > + h = drm_rect_height(&plane_state->src) >> 16; > + } > + > + if (width) > + *width = w; > + if (height) > + *height = h; > +} > + > +static int intel_fbc_calculate_cfb_size(struct intel_crtc *crtc, > + struct drm_framebuffer *fb) > +{ > + struct drm_i915_private *dev_priv = crtc->base.dev->dev_private; > + int lines; > + > + intel_fbc_get_plane_source_size(crtc, NULL, &lines); > + if (INTEL_INFO(dev_priv)->gen >= 7) > + lines = min(lines, 2048); > + > + return lines * fb->pitches[0]; Why are we even looking at fb->pitches here? I thought that for fbc we only compress what we actually scan out, i.e. the source rectangle of the plane in pixels. Maybe some rounding involved perhaps. Or did I miss something? -Daniel > +} > + > static void i8xx_fbc_deactivate(struct drm_i915_private *dev_priv) > { > u32 fbc_ctl; > @@ -604,11 +643,17 @@ again: > } > } > > -static int intel_fbc_alloc_cfb(struct drm_i915_private *dev_priv, int size, > - int fb_cpp) > +static int intel_fbc_alloc_cfb(struct intel_crtc *crtc) > { > + struct drm_i915_private *dev_priv = crtc->base.dev->dev_private; > + struct drm_framebuffer *fb = crtc->base.primary->state->fb; > struct drm_mm_node *uninitialized_var(compressed_llb); > - int ret; > + int size, fb_cpp, ret; > + > + WARN_ON(dev_priv->fbc.compressed_fb.allocated); > + > + size = intel_fbc_calculate_cfb_size(crtc, fb); > + fb_cpp = drm_format_plane_cpp(fb->pixel_format, 0); > > ret = find_compression_threshold(dev_priv, &dev_priv->fbc.compressed_fb, > size, fb_cpp); > @@ -685,64 +730,6 @@ void intel_fbc_cleanup_cfb(struct drm_i915_private *dev_priv) > mutex_unlock(&dev_priv->fbc.lock); > } > > -/* > - * For SKL+, the plane source size used by the hardware is based on the value we > - * write to the PLANE_SIZE register. For BDW-, the hardware looks at the value > - * we wrote to PIPESRC. > - */ > -static void intel_fbc_get_plane_source_size(struct intel_crtc *crtc, > - int *width, int *height) > -{ > - struct intel_plane_state *plane_state = > - to_intel_plane_state(crtc->base.primary->state); > - int w, h; > - > - if (intel_rotation_90_or_270(plane_state->base.rotation)) { > - w = drm_rect_height(&plane_state->src) >> 16; > - h = drm_rect_width(&plane_state->src) >> 16; > - } else { > - w = drm_rect_width(&plane_state->src) >> 16; > - h = drm_rect_height(&plane_state->src) >> 16; > - } > - > - if (width) > - *width = w; > - if (height) > - *height = h; > -} > - > -static int intel_fbc_calculate_cfb_size(struct intel_crtc *crtc) > -{ > - struct drm_i915_private *dev_priv = crtc->base.dev->dev_private; > - struct drm_framebuffer *fb = crtc->base.primary->fb; > - int lines; > - > - intel_fbc_get_plane_source_size(crtc, NULL, &lines); > - if (INTEL_INFO(dev_priv)->gen >= 7) > - lines = min(lines, 2048); > - > - return lines * fb->pitches[0]; > -} > - > -static int intel_fbc_setup_cfb(struct intel_crtc *crtc) > -{ > - struct drm_i915_private *dev_priv = crtc->base.dev->dev_private; > - struct drm_framebuffer *fb = crtc->base.primary->fb; > - int size, cpp; > - > - size = intel_fbc_calculate_cfb_size(crtc); > - cpp = drm_format_plane_cpp(fb->pixel_format, 0); > - > - if (dev_priv->fbc.compressed_fb.allocated && > - size <= dev_priv->fbc.compressed_fb.size * dev_priv->fbc.threshold) > - return 0; > - > - /* Release any current block */ > - __intel_fbc_cleanup_cfb(dev_priv); > - > - return intel_fbc_alloc_cfb(dev_priv, size, cpp); > -} > - > static bool stride_is_valid(struct drm_i915_private *dev_priv, > unsigned int stride) > { > @@ -904,8 +891,19 @@ static void __intel_fbc_update(struct intel_crtc *crtc) > goto out_disable; > } > > - if (intel_fbc_setup_cfb(crtc)) { > - set_no_fbc_reason(dev_priv, "not enough stolen memory"); > + /* It is possible for the required CFB size change without a > + * crtc->disable + crtc->enable since it is possible to change the > + * stride without triggering a full modeset. Since we try to > + * over-allocate the CFB, there's a chance we may keep FBC enabled even > + * if this happens, but if we exceed the current CFB size we'll have to > + * disable FBC. Notice that it would be possible to disable FBC, wait > + * for a frame, free the stolen node, then try to reenable FBC in case > + * we didn't get any invalidate/deactivate calls, but this would require > + * a lot of tracking just for a case we expect to be uncommon, so we > + * just don't have this code for now. */ > + if (intel_fbc_calculate_cfb_size(crtc, fb) > > + dev_priv->fbc.compressed_fb.size * dev_priv->fbc.threshold) { > + set_no_fbc_reason(dev_priv, "CFB requirements changed"); > goto out_disable; > } > > @@ -958,7 +956,6 @@ out_disable: > DRM_DEBUG_KMS("unsupported config, deactivating FBC\n"); > __intel_fbc_deactivate(dev_priv); > } > - __intel_fbc_cleanup_cfb(dev_priv); > } > > /* > @@ -1102,6 +1099,11 @@ void intel_fbc_enable(struct intel_crtc *crtc) > goto out; > } > > + if (intel_fbc_alloc_cfb(crtc)) { > + set_no_fbc_reason(dev_priv, "not enough stolen memory"); > + goto out; > + } > + > DRM_DEBUG_KMS("Enabling FBC on pipe %c\n", pipe_name(crtc->pipe)); > dev_priv->fbc.no_fbc_reason = "FBC enabled but not active yet\n"; > > @@ -1129,6 +1131,8 @@ static void __intel_fbc_disable(struct drm_i915_private *dev_priv) > > DRM_DEBUG_KMS("Disabling FBC on pipe %c\n", pipe_name(crtc->pipe)); > > + __intel_fbc_cleanup_cfb(dev_priv); > + > dev_priv->fbc.enabled = false; > dev_priv->fbc.crtc = NULL; > } > -- > 2.6.1 > > _______________________________________________ > Intel-gfx mailing list > Intel-gfx@lists.freedesktop.org > http://lists.freedesktop.org/mailman/listinfo/intel-gfx
On Wed, Oct 21, 2015 at 09:11:08AM +0200, Daniel Vetter wrote: > On Tue, Oct 20, 2015 at 11:50:01AM -0200, Paulo Zanoni wrote: > > One of the problems with the current code is that it frees the CFB and > > releases its drm_mm node as soon as we flip FBC's enable bit. This is > > bad because after we disbale FBC the hardware may still use the CFB > > for the rest of the frame, so in theory we should only release the > > drm_mm node one frame after we disable FBC. Otherwise, a stolen memory > > allocation done right after an FBC disable may result in either > > corrupted memory for the new owner of that memory region or corrupted > > screen/underruns in case the new owner changes it while the hardware > > is still reading it. This case is not exactly easy to reproduce since > > we currently don't do a lot of stolen memory allocations, but I see > > patches on the mailing list trying to expose stolen memory to user > > space, so races will be possible. > > > > I thought about three different approaches to solve this, and they all > > have downsides. > > > > The first approach would be to simply use multiple drm_mm nodes and > > freeing the unused ones only after a frame has passed. The problem > > with this approach is that since stolen memory is rather small, > > there's a risk we just won't be able to allocate a new CFB from stolen > > if the previous one was not freed yet. This could happen in case we > > quickly disable FBC from pipe A and decide to enable it on pipe B, or > > just if we change pipe A's fb stride while FBC is enabled. > > > > The second approach would be similar to the first one, but maintaining > > a single drm_mm node and keeping track of when it can be reused. This > > would remove the disadvantage of not having enough space for two > > nodes, but would create the new problem where we may not be able to > > enable FBC at the point intel_fbc_update() is called, so we would have > > to add more code to retry updating FBC after the time has passed. And > > that can quickly get too complex since we can get invalidate, flush, > > flip_prepare, disable and other calls in the middle of the wait. > > > > Both solutions above - and also the current code - have the problem > > that we unnecessarily free+realloc FBC during invalidate+flush > > operations even if the CFB size doesn't change. > > > > The third option would be to move the allocation/deallocation to > > enable/disable. This makes sure that the pipe is always disabled when > > we allocate/deallocate the CFB, so there's no risk that the FBC > > hardware may read or write to the memory right after it is freed from > > drm_mm. The downside is that it is possible for user space to change > > the buffer stride without triggering a disable/enable - only > > deactivate/activate -, so we'll have to handle this case somehow, even > > though it is uncommon - see igt's kms_frontbuffer_tracking test, > > fbc-stridechange subtest. It could be possible to implement a way to > > free+alloc the CFB during said stride change, but it would involve a > > lot of book-keeping - exactly as mentioned above - just for a rare > > case, so for now I'll keep it simple and just deactivate FBC. Besides, > > we may not even need to disable FBC since we do CFB over-allocation. > > > > Signed-off-by: Paulo Zanoni <paulo.r.zanoni@intel.com> > > --- > > drivers/gpu/drm/i915/intel_fbc.c | 132 ++++++++++++++++++++------------------- > > 1 file changed, 68 insertions(+), 64 deletions(-) > > > > diff --git a/drivers/gpu/drm/i915/intel_fbc.c b/drivers/gpu/drm/i915/intel_fbc.c > > index bf855b2..48d8cfc 100644 > > --- a/drivers/gpu/drm/i915/intel_fbc.c > > +++ b/drivers/gpu/drm/i915/intel_fbc.c > > @@ -59,6 +59,45 @@ static unsigned int get_crtc_fence_y_offset(struct intel_crtc *crtc) > > return crtc->base.y - crtc->adjusted_y; > > } > > > > +/* > > + * For SKL+, the plane source size used by the hardware is based on the value we > > + * write to the PLANE_SIZE register. For BDW-, the hardware looks at the value > > + * we wrote to PIPESRC. > > + */ > > +static void intel_fbc_get_plane_source_size(struct intel_crtc *crtc, > > + int *width, int *height) > > +{ > > + struct intel_plane_state *plane_state = > > + to_intel_plane_state(crtc->base.primary->state); > > + int w, h; > > + > > + if (intel_rotation_90_or_270(plane_state->base.rotation)) { > > + w = drm_rect_height(&plane_state->src) >> 16; > > + h = drm_rect_width(&plane_state->src) >> 16; > > + } else { > > + w = drm_rect_width(&plane_state->src) >> 16; > > + h = drm_rect_height(&plane_state->src) >> 16; > > + } > > + > > + if (width) > > + *width = w; > > + if (height) > > + *height = h; > > +} > > + > > +static int intel_fbc_calculate_cfb_size(struct intel_crtc *crtc, > > + struct drm_framebuffer *fb) > > +{ > > + struct drm_i915_private *dev_priv = crtc->base.dev->dev_private; > > + int lines; > > + > > + intel_fbc_get_plane_source_size(crtc, NULL, &lines); > > + if (INTEL_INFO(dev_priv)->gen >= 7) > > + lines = min(lines, 2048); > > + > > + return lines * fb->pitches[0]; > > Why are we even looking at fb->pitches here? I thought that for fbc we > only compress what we actually scan out, i.e. the source rectangle of the > plane in pixels. Maybe some rounding involved perhaps. IIRC the docs do say the hw uses the plane stride register figure this out. It's something I've been wondering for a few years now, but never actually tested to see if we can get away with less. > > Or did I miss something? > -Daniel > > > +} > > + > > static void i8xx_fbc_deactivate(struct drm_i915_private *dev_priv) > > { > > u32 fbc_ctl; > > @@ -604,11 +643,17 @@ again: > > } > > } > > > > -static int intel_fbc_alloc_cfb(struct drm_i915_private *dev_priv, int size, > > - int fb_cpp) > > +static int intel_fbc_alloc_cfb(struct intel_crtc *crtc) > > { > > + struct drm_i915_private *dev_priv = crtc->base.dev->dev_private; > > + struct drm_framebuffer *fb = crtc->base.primary->state->fb; > > struct drm_mm_node *uninitialized_var(compressed_llb); > > - int ret; > > + int size, fb_cpp, ret; > > + > > + WARN_ON(dev_priv->fbc.compressed_fb.allocated); > > + > > + size = intel_fbc_calculate_cfb_size(crtc, fb); > > + fb_cpp = drm_format_plane_cpp(fb->pixel_format, 0); > > > > ret = find_compression_threshold(dev_priv, &dev_priv->fbc.compressed_fb, > > size, fb_cpp); > > @@ -685,64 +730,6 @@ void intel_fbc_cleanup_cfb(struct drm_i915_private *dev_priv) > > mutex_unlock(&dev_priv->fbc.lock); > > } > > > > -/* > > - * For SKL+, the plane source size used by the hardware is based on the value we > > - * write to the PLANE_SIZE register. For BDW-, the hardware looks at the value > > - * we wrote to PIPESRC. > > - */ > > -static void intel_fbc_get_plane_source_size(struct intel_crtc *crtc, > > - int *width, int *height) > > -{ > > - struct intel_plane_state *plane_state = > > - to_intel_plane_state(crtc->base.primary->state); > > - int w, h; > > - > > - if (intel_rotation_90_or_270(plane_state->base.rotation)) { > > - w = drm_rect_height(&plane_state->src) >> 16; > > - h = drm_rect_width(&plane_state->src) >> 16; > > - } else { > > - w = drm_rect_width(&plane_state->src) >> 16; > > - h = drm_rect_height(&plane_state->src) >> 16; > > - } > > - > > - if (width) > > - *width = w; > > - if (height) > > - *height = h; > > -} > > - > > -static int intel_fbc_calculate_cfb_size(struct intel_crtc *crtc) > > -{ > > - struct drm_i915_private *dev_priv = crtc->base.dev->dev_private; > > - struct drm_framebuffer *fb = crtc->base.primary->fb; > > - int lines; > > - > > - intel_fbc_get_plane_source_size(crtc, NULL, &lines); > > - if (INTEL_INFO(dev_priv)->gen >= 7) > > - lines = min(lines, 2048); > > - > > - return lines * fb->pitches[0]; > > -} > > - > > -static int intel_fbc_setup_cfb(struct intel_crtc *crtc) > > -{ > > - struct drm_i915_private *dev_priv = crtc->base.dev->dev_private; > > - struct drm_framebuffer *fb = crtc->base.primary->fb; > > - int size, cpp; > > - > > - size = intel_fbc_calculate_cfb_size(crtc); > > - cpp = drm_format_plane_cpp(fb->pixel_format, 0); > > - > > - if (dev_priv->fbc.compressed_fb.allocated && > > - size <= dev_priv->fbc.compressed_fb.size * dev_priv->fbc.threshold) > > - return 0; > > - > > - /* Release any current block */ > > - __intel_fbc_cleanup_cfb(dev_priv); > > - > > - return intel_fbc_alloc_cfb(dev_priv, size, cpp); > > -} > > - > > static bool stride_is_valid(struct drm_i915_private *dev_priv, > > unsigned int stride) > > { > > @@ -904,8 +891,19 @@ static void __intel_fbc_update(struct intel_crtc *crtc) > > goto out_disable; > > } > > > > - if (intel_fbc_setup_cfb(crtc)) { > > - set_no_fbc_reason(dev_priv, "not enough stolen memory"); > > + /* It is possible for the required CFB size change without a > > + * crtc->disable + crtc->enable since it is possible to change the > > + * stride without triggering a full modeset. Since we try to > > + * over-allocate the CFB, there's a chance we may keep FBC enabled even > > + * if this happens, but if we exceed the current CFB size we'll have to > > + * disable FBC. Notice that it would be possible to disable FBC, wait > > + * for a frame, free the stolen node, then try to reenable FBC in case > > + * we didn't get any invalidate/deactivate calls, but this would require > > + * a lot of tracking just for a case we expect to be uncommon, so we > > + * just don't have this code for now. */ > > + if (intel_fbc_calculate_cfb_size(crtc, fb) > > > + dev_priv->fbc.compressed_fb.size * dev_priv->fbc.threshold) { > > + set_no_fbc_reason(dev_priv, "CFB requirements changed"); > > goto out_disable; > > } > > > > @@ -958,7 +956,6 @@ out_disable: > > DRM_DEBUG_KMS("unsupported config, deactivating FBC\n"); > > __intel_fbc_deactivate(dev_priv); > > } > > - __intel_fbc_cleanup_cfb(dev_priv); > > } > > > > /* > > @@ -1102,6 +1099,11 @@ void intel_fbc_enable(struct intel_crtc *crtc) > > goto out; > > } > > > > + if (intel_fbc_alloc_cfb(crtc)) { > > + set_no_fbc_reason(dev_priv, "not enough stolen memory"); > > + goto out; > > + } > > + > > DRM_DEBUG_KMS("Enabling FBC on pipe %c\n", pipe_name(crtc->pipe)); > > dev_priv->fbc.no_fbc_reason = "FBC enabled but not active yet\n"; > > > > @@ -1129,6 +1131,8 @@ static void __intel_fbc_disable(struct drm_i915_private *dev_priv) > > > > DRM_DEBUG_KMS("Disabling FBC on pipe %c\n", pipe_name(crtc->pipe)); > > > > + __intel_fbc_cleanup_cfb(dev_priv); > > + > > dev_priv->fbc.enabled = false; > > dev_priv->fbc.crtc = NULL; > > } > > -- > > 2.6.1 > > > > _______________________________________________ > > Intel-gfx mailing list > > Intel-gfx@lists.freedesktop.org > > http://lists.freedesktop.org/mailman/listinfo/intel-gfx > > -- > Daniel Vetter > Software Engineer, Intel Corporation > http://blog.ffwll.ch > _______________________________________________ > Intel-gfx mailing list > Intel-gfx@lists.freedesktop.org > http://lists.freedesktop.org/mailman/listinfo/intel-gfx
On Wed, Oct 21, 2015 at 10:20:55AM +0300, Ville Syrjälä wrote: > On Wed, Oct 21, 2015 at 09:11:08AM +0200, Daniel Vetter wrote: > > On Tue, Oct 20, 2015 at 11:50:01AM -0200, Paulo Zanoni wrote: > > > One of the problems with the current code is that it frees the CFB and > > > releases its drm_mm node as soon as we flip FBC's enable bit. This is > > > bad because after we disbale FBC the hardware may still use the CFB > > > for the rest of the frame, so in theory we should only release the > > > drm_mm node one frame after we disable FBC. Otherwise, a stolen memory > > > allocation done right after an FBC disable may result in either > > > corrupted memory for the new owner of that memory region or corrupted > > > screen/underruns in case the new owner changes it while the hardware > > > is still reading it. This case is not exactly easy to reproduce since > > > we currently don't do a lot of stolen memory allocations, but I see > > > patches on the mailing list trying to expose stolen memory to user > > > space, so races will be possible. > > > > > > I thought about three different approaches to solve this, and they all > > > have downsides. > > > > > > The first approach would be to simply use multiple drm_mm nodes and > > > freeing the unused ones only after a frame has passed. The problem > > > with this approach is that since stolen memory is rather small, > > > there's a risk we just won't be able to allocate a new CFB from stolen > > > if the previous one was not freed yet. This could happen in case we > > > quickly disable FBC from pipe A and decide to enable it on pipe B, or > > > just if we change pipe A's fb stride while FBC is enabled. > > > > > > The second approach would be similar to the first one, but maintaining > > > a single drm_mm node and keeping track of when it can be reused. This > > > would remove the disadvantage of not having enough space for two > > > nodes, but would create the new problem where we may not be able to > > > enable FBC at the point intel_fbc_update() is called, so we would have > > > to add more code to retry updating FBC after the time has passed. And > > > that can quickly get too complex since we can get invalidate, flush, > > > flip_prepare, disable and other calls in the middle of the wait. > > > > > > Both solutions above - and also the current code - have the problem > > > that we unnecessarily free+realloc FBC during invalidate+flush > > > operations even if the CFB size doesn't change. > > > > > > The third option would be to move the allocation/deallocation to > > > enable/disable. This makes sure that the pipe is always disabled when > > > we allocate/deallocate the CFB, so there's no risk that the FBC > > > hardware may read or write to the memory right after it is freed from > > > drm_mm. The downside is that it is possible for user space to change > > > the buffer stride without triggering a disable/enable - only > > > deactivate/activate -, so we'll have to handle this case somehow, even > > > though it is uncommon - see igt's kms_frontbuffer_tracking test, > > > fbc-stridechange subtest. It could be possible to implement a way to > > > free+alloc the CFB during said stride change, but it would involve a > > > lot of book-keeping - exactly as mentioned above - just for a rare > > > case, so for now I'll keep it simple and just deactivate FBC. Besides, > > > we may not even need to disable FBC since we do CFB over-allocation. > > > > > > Signed-off-by: Paulo Zanoni <paulo.r.zanoni@intel.com> > > > --- > > > drivers/gpu/drm/i915/intel_fbc.c | 132 ++++++++++++++++++++------------------- > > > 1 file changed, 68 insertions(+), 64 deletions(-) > > > > > > diff --git a/drivers/gpu/drm/i915/intel_fbc.c b/drivers/gpu/drm/i915/intel_fbc.c > > > index bf855b2..48d8cfc 100644 > > > --- a/drivers/gpu/drm/i915/intel_fbc.c > > > +++ b/drivers/gpu/drm/i915/intel_fbc.c > > > @@ -59,6 +59,45 @@ static unsigned int get_crtc_fence_y_offset(struct intel_crtc *crtc) > > > return crtc->base.y - crtc->adjusted_y; > > > } > > > > > > +/* > > > + * For SKL+, the plane source size used by the hardware is based on the value we > > > + * write to the PLANE_SIZE register. For BDW-, the hardware looks at the value > > > + * we wrote to PIPESRC. > > > + */ > > > +static void intel_fbc_get_plane_source_size(struct intel_crtc *crtc, > > > + int *width, int *height) > > > +{ > > > + struct intel_plane_state *plane_state = > > > + to_intel_plane_state(crtc->base.primary->state); > > > + int w, h; > > > + > > > + if (intel_rotation_90_or_270(plane_state->base.rotation)) { > > > + w = drm_rect_height(&plane_state->src) >> 16; > > > + h = drm_rect_width(&plane_state->src) >> 16; > > > + } else { > > > + w = drm_rect_width(&plane_state->src) >> 16; > > > + h = drm_rect_height(&plane_state->src) >> 16; > > > + } > > > + > > > + if (width) > > > + *width = w; > > > + if (height) > > > + *height = h; > > > +} > > > + > > > +static int intel_fbc_calculate_cfb_size(struct intel_crtc *crtc, > > > + struct drm_framebuffer *fb) > > > +{ > > > + struct drm_i915_private *dev_priv = crtc->base.dev->dev_private; > > > + int lines; > > > + > > > + intel_fbc_get_plane_source_size(crtc, NULL, &lines); > > > + if (INTEL_INFO(dev_priv)->gen >= 7) > > > + lines = min(lines, 2048); > > > + > > > + return lines * fb->pitches[0]; > > > > Why are we even looking at fb->pitches here? I thought that for fbc we > > only compress what we actually scan out, i.e. the source rectangle of the > > plane in pixels. Maybe some rounding involved perhaps. > > IIRC the docs do say the hw uses the plane stride register figure this > out. It's something I've been wondering for a few years now, but never > actually tested to see if we can get away with less. In that case a comment would be great, maybe even with the Bspec citation. -Daniel > > > > > Or did I miss something? > > -Daniel > > > > > +} > > > + > > > static void i8xx_fbc_deactivate(struct drm_i915_private *dev_priv) > > > { > > > u32 fbc_ctl; > > > @@ -604,11 +643,17 @@ again: > > > } > > > } > > > > > > -static int intel_fbc_alloc_cfb(struct drm_i915_private *dev_priv, int size, > > > - int fb_cpp) > > > +static int intel_fbc_alloc_cfb(struct intel_crtc *crtc) > > > { > > > + struct drm_i915_private *dev_priv = crtc->base.dev->dev_private; > > > + struct drm_framebuffer *fb = crtc->base.primary->state->fb; > > > struct drm_mm_node *uninitialized_var(compressed_llb); > > > - int ret; > > > + int size, fb_cpp, ret; > > > + > > > + WARN_ON(dev_priv->fbc.compressed_fb.allocated); > > > + > > > + size = intel_fbc_calculate_cfb_size(crtc, fb); > > > + fb_cpp = drm_format_plane_cpp(fb->pixel_format, 0); > > > > > > ret = find_compression_threshold(dev_priv, &dev_priv->fbc.compressed_fb, > > > size, fb_cpp); > > > @@ -685,64 +730,6 @@ void intel_fbc_cleanup_cfb(struct drm_i915_private *dev_priv) > > > mutex_unlock(&dev_priv->fbc.lock); > > > } > > > > > > -/* > > > - * For SKL+, the plane source size used by the hardware is based on the value we > > > - * write to the PLANE_SIZE register. For BDW-, the hardware looks at the value > > > - * we wrote to PIPESRC. > > > - */ > > > -static void intel_fbc_get_plane_source_size(struct intel_crtc *crtc, > > > - int *width, int *height) > > > -{ > > > - struct intel_plane_state *plane_state = > > > - to_intel_plane_state(crtc->base.primary->state); > > > - int w, h; > > > - > > > - if (intel_rotation_90_or_270(plane_state->base.rotation)) { > > > - w = drm_rect_height(&plane_state->src) >> 16; > > > - h = drm_rect_width(&plane_state->src) >> 16; > > > - } else { > > > - w = drm_rect_width(&plane_state->src) >> 16; > > > - h = drm_rect_height(&plane_state->src) >> 16; > > > - } > > > - > > > - if (width) > > > - *width = w; > > > - if (height) > > > - *height = h; > > > -} > > > - > > > -static int intel_fbc_calculate_cfb_size(struct intel_crtc *crtc) > > > -{ > > > - struct drm_i915_private *dev_priv = crtc->base.dev->dev_private; > > > - struct drm_framebuffer *fb = crtc->base.primary->fb; > > > - int lines; > > > - > > > - intel_fbc_get_plane_source_size(crtc, NULL, &lines); > > > - if (INTEL_INFO(dev_priv)->gen >= 7) > > > - lines = min(lines, 2048); > > > - > > > - return lines * fb->pitches[0]; > > > -} > > > - > > > -static int intel_fbc_setup_cfb(struct intel_crtc *crtc) > > > -{ > > > - struct drm_i915_private *dev_priv = crtc->base.dev->dev_private; > > > - struct drm_framebuffer *fb = crtc->base.primary->fb; > > > - int size, cpp; > > > - > > > - size = intel_fbc_calculate_cfb_size(crtc); > > > - cpp = drm_format_plane_cpp(fb->pixel_format, 0); > > > - > > > - if (dev_priv->fbc.compressed_fb.allocated && > > > - size <= dev_priv->fbc.compressed_fb.size * dev_priv->fbc.threshold) > > > - return 0; > > > - > > > - /* Release any current block */ > > > - __intel_fbc_cleanup_cfb(dev_priv); > > > - > > > - return intel_fbc_alloc_cfb(dev_priv, size, cpp); > > > -} > > > - > > > static bool stride_is_valid(struct drm_i915_private *dev_priv, > > > unsigned int stride) > > > { > > > @@ -904,8 +891,19 @@ static void __intel_fbc_update(struct intel_crtc *crtc) > > > goto out_disable; > > > } > > > > > > - if (intel_fbc_setup_cfb(crtc)) { > > > - set_no_fbc_reason(dev_priv, "not enough stolen memory"); > > > + /* It is possible for the required CFB size change without a > > > + * crtc->disable + crtc->enable since it is possible to change the > > > + * stride without triggering a full modeset. Since we try to > > > + * over-allocate the CFB, there's a chance we may keep FBC enabled even > > > + * if this happens, but if we exceed the current CFB size we'll have to > > > + * disable FBC. Notice that it would be possible to disable FBC, wait > > > + * for a frame, free the stolen node, then try to reenable FBC in case > > > + * we didn't get any invalidate/deactivate calls, but this would require > > > + * a lot of tracking just for a case we expect to be uncommon, so we > > > + * just don't have this code for now. */ > > > + if (intel_fbc_calculate_cfb_size(crtc, fb) > > > > + dev_priv->fbc.compressed_fb.size * dev_priv->fbc.threshold) { > > > + set_no_fbc_reason(dev_priv, "CFB requirements changed"); > > > goto out_disable; > > > } > > > > > > @@ -958,7 +956,6 @@ out_disable: > > > DRM_DEBUG_KMS("unsupported config, deactivating FBC\n"); > > > __intel_fbc_deactivate(dev_priv); > > > } > > > - __intel_fbc_cleanup_cfb(dev_priv); > > > } > > > > > > /* > > > @@ -1102,6 +1099,11 @@ void intel_fbc_enable(struct intel_crtc *crtc) > > > goto out; > > > } > > > > > > + if (intel_fbc_alloc_cfb(crtc)) { > > > + set_no_fbc_reason(dev_priv, "not enough stolen memory"); > > > + goto out; > > > + } > > > + > > > DRM_DEBUG_KMS("Enabling FBC on pipe %c\n", pipe_name(crtc->pipe)); > > > dev_priv->fbc.no_fbc_reason = "FBC enabled but not active yet\n"; > > > > > > @@ -1129,6 +1131,8 @@ static void __intel_fbc_disable(struct drm_i915_private *dev_priv) > > > > > > DRM_DEBUG_KMS("Disabling FBC on pipe %c\n", pipe_name(crtc->pipe)); > > > > > > + __intel_fbc_cleanup_cfb(dev_priv); > > > + > > > dev_priv->fbc.enabled = false; > > > dev_priv->fbc.crtc = NULL; > > > } > > > -- > > > 2.6.1 > > > > > > _______________________________________________ > > > Intel-gfx mailing list > > > Intel-gfx@lists.freedesktop.org > > > http://lists.freedesktop.org/mailman/listinfo/intel-gfx > > > > -- > > Daniel Vetter > > Software Engineer, Intel Corporation > > http://blog.ffwll.ch > > _______________________________________________ > > Intel-gfx mailing list > > Intel-gfx@lists.freedesktop.org > > http://lists.freedesktop.org/mailman/listinfo/intel-gfx > > -- > Ville Syrjälä > Intel OTC
Em Qua, 2015-10-21 às 09:24 +0200, Daniel Vetter escreveu: > On Wed, Oct 21, 2015 at 10:20:55AM +0300, Ville Syrjälä wrote: > > On Wed, Oct 21, 2015 at 09:11:08AM +0200, Daniel Vetter wrote: > > > On Tue, Oct 20, 2015 at 11:50:01AM -0200, Paulo Zanoni wrote: > > > > One of the problems with the current code is that it frees the > > > > CFB and > > > > releases its drm_mm node as soon as we flip FBC's enable bit. > > > > This is > > > > bad because after we disbale FBC the hardware may still use the > > > > CFB > > > > for the rest of the frame, so in theory we should only release > > > > the > > > > drm_mm node one frame after we disable FBC. Otherwise, a stolen > > > > memory > > > > allocation done right after an FBC disable may result in either > > > > corrupted memory for the new owner of that memory region or > > > > corrupted > > > > screen/underruns in case the new owner changes it while the > > > > hardware > > > > is still reading it. This case is not exactly easy to reproduce > > > > since > > > > we currently don't do a lot of stolen memory allocations, but I > > > > see > > > > patches on the mailing list trying to expose stolen memory to > > > > user > > > > space, so races will be possible. > > > > > > > > I thought about three different approaches to solve this, and > > > > they all > > > > have downsides. > > > > > > > > The first approach would be to simply use multiple drm_mm nodes > > > > and > > > > freeing the unused ones only after a frame has passed. The > > > > problem > > > > with this approach is that since stolen memory is rather small, > > > > there's a risk we just won't be able to allocate a new CFB from > > > > stolen > > > > if the previous one was not freed yet. This could happen in > > > > case we > > > > quickly disable FBC from pipe A and decide to enable it on pipe > > > > B, or > > > > just if we change pipe A's fb stride while FBC is enabled. > > > > > > > > The second approach would be similar to the first one, but > > > > maintaining > > > > a single drm_mm node and keeping track of when it can be > > > > reused. This > > > > would remove the disadvantage of not having enough space for > > > > two > > > > nodes, but would create the new problem where we may not be > > > > able to > > > > enable FBC at the point intel_fbc_update() is called, so we > > > > would have > > > > to add more code to retry updating FBC after the time has > > > > passed. And > > > > that can quickly get too complex since we can get invalidate, > > > > flush, > > > > flip_prepare, disable and other calls in the middle of the > > > > wait. > > > > > > > > Both solutions above - and also the current code - have the > > > > problem > > > > that we unnecessarily free+realloc FBC during invalidate+flush > > > > operations even if the CFB size doesn't change. > > > > > > > > The third option would be to move the allocation/deallocation > > > > to > > > > enable/disable. This makes sure that the pipe is always > > > > disabled when > > > > we allocate/deallocate the CFB, so there's no risk that the FBC > > > > hardware may read or write to the memory right after it is > > > > freed from > > > > drm_mm. The downside is that it is possible for user space to > > > > change > > > > the buffer stride without triggering a disable/enable - only > > > > deactivate/activate -, so we'll have to handle this case > > > > somehow, even > > > > though it is uncommon - see igt's kms_frontbuffer_tracking > > > > test, > > > > fbc-stridechange subtest. It could be possible to implement a > > > > way to > > > > free+alloc the CFB during said stride change, but it would > > > > involve a > > > > lot of book-keeping - exactly as mentioned above - just for a > > > > rare > > > > case, so for now I'll keep it simple and just deactivate FBC. > > > > Besides, > > > > we may not even need to disable FBC since we do CFB over- > > > > allocation. > > > > > > > > Signed-off-by: Paulo Zanoni <paulo.r.zanoni@intel.com> > > > > --- > > > > drivers/gpu/drm/i915/intel_fbc.c | 132 ++++++++++++++++++++--- > > > > ---------------- > > > > 1 file changed, 68 insertions(+), 64 deletions(-) > > > > > > > > diff --git a/drivers/gpu/drm/i915/intel_fbc.c > > > > b/drivers/gpu/drm/i915/intel_fbc.c > > > > index bf855b2..48d8cfc 100644 > > > > --- a/drivers/gpu/drm/i915/intel_fbc.c > > > > +++ b/drivers/gpu/drm/i915/intel_fbc.c > > > > @@ -59,6 +59,45 @@ static unsigned int > > > > get_crtc_fence_y_offset(struct intel_crtc *crtc) > > > > return crtc->base.y - crtc->adjusted_y; > > > > } > > > > > > > > +/* > > > > + * For SKL+, the plane source size used by the hardware is > > > > based on the value we > > > > + * write to the PLANE_SIZE register. For BDW-, the hardware > > > > looks at the value > > > > + * we wrote to PIPESRC. > > > > + */ > > > > +static void intel_fbc_get_plane_source_size(struct intel_crtc > > > > *crtc, > > > > + int *width, int > > > > *height) > > > > +{ > > > > + struct intel_plane_state *plane_state = > > > > + to_intel_plane_state(crtc- > > > > >base.primary->state); > > > > + int w, h; > > > > + > > > > + if (intel_rotation_90_or_270(plane_state- > > > > >base.rotation)) { > > > > + w = drm_rect_height(&plane_state->src) >> 16; > > > > + h = drm_rect_width(&plane_state->src) >> 16; > > > > + } else { > > > > + w = drm_rect_width(&plane_state->src) >> 16; > > > > + h = drm_rect_height(&plane_state->src) >> 16; > > > > + } > > > > + > > > > + if (width) > > > > + *width = w; > > > > + if (height) > > > > + *height = h; > > > > +} > > > > + > > > > +static int intel_fbc_calculate_cfb_size(struct intel_crtc > > > > *crtc, > > > > + struct drm_framebuffer > > > > *fb) > > > > +{ > > > > + struct drm_i915_private *dev_priv = crtc->base.dev- > > > > >dev_private; > > > > + int lines; > > > > + > > > > + intel_fbc_get_plane_source_size(crtc, NULL, &lines); > > > > + if (INTEL_INFO(dev_priv)->gen >= 7) > > > > + lines = min(lines, 2048); > > > > + > > > > + return lines * fb->pitches[0]; > > > > > > Why are we even looking at fb->pitches here? I thought that for > > > fbc we > > > only compress what we actually scan out, i.e. the source > > > rectangle of the > > > plane in pixels. Maybe some rounding involved perhaps. > > > > IIRC the docs do say the hw uses the plane stride register figure > > this > > out. It's something I've been wondering for a few years now, but > > never > > actually tested to see if we can get away with less. So we have a volunteer to implement the stolen memory checker? :) > In that case a comment would be great, maybe even with the Bspec > citation. Well, the docs talk about the stride and the HW guys confirmed it, so it didn't seem to be as something that needed a comment. You can also git-blame it :) http://cgit.freedesktop.org/drm-intel/commit/drivers/gpu/drm/i915/intel _fbc.c?id=c4ffd40908c30a33291227920e921f6b45b9e8f7 > -Daniel > > > > > > > > > Or did I miss something? > > > -Daniel > > > > > > > +} > > > > + > > > > static void i8xx_fbc_deactivate(struct drm_i915_private > > > > *dev_priv) > > > > { > > > > u32 fbc_ctl; > > > > @@ -604,11 +643,17 @@ again: > > > > } > > > > } > > > > > > > > -static int intel_fbc_alloc_cfb(struct drm_i915_private > > > > *dev_priv, int size, > > > > - int fb_cpp) > > > > +static int intel_fbc_alloc_cfb(struct intel_crtc *crtc) > > > > { > > > > + struct drm_i915_private *dev_priv = crtc->base.dev- > > > > >dev_private; > > > > + struct drm_framebuffer *fb = crtc->base.primary- > > > > >state->fb; > > > > struct drm_mm_node *uninitialized_var(compressed_llb); > > > > - int ret; > > > > + int size, fb_cpp, ret; > > > > + > > > > + WARN_ON(dev_priv->fbc.compressed_fb.allocated); > > > > + > > > > + size = intel_fbc_calculate_cfb_size(crtc, fb); > > > > + fb_cpp = drm_format_plane_cpp(fb->pixel_format, 0); > > > > > > > > ret = find_compression_threshold(dev_priv, &dev_priv- > > > > >fbc.compressed_fb, > > > > size, fb_cpp); > > > > @@ -685,64 +730,6 @@ void intel_fbc_cleanup_cfb(struct > > > > drm_i915_private *dev_priv) > > > > mutex_unlock(&dev_priv->fbc.lock); > > > > } > > > > > > > > -/* > > > > - * For SKL+, the plane source size used by the hardware is > > > > based on the value we > > > > - * write to the PLANE_SIZE register. For BDW-, the hardware > > > > looks at the value > > > > - * we wrote to PIPESRC. > > > > - */ > > > > -static void intel_fbc_get_plane_source_size(struct intel_crtc > > > > *crtc, > > > > - int *width, int > > > > *height) > > > > -{ > > > > - struct intel_plane_state *plane_state = > > > > - to_intel_plane_state(crtc- > > > > >base.primary->state); > > > > - int w, h; > > > > - > > > > - if (intel_rotation_90_or_270(plane_state- > > > > >base.rotation)) { > > > > - w = drm_rect_height(&plane_state->src) >> 16; > > > > - h = drm_rect_width(&plane_state->src) >> 16; > > > > - } else { > > > > - w = drm_rect_width(&plane_state->src) >> 16; > > > > - h = drm_rect_height(&plane_state->src) >> 16; > > > > - } > > > > - > > > > - if (width) > > > > - *width = w; > > > > - if (height) > > > > - *height = h; > > > > -} > > > > - > > > > -static int intel_fbc_calculate_cfb_size(struct intel_crtc > > > > *crtc) > > > > -{ > > > > - struct drm_i915_private *dev_priv = crtc->base.dev- > > > > >dev_private; > > > > - struct drm_framebuffer *fb = crtc->base.primary->fb; > > > > - int lines; > > > > - > > > > - intel_fbc_get_plane_source_size(crtc, NULL, &lines); > > > > - if (INTEL_INFO(dev_priv)->gen >= 7) > > > > - lines = min(lines, 2048); > > > > - > > > > - return lines * fb->pitches[0]; > > > > -} > > > > - > > > > -static int intel_fbc_setup_cfb(struct intel_crtc *crtc) > > > > -{ > > > > - struct drm_i915_private *dev_priv = crtc->base.dev- > > > > >dev_private; > > > > - struct drm_framebuffer *fb = crtc->base.primary->fb; > > > > - int size, cpp; > > > > - > > > > - size = intel_fbc_calculate_cfb_size(crtc); > > > > - cpp = drm_format_plane_cpp(fb->pixel_format, 0); > > > > - > > > > - if (dev_priv->fbc.compressed_fb.allocated && > > > > - size <= dev_priv->fbc.compressed_fb.size * > > > > dev_priv->fbc.threshold) > > > > - return 0; > > > > - > > > > - /* Release any current block */ > > > > - __intel_fbc_cleanup_cfb(dev_priv); > > > > - > > > > - return intel_fbc_alloc_cfb(dev_priv, size, cpp); > > > > -} > > > > - > > > > static bool stride_is_valid(struct drm_i915_private *dev_priv, > > > > unsigned int stride) > > > > { > > > > @@ -904,8 +891,19 @@ static void __intel_fbc_update(struct > > > > intel_crtc *crtc) > > > > goto out_disable; > > > > } > > > > > > > > - if (intel_fbc_setup_cfb(crtc)) { > > > > - set_no_fbc_reason(dev_priv, "not enough stolen > > > > memory"); > > > > + /* It is possible for the required CFB size change > > > > without a > > > > + * crtc->disable + crtc->enable since it is possible > > > > to change the > > > > + * stride without triggering a full modeset. Since we > > > > try to > > > > + * over-allocate the CFB, there's a chance we may keep > > > > FBC enabled even > > > > + * if this happens, but if we exceed the current CFB > > > > size we'll have to > > > > + * disable FBC. Notice that it would be possible to > > > > disable FBC, wait > > > > + * for a frame, free the stolen node, then try to > > > > reenable FBC in case > > > > + * we didn't get any invalidate/deactivate calls, but > > > > this would require > > > > + * a lot of tracking just for a case we expect to be > > > > uncommon, so we > > > > + * just don't have this code for now. */ > > > > + if (intel_fbc_calculate_cfb_size(crtc, fb) > > > > > + dev_priv->fbc.compressed_fb.size * dev_priv- > > > > >fbc.threshold) { > > > > + set_no_fbc_reason(dev_priv, "CFB requirements > > > > changed"); > > > > goto out_disable; > > > > } > > > > > > > > @@ -958,7 +956,6 @@ out_disable: > > > > DRM_DEBUG_KMS("unsupported config, > > > > deactivating FBC\n"); > > > > __intel_fbc_deactivate(dev_priv); > > > > } > > > > - __intel_fbc_cleanup_cfb(dev_priv); > > > > } > > > > > > > > /* > > > > @@ -1102,6 +1099,11 @@ void intel_fbc_enable(struct intel_crtc > > > > *crtc) > > > > goto out; > > > > } > > > > > > > > + if (intel_fbc_alloc_cfb(crtc)) { > > > > + set_no_fbc_reason(dev_priv, "not enough stolen > > > > memory"); > > > > + goto out; > > > > + } > > > > + > > > > DRM_DEBUG_KMS("Enabling FBC on pipe %c\n", > > > > pipe_name(crtc->pipe)); > > > > dev_priv->fbc.no_fbc_reason = "FBC enabled but not > > > > active yet\n"; > > > > > > > > @@ -1129,6 +1131,8 @@ static void __intel_fbc_disable(struct > > > > drm_i915_private *dev_priv) > > > > > > > > DRM_DEBUG_KMS("Disabling FBC on pipe %c\n", > > > > pipe_name(crtc->pipe)); > > > > > > > > + __intel_fbc_cleanup_cfb(dev_priv); > > > > + > > > > dev_priv->fbc.enabled = false; > > > > dev_priv->fbc.crtc = NULL; > > > > } > > > > -- > > > > 2.6.1 > > > > > > > > _______________________________________________ > > > > Intel-gfx mailing list > > > > Intel-gfx@lists.freedesktop.org > > > > http://lists.freedesktop.org/mailman/listinfo/intel-gfx > > > > > > -- > > > Daniel Vetter > > > Software Engineer, Intel Corporation > > > http://blog.ffwll.ch > > > _______________________________________________ > > > Intel-gfx mailing list > > > Intel-gfx@lists.freedesktop.org > > > http://lists.freedesktop.org/mailman/listinfo/intel-gfx > > > > -- > > Ville Syrjälä > > Intel OTC >
On Wed, Oct 21, 2015 at 06:30:09PM +0000, Zanoni, Paulo R wrote: > Em Qua, 2015-10-21 às 09:24 +0200, Daniel Vetter escreveu: > > On Wed, Oct 21, 2015 at 10:20:55AM +0300, Ville Syrjälä wrote: > > > On Wed, Oct 21, 2015 at 09:11:08AM +0200, Daniel Vetter wrote: > > > > On Tue, Oct 20, 2015 at 11:50:01AM -0200, Paulo Zanoni wrote: > > > > > One of the problems with the current code is that it frees the > > > > > CFB and > > > > > releases its drm_mm node as soon as we flip FBC's enable bit. > > > > > This is > > > > > bad because after we disbale FBC the hardware may still use the > > > > > CFB > > > > > for the rest of the frame, so in theory we should only release > > > > > the > > > > > drm_mm node one frame after we disable FBC. Otherwise, a stolen > > > > > memory > > > > > allocation done right after an FBC disable may result in either > > > > > corrupted memory for the new owner of that memory region or > > > > > corrupted > > > > > screen/underruns in case the new owner changes it while the > > > > > hardware > > > > > is still reading it. This case is not exactly easy to reproduce > > > > > since > > > > > we currently don't do a lot of stolen memory allocations, but I > > > > > see > > > > > patches on the mailing list trying to expose stolen memory to > > > > > user > > > > > space, so races will be possible. > > > > > > > > > > I thought about three different approaches to solve this, and > > > > > they all > > > > > have downsides. > > > > > > > > > > The first approach would be to simply use multiple drm_mm nodes > > > > > and > > > > > freeing the unused ones only after a frame has passed. The > > > > > problem > > > > > with this approach is that since stolen memory is rather small, > > > > > there's a risk we just won't be able to allocate a new CFB from > > > > > stolen > > > > > if the previous one was not freed yet. This could happen in > > > > > case we > > > > > quickly disable FBC from pipe A and decide to enable it on pipe > > > > > B, or > > > > > just if we change pipe A's fb stride while FBC is enabled. > > > > > > > > > > The second approach would be similar to the first one, but > > > > > maintaining > > > > > a single drm_mm node and keeping track of when it can be > > > > > reused. This > > > > > would remove the disadvantage of not having enough space for > > > > > two > > > > > nodes, but would create the new problem where we may not be > > > > > able to > > > > > enable FBC at the point intel_fbc_update() is called, so we > > > > > would have > > > > > to add more code to retry updating FBC after the time has > > > > > passed. And > > > > > that can quickly get too complex since we can get invalidate, > > > > > flush, > > > > > flip_prepare, disable and other calls in the middle of the > > > > > wait. > > > > > > > > > > Both solutions above - and also the current code - have the > > > > > problem > > > > > that we unnecessarily free+realloc FBC during invalidate+flush > > > > > operations even if the CFB size doesn't change. > > > > > > > > > > The third option would be to move the allocation/deallocation > > > > > to > > > > > enable/disable. This makes sure that the pipe is always > > > > > disabled when > > > > > we allocate/deallocate the CFB, so there's no risk that the FBC > > > > > hardware may read or write to the memory right after it is > > > > > freed from > > > > > drm_mm. The downside is that it is possible for user space to > > > > > change > > > > > the buffer stride without triggering a disable/enable - only > > > > > deactivate/activate -, so we'll have to handle this case > > > > > somehow, even > > > > > though it is uncommon - see igt's kms_frontbuffer_tracking > > > > > test, > > > > > fbc-stridechange subtest. It could be possible to implement a > > > > > way to > > > > > free+alloc the CFB during said stride change, but it would > > > > > involve a > > > > > lot of book-keeping - exactly as mentioned above - just for a > > > > > rare > > > > > case, so for now I'll keep it simple and just deactivate FBC. > > > > > Besides, > > > > > we may not even need to disable FBC since we do CFB over- > > > > > allocation. > > > > > > > > > > Signed-off-by: Paulo Zanoni <paulo.r.zanoni@intel.com> > > > > > --- > > > > > drivers/gpu/drm/i915/intel_fbc.c | 132 ++++++++++++++++++++--- > > > > > ---------------- > > > > > 1 file changed, 68 insertions(+), 64 deletions(-) > > > > > > > > > > diff --git a/drivers/gpu/drm/i915/intel_fbc.c > > > > > b/drivers/gpu/drm/i915/intel_fbc.c > > > > > index bf855b2..48d8cfc 100644 > > > > > --- a/drivers/gpu/drm/i915/intel_fbc.c > > > > > +++ b/drivers/gpu/drm/i915/intel_fbc.c > > > > > @@ -59,6 +59,45 @@ static unsigned int > > > > > get_crtc_fence_y_offset(struct intel_crtc *crtc) > > > > > return crtc->base.y - crtc->adjusted_y; > > > > > } > > > > > > > > > > +/* > > > > > + * For SKL+, the plane source size used by the hardware is > > > > > based on the value we > > > > > + * write to the PLANE_SIZE register. For BDW-, the hardware > > > > > looks at the value > > > > > + * we wrote to PIPESRC. > > > > > + */ > > > > > +static void intel_fbc_get_plane_source_size(struct intel_crtc > > > > > *crtc, > > > > > + int *width, int > > > > > *height) > > > > > +{ > > > > > + struct intel_plane_state *plane_state = > > > > > + to_intel_plane_state(crtc- > > > > > >base.primary->state); > > > > > + int w, h; > > > > > + > > > > > + if (intel_rotation_90_or_270(plane_state- > > > > > >base.rotation)) { > > > > > + w = drm_rect_height(&plane_state->src) >> 16; > > > > > + h = drm_rect_width(&plane_state->src) >> 16; > > > > > + } else { > > > > > + w = drm_rect_width(&plane_state->src) >> 16; > > > > > + h = drm_rect_height(&plane_state->src) >> 16; > > > > > + } > > > > > + > > > > > + if (width) > > > > > + *width = w; > > > > > + if (height) > > > > > + *height = h; > > > > > +} > > > > > + > > > > > +static int intel_fbc_calculate_cfb_size(struct intel_crtc > > > > > *crtc, > > > > > + struct drm_framebuffer > > > > > *fb) > > > > > +{ > > > > > + struct drm_i915_private *dev_priv = crtc->base.dev- > > > > > >dev_private; > > > > > + int lines; > > > > > + > > > > > + intel_fbc_get_plane_source_size(crtc, NULL, &lines); > > > > > + if (INTEL_INFO(dev_priv)->gen >= 7) > > > > > + lines = min(lines, 2048); > > > > > + > > > > > + return lines * fb->pitches[0]; > > > > > > > > Why are we even looking at fb->pitches here? I thought that for > > > > fbc we > > > > only compress what we actually scan out, i.e. the source > > > > rectangle of the > > > > plane in pixels. Maybe some rounding involved perhaps. > > > > > > IIRC the docs do say the hw uses the plane stride register figure > > > this > > > out. It's something I've been wondering for a few years now, but > > > never > > > actually tested to see if we can get away with less. > > So we have a volunteer to implement the stolen memory checker? :) > > > > In that case a comment would be great, maybe even with the Bspec > > citation. > > Well, the docs talk about the stride and the HW guys confirmed it, so > it didn't seem to be as something that needed a comment. > > You can also git-blame it :) > > http://cgit.freedesktop.org/drm-intel/commit/drivers/gpu/drm/i915/intel > _fbc.c?id=c4ffd40908c30a33291227920e921f6b45b9e8f7 Indeed, \o/ for good commit message. Still seems fairly surprising that the hw expects the stride and not just what it actually has to read. But I guess I've learned that now. A comment for the next person would still be good though I think. Something like: /* hw needs the full buffer stride, not just the active area */ To make it clear this is 100% intentional ;-) -Daniel > > > > -Daniel > > > > > > > > > > > > > Or did I miss something? > > > > -Daniel > > > > > > > > > +} > > > > > + > > > > > static void i8xx_fbc_deactivate(struct drm_i915_private > > > > > *dev_priv) > > > > > { > > > > > u32 fbc_ctl; > > > > > @@ -604,11 +643,17 @@ again: > > > > > } > > > > > } > > > > > > > > > > -static int intel_fbc_alloc_cfb(struct drm_i915_private > > > > > *dev_priv, int size, > > > > > - int fb_cpp) > > > > > +static int intel_fbc_alloc_cfb(struct intel_crtc *crtc) > > > > > { > > > > > + struct drm_i915_private *dev_priv = crtc->base.dev- > > > > > >dev_private; > > > > > + struct drm_framebuffer *fb = crtc->base.primary- > > > > > >state->fb; > > > > > struct drm_mm_node *uninitialized_var(compressed_llb); > > > > > - int ret; > > > > > + int size, fb_cpp, ret; > > > > > + > > > > > + WARN_ON(dev_priv->fbc.compressed_fb.allocated); > > > > > + > > > > > + size = intel_fbc_calculate_cfb_size(crtc, fb); > > > > > + fb_cpp = drm_format_plane_cpp(fb->pixel_format, 0); > > > > > > > > > > ret = find_compression_threshold(dev_priv, &dev_priv- > > > > > >fbc.compressed_fb, > > > > > size, fb_cpp); > > > > > @@ -685,64 +730,6 @@ void intel_fbc_cleanup_cfb(struct > > > > > drm_i915_private *dev_priv) > > > > > mutex_unlock(&dev_priv->fbc.lock); > > > > > } > > > > > > > > > > -/* > > > > > - * For SKL+, the plane source size used by the hardware is > > > > > based on the value we > > > > > - * write to the PLANE_SIZE register. For BDW-, the hardware > > > > > looks at the value > > > > > - * we wrote to PIPESRC. > > > > > - */ > > > > > -static void intel_fbc_get_plane_source_size(struct intel_crtc > > > > > *crtc, > > > > > - int *width, int > > > > > *height) > > > > > -{ > > > > > - struct intel_plane_state *plane_state = > > > > > - to_intel_plane_state(crtc- > > > > > >base.primary->state); > > > > > - int w, h; > > > > > - > > > > > - if (intel_rotation_90_or_270(plane_state- > > > > > >base.rotation)) { > > > > > - w = drm_rect_height(&plane_state->src) >> 16; > > > > > - h = drm_rect_width(&plane_state->src) >> 16; > > > > > - } else { > > > > > - w = drm_rect_width(&plane_state->src) >> 16; > > > > > - h = drm_rect_height(&plane_state->src) >> 16; > > > > > - } > > > > > - > > > > > - if (width) > > > > > - *width = w; > > > > > - if (height) > > > > > - *height = h; > > > > > -} > > > > > - > > > > > -static int intel_fbc_calculate_cfb_size(struct intel_crtc > > > > > *crtc) > > > > > -{ > > > > > - struct drm_i915_private *dev_priv = crtc->base.dev- > > > > > >dev_private; > > > > > - struct drm_framebuffer *fb = crtc->base.primary->fb; > > > > > - int lines; > > > > > - > > > > > - intel_fbc_get_plane_source_size(crtc, NULL, &lines); > > > > > - if (INTEL_INFO(dev_priv)->gen >= 7) > > > > > - lines = min(lines, 2048); > > > > > - > > > > > - return lines * fb->pitches[0]; > > > > > -} > > > > > - > > > > > -static int intel_fbc_setup_cfb(struct intel_crtc *crtc) > > > > > -{ > > > > > - struct drm_i915_private *dev_priv = crtc->base.dev- > > > > > >dev_private; > > > > > - struct drm_framebuffer *fb = crtc->base.primary->fb; > > > > > - int size, cpp; > > > > > - > > > > > - size = intel_fbc_calculate_cfb_size(crtc); > > > > > - cpp = drm_format_plane_cpp(fb->pixel_format, 0); > > > > > - > > > > > - if (dev_priv->fbc.compressed_fb.allocated && > > > > > - size <= dev_priv->fbc.compressed_fb.size * > > > > > dev_priv->fbc.threshold) > > > > > - return 0; > > > > > - > > > > > - /* Release any current block */ > > > > > - __intel_fbc_cleanup_cfb(dev_priv); > > > > > - > > > > > - return intel_fbc_alloc_cfb(dev_priv, size, cpp); > > > > > -} > > > > > - > > > > > static bool stride_is_valid(struct drm_i915_private *dev_priv, > > > > > unsigned int stride) > > > > > { > > > > > @@ -904,8 +891,19 @@ static void __intel_fbc_update(struct > > > > > intel_crtc *crtc) > > > > > goto out_disable; > > > > > } > > > > > > > > > > - if (intel_fbc_setup_cfb(crtc)) { > > > > > - set_no_fbc_reason(dev_priv, "not enough stolen > > > > > memory"); > > > > > + /* It is possible for the required CFB size change > > > > > without a > > > > > + * crtc->disable + crtc->enable since it is possible > > > > > to change the > > > > > + * stride without triggering a full modeset. Since we > > > > > try to > > > > > + * over-allocate the CFB, there's a chance we may keep > > > > > FBC enabled even > > > > > + * if this happens, but if we exceed the current CFB > > > > > size we'll have to > > > > > + * disable FBC. Notice that it would be possible to > > > > > disable FBC, wait > > > > > + * for a frame, free the stolen node, then try to > > > > > reenable FBC in case > > > > > + * we didn't get any invalidate/deactivate calls, but > > > > > this would require > > > > > + * a lot of tracking just for a case we expect to be > > > > > uncommon, so we > > > > > + * just don't have this code for now. */ > > > > > + if (intel_fbc_calculate_cfb_size(crtc, fb) > > > > > > + dev_priv->fbc.compressed_fb.size * dev_priv- > > > > > >fbc.threshold) { > > > > > + set_no_fbc_reason(dev_priv, "CFB requirements > > > > > changed"); > > > > > goto out_disable; > > > > > } > > > > > > > > > > @@ -958,7 +956,6 @@ out_disable: > > > > > DRM_DEBUG_KMS("unsupported config, > > > > > deactivating FBC\n"); > > > > > __intel_fbc_deactivate(dev_priv); > > > > > } > > > > > - __intel_fbc_cleanup_cfb(dev_priv); > > > > > } > > > > > > > > > > /* > > > > > @@ -1102,6 +1099,11 @@ void intel_fbc_enable(struct intel_crtc > > > > > *crtc) > > > > > goto out; > > > > > } > > > > > > > > > > + if (intel_fbc_alloc_cfb(crtc)) { > > > > > + set_no_fbc_reason(dev_priv, "not enough stolen > > > > > memory"); > > > > > + goto out; > > > > > + } > > > > > + > > > > > DRM_DEBUG_KMS("Enabling FBC on pipe %c\n", > > > > > pipe_name(crtc->pipe)); > > > > > dev_priv->fbc.no_fbc_reason = "FBC enabled but not > > > > > active yet\n"; > > > > > > > > > > @@ -1129,6 +1131,8 @@ static void __intel_fbc_disable(struct > > > > > drm_i915_private *dev_priv) > > > > > > > > > > DRM_DEBUG_KMS("Disabling FBC on pipe %c\n", > > > > > pipe_name(crtc->pipe)); > > > > > > > > > > + __intel_fbc_cleanup_cfb(dev_priv); > > > > > + > > > > > dev_priv->fbc.enabled = false; > > > > > dev_priv->fbc.crtc = NULL; > > > > > } > > > > > -- > > > > > 2.6.1 > > > > > > > > > > _______________________________________________ > > > > > Intel-gfx mailing list > > > > > Intel-gfx@lists.freedesktop.org > > > > > http://lists.freedesktop.org/mailman/listinfo/intel-gfx > > > > > > > > -- > > > > Daniel Vetter > > > > Software Engineer, Intel Corporation > > > > http://blog.ffwll.ch > > > > _______________________________________________ > > > > Intel-gfx mailing list > > > > Intel-gfx@lists.freedesktop.org > > > > http://lists.freedesktop.org/mailman/listinfo/intel-gfx > > > > > > -- > > > Ville Syrjälä > > > Intel OTC > >
diff --git a/drivers/gpu/drm/i915/intel_fbc.c b/drivers/gpu/drm/i915/intel_fbc.c index bf855b2..48d8cfc 100644 --- a/drivers/gpu/drm/i915/intel_fbc.c +++ b/drivers/gpu/drm/i915/intel_fbc.c @@ -59,6 +59,45 @@ static unsigned int get_crtc_fence_y_offset(struct intel_crtc *crtc) return crtc->base.y - crtc->adjusted_y; } +/* + * For SKL+, the plane source size used by the hardware is based on the value we + * write to the PLANE_SIZE register. For BDW-, the hardware looks at the value + * we wrote to PIPESRC. + */ +static void intel_fbc_get_plane_source_size(struct intel_crtc *crtc, + int *width, int *height) +{ + struct intel_plane_state *plane_state = + to_intel_plane_state(crtc->base.primary->state); + int w, h; + + if (intel_rotation_90_or_270(plane_state->base.rotation)) { + w = drm_rect_height(&plane_state->src) >> 16; + h = drm_rect_width(&plane_state->src) >> 16; + } else { + w = drm_rect_width(&plane_state->src) >> 16; + h = drm_rect_height(&plane_state->src) >> 16; + } + + if (width) + *width = w; + if (height) + *height = h; +} + +static int intel_fbc_calculate_cfb_size(struct intel_crtc *crtc, + struct drm_framebuffer *fb) +{ + struct drm_i915_private *dev_priv = crtc->base.dev->dev_private; + int lines; + + intel_fbc_get_plane_source_size(crtc, NULL, &lines); + if (INTEL_INFO(dev_priv)->gen >= 7) + lines = min(lines, 2048); + + return lines * fb->pitches[0]; +} + static void i8xx_fbc_deactivate(struct drm_i915_private *dev_priv) { u32 fbc_ctl; @@ -604,11 +643,17 @@ again: } } -static int intel_fbc_alloc_cfb(struct drm_i915_private *dev_priv, int size, - int fb_cpp) +static int intel_fbc_alloc_cfb(struct intel_crtc *crtc) { + struct drm_i915_private *dev_priv = crtc->base.dev->dev_private; + struct drm_framebuffer *fb = crtc->base.primary->state->fb; struct drm_mm_node *uninitialized_var(compressed_llb); - int ret; + int size, fb_cpp, ret; + + WARN_ON(dev_priv->fbc.compressed_fb.allocated); + + size = intel_fbc_calculate_cfb_size(crtc, fb); + fb_cpp = drm_format_plane_cpp(fb->pixel_format, 0); ret = find_compression_threshold(dev_priv, &dev_priv->fbc.compressed_fb, size, fb_cpp); @@ -685,64 +730,6 @@ void intel_fbc_cleanup_cfb(struct drm_i915_private *dev_priv) mutex_unlock(&dev_priv->fbc.lock); } -/* - * For SKL+, the plane source size used by the hardware is based on the value we - * write to the PLANE_SIZE register. For BDW-, the hardware looks at the value - * we wrote to PIPESRC. - */ -static void intel_fbc_get_plane_source_size(struct intel_crtc *crtc, - int *width, int *height) -{ - struct intel_plane_state *plane_state = - to_intel_plane_state(crtc->base.primary->state); - int w, h; - - if (intel_rotation_90_or_270(plane_state->base.rotation)) { - w = drm_rect_height(&plane_state->src) >> 16; - h = drm_rect_width(&plane_state->src) >> 16; - } else { - w = drm_rect_width(&plane_state->src) >> 16; - h = drm_rect_height(&plane_state->src) >> 16; - } - - if (width) - *width = w; - if (height) - *height = h; -} - -static int intel_fbc_calculate_cfb_size(struct intel_crtc *crtc) -{ - struct drm_i915_private *dev_priv = crtc->base.dev->dev_private; - struct drm_framebuffer *fb = crtc->base.primary->fb; - int lines; - - intel_fbc_get_plane_source_size(crtc, NULL, &lines); - if (INTEL_INFO(dev_priv)->gen >= 7) - lines = min(lines, 2048); - - return lines * fb->pitches[0]; -} - -static int intel_fbc_setup_cfb(struct intel_crtc *crtc) -{ - struct drm_i915_private *dev_priv = crtc->base.dev->dev_private; - struct drm_framebuffer *fb = crtc->base.primary->fb; - int size, cpp; - - size = intel_fbc_calculate_cfb_size(crtc); - cpp = drm_format_plane_cpp(fb->pixel_format, 0); - - if (dev_priv->fbc.compressed_fb.allocated && - size <= dev_priv->fbc.compressed_fb.size * dev_priv->fbc.threshold) - return 0; - - /* Release any current block */ - __intel_fbc_cleanup_cfb(dev_priv); - - return intel_fbc_alloc_cfb(dev_priv, size, cpp); -} - static bool stride_is_valid(struct drm_i915_private *dev_priv, unsigned int stride) { @@ -904,8 +891,19 @@ static void __intel_fbc_update(struct intel_crtc *crtc) goto out_disable; } - if (intel_fbc_setup_cfb(crtc)) { - set_no_fbc_reason(dev_priv, "not enough stolen memory"); + /* It is possible for the required CFB size change without a + * crtc->disable + crtc->enable since it is possible to change the + * stride without triggering a full modeset. Since we try to + * over-allocate the CFB, there's a chance we may keep FBC enabled even + * if this happens, but if we exceed the current CFB size we'll have to + * disable FBC. Notice that it would be possible to disable FBC, wait + * for a frame, free the stolen node, then try to reenable FBC in case + * we didn't get any invalidate/deactivate calls, but this would require + * a lot of tracking just for a case we expect to be uncommon, so we + * just don't have this code for now. */ + if (intel_fbc_calculate_cfb_size(crtc, fb) > + dev_priv->fbc.compressed_fb.size * dev_priv->fbc.threshold) { + set_no_fbc_reason(dev_priv, "CFB requirements changed"); goto out_disable; } @@ -958,7 +956,6 @@ out_disable: DRM_DEBUG_KMS("unsupported config, deactivating FBC\n"); __intel_fbc_deactivate(dev_priv); } - __intel_fbc_cleanup_cfb(dev_priv); } /* @@ -1102,6 +1099,11 @@ void intel_fbc_enable(struct intel_crtc *crtc) goto out; } + if (intel_fbc_alloc_cfb(crtc)) { + set_no_fbc_reason(dev_priv, "not enough stolen memory"); + goto out; + } + DRM_DEBUG_KMS("Enabling FBC on pipe %c\n", pipe_name(crtc->pipe)); dev_priv->fbc.no_fbc_reason = "FBC enabled but not active yet\n"; @@ -1129,6 +1131,8 @@ static void __intel_fbc_disable(struct drm_i915_private *dev_priv) DRM_DEBUG_KMS("Disabling FBC on pipe %c\n", pipe_name(crtc->pipe)); + __intel_fbc_cleanup_cfb(dev_priv); + dev_priv->fbc.enabled = false; dev_priv->fbc.crtc = NULL; }
One of the problems with the current code is that it frees the CFB and releases its drm_mm node as soon as we flip FBC's enable bit. This is bad because after we disbale FBC the hardware may still use the CFB for the rest of the frame, so in theory we should only release the drm_mm node one frame after we disable FBC. Otherwise, a stolen memory allocation done right after an FBC disable may result in either corrupted memory for the new owner of that memory region or corrupted screen/underruns in case the new owner changes it while the hardware is still reading it. This case is not exactly easy to reproduce since we currently don't do a lot of stolen memory allocations, but I see patches on the mailing list trying to expose stolen memory to user space, so races will be possible. I thought about three different approaches to solve this, and they all have downsides. The first approach would be to simply use multiple drm_mm nodes and freeing the unused ones only after a frame has passed. The problem with this approach is that since stolen memory is rather small, there's a risk we just won't be able to allocate a new CFB from stolen if the previous one was not freed yet. This could happen in case we quickly disable FBC from pipe A and decide to enable it on pipe B, or just if we change pipe A's fb stride while FBC is enabled. The second approach would be similar to the first one, but maintaining a single drm_mm node and keeping track of when it can be reused. This would remove the disadvantage of not having enough space for two nodes, but would create the new problem where we may not be able to enable FBC at the point intel_fbc_update() is called, so we would have to add more code to retry updating FBC after the time has passed. And that can quickly get too complex since we can get invalidate, flush, flip_prepare, disable and other calls in the middle of the wait. Both solutions above - and also the current code - have the problem that we unnecessarily free+realloc FBC during invalidate+flush operations even if the CFB size doesn't change. The third option would be to move the allocation/deallocation to enable/disable. This makes sure that the pipe is always disabled when we allocate/deallocate the CFB, so there's no risk that the FBC hardware may read or write to the memory right after it is freed from drm_mm. The downside is that it is possible for user space to change the buffer stride without triggering a disable/enable - only deactivate/activate -, so we'll have to handle this case somehow, even though it is uncommon - see igt's kms_frontbuffer_tracking test, fbc-stridechange subtest. It could be possible to implement a way to free+alloc the CFB during said stride change, but it would involve a lot of book-keeping - exactly as mentioned above - just for a rare case, so for now I'll keep it simple and just deactivate FBC. Besides, we may not even need to disable FBC since we do CFB over-allocation. Signed-off-by: Paulo Zanoni <paulo.r.zanoni@intel.com> --- drivers/gpu/drm/i915/intel_fbc.c | 132 ++++++++++++++++++++------------------- 1 file changed, 68 insertions(+), 64 deletions(-)