diff mbox series

[RFC] drm/vc4: Add a load tracker to prevent HVS underflow errors

Message ID 20181016094045.23021-1-boris.brezillon@bootlin.com (mailing list archive)
State New, archived
Headers show
Series [RFC] drm/vc4: Add a load tracker to prevent HVS underflow errors | expand

Commit Message

Boris Brezillon Oct. 16, 2018, 9:40 a.m. UTC
The HVS block is supposed to fill the pixelvalve FIFOs fast enough to
meet the requested framerate. The problem is, the HVS and memory bus
bandwidths are limited, and if we don't take these limitations into
account we might end up with HVS underflow errors.

This patch is trying to model the per-plane HVS and memory bus bandwidth
consumption and take a decision at atomic_check() time whether the
estimated load will fit in the HVS and membus budget.

Note that we take an extra margin on the memory bus consumption to let
the system run smoothly when other blocks are doing heavy use of the
memory bus. Same goes for the HVS limit, except the margin is smaller in
this case, since the HVS is not used by external components.

Signed-off-by: Boris Brezillon <boris.brezillon@bootlin.com>
---
This logic has been validated using a simple shell script and
some instrumentation in the VC4 driver:

- capture underflow errors at the HVS level and expose a debugfs file
  reporting those errors
- add debugfs files to expose when atomic_check fails because of the
  HVS or membus load limitation or when it fails for other reasons

The branch containing those modification is available here [1], and the
script (which is internally using modetest) is here [2] (please note
that I'm bad at writing shell scripts :-)).

Note that those modification tend to over-estimate the load, and thus
reject setups that might have previously worked, so we might want to
adjust the limits to avoid that.

[1]https://github.com/bbrezillon/linux/tree/vc4/hvs-bandwidth-eval
[2]https://github.com/bbrezillon/vc4-hvs-bandwidth-test
---
 drivers/gpu/drm/vc4/vc4_drv.h   |  11 +++++
 drivers/gpu/drm/vc4/vc4_kms.c   | 104 +++++++++++++++++++++++++++++++++++++++-
 drivers/gpu/drm/vc4/vc4_plane.c |  60 +++++++++++++++++++++++
 3 files changed, 174 insertions(+), 1 deletion(-)

Comments

Daniel Vetter Oct. 16, 2018, 12:57 p.m. UTC | #1
On Tue, Oct 16, 2018 at 11:40:45AM +0200, Boris Brezillon wrote:
> The HVS block is supposed to fill the pixelvalve FIFOs fast enough to
> meet the requested framerate. The problem is, the HVS and memory bus
> bandwidths are limited, and if we don't take these limitations into
> account we might end up with HVS underflow errors.
> 
> This patch is trying to model the per-plane HVS and memory bus bandwidth
> consumption and take a decision at atomic_check() time whether the
> estimated load will fit in the HVS and membus budget.
> 
> Note that we take an extra margin on the memory bus consumption to let
> the system run smoothly when other blocks are doing heavy use of the
> memory bus. Same goes for the HVS limit, except the margin is smaller in
> this case, since the HVS is not used by external components.
> 
> Signed-off-by: Boris Brezillon <boris.brezillon@bootlin.com>
> ---
> This logic has been validated using a simple shell script and
> some instrumentation in the VC4 driver:
> 
> - capture underflow errors at the HVS level and expose a debugfs file
>   reporting those errors
> - add debugfs files to expose when atomic_check fails because of the
>   HVS or membus load limitation or when it fails for other reasons
> 
> The branch containing those modification is available here [1], and the
> script (which is internally using modetest) is here [2] (please note
> that I'm bad at writing shell scripts :-)).
> 
> Note that those modification tend to over-estimate the load, and thus
> reject setups that might have previously worked, so we might want to
> adjust the limits to avoid that.
> 
> [1]https://github.com/bbrezillon/linux/tree/vc4/hvs-bandwidth-eval
> [2]https://github.com/bbrezillon/vc4-hvs-bandwidth-test

Any interest in using igt to test this stuff? We have at least a bunch of
tests already in there that try all kinds of plane setups. And we use
those to hunt for underruns on i915 hw.

Wrt underrun reporting: On i915 we just dump them into dmesg at the error
level, using DRM_ERROR, plus a tracepoint. See e.g.
intel_pch_fifo_underrun_irq_handler(). If there's interest we could
perhaps extract this into something common, similar to what was done with
crc support already.

> ---
>  drivers/gpu/drm/vc4/vc4_drv.h   |  11 +++++
>  drivers/gpu/drm/vc4/vc4_kms.c   | 104 +++++++++++++++++++++++++++++++++++++++-
>  drivers/gpu/drm/vc4/vc4_plane.c |  60 +++++++++++++++++++++++
>  3 files changed, 174 insertions(+), 1 deletion(-)
> 
> diff --git a/drivers/gpu/drm/vc4/vc4_drv.h b/drivers/gpu/drm/vc4/vc4_drv.h
> index bd6ef1f31822..48f6ee5ceda3 100644
> --- a/drivers/gpu/drm/vc4/vc4_drv.h
> +++ b/drivers/gpu/drm/vc4/vc4_drv.h
> @@ -200,6 +200,7 @@ struct vc4_dev {
>  
>  	struct drm_modeset_lock ctm_state_lock;
>  	struct drm_private_obj ctm_manager;
> +	struct drm_private_obj load_tracker;
>  };
>  
>  static inline struct vc4_dev *
> @@ -369,6 +370,16 @@ struct vc4_plane_state {
>  	 * to enable background color fill.
>  	 */
>  	bool needs_bg_fill;
> +
> +	/* Load of this plane on the HVS block. The load is expressed in HVS
> +	 * cycles/sec.
> +	 */
> +	u64 hvs_load;
> +
> +	/* Memory bandwidth needed for this plane. This is expressed in
> +	 * bytes/sec.
> +	 */
> +	u64 membus_load;
>  };
>  
>  static inline struct vc4_plane_state *
> diff --git a/drivers/gpu/drm/vc4/vc4_kms.c b/drivers/gpu/drm/vc4/vc4_kms.c
> index 127468785f74..4c65e6013bd3 100644
> --- a/drivers/gpu/drm/vc4/vc4_kms.c
> +++ b/drivers/gpu/drm/vc4/vc4_kms.c
> @@ -34,6 +34,18 @@ static struct vc4_ctm_state *to_vc4_ctm_state(struct drm_private_state *priv)
>  	return container_of(priv, struct vc4_ctm_state, base);
>  }
>  
> +struct vc4_load_tracker_state {
> +	struct drm_private_state base;
> +	u64 hvs_load;
> +	u64 membus_load;
> +};
> +
> +static struct vc4_load_tracker_state *
> +to_vc4_load_tracker_state(struct drm_private_state *priv)
> +{
> +	return container_of(priv, struct vc4_load_tracker_state, base);
> +}
> +
>  static struct vc4_ctm_state *vc4_get_ctm_state(struct drm_atomic_state *state,
>  					       struct drm_private_obj *manager)
>  {
> @@ -379,6 +391,81 @@ vc4_ctm_atomic_check(struct drm_device *dev, struct drm_atomic_state *state)
>  	return 0;
>  }
>  
> +static int vc4_load_tracker_atomic_check(struct drm_atomic_state *state)
> +{
> +	struct drm_plane_state *old_plane_state, *new_plane_state;
> +	struct vc4_dev *vc4 = to_vc4_dev(state->dev);
> +	struct vc4_load_tracker_state *load_state;
> +	struct drm_private_state *priv_state;
> +	struct drm_plane *plane;
> +	int ret, i;
> +

You're missing the modeset locking for vc4->load_tracker. See the
kerneldoc for drm_atomic_get_private_obj_state(). Probably a good time to
implement the locking refactoring idea I have and just implement a per
private_obj lock, and remove all the ad-hoc locking from all the callers?

Would definitely simplify the code, and avoid "oops no locking" issues
like here.

Cheers, Daniel

> +	priv_state = drm_atomic_get_private_obj_state(state,
> +						      &vc4->load_tracker);
> +	if (IS_ERR(priv_state))
> +		return PTR_ERR(priv_state);
> +
> +	load_state = to_vc4_load_tracker_state(priv_state);
> +	for_each_oldnew_plane_in_state(state, plane, old_plane_state,
> +				       new_plane_state, i) {
> +		struct vc4_plane_state *vc4_plane_state;
> +
> +		if (old_plane_state->fb && old_plane_state->crtc) {
> +			vc4_plane_state = to_vc4_plane_state(old_plane_state);
> +			load_state->membus_load -= vc4_plane_state->membus_load;
> +			load_state->hvs_load -= vc4_plane_state->hvs_load;
> +		}
> +
> +		if (new_plane_state->fb && new_plane_state->crtc) {
> +			vc4_plane_state = to_vc4_plane_state(new_plane_state);
> +			load_state->membus_load += vc4_plane_state->membus_load;
> +			load_state->hvs_load += vc4_plane_state->hvs_load;
> +		}
> +	}
> +
> +	/* The abolsute limit is 2Gbyte/sec, but let's take a margin to let
> +	 * the system work when other blocks are accessing the memory.
> +	 */
> +	if (load_state->membus_load > SZ_1G + SZ_512M)
> +		return -ENOSPC;
> +
> +	/* HVS clock is supposed to run @ 250Mhz, let's take a margin and
> +	 * consider the maximum number of cycles is 240M.
> +	 */
> +	if (load_state->hvs_load > 240000000ULL)
> +		return -ENOSPC;

EINVAL is for atomic_check failures. ENOSPC isn't one of the permitted
errno codes, see the kernel-doc for &drm_mode_config_funcs.atomic_check.
atomic_commit has a different set of permissible errno codes.

We should probably enforce this in drm core ...
-Daniel

> +
> +	return 0;
> +}
> +
> +static struct drm_private_state *
> +vc4_load_tracker_duplicate_state(struct drm_private_obj *obj)
> +{
> +	struct vc4_load_tracker_state *state;
> +
> +	state = kmemdup(obj->state, sizeof(*state), GFP_KERNEL);
> +	if (!state)
> +		return NULL;
> +
> +	__drm_atomic_helper_private_obj_duplicate_state(obj, &state->base);
> +
> +	return &state->base;
> +}
> +
> +static void vc4_load_tracker_destroy_state(struct drm_private_obj *obj,
> +					   struct drm_private_state *state)
> +{
> +	struct vc4_load_tracker_state *load_state;
> +
> +	load_state = to_vc4_load_tracker_state(state);
> +	kfree(load_state);
> +}
> +
> +static const struct drm_private_state_funcs vc4_load_tracker_state_funcs = {
> +	.atomic_duplicate_state = vc4_load_tracker_duplicate_state,
> +	.atomic_destroy_state = vc4_load_tracker_destroy_state,
> +};
> +
>  static int
>  vc4_atomic_check(struct drm_device *dev, struct drm_atomic_state *state)
>  {
> @@ -388,7 +475,11 @@ vc4_atomic_check(struct drm_device *dev, struct drm_atomic_state *state)
>  	if (ret < 0)
>  		return ret;
>  
> -	return drm_atomic_helper_check(dev, state);
> +	ret = drm_atomic_helper_check(dev, state);
> +	if (ret)
> +		return ret;
> +
> +	return vc4_load_tracker_atomic_check(state);
>  }
>  
>  static const struct drm_mode_config_funcs vc4_mode_funcs = {
> @@ -401,6 +492,7 @@ int vc4_kms_load(struct drm_device *dev)
>  {
>  	struct vc4_dev *vc4 = to_vc4_dev(dev);
>  	struct vc4_ctm_state *ctm_state;
> +	struct vc4_load_tracker_state *load_state;
>  	int ret;
>  
>  	sema_init(&vc4->async_modeset, 1);
> @@ -426,9 +518,19 @@ int vc4_kms_load(struct drm_device *dev)
>  	ctm_state = kzalloc(sizeof(*ctm_state), GFP_KERNEL);
>  	if (!ctm_state)
>  		return -ENOMEM;
> +
>  	drm_atomic_private_obj_init(&vc4->ctm_manager, &ctm_state->base,
>  				    &vc4_ctm_state_funcs);
>  
> +	load_state = kzalloc(sizeof(*load_state), GFP_KERNEL);
> +	if (!load_state) {
> +		drm_atomic_private_obj_fini(&vc4->ctm_manager);
> +		return -ENOMEM;
> +	}
> +
> +	drm_atomic_private_obj_init(&vc4->load_tracker, &load_state->base,
> +				    &vc4_load_tracker_state_funcs);
> +
>  	drm_mode_config_reset(dev);
>  
>  	drm_kms_helper_poll_init(dev);
> diff --git a/drivers/gpu/drm/vc4/vc4_plane.c b/drivers/gpu/drm/vc4/vc4_plane.c
> index 60d5ad19cedd..f47d38383a2f 100644
> --- a/drivers/gpu/drm/vc4/vc4_plane.c
> +++ b/drivers/gpu/drm/vc4/vc4_plane.c
> @@ -455,6 +455,64 @@ static void vc4_write_scaling_parameters(struct drm_plane_state *state,
>  	}
>  }
>  
> +static void vc4_plane_calc_load(struct drm_plane_state *state)
> +{
> +	unsigned int hvs_load_shift, vrefresh, i;
> +	struct drm_framebuffer *fb = state->fb;
> +	struct vc4_plane_state *vc4_state;
> +	struct drm_crtc_state *crtc_state;
> +	unsigned int vscale_factor;
> +
> +	vc4_state = to_vc4_plane_state(state);
> +	crtc_state = drm_atomic_get_existing_crtc_state(state->state,
> +							state->crtc);
> +	vrefresh = drm_mode_vrefresh(&crtc_state->adjusted_mode);
> +
> +	/* The HVS is able to process 2 pixels/cycle when scaling the source,
> +	 * 4 pixels/cycle otherwise.
> +	 * Alpha blending step seems to be pipelined and it's always operating
> +	 * at 4 pixels/cycle, so the limiting aspect here seems to be the
> +	 * scaler block.
> +	 * HVS load is expressed in clk-cycles/sec (AKA Hz).
> +	 */
> +	if (vc4_state->x_scaling[0] != VC4_SCALING_NONE ||
> +	    vc4_state->x_scaling[1] != VC4_SCALING_NONE ||
> +	    vc4_state->y_scaling[0] != VC4_SCALING_NONE ||
> +	    vc4_state->y_scaling[1] != VC4_SCALING_NONE)
> +		hvs_load_shift = 1;
> +	else
> +		hvs_load_shift = 2;
> +
> +	vc4_state->membus_load = 0;
> +	vc4_state->hvs_load = 0;
> +	for (i = 0; i < fb->format->num_planes; i++) {
> +		unsigned long pixels_load;
> +
> +		/* Even if the bandwidth/plane required for a single frame is
> +		 *
> +		 * vc4_state->src_w[i] * vc4_state->src_h[i] * cpp * vrefresh
> +		 *
> +		 * when downscaling, we have to read more pixels per line in
> +		 * the time frame reserved for a single line, so the bandwidth
> +		 * demand can be punctually higher. To account for that, we
> +		 * calculate the down-scaling factor and multiply the plane
> +		 * load by this number. We're likely over-estimating the read
> +		 * demand, but that's better than under-estimating it.
> +		 */
> +		vscale_factor = DIV_ROUND_UP(vc4_state->src_h[i],
> +					     vc4_state->crtc_h);
> +		pixels_load = vc4_state->src_w[i] * vc4_state->src_h[i] *
> +			      vscale_factor;
> +
> +		vc4_state->membus_load += fb->format->cpp[i] * pixels_load;
> +		vc4_state->hvs_load += pixels_load;
> +	}
> +
> +	vc4_state->hvs_load *= vrefresh;
> +	vc4_state->hvs_load >>= hvs_load_shift;
> +	vc4_state->membus_load *= vrefresh;
> +}
> +
>  /* Writes out a full display list for an active plane to the plane's
>   * private dlist state.
>   */
> @@ -722,6 +780,8 @@ static int vc4_plane_mode_set(struct drm_plane *plane,
>  	vc4_state->needs_bg_fill = fb->format->has_alpha || !covers_screen ||
>  				   state->alpha != DRM_BLEND_ALPHA_OPAQUE;
>  
> +	vc4_plane_calc_load(state);
> +
>  	return 0;
>  }
>  
> -- 
> 2.14.1
> 
> _______________________________________________
> dri-devel mailing list
> dri-devel@lists.freedesktop.org
> https://lists.freedesktop.org/mailman/listinfo/dri-devel
Boris Brezillon Oct. 16, 2018, 1:10 p.m. UTC | #2
Hi Daniel,

On Tue, 16 Oct 2018 14:57:43 +0200
Daniel Vetter <daniel@ffwll.ch> wrote:

> On Tue, Oct 16, 2018 at 11:40:45AM +0200, Boris Brezillon wrote:
> > The HVS block is supposed to fill the pixelvalve FIFOs fast enough to
> > meet the requested framerate. The problem is, the HVS and memory bus
> > bandwidths are limited, and if we don't take these limitations into
> > account we might end up with HVS underflow errors.
> > 
> > This patch is trying to model the per-plane HVS and memory bus bandwidth
> > consumption and take a decision at atomic_check() time whether the
> > estimated load will fit in the HVS and membus budget.
> > 
> > Note that we take an extra margin on the memory bus consumption to let
> > the system run smoothly when other blocks are doing heavy use of the
> > memory bus. Same goes for the HVS limit, except the margin is smaller in
> > this case, since the HVS is not used by external components.
> > 
> > Signed-off-by: Boris Brezillon <boris.brezillon@bootlin.com>
> > ---
> > This logic has been validated using a simple shell script and
> > some instrumentation in the VC4 driver:
> > 
> > - capture underflow errors at the HVS level and expose a debugfs file
> >   reporting those errors
> > - add debugfs files to expose when atomic_check fails because of the
> >   HVS or membus load limitation or when it fails for other reasons
> > 
> > The branch containing those modification is available here [1], and the
> > script (which is internally using modetest) is here [2] (please note
> > that I'm bad at writing shell scripts :-)).
> > 
> > Note that those modification tend to over-estimate the load, and thus
> > reject setups that might have previously worked, so we might want to
> > adjust the limits to avoid that.
> > 
> > [1]https://github.com/bbrezillon/linux/tree/vc4/hvs-bandwidth-eval
> > [2]https://github.com/bbrezillon/vc4-hvs-bandwidth-test  
> 
> Any interest in using igt to test this stuff? We have at least a bunch of
> tests already in there that try all kinds of plane setups. And we use
> those to hunt for underruns on i915 hw.
> 
> Wrt underrun reporting: On i915 we just dump them into dmesg at the error
> level, using DRM_ERROR, plus a tracepoint. See e.g.
> intel_pch_fifo_underrun_irq_handler(). If there's interest we could
> perhaps extract this into something common, similar to what was done with
> crc support already.

Sounds like a good idea. I'll have a look at what's done in the intel
driver and will check how feasible this is to have a common
infrastructure to test that in igt.

Thanks for the pointers.


> > +static int vc4_load_tracker_atomic_check(struct drm_atomic_state *state)
> > +{
> > +	struct drm_plane_state *old_plane_state, *new_plane_state;
> > +	struct vc4_dev *vc4 = to_vc4_dev(state->dev);
> > +	struct vc4_load_tracker_state *load_state;
> > +	struct drm_private_state *priv_state;
> > +	struct drm_plane *plane;
> > +	int ret, i;
> > +  
> 
> You're missing the modeset locking for vc4->load_tracker. See the
> kerneldoc for drm_atomic_get_private_obj_state().

Hm, I missed this part of the doc, and I thought we were protected by
the modeset lock.

> Probably a good time to
> implement the locking refactoring idea I have and just implement a per
> private_obj lock, and remove all the ad-hoc locking from all the callers?

Let me see if I get it correctly. You want to add a lock to the
drm_private_obj struct, right?

> 
> Would definitely simplify the code, and avoid "oops no locking" issues
> like here.

Yep, I agree.

> 
> Cheers, Daniel
> 
> > +	priv_state = drm_atomic_get_private_obj_state(state,
> > +						      &vc4->load_tracker);
> > +	if (IS_ERR(priv_state))
> > +		return PTR_ERR(priv_state);
> > +
> > +	load_state = to_vc4_load_tracker_state(priv_state);
> > +	for_each_oldnew_plane_in_state(state, plane, old_plane_state,
> > +				       new_plane_state, i) {
> > +		struct vc4_plane_state *vc4_plane_state;
> > +
> > +		if (old_plane_state->fb && old_plane_state->crtc) {
> > +			vc4_plane_state = to_vc4_plane_state(old_plane_state);
> > +			load_state->membus_load -= vc4_plane_state->membus_load;
> > +			load_state->hvs_load -= vc4_plane_state->hvs_load;
> > +		}
> > +
> > +		if (new_plane_state->fb && new_plane_state->crtc) {
> > +			vc4_plane_state = to_vc4_plane_state(new_plane_state);
> > +			load_state->membus_load += vc4_plane_state->membus_load;
> > +			load_state->hvs_load += vc4_plane_state->hvs_load;
> > +		}
> > +	}
> > +
> > +	/* The abolsute limit is 2Gbyte/sec, but let's take a margin to let
> > +	 * the system work when other blocks are accessing the memory.
> > +	 */
> > +	if (load_state->membus_load > SZ_1G + SZ_512M)
> > +		return -ENOSPC;
> > +
> > +	/* HVS clock is supposed to run @ 250Mhz, let's take a margin and
> > +	 * consider the maximum number of cycles is 240M.
> > +	 */
> > +	if (load_state->hvs_load > 240000000ULL)
> > +		return -ENOSPC;  
> 
> EINVAL is for atomic_check failures. ENOSPC isn't one of the permitted
> errno codes, see the kernel-doc for &drm_mode_config_funcs.atomic_check.
> atomic_commit has a different set of permissible errno codes.

Okay, will change ENOSPC into EINVAL.

> 
> We should probably enforce this in drm core ...

Enough on my plate for now, I'll let someone else take care of
that if you don't mind :-P.

Thanks for the review.

Boris
Daniel Vetter Oct. 16, 2018, 1:12 p.m. UTC | #3
On Tue, Oct 16, 2018 at 3:10 PM Boris Brezillon
<boris.brezillon@bootlin.com> wrote:
>
> Hi Daniel,
>
> On Tue, 16 Oct 2018 14:57:43 +0200
> Daniel Vetter <daniel@ffwll.ch> wrote:
>
> > On Tue, Oct 16, 2018 at 11:40:45AM +0200, Boris Brezillon wrote:
> > > The HVS block is supposed to fill the pixelvalve FIFOs fast enough to
> > > meet the requested framerate. The problem is, the HVS and memory bus
> > > bandwidths are limited, and if we don't take these limitations into
> > > account we might end up with HVS underflow errors.
> > >
> > > This patch is trying to model the per-plane HVS and memory bus bandwidth
> > > consumption and take a decision at atomic_check() time whether the
> > > estimated load will fit in the HVS and membus budget.
> > >
> > > Note that we take an extra margin on the memory bus consumption to let
> > > the system run smoothly when other blocks are doing heavy use of the
> > > memory bus. Same goes for the HVS limit, except the margin is smaller in
> > > this case, since the HVS is not used by external components.
> > >
> > > Signed-off-by: Boris Brezillon <boris.brezillon@bootlin.com>
> > > ---
> > > This logic has been validated using a simple shell script and
> > > some instrumentation in the VC4 driver:
> > >
> > > - capture underflow errors at the HVS level and expose a debugfs file
> > >   reporting those errors
> > > - add debugfs files to expose when atomic_check fails because of the
> > >   HVS or membus load limitation or when it fails for other reasons
> > >
> > > The branch containing those modification is available here [1], and the
> > > script (which is internally using modetest) is here [2] (please note
> > > that I'm bad at writing shell scripts :-)).
> > >
> > > Note that those modification tend to over-estimate the load, and thus
> > > reject setups that might have previously worked, so we might want to
> > > adjust the limits to avoid that.
> > >
> > > [1]https://github.com/bbrezillon/linux/tree/vc4/hvs-bandwidth-eval
> > > [2]https://github.com/bbrezillon/vc4-hvs-bandwidth-test
> >
> > Any interest in using igt to test this stuff? We have at least a bunch of
> > tests already in there that try all kinds of plane setups. And we use
> > those to hunt for underruns on i915 hw.
> >
> > Wrt underrun reporting: On i915 we just dump them into dmesg at the error
> > level, using DRM_ERROR, plus a tracepoint. See e.g.
> > intel_pch_fifo_underrun_irq_handler(). If there's interest we could
> > perhaps extract this into something common, similar to what was done with
> > crc support already.
>
> Sounds like a good idea. I'll have a look at what's done in the intel
> driver and will check how feasible this is to have a common
> infrastructure to test that in igt.
>
> Thanks for the pointers.
>
>
> > > +static int vc4_load_tracker_atomic_check(struct drm_atomic_state *state)
> > > +{
> > > +   struct drm_plane_state *old_plane_state, *new_plane_state;
> > > +   struct vc4_dev *vc4 = to_vc4_dev(state->dev);
> > > +   struct vc4_load_tracker_state *load_state;
> > > +   struct drm_private_state *priv_state;
> > > +   struct drm_plane *plane;
> > > +   int ret, i;
> > > +
> >
> > You're missing the modeset locking for vc4->load_tracker. See the
> > kerneldoc for drm_atomic_get_private_obj_state().
>
> Hm, I missed this part of the doc, and I thought we were protected by
> the modeset lock.
>
> > Probably a good time to
> > implement the locking refactoring idea I have and just implement a per
> > private_obj lock, and remove all the ad-hoc locking from all the callers?
>
> Let me see if I get it correctly. You want to add a lock to the
> drm_private_obj struct, right?

Yup, plus automatically grab that lock in get_private_obj_state, and
then drop the now superfluous locking from all the callers.

>
> >
> > Would definitely simplify the code, and avoid "oops no locking" issues
> > like here.
>
> Yep, I agree.
>
> >
> > Cheers, Daniel
> >
> > > +   priv_state = drm_atomic_get_private_obj_state(state,
> > > +                                                 &vc4->load_tracker);
> > > +   if (IS_ERR(priv_state))
> > > +           return PTR_ERR(priv_state);
> > > +
> > > +   load_state = to_vc4_load_tracker_state(priv_state);
> > > +   for_each_oldnew_plane_in_state(state, plane, old_plane_state,
> > > +                                  new_plane_state, i) {
> > > +           struct vc4_plane_state *vc4_plane_state;
> > > +
> > > +           if (old_plane_state->fb && old_plane_state->crtc) {
> > > +                   vc4_plane_state = to_vc4_plane_state(old_plane_state);
> > > +                   load_state->membus_load -= vc4_plane_state->membus_load;
> > > +                   load_state->hvs_load -= vc4_plane_state->hvs_load;
> > > +           }
> > > +
> > > +           if (new_plane_state->fb && new_plane_state->crtc) {
> > > +                   vc4_plane_state = to_vc4_plane_state(new_plane_state);
> > > +                   load_state->membus_load += vc4_plane_state->membus_load;
> > > +                   load_state->hvs_load += vc4_plane_state->hvs_load;
> > > +           }
> > > +   }
> > > +
> > > +   /* The abolsute limit is 2Gbyte/sec, but let's take a margin to let
> > > +    * the system work when other blocks are accessing the memory.
> > > +    */
> > > +   if (load_state->membus_load > SZ_1G + SZ_512M)
> > > +           return -ENOSPC;
> > > +
> > > +   /* HVS clock is supposed to run @ 250Mhz, let's take a margin and
> > > +    * consider the maximum number of cycles is 240M.
> > > +    */
> > > +   if (load_state->hvs_load > 240000000ULL)
> > > +           return -ENOSPC;
> >
> > EINVAL is for atomic_check failures. ENOSPC isn't one of the permitted
> > errno codes, see the kernel-doc for &drm_mode_config_funcs.atomic_check.
> > atomic_commit has a different set of permissible errno codes.
>
> Okay, will change ENOSPC into EINVAL.
>
> >
> > We should probably enforce this in drm core ...
>
> Enough on my plate for now, I'll let someone else take care of
> that if you don't mind :-P.

Hm, would make for a neat starter task in Documentation/gpu/todo.rst.
If I can volunteer you for the typing :-)
-Daniel

>
> Thanks for the review.
>
> Boris
Ville Syrjälä Oct. 16, 2018, 1:28 p.m. UTC | #4
On Tue, Oct 16, 2018 at 03:12:54PM +0200, Daniel Vetter wrote:
> On Tue, Oct 16, 2018 at 3:10 PM Boris Brezillon
> <boris.brezillon@bootlin.com> wrote:
> >
> > Hi Daniel,
> >
> > On Tue, 16 Oct 2018 14:57:43 +0200
> > Daniel Vetter <daniel@ffwll.ch> wrote:
> >
> > > On Tue, Oct 16, 2018 at 11:40:45AM +0200, Boris Brezillon wrote:
> > > > The HVS block is supposed to fill the pixelvalve FIFOs fast enough to
> > > > meet the requested framerate. The problem is, the HVS and memory bus
> > > > bandwidths are limited, and if we don't take these limitations into
> > > > account we might end up with HVS underflow errors.
> > > >
> > > > This patch is trying to model the per-plane HVS and memory bus bandwidth
> > > > consumption and take a decision at atomic_check() time whether the
> > > > estimated load will fit in the HVS and membus budget.
> > > >
> > > > Note that we take an extra margin on the memory bus consumption to let
> > > > the system run smoothly when other blocks are doing heavy use of the
> > > > memory bus. Same goes for the HVS limit, except the margin is smaller in
> > > > this case, since the HVS is not used by external components.
> > > >
> > > > Signed-off-by: Boris Brezillon <boris.brezillon@bootlin.com>
> > > > ---
> > > > This logic has been validated using a simple shell script and
> > > > some instrumentation in the VC4 driver:
> > > >
> > > > - capture underflow errors at the HVS level and expose a debugfs file
> > > >   reporting those errors
> > > > - add debugfs files to expose when atomic_check fails because of the
> > > >   HVS or membus load limitation or when it fails for other reasons
> > > >
> > > > The branch containing those modification is available here [1], and the
> > > > script (which is internally using modetest) is here [2] (please note
> > > > that I'm bad at writing shell scripts :-)).
> > > >
> > > > Note that those modification tend to over-estimate the load, and thus
> > > > reject setups that might have previously worked, so we might want to
> > > > adjust the limits to avoid that.
> > > >
> > > > [1]https://github.com/bbrezillon/linux/tree/vc4/hvs-bandwidth-eval
> > > > [2]https://github.com/bbrezillon/vc4-hvs-bandwidth-test
> > >
> > > Any interest in using igt to test this stuff? We have at least a bunch of
> > > tests already in there that try all kinds of plane setups. And we use
> > > those to hunt for underruns on i915 hw.
> > >
> > > Wrt underrun reporting: On i915 we just dump them into dmesg at the error
> > > level, using DRM_ERROR, plus a tracepoint. See e.g.
> > > intel_pch_fifo_underrun_irq_handler(). If there's interest we could
> > > perhaps extract this into something common, similar to what was done with
> > > crc support already.
> >
> > Sounds like a good idea. I'll have a look at what's done in the intel
> > driver and will check how feasible this is to have a common
> > infrastructure to test that in igt.
> >
> > Thanks for the pointers.
> >
> >
> > > > +static int vc4_load_tracker_atomic_check(struct drm_atomic_state *state)
> > > > +{
> > > > +   struct drm_plane_state *old_plane_state, *new_plane_state;
> > > > +   struct vc4_dev *vc4 = to_vc4_dev(state->dev);
> > > > +   struct vc4_load_tracker_state *load_state;
> > > > +   struct drm_private_state *priv_state;
> > > > +   struct drm_plane *plane;
> > > > +   int ret, i;
> > > > +
> > >
> > > You're missing the modeset locking for vc4->load_tracker. See the
> > > kerneldoc for drm_atomic_get_private_obj_state().
> >
> > Hm, I missed this part of the doc, and I thought we were protected by
> > the modeset lock.
> >
> > > Probably a good time to
> > > implement the locking refactoring idea I have and just implement a per
> > > private_obj lock, and remove all the ad-hoc locking from all the callers?
> >
> > Let me see if I get it correctly. You want to add a lock to the
> > drm_private_obj struct, right?
> 
> Yup, plus automatically grab that lock in get_private_obj_state, and
> then drop the now superfluous locking from all the callers.

You mean this?
https://patchwork.freedesktop.org/patch/205943/
Daniel Vetter Oct. 16, 2018, 4:41 p.m. UTC | #5
On Tue, Oct 16, 2018 at 04:28:09PM +0300, Ville Syrjälä wrote:
> On Tue, Oct 16, 2018 at 03:12:54PM +0200, Daniel Vetter wrote:
> > On Tue, Oct 16, 2018 at 3:10 PM Boris Brezillon
> > <boris.brezillon@bootlin.com> wrote:
> > >
> > > Hi Daniel,
> > >
> > > On Tue, 16 Oct 2018 14:57:43 +0200
> > > Daniel Vetter <daniel@ffwll.ch> wrote:
> > >
> > > > On Tue, Oct 16, 2018 at 11:40:45AM +0200, Boris Brezillon wrote:
> > > > > The HVS block is supposed to fill the pixelvalve FIFOs fast enough to
> > > > > meet the requested framerate. The problem is, the HVS and memory bus
> > > > > bandwidths are limited, and if we don't take these limitations into
> > > > > account we might end up with HVS underflow errors.
> > > > >
> > > > > This patch is trying to model the per-plane HVS and memory bus bandwidth
> > > > > consumption and take a decision at atomic_check() time whether the
> > > > > estimated load will fit in the HVS and membus budget.
> > > > >
> > > > > Note that we take an extra margin on the memory bus consumption to let
> > > > > the system run smoothly when other blocks are doing heavy use of the
> > > > > memory bus. Same goes for the HVS limit, except the margin is smaller in
> > > > > this case, since the HVS is not used by external components.
> > > > >
> > > > > Signed-off-by: Boris Brezillon <boris.brezillon@bootlin.com>
> > > > > ---
> > > > > This logic has been validated using a simple shell script and
> > > > > some instrumentation in the VC4 driver:
> > > > >
> > > > > - capture underflow errors at the HVS level and expose a debugfs file
> > > > >   reporting those errors
> > > > > - add debugfs files to expose when atomic_check fails because of the
> > > > >   HVS or membus load limitation or when it fails for other reasons
> > > > >
> > > > > The branch containing those modification is available here [1], and the
> > > > > script (which is internally using modetest) is here [2] (please note
> > > > > that I'm bad at writing shell scripts :-)).
> > > > >
> > > > > Note that those modification tend to over-estimate the load, and thus
> > > > > reject setups that might have previously worked, so we might want to
> > > > > adjust the limits to avoid that.
> > > > >
> > > > > [1]https://github.com/bbrezillon/linux/tree/vc4/hvs-bandwidth-eval
> > > > > [2]https://github.com/bbrezillon/vc4-hvs-bandwidth-test
> > > >
> > > > Any interest in using igt to test this stuff? We have at least a bunch of
> > > > tests already in there that try all kinds of plane setups. And we use
> > > > those to hunt for underruns on i915 hw.
> > > >
> > > > Wrt underrun reporting: On i915 we just dump them into dmesg at the error
> > > > level, using DRM_ERROR, plus a tracepoint. See e.g.
> > > > intel_pch_fifo_underrun_irq_handler(). If there's interest we could
> > > > perhaps extract this into something common, similar to what was done with
> > > > crc support already.
> > >
> > > Sounds like a good idea. I'll have a look at what's done in the intel
> > > driver and will check how feasible this is to have a common
> > > infrastructure to test that in igt.
> > >
> > > Thanks for the pointers.
> > >
> > >
> > > > > +static int vc4_load_tracker_atomic_check(struct drm_atomic_state *state)
> > > > > +{
> > > > > +   struct drm_plane_state *old_plane_state, *new_plane_state;
> > > > > +   struct vc4_dev *vc4 = to_vc4_dev(state->dev);
> > > > > +   struct vc4_load_tracker_state *load_state;
> > > > > +   struct drm_private_state *priv_state;
> > > > > +   struct drm_plane *plane;
> > > > > +   int ret, i;
> > > > > +
> > > >
> > > > You're missing the modeset locking for vc4->load_tracker. See the
> > > > kerneldoc for drm_atomic_get_private_obj_state().
> > >
> > > Hm, I missed this part of the doc, and I thought we were protected by
> > > the modeset lock.
> > >
> > > > Probably a good time to
> > > > implement the locking refactoring idea I have and just implement a per
> > > > private_obj lock, and remove all the ad-hoc locking from all the callers?
> > >
> > > Let me see if I get it correctly. You want to add a lock to the
> > > drm_private_obj struct, right?
> > 
> > Yup, plus automatically grab that lock in get_private_obj_state, and
> > then drop the now superfluous locking from all the callers.
> 
> You mean this?
> https://patchwork.freedesktop.org/patch/205943/

Yup, that one exactly. No idea why it died in a bikeshed, except maybe we
should put all the private obj on a list and add some code to
lock_all_ctx() to also lock those.
-Daniel
Ville Syrjälä Oct. 16, 2018, 5:39 p.m. UTC | #6
On Tue, Oct 16, 2018 at 06:41:51PM +0200, Daniel Vetter wrote:
> On Tue, Oct 16, 2018 at 04:28:09PM +0300, Ville Syrjälä wrote:
> > On Tue, Oct 16, 2018 at 03:12:54PM +0200, Daniel Vetter wrote:
> > > On Tue, Oct 16, 2018 at 3:10 PM Boris Brezillon
> > > <boris.brezillon@bootlin.com> wrote:
> > > >
> > > > Hi Daniel,
> > > >
> > > > On Tue, 16 Oct 2018 14:57:43 +0200
> > > > Daniel Vetter <daniel@ffwll.ch> wrote:
> > > >
> > > > > On Tue, Oct 16, 2018 at 11:40:45AM +0200, Boris Brezillon wrote:
> > > > > > The HVS block is supposed to fill the pixelvalve FIFOs fast enough to
> > > > > > meet the requested framerate. The problem is, the HVS and memory bus
> > > > > > bandwidths are limited, and if we don't take these limitations into
> > > > > > account we might end up with HVS underflow errors.
> > > > > >
> > > > > > This patch is trying to model the per-plane HVS and memory bus bandwidth
> > > > > > consumption and take a decision at atomic_check() time whether the
> > > > > > estimated load will fit in the HVS and membus budget.
> > > > > >
> > > > > > Note that we take an extra margin on the memory bus consumption to let
> > > > > > the system run smoothly when other blocks are doing heavy use of the
> > > > > > memory bus. Same goes for the HVS limit, except the margin is smaller in
> > > > > > this case, since the HVS is not used by external components.
> > > > > >
> > > > > > Signed-off-by: Boris Brezillon <boris.brezillon@bootlin.com>
> > > > > > ---
> > > > > > This logic has been validated using a simple shell script and
> > > > > > some instrumentation in the VC4 driver:
> > > > > >
> > > > > > - capture underflow errors at the HVS level and expose a debugfs file
> > > > > >   reporting those errors
> > > > > > - add debugfs files to expose when atomic_check fails because of the
> > > > > >   HVS or membus load limitation or when it fails for other reasons
> > > > > >
> > > > > > The branch containing those modification is available here [1], and the
> > > > > > script (which is internally using modetest) is here [2] (please note
> > > > > > that I'm bad at writing shell scripts :-)).
> > > > > >
> > > > > > Note that those modification tend to over-estimate the load, and thus
> > > > > > reject setups that might have previously worked, so we might want to
> > > > > > adjust the limits to avoid that.
> > > > > >
> > > > > > [1]https://github.com/bbrezillon/linux/tree/vc4/hvs-bandwidth-eval
> > > > > > [2]https://github.com/bbrezillon/vc4-hvs-bandwidth-test
> > > > >
> > > > > Any interest in using igt to test this stuff? We have at least a bunch of
> > > > > tests already in there that try all kinds of plane setups. And we use
> > > > > those to hunt for underruns on i915 hw.
> > > > >
> > > > > Wrt underrun reporting: On i915 we just dump them into dmesg at the error
> > > > > level, using DRM_ERROR, plus a tracepoint. See e.g.
> > > > > intel_pch_fifo_underrun_irq_handler(). If there's interest we could
> > > > > perhaps extract this into something common, similar to what was done with
> > > > > crc support already.
> > > >
> > > > Sounds like a good idea. I'll have a look at what's done in the intel
> > > > driver and will check how feasible this is to have a common
> > > > infrastructure to test that in igt.
> > > >
> > > > Thanks for the pointers.
> > > >
> > > >
> > > > > > +static int vc4_load_tracker_atomic_check(struct drm_atomic_state *state)
> > > > > > +{
> > > > > > +   struct drm_plane_state *old_plane_state, *new_plane_state;
> > > > > > +   struct vc4_dev *vc4 = to_vc4_dev(state->dev);
> > > > > > +   struct vc4_load_tracker_state *load_state;
> > > > > > +   struct drm_private_state *priv_state;
> > > > > > +   struct drm_plane *plane;
> > > > > > +   int ret, i;
> > > > > > +
> > > > >
> > > > > You're missing the modeset locking for vc4->load_tracker. See the
> > > > > kerneldoc for drm_atomic_get_private_obj_state().
> > > >
> > > > Hm, I missed this part of the doc, and I thought we were protected by
> > > > the modeset lock.
> > > >
> > > > > Probably a good time to
> > > > > implement the locking refactoring idea I have and just implement a per
> > > > > private_obj lock, and remove all the ad-hoc locking from all the callers?
> > > >
> > > > Let me see if I get it correctly. You want to add a lock to the
> > > > drm_private_obj struct, right?
> > > 
> > > Yup, plus automatically grab that lock in get_private_obj_state, and
> > > then drop the now superfluous locking from all the callers.
> > 
> > You mean this?
> > https://patchwork.freedesktop.org/patch/205943/
> 
> Yup, that one exactly. No idea why it died in a bikeshed, except maybe we
> should put all the private obj on a list and add some code to
> lock_all_ctx() to also lock those.

Yeah, I was also thinking along those lines a while ago when looking
at some function that iterates all the other objects already. Can't
recall what function that was though :(

Wrt. the per-private obj locks, we still don't use that for i915 cdclk
etc. so I can't in good conciense nak it. If it's useful for other
drivers then it should go in. And we'll just have to think of some
other way to handle the i915 case should we ever decide to convert
it over.
Daniel Vetter Oct. 16, 2018, 5:51 p.m. UTC | #7
On Tue, Oct 16, 2018 at 7:39 PM Ville Syrjälä
<ville.syrjala@linux.intel.com> wrote:
>
> On Tue, Oct 16, 2018 at 06:41:51PM +0200, Daniel Vetter wrote:
> > On Tue, Oct 16, 2018 at 04:28:09PM +0300, Ville Syrjälä wrote:
> > > On Tue, Oct 16, 2018 at 03:12:54PM +0200, Daniel Vetter wrote:
> > > > On Tue, Oct 16, 2018 at 3:10 PM Boris Brezillon
> > > > <boris.brezillon@bootlin.com> wrote:
> > > > >
> > > > > Hi Daniel,
> > > > >
> > > > > On Tue, 16 Oct 2018 14:57:43 +0200
> > > > > Daniel Vetter <daniel@ffwll.ch> wrote:
> > > > >
> > > > > > On Tue, Oct 16, 2018 at 11:40:45AM +0200, Boris Brezillon wrote:
> > > > > > > The HVS block is supposed to fill the pixelvalve FIFOs fast enough to
> > > > > > > meet the requested framerate. The problem is, the HVS and memory bus
> > > > > > > bandwidths are limited, and if we don't take these limitations into
> > > > > > > account we might end up with HVS underflow errors.
> > > > > > >
> > > > > > > This patch is trying to model the per-plane HVS and memory bus bandwidth
> > > > > > > consumption and take a decision at atomic_check() time whether the
> > > > > > > estimated load will fit in the HVS and membus budget.
> > > > > > >
> > > > > > > Note that we take an extra margin on the memory bus consumption to let
> > > > > > > the system run smoothly when other blocks are doing heavy use of the
> > > > > > > memory bus. Same goes for the HVS limit, except the margin is smaller in
> > > > > > > this case, since the HVS is not used by external components.
> > > > > > >
> > > > > > > Signed-off-by: Boris Brezillon <boris.brezillon@bootlin.com>
> > > > > > > ---
> > > > > > > This logic has been validated using a simple shell script and
> > > > > > > some instrumentation in the VC4 driver:
> > > > > > >
> > > > > > > - capture underflow errors at the HVS level and expose a debugfs file
> > > > > > >   reporting those errors
> > > > > > > - add debugfs files to expose when atomic_check fails because of the
> > > > > > >   HVS or membus load limitation or when it fails for other reasons
> > > > > > >
> > > > > > > The branch containing those modification is available here [1], and the
> > > > > > > script (which is internally using modetest) is here [2] (please note
> > > > > > > that I'm bad at writing shell scripts :-)).
> > > > > > >
> > > > > > > Note that those modification tend to over-estimate the load, and thus
> > > > > > > reject setups that might have previously worked, so we might want to
> > > > > > > adjust the limits to avoid that.
> > > > > > >
> > > > > > > [1]https://github.com/bbrezillon/linux/tree/vc4/hvs-bandwidth-eval
> > > > > > > [2]https://github.com/bbrezillon/vc4-hvs-bandwidth-test
> > > > > >
> > > > > > Any interest in using igt to test this stuff? We have at least a bunch of
> > > > > > tests already in there that try all kinds of plane setups. And we use
> > > > > > those to hunt for underruns on i915 hw.
> > > > > >
> > > > > > Wrt underrun reporting: On i915 we just dump them into dmesg at the error
> > > > > > level, using DRM_ERROR, plus a tracepoint. See e.g.
> > > > > > intel_pch_fifo_underrun_irq_handler(). If there's interest we could
> > > > > > perhaps extract this into something common, similar to what was done with
> > > > > > crc support already.
> > > > >
> > > > > Sounds like a good idea. I'll have a look at what's done in the intel
> > > > > driver and will check how feasible this is to have a common
> > > > > infrastructure to test that in igt.
> > > > >
> > > > > Thanks for the pointers.
> > > > >
> > > > >
> > > > > > > +static int vc4_load_tracker_atomic_check(struct drm_atomic_state *state)
> > > > > > > +{
> > > > > > > +   struct drm_plane_state *old_plane_state, *new_plane_state;
> > > > > > > +   struct vc4_dev *vc4 = to_vc4_dev(state->dev);
> > > > > > > +   struct vc4_load_tracker_state *load_state;
> > > > > > > +   struct drm_private_state *priv_state;
> > > > > > > +   struct drm_plane *plane;
> > > > > > > +   int ret, i;
> > > > > > > +
> > > > > >
> > > > > > You're missing the modeset locking for vc4->load_tracker. See the
> > > > > > kerneldoc for drm_atomic_get_private_obj_state().
> > > > >
> > > > > Hm, I missed this part of the doc, and I thought we were protected by
> > > > > the modeset lock.
> > > > >
> > > > > > Probably a good time to
> > > > > > implement the locking refactoring idea I have and just implement a per
> > > > > > private_obj lock, and remove all the ad-hoc locking from all the callers?
> > > > >
> > > > > Let me see if I get it correctly. You want to add a lock to the
> > > > > drm_private_obj struct, right?
> > > >
> > > > Yup, plus automatically grab that lock in get_private_obj_state, and
> > > > then drop the now superfluous locking from all the callers.
> > >
> > > You mean this?
> > > https://patchwork.freedesktop.org/patch/205943/
> >
> > Yup, that one exactly. No idea why it died in a bikeshed, except maybe we
> > should put all the private obj on a list and add some code to
> > lock_all_ctx() to also lock those.
>
> Yeah, I was also thinking along those lines a while ago when looking
> at some function that iterates all the other objects already. Can't
> recall what function that was though :(
>
> Wrt. the per-private obj locks, we still don't use that for i915 cdclk
> etc. so I can't in good conciense nak it. If it's useful for other
> drivers then it should go in. And we'll just have to think of some
> other way to handle the i915 case should we ever decide to convert
> it over.

I think we should still be able to implement a read-only peek
interface (just return the current state pointer, cast to const),
assuming you have some other locking constraints. Entirely in i915
code of course, that's not something we can easily make generic
unfortunately.
-Daniel
Boris Brezillon Oct. 17, 2018, 1:45 p.m. UTC | #8
+Rob

On Tue, 16 Oct 2018 18:41:51 +0200
Daniel Vetter <daniel@ffwll.ch> wrote:

> On Tue, Oct 16, 2018 at 04:28:09PM +0300, Ville Syrjälä wrote:
> > On Tue, Oct 16, 2018 at 03:12:54PM +0200, Daniel Vetter wrote:  
> > > On Tue, Oct 16, 2018 at 3:10 PM Boris Brezillon
> > > <boris.brezillon@bootlin.com> wrote:  
> > > >
> > > > Hi Daniel,
> > > >
> > > > On Tue, 16 Oct 2018 14:57:43 +0200
> > > > Daniel Vetter <daniel@ffwll.ch> wrote:
> > > >  
> > > > > On Tue, Oct 16, 2018 at 11:40:45AM +0200, Boris Brezillon wrote:  
> > > > > > The HVS block is supposed to fill the pixelvalve FIFOs fast enough to
> > > > > > meet the requested framerate. The problem is, the HVS and memory bus
> > > > > > bandwidths are limited, and if we don't take these limitations into
> > > > > > account we might end up with HVS underflow errors.
> > > > > >
> > > > > > This patch is trying to model the per-plane HVS and memory bus bandwidth
> > > > > > consumption and take a decision at atomic_check() time whether the
> > > > > > estimated load will fit in the HVS and membus budget.
> > > > > >
> > > > > > Note that we take an extra margin on the memory bus consumption to let
> > > > > > the system run smoothly when other blocks are doing heavy use of the
> > > > > > memory bus. Same goes for the HVS limit, except the margin is smaller in
> > > > > > this case, since the HVS is not used by external components.
> > > > > >
> > > > > > Signed-off-by: Boris Brezillon <boris.brezillon@bootlin.com>
> > > > > > ---
> > > > > > This logic has been validated using a simple shell script and
> > > > > > some instrumentation in the VC4 driver:
> > > > > >
> > > > > > - capture underflow errors at the HVS level and expose a debugfs file
> > > > > >   reporting those errors
> > > > > > - add debugfs files to expose when atomic_check fails because of the
> > > > > >   HVS or membus load limitation or when it fails for other reasons
> > > > > >
> > > > > > The branch containing those modification is available here [1], and the
> > > > > > script (which is internally using modetest) is here [2] (please note
> > > > > > that I'm bad at writing shell scripts :-)).
> > > > > >
> > > > > > Note that those modification tend to over-estimate the load, and thus
> > > > > > reject setups that might have previously worked, so we might want to
> > > > > > adjust the limits to avoid that.
> > > > > >
> > > > > > [1]https://github.com/bbrezillon/linux/tree/vc4/hvs-bandwidth-eval
> > > > > > [2]https://github.com/bbrezillon/vc4-hvs-bandwidth-test  
> > > > >
> > > > > Any interest in using igt to test this stuff? We have at least a bunch of
> > > > > tests already in there that try all kinds of plane setups. And we use
> > > > > those to hunt for underruns on i915 hw.
> > > > >
> > > > > Wrt underrun reporting: On i915 we just dump them into dmesg at the error
> > > > > level, using DRM_ERROR, plus a tracepoint. See e.g.
> > > > > intel_pch_fifo_underrun_irq_handler(). If there's interest we could
> > > > > perhaps extract this into something common, similar to what was done with
> > > > > crc support already.  
> > > >
> > > > Sounds like a good idea. I'll have a look at what's done in the intel
> > > > driver and will check how feasible this is to have a common
> > > > infrastructure to test that in igt.
> > > >
> > > > Thanks for the pointers.
> > > >
> > > >  
> > > > > > +static int vc4_load_tracker_atomic_check(struct drm_atomic_state *state)
> > > > > > +{
> > > > > > +   struct drm_plane_state *old_plane_state, *new_plane_state;
> > > > > > +   struct vc4_dev *vc4 = to_vc4_dev(state->dev);
> > > > > > +   struct vc4_load_tracker_state *load_state;
> > > > > > +   struct drm_private_state *priv_state;
> > > > > > +   struct drm_plane *plane;
> > > > > > +   int ret, i;
> > > > > > +  
> > > > >
> > > > > You're missing the modeset locking for vc4->load_tracker. See the
> > > > > kerneldoc for drm_atomic_get_private_obj_state().  
> > > >
> > > > Hm, I missed this part of the doc, and I thought we were protected by
> > > > the modeset lock.
> > > >  
> > > > > Probably a good time to
> > > > > implement the locking refactoring idea I have and just implement a per
> > > > > private_obj lock, and remove all the ad-hoc locking from all the callers?  
> > > >
> > > > Let me see if I get it correctly. You want to add a lock to the
> > > > drm_private_obj struct, right?  
> > > 
> > > Yup, plus automatically grab that lock in get_private_obj_state, and
> > > then drop the now superfluous locking from all the callers.  
> > 
> > You mean this?
> > https://patchwork.freedesktop.org/patch/205943/  
> 
> Yup, that one exactly. No idea why it died in a bikeshed, except maybe we
> should put all the private obj on a list and add some code to
> lock_all_ctx() to also lock those.

Does anyone plan to work on that or should I do it?
Boris Brezillon Oct. 23, 2018, 7:55 a.m. UTC | #9
Hi Daniel,

On Tue, 16 Oct 2018 14:57:43 +0200
Daniel Vetter <daniel@ffwll.ch> wrote:

> On Tue, Oct 16, 2018 at 11:40:45AM +0200, Boris Brezillon wrote:
> > The HVS block is supposed to fill the pixelvalve FIFOs fast enough to
> > meet the requested framerate. The problem is, the HVS and memory bus
> > bandwidths are limited, and if we don't take these limitations into
> > account we might end up with HVS underflow errors.
> > 
> > This patch is trying to model the per-plane HVS and memory bus bandwidth
> > consumption and take a decision at atomic_check() time whether the
> > estimated load will fit in the HVS and membus budget.
> > 
> > Note that we take an extra margin on the memory bus consumption to let
> > the system run smoothly when other blocks are doing heavy use of the
> > memory bus. Same goes for the HVS limit, except the margin is smaller in
> > this case, since the HVS is not used by external components.
> > 
> > Signed-off-by: Boris Brezillon <boris.brezillon@bootlin.com>
> > ---
> > This logic has been validated using a simple shell script and
> > some instrumentation in the VC4 driver:
> > 
> > - capture underflow errors at the HVS level and expose a debugfs file
> >   reporting those errors
> > - add debugfs files to expose when atomic_check fails because of the
> >   HVS or membus load limitation or when it fails for other reasons
> > 
> > The branch containing those modification is available here [1], and the
> > script (which is internally using modetest) is here [2] (please note
> > that I'm bad at writing shell scripts :-)).
> > 
> > Note that those modification tend to over-estimate the load, and thus
> > reject setups that might have previously worked, so we might want to
> > adjust the limits to avoid that.
> > 
> > [1]https://github.com/bbrezillon/linux/tree/vc4/hvs-bandwidth-eval
> > [2]https://github.com/bbrezillon/vc4-hvs-bandwidth-test  
> 
> Any interest in using igt to test this stuff? We have at least a bunch of
> tests already in there that try all kinds of plane setups. And we use
> those to hunt for underruns on i915 hw.
> 
> Wrt underrun reporting: On i915 we just dump them into dmesg at the error
> level, using DRM_ERROR,

Are you masking the underrun interrupt after it's been reported? If we
don't do that on VC4 we just end up flooding the kernel-log buffer until
someone comes and update the config.

> plus a tracepoint. See e.g.
> intel_pch_fifo_underrun_irq_handler(). If there's interest we could
> perhaps extract this into something common, similar to what was done with
> crc support already.
> 

I'm not a big fan of hardcoded trace points in general (because of the
whole "it's part of the stable ABI" thing), and in this case, making the
tracepoint generic sounds even more risky to me. Indeed, how can we know
about all the HW specific bits one might want to expose. For instance,
I see the intel underrun tracepoint exposes a struct with a frame and
scanline field, and AFAICT, we don't have such information in the VC4
case.

Any opinion on that?

Regards,

Boris
Daniel Vetter Oct. 23, 2018, 1:44 p.m. UTC | #10
On Tue, Oct 23, 2018 at 09:55:08AM +0200, Boris Brezillon wrote:
> Hi Daniel,
> 
> On Tue, 16 Oct 2018 14:57:43 +0200
> Daniel Vetter <daniel@ffwll.ch> wrote:
> 
> > On Tue, Oct 16, 2018 at 11:40:45AM +0200, Boris Brezillon wrote:
> > > The HVS block is supposed to fill the pixelvalve FIFOs fast enough to
> > > meet the requested framerate. The problem is, the HVS and memory bus
> > > bandwidths are limited, and if we don't take these limitations into
> > > account we might end up with HVS underflow errors.
> > > 
> > > This patch is trying to model the per-plane HVS and memory bus bandwidth
> > > consumption and take a decision at atomic_check() time whether the
> > > estimated load will fit in the HVS and membus budget.
> > > 
> > > Note that we take an extra margin on the memory bus consumption to let
> > > the system run smoothly when other blocks are doing heavy use of the
> > > memory bus. Same goes for the HVS limit, except the margin is smaller in
> > > this case, since the HVS is not used by external components.
> > > 
> > > Signed-off-by: Boris Brezillon <boris.brezillon@bootlin.com>
> > > ---
> > > This logic has been validated using a simple shell script and
> > > some instrumentation in the VC4 driver:
> > > 
> > > - capture underflow errors at the HVS level and expose a debugfs file
> > >   reporting those errors
> > > - add debugfs files to expose when atomic_check fails because of the
> > >   HVS or membus load limitation or when it fails for other reasons
> > > 
> > > The branch containing those modification is available here [1], and the
> > > script (which is internally using modetest) is here [2] (please note
> > > that I'm bad at writing shell scripts :-)).
> > > 
> > > Note that those modification tend to over-estimate the load, and thus
> > > reject setups that might have previously worked, so we might want to
> > > adjust the limits to avoid that.
> > > 
> > > [1]https://github.com/bbrezillon/linux/tree/vc4/hvs-bandwidth-eval
> > > [2]https://github.com/bbrezillon/vc4-hvs-bandwidth-test  
> > 
> > Any interest in using igt to test this stuff? We have at least a bunch of
> > tests already in there that try all kinds of plane setups. And we use
> > those to hunt for underruns on i915 hw.
> > 
> > Wrt underrun reporting: On i915 we just dump them into dmesg at the error
> > level, using DRM_ERROR,
> 
> Are you masking the underrun interrupt after it's been reported? If we
> don't do that on VC4 we just end up flooding the kernel-log buffer until
> someone comes and update the config.

Yeah we do that too. Rule is that a full modeset will clear any underrun
masking (so tests need to make sure they start with a modeset, or it'll be
for nothing).

> 
> > plus a tracepoint. See e.g.
> > intel_pch_fifo_underrun_irq_handler(). If there's interest we could
> > perhaps extract this into something common, similar to what was done with
> > crc support already.
> > 
> 
> I'm not a big fan of hardcoded trace points in general (because of the
> whole "it's part of the stable ABI" thing), and in this case, making the
> tracepoint generic sounds even more risky to me. Indeed, how can we know
> about all the HW specific bits one might want to expose. For instance,
> I see the intel underrun tracepoint exposes a struct with a frame and
> scanline field, and AFAICT, we don't have such information in the VC4
> case.
> 
> Any opinion on that?

It's only abi if you're unlucky. If it's just for debugging and
validation, you can change it again. Tbh, no idea why we even have these
tracepoints, they're fairly useless imo. CI only relies upon the dmesg
output. Maybe run git blame and ask the original author, we can probably
update them to suit our needs.
-Daniel
Boris Brezillon Oct. 25, 2018, 8:09 a.m. UTC | #11
On Tue, 23 Oct 2018 15:44:43 +0200
Daniel Vetter <daniel@ffwll.ch> wrote:

> On Tue, Oct 23, 2018 at 09:55:08AM +0200, Boris Brezillon wrote:
> > Hi Daniel,
> > 
> > On Tue, 16 Oct 2018 14:57:43 +0200
> > Daniel Vetter <daniel@ffwll.ch> wrote:
> >   
> > > On Tue, Oct 16, 2018 at 11:40:45AM +0200, Boris Brezillon wrote:  
> > > > The HVS block is supposed to fill the pixelvalve FIFOs fast enough to
> > > > meet the requested framerate. The problem is, the HVS and memory bus
> > > > bandwidths are limited, and if we don't take these limitations into
> > > > account we might end up with HVS underflow errors.
> > > > 
> > > > This patch is trying to model the per-plane HVS and memory bus bandwidth
> > > > consumption and take a decision at atomic_check() time whether the
> > > > estimated load will fit in the HVS and membus budget.
> > > > 
> > > > Note that we take an extra margin on the memory bus consumption to let
> > > > the system run smoothly when other blocks are doing heavy use of the
> > > > memory bus. Same goes for the HVS limit, except the margin is smaller in
> > > > this case, since the HVS is not used by external components.
> > > > 
> > > > Signed-off-by: Boris Brezillon <boris.brezillon@bootlin.com>
> > > > ---
> > > > This logic has been validated using a simple shell script and
> > > > some instrumentation in the VC4 driver:
> > > > 
> > > > - capture underflow errors at the HVS level and expose a debugfs file
> > > >   reporting those errors
> > > > - add debugfs files to expose when atomic_check fails because of the
> > > >   HVS or membus load limitation or when it fails for other reasons
> > > > 
> > > > The branch containing those modification is available here [1], and the
> > > > script (which is internally using modetest) is here [2] (please note
> > > > that I'm bad at writing shell scripts :-)).
> > > > 
> > > > Note that those modification tend to over-estimate the load, and thus
> > > > reject setups that might have previously worked, so we might want to
> > > > adjust the limits to avoid that.
> > > > 
> > > > [1]https://github.com/bbrezillon/linux/tree/vc4/hvs-bandwidth-eval
> > > > [2]https://github.com/bbrezillon/vc4-hvs-bandwidth-test    
> > > 
> > > Any interest in using igt to test this stuff? We have at least a bunch of
> > > tests already in there that try all kinds of plane setups. And we use
> > > those to hunt for underruns on i915 hw.
> > > 
> > > Wrt underrun reporting: On i915 we just dump them into dmesg at the error
> > > level, using DRM_ERROR,  
> > 
> > Are you masking the underrun interrupt after it's been reported? If we
> > don't do that on VC4 we just end up flooding the kernel-log buffer until
> > someone comes and update the config.  
> 
> Yeah we do that too. Rule is that a full modeset will clear any underrun
> masking (so tests need to make sure they start with a modeset, or it'll be
> for nothing).
> 
> >   
> > > plus a tracepoint. See e.g.
> > > intel_pch_fifo_underrun_irq_handler(). If there's interest we could
> > > perhaps extract this into something common, similar to what was done with
> > > crc support already.
> > >   
> > 
> > I'm not a big fan of hardcoded trace points in general (because of the
> > whole "it's part of the stable ABI" thing), and in this case, making the
> > tracepoint generic sounds even more risky to me. Indeed, how can we know
> > about all the HW specific bits one might want to expose. For instance,
> > I see the intel underrun tracepoint exposes a struct with a frame and
> > scanline field, and AFAICT, we don't have such information in the VC4
> > case.
> > 
> > Any opinion on that?  
> 
> It's only abi if you're unlucky. If it's just for debugging and
> validation, you can change it again. Tbh, no idea why we even have these
> tracepoints, they're fairly useless imo. CI only relies upon the dmesg
> output. Maybe run git blame and ask the original author, we can probably
> update them to suit our needs.

Okay, I think I'll go for a generic debugfs entry that returns true
when an underrun error happened since the last modeset, false otherwise.

Next question is: should I attach the underrun status to the drm_device
or have one per CRTC? In my case, I only care about the "has an
underrun error occurred on any of the active CRTC" case, so I'd vote for
a per-device underrun status.
Daniel Vetter Oct. 25, 2018, 9:33 a.m. UTC | #12
On Thu, Oct 25, 2018 at 10:09:31AM +0200, Boris Brezillon wrote:
> On Tue, 23 Oct 2018 15:44:43 +0200
> Daniel Vetter <daniel@ffwll.ch> wrote:
> 
> > On Tue, Oct 23, 2018 at 09:55:08AM +0200, Boris Brezillon wrote:
> > > Hi Daniel,
> > > 
> > > On Tue, 16 Oct 2018 14:57:43 +0200
> > > Daniel Vetter <daniel@ffwll.ch> wrote:
> > >   
> > > > On Tue, Oct 16, 2018 at 11:40:45AM +0200, Boris Brezillon wrote:  
> > > > > The HVS block is supposed to fill the pixelvalve FIFOs fast enough to
> > > > > meet the requested framerate. The problem is, the HVS and memory bus
> > > > > bandwidths are limited, and if we don't take these limitations into
> > > > > account we might end up with HVS underflow errors.
> > > > > 
> > > > > This patch is trying to model the per-plane HVS and memory bus bandwidth
> > > > > consumption and take a decision at atomic_check() time whether the
> > > > > estimated load will fit in the HVS and membus budget.
> > > > > 
> > > > > Note that we take an extra margin on the memory bus consumption to let
> > > > > the system run smoothly when other blocks are doing heavy use of the
> > > > > memory bus. Same goes for the HVS limit, except the margin is smaller in
> > > > > this case, since the HVS is not used by external components.
> > > > > 
> > > > > Signed-off-by: Boris Brezillon <boris.brezillon@bootlin.com>
> > > > > ---
> > > > > This logic has been validated using a simple shell script and
> > > > > some instrumentation in the VC4 driver:
> > > > > 
> > > > > - capture underflow errors at the HVS level and expose a debugfs file
> > > > >   reporting those errors
> > > > > - add debugfs files to expose when atomic_check fails because of the
> > > > >   HVS or membus load limitation or when it fails for other reasons
> > > > > 
> > > > > The branch containing those modification is available here [1], and the
> > > > > script (which is internally using modetest) is here [2] (please note
> > > > > that I'm bad at writing shell scripts :-)).
> > > > > 
> > > > > Note that those modification tend to over-estimate the load, and thus
> > > > > reject setups that might have previously worked, so we might want to
> > > > > adjust the limits to avoid that.
> > > > > 
> > > > > [1]https://github.com/bbrezillon/linux/tree/vc4/hvs-bandwidth-eval
> > > > > [2]https://github.com/bbrezillon/vc4-hvs-bandwidth-test    
> > > > 
> > > > Any interest in using igt to test this stuff? We have at least a bunch of
> > > > tests already in there that try all kinds of plane setups. And we use
> > > > those to hunt for underruns on i915 hw.
> > > > 
> > > > Wrt underrun reporting: On i915 we just dump them into dmesg at the error
> > > > level, using DRM_ERROR,  
> > > 
> > > Are you masking the underrun interrupt after it's been reported? If we
> > > don't do that on VC4 we just end up flooding the kernel-log buffer until
> > > someone comes and update the config.  
> > 
> > Yeah we do that too. Rule is that a full modeset will clear any underrun
> > masking (so tests need to make sure they start with a modeset, or it'll be
> > for nothing).
> > 
> > >   
> > > > plus a tracepoint. See e.g.
> > > > intel_pch_fifo_underrun_irq_handler(). If there's interest we could
> > > > perhaps extract this into something common, similar to what was done with
> > > > crc support already.
> > > >   
> > > 
> > > I'm not a big fan of hardcoded trace points in general (because of the
> > > whole "it's part of the stable ABI" thing), and in this case, making the
> > > tracepoint generic sounds even more risky to me. Indeed, how can we know
> > > about all the HW specific bits one might want to expose. For instance,
> > > I see the intel underrun tracepoint exposes a struct with a frame and
> > > scanline field, and AFAICT, we don't have such information in the VC4
> > > case.
> > > 
> > > Any opinion on that?  
> > 
> > It's only abi if you're unlucky. If it's just for debugging and
> > validation, you can change it again. Tbh, no idea why we even have these
> > tracepoints, they're fairly useless imo. CI only relies upon the dmesg
> > output. Maybe run git blame and ask the original author, we can probably
> > update them to suit our needs.
> 
> Okay, I think I'll go for a generic debugfs entry that returns true
> when an underrun error happened since the last modeset, false otherwise.
> 
> Next question is: should I attach the underrun status to the drm_device
> or have one per CRTC? In my case, I only care about the "has an
> underrun error occurred on any of the active CRTC" case, so I'd vote for
> a per-device underrun status.

Yeah probably good enough. For our CI all we care about is the warn/error
level dmesg output. Anything at that level is considered a CI failure.

What do you need the debugfs file for?
-Daniel
Boris Brezillon Oct. 25, 2018, 9:41 a.m. UTC | #13
On Thu, 25 Oct 2018 11:33:14 +0200
Daniel Vetter <daniel@ffwll.ch> wrote:

> On Thu, Oct 25, 2018 at 10:09:31AM +0200, Boris Brezillon wrote:
> > On Tue, 23 Oct 2018 15:44:43 +0200
> > Daniel Vetter <daniel@ffwll.ch> wrote:
> >   
> > > On Tue, Oct 23, 2018 at 09:55:08AM +0200, Boris Brezillon wrote:  
> > > > Hi Daniel,
> > > > 
> > > > On Tue, 16 Oct 2018 14:57:43 +0200
> > > > Daniel Vetter <daniel@ffwll.ch> wrote:
> > > >     
> > > > > On Tue, Oct 16, 2018 at 11:40:45AM +0200, Boris Brezillon wrote:    
> > > > > > The HVS block is supposed to fill the pixelvalve FIFOs fast enough to
> > > > > > meet the requested framerate. The problem is, the HVS and memory bus
> > > > > > bandwidths are limited, and if we don't take these limitations into
> > > > > > account we might end up with HVS underflow errors.
> > > > > > 
> > > > > > This patch is trying to model the per-plane HVS and memory bus bandwidth
> > > > > > consumption and take a decision at atomic_check() time whether the
> > > > > > estimated load will fit in the HVS and membus budget.
> > > > > > 
> > > > > > Note that we take an extra margin on the memory bus consumption to let
> > > > > > the system run smoothly when other blocks are doing heavy use of the
> > > > > > memory bus. Same goes for the HVS limit, except the margin is smaller in
> > > > > > this case, since the HVS is not used by external components.
> > > > > > 
> > > > > > Signed-off-by: Boris Brezillon <boris.brezillon@bootlin.com>
> > > > > > ---
> > > > > > This logic has been validated using a simple shell script and
> > > > > > some instrumentation in the VC4 driver:
> > > > > > 
> > > > > > - capture underflow errors at the HVS level and expose a debugfs file
> > > > > >   reporting those errors
> > > > > > - add debugfs files to expose when atomic_check fails because of the
> > > > > >   HVS or membus load limitation or when it fails for other reasons
> > > > > > 
> > > > > > The branch containing those modification is available here [1], and the
> > > > > > script (which is internally using modetest) is here [2] (please note
> > > > > > that I'm bad at writing shell scripts :-)).
> > > > > > 
> > > > > > Note that those modification tend to over-estimate the load, and thus
> > > > > > reject setups that might have previously worked, so we might want to
> > > > > > adjust the limits to avoid that.
> > > > > > 
> > > > > > [1]https://github.com/bbrezillon/linux/tree/vc4/hvs-bandwidth-eval
> > > > > > [2]https://github.com/bbrezillon/vc4-hvs-bandwidth-test      
> > > > > 
> > > > > Any interest in using igt to test this stuff? We have at least a bunch of
> > > > > tests already in there that try all kinds of plane setups. And we use
> > > > > those to hunt for underruns on i915 hw.
> > > > > 
> > > > > Wrt underrun reporting: On i915 we just dump them into dmesg at the error
> > > > > level, using DRM_ERROR,    
> > > > 
> > > > Are you masking the underrun interrupt after it's been reported? If we
> > > > don't do that on VC4 we just end up flooding the kernel-log buffer until
> > > > someone comes and update the config.    
> > > 
> > > Yeah we do that too. Rule is that a full modeset will clear any underrun
> > > masking (so tests need to make sure they start with a modeset, or it'll be
> > > for nothing).
> > >   
> > > >     
> > > > > plus a tracepoint. See e.g.
> > > > > intel_pch_fifo_underrun_irq_handler(). If there's interest we could
> > > > > perhaps extract this into something common, similar to what was done with
> > > > > crc support already.
> > > > >     
> > > > 
> > > > I'm not a big fan of hardcoded trace points in general (because of the
> > > > whole "it's part of the stable ABI" thing), and in this case, making the
> > > > tracepoint generic sounds even more risky to me. Indeed, how can we know
> > > > about all the HW specific bits one might want to expose. For instance,
> > > > I see the intel underrun tracepoint exposes a struct with a frame and
> > > > scanline field, and AFAICT, we don't have such information in the VC4
> > > > case.
> > > > 
> > > > Any opinion on that?    
> > > 
> > > It's only abi if you're unlucky. If it's just for debugging and
> > > validation, you can change it again. Tbh, no idea why we even have these
> > > tracepoints, they're fairly useless imo. CI only relies upon the dmesg
> > > output. Maybe run git blame and ask the original author, we can probably
> > > update them to suit our needs.  
> > 
> > Okay, I think I'll go for a generic debugfs entry that returns true
> > when an underrun error happened since the last modeset, false otherwise.
> > 
> > Next question is: should I attach the underrun status to the drm_device
> > or have one per CRTC? In my case, I only care about the "has an
> > underrun error occurred on any of the active CRTC" case, so I'd vote for
> > a per-device underrun status.  
> 
> Yeah probably good enough. For our CI all we care about is the warn/error
> level dmesg output. Anything at that level is considered a CI failure.

So igt is grepping dmesg to detect when an underrun happens?

> 
> What do you need the debugfs file for?

I just thought having a debugfs file to expose the underrun information
would be cleaner than having to grep in dmesg to detect such failures.
Daniel Vetter Oct. 26, 2018, 1:30 p.m. UTC | #14
On Thu, Oct 25, 2018 at 11:41:14AM +0200, Boris Brezillon wrote:
> On Thu, 25 Oct 2018 11:33:14 +0200
> Daniel Vetter <daniel@ffwll.ch> wrote:
> 
> > On Thu, Oct 25, 2018 at 10:09:31AM +0200, Boris Brezillon wrote:
> > > On Tue, 23 Oct 2018 15:44:43 +0200
> > > Daniel Vetter <daniel@ffwll.ch> wrote:
> > >   
> > > > On Tue, Oct 23, 2018 at 09:55:08AM +0200, Boris Brezillon wrote:  
> > > > > Hi Daniel,
> > > > > 
> > > > > On Tue, 16 Oct 2018 14:57:43 +0200
> > > > > Daniel Vetter <daniel@ffwll.ch> wrote:
> > > > >     
> > > > > > On Tue, Oct 16, 2018 at 11:40:45AM +0200, Boris Brezillon wrote:    
> > > > > > > The HVS block is supposed to fill the pixelvalve FIFOs fast enough to
> > > > > > > meet the requested framerate. The problem is, the HVS and memory bus
> > > > > > > bandwidths are limited, and if we don't take these limitations into
> > > > > > > account we might end up with HVS underflow errors.
> > > > > > > 
> > > > > > > This patch is trying to model the per-plane HVS and memory bus bandwidth
> > > > > > > consumption and take a decision at atomic_check() time whether the
> > > > > > > estimated load will fit in the HVS and membus budget.
> > > > > > > 
> > > > > > > Note that we take an extra margin on the memory bus consumption to let
> > > > > > > the system run smoothly when other blocks are doing heavy use of the
> > > > > > > memory bus. Same goes for the HVS limit, except the margin is smaller in
> > > > > > > this case, since the HVS is not used by external components.
> > > > > > > 
> > > > > > > Signed-off-by: Boris Brezillon <boris.brezillon@bootlin.com>
> > > > > > > ---
> > > > > > > This logic has been validated using a simple shell script and
> > > > > > > some instrumentation in the VC4 driver:
> > > > > > > 
> > > > > > > - capture underflow errors at the HVS level and expose a debugfs file
> > > > > > >   reporting those errors
> > > > > > > - add debugfs files to expose when atomic_check fails because of the
> > > > > > >   HVS or membus load limitation or when it fails for other reasons
> > > > > > > 
> > > > > > > The branch containing those modification is available here [1], and the
> > > > > > > script (which is internally using modetest) is here [2] (please note
> > > > > > > that I'm bad at writing shell scripts :-)).
> > > > > > > 
> > > > > > > Note that those modification tend to over-estimate the load, and thus
> > > > > > > reject setups that might have previously worked, so we might want to
> > > > > > > adjust the limits to avoid that.
> > > > > > > 
> > > > > > > [1]https://github.com/bbrezillon/linux/tree/vc4/hvs-bandwidth-eval
> > > > > > > [2]https://github.com/bbrezillon/vc4-hvs-bandwidth-test      
> > > > > > 
> > > > > > Any interest in using igt to test this stuff? We have at least a bunch of
> > > > > > tests already in there that try all kinds of plane setups. And we use
> > > > > > those to hunt for underruns on i915 hw.
> > > > > > 
> > > > > > Wrt underrun reporting: On i915 we just dump them into dmesg at the error
> > > > > > level, using DRM_ERROR,    
> > > > > 
> > > > > Are you masking the underrun interrupt after it's been reported? If we
> > > > > don't do that on VC4 we just end up flooding the kernel-log buffer until
> > > > > someone comes and update the config.    
> > > > 
> > > > Yeah we do that too. Rule is that a full modeset will clear any underrun
> > > > masking (so tests need to make sure they start with a modeset, or it'll be
> > > > for nothing).
> > > >   
> > > > >     
> > > > > > plus a tracepoint. See e.g.
> > > > > > intel_pch_fifo_underrun_irq_handler(). If there's interest we could
> > > > > > perhaps extract this into something common, similar to what was done with
> > > > > > crc support already.
> > > > > >     
> > > > > 
> > > > > I'm not a big fan of hardcoded trace points in general (because of the
> > > > > whole "it's part of the stable ABI" thing), and in this case, making the
> > > > > tracepoint generic sounds even more risky to me. Indeed, how can we know
> > > > > about all the HW specific bits one might want to expose. For instance,
> > > > > I see the intel underrun tracepoint exposes a struct with a frame and
> > > > > scanline field, and AFAICT, we don't have such information in the VC4
> > > > > case.
> > > > > 
> > > > > Any opinion on that?    
> > > > 
> > > > It's only abi if you're unlucky. If it's just for debugging and
> > > > validation, you can change it again. Tbh, no idea why we even have these
> > > > tracepoints, they're fairly useless imo. CI only relies upon the dmesg
> > > > output. Maybe run git blame and ask the original author, we can probably
> > > > update them to suit our needs.  
> > > 
> > > Okay, I think I'll go for a generic debugfs entry that returns true
> > > when an underrun error happened since the last modeset, false otherwise.
> > > 
> > > Next question is: should I attach the underrun status to the drm_device
> > > or have one per CRTC? In my case, I only care about the "has an
> > > underrun error occurred on any of the active CRTC" case, so I'd vote for
> > > a per-device underrun status.  
> > 
> > Yeah probably good enough. For our CI all we care about is the warn/error
> > level dmesg output. Anything at that level is considered a CI failure.
> 
> So igt is grepping dmesg to detect when an underrun happens?

No, but the CI runner is also observing dmesg. Anything in there at
warning or higher level is considered a failure.

> > What do you need the debugfs file for?
> 
> I just thought having a debugfs file to expose the underrun information
> would be cleaner than having to grep in dmesg to detect such failures.

The issue is that you want to detect underruns everywhere, not just in the
specific tests you're checking for it. Anything that does a modeset could
cause an underrun (at least we've managed to do so pretty much everywhere
on i915 hw, if you misprogram is sufficiently).
-Daniel
Boris Brezillon Oct. 26, 2018, 1:57 p.m. UTC | #15
On Fri, 26 Oct 2018 15:30:26 +0200
Daniel Vetter <daniel@ffwll.ch> wrote:

> On Thu, Oct 25, 2018 at 11:41:14AM +0200, Boris Brezillon wrote:
> > On Thu, 25 Oct 2018 11:33:14 +0200
> > Daniel Vetter <daniel@ffwll.ch> wrote:
> >   
> > > On Thu, Oct 25, 2018 at 10:09:31AM +0200, Boris Brezillon wrote:  
> > > > On Tue, 23 Oct 2018 15:44:43 +0200
> > > > Daniel Vetter <daniel@ffwll.ch> wrote:
> > > >     
> > > > > On Tue, Oct 23, 2018 at 09:55:08AM +0200, Boris Brezillon wrote:    
> > > > > > Hi Daniel,
> > > > > > 
> > > > > > On Tue, 16 Oct 2018 14:57:43 +0200
> > > > > > Daniel Vetter <daniel@ffwll.ch> wrote:
> > > > > >       
> > > > > > > On Tue, Oct 16, 2018 at 11:40:45AM +0200, Boris Brezillon wrote:      
> > > > > > > > The HVS block is supposed to fill the pixelvalve FIFOs fast enough to
> > > > > > > > meet the requested framerate. The problem is, the HVS and memory bus
> > > > > > > > bandwidths are limited, and if we don't take these limitations into
> > > > > > > > account we might end up with HVS underflow errors.
> > > > > > > > 
> > > > > > > > This patch is trying to model the per-plane HVS and memory bus bandwidth
> > > > > > > > consumption and take a decision at atomic_check() time whether the
> > > > > > > > estimated load will fit in the HVS and membus budget.
> > > > > > > > 
> > > > > > > > Note that we take an extra margin on the memory bus consumption to let
> > > > > > > > the system run smoothly when other blocks are doing heavy use of the
> > > > > > > > memory bus. Same goes for the HVS limit, except the margin is smaller in
> > > > > > > > this case, since the HVS is not used by external components.
> > > > > > > > 
> > > > > > > > Signed-off-by: Boris Brezillon <boris.brezillon@bootlin.com>
> > > > > > > > ---
> > > > > > > > This logic has been validated using a simple shell script and
> > > > > > > > some instrumentation in the VC4 driver:
> > > > > > > > 
> > > > > > > > - capture underflow errors at the HVS level and expose a debugfs file
> > > > > > > >   reporting those errors
> > > > > > > > - add debugfs files to expose when atomic_check fails because of the
> > > > > > > >   HVS or membus load limitation or when it fails for other reasons
> > > > > > > > 
> > > > > > > > The branch containing those modification is available here [1], and the
> > > > > > > > script (which is internally using modetest) is here [2] (please note
> > > > > > > > that I'm bad at writing shell scripts :-)).
> > > > > > > > 
> > > > > > > > Note that those modification tend to over-estimate the load, and thus
> > > > > > > > reject setups that might have previously worked, so we might want to
> > > > > > > > adjust the limits to avoid that.
> > > > > > > > 
> > > > > > > > [1]https://github.com/bbrezillon/linux/tree/vc4/hvs-bandwidth-eval
> > > > > > > > [2]https://github.com/bbrezillon/vc4-hvs-bandwidth-test        
> > > > > > > 
> > > > > > > Any interest in using igt to test this stuff? We have at least a bunch of
> > > > > > > tests already in there that try all kinds of plane setups. And we use
> > > > > > > those to hunt for underruns on i915 hw.
> > > > > > > 
> > > > > > > Wrt underrun reporting: On i915 we just dump them into dmesg at the error
> > > > > > > level, using DRM_ERROR,      
> > > > > > 
> > > > > > Are you masking the underrun interrupt after it's been reported? If we
> > > > > > don't do that on VC4 we just end up flooding the kernel-log buffer until
> > > > > > someone comes and update the config.      
> > > > > 
> > > > > Yeah we do that too. Rule is that a full modeset will clear any underrun
> > > > > masking (so tests need to make sure they start with a modeset, or it'll be
> > > > > for nothing).
> > > > >     
> > > > > >       
> > > > > > > plus a tracepoint. See e.g.
> > > > > > > intel_pch_fifo_underrun_irq_handler(). If there's interest we could
> > > > > > > perhaps extract this into something common, similar to what was done with
> > > > > > > crc support already.
> > > > > > >       
> > > > > > 
> > > > > > I'm not a big fan of hardcoded trace points in general (because of the
> > > > > > whole "it's part of the stable ABI" thing), and in this case, making the
> > > > > > tracepoint generic sounds even more risky to me. Indeed, how can we know
> > > > > > about all the HW specific bits one might want to expose. For instance,
> > > > > > I see the intel underrun tracepoint exposes a struct with a frame and
> > > > > > scanline field, and AFAICT, we don't have such information in the VC4
> > > > > > case.
> > > > > > 
> > > > > > Any opinion on that?      
> > > > > 
> > > > > It's only abi if you're unlucky. If it's just for debugging and
> > > > > validation, you can change it again. Tbh, no idea why we even have these
> > > > > tracepoints, they're fairly useless imo. CI only relies upon the dmesg
> > > > > output. Maybe run git blame and ask the original author, we can probably
> > > > > update them to suit our needs.    
> > > > 
> > > > Okay, I think I'll go for a generic debugfs entry that returns true
> > > > when an underrun error happened since the last modeset, false otherwise.
> > > > 
> > > > Next question is: should I attach the underrun status to the drm_device
> > > > or have one per CRTC? In my case, I only care about the "has an
> > > > underrun error occurred on any of the active CRTC" case, so I'd vote for
> > > > a per-device underrun status.    
> > > 
> > > Yeah probably good enough. For our CI all we care about is the warn/error
> > > level dmesg output. Anything at that level is considered a CI failure.  
> > 
> > So igt is grepping dmesg to detect when an underrun happens?  
> 
> No, but the CI runner is also observing dmesg. Anything in there at
> warning or higher level is considered a failure.

Eric, do you do the same when you launch the IGT testsuite?

> 
> > > What do you need the debugfs file for?  
> > 
> > I just thought having a debugfs file to expose the underrun information
> > would be cleaner than having to grep in dmesg to detect such failures.  
> 
> The issue is that you want to detect underruns everywhere, not just in the
> specific tests you're checking for it. Anything that does a modeset could
> cause an underrun (at least we've managed to do so pretty much everywhere
> on i915 hw, if you misprogram is sufficiently).

In my specific case, I want to have the IGT test check the underrun
value while the test is being executed so that I know which exact
configuration triggers the underrun situation. At least that's how I
did to adjust/debug the HVS load tracking code. Maybe it's not really a
problem when all we do is tracking regressions.
Daniel Vetter Oct. 26, 2018, 2:26 p.m. UTC | #16
On Fri, Oct 26, 2018 at 3:57 PM Boris Brezillon
<boris.brezillon@bootlin.com> wrote:
>
> On Fri, 26 Oct 2018 15:30:26 +0200
> Daniel Vetter <daniel@ffwll.ch> wrote:
>
> > On Thu, Oct 25, 2018 at 11:41:14AM +0200, Boris Brezillon wrote:
> > > On Thu, 25 Oct 2018 11:33:14 +0200
> > > Daniel Vetter <daniel@ffwll.ch> wrote:
> > >
> > > > On Thu, Oct 25, 2018 at 10:09:31AM +0200, Boris Brezillon wrote:
> > > > > On Tue, 23 Oct 2018 15:44:43 +0200
> > > > > Daniel Vetter <daniel@ffwll.ch> wrote:
> > > > >
> > > > > > On Tue, Oct 23, 2018 at 09:55:08AM +0200, Boris Brezillon wrote:
> > > > > > > Hi Daniel,
> > > > > > >
> > > > > > > On Tue, 16 Oct 2018 14:57:43 +0200
> > > > > > > Daniel Vetter <daniel@ffwll.ch> wrote:
> > > > > > >
> > > > > > > > On Tue, Oct 16, 2018 at 11:40:45AM +0200, Boris Brezillon wrote:
> > > > > > > > > The HVS block is supposed to fill the pixelvalve FIFOs fast enough to
> > > > > > > > > meet the requested framerate. The problem is, the HVS and memory bus
> > > > > > > > > bandwidths are limited, and if we don't take these limitations into
> > > > > > > > > account we might end up with HVS underflow errors.
> > > > > > > > >
> > > > > > > > > This patch is trying to model the per-plane HVS and memory bus bandwidth
> > > > > > > > > consumption and take a decision at atomic_check() time whether the
> > > > > > > > > estimated load will fit in the HVS and membus budget.
> > > > > > > > >
> > > > > > > > > Note that we take an extra margin on the memory bus consumption to let
> > > > > > > > > the system run smoothly when other blocks are doing heavy use of the
> > > > > > > > > memory bus. Same goes for the HVS limit, except the margin is smaller in
> > > > > > > > > this case, since the HVS is not used by external components.
> > > > > > > > >
> > > > > > > > > Signed-off-by: Boris Brezillon <boris.brezillon@bootlin.com>
> > > > > > > > > ---
> > > > > > > > > This logic has been validated using a simple shell script and
> > > > > > > > > some instrumentation in the VC4 driver:
> > > > > > > > >
> > > > > > > > > - capture underflow errors at the HVS level and expose a debugfs file
> > > > > > > > >   reporting those errors
> > > > > > > > > - add debugfs files to expose when atomic_check fails because of the
> > > > > > > > >   HVS or membus load limitation or when it fails for other reasons
> > > > > > > > >
> > > > > > > > > The branch containing those modification is available here [1], and the
> > > > > > > > > script (which is internally using modetest) is here [2] (please note
> > > > > > > > > that I'm bad at writing shell scripts :-)).
> > > > > > > > >
> > > > > > > > > Note that those modification tend to over-estimate the load, and thus
> > > > > > > > > reject setups that might have previously worked, so we might want to
> > > > > > > > > adjust the limits to avoid that.
> > > > > > > > >
> > > > > > > > > [1]https://github.com/bbrezillon/linux/tree/vc4/hvs-bandwidth-eval
> > > > > > > > > [2]https://github.com/bbrezillon/vc4-hvs-bandwidth-test
> > > > > > > >
> > > > > > > > Any interest in using igt to test this stuff? We have at least a bunch of
> > > > > > > > tests already in there that try all kinds of plane setups. And we use
> > > > > > > > those to hunt for underruns on i915 hw.
> > > > > > > >
> > > > > > > > Wrt underrun reporting: On i915 we just dump them into dmesg at the error
> > > > > > > > level, using DRM_ERROR,
> > > > > > >
> > > > > > > Are you masking the underrun interrupt after it's been reported? If we
> > > > > > > don't do that on VC4 we just end up flooding the kernel-log buffer until
> > > > > > > someone comes and update the config.
> > > > > >
> > > > > > Yeah we do that too. Rule is that a full modeset will clear any underrun
> > > > > > masking (so tests need to make sure they start with a modeset, or it'll be
> > > > > > for nothing).
> > > > > >
> > > > > > >
> > > > > > > > plus a tracepoint. See e.g.
> > > > > > > > intel_pch_fifo_underrun_irq_handler(). If there's interest we could
> > > > > > > > perhaps extract this into something common, similar to what was done with
> > > > > > > > crc support already.
> > > > > > > >
> > > > > > >
> > > > > > > I'm not a big fan of hardcoded trace points in general (because of the
> > > > > > > whole "it's part of the stable ABI" thing), and in this case, making the
> > > > > > > tracepoint generic sounds even more risky to me. Indeed, how can we know
> > > > > > > about all the HW specific bits one might want to expose. For instance,
> > > > > > > I see the intel underrun tracepoint exposes a struct with a frame and
> > > > > > > scanline field, and AFAICT, we don't have such information in the VC4
> > > > > > > case.
> > > > > > >
> > > > > > > Any opinion on that?
> > > > > >
> > > > > > It's only abi if you're unlucky. If it's just for debugging and
> > > > > > validation, you can change it again. Tbh, no idea why we even have these
> > > > > > tracepoints, they're fairly useless imo. CI only relies upon the dmesg
> > > > > > output. Maybe run git blame and ask the original author, we can probably
> > > > > > update them to suit our needs.
> > > > >
> > > > > Okay, I think I'll go for a generic debugfs entry that returns true
> > > > > when an underrun error happened since the last modeset, false otherwise.
> > > > >
> > > > > Next question is: should I attach the underrun status to the drm_device
> > > > > or have one per CRTC? In my case, I only care about the "has an
> > > > > underrun error occurred on any of the active CRTC" case, so I'd vote for
> > > > > a per-device underrun status.
> > > >
> > > > Yeah probably good enough. For our CI all we care about is the warn/error
> > > > level dmesg output. Anything at that level is considered a CI failure.
> > >
> > > So igt is grepping dmesg to detect when an underrun happens?
> >
> > No, but the CI runner is also observing dmesg. Anything in there at
> > warning or higher level is considered a failure.
>
> Eric, do you do the same when you launch the IGT testsuite?
>
> >
> > > > What do you need the debugfs file for?
> > >
> > > I just thought having a debugfs file to expose the underrun information
> > > would be cleaner than having to grep in dmesg to detect such failures.
> >
> > The issue is that you want to detect underruns everywhere, not just in the
> > specific tests you're checking for it. Anything that does a modeset could
> > cause an underrun (at least we've managed to do so pretty much everywhere
> > on i915 hw, if you misprogram is sufficiently).
>
> In my specific case, I want to have the IGT test check the underrun
> value while the test is being executed so that I know which exact
> configuration triggers the underrun situation. At least that's how I
> did to adjust/debug the HVS load tracking code. Maybe it's not really a
> problem when all we do is tracking regressions.

Ok, that makes sense, and explains why you want the overall underrun
counter exposed in debugfs. Our experience with underruns has been
that it's very dynamic (on i915 hw), and most often it's not a given
configuration, but a given sequence of changes while some other
workload is going on ...
-Daniel
Boris Brezillon Oct. 26, 2018, 2:52 p.m. UTC | #17
On Fri, 26 Oct 2018 16:26:03 +0200
Daniel Vetter <daniel@ffwll.ch> wrote:

> On Fri, Oct 26, 2018 at 3:57 PM Boris Brezillon
> <boris.brezillon@bootlin.com> wrote:
> >
> > On Fri, 26 Oct 2018 15:30:26 +0200
> > Daniel Vetter <daniel@ffwll.ch> wrote:
> >  
> > > On Thu, Oct 25, 2018 at 11:41:14AM +0200, Boris Brezillon wrote:  
> > > > On Thu, 25 Oct 2018 11:33:14 +0200
> > > > Daniel Vetter <daniel@ffwll.ch> wrote:
> > > >  
> > > > > On Thu, Oct 25, 2018 at 10:09:31AM +0200, Boris Brezillon wrote:  
> > > > > > On Tue, 23 Oct 2018 15:44:43 +0200
> > > > > > Daniel Vetter <daniel@ffwll.ch> wrote:
> > > > > >  
> > > > > > > On Tue, Oct 23, 2018 at 09:55:08AM +0200, Boris Brezillon wrote:  
> > > > > > > > Hi Daniel,
> > > > > > > >
> > > > > > > > On Tue, 16 Oct 2018 14:57:43 +0200
> > > > > > > > Daniel Vetter <daniel@ffwll.ch> wrote:
> > > > > > > >  
> > > > > > > > > On Tue, Oct 16, 2018 at 11:40:45AM +0200, Boris Brezillon wrote:  
> > > > > > > > > > The HVS block is supposed to fill the pixelvalve FIFOs fast enough to
> > > > > > > > > > meet the requested framerate. The problem is, the HVS and memory bus
> > > > > > > > > > bandwidths are limited, and if we don't take these limitations into
> > > > > > > > > > account we might end up with HVS underflow errors.
> > > > > > > > > >
> > > > > > > > > > This patch is trying to model the per-plane HVS and memory bus bandwidth
> > > > > > > > > > consumption and take a decision at atomic_check() time whether the
> > > > > > > > > > estimated load will fit in the HVS and membus budget.
> > > > > > > > > >
> > > > > > > > > > Note that we take an extra margin on the memory bus consumption to let
> > > > > > > > > > the system run smoothly when other blocks are doing heavy use of the
> > > > > > > > > > memory bus. Same goes for the HVS limit, except the margin is smaller in
> > > > > > > > > > this case, since the HVS is not used by external components.
> > > > > > > > > >
> > > > > > > > > > Signed-off-by: Boris Brezillon <boris.brezillon@bootlin.com>
> > > > > > > > > > ---
> > > > > > > > > > This logic has been validated using a simple shell script and
> > > > > > > > > > some instrumentation in the VC4 driver:
> > > > > > > > > >
> > > > > > > > > > - capture underflow errors at the HVS level and expose a debugfs file
> > > > > > > > > >   reporting those errors
> > > > > > > > > > - add debugfs files to expose when atomic_check fails because of the
> > > > > > > > > >   HVS or membus load limitation or when it fails for other reasons
> > > > > > > > > >
> > > > > > > > > > The branch containing those modification is available here [1], and the
> > > > > > > > > > script (which is internally using modetest) is here [2] (please note
> > > > > > > > > > that I'm bad at writing shell scripts :-)).
> > > > > > > > > >
> > > > > > > > > > Note that those modification tend to over-estimate the load, and thus
> > > > > > > > > > reject setups that might have previously worked, so we might want to
> > > > > > > > > > adjust the limits to avoid that.
> > > > > > > > > >
> > > > > > > > > > [1]https://github.com/bbrezillon/linux/tree/vc4/hvs-bandwidth-eval
> > > > > > > > > > [2]https://github.com/bbrezillon/vc4-hvs-bandwidth-test  
> > > > > > > > >
> > > > > > > > > Any interest in using igt to test this stuff? We have at least a bunch of
> > > > > > > > > tests already in there that try all kinds of plane setups. And we use
> > > > > > > > > those to hunt for underruns on i915 hw.
> > > > > > > > >
> > > > > > > > > Wrt underrun reporting: On i915 we just dump them into dmesg at the error
> > > > > > > > > level, using DRM_ERROR,  
> > > > > > > >
> > > > > > > > Are you masking the underrun interrupt after it's been reported? If we
> > > > > > > > don't do that on VC4 we just end up flooding the kernel-log buffer until
> > > > > > > > someone comes and update the config.  
> > > > > > >
> > > > > > > Yeah we do that too. Rule is that a full modeset will clear any underrun
> > > > > > > masking (so tests need to make sure they start with a modeset, or it'll be
> > > > > > > for nothing).
> > > > > > >  
> > > > > > > >  
> > > > > > > > > plus a tracepoint. See e.g.
> > > > > > > > > intel_pch_fifo_underrun_irq_handler(). If there's interest we could
> > > > > > > > > perhaps extract this into something common, similar to what was done with
> > > > > > > > > crc support already.
> > > > > > > > >  
> > > > > > > >
> > > > > > > > I'm not a big fan of hardcoded trace points in general (because of the
> > > > > > > > whole "it's part of the stable ABI" thing), and in this case, making the
> > > > > > > > tracepoint generic sounds even more risky to me. Indeed, how can we know
> > > > > > > > about all the HW specific bits one might want to expose. For instance,
> > > > > > > > I see the intel underrun tracepoint exposes a struct with a frame and
> > > > > > > > scanline field, and AFAICT, we don't have such information in the VC4
> > > > > > > > case.
> > > > > > > >
> > > > > > > > Any opinion on that?  
> > > > > > >
> > > > > > > It's only abi if you're unlucky. If it's just for debugging and
> > > > > > > validation, you can change it again. Tbh, no idea why we even have these
> > > > > > > tracepoints, they're fairly useless imo. CI only relies upon the dmesg
> > > > > > > output. Maybe run git blame and ask the original author, we can probably
> > > > > > > update them to suit our needs.  
> > > > > >
> > > > > > Okay, I think I'll go for a generic debugfs entry that returns true
> > > > > > when an underrun error happened since the last modeset, false otherwise.
> > > > > >
> > > > > > Next question is: should I attach the underrun status to the drm_device
> > > > > > or have one per CRTC? In my case, I only care about the "has an
> > > > > > underrun error occurred on any of the active CRTC" case, so I'd vote for
> > > > > > a per-device underrun status.  
> > > > >
> > > > > Yeah probably good enough. For our CI all we care about is the warn/error
> > > > > level dmesg output. Anything at that level is considered a CI failure.  
> > > >
> > > > So igt is grepping dmesg to detect when an underrun happens?  
> > >
> > > No, but the CI runner is also observing dmesg. Anything in there at
> > > warning or higher level is considered a failure.  
> >
> > Eric, do you do the same when you launch the IGT testsuite?
> >  
> > >  
> > > > > What do you need the debugfs file for?  
> > > >
> > > > I just thought having a debugfs file to expose the underrun information
> > > > would be cleaner than having to grep in dmesg to detect such failures.  
> > >
> > > The issue is that you want to detect underruns everywhere, not just in the
> > > specific tests you're checking for it. Anything that does a modeset could
> > > cause an underrun (at least we've managed to do so pretty much everywhere
> > > on i915 hw, if you misprogram is sufficiently).  
> >
> > In my specific case, I want to have the IGT test check the underrun
> > value while the test is being executed so that I know which exact
> > configuration triggers the underrun situation. At least that's how I
> > did to adjust/debug the HVS load tracking code. Maybe it's not really a
> > problem when all we do is tracking regressions.  
> 
> Ok, that makes sense, and explains why you want the overall underrun
> counter exposed in debugfs.

Just one tiny detail which is not exactly related to this discussion
but I thought I'd mention it here: underrun is actually not a counter,
it's just a boolean. I used an atomic_t type to avoid having to add a
spinlock to protect the variable (the variable is modified from
an interrupt context and read from a non-atomic context). So, the
acceptable values for underrun are currently 0 or 1. I can make it a
counter if required.
Ville Syrjälä Oct. 26, 2018, 3:10 p.m. UTC | #18
On Fri, Oct 26, 2018 at 04:52:33PM +0200, Boris Brezillon wrote:
> On Fri, 26 Oct 2018 16:26:03 +0200
> Daniel Vetter <daniel@ffwll.ch> wrote:
> 
> > On Fri, Oct 26, 2018 at 3:57 PM Boris Brezillon
> > <boris.brezillon@bootlin.com> wrote:
> > >
> > > On Fri, 26 Oct 2018 15:30:26 +0200
> > > Daniel Vetter <daniel@ffwll.ch> wrote:
> > >  
> > > > On Thu, Oct 25, 2018 at 11:41:14AM +0200, Boris Brezillon wrote:  
> > > > > On Thu, 25 Oct 2018 11:33:14 +0200
> > > > > Daniel Vetter <daniel@ffwll.ch> wrote:
> > > > >  
> > > > > > On Thu, Oct 25, 2018 at 10:09:31AM +0200, Boris Brezillon wrote:  
> > > > > > > On Tue, 23 Oct 2018 15:44:43 +0200
> > > > > > > Daniel Vetter <daniel@ffwll.ch> wrote:
> > > > > > >  
> > > > > > > > On Tue, Oct 23, 2018 at 09:55:08AM +0200, Boris Brezillon wrote:  
> > > > > > > > > Hi Daniel,
> > > > > > > > >
> > > > > > > > > On Tue, 16 Oct 2018 14:57:43 +0200
> > > > > > > > > Daniel Vetter <daniel@ffwll.ch> wrote:
> > > > > > > > >  
> > > > > > > > > > On Tue, Oct 16, 2018 at 11:40:45AM +0200, Boris Brezillon wrote:  
> > > > > > > > > > > The HVS block is supposed to fill the pixelvalve FIFOs fast enough to
> > > > > > > > > > > meet the requested framerate. The problem is, the HVS and memory bus
> > > > > > > > > > > bandwidths are limited, and if we don't take these limitations into
> > > > > > > > > > > account we might end up with HVS underflow errors.
> > > > > > > > > > >
> > > > > > > > > > > This patch is trying to model the per-plane HVS and memory bus bandwidth
> > > > > > > > > > > consumption and take a decision at atomic_check() time whether the
> > > > > > > > > > > estimated load will fit in the HVS and membus budget.
> > > > > > > > > > >
> > > > > > > > > > > Note that we take an extra margin on the memory bus consumption to let
> > > > > > > > > > > the system run smoothly when other blocks are doing heavy use of the
> > > > > > > > > > > memory bus. Same goes for the HVS limit, except the margin is smaller in
> > > > > > > > > > > this case, since the HVS is not used by external components.
> > > > > > > > > > >
> > > > > > > > > > > Signed-off-by: Boris Brezillon <boris.brezillon@bootlin.com>
> > > > > > > > > > > ---
> > > > > > > > > > > This logic has been validated using a simple shell script and
> > > > > > > > > > > some instrumentation in the VC4 driver:
> > > > > > > > > > >
> > > > > > > > > > > - capture underflow errors at the HVS level and expose a debugfs file
> > > > > > > > > > >   reporting those errors
> > > > > > > > > > > - add debugfs files to expose when atomic_check fails because of the
> > > > > > > > > > >   HVS or membus load limitation or when it fails for other reasons
> > > > > > > > > > >
> > > > > > > > > > > The branch containing those modification is available here [1], and the
> > > > > > > > > > > script (which is internally using modetest) is here [2] (please note
> > > > > > > > > > > that I'm bad at writing shell scripts :-)).
> > > > > > > > > > >
> > > > > > > > > > > Note that those modification tend to over-estimate the load, and thus
> > > > > > > > > > > reject setups that might have previously worked, so we might want to
> > > > > > > > > > > adjust the limits to avoid that.
> > > > > > > > > > >
> > > > > > > > > > > [1]https://github.com/bbrezillon/linux/tree/vc4/hvs-bandwidth-eval
> > > > > > > > > > > [2]https://github.com/bbrezillon/vc4-hvs-bandwidth-test  
> > > > > > > > > >
> > > > > > > > > > Any interest in using igt to test this stuff? We have at least a bunch of
> > > > > > > > > > tests already in there that try all kinds of plane setups. And we use
> > > > > > > > > > those to hunt for underruns on i915 hw.
> > > > > > > > > >
> > > > > > > > > > Wrt underrun reporting: On i915 we just dump them into dmesg at the error
> > > > > > > > > > level, using DRM_ERROR,  
> > > > > > > > >
> > > > > > > > > Are you masking the underrun interrupt after it's been reported? If we
> > > > > > > > > don't do that on VC4 we just end up flooding the kernel-log buffer until
> > > > > > > > > someone comes and update the config.  
> > > > > > > >
> > > > > > > > Yeah we do that too. Rule is that a full modeset will clear any underrun
> > > > > > > > masking (so tests need to make sure they start with a modeset, or it'll be
> > > > > > > > for nothing).
> > > > > > > >  
> > > > > > > > >  
> > > > > > > > > > plus a tracepoint. See e.g.
> > > > > > > > > > intel_pch_fifo_underrun_irq_handler(). If there's interest we could
> > > > > > > > > > perhaps extract this into something common, similar to what was done with
> > > > > > > > > > crc support already.
> > > > > > > > > >  
> > > > > > > > >
> > > > > > > > > I'm not a big fan of hardcoded trace points in general (because of the
> > > > > > > > > whole "it's part of the stable ABI" thing), and in this case, making the
> > > > > > > > > tracepoint generic sounds even more risky to me. Indeed, how can we know
> > > > > > > > > about all the HW specific bits one might want to expose. For instance,
> > > > > > > > > I see the intel underrun tracepoint exposes a struct with a frame and
> > > > > > > > > scanline field, and AFAICT, we don't have such information in the VC4
> > > > > > > > > case.
> > > > > > > > >
> > > > > > > > > Any opinion on that?  
> > > > > > > >
> > > > > > > > It's only abi if you're unlucky. If it's just for debugging and
> > > > > > > > validation, you can change it again. Tbh, no idea why we even have these
> > > > > > > > tracepoints, they're fairly useless imo. CI only relies upon the dmesg
> > > > > > > > output. Maybe run git blame and ask the original author, we can probably
> > > > > > > > update them to suit our needs.  
> > > > > > >
> > > > > > > Okay, I think I'll go for a generic debugfs entry that returns true
> > > > > > > when an underrun error happened since the last modeset, false otherwise.
> > > > > > >
> > > > > > > Next question is: should I attach the underrun status to the drm_device
> > > > > > > or have one per CRTC? In my case, I only care about the "has an
> > > > > > > underrun error occurred on any of the active CRTC" case, so I'd vote for
> > > > > > > a per-device underrun status.  
> > > > > >
> > > > > > Yeah probably good enough. For our CI all we care about is the warn/error
> > > > > > level dmesg output. Anything at that level is considered a CI failure.  
> > > > >
> > > > > So igt is grepping dmesg to detect when an underrun happens?  
> > > >
> > > > No, but the CI runner is also observing dmesg. Anything in there at
> > > > warning or higher level is considered a failure.  
> > >
> > > Eric, do you do the same when you launch the IGT testsuite?
> > >  
> > > >  
> > > > > > What do you need the debugfs file for?  
> > > > >
> > > > > I just thought having a debugfs file to expose the underrun information
> > > > > would be cleaner than having to grep in dmesg to detect such failures.  
> > > >
> > > > The issue is that you want to detect underruns everywhere, not just in the
> > > > specific tests you're checking for it. Anything that does a modeset could
> > > > cause an underrun (at least we've managed to do so pretty much everywhere
> > > > on i915 hw, if you misprogram is sufficiently).  
> > >
> > > In my specific case, I want to have the IGT test check the underrun
> > > value while the test is being executed so that I know which exact
> > > configuration triggers the underrun situation. At least that's how I
> > > did to adjust/debug the HVS load tracking code. Maybe it's not really a
> > > problem when all we do is tracking regressions.  
> > 
> > Ok, that makes sense, and explains why you want the overall underrun
> > counter exposed in debugfs.
> 
> Just one tiny detail which is not exactly related to this discussion
> but I thought I'd mention it here: underrun is actually not a counter,
> it's just a boolean. I used an atomic_t type to avoid having to add a
> spinlock to protect the variable (the variable is modified from
> an interrupt context and read from a non-atomic context). So, the
> acceptable values for underrun are currently 0 or 1. I can make it a
> counter if required.

One idea I had a while back for i915 would be to count the number of
frames that had an underrun. So basically something like this:

underrun_irq() {
	underrun_frames=1
	disable_underrun_irq();
	vblank_get();
}
vblank_irq() {
	if (underrun) {
		underrun_frames++;
	} else if (underrun_frames) {
		vblank_put();
		enable_underrun_irq();
		DEBUG("%d frames had an underrun\n", underrun_frames);
		underrun_frames=0;
	}
}

This avoids drowning in underrun interrupts while still
reporting at least how many frames had problems.

But I haven't had time to implement that yet.
Daniel Vetter Oct. 29, 2018, 8:06 a.m. UTC | #19
On Fri, Oct 26, 2018 at 06:10:03PM +0300, Ville Syrjälä wrote:
> On Fri, Oct 26, 2018 at 04:52:33PM +0200, Boris Brezillon wrote:
> > On Fri, 26 Oct 2018 16:26:03 +0200
> > Daniel Vetter <daniel@ffwll.ch> wrote:
> > 
> > > On Fri, Oct 26, 2018 at 3:57 PM Boris Brezillon
> > > <boris.brezillon@bootlin.com> wrote:
> > > >
> > > > On Fri, 26 Oct 2018 15:30:26 +0200
> > > > Daniel Vetter <daniel@ffwll.ch> wrote:
> > > >  
> > > > > On Thu, Oct 25, 2018 at 11:41:14AM +0200, Boris Brezillon wrote:  
> > > > > > On Thu, 25 Oct 2018 11:33:14 +0200
> > > > > > Daniel Vetter <daniel@ffwll.ch> wrote:
> > > > > >  
> > > > > > > On Thu, Oct 25, 2018 at 10:09:31AM +0200, Boris Brezillon wrote:  
> > > > > > > > On Tue, 23 Oct 2018 15:44:43 +0200
> > > > > > > > Daniel Vetter <daniel@ffwll.ch> wrote:
> > > > > > > >  
> > > > > > > > > On Tue, Oct 23, 2018 at 09:55:08AM +0200, Boris Brezillon wrote:  
> > > > > > > > > > Hi Daniel,
> > > > > > > > > >
> > > > > > > > > > On Tue, 16 Oct 2018 14:57:43 +0200
> > > > > > > > > > Daniel Vetter <daniel@ffwll.ch> wrote:
> > > > > > > > > >  
> > > > > > > > > > > On Tue, Oct 16, 2018 at 11:40:45AM +0200, Boris Brezillon wrote:  
> > > > > > > > > > > > The HVS block is supposed to fill the pixelvalve FIFOs fast enough to
> > > > > > > > > > > > meet the requested framerate. The problem is, the HVS and memory bus
> > > > > > > > > > > > bandwidths are limited, and if we don't take these limitations into
> > > > > > > > > > > > account we might end up with HVS underflow errors.
> > > > > > > > > > > >
> > > > > > > > > > > > This patch is trying to model the per-plane HVS and memory bus bandwidth
> > > > > > > > > > > > consumption and take a decision at atomic_check() time whether the
> > > > > > > > > > > > estimated load will fit in the HVS and membus budget.
> > > > > > > > > > > >
> > > > > > > > > > > > Note that we take an extra margin on the memory bus consumption to let
> > > > > > > > > > > > the system run smoothly when other blocks are doing heavy use of the
> > > > > > > > > > > > memory bus. Same goes for the HVS limit, except the margin is smaller in
> > > > > > > > > > > > this case, since the HVS is not used by external components.
> > > > > > > > > > > >
> > > > > > > > > > > > Signed-off-by: Boris Brezillon <boris.brezillon@bootlin.com>
> > > > > > > > > > > > ---
> > > > > > > > > > > > This logic has been validated using a simple shell script and
> > > > > > > > > > > > some instrumentation in the VC4 driver:
> > > > > > > > > > > >
> > > > > > > > > > > > - capture underflow errors at the HVS level and expose a debugfs file
> > > > > > > > > > > >   reporting those errors
> > > > > > > > > > > > - add debugfs files to expose when atomic_check fails because of the
> > > > > > > > > > > >   HVS or membus load limitation or when it fails for other reasons
> > > > > > > > > > > >
> > > > > > > > > > > > The branch containing those modification is available here [1], and the
> > > > > > > > > > > > script (which is internally using modetest) is here [2] (please note
> > > > > > > > > > > > that I'm bad at writing shell scripts :-)).
> > > > > > > > > > > >
> > > > > > > > > > > > Note that those modification tend to over-estimate the load, and thus
> > > > > > > > > > > > reject setups that might have previously worked, so we might want to
> > > > > > > > > > > > adjust the limits to avoid that.
> > > > > > > > > > > >
> > > > > > > > > > > > [1]https://github.com/bbrezillon/linux/tree/vc4/hvs-bandwidth-eval
> > > > > > > > > > > > [2]https://github.com/bbrezillon/vc4-hvs-bandwidth-test  
> > > > > > > > > > >
> > > > > > > > > > > Any interest in using igt to test this stuff? We have at least a bunch of
> > > > > > > > > > > tests already in there that try all kinds of plane setups. And we use
> > > > > > > > > > > those to hunt for underruns on i915 hw.
> > > > > > > > > > >
> > > > > > > > > > > Wrt underrun reporting: On i915 we just dump them into dmesg at the error
> > > > > > > > > > > level, using DRM_ERROR,  
> > > > > > > > > >
> > > > > > > > > > Are you masking the underrun interrupt after it's been reported? If we
> > > > > > > > > > don't do that on VC4 we just end up flooding the kernel-log buffer until
> > > > > > > > > > someone comes and update the config.  
> > > > > > > > >
> > > > > > > > > Yeah we do that too. Rule is that a full modeset will clear any underrun
> > > > > > > > > masking (so tests need to make sure they start with a modeset, or it'll be
> > > > > > > > > for nothing).
> > > > > > > > >  
> > > > > > > > > >  
> > > > > > > > > > > plus a tracepoint. See e.g.
> > > > > > > > > > > intel_pch_fifo_underrun_irq_handler(). If there's interest we could
> > > > > > > > > > > perhaps extract this into something common, similar to what was done with
> > > > > > > > > > > crc support already.
> > > > > > > > > > >  
> > > > > > > > > >
> > > > > > > > > > I'm not a big fan of hardcoded trace points in general (because of the
> > > > > > > > > > whole "it's part of the stable ABI" thing), and in this case, making the
> > > > > > > > > > tracepoint generic sounds even more risky to me. Indeed, how can we know
> > > > > > > > > > about all the HW specific bits one might want to expose. For instance,
> > > > > > > > > > I see the intel underrun tracepoint exposes a struct with a frame and
> > > > > > > > > > scanline field, and AFAICT, we don't have such information in the VC4
> > > > > > > > > > case.
> > > > > > > > > >
> > > > > > > > > > Any opinion on that?  
> > > > > > > > >
> > > > > > > > > It's only abi if you're unlucky. If it's just for debugging and
> > > > > > > > > validation, you can change it again. Tbh, no idea why we even have these
> > > > > > > > > tracepoints, they're fairly useless imo. CI only relies upon the dmesg
> > > > > > > > > output. Maybe run git blame and ask the original author, we can probably
> > > > > > > > > update them to suit our needs.  
> > > > > > > >
> > > > > > > > Okay, I think I'll go for a generic debugfs entry that returns true
> > > > > > > > when an underrun error happened since the last modeset, false otherwise.
> > > > > > > >
> > > > > > > > Next question is: should I attach the underrun status to the drm_device
> > > > > > > > or have one per CRTC? In my case, I only care about the "has an
> > > > > > > > underrun error occurred on any of the active CRTC" case, so I'd vote for
> > > > > > > > a per-device underrun status.  
> > > > > > >
> > > > > > > Yeah probably good enough. For our CI all we care about is the warn/error
> > > > > > > level dmesg output. Anything at that level is considered a CI failure.  
> > > > > >
> > > > > > So igt is grepping dmesg to detect when an underrun happens?  
> > > > >
> > > > > No, but the CI runner is also observing dmesg. Anything in there at
> > > > > warning or higher level is considered a failure.  
> > > >
> > > > Eric, do you do the same when you launch the IGT testsuite?
> > > >  
> > > > >  
> > > > > > > What do you need the debugfs file for?  
> > > > > >
> > > > > > I just thought having a debugfs file to expose the underrun information
> > > > > > would be cleaner than having to grep in dmesg to detect such failures.  
> > > > >
> > > > > The issue is that you want to detect underruns everywhere, not just in the
> > > > > specific tests you're checking for it. Anything that does a modeset could
> > > > > cause an underrun (at least we've managed to do so pretty much everywhere
> > > > > on i915 hw, if you misprogram is sufficiently).  
> > > >
> > > > In my specific case, I want to have the IGT test check the underrun
> > > > value while the test is being executed so that I know which exact
> > > > configuration triggers the underrun situation. At least that's how I
> > > > did to adjust/debug the HVS load tracking code. Maybe it's not really a
> > > > problem when all we do is tracking regressions.  
> > > 
> > > Ok, that makes sense, and explains why you want the overall underrun
> > > counter exposed in debugfs.
> > 
> > Just one tiny detail which is not exactly related to this discussion
> > but I thought I'd mention it here: underrun is actually not a counter,
> > it's just a boolean. I used an atomic_t type to avoid having to add a
> > spinlock to protect the variable (the variable is modified from
> > an interrupt context and read from a non-atomic context). So, the
> > acceptable values for underrun are currently 0 or 1. I can make it a
> > counter if required.
> 
> One idea I had a while back for i915 would be to count the number of
> frames that had an underrun. So basically something like this:
> 
> underrun_irq() {
> 	underrun_frames=1
> 	disable_underrun_irq();
> 	vblank_get();
> }
> vblank_irq() {
> 	if (underrun) {
> 		underrun_frames++;
> 	} else if (underrun_frames) {
> 		vblank_put();
> 		enable_underrun_irq();
> 		DEBUG("%d frames had an underrun\n", underrun_frames);
> 		underrun_frames=0;
> 	}
> }
> 
> This avoids drowning in underrun interrupts while still
> reporting at least how many frames had problems.
> 
> But I haven't had time to implement that yet.

The other upshot of a counter is that there's no problem with resetting.
Userspace simply grabs the counter before and after the test and compares.
If you only have 0/1 you need some way to reset, or otherwise automated
running in a CI farm isn't possible.
-Daniel
Boris Brezillon Oct. 29, 2018, 8:41 a.m. UTC | #20
On Mon, 29 Oct 2018 09:06:40 +0100
Daniel Vetter <daniel@ffwll.ch> wrote:

> On Fri, Oct 26, 2018 at 06:10:03PM +0300, Ville Syrjälä wrote:
> > On Fri, Oct 26, 2018 at 04:52:33PM +0200, Boris Brezillon wrote:  
> > > On Fri, 26 Oct 2018 16:26:03 +0200
> > > Daniel Vetter <daniel@ffwll.ch> wrote:
> > >   
> > > > On Fri, Oct 26, 2018 at 3:57 PM Boris Brezillon
> > > > <boris.brezillon@bootlin.com> wrote:  
> > > > >
> > > > > On Fri, 26 Oct 2018 15:30:26 +0200
> > > > > Daniel Vetter <daniel@ffwll.ch> wrote:
> > > > >    
> > > > > > On Thu, Oct 25, 2018 at 11:41:14AM +0200, Boris Brezillon wrote:    
> > > > > > > On Thu, 25 Oct 2018 11:33:14 +0200
> > > > > > > Daniel Vetter <daniel@ffwll.ch> wrote:
> > > > > > >    
> > > > > > > > On Thu, Oct 25, 2018 at 10:09:31AM +0200, Boris Brezillon wrote:    
> > > > > > > > > On Tue, 23 Oct 2018 15:44:43 +0200
> > > > > > > > > Daniel Vetter <daniel@ffwll.ch> wrote:
> > > > > > > > >    
> > > > > > > > > > On Tue, Oct 23, 2018 at 09:55:08AM +0200, Boris Brezillon wrote:    
> > > > > > > > > > > Hi Daniel,
> > > > > > > > > > >
> > > > > > > > > > > On Tue, 16 Oct 2018 14:57:43 +0200
> > > > > > > > > > > Daniel Vetter <daniel@ffwll.ch> wrote:
> > > > > > > > > > >    
> > > > > > > > > > > > On Tue, Oct 16, 2018 at 11:40:45AM +0200, Boris Brezillon wrote:    
> > > > > > > > > > > > > The HVS block is supposed to fill the pixelvalve FIFOs fast enough to
> > > > > > > > > > > > > meet the requested framerate. The problem is, the HVS and memory bus
> > > > > > > > > > > > > bandwidths are limited, and if we don't take these limitations into
> > > > > > > > > > > > > account we might end up with HVS underflow errors.
> > > > > > > > > > > > >
> > > > > > > > > > > > > This patch is trying to model the per-plane HVS and memory bus bandwidth
> > > > > > > > > > > > > consumption and take a decision at atomic_check() time whether the
> > > > > > > > > > > > > estimated load will fit in the HVS and membus budget.
> > > > > > > > > > > > >
> > > > > > > > > > > > > Note that we take an extra margin on the memory bus consumption to let
> > > > > > > > > > > > > the system run smoothly when other blocks are doing heavy use of the
> > > > > > > > > > > > > memory bus. Same goes for the HVS limit, except the margin is smaller in
> > > > > > > > > > > > > this case, since the HVS is not used by external components.
> > > > > > > > > > > > >
> > > > > > > > > > > > > Signed-off-by: Boris Brezillon <boris.brezillon@bootlin.com>
> > > > > > > > > > > > > ---
> > > > > > > > > > > > > This logic has been validated using a simple shell script and
> > > > > > > > > > > > > some instrumentation in the VC4 driver:
> > > > > > > > > > > > >
> > > > > > > > > > > > > - capture underflow errors at the HVS level and expose a debugfs file
> > > > > > > > > > > > >   reporting those errors
> > > > > > > > > > > > > - add debugfs files to expose when atomic_check fails because of the
> > > > > > > > > > > > >   HVS or membus load limitation or when it fails for other reasons
> > > > > > > > > > > > >
> > > > > > > > > > > > > The branch containing those modification is available here [1], and the
> > > > > > > > > > > > > script (which is internally using modetest) is here [2] (please note
> > > > > > > > > > > > > that I'm bad at writing shell scripts :-)).
> > > > > > > > > > > > >
> > > > > > > > > > > > > Note that those modification tend to over-estimate the load, and thus
> > > > > > > > > > > > > reject setups that might have previously worked, so we might want to
> > > > > > > > > > > > > adjust the limits to avoid that.
> > > > > > > > > > > > >
> > > > > > > > > > > > > [1]https://github.com/bbrezillon/linux/tree/vc4/hvs-bandwidth-eval
> > > > > > > > > > > > > [2]https://github.com/bbrezillon/vc4-hvs-bandwidth-test    
> > > > > > > > > > > >
> > > > > > > > > > > > Any interest in using igt to test this stuff? We have at least a bunch of
> > > > > > > > > > > > tests already in there that try all kinds of plane setups. And we use
> > > > > > > > > > > > those to hunt for underruns on i915 hw.
> > > > > > > > > > > >
> > > > > > > > > > > > Wrt underrun reporting: On i915 we just dump them into dmesg at the error
> > > > > > > > > > > > level, using DRM_ERROR,    
> > > > > > > > > > >
> > > > > > > > > > > Are you masking the underrun interrupt after it's been reported? If we
> > > > > > > > > > > don't do that on VC4 we just end up flooding the kernel-log buffer until
> > > > > > > > > > > someone comes and update the config.    
> > > > > > > > > >
> > > > > > > > > > Yeah we do that too. Rule is that a full modeset will clear any underrun
> > > > > > > > > > masking (so tests need to make sure they start with a modeset, or it'll be
> > > > > > > > > > for nothing).
> > > > > > > > > >    
> > > > > > > > > > >    
> > > > > > > > > > > > plus a tracepoint. See e.g.
> > > > > > > > > > > > intel_pch_fifo_underrun_irq_handler(). If there's interest we could
> > > > > > > > > > > > perhaps extract this into something common, similar to what was done with
> > > > > > > > > > > > crc support already.
> > > > > > > > > > > >    
> > > > > > > > > > >
> > > > > > > > > > > I'm not a big fan of hardcoded trace points in general (because of the
> > > > > > > > > > > whole "it's part of the stable ABI" thing), and in this case, making the
> > > > > > > > > > > tracepoint generic sounds even more risky to me. Indeed, how can we know
> > > > > > > > > > > about all the HW specific bits one might want to expose. For instance,
> > > > > > > > > > > I see the intel underrun tracepoint exposes a struct with a frame and
> > > > > > > > > > > scanline field, and AFAICT, we don't have such information in the VC4
> > > > > > > > > > > case.
> > > > > > > > > > >
> > > > > > > > > > > Any opinion on that?    
> > > > > > > > > >
> > > > > > > > > > It's only abi if you're unlucky. If it's just for debugging and
> > > > > > > > > > validation, you can change it again. Tbh, no idea why we even have these
> > > > > > > > > > tracepoints, they're fairly useless imo. CI only relies upon the dmesg
> > > > > > > > > > output. Maybe run git blame and ask the original author, we can probably
> > > > > > > > > > update them to suit our needs.    
> > > > > > > > >
> > > > > > > > > Okay, I think I'll go for a generic debugfs entry that returns true
> > > > > > > > > when an underrun error happened since the last modeset, false otherwise.
> > > > > > > > >
> > > > > > > > > Next question is: should I attach the underrun status to the drm_device
> > > > > > > > > or have one per CRTC? In my case, I only care about the "has an
> > > > > > > > > underrun error occurred on any of the active CRTC" case, so I'd vote for
> > > > > > > > > a per-device underrun status.    
> > > > > > > >
> > > > > > > > Yeah probably good enough. For our CI all we care about is the warn/error
> > > > > > > > level dmesg output. Anything at that level is considered a CI failure.    
> > > > > > >
> > > > > > > So igt is grepping dmesg to detect when an underrun happens?    
> > > > > >
> > > > > > No, but the CI runner is also observing dmesg. Anything in there at
> > > > > > warning or higher level is considered a failure.    
> > > > >
> > > > > Eric, do you do the same when you launch the IGT testsuite?
> > > > >    
> > > > > >    
> > > > > > > > What do you need the debugfs file for?    
> > > > > > >
> > > > > > > I just thought having a debugfs file to expose the underrun information
> > > > > > > would be cleaner than having to grep in dmesg to detect such failures.    
> > > > > >
> > > > > > The issue is that you want to detect underruns everywhere, not just in the
> > > > > > specific tests you're checking for it. Anything that does a modeset could
> > > > > > cause an underrun (at least we've managed to do so pretty much everywhere
> > > > > > on i915 hw, if you misprogram is sufficiently).    
> > > > >
> > > > > In my specific case, I want to have the IGT test check the underrun
> > > > > value while the test is being executed so that I know which exact
> > > > > configuration triggers the underrun situation. At least that's how I
> > > > > did to adjust/debug the HVS load tracking code. Maybe it's not really a
> > > > > problem when all we do is tracking regressions.    
> > > > 
> > > > Ok, that makes sense, and explains why you want the overall underrun
> > > > counter exposed in debugfs.  
> > > 
> > > Just one tiny detail which is not exactly related to this discussion
> > > but I thought I'd mention it here: underrun is actually not a counter,
> > > it's just a boolean. I used an atomic_t type to avoid having to add a
> > > spinlock to protect the variable (the variable is modified from
> > > an interrupt context and read from a non-atomic context). So, the
> > > acceptable values for underrun are currently 0 or 1. I can make it a
> > > counter if required.  
> > 
> > One idea I had a while back for i915 would be to count the number of
> > frames that had an underrun. So basically something like this:
> > 
> > underrun_irq() {
> > 	underrun_frames=1
> > 	disable_underrun_irq();
> > 	vblank_get();
> > }
> > vblank_irq() {
> > 	if (underrun) {
> > 		underrun_frames++;
> > 	} else if (underrun_frames) {
> > 		vblank_put();
> > 		enable_underrun_irq();
> > 		DEBUG("%d frames had an underrun\n", underrun_frames);
> > 		underrun_frames=0;
> > 	}
> > }
> > 
> > This avoids drowning in underrun interrupts while still
> > reporting at least how many frames had problems.
> > 
> > But I haven't had time to implement that yet.  

May I ask why you need to know how many frames had underrun issues?

> 
> The other upshot of a counter is that there's no problem with resetting.
> Userspace simply grabs the counter before and after the test and compares.
> If you only have 0/1 you need some way to reset, or otherwise automated
> running in a CI farm isn't possible.

The reset was done during atomic commit in my proposal, so no action
was required on the user side apart from applying a new config.
Anyway, I'll change that for a real counter.
Daniel Vetter Oct. 29, 2018, 9:03 a.m. UTC | #21
On Mon, Oct 29, 2018 at 09:41:36AM +0100, Boris Brezillon wrote:
> On Mon, 29 Oct 2018 09:06:40 +0100
> Daniel Vetter <daniel@ffwll.ch> wrote:
> 
> > On Fri, Oct 26, 2018 at 06:10:03PM +0300, Ville Syrjälä wrote:
> > > On Fri, Oct 26, 2018 at 04:52:33PM +0200, Boris Brezillon wrote:  
> > > > On Fri, 26 Oct 2018 16:26:03 +0200
> > > > Daniel Vetter <daniel@ffwll.ch> wrote:
> > > >   
> > > > > On Fri, Oct 26, 2018 at 3:57 PM Boris Brezillon
> > > > > <boris.brezillon@bootlin.com> wrote:  
> > > > > >
> > > > > > On Fri, 26 Oct 2018 15:30:26 +0200
> > > > > > Daniel Vetter <daniel@ffwll.ch> wrote:
> > > > > >    
> > > > > > > On Thu, Oct 25, 2018 at 11:41:14AM +0200, Boris Brezillon wrote:    
> > > > > > > > On Thu, 25 Oct 2018 11:33:14 +0200
> > > > > > > > Daniel Vetter <daniel@ffwll.ch> wrote:
> > > > > > > >    
> > > > > > > > > On Thu, Oct 25, 2018 at 10:09:31AM +0200, Boris Brezillon wrote:    
> > > > > > > > > > On Tue, 23 Oct 2018 15:44:43 +0200
> > > > > > > > > > Daniel Vetter <daniel@ffwll.ch> wrote:
> > > > > > > > > >    
> > > > > > > > > > > On Tue, Oct 23, 2018 at 09:55:08AM +0200, Boris Brezillon wrote:    
> > > > > > > > > > > > Hi Daniel,
> > > > > > > > > > > >
> > > > > > > > > > > > On Tue, 16 Oct 2018 14:57:43 +0200
> > > > > > > > > > > > Daniel Vetter <daniel@ffwll.ch> wrote:
> > > > > > > > > > > >    
> > > > > > > > > > > > > On Tue, Oct 16, 2018 at 11:40:45AM +0200, Boris Brezillon wrote:    
> > > > > > > > > > > > > > The HVS block is supposed to fill the pixelvalve FIFOs fast enough to
> > > > > > > > > > > > > > meet the requested framerate. The problem is, the HVS and memory bus
> > > > > > > > > > > > > > bandwidths are limited, and if we don't take these limitations into
> > > > > > > > > > > > > > account we might end up with HVS underflow errors.
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > This patch is trying to model the per-plane HVS and memory bus bandwidth
> > > > > > > > > > > > > > consumption and take a decision at atomic_check() time whether the
> > > > > > > > > > > > > > estimated load will fit in the HVS and membus budget.
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > Note that we take an extra margin on the memory bus consumption to let
> > > > > > > > > > > > > > the system run smoothly when other blocks are doing heavy use of the
> > > > > > > > > > > > > > memory bus. Same goes for the HVS limit, except the margin is smaller in
> > > > > > > > > > > > > > this case, since the HVS is not used by external components.
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > Signed-off-by: Boris Brezillon <boris.brezillon@bootlin.com>
> > > > > > > > > > > > > > ---
> > > > > > > > > > > > > > This logic has been validated using a simple shell script and
> > > > > > > > > > > > > > some instrumentation in the VC4 driver:
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > - capture underflow errors at the HVS level and expose a debugfs file
> > > > > > > > > > > > > >   reporting those errors
> > > > > > > > > > > > > > - add debugfs files to expose when atomic_check fails because of the
> > > > > > > > > > > > > >   HVS or membus load limitation or when it fails for other reasons
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > The branch containing those modification is available here [1], and the
> > > > > > > > > > > > > > script (which is internally using modetest) is here [2] (please note
> > > > > > > > > > > > > > that I'm bad at writing shell scripts :-)).
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > Note that those modification tend to over-estimate the load, and thus
> > > > > > > > > > > > > > reject setups that might have previously worked, so we might want to
> > > > > > > > > > > > > > adjust the limits to avoid that.
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > [1]https://github.com/bbrezillon/linux/tree/vc4/hvs-bandwidth-eval
> > > > > > > > > > > > > > [2]https://github.com/bbrezillon/vc4-hvs-bandwidth-test    
> > > > > > > > > > > > >
> > > > > > > > > > > > > Any interest in using igt to test this stuff? We have at least a bunch of
> > > > > > > > > > > > > tests already in there that try all kinds of plane setups. And we use
> > > > > > > > > > > > > those to hunt for underruns on i915 hw.
> > > > > > > > > > > > >
> > > > > > > > > > > > > Wrt underrun reporting: On i915 we just dump them into dmesg at the error
> > > > > > > > > > > > > level, using DRM_ERROR,    
> > > > > > > > > > > >
> > > > > > > > > > > > Are you masking the underrun interrupt after it's been reported? If we
> > > > > > > > > > > > don't do that on VC4 we just end up flooding the kernel-log buffer until
> > > > > > > > > > > > someone comes and update the config.    
> > > > > > > > > > >
> > > > > > > > > > > Yeah we do that too. Rule is that a full modeset will clear any underrun
> > > > > > > > > > > masking (so tests need to make sure they start with a modeset, or it'll be
> > > > > > > > > > > for nothing).
> > > > > > > > > > >    
> > > > > > > > > > > >    
> > > > > > > > > > > > > plus a tracepoint. See e.g.
> > > > > > > > > > > > > intel_pch_fifo_underrun_irq_handler(). If there's interest we could
> > > > > > > > > > > > > perhaps extract this into something common, similar to what was done with
> > > > > > > > > > > > > crc support already.
> > > > > > > > > > > > >    
> > > > > > > > > > > >
> > > > > > > > > > > > I'm not a big fan of hardcoded trace points in general (because of the
> > > > > > > > > > > > whole "it's part of the stable ABI" thing), and in this case, making the
> > > > > > > > > > > > tracepoint generic sounds even more risky to me. Indeed, how can we know
> > > > > > > > > > > > about all the HW specific bits one might want to expose. For instance,
> > > > > > > > > > > > I see the intel underrun tracepoint exposes a struct with a frame and
> > > > > > > > > > > > scanline field, and AFAICT, we don't have such information in the VC4
> > > > > > > > > > > > case.
> > > > > > > > > > > >
> > > > > > > > > > > > Any opinion on that?    
> > > > > > > > > > >
> > > > > > > > > > > It's only abi if you're unlucky. If it's just for debugging and
> > > > > > > > > > > validation, you can change it again. Tbh, no idea why we even have these
> > > > > > > > > > > tracepoints, they're fairly useless imo. CI only relies upon the dmesg
> > > > > > > > > > > output. Maybe run git blame and ask the original author, we can probably
> > > > > > > > > > > update them to suit our needs.    
> > > > > > > > > >
> > > > > > > > > > Okay, I think I'll go for a generic debugfs entry that returns true
> > > > > > > > > > when an underrun error happened since the last modeset, false otherwise.
> > > > > > > > > >
> > > > > > > > > > Next question is: should I attach the underrun status to the drm_device
> > > > > > > > > > or have one per CRTC? In my case, I only care about the "has an
> > > > > > > > > > underrun error occurred on any of the active CRTC" case, so I'd vote for
> > > > > > > > > > a per-device underrun status.    
> > > > > > > > >
> > > > > > > > > Yeah probably good enough. For our CI all we care about is the warn/error
> > > > > > > > > level dmesg output. Anything at that level is considered a CI failure.    
> > > > > > > >
> > > > > > > > So igt is grepping dmesg to detect when an underrun happens?    
> > > > > > >
> > > > > > > No, but the CI runner is also observing dmesg. Anything in there at
> > > > > > > warning or higher level is considered a failure.    
> > > > > >
> > > > > > Eric, do you do the same when you launch the IGT testsuite?
> > > > > >    
> > > > > > >    
> > > > > > > > > What do you need the debugfs file for?    
> > > > > > > >
> > > > > > > > I just thought having a debugfs file to expose the underrun information
> > > > > > > > would be cleaner than having to grep in dmesg to detect such failures.    
> > > > > > >
> > > > > > > The issue is that you want to detect underruns everywhere, not just in the
> > > > > > > specific tests you're checking for it. Anything that does a modeset could
> > > > > > > cause an underrun (at least we've managed to do so pretty much everywhere
> > > > > > > on i915 hw, if you misprogram is sufficiently).    
> > > > > >
> > > > > > In my specific case, I want to have the IGT test check the underrun
> > > > > > value while the test is being executed so that I know which exact
> > > > > > configuration triggers the underrun situation. At least that's how I
> > > > > > did to adjust/debug the HVS load tracking code. Maybe it's not really a
> > > > > > problem when all we do is tracking regressions.    
> > > > > 
> > > > > Ok, that makes sense, and explains why you want the overall underrun
> > > > > counter exposed in debugfs.  
> > > > 
> > > > Just one tiny detail which is not exactly related to this discussion
> > > > but I thought I'd mention it here: underrun is actually not a counter,
> > > > it's just a boolean. I used an atomic_t type to avoid having to add a
> > > > spinlock to protect the variable (the variable is modified from
> > > > an interrupt context and read from a non-atomic context). So, the
> > > > acceptable values for underrun are currently 0 or 1. I can make it a
> > > > counter if required.  
> > > 
> > > One idea I had a while back for i915 would be to count the number of
> > > frames that had an underrun. So basically something like this:
> > > 
> > > underrun_irq() {
> > > 	underrun_frames=1
> > > 	disable_underrun_irq();
> > > 	vblank_get();
> > > }
> > > vblank_irq() {
> > > 	if (underrun) {
> > > 		underrun_frames++;
> > > 	} else if (underrun_frames) {
> > > 		vblank_put();
> > > 		enable_underrun_irq();
> > > 		DEBUG("%d frames had an underrun\n", underrun_frames);
> > > 		underrun_frames=0;
> > > 	}
> > > }
> > > 
> > > This avoids drowning in underrun interrupts while still
> > > reporting at least how many frames had problems.
> > > 
> > > But I haven't had time to implement that yet.  
> 
> May I ask why you need to know how many frames had underrun issues?

Much quicker testing when you're experimenting around with e.g. watermark
settings. Full modeset for resetting can easily take a few seconds. I
think we even had patches that restored the interrupt after 1ms already,
to have more accurate sampling to judge whether a given change made things
worse or better. We might have somewhat strange hw :-)

> > The other upshot of a counter is that there's no problem with resetting.
> > Userspace simply grabs the counter before and after the test and compares.
> > If you only have 0/1 you need some way to reset, or otherwise automated
> > running in a CI farm isn't possible.
> 
> The reset was done during atomic commit in my proposal, so no action
> was required on the user side apart from applying a new config.
> Anyway, I'll change that for a real counter.

We want to measure underruns while doing full modesets too (specifically,
when shutting down the pipeline). At least for our hw this has
historically helped greatly in catching programming sequence issues.
-Daniel
Boris Brezillon Oct. 29, 2018, 9:10 a.m. UTC | #22
On Mon, 29 Oct 2018 10:03:01 +0100
Daniel Vetter <daniel@ffwll.ch> wrote:

> On Mon, Oct 29, 2018 at 09:41:36AM +0100, Boris Brezillon wrote:
> > On Mon, 29 Oct 2018 09:06:40 +0100
> > Daniel Vetter <daniel@ffwll.ch> wrote:
> >   
> > > On Fri, Oct 26, 2018 at 06:10:03PM +0300, Ville Syrjälä wrote:  
> > > > On Fri, Oct 26, 2018 at 04:52:33PM +0200, Boris Brezillon wrote:    
> > > > > On Fri, 26 Oct 2018 16:26:03 +0200
> > > > > Daniel Vetter <daniel@ffwll.ch> wrote:
> > > > >     
> > > > > > On Fri, Oct 26, 2018 at 3:57 PM Boris Brezillon
> > > > > > <boris.brezillon@bootlin.com> wrote:    
> > > > > > >
> > > > > > > On Fri, 26 Oct 2018 15:30:26 +0200
> > > > > > > Daniel Vetter <daniel@ffwll.ch> wrote:
> > > > > > >      
> > > > > > > > On Thu, Oct 25, 2018 at 11:41:14AM +0200, Boris Brezillon wrote:      
> > > > > > > > > On Thu, 25 Oct 2018 11:33:14 +0200
> > > > > > > > > Daniel Vetter <daniel@ffwll.ch> wrote:
> > > > > > > > >      
> > > > > > > > > > On Thu, Oct 25, 2018 at 10:09:31AM +0200, Boris Brezillon wrote:      
> > > > > > > > > > > On Tue, 23 Oct 2018 15:44:43 +0200
> > > > > > > > > > > Daniel Vetter <daniel@ffwll.ch> wrote:
> > > > > > > > > > >      
> > > > > > > > > > > > On Tue, Oct 23, 2018 at 09:55:08AM +0200, Boris Brezillon wrote:      
> > > > > > > > > > > > > Hi Daniel,
> > > > > > > > > > > > >
> > > > > > > > > > > > > On Tue, 16 Oct 2018 14:57:43 +0200
> > > > > > > > > > > > > Daniel Vetter <daniel@ffwll.ch> wrote:
> > > > > > > > > > > > >      
> > > > > > > > > > > > > > On Tue, Oct 16, 2018 at 11:40:45AM +0200, Boris Brezillon wrote:      
> > > > > > > > > > > > > > > The HVS block is supposed to fill the pixelvalve FIFOs fast enough to
> > > > > > > > > > > > > > > meet the requested framerate. The problem is, the HVS and memory bus
> > > > > > > > > > > > > > > bandwidths are limited, and if we don't take these limitations into
> > > > > > > > > > > > > > > account we might end up with HVS underflow errors.
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > This patch is trying to model the per-plane HVS and memory bus bandwidth
> > > > > > > > > > > > > > > consumption and take a decision at atomic_check() time whether the
> > > > > > > > > > > > > > > estimated load will fit in the HVS and membus budget.
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > Note that we take an extra margin on the memory bus consumption to let
> > > > > > > > > > > > > > > the system run smoothly when other blocks are doing heavy use of the
> > > > > > > > > > > > > > > memory bus. Same goes for the HVS limit, except the margin is smaller in
> > > > > > > > > > > > > > > this case, since the HVS is not used by external components.
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > Signed-off-by: Boris Brezillon <boris.brezillon@bootlin.com>
> > > > > > > > > > > > > > > ---
> > > > > > > > > > > > > > > This logic has been validated using a simple shell script and
> > > > > > > > > > > > > > > some instrumentation in the VC4 driver:
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > - capture underflow errors at the HVS level and expose a debugfs file
> > > > > > > > > > > > > > >   reporting those errors
> > > > > > > > > > > > > > > - add debugfs files to expose when atomic_check fails because of the
> > > > > > > > > > > > > > >   HVS or membus load limitation or when it fails for other reasons
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > The branch containing those modification is available here [1], and the
> > > > > > > > > > > > > > > script (which is internally using modetest) is here [2] (please note
> > > > > > > > > > > > > > > that I'm bad at writing shell scripts :-)).
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > Note that those modification tend to over-estimate the load, and thus
> > > > > > > > > > > > > > > reject setups that might have previously worked, so we might want to
> > > > > > > > > > > > > > > adjust the limits to avoid that.
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > [1]https://github.com/bbrezillon/linux/tree/vc4/hvs-bandwidth-eval
> > > > > > > > > > > > > > > [2]https://github.com/bbrezillon/vc4-hvs-bandwidth-test      
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > Any interest in using igt to test this stuff? We have at least a bunch of
> > > > > > > > > > > > > > tests already in there that try all kinds of plane setups. And we use
> > > > > > > > > > > > > > those to hunt for underruns on i915 hw.
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > Wrt underrun reporting: On i915 we just dump them into dmesg at the error
> > > > > > > > > > > > > > level, using DRM_ERROR,      
> > > > > > > > > > > > >
> > > > > > > > > > > > > Are you masking the underrun interrupt after it's been reported? If we
> > > > > > > > > > > > > don't do that on VC4 we just end up flooding the kernel-log buffer until
> > > > > > > > > > > > > someone comes and update the config.      
> > > > > > > > > > > >
> > > > > > > > > > > > Yeah we do that too. Rule is that a full modeset will clear any underrun
> > > > > > > > > > > > masking (so tests need to make sure they start with a modeset, or it'll be
> > > > > > > > > > > > for nothing).
> > > > > > > > > > > >      
> > > > > > > > > > > > >      
> > > > > > > > > > > > > > plus a tracepoint. See e.g.
> > > > > > > > > > > > > > intel_pch_fifo_underrun_irq_handler(). If there's interest we could
> > > > > > > > > > > > > > perhaps extract this into something common, similar to what was done with
> > > > > > > > > > > > > > crc support already.
> > > > > > > > > > > > > >      
> > > > > > > > > > > > >
> > > > > > > > > > > > > I'm not a big fan of hardcoded trace points in general (because of the
> > > > > > > > > > > > > whole "it's part of the stable ABI" thing), and in this case, making the
> > > > > > > > > > > > > tracepoint generic sounds even more risky to me. Indeed, how can we know
> > > > > > > > > > > > > about all the HW specific bits one might want to expose. For instance,
> > > > > > > > > > > > > I see the intel underrun tracepoint exposes a struct with a frame and
> > > > > > > > > > > > > scanline field, and AFAICT, we don't have such information in the VC4
> > > > > > > > > > > > > case.
> > > > > > > > > > > > >
> > > > > > > > > > > > > Any opinion on that?      
> > > > > > > > > > > >
> > > > > > > > > > > > It's only abi if you're unlucky. If it's just for debugging and
> > > > > > > > > > > > validation, you can change it again. Tbh, no idea why we even have these
> > > > > > > > > > > > tracepoints, they're fairly useless imo. CI only relies upon the dmesg
> > > > > > > > > > > > output. Maybe run git blame and ask the original author, we can probably
> > > > > > > > > > > > update them to suit our needs.      
> > > > > > > > > > >
> > > > > > > > > > > Okay, I think I'll go for a generic debugfs entry that returns true
> > > > > > > > > > > when an underrun error happened since the last modeset, false otherwise.
> > > > > > > > > > >
> > > > > > > > > > > Next question is: should I attach the underrun status to the drm_device
> > > > > > > > > > > or have one per CRTC? In my case, I only care about the "has an
> > > > > > > > > > > underrun error occurred on any of the active CRTC" case, so I'd vote for
> > > > > > > > > > > a per-device underrun status.      
> > > > > > > > > >
> > > > > > > > > > Yeah probably good enough. For our CI all we care about is the warn/error
> > > > > > > > > > level dmesg output. Anything at that level is considered a CI failure.      
> > > > > > > > >
> > > > > > > > > So igt is grepping dmesg to detect when an underrun happens?      
> > > > > > > >
> > > > > > > > No, but the CI runner is also observing dmesg. Anything in there at
> > > > > > > > warning or higher level is considered a failure.      
> > > > > > >
> > > > > > > Eric, do you do the same when you launch the IGT testsuite?
> > > > > > >      
> > > > > > > >      
> > > > > > > > > > What do you need the debugfs file for?      
> > > > > > > > >
> > > > > > > > > I just thought having a debugfs file to expose the underrun information
> > > > > > > > > would be cleaner than having to grep in dmesg to detect such failures.      
> > > > > > > >
> > > > > > > > The issue is that you want to detect underruns everywhere, not just in the
> > > > > > > > specific tests you're checking for it. Anything that does a modeset could
> > > > > > > > cause an underrun (at least we've managed to do so pretty much everywhere
> > > > > > > > on i915 hw, if you misprogram is sufficiently).      
> > > > > > >
> > > > > > > In my specific case, I want to have the IGT test check the underrun
> > > > > > > value while the test is being executed so that I know which exact
> > > > > > > configuration triggers the underrun situation. At least that's how I
> > > > > > > did to adjust/debug the HVS load tracking code. Maybe it's not really a
> > > > > > > problem when all we do is tracking regressions.      
> > > > > > 
> > > > > > Ok, that makes sense, and explains why you want the overall underrun
> > > > > > counter exposed in debugfs.    
> > > > > 
> > > > > Just one tiny detail which is not exactly related to this discussion
> > > > > but I thought I'd mention it here: underrun is actually not a counter,
> > > > > it's just a boolean. I used an atomic_t type to avoid having to add a
> > > > > spinlock to protect the variable (the variable is modified from
> > > > > an interrupt context and read from a non-atomic context). So, the
> > > > > acceptable values for underrun are currently 0 or 1. I can make it a
> > > > > counter if required.    
> > > > 
> > > > One idea I had a while back for i915 would be to count the number of
> > > > frames that had an underrun. So basically something like this:
> > > > 
> > > > underrun_irq() {
> > > > 	underrun_frames=1
> > > > 	disable_underrun_irq();
> > > > 	vblank_get();
> > > > }
> > > > vblank_irq() {
> > > > 	if (underrun) {
> > > > 		underrun_frames++;
> > > > 	} else if (underrun_frames) {
> > > > 		vblank_put();
> > > > 		enable_underrun_irq();
> > > > 		DEBUG("%d frames had an underrun\n", underrun_frames);
> > > > 		underrun_frames=0;
> > > > 	}
> > > > }
> > > > 
> > > > This avoids drowning in underrun interrupts while still
> > > > reporting at least how many frames had problems.
> > > > 
> > > > But I haven't had time to implement that yet.    
> > 
> > May I ask why you need to know how many frames had underrun issues?  
> 
> Much quicker testing when you're experimenting around with e.g. watermark
> settings. Full modeset for resetting can easily take a few seconds. I
> think we even had patches that restored the interrupt after 1ms already,
> to have more accurate sampling to judge whether a given change made things
> worse or better. We might have somewhat strange hw :-)
> 
> > > The other upshot of a counter is that there's no problem with resetting.
> > > Userspace simply grabs the counter before and after the test and compares.
> > > If you only have 0/1 you need some way to reset, or otherwise automated
> > > running in a CI farm isn't possible.  
> > 
> > The reset was done during atomic commit in my proposal, so no action
> > was required on the user side apart from applying a new config.
> > Anyway, I'll change that for a real counter.  
> 
> We want to measure underruns while doing full modesets too (specifically,
> when shutting down the pipeline). At least for our hw this has
> historically helped greatly in catching programming sequence issues.

Looks like you have particular needs for intel HW which I don't know
about, maybe I shouldn't be the one driving this rework...
Daniel Vetter Oct. 29, 2018, 1:48 p.m. UTC | #23
On Mon, Oct 29, 2018 at 10:10:56AM +0100, Boris Brezillon wrote:
> On Mon, 29 Oct 2018 10:03:01 +0100
> Daniel Vetter <daniel@ffwll.ch> wrote:
> > On Mon, Oct 29, 2018 at 09:41:36AM +0100, Boris Brezillon wrote:
> > > On Mon, 29 Oct 2018 09:06:40 +0100
> > > > The other upshot of a counter is that there's no problem with resetting.
> > > > Userspace simply grabs the counter before and after the test and compares.
> > > > If you only have 0/1 you need some way to reset, or otherwise automated
> > > > running in a CI farm isn't possible.  
> > > 
> > > The reset was done during atomic commit in my proposal, so no action
> > > was required on the user side apart from applying a new config.
> > > Anyway, I'll change that for a real counter.  
> > 
> > We want to measure underruns while doing full modesets too (specifically,
> > when shutting down the pipeline). At least for our hw this has
> > historically helped greatly in catching programming sequence issues.
> 
> Looks like you have particular needs for intel HW which I don't know
> about, maybe I shouldn't be the one driving this rework...

No one is expecting you do drive i915 work, we just want to make sure that
whatever you come up for igt tests is generally useful, and not just only
for the limited use case of vc4. Atm the only two drivers interested in
underrun reporting are i915 and vc4, and it looks like they span a pretty
broad spectrum alredy.
-Daniel
diff mbox series

Patch

diff --git a/drivers/gpu/drm/vc4/vc4_drv.h b/drivers/gpu/drm/vc4/vc4_drv.h
index bd6ef1f31822..48f6ee5ceda3 100644
--- a/drivers/gpu/drm/vc4/vc4_drv.h
+++ b/drivers/gpu/drm/vc4/vc4_drv.h
@@ -200,6 +200,7 @@  struct vc4_dev {
 
 	struct drm_modeset_lock ctm_state_lock;
 	struct drm_private_obj ctm_manager;
+	struct drm_private_obj load_tracker;
 };
 
 static inline struct vc4_dev *
@@ -369,6 +370,16 @@  struct vc4_plane_state {
 	 * to enable background color fill.
 	 */
 	bool needs_bg_fill;
+
+	/* Load of this plane on the HVS block. The load is expressed in HVS
+	 * cycles/sec.
+	 */
+	u64 hvs_load;
+
+	/* Memory bandwidth needed for this plane. This is expressed in
+	 * bytes/sec.
+	 */
+	u64 membus_load;
 };
 
 static inline struct vc4_plane_state *
diff --git a/drivers/gpu/drm/vc4/vc4_kms.c b/drivers/gpu/drm/vc4/vc4_kms.c
index 127468785f74..4c65e6013bd3 100644
--- a/drivers/gpu/drm/vc4/vc4_kms.c
+++ b/drivers/gpu/drm/vc4/vc4_kms.c
@@ -34,6 +34,18 @@  static struct vc4_ctm_state *to_vc4_ctm_state(struct drm_private_state *priv)
 	return container_of(priv, struct vc4_ctm_state, base);
 }
 
+struct vc4_load_tracker_state {
+	struct drm_private_state base;
+	u64 hvs_load;
+	u64 membus_load;
+};
+
+static struct vc4_load_tracker_state *
+to_vc4_load_tracker_state(struct drm_private_state *priv)
+{
+	return container_of(priv, struct vc4_load_tracker_state, base);
+}
+
 static struct vc4_ctm_state *vc4_get_ctm_state(struct drm_atomic_state *state,
 					       struct drm_private_obj *manager)
 {
@@ -379,6 +391,81 @@  vc4_ctm_atomic_check(struct drm_device *dev, struct drm_atomic_state *state)
 	return 0;
 }
 
+static int vc4_load_tracker_atomic_check(struct drm_atomic_state *state)
+{
+	struct drm_plane_state *old_plane_state, *new_plane_state;
+	struct vc4_dev *vc4 = to_vc4_dev(state->dev);
+	struct vc4_load_tracker_state *load_state;
+	struct drm_private_state *priv_state;
+	struct drm_plane *plane;
+	int ret, i;
+
+	priv_state = drm_atomic_get_private_obj_state(state,
+						      &vc4->load_tracker);
+	if (IS_ERR(priv_state))
+		return PTR_ERR(priv_state);
+
+	load_state = to_vc4_load_tracker_state(priv_state);
+	for_each_oldnew_plane_in_state(state, plane, old_plane_state,
+				       new_plane_state, i) {
+		struct vc4_plane_state *vc4_plane_state;
+
+		if (old_plane_state->fb && old_plane_state->crtc) {
+			vc4_plane_state = to_vc4_plane_state(old_plane_state);
+			load_state->membus_load -= vc4_plane_state->membus_load;
+			load_state->hvs_load -= vc4_plane_state->hvs_load;
+		}
+
+		if (new_plane_state->fb && new_plane_state->crtc) {
+			vc4_plane_state = to_vc4_plane_state(new_plane_state);
+			load_state->membus_load += vc4_plane_state->membus_load;
+			load_state->hvs_load += vc4_plane_state->hvs_load;
+		}
+	}
+
+	/* The abolsute limit is 2Gbyte/sec, but let's take a margin to let
+	 * the system work when other blocks are accessing the memory.
+	 */
+	if (load_state->membus_load > SZ_1G + SZ_512M)
+		return -ENOSPC;
+
+	/* HVS clock is supposed to run @ 250Mhz, let's take a margin and
+	 * consider the maximum number of cycles is 240M.
+	 */
+	if (load_state->hvs_load > 240000000ULL)
+		return -ENOSPC;
+
+	return 0;
+}
+
+static struct drm_private_state *
+vc4_load_tracker_duplicate_state(struct drm_private_obj *obj)
+{
+	struct vc4_load_tracker_state *state;
+
+	state = kmemdup(obj->state, sizeof(*state), GFP_KERNEL);
+	if (!state)
+		return NULL;
+
+	__drm_atomic_helper_private_obj_duplicate_state(obj, &state->base);
+
+	return &state->base;
+}
+
+static void vc4_load_tracker_destroy_state(struct drm_private_obj *obj,
+					   struct drm_private_state *state)
+{
+	struct vc4_load_tracker_state *load_state;
+
+	load_state = to_vc4_load_tracker_state(state);
+	kfree(load_state);
+}
+
+static const struct drm_private_state_funcs vc4_load_tracker_state_funcs = {
+	.atomic_duplicate_state = vc4_load_tracker_duplicate_state,
+	.atomic_destroy_state = vc4_load_tracker_destroy_state,
+};
+
 static int
 vc4_atomic_check(struct drm_device *dev, struct drm_atomic_state *state)
 {
@@ -388,7 +475,11 @@  vc4_atomic_check(struct drm_device *dev, struct drm_atomic_state *state)
 	if (ret < 0)
 		return ret;
 
-	return drm_atomic_helper_check(dev, state);
+	ret = drm_atomic_helper_check(dev, state);
+	if (ret)
+		return ret;
+
+	return vc4_load_tracker_atomic_check(state);
 }
 
 static const struct drm_mode_config_funcs vc4_mode_funcs = {
@@ -401,6 +492,7 @@  int vc4_kms_load(struct drm_device *dev)
 {
 	struct vc4_dev *vc4 = to_vc4_dev(dev);
 	struct vc4_ctm_state *ctm_state;
+	struct vc4_load_tracker_state *load_state;
 	int ret;
 
 	sema_init(&vc4->async_modeset, 1);
@@ -426,9 +518,19 @@  int vc4_kms_load(struct drm_device *dev)
 	ctm_state = kzalloc(sizeof(*ctm_state), GFP_KERNEL);
 	if (!ctm_state)
 		return -ENOMEM;
+
 	drm_atomic_private_obj_init(&vc4->ctm_manager, &ctm_state->base,
 				    &vc4_ctm_state_funcs);
 
+	load_state = kzalloc(sizeof(*load_state), GFP_KERNEL);
+	if (!load_state) {
+		drm_atomic_private_obj_fini(&vc4->ctm_manager);
+		return -ENOMEM;
+	}
+
+	drm_atomic_private_obj_init(&vc4->load_tracker, &load_state->base,
+				    &vc4_load_tracker_state_funcs);
+
 	drm_mode_config_reset(dev);
 
 	drm_kms_helper_poll_init(dev);
diff --git a/drivers/gpu/drm/vc4/vc4_plane.c b/drivers/gpu/drm/vc4/vc4_plane.c
index 60d5ad19cedd..f47d38383a2f 100644
--- a/drivers/gpu/drm/vc4/vc4_plane.c
+++ b/drivers/gpu/drm/vc4/vc4_plane.c
@@ -455,6 +455,64 @@  static void vc4_write_scaling_parameters(struct drm_plane_state *state,
 	}
 }
 
+static void vc4_plane_calc_load(struct drm_plane_state *state)
+{
+	unsigned int hvs_load_shift, vrefresh, i;
+	struct drm_framebuffer *fb = state->fb;
+	struct vc4_plane_state *vc4_state;
+	struct drm_crtc_state *crtc_state;
+	unsigned int vscale_factor;
+
+	vc4_state = to_vc4_plane_state(state);
+	crtc_state = drm_atomic_get_existing_crtc_state(state->state,
+							state->crtc);
+	vrefresh = drm_mode_vrefresh(&crtc_state->adjusted_mode);
+
+	/* The HVS is able to process 2 pixels/cycle when scaling the source,
+	 * 4 pixels/cycle otherwise.
+	 * Alpha blending step seems to be pipelined and it's always operating
+	 * at 4 pixels/cycle, so the limiting aspect here seems to be the
+	 * scaler block.
+	 * HVS load is expressed in clk-cycles/sec (AKA Hz).
+	 */
+	if (vc4_state->x_scaling[0] != VC4_SCALING_NONE ||
+	    vc4_state->x_scaling[1] != VC4_SCALING_NONE ||
+	    vc4_state->y_scaling[0] != VC4_SCALING_NONE ||
+	    vc4_state->y_scaling[1] != VC4_SCALING_NONE)
+		hvs_load_shift = 1;
+	else
+		hvs_load_shift = 2;
+
+	vc4_state->membus_load = 0;
+	vc4_state->hvs_load = 0;
+	for (i = 0; i < fb->format->num_planes; i++) {
+		unsigned long pixels_load;
+
+		/* Even if the bandwidth/plane required for a single frame is
+		 *
+		 * vc4_state->src_w[i] * vc4_state->src_h[i] * cpp * vrefresh
+		 *
+		 * when downscaling, we have to read more pixels per line in
+		 * the time frame reserved for a single line, so the bandwidth
+		 * demand can be punctually higher. To account for that, we
+		 * calculate the down-scaling factor and multiply the plane
+		 * load by this number. We're likely over-estimating the read
+		 * demand, but that's better than under-estimating it.
+		 */
+		vscale_factor = DIV_ROUND_UP(vc4_state->src_h[i],
+					     vc4_state->crtc_h);
+		pixels_load = vc4_state->src_w[i] * vc4_state->src_h[i] *
+			      vscale_factor;
+
+		vc4_state->membus_load += fb->format->cpp[i] * pixels_load;
+		vc4_state->hvs_load += pixels_load;
+	}
+
+	vc4_state->hvs_load *= vrefresh;
+	vc4_state->hvs_load >>= hvs_load_shift;
+	vc4_state->membus_load *= vrefresh;
+}
+
 /* Writes out a full display list for an active plane to the plane's
  * private dlist state.
  */
@@ -722,6 +780,8 @@  static int vc4_plane_mode_set(struct drm_plane *plane,
 	vc4_state->needs_bg_fill = fb->format->has_alpha || !covers_screen ||
 				   state->alpha != DRM_BLEND_ALPHA_OPAQUE;
 
+	vc4_plane_calc_load(state);
+
 	return 0;
 }