diff mbox series

[15/20] drm/i915/guc: Ensure H2G buffer updates visible before tail update

Message ID 20210603051630.2635-16-matthew.brost@intel.com (mailing list archive)
State New, archived
Headers show
Series GuC CTBs changes + a few misc patches | expand

Commit Message

Matthew Brost June 3, 2021, 5:16 a.m. UTC
Ensure H2G buffer updates are visible before descriptor tail updates by
inserting a barrier between the H2G buffer update and the tail. The
barrier is simple wmb() for SMEM and is register write for LMEM. This is
needed if more than 1 H2G can be inflight at once.

If this barrier is not inserted it is possible the descriptor tail
update is scene by the GuC before H2G buffer update which results in the
GuC reading a corrupt H2G value. This can bring down the H2G channel
among other bad things.

Signed-off-by: Matthew Brost <matthew.brost@intel.com>
Cc: Michal Wajdeczko <michal.wajdeczko@intel.com>
Reviewed-by: John Harrison <John.C.Harrison@Intel.com>
---
 drivers/gpu/drm/i915/gt/uc/intel_guc_ct.c | 28 +++++++++++++++++++++++
 1 file changed, 28 insertions(+)

Comments

Michal Wajdeczko June 3, 2021, 9:44 a.m. UTC | #1
On 03.06.2021 07:16, Matthew Brost wrote:
> Ensure H2G buffer updates are visible before descriptor tail updates by
> inserting a barrier between the H2G buffer update and the tail. The
> barrier is simple wmb() for SMEM and is register write for LMEM. This is
> needed if more than 1 H2G can be inflight at once.
> 
> If this barrier is not inserted it is possible the descriptor tail
> update is scene by the GuC before H2G buffer update which results in the
> GuC reading a corrupt H2G value. This can bring down the H2G channel
> among other bad things.
> 
> Signed-off-by: Matthew Brost <matthew.brost@intel.com>
> Cc: Michal Wajdeczko <michal.wajdeczko@intel.com>
> Reviewed-by: John Harrison <John.C.Harrison@Intel.com>
> ---
>  drivers/gpu/drm/i915/gt/uc/intel_guc_ct.c | 28 +++++++++++++++++++++++
>  1 file changed, 28 insertions(+)
> 
> diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.c b/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.c
> index 80976fe40fbf..31f83956bfc3 100644
> --- a/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.c
> +++ b/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.c
> @@ -328,6 +328,28 @@ static u32 ct_get_next_fence(struct intel_guc_ct *ct)
>  	return ++ct->requests.last_fence;
>  }
>  
> +static void write_barrier(struct intel_guc_ct *ct)
> +{
> +	struct intel_guc *guc = ct_to_guc(ct);
> +	struct intel_gt *gt = guc_to_gt(guc);
> +
> +	if (i915_gem_object_is_lmem(guc->ct.vma->obj)) {
> +		GEM_BUG_ON(guc->send_regs.fw_domains);
> +		/*
> +		 * This register is used by the i915 and GuC for MMIO based
> +		 * communication. Once we are in this code CTBs are the only
> +		 * method the i915 uses to communicate with the GuC so it is
> +		 * safe to write to this register (a value of 0 is NOP for MMIO
> +		 * communication). If we ever start mixing CTBs and MMIOs a new
> +		 * register will have to be chosen.
> +		 */
> +		intel_uncore_write_fw(gt->uncore, GEN11_SOFT_SCRATCH(0), 0);

can't we at least start with SOFT_SCRATCH register that is not used for
GuC MMIO based communication on Gen12 LMEM platforms? see [1]

I really don't feel comfortable that we are touching a register that
elsewhere is protected with the mutex. And mixing CTBs and MMIO is not
far away.

Michal

[1]
https://lore.kernel.org/intel-gfx/51b9bd05-7d6f-29f1-de0f-3a14bade6c9c@intel.com/

> +	} else {
> +		/* wmb() sufficient for a barrier if in smem */
> +		wmb();
> +	}
> +}
> +
>  /**
>   * DOC: CTB Host to GuC request
>   *
> @@ -411,6 +433,12 @@ static int ct_write(struct intel_guc_ct *ct,
>  	}
>  	GEM_BUG_ON(tail > size);
>  
> +	/*
> +	 * make sure H2G buffer update and LRC tail update (if this triggering a
> +	 * submission) are visible before updating the descriptor tail
> +	 */
> +	write_barrier(ct);
> +
>  	/* now update desc tail (back in bytes) */
>  	desc->tail = tail * 4;
>  	return 0;
>
Matthew Brost June 3, 2021, 4:10 p.m. UTC | #2
On Thu, Jun 03, 2021 at 11:44:57AM +0200, Michal Wajdeczko wrote:
> 
> 
> On 03.06.2021 07:16, Matthew Brost wrote:
> > Ensure H2G buffer updates are visible before descriptor tail updates by
> > inserting a barrier between the H2G buffer update and the tail. The
> > barrier is simple wmb() for SMEM and is register write for LMEM. This is
> > needed if more than 1 H2G can be inflight at once.
> > 
> > If this barrier is not inserted it is possible the descriptor tail
> > update is scene by the GuC before H2G buffer update which results in the
> > GuC reading a corrupt H2G value. This can bring down the H2G channel
> > among other bad things.
> > 
> > Signed-off-by: Matthew Brost <matthew.brost@intel.com>
> > Cc: Michal Wajdeczko <michal.wajdeczko@intel.com>
> > Reviewed-by: John Harrison <John.C.Harrison@Intel.com>
> > ---
> >  drivers/gpu/drm/i915/gt/uc/intel_guc_ct.c | 28 +++++++++++++++++++++++
> >  1 file changed, 28 insertions(+)
> > 
> > diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.c b/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.c
> > index 80976fe40fbf..31f83956bfc3 100644
> > --- a/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.c
> > +++ b/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.c
> > @@ -328,6 +328,28 @@ static u32 ct_get_next_fence(struct intel_guc_ct *ct)
> >  	return ++ct->requests.last_fence;
> >  }
> >  
> > +static void write_barrier(struct intel_guc_ct *ct)
> > +{
> > +	struct intel_guc *guc = ct_to_guc(ct);
> > +	struct intel_gt *gt = guc_to_gt(guc);
> > +
> > +	if (i915_gem_object_is_lmem(guc->ct.vma->obj)) {
> > +		GEM_BUG_ON(guc->send_regs.fw_domains);
> > +		/*
> > +		 * This register is used by the i915 and GuC for MMIO based
> > +		 * communication. Once we are in this code CTBs are the only
> > +		 * method the i915 uses to communicate with the GuC so it is
> > +		 * safe to write to this register (a value of 0 is NOP for MMIO
> > +		 * communication). If we ever start mixing CTBs and MMIOs a new
> > +		 * register will have to be chosen.
> > +		 */
> > +		intel_uncore_write_fw(gt->uncore, GEN11_SOFT_SCRATCH(0), 0);
> 
> can't we at least start with SOFT_SCRATCH register that is not used for
> GuC MMIO based communication on Gen12 LMEM platforms? see [1]
> 

We likely can use this but I really don't feel comfortable switching the
register without some more testing first (e.g. let's change in this in
internal, let it soak for bit, then make the change upstream).

> I really don't feel comfortable that we are touching a register that
> elsewhere is protected with the mutex. And mixing CTBs and MMIO is not
> far away.
>

The only code that mixes CTBs and MMIOs is SRIOV which is a ways away
from landing.

Matt
 
> Michal
> 
> [1]
> https://lore.kernel.org/intel-gfx/51b9bd05-7d6f-29f1-de0f-3a14bade6c9c@intel.com/
> 
> > +	} else {
> > +		/* wmb() sufficient for a barrier if in smem */
> > +		wmb();
> > +	}
> > +}
> > +
> >  /**
> >   * DOC: CTB Host to GuC request
> >   *
> > @@ -411,6 +433,12 @@ static int ct_write(struct intel_guc_ct *ct,
> >  	}
> >  	GEM_BUG_ON(tail > size);
> >  
> > +	/*
> > +	 * make sure H2G buffer update and LRC tail update (if this triggering a
> > +	 * submission) are visible before updating the descriptor tail
> > +	 */
> > +	write_barrier(ct);
> > +
> >  	/* now update desc tail (back in bytes) */
> >  	desc->tail = tail * 4;
> >  	return 0;
> >
Daniel Vetter June 4, 2021, 8:39 a.m. UTC | #3
On Thu, Jun 03, 2021 at 09:10:14AM -0700, Matthew Brost wrote:
> On Thu, Jun 03, 2021 at 11:44:57AM +0200, Michal Wajdeczko wrote:
> > 
> > 
> > On 03.06.2021 07:16, Matthew Brost wrote:
> > > Ensure H2G buffer updates are visible before descriptor tail updates by
> > > inserting a barrier between the H2G buffer update and the tail. The
> > > barrier is simple wmb() for SMEM and is register write for LMEM. This is
> > > needed if more than 1 H2G can be inflight at once.
> > > 
> > > If this barrier is not inserted it is possible the descriptor tail
> > > update is scene by the GuC before H2G buffer update which results in the
> > > GuC reading a corrupt H2G value. This can bring down the H2G channel
> > > among other bad things.
> > > 
> > > Signed-off-by: Matthew Brost <matthew.brost@intel.com>
> > > Cc: Michal Wajdeczko <michal.wajdeczko@intel.com>
> > > Reviewed-by: John Harrison <John.C.Harrison@Intel.com>
> > > ---
> > >  drivers/gpu/drm/i915/gt/uc/intel_guc_ct.c | 28 +++++++++++++++++++++++
> > >  1 file changed, 28 insertions(+)
> > > 
> > > diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.c b/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.c
> > > index 80976fe40fbf..31f83956bfc3 100644
> > > --- a/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.c
> > > +++ b/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.c
> > > @@ -328,6 +328,28 @@ static u32 ct_get_next_fence(struct intel_guc_ct *ct)
> > >  	return ++ct->requests.last_fence;
> > >  }
> > >  
> > > +static void write_barrier(struct intel_guc_ct *ct)
> > > +{
> > > +	struct intel_guc *guc = ct_to_guc(ct);
> > > +	struct intel_gt *gt = guc_to_gt(guc);
> > > +
> > > +	if (i915_gem_object_is_lmem(guc->ct.vma->obj)) {
> > > +		GEM_BUG_ON(guc->send_regs.fw_domains);
> > > +		/*
> > > +		 * This register is used by the i915 and GuC for MMIO based
> > > +		 * communication. Once we are in this code CTBs are the only
> > > +		 * method the i915 uses to communicate with the GuC so it is
> > > +		 * safe to write to this register (a value of 0 is NOP for MMIO
> > > +		 * communication). If we ever start mixing CTBs and MMIOs a new
> > > +		 * register will have to be chosen.
> > > +		 */
> > > +		intel_uncore_write_fw(gt->uncore, GEN11_SOFT_SCRATCH(0), 0);
> > 
> > can't we at least start with SOFT_SCRATCH register that is not used for
> > GuC MMIO based communication on Gen12 LMEM platforms? see [1]
> > 
> 
> We likely can use this but I really don't feel comfortable switching the
> register without some more testing first (e.g. let's change in this in
> internal, let it soak for bit, then make the change upstream).
> 
> > I really don't feel comfortable that we are touching a register that
> > elsewhere is protected with the mutex. And mixing CTBs and MMIO is not
> > far away.
> >
> 
> The only code that mixes CTBs and MMIOs is SRIOV which is a ways away
> from landing.

Maybe add a FIXME note as part of the SRIOV patch stack in internal to
track this?
-Daniel

> 
> Matt
>  
> > Michal
> > 
> > [1]
> > https://lore.kernel.org/intel-gfx/51b9bd05-7d6f-29f1-de0f-3a14bade6c9c@intel.com/
> > 
> > > +	} else {
> > > +		/* wmb() sufficient for a barrier if in smem */
> > > +		wmb();
> > > +	}
> > > +}
> > > +
> > >  /**
> > >   * DOC: CTB Host to GuC request
> > >   *
> > > @@ -411,6 +433,12 @@ static int ct_write(struct intel_guc_ct *ct,
> > >  	}
> > >  	GEM_BUG_ON(tail > size);
> > >  
> > > +	/*
> > > +	 * make sure H2G buffer update and LRC tail update (if this triggering a
> > > +	 * submission) are visible before updating the descriptor tail
> > > +	 */
> > > +	write_barrier(ct);
> > > +
> > >  	/* now update desc tail (back in bytes) */
> > >  	desc->tail = tail * 4;
> > >  	return 0;
> > >
diff mbox series

Patch

diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.c b/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.c
index 80976fe40fbf..31f83956bfc3 100644
--- a/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.c
+++ b/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.c
@@ -328,6 +328,28 @@  static u32 ct_get_next_fence(struct intel_guc_ct *ct)
 	return ++ct->requests.last_fence;
 }
 
+static void write_barrier(struct intel_guc_ct *ct)
+{
+	struct intel_guc *guc = ct_to_guc(ct);
+	struct intel_gt *gt = guc_to_gt(guc);
+
+	if (i915_gem_object_is_lmem(guc->ct.vma->obj)) {
+		GEM_BUG_ON(guc->send_regs.fw_domains);
+		/*
+		 * This register is used by the i915 and GuC for MMIO based
+		 * communication. Once we are in this code CTBs are the only
+		 * method the i915 uses to communicate with the GuC so it is
+		 * safe to write to this register (a value of 0 is NOP for MMIO
+		 * communication). If we ever start mixing CTBs and MMIOs a new
+		 * register will have to be chosen.
+		 */
+		intel_uncore_write_fw(gt->uncore, GEN11_SOFT_SCRATCH(0), 0);
+	} else {
+		/* wmb() sufficient for a barrier if in smem */
+		wmb();
+	}
+}
+
 /**
  * DOC: CTB Host to GuC request
  *
@@ -411,6 +433,12 @@  static int ct_write(struct intel_guc_ct *ct,
 	}
 	GEM_BUG_ON(tail > size);
 
+	/*
+	 * make sure H2G buffer update and LRC tail update (if this triggering a
+	 * submission) are visible before updating the descriptor tail
+	 */
+	write_barrier(ct);
+
 	/* now update desc tail (back in bytes) */
 	desc->tail = tail * 4;
 	return 0;