diff mbox series

drm/i915: Refine mmio debug flow to avoid bad unlock balance detected.

Message ID 20230704080727.2665-1-shawn.c.lee@intel.com (mailing list archive)
State New, archived
Headers show
Series drm/i915: Refine mmio debug flow to avoid bad unlock balance detected. | expand

Commit Message

Lee, Shawn C July 4, 2023, 8:07 a.m. UTC
Perform reboot stresss test on a kernel build with lockdebug flag enabled.
Bad unlock balanace detected would happened sometimes. Below is the
problematic scenario. If params.mmio_debug value was changed at step #4.
And it would trigger this issue. Modify code flow that decide to
enable/disable mmio debug before unclaimed_reg_debug() can avoid
this symptom.

    1. GEN6_READ_HEADER is called with params.mmio_debug = 0
    2. unclaimed_reg_debug(before = true) is called
    3. unclaimed_reg_debug return without taking a lock
       because params.mmio_debug == 0
    4. other thread modifies params.mmio_debug to 1
    5. GEN6_READ_FOOTER is called with params.mmio_debug != 0
    6. unclaimed_reg_debug tries to assert non-taken lock (first WARN)
    7. unclaimed_reg_debug tries to release non-taken lock (second WARN)

Closes:https://gitlab.freedesktop.org/drm/intel/-/issues/8749
Signed-off-by: Lee Shawn C <shawn.c.lee@intel.com>
Cc: Uma Shankar <uma.shankar@intel.com>
Cc: Matt Roper <matthew.d.roper@intel.com>
Cc: Andi Shyti <andi.shyti@linux.intel.com>
---
 drivers/gpu/drm/i915/intel_uncore.c | 10 +++++-----
 1 file changed, 5 insertions(+), 5 deletions(-)

Comments

Jani Nikula July 4, 2023, 9:54 a.m. UTC | #1
On Tue, 04 Jul 2023, Lee Shawn C <shawn.c.lee@intel.com> wrote:
> Perform reboot stresss test on a kernel build with lockdebug flag enabled.
> Bad unlock balanace detected would happened sometimes. Below is the
> problematic scenario. If params.mmio_debug value was changed at step #4.
> And it would trigger this issue. Modify code flow that decide to
> enable/disable mmio debug before unclaimed_reg_debug() can avoid
> this symptom.
>
>     1. GEN6_READ_HEADER is called with params.mmio_debug = 0
>     2. unclaimed_reg_debug(before = true) is called
>     3. unclaimed_reg_debug return without taking a lock
>        because params.mmio_debug == 0
>     4. other thread modifies params.mmio_debug to 1
>     5. GEN6_READ_FOOTER is called with params.mmio_debug != 0
>     6. unclaimed_reg_debug tries to assert non-taken lock (first WARN)
>     7. unclaimed_reg_debug tries to release non-taken lock (second WARN)
>
> Closes:https://gitlab.freedesktop.org/drm/intel/-/issues/8749

Thanks for the report and analysis!

> Signed-off-by: Lee Shawn C <shawn.c.lee@intel.com>
> Cc: Uma Shankar <uma.shankar@intel.com>
> Cc: Matt Roper <matthew.d.roper@intel.com>
> Cc: Andi Shyti <andi.shyti@linux.intel.com>
> ---
>  drivers/gpu/drm/i915/intel_uncore.c | 10 +++++-----
>  1 file changed, 5 insertions(+), 5 deletions(-)
>
> diff --git a/drivers/gpu/drm/i915/intel_uncore.c b/drivers/gpu/drm/i915/intel_uncore.c
> index 796ebfe6c550..9d665978cc43 100644
> --- a/drivers/gpu/drm/i915/intel_uncore.c
> +++ b/drivers/gpu/drm/i915/intel_uncore.c
> @@ -1931,9 +1931,6 @@ unclaimed_reg_debug(struct intel_uncore *uncore,
>  		    const bool read,
>  		    const bool before)
>  {
> -	if (likely(!uncore->i915->params.mmio_debug) || !uncore->debug)
> -		return;
> -
>  	/* interrupts are disabled and re-enabled around uncore->lock usage */
>  	lockdep_assert_held(&uncore->lock);
>  
> @@ -2001,13 +1998,16 @@ __gen2_read(64)
>  #define GEN6_READ_HEADER(x) \
>  	u32 offset = i915_mmio_reg_offset(reg); \
>  	unsigned long irqflags; \
> +	const bool mmio_debug = likely(uncore->i915->params.mmio_debug) || uncore->debug; \

This changes the condition. The old one requires both to be true, this
requires either to be true. Additionally this gets the "likely"
backwards as the condition is reversed.

	const bool mmio_debug =	unlikely(uncore->i915->params.mmio_debug) && uncore->debug;

Anyway. I'm not happy about having the debug conditions moved within
GEN6_READ_HEADER. This also does not address GEN6_WRITE_*.

I sent a slightly different kind of fix [1].


BR,
Jani.


[1] https://patchwork.freedesktop.org/series/120167/


>  	u##x val = 0; \
>  	assert_rpm_wakelock_held(uncore->rpm); \
>  	spin_lock_irqsave(&uncore->lock, irqflags); \
> -	unclaimed_reg_debug(uncore, reg, true, true)
> +	if (mmio_debug) \
> +		unclaimed_reg_debug(uncore, reg, true, true)
>  
>  #define GEN6_READ_FOOTER \
> -	unclaimed_reg_debug(uncore, reg, true, false); \
> +	if (mmio_debug) \
> +		unclaimed_reg_debug(uncore, reg, true, false); \
>  	spin_unlock_irqrestore(&uncore->lock, irqflags); \
>  	trace_i915_reg_rw(false, reg, val, sizeof(val), trace); \
>  	return val
Andi Shyti July 11, 2023, 10:31 a.m. UTC | #2
Hi Lee,

I'm really sorry for the late reply, I had some holidays in
between.

On Tue, Jul 04, 2023 at 04:07:27PM +0800, Lee Shawn C wrote:
> Perform reboot stresss test on a kernel build with lockdebug flag enabled.
> Bad unlock balanace detected would happened sometimes. Below is the
> problematic scenario. If params.mmio_debug value was changed at step #4.
> And it would trigger this issue. Modify code flow that decide to
> enable/disable mmio debug before unclaimed_reg_debug() can avoid
> this symptom.
> 
>     1. GEN6_READ_HEADER is called with params.mmio_debug = 0
>     2. unclaimed_reg_debug(before = true) is called
>     3. unclaimed_reg_debug return without taking a lock
>        because params.mmio_debug == 0
>     4. other thread modifies params.mmio_debug to 1
>     5. GEN6_READ_FOOTER is called with params.mmio_debug != 0
>     6. unclaimed_reg_debug tries to assert non-taken lock (first WARN)
>     7. unclaimed_reg_debug tries to release non-taken lock (second WARN)
> 
> Closes:https://gitlab.freedesktop.org/drm/intel/-/issues/8749
> Signed-off-by: Lee Shawn C <shawn.c.lee@intel.com>
> Cc: Uma Shankar <uma.shankar@intel.com>
> Cc: Matt Roper <matthew.d.roper@intel.com>
> Cc: Andi Shyti <andi.shyti@linux.intel.com>
> ---
>  drivers/gpu/drm/i915/intel_uncore.c | 10 +++++-----
>  1 file changed, 5 insertions(+), 5 deletions(-)
> 
> diff --git a/drivers/gpu/drm/i915/intel_uncore.c b/drivers/gpu/drm/i915/intel_uncore.c
> index 796ebfe6c550..9d665978cc43 100644
> --- a/drivers/gpu/drm/i915/intel_uncore.c
> +++ b/drivers/gpu/drm/i915/intel_uncore.c
> @@ -1931,9 +1931,6 @@ unclaimed_reg_debug(struct intel_uncore *uncore,
>  		    const bool read,
>  		    const bool before)
>  {
> -	if (likely(!uncore->i915->params.mmio_debug) || !uncore->debug)
> -		return;
> -

this is a very good catch! I'm fine with the change from my side:

Reviewed-by: Andi Shyti <andi.shyti@linux.intel.com> 

But I'd like to hear from someone else, as well, Matt, Daniele?

Thank you,
Andi
Jani Nikula July 11, 2023, 10:40 a.m. UTC | #3
On Tue, 11 Jul 2023, Andi Shyti <andi.shyti@linux.intel.com> wrote:
> Hi Lee,
>
> I'm really sorry for the late reply, I had some holidays in
> between.
>
> On Tue, Jul 04, 2023 at 04:07:27PM +0800, Lee Shawn C wrote:
>> Perform reboot stresss test on a kernel build with lockdebug flag enabled.
>> Bad unlock balanace detected would happened sometimes. Below is the
>> problematic scenario. If params.mmio_debug value was changed at step #4.
>> And it would trigger this issue. Modify code flow that decide to
>> enable/disable mmio debug before unclaimed_reg_debug() can avoid
>> this symptom.
>> 
>>     1. GEN6_READ_HEADER is called with params.mmio_debug = 0
>>     2. unclaimed_reg_debug(before = true) is called
>>     3. unclaimed_reg_debug return without taking a lock
>>        because params.mmio_debug == 0
>>     4. other thread modifies params.mmio_debug to 1
>>     5. GEN6_READ_FOOTER is called with params.mmio_debug != 0
>>     6. unclaimed_reg_debug tries to assert non-taken lock (first WARN)
>>     7. unclaimed_reg_debug tries to release non-taken lock (second WARN)
>> 
>> Closes:https://gitlab.freedesktop.org/drm/intel/-/issues/8749
>> Signed-off-by: Lee Shawn C <shawn.c.lee@intel.com>
>> Cc: Uma Shankar <uma.shankar@intel.com>
>> Cc: Matt Roper <matthew.d.roper@intel.com>
>> Cc: Andi Shyti <andi.shyti@linux.intel.com>
>> ---
>>  drivers/gpu/drm/i915/intel_uncore.c | 10 +++++-----
>>  1 file changed, 5 insertions(+), 5 deletions(-)
>> 
>> diff --git a/drivers/gpu/drm/i915/intel_uncore.c b/drivers/gpu/drm/i915/intel_uncore.c
>> index 796ebfe6c550..9d665978cc43 100644
>> --- a/drivers/gpu/drm/i915/intel_uncore.c
>> +++ b/drivers/gpu/drm/i915/intel_uncore.c
>> @@ -1931,9 +1931,6 @@ unclaimed_reg_debug(struct intel_uncore *uncore,
>>  		    const bool read,
>>  		    const bool before)
>>  {
>> -	if (likely(!uncore->i915->params.mmio_debug) || !uncore->debug)
>> -		return;
>> -
>
> this is a very good catch! I'm fine with the change from my side:
>
> Reviewed-by: Andi Shyti <andi.shyti@linux.intel.com>

Nope, there's a bug in the patch.

https://lore.kernel.org/r/87ilb00zot.fsf@intel.com

>
> But I'd like to hear from someone else, as well, Matt, Daniele?
>
> Thank you,
> Andi
Andi Shyti July 11, 2023, 11:06 a.m. UTC | #4
Hi Jani,

> >> diff --git a/drivers/gpu/drm/i915/intel_uncore.c b/drivers/gpu/drm/i915/intel_uncore.c
> >> index 796ebfe6c550..9d665978cc43 100644
> >> --- a/drivers/gpu/drm/i915/intel_uncore.c
> >> +++ b/drivers/gpu/drm/i915/intel_uncore.c
> >> @@ -1931,9 +1931,6 @@ unclaimed_reg_debug(struct intel_uncore *uncore,
> >>  		    const bool read,
> >>  		    const bool before)
> >>  {
> >> -	if (likely(!uncore->i915->params.mmio_debug) || !uncore->debug)
> >> -		return;
> >> -
> >
> > this is a very good catch! I'm fine with the change from my side:
> >
> > Reviewed-by: Andi Shyti <andi.shyti@linux.intel.com>
> 
> Nope, there's a bug in the patch.
> 
> https://lore.kernel.org/r/87ilb00zot.fsf@intel.com

OK, I missed your reply.

Thanks, Jani!

Andi
diff mbox series

Patch

diff --git a/drivers/gpu/drm/i915/intel_uncore.c b/drivers/gpu/drm/i915/intel_uncore.c
index 796ebfe6c550..9d665978cc43 100644
--- a/drivers/gpu/drm/i915/intel_uncore.c
+++ b/drivers/gpu/drm/i915/intel_uncore.c
@@ -1931,9 +1931,6 @@  unclaimed_reg_debug(struct intel_uncore *uncore,
 		    const bool read,
 		    const bool before)
 {
-	if (likely(!uncore->i915->params.mmio_debug) || !uncore->debug)
-		return;
-
 	/* interrupts are disabled and re-enabled around uncore->lock usage */
 	lockdep_assert_held(&uncore->lock);
 
@@ -2001,13 +1998,16 @@  __gen2_read(64)
 #define GEN6_READ_HEADER(x) \
 	u32 offset = i915_mmio_reg_offset(reg); \
 	unsigned long irqflags; \
+	const bool mmio_debug = likely(uncore->i915->params.mmio_debug) || uncore->debug; \
 	u##x val = 0; \
 	assert_rpm_wakelock_held(uncore->rpm); \
 	spin_lock_irqsave(&uncore->lock, irqflags); \
-	unclaimed_reg_debug(uncore, reg, true, true)
+	if (mmio_debug) \
+		unclaimed_reg_debug(uncore, reg, true, true)
 
 #define GEN6_READ_FOOTER \
-	unclaimed_reg_debug(uncore, reg, true, false); \
+	if (mmio_debug) \
+		unclaimed_reg_debug(uncore, reg, true, false); \
 	spin_unlock_irqrestore(&uncore->lock, irqflags); \
 	trace_i915_reg_rw(false, reg, val, sizeof(val), trace); \
 	return val