
drm/i915: Fix NPD in PMU during driver teardown

Message ID 5535d98d0c1f1fa22e6ca6e8973a05e58a097944.1656622601.git.stuart.summers@intel.com (mailing list archive)
State New, archived
Series drm/i915: Fix NPD in PMU during driver teardown

Commit Message

Summers, Stuart June 30, 2022, 9 p.m. UTC
In the driver teardown, we are unregistering the gt prior
to unregistering the PMU. This means there is a small window
of time in which an application can request metrics from the
PMU, some of which call into the uapi engines list, while
the engines are no longer available. In this case we can
see null pointer dereferences.

Fix this ordering in both the driver load and unload sequences.

Additionally add a check for engine presence to prevent this
NPD in the event this ordering is accidentally reversed. Print
a debug message indicating when they aren't available.

v1: Actually address the driver load/unload ordering issue

Signed-off-by: Stuart Summers <stuart.summers@intel.com>
---
 drivers/gpu/drm/i915/i915_driver.c | 11 ++---
 drivers/gpu/drm/i915/i915_pmu.c    | 72 +++++++++++++++++-------------
 2 files changed, 48 insertions(+), 35 deletions(-)

Comments

Umesh Nerlige Ramappa July 1, 2022, 12:11 a.m. UTC | #1
On Thu, Jun 30, 2022 at 09:00:28PM +0000, Stuart Summers wrote:
>In the driver teardown, we are unregistering the gt prior
>to unregistering the PMU. This means there is a small window
>of time in which the application can request metrics from the
>PMU, some of which are calling into the uapi engines list,
>while the engines are not available. In this case we can
>see null pointer dereferences.
>
>Fix this ordering in both the driver load and unload sequences.
>
>Additionally add a check for engine presence to prevent this
>NPD in the event this ordering is accidentally reversed. Print
>a debug message indicating when they aren't available.
>
>v1: Actually address the driver load/unload ordering issue
>
>Signed-off-by: Stuart Summers <stuart.summers@intel.com>
>---

I thought this is likely happening because intel_gpu_top is running in 
the background when i915 is unloaded. I tried a quick repro; I don't see 
the unload succeed ("fatal module in use", not sure if this was a 
partial unload), but when I try to kill intel_gpu_top, I get an NPD.  
This is in the event disable path - i915_pmu_event_stop -> 
i915_pmu_disable.

It's likely that you are seeing a different path (unload) leading to the 
same issue.

I think i915_pmu_enable/disable should be aware of the event->hw.state 
and/or pmu->closed state before accessing the event. Maybe like,

if (event->hw.state != PERF_HES_STOPPED && is_engine_event(event)) {
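
(Purely to illustrate where that check would sit in i915_pmu_disable(), 
building on the hunk quoted below - a rough, untested sketch, and the 
exact condition is still open:)

if (event->hw.state != PERF_HES_STOPPED && is_engine_event(event)) {
	u8 sample = engine_event_sample(event);
	struct intel_engine_cs *engine;

	engine = intel_engine_lookup_user(i915,
					  engine_event_class(event),
					  engine_event_instance(event));
	/* Skip the bookkeeping entirely if the engine is already gone. */
	if (engine && --engine->pmu.enable_count[sample] == 0)
		engine->pmu.enable &= ~BIT(sample);
}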

@Tvrtko, wondering if this case is tested by igt@perf_pmu@module-unload. 
I am not clear if we should use event->hw.state or pmu->closed here and 
if/how they are related. IMO, for this issue, the engine check is good 
enough too, so we don't really need the pmu state checks.  Thoughts?

Regards,
Umesh

> drivers/gpu/drm/i915/i915_driver.c | 11 ++---
> drivers/gpu/drm/i915/i915_pmu.c    | 72 +++++++++++++++++-------------
> 2 files changed, 48 insertions(+), 35 deletions(-)
>
>diff --git a/drivers/gpu/drm/i915/i915_driver.c b/drivers/gpu/drm/i915/i915_driver.c
>index deb8a8b76965..ee4dcb85d206 100644
>--- a/drivers/gpu/drm/i915/i915_driver.c
>+++ b/drivers/gpu/drm/i915/i915_driver.c
>@@ -717,7 +717,6 @@ static void i915_driver_register(struct drm_i915_private *dev_priv)
> 	struct drm_device *dev = &dev_priv->drm;
>
> 	i915_gem_driver_register(dev_priv);
>-	i915_pmu_register(dev_priv);
>
> 	intel_vgpu_register(dev_priv);
>
>@@ -731,11 +730,12 @@ static void i915_driver_register(struct drm_i915_private *dev_priv)
> 	i915_debugfs_register(dev_priv);
> 	i915_setup_sysfs(dev_priv);
>
>+	intel_gt_driver_register(to_gt(dev_priv));
>+
> 	/* Depends on sysfs having been initialized */
>+	i915_pmu_register(dev_priv);
> 	i915_perf_register(dev_priv);
>
>-	intel_gt_driver_register(to_gt(dev_priv));
>-
> 	intel_display_driver_register(dev_priv);
>
> 	intel_power_domains_enable(dev_priv);
>@@ -762,11 +762,12 @@ static void i915_driver_unregister(struct drm_i915_private *dev_priv)
>
> 	intel_display_driver_unregister(dev_priv);
>
>-	intel_gt_driver_unregister(to_gt(dev_priv));
>-
> 	i915_perf_unregister(dev_priv);
>+	/* GT should be available until PMU is gone */
> 	i915_pmu_unregister(dev_priv);
>
>+	intel_gt_driver_unregister(to_gt(dev_priv));
>+
> 	i915_teardown_sysfs(dev_priv);
> 	drm_dev_unplug(&dev_priv->drm);
>
>diff --git a/drivers/gpu/drm/i915/i915_pmu.c b/drivers/gpu/drm/i915/i915_pmu.c
>index 958b37123bf1..796a1d8e36f2 100644
>--- a/drivers/gpu/drm/i915/i915_pmu.c
>+++ b/drivers/gpu/drm/i915/i915_pmu.c
>@@ -670,21 +670,28 @@ static void i915_pmu_enable(struct perf_event *event)
> 	if (is_engine_event(event)) {
> 		u8 sample = engine_event_sample(event);
> 		struct intel_engine_cs *engine;
>-
>-		engine = intel_engine_lookup_user(i915,
>-						  engine_event_class(event),
>-						  engine_event_instance(event));
>-
>-		BUILD_BUG_ON(ARRAY_SIZE(engine->pmu.enable_count) !=
>-			     I915_ENGINE_SAMPLE_COUNT);
>-		BUILD_BUG_ON(ARRAY_SIZE(engine->pmu.sample) !=
>-			     I915_ENGINE_SAMPLE_COUNT);
>-		GEM_BUG_ON(sample >= ARRAY_SIZE(engine->pmu.enable_count));
>-		GEM_BUG_ON(sample >= ARRAY_SIZE(engine->pmu.sample));
>-		GEM_BUG_ON(engine->pmu.enable_count[sample] == ~0);
>-
>-		engine->pmu.enable |= BIT(sample);
>-		engine->pmu.enable_count[sample]++;
>+		u8 class = engine_event_class(event);
>+		u8 instance = engine_event_instance(event);
>+
>+		engine = intel_engine_lookup_user(i915, class, instance);
>+		if (engine) {
>+			BUILD_BUG_ON(ARRAY_SIZE(engine->pmu.enable_count) !=
>+				     I915_ENGINE_SAMPLE_COUNT);
>+			BUILD_BUG_ON(ARRAY_SIZE(engine->pmu.sample) !=
>+				     I915_ENGINE_SAMPLE_COUNT);
>+			GEM_BUG_ON(sample >=
>+				   ARRAY_SIZE(engine->pmu.enable_count));
>+			GEM_BUG_ON(sample >=
>+				   ARRAY_SIZE(engine->pmu.sample));
>+			GEM_BUG_ON(engine->pmu.enable_count[sample] == ~0);
>+
>+			engine->pmu.enable |= BIT(sample);
>+			engine->pmu.enable_count[sample]++;
>+		} else {
>+			drm_dbg(&i915->drm,
>+				"Invalid engine event: { class:%d, inst:%d }\n",
>+				class, instance);
>+		}
> 	}
>
> 	spin_unlock_irqrestore(&pmu->lock, flags);
>@@ -714,21 +721,26 @@ static void i915_pmu_disable(struct perf_event *event)
> 	if (is_engine_event(event)) {
> 		u8 sample = engine_event_sample(event);
> 		struct intel_engine_cs *engine;
>-
>-		engine = intel_engine_lookup_user(i915,
>-						  engine_event_class(event),
>-						  engine_event_instance(event));
>-
>-		GEM_BUG_ON(sample >= ARRAY_SIZE(engine->pmu.enable_count));
>-		GEM_BUG_ON(sample >= ARRAY_SIZE(engine->pmu.sample));
>-		GEM_BUG_ON(engine->pmu.enable_count[sample] == 0);
>-
>-		/*
>-		 * Decrement the reference count and clear the enabled
>-		 * bitmask when the last listener on an event goes away.
>-		 */
>-		if (--engine->pmu.enable_count[sample] == 0)
>-			engine->pmu.enable &= ~BIT(sample);
>+		u8 class = engine_event_class(event);
>+		u8 instance = engine_event_instance(event);
>+
>+		engine = intel_engine_lookup_user(i915, class, instance);
>+		if (engine) {
>+			GEM_BUG_ON(sample >= ARRAY_SIZE(engine->pmu.enable_count));
>+			GEM_BUG_ON(sample >= ARRAY_SIZE(engine->pmu.sample));
>+			GEM_BUG_ON(engine->pmu.enable_count[sample] == 0);
>+
>+			/*
>+			 * Decrement the reference count and clear the enabled
>+			 * bitmask when the last listener on an event goes away.
>+			 */
>+			if (--engine->pmu.enable_count[sample] == 0)
>+				engine->pmu.enable &= ~BIT(sample);
>+		} else {
>+			drm_dbg(&i915->drm,
>+				"Invalid engine event: { class:%d, inst:%d }\n",
>+				class, instance);
>+		}
> 	}
>
> 	GEM_BUG_ON(bit >= ARRAY_SIZE(pmu->enable_count));
>-- 
>2.25.1
>
Tvrtko Ursulin July 1, 2022, 8:37 a.m. UTC | #2
On 01/07/2022 01:11, Umesh Nerlige Ramappa wrote:
> On Thu, Jun 30, 2022 at 09:00:28PM +0000, Stuart Summers wrote:
>> In the driver teardown, we are unregistering the gt prior
>> to unregistering the PMU. This means there is a small window
>> of time in which the application can request metrics from the
>> PMU, some of which are calling into the uapi engines list,
>> while the engines are not available. In this case we can
>> see null pointer dereferences.
>>
>> Fix this ordering in both the driver load and unload sequences.
>>
>> Additionally add a check for engine presence to prevent this
>> NPD in the event this ordering is accidentally reversed. Print
>> a debug message indicating when they aren't available.
>>
>> v1: Actually address the driver load/unload ordering issue
>>
>> Signed-off-by: Stuart Summers <stuart.summers@intel.com>
>> ---
> 
> I thought this is likely happening because intel_gpu_top is running in 
> the background when i915 is unloaded. I tried a quick repro, I don't see 
> the unload succeed ("fatal module in use", not sure if this was a 
> partial unload), but when I try to kill intel_gpu_top, I get an NPD. 
> This is in the event disable path - i915_pmu_event_stop -> 
> i915_pmu_disable.

So i915 failed to unload (as expected - with perf events open we elevate 
the module ref count via i915_pmu_event_init -> drm_dev_get), then you 
quit intel_gpu_top and get NPD? On the engine lookup? With the 
re-ordered init/fini sequence as from this patch?

With elevated module count there shouldn't be any unloading happening so 
I am intrigued.

> It's likely that you are seeing a different path (unload) leading to the 
> same issue.
> 
> I think in i915_pmu_disable/disable should be aware of event->hw.state 
> and or pmu->closed states before accessing the event. Maybe like,
> 
> if (event->hw.state != PERF_HES_STOPPED && is_engine_event(event)) {
> 
> @Tvrtko, wondering if this case is tested by igt@perf_pmu@module-unload. 

A bit yes. From what Stuart wrote it seems the test would need to be 
extended to cover the case where PMU is getting opened while module 
unload is in progress.

But the NPD you saw is for the moment confusing so I don't know what is 
happening.

> I am not clear if we should use event->hw.state or pmu->closed here and 
> if/how they are related. IMO, for this issue, the engine check is good 
> enough too, so we don't really need the pmu state checks.  Thoughts?

Engine check at the moment feels like papering.

Indeed as you say I think the pmu->closed might be the solution. Perhaps 
the race is as mentioned above. PMU open happening in parallel to unload..

If the sequence of events userspace triggers is:

   i915_pmu_event_init
   i915_pmu_event_start
   i915_pmu_enable
   i915_pmu_event_read

I guess pmu->closed can get set halfway in i915_pmu_event_init. What 
would be the effect of that.. We'd try to get a module reference while 
in the process of unloading. Which is probably very bad.. So possibly a 
final check on pmu->closed is needed there. Ho hum.. can it be made safe 
is the question.
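
(At its simplest that final check could be something like the below, 
placed just before the drm_dev_get() in i915_pmu_event_init - a rough 
sketch only, and whether an unlocked check like this actually closes 
the window is exactly the open question:)

if (pmu->closed)
	return -ENODEV;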

It doesn't explain the NPD on Ctrl-C though.. intel_gpu_top keeps the 
events open all the time. So I think more info needed, for me at least.

Regards,

Tvrtko

> 
> Regards,
> Umesh
> 
>> drivers/gpu/drm/i915/i915_driver.c | 11 ++---
>> drivers/gpu/drm/i915/i915_pmu.c    | 72 +++++++++++++++++-------------
>> 2 files changed, 48 insertions(+), 35 deletions(-)
>>
>> diff --git a/drivers/gpu/drm/i915/i915_driver.c 
>> b/drivers/gpu/drm/i915/i915_driver.c
>> index deb8a8b76965..ee4dcb85d206 100644
>> --- a/drivers/gpu/drm/i915/i915_driver.c
>> +++ b/drivers/gpu/drm/i915/i915_driver.c
>> @@ -717,7 +717,6 @@ static void i915_driver_register(struct 
>> drm_i915_private *dev_priv)
>>     struct drm_device *dev = &dev_priv->drm;
>>
>>     i915_gem_driver_register(dev_priv);
>> -    i915_pmu_register(dev_priv);
>>
>>     intel_vgpu_register(dev_priv);
>>
>> @@ -731,11 +730,12 @@ static void i915_driver_register(struct 
>> drm_i915_private *dev_priv)
>>     i915_debugfs_register(dev_priv);
>>     i915_setup_sysfs(dev_priv);
>>
>> +    intel_gt_driver_register(to_gt(dev_priv));
>> +
>>     /* Depends on sysfs having been initialized */
>> +    i915_pmu_register(dev_priv);
>>     i915_perf_register(dev_priv);
>>
>> -    intel_gt_driver_register(to_gt(dev_priv));
>> -
>>     intel_display_driver_register(dev_priv);
>>
>>     intel_power_domains_enable(dev_priv);
>> @@ -762,11 +762,12 @@ static void i915_driver_unregister(struct 
>> drm_i915_private *dev_priv)
>>
>>     intel_display_driver_unregister(dev_priv);
>>
>> -    intel_gt_driver_unregister(to_gt(dev_priv));
>> -
>>     i915_perf_unregister(dev_priv);
>> +    /* GT should be available until PMU is gone */
>>     i915_pmu_unregister(dev_priv);
>>
>> +    intel_gt_driver_unregister(to_gt(dev_priv));
>> +
>>     i915_teardown_sysfs(dev_priv);
>>     drm_dev_unplug(&dev_priv->drm);
>>
>> diff --git a/drivers/gpu/drm/i915/i915_pmu.c 
>> b/drivers/gpu/drm/i915/i915_pmu.c
>> index 958b37123bf1..796a1d8e36f2 100644
>> --- a/drivers/gpu/drm/i915/i915_pmu.c
>> +++ b/drivers/gpu/drm/i915/i915_pmu.c
>> @@ -670,21 +670,28 @@ static void i915_pmu_enable(struct perf_event 
>> *event)
>>     if (is_engine_event(event)) {
>>         u8 sample = engine_event_sample(event);
>>         struct intel_engine_cs *engine;
>> -
>> -        engine = intel_engine_lookup_user(i915,
>> -                          engine_event_class(event),
>> -                          engine_event_instance(event));
>> -
>> -        BUILD_BUG_ON(ARRAY_SIZE(engine->pmu.enable_count) !=
>> -                 I915_ENGINE_SAMPLE_COUNT);
>> -        BUILD_BUG_ON(ARRAY_SIZE(engine->pmu.sample) !=
>> -                 I915_ENGINE_SAMPLE_COUNT);
>> -        GEM_BUG_ON(sample >= ARRAY_SIZE(engine->pmu.enable_count));
>> -        GEM_BUG_ON(sample >= ARRAY_SIZE(engine->pmu.sample));
>> -        GEM_BUG_ON(engine->pmu.enable_count[sample] == ~0);
>> -
>> -        engine->pmu.enable |= BIT(sample);
>> -        engine->pmu.enable_count[sample]++;
>> +        u8 class = engine_event_class(event);
>> +        u8 instance = engine_event_instance(event);
>> +
>> +        engine = intel_engine_lookup_user(i915, class, instance);
>> +        if (engine) {
>> +            BUILD_BUG_ON(ARRAY_SIZE(engine->pmu.enable_count) !=
>> +                     I915_ENGINE_SAMPLE_COUNT);
>> +            BUILD_BUG_ON(ARRAY_SIZE(engine->pmu.sample) !=
>> +                     I915_ENGINE_SAMPLE_COUNT);
>> +            GEM_BUG_ON(sample >=
>> +                   ARRAY_SIZE(engine->pmu.enable_count));
>> +            GEM_BUG_ON(sample >=
>> +                   ARRAY_SIZE(engine->pmu.sample));
>> +            GEM_BUG_ON(engine->pmu.enable_count[sample] == ~0);
>> +
>> +            engine->pmu.enable |= BIT(sample);
>> +            engine->pmu.enable_count[sample]++;
>> +        } else {
>> +            drm_dbg(&i915->drm,
>> +                "Invalid engine event: { class:%d, inst:%d }\n",
>> +                class, instance);
>> +        }
>>     }
>>
>>     spin_unlock_irqrestore(&pmu->lock, flags);
>> @@ -714,21 +721,26 @@ static void i915_pmu_disable(struct perf_event 
>> *event)
>>     if (is_engine_event(event)) {
>>         u8 sample = engine_event_sample(event);
>>         struct intel_engine_cs *engine;
>> -
>> -        engine = intel_engine_lookup_user(i915,
>> -                          engine_event_class(event),
>> -                          engine_event_instance(event));
>> -
>> -        GEM_BUG_ON(sample >= ARRAY_SIZE(engine->pmu.enable_count));
>> -        GEM_BUG_ON(sample >= ARRAY_SIZE(engine->pmu.sample));
>> -        GEM_BUG_ON(engine->pmu.enable_count[sample] == 0);
>> -
>> -        /*
>> -         * Decrement the reference count and clear the enabled
>> -         * bitmask when the last listener on an event goes away.
>> -         */
>> -        if (--engine->pmu.enable_count[sample] == 0)
>> -            engine->pmu.enable &= ~BIT(sample);
>> +        u8 class = engine_event_class(event);
>> +        u8 instance = engine_event_instance(event);
>> +
>> +        engine = intel_engine_lookup_user(i915, class, instance);
>> +        if (engine) {
>> +            GEM_BUG_ON(sample >= ARRAY_SIZE(engine->pmu.enable_count));
>> +            GEM_BUG_ON(sample >= ARRAY_SIZE(engine->pmu.sample));
>> +            GEM_BUG_ON(engine->pmu.enable_count[sample] == 0);
>> +
>> +            /*
>> +             * Decrement the reference count and clear the enabled
>> +             * bitmask when the last listener on an event goes away.
>> +             */
>> +            if (--engine->pmu.enable_count[sample] == 0)
>> +                engine->pmu.enable &= ~BIT(sample);
>> +        } else {
>> +            drm_dbg(&i915->drm,
>> +                "Invalid engine event: { class:%d, inst:%d }\n",
>> +                class, instance);
>> +        }
>>     }
>>
>>     GEM_BUG_ON(bit >= ARRAY_SIZE(pmu->enable_count));
>> -- 
>> 2.25.1
>>
Summers, Stuart July 1, 2022, 2:54 p.m. UTC | #3
On Fri, 2022-07-01 at 09:37 +0100, Tvrtko Ursulin wrote:
> On 01/07/2022 01:11, Umesh Nerlige Ramappa wrote:
> > On Thu, Jun 30, 2022 at 09:00:28PM +0000, Stuart Summers wrote:
> > > In the driver teardown, we are unregistering the gt prior
> > > to unregistering the PMU. This means there is a small window
> > > of time in which the application can request metrics from the
> > > PMU, some of which are calling into the uapi engines list,
> > > while the engines are not available. In this case we can
> > > see null pointer dereferences.
> > > 
> > > Fix this ordering in both the driver load and unload sequences.
> > > 
> > > Additionally add a check for engine presence to prevent this
> > > NPD in the event this ordering is accidentally reversed. Print
> > > a debug message indicating when they aren't available.
> > > 
> > > v1: Actually address the driver load/unload ordering issue
> > > 
> > > Signed-off-by: Stuart Summers <stuart.summers@intel.com>
> > > ---
> > 
> > I thought this is likely happening because intel_gpu_top is running
> > in 
> > the background when i915 is unloaded. I tried a quick repro, I
> > don't see 
> > the unload succeed ("fatal module in use", not sure if this was a 
> > partial unload), but when I try to kill intel_gpu_top, I get an
> > NPD. 
> > This is in the event disable path - i915_pmu_event_stop -> 
> > i915_pmu_disable.
> 
> So i915 failed to unload (as expected - with perf events open we
> elevate 
> the module ref count via i915_pmu_event_init -> drm_dev_get), then
> you 
> quit intel_gpu_top and get NPD? On the engine lookup? With the 
> re-ordered init/fini sequence as from this patch?
> 
> With elevated module count there shouldn't be any unloading happening
> so 
> I am intrigued.
> 
> > It's likely that you are seeing a different path (unload) leading
> > to the 
> > same issue.
> > 
> > I think in i915_pmu_disable/disable should be aware of event-
> > >hw.state 
> > and or pmu->closed states before accessing the event. Maybe like,
> > 
> > if (event->hw.state != PERF_HES_STOPPED && is_engine_event(event))
> > {
> > 
> > @Tvrtko, wondering if this case is tested by igt@perf
> > _pmu@module-unload. 
> 
> A bit yes. From what Stuart wrote it seems the test would need to be 
> extended to cover the case where PMU is getting opened while module 
> unload is in progress.
> 
> But the NPD you saw is for the moment confusing so I don't know what
> is 
> happening.
> 
> > I am not clear if we should use event->hw.state or pmu->closed here
> > and 
> > if/how they are related. IMO, for this issue, the engine check is
> > good 
> > enough too, so we don't really need the pmu state checks. 
> > Thoughts?
> 
> Engine check at the moment feels like papering.
> 
> Indeed as you say I think the pmu->closed might be the solution.
> Perhaps 
> the race is as mentioned above. PMU open happening in parallel to
> unload..
> 
> If the sequence of events userspace triggers is:
> 
>    i915_pmu_event_init
>    i915_pmu_event_start
>    i915_pmu_enable
>    i915_pmu_event_read
> 
> I guess pmu->closed can get set halfway in i915_pmu_event_init. What 
> would be the effect of that.. We'd try to get a module reference
> while 
> in the process of unloading. Which is probably very bad.. So possibly
> a 
> final check on pmu->close is needed there. Ho hum.. can it be made
> safe 
> is the question.
> 
> It doesn't explain the NPD on Ctrl-C though.. intel_gpu_top keeps
> the 
> evens open all the time. So I think more info needed, for me at
> least.

So one thing here is this doesn't have to do with module unload, but
module unbind specifically (while perf is open). I don't know if the
NPD from Umesh is the same as what we're seeing here. I'd really like
to separate these unless you know for sure that's related. Also it
would be interesting to know if this patch fixes your issue as well.

I still think the re-ordering in i915_driver.c should be enough and we
shouldn't need to check pmu->closed. The unregister should be enough to
ensure the perf tools are notified that new events aren't allowed, and
at that time the engine structures are still intact. And even if for
some reason the perf code still calls into our function pointers, we
have these engine checks as a failsafe.

I'm by the way uploading one more version here with a drm_WARN_ONCE
instead of the debug print.

Thanks,
Stuart

> 
> Regards,
> 
> Tvrtko
> 
> > Regards,
> > Umesh
> > 
> > > drivers/gpu/drm/i915/i915_driver.c | 11 ++---
> > > drivers/gpu/drm/i915/i915_pmu.c    | 72 +++++++++++++++++------
> > > -------
> > > 2 files changed, 48 insertions(+), 35 deletions(-)
> > > 
> > > diff --git a/drivers/gpu/drm/i915/i915_driver.c 
> > > b/drivers/gpu/drm/i915/i915_driver.c
> > > index deb8a8b76965..ee4dcb85d206 100644
> > > --- a/drivers/gpu/drm/i915/i915_driver.c
> > > +++ b/drivers/gpu/drm/i915/i915_driver.c
> > > @@ -717,7 +717,6 @@ static void i915_driver_register(struct 
> > > drm_i915_private *dev_priv)
> > >     struct drm_device *dev = &dev_priv->drm;
> > > 
> > >     i915_gem_driver_register(dev_priv);
> > > -    i915_pmu_register(dev_priv);
> > > 
> > >     intel_vgpu_register(dev_priv);
> > > 
> > > @@ -731,11 +730,12 @@ static void i915_driver_register(struct 
> > > drm_i915_private *dev_priv)
> > >     i915_debugfs_register(dev_priv);
> > >     i915_setup_sysfs(dev_priv);
> > > 
> > > +    intel_gt_driver_register(to_gt(dev_priv));
> > > +
> > >     /* Depends on sysfs having been initialized */
> > > +    i915_pmu_register(dev_priv);
> > >     i915_perf_register(dev_priv);
> > > 
> > > -    intel_gt_driver_register(to_gt(dev_priv));
> > > -
> > >     intel_display_driver_register(dev_priv);
> > > 
> > >     intel_power_domains_enable(dev_priv);
> > > @@ -762,11 +762,12 @@ static void i915_driver_unregister(struct 
> > > drm_i915_private *dev_priv)
> > > 
> > >     intel_display_driver_unregister(dev_priv);
> > > 
> > > -    intel_gt_driver_unregister(to_gt(dev_priv));
> > > -
> > >     i915_perf_unregister(dev_priv);
> > > +    /* GT should be available until PMU is gone */
> > >     i915_pmu_unregister(dev_priv);
> > > 
> > > +    intel_gt_driver_unregister(to_gt(dev_priv));
> > > +
> > >     i915_teardown_sysfs(dev_priv);
> > >     drm_dev_unplug(&dev_priv->drm);
> > > 
> > > diff --git a/drivers/gpu/drm/i915/i915_pmu.c 
> > > b/drivers/gpu/drm/i915/i915_pmu.c
> > > index 958b37123bf1..796a1d8e36f2 100644
> > > --- a/drivers/gpu/drm/i915/i915_pmu.c
> > > +++ b/drivers/gpu/drm/i915/i915_pmu.c
> > > @@ -670,21 +670,28 @@ static void i915_pmu_enable(struct
> > > perf_event 
> > > *event)
> > >     if (is_engine_event(event)) {
> > >         u8 sample = engine_event_sample(event);
> > >         struct intel_engine_cs *engine;
> > > -
> > > -        engine = intel_engine_lookup_user(i915,
> > > -                          engine_event_class(event),
> > > -                          engine_event_instance(event));
> > > -
> > > -        BUILD_BUG_ON(ARRAY_SIZE(engine->pmu.enable_count) !=
> > > -                 I915_ENGINE_SAMPLE_COUNT);
> > > -        BUILD_BUG_ON(ARRAY_SIZE(engine->pmu.sample) !=
> > > -                 I915_ENGINE_SAMPLE_COUNT);
> > > -        GEM_BUG_ON(sample >= ARRAY_SIZE(engine-
> > > >pmu.enable_count));
> > > -        GEM_BUG_ON(sample >= ARRAY_SIZE(engine->pmu.sample));
> > > -        GEM_BUG_ON(engine->pmu.enable_count[sample] == ~0);
> > > -
> > > -        engine->pmu.enable |= BIT(sample);
> > > -        engine->pmu.enable_count[sample]++;
> > > +        u8 class = engine_event_class(event);
> > > +        u8 instance = engine_event_instance(event);
> > > +
> > > +        engine = intel_engine_lookup_user(i915, class,
> > > instance);
> > > +        if (engine) {
> > > +            BUILD_BUG_ON(ARRAY_SIZE(engine->pmu.enable_count) !=
> > > +                     I915_ENGINE_SAMPLE_COUNT);
> > > +            BUILD_BUG_ON(ARRAY_SIZE(engine->pmu.sample) !=
> > > +                     I915_ENGINE_SAMPLE_COUNT);
> > > +            GEM_BUG_ON(sample >=
> > > +                   ARRAY_SIZE(engine->pmu.enable_count));
> > > +            GEM_BUG_ON(sample >=
> > > +                   ARRAY_SIZE(engine->pmu.sample));
> > > +            GEM_BUG_ON(engine->pmu.enable_count[sample] == ~0);
> > > +
> > > +            engine->pmu.enable |= BIT(sample);
> > > +            engine->pmu.enable_count[sample]++;
> > > +        } else {
> > > +            drm_dbg(&i915->drm,
> > > +                "Invalid engine event: { class:%d, inst:%d }\n",
> > > +                class, instance);
> > > +        }
> > >     }
> > > 
> > >     spin_unlock_irqrestore(&pmu->lock, flags);
> > > @@ -714,21 +721,26 @@ static void i915_pmu_disable(struct
> > > perf_event 
> > > *event)
> > >     if (is_engine_event(event)) {
> > >         u8 sample = engine_event_sample(event);
> > >         struct intel_engine_cs *engine;
> > > -
> > > -        engine = intel_engine_lookup_user(i915,
> > > -                          engine_event_class(event),
> > > -                          engine_event_instance(event));
> > > -
> > > -        GEM_BUG_ON(sample >= ARRAY_SIZE(engine-
> > > >pmu.enable_count));
> > > -        GEM_BUG_ON(sample >= ARRAY_SIZE(engine->pmu.sample));
> > > -        GEM_BUG_ON(engine->pmu.enable_count[sample] == 0);
> > > -
> > > -        /*
> > > -         * Decrement the reference count and clear the enabled
> > > -         * bitmask when the last listener on an event goes away.
> > > -         */
> > > -        if (--engine->pmu.enable_count[sample] == 0)
> > > -            engine->pmu.enable &= ~BIT(sample);
> > > +        u8 class = engine_event_class(event);
> > > +        u8 instance = engine_event_instance(event);
> > > +
> > > +        engine = intel_engine_lookup_user(i915, class,
> > > instance);
> > > +        if (engine) {
> > > +            GEM_BUG_ON(sample >= ARRAY_SIZE(engine-
> > > >pmu.enable_count));
> > > +            GEM_BUG_ON(sample >= ARRAY_SIZE(engine-
> > > >pmu.sample));
> > > +            GEM_BUG_ON(engine->pmu.enable_count[sample] == 0);
> > > +
> > > +            /*
> > > +             * Decrement the reference count and clear the
> > > enabled
> > > +             * bitmask when the last listener on an event goes
> > > away.
> > > +             */
> > > +            if (--engine->pmu.enable_count[sample] == 0)
> > > +                engine->pmu.enable &= ~BIT(sample);
> > > +        } else {
> > > +            drm_dbg(&i915->drm,
> > > +                "Invalid engine event: { class:%d, inst:%d }\n",
> > > +                class, instance);
> > > +        }
> > >     }
> > > 
> > >     GEM_BUG_ON(bit >= ARRAY_SIZE(pmu->enable_count));
> > > -- 
> > > 2.25.1
> > >
Umesh Nerlige Ramappa July 1, 2022, 6:09 p.m. UTC | #4
On Fri, Jul 01, 2022 at 09:37:20AM +0100, Tvrtko Ursulin wrote:
>
>On 01/07/2022 01:11, Umesh Nerlige Ramappa wrote:
>>On Thu, Jun 30, 2022 at 09:00:28PM +0000, Stuart Summers wrote:
>>>In the driver teardown, we are unregistering the gt prior
>>>to unregistering the PMU. This means there is a small window
>>>of time in which the application can request metrics from the
>>>PMU, some of which are calling into the uapi engines list,
>>>while the engines are not available. In this case we can
>>>see null pointer dereferences.
>>>
>>>Fix this ordering in both the driver load and unload sequences.
>>>
>>>Additionally add a check for engine presence to prevent this
>>>NPD in the event this ordering is accidentally reversed. Print
>>>a debug message indicating when they aren't available.
>>>
>>>v1: Actually address the driver load/unload ordering issue
>>>
>>>Signed-off-by: Stuart Summers <stuart.summers@intel.com>
>>>---
>>
>>I thought this is likely happening because intel_gpu_top is running 
>>in the background when i915 is unloaded. I tried a quick repro, I 
>>don't see the unload succeed ("fatal module in use", not sure if 
>>this was a partial unload), but when I try to kill intel_gpu_top, I 
>>get an NPD. This is in the event disable path - i915_pmu_event_stop 
>>-> i915_pmu_disable.
>
>So i915 failed to unload (as expected - with perf events open we 
>elevate the module ref count via i915_pmu_event_init -> drm_dev_get), 
>then you quit intel_gpu_top and get NPD?

I was just trying to point out an instance of the failure that I saw 
when running this execution sequence. This is without any of Stuart's 
changes.

> On the engine lookup?

Don't know if it is specifically engine lookup, but it's in the path of 
i915_pmu_event_stop which executes on killing intel_gpu_top.

> With the re-ordered init/fini sequence as from this patch?
No, without any changes from this thread.

>
>With elevated module count there shouldn't be any unloading happening 
>so I am intrigued.
>
>>It's likely that you are seeing a different path (unload) leading to 
>>the same issue.
>>
>>I think in i915_pmu_disable/disable should be aware of 
>>event->hw.state and or pmu->closed states before accessing the 
>>event. Maybe like,
>>
>>if (event->hw.state != PERF_HES_STOPPED && is_engine_event(event)) {
>>
>>@Tvrtko, wondering if this case is tested by 
>>igt@perf_pmu@module-unload.
>
>A bit yes. From what Stuart wrote it seems the test would need to be 
>extended to cover the case where PMU is getting opened while module 
>unload is in progress.
>
>But the NPD you saw is for the moment confusing so I don't know what 
>is happening.

I guess it's a separate issue. I can check if Stuart's patches resolve 
it and if not, will create a bug.

Regards,
Umesh

>
>>I am not clear if we should use event->hw.state or pmu->closed here 
>>and if/how they are related. IMO, for this issue, the engine check 
>>is good enough too, so we don't really need the pmu state checks.  
>>Thoughts?
>
>Engine check at the moment feels like papering.
>
>Indeed as you say I think the pmu->closed might be the solution. 
>Perhaps the race is as mentioned above. PMU open happening in parallel 
>to unload..
>
>If the sequence of events userspace triggers is:
>
>  i915_pmu_event_init
>  i915_pmu_event_start
>  i915_pmu_enable
>  i915_pmu_event_read
>
>I guess pmu->closed can get set halfway in i915_pmu_event_init. What 
>would be the effect of that.. We'd try to get a module reference while 
>in the process of unloading. Which is probably very bad.. So possibly 
>a final check on pmu->close is needed there. Ho hum.. can it be made 
>safe is the question.
>
>It doesn't explain the NPD on Ctrl-C though.. intel_gpu_top keeps the 
>evens open all the time. So I think more info needed, for me at least.
>
>Regards,
>
>Tvrtko
>
>>
>>Regards,
>>Umesh
>>
>>>drivers/gpu/drm/i915/i915_driver.c | 11 ++---
>>>drivers/gpu/drm/i915/i915_pmu.c    | 72 +++++++++++++++++-------------
>>>2 files changed, 48 insertions(+), 35 deletions(-)
>>>
>>>diff --git a/drivers/gpu/drm/i915/i915_driver.c 
>>>b/drivers/gpu/drm/i915/i915_driver.c
>>>index deb8a8b76965..ee4dcb85d206 100644
>>>--- a/drivers/gpu/drm/i915/i915_driver.c
>>>+++ b/drivers/gpu/drm/i915/i915_driver.c
>>>@@ -717,7 +717,6 @@ static void i915_driver_register(struct 
>>>drm_i915_private *dev_priv)
>>>    struct drm_device *dev = &dev_priv->drm;
>>>
>>>    i915_gem_driver_register(dev_priv);
>>>-    i915_pmu_register(dev_priv);
>>>
>>>    intel_vgpu_register(dev_priv);
>>>
>>>@@ -731,11 +730,12 @@ static void i915_driver_register(struct 
>>>drm_i915_private *dev_priv)
>>>    i915_debugfs_register(dev_priv);
>>>    i915_setup_sysfs(dev_priv);
>>>
>>>+    intel_gt_driver_register(to_gt(dev_priv));
>>>+
>>>    /* Depends on sysfs having been initialized */
>>>+    i915_pmu_register(dev_priv);
>>>    i915_perf_register(dev_priv);
>>>
>>>-    intel_gt_driver_register(to_gt(dev_priv));
>>>-
>>>    intel_display_driver_register(dev_priv);
>>>
>>>    intel_power_domains_enable(dev_priv);
>>>@@ -762,11 +762,12 @@ static void i915_driver_unregister(struct 
>>>drm_i915_private *dev_priv)
>>>
>>>    intel_display_driver_unregister(dev_priv);
>>>
>>>-    intel_gt_driver_unregister(to_gt(dev_priv));
>>>-
>>>    i915_perf_unregister(dev_priv);
>>>+    /* GT should be available until PMU is gone */
>>>    i915_pmu_unregister(dev_priv);
>>>
>>>+    intel_gt_driver_unregister(to_gt(dev_priv));
>>>+
>>>    i915_teardown_sysfs(dev_priv);
>>>    drm_dev_unplug(&dev_priv->drm);
>>>
>>>diff --git a/drivers/gpu/drm/i915/i915_pmu.c 
>>>b/drivers/gpu/drm/i915/i915_pmu.c
>>>index 958b37123bf1..796a1d8e36f2 100644
>>>--- a/drivers/gpu/drm/i915/i915_pmu.c
>>>+++ b/drivers/gpu/drm/i915/i915_pmu.c
>>>@@ -670,21 +670,28 @@ static void i915_pmu_enable(struct 
>>>perf_event *event)
>>>    if (is_engine_event(event)) {
>>>        u8 sample = engine_event_sample(event);
>>>        struct intel_engine_cs *engine;
>>>-
>>>-        engine = intel_engine_lookup_user(i915,
>>>-                          engine_event_class(event),
>>>-                          engine_event_instance(event));
>>>-
>>>-        BUILD_BUG_ON(ARRAY_SIZE(engine->pmu.enable_count) !=
>>>-                 I915_ENGINE_SAMPLE_COUNT);
>>>-        BUILD_BUG_ON(ARRAY_SIZE(engine->pmu.sample) !=
>>>-                 I915_ENGINE_SAMPLE_COUNT);
>>>-        GEM_BUG_ON(sample >= ARRAY_SIZE(engine->pmu.enable_count));
>>>-        GEM_BUG_ON(sample >= ARRAY_SIZE(engine->pmu.sample));
>>>-        GEM_BUG_ON(engine->pmu.enable_count[sample] == ~0);
>>>-
>>>-        engine->pmu.enable |= BIT(sample);
>>>-        engine->pmu.enable_count[sample]++;
>>>+        u8 class = engine_event_class(event);
>>>+        u8 instance = engine_event_instance(event);
>>>+
>>>+        engine = intel_engine_lookup_user(i915, class, instance);
>>>+        if (engine) {
>>>+            BUILD_BUG_ON(ARRAY_SIZE(engine->pmu.enable_count) !=
>>>+                     I915_ENGINE_SAMPLE_COUNT);
>>>+            BUILD_BUG_ON(ARRAY_SIZE(engine->pmu.sample) !=
>>>+                     I915_ENGINE_SAMPLE_COUNT);
>>>+            GEM_BUG_ON(sample >=
>>>+                   ARRAY_SIZE(engine->pmu.enable_count));
>>>+            GEM_BUG_ON(sample >=
>>>+                   ARRAY_SIZE(engine->pmu.sample));
>>>+            GEM_BUG_ON(engine->pmu.enable_count[sample] == ~0);
>>>+
>>>+            engine->pmu.enable |= BIT(sample);
>>>+            engine->pmu.enable_count[sample]++;
>>>+        } else {
>>>+            drm_dbg(&i915->drm,
>>>+                "Invalid engine event: { class:%d, inst:%d }\n",
>>>+                class, instance);
>>>+        }
>>>    }
>>>
>>>    spin_unlock_irqrestore(&pmu->lock, flags);
>>>@@ -714,21 +721,26 @@ static void i915_pmu_disable(struct 
>>>perf_event *event)
>>>    if (is_engine_event(event)) {
>>>        u8 sample = engine_event_sample(event);
>>>        struct intel_engine_cs *engine;
>>>-
>>>-        engine = intel_engine_lookup_user(i915,
>>>-                          engine_event_class(event),
>>>-                          engine_event_instance(event));
>>>-
>>>-        GEM_BUG_ON(sample >= ARRAY_SIZE(engine->pmu.enable_count));
>>>-        GEM_BUG_ON(sample >= ARRAY_SIZE(engine->pmu.sample));
>>>-        GEM_BUG_ON(engine->pmu.enable_count[sample] == 0);
>>>-
>>>-        /*
>>>-         * Decrement the reference count and clear the enabled
>>>-         * bitmask when the last listener on an event goes away.
>>>-         */
>>>-        if (--engine->pmu.enable_count[sample] == 0)
>>>-            engine->pmu.enable &= ~BIT(sample);
>>>+        u8 class = engine_event_class(event);
>>>+        u8 instance = engine_event_instance(event);
>>>+
>>>+        engine = intel_engine_lookup_user(i915, class, instance);
>>>+        if (engine) {
>>>+            GEM_BUG_ON(sample >= ARRAY_SIZE(engine->pmu.enable_count));
>>>+            GEM_BUG_ON(sample >= ARRAY_SIZE(engine->pmu.sample));
>>>+            GEM_BUG_ON(engine->pmu.enable_count[sample] == 0);
>>>+
>>>+            /*
>>>+             * Decrement the reference count and clear the enabled
>>>+             * bitmask when the last listener on an event goes away.
>>>+             */
>>>+            if (--engine->pmu.enable_count[sample] == 0)
>>>+                engine->pmu.enable &= ~BIT(sample);
>>>+        } else {
>>>+            drm_dbg(&i915->drm,
>>>+                "Invalid engine event: { class:%d, inst:%d }\n",
>>>+                class, instance);
>>>+        }
>>>    }
>>>
>>>    GEM_BUG_ON(bit >= ARRAY_SIZE(pmu->enable_count));
>>>-- 
>>>2.25.1
>>>
Tvrtko Ursulin July 4, 2022, 8:31 a.m. UTC | #5
On 01/07/2022 15:54, Summers, Stuart wrote:
> On Fri, 2022-07-01 at 09:37 +0100, Tvrtko Ursulin wrote:
>> On 01/07/2022 01:11, Umesh Nerlige Ramappa wrote:
>>> On Thu, Jun 30, 2022 at 09:00:28PM +0000, Stuart Summers wrote:
>>>> In the driver teardown, we are unregistering the gt prior
>>>> to unregistering the PMU. This means there is a small window
>>>> of time in which the application can request metrics from the
>>>> PMU, some of which are calling into the uapi engines list,
>>>> while the engines are not available. In this case we can
>>>> see null pointer dereferences.
>>>>
>>>> Fix this ordering in both the driver load and unload sequences.
>>>>
>>>> Additionally add a check for engine presence to prevent this
>>>> NPD in the event this ordering is accidentally reversed. Print
>>>> a debug message indicating when they aren't available.
>>>>
>>>> v1: Actually address the driver load/unload ordering issue
>>>>
>>>> Signed-off-by: Stuart Summers <stuart.summers@intel.com>
>>>> ---
>>>
>>> I thought this is likely happening because intel_gpu_top is running
>>> in
>>> the background when i915 is unloaded. I tried a quick repro, I
>>> don't see
>>> the unload succeed ("fatal module in use", not sure if this was a
>>> partial unload), but when I try to kill intel_gpu_top, I get an
>>> NPD.
>>> This is in the event disable path - i915_pmu_event_stop ->
>>> i915_pmu_disable.
>>
>> So i915 failed to unload (as expected - with perf events open we
>> elevate
>> the module ref count via i915_pmu_event_init -> drm_dev_get), then
>> you
>> quit intel_gpu_top and get NPD? On the engine lookup? With the
>> re-ordered init/fini sequence as from this patch?
>>
>> With elevated module count there shouldn't be any unloading happening
>> so
>> I am intrigued.
>>
>>> It's likely that you are seeing a different path (unload) leading
>>> to the
>>> same issue.
>>>
>>> I think in i915_pmu_disable/disable should be aware of event-
>>>> hw.state
>>> and or pmu->closed states before accessing the event. Maybe like,
>>>
>>> if (event->hw.state != PERF_HES_STOPPED && is_engine_event(event))
>>> {
>>>
>>> @Tvrtko, wondering if this case is tested by igt@perf
>>> _pmu@module-unload.
>>
>> A bit yes. From what Stuart wrote it seems the test would need to be
>> extended to cover the case where PMU is getting opened while module
>> unload is in progress.
>>
>> But the NPD you saw is for the moment confusing so I don't know what
>> is
>> happening.
>>
>>> I am not clear if we should use event->hw.state or pmu->closed here
>>> and
>>> if/how they are related. IMO, for this issue, the engine check is
>>> good
>>> enough too, so we don't really need the pmu state checks.
>>> Thoughts?
>>
>> Engine check at the moment feels like papering.
>>
>> Indeed as you say I think the pmu->closed might be the solution.
>> Perhaps
>> the race is as mentioned above. PMU open happening in parallel to
>> unload..
>>
>> If the sequence of events userspace triggers is:
>>
>>     i915_pmu_event_init
>>     i915_pmu_event_start
>>     i915_pmu_enable
>>     i915_pmu_event_read
>>
>> I guess pmu->closed can get set halfway in i915_pmu_event_init. What
>> would be the effect of that.. We'd try to get a module reference
>> while
>> in the process of unloading. Which is probably very bad.. So possibly
>> a
>> final check on pmu->close is needed there. Ho hum.. can it be made
>> safe
>> is the question.
>>
>> It doesn't explain the NPD on Ctrl-C though.. intel_gpu_top keeps
>> the
>> evens open all the time. So I think more info needed, for me at
>> least.
> 
> So one thing here is this doesn't have to do with module unload, but
> module unbind specifically (while perf is open). I don't know if the
> NPD from Umesh is the same as what we're seeing here. I'd really like
> to separate these unless you know for sure that's related. Also it
> would be interesting to know if this patch fixes your issue as well.
> 
> I still think the re-ordering in i915_driver.c should be enough and we
> shouldn't need to check pmu->closed. The unregister should be enough to
> ensure the perf tools are notified that new events aren't allowed, and
> at that time the engine structures are still intact. And even if for
> some reason the perf code still calls in to our function pointers, we
> have these engine checks as a failsafe.
> 
> I'm by the way uploading one more version here with a drm_WARN_ONCE
> instead of the debug print.

Problem is I am not a fan of papering so let's get to the bottom of the 
issue first. (In the meantime a simple patch to re-order driver fini is 
okay since that seems obvious enough, I think.)

We need to see call traces from any oopses and try to extend perf_pmu to 
catch them. And we need to understand the problem, if it is a real 
problem, which I laid out last week about race between module unload and 
elevating the module use count from our perf event init.

Without understanding the details of possible failure mode flows we 
don't know how much the papering with engine checks solves and how much 
it leaves broken.

If you guys are too busy to tackle that I'll put it onto myself, but 
help would certainly be appreciated.

Regards,

Tvrtko
Umesh Nerlige Ramappa July 12, 2022, 9:03 p.m. UTC | #6
On Mon, Jul 04, 2022 at 09:31:55AM +0100, Tvrtko Ursulin wrote:
>
>On 01/07/2022 15:54, Summers, Stuart wrote:
>>On Fri, 2022-07-01 at 09:37 +0100, Tvrtko Ursulin wrote:
>>>On 01/07/2022 01:11, Umesh Nerlige Ramappa wrote:
>>>>On Thu, Jun 30, 2022 at 09:00:28PM +0000, Stuart Summers wrote:
>>>>>In the driver teardown, we are unregistering the gt prior
>>>>>to unregistering the PMU. This means there is a small window
>>>>>of time in which the application can request metrics from the
>>>>>PMU, some of which are calling into the uapi engines list,
>>>>>while the engines are not available. In this case we can
>>>>>see null pointer dereferences.
>>>>>
>>>>>Fix this ordering in both the driver load and unload sequences.
>>>>>
>>>>>Additionally add a check for engine presence to prevent this
>>>>>NPD in the event this ordering is accidentally reversed. Print
>>>>>a debug message indicating when they aren't available.
>>>>>
>>>>>v1: Actually address the driver load/unload ordering issue
>>>>>
>>>>>Signed-off-by: Stuart Summers <stuart.summers@intel.com>
>>>>>---
>>>>
>>>>I thought this is likely happening because intel_gpu_top is running
>>>>in
>>>>the background when i915 is unloaded. I tried a quick repro, I
>>>>don't see
>>>>the unload succeed ("fatal module in use", not sure if this was a
>>>>partial unload), but when I try to kill intel_gpu_top, I get an
>>>>NPD.
>>>>This is in the event disable path - i915_pmu_event_stop ->
>>>>i915_pmu_disable.
>>>
>>>So i915 failed to unload (as expected - with perf events open we
>>>elevate
>>>the module ref count via i915_pmu_event_init -> drm_dev_get), then
>>>you
>>>quit intel_gpu_top and get NPD? On the engine lookup? With the
>>>re-ordered init/fini sequence as from this patch?
>>>
>>>With elevated module count there shouldn't be any unloading happening
>>>so
>>>I am intrigued.
>>>
>>>>It's likely that you are seeing a different path (unload) leading
>>>>to the
>>>>same issue.
>>>>
>>>>I think in i915_pmu_disable/disable should be aware of event-
>>>>>hw.state
>>>>and or pmu->closed states before accessing the event. Maybe like,
>>>>
>>>>if (event->hw.state != PERF_HES_STOPPED && is_engine_event(event))
>>>>{
>>>>
>>>>@Tvrtko, wondering if this case is tested by igt@perf
>>>>_pmu@module-unload.
>>>
>>>A bit yes. From what Stuart wrote it seems the test would need to be
>>>extended to cover the case where PMU is getting opened while module
>>>unload is in progress.
>>>
>>>But the NPD you saw is for the moment confusing so I don't know what
>>>is
>>>happening.
>>>
>>>>I am not clear if we should use event->hw.state or pmu->closed here
>>>>and
>>>>if/how they are related. IMO, for this issue, the engine check is
>>>>good
>>>>enough too, so we don't really need the pmu state checks.
>>>>Thoughts?
>>>
>>>Engine check at the moment feels like papering.
>>>
>>>Indeed as you say I think the pmu->closed might be the solution.
>>>Perhaps
>>>the race is as mentioned above. PMU open happening in parallel to
>>>unload..
>>>
>>>If the sequence of events userspace triggers is:
>>>
>>>    i915_pmu_event_init
>>>    i915_pmu_event_start
>>>    i915_pmu_enable
>>>    i915_pmu_event_read
>>>
>>>I guess pmu->closed can get set halfway in i915_pmu_event_init. What
>>>would be the effect of that.. We'd try to get a module reference
>>>while
>>>in the process of unloading. Which is probably very bad.. So possibly
>>>a
>>>final check on pmu->close is needed there. Ho hum.. can it be made
>>>safe
>>>is the question.
>>>
>>>It doesn't explain the NPD on Ctrl-C though.. intel_gpu_top keeps
>>>the
>>>evens open all the time. So I think more info needed, for me at
>>>least.
>>
>>So one thing here is this doesn't have to do with module unload, but
>>module unbind specifically (while perf is open). I don't know if the
>>NPD from Umesh is the same as what we're seeing here. I'd really like
>>to separate these unless you know for sure that's related. Also it
>>would be interesting to know if this patch fixes your issue as well.
>>
>>I still think the re-ordering in i915_driver.c should be enough and we
>>shouldn't need to check pmu->closed. The unregister should be enough to
>>ensure the perf tools are notified that new events aren't allowed, and
>>at that time the engine structures are still intact. And even if for
>>some reason the perf code still calls in to our function pointers, we
>>have these engine checks as a failsafe.
>>
>>I'm by the way uploading one more version here with a drm_WARN_ONCE
>>instead of the debug print.
>
>Problem is I am not a fan of papering so lets get to the bottom of the 
>issue first. (In the meantime simple patch to re-order driver fini is 
>okay since that seems obvious enough, I tnink.)
>
>We need to see call traces from any oopses and try to extend perf_pmu 
>to catch them. And we need to understand the problem, if it is a real 
>problem, which I laid out last week about race between module unload 
>and elevating the module use count from our perf event init.
>
>Without understanding the details of possible failure mode flows we 
>don't know how much the papering with engine checks solves and how 
>much it leaves broken.
>
>If you guys are too busy to tackle that I'll put it onto myself, but 
>help would certainly be appreciated.

Looks like Stuart/Chris are pointing towards the unbind as an issue.

I ran this sequence and only the modprobe showed an error (FATAL: ...  
still in use). What happens with the unbind? Should the PMU also handle 
the unbind somehow?

- run intel_gpu_top
- unbind
- modprobe -r i915
- kill intel_gpu_top.
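
(For completeness, the unbind step here is the PCI sysfs unbind, 
roughly as below - the device address is just an example:)

intel_gpu_top &
echo -n 0000:00:02.0 > /sys/bus/pci/drivers/i915/unbind
modprobe -r i915
kill %1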

Thanks,
Umesh

>
>Regards,
>
>Tvrtko
Tvrtko Ursulin July 19, 2022, 9 a.m. UTC | #7
On 12/07/2022 22:03, Umesh Nerlige Ramappa wrote:
> On Mon, Jul 04, 2022 at 09:31:55AM +0100, Tvrtko Ursulin wrote:
>>
>> On 01/07/2022 15:54, Summers, Stuart wrote:
>>> On Fri, 2022-07-01 at 09:37 +0100, Tvrtko Ursulin wrote:
>>>> On 01/07/2022 01:11, Umesh Nerlige Ramappa wrote:
>>>>> On Thu, Jun 30, 2022 at 09:00:28PM +0000, Stuart Summers wrote:
>>>>>> In the driver teardown, we are unregistering the gt prior
>>>>>> to unregistering the PMU. This means there is a small window
>>>>>> of time in which the application can request metrics from the
>>>>>> PMU, some of which are calling into the uapi engines list,
>>>>>> while the engines are not available. In this case we can
>>>>>> see null pointer dereferences.
>>>>>>
>>>>>> Fix this ordering in both the driver load and unload sequences.
>>>>>>
>>>>>> Additionally add a check for engine presence to prevent this
>>>>>> NPD in the event this ordering is accidentally reversed. Print
>>>>>> a debug message indicating when they aren't available.
>>>>>>
>>>>>> v1: Actually address the driver load/unload ordering issue
>>>>>>
>>>>>> Signed-off-by: Stuart Summers <stuart.summers@intel.com>
>>>>>> ---
>>>>>
>>>>> I thought this is likely happening because intel_gpu_top is running
>>>>> in
>>>>> the background when i915 is unloaded. I tried a quick repro, I
>>>>> don't see
>>>>> the unload succeed ("fatal module in use", not sure if this was a
>>>>> partial unload), but when I try to kill intel_gpu_top, I get an
>>>>> NPD.
>>>>> This is in the event disable path - i915_pmu_event_stop ->
>>>>> i915_pmu_disable.
>>>>
>>>> So i915 failed to unload (as expected - with perf events open we
>>>> elevate
>>>> the module ref count via i915_pmu_event_init -> drm_dev_get), then
>>>> you
>>>> quit intel_gpu_top and get NPD? On the engine lookup? With the
>>>> re-ordered init/fini sequence as from this patch?
>>>>
>>>> With elevated module count there shouldn't be any unloading happening
>>>> so
>>>> I am intrigued.
>>>>
>>>>> It's likely that you are seeing a different path (unload) leading
>>>>> to the
>>>>> same issue.
>>>>>
>>>>> I think in i915_pmu_disable/disable should be aware of event-
>>>>>> hw.state
>>>>> and or pmu->closed states before accessing the event. Maybe like,
>>>>>
>>>>> if (event->hw.state != PERF_HES_STOPPED && is_engine_event(event))
>>>>> {
>>>>>
>>>>> @Tvrtko, wondering if this case is tested by igt@perf
>>>>> _pmu@module-unload.
>>>>
>>>> A bit yes. From what Stuart wrote it seems the test would need to be
>>>> extended to cover the case where PMU is getting opened while module
>>>> unload is in progress.
>>>>
>>>> But the NPD you saw is for the moment confusing so I don't know what
>>>> is
>>>> happening.
>>>>
>>>>> I am not clear if we should use event->hw.state or pmu->closed here
>>>>> and
>>>>> if/how they are related. IMO, for this issue, the engine check is
>>>>> good
>>>>> enough too, so we don't really need the pmu state checks.
>>>>> Thoughts?
>>>>
>>>> Engine check at the moment feels like papering.
>>>>
>>>> Indeed as you say I think the pmu->closed might be the solution.
>>>> Perhaps
>>>> the race is as mentioned above. PMU open happening in parallel to
>>>> unload..
>>>>
>>>> If the sequence of events userspace triggers is:
>>>>
>>>>    i915_pmu_event_init
>>>>    i915_pmu_event_start
>>>>    i915_pmu_enable
>>>>    i915_pmu_event_read
>>>>
>>>> I guess pmu->closed can get set halfway in i915_pmu_event_init. What
>>>> would be the effect of that.. We'd try to get a module reference
>>>> while
>>>> in the process of unloading. Which is probably very bad.. So possibly
>>>> a
>>>> final check on pmu->close is needed there. Ho hum.. can it be made
>>>> safe
>>>> is the question.
>>>>
>>>> It doesn't explain the NPD on Ctrl-C though.. intel_gpu_top keeps
>>>> the
>>>> evens open all the time. So I think more info needed, for me at
>>>> least.
>>>
>>> So one thing here is this doesn't have to do with module unload, but
>>> module unbind specifically (while perf is open). I don't know if the
>>> NPD from Umesh is the same as what we're seeing here. I'd really like
>>> to separate these unless you know for sure that's related. Also it
>>> would be interesting to know if this patch fixes your issue as well.
>>>
>>> I still think the re-ordering in i915_driver.c should be enough and we
>>> shouldn't need to check pmu->closed. The unregister should be enough to
>>> ensure the perf tools are notified that new events aren't allowed, and
>>> at that time the engine structures are still intact. And even if for
>>> some reason the perf code still calls in to our function pointers, we
>>> have these engine checks as a failsafe.
>>>
>>> I'm by the way uploading one more version here with a drm_WARN_ONCE
>>> instead of the debug print.
>>
>> Problem is I am not a fan of papering so lets get to the bottom of the 
>> issue first. (In the meantime simple patch to re-order driver fini is 
>> okay since that seems obvious enough, I tnink.)
>>
>> We need to see call traces from any oopses and try to extend perf_pmu 
>> to catch them. And we need to understand the problem, if it is a real 
>> problem, which I laid out last week about race between module unload 
>> and elevating the module use count from our perf event init.
>>
>> Without understanding the details of possible failure mode flows we 
>> don't know how much the papering with engine checks solves and how 
>> much it leaves broken.
>>
>> If you guys are too busy to tackle that I'll put it onto myself, but 
>> help would certainly be appreciated.
> 
> Looks like Stuart/Chris are pointing towards the unbind as an issue.
> 
> I ran this sequence and only the modprobe showed an error (FATAL: ... 
> still in use). What happens with the unbind. Should pmu also handle the 
> unbind somehow?
> 
> - run intel_gpu_top
> - unbind
> - modprobe -r i915
> - kill intel_gpu_top.

And it crashes or survives in this scenario?

Module still in use here would be expected since intel_gpu_top is 
holding a module reference.

And pmu->closed should be set at the unbind step via i915_pci_remove -> 
i915_driver_unregister -> i915_pmu_unregister.

We also need to try a stress test with two threads:

	Thread A		Thread B
	-----------		-----------
	loop:			loop:
	  open pmu event	  rmmod
	  close pmu event	  insmod

To see if it can hit a problem with drm_dev_get from i915_pmu_event_init 
being called at a bad moment relative to module unload. Maybe I am 
confused but that seems a possibility and a serious problem currently.
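
Roughly, thread A could be a small perf_event_open(2) loop along the 
lines of the sketch below (untested, for illustration only - the i915 
PMU type id is read from sysfs and the event config uses 
I915_PMU_ENGINE_BUSY() from the uapi header), with thread B simply 
looping "modprobe -r i915; modprobe i915" in a shell:

#include <fcntl.h>
#include <linux/perf_event.h>
#include <stdlib.h>
#include <sys/syscall.h>
#include <unistd.h>
#include <drm/i915_drm.h>

/* Read the dynamically assigned perf PMU type id for i915. */
static int i915_pmu_type(void)
{
	char buf[16] = { 0 };
	int fd = open("/sys/bus/event_source/devices/i915/type", O_RDONLY);

	if (fd < 0)
		return -1;
	if (read(fd, buf, sizeof(buf) - 1) < 0)
		buf[0] = '\0';
	close(fd);

	return atoi(buf);
}

int main(void)
{
	int type = i915_pmu_type();

	for (;;) {
		struct perf_event_attr attr = {
			.type = type,
			.size = sizeof(struct perf_event_attr),
			.config = I915_PMU_ENGINE_BUSY(I915_ENGINE_CLASS_RENDER, 0),
		};
		int fd;

		/* i915 PMU events are system-wide: pid == -1, cpu == 0. */
		fd = syscall(__NR_perf_event_open, &attr, -1, 0, -1, 0);
		if (fd >= 0)
			close(fd);
	}

	return 0;
}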

Regards,

Tvrtko
Umesh Nerlige Ramappa July 20, 2022, 12:22 a.m. UTC | #8
On Tue, Jul 19, 2022 at 10:00:01AM +0100, Tvrtko Ursulin wrote:
>
>On 12/07/2022 22:03, Umesh Nerlige Ramappa wrote:
>>On Mon, Jul 04, 2022 at 09:31:55AM +0100, Tvrtko Ursulin wrote:
>>>
>>>On 01/07/2022 15:54, Summers, Stuart wrote:
>>>>On Fri, 2022-07-01 at 09:37 +0100, Tvrtko Ursulin wrote:
>>>>>On 01/07/2022 01:11, Umesh Nerlige Ramappa wrote:
>>>>>>On Thu, Jun 30, 2022 at 09:00:28PM +0000, Stuart Summers wrote:
>>>>>>>In the driver teardown, we are unregistering the gt prior
>>>>>>>to unregistering the PMU. This means there is a small window
>>>>>>>of time in which the application can request metrics from the
>>>>>>>PMU, some of which are calling into the uapi engines list,
>>>>>>>while the engines are not available. In this case we can
>>>>>>>see null pointer dereferences.
>>>>>>>
>>>>>>>Fix this ordering in both the driver load and unload sequences.
>>>>>>>
>>>>>>>Additionally add a check for engine presence to prevent this
>>>>>>>NPD in the event this ordering is accidentally reversed. Print
>>>>>>>a debug message indicating when they aren't available.
>>>>>>>
>>>>>>>v1: Actually address the driver load/unload ordering issue
>>>>>>>
>>>>>>>Signed-off-by: Stuart Summers <stuart.summers@intel.com>
>>>>>>>---
>>>>>>
>>>>>>I thought this is likely happening because intel_gpu_top is running
>>>>>>in
>>>>>>the background when i915 is unloaded. I tried a quick repro, I
>>>>>>don't see
>>>>>>the unload succeed ("fatal module in use", not sure if this was a
>>>>>>partial unload), but when I try to kill intel_gpu_top, I get an
>>>>>>NPD.
>>>>>>This is in the event disable path - i915_pmu_event_stop ->
>>>>>>i915_pmu_disable.
>>>>>
>>>>>So i915 failed to unload (as expected - with perf events open we
>>>>>elevate
>>>>>the module ref count via i915_pmu_event_init -> drm_dev_get), then
>>>>>you
>>>>>quit intel_gpu_top and get NPD? On the engine lookup? With the
>>>>>re-ordered init/fini sequence as from this patch?
>>>>>
>>>>>With elevated module count there shouldn't be any unloading happening
>>>>>so
>>>>>I am intrigued.
>>>>>
>>>>>>It's likely that you are seeing a different path (unload) leading
>>>>>>to the
>>>>>>same issue.
>>>>>>
>>>>>>I think in i915_pmu_disable/disable should be aware of event-
>>>>>>>hw.state
>>>>>>and or pmu->closed states before accessing the event. Maybe like,
>>>>>>
>>>>>>if (event->hw.state != PERF_HES_STOPPED && is_engine_event(event))
>>>>>>{
>>>>>>
>>>>>>@Tvrtko, wondering if this case is tested by igt@perf
>>>>>>_pmu@module-unload.
>>>>>
>>>>>A bit yes. From what Stuart wrote it seems the test would need to be
>>>>>extended to cover the case where PMU is getting opened while module
>>>>>unload is in progress.
>>>>>
>>>>>But the NPD you saw is for the moment confusing so I don't know what
>>>>>is
>>>>>happening.
>>>>>
>>>>>>I am not clear if we should use event->hw.state or pmu->closed here
>>>>>>and
>>>>>>if/how they are related. IMO, for this issue, the engine check is
>>>>>>good
>>>>>>enough too, so we don't really need the pmu state checks.
>>>>>>Thoughts?
>>>>>
>>>>>Engine check at the moment feels like papering.
>>>>>
>>>>>Indeed as you say I think the pmu->closed might be the solution.
>>>>>Perhaps
>>>>>the race is as mentioned above. PMU open happening in parallel to
>>>>>unload..
>>>>>
>>>>>If the sequence of events userspace triggers is:
>>>>>
>>>>>   i915_pmu_event_init
>>>>>   i915_pmu_event_start
>>>>>   i915_pmu_enable
>>>>>   i915_pmu_event_read
>>>>>
>>>>>I guess pmu->closed can get set halfway in i915_pmu_event_init. What
>>>>>would be the effect of that.. We'd try to get a module reference
>>>>>while
>>>>>in the process of unloading. Which is probably very bad.. So possibly
>>>>>a
>>>>>final check on pmu->close is needed there. Ho hum.. can it be made
>>>>>safe
>>>>>is the question.
>>>>>
>>>>>It doesn't explain the NPD on Ctrl-C though.. intel_gpu_top keeps
>>>>>the
>>>>>evens open all the time. So I think more info needed, for me at
>>>>>least.
>>>>
>>>>So one thing here is this doesn't have to do with module unload, but
>>>>module unbind specifically (while perf is open). I don't know if the
>>>>NPD from Umesh is the same as what we're seeing here. I'd really like
>>>>to separate these unless you know for sure that's related. Also it
>>>>would be interesting to know if this patch fixes your issue as well.
>>>>
>>>>I still think the re-ordering in i915_driver.c should be enough and we
>>>>shouldn't need to check pmu->closed. The unregister should be enough to
>>>>ensure the perf tools are notified that new events aren't allowed, and
>>>>at that time the engine structures are still intact. And even if for
>>>>some reason the perf code still calls in to our function pointers, we
>>>>have these engine checks as a failsafe.
>>>>
>>>>I'm by the way uploading one more version here with a drm_WARN_ONCE
>>>>instead of the debug print.
>>>
>>>Problem is I am not a fan of papering so lets get to the bottom of 
>>>the issue first. (In the meantime simple patch to re-order driver 
>>>fini is okay since that seems obvious enough, I tnink.)
>>>
>>>We need to see call traces from any oopses and try to extend 
>>>perf_pmu to catch them. And we need to understand the problem, if 
>>>it is a real problem, which I laid out last week about race 
>>>between module unload and elevating the module use count from our 
>>>perf event init.
>>>
>>>Without understanding the details of possible failure mode flows 
>>>we don't know how much the papering with engine checks solves and 
>>>how much it leaves broken.
>>>
>>>If you guys are too busy to tackle that I'll put it onto myself, 
>>>but help would certainly be appreciated.
>>
>>Looks like Stuart/Chris are pointing towards the unbind as an issue.
>>
>>I ran this sequence and only the modprobe showed an error (FATAL: 
>>... still in use). What happens with the unbind. Should pmu also 
>>handle the unbind somehow?
>>
>>- run intel_gpu_top
>>- unbind
>>- modprobe -r i915
>>- kill intel_gpu_top.
>
>And it crashes or survives in this scenario?

It hangs on ADLP; I haven't been able to get the serial logs.

>
>Module still in use here would be expected since intel_gpu_top is 
>holding a module reference.
>
>And pmu->closed should be set at the unbind step via i915_pci_remove 
>-> i915_driver_unregister -> i915_pmu_unregister.

After unbind, 

kill intel_gpu_top -> i915_pmu_event_del -> i915_pmu_event_stop -> 
i915_pmu_disable -> likely HANGs when dereferencing engine.

Can we short-circuit i915_pmu_disable with

if (pmu->closed)
	return;

since this function is also adjusting pmu->enable_count. Does it matter 
after pmu is closed?
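
In context that would be roughly the below - just a sketch, assuming the same
container_of() lookup the neighbouring helpers already use:

static void i915_pmu_disable(struct perf_event *event)
{
	struct drm_i915_private *i915 =
		container_of(event->pmu, typeof(*i915), pmu.base);
	struct i915_pmu *pmu = &i915->pmu;

	/* PMU marked closed on unbind - engines may already be gone */
	if (pmu->closed)
		return;

	/* ... existing engine and enable_count bookkeeping ... */
}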

Umesh


>
>We also need to try a stress test with two threads:
>
>	Thread A		Thread B
>	-----------		-----------
>	loop:			loop:
>	  open pmu event	  rmmod
>	  close pmu event	  insmod
>
>To see if it can hit a problem with drm_dev_get from 
>i915_pmu_event_init being called at a bad moment relative to module 
>unload. Maybe I am confused but that seems a possibility and a serious 
>problem currently.
>
>Regards,
>
>Tvrtko
Tvrtko Ursulin July 20, 2022, 8:14 a.m. UTC | #9
On 20/07/2022 01:22, Umesh Nerlige Ramappa wrote:
> On Tue, Jul 19, 2022 at 10:00:01AM +0100, Tvrtko Ursulin wrote:
>>
>> On 12/07/2022 22:03, Umesh Nerlige Ramappa wrote:
>>> On Mon, Jul 04, 2022 at 09:31:55AM +0100, Tvrtko Ursulin wrote:
>>>>
>>>> On 01/07/2022 15:54, Summers, Stuart wrote:
>>>>> On Fri, 2022-07-01 at 09:37 +0100, Tvrtko Ursulin wrote:
>>>>>> On 01/07/2022 01:11, Umesh Nerlige Ramappa wrote:
>>>>>>> On Thu, Jun 30, 2022 at 09:00:28PM +0000, Stuart Summers wrote:
>>>>>>>> In the driver teardown, we are unregistering the gt prior
>>>>>>>> to unregistering the PMU. This means there is a small window
>>>>>>>> of time in which the application can request metrics from the
>>>>>>>> PMU, some of which are calling into the uapi engines list,
>>>>>>>> while the engines are not available. In this case we can
>>>>>>>> see null pointer dereferences.
>>>>>>>>
>>>>>>>> Fix this ordering in both the driver load and unload sequences.
>>>>>>>>
>>>>>>>> Additionally add a check for engine presence to prevent this
>>>>>>>> NPD in the event this ordering is accidentally reversed. Print
>>>>>>>> a debug message indicating when they aren't available.
>>>>>>>>
>>>>>>>> v1: Actually address the driver load/unload ordering issue
>>>>>>>>
>>>>>>>> Signed-off-by: Stuart Summers <stuart.summers@intel.com>
>>>>>>>> ---
>>>>>>>
>>>>>>> I thought this is likely happening because intel_gpu_top is running
>>>>>>> in
>>>>>>> the background when i915 is unloaded. I tried a quick repro, I
>>>>>>> don't see
>>>>>>> the unload succeed ("fatal module in use", not sure if this was a
>>>>>>> partial unload), but when I try to kill intel_gpu_top, I get an
>>>>>>> NPD.
>>>>>>> This is in the event disable path - i915_pmu_event_stop ->
>>>>>>> i915_pmu_disable.
>>>>>>
>>>>>> So i915 failed to unload (as expected - with perf events open we
>>>>>> elevate
>>>>>> the module ref count via i915_pmu_event_init -> drm_dev_get), then
>>>>>> you
>>>>>> quit intel_gpu_top and get NPD? On the engine lookup? With the
>>>>>> re-ordered init/fini sequence as from this patch?
>>>>>>
>>>>>> With elevated module count there shouldn't be any unloading happening
>>>>>> so
>>>>>> I am intrigued.
>>>>>>
>>>>>>> It's likely that you are seeing a different path (unload) leading
>>>>>>> to the
>>>>>>> same issue.
>>>>>>>
>>>>>>> I think in i915_pmu_disable/disable should be aware of event-
>>>>>>>> hw.state
>>>>>>> and or pmu->closed states before accessing the event. Maybe like,
>>>>>>>
>>>>>>> if (event->hw.state != PERF_HES_STOPPED && is_engine_event(event))
>>>>>>> {
>>>>>>>
>>>>>>> @Tvrtko, wondering if this case is tested by igt@perf
>>>>>>> _pmu@module-unload.
>>>>>>
>>>>>> A bit yes. From what Stuart wrote it seems the test would need to be
>>>>>> extended to cover the case where PMU is getting opened while module
>>>>>> unload is in progress.
>>>>>>
>>>>>> But the NPD you saw is for the moment confusing so I don't know what
>>>>>> is
>>>>>> happening.
>>>>>>
>>>>>>> I am not clear if we should use event->hw.state or pmu->closed here
>>>>>>> and
>>>>>>> if/how they are related. IMO, for this issue, the engine check is
>>>>>>> good
>>>>>>> enough too, so we don't really need the pmu state checks.
>>>>>>> Thoughts?
>>>>>>
>>>>>> Engine check at the moment feels like papering.
>>>>>>
>>>>>> Indeed as you say I think the pmu->closed might be the solution.
>>>>>> Perhaps
>>>>>> the race is as mentioned above. PMU open happening in parallel to
>>>>>> unload..
>>>>>>
>>>>>> If the sequence of events userspace triggers is:
>>>>>>
>>>>>>    i915_pmu_event_init
>>>>>>    i915_pmu_event_start
>>>>>>    i915_pmu_enable
>>>>>>    i915_pmu_event_read
>>>>>>
>>>>>> I guess pmu->closed can get set halfway in i915_pmu_event_init. What
>>>>>> would be the effect of that.. We'd try to get a module reference
>>>>>> while
>>>>>> in the process of unloading. Which is probably very bad.. So possibly
>>>>>> a
>>>>>> final check on pmu->close is needed there. Ho hum.. can it be made
>>>>>> safe
>>>>>> is the question.
>>>>>>
>>>>>> It doesn't explain the NPD on Ctrl-C though.. intel_gpu_top keeps
>>>>>> the
>>>>>> evens open all the time. So I think more info needed, for me at
>>>>>> least.
>>>>>
>>>>> So one thing here is this doesn't have to do with module unload, but
>>>>> module unbind specifically (while perf is open). I don't know if the
>>>>> NPD from Umesh is the same as what we're seeing here. I'd really like
>>>>> to separate these unless you know for sure that's related. Also it
>>>>> would be interesting to know if this patch fixes your issue as well.
>>>>>
>>>>> I still think the re-ordering in i915_driver.c should be enough and we
>>>>> shouldn't need to check pmu->closed. The unregister should be 
>>>>> enough to
>>>>> ensure the perf tools are notified that new events aren't allowed, and
>>>>> at that time the engine structures are still intact. And even if for
>>>>> some reason the perf code still calls in to our function pointers, we
>>>>> have these engine checks as a failsafe.
>>>>>
>>>>> I'm by the way uploading one more version here with a drm_WARN_ONCE
>>>>> instead of the debug print.
>>>>
>>>> Problem is I am not a fan of papering so lets get to the bottom of 
>>>> the issue first. (In the meantime simple patch to re-order driver 
>>>> fini is okay since that seems obvious enough, I tnink.)
>>>>
>>>> We need to see call traces from any oopses and try to extend 
>>>> perf_pmu to catch them. And we need to understand the problem, if it 
>>>> is a real problem, which I laid out last week about race between 
>>>> module unload and elevating the module use count from our perf event 
>>>> init.
>>>>
>>>> Without understanding the details of possible failure mode flows we 
>>>> don't know how much the papering with engine checks solves and how 
>>>> much it leaves broken.
>>>>
>>>> If you guys are too busy to tackle that I'll put it onto myself, but 
>>>> help would certainly be appreciated.
>>>
>>> Looks like Stuart/Chris are pointing towards the unbind as an issue.
>>>
>>> I ran this sequence and only the modprobe showed an error (FATAL: ... 
>>> still in use). What happens with the unbind. Should pmu also handle 
>>> the unbind somehow?
>>>
>>> - run intel_gpu_top
>>> - unbind
>>> - modprobe -r i915
>>> - kill intel_gpu_top.
>>
>> And it crashes or survives in this scenario?
> 
> hangs on adlp, haven't been able to get the serial logs
> 
>>
>> Module still in use here would be expected since intel_gpu_top is 
>> holding a module reference.
>>
>> And pmu->closed should be set at the unbind step via i915_pci_remove 
>> -> i915_driver_unregister -> i915_pmu_unregister.
> 
> After unbind,
> kill intel_gpu_top -> i915_pmu_event_del -> i915_pmu_event_stop -> 
> i915_pmu_disable -> likely HANGs when dereferencing engine.
> 
> Can we can short circuit i915_pmu_disable with
> if (pmu->closed)
>      return;
> 
> since this function is also adjusting pmu->enable_count. Does it matter 
> after pmu is closed?

Erm yes.. this sounds obvious now, but why did I not put a pmu->closed check in i915_pmu_event_stop, since read and start/init have it!? Was it a simple oversight or something more? I can't remember.

Try like this maybe:

diff --git a/drivers/gpu/drm/i915/i915_pmu.c b/drivers/gpu/drm/i915/i915_pmu.c
index 958b37123bf1..2399adf92cc0 100644
--- a/drivers/gpu/drm/i915/i915_pmu.c
+++ b/drivers/gpu/drm/i915/i915_pmu.c
@@ -760,9 +760,13 @@ static void i915_pmu_event_start(struct perf_event *event, int flags)
  
  static void i915_pmu_event_stop(struct perf_event *event, int flags)
  {
+       if (pmu->closed)
+               goto out;
+
         if (flags & PERF_EF_UPDATE)
                 i915_pmu_event_read(event);
         i915_pmu_disable(event);
+out:
         event->hw.state = PERF_HES_STOPPED;
  }
  

Fixes: b00bccb3f0bb ("drm/i915/pmu: Handle PCI unbind")

Enable count handling in i915_pmu_disable should not matter since i915_pmu_unregister would have already been executed by this point, so all we need to ensure is that pmu->closed is not used after free. And since open events hold the DRM device reference I think that is fine.
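
For completeness, the whole function would then presumably end up looking
something like the below - the hunk above elides how pmu is looked up, so I am
assuming the same container_of() pattern as the other callbacks:

static void i915_pmu_event_stop(struct perf_event *event, int flags)
{
	struct drm_i915_private *i915 =
		container_of(event->pmu, typeof(*i915), pmu.base);
	struct i915_pmu *pmu = &i915->pmu;

	if (pmu->closed)
		goto out;

	if (flags & PERF_EF_UPDATE)
		i915_pmu_event_read(event);
	i915_pmu_disable(event);

out:
	event->hw.state = PERF_HES_STOPPED;
}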

Regards,

Tvrtko

> 
> Umesh
> 
> 
>>
>> We also need to try a stress test with two threads:
>>
>>     Thread A        Thread B
>>     -----------        -----------
>>     loop:            loop:
>>       open pmu event      rmmod
>>       close pmu event      insmod
>>
>> To see if it can hit a problem with drm_dev_get from 
>> i915_pmu_event_init being called at a bad moment relative to module 
>> unload. Maybe I am confused but that seems a possibility and a serious 
>> problem currently.
>>
>> Regards,
>>
>> Tvrtko
Umesh Nerlige Ramappa July 20, 2022, 8:07 p.m. UTC | #10
On Wed, Jul 20, 2022 at 09:14:38AM +0100, Tvrtko Ursulin wrote:
>
>On 20/07/2022 01:22, Umesh Nerlige Ramappa wrote:
>>On Tue, Jul 19, 2022 at 10:00:01AM +0100, Tvrtko Ursulin wrote:
>>>
>>>On 12/07/2022 22:03, Umesh Nerlige Ramappa wrote:
>>>>On Mon, Jul 04, 2022 at 09:31:55AM +0100, Tvrtko Ursulin wrote:
>>>>>
>>>>>On 01/07/2022 15:54, Summers, Stuart wrote:
>>>>>>On Fri, 2022-07-01 at 09:37 +0100, Tvrtko Ursulin wrote:
>>>>>>>On 01/07/2022 01:11, Umesh Nerlige Ramappa wrote:
>>>>>>>>On Thu, Jun 30, 2022 at 09:00:28PM +0000, Stuart Summers wrote:
>>>>>>>>>In the driver teardown, we are unregistering the gt prior
>>>>>>>>>to unregistering the PMU. This means there is a small window
>>>>>>>>>of time in which the application can request metrics from the
>>>>>>>>>PMU, some of which are calling into the uapi engines list,
>>>>>>>>>while the engines are not available. In this case we can
>>>>>>>>>see null pointer dereferences.
>>>>>>>>>
>>>>>>>>>Fix this ordering in both the driver load and unload sequences.
>>>>>>>>>
>>>>>>>>>Additionally add a check for engine presence to prevent this
>>>>>>>>>NPD in the event this ordering is accidentally reversed. Print
>>>>>>>>>a debug message indicating when they aren't available.
>>>>>>>>>
>>>>>>>>>v1: Actually address the driver load/unload ordering issue
>>>>>>>>>
>>>>>>>>>Signed-off-by: Stuart Summers <stuart.summers@intel.com>
>>>>>>>>>---
>>>>>>>>
>>>>>>>>I thought this is likely happening because intel_gpu_top is running
>>>>>>>>in
>>>>>>>>the background when i915 is unloaded. I tried a quick repro, I
>>>>>>>>don't see
>>>>>>>>the unload succeed ("fatal module in use", not sure if this was a
>>>>>>>>partial unload), but when I try to kill intel_gpu_top, I get an
>>>>>>>>NPD.
>>>>>>>>This is in the event disable path - i915_pmu_event_stop ->
>>>>>>>>i915_pmu_disable.
>>>>>>>
>>>>>>>So i915 failed to unload (as expected - with perf events open we
>>>>>>>elevate
>>>>>>>the module ref count via i915_pmu_event_init -> drm_dev_get), then
>>>>>>>you
>>>>>>>quit intel_gpu_top and get NPD? On the engine lookup? With the
>>>>>>>re-ordered init/fini sequence as from this patch?
>>>>>>>
>>>>>>>With elevated module count there shouldn't be any unloading happening
>>>>>>>so
>>>>>>>I am intrigued.
>>>>>>>
>>>>>>>>It's likely that you are seeing a different path (unload) leading
>>>>>>>>to the
>>>>>>>>same issue.
>>>>>>>>
>>>>>>>>I think in i915_pmu_disable/disable should be aware of event-
>>>>>>>>>hw.state
>>>>>>>>and or pmu->closed states before accessing the event. Maybe like,
>>>>>>>>
>>>>>>>>if (event->hw.state != PERF_HES_STOPPED && is_engine_event(event))
>>>>>>>>{
>>>>>>>>
>>>>>>>>@Tvrtko, wondering if this case is tested by igt@perf
>>>>>>>>_pmu@module-unload.
>>>>>>>
>>>>>>>A bit yes. From what Stuart wrote it seems the test would need to be
>>>>>>>extended to cover the case where PMU is getting opened while module
>>>>>>>unload is in progress.
>>>>>>>
>>>>>>>But the NPD you saw is for the moment confusing so I don't know what
>>>>>>>is
>>>>>>>happening.
>>>>>>>
>>>>>>>>I am not clear if we should use event->hw.state or pmu->closed here
>>>>>>>>and
>>>>>>>>if/how they are related. IMO, for this issue, the engine check is
>>>>>>>>good
>>>>>>>>enough too, so we don't really need the pmu state checks.
>>>>>>>>Thoughts?
>>>>>>>
>>>>>>>Engine check at the moment feels like papering.
>>>>>>>
>>>>>>>Indeed as you say I think the pmu->closed might be the solution.
>>>>>>>Perhaps
>>>>>>>the race is as mentioned above. PMU open happening in parallel to
>>>>>>>unload..
>>>>>>>
>>>>>>>If the sequence of events userspace triggers is:
>>>>>>>
>>>>>>>   i915_pmu_event_init
>>>>>>>   i915_pmu_event_start
>>>>>>>   i915_pmu_enable
>>>>>>>   i915_pmu_event_read
>>>>>>>
>>>>>>>I guess pmu->closed can get set halfway in i915_pmu_event_init. What
>>>>>>>would be the effect of that.. We'd try to get a module reference
>>>>>>>while
>>>>>>>in the process of unloading. Which is probably very bad.. So possibly
>>>>>>>a
>>>>>>>final check on pmu->close is needed there. Ho hum.. can it be made
>>>>>>>safe
>>>>>>>is the question.
>>>>>>>
>>>>>>>It doesn't explain the NPD on Ctrl-C though.. intel_gpu_top keeps
>>>>>>>the
>>>>>>>evens open all the time. So I think more info needed, for me at
>>>>>>>least.
>>>>>>
>>>>>>So one thing here is this doesn't have to do with module unload, but
>>>>>>module unbind specifically (while perf is open). I don't know if the
>>>>>>NPD from Umesh is the same as what we're seeing here. I'd really like
>>>>>>to separate these unless you know for sure that's related. Also it
>>>>>>would be interesting to know if this patch fixes your issue as well.
>>>>>>
>>>>>>I still think the re-ordering in i915_driver.c should be enough and we
>>>>>>shouldn't need to check pmu->closed. The unregister should 
>>>>>>be enough to
>>>>>>ensure the perf tools are notified that new events aren't allowed, and
>>>>>>at that time the engine structures are still intact. And even if for
>>>>>>some reason the perf code still calls in to our function pointers, we
>>>>>>have these engine checks as a failsafe.
>>>>>>
>>>>>>I'm by the way uploading one more version here with a drm_WARN_ONCE
>>>>>>instead of the debug print.
>>>>>
>>>>>Problem is I am not a fan of papering so lets get to the 
>>>>>bottom of the issue first. (In the meantime simple patch to 
>>>>>re-order driver fini is okay since that seems obvious enough, 
>>>>>I tnink.)
>>>>>
>>>>>We need to see call traces from any oopses and try to extend 
>>>>>perf_pmu to catch them. And we need to understand the problem, 
>>>>>if it is a real problem, which I laid out last week about race 
>>>>>between module unload and elevating the module use count from 
>>>>>our perf event init.
>>>>>
>>>>>Without understanding the details of possible failure mode 
>>>>>flows we don't know how much the papering with engine checks 
>>>>>solves and how much it leaves broken.
>>>>>
>>>>>If you guys are too busy to tackle that I'll put it onto 
>>>>>myself, but help would certainly be appreciated.
>>>>
>>>>Looks like Stuart/Chris are pointing towards the unbind as an issue.
>>>>
>>>>I ran this sequence and only the modprobe showed an error 
>>>>(FATAL: ... still in use). What happens with the unbind. Should 
>>>>pmu also handle the unbind somehow?
>>>>
>>>>- run intel_gpu_top
>>>>- unbind
>>>>- modprobe -r i915
>>>>- kill intel_gpu_top.
>>>
>>>And it crashes or survives in this scenario?
>>
>>hangs on adlp, haven't been able to get the serial logs
>>
>>>
>>>Module still in use here would be expected since intel_gpu_top is 
>>>holding a module reference.
>>>
>>>And pmu->closed should be set at the unbind step via 
>>>i915_pci_remove -> i915_driver_unregister -> i915_pmu_unregister.
>>
>>After unbind,
>>kill intel_gpu_top -> i915_pmu_event_del -> i915_pmu_event_stop -> 
>>i915_pmu_disable -> likely HANGs when dereferencing engine.
>>
>>Can we can short circuit i915_pmu_disable with
>>if (pmu->closed)
>>     return;
>>
>>since this function is also adjusting pmu->enable_count. Does it 
>>matter after pmu is closed?
>
>Erm yes.. this sounds obvious now but why I did not put a pmu->closed check in i915_pmu_event_stop, since read and start/init have it!? Was it a simple oversight or something more I can't remember.
>
>Try like this maybe:
>
>diff --git a/drivers/gpu/drm/i915/i915_pmu.c b/drivers/gpu/drm/i915/i915_pmu.c
>index 958b37123bf1..2399adf92cc0 100644
>--- a/drivers/gpu/drm/i915/i915_pmu.c
>+++ b/drivers/gpu/drm/i915/i915_pmu.c
>@@ -760,9 +760,13 @@ static void i915_pmu_event_start(struct perf_event *event, int flags)
> static void i915_pmu_event_stop(struct perf_event *event, int flags)
> {
>+       if (pmu->closed)
>+               goto out;
>+
>        if (flags & PERF_EF_UPDATE)
>                i915_pmu_event_read(event);
>        i915_pmu_disable(event);
>+out:
>        event->hw.state = PERF_HES_STOPPED;
> }
>
>Fixes: b00bccb3f0bb ("drm/i915/pmu: Handle PCI unbind")

That works. I don't see a hang with the above sequence on ADLP. Do you
want to post/merge this?

Also, what about Stuart's changes in this series? At a minimum, I would
keep the engine checks in i915_pmu_disable (rev1). I am not sure the
reorder of the pmu/gt registrations is needed though.
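
For reference, the engine check I mean is roughly the below fragment inside
i915_pmu_disable - paraphrased from memory rather than quoted from the series
(rev1 had a debug print, Stuart's v2 a drm_WARN_ONCE), with i915 coming from
the usual container_of() at the top of the function:

	if (is_engine_event(event)) {
		struct intel_engine_cs *engine;

		engine = intel_engine_lookup_user(i915,
						  engine_event_class(event),
						  engine_event_instance(event));
		if (drm_WARN_ONCE(&i915->drm, !engine,
				  "Invalid engine event: { class:%d, inst:%d }\n",
				  engine_event_class(event),
				  engine_event_instance(event)))
			return;

		/* ... per-engine busyness accounting ... */
	}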

Thanks,
Umesh

>
>Enable count handling in i915_pmu_disable should not matter since the i915_pmu_unregister would have already been executed by this point so all we need to ensure is that pmu->closed is not use after free. And since open event hold the DRM device reference I think that is fine.
>
>Regards,
>
>Tvrtko
>
>>
>>Umesh
>>
>>
>>>
>>>We also need to try a stress test with two threads:
>>>
>>>    Thread A        Thread B
>>>    -----------        -----------
>>>    loop:            loop:
>>>      open pmu event      rmmod
>>>      close pmu event      insmod
>>>
>>>To see if it can hit a problem with drm_dev_get from 
>>>i915_pmu_event_init being called at a bad moment relative to 
>>>module unload. Maybe I am confused but that seems a possibility 
>>>and a serious problem currently.
>>>
>>>Regards,
>>>
>>>Tvrtko
Summers, Stuart July 21, 2022, 4:30 a.m. UTC | #11
On Wed, 2022-07-20 at 13:07 -0700, Umesh Nerlige Ramappa wrote:
> On Wed, Jul 20, 2022 at 09:14:38AM +0100, Tvrtko Ursulin wrote:
> > On 20/07/2022 01:22, Umesh Nerlige Ramappa wrote:
> > > On Tue, Jul 19, 2022 at 10:00:01AM +0100, Tvrtko Ursulin wrote:
> > > > On 12/07/2022 22:03, Umesh Nerlige Ramappa wrote:
> > > > > On Mon, Jul 04, 2022 at 09:31:55AM +0100, Tvrtko Ursulin
> > > > > wrote:
> > > > > > On 01/07/2022 15:54, Summers, Stuart wrote:
> > > > > > > On Fri, 2022-07-01 at 09:37 +0100, Tvrtko Ursulin wrote:
> > > > > > > > On 01/07/2022 01:11, Umesh Nerlige Ramappa wrote:
> > > > > > > > > On Thu, Jun 30, 2022 at 09:00:28PM +0000, Stuart
> > > > > > > > > Summers wrote:
> > > > > > > > > > In the driver teardown, we are unregistering the gt
> > > > > > > > > > prior
> > > > > > > > > > to unregistering the PMU. This means there is a
> > > > > > > > > > small window
> > > > > > > > > > of time in which the application can request
> > > > > > > > > > metrics from the
> > > > > > > > > > PMU, some of which are calling into the uapi
> > > > > > > > > > engines list,
> > > > > > > > > > while the engines are not available. In this case
> > > > > > > > > > we can
> > > > > > > > > > see null pointer dereferences.
> > > > > > > > > > 
> > > > > > > > > > Fix this ordering in both the driver load and
> > > > > > > > > > unload sequences.
> > > > > > > > > > 
> > > > > > > > > > Additionally add a check for engine presence to
> > > > > > > > > > prevent this
> > > > > > > > > > NPD in the event this ordering is accidentally
> > > > > > > > > > reversed. Print
> > > > > > > > > > a debug message indicating when they aren't
> > > > > > > > > > available.
> > > > > > > > > > 
> > > > > > > > > > v1: Actually address the driver load/unload
> > > > > > > > > > ordering issue
> > > > > > > > > > 
> > > > > > > > > > Signed-off-by: Stuart Summers <
> > > > > > > > > > stuart.summers@intel.com>
> > > > > > > > > > ---
> > > > > > > > > 
> > > > > > > > > I thought this is likely happening because
> > > > > > > > > intel_gpu_top is running
> > > > > > > > > in
> > > > > > > > > the background when i915 is unloaded. I tried a quick
> > > > > > > > > repro, I
> > > > > > > > > don't see
> > > > > > > > > the unload succeed ("fatal module in use", not sure
> > > > > > > > > if this was a
> > > > > > > > > partial unload), but when I try to kill
> > > > > > > > > intel_gpu_top, I get an
> > > > > > > > > NPD.
> > > > > > > > > This is in the event disable path -
> > > > > > > > > i915_pmu_event_stop ->
> > > > > > > > > i915_pmu_disable.
> > > > > > > > 
> > > > > > > > So i915 failed to unload (as expected - with perf
> > > > > > > > events open we
> > > > > > > > elevate
> > > > > > > > the module ref count via i915_pmu_event_init ->
> > > > > > > > drm_dev_get), then
> > > > > > > > you
> > > > > > > > quit intel_gpu_top and get NPD? On the engine lookup?
> > > > > > > > With the
> > > > > > > > re-ordered init/fini sequence as from this patch?
> > > > > > > > 
> > > > > > > > With elevated module count there shouldn't be any
> > > > > > > > unloading happening
> > > > > > > > so
> > > > > > > > I am intrigued.
> > > > > > > > 
> > > > > > > > > It's likely that you are seeing a different path
> > > > > > > > > (unload) leading
> > > > > > > > > to the
> > > > > > > > > same issue.
> > > > > > > > > 
> > > > > > > > > I think in i915_pmu_disable/disable should be aware
> > > > > > > > > of event-
> > > > > > > > > > hw.state
> > > > > > > > > and or pmu->closed states before accessing the event.
> > > > > > > > > Maybe like,
> > > > > > > > > 
> > > > > > > > > if (event->hw.state != PERF_HES_STOPPED &&
> > > > > > > > > is_engine_event(event))
> > > > > > > > > {
> > > > > > > > > 
> > > > > > > > > @Tvrtko, wondering if this case is tested by igt@perf
> > > > > > > > > _pmu@module-unload.
> > > > > > > > 
> > > > > > > > A bit yes. From what Stuart wrote it seems the test
> > > > > > > > would need to be
> > > > > > > > extended to cover the case where PMU is getting opened
> > > > > > > > while module
> > > > > > > > unload is in progress.
> > > > > > > > 
> > > > > > > > But the NPD you saw is for the moment confusing so I
> > > > > > > > don't know what
> > > > > > > > is
> > > > > > > > happening.
> > > > > > > > 
> > > > > > > > > I am not clear if we should use event->hw.state or
> > > > > > > > > pmu->closed here
> > > > > > > > > and
> > > > > > > > > if/how they are related. IMO, for this issue, the
> > > > > > > > > engine check is
> > > > > > > > > good
> > > > > > > > > enough too, so we don't really need the pmu state
> > > > > > > > > checks.
> > > > > > > > > Thoughts?
> > > > > > > > 
> > > > > > > > Engine check at the moment feels like papering.
> > > > > > > > 
> > > > > > > > Indeed as you say I think the pmu->closed might be the
> > > > > > > > solution.
> > > > > > > > Perhaps
> > > > > > > > the race is as mentioned above. PMU open happening in
> > > > > > > > parallel to
> > > > > > > > unload..
> > > > > > > > 
> > > > > > > > If the sequence of events userspace triggers is:
> > > > > > > > 
> > > > > > > >    i915_pmu_event_init
> > > > > > > >    i915_pmu_event_start
> > > > > > > >    i915_pmu_enable
> > > > > > > >    i915_pmu_event_read
> > > > > > > > 
> > > > > > > > I guess pmu->closed can get set halfway in
> > > > > > > > i915_pmu_event_init. What
> > > > > > > > would be the effect of that.. We'd try to get a module
> > > > > > > > reference
> > > > > > > > while
> > > > > > > > in the process of unloading. Which is probably very
> > > > > > > > bad.. So possibly
> > > > > > > > a
> > > > > > > > final check on pmu->close is needed there. Ho hum.. can
> > > > > > > > it be made
> > > > > > > > safe
> > > > > > > > is the question.
> > > > > > > > 
> > > > > > > > It doesn't explain the NPD on Ctrl-C though..
> > > > > > > > intel_gpu_top keeps
> > > > > > > > the
> > > > > > > > evens open all the time. So I think more info needed,
> > > > > > > > for me at
> > > > > > > > least.
> > > > > > > 
> > > > > > > So one thing here is this doesn't have to do with module
> > > > > > > unload, but
> > > > > > > module unbind specifically (while perf is open). I don't
> > > > > > > know if the
> > > > > > > NPD from Umesh is the same as what we're seeing here. I'd
> > > > > > > really like
> > > > > > > to separate these unless you know for sure that's
> > > > > > > related. Also it
> > > > > > > would be interesting to know if this patch fixes your
> > > > > > > issue as well.
> > > > > > > 
> > > > > > > I still think the re-ordering in i915_driver.c should be
> > > > > > > enough and we
> > > > > > > shouldn't need to check pmu->closed. The unregister
> > > > > > > should 
> > > > > > > be enough to
> > > > > > > ensure the perf tools are notified that new events aren't
> > > > > > > allowed, and
> > > > > > > at that time the engine structures are still intact. And
> > > > > > > even if for
> > > > > > > some reason the perf code still calls in to our function
> > > > > > > pointers, we
> > > > > > > have these engine checks as a failsafe.
> > > > > > > 
> > > > > > > I'm by the way uploading one more version here with a
> > > > > > > drm_WARN_ONCE
> > > > > > > instead of the debug print.
> > > > > > 
> > > > > > Problem is I am not a fan of papering so lets get to the 
> > > > > > bottom of the issue first. (In the meantime simple patch
> > > > > > to 
> > > > > > re-order driver fini is okay since that seems obvious
> > > > > > enough, 
> > > > > > I tnink.)
> > > > > > 
> > > > > > We need to see call traces from any oopses and try to
> > > > > > extend 
> > > > > > perf_pmu to catch them. And we need to understand the
> > > > > > problem, 
> > > > > > if it is a real problem, which I laid out last week about
> > > > > > race 
> > > > > > between module unload and elevating the module use count
> > > > > > from 
> > > > > > our perf event init.
> > > > > > 
> > > > > > Without understanding the details of possible failure mode 
> > > > > > flows we don't know how much the papering with engine
> > > > > > checks 
> > > > > > solves and how much it leaves broken.
> > > > > > 
> > > > > > If you guys are too busy to tackle that I'll put it onto 
> > > > > > myself, but help would certainly be appreciated.
> > > > > 
> > > > > Looks like Stuart/Chris are pointing towards the unbind as an
> > > > > issue.
> > > > > 
> > > > > I ran this sequence and only the modprobe showed an error 
> > > > > (FATAL: ... still in use). What happens with the unbind.
> > > > > Should 
> > > > > pmu also handle the unbind somehow?
> > > > > 
> > > > > - run intel_gpu_top
> > > > > - unbind
> > > > > - modprobe -r i915
> > > > > - kill intel_gpu_top.
> > > > 
> > > > And it crashes or survives in this scenario?
> > > 
> > > hangs on adlp, haven't been able to get the serial logs
> > > 
> > > > Module still in use here would be expected since intel_gpu_top
> > > > is 
> > > > holding a module reference.
> > > > 
> > > > And pmu->closed should be set at the unbind step via 
> > > > i915_pci_remove -> i915_driver_unregister ->
> > > > i915_pmu_unregister.
> > > 
> > > After unbind,
> > > kill intel_gpu_top -> i915_pmu_event_del -> i915_pmu_event_stop
> > > -> 
> > > i915_pmu_disable -> likely HANGs when dereferencing engine.
> > > 
> > > Can we can short circuit i915_pmu_disable with
> > > if (pmu->closed)
> > >     return;
> > > 
> > > since this function is also adjusting pmu->enable_count. Does it 
> > > matter after pmu is closed?
> > 
> > Erm yes.. this sounds obvious now but why I did not put a pmu-
> > >closed check in i915_pmu_event_stop, since read and start/init
> > have it!? Was it a simple oversight or something more I can't
> > remember.
> > 
> > Try like this maybe:
> > 
> > diff --git a/drivers/gpu/drm/i915/i915_pmu.c
> > b/drivers/gpu/drm/i915/i915_pmu.c
> > index 958b37123bf1..2399adf92cc0 100644
> > --- a/drivers/gpu/drm/i915/i915_pmu.c
> > +++ b/drivers/gpu/drm/i915/i915_pmu.c
> > @@ -760,9 +760,13 @@ static void i915_pmu_event_start(struct
> > perf_event *event, int flags)
> > static void i915_pmu_event_stop(struct perf_event *event, int
> > flags)
> > {
> > +       if (pmu->closed)
> > +               goto out;
> > +
> >        if (flags & PERF_EF_UPDATE)
> >                i915_pmu_event_read(event);
> >        i915_pmu_disable(event);
> > +out:
> >        event->hw.state = PERF_HES_STOPPED;
> > }
> > 
> > Fixes: b00bccb3f0bb ("drm/i915/pmu: Handle PCI unbind")
> 
> that works. I don't see a hang with the above sequence on ADLP. Do
> you 
> want to post/merge this?
> 
> Also what about Stuart's changes in this series. At a minimum, I
> would 
> keep the engine checks in i915_pmu_disable (rev1)? I am not sure the 
> reorder of pmu/gt registrations is needed though.

Thanks for the help here Tvrtko/Umesh! Sorry for the late reply -
I've been swamped and haven't been able to get back to this.

IMO we really should have all three of these, possibly in three
separate patches. I'm happy to post any or all of them, or one of you
can - happy to review. It will be some time tomorrow at the earliest though.

Thanks,
Stuart

> 
> Thanks,
> Umesh
> 
> > Enable count handling in i915_pmu_disable should not matter since
> > the i915_pmu_unregister would have already been executed by this
> > point so all we need to ensure is that pmu->closed is not use after
> > free. And since open event hold the DRM device reference I think
> > that is fine.
> > 
> > Regards,
> > 
> > Tvrtko
> > 
> > > Umesh
> > > 
> > > 
> > > > We also need to try a stress test with two threads:
> > > > 
> > > >     Thread A        Thread B
> > > >     -----------        -----------
> > > >     loop:            loop:
> > > >       open pmu event      rmmod
> > > >       close pmu event      insmod
> > > > 
> > > > To see if it can hit a problem with drm_dev_get from 
> > > > i915_pmu_event_init being called at a bad moment relative to 
> > > > module unload. Maybe I am confused but that seems a
> > > > possibility 
> > > > and a serious problem currently.
> > > > 
> > > > Regards,
> > > > 
> > > > Tvrtko
Tvrtko Ursulin July 21, 2022, 7:43 a.m. UTC | #12
On 21/07/2022 05:30, Summers, Stuart wrote:
> On Wed, 2022-07-20 at 13:07 -0700, Umesh Nerlige Ramappa wrote:
>> On Wed, Jul 20, 2022 at 09:14:38AM +0100, Tvrtko Ursulin wrote:
>>> On 20/07/2022 01:22, Umesh Nerlige Ramappa wrote:
>>>> On Tue, Jul 19, 2022 at 10:00:01AM +0100, Tvrtko Ursulin wrote:
>>>>> On 12/07/2022 22:03, Umesh Nerlige Ramappa wrote:
>>>>>> On Mon, Jul 04, 2022 at 09:31:55AM +0100, Tvrtko Ursulin
>>>>>> wrote:
>>>>>>> On 01/07/2022 15:54, Summers, Stuart wrote:
>>>>>>>> On Fri, 2022-07-01 at 09:37 +0100, Tvrtko Ursulin wrote:
>>>>>>>>> On 01/07/2022 01:11, Umesh Nerlige Ramappa wrote:
>>>>>>>>>> On Thu, Jun 30, 2022 at 09:00:28PM +0000, Stuart
>>>>>>>>>> Summers wrote:
>>>>>>>>>>> In the driver teardown, we are unregistering the gt
>>>>>>>>>>> prior
>>>>>>>>>>> to unregistering the PMU. This means there is a
>>>>>>>>>>> small window
>>>>>>>>>>> of time in which the application can request
>>>>>>>>>>> metrics from the
>>>>>>>>>>> PMU, some of which are calling into the uapi
>>>>>>>>>>> engines list,
>>>>>>>>>>> while the engines are not available. In this case
>>>>>>>>>>> we can
>>>>>>>>>>> see null pointer dereferences.
>>>>>>>>>>>
>>>>>>>>>>> Fix this ordering in both the driver load and
>>>>>>>>>>> unload sequences.
>>>>>>>>>>>
>>>>>>>>>>> Additionally add a check for engine presence to
>>>>>>>>>>> prevent this
>>>>>>>>>>> NPD in the event this ordering is accidentally
>>>>>>>>>>> reversed. Print
>>>>>>>>>>> a debug message indicating when they aren't
>>>>>>>>>>> available.
>>>>>>>>>>>
>>>>>>>>>>> v1: Actually address the driver load/unload
>>>>>>>>>>> ordering issue
>>>>>>>>>>>
>>>>>>>>>>> Signed-off-by: Stuart Summers <
>>>>>>>>>>> stuart.summers@intel.com>
>>>>>>>>>>> ---
>>>>>>>>>>
>>>>>>>>>> I thought this is likely happening because
>>>>>>>>>> intel_gpu_top is running
>>>>>>>>>> in
>>>>>>>>>> the background when i915 is unloaded. I tried a quick
>>>>>>>>>> repro, I
>>>>>>>>>> don't see
>>>>>>>>>> the unload succeed ("fatal module in use", not sure
>>>>>>>>>> if this was a
>>>>>>>>>> partial unload), but when I try to kill
>>>>>>>>>> intel_gpu_top, I get an
>>>>>>>>>> NPD.
>>>>>>>>>> This is in the event disable path -
>>>>>>>>>> i915_pmu_event_stop ->
>>>>>>>>>> i915_pmu_disable.
>>>>>>>>>
>>>>>>>>> So i915 failed to unload (as expected - with perf
>>>>>>>>> events open we
>>>>>>>>> elevate
>>>>>>>>> the module ref count via i915_pmu_event_init ->
>>>>>>>>> drm_dev_get), then
>>>>>>>>> you
>>>>>>>>> quit intel_gpu_top and get NPD? On the engine lookup?
>>>>>>>>> With the
>>>>>>>>> re-ordered init/fini sequence as from this patch?
>>>>>>>>>
>>>>>>>>> With elevated module count there shouldn't be any
>>>>>>>>> unloading happening
>>>>>>>>> so
>>>>>>>>> I am intrigued.
>>>>>>>>>
>>>>>>>>>> It's likely that you are seeing a different path
>>>>>>>>>> (unload) leading
>>>>>>>>>> to the
>>>>>>>>>> same issue.
>>>>>>>>>>
>>>>>>>>>> I think in i915_pmu_disable/disable should be aware
>>>>>>>>>> of event-
>>>>>>>>>>> hw.state
>>>>>>>>>> and or pmu->closed states before accessing the event.
>>>>>>>>>> Maybe like,
>>>>>>>>>>
>>>>>>>>>> if (event->hw.state != PERF_HES_STOPPED &&
>>>>>>>>>> is_engine_event(event))
>>>>>>>>>> {
>>>>>>>>>>
>>>>>>>>>> @Tvrtko, wondering if this case is tested by igt@perf
>>>>>>>>>> _pmu@module-unload.
>>>>>>>>>
>>>>>>>>> A bit yes. From what Stuart wrote it seems the test
>>>>>>>>> would need to be
>>>>>>>>> extended to cover the case where PMU is getting opened
>>>>>>>>> while module
>>>>>>>>> unload is in progress.
>>>>>>>>>
>>>>>>>>> But the NPD you saw is for the moment confusing so I
>>>>>>>>> don't know what
>>>>>>>>> is
>>>>>>>>> happening.
>>>>>>>>>
>>>>>>>>>> I am not clear if we should use event->hw.state or
>>>>>>>>>> pmu->closed here
>>>>>>>>>> and
>>>>>>>>>> if/how they are related. IMO, for this issue, the
>>>>>>>>>> engine check is
>>>>>>>>>> good
>>>>>>>>>> enough too, so we don't really need the pmu state
>>>>>>>>>> checks.
>>>>>>>>>> Thoughts?
>>>>>>>>>
>>>>>>>>> Engine check at the moment feels like papering.
>>>>>>>>>
>>>>>>>>> Indeed as you say I think the pmu->closed might be the
>>>>>>>>> solution.
>>>>>>>>> Perhaps
>>>>>>>>> the race is as mentioned above. PMU open happening in
>>>>>>>>> parallel to
>>>>>>>>> unload..
>>>>>>>>>
>>>>>>>>> If the sequence of events userspace triggers is:
>>>>>>>>>
>>>>>>>>>     i915_pmu_event_init
>>>>>>>>>     i915_pmu_event_start
>>>>>>>>>     i915_pmu_enable
>>>>>>>>>     i915_pmu_event_read
>>>>>>>>>
>>>>>>>>> I guess pmu->closed can get set halfway in
>>>>>>>>> i915_pmu_event_init. What
>>>>>>>>> would be the effect of that.. We'd try to get a module
>>>>>>>>> reference
>>>>>>>>> while
>>>>>>>>> in the process of unloading. Which is probably very
>>>>>>>>> bad.. So possibly
>>>>>>>>> a
>>>>>>>>> final check on pmu->close is needed there. Ho hum.. can
>>>>>>>>> it be made
>>>>>>>>> safe
>>>>>>>>> is the question.
>>>>>>>>>
>>>>>>>>> It doesn't explain the NPD on Ctrl-C though..
>>>>>>>>> intel_gpu_top keeps
>>>>>>>>> the
>>>>>>>>> evens open all the time. So I think more info needed,
>>>>>>>>> for me at
>>>>>>>>> least.
>>>>>>>>
>>>>>>>> So one thing here is this doesn't have to do with module
>>>>>>>> unload, but
>>>>>>>> module unbind specifically (while perf is open). I don't
>>>>>>>> know if the
>>>>>>>> NPD from Umesh is the same as what we're seeing here. I'd
>>>>>>>> really like
>>>>>>>> to separate these unless you know for sure that's
>>>>>>>> related. Also it
>>>>>>>> would be interesting to know if this patch fixes your
>>>>>>>> issue as well.
>>>>>>>>
>>>>>>>> I still think the re-ordering in i915_driver.c should be
>>>>>>>> enough and we
>>>>>>>> shouldn't need to check pmu->closed. The unregister
>>>>>>>> should
>>>>>>>> be enough to
>>>>>>>> ensure the perf tools are notified that new events aren't
>>>>>>>> allowed, and
>>>>>>>> at that time the engine structures are still intact. And
>>>>>>>> even if for
>>>>>>>> some reason the perf code still calls in to our function
>>>>>>>> pointers, we
>>>>>>>> have these engine checks as a failsafe.
>>>>>>>>
>>>>>>>> I'm by the way uploading one more version here with a
>>>>>>>> drm_WARN_ONCE
>>>>>>>> instead of the debug print.
>>>>>>>
>>>>>>> Problem is I am not a fan of papering so lets get to the
>>>>>>> bottom of the issue first. (In the meantime simple patch
>>>>>>> to
>>>>>>> re-order driver fini is okay since that seems obvious
>>>>>>> enough,
>>>>>>> I tnink.)
>>>>>>>
>>>>>>> We need to see call traces from any oopses and try to
>>>>>>> extend
>>>>>>> perf_pmu to catch them. And we need to understand the
>>>>>>> problem,
>>>>>>> if it is a real problem, which I laid out last week about
>>>>>>> race
>>>>>>> between module unload and elevating the module use count
>>>>>>> from
>>>>>>> our perf event init.
>>>>>>>
>>>>>>> Without understanding the details of possible failure mode
>>>>>>> flows we don't know how much the papering with engine
>>>>>>> checks
>>>>>>> solves and how much it leaves broken.
>>>>>>>
>>>>>>> If you guys are too busy to tackle that I'll put it onto
>>>>>>> myself, but help would certainly be appreciated.
>>>>>>
>>>>>> Looks like Stuart/Chris are pointing towards the unbind as an
>>>>>> issue.
>>>>>>
>>>>>> I ran this sequence and only the modprobe showed an error
>>>>>> (FATAL: ... still in use). What happens with the unbind.
>>>>>> Should
>>>>>> pmu also handle the unbind somehow?
>>>>>>
>>>>>> - run intel_gpu_top
>>>>>> - unbind
>>>>>> - modprobe -r i915
>>>>>> - kill intel_gpu_top.
>>>>>
>>>>> And it crashes or survives in this scenario?
>>>>
>>>> hangs on adlp, haven't been able to get the serial logs
>>>>
>>>>> Module still in use here would be expected since intel_gpu_top
>>>>> is
>>>>> holding a module reference.
>>>>>
>>>>> And pmu->closed should be set at the unbind step via
>>>>> i915_pci_remove -> i915_driver_unregister ->
>>>>> i915_pmu_unregister.
>>>>
>>>> After unbind,
>>>> kill intel_gpu_top -> i915_pmu_event_del -> i915_pmu_event_stop
>>>> ->
>>>> i915_pmu_disable -> likely HANGs when dereferencing engine.
>>>>
>>>> Can we can short circuit i915_pmu_disable with
>>>> if (pmu->closed)
>>>>      return;
>>>>
>>>> since this function is also adjusting pmu->enable_count. Does it
>>>> matter after pmu is closed?
>>>
>>> Erm yes.. this sounds obvious now but why I did not put a pmu-
>>>> closed check in i915_pmu_event_stop, since read and start/init
>>> have it!? Was it a simple oversight or something more I can't
>>> remember.
>>>
>>> Try like this maybe:
>>>
>>> diff --git a/drivers/gpu/drm/i915/i915_pmu.c
>>> b/drivers/gpu/drm/i915/i915_pmu.c
>>> index 958b37123bf1..2399adf92cc0 100644
>>> --- a/drivers/gpu/drm/i915/i915_pmu.c
>>> +++ b/drivers/gpu/drm/i915/i915_pmu.c
>>> @@ -760,9 +760,13 @@ static void i915_pmu_event_start(struct
>>> perf_event *event, int flags)
>>> static void i915_pmu_event_stop(struct perf_event *event, int
>>> flags)
>>> {
>>> +       if (pmu->closed)
>>> +               goto out;
>>> +
>>>         if (flags & PERF_EF_UPDATE)
>>>                 i915_pmu_event_read(event);
>>>         i915_pmu_disable(event);
>>> +out:
>>>         event->hw.state = PERF_HES_STOPPED;
>>> }
>>>
>>> Fixes: b00bccb3f0bb ("drm/i915/pmu: Handle PCI unbind")
>>
>> that works. I don't see a hang with the above sequence on ADLP. Do
>> you
>> want to post/merge this?
>>
>> Also what about Stuart's changes in this series. At a minimum, I
>> would
>> keep the engine checks in i915_pmu_disable (rev1)? I am not sure the
>> reorder of pmu/gt registrations is needed though.
> 
> Thanks for the help here Tvrtko/Umesh! Sorry for the late reply here.
> I've been swamped and haven't been able to get back here.
> 
> IMO we really should have all three of these, possibly in three
> separate patches. I'm happy to post any or all of these or one of you
> can - happy to review. It will be earliest some time tomorrow though.

Re-ordering on the unbind path, AFAIR yes; and a pmu->closed check in either
i915_pmu_event_stop or an early return from i915_pmu_disable (I was going
for symmetry with start, but perhaps it looks clumsy, not sure), yes.
Those two should have a Fixes: tag as well. The null engine checks I still do
not support. They add a production-build debug string for something which
is supposed to be impossible and a programming error, and they make the code
a bit uglier with the extra indentation.

Regards,

Tvrtko

> 
> Thanks,
> Stuart
> 
>>
>> Thanks,
>> Umesh
>>
>>> Enable count handling in i915_pmu_disable should not matter since
>>> the i915_pmu_unregister would have already been executed by this
>>> point so all we need to ensure is that pmu->closed is not use after
>>> free. And since open event hold the DRM device reference I think
>>> that is fine.
>>>
>>> Regards,
>>>
>>> Tvrtko
>>>
>>>> Umesh
>>>>
>>>>
>>>>> We also need to try a stress test with two threads:
>>>>>
>>>>>      Thread A        Thread B
>>>>>      -----------        -----------
>>>>>      loop:            loop:
>>>>>        open pmu event      rmmod
>>>>>        close pmu event      insmod
>>>>>
>>>>> To see if it can hit a problem with drm_dev_get from
>>>>> i915_pmu_event_init being called at a bad moment relative to
>>>>> module unload. Maybe I am confused but that seems a
>>>>> possibility
>>>>> and a serious problem currently.
>>>>>
>>>>> Regards,
>>>>>
>>>>> Tvrtko
Summers, Stuart Aug. 3, 2022, 10:54 p.m. UTC | #13
On Thu, 2022-07-21 at 08:43 +0100, Tvrtko Ursulin wrote:
> On 21/07/2022 05:30, Summers, Stuart wrote:
> > On Wed, 2022-07-20 at 13:07 -0700, Umesh Nerlige Ramappa wrote:
> > > On Wed, Jul 20, 2022 at 09:14:38AM +0100, Tvrtko Ursulin wrote:
> > > > On 20/07/2022 01:22, Umesh Nerlige Ramappa wrote:
> > > > > On Tue, Jul 19, 2022 at 10:00:01AM +0100, Tvrtko Ursulin
> > > > > wrote:
> > > > > > On 12/07/2022 22:03, Umesh Nerlige Ramappa wrote:
> > > > > > > On Mon, Jul 04, 2022 at 09:31:55AM +0100, Tvrtko Ursulin
> > > > > > > wrote:
> > > > > > > > On 01/07/2022 15:54, Summers, Stuart wrote:
> > > > > > > > > On Fri, 2022-07-01 at 09:37 +0100, Tvrtko Ursulin
> > > > > > > > > wrote:
> > > > > > > > > > On 01/07/2022 01:11, Umesh Nerlige Ramappa wrote:
> > > > > > > > > > > On Thu, Jun 30, 2022 at 09:00:28PM +0000, Stuart
> > > > > > > > > > > Summers wrote:
> > > > > > > > > > > > In the driver teardown, we are unregistering
> > > > > > > > > > > > the gt
> > > > > > > > > > > > prior
> > > > > > > > > > > > to unregistering the PMU. This means there is a
> > > > > > > > > > > > small window
> > > > > > > > > > > > of time in which the application can request
> > > > > > > > > > > > metrics from the
> > > > > > > > > > > > PMU, some of which are calling into the uapi
> > > > > > > > > > > > engines list,
> > > > > > > > > > > > while the engines are not available. In this
> > > > > > > > > > > > case
> > > > > > > > > > > > we can
> > > > > > > > > > > > see null pointer dereferences.
> > > > > > > > > > > > 
> > > > > > > > > > > > Fix this ordering in both the driver load and
> > > > > > > > > > > > unload sequences.
> > > > > > > > > > > > 
> > > > > > > > > > > > Additionally add a check for engine presence to
> > > > > > > > > > > > prevent this
> > > > > > > > > > > > NPD in the event this ordering is accidentally
> > > > > > > > > > > > reversed. Print
> > > > > > > > > > > > a debug message indicating when they aren't
> > > > > > > > > > > > available.
> > > > > > > > > > > > 
> > > > > > > > > > > > v1: Actually address the driver load/unload
> > > > > > > > > > > > ordering issue
> > > > > > > > > > > > 
> > > > > > > > > > > > Signed-off-by: Stuart Summers <
> > > > > > > > > > > > stuart.summers@intel.com>
> > > > > > > > > > > > ---
> > > > > > > > > > > 
> > > > > > > > > > > I thought this is likely happening because
> > > > > > > > > > > intel_gpu_top is running
> > > > > > > > > > > in
> > > > > > > > > > > the background when i915 is unloaded. I tried a
> > > > > > > > > > > quick
> > > > > > > > > > > repro, I
> > > > > > > > > > > don't see
> > > > > > > > > > > the unload succeed ("fatal module in use", not
> > > > > > > > > > > sure
> > > > > > > > > > > if this was a
> > > > > > > > > > > partial unload), but when I try to kill
> > > > > > > > > > > intel_gpu_top, I get an
> > > > > > > > > > > NPD.
> > > > > > > > > > > This is in the event disable path -
> > > > > > > > > > > i915_pmu_event_stop ->
> > > > > > > > > > > i915_pmu_disable.
> > > > > > > > > > 
> > > > > > > > > > So i915 failed to unload (as expected - with perf
> > > > > > > > > > events open we
> > > > > > > > > > elevate
> > > > > > > > > > the module ref count via i915_pmu_event_init ->
> > > > > > > > > > drm_dev_get), then
> > > > > > > > > > you
> > > > > > > > > > quit intel_gpu_top and get NPD? On the engine
> > > > > > > > > > lookup?
> > > > > > > > > > With the
> > > > > > > > > > re-ordered init/fini sequence as from this patch?
> > > > > > > > > > 
> > > > > > > > > > With elevated module count there shouldn't be any
> > > > > > > > > > unloading happening
> > > > > > > > > > so
> > > > > > > > > > I am intrigued.
> > > > > > > > > > 
> > > > > > > > > > > It's likely that you are seeing a different path
> > > > > > > > > > > (unload) leading
> > > > > > > > > > > to the
> > > > > > > > > > > same issue.
> > > > > > > > > > > 
> > > > > > > > > > > I think in i915_pmu_disable/disable should be
> > > > > > > > > > > aware
> > > > > > > > > > > of event-
> > > > > > > > > > > > hw.state
> > > > > > > > > > > and or pmu->closed states before accessing the
> > > > > > > > > > > event.
> > > > > > > > > > > Maybe like,
> > > > > > > > > > > 
> > > > > > > > > > > if (event->hw.state != PERF_HES_STOPPED &&
> > > > > > > > > > > is_engine_event(event))
> > > > > > > > > > > {
> > > > > > > > > > > 
> > > > > > > > > > > @Tvrtko, wondering if this case is tested by 
> > > > > > > > > > > igt@perf
> > > > > > > > > > > _pmu@module-unload.
> > > > > > > > > > 
> > > > > > > > > > A bit yes. From what Stuart wrote it seems the test
> > > > > > > > > > would need to be
> > > > > > > > > > extended to cover the case where PMU is getting
> > > > > > > > > > opened
> > > > > > > > > > while module
> > > > > > > > > > unload is in progress.
> > > > > > > > > > 
> > > > > > > > > > But the NPD you saw is for the moment confusing so
> > > > > > > > > > I
> > > > > > > > > > don't know what
> > > > > > > > > > is
> > > > > > > > > > happening.
> > > > > > > > > > 
> > > > > > > > > > > I am not clear if we should use event->hw.state
> > > > > > > > > > > or
> > > > > > > > > > > pmu->closed here
> > > > > > > > > > > and
> > > > > > > > > > > if/how they are related. IMO, for this issue, the
> > > > > > > > > > > engine check is
> > > > > > > > > > > good
> > > > > > > > > > > enough too, so we don't really need the pmu state
> > > > > > > > > > > checks.
> > > > > > > > > > > Thoughts?
> > > > > > > > > > 
> > > > > > > > > > Engine check at the moment feels like papering.
> > > > > > > > > > 
> > > > > > > > > > Indeed as you say I think the pmu->closed might be
> > > > > > > > > > the
> > > > > > > > > > solution.
> > > > > > > > > > Perhaps
> > > > > > > > > > the race is as mentioned above. PMU open happening
> > > > > > > > > > in
> > > > > > > > > > parallel to
> > > > > > > > > > unload..
> > > > > > > > > > 
> > > > > > > > > > If the sequence of events userspace triggers is:
> > > > > > > > > > 
> > > > > > > > > >     i915_pmu_event_init
> > > > > > > > > >     i915_pmu_event_start
> > > > > > > > > >     i915_pmu_enable
> > > > > > > > > >     i915_pmu_event_read
> > > > > > > > > > 
> > > > > > > > > > I guess pmu->closed can get set halfway in
> > > > > > > > > > i915_pmu_event_init. What
> > > > > > > > > > would be the effect of that.. We'd try to get a
> > > > > > > > > > module
> > > > > > > > > > reference
> > > > > > > > > > while
> > > > > > > > > > in the process of unloading. Which is probably very
> > > > > > > > > > bad.. So possibly
> > > > > > > > > > a
> > > > > > > > > > final check on pmu->close is needed there. Ho hum..
> > > > > > > > > > can
> > > > > > > > > > it be made
> > > > > > > > > > safe
> > > > > > > > > > is the question.
> > > > > > > > > > 
> > > > > > > > > > It doesn't explain the NPD on Ctrl-C though..
> > > > > > > > > > intel_gpu_top keeps
> > > > > > > > > > the
> > > > > > > > > > evens open all the time. So I think more info
> > > > > > > > > > needed,
> > > > > > > > > > for me at
> > > > > > > > > > least.
> > > > > > > > > 
> > > > > > > > > So one thing here is this doesn't have to do with
> > > > > > > > > module
> > > > > > > > > unload, but
> > > > > > > > > module unbind specifically (while perf is open). I
> > > > > > > > > don't
> > > > > > > > > know if the
> > > > > > > > > NPD from Umesh is the same as what we're seeing here.
> > > > > > > > > I'd
> > > > > > > > > really like
> > > > > > > > > to separate these unless you know for sure that's
> > > > > > > > > related. Also it
> > > > > > > > > would be interesting to know if this patch fixes your
> > > > > > > > > issue as well.
> > > > > > > > > 
> > > > > > > > > I still think the re-ordering in i915_driver.c should
> > > > > > > > > be
> > > > > > > > > enough and we
> > > > > > > > > shouldn't need to check pmu->closed. The unregister
> > > > > > > > > should
> > > > > > > > > be enough to
> > > > > > > > > ensure the perf tools are notified that new events
> > > > > > > > > aren't
> > > > > > > > > allowed, and
> > > > > > > > > at that time the engine structures are still intact.
> > > > > > > > > And
> > > > > > > > > even if for
> > > > > > > > > some reason the perf code still calls in to our
> > > > > > > > > function
> > > > > > > > > pointers, we
> > > > > > > > > have these engine checks as a failsafe.
> > > > > > > > > 
> > > > > > > > > I'm by the way uploading one more version here with a
> > > > > > > > > drm_WARN_ONCE
> > > > > > > > > instead of the debug print.
> > > > > > > > 
> > > > > > > > Problem is I am not a fan of papering so lets get to
> > > > > > > > the
> > > > > > > > bottom of the issue first. (In the meantime simple
> > > > > > > > patch
> > > > > > > > to
> > > > > > > > re-order driver fini is okay since that seems obvious
> > > > > > > > enough,
> > > > > > > > I tnink.)
> > > > > > > > 
> > > > > > > > We need to see call traces from any oopses and try to
> > > > > > > > extend
> > > > > > > > perf_pmu to catch them. And we need to understand the
> > > > > > > > problem,
> > > > > > > > if it is a real problem, which I laid out last week
> > > > > > > > about
> > > > > > > > race
> > > > > > > > between module unload and elevating the module use
> > > > > > > > count
> > > > > > > > from
> > > > > > > > our perf event init.
> > > > > > > > 
> > > > > > > > Without understanding the details of possible failure
> > > > > > > > mode
> > > > > > > > flows we don't know how much the papering with engine
> > > > > > > > checks
> > > > > > > > solves and how much it leaves broken.
> > > > > > > > 
> > > > > > > > If you guys are too busy to tackle that I'll put it
> > > > > > > > onto
> > > > > > > > myself, but help would certainly be appreciated.
> > > > > > > 
> > > > > > > Looks like Stuart/Chris are pointing towards the unbind
> > > > > > > as an issue.
> > > > > > >
> > > > > > > I ran this sequence and only the modprobe showed an error
> > > > > > > (FATAL: ... still in use). What happens with the unbind?
> > > > > > > Should pmu also handle the unbind somehow?
> > > > > > > 
> > > > > > > - run intel_gpu_top
> > > > > > > - unbind
> > > > > > > - modprobe -r i915
> > > > > > > - kill intel_gpu_top.
> > > > > > 
> > > > > > And it crashes or survives in this scenario?
> > > > > 
> > > > > Hangs on ADLP, haven't been able to get the serial logs.
> > > > > 
> > > > > > Module still in use here would be expected since
> > > > > > intel_gpu_top is holding a module reference.
> > > > > >
> > > > > > And pmu->closed should be set at the unbind step via
> > > > > > i915_pci_remove -> i915_driver_unregister ->
> > > > > > i915_pmu_unregister.
> > > > > 
> > > > > After unbind,
> > > > > kill intel_gpu_top -> i915_pmu_event_del -> i915_pmu_event_stop
> > > > > -> i915_pmu_disable -> likely HANGs when dereferencing engine.
> > > > >
> > > > > Can we short-circuit i915_pmu_disable with
> > > > > if (pmu->closed)
> > > > >      return;
> > > > >
> > > > > since this function is also adjusting pmu->enable_count. Does
> > > > > it matter after pmu is closed?
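
For reference, that short-circuit would amount to something like the sketch
below. It is only a sketch; it assumes pmu is derived from the event the same
way the existing callbacks in i915_pmu.c already do it.

	/*
	 * Sketch of the suggested early return (not a posted patch): once
	 * the PMU has been marked closed on unbind, skip the engine lookup
	 * and the enable_count bookkeeping entirely.
	 */
	static void i915_pmu_disable(struct perf_event *event)
	{
		struct drm_i915_private *i915 =
			container_of(event->pmu, typeof(*i915), pmu.base);
		struct i915_pmu *pmu = &i915->pmu;

		if (pmu->closed)
			return;

		/* ... existing i915_pmu_disable() body unchanged ... */
	}
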
> > > > 
> > > > Erm yes.. this sounds obvious now, but why did I not put a
> > > > pmu->closed check in i915_pmu_event_stop, since read and
> > > > start/init have it!? Was it a simple oversight or something
> > > > more? I can't remember.
> > > > 
> > > > Try like this maybe:
> > > > 
> > > > diff --git a/drivers/gpu/drm/i915/i915_pmu.c b/drivers/gpu/drm/i915/i915_pmu.c
> > > > index 958b37123bf1..2399adf92cc0 100644
> > > > --- a/drivers/gpu/drm/i915/i915_pmu.c
> > > > +++ b/drivers/gpu/drm/i915/i915_pmu.c
> > > > @@ -760,9 +760,13 @@ static void i915_pmu_event_start(struct perf_event *event, int flags)
> > > > static void i915_pmu_event_stop(struct perf_event *event, int flags)
> > > > {
> > > > +       if (pmu->closed)
> > > > +               goto out;
> > > > +
> > > >         if (flags & PERF_EF_UPDATE)
> > > >                 i915_pmu_event_read(event);
> > > >         i915_pmu_disable(event);
> > > > +out:
> > > >         event->hw.state = PERF_HES_STOPPED;
> > > > }
> > > > 
> > > > Fixes: b00bccb3f0bb ("drm/i915/pmu: Handle PCI unbind")
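
Spelled out as a whole function, the idea is roughly the sketch below. The
hunk above doesn't show where pmu comes from; the locals here are assumed to
follow the pattern the other PMU callbacks in i915_pmu.c use, so treat this as
a sketch rather than the exact patch that ends up posted.

	static void i915_pmu_event_stop(struct perf_event *event, int flags)
	{
		/* Assumed derivation, matching the other i915 PMU callbacks. */
		struct drm_i915_private *i915 =
			container_of(event->pmu, typeof(*i915), pmu.base);
		struct i915_pmu *pmu = &i915->pmu;

		/* PMU was marked closed on unbind: skip the read/disable work. */
		if (pmu->closed)
			goto out;

		if (flags & PERF_EF_UPDATE)
			i915_pmu_event_read(event);
		i915_pmu_disable(event);

	out:
		event->hw.state = PERF_HES_STOPPED;
	}
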
> > > 
> > > that works. I don't see a hang with the above sequence on ADLP.
> > > Do
> > > you
> > > want to post/merge this?
> > > 
> > > Also what about Stuart's changes in this series. At a minimum, I
> > > would
> > > keep the engine checks in i915_pmu_disable (rev1)? I am not sure
> > > the
> > > reorder of pmu/gt registrations is needed though.
> > 
> > Thanks for the help here Tvrtko/Umesh! Sorry for the late reply;
> > I've been swamped and haven't been able to get back to this.
> >
> > IMO we really should have all three of these, possibly in three
> > separate patches. I'm happy to post any or all of these, or one of
> > you can - happy to review. It will be some time tomorrow at the
> > earliest, though.
> 
> Re-order on unbind path AFAIR yes, and pmu->closed check in either
> i915_pmu_event_stop or early return from i915_pmu_disable (I was
> going for symmetry with start, but perhaps it looks clumsy, not sure)
> yes. Those two should have a fixes tag as well. Null engine checks I
> still do not support. It adds a production build debug string for
> something which is supposed to be impossible and a programming error,
> and makes the code a bit uglier with the extra indentation.

No problem dropping it.

And I'll post the other two here shortly. I'm going to stick with your
first suggestion on that second patch since I'd otherwise have to add a
couple of skips to avoid the bad pmu-> accesses. And I of course agree
with the symmetry.

Thanks,
Stuart

> 
> Regards,
> 
> Tvrtko
> 
> > Thanks,
> > Stuart
> > 
> > > Thanks,
> > > Umesh
> > > 
> > > > Enable count handling in i915_pmu_disable should not matter,
> > > > since i915_pmu_unregister would have already been executed by
> > > > this point, so all we need to ensure is that pmu->closed is not
> > > > used after free. And since open events hold the DRM device
> > > > reference, I think that is fine.
> > > > 
> > > > Regards,
> > > > 
> > > > Tvrtko
> > > > 
> > > > > Umesh
> > > > > 
> > > > > 
> > > > > > We also need to try a stress test with two threads:
> > > > > > 
> > > > > >      Thread A            Thread B
> > > > > >      -----------         -----------
> > > > > >      loop:               loop:
> > > > > >        open pmu event      rmmod
> > > > > >        close pmu event     insmod
> > > > > > 
> > > > > > To see if it can hit a problem with drm_dev_get from
> > > > > > i915_pmu_event_init being called at a bad moment relative
> > > > > > to module unload. Maybe I am confused but that seems a
> > > > > > possibility and a serious problem currently.
> > > > > > 
> > > > > > Regards,
> > > > > > 
> > > > > > Tvrtko
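
To make that stress test concrete, Thread A could be a small userspace loop
along the lines of the sketch below, while Thread B simply loops
"modprobe -r i915; modprobe i915" in a shell. The event choice
(I915_PMU_INTERRUPTS) and the CPU 0 binding are assumptions for illustration
only; any i915 PMU event should exercise the same drm_dev_get() path in
i915_pmu_event_init().

	/*
	 * Thread A of the suggested stress test: repeatedly open and close an
	 * i915 PMU event so that i915_pmu_event_init() (which takes a DRM
	 * device/module reference) races against unbind/unload driven from a
	 * second thread or shell.
	 */
	#include <fcntl.h>
	#include <linux/perf_event.h>
	#include <stdlib.h>
	#include <sys/syscall.h>
	#include <unistd.h>
	#include <drm/i915_drm.h>	/* I915_PMU_INTERRUPTS */

	static int i915_pmu_type(void)
	{
		char buf[32] = { 0 };
		int fd = open("/sys/bus/event_source/devices/i915/type", O_RDONLY);

		if (fd < 0)
			return -1;	/* i915 PMU not currently registered */
		if (read(fd, buf, sizeof(buf) - 1) <= 0) {
			close(fd);
			return -1;
		}
		close(fd);
		return atoi(buf);
	}

	int main(void)
	{
		for (;;) {
			struct perf_event_attr attr = { 0 };
			int type = i915_pmu_type();
			int fd;

			if (type < 0) {
				usleep(1000);	/* driver unloaded, try again */
				continue;
			}

			attr.type = type;
			attr.size = sizeof(attr);
			attr.config = I915_PMU_INTERRUPTS;	/* any i915 event will do */

			/* i915 is an uncore PMU: open system-wide, bound to CPU 0. */
			fd = syscall(__NR_perf_event_open, &attr, -1, 0, -1, 0);
			if (fd >= 0)
				close(fd);
		}

		return 0;
	}
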
diff mbox series

Patch

diff --git a/drivers/gpu/drm/i915/i915_driver.c b/drivers/gpu/drm/i915/i915_driver.c
index deb8a8b76965..ee4dcb85d206 100644
--- a/drivers/gpu/drm/i915/i915_driver.c
+++ b/drivers/gpu/drm/i915/i915_driver.c
@@ -717,7 +717,6 @@  static void i915_driver_register(struct drm_i915_private *dev_priv)
 	struct drm_device *dev = &dev_priv->drm;
 
 	i915_gem_driver_register(dev_priv);
-	i915_pmu_register(dev_priv);
 
 	intel_vgpu_register(dev_priv);
 
@@ -731,11 +730,12 @@  static void i915_driver_register(struct drm_i915_private *dev_priv)
 	i915_debugfs_register(dev_priv);
 	i915_setup_sysfs(dev_priv);
 
+	intel_gt_driver_register(to_gt(dev_priv));
+
 	/* Depends on sysfs having been initialized */
+	i915_pmu_register(dev_priv);
 	i915_perf_register(dev_priv);
 
-	intel_gt_driver_register(to_gt(dev_priv));
-
 	intel_display_driver_register(dev_priv);
 
 	intel_power_domains_enable(dev_priv);
@@ -762,11 +762,12 @@  static void i915_driver_unregister(struct drm_i915_private *dev_priv)
 
 	intel_display_driver_unregister(dev_priv);
 
-	intel_gt_driver_unregister(to_gt(dev_priv));
-
 	i915_perf_unregister(dev_priv);
+	/* GT should be available until PMU is gone */
 	i915_pmu_unregister(dev_priv);
 
+	intel_gt_driver_unregister(to_gt(dev_priv));
+
 	i915_teardown_sysfs(dev_priv);
 	drm_dev_unplug(&dev_priv->drm);
 
diff --git a/drivers/gpu/drm/i915/i915_pmu.c b/drivers/gpu/drm/i915/i915_pmu.c
index 958b37123bf1..796a1d8e36f2 100644
--- a/drivers/gpu/drm/i915/i915_pmu.c
+++ b/drivers/gpu/drm/i915/i915_pmu.c
@@ -670,21 +670,28 @@  static void i915_pmu_enable(struct perf_event *event)
 	if (is_engine_event(event)) {
 		u8 sample = engine_event_sample(event);
 		struct intel_engine_cs *engine;
-
-		engine = intel_engine_lookup_user(i915,
-						  engine_event_class(event),
-						  engine_event_instance(event));
-
-		BUILD_BUG_ON(ARRAY_SIZE(engine->pmu.enable_count) !=
-			     I915_ENGINE_SAMPLE_COUNT);
-		BUILD_BUG_ON(ARRAY_SIZE(engine->pmu.sample) !=
-			     I915_ENGINE_SAMPLE_COUNT);
-		GEM_BUG_ON(sample >= ARRAY_SIZE(engine->pmu.enable_count));
-		GEM_BUG_ON(sample >= ARRAY_SIZE(engine->pmu.sample));
-		GEM_BUG_ON(engine->pmu.enable_count[sample] == ~0);
-
-		engine->pmu.enable |= BIT(sample);
-		engine->pmu.enable_count[sample]++;
+		u8 class = engine_event_class(event);
+		u8 instance = engine_event_instance(event);
+
+		engine = intel_engine_lookup_user(i915, class, instance);
+		if (engine) {
+			BUILD_BUG_ON(ARRAY_SIZE(engine->pmu.enable_count) !=
+				     I915_ENGINE_SAMPLE_COUNT);
+			BUILD_BUG_ON(ARRAY_SIZE(engine->pmu.sample) !=
+				     I915_ENGINE_SAMPLE_COUNT);
+			GEM_BUG_ON(sample >=
+				   ARRAY_SIZE(engine->pmu.enable_count));
+			GEM_BUG_ON(sample >=
+				   ARRAY_SIZE(engine->pmu.sample));
+			GEM_BUG_ON(engine->pmu.enable_count[sample] == ~0);
+
+			engine->pmu.enable |= BIT(sample);
+			engine->pmu.enable_count[sample]++;
+		} else {
+			drm_dbg(&i915->drm,
+				"Invalid engine event: { class:%d, inst:%d }\n",
+				class, instance);
+		}
 	}
 
 	spin_unlock_irqrestore(&pmu->lock, flags);
@@ -714,21 +721,26 @@  static void i915_pmu_disable(struct perf_event *event)
 	if (is_engine_event(event)) {
 		u8 sample = engine_event_sample(event);
 		struct intel_engine_cs *engine;
-
-		engine = intel_engine_lookup_user(i915,
-						  engine_event_class(event),
-						  engine_event_instance(event));
-
-		GEM_BUG_ON(sample >= ARRAY_SIZE(engine->pmu.enable_count));
-		GEM_BUG_ON(sample >= ARRAY_SIZE(engine->pmu.sample));
-		GEM_BUG_ON(engine->pmu.enable_count[sample] == 0);
-
-		/*
-		 * Decrement the reference count and clear the enabled
-		 * bitmask when the last listener on an event goes away.
-		 */
-		if (--engine->pmu.enable_count[sample] == 0)
-			engine->pmu.enable &= ~BIT(sample);
+		u8 class = engine_event_class(event);
+		u8 instance = engine_event_instance(event);
+
+		engine = intel_engine_lookup_user(i915, class, instance);
+		if (engine) {
+			GEM_BUG_ON(sample >= ARRAY_SIZE(engine->pmu.enable_count));
+			GEM_BUG_ON(sample >= ARRAY_SIZE(engine->pmu.sample));
+			GEM_BUG_ON(engine->pmu.enable_count[sample] == 0);
+
+			/*
+			 * Decrement the reference count and clear the enabled
+			 * bitmask when the last listener on an event goes away.
+			 */
+			if (--engine->pmu.enable_count[sample] == 0)
+				engine->pmu.enable &= ~BIT(sample);
+		} else {
+			drm_dbg(&i915->drm,
+				"Invalid engine event: { class:%d, inst:%d }\n",
+				class, instance);
+		}
 	}
 
 	GEM_BUG_ON(bit >= ARRAY_SIZE(pmu->enable_count));