diff mbox series

[1/2] drm/i915: Fix NPD in PMU during driver teardown

Message ID 20220803230325.137593-1-stuart.summers@intel.com (mailing list archive)
State New, archived
Headers show
Series [1/2] drm/i915: Fix NPD in PMU during driver teardown | expand

Commit Message

Summers, Stuart Aug. 3, 2022, 11:03 p.m. UTC
In the driver teardown, we are unregistering the gt prior
to unregistering the PMU. This means there is a small window
of time in which the application can request metrics from the
PMU, some of which are calling into the uapi engines list,
while the engines are not available. In this case we can
see null pointer dereferences.

Fix this ordering in both the driver load and unload sequences.

v1: Actually address the driver load/unload ordering issue
v2: Remove the NULL checks since they shouldn't be necessary
    now with the proper ordering

Fixes: 42014f69bb235 ("drm/i915: Hook up GT power management")
Signed-off-by: Stuart Summers <stuart.summers@intel.com>
---
 drivers/gpu/drm/i915/i915_driver.c | 11 ++++++-----
 1 file changed, 6 insertions(+), 5 deletions(-)

Comments

Tvrtko Ursulin Aug. 4, 2022, 8:42 a.m. UTC | #1
On 04/08/2022 00:03, Stuart Summers wrote:
> In the driver teardown, we are unregistering the gt prior
> to unregistering the PMU. This means there is a small window
> of time in which the application can request metrics from the
> PMU, some of which are calling into the uapi engines list,
> while the engines are not available. In this case we can
> see null pointer dereferences.
> 
> Fix this ordering in both the driver load and unload sequences.
> 
> v1: Actually address the driver load/unload ordering issue
> v2: Remove the NULL checks since they shouldn't be necessary
>      now with the proper ordering
> 
> Fixes: 42014f69bb235 ("drm/i915: Hook up GT power management")

What happened in this commit to break it?

> Signed-off-by: Stuart Summers <stuart.summers@intel.com>
> ---
>   drivers/gpu/drm/i915/i915_driver.c | 11 ++++++-----
>   1 file changed, 6 insertions(+), 5 deletions(-)
> 
> diff --git a/drivers/gpu/drm/i915/i915_driver.c b/drivers/gpu/drm/i915/i915_driver.c
> index deb8a8b76965a..ee4dcb85d2060 100644
> --- a/drivers/gpu/drm/i915/i915_driver.c
> +++ b/drivers/gpu/drm/i915/i915_driver.c
> @@ -717,7 +717,6 @@ static void i915_driver_register(struct drm_i915_private *dev_priv)
>   	struct drm_device *dev = &dev_priv->drm;
>   
>   	i915_gem_driver_register(dev_priv);
> -	i915_pmu_register(dev_priv);
>   
>   	intel_vgpu_register(dev_priv);
>   
> @@ -731,11 +730,12 @@ static void i915_driver_register(struct drm_i915_private *dev_priv)
>   	i915_debugfs_register(dev_priv);
>   	i915_setup_sysfs(dev_priv);
>   
> +	intel_gt_driver_register(to_gt(dev_priv));
> +
>   	/* Depends on sysfs having been initialized */
> +	i915_pmu_register(dev_priv);
>   	i915_perf_register(dev_priv);
>   
> -	intel_gt_driver_register(to_gt(dev_priv));
> -

There was a bit of a time distance since we originally discussed this so 
things kind of evaporated from my head. Or at least it feels like that. 
  Today when I look at the code I don't understand why is this shuffle 
relevant.

The sysfs comment does not really apply to pmu, but also if I look into 
intel_gt_driver_(un)register I did not spot what is the relevant part 
which interacts with the PMU.

On register it is engine list first then PMU.

On unregister it is PMU first, then engine list:

   i915_driver_remove
     i915_driver_unregister
       i915_pmu_unregister
     i915_gem_driver_remove
       clears uabi engines list

Help please? :)

Regards,

Tvrtko

>   	intel_display_driver_register(dev_priv);
>   
>   	intel_power_domains_enable(dev_priv);
> @@ -762,11 +762,12 @@ static void i915_driver_unregister(struct drm_i915_private *dev_priv)
>   
>   	intel_display_driver_unregister(dev_priv);
>   
> -	intel_gt_driver_unregister(to_gt(dev_priv));
> -
>   	i915_perf_unregister(dev_priv);
> +	/* GT should be available until PMU is gone */
>   	i915_pmu_unregister(dev_priv);
>   
> +	intel_gt_driver_unregister(to_gt(dev_priv));
> +
>   	i915_teardown_sysfs(dev_priv);
>   	drm_dev_unplug(&dev_priv->drm);
>
Summers, Stuart Aug. 4, 2022, 6:56 p.m. UTC | #2
On Thu, 2022-08-04 at 09:42 +0100, Tvrtko Ursulin wrote:
> On 04/08/2022 00:03, Stuart Summers wrote:
> > In the driver teardown, we are unregistering the gt prior
> > to unregistering the PMU. This means there is a small window
> > of time in which the application can request metrics from the
> > PMU, some of which are calling into the uapi engines list,
> > while the engines are not available. In this case we can
> > see null pointer dereferences.
> > 
> > Fix this ordering in both the driver load and unload sequences.
> > 
> > v1: Actually address the driver load/unload ordering issue
> > v2: Remove the NULL checks since they shouldn't be necessary
> >      now with the proper ordering
> > 
> > Fixes: 42014f69bb235 ("drm/i915: Hook up GT power management")
> 
> What happened in this commit to break it?

Hm.. well this was the patch that added the abstraction ordering issue,
i.e. gt_register/unregister. There isn't anything specifically broken
here since we don't actually access the gt structure underneath. My
assertion is that this patch should have taken the expansion of the gt
structure into consideration and added the correct ordering with
respect to the pmu.

Are you asking for the specific patch where things stopped working
functionally?

Thanks,
Stuart

> 
> > Signed-off-by: Stuart Summers <stuart.summers@intel.com>
> > ---
> >   drivers/gpu/drm/i915/i915_driver.c | 11 ++++++-----
> >   1 file changed, 6 insertions(+), 5 deletions(-)
> > 
> > diff --git a/drivers/gpu/drm/i915/i915_driver.c
> > b/drivers/gpu/drm/i915/i915_driver.c
> > index deb8a8b76965a..ee4dcb85d2060 100644
> > --- a/drivers/gpu/drm/i915/i915_driver.c
> > +++ b/drivers/gpu/drm/i915/i915_driver.c
> > @@ -717,7 +717,6 @@ static void i915_driver_register(struct
> > drm_i915_private *dev_priv)
> >   	struct drm_device *dev = &dev_priv->drm;
> >   
> >   	i915_gem_driver_register(dev_priv);
> > -	i915_pmu_register(dev_priv);
> >   
> >   	intel_vgpu_register(dev_priv);
> >   
> > @@ -731,11 +730,12 @@ static void i915_driver_register(struct
> > drm_i915_private *dev_priv)
> >   	i915_debugfs_register(dev_priv);
> >   	i915_setup_sysfs(dev_priv);
> >   
> > +	intel_gt_driver_register(to_gt(dev_priv));
> > +
> >   	/* Depends on sysfs having been initialized */
> > +	i915_pmu_register(dev_priv);
> >   	i915_perf_register(dev_priv);
> >   
> > -	intel_gt_driver_register(to_gt(dev_priv));
> > -
> 
> There was a bit of a time distance since we originally discussed this
> so 
> things kind of evaporated from my head. Or at least it feels like
> that. 
>   Today when I look at the code I don't understand why is this
> shuffle 
> relevant.
> 
> The sysfs comment does not really apply to pmu, but also if I look
> into 
> intel_gt_driver_(un)register I did not spot what is the relevant
> part 
> which interacts with the PMU.
> 
> On register it is engine list first then PMU.
> 
> On unregister it is PMU first, then engine list:
> 
>    i915_driver_remove
>      i915_driver_unregister
>        i915_pmu_unregister
>      i915_gem_driver_remove
>        clears uabi engines list
> 
> Help please? :)
> 
> Regards,
> 
> Tvrtko
> 
> >   	intel_display_driver_register(dev_priv);
> >   
> >   	intel_power_domains_enable(dev_priv);
> > @@ -762,11 +762,12 @@ static void i915_driver_unregister(struct
> > drm_i915_private *dev_priv)
> >   
> >   	intel_display_driver_unregister(dev_priv);
> >   
> > -	intel_gt_driver_unregister(to_gt(dev_priv));
> > -
> >   	i915_perf_unregister(dev_priv);
> > +	/* GT should be available until PMU is gone */
> >   	i915_pmu_unregister(dev_priv);
> >   
> > +	intel_gt_driver_unregister(to_gt(dev_priv));
> > +
> >   	i915_teardown_sysfs(dev_priv);
> >   	drm_dev_unplug(&dev_priv->drm);
> >
Tvrtko Ursulin Aug. 5, 2022, 9:26 a.m. UTC | #3
On 04/08/2022 19:56, Summers, Stuart wrote:
> On Thu, 2022-08-04 at 09:42 +0100, Tvrtko Ursulin wrote:
>> On 04/08/2022 00:03, Stuart Summers wrote:
>>> In the driver teardown, we are unregistering the gt prior
>>> to unregistering the PMU. This means there is a small window
>>> of time in which the application can request metrics from the
>>> PMU, some of which are calling into the uapi engines list,
>>> while the engines are not available. In this case we can
>>> see null pointer dereferences.
>>>
>>> Fix this ordering in both the driver load and unload sequences.
>>>
>>> v1: Actually address the driver load/unload ordering issue
>>> v2: Remove the NULL checks since they shouldn't be necessary
>>>       now with the proper ordering
>>>
>>> Fixes: 42014f69bb235 ("drm/i915: Hook up GT power management")
>>
>> What happened in this commit to break it?
> 
> Hm.. well this was the patch that added the abstraction ordering issue,
> i.e. gt_register/unregister. There isn't anything specifically broken
> here since we don't actually access the gt structure underneath. My
> assertion is that this patch should have taken the expansion of the gt
> structure into consideration and added the correct ordering with
> respect to the pmu.
> 
> Are you asking for the specific patch where things stopped working
> functionally?

No, I was looking for the info on what is actually broken with the 
ordering, as is since I couldn't spot it myself yesterday. In other 
words, with patch 2/2 applied - does 1/2 fix anything further for the 
PMU use cases?

Regards,

Tvrtko

>>> Signed-off-by: Stuart Summers <stuart.summers@intel.com>
>>> ---
>>>    drivers/gpu/drm/i915/i915_driver.c | 11 ++++++-----
>>>    1 file changed, 6 insertions(+), 5 deletions(-)
>>>
>>> diff --git a/drivers/gpu/drm/i915/i915_driver.c
>>> b/drivers/gpu/drm/i915/i915_driver.c
>>> index deb8a8b76965a..ee4dcb85d2060 100644
>>> --- a/drivers/gpu/drm/i915/i915_driver.c
>>> +++ b/drivers/gpu/drm/i915/i915_driver.c
>>> @@ -717,7 +717,6 @@ static void i915_driver_register(struct
>>> drm_i915_private *dev_priv)
>>>    	struct drm_device *dev = &dev_priv->drm;
>>>    
>>>    	i915_gem_driver_register(dev_priv);
>>> -	i915_pmu_register(dev_priv);
>>>    
>>>    	intel_vgpu_register(dev_priv);
>>>    
>>> @@ -731,11 +730,12 @@ static void i915_driver_register(struct
>>> drm_i915_private *dev_priv)
>>>    	i915_debugfs_register(dev_priv);
>>>    	i915_setup_sysfs(dev_priv);
>>>    
>>> +	intel_gt_driver_register(to_gt(dev_priv));
>>> +
>>>    	/* Depends on sysfs having been initialized */
>>> +	i915_pmu_register(dev_priv);
>>>    	i915_perf_register(dev_priv);
>>>    
>>> -	intel_gt_driver_register(to_gt(dev_priv));
>>> -
>>
>> There was a bit of a time distance since we originally discussed this
>> so
>> things kind of evaporated from my head. Or at least it feels like
>> that.
>>    Today when I look at the code I don't understand why is this
>> shuffle
>> relevant.
>>
>> The sysfs comment does not really apply to pmu, but also if I look
>> into
>> intel_gt_driver_(un)register I did not spot what is the relevant
>> part
>> which interacts with the PMU.
>>
>> On register it is engine list first then PMU.
>>
>> On unregister it is PMU first, then engine list:
>>
>>     i915_driver_remove
>>       i915_driver_unregister
>>         i915_pmu_unregister
>>       i915_gem_driver_remove
>>         clears uabi engines list
>>
>> Help please? :)
>>
>> Regards,
>>
>> Tvrtko
>>
>>>    	intel_display_driver_register(dev_priv);
>>>    
>>>    	intel_power_domains_enable(dev_priv);
>>> @@ -762,11 +762,12 @@ static void i915_driver_unregister(struct
>>> drm_i915_private *dev_priv)
>>>    
>>>    	intel_display_driver_unregister(dev_priv);
>>>    
>>> -	intel_gt_driver_unregister(to_gt(dev_priv));
>>> -
>>>    	i915_perf_unregister(dev_priv);
>>> +	/* GT should be available until PMU is gone */
>>>    	i915_pmu_unregister(dev_priv);
>>>    
>>> +	intel_gt_driver_unregister(to_gt(dev_priv));
>>> +
>>>    	i915_teardown_sysfs(dev_priv);
>>>    	drm_dev_unplug(&dev_priv->drm);
>>>
diff mbox series

Patch

diff --git a/drivers/gpu/drm/i915/i915_driver.c b/drivers/gpu/drm/i915/i915_driver.c
index deb8a8b76965a..ee4dcb85d2060 100644
--- a/drivers/gpu/drm/i915/i915_driver.c
+++ b/drivers/gpu/drm/i915/i915_driver.c
@@ -717,7 +717,6 @@  static void i915_driver_register(struct drm_i915_private *dev_priv)
 	struct drm_device *dev = &dev_priv->drm;
 
 	i915_gem_driver_register(dev_priv);
-	i915_pmu_register(dev_priv);
 
 	intel_vgpu_register(dev_priv);
 
@@ -731,11 +730,12 @@  static void i915_driver_register(struct drm_i915_private *dev_priv)
 	i915_debugfs_register(dev_priv);
 	i915_setup_sysfs(dev_priv);
 
+	intel_gt_driver_register(to_gt(dev_priv));
+
 	/* Depends on sysfs having been initialized */
+	i915_pmu_register(dev_priv);
 	i915_perf_register(dev_priv);
 
-	intel_gt_driver_register(to_gt(dev_priv));
-
 	intel_display_driver_register(dev_priv);
 
 	intel_power_domains_enable(dev_priv);
@@ -762,11 +762,12 @@  static void i915_driver_unregister(struct drm_i915_private *dev_priv)
 
 	intel_display_driver_unregister(dev_priv);
 
-	intel_gt_driver_unregister(to_gt(dev_priv));
-
 	i915_perf_unregister(dev_priv);
+	/* GT should be available until PMU is gone */
 	i915_pmu_unregister(dev_priv);
 
+	intel_gt_driver_unregister(to_gt(dev_priv));
+
 	i915_teardown_sysfs(dev_priv);
 	drm_dev_unplug(&dev_priv->drm);