diff mbox series

[1/2] drm/i915: Fix a potential UAF at device unload

Message ID 20220923073515.23093-1-nirmoy.das@intel.com (mailing list archive)
State New, archived
Headers show
Series [1/2] drm/i915: Fix a potential UAF at device unload | expand

Commit Message

Nirmoy Das Sept. 23, 2022, 7:35 a.m. UTC
i915_gem_drain_freed_objects() might not be enough to
free all the objects and RCU delayed work might get
scheduled after the i915 device struct gets freed.

Call i915_gem_drain_workqueue() to catch all RCU delayed work.

Suggested-by: Chris Wilson <chris.p.wilson@intel.com>
Acked-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Signed-off-by: Nirmoy Das <nirmoy.das@intel.com>
---
 drivers/gpu/drm/i915/i915_gem.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

Comments

Andrzej Hajda Sept. 27, 2022, 7:43 a.m. UTC | #1
On 23.09.2022 09:35, Nirmoy Das wrote:
> i915_gem_drain_freed_objects() might not be enough to
> free all the objects and RCU delayed work might get
> scheduled after the i915 device struct gets freed.
> 
> Call i915_gem_drain_workqueue() to catch all RCU delayed work.
> 
> Suggested-by: Chris Wilson <chris.p.wilson@intel.com>
> Acked-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
> Signed-off-by: Nirmoy Das <nirmoy.das@intel.com>

Reviewed-by: Andrzej Hajda <andrzej.hajda@intel.com>

Regards
Andrzej

> ---
>   drivers/gpu/drm/i915/i915_gem.c | 2 +-
>   1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c
> index 88df9a35e0fe..7541028caebd 100644
> --- a/drivers/gpu/drm/i915/i915_gem.c
> +++ b/drivers/gpu/drm/i915/i915_gem.c
> @@ -1278,7 +1278,7 @@ void i915_gem_init_early(struct drm_i915_private *dev_priv)
>   
>   void i915_gem_cleanup_early(struct drm_i915_private *dev_priv)
>   {
> -	i915_gem_drain_freed_objects(dev_priv);
> +	i915_gem_drain_workqueue(dev_priv);
>   	GEM_BUG_ON(!llist_empty(&dev_priv->mm.free_list));
>   	GEM_BUG_ON(atomic_read(&dev_priv->mm.free_count));
>   	drm_WARN_ON(&dev_priv->drm, dev_priv->mm.shrink_count);
Andi Shyti Sept. 29, 2022, 11:32 a.m. UTC | #2
Hi Nirmoy,

On Fri, Sep 23, 2022 at 09:35:14AM +0200, Nirmoy Das wrote:
> i915_gem_drain_freed_objects() might not be enough to
> free all the objects and RCU delayed work might get
> scheduled after the i915 device struct gets freed.
> 
> Call i915_gem_drain_workqueue() to catch all RCU delayed work.
> 
> Suggested-by: Chris Wilson <chris.p.wilson@intel.com>
> Acked-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
> Signed-off-by: Nirmoy Das <nirmoy.das@intel.com>

pushed to drm-intel-gt-next

Thanks,
Andi
Nirmoy Das Sept. 29, 2022, 11:36 a.m. UTC | #3
On 9/29/2022 1:32 PM, Andi Shyti wrote:
> Hi Nirmoy,
>
> On Fri, Sep 23, 2022 at 09:35:14AM +0200, Nirmoy Das wrote:
>> i915_gem_drain_freed_objects() might not be enough to
>> free all the objects and RCU delayed work might get
>> scheduled after the i915 device struct gets freed.
>>
>> Call i915_gem_drain_workqueue() to catch all RCU delayed work.
>>
>> Suggested-by: Chris Wilson <chris.p.wilson@intel.com>
>> Acked-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
>> Signed-off-by: Nirmoy Das <nirmoy.das@intel.com>
> pushed to drm-intel-gt-next


Thanks, Andi!

>
> Thanks,
> Andi
Ville Syrjala Oct. 21, 2022, 4:34 p.m. UTC | #4
On Fri, Sep 23, 2022 at 09:35:14AM +0200, Nirmoy Das wrote:
> i915_gem_drain_freed_objects() might not be enough to
> free all the objects and RCU delayed work might get
> scheduled after the i915 device struct gets freed.
> 
> Call i915_gem_drain_workqueue() to catch all RCU delayed work.

shard-snb is stil hitting the mm.shrink_count WARNn reliably,
and things go downhill after that.

> 
> Suggested-by: Chris Wilson <chris.p.wilson@intel.com>
> Acked-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
> Signed-off-by: Nirmoy Das <nirmoy.das@intel.com>
> ---
>  drivers/gpu/drm/i915/i915_gem.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c
> index 88df9a35e0fe..7541028caebd 100644
> --- a/drivers/gpu/drm/i915/i915_gem.c
> +++ b/drivers/gpu/drm/i915/i915_gem.c
> @@ -1278,7 +1278,7 @@ void i915_gem_init_early(struct drm_i915_private *dev_priv)
>  
>  void i915_gem_cleanup_early(struct drm_i915_private *dev_priv)
>  {
> -	i915_gem_drain_freed_objects(dev_priv);
> +	i915_gem_drain_workqueue(dev_priv);
>  	GEM_BUG_ON(!llist_empty(&dev_priv->mm.free_list));
>  	GEM_BUG_ON(atomic_read(&dev_priv->mm.free_count));
>  	drm_WARN_ON(&dev_priv->drm, dev_priv->mm.shrink_count);
> -- 
> 2.37.3
Nirmoy Das Oct. 24, 2022, 8:08 a.m. UTC | #5
On 10/21/2022 6:34 PM, Ville Syrjälä wrote:
> On Fri, Sep 23, 2022 at 09:35:14AM +0200, Nirmoy Das wrote:
>> i915_gem_drain_freed_objects() might not be enough to
>> free all the objects and RCU delayed work might get
>> scheduled after the i915 device struct gets freed.
>>
>> Call i915_gem_drain_workqueue() to catch all RCU delayed work.
> shard-snb is stil hitting the mm.shrink_count WARNn reliably,
> and things go downhill after that.


Looks better now again. Going to look into that.


Thanks,

Nirmoy

>
>> Suggested-by: Chris Wilson <chris.p.wilson@intel.com>
>> Acked-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
>> Signed-off-by: Nirmoy Das <nirmoy.das@intel.com>
>> ---
>>   drivers/gpu/drm/i915/i915_gem.c | 2 +-
>>   1 file changed, 1 insertion(+), 1 deletion(-)
>>
>> diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c
>> index 88df9a35e0fe..7541028caebd 100644
>> --- a/drivers/gpu/drm/i915/i915_gem.c
>> +++ b/drivers/gpu/drm/i915/i915_gem.c
>> @@ -1278,7 +1278,7 @@ void i915_gem_init_early(struct drm_i915_private *dev_priv)
>>   
>>   void i915_gem_cleanup_early(struct drm_i915_private *dev_priv)
>>   {
>> -	i915_gem_drain_freed_objects(dev_priv);
>> +	i915_gem_drain_workqueue(dev_priv);
>>   	GEM_BUG_ON(!llist_empty(&dev_priv->mm.free_list));
>>   	GEM_BUG_ON(atomic_read(&dev_priv->mm.free_count));
>>   	drm_WARN_ON(&dev_priv->drm, dev_priv->mm.shrink_count);
>> -- 
>> 2.37.3
Ville Syrjala Nov. 2, 2022, 5:55 p.m. UTC | #6
On Mon, Oct 24, 2022 at 10:08:29AM +0200, Das, Nirmoy wrote:
> 
> On 10/21/2022 6:34 PM, Ville Syrjälä wrote:
> > On Fri, Sep 23, 2022 at 09:35:14AM +0200, Nirmoy Das wrote:
> >> i915_gem_drain_freed_objects() might not be enough to
> >> free all the objects and RCU delayed work might get
> >> scheduled after the i915 device struct gets freed.
> >>
> >> Call i915_gem_drain_workqueue() to catch all RCU delayed work.
> > shard-snb is stil hitting the mm.shrink_count WARNn reliably,
> > and things go downhill after that.
> 
> 
> Looks better now again. Going to look into that.

Looks to be still hitting it occasionally in module reload tests:
https://intel-gfx-ci.01.org/tree/drm-tip/IGT_7033/shard-snb5/igt@i915_module_load@reload.html
https://intel-gfx-ci.01.org/tree/drm-tip/IGT_7035/shard-snb7/igt@perf_pmu@module-unload.html

> 
> 
> Thanks,
> 
> Nirmoy
> 
> >
> >> Suggested-by: Chris Wilson <chris.p.wilson@intel.com>
> >> Acked-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
> >> Signed-off-by: Nirmoy Das <nirmoy.das@intel.com>
> >> ---
> >>   drivers/gpu/drm/i915/i915_gem.c | 2 +-
> >>   1 file changed, 1 insertion(+), 1 deletion(-)
> >>
> >> diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c
> >> index 88df9a35e0fe..7541028caebd 100644
> >> --- a/drivers/gpu/drm/i915/i915_gem.c
> >> +++ b/drivers/gpu/drm/i915/i915_gem.c
> >> @@ -1278,7 +1278,7 @@ void i915_gem_init_early(struct drm_i915_private *dev_priv)
> >>   
> >>   void i915_gem_cleanup_early(struct drm_i915_private *dev_priv)
> >>   {
> >> -	i915_gem_drain_freed_objects(dev_priv);
> >> +	i915_gem_drain_workqueue(dev_priv);
> >>   	GEM_BUG_ON(!llist_empty(&dev_priv->mm.free_list));
> >>   	GEM_BUG_ON(atomic_read(&dev_priv->mm.free_count));
> >>   	drm_WARN_ON(&dev_priv->drm, dev_priv->mm.shrink_count);
> >> -- 
> >> 2.37.3
Nirmoy Das Nov. 2, 2022, 6:37 p.m. UTC | #7
Hi Ville,

On 11/2/2022 6:55 PM, Ville Syrjälä wrote:
> On Mon, Oct 24, 2022 at 10:08:29AM +0200, Das, Nirmoy wrote:
>> On 10/21/2022 6:34 PM, Ville Syrjälä wrote:
>>> On Fri, Sep 23, 2022 at 09:35:14AM +0200, Nirmoy Das wrote:
>>>> i915_gem_drain_freed_objects() might not be enough to
>>>> free all the objects and RCU delayed work might get
>>>> scheduled after the i915 device struct gets freed.
>>>>
>>>> Call i915_gem_drain_workqueue() to catch all RCU delayed work.
>>> shard-snb is stil hitting the mm.shrink_count WARNn reliably,
>>> and things go downhill after that.
>>
>> Looks better now again. Going to look into that.
> Looks to be still hitting it occasionally in module reload tests:
> https://intel-gfx-ci.01.org/tree/drm-tip/IGT_7033/shard-snb5/igt@i915_module_load@reload.html
> https://intel-gfx-ci.01.org/tree/drm-tip/IGT_7035/shard-snb7/igt@perf_pmu@module-unload.html


There are no snb in RIl so I ran this test on tgl-u for 6+ hours without 
any reproduction. Not sure why snb is so special here.

May be we need your previous patch as well ? I will be on vacation from 
next week so unfortunately I won't be able work on it for few  weeks.


Regards,

Nirmoy

>
>>
>> Thanks,
>>
>> Nirmoy
>>
>>>> Suggested-by: Chris Wilson <chris.p.wilson@intel.com>
>>>> Acked-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
>>>> Signed-off-by: Nirmoy Das <nirmoy.das@intel.com>
>>>> ---
>>>>    drivers/gpu/drm/i915/i915_gem.c | 2 +-
>>>>    1 file changed, 1 insertion(+), 1 deletion(-)
>>>>
>>>> diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c
>>>> index 88df9a35e0fe..7541028caebd 100644
>>>> --- a/drivers/gpu/drm/i915/i915_gem.c
>>>> +++ b/drivers/gpu/drm/i915/i915_gem.c
>>>> @@ -1278,7 +1278,7 @@ void i915_gem_init_early(struct drm_i915_private *dev_priv)
>>>>    
>>>>    void i915_gem_cleanup_early(struct drm_i915_private *dev_priv)
>>>>    {
>>>> -	i915_gem_drain_freed_objects(dev_priv);
>>>> +	i915_gem_drain_workqueue(dev_priv);
>>>>    	GEM_BUG_ON(!llist_empty(&dev_priv->mm.free_list));
>>>>    	GEM_BUG_ON(atomic_read(&dev_priv->mm.free_count));
>>>>    	drm_WARN_ON(&dev_priv->drm, dev_priv->mm.shrink_count);
>>>> -- 
>>>> 2.37.3
diff mbox series

Patch

diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c
index 88df9a35e0fe..7541028caebd 100644
--- a/drivers/gpu/drm/i915/i915_gem.c
+++ b/drivers/gpu/drm/i915/i915_gem.c
@@ -1278,7 +1278,7 @@  void i915_gem_init_early(struct drm_i915_private *dev_priv)
 
 void i915_gem_cleanup_early(struct drm_i915_private *dev_priv)
 {
-	i915_gem_drain_freed_objects(dev_priv);
+	i915_gem_drain_workqueue(dev_priv);
 	GEM_BUG_ON(!llist_empty(&dev_priv->mm.free_list));
 	GEM_BUG_ON(atomic_read(&dev_priv->mm.free_count));
 	drm_WARN_ON(&dev_priv->drm, dev_priv->mm.shrink_count);