diff mbox series

[2/3] drm/i915/ttm: Fix lockdep warning in __i915_gem_free_object()

Message ID 20210922083807.888206-3-thomas.hellstrom@linux.intel.com (mailing list archive)
State New, archived
Headers show
Series DG1 Lockdep warning fixes | expand

Commit Message

Thomas Hellström Sept. 22, 2021, 8:38 a.m. UTC
In the mman selftest, some tests make the ttm_bo_init_reserved() fail,
which may trigger a call to the i915_ttm_bo_destroy() function.
However, at this point the gem object refcount is set to 1, which
triggers a lockdep warning in __i915_gem_free_object() and a
corresponding failure in DG1 BAT, i915_selftest@live@mman.

Fix this by clearing the gem object refcount if called from that
failure path.

Fixes: f9b23c157a78 ("drm/i915: Move __i915_gem_free_object to ttm_bo_destroy")
Cc: Maarten Lankhorst <maarten.lankhorst@linux.intel.com>
Signed-off-by: Thomas Hellström <thomas.hellstrom@linux.intel.com>
---
 drivers/gpu/drm/i915/gem/i915_gem_ttm.c | 4 ++++
 1 file changed, 4 insertions(+)

Comments

Matthew Auld Sept. 22, 2021, 10:55 a.m. UTC | #1
On Wed, 22 Sept 2021 at 09:38, Thomas Hellström
<thomas.hellstrom@linux.intel.com> wrote:
>
> In the mman selftest, some tests make the ttm_bo_init_reserved() fail,
> which may trigger a call to the i915_ttm_bo_destroy() function.
> However, at this point the gem object refcount is set to 1, which
> triggers a lockdep warning in __i915_gem_free_object() and a
> corresponding failure in DG1 BAT, i915_selftest@live@mman.
>
> Fix this by clearing the gem object refcount if called from that
> failure path.
>
> Fixes: f9b23c157a78 ("drm/i915: Move __i915_gem_free_object to ttm_bo_destroy")
> Cc: Maarten Lankhorst <maarten.lankhorst@linux.intel.com>
> Signed-off-by: Thomas Hellström <thomas.hellstrom@linux.intel.com>
> ---
>  drivers/gpu/drm/i915/gem/i915_gem_ttm.c | 4 ++++
>  1 file changed, 4 insertions(+)
>
> diff --git a/drivers/gpu/drm/i915/gem/i915_gem_ttm.c b/drivers/gpu/drm/i915/gem/i915_gem_ttm.c
> index b94497989995..b1f561543ff3 100644
> --- a/drivers/gpu/drm/i915/gem/i915_gem_ttm.c
> +++ b/drivers/gpu/drm/i915/gem/i915_gem_ttm.c
> @@ -900,6 +900,10 @@ void i915_ttm_bo_destroy(struct ttm_buffer_object *bo)
>
>         i915_ttm_backup_free(obj);
>
> +       /* Failure during ttm_bo_init_reserved leaves the refcount set to 1. */
> +       if (IS_ENABLED(CONFIG_LOCKDEP) && !obj->ttm.created)
> +               refcount_set(&obj->base.refcount.refcount, 0);
> +
>         /* This releases all gem object bindings to the backend. */
>         __i915_gem_free_object(obj);

The __i915_gem_free_object is also nuking stuff like mm.placements,
which is still owned by the caller AFAIK, or at least it is until we
have successfully initialised the object, so smells like potential
double free? Can we easily move that under the ttm.created check?
Otherwise maybe we are meant to move the mm.placements handling into
the RCU callback?

>
> --
> 2.31.1
>
Thomas Hellström Sept. 22, 2021, 11:34 a.m. UTC | #2
On 9/22/21 12:55 PM, Matthew Auld wrote:
> On Wed, 22 Sept 2021 at 09:38, Thomas Hellström
> <thomas.hellstrom@linux.intel.com> wrote:
>> In the mman selftest, some tests make the ttm_bo_init_reserved() fail,
>> which may trigger a call to the i915_ttm_bo_destroy() function.
>> However, at this point the gem object refcount is set to 1, which
>> triggers a lockdep warning in __i915_gem_free_object() and a
>> corresponding failure in DG1 BAT, i915_selftest@live@mman.
>>
>> Fix this by clearing the gem object refcount if called from that
>> failure path.
>>
>> Fixes: f9b23c157a78 ("drm/i915: Move __i915_gem_free_object to ttm_bo_destroy")
>> Cc: Maarten Lankhorst <maarten.lankhorst@linux.intel.com>
>> Signed-off-by: Thomas Hellström <thomas.hellstrom@linux.intel.com>
>> ---
>>   drivers/gpu/drm/i915/gem/i915_gem_ttm.c | 4 ++++
>>   1 file changed, 4 insertions(+)
>>
>> diff --git a/drivers/gpu/drm/i915/gem/i915_gem_ttm.c b/drivers/gpu/drm/i915/gem/i915_gem_ttm.c
>> index b94497989995..b1f561543ff3 100644
>> --- a/drivers/gpu/drm/i915/gem/i915_gem_ttm.c
>> +++ b/drivers/gpu/drm/i915/gem/i915_gem_ttm.c
>> @@ -900,6 +900,10 @@ void i915_ttm_bo_destroy(struct ttm_buffer_object *bo)
>>
>>          i915_ttm_backup_free(obj);
>>
>> +       /* Failure during ttm_bo_init_reserved leaves the refcount set to 1. */
>> +       if (IS_ENABLED(CONFIG_LOCKDEP) && !obj->ttm.created)
>> +               refcount_set(&obj->base.refcount.refcount, 0);
>> +
>>          /* This releases all gem object bindings to the backend. */
>>          __i915_gem_free_object(obj);
> The __i915_gem_free_object is also nuking stuff like mm.placements,
> which is still owned by the caller AFAIK, or at least it is until we
> have successfully initialised the object, so smells like potential
> double free? Can we easily move that under the ttm.created check?
> Otherwise maybe we are meant to move the mm.placements handling into
> the RCU callback?

Yes, it indeed sounds like a closer look is needed for the error 
handling here. Perhaps it makes sense to initialize the TTM part and 
then the GEM part while still having the lock. Meanwhile I'll put it 
under the ttm.created check.

Thanks,

Thomas


>
>> --
>> 2.31.1
>>
diff mbox series

Patch

diff --git a/drivers/gpu/drm/i915/gem/i915_gem_ttm.c b/drivers/gpu/drm/i915/gem/i915_gem_ttm.c
index b94497989995..b1f561543ff3 100644
--- a/drivers/gpu/drm/i915/gem/i915_gem_ttm.c
+++ b/drivers/gpu/drm/i915/gem/i915_gem_ttm.c
@@ -900,6 +900,10 @@  void i915_ttm_bo_destroy(struct ttm_buffer_object *bo)
 
 	i915_ttm_backup_free(obj);
 
+	/* Failure during ttm_bo_init_reserved leaves the refcount set to 1. */
+	if (IS_ENABLED(CONFIG_LOCKDEP) && !obj->ttm.created)
+		refcount_set(&obj->base.refcount.refcount, 0);
+
 	/* This releases all gem object bindings to the backend. */
 	__i915_gem_free_object(obj);