
[RFC] drm/ttm: Simplify the delayed destroy locking

Message ID 20210412131740.10257-1-thomas.hellstrom@linux.intel.com (mailing list archive)
State New, archived
Series [RFC] drm/ttm: Simplify the delayed destroy locking

Commit Message

Thomas Hellström April 12, 2021, 1:17 p.m. UTC
This RFC needs some decent testing on a driver with bos that share
reservation objects, and of course a check for whether I missed
something obvious.

The locking around delayed destroy is rather complex due to the fact
that we want to individualize dma_resv pointers before putting the object
on the delayed-destroy list. That individualization is currently protected
by the lru lock forcing us to do try-reservations under the lru lock when
reserving an object from the lru- or ddestroy lists, which complicates
moving over to per-manager lru locks.

Instead protect the individualization with the object kref == 0,
and ensure kref != 0 before trying to reserve an object from the list.
Remove the lru lock around individualization and protect the delayed
destroy list with a separate spinlock. Isolate the delayed destroy list
similarly to a lockless list before traversing it. Also don't try to drop
resource references from the delayed destroy list traversal, but leave
that to the final object release. The role of the delayed destroy thread
will now simply be to try to idle the object and when idle, drop
the delayed destroy reference.

This should pave the way for per-manager lru lists, sleeping eviction locks
that are kept, optional drm_mm eviction roster usage and moving over to a
completely lockless delayed destroy list even if the traversal order
will then change because there is no llist_add_tail().
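
As an illustration of the lookup rule above, here is a minimal sketch using
only generic kernel primitives. The struct and function names (my_bo, my_dev,
my_lru_scan) are made up for the example; the patch itself uses
ttm_bo_get_unless_zero() and dma_resv_trylock() on struct ttm_buffer_object.

#include <linux/kref.h>
#include <linux/list.h>
#include <linux/spinlock.h>

struct my_bo {
	struct kref kref;	/* zero while the resv pointer may change */
	struct list_head lru;
};

struct my_dev {
	spinlock_t lru_lock;
	struct list_head lru;
};

/* Return the first LRU object we managed to pin with a nonzero refcount. */
static struct my_bo *my_lru_scan(struct my_dev *dev)
{
	struct my_bo *bo;

	spin_lock(&dev->lru_lock);
	list_for_each_entry(bo, &dev->lru, lru) {
		/*
		 * kref == 0 means the object may be individualizing its
		 * dma_resv pointer, so skip it. A successful get pairs
		 * with the smp_mb() after the pointer update.
		 */
		if (kref_get_unless_zero(&bo->kref)) {
			spin_unlock(&dev->lru_lock);
			return bo;	/* safe to try-reserve the object now */
		}
	}
	spin_unlock(&dev->lru_lock);

	return NULL;
}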

Cc: Christian König <christian.koenig@amd.com>
Cc: Daniel Vetter <daniel@ffwll.ch>
Signed-off-by: Thomas Hellström <thomas.hellstrom@linux.intel.com>
---
 drivers/gpu/drm/ttm/ttm_bo.c     | 193 +++++++++++++------------------
 drivers/gpu/drm/ttm/ttm_device.c |   1 +
 include/drm/ttm/ttm_device.h     |   1 +
 3 files changed, 82 insertions(+), 113 deletions(-)

Comments

Christian König April 12, 2021, 2:01 p.m. UTC | #1
Hi Thomas,

well, in general a good idea, but I've been working on a different plan
for a while now.

My idea here is that instead of the BO, the resource object is kept on a
doubly linked LRU list.

The resource objects then have a pointer to either the BO or a fence object.

When it is a fence object, we can just grab a reference to it and wait
for it to complete.

If it is a BO, we evict it the same way we currently do.

This allows us to remove the delayed delete, individualization of BOs,
ghost objects, etc.
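
Roughly, as a sketch of the idea (all struct, field and function names here
are made up for illustration, not the actual work-in-progress code):

#include <linux/dma-fence.h>
#include <linux/errno.h>
#include <linux/jiffies.h>
#include <linux/list.h>

struct my_bo;				/* opaque here */

/* Placeholder for the existing BO eviction path. */
static int my_evict_bo(struct my_bo *bo)
{
	(void)bo;
	return 0;
}

struct my_resource {
	struct list_head lru;		/* the resource, not the BO, is on the LRU */
	struct my_bo *bo;		/* valid while the BO is still alive */
	struct dma_fence *fence;	/* set instead of @bo once the BO is gone */
};

static int my_evict_one(struct my_resource *res)
{
	if (res->fence) {
		struct dma_fence *fence = dma_fence_get(res->fence);
		long lret;

		/* The BO is already destroyed; just wait for its last fence. */
		lret = dma_fence_wait_timeout(fence, false, 30 * HZ);
		dma_fence_put(fence);

		return lret > 0 ? 0 : (lret < 0 ? (int)lret : -EBUSY);
	}

	/* The BO is still alive; evict it the same way we do today. */
	return my_evict_bo(res->bo);
}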

Regards,
Christian.

On 12.04.21 at 15:17, Thomas Hellström wrote:
> This RFC needs some decent testing on a driver with bos that share
> reservation objects, and of course a check for whether I missed
> something obvious.
>
> The locking around delayed destroy is rather complex due to the fact
> that we want to individualize dma_resv pointers before putting the object
> on the delayed-destroy list. That individualization is currently protected
> by the lru lock forcing us to do try-reservations under the lru lock when
> reserving an object from the lru- or ddestroy lists, which complicates
> moving over to per-manager lru locks.
>
> Instead protect the individualization with the object kref == 0,
> and ensure kref != 0 before trying to reserve an object from the list.
> Remove the lru lock around individualization and protect the delayed
> destroy list with a separate spinlock. Isolate the delayed destroy list
> similarly to a lockless list before traversing it. Also don't try to drop
> resource references from the delayed destroy list traversal, but leave
> that to the final object release. The role of the delayed destroy thread
> will now simply be to try to idle the object and when idle, drop
> the delayed destroy reference.
>
> This should pave the way for per-manager lru lists, sleeping eviction locks
> that are kept, optional drm_mm eviction roster usage and moving over to a
> completely lockless delayed destroy list even if the traversal order
> will then change because there is no llist_add_tail().
>
> Cc: Christian König <christian.koenig@amd.com>
> Cc: Daniel Vetter <daniel@ffwll.ch>
> Signed-off-by: Thomas Hellström <thomas.hellstrom@linux.intel.com>
> ---
>   drivers/gpu/drm/ttm/ttm_bo.c     | 193 +++++++++++++------------------
>   drivers/gpu/drm/ttm/ttm_device.c |   1 +
>   include/drm/ttm/ttm_device.h     |   1 +
>   3 files changed, 82 insertions(+), 113 deletions(-)
>
> diff --git a/drivers/gpu/drm/ttm/ttm_bo.c b/drivers/gpu/drm/ttm/ttm_bo.c
> index 6ab7b66ce36d..fd57252bc8fe 100644
> --- a/drivers/gpu/drm/ttm/ttm_bo.c
> +++ b/drivers/gpu/drm/ttm/ttm_bo.c
> @@ -217,6 +217,9 @@ static int ttm_bo_handle_move_mem(struct ttm_buffer_object *bo,
>   
>   static void ttm_bo_cleanup_memtype_use(struct ttm_buffer_object *bo)
>   {
> +	if (kref_read(&bo->kref))
> +		dma_resv_assert_held(bo->base.resv);
> +
>   	if (bo->bdev->funcs->delete_mem_notify)
>   		bo->bdev->funcs->delete_mem_notify(bo);
>   
> @@ -239,13 +242,18 @@ static int ttm_bo_individualize_resv(struct ttm_buffer_object *bo)
>   		return r;
>   
>   	if (bo->type != ttm_bo_type_sg) {
> -		/* This works because the BO is about to be destroyed and nobody
> -		 * reference it any more. The only tricky case is the trylock on
> -		 * the resv object while holding the lru_lock.
> +		/*
> +		 * This works because nobody besides us can lock the bo or
> +		 * assume it is locked while its refcount is zero.
>   		 */
> -		spin_lock(&bo->bdev->lru_lock);
> +		WARN_ON_ONCE(kref_read(&bo->kref));
>   		bo->base.resv = &bo->base._resv;
> -		spin_unlock(&bo->bdev->lru_lock);
> +		/*
> +		 * Don't reorder against kref_init().
> +		 * Matches the implicit full barrier of a successful
> +		 * kref_get_unless_zero() after a lru_list_lookup.
> +		 */
> +		smp_mb();
>   	}
>   
>   	return r;
> @@ -274,122 +282,66 @@ static void ttm_bo_flush_all_fences(struct ttm_buffer_object *bo)
>   }
>   
>   /**
> - * function ttm_bo_cleanup_refs
> - * If bo idle, remove from lru lists, and unref.
> - * If not idle, block if possible.
> - *
> - * Must be called with lru_lock and reservation held, this function
> - * will drop the lru lock and optionally the reservation lock before returning.
> + * ttm_bo_cleanup_refs - idle and remove pages and gpu-bindings and remove
> + * from lru_lists.
>    *
>    * @bo:                    The buffer object to clean-up
>    * @interruptible:         Any sleeps should occur interruptibly.
> - * @no_wait_gpu:           Never wait for gpu. Return -EBUSY instead.
> - * @unlock_resv:           Unlock the reservation lock as well.
> + * @no_wait_gpu:           Never wait for gpu. Return -EBUSY if not idle.
>    */
> -
>   static int ttm_bo_cleanup_refs(struct ttm_buffer_object *bo,
> -			       bool interruptible, bool no_wait_gpu,
> -			       bool unlock_resv)
> +			       bool interruptible, bool no_wait_gpu)
>   {
> -	struct dma_resv *resv = &bo->base._resv;
>   	int ret;
>   
> -	if (dma_resv_test_signaled_rcu(resv, true))
> -		ret = 0;
> -	else
> -		ret = -EBUSY;
> -
> -	if (ret && !no_wait_gpu) {
> -		long lret;
> -
> -		if (unlock_resv)
> -			dma_resv_unlock(bo->base.resv);
> -		spin_unlock(&bo->bdev->lru_lock);
> -
> -		lret = dma_resv_wait_timeout_rcu(resv, true, interruptible,
> -						 30 * HZ);
> -
> -		if (lret < 0)
> -			return lret;
> -		else if (lret == 0)
> -			return -EBUSY;
> -
> -		spin_lock(&bo->bdev->lru_lock);
> -		if (unlock_resv && !dma_resv_trylock(bo->base.resv)) {
> -			/*
> -			 * We raced, and lost, someone else holds the reservation now,
> -			 * and is probably busy in ttm_bo_cleanup_memtype_use.
> -			 *
> -			 * Even if it's not the case, because we finished waiting any
> -			 * delayed destruction would succeed, so just return success
> -			 * here.
> -			 */
> -			spin_unlock(&bo->bdev->lru_lock);
> -			return 0;
> -		}
> -		ret = 0;
> -	}
> -
> -	if (ret || unlikely(list_empty(&bo->ddestroy))) {
> -		if (unlock_resv)
> -			dma_resv_unlock(bo->base.resv);
> -		spin_unlock(&bo->bdev->lru_lock);
> +	dma_resv_assert_held(bo->base.resv);
> +	WARN_ON_ONCE(!bo->deleted);
> +	ret = ttm_bo_wait(bo, interruptible, no_wait_gpu);
> +	if (ret)
>   		return ret;
> -	}
>   
> +	ttm_bo_cleanup_memtype_use(bo);
> +	spin_lock(&bo->bdev->lru_lock);
>   	ttm_bo_del_from_lru(bo);
> -	list_del_init(&bo->ddestroy);
>   	spin_unlock(&bo->bdev->lru_lock);
> -	ttm_bo_cleanup_memtype_use(bo);
> -
> -	if (unlock_resv)
> -		dma_resv_unlock(bo->base.resv);
> -
> -	ttm_bo_put(bo);
>   
>   	return 0;
>   }
>   
>   /*
> - * Traverse the delayed list, and call ttm_bo_cleanup_refs on all
> - * encountered buffers.
> + * Isolate a part of the delayed destroy list and check for idle on all
> + * buffer objects on it. If idle, remove from the list and drop the
> + * delayed destroy refcount. Return all busy buffers to the delayed
> + * destroy list.
>    */
>   bool ttm_bo_delayed_delete(struct ttm_device *bdev, bool remove_all)
>   {
> -	struct list_head removed;
> -	bool empty;
> -
> -	INIT_LIST_HEAD(&removed);
> -
> -	spin_lock(&bdev->lru_lock);
> -	while (!list_empty(&bdev->ddestroy)) {
> -		struct ttm_buffer_object *bo;
> -
> -		bo = list_first_entry(&bdev->ddestroy, struct ttm_buffer_object,
> -				      ddestroy);
> -		list_move_tail(&bo->ddestroy, &removed);
> -		if (!ttm_bo_get_unless_zero(bo))
> -			continue;
> -
> -		if (remove_all || bo->base.resv != &bo->base._resv) {
> -			spin_unlock(&bdev->lru_lock);
> -			dma_resv_lock(bo->base.resv, NULL);
> -
> -			spin_lock(&bdev->lru_lock);
> -			ttm_bo_cleanup_refs(bo, false, !remove_all, true);
> -
> -		} else if (dma_resv_trylock(bo->base.resv)) {
> -			ttm_bo_cleanup_refs(bo, false, !remove_all, true);
> -		} else {
> -			spin_unlock(&bdev->lru_lock);
> +	struct ttm_buffer_object *bo, *next;
> +	struct list_head isolated;
> +	bool empty, idle;
> +
> +	INIT_LIST_HEAD(&isolated);
> +
> +	spin_lock(&bdev->ddestroy_lock);
> +	list_splice_init(&bdev->ddestroy, &isolated);
> +	spin_unlock(&bdev->ddestroy_lock);
> +	list_for_each_entry_safe(bo, next, &isolated, ddestroy) {
> +		WARN_ON_ONCE(!kref_read(&bo->kref) ||
> +			     bo->base.resv != &bo->base._resv);
> +		if (!remove_all)
> +			idle = dma_resv_test_signaled_rcu(bo->base.resv, true);
> +		else
> +			idle = (dma_resv_wait_timeout_rcu(bo->base.resv, true,
> +							  false, 30 * HZ) > 0);
> +		if (idle) {
> +			list_del_init(&bo->ddestroy);
> +			ttm_bo_put(bo);
>   		}
> -
> -		ttm_bo_put(bo);
> -		spin_lock(&bdev->lru_lock);
>   	}
> -	list_splice_tail(&removed, &bdev->ddestroy);
> +	spin_lock(&bdev->ddestroy_lock);
> +	list_splice(&isolated, &bdev->ddestroy);
>   	empty = list_empty(&bdev->ddestroy);
> -	spin_unlock(&bdev->lru_lock);
> +	spin_unlock(&bdev->ddestroy_lock);
>   
>   	return empty;
>   }
> @@ -405,10 +357,16 @@ static void ttm_bo_release(struct kref *kref)
>   		ret = ttm_bo_individualize_resv(bo);
>   		if (ret) {
>   			/* Last resort, if we fail to allocate memory for the
> -			 * fences block for the BO to become idle
> +			 * fences block for the BO to become idle, and then
> +			 * we are free to update the resv pointer.
>   			 */
>   			dma_resv_wait_timeout_rcu(bo->base.resv, true, false,
>   						  30 * HZ);
> +			if (bo->type != ttm_bo_type_sg) {
> +				bo->base.resv = &bo->base._resv;
> +				/* Please see ttm_bo_individualize_resv() */
> +				smp_mb();
> +			}
>   		}
>   
>   		if (bo->bdev->funcs->release_notify)
> @@ -422,9 +380,8 @@ static void ttm_bo_release(struct kref *kref)
>   	    !dma_resv_trylock(bo->base.resv)) {
>   		/* The BO is not idle, resurrect it for delayed destroy */
>   		ttm_bo_flush_all_fences(bo);
> -		bo->deleted = true;
> -
>   		spin_lock(&bo->bdev->lru_lock);
> +		bo->deleted = true;
>   
>   		/*
>   		 * Make pinned bos immediately available to
> @@ -439,8 +396,10 @@ static void ttm_bo_release(struct kref *kref)
>   			ttm_bo_move_to_lru_tail(bo, &bo->mem, NULL);
>   		}
>   
> +		spin_lock(&bo->bdev->ddestroy_lock);
>   		kref_init(&bo->kref);
>   		list_add_tail(&bo->ddestroy, &bdev->ddestroy);
> +		spin_unlock(&bo->bdev->ddestroy_lock);
>   		spin_unlock(&bo->bdev->lru_lock);
>   
>   		schedule_delayed_work(&bdev->wq,
> @@ -450,9 +409,11 @@ static void ttm_bo_release(struct kref *kref)
>   
>   	spin_lock(&bo->bdev->lru_lock);
>   	ttm_bo_del_from_lru(bo);
> -	list_del(&bo->ddestroy);
>   	spin_unlock(&bo->bdev->lru_lock);
>   
> +	pr_info("Deleting.\n");
> +	WARN_ON_ONCE(!list_empty_careful(&bo->ddestroy));
> +
>   	ttm_bo_cleanup_memtype_use(bo);
>   	dma_resv_unlock(bo->base.resv);
>   
> @@ -630,25 +591,26 @@ int ttm_mem_evict_first(struct ttm_device *bdev,
>   		list_for_each_entry(bo, &man->lru[i], lru) {
>   			bool busy;
>   
> +			if (!ttm_bo_get_unless_zero(bo))
> +				continue;
> +
>   			if (!ttm_bo_evict_swapout_allowable(bo, ctx, &locked,
>   							    &busy)) {
>   				if (busy && !busy_bo && ticket !=
>   				    dma_resv_locking_ctx(bo->base.resv))
>   					busy_bo = bo;
> +				ttm_bo_put(bo);
>   				continue;
>   			}
>   
>   			if (place && !bdev->funcs->eviction_valuable(bo,
>   								      place)) {
> +				ttm_bo_put(bo);
>   				if (locked)
>   					dma_resv_unlock(bo->base.resv);
>   				continue;
>   			}
> -			if (!ttm_bo_get_unless_zero(bo)) {
> -				if (locked)
> -					dma_resv_unlock(bo->base.resv);
> -				continue;
> -			}
> +
>   			break;
>   		}
>   
> @@ -670,8 +632,11 @@ int ttm_mem_evict_first(struct ttm_device *bdev,
>   	}
>   
>   	if (bo->deleted) {
> +		spin_unlock(&bdev->lru_lock);
>   		ret = ttm_bo_cleanup_refs(bo, ctx->interruptible,
> -					  ctx->no_wait_gpu, locked);
> +					  ctx->no_wait_gpu);
> +		if (locked)
> +			dma_resv_unlock(bo->base.resv);
>   		ttm_bo_put(bo);
>   		return ret;
>   	}
> @@ -1168,17 +1133,19 @@ int ttm_bo_swapout(struct ttm_buffer_object *bo, struct ttm_operation_ctx *ctx,
>   	bool locked;
>   	int ret;
>   
> -	if (!ttm_bo_evict_swapout_allowable(bo, ctx, &locked, NULL))
> +	if (!ttm_bo_get_unless_zero(bo))
>   		return -EBUSY;
>   
> -	if (!ttm_bo_get_unless_zero(bo)) {
> -		if (locked)
> -			dma_resv_unlock(bo->base.resv);
> +	if (!ttm_bo_evict_swapout_allowable(bo, ctx, &locked, NULL)) {
> +		ttm_bo_put(bo);
>   		return -EBUSY;
>   	}
>   
>   	if (bo->deleted) {
> -		ttm_bo_cleanup_refs(bo, false, false, locked);
> +		spin_unlock(&bo->bdev->lru_lock);
> +		ttm_bo_cleanup_refs(bo, false, false);
> +		if (locked)
> +			dma_resv_unlock(bo->base.resv);
>   		ttm_bo_put(bo);
>   		return 0;
>   	}
> diff --git a/drivers/gpu/drm/ttm/ttm_device.c b/drivers/gpu/drm/ttm/ttm_device.c
> index 9b787b3caeb5..b941989885ed 100644
> --- a/drivers/gpu/drm/ttm/ttm_device.c
> +++ b/drivers/gpu/drm/ttm/ttm_device.c
> @@ -228,6 +228,7 @@ int ttm_device_init(struct ttm_device *bdev, struct ttm_device_funcs *funcs,
>   	bdev->vma_manager = vma_manager;
>   	INIT_DELAYED_WORK(&bdev->wq, ttm_device_delayed_workqueue);
>   	spin_lock_init(&bdev->lru_lock);
> +	spin_lock_init(&bdev->ddestroy_lock);
>   	INIT_LIST_HEAD(&bdev->ddestroy);
>   	bdev->dev_mapping = mapping;
>   	mutex_lock(&ttm_global_mutex);
> diff --git a/include/drm/ttm/ttm_device.h b/include/drm/ttm/ttm_device.h
> index 7c8f87bd52d3..fa8c4f6a169e 100644
> --- a/include/drm/ttm/ttm_device.h
> +++ b/include/drm/ttm/ttm_device.h
> @@ -279,6 +279,7 @@ struct ttm_device {
>   	 * Protection for the per manager LRU and ddestroy lists.
>   	 */
>   	spinlock_t lru_lock;
> +	spinlock_t ddestroy_lock;
>   	struct list_head ddestroy;
>   
>   	/*
Thomas Hellström April 12, 2021, 2:16 p.m. UTC | #2
Hi, Christian,

On 4/12/21 4:01 PM, Christian König wrote:
> Hi Thomas,
>
> well in general a good idea, but I'm working on a different plan for a 
> while now.
>
> My idea here is that instead of the BO the resource object is kept on 
> a double linked lru list.
>
> The resource objects then have a pointer to either the BO or a fence 
> object.
>
> When it is a fence object we can just grab a reference to it and wait 
> for it to complete.
>
> If it is a BO we evict it the same way we currently do.
>
> This allows to remove both the delayed delete, individualization of 
> BOs, ghost objects etc...

Hmm, ok. So in that case, what would trigger the final release of the 
buffer object in the absence of a delayed delete list? Would we use a 
fence callback for that?

Otherwise it sounds like a good idea, and if it runs into problems, we 
could possibly resurrect this.

/Thomas




>
> Regards,
> Christian.
>
> On 12.04.21 at 15:17, Thomas Hellström wrote:
>> This RFC needs some decent testing on a driver with bos that share
>> reservation objects, and of course a check for whether I missed
>> something obvious.
>>
>> The locking around delayed destroy is rather complex due to the fact
>> that we want to individualize dma_resv pointers before putting the 
>> object
>> on the delayed-destroy list. That individualization is currently 
>> protected
>> by the lru lock forcing us to do try-reservations under the lru lock 
>> when
>> reserving an object from the lru- or ddestroy lists, which complicates
>> moving over to per-manager lru locks.
>>
>> Instead protect the individualization with the object kref == 0,
>> and ensure kref != 0 before trying to reserve an object from the list.
>> Remove the lru lock around individualization and protect the delayed
>> destroy list with a separate spinlock. Isolate the delayed destroy list
>> similarly to a lockless list before traversing it. Also don't try to 
>> drop
>> resource references from the delayed destroy list traversal, but leave
>> that to the final object release. The role of the delayed destroy thread
>> will now simply be to try to idle the object and when idle, drop
>> the delayed destroy reference.
>>
>> This should pave the way for per-manager lru lists, sleeping eviction 
>> locks
>> that are kept, optional drm_mm eviction roster usage and moving over 
>> to a
>> completely lockless delayed destroy list even if the traversal order
>> will then change because there is no llist_add_tail().
>>
>> Cc: Christian König <christian.koenig@amd.com>
>> Cc: Daniel Vetter <daniel@ffwll.ch>
>> Signed-off-by: Thomas Hellström <thomas.hellstrom@linux.intel.com>
>> ---
>>   drivers/gpu/drm/ttm/ttm_bo.c     | 193 +++++++++++++------------------
>>   drivers/gpu/drm/ttm/ttm_device.c |   1 +
>>   include/drm/ttm/ttm_device.h     |   1 +
>>   3 files changed, 82 insertions(+), 113 deletions(-)
>>
>> diff --git a/drivers/gpu/drm/ttm/ttm_bo.c b/drivers/gpu/drm/ttm/ttm_bo.c
>> index 6ab7b66ce36d..fd57252bc8fe 100644
>> --- a/drivers/gpu/drm/ttm/ttm_bo.c
>> +++ b/drivers/gpu/drm/ttm/ttm_bo.c
>> @@ -217,6 +217,9 @@ static int ttm_bo_handle_move_mem(struct 
>> ttm_buffer_object *bo,
>>     static void ttm_bo_cleanup_memtype_use(struct ttm_buffer_object *bo)
>>   {
>> +    if (kref_read(&bo->kref))
>> +        dma_resv_assert_held(bo->base.resv);
>> +
>>       if (bo->bdev->funcs->delete_mem_notify)
>>           bo->bdev->funcs->delete_mem_notify(bo);
>>   @@ -239,13 +242,18 @@ static int ttm_bo_individualize_resv(struct 
>> ttm_buffer_object *bo)
>>           return r;
>>         if (bo->type != ttm_bo_type_sg) {
>> -        /* This works because the BO is about to be destroyed and 
>> nobody
>> -         * reference it any more. The only tricky case is the 
>> trylock on
>> -         * the resv object while holding the lru_lock.
>> +        /*
>> +         * This works because nobody besides us can lock the bo or
>> +         * assume it is locked while its refcount is zero.
>>            */
>> -        spin_lock(&bo->bdev->lru_lock);
>> +        WARN_ON_ONCE(kref_read(&bo->kref));
>>           bo->base.resv = &bo->base._resv;
>> -        spin_unlock(&bo->bdev->lru_lock);
>> +        /*
>> +         * Don't reorder against kref_init().
>> +         * Matches the implicit full barrier of a successful
>> +         * kref_get_unless_zero() after a lru_list_lookup.
>> +         */
>> +        smp_mb();
>>       }
>>         return r;
>> @@ -274,122 +282,66 @@ static void ttm_bo_flush_all_fences(struct 
>> ttm_buffer_object *bo)
>>   }
>>     /**
>> - * function ttm_bo_cleanup_refs
>> - * If bo idle, remove from lru lists, and unref.
>> - * If not idle, block if possible.
>> - *
>> - * Must be called with lru_lock and reservation held, this function
>> - * will drop the lru lock and optionally the reservation lock before 
>> returning.
>> + * ttm_bo_cleanup_refs - idle and remove pages and gpu-bindings and 
>> remove
>> + * from lru_lists.
>>    *
>>    * @bo:                    The buffer object to clean-up
>>    * @interruptible:         Any sleeps should occur interruptibly.
>> - * @no_wait_gpu:           Never wait for gpu. Return -EBUSY instead.
>> - * @unlock_resv:           Unlock the reservation lock as well.
>> + * @no_wait_gpu:           Never wait for gpu. Return -EBUSY if not 
>> idle.
>>    */
>> -
>>   static int ttm_bo_cleanup_refs(struct ttm_buffer_object *bo,
>> -                   bool interruptible, bool no_wait_gpu,
>> -                   bool unlock_resv)
>> +                   bool interruptible, bool no_wait_gpu)
>>   {
>> -    struct dma_resv *resv = &bo->base._resv;
>>       int ret;
>>   -    if (dma_resv_test_signaled_rcu(resv, true))
>> -        ret = 0;
>> -    else
>> -        ret = -EBUSY;
>> -
>> -    if (ret && !no_wait_gpu) {
>> -        long lret;
>> -
>> -        if (unlock_resv)
>> -            dma_resv_unlock(bo->base.resv);
>> -        spin_unlock(&bo->bdev->lru_lock);
>> -
>> -        lret = dma_resv_wait_timeout_rcu(resv, true, interruptible,
>> -                         30 * HZ);
>> -
>> -        if (lret < 0)
>> -            return lret;
>> -        else if (lret == 0)
>> -            return -EBUSY;
>> -
>> -        spin_lock(&bo->bdev->lru_lock);
>> -        if (unlock_resv && !dma_resv_trylock(bo->base.resv)) {
>> -            /*
>> -             * We raced, and lost, someone else holds the 
>> reservation now,
>> -             * and is probably busy in ttm_bo_cleanup_memtype_use.
>> -             *
>> -             * Even if it's not the case, because we finished 
>> waiting any
>> -             * delayed destruction would succeed, so just return 
>> success
>> -             * here.
>> -             */
>> -            spin_unlock(&bo->bdev->lru_lock);
>> -            return 0;
>> -        }
>> -        ret = 0;
>> -    }
>> -
>> -    if (ret || unlikely(list_empty(&bo->ddestroy))) {
>> -        if (unlock_resv)
>> -            dma_resv_unlock(bo->base.resv);
>> -        spin_unlock(&bo->bdev->lru_lock);
>> +    dma_resv_assert_held(bo->base.resv);
>> +    WARN_ON_ONCE(!bo->deleted);
>> +    ret = ttm_bo_wait(bo, interruptible, no_wait_gpu);
>> +    if (ret)
>>           return ret;
>> -    }
>>   +    ttm_bo_cleanup_memtype_use(bo);
>> +    spin_lock(&bo->bdev->lru_lock);
>>       ttm_bo_del_from_lru(bo);
>> -    list_del_init(&bo->ddestroy);
>>       spin_unlock(&bo->bdev->lru_lock);
>> -    ttm_bo_cleanup_memtype_use(bo);
>> -
>> -    if (unlock_resv)
>> -        dma_resv_unlock(bo->base.resv);
>> -
>> -    ttm_bo_put(bo);
>>         return 0;
>>   }
>>     /*
>> - * Traverse the delayed list, and call ttm_bo_cleanup_refs on all
>> - * encountered buffers.
>> + * Isolate a part of the delayed destroy list and check for idle on all
>> + * buffer objects on it. If idle, remove from the list and drop the
>> + * delayed destroy refcount. Return all busy buffers to the delayed
>> + * destroy list.
>>    */
>>   bool ttm_bo_delayed_delete(struct ttm_device *bdev, bool remove_all)
>>   {
>> -    struct list_head removed;
>> -    bool empty;
>> -
>> -    INIT_LIST_HEAD(&removed);
>> -
>> -    spin_lock(&bdev->lru_lock);
>> -    while (!list_empty(&bdev->ddestroy)) {
>> -        struct ttm_buffer_object *bo;
>> -
>> -        bo = list_first_entry(&bdev->ddestroy, struct 
>> ttm_buffer_object,
>> -                      ddestroy);
>> -        list_move_tail(&bo->ddestroy, &removed);
>> -        if (!ttm_bo_get_unless_zero(bo))
>> -            continue;
>> -
>> -        if (remove_all || bo->base.resv != &bo->base._resv) {
>> -            spin_unlock(&bdev->lru_lock);
>> -            dma_resv_lock(bo->base.resv, NULL);
>> -
>> -            spin_lock(&bdev->lru_lock);
>> -            ttm_bo_cleanup_refs(bo, false, !remove_all, true);
>> -
>> -        } else if (dma_resv_trylock(bo->base.resv)) {
>> -            ttm_bo_cleanup_refs(bo, false, !remove_all, true);
>> -        } else {
>> -            spin_unlock(&bdev->lru_lock);
>> +    struct ttm_buffer_object *bo, *next;
>> +    struct list_head isolated;
>> +    bool empty, idle;
>> +
>> +    INIT_LIST_HEAD(&isolated);
>> +
>> +    spin_lock(&bdev->ddestroy_lock);
>> +    list_splice_init(&bdev->ddestroy, &isolated);
>> +    spin_unlock(&bdev->ddestroy_lock);
>> +    list_for_each_entry_safe(bo, next, &isolated, ddestroy) {
>> +        WARN_ON_ONCE(!kref_read(&bo->kref) ||
>> +                 bo->base.resv != &bo->base._resv);
>> +        if (!remove_all)
>> +            idle = dma_resv_test_signaled_rcu(bo->base.resv, true);
>> +        else
>> +            idle = (dma_resv_wait_timeout_rcu(bo->base.resv, true,
>> +                              false, 30 * HZ) > 0);
>> +        if (idle) {
>> +            list_del_init(&bo->ddestroy);
>> +            ttm_bo_put(bo);
>>           }
>> -
>> -        ttm_bo_put(bo);
>> -        spin_lock(&bdev->lru_lock);
>>       }
>> -    list_splice_tail(&removed, &bdev->ddestroy);
>> +    spin_lock(&bdev->ddestroy_lock);
>> +    list_splice(&isolated, &bdev->ddestroy);
>>       empty = list_empty(&bdev->ddestroy);
>> -    spin_unlock(&bdev->lru_lock);
>> +    spin_unlock(&bdev->ddestroy_lock);
>>         return empty;
>>   }
>> @@ -405,10 +357,16 @@ static void ttm_bo_release(struct kref *kref)
>>           ret = ttm_bo_individualize_resv(bo);
>>           if (ret) {
>>               /* Last resort, if we fail to allocate memory for the
>> -             * fences block for the BO to become idle
>> +             * fences block for the BO to become idle, and then
>> +             * we are free to update the resv pointer.
>>                */
>>               dma_resv_wait_timeout_rcu(bo->base.resv, true, false,
>>                             30 * HZ);
>> +            if (bo->type != ttm_bo_type_sg) {
>> +                bo->base.resv = &bo->base._resv;
>> +                /* Please see ttm_bo_individualize_resv() */
>> +                smp_mb();
>> +            }
>>           }
>>             if (bo->bdev->funcs->release_notify)
>> @@ -422,9 +380,8 @@ static void ttm_bo_release(struct kref *kref)
>>           !dma_resv_trylock(bo->base.resv)) {
>>           /* The BO is not idle, resurrect it for delayed destroy */
>>           ttm_bo_flush_all_fences(bo);
>> -        bo->deleted = true;
>> -
>>           spin_lock(&bo->bdev->lru_lock);
>> +        bo->deleted = true;
>>             /*
>>            * Make pinned bos immediately available to
>> @@ -439,8 +396,10 @@ static void ttm_bo_release(struct kref *kref)
>>               ttm_bo_move_to_lru_tail(bo, &bo->mem, NULL);
>>           }
>>   +        spin_lock(&bo->bdev->ddestroy_lock);
>>           kref_init(&bo->kref);
>>           list_add_tail(&bo->ddestroy, &bdev->ddestroy);
>> +        spin_unlock(&bo->bdev->ddestroy_lock);
>>           spin_unlock(&bo->bdev->lru_lock);
>>             schedule_delayed_work(&bdev->wq,
>> @@ -450,9 +409,11 @@ static void ttm_bo_release(struct kref *kref)
>>         spin_lock(&bo->bdev->lru_lock);
>>       ttm_bo_del_from_lru(bo);
>> -    list_del(&bo->ddestroy);
>>       spin_unlock(&bo->bdev->lru_lock);
>>   +    pr_info("Deleting.\n");
>> +    WARN_ON_ONCE(!list_empty_careful(&bo->ddestroy));
>> +
>>       ttm_bo_cleanup_memtype_use(bo);
>>       dma_resv_unlock(bo->base.resv);
>>   @@ -630,25 +591,26 @@ int ttm_mem_evict_first(struct ttm_device *bdev,
>>           list_for_each_entry(bo, &man->lru[i], lru) {
>>               bool busy;
>>   +            if (!ttm_bo_get_unless_zero(bo))
>> +                continue;
>> +
>>               if (!ttm_bo_evict_swapout_allowable(bo, ctx, &locked,
>>                                   &busy)) {
>>                   if (busy && !busy_bo && ticket !=
>>                       dma_resv_locking_ctx(bo->base.resv))
>>                       busy_bo = bo;
>> +                ttm_bo_put(bo);
>>                   continue;
>>               }
>>                 if (place && !bdev->funcs->eviction_valuable(bo,
>>                                         place)) {
>> +                ttm_bo_put(bo);
>>                   if (locked)
>>                       dma_resv_unlock(bo->base.resv);
>>                   continue;
>>               }
>> -            if (!ttm_bo_get_unless_zero(bo)) {
>> -                if (locked)
>> -                    dma_resv_unlock(bo->base.resv);
>> -                continue;
>> -            }
>> +
>>               break;
>>           }
>>   @@ -670,8 +632,11 @@ int ttm_mem_evict_first(struct ttm_device *bdev,
>>       }
>>         if (bo->deleted) {
>> +        spin_unlock(&bdev->lru_lock);
>>           ret = ttm_bo_cleanup_refs(bo, ctx->interruptible,
>> -                      ctx->no_wait_gpu, locked);
>> +                      ctx->no_wait_gpu);
>> +        if (locked)
>> +            dma_resv_unlock(bo->base.resv);
>>           ttm_bo_put(bo);
>>           return ret;
>>       }
>> @@ -1168,17 +1133,19 @@ int ttm_bo_swapout(struct ttm_buffer_object 
>> *bo, struct ttm_operation_ctx *ctx,
>>       bool locked;
>>       int ret;
>>   -    if (!ttm_bo_evict_swapout_allowable(bo, ctx, &locked, NULL))
>> +    if (!ttm_bo_get_unless_zero(bo))
>>           return -EBUSY;
>>   -    if (!ttm_bo_get_unless_zero(bo)) {
>> -        if (locked)
>> -            dma_resv_unlock(bo->base.resv);
>> +    if (!ttm_bo_evict_swapout_allowable(bo, ctx, &locked, NULL)) {
>> +        ttm_bo_put(bo);
>>           return -EBUSY;
>>       }
>>         if (bo->deleted) {
>> -        ttm_bo_cleanup_refs(bo, false, false, locked);
>> +        spin_unlock(&bo->bdev->lru_lock);
>> +        ttm_bo_cleanup_refs(bo, false, false);
>> +        if (locked)
>> +            dma_resv_unlock(bo->base.resv);
>>           ttm_bo_put(bo);
>>           return 0;
>>       }
>> diff --git a/drivers/gpu/drm/ttm/ttm_device.c 
>> b/drivers/gpu/drm/ttm/ttm_device.c
>> index 9b787b3caeb5..b941989885ed 100644
>> --- a/drivers/gpu/drm/ttm/ttm_device.c
>> +++ b/drivers/gpu/drm/ttm/ttm_device.c
>> @@ -228,6 +228,7 @@ int ttm_device_init(struct ttm_device *bdev, 
>> struct ttm_device_funcs *funcs,
>>       bdev->vma_manager = vma_manager;
>>       INIT_DELAYED_WORK(&bdev->wq, ttm_device_delayed_workqueue);
>>       spin_lock_init(&bdev->lru_lock);
>> +    spin_lock_init(&bdev->ddestroy_lock);
>>       INIT_LIST_HEAD(&bdev->ddestroy);
>>       bdev->dev_mapping = mapping;
>>       mutex_lock(&ttm_global_mutex);
>> diff --git a/include/drm/ttm/ttm_device.h b/include/drm/ttm/ttm_device.h
>> index 7c8f87bd52d3..fa8c4f6a169e 100644
>> --- a/include/drm/ttm/ttm_device.h
>> +++ b/include/drm/ttm/ttm_device.h
>> @@ -279,6 +279,7 @@ struct ttm_device {
>>        * Protection for the per manager LRU and ddestroy lists.
>>        */
>>       spinlock_t lru_lock;
>> +    spinlock_t ddestroy_lock;
>>       struct list_head ddestroy;
>>         /*
>
Christian König April 12, 2021, 2:21 p.m. UTC | #3
On 12.04.21 at 16:16, Thomas Hellström wrote:
> Hi, Christian,
>
> On 4/12/21 4:01 PM, Christian König wrote:
>> Hi Thomas,
>>
>> well in general a good idea, but I'm working on a different plan for 
>> a while now.
>>
>> My idea here is that instead of the BO the resource object is kept on 
>> a double linked lru list.
>>
>> The resource objects then have a pointer to either the BO or a fence 
>> object.
>>
>> When it is a fence object we can just grab a reference to it and wait 
>> for it to complete.
>>
>> If it is a BO we evict it the same way we currently do.
>>
>> This allows to remove both the delayed delete, individualization of 
>> BOs, ghost objects etc...
>
> Hmm, ok. So in that case, what would trigger the final release of the 
> buffer object in the absence of a delayed delete list? Would we use a 
> fence callback for that?

The key point is that you don't need any final release of the BO any more. 
When the BO's reference count becomes zero, it can be destroyed immediately.

Only the resource object is kept around, protected by a fence, until 
it can finally be released.

How that resource object is then destroyed could be handled in different 
ways: we could use delayed work, a shrinker callback, or just rely on new 
allocations to scan the deleted objects.
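
As a sketch of that last variant, reaping deleted resources from the
allocation path could look roughly like this (names and locking are purely
illustrative, not actual code):

#include <linux/dma-fence.h>
#include <linux/list.h>
#include <linux/slab.h>
#include <linux/spinlock.h>

struct my_resource {
	struct list_head deleted_link;
	struct dma_fence *fence;	/* last use of the backing memory */
};

struct my_manager {
	spinlock_t lock;
	struct list_head deleted;	/* resources waiting for their fence */
};

/* Called before trying a new allocation in this manager. */
static void my_reap_deleted(struct my_manager *man)
{
	struct my_resource *res, *next;
	LIST_HEAD(reaped);

	spin_lock(&man->lock);
	list_for_each_entry_safe(res, next, &man->deleted, deleted_link) {
		if (dma_fence_is_signaled(res->fence))
			list_move(&res->deleted_link, &reaped);
	}
	spin_unlock(&man->lock);

	list_for_each_entry_safe(res, next, &reaped, deleted_link) {
		dma_fence_put(res->fence);
		kfree(res);	/* the backing range becomes allocatable again */
	}
}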

>
> Otherwise sounds like a good idea, and if it runs into problems, we 
> could possibly resurrect this.

Well, the main problem is that I need to change all drivers so that 
bo->mem can now be a pointer. IIRC you suggested that years ago, and I've 
been working on the details of that idea ever since.

Regards,
Christian.

>
> /Thomas
>
>
>
>
>>
>> Regards,
>> Christian.
>>
>> On 12.04.21 at 15:17, Thomas Hellström wrote:
>>> This RFC needs some decent testing on a driver with bos that share
>>> reservation objects, and of course a check for whether I missed
>>> something obvious.
>>>
>>> The locking around delayed destroy is rather complex due to the fact
>>> that we want to individualize dma_resv pointers before putting the 
>>> object
>>> on the delayed-destroy list. That individualization is currently 
>>> protected
>>> by the lru lock forcing us to do try-reservations under the lru lock 
>>> when
>>> reserving an object from the lru- or ddestroy lists, which complicates
>>> moving over to per-manager lru locks.
>>>
>>> Instead protect the individualization with the object kref == 0,
>>> and ensure kref != 0 before trying to reserve an object from the list.
>>> Remove the lru lock around individualization and protect the delayed
>>> destroy list with a separate spinlock. Isolate the delayed destroy list
>>> similarly to a lockless list before traversing it. Also don't try to 
>>> drop
>>> resource references from the delayed destroy list traversal, but leave
>>> that to the final object release. The role of the delayed destroy 
>>> thread
>>> will now simply be to try to idle the object and when idle, drop
>>> the delayed destroy reference.
>>>
>>> This should pave the way for per-manager lru lists, sleeping 
>>> eviction locks
>>> that are kept, optional drm_mm eviction roster usage and moving over 
>>> to a
>>> completely lockless delayed destroy list even if the traversal order
>>> will then change because there is no llist_add_tail().
>>>
>>> Cc: Christian König <christian.koenig@amd.com>
>>> Cc: Daniel Vetter <daniel@ffwll.ch>
>>> Signed-off-by: Thomas Hellström <thomas.hellstrom@linux.intel.com>
>>> ---
>>>   drivers/gpu/drm/ttm/ttm_bo.c     | 193 
>>> +++++++++++++------------------
>>>   drivers/gpu/drm/ttm/ttm_device.c |   1 +
>>>   include/drm/ttm/ttm_device.h     |   1 +
>>>   3 files changed, 82 insertions(+), 113 deletions(-)
>>>
>>> diff --git a/drivers/gpu/drm/ttm/ttm_bo.c 
>>> b/drivers/gpu/drm/ttm/ttm_bo.c
>>> index 6ab7b66ce36d..fd57252bc8fe 100644
>>> --- a/drivers/gpu/drm/ttm/ttm_bo.c
>>> +++ b/drivers/gpu/drm/ttm/ttm_bo.c
>>> @@ -217,6 +217,9 @@ static int ttm_bo_handle_move_mem(struct 
>>> ttm_buffer_object *bo,
>>>     static void ttm_bo_cleanup_memtype_use(struct ttm_buffer_object 
>>> *bo)
>>>   {
>>> +    if (kref_read(&bo->kref))
>>> +        dma_resv_assert_held(bo->base.resv);
>>> +
>>>       if (bo->bdev->funcs->delete_mem_notify)
>>>           bo->bdev->funcs->delete_mem_notify(bo);
>>>   @@ -239,13 +242,18 @@ static int ttm_bo_individualize_resv(struct 
>>> ttm_buffer_object *bo)
>>>           return r;
>>>         if (bo->type != ttm_bo_type_sg) {
>>> -        /* This works because the BO is about to be destroyed and 
>>> nobody
>>> -         * reference it any more. The only tricky case is the 
>>> trylock on
>>> -         * the resv object while holding the lru_lock.
>>> +        /*
>>> +         * This works because nobody besides us can lock the bo or
>>> +         * assume it is locked while its refcount is zero.
>>>            */
>>> -        spin_lock(&bo->bdev->lru_lock);
>>> +        WARN_ON_ONCE(kref_read(&bo->kref));
>>>           bo->base.resv = &bo->base._resv;
>>> -        spin_unlock(&bo->bdev->lru_lock);
>>> +        /*
>>> +         * Don't reorder against kref_init().
>>> +         * Matches the implicit full barrier of a successful
>>> +         * kref_get_unless_zero() after a lru_list_lookup.
>>> +         */
>>> +        smp_mb();
>>>       }
>>>         return r;
>>> @@ -274,122 +282,66 @@ static void ttm_bo_flush_all_fences(struct 
>>> ttm_buffer_object *bo)
>>>   }
>>>     /**
>>> - * function ttm_bo_cleanup_refs
>>> - * If bo idle, remove from lru lists, and unref.
>>> - * If not idle, block if possible.
>>> - *
>>> - * Must be called with lru_lock and reservation held, this function
>>> - * will drop the lru lock and optionally the reservation lock 
>>> before returning.
>>> + * ttm_bo_cleanup_refs - idle and remove pages and gpu-bindings and 
>>> remove
>>> + * from lru_lists.
>>>    *
>>>    * @bo:                    The buffer object to clean-up
>>>    * @interruptible:         Any sleeps should occur interruptibly.
>>> - * @no_wait_gpu:           Never wait for gpu. Return -EBUSY instead.
>>> - * @unlock_resv:           Unlock the reservation lock as well.
>>> + * @no_wait_gpu:           Never wait for gpu. Return -EBUSY if not 
>>> idle.
>>>    */
>>> -
>>>   static int ttm_bo_cleanup_refs(struct ttm_buffer_object *bo,
>>> -                   bool interruptible, bool no_wait_gpu,
>>> -                   bool unlock_resv)
>>> +                   bool interruptible, bool no_wait_gpu)
>>>   {
>>> -    struct dma_resv *resv = &bo->base._resv;
>>>       int ret;
>>>   -    if (dma_resv_test_signaled_rcu(resv, true))
>>> -        ret = 0;
>>> -    else
>>> -        ret = -EBUSY;
>>> -
>>> -    if (ret && !no_wait_gpu) {
>>> -        long lret;
>>> -
>>> -        if (unlock_resv)
>>> -            dma_resv_unlock(bo->base.resv);
>>> -        spin_unlock(&bo->bdev->lru_lock);
>>> -
>>> -        lret = dma_resv_wait_timeout_rcu(resv, true, interruptible,
>>> -                         30 * HZ);
>>> -
>>> -        if (lret < 0)
>>> -            return lret;
>>> -        else if (lret == 0)
>>> -            return -EBUSY;
>>> -
>>> -        spin_lock(&bo->bdev->lru_lock);
>>> -        if (unlock_resv && !dma_resv_trylock(bo->base.resv)) {
>>> -            /*
>>> -             * We raced, and lost, someone else holds the 
>>> reservation now,
>>> -             * and is probably busy in ttm_bo_cleanup_memtype_use.
>>> -             *
>>> -             * Even if it's not the case, because we finished 
>>> waiting any
>>> -             * delayed destruction would succeed, so just return 
>>> success
>>> -             * here.
>>> -             */
>>> -            spin_unlock(&bo->bdev->lru_lock);
>>> -            return 0;
>>> -        }
>>> -        ret = 0;
>>> -    }
>>> -
>>> -    if (ret || unlikely(list_empty(&bo->ddestroy))) {
>>> -        if (unlock_resv)
>>> -            dma_resv_unlock(bo->base.resv);
>>> -        spin_unlock(&bo->bdev->lru_lock);
>>> +    dma_resv_assert_held(bo->base.resv);
>>> +    WARN_ON_ONCE(!bo->deleted);
>>> +    ret = ttm_bo_wait(bo, interruptible, no_wait_gpu);
>>> +    if (ret)
>>>           return ret;
>>> -    }
>>>   +    ttm_bo_cleanup_memtype_use(bo);
>>> +    spin_lock(&bo->bdev->lru_lock);
>>>       ttm_bo_del_from_lru(bo);
>>> -    list_del_init(&bo->ddestroy);
>>>       spin_unlock(&bo->bdev->lru_lock);
>>> -    ttm_bo_cleanup_memtype_use(bo);
>>> -
>>> -    if (unlock_resv)
>>> -        dma_resv_unlock(bo->base.resv);
>>> -
>>> -    ttm_bo_put(bo);
>>>         return 0;
>>>   }
>>>     /*
>>> - * Traverse the delayed list, and call ttm_bo_cleanup_refs on all
>>> - * encountered buffers.
>>> + * Isolate a part of the delayed destroy list and check for idle on 
>>> all
>>> + * buffer objects on it. If idle, remove from the list and drop the
>>> + * delayed destroy refcount. Return all busy buffers to the delayed
>>> + * destroy list.
>>>    */
>>>   bool ttm_bo_delayed_delete(struct ttm_device *bdev, bool remove_all)
>>>   {
>>> -    struct list_head removed;
>>> -    bool empty;
>>> -
>>> -    INIT_LIST_HEAD(&removed);
>>> -
>>> -    spin_lock(&bdev->lru_lock);
>>> -    while (!list_empty(&bdev->ddestroy)) {
>>> -        struct ttm_buffer_object *bo;
>>> -
>>> -        bo = list_first_entry(&bdev->ddestroy, struct 
>>> ttm_buffer_object,
>>> -                      ddestroy);
>>> -        list_move_tail(&bo->ddestroy, &removed);
>>> -        if (!ttm_bo_get_unless_zero(bo))
>>> -            continue;
>>> -
>>> -        if (remove_all || bo->base.resv != &bo->base._resv) {
>>> -            spin_unlock(&bdev->lru_lock);
>>> -            dma_resv_lock(bo->base.resv, NULL);
>>> -
>>> -            spin_lock(&bdev->lru_lock);
>>> -            ttm_bo_cleanup_refs(bo, false, !remove_all, true);
>>> -
>>> -        } else if (dma_resv_trylock(bo->base.resv)) {
>>> -            ttm_bo_cleanup_refs(bo, false, !remove_all, true);
>>> -        } else {
>>> -            spin_unlock(&bdev->lru_lock);
>>> +    struct ttm_buffer_object *bo, *next;
>>> +    struct list_head isolated;
>>> +    bool empty, idle;
>>> +
>>> +    INIT_LIST_HEAD(&isolated);
>>> +
>>> +    spin_lock(&bdev->ddestroy_lock);
>>> +    list_splice_init(&bdev->ddestroy, &isolated);
>>> +    spin_unlock(&bdev->ddestroy_lock);
>>> +    list_for_each_entry_safe(bo, next, &isolated, ddestroy) {
>>> +        WARN_ON_ONCE(!kref_read(&bo->kref) ||
>>> +                 bo->base.resv != &bo->base._resv);
>>> +        if (!remove_all)
>>> +            idle = dma_resv_test_signaled_rcu(bo->base.resv, true);
>>> +        else
>>> +            idle = (dma_resv_wait_timeout_rcu(bo->base.resv, true,
>>> +                              false, 30 * HZ) > 0);
>>> +        if (idle) {
>>> +            list_del_init(&bo->ddestroy);
>>> +            ttm_bo_put(bo);
>>>           }
>>> -
>>> -        ttm_bo_put(bo);
>>> -        spin_lock(&bdev->lru_lock);
>>>       }
>>> -    list_splice_tail(&removed, &bdev->ddestroy);
>>> +    spin_lock(&bdev->ddestroy_lock);
>>> +    list_splice(&isolated, &bdev->ddestroy);
>>>       empty = list_empty(&bdev->ddestroy);
>>> -    spin_unlock(&bdev->lru_lock);
>>> +    spin_unlock(&bdev->ddestroy_lock);
>>>         return empty;
>>>   }
>>> @@ -405,10 +357,16 @@ static void ttm_bo_release(struct kref *kref)
>>>           ret = ttm_bo_individualize_resv(bo);
>>>           if (ret) {
>>>               /* Last resort, if we fail to allocate memory for the
>>> -             * fences block for the BO to become idle
>>> +             * fences block for the BO to become idle, and then
>>> +             * we are free to update the resv pointer.
>>>                */
>>>               dma_resv_wait_timeout_rcu(bo->base.resv, true, false,
>>>                             30 * HZ);
>>> +            if (bo->type != ttm_bo_type_sg) {
>>> +                bo->base.resv = &bo->base._resv;
>>> +                /* Please see ttm_bo_individualize_resv() */
>>> +                smp_mb();
>>> +            }
>>>           }
>>>             if (bo->bdev->funcs->release_notify)
>>> @@ -422,9 +380,8 @@ static void ttm_bo_release(struct kref *kref)
>>>           !dma_resv_trylock(bo->base.resv)) {
>>>           /* The BO is not idle, resurrect it for delayed destroy */
>>>           ttm_bo_flush_all_fences(bo);
>>> -        bo->deleted = true;
>>> -
>>>           spin_lock(&bo->bdev->lru_lock);
>>> +        bo->deleted = true;
>>>             /*
>>>            * Make pinned bos immediately available to
>>> @@ -439,8 +396,10 @@ static void ttm_bo_release(struct kref *kref)
>>>               ttm_bo_move_to_lru_tail(bo, &bo->mem, NULL);
>>>           }
>>>   +        spin_lock(&bo->bdev->ddestroy_lock);
>>>           kref_init(&bo->kref);
>>>           list_add_tail(&bo->ddestroy, &bdev->ddestroy);
>>> +        spin_unlock(&bo->bdev->ddestroy_lock);
>>>           spin_unlock(&bo->bdev->lru_lock);
>>>             schedule_delayed_work(&bdev->wq,
>>> @@ -450,9 +409,11 @@ static void ttm_bo_release(struct kref *kref)
>>>         spin_lock(&bo->bdev->lru_lock);
>>>       ttm_bo_del_from_lru(bo);
>>> -    list_del(&bo->ddestroy);
>>>       spin_unlock(&bo->bdev->lru_lock);
>>>   +    pr_info("Deleting.\n");
>>> +    WARN_ON_ONCE(!list_empty_careful(&bo->ddestroy));
>>> +
>>>       ttm_bo_cleanup_memtype_use(bo);
>>>       dma_resv_unlock(bo->base.resv);
>>>   @@ -630,25 +591,26 @@ int ttm_mem_evict_first(struct ttm_device 
>>> *bdev,
>>>           list_for_each_entry(bo, &man->lru[i], lru) {
>>>               bool busy;
>>>   +            if (!ttm_bo_get_unless_zero(bo))
>>> +                continue;
>>> +
>>>               if (!ttm_bo_evict_swapout_allowable(bo, ctx, &locked,
>>>                                   &busy)) {
>>>                   if (busy && !busy_bo && ticket !=
>>>                       dma_resv_locking_ctx(bo->base.resv))
>>>                       busy_bo = bo;
>>> +                ttm_bo_put(bo);
>>>                   continue;
>>>               }
>>>                 if (place && !bdev->funcs->eviction_valuable(bo,
>>>                                         place)) {
>>> +                ttm_bo_put(bo);
>>>                   if (locked)
>>>                       dma_resv_unlock(bo->base.resv);
>>>                   continue;
>>>               }
>>> -            if (!ttm_bo_get_unless_zero(bo)) {
>>> -                if (locked)
>>> -                    dma_resv_unlock(bo->base.resv);
>>> -                continue;
>>> -            }
>>> +
>>>               break;
>>>           }
>>>   @@ -670,8 +632,11 @@ int ttm_mem_evict_first(struct ttm_device *bdev,
>>>       }
>>>         if (bo->deleted) {
>>> +        spin_unlock(&bdev->lru_lock);
>>>           ret = ttm_bo_cleanup_refs(bo, ctx->interruptible,
>>> -                      ctx->no_wait_gpu, locked);
>>> +                      ctx->no_wait_gpu);
>>> +        if (locked)
>>> +            dma_resv_unlock(bo->base.resv);
>>>           ttm_bo_put(bo);
>>>           return ret;
>>>       }
>>> @@ -1168,17 +1133,19 @@ int ttm_bo_swapout(struct ttm_buffer_object 
>>> *bo, struct ttm_operation_ctx *ctx,
>>>       bool locked;
>>>       int ret;
>>>   -    if (!ttm_bo_evict_swapout_allowable(bo, ctx, &locked, NULL))
>>> +    if (!ttm_bo_get_unless_zero(bo))
>>>           return -EBUSY;
>>>   -    if (!ttm_bo_get_unless_zero(bo)) {
>>> -        if (locked)
>>> -            dma_resv_unlock(bo->base.resv);
>>> +    if (!ttm_bo_evict_swapout_allowable(bo, ctx, &locked, NULL)) {
>>> +        ttm_bo_put(bo);
>>>           return -EBUSY;
>>>       }
>>>         if (bo->deleted) {
>>> -        ttm_bo_cleanup_refs(bo, false, false, locked);
>>> +        spin_unlock(&bo->bdev->lru_lock);
>>> +        ttm_bo_cleanup_refs(bo, false, false);
>>> +        if (locked)
>>> +            dma_resv_unlock(bo->base.resv);
>>>           ttm_bo_put(bo);
>>>           return 0;
>>>       }
>>> diff --git a/drivers/gpu/drm/ttm/ttm_device.c 
>>> b/drivers/gpu/drm/ttm/ttm_device.c
>>> index 9b787b3caeb5..b941989885ed 100644
>>> --- a/drivers/gpu/drm/ttm/ttm_device.c
>>> +++ b/drivers/gpu/drm/ttm/ttm_device.c
>>> @@ -228,6 +228,7 @@ int ttm_device_init(struct ttm_device *bdev, 
>>> struct ttm_device_funcs *funcs,
>>>       bdev->vma_manager = vma_manager;
>>>       INIT_DELAYED_WORK(&bdev->wq, ttm_device_delayed_workqueue);
>>>       spin_lock_init(&bdev->lru_lock);
>>> +    spin_lock_init(&bdev->ddestroy_lock);
>>>       INIT_LIST_HEAD(&bdev->ddestroy);
>>>       bdev->dev_mapping = mapping;
>>>       mutex_lock(&ttm_global_mutex);
>>> diff --git a/include/drm/ttm/ttm_device.h 
>>> b/include/drm/ttm/ttm_device.h
>>> index 7c8f87bd52d3..fa8c4f6a169e 100644
>>> --- a/include/drm/ttm/ttm_device.h
>>> +++ b/include/drm/ttm/ttm_device.h
>>> @@ -279,6 +279,7 @@ struct ttm_device {
>>>        * Protection for the per manager LRU and ddestroy lists.
>>>        */
>>>       spinlock_t lru_lock;
>>> +    spinlock_t ddestroy_lock;
>>>       struct list_head ddestroy;
>>>         /*
>>
Thomas Hellström April 12, 2021, 2:40 p.m. UTC | #4
On 4/12/21 4:21 PM, Christian König wrote:
>
>
> On 12.04.21 at 16:16, Thomas Hellström wrote:
>> Hi, Christian,
>>
>> On 4/12/21 4:01 PM, Christian König wrote:
>>> Hi Thomas,
>>>
>>> well in general a good idea, but I'm working on a different plan for 
>>> a while now.
>>>
>>> My idea here is that instead of the BO the resource object is kept 
>>> on a double linked lru list.
>>>
>>> The resource objects then have a pointer to either the BO or a fence 
>>> object.
>>>
>>> When it is a fence object we can just grab a reference to it and 
>>> wait for it to complete.
>>>
>>> If it is a BO we evict it the same way we currently do.
>>>
>>> This allows to remove both the delayed delete, individualization of 
>>> BOs, ghost objects etc...
>>
>> Hmm, ok. So in that case, what would trigger the final release of the 
>> buffer object in the absence of a delayed delete list? Would we use a 
>> fence callback for that?
>
> Key point is you don't need any final release of the BO any more. When 
> the BOs reference count becomes zero it can be destructed immediately.
>
> Only the resource object is kept around and protected by a fence until 
> it can be released finally.
>
> How that resource object is then destroyed could be handled in 
> different ways, we could use a delayed work, shrinker callback or just 
> rely on new allocations to scan the deleted objects.

Hmm, ok, but what about when the object is backed by pages? Would the 
pages be attached to the resource object, or how would we know when to 
free them?

>
>>
>> Otherwise sounds like a good idea, and if it runs into problems, we 
>> could possibly resurrect this.
>
> Well the main problem is that I need to change all drivers so that 
> bo->mem can now be a pointer. IIRC you suggested that years ago and 
> I'm working on the details of that idea ever since.

Hmm, I might have, yes. Can't remember the context, though.

Thanks,
/Thomas


>
> Regards,
> Christian.
>
>>
>> /Thomas
>>
>>
>>
>>
>>>
>>> Regards,
>>> Christian.
>>>
>>> On 12.04.21 at 15:17, Thomas Hellström wrote:
>>>> This RFC needs some decent testing on a driver with bos that share
>>>> reservation objects, and of course a check for whether I missed
>>>> something obvious.
>>>>
>>>> The locking around delayed destroy is rather complex due to the fact
>>>> that we want to individualize dma_resv pointers before putting the 
>>>> object
>>>> on the delayed-destroy list. That individualization is currently 
>>>> protected
>>>> by the lru lock forcing us to do try-reservations under the lru 
>>>> lock when
>>>> reserving an object from the lru- or ddestroy lists, which complicates
>>>> moving over to per-manager lru locks.
>>>>
>>>> Instead protect the individualization with the object kref == 0,
>>>> and ensure kref != 0 before trying to reserve an object from the list.
>>>> Remove the lru lock around individualization and protect the delayed
>>>> destroy list with a separate spinlock. Isolate the delayed destroy 
>>>> list
>>>> similarly to a lockless list before traversing it. Also don't try 
>>>> to drop
>>>> resource references from the delayed destroy list traversal, but leave
>>>> that to the final object release. The role of the delayed destroy 
>>>> thread
>>>> will now simply be to try to idle the object and when idle, drop
>>>> the delayed destroy reference.
>>>>
>>>> This should pave the way for per-manager lru lists, sleeping 
>>>> eviction locks
>>>> that are kept, optional drm_mm eviction roster usage and moving 
>>>> over to a
>>>> completely lockless delayed destroy list even if the traversal order
>>>> will then change because there is no llist_add_tail().
>>>>
>>>> Cc: Christian König <christian.koenig@amd.com>
>>>> Cc: Daniel Vetter <daniel@ffwll.ch>
>>>> Signed-off-by: Thomas Hellström <thomas.hellstrom@linux.intel.com>
>>>> ---
>>>>   drivers/gpu/drm/ttm/ttm_bo.c     | 193 
>>>> +++++++++++++------------------
>>>>   drivers/gpu/drm/ttm/ttm_device.c |   1 +
>>>>   include/drm/ttm/ttm_device.h     |   1 +
>>>>   3 files changed, 82 insertions(+), 113 deletions(-)
>>>>
>>>> diff --git a/drivers/gpu/drm/ttm/ttm_bo.c 
>>>> b/drivers/gpu/drm/ttm/ttm_bo.c
>>>> index 6ab7b66ce36d..fd57252bc8fe 100644
>>>> --- a/drivers/gpu/drm/ttm/ttm_bo.c
>>>> +++ b/drivers/gpu/drm/ttm/ttm_bo.c
>>>> @@ -217,6 +217,9 @@ static int ttm_bo_handle_move_mem(struct 
>>>> ttm_buffer_object *bo,
>>>>     static void ttm_bo_cleanup_memtype_use(struct ttm_buffer_object 
>>>> *bo)
>>>>   {
>>>> +    if (kref_read(&bo->kref))
>>>> +        dma_resv_assert_held(bo->base.resv);
>>>> +
>>>>       if (bo->bdev->funcs->delete_mem_notify)
>>>>           bo->bdev->funcs->delete_mem_notify(bo);
>>>>   @@ -239,13 +242,18 @@ static int ttm_bo_individualize_resv(struct 
>>>> ttm_buffer_object *bo)
>>>>           return r;
>>>>         if (bo->type != ttm_bo_type_sg) {
>>>> -        /* This works because the BO is about to be destroyed and 
>>>> nobody
>>>> -         * reference it any more. The only tricky case is the 
>>>> trylock on
>>>> -         * the resv object while holding the lru_lock.
>>>> +        /*
>>>> +         * This works because nobody besides us can lock the bo or
>>>> +         * assume it is locked while its refcount is zero.
>>>>            */
>>>> -        spin_lock(&bo->bdev->lru_lock);
>>>> +        WARN_ON_ONCE(kref_read(&bo->kref));
>>>>           bo->base.resv = &bo->base._resv;
>>>> -        spin_unlock(&bo->bdev->lru_lock);
>>>> +        /*
>>>> +         * Don't reorder against kref_init().
>>>> +         * Matches the implicit full barrier of a successful
>>>> +         * kref_get_unless_zero() after a lru_list_lookup.
>>>> +         */
>>>> +        smp_mb();
>>>>       }
>>>>         return r;
>>>> @@ -274,122 +282,66 @@ static void ttm_bo_flush_all_fences(struct 
>>>> ttm_buffer_object *bo)
>>>>   }
>>>>     /**
>>>> - * function ttm_bo_cleanup_refs
>>>> - * If bo idle, remove from lru lists, and unref.
>>>> - * If not idle, block if possible.
>>>> - *
>>>> - * Must be called with lru_lock and reservation held, this function
>>>> - * will drop the lru lock and optionally the reservation lock 
>>>> before returning.
>>>> + * ttm_bo_cleanup_refs - idle and remove pages and gpu-bindings 
>>>> and remove
>>>> + * from lru_lists.
>>>>    *
>>>>    * @bo:                    The buffer object to clean-up
>>>>    * @interruptible:         Any sleeps should occur interruptibly.
>>>> - * @no_wait_gpu:           Never wait for gpu. Return -EBUSY instead.
>>>> - * @unlock_resv:           Unlock the reservation lock as well.
>>>> + * @no_wait_gpu:           Never wait for gpu. Return -EBUSY if 
>>>> not idle.
>>>>    */
>>>> -
>>>>   static int ttm_bo_cleanup_refs(struct ttm_buffer_object *bo,
>>>> -                   bool interruptible, bool no_wait_gpu,
>>>> -                   bool unlock_resv)
>>>> +                   bool interruptible, bool no_wait_gpu)
>>>>   {
>>>> -    struct dma_resv *resv = &bo->base._resv;
>>>>       int ret;
>>>>   -    if (dma_resv_test_signaled_rcu(resv, true))
>>>> -        ret = 0;
>>>> -    else
>>>> -        ret = -EBUSY;
>>>> -
>>>> -    if (ret && !no_wait_gpu) {
>>>> -        long lret;
>>>> -
>>>> -        if (unlock_resv)
>>>> -            dma_resv_unlock(bo->base.resv);
>>>> -        spin_unlock(&bo->bdev->lru_lock);
>>>> -
>>>> -        lret = dma_resv_wait_timeout_rcu(resv, true, interruptible,
>>>> -                         30 * HZ);
>>>> -
>>>> -        if (lret < 0)
>>>> -            return lret;
>>>> -        else if (lret == 0)
>>>> -            return -EBUSY;
>>>> -
>>>> -        spin_lock(&bo->bdev->lru_lock);
>>>> -        if (unlock_resv && !dma_resv_trylock(bo->base.resv)) {
>>>> -            /*
>>>> -             * We raced, and lost, someone else holds the 
>>>> reservation now,
>>>> -             * and is probably busy in ttm_bo_cleanup_memtype_use.
>>>> -             *
>>>> -             * Even if it's not the case, because we finished 
>>>> waiting any
>>>> -             * delayed destruction would succeed, so just return 
>>>> success
>>>> -             * here.
>>>> -             */
>>>> -            spin_unlock(&bo->bdev->lru_lock);
>>>> -            return 0;
>>>> -        }
>>>> -        ret = 0;
>>>> -    }
>>>> -
>>>> -    if (ret || unlikely(list_empty(&bo->ddestroy))) {
>>>> -        if (unlock_resv)
>>>> -            dma_resv_unlock(bo->base.resv);
>>>> -        spin_unlock(&bo->bdev->lru_lock);
>>>> +    dma_resv_assert_held(bo->base.resv);
>>>> +    WARN_ON_ONCE(!bo->deleted);
>>>> +    ret = ttm_bo_wait(bo, interruptible, no_wait_gpu);
>>>> +    if (ret)
>>>>           return ret;
>>>> -    }
>>>>   +    ttm_bo_cleanup_memtype_use(bo);
>>>> +    spin_lock(&bo->bdev->lru_lock);
>>>>       ttm_bo_del_from_lru(bo);
>>>> -    list_del_init(&bo->ddestroy);
>>>>       spin_unlock(&bo->bdev->lru_lock);
>>>> -    ttm_bo_cleanup_memtype_use(bo);
>>>> -
>>>> -    if (unlock_resv)
>>>> -        dma_resv_unlock(bo->base.resv);
>>>> -
>>>> -    ttm_bo_put(bo);
>>>>         return 0;
>>>>   }
>>>>     /*
>>>> - * Traverse the delayed list, and call ttm_bo_cleanup_refs on all
>>>> - * encountered buffers.
>>>> + * Isolate a part of the delayed destroy list and check for idle 
>>>> on all
>>>> + * buffer objects on it. If idle, remove from the list and drop the
>>>> + * delayed destroy refcount. Return all busy buffers to the delayed
>>>> + * destroy list.
>>>>    */
>>>>   bool ttm_bo_delayed_delete(struct ttm_device *bdev, bool remove_all)
>>>>   {
>>>> -    struct list_head removed;
>>>> -    bool empty;
>>>> -
>>>> -    INIT_LIST_HEAD(&removed);
>>>> -
>>>> -    spin_lock(&bdev->lru_lock);
>>>> -    while (!list_empty(&bdev->ddestroy)) {
>>>> -        struct ttm_buffer_object *bo;
>>>> -
>>>> -        bo = list_first_entry(&bdev->ddestroy, struct 
>>>> ttm_buffer_object,
>>>> -                      ddestroy);
>>>> -        list_move_tail(&bo->ddestroy, &removed);
>>>> -        if (!ttm_bo_get_unless_zero(bo))
>>>> -            continue;
>>>> -
>>>> -        if (remove_all || bo->base.resv != &bo->base._resv) {
>>>> -            spin_unlock(&bdev->lru_lock);
>>>> -            dma_resv_lock(bo->base.resv, NULL);
>>>> -
>>>> -            spin_lock(&bdev->lru_lock);
>>>> -            ttm_bo_cleanup_refs(bo, false, !remove_all, true);
>>>> -
>>>> -        } else if (dma_resv_trylock(bo->base.resv)) {
>>>> -            ttm_bo_cleanup_refs(bo, false, !remove_all, true);
>>>> -        } else {
>>>> -            spin_unlock(&bdev->lru_lock);
>>>> +    struct ttm_buffer_object *bo, *next;
>>>> +    struct list_head isolated;
>>>> +    bool empty, idle;
>>>> +
>>>> +    INIT_LIST_HEAD(&isolated);
>>>> +
>>>> +    spin_lock(&bdev->ddestroy_lock);
>>>> +    list_splice_init(&bdev->ddestroy, &isolated);
>>>> +    spin_unlock(&bdev->ddestroy_lock);
>>>> +    list_for_each_entry_safe(bo, next, &isolated, ddestroy) {
>>>> +        WARN_ON_ONCE(!kref_read(&bo->kref) ||
>>>> +                 bo->base.resv != &bo->base._resv);
>>>> +        if (!remove_all)
>>>> +            idle = dma_resv_test_signaled_rcu(bo->base.resv, true);
>>>> +        else
>>>> +            idle = (dma_resv_wait_timeout_rcu(bo->base.resv, true,
>>>> +                              false, 30 * HZ) > 0);
>>>> +        if (idle) {
>>>> +            list_del_init(&bo->ddestroy);
>>>> +            ttm_bo_put(bo);
>>>>           }
>>>> -
>>>> -        ttm_bo_put(bo);
>>>> -        spin_lock(&bdev->lru_lock);
>>>>       }
>>>> -    list_splice_tail(&removed, &bdev->ddestroy);
>>>> +    spin_lock(&bdev->ddestroy_lock);
>>>> +    list_splice(&isolated, &bdev->ddestroy);
>>>>       empty = list_empty(&bdev->ddestroy);
>>>> -    spin_unlock(&bdev->lru_lock);
>>>> +    spin_unlock(&bdev->ddestroy_lock);
>>>>         return empty;
>>>>   }
>>>> @@ -405,10 +357,16 @@ static void ttm_bo_release(struct kref *kref)
>>>>           ret = ttm_bo_individualize_resv(bo);
>>>>           if (ret) {
>>>>               /* Last resort, if we fail to allocate memory for the
>>>> -             * fences block for the BO to become idle
>>>> +             * fences block for the BO to become idle, and then
>>>> +             * we are free to update the resv pointer.
>>>>                */
>>>>               dma_resv_wait_timeout_rcu(bo->base.resv, true, false,
>>>>                             30 * HZ);
>>>> +            if (bo->type != ttm_bo_type_sg) {
>>>> +                bo->base.resv = &bo->base._resv;
>>>> +                /* Please see ttm_bo_individualize_resv() */
>>>> +                smp_mb();
>>>> +            }
>>>>           }
>>>>             if (bo->bdev->funcs->release_notify)
>>>> @@ -422,9 +380,8 @@ static void ttm_bo_release(struct kref *kref)
>>>>           !dma_resv_trylock(bo->base.resv)) {
>>>>           /* The BO is not idle, resurrect it for delayed destroy */
>>>>           ttm_bo_flush_all_fences(bo);
>>>> -        bo->deleted = true;
>>>> -
>>>>           spin_lock(&bo->bdev->lru_lock);
>>>> +        bo->deleted = true;
>>>>             /*
>>>>            * Make pinned bos immediately available to
>>>> @@ -439,8 +396,10 @@ static void ttm_bo_release(struct kref *kref)
>>>>               ttm_bo_move_to_lru_tail(bo, &bo->mem, NULL);
>>>>           }
>>>>   +        spin_lock(&bo->bdev->ddestroy_lock);
>>>>           kref_init(&bo->kref);
>>>>           list_add_tail(&bo->ddestroy, &bdev->ddestroy);
>>>> +        spin_unlock(&bo->bdev->ddestroy_lock);
>>>>           spin_unlock(&bo->bdev->lru_lock);
>>>>             schedule_delayed_work(&bdev->wq,
>>>> @@ -450,9 +409,11 @@ static void ttm_bo_release(struct kref *kref)
>>>>         spin_lock(&bo->bdev->lru_lock);
>>>>       ttm_bo_del_from_lru(bo);
>>>> -    list_del(&bo->ddestroy);
>>>>       spin_unlock(&bo->bdev->lru_lock);
>>>>   +    pr_info("Deleting.\n");
>>>> + WARN_ON_ONCE(!list_empty_careful(&bo->ddestroy));
>>>> +
>>>>       ttm_bo_cleanup_memtype_use(bo);
>>>>       dma_resv_unlock(bo->base.resv);
>>>>   @@ -630,25 +591,26 @@ int ttm_mem_evict_first(struct ttm_device 
>>>> *bdev,
>>>>           list_for_each_entry(bo, &man->lru[i], lru) {
>>>>               bool busy;
>>>>   +            if (!ttm_bo_get_unless_zero(bo))
>>>> +                continue;
>>>> +
>>>>               if (!ttm_bo_evict_swapout_allowable(bo, ctx, &locked,
>>>>                                   &busy)) {
>>>>                   if (busy && !busy_bo && ticket !=
>>>> dma_resv_locking_ctx(bo->base.resv))
>>>>                       busy_bo = bo;
>>>> +                ttm_bo_put(bo);
>>>>                   continue;
>>>>               }
>>>>                 if (place && !bdev->funcs->eviction_valuable(bo,
>>>>                                         place)) {
>>>> +                ttm_bo_put(bo);
>>>>                   if (locked)
>>>>                       dma_resv_unlock(bo->base.resv);
>>>>                   continue;
>>>>               }
>>>> -            if (!ttm_bo_get_unless_zero(bo)) {
>>>> -                if (locked)
>>>> -                    dma_resv_unlock(bo->base.resv);
>>>> -                continue;
>>>> -            }
>>>> +
>>>>               break;
>>>>           }
>>>>   @@ -670,8 +632,11 @@ int ttm_mem_evict_first(struct ttm_device 
>>>> *bdev,
>>>>       }
>>>>         if (bo->deleted) {
>>>> +        spin_unlock(&bdev->lru_lock);
>>>>           ret = ttm_bo_cleanup_refs(bo, ctx->interruptible,
>>>> -                      ctx->no_wait_gpu, locked);
>>>> +                      ctx->no_wait_gpu);
>>>> +        if (locked)
>>>> +            dma_resv_unlock(bo->base.resv);
>>>>           ttm_bo_put(bo);
>>>>           return ret;
>>>>       }
>>>> @@ -1168,17 +1133,19 @@ int ttm_bo_swapout(struct ttm_buffer_object 
>>>> *bo, struct ttm_operation_ctx *ctx,
>>>>       bool locked;
>>>>       int ret;
>>>>   -    if (!ttm_bo_evict_swapout_allowable(bo, ctx, &locked, NULL))
>>>> +    if (!ttm_bo_get_unless_zero(bo))
>>>>           return -EBUSY;
>>>>   -    if (!ttm_bo_get_unless_zero(bo)) {
>>>> -        if (locked)
>>>> -            dma_resv_unlock(bo->base.resv);
>>>> +    if (!ttm_bo_evict_swapout_allowable(bo, ctx, &locked, NULL)) {
>>>> +        ttm_bo_put(bo);
>>>>           return -EBUSY;
>>>>       }
>>>>         if (bo->deleted) {
>>>> -        ttm_bo_cleanup_refs(bo, false, false, locked);
>>>> +        spin_unlock(&bo->bdev->lru_lock);
>>>> +        ttm_bo_cleanup_refs(bo, false, false);
>>>> +        if (locked)
>>>> +            dma_resv_unlock(bo->base.resv);
>>>>           ttm_bo_put(bo);
>>>>           return 0;
>>>>       }
>>>> diff --git a/drivers/gpu/drm/ttm/ttm_device.c 
>>>> b/drivers/gpu/drm/ttm/ttm_device.c
>>>> index 9b787b3caeb5..b941989885ed 100644
>>>> --- a/drivers/gpu/drm/ttm/ttm_device.c
>>>> +++ b/drivers/gpu/drm/ttm/ttm_device.c
>>>> @@ -228,6 +228,7 @@ int ttm_device_init(struct ttm_device *bdev, 
>>>> struct ttm_device_funcs *funcs,
>>>>       bdev->vma_manager = vma_manager;
>>>>       INIT_DELAYED_WORK(&bdev->wq, ttm_device_delayed_workqueue);
>>>>       spin_lock_init(&bdev->lru_lock);
>>>> +    spin_lock_init(&bdev->ddestroy_lock);
>>>>       INIT_LIST_HEAD(&bdev->ddestroy);
>>>>       bdev->dev_mapping = mapping;
>>>>       mutex_lock(&ttm_global_mutex);
>>>> diff --git a/include/drm/ttm/ttm_device.h 
>>>> b/include/drm/ttm/ttm_device.h
>>>> index 7c8f87bd52d3..fa8c4f6a169e 100644
>>>> --- a/include/drm/ttm/ttm_device.h
>>>> +++ b/include/drm/ttm/ttm_device.h
>>>> @@ -279,6 +279,7 @@ struct ttm_device {
>>>>        * Protection for the per manager LRU and ddestroy lists.
>>>>        */
>>>>       spinlock_t lru_lock;
>>>> +    spinlock_t ddestroy_lock;
>>>>       struct list_head ddestroy;
>>>>         /*
>>>
>
Christian König April 12, 2021, 2:44 p.m. UTC | #5
On 12.04.21 at 16:40, Thomas Hellström wrote:
>
> On 4/12/21 4:21 PM, Christian König wrote:
>>
>>
>> Am 12.04.21 um 16:16 schrieb Thomas Hellström:
>>> Hi, Christian,
>>>
>>> On 4/12/21 4:01 PM, Christian König wrote:
>>>> Hi Thomas,
>>>>
>>>> well in general a good idea, but I'm working on a different plan 
>>>> for a while now.
>>>>
>>>> My idea here is that instead of the BO the resource object is kept 
>>>> on a double linked lru list.
>>>>
>>>> The resource objects then have a pointer to either the BO or a 
>>>> fence object.
>>>>
>>>> When it is a fence object we can just grab a reference to it and 
>>>> wait for it to complete.
>>>>
>>>> If it is a BO we evict it the same way we currently do.
>>>>
>>>> This allows to remove both the delayed delete, individualization of 
>>>> BOs, ghost objects etc...
>>>
>>> Hmm, ok. So in that case, what would trigger the final release of 
>>> the buffer object in the absence of a delayed delete list? Would we 
>>> use a fence callback for that?
>>
>> Key point is you don't need any final release of the BO any more. 
>> When the BOs reference count becomes zero it can be destructed 
>> immediately.
>>
>> Only the resource object is kept around and protected by a fence 
>> until it can be released finally.
>>
>> How that resource object is then destroyed could be handled in 
>> different ways, we could use a delayed work, shrinker callback or 
>> just rely on new allocations to scan the deleted objects.
>
> Hmm, ok, but what about when the object is backed by pages? Would the 
> pages be attached to the resource object or how do we know when to 
> free them?

Yes, that's exactly the idea.

Basically the ttm pointer will be moved into the resource object as well.

Christian.
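
As a purely illustrative sketch of the scheme described above (the names
below are invented for this sketch and are not actual TTM code), a
free-standing resource could look roughly like this:

struct example_resource {
	struct list_head lru;		/* on the resource manager's LRU */
	struct ttm_buffer_object *bo;	/* NULL once the BO is destroyed */
	struct dma_fence *fence;	/* last use, valid when bo == NULL */
	struct ttm_tt *ttm;		/* backing pages move in here too */
};

/*
 * Eviction then needs only one extra branch: a free-standing resource
 * is "evicted" by waiting for (or taking a reference to) its fence.
 */
static int example_evict(struct example_resource *res, bool no_wait)
{
	if (!res->bo) {
		if (no_wait)
			return dma_fence_is_signaled(res->fence) ? 0 : -EBUSY;
		return dma_fence_wait(res->fence, false);
	}
	/* Otherwise evict the backing BO the same way as today. */
	return -ENOSYS;	/* placeholder in this sketch */
}

The point being that once the BO is gone, everything eviction still needs,
the fence and the backing pages, lives in the resource itself.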

Daniel Vetter April 12, 2021, 3:43 p.m. UTC | #6
On Mon, Apr 12, 2021 at 04:21:37PM +0200, Christian König wrote:
> 
> 
> Am 12.04.21 um 16:16 schrieb Thomas Hellström:
> > Hi, Christian,
> > 
> > On 4/12/21 4:01 PM, Christian König wrote:
> > > Hi Thomas,
> > > 
> > > well in general a good idea, but I'm working on a different plan for
> > > a while now.
> > > 
> > > My idea here is that instead of the BO the resource object is kept
> > > on a double linked lru list.
> > > 
> > > The resource objects then have a pointer to either the BO or a fence
> > > object.
> > > 
> > > When it is a fence object we can just grab a reference to it and
> > > wait for it to complete.
> > > 
> > > If it is a BO we evict it the same way we currently do.
> > > 
> > > This allows to remove both the delayed delete, individualization of
> > > BOs, ghost objects etc...
> > 
> > Hmm, ok. So in that case, what would trigger the final release of the
> > buffer object in the absence of a delayed delete list? Would we use a
> > fence callback for that?
> 
> Key point is you don't need any final release of the BO any more. When the
> BOs reference count becomes zero it can be destructed immediately.
> 
> Only the resource object is kept around and protected by a fence until it
> can be released finally.

I was reading dma_resv here for a second, and wondered how we figure out
the refcounting for that. But since you aim for a fence, that's
refcounted, and as long as we make sure the lifetime rules for resource
objects in this free-standing mode are very clear (the lru owns it until
we remove it while holding the lru lock, which should work?), then I
think this is a rather clean approach.

Also non-freestanding resource objects are fully owned by their objects.
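
Purely as an illustration of that ownership rule (reusing the invented
example_resource from the sketch above, plus an invented example_manager
that just bundles the lru lock and the lru list), reclaim then becomes a
removal under the lru lock followed by freeing whatever has already
signaled:

static void example_reap_idle(struct example_manager *man)
{
	struct example_resource *res, *next;
	LIST_HEAD(reaped);

	spin_lock(&man->lru_lock);
	list_for_each_entry_safe(res, next, &man->lru, lru) {
		if (res->bo || !dma_fence_is_signaled(res->fence))
			continue;
		/* The lru held the only reference; ownership moves to us. */
		list_move(&res->lru, &reaped);
	}
	spin_unlock(&man->lru_lock);

	list_for_each_entry_safe(res, next, &reaped, lru) {
		dma_fence_put(res->fence);
		/* free backing pages / allocator node here, then: */
		kfree(res);
	}
}

That is the "lru owns it, removed under the lru lock" rule in code form;
no extra refcount on the free-standing resource is needed.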

> How that resource object is then destroyed could be handled in different
> ways, we could use a delayed work, shrinker callback or just rely on new
> allocations to scan the deleted objects.
> 
> > 
> > Otherwise sounds like a good idea, and if it runs into problems, we
> > could possibly resurrect this.
> 
> Well the main problem is that I need to change all drivers so that bo->mem
> can now be a pointer. IIRC you suggested that years ago and I'm working on
> the details of that idea ever since.

Yeah, that's going to be a pile :-) But I think we've reached the same
conclusion on i915 while polishing our lmem manager (atm it's ghost objects
all the way down), before we decided to adopt ttm directly.

So this all makes sense to me.
-Daniel

Thomas Hellström April 12, 2021, 3:49 p.m. UTC | #7
On Mon, 2021-04-12 at 17:43 +0200, Daniel Vetter wrote:
> On Mon, Apr 12, 2021 at 04:21:37PM +0200, Christian König wrote:
> > 
> > 
> > Am 12.04.21 um 16:16 schrieb Thomas Hellström:
> > > Hi, Christian,
> > > 
> > > On 4/12/21 4:01 PM, Christian König wrote:
> > > > Hi Thomas,
> > > > 
> > > > well in general a good idea, but I'm working on a different
> > > > plan for
> > > > a while now.
> > > > 
> > > > My idea here is that instead of the BO the resource object is
> > > > kept
> > > > on a double linked lru list.
> > > > 
> > > > The resource objects then have a pointer to either the BO or a
> > > > fence
> > > > object.
> > > > 
> > > > When it is a fence object we can just grab a reference to it
> > > > and
> > > > wait for it to complete.
> > > > 
> > > > If it is a BO we evict it the same way we currently do.
> > > > 
> > > > This allows to remove both the delayed delete,
> > > > individualization of
> > > > BOs, ghost objects etc...
> > > 
> > > Hmm, ok. So in that case, what would trigger the final release of
> > > the
> > > buffer object in the absence of a delayed delete list? Would we
> > > use a
> > > fence callback for that?
> > 
> > Key point is you don't need any final release of the BO any more.
> > When the
> > BOs reference count becomes zero it can be destructed immediately.
> > 
> > Only the resource object is kept around and protected by a fence
> > until it
> > can be released finally.
> 
> I was reading dma_resv here for a second, and wondered how we figure
> out
> the refcounting for that. But since you aim for a fence, that's
> refcounted, 

Hmm, good point. What about objects with multiple shared fences?

/Thomas
Daniel Vetter April 12, 2021, 4:39 p.m. UTC | #8
On Mon, Apr 12, 2021 at 05:49:50PM +0200, Thomas Hellström wrote:
> On Mon, 2021-04-12 at 17:43 +0200, Daniel Vetter wrote:
> > On Mon, Apr 12, 2021 at 04:21:37PM +0200, Christian König wrote:
> > > 
> > > 
> > > Am 12.04.21 um 16:16 schrieb Thomas Hellström:
> > > > Hi, Christian,
> > > > 
> > > > On 4/12/21 4:01 PM, Christian König wrote:
> > > > > Hi Thomas,
> > > > > 
> > > > > well in general a good idea, but I'm working on a different
> > > > > plan for
> > > > > a while now.
> > > > > 
> > > > > My idea here is that instead of the BO the resource object is
> > > > > kept
> > > > > on a double linked lru list.
> > > > > 
> > > > > The resource objects then have a pointer to either the BO or a
> > > > > fence
> > > > > object.
> > > > > 
> > > > > When it is a fence object we can just grab a reference to it
> > > > > and
> > > > > wait for it to complete.
> > > > > 
> > > > > If it is a BO we evict it the same way we currently do.
> > > > > 
> > > > > This allows to remove both the delayed delete,
> > > > > individualization of
> > > > > BOs, ghost objects etc...
> > > > 
> > > > Hmm, ok. So in that case, what would trigger the final release of
> > > > the
> > > > buffer object in the absence of a delayed delete list? Would we
> > > > use a
> > > > fence callback for that?
> > > 
> > > Key point is you don't need any final release of the BO any more.
> > > When the
> > > BOs reference count becomes zero it can be destructed immediately.
> > > 
> > > Only the resource object is kept around and protected by a fence
> > > until it
> > > can be released finally.
> > 
> > I was reading dma_resv here for a second, and wondered how we figure
> > out
> > the refcounting for that. But since you aim for a fence, that's
> > refcounted, 
> 
> Hmm, Good point. What about objects with multiple shared fences?

Package it up into a new fence as part of the privatization step. We
already have the helpers for that, I think.

It does mean a memory allocation in there, but I /think/ that's fine.
-Daniel
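
To make that concrete, here is a rough sketch of packaging the fences of a
resv into one fence, using the dma_resv_get_fences_rcu() and
dma_fence_array_create() helpers that exist at this point; error handling
and the exclusive fence are only treated minimally, so this is an
illustration rather than a ready implementation:

static struct dma_fence *example_merge_fences(struct dma_resv *resv)
{
	struct dma_fence *excl, **shared;
	struct dma_fence_array *array;
	unsigned int count;
	int ret;

	ret = dma_resv_get_fences_rcu(resv, &excl, &count, &shared);
	if (ret)
		return ERR_PTR(ret);

	if (!count) {
		kfree(shared);
		return excl;	/* nothing, or only the exclusive fence */
	}

	/* A complete version would also fold the exclusive fence in. */
	dma_fence_put(excl);

	/*
	 * The one memory allocation mentioned above: a fence array that
	 * signals once all shared fences have signaled. On success it
	 * takes over the shared[] array and the references it holds.
	 */
	array = dma_fence_array_create(count, shared,
				       dma_fence_context_alloc(1), 1, false);
	if (!array) {
		while (count--)
			dma_fence_put(shared[count]);
		kfree(shared);
		return ERR_PTR(-ENOMEM);
	}

	return &array->base;
}

The resulting fence can then be stored in the free-standing resource and
waited on, or reference-counted, like any other fence, at the cost of the
single allocation mentioned above.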
diff mbox series

Patch

diff --git a/drivers/gpu/drm/ttm/ttm_bo.c b/drivers/gpu/drm/ttm/ttm_bo.c
index 6ab7b66ce36d..fd57252bc8fe 100644
--- a/drivers/gpu/drm/ttm/ttm_bo.c
+++ b/drivers/gpu/drm/ttm/ttm_bo.c
@@ -217,6 +217,9 @@  static int ttm_bo_handle_move_mem(struct ttm_buffer_object *bo,
 
 static void ttm_bo_cleanup_memtype_use(struct ttm_buffer_object *bo)
 {
+	if (kref_read(&bo->kref))
+		dma_resv_assert_held(bo->base.resv);
+
 	if (bo->bdev->funcs->delete_mem_notify)
 		bo->bdev->funcs->delete_mem_notify(bo);
 
@@ -239,13 +242,18 @@  static int ttm_bo_individualize_resv(struct ttm_buffer_object *bo)
 		return r;
 
 	if (bo->type != ttm_bo_type_sg) {
-		/* This works because the BO is about to be destroyed and nobody
-		 * reference it any more. The only tricky case is the trylock on
-		 * the resv object while holding the lru_lock.
+		/*
+		 * This works because nobody besides us can lock the bo or
+		 * assume it is locked while its refcount is zero.
 		 */
-		spin_lock(&bo->bdev->lru_lock);
+		WARN_ON_ONCE(kref_read(&bo->kref));
 		bo->base.resv = &bo->base._resv;
-		spin_unlock(&bo->bdev->lru_lock);
+		/*
+		 * Don't reorder against kref_init().
+		 * Matches the implicit full barrier of a successful
+		 * kref_get_unless_zero() after a lru_list_lookup.
+		 */
+		smp_mb();
 	}
 
 	return r;
@@ -274,122 +282,66 @@  static void ttm_bo_flush_all_fences(struct ttm_buffer_object *bo)
 }
 
 /**
- * function ttm_bo_cleanup_refs
- * If bo idle, remove from lru lists, and unref.
- * If not idle, block if possible.
- *
- * Must be called with lru_lock and reservation held, this function
- * will drop the lru lock and optionally the reservation lock before returning.
+ * ttm_bo_cleanup_refs - idle and remove pages and gpu-bindings and remove
+ * from lru_lists.
  *
  * @bo:                    The buffer object to clean-up
  * @interruptible:         Any sleeps should occur interruptibly.
- * @no_wait_gpu:           Never wait for gpu. Return -EBUSY instead.
- * @unlock_resv:           Unlock the reservation lock as well.
+ * @no_wait_gpu:           Never wait for gpu. Return -EBUSY if not idle.
  */
-
 static int ttm_bo_cleanup_refs(struct ttm_buffer_object *bo,
-			       bool interruptible, bool no_wait_gpu,
-			       bool unlock_resv)
+			       bool interruptible, bool no_wait_gpu)
 {
-	struct dma_resv *resv = &bo->base._resv;
 	int ret;
 
-	if (dma_resv_test_signaled_rcu(resv, true))
-		ret = 0;
-	else
-		ret = -EBUSY;
-
-	if (ret && !no_wait_gpu) {
-		long lret;
-
-		if (unlock_resv)
-			dma_resv_unlock(bo->base.resv);
-		spin_unlock(&bo->bdev->lru_lock);
-
-		lret = dma_resv_wait_timeout_rcu(resv, true, interruptible,
-						 30 * HZ);
-
-		if (lret < 0)
-			return lret;
-		else if (lret == 0)
-			return -EBUSY;
-
-		spin_lock(&bo->bdev->lru_lock);
-		if (unlock_resv && !dma_resv_trylock(bo->base.resv)) {
-			/*
-			 * We raced, and lost, someone else holds the reservation now,
-			 * and is probably busy in ttm_bo_cleanup_memtype_use.
-			 *
-			 * Even if it's not the case, because we finished waiting any
-			 * delayed destruction would succeed, so just return success
-			 * here.
-			 */
-			spin_unlock(&bo->bdev->lru_lock);
-			return 0;
-		}
-		ret = 0;
-	}
-
-	if (ret || unlikely(list_empty(&bo->ddestroy))) {
-		if (unlock_resv)
-			dma_resv_unlock(bo->base.resv);
-		spin_unlock(&bo->bdev->lru_lock);
+	dma_resv_assert_held(bo->base.resv);
+	WARN_ON_ONCE(!bo->deleted);
+	ret = ttm_bo_wait(bo, interruptible, no_wait_gpu);
+	if (ret)
 		return ret;
-	}
 
+	ttm_bo_cleanup_memtype_use(bo);
+	spin_lock(&bo->bdev->lru_lock);
 	ttm_bo_del_from_lru(bo);
-	list_del_init(&bo->ddestroy);
 	spin_unlock(&bo->bdev->lru_lock);
-	ttm_bo_cleanup_memtype_use(bo);
-
-	if (unlock_resv)
-		dma_resv_unlock(bo->base.resv);
-
-	ttm_bo_put(bo);
 
 	return 0;
 }
 
 /*
- * Traverse the delayed list, and call ttm_bo_cleanup_refs on all
- * encountered buffers.
+ * Isolate a part of the delayed destroy list and check for idle on all
+ * buffer objects on it. If idle, remove from the list and drop the
+ * delayed destroy refcount. Return all busy buffers to the delayed
+ * destroy list.
  */
 bool ttm_bo_delayed_delete(struct ttm_device *bdev, bool remove_all)
 {
-	struct list_head removed;
-	bool empty;
-
-	INIT_LIST_HEAD(&removed);
-
-	spin_lock(&bdev->lru_lock);
-	while (!list_empty(&bdev->ddestroy)) {
-		struct ttm_buffer_object *bo;
-
-		bo = list_first_entry(&bdev->ddestroy, struct ttm_buffer_object,
-				      ddestroy);
-		list_move_tail(&bo->ddestroy, &removed);
-		if (!ttm_bo_get_unless_zero(bo))
-			continue;
-
-		if (remove_all || bo->base.resv != &bo->base._resv) {
-			spin_unlock(&bdev->lru_lock);
-			dma_resv_lock(bo->base.resv, NULL);
-
-			spin_lock(&bdev->lru_lock);
-			ttm_bo_cleanup_refs(bo, false, !remove_all, true);
-
-		} else if (dma_resv_trylock(bo->base.resv)) {
-			ttm_bo_cleanup_refs(bo, false, !remove_all, true);
-		} else {
-			spin_unlock(&bdev->lru_lock);
+	struct ttm_buffer_object *bo, *next;
+	struct list_head isolated;
+	bool empty, idle;
+
+	INIT_LIST_HEAD(&isolated);
+
+	spin_lock(&bdev->ddestroy_lock);
+	list_splice_init(&bdev->ddestroy, &isolated);
+	spin_unlock(&bdev->ddestroy_lock);
+	list_for_each_entry_safe(bo, next, &isolated, ddestroy) {
+		WARN_ON_ONCE(!kref_read(&bo->kref) ||
+			     bo->base.resv != &bo->base._resv);
+		if (!remove_all)
+			idle = dma_resv_test_signaled_rcu(bo->base.resv, true);
+		else
+			idle = (dma_resv_wait_timeout_rcu(bo->base.resv, true,
+							  false, 30 * HZ) > 0);
+		if (idle) {
+			list_del_init(&bo->ddestroy);
+			ttm_bo_put(bo);
 		}
-
-		ttm_bo_put(bo);
-		spin_lock(&bdev->lru_lock);
 	}
-	list_splice_tail(&removed, &bdev->ddestroy);
+	spin_lock(&bdev->ddestroy_lock);
+	list_splice(&isolated, &bdev->ddestroy);
 	empty = list_empty(&bdev->ddestroy);
-	spin_unlock(&bdev->lru_lock);
+	spin_unlock(&bdev->ddestroy_lock);
 
 	return empty;
 }
@@ -405,10 +357,16 @@  static void ttm_bo_release(struct kref *kref)
 		ret = ttm_bo_individualize_resv(bo);
 		if (ret) {
 			/* Last resort, if we fail to allocate memory for the
-			 * fences block for the BO to become idle
+			 * fences, block for the BO to become idle, and then
+			 * we are free to update the resv pointer.
 			 */
 			dma_resv_wait_timeout_rcu(bo->base.resv, true, false,
 						  30 * HZ);
+			if (bo->type != ttm_bo_type_sg) {
+				bo->base.resv = &bo->base._resv;
+				/* Please see ttm_bo_individualize_resv() */
+				smp_mb();
+			}
 		}
 
 		if (bo->bdev->funcs->release_notify)
@@ -422,9 +380,8 @@  static void ttm_bo_release(struct kref *kref)
 	    !dma_resv_trylock(bo->base.resv)) {
 		/* The BO is not idle, resurrect it for delayed destroy */
 		ttm_bo_flush_all_fences(bo);
-		bo->deleted = true;
-
 		spin_lock(&bo->bdev->lru_lock);
+		bo->deleted = true;
 
 		/*
 		 * Make pinned bos immediately available to
@@ -439,8 +396,10 @@  static void ttm_bo_release(struct kref *kref)
 			ttm_bo_move_to_lru_tail(bo, &bo->mem, NULL);
 		}
 
+		spin_lock(&bo->bdev->ddestroy_lock);
 		kref_init(&bo->kref);
 		list_add_tail(&bo->ddestroy, &bdev->ddestroy);
+		spin_unlock(&bo->bdev->ddestroy_lock);
 		spin_unlock(&bo->bdev->lru_lock);
 
 		schedule_delayed_work(&bdev->wq,
@@ -450,9 +409,11 @@  static void ttm_bo_release(struct kref *kref)
 
 	spin_lock(&bo->bdev->lru_lock);
 	ttm_bo_del_from_lru(bo);
-	list_del(&bo->ddestroy);
 	spin_unlock(&bo->bdev->lru_lock);
 
+	pr_info("Deleting.\n");
+	WARN_ON_ONCE(!list_empty_careful(&bo->ddestroy));
+
 	ttm_bo_cleanup_memtype_use(bo);
 	dma_resv_unlock(bo->base.resv);
 
@@ -630,25 +591,26 @@  int ttm_mem_evict_first(struct ttm_device *bdev,
 		list_for_each_entry(bo, &man->lru[i], lru) {
 			bool busy;
 
+			if (!ttm_bo_get_unless_zero(bo))
+				continue;
+
 			if (!ttm_bo_evict_swapout_allowable(bo, ctx, &locked,
 							    &busy)) {
 				if (busy && !busy_bo && ticket !=
 				    dma_resv_locking_ctx(bo->base.resv))
 					busy_bo = bo;
+				ttm_bo_put(bo);
 				continue;
 			}
 
 			if (place && !bdev->funcs->eviction_valuable(bo,
 								      place)) {
+				ttm_bo_put(bo);
 				if (locked)
 					dma_resv_unlock(bo->base.resv);
 				continue;
 			}
-			if (!ttm_bo_get_unless_zero(bo)) {
-				if (locked)
-					dma_resv_unlock(bo->base.resv);
-				continue;
-			}
+
 			break;
 		}
 
@@ -670,8 +632,11 @@  int ttm_mem_evict_first(struct ttm_device *bdev,
 	}
 
 	if (bo->deleted) {
+		spin_unlock(&bdev->lru_lock);
 		ret = ttm_bo_cleanup_refs(bo, ctx->interruptible,
-					  ctx->no_wait_gpu, locked);
+					  ctx->no_wait_gpu);
+		if (locked)
+			dma_resv_unlock(bo->base.resv);
 		ttm_bo_put(bo);
 		return ret;
 	}
@@ -1168,17 +1133,19 @@  int ttm_bo_swapout(struct ttm_buffer_object *bo, struct ttm_operation_ctx *ctx,
 	bool locked;
 	int ret;
 
-	if (!ttm_bo_evict_swapout_allowable(bo, ctx, &locked, NULL))
+	if (!ttm_bo_get_unless_zero(bo))
 		return -EBUSY;
 
-	if (!ttm_bo_get_unless_zero(bo)) {
-		if (locked)
-			dma_resv_unlock(bo->base.resv);
+	if (!ttm_bo_evict_swapout_allowable(bo, ctx, &locked, NULL)) {
+		ttm_bo_put(bo);
 		return -EBUSY;
 	}
 
 	if (bo->deleted) {
-		ttm_bo_cleanup_refs(bo, false, false, locked);
+		spin_unlock(&bo->bdev->lru_lock);
+		ttm_bo_cleanup_refs(bo, false, false);
+		if (locked)
+			dma_resv_unlock(bo->base.resv);
 		ttm_bo_put(bo);
 		return 0;
 	}
diff --git a/drivers/gpu/drm/ttm/ttm_device.c b/drivers/gpu/drm/ttm/ttm_device.c
index 9b787b3caeb5..b941989885ed 100644
--- a/drivers/gpu/drm/ttm/ttm_device.c
+++ b/drivers/gpu/drm/ttm/ttm_device.c
@@ -228,6 +228,7 @@  int ttm_device_init(struct ttm_device *bdev, struct ttm_device_funcs *funcs,
 	bdev->vma_manager = vma_manager;
 	INIT_DELAYED_WORK(&bdev->wq, ttm_device_delayed_workqueue);
 	spin_lock_init(&bdev->lru_lock);
+	spin_lock_init(&bdev->ddestroy_lock);
 	INIT_LIST_HEAD(&bdev->ddestroy);
 	bdev->dev_mapping = mapping;
 	mutex_lock(&ttm_global_mutex);
diff --git a/include/drm/ttm/ttm_device.h b/include/drm/ttm/ttm_device.h
index 7c8f87bd52d3..fa8c4f6a169e 100644
--- a/include/drm/ttm/ttm_device.h
+++ b/include/drm/ttm/ttm_device.h
@@ -279,6 +279,7 @@  struct ttm_device {
 	 * Protection for the per manager LRU and ddestroy lists.
 	 */
 	spinlock_t lru_lock;
+	spinlock_t ddestroy_lock;
 	struct list_head ddestroy;
 
 	/*