diff mbox series

[v3,08/17] drm/i915: Call i915_gem_evict_vm in vm_fault_gtt to prevent new ENOSPC errors

Message ID 20211216142749.1966107-9-maarten.lankhorst@linux.intel.com (mailing list archive)
State New, archived
Headers show
Series drm/i915: Remove short term pins from execbuf by requiring lock to unbind. | expand

Commit Message

Maarten Lankhorst Dec. 16, 2021, 2:27 p.m. UTC
Now that we cannot unbind kill the currently locked object directly
because we're removing short term pinning, we may have to unbind the
object from gtt manually, using a i915_gem_evict_vm() call.

Signed-off-by: Maarten Lankhorst <maarten.lankhorst@linux.intel.com>
---
 drivers/gpu/drm/i915/gem/i915_gem_mman.c | 18 ++++++++++++++++--
 1 file changed, 16 insertions(+), 2 deletions(-)

Comments

Matthew Auld Dec. 17, 2021, 11:58 a.m. UTC | #1
On Thu, 16 Dec 2021 at 14:28, Maarten Lankhorst
<maarten.lankhorst@linux.intel.com> wrote:
>
> Now that we cannot unbind kill the currently locked object directly

"unbind kill"

> because we're removing short term pinning, we may have to unbind the
> object from gtt manually, using a i915_gem_evict_vm() call.
>
> Signed-off-by: Maarten Lankhorst <maarten.lankhorst@linux.intel.com>

Maybe mention that this only in preparation for some future patches,
once the actual eviction is trylock and evict_for_vm can also handle
shared dma-resv? At this point in the series we shouldn't expect to
hit -ENOSPC, right?

> ---
>  drivers/gpu/drm/i915/gem/i915_gem_mman.c | 18 ++++++++++++++++--
>  1 file changed, 16 insertions(+), 2 deletions(-)
>
> diff --git a/drivers/gpu/drm/i915/gem/i915_gem_mman.c b/drivers/gpu/drm/i915/gem/i915_gem_mman.c
> index af81d6c3332a..00cd9642669a 100644
> --- a/drivers/gpu/drm/i915/gem/i915_gem_mman.c
> +++ b/drivers/gpu/drm/i915/gem/i915_gem_mman.c
> @@ -358,8 +358,22 @@ static vm_fault_t vm_fault_gtt(struct vm_fault *vmf)
>                         vma = i915_gem_object_ggtt_pin_ww(obj, &ww, &view, 0, 0, flags);
>                 }
>
> -               /* The entire mappable GGTT is pinned? Unexpected! */
> -               GEM_BUG_ON(vma == ERR_PTR(-ENOSPC));
> +               /*
> +                * The entire mappable GGTT is pinned? Unexpected!
> +                * Try to evict the object we locked too, as normally we skip it
> +                * due to lack of short term pinning inside execbuf.
> +                */
> +               if (vma == ERR_PTR(-ENOSPC)) {
> +                       ret = mutex_lock_interruptible(&ggtt->vm.mutex);
> +                       if (!ret) {
> +                               ret = i915_gem_evict_vm(&ggtt->vm);
> +                               mutex_unlock(&ggtt->vm.mutex);
> +                       }
> +                       if (ret)
> +                               goto err_reset;
> +                       vma = i915_gem_object_ggtt_pin_ww(obj, &ww, &view, 0, 0, flags);
> +               }
> +               GEM_WARN_ON(vma == ERR_PTR(-ENOSPC));

Looks like this is being triggered in CI, I assume because the trylock
could easily fail, due to contention? Is this expected for now? Do we
keep the WARN and track it as a known issue?

>         }
>         if (IS_ERR(vma)) {
>                 ret = PTR_ERR(vma);
> --
> 2.34.1
>
Maarten Lankhorst Dec. 17, 2021, 3:29 p.m. UTC | #2
On 17-12-2021 12:58, Matthew Auld wrote:
> On Thu, 16 Dec 2021 at 14:28, Maarten Lankhorst
> <maarten.lankhorst@linux.intel.com> wrote:
>> Now that we cannot unbind kill the currently locked object directly
> "unbind kill"
>
>> because we're removing short term pinning, we may have to unbind the
>> object from gtt manually, using a i915_gem_evict_vm() call.
>>
>> Signed-off-by: Maarten Lankhorst <maarten.lankhorst@linux.intel.com>
> Maybe mention that this only in preparation for some future patches,
> once the actual eviction is trylock and evict_for_vm can also handle
> shared dma-resv? At this point in the series we shouldn't expect to
> hit -ENOSPC, right?
>
>> ---
>>  drivers/gpu/drm/i915/gem/i915_gem_mman.c | 18 ++++++++++++++++--
>>  1 file changed, 16 insertions(+), 2 deletions(-)
>>
>> diff --git a/drivers/gpu/drm/i915/gem/i915_gem_mman.c b/drivers/gpu/drm/i915/gem/i915_gem_mman.c
>> index af81d6c3332a..00cd9642669a 100644
>> --- a/drivers/gpu/drm/i915/gem/i915_gem_mman.c
>> +++ b/drivers/gpu/drm/i915/gem/i915_gem_mman.c
>> @@ -358,8 +358,22 @@ static vm_fault_t vm_fault_gtt(struct vm_fault *vmf)
>>                         vma = i915_gem_object_ggtt_pin_ww(obj, &ww, &view, 0, 0, flags);
>>                 }
>>
>> -               /* The entire mappable GGTT is pinned? Unexpected! */
>> -               GEM_BUG_ON(vma == ERR_PTR(-ENOSPC));
>> +               /*
>> +                * The entire mappable GGTT is pinned? Unexpected!
>> +                * Try to evict the object we locked too, as normally we skip it
>> +                * due to lack of short term pinning inside execbuf.
>> +                */
>> +               if (vma == ERR_PTR(-ENOSPC)) {
>> +                       ret = mutex_lock_interruptible(&ggtt->vm.mutex);
>> +                       if (!ret) {
>> +                               ret = i915_gem_evict_vm(&ggtt->vm);
>> +                               mutex_unlock(&ggtt->vm.mutex);
>> +                       }
>> +                       if (ret)
>> +                               goto err_reset;
>> +                       vma = i915_gem_object_ggtt_pin_ww(obj, &ww, &view, 0, 0, flags);
>> +               }
>> +               GEM_WARN_ON(vma == ERR_PTR(-ENOSPC));
> Looks like this is being triggered in CI, I assume because the trylock
> could easily fail, due to contention? Is this expected for now? Do we
> keep the WARN and track it as a known issue?

I think it makes sense. I can probably fix i915_gem_evict_vm to attempt to take all objects in a blocking way.

I had some primitives that could lock for eviction, and keep a refcount on the object. i915_gem_evict_vm could probably be changed to use it.

>>         }
>>         if (IS_ERR(vma)) {
>>                 ret = PTR_ERR(vma);
>> --
>> 2.34.1
>>
diff mbox series

Patch

diff --git a/drivers/gpu/drm/i915/gem/i915_gem_mman.c b/drivers/gpu/drm/i915/gem/i915_gem_mman.c
index af81d6c3332a..00cd9642669a 100644
--- a/drivers/gpu/drm/i915/gem/i915_gem_mman.c
+++ b/drivers/gpu/drm/i915/gem/i915_gem_mman.c
@@ -358,8 +358,22 @@  static vm_fault_t vm_fault_gtt(struct vm_fault *vmf)
 			vma = i915_gem_object_ggtt_pin_ww(obj, &ww, &view, 0, 0, flags);
 		}
 
-		/* The entire mappable GGTT is pinned? Unexpected! */
-		GEM_BUG_ON(vma == ERR_PTR(-ENOSPC));
+		/*
+		 * The entire mappable GGTT is pinned? Unexpected!
+		 * Try to evict the object we locked too, as normally we skip it
+		 * due to lack of short term pinning inside execbuf.
+		 */
+		if (vma == ERR_PTR(-ENOSPC)) {
+			ret = mutex_lock_interruptible(&ggtt->vm.mutex);
+			if (!ret) {
+				ret = i915_gem_evict_vm(&ggtt->vm);
+				mutex_unlock(&ggtt->vm.mutex);
+			}
+			if (ret)
+				goto err_reset;
+			vma = i915_gem_object_ggtt_pin_ww(obj, &ww, &view, 0, 0, flags);
+		}
+		GEM_WARN_ON(vma == ERR_PTR(-ENOSPC));
 	}
 	if (IS_ERR(vma)) {
 		ret = PTR_ERR(vma);