diff mbox series

[v2,1/4] drm/panthor: Fix tiler OOM handling to allow incremental rendering

Message ID 20240430112852.486424-2-boris.brezillon@collabora.com (mailing list archive)
State New, archived
Headers show
Series drm/panthor: Collection of tiler heap related fixes | expand

Commit Message

Boris Brezillon April 30, 2024, 11:28 a.m. UTC
From: Antonino Maniscalco <antonino.maniscalco@collabora.com>

If the kernel couldn't allocate memory because we reached the maximum
number of chunks but no render passes are in flight
(panthor_heap_grow() returning -ENOMEM), we should defer the OOM
handling to the FW by returning a NULL chunk. The FW will then call
the tiler OOM exception handler, which is supposed to implement
incremental rendering (execute an intermediate fragment job to flush
the pending primitives, release the tiler memory that was used to
store those primitives, and start over from where it stopped).

Instead of checking for both ENOMEM and EBUSY, make panthor_heap_grow()
return ENOMEM no matter the reason of this allocation failure, the FW
doesn't care anyway.

v2:
- Make panthor_heap_grow() return -ENOMEM for all kind of allocation
  failures
- Document the panthor_heap_grow() semantics

Fixes: de8548813824 ("drm/panthor: Add the scheduler logical block")
Signed-off-by: Antonino Maniscalco <antonino.maniscalco@collabora.com>
Signed-off-by: Boris Brezillon <boris.brezillon@collabora.com>
---
 drivers/gpu/drm/panthor/panthor_heap.c  | 12 ++++++++----
 drivers/gpu/drm/panthor/panthor_sched.c |  7 ++++++-
 2 files changed, 14 insertions(+), 5 deletions(-)

Comments

Liviu Dudau April 30, 2024, 3:27 p.m. UTC | #1
On Tue, Apr 30, 2024 at 01:28:49PM +0200, Boris Brezillon wrote:
> From: Antonino Maniscalco <antonino.maniscalco@collabora.com>
> 
> If the kernel couldn't allocate memory because we reached the maximum
> number of chunks but no render passes are in flight
> (panthor_heap_grow() returning -ENOMEM), we should defer the OOM
> handling to the FW by returning a NULL chunk. The FW will then call
> the tiler OOM exception handler, which is supposed to implement
> incremental rendering (execute an intermediate fragment job to flush
> the pending primitives, release the tiler memory that was used to
> store those primitives, and start over from where it stopped).
> 
> Instead of checking for both ENOMEM and EBUSY, make panthor_heap_grow()
> return ENOMEM no matter the reason of this allocation failure, the FW
> doesn't care anyway.
> 
> v2:
> - Make panthor_heap_grow() return -ENOMEM for all kind of allocation
>   failures
> - Document the panthor_heap_grow() semantics
> 
> Fixes: de8548813824 ("drm/panthor: Add the scheduler logical block")
> Signed-off-by: Antonino Maniscalco <antonino.maniscalco@collabora.com>
> Signed-off-by: Boris Brezillon <boris.brezillon@collabora.com>

Reviewed-by: Liviu Dudau <liviu.dudau@arm.com>

Best regards,
Liviu

> ---
>  drivers/gpu/drm/panthor/panthor_heap.c  | 12 ++++++++----
>  drivers/gpu/drm/panthor/panthor_sched.c |  7 ++++++-
>  2 files changed, 14 insertions(+), 5 deletions(-)
> 
> diff --git a/drivers/gpu/drm/panthor/panthor_heap.c b/drivers/gpu/drm/panthor/panthor_heap.c
> index 143fa35f2e74..c3c0ba744937 100644
> --- a/drivers/gpu/drm/panthor/panthor_heap.c
> +++ b/drivers/gpu/drm/panthor/panthor_heap.c
> @@ -410,6 +410,13 @@ int panthor_heap_return_chunk(struct panthor_heap_pool *pool,
>   * @renderpasses_in_flight: Number of render passes currently in-flight.
>   * @pending_frag_count: Number of fragment jobs waiting for execution/completion.
>   * @new_chunk_gpu_va: Pointer used to return the chunk VA.
> + *
> + * Return:
> + * - 0 if a new heap was allocated
> + * - -ENOMEM if the tiler context reached the maximum number of chunks
> + *   or if too many render passes are in-flight
> + *   or if the allocation failed
> + * - -EINVAL if any of the arguments passed to panthor_heap_grow() is invalid
>   */
>  int panthor_heap_grow(struct panthor_heap_pool *pool,
>  		      u64 heap_gpu_va,
> @@ -439,10 +446,7 @@ int panthor_heap_grow(struct panthor_heap_pool *pool,
>  	 * handler provided by the userspace driver, if any).
>  	 */
>  	if (renderpasses_in_flight > heap->target_in_flight ||
> -	    (pending_frag_count > 0 && heap->chunk_count >= heap->max_chunks)) {
> -		ret = -EBUSY;
> -		goto out_unlock;
> -	} else if (heap->chunk_count >= heap->max_chunks) {
> +	    heap->chunk_count >= heap->max_chunks) {
>  		ret = -ENOMEM;
>  		goto out_unlock;
>  	}
> diff --git a/drivers/gpu/drm/panthor/panthor_sched.c b/drivers/gpu/drm/panthor/panthor_sched.c
> index b3a51a6de523..fd928362d45e 100644
> --- a/drivers/gpu/drm/panthor/panthor_sched.c
> +++ b/drivers/gpu/drm/panthor/panthor_sched.c
> @@ -1354,7 +1354,12 @@ static int group_process_tiler_oom(struct panthor_group *group, u32 cs_id)
>  					pending_frag_count, &new_chunk_va);
>  	}
>  
> -	if (ret && ret != -EBUSY) {
> +	/* If the heap context doesn't have memory for us, we want to let the
> +	 * FW try to reclaim memory by waiting for fragment jobs to land or by
> +	 * executing the tiler OOM exception handler, which is supposed to
> +	 * implement incremental rendering.
> +	 */
> +	if (ret && ret != -ENOMEM) {
>  		drm_warn(&ptdev->base, "Failed to extend the tiler heap\n");
>  		group->fatal_queues |= BIT(cs_id);
>  		sched_queue_delayed_work(sched, tick, 0);
> -- 
> 2.44.0
>
Steven Price May 2, 2024, 2:03 p.m. UTC | #2
On 30/04/2024 12:28, Boris Brezillon wrote:
> From: Antonino Maniscalco <antonino.maniscalco@collabora.com>
> 
> If the kernel couldn't allocate memory because we reached the maximum
> number of chunks but no render passes are in flight
> (panthor_heap_grow() returning -ENOMEM), we should defer the OOM
> handling to the FW by returning a NULL chunk. The FW will then call
> the tiler OOM exception handler, which is supposed to implement
> incremental rendering (execute an intermediate fragment job to flush
> the pending primitives, release the tiler memory that was used to
> store those primitives, and start over from where it stopped).
> 
> Instead of checking for both ENOMEM and EBUSY, make panthor_heap_grow()
> return ENOMEM no matter the reason of this allocation failure, the FW
> doesn't care anyway.
> 
> v2:
> - Make panthor_heap_grow() return -ENOMEM for all kind of allocation
>   failures
> - Document the panthor_heap_grow() semantics
> 
> Fixes: de8548813824 ("drm/panthor: Add the scheduler logical block")
> Signed-off-by: Antonino Maniscalco <antonino.maniscalco@collabora.com>
> Signed-off-by: Boris Brezillon <boris.brezillon@collabora.com>

Reviewed-by: Steven Price <steven.price@arm.com>

Thanks,
Steve

> ---
>  drivers/gpu/drm/panthor/panthor_heap.c  | 12 ++++++++----
>  drivers/gpu/drm/panthor/panthor_sched.c |  7 ++++++-
>  2 files changed, 14 insertions(+), 5 deletions(-)
> 
> diff --git a/drivers/gpu/drm/panthor/panthor_heap.c b/drivers/gpu/drm/panthor/panthor_heap.c
> index 143fa35f2e74..c3c0ba744937 100644
> --- a/drivers/gpu/drm/panthor/panthor_heap.c
> +++ b/drivers/gpu/drm/panthor/panthor_heap.c
> @@ -410,6 +410,13 @@ int panthor_heap_return_chunk(struct panthor_heap_pool *pool,
>   * @renderpasses_in_flight: Number of render passes currently in-flight.
>   * @pending_frag_count: Number of fragment jobs waiting for execution/completion.
>   * @new_chunk_gpu_va: Pointer used to return the chunk VA.
> + *
> + * Return:
> + * - 0 if a new heap was allocated
> + * - -ENOMEM if the tiler context reached the maximum number of chunks
> + *   or if too many render passes are in-flight
> + *   or if the allocation failed
> + * - -EINVAL if any of the arguments passed to panthor_heap_grow() is invalid
>   */
>  int panthor_heap_grow(struct panthor_heap_pool *pool,
>  		      u64 heap_gpu_va,
> @@ -439,10 +446,7 @@ int panthor_heap_grow(struct panthor_heap_pool *pool,
>  	 * handler provided by the userspace driver, if any).
>  	 */
>  	if (renderpasses_in_flight > heap->target_in_flight ||
> -	    (pending_frag_count > 0 && heap->chunk_count >= heap->max_chunks)) {
> -		ret = -EBUSY;
> -		goto out_unlock;
> -	} else if (heap->chunk_count >= heap->max_chunks) {
> +	    heap->chunk_count >= heap->max_chunks) {
>  		ret = -ENOMEM;
>  		goto out_unlock;
>  	}
> diff --git a/drivers/gpu/drm/panthor/panthor_sched.c b/drivers/gpu/drm/panthor/panthor_sched.c
> index b3a51a6de523..fd928362d45e 100644
> --- a/drivers/gpu/drm/panthor/panthor_sched.c
> +++ b/drivers/gpu/drm/panthor/panthor_sched.c
> @@ -1354,7 +1354,12 @@ static int group_process_tiler_oom(struct panthor_group *group, u32 cs_id)
>  					pending_frag_count, &new_chunk_va);
>  	}
>  
> -	if (ret && ret != -EBUSY) {
> +	/* If the heap context doesn't have memory for us, we want to let the
> +	 * FW try to reclaim memory by waiting for fragment jobs to land or by
> +	 * executing the tiler OOM exception handler, which is supposed to
> +	 * implement incremental rendering.
> +	 */
> +	if (ret && ret != -ENOMEM) {
>  		drm_warn(&ptdev->base, "Failed to extend the tiler heap\n");
>  		group->fatal_queues |= BIT(cs_id);
>  		sched_queue_delayed_work(sched, tick, 0);
diff mbox series

Patch

diff --git a/drivers/gpu/drm/panthor/panthor_heap.c b/drivers/gpu/drm/panthor/panthor_heap.c
index 143fa35f2e74..c3c0ba744937 100644
--- a/drivers/gpu/drm/panthor/panthor_heap.c
+++ b/drivers/gpu/drm/panthor/panthor_heap.c
@@ -410,6 +410,13 @@  int panthor_heap_return_chunk(struct panthor_heap_pool *pool,
  * @renderpasses_in_flight: Number of render passes currently in-flight.
  * @pending_frag_count: Number of fragment jobs waiting for execution/completion.
  * @new_chunk_gpu_va: Pointer used to return the chunk VA.
+ *
+ * Return:
+ * - 0 if a new heap was allocated
+ * - -ENOMEM if the tiler context reached the maximum number of chunks
+ *   or if too many render passes are in-flight
+ *   or if the allocation failed
+ * - -EINVAL if any of the arguments passed to panthor_heap_grow() is invalid
  */
 int panthor_heap_grow(struct panthor_heap_pool *pool,
 		      u64 heap_gpu_va,
@@ -439,10 +446,7 @@  int panthor_heap_grow(struct panthor_heap_pool *pool,
 	 * handler provided by the userspace driver, if any).
 	 */
 	if (renderpasses_in_flight > heap->target_in_flight ||
-	    (pending_frag_count > 0 && heap->chunk_count >= heap->max_chunks)) {
-		ret = -EBUSY;
-		goto out_unlock;
-	} else if (heap->chunk_count >= heap->max_chunks) {
+	    heap->chunk_count >= heap->max_chunks) {
 		ret = -ENOMEM;
 		goto out_unlock;
 	}
diff --git a/drivers/gpu/drm/panthor/panthor_sched.c b/drivers/gpu/drm/panthor/panthor_sched.c
index b3a51a6de523..fd928362d45e 100644
--- a/drivers/gpu/drm/panthor/panthor_sched.c
+++ b/drivers/gpu/drm/panthor/panthor_sched.c
@@ -1354,7 +1354,12 @@  static int group_process_tiler_oom(struct panthor_group *group, u32 cs_id)
 					pending_frag_count, &new_chunk_va);
 	}
 
-	if (ret && ret != -EBUSY) {
+	/* If the heap context doesn't have memory for us, we want to let the
+	 * FW try to reclaim memory by waiting for fragment jobs to land or by
+	 * executing the tiler OOM exception handler, which is supposed to
+	 * implement incremental rendering.
+	 */
+	if (ret && ret != -ENOMEM) {
 		drm_warn(&ptdev->base, "Failed to extend the tiler heap\n");
 		group->fatal_queues |= BIT(cs_id);
 		sched_queue_delayed_work(sched, tick, 0);