[v1,5/6] drm/lima: remove guilty drm_sched context handling

Message ID	20240117031212.1104034-6-nunes.erico@gmail.com (mailing list archive)
State	New, archived
Headers	show Return-Path: <dri-devel-bounces@lists.freedesktop.org> From: Erico Nunes <nunes.erico@gmail.com> To: Qiang Yu <yuq825@gmail.com>, dri-devel@lists.freedesktop.org, lima@lists.freedesktop.org Subject: [PATCH v1 5/6] drm/lima: remove guilty drm_sched context handling Date: Wed, 17 Jan 2024 04:12:11 +0100 Message-ID: <20240117031212.1104034-6-nunes.erico@gmail.com> In-Reply-To: <20240117031212.1104034-1-nunes.erico@gmail.com> References: <20240117031212.1104034-1-nunes.erico@gmail.com> MIME-Version: 1.0 Content-Transfer-Encoding: 8bit Precedence: list Cc: Thomas Zimmermann <tzimmermann@suse.de>, linux-kernel@vger.kernel.org, Maxime Ripard <mripard@kernel.org>, christian.koenig@amd.com, anarsoul@gmail.com, Daniel Vetter <daniel@ffwll.ch>, David Airlie <airlied@gmail.com>, Sumit Semwal <sumit.semwal@linaro.org>, Erico Nunes <nunes.erico@gmail.com> Errors-To: dri-devel-bounces@lists.freedesktop.org Sender: "dri-devel" <dri-devel-bounces@lists.freedesktop.org>
Series	drm/lima: fixes and improvements to error recovery \| expand [v1,0/6] drm/lima: fixes and improvements to error recovery [v1,1/6] drm/lima: fix devfreq refcount imbalance for job timeouts [v1,2/6] drm/lima: reset async_reset on pp hard reset [v1,3/6] drm/lima: set bus_stop bit before hard reset [v1,4/6] drm/lima: handle spurious timeouts due to high irq latency [v1,5/6] drm/lima: remove guilty drm_sched context handling [v1,6/6] drm/lima: improve some pp debug messages

Message ID

20240117031212.1104034-6-nunes.erico@gmail.com (mailing list archive)

State

New, archived

Headers

From: Erico Nunes <nunes.erico@gmail.com>
To: Qiang Yu <yuq825@gmail.com>, dri-devel@lists.freedesktop.org,
 lima@lists.freedesktop.org
Subject: [PATCH v1 5/6] drm/lima: remove guilty drm_sched context handling
Date: Wed, 17 Jan 2024 04:12:11 +0100
Message-ID: <20240117031212.1104034-6-nunes.erico@gmail.com>
In-Reply-To: <20240117031212.1104034-1-nunes.erico@gmail.com>
References: <20240117031212.1104034-1-nunes.erico@gmail.com>
MIME-Version: 1.0
Content-Transfer-Encoding: 8bit
Precedence: list
Cc: Thomas Zimmermann <tzimmermann@suse.de>, linux-kernel@vger.kernel.org,
 Maxime Ripard <mripard@kernel.org>, christian.koenig@amd.com,
 anarsoul@gmail.com, Daniel Vetter <daniel@ffwll.ch>,
 David Airlie <airlied@gmail.com>, Sumit Semwal <sumit.semwal@linaro.org>,
 Erico Nunes <nunes.erico@gmail.com>
Errors-To: dri-devel-bounces@lists.freedesktop.org
Sender: "dri-devel" <dri-devel-bounces@lists.freedesktop.org>

Series

drm/lima: fixes and improvements to error recovery | expand

Commit Message

Erico Nunes Jan. 17, 2024, 3:12 a.m. UTC

Marking the context as guilty currently only makes the application which
hits a single timeout problem to stop its rendering context entirely.
All jobs submitted later are dropped from the guilty context.

Lima runs on fairly underpowered hardware for modern standards and it is
not entirely unreasonable that a rendering job may time out occasionally
due to high system load or too demanding application stack. In this case
it would be generally preferred to report the error but try to keep the
application going.

Other similar embedded GPU drivers don't make use of the guilty context
flag. Now that there are reliability improvements to the lima timeout
recovery handling, drop the guilty contexts to let the application keep
running in this case.

Signed-off-by: Erico Nunes <nunes.erico@gmail.com>
---
 drivers/gpu/drm/lima/lima_ctx.c   | 2 +-
 drivers/gpu/drm/lima/lima_ctx.h   | 1 -
 drivers/gpu/drm/lima/lima_sched.c | 5 ++---
 drivers/gpu/drm/lima/lima_sched.h | 3 +--
 4 files changed, 4 insertions(+), 7 deletions(-)

Comments

Vasily Khoruzhick Jan. 17, 2024, 6:28 p.m. UTC | #1

On Tue, Jan 16, 2024 at 7:12 PM Erico Nunes <nunes.erico@gmail.com> wrote:
>
> Marking the context as guilty currently only makes the application which
> hits a single timeout problem to stop its rendering context entirely.
> All jobs submitted later are dropped from the guilty context.
>
> Lima runs on fairly underpowered hardware for modern standards and it is
> not entirely unreasonable that a rendering job may time out occasionally
> due to high system load or too demanding application stack. In this case
> it would be generally preferred to report the error but try to keep the
> application going.
>
> Other similar embedded GPU drivers don't make use of the guilty context
> flag. Now that there are reliability improvements to the lima timeout
> recovery handling, drop the guilty contexts to let the application keep
> running in this case.
>
> Signed-off-by: Erico Nunes <nunes.erico@gmail.com>
Reviewed-by: Vasily Khoruzhick <anarsoul@gmail.com>

> ---
>  drivers/gpu/drm/lima/lima_ctx.c   | 2 +-
>  drivers/gpu/drm/lima/lima_ctx.h   | 1 -
>  drivers/gpu/drm/lima/lima_sched.c | 5 ++---
>  drivers/gpu/drm/lima/lima_sched.h | 3 +--
>  4 files changed, 4 insertions(+), 7 deletions(-)
>
> diff --git a/drivers/gpu/drm/lima/lima_ctx.c b/drivers/gpu/drm/lima/lima_ctx.c
> index 8389f2d7d021..0e668fc1e0f9 100644
> --- a/drivers/gpu/drm/lima/lima_ctx.c
> +++ b/drivers/gpu/drm/lima/lima_ctx.c
> @@ -19,7 +19,7 @@ int lima_ctx_create(struct lima_device *dev, struct lima_ctx_mgr *mgr, u32 *id)
>         kref_init(&ctx->refcnt);
>
>         for (i = 0; i < lima_pipe_num; i++) {
> -               err = lima_sched_context_init(dev->pipe + i, ctx->context + i, &ctx->guilty);
> +               err = lima_sched_context_init(dev->pipe + i, ctx->context + i);
>                 if (err)
>                         goto err_out0;
>         }
> diff --git a/drivers/gpu/drm/lima/lima_ctx.h b/drivers/gpu/drm/lima/lima_ctx.h
> index 74e2be09090f..5b1063ce968b 100644
> --- a/drivers/gpu/drm/lima/lima_ctx.h
> +++ b/drivers/gpu/drm/lima/lima_ctx.h
> @@ -13,7 +13,6 @@ struct lima_ctx {
>         struct kref refcnt;
>         struct lima_device *dev;
>         struct lima_sched_context context[lima_pipe_num];
> -       atomic_t guilty;
>
>         /* debug info */
>         char pname[TASK_COMM_LEN];
> diff --git a/drivers/gpu/drm/lima/lima_sched.c b/drivers/gpu/drm/lima/lima_sched.c
> index 9449b81bcd5b..496c79713fe8 100644
> --- a/drivers/gpu/drm/lima/lima_sched.c
> +++ b/drivers/gpu/drm/lima/lima_sched.c
> @@ -154,13 +154,12 @@ void lima_sched_task_fini(struct lima_sched_task *task)
>  }
>
>  int lima_sched_context_init(struct lima_sched_pipe *pipe,
> -                           struct lima_sched_context *context,
> -                           atomic_t *guilty)
> +                           struct lima_sched_context *context)
>  {
>         struct drm_gpu_scheduler *sched = &pipe->base;
>
>         return drm_sched_entity_init(&context->base, DRM_SCHED_PRIORITY_NORMAL,
> -                                    &sched, 1, guilty);
> +                                    &sched, 1, NULL);
>  }
>
>  void lima_sched_context_fini(struct lima_sched_pipe *pipe,
> diff --git a/drivers/gpu/drm/lima/lima_sched.h b/drivers/gpu/drm/lima/lima_sched.h
> index 34050facb110..677e908b53f8 100644
> --- a/drivers/gpu/drm/lima/lima_sched.h
> +++ b/drivers/gpu/drm/lima/lima_sched.h
> @@ -93,8 +93,7 @@ int lima_sched_task_init(struct lima_sched_task *task,
>  void lima_sched_task_fini(struct lima_sched_task *task);
>
>  int lima_sched_context_init(struct lima_sched_pipe *pipe,
> -                           struct lima_sched_context *context,
> -                           atomic_t *guilty);
> +                           struct lima_sched_context *context);
>  void lima_sched_context_fini(struct lima_sched_pipe *pipe,
>                              struct lima_sched_context *context);
>  struct dma_fence *lima_sched_context_queue_task(struct lima_sched_task *task);
> --
> 2.43.0
>

diff --git a/drivers/gpu/drm/lima/lima_ctx.c b/drivers/gpu/drm/lima/lima_ctx.c
index 8389f2d7d021..0e668fc1e0f9 100644
--- a/drivers/gpu/drm/lima/lima_ctx.c
+++ b/drivers/gpu/drm/lima/lima_ctx.c
@@ -19,7 +19,7 @@  int lima_ctx_create(struct lima_device *dev, struct lima_ctx_mgr *mgr, u32 *id)
 	kref_init(&ctx->refcnt);
 
 	for (i = 0; i < lima_pipe_num; i++) {
-		err = lima_sched_context_init(dev->pipe + i, ctx->context + i, &ctx->guilty);
+		err = lima_sched_context_init(dev->pipe + i, ctx->context + i);
 		if (err)
 			goto err_out0;
 	}
diff --git a/drivers/gpu/drm/lima/lima_ctx.h b/drivers/gpu/drm/lima/lima_ctx.h
index 74e2be09090f..5b1063ce968b 100644
--- a/drivers/gpu/drm/lima/lima_ctx.h
+++ b/drivers/gpu/drm/lima/lima_ctx.h
@@ -13,7 +13,6 @@  struct lima_ctx {
 	struct kref refcnt;
 	struct lima_device *dev;
 	struct lima_sched_context context[lima_pipe_num];
-	atomic_t guilty;
 
 	/* debug info */
 	char pname[TASK_COMM_LEN];
diff --git a/drivers/gpu/drm/lima/lima_sched.c b/drivers/gpu/drm/lima/lima_sched.c
index 9449b81bcd5b..496c79713fe8 100644
--- a/drivers/gpu/drm/lima/lima_sched.c
+++ b/drivers/gpu/drm/lima/lima_sched.c
@@ -154,13 +154,12 @@  void lima_sched_task_fini(struct lima_sched_task *task)
 }
 
 int lima_sched_context_init(struct lima_sched_pipe *pipe,
-			    struct lima_sched_context *context,
-			    atomic_t *guilty)
+			    struct lima_sched_context *context)
 {
 	struct drm_gpu_scheduler *sched = &pipe->base;
 
 	return drm_sched_entity_init(&context->base, DRM_SCHED_PRIORITY_NORMAL,
-				     &sched, 1, guilty);
+				     &sched, 1, NULL);
 }
 
 void lima_sched_context_fini(struct lima_sched_pipe *pipe,
diff --git a/drivers/gpu/drm/lima/lima_sched.h b/drivers/gpu/drm/lima/lima_sched.h
index 34050facb110..677e908b53f8 100644
--- a/drivers/gpu/drm/lima/lima_sched.h
+++ b/drivers/gpu/drm/lima/lima_sched.h
@@ -93,8 +93,7 @@  int lima_sched_task_init(struct lima_sched_task *task,
 void lima_sched_task_fini(struct lima_sched_task *task);
 
 int lima_sched_context_init(struct lima_sched_pipe *pipe,
-			    struct lima_sched_context *context,
-			    atomic_t *guilty);
+			    struct lima_sched_context *context);
 void lima_sched_context_fini(struct lima_sched_pipe *pipe,
 			     struct lima_sched_context *context);
 struct dma_fence *lima_sched_context_queue_task(struct lima_sched_task *task);

[v1,5/6] drm/lima: remove guilty drm_sched context handling

Commit Message

Comments

Patch