
drm/sched: Improve teardown documentation

Message ID 20241105143137.71893-2-pstanner@redhat.com (mailing list archive)
State New, archived
Series drm/sched: Improve teardown documentation

Commit Message

Philipp Stanner Nov. 5, 2024, 2:31 p.m. UTC
If jobs are still enqueued in struct drm_gpu_scheduler.pending_list
when drm_sched_fini() gets called, those jobs will be leaked since that
function stops both job-submission and (automatic) job-cleanup. It is,
thus, up to the driver to take care of preventing leaks.

The related function drm_sched_wqueue_stop() also prevents automatic job
cleanup.

Those pitfalls are currently not reflected in the documentation.

Explicitly inform about the leak problem in the docstring of
drm_sched_fini().

Additionally, detail the purpose of drm_sched_wqueue_{start,stop} and
hint at the consequences for automatic cleanup.

Signed-off-by: Philipp Stanner <pstanner@redhat.com>
---
Hi,

in our discussion about my proposed waitqueue cleanup for this problem,
Sima suggested [1] that we document the problems first and, by doing so,
reach a consensus on what the problems actually are and how we could
solve them.

This is my proposal for documenting the leaks on teardown. Feedback very
welcome.
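
To make the problem concrete, below is a rough sketch of what a driver
currently has to do on teardown, following option a) from the new
docstring. The mydrv_* names, the wait queue and the job counter are made
up for illustration; only drm_sched_fini() is the real scheduler API:

#include <linux/atomic.h>
#include <linux/wait.h>
#include <drm/gpu_scheduler.h>

/* Hypothetical driver state; invented for illustration only. */
struct mydrv_device {
	struct drm_gpu_scheduler sched;
	wait_queue_head_t teardown_wq;	/* woken by the driver's free_job() */
	atomic_t jobs_alive;		/* jobs submitted but not yet freed */
};

static void mydrv_fini(struct mydrv_device *mdev)
{
	/*
	 * Option a): block until the driver's free_job() callback has run
	 * for every job ever submitted, so that pending_list is empty by
	 * the time the scheduler gets torn down.
	 */
	wait_event(mdev->teardown_wq, atomic_read(&mdev->jobs_alive) == 0);

	drm_sched_fini(&mdev->sched);
}

Option b) would instead call drm_sched_fini() right away and afterwards
free the leftover jobs manually, e.g. from a driver-private list.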

P.

[1] https://lore.kernel.org/dri-devel/ZtidJ8S9THvzkQ-6@phenom.ffwll.local/
---
 drivers/gpu/drm/scheduler/sched_main.c | 23 +++++++++++++++++++++--
 1 file changed, 21 insertions(+), 2 deletions(-)

Comments

Philipp Stanner Nov. 7, 2024, 9:18 a.m. UTC | #1
On Tue, 2024-11-05 at 15:31 +0100, Philipp Stanner wrote:
> If jobs are still enqueued in struct drm_gpu_scheduler.pending_list
> when drm_sched_fini() gets called, those jobs will be leaked since that
> function stops both job-submission and (automatic) job-cleanup. It is,
> thus, up to the driver to take care of preventing leaks.
> 
> The related function drm_sched_wqueue_stop() also prevents automatic
> job cleanup.
> 
> Those pitfalls are currently not reflected in the documentation.
> 
> Explicitly inform about the leak problem in the docstring of
> drm_sched_fini().
> 
> Additionally, detail the purpose of drm_sched_wqueue_{start,stop} and
> hint at the consequences for automatic cleanup.
> 
> Signed-off-by: Philipp Stanner <pstanner@redhat.com>

Got an off-list RB from Christian (his mail to dri-devel apparently got
lost).

Applied to drm-misc-next.

P.


Patch

diff --git a/drivers/gpu/drm/scheduler/sched_main.c b/drivers/gpu/drm/scheduler/sched_main.c
index e97c6c60bc96..3dfa9db89484 100644
--- a/drivers/gpu/drm/scheduler/sched_main.c
+++ b/drivers/gpu/drm/scheduler/sched_main.c
@@ -1333,6 +1333,19 @@  EXPORT_SYMBOL(drm_sched_init);
  * @sched: scheduler instance
  *
  * Tears down and cleans up the scheduler.
+ *
+ * This stops submission of new jobs to the hardware through
+ * drm_sched_backend_ops.run_job(). Consequently, drm_sched_backend_ops.free_job()
+ * will not be called for jobs still in drm_gpu_scheduler.pending_list.
+ * There is currently no solution for this. Thus, it is up to the driver to
+ * make sure that either
+ *  a) drm_sched_fini() is only called after drm_sched_backend_ops.free_job()
+ *     has been called for all submitted jobs, or
+ *  b) the jobs for which drm_sched_backend_ops.free_job() has not been called
+ *     by the time drm_sched_fini() ran are freed manually.
+ *
+ * FIXME: Take care of the above problem and prevent this function from leaking
+ * the jobs in drm_gpu_scheduler.pending_list under any circumstances.
  */
 void drm_sched_fini(struct drm_gpu_scheduler *sched)
 {
@@ -1428,8 +1441,10 @@  EXPORT_SYMBOL(drm_sched_wqueue_ready);
 
 /**
  * drm_sched_wqueue_stop - stop scheduler submission
- *
  * @sched: scheduler instance
+ *
+ * Stops the scheduler from pulling new jobs from entities. It also stops
+ * freeing jobs automatically through drm_sched_backend_ops.free_job().
  */
 void drm_sched_wqueue_stop(struct drm_gpu_scheduler *sched)
 {
@@ -1441,8 +1456,12 @@  EXPORT_SYMBOL(drm_sched_wqueue_stop);
 
 /**
  * drm_sched_wqueue_start - start scheduler submission
- *
  * @sched: scheduler instance
+ *
+ * Restarts the scheduler after drm_sched_wqueue_stop() has stopped it.
+ *
+ * This function is not necessary for 'conventional' startup. The scheduler is
+ * fully operational after drm_sched_init() succeeded.
  */
 void drm_sched_wqueue_start(struct drm_gpu_scheduler *sched)
 {
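
For context, a minimal sketch of how the two helpers documented above might
be paired, e.g. around a device reset. The mydrv_reset() name and the reset
step are made up; only the drm_sched_wqueue_*() calls are the real API:

#include <drm/gpu_scheduler.h>

/*
 * Hypothetical reset path, for illustration only; the hardware reset step
 * is driver-specific and only sketched as a comment here.
 */
static void mydrv_reset(struct drm_gpu_scheduler *sched)
{
	/*
	 * Pause the scheduler: no new jobs are pulled from entities and,
	 * as documented above, automatic cleanup through free_job() stops
	 * as well, so finished jobs remain on pending_list for now.
	 */
	drm_sched_wqueue_stop(sched);

	/* ... driver-specific hardware reset would happen here ... */

	/* Resume submission and the automatic free_job() cleanup. */
	drm_sched_wqueue_start(sched);
}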