diff mbox series

drm/scheduler: Partially revert "drm/scheduler: track GPU active time per entity"

Message ID 20230817050435.3109-1-xinhui.pan@amd.com (mailing list archive)
State New, archived
Headers show
Series drm/scheduler: Partially revert "drm/scheduler: track GPU active time per entity" | expand

Commit Message

Pan, Xinhui Aug. 17, 2023, 5:04 a.m. UTC
This patch partially revert commit df622729ddbf ("drm/scheduler: track
GPU active time per entity") which touchs entity without any reference.

I notice there is one memory overwritten from gpu scheduler side.
The case is like below.
A(drm_sched_main)			B(vm fini)
drm_sched_job_begin			drm_sched_entity_kill
	//job in pending_list			wait_for_completion
complete_all				...
...					kfree entity
drm_sched_get_cleanup_job
	//fetch job from pending_list
	access job->entity //memory overwitten

As long as we can NOT guarantee entity is alive in this case, lets
revert it for now.

Signed-off-by: xinhui pan <xinhui.pan@amd.com>
---
 drivers/gpu/drm/scheduler/sched_main.c | 6 ------
 1 file changed, 6 deletions(-)

Comments

Pan, Xinhui Aug. 17, 2023, 5:09 a.m. UTC | #1
[AMD Official Use Only - General]

Can we just add kref for entity?
Or just collect such job time usage somewhere else?

-----Original Message-----
From: Pan, Xinhui <Xinhui.Pan@amd.com>
Sent: Thursday, August 17, 2023 1:05 PM
To: amd-gfx@lists.freedesktop.org
Cc: Tuikov, Luben <Luben.Tuikov@amd.com>; airlied@gmail.com; dri-devel@lists.freedesktop.org; l.stach@pengutronix.de; Koenig, Christian <Christian.Koenig@amd.com>; Pan, Xinhui <Xinhui.Pan@amd.com>
Subject: [PATCH] drm/scheduler: Partially revert "drm/scheduler: track GPU active time per entity"

This patch partially revert commit df622729ddbf ("drm/scheduler: track GPU active time per entity") which touchs entity without any reference.

I notice there is one memory overwritten from gpu scheduler side.
The case is like below.
A(drm_sched_main)                       B(vm fini)
drm_sched_job_begin                     drm_sched_entity_kill
        //job in pending_list                   wait_for_completion
complete_all                            ...
...                                     kfree entity
drm_sched_get_cleanup_job
        //fetch job from pending_list
        access job->entity //memory overwitten

As long as we can NOT guarantee entity is alive in this case, lets revert it for now.

Signed-off-by: xinhui pan <xinhui.pan@amd.com>
---
 drivers/gpu/drm/scheduler/sched_main.c | 6 ------
 1 file changed, 6 deletions(-)

diff --git a/drivers/gpu/drm/scheduler/sched_main.c b/drivers/gpu/drm/scheduler/sched_main.c
index 602361c690c9..1b3f1a6a8514 100644
--- a/drivers/gpu/drm/scheduler/sched_main.c
+++ b/drivers/gpu/drm/scheduler/sched_main.c
@@ -907,12 +907,6 @@ drm_sched_get_cleanup_job(struct drm_gpu_scheduler *sched)

        spin_unlock(&sched->job_list_lock);

-       if (job) {
-               job->entity->elapsed_ns += ktime_to_ns(
-                       ktime_sub(job->s_fence->finished.timestamp,
-                                 job->s_fence->scheduled.timestamp));
-       }
-
        return job;
 }

--
2.34.1
Chen, Guchun Aug. 17, 2023, 5:10 a.m. UTC | #2
[Public]

Hi Xinhui,

That patch has been reverted on Linux mainline.

https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/drivers/gpu/drm/scheduler/sched_main.c?h=v6.5-rc6&id=baad10973fdb442912af676de3348e80bd8fe602

Regards,
Guchun

> -----Original Message-----
> From: amd-gfx <amd-gfx-bounces@lists.freedesktop.org> On Behalf Of
> xinhui pan
> Sent: Thursday, August 17, 2023 1:05 PM
> To: amd-gfx@lists.freedesktop.org
> Cc: Pan, Xinhui <Xinhui.Pan@amd.com>; dri-devel@lists.freedesktop.org;
> Tuikov, Luben <Luben.Tuikov@amd.com>; airlied@gmail.com; Koenig,
> Christian <Christian.Koenig@amd.com>; l.stach@pengutronix.de
> Subject: [PATCH] drm/scheduler: Partially revert "drm/scheduler: track GPU
> active time per entity"
>
> This patch partially revert commit df622729ddbf ("drm/scheduler: track GPU
> active time per entity") which touchs entity without any reference.
>
> I notice there is one memory overwritten from gpu scheduler side.
> The case is like below.
> A(drm_sched_main)                     B(vm fini)
> drm_sched_job_begin                   drm_sched_entity_kill
>       //job in pending_list                   wait_for_completion
> complete_all                          ...
> ...                                   kfree entity
> drm_sched_get_cleanup_job
>       //fetch job from pending_list
>       access job->entity //memory overwitten
>
> As long as we can NOT guarantee entity is alive in this case, lets revert it for
> now.
>
> Signed-off-by: xinhui pan <xinhui.pan@amd.com>
> ---
>  drivers/gpu/drm/scheduler/sched_main.c | 6 ------
>  1 file changed, 6 deletions(-)
>
> diff --git a/drivers/gpu/drm/scheduler/sched_main.c
> b/drivers/gpu/drm/scheduler/sched_main.c
> index 602361c690c9..1b3f1a6a8514 100644
> --- a/drivers/gpu/drm/scheduler/sched_main.c
> +++ b/drivers/gpu/drm/scheduler/sched_main.c
> @@ -907,12 +907,6 @@ drm_sched_get_cleanup_job(struct
> drm_gpu_scheduler *sched)
>
>       spin_unlock(&sched->job_list_lock);
>
> -     if (job) {
> -             job->entity->elapsed_ns += ktime_to_ns(
> -                     ktime_sub(job->s_fence->finished.timestamp,
> -                               job->s_fence->scheduled.timestamp));
> -     }
> -
>       return job;
>  }
>
> --
> 2.34.1
diff mbox series

Patch

diff --git a/drivers/gpu/drm/scheduler/sched_main.c b/drivers/gpu/drm/scheduler/sched_main.c
index 602361c690c9..1b3f1a6a8514 100644
--- a/drivers/gpu/drm/scheduler/sched_main.c
+++ b/drivers/gpu/drm/scheduler/sched_main.c
@@ -907,12 +907,6 @@  drm_sched_get_cleanup_job(struct drm_gpu_scheduler *sched)
 
 	spin_unlock(&sched->job_list_lock);
 
-	if (job) {
-		job->entity->elapsed_ns += ktime_to_ns(
-			ktime_sub(job->s_fence->finished.timestamp,
-				  job->s_fence->scheduled.timestamp));
-	}
-
 	return job;
 }