Message ID | 1534518942-13167-1-git-send-email-andrey.grodzovsky@amd.com (mailing list archive) |
---|---|
State | New, archived |
Headers | show |
Series | drm/scheduler: Add stopped flag to drm_sched_entity | expand |
-----Original Message----- From: dri-devel <dri-devel-bounces@lists.freedesktop.org> On Behalf Of Andrey Grodzovsky Sent: Friday, August 17, 2018 11:16 PM To: dri-devel@lists.freedesktop.org Cc: Koenig, Christian <Christian.Koenig@amd.com>; amd-gfx@lists.freedesktop.org Subject: [PATCH] drm/scheduler: Add stopped flag to drm_sched_entity The flag will prevent another thread from same process to reinsert the entity queue into scheduler's rq after it was already removed from there by another thread during drm_sched_entity_flush. Signed-off-by: Andrey Grodzovsky <andrey.grodzovsky@amd.com> --- drivers/gpu/drm/scheduler/sched_entity.c | 10 +++++++++- include/drm/gpu_scheduler.h | 2 ++ 2 files changed, 11 insertions(+), 1 deletion(-) diff --git a/drivers/gpu/drm/scheduler/sched_entity.c b/drivers/gpu/drm/scheduler/sched_entity.c index 1416edb..07cfe63 100644 --- a/drivers/gpu/drm/scheduler/sched_entity.c +++ b/drivers/gpu/drm/scheduler/sched_entity.c @@ -177,8 +177,12 @@ long drm_sched_entity_flush(struct drm_sched_entity *entity, long timeout) /* For killed process disable any more IBs enqueue right now */ last_user = cmpxchg(&entity->last_user, current->group_leader, NULL); if ((!last_user || last_user == current->group_leader) && - (current->flags & PF_EXITING) && (current->exit_code == SIGKILL)) + (current->flags & PF_EXITING) && (current->exit_code == SIGKILL)) { + spin_lock(&entity->rq_lock); + entity->stopped = true; drm_sched_rq_remove_entity(entity->rq, entity); + spin_unlock(&entity->rq_lock); + } return ret; } @@ -504,6 +508,10 @@ void drm_sched_entity_push_job(struct drm_sched_job *sched_job, if (first) { /* Add the entity to the run queue */ spin_lock(&entity->rq_lock); + if (entity->stopped) { + spin_unlock(&entity->rq_lock); + return; + } [DZ] the code changes so frequent recently and has this regression, my code synced last Friday still has below checking: spin_lock(&entity->rq_lock); if (!entity->rq) { DRM_ERROR("Trying to push to a killed entity\n"); spin_unlock(&entity->rq_lock); return; } So you should add DRM_ERROR as well when hitting it. With that fix, patch is Reviewed-by: Chunming Zhou <david1.zhou@amd.com> Regards, David Zhou drm_sched_rq_add_entity(entity->rq, entity); spin_unlock(&entity->rq_lock); drm_sched_wakeup(entity->rq->sched); diff --git a/include/drm/gpu_scheduler.h b/include/drm/gpu_scheduler.h index 919ae57..daec50f 100644 --- a/include/drm/gpu_scheduler.h +++ b/include/drm/gpu_scheduler.h @@ -70,6 +70,7 @@ enum drm_sched_priority { * @fini_status: contains the exit status in case the process was signalled. * @last_scheduled: points to the finished fence of the last scheduled job. * @last_user: last group leader pushing a job into the entity. + * @stopped: Marks the enity as removed from rq and destined for termination. * * Entities will emit jobs in order to their corresponding hardware * ring, and the scheduler will alternate between entities based on @@ -92,6 +93,7 @@ struct drm_sched_entity { atomic_t *guilty; struct dma_fence *last_scheduled; struct task_struct *last_user; + bool stopped; }; /** -- 2.7.4
diff --git a/drivers/gpu/drm/scheduler/sched_entity.c b/drivers/gpu/drm/scheduler/sched_entity.c index 1416edb..07cfe63 100644 --- a/drivers/gpu/drm/scheduler/sched_entity.c +++ b/drivers/gpu/drm/scheduler/sched_entity.c @@ -177,8 +177,12 @@ long drm_sched_entity_flush(struct drm_sched_entity *entity, long timeout) /* For killed process disable any more IBs enqueue right now */ last_user = cmpxchg(&entity->last_user, current->group_leader, NULL); if ((!last_user || last_user == current->group_leader) && - (current->flags & PF_EXITING) && (current->exit_code == SIGKILL)) + (current->flags & PF_EXITING) && (current->exit_code == SIGKILL)) { + spin_lock(&entity->rq_lock); + entity->stopped = true; drm_sched_rq_remove_entity(entity->rq, entity); + spin_unlock(&entity->rq_lock); + } return ret; } @@ -504,6 +508,10 @@ void drm_sched_entity_push_job(struct drm_sched_job *sched_job, if (first) { /* Add the entity to the run queue */ spin_lock(&entity->rq_lock); + if (entity->stopped) { + spin_unlock(&entity->rq_lock); + return; + } drm_sched_rq_add_entity(entity->rq, entity); spin_unlock(&entity->rq_lock); drm_sched_wakeup(entity->rq->sched); diff --git a/include/drm/gpu_scheduler.h b/include/drm/gpu_scheduler.h index 919ae57..daec50f 100644 --- a/include/drm/gpu_scheduler.h +++ b/include/drm/gpu_scheduler.h @@ -70,6 +70,7 @@ enum drm_sched_priority { * @fini_status: contains the exit status in case the process was signalled. * @last_scheduled: points to the finished fence of the last scheduled job. * @last_user: last group leader pushing a job into the entity. + * @stopped: Marks the enity as removed from rq and destined for termination. * * Entities will emit jobs in order to their corresponding hardware * ring, and the scheduler will alternate between entities based on @@ -92,6 +93,7 @@ struct drm_sched_entity { atomic_t *guilty; struct dma_fence *last_scheduled; struct task_struct *last_user; + bool stopped; }; /**
The flag will prevent another thread from same process to reinsert the entity queue into scheduler's rq after it was already removed from there by another thread during drm_sched_entity_flush. Signed-off-by: Andrey Grodzovsky <andrey.grodzovsky@amd.com> --- drivers/gpu/drm/scheduler/sched_entity.c | 10 +++++++++- include/drm/gpu_scheduler.h | 2 ++ 2 files changed, 11 insertions(+), 1 deletion(-)