Message ID | 20220929132136.1715-1-christian.koenig@amd.com (mailing list archive) |
---|---|
State | New, archived |
Headers | show |
Series | [01/13] drm/scheduler: fix fence ref counting | expand |
I've forgot to add a cover letter, so here some more background to this change. Basically I'm switching amdgpu over to using the dependencies inside the drm_sched_job instead of it's own data structure. This has the major advantage that we can keep those dependencies around after the entity is already freed up. Otherwise the blocking for killed entities can easily result in a deadlock. Regards, Christian. Am 29.09.22 um 15:21 schrieb Christian König: > We leaked dependency fences when processes were beeing killed. > > Additional to that grab a reference to the last scheduled fence. > > Signed-off-by: Christian König <christian.koenig@amd.com> > --- > drivers/gpu/drm/scheduler/sched_entity.c | 6 +++++- > 1 file changed, 5 insertions(+), 1 deletion(-) > > diff --git a/drivers/gpu/drm/scheduler/sched_entity.c b/drivers/gpu/drm/scheduler/sched_entity.c > index 191c56064f19..1bb1437a8fed 100644 > --- a/drivers/gpu/drm/scheduler/sched_entity.c > +++ b/drivers/gpu/drm/scheduler/sched_entity.c > @@ -207,6 +207,7 @@ static void drm_sched_entity_kill_jobs_cb(struct dma_fence *f, > struct drm_sched_job *job = container_of(cb, struct drm_sched_job, > finish_cb); > > + dma_fence_put(f); > init_irq_work(&job->work, drm_sched_entity_kill_jobs_irq_work); > irq_work_queue(&job->work); > } > @@ -234,8 +235,10 @@ static void drm_sched_entity_kill_jobs(struct drm_sched_entity *entity) > struct drm_sched_fence *s_fence = job->s_fence; > > /* Wait for all dependencies to avoid data corruptions */ > - while ((f = drm_sched_job_dependency(job, entity))) > + while ((f = drm_sched_job_dependency(job, entity))) { > dma_fence_wait(f, false); > + dma_fence_put(f); > + } > > drm_sched_fence_scheduled(s_fence); > dma_fence_set_error(&s_fence->finished, -ESRCH); > @@ -250,6 +253,7 @@ static void drm_sched_entity_kill_jobs(struct drm_sched_entity *entity) > continue; > } > > + dma_fence_get(entity->last_scheduled); > r = dma_fence_add_callback(entity->last_scheduled, > &job->finish_cb, > drm_sched_entity_kill_jobs_cb);
diff --git a/drivers/gpu/drm/scheduler/sched_entity.c b/drivers/gpu/drm/scheduler/sched_entity.c index 191c56064f19..1bb1437a8fed 100644 --- a/drivers/gpu/drm/scheduler/sched_entity.c +++ b/drivers/gpu/drm/scheduler/sched_entity.c @@ -207,6 +207,7 @@ static void drm_sched_entity_kill_jobs_cb(struct dma_fence *f, struct drm_sched_job *job = container_of(cb, struct drm_sched_job, finish_cb); + dma_fence_put(f); init_irq_work(&job->work, drm_sched_entity_kill_jobs_irq_work); irq_work_queue(&job->work); } @@ -234,8 +235,10 @@ static void drm_sched_entity_kill_jobs(struct drm_sched_entity *entity) struct drm_sched_fence *s_fence = job->s_fence; /* Wait for all dependencies to avoid data corruptions */ - while ((f = drm_sched_job_dependency(job, entity))) + while ((f = drm_sched_job_dependency(job, entity))) { dma_fence_wait(f, false); + dma_fence_put(f); + } drm_sched_fence_scheduled(s_fence); dma_fence_set_error(&s_fence->finished, -ESRCH); @@ -250,6 +253,7 @@ static void drm_sched_entity_kill_jobs(struct drm_sched_entity *entity) continue; } + dma_fence_get(entity->last_scheduled); r = dma_fence_add_callback(entity->last_scheduled, &job->finish_cb, drm_sched_entity_kill_jobs_cb);
We leaked dependency fences when processes were beeing killed. Additional to that grab a reference to the last scheduled fence. Signed-off-by: Christian König <christian.koenig@amd.com> --- drivers/gpu/drm/scheduler/sched_entity.c | 6 +++++- 1 file changed, 5 insertions(+), 1 deletion(-)