Message ID | 20210708215439.4093557-1-daniel.vetter@ffwll.ch (mailing list archive)
---|---
State | New, archived
Series | drm/sched: Barriers are needed for entity->last_scheduled
On 08.07.21 at 23:54, Daniel Vetter wrote:
> [...]
> +	/*
> +	 * If the queue is empty we allow drm_sched_entity_select_rq() to
> +	 * locklessly access ->last_scheduled. This only works if we set the
> +	 * pointer before we dequeue and if we have a write barrier here.
> +	 */
> +	smp_wmb();
> +

That whole stuff needs to be inside the spsc queue, not outside.

Otherwise drm_sched_entity_is_idle() won't work either and cause a lot
of trouble during process tear down.

Christian.
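To make the suggestion concrete, here is a minimal sketch of a queue whose pop operation carries release semantics itself, so that every lockless reader of the count (drm_sched_entity_select_rq(), drm_sched_entity_is_idle(), ...) inherits the ordering. The struct and helper names are illustrative stand-ins, not the actual include/drm/spsc_queue.h implementation:

```c
#include <linux/atomic.h>

/* Illustrative stand-in; not the real struct spsc_queue. */
struct spsc_sketch {
	atomic_t job_count;
	/* node pointers elided */
};

static inline void spsc_sketch_pop(struct spsc_sketch *q)
{
	/*
	 * Release ordering: all stores made before the pop (such as an
	 * entity->last_scheduled update) are visible to any reader that
	 * observes the decremented count.
	 */
	atomic_dec_return_release(&q->job_count);
}

static inline int spsc_sketch_count(struct spsc_sketch *q)
{
	/* Acquire side, pairing with the release in spsc_sketch_pop(). */
	return atomic_read_acquire(&q->job_count);
}
```

With the ordering folded into the queue like this, callers would not need their own barriers around the count check.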
On Fri, Jul 9, 2021 at 8:58 AM Christian König <christian.koenig@amd.com> wrote:
> On 08.07.21 at 23:54, Daniel Vetter wrote:
> > [...]
> > +	smp_wmb();
> > +
>
> That whole stuff needs to be inside the spsc queue, not outside.
>
> Otherwise drm_sched_entity_is_idle() won't work either and cause a lot
> of trouble during process tear down.
>
> Christian.

Nah, that just means you need another two comments with their barrier
to explain how things are serialized there against entity->stopped.

The queue only needs to provide store-release and load-acquire barriers
from a functional PoV; if you assume more than that, that's very
strange. We need barriers in the other direction here (I haven't looked
at what entity_is_idle needs).

This is why the first rule of lockless algorithms is "don't", and the
second one is to very painstakingly document every barrier necessary
with
- an explanation of what it synchronizes and why
- a pointer to where the other side of the barrier is in the code
  (there always has to be one, a barrier on one side does nothing)

-Daniel
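For reference, the pairing the patch relies on, reduced to a skeleton. The types and function names are illustrative stand-ins for struct drm_sched_entity and the two scheduler functions, not the real code:

```c
#include <linux/atomic.h>

/* Illustrative stand-in; not the real struct drm_sched_entity. */
struct entity_sketch {
	void *last_scheduled;
	atomic_t queue_count;
};

/* Scheduler-thread side, mirroring drm_sched_entity_pop_job(). */
static void sketch_pop_job(struct entity_sketch *e, void *fence)
{
	e->last_scheduled = fence;	/* A: publish the fence */
	smp_wmb();			/* pairs with smp_rmb() in sketch_select_rq() */
	atomic_dec(&e->queue_count);	/* B: dequeue, queue may now look empty */
}

/* Submission side, mirroring drm_sched_entity_select_rq(). */
static void *sketch_select_rq(struct entity_sketch *e)
{
	if (atomic_read(&e->queue_count))
		return NULL;		/* queue non-empty, scheduler owns ->last_scheduled */
	smp_rmb();			/* pairs with smp_wmb() in sketch_pop_job() */
	return e->last_scheduled;	/* ordered after the empty-queue check */
}
```

If the reader observes the dequeue at B (queue empty), the paired barriers guarantee it also observes the store at A; without them, the reader could see an empty queue yet still load a stale ->last_scheduled.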
diff --git a/drivers/gpu/drm/scheduler/sched_entity.c b/drivers/gpu/drm/scheduler/sched_entity.c
index 64d398166644..6366006c0fcf 100644
--- a/drivers/gpu/drm/scheduler/sched_entity.c
+++ b/drivers/gpu/drm/scheduler/sched_entity.c
@@ -439,8 +439,16 @@ struct drm_sched_job *drm_sched_entity_pop_job(struct drm_sched_entity *entity)
 	dma_fence_set_error(&sched_job->s_fence->finished, -ECANCELED);
 
 	dma_fence_put(entity->last_scheduled);
+
 	entity->last_scheduled = dma_fence_get(&sched_job->s_fence->finished);
 
+	/*
+	 * If the queue is empty we allow drm_sched_entity_select_rq() to
+	 * locklessly access ->last_scheduled. This only works if we set the
+	 * pointer before we dequeue and if we have a write barrier here.
+	 */
+	smp_wmb();
+
 	spsc_queue_pop(&entity->job_queue);
 	return sched_job;
 }
@@ -459,10 +467,25 @@ void drm_sched_entity_select_rq(struct drm_sched_entity *entity)
 	struct drm_gpu_scheduler *sched;
 	struct drm_sched_rq *rq;
 
-	if (spsc_queue_count(&entity->job_queue) || !entity->sched_list)
+	/* single possible engine and already selected */
+	if (!entity->sched_list)
+		return;
+
+	/* queue non-empty, stay on the same engine */
+	if (spsc_queue_count(&entity->job_queue))
 		return;
 
-	fence = READ_ONCE(entity->last_scheduled);
+	/*
+	 * Only when the queue is empty are we guaranteed that the scheduler
+	 * thread cannot change ->last_scheduled. To enforce ordering we need
+	 * a read barrier here. See drm_sched_entity_pop_job() for the other
+	 * side.
+	 */
+	smp_rmb();
+
+	fence = entity->last_scheduled;
+
+	/* stay on the same engine if the previous job hasn't finished */
 	if (fence && !dma_fence_is_signaled(fence))
 		return;
 
It might be good enough on x86 with just READ_ONCE, but the write side
should then at least be WRITE_ONCE because x86 has total store order.

It's definitely not enough on arm.

Fix this properly, which means
- explain the need for the barrier in both places
- point at the other side in each comment

Also pull out the !sched_list case as the first check, so that the
code flow is clearer.

While at it, sprinkle some comments around because it was very
non-obvious to me what's actually going on here and why.

Note that we really need full barriers here: at first I thought
store-release and load-acquire on ->last_scheduled would be enough,
but we actually require ordering between that and the queue state.

v2: Put smp_rmb() in the right place and fix up comment (Andrey)

Signed-off-by: Daniel Vetter <daniel.vetter@intel.com>
Cc: "Christian König" <christian.koenig@amd.com>
Cc: Steven Price <steven.price@arm.com>
Cc: Daniel Vetter <daniel.vetter@ffwll.ch>
Cc: Andrey Grodzovsky <andrey.grodzovsky@amd.com>
Cc: Lee Jones <lee.jones@linaro.org>
Cc: Boris Brezillon <boris.brezillon@collabora.com>
---
 drivers/gpu/drm/scheduler/sched_entity.c | 27 ++++++++++++++++++++++--
 1 file changed, 25 insertions(+), 2 deletions(-)
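A sketch of the rejected alternative mentioned above, reusing the illustrative entity_sketch type from the earlier skeleton: a release on the pointer store only constrains accesses *before* it, so it cannot order the store against the dequeue that follows.

```c
/*
 * Hypothetical variant, NOT what the patch does: release semantics on
 * the pointer store alone.
 */
static void sketch_pop_job_release_only(struct entity_sketch *e, void *fence)
{
	smp_store_release(&e->last_scheduled, fence);
	/*
	 * The release orders everything *before* the store against the
	 * store itself, but says nothing about the dequeue below: the
	 * CPU may make the queue look empty before ->last_scheduled is
	 * visible, which is exactly the window the reader must not see.
	 * Hence the full smp_wmb() between the two in the patch.
	 */
	atomic_dec(&e->queue_count);
}
```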