Message ID | 1574691041-5499-1-git-send-email-andrey.grodzovsky@amd.com (mailing list archive)
---|---
State      | New, archived
Series     | [v3,1/2] drm/sched: Avoid job cleanup if sched thread is parked.
On 25/11/2019 14:10, Andrey Grodzovsky wrote:
> When the sched thread is parked we assume ring_mirror_list is
> not accessed from here.

FWIW I don't think this is necessary. kthread_park() will wait until the
thread is parked, at which point the thread is stuck in kthread_parkme()
until unparked.

So all this does is avoid waiting for any cleanup jobs before parking -
which might be a reasonable goal in itself, but if so let's at least
document that.

Steve

> Signed-off-by: Andrey Grodzovsky <andrey.grodzovsky@amd.com>
> Reviewed-by: Christian König <christian.koenig@amd.com>
> ---
>  drivers/gpu/drm/scheduler/sched_main.c | 10 +++++++---
>  1 file changed, 7 insertions(+), 3 deletions(-)
>
> diff --git a/drivers/gpu/drm/scheduler/sched_main.c b/drivers/gpu/drm/scheduler/sched_main.c
> index d4cc728..6774955 100644
> --- a/drivers/gpu/drm/scheduler/sched_main.c
> +++ b/drivers/gpu/drm/scheduler/sched_main.c
> @@ -635,9 +635,13 @@ drm_sched_get_cleanup_job(struct drm_gpu_scheduler *sched)
>  	struct drm_sched_job *job;
>  	unsigned long flags;
>
> -	/* Don't destroy jobs while the timeout worker is running */
> -	if (sched->timeout != MAX_SCHEDULE_TIMEOUT &&
> -	    !cancel_delayed_work(&sched->work_tdr))
> +	/*
> +	 * Don't destroy jobs while the timeout worker is running OR thread
> +	 * is being parked and hence assumed to not touch ring_mirror_list
> +	 */
> +	if ((sched->timeout != MAX_SCHEDULE_TIMEOUT &&
> +	     !cancel_delayed_work(&sched->work_tdr)) ||
> +	    __kthread_should_park(sched->thread))
>  		return NULL;
>
>  	spin_lock_irqsave(&sched->job_list_lock, flags);
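For readers following along, the guard being debated short-circuits exactly as the patch spells out. Below is a small user-space sketch of that condition; the `fake_*` struct and helpers are hypothetical stand-ins for `struct drm_gpu_scheduler`, `cancel_delayed_work()` and `__kthread_should_park()`, not kernel code.

```c
#include <assert.h>
#include <stdbool.h>
#include <limits.h>

/* Hypothetical stand-in for MAX_SCHEDULE_TIMEOUT. */
#define MAX_SCHEDULE_TIMEOUT LONG_MAX

struct fake_sched {
	long timeout;       /* sched->timeout */
	bool tdr_pending;   /* whether the TDR delayed work is still queued */
	bool should_park;   /* stands in for __kthread_should_park() */
};

/* cancel_delayed_work() returns true only if the work was still pending,
 * i.e. the timeout handler has definitely not started running. */
static bool fake_cancel_delayed_work(struct fake_sched *s)
{
	bool was_pending = s->tdr_pending;

	s->tdr_pending = false;
	return was_pending;
}

/* Mirrors the patched guard: bail out if the timeout worker may already
 * be running, or if the scheduler thread is about to park. */
static bool cleanup_allowed(struct fake_sched *s)
{
	if ((s->timeout != MAX_SCHEDULE_TIMEOUT &&
	     !fake_cancel_delayed_work(s)) ||
	    s->should_park)
		return false;
	return true;
}
```

Note how `should_park` vetoes cleanup even when no timeout is configured at all, which is the new behaviour the patch adds.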
Am 25.11.19 um 17:51 schrieb Steven Price:
> On 25/11/2019 14:10, Andrey Grodzovsky wrote:
>> When the sched thread is parked we assume ring_mirror_list is
>> not accessed from here.
> FWIW I don't think this is necessary. kthread_park() will wait until the
> thread is parked, at which point the thread is stuck in kthread_parkme()
> until unparked.
>
> So all this does is avoid waiting for any cleanup jobs before parking -
> which might be a reasonable goal in itself, but if so let's at least
> document that.

Now that you mention it, that is indeed wrong.

The real problem is that in the main thread we mangled the call to
kthread_parkme() into drm_sched_blocked(), which can be called in
atomic context.

I suggest reworking this so that the kthread_should_park() and
kthread_should_stop() tests in wait_event_interruptible() come first,
and then calling kthread_parkme() outside of the
wait_event_interruptible().

Regards,
Christian.

>
> Steve
>
>> Signed-off-by: Andrey Grodzovsky <andrey.grodzovsky@amd.com>
>> Reviewed-by: Christian König <christian.koenig@amd.com>
>> ---
>>  drivers/gpu/drm/scheduler/sched_main.c | 10 +++++++---
>>  1 file changed, 7 insertions(+), 3 deletions(-)
>>
>> [patch quoted in full above - snipped]
> _______________________________________________
> amd-gfx mailing list
> amd-gfx@lists.freedesktop.org
> https://lists.freedesktop.org/mailman/listinfo/amd-gfx
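Christian's suggested restructuring - test `kthread_should_park()`/`kthread_should_stop()` inside the wait condition, but call the sleeping `kthread_parkme()` only after the wait returns - can be sketched in user space like this. The `*_stub` names are hypothetical stand-ins for the kthread API, and the loop body is compressed to a single testable iteration.

```c
#include <assert.h>
#include <stdbool.h>

/* Stubbed kthread state (the *_stub names are hypothetical). */
static bool park_requested;
static bool stop_requested;
static int parkme_calls;

static bool kthread_should_park_stub(void) { return park_requested; }
static bool kthread_should_stop_stub(void) { return stop_requested; }

static void kthread_parkme_stub(void)
{
	parkme_calls++;          /* a real parkme sleeps here until unparked */
	park_requested = false;
}

/* One iteration of the reworked loop: the wait condition only *tests*
 * for park/stop (cheap and non-sleeping, so harmless even if evaluated
 * from an awkward context); the sleeping kthread_parkme() call happens
 * afterwards, in normal thread context. Returns false on stop. */
static bool sched_main_iteration(bool work_ready)
{
	/* This expression plays the role of the wait_event_interruptible()
	 * condition; here we evaluate it once instead of blocking. */
	if (!(kthread_should_park_stub() || kthread_should_stop_stub() ||
	      work_ready))
		return true;  /* nothing to do, keep waiting */

	if (kthread_should_park_stub())
		kthread_parkme_stub();  /* outside the condition, may sleep */

	return !kthread_should_stop_stub();
}
```

The design point is simply that the condition expression of a `wait_event_*` macro must never sleep, while `kthread_parkme()` blocks until unparked - hence the split.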
On 11/26/19 4:08 AM, Christian König wrote:
> Am 25.11.19 um 17:51 schrieb Steven Price:
>> On 25/11/2019 14:10, Andrey Grodzovsky wrote:
>>> When the sched thread is parked we assume ring_mirror_list is
>>> not accessed from here.
>> FWIW I don't think this is necessary. kthread_park() will wait until the
>> thread is parked, at which point the thread is stuck in kthread_parkme()
>> until unparked.
>>
>> So all this does is avoid waiting for any cleanup jobs before parking -
>> which might be a reasonable goal in itself, but if so let's at least
>> document that.
>
> Now that you mention it, that is indeed wrong.

I wouldn't call it wrong, but rather superfluous in the current code:
once the thread is parked there will be no subsequent calls to
drm_sched_get_cleanup_job until the thread is unparked again. If, for
example, we later decide to call drm_sched_get_cleanup_job from a work
item that keeps rescheduling itself, the check would be needed.

>
> The real problem is that in the main thread we mangled the call to
> kthread_parkme() into drm_sched_blocked() which can be called in
> atomic context.

Where is the atomic context in wait_event_interruptible? I don't seem
to see any.

Andrey

>
> I suggest to rework this so that the kthread_should_park() and
> kthread_should_stop() test in wait_event_interruptible() come first
> and then call kthread_parkme() outside of the wait_event_interruptible().
>
> Regards,
> Christian.
>
>> [quoted patch and list footer snipped]
Ping...

Andrey

On 11/26/19 10:36 AM, Andrey Grodzovsky wrote:
> On 11/26/19 4:08 AM, Christian König wrote:
>> Now that you mention it, that is indeed wrong.
>
> I wouldn't call it wrong, but rather superfluous in the current code:
> once the thread is parked there will be no subsequent calls to
> drm_sched_get_cleanup_job until the thread is unparked again. If, for
> example, we later decide to call drm_sched_get_cleanup_job from a work
> item that keeps rescheduling itself, the check would be needed.
>
>> The real problem is that in the main thread we mangled the call to
>> kthread_parkme() into drm_sched_blocked() which can be called in
>> atomic context.
>
> Where is the atomic context in wait_event_interruptible? I don't seem
> to see any.
>
> Andrey
>
>> I suggest to rework this so that the kthread_should_park() and
>> kthread_should_stop() test in wait_event_interruptible() come first
>> and then call kthread_parkme() outside of the
>> wait_event_interruptible().
>>
>> Regards,
>> Christian.
>>
>> [remainder of quoted thread, including patch and list footer, snipped]
Am 27.11.19 um 16:32 schrieb Andrey Grodzovsky:
> Ping...
>
> Andrey
>
> On 11/26/19 10:36 AM, Andrey Grodzovsky wrote:
>> On 11/26/19 4:08 AM, Christian König wrote:
>>> The real problem is that in the main thread we mangled the call to
>>> kthread_parkme() into drm_sched_blocked() which can be called in
>>> atomic context.
>>
>> Where is the atomic context in wait_event_interruptible? I don't seem
>> to see any.

It's a rare event, but the check code in a wait_event_* macro might be
called in atomic context. That's also the reason why Steven Price came
up with the following patch:

> commit 588b9828f0744ca13555c4a35cd0251ac8ad8ad2
> Author: Steven Price <steven.price@arm.com>
> Date:   Fri Oct 25 11:51:56 2019 +0100
>
>     drm: Don't free jobs in wait_event_interruptible()

BTW: I've just pushed the v4 of that patch to drm-misc-next.

Christian.

> [remainder of quoted thread, including patch and list footer, snipped]
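The patch Christian references moves job freeing out of the wait condition: the condition only *detaches* a finished job, and the heavy `free_job` callback runs afterwards in thread context. A user-space sketch of that split (the `fake_*`/`get_cleanup_job` names here are illustrative stand-ins, not the actual kernel symbols, and the single-slot `pending_job` stands in for the ring mirror list):

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

struct fake_job { int id; };

static struct fake_job *pending_job;  /* at most one finished job, for brevity */
static int jobs_freed;

/* Cheap, non-sleeping check: just *detach* a finished job if there is
 * one. This is the only part that runs as the wait condition. */
static struct fake_job *get_cleanup_job(void)
{
	struct fake_job *job = pending_job;

	pending_job = NULL;
	return job;
}

/* The potentially heavy freeing work runs here, in normal thread
 * context, never inside the wait_event_* condition itself. */
static void main_loop_once(void)
{
	struct fake_job *job = get_cleanup_job();  /* "condition" part */

	if (job) {
		free(job);   /* stands in for sched->ops->free_job() */
		jobs_freed++;
	}
}
```

This keeps the condition expression side-effect-light, which matters precisely because, as noted above, it can be evaluated in atomic context.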
diff --git a/drivers/gpu/drm/scheduler/sched_main.c b/drivers/gpu/drm/scheduler/sched_main.c
index d4cc728..6774955 100644
--- a/drivers/gpu/drm/scheduler/sched_main.c
+++ b/drivers/gpu/drm/scheduler/sched_main.c
@@ -635,9 +635,13 @@ drm_sched_get_cleanup_job(struct drm_gpu_scheduler *sched)
 	struct drm_sched_job *job;
 	unsigned long flags;
 
-	/* Don't destroy jobs while the timeout worker is running */
-	if (sched->timeout != MAX_SCHEDULE_TIMEOUT &&
-	    !cancel_delayed_work(&sched->work_tdr))
+	/*
+	 * Don't destroy jobs while the timeout worker is running OR thread
+	 * is being parked and hence assumed to not touch ring_mirror_list
+	 */
+	if ((sched->timeout != MAX_SCHEDULE_TIMEOUT &&
+	     !cancel_delayed_work(&sched->work_tdr)) ||
+	    __kthread_should_park(sched->thread))
 		return NULL;
 
 	spin_lock_irqsave(&sched->job_list_lock, flags);