diff mbox series

[v3,1/2] drm/sched: Avoid job cleanup if sched thread is parked.

Message ID 1574691041-5499-1-git-send-email-andrey.grodzovsky@amd.com (mailing list archive)
State New, archived
Headers show
Series [v3,1/2] drm/sched: Avoid job cleanup if sched thread is parked. | expand

Commit Message

Andrey Grodzovsky Nov. 25, 2019, 2:10 p.m. UTC
When the sched thread is parked we assume ring_mirror_list is
not accessed from here.

Signed-off-by: Andrey Grodzovsky <andrey.grodzovsky@amd.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
---
 drivers/gpu/drm/scheduler/sched_main.c | 10 +++++++---
 1 file changed, 7 insertions(+), 3 deletions(-)

Comments

Steven Price Nov. 25, 2019, 4:51 p.m. UTC | #1
On 25/11/2019 14:10, Andrey Grodzovsky wrote:
> When the sched thread is parked we assume ring_mirror_list is
> not accessed from here.

FWIW I don't think this is necessary. kthread_park() will wait until the
thread is parked, at which point the thread is stuck in kthread_parkme()
until unparked.

So all this does is avoid waiting for any cleanup jobs before parking -
which might be a reasonable goal in itself, but if so let's at least
document that.

Steve

> 
> Signed-off-by: Andrey Grodzovsky <andrey.grodzovsky@amd.com>
> Reviewed-by: Christian König <christian.koenig@amd.com>
> ---
>  drivers/gpu/drm/scheduler/sched_main.c | 10 +++++++---
>  1 file changed, 7 insertions(+), 3 deletions(-)
> 
> diff --git a/drivers/gpu/drm/scheduler/sched_main.c b/drivers/gpu/drm/scheduler/sched_main.c
> index d4cc728..6774955 100644
> --- a/drivers/gpu/drm/scheduler/sched_main.c
> +++ b/drivers/gpu/drm/scheduler/sched_main.c
> @@ -635,9 +635,13 @@ drm_sched_get_cleanup_job(struct drm_gpu_scheduler *sched)
>  	struct drm_sched_job *job;
>  	unsigned long flags;
>  
> -	/* Don't destroy jobs while the timeout worker is running */
> -	if (sched->timeout != MAX_SCHEDULE_TIMEOUT &&
> -	    !cancel_delayed_work(&sched->work_tdr))
> +	/*
> +	* Don't destroy jobs while the timeout worker is running  OR thread
> +	* is being parked and hence assumed to not touch ring_mirror_list
> +	*/
> +	if ((sched->timeout != MAX_SCHEDULE_TIMEOUT &&
> +	     !cancel_delayed_work(&sched->work_tdr)) ||
> +	     __kthread_should_park(sched->thread))
>  		return NULL;
>  
>  	spin_lock_irqsave(&sched->job_list_lock, flags);
>
Christian König Nov. 26, 2019, 9:08 a.m. UTC | #2
Am 25.11.19 um 17:51 schrieb Steven Price:
> On 25/11/2019 14:10, Andrey Grodzovsky wrote:
>> When the sched thread is parked we assume ring_mirror_list is
>> not accessed from here.
> FWIW I don't think this is necessary. kthread_park() will wait until the
> thread is parked, at which point the thread is stuck in kthread_parkme()
> until unparked.
>
> So all this does is avoid waiting for any cleanup jobs before parking -
> which might be a reasonable goal in itself, but if so let's at least
> document that.

Now that you mention it that is indeed wrong.

The real problem is that in the main thread we mangled the call to 
kthread_parkme() into drm_sched_blocked() which can be called in atomic 
context.

I suggest to rework this so that the kthread_should_park() and 
kthread_should_stop() tests in wait_event_interruptible() come first and 
then call kthread_parkme() outside of the wait_event_interruptible().
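[Editorial sketch] A rough sketch of that rework, in kernel-style pseudocode — not a tested patch. The loop shape follows the existing drm_sched_main(); cleanup_pending() is a hypothetical helper standing in for the real "is there work to do" checks:

```
static int drm_sched_main(void *param)
{
	struct drm_gpu_scheduler *sched = param;

	while (!kthread_should_stop()) {
		/* The condition below may run with the task state already
		 * changed, so it only tests flags and never sleeps or
		 * frees jobs. */
		wait_event_interruptible(sched->wake_up_worker,
					 kthread_should_park() ||
					 cleanup_pending(sched));

		/* Park here, outside the wait, where sleeping is safe. */
		if (kthread_should_park()) {
			kthread_parkme();
			continue;
		}

		/* ... fetch and free finished jobs, run the next job ... */
	}
	return 0;
}
```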

Regards,
Christian.

>
> Steve
>
>> Signed-off-by: Andrey Grodzovsky <andrey.grodzovsky@amd.com>
>> Reviewed-by: Christian König <christian.koenig@amd.com>
>> ---
>>   drivers/gpu/drm/scheduler/sched_main.c | 10 +++++++---
>>   1 file changed, 7 insertions(+), 3 deletions(-)
>>
>> diff --git a/drivers/gpu/drm/scheduler/sched_main.c b/drivers/gpu/drm/scheduler/sched_main.c
>> index d4cc728..6774955 100644
>> --- a/drivers/gpu/drm/scheduler/sched_main.c
>> +++ b/drivers/gpu/drm/scheduler/sched_main.c
>> @@ -635,9 +635,13 @@ drm_sched_get_cleanup_job(struct drm_gpu_scheduler *sched)
>>   	struct drm_sched_job *job;
>>   	unsigned long flags;
>>   
>> -	/* Don't destroy jobs while the timeout worker is running */
>> -	if (sched->timeout != MAX_SCHEDULE_TIMEOUT &&
>> -	    !cancel_delayed_work(&sched->work_tdr))
>> +	/*
>> +	* Don't destroy jobs while the timeout worker is running  OR thread
>> +	* is being parked and hence assumed to not touch ring_mirror_list
>> +	*/
>> +	if ((sched->timeout != MAX_SCHEDULE_TIMEOUT &&
>> +	     !cancel_delayed_work(&sched->work_tdr)) ||
>> +	     __kthread_should_park(sched->thread))
>>   		return NULL;
>>   
>>   	spin_lock_irqsave(&sched->job_list_lock, flags);
>>
> _______________________________________________
> amd-gfx mailing list
> amd-gfx@lists.freedesktop.org
> https://lists.freedesktop.org/mailman/listinfo/amd-gfx
Andrey Grodzovsky Nov. 26, 2019, 3:36 p.m. UTC | #3
On 11/26/19 4:08 AM, Christian König wrote:
> Am 25.11.19 um 17:51 schrieb Steven Price:
>> On 25/11/2019 14:10, Andrey Grodzovsky wrote:
>>> When the sched thread is parked we assume ring_mirror_list is
>>> not accessed from here.
>> FWIW I don't think this is necessary. kthread_park() will wait until the
>> thread is parked, at which point the thread is stuck in kthread_parkme()
>> until unparked.
>>
>> So all this does is avoid waiting for any cleanup jobs before parking -
>> which might be a reasonable goal in itself, but if so let's at least
>> document that.
>
> Now that you mention it that is indeed wrong.


I wouldn't call it wrong, but superfluous in the current code: once 
the thread is parked there will be no subsequent calls to 
drm_sched_get_cleanup_job until the thread is unparked again. If, for 
example, we decide to call drm_sched_get_cleanup_job from a work item 
which keeps getting rescheduled, it would be needed.


>
> The real problem is that in the main thread we mangled the call to 
> kthread_parkme() into drm_sched_blocked() which can be called in 
> atomic context.


Where is the atomic context in wait_event_interruptible? I don't seem 
to see any.

Andrey


>
> I suggest to rework this so that the kthread_should_park() and 
>> kthread_should_stop() tests in wait_event_interruptible() come first 
> and then call kthread_parkme() outside of the wait_event_interruptible().
>
> Regards,
> Christian.
>
>>
>> Steve
>>
>>> Signed-off-by: Andrey Grodzovsky <andrey.grodzovsky@amd.com>
>>> Reviewed-by: Christian König <christian.koenig@amd.com>
>>> ---
>>>   drivers/gpu/drm/scheduler/sched_main.c | 10 +++++++---
>>>   1 file changed, 7 insertions(+), 3 deletions(-)
>>>
>>> diff --git a/drivers/gpu/drm/scheduler/sched_main.c 
>>> b/drivers/gpu/drm/scheduler/sched_main.c
>>> index d4cc728..6774955 100644
>>> --- a/drivers/gpu/drm/scheduler/sched_main.c
>>> +++ b/drivers/gpu/drm/scheduler/sched_main.c
>>> @@ -635,9 +635,13 @@ drm_sched_get_cleanup_job(struct 
>>> drm_gpu_scheduler *sched)
>>>       struct drm_sched_job *job;
>>>       unsigned long flags;
>>>   -    /* Don't destroy jobs while the timeout worker is running */
>>> -    if (sched->timeout != MAX_SCHEDULE_TIMEOUT &&
>>> -        !cancel_delayed_work(&sched->work_tdr))
>>> +    /*
>>> +    * Don't destroy jobs while the timeout worker is running OR thread
>>> +    * is being parked and hence assumed to not touch ring_mirror_list
>>> +    */
>>> +    if ((sched->timeout != MAX_SCHEDULE_TIMEOUT &&
>>> +         !cancel_delayed_work(&sched->work_tdr)) ||
>>> +         __kthread_should_park(sched->thread))
>>>           return NULL;
>>>         spin_lock_irqsave(&sched->job_list_lock, flags);
>>>
>> _______________________________________________
>> amd-gfx mailing list
>> amd-gfx@lists.freedesktop.org
>> https://lists.freedesktop.org/mailman/listinfo/amd-gfx
>>
>
Andrey Grodzovsky Nov. 27, 2019, 3:32 p.m. UTC | #4
Ping...

Andrey

On 11/26/19 10:36 AM, Andrey Grodzovsky wrote:
>
> On 11/26/19 4:08 AM, Christian König wrote:
>> Am 25.11.19 um 17:51 schrieb Steven Price:
>>> On 25/11/2019 14:10, Andrey Grodzovsky wrote:
>>>> When the sched thread is parked we assume ring_mirror_list is
>>>> not accessed from here.
>>> FWIW I don't think this is necessary. kthread_park() will wait until 
>>> the
>>> thread is parked, at which point the thread is stuck in 
>>> kthread_parkme()
>>> until unparked.
>>>
>>> So all this does is avoid waiting for any cleanup jobs before parking -
>>> which might be a reasonable goal in itself, but if so let's at least
>>> document that.
>>
>> Now that you mention it that is indeed wrong.
>
>
> I wouldn't call it wrong, but superfluous in the current code: once 
> the thread is parked there will be no subsequent calls to 
> drm_sched_get_cleanup_job until the thread is unparked again. If, for 
> example, we decide to call drm_sched_get_cleanup_job from a work item 
> which keeps getting rescheduled, it would be needed.
>
>
>>
>> The real problem is that in the main thread we mangled the call to 
>> kthread_parkme() into drm_sched_blocked() which can be called in 
>> atomic context.
>
>
> Where is the atomic context in wait_event_interruptible? I don't seem 
> to see any.
>
> Andrey
>
>
>>
>> I suggest to rework this so that the kthread_should_park() and 
>> kthread_should_stop() tests in wait_event_interruptible() come first 
>> and then call kthread_parkme() outside of the 
>> wait_event_interruptible().
>>
>> Regards,
>> Christian.
>>
>>>
>>> Steve
>>>
>>>> Signed-off-by: Andrey Grodzovsky <andrey.grodzovsky@amd.com>
>>>> Reviewed-by: Christian König <christian.koenig@amd.com>
>>>> ---
>>>>   drivers/gpu/drm/scheduler/sched_main.c | 10 +++++++---
>>>>   1 file changed, 7 insertions(+), 3 deletions(-)
>>>>
>>>> diff --git a/drivers/gpu/drm/scheduler/sched_main.c 
>>>> b/drivers/gpu/drm/scheduler/sched_main.c
>>>> index d4cc728..6774955 100644
>>>> --- a/drivers/gpu/drm/scheduler/sched_main.c
>>>> +++ b/drivers/gpu/drm/scheduler/sched_main.c
>>>> @@ -635,9 +635,13 @@ drm_sched_get_cleanup_job(struct 
>>>> drm_gpu_scheduler *sched)
>>>>       struct drm_sched_job *job;
>>>>       unsigned long flags;
>>>>   -    /* Don't destroy jobs while the timeout worker is running */
>>>> -    if (sched->timeout != MAX_SCHEDULE_TIMEOUT &&
>>>> -        !cancel_delayed_work(&sched->work_tdr))
>>>> +    /*
>>>> +    * Don't destroy jobs while the timeout worker is running OR 
>>>> thread
>>>> +    * is being parked and hence assumed to not touch ring_mirror_list
>>>> +    */
>>>> +    if ((sched->timeout != MAX_SCHEDULE_TIMEOUT &&
>>>> +         !cancel_delayed_work(&sched->work_tdr)) ||
>>>> +         __kthread_should_park(sched->thread))
>>>>           return NULL;
>>>>         spin_lock_irqsave(&sched->job_list_lock, flags);
>>>>
>>> _______________________________________________
>>> amd-gfx mailing list
>>> amd-gfx@lists.freedesktop.org
>>> https://lists.freedesktop.org/mailman/listinfo/amd-gfx
>>>
>>
Christian König Nov. 27, 2019, 3:35 p.m. UTC | #5
Am 27.11.19 um 16:32 schrieb Andrey Grodzovsky:
> Ping...
>
> Andrey
>
> On 11/26/19 10:36 AM, Andrey Grodzovsky wrote:
>>
>> On 11/26/19 4:08 AM, Christian König wrote:
>>> Am 25.11.19 um 17:51 schrieb Steven Price:
>>>> On 25/11/2019 14:10, Andrey Grodzovsky wrote:
>>>>> When the sched thread is parked we assume ring_mirror_list is
>>>>> not accessed from here.
>>>> FWIW I don't think this is necessary. kthread_park() will wait 
>>>> until the
>>>> thread is parked, at which point the thread is stuck in 
>>>> kthread_parkme()
>>>> until unparked.
>>>>
>>>> So all this does is avoid waiting for any cleanup jobs before 
>>>> parking -
>>> which might be a reasonable goal in itself, but if so let's at least
>>>> document that.
>>>
>>> Now that you mention it that is indeed wrong.
>>
>>
>> I wouldn't call it wrong, but superfluous in the current code: once 
>> the thread is parked there will be no subsequent calls to 
>> drm_sched_get_cleanup_job until the thread is unparked again. If, for 
>> example, we decide to call drm_sched_get_cleanup_job from a work item 
>> which keeps getting rescheduled, it would be needed.
>>
>>
>>>
>>> The real problem is that in the main thread we mangled the call to 
>>> kthread_parkme() into drm_sched_blocked() which can be called in 
>>> atomic context.
>>
>>
>> Where is the atomic context in wait_event_interruptible? I don't 
>> seem to see any.

It's a rare event, but the check code in a wait_event_* macro might be 
called in atomic context.
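[Editorial sketch] This is easier to see from the (heavily simplified) shape that wait_event_interruptible(wq, condition) expands to. The exact macro in include/linux/wait.h differs in detail, but the key point is that `condition` is evaluated after the task state has already been set:

```
/* Simplified sketch of the wait_event_interruptible() loop: */
for (;;) {
	prepare_to_wait(&wq, &__wait, TASK_INTERRUPTIBLE);
	if (condition)			/* evaluated with the task already  */
		break;			/* in TASK_INTERRUPTIBLE; anything  */
	if (signal_pending(current)) {	/* that sleeps here clobbers that   */
		ret = -ERESTARTSYS;	/* task state, so freeing jobs from */
		break;			/* the condition is not safe        */
	}
	schedule();
}
finish_wait(&wq, &__wait);
```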

That's also the reason why Steven Price came up with the following patch:
> commit 588b9828f0744ca13555c4a35cd0251ac8ad8ad2
> Author: Steven Price <steven.price@arm.com>
> Date:   Fri Oct 25 11:51:56 2019 +0100
>
>     drm: Don't free jobs in wait_event_interruptible()

BTW: I've just pushed the v4 of that patch to drm-misc-next.

Christian.

>>
>> Andrey
>>
>>
>>>
>>> I suggest to rework this so that the kthread_should_park() and 
>>> kthread_should_stop() tests in wait_event_interruptible() come first 
>>> and then call kthread_parkme() outside of the 
>>> wait_event_interruptible().
>>>
>>> Regards,
>>> Christian.
>>>
>>>>
>>>> Steve
>>>>
>>>>> Signed-off-by: Andrey Grodzovsky <andrey.grodzovsky@amd.com>
>>>>> Reviewed-by: Christian König <christian.koenig@amd.com>
>>>>> ---
>>>>>   drivers/gpu/drm/scheduler/sched_main.c | 10 +++++++---
>>>>>   1 file changed, 7 insertions(+), 3 deletions(-)
>>>>>
>>>>> diff --git a/drivers/gpu/drm/scheduler/sched_main.c 
>>>>> b/drivers/gpu/drm/scheduler/sched_main.c
>>>>> index d4cc728..6774955 100644
>>>>> --- a/drivers/gpu/drm/scheduler/sched_main.c
>>>>> +++ b/drivers/gpu/drm/scheduler/sched_main.c
>>>>> @@ -635,9 +635,13 @@ drm_sched_get_cleanup_job(struct 
>>>>> drm_gpu_scheduler *sched)
>>>>>       struct drm_sched_job *job;
>>>>>       unsigned long flags;
>>>>>   -    /* Don't destroy jobs while the timeout worker is running */
>>>>> -    if (sched->timeout != MAX_SCHEDULE_TIMEOUT &&
>>>>> -        !cancel_delayed_work(&sched->work_tdr))
>>>>> +    /*
>>>>> +    * Don't destroy jobs while the timeout worker is running OR 
>>>>> thread
>>>>> +    * is being parked and hence assumed to not touch 
>>>>> ring_mirror_list
>>>>> +    */
>>>>> +    if ((sched->timeout != MAX_SCHEDULE_TIMEOUT &&
>>>>> +         !cancel_delayed_work(&sched->work_tdr)) ||
>>>>> +         __kthread_should_park(sched->thread))
>>>>>           return NULL;
>>>>>         spin_lock_irqsave(&sched->job_list_lock, flags);
>>>>>
>>>> _______________________________________________
>>>> amd-gfx mailing list
>>>> amd-gfx@lists.freedesktop.org
>>>> https://lists.freedesktop.org/mailman/listinfo/amd-gfx
>>>>
>>>

Patch

diff --git a/drivers/gpu/drm/scheduler/sched_main.c b/drivers/gpu/drm/scheduler/sched_main.c
index d4cc728..6774955 100644
--- a/drivers/gpu/drm/scheduler/sched_main.c
+++ b/drivers/gpu/drm/scheduler/sched_main.c
@@ -635,9 +635,13 @@  drm_sched_get_cleanup_job(struct drm_gpu_scheduler *sched)
 	struct drm_sched_job *job;
 	unsigned long flags;
 
-	/* Don't destroy jobs while the timeout worker is running */
-	if (sched->timeout != MAX_SCHEDULE_TIMEOUT &&
-	    !cancel_delayed_work(&sched->work_tdr))
+	/*
+	* Don't destroy jobs while the timeout worker is running  OR thread
+	* is being parked and hence assumed to not touch ring_mirror_list
+	*/
+	if ((sched->timeout != MAX_SCHEDULE_TIMEOUT &&
+	     !cancel_delayed_work(&sched->work_tdr)) ||
+	     __kthread_should_park(sched->thread))
 		return NULL;
 
 	spin_lock_irqsave(&sched->job_list_lock, flags);