diff mbox series

[v2] drm/scheduler: Fix job cleanup without timeout handler

Message ID 20190520231649.24595-1-nunes.erico@gmail.com (mailing list archive)
State New, archived
Headers show
Series [v2] drm/scheduler: Fix job cleanup without timeout handler | expand

Commit Message

Erico Nunes May 20, 2019, 11:16 p.m. UTC
After "5918045c4ed4 drm/scheduler: rework job destruction", jobs are
only deleted when the timeout handler is able to be cancelled
successfully.

In case no timeout handler is running (timeout == MAX_SCHEDULE_TIMEOUT),
job cleanup would be skipped which may result in memory leaks.

Add the handling for the (timeout == MAX_SCHEDULE_TIMEOUT) case in
drm_sched_cleanup_jobs.

Signed-off-by: Erico Nunes <nunes.erico@gmail.com>
Cc: Christian König <christian.koenig@amd.com>
---
 drivers/gpu/drm/scheduler/sched_main.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

Comments

Christian König May 21, 2019, 6:47 a.m. UTC | #1
Am 21.05.19 um 01:16 schrieb Erico Nunes:
> [CAUTION: External Email]
>
> After "5918045c4ed4 drm/scheduler: rework job destruction", jobs are
> only deleted when the timeout handler is able to be cancelled
> successfully.
>
> In case no timeout handler is running (timeout == MAX_SCHEDULE_TIMEOUT),
> job cleanup would be skipped which may result in memory leaks.
>
> Add the handling for the (timeout == MAX_SCHEDULE_TIMEOUT) case in
> drm_sched_cleanup_jobs.
>
> Signed-off-by: Erico Nunes <nunes.erico@gmail.com>
> Cc: Christian König <christian.koenig@amd.com>

Reviewed-by: Christian König <christian.koenig@amd.com>

Going to pick that up later today into our internal branch.

Thanks for the help,
Christian.

> ---
>   drivers/gpu/drm/scheduler/sched_main.c | 3 ++-
>   1 file changed, 2 insertions(+), 1 deletion(-)
>
> diff --git a/drivers/gpu/drm/scheduler/sched_main.c b/drivers/gpu/drm/scheduler/sched_main.c
> index f8f0e1c19002..10d1d37e644a 100644
> --- a/drivers/gpu/drm/scheduler/sched_main.c
> +++ b/drivers/gpu/drm/scheduler/sched_main.c
> @@ -630,7 +630,8 @@ static void drm_sched_cleanup_jobs(struct drm_gpu_scheduler *sched)
>          unsigned long flags;
>
>          /* Don't destroy jobs while the timeout worker is running */
> -       if (!cancel_delayed_work(&sched->work_tdr))
> +       if (sched->timeout != MAX_SCHEDULE_TIMEOUT &&
> +           !cancel_delayed_work(&sched->work_tdr))
>                  return;
>
>
> --
> 2.20.1
>
Erico Nunes May 21, 2019, 12:16 p.m. UTC | #2
On Tue, May 21, 2019 at 8:47 AM Koenig, Christian
<Christian.Koenig@amd.com> wrote:
>
> Am 21.05.19 um 01:16 schrieb Erico Nunes:
> > [CAUTION: External Email]
> >
> > After "5918045c4ed4 drm/scheduler: rework job destruction", jobs are
> > only deleted when the timeout handler is able to be cancelled
> > successfully.
> >
> > In case no timeout handler is running (timeout == MAX_SCHEDULE_TIMEOUT),
> > job cleanup would be skipped which may result in memory leaks.
> >
> > Add the handling for the (timeout == MAX_SCHEDULE_TIMEOUT) case in
> > drm_sched_cleanup_jobs.
> >
> > Signed-off-by: Erico Nunes <nunes.erico@gmail.com>
> > Cc: Christian König <christian.koenig@amd.com>
>
> Reviewed-by: Christian König <christian.koenig@amd.com>
>
> Going to pick that up later today into our internal branch.

Thanks. I also posted one to set lima to use a real default timeout.

Is it possible for us to still get at least one of these into a tree
that goes to a linux 5.2-rc?
This issue may make lima unusable as the system runs out of memory
quickly if many opengl programs are launched.

Erico
Christian König May 21, 2019, 12:52 p.m. UTC | #3
Am 21.05.19 um 14:16 schrieb Erico Nunes:
> [CAUTION: External Email]
>
> On Tue, May 21, 2019 at 8:47 AM Koenig, Christian
> <Christian.Koenig@amd.com> wrote:
>> Am 21.05.19 um 01:16 schrieb Erico Nunes:
>>> [CAUTION: External Email]
>>>
>>> After "5918045c4ed4 drm/scheduler: rework job destruction", jobs are
>>> only deleted when the timeout handler is able to be cancelled
>>> successfully.
>>>
>>> In case no timeout handler is running (timeout == MAX_SCHEDULE_TIMEOUT),
>>> job cleanup would be skipped which may result in memory leaks.
>>>
>>> Add the handling for the (timeout == MAX_SCHEDULE_TIMEOUT) case in
>>> drm_sched_cleanup_jobs.
>>>
>>> Signed-off-by: Erico Nunes <nunes.erico@gmail.com>
>>> Cc: Christian König <christian.koenig@amd.com>
>> Reviewed-by: Christian König <christian.koenig@amd.com>
>>
>> Going to pick that up later today into our internal branch.
> Thanks. I also posted one to set lima to use a real default timeout.
>
> Is it possible for us to still get at least one of these into a tree
> that goes to a linux 5.2-rc?
> This issue may make lima unusable as the system runs out of memory
> quickly if many opengl programs are launched.

Alex should pick it up for his next -fixes pull this week.

Christian.

>
> Erico
Qiang Yu May 21, 2019, 1:03 p.m. UTC | #4
On Tue, May 21, 2019 at 8:17 PM Erico Nunes <nunes.erico@gmail.com> wrote:
>
> On Tue, May 21, 2019 at 8:47 AM Koenig, Christian
> <Christian.Koenig@amd.com> wrote:
> >
> > Am 21.05.19 um 01:16 schrieb Erico Nunes:
> > > [CAUTION: External Email]
> > >
> > > After "5918045c4ed4 drm/scheduler: rework job destruction", jobs are
> > > only deleted when the timeout handler is able to be cancelled
> > > successfully.
> > >
> > > In case no timeout handler is running (timeout == MAX_SCHEDULE_TIMEOUT),
> > > job cleanup would be skipped which may result in memory leaks.
> > >
> > > Add the handling for the (timeout == MAX_SCHEDULE_TIMEOUT) case in
> > > drm_sched_cleanup_jobs.
> > >
> > > Signed-off-by: Erico Nunes <nunes.erico@gmail.com>
> > > Cc: Christian König <christian.koenig@amd.com>
> >
> > Reviewed-by: Christian König <christian.koenig@amd.com>
> >
> > Going to pick that up later today into our internal branch.
>
> Thanks. I also posted one to set lima to use a real default timeout.
>
> Is it possible for us to still get at least one of these into a tree
> that goes to a linux 5.2-rc?
> This issue may make lima unusable as the system runs out of memory
> quickly if many opengl programs are launched.
>
The "drm/scheduler: rework job destruction" commit is not in 5.2-rc. So no
need to pick any fix to that place.

It would be better that run some tests on 5.2-rc for other possible
fixes indeed.

Regards,
Qiang
Alex Deucher May 21, 2019, 2:13 p.m. UTC | #5
On Tue, May 21, 2019 at 2:48 AM Koenig, Christian
<Christian.Koenig@amd.com> wrote:
>
> Am 21.05.19 um 01:16 schrieb Erico Nunes:
> > [CAUTION: External Email]
> >
> > After "5918045c4ed4 drm/scheduler: rework job destruction", jobs are
> > only deleted when the timeout handler is able to be cancelled
> > successfully.
> >
> > In case no timeout handler is running (timeout == MAX_SCHEDULE_TIMEOUT),
> > job cleanup would be skipped which may result in memory leaks.
> >
> > Add the handling for the (timeout == MAX_SCHEDULE_TIMEOUT) case in
> > drm_sched_cleanup_jobs.
> >
> > Signed-off-by: Erico Nunes <nunes.erico@gmail.com>
> > Cc: Christian König <christian.koenig@amd.com>
>
> Reviewed-by: Christian König <christian.koenig@amd.com>
>
> Going to pick that up later today into our internal branch.

Please apply it to drm-misc-next.  that is where the other gpu
scheduler changes are.  They are not in 5.2.

Alex

>
> Thanks for the help,
> Christian.
>
> > ---
> >   drivers/gpu/drm/scheduler/sched_main.c | 3 ++-
> >   1 file changed, 2 insertions(+), 1 deletion(-)
> >
> > diff --git a/drivers/gpu/drm/scheduler/sched_main.c b/drivers/gpu/drm/scheduler/sched_main.c
> > index f8f0e1c19002..10d1d37e644a 100644
> > --- a/drivers/gpu/drm/scheduler/sched_main.c
> > +++ b/drivers/gpu/drm/scheduler/sched_main.c
> > @@ -630,7 +630,8 @@ static void drm_sched_cleanup_jobs(struct drm_gpu_scheduler *sched)
> >          unsigned long flags;
> >
> >          /* Don't destroy jobs while the timeout worker is running */
> > -       if (!cancel_delayed_work(&sched->work_tdr))
> > +       if (sched->timeout != MAX_SCHEDULE_TIMEOUT &&
> > +           !cancel_delayed_work(&sched->work_tdr))
> >                  return;
> >
> >
> > --
> > 2.20.1
> >
>
> _______________________________________________
> dri-devel mailing list
> dri-devel@lists.freedesktop.org
> https://lists.freedesktop.org/mailman/listinfo/dri-devel
Christian König May 21, 2019, 2:15 p.m. UTC | #6
Am 21.05.19 um 16:13 schrieb Alex Deucher:
> [CAUTION: External Email]
>
> On Tue, May 21, 2019 at 2:48 AM Koenig, Christian
> <Christian.Koenig@amd.com> wrote:
>> Am 21.05.19 um 01:16 schrieb Erico Nunes:
>>> [CAUTION: External Email]
>>>
>>> After "5918045c4ed4 drm/scheduler: rework job destruction", jobs are
>>> only deleted when the timeout handler is able to be cancelled
>>> successfully.
>>>
>>> In case no timeout handler is running (timeout == MAX_SCHEDULE_TIMEOUT),
>>> job cleanup would be skipped which may result in memory leaks.
>>>
>>> Add the handling for the (timeout == MAX_SCHEDULE_TIMEOUT) case in
>>> drm_sched_cleanup_jobs.
>>>
>>> Signed-off-by: Erico Nunes <nunes.erico@gmail.com>
>>> Cc: Christian König <christian.koenig@amd.com>
>> Reviewed-by: Christian König <christian.koenig@amd.com>
>>
>> Going to pick that up later today into our internal branch.
> Please apply it to drm-misc-next.  that is where the other gpu
> scheduler changes are.  They are not in 5.2.

Ah! Now that makes sense again, thanks for the reminder.

Christian.

>
> Alex
>
>> Thanks for the help,
>> Christian.
>>
>>> ---
>>>    drivers/gpu/drm/scheduler/sched_main.c | 3 ++-
>>>    1 file changed, 2 insertions(+), 1 deletion(-)
>>>
>>> diff --git a/drivers/gpu/drm/scheduler/sched_main.c b/drivers/gpu/drm/scheduler/sched_main.c
>>> index f8f0e1c19002..10d1d37e644a 100644
>>> --- a/drivers/gpu/drm/scheduler/sched_main.c
>>> +++ b/drivers/gpu/drm/scheduler/sched_main.c
>>> @@ -630,7 +630,8 @@ static void drm_sched_cleanup_jobs(struct drm_gpu_scheduler *sched)
>>>           unsigned long flags;
>>>
>>>           /* Don't destroy jobs while the timeout worker is running */
>>> -       if (!cancel_delayed_work(&sched->work_tdr))
>>> +       if (sched->timeout != MAX_SCHEDULE_TIMEOUT &&
>>> +           !cancel_delayed_work(&sched->work_tdr))
>>>                   return;
>>>
>>>
>>> --
>>> 2.20.1
>>>
>> _______________________________________________
>> dri-devel mailing list
>> dri-devel@lists.freedesktop.org
>> https://lists.freedesktop.org/mailman/listinfo/dri-devel
diff mbox series

Patch

diff --git a/drivers/gpu/drm/scheduler/sched_main.c b/drivers/gpu/drm/scheduler/sched_main.c
index f8f0e1c19002..10d1d37e644a 100644
--- a/drivers/gpu/drm/scheduler/sched_main.c
+++ b/drivers/gpu/drm/scheduler/sched_main.c
@@ -630,7 +630,8 @@  static void drm_sched_cleanup_jobs(struct drm_gpu_scheduler *sched)
 	unsigned long flags;
 
 	/* Don't destroy jobs while the timeout worker is running */
-	if (!cancel_delayed_work(&sched->work_tdr))
+	if (sched->timeout != MAX_SCHEDULE_TIMEOUT &&
+	    !cancel_delayed_work(&sched->work_tdr))
 		return;