
[v3,09/13] drm/sched: Submit job before starting TDR

Message ID 20230912021615.2086698-10-matthew.brost@intel.com (mailing list archive)
State New, archived
Series DRM scheduler changes for Xe

Commit Message

Matthew Brost Sept. 12, 2023, 2:16 a.m. UTC
If the TDR is set to a value, it can fire before a job is submitted in
drm_sched_main. The job should always be submitted before the TDR
fires; fix this ordering.

v2:
  - Add to pending list before run_job, start TDR after (Luben, Boris)

Signed-off-by: Matthew Brost <matthew.brost@intel.com>
---
 drivers/gpu/drm/scheduler/sched_main.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

Comments

Luben Tuikov Sept. 14, 2023, 2:56 a.m. UTC | #1
On 2023-09-11 22:16, Matthew Brost wrote:
> If the TDR is set to a value, it can fire before a job is submitted in
> drm_sched_main. The job should always be submitted before the TDR
> fires; fix this ordering.
> 
> v2:
>   - Add to pending list before run_job, start TDR after (Luben, Boris)
> 
> Signed-off-by: Matthew Brost <matthew.brost@intel.com>
> ---
>  drivers/gpu/drm/scheduler/sched_main.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/drivers/gpu/drm/scheduler/sched_main.c b/drivers/gpu/drm/scheduler/sched_main.c
> index c627d3e6494a..9dbfab7be2c6 100644
> --- a/drivers/gpu/drm/scheduler/sched_main.c
> +++ b/drivers/gpu/drm/scheduler/sched_main.c
> @@ -498,7 +498,6 @@ static void drm_sched_job_begin(struct drm_sched_job *s_job)
>  
>  	spin_lock(&sched->job_list_lock);
>  	list_add_tail(&s_job->list, &sched->pending_list);
> -	drm_sched_start_timeout(sched);
>  	spin_unlock(&sched->job_list_lock);
>  }
>  
> @@ -1234,6 +1233,7 @@ static void drm_sched_run_job_work(struct work_struct *w)
>  		fence = sched->ops->run_job(sched_job);
>  		complete_all(&entity->entity_idle);
>  		drm_sched_fence_scheduled(s_fence, fence);
> +		drm_sched_start_timeout_unlocked(sched);
>  
>  		if (!IS_ERR_OR_NULL(fence)) {
>  			/* Drop for original kref_init of the fence */

So, sched->ops->run_job() is a "job inflection point" from the point of view of
the DRM scheduler. After that call, DRM has relinquished control of the job to the
firmware/hardware.

Putting the job in the pending list, before submitting it down to the firmware/hardware,
goes along with starting a timeout timer for the job. The timeout always includes
time for the firmware/hardware to get it prepped, as well as time for the actual
execution of the job (task). Thus, we want to do this:
	1. Put the job in the pending list. "Pending list" means "pends in hardware".
	2. Start a timeout timer for the job.
	3. Start executing the job/task. This usually involves giving it to firmware/hardware,
	   i.e. ownership of the job/task changes to another domain. In our case this is accomplished
	   by calling sched->ops->run_job().
Perhaps move drm_sched_start_timeout() closer to sched->ops->run_job() from above and/or increase
the timeout value?
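
For reference, the two hunks above reduce to the following ordering change. This
is only a condensed, annotated sketch of the lines shown in the diff, with all
surrounding code elided; the comments are editorial annotations, not part of the
kernel source.

/* Pre-patch: drm_sched_job_begin() arms the TDR at queue time, i.e.
 * steps 1 and 2 of the list above happen before run_job() has run.
 */
spin_lock(&sched->job_list_lock);
list_add_tail(&s_job->list, &sched->pending_list);
drm_sched_start_timeout(sched);			/* TDR armed here */
spin_unlock(&sched->job_list_lock);

/* Post-patch: drm_sched_run_job_work() arms the TDR only after the job
 * has been handed to the firmware/hardware.
 */
fence = sched->ops->run_job(sched_job);
complete_all(&entity->entity_idle);
drm_sched_fence_scheduled(s_fence, fence);
drm_sched_start_timeout_unlocked(sched);	/* TDR armed here */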
Matthew Brost Sept. 14, 2023, 5:48 p.m. UTC | #2
On Wed, Sep 13, 2023 at 10:56:10PM -0400, Luben Tuikov wrote:
> On 2023-09-11 22:16, Matthew Brost wrote:
> > If the TDR is set to a value, it can fire before a job is submitted in
> > drm_sched_main. The job should always be submitted before the TDR
> > fires; fix this ordering.
> > 
> > v2:
> >   - Add to pending list before run_job, start TDR after (Luben, Boris)
> > 
> > Signed-off-by: Matthew Brost <matthew.brost@intel.com>
> > ---
> >  drivers/gpu/drm/scheduler/sched_main.c | 2 +-
> >  1 file changed, 1 insertion(+), 1 deletion(-)
> > 
> > diff --git a/drivers/gpu/drm/scheduler/sched_main.c b/drivers/gpu/drm/scheduler/sched_main.c
> > index c627d3e6494a..9dbfab7be2c6 100644
> > --- a/drivers/gpu/drm/scheduler/sched_main.c
> > +++ b/drivers/gpu/drm/scheduler/sched_main.c
> > @@ -498,7 +498,6 @@ static void drm_sched_job_begin(struct drm_sched_job *s_job)
> >  
> >  	spin_lock(&sched->job_list_lock);
> >  	list_add_tail(&s_job->list, &sched->pending_list);
> > -	drm_sched_start_timeout(sched);
> >  	spin_unlock(&sched->job_list_lock);
> >  }
> >  
> > @@ -1234,6 +1233,7 @@ static void drm_sched_run_job_work(struct work_struct *w)
> >  		fence = sched->ops->run_job(sched_job);
> >  		complete_all(&entity->entity_idle);
> >  		drm_sched_fence_scheduled(s_fence, fence);
> > +		drm_sched_start_timeout_unlocked(sched);
> >  
> >  		if (!IS_ERR_OR_NULL(fence)) {
> >  			/* Drop for original kref_init of the fence */
> 
> So, sched->ops->run_job(), is a "job inflection point" from the point of view of
> the DRM scheduler. After that call, DRM has relinquished control of the job to the
> firmware/hardware.
> 
> Putting the job in the pending list, before submitting it down to the firmware/hardware,
> goes along with starting a timeout timer for the job. The timeout always includes
> time for the firmware/hardware to get it prepped, as well as time for the actual
> execution of the job (task). Thus, we want to do this:
> 	1. Put the job in pending list. "Pending list" means "pends in hardware".
> 	2. Start a timeout timer for the job.
> 	3. Start executing the job/task. This usually involves giving it to firmware/hardware,
> 	   i.e. ownership of the job/task changes to another domain. In our case this is accomplished
> 	   by calling sched->ops->run_job().
> Perhaps move drm_sched_start_timeout() closer to sched->ops->run_job() from above and/or increase
> the timeout value?

I disagree. It is a clear race: if the timeout starts before run_job(),
the TDR can fire before run_job() is called. The entire point of this
patch is to seal this race by starting the TDR after run_job() is
called.

Matt

> -- 
> Regards,
> Luben
>
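
To make the race Matthew describes concrete, here is a minimal user-space sketch
in plain C with POSIX threads. Every name and timing below is invented for
illustration and has no relation to the real scheduler code: a watchdog standing
in for the TDR is armed at queue time, while the worker that performs run_job()
happens to be scheduled later than the timeout, so the watchdog fires for a job
that was never handed to the hardware.

/*
 * Minimal user-space illustration of the race described above.  Purely a
 * sketch: the names, locking and timings are invented and do not
 * correspond to the DRM scheduler.  Compile with: cc race.c -lpthread
 */
#include <pthread.h>
#include <stdbool.h>
#include <stdio.h>
#include <unistd.h>

static bool job_ran;	/* set once the simulated run_job() has happened */
static pthread_mutex_t lock = PTHREAD_MUTEX_INITIALIZER;

/* Stand-in for the TDR: armed at queue time, fires after timeout_ms. */
static void *watchdog(void *arg)
{
	unsigned int timeout_ms = *(unsigned int *)arg;

	usleep(timeout_ms * 1000);
	pthread_mutex_lock(&lock);
	if (!job_ran)
		printf("TDR fired before run_job() was ever called\n");
	pthread_mutex_unlock(&lock);
	return NULL;
}

/* Stand-in for drm_sched_run_job_work(): the worker runs late. */
static void *run_job_work(void *arg)
{
	(void)arg;
	usleep(200 * 1000);		/* worker scheduled later than the timeout */
	pthread_mutex_lock(&lock);
	job_ran = true;			/* job finally handed to the "hardware" */
	pthread_mutex_unlock(&lock);
	return NULL;
}

int main(void)
{
	unsigned int timeout_ms = 50;	/* a short TDR, as in the commit message */
	pthread_t tdr, worker;

	/* Pre-patch ordering: the timer is armed at queue time... */
	pthread_create(&tdr, NULL, watchdog, &timeout_ms);
	/* ...while run_job() only happens whenever the worker gets to run. */
	pthread_create(&worker, NULL, run_job_work, NULL);

	pthread_join(tdr, NULL);
	pthread_join(worker, NULL);
	return 0;
}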
Luben Tuikov Sept. 21, 2023, 3:35 a.m. UTC | #3
On 2023-09-14 13:48, Matthew Brost wrote:
> On Wed, Sep 13, 2023 at 10:56:10PM -0400, Luben Tuikov wrote:
>> On 2023-09-11 22:16, Matthew Brost wrote:
>>> If the TDR is set to a value, it can fire before a job is submitted in
>>> drm_sched_main. The job should always be submitted before the TDR
>>> fires; fix this ordering.
>>>
>>> v2:
>>>   - Add to pending list before run_job, start TDR after (Luben, Boris)
>>>
>>> Signed-off-by: Matthew Brost <matthew.brost@intel.com>
>>> ---
>>>  drivers/gpu/drm/scheduler/sched_main.c | 2 +-
>>>  1 file changed, 1 insertion(+), 1 deletion(-)
>>>
>>> diff --git a/drivers/gpu/drm/scheduler/sched_main.c b/drivers/gpu/drm/scheduler/sched_main.c
>>> index c627d3e6494a..9dbfab7be2c6 100644
>>> --- a/drivers/gpu/drm/scheduler/sched_main.c
>>> +++ b/drivers/gpu/drm/scheduler/sched_main.c
>>> @@ -498,7 +498,6 @@ static void drm_sched_job_begin(struct drm_sched_job *s_job)
>>>  
>>>  	spin_lock(&sched->job_list_lock);
>>>  	list_add_tail(&s_job->list, &sched->pending_list);
>>> -	drm_sched_start_timeout(sched);
>>>  	spin_unlock(&sched->job_list_lock);
>>>  }
>>>  
>>> @@ -1234,6 +1233,7 @@ static void drm_sched_run_job_work(struct work_struct *w)
>>>  		fence = sched->ops->run_job(sched_job);
>>>  		complete_all(&entity->entity_idle);
>>>  		drm_sched_fence_scheduled(s_fence, fence);
>>> +		drm_sched_start_timeout_unlocked(sched);
>>>  
>>>  		if (!IS_ERR_OR_NULL(fence)) {
>>>  			/* Drop for original kref_init of the fence */
>>
>> So, sched->ops->run_job(), is a "job inflection point" from the point of view of
>> the DRM scheduler. After that call, DRM has relinquished control of the job to the
>> firmware/hardware.
>>
>> Putting the job in the pending list, before submitting it down to the firmware/hardware,
>> goes along with starting a timeout timer for the job. The timeout always includes
>> time for the firmware/hardware to get it prepped, as well as time for the actual
>> execution of the job (task). Thus, we want to do this:
>> 	1. Put the job in pending list. "Pending list" means "pends in hardware".
>> 	2. Start a timeout timer for the job.
>> 	3. Start executing the job/task. This usually involves giving it to firmware/hardware,
>> 	   i.e. ownership of the job/task changes to another domain. In our case this is accomplished
>> 	   by calling sched->ops->run_job().
>> Perhaps move drm_sched_start_timeout() closer to sched->ops->run_job() from above and/or increase
>> the timeout value?
> 
> I disagree. It is a clear race: if the timeout starts before run_job(),
> the TDR can fire before run_job() is called. The entire point of this

Then that would mean that 1) the timeout time is too short, and/or 2) the firmware/hardware
took a really long time to complete the job (from the point of view of the scheduler TDR).

> patch is to seal this race by starting the TDR after run_job() is
> called.

Once you call run_job() you're no longer in control of the job: it can be
returned or cancelled for reasons outside the scheduler's control. If you start
the timeout _after_ submitting the job to the hardware, you may be racing with
whatever the hardware decides to do with the job, as described in the previous
sentence.

The timeout timer should start before we give away the job to the hardware.
This is not that dissimilar to sending a network packet out the interface.

If you need a longer timeout time, then we can do that, but starting the timeout
after giving away the job to the hardware is a no-go.
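
For the other side of the argument, here is a similarly hypothetical user-space
sketch of the situation Luben describes: once run_job() has handed the job to
the hardware, the hardware can finish (or cancel) it at any moment, so arming
the timer only afterwards means the arming can trail the job's completion.
Whether and how this matters for the real scheduler depends on how
drm_sched_start_timeout_unlocked() interacts with the completion path, which
this sketch does not model.

/*
 * Illustration only: post-submission arming racing with a very fast
 * completion.  Names are invented; this does not model the real driver.
 * Compile with: cc late_arm.c -lpthread
 */
#include <pthread.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stdio.h>
#include <unistd.h>

static atomic_bool watchdog_armed;

/* Stand-in for the hardware: it completes the job almost immediately. */
static void *hardware(void *arg)
{
	(void)arg;
	if (!atomic_load(&watchdog_armed))
		printf("job completed before the timeout was armed\n");
	return NULL;
}

int main(void)
{
	pthread_t hw;

	/* Post-patch ordering: submit first (run_job())... */
	pthread_create(&hw, NULL, hardware, NULL);
	usleep(10 * 1000);		/* scheduler thread preempted here */
	/* ...then arm the timeout, possibly for an already-finished job. */
	atomic_store(&watchdog_armed, true);

	pthread_join(hw, NULL);
	return 0;
}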

Patch

diff --git a/drivers/gpu/drm/scheduler/sched_main.c b/drivers/gpu/drm/scheduler/sched_main.c
index c627d3e6494a..9dbfab7be2c6 100644
--- a/drivers/gpu/drm/scheduler/sched_main.c
+++ b/drivers/gpu/drm/scheduler/sched_main.c
@@ -498,7 +498,6 @@  static void drm_sched_job_begin(struct drm_sched_job *s_job)
 
 	spin_lock(&sched->job_list_lock);
 	list_add_tail(&s_job->list, &sched->pending_list);
-	drm_sched_start_timeout(sched);
 	spin_unlock(&sched->job_list_lock);
 }
 
@@ -1234,6 +1233,7 @@  static void drm_sched_run_job_work(struct work_struct *w)
 		fence = sched->ops->run_job(sched_job);
 		complete_all(&entity->entity_idle);
 		drm_sched_fence_scheduled(s_fence, fence);
+		drm_sched_start_timeout_unlocked(sched);
 
 		if (!IS_ERR_OR_NULL(fence)) {
 			/* Drop for original kref_init of the fence */