
drm/sched: Eliminate drm_sched_run_job_queue_if_ready()

Message ID 20231102224653.5785-2-ltuikov89@gmail.com (mailing list archive)
State New, archived
Series drm/sched: Eliminate drm_sched_run_job_queue_if_ready()

Commit Message

Luben Tuikov Nov. 2, 2023, 10:46 p.m. UTC
Eliminate drm_sched_run_job_queue_if_ready() and instead just call
drm_sched_run_job_queue() in drm_sched_free_job_work(). The problem is that
the former function uses drm_sched_select_entity() to determine whether the
scheduler has an entity ready in one of its run-queues, and in the case of
Round-Robin (RR) scheduling, drm_sched_rq_select_entity_rr() does just that:
it selects the _next_ ready entity, sets up the run-queue and completion, and
returns that entity. The FIFO scheduling algorithm is unaffected.

Now, since drm_sched_run_job_work() also calls drm_sched_select_entity(), in
the case of RR scheduling this would result in calling select_entity() twice,
which may skip a ready entity if more than one entity is ready. This commit
fixes that by eliminating the if_ready() variant.

Signed-off-by: Luben Tuikov <ltuikov89@gmail.com>
---
 drivers/gpu/drm/scheduler/sched_main.c | 14 ++------------
 1 file changed, 2 insertions(+), 12 deletions(-)


base-commit: 6fd9487147c4f18ad77eea00bd8c9189eec74a3e
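
To make the RR side effect described above concrete, here is a minimal,
self-contained C model of it (illustrative only: the types and names below
are simplifications for demonstration, not the kernel's own structures):

#include <stdio.h>

#define NUM_ENTITIES 3

struct rr_rq {
	const char *entities[NUM_ENTITIES];	/* all assumed ready */
	int cursor;				/* round-robin position */
};

/* Models the side effect of drm_sched_rq_select_entity_rr(): every
 * call returns the next ready entity AND advances the cursor. */
static const char *select_entity_rr(struct rr_rq *rq)
{
	const char *e = rq->entities[rq->cursor];

	rq->cursor = (rq->cursor + 1) % NUM_ENTITIES;
	return e;
}

int main(void)
{
	struct rr_rq rq = { { "A", "B", "C" }, 0 };

	/* What the removed if_ready() variant effectively did: */
	if (select_entity_rr(&rq))		/* "check" consumes entity A */
		printf("run-job work queued\n");

	/* drm_sched_run_job_work() then selects again: */
	printf("running jobs from entity %s\n",	/* prints B; A was skipped */
	       select_entity_rr(&rq));

	return 0;
}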

Comments

Tvrtko Ursulin Nov. 3, 2023, 10:39 a.m. UTC | #1
On 02/11/2023 22:46, Luben Tuikov wrote:
> Eliminate drm_sched_run_job_queue_if_ready() and instead just call
> drm_sched_run_job_queue() in drm_sched_free_job_work(). The problem is that
> the former function uses drm_sched_select_entity() to determine whether the
> scheduler has an entity ready in one of its run-queues, and in the case of
> Round-Robin (RR) scheduling, drm_sched_rq_select_entity_rr() does just that:
> it selects the _next_ ready entity, sets up the run-queue and completion, and
> returns that entity. The FIFO scheduling algorithm is unaffected.
> 
> Now, since drm_sched_run_job_work() also calls drm_sched_select_entity(), in
> the case of RR scheduling this would result in calling select_entity() twice,
> which may skip a ready entity if more than one entity is ready. This commit
> fixes that by eliminating the if_ready() variant.

Fixes: is missing since the regression already landed.

> 
> Signed-off-by: Luben Tuikov <ltuikov89@gmail.com>
> ---
>   drivers/gpu/drm/scheduler/sched_main.c | 14 ++------------
>   1 file changed, 2 insertions(+), 12 deletions(-)
> 
> diff --git a/drivers/gpu/drm/scheduler/sched_main.c b/drivers/gpu/drm/scheduler/sched_main.c
> index 98b2ad54fc7071..05816e7cae8c8b 100644
> --- a/drivers/gpu/drm/scheduler/sched_main.c
> +++ b/drivers/gpu/drm/scheduler/sched_main.c
> @@ -1040,16 +1040,6 @@ drm_sched_pick_best(struct drm_gpu_scheduler **sched_list,
>   }
>   EXPORT_SYMBOL(drm_sched_pick_best);
>   
> -/**
> - * drm_sched_run_job_queue_if_ready - enqueue run-job work if ready
> - * @sched: scheduler instance
> - */
> -static void drm_sched_run_job_queue_if_ready(struct drm_gpu_scheduler *sched)
> -{
> -	if (drm_sched_select_entity(sched))
> -		drm_sched_run_job_queue(sched);
> -}
> -
>   /**
>    * drm_sched_free_job_work - worker to call free_job
>    *
> @@ -1069,7 +1059,7 @@ static void drm_sched_free_job_work(struct work_struct *w)
>   		sched->ops->free_job(cleanup_job);
>   
>   		drm_sched_free_job_queue_if_done(sched);
> -		drm_sched_run_job_queue_if_ready(sched);
> +		drm_sched_run_job_queue(sched);

It works, but it is a bit wasteful, causing needless CPU wake-ups with a
potentially empty queue, both here and in drm_sched_run_job_work() below.

What would be the problem with having a "peek"-type helper? It would be
easy to do in a single spinlock section instead of dropping and re-acquiring
the lock, as sketched below.
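
A minimal sketch of what such a peek-type helper could look like, assuming
for illustration a helper name and loop that are not existing scheduler API
(only drm_sched_entity_is_ready() and the run-queue fields are real):

static bool drm_sched_rq_has_ready_entity(struct drm_sched_rq *rq)
{
	struct drm_sched_entity *entity;
	bool ready = false;

	/* Single spinlock section: report readiness without advancing
	 * the RR cursor (rq->current_entity stays untouched). */
	spin_lock(&rq->lock);
	list_for_each_entry(entity, &rq->entities, list) {
		if (drm_sched_entity_is_ready(entity)) {
			ready = true;
			break;
		}
	}
	spin_unlock(&rq->lock);

	return ready;
}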

What is even the point of having the re-queue here, _inside_ the if
(cleanup_job) block? See
https://lists.freedesktop.org/archives/dri-devel/2023-November/429037.html.
Because of the lock drop and re-acquire, I don't see that it makes sense to
make the potential re-queue depend on the existence of the current finished
job.

Also, what is the point of re-queuing the run-job work from the free
worker?

(I suppose re-queuing the _free_ worker itself is needed in the current 
design, albeit inefficient.)

Regards,

Tvrtko

>   	}
>   }
>   
> @@ -1127,7 +1117,7 @@ static void drm_sched_run_job_work(struct work_struct *w)
>   	}
>   
>   	wake_up(&sched->job_scheduled);
> -	drm_sched_run_job_queue_if_ready(sched);
> +	drm_sched_run_job_queue(sched);
>   }
>   
>   /**
> 
> base-commit: 6fd9487147c4f18ad77eea00bd8c9189eec74a3e
Matthew Brost Nov. 3, 2023, 3:13 p.m. UTC | #2
On Thu, Nov 02, 2023 at 06:46:54PM -0400, Luben Tuikov wrote:
> Eliminate drm_sched_run_job_queue_if_ready() and instead just call
> drm_sched_run_job_queue() in drm_sched_free_job_work(). The problem is that
> the former function uses drm_sched_select_entity() to determine whether the
> scheduler has an entity ready in one of its run-queues, and in the case of
> Round-Robin (RR) scheduling, drm_sched_rq_select_entity_rr() does just that:
> it selects the _next_ ready entity, sets up the run-queue and completion, and
> returns that entity. The FIFO scheduling algorithm is unaffected.
> 
> Now, since drm_sched_run_job_work() also calls drm_sched_select_entity(), in
> the case of RR scheduling this would result in calling select_entity() twice,
> which may skip a ready entity if more than one entity is ready. This commit
> fixes that by eliminating the if_ready() variant.
> 

Ah, yes, I guess we both missed this. What about reviving the peek
argument [1]? This would avoid unnecessary re-queues.

Matt

[1] https://patchwork.freedesktop.org/patch/562222/?series=121744&rev=7
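
For context, a reconstructed sketch of the "peek" argument idea from [1]
(hypothetical: the modified signature and flag semantics are assumptions for
illustration, not the posted patch; see [1] for the actual code):

/* Teach the selector a mode that does not advance RR state, so a
 * readiness check stops consuming selections. */
static void drm_sched_run_job_queue_if_ready(struct drm_gpu_scheduler *sched)
{
	/* true == peek only: drm_sched_select_entity() would leave
	 * rq->current_entity untouched in this mode. */
	if (drm_sched_select_entity(sched, true))
		drm_sched_run_job_queue(sched);
}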

> Signed-off-by: Luben Tuikov <ltuikov89@gmail.com>
> ---
>  drivers/gpu/drm/scheduler/sched_main.c | 14 ++------------
>  1 file changed, 2 insertions(+), 12 deletions(-)
> 
> diff --git a/drivers/gpu/drm/scheduler/sched_main.c b/drivers/gpu/drm/scheduler/sched_main.c
> index 98b2ad54fc7071..05816e7cae8c8b 100644
> --- a/drivers/gpu/drm/scheduler/sched_main.c
> +++ b/drivers/gpu/drm/scheduler/sched_main.c
> @@ -1040,16 +1040,6 @@ drm_sched_pick_best(struct drm_gpu_scheduler **sched_list,
>  }
>  EXPORT_SYMBOL(drm_sched_pick_best);
>  
> -/**
> - * drm_sched_run_job_queue_if_ready - enqueue run-job work if ready
> - * @sched: scheduler instance
> - */
> -static void drm_sched_run_job_queue_if_ready(struct drm_gpu_scheduler *sched)
> -{
> -	if (drm_sched_select_entity(sched))
> -		drm_sched_run_job_queue(sched);
> -}
> -
>  /**
>   * drm_sched_free_job_work - worker to call free_job
>   *
> @@ -1069,7 +1059,7 @@ static void drm_sched_free_job_work(struct work_struct *w)
>  		sched->ops->free_job(cleanup_job);
>  
>  		drm_sched_free_job_queue_if_done(sched);
> -		drm_sched_run_job_queue_if_ready(sched);
> +		drm_sched_run_job_queue(sched);
>  	}
>  }
>  
> @@ -1127,7 +1117,7 @@ static void drm_sched_run_job_work(struct work_struct *w)
>  	}
>  
>  	wake_up(&sched->job_scheduled);
> -	drm_sched_run_job_queue_if_ready(sched);
> +	drm_sched_run_job_queue(sched);
>  }
>  
>  /**
> 
> base-commit: 6fd9487147c4f18ad77eea00bd8c9189eec74a3e
> -- 
> 2.42.1
>
Luben Tuikov Nov. 4, 2023, 12:24 a.m. UTC | #3
Hi Matt, :-)

On 2023-11-03 11:13, Matthew Brost wrote:
> On Thu, Nov 02, 2023 at 06:46:54PM -0400, Luben Tuikov wrote:
>> Eliminate drm_sched_run_job_queue_if_ready() and instead just call
>> drm_sched_run_job_queue() in drm_sched_free_job_work(). The problem is that
>> the former function uses drm_sched_select_entity() to determine whether the
>> scheduler has an entity ready in one of its run-queues, and in the case of
>> Round-Robin (RR) scheduling, drm_sched_rq_select_entity_rr() does just that:
>> it selects the _next_ ready entity, sets up the run-queue and completion, and
>> returns that entity. The FIFO scheduling algorithm is unaffected.
>>
>> Now, since drm_sched_run_job_work() also calls drm_sched_select_entity(), in
>> the case of RR scheduling this would result in calling select_entity() twice,
>> which may skip a ready entity if more than one entity is ready. This commit
>> fixes that by eliminating the if_ready() variant.
>>
> 
> Ah, yes, I guess we both missed this. What about reviving the peek
> argument [1]? This would avoid unnecessary re-queues.

So, I really am not too fond of "peek-then-get-and-do" (scheduling)
organizations, because they don't scale. As we've seen in our case, RR
selection has a side effect, as Tvrtko pointed out (thanks!), and in the
future this "peek first, then go again"-type of organization would only
prevent us from doing more interesting things.

Also, with the GPU scheduler's organization, mixing in the "peek" means we
just get to carry it through many a function, only for it to be used in a
leaf function and exported way back up (because we don't know the rq at that
level).

I'd much rather we just did "consume-until-empty", and if we have one last
empty check (or first), then that's not a breaker. (I mean, we have a
drm_sched_pick_best() which has time complexity O(n), and we execute it every time
we arm a job, so it's not that big of a deal.) Plus, it makes the code concise
and compact.
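
For reference, the "consume-until-empty" shape is what drm_sched_run_job_work()
already has with this patch applied; condensed below (the details of popping
the job and handing it to run_job() are elided):

static void drm_sched_run_job_work(struct work_struct *w)
{
	struct drm_gpu_scheduler *sched =
		container_of(w, struct drm_gpu_scheduler, work_run_job);
	struct drm_sched_entity *entity;

	entity = drm_sched_select_entity(sched);
	if (!entity)
		return;			/* the one "empty" pass */

	/* ... pop one job from @entity and hand it to run_job() ... */

	wake_up(&sched->job_scheduled);
	drm_sched_run_job_queue(sched);	/* re-queue until empty */
}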

Let me reconstitute the patch and I'll send it for your review.
Luben Tuikov Nov. 4, 2023, 12:25 a.m. UTC | #4
Hi Tvrtko,

On 2023-11-03 06:39, Tvrtko Ursulin wrote:
> 
> On 02/11/2023 22:46, Luben Tuikov wrote:
>> Eliminate drm_sched_run_job_queue_if_ready() and instead just call
>> drm_sched_run_job_queue() in drm_sched_free_job_work(). The problem is that
>> the former function uses drm_sched_select_entity() to determine whether the
>> scheduler has an entity ready in one of its run-queues, and in the case of
>> Round-Robin (RR) scheduling, drm_sched_rq_select_entity_rr() does just that:
>> it selects the _next_ ready entity, sets up the run-queue and completion, and
>> returns that entity. The FIFO scheduling algorithm is unaffected.
>>
>> Now, since drm_sched_run_job_work() also calls drm_sched_select_entity(), in
>> the case of RR scheduling this would result in calling select_entity() twice,
>> which may skip a ready entity if more than one entity is ready. This commit
>> fixes that by eliminating the if_ready() variant.
> 
> Fixes: is missing since the regression already landed.

Ah, yes, thank you for pointing that out. :-)
I'll add one.

> 
>>
>> Signed-off-by: Luben Tuikov <ltuikov89@gmail.com>
>> ---
>>   drivers/gpu/drm/scheduler/sched_main.c | 14 ++------------
>>   1 file changed, 2 insertions(+), 12 deletions(-)
>>
>> diff --git a/drivers/gpu/drm/scheduler/sched_main.c b/drivers/gpu/drm/scheduler/sched_main.c
>> index 98b2ad54fc7071..05816e7cae8c8b 100644
>> --- a/drivers/gpu/drm/scheduler/sched_main.c
>> +++ b/drivers/gpu/drm/scheduler/sched_main.c
>> @@ -1040,16 +1040,6 @@ drm_sched_pick_best(struct drm_gpu_scheduler **sched_list,
>>   }
>>   EXPORT_SYMBOL(drm_sched_pick_best);
>>   
>> -/**
>> - * drm_sched_run_job_queue_if_ready - enqueue run-job work if ready
>> - * @sched: scheduler instance
>> - */
>> -static void drm_sched_run_job_queue_if_ready(struct drm_gpu_scheduler *sched)
>> -{
>> -	if (drm_sched_select_entity(sched))
>> -		drm_sched_run_job_queue(sched);
>> -}
>> -
>>   /**
>>    * drm_sched_free_job_work - worker to call free_job
>>    *
>> @@ -1069,7 +1059,7 @@ static void drm_sched_free_job_work(struct work_struct *w)
>>   		sched->ops->free_job(cleanup_job);
>>   
>>   		drm_sched_free_job_queue_if_done(sched);
>> -		drm_sched_run_job_queue_if_ready(sched);
>> +		drm_sched_run_job_queue(sched);
> 
> It works, but it is a bit wasteful, causing needless CPU wake-ups with a

I'd not worry about "waking up the CPU" as the CPU scheduler would most likely
put the wq on the same CPU by instruction cache locality.

> potentially empty queue, both here and in drm_sched_run_job_work() below.

That's true, but if you were to look at the typical execution of this code,
you'd see a string of function entries while the incoming queue is non-empty,
followed by one empty entry, only for the worker to be taken off the CPU. So,
it really isn't a breaker.

So, there's a way to mitigate this in drm_sched_run_job_work(). I'll see that it
makes it in the next version of the patch.

Thanks!
Tvrtko Ursulin Nov. 6, 2023, 12:54 p.m. UTC | #5
On 04/11/2023 00:25, Luben Tuikov wrote:
> Hi Tvrtko,
> 
> On 2023-11-03 06:39, Tvrtko Ursulin wrote:
>>
>> On 02/11/2023 22:46, Luben Tuikov wrote:
>>> Eliminate drm_sched_run_job_queue_if_ready() and instead just call
>>> drm_sched_run_job_queue() in drm_sched_free_job_work(). The problem is that
>>> the former function uses drm_sched_select_entity() to determine whether the
>>> scheduler has an entity ready in one of its run-queues, and in the case of
>>> Round-Robin (RR) scheduling, drm_sched_rq_select_entity_rr() does just that:
>>> it selects the _next_ ready entity, sets up the run-queue and completion, and
>>> returns that entity. The FIFO scheduling algorithm is unaffected.
>>>
>>> Now, since drm_sched_run_job_work() also calls drm_sched_select_entity(), in
>>> the case of RR scheduling this would result in calling select_entity() twice,
>>> which may skip a ready entity if more than one entity is ready. This commit
>>> fixes that by eliminating the if_ready() variant.
>>
>> Fixes: is missing since the regression already landed.
> 
> Ah, yes, thank you for pointing that out. :-)
> I'll add one.
> 
>>
>>>
>>> Signed-off-by: Luben Tuikov <ltuikov89@gmail.com>
>>> ---
>>>    drivers/gpu/drm/scheduler/sched_main.c | 14 ++------------
>>>    1 file changed, 2 insertions(+), 12 deletions(-)
>>>
>>> diff --git a/drivers/gpu/drm/scheduler/sched_main.c b/drivers/gpu/drm/scheduler/sched_main.c
>>> index 98b2ad54fc7071..05816e7cae8c8b 100644
>>> --- a/drivers/gpu/drm/scheduler/sched_main.c
>>> +++ b/drivers/gpu/drm/scheduler/sched_main.c
>>> @@ -1040,16 +1040,6 @@ drm_sched_pick_best(struct drm_gpu_scheduler **sched_list,
>>>    }
>>>    EXPORT_SYMBOL(drm_sched_pick_best);
>>>    
>>> -/**
>>> - * drm_sched_run_job_queue_if_ready - enqueue run-job work if ready
>>> - * @sched: scheduler instance
>>> - */
>>> -static void drm_sched_run_job_queue_if_ready(struct drm_gpu_scheduler *sched)
>>> -{
>>> -	if (drm_sched_select_entity(sched))
>>> -		drm_sched_run_job_queue(sched);
>>> -}
>>> -
>>>    /**
>>>     * drm_sched_free_job_work - worker to call free_job
>>>     *
>>> @@ -1069,7 +1059,7 @@ static void drm_sched_free_job_work(struct work_struct *w)
>>>    		sched->ops->free_job(cleanup_job);
>>>    
>>>    		drm_sched_free_job_queue_if_done(sched);
>>> -		drm_sched_run_job_queue_if_ready(sched);
>>> +		drm_sched_run_job_queue(sched);
>>
>> It works, but it is a bit wasteful, causing needless CPU wake-ups with a
> 
> I'd not worry about "waking up the CPU" as the CPU scheduler would most likely
> put the wq on the same CPU by instruction cache locality.
> 
>> potentially empty queue, both here and in drm_sched_run_job_work() below.
> 
> That's true, but if you were to look at the typical execution of this code,
> you'd see a string of function entries while the incoming queue is non-empty,
> followed by one empty entry, only for the worker to be taken off the CPU. So,
> it really isn't a breaker.
> 
> So, there's a way to mitigate this in drm_sched_run_job_work(). I'll see that it
> makes it in the next version of the patch.

Okay, I will be keeping an eye on that.

Separately, I might send a patch to do the "re-queue if more pending" check
in one atomic section, instead of re-acquiring the lock; see the sketch below.
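
A sketch of what that could look like (hypothetical, not a posted patch: the
kernel's helpers currently drop and re-acquire job_list_lock between fetching
the finished job and checking for more; TDR/timeout handling is elided here):

static void drm_sched_free_job_work(struct work_struct *w)
{
	struct drm_gpu_scheduler *sched =
		container_of(w, struct drm_gpu_scheduler, work_free_job);
	struct drm_sched_job *job, *next;
	bool more = false;

	spin_lock(&sched->job_list_lock);
	job = list_first_entry_or_null(&sched->pending_list,
				       struct drm_sched_job, list);
	if (job && dma_fence_is_signaled(&job->s_fence->finished)) {
		list_del_init(&job->list);
		/* Decide "re-queue if more pending" while still holding
		 * the lock, in the same critical section. */
		next = list_first_entry_or_null(&sched->pending_list,
						struct drm_sched_job, list);
		more = next && dma_fence_is_signaled(&next->s_fence->finished);
	} else {
		job = NULL;
	}
	spin_unlock(&sched->job_list_lock);

	if (job) {
		sched->ops->free_job(job);
		if (more)
			drm_sched_free_job_queue(sched);
	}
}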

And also, as a heads-up: at some point in the next few months I will start
looking at the latency and power effects of the "do just one and re-queue"
conversion. In ChromeOS, milliwatts matter, and some things like media
playback do have a lot of inter-engine dependencies, so keeping CPU wake-ups
low, to preserve C-state residency, might matter. Well, it might matter for
server media transcode stream density workloads too, from both power and
stream-capacity-per-socket metrics.

Regards,

Tvrtko

Patch

diff --git a/drivers/gpu/drm/scheduler/sched_main.c b/drivers/gpu/drm/scheduler/sched_main.c
index 98b2ad54fc7071..05816e7cae8c8b 100644
--- a/drivers/gpu/drm/scheduler/sched_main.c
+++ b/drivers/gpu/drm/scheduler/sched_main.c
@@ -1040,16 +1040,6 @@  drm_sched_pick_best(struct drm_gpu_scheduler **sched_list,
 }
 EXPORT_SYMBOL(drm_sched_pick_best);
 
-/**
- * drm_sched_run_job_queue_if_ready - enqueue run-job work if ready
- * @sched: scheduler instance
- */
-static void drm_sched_run_job_queue_if_ready(struct drm_gpu_scheduler *sched)
-{
-	if (drm_sched_select_entity(sched))
-		drm_sched_run_job_queue(sched);
-}
-
 /**
  * drm_sched_free_job_work - worker to call free_job
  *
@@ -1069,7 +1059,7 @@  static void drm_sched_free_job_work(struct work_struct *w)
 		sched->ops->free_job(cleanup_job);
 
 		drm_sched_free_job_queue_if_done(sched);
-		drm_sched_run_job_queue_if_ready(sched);
+		drm_sched_run_job_queue(sched);
 	}
 }
 
@@ -1127,7 +1117,7 @@  static void drm_sched_run_job_work(struct work_struct *w)
 	}
 
 	wake_up(&sched->job_scheduled);
-	drm_sched_run_job_queue_if_ready(sched);
+	drm_sched_run_job_queue(sched);
 }
 
 /**