
[2/8] drm/sched: Always wake up correct scheduler in drm_sched_entity_push_job

Message ID 20240924101914.2713-3-tursulin@igalia.com (mailing list archive)
State New, archived
Series: DRM scheduler fixes and improvements

Commit Message

Tvrtko Ursulin Sept. 24, 2024, 10:19 a.m. UTC
From: Tvrtko Ursulin <tvrtko.ursulin@igalia.com>

Since drm_sched_entity_modify_sched() can modify the entity's run queue,
let's make sure to dereference the pointer only once, so that both adding
and waking up are guaranteed to act on the same scheduler.

The alternative of moving the spin_unlock to after the wake-up would for
now be more problematic, since the same lock is taken inside
drm_sched_rq_update_fifo().

v2:
 * Improve commit message. (Philipp)
 * Cache the scheduler pointer directly. (Christian)

Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@igalia.com>
Fixes: b37aced31eb0 ("drm/scheduler: implement a function to modify sched list")
Cc: Christian König <christian.koenig@amd.com>
Cc: Alex Deucher <alexander.deucher@amd.com>
Cc: Luben Tuikov <ltuikov89@gmail.com>
Cc: Matthew Brost <matthew.brost@intel.com>
Cc: David Airlie <airlied@gmail.com>
Cc: Daniel Vetter <daniel@ffwll.ch>
Cc: Philipp Stanner <pstanner@redhat.com>
Cc: dri-devel@lists.freedesktop.org
Cc: <stable@vger.kernel.org> # v5.7+
Reviewed-by: Christian König <christian.koenig@amd.com>
---
 drivers/gpu/drm/scheduler/sched_entity.c | 10 ++++++++--
 1 file changed, 8 insertions(+), 2 deletions(-)

Comments

Christian König Sept. 24, 2024, 1:55 p.m. UTC | #1
I've pushed the first to drm-misc-next, but that one here fails to apply 
cleanly.

Christian.

On 24.09.24 12:19, Tvrtko Ursulin wrote:
> [full patch quoted above - snipped]
Tvrtko Ursulin Sept. 24, 2024, 2:12 p.m. UTC | #2
On 24/09/2024 14:55, Christian König wrote:
> I've pushed the first to drm-misc-next, but that one here fails to apply 
> cleanly.

This appears to be due to 440d52b370b0 ("drm/sched: Fix dynamic job-flow
control race") in drm-misc-fixes.

In theory 1-3 from my series are fixes. Should they also go to 
drm-misc-fixes? I am not too familiar with the drm-misc flow.

Or the series now needs to wait for some backmerge?

Regards,

Tvrtko

> On 24.09.24 12:19, Tvrtko Ursulin wrote:
>> [full patch snipped]
Christian König Sept. 24, 2024, 2:20 p.m. UTC | #3
On 24.09.24 16:12, Tvrtko Ursulin wrote:
>
> On 24/09/2024 14:55, Christian König wrote:
>> I've pushed the first to drm-misc-next, but that one here fails to 
>> apply cleanly.
>
> This appears due 440d52b370b0 ("drm/sched: Fix dynamic job-flow 
> control race") in drm-misc-fixes.
>
> In theory 1-3 from my series are fixes. Should they also go to 
> drm-misc-fixes? I am not too familiar with the drm-misc flow.

Ah shit, in that case you should have split the patches up into fixes
and next. Going to push the first 3 to fixes.

> Or the series now needs to wait for some backmerge?

Are the remaining patches independent? If not, then we need to wait for
a backmerge.

Regards,
Christian.

>
> Regards,
>
> Tvrtko
>
>> On 24.09.24 12:19, Tvrtko Ursulin wrote:
>>> [full patch snipped]
Tvrtko Ursulin Sept. 24, 2024, 2:54 p.m. UTC | #4
On 24/09/2024 15:20, Christian König wrote:
> On 24.09.24 16:12, Tvrtko Ursulin wrote:
>>
>> On 24/09/2024 14:55, Christian König wrote:
>>> I've pushed the first to drm-misc-next, but that one here fails to 
>>> apply cleanly.
>>
>> This appears due 440d52b370b0 ("drm/sched: Fix dynamic job-flow 
>> control race") in drm-misc-fixes.
>>
>> In theory 1-3 from my series are fixes. Should they also go to 
>> drm-misc-fixes? I am not too familiar with the drm-misc flow.
> 
> Ah shit, in that case you should have spitted the patches up into fixes 
> and next. Going to push the first 3 to fixes.

Sorry, my drm-intel ways of thinking (cherry-picked fixes) are hard to
get rid of. Hence the series was structured as 1-3 fixes, 4-8 refactors, etc.

Now it appears it is too late to pull the first one out of drm-misc-next.

>> Or the series now needs to wait for some backmerge?
> 
> Are the remaining 3 patches independent? If not then we need to wait for 
> a backmerge.

These are independent:

Fixes:

  1/8 "drm/sched: Add locking to drm_sched_entity_modify_sched"

Not fixes:

  5/8 "drm/sched: Stop setting current entity in FIFO mode"
  6/8 "drm/sched: Re-order struct drm_sched_rq members for clarity"

While the rest touch at least some common areas.

2/8 and 3/8 are also fixes.

4/8, 7/8 and 8/8 not fixes but depend on 2/8 and 3/8.

Regards,

Tvrtko

>>> On 24.09.24 12:19, Tvrtko Ursulin wrote:
>>>> [full patch snipped]

Patch

diff --git a/drivers/gpu/drm/scheduler/sched_entity.c b/drivers/gpu/drm/scheduler/sched_entity.c
index 0e002c17fcb6..a75eede8bf8d 100644
--- a/drivers/gpu/drm/scheduler/sched_entity.c
+++ b/drivers/gpu/drm/scheduler/sched_entity.c
@@ -599,6 +599,9 @@  void drm_sched_entity_push_job(struct drm_sched_job *sched_job)
 
 	/* first job wakes up scheduler */
 	if (first) {
+		struct drm_gpu_scheduler *sched;
+		struct drm_sched_rq *rq;
+
 		/* Add the entity to the run queue */
 		spin_lock(&entity->rq_lock);
 		if (entity->stopped) {
@@ -608,13 +611,16 @@  void drm_sched_entity_push_job(struct drm_sched_job *sched_job)
 			return;
 		}
 
-		drm_sched_rq_add_entity(entity->rq, entity);
+		rq = entity->rq;
+		sched = rq->sched;
+
+		drm_sched_rq_add_entity(rq, entity);
 		spin_unlock(&entity->rq_lock);
 
 		if (drm_sched_policy == DRM_SCHED_POLICY_FIFO)
 			drm_sched_rq_update_fifo(entity, submit_ts);
 
-		drm_sched_wakeup(entity->rq->sched);
+		drm_sched_wakeup(sched);
 	}
 }
 EXPORT_SYMBOL(drm_sched_entity_push_job);