drm/sched: Add return value for drm_sched_entity_push_job

Message ID 20250326070441.1515428-1-liuqianyi125@gmail.com (mailing list archive)
State New

Commit Message

Qianyi Liu March 26, 2025, 7:04 a.m. UTC
Currently drm_sched_entity_push_job() has no return value to indicate
operation status. This makes it difficult for callers to handle error
conditions properly.

Add an int return value to drm_sched_entity_push_job() that returns 0 on
success or a negative error code (e.g., -EINVAL) on failure. This allows
callers to:

1. Detect when job submission fails
2. Perform proper cleanup (e.g., release job and fence allocations)
3. Avoid potential memory leaks in error paths

Signed-off-by: Qianyi Liu <liuqianyi125@gmail.com>
---
 drivers/gpu/drm/scheduler/sched_entity.c | 8 ++++++--
 include/drm/gpu_scheduler.h              | 2 +-
 2 files changed, 7 insertions(+), 3 deletions(-)

Comments

Philipp Stanner March 26, 2025, 8:14 a.m. UTC | #1
Hello,

thanks for your patch

On Wed, 2025-03-26 at 15:04 +0800, Qianyi Liu wrote:
> Currently drm_sched_entity_push_job() has no return value to indicate
> operation status. This makes it difficult for callers to handle error
> conditions properly.
> 
> Add an int return value to drm_sched_entity_push_job() that returns 0
> on success or a negative error code (e.g., -EINVAL) on failure. This
> allows callers to:
> 
> 1. Detect when job submission fails
> 2. Perform proper cleanup (e.g., release job and fence allocations)
> 3. Avoid potential memory leaks in error paths
> 
> Signed-off-by: Qianyi Liu <liuqianyi125@gmail.com>
> ---
>  drivers/gpu/drm/scheduler/sched_entity.c | 8 ++++++--
>  include/drm/gpu_scheduler.h              | 2 +-
>  2 files changed, 7 insertions(+), 3 deletions(-)
> 
> diff --git a/drivers/gpu/drm/scheduler/sched_entity.c b/drivers/gpu/drm/scheduler/sched_entity.c
> index bd39db7bb240..f31964e76062 100644
> --- a/drivers/gpu/drm/scheduler/sched_entity.c
> +++ b/drivers/gpu/drm/scheduler/sched_entity.c
> @@ -579,8 +579,10 @@ void drm_sched_entity_select_rq(struct drm_sched_entity *entity)
>   * fence sequence number this function should be called with drm_sched_job_arm()
>   * under common lock for the struct drm_sched_entity that was set up for
>   * @sched_job in drm_sched_job_init().
> + *
> + * Returns 0 on success or a negative error code on failure.
>   */
> -void drm_sched_entity_push_job(struct drm_sched_job *sched_job)
> +int drm_sched_entity_push_job(struct drm_sched_job *sched_job)

I am not convinced this is a justified change, because:
   1. The entity being dead is the only error that might occur
   2. That error is extremely unlikely
   3. It does not occur in all existing upstream drivers
   4. Drivers typically first prevent userspace from pushing stuff to
      an entity, then tear the entity down, then (in case of firmware
      schedulers) tear the entire scheduler down.
   5. Your patch doesn't add an upstream user for the error code that
      would show how it's better than, or different from, how existing
      drivers handle their entity-pushing.

Thus, pushing to a killed entity is a gross misdesign of the driver,
which should become visible during development and can't be handled
during runtime in a reasonable manner.

P.

>  {
>   struct drm_sched_entity *entity = sched_job->entity;
>   bool first;
> @@ -609,7 +611,7 @@ void drm_sched_entity_push_job(struct drm_sched_job *sched_job)
>   spin_unlock(&entity->lock);
>  
>   DRM_ERROR("Trying to push to a killed entity\n");
> - return;
> + return -EINVAL;
>   }
>  
>   rq = entity->rq;
> @@ -626,5 +628,7 @@ void drm_sched_entity_push_job(struct drm_sched_job *sched_job)
>  
>   drm_sched_wakeup(sched);
>   }
> +
> + return 0;
>  }
>  EXPORT_SYMBOL(drm_sched_entity_push_job);
> diff --git a/include/drm/gpu_scheduler.h b/include/drm/gpu_scheduler.h
> index 50928a7ae98e..48a263571bab 100644
> --- a/include/drm/gpu_scheduler.h
> +++ b/include/drm/gpu_scheduler.h
> @@ -589,7 +589,7 @@ int drm_sched_job_init(struct drm_sched_job *job,
>          struct drm_sched_entity *entity,
>          u32 credits, void *owner);
>  void drm_sched_job_arm(struct drm_sched_job *job);
> -void drm_sched_entity_push_job(struct drm_sched_job *sched_job);
> +int drm_sched_entity_push_job(struct drm_sched_job *sched_job);
>  int drm_sched_job_add_dependency(struct drm_sched_job *job,
>   struct dma_fence *fence);
>  int drm_sched_job_add_syncobj_dependency(struct drm_sched_job *job,
Patch

diff --git a/drivers/gpu/drm/scheduler/sched_entity.c b/drivers/gpu/drm/scheduler/sched_entity.c
index bd39db7bb240..f31964e76062 100644
--- a/drivers/gpu/drm/scheduler/sched_entity.c
+++ b/drivers/gpu/drm/scheduler/sched_entity.c
@@ -579,8 +579,10 @@  void drm_sched_entity_select_rq(struct drm_sched_entity *entity)
  * fence sequence number this function should be called with drm_sched_job_arm()
  * under common lock for the struct drm_sched_entity that was set up for
  * @sched_job in drm_sched_job_init().
+ *
+ * Returns 0 on success or a negative error code on failure.
  */
-void drm_sched_entity_push_job(struct drm_sched_job *sched_job)
+int drm_sched_entity_push_job(struct drm_sched_job *sched_job)
 {
 	struct drm_sched_entity *entity = sched_job->entity;
 	bool first;
@@ -609,7 +611,7 @@  void drm_sched_entity_push_job(struct drm_sched_job *sched_job)
 			spin_unlock(&entity->lock);
 
 			DRM_ERROR("Trying to push to a killed entity\n");
-			return;
+			return -EINVAL;
 		}
 
 		rq = entity->rq;
@@ -626,5 +628,7 @@  void drm_sched_entity_push_job(struct drm_sched_job *sched_job)
 
 		drm_sched_wakeup(sched);
 	}
+
+	return 0;
 }
 EXPORT_SYMBOL(drm_sched_entity_push_job);
diff --git a/include/drm/gpu_scheduler.h b/include/drm/gpu_scheduler.h
index 50928a7ae98e..48a263571bab 100644
--- a/include/drm/gpu_scheduler.h
+++ b/include/drm/gpu_scheduler.h
@@ -589,7 +589,7 @@  int drm_sched_job_init(struct drm_sched_job *job,
 		       struct drm_sched_entity *entity,
 		       u32 credits, void *owner);
 void drm_sched_job_arm(struct drm_sched_job *job);
-void drm_sched_entity_push_job(struct drm_sched_job *sched_job);
+int drm_sched_entity_push_job(struct drm_sched_job *sched_job);
 int drm_sched_job_add_dependency(struct drm_sched_job *job,
 				 struct dma_fence *fence);
 int drm_sched_job_add_syncobj_dependency(struct drm_sched_job *job,