Message ID | 1555438986-18303-3-git-send-email-andrey.grodzovsky@amd.com (mailing list archive) |
---|---|
State | New, archived |
Series | [v4,1/5] drm/amd/display: wait for fence without holding reservation lock |
Ping on this patch and patch 5. The rest already RBed. Andrey On 4/16/19 2:23 PM, Andrey Grodzovsky wrote: > From: Christian König <christian.koenig@amd.com> > > We now destroy finished jobs from the worker thread to make sure that > we never destroy a job currently in timeout processing. > By this we avoid holding lock around ring mirror list in drm_sched_stop > which should solve a deadlock reported by a user. > > v2: Remove unused variable. > v4: Move guilty job free into sched code. > > Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=109692 > > Signed-off-by: Christian König <christian.koenig@amd.com> > Signed-off-by: Andrey Grodzovsky <andrey.grodzovsky@amd.com> > --- > drivers/gpu/drm/amd/amdgpu/amdgpu_device.c | 9 +- > drivers/gpu/drm/etnaviv/etnaviv_dump.c | 4 - > drivers/gpu/drm/etnaviv/etnaviv_sched.c | 2 +- > drivers/gpu/drm/lima/lima_sched.c | 2 +- > drivers/gpu/drm/panfrost/panfrost_job.c | 2 +- > drivers/gpu/drm/scheduler/sched_main.c | 145 +++++++++++++++++------------ > drivers/gpu/drm/v3d/v3d_sched.c | 2 +- > include/drm/gpu_scheduler.h | 6 +- > 8 files changed, 94 insertions(+), 78 deletions(-) > > diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c > index 7cee269..a0e165c 100644 > --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c > +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c > @@ -3334,7 +3334,7 @@ static int amdgpu_device_pre_asic_reset(struct amdgpu_device *adev, > if (!ring || !ring->sched.thread) > continue; > > - drm_sched_stop(&ring->sched); > + drm_sched_stop(&ring->sched, &job->base); > > /* after all hw jobs are reset, hw fence is meaningless, so force_completion */ > amdgpu_fence_driver_force_completion(ring); > @@ -3343,8 +3343,6 @@ static int amdgpu_device_pre_asic_reset(struct amdgpu_device *adev, > if(job) > drm_sched_increase_karma(&job->base); > > - > - > if (!amdgpu_sriov_vf(adev)) { > > if (!need_full_reset) > @@ -3482,8 +3480,7 @@ static int amdgpu_do_asic_reset(struct amdgpu_hive_info *hive, > return r; > } > > -static void amdgpu_device_post_asic_reset(struct amdgpu_device *adev, > - struct amdgpu_job *job) > +static void amdgpu_device_post_asic_reset(struct amdgpu_device *adev) > { > int i; > > @@ -3623,7 +3620,7 @@ int amdgpu_device_gpu_recover(struct amdgpu_device *adev, > > /* Post ASIC reset for all devs .*/ > list_for_each_entry(tmp_adev, device_list_handle, gmc.xgmi.head) { > - amdgpu_device_post_asic_reset(tmp_adev, tmp_adev == adev ? job : NULL); > + amdgpu_device_post_asic_reset(tmp_adev); > > if (r) { > /* bad news, how to tell it to userspace ? 
*/ > diff --git a/drivers/gpu/drm/etnaviv/etnaviv_dump.c b/drivers/gpu/drm/etnaviv/etnaviv_dump.c > index 33854c9..5778d9c 100644 > --- a/drivers/gpu/drm/etnaviv/etnaviv_dump.c > +++ b/drivers/gpu/drm/etnaviv/etnaviv_dump.c > @@ -135,13 +135,11 @@ void etnaviv_core_dump(struct etnaviv_gpu *gpu) > mmu_size + gpu->buffer.size; > > /* Add in the active command buffers */ > - spin_lock_irqsave(&gpu->sched.job_list_lock, flags); > list_for_each_entry(s_job, &gpu->sched.ring_mirror_list, node) { > submit = to_etnaviv_submit(s_job); > file_size += submit->cmdbuf.size; > n_obj++; > } > - spin_unlock_irqrestore(&gpu->sched.job_list_lock, flags); > > /* Add in the active buffer objects */ > list_for_each_entry(vram, &gpu->mmu->mappings, mmu_node) { > @@ -183,14 +181,12 @@ void etnaviv_core_dump(struct etnaviv_gpu *gpu) > gpu->buffer.size, > etnaviv_cmdbuf_get_va(&gpu->buffer)); > > - spin_lock_irqsave(&gpu->sched.job_list_lock, flags); > list_for_each_entry(s_job, &gpu->sched.ring_mirror_list, node) { > submit = to_etnaviv_submit(s_job); > etnaviv_core_dump_mem(&iter, ETDUMP_BUF_CMD, > submit->cmdbuf.vaddr, submit->cmdbuf.size, > etnaviv_cmdbuf_get_va(&submit->cmdbuf)); > } > - spin_unlock_irqrestore(&gpu->sched.job_list_lock, flags); > > /* Reserve space for the bomap */ > if (n_bomap_pages) { > diff --git a/drivers/gpu/drm/etnaviv/etnaviv_sched.c b/drivers/gpu/drm/etnaviv/etnaviv_sched.c > index 6d24fea..a813c82 100644 > --- a/drivers/gpu/drm/etnaviv/etnaviv_sched.c > +++ b/drivers/gpu/drm/etnaviv/etnaviv_sched.c > @@ -109,7 +109,7 @@ static void etnaviv_sched_timedout_job(struct drm_sched_job *sched_job) > } > > /* block scheduler */ > - drm_sched_stop(&gpu->sched); > + drm_sched_stop(&gpu->sched, sched_job); > > if(sched_job) > drm_sched_increase_karma(sched_job); > diff --git a/drivers/gpu/drm/lima/lima_sched.c b/drivers/gpu/drm/lima/lima_sched.c > index 97bd9c1..df98931 100644 > --- a/drivers/gpu/drm/lima/lima_sched.c > +++ b/drivers/gpu/drm/lima/lima_sched.c > @@ -300,7 +300,7 @@ static struct dma_fence *lima_sched_run_job(struct drm_sched_job *job) > static void lima_sched_handle_error_task(struct lima_sched_pipe *pipe, > struct lima_sched_task *task) > { > - drm_sched_stop(&pipe->base); > + drm_sched_stop(&pipe->base, &task->base); > > if (task) > drm_sched_increase_karma(&task->base); > diff --git a/drivers/gpu/drm/panfrost/panfrost_job.c b/drivers/gpu/drm/panfrost/panfrost_job.c > index 0a7ed04..c6336b7 100644 > --- a/drivers/gpu/drm/panfrost/panfrost_job.c > +++ b/drivers/gpu/drm/panfrost/panfrost_job.c > @@ -385,7 +385,7 @@ static void panfrost_job_timedout(struct drm_sched_job *sched_job) > sched_job); > > for (i = 0; i < NUM_JOB_SLOTS; i++) > - drm_sched_stop(&pfdev->js->queue[i].sched); > + drm_sched_stop(&pfdev->js->queue[i].sched, sched_job); > > if (sched_job) > drm_sched_increase_karma(sched_job); > diff --git a/drivers/gpu/drm/scheduler/sched_main.c b/drivers/gpu/drm/scheduler/sched_main.c > index 19fc601..21e8734 100644 > --- a/drivers/gpu/drm/scheduler/sched_main.c > +++ b/drivers/gpu/drm/scheduler/sched_main.c > @@ -265,32 +265,6 @@ void drm_sched_resume_timeout(struct drm_gpu_scheduler *sched, > } > EXPORT_SYMBOL(drm_sched_resume_timeout); > > -/* job_finish is called after hw fence signaled > - */ > -static void drm_sched_job_finish(struct work_struct *work) > -{ > - struct drm_sched_job *s_job = container_of(work, struct drm_sched_job, > - finish_work); > - struct drm_gpu_scheduler *sched = s_job->sched; > - unsigned long flags; > - > - /* > - * Canceling the timeout 
without removing our job from the ring mirror > - * list is safe, as we will only end up in this worker if our jobs > - * finished fence has been signaled. So even if some another worker > - * manages to find this job as the next job in the list, the fence > - * signaled check below will prevent the timeout to be restarted. > - */ > - cancel_delayed_work_sync(&sched->work_tdr); > - > - spin_lock_irqsave(&sched->job_list_lock, flags); > - /* queue TDR for next job */ > - drm_sched_start_timeout(sched); > - spin_unlock_irqrestore(&sched->job_list_lock, flags); > - > - sched->ops->free_job(s_job); > -} > - > static void drm_sched_job_begin(struct drm_sched_job *s_job) > { > struct drm_gpu_scheduler *sched = s_job->sched; > @@ -315,6 +289,13 @@ static void drm_sched_job_timedout(struct work_struct *work) > if (job) > job->sched->ops->timedout_job(job); > > + /* > + * Guilty job did complete and hence needs to be manually removed > + * See drm_sched_stop doc. > + */ > + if (list_empty(&job->node)) > + job->sched->ops->free_job(job); > + > spin_lock_irqsave(&sched->job_list_lock, flags); > drm_sched_start_timeout(sched); > spin_unlock_irqrestore(&sched->job_list_lock, flags); > @@ -371,23 +352,26 @@ EXPORT_SYMBOL(drm_sched_increase_karma); > * @sched: scheduler instance > * @bad: bad scheduler job > * > + * Stop the scheduler and also removes and frees all completed jobs. > + * Note: bad job will not be freed as it might be used later and so it's > + * callers responsibility to release it manually if it's not part of the > + * mirror list any more. > + * > */ > -void drm_sched_stop(struct drm_gpu_scheduler *sched) > +void drm_sched_stop(struct drm_gpu_scheduler *sched, struct drm_sched_job *bad) > { > - struct drm_sched_job *s_job; > + struct drm_sched_job *s_job, *tmp; > unsigned long flags; > - struct dma_fence *last_fence = NULL; > > kthread_park(sched->thread); > > /* > - * Verify all the signaled jobs in mirror list are removed from the ring > - * by waiting for the latest job to enter the list. This should insure that > - * also all the previous jobs that were in flight also already singaled > - * and removed from the list. > + * Iterate the job list from later to earlier one and either deactive > + * their HW callbacks or remove them from mirror list if they already > + * signaled. > + * This iteration is thread safe as sched thread is stopped. > */ > - spin_lock_irqsave(&sched->job_list_lock, flags); > - list_for_each_entry_reverse(s_job, &sched->ring_mirror_list, node) { > + list_for_each_entry_safe_reverse(s_job, tmp, &sched->ring_mirror_list, node) { > if (s_job->s_fence->parent && > dma_fence_remove_callback(s_job->s_fence->parent, > &s_job->cb)) { > @@ -395,16 +379,30 @@ void drm_sched_stop(struct drm_gpu_scheduler *sched) > s_job->s_fence->parent = NULL; > atomic_dec(&sched->hw_rq_count); > } else { > - last_fence = dma_fence_get(&s_job->s_fence->finished); > - break; > + /* > + * remove job from ring_mirror_list. > + * Locking here is for concurrent resume timeout > + */ > + spin_lock_irqsave(&sched->job_list_lock, flags); > + list_del_init(&s_job->node); > + spin_unlock_irqrestore(&sched->job_list_lock, flags); > + > + /* > + * Wait for job's HW fence callback to finish using s_job > + * before releasing it. 
> + * > + * Job is still alive so fence refcount at least 1 > + */ > + dma_fence_wait(&s_job->s_fence->finished, false); > + > + /* > + * We must keep bad job alive for later use during > + * recovery by some of the drivers > + */ > + if (bad != s_job) > + sched->ops->free_job(s_job); > } > } > - spin_unlock_irqrestore(&sched->job_list_lock, flags); > - > - if (last_fence) { > - dma_fence_wait(last_fence, false); > - dma_fence_put(last_fence); > - } > } > > EXPORT_SYMBOL(drm_sched_stop); > @@ -418,6 +416,7 @@ EXPORT_SYMBOL(drm_sched_stop); > void drm_sched_start(struct drm_gpu_scheduler *sched, bool full_recovery) > { > struct drm_sched_job *s_job, *tmp; > + unsigned long flags; > int r; > > if (!full_recovery) > @@ -425,9 +424,7 @@ void drm_sched_start(struct drm_gpu_scheduler *sched, bool full_recovery) > > /* > * Locking the list is not required here as the sched thread is parked > - * so no new jobs are being pushed in to HW and in drm_sched_stop we > - * flushed all the jobs who were still in mirror list but who already > - * signaled and removed them self from the list. Also concurrent > + * so no new jobs are being inserted or removed. Also concurrent > * GPU recovers can't run in parallel. > */ > list_for_each_entry_safe(s_job, tmp, &sched->ring_mirror_list, node) { > @@ -445,7 +442,9 @@ void drm_sched_start(struct drm_gpu_scheduler *sched, bool full_recovery) > drm_sched_process_job(NULL, &s_job->cb); > } > > + spin_lock_irqsave(&sched->job_list_lock, flags); > drm_sched_start_timeout(sched); > + spin_unlock_irqrestore(&sched->job_list_lock, flags); > > unpark: > kthread_unpark(sched->thread); > @@ -464,7 +463,6 @@ void drm_sched_resubmit_jobs(struct drm_gpu_scheduler *sched) > uint64_t guilty_context; > bool found_guilty = false; > > - /*TODO DO we need spinlock here ? */ > list_for_each_entry_safe(s_job, tmp, &sched->ring_mirror_list, node) { > struct drm_sched_fence *s_fence = s_job->s_fence; > > @@ -514,7 +512,6 @@ int drm_sched_job_init(struct drm_sched_job *job, > return -ENOMEM; > job->id = atomic64_inc_return(&sched->job_id_count); > > - INIT_WORK(&job->finish_work, drm_sched_job_finish); > INIT_LIST_HEAD(&job->node); > > return 0; > @@ -597,24 +594,53 @@ static void drm_sched_process_job(struct dma_fence *f, struct dma_fence_cb *cb) > struct drm_sched_job *s_job = container_of(cb, struct drm_sched_job, cb); > struct drm_sched_fence *s_fence = s_job->s_fence; > struct drm_gpu_scheduler *sched = s_fence->sched; > - unsigned long flags; > - > - cancel_delayed_work(&sched->work_tdr); > > atomic_dec(&sched->hw_rq_count); > atomic_dec(&sched->num_jobs); > > - spin_lock_irqsave(&sched->job_list_lock, flags); > - /* remove job from ring_mirror_list */ > - list_del_init(&s_job->node); > - spin_unlock_irqrestore(&sched->job_list_lock, flags); > + trace_drm_sched_process_job(s_fence); > > drm_sched_fence_finished(s_fence); > - > - trace_drm_sched_process_job(s_fence); > wake_up_interruptible(&sched->wake_up_worker); > +} > + > +/** > + * drm_sched_cleanup_jobs - destroy finished jobs > + * > + * @sched: scheduler instance > + * > + * Remove all finished jobs from the mirror list and destroy them. 
> + */ > +static void drm_sched_cleanup_jobs(struct drm_gpu_scheduler *sched) > +{ > + unsigned long flags; > + > + /* Don't destroy jobs while the timeout worker is running */ > + if (!cancel_delayed_work(&sched->work_tdr)) > + return; > + > + > + while (!list_empty(&sched->ring_mirror_list)) { > + struct drm_sched_job *job; > + > + job = list_first_entry(&sched->ring_mirror_list, > + struct drm_sched_job, node); > + if (!dma_fence_is_signaled(&job->s_fence->finished)) > + break; > + > + spin_lock_irqsave(&sched->job_list_lock, flags); > + /* remove job from ring_mirror_list */ > + list_del_init(&job->node); > + spin_unlock_irqrestore(&sched->job_list_lock, flags); > + > + sched->ops->free_job(job); > + } > + > + /* queue timeout for next job */ > + spin_lock_irqsave(&sched->job_list_lock, flags); > + drm_sched_start_timeout(sched); > + spin_unlock_irqrestore(&sched->job_list_lock, flags); > > - schedule_work(&s_job->finish_work); > } > > /** > @@ -656,9 +682,10 @@ static int drm_sched_main(void *param) > struct dma_fence *fence; > > wait_event_interruptible(sched->wake_up_worker, > + (drm_sched_cleanup_jobs(sched), > (!drm_sched_blocked(sched) && > (entity = drm_sched_select_entity(sched))) || > - kthread_should_stop()); > + kthread_should_stop())); > > if (!entity) > continue; > diff --git a/drivers/gpu/drm/v3d/v3d_sched.c b/drivers/gpu/drm/v3d/v3d_sched.c > index e740f3b..1a4abe7 100644 > --- a/drivers/gpu/drm/v3d/v3d_sched.c > +++ b/drivers/gpu/drm/v3d/v3d_sched.c > @@ -232,7 +232,7 @@ v3d_gpu_reset_for_timeout(struct v3d_dev *v3d, struct drm_sched_job *sched_job) > > /* block scheduler */ > for (q = 0; q < V3D_MAX_QUEUES; q++) > - drm_sched_stop(&v3d->queue[q].sched); > + drm_sched_stop(&v3d->queue[q].sched, sched_job); > > if (sched_job) > drm_sched_increase_karma(sched_job); > diff --git a/include/drm/gpu_scheduler.h b/include/drm/gpu_scheduler.h > index 0daca4d..9ee0f27 100644 > --- a/include/drm/gpu_scheduler.h > +++ b/include/drm/gpu_scheduler.h > @@ -167,9 +167,6 @@ struct drm_sched_fence *to_drm_sched_fence(struct dma_fence *f); > * @sched: the scheduler instance on which this job is scheduled. > * @s_fence: contains the fences for the scheduling of job. > * @finish_cb: the callback for the finished fence. > - * @finish_work: schedules the function @drm_sched_job_finish once the job has > - * finished to remove the job from the > - * @drm_gpu_scheduler.ring_mirror_list. > * @node: used to append this struct to the @drm_gpu_scheduler.ring_mirror_list. > * @id: a unique id assigned to each job scheduled on the scheduler. > * @karma: increment on every hang caused by this job. If this exceeds the hang > @@ -188,7 +185,6 @@ struct drm_sched_job { > struct drm_gpu_scheduler *sched; > struct drm_sched_fence *s_fence; > struct dma_fence_cb finish_cb; > - struct work_struct finish_work; > struct list_head node; > uint64_t id; > atomic_t karma; > @@ -296,7 +292,7 @@ int drm_sched_job_init(struct drm_sched_job *job, > void *owner); > void drm_sched_job_cleanup(struct drm_sched_job *job); > void drm_sched_wakeup(struct drm_gpu_scheduler *sched); > -void drm_sched_stop(struct drm_gpu_scheduler *sched); > +void drm_sched_stop(struct drm_gpu_scheduler *sched, struct drm_sched_job *bad); > void drm_sched_start(struct drm_gpu_scheduler *sched, bool full_recovery); > void drm_sched_resubmit_jobs(struct drm_gpu_scheduler *sched); > void drm_sched_increase_karma(struct drm_sched_job *bad);
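
For reference, the driver-side recovery flow against the new drm_sched_stop() signature, as exercised by the etnaviv, v3d and panfrost hunks above, is roughly the following. This is a hedged sketch, not part of the patch: the mydrv_* names, to_mydrv_device() and mydrv_hw_reset() are hypothetical placeholders, while the drm_sched_* calls match the ones in the diff.

```c
#include <drm/gpu_scheduler.h>

/* Hypothetical driver timedout_job handler using the new API. */
static void mydrv_sched_timedout_job(struct drm_sched_job *sched_job)
{
	struct drm_gpu_scheduler *sched = sched_job->sched;
	struct mydrv_device *mdev = to_mydrv_device(sched);	/* placeholder */

	/* Park the scheduler thread, free already-finished jobs and keep
	 * the bad job alive so it can still be inspected afterwards. */
	drm_sched_stop(sched, sched_job);

	if (sched_job)
		drm_sched_increase_karma(sched_job);

	/* Driver-specific hardware reset (placeholder). */
	mydrv_hw_reset(mdev);

	/* Push the still-pending jobs back to the hardware and unpark. */
	drm_sched_resubmit_jobs(sched);
	drm_sched_start(sched, true);
}
```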
I can't review this patch, since I'm one of the authors of it, but in general your changes look good to me now. For patch #5 I think it might be cleaner if we move the incrementing of hw_rq_count to where the scheduler is started again. Regards, Christian. On 17.04.19 at 16:36, Grodzovsky, Andrey wrote: > Ping on this patch and patch 5. The rest already RBed. > > Andrey
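
A minimal sketch of what moving the hw_rq_count increment into drm_sched_start() could look like; patch #5 itself is not quoted in this thread, so this is only an assumption about the intent, written as a fragment of the drm_sched_start() loop shown earlier:

```c
	/* Fragment of drm_sched_start(); s_job, tmp, full_recovery and r are
	 * the variables from the function above. Every job still on the
	 * mirror list goes back to the hardware, so account for it here
	 * instead of in the caller. */
	list_for_each_entry_safe(s_job, tmp, &sched->ring_mirror_list, node) {
		struct dma_fence *fence = s_job->s_fence->parent;

		atomic_inc(&sched->hw_rq_count);

		if (!full_recovery)
			continue;

		if (fence)
			r = dma_fence_add_callback(fence, &s_job->cb,
						   drm_sched_process_job);
		else
			drm_sched_process_job(NULL, &s_job->cb);
	}
```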
On 4/17/19 1:17 PM, Christian König wrote: > I can't review this patch, since I'm one of the authors of it, but in > general your changes look good to me now. > > For patch #5 I think it might be cleaner if we move the incrementing of > hw_rq_count to where the scheduler is started again. But the increment of hw_rq_count is conditional on whether the guilty job was signaled; moving it into drm_sched_start would also force me to pass a 'job_signaled' flag into drm_sched_start, which goes against your original comment that we don't want to pass this logic around helper functions and should keep it all in one place, namely amdgpu_device_gpu_recover. Andrey
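
To illustrate the point, the conditional accounting Andrey wants to keep in amdgpu_device_gpu_recover() has roughly the following shape. This is a hypothetical sketch (patch #5 is not part of this message), only meant to show why a 'job_signaled' flag would have to be passed around if the increment moved into drm_sched_start():

```c
	/* Hypothetical excerpt (patch #5 is not shown here): whether the
	 * hardware run-queue accounting needs to be redone depends on
	 * whether the guilty job had already signaled before the reset. */
	bool job_signaled = job &&
		dma_fence_is_signaled(&job->base.s_fence->finished);

	if (!job_signaled)
		/* job really hung and was taken off the hardware ring */
		atomic_inc(&ring->sched.hw_rq_count);
```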
Am 17.04.19 um 19:53 schrieb Grodzovsky, Andrey: > On 4/17/19 1:17 PM, Christian König wrote: >> I can't review this patch, since I'm one of the authors of it, but in >> general your changes look good to me now. >> >> For patch #5 I think it might be cleaner if we move incrementing of >> the hw_rq_count while starting the scheduler again. > > But the increment of hw_rq_count is conditional on if the guilty job > was signaled, moving it into drm_sched_start will also force me to pass > 'job_signaled' flag into drm_sched_start which is against your original > comment that we don't want to pass this logic around helper functions > and keep it all in one place which is amdgpu_device_gpu_recover. Well I hope that incrementing hw_rq_count is conditional for signaled jobs anyway, or otherwise we would seriously mess up the counter. E.g. in drm_sched_stop() we also only decrement it when we where able to remove the callback. Christian. > > Andrey > > >> Regards, >> Christian. >> >> Am 17.04.19 um 16:36 schrieb Grodzovsky, Andrey: >>> Ping on this patch and patch 5. The rest already RBed. >>> >>> Andrey >>> >>> On 4/16/19 2:23 PM, Andrey Grodzovsky wrote: >>>> From: Christian König <christian.koenig@amd.com> >>>> >>>> We now destroy finished jobs from the worker thread to make sure that >>>> we never destroy a job currently in timeout processing. >>>> By this we avoid holding lock around ring mirror list in drm_sched_stop >>>> which should solve a deadlock reported by a user. >>>> >>>> v2: Remove unused variable. >>>> v4: Move guilty job free into sched code. >>>> >>>> Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=109692 >>>> >>>> Signed-off-by: Christian König <christian.koenig@amd.com> >>>> Signed-off-by: Andrey Grodzovsky <andrey.grodzovsky@amd.com> >>>> --- >>>> drivers/gpu/drm/amd/amdgpu/amdgpu_device.c | 9 +- >>>> drivers/gpu/drm/etnaviv/etnaviv_dump.c | 4 - >>>> drivers/gpu/drm/etnaviv/etnaviv_sched.c | 2 +- >>>> drivers/gpu/drm/lima/lima_sched.c | 2 +- >>>> drivers/gpu/drm/panfrost/panfrost_job.c | 2 +- >>>> drivers/gpu/drm/scheduler/sched_main.c | 145 >>>> +++++++++++++++++------------ >>>> drivers/gpu/drm/v3d/v3d_sched.c | 2 +- >>>> include/drm/gpu_scheduler.h | 6 +- >>>> 8 files changed, 94 insertions(+), 78 deletions(-) >>>> >>>> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c >>>> b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c >>>> index 7cee269..a0e165c 100644 >>>> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c >>>> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c >>>> @@ -3334,7 +3334,7 @@ static int amdgpu_device_pre_asic_reset(struct >>>> amdgpu_device *adev, >>>> if (!ring || !ring->sched.thread) >>>> continue; >>>> - drm_sched_stop(&ring->sched); >>>> + drm_sched_stop(&ring->sched, &job->base); >>>> /* after all hw jobs are reset, hw fence is >>>> meaningless, so force_completion */ >>>> amdgpu_fence_driver_force_completion(ring); >>>> @@ -3343,8 +3343,6 @@ static int amdgpu_device_pre_asic_reset(struct >>>> amdgpu_device *adev, >>>> if(job) >>>> drm_sched_increase_karma(&job->base); >>>> - >>>> - >>>> if (!amdgpu_sriov_vf(adev)) { >>>> if (!need_full_reset) >>>> @@ -3482,8 +3480,7 @@ static int amdgpu_do_asic_reset(struct >>>> amdgpu_hive_info *hive, >>>> return r; >>>> } >>>> -static void amdgpu_device_post_asic_reset(struct amdgpu_device >>>> *adev, >>>> - struct amdgpu_job *job) >>>> +static void amdgpu_device_post_asic_reset(struct amdgpu_device *adev) >>>> { >>>> int i; >>>> @@ -3623,7 +3620,7 @@ int amdgpu_device_gpu_recover(struct >>>> amdgpu_device 
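The bookkeeping Christian refers to is the callback-removal branch of
drm_sched_stop() introduced by the patch: hw_rq_count is only decremented for
jobs whose hardware fence callback could still be removed, i.e. jobs that had
not signaled yet. Condensed into a sketch (illustration only; the locking,
waiting and freeing of the real loop are elided):

    /* Inside drm_sched_stop(), walking the mirror list in reverse: */
    list_for_each_entry_safe_reverse(s_job, tmp, &sched->ring_mirror_list, node) {
            if (s_job->s_fence->parent &&
                dma_fence_remove_callback(s_job->s_fence->parent, &s_job->cb)) {
                    /* Job was still pending on the hardware; it stops
                     * counting against hw_rq_count until resubmitted. */
                    s_job->s_fence->parent = NULL;
                    atomic_dec(&sched->hw_rq_count);
            } else {
                    /* Job already signaled; it will never be resubmitted,
                     * so hw_rq_count is left alone and the job is dropped
                     * from the mirror list (and freed, unless it is the
                     * bad job). */
            }
    }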
On 17.04.19 at 19:59, Christian König wrote:
> Well, I hope that incrementing hw_rq_count is conditional for signaled
> jobs anyway; otherwise we would seriously mess up the counter.
>
> E.g. in drm_sched_stop() we also only decrement it when we were able
> to remove the callback.

Ok, checking the code again, we don't need any special handling here
since all signaled jobs are already removed from the mirror_list.

Christian.
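Put differently (this is a reading of the statement above, not code from any
of the patches): once drm_sched_stop() has run, only unsignaled jobs remain on
the mirror list, so a hypothetical sanity check like the one below would never
trigger, and the recovery path needs no special case for already signaled jobs.

    /* Hypothetical assertion of the invariant right after
     * drm_sched_stop(sched, bad) returns: */
    list_for_each_entry(s_job, &sched->ring_mirror_list, node)
            WARN_ON(dma_fence_is_signaled(&s_job->s_fence->finished));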
On 4/17/19 2:01 PM, Koenig, Christian wrote:
> On 17.04.19 at 19:59, Christian König wrote:
>> Well, I hope that incrementing hw_rq_count is conditional for signaled
>> jobs anyway; otherwise we would seriously mess up the counter.
>>
>> E.g. in drm_sched_stop() we also only decrement it when we were able
>> to remove the callback.
>
> Ok, checking the code again, we don't need any special handling here
> since all signaled jobs are already removed from the mirror_list.
>
> Christian.

We decrement hw_rq_count in drm_sched_stop, and then later, if the guilty job
is found to be signaled, we skip drm_sched_resubmit_jobs and so never
increment it back; the count then goes negative when the fence signals, and I
hit a bug. But now I think what I need is to just move the
atomic_inc(&sched->hw_rq_count) from drm_sched_resubmit_jobs into
drm_sched_start; that way I can get rid of the conditional re-increment I am
doing now. Agree?

Andrey
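The move Andrey proposes would make the restart loop look roughly like the
sketch below. This is only an illustration of the direction being discussed,
not the actual follow-up patch (which is not part of this thread); the
callback re-arming is paraphrased from the existing drm_sched_start() loop and
its error handling is elided.

    /* Sketch: list walk in drm_sched_start() after the proposed change. */
    list_for_each_entry_safe(s_job, tmp, &sched->ring_mirror_list, node) {
            struct dma_fence *fence = s_job->s_fence->parent;

            /* Re-account every job that survived drm_sched_stop(),
             * replacing both the atomic_inc() in drm_sched_resubmit_jobs()
             * and the conditional re-increment in the amdgpu recovery
             * path. */
            atomic_inc(&sched->hw_rq_count);

            if (!full_recovery)
                    continue;

            if (fence)
                    /* re-arm the HW fence callback */
                    dma_fence_add_callback(fence, &s_job->cb,
                                           drm_sched_process_job);
            else
                    drm_sched_process_job(NULL, &s_job->cb);
    }

Note that this implies the list walk itself runs even when full_recovery is
false, since the counter has to be fixed up in that case as well.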
On 17.04.19 at 20:29, Grodzovsky, Andrey wrote:
> On 4/17/19 2:01 PM, Koenig, Christian wrote:
>> Ok, checking the code again, we don't need any special handling here
>> since all signaled jobs are already removed from the mirror_list.
>
> We decrement hw_rq_count in drm_sched_stop, and then later, if the guilty
> job is found to be signaled, we skip drm_sched_resubmit_jobs and so never
> increment it back; the count then goes negative when the fence signals,
> and I hit a bug. But now I think what I need is to just move the
> atomic_inc(&sched->hw_rq_count) from drm_sched_resubmit_jobs into
> drm_sched_start; that way I can get rid of the conditional re-increment I
> am doing now. Agree?
>
> Andrey

Yes, exactly what I had in mind after checking the code once more as well.

Christian.
*/ >>>>>>> list_for_each_entry_safe(s_job, tmp, >>>>>>> &sched->ring_mirror_list, node) { >>>>>>> struct drm_sched_fence *s_fence = s_job->s_fence; >>>>>>> @@ -514,7 +512,6 @@ int drm_sched_job_init(struct drm_sched_job >>>>>>> *job, >>>>>>> return -ENOMEM; >>>>>>> job->id = atomic64_inc_return(&sched->job_id_count); >>>>>>> - INIT_WORK(&job->finish_work, drm_sched_job_finish); >>>>>>> INIT_LIST_HEAD(&job->node); >>>>>>> return 0; >>>>>>> @@ -597,24 +594,53 @@ static void drm_sched_process_job(struct >>>>>>> dma_fence *f, struct dma_fence_cb *cb) >>>>>>> struct drm_sched_job *s_job = container_of(cb, struct >>>>>>> drm_sched_job, cb); >>>>>>> struct drm_sched_fence *s_fence = s_job->s_fence; >>>>>>> struct drm_gpu_scheduler *sched = s_fence->sched; >>>>>>> - unsigned long flags; >>>>>>> - >>>>>>> - cancel_delayed_work(&sched->work_tdr); >>>>>>> atomic_dec(&sched->hw_rq_count); >>>>>>> atomic_dec(&sched->num_jobs); >>>>>>> - spin_lock_irqsave(&sched->job_list_lock, flags); >>>>>>> - /* remove job from ring_mirror_list */ >>>>>>> - list_del_init(&s_job->node); >>>>>>> - spin_unlock_irqrestore(&sched->job_list_lock, flags); >>>>>>> + trace_drm_sched_process_job(s_fence); >>>>>>> drm_sched_fence_finished(s_fence); >>>>>>> - >>>>>>> - trace_drm_sched_process_job(s_fence); >>>>>>> wake_up_interruptible(&sched->wake_up_worker); >>>>>>> +} >>>>>>> + >>>>>>> +/** >>>>>>> + * drm_sched_cleanup_jobs - destroy finished jobs >>>>>>> + * >>>>>>> + * @sched: scheduler instance >>>>>>> + * >>>>>>> + * Remove all finished jobs from the mirror list and destroy them. >>>>>>> + */ >>>>>>> +static void drm_sched_cleanup_jobs(struct drm_gpu_scheduler *sched) >>>>>>> +{ >>>>>>> + unsigned long flags; >>>>>>> + >>>>>>> + /* Don't destroy jobs while the timeout worker is running */ >>>>>>> + if (!cancel_delayed_work(&sched->work_tdr)) >>>>>>> + return; >>>>>>> + >>>>>>> + >>>>>>> + while (!list_empty(&sched->ring_mirror_list)) { >>>>>>> + struct drm_sched_job *job; >>>>>>> + >>>>>>> + job = list_first_entry(&sched->ring_mirror_list, >>>>>>> + struct drm_sched_job, node); >>>>>>> + if (!dma_fence_is_signaled(&job->s_fence->finished)) >>>>>>> + break; >>>>>>> + >>>>>>> + spin_lock_irqsave(&sched->job_list_lock, flags); >>>>>>> + /* remove job from ring_mirror_list */ >>>>>>> + list_del_init(&job->node); >>>>>>> + spin_unlock_irqrestore(&sched->job_list_lock, flags); >>>>>>> + >>>>>>> + sched->ops->free_job(job); >>>>>>> + } >>>>>>> + >>>>>>> + /* queue timeout for next job */ >>>>>>> + spin_lock_irqsave(&sched->job_list_lock, flags); >>>>>>> + drm_sched_start_timeout(sched); >>>>>>> + spin_unlock_irqrestore(&sched->job_list_lock, flags); >>>>>>> - schedule_work(&s_job->finish_work); >>>>>>> } >>>>>>> /** >>>>>>> @@ -656,9 +682,10 @@ static int drm_sched_main(void *param) >>>>>>> struct dma_fence *fence; >>>>>>> wait_event_interruptible(sched->wake_up_worker, >>>>>>> + (drm_sched_cleanup_jobs(sched), >>>>>>> (!drm_sched_blocked(sched) && >>>>>>> (entity = >>>>>>> drm_sched_select_entity(sched))) || >>>>>>> - kthread_should_stop()); >>>>>>> + kthread_should_stop())); >>>>>>> if (!entity) >>>>>>> continue; >>>>>>> diff --git a/drivers/gpu/drm/v3d/v3d_sched.c >>>>>>> b/drivers/gpu/drm/v3d/v3d_sched.c >>>>>>> index e740f3b..1a4abe7 100644 >>>>>>> --- a/drivers/gpu/drm/v3d/v3d_sched.c >>>>>>> +++ b/drivers/gpu/drm/v3d/v3d_sched.c >>>>>>> @@ -232,7 +232,7 @@ v3d_gpu_reset_for_timeout(struct v3d_dev *v3d, >>>>>>> struct drm_sched_job *sched_job) >>>>>>> /* block scheduler */ >>>>>>> for (q = 0; q < V3D_MAX_QUEUES; q++) 
>>>>>>> - drm_sched_stop(&v3d->queue[q].sched); >>>>>>> + drm_sched_stop(&v3d->queue[q].sched, sched_job); >>>>>>> if (sched_job) >>>>>>> drm_sched_increase_karma(sched_job); >>>>>>> diff --git a/include/drm/gpu_scheduler.h >>>>>>> b/include/drm/gpu_scheduler.h >>>>>>> index 0daca4d..9ee0f27 100644 >>>>>>> --- a/include/drm/gpu_scheduler.h >>>>>>> +++ b/include/drm/gpu_scheduler.h >>>>>>> @@ -167,9 +167,6 @@ struct drm_sched_fence >>>>>>> *to_drm_sched_fence(struct dma_fence *f); >>>>>>> * @sched: the scheduler instance on which this job is scheduled. >>>>>>> * @s_fence: contains the fences for the scheduling of job. >>>>>>> * @finish_cb: the callback for the finished fence. >>>>>>> - * @finish_work: schedules the function @drm_sched_job_finish once >>>>>>> the job has >>>>>>> - * finished to remove the job from the >>>>>>> - * @drm_gpu_scheduler.ring_mirror_list. >>>>>>> * @node: used to append this struct to the >>>>>>> @drm_gpu_scheduler.ring_mirror_list. >>>>>>> * @id: a unique id assigned to each job scheduled on the >>>>>>> scheduler. >>>>>>> * @karma: increment on every hang caused by this job. If this >>>>>>> exceeds the hang >>>>>>> @@ -188,7 +185,6 @@ struct drm_sched_job { >>>>>>> struct drm_gpu_scheduler *sched; >>>>>>> struct drm_sched_fence *s_fence; >>>>>>> struct dma_fence_cb finish_cb; >>>>>>> - struct work_struct finish_work; >>>>>>> struct list_head node; >>>>>>> uint64_t id; >>>>>>> atomic_t karma; >>>>>>> @@ -296,7 +292,7 @@ int drm_sched_job_init(struct drm_sched_job *job, >>>>>>> void *owner); >>>>>>> void drm_sched_job_cleanup(struct drm_sched_job *job); >>>>>>> void drm_sched_wakeup(struct drm_gpu_scheduler *sched); >>>>>>> -void drm_sched_stop(struct drm_gpu_scheduler *sched); >>>>>>> +void drm_sched_stop(struct drm_gpu_scheduler *sched, struct >>>>>>> drm_sched_job *bad); >>>>>>> void drm_sched_start(struct drm_gpu_scheduler *sched, bool >>>>>>> full_recovery); >>>>>>> void drm_sched_resubmit_jobs(struct drm_gpu_scheduler *sched); >>>>>>> void drm_sched_increase_karma(struct drm_sched_job *bad); >>>> _______________________________________________ >>>> amd-gfx mailing list >>>> amd-gfx@lists.freedesktop.org >>>> https://lists.freedesktop.org/mailman/listinfo/amd-gfx
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
index 7cee269..a0e165c 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
@@ -3334,7 +3334,7 @@ static int amdgpu_device_pre_asic_reset(struct amdgpu_device *adev,
 		if (!ring || !ring->sched.thread)
 			continue;
 
-		drm_sched_stop(&ring->sched);
+		drm_sched_stop(&ring->sched, &job->base);
 
 		/* after all hw jobs are reset, hw fence is meaningless, so force_completion */
 		amdgpu_fence_driver_force_completion(ring);
@@ -3343,8 +3343,6 @@ static int amdgpu_device_pre_asic_reset(struct amdgpu_device *adev,
 	if(job)
 		drm_sched_increase_karma(&job->base);
 
-
-
 	if (!amdgpu_sriov_vf(adev)) {
 
 		if (!need_full_reset)
@@ -3482,8 +3480,7 @@ static int amdgpu_do_asic_reset(struct amdgpu_hive_info *hive,
 	return r;
 }
 
-static void amdgpu_device_post_asic_reset(struct amdgpu_device *adev,
-					  struct amdgpu_job *job)
+static void amdgpu_device_post_asic_reset(struct amdgpu_device *adev)
 {
 	int i;
 
@@ -3623,7 +3620,7 @@ int amdgpu_device_gpu_recover(struct amdgpu_device *adev,
 
 	/* Post ASIC reset for all devs .*/
 	list_for_each_entry(tmp_adev, device_list_handle, gmc.xgmi.head) {
-		amdgpu_device_post_asic_reset(tmp_adev, tmp_adev == adev ? job : NULL);
+		amdgpu_device_post_asic_reset(tmp_adev);
 
 		if (r) {
 			/* bad news, how to tell it to userspace ? */
diff --git a/drivers/gpu/drm/etnaviv/etnaviv_dump.c b/drivers/gpu/drm/etnaviv/etnaviv_dump.c
index 33854c9..5778d9c 100644
--- a/drivers/gpu/drm/etnaviv/etnaviv_dump.c
+++ b/drivers/gpu/drm/etnaviv/etnaviv_dump.c
@@ -135,13 +135,11 @@ void etnaviv_core_dump(struct etnaviv_gpu *gpu)
 		    mmu_size + gpu->buffer.size;
 
 	/* Add in the active command buffers */
-	spin_lock_irqsave(&gpu->sched.job_list_lock, flags);
 	list_for_each_entry(s_job, &gpu->sched.ring_mirror_list, node) {
 		submit = to_etnaviv_submit(s_job);
 		file_size += submit->cmdbuf.size;
 		n_obj++;
 	}
-	spin_unlock_irqrestore(&gpu->sched.job_list_lock, flags);
 
 	/* Add in the active buffer objects */
 	list_for_each_entry(vram, &gpu->mmu->mappings, mmu_node) {
@@ -183,14 +181,12 @@ void etnaviv_core_dump(struct etnaviv_gpu *gpu)
 			      gpu->buffer.size,
 			      etnaviv_cmdbuf_get_va(&gpu->buffer));
 
-	spin_lock_irqsave(&gpu->sched.job_list_lock, flags);
 	list_for_each_entry(s_job, &gpu->sched.ring_mirror_list, node) {
 		submit = to_etnaviv_submit(s_job);
 		etnaviv_core_dump_mem(&iter, ETDUMP_BUF_CMD,
 				      submit->cmdbuf.vaddr, submit->cmdbuf.size,
 				      etnaviv_cmdbuf_get_va(&submit->cmdbuf));
 	}
-	spin_unlock_irqrestore(&gpu->sched.job_list_lock, flags);
 
 	/* Reserve space for the bomap */
 	if (n_bomap_pages) {
diff --git a/drivers/gpu/drm/etnaviv/etnaviv_sched.c b/drivers/gpu/drm/etnaviv/etnaviv_sched.c
index 6d24fea..a813c82 100644
--- a/drivers/gpu/drm/etnaviv/etnaviv_sched.c
+++ b/drivers/gpu/drm/etnaviv/etnaviv_sched.c
@@ -109,7 +109,7 @@ static void etnaviv_sched_timedout_job(struct drm_sched_job *sched_job)
 	}
 
 	/* block scheduler */
-	drm_sched_stop(&gpu->sched);
+	drm_sched_stop(&gpu->sched, sched_job);
 
 	if(sched_job)
 		drm_sched_increase_karma(sched_job);
diff --git a/drivers/gpu/drm/lima/lima_sched.c b/drivers/gpu/drm/lima/lima_sched.c
index 97bd9c1..df98931 100644
--- a/drivers/gpu/drm/lima/lima_sched.c
+++ b/drivers/gpu/drm/lima/lima_sched.c
@@ -300,7 +300,7 @@ static struct dma_fence *lima_sched_run_job(struct drm_sched_job *job)
 static void lima_sched_handle_error_task(struct lima_sched_pipe *pipe,
 					 struct lima_sched_task *task)
 {
-	drm_sched_stop(&pipe->base);
+	drm_sched_stop(&pipe->base, &task->base);
 
 	if (task)
 		drm_sched_increase_karma(&task->base);
diff --git a/drivers/gpu/drm/panfrost/panfrost_job.c b/drivers/gpu/drm/panfrost/panfrost_job.c
index 0a7ed04..c6336b7 100644
--- a/drivers/gpu/drm/panfrost/panfrost_job.c
+++ b/drivers/gpu/drm/panfrost/panfrost_job.c
@@ -385,7 +385,7 @@ static void panfrost_job_timedout(struct drm_sched_job *sched_job)
 		sched_job);
 
 	for (i = 0; i < NUM_JOB_SLOTS; i++)
-		drm_sched_stop(&pfdev->js->queue[i].sched);
+		drm_sched_stop(&pfdev->js->queue[i].sched, sched_job);
 
 	if (sched_job)
 		drm_sched_increase_karma(sched_job);
diff --git a/drivers/gpu/drm/scheduler/sched_main.c b/drivers/gpu/drm/scheduler/sched_main.c
index 19fc601..21e8734 100644
--- a/drivers/gpu/drm/scheduler/sched_main.c
+++ b/drivers/gpu/drm/scheduler/sched_main.c
@@ -265,32 +265,6 @@ void drm_sched_resume_timeout(struct drm_gpu_scheduler *sched,
 }
 EXPORT_SYMBOL(drm_sched_resume_timeout);
 
-/* job_finish is called after hw fence signaled
- */
-static void drm_sched_job_finish(struct work_struct *work)
-{
-	struct drm_sched_job *s_job = container_of(work, struct drm_sched_job,
-						   finish_work);
-	struct drm_gpu_scheduler *sched = s_job->sched;
-	unsigned long flags;
-
-	/*
-	 * Canceling the timeout without removing our job from the ring mirror
-	 * list is safe, as we will only end up in this worker if our jobs
-	 * finished fence has been signaled. So even if some another worker
-	 * manages to find this job as the next job in the list, the fence
-	 * signaled check below will prevent the timeout to be restarted.
-	 */
-	cancel_delayed_work_sync(&sched->work_tdr);
-
-	spin_lock_irqsave(&sched->job_list_lock, flags);
-	/* queue TDR for next job */
-	drm_sched_start_timeout(sched);
-	spin_unlock_irqrestore(&sched->job_list_lock, flags);
-
-	sched->ops->free_job(s_job);
-}
-
 static void drm_sched_job_begin(struct drm_sched_job *s_job)
 {
 	struct drm_gpu_scheduler *sched = s_job->sched;
@@ -315,6 +289,13 @@ static void drm_sched_job_timedout(struct work_struct *work)
 	if (job)
 		job->sched->ops->timedout_job(job);
 
+	/*
+	 * Guilty job did complete and hence needs to be manually removed
+	 * See drm_sched_stop doc.
+	 */
+	if (list_empty(&job->node))
+		job->sched->ops->free_job(job);
+
 	spin_lock_irqsave(&sched->job_list_lock, flags);
 	drm_sched_start_timeout(sched);
 	spin_unlock_irqrestore(&sched->job_list_lock, flags);
@@ -371,23 +352,26 @@ EXPORT_SYMBOL(drm_sched_increase_karma);
  * @sched: scheduler instance
  * @bad: bad scheduler job
  *
+ * Stop the scheduler and also removes and frees all completed jobs.
+ * Note: bad job will not be freed as it might be used later and so it's
+ * callers responsibility to release it manually if it's not part of the
+ * mirror list any more.
+ *
  */
-void drm_sched_stop(struct drm_gpu_scheduler *sched)
+void drm_sched_stop(struct drm_gpu_scheduler *sched, struct drm_sched_job *bad)
 {
-	struct drm_sched_job *s_job;
+	struct drm_sched_job *s_job, *tmp;
 	unsigned long flags;
-	struct dma_fence *last_fence =  NULL;
 
 	kthread_park(sched->thread);
 
 	/*
-	 * Verify all the signaled jobs in mirror list are removed from the ring
-	 * by waiting for the latest job to enter the list. This should insure that
-	 * also all the previous jobs that were in flight also already singaled
-	 * and removed from the list.
+	 * Iterate the job list from later to earlier one and either deactive
+	 * their HW callbacks or remove them from mirror list if they already
+	 * signaled.
+	 * This iteration is thread safe as sched thread is stopped.
 	 */
-	spin_lock_irqsave(&sched->job_list_lock, flags);
-	list_for_each_entry_reverse(s_job, &sched->ring_mirror_list, node) {
+	list_for_each_entry_safe_reverse(s_job, tmp, &sched->ring_mirror_list, node) {
 		if (s_job->s_fence->parent &&
 		    dma_fence_remove_callback(s_job->s_fence->parent,
 					      &s_job->cb)) {
@@ -395,16 +379,30 @@ void drm_sched_stop(struct drm_gpu_scheduler *sched)
 			s_job->s_fence->parent = NULL;
 			atomic_dec(&sched->hw_rq_count);
 		} else {
-			last_fence = dma_fence_get(&s_job->s_fence->finished);
-			break;
+			/*
+			 * remove job from ring_mirror_list.
+			 * Locking here is for concurrent resume timeout
+			 */
+			spin_lock_irqsave(&sched->job_list_lock, flags);
+			list_del_init(&s_job->node);
+			spin_unlock_irqrestore(&sched->job_list_lock, flags);
+
+			/*
+			 * Wait for job's HW fence callback to finish using s_job
+			 * before releasing it.
+			 *
+			 * Job is still alive so fence refcount at least 1
+			 */
+			dma_fence_wait(&s_job->s_fence->finished, false);
+
+			/*
+			 * We must keep bad job alive for later use during
+			 * recovery by some of the drivers
+			 */
+			if (bad != s_job)
+				sched->ops->free_job(s_job);
 		}
 	}
-	spin_unlock_irqrestore(&sched->job_list_lock, flags);
-
-	if (last_fence) {
-		dma_fence_wait(last_fence, false);
-		dma_fence_put(last_fence);
-	}
 }
 
 EXPORT_SYMBOL(drm_sched_stop);
 
@@ -418,6 +416,7 @@ EXPORT_SYMBOL(drm_sched_stop);
 void drm_sched_start(struct drm_gpu_scheduler *sched, bool full_recovery)
 {
 	struct drm_sched_job *s_job, *tmp;
+	unsigned long flags;
 	int r;
 
 	if (!full_recovery)
@@ -425,9 +424,7 @@ void drm_sched_start(struct drm_gpu_scheduler *sched, bool full_recovery)
 
 	/*
 	 * Locking the list is not required here as the sched thread is parked
-	 * so no new jobs are being pushed in to HW and in drm_sched_stop we
-	 * flushed all the jobs who were still in mirror list but who already
-	 * signaled and removed them self from the list. Also concurrent
+	 * so no new jobs are being inserted or removed. Also concurrent
	 * GPU recovers can't run in parallel.
	 */
 	list_for_each_entry_safe(s_job, tmp, &sched->ring_mirror_list, node) {
@@ -445,7 +442,9 @@ void drm_sched_start(struct drm_gpu_scheduler *sched, bool full_recovery)
 			drm_sched_process_job(NULL, &s_job->cb);
 	}
 
+	spin_lock_irqsave(&sched->job_list_lock, flags);
 	drm_sched_start_timeout(sched);
+	spin_unlock_irqrestore(&sched->job_list_lock, flags);
 
 unpark:
 	kthread_unpark(sched->thread);
@@ -464,7 +463,6 @@ void drm_sched_resubmit_jobs(struct drm_gpu_scheduler *sched)
 	uint64_t guilty_context;
 	bool found_guilty = false;
 
-	/*TODO DO we need spinlock here ? */
 	list_for_each_entry_safe(s_job, tmp, &sched->ring_mirror_list, node) {
 		struct drm_sched_fence *s_fence = s_job->s_fence;
 
@@ -514,7 +512,6 @@ int drm_sched_job_init(struct drm_sched_job *job,
 		return -ENOMEM;
 
 	job->id = atomic64_inc_return(&sched->job_id_count);
-	INIT_WORK(&job->finish_work, drm_sched_job_finish);
 	INIT_LIST_HEAD(&job->node);
 
 	return 0;
@@ -597,24 +594,53 @@ static void drm_sched_process_job(struct dma_fence *f, struct dma_fence_cb *cb)
 	struct drm_sched_job *s_job = container_of(cb, struct drm_sched_job, cb);
 	struct drm_sched_fence *s_fence = s_job->s_fence;
 	struct drm_gpu_scheduler *sched = s_fence->sched;
-	unsigned long flags;
-
-	cancel_delayed_work(&sched->work_tdr);
 
 	atomic_dec(&sched->hw_rq_count);
 	atomic_dec(&sched->num_jobs);
 
-	spin_lock_irqsave(&sched->job_list_lock, flags);
-	/* remove job from ring_mirror_list */
-	list_del_init(&s_job->node);
-	spin_unlock_irqrestore(&sched->job_list_lock, flags);
+	trace_drm_sched_process_job(s_fence);
 
 	drm_sched_fence_finished(s_fence);
-
-	trace_drm_sched_process_job(s_fence);
 	wake_up_interruptible(&sched->wake_up_worker);
+}
+
+/**
+ * drm_sched_cleanup_jobs - destroy finished jobs
+ *
+ * @sched: scheduler instance
+ *
+ * Remove all finished jobs from the mirror list and destroy them.
+ */
+static void drm_sched_cleanup_jobs(struct drm_gpu_scheduler *sched)
+{
+	unsigned long flags;
+
+	/* Don't destroy jobs while the timeout worker is running */
+	if (!cancel_delayed_work(&sched->work_tdr))
+		return;
+
+
+	while (!list_empty(&sched->ring_mirror_list)) {
+		struct drm_sched_job *job;
+
+		job = list_first_entry(&sched->ring_mirror_list,
+				       struct drm_sched_job, node);
+		if (!dma_fence_is_signaled(&job->s_fence->finished))
+			break;
+
+		spin_lock_irqsave(&sched->job_list_lock, flags);
+		/* remove job from ring_mirror_list */
+		list_del_init(&job->node);
+		spin_unlock_irqrestore(&sched->job_list_lock, flags);
+
+		sched->ops->free_job(job);
+	}
+
+	/* queue timeout for next job */
+	spin_lock_irqsave(&sched->job_list_lock, flags);
+	drm_sched_start_timeout(sched);
+	spin_unlock_irqrestore(&sched->job_list_lock, flags);
-
-	schedule_work(&s_job->finish_work);
 }
 
 /**
@@ -656,9 +682,10 @@ static int drm_sched_main(void *param)
 		struct dma_fence *fence;
 
 		wait_event_interruptible(sched->wake_up_worker,
+					 (drm_sched_cleanup_jobs(sched),
 					 (!drm_sched_blocked(sched) &&
 					  (entity = drm_sched_select_entity(sched))) ||
-					 kthread_should_stop());
+					 kthread_should_stop()));
 
 		if (!entity)
 			continue;
diff --git a/drivers/gpu/drm/v3d/v3d_sched.c b/drivers/gpu/drm/v3d/v3d_sched.c
index e740f3b..1a4abe7 100644
--- a/drivers/gpu/drm/v3d/v3d_sched.c
+++ b/drivers/gpu/drm/v3d/v3d_sched.c
@@ -232,7 +232,7 @@ v3d_gpu_reset_for_timeout(struct v3d_dev *v3d, struct drm_sched_job *sched_job)
 
 	/* block scheduler */
 	for (q = 0; q < V3D_MAX_QUEUES; q++)
-		drm_sched_stop(&v3d->queue[q].sched);
+		drm_sched_stop(&v3d->queue[q].sched, sched_job);
 
 	if (sched_job)
 		drm_sched_increase_karma(sched_job);
diff --git a/include/drm/gpu_scheduler.h b/include/drm/gpu_scheduler.h
index 0daca4d..9ee0f27 100644
--- a/include/drm/gpu_scheduler.h
+++ b/include/drm/gpu_scheduler.h
@@ -167,9 +167,6 @@ struct drm_sched_fence *to_drm_sched_fence(struct dma_fence *f);
  * @sched: the scheduler instance on which this job is scheduled.
  * @s_fence: contains the fences for the scheduling of job.
  * @finish_cb: the callback for the finished fence.
- * @finish_work: schedules the function @drm_sched_job_finish once the job has
- *               finished to remove the job from the
- *               @drm_gpu_scheduler.ring_mirror_list.
  * @node: used to append this struct to the @drm_gpu_scheduler.ring_mirror_list.
  * @id: a unique id assigned to each job scheduled on the scheduler.
  * @karma: increment on every hang caused by this job. If this exceeds the hang
@@ -188,7 +185,6 @@ struct drm_sched_job {
 	struct drm_gpu_scheduler	*sched;
 	struct drm_sched_fence		*s_fence;
 	struct dma_fence_cb		finish_cb;
-	struct work_struct		finish_work;
 	struct list_head		node;
 	uint64_t			id;
 	atomic_t			karma;
@@ -296,7 +292,7 @@ int drm_sched_job_init(struct drm_sched_job *job,
 		       void *owner);
 void drm_sched_job_cleanup(struct drm_sched_job *job);
 void drm_sched_wakeup(struct drm_gpu_scheduler *sched);
-void drm_sched_stop(struct drm_gpu_scheduler *sched);
+void drm_sched_stop(struct drm_gpu_scheduler *sched, struct drm_sched_job *bad);
 void drm_sched_start(struct drm_gpu_scheduler *sched, bool full_recovery);
 void drm_sched_resubmit_jobs(struct drm_gpu_scheduler *sched);
 void drm_sched_increase_karma(struct drm_sched_job *bad);
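For reviewers following the reworked flow, here is a minimal sketch of what a driver's timedout_job callback looks like on top of this series (illustration only: foo_device, to_foo_device() and foo_hw_reset() are made-up placeholders, and the actual drivers differ in how they reset the hardware):

static void foo_timedout_job(struct drm_sched_job *sched_job)
{
	struct foo_device *fdev = to_foo_device(sched_job->sched);

	/*
	 * Park the scheduler thread and free every job that already
	 * signaled. sched_job itself is kept alive; if it signaled and
	 * was therefore removed from the mirror list, the scheduler
	 * core frees it after this callback returns (see the
	 * list_empty() check added to drm_sched_job_timedout()).
	 */
	drm_sched_stop(sched_job->sched, sched_job);

	if (sched_job)
		drm_sched_increase_karma(sched_job);

	/* driver specific hardware recovery */
	foo_hw_reset(fdev);

	drm_sched_resubmit_jobs(sched_job->sched);
	drm_sched_start(sched_job->sched, true);
}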