drm/scheduler: Add stopped flag to drm_sched_entity

Message ID	1534518942-13167-1-git-send-email-andrey.grodzovsky@amd.com (mailing list archive)
State	New, archived
Headers	show Return-Path: <dri-devel-bounces@lists.freedesktop.org> Received-SPF: None (protection.outlook.com: amd.com does not designate permitted sender hosts) From: Andrey Grodzovsky <andrey.grodzovsky@amd.com> To: <dri-devel@lists.freedesktop.org> Subject: [PATCH] drm/scheduler: Add stopped flag to drm_sched_entity Date: Fri, 17 Aug 2018 11:15:42 -0400 Message-ID: <1534518942-13167-1-git-send-email-andrey.grodzovsky@amd.com> MIME-Version: 1.0 SpamDiagnosticOutput: 1:99 SpamDiagnosticMetadata: NSPM X-Microsoft-Exchange-Diagnostics: 1; DM2PR12MB0249; 20:op5SSM8fBJQrbi5xXiKjdkvWeGnGLHCykvqp2BZmLP3YoW1TBPQb/ejNNDe4hy3xgK8Sq1MW3d57wj3dJZB53I3DjEB4OEMlu89j8wYjZZFrhWqT7GBL9OctYiUz8gmjX1M5+foHN9M1f8/L/6/TIkUergg/r1N9RqqQHEYUDFtHOVcj6WCis5RdH/liGO+4vWCjEBibWM58ATOvtnwlECUuOxeF0xmMNtMJtXTW0+BvbmxH/DXlTbOAlBTcSnBd Precedence: list Cc: christian.koenig@amd.com, amd-gfx@lists.freedesktop.org Content-Type: text/plain; charset="utf-8" Content-Transfer-Encoding: base64 Errors-To: dri-devel-bounces@lists.freedesktop.org Sender: "dri-devel" <dri-devel-bounces@lists.freedesktop.org>
Series	drm/scheduler: Add stopped flag to drm_sched_entity \| expand drm/scheduler: Add stopped flag to drm_sched_entity

Message ID

1534518942-13167-1-git-send-email-andrey.grodzovsky@amd.com (mailing list archive)

State

New, archived

Headers

Received-SPF: None (protection.outlook.com: amd.com does not designate
 permitted sender hosts)
From: Andrey Grodzovsky <andrey.grodzovsky@amd.com>
To: <dri-devel@lists.freedesktop.org>
Subject: [PATCH] drm/scheduler: Add stopped flag to drm_sched_entity
Date: Fri, 17 Aug 2018 11:15:42 -0400
Message-ID: <1534518942-13167-1-git-send-email-andrey.grodzovsky@amd.com>
MIME-Version: 1.0
SpamDiagnosticOutput: 1:99
SpamDiagnosticMetadata: NSPM
X-MS-Exchange-CrossTenant-OriginalArrivalTime: 17 Aug 2018 15:15:51.3282 (UTC)
X-MS-Exchange-CrossTenant-Network-Message-Id: 
 ac2c89d6-a3d6-48ff-fc96-08d6045451c2
X-MS-Exchange-CrossTenant-Id: 3dd8961f-e488-4e60-8e11-a82d994e183d
X-MS-Exchange-CrossTenant-OriginalAttributedTenantConnectingIp: 
 TenantId=3dd8961f-e488-4e60-8e11-a82d994e183d; Ip=[165.204.84.17];
 Helo=[SATLEXCHOV01.amd.com]
X-MS-Exchange-CrossTenant-FromEntityHeader: HybridOnPrem
X-MS-Exchange-Transport-CrossTenantHeadersStamped: DM2PR12MB0249
X-BeenThere: dri-devel@lists.freedesktop.org
X-Mailman-Version: 2.1.23
Precedence: list
List-Id: Direct Rendering Infrastructure - Development
 <dri-devel.lists.freedesktop.org>
List-Unsubscribe: <https://lists.freedesktop.org/mailman/options/dri-devel>,
 <mailto:dri-devel-request@lists.freedesktop.org?subject=unsubscribe>
List-Archive: <https://lists.freedesktop.org/archives/dri-devel>
List-Post: <mailto:dri-devel@lists.freedesktop.org>
List-Help: <mailto:dri-devel-request@lists.freedesktop.org?subject=help>
List-Subscribe: <https://lists.freedesktop.org/mailman/listinfo/dri-devel>,
 <mailto:dri-devel-request@lists.freedesktop.org?subject=subscribe>
Cc: christian.koenig@amd.com, amd-gfx@lists.freedesktop.org
Content-Type: text/plain; charset="utf-8"
Content-Transfer-Encoding: base64
Errors-To: dri-devel-bounces@lists.freedesktop.org
Sender: "dri-devel" <dri-devel-bounces@lists.freedesktop.org>
X-Virus-Scanned: ClamAV using ClamSMTP

Series

drm/scheduler: Add stopped flag to drm_sched_entity | expand

Commit Message

Andrey Grodzovsky Aug. 17, 2018, 3:15 p.m. UTC

The flag will prevent another thread from same process to
reinsert the entity queue into scheduler's rq after it was already
removed from there by another thread during drm_sched_entity_flush.

Signed-off-by: Andrey Grodzovsky <andrey.grodzovsky@amd.com>
---
 drivers/gpu/drm/scheduler/sched_entity.c | 10 +++++++++-
 include/drm/gpu_scheduler.h              |  2 ++
 2 files changed, 11 insertions(+), 1 deletion(-)

Comments

Chunming Zhou Aug. 20, 2018, 2:49 a.m. UTC | #1

-----Original Message-----
From: dri-devel <dri-devel-bounces@lists.freedesktop.org> On Behalf Of Andrey Grodzovsky
Sent: Friday, August 17, 2018 11:16 PM
To: dri-devel@lists.freedesktop.org
Cc: Koenig, Christian <Christian.Koenig@amd.com>; amd-gfx@lists.freedesktop.org
Subject: [PATCH] drm/scheduler: Add stopped flag to drm_sched_entity

The flag will prevent another thread from same process to reinsert the entity queue into scheduler's rq after it was already removed from there by another thread during drm_sched_entity_flush.

Signed-off-by: Andrey Grodzovsky <andrey.grodzovsky@amd.com>
---
 drivers/gpu/drm/scheduler/sched_entity.c | 10 +++++++++-
 include/drm/gpu_scheduler.h              |  2 ++
 2 files changed, 11 insertions(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/scheduler/sched_entity.c b/drivers/gpu/drm/scheduler/sched_entity.c
index 1416edb..07cfe63 100644
--- a/drivers/gpu/drm/scheduler/sched_entity.c
+++ b/drivers/gpu/drm/scheduler/sched_entity.c
@@ -177,8 +177,12 @@ long drm_sched_entity_flush(struct drm_sched_entity *entity, long timeout)
 	/* For killed process disable any more IBs enqueue right now */
 	last_user = cmpxchg(&entity->last_user, current->group_leader, NULL);
 	if ((!last_user || last_user == current->group_leader) &&
-	    (current->flags & PF_EXITING) && (current->exit_code == SIGKILL))
+	    (current->flags & PF_EXITING) && (current->exit_code == SIGKILL)) {
+		spin_lock(&entity->rq_lock);
+		entity->stopped = true;
 		drm_sched_rq_remove_entity(entity->rq, entity);
+		spin_unlock(&entity->rq_lock);
+	}
 
 	return ret;
 }
@@ -504,6 +508,10 @@ void drm_sched_entity_push_job(struct drm_sched_job *sched_job,
 	if (first) {
 		/* Add the entity to the run queue */
 		spin_lock(&entity->rq_lock);
+		if (entity->stopped) {
+			spin_unlock(&entity->rq_lock);
+			return;
+		}
[DZ] the code changes so frequent recently and has this regression, my code synced last Friday still has below checking:
                spin_lock(&entity->rq_lock);
                if (!entity->rq) {
                        DRM_ERROR("Trying to push to a killed entity\n");
                        spin_unlock(&entity->rq_lock);
                        return;
                }
So you should add DRM_ERROR as well when hitting it.

With that fix, patch is Reviewed-by: Chunming Zhou <david1.zhou@amd.com>

Regards,
David Zhou
 		drm_sched_rq_add_entity(entity->rq, entity);
 		spin_unlock(&entity->rq_lock);
 		drm_sched_wakeup(entity->rq->sched);
diff --git a/include/drm/gpu_scheduler.h b/include/drm/gpu_scheduler.h index 919ae57..daec50f 100644
--- a/include/drm/gpu_scheduler.h
+++ b/include/drm/gpu_scheduler.h
@@ -70,6 +70,7 @@ enum drm_sched_priority {
  * @fini_status: contains the exit status in case the process was signalled.
  * @last_scheduled: points to the finished fence of the last scheduled job.
  * @last_user: last group leader pushing a job into the entity.
+ * @stopped: Marks the enity as removed from rq and destined for termination.
  *
  * Entities will emit jobs in order to their corresponding hardware
  * ring, and the scheduler will alternate between entities based on @@ -92,6 +93,7 @@ struct drm_sched_entity {
 	atomic_t			*guilty;
 	struct dma_fence                *last_scheduled;
 	struct task_struct		*last_user;
+	bool 				stopped;
 };
 
 /**
--
2.7.4

diff --git a/drivers/gpu/drm/scheduler/sched_entity.c b/drivers/gpu/drm/scheduler/sched_entity.c
index 1416edb..07cfe63 100644
--- a/drivers/gpu/drm/scheduler/sched_entity.c
+++ b/drivers/gpu/drm/scheduler/sched_entity.c
@@ -177,8 +177,12 @@  long drm_sched_entity_flush(struct drm_sched_entity *entity, long timeout)
 	/* For killed process disable any more IBs enqueue right now */
 	last_user = cmpxchg(&entity->last_user, current->group_leader, NULL);
 	if ((!last_user || last_user == current->group_leader) &&
-	    (current->flags & PF_EXITING) && (current->exit_code == SIGKILL))
+	    (current->flags & PF_EXITING) && (current->exit_code == SIGKILL)) {
+		spin_lock(&entity->rq_lock);
+		entity->stopped = true;
 		drm_sched_rq_remove_entity(entity->rq, entity);
+		spin_unlock(&entity->rq_lock);
+	}
 
 	return ret;
 }
@@ -504,6 +508,10 @@  void drm_sched_entity_push_job(struct drm_sched_job *sched_job,
 	if (first) {
 		/* Add the entity to the run queue */
 		spin_lock(&entity->rq_lock);
+		if (entity->stopped) {
+			spin_unlock(&entity->rq_lock);
+			return;
+		}
 		drm_sched_rq_add_entity(entity->rq, entity);
 		spin_unlock(&entity->rq_lock);
 		drm_sched_wakeup(entity->rq->sched);
diff --git a/include/drm/gpu_scheduler.h b/include/drm/gpu_scheduler.h
index 919ae57..daec50f 100644
--- a/include/drm/gpu_scheduler.h
+++ b/include/drm/gpu_scheduler.h
@@ -70,6 +70,7 @@  enum drm_sched_priority {
  * @fini_status: contains the exit status in case the process was signalled.
  * @last_scheduled: points to the finished fence of the last scheduled job.
  * @last_user: last group leader pushing a job into the entity.
+ * @stopped: Marks the enity as removed from rq and destined for termination.
  *
  * Entities will emit jobs in order to their corresponding hardware
  * ring, and the scheduler will alternate between entities based on
@@ -92,6 +93,7 @@  struct drm_sched_entity {
 	atomic_t			*guilty;
 	struct dma_fence                *last_scheduled;
 	struct task_struct		*last_user;
+	bool 				stopped;
 };
 
 /**

drm/scheduler: Add stopped flag to drm_sched_entity

Commit Message

Comments

Patch