From patchwork Thu Nov  8 13:42:34 2018
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Sharat Masetty <smasetty@codeaurora.org>
X-Patchwork-Id: 10674221
X-Patchwork-Delegate: agross@codeaurora.org
Return-Path: <linux-arm-msm-owner@kernel.org>
Received: from mail.wl.linuxfoundation.org (pdx-wl-mail.web.codeaurora.org
 [172.30.200.125])
	by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id 9FA21175A
	for <patchwork-linux-arm-msm@patchwork.kernel.org>;
 Thu,  8 Nov 2018 13:42:46 +0000 (UTC)
Received: from mail.wl.linuxfoundation.org (localhost [127.0.0.1])
	by mail.wl.linuxfoundation.org (Postfix) with ESMTP id 8E8102DAD5
	for <patchwork-linux-arm-msm@patchwork.kernel.org>;
 Thu,  8 Nov 2018 13:42:46 +0000 (UTC)
Received: by mail.wl.linuxfoundation.org (Postfix, from userid 486)
	id 82E9C2DB00; Thu,  8 Nov 2018 13:42:46 +0000 (UTC)
X-Spam-Checker-Version: SpamAssassin 3.3.1 (2010-03-16) on
	pdx-wl-mail.web.codeaurora.org
X-Spam-Level: 
X-Spam-Status: No, score=-7.0 required=2.0 tests=BAYES_00,DKIM_INVALID,
	DKIM_SIGNED,MAILING_LIST_MULTI,RCVD_IN_DNSWL_HI,SUBJ_OBFU_PUNCT_FEW
	autolearn=ham version=3.3.1
Received: from vger.kernel.org (vger.kernel.org [209.132.180.67])
	by mail.wl.linuxfoundation.org (Postfix) with ESMTP id CAF762DAD5
	for <patchwork-linux-arm-msm@patchwork.kernel.org>;
 Thu,  8 Nov 2018 13:42:45 +0000 (UTC)
Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand
        id S1727059AbeKHXSQ (ORCPT
        <rfc822;patchwork-linux-arm-msm@patchwork.kernel.org>);
        Thu, 8 Nov 2018 18:18:16 -0500
Received: from smtp.codeaurora.org ([198.145.29.96]:39056 "EHLO
        smtp.codeaurora.org" rhost-flags-OK-OK-OK-OK) by vger.kernel.org
        with ESMTP id S1726618AbeKHXSQ (ORCPT
        <rfc822;linux-arm-msm@vger.kernel.org>);
        Thu, 8 Nov 2018 18:18:16 -0500
Received: by smtp.codeaurora.org (Postfix, from userid 1000)
        id 0E155607F7; Thu,  8 Nov 2018 13:42:43 +0000 (UTC)
DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=codeaurora.org;
        s=default; t=1541684563;
        bh=ouhei/jMEJa85g6d+x/qwtqKAnae3vVBwaPh7hFRQD0=;
        h=From:To:Cc:Subject:Date:From;
        b=M5tvuAYZsNUUuqZ9g1H+BfssNWifWvBflHBcj0vJYO+ka5iMLT92AUMIfzR8UUu6K
         lLiEaGcj7AJCpkFBYsW4IHLdm0QGVXyJXqTGE1xCemqy3FrETvWKDhtRLLiA8L7pEA
         WW/LpGonP2GwUq9On6Sqil+JNLvle1r6pScVBREo=
Received: from smasetty-linux.qualcomm.com
 (blr-c-bdr-fw-01_globalnat_allzones-outside.qualcomm.com [103.229.19.19])
        (using TLSv1.2 with cipher ECDHE-RSA-AES128-SHA256 (128/128 bits))
        (No client certificate requested)
        (Authenticated sender: smasetty@smtp.codeaurora.org)
        by smtp.codeaurora.org (Postfix) with ESMTPSA id ACDDC6038E;
        Thu,  8 Nov 2018 13:42:40 +0000 (UTC)
DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=codeaurora.org;
        s=default; t=1541684562;
        bh=ouhei/jMEJa85g6d+x/qwtqKAnae3vVBwaPh7hFRQD0=;
        h=From:To:Cc:Subject:Date:From;
        b=TkPf6BY36AhFOAEuj9gxNBh2tXmZvLpdv3KqbqkFG6Lrd2xWFQVaeGHvR0yHYftYH
         OgEv16z+Reuy4TrMC2djleycLi8ICpo0cTQ0hba6RIWbVsNdyZ/bhas7+ORf2qapnB
         Y6JZwDgAmcur6Hss+zT0/63AvFkXJ2Rdo/mHJjZ4=
DMARC-Filter: OpenDMARC Filter v1.3.2 smtp.codeaurora.org ACDDC6038E
Authentication-Results: pdx-caf-mail.web.codeaurora.org;
 dmarc=none (p=none dis=none) header.from=codeaurora.org
Authentication-Results: pdx-caf-mail.web.codeaurora.org;
 spf=none smtp.mailfrom=smasetty@codeaurora.org
From: Sharat Masetty <smasetty@codeaurora.org>
To: Christian.Koenig@amd.com, freedreno@lists.freedesktop.org
Cc: dri-devel@lists.freedesktop.org, linux-arm-msm@vger.kernel.org,
        jcrouse@codeaurora.org, Sharat Masetty <smasetty@codeaurora.org>
Subject: [PATCH] drm/scheduler: Add drm_sched_suspend/resume timeout functions
Date: Thu,  8 Nov 2018 19:12:34 +0530
Message-Id: <1541684554-17115-1-git-send-email-smasetty@codeaurora.org>
X-Mailer: git-send-email 1.9.1
Sender: linux-arm-msm-owner@vger.kernel.org
Precedence: bulk
List-ID: <linux-arm-msm.vger.kernel.org>
X-Mailing-List: linux-arm-msm@vger.kernel.org
X-Virus-Scanned: ClamAV using ClamSMTP

Hi Christian,

Can you please review this patch? It is a continuation of the discussion at [1].
At first I considered cancelling the delayed work on suspend instead of modifying
it (to an arbitrarily large value), but I could not make that fit, as I have the
additional constraint of being able to call these functions from IRQ context.
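
To make the intended bookkeeping easier to discuss, here is a rough user-space
sketch of the suspend/resume idea. It is only a model under my own assumptions:
struct tdr_model and the model_*() helpers are hypothetical names and not part
of the scheduler, and model_time_after() merely mimics the kernel's wrap-safe
time_after(). The point it shows is that the deadline and the current time are
compared as absolute values first and subtracted only afterwards.

```c
#include <assert.h>
#include <stdbool.h>

/* User-space stand-in for the kernel's wrap-safe time_after(). */
#define model_time_after(a, b) ((long)((b) - (a)) < 0)

/* Hypothetical model of the timeout state; not the kernel structure. */
struct tdr_model {
	unsigned long timeout;    /* full timeout, in ticks */
	unsigned long expires;    /* absolute tick at which the timer fires */
	unsigned long remaining;  /* ticks left when the timer was suspended */
	bool suspended;
};

/* Arm the timer for a full period, as when a job starts. */
void model_start(struct tdr_model *m, unsigned long now)
{
	m->remaining = m->timeout;
	m->suspended = false;
	m->expires = now + m->timeout;
}

/* Record the time left, then push the deadline far into the future. */
void model_suspend(struct tdr_model *m, unsigned long now)
{
	if (m->suspended)
		return;

	/* Compare absolute values first, subtract only afterwards. */
	m->remaining = model_time_after(m->expires, now) ? m->expires - now : 0;
	m->expires = now + m->timeout * 10;
	m->suspended = true;
}

/* Re-arm the timer with whatever was left at suspend time. */
void model_resume(struct tdr_model *m, unsigned long now)
{
	if (!m->suspended)
		return;

	m->expires = now + m->remaining;
	m->suspended = false;
}

/* Walk through start, suspend, resume; returns the final deadline offset. */
unsigned long model_example(void)
{
	/* Start close to the counter's wrap point to exercise wrap-around. */
	unsigned long t0 = (unsigned long)-30;
	struct tdr_model m = { .timeout = 100 };

	model_start(&m, t0);            /* deadline at t0 + 100 (wrapped) */
	model_suspend(&m, t0 + 60);     /* 40 ticks were left */
	assert(m.remaining == 40);

	model_resume(&m, t0 + 500);     /* deadline becomes t0 + 540 */
	return m.expires - t0;
}
```

The model leaves out locking and the workqueue entirely; it only captures the
arithmetic, including the case where the tick counter wraps between start and
suspend.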

These new functions race with other contexts, primarily finish_job(),
timedout_job() and recovery(). I have gone through the possible races between
these, though I may have missed some. Please let me know what you think. I have
not tested this yet; if this is a step in the right direction, I will put it
through my testing drill and polish it.

IMO the callback approach is preferable, as it appears to be simple and less
error prone for both the scheduler and the drivers.

[1]  https://patchwork.freedesktop.org/patch/259914/

Signed-off-by: Sharat Masetty <smasetty@codeaurora.org>
---
 drivers/gpu/drm/scheduler/sched_main.c | 81 +++++++++++++++++++++++++++++++++-
 include/drm/gpu_scheduler.h            |  5 +++
 2 files changed, 85 insertions(+), 1 deletion(-)

--
1.9.1

diff --git a/drivers/gpu/drm/scheduler/sched_main.c b/drivers/gpu/drm/scheduler/sched_main.c
index c993d10..9645789 100644
--- a/drivers/gpu/drm/scheduler/sched_main.c
+++ b/drivers/gpu/drm/scheduler/sched_main.c
@@ -191,11 +191,84 @@ bool drm_sched_dependency_optimized(struct dma_fence* fence,
  */
 static void drm_sched_start_timeout(struct drm_gpu_scheduler *sched)
 {
+	unsigned long flags;
+
+	spin_lock_irqsave(&sched->tdr_suspend_lock, flags);
+
+	sched->timeout_remaining = sched->timeout;
+
 	if (sched->timeout != MAX_SCHEDULE_TIMEOUT &&
-	    !list_empty(&sched->ring_mirror_list))
+	    !list_empty(&sched->ring_mirror_list) && !sched->work_tdr_suspended)
 		schedule_delayed_work(&sched->work_tdr, sched->timeout);
+
+	spin_unlock_irqrestore(&sched->tdr_suspend_lock, flags);
 }

+/**
+ * drm_sched_suspend_timeout - suspend timeout for reset worker
+ *
+ * @sched: scheduler instance for which to suspend the timeout
+ *
+ * Suspend the delayed work timeout for the scheduler. Note that
+ * this function can be called from an IRQ context.
+ */
+void drm_sched_suspend_timeout(struct drm_gpu_scheduler *sched)
+{
+	unsigned long flags, timeout;
+
+	spin_lock_irqsave(&sched->tdr_suspend_lock, flags);
+
+	if (sched->work_tdr_suspended ||
+			sched->timeout == MAX_SCHEDULE_TIMEOUT ||
+			list_empty(&sched->ring_mirror_list))
+		goto done;
+
+	timeout = sched->work_tdr.timer.expires;
+
+	/*
+	 * Reset timeout to an arbitrarily large value
+	 */
+	mod_delayed_work(system_wq, &sched->work_tdr, sched->timeout * 10);
+
+	/* Remaining time is zero if the timer has already expired */
+	sched->timeout_remaining = time_after(timeout, jiffies) ?
+					timeout - jiffies : 0;
+
+	sched->work_tdr_suspended = true;
+
+done:
+	spin_unlock_irqrestore(&sched->tdr_suspend_lock, flags);
+}
+EXPORT_SYMBOL(drm_sched_suspend_timeout);
+
+/**
+ * drm_sched_resume_timeout - resume timeout for reset worker
+ *
+ * @sched: scheduler instance for which to resume the timeout
+ *
+ * Resume the delayed work timeout for the scheduler. Note that
+ * this function can be called from an IRQ context.
+ */
+void drm_sched_resume_timeout(struct drm_gpu_scheduler *sched)
+{
+	unsigned long flags;
+
+	spin_lock_irqsave(&sched->tdr_suspend_lock, flags);
+
+	if (!sched->work_tdr_suspended ||
+			sched->timeout == MAX_SCHEDULE_TIMEOUT) {
+		spin_unlock_irqrestore(&sched->tdr_suspend_lock, flags);
+		return;
+	}
+
+	mod_delayed_work(system_wq, &sched->work_tdr, sched->timeout_remaining);
+
+	sched->work_tdr_suspended = false;
+
+	spin_unlock_irqrestore(&sched->tdr_suspend_lock, flags);
+}
+EXPORT_SYMBOL(drm_sched_resume_timeout);
+
 /* job_finish is called after hw fence signaled
  */
 static void drm_sched_job_finish(struct work_struct *work)
@@ -348,6 +421,11 @@ void drm_sched_job_recovery(struct drm_gpu_scheduler *sched)
 	struct drm_sched_job *s_job, *tmp;
 	bool found_guilty = false;
 	int r;
+	unsigned long flags;
+
+	spin_lock_irqsave(&sched->tdr_suspend_lock, flags);
+	sched->work_tdr_suspended = false;
+	spin_unlock_irqrestore(&sched->tdr_suspend_lock, flags);

 	spin_lock(&sched->job_list_lock);
 	list_for_each_entry_safe(s_job, tmp, &sched->ring_mirror_list, node) {
@@ -607,6 +685,7 @@ int drm_sched_init(struct drm_gpu_scheduler *sched,
 	init_waitqueue_head(&sched->job_scheduled);
 	INIT_LIST_HEAD(&sched->ring_mirror_list);
 	spin_lock_init(&sched->job_list_lock);
+	spin_lock_init(&sched->tdr_suspend_lock);
 	atomic_set(&sched->hw_rq_count, 0);
 	INIT_DELAYED_WORK(&sched->work_tdr, drm_sched_job_timedout);
 	atomic_set(&sched->num_jobs, 0);
diff --git a/include/drm/gpu_scheduler.h b/include/drm/gpu_scheduler.h
index d87b268..5d39572 100644
--- a/include/drm/gpu_scheduler.h
+++ b/include/drm/gpu_scheduler.h
@@ -278,6 +278,9 @@ struct drm_gpu_scheduler {
 	atomic_t			hw_rq_count;
 	atomic64_t			job_id_count;
 	struct delayed_work		work_tdr;
+	unsigned long			timeout_remaining;
+	bool				work_tdr_suspended;
+	spinlock_t			tdr_suspend_lock;
 	struct task_struct		*thread;
 	struct list_head		ring_mirror_list;
 	spinlock_t			job_list_lock;
@@ -300,6 +303,8 @@ void drm_sched_hw_job_reset(struct drm_gpu_scheduler *sched,
 bool drm_sched_dependency_optimized(struct dma_fence* fence,
 				    struct drm_sched_entity *entity);
 void drm_sched_job_kickout(struct drm_sched_job *s_job);
+void drm_sched_suspend_timeout(struct drm_gpu_scheduler *sched);
+void drm_sched_resume_timeout(struct drm_gpu_scheduler *sched);

 void drm_sched_rq_add_entity(struct drm_sched_rq *rq,
 			     struct drm_sched_entity *entity);