[v4,4/7] drm/msm: Fix cx collapse issue during recovery

Message ID

20220817204224.v4.4.I4ac27a0b34ea796ce0f938bb509e257516bc6f57@changeid (mailing list archive)

State

Superseded

Headers

Return-Path: <linux-arm-msm-owner@kernel.org>
From: Akhil P Oommen <quic_akhilpo@quicinc.com>
To: freedreno <freedreno@lists.freedesktop.org>,
        <dri-devel@lists.freedesktop.org>, <linux-arm-msm@vger.kernel.org>,
        Rob Clark <robdclark@gmail.com>,
        Bjorn Andersson <bjorn.andersson@linaro.org>,
        "Dmitry Baryshkov" <dmitry.baryshkov@linaro.org>
CC: Matthias Kaehlcke <mka@chromium.org>,
        Jonathan Marek <jonathan@marek.ca>,
        Jordan Crouse <jordan@cosmicpenguin.net>,
        Douglas Anderson <dianders@chromium.org>,
        Akhil P Oommen <quic_akhilpo@quicinc.com>,
        "Abhinav Kumar" <quic_abhinavk@quicinc.com>,
        Chia-I Wu <olvaffe@gmail.com>,
        "Daniel Vetter" <daniel@ffwll.ch>, David Airlie <airlied@linux.ie>,
        Sean Paul <sean@poorly.run>, <linux-kernel@vger.kernel.org>
Subject: [PATCH v4 4/7] drm/msm: Fix cx collapse issue during recovery
Date: Wed, 17 Aug 2022 20:44:17 +0530
Message-ID: <20220817204224.v4.4.I4ac27a0b34ea796ce0f938bb509e257516bc6f57@changeid>
In-Reply-To: <1660749261-7602-1-git-send-email-quic_akhilpo@quicinc.com>
References: <1660749261-7602-1-git-send-email-quic_akhilpo@quicinc.com>
MIME-Version: 1.0
Content-Type: text/plain
Precedence: bulk

Series

Improve GPU Recovery

[v4,0/7] Improve GPU Recovery
[v4,1/7] drm/msm: Remove unnecessary pm_runtime_get/put
[v4,2/7] drm/msm: Take single rpm refcount on behalf of all submits
[v4,3/7] drm/msm: Correct pm_runtime votes in recover worker
[v4,4/7] drm/msm: Fix cx collapse issue during recovery
[v4,5/7] drm/msm/a6xx: Ensure CX collapse during gpu recovery
[v4,6/7] drm/msm/a6xx: Improve gpu recovery sequence
[v4,7/7] drm/msm/a6xx: Handle GMU prepare-slumber hfi failure

Commit Message

Akhil P Oommen Aug. 17, 2022, 3:14 p.m. UTC

There is some hardware logic under the CX domain. For a successful
recovery, we should ensure that the cx headswitch collapses so that all
stale state is cleared out. This is especially true for the a6xx family,
where we have a GMU co-processor.

Currently, cx doesn't collapse due to a devlink between gpu and its
smmu. So the *struct gpu device* needs to be runtime suspended to ensure
that the iommu driver removes its vote on cx gdsc.

Signed-off-by: Akhil P Oommen <quic_akhilpo@quicinc.com>
---

Changes in v4:
- Keep active_submit lock across the suspend & resume (Rob)
- Clear gpu->active_submits to silence a WARN() during runpm suspend (Rob)

Changes in v3:
- Simplified the pm refcount drop since we have just a single refcount now
  for all active submits
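
The reference handling in a6xx_recover() can be illustrated with a small
userspace model. This is only a sketch under stated assumptions: struct
fake_rpm and the fake_* helpers are hypothetical stand-ins invented for
illustration, not kernel APIs, and fake_rpm_put() simplifies
pm_runtime_put() to a plain decrement. It shows that the device only
suspends, and cx can only collapse, once both the reference held on behalf
of active submits and the one held by the recover worker are dropped.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for a device's runtime PM state (not a kernel type). */
struct fake_rpm {
	int usage_count;   /* models the runtime PM usage counter */
	bool suspended;    /* true while the device is "runtime suspended" */
	int suspend_calls; /* number of times the suspend path ran */
};

/* Take a reference; resumes the device if it was suspended. */
static void fake_rpm_get(struct fake_rpm *rpm)
{
	rpm->usage_count++;
	rpm->suspended = false;
}

/* Drop a reference without suspending (simplified pm_runtime_put()). */
static void fake_rpm_put(struct fake_rpm *rpm)
{
	assert(rpm->usage_count > 0);
	rpm->usage_count--;
}

/* Drop a reference and suspend immediately if it was the last one. */
static void fake_rpm_put_sync(struct fake_rpm *rpm)
{
	fake_rpm_put(rpm);
	if (rpm->usage_count == 0 && !rpm->suspended) {
		rpm->suspended = true;
		rpm->suspend_calls++;
	}
}

/*
 * Mirrors the order of the put/get calls in the patch. Returns true if the
 * modelled device actually suspended during the power cycle.
 */
static bool fake_recover_power_cycle(struct fake_rpm *rpm, int active_submits)
{
	int before = rpm->suspend_calls;
	bool collapsed;

	/* Drop the single reference held on behalf of all active submits */
	if (active_submits)
		fake_rpm_put(rpm);

	/* And the final one from the recover worker */
	fake_rpm_put_sync(rpm);
	collapsed = rpm->suspend_calls > before;

	/* Re-take both references so the caller's accounting is unchanged */
	if (active_submits)
		fake_rpm_get(rpm);
	fake_rpm_get(rpm);

	return collapsed;
}
```

In this model, starting from a usage count of 2 (one reference for the
submits, one for the recover worker) the power cycle reports a suspend and
leaves the count at 2 again. If any other reference is outstanding, the
count never reaches zero and no suspend takes place.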

 drivers/gpu/drm/msm/adreno/a6xx_gpu.c | 32 +++++++++++++++++++++++++++++---
 drivers/gpu/drm/msm/msm_gpu.c         |  4 +---
 2 files changed, 30 insertions(+), 6 deletions(-)

diff --git a/drivers/gpu/drm/msm/adreno/a6xx_gpu.c b/drivers/gpu/drm/msm/adreno/a6xx_gpu.c
index 42ed9a3..0c8f19e 100644
--- a/drivers/gpu/drm/msm/adreno/a6xx_gpu.c
+++ b/drivers/gpu/drm/msm/adreno/a6xx_gpu.c
@@ -1193,7 +1193,7 @@  static void a6xx_recover(struct msm_gpu *gpu)
 {
 	struct adreno_gpu *adreno_gpu = to_adreno_gpu(gpu);
 	struct a6xx_gpu *a6xx_gpu = to_a6xx_gpu(adreno_gpu);
-	int i;
+	int i, active_submits;
 
 	adreno_dump_info(gpu);
 
@@ -1210,8 +1210,34 @@  static void a6xx_recover(struct msm_gpu *gpu)
 	 */
 	gmu_write(&a6xx_gpu->gmu, REG_A6XX_GMU_GMU_PWR_COL_KEEPALIVE, 0);
 
-	gpu->funcs->pm_suspend(gpu);
-	gpu->funcs->pm_resume(gpu);
+	pm_runtime_dont_use_autosuspend(&gpu->pdev->dev);
+
+	/* active_submit won't change until we make a submission */
+	mutex_lock(&gpu->active_lock);
+	active_submits = gpu->active_submits;
+
+	/*
+	 * Temporarily clear active_submits count to silence a WARN() in the
+	 * runtime suspend cb
+	 */
+	gpu->active_submits = 0;
+
+	/* Drop the rpm refcount from active submits */
+	if (active_submits)
+		pm_runtime_put(&gpu->pdev->dev);
+
+	/* And the final one from recover worker */
+	pm_runtime_put_sync(&gpu->pdev->dev);
+
+	pm_runtime_use_autosuspend(&gpu->pdev->dev);
+
+	if (active_submits)
+		pm_runtime_get(&gpu->pdev->dev);
+
+	pm_runtime_get_sync(&gpu->pdev->dev);
+
+	gpu->active_submits = active_submits;
+	mutex_unlock(&gpu->active_lock);
 
 	msm_gpu_hw_init(gpu);
 }
diff --git a/drivers/gpu/drm/msm/msm_gpu.c b/drivers/gpu/drm/msm/msm_gpu.c
index 1945efb..07e55a6 100644
--- a/drivers/gpu/drm/msm/msm_gpu.c
+++ b/drivers/gpu/drm/msm/msm_gpu.c
@@ -426,9 +426,7 @@  static void recover_worker(struct kthread_work *work)
 		/* retire completed submits, plus the one that hung: */
 		retire_submits(gpu);
 
-		pm_runtime_get_sync(&gpu->pdev->dev);
 		gpu->funcs->recover(gpu);
-		pm_runtime_put_sync(&gpu->pdev->dev);
 
 		/*
 		 * Replay all remaining submits starting with highest priority
@@ -445,7 +443,7 @@  static void recover_worker(struct kthread_work *work)
 		}
 	}
 
-	pm_runtime_put_sync(&gpu->pdev->dev);
+	pm_runtime_put(&gpu->pdev->dev);
 
 	mutex_unlock(&gpu->lock);
