From patchwork Wed Dec 11 07:54:15 2024 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Boris Brezillon X-Patchwork-Id: 13903069 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from gabe.freedesktop.org (gabe.freedesktop.org [131.252.210.177]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.lore.kernel.org (Postfix) with ESMTPS id 072FEE7717D for ; Wed, 11 Dec 2024 07:54:29 +0000 (UTC) Received: from gabe.freedesktop.org (localhost [127.0.0.1]) by gabe.freedesktop.org (Postfix) with ESMTP id E85E710E5FD; Wed, 11 Dec 2024 07:54:25 +0000 (UTC) Authentication-Results: gabe.freedesktop.org; dkim=pass (2048-bit key; unprotected) header.d=collabora.com header.i=@collabora.com header.b="A+4jrFMe"; dkim-atps=neutral Received: from bali.collaboradmins.com (bali.collaboradmins.com [148.251.105.195]) by gabe.freedesktop.org (Postfix) with ESMTPS id B2E2010E576 for ; Wed, 11 Dec 2024 07:54:24 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=collabora.com; s=mail; t=1733903663; bh=oCfEc8Rp4c1euA6PuGNOBj1L9F+BDOGTGCqU1dbo6oo=; h=From:To:Cc:Subject:Date:In-Reply-To:References:From; b=A+4jrFMeLzulvM0jp6druj/jYamEUBL1RuEp3NlXfotrI4wHLDYqI1QhQOJZ0k015 3nyrNPUdBPFXuYPsLzYqlfIgY05P/9NpNRn0AHdwaBiypSjVuBpryfTqSWyJUk/PzS Iu7HFdiJsJiXzdbSghklYF2YEv06xj67YpvXNjwfyavkYEY1A50EWZmKTLEpLsCOBs BkDMldBEWq+4HkgpD2XBN8STRUKBWf2xZbm1i3p0gs+906aFZoRaXVzuMopoSXUcmn pw+tcLwzpq+3WYQyqWn8xYNY20/WYOvlIa4qpTlnphLXBPm336IKyF3xwDLTkMbhuT Cq9FlT/D+3Bcw== Received: from localhost.localdomain (unknown [IPv6:2a01:e0a:2c:6930:d3ea:1c7:41fd:3038]) (using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits) key-exchange X25519 server-signature RSA-PSS (4096 bits) server-digest SHA256) (No client certificate requested) (Authenticated sender: bbrezillon) by bali.collaboradmins.com (Postfix) with ESMTPSA id 07BB417E1437; Wed, 11 Dec 2024 08:54:23 +0100 (CET) From: Boris Brezillon To: Boris Brezillon , Steven Price , Liviu Dudau , =?utf-8?q?Adri=C3=A1n_Larumbe?= Cc: dri-devel@lists.freedesktop.org, kernel@collabora.com Subject: [PATCH v3 1/5] drm/panthor: Preserve the result returned by panthor_fw_resume() Date: Wed, 11 Dec 2024 08:54:15 +0100 Message-ID: <20241211075419.2333731-2-boris.brezillon@collabora.com> X-Mailer: git-send-email 2.47.0 In-Reply-To: <20241211075419.2333731-1-boris.brezillon@collabora.com> References: <20241211075419.2333731-1-boris.brezillon@collabora.com> MIME-Version: 1.0 X-BeenThere: dri-devel@lists.freedesktop.org X-Mailman-Version: 2.1.29 Precedence: list List-Id: Direct Rendering Infrastructure - Development List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: dri-devel-bounces@lists.freedesktop.org Sender: "dri-devel" WARN() will return true if the condition is true, false otherwise. If we store the return of drm_WARN_ON() in ret, we lose the actual error code. v3: - Add R-b v2: - Add R-b Fixes: 5fe909cae118 ("drm/panthor: Add the device logical block") Signed-off-by: Boris Brezillon Reviewed-by: Steven Price Reviewed-by: Adrian Larumbe Reviewed-by: Liviu Dudau --- drivers/gpu/drm/panthor/panthor_device.c | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/drivers/gpu/drm/panthor/panthor_device.c b/drivers/gpu/drm/panthor/panthor_device.c index 984615f4ed27..e701e605d013 100644 --- a/drivers/gpu/drm/panthor/panthor_device.c +++ b/drivers/gpu/drm/panthor/panthor_device.c @@ -461,8 +461,8 @@ int panthor_device_resume(struct device *dev) drm_dev_enter(&ptdev->base, &cookie)) { panthor_gpu_resume(ptdev); panthor_mmu_resume(ptdev); - ret = drm_WARN_ON(&ptdev->base, panthor_fw_resume(ptdev)); - if (!ret) { + ret = panthor_fw_resume(ptdev); + if (!drm_WARN_ON(&ptdev->base, ret)) { panthor_sched_resume(ptdev); } else { panthor_mmu_suspend(ptdev); From patchwork Wed Dec 11 07:54:16 2024 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Boris Brezillon X-Patchwork-Id: 13903070 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from gabe.freedesktop.org (gabe.freedesktop.org [131.252.210.177]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.lore.kernel.org (Postfix) with ESMTPS id D260DE77182 for ; Wed, 11 Dec 2024 07:54:30 +0000 (UTC) Received: from gabe.freedesktop.org (localhost [127.0.0.1]) by gabe.freedesktop.org (Postfix) with ESMTP id 55F8810E576; Wed, 11 Dec 2024 07:54:30 +0000 (UTC) Authentication-Results: gabe.freedesktop.org; dkim=pass (2048-bit key; unprotected) header.d=collabora.com header.i=@collabora.com header.b="ebKgxwkY"; dkim-atps=neutral Received: from bali.collaboradmins.com (bali.collaboradmins.com [148.251.105.195]) by gabe.freedesktop.org (Postfix) with ESMTPS id 1A47F10E576 for ; Wed, 11 Dec 2024 07:54:25 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=collabora.com; s=mail; t=1733903663; bh=PA05YmMTaUJabHsEElMeEJ9Jyv0DQWwiI29VUGxAewI=; h=From:To:Cc:Subject:Date:In-Reply-To:References:From; b=ebKgxwkYbN3QaMZsOneoQrAHJPQF/FrqaS6H15XMkGL9H23nXxzRFdGD5ijP25oJz ATPsbRjaSAoUfIaj/id/BF6rZ4IxdqiFd2tJkfzd085WXg7v0SA70CBED0MP9/YQmm e+u4pV/Ds57QNzmMtifHUnRmqqoqAgyVJ6YcdZQecVe1CyU+E0E+Uv7cVGoRrAAgyi B1bib/0BK04+y2Bt/OT7gBZhPaXTiGGh+93aGKjj2WqEWV4irLYRLRgLVmdvOiqyU3 OyeyTyBG4WpsRTeGpP07TjSKrUVBF5uvEA5ZdTquxn4IH3yTyVXujNw6lCj+ZP+GgA K5kdZu5cf5W8Q== Received: from localhost.localdomain (unknown [IPv6:2a01:e0a:2c:6930:d3ea:1c7:41fd:3038]) (using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits) key-exchange X25519 server-signature RSA-PSS (4096 bits) server-digest SHA256) (No client certificate requested) (Authenticated sender: bbrezillon) by bali.collaboradmins.com (Postfix) with ESMTPSA id 6CC1C17E14D6; Wed, 11 Dec 2024 08:54:23 +0100 (CET) From: Boris Brezillon To: Boris Brezillon , Steven Price , Liviu Dudau , =?utf-8?q?Adri=C3=A1n_Larumbe?= Cc: dri-devel@lists.freedesktop.org, kernel@collabora.com Subject: [PATCH v3 2/5] drm/panthor: Be robust against runtime PM resume failures in the suspend path Date: Wed, 11 Dec 2024 08:54:16 +0100 Message-ID: <20241211075419.2333731-3-boris.brezillon@collabora.com> X-Mailer: git-send-email 2.47.0 In-Reply-To: <20241211075419.2333731-1-boris.brezillon@collabora.com> References: <20241211075419.2333731-1-boris.brezillon@collabora.com> MIME-Version: 1.0 X-BeenThere: dri-devel@lists.freedesktop.org X-Mailman-Version: 2.1.29 Precedence: list List-Id: Direct Rendering Infrastructure - Development List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: dri-devel-bounces@lists.freedesktop.org Sender: "dri-devel" The runtime PM resume operation is not guaranteed to succeed, but if it fails, the device should be in a suspended state. Make sure we're robust to resume failures in the unplug path. v3: - Fix typo - Add R-bs v2: - Move the bit that belonged in the next commit - Drop the panthor_device_unplug() changes Signed-off-by: Boris Brezillon Reviewed-by: Adrian Larumbe Reviewed-by: Steven Price --- drivers/gpu/drm/panthor/panthor_fw.c | 14 +++++++++----- drivers/gpu/drm/panthor/panthor_gpu.c | 3 ++- drivers/gpu/drm/panthor/panthor_mmu.c | 3 ++- 3 files changed, 13 insertions(+), 7 deletions(-) diff --git a/drivers/gpu/drm/panthor/panthor_fw.c b/drivers/gpu/drm/panthor/panthor_fw.c index ebf8980ca9a3..02789558788d 100644 --- a/drivers/gpu/drm/panthor/panthor_fw.c +++ b/drivers/gpu/drm/panthor/panthor_fw.c @@ -12,6 +12,7 @@ #include #include #include +#include #include #include @@ -1190,11 +1191,13 @@ void panthor_fw_unplug(struct panthor_device *ptdev) cancel_delayed_work_sync(&ptdev->fw->watchdog.ping_work); - /* Make sure the IRQ handler can be called after that point. */ - if (ptdev->fw->irq.irq) - panthor_job_irq_suspend(&ptdev->fw->irq); + if (!IS_ENABLED(CONFIG_PM) || pm_runtime_active(ptdev->base.dev)) { + /* Make sure the IRQ handler cannot be called after that point. */ + if (ptdev->fw->irq.irq) + panthor_job_irq_suspend(&ptdev->fw->irq); - panthor_fw_stop(ptdev); + panthor_fw_stop(ptdev); + } list_for_each_entry(section, &ptdev->fw->sections, node) panthor_kernel_bo_destroy(section->mem); @@ -1207,7 +1210,8 @@ void panthor_fw_unplug(struct panthor_device *ptdev) panthor_vm_put(ptdev->fw->vm); ptdev->fw->vm = NULL; - panthor_gpu_power_off(ptdev, L2, ptdev->gpu_info.l2_present, 20000); + if (!IS_ENABLED(CONFIG_PM) || pm_runtime_active(ptdev->base.dev)) + panthor_gpu_power_off(ptdev, L2, ptdev->gpu_info.l2_present, 20000); } /** diff --git a/drivers/gpu/drm/panthor/panthor_gpu.c b/drivers/gpu/drm/panthor/panthor_gpu.c index 0f3cac6ec88e..ee85a371bc38 100644 --- a/drivers/gpu/drm/panthor/panthor_gpu.c +++ b/drivers/gpu/drm/panthor/panthor_gpu.c @@ -180,7 +180,8 @@ void panthor_gpu_unplug(struct panthor_device *ptdev) unsigned long flags; /* Make sure the IRQ handler is not running after that point. */ - panthor_gpu_irq_suspend(&ptdev->gpu->irq); + if (!IS_ENABLED(CONFIG_PM) || pm_runtime_active(ptdev->base.dev)) + panthor_gpu_irq_suspend(&ptdev->gpu->irq); /* Wake-up all waiters. */ spin_lock_irqsave(&ptdev->gpu->reqs_lock, flags); diff --git a/drivers/gpu/drm/panthor/panthor_mmu.c b/drivers/gpu/drm/panthor/panthor_mmu.c index 9478ee2093d1..6716463903bc 100644 --- a/drivers/gpu/drm/panthor/panthor_mmu.c +++ b/drivers/gpu/drm/panthor/panthor_mmu.c @@ -2681,7 +2681,8 @@ int panthor_vm_prepare_mapped_bos_resvs(struct drm_exec *exec, struct panthor_vm */ void panthor_mmu_unplug(struct panthor_device *ptdev) { - panthor_mmu_irq_suspend(&ptdev->mmu->irq); + if (!IS_ENABLED(CONFIG_PM) || pm_runtime_active(ptdev->base.dev)) + panthor_mmu_irq_suspend(&ptdev->mmu->irq); mutex_lock(&ptdev->mmu->as.slots_lock); for (u32 i = 0; i < ARRAY_SIZE(ptdev->mmu->as.slots); i++) { From patchwork Wed Dec 11 07:54:17 2024 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Boris Brezillon X-Patchwork-Id: 13903072 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from gabe.freedesktop.org (gabe.freedesktop.org [131.252.210.177]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.lore.kernel.org (Postfix) with ESMTPS id 7A76DE7717D for ; Wed, 11 Dec 2024 07:54:33 +0000 (UTC) Received: from gabe.freedesktop.org (localhost [127.0.0.1]) by gabe.freedesktop.org (Postfix) with ESMTP id 4CE2310EA81; Wed, 11 Dec 2024 07:54:32 +0000 (UTC) Authentication-Results: gabe.freedesktop.org; dkim=fail reason="signature verification failed" (2048-bit key; unprotected) header.d=collabora.com header.i=@collabora.com header.b="Yz3yGEQd"; dkim-atps=neutral Received: from bali.collaboradmins.com (bali.collaboradmins.com [148.251.105.195]) by gabe.freedesktop.org (Postfix) with ESMTPS id 7576C10E576 for ; Wed, 11 Dec 2024 07:54:25 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=collabora.com; s=mail; t=1733903664; bh=sJQmsiiyboG/JGS0OQiv5K1cY5GnkwewzmWJRvMhg8A=; h=From:To:Cc:Subject:Date:In-Reply-To:References:From; b=Yz3yGEQd96cs4ndu+VD9zJbecrgXCZB+6bsWC0f7AnCttIiz6K2Q6qrVv2C77Jxey xH+mzp6hmhdq01yq/AyWKf9vfCGkoD78slHEBl8u+SRvegmRHnmfxyYnNHgbLpJN0m MgxSLNcXlB1Sl3E02gLErC7xGvwkh8VOIs3cEh+UxXO9ZZCdxUbqhRr3Le1q+XdaQK 9YvVFHgvhGdT2OzPldoGmWmUgDTLgGN7sPDmVvjGOUGb6gSJmhfKj1tdGrfiJsRZ+O eQ7J9IVIsRjEXxUm65OOq02idfb7hT7lTMag5w/0RPwVVkEkkxI1Dtds2hMGjPxKeP mPzeQIQM76rZA== Received: from localhost.localdomain (unknown [IPv6:2a01:e0a:2c:6930:d3ea:1c7:41fd:3038]) (using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits) key-exchange X25519 server-signature RSA-PSS (4096 bits) server-digest SHA256) (No client certificate requested) (Authenticated sender: bbrezillon) by bali.collaboradmins.com (Postfix) with ESMTPSA id D07B617E14E7; Wed, 11 Dec 2024 08:54:23 +0100 (CET) From: Boris Brezillon To: Boris Brezillon , Steven Price , Liviu Dudau , =?utf-8?q?Adri=C3=A1n_Larumbe?= Cc: dri-devel@lists.freedesktop.org, kernel@collabora.com Subject: [PATCH v3 3/5] drm/panthor: Ignore devfreq_{suspend, resume}_device() failures Date: Wed, 11 Dec 2024 08:54:17 +0100 Message-ID: <20241211075419.2333731-4-boris.brezillon@collabora.com> X-Mailer: git-send-email 2.47.0 In-Reply-To: <20241211075419.2333731-1-boris.brezillon@collabora.com> References: <20241211075419.2333731-1-boris.brezillon@collabora.com> MIME-Version: 1.0 X-BeenThere: dri-devel@lists.freedesktop.org X-Mailman-Version: 2.1.29 Precedence: list List-Id: Direct Rendering Infrastructure - Development List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: dri-devel-bounces@lists.freedesktop.org Sender: "dri-devel" devfreq_{resume,suspend}_device() don't bother undoing the suspend_count modifications if something fails, so either it assumes failures are harmless, or it's super fragile/buggy. In either case it's not something we can address at the driver level, so let's just assume failures are harmless for now, like is done in panfrost. v3: - Add R-b v2: - Add R-b Signed-off-by: Boris Brezillon Reviewed-by: Steven Price Reviewed-by: Adrian Larumbe --- drivers/gpu/drm/panthor/panthor_devfreq.c | 12 ++++---- drivers/gpu/drm/panthor/panthor_devfreq.h | 4 +-- drivers/gpu/drm/panthor/panthor_device.c | 35 ++--------------------- 3 files changed, 11 insertions(+), 40 deletions(-) diff --git a/drivers/gpu/drm/panthor/panthor_devfreq.c b/drivers/gpu/drm/panthor/panthor_devfreq.c index ecc7a52bd688..3686515d368d 100644 --- a/drivers/gpu/drm/panthor/panthor_devfreq.c +++ b/drivers/gpu/drm/panthor/panthor_devfreq.c @@ -243,26 +243,26 @@ int panthor_devfreq_init(struct panthor_device *ptdev) return 0; } -int panthor_devfreq_resume(struct panthor_device *ptdev) +void panthor_devfreq_resume(struct panthor_device *ptdev) { struct panthor_devfreq *pdevfreq = ptdev->devfreq; if (!pdevfreq->devfreq) - return 0; + return; panthor_devfreq_reset(pdevfreq); - return devfreq_resume_device(pdevfreq->devfreq); + drm_WARN_ON(&ptdev->base, devfreq_resume_device(pdevfreq->devfreq)); } -int panthor_devfreq_suspend(struct panthor_device *ptdev) +void panthor_devfreq_suspend(struct panthor_device *ptdev) { struct panthor_devfreq *pdevfreq = ptdev->devfreq; if (!pdevfreq->devfreq) - return 0; + return; - return devfreq_suspend_device(pdevfreq->devfreq); + drm_WARN_ON(&ptdev->base, devfreq_suspend_device(pdevfreq->devfreq)); } void panthor_devfreq_record_busy(struct panthor_device *ptdev) diff --git a/drivers/gpu/drm/panthor/panthor_devfreq.h b/drivers/gpu/drm/panthor/panthor_devfreq.h index 83a5c9522493..b7631de695f7 100644 --- a/drivers/gpu/drm/panthor/panthor_devfreq.h +++ b/drivers/gpu/drm/panthor/panthor_devfreq.h @@ -12,8 +12,8 @@ struct panthor_devfreq; int panthor_devfreq_init(struct panthor_device *ptdev); -int panthor_devfreq_resume(struct panthor_device *ptdev); -int panthor_devfreq_suspend(struct panthor_device *ptdev); +void panthor_devfreq_resume(struct panthor_device *ptdev); +void panthor_devfreq_suspend(struct panthor_device *ptdev); void panthor_devfreq_record_busy(struct panthor_device *ptdev); void panthor_devfreq_record_idle(struct panthor_device *ptdev); diff --git a/drivers/gpu/drm/panthor/panthor_device.c b/drivers/gpu/drm/panthor/panthor_device.c index e701e605d013..e3b22107b268 100644 --- a/drivers/gpu/drm/panthor/panthor_device.c +++ b/drivers/gpu/drm/panthor/panthor_device.c @@ -453,9 +453,7 @@ int panthor_device_resume(struct device *dev) if (ret) goto err_disable_stacks_clk; - ret = panthor_devfreq_resume(ptdev); - if (ret) - goto err_disable_coregroup_clk; + panthor_devfreq_resume(ptdev); if (panthor_device_is_initialized(ptdev) && drm_dev_enter(&ptdev->base, &cookie)) { @@ -492,8 +490,6 @@ int panthor_device_resume(struct device *dev) err_suspend_devfreq: panthor_devfreq_suspend(ptdev); - -err_disable_coregroup_clk: clk_disable_unprepare(ptdev->clks.coregroup); err_disable_stacks_clk: @@ -510,7 +506,7 @@ int panthor_device_resume(struct device *dev) int panthor_device_suspend(struct device *dev) { struct panthor_device *ptdev = dev_get_drvdata(dev); - int ret, cookie; + int cookie; if (atomic_read(&ptdev->pm.state) != PANTHOR_DEVICE_PM_STATE_ACTIVE) return -EINVAL; @@ -542,36 +538,11 @@ int panthor_device_suspend(struct device *dev) drm_dev_exit(cookie); } - ret = panthor_devfreq_suspend(ptdev); - if (ret) { - if (panthor_device_is_initialized(ptdev) && - drm_dev_enter(&ptdev->base, &cookie)) { - panthor_gpu_resume(ptdev); - panthor_mmu_resume(ptdev); - drm_WARN_ON(&ptdev->base, panthor_fw_resume(ptdev)); - panthor_sched_resume(ptdev); - drm_dev_exit(cookie); - } - - goto err_set_active; - } + panthor_devfreq_suspend(ptdev); clk_disable_unprepare(ptdev->clks.coregroup); clk_disable_unprepare(ptdev->clks.stacks); clk_disable_unprepare(ptdev->clks.core); atomic_set(&ptdev->pm.state, PANTHOR_DEVICE_PM_STATE_SUSPENDED); return 0; - -err_set_active: - /* If something failed and we have to revert back to an - * active state, we also need to clear the MMIO userspace - * mappings, so any dumb pages that were mapped while we - * were trying to suspend gets invalidated. - */ - mutex_lock(&ptdev->pm.mmio_lock); - atomic_set(&ptdev->pm.state, PANTHOR_DEVICE_PM_STATE_ACTIVE); - unmap_mapping_range(ptdev->base.anon_inode->i_mapping, - DRM_PANTHOR_USER_MMIO_OFFSET, 0, 1); - mutex_unlock(&ptdev->pm.mmio_lock); - return ret; } From patchwork Wed Dec 11 07:54:18 2024 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Boris Brezillon X-Patchwork-Id: 13903071 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from gabe.freedesktop.org (gabe.freedesktop.org [131.252.210.177]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.lore.kernel.org (Postfix) with ESMTPS id D8748E77180 for ; Wed, 11 Dec 2024 07:54:32 +0000 (UTC) Received: from gabe.freedesktop.org (localhost [127.0.0.1]) by gabe.freedesktop.org (Postfix) with ESMTP id 45A1710EA7A; Wed, 11 Dec 2024 07:54:32 +0000 (UTC) Authentication-Results: gabe.freedesktop.org; dkim=pass (2048-bit key; unprotected) header.d=collabora.com header.i=@collabora.com header.b="YVPTdAVc"; dkim-atps=neutral Received: from bali.collaboradmins.com (bali.collaboradmins.com [148.251.105.195]) by gabe.freedesktop.org (Postfix) with ESMTPS id BD6D010E576 for ; Wed, 11 Dec 2024 07:54:25 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=collabora.com; s=mail; t=1733903664; bh=3hOOmx5wAFN5IvspLRDRZNSS0DiYJ3/2S0dn334VoYQ=; h=From:To:Cc:Subject:Date:In-Reply-To:References:From; b=YVPTdAVcyE9+MXsi84s7PBIv+bHoupsADSjfO2/sSHfsZCt+hGsY5IdFP108v6eOu zryFrS2el+wtoqZHViGXOyMBnmHURYF4eol5jlHSfR71Wz/fPIJp6/euQpy8m4VoFx PZ5KqaFbBLVeKfhAxEvHsxO88quRmKUYDjzguigbYLMEsTyF6d6D+WXc92HgDZU6OG BEUBG/z6gYsP0NjgyNJdqmffYSln3vV9bQqmimzo2nuDXaCCIGbiU6WBvmKyLGOXf1 UbMofSQjSDOGMTxm9/1CPp/TAcXqpAhVBvq5oSFTAYsINK/t7ChDyhyV3RQ5clBxSH lYZLthKI/HLPw== Received: from localhost.localdomain (unknown [IPv6:2a01:e0a:2c:6930:d3ea:1c7:41fd:3038]) (using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits) key-exchange X25519 server-signature RSA-PSS (4096 bits) server-digest SHA256) (No client certificate requested) (Authenticated sender: bbrezillon) by bali.collaboradmins.com (Postfix) with ESMTPSA id 3E9FE17E14EA; Wed, 11 Dec 2024 08:54:24 +0100 (CET) From: Boris Brezillon To: Boris Brezillon , Steven Price , Liviu Dudau , =?utf-8?q?Adri=C3=A1n_Larumbe?= Cc: dri-devel@lists.freedesktop.org, kernel@collabora.com Subject: [PATCH v3 4/5] drm/panthor: Be robust against resume failures Date: Wed, 11 Dec 2024 08:54:18 +0100 Message-ID: <20241211075419.2333731-5-boris.brezillon@collabora.com> X-Mailer: git-send-email 2.47.0 In-Reply-To: <20241211075419.2333731-1-boris.brezillon@collabora.com> References: <20241211075419.2333731-1-boris.brezillon@collabora.com> MIME-Version: 1.0 X-BeenThere: dri-devel@lists.freedesktop.org X-Mailman-Version: 2.1.29 Precedence: list List-Id: Direct Rendering Infrastructure - Development List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: dri-devel-bounces@lists.freedesktop.org Sender: "dri-devel" When the runtime PM resume callback returns an error, it puts the device in a state where it can't be resumed anymore. Make sure we can recover from such transient failures by calling pm_runtime_set_suspended() explicitly after a pm_runtime_resume_and_get() failure. v3: - Add R-b/A-b v2: - Add a comment explaining potential races in panthor_device_resume_and_get() Signed-off-by: Boris Brezillon Reviewed-by: Adrian Larumbe Acked-by: Steven Price --- drivers/gpu/drm/panthor/panthor_device.c | 1 + drivers/gpu/drm/panthor/panthor_device.h | 26 ++++++++++++++++++++++++ drivers/gpu/drm/panthor/panthor_drv.c | 2 +- drivers/gpu/drm/panthor/panthor_sched.c | 4 ++-- 4 files changed, 30 insertions(+), 3 deletions(-) diff --git a/drivers/gpu/drm/panthor/panthor_device.c b/drivers/gpu/drm/panthor/panthor_device.c index e3b22107b268..0362101ea896 100644 --- a/drivers/gpu/drm/panthor/panthor_device.c +++ b/drivers/gpu/drm/panthor/panthor_device.c @@ -500,6 +500,7 @@ int panthor_device_resume(struct device *dev) err_set_suspended: atomic_set(&ptdev->pm.state, PANTHOR_DEVICE_PM_STATE_SUSPENDED); + atomic_set(&ptdev->pm.recovery_needed, 1); return ret; } diff --git a/drivers/gpu/drm/panthor/panthor_device.h b/drivers/gpu/drm/panthor/panthor_device.h index 0e68f5a70d20..b6c4f25a5d6e 100644 --- a/drivers/gpu/drm/panthor/panthor_device.h +++ b/drivers/gpu/drm/panthor/panthor_device.h @@ -9,6 +9,7 @@ #include #include #include +#include #include #include @@ -180,6 +181,9 @@ struct panthor_device { * is suspended. */ struct page *dummy_latest_flush; + + /** @recovery_needed: True when a resume attempt failed. */ + atomic_t recovery_needed; } pm; /** @profile_mask: User-set profiling flags for job accounting. */ @@ -243,6 +247,28 @@ int panthor_device_mmap_io(struct panthor_device *ptdev, int panthor_device_resume(struct device *dev); int panthor_device_suspend(struct device *dev); +static inline int panthor_device_resume_and_get(struct panthor_device *ptdev) +{ + int ret = pm_runtime_resume_and_get(ptdev->base.dev); + + /* If the resume failed, we need to clear the runtime_error, which + * can done by forcing the RPM state to suspended. If multiple + * threads called panthor_device_resume_and_get(), we only want + * one of them to update the state, hence the cmpxchg. Note that a + * thread might enter panthor_device_resume_and_get() and call + * pm_runtime_resume_and_get() after another thread had attempted + * to resume and failed. This means we will end up with an error + * without even attempting a resume ourselves. The only risk here + * is to report an error when the second resume attempt might have + * succeeded. Given resume errors are not expected, this is probably + * something we can live with. + */ + if (ret && atomic_cmpxchg(&ptdev->pm.recovery_needed, 1, 0) == 1) + pm_runtime_set_suspended(ptdev->base.dev); + + return ret; +} + enum drm_panthor_exception_type { DRM_PANTHOR_EXCEPTION_OK = 0x00, DRM_PANTHOR_EXCEPTION_TERMINATED = 0x04, diff --git a/drivers/gpu/drm/panthor/panthor_drv.c b/drivers/gpu/drm/panthor/panthor_drv.c index 1498c97b4b85..b7a9adc918e3 100644 --- a/drivers/gpu/drm/panthor/panthor_drv.c +++ b/drivers/gpu/drm/panthor/panthor_drv.c @@ -763,7 +763,7 @@ static int panthor_query_timestamp_info(struct panthor_device *ptdev, { int ret; - ret = pm_runtime_resume_and_get(ptdev->base.dev); + ret = panthor_device_resume_and_get(ptdev); if (ret) return ret; diff --git a/drivers/gpu/drm/panthor/panthor_sched.c b/drivers/gpu/drm/panthor/panthor_sched.c index 97ed5fe5a191..77b184c3fb0c 100644 --- a/drivers/gpu/drm/panthor/panthor_sched.c +++ b/drivers/gpu/drm/panthor/panthor_sched.c @@ -2364,7 +2364,7 @@ static void tick_work(struct work_struct *work) if (!drm_dev_enter(&ptdev->base, &cookie)) return; - ret = pm_runtime_resume_and_get(ptdev->base.dev); + ret = panthor_device_resume_and_get(ptdev); if (drm_WARN_ON(&ptdev->base, ret)) goto out_dev_exit; @@ -3131,7 +3131,7 @@ queue_run_job(struct drm_sched_job *sched_job) return dma_fence_get(job->done_fence); } - ret = pm_runtime_resume_and_get(ptdev->base.dev); + ret = panthor_device_resume_and_get(ptdev); if (drm_WARN_ON(&ptdev->base, ret)) return ERR_PTR(ret); From patchwork Wed Dec 11 07:54:19 2024 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Boris Brezillon X-Patchwork-Id: 13903073 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from gabe.freedesktop.org (gabe.freedesktop.org [131.252.210.177]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.lore.kernel.org (Postfix) with ESMTPS id D182BE77183 for ; Wed, 11 Dec 2024 07:54:34 +0000 (UTC) Received: from gabe.freedesktop.org (localhost [127.0.0.1]) by gabe.freedesktop.org (Postfix) with ESMTP id F0A9110EA85; Wed, 11 Dec 2024 07:54:32 +0000 (UTC) Authentication-Results: gabe.freedesktop.org; dkim=pass (2048-bit key; unprotected) header.d=collabora.com header.i=@collabora.com header.b="qkPsUZa3"; dkim-atps=neutral Received: from bali.collaboradmins.com (bali.collaboradmins.com [148.251.105.195]) by gabe.freedesktop.org (Postfix) with ESMTPS id 230C410E576 for ; Wed, 11 Dec 2024 07:54:26 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=collabora.com; s=mail; t=1733903664; bh=x369bhNx1CrlN65acPcV/Tx2CMCvGgt096ZPVRp0/do=; h=From:To:Cc:Subject:Date:In-Reply-To:References:From; b=qkPsUZa3iubBuLC2HZblN61jXHkv+Pc0DCPy3EeGtBqAZnp/J8LtUByS8iPY+xAOR WODEgjERDsiAefW9363q4TwzaToctJWeB5NUk8HSlBU9d600ORVh+Y8W/H5pYYmYdx WRuSmIvqQlb5m6r4zc7YuYMzIsKiXz4ugcV9fCx1pGC5HhZ9Hu4qMrbivSW7W2bQjP bodZpM8SVAx/7Y9YsXf3OLMh8jxdwlCMDZ7LW9pbV8oPBymjSQdva3cDribpGQ/6ZK rAXOB61snb8DhObaSnbW8O2fNVZgmMSqXxfD6PGYCFo18+xPGZdxqmiX/B+dNL9/hA gmmBM/U5+o54w== Received: from localhost.localdomain (unknown [IPv6:2a01:e0a:2c:6930:d3ea:1c7:41fd:3038]) (using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits) key-exchange X25519 server-signature RSA-PSS (4096 bits) server-digest SHA256) (No client certificate requested) (Authenticated sender: bbrezillon) by bali.collaboradmins.com (Postfix) with ESMTPSA id A4B0317E14EF; Wed, 11 Dec 2024 08:54:24 +0100 (CET) From: Boris Brezillon To: Boris Brezillon , Steven Price , Liviu Dudau , =?utf-8?q?Adri=C3=A1n_Larumbe?= Cc: dri-devel@lists.freedesktop.org, kernel@collabora.com Subject: [PATCH v3 5/5] drm/panthor: Fix the fast-reset logic Date: Wed, 11 Dec 2024 08:54:19 +0100 Message-ID: <20241211075419.2333731-6-boris.brezillon@collabora.com> X-Mailer: git-send-email 2.47.0 In-Reply-To: <20241211075419.2333731-1-boris.brezillon@collabora.com> References: <20241211075419.2333731-1-boris.brezillon@collabora.com> MIME-Version: 1.0 X-BeenThere: dri-devel@lists.freedesktop.org X-Mailman-Version: 2.1.29 Precedence: list List-Id: Direct Rendering Infrastructure - Development List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: dri-devel-bounces@lists.freedesktop.org Sender: "dri-devel" If we do a GPU soft-reset, that's no longer fast reset. This also means the slow reset fallback doesn't work because the MCU state is only reset after a GPU soft-reset. Let's move the retry logic to panthor_device_resume() to issue a soft-reset between the fast and slow attempts, and patch panthor_gpu_suspend() to only power-off the L2 when a fast reset is requested. v3: - No changes v2: - Add R-b Signed-off-by: Boris Brezillon Reviewed-by: Steven Price --- drivers/gpu/drm/panthor/panthor_device.c | 32 ++++++++++---- drivers/gpu/drm/panthor/panthor_device.h | 11 +++++ drivers/gpu/drm/panthor/panthor_fw.c | 54 ++++++------------------ drivers/gpu/drm/panthor/panthor_gpu.c | 11 ++--- 4 files changed, 53 insertions(+), 55 deletions(-) diff --git a/drivers/gpu/drm/panthor/panthor_device.c b/drivers/gpu/drm/panthor/panthor_device.c index 0362101ea896..2c817e65e6be 100644 --- a/drivers/gpu/drm/panthor/panthor_device.c +++ b/drivers/gpu/drm/panthor/panthor_device.c @@ -431,6 +431,22 @@ int panthor_device_mmap_io(struct panthor_device *ptdev, struct vm_area_struct * return 0; } +static int panthor_device_resume_hw_components(struct panthor_device *ptdev) +{ + int ret; + + panthor_gpu_resume(ptdev); + panthor_mmu_resume(ptdev); + + ret = panthor_fw_resume(ptdev); + if (!ret) + return 0; + + panthor_mmu_suspend(ptdev); + panthor_gpu_suspend(ptdev); + return ret; +} + int panthor_device_resume(struct device *dev) { struct panthor_device *ptdev = dev_get_drvdata(dev); @@ -457,16 +473,16 @@ int panthor_device_resume(struct device *dev) if (panthor_device_is_initialized(ptdev) && drm_dev_enter(&ptdev->base, &cookie)) { - panthor_gpu_resume(ptdev); - panthor_mmu_resume(ptdev); - ret = panthor_fw_resume(ptdev); - if (!drm_WARN_ON(&ptdev->base, ret)) { - panthor_sched_resume(ptdev); - } else { - panthor_mmu_suspend(ptdev); - panthor_gpu_suspend(ptdev); + ret = panthor_device_resume_hw_components(ptdev); + if (ret && ptdev->reset.fast) { + drm_err(&ptdev->base, "Fast reset failed, trying a slow reset"); + ptdev->reset.fast = false; + ret = panthor_device_resume_hw_components(ptdev); } + if (!ret) + panthor_sched_resume(ptdev); + drm_dev_exit(cookie); if (ret) diff --git a/drivers/gpu/drm/panthor/panthor_device.h b/drivers/gpu/drm/panthor/panthor_device.h index b6c4f25a5d6e..da6574021664 100644 --- a/drivers/gpu/drm/panthor/panthor_device.h +++ b/drivers/gpu/drm/panthor/panthor_device.h @@ -157,6 +157,17 @@ struct panthor_device { /** @pending: Set to true if a reset is pending. */ atomic_t pending; + + /** + * @fast: True if the post_reset logic can proceed with a fast reset. + * + * A fast reset is just a reset where the driver doesn't reload the FW sections. + * + * Any time the firmware is properly suspended, a fast reset can take place. + * On the other hand, if the halt operation failed, the driver will reload + * all FW sections to make sure we start from a fresh state. + */ + bool fast; } reset; /** @pm: Power management related data. */ diff --git a/drivers/gpu/drm/panthor/panthor_fw.c b/drivers/gpu/drm/panthor/panthor_fw.c index 02789558788d..5b68dc02b5ce 100644 --- a/drivers/gpu/drm/panthor/panthor_fw.c +++ b/drivers/gpu/drm/panthor/panthor_fw.c @@ -263,17 +263,6 @@ struct panthor_fw { /** @booted: True is the FW is booted */ bool booted; - /** - * @fast_reset: True if the post_reset logic can proceed with a fast reset. - * - * A fast reset is just a reset where the driver doesn't reload the FW sections. - * - * Any time the firmware is properly suspended, a fast reset can take place. - * On the other hand, if the halt operation failed, the driver will reload - * all sections to make sure we start from a fresh state. - */ - bool fast_reset; - /** @irq: Job irq data. */ struct panthor_irq irq; }; @@ -1090,7 +1079,7 @@ void panthor_fw_pre_reset(struct panthor_device *ptdev, bool on_hang) /* Make sure we won't be woken up by a ping. */ cancel_delayed_work_sync(&ptdev->fw->watchdog.ping_work); - ptdev->fw->fast_reset = false; + ptdev->reset.fast = false; if (!on_hang) { struct panthor_fw_global_iface *glb_iface = panthor_fw_get_glb_iface(ptdev); @@ -1100,7 +1089,7 @@ void panthor_fw_pre_reset(struct panthor_device *ptdev, bool on_hang) gpu_write(ptdev, CSF_DOORBELL(CSF_GLB_DOORBELL_ID), 1); if (!readl_poll_timeout(ptdev->iomem + MCU_STATUS, status, status == MCU_STATUS_HALT, 10, 100000)) { - ptdev->fw->fast_reset = true; + ptdev->reset.fast = true; } else { drm_warn(&ptdev->base, "Failed to cleanly suspend MCU"); } @@ -1125,49 +1114,30 @@ int panthor_fw_post_reset(struct panthor_device *ptdev) if (ret) return ret; - /* If this is a fast reset, try to start the MCU without reloading - * the FW sections. If it fails, go for a full reset. - */ - if (ptdev->fw->fast_reset) { + if (!ptdev->reset.fast) { + /* On a slow reset, reload all sections, including RO ones. + * We're not supposed to end up here anyway, let's just assume + * the overhead of reloading everything is acceptable. + */ + panthor_reload_fw_sections(ptdev, true); + } else { /* The FW detects 0 -> 1 transitions. Make sure we reset * the HALT bit before the FW is rebooted. * This is not needed on a slow reset because FW sections are * re-initialized. */ struct panthor_fw_global_iface *glb_iface = panthor_fw_get_glb_iface(ptdev); + panthor_fw_update_reqs(glb_iface, req, 0, GLB_HALT); - - ret = panthor_fw_start(ptdev); - if (!ret) - goto out; - - /* Forcibly reset the MCU and force a slow reset, so we get a - * fresh boot on the next panthor_fw_start() call. - */ - panthor_fw_stop(ptdev); - ptdev->fw->fast_reset = false; - drm_err(&ptdev->base, "FW fast reset failed, trying a slow reset"); - - ret = panthor_vm_flush_all(ptdev->fw->vm); - if (ret) { - drm_err(&ptdev->base, "FW slow reset failed (couldn't flush FW's AS l2cache)"); - return ret; - } } - /* Reload all sections, including RO ones. We're not supposed - * to end up here anyway, let's just assume the overhead of - * reloading everything is acceptable. - */ - panthor_reload_fw_sections(ptdev, true); - ret = panthor_fw_start(ptdev); if (ret) { - drm_err(&ptdev->base, "FW slow reset failed (couldn't start the FW )"); + drm_err(&ptdev->base, "FW %s reset failed", + ptdev->reset.fast ? "fast" : "slow"); return ret; } -out: /* We must re-initialize the global interface even on fast-reset. */ panthor_fw_init_global_iface(ptdev); return 0; diff --git a/drivers/gpu/drm/panthor/panthor_gpu.c b/drivers/gpu/drm/panthor/panthor_gpu.c index ee85a371bc38..671049020afa 100644 --- a/drivers/gpu/drm/panthor/panthor_gpu.c +++ b/drivers/gpu/drm/panthor/panthor_gpu.c @@ -470,11 +470,12 @@ int panthor_gpu_soft_reset(struct panthor_device *ptdev) */ void panthor_gpu_suspend(struct panthor_device *ptdev) { - /* - * It may be preferable to simply power down the L2, but for now just - * soft-reset which will leave the L2 powered down. - */ - panthor_gpu_soft_reset(ptdev); + /* On a fast reset, simply power down the L2. */ + if (!ptdev->reset.fast) + panthor_gpu_soft_reset(ptdev); + else + panthor_gpu_power_off(ptdev, L2, 1, 20000); + panthor_gpu_irq_suspend(&ptdev->gpu->irq); }