From: Maíra Canal
Date: Wed, 26 Feb 2025 16:58:59 -0300
Subject: [PATCH 1/6] drm/v3d: Don't run jobs that have errors flagged in its fence
Message-Id: <20250226-v3d-gpu-reset-fixes-v1-1-83a969fdd9c1@igalia.com>
References: <20250226-v3d-gpu-reset-fixes-v1-0-83a969fdd9c1@igalia.com>
In-Reply-To: <20250226-v3d-gpu-reset-fixes-v1-0-83a969fdd9c1@igalia.com>
To: Melissa Wen, Iago Toral, Jose Maria Casanova Crespo
Cc: dri-devel@lists.freedesktop.org, kernel-dev@igalia.com, stable@vger.kernel.org, Maíra Canal

The V3D driver still relies on `drm_sched_increase_karma()` and
`drm_sched_resubmit_jobs()` for resubmissions when a timeout occurs.
The function `drm_sched_increase_karma()` marks the job as guilty, while
`drm_sched_resubmit_jobs()` sets an error (-ECANCELED) in the DMA fence of
that guilty job.

Because of this, we must check whether the job's DMA fence has been
flagged with an error before executing the job. Otherwise, the same guilty
job may be resubmitted indefinitely, causing repeated GPU resets.

This patch adds a check for an error on the job's fence to prevent running
a guilty job that was previously flagged when the GPU timed out.

Note that the CPU and CACHE_CLEAN queues do not require this check, as
their jobs are executed synchronously once the DRM scheduler starts them.

Cc: stable@vger.kernel.org
Fixes: d223f98f0209 ("drm/v3d: Add support for compute shader dispatch.")
Fixes: 1584f16ca96e ("drm/v3d: Add support for submitting jobs to the TFU.")
Signed-off-by: Maíra Canal
---
 drivers/gpu/drm/v3d/v3d_sched.c | 9 ++++++++-
 1 file changed, 8 insertions(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/v3d/v3d_sched.c b/drivers/gpu/drm/v3d/v3d_sched.c
index 80466ce8c7df669280e556c0793490b79e75d2c7..c2010ecdb08f4ba3b54f7783ed33901552d0eba1 100644
--- a/drivers/gpu/drm/v3d/v3d_sched.c
+++ b/drivers/gpu/drm/v3d/v3d_sched.c
@@ -327,11 +327,15 @@ v3d_tfu_job_run(struct drm_sched_job *sched_job)
 	struct drm_device *dev = &v3d->drm;
 	struct dma_fence *fence;
 
+	if (unlikely(job->base.base.s_fence->finished.error))
+		return NULL;
+
+	v3d->tfu_job = job;
+
 	fence = v3d_fence_create(v3d, V3D_TFU);
 	if (IS_ERR(fence))
 		return NULL;
 
-	v3d->tfu_job = job;
 	if (job->base.irq_fence)
 		dma_fence_put(job->base.irq_fence);
 	job->base.irq_fence = dma_fence_get(fence);
@@ -369,6 +373,9 @@ v3d_csd_job_run(struct drm_sched_job *sched_job)
 	struct dma_fence *fence;
 	int i, csd_cfg0_reg;
 
+	if (unlikely(job->base.base.s_fence->finished.error))
+		return NULL;
+
 	v3d->csd_job = job;
 
 	v3d_invalidate_caches(v3d);
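
For illustration only, the guard described in the commit message can be
modelled in user space. This is a rough sketch and not driver code:
`struct model_fence`, `struct model_job`, `model_flag_guilty()` and
`model_job_run()` are hypothetical names that do not exist in the DRM
scheduler or in V3D. They only mimic the idea that a job whose fence
carries an error must not reach the hardware again.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical stand-in for a job's "finished" fence; only the error
 * field matters for this sketch. */
struct model_fence {
	int error;		/* 0, or a negative errno such as -ECANCELED */
};

/* Hypothetical stand-in for a queued job. */
struct model_job {
	struct model_fence finished;
	int hw_submissions;	/* times the job reached the "hardware" */
};

/* Models the timeout path: the guilty job's fence is flagged with
 * -ECANCELED before the job is handed back for resubmission. */
static void model_flag_guilty(struct model_job *job)
{
	assert(job != NULL);
	job->finished.error = -ECANCELED;
}

/* Models a run callback with the guard in place. Returns NULL when the
 * job is skipped; otherwise counts a submission and returns a non-NULL
 * pointer standing in for the hardware fence. */
static struct model_fence *model_job_run(struct model_job *job)
{
	assert(job != NULL);

	/* The guard: a job flagged after a timeout is never run again. */
	if (job->finished.error)
		return NULL;

	job->hw_submissions++;
	return &job->finished;
}
```

In this model, a job that has been passed to `model_flag_guilty()` is
rejected by `model_job_run()` and its submission count stays unchanged,
which is the behaviour the patch aims for on the TFU and CSD queues.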