From patchwork Mon Oct 28 11:48:13 2024 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Boris Brezillon X-Patchwork-Id: 13853333 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from gabe.freedesktop.org (gabe.freedesktop.org [131.252.210.177]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.lore.kernel.org (Postfix) with ESMTPS id 61A9AD13596 for ; Mon, 28 Oct 2024 11:48:25 +0000 (UTC) Received: from gabe.freedesktop.org (localhost [127.0.0.1]) by gabe.freedesktop.org (Postfix) with ESMTP id BAD8A10E47A; Mon, 28 Oct 2024 11:48:24 +0000 (UTC) Authentication-Results: gabe.freedesktop.org; dkim=pass (2048-bit key; unprotected) header.d=collabora.com header.i=@collabora.com header.b="OivnJ7nc"; dkim-atps=neutral Received: from bali.collaboradmins.com (bali.collaboradmins.com [148.251.105.195]) by gabe.freedesktop.org (Postfix) with ESMTPS id 5DE1310E36B for ; Mon, 28 Oct 2024 11:48:19 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=collabora.com; s=mail; t=1730116098; bh=RFWRoQRLx2cTCWZHvx7bah58/x+jCS63c16lvg0cICs=; h=From:To:Cc:Subject:Date:In-Reply-To:References:From; b=OivnJ7nc7Ie/koqh5jRJsx8NKr0Me8zscAT1yKx8hTZj/72UUBjydwZk6CSVeIDFV 94cRbDsHpyRdwRCWbc1LX++uEGu/woypOfNXVQNzEl1Aiavv+FVdNFIsr7ezdtKdJl 0SPVNxK4R9K5VVnYNtt7ans/bMqVzGVdpgN0Zur6FEkIojY51DtP7IitutoysQsbqf srq5GgOqZ1Dyf7TnVVb88nLbdKJb3RY9Qi+RIc6vszfX7qWGpAjHIfxXl9VldarwbL zO7SvFt07SCqnNcaV22qzEH6TytWmC3Dm56HeglotwbU/YIaF1lkiviEHm6spvbTvv C4ANsf9vDMj+A== Received: from localhost.localdomain (unknown [IPv6:2a01:e0a:2c:6930:5cf4:84a1:2763:fe0d]) (using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits) key-exchange X25519 server-signature RSA-PSS (4096 bits) server-digest SHA256) (No client certificate requested) (Authenticated sender: bbrezillon) by bali.collaboradmins.com (Postfix) with ESMTPSA id C3FE517E1574; Mon, 28 Oct 2024 12:48:17 +0100 (CET) From: Boris Brezillon To: Boris Brezillon , Steven Price , Liviu Dudau , =?utf-8?q?Adri=C3=A1n_Larumbe?= Cc: Christopher Healy , dri-devel@lists.freedesktop.org, kernel@collabora.com Subject: [PATCH v2 1/3] drm/panthor: Fail job creation when the group is dead Date: Mon, 28 Oct 2024 12:48:13 +0100 Message-ID: <20241028114815.3793855-2-boris.brezillon@collabora.com> X-Mailer: git-send-email 2.46.2 In-Reply-To: <20241028114815.3793855-1-boris.brezillon@collabora.com> References: <20241028114815.3793855-1-boris.brezillon@collabora.com> MIME-Version: 1.0 X-BeenThere: dri-devel@lists.freedesktop.org X-Mailman-Version: 2.1.29 Precedence: list List-Id: Direct Rendering Infrastructure - Development List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: dri-devel-bounces@lists.freedesktop.org Sender: "dri-devel" Userspace can use GROUP_SUBMIT errors as a trigger to check the group state and recreate the group if it became unusable. Make sure we report an error when the group became unusable. Changes in v2: - Add R-bs Fixes: de8548813824 ("drm/panthor: Add the scheduler logical block") Signed-off-by: Boris Brezillon Reviewed-by: Steven Price Reviewed-by: Liviu Dudau --- drivers/gpu/drm/panthor/panthor_sched.c | 5 +++++ 1 file changed, 5 insertions(+) diff --git a/drivers/gpu/drm/panthor/panthor_sched.c b/drivers/gpu/drm/panthor/panthor_sched.c index eb9f6635cc12..eda8fbb276b3 100644 --- a/drivers/gpu/drm/panthor/panthor_sched.c +++ b/drivers/gpu/drm/panthor/panthor_sched.c @@ -3688,6 +3688,11 @@ panthor_job_create(struct panthor_file *pfile, goto err_put_job; } + if (!group_can_run(job->group)) { + ret = -EINVAL; + goto err_put_job; + } + if (job->queue_idx >= job->group->queue_count || !job->group->queues[job->queue_idx]) { ret = -EINVAL; From patchwork Mon Oct 28 11:48:14 2024 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Boris Brezillon X-Patchwork-Id: 13853335 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from gabe.freedesktop.org (gabe.freedesktop.org [131.252.210.177]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.lore.kernel.org (Postfix) with ESMTPS id A530ED13596 for ; Mon, 28 Oct 2024 11:48:28 +0000 (UTC) Received: from gabe.freedesktop.org (localhost [127.0.0.1]) by gabe.freedesktop.org (Postfix) with ESMTP id 3689210E47E; Mon, 28 Oct 2024 11:48:25 +0000 (UTC) Authentication-Results: gabe.freedesktop.org; dkim=pass (2048-bit key; unprotected) header.d=collabora.com header.i=@collabora.com header.b="ZMP54YTz"; dkim-atps=neutral Received: from bali.collaboradmins.com (bali.collaboradmins.com [148.251.105.195]) by gabe.freedesktop.org (Postfix) with ESMTPS id CDB8D10E36B for ; Mon, 28 Oct 2024 11:48:19 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=collabora.com; s=mail; t=1730116098; bh=qE1CM94eUqDK8eXzwuWkNmiabIZtn7drTOPuJOsz048=; h=From:To:Cc:Subject:Date:In-Reply-To:References:From; b=ZMP54YTzax0IMSu4l0IIDxlEvlxWZmNQFdkziCXWULOBkK0DMPUGax0kJhX90Npp1 ho4nAI1XdSb3Jv4FGEuqROCaTJCnpRh/xeu7AYoftTF55k5ZEAddDXWChg08eeDWkH AP/hoZdTaXAWMUTsqzM3YwAQNNOhDJt6wTVvd8rTliQswDcVoIYBvrKXVnemW5mDyX ozcYB/zZjcn4SStcd0AnxRJ+7rJigf030WUJvTmD/0yvknC3ymec/MzMVS5h3c0RHG jVOb8VVryrTmFGGiGZF4rAfhM3BrDyjkHC7unLEbpZE80YyYz7iVC4cNisgZ30dnEF NLdRA9vmKdftQ== Received: from localhost.localdomain (unknown [IPv6:2a01:e0a:2c:6930:5cf4:84a1:2763:fe0d]) (using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits) key-exchange X25519 server-signature RSA-PSS (4096 bits) server-digest SHA256) (No client certificate requested) (Authenticated sender: bbrezillon) by bali.collaboradmins.com (Postfix) with ESMTPSA id 4048417E35CF; Mon, 28 Oct 2024 12:48:18 +0100 (CET) From: Boris Brezillon To: Boris Brezillon , Steven Price , Liviu Dudau , =?utf-8?q?Adri=C3=A1n_Larumbe?= Cc: Christopher Healy , dri-devel@lists.freedesktop.org, kernel@collabora.com Subject: [PATCH v2 2/3] drm/panthor: Report group as timedout when we fail to properly suspend Date: Mon, 28 Oct 2024 12:48:14 +0100 Message-ID: <20241028114815.3793855-3-boris.brezillon@collabora.com> X-Mailer: git-send-email 2.46.2 In-Reply-To: <20241028114815.3793855-1-boris.brezillon@collabora.com> References: <20241028114815.3793855-1-boris.brezillon@collabora.com> MIME-Version: 1.0 X-BeenThere: dri-devel@lists.freedesktop.org X-Mailman-Version: 2.1.29 Precedence: list List-Id: Direct Rendering Infrastructure - Development List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: dri-devel-bounces@lists.freedesktop.org Sender: "dri-devel" If we don't do that, the group is considered usable by userspace, but all further GROUP_SUBMIT will fail with -EINVAL. Changes in v2: - New patch Fixes: de8548813824 ("drm/panthor: Add the scheduler logical block") Signed-off-by: Boris Brezillon Reviewed-by: Steven Price Reviewed-by: Liviu Dudau --- drivers/gpu/drm/panthor/panthor_sched.c | 15 +++++++++++---- 1 file changed, 11 insertions(+), 4 deletions(-) diff --git a/drivers/gpu/drm/panthor/panthor_sched.c b/drivers/gpu/drm/panthor/panthor_sched.c index eda8fbb276b3..ef4bec7ff9c7 100644 --- a/drivers/gpu/drm/panthor/panthor_sched.c +++ b/drivers/gpu/drm/panthor/panthor_sched.c @@ -602,10 +602,11 @@ struct panthor_group { * @timedout: True when a timeout occurred on any of the queues owned by * this group. * - * Timeouts can be reported by drm_sched or by the FW. In any case, any - * timeout situation is unrecoverable, and the group becomes useless. - * We simply wait for all references to be dropped so we can release the - * group object. + * Timeouts can be reported by drm_sched or by the FW. If a reset is required, + * and the group can't be suspended, this also leads to a timeout. In any case, + * any timeout situation is unrecoverable, and the group becomes useless. We + * simply wait for all references to be dropped so we can release the group + * object. */ bool timedout; @@ -2687,6 +2688,12 @@ void panthor_sched_suspend(struct panthor_device *ptdev) csgs_upd_ctx_init(&upd_ctx); while (slot_mask) { u32 csg_id = ffs(slot_mask) - 1; + struct panthor_csg_slot *csg_slot = &sched->csg_slots[csg_id]; + + /* We consider group suspension failures as fatal and flag the + * group as unusable by setting timedout=true. + */ + csg_slot->group->timedout = true; csgs_upd_ctx_queue_reqs(ptdev, &upd_ctx, csg_id, CSG_STATE_TERMINATE, From patchwork Mon Oct 28 11:48:15 2024 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Boris Brezillon X-Patchwork-Id: 13853334 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from gabe.freedesktop.org (gabe.freedesktop.org [131.252.210.177]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.lore.kernel.org (Postfix) with ESMTPS id 1821BD13570 for ; Mon, 28 Oct 2024 11:48:27 +0000 (UTC) Received: from gabe.freedesktop.org (localhost [127.0.0.1]) by gabe.freedesktop.org (Postfix) with ESMTP id 1B52810E47C; Mon, 28 Oct 2024 11:48:25 +0000 (UTC) Authentication-Results: gabe.freedesktop.org; dkim=pass (2048-bit key; unprotected) header.d=collabora.com header.i=@collabora.com header.b="e/xy1/e8"; dkim-atps=neutral Received: from bali.collaboradmins.com (bali.collaboradmins.com [148.251.105.195]) by gabe.freedesktop.org (Postfix) with ESMTPS id 3F45E10E36B for ; Mon, 28 Oct 2024 11:48:20 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=collabora.com; s=mail; t=1730116099; bh=Uc2eg3ONkKsKm5ztPmk7ozIpoVZvSZDjmB326oIDvCg=; h=From:To:Cc:Subject:Date:In-Reply-To:References:From; b=e/xy1/e8IBkf9gJ2gNUGFVkbjGsQylZQMl+xkdwbU/CLeBUEAd6j0qNbwUw92xKRD F37f5+i/VZSCCAigBWVN8DbQrpqi1Y9i7XBFIN3V7sM5Qxdmii2pHwoRmEtNseDKd+ 8TEOkYcXMDQZfdZkEvXmVg0t8w84QL1FxBKAlPRZCLj6gZUPZSXMTs6xpS1JZBCKWH DyRYrZT4MQNrcrMbZHxawfkDwRFCBy2wbQXZXSA45NqThR0KwrbEVDw3Nmb5k///jn ptuii4T4WKG4qZeX8FOIzBj40Yshmdd2kHvhy1650Qom+fpEPdFuUPqISpcirtwkxB Y0vWcvfWoIl9A== Received: from localhost.localdomain (unknown [IPv6:2a01:e0a:2c:6930:5cf4:84a1:2763:fe0d]) (using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits) key-exchange X25519 server-signature RSA-PSS (4096 bits) server-digest SHA256) (No client certificate requested) (Authenticated sender: bbrezillon) by bali.collaboradmins.com (Postfix) with ESMTPSA id AB30817E35D3; Mon, 28 Oct 2024 12:48:18 +0100 (CET) From: Boris Brezillon To: Boris Brezillon , Steven Price , Liviu Dudau , =?utf-8?q?Adri=C3=A1n_Larumbe?= Cc: Christopher Healy , dri-devel@lists.freedesktop.org, kernel@collabora.com Subject: [PATCH v2 3/3] drm/panthor: Report innocent group kill Date: Mon, 28 Oct 2024 12:48:15 +0100 Message-ID: <20241028114815.3793855-4-boris.brezillon@collabora.com> X-Mailer: git-send-email 2.46.2 In-Reply-To: <20241028114815.3793855-1-boris.brezillon@collabora.com> References: <20241028114815.3793855-1-boris.brezillon@collabora.com> MIME-Version: 1.0 X-BeenThere: dri-devel@lists.freedesktop.org X-Mailman-Version: 2.1.29 Precedence: list List-Id: Direct Rendering Infrastructure - Development List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: dri-devel-bounces@lists.freedesktop.org Sender: "dri-devel" Groups can be killed during a reset even though they did nothing wrong. That usually happens when the FW is put in a bad state by other groups, resulting in group suspension failures when the reset happens. If we end up in that situation, flag the group innocent and report innocence through a new DRM_PANTHOR_GROUP_STATE flag. Bump the minor driver version to reflect the uAPI change. Changes in v2: - New patch Signed-off-by: Boris Brezillon --- drivers/gpu/drm/panthor/panthor_drv.c | 2 +- drivers/gpu/drm/panthor/panthor_sched.c | 16 ++++++++++++++++ include/uapi/drm/panthor_drm.h | 9 +++++++++ 3 files changed, 26 insertions(+), 1 deletion(-) diff --git a/drivers/gpu/drm/panthor/panthor_drv.c b/drivers/gpu/drm/panthor/panthor_drv.c index ac7e53f6e3f0..f1dff7e0173d 100644 --- a/drivers/gpu/drm/panthor/panthor_drv.c +++ b/drivers/gpu/drm/panthor/panthor_drv.c @@ -1507,7 +1507,7 @@ static const struct drm_driver panthor_drm_driver = { .desc = "Panthor DRM driver", .date = "20230801", .major = 1, - .minor = 2, + .minor = 3, .gem_create_object = panthor_gem_create_object, .gem_prime_import_sg_table = drm_gem_shmem_prime_import_sg_table, diff --git a/drivers/gpu/drm/panthor/panthor_sched.c b/drivers/gpu/drm/panthor/panthor_sched.c index ef4bec7ff9c7..bb4a8c522fdc 100644 --- a/drivers/gpu/drm/panthor/panthor_sched.c +++ b/drivers/gpu/drm/panthor/panthor_sched.c @@ -610,6 +610,16 @@ struct panthor_group { */ bool timedout; + /** + * @innocent: True when the group becomes unusable because the group suspension + * failed during a reset. + * + * Sometimes the FW was put in a bad state by other groups, causing the group + * suspension happening in the reset path to fail. In that case, we consider the + * group innocent. + */ + bool innocent; + /** * @syncobjs: Pool of per-queue synchronization objects. * @@ -2690,6 +2700,12 @@ void panthor_sched_suspend(struct panthor_device *ptdev) u32 csg_id = ffs(slot_mask) - 1; struct panthor_csg_slot *csg_slot = &sched->csg_slots[csg_id]; + /* If the group was still usable before that point, we consider + * it innocent. + */ + if (group_can_run(csg_slot->group)) + csg_slot->group->innocent = true; + /* We consider group suspension failures as fatal and flag the * group as unusable by setting timedout=true. */ diff --git a/include/uapi/drm/panthor_drm.h b/include/uapi/drm/panthor_drm.h index 87c9cb555dd1..b99763cbae48 100644 --- a/include/uapi/drm/panthor_drm.h +++ b/include/uapi/drm/panthor_drm.h @@ -923,6 +923,15 @@ enum drm_panthor_group_state_flags { * When a group ends up with this flag set, no jobs can be submitted to its queues. */ DRM_PANTHOR_GROUP_STATE_FATAL_FAULT = 1 << 1, + + /** + * @DRM_PANTHOR_GROUP_STATE_INNOCENT: Group was killed during a reset caused by other + * groups. + * + * This flag can only be set if DRM_PANTHOR_GROUP_STATE_TIMEDOUT is set and + * DRM_PANTHOR_GROUP_STATE_FATAL_FAULT is not. + */ + DRM_PANTHOR_GROUP_STATE_INNOCENT = 1 << 2, }; /**