From patchwork Fri Feb 7 14:51:00 2025 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 8bit X-Patchwork-Submitter: Tvrtko Ursulin X-Patchwork-Id: 13965237 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from gabe.freedesktop.org (gabe.freedesktop.org [131.252.210.177]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.lore.kernel.org (Postfix) with ESMTPS id 32A06C0219E for ; Fri, 7 Feb 2025 14:52:54 +0000 (UTC) Received: from gabe.freedesktop.org (localhost [127.0.0.1]) by gabe.freedesktop.org (Postfix) with ESMTP id 48EE210EB32; Fri, 7 Feb 2025 14:52:47 +0000 (UTC) Authentication-Results: gabe.freedesktop.org; dkim=fail reason="signature verification failed" (2048-bit key; unprotected) header.d=igalia.com header.i=@igalia.com header.b="NbFQSRme"; dkim-atps=neutral Received: from fanzine2.igalia.com (fanzine.igalia.com [178.60.130.6]) by gabe.freedesktop.org (Postfix) with ESMTPS id 2449110E27D; Fri, 7 Feb 2025 14:51:13 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; q=dns/txt; c=relaxed/relaxed; d=igalia.com; s=20170329; h=Content-Transfer-Encoding:Content-Type:MIME-Version:References: In-Reply-To:Message-ID:Date:Subject:Cc:To:From:Sender:Reply-To:Content-ID: Content-Description:Resent-Date:Resent-From:Resent-Sender:Resent-To:Resent-Cc :Resent-Message-ID:List-Id:List-Help:List-Unsubscribe:List-Subscribe: List-Post:List-Owner:List-Archive; bh=u1FqwNDCUrbBg0RpDSF38zAMjA7/G9WyaEImBN40SNI=; b=NbFQSRmeXgDcXBOeQ+oWq6E0Mi bQRFeKkPSC1b5JVYkin06EkhO51wQhqctUn6aNQOkd+qbR/60FBImAI2NX+hvMWMe2tQR29rS8N+D tJDs80bhZIdX9IAzWJWjgNCsCNO1O5qkLLcheu2sfu508s7+Dy0rLE2dYhLfktDQvF9OBWbUNOUVk RKNqY5jL4WlgpzQHkfGBAeUQptpfFwQUy+c/l0/3aW3niCp9lY1ERpZ1u+dlpKoUm2+VYZBPpuI68 ZIhYof58pHgwOVdlR0yI1pk/fUuyQ6IAdNkxQkGA6pFzytFPy3KtTjHDFeAxTJMtTgosFsGD6hI8b xO+sVBng==; Received: from [90.241.98.187] (helo=localhost) by fanzine2.igalia.com with esmtpsa (Cipher TLS1.3:ECDHE_SECP256R1__RSA_PSS_RSAE_SHA256__AES_256_GCM:256) (Exim) id 1tgPhB-005spB-H8; Fri, 07 Feb 2025 15:51:11 +0100 From: Tvrtko Ursulin To: amd-gfx@lists.freedesktop.org, dri-devel@lists.freedesktop.org Cc: kernel-dev@igalia.com, Tvrtko Ursulin , =?utf-8?q?Christian_K=C3=B6nig?= , Danilo Krummrich , Matthew Brost , Philipp Stanner , "Zhang, Hawking" Subject: [PATCH 2/6] drm/amdgpu: Pop jobs from the queue more robustly Date: Fri, 7 Feb 2025 14:51:00 +0000 Message-ID: <20250207145104.60455-3-tvrtko.ursulin@igalia.com> X-Mailer: git-send-email 2.48.0 In-Reply-To: <20250207145104.60455-1-tvrtko.ursulin@igalia.com> References: <20250207145104.60455-1-tvrtko.ursulin@igalia.com> MIME-Version: 1.0 X-BeenThere: dri-devel@lists.freedesktop.org X-Mailman-Version: 2.1.29 Precedence: list List-Id: Direct Rendering Infrastructure - Development List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: dri-devel-bounces@lists.freedesktop.org Sender: "dri-devel" Replace a copy of DRM scheduler's to_drm_sched_job with a copy of a newly added __drm_sched_entity_queue_pop. This allows breaking the hidden dependency that queue_node has to be the first element in struct drm_sched_job. A comment is also added with a reference to the mailing list discussion explaining the copied helper will be removed when the whole broken amdgpu_job_stop_all_jobs_on_sched is removed. Signed-off-by: Tvrtko Ursulin Cc: Christian König Cc: Danilo Krummrich Cc: Matthew Brost Cc: Philipp Stanner Cc: "Zhang, Hawking" --- drivers/gpu/drm/amd/amdgpu/amdgpu_job.c | 22 +++++++++++++++++++--- 1 file changed, 19 insertions(+), 3 deletions(-) diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_job.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_job.c index 100f04475943..1899c601c95c 100644 --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_job.c +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_job.c @@ -411,8 +411,24 @@ static struct dma_fence *amdgpu_job_run(struct drm_sched_job *sched_job) return fence; } -#define to_drm_sched_job(sched_job) \ - container_of((sched_job), struct drm_sched_job, queue_node) +/* + * This is a duplicate function from DRM scheduler sched_internal.h. + * Plan is to remove it when amdgpu_job_stop_all_jobs_on_sched is removed, due + * latter being incorrect and racy. + * + * See https://lore.kernel.org/amd-gfx/44edde63-7181-44fb-a4f7-94e50514f539@amd.com/ + */ +static struct drm_sched_job * +drm_sched_entity_queue_pop(struct drm_sched_entity *entity) +{ + struct spsc_node *node; + + node = spsc_queue_pop(&entity->job_queue); + if (!node) + return NULL; + + return container_of(node, struct drm_sched_job, queue_node); +} void amdgpu_job_stop_all_jobs_on_sched(struct drm_gpu_scheduler *sched) { @@ -425,7 +441,7 @@ void amdgpu_job_stop_all_jobs_on_sched(struct drm_gpu_scheduler *sched) struct drm_sched_rq *rq = sched->sched_rq[i]; spin_lock(&rq->lock); list_for_each_entry(s_entity, &rq->entities, list) { - while ((s_job = to_drm_sched_job(spsc_queue_pop(&s_entity->job_queue)))) { + while ((s_job = drm_sched_entity_queue_pop(s_entity))) { struct drm_sched_fence *s_fence = s_job->s_fence; dma_fence_signal(&s_fence->scheduled);