From patchwork Tue Jan  7 17:32:34 2025
X-Patchwork-Submitter: Maciej Falkowski
X-Patchwork-Id: 13929037
From: Maciej Falkowski
To: dri-devel@lists.freedesktop.org
Cc: oded.gabbay@gmail.com, quic_jhugo@quicinc.com,
    jacek.lawrynowicz@linux.intel.com, Karol Wachowski, Maciej Falkowski
Subject: [PATCH 11/14] accel/ivpu: Fix locking order in ivpu_job_submit
Date: Tue, 7 Jan 2025 18:32:34 +0100
Message-ID: <20250107173238.381120-12-maciej.falkowski@linux.intel.com>
In-Reply-To: <20250107173238.381120-1-maciej.falkowski@linux.intel.com>
References: <20250107173238.381120-1-maciej.falkowski@linux.intel.com>

From: Karol Wachowski

Fix deadlock in job submission and abort handling.

When a thread aborts currently executing jobs due to a fault,
it first locks the global lock protecting submitted_jobs (#1).
After the last job is destroyed, it proceeds to release the related
context and locks file_priv (#2). Meanwhile, in the job submission
thread, the file_priv lock (#2) is taken first, and then the
submitted_jobs lock (#1) is obtained when a job is added to the
submitted jobs list.

       CPU0                                 CPU1
       ----                                 ----
    (for example due to a fault)         (job submissions keep coming)

    lock(&vdev->submitted_jobs_lock) #1
    ivpu_jobs_abort_all()
    job_destroy()
                                         lock(&file_priv->lock) #2
                                         lock(&vdev->submitted_jobs_lock) #1
    file_priv_release()
    lock(&vdev->context_list_lock)
    lock(&file_priv->lock) #2

This order of locking causes a deadlock. To resolve this issue,
change the order of locking in ivpu_job_submit().

Signed-off-by: Karol Wachowski
Signed-off-by: Maciej Falkowski
---
 drivers/accel/ivpu/ivpu_job.c | 17 +++++++----------
 1 file changed, 7 insertions(+), 10 deletions(-)

diff --git a/drivers/accel/ivpu/ivpu_job.c b/drivers/accel/ivpu/ivpu_job.c
index c694822a14bf..c93ea37062d7 100644
--- a/drivers/accel/ivpu/ivpu_job.c
+++ b/drivers/accel/ivpu/ivpu_job.c
@@ -597,6 +597,7 @@ static int ivpu_job_submit(struct ivpu_job *job, u8 priority, u32 cmdq_id)
 	if (ret < 0)
 		return ret;
 
+	mutex_lock(&vdev->submitted_jobs_lock);
 	mutex_lock(&file_priv->lock);
 
 	if (cmdq_id == 0)
@@ -606,19 +607,17 @@ static int ivpu_job_submit(struct ivpu_job *job, u8 priority, u32 cmdq_id)
 	if (!cmdq) {
 		ivpu_warn_ratelimited(vdev, "Failed to get job queue, ctx %d\n", file_priv->ctx.id);
 		ret = -EINVAL;
-		goto err_unlock_file_priv;
+		goto err_unlock;
 	}
 
 	ret = ivpu_cmdq_register(file_priv, cmdq);
 	if (ret) {
 		ivpu_err(vdev, "Failed to register command queue: %d\n", ret);
-		goto err_unlock_file_priv;
+		goto err_unlock;
 	}
 
 	job->cmdq_id = cmdq->id;
 
-	mutex_lock(&vdev->submitted_jobs_lock);
-
 	is_first_job = xa_empty(&vdev->submitted_jobs_xa);
 	ret = xa_alloc_cyclic(&vdev->submitted_jobs_xa, &job->job_id, job, file_priv->job_limit,
 			      &file_priv->job_id_next, GFP_KERNEL);
@@ -626,7 +625,7 @@ static int ivpu_job_submit(struct ivpu_job *job, u8 priority, u32 cmdq_id)
 		ivpu_dbg(vdev, JOB, "Too many active jobs in ctx %d\n",
 			 file_priv->ctx.id);
 		ret = -EBUSY;
-		goto err_unlock_submitted_jobs;
+		goto err_unlock;
 	}
 
 	ret = ivpu_cmdq_push_job(cmdq, job);
@@ -649,22 +648,20 @@ static int ivpu_job_submit(struct ivpu_job *job, u8 priority, u32 cmdq_id)
 		 job->job_id, file_priv->ctx.id, job->engine_idx, cmdq->priority,
 		 job->cmd_buf_vpu_addr, cmdq->jobq->header.tail);
 
-	mutex_unlock(&vdev->submitted_jobs_lock);
 	mutex_unlock(&file_priv->lock);
 
 	if (unlikely(ivpu_test_mode & IVPU_TEST_MODE_NULL_HW)) {
-		mutex_lock(&vdev->submitted_jobs_lock);
 		ivpu_job_signal_and_destroy(vdev, job->job_id, VPU_JSM_STATUS_SUCCESS);
-		mutex_unlock(&vdev->submitted_jobs_lock);
 	}
 
+	mutex_unlock(&vdev->submitted_jobs_lock);
+
 	return 0;
 
 err_erase_xa:
 	xa_erase(&vdev->submitted_jobs_xa, job->job_id);
-err_unlock_submitted_jobs:
+err_unlock:
 	mutex_unlock(&vdev->submitted_jobs_lock);
-err_unlock_file_priv:
 	mutex_unlock(&file_priv->lock);
 	ivpu_rpm_put(vdev);
 	return ret;
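
[Editor's note] The scenario in the commit message is a classic ABBA lock-order
inversion, and the patch resolves it by making the submit path take the locks in
the same order as the abort path. The standalone sketch below (not part of the
patch) illustrates that ordering rule with plain pthread mutexes; the names
lock_a, lock_b, abort_thread and submit_thread are invented for the example and
only stand in for submitted_jobs_lock (#1) and file_priv->lock (#2) by analogy.

/*
 * Minimal illustration of consistent lock ordering: both threads take
 * lock_a (the "global" lock, #1) before lock_b (the "per-context" lock, #2),
 * so the ABBA deadlock from the commit message cannot occur.
 *
 * Build (assumption): cc -pthread lock_order.c -o lock_order
 */
#include <pthread.h>
#include <stdio.h>

static pthread_mutex_t lock_a = PTHREAD_MUTEX_INITIALIZER; /* ~ submitted_jobs_lock (#1) */
static pthread_mutex_t lock_b = PTHREAD_MUTEX_INITIALIZER; /* ~ file_priv->lock (#2)     */

/* Abort-like path: holds the global lock, then needs the per-context lock. */
static void *abort_thread(void *arg)
{
	(void)arg;
	pthread_mutex_lock(&lock_a);     /* #1 first */
	pthread_mutex_lock(&lock_b);     /* #2 second */
	puts("abort: jobs destroyed, context released");
	pthread_mutex_unlock(&lock_b);
	pthread_mutex_unlock(&lock_a);
	return NULL;
}

/* Submit-like path after the fix: same order, #1 before #2. */
static void *submit_thread(void *arg)
{
	(void)arg;
	pthread_mutex_lock(&lock_a);     /* #1 first (this is what the patch changes) */
	pthread_mutex_lock(&lock_b);     /* #2 second */
	puts("submit: job added to the submitted jobs list");
	pthread_mutex_unlock(&lock_b);
	pthread_mutex_unlock(&lock_a);
	return NULL;
}

int main(void)
{
	pthread_t t1, t2;

	pthread_create(&t1, NULL, abort_thread, NULL);
	pthread_create(&t2, NULL, submit_thread, NULL);
	pthread_join(t1, NULL);
	pthread_join(t2, NULL);
	return 0;
}

With the pre-patch ordering (submit taking #2 before #1 while abort takes #1
before #2), the two threads could each hold one lock and wait forever for the
other; enforcing a single acquisition order removes that possibility.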