From patchwork Mon Aug 26 12:25:38 2024
X-Patchwork-Submitter: Christian König
X-Patchwork-Id: 13777800
From: Christian König
To: daniel.vetter@ffwll.ch, vitaly.prosyak@amd.com
Cc: dri-devel@lists.freedesktop.org, ltuikov89@gmail.com
Subject: [PATCH 1/4] drm/sched: add optional errno to drm_sched_start()
Date: Mon, 26 Aug 2024 14:25:38 +0200
Message-Id: <20240826122541.85663-1-christian.koenig@amd.com>
List-Id: Direct Rendering Infrastructure - Development

The current implementation of
drm_sched_start uses a hardcoded -ECANCELED to dispose of a job when
the parent/hw fence is NULL. This results in drm_sched_job_done being
called with -ECANCELED for each job with a NULL parent in the pending
list, making it difficult to distinguish between recovery methods,
whether a queue reset or a full GPU reset was used.

To improve this, we first try a soft recovery for timeout jobs and use
the error code -ENODATA. If soft recovery fails, we proceed with a queue
reset, where the error code remains -ENODATA for the job. Finally, for a
full GPU reset, we use error codes -ECANCELED or -ETIME.

This patch adds an error code parameter to drm_sched_start, allowing us
to differentiate between queue reset and GPU reset failures. This enables
user mode and test applications to validate the expected correctness of
the requested operation. After a successful queue reset, the only way to
continue normal operation is to call drm_sched_job_done with the specific
error code -ENODATA.

v1: Initial implementation by Jesse utilized
    amdgpu_device_lock_reset_domain and amdgpu_device_unlock_reset_domain
    to allow user mode to track the queue reset status and distinguish
    between queue reset and GPU reset.
v2: Christian suggested using the error codes -ENODATA for queue reset
    and -ECANCELED or -ETIME for GPU reset, returned to
    amdgpu_cs_wait_ioctl.
v3: To meet the requirements, we introduce a new function
    drm_sched_start_ex with an additional parameter to set
    dma_fence_set_error, allowing us to handle the specific error codes
    appropriately and dispose of bad jobs with the selected error code
    depending on whether it was a queue reset or GPU reset.
v4: Alex suggested using a new name,
    drm_sched_start_with_recovery_error, which more accurately describes
    the function's purpose. Additionally, it was recommended to add
    documentation details about the new method.
v5: Fixed declaration of new function
    drm_sched_start_with_recovery_error. (Alex)
v6 (chk): rebase on upstream changes, cleanup the commit message,
    drop the new function again and update all callers,
    apply the errno also to scheduler fences with hw fences

Signed-off-by: Jesse Zhang
Signed-off-by: Vitaly Prosyak
Signed-off-by: Christian König
Cc: Alex Deucher
Reviewed-by: Alex Deucher
---
 drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_arcturus.c | 2 +-
 drivers/gpu/drm/amd/amdgpu/amdgpu_device.c          | 4 ++--
 drivers/gpu/drm/etnaviv/etnaviv_sched.c             | 2 +-
 drivers/gpu/drm/imagination/pvr_queue.c             | 4 ++--
 drivers/gpu/drm/lima/lima_sched.c                   | 2 +-
 drivers/gpu/drm/nouveau/nouveau_sched.c             | 2 +-
 drivers/gpu/drm/panfrost/panfrost_job.c             | 2 +-
 drivers/gpu/drm/panthor/panthor_mmu.c               | 2 +-
 drivers/gpu/drm/scheduler/sched_main.c              | 7 ++++---
 drivers/gpu/drm/v3d/v3d_sched.c                     | 2 +-
 include/drm/gpu_scheduler.h                         | 2 +-
 11 files changed, 16 insertions(+), 15 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_arcturus.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_arcturus.c
index 2320df51c914..18135d8235f9 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_arcturus.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_arcturus.c
@@ -300,7 +300,7 @@ static int suspend_resume_compute_scheduler(struct amdgpu_device *adev, bool sus
 			if (r)
 				goto out;
 		} else {
-			drm_sched_start(&ring->sched);
+			drm_sched_start(&ring->sched, 0);
 		}
 	}
 
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
index 1cd7d355689c..5891312e44ea 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
@@ -5879,7 +5879,7 @@ int amdgpu_device_gpu_recover(struct amdgpu_device *adev,
 			if (!amdgpu_ring_sched_ready(ring))
 				continue;
 
-			drm_sched_start(&ring->sched);
+			drm_sched_start(&ring->sched, 0);
 		}
 
 		if (!drm_drv_uses_atomic_modeset(adev_to_drm(tmp_adev)) && !job_signaled)
@@ -6374,7 +6374,7 @@ void amdgpu_pci_resume(struct pci_dev *pdev)
 		if (!amdgpu_ring_sched_ready(ring))
 			continue;
 
-		drm_sched_start(&ring->sched);
+		drm_sched_start(&ring->sched, 0);
 	}
 
 	amdgpu_device_unset_mp1_state(adev);
diff --git a/drivers/gpu/drm/etnaviv/etnaviv_sched.c b/drivers/gpu/drm/etnaviv/etnaviv_sched.c
index ab9ca4824b62..23ced5896c7c 100644
--- a/drivers/gpu/drm/etnaviv/etnaviv_sched.c
+++ b/drivers/gpu/drm/etnaviv/etnaviv_sched.c
@@ -72,7 +72,7 @@ static enum drm_gpu_sched_stat etnaviv_sched_timedout_job(struct drm_sched_job
 
 	drm_sched_resubmit_jobs(&gpu->sched);
 
-	drm_sched_start(&gpu->sched);
+	drm_sched_start(&gpu->sched, 0);
 	return DRM_GPU_SCHED_STAT_NOMINAL;
 
 out_no_timeout:
diff --git a/drivers/gpu/drm/imagination/pvr_queue.c b/drivers/gpu/drm/imagination/pvr_queue.c
index 20cb46012082..c4f08432882b 100644
--- a/drivers/gpu/drm/imagination/pvr_queue.c
+++ b/drivers/gpu/drm/imagination/pvr_queue.c
@@ -782,7 +782,7 @@ static void pvr_queue_start(struct pvr_queue *queue)
 		}
 	}
 
-	drm_sched_start(&queue->scheduler);
+	drm_sched_start(&queue->scheduler, 0);
 }
 
 /**
@@ -842,7 +842,7 @@ pvr_queue_timedout_job(struct drm_sched_job *s_job)
 	}
 	mutex_unlock(&pvr_dev->queues.lock);
 
-	drm_sched_start(sched);
+	drm_sched_start(sched, 0);
 
 	return DRM_GPU_SCHED_STAT_NOMINAL;
 }
diff --git a/drivers/gpu/drm/lima/lima_sched.c b/drivers/gpu/drm/lima/lima_sched.c
index 1a944edb6ddc..b40c90e97d7e 100644
--- a/drivers/gpu/drm/lima/lima_sched.c
+++ b/drivers/gpu/drm/lima/lima_sched.c
@@ -463,7 +463,7 @@ static enum drm_gpu_sched_stat lima_sched_timedout_job(struct drm_sched_job *job
 	lima_pm_idle(ldev);
 
 	drm_sched_resubmit_jobs(&pipe->base);
-	drm_sched_start(&pipe->base);
+	drm_sched_start(&pipe->base, 0);
 
 	return DRM_GPU_SCHED_STAT_NOMINAL;
 }
diff --git a/drivers/gpu/drm/nouveau/nouveau_sched.c b/drivers/gpu/drm/nouveau/nouveau_sched.c
index eb6c3f9a01f5..4412f2711fb5 100644
--- a/drivers/gpu/drm/nouveau/nouveau_sched.c
+++ b/drivers/gpu/drm/nouveau/nouveau_sched.c
@@ -379,7 +379,7 @@ nouveau_sched_timedout_job(struct drm_sched_job *sched_job)
 	else
 		NV_PRINTK(warn, job->cli, "Generic job timeout.\n");
 
-	drm_sched_start(sched);
+	drm_sched_start(sched, 0);
 
 	return stat;
 }
diff --git a/drivers/gpu/drm/panfrost/panfrost_job.c b/drivers/gpu/drm/panfrost/panfrost_job.c
index df49d37d0e7e..d140800606bf 100644
--- a/drivers/gpu/drm/panfrost/panfrost_job.c
+++ b/drivers/gpu/drm/panfrost/panfrost_job.c
@@ -727,7 +727,7 @@ panfrost_reset(struct panfrost_device *pfdev,
 
 	/* Restart the schedulers */
 	for (i = 0; i < NUM_JOB_SLOTS; i++)
-		drm_sched_start(&pfdev->js->queue[i].sched);
+		drm_sched_start(&pfdev->js->queue[i].sched, 0);
 
 	/* Re-enable job interrupts now that everything has been restarted. */
 	job_write(pfdev, JOB_INT_MASK,
diff --git a/drivers/gpu/drm/panthor/panthor_mmu.c b/drivers/gpu/drm/panthor/panthor_mmu.c
index d47972806d50..e630cdf47f99 100644
--- a/drivers/gpu/drm/panthor/panthor_mmu.c
+++ b/drivers/gpu/drm/panthor/panthor_mmu.c
@@ -827,7 +827,7 @@ static void panthor_vm_stop(struct panthor_vm *vm)
 
 static void panthor_vm_start(struct panthor_vm *vm)
 {
-	drm_sched_start(&vm->sched);
+	drm_sched_start(&vm->sched, 0);
 }
 
 /**
diff --git a/drivers/gpu/drm/scheduler/sched_main.c b/drivers/gpu/drm/scheduler/sched_main.c
index ab53ab486fe6..f093616fe53c 100644
--- a/drivers/gpu/drm/scheduler/sched_main.c
+++ b/drivers/gpu/drm/scheduler/sched_main.c
@@ -674,9 +674,10 @@ EXPORT_SYMBOL(drm_sched_stop);
  * drm_sched_start - recover jobs after a reset
  *
  * @sched: scheduler instance
+ * @errno: error to set on the pending fences
  *
  */
-void drm_sched_start(struct drm_gpu_scheduler *sched)
+void drm_sched_start(struct drm_gpu_scheduler *sched, int errno)
 {
 	struct drm_sched_job *s_job, *tmp;
 
@@ -691,13 +692,13 @@ void drm_sched_start(struct drm_gpu_scheduler *sched)
 		atomic_add(s_job->credits, &sched->credit_count);
 
 		if (!fence) {
-			drm_sched_job_done(s_job, -ECANCELED);
+			drm_sched_job_done(s_job, errno ?: -ECANCELED);
 			continue;
 		}
 
 		if (dma_fence_add_callback(fence, &s_job->cb,
 					   drm_sched_job_done_cb))
-			drm_sched_job_done(s_job, fence->error);
+			drm_sched_job_done(s_job, fence->error ?: errno);
 	}
 
 	drm_sched_start_timeout_unlocked(sched);
diff --git a/drivers/gpu/drm/v3d/v3d_sched.c b/drivers/gpu/drm/v3d/v3d_sched.c
index fd29a00b233c..b6a89171824b 100644
--- a/drivers/gpu/drm/v3d/v3d_sched.c
+++ b/drivers/gpu/drm/v3d/v3d_sched.c
@@ -663,7 +663,7 @@ v3d_gpu_reset_for_timeout(struct v3d_dev *v3d, struct drm_sched_job *sched_job)
 
 	/* Unblock schedulers and restart their jobs. */
 	for (q = 0; q < V3D_MAX_QUEUES; q++) {
-		drm_sched_start(&v3d->queue[q].sched);
+		drm_sched_start(&v3d->queue[q].sched, 0);
 	}
 
 	mutex_unlock(&v3d->reset_lock);
diff --git a/include/drm/gpu_scheduler.h b/include/drm/gpu_scheduler.h
index fe8edb917360..a8d19b10f9b8 100644
--- a/include/drm/gpu_scheduler.h
+++ b/include/drm/gpu_scheduler.h
@@ -579,7 +579,7 @@ bool drm_sched_wqueue_ready(struct drm_gpu_scheduler *sched);
 void drm_sched_wqueue_stop(struct drm_gpu_scheduler *sched);
 void drm_sched_wqueue_start(struct drm_gpu_scheduler *sched);
 void drm_sched_stop(struct drm_gpu_scheduler *sched, struct drm_sched_job *bad);
-void drm_sched_start(struct drm_gpu_scheduler *sched);
+void drm_sched_start(struct drm_gpu_scheduler *sched, int errno);
 void drm_sched_resubmit_jobs(struct drm_gpu_scheduler *sched);
 void drm_sched_increase_karma(struct drm_sched_job *bad);
 void drm_sched_reset_karma(struct drm_sched_job *bad);

From patchwork Mon Aug 26 12:25:39 2024
X-Patchwork-Submitter: Christian König
X-Patchwork-Id: 13777799
From: Christian König
To: daniel.vetter@ffwll.ch, vitaly.prosyak@amd.com
Cc: dri-devel@lists.freedesktop.org, ltuikov89@gmail.com
Subject: [PATCH 2/4] dma-buf: give examples of error codes to use
Date: Mon, 26 Aug 2024 14:25:39 +0200
Message-Id: <20240826122541.85663-2-christian.koenig@amd.com>
In-Reply-To: <20240826122541.85663-1-christian.koenig@amd.com>
References: <20240826122541.85663-1-christian.koenig@amd.com>

The dma_fence_set_error() function allows setting an error code on a
dma_fence object before it is signaled. Document some of the potential
error codes drivers should use and especially what they mean.

Signed-off-by: Christian König
---
 include/linux/dma-fence.h | 6 ++++++
 1 file changed, 6 insertions(+)

diff --git a/include/linux/dma-fence.h b/include/linux/dma-fence.h
index e06bad467f55..e7ad819962e3 100644
--- a/include/linux/dma-fence.h
+++ b/include/linux/dma-fence.h
@@ -574,6 +574,12 @@ int dma_fence_get_status(struct dma_fence *fence);
  * rather than success. This must be set before signaling (so that the value
  * is visible before any waiters on the signal callback are woken). This
  * helper exists to help catching erroneous setting of #dma_fence.error.
+ *
+ * Examples of error codes which drivers should use:
+ *
+ * * %-ENODATA	This operation produced no data, no other operation affected.
+ * * %-ECANCELED	All operations from the same context have been canceled.
+ * * %-ETIME	Operation caused a timeout and potentially device reset.
  */
 static inline void dma_fence_set_error(struct dma_fence *fence,
 				       int error)

From patchwork Mon Aug 26 12:25:40 2024
X-Patchwork-Submitter: Christian König
X-Patchwork-Id: 13777801
From: Christian König
To: daniel.vetter@ffwll.ch, vitaly.prosyak@amd.com
Cc: dri-devel@lists.freedesktop.org, ltuikov89@gmail.com
Subject: [PATCH 3/4] drm/doc: Document submission error signaling
Date: Mon, 26 Aug 2024 14:25:40 +0200
Message-Id: <20240826122541.85663-3-christian.koenig@amd.com>
In-Reply-To: <20240826122541.85663-1-christian.koenig@amd.com>
References: <20240826122541.85663-1-christian.koenig@amd.com>

Different approaches have been tried to signal resets and other errors
in vendor specific ways, which not only resulted in a wide variety of
implementations but also in the same bugs and problems being repeated
across different drivers.

Document that drivers should use dma_fence based error signaling, which
is vendor agnostic and allows userspace to query submission errors in
generic non-vendor specific code.

Signed-off-by: Christian König
---
 Documentation/gpu/drm-uapi.rst | 27 ++++++++++++++++++++-------
 1 file changed, 20 insertions(+), 7 deletions(-)

diff --git a/Documentation/gpu/drm-uapi.rst b/Documentation/gpu/drm-uapi.rst
index 370d820be248..b75cc9a70d1f 100644
--- a/Documentation/gpu/drm-uapi.rst
+++ b/Documentation/gpu/drm-uapi.rst
@@ -305,13 +305,26 @@ Kernel Mode Driver
 ------------------
 
 The KMD is responsible for checking if the device needs a reset, and to perform
-it as needed. Usually a hang is detected when a job gets stuck executing. KMD
-should keep track of resets, because userspace can query any time about the
-reset status for a specific context. This is needed to propagate to the rest of
-the stack that a reset has happened. Currently, this is implemented by each
-driver separately, with no common DRM interface. Ideally this should be properly
-integrated at DRM scheduler to provide a common ground for all drivers. After a
-reset, KMD should reject new command submissions for affected contexts.
+it as needed. Usually a hang is detected when a job gets stuck executing.
+
+Propagation of errors to userspace has proven to be tricky since it goes in
+the opposite direction of the usual flow of commands. Because of this vendor
+independent error handling was added to the &dma_fence object, this way drivers
+can add an error code to their fences before signaling them. See function
+dma_fence_set_error() on how to do this and for examples of error codes to use.
+
+The DRM scheduler also allows setting error codes on all pending fences when
+hardware submissions are restarted after a reset. Error codes are also
+forwarded from the hardware fence to the scheduler fence to bubble up errors
+to the higher levels of the stack and eventually userspace.
+
+Fence errors can be queried by userspace through the generic SYNC_IOC_FILE_INFO
+IOCTL as well as through driver specific interfaces.
+
+In addition to setting fence errors drivers should also keep track of resets per
+context, the DRM scheduler provides the drm_sched_entity_error() function as
+helper for this use case. After a reset, KMD should reject new command
+submissions for affected contexts.
 
 User Mode Driver
 ----------------

From patchwork Mon Aug 26 12:25:41 2024
X-Patchwork-Submitter: Christian König
X-Patchwork-Id: 13777802
From: Christian König
To: daniel.vetter@ffwll.ch, vitaly.prosyak@amd.com
Cc: dri-devel@lists.freedesktop.org, ltuikov89@gmail.com
Subject: [PATCH 4/4] drm/todos: add entry for drm_syncobj error handling
Date: Mon, 26 Aug 2024 14:25:41 +0200
Message-Id: <20240826122541.85663-4-christian.koenig@amd.com>
In-Reply-To: <20240826122541.85663-1-christian.koenig@amd.com>
References: <20240826122541.85663-1-christian.koenig@amd.com>

That would be rather nice to have and the kernel side is really trivial,
only the userspace side might be a bit more complex.

Signed-off-by: Christian König
Acked-by: Daniel Vetter
---
 Documentation/gpu/todo.rst | 16 ++++++++++++++++
 1 file changed, 16 insertions(+)

diff --git a/Documentation/gpu/todo.rst b/Documentation/gpu/todo.rst
index 96c453980ab6..c771f0c9610f 100644
--- a/Documentation/gpu/todo.rst
+++ b/Documentation/gpu/todo.rst
@@ -834,6 +834,22 @@ Contact: Javier Martinez Canillas
 
 Level: Advanced
 
+Querying errors from drm_syncobj
+================================
+
+The drm_syncobj container can be used by driver independent code to signal
+completion of submission.
+
+One minor feature still missing is a generic DRM IOCTL to query the error
+status of binary and timeline drm_syncobj.
+
+This should probably be improved by implementing the necessary kernel interface
+and adding support for that in the userspace stack.
+
+Contact: Christian König
+
+Level: Starter
+
 Outside DRM
 ===========
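
Note on the userspace side described in patch 3/4: a minimal sketch of
querying a fence error through SYNC_IOC_FILE_INFO could look like the
code below. The ioctl and struct sync_file_info come from the
<linux/sync_file.h> UAPI header. The helper name
sync_file_query_status() and its return convention are hypothetical and
not part of this series, and which of the codes from patch 2/4 a given
driver actually reports is an assumption.

```c
#include <assert.h>
#include <errno.h>
#include <string.h>
#include <sys/ioctl.h>
#include <linux/sync_file.h>

/*
 * Hypothetical helper, not part of this series.
 *
 * Queries the aggregate status of the sync_file behind @fd and stores it
 * in @status: 0 while the fence is still pending, 1 once it signaled
 * without error, or a negative errno (for example -ENODATA, -ECANCELED
 * or -ETIME) if an error was set on the fence before it signaled.
 *
 * Returns 0 on success or -errno if the ioctl itself failed. @status is
 * left untouched in the failure case.
 */
static int sync_file_query_status(int fd, int *status)
{
	struct sync_file_info info;

	assert(status != NULL);

	/* Leaving num_fences at 0 requests only the aggregate status. */
	memset(&info, 0, sizeof(info));

	if (ioctl(fd, SYNC_IOC_FILE_INFO, &info) < 0)
		return -errno;

	*status = info.status;
	return 0;
}
```

A caller would pass a sync_file fd, for instance an out-fence returned by
a driver specific submission ioctl, and treat a negative *status as the
submission error. Keeping the ioctl failure separate from the fence
status avoids confusing an -EBADF from the call with an error set by the
driver.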