From: Christian König
To: airlied@gmail.com, ltuikov89@gmail.com, dakr@redhat.com, dri-devel@lists.freedesktop.org, matthew.brost@intel.com, boris.brezillon@collabora.com, alexander.deucher@amd.com
Subject: [PATCH] drm/scheduler: improve GPU scheduler documentation
Date: Mon, 13 Nov 2023 13:38:32 +0100
Message-Id: <20231113123832.120710-1-christian.koenig@amd.com>
X-Patchwork-Id: 13453946

Start to improve the scheduler document.
Especially document the lifetime of each of the objects as well as the
restrictions around DMA-fence handling and userspace compatibility.

Signed-off-by: Christian König
---
 drivers/gpu/drm/scheduler/sched_main.c | 126 ++++++++++++++++++++-----
 1 file changed, 104 insertions(+), 22 deletions(-)

diff --git a/drivers/gpu/drm/scheduler/sched_main.c b/drivers/gpu/drm/scheduler/sched_main.c
index 506371c42745..36a7c5dc852d 100644
--- a/drivers/gpu/drm/scheduler/sched_main.c
+++ b/drivers/gpu/drm/scheduler/sched_main.c
@@ -24,28 +24,110 @@
 /**
  * DOC: Overview
  *
- * The GPU scheduler provides entities which allow userspace to push jobs
- * into software queues which are then scheduled on a hardware run queue.
- * The software queues have a priority among them. The scheduler selects the entities
- * from the run queue using a FIFO. The scheduler provides dependency handling
- * features among jobs. The driver is supposed to provide callback functions for
- * backend operations to the scheduler like submitting a job to hardware run queue,
- * returning the dependencies of a job etc.
- *
- * The organisation of the scheduler is the following:
- *
- * 1. Each hw run queue has one scheduler
- * 2. Each scheduler has multiple run queues with different priorities
- *    (e.g., HIGH_HW,HIGH_SW, KERNEL, NORMAL)
- * 3. Each scheduler run queue has a queue of entities to schedule
- * 4. Entities themselves maintain a queue of jobs that will be scheduled on
- *    the hardware.
- *
- * The jobs in a entity are always scheduled in the order that they were pushed.
- *
- * Note that once a job was taken from the entities queue and pushed to the
- * hardware, i.e. the pending queue, the entity must not be referenced anymore
- * through the jobs entity pointer.
+ * The GPU scheduler implements some logic to decide which command submission
+ * to push next to the hardware. Another major use case for the GPU scheduler
+ * is to enforce correct driver behavior around those command submissions.
+ * Because of this it's also used by drivers which don't need the actual
+ * scheduling functionality.
+ *
+ * To fulfill this task the GPU scheduler uses the following objects:
+ *
+ * 1. The job object which contains a bunch of dependencies in the form of
+ *    DMA-fence objects. Drivers can also implement an optional prepare_job
+ *    callback which returns additional dependencies as DMA-fence objects.
+ *    It's important to note that this callback must follow the DMA-fence rules,
+ *    so it can't easily allocate memory or grab locks under which memory is
+ *    allocated. Drivers should use this as a base class for an object which
+ *    contains the necessary state to push the command submission to the
+ *    hardware.
+ *
+ *    The lifetime of the job object should at least be from pushing it into the
+ *    scheduler until the scheduler notes through the free callback that a job
+ *    isn't needed any more. Drivers can of course keep their job object alive
+ *    longer than that, but that's outside of the scope of the scheduler
+ *    component. Job initialization is split into two parts,
+ *    drm_sched_job_init() and drm_sched_job_arm(). It's important to note that
+ *    after arming a job drivers must follow the DMA-fence rules and can't
+ *    easily allocate memory or take locks under which memory is allocated.
+ *
+ * 2. The entity object which is a container for jobs which should execute
+ *    sequentially. Drivers should create an entity for each individual context
+ *    they maintain for command submissions which can run in parallel.
+ *
+ *    The lifetime of the entity should *not* exceed the lifetime of the
+ *    userspace process it was created for and drivers should call the
+ *    drm_sched_entity_flush() function from their file_operations.flush
+ *    callback. The background is that, for compatibility with existing
+ *    userspace, all results of a command submission should become visible
+ *    externally even after a process exits. The only exception to that
+ *    is when the process is actively killed by a SIGKILL. In this case the
+ *    entity object makes sure that jobs are freed without running them while
+ *    still maintaining correct sequential order for signaling fences. So it's
+ *    possible that an entity object is not alive any more while jobs from it
+ *    are still running on the hardware.
+ *
+ * 3. The hardware fence object which is a DMA-fence provided by the driver as
+ *    a result of running jobs. Drivers need to make sure that the normal
+ *    DMA-fence semantics are followed for this object. It's important to note
+ *    that the memory for this object can *not* be allocated in the run_job
+ *    callback since that would violate the requirements for the DMA-fence
+ *    implementation. The scheduler maintains a timeout handler which triggers
+ *    if this fence doesn't signal in a configurable time frame.
+ *
+ *    The lifetime of this object follows DMA-fence ref-counting rules: the
+ *    scheduler takes ownership of the reference returned by the driver and
+ *    drops it when it's not needed any more. Errors should also be signaled
+ *    through the hardware fence and are bubbled up back to the scheduler fence
+ *    and entity.
+ *
+ * 4. The scheduler fence object which encapsulates the whole time from pushing
+ *    the job into the scheduler until the hardware has finished processing it.
+ *    This is internally managed by the scheduler, but drivers can grab
+ *    additional references to it after arming a job. The implementation
+ *    provides DMA-fence interfaces for signaling both scheduling of a command
+ *    submission as well as finishing of processing.
+ *
+ *    The lifetime of this object also follows normal DMA-fence ref-counting
+ *    rules. The finished fence is the one normally exposed outside of the
+ *    scheduler, but the driver can grab references to both the scheduled as
+ *    well as the finished fence when needed for pipelining optimizations.
+ *
+ * 5. The run queue object which is a container of entities for a certain
+ *    priority level. The lifetime of those objects is bound to the scheduler
+ *    lifetime.
+ *
+ *    Run queues are internally managed by the scheduler and drivers shouldn't
+ *    touch them directly.
+ *
+ * 6. The scheduler object itself which does the actual work of selecting a job
+ *    and pushing it to the hardware. Both FIFO and RR selection algorithms are
+ *    supported, but FIFO is preferred for many use cases.
+ *
+ *    The lifetime of this object is managed by the driver using it. Before
+ *    destroying the scheduler the driver must ensure that all hardware
+ *    processing involving this scheduler object has finished by calling for
+ *    example disable_irq(). It is *not* sufficient to wait for the hardware
+ *    fence here since this doesn't guarantee that all callback processing has
+ *    finished.
+ *
+ * All callbacks the driver needs to implement are restricted by DMA-fence
+ * signaling rules to guarantee deadlock-free forward progress. This especially
+ * means that for normal operation no memory can be allocated. All memory which
+ * is needed for pushing the job to the hardware must be allocated before
+ * arming a job. It also means that no locks can be taken under which memory
+ * might be allocated as well.
+ *
+ * Memory which is optional to allocate for device core dumping or debugging
+ * *must* be allocated with GFP_NOWAIT and appropriate error handling taken if
+ * that allocation fails. GFP_ATOMIC should only be used if absolutely
+ * necessary since dipping into the special atomic reserves is usually not
+ * justified for a GPU driver.
+ *
+ * The scheduler also used to provide functionality for re-submitting jobs
+ * while replacing the hardware fence during reset handling. This functionality
+ * is now marked as deprecated since this has proven to be fundamentally racy
+ * and not compatible with DMA-fence rules and shouldn't be used in any new
+ * code.
  */
 
 #include
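
To make the allocate-before-arming rule from the new documentation concrete, here is a minimal userspace sketch in C. It is a toy model under stated assumptions, not kernel code and not the scheduler's API: every toy_* name is hypothetical, malloc() stands in for the kernel allocators, and the simulate_failure flag only imitates an optional GFP_NOWAIT-style allocation that fails.

```c
/* Userspace toy model only; all toy_* names are hypothetical. */
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>

struct toy_job {
	void *cmd_buf;	/* state required to push the job to the hardware */
	void *dump_buf;	/* optional debug memory, allowed to stay NULL */
	bool armed;	/* true once the job has been armed */
};

/* Phase 1: everything the job must have is allocated here, before arming. */
static int toy_job_init(struct toy_job *job, size_t cmd_size)
{
	job->cmd_buf = malloc(cmd_size);
	job->dump_buf = NULL;
	job->armed = false;
	return job->cmd_buf ? 0 : -1;
}

/* Phase 2: from here on the model forbids mandatory allocations. */
static void toy_job_arm(struct toy_job *job)
{
	job->armed = true;
}

/* Stand-in for a mandatory allocation: refused once the job is armed. */
static void *toy_alloc_required(const struct toy_job *job, size_t size)
{
	if (job->armed)
		return NULL;
	return malloc(size);
}

/* Stand-in for an optional allocation that may fail at any time. */
static void *toy_alloc_optional(size_t size, bool simulate_failure)
{
	return simulate_failure ? NULL : malloc(size);
}

/* Models a run callback: it relies only on memory prepared before arming. */
static int toy_run_job(struct toy_job *job, bool simulate_failure)
{
	if (!job->armed || !job->cmd_buf)
		return -1;

	/* Optional debug buffer: on failure, carry on without it. */
	job->dump_buf = toy_alloc_optional(64, simulate_failure);
	return 0;
}

/* Releases everything the job owns and resets it. */
static void toy_job_free(struct toy_job *job)
{
	free(job->dump_buf);
	free(job->cmd_buf);
	job->dump_buf = NULL;
	job->cmd_buf = NULL;
	job->armed = false;
}
```

A caller would run toy_job_init(), then toy_job_arm(), then toy_run_job(), and finally toy_job_free(). In this model a mandatory allocation attempted after arming returns NULL, while a failed optional allocation leaves dump_buf NULL and the job still runs.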