From patchwork Thu Mar 27 18:00:18 2014 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: oscar.mateo@intel.com X-Patchwork-Id: 3898581 Return-Path: X-Original-To: patchwork-intel-gfx@patchwork.kernel.org Delivered-To: patchwork-parsemail@patchwork2.web.kernel.org Received: from mail.kernel.org (mail.kernel.org [198.145.19.201]) by patchwork2.web.kernel.org (Postfix) with ESMTP id 0D495BF540 for ; Thu, 27 Mar 2014 17:09:19 +0000 (UTC) Received: from mail.kernel.org (localhost [127.0.0.1]) by mail.kernel.org (Postfix) with ESMTP id 1C7E220240 for ; Thu, 27 Mar 2014 17:09:18 +0000 (UTC) Received: from gabe.freedesktop.org (gabe.freedesktop.org [131.252.210.177]) by mail.kernel.org (Postfix) with ESMTP id 18ECE20237 for ; Thu, 27 Mar 2014 17:09:17 +0000 (UTC) Received: from gabe.freedesktop.org (localhost [127.0.0.1]) by gabe.freedesktop.org (Postfix) with ESMTP id A906D6E9E8; Thu, 27 Mar 2014 10:09:16 -0700 (PDT) X-Original-To: intel-gfx@lists.freedesktop.org Delivered-To: intel-gfx@lists.freedesktop.org Received: from mga11.intel.com (mga11.intel.com [192.55.52.93]) by gabe.freedesktop.org (Postfix) with ESMTP id CAA046E9EB for ; Thu, 27 Mar 2014 10:09:13 -0700 (PDT) Received: from fmsmga001.fm.intel.com ([10.253.24.23]) by fmsmga102.fm.intel.com with ESMTP; 27 Mar 2014 10:08:08 -0700 X-ExtLoop1: 1 X-IronPort-AV: E=Sophos;i="4.97,743,1389772800"; d="scan'208";a="501087173" Received: from omateolo-linux2.iwi.intel.com ([172.28.253.148]) by fmsmga001.fm.intel.com with ESMTP; 27 Mar 2014 10:06:45 -0700 From: oscar.mateo@intel.com To: intel-gfx@lists.freedesktop.org Date: Thu, 27 Mar 2014 18:00:18 +0000 Message-Id: <1395943218-7708-50-git-send-email-oscar.mateo@intel.com> X-Mailer: git-send-email 1.9.0 In-Reply-To: <1395943218-7708-1-git-send-email-oscar.mateo@intel.com> References: <1395943218-7708-1-git-send-email-oscar.mateo@intel.com> Cc: Thomas Daniel Subject: [Intel-gfx] [PATCH 49/49] drm/i915/bdw: Document execlists and logical ring contexts X-BeenThere: intel-gfx@lists.freedesktop.org X-Mailman-Version: 2.1.15 Precedence: list List-Id: Intel graphics driver community testing & development List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , MIME-Version: 1.0 Errors-To: intel-gfx-bounces@lists.freedesktop.org Sender: "Intel-gfx" X-Spam-Status: No, score=-4.6 required=5.0 tests=BAYES_00, RCVD_IN_DNSWL_MED, RP_MATCHES_RCVD, UNPARSEABLE_RELAY autolearn=unavailable version=3.3.1 X-Spam-Checker-Version: SpamAssassin 3.3.1 (2010-03-16) on mail.kernel.org X-Virus-Scanned: ClamAV using ClamSMTP From: Oscar Mateo Explain i915_lrc.c with some execlists notes Signed-off-by: Thomas Daniel v2: Add notes on logical ring context creation. Signed-off-by: Oscar Mateo --- drivers/gpu/drm/i915/i915_lrc.c | 78 +++++++++++++++++++++++++++++++++++++++++ 1 file changed, 78 insertions(+) diff --git a/drivers/gpu/drm/i915/i915_lrc.c b/drivers/gpu/drm/i915/i915_lrc.c index 025dae7..521abe9 100644 --- a/drivers/gpu/drm/i915/i915_lrc.c +++ b/drivers/gpu/drm/i915/i915_lrc.c @@ -33,8 +33,86 @@ * These expanded contexts enable a number of new abilities, especially * "Execlists" (also implemented in this file). * + * One of the main differences with the legacy HW contexts is that logical + * ring contexts incorporate many more things to the context's state, like + * PDPs or ringbuffer control registers. + * + * Regarding the creation of contexts, we had before: + * + * - One global default context. + * - One local default context for each opened fd. + * - One extra context for each context create ioctl call. + * + * Now that ringbuffers belong per-context (and not per-engine, like before) and + * that contexts are uniquely tied to a given engine (and not reusable, like + * before) we need: + * + * - One global default context for each engine. + * - Up to "no. of engines" local default contexts for each opened fd. + * - Up to "no. of engines" extra local contexts for each context create ioctl. + * + * Given that at creation time of a non-global context we don't know which + * engine is going to use it, we have implemented a deferred creation of + * LR contexts: the local default context starts its life as a hollow or + * blank holder, that gets populated once we receive an execbuffer ioctl on + * that fd. If later on we receive another execbuffer ioctl for a different + * engine, we create a second local default context and so on. The same rules + * apply to create context. + * * Execlists are the new method by which, on gen8+ hardware, workloads are * submitted for execution (as opposed to the legacy, ringbuffer-based, method). + * This method works as follows: + * + * When a request is committed, its commands (the BB start and any leading or + * trailing commands, like the seqno breadcrumbs) are placed in the ringbuffer + * for the appropriate context. The tail pointer in the hardware context is not + * updated at this time, but instead, kept by the driver in the ringbuffer + * structure. A structure representing this request is added to a request queue + * for the appropriate engine: this structure contains a copy of the context's + * tail after the request was written to the ring buffer and a pointer to the + * context itself. + * + * If the engine's request queue was empty before the request was added, the + * queue is processed immediately. Otherwise the queue will be processed during + * a context switch interrupt. In any case, elements on the queue will get sent + * (in pairs) to the GPU's ExecLists Submit Port (ELSP, for short) with a + * globally unique 20-bits context ID (constructed with the fd's ID, plus our + * own context ID, plus the engine's ID). + * + * When execution of a request completes, the GPU updates the context status + * buffer with a context complete event and generates a context switch interrupt. + * During context switch interrupt handling, the driver examines the context + * status events in the context status buffer: for each context complete event, + * if the announced ID matches that on the head of the request queue then that + * request is retired and removed from the queue. + * + * After processing, if any requests were retired and the queue is not empty + * then a new execution list can be submitted. The two requests at the front of + * the queue are next to be submitted but since a context may not occur twice in + * an execution list, if subsequent requests have the same ID as the first then + * the two requests must be combined. This is done simply by discarding requests + * at the head of the queue until either only one requests is left (in which case + * we use a NULL second context) or the first two requests have unique IDs. + * + * By always executing the first two requests in the queue the driver ensures + * that the GPU is kept as busy as possible. In the case where a single context + * completes but a second context is still executing, the request for the second + * context will be at the head of the queue when we remove the first one. This + * request will then be resubmitted along with a new request for a different context, + * which will cause the hardware to continue executing the second request and queue + * the new request (the GPU detects the condition of a context getting preempted + * with the same context and optimizes the context switch flow by not doing + * preemption, but just sampling the new tail pointer). + * + * Because the GPU continues to execute while the context switch interrupt is being + * handled, there is a race condition where a second context completes while + * handling the completion of the previous. This results in the second context being + * resubmitted (potentially along with a third), and an extra context complete event + * for that context will occur. The request will be removed from the queue at the + * first context complete event, and the second context complete event will not + * result in removal of a request from the queue because the IDs of the request + * and the event will not match. + * */ #include