From patchwork Fri Jun 12 17:09:08 2015
From: Dave Gordon <david.s.gordon@intel.com>
To: intel-gfx@lists.freedesktop.org
Date: Fri, 12 Jun 2015 18:09:08 +0100
Message-Id: <1434128948-9221-3-git-send-email-david.s.gordon@intel.com>
In-Reply-To: <1434128948-9221-1-git-send-email-david.s.gordon@intel.com>
References: <1433789441-8295-1-git-send-email-david.s.gordon@intel.com>
	<1434128948-9221-1-git-send-email-david.s.gordon@intel.com>
Subject: [Intel-gfx] [PATCH 2/2] drm/i915: Rework order of operations in {__intel,logical}_ring_prepare()

The original idea of preallocating the OLR was implemented in

> 9d773091 drm/i915: Preallocate next seqno before touching the ring

and the sequence of operations was to allocate the OLR, then wrap past
the end of the ring if necessary, then wait for space if necessary.
But subsequently intel_ring_begin() was refactored, in

> 304d695 drm/i915: Flush outstanding requests before allocating new seqno

to ensure that pending work that might need to be flushed used the old
and not the newly-allocated request. This changed the sequence to wrap
and/or wait, then allocate, although the comment still said

	/* Preallocate the olr before touching the ring */

which was no longer true as intel_wrap_ring_buffer() touches the ring.
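In outline, the two orderings so far can be sketched like this (a rough
standalone illustration only; the helper names are placeholders, not
the driver's real functions):

	/* Sketch only: placeholder helpers, not driver code. */
	#include <stdio.h>

	static void alloc_request(void)  { puts("allocate request"); }
	static void wrap_if_needed(void) { puts("pad to end of ring if the request won't fit"); }
	static void wait_for_space(void) { puts("wait until enough space is free"); }

	int main(void)
	{
		/* 9d773091: allocate before touching the ring */
		alloc_request(); wrap_if_needed(); wait_for_space();

		/* 304d695: make room first, allocate last -- but wrapping
		 * already writes NOOPs into the ring before the request exists */
		wrap_if_needed(); wait_for_space(); alloc_request();

		return 0;
	}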
However, with the introduction of dynamic pinning, in

> 7ba717c drm/i915/bdw: Pin the ringbuffer backing object to GGTT on-demand

came the possibility that the ringbuffer might not be pinned to the GTT
or mapped into CPU address space when intel_ring_begin() is called. It
gets pinned when the request is allocated, so it's now important that
this comes *before* anything that can write into the ringbuffer, in
this case intel_wrap_ring_buffer(), as this will fault if (a) the
ringbuffer happens not to be mapped, and (b) tail happens to be
sufficiently close to the end of the ring to trigger wrapping.

So the correct order is neither the original allocate-wait-pad-wait,
nor the subsequent wait-pad-wait-allocate, but wait-allocate-pad,
avoiding both the problems described in the two commits mentioned
above. As a bonus, we eliminate the special case where a single
ring_begin() might end up waiting twice (once to be able to wrap, and
then again if that still hadn't actually freed enough space for the
request). We just precalculate the total amount of space we'll need,
*including* any needed for padding at the end of the ring, and wait
for that much in one go :)

In the time since this code was written, it has all been cloned from
the original ringbuffer model to become the execlists code, in

> 82e104c drm/i915/bdw: New logical ring submission mechanism

So now we have to fix it in both paths ...

Signed-off-by: Dave Gordon <david.s.gordon@intel.com>
---
 drivers/gpu/drm/i915/intel_lrc.c        | 64 +++++++++++++++----------------
 drivers/gpu/drm/i915/intel_ringbuffer.c | 63 +++++++++++++++---------------
 2 files changed, 64 insertions(+), 63 deletions(-)

diff --git a/drivers/gpu/drm/i915/intel_lrc.c b/drivers/gpu/drm/i915/intel_lrc.c
index 454e836..3ef5fb6 100644
--- a/drivers/gpu/drm/i915/intel_lrc.c
+++ b/drivers/gpu/drm/i915/intel_lrc.c
@@ -740,39 +740,22 @@ intel_logical_ring_advance_and_submit(struct intel_ringbuffer *ringbuf,
 	execlists_context_queue(ring, ctx, ringbuf->tail, request);
 }
 
-static int logical_ring_wrap_buffer(struct intel_ringbuffer *ringbuf,
-				    struct intel_context *ctx)
-{
-	uint32_t __iomem *virt;
-	int rem = ringbuf->size - ringbuf->tail;
-
-	if (ringbuf->space < rem) {
-		int ret = logical_ring_wait_for_space(ringbuf, ctx, rem);
-
-		if (ret)
-			return ret;
-	}
-
-	virt = ringbuf->virtual_start + ringbuf->tail;
-	rem /= 4;
-	while (rem--)
-		iowrite32(MI_NOOP, virt++);
-
-	ringbuf->tail = 0;
-	intel_ring_update_space(ringbuf);
-
-	return 0;
-}
-
 static int logical_ring_prepare(struct intel_ringbuffer *ringbuf,
 				struct intel_context *ctx, int bytes)
 {
+	int fill = 0;
 	int ret;
 
+	/*
+	 * If the request will not fit between 'tail' and the effective
+	 * size of the ringbuffer, then we need to pad the end of the
+	 * ringbuffer with NOOPs, then start the request at the top of
+	 * the ring. This increases the total size that we need to check
+	 * for by however much is left at the end of the ring ...
+	 */
 	if (unlikely(ringbuf->tail + bytes > ringbuf->effective_size)) {
-		ret = logical_ring_wrap_buffer(ringbuf, ctx);
-		if (unlikely(ret))
-			return ret;
+		fill = ringbuf->size - ringbuf->tail;
+		bytes += fill;
 	}
 
 	if (unlikely(ringbuf->space < bytes)) {
@@ -781,6 +764,28 @@ static int logical_ring_prepare(struct intel_ringbuffer *ringbuf,
 			return ret;
 	}
 
+	/* Ensure we have a request before touching the ring */
+	if (!ringbuf->ring->outstanding_lazy_request) {
+		ret = i915_gem_request_alloc(ringbuf->ring, ctx);
+		if (ret)
+			return ret;
+	}
+
+	if (unlikely(fill)) {
+		uint32_t __iomem *virt = ringbuf->virtual_start + ringbuf->tail;
+
+		/* tail should not have moved */
+		if (WARN_ON(fill != ringbuf->size - ringbuf->tail))
+			fill = ringbuf->size - ringbuf->tail;
+
+		do
+			iowrite32(MI_NOOP, virt++);
+		while ((fill -= 4) > 0);
+
+		ringbuf->tail = 0;
+		intel_ring_update_space(ringbuf);
+	}
+
 	return 0;
 }
 
@@ -814,11 +819,6 @@ static int intel_logical_ring_begin(struct intel_ringbuffer *ringbuf,
 	if (ret)
 		return ret;
 
-	/* Preallocate the olr before touching the ring */
-	ret = i915_gem_request_alloc(ring, ctx);
-	if (ret)
-		return ret;
-
 	ringbuf->space -= num_dwords * sizeof(uint32_t);
 	return 0;
 }
diff --git a/drivers/gpu/drm/i915/intel_ringbuffer.c b/drivers/gpu/drm/i915/intel_ringbuffer.c
index a3406b2..4c0bc29 100644
--- a/drivers/gpu/drm/i915/intel_ringbuffer.c
+++ b/drivers/gpu/drm/i915/intel_ringbuffer.c
@@ -2137,29 +2137,6 @@ static int ring_wait_for_space(struct intel_engine_cs *ring, int n)
 	return 0;
 }
 
-static int intel_wrap_ring_buffer(struct intel_engine_cs *ring)
-{
-	uint32_t __iomem *virt;
-	struct intel_ringbuffer *ringbuf = ring->buffer;
-	int rem = ringbuf->size - ringbuf->tail;
-
-	if (ringbuf->space < rem) {
-		int ret = ring_wait_for_space(ring, rem);
-		if (ret)
-			return ret;
-	}
-
-	virt = ringbuf->virtual_start + ringbuf->tail;
-	rem /= 4;
-	while (rem--)
-		iowrite32(MI_NOOP, virt++);
-
-	ringbuf->tail = 0;
-	intel_ring_update_space(ringbuf);
-
-	return 0;
-}
-
 int intel_ring_idle(struct intel_engine_cs *ring)
 {
 	struct drm_i915_gem_request *req;
@@ -2197,12 +2174,19 @@ static int __intel_ring_prepare(struct intel_engine_cs *ring,
 				int bytes)
 {
 	struct intel_ringbuffer *ringbuf = ring->buffer;
+	int fill = 0;
 	int ret;
 
+	/*
+	 * If the request will not fit between 'tail' and the effective
+	 * size of the ringbuffer, then we need to pad the end of the
+	 * ringbuffer with NOOPs, then start the request at the top of
+	 * the ring. This increases the total size that we need to check
+	 * for by however much is left at the end of the ring ...
+	 */
 	if (unlikely(ringbuf->tail + bytes > ringbuf->effective_size)) {
-		ret = intel_wrap_ring_buffer(ring);
-		if (unlikely(ret))
-			return ret;
+		fill = ringbuf->size - ringbuf->tail;
+		bytes += fill;
 	}
 
 	if (unlikely(ringbuf->space < bytes)) {
@@ -2211,6 +2195,28 @@ static int __intel_ring_prepare(struct intel_engine_cs *ring,
 			return ret;
 	}
 
+	/* Ensure we have a request before touching the ring */
+	if (!ring->outstanding_lazy_request) {
+		ret = i915_gem_request_alloc(ring, ring->default_context);
+		if (ret)
+			return ret;
+	}
+
+	if (unlikely(fill)) {
+		uint32_t __iomem *virt = ringbuf->virtual_start + ringbuf->tail;
+
+		/* tail should not have moved */
+		if (WARN_ON(fill != ringbuf->size - ringbuf->tail))
+			fill = ringbuf->size - ringbuf->tail;
+
+		do
+			iowrite32(MI_NOOP, virt++);
+		while ((fill -= 4) > 0);
+
+		ringbuf->tail = 0;
+		intel_ring_update_space(ringbuf);
+	}
+
 	return 0;
 }
 
@@ -2229,11 +2235,6 @@ int intel_ring_begin(struct intel_engine_cs *ring,
 	if (ret)
 		return ret;
 
-	/* Preallocate the olr before touching the ring */
-	ret = i915_gem_request_alloc(ring, ring->default_context);
-	if (ret)
-		return ret;
-
 	ring->buffer->space -= num_dwords * sizeof(uint32_t);
 	return 0;
 }
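
For reference, the space calculation that both prepare() functions now
perform can be modelled in isolation as below. This is only an
illustrative sketch with made-up numbers, not driver code; the real
values come from the ringbuffer state:

	#include <stdio.h>

	int main(void)
	{
		int size = 4096;			/* total ring size, in bytes (illustrative) */
		int effective_size = 4096 - 128;	/* usable size, slightly less than the full ring */
		int tail = 4000;			/* current write offset (illustrative) */
		int bytes = 256;			/* size of the new request (illustrative) */
		int fill = 0;

		if (tail + bytes > effective_size) {
			fill = size - tail;	/* NOOP padding up to the end of the ring */
			bytes += fill;		/* a single wait now covers padding + request */
		}

		/* prints: pad 96 bytes, wait for 352 bytes in total */
		printf("pad %d bytes, wait for %d bytes in total\n", fill, bytes);
		return 0;
	}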