From patchwork Thu Feb 20 06:19:19 2014 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Ben Widawsky X-Patchwork-Id: 3684581 Return-Path: X-Original-To: patchwork-intel-gfx@patchwork.kernel.org Delivered-To: patchwork-parsemail@patchwork2.web.kernel.org Received: from mail.kernel.org (mail.kernel.org [198.145.19.201]) by patchwork2.web.kernel.org (Postfix) with ESMTP id 9F974BF13A for ; Thu, 20 Feb 2014 06:19:57 +0000 (UTC) Received: from mail.kernel.org (localhost [127.0.0.1]) by mail.kernel.org (Postfix) with ESMTP id 4E8612017B for ; Thu, 20 Feb 2014 06:19:56 +0000 (UTC) Received: from gabe.freedesktop.org (gabe.freedesktop.org [131.252.210.177]) by mail.kernel.org (Postfix) with ESMTP id E1EE1201B6 for ; Thu, 20 Feb 2014 06:19:54 +0000 (UTC) Received: from gabe.freedesktop.org (localhost [127.0.0.1]) by gabe.freedesktop.org (Postfix) with ESMTP id EF565F9E28; Wed, 19 Feb 2014 22:19:49 -0800 (PST) X-Original-To: intel-gfx@lists.freedesktop.org Delivered-To: intel-gfx@lists.freedesktop.org Received: from mga02.intel.com (mga02.intel.com [134.134.136.20]) by gabe.freedesktop.org (Postfix) with ESMTP id 8CEA5FB0F2 for ; Wed, 19 Feb 2014 22:19:34 -0800 (PST) Received: from orsmga002.jf.intel.com ([10.7.209.21]) by orsmga101.jf.intel.com with ESMTP; 19 Feb 2014 22:19:34 -0800 X-ExtLoop1: 1 X-IronPort-AV: E=Sophos;i="4.97,510,1389772800"; d="scan'208";a="486366822" Received: from unknown (HELO ironside.amr.corp.intel.com) ([10.255.14.6]) by orsmga002.jf.intel.com with ESMTP; 19 Feb 2014 22:19:33 -0800 From: Ben Widawsky To: Intel GFX Date: Wed, 19 Feb 2014 22:19:19 -0800 Message-Id: <1392877166-9195-6-git-send-email-benjamin.widawsky@intel.com> X-Mailer: git-send-email 1.9.0 In-Reply-To: <1392877166-9195-1-git-send-email-benjamin.widawsky@intel.com> References: <1392877166-9195-1-git-send-email-benjamin.widawsky@intel.com> Cc: Ben Widawsky , Ben Widawsky Subject: [Intel-gfx] [PATCH 06/13] drm/i915/bdw: implement semaphore signal X-BeenThere: intel-gfx@lists.freedesktop.org X-Mailman-Version: 2.1.13 Precedence: list List-Id: Intel graphics driver community testing & development List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , MIME-Version: 1.0 Sender: intel-gfx-bounces@lists.freedesktop.org Errors-To: intel-gfx-bounces@lists.freedesktop.org X-Spam-Status: No, score=-4.8 required=5.0 tests=BAYES_00, RCVD_IN_DNSWL_MED, RP_MATCHES_RCVD, UNPARSEABLE_RELAY autolearn=unavailable version=3.3.1 X-Spam-Checker-Version: SpamAssassin 3.3.1 (2010-03-16) on mail.kernel.org X-Virus-Scanned: ClamAV using ClamSMTP Semaphore signalling works similarly to previous GENs with the exception that the per ring mailboxes no longer exist. Instead you must define your own space, somewhere in the GTT. The comments in the code define the layout I've opted for, which should be fairly future proof. Ie. I tried to define offsets in abstract terms (NUM_RINGS, seqno size, etc). NOTE: If one wanted to move this to the HWSP they could. I've decided one 4k object would be easier to deal with, and provide potential wins with cache locality, but that's all speculative. v2: Update the macro to not need the other ring's ring->id (Chris) Update the comment to use the correct formula (Chris) v3: Move the macros the ringbuffer.h to prevent churn in next patch (Ville) v4: Fixed compilation rebase conflict commit 1ec9e26ddab06459e89a890431b2de064c5d1056 Author: Daniel Vetter Date: Fri Feb 14 14:01:11 2014 +0100 drm/i915: Consolidate binding parameters into flags Signed-off-by: Ben Widawsky --- drivers/gpu/drm/i915/i915_drv.h | 1 + drivers/gpu/drm/i915/i915_reg.h | 5 +- drivers/gpu/drm/i915/intel_ringbuffer.c | 153 +++++++++++++++++++++++--------- drivers/gpu/drm/i915/intel_ringbuffer.h | 73 +++++++++++++-- 4 files changed, 185 insertions(+), 47 deletions(-) diff --git a/drivers/gpu/drm/i915/i915_drv.h b/drivers/gpu/drm/i915/i915_drv.h index 8c64831..028ce5a 100644 --- a/drivers/gpu/drm/i915/i915_drv.h +++ b/drivers/gpu/drm/i915/i915_drv.h @@ -1419,6 +1419,7 @@ typedef struct drm_i915_private { struct pci_dev *bridge_dev; struct intel_ring_buffer ring[I915_NUM_RINGS]; + struct drm_i915_gem_object *semaphore_obj; uint32_t last_seqno, next_seqno; drm_dma_handle_t *status_page_dmah; diff --git a/drivers/gpu/drm/i915/i915_reg.h b/drivers/gpu/drm/i915/i915_reg.h index 2f564ce..4e36d03 100644 --- a/drivers/gpu/drm/i915/i915_reg.h +++ b/drivers/gpu/drm/i915/i915_reg.h @@ -215,7 +215,7 @@ #define MI_DISPLAY_FLIP_IVB_SPRITE_B (3 << 19) #define MI_DISPLAY_FLIP_IVB_PLANE_C (4 << 19) #define MI_DISPLAY_FLIP_IVB_SPRITE_C (5 << 19) -#define MI_SEMAPHORE_MBOX MI_INSTR(0x16, 1) /* gen6+ */ +#define MI_SEMAPHORE_MBOX MI_INSTR(0x16, 1) /* gen6, gen7 */ #define MI_SEMAPHORE_GLOBAL_GTT (1<<22) #define MI_SEMAPHORE_UPDATE (1<<21) #define MI_SEMAPHORE_COMPARE (1<<20) @@ -240,6 +240,8 @@ #define MI_RESTORE_EXT_STATE_EN (1<<2) #define MI_FORCE_RESTORE (1<<1) #define MI_RESTORE_INHIBIT (1<<0) +#define MI_SEMAPHORE_SIGNAL MI_INSTR(0x1b, 0) /* GEN8+ */ +#define MI_SEMAPHORE_TARGET(engine) ((engine)<<15) #define MI_STORE_DWORD_IMM MI_INSTR(0x20, 1) #define MI_MEM_VIRTUAL (1 << 22) /* 965+ only */ #define MI_STORE_DWORD_INDEX MI_INSTR(0x21, 1) @@ -328,6 +330,7 @@ #define PIPE_CONTROL_TEXTURE_CACHE_INVALIDATE (1<<10) /* GM45+ only */ #define PIPE_CONTROL_INDIRECT_STATE_DISABLE (1<<9) #define PIPE_CONTROL_NOTIFY (1<<8) +#define PIPE_CONTROL_FLUSH_ENABLE (1<<7) /* gen7+ */ #define PIPE_CONTROL_VF_CACHE_INVALIDATE (1<<4) #define PIPE_CONTROL_CONST_CACHE_INVALIDATE (1<<3) #define PIPE_CONTROL_STATE_CACHE_INVALIDATE (1<<2) diff --git a/drivers/gpu/drm/i915/intel_ringbuffer.c b/drivers/gpu/drm/i915/intel_ringbuffer.c index 691da67..717976b 100644 --- a/drivers/gpu/drm/i915/intel_ringbuffer.c +++ b/drivers/gpu/drm/i915/intel_ringbuffer.c @@ -621,6 +621,13 @@ static int init_render_ring(struct intel_ring_buffer *ring) static void render_ring_cleanup(struct intel_ring_buffer *ring) { struct drm_device *dev = ring->dev; + struct drm_i915_private *dev_priv = dev->dev_private; + + if (dev_priv->semaphore_obj) { + i915_gem_object_ggtt_unpin(dev_priv->semaphore_obj); + drm_gem_object_unreference(&dev_priv->semaphore_obj->base); + dev_priv->semaphore_obj = NULL; + } if (ring->scratch.obj == NULL) return; @@ -634,6 +641,85 @@ static void render_ring_cleanup(struct intel_ring_buffer *ring) ring->scratch.obj = NULL; } +static int gen8_rcs_signal(struct intel_ring_buffer *signaller, + unsigned int num_dwords) +{ +#define MBOX_UPDATE_DWORDS 8 + struct drm_device *dev = signaller->dev; + struct drm_i915_private *dev_priv = dev->dev_private; + struct intel_ring_buffer *waiter; + int i, ret, num_rings; + + num_rings = hweight_long(INTEL_INFO(dev)->ring_mask); + num_dwords += (num_rings-1) * MBOX_UPDATE_DWORDS; +#undef MBOX_UPDATE_DWORDS + + ret = intel_ring_begin(signaller, num_dwords); + if (ret) + return ret; + + for_each_ring(waiter, dev_priv, i) { + u64 gtt_offset = signaller->semaphore.signal_ggtt[i]; + if (gtt_offset == MI_SEMAPHORE_SYNC_INVALID) + continue; + + intel_ring_emit(signaller, GFX_OP_PIPE_CONTROL(6)); + intel_ring_emit(signaller, PIPE_CONTROL_GLOBAL_GTT_IVB | + PIPE_CONTROL_QW_WRITE | + PIPE_CONTROL_FLUSH_ENABLE); + intel_ring_emit(signaller, lower_32_bits(gtt_offset)); + intel_ring_emit(signaller, upper_32_bits(gtt_offset)); + intel_ring_emit(signaller, signaller->outstanding_lazy_seqno); + intel_ring_emit(signaller, 0); + intel_ring_emit(signaller, MI_SEMAPHORE_SIGNAL | + MI_SEMAPHORE_TARGET(waiter->id)); + intel_ring_emit(signaller, 0); + } + + WARN_ON(i != num_rings); + + return 0; +} + +static int gen8_xcs_signal(struct intel_ring_buffer *signaller, + unsigned int num_dwords) +{ +#define MBOX_UPDATE_DWORDS 6 + struct drm_device *dev = signaller->dev; + struct drm_i915_private *dev_priv = dev->dev_private; + struct intel_ring_buffer *waiter; + int i, ret, num_rings; + + num_rings = hweight_long(INTEL_INFO(dev)->ring_mask); + num_dwords = (num_rings-1) * MBOX_UPDATE_DWORDS; +#undef MBOX_UPDATE_DWORDS + + /* XXX: + 4 for the caller */ + ret = intel_ring_begin(signaller, num_dwords + 4); + if (ret) + return ret; + + for_each_ring(waiter, dev_priv, i) { + u64 gtt_offset = signaller->semaphore.signal_ggtt[i]; + if (gtt_offset == MI_SEMAPHORE_SYNC_INVALID) + continue; + + intel_ring_emit(signaller, (MI_FLUSH_DW + 1) | + MI_FLUSH_DW_OP_STOREDW); + intel_ring_emit(signaller, lower_32_bits(gtt_offset) | + MI_FLUSH_DW_USE_GTT); + intel_ring_emit(signaller, upper_32_bits(gtt_offset)); + intel_ring_emit(signaller, signaller->outstanding_lazy_seqno); + intel_ring_emit(signaller, MI_SEMAPHORE_SIGNAL | + MI_SEMAPHORE_TARGET(waiter->id)); + intel_ring_emit(signaller, 0); + } + + WARN_ON(i != num_rings); + + return 0; +} + static int gen6_signal(struct intel_ring_buffer *signaller, unsigned int num_dwords) { @@ -1865,12 +1951,30 @@ int intel_init_render_ring_buffer(struct drm_device *dev) { drm_i915_private_t *dev_priv = dev->dev_private; struct intel_ring_buffer *ring = &dev_priv->ring[RCS]; + struct drm_i915_gem_object *obj; + int ret; ring->name = "render ring"; ring->id = RCS; ring->mmio_base = RENDER_RING_BASE; if (INTEL_INFO(dev)->gen >= 8) { + if (i915_semaphore_is_enabled(dev)) { + obj = i915_gem_alloc_object(dev, 4096); + if (obj == NULL) { + DRM_ERROR("Failed to allocate semaphore bo. Disabling semaphores\n"); + i915.semaphores = 0; + } else { + i915_gem_object_set_cache_level(obj, I915_CACHE_LLC); + ret = i915_gem_obj_ggtt_pin(obj, 0, PIN_NONBLOCK); + if (ret != 0) { + drm_gem_object_unreference(&obj->base); + DRM_ERROR("Failed to pin semaphore bo. Disabling semaphores\n"); + i915.semaphores = 0; + } else + dev_priv->semaphore_obj = obj; + } + } ring->add_request = gen6_add_request; ring->flush = gen8_render_ring_flush; ring->irq_get = gen8_ring_get_irq; @@ -1879,17 +1983,10 @@ int intel_init_render_ring_buffer(struct drm_device *dev) ring->get_seqno = gen6_ring_get_seqno; ring->set_seqno = ring_set_seqno; if (i915_semaphore_is_enabled(dev)) { + BUG_ON(!dev_priv->semaphore_obj); ring->semaphore.sync_to = gen6_ring_sync; - ring->semaphore.signal = gen6_signal; - ring->semaphore.signal = gen6_signal; - ring->semaphore.mbox.wait[RCS] = MI_SEMAPHORE_SYNC_INVALID; - ring->semaphore.mbox.wait[VCS] = MI_SEMAPHORE_SYNC_INVALID; - ring->semaphore.mbox.wait[BCS] = MI_SEMAPHORE_SYNC_INVALID; - ring->semaphore.mbox.wait[VECS] = MI_SEMAPHORE_SYNC_INVALID; - ring->semaphore.mbox.signal[RCS] = GEN6_NOSYNC; - ring->semaphore.mbox.signal[VCS] = GEN6_NOSYNC; - ring->semaphore.mbox.signal[BCS] = GEN6_NOSYNC; - ring->semaphore.mbox.signal[VECS] = GEN6_NOSYNC; + ring->semaphore.signal = gen8_rcs_signal; + GEN8_RING_SEMAPHORE_INIT; } } else if (INTEL_INFO(dev)->gen >= 6) { ring->add_request = gen6_add_request; @@ -1958,9 +2055,6 @@ int intel_init_render_ring_buffer(struct drm_device *dev) /* Workaround batchbuffer to combat CS tlb bug. */ if (HAS_BROKEN_CS_TLB(dev)) { - struct drm_i915_gem_object *obj; - int ret; - obj = i915_gem_alloc_object(dev, I830_BATCH_LIMIT); if (obj == NULL) { DRM_ERROR("Failed to allocate batch bo\n"); @@ -2076,15 +2170,8 @@ int intel_init_bsd_ring_buffer(struct drm_device *dev) gen8_ring_dispatch_execbuffer; if (i915_semaphore_is_enabled(dev)) { ring->semaphore.sync_to = gen6_ring_sync; - ring->semaphore.signal = gen6_signal; - ring->semaphore.mbox.wait[RCS] = MI_SEMAPHORE_SYNC_INVALID; - ring->semaphore.mbox.wait[VCS] = MI_SEMAPHORE_SYNC_INVALID; - ring->semaphore.mbox.wait[BCS] = MI_SEMAPHORE_SYNC_INVALID; - ring->semaphore.mbox.wait[VECS] = MI_SEMAPHORE_SYNC_INVALID; - ring->semaphore.mbox.signal[RCS] = GEN6_NOSYNC; - ring->semaphore.mbox.signal[VCS] = GEN6_NOSYNC; - ring->semaphore.mbox.signal[BCS] = GEN6_NOSYNC; - ring->semaphore.mbox.signal[VECS] = GEN6_NOSYNC; + ring->semaphore.signal = gen8_xcs_signal; + GEN8_RING_SEMAPHORE_INIT; } } else { ring->irq_enable_mask = GT_BSD_USER_INTERRUPT; @@ -2149,15 +2236,8 @@ int intel_init_blt_ring_buffer(struct drm_device *dev) ring->dispatch_execbuffer = gen8_ring_dispatch_execbuffer; if (i915_semaphore_is_enabled(dev)) { ring->semaphore.sync_to = gen6_ring_sync; - ring->semaphore.signal = gen6_signal; - ring->semaphore.mbox.wait[RCS] = MI_SEMAPHORE_SYNC_INVALID; - ring->semaphore.mbox.wait[VCS] = MI_SEMAPHORE_SYNC_INVALID; - ring->semaphore.mbox.wait[BCS] = MI_SEMAPHORE_SYNC_INVALID; - ring->semaphore.mbox.wait[VECS] = MI_SEMAPHORE_SYNC_INVALID; - ring->semaphore.mbox.signal[RCS] = GEN6_NOSYNC; - ring->semaphore.mbox.signal[VCS] = GEN6_NOSYNC; - ring->semaphore.mbox.signal[BCS] = GEN6_NOSYNC; - ring->semaphore.mbox.signal[VECS] = GEN6_NOSYNC; + ring->semaphore.signal = gen8_xcs_signal; + GEN8_RING_SEMAPHORE_INIT; } } else { ring->irq_enable_mask = GT_BLT_USER_INTERRUPT; @@ -2205,15 +2285,8 @@ int intel_init_vebox_ring_buffer(struct drm_device *dev) ring->dispatch_execbuffer = gen8_ring_dispatch_execbuffer; if (i915_semaphore_is_enabled(dev)) { ring->semaphore.sync_to = gen6_ring_sync; - ring->semaphore.signal = gen6_signal; - ring->semaphore.mbox.wait[RCS] = MI_SEMAPHORE_SYNC_INVALID; - ring->semaphore.mbox.wait[VCS] = MI_SEMAPHORE_SYNC_INVALID; - ring->semaphore.mbox.wait[BCS] = MI_SEMAPHORE_SYNC_INVALID; - ring->semaphore.mbox.wait[VECS] = MI_SEMAPHORE_SYNC_INVALID; - ring->semaphore.mbox.signal[RCS] = GEN6_NOSYNC; - ring->semaphore.mbox.signal[VCS] = GEN6_NOSYNC; - ring->semaphore.mbox.signal[BCS] = GEN6_NOSYNC; - ring->semaphore.mbox.signal[VECS] = GEN6_NOSYNC; + ring->semaphore.signal = gen8_xcs_signal; + GEN8_RING_SEMAPHORE_INIT; } } else { ring->irq_enable_mask = PM_VEBOX_USER_INTERRUPT; diff --git a/drivers/gpu/drm/i915/intel_ringbuffer.h b/drivers/gpu/drm/i915/intel_ringbuffer.h index 36b4462..38a50c3 100644 --- a/drivers/gpu/drm/i915/intel_ringbuffer.h +++ b/drivers/gpu/drm/i915/intel_ringbuffer.h @@ -33,6 +33,31 @@ struct intel_hw_status_page { #define I915_READ_IMR(ring) I915_READ(RING_IMR((ring)->mmio_base)) #define I915_WRITE_IMR(ring, val) I915_WRITE(RING_IMR((ring)->mmio_base), val) +/* seqno size is actually only a uint32, but since we plan to use MI_FLUSH_DW to + * do the writes, and that must have qw aligned offsets, simply pretend it's 8b. + */ +#define i915_semaphore_seqno_size sizeof(uint64_t) +#define GEN8_SIGNAL_OFFSET(to) \ + (i915_gem_obj_ggtt_offset(dev_priv->semaphore_obj) + \ + (ring->id * I915_NUM_RINGS * i915_semaphore_seqno_size) + \ + (i915_semaphore_seqno_size * (to))) + +#define GEN8_WAIT_OFFSET(from) \ + (i915_gem_obj_ggtt_offset(dev_priv->semaphore_obj) + \ + ((from) * I915_NUM_RINGS * i915_semaphore_seqno_size) + \ + (i915_semaphore_seqno_size * ring->id)) + +#define GEN8_RING_SEMAPHORE_INIT do { \ + if (!dev_priv->semaphore_obj) { \ + break; \ + } \ + ring->semaphore.signal_ggtt[RCS] = GEN8_SIGNAL_OFFSET(RCS); \ + ring->semaphore.signal_ggtt[VCS] = GEN8_SIGNAL_OFFSET(VCS); \ + ring->semaphore.signal_ggtt[BCS] = GEN8_SIGNAL_OFFSET(BCS); \ + ring->semaphore.signal_ggtt[VECS] = GEN8_SIGNAL_OFFSET(VECS); \ + ring->semaphore.signal_ggtt[ring->id] = MI_SEMAPHORE_SYNC_INVALID; \ + } while(0) + enum intel_ring_hangcheck_action { HANGCHECK_IDLE = 0, HANGCHECK_WAIT, @@ -113,15 +138,51 @@ struct intel_ring_buffer { #define I915_DISPATCH_PINNED 0x2 void (*cleanup)(struct intel_ring_buffer *ring); + /* GEN8 signal/wait table + * signal to signal to signal to signal to + * RCS VCS BCS VECS + * ------------------------------------------------------ + * RCS | NOP (0x00) | BCS (0x08) | VCS (0x10) | VECS (0x18) | + * |----------------------------------------------------- + * VCS | RCS (0x20) | NOP (0x28) | BCS (0x30) | VECS (0x38) | + * |----------------------------------------------------- + * BCS | RCS (0x40) | VCS (0x48) | NOP (0x50) | VECS (0x58) | + * |----------------------------------------------------- + * VECS | RCS (0x60) | VCS (0x68) | BCS (0x70) | NOP (0x78) | + * |----------------------------------------------------- + * + * Generalization: + * f(x, y) := (x->id * NUM_RINGS * seqno_size) + (seqno_size * y->id) + * ie. transpose of g(x, y) + * + * sync from sync from sync from sync from + * RCS VCS BCS VECS + * ------------------------------------------------------ + * RCS | NOP (0x00) | BCS (0x20) | VCS (0x40) | VECS (0x60) | + * |----------------------------------------------------- + * VCS | RCS (0x08) | NOP (0x28) | BCS (0x48) | VECS (0x68) | + * |----------------------------------------------------- + * BCS | RCS (0x10) | VCS (0x30) | NOP (0x50) | VECS (0x60) | + * |----------------------------------------------------- + * VECS | RCS (0x18) | VCS (0x38) | BCS (0x58) | NOP (0x78) | + * |----------------------------------------------------- + * + * Generalization: + * g(x, y) := (y->id * NUM_RINGS * seqno_size) + (seqno_size * x->id) + * ie. transpose of f(x, y) + */ struct { u32 sync_seqno[I915_NUM_RINGS-1]; - struct { - /* our mbox written by others */ - u32 wait[I915_NUM_RINGS]; - /* mboxes this ring signals to */ - u32 signal[I915_NUM_RINGS]; - } mbox; + union { + struct { + /* our mbox written by others */ + u32 wait[I915_NUM_RINGS]; + /* mboxes this ring signals to */ + u32 signal[I915_NUM_RINGS]; + } mbox; + u64 signal_ggtt[I915_NUM_RINGS]; + }; /* AKA wait() */ int (*sync_to)(struct intel_ring_buffer *ring,