From patchwork Tue Mar 18 05:48:56 2014 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Ben Widawsky X-Patchwork-Id: 3847471 Return-Path: X-Original-To: patchwork-intel-gfx@patchwork.kernel.org Delivered-To: patchwork-parsemail@patchwork2.web.kernel.org Received: from mail.kernel.org (mail.kernel.org [198.145.19.201]) by patchwork2.web.kernel.org (Postfix) with ESMTP id 810D1BF549 for ; Tue, 18 Mar 2014 05:49:24 +0000 (UTC) Received: from mail.kernel.org (localhost [127.0.0.1]) by mail.kernel.org (Postfix) with ESMTP id 2DA6920270 for ; Tue, 18 Mar 2014 05:49:23 +0000 (UTC) Received: from gabe.freedesktop.org (gabe.freedesktop.org [131.252.210.177]) by mail.kernel.org (Postfix) with ESMTP id C79E22015D for ; Tue, 18 Mar 2014 05:49:21 +0000 (UTC) Received: from gabe.freedesktop.org (localhost [127.0.0.1]) by gabe.freedesktop.org (Postfix) with ESMTP id 2B1652B0F7; Mon, 17 Mar 2014 22:49:21 -0700 (PDT) X-Original-To: intel-gfx@lists.freedesktop.org Delivered-To: intel-gfx@lists.freedesktop.org Received: from mga02.intel.com (mga02.intel.com [134.134.136.20]) by gabe.freedesktop.org (Postfix) with ESMTP id 5FA732B0F2 for ; Mon, 17 Mar 2014 22:49:14 -0700 (PDT) Received: from orsmga001.jf.intel.com ([10.7.209.18]) by orsmga101.jf.intel.com with ESMTP; 17 Mar 2014 22:49:14 -0700 X-ExtLoop1: 1 X-IronPort-AV: E=Sophos;i="4.97,675,1389772800"; d="scan'208";a="474655810" Received: from unknown (HELO ironside.amr.corp.intel.com) ([10.255.13.186]) by orsmga001.jf.intel.com with ESMTP; 17 Mar 2014 22:49:13 -0700 From: Ben Widawsky To: Intel GFX Date: Mon, 17 Mar 2014 22:48:56 -0700 Message-Id: <1395121738-29126-25-git-send-email-benjamin.widawsky@intel.com> X-Mailer: git-send-email 1.9.0 In-Reply-To: <1395121738-29126-1-git-send-email-benjamin.widawsky@intel.com> References: <1395121738-29126-1-git-send-email-benjamin.widawsky@intel.com> Subject: [Intel-gfx] [PATCH 24/26] drm/i915: Finish gen6/7 dynamic page table allocation X-BeenThere: intel-gfx@lists.freedesktop.org X-Mailman-Version: 2.1.15 Precedence: list List-Id: Intel graphics driver community testing & development List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , MIME-Version: 1.0 Errors-To: intel-gfx-bounces@lists.freedesktop.org Sender: "Intel-gfx" X-Spam-Status: No, score=-4.2 required=5.0 tests=BAYES_00, RCVD_IN_DNSWL_MED, T_RP_MATCHES_RCVD, UNPARSEABLE_RELAY autolearn=unavailable version=3.3.1 X-Spam-Checker-Version: SpamAssassin 3.3.1 (2010-03-16) on mail.kernel.org X-Virus-Scanned: ClamAV using ClamSMTP This patch continues on the idea from the previous patch. From here on, in the steady state, PDEs are all pointing to the scratch page table (as recommended in the spec). When an object is allocated in the VA range, the code will determine if we need to allocate a page for the page table. Similarly when the object is destroyed, we will remove, and free the page table pointing the PDE back to the scratch page. Following patches will work to unify the code a bit as we bring in GEN8 support. GEN6 and GEN8 are different enough that I had a hard time to get to this point with as much common code as I do. The aliasing PPGTT must pre-allocate all of the page tables. There are a few reasons for this. Two trivial ones: aliasing ppgtt goes through the ggtt paths, so it's hard to maintain, we currently do not restore the default context (assuming the previous force reload is indeed necessary). Most importantly though, the only way (it seems from empirical evidence) to invalidate the CS TLBs on non-render ring is to either use ring sync (which requires actually stopping the rings in order to synchronize when the sync completes vs. where you are in execution), or to reload DCLV. Since without full PPGTT we do not ever reload the DCLV register, there is no good way to achieve this. The simplest solution is just to not support dynamic page table creation/destruction in the aliasing PPGTT. We could always reload DCLV, but this seems like quite a bit of excess overhead only to save at most 2MB-4k of memory for the aliasing PPGTT page tables. Signed-off-by: Ben Widawsky --- drivers/gpu/drm/i915/i915_drv.h | 2 +- drivers/gpu/drm/i915/i915_gem_context.c | 2 +- drivers/gpu/drm/i915/i915_gem_gtt.c | 123 +++++++++++++++++++++++++++++--- drivers/gpu/drm/i915/i915_trace.h | 108 ++++++++++++++++++++++++++++ 4 files changed, 224 insertions(+), 11 deletions(-) diff --git a/drivers/gpu/drm/i915/i915_drv.h b/drivers/gpu/drm/i915/i915_drv.h index b19442c..eeef032 100644 --- a/drivers/gpu/drm/i915/i915_drv.h +++ b/drivers/gpu/drm/i915/i915_drv.h @@ -2373,7 +2373,7 @@ static inline void i915_gem_chipset_flush(struct drm_device *dev) if (INTEL_INFO(dev)->gen < 6) intel_gtt_chipset_flush(); } -int i915_gem_init_ppgtt(struct drm_device *dev, struct i915_hw_ppgtt *ppgtt); +int i915_gem_init_ppgtt(struct drm_device *dev, struct i915_hw_ppgtt *ppgtt, bool aliasing); bool intel_enable_ppgtt(struct drm_device *dev, bool full); /* i915_gem_stolen.c */ diff --git a/drivers/gpu/drm/i915/i915_gem_context.c b/drivers/gpu/drm/i915/i915_gem_context.c index 6ad5380..185c926 100644 --- a/drivers/gpu/drm/i915/i915_gem_context.c +++ b/drivers/gpu/drm/i915/i915_gem_context.c @@ -209,7 +209,7 @@ create_vm_for_ctx(struct drm_device *dev, struct i915_hw_context *ctx) if (!ppgtt) return ERR_PTR(-ENOMEM); - ret = i915_gem_init_ppgtt(dev, ppgtt); + ret = i915_gem_init_ppgtt(dev, ppgtt, ctx->file_priv == NULL); if (ret) { kfree(ppgtt); return ERR_PTR(ret); diff --git a/drivers/gpu/drm/i915/i915_gem_gtt.c b/drivers/gpu/drm/i915/i915_gem_gtt.c index 6d904c9..846a5b5 100644 --- a/drivers/gpu/drm/i915/i915_gem_gtt.c +++ b/drivers/gpu/drm/i915/i915_gem_gtt.c @@ -1016,13 +1016,54 @@ static void gen6_ppgtt_insert_entries(struct i915_address_space *vm, kunmap_atomic(pt_vaddr); } +static DECLARE_BITMAP(new_page_tables, I915_PDES_PER_PD); static int gen6_alloc_va_range(struct i915_address_space *vm, uint64_t start, uint64_t length) { + struct drm_device *dev = vm->dev; + struct drm_i915_private *dev_priv = dev->dev_private; struct i915_hw_ppgtt *ppgtt = container_of(vm, struct i915_hw_ppgtt, base); struct i915_pagetab *pt; + const uint32_t start_save = start, length_save = length; uint32_t pde, temp; + int ret; + + BUG_ON(upper_32_bits(start)); + + bitmap_zero(new_page_tables, I915_PDES_PER_PD); + + trace_i915_va_alloc(vm, start, length); + + /* The allocation is done in two stages so that we can bail out with + * minimal amount of pain. The first stage finds new page tables that + * need allocation. The second stage marks use ptes within the page + * tables. + */ + gen6_for_each_pde(pt, &ppgtt->pd, start, length, temp, pde) { + if (pt != ppgtt->scratch_pt) { + WARN_ON(bitmap_empty(pt->used_ptes, GEN6_PTES_PER_PT)); + continue; + } + + /* We've already allocated a page table */ + WARN_ON(!bitmap_empty(pt->used_ptes, GEN6_PTES_PER_PT)); + + pt = alloc_pt_single(dev); + if (IS_ERR(pt)) { + ret = PTR_ERR(pt); + goto unwind_out; + } + + ppgtt->pd.page_tables[pde] = pt; + set_bit(pde, new_page_tables); + trace_i915_pagetable_alloc(vm, pde, + start, + start + (1 << GEN6_PDE_SHIFT)); + } + + start = start_save; + length = length_save; gen6_for_each_pde(pt, &ppgtt->pd, start, length, temp, pde) { int j; @@ -1040,11 +1081,32 @@ static int gen6_alloc_va_range(struct i915_address_space *vm, } } - bitmap_or(pt->used_ptes, pt->used_ptes, tmp_bitmap, + if (test_and_clear_bit(pde, new_page_tables)) + gen6_map_single(&ppgtt->pd, pde, pt); + + trace_i915_pagetable_map(vm, pde, pt, + gen6_pte_index(start), + gen6_pte_count(start, length), + GEN6_PTES_PER_PT); + bitmap_or(pt->used_ptes, tmp_bitmap, pt->used_ptes, GEN6_PTES_PER_PT); } + WARN_ON(!bitmap_empty(new_page_tables, I915_PDES_PER_PD)); + + /* Make sure write is complete before other code can use this page + * table. Also require for WC mapped PTEs */ + readl(dev_priv->gtt.gsm); + return 0; + +unwind_out: + for_each_set_bit(pde, new_page_tables, I915_PDES_PER_PD) { + struct i915_pagetab *pt = ppgtt->pd.page_tables[pde]; + ppgtt->pd.page_tables[pde] = NULL; + free_pt_single(pt, vm->dev); + } + return ret; } static void gen6_teardown_va_range(struct i915_address_space *vm, @@ -1055,9 +1117,30 @@ static void gen6_teardown_va_range(struct i915_address_space *vm, struct i915_pagetab *pt; uint32_t pde, temp; + trace_i915_va_teardown(vm, start, length); + gen6_for_each_pde(pt, &ppgtt->pd, start, length, temp, pde) { + + if (WARN(pt == ppgtt->scratch_pt, + "Tried to teardown scratch page vm %p. pde %u: %llx-%llx\n", + vm, pde, start, start + length)) + continue; + + trace_i915_pagetable_unmap(vm, pde, pt, + gen6_pte_index(start), + gen6_pte_count(start, length), + GEN6_PTES_PER_PT); + bitmap_clear(pt->used_ptes, gen6_pte_index(start), gen6_pte_count(start, length)); + + if (bitmap_empty(pt->used_ptes, GEN6_PTES_PER_PT)) { + trace_i915_pagetable_destroy(vm, pde, + start & GENMASK_ULL(64, GEN6_PDE_SHIFT), + start + (1 << GEN6_PDE_SHIFT)); + gen6_map_single(&ppgtt->pd, pde, ppgtt->scratch_pt); + ppgtt->pd.page_tables[pde] = ppgtt->scratch_pt; + } } } @@ -1065,9 +1148,13 @@ static void gen6_ppgtt_free(struct i915_hw_ppgtt *ppgtt) { int i; - for (i = 0; i < ppgtt->num_pd_entries; i++) - free_pt_single(ppgtt->pd.page_tables[i], ppgtt->base.dev); + for (i = 0; i < ppgtt->num_pd_entries; i++) { + struct i915_pagetab *pt = ppgtt->pd.page_tables[i]; + if (pt != ppgtt->scratch_pt) + free_pt_single(ppgtt->pd.page_tables[i], ppgtt->base.dev); + } + /* Consider putting this as part of pd free. */ free_pt_scratch(ppgtt->scratch_pt, ppgtt->base.dev); free_pd_single(&ppgtt->pd, ppgtt->base.dev); } @@ -1136,7 +1223,7 @@ err_out: return ret; } -static int gen6_ppgtt_alloc(struct i915_hw_ppgtt *ppgtt) +static int gen6_ppgtt_alloc(struct i915_hw_ppgtt *ppgtt, bool preallocate_pt) { int ret; @@ -1144,9 +1231,13 @@ static int gen6_ppgtt_alloc(struct i915_hw_ppgtt *ppgtt) if (ret) return ret; + if (!preallocate_pt) + return 0; + ret = alloc_pt_range(&ppgtt->pd, 0, ppgtt->num_pd_entries, ppgtt->base.dev); if (ret) { + free_pt_scratch(ppgtt->scratch_pt, ppgtt->base.dev); drm_mm_remove_node(&ppgtt->node); return ret; } @@ -1154,8 +1245,19 @@ static int gen6_ppgtt_alloc(struct i915_hw_ppgtt *ppgtt) return 0; } +static void gen6_scratch_va_range(struct i915_hw_ppgtt *ppgtt, + uint64_t start, uint64_t length) +{ + struct i915_pagetab *unused; + uint32_t pde, temp; -static int gen6_ppgtt_init(struct i915_hw_ppgtt *ppgtt) + gen6_for_each_pde(unused, &ppgtt->pd, start, length, temp, pde) { + ppgtt->pd.page_tables[pde] = ppgtt->scratch_pt; + gen6_map_single(&ppgtt->pd, pde, ppgtt->pd.page_tables[pde]); + } +} + +static int gen6_ppgtt_init(struct i915_hw_ppgtt *ppgtt, bool aliasing) { struct drm_device *dev = ppgtt->base.dev; struct drm_i915_private *dev_priv = dev->dev_private; @@ -1174,7 +1276,7 @@ static int gen6_ppgtt_init(struct i915_hw_ppgtt *ppgtt) } else BUG(); - ret = gen6_ppgtt_alloc(ppgtt); + ret = gen6_ppgtt_alloc(ppgtt, aliasing); if (ret) return ret; @@ -1193,7 +1295,10 @@ static int gen6_ppgtt_init(struct i915_hw_ppgtt *ppgtt) ppgtt->pd_addr = (gen6_gtt_pte_t __iomem*)dev_priv->gtt.gsm + ppgtt->pd.pd_offset / sizeof(gen6_gtt_pte_t); - gen6_map_page_range(dev_priv, &ppgtt->pd, 0, ppgtt->base.total); + if (aliasing) + gen6_map_page_range(dev_priv, &ppgtt->pd, 0, ppgtt->base.total); + else + gen6_scratch_va_range(ppgtt, 0, ppgtt->base.total); DRM_DEBUG_DRIVER("Allocated pde space (%ldM) at GTT entry: %lx\n", ppgtt->node.size >> 20, @@ -1202,7 +1307,7 @@ static int gen6_ppgtt_init(struct i915_hw_ppgtt *ppgtt) return 0; } -int i915_gem_init_ppgtt(struct drm_device *dev, struct i915_hw_ppgtt *ppgtt) +int i915_gem_init_ppgtt(struct drm_device *dev, struct i915_hw_ppgtt *ppgtt, bool aliasing) { struct drm_i915_private *dev_priv = dev->dev_private; int ret = 0; @@ -1211,7 +1316,7 @@ int i915_gem_init_ppgtt(struct drm_device *dev, struct i915_hw_ppgtt *ppgtt) ppgtt->base.scratch = dev_priv->gtt.base.scratch; if (INTEL_INFO(dev)->gen < 8) - ret = gen6_ppgtt_init(ppgtt); + ret = gen6_ppgtt_init(ppgtt, aliasing); else if (IS_GEN8(dev)) ret = gen8_ppgtt_init(ppgtt, dev_priv->gtt.base.total); else diff --git a/drivers/gpu/drm/i915/i915_trace.h b/drivers/gpu/drm/i915/i915_trace.h index b95a380..86e85de 100644 --- a/drivers/gpu/drm/i915/i915_trace.h +++ b/drivers/gpu/drm/i915/i915_trace.h @@ -81,6 +81,114 @@ TRACE_EVENT(i915_vma_unbind, __entry->obj, __entry->offset, __entry->size, __entry->vm) ); +DECLARE_EVENT_CLASS(i915_va, + TP_PROTO(struct i915_address_space *vm, u64 start, u64 length), + TP_ARGS(vm, start, length), + + TP_STRUCT__entry( + __field(struct i915_address_space *, vm) + __field(u64, start) + __field(u64, end) + ), + + TP_fast_assign( + __entry->vm = vm; + __entry->start = start; + __entry->end = start + length; + ), + + TP_printk("vm=%p, 0x%llx-0x%llx", __entry->vm, __entry->start, __entry->end) +); + +DEFINE_EVENT(i915_va, i915_va_alloc, + TP_PROTO(struct i915_address_space *vm, u64 start, u64 length), + TP_ARGS(vm, start, length) +); + +DEFINE_EVENT(i915_va, i915_va_teardown, + TP_PROTO(struct i915_address_space *vm, u64 start, u64 length), + TP_ARGS(vm, start, length) +); + +DECLARE_EVENT_CLASS(i915_pagetable, + TP_PROTO(struct i915_address_space *vm, u32 pde, u64 start, u64 end), + TP_ARGS(vm, pde, start, end), + + TP_STRUCT__entry( + __field(struct i915_address_space *, vm) + __field(u32, pde) + __field(u64, start) + __field(u64, end) + ), + + TP_fast_assign( + __entry->vm = vm; + __entry->pde = pde; + __entry->start = start; + __entry->end = end; + ), + + TP_printk("vm=%p, pde=%d (0x%llx-0x%llx)", + __entry->vm, __entry->pde, __entry->start, __entry->end) +); + +DEFINE_EVENT(i915_pagetable, i915_pagetable_alloc, + TP_PROTO(struct i915_address_space *vm, u32 pde, u64 start, u64 end), + TP_ARGS(vm, pde, start, end) +); + +DEFINE_EVENT(i915_pagetable, i915_pagetable_destroy, + TP_PROTO(struct i915_address_space *vm, u32 pde, u64 start, u64 end), + TP_ARGS(vm, pde, start, end) +); + +/* Avoid extra math because we only support two sizes. The format is defined by + * bitmap_scnprintf. Each 32 bits is 8 HEX digits followed by comma */ +#define TRACE_PT_SIZE(bits) \ + ((((bits) == 1024) ? 288 : 144) + 1) + +DECLARE_EVENT_CLASS(i915_pagetable_update, + TP_PROTO(struct i915_address_space *vm, u32 pde, + struct i915_pagetab *pt, u32 first, u32 len, size_t bits), + TP_ARGS(vm, pde, pt, first, len, bits), + + TP_STRUCT__entry( + __field(struct i915_address_space *, vm) + __field(u32, pde) + __field(u32, first) + __field(u32, last) + __dynamic_array(char, cur_ptes, TRACE_PT_SIZE(bits)) + ), + + TP_fast_assign( + __entry->vm = vm; + __entry->pde = pde; + __entry->first = first; + __entry->last = first + len; + + bitmap_scnprintf(__get_str(cur_ptes), + TRACE_PT_SIZE(bits), + pt->used_ptes, + bits); + ), + + TP_printk("vm=%p, pde=%d, updating %u:%u\t%s", + __entry->vm, __entry->pde, __entry->last, __entry->first, + __get_str(cur_ptes)) +); + +DEFINE_EVENT(i915_pagetable_update, i915_pagetable_map, + TP_PROTO(struct i915_address_space *vm, u32 pde, + struct i915_pagetab *pt, u32 first, u32 len, size_t bits), + TP_ARGS(vm, pde, pt, first, len, bits) +); + +DEFINE_EVENT(i915_pagetable_update, i915_pagetable_unmap, + TP_PROTO(struct i915_address_space *vm, u32 pde, + struct i915_pagetab *pt, u32 first, u32 len, size_t bits), + TP_ARGS(vm, pde, pt, first, len, bits) +); + TRACE_EVENT(i915_gem_object_change_domain, TP_PROTO(struct drm_i915_gem_object *obj, u32 old_read, u32 old_write), TP_ARGS(obj, old_read, old_write),