From patchwork Tue May 26 14:21:12 2015
From: Michel Thierry
To: intel-gfx@lists.freedesktop.org
Cc: akash.goel@intel.com
Date: Tue, 26 May 2015 15:21:12 +0100
Message-Id: <1432650084-24491-6-git-send-email-michel.thierry@intel.com>
In-Reply-To: <1432650084-24491-1-git-send-email-michel.thierry@intel.com>
References: <1432650084-24491-1-git-send-email-michel.thierry@intel.com>
Subject: [Intel-gfx] [PATCH 05/16] drm/i915/gen8: implement alloc/free for 4lvl

PML4 has no special attributes, and there will always be a PML4. So simply
initialize it at creation, and destroy it at the end.

The code for 4lvl is able to call into the existing 3lvl page table code to
handle all of the lower levels.

v2: Return something at the end of gen8_alloc_va_range_4lvl to keep the
compiler happy, and define ret only in one place. Updated
gen8_ppgtt_unmap_pages and gen8_ppgtt_free to handle 4lvl.

v3: Use i915_dma_unmap_single instead of the pci API. Fix a couple of
incorrect checks when unmapping pdp and pd pages (Akash).

v4: Call __pdp_fini also for 32b PPGTT. Clean up the alloc_pdp_single
parameter list.

v5: Prevent (harmless) out of range access in gen8_for_each_pml4e.

v6: Simplify alloc_va_range_4lvl and gen8_ppgtt_init_common error paths. (Akash)

v7: Rebase, s/gen8_ppgtt_free_*/gen8_ppgtt_cleanup_*/.

v8: Change location of pml4_init/fini. It will make the next patches cleaner.
Cc: Akash Goel
Signed-off-by: Ben Widawsky
Signed-off-by: Michel Thierry (v2+)
---
 drivers/gpu/drm/i915/i915_gem_gtt.c | 198 ++++++++++++++++++++++++++++++------
 drivers/gpu/drm/i915/i915_gem_gtt.h |  12 ++-
 2 files changed, 177 insertions(+), 33 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_gem_gtt.c b/drivers/gpu/drm/i915/i915_gem_gtt.c
index dc33314f8..7dad575 100644
--- a/drivers/gpu/drm/i915/i915_gem_gtt.c
+++ b/drivers/gpu/drm/i915/i915_gem_gtt.c
@@ -474,8 +474,12 @@ static void unmap_and_free_pdp(struct i915_page_directory_pointer *pdp,
                                struct drm_device *dev)
 {
        __pdp_fini(pdp);
-       if (USES_FULL_48BIT_PPGTT(dev))
+
+       if (USES_FULL_48BIT_PPGTT(dev)) {
+               i915_dma_unmap_single(pdp, dev);
+               __free_page(pdp->page);
                kfree(pdp);
+       }
 }
 
 static int __pdp_init(struct i915_page_directory_pointer *pdp,
@@ -501,6 +505,37 @@ static int __pdp_init(struct i915_page_directory_pointer *pdp,
        return 0;
 }
 
+static struct
+i915_page_directory_pointer *alloc_pdp_single(struct i915_hw_ppgtt *ppgtt)
+{
+       struct drm_device *dev = ppgtt->base.dev;
+       struct i915_page_directory_pointer *pdp;
+       int ret;
+
+       WARN_ON(!USES_FULL_48BIT_PPGTT(dev));
+
+       pdp = kmalloc(sizeof(*pdp), GFP_KERNEL);
+       if (!pdp)
+               return ERR_PTR(-ENOMEM);
+
+       pdp->page = alloc_page(GFP_KERNEL | GFP_DMA32 | __GFP_ZERO);
+       if (!pdp->page) {
+               kfree(pdp);
+               return ERR_PTR(-ENOMEM);
+       }
+
+       ret = __pdp_init(pdp, dev);
+       if (ret) {
+               __free_page(pdp->page);
+               kfree(pdp);
+               return ERR_PTR(ret);
+       }
+
+       i915_dma_map_single(pdp, dev);
+
+       return pdp;
+}
+
 /* Broadwell Page Directory Pointer Descriptors */
 static int gen8_write_pdp(struct intel_engine_cs *ring,
                          unsigned entry,
@@ -681,6 +716,28 @@ static void gen8_initialize_pd(struct i915_address_space *vm,
        kunmap_atomic(page_directory);
 }
 
+static void pml4_fini(struct i915_pml4 *pml4)
+{
+       struct i915_hw_ppgtt *ppgtt =
+               container_of(pml4, struct i915_hw_ppgtt, pml4);
+       i915_dma_unmap_single(pml4, ppgtt->base.dev);
+       __free_page(pml4->page);
+       pml4->page = NULL;
+}
+
+static int pml4_init(struct i915_hw_ppgtt *ppgtt)
+{
+       struct i915_pml4 *pml4 = &ppgtt->pml4;
+
+       pml4->page = alloc_page(GFP_KERNEL | __GFP_ZERO);
+       if (!pml4->page)
+               return -ENOMEM;
+
+       i915_dma_map_single(pml4, ppgtt->base.dev);
+
+       return 0;
+}
+
 /* It's likely we'll map more than one pagetable at a time. This function will
  * save us unnecessary kmap calls, but do no more functionally than multiple
  * calls to map_pt. */
@@ -723,28 +780,46 @@ static void gen8_free_page_tables(struct i915_page_directory *pd, struct drm_dev
        }
 }
 
-static void gen8_ppgtt_cleanup(struct i915_address_space *vm)
+static void gen8_ppgtt_cleanup_3lvl(struct i915_page_directory_pointer *pdp,
+                                   struct drm_device *dev)
 {
-       struct i915_hw_ppgtt *ppgtt =
-               container_of(vm, struct i915_hw_ppgtt, base);
        int i;
 
-       if (!USES_FULL_48BIT_PPGTT(ppgtt->base.dev)) {
-               for_each_set_bit(i, ppgtt->pdp.used_pdpes,
-                                I915_PDPES_PER_PDP(ppgtt->base.dev)) {
-                       if (WARN_ON(!ppgtt->pdp.page_directory[i]))
-                               continue;
+       for_each_set_bit(i, pdp->used_pdpes, I915_PDPES_PER_PDP(dev)) {
+               if (WARN_ON(!pdp->page_directory[i]))
+                       continue;
 
-                       gen8_free_page_tables(ppgtt->pdp.page_directory[i],
-                                             ppgtt->base.dev);
-                       unmap_and_free_pd(ppgtt->pdp.page_directory[i],
-                                         ppgtt->base.dev);
-               }
-               unmap_and_free_pdp(&ppgtt->pdp, ppgtt->base.dev);
-       } else {
-               WARN_ON(1); /* to be implemented later */
+               gen8_free_page_tables(pdp->page_directory[i], dev);
+               unmap_and_free_pd(pdp->page_directory[i], dev);
        }
 
+       unmap_and_free_pdp(pdp, dev);
+}
+
+static void gen8_ppgtt_cleanup_4lvl(struct i915_hw_ppgtt *ppgtt)
+{
+       int i;
+
+       for_each_set_bit(i, ppgtt->pml4.used_pml4es, GEN8_PML4ES_PER_PML4) {
+               if (WARN_ON(!ppgtt->pml4.pdps[i]))
+                       continue;
+
+               gen8_ppgtt_cleanup_3lvl(ppgtt->pml4.pdps[i], ppgtt->base.dev);
+       }
+
+       pml4_fini(&ppgtt->pml4);
+}
+
+static void gen8_ppgtt_cleanup(struct i915_address_space *vm)
+{
+       struct i915_hw_ppgtt *ppgtt =
+               container_of(vm, struct i915_hw_ppgtt, base);
+
+       if (!USES_FULL_48BIT_PPGTT(ppgtt->base.dev))
+               gen8_ppgtt_cleanup_3lvl(&ppgtt->pdp, ppgtt->base.dev);
+       else
+               gen8_ppgtt_cleanup_4lvl(ppgtt);
+
        unmap_and_free_pd(ppgtt->scratch_pd, ppgtt->base.dev);
        unmap_and_free_pt(ppgtt->scratch_pt, ppgtt->base.dev);
 }
@@ -1013,8 +1088,62 @@ static int gen8_alloc_va_range_4lvl(struct i915_address_space *vm,
                                    uint64_t start,
                                    uint64_t length)
 {
-       WARN_ON(1); /* to be implemented later */
+       DECLARE_BITMAP(new_pdps, GEN8_PML4ES_PER_PML4);
+       struct i915_hw_ppgtt *ppgtt =
+               container_of(vm, struct i915_hw_ppgtt, base);
+       struct i915_page_directory_pointer *pdp;
+       const uint64_t orig_start = start;
+       const uint64_t orig_length = length;
+       uint64_t temp, pml4e;
+       int ret = 0;
+
+       /* Do the pml4 allocations first, so we don't need to track the newly
+        * allocated tables below the pdp */
+       bitmap_zero(new_pdps, GEN8_PML4ES_PER_PML4);
+
+       /* The pagedirectory and pagetable allocations are done in the shared 3
+        * and 4 level code. Just allocate the pdps.
+        */
+       gen8_for_each_pml4e(pdp, pml4, start, length, temp, pml4e) {
+               if (!pdp) {
+                       WARN_ON(test_bit(pml4e, pml4->used_pml4es));
+                       pdp = alloc_pdp_single(ppgtt);
+                       if (IS_ERR(pdp))
+                               goto err_out;
+
+                       pml4->pdps[pml4e] = pdp;
+                       set_bit(pml4e, new_pdps);
+                       trace_i915_page_directory_pointer_entry_alloc(&ppgtt->base, pml4e,
+                                               pml4e << GEN8_PML4E_SHIFT,
+                                               GEN8_PML4E_SHIFT);
+               }
+       }
+
+       WARN(bitmap_weight(new_pdps, GEN8_PML4ES_PER_PML4) > 2,
+            "The allocation has spanned more than 512GB. "
" + "It is highly likely this is incorrect."); + + start = orig_start; + length = orig_length; + + gen8_for_each_pml4e(pdp, pml4, start, length, temp, pml4e) { + WARN_ON(!pdp); + + ret = gen8_alloc_va_range_3lvl(vm, pdp, start, length); + if (ret) + goto err_out; + } + + bitmap_or(pml4->used_pml4es, new_pdps, pml4->used_pml4es, + GEN8_PML4ES_PER_PML4); + return 0; + +err_out: + for_each_set_bit(pml4e, new_pdps, GEN8_PML4ES_PER_PML4) + gen8_ppgtt_cleanup_3lvl(pml4->pdps[pml4e], vm->dev); + + return ret; } static int gen8_alloc_va_range(struct i915_address_space *vm, @@ -1023,10 +1152,10 @@ static int gen8_alloc_va_range(struct i915_address_space *vm, struct i915_hw_ppgtt *ppgtt = container_of(vm, struct i915_hw_ppgtt, base); - if (!USES_FULL_48BIT_PPGTT(vm->dev)) - return gen8_alloc_va_range_3lvl(vm, &ppgtt->pdp, start, length); - else + if (USES_FULL_48BIT_PPGTT(vm->dev)) return gen8_alloc_va_range_4lvl(vm, &ppgtt->pml4, start, length); + else + return gen8_alloc_va_range_3lvl(vm, &ppgtt->pdp, start, length); } /* @@ -1038,6 +1167,8 @@ static int gen8_alloc_va_range(struct i915_address_space *vm, */ static int gen8_ppgtt_init(struct i915_hw_ppgtt *ppgtt) { + int ret; + ppgtt->scratch_pt = alloc_pt_single(ppgtt->base.dev); if (IS_ERR(ppgtt->scratch_pt)) return PTR_ERR(ppgtt->scratch_pt); @@ -1049,19 +1180,19 @@ static int gen8_ppgtt_init(struct i915_hw_ppgtt *ppgtt) gen8_initialize_pt(&ppgtt->base, ppgtt->scratch_pt); gen8_initialize_pd(&ppgtt->base, ppgtt->scratch_pd); - if (!USES_FULL_48BIT_PPGTT(ppgtt->base.dev)) { - int ret = __pdp_init(&ppgtt->pdp, false); + if (USES_FULL_48BIT_PPGTT(ppgtt->base.dev)) { + ret = pml4_init(ppgtt); + if (ret) + goto err_out; - if (ret) { - unmap_and_free_pd(ppgtt->scratch_pd, ppgtt->base.dev); - unmap_and_free_pt(ppgtt->scratch_pt, ppgtt->base.dev); - return ret; - } + ppgtt->base.total = 1ULL << 48; + } else { + ret = __pdp_init(&ppgtt->pdp, false); + if (ret) + goto err_out; ppgtt->base.total = 1ULL << 32; - } else { - ppgtt->base.total = 1ULL << 48; - return -EPERM; /* Not yet implemented */ + trace_i915_page_directory_pointer_entry_alloc(&ppgtt->base, 0, 0, GEN8_PML4E_SHIFT); } ppgtt->base.start = 0; @@ -1075,6 +1206,11 @@ static int gen8_ppgtt_init(struct i915_hw_ppgtt *ppgtt) ppgtt->switch_mm = gen8_mm_switch; return 0; + +err_out: + unmap_and_free_pd(ppgtt->scratch_pd, ppgtt->base.dev); + unmap_and_free_pt(ppgtt->scratch_pt, ppgtt->base.dev); + return ret; } static void gen6_dump_ppgtt(struct i915_hw_ppgtt *ppgtt, struct seq_file *m) diff --git a/drivers/gpu/drm/i915/i915_gem_gtt.h b/drivers/gpu/drm/i915/i915_gem_gtt.h index a01cc34..2229d05 100644 --- a/drivers/gpu/drm/i915/i915_gem_gtt.h +++ b/drivers/gpu/drm/i915/i915_gem_gtt.h @@ -95,6 +95,7 @@ typedef uint64_t gen8_pde_t; */ #define GEN8_PML4ES_PER_PML4 512 #define GEN8_PML4E_SHIFT 39 +#define GEN8_PML4E_MASK (GEN8_PML4ES_PER_PML4 - 1) #define GEN8_PDPE_SHIFT 30 /* NB: GEN8_PDPE_MASK is untrue for 32b platforms, but it has no impact on 32b page * tables */ @@ -455,6 +456,14 @@ static inline uint32_t gen6_pde_index(uint32_t addr) temp = min(temp, length), \ start += temp, length -= temp) +#define gen8_for_each_pml4e(pdp, pml4, start, length, temp, iter) \ + for (iter = gen8_pml4e_index(start); \ + pdp = (pml4)->pdps[iter], length > 0 && iter < GEN8_PML4ES_PER_PML4; \ + iter++, \ + temp = ALIGN(start+1, 1ULL << GEN8_PML4E_SHIFT) - start, \ + temp = min(temp, length), \ + start += temp, length -= temp) + #define gen8_for_each_pdpe(pd, pdp, start, length, temp, iter) \ gen8_for_each_pdpe_e(pd, 
@@ -475,8 +484,7 @@ static inline uint32_t gen8_pdpe_index(uint64_t address)
 
 static inline uint32_t gen8_pml4e_index(uint64_t address)
 {
-       WARN_ON(1); /* For 64B */
-       return 0;
+       return (address >> GEN8_PML4E_SHIFT) & GEN8_PML4E_MASK;
 }
 
 static inline size_t gen8_pte_count(uint64_t address, uint64_t length)
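
For reference, the sketch below (standalone, illustrative C, not part of the
patch) shows how a 48-bit GPU virtual address splits into the four table
indices that the code above walks. The PML4E shift/entry count and the PDPE
shift match the definitions in i915_gem_gtt.h; the PDE and PTE shifts are
assumed here to follow the usual gen8 9+9+9+12 split. Each PML4 entry thus
covers 1ULL << 39 bytes (512GB), which is why gen8_alloc_va_range_4lvl warns
when an allocation touches more than two PML4 entries.

/* Illustrative only -- not part of the patch. */
#include <stdint.h>
#include <stdio.h>

#define PML4E_SHIFT    39      /* GEN8_PML4E_SHIFT in the patch */
#define PDPE_SHIFT     30      /* GEN8_PDPE_SHIFT in the patch */
#define PDE_SHIFT      21      /* assumed: 512 PTEs per page table */
#define PTE_SHIFT      12      /* assumed: 4KiB pages */
#define LEVEL_MASK     0x1ff   /* 512 entries per level, cf. GEN8_PML4E_MASK */

int main(void)
{
        uint64_t addr = 0x00008040201ff000ULL; /* any 48-bit address */

        /* Shift out the lower levels, then mask to the 9-bit index. */
        printf("pml4e = %u\n", (unsigned)((addr >> PML4E_SHIFT) & LEVEL_MASK));
        printf("pdpe  = %u\n", (unsigned)((addr >> PDPE_SHIFT) & LEVEL_MASK));
        printf("pde   = %u\n", (unsigned)((addr >> PDE_SHIFT) & LEVEL_MASK));
        printf("pte   = %u\n", (unsigned)((addr >> PTE_SHIFT) & LEVEL_MASK));
        return 0;
}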