From patchwork Mon Jan 3 11:59:44 2022
X-Patchwork-Submitter: Thomas Hellstrom
X-Patchwork-Id: 12702512
From: Thomas Hellström
To: intel-gfx@lists.freedesktop.org, dri-devel@lists.freedesktop.org
Cc: Thomas Hellström, matthew.auld@intel.com
Date: Mon, 3 Jan 2022 12:59:44 +0100
Message-Id: <20220103115947.92688-2-thomas.hellstrom@linux.intel.com>
In-Reply-To: <20220103115947.92688-1-thomas.hellstrom@linux.intel.com>
References: <20220103115947.92688-1-thomas.hellstrom@linux.intel.com>
Subject: [Intel-gfx] [PATCH v4 1/4] drm/i915: Initial introduction of vma resources

Introduce vma resources, sort of similar to TTM resources, needed for asynchronous bind management. Initially we will use them to hold completion of unbinding when we capture data from a vma, but they will be used extensively in upcoming patches for asynchronous vma unbinding.
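[Editorial note, not part of the patch: the hold/unhold pair introduced here is what capture code is expected to use to keep the unbind fence from signalling while it reads a bound range. A minimal usage sketch follows; capture_vma_contents() is a hypothetical placeholder for the actual capture step.]

/* Editorial sketch of the intended hold/unhold usage; not part of the patch. */
static void capture_bound_range(struct i915_vma_resource *vma_res)
{
	bool lockdep_cookie;

	/* Fails if the unbind fence has already been allowed to signal. */
	if (!i915_vma_resource_hold(vma_res, &lockdep_cookie))
		return;

	capture_vma_contents(vma_res); /* hypothetical capture step */

	i915_vma_resource_unhold(vma_res, lockdep_cookie);
}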
Signed-off-by: Thomas Hellström --- drivers/gpu/drm/i915/Makefile | 1 + .../gpu/drm/i915/gem/i915_gem_execbuffer.c | 2 +- drivers/gpu/drm/i915/i915_vma.c | 55 +++++++- drivers/gpu/drm/i915/i915_vma.h | 19 ++- drivers/gpu/drm/i915/i915_vma_resource.c | 124 ++++++++++++++++++ drivers/gpu/drm/i915/i915_vma_resource.h | 70 ++++++++++ drivers/gpu/drm/i915/i915_vma_snapshot.c | 15 +-- drivers/gpu/drm/i915/i915_vma_snapshot.h | 7 +- drivers/gpu/drm/i915/i915_vma_types.h | 5 + drivers/gpu/drm/i915/selftests/i915_gem_gtt.c | 99 ++++++++------ 10 files changed, 334 insertions(+), 63 deletions(-) create mode 100644 drivers/gpu/drm/i915/i915_vma_resource.c create mode 100644 drivers/gpu/drm/i915/i915_vma_resource.h diff --git a/drivers/gpu/drm/i915/Makefile b/drivers/gpu/drm/i915/Makefile index 1b62b9f65196..98433ad74194 100644 --- a/drivers/gpu/drm/i915/Makefile +++ b/drivers/gpu/drm/i915/Makefile @@ -174,6 +174,7 @@ i915-y += \ i915_trace_points.o \ i915_ttm_buddy_manager.o \ i915_vma.o \ + i915_vma_resource.o \ i915_vma_snapshot.o \ intel_wopcm.o diff --git a/drivers/gpu/drm/i915/gem/i915_gem_execbuffer.c b/drivers/gpu/drm/i915/gem/i915_gem_execbuffer.c index e9541244027a..72e497745c12 100644 --- a/drivers/gpu/drm/i915/gem/i915_gem_execbuffer.c +++ b/drivers/gpu/drm/i915/gem/i915_gem_execbuffer.c @@ -1422,7 +1422,7 @@ eb_relocate_entry(struct i915_execbuffer *eb, mutex_lock(&vma->vm->mutex); err = i915_vma_bind(target->vma, target->vma->obj->cache_level, - PIN_GLOBAL, NULL); + PIN_GLOBAL, NULL, NULL); mutex_unlock(&vma->vm->mutex); reloc_cache_remap(&eb->reloc_cache, ev->vma->obj); if (err) diff --git a/drivers/gpu/drm/i915/i915_vma.c b/drivers/gpu/drm/i915/i915_vma.c index be208a8f1ed0..7097c5016431 100644 --- a/drivers/gpu/drm/i915/i915_vma.c +++ b/drivers/gpu/drm/i915/i915_vma.c @@ -37,6 +37,7 @@ #include "i915_sw_fence_work.h" #include "i915_trace.h" #include "i915_vma.h" +#include "i915_vma_resource.h" static struct kmem_cache *slab_vmas; @@ -380,6 +381,8 @@ static int i915_vma_verify_bind_complete(struct i915_vma *vma) * @cache_level: mapping cache level * @flags: flags like global or local mapping * @work: preallocated worker for allocating and binding the PTE + * @vma_res: pointer to a preallocated vma resource. The resource is either + * consumed or freed. * * DMA addresses are taken from the scatter-gather table of this object (or of * this VMA in case of non-default GGTT views) and PTE entries set up. 
@@ -388,7 +391,8 @@ static int i915_vma_verify_bind_complete(struct i915_vma *vma) int i915_vma_bind(struct i915_vma *vma, enum i915_cache_level cache_level, u32 flags, - struct i915_vma_work *work) + struct i915_vma_work *work, + struct i915_vma_resource *vma_res) { u32 bind_flags; u32 vma_flags; @@ -399,11 +403,15 @@ int i915_vma_bind(struct i915_vma *vma, if (GEM_DEBUG_WARN_ON(range_overflows(vma->node.start, vma->node.size, - vma->vm->total))) + vma->vm->total))) { + kfree(vma_res); return -ENODEV; + } - if (GEM_DEBUG_WARN_ON(!flags)) + if (GEM_DEBUG_WARN_ON(!flags)) { + kfree(vma_res); return -EINVAL; + } bind_flags = flags; bind_flags &= I915_VMA_GLOBAL_BIND | I915_VMA_LOCAL_BIND; @@ -412,11 +420,21 @@ int i915_vma_bind(struct i915_vma *vma, vma_flags &= I915_VMA_GLOBAL_BIND | I915_VMA_LOCAL_BIND; bind_flags &= ~vma_flags; - if (bind_flags == 0) + if (bind_flags == 0) { + kfree(vma_res); return 0; + } GEM_BUG_ON(!atomic_read(&vma->pages_count)); + if (vma->resource || !vma_res) { + /* Rebinding with an additional I915_VMA_*_BIND */ + GEM_WARN_ON(!vma_flags); + kfree(vma_res); + } else { + i915_vma_resource_init(vma_res); + vma->resource = vma_res; + } trace_i915_vma_bind(vma, bind_flags); if (work && bind_flags & vma->vm->bind_async_flags) { struct dma_fence *prev; @@ -1279,6 +1297,7 @@ int i915_vma_pin_ww(struct i915_vma *vma, struct i915_gem_ww_ctx *ww, { struct i915_vma_work *work = NULL; struct dma_fence *moving = NULL; + struct i915_vma_resource *vma_res = NULL; intel_wakeref_t wakeref = 0; unsigned int bound; int err; @@ -1333,6 +1352,12 @@ int i915_vma_pin_ww(struct i915_vma *vma, struct i915_gem_ww_ctx *ww, } } + vma_res = i915_vma_resource_alloc(); + if (IS_ERR(vma_res)) { + err = PTR_ERR(vma_res); + goto err_fence; + } + /* * Differentiate between user/kernel vma inside the aliasing-ppgtt. * @@ -1353,7 +1378,7 @@ int i915_vma_pin_ww(struct i915_vma *vma, struct i915_gem_ww_ctx *ww, err = mutex_lock_interruptible_nested(&vma->vm->mutex, !(flags & PIN_GLOBAL)); if (err) - goto err_fence; + goto err_vma_res; /* No more allocations allowed now we hold vm->mutex */ @@ -1394,7 +1419,8 @@ int i915_vma_pin_ww(struct i915_vma *vma, struct i915_gem_ww_ctx *ww, GEM_BUG_ON(!vma->pages); err = i915_vma_bind(vma, vma->obj->cache_level, - flags, work); + flags, work, vma_res); + vma_res = NULL; if (err) goto err_remove; @@ -1417,6 +1443,8 @@ int i915_vma_pin_ww(struct i915_vma *vma, struct i915_gem_ww_ctx *ww, i915_active_release(&vma->active); err_unlock: mutex_unlock(&vma->vm->mutex); +err_vma_res: + kfree(vma_res); err_fence: if (work) dma_fence_work_commit_imm(&work->base); @@ -1567,6 +1595,7 @@ void i915_vma_release(struct kref *ref) i915_vm_put(vma->vm); i915_active_fini(&vma->active); + GEM_WARN_ON(vma->resource); i915_vma_free(vma); } @@ -1715,6 +1744,8 @@ int _i915_vma_move_to_active(struct i915_vma *vma, void __i915_vma_evict(struct i915_vma *vma) { + struct dma_fence *unbind_fence; + GEM_BUG_ON(i915_vma_is_pinned(vma)); if (i915_vma_is_map_and_fenceable(vma)) { @@ -1752,8 +1783,20 @@ void __i915_vma_evict(struct i915_vma *vma) atomic_and(~(I915_VMA_BIND_MASK | I915_VMA_ERROR | I915_VMA_GGTT_WRITE), &vma->flags); + unbind_fence = i915_vma_resource_unbind(vma->resource); + i915_vma_resource_put(vma->resource); + vma->resource = NULL; + i915_vma_detach(vma); vma_unbind_pages(vma); + + /* + * This uninterruptible wait under the vm mutex is currently + * only ever blocking while the vma is being captured from. + * With async unbinding, this wait here will be removed. 
+ */ + dma_fence_wait(unbind_fence, false); + dma_fence_put(unbind_fence); } int __i915_vma_unbind(struct i915_vma *vma) diff --git a/drivers/gpu/drm/i915/i915_vma.h b/drivers/gpu/drm/i915/i915_vma.h index 32719431b3df..de0f3e44cdfa 100644 --- a/drivers/gpu/drm/i915/i915_vma.h +++ b/drivers/gpu/drm/i915/i915_vma.h @@ -37,6 +37,7 @@ #include "i915_active.h" #include "i915_request.h" +#include "i915_vma_resource.h" #include "i915_vma_types.h" struct i915_vma * @@ -204,7 +205,8 @@ struct i915_vma_work *i915_vma_work(void); int i915_vma_bind(struct i915_vma *vma, enum i915_cache_level cache_level, u32 flags, - struct i915_vma_work *work); + struct i915_vma_work *work, + struct i915_vma_resource *vma_res); bool i915_gem_valid_gtt_space(struct i915_vma *vma, unsigned long color); bool i915_vma_misplaced(const struct i915_vma *vma, @@ -428,6 +430,21 @@ static inline int i915_vma_sync(struct i915_vma *vma) return i915_active_wait(&vma->active); } +/** + * i915_vma_get_current_resource - Get the current resource of the vma + * @vma: The vma to get the current resource from. + * + * It's illegal to call this function if the vma is not bound. + * + * Return: A refcounted pointer to the current vma resource + * of the vma, assuming the vma is bound. + */ +static inline struct i915_vma_resource * +i915_vma_get_current_resource(struct i915_vma *vma) +{ + return i915_vma_resource_get(vma->resource); +} + void i915_vma_module_exit(void); int i915_vma_module_init(void); diff --git a/drivers/gpu/drm/i915/i915_vma_resource.c b/drivers/gpu/drm/i915/i915_vma_resource.c new file mode 100644 index 000000000000..833e987bed2a --- /dev/null +++ b/drivers/gpu/drm/i915/i915_vma_resource.c @@ -0,0 +1,124 @@ +// SPDX-License-Identifier: MIT +/* + * Copyright © 2021 Intel Corporation + */ +#include + +#include "i915_vma_resource.h" + +/* Callbacks for the unbind dma-fence. */ +static const char *get_driver_name(struct dma_fence *fence) +{ + return "vma unbind fence"; +} + +static const char *get_timeline_name(struct dma_fence *fence) +{ + return "unbound"; +} + +static struct dma_fence_ops unbind_fence_ops = { + .get_driver_name = get_driver_name, + .get_timeline_name = get_timeline_name, +}; + +/** + * i915_vma_resource_init - Initialize a vma resource. + * @vma_res: The vma resource to initialize + * + * Initializes a vma resource allocated using i915_vma_resource_alloc(). + * The reason for having separate allocate and initialize function is that + * initialization may need to be performed from under a lock where + * allocation is not allowed. + */ +void i915_vma_resource_init(struct i915_vma_resource *vma_res) +{ + spin_lock_init(&vma_res->lock); + dma_fence_init(&vma_res->unbind_fence, &unbind_fence_ops, + &vma_res->lock, 0, 0); + refcount_set(&vma_res->hold_count, 1); +} + +/** + * i915_vma_resource_alloc - Allocate a vma resource + * + * Return: A pointer to a cleared struct i915_vma_resource or + * a -ENOMEM error pointer if allocation fails. + */ +struct i915_vma_resource *i915_vma_resource_alloc(void) +{ + struct i915_vma_resource *vma_res = + kzalloc(sizeof(*vma_res), GFP_KERNEL); + + return vma_res ? vma_res : ERR_PTR(-ENOMEM); +} + +static void __i915_vma_resource_unhold(struct i915_vma_resource *vma_res) +{ + if (refcount_dec_and_test(&vma_res->hold_count)) + dma_fence_signal(&vma_res->unbind_fence); +} + +/** + * i915_vma_resource_unhold - Unhold the signaling of the vma resource unbind + * fence. + * @vma_res: The vma resource. + * @lockdep_cookie: The lockdep cookie returned from i915_vma_resource_hold. 
+ * + * The function may leave a dma_fence critical section. + */ +void i915_vma_resource_unhold(struct i915_vma_resource *vma_res, + bool lockdep_cookie) +{ + dma_fence_end_signalling(lockdep_cookie); + + if (IS_ENABLED(CONFIG_PROVE_LOCKING)) { + unsigned long irq_flags; + + /* Inefficient open-coded might_lock_irqsave() */ + spin_lock_irqsave(&vma_res->lock, irq_flags); + spin_unlock_irqrestore(&vma_res->lock, irq_flags); + } + + __i915_vma_resource_unhold(vma_res); +} + +/** + * i915_vma_resource_hold - Hold the signaling of the vma resource unbind fence. + * @vma_res: The vma resource. + * @lockdep_cookie: Pointer to a bool serving as a lockdep cookie that should + * be given as an argument to the pairing i915_vma_resource_unhold. + * + * If returning true, the function enters a dma_fence signalling critical + * section if not in one already. + * + * Return: true if holding successful, false if not. + */ +bool i915_vma_resource_hold(struct i915_vma_resource *vma_res, + bool *lockdep_cookie) +{ + bool held = refcount_inc_not_zero(&vma_res->hold_count); + + if (held) + *lockdep_cookie = dma_fence_begin_signalling(); + + return held; +} + +/** + * i915_vma_resource_unbind - Unbind a vma resource + * @vma_res: The vma resource to unbind. + * + * At this point this function does little more than publish a fence that + * signals immediately unless signaling is held back. + * + * Return: A refcounted pointer to a dma-fence that signals when unbinding is + * complete. + */ +struct dma_fence * +i915_vma_resource_unbind(struct i915_vma_resource *vma_res) +{ + __i915_vma_resource_unhold(vma_res); + dma_fence_get(&vma_res->unbind_fence); + return &vma_res->unbind_fence; +} diff --git a/drivers/gpu/drm/i915/i915_vma_resource.h b/drivers/gpu/drm/i915/i915_vma_resource.h new file mode 100644 index 000000000000..34744da23072 --- /dev/null +++ b/drivers/gpu/drm/i915/i915_vma_resource.h @@ -0,0 +1,70 @@ +/* SPDX-License-Identifier: MIT */ +/* + * Copyright © 2021 Intel Corporation + */ + +#ifndef __I915_VMA_RESOURCE_H__ +#define __I915_VMA_RESOURCE_H__ + +#include <linux/dma-fence.h> +#include <linux/refcount.h> + +/** + * struct i915_vma_resource - Snapshotted unbind information. + * @unbind_fence: Fence to mark unbinding complete. Note that this fence + * is not considered published until unbind is scheduled, and as such it + * is illegal to access this fence before scheduled unbind other than + * for refcounting. + * @lock: The @unbind_fence lock. We're also using it to protect the weak + * pointer to the struct i915_vma, @vma during lookup and takedown. + * @hold_count: Number of holders blocking the fence from finishing. + * The vma itself is keeping a hold, which is released when unbind + * is scheduled. + * + * The lifetime of a struct i915_vma_resource is from a binding request until + * the actual possible asynchronous unbind has completed. + */ +struct i915_vma_resource { + struct dma_fence unbind_fence; + /* See above for description of the lock. */ + spinlock_t lock; + refcount_t hold_count; +}; + +bool i915_vma_resource_hold(struct i915_vma_resource *vma_res, + bool *lockdep_cookie); + +void i915_vma_resource_unhold(struct i915_vma_resource *vma_res, + bool lockdep_cookie); + +struct i915_vma_resource *i915_vma_resource_alloc(void); + +struct dma_fence *i915_vma_resource_unbind(struct i915_vma_resource *vma_res); + +/** + * i915_vma_resource_get - Take a reference on a vma resource + * @vma_res: The vma resource on which to take a reference.
+ * + * Return: The @vma_res pointer + */ +static inline struct i915_vma_resource +*i915_vma_resource_get(struct i915_vma_resource *vma_res) +{ + dma_fence_get(&vma_res->unbind_fence); + return vma_res; +} + +/** + * i915_vma_resource_put - Release a reference to a struct i915_vma_resource + * @vma_res: The resource + */ +static inline void i915_vma_resource_put(struct i915_vma_resource *vma_res) +{ + dma_fence_put(&vma_res->unbind_fence); +} + +#if IS_ENABLED(CONFIG_DRM_I915_SELFTEST) +void i915_vma_resource_init(struct i915_vma_resource *vma_res); +#endif + +#endif diff --git a/drivers/gpu/drm/i915/i915_vma_snapshot.c b/drivers/gpu/drm/i915/i915_vma_snapshot.c index 2949ceea9884..f7333c7a2f5e 100644 --- a/drivers/gpu/drm/i915/i915_vma_snapshot.c +++ b/drivers/gpu/drm/i915/i915_vma_snapshot.c @@ -3,6 +3,7 @@ * Copyright © 2021 Intel Corporation */ +#include "i915_vma_resource.h" #include "i915_vma_snapshot.h" #include "i915_vma_types.h" #include "i915_vma.h" @@ -35,7 +36,7 @@ void i915_vma_snapshot_init(struct i915_vma_snapshot *vsnap, vsnap->pages_rsgt = i915_refct_sgt_get(vma->obj->mm.rsgt); vsnap->mr = vma->obj->mm.region; kref_init(&vsnap->kref); - vsnap->vma_resource = &vma->active; + vsnap->vma_resource = i915_vma_get_current_resource(vma); vsnap->onstack = false; vsnap->present = true; } @@ -62,6 +63,7 @@ static void vma_snapshot_release(struct kref *ref) container_of(ref, typeof(*vsnap), kref); vsnap->present = false; + i915_vma_resource_put(vsnap->vma_resource); if (vsnap->pages_rsgt) i915_refct_sgt_put(vsnap->pages_rsgt); if (!vsnap->onstack) @@ -109,12 +111,7 @@ void i915_vma_snapshot_put_onstack(struct i915_vma_snapshot *vsnap) bool i915_vma_snapshot_resource_pin(struct i915_vma_snapshot *vsnap, bool *lockdep_cookie) { - bool pinned = i915_active_acquire_if_busy(vsnap->vma_resource); - - if (pinned) - *lockdep_cookie = dma_fence_begin_signalling(); - - return pinned; + return i915_vma_resource_hold(vsnap->vma_resource, lockdep_cookie); } /** @@ -128,7 +125,5 @@ bool i915_vma_snapshot_resource_pin(struct i915_vma_snapshot *vsnap, void i915_vma_snapshot_resource_unpin(struct i915_vma_snapshot *vsnap, bool lockdep_cookie) { - dma_fence_end_signalling(lockdep_cookie); - - return i915_active_release(vsnap->vma_resource); + i915_vma_resource_unhold(vsnap->vma_resource, lockdep_cookie); } diff --git a/drivers/gpu/drm/i915/i915_vma_snapshot.h b/drivers/gpu/drm/i915/i915_vma_snapshot.h index 940581df4622..e74588dd676b 100644 --- a/drivers/gpu/drm/i915/i915_vma_snapshot.h +++ b/drivers/gpu/drm/i915/i915_vma_snapshot.h @@ -31,10 +31,7 @@ struct sg_table; * @pages_rsgt: The refcounted sg_table holding the reference for @pages if any. * @mr: The memory region pointed for the pages bound. * @kref: Reference for this structure. - * @vma_resource: FIXME: A means to keep the unbind fence from signaling. - * Temporarily while we have only sync unbinds, and still use the vma - * active, we use that. With async unbinding we need a signaling refcount - * for the unbind fence. + * @vma_resource: Pointer to the vma resource representing the vma binding. * @page_sizes: The vma GTT page sizes information. * @onstack: Whether the structure shouldn't be freed on final put. * @present: Whether the structure is present and initialized. 
@@ -49,7 +46,7 @@ struct i915_vma_snapshot { struct i915_refct_sgt *pages_rsgt; struct intel_memory_region *mr; struct kref kref; - struct i915_active *vma_resource; + struct i915_vma_resource *vma_resource; u32 page_sizes; bool onstack:1; bool present:1; diff --git a/drivers/gpu/drm/i915/i915_vma_types.h b/drivers/gpu/drm/i915/i915_vma_types.h index ca575e129ced..ac1928d1dc11 100644 --- a/drivers/gpu/drm/i915/i915_vma_types.h +++ b/drivers/gpu/drm/i915/i915_vma_types.h @@ -95,6 +95,8 @@ enum i915_cache_level; * */ +struct i915_vma_resource; + struct intel_remapped_plane_info { /* in gtt pages */ u32 offset:31; @@ -291,6 +293,9 @@ struct i915_vma { struct list_head evict_link; struct list_head closed_link; + + /** The async vma resource. Protected by the vm_mutex */ + struct i915_vma_resource *resource; }; #endif diff --git a/drivers/gpu/drm/i915/selftests/i915_gem_gtt.c b/drivers/gpu/drm/i915/selftests/i915_gem_gtt.c index 575705c3bce9..54be880e55c3 100644 --- a/drivers/gpu/drm/i915/selftests/i915_gem_gtt.c +++ b/drivers/gpu/drm/i915/selftests/i915_gem_gtt.c @@ -32,6 +32,7 @@ #include "i915_random.h" #include "i915_selftest.h" +#include "i915_vma_resource.h" #include "mock_drm.h" #include "mock_gem_device.h" @@ -1336,6 +1337,33 @@ static int igt_mock_drunk(void *arg) return exercise_mock(ggtt->vm.i915, drunk_hole); } +static int reserve_gtt_with_resource(struct i915_vma *vma, u64 offset) +{ + struct i915_address_space *vm = vma->vm; + struct i915_vma_resource *vma_res; + struct drm_i915_gem_object *obj = vma->obj; + int err; + + vma_res = i915_vma_resource_alloc(); + if (IS_ERR(vma_res)) + return PTR_ERR(vma_res); + + mutex_lock(&vm->mutex); + err = i915_gem_gtt_reserve(vm, &vma->node, obj->base.size, + offset, + obj->cache_level, + 0); + if (!err) { + i915_vma_resource_init(vma_res); + vma->resource = vma_res; + } else { + kfree(vma_res); + } + mutex_unlock(&vm->mutex); + + return err; +} + static int igt_gtt_reserve(void *arg) { struct i915_ggtt *ggtt = arg; @@ -1370,20 +1398,13 @@ static int igt_gtt_reserve(void *arg) } list_add(&obj->st_link, &objects); - vma = i915_vma_instance(obj, &ggtt->vm, NULL); if (IS_ERR(vma)) { err = PTR_ERR(vma); goto out; } - mutex_lock(&ggtt->vm.mutex); - err = i915_gem_gtt_reserve(&ggtt->vm, &vma->node, - obj->base.size, - total, - obj->cache_level, - 0); - mutex_unlock(&ggtt->vm.mutex); + err = reserve_gtt_with_resource(vma, total); if (err) { pr_err("i915_gem_gtt_reserve (pass 1) failed at %llu/%llu with err=%d\n", total, ggtt->vm.total, err); @@ -1429,13 +1450,7 @@ static int igt_gtt_reserve(void *arg) goto out; } - mutex_lock(&ggtt->vm.mutex); - err = i915_gem_gtt_reserve(&ggtt->vm, &vma->node, - obj->base.size, - total, - obj->cache_level, - 0); - mutex_unlock(&ggtt->vm.mutex); + err = reserve_gtt_with_resource(vma, total); if (err) { pr_err("i915_gem_gtt_reserve (pass 2) failed at %llu/%llu with err=%d\n", total, ggtt->vm.total, err); @@ -1476,13 +1491,7 @@ static int igt_gtt_reserve(void *arg) 2 * I915_GTT_PAGE_SIZE, I915_GTT_MIN_ALIGNMENT); - mutex_lock(&ggtt->vm.mutex); - err = i915_gem_gtt_reserve(&ggtt->vm, &vma->node, - obj->base.size, - offset, - obj->cache_level, - 0); - mutex_unlock(&ggtt->vm.mutex); + err = reserve_gtt_with_resource(vma, offset); if (err) { pr_err("i915_gem_gtt_reserve (pass 3) failed at %llu/%llu with err=%d\n", total, ggtt->vm.total, err); @@ -1509,6 +1518,31 @@ static int igt_gtt_reserve(void *arg) return err; } +static int insert_gtt_with_resource(struct i915_vma *vma) +{ + struct i915_address_space *vm = vma->vm; + 
struct i915_vma_resource *vma_res; struct drm_i915_gem_object *obj = vma->obj; int err; + vma_res = i915_vma_resource_alloc(); + if (IS_ERR(vma_res)) + return PTR_ERR(vma_res); + + mutex_lock(&vm->mutex); + err = i915_gem_gtt_insert(vm, &vma->node, obj->base.size, 0, + obj->cache_level, 0, vm->total, 0); + if (!err) { + i915_vma_resource_init(vma_res); + vma->resource = vma_res; + } else { + kfree(vma_res); + } + mutex_unlock(&vm->mutex); + + return err; +} + static int igt_gtt_insert(void *arg) { struct i915_ggtt *ggtt = arg; @@ -1593,12 +1627,7 @@ static int igt_gtt_insert(void *arg) goto out; } - mutex_lock(&ggtt->vm.mutex); - err = i915_gem_gtt_insert(&ggtt->vm, &vma->node, - obj->base.size, 0, obj->cache_level, - 0, ggtt->vm.total, - 0); - mutex_unlock(&ggtt->vm.mutex); + err = insert_gtt_with_resource(vma); if (err == -ENOSPC) { /* maxed out the GGTT space */ i915_gem_object_put(obj); @@ -1653,12 +1682,7 @@ static int igt_gtt_insert(void *arg) goto out; } - mutex_lock(&ggtt->vm.mutex); - err = i915_gem_gtt_insert(&ggtt->vm, &vma->node, - obj->base.size, 0, obj->cache_level, - 0, ggtt->vm.total, - 0); - mutex_unlock(&ggtt->vm.mutex); + err = insert_gtt_with_resource(vma); if (err) { pr_err("i915_gem_gtt_insert (pass 2) failed at %llu/%llu with err=%d\n", total, ggtt->vm.total, err); @@ -1702,12 +1726,7 @@ static int igt_gtt_insert(void *arg) goto out; } - mutex_lock(&ggtt->vm.mutex); - err = i915_gem_gtt_insert(&ggtt->vm, &vma->node, - obj->base.size, 0, obj->cache_level, - 0, ggtt->vm.total, - 0); - mutex_unlock(&ggtt->vm.mutex); + err = insert_gtt_with_resource(vma); if (err) { pr_err("i915_gem_gtt_insert (pass 3) failed at %llu/%llu with err=%d\n", total, ggtt->vm.total, err);
From patchwork Mon Jan 3 11:59:45 2022
X-Patchwork-Submitter: Thomas Hellstrom
X-Patchwork-Id: 12702513
From: Thomas Hellström
To: intel-gfx@lists.freedesktop.org, dri-devel@lists.freedesktop.org
Cc: Thomas Hellström, matthew.auld@intel.com
Date: Mon, 3 Jan 2022 12:59:45 +0100
Message-Id: <20220103115947.92688-3-thomas.hellstrom@linux.intel.com>
In-Reply-To: <20220103115947.92688-1-thomas.hellstrom@linux.intel.com>
References: <20220103115947.92688-1-thomas.hellstrom@linux.intel.com>
Subject: [Intel-gfx] [PATCH v4 2/4] drm/i915: Use the vma resource as argument for gtt binding / unbinding

When introducing asynchronous unbinding, the vma itself may no longer be alive when the actual binding or unbinding takes place. Update the gtt i915_vma_ops accordingly to take a struct i915_vma_resource instead of a struct i915_vma for the bind_vma() and unbind_vma() ops. Similarly change the insert_entries() op for struct i915_address_space. Replace a couple of i915_vma_snapshot members with their newly introduced i915_vma_resource counterparts, since they have the same lifetime. Also make sure to avoid changing the struct i915_vma_flags (in particular the bind flags) async. That should now only be done sync under the vm mutex.
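[Editorial note, not part of the diff below: after this change a backend's unbind hook no longer dereferences the vma at all; everything it needs is snapshotted in the vma resource. A minimal sketch along the lines of the dpt/ggtt callbacks in this patch, with a made-up backend name:]

/* Editorial sketch mirroring the reworked unbind callbacks; name is made up. */
static void example_unbind_vma(struct i915_address_space *vm,
			       struct i915_vma_resource *vma_res)
{
	/* Only the snapshotted range is needed, not the (possibly dead) vma. */
	vm->clear_range(vm, vma_res->start, vma_res->vma_size);
}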
v2: - Update the vma_res::bound_flags when binding to the aliased ggtt Signed-off-by: Thomas Hellström --- drivers/gpu/drm/i915/display/intel_dpt.c | 27 ++--- .../gpu/drm/i915/gem/i915_gem_object_types.h | 27 +---- .../gpu/drm/i915/gem/selftests/huge_pages.c | 37 +++---- drivers/gpu/drm/i915/gt/gen6_ppgtt.c | 19 ++-- drivers/gpu/drm/i915/gt/gen8_ppgtt.c | 37 +++---- drivers/gpu/drm/i915/gt/intel_engine_cs.c | 4 +- drivers/gpu/drm/i915/gt/intel_ggtt.c | 70 ++++++------- drivers/gpu/drm/i915/gt/intel_gtt.h | 16 +-- drivers/gpu/drm/i915/gt/intel_ppgtt.c | 22 +++-- drivers/gpu/drm/i915/gt/uc/intel_uc_fw.c | 13 ++- drivers/gpu/drm/i915/gt/uc/intel_uc_fw.h | 2 +- drivers/gpu/drm/i915/i915_debugfs.c | 3 +- drivers/gpu/drm/i915/i915_gpu_error.c | 6 +- drivers/gpu/drm/i915/i915_vma.c | 24 ++++- drivers/gpu/drm/i915/i915_vma.h | 11 +-- drivers/gpu/drm/i915/i915_vma_resource.c | 9 +- drivers/gpu/drm/i915/i915_vma_resource.h | 99 ++++++++++++++++++- drivers/gpu/drm/i915/i915_vma_snapshot.c | 4 - drivers/gpu/drm/i915/i915_vma_snapshot.h | 8 -- drivers/gpu/drm/i915/selftests/i915_gem_gtt.c | 64 ++++++++---- drivers/gpu/drm/i915/selftests/mock_gtt.c | 12 +-- 21 files changed, 308 insertions(+), 206 deletions(-) diff --git a/drivers/gpu/drm/i915/display/intel_dpt.c b/drivers/gpu/drm/i915/display/intel_dpt.c index 8f674745e7e0..63a83d5f85a1 100644 --- a/drivers/gpu/drm/i915/display/intel_dpt.c +++ b/drivers/gpu/drm/i915/display/intel_dpt.c @@ -48,7 +48,7 @@ static void dpt_insert_page(struct i915_address_space *vm, } static void dpt_insert_entries(struct i915_address_space *vm, - struct i915_vma *vma, + struct i915_vma_resource *vma_res, enum i915_cache_level level, u32 flags) { @@ -64,8 +64,8 @@ static void dpt_insert_entries(struct i915_address_space *vm, * not to allow the user to override access to a read only page. */ - i = vma->node.start / I915_GTT_PAGE_SIZE; - for_each_sgt_daddr(addr, sgt_iter, vma->pages) + i = vma_res->start / I915_GTT_PAGE_SIZE; + for_each_sgt_daddr(addr, sgt_iter, vma_res->bi.pages) gen8_set_pte(&base[i++], pte_encode | addr); } @@ -76,35 +76,38 @@ static void dpt_clear_range(struct i915_address_space *vm, static void dpt_bind_vma(struct i915_address_space *vm, struct i915_vm_pt_stash *stash, - struct i915_vma *vma, + struct i915_vma_resource *vma_res, enum i915_cache_level cache_level, u32 flags) { - struct drm_i915_gem_object *obj = vma->obj; u32 pte_flags; + if (vma_res->bound_flags) + return; + /* Applicable to VLV (gen8+ do not support RO in the GGTT) */ pte_flags = 0; - if (vma->vm->has_read_only && i915_gem_object_is_readonly(obj)) + if (vm->has_read_only && vma_res->bi.readonly) pte_flags |= PTE_READ_ONLY; - if (i915_gem_object_is_lmem(obj)) + if (vma_res->bi.lmem) pte_flags |= PTE_LM; - vma->vm->insert_entries(vma->vm, vma, cache_level, pte_flags); + vm->insert_entries(vm, vma_res, cache_level, pte_flags); - vma->page_sizes.gtt = I915_GTT_PAGE_SIZE; + vma_res->page_sizes_gtt = I915_GTT_PAGE_SIZE; /* * Without aliasing PPGTT there's no difference between * GLOBAL/LOCAL_BIND, it's all the same ptes. Hence unconditionally * upgrade to both bound if we bind either to avoid double-binding. 
*/ - atomic_or(I915_VMA_GLOBAL_BIND | I915_VMA_LOCAL_BIND, &vma->flags); + vma_res->bound_flags = I915_VMA_GLOBAL_BIND | I915_VMA_LOCAL_BIND; } -static void dpt_unbind_vma(struct i915_address_space *vm, struct i915_vma *vma) +static void dpt_unbind_vma(struct i915_address_space *vm, + struct i915_vma_resource *vma_res) { - vm->clear_range(vm, vma->node.start, vma->size); + vm->clear_range(vm, vma_res->start, vma_res->vma_size); } static void dpt_cleanup(struct i915_address_space *vm) diff --git a/drivers/gpu/drm/i915/gem/i915_gem_object_types.h b/drivers/gpu/drm/i915/gem/i915_gem_object_types.h index f9f7e44099fe..f99d260e0684 100644 --- a/drivers/gpu/drm/i915/gem/i915_gem_object_types.h +++ b/drivers/gpu/drm/i915/gem/i915_gem_object_types.h @@ -15,6 +15,7 @@ #include "i915_active.h" #include "i915_selftest.h" +#include "i915_vma_resource.h" struct drm_i915_gem_object; struct intel_fronbuffer; @@ -549,31 +550,7 @@ struct drm_i915_gem_object { struct sg_table *pages; void *mapping; - struct i915_page_sizes { - /** - * The sg mask of the pages sg_table. i.e the mask of - * of the lengths for each sg entry. - */ - unsigned int phys; - - /** - * The gtt page sizes we are allowed to use given the - * sg mask and the supported page sizes. This will - * express the smallest unit we can use for the whole - * object, as well as the larger sizes we may be able - * to use opportunistically. - */ - unsigned int sg; - - /** - * The actual gtt page size usage. Since we can have - * multiple vma associated with this object we need to - * prevent any trampling of state, hence a copy of this - * struct also lives in each vma, therefore the gtt - * value here should only be read/write through the vma. - */ - unsigned int gtt; - } page_sizes; + struct i915_page_sizes page_sizes; I915_SELFTEST_DECLARE(unsigned int page_mask); diff --git a/drivers/gpu/drm/i915/gem/selftests/huge_pages.c b/drivers/gpu/drm/i915/gem/selftests/huge_pages.c index 11f0aa65f8a3..26f997c376a2 100644 --- a/drivers/gpu/drm/i915/gem/selftests/huge_pages.c +++ b/drivers/gpu/drm/i915/gem/selftests/huge_pages.c @@ -370,9 +370,9 @@ static int igt_check_page_sizes(struct i915_vma *vma) err = -EINVAL; } - if (!HAS_PAGE_SIZES(i915, vma->page_sizes.gtt)) { + if (!HAS_PAGE_SIZES(i915, vma->resource->page_sizes_gtt)) { pr_err("unsupported page_sizes.gtt=%u, supported=%u\n", - vma->page_sizes.gtt & ~supported, supported); + vma->resource->page_sizes_gtt & ~supported, supported); err = -EINVAL; } @@ -403,15 +403,9 @@ static int igt_check_page_sizes(struct i915_vma *vma) if (i915_gem_object_is_lmem(obj) && IS_ALIGNED(vma->node.start, SZ_2M) && vma->page_sizes.sg & SZ_2M && - vma->page_sizes.gtt < SZ_2M) { + vma->resource->page_sizes_gtt < SZ_2M) { pr_err("gtt pages mismatch for LMEM, expected 2M GTT pages, sg(%u), gtt(%u)\n", - vma->page_sizes.sg, vma->page_sizes.gtt); - err = -EINVAL; - } - - if (obj->mm.page_sizes.gtt) { - pr_err("obj->page_sizes.gtt(%u) should never be set\n", - obj->mm.page_sizes.gtt); + vma->page_sizes.sg, vma->resource->page_sizes_gtt); err = -EINVAL; } @@ -547,9 +541,9 @@ static int igt_mock_memory_region_huge_pages(void *arg) goto out_unpin; } - if (vma->page_sizes.gtt != page_size) { + if (vma->resource->page_sizes_gtt != page_size) { pr_err("%s page_sizes.gtt=%u, expected=%u\n", - __func__, vma->page_sizes.gtt, + __func__, vma->resource->page_sizes_gtt, page_size); err = -EINVAL; goto out_unpin; @@ -630,9 +624,9 @@ static int igt_mock_ppgtt_misaligned_dma(void *arg) err = igt_check_page_sizes(vma); - if (vma->page_sizes.gtt 
!= page_size) { + if (vma->resource->page_sizes_gtt != page_size) { pr_err("page_sizes.gtt=%u, expected %u\n", - vma->page_sizes.gtt, page_size); + vma->resource->page_sizes_gtt, page_size); err = -EINVAL; } @@ -657,9 +651,10 @@ static int igt_mock_ppgtt_misaligned_dma(void *arg) err = igt_check_page_sizes(vma); - if (vma->page_sizes.gtt != I915_GTT_PAGE_SIZE_4K) { + if (vma->resource->page_sizes_gtt != I915_GTT_PAGE_SIZE_4K) { pr_err("page_sizes.gtt=%u, expected %llu\n", - vma->page_sizes.gtt, I915_GTT_PAGE_SIZE_4K); + vma->resource->page_sizes_gtt, + I915_GTT_PAGE_SIZE_4K); err = -EINVAL; } @@ -805,9 +800,9 @@ static int igt_mock_ppgtt_huge_fill(void *arg) } } - if (vma->page_sizes.gtt != expected_gtt) { + if (vma->resource->page_sizes_gtt != expected_gtt) { pr_err("gtt=%u, expected=%u, size=%zd, single=%s\n", - vma->page_sizes.gtt, expected_gtt, + vma->resource->page_sizes_gtt, expected_gtt, obj->base.size, yesno(!!single)); err = -EINVAL; break; @@ -961,10 +956,10 @@ static int igt_mock_ppgtt_64K(void *arg) } } - if (vma->page_sizes.gtt != expected_gtt) { + if (vma->resource->page_sizes_gtt != expected_gtt) { pr_err("gtt=%u, expected=%u, i=%d, single=%s\n", - vma->page_sizes.gtt, expected_gtt, i, - yesno(!!single)); + vma->resource->page_sizes_gtt, + expected_gtt, i, yesno(!!single)); err = -EINVAL; goto out_vma_unpin; } diff --git a/drivers/gpu/drm/i915/gt/gen6_ppgtt.c b/drivers/gpu/drm/i915/gt/gen6_ppgtt.c index 6e9292918bfc..d657ffd6c86a 100644 --- a/drivers/gpu/drm/i915/gt/gen6_ppgtt.c +++ b/drivers/gpu/drm/i915/gt/gen6_ppgtt.c @@ -104,17 +104,17 @@ static void gen6_ppgtt_clear_range(struct i915_address_space *vm, } static void gen6_ppgtt_insert_entries(struct i915_address_space *vm, - struct i915_vma *vma, + struct i915_vma_resource *vma_res, enum i915_cache_level cache_level, u32 flags) { struct i915_ppgtt *ppgtt = i915_vm_to_ppgtt(vm); struct i915_page_directory * const pd = ppgtt->pd; - unsigned int first_entry = vma->node.start / I915_GTT_PAGE_SIZE; + unsigned int first_entry = vma_res->start / I915_GTT_PAGE_SIZE; unsigned int act_pt = first_entry / GEN6_PTES; unsigned int act_pte = first_entry % GEN6_PTES; const u32 pte_encode = vm->pte_encode(0, cache_level, flags); - struct sgt_dma iter = sgt_dma(vma); + struct sgt_dma iter = sgt_dma(vma_res); gen6_pte_t *vaddr; GEM_BUG_ON(!pd->entry[act_pt]); @@ -140,7 +140,7 @@ static void gen6_ppgtt_insert_entries(struct i915_address_space *vm, } } while (1); - vma->page_sizes.gtt = I915_GTT_PAGE_SIZE; + vma_res->page_sizes_gtt = I915_GTT_PAGE_SIZE; } static void gen6_flush_pd(struct gen6_ppgtt *ppgtt, u64 start, u64 end) @@ -271,13 +271,13 @@ static void gen6_ppgtt_cleanup(struct i915_address_space *vm) static void pd_vma_bind(struct i915_address_space *vm, struct i915_vm_pt_stash *stash, - struct i915_vma *vma, + struct i915_vma_resource *vma_res, enum i915_cache_level cache_level, u32 unused) { struct i915_ggtt *ggtt = i915_vm_to_ggtt(vm); - struct gen6_ppgtt *ppgtt = vma->private; - u32 ggtt_offset = i915_ggtt_offset(vma) / I915_GTT_PAGE_SIZE; + struct gen6_ppgtt *ppgtt = vma_res->private; + u32 ggtt_offset = vma_res->start / I915_GTT_PAGE_SIZE; ppgtt->pp_dir = ggtt_offset * sizeof(gen6_pte_t) << 10; ppgtt->pd_addr = (gen6_pte_t __iomem *)ggtt->gsm + ggtt_offset; @@ -285,9 +285,10 @@ static void pd_vma_bind(struct i915_address_space *vm, gen6_flush_pd(ppgtt, 0, ppgtt->base.vm.total); } -static void pd_vma_unbind(struct i915_address_space *vm, struct i915_vma *vma) +static void pd_vma_unbind(struct i915_address_space *vm, + struct 
i915_vma_resource *vma_res) { - struct gen6_ppgtt *ppgtt = vma->private; + struct gen6_ppgtt *ppgtt = vma_res->private; struct i915_page_directory * const pd = ppgtt->base.pd; struct i915_page_table *pt; unsigned int pde; diff --git a/drivers/gpu/drm/i915/gt/gen8_ppgtt.c b/drivers/gpu/drm/i915/gt/gen8_ppgtt.c index b012c50f7ce7..c43e724afa9f 100644 --- a/drivers/gpu/drm/i915/gt/gen8_ppgtt.c +++ b/drivers/gpu/drm/i915/gt/gen8_ppgtt.c @@ -453,20 +453,21 @@ gen8_ppgtt_insert_pte(struct i915_ppgtt *ppgtt, return idx; } -static void gen8_ppgtt_insert_huge(struct i915_vma *vma, +static void gen8_ppgtt_insert_huge(struct i915_address_space *vm, + struct i915_vma_resource *vma_res, struct sgt_dma *iter, enum i915_cache_level cache_level, u32 flags) { const gen8_pte_t pte_encode = gen8_pte_encode(0, cache_level, flags); unsigned int rem = sg_dma_len(iter->sg); - u64 start = vma->node.start; + u64 start = vma_res->start; - GEM_BUG_ON(!i915_vm_is_4lvl(vma->vm)); + GEM_BUG_ON(!i915_vm_is_4lvl(vm)); do { struct i915_page_directory * const pdp = - gen8_pdp_for_page_address(vma->vm, start); + gen8_pdp_for_page_address(vm, start); struct i915_page_directory * const pd = i915_pd_entry(pdp, __gen8_pte_index(start, 2)); gen8_pte_t encode = pte_encode; @@ -475,7 +476,7 @@ static void gen8_ppgtt_insert_huge(struct i915_vma *vma, gen8_pte_t *vaddr; u16 index; - if (vma->page_sizes.sg & I915_GTT_PAGE_SIZE_2M && + if (vma_res->bi.page_sizes.sg & I915_GTT_PAGE_SIZE_2M && IS_ALIGNED(iter->dma, I915_GTT_PAGE_SIZE_2M) && rem >= I915_GTT_PAGE_SIZE_2M && !__gen8_pte_index(start, 0)) { @@ -492,7 +493,7 @@ static void gen8_ppgtt_insert_huge(struct i915_vma *vma, page_size = I915_GTT_PAGE_SIZE; if (!index && - vma->page_sizes.sg & I915_GTT_PAGE_SIZE_64K && + vma_res->bi.page_sizes.sg & I915_GTT_PAGE_SIZE_64K && IS_ALIGNED(iter->dma, I915_GTT_PAGE_SIZE_64K) && (IS_ALIGNED(rem, I915_GTT_PAGE_SIZE_64K) || rem >= (I915_PDES - index) * I915_GTT_PAGE_SIZE)) @@ -541,9 +542,9 @@ static void gen8_ppgtt_insert_huge(struct i915_vma *vma, */ if (maybe_64K != -1 && (index == I915_PDES || - (i915_vm_has_scratch_64K(vma->vm) && - !iter->sg && IS_ALIGNED(vma->node.start + - vma->node.size, + (i915_vm_has_scratch_64K(vm) && + !iter->sg && IS_ALIGNED(vma_res->start + + vma_res->node_size, I915_GTT_PAGE_SIZE_2M)))) { vaddr = px_vaddr(pd); vaddr[maybe_64K] |= GEN8_PDE_IPS_64K; @@ -559,10 +560,10 @@ static void gen8_ppgtt_insert_huge(struct i915_vma *vma, * instead - which we detect as missing results during * selftests. 
*/ - if (I915_SELFTEST_ONLY(vma->vm->scrub_64K)) { + if (I915_SELFTEST_ONLY(vm->scrub_64K)) { u16 i; - encode = vma->vm->scratch[0]->encode; + encode = vm->scratch[0]->encode; vaddr = px_vaddr(i915_pt_entry(pd, maybe_64K)); for (i = 1; i < index; i += 16) @@ -572,22 +573,22 @@ static void gen8_ppgtt_insert_huge(struct i915_vma *vma, } } - vma->page_sizes.gtt |= page_size; + vma_res->page_sizes_gtt |= page_size; } while (iter->sg && sg_dma_len(iter->sg)); } static void gen8_ppgtt_insert(struct i915_address_space *vm, - struct i915_vma *vma, + struct i915_vma_resource *vma_res, enum i915_cache_level cache_level, u32 flags) { struct i915_ppgtt * const ppgtt = i915_vm_to_ppgtt(vm); - struct sgt_dma iter = sgt_dma(vma); + struct sgt_dma iter = sgt_dma(vma_res); - if (vma->page_sizes.sg > I915_GTT_PAGE_SIZE) { - gen8_ppgtt_insert_huge(vma, &iter, cache_level, flags); + if (vma_res->bi.page_sizes.sg > I915_GTT_PAGE_SIZE) { + gen8_ppgtt_insert_huge(vm, vma_res, &iter, cache_level, flags); } else { - u64 idx = vma->node.start >> GEN8_PTE_SHIFT; + u64 idx = vma_res->start >> GEN8_PTE_SHIFT; do { struct i915_page_directory * const pdp = @@ -597,7 +598,7 @@ static void gen8_ppgtt_insert(struct i915_address_space *vm, cache_level, flags); } while (idx); - vma->page_sizes.gtt = I915_GTT_PAGE_SIZE; + vma_res->page_sizes_gtt = I915_GTT_PAGE_SIZE; } } diff --git a/drivers/gpu/drm/i915/gt/intel_engine_cs.c b/drivers/gpu/drm/i915/gt/intel_engine_cs.c index 352254e001b4..74aa90587061 100644 --- a/drivers/gpu/drm/i915/gt/intel_engine_cs.c +++ b/drivers/gpu/drm/i915/gt/intel_engine_cs.c @@ -1718,8 +1718,8 @@ static void print_request_ring(struct drm_printer *m, struct i915_request *rq) drm_printf(m, "[head %04x, postfix %04x, tail %04x, batch 0x%08x_%08x]:\n", rq->head, rq->postfix, rq->tail, - vsnap ? upper_32_bits(vsnap->gtt_offset) : ~0u, - vsnap ? lower_32_bits(vsnap->gtt_offset) : ~0u); + vsnap ? upper_32_bits(vsnap->vma_resource->start) : ~0u, + vsnap ? lower_32_bits(vsnap->vma_resource->start) : ~0u); size = rq->tail - rq->head; if (rq->tail < rq->head) diff --git a/drivers/gpu/drm/i915/gt/intel_ggtt.c b/drivers/gpu/drm/i915/gt/intel_ggtt.c index 5263dda7f8d5..0137b6af0973 100644 --- a/drivers/gpu/drm/i915/gt/intel_ggtt.c +++ b/drivers/gpu/drm/i915/gt/intel_ggtt.c @@ -235,7 +235,7 @@ static void gen8_ggtt_insert_page(struct i915_address_space *vm, } static void gen8_ggtt_insert_entries(struct i915_address_space *vm, - struct i915_vma *vma, + struct i915_vma_resource *vma_res, enum i915_cache_level level, u32 flags) { @@ -252,10 +252,10 @@ static void gen8_ggtt_insert_entries(struct i915_address_space *vm, */ gte = (gen8_pte_t __iomem *)ggtt->gsm; - gte += vma->node.start / I915_GTT_PAGE_SIZE; - end = gte + vma->node.size / I915_GTT_PAGE_SIZE; + gte += vma_res->start / I915_GTT_PAGE_SIZE; + end = gte + vma_res->node_size / I915_GTT_PAGE_SIZE; - for_each_sgt_daddr(addr, iter, vma->pages) + for_each_sgt_daddr(addr, iter, vma_res->bi.pages) gen8_set_pte(gte++, pte_encode | addr); GEM_BUG_ON(gte > end); @@ -292,7 +292,7 @@ static void gen6_ggtt_insert_page(struct i915_address_space *vm, * through the GMADR mapped BAR (i915->mm.gtt->gtt). 
*/ static void gen6_ggtt_insert_entries(struct i915_address_space *vm, - struct i915_vma *vma, + struct i915_vma_resource *vma_res, enum i915_cache_level level, u32 flags) { @@ -303,10 +303,10 @@ static void gen6_ggtt_insert_entries(struct i915_address_space *vm, dma_addr_t addr; gte = (gen6_pte_t __iomem *)ggtt->gsm; - gte += vma->node.start / I915_GTT_PAGE_SIZE; - end = gte + vma->node.size / I915_GTT_PAGE_SIZE; + gte += vma_res->start / I915_GTT_PAGE_SIZE; + end = gte + vma_res->node_size / I915_GTT_PAGE_SIZE; - for_each_sgt_daddr(addr, iter, vma->pages) + for_each_sgt_daddr(addr, iter, vma_res->bi.pages) iowrite32(vm->pte_encode(addr, level, flags), gte++); GEM_BUG_ON(gte > end); @@ -389,7 +389,7 @@ static void bxt_vtd_ggtt_insert_page__BKL(struct i915_address_space *vm, struct insert_entries { struct i915_address_space *vm; - struct i915_vma *vma; + struct i915_vma_resource *vma_res; enum i915_cache_level level; u32 flags; }; @@ -398,18 +398,18 @@ static int bxt_vtd_ggtt_insert_entries__cb(void *_arg) { struct insert_entries *arg = _arg; - gen8_ggtt_insert_entries(arg->vm, arg->vma, arg->level, arg->flags); + gen8_ggtt_insert_entries(arg->vm, arg->vma_res, arg->level, arg->flags); bxt_vtd_ggtt_wa(arg->vm); return 0; } static void bxt_vtd_ggtt_insert_entries__BKL(struct i915_address_space *vm, - struct i915_vma *vma, + struct i915_vma_resource *vma_res, enum i915_cache_level level, u32 flags) { - struct insert_entries arg = { vm, vma, level, flags }; + struct insert_entries arg = { vm, vma_res, level, flags }; stop_machine(bxt_vtd_ggtt_insert_entries__cb, &arg, NULL); } @@ -448,14 +448,14 @@ static void i915_ggtt_insert_page(struct i915_address_space *vm, } static void i915_ggtt_insert_entries(struct i915_address_space *vm, - struct i915_vma *vma, + struct i915_vma_resource *vma_res, enum i915_cache_level cache_level, u32 unused) { unsigned int flags = (cache_level == I915_CACHE_NONE) ? 
AGP_USER_MEMORY : AGP_USER_CACHED_MEMORY; - intel_gtt_insert_sg_entries(vma->pages, vma->node.start >> PAGE_SHIFT, + intel_gtt_insert_sg_entries(vma_res->bi.pages, vma_res->start >> PAGE_SHIFT, flags); } @@ -467,30 +467,32 @@ static void i915_ggtt_clear_range(struct i915_address_space *vm, static void ggtt_bind_vma(struct i915_address_space *vm, struct i915_vm_pt_stash *stash, - struct i915_vma *vma, + struct i915_vma_resource *vma_res, enum i915_cache_level cache_level, u32 flags) { - struct drm_i915_gem_object *obj = vma->obj; u32 pte_flags; - if (i915_vma_is_bound(vma, ~flags & I915_VMA_BIND_MASK)) + if (vma_res->bound_flags & (~flags & I915_VMA_BIND_MASK)) return; + vma_res->bound_flags |= flags; + /* Applicable to VLV (gen8+ do not support RO in the GGTT) */ pte_flags = 0; - if (i915_gem_object_is_readonly(obj)) + if (vma_res->bi.readonly) pte_flags |= PTE_READ_ONLY; - if (i915_gem_object_is_lmem(obj)) + if (vma_res->bi.lmem) pte_flags |= PTE_LM; - vm->insert_entries(vm, vma, cache_level, pte_flags); - vma->page_sizes.gtt = I915_GTT_PAGE_SIZE; + vm->insert_entries(vm, vma_res, cache_level, pte_flags); + vma_res->page_sizes_gtt = I915_GTT_PAGE_SIZE; } -static void ggtt_unbind_vma(struct i915_address_space *vm, struct i915_vma *vma) +static void ggtt_unbind_vma(struct i915_address_space *vm, + struct i915_vma_resource *vma_res) { - vm->clear_range(vm, vma->node.start, vma->size); + vm->clear_range(vm, vma_res->start, vma_res->vma_size); } static int ggtt_reserve_guc_top(struct i915_ggtt *ggtt) @@ -623,7 +625,7 @@ static int init_ggtt(struct i915_ggtt *ggtt) static void aliasing_gtt_bind_vma(struct i915_address_space *vm, struct i915_vm_pt_stash *stash, - struct i915_vma *vma, + struct i915_vma_resource *vma_res, enum i915_cache_level cache_level, u32 flags) { @@ -631,25 +633,27 @@ static void aliasing_gtt_bind_vma(struct i915_address_space *vm, /* Currently applicable only to VLV */ pte_flags = 0; - if (i915_gem_object_is_readonly(vma->obj)) + if (vma_res->bi.readonly) pte_flags |= PTE_READ_ONLY; if (flags & I915_VMA_LOCAL_BIND) ppgtt_bind_vma(&i915_vm_to_ggtt(vm)->alias->vm, - stash, vma, cache_level, flags); + stash, vma_res, cache_level, flags); if (flags & I915_VMA_GLOBAL_BIND) - vm->insert_entries(vm, vma, cache_level, pte_flags); + vm->insert_entries(vm, vma_res, cache_level, pte_flags); + + vma_res->bound_flags |= flags; } static void aliasing_gtt_unbind_vma(struct i915_address_space *vm, - struct i915_vma *vma) + struct i915_vma_resource *vma_res) { - if (i915_vma_is_bound(vma, I915_VMA_GLOBAL_BIND)) - vm->clear_range(vm, vma->node.start, vma->size); + if (vma_res->bound_flags & I915_VMA_GLOBAL_BIND) + vm->clear_range(vm, vma_res->start, vma_res->vma_size); - if (i915_vma_is_bound(vma, I915_VMA_LOCAL_BIND)) - ppgtt_unbind_vma(&i915_vm_to_ggtt(vm)->alias->vm, vma); + if (vma_res->bound_flags & I915_VMA_LOCAL_BIND) + ppgtt_unbind_vma(&i915_vm_to_ggtt(vm)->alias->vm, vma_res); } static int init_aliasing_ppgtt(struct i915_ggtt *ggtt) @@ -1280,7 +1284,7 @@ bool i915_ggtt_resume_vm(struct i915_address_space *vm) atomic_read(&vma->flags) & I915_VMA_BIND_MASK; GEM_BUG_ON(!was_bound); - vma->ops->bind_vma(vm, NULL, vma, + vma->ops->bind_vma(vm, NULL, vma->resource, obj ? 
obj->cache_level : 0, was_bound); if (obj) { /* only used during resume => exclusive access */ diff --git a/drivers/gpu/drm/i915/gt/intel_gtt.h b/drivers/gpu/drm/i915/gt/intel_gtt.h index 177b42b935a1..676b839d1a34 100644 --- a/drivers/gpu/drm/i915/gt/intel_gtt.h +++ b/drivers/gpu/drm/i915/gt/intel_gtt.h @@ -27,6 +27,7 @@ #include "gt/intel_reset.h" #include "i915_selftest.h" +#include "i915_vma_resource.h" #include "i915_vma_types.h" #define I915_GFP_ALLOW_FAIL (GFP_KERNEL | __GFP_RETRY_MAYFAIL | __GFP_NOWARN) @@ -200,7 +201,7 @@ struct i915_vma_ops { /* Map an object into an address space with the given cache flags. */ void (*bind_vma)(struct i915_address_space *vm, struct i915_vm_pt_stash *stash, - struct i915_vma *vma, + struct i915_vma_resource *vma_res, enum i915_cache_level cache_level, u32 flags); /* @@ -208,7 +209,8 @@ struct i915_vma_ops { * setting the valid PTE entries to a reserved scratch page. */ void (*unbind_vma)(struct i915_address_space *vm, - struct i915_vma *vma); + struct i915_vma_resource *vma_res); + }; struct i915_address_space { @@ -285,7 +287,7 @@ struct i915_address_space { enum i915_cache_level cache_level, u32 flags); void (*insert_entries)(struct i915_address_space *vm, - struct i915_vma *vma, + struct i915_vma_resource *vma_res, enum i915_cache_level cache_level, u32 flags); void (*cleanup)(struct i915_address_space *vm); @@ -600,11 +602,11 @@ void gen6_ggtt_invalidate(struct i915_ggtt *ggtt); void ppgtt_bind_vma(struct i915_address_space *vm, struct i915_vm_pt_stash *stash, - struct i915_vma *vma, + struct i915_vma_resource *vma_res, enum i915_cache_level cache_level, u32 flags); void ppgtt_unbind_vma(struct i915_address_space *vm, - struct i915_vma *vma); + struct i915_vma_resource *vma_res); void gtt_write_workarounds(struct intel_gt *gt); @@ -627,8 +629,8 @@ __vm_create_scratch_for_read_pinned(struct i915_address_space *vm, unsigned long static inline struct sgt_dma { struct scatterlist *sg; dma_addr_t dma, max; -} sgt_dma(struct i915_vma *vma) { - struct scatterlist *sg = vma->pages->sgl; +} sgt_dma(struct i915_vma_resource *vma_res) { + struct scatterlist *sg = vma_res->bi.pages->sgl; dma_addr_t addr = sg_dma_address(sg); return (struct sgt_dma){ sg, addr, addr + sg_dma_len(sg) }; diff --git a/drivers/gpu/drm/i915/gt/intel_ppgtt.c b/drivers/gpu/drm/i915/gt/intel_ppgtt.c index 083b3090c69c..48e6e2f87700 100644 --- a/drivers/gpu/drm/i915/gt/intel_ppgtt.c +++ b/drivers/gpu/drm/i915/gt/intel_ppgtt.c @@ -179,32 +179,34 @@ struct i915_ppgtt *i915_ppgtt_create(struct intel_gt *gt, void ppgtt_bind_vma(struct i915_address_space *vm, struct i915_vm_pt_stash *stash, - struct i915_vma *vma, + struct i915_vma_resource *vma_res, enum i915_cache_level cache_level, u32 flags) { u32 pte_flags; - if (!test_bit(I915_VMA_ALLOC_BIT, __i915_vma_flags(vma))) { - vm->allocate_va_range(vm, stash, vma->node.start, vma->size); - set_bit(I915_VMA_ALLOC_BIT, __i915_vma_flags(vma)); + if (!vma_res->allocated) { + vm->allocate_va_range(vm, stash, vma_res->start, + vma_res->vma_size); + vma_res->allocated = true; } /* Applicable to VLV, and gen8+ */ pte_flags = 0; - if (i915_gem_object_is_readonly(vma->obj)) + if (vma_res->bi.readonly) pte_flags |= PTE_READ_ONLY; - if (i915_gem_object_is_lmem(vma->obj)) + if (vma_res->bi.lmem) pte_flags |= PTE_LM; - vm->insert_entries(vm, vma, cache_level, pte_flags); + vm->insert_entries(vm, vma_res, cache_level, pte_flags); wmb(); } -void ppgtt_unbind_vma(struct i915_address_space *vm, struct i915_vma *vma) +void ppgtt_unbind_vma(struct 
i915_address_space *vm, + struct i915_vma_resource *vma_res) { - if (test_and_clear_bit(I915_VMA_ALLOC_BIT, __i915_vma_flags(vma))) - vm->clear_range(vm, vma->node.start, vma->size); + if (vma_res->allocated) + vm->clear_range(vm, vma_res->start, vma_res->vma_size); } static unsigned long pd_count(u64 size, int shift) diff --git a/drivers/gpu/drm/i915/gt/uc/intel_uc_fw.c b/drivers/gpu/drm/i915/gt/uc/intel_uc_fw.c index a5af05bde6f2..777fc6f0ceff 100644 --- a/drivers/gpu/drm/i915/gt/uc/intel_uc_fw.c +++ b/drivers/gpu/drm/i915/gt/uc/intel_uc_fw.c @@ -448,20 +448,19 @@ static void uc_fw_bind_ggtt(struct intel_uc_fw *uc_fw) { struct drm_i915_gem_object *obj = uc_fw->obj; struct i915_ggtt *ggtt = __uc_fw_to_gt(uc_fw)->ggtt; - struct i915_vma *dummy = &uc_fw->dummy; + struct i915_vma_resource *dummy = &uc_fw->dummy; u32 pte_flags = 0; - dummy->node.start = uc_fw_ggtt_offset(uc_fw); - dummy->node.size = obj->base.size; - dummy->pages = obj->mm.pages; - dummy->vm = &ggtt->vm; + dummy->start = uc_fw_ggtt_offset(uc_fw); + dummy->node_size = obj->base.size; + dummy->bi.pages = obj->mm.pages; GEM_BUG_ON(!i915_gem_object_has_pinned_pages(obj)); - GEM_BUG_ON(dummy->node.size > ggtt->uc_fw.size); + GEM_BUG_ON(dummy->node_size > ggtt->uc_fw.size); /* uc_fw->obj cache domains were not controlled across suspend */ if (i915_gem_object_has_struct_page(obj)) - drm_clflush_sg(dummy->pages); + drm_clflush_sg(dummy->bi.pages); if (i915_gem_object_is_lmem(obj)) pte_flags |= PTE_LM; diff --git a/drivers/gpu/drm/i915/gt/uc/intel_uc_fw.h b/drivers/gpu/drm/i915/gt/uc/intel_uc_fw.h index d9d1dc0b4cbb..3229018877d3 100644 --- a/drivers/gpu/drm/i915/gt/uc/intel_uc_fw.h +++ b/drivers/gpu/drm/i915/gt/uc/intel_uc_fw.h @@ -85,7 +85,7 @@ struct intel_uc_fw { * threaded as it done during driver load (inherently single threaded) * or during a GT reset (mutex guarantees single threaded). 
*/ - struct i915_vma dummy; + struct i915_vma_resource dummy; struct i915_vma *rsa_data; /* diff --git a/drivers/gpu/drm/i915/i915_debugfs.c b/drivers/gpu/drm/i915/i915_debugfs.c index e0e052cdf8b8..f7d1feba5aa4 100644 --- a/drivers/gpu/drm/i915/i915_debugfs.c +++ b/drivers/gpu/drm/i915/i915_debugfs.c @@ -170,7 +170,8 @@ i915_debugfs_describe_obj(struct seq_file *m, struct drm_i915_gem_object *obj) seq_printf(m, " (%s offset: %08llx, size: %08llx, pages: %s", stringify_vma_type(vma), vma->node.start, vma->node.size, - stringify_page_sizes(vma->page_sizes.gtt, NULL, 0)); + stringify_page_sizes(vma->resource->page_sizes_gtt, + NULL, 0)); if (i915_vma_is_ggtt(vma) || i915_vma_is_dpt(vma)) { switch (vma->ggtt_view.type) { case I915_GGTT_VIEW_NORMAL: diff --git a/drivers/gpu/drm/i915/i915_gpu_error.c b/drivers/gpu/drm/i915/i915_gpu_error.c index 5ae812d60abe..1af54ff374f9 100644 --- a/drivers/gpu/drm/i915/i915_gpu_error.c +++ b/drivers/gpu/drm/i915/i915_gpu_error.c @@ -1040,9 +1040,9 @@ i915_vma_coredump_create(const struct intel_gt *gt, strcpy(dst->name, vsnap->name); dst->next = NULL; - dst->gtt_offset = vsnap->gtt_offset; - dst->gtt_size = vsnap->gtt_size; - dst->gtt_page_sizes = vsnap->page_sizes; + dst->gtt_offset = vsnap->vma_resource->start; + dst->gtt_size = vsnap->vma_resource->node_size; + dst->gtt_page_sizes = vsnap->vma_resource->page_sizes_gtt; dst->unused = 0; ret = -EINVAL; diff --git a/drivers/gpu/drm/i915/i915_vma.c b/drivers/gpu/drm/i915/i915_vma.c index 7097c5016431..1d4e448d22d9 100644 --- a/drivers/gpu/drm/i915/i915_vma.c +++ b/drivers/gpu/drm/i915/i915_vma.c @@ -298,7 +298,7 @@ static void __vma_bind(struct dma_fence_work *work) struct i915_vma *vma = vw->vma; vma->ops->bind_vma(vw->vm, &vw->stash, - vma, vw->cache_level, vw->flags); + vma->resource, vw->cache_level, vw->flags); } static void __vma_release(struct dma_fence_work *work) @@ -375,6 +375,21 @@ static int i915_vma_verify_bind_complete(struct i915_vma *vma) #define i915_vma_verify_bind_complete(_vma) 0 #endif +I915_SELFTEST_EXPORT void +i915_vma_resource_init_from_vma(struct i915_vma_resource *vma_res, + struct i915_vma *vma) +{ + struct drm_i915_gem_object *obj = vma->obj; + + i915_vma_resource_init(vma_res, vma->pages, &vma->page_sizes, + i915_gem_object_is_readonly(obj), + i915_gem_object_is_lmem(obj), + vma->private, + vma->node.start, + vma->node.size, + vma->size); +} + /** * i915_vma_bind - Sets up PTEs for an VMA in it's corresponding address space. 
* @vma: VMA to map @@ -432,7 +447,7 @@ int i915_vma_bind(struct i915_vma *vma, GEM_WARN_ON(!vma_flags); kfree(vma_res); } else { - i915_vma_resource_init(vma_res); + i915_vma_resource_init_from_vma(vma_res, vma); vma->resource = vma_res; } trace_i915_vma_bind(vma, bind_flags); @@ -472,7 +487,8 @@ int i915_vma_bind(struct i915_vma *vma, if (ret) return ret; } - vma->ops->bind_vma(vma->vm, NULL, vma, cache_level, bind_flags); + vma->ops->bind_vma(vma->vm, NULL, vma->resource, cache_level, + bind_flags); } atomic_or(bind_flags, &vma->flags); @@ -1778,7 +1794,7 @@ void __i915_vma_evict(struct i915_vma *vma) if (likely(atomic_read(&vma->vm->open))) { trace_i915_vma_unbind(vma); - vma->ops->unbind_vma(vma->vm, vma); + vma->ops->unbind_vma(vma->vm, vma->resource); } atomic_and(~(I915_VMA_BIND_MASK | I915_VMA_ERROR | I915_VMA_GGTT_WRITE), &vma->flags); diff --git a/drivers/gpu/drm/i915/i915_vma.h b/drivers/gpu/drm/i915/i915_vma.h index de0f3e44cdfa..1df57ec832bd 100644 --- a/drivers/gpu/drm/i915/i915_vma.h +++ b/drivers/gpu/drm/i915/i915_vma.h @@ -339,12 +339,6 @@ void __iomem *i915_vma_pin_iomap(struct i915_vma *vma); */ void i915_vma_unpin_iomap(struct i915_vma *vma); -static inline struct page *i915_vma_first_page(struct i915_vma *vma) -{ - GEM_BUG_ON(!vma->pages); - return sg_page(vma->pages->sgl); -} - /** * i915_vma_pin_fence - pin fencing state * @vma: vma to pin fencing for @@ -445,6 +439,11 @@ i915_vma_get_current_resource(struct i915_vma *vma) return i915_vma_resource_get(vma->resource); } +#if IS_ENABLED(CONFIG_DRM_I915_SELFTEST) +void i915_vma_resource_init_from_vma(struct i915_vma_resource *vma_res, + struct i915_vma *vma); +#endif + void i915_vma_module_exit(void); int i915_vma_module_init(void); diff --git a/drivers/gpu/drm/i915/i915_vma_resource.c b/drivers/gpu/drm/i915/i915_vma_resource.c index 833e987bed2a..c86db89ab5d2 100644 --- a/drivers/gpu/drm/i915/i915_vma_resource.c +++ b/drivers/gpu/drm/i915/i915_vma_resource.c @@ -23,15 +23,12 @@ static struct dma_fence_ops unbind_fence_ops = { }; /** - * i915_vma_resource_init - Initialize a vma resource. + * __i915_vma_resource_init - Initialize a vma resource. * @vma_res: The vma resource to initialize * - * Initializes a vma resource allocated using i915_vma_resource_alloc(). - * The reason for having separate allocate and initialize function is that - * initialization may need to be performed from under a lock where - * allocation is not allowed. + * Initializes the private members of a vma resource. */ -void i915_vma_resource_init(struct i915_vma_resource *vma_res) +void __i915_vma_resource_init(struct i915_vma_resource *vma_res) { spin_lock_init(&vma_res->lock); dma_fence_init(&vma_res->unbind_fence, &unbind_fence_ops, diff --git a/drivers/gpu/drm/i915/i915_vma_resource.h b/drivers/gpu/drm/i915/i915_vma_resource.h index 34744da23072..9872de58268b 100644 --- a/drivers/gpu/drm/i915/i915_vma_resource.h +++ b/drivers/gpu/drm/i915/i915_vma_resource.h @@ -9,6 +9,25 @@ #include #include +#include "i915_gem.h" + +struct i915_page_sizes { + /** + * The sg mask of the pages sg_table. i.e the mask of + * the lengths for each sg entry. + */ + unsigned int phys; + + /** + * The gtt page sizes we are allowed to use given the + * sg mask and the supported page sizes. This will + * express the smallest unit we can use for the whole + * object, as well as the larger sizes we may be able + * to use opportunistically. + */ + unsigned int sg; +}; + /** * struct i915_vma_resource - Snapshotted unbind information. 
* @unbind_fence: Fence to mark unbinding complete. Note that this fence @@ -20,6 +39,13 @@ * @hold_count: Number of holders blocking the fence from finishing. * The vma itself is keeping a hold, which is released when unbind * is scheduled. + * @private: Bind backend private info. + * @start: Offset into the address space of bind range start. + * @node_size: Size of the allocated range manager node. + * @vma_size: Bind size. + * @page_sizes_gtt: Resulting page sizes from the bind operation. + * @bound_flags: Flags indicating binding status. + * @allocated: Backend private data. TODO: Should move into @private. * * The lifetime of a struct i915_vma_resource is from a binding request to * the actual possible asynchronous unbind has completed. @@ -29,6 +55,32 @@ struct i915_vma_resource { /* See above for description of the lock. */ spinlock_t lock; refcount_t hold_count; + + /** + * struct i915_vma_bindinfo - Information needed for async bind + * only but that can be dropped after the bind has taken place. + * Consider making this a separate argument to the bind_vma + * op, coalescing with other arguments like vm, stash, cache_level + * and flags + * @pages: The pages sg-table. + * @page_sizes: Page sizes of the pages. + * @readonly: Whether the vma should be bound read-only. + * @lmem: Whether the vma points to lmem. + */ + struct i915_vma_bindinfo { + struct sg_table *pages; + struct i915_page_sizes page_sizes; + bool readonly:1; + bool lmem:1; + } bi; + + void *private; + unsigned long start; + unsigned long node_size; + unsigned long vma_size; + u32 page_sizes_gtt; + u32 bound_flags; + bool allocated:1; }; bool i915_vma_resource_hold(struct i915_vma_resource *vma_res, @@ -41,6 +93,8 @@ struct i915_vma_resource *i915_vma_resource_alloc(void); struct dma_fence *i915_vma_resource_unbind(struct i915_vma_resource *vma_res); +void __i915_vma_resource_init(struct i915_vma_resource *vma_res); + /** * i915_vma_resource_get - Take a reference on a vma resource * @vma_res: The vma resource on which to take a reference. @@ -63,8 +117,47 @@ static inline void i915_vma_resource_put(struct i915_vma_resource *vma_res) dma_fence_put(&vma_res->unbind_fence); } -#if IS_ENABLED(CONFIG_DRM_I915_SELFTEST) -void i915_vma_resource_init(struct i915_vma_resource *vma_res); -#endif +/** + * i915_vma_resource_init - Initialize a vma resource. + * @vma_res: The vma resource to initialize + * @pages: The pages sg-table. + * @page_sizes: Page sizes of the pages. + * @readonly: Whether the vma should be bound read-only. + * @lmem: Whether the vma points to lmem. + * @private: Bind backend private info. + * @start: Offset into the address space of bind range start. + * @node_size: Size of the allocated range manager node. + * @size: Bind size. + * + * Initializes a vma resource allocated using i915_vma_resource_alloc(). + * The reason for having separate allocate and initialize function is that + * initialization may need to be performed from under a lock where + * allocation is not allowed. 
+ */ +static inline void i915_vma_resource_init(struct i915_vma_resource *vma_res, + struct sg_table *pages, + const struct i915_page_sizes *page_sizes, + bool readonly, + bool lmem, + void *private, + unsigned long start, + unsigned long node_size, + unsigned long size) +{ + __i915_vma_resource_init(vma_res); + vma_res->bi.pages = pages; + vma_res->bi.page_sizes = *page_sizes; + vma_res->bi.readonly = readonly; + vma_res->bi.lmem = lmem; + vma_res->private = private; + vma_res->start = start; + vma_res->node_size = node_size; + vma_res->vma_size = size; +} + +static inline void i915_vma_resource_fini(struct i915_vma_resource *vma_res) +{ + GEM_BUG_ON(refcount_read(&vma_res->hold_count) != 1); +} #endif diff --git a/drivers/gpu/drm/i915/i915_vma_snapshot.c b/drivers/gpu/drm/i915/i915_vma_snapshot.c index f7333c7a2f5e..69f62c1ca967 100644 --- a/drivers/gpu/drm/i915/i915_vma_snapshot.c +++ b/drivers/gpu/drm/i915/i915_vma_snapshot.c @@ -24,11 +24,7 @@ void i915_vma_snapshot_init(struct i915_vma_snapshot *vsnap, assert_object_held(vma->obj); vsnap->name = name; - vsnap->size = vma->size; vsnap->obj_size = vma->obj->base.size; - vsnap->gtt_offset = vma->node.start; - vsnap->gtt_size = vma->node.size; - vsnap->page_sizes = vma->page_sizes.gtt; vsnap->pages = vma->pages; vsnap->pages_rsgt = NULL; vsnap->mr = NULL; diff --git a/drivers/gpu/drm/i915/i915_vma_snapshot.h b/drivers/gpu/drm/i915/i915_vma_snapshot.h index e74588dd676b..1b08ce9f8576 100644 --- a/drivers/gpu/drm/i915/i915_vma_snapshot.h +++ b/drivers/gpu/drm/i915/i915_vma_snapshot.h @@ -23,31 +23,23 @@ struct sg_table; /** * struct i915_vma_snapshot - Snapshot of vma metadata. - * @size: The vma size in bytes. * @obj_size: The size of the underlying object in bytes. - * @gtt_offset: The gtt offset the vma is bound to. - * @gtt_size: The size in bytes allocated for the vma in the GTT. * @pages: The struct sg_table pointing to the pages bound. * @pages_rsgt: The refcounted sg_table holding the reference for @pages if any. * @mr: The memory region pointed for the pages bound. * @kref: Reference for this structure. * @vma_resource: Pointer to the vma resource representing the vma binding. - * @page_sizes: The vma GTT page sizes information. * @onstack: Whether the structure shouldn't be freed on final put. * @present: Whether the structure is present and initialized. 
*/ struct i915_vma_snapshot { const char *name; - size_t size; size_t obj_size; - size_t gtt_offset; - size_t gtt_size; struct sg_table *pages; struct i915_refct_sgt *pages_rsgt; struct intel_memory_region *mr; struct kref kref; struct i915_vma_resource *vma_resource; - u32 page_sizes; bool onstack:1; bool present:1; }; diff --git a/drivers/gpu/drm/i915/selftests/i915_gem_gtt.c b/drivers/gpu/drm/i915/selftests/i915_gem_gtt.c index 54be880e55c3..70b5c47890b9 100644 --- a/drivers/gpu/drm/i915/selftests/i915_gem_gtt.c +++ b/drivers/gpu/drm/i915/selftests/i915_gem_gtt.c @@ -239,11 +239,11 @@ static int lowlevel_hole(struct i915_address_space *vm, unsigned long end_time) { I915_RND_STATE(seed_prng); - struct i915_vma *mock_vma; + struct i915_vma_resource *mock_vma_res; unsigned int size; - mock_vma = kzalloc(sizeof(*mock_vma), GFP_KERNEL); - if (!mock_vma) + mock_vma_res = kzalloc(sizeof(*mock_vma_res), GFP_KERNEL); + if (!mock_vma_res) return -ENOMEM; /* Keep creating larger objects until one cannot fit into the hole */ @@ -269,7 +269,7 @@ static int lowlevel_hole(struct i915_address_space *vm, break; } while (count >>= 1); if (!count) { - kfree(mock_vma); + kfree(mock_vma_res); return -ENOMEM; } GEM_BUG_ON(!order); @@ -343,12 +343,12 @@ static int lowlevel_hole(struct i915_address_space *vm, break; } - mock_vma->pages = obj->mm.pages; - mock_vma->node.size = BIT_ULL(size); - mock_vma->node.start = addr; + mock_vma_res->bi.pages = obj->mm.pages; + mock_vma_res->node_size = BIT_ULL(size); + mock_vma_res->start = addr; with_intel_runtime_pm(vm->gt->uncore->rpm, wakeref) - vm->insert_entries(vm, mock_vma, + vm->insert_entries(vm, mock_vma_res, I915_CACHE_NONE, 0); } count = n; @@ -371,7 +371,7 @@ static int lowlevel_hole(struct i915_address_space *vm, cleanup_freed_objects(vm->i915); } - kfree(mock_vma); + kfree(mock_vma_res); return 0; } @@ -1280,6 +1280,7 @@ static void track_vma_bind(struct i915_vma *vma) atomic_set(&vma->pages_count, I915_VMA_PAGES_ACTIVE); __i915_gem_object_pin_pages(obj); vma->pages = obj->mm.pages; + vma->resource->bi.pages = vma->pages; mutex_lock(&vma->vm->mutex); list_add_tail(&vma->vm_link, &vma->vm->bound_list); @@ -1354,7 +1355,7 @@ static int reserve_gtt_with_resource(struct i915_vma *vma, u64 offset) obj->cache_level, 0); if (!err) { - i915_vma_resource_init(vma_res); + i915_vma_resource_init_from_vma(vma_res, vma); vma->resource = vma_res; } else { kfree(vma_res); @@ -1533,7 +1534,7 @@ static int insert_gtt_with_resource(struct i915_vma *vma) err = i915_gem_gtt_insert(vm, &vma->node, obj->base.size, 0, obj->cache_level, 0, vm->total, 0); if (!err) { - i915_vma_resource_init(vma_res); + i915_vma_resource_init_from_vma(vma_res, vma); vma->resource = vma_res; } else { kfree(vma_res); @@ -1958,6 +1959,7 @@ static int igt_cs_tlb(void *arg) struct i915_vm_pt_stash stash = {}; struct i915_request *rq; struct i915_gem_ww_ctx ww; + struct i915_vma_resource *vma_res; u64 offset; offset = igt_random_offset(&prng, @@ -1978,6 +1980,13 @@ static int igt_cs_tlb(void *arg) if (err) goto end; + vma_res = i915_vma_resource_alloc(); + if (IS_ERR(vma_res)) { + i915_vma_put_pages(vma); + err = PTR_ERR(vma_res); + goto end; + } + i915_gem_ww_ctx_init(&ww, false); retry: err = i915_vm_lock_objects(vm, &ww); @@ -1999,33 +2008,41 @@ static int igt_cs_tlb(void *arg) goto retry; } i915_gem_ww_ctx_fini(&ww); - if (err) + if (err) { + kfree(vma_res); goto end; + } + i915_vma_resource_init_from_vma(vma_res, vma); /* Prime the TLB with the dummy pages */ for (i = 0; i < count; i++) { - 
vma->node.start = offset + i * PAGE_SIZE; - vm->insert_entries(vm, vma, I915_CACHE_NONE, 0); + vma_res->start = offset + i * PAGE_SIZE; + vm->insert_entries(vm, vma_res, I915_CACHE_NONE, + 0); - rq = submit_batch(ce, vma->node.start); + rq = submit_batch(ce, vma_res->start); if (IS_ERR(rq)) { err = PTR_ERR(rq); + i915_vma_resource_fini(vma_res); + kfree(vma_res); goto end; } i915_request_put(rq); } - + i915_vma_resource_fini(vma_res); i915_vma_put_pages(vma); err = context_sync(ce); if (err) { pr_err("%s: dummy setup timed out\n", ce->engine->name); + kfree(vma_res); goto end; } vma = i915_vma_instance(act, vm, NULL); if (IS_ERR(vma)) { + kfree(vma_res); err = PTR_ERR(vma); goto end; } @@ -2033,19 +2050,22 @@ static int igt_cs_tlb(void *arg) i915_gem_object_lock(act, NULL); err = i915_vma_get_pages(vma); i915_gem_object_unlock(act); - if (err) + if (err) { + kfree(vma_res); goto end; + } + i915_vma_resource_init_from_vma(vma_res, vma); /* Replace the TLB with target batches */ for (i = 0; i < count; i++) { struct i915_request *rq; u32 *cs = batch + i * 64 / sizeof(*cs); u64 addr; - vma->node.start = offset + i * PAGE_SIZE; - vm->insert_entries(vm, vma, I915_CACHE_NONE, 0); + vma_res->start = offset + i * PAGE_SIZE; + vm->insert_entries(vm, vma_res, I915_CACHE_NONE, 0); - addr = vma->node.start + i * 64; + addr = vma_res->start + i * 64; cs[4] = MI_NOOP; cs[6] = lower_32_bits(addr); cs[7] = upper_32_bits(addr); @@ -2054,6 +2074,8 @@ static int igt_cs_tlb(void *arg) rq = submit_batch(ce, addr); if (IS_ERR(rq)) { err = PTR_ERR(rq); + i915_vma_resource_fini(vma_res); + kfree(vma_res); goto end; } @@ -2070,6 +2092,8 @@ static int igt_cs_tlb(void *arg) } end_spin(batch, count - 1); + i915_vma_resource_fini(vma_res); + kfree(vma_res); i915_vma_put_pages(vma); err = context_sync(ce); diff --git a/drivers/gpu/drm/i915/selftests/mock_gtt.c b/drivers/gpu/drm/i915/selftests/mock_gtt.c index 1802baf80a17..d40519e3ca38 100644 --- a/drivers/gpu/drm/i915/selftests/mock_gtt.c +++ b/drivers/gpu/drm/i915/selftests/mock_gtt.c @@ -33,23 +33,23 @@ static void mock_insert_page(struct i915_address_space *vm, } static void mock_insert_entries(struct i915_address_space *vm, - struct i915_vma *vma, + struct i915_vma_resource *vma_res, enum i915_cache_level level, u32 flags) { } static void mock_bind_ppgtt(struct i915_address_space *vm, struct i915_vm_pt_stash *stash, - struct i915_vma *vma, + struct i915_vma_resource *vma_res, enum i915_cache_level cache_level, u32 flags) { GEM_BUG_ON(flags & I915_VMA_GLOBAL_BIND); - set_bit(I915_VMA_LOCAL_BIND_BIT, __i915_vma_flags(vma)); + vma_res->bound_flags |= flags; } static void mock_unbind_ppgtt(struct i915_address_space *vm, - struct i915_vma *vma) + struct i915_vma_resource *vma_res) { } @@ -93,14 +93,14 @@ struct i915_ppgtt *mock_ppgtt(struct drm_i915_private *i915, const char *name) static void mock_bind_ggtt(struct i915_address_space *vm, struct i915_vm_pt_stash *stash, - struct i915_vma *vma, + struct i915_vma_resource *vma_res, enum i915_cache_level cache_level, u32 flags) { } static void mock_unbind_ggtt(struct i915_address_space *vm, - struct i915_vma *vma) + struct i915_vma_resource *vma_res) { } From patchwork Mon Jan 3 11:59:46 2022 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 8bit X-Patchwork-Submitter: Thomas Hellstrom X-Patchwork-Id: 12702514 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from gabe.freedesktop.org (gabe.freedesktop.org 
[131.252.210.177]) by smtp.lore.kernel.org (Postfix) with ESMTPS id 06EBFC433F5; Mon, 3 Jan 2022 12:00:26 +0000 (UTC)
From: =?utf-8?q?Thomas_Hellstr=C3=B6m?=
To: intel-gfx@lists.freedesktop.org, dri-devel@lists.freedesktop.org
Cc: =?utf-8?q?Thomas_Hellstr=C3=B6m?=, matthew.auld@intel.com
Date: Mon, 3 Jan 2022 12:59:46 +0100
Message-Id: <20220103115947.92688-4-thomas.hellstrom@linux.intel.com>
In-Reply-To: <20220103115947.92688-1-thomas.hellstrom@linux.intel.com>
References: <20220103115947.92688-1-thomas.hellstrom@linux.intel.com>
Subject: [Intel-gfx] [PATCH v4 3/4] drm/i915: Use vma resources for async unbinding

Implement async (non-blocking) unbinding by not syncing the vma before calling unbind on the vma_resource. Add the resulting unbind fence to the object's dma_resv from where it is picked up by the ttm migration code. Ideally these unbind fences should be coalesced with the migration blit fence to avoid stalling the migration blit waiting for unbind, as they can certainly go on in parallel, but since we don't yet have a reasonable data structure to use to coalesce fences and attach the resulting fence to a timeline, we defer that for now. Note that with async unbinding, even while the unbind waits for the preceding bind to complete before unbinding, the vma itself might have been destroyed in the process, clearing the vma pages. Therefore we can only allow async unbinding if we have a refcounted sg-list and keep a refcount on that for the vma resource pages to stay intact until binding occurs.
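In code terms, the gating check added to i915_vma_unbind_async() further down boils down to roughly the following (simplified sketch only, not the full diff; the real path also deals with pinning, runtime pm and vm locking):

	/* Async unbind needs a refcounted sg-table to keep the pages alive. */
	if (!vma->obj->mm.rsgt)
		return -EBUSY;	/* i915_gem_object_unbind() then falls back to a synchronous unbind */
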
If this condition is not met, a request for an async unbind is diverted to a sync unbind. v2: - Use a separate kmem_cache for vma resources for now to isolate their memory allocation and aid debugging. - Move the check for vm closed to the actual unbinding thread. Regardless of whether the vm is closed, we need the unbind fence to properly wait for capture. - Clear vma_res::vm on unbind and update its documentation. v4: - Take cache coloring into account when searching for vma resources pending unbind. (Matthew Auld) Signed-off-by: Thomas Hellström --- drivers/gpu/drm/i915/gem/i915_gem_ttm_move.c | 11 +- drivers/gpu/drm/i915/gt/intel_ggtt.c | 2 +- drivers/gpu/drm/i915/gt/intel_gtt.c | 4 + drivers/gpu/drm/i915/gt/intel_gtt.h | 3 + drivers/gpu/drm/i915/i915_drv.h | 1 + drivers/gpu/drm/i915/i915_gem.c | 12 +- drivers/gpu/drm/i915/i915_module.c | 3 + drivers/gpu/drm/i915/i915_vma.c | 196 ++++++++-- drivers/gpu/drm/i915/i915_vma.h | 3 +- drivers/gpu/drm/i915/i915_vma_resource.c | 354 +++++++++++++++++-- drivers/gpu/drm/i915/i915_vma_resource.h | 48 +++ 11 files changed, 571 insertions(+), 66 deletions(-) diff --git a/drivers/gpu/drm/i915/gem/i915_gem_ttm_move.c b/drivers/gpu/drm/i915/gem/i915_gem_ttm_move.c index ee9612a3ee5e..0f514435b9c5 100644 --- a/drivers/gpu/drm/i915/gem/i915_gem_ttm_move.c +++ b/drivers/gpu/drm/i915/gem/i915_gem_ttm_move.c @@ -142,7 +142,16 @@ int i915_ttm_move_notify(struct ttm_buffer_object *bo) struct drm_i915_gem_object *obj = i915_ttm_to_gem(bo); int ret; - ret = i915_gem_object_unbind(obj, I915_GEM_OBJECT_UNBIND_ACTIVE); + /* + * Note: The async unbinding here will actually transform the + * blocking wait for unbind into a wait before finally submitting + * evict / migration blit and thus stall the migration timeline + * which may not be good for overall throughput. We should make + * sure we await the unbind fences *after* the migration blit + * instead of *before* as we currently do. + */ + ret = i915_gem_object_unbind(obj, I915_GEM_OBJECT_UNBIND_ACTIVE | + I915_GEM_OBJECT_UNBIND_ASYNC); if (ret) return ret; diff --git a/drivers/gpu/drm/i915/gt/intel_ggtt.c b/drivers/gpu/drm/i915/gt/intel_ggtt.c index 0137b6af0973..ae7bbd8914c1 100644 --- a/drivers/gpu/drm/i915/gt/intel_ggtt.c +++ b/drivers/gpu/drm/i915/gt/intel_ggtt.c @@ -142,7 +142,7 @@ void i915_ggtt_suspend_vm(struct i915_address_space *vm) continue; if (!i915_vma_is_bound(vma, I915_VMA_GLOBAL_BIND)) { - __i915_vma_evict(vma); + __i915_vma_evict(vma, false); drm_mm_remove_node(&vma->node); } } diff --git a/drivers/gpu/drm/i915/gt/intel_gtt.c b/drivers/gpu/drm/i915/gt/intel_gtt.c index a94be0306464..46be4197b93f 100644 --- a/drivers/gpu/drm/i915/gt/intel_gtt.c +++ b/drivers/gpu/drm/i915/gt/intel_gtt.c @@ -161,6 +161,9 @@ static void __i915_vm_release(struct work_struct *work) struct i915_address_space *vm = container_of(work, struct i915_address_space, release_work); + /* Synchronize async unbinds. 
*/ + i915_vma_resource_bind_dep_sync_all(vm); + vm->cleanup(vm); i915_address_space_fini(vm); @@ -189,6 +192,7 @@ void i915_address_space_init(struct i915_address_space *vm, int subclass) if (!kref_read(&vm->resv_ref)) kref_init(&vm->resv_ref); + vm->pending_unbind = RB_ROOT_CACHED; INIT_WORK(&vm->release_work, __i915_vm_release); atomic_set(&vm->open, 1); diff --git a/drivers/gpu/drm/i915/gt/intel_gtt.h b/drivers/gpu/drm/i915/gt/intel_gtt.h index 676b839d1a34..8073438b67c8 100644 --- a/drivers/gpu/drm/i915/gt/intel_gtt.h +++ b/drivers/gpu/drm/i915/gt/intel_gtt.h @@ -265,6 +265,9 @@ struct i915_address_space { /* Flags used when creating page-table objects for this vm */ unsigned long lmem_pt_obj_flags; + /* Interval tree for pending unbind vma resources */ + struct rb_root_cached pending_unbind; + struct drm_i915_gem_object * (*alloc_pt_dma)(struct i915_address_space *vm, int sz); struct drm_i915_gem_object * diff --git a/drivers/gpu/drm/i915/i915_drv.h b/drivers/gpu/drm/i915/i915_drv.h index beeb42a14aae..63712b2a729e 100644 --- a/drivers/gpu/drm/i915/i915_drv.h +++ b/drivers/gpu/drm/i915/i915_drv.h @@ -1670,6 +1670,7 @@ int i915_gem_object_unbind(struct drm_i915_gem_object *obj, #define I915_GEM_OBJECT_UNBIND_BARRIER BIT(1) #define I915_GEM_OBJECT_UNBIND_TEST BIT(2) #define I915_GEM_OBJECT_UNBIND_VM_TRYLOCK BIT(3) +#define I915_GEM_OBJECT_UNBIND_ASYNC BIT(4) void i915_gem_runtime_suspend(struct drm_i915_private *dev_priv); diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c index 915bf431f320..d6d9b5c13299 100644 --- a/drivers/gpu/drm/i915/i915_gem.c +++ b/drivers/gpu/drm/i915/i915_gem.c @@ -155,10 +155,16 @@ int i915_gem_object_unbind(struct drm_i915_gem_object *obj, spin_unlock(&obj->vma.lock); if (vma) { + bool vm_trylock = !!(flags & I915_GEM_OBJECT_UNBIND_VM_TRYLOCK); ret = -EBUSY; - if (flags & I915_GEM_OBJECT_UNBIND_ACTIVE || - !i915_vma_is_active(vma)) { - if (flags & I915_GEM_OBJECT_UNBIND_VM_TRYLOCK) { + if (flags & I915_GEM_OBJECT_UNBIND_ASYNC) { + assert_object_held(vma->obj); + ret = i915_vma_unbind_async(vma, vm_trylock); + } + + if (ret == -EBUSY && (flags & I915_GEM_OBJECT_UNBIND_ACTIVE || + !i915_vma_is_active(vma))) { + if (vm_trylock) { if (mutex_trylock(&vma->vm->mutex)) { ret = __i915_vma_unbind(vma); mutex_unlock(&vma->vm->mutex); diff --git a/drivers/gpu/drm/i915/i915_module.c b/drivers/gpu/drm/i915/i915_module.c index f6bcd2f89257..a8f175960b34 100644 --- a/drivers/gpu/drm/i915/i915_module.c +++ b/drivers/gpu/drm/i915/i915_module.c @@ -17,6 +17,7 @@ #include "i915_scheduler.h" #include "i915_selftest.h" #include "i915_vma.h" +#include "i915_vma_resource.h" static int i915_check_nomodeset(void) { @@ -64,6 +65,8 @@ static const struct { .exit = i915_scheduler_module_exit }, { .init = i915_vma_module_init, .exit = i915_vma_module_exit }, + { .init = i915_vma_resource_module_init, + .exit = i915_vma_resource_module_exit }, { .init = i915_mock_selftests }, { .init = i915_pmu_init, .exit = i915_pmu_exit }, diff --git a/drivers/gpu/drm/i915/i915_vma.c b/drivers/gpu/drm/i915/i915_vma.c index 1d4e448d22d9..bfbbbc6fc818 100644 --- a/drivers/gpu/drm/i915/i915_vma.c +++ b/drivers/gpu/drm/i915/i915_vma.c @@ -285,9 +285,10 @@ struct i915_vma_work { struct dma_fence_work base; struct i915_address_space *vm; struct i915_vm_pt_stash stash; - struct i915_vma *vma; + struct i915_vma_resource *vma_res; struct drm_i915_gem_object *pinned; struct i915_sw_dma_fence_cb cb; + struct i915_refct_sgt *rsgt; enum i915_cache_level cache_level; unsigned int flags; }; 
@@ -295,10 +296,11 @@ struct i915_vma_work { static void __vma_bind(struct dma_fence_work *work) { struct i915_vma_work *vw = container_of(work, typeof(*vw), base); - struct i915_vma *vma = vw->vma; + struct i915_vma_resource *vma_res = vw->vma_res; + + vma_res->ops->bind_vma(vma_res->vm, &vw->stash, + vma_res, vw->cache_level, vw->flags); - vma->ops->bind_vma(vw->vm, &vw->stash, - vma->resource, vw->cache_level, vw->flags); } static void __vma_release(struct dma_fence_work *work) @@ -312,6 +314,10 @@ static void __vma_release(struct dma_fence_work *work) i915_vm_free_pt_stash(vw->vm, &vw->stash); i915_vm_put(vw->vm); + if (vw->vma_res) + i915_vma_resource_put(vw->vma_res); + if (vw->rsgt) + i915_refct_sgt_put(vw->rsgt); } static const struct dma_fence_work_ops bind_ops = { @@ -381,13 +387,11 @@ i915_vma_resource_init_from_vma(struct i915_vma_resource *vma_res, { struct drm_i915_gem_object *obj = vma->obj; - i915_vma_resource_init(vma_res, vma->pages, &vma->page_sizes, + i915_vma_resource_init(vma_res, vma->vm, vma->pages, &vma->page_sizes, i915_gem_object_is_readonly(obj), i915_gem_object_is_lmem(obj), - vma->private, - vma->node.start, - vma->node.size, - vma->size); + vma->ops, vma->private, vma->node.start, + vma->node.size, vma->size); } /** @@ -411,6 +415,7 @@ int i915_vma_bind(struct i915_vma *vma, { u32 bind_flags; u32 vma_flags; + int ret; lockdep_assert_held(&vma->vm->mutex); GEM_BUG_ON(!drm_mm_node_allocated(&vma->node)); @@ -419,12 +424,12 @@ int i915_vma_bind(struct i915_vma *vma, if (GEM_DEBUG_WARN_ON(range_overflows(vma->node.start, vma->node.size, vma->vm->total))) { - kfree(vma_res); + i915_vma_resource_free(vma_res); return -ENODEV; } if (GEM_DEBUG_WARN_ON(!flags)) { - kfree(vma_res); + i915_vma_resource_free(vma_res); return -EINVAL; } @@ -436,12 +441,30 @@ int i915_vma_bind(struct i915_vma *vma, bind_flags &= ~vma_flags; if (bind_flags == 0) { - kfree(vma_res); + i915_vma_resource_free(vma_res); return 0; } GEM_BUG_ON(!atomic_read(&vma->pages_count)); + /* Wait for or await async unbinds touching our range */ + if (work && bind_flags & vma->vm->bind_async_flags) + ret = i915_vma_resource_bind_dep_await(vma->vm, + &work->base.chain, + vma->node.start, + vma->node.size, + true, + GFP_NOWAIT | + __GFP_RETRY_MAYFAIL | + __GFP_NOWARN); + else + ret = i915_vma_resource_bind_dep_sync(vma->vm, vma->node.start, + vma->node.size, true); + if (ret) { + i915_vma_resource_free(vma_res); + return ret; + } + if (vma->resource || !vma_res) { /* Rebinding with an additional I915_VMA_*_BIND */ GEM_WARN_ON(!vma_flags); @@ -454,9 +477,11 @@ int i915_vma_bind(struct i915_vma *vma, if (work && bind_flags & vma->vm->bind_async_flags) { struct dma_fence *prev; - work->vma = vma; + work->vma_res = i915_vma_resource_get(vma->resource); work->cache_level = cache_level; work->flags = bind_flags; + if (vma->obj->mm.rsgt) + work->rsgt = i915_refct_sgt_get(vma->obj->mm.rsgt); /* * Note we only want to chain up to the migration fence on @@ -484,8 +509,12 @@ int i915_vma_bind(struct i915_vma *vma, int ret; ret = i915_gem_object_wait_moving_fence(vma->obj, true); - if (ret) + if (ret) { + i915_vma_resource_free(vma->resource); + vma->resource = NULL; + return ret; + } } vma->ops->bind_vma(vma->vm, NULL, vma->resource, cache_level, bind_flags); @@ -1758,8 +1787,9 @@ int _i915_vma_move_to_active(struct i915_vma *vma, return 0; } -void __i915_vma_evict(struct i915_vma *vma) +struct dma_fence *__i915_vma_evict(struct i915_vma *vma, bool async) { + struct i915_vma_resource *vma_res = vma->resource; struct 
dma_fence *unbind_fence; GEM_BUG_ON(i915_vma_is_pinned(vma)); @@ -1792,27 +1822,39 @@ void __i915_vma_evict(struct i915_vma *vma) GEM_BUG_ON(vma->fence); GEM_BUG_ON(i915_vma_has_userfault(vma)); - if (likely(atomic_read(&vma->vm->open))) { - trace_i915_vma_unbind(vma); - vma->ops->unbind_vma(vma->vm, vma->resource); - } + /* Object backend must be async capable. */ + GEM_WARN_ON(async && !vma->obj->mm.rsgt); + + /* If vm is not open, unbind is a nop. */ + vma_res->needs_wakeref = i915_vma_is_bound(vma, I915_VMA_GLOBAL_BIND) && + atomic_read(&vma->vm->open); + trace_i915_vma_unbind(vma); + + unbind_fence = i915_vma_resource_unbind(vma_res); + vma->resource = NULL; + atomic_and(~(I915_VMA_BIND_MASK | I915_VMA_ERROR | I915_VMA_GGTT_WRITE), &vma->flags); - unbind_fence = i915_vma_resource_unbind(vma->resource); - i915_vma_resource_put(vma->resource); - vma->resource = NULL; + /* Object backend must be async capable. */ + GEM_WARN_ON(async && !vma->obj->mm.rsgt); i915_vma_detach(vma); - vma_unbind_pages(vma); + + if (!async && unbind_fence) { + dma_fence_wait(unbind_fence, false); + dma_fence_put(unbind_fence); + unbind_fence = NULL; + } /* - * This uninterruptible wait under the vm mutex is currently - * only ever blocking while the vma is being captured from. - * With async unbinding, this wait here will be removed. + * Binding itself may not have completed until the unbind fence signals, + * so don't drop the pages until that happens, unless the resource is + * async_capable. */ - dma_fence_wait(unbind_fence, false); - dma_fence_put(unbind_fence); + + vma_unbind_pages(vma); + return unbind_fence; } int __i915_vma_unbind(struct i915_vma *vma) @@ -1839,12 +1881,46 @@ int __i915_vma_unbind(struct i915_vma *vma) return ret; GEM_BUG_ON(i915_vma_is_active(vma)); - __i915_vma_evict(vma); + __i915_vma_evict(vma, false); drm_mm_remove_node(&vma->node); /* pairs with i915_vma_release() */ return 0; } +static struct dma_fence *__i915_vma_unbind_async(struct i915_vma *vma) +{ + struct dma_fence *fence; + + lockdep_assert_held(&vma->vm->mutex); + + if (!drm_mm_node_allocated(&vma->node)) + return NULL; + + if (i915_vma_is_pinned(vma)) + return ERR_PTR(-EAGAIN); + + /* + * We probably need to replace this with awaiting the fences of the + * object's dma_resv when the vma active goes away. When doing that + * we need to be careful to not add the vma_resource unbind fence + * immediately to the object's dma_resv, because then unbinding + * the next vma from the object, in case there are many, will + * actually await the unbinding of the previous vmas, which is + * undesirable. + */ + if (i915_sw_fence_await_active(&vma->resource->chain, &vma->active, + I915_ACTIVE_AWAIT_EXCL | + I915_ACTIVE_AWAIT_ACTIVE) < 0) { + return ERR_PTR(-EBUSY); + } + + fence = __i915_vma_evict(vma, true); + + drm_mm_remove_node(&vma->node); /* pairs with i915_vma_release() */ + + return fence; +} + int i915_vma_unbind(struct i915_vma *vma) { struct i915_address_space *vm = vma->vm; @@ -1881,6 +1957,68 @@ int i915_vma_unbind(struct i915_vma *vma) return err; } +int i915_vma_unbind_async(struct i915_vma *vma, bool trylock_vm) +{ + struct drm_i915_gem_object *obj = vma->obj; + struct i915_address_space *vm = vma->vm; + intel_wakeref_t wakeref = 0; + struct dma_fence *fence; + int err; + + /* + * We need the dma-resv lock since we add the + * unbind fence to the dma-resv object. 
+ */ + assert_object_held(obj); + + if (!drm_mm_node_allocated(&vma->node)) + return 0; + + if (i915_vma_is_pinned(vma)) { + vma_print_allocator(vma, "is pinned"); + return -EAGAIN; + } + + if (!obj->mm.rsgt) + return -EBUSY; + + err = dma_resv_reserve_shared(obj->base.resv, 1); + if (err) + return -EBUSY; + + /* + * It would be great if we could grab this wakeref from the + * async unbind work if needed, but we can't because it uses + * kmalloc and it's in the dma-fence signalling critical path. + */ + if (i915_vma_is_bound(vma, I915_VMA_GLOBAL_BIND)) + wakeref = intel_runtime_pm_get(&vm->i915->runtime_pm); + + if (trylock_vm && !mutex_trylock(&vm->mutex)) { + err = -EBUSY; + goto out_rpm; + } else if (!trylock_vm) { + err = mutex_lock_interruptible_nested(&vm->mutex, !wakeref); + if (err) + goto out_rpm; + } + + fence = __i915_vma_unbind_async(vma); + mutex_unlock(&vm->mutex); + if (IS_ERR_OR_NULL(fence)) { + err = PTR_ERR_OR_ZERO(fence); + goto out_rpm; + } + + dma_resv_add_shared_fence(obj->base.resv, fence); + dma_fence_put(fence); + +out_rpm: + if (wakeref) + intel_runtime_pm_put(&vm->i915->runtime_pm, wakeref); + return err; +} + struct i915_vma *i915_vma_make_unshrinkable(struct i915_vma *vma) { i915_gem_object_make_unshrinkable(vma->obj); diff --git a/drivers/gpu/drm/i915/i915_vma.h b/drivers/gpu/drm/i915/i915_vma.h index 1df57ec832bd..a560bae04e7e 100644 --- a/drivers/gpu/drm/i915/i915_vma.h +++ b/drivers/gpu/drm/i915/i915_vma.h @@ -213,9 +213,10 @@ bool i915_vma_misplaced(const struct i915_vma *vma, u64 size, u64 alignment, u64 flags); void __i915_vma_set_map_and_fenceable(struct i915_vma *vma); void i915_vma_revoke_mmap(struct i915_vma *vma); -void __i915_vma_evict(struct i915_vma *vma); +struct dma_fence *__i915_vma_evict(struct i915_vma *vma, bool async); int __i915_vma_unbind(struct i915_vma *vma); int __must_check i915_vma_unbind(struct i915_vma *vma); +int __must_check i915_vma_unbind_async(struct i915_vma *vma, bool trylock_vm); void i915_vma_unlink_ctx(struct i915_vma *vma); void i915_vma_close(struct i915_vma *vma); void i915_vma_reopen(struct i915_vma *vma); diff --git a/drivers/gpu/drm/i915/i915_vma_resource.c b/drivers/gpu/drm/i915/i915_vma_resource.c index c86db89ab5d2..88b4ebb800c2 100644 --- a/drivers/gpu/drm/i915/i915_vma_resource.c +++ b/drivers/gpu/drm/i915/i915_vma_resource.c @@ -2,39 +2,44 @@ /* * Copyright © 2021 Intel Corporation */ + +#include #include +#include "i915_sw_fence.h" #include "i915_vma_resource.h" +#include "i915_drv.h" -/* Callbacks for the unbind dma-fence. */ -static const char *get_driver_name(struct dma_fence *fence) -{ - return "vma unbind fence"; -} +#include "gt/intel_gtt.h" -static const char *get_timeline_name(struct dma_fence *fence) -{ - return "unbound"; -} - -static struct dma_fence_ops unbind_fence_ops = { - .get_driver_name = get_driver_name, - .get_timeline_name = get_timeline_name, -}; +static struct kmem_cache *slab_vma_resources; /** - * __i915_vma_resource_init - Initialize a vma resource. - * @vma_res: The vma resource to initialize + * DOC: + * We use a per-vm interval tree to keep track of vma_resources + * scheduled for unbind but not yet unbound. The tree is protected by + * the vm mutex, and nodes are removed just after the unbind fence signals. 
+ * The removal takes the vm mutex from a kernel thread which we need to + * keep in mind so that we don't grab the mutex and try to wait for all + * pending unbinds to complete, because that will temporaryily block many + * of the workqueue threads, and people will get angry. * - * Initializes the private members of a vma resource. + * We should consider using a single ordered fence per VM instead but that + * requires ordering the unbinds and might introduce unnecessary waiting + * for unrelated unbinds. Amount of code will probably be roughly the same + * due to the simplicity of using the interval tree interface. + * + * Another drawback of this interval tree is that the complexity of insertion + * and removal of fences increases as O(ln(pending_unbinds)) instead of + * O(1) for a single fence without interval tree. */ -void __i915_vma_resource_init(struct i915_vma_resource *vma_res) -{ - spin_lock_init(&vma_res->lock); - dma_fence_init(&vma_res->unbind_fence, &unbind_fence_ops, - &vma_res->lock, 0, 0); - refcount_set(&vma_res->hold_count, 1); -} +#define VMA_RES_START(_node) ((_node)->start) +#define VMA_RES_LAST(_node) ((_node)->start + (_node)->node_size - 1) +INTERVAL_TREE_DEFINE(struct i915_vma_resource, rb, + unsigned long, __subtree_last, + VMA_RES_START, VMA_RES_LAST, static, vma_res_itree); + +/* Callbacks for the unbind dma-fence. */ /** * i915_vma_resource_alloc - Allocate a vma resource @@ -45,15 +50,73 @@ void __i915_vma_resource_init(struct i915_vma_resource *vma_res) struct i915_vma_resource *i915_vma_resource_alloc(void) { struct i915_vma_resource *vma_res = - kzalloc(sizeof(*vma_res), GFP_KERNEL); + kmem_cache_zalloc(slab_vma_resources, GFP_KERNEL); return vma_res ? vma_res : ERR_PTR(-ENOMEM); } +/** + * i915_vma_resource_free - Free a vma resource + * @vma_res: The vma resource to free. 
+ */ +void i915_vma_resource_free(struct i915_vma_resource *vma_res) +{ + kmem_cache_free(slab_vma_resources, vma_res); +} + +static const char *get_driver_name(struct dma_fence *fence) +{ + return "vma unbind fence"; +} + +static const char *get_timeline_name(struct dma_fence *fence) +{ + return "unbound"; +} + +static void unbind_fence_free_rcu(struct rcu_head *head) +{ + struct i915_vma_resource *vma_res = + container_of(head, typeof(*vma_res), unbind_fence.rcu); + + i915_vma_resource_free(vma_res); +} + +static void unbind_fence_release(struct dma_fence *fence) +{ + struct i915_vma_resource *vma_res = + container_of(fence, typeof(*vma_res), unbind_fence); + + i915_sw_fence_fini(&vma_res->chain); + + call_rcu(&fence->rcu, unbind_fence_free_rcu); +} + +static struct dma_fence_ops unbind_fence_ops = { + .get_driver_name = get_driver_name, + .get_timeline_name = get_timeline_name, + .release = unbind_fence_release, +}; + static void __i915_vma_resource_unhold(struct i915_vma_resource *vma_res) { - if (refcount_dec_and_test(&vma_res->hold_count)) - dma_fence_signal(&vma_res->unbind_fence); + struct i915_address_space *vm; + + if (!refcount_dec_and_test(&vma_res->hold_count)) + return; + + dma_fence_signal(&vma_res->unbind_fence); + + vm = vma_res->vm; + if (vma_res->wakeref) + intel_runtime_pm_put(&vm->i915->runtime_pm, vma_res->wakeref); + + vma_res->vm = NULL; + if (!RB_EMPTY_NODE(&vma_res->rb)) { + mutex_lock(&vm->mutex); + vma_res_itree_remove(vma_res, &vm->pending_unbind); + mutex_unlock(&vm->mutex); + } } /** @@ -102,6 +165,49 @@ bool i915_vma_resource_hold(struct i915_vma_resource *vma_res, return held; } +static void i915_vma_resource_unbind_work(struct work_struct *work) +{ + struct i915_vma_resource *vma_res = + container_of(work, typeof(*vma_res), work); + struct i915_address_space *vm = vma_res->vm; + bool lockdep_cookie; + + lockdep_cookie = dma_fence_begin_signalling(); + if (likely(atomic_read(&vm->open))) + vma_res->ops->unbind_vma(vm, vma_res); + + dma_fence_end_signalling(lockdep_cookie); + __i915_vma_resource_unhold(vma_res); + i915_vma_resource_put(vma_res); +} + +static int +i915_vma_resource_fence_notify(struct i915_sw_fence *fence, + enum i915_sw_fence_notify state) +{ + struct i915_vma_resource *vma_res = + container_of(fence, typeof(*vma_res), chain); + struct dma_fence *unbind_fence = + &vma_res->unbind_fence; + + switch (state) { + case FENCE_COMPLETE: + dma_fence_get(unbind_fence); + if (vma_res->immediate_unbind) { + i915_vma_resource_unbind_work(&vma_res->work); + } else { + INIT_WORK(&vma_res->work, i915_vma_resource_unbind_work); + queue_work(system_unbound_wq, &vma_res->work); + } + break; + case FENCE_FREE: + i915_vma_resource_put(vma_res); + break; + } + + return NOTIFY_DONE; +} + /** * i915_vma_resource_unbind - Unbind a vma resource * @vma_res: The vma resource to unbind. @@ -112,10 +218,196 @@ bool i915_vma_resource_hold(struct i915_vma_resource *vma_res, * Return: A refcounted pointer to a dma-fence that signals when unbinding is * complete. */ -struct dma_fence * -i915_vma_resource_unbind(struct i915_vma_resource *vma_res) +struct dma_fence *i915_vma_resource_unbind(struct i915_vma_resource *vma_res) { - __i915_vma_resource_unhold(vma_res); - dma_fence_get(&vma_res->unbind_fence); + struct i915_address_space *vm = vma_res->vm; + + /* Reference for the sw fence */ + i915_vma_resource_get(vma_res); + + /* Caller must already have a wakeref in this case. 
*/ + if (vma_res->needs_wakeref) { + vma_res->wakeref = intel_runtime_pm_get_if_in_use(&vm->i915->runtime_pm); + } + + if (atomic_read(&vma_res->chain.pending) <= 1) { + RB_CLEAR_NODE(&vma_res->rb); + vma_res->immediate_unbind = 1; + } else { + vma_res_itree_insert(vma_res, &vma_res->vm->pending_unbind); + } + + i915_sw_fence_commit(&vma_res->chain); + return &vma_res->unbind_fence; } + +/** + * __i915_vma_resource_init - Initialize a vma resource. + * @vma_res: The vma resource to initialize + * + * Initializes the private members of a vma resource. + */ +void __i915_vma_resource_init(struct i915_vma_resource *vma_res) +{ + spin_lock_init(&vma_res->lock); + dma_fence_init(&vma_res->unbind_fence, &unbind_fence_ops, + &vma_res->lock, 0, 0); + refcount_set(&vma_res->hold_count, 1); + i915_sw_fence_init(&vma_res->chain, i915_vma_resource_fence_notify); +} + +static void +i915_vma_resource_color_adjust_range(struct i915_address_space *vm, + unsigned long *start, + unsigned long *end) +{ + if (i915_vm_has_cache_coloring(vm)) { + if (start) + start -= I915_GTT_PAGE_SIZE; + end += I915_GTT_PAGE_SIZE; + } +} + +/** + * i915_vma_resource_bind_dep_sync - Wait for / sync all unbinds touching a + * certain vm range. + * @vm: The vm to look at. + * @offset: The range start. + * @size: The range size. + * @intr: Whether to wait interrubtible. + * + * The function needs to be called with the vm lock held. + * + * Return: Zero on success, -ERESTARTSYS if interrupted and @intr==true + */ +int i915_vma_resource_bind_dep_sync(struct i915_address_space *vm, + unsigned long offset, + unsigned long size, + bool intr) +{ + struct i915_vma_resource *node; + unsigned long last = offset + size - 1; + + lockdep_assert_held(&vm->mutex); + might_sleep(); + + i915_vma_resource_color_adjust_range(vm, &offset, &last); + node = vma_res_itree_iter_first(&vm->pending_unbind, offset, last); + while (node) { + int ret = dma_fence_wait(&node->unbind_fence, intr); + + if (ret) + return ret; + + node = vma_res_itree_iter_next(node, offset, last); + } + + return 0; +} + +/** + * i915_vma_resource_bind_dep_sync_all - Wait for / sync all unbinds of a vm, + * releasing the vm lock while waiting. + * @vm: The vm to look at. + * + * The function may not be called with the vm lock held. + * Typically this is called at vm destruction to finish any pending + * unbind operations. The vm mutex is released while waiting to avoid + * stalling kernel workqueues trying to grab the mutex. + */ +void i915_vma_resource_bind_dep_sync_all(struct i915_address_space *vm) +{ + struct i915_vma_resource *node; + struct dma_fence *fence; + + do { + fence = NULL; + mutex_lock(&vm->mutex); + node = vma_res_itree_iter_first(&vm->pending_unbind, 0, + ULONG_MAX); + if (node) + fence = dma_fence_get_rcu(&node->unbind_fence); + mutex_unlock(&vm->mutex); + + if (fence) { + /* + * The wait makes sure the node eventually removes + * itself from the tree. + */ + dma_fence_wait(fence, false); + dma_fence_put(fence); + } + } while (node); +} + +/** + * i915_vma_resource_bind_dep_await - Have a struct i915_sw_fence await all + * pending unbinds in a certain range of a vm. + * @vm: The vm to look at. + * @sw_fence: The struct i915_sw_fence that will be awaiting the unbinds. + * @offset: The range start. + * @size: The range size. + * @intr: Whether to wait interrubtible. + * @gfp: Allocation mode for memory allocations. + * + * The function makes @sw_fence await all pending unbinds in a certain + * vm range before calling the complete notifier. 
To be able to await + * each individual unbind, the function needs to allocate memory using + * the @gpf allocation mode. If that fails, the function will instead + * wait for the unbind fence to signal, using @intr to judge whether to + * wait interruptible or not. Note that @gfp should ideally be selected so + * as to avoid any expensive memory allocation stalls and rather fail and + * synchronize itself. For now the vm mutex is required when calling this + * function with means that @gfp can't call into direct reclaim. In reality + * this means that during heavy memory pressure, we will sync in this + * function. + * + * Return: Zero on success, -ERESTARTSYS if interrupted and @intr==true + */ +int i915_vma_resource_bind_dep_await(struct i915_address_space *vm, + struct i915_sw_fence *sw_fence, + unsigned long offset, + unsigned long size, + bool intr, + gfp_t gfp) +{ + struct i915_vma_resource *node; + unsigned long last = offset + size - 1; + + lockdep_assert_held(&vm->mutex); + might_alloc(gfp); + might_sleep(); + + i915_vma_resource_color_adjust_range(vm, &offset, &last); + node = vma_res_itree_iter_first(&vm->pending_unbind, offset, last); + while (node) { + int ret; + + ret = i915_sw_fence_await_dma_fence(sw_fence, + &node->unbind_fence, + MAX_SCHEDULE_TIMEOUT, gfp); + if (ret) + ret = dma_fence_wait(&node->unbind_fence, intr); + if (ret) + return ret; + + node = vma_res_itree_iter_next(node, offset, last); + } + + return 0; +} + +void i915_vma_resource_module_exit(void) +{ + kmem_cache_destroy(slab_vma_resources); +} + +int __init i915_vma_resource_module_init(void) +{ + slab_vma_resources = KMEM_CACHE(i915_vma_resource, SLAB_HWCACHE_ALIGN); + if (!slab_vma_resources) + return -ENOMEM; + + return 0; +} diff --git a/drivers/gpu/drm/i915/i915_vma_resource.h b/drivers/gpu/drm/i915/i915_vma_resource.h index 9872de58268b..a89537e83c70 100644 --- a/drivers/gpu/drm/i915/i915_vma_resource.h +++ b/drivers/gpu/drm/i915/i915_vma_resource.h @@ -10,6 +10,8 @@ #include #include "i915_gem.h" +#include "i915_sw_fence.h" +#include "intel_runtime_pm.h" struct i915_page_sizes { /** @@ -39,6 +41,13 @@ struct i915_page_sizes { * @hold_count: Number of holders blocking the fence from finishing. * The vma itself is keeping a hold, which is released when unbind * is scheduled. + * @work: Work struct for deferred unbind work. + * @chain: Pointer to struct i915_sw_fence used to await dependencies. + * @rb: Rb node for the vm's pending unbind interval tree. + * @__subtree_last: Interval tree private member. + * @vm: non-refcounted pointer to the vm. This is for internal use only and + * this member is cleared after vm_resource unbind. + * @ops: Pointer to the backend i915_vma_ops. * @private: Bind backend private info. * @start: Offset into the address space of bind range start. * @node_size: Size of the allocated range manager node. @@ -46,6 +55,8 @@ struct i915_page_sizes { * @page_sizes_gtt: Resulting page sizes from the bind operation. * @bound_flags: Flags indicating binding status. * @allocated: Backend private data. TODO: Should move into @private. + * @immediate_unbind: Unbind can be done immediately and don't need to be + * deferred to a work item awaiting unsignaled fences. * * The lifetime of a struct i915_vma_resource is from a binding request to * the actual possible asynchronous unbind has completed. @@ -55,6 +66,12 @@ struct i915_vma_resource { /* See above for description of the lock. 
*/ spinlock_t lock; refcount_t hold_count; + struct work_struct work; + struct i915_sw_fence chain; + struct rb_node rb; + unsigned long __subtree_last; + struct i915_address_space *vm; + intel_wakeref_t wakeref; /** * struct i915_vma_bindinfo - Information needed for async bind @@ -74,13 +91,17 @@ struct i915_vma_resource { bool lmem:1; } bi; + const struct i915_vma_ops *ops; void *private; unsigned long start; unsigned long node_size; unsigned long vma_size; u32 page_sizes_gtt; + u32 bound_flags; bool allocated:1; + bool immediate_unbind:1; + bool needs_wakeref:1; }; bool i915_vma_resource_hold(struct i915_vma_resource *vma_res, @@ -91,6 +112,8 @@ void i915_vma_resource_unhold(struct i915_vma_resource *vma_res, struct i915_vma_resource *i915_vma_resource_alloc(void); +void i915_vma_resource_free(struct i915_vma_resource *vma_res); + struct dma_fence *i915_vma_resource_unbind(struct i915_vma_resource *vma_res); void __i915_vma_resource_init(struct i915_vma_resource *vma_res); @@ -120,10 +143,12 @@ static inline void i915_vma_resource_put(struct i915_vma_resource *vma_res) /** * i915_vma_resource_init - Initialize a vma resource. * @vma_res: The vma resource to initialize + * @vm: Pointer to the vm. * @pages: The pages sg-table. * @page_sizes: Page sizes of the pages. * @readonly: Whether the vma should be bound read-only. * @lmem: Whether the vma points to lmem. + * @ops: The backend ops. * @private: Bind backend private info. * @start: Offset into the address space of bind range start. * @node_size: Size of the allocated range manager node. @@ -135,20 +160,24 @@ static inline void i915_vma_resource_put(struct i915_vma_resource *vma_res) * allocation is not allowed. */ static inline void i915_vma_resource_init(struct i915_vma_resource *vma_res, + struct i915_address_space *vm, struct sg_table *pages, const struct i915_page_sizes *page_sizes, bool readonly, bool lmem, + const struct i915_vma_ops *ops, void *private, unsigned long start, unsigned long node_size, unsigned long size) { __i915_vma_resource_init(vma_res); + vma_res->vm = vm; vma_res->bi.pages = pages; vma_res->bi.page_sizes = *page_sizes; vma_res->bi.readonly = readonly; vma_res->bi.lmem = lmem; + vma_res->ops = ops; vma_res->private = private; vma_res->start = start; vma_res->node_size = node_size; @@ -158,6 +187,25 @@ static inline void i915_vma_resource_init(struct i915_vma_resource *vma_res, static inline void i915_vma_resource_fini(struct i915_vma_resource *vma_res) { GEM_BUG_ON(refcount_read(&vma_res->hold_count) != 1); + i915_sw_fence_fini(&vma_res->chain); } +int i915_vma_resource_bind_dep_sync(struct i915_address_space *vm, + unsigned long first, + unsigned long last, + bool intr); + +int i915_vma_resource_bind_dep_await(struct i915_address_space *vm, + struct i915_sw_fence *sw_fence, + unsigned long first, + unsigned long last, + bool intr, + gfp_t gfp); + +void i915_vma_resource_bind_dep_sync_all(struct i915_address_space *vm); + +void i915_vma_resource_module_exit(void); + +int i915_vma_resource_module_init(void); + #endif From patchwork Mon Jan 3 11:59:47 2022 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 8bit X-Patchwork-Submitter: Thomas Hellstrom X-Patchwork-Id: 12702515 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from gabe.freedesktop.org (gabe.freedesktop.org [131.252.210.177]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by 
smtp.lore.kernel.org (Postfix) with ESMTPS id 0759CC4332F; Mon, 3 Jan 2022 12:00:29 +0000 (UTC)
From: =?utf-8?q?Thomas_Hellstr=C3=B6m?=
To: intel-gfx@lists.freedesktop.org, dri-devel@lists.freedesktop.org
Cc: =?utf-8?q?Thomas_Hellstr=C3=B6m?=, matthew.auld@intel.com
Date: Mon, 3 Jan 2022 12:59:47 +0100
Message-Id: <20220103115947.92688-5-thomas.hellstrom@linux.intel.com>
In-Reply-To: <20220103115947.92688-1-thomas.hellstrom@linux.intel.com>
References: <20220103115947.92688-1-thomas.hellstrom@linux.intel.com>
Subject: [Intel-gfx] [PATCH v4 4/4] drm/i915: Use struct vma_resource instead of struct vma_snapshot

There is always a struct vma_resource guaranteed to be alive when we access a corresponding struct vma_snapshot. So ditch the latter and, instead of allocating vma_snapshots, reference the already existing vma_resource. This requires a couple of extra members in struct vma_resource, but that's a small price to pay for the simplification.
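As a rough illustration of the resulting pattern (sketch only, not part of the diff below; dst stands for the coredump record used in the hunks that follow):

	/* Pin the live vma resource for error capture instead of allocating a snapshot. */
	struct i915_vma_resource *vma_res = i915_vma_resource_get(vma->resource);

	/* The coredump reads the bind geometry straight from the resource. */
	dst->gtt_offset = vma_res->start;
	dst->gtt_size = vma_res->node_size;
	dst->gtt_page_sizes = vma_res->page_sizes_gtt;

	i915_vma_resource_put(vma_res);	/* drop the reference once the dump is written */
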
v2: - Fix a missing include and declaration (kernel test robot ) Signed-off-by: Thomas Hellström --- drivers/gpu/drm/i915/Makefile | 1 - .../gpu/drm/i915/gem/i915_gem_execbuffer.c | 15 +-- drivers/gpu/drm/i915/gt/intel_engine_cs.c | 9 +- drivers/gpu/drm/i915/i915_gpu_error.c | 87 ++++++------ drivers/gpu/drm/i915/i915_request.c | 12 +- drivers/gpu/drm/i915/i915_request.h | 6 +- drivers/gpu/drm/i915/i915_vma.c | 14 +- drivers/gpu/drm/i915/i915_vma_resource.c | 4 + drivers/gpu/drm/i915/i915_vma_resource.h | 28 +++- drivers/gpu/drm/i915/i915_vma_snapshot.c | 125 ------------------ drivers/gpu/drm/i915/i915_vma_snapshot.h | 101 -------------- 11 files changed, 89 insertions(+), 313 deletions(-) delete mode 100644 drivers/gpu/drm/i915/i915_vma_snapshot.c delete mode 100644 drivers/gpu/drm/i915/i915_vma_snapshot.h diff --git a/drivers/gpu/drm/i915/Makefile b/drivers/gpu/drm/i915/Makefile index 98433ad74194..aa86ac33effc 100644 --- a/drivers/gpu/drm/i915/Makefile +++ b/drivers/gpu/drm/i915/Makefile @@ -175,7 +175,6 @@ i915-y += \ i915_ttm_buddy_manager.o \ i915_vma.o \ i915_vma_resource.o \ - i915_vma_snapshot.o \ intel_wopcm.o # general-purpose microcontroller (GuC) support diff --git a/drivers/gpu/drm/i915/gem/i915_gem_execbuffer.c b/drivers/gpu/drm/i915/gem/i915_gem_execbuffer.c index 72e497745c12..2f85fe557ad2 100644 --- a/drivers/gpu/drm/i915/gem/i915_gem_execbuffer.c +++ b/drivers/gpu/drm/i915/gem/i915_gem_execbuffer.c @@ -29,7 +29,6 @@ #include "i915_gem_ioctls.h" #include "i915_trace.h" #include "i915_user_extensions.h" -#include "i915_vma_snapshot.h" struct eb_vma { struct i915_vma *vma; @@ -1952,7 +1951,6 @@ static void eb_capture_stage(struct i915_execbuffer *eb) { const unsigned int count = eb->buffer_count; unsigned int i = count, j; - struct i915_vma_snapshot *vsnap; while (i--) { struct eb_vma *ev = &eb->vma[i]; @@ -1962,11 +1960,6 @@ static void eb_capture_stage(struct i915_execbuffer *eb) if (!(flags & EXEC_OBJECT_CAPTURE)) continue; - vsnap = i915_vma_snapshot_alloc(GFP_KERNEL); - if (!vsnap) - continue; - - i915_vma_snapshot_init(vsnap, vma, "user"); for_each_batch_create_order(eb, j) { struct i915_capture_list *capture; @@ -1975,10 +1968,9 @@ static void eb_capture_stage(struct i915_execbuffer *eb) continue; capture->next = eb->capture_lists[j]; - capture->vma_snapshot = i915_vma_snapshot_get(vsnap); + capture->vma_res = i915_vma_resource_get(vma->resource); eb->capture_lists[j] = capture; } - i915_vma_snapshot_put(vsnap); } } @@ -3281,9 +3273,8 @@ eb_requests_create(struct i915_execbuffer *eb, struct dma_fence *in_fence, * _onstack interface. 
*/ if (eb->batches[i]->vma) - i915_vma_snapshot_init_onstack(&eb->requests[i]->batch_snapshot, - eb->batches[i]->vma, - "batch"); + eb->requests[i]->batch_res = + i915_vma_resource_get(eb->batches[i]->vma->resource); if (eb->batch_pool) { GEM_BUG_ON(intel_context_is_parallel(eb->context)); intel_gt_buffer_pool_mark_active(eb->batch_pool, diff --git a/drivers/gpu/drm/i915/gt/intel_engine_cs.c b/drivers/gpu/drm/i915/gt/intel_engine_cs.c index 74aa90587061..d1daa4cc2895 100644 --- a/drivers/gpu/drm/i915/gt/intel_engine_cs.c +++ b/drivers/gpu/drm/i915/gt/intel_engine_cs.c @@ -1708,18 +1708,15 @@ static void intel_engine_print_registers(struct intel_engine_cs *engine, static void print_request_ring(struct drm_printer *m, struct i915_request *rq) { - struct i915_vma_snapshot *vsnap = &rq->batch_snapshot; + struct i915_vma_resource *vma_res = rq->batch_res; void *ring; int size; - if (!i915_vma_snapshot_present(vsnap)) - vsnap = NULL; - drm_printf(m, "[head %04x, postfix %04x, tail %04x, batch 0x%08x_%08x]:\n", rq->head, rq->postfix, rq->tail, - vsnap ? upper_32_bits(vsnap->vma_resource->start) : ~0u, - vsnap ? lower_32_bits(vsnap->vma_resource->start) : ~0u); + vma_res ? upper_32_bits(vma_res->start) : ~0u, + vma_res ? lower_32_bits(vma_res->start) : ~0u); size = rq->tail - rq->head; if (rq->tail < rq->head) diff --git a/drivers/gpu/drm/i915/i915_gpu_error.c b/drivers/gpu/drm/i915/i915_gpu_error.c index 1af54ff374f9..f8c4336cba89 100644 --- a/drivers/gpu/drm/i915/i915_gpu_error.c +++ b/drivers/gpu/drm/i915/i915_gpu_error.c @@ -48,7 +48,6 @@ #include "i915_gpu_error.h" #include "i915_memcpy.h" #include "i915_scatterlist.h" -#include "i915_vma_snapshot.h" #define ALLOW_FAIL (__GFP_KSWAPD_RECLAIM | __GFP_RETRY_MAYFAIL | __GFP_NOWARN) #define ATOMIC_MAYFAIL (GFP_ATOMIC | __GFP_NOWARN) @@ -1013,8 +1012,10 @@ void __i915_gpu_coredump_free(struct kref *error_ref) static struct i915_vma_coredump * i915_vma_coredump_create(const struct intel_gt *gt, - const struct i915_vma_snapshot *vsnap, - struct i915_vma_compress *compress) + const struct i915_vma_resource *vma_res, + struct i915_vma_compress *compress, + const char *name) + { struct i915_ggtt *ggtt = gt->ggtt; const u64 slot = ggtt->error_capture.start; @@ -1024,7 +1025,7 @@ i915_vma_coredump_create(const struct intel_gt *gt, might_sleep(); - if (!vsnap || !vsnap->pages || !compress) + if (!vma_res || !vma_res->bi.pages || !compress) return NULL; dst = kmalloc(sizeof(*dst), ALLOW_FAIL); @@ -1037,12 +1038,12 @@ i915_vma_coredump_create(const struct intel_gt *gt, } INIT_LIST_HEAD(&dst->page_list); - strcpy(dst->name, vsnap->name); + strcpy(dst->name, name); dst->next = NULL; - dst->gtt_offset = vsnap->vma_resource->start; - dst->gtt_size = vsnap->vma_resource->node_size; - dst->gtt_page_sizes = vsnap->vma_resource->page_sizes_gtt; + dst->gtt_offset = vma_res->start; + dst->gtt_size = vma_res->node_size; + dst->gtt_page_sizes = vma_res->page_sizes_gtt; dst->unused = 0; ret = -EINVAL; @@ -1050,7 +1051,7 @@ i915_vma_coredump_create(const struct intel_gt *gt, void __iomem *s; dma_addr_t dma; - for_each_sgt_daddr(dma, iter, vsnap->pages) { + for_each_sgt_daddr(dma, iter, vma_res->bi.pages) { mutex_lock(&ggtt->error_mutex); ggtt->vm.insert_page(&ggtt->vm, dma, slot, I915_CACHE_NONE, 0); @@ -1068,11 +1069,11 @@ i915_vma_coredump_create(const struct intel_gt *gt, if (ret) break; } - } else if (vsnap->mr && vsnap->mr->type != INTEL_MEMORY_SYSTEM) { - struct intel_memory_region *mem = vsnap->mr; + } else if (vma_res->bi.lmem) { + struct intel_memory_region *mem 
= vma_res->mr; dma_addr_t dma; - for_each_sgt_daddr(dma, iter, vsnap->pages) { + for_each_sgt_daddr(dma, iter, vma_res->bi.pages) { void __iomem *s; s = io_mapping_map_wc(&mem->iomap, @@ -1088,7 +1089,7 @@ i915_vma_coredump_create(const struct intel_gt *gt, } else { struct page *page; - for_each_sgt_page(page, iter, vsnap->pages) { + for_each_sgt_page(page, iter, vma_res->bi.pages) { void *s; drm_clflush_pages(&page, 1); @@ -1324,33 +1325,32 @@ static bool record_context(struct i915_gem_context_coredump *e, struct intel_engine_capture_vma { struct intel_engine_capture_vma *next; - struct i915_vma_snapshot *vsnap; + struct i915_vma_resource *vma_res; char name[16]; bool lockdep_cookie; }; static struct intel_engine_capture_vma * capture_vma_snapshot(struct intel_engine_capture_vma *next, - struct i915_vma_snapshot *vsnap, - gfp_t gfp) + struct i915_vma_resource *vma_res, + gfp_t gfp, const char *name) { struct intel_engine_capture_vma *c; - if (!i915_vma_snapshot_present(vsnap)) + if (!vma_res) return next; c = kmalloc(sizeof(*c), gfp); if (!c) return next; - if (!i915_vma_snapshot_resource_pin(vsnap, &c->lockdep_cookie)) { + if (!i915_vma_resource_hold(vma_res, &c->lockdep_cookie)) { kfree(c); return next; } - strcpy(c->name, vsnap->name); - c->vsnap = vsnap; - i915_vma_snapshot_get(vsnap); + strcpy(c->name, name); + c->vma_res = i915_vma_resource_get(vma_res); c->next = next; return c; @@ -1362,8 +1362,6 @@ capture_vma(struct intel_engine_capture_vma *next, const char *name, gfp_t gfp) { - struct i915_vma_snapshot *vsnap; - if (!vma) return next; @@ -1372,19 +1370,10 @@ capture_vma(struct intel_engine_capture_vma *next, * to a struct i915_vma_snapshot at command submission time. * Not here. */ - GEM_WARN_ON(!i915_vma_is_pinned(vma)); - if (!i915_vma_is_pinned(vma)) - return next; - - vsnap = i915_vma_snapshot_alloc(gfp); - if (!vsnap) + if (GEM_WARN_ON(!i915_vma_is_pinned(vma))) return next; - i915_vma_snapshot_init(vsnap, vma, name); - next = capture_vma_snapshot(next, vsnap, gfp); - - /* FIXME: Replace on async unbind. */ - i915_vma_snapshot_put(vsnap); + next = capture_vma_snapshot(next, vma->resource, gfp, name); return next; } @@ -1397,7 +1386,8 @@ capture_user(struct intel_engine_capture_vma *capture, struct i915_capture_list *c; for (c = rq->capture_list; c; c = c->next) - capture = capture_vma_snapshot(capture, c->vma_snapshot, gfp); + capture = capture_vma_snapshot(capture, c->vma_res, gfp, + "user"); return capture; } @@ -1415,16 +1405,19 @@ static struct i915_vma_coredump * create_vma_coredump(const struct intel_gt *gt, struct i915_vma *vma, const char *name, struct i915_vma_compress *compress) { - struct i915_vma_coredump *ret; - struct i915_vma_snapshot tmp; + struct i915_vma_coredump *ret = NULL; + struct i915_vma_resource *vma_res; + bool lockdep_cookie; if (!vma) return NULL; - GEM_WARN_ON(!i915_vma_is_pinned(vma)); - i915_vma_snapshot_init_onstack(&tmp, vma, name); - ret = i915_vma_coredump_create(gt, &tmp, compress); - i915_vma_snapshot_put_onstack(&tmp); + vma_res = vma->resource; + + if (i915_vma_resource_hold(vma_res, &lockdep_cookie)) { + ret = i915_vma_coredump_create(gt, vma_res, compress, name); + i915_vma_resource_unhold(vma_res, lockdep_cookie); + } return ret; } @@ -1471,7 +1464,7 @@ intel_engine_coredump_add_request(struct intel_engine_coredump *ee, * as the simplest method to avoid being overwritten * by userspace. 
*/ - vma = capture_vma_snapshot(vma, &rq->batch_snapshot, gfp); + vma = capture_vma_snapshot(vma, rq->batch_res, gfp, "batch"); vma = capture_user(vma, rq, gfp); vma = capture_vma(vma, rq->ring->vma, "ring", gfp); vma = capture_vma(vma, rq->context->state, "HW context", gfp); @@ -1492,14 +1485,14 @@ intel_engine_coredump_add_vma(struct intel_engine_coredump *ee, while (capture) { struct intel_engine_capture_vma *this = capture; - struct i915_vma_snapshot *vsnap = this->vsnap; + struct i915_vma_resource *vma_res = this->vma_res; add_vma(ee, - i915_vma_coredump_create(engine->gt, - vsnap, compress)); + i915_vma_coredump_create(engine->gt, vma_res, + compress, this->name)); - i915_vma_snapshot_resource_unpin(vsnap, this->lockdep_cookie); - i915_vma_snapshot_put(vsnap); + i915_vma_resource_unhold(vma_res, this->lockdep_cookie); + i915_vma_resource_put(vma_res); capture = this->next; kfree(this); diff --git a/drivers/gpu/drm/i915/i915_request.c b/drivers/gpu/drm/i915/i915_request.c index 76cf5ac91e94..ba3a70b2cc57 100644 --- a/drivers/gpu/drm/i915/i915_request.c +++ b/drivers/gpu/drm/i915/i915_request.c @@ -116,8 +116,10 @@ static void i915_fence_release(struct dma_fence *fence) rq->guc_prio != GUC_PRIO_FINI); i915_request_free_capture_list(fetch_and_zero(&rq->capture_list)); - if (i915_vma_snapshot_present(&rq->batch_snapshot)) - i915_vma_snapshot_put_onstack(&rq->batch_snapshot); + if (rq->batch_res) { + i915_vma_resource_put(rq->batch_res); + rq->batch_res = NULL; + } /* * The request is put onto a RCU freelist (i.e. the address @@ -308,7 +310,7 @@ void i915_request_free_capture_list(struct i915_capture_list *capture) while (capture) { struct i915_capture_list *next = capture->next; - i915_vma_snapshot_put(capture->vma_snapshot); + i915_vma_resource_put(capture->vma_res); kfree(capture); capture = next; } @@ -854,7 +856,7 @@ static void __i915_request_ctor(void *arg) i915_sw_fence_init(&rq->semaphore, semaphore_notify); clear_capture_list(rq); - rq->batch_snapshot.present = false; + rq->batch_res = NULL; init_llist_head(&rq->execute_cb); } @@ -960,7 +962,7 @@ __i915_request_create(struct intel_context *ce, gfp_t gfp) __rq_init_watchdog(rq); assert_capture_list_is_null(rq); GEM_BUG_ON(!llist_empty(&rq->execute_cb)); - GEM_BUG_ON(i915_vma_snapshot_present(&rq->batch_snapshot)); + GEM_BUG_ON(rq->batch_res); /* * Reserve space in the ring buffer for all the commands required to diff --git a/drivers/gpu/drm/i915/i915_request.h b/drivers/gpu/drm/i915/i915_request.h index 170ee78c2858..28b1f9db5487 100644 --- a/drivers/gpu/drm/i915/i915_request.h +++ b/drivers/gpu/drm/i915/i915_request.h @@ -40,7 +40,7 @@ #include "i915_scheduler.h" #include "i915_selftest.h" #include "i915_sw_fence.h" -#include "i915_vma_snapshot.h" +#include "i915_vma_resource.h" #include @@ -52,7 +52,7 @@ struct i915_request; #if IS_ENABLED(CONFIG_DRM_I915_CAPTURE_ERROR) struct i915_capture_list { - struct i915_vma_snapshot *vma_snapshot; + struct i915_vma_resource *vma_res; struct i915_capture_list *next; }; @@ -300,7 +300,7 @@ struct i915_request { /** Batch buffer pointer for selftest internal use. 
*/ I915_SELFTEST_DECLARE(struct i915_vma *batch); - struct i915_vma_snapshot batch_snapshot; + struct i915_vma_resource *batch_res; #if IS_ENABLED(CONFIG_DRM_I915_CAPTURE_ERROR) /** diff --git a/drivers/gpu/drm/i915/i915_vma.c b/drivers/gpu/drm/i915/i915_vma.c index bfbbbc6fc818..1ed8dfdd7d45 100644 --- a/drivers/gpu/drm/i915/i915_vma.c +++ b/drivers/gpu/drm/i915/i915_vma.c @@ -288,7 +288,6 @@ struct i915_vma_work { struct i915_vma_resource *vma_res; struct drm_i915_gem_object *pinned; struct i915_sw_dma_fence_cb cb; - struct i915_refct_sgt *rsgt; enum i915_cache_level cache_level; unsigned int flags; }; @@ -316,8 +315,6 @@ static void __vma_release(struct dma_fence_work *work) i915_vm_put(vw->vm); if (vw->vma_res) i915_vma_resource_put(vw->vma_res); - if (vw->rsgt) - i915_refct_sgt_put(vw->rsgt); } static const struct dma_fence_work_ops bind_ops = { @@ -388,8 +385,8 @@ i915_vma_resource_init_from_vma(struct i915_vma_resource *vma_res, struct drm_i915_gem_object *obj = vma->obj; i915_vma_resource_init(vma_res, vma->vm, vma->pages, &vma->page_sizes, - i915_gem_object_is_readonly(obj), - i915_gem_object_is_lmem(obj), + obj->mm.rsgt, i915_gem_object_is_readonly(obj), + i915_gem_object_is_lmem(obj), obj->mm.region, vma->ops, vma->private, vma->node.start, vma->node.size, vma->size); } @@ -480,8 +477,6 @@ int i915_vma_bind(struct i915_vma *vma, work->vma_res = i915_vma_resource_get(vma->resource); work->cache_level = cache_level; work->flags = bind_flags; - if (vma->obj->mm.rsgt) - work->rsgt = i915_refct_sgt_get(vma->obj->mm.rsgt); /* * Note we only want to chain up to the migration fence on @@ -1823,7 +1818,7 @@ struct dma_fence *__i915_vma_evict(struct i915_vma *vma, bool async) GEM_BUG_ON(i915_vma_has_userfault(vma)); /* Object backend must be async capable. */ - GEM_WARN_ON(async && !vma->obj->mm.rsgt); + GEM_WARN_ON(async && !vma->resource->bi.pages_rsgt); /* If vm is not open, unbind is a nop. */ vma_res->needs_wakeref = i915_vma_is_bound(vma, I915_VMA_GLOBAL_BIND) && @@ -1836,9 +1831,6 @@ struct dma_fence *__i915_vma_evict(struct i915_vma *vma, bool async) atomic_and(~(I915_VMA_BIND_MASK | I915_VMA_ERROR | I915_VMA_GGTT_WRITE), &vma->flags); - /* Object backend must be async capable. */ - GEM_WARN_ON(async && !vma->obj->mm.rsgt); - i915_vma_detach(vma); if (!async && unbind_fence) { diff --git a/drivers/gpu/drm/i915/i915_vma_resource.c b/drivers/gpu/drm/i915/i915_vma_resource.c index 88b4ebb800c2..21ec752a76c2 100644 --- a/drivers/gpu/drm/i915/i915_vma_resource.c +++ b/drivers/gpu/drm/i915/i915_vma_resource.c @@ -9,6 +9,7 @@ #include "i915_sw_fence.h" #include "i915_vma_resource.h" #include "i915_drv.h" +#include "intel_memory_region.h" #include "gt/intel_gtt.h" @@ -117,6 +118,9 @@ static void __i915_vma_resource_unhold(struct i915_vma_resource *vma_res) vma_res_itree_remove(vma_res, &vm->pending_unbind); mutex_unlock(&vm->mutex); } + + if (vma_res->bi.pages_rsgt) + i915_refct_sgt_put(vma_res->bi.pages_rsgt); } /** diff --git a/drivers/gpu/drm/i915/i915_vma_resource.h b/drivers/gpu/drm/i915/i915_vma_resource.h index a89537e83c70..faecdf3e7eca 100644 --- a/drivers/gpu/drm/i915/i915_vma_resource.h +++ b/drivers/gpu/drm/i915/i915_vma_resource.h @@ -10,9 +10,12 @@ #include #include "i915_gem.h" +#include "i915_scatterlist.h" #include "i915_sw_fence.h" #include "intel_runtime_pm.h" +struct intel_memory_region; + struct i915_page_sizes { /** * The sg mask of the pages sg_table. i.e the mask of @@ -47,6 +50,7 @@ struct i915_page_sizes { * @__subtree_last: Interval tree private member. 
* @vm: non-refcounted pointer to the vm. This is for internal use only and * this member is cleared after vm_resource unbind. + * @mr: The memory region of the object pointed to by the vma. * @ops: Pointer to the backend i915_vma_ops. * @private: Bind backend private info. * @start: Offset into the address space of bind range start. @@ -55,8 +59,10 @@ struct i915_page_sizes { * @page_sizes_gtt: Resulting page sizes from the bind operation. * @bound_flags: Flags indicating binding status. * @allocated: Backend private data. TODO: Should move into @private. - * @immediate_unbind: Unbind can be done immediately and don't need to be - * deferred to a work item awaiting unsignaled fences. + * @immediate_unbind: Unbind can be done immediately and doesn't need to be + * deferred to a work item awaiting unsignaled fences. This is a hack. + * (dma_fence_work uses a fence flag for this, but this seems slightly + * cleaner). * * The lifetime of a struct i915_vma_resource is from a binding request to * the actual possible asynchronous unbind has completed. @@ -81,16 +87,22 @@ struct i915_vma_resource { * and flags * @pages: The pages sg-table. * @page_sizes: Page sizes of the pages. + * @pages_rsgt: Refcounted sg-table when delayed object destruction + * is supported. May be NULL. * @readonly: Whether the vma should be bound read-only. * @lmem: Whether the vma points to lmem. */ struct i915_vma_bindinfo { struct sg_table *pages; struct i915_page_sizes page_sizes; + struct i915_refct_sgt *pages_rsgt; bool readonly:1; bool lmem:1; } bi; +#if IS_ENABLED(CONFIG_DRM_I915_CAPTURE_ERROR) + struct intel_memory_region *mr; +#endif const struct i915_vma_ops *ops; void *private; unsigned long start; @@ -146,8 +158,11 @@ static inline void i915_vma_resource_put(struct i915_vma_resource *vma_res) * @vm: Pointer to the vm. * @pages: The pages sg-table. * @page_sizes: Page sizes of the pages. + * @pages_rsgt: Pointer to a struct i915_refct_sgt of an object with + * delayed destruction. * @readonly: Whether the vma should be bound read-only. * @lmem: Whether the vma points to lmem. + * @mr: The memory region of the object the vma points to. * @ops: The backend ops. * @private: Bind backend private info. * @start: Offset into the address space of bind range start. 
@@ -163,8 +178,10 @@ static inline void i915_vma_resource_init(struct i915_vma_resource *vma_res, struct i915_address_space *vm, struct sg_table *pages, const struct i915_page_sizes *page_sizes, + struct i915_refct_sgt *pages_rsgt, bool readonly, bool lmem, + struct intel_memory_region *mr, const struct i915_vma_ops *ops, void *private, unsigned long start, @@ -175,8 +192,13 @@ static inline void i915_vma_resource_init(struct i915_vma_resource *vma_res, vma_res->vm = vm; vma_res->bi.pages = pages; vma_res->bi.page_sizes = *page_sizes; + if (pages_rsgt) + vma_res->bi.pages_rsgt = i915_refct_sgt_get(pages_rsgt); vma_res->bi.readonly = readonly; vma_res->bi.lmem = lmem; +#if IS_ENABLED(CONFIG_DRM_I915_CAPTURE_ERROR) + vma_res->mr = mr; +#endif vma_res->ops = ops; vma_res->private = private; vma_res->start = start; @@ -187,6 +209,8 @@ static inline void i915_vma_resource_init(struct i915_vma_resource *vma_res, static inline void i915_vma_resource_fini(struct i915_vma_resource *vma_res) { GEM_BUG_ON(refcount_read(&vma_res->hold_count) != 1); + if (vma_res->bi.pages_rsgt) + i915_refct_sgt_put(vma_res->bi.pages_rsgt); i915_sw_fence_fini(&vma_res->chain); } diff --git a/drivers/gpu/drm/i915/i915_vma_snapshot.c b/drivers/gpu/drm/i915/i915_vma_snapshot.c deleted file mode 100644 index 69f62c1ca967..000000000000 --- a/drivers/gpu/drm/i915/i915_vma_snapshot.c +++ /dev/null @@ -1,125 +0,0 @@ -// SPDX-License-Identifier: MIT -/* - * Copyright © 2021 Intel Corporation - */ - -#include "i915_vma_resource.h" -#include "i915_vma_snapshot.h" -#include "i915_vma_types.h" -#include "i915_vma.h" - -/** - * i915_vma_snapshot_init - Initialize a struct i915_vma_snapshot from - * a struct i915_vma. - * @vsnap: The i915_vma_snapshot to init. - * @vma: A struct i915_vma used to initialize @vsnap. - * @name: Name associated with the snapshot. The character pointer needs to - * stay alive over the lifitime of the shapsot - */ -void i915_vma_snapshot_init(struct i915_vma_snapshot *vsnap, - struct i915_vma *vma, - const char *name) -{ - if (!i915_vma_is_pinned(vma)) - assert_object_held(vma->obj); - - vsnap->name = name; - vsnap->obj_size = vma->obj->base.size; - vsnap->pages = vma->pages; - vsnap->pages_rsgt = NULL; - vsnap->mr = NULL; - if (vma->obj->mm.rsgt) - vsnap->pages_rsgt = i915_refct_sgt_get(vma->obj->mm.rsgt); - vsnap->mr = vma->obj->mm.region; - kref_init(&vsnap->kref); - vsnap->vma_resource = i915_vma_get_current_resource(vma); - vsnap->onstack = false; - vsnap->present = true; -} - -/** - * i915_vma_snapshot_init_onstack - Initialize a struct i915_vma_snapshot from - * a struct i915_vma, but avoid kfreeing it on last put. - * @vsnap: The i915_vma_snapshot to init. - * @vma: A struct i915_vma used to initialize @vsnap. - * @name: Name associated with the snapshot. 
The character pointer needs to - * stay alive over the lifitime of the shapsot - */ -void i915_vma_snapshot_init_onstack(struct i915_vma_snapshot *vsnap, - struct i915_vma *vma, - const char *name) -{ - i915_vma_snapshot_init(vsnap, vma, name); - vsnap->onstack = true; -} - -static void vma_snapshot_release(struct kref *ref) -{ - struct i915_vma_snapshot *vsnap = - container_of(ref, typeof(*vsnap), kref); - - vsnap->present = false; - i915_vma_resource_put(vsnap->vma_resource); - if (vsnap->pages_rsgt) - i915_refct_sgt_put(vsnap->pages_rsgt); - if (!vsnap->onstack) - kfree(vsnap); -} - -/** - * i915_vma_snapshot_put - Put an i915_vma_snapshot pointer reference - * @vsnap: The pointer reference - */ -void i915_vma_snapshot_put(struct i915_vma_snapshot *vsnap) -{ - kref_put(&vsnap->kref, vma_snapshot_release); -} - -/** - * i915_vma_snapshot_put_onstack - Put an onstcak i915_vma_snapshot pointer - * reference and varify that the structure is released - * @vsnap: The pointer reference - * - * This function is intended to be paired with a i915_vma_init_onstack() - * and should be called before exiting the scope that declared or - * freeing the structure that embedded @vsnap to verify that all references - * have been released. - */ -void i915_vma_snapshot_put_onstack(struct i915_vma_snapshot *vsnap) -{ - if (!kref_put(&vsnap->kref, vma_snapshot_release)) - GEM_BUG_ON(1); -} - -/** - * i915_vma_snapshot_resource_pin - Temporarily block the memory the - * vma snapshot is pointing to from being released. - * @vsnap: The vma snapshot. - * @lockdep_cookie: Pointer to bool needed for lockdep support. This needs - * to be passed to the paired i915_vma_snapshot_resource_unpin. - * - * This function will temporarily try to hold up a fence or similar structure - * and will therefore enter a fence signaling critical section. - * - * Return: true if we succeeded in blocking the memory from being released, - * false otherwise. - */ -bool i915_vma_snapshot_resource_pin(struct i915_vma_snapshot *vsnap, - bool *lockdep_cookie) -{ - return i915_vma_resource_hold(vsnap->vma_resource, lockdep_cookie); -} - -/** - * i915_vma_snapshot_resource_unpin - Unblock vma snapshot memory from - * being released. - * @vsnap: The vma snapshot. - * @lockdep_cookie: Cookie returned from matching i915_vma_resource_pin(). - * - * Might leave a fence signalling critical section and signal a fence. - */ -void i915_vma_snapshot_resource_unpin(struct i915_vma_snapshot *vsnap, - bool lockdep_cookie) -{ - i915_vma_resource_unhold(vsnap->vma_resource, lockdep_cookie); -} diff --git a/drivers/gpu/drm/i915/i915_vma_snapshot.h b/drivers/gpu/drm/i915/i915_vma_snapshot.h deleted file mode 100644 index 1b08ce9f8576..000000000000 --- a/drivers/gpu/drm/i915/i915_vma_snapshot.h +++ /dev/null @@ -1,101 +0,0 @@ -/* SPDX-License-Identifier: MIT */ -/* - * Copyright © 2021 Intel Corporation - */ -#ifndef _I915_VMA_SNAPSHOT_H_ -#define _I915_VMA_SNAPSHOT_H_ - -#include -#include -#include - -struct i915_active; -struct i915_refct_sgt; -struct i915_vma; -struct intel_memory_region; -struct sg_table; - -/** - * DOC: Simple utilities for snapshotting GPU vma metadata, later used for - * error capture. Vi use a separate header for this to avoid issues due to - * recursive header includes. - */ - -/** - * struct i915_vma_snapshot - Snapshot of vma metadata. - * @obj_size: The size of the underlying object in bytes. - * @pages: The struct sg_table pointing to the pages bound. 
- * @pages_rsgt: The refcounted sg_table holding the reference for @pages if any. - * @mr: The memory region pointed for the pages bound. - * @kref: Reference for this structure. - * @vma_resource: Pointer to the vma resource representing the vma binding. - * @onstack: Whether the structure shouldn't be freed on final put. - * @present: Whether the structure is present and initialized. - */ -struct i915_vma_snapshot { - const char *name; - size_t obj_size; - struct sg_table *pages; - struct i915_refct_sgt *pages_rsgt; - struct intel_memory_region *mr; - struct kref kref; - struct i915_vma_resource *vma_resource; - bool onstack:1; - bool present:1; -}; - -void i915_vma_snapshot_init(struct i915_vma_snapshot *vsnap, - struct i915_vma *vma, - const char *name); - -void i915_vma_snapshot_init_onstack(struct i915_vma_snapshot *vsnap, - struct i915_vma *vma, - const char *name); - -void i915_vma_snapshot_put(struct i915_vma_snapshot *vsnap); - -void i915_vma_snapshot_put_onstack(struct i915_vma_snapshot *vsnap); - -bool i915_vma_snapshot_resource_pin(struct i915_vma_snapshot *vsnap, - bool *lockdep_cookie); - -void i915_vma_snapshot_resource_unpin(struct i915_vma_snapshot *vsnap, - bool lockdep_cookie); - -/** - * i915_vma_snapshot_alloc - Allocate a struct i915_vma_snapshot - * @gfp: Allocation mode. - * - * Return: A pointer to a struct i915_vma_snapshot if successful. - * NULL otherwise. - */ -static inline struct i915_vma_snapshot *i915_vma_snapshot_alloc(gfp_t gfp) -{ - return kmalloc(sizeof(struct i915_vma_snapshot), gfp); -} - -/** - * i915_vma_snapshot_get - Take a reference on a struct i915_vma_snapshot - * - * Return: A pointer to a struct i915_vma_snapshot. - */ -static inline struct i915_vma_snapshot * -i915_vma_snapshot_get(struct i915_vma_snapshot *vsnap) -{ - kref_get(&vsnap->kref); - return vsnap; -} - -/** - * i915_vma_snapshot_present - Whether a struct i915_vma_snapshot is - * present and initialized. - * - * Return: true if present and initialized; false otherwise. - */ -static inline bool -i915_vma_snapshot_present(const struct i915_vma_snapshot *vsnap) -{ - return vsnap && vsnap->present; -} - -#endif
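(Not part of the patch proper; an illustrative note only.) The ownership model the hunks above set up for error-capture lists can be sketched as follows. The sketch assumes nothing beyond the i915_vma_resource_get()/i915_vma_resource_put() helpers and the struct i915_capture_list layout shown in the diff.

/*
 * Sketch: each capture-list entry takes its own reference on the vma
 * resource at submission time (cf. eb_capture_stage()) and drops it
 * when the request's capture list is freed
 * (cf. i915_request_free_capture_list()). Simplified; GFP flags and
 * batch ordering are omitted.
 */
static struct i915_capture_list *
capture_list_add_sketch(struct i915_capture_list *head, struct i915_vma *vma)
{
	struct i915_capture_list *capture;

	capture = kmalloc(sizeof(*capture), GFP_KERNEL);
	if (!capture)
		return head;

	capture->vma_res = i915_vma_resource_get(vma->resource);
	capture->next = head;
	return capture;
}

static void capture_list_free_sketch(struct i915_capture_list *capture)
{
	while (capture) {
		struct i915_capture_list *next = capture->next;

		i915_vma_resource_put(capture->vma_res);
		kfree(capture);
		capture = next;
	}
}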