From patchwork Fri Dec 17 09:19:23 2021 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 8bit X-Patchwork-Submitter: Thomas Hellstrom X-Patchwork-Id: 12683951 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from gabe.freedesktop.org (gabe.freedesktop.org [131.252.210.177]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.lore.kernel.org (Postfix) with ESMTPS id 58DCFC433EF for ; Fri, 17 Dec 2021 09:19:49 +0000 (UTC) Received: from gabe.freedesktop.org (localhost [127.0.0.1]) by gabe.freedesktop.org (Postfix) with ESMTP id 25E3F10FA25; Fri, 17 Dec 2021 09:19:43 +0000 (UTC) Received: from mga06.intel.com (mga06.intel.com [134.134.136.31]) by gabe.freedesktop.org (Postfix) with ESMTPS id 6D2FD10FA27; Fri, 17 Dec 2021 09:19:41 +0000 (UTC) X-IronPort-AV: E=McAfee;i="6200,9189,10200"; a="300493328" X-IronPort-AV: E=Sophos;i="5.88,213,1635231600"; d="scan'208";a="300493328" Received: from orsmga005.jf.intel.com ([10.7.209.41]) by orsmga104.jf.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 17 Dec 2021 01:19:41 -0800 X-IronPort-AV: E=Sophos;i="5.88,213,1635231600"; d="scan'208";a="683321508" Received: from olindum-mobl1.ger.corp.intel.com (HELO thellstr-mobl1.intel.com) ([10.249.254.180]) by orsmga005-auth.jf.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 17 Dec 2021 01:19:39 -0800 From: =?utf-8?q?Thomas_Hellstr=C3=B6m?= To: intel-gfx@lists.freedesktop.org, dri-devel@lists.freedesktop.org Date: Fri, 17 Dec 2021 10:19:23 +0100 Message-Id: <20211217091929.105781-2-thomas.hellstrom@linux.intel.com> X-Mailer: git-send-email 2.31.1 In-Reply-To: <20211217091929.105781-1-thomas.hellstrom@linux.intel.com> References: <20211217091929.105781-1-thomas.hellstrom@linux.intel.com> MIME-Version: 1.0 Subject: [Intel-gfx] [PATCH v2 1/7] drm/i915: Avoid using the i915_fence_array when collecting dependencies X-BeenThere: intel-gfx@lists.freedesktop.org X-Mailman-Version: 2.1.29 Precedence: list List-Id: Intel graphics driver community testing & development List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Cc: =?utf-8?q?Thomas_Hellstr=C3=B6m?= , matthew.auld@intel.com Errors-To: intel-gfx-bounces@lists.freedesktop.org Sender: "Intel-gfx" Since the gt migration code was using only a single fence for dependencies, these were collected in a dma_fence_array. However, it turns out that it's illegal to use some dma_fences in a dma_fence_array, in particular other dma_fence_arrays and dma_fence_chains, and this causes trouble for us moving forward. Have the gt migration code instead take a const struct i915_deps for dependencies. This means we can skip the dma_fence_array creation and instead pass the struct i915_deps instead to circumvent the problem. v2: - Make the prev_deps() function static. (kernel test robot ) - Update the struct i915_deps kerneldoc. Fixes: 5652df829b3c ("drm/i915/ttm: Update i915_gem_obj_copy_ttm() to be asynchronous") Signed-off-by: Thomas Hellström --- drivers/gpu/drm/i915/gem/i915_gem_ttm_move.c | 129 +++++-------------- drivers/gpu/drm/i915/gem/i915_gem_ttm_move.h | 17 +++ drivers/gpu/drm/i915/gt/intel_migrate.c | 24 ++-- drivers/gpu/drm/i915/gt/intel_migrate.h | 9 +- drivers/gpu/drm/i915/i915_request.c | 22 ++++ drivers/gpu/drm/i915/i915_request.h | 2 + 6 files changed, 91 insertions(+), 112 deletions(-) diff --git a/drivers/gpu/drm/i915/gem/i915_gem_ttm_move.c b/drivers/gpu/drm/i915/gem/i915_gem_ttm_move.c index 80df9f592407..960145c8200f 100644 --- a/drivers/gpu/drm/i915/gem/i915_gem_ttm_move.c +++ b/drivers/gpu/drm/i915/gem/i915_gem_ttm_move.c @@ -3,8 +3,6 @@ * Copyright © 2021 Intel Corporation */ -#include - #include #include "i915_drv.h" @@ -44,17 +42,12 @@ void i915_ttm_migrate_set_failure_modes(bool gpu_migration, #endif /** - * DOC: Set of utilities to dynamically collect dependencies and - * eventually coalesce them into a single fence which is fed into - * the GT migration code, since it only accepts a single dependency - * fence. - * The single fence returned from these utilities, in the case of - * dependencies from multiple fence contexts, a struct dma_fence_array, - * since the i915 request code can break that up and await the individual - * fences. + * DOC: Set of utilities to dynamically collect dependencies into a + * structure which is fed into the GT migration code. * * Once we can do async unbinding, this is also needed to coalesce - * the migration fence with the unbind fences. + * the migration fence with the unbind fences if these are coalesced + * post-migration. * * While collecting the individual dependencies, we store the refcounted * struct dma_fence pointers in a realloc-managed pointer array, since @@ -65,32 +58,13 @@ void i915_ttm_migrate_set_failure_modes(bool gpu_migration, * A struct i915_deps need to be initialized using i915_deps_init(). * If i915_deps_add_dependency() or i915_deps_add_resv() return an * error code they will internally call i915_deps_fini(), which frees - * all internal references and allocations. After a call to - * i915_deps_to_fence(), or i915_deps_sync(), the struct should similarly - * be viewed as uninitialized. + * all internal references and allocations. * * We might want to break this out into a separate file as a utility. */ #define I915_DEPS_MIN_ALLOC_CHUNK 8U -/** - * struct i915_deps - Collect dependencies into a single dma-fence - * @single: Storage for pointer if the collection is a single fence. - * @fence: Allocated array of fence pointers if more than a single fence; - * otherwise points to the address of @single. - * @num_deps: Current number of dependency fences. - * @fences_size: Size of the @fences array in number of pointers. - * @gfp: Allocation mode. - */ -struct i915_deps { - struct dma_fence *single; - struct dma_fence **fences; - unsigned int num_deps; - unsigned int fences_size; - gfp_t gfp; -}; - static void i915_deps_reset_fences(struct i915_deps *deps) { if (deps->fences != &deps->single) @@ -163,7 +137,7 @@ static int i915_deps_grow(struct i915_deps *deps, struct dma_fence *fence, return ret; } -static int i915_deps_sync(struct i915_deps *deps, +static int i915_deps_sync(const struct i915_deps *deps, const struct ttm_operation_ctx *ctx) { struct dma_fence **fences = deps->fences; @@ -183,7 +157,6 @@ static int i915_deps_sync(struct i915_deps *deps, break; } - i915_deps_fini(deps); return ret; } @@ -221,34 +194,6 @@ static int i915_deps_add_dependency(struct i915_deps *deps, return i915_deps_grow(deps, fence, ctx); } -static struct dma_fence *i915_deps_to_fence(struct i915_deps *deps, - const struct ttm_operation_ctx *ctx) -{ - struct dma_fence_array *array; - - if (deps->num_deps == 0) - return NULL; - - if (deps->num_deps == 1) { - deps->num_deps = 0; - return deps->fences[0]; - } - - /* - * TODO: Alter the allocation mode here to not try too hard to - * make things async. - */ - array = dma_fence_array_create(deps->num_deps, deps->fences, 0, 0, - false); - if (!array) - return ERR_PTR(i915_deps_sync(deps, ctx)); - - deps->fences = NULL; - i915_deps_reset_fences(deps); - - return &array->base; -} - static int i915_deps_add_resv(struct i915_deps *deps, struct dma_resv *resv, bool all, const bool no_excl, const struct ttm_operation_ctx *ctx) @@ -387,7 +332,7 @@ static struct dma_fence *i915_ttm_accel_move(struct ttm_buffer_object *bo, struct ttm_resource *dst_mem, struct ttm_tt *dst_ttm, struct sg_table *dst_st, - struct dma_fence *dep) + const struct i915_deps *deps) { struct drm_i915_private *i915 = container_of(bo->bdev, typeof(*i915), bdev); @@ -411,7 +356,7 @@ static struct dma_fence *i915_ttm_accel_move(struct ttm_buffer_object *bo, return ERR_PTR(-EINVAL); intel_engine_pm_get(i915->gt.migrate.context->engine); - ret = intel_context_migrate_clear(i915->gt.migrate.context, dep, + ret = intel_context_migrate_clear(i915->gt.migrate.context, deps, dst_st->sgl, dst_level, i915_ttm_gtt_binds_lmem(dst_mem), 0, &rq); @@ -425,7 +370,7 @@ static struct dma_fence *i915_ttm_accel_move(struct ttm_buffer_object *bo, src_level = i915_ttm_cache_level(i915, bo->resource, src_ttm); intel_engine_pm_get(i915->gt.migrate.context->engine); ret = intel_context_migrate_copy(i915->gt.migrate.context, - dep, src_rsgt->table.sgl, + deps, src_rsgt->table.sgl, src_level, i915_ttm_gtt_binds_lmem(bo->resource), dst_st->sgl, dst_level, @@ -610,10 +555,11 @@ i915_ttm_memcpy_work_arm(struct i915_ttm_memcpy_work *work, } static struct dma_fence * -__i915_ttm_move(struct ttm_buffer_object *bo, bool clear, +__i915_ttm_move(struct ttm_buffer_object *bo, + const struct ttm_operation_ctx *ctx, bool clear, struct ttm_resource *dst_mem, struct ttm_tt *dst_ttm, struct i915_refct_sgt *dst_rsgt, bool allow_accel, - struct dma_fence *move_dep) + const struct i915_deps *move_deps) { struct i915_ttm_memcpy_work *copy_work = NULL; struct i915_ttm_memcpy_arg _arg, *arg = &_arg; @@ -621,7 +567,7 @@ __i915_ttm_move(struct ttm_buffer_object *bo, bool clear, if (allow_accel) { fence = i915_ttm_accel_move(bo, clear, dst_mem, dst_ttm, - &dst_rsgt->table, move_dep); + &dst_rsgt->table, move_deps); /* * We only need to intercept the error when moving to lmem. @@ -655,8 +601,8 @@ __i915_ttm_move(struct ttm_buffer_object *bo, bool clear, if (!IS_ERR(fence)) goto out; - } else if (move_dep) { - int err = dma_fence_wait(move_dep, true); + } else if (move_deps) { + int err = i915_deps_sync(move_deps, ctx); if (err) return ERR_PTR(err); @@ -680,29 +626,21 @@ __i915_ttm_move(struct ttm_buffer_object *bo, bool clear, return fence; } -static struct dma_fence *prev_fence(struct ttm_buffer_object *bo, - struct ttm_operation_ctx *ctx) +static int +prev_deps(struct ttm_buffer_object *bo, struct ttm_operation_ctx *ctx, + struct i915_deps *deps) { - struct i915_deps deps; int ret; - /* - * Instead of trying hard with GFP_KERNEL to allocate memory, - * the dependency collection will just sync if it doesn't - * succeed. - */ - i915_deps_init(&deps, GFP_KERNEL | __GFP_NORETRY | __GFP_NOWARN); - ret = i915_deps_add_dependency(&deps, bo->moving, ctx); + ret = i915_deps_add_dependency(deps, bo->moving, ctx); if (!ret) /* * TODO: Only await excl fence here, and shared fences before * signaling the migration fence. */ - ret = i915_deps_add_resv(&deps, bo->base.resv, true, false, ctx); - if (ret) - return ERR_PTR(ret); + ret = i915_deps_add_resv(deps, bo->base.resv, true, false, ctx); - return i915_deps_to_fence(&deps, ctx); + return ret; } /** @@ -756,16 +694,18 @@ int i915_ttm_move(struct ttm_buffer_object *bo, bool evict, clear = !i915_ttm_cpu_maps_iomem(bo->resource) && (!ttm || !ttm_tt_is_populated(ttm)); if (!(clear && ttm && !(ttm->page_flags & TTM_TT_FLAG_ZERO_ALLOC))) { - struct dma_fence *dep = prev_fence(bo, ctx); + struct i915_deps deps; - if (IS_ERR(dep)) { + i915_deps_init(&deps, GFP_KERNEL | __GFP_NORETRY | __GFP_NOWARN); + ret = prev_deps(bo, ctx, &deps); + if (ret) { i915_refct_sgt_put(dst_rsgt); - return PTR_ERR(dep); + return ret; } - migration_fence = __i915_ttm_move(bo, clear, dst_mem, bo->ttm, - dst_rsgt, true, dep); - dma_fence_put(dep); + migration_fence = __i915_ttm_move(bo, ctx, clear, dst_mem, bo->ttm, + dst_rsgt, true, &deps); + i915_deps_fini(&deps); } /* We can possibly get an -ERESTARTSYS here */ @@ -826,7 +766,7 @@ int i915_gem_obj_copy_ttm(struct drm_i915_gem_object *dst, .interruptible = intr, }; struct i915_refct_sgt *dst_rsgt; - struct dma_fence *copy_fence, *dep_fence; + struct dma_fence *copy_fence; struct i915_deps deps; int ret, shared_err; @@ -847,15 +787,12 @@ int i915_gem_obj_copy_ttm(struct drm_i915_gem_object *dst, if (ret) return ret; - dep_fence = i915_deps_to_fence(&deps, &ctx); - if (IS_ERR(dep_fence)) - return PTR_ERR(dep_fence); - dst_rsgt = i915_ttm_resource_get_st(dst, dst_bo->resource); - copy_fence = __i915_ttm_move(src_bo, false, dst_bo->resource, + copy_fence = __i915_ttm_move(src_bo, &ctx, false, dst_bo->resource, dst_bo->ttm, dst_rsgt, allow_accel, - dep_fence); + &deps); + i915_deps_fini(&deps); i915_refct_sgt_put(dst_rsgt); if (IS_ERR_OR_NULL(copy_fence)) return PTR_ERR_OR_ZERO(copy_fence); diff --git a/drivers/gpu/drm/i915/gem/i915_gem_ttm_move.h b/drivers/gpu/drm/i915/gem/i915_gem_ttm_move.h index d2e7f149e05c..138b7647a558 100644 --- a/drivers/gpu/drm/i915/gem/i915_gem_ttm_move.h +++ b/drivers/gpu/drm/i915/gem/i915_gem_ttm_move.h @@ -18,6 +18,23 @@ struct ttm_tt; struct drm_i915_gem_object; struct i915_refct_sgt; +/** + * struct i915_deps - Collect dependencies into a single dma-fence + * @single: Storage for pointer if the collection is a single fence. + * @fences: Allocated array of fence pointers if more than a single fence; + * otherwise points to the address of @single. + * @num_deps: Current number of dependency fences. + * @fences_size: Size of the @fences array in number of pointers. + * @gfp: Allocation mode. + */ +struct i915_deps { + struct dma_fence *single; + struct dma_fence **fences; + unsigned int num_deps; + unsigned int fences_size; + gfp_t gfp; +}; + int i915_ttm_move_notify(struct ttm_buffer_object *bo); I915_SELFTEST_DECLARE(void i915_ttm_migrate_set_failure_modes(bool gpu_migration, diff --git a/drivers/gpu/drm/i915/gt/intel_migrate.c b/drivers/gpu/drm/i915/gt/intel_migrate.c index 19a01878fee3..18b44af56969 100644 --- a/drivers/gpu/drm/i915/gt/intel_migrate.c +++ b/drivers/gpu/drm/i915/gt/intel_migrate.c @@ -404,7 +404,7 @@ static int emit_copy(struct i915_request *rq, int size) int intel_context_migrate_copy(struct intel_context *ce, - struct dma_fence *await, + const struct i915_deps *deps, struct scatterlist *src, enum i915_cache_level src_cache_level, bool src_is_lmem, @@ -431,8 +431,8 @@ intel_context_migrate_copy(struct intel_context *ce, goto out_ce; } - if (await) { - err = i915_request_await_dma_fence(rq, await); + if (deps) { + err = i915_request_await_deps(rq, deps); if (err) goto out_rq; @@ -442,7 +442,7 @@ intel_context_migrate_copy(struct intel_context *ce, goto out_rq; } - await = NULL; + deps = NULL; } /* The PTE updates + copy must not be interrupted. */ @@ -525,7 +525,7 @@ static int emit_clear(struct i915_request *rq, int size, u32 value) int intel_context_migrate_clear(struct intel_context *ce, - struct dma_fence *await, + const struct i915_deps *deps, struct scatterlist *sg, enum i915_cache_level cache_level, bool is_lmem, @@ -550,8 +550,8 @@ intel_context_migrate_clear(struct intel_context *ce, goto out_ce; } - if (await) { - err = i915_request_await_dma_fence(rq, await); + if (deps) { + err = i915_request_await_deps(rq, deps); if (err) goto out_rq; @@ -561,7 +561,7 @@ intel_context_migrate_clear(struct intel_context *ce, goto out_rq; } - await = NULL; + deps = NULL; } /* The PTE updates + clear must not be interrupted. */ @@ -599,7 +599,7 @@ intel_context_migrate_clear(struct intel_context *ce, int intel_migrate_copy(struct intel_migrate *m, struct i915_gem_ww_ctx *ww, - struct dma_fence *await, + const struct i915_deps *deps, struct scatterlist *src, enum i915_cache_level src_cache_level, bool src_is_lmem, @@ -624,7 +624,7 @@ int intel_migrate_copy(struct intel_migrate *m, if (err) goto out; - err = intel_context_migrate_copy(ce, await, + err = intel_context_migrate_copy(ce, deps, src, src_cache_level, src_is_lmem, dst, dst_cache_level, dst_is_lmem, out); @@ -638,7 +638,7 @@ int intel_migrate_copy(struct intel_migrate *m, int intel_migrate_clear(struct intel_migrate *m, struct i915_gem_ww_ctx *ww, - struct dma_fence *await, + const struct i915_deps *deps, struct scatterlist *sg, enum i915_cache_level cache_level, bool is_lmem, @@ -661,7 +661,7 @@ intel_migrate_clear(struct intel_migrate *m, if (err) goto out; - err = intel_context_migrate_clear(ce, await, sg, cache_level, + err = intel_context_migrate_clear(ce, deps, sg, cache_level, is_lmem, value, out); intel_context_unpin(ce); diff --git a/drivers/gpu/drm/i915/gt/intel_migrate.h b/drivers/gpu/drm/i915/gt/intel_migrate.h index 4e18e755a00b..ccc677ec4aa3 100644 --- a/drivers/gpu/drm/i915/gt/intel_migrate.h +++ b/drivers/gpu/drm/i915/gt/intel_migrate.h @@ -11,6 +11,7 @@ #include "intel_migrate_types.h" struct dma_fence; +struct i915_deps; struct i915_request; struct i915_gem_ww_ctx; struct intel_gt; @@ -23,7 +24,7 @@ struct intel_context *intel_migrate_create_context(struct intel_migrate *m); int intel_migrate_copy(struct intel_migrate *m, struct i915_gem_ww_ctx *ww, - struct dma_fence *await, + const struct i915_deps *deps, struct scatterlist *src, enum i915_cache_level src_cache_level, bool src_is_lmem, @@ -33,7 +34,7 @@ int intel_migrate_copy(struct intel_migrate *m, struct i915_request **out); int intel_context_migrate_copy(struct intel_context *ce, - struct dma_fence *await, + const struct i915_deps *deps, struct scatterlist *src, enum i915_cache_level src_cache_level, bool src_is_lmem, @@ -45,7 +46,7 @@ int intel_context_migrate_copy(struct intel_context *ce, int intel_migrate_clear(struct intel_migrate *m, struct i915_gem_ww_ctx *ww, - struct dma_fence *await, + const struct i915_deps *deps, struct scatterlist *sg, enum i915_cache_level cache_level, bool is_lmem, @@ -53,7 +54,7 @@ intel_migrate_clear(struct intel_migrate *m, struct i915_request **out); int intel_context_migrate_clear(struct intel_context *ce, - struct dma_fence *await, + const struct i915_deps *deps, struct scatterlist *sg, enum i915_cache_level cache_level, bool is_lmem, diff --git a/drivers/gpu/drm/i915/i915_request.c b/drivers/gpu/drm/i915/i915_request.c index 40cf8ec0b9fc..7d804df27546 100644 --- a/drivers/gpu/drm/i915/i915_request.c +++ b/drivers/gpu/drm/i915/i915_request.c @@ -32,6 +32,7 @@ #include #include "gem/i915_gem_context.h" +#include "gem/i915_gem_ttm_move.h" #include "gt/intel_breadcrumbs.h" #include "gt/intel_context.h" #include "gt/intel_engine.h" @@ -1543,6 +1544,27 @@ i915_request_await_dma_fence(struct i915_request *rq, struct dma_fence *fence) return 0; } +/** + * i915_request_await_deps - set this request to (async) wait upon a struct + * i915_deps dma_fence collection + * @rq: request we are wishing to use + * @deps: The struct i915_deps containing the dependencies. + * + * Returns 0 if successful, negative error code on error. + */ +int i915_request_await_deps(struct i915_request *rq, const struct i915_deps *deps) +{ + int i, err; + + for (i = 0; i < deps->num_deps; ++i) { + err = i915_request_await_dma_fence(rq, deps->fences[i]); + if (err) + return err; + } + + return 0; +} + /** * i915_request_await_object - set this request to (async) wait upon a bo * @to: request we are wishing to use diff --git a/drivers/gpu/drm/i915/i915_request.h b/drivers/gpu/drm/i915/i915_request.h index ce7714210697..170ee78c2858 100644 --- a/drivers/gpu/drm/i915/i915_request.h +++ b/drivers/gpu/drm/i915/i915_request.h @@ -47,6 +47,7 @@ struct drm_file; struct drm_i915_gem_object; struct drm_printer; +struct i915_deps; struct i915_request; #if IS_ENABLED(CONFIG_DRM_I915_CAPTURE_ERROR) @@ -411,6 +412,7 @@ int i915_request_await_object(struct i915_request *to, bool write); int i915_request_await_dma_fence(struct i915_request *rq, struct dma_fence *fence); +int i915_request_await_deps(struct i915_request *rq, const struct i915_deps *deps); int i915_request_await_execution(struct i915_request *rq, struct dma_fence *fence); From patchwork Fri Dec 17 09:19:24 2021 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 8bit X-Patchwork-Submitter: Thomas Hellstrom X-Patchwork-Id: 12683953 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from gabe.freedesktop.org (gabe.freedesktop.org [131.252.210.177]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.lore.kernel.org (Postfix) with ESMTPS id F0266C433F5 for ; Fri, 17 Dec 2021 09:19:54 +0000 (UTC) Received: from gabe.freedesktop.org (localhost [127.0.0.1]) by gabe.freedesktop.org (Postfix) with ESMTP id 03BA610FA36; Fri, 17 Dec 2021 09:19:45 +0000 (UTC) Received: from mga06.intel.com (mga06.intel.com [134.134.136.31]) by gabe.freedesktop.org (Postfix) with ESMTPS id 6174310FA2A; Fri, 17 Dec 2021 09:19:43 +0000 (UTC) X-IronPort-AV: E=McAfee;i="6200,9189,10200"; a="300493332" X-IronPort-AV: E=Sophos;i="5.88,213,1635231600"; d="scan'208";a="300493332" Received: from orsmga005.jf.intel.com ([10.7.209.41]) by orsmga104.jf.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 17 Dec 2021 01:19:43 -0800 X-IronPort-AV: E=Sophos;i="5.88,213,1635231600"; d="scan'208";a="683321523" Received: from olindum-mobl1.ger.corp.intel.com (HELO thellstr-mobl1.intel.com) ([10.249.254.180]) by orsmga005-auth.jf.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 17 Dec 2021 01:19:41 -0800 From: =?utf-8?q?Thomas_Hellstr=C3=B6m?= To: intel-gfx@lists.freedesktop.org, dri-devel@lists.freedesktop.org Date: Fri, 17 Dec 2021 10:19:24 +0100 Message-Id: <20211217091929.105781-3-thomas.hellstrom@linux.intel.com> X-Mailer: git-send-email 2.31.1 In-Reply-To: <20211217091929.105781-1-thomas.hellstrom@linux.intel.com> References: <20211217091929.105781-1-thomas.hellstrom@linux.intel.com> MIME-Version: 1.0 Subject: [Intel-gfx] [PATCH v2 2/7] drm/i915: Break out the i915_deps utility X-BeenThere: intel-gfx@lists.freedesktop.org X-Mailman-Version: 2.1.29 Precedence: list List-Id: Intel graphics driver community testing & development List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Cc: =?utf-8?q?Thomas_Hellstr=C3=B6m?= , matthew.auld@intel.com Errors-To: intel-gfx-bounces@lists.freedesktop.org Sender: "Intel-gfx" Since it's starting to be used outside the i915 TTM move code, move it to a separate set of files. v2: - Update the documentation. Signed-off-by: Thomas Hellström --- drivers/gpu/drm/i915/Makefile | 1 + drivers/gpu/drm/i915/gem/i915_gem_ttm_move.c | 176 +------------ drivers/gpu/drm/i915/gem/i915_gem_ttm_move.h | 17 -- drivers/gpu/drm/i915/i915_deps.c | 244 +++++++++++++++++++ drivers/gpu/drm/i915/i915_deps.h | 46 ++++ drivers/gpu/drm/i915/i915_request.c | 2 +- 6 files changed, 293 insertions(+), 193 deletions(-) create mode 100644 drivers/gpu/drm/i915/i915_deps.c create mode 100644 drivers/gpu/drm/i915/i915_deps.h diff --git a/drivers/gpu/drm/i915/Makefile b/drivers/gpu/drm/i915/Makefile index 6ddd2d2bbaaf..1b62b9f65196 100644 --- a/drivers/gpu/drm/i915/Makefile +++ b/drivers/gpu/drm/i915/Makefile @@ -163,6 +163,7 @@ i915-y += \ i915_active.o \ i915_buddy.o \ i915_cmd_parser.o \ + i915_deps.o \ i915_gem_evict.o \ i915_gem_gtt.o \ i915_gem_ww.o \ diff --git a/drivers/gpu/drm/i915/gem/i915_gem_ttm_move.c b/drivers/gpu/drm/i915/gem/i915_gem_ttm_move.c index 960145c8200f..e8a99e8cd129 100644 --- a/drivers/gpu/drm/i915/gem/i915_gem_ttm_move.c +++ b/drivers/gpu/drm/i915/gem/i915_gem_ttm_move.c @@ -5,6 +5,7 @@ #include +#include "i915_deps.h" #include "i915_drv.h" #include "intel_memory_region.h" #include "intel_region_ttm.h" @@ -41,181 +42,6 @@ void i915_ttm_migrate_set_failure_modes(bool gpu_migration, } #endif -/** - * DOC: Set of utilities to dynamically collect dependencies into a - * structure which is fed into the GT migration code. - * - * Once we can do async unbinding, this is also needed to coalesce - * the migration fence with the unbind fences if these are coalesced - * post-migration. - * - * While collecting the individual dependencies, we store the refcounted - * struct dma_fence pointers in a realloc-managed pointer array, since - * that can be easily fed into a dma_fence_array. Other options are - * available, like for example an xarray for similarity with drm/sched. - * Can be changed easily if needed. - * - * A struct i915_deps need to be initialized using i915_deps_init(). - * If i915_deps_add_dependency() or i915_deps_add_resv() return an - * error code they will internally call i915_deps_fini(), which frees - * all internal references and allocations. - * - * We might want to break this out into a separate file as a utility. - */ - -#define I915_DEPS_MIN_ALLOC_CHUNK 8U - -static void i915_deps_reset_fences(struct i915_deps *deps) -{ - if (deps->fences != &deps->single) - kfree(deps->fences); - deps->num_deps = 0; - deps->fences_size = 1; - deps->fences = &deps->single; -} - -static void i915_deps_init(struct i915_deps *deps, gfp_t gfp) -{ - deps->fences = NULL; - deps->gfp = gfp; - i915_deps_reset_fences(deps); -} - -static void i915_deps_fini(struct i915_deps *deps) -{ - unsigned int i; - - for (i = 0; i < deps->num_deps; ++i) - dma_fence_put(deps->fences[i]); - - if (deps->fences != &deps->single) - kfree(deps->fences); -} - -static int i915_deps_grow(struct i915_deps *deps, struct dma_fence *fence, - const struct ttm_operation_ctx *ctx) -{ - int ret; - - if (deps->num_deps >= deps->fences_size) { - unsigned int new_size = 2 * deps->fences_size; - struct dma_fence **new_fences; - - new_size = max(new_size, I915_DEPS_MIN_ALLOC_CHUNK); - new_fences = kmalloc_array(new_size, sizeof(*new_fences), deps->gfp); - if (!new_fences) - goto sync; - - memcpy(new_fences, deps->fences, - deps->fences_size * sizeof(*new_fences)); - swap(new_fences, deps->fences); - if (new_fences != &deps->single) - kfree(new_fences); - deps->fences_size = new_size; - } - deps->fences[deps->num_deps++] = dma_fence_get(fence); - return 0; - -sync: - if (ctx->no_wait_gpu && !dma_fence_is_signaled(fence)) { - ret = -EBUSY; - goto unref; - } - - ret = dma_fence_wait(fence, ctx->interruptible); - if (ret) - goto unref; - - ret = fence->error; - if (ret) - goto unref; - - return 0; - -unref: - i915_deps_fini(deps); - return ret; -} - -static int i915_deps_sync(const struct i915_deps *deps, - const struct ttm_operation_ctx *ctx) -{ - struct dma_fence **fences = deps->fences; - unsigned int i; - int ret = 0; - - for (i = 0; i < deps->num_deps; ++i, ++fences) { - if (ctx->no_wait_gpu && !dma_fence_is_signaled(*fences)) { - ret = -EBUSY; - break; - } - - ret = dma_fence_wait(*fences, ctx->interruptible); - if (!ret) - ret = (*fences)->error; - if (ret) - break; - } - - return ret; -} - -static int i915_deps_add_dependency(struct i915_deps *deps, - struct dma_fence *fence, - const struct ttm_operation_ctx *ctx) -{ - unsigned int i; - int ret; - - if (!fence) - return 0; - - if (dma_fence_is_signaled(fence)) { - ret = fence->error; - if (ret) - i915_deps_fini(deps); - return ret; - } - - for (i = 0; i < deps->num_deps; ++i) { - struct dma_fence *entry = deps->fences[i]; - - if (!entry->context || entry->context != fence->context) - continue; - - if (dma_fence_is_later(fence, entry)) { - dma_fence_put(entry); - deps->fences[i] = dma_fence_get(fence); - } - - return 0; - } - - return i915_deps_grow(deps, fence, ctx); -} - -static int i915_deps_add_resv(struct i915_deps *deps, struct dma_resv *resv, - bool all, const bool no_excl, - const struct ttm_operation_ctx *ctx) -{ - struct dma_resv_iter iter; - struct dma_fence *fence; - - dma_resv_assert_held(resv); - dma_resv_for_each_fence(&iter, resv, all, fence) { - int ret; - - if (no_excl && dma_resv_iter_is_exclusive(&iter)) - continue; - - ret = i915_deps_add_dependency(deps, fence, ctx); - if (ret) - return ret; - } - - return 0; -} - static enum i915_cache_level i915_ttm_cache_level(struct drm_i915_private *i915, struct ttm_resource *res, struct ttm_tt *ttm) diff --git a/drivers/gpu/drm/i915/gem/i915_gem_ttm_move.h b/drivers/gpu/drm/i915/gem/i915_gem_ttm_move.h index 138b7647a558..d2e7f149e05c 100644 --- a/drivers/gpu/drm/i915/gem/i915_gem_ttm_move.h +++ b/drivers/gpu/drm/i915/gem/i915_gem_ttm_move.h @@ -18,23 +18,6 @@ struct ttm_tt; struct drm_i915_gem_object; struct i915_refct_sgt; -/** - * struct i915_deps - Collect dependencies into a single dma-fence - * @single: Storage for pointer if the collection is a single fence. - * @fences: Allocated array of fence pointers if more than a single fence; - * otherwise points to the address of @single. - * @num_deps: Current number of dependency fences. - * @fences_size: Size of the @fences array in number of pointers. - * @gfp: Allocation mode. - */ -struct i915_deps { - struct dma_fence *single; - struct dma_fence **fences; - unsigned int num_deps; - unsigned int fences_size; - gfp_t gfp; -}; - int i915_ttm_move_notify(struct ttm_buffer_object *bo); I915_SELFTEST_DECLARE(void i915_ttm_migrate_set_failure_modes(bool gpu_migration, diff --git a/drivers/gpu/drm/i915/i915_deps.c b/drivers/gpu/drm/i915/i915_deps.c new file mode 100644 index 000000000000..b232b8e294d4 --- /dev/null +++ b/drivers/gpu/drm/i915/i915_deps.c @@ -0,0 +1,244 @@ +// SPDX-License-Identifier: MIT +/* + * Copyright © 2021 Intel Corporation + */ + +#include +#include + +#include + +#include "i915_deps.h" + +/** + * DOC: Set of utilities to dynamically collect dependencies into a + * structure which is fed into the GT migration code. + * + * Once we can do async unbinding, this is also needed to coalesce + * the migration fence with the unbind fences if these are coalesced + * post-migration. + * + * While collecting the individual dependencies, we store the refcounted + * struct dma_fence pointers in a realloc-managed pointer array, since + * that can be easily fed into a dma_fence_array. Other options are + * available, like for example an xarray for similarity with drm/sched. + * Can be changed easily if needed. + * + * A struct i915_deps need to be initialized using i915_deps_init(). + * If i915_deps_add_dependency() or i915_deps_add_resv() return an + * error code they will internally call i915_deps_fini(), which frees + * all internal references and allocations. + */ + +/* Min number of fence pointers in the array when an allocation occurs. */ +#define I915_DEPS_MIN_ALLOC_CHUNK 8U + +static void i915_deps_reset_fences(struct i915_deps *deps) +{ + if (deps->fences != &deps->single) + kfree(deps->fences); + deps->num_deps = 0; + deps->fences_size = 1; + deps->fences = &deps->single; +} + +/** + * i915_deps_init - Initialize an i915_deps structure + * @deps: Pointer to the i915_deps structure to initialize. + * @gfp: The allocation mode for subsequenst allocations. + */ +void i915_deps_init(struct i915_deps *deps, gfp_t gfp) +{ + deps->fences = NULL; + deps->gfp = gfp; + i915_deps_reset_fences(deps); +} + +/** + * i915_deps_fini - Finalize an i915_deps structure + * @deps: Pointer to the i915_deps structure to finalize. + * + * This function drops all fence references taken, conditionally frees and + * then resets the fences array. + */ +void i915_deps_fini(struct i915_deps *deps) +{ + unsigned int i; + + for (i = 0; i < deps->num_deps; ++i) + dma_fence_put(deps->fences[i]); + + if (deps->fences != &deps->single) + kfree(deps->fences); +} + +static int i915_deps_grow(struct i915_deps *deps, struct dma_fence *fence, + const struct ttm_operation_ctx *ctx) +{ + int ret; + + if (deps->num_deps >= deps->fences_size) { + unsigned int new_size = 2 * deps->fences_size; + struct dma_fence **new_fences; + + new_size = max(new_size, I915_DEPS_MIN_ALLOC_CHUNK); + new_fences = kmalloc_array(new_size, sizeof(*new_fences), deps->gfp); + if (!new_fences) + goto sync; + + memcpy(new_fences, deps->fences, + deps->fences_size * sizeof(*new_fences)); + swap(new_fences, deps->fences); + if (new_fences != &deps->single) + kfree(new_fences); + deps->fences_size = new_size; + } + deps->fences[deps->num_deps++] = dma_fence_get(fence); + return 0; + +sync: + if (ctx->no_wait_gpu && !dma_fence_is_signaled(fence)) { + ret = -EBUSY; + goto unref; + } + + ret = dma_fence_wait(fence, ctx->interruptible); + if (ret) + goto unref; + + ret = fence->error; + if (ret) + goto unref; + + return 0; + +unref: + i915_deps_fini(deps); + return ret; +} + +/** + * i915_deps_sync - Wait for all the fences in the dependency collection + * @deps: Pointer to the i915_deps structure the fences of which to wait for. + * @ctx: Pointer to a struct ttm_operation_ctx indicating how the waits + * should be performed. + * + * This function waits for fences in the dependency collection. If it + * encounters an error during the wait or a fence error, the wait for + * further fences is aborted and the error returned. + * + * Return: Zero if successful, Negative error code on error. + */ +int i915_deps_sync(const struct i915_deps *deps, const struct ttm_operation_ctx *ctx) +{ + struct dma_fence **fences = deps->fences; + unsigned int i; + int ret = 0; + + for (i = 0; i < deps->num_deps; ++i, ++fences) { + if (ctx->no_wait_gpu && !dma_fence_is_signaled(*fences)) { + ret = -EBUSY; + break; + } + + ret = dma_fence_wait(*fences, ctx->interruptible); + if (!ret) + ret = (*fences)->error; + if (ret) + break; + } + + return ret; +} + +/** + * i915_deps_add_dependency - Add a fence to the dependency collection + * @deps: Pointer to the i915_deps structure a fence is to be added to. + * @fence: The fence to add. + * @ctx: Pointer to a struct ttm_operation_ctx indicating how waits are to + * be performed if waiting. + * + * Adds a fence to the dependency collection, and takes a reference on it. + * If the fence context is not zero and there was a later fence from the + * same fence context already added, then the fence is not added to the + * dependency collection. If the fence context is not zero and there was + * an earlier fence already added, then the fence will replace the older + * fence from the same context and the reference on the earlier fence will + * be dropped. + * If there is a failure to allocate memory to accommodate the new fence to + * be added, the new fence will instead be waited for and an error may + * be returned; depending on the value of @ctx, or if there was a fence + * error. If an error was returned, the dependency collection will be + * finalized and all fence reference dropped. + * + * Return: 0 if success. Negative error code on error. + */ +int i915_deps_add_dependency(struct i915_deps *deps, + struct dma_fence *fence, + const struct ttm_operation_ctx *ctx) +{ + unsigned int i; + int ret; + + if (!fence) + return 0; + + if (dma_fence_is_signaled(fence)) { + ret = fence->error; + if (ret) + i915_deps_fini(deps); + return ret; + } + + for (i = 0; i < deps->num_deps; ++i) { + struct dma_fence *entry = deps->fences[i]; + + if (!entry->context || entry->context != fence->context) + continue; + + if (dma_fence_is_later(fence, entry)) { + dma_fence_put(entry); + deps->fences[i] = dma_fence_get(fence); + } + + return 0; + } + + return i915_deps_grow(deps, fence, ctx); +} + +/** + * i915_deps_add_resv - Add the fences of a reservation object to a dependency + * collection. + * @deps: Pointer to the i915_deps structure a fence is to be added to. + * @resv: The reservation object, then fences of which to add. + * @all: Whether to include all shared fences of @resv. + * @no_excl: Whether to exclude the exclusive fence. + * @ctx: Pointer to a struct ttm_operation_ctx indicating how waits are to + * be performed if waiting. + * + * Calls i915_deps_add_depencency() on the indicated fences of @resv. + * + * Return: Zero on success. Negative error code on error. + */ +int i915_deps_add_resv(struct i915_deps *deps, struct dma_resv *resv, + bool all, const bool no_excl, + const struct ttm_operation_ctx *ctx) +{ + struct dma_resv_iter iter; + struct dma_fence *fence; + + dma_resv_assert_held(resv); + dma_resv_for_each_fence(&iter, resv, all, fence) { + int ret; + + if (no_excl && dma_resv_iter_is_exclusive(&iter)) + continue; + + ret = i915_deps_add_dependency(deps, fence, ctx); + if (ret) + return ret; + } + + return 0; +} diff --git a/drivers/gpu/drm/i915/i915_deps.h b/drivers/gpu/drm/i915/i915_deps.h new file mode 100644 index 000000000000..df18e21d8206 --- /dev/null +++ b/drivers/gpu/drm/i915/i915_deps.h @@ -0,0 +1,46 @@ +/* SPDX-License-Identifier: MIT */ +/* + * Copyright © 2021 Intel Corporation + */ + +#ifndef _I915_DEPS_H_ +#define _I915_DEPS_H_ + +#include + +struct ttm_operation_ctx; +struct dma_fence; +struct dma_resv; + +/** + * struct i915_deps - Collect dependencies into a single dma-fence + * @single: Storage for pointer if the collection is a single fence. + * @fences: Allocated array of fence pointers if more than a single fence; + * otherwise points to the address of @single. + * @num_deps: Current number of dependency fences. + * @fences_size: Size of the @fences array in number of pointers. + * @gfp: Allocation mode. + */ +struct i915_deps { + struct dma_fence *single; + struct dma_fence **fences; + unsigned int num_deps; + unsigned int fences_size; + gfp_t gfp; +}; + +void i915_deps_init(struct i915_deps *deps, gfp_t gfp); + +void i915_deps_fini(struct i915_deps *deps); + +int i915_deps_add_dependency(struct i915_deps *deps, + struct dma_fence *fence, + const struct ttm_operation_ctx *ctx); + +int i915_deps_add_resv(struct i915_deps *deps, struct dma_resv *resv, + bool all, const bool no_excl, + const struct ttm_operation_ctx *ctx); + +int i915_deps_sync(const struct i915_deps *deps, + const struct ttm_operation_ctx *ctx); +#endif diff --git a/drivers/gpu/drm/i915/i915_request.c b/drivers/gpu/drm/i915/i915_request.c index 7d804df27546..76cf5ac91e94 100644 --- a/drivers/gpu/drm/i915/i915_request.c +++ b/drivers/gpu/drm/i915/i915_request.c @@ -32,7 +32,6 @@ #include #include "gem/i915_gem_context.h" -#include "gem/i915_gem_ttm_move.h" #include "gt/intel_breadcrumbs.h" #include "gt/intel_context.h" #include "gt/intel_engine.h" @@ -43,6 +42,7 @@ #include "gt/intel_rps.h" #include "i915_active.h" +#include "i915_deps.h" #include "i915_drv.h" #include "i915_trace.h" #include "intel_pm.h" From patchwork Fri Dec 17 09:19:25 2021 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 8bit X-Patchwork-Submitter: Thomas Hellstrom X-Patchwork-Id: 12683955 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from gabe.freedesktop.org (gabe.freedesktop.org [131.252.210.177]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.lore.kernel.org (Postfix) with ESMTPS id B158FC433FE for ; Fri, 17 Dec 2021 09:20:05 +0000 (UTC) Received: from gabe.freedesktop.org (localhost [127.0.0.1]) by gabe.freedesktop.org (Postfix) with ESMTP id 1E41D10FA62; Fri, 17 Dec 2021 09:19:52 +0000 (UTC) Received: from mga06.intel.com (mga06.intel.com [134.134.136.31]) by gabe.freedesktop.org (Postfix) with ESMTPS id 29CBB10FA39; Fri, 17 Dec 2021 09:19:45 +0000 (UTC) X-IronPort-AV: E=McAfee;i="6200,9189,10200"; a="300493335" X-IronPort-AV: E=Sophos;i="5.88,213,1635231600"; d="scan'208";a="300493335" Received: from orsmga005.jf.intel.com ([10.7.209.41]) by orsmga104.jf.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 17 Dec 2021 01:19:45 -0800 X-IronPort-AV: E=Sophos;i="5.88,213,1635231600"; d="scan'208";a="683321531" Received: from olindum-mobl1.ger.corp.intel.com (HELO thellstr-mobl1.intel.com) ([10.249.254.180]) by orsmga005-auth.jf.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 17 Dec 2021 01:19:43 -0800 From: =?utf-8?q?Thomas_Hellstr=C3=B6m?= To: intel-gfx@lists.freedesktop.org, dri-devel@lists.freedesktop.org Date: Fri, 17 Dec 2021 10:19:25 +0100 Message-Id: <20211217091929.105781-4-thomas.hellstrom@linux.intel.com> X-Mailer: git-send-email 2.31.1 In-Reply-To: <20211217091929.105781-1-thomas.hellstrom@linux.intel.com> References: <20211217091929.105781-1-thomas.hellstrom@linux.intel.com> MIME-Version: 1.0 Subject: [Intel-gfx] [PATCH v2 3/7] drm/i915: Require the vm mutex for i915_vma_bind() X-BeenThere: intel-gfx@lists.freedesktop.org X-Mailman-Version: 2.1.29 Precedence: list List-Id: Intel graphics driver community testing & development List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Cc: =?utf-8?q?Thomas_Hellstr=C3=B6m?= , matthew.auld@intel.com Errors-To: intel-gfx-bounces@lists.freedesktop.org Sender: "Intel-gfx" Protect updates of struct i915_vma flags and async binding / unbinding with the vm::mutex. This means that i915_vma_bind() needs to assert vm::mutex held. In order to make that possible drop the caching of kmap_atomic() maps around i915_vma_bind(). An alternative would be to use kmap_local() but since we block cpu unplugging during sleeps inside kmap_local() sections this may have unwanted side-effects. Particularly since we might wait for gpu while holding the vm mutex. This change may theoretically increase execbuf cpu-usage on snb, but at least on non-highmem systems that increase should be very small. Signed-off-by: Thomas Hellström --- .../gpu/drm/i915/gem/i915_gem_execbuffer.c | 50 ++++++++++++++++++- drivers/gpu/drm/i915/i915_vma.c | 1 + 2 files changed, 50 insertions(+), 1 deletion(-) diff --git a/drivers/gpu/drm/i915/gem/i915_gem_execbuffer.c b/drivers/gpu/drm/i915/gem/i915_gem_execbuffer.c index 2213f7b613da..6013f7e18f60 100644 --- a/drivers/gpu/drm/i915/gem/i915_gem_execbuffer.c +++ b/drivers/gpu/drm/i915/gem/i915_gem_execbuffer.c @@ -1109,6 +1109,47 @@ static inline struct i915_ggtt *cache_to_ggtt(struct reloc_cache *cache) return &i915->ggtt; } +static void reloc_cache_unmap(struct reloc_cache *cache) +{ + void *vaddr; + + if (!cache->vaddr) + return; + + vaddr = unmask_page(cache->vaddr); + if (cache->vaddr & KMAP) + kunmap_atomic(vaddr); + else + io_mapping_unmap_atomic((void __iomem *)vaddr); +} + +static void reloc_cache_remap(struct reloc_cache *cache, + struct drm_i915_gem_object *obj) +{ + void *vaddr; + + if (!cache->vaddr) + return; + + if (cache->vaddr & KMAP) { + struct page *page = i915_gem_object_get_page(obj, cache->page); + + vaddr = kmap_atomic(page); + cache->vaddr = unmask_flags(cache->vaddr) | + (unsigned long)vaddr; + } else { + struct i915_ggtt *ggtt = cache_to_ggtt(cache); + unsigned long offset; + + offset = cache->node.start; + if (!drm_mm_node_allocated(&cache->node)) + offset += cache->page << PAGE_SHIFT; + + cache->vaddr = (unsigned long) + io_mapping_map_atomic_wc(&ggtt->iomap, offset); + } +} + static void reloc_cache_reset(struct reloc_cache *cache, struct i915_execbuffer *eb) { void *vaddr; @@ -1373,10 +1414,17 @@ eb_relocate_entry(struct i915_execbuffer *eb, * batchbuffers. */ if (reloc->write_domain == I915_GEM_DOMAIN_INSTRUCTION && - GRAPHICS_VER(eb->i915) == 6) { + GRAPHICS_VER(eb->i915) == 6 && + !i915_vma_is_bound(target->vma, I915_VMA_GLOBAL_BIND)) { + struct i915_vma *vma = target->vma; + + reloc_cache_unmap(&eb->reloc_cache); + mutex_lock(&vma->vm->mutex); err = i915_vma_bind(target->vma, target->vma->obj->cache_level, PIN_GLOBAL, NULL); + mutex_unlock(&vma->vm->mutex); + reloc_cache_remap(&eb->reloc_cache, ev->vma->obj); if (err) return err; } diff --git a/drivers/gpu/drm/i915/i915_vma.c b/drivers/gpu/drm/i915/i915_vma.c index 927f0d4f8e11..d792a3d0da7a 100644 --- a/drivers/gpu/drm/i915/i915_vma.c +++ b/drivers/gpu/drm/i915/i915_vma.c @@ -398,6 +398,7 @@ int i915_vma_bind(struct i915_vma *vma, u32 bind_flags; u32 vma_flags; + lockdep_assert_held(&vma->vm->mutex); GEM_BUG_ON(!drm_mm_node_allocated(&vma->node)); GEM_BUG_ON(vma->size > vma->node.size); From patchwork Fri Dec 17 09:19:26 2021 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 8bit X-Patchwork-Submitter: Thomas Hellstrom X-Patchwork-Id: 12683959 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from gabe.freedesktop.org (gabe.freedesktop.org [131.252.210.177]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.lore.kernel.org (Postfix) with ESMTPS id E7C18C433F5 for ; Fri, 17 Dec 2021 09:20:06 +0000 (UTC) Received: from gabe.freedesktop.org (localhost [127.0.0.1]) by gabe.freedesktop.org (Postfix) with ESMTP id 5722F10FA8C; Fri, 17 Dec 2021 09:19:52 +0000 (UTC) Received: from mga06.intel.com (mga06.intel.com [134.134.136.31]) by gabe.freedesktop.org (Postfix) with ESMTPS id EFA4410FA30; Fri, 17 Dec 2021 09:19:46 +0000 (UTC) X-IronPort-AV: E=McAfee;i="6200,9189,10200"; a="300493338" X-IronPort-AV: E=Sophos;i="5.88,213,1635231600"; d="scan'208";a="300493338" Received: from orsmga005.jf.intel.com ([10.7.209.41]) by orsmga104.jf.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 17 Dec 2021 01:19:46 -0800 X-IronPort-AV: E=Sophos;i="5.88,213,1635231600"; d="scan'208";a="683321537" Received: from olindum-mobl1.ger.corp.intel.com (HELO thellstr-mobl1.intel.com) ([10.249.254.180]) by orsmga005-auth.jf.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 17 Dec 2021 01:19:45 -0800 From: =?utf-8?q?Thomas_Hellstr=C3=B6m?= To: intel-gfx@lists.freedesktop.org, dri-devel@lists.freedesktop.org Date: Fri, 17 Dec 2021 10:19:26 +0100 Message-Id: <20211217091929.105781-5-thomas.hellstrom@linux.intel.com> X-Mailer: git-send-email 2.31.1 In-Reply-To: <20211217091929.105781-1-thomas.hellstrom@linux.intel.com> References: <20211217091929.105781-1-thomas.hellstrom@linux.intel.com> MIME-Version: 1.0 Subject: [Intel-gfx] [PATCH v2 4/7] drm/i915: Initial introduction of vma resources X-BeenThere: intel-gfx@lists.freedesktop.org X-Mailman-Version: 2.1.29 Precedence: list List-Id: Intel graphics driver community testing & development List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Cc: =?utf-8?q?Thomas_Hellstr=C3=B6m?= , matthew.auld@intel.com Errors-To: intel-gfx-bounces@lists.freedesktop.org Sender: "Intel-gfx" Introduce vma resources, sort of similar to TTM resources, needed for asynchronous bind management. Initially we will use them to hold completion of unbinding when we capture data from a vma, but they will be used extensively in upcoming patches for asynchronous vma unbinding. Signed-off-by: Thomas Hellström --- drivers/gpu/drm/i915/Makefile | 1 + .../gpu/drm/i915/gem/i915_gem_execbuffer.c | 2 +- drivers/gpu/drm/i915/i915_vma.c | 55 +++++++- drivers/gpu/drm/i915/i915_vma.h | 19 ++- drivers/gpu/drm/i915/i915_vma_resource.c | 124 ++++++++++++++++++ drivers/gpu/drm/i915/i915_vma_resource.h | 70 ++++++++++ drivers/gpu/drm/i915/i915_vma_snapshot.c | 15 +-- drivers/gpu/drm/i915/i915_vma_snapshot.h | 7 +- drivers/gpu/drm/i915/i915_vma_types.h | 5 + drivers/gpu/drm/i915/selftests/i915_gem_gtt.c | 99 ++++++++------ 10 files changed, 334 insertions(+), 63 deletions(-) create mode 100644 drivers/gpu/drm/i915/i915_vma_resource.c create mode 100644 drivers/gpu/drm/i915/i915_vma_resource.h diff --git a/drivers/gpu/drm/i915/Makefile b/drivers/gpu/drm/i915/Makefile index 1b62b9f65196..98433ad74194 100644 --- a/drivers/gpu/drm/i915/Makefile +++ b/drivers/gpu/drm/i915/Makefile @@ -174,6 +174,7 @@ i915-y += \ i915_trace_points.o \ i915_ttm_buddy_manager.o \ i915_vma.o \ + i915_vma_resource.o \ i915_vma_snapshot.o \ intel_wopcm.o diff --git a/drivers/gpu/drm/i915/gem/i915_gem_execbuffer.c b/drivers/gpu/drm/i915/gem/i915_gem_execbuffer.c index 6013f7e18f60..b6faae1f9081 100644 --- a/drivers/gpu/drm/i915/gem/i915_gem_execbuffer.c +++ b/drivers/gpu/drm/i915/gem/i915_gem_execbuffer.c @@ -1422,7 +1422,7 @@ eb_relocate_entry(struct i915_execbuffer *eb, mutex_lock(&vma->vm->mutex); err = i915_vma_bind(target->vma, target->vma->obj->cache_level, - PIN_GLOBAL, NULL); + PIN_GLOBAL, NULL, NULL); mutex_unlock(&vma->vm->mutex); reloc_cache_remap(&eb->reloc_cache, ev->vma->obj); if (err) diff --git a/drivers/gpu/drm/i915/i915_vma.c b/drivers/gpu/drm/i915/i915_vma.c index d792a3d0da7a..4308659bf552 100644 --- a/drivers/gpu/drm/i915/i915_vma.c +++ b/drivers/gpu/drm/i915/i915_vma.c @@ -37,6 +37,7 @@ #include "i915_sw_fence_work.h" #include "i915_trace.h" #include "i915_vma.h" +#include "i915_vma_resource.h" static struct kmem_cache *slab_vmas; @@ -385,6 +386,8 @@ static int i915_vma_verify_bind_complete(struct i915_vma *vma) * @cache_level: mapping cache level * @flags: flags like global or local mapping * @work: preallocated worker for allocating and binding the PTE + * @vma_res: pointer to a preallocated vma resource. The resource is either + * consumed or freed. * * DMA addresses are taken from the scatter-gather table of this object (or of * this VMA in case of non-default GGTT views) and PTE entries set up. @@ -393,7 +396,8 @@ static int i915_vma_verify_bind_complete(struct i915_vma *vma) int i915_vma_bind(struct i915_vma *vma, enum i915_cache_level cache_level, u32 flags, - struct i915_vma_work *work) + struct i915_vma_work *work, + struct i915_vma_resource *vma_res) { u32 bind_flags; u32 vma_flags; @@ -404,11 +408,15 @@ int i915_vma_bind(struct i915_vma *vma, if (GEM_DEBUG_WARN_ON(range_overflows(vma->node.start, vma->node.size, - vma->vm->total))) + vma->vm->total))) { + kfree(vma_res); return -ENODEV; + } - if (GEM_DEBUG_WARN_ON(!flags)) + if (GEM_DEBUG_WARN_ON(!flags)) { + kfree(vma_res); return -EINVAL; + } bind_flags = flags; bind_flags &= I915_VMA_GLOBAL_BIND | I915_VMA_LOCAL_BIND; @@ -417,11 +425,21 @@ int i915_vma_bind(struct i915_vma *vma, vma_flags &= I915_VMA_GLOBAL_BIND | I915_VMA_LOCAL_BIND; bind_flags &= ~vma_flags; - if (bind_flags == 0) + if (bind_flags == 0) { + kfree(vma_res); return 0; + } GEM_BUG_ON(!vma->pages); + if (vma->resource || !vma_res) { + /* Rebinding with an additional I915_VMA_*_BIND */ + GEM_WARN_ON(!vma_flags); + kfree(vma_res); + } else { + i915_vma_resource_init(vma_res); + vma->resource = vma_res; + } trace_i915_vma_bind(vma, bind_flags); if (work && bind_flags & vma->vm->bind_async_flags) { struct dma_fence *prev; @@ -897,6 +915,7 @@ int i915_vma_pin_ww(struct i915_vma *vma, struct i915_gem_ww_ctx *ww, { struct i915_vma_work *work = NULL; struct dma_fence *moving = NULL; + struct i915_vma_resource *vma_res = NULL; intel_wakeref_t wakeref = 0; unsigned int bound; int err; @@ -953,6 +972,12 @@ int i915_vma_pin_ww(struct i915_vma *vma, struct i915_gem_ww_ctx *ww, } } + vma_res = i915_vma_resource_alloc(); + if (IS_ERR(vma_res)) { + err = PTR_ERR(vma_res); + goto err_fence; + } + /* * Differentiate between user/kernel vma inside the aliasing-ppgtt. * @@ -973,7 +998,7 @@ int i915_vma_pin_ww(struct i915_vma *vma, struct i915_gem_ww_ctx *ww, err = mutex_lock_interruptible_nested(&vma->vm->mutex, !(flags & PIN_GLOBAL)); if (err) - goto err_fence; + goto err_vma_res; /* No more allocations allowed now we hold vm->mutex */ @@ -1014,7 +1039,8 @@ int i915_vma_pin_ww(struct i915_vma *vma, struct i915_gem_ww_ctx *ww, GEM_BUG_ON(!vma->pages); err = i915_vma_bind(vma, vma->obj->cache_level, - flags, work); + flags, work, vma_res); + vma_res = NULL; if (err) goto err_remove; @@ -1037,6 +1063,8 @@ int i915_vma_pin_ww(struct i915_vma *vma, struct i915_gem_ww_ctx *ww, i915_active_release(&vma->active); err_unlock: mutex_unlock(&vma->vm->mutex); +err_vma_res: + kfree(vma_res); err_fence: if (work) dma_fence_work_commit_imm(&work->base); @@ -1170,6 +1198,7 @@ void i915_vma_release(struct kref *ref) i915_vm_put(vma->vm); i915_active_fini(&vma->active); + GEM_WARN_ON(vma->resource); i915_vma_free(vma); } @@ -1318,6 +1347,8 @@ int _i915_vma_move_to_active(struct i915_vma *vma, void __i915_vma_evict(struct i915_vma *vma) { + struct dma_fence *unbind_fence; + GEM_BUG_ON(i915_vma_is_pinned(vma)); if (i915_vma_is_map_and_fenceable(vma)) { @@ -1355,8 +1386,20 @@ void __i915_vma_evict(struct i915_vma *vma) atomic_and(~(I915_VMA_BIND_MASK | I915_VMA_ERROR | I915_VMA_GGTT_WRITE), &vma->flags); + unbind_fence = i915_vma_resource_unbind(vma->resource); + i915_vma_resource_put(vma->resource); + vma->resource = NULL; + i915_vma_detach(vma); vma_unbind_pages(vma); + + /* + * This uninterruptible wait under the vm mutex is currently + * only ever blocking while the vma is being captured from. + * With async unbinding, this wait here will be removed. + */ + dma_fence_wait(unbind_fence, false); + dma_fence_put(unbind_fence); } int __i915_vma_unbind(struct i915_vma *vma) diff --git a/drivers/gpu/drm/i915/i915_vma.h b/drivers/gpu/drm/i915/i915_vma.h index 4033aa08d5e4..947ca7b3f3fa 100644 --- a/drivers/gpu/drm/i915/i915_vma.h +++ b/drivers/gpu/drm/i915/i915_vma.h @@ -37,6 +37,7 @@ #include "i915_active.h" #include "i915_request.h" +#include "i915_vma_resource.h" #include "i915_vma_types.h" struct i915_vma * @@ -206,7 +207,8 @@ struct i915_vma_work *i915_vma_work(void); int i915_vma_bind(struct i915_vma *vma, enum i915_cache_level cache_level, u32 flags, - struct i915_vma_work *work); + struct i915_vma_work *work, + struct i915_vma_resource *vma_res); bool i915_gem_valid_gtt_space(struct i915_vma *vma, unsigned long color); bool i915_vma_misplaced(const struct i915_vma *vma, @@ -430,6 +432,21 @@ static inline int i915_vma_sync(struct i915_vma *vma) return i915_active_wait(&vma->active); } +/** + * i915_vma_get_current_resource - Get the current resource of the vma + * @vma: The vma to get the current resource from. + * + * It's illegal to call this function if the vma is not bound. + * + * Return: A refcounted pointer to the current vma resource + * of the vma, assuming the vma is bound. + */ +static inline struct i915_vma_resource * +i915_vma_get_current_resource(struct i915_vma *vma) +{ + return i915_vma_resource_get(vma->resource); +} + void i915_vma_module_exit(void); int i915_vma_module_init(void); diff --git a/drivers/gpu/drm/i915/i915_vma_resource.c b/drivers/gpu/drm/i915/i915_vma_resource.c new file mode 100644 index 000000000000..833e987bed2a --- /dev/null +++ b/drivers/gpu/drm/i915/i915_vma_resource.c @@ -0,0 +1,124 @@ +// SPDX-License-Identifier: MIT +/* + * Copyright © 2021 Intel Corporation + */ +#include + +#include "i915_vma_resource.h" + +/* Callbacks for the unbind dma-fence. */ +static const char *get_driver_name(struct dma_fence *fence) +{ + return "vma unbind fence"; +} + +static const char *get_timeline_name(struct dma_fence *fence) +{ + return "unbound"; +} + +static struct dma_fence_ops unbind_fence_ops = { + .get_driver_name = get_driver_name, + .get_timeline_name = get_timeline_name, +}; + +/** + * i915_vma_resource_init - Initialize a vma resource. + * @vma_res: The vma resource to initialize + * + * Initializes a vma resource allocated using i915_vma_resource_alloc(). + * The reason for having separate allocate and initialize function is that + * initialization may need to be performed from under a lock where + * allocation is not allowed. + */ +void i915_vma_resource_init(struct i915_vma_resource *vma_res) +{ + spin_lock_init(&vma_res->lock); + dma_fence_init(&vma_res->unbind_fence, &unbind_fence_ops, + &vma_res->lock, 0, 0); + refcount_set(&vma_res->hold_count, 1); +} + +/** + * i915_vma_resource_alloc - Allocate a vma resource + * + * Return: A pointer to a cleared struct i915_vma_resource or + * a -ENOMEM error pointer if allocation fails. + */ +struct i915_vma_resource *i915_vma_resource_alloc(void) +{ + struct i915_vma_resource *vma_res = + kzalloc(sizeof(*vma_res), GFP_KERNEL); + + return vma_res ? vma_res : ERR_PTR(-ENOMEM); +} + +static void __i915_vma_resource_unhold(struct i915_vma_resource *vma_res) +{ + if (refcount_dec_and_test(&vma_res->hold_count)) + dma_fence_signal(&vma_res->unbind_fence); +} + +/** + * i915_vma_resource_unhold - Unhold the signaling of the vma resource unbind + * fence. + * @vma_res: The vma resource. + * @lockdep_cookie: The lockdep cookie returned from i915_vma_resource_hold. + * + * The function may leave a dma_fence critical section. + */ +void i915_vma_resource_unhold(struct i915_vma_resource *vma_res, + bool lockdep_cookie) +{ + dma_fence_end_signalling(lockdep_cookie); + + if (IS_ENABLED(CONFIG_PROVE_LOCKING)) { + unsigned long irq_flags; + + /* Inefficient open-coded might_lock_irqsave() */ + spin_lock_irqsave(&vma_res->lock, irq_flags); + spin_unlock_irqrestore(&vma_res->lock, irq_flags); + } + + __i915_vma_resource_unhold(vma_res); +} + +/** + * i915_vma_resource_hold - Hold the signaling of the vma resource unbind fence. + * @vma_res: The vma resource. + * @lockdep_cookie: Pointer to a bool serving as a lockdep cooke that should + * be given as an argument to the pairing i915_vma_resource_unhold. + * + * If returning true, the function enters a dma_fence signalling critical + * section is not in one already. + * + * Return: true if holding successful, false if not. + */ +bool i915_vma_resource_hold(struct i915_vma_resource *vma_res, + bool *lockdep_cookie) +{ + bool held = refcount_inc_not_zero(&vma_res->hold_count); + + if (held) + *lockdep_cookie = dma_fence_begin_signalling(); + + return held; +} + +/** + * i915_vma_resource_unbind - Unbind a vma resource + * @vma_res: The vma resource to unbind. + * + * At this point this function does little more than publish a fence that + * signals immediately unless signaling is held back. + * + * Return: A refcounted pointer to a dma-fence that signals when unbinding is + * complete. + */ +struct dma_fence * +i915_vma_resource_unbind(struct i915_vma_resource *vma_res) +{ + __i915_vma_resource_unhold(vma_res); + dma_fence_get(&vma_res->unbind_fence); + return &vma_res->unbind_fence; +} diff --git a/drivers/gpu/drm/i915/i915_vma_resource.h b/drivers/gpu/drm/i915/i915_vma_resource.h new file mode 100644 index 000000000000..34744da23072 --- /dev/null +++ b/drivers/gpu/drm/i915/i915_vma_resource.h @@ -0,0 +1,70 @@ +/* SPDX-License-Identifier: MIT */ +/* + * Copyright © 2021 Intel Corporation + */ + +#ifndef __I915_VMA_RESOURCE_H__ +#define __I915_VMA_RESOURCE_H__ + +#include +#include + +/** + * struct i915_vma_resource - Snapshotted unbind information. + * @unbind_fence: Fence to mark unbinding complete. Note that this fence + * is not considered published until unbind is scheduled, and as such it + * is illegal to access this fence before scheduled unbind other than + * for refcounting. + * @lock: The @unbind_fence lock. We're also using it to protect the weak + * pointer to the struct i915_vma, @vma during lookup and takedown. + * @hold_count: Number of holders blocking the fence from finishing. + * The vma itself is keeping a hold, which is released when unbind + * is scheduled. + * + * The lifetime of a struct i915_vma_resource is from a binding request to + * the actual possible asynchronous unbind has completed. + */ +struct i915_vma_resource { + struct dma_fence unbind_fence; + /* See above for description of the lock. */ + spinlock_t lock; + refcount_t hold_count; +}; + +bool i915_vma_resource_hold(struct i915_vma_resource *vma_res, + bool *lockdep_cookie); + +void i915_vma_resource_unhold(struct i915_vma_resource *vma_res, + bool lockdep_cookie); + +struct i915_vma_resource *i915_vma_resource_alloc(void); + +struct dma_fence *i915_vma_resource_unbind(struct i915_vma_resource *vma_res); + +/** + * i915_vma_resource_get - Take a reference on a vma resource + * @vma_res: The vma resource on which to take a reference. + * + * Return: The @vma_res pointer + */ +static inline struct i915_vma_resource +*i915_vma_resource_get(struct i915_vma_resource *vma_res) +{ + dma_fence_get(&vma_res->unbind_fence); + return vma_res; +} + +/** + * i915_vma_resource_put - Release a reference to a struct i915_vma_resource + * @vma_res: The resource + */ +static inline void i915_vma_resource_put(struct i915_vma_resource *vma_res) +{ + dma_fence_put(&vma_res->unbind_fence); +} + +#if IS_ENABLED(CONFIG_DRM_I915_SELFTEST) +void i915_vma_resource_init(struct i915_vma_resource *vma_res); +#endif + +#endif diff --git a/drivers/gpu/drm/i915/i915_vma_snapshot.c b/drivers/gpu/drm/i915/i915_vma_snapshot.c index 2949ceea9884..f7333c7a2f5e 100644 --- a/drivers/gpu/drm/i915/i915_vma_snapshot.c +++ b/drivers/gpu/drm/i915/i915_vma_snapshot.c @@ -3,6 +3,7 @@ * Copyright © 2021 Intel Corporation */ +#include "i915_vma_resource.h" #include "i915_vma_snapshot.h" #include "i915_vma_types.h" #include "i915_vma.h" @@ -35,7 +36,7 @@ void i915_vma_snapshot_init(struct i915_vma_snapshot *vsnap, vsnap->pages_rsgt = i915_refct_sgt_get(vma->obj->mm.rsgt); vsnap->mr = vma->obj->mm.region; kref_init(&vsnap->kref); - vsnap->vma_resource = &vma->active; + vsnap->vma_resource = i915_vma_get_current_resource(vma); vsnap->onstack = false; vsnap->present = true; } @@ -62,6 +63,7 @@ static void vma_snapshot_release(struct kref *ref) container_of(ref, typeof(*vsnap), kref); vsnap->present = false; + i915_vma_resource_put(vsnap->vma_resource); if (vsnap->pages_rsgt) i915_refct_sgt_put(vsnap->pages_rsgt); if (!vsnap->onstack) @@ -109,12 +111,7 @@ void i915_vma_snapshot_put_onstack(struct i915_vma_snapshot *vsnap) bool i915_vma_snapshot_resource_pin(struct i915_vma_snapshot *vsnap, bool *lockdep_cookie) { - bool pinned = i915_active_acquire_if_busy(vsnap->vma_resource); - - if (pinned) - *lockdep_cookie = dma_fence_begin_signalling(); - - return pinned; + return i915_vma_resource_hold(vsnap->vma_resource, lockdep_cookie); } /** @@ -128,7 +125,5 @@ bool i915_vma_snapshot_resource_pin(struct i915_vma_snapshot *vsnap, void i915_vma_snapshot_resource_unpin(struct i915_vma_snapshot *vsnap, bool lockdep_cookie) { - dma_fence_end_signalling(lockdep_cookie); - - return i915_active_release(vsnap->vma_resource); + i915_vma_resource_unhold(vsnap->vma_resource, lockdep_cookie); } diff --git a/drivers/gpu/drm/i915/i915_vma_snapshot.h b/drivers/gpu/drm/i915/i915_vma_snapshot.h index 940581df4622..e74588dd676b 100644 --- a/drivers/gpu/drm/i915/i915_vma_snapshot.h +++ b/drivers/gpu/drm/i915/i915_vma_snapshot.h @@ -31,10 +31,7 @@ struct sg_table; * @pages_rsgt: The refcounted sg_table holding the reference for @pages if any. * @mr: The memory region pointed for the pages bound. * @kref: Reference for this structure. - * @vma_resource: FIXME: A means to keep the unbind fence from signaling. - * Temporarily while we have only sync unbinds, and still use the vma - * active, we use that. With async unbinding we need a signaling refcount - * for the unbind fence. + * @vma_resource: Pointer to the vma resource representing the vma binding. * @page_sizes: The vma GTT page sizes information. * @onstack: Whether the structure shouldn't be freed on final put. * @present: Whether the structure is present and initialized. @@ -49,7 +46,7 @@ struct i915_vma_snapshot { struct i915_refct_sgt *pages_rsgt; struct intel_memory_region *mr; struct kref kref; - struct i915_active *vma_resource; + struct i915_vma_resource *vma_resource; u32 page_sizes; bool onstack:1; bool present:1; diff --git a/drivers/gpu/drm/i915/i915_vma_types.h b/drivers/gpu/drm/i915/i915_vma_types.h index f03fa96a1701..0564cee3ffe0 100644 --- a/drivers/gpu/drm/i915/i915_vma_types.h +++ b/drivers/gpu/drm/i915/i915_vma_types.h @@ -95,6 +95,8 @@ enum i915_cache_level; * */ +struct i915_vma_resource; + struct intel_remapped_plane_info { /* in gtt pages */ u32 offset:31; @@ -292,6 +294,9 @@ struct i915_vma { struct list_head evict_link; struct list_head closed_link; + + /** The async vma resource. Protected by the vm_mutex */ + struct i915_vma_resource *resource; }; #endif diff --git a/drivers/gpu/drm/i915/selftests/i915_gem_gtt.c b/drivers/gpu/drm/i915/selftests/i915_gem_gtt.c index 46f4236039a9..29b48fedc675 100644 --- a/drivers/gpu/drm/i915/selftests/i915_gem_gtt.c +++ b/drivers/gpu/drm/i915/selftests/i915_gem_gtt.c @@ -32,6 +32,7 @@ #include "i915_random.h" #include "i915_selftest.h" +#include "i915_vma_resource.h" #include "mock_drm.h" #include "mock_gem_device.h" @@ -1336,6 +1337,33 @@ static int igt_mock_drunk(void *arg) return exercise_mock(ggtt->vm.i915, drunk_hole); } +static int reserve_gtt_with_resource(struct i915_vma *vma, u64 offset) +{ + struct i915_address_space *vm = vma->vm; + struct i915_vma_resource *vma_res; + struct drm_i915_gem_object *obj = vma->obj; + int err; + + vma_res = i915_vma_resource_alloc(); + if (IS_ERR(vma_res)) + return PTR_ERR(vma_res); + + mutex_lock(&vm->mutex); + err = i915_gem_gtt_reserve(vm, &vma->node, obj->base.size, + offset, + obj->cache_level, + 0); + if (!err) { + i915_vma_resource_init(vma_res); + vma->resource = vma_res; + } else { + kfree(vma_res); + } + mutex_unlock(&vm->mutex); + + return err; +} + static int igt_gtt_reserve(void *arg) { struct i915_ggtt *ggtt = arg; @@ -1370,20 +1398,13 @@ static int igt_gtt_reserve(void *arg) } list_add(&obj->st_link, &objects); - vma = i915_vma_instance(obj, &ggtt->vm, NULL); if (IS_ERR(vma)) { err = PTR_ERR(vma); goto out; } - mutex_lock(&ggtt->vm.mutex); - err = i915_gem_gtt_reserve(&ggtt->vm, &vma->node, - obj->base.size, - total, - obj->cache_level, - 0); - mutex_unlock(&ggtt->vm.mutex); + err = reserve_gtt_with_resource(vma, total); if (err) { pr_err("i915_gem_gtt_reserve (pass 1) failed at %llu/%llu with err=%d\n", total, ggtt->vm.total, err); @@ -1429,13 +1450,7 @@ static int igt_gtt_reserve(void *arg) goto out; } - mutex_lock(&ggtt->vm.mutex); - err = i915_gem_gtt_reserve(&ggtt->vm, &vma->node, - obj->base.size, - total, - obj->cache_level, - 0); - mutex_unlock(&ggtt->vm.mutex); + err = reserve_gtt_with_resource(vma, total); if (err) { pr_err("i915_gem_gtt_reserve (pass 2) failed at %llu/%llu with err=%d\n", total, ggtt->vm.total, err); @@ -1476,13 +1491,7 @@ static int igt_gtt_reserve(void *arg) 2 * I915_GTT_PAGE_SIZE, I915_GTT_MIN_ALIGNMENT); - mutex_lock(&ggtt->vm.mutex); - err = i915_gem_gtt_reserve(&ggtt->vm, &vma->node, - obj->base.size, - offset, - obj->cache_level, - 0); - mutex_unlock(&ggtt->vm.mutex); + err = reserve_gtt_with_resource(vma, offset); if (err) { pr_err("i915_gem_gtt_reserve (pass 3) failed at %llu/%llu with err=%d\n", total, ggtt->vm.total, err); @@ -1509,6 +1518,31 @@ static int igt_gtt_reserve(void *arg) return err; } +static int insert_gtt_with_resource(struct i915_vma *vma) +{ + struct i915_address_space *vm = vma->vm; + struct i915_vma_resource *vma_res; + struct drm_i915_gem_object *obj = vma->obj; + int err; + + vma_res = i915_vma_resource_alloc(); + if (IS_ERR(vma_res)) + return PTR_ERR(vma_res); + + mutex_lock(&vm->mutex); + err = i915_gem_gtt_insert(vm, &vma->node, obj->base.size, 0, + obj->cache_level, 0, vm->total, 0); + if (!err) { + i915_vma_resource_init(vma_res); + vma->resource = vma_res; + } else { + kfree(vma_res); + } + mutex_unlock(&vm->mutex); + + return err; +} + static int igt_gtt_insert(void *arg) { struct i915_ggtt *ggtt = arg; @@ -1593,12 +1627,7 @@ static int igt_gtt_insert(void *arg) goto out; } - mutex_lock(&ggtt->vm.mutex); - err = i915_gem_gtt_insert(&ggtt->vm, &vma->node, - obj->base.size, 0, obj->cache_level, - 0, ggtt->vm.total, - 0); - mutex_unlock(&ggtt->vm.mutex); + err = insert_gtt_with_resource(vma); if (err == -ENOSPC) { /* maxed out the GGTT space */ i915_gem_object_put(obj); @@ -1653,12 +1682,7 @@ static int igt_gtt_insert(void *arg) goto out; } - mutex_lock(&ggtt->vm.mutex); - err = i915_gem_gtt_insert(&ggtt->vm, &vma->node, - obj->base.size, 0, obj->cache_level, - 0, ggtt->vm.total, - 0); - mutex_unlock(&ggtt->vm.mutex); + err = insert_gtt_with_resource(vma); if (err) { pr_err("i915_gem_gtt_insert (pass 2) failed at %llu/%llu with err=%d\n", total, ggtt->vm.total, err); @@ -1702,12 +1726,7 @@ static int igt_gtt_insert(void *arg) goto out; } - mutex_lock(&ggtt->vm.mutex); - err = i915_gem_gtt_insert(&ggtt->vm, &vma->node, - obj->base.size, 0, obj->cache_level, - 0, ggtt->vm.total, - 0); - mutex_unlock(&ggtt->vm.mutex); + err = insert_gtt_with_resource(vma); if (err) { pr_err("i915_gem_gtt_insert (pass 3) failed at %llu/%llu with err=%d\n", total, ggtt->vm.total, err); From patchwork Fri Dec 17 09:19:27 2021 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 8bit X-Patchwork-Submitter: Thomas Hellstrom X-Patchwork-Id: 12683957 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from gabe.freedesktop.org (gabe.freedesktop.org [131.252.210.177]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.lore.kernel.org (Postfix) with ESMTPS id 5A85EC433EF for ; Fri, 17 Dec 2021 09:20:04 +0000 (UTC) Received: from gabe.freedesktop.org (localhost [127.0.0.1]) by gabe.freedesktop.org (Postfix) with ESMTP id 7218110FA3B; Fri, 17 Dec 2021 09:19:51 +0000 (UTC) Received: from mga06.intel.com (mga06.intel.com [134.134.136.31]) by gabe.freedesktop.org (Postfix) with ESMTPS id 052F010FA3E; Fri, 17 Dec 2021 09:19:48 +0000 (UTC) X-IronPort-AV: E=McAfee;i="6200,9189,10200"; a="300493343" X-IronPort-AV: E=Sophos;i="5.88,213,1635231600"; d="scan'208";a="300493343" Received: from orsmga005.jf.intel.com ([10.7.209.41]) by orsmga104.jf.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 17 Dec 2021 01:19:48 -0800 X-IronPort-AV: E=Sophos;i="5.88,213,1635231600"; d="scan'208";a="683321545" Received: from olindum-mobl1.ger.corp.intel.com (HELO thellstr-mobl1.intel.com) ([10.249.254.180]) by orsmga005-auth.jf.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 17 Dec 2021 01:19:47 -0800 From: =?utf-8?q?Thomas_Hellstr=C3=B6m?= To: intel-gfx@lists.freedesktop.org, dri-devel@lists.freedesktop.org Date: Fri, 17 Dec 2021 10:19:27 +0100 Message-Id: <20211217091929.105781-6-thomas.hellstrom@linux.intel.com> X-Mailer: git-send-email 2.31.1 In-Reply-To: <20211217091929.105781-1-thomas.hellstrom@linux.intel.com> References: <20211217091929.105781-1-thomas.hellstrom@linux.intel.com> MIME-Version: 1.0 Subject: [Intel-gfx] [PATCH v2 5/7] drm/i915: Use the vma resource as argument for gtt binding / unbinding X-BeenThere: intel-gfx@lists.freedesktop.org X-Mailman-Version: 2.1.29 Precedence: list List-Id: Intel graphics driver community testing & development List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Cc: =?utf-8?q?Thomas_Hellstr=C3=B6m?= , matthew.auld@intel.com Errors-To: intel-gfx-bounces@lists.freedesktop.org Sender: "Intel-gfx" When introducing asynchronous unbinding, the vma itself may no longer be alive when the actual binding or unbinding takes place. Update the gtt i915_vma_ops accordingly to take a struct i915_vma_resource instead of a struct i915_vma for the bind_vma() and unbind_vma() ops. Similarly change the insert_entries() op for struct i915_address_space. Replace a couple of i915_vma_snapshot members with their newly introduced i915_vma_resource counterparts, since they have the same lifetime. Also make sure to avoid changing the struct i915_vma_flags (in particular the bind flags) async. That should now only be done sync under the vm mutex. v2: - Update the vma_res::bound_flags when binding to the aliased ggtt Signed-off-by: Thomas Hellström --- drivers/gpu/drm/i915/display/intel_dpt.c | 27 ++--- .../gpu/drm/i915/gem/i915_gem_object_types.h | 27 +---- .../gpu/drm/i915/gem/selftests/huge_pages.c | 37 +++---- drivers/gpu/drm/i915/gt/gen6_ppgtt.c | 19 ++-- drivers/gpu/drm/i915/gt/gen8_ppgtt.c | 37 +++---- drivers/gpu/drm/i915/gt/intel_engine_cs.c | 4 +- drivers/gpu/drm/i915/gt/intel_ggtt.c | 70 ++++++------- drivers/gpu/drm/i915/gt/intel_gtt.h | 15 +-- drivers/gpu/drm/i915/gt/intel_ppgtt.c | 22 +++-- drivers/gpu/drm/i915/gt/uc/intel_uc_fw.c | 13 ++- drivers/gpu/drm/i915/gt/uc/intel_uc_fw.h | 2 +- drivers/gpu/drm/i915/i915_debugfs.c | 3 +- drivers/gpu/drm/i915/i915_gpu_error.c | 6 +- drivers/gpu/drm/i915/i915_vma.c | 24 ++++- drivers/gpu/drm/i915/i915_vma.h | 11 +-- drivers/gpu/drm/i915/i915_vma_resource.c | 9 +- drivers/gpu/drm/i915/i915_vma_resource.h | 99 ++++++++++++++++++- drivers/gpu/drm/i915/i915_vma_snapshot.c | 4 - drivers/gpu/drm/i915/i915_vma_snapshot.h | 8 -- drivers/gpu/drm/i915/selftests/i915_gem_gtt.c | 64 ++++++++---- drivers/gpu/drm/i915/selftests/mock_gtt.c | 12 +-- 21 files changed, 307 insertions(+), 206 deletions(-) diff --git a/drivers/gpu/drm/i915/display/intel_dpt.c b/drivers/gpu/drm/i915/display/intel_dpt.c index 963ca7155b06..f9f2a4ef38cd 100644 --- a/drivers/gpu/drm/i915/display/intel_dpt.c +++ b/drivers/gpu/drm/i915/display/intel_dpt.c @@ -48,7 +48,7 @@ static void dpt_insert_page(struct i915_address_space *vm, } static void dpt_insert_entries(struct i915_address_space *vm, - struct i915_vma *vma, + struct i915_vma_resource *vma_res, enum i915_cache_level level, u32 flags) { @@ -64,8 +64,8 @@ static void dpt_insert_entries(struct i915_address_space *vm, * not to allow the user to override access to a read only page. */ - i = vma->node.start / I915_GTT_PAGE_SIZE; - for_each_sgt_daddr(addr, sgt_iter, vma->pages) + i = vma_res->start / I915_GTT_PAGE_SIZE; + for_each_sgt_daddr(addr, sgt_iter, vma_res->bi.pages) gen8_set_pte(&base[i++], pte_encode | addr); } @@ -76,35 +76,38 @@ static void dpt_clear_range(struct i915_address_space *vm, static void dpt_bind_vma(struct i915_address_space *vm, struct i915_vm_pt_stash *stash, - struct i915_vma *vma, + struct i915_vma_resource *vma_res, enum i915_cache_level cache_level, u32 flags) { - struct drm_i915_gem_object *obj = vma->obj; u32 pte_flags; + if (vma_res->bound_flags) + return; + /* Applicable to VLV (gen8+ do not support RO in the GGTT) */ pte_flags = 0; - if (vma->vm->has_read_only && i915_gem_object_is_readonly(obj)) + if (vm->has_read_only && vma_res->bi.readonly) pte_flags |= PTE_READ_ONLY; - if (i915_gem_object_is_lmem(obj)) + if (vma_res->bi.lmem) pte_flags |= PTE_LM; - vma->vm->insert_entries(vma->vm, vma, cache_level, pte_flags); + vm->insert_entries(vm, vma_res, cache_level, pte_flags); - vma->page_sizes.gtt = I915_GTT_PAGE_SIZE; + vma_res->page_sizes_gtt = I915_GTT_PAGE_SIZE; /* * Without aliasing PPGTT there's no difference between * GLOBAL/LOCAL_BIND, it's all the same ptes. Hence unconditionally * upgrade to both bound if we bind either to avoid double-binding. */ - atomic_or(I915_VMA_GLOBAL_BIND | I915_VMA_LOCAL_BIND, &vma->flags); + vma_res->bound_flags = I915_VMA_GLOBAL_BIND | I915_VMA_LOCAL_BIND; } -static void dpt_unbind_vma(struct i915_address_space *vm, struct i915_vma *vma) +static void dpt_unbind_vma(struct i915_address_space *vm, + struct i915_vma_resource *vma_res) { - vm->clear_range(vm, vma->node.start, vma->size); + vm->clear_range(vm, vma_res->start, vma_res->vma_size); } static void dpt_cleanup(struct i915_address_space *vm) diff --git a/drivers/gpu/drm/i915/gem/i915_gem_object_types.h b/drivers/gpu/drm/i915/gem/i915_gem_object_types.h index f9f7e44099fe..f99d260e0684 100644 --- a/drivers/gpu/drm/i915/gem/i915_gem_object_types.h +++ b/drivers/gpu/drm/i915/gem/i915_gem_object_types.h @@ -15,6 +15,7 @@ #include "i915_active.h" #include "i915_selftest.h" +#include "i915_vma_resource.h" struct drm_i915_gem_object; struct intel_fronbuffer; @@ -549,31 +550,7 @@ struct drm_i915_gem_object { struct sg_table *pages; void *mapping; - struct i915_page_sizes { - /** - * The sg mask of the pages sg_table. i.e the mask of - * of the lengths for each sg entry. - */ - unsigned int phys; - - /** - * The gtt page sizes we are allowed to use given the - * sg mask and the supported page sizes. This will - * express the smallest unit we can use for the whole - * object, as well as the larger sizes we may be able - * to use opportunistically. - */ - unsigned int sg; - - /** - * The actual gtt page size usage. Since we can have - * multiple vma associated with this object we need to - * prevent any trampling of state, hence a copy of this - * struct also lives in each vma, therefore the gtt - * value here should only be read/write through the vma. - */ - unsigned int gtt; - } page_sizes; + struct i915_page_sizes page_sizes; I915_SELFTEST_DECLARE(unsigned int page_mask); diff --git a/drivers/gpu/drm/i915/gem/selftests/huge_pages.c b/drivers/gpu/drm/i915/gem/selftests/huge_pages.c index c69c7d45aabc..52c82cd4b548 100644 --- a/drivers/gpu/drm/i915/gem/selftests/huge_pages.c +++ b/drivers/gpu/drm/i915/gem/selftests/huge_pages.c @@ -370,9 +370,9 @@ static int igt_check_page_sizes(struct i915_vma *vma) err = -EINVAL; } - if (!HAS_PAGE_SIZES(i915, vma->page_sizes.gtt)) { + if (!HAS_PAGE_SIZES(i915, vma->resource->page_sizes_gtt)) { pr_err("unsupported page_sizes.gtt=%u, supported=%u\n", - vma->page_sizes.gtt & ~supported, supported); + vma->resource->page_sizes_gtt & ~supported, supported); err = -EINVAL; } @@ -403,15 +403,9 @@ static int igt_check_page_sizes(struct i915_vma *vma) if (i915_gem_object_is_lmem(obj) && IS_ALIGNED(vma->node.start, SZ_2M) && vma->page_sizes.sg & SZ_2M && - vma->page_sizes.gtt < SZ_2M) { + vma->resource->page_sizes_gtt < SZ_2M) { pr_err("gtt pages mismatch for LMEM, expected 2M GTT pages, sg(%u), gtt(%u)\n", - vma->page_sizes.sg, vma->page_sizes.gtt); - err = -EINVAL; - } - - if (obj->mm.page_sizes.gtt) { - pr_err("obj->page_sizes.gtt(%u) should never be set\n", - obj->mm.page_sizes.gtt); + vma->page_sizes.sg, vma->resource->page_sizes_gtt); err = -EINVAL; } @@ -547,9 +541,9 @@ static int igt_mock_memory_region_huge_pages(void *arg) goto out_unpin; } - if (vma->page_sizes.gtt != page_size) { + if (vma->resource->page_sizes_gtt != page_size) { pr_err("%s page_sizes.gtt=%u, expected=%u\n", - __func__, vma->page_sizes.gtt, + __func__, vma->resource->page_sizes_gtt, page_size); err = -EINVAL; goto out_unpin; @@ -630,9 +624,9 @@ static int igt_mock_ppgtt_misaligned_dma(void *arg) err = igt_check_page_sizes(vma); - if (vma->page_sizes.gtt != page_size) { + if (vma->resource->page_sizes_gtt != page_size) { pr_err("page_sizes.gtt=%u, expected %u\n", - vma->page_sizes.gtt, page_size); + vma->resource->page_sizes_gtt, page_size); err = -EINVAL; } @@ -657,9 +651,10 @@ static int igt_mock_ppgtt_misaligned_dma(void *arg) err = igt_check_page_sizes(vma); - if (vma->page_sizes.gtt != I915_GTT_PAGE_SIZE_4K) { + if (vma->resource->page_sizes_gtt != I915_GTT_PAGE_SIZE_4K) { pr_err("page_sizes.gtt=%u, expected %llu\n", - vma->page_sizes.gtt, I915_GTT_PAGE_SIZE_4K); + vma->resource->page_sizes_gtt, + I915_GTT_PAGE_SIZE_4K); err = -EINVAL; } @@ -805,9 +800,9 @@ static int igt_mock_ppgtt_huge_fill(void *arg) } } - if (vma->page_sizes.gtt != expected_gtt) { + if (vma->resource->page_sizes_gtt != expected_gtt) { pr_err("gtt=%u, expected=%u, size=%zd, single=%s\n", - vma->page_sizes.gtt, expected_gtt, + vma->resource->page_sizes_gtt, expected_gtt, obj->base.size, yesno(!!single)); err = -EINVAL; break; @@ -961,10 +956,10 @@ static int igt_mock_ppgtt_64K(void *arg) } } - if (vma->page_sizes.gtt != expected_gtt) { + if (vma->resource->page_sizes_gtt != expected_gtt) { pr_err("gtt=%u, expected=%u, i=%d, single=%s\n", - vma->page_sizes.gtt, expected_gtt, i, - yesno(!!single)); + vma->resource->page_sizes_gtt, + expected_gtt, i, yesno(!!single)); err = -EINVAL; goto out_vma_unpin; } diff --git a/drivers/gpu/drm/i915/gt/gen6_ppgtt.c b/drivers/gpu/drm/i915/gt/gen6_ppgtt.c index c0d149f04949..37ae471cbae9 100644 --- a/drivers/gpu/drm/i915/gt/gen6_ppgtt.c +++ b/drivers/gpu/drm/i915/gt/gen6_ppgtt.c @@ -104,17 +104,17 @@ static void gen6_ppgtt_clear_range(struct i915_address_space *vm, } static void gen6_ppgtt_insert_entries(struct i915_address_space *vm, - struct i915_vma *vma, + struct i915_vma_resource *vma_res, enum i915_cache_level cache_level, u32 flags) { struct i915_ppgtt *ppgtt = i915_vm_to_ppgtt(vm); struct i915_page_directory * const pd = ppgtt->pd; - unsigned int first_entry = vma->node.start / I915_GTT_PAGE_SIZE; + unsigned int first_entry = vma_res->start / I915_GTT_PAGE_SIZE; unsigned int act_pt = first_entry / GEN6_PTES; unsigned int act_pte = first_entry % GEN6_PTES; const u32 pte_encode = vm->pte_encode(0, cache_level, flags); - struct sgt_dma iter = sgt_dma(vma); + struct sgt_dma iter = sgt_dma(vma_res); gen6_pte_t *vaddr; GEM_BUG_ON(!pd->entry[act_pt]); @@ -140,7 +140,7 @@ static void gen6_ppgtt_insert_entries(struct i915_address_space *vm, } } while (1); - vma->page_sizes.gtt = I915_GTT_PAGE_SIZE; + vma_res->page_sizes_gtt = I915_GTT_PAGE_SIZE; } static void gen6_flush_pd(struct gen6_ppgtt *ppgtt, u64 start, u64 end) @@ -284,13 +284,13 @@ static void pd_vma_clear_pages(struct i915_vma *vma) static void pd_vma_bind(struct i915_address_space *vm, struct i915_vm_pt_stash *stash, - struct i915_vma *vma, + struct i915_vma_resource *vma_res, enum i915_cache_level cache_level, u32 unused) { struct i915_ggtt *ggtt = i915_vm_to_ggtt(vm); - struct gen6_ppgtt *ppgtt = vma->private; - u32 ggtt_offset = i915_ggtt_offset(vma) / I915_GTT_PAGE_SIZE; + struct gen6_ppgtt *ppgtt = vma_res->private; + u32 ggtt_offset = vma_res->start / I915_GTT_PAGE_SIZE; ppgtt->pp_dir = ggtt_offset * sizeof(gen6_pte_t) << 10; ppgtt->pd_addr = (gen6_pte_t __iomem *)ggtt->gsm + ggtt_offset; @@ -298,9 +298,10 @@ static void pd_vma_bind(struct i915_address_space *vm, gen6_flush_pd(ppgtt, 0, ppgtt->base.vm.total); } -static void pd_vma_unbind(struct i915_address_space *vm, struct i915_vma *vma) +static void pd_vma_unbind(struct i915_address_space *vm, + struct i915_vma_resource *vma_res) { - struct gen6_ppgtt *ppgtt = vma->private; + struct gen6_ppgtt *ppgtt = vma_res->private; struct i915_page_directory * const pd = ppgtt->base.pd; struct i915_page_table *pt; unsigned int pde; diff --git a/drivers/gpu/drm/i915/gt/gen8_ppgtt.c b/drivers/gpu/drm/i915/gt/gen8_ppgtt.c index b012c50f7ce7..c43e724afa9f 100644 --- a/drivers/gpu/drm/i915/gt/gen8_ppgtt.c +++ b/drivers/gpu/drm/i915/gt/gen8_ppgtt.c @@ -453,20 +453,21 @@ gen8_ppgtt_insert_pte(struct i915_ppgtt *ppgtt, return idx; } -static void gen8_ppgtt_insert_huge(struct i915_vma *vma, +static void gen8_ppgtt_insert_huge(struct i915_address_space *vm, + struct i915_vma_resource *vma_res, struct sgt_dma *iter, enum i915_cache_level cache_level, u32 flags) { const gen8_pte_t pte_encode = gen8_pte_encode(0, cache_level, flags); unsigned int rem = sg_dma_len(iter->sg); - u64 start = vma->node.start; + u64 start = vma_res->start; - GEM_BUG_ON(!i915_vm_is_4lvl(vma->vm)); + GEM_BUG_ON(!i915_vm_is_4lvl(vm)); do { struct i915_page_directory * const pdp = - gen8_pdp_for_page_address(vma->vm, start); + gen8_pdp_for_page_address(vm, start); struct i915_page_directory * const pd = i915_pd_entry(pdp, __gen8_pte_index(start, 2)); gen8_pte_t encode = pte_encode; @@ -475,7 +476,7 @@ static void gen8_ppgtt_insert_huge(struct i915_vma *vma, gen8_pte_t *vaddr; u16 index; - if (vma->page_sizes.sg & I915_GTT_PAGE_SIZE_2M && + if (vma_res->bi.page_sizes.sg & I915_GTT_PAGE_SIZE_2M && IS_ALIGNED(iter->dma, I915_GTT_PAGE_SIZE_2M) && rem >= I915_GTT_PAGE_SIZE_2M && !__gen8_pte_index(start, 0)) { @@ -492,7 +493,7 @@ static void gen8_ppgtt_insert_huge(struct i915_vma *vma, page_size = I915_GTT_PAGE_SIZE; if (!index && - vma->page_sizes.sg & I915_GTT_PAGE_SIZE_64K && + vma_res->bi.page_sizes.sg & I915_GTT_PAGE_SIZE_64K && IS_ALIGNED(iter->dma, I915_GTT_PAGE_SIZE_64K) && (IS_ALIGNED(rem, I915_GTT_PAGE_SIZE_64K) || rem >= (I915_PDES - index) * I915_GTT_PAGE_SIZE)) @@ -541,9 +542,9 @@ static void gen8_ppgtt_insert_huge(struct i915_vma *vma, */ if (maybe_64K != -1 && (index == I915_PDES || - (i915_vm_has_scratch_64K(vma->vm) && - !iter->sg && IS_ALIGNED(vma->node.start + - vma->node.size, + (i915_vm_has_scratch_64K(vm) && + !iter->sg && IS_ALIGNED(vma_res->start + + vma_res->node_size, I915_GTT_PAGE_SIZE_2M)))) { vaddr = px_vaddr(pd); vaddr[maybe_64K] |= GEN8_PDE_IPS_64K; @@ -559,10 +560,10 @@ static void gen8_ppgtt_insert_huge(struct i915_vma *vma, * instead - which we detect as missing results during * selftests. */ - if (I915_SELFTEST_ONLY(vma->vm->scrub_64K)) { + if (I915_SELFTEST_ONLY(vm->scrub_64K)) { u16 i; - encode = vma->vm->scratch[0]->encode; + encode = vm->scratch[0]->encode; vaddr = px_vaddr(i915_pt_entry(pd, maybe_64K)); for (i = 1; i < index; i += 16) @@ -572,22 +573,22 @@ static void gen8_ppgtt_insert_huge(struct i915_vma *vma, } } - vma->page_sizes.gtt |= page_size; + vma_res->page_sizes_gtt |= page_size; } while (iter->sg && sg_dma_len(iter->sg)); } static void gen8_ppgtt_insert(struct i915_address_space *vm, - struct i915_vma *vma, + struct i915_vma_resource *vma_res, enum i915_cache_level cache_level, u32 flags) { struct i915_ppgtt * const ppgtt = i915_vm_to_ppgtt(vm); - struct sgt_dma iter = sgt_dma(vma); + struct sgt_dma iter = sgt_dma(vma_res); - if (vma->page_sizes.sg > I915_GTT_PAGE_SIZE) { - gen8_ppgtt_insert_huge(vma, &iter, cache_level, flags); + if (vma_res->bi.page_sizes.sg > I915_GTT_PAGE_SIZE) { + gen8_ppgtt_insert_huge(vm, vma_res, &iter, cache_level, flags); } else { - u64 idx = vma->node.start >> GEN8_PTE_SHIFT; + u64 idx = vma_res->start >> GEN8_PTE_SHIFT; do { struct i915_page_directory * const pdp = @@ -597,7 +598,7 @@ static void gen8_ppgtt_insert(struct i915_address_space *vm, cache_level, flags); } while (idx); - vma->page_sizes.gtt = I915_GTT_PAGE_SIZE; + vma_res->page_sizes_gtt = I915_GTT_PAGE_SIZE; } } diff --git a/drivers/gpu/drm/i915/gt/intel_engine_cs.c b/drivers/gpu/drm/i915/gt/intel_engine_cs.c index 352254e001b4..74aa90587061 100644 --- a/drivers/gpu/drm/i915/gt/intel_engine_cs.c +++ b/drivers/gpu/drm/i915/gt/intel_engine_cs.c @@ -1718,8 +1718,8 @@ static void print_request_ring(struct drm_printer *m, struct i915_request *rq) drm_printf(m, "[head %04x, postfix %04x, tail %04x, batch 0x%08x_%08x]:\n", rq->head, rq->postfix, rq->tail, - vsnap ? upper_32_bits(vsnap->gtt_offset) : ~0u, - vsnap ? lower_32_bits(vsnap->gtt_offset) : ~0u); + vsnap ? upper_32_bits(vsnap->vma_resource->start) : ~0u, + vsnap ? lower_32_bits(vsnap->vma_resource->start) : ~0u); size = rq->tail - rq->head; if (rq->tail < rq->head) diff --git a/drivers/gpu/drm/i915/gt/intel_ggtt.c b/drivers/gpu/drm/i915/gt/intel_ggtt.c index d85a1050f4a8..ad54941fcb98 100644 --- a/drivers/gpu/drm/i915/gt/intel_ggtt.c +++ b/drivers/gpu/drm/i915/gt/intel_ggtt.c @@ -238,7 +238,7 @@ static void gen8_ggtt_insert_page(struct i915_address_space *vm, } static void gen8_ggtt_insert_entries(struct i915_address_space *vm, - struct i915_vma *vma, + struct i915_vma_resource *vma_res, enum i915_cache_level level, u32 flags) { @@ -255,10 +255,10 @@ static void gen8_ggtt_insert_entries(struct i915_address_space *vm, */ gte = (gen8_pte_t __iomem *)ggtt->gsm; - gte += vma->node.start / I915_GTT_PAGE_SIZE; - end = gte + vma->node.size / I915_GTT_PAGE_SIZE; + gte += vma_res->start / I915_GTT_PAGE_SIZE; + end = gte + vma_res->node_size / I915_GTT_PAGE_SIZE; - for_each_sgt_daddr(addr, iter, vma->pages) + for_each_sgt_daddr(addr, iter, vma_res->bi.pages) gen8_set_pte(gte++, pte_encode | addr); GEM_BUG_ON(gte > end); @@ -295,7 +295,7 @@ static void gen6_ggtt_insert_page(struct i915_address_space *vm, * through the GMADR mapped BAR (i915->mm.gtt->gtt). */ static void gen6_ggtt_insert_entries(struct i915_address_space *vm, - struct i915_vma *vma, + struct i915_vma_resource *vma_res, enum i915_cache_level level, u32 flags) { @@ -306,10 +306,10 @@ static void gen6_ggtt_insert_entries(struct i915_address_space *vm, dma_addr_t addr; gte = (gen6_pte_t __iomem *)ggtt->gsm; - gte += vma->node.start / I915_GTT_PAGE_SIZE; - end = gte + vma->node.size / I915_GTT_PAGE_SIZE; + gte += vma_res->start / I915_GTT_PAGE_SIZE; + end = gte + vma_res->node_size / I915_GTT_PAGE_SIZE; - for_each_sgt_daddr(addr, iter, vma->pages) + for_each_sgt_daddr(addr, iter, vma_res->bi.pages) iowrite32(vm->pte_encode(addr, level, flags), gte++); GEM_BUG_ON(gte > end); @@ -392,7 +392,7 @@ static void bxt_vtd_ggtt_insert_page__BKL(struct i915_address_space *vm, struct insert_entries { struct i915_address_space *vm; - struct i915_vma *vma; + struct i915_vma_resource *vma_res; enum i915_cache_level level; u32 flags; }; @@ -401,18 +401,18 @@ static int bxt_vtd_ggtt_insert_entries__cb(void *_arg) { struct insert_entries *arg = _arg; - gen8_ggtt_insert_entries(arg->vm, arg->vma, arg->level, arg->flags); + gen8_ggtt_insert_entries(arg->vm, arg->vma_res, arg->level, arg->flags); bxt_vtd_ggtt_wa(arg->vm); return 0; } static void bxt_vtd_ggtt_insert_entries__BKL(struct i915_address_space *vm, - struct i915_vma *vma, + struct i915_vma_resource *vma_res, enum i915_cache_level level, u32 flags) { - struct insert_entries arg = { vm, vma, level, flags }; + struct insert_entries arg = { vm, vma_res, level, flags }; stop_machine(bxt_vtd_ggtt_insert_entries__cb, &arg, NULL); } @@ -451,14 +451,14 @@ static void i915_ggtt_insert_page(struct i915_address_space *vm, } static void i915_ggtt_insert_entries(struct i915_address_space *vm, - struct i915_vma *vma, + struct i915_vma_resource *vma_res, enum i915_cache_level cache_level, u32 unused) { unsigned int flags = (cache_level == I915_CACHE_NONE) ? AGP_USER_MEMORY : AGP_USER_CACHED_MEMORY; - intel_gtt_insert_sg_entries(vma->pages, vma->node.start >> PAGE_SHIFT, + intel_gtt_insert_sg_entries(vma_res->bi.pages, vma_res->start >> PAGE_SHIFT, flags); } @@ -470,30 +470,32 @@ static void i915_ggtt_clear_range(struct i915_address_space *vm, static void ggtt_bind_vma(struct i915_address_space *vm, struct i915_vm_pt_stash *stash, - struct i915_vma *vma, + struct i915_vma_resource *vma_res, enum i915_cache_level cache_level, u32 flags) { - struct drm_i915_gem_object *obj = vma->obj; u32 pte_flags; - if (i915_vma_is_bound(vma, ~flags & I915_VMA_BIND_MASK)) + if (vma_res->bound_flags & (~flags & I915_VMA_BIND_MASK)) return; + vma_res->bound_flags |= flags; + /* Applicable to VLV (gen8+ do not support RO in the GGTT) */ pte_flags = 0; - if (i915_gem_object_is_readonly(obj)) + if (vma_res->bi.readonly) pte_flags |= PTE_READ_ONLY; - if (i915_gem_object_is_lmem(obj)) + if (vma_res->bi.lmem) pte_flags |= PTE_LM; - vm->insert_entries(vm, vma, cache_level, pte_flags); - vma->page_sizes.gtt = I915_GTT_PAGE_SIZE; + vm->insert_entries(vm, vma_res, cache_level, pte_flags); + vma_res->page_sizes_gtt = I915_GTT_PAGE_SIZE; } -static void ggtt_unbind_vma(struct i915_address_space *vm, struct i915_vma *vma) +static void ggtt_unbind_vma(struct i915_address_space *vm, + struct i915_vma_resource *vma_res) { - vm->clear_range(vm, vma->node.start, vma->size); + vm->clear_range(vm, vma_res->start, vma_res->vma_size); } static int ggtt_reserve_guc_top(struct i915_ggtt *ggtt) @@ -626,7 +628,7 @@ static int init_ggtt(struct i915_ggtt *ggtt) static void aliasing_gtt_bind_vma(struct i915_address_space *vm, struct i915_vm_pt_stash *stash, - struct i915_vma *vma, + struct i915_vma_resource *vma_res, enum i915_cache_level cache_level, u32 flags) { @@ -634,25 +636,27 @@ static void aliasing_gtt_bind_vma(struct i915_address_space *vm, /* Currently applicable only to VLV */ pte_flags = 0; - if (i915_gem_object_is_readonly(vma->obj)) + if (vma_res->bi.readonly) pte_flags |= PTE_READ_ONLY; if (flags & I915_VMA_LOCAL_BIND) ppgtt_bind_vma(&i915_vm_to_ggtt(vm)->alias->vm, - stash, vma, cache_level, flags); + stash, vma_res, cache_level, flags); if (flags & I915_VMA_GLOBAL_BIND) - vm->insert_entries(vm, vma, cache_level, pte_flags); + vm->insert_entries(vm, vma_res, cache_level, pte_flags); + + vma_res->bound_flags |= flags; } static void aliasing_gtt_unbind_vma(struct i915_address_space *vm, - struct i915_vma *vma) + struct i915_vma_resource *vma_res) { - if (i915_vma_is_bound(vma, I915_VMA_GLOBAL_BIND)) - vm->clear_range(vm, vma->node.start, vma->size); + if (vma_res->bound_flags & I915_VMA_GLOBAL_BIND) + vm->clear_range(vm, vma_res->start, vma_res->vma_size); - if (i915_vma_is_bound(vma, I915_VMA_LOCAL_BIND)) - ppgtt_unbind_vma(&i915_vm_to_ggtt(vm)->alias->vm, vma); + if (vma_res->bound_flags & I915_VMA_LOCAL_BIND) + ppgtt_unbind_vma(&i915_vm_to_ggtt(vm)->alias->vm, vma_res); } static int init_aliasing_ppgtt(struct i915_ggtt *ggtt) @@ -1304,7 +1308,7 @@ bool i915_ggtt_resume_vm(struct i915_address_space *vm) atomic_read(&vma->flags) & I915_VMA_BIND_MASK; GEM_BUG_ON(!was_bound); - vma->ops->bind_vma(vm, NULL, vma, + vma->ops->bind_vma(vm, NULL, vma->resource, obj ? obj->cache_level : 0, was_bound); if (obj) { /* only used during resume => exclusive access */ diff --git a/drivers/gpu/drm/i915/gt/intel_gtt.h b/drivers/gpu/drm/i915/gt/intel_gtt.h index 15b98321e89a..19c2497630e8 100644 --- a/drivers/gpu/drm/i915/gt/intel_gtt.h +++ b/drivers/gpu/drm/i915/gt/intel_gtt.h @@ -27,6 +27,7 @@ #include "gt/intel_reset.h" #include "i915_selftest.h" +#include "i915_vma_resource.h" #include "i915_vma_types.h" #define I915_GFP_ALLOW_FAIL (GFP_KERNEL | __GFP_RETRY_MAYFAIL | __GFP_NOWARN) @@ -200,7 +201,7 @@ struct i915_vma_ops { /* Map an object into an address space with the given cache flags. */ void (*bind_vma)(struct i915_address_space *vm, struct i915_vm_pt_stash *stash, - struct i915_vma *vma, + struct i915_vma_resource *vma_res, enum i915_cache_level cache_level, u32 flags); /* @@ -208,7 +209,7 @@ struct i915_vma_ops { * setting the valid PTE entries to a reserved scratch page. */ void (*unbind_vma)(struct i915_address_space *vm, - struct i915_vma *vma); + struct i915_vma_resource *vma_res); int (*set_pages)(struct i915_vma *vma); void (*clear_pages)(struct i915_vma *vma); @@ -288,7 +289,7 @@ struct i915_address_space { enum i915_cache_level cache_level, u32 flags); void (*insert_entries)(struct i915_address_space *vm, - struct i915_vma *vma, + struct i915_vma_resource *vma_res, enum i915_cache_level cache_level, u32 flags); void (*cleanup)(struct i915_address_space *vm); @@ -607,11 +608,11 @@ void clear_pages(struct i915_vma *vma); void ppgtt_bind_vma(struct i915_address_space *vm, struct i915_vm_pt_stash *stash, - struct i915_vma *vma, + struct i915_vma_resource *vma_res, enum i915_cache_level cache_level, u32 flags); void ppgtt_unbind_vma(struct i915_address_space *vm, - struct i915_vma *vma); + struct i915_vma_resource *vma_res); void gtt_write_workarounds(struct intel_gt *gt); @@ -634,8 +635,8 @@ __vm_create_scratch_for_read_pinned(struct i915_address_space *vm, unsigned long static inline struct sgt_dma { struct scatterlist *sg; dma_addr_t dma, max; -} sgt_dma(struct i915_vma *vma) { - struct scatterlist *sg = vma->pages->sgl; +} sgt_dma(struct i915_vma_resource *vma_res) { + struct scatterlist *sg = vma_res->bi.pages->sgl; dma_addr_t addr = sg_dma_address(sg); return (struct sgt_dma){ sg, addr, addr + sg_dma_len(sg) }; diff --git a/drivers/gpu/drm/i915/gt/intel_ppgtt.c b/drivers/gpu/drm/i915/gt/intel_ppgtt.c index 4396bfd630d8..39ed6d709022 100644 --- a/drivers/gpu/drm/i915/gt/intel_ppgtt.c +++ b/drivers/gpu/drm/i915/gt/intel_ppgtt.c @@ -179,32 +179,34 @@ struct i915_ppgtt *i915_ppgtt_create(struct intel_gt *gt, void ppgtt_bind_vma(struct i915_address_space *vm, struct i915_vm_pt_stash *stash, - struct i915_vma *vma, + struct i915_vma_resource *vma_res, enum i915_cache_level cache_level, u32 flags) { u32 pte_flags; - if (!test_bit(I915_VMA_ALLOC_BIT, __i915_vma_flags(vma))) { - vm->allocate_va_range(vm, stash, vma->node.start, vma->size); - set_bit(I915_VMA_ALLOC_BIT, __i915_vma_flags(vma)); + if (!vma_res->allocated) { + vm->allocate_va_range(vm, stash, vma_res->start, + vma_res->vma_size); + vma_res->allocated = true; } /* Applicable to VLV, and gen8+ */ pte_flags = 0; - if (i915_gem_object_is_readonly(vma->obj)) + if (vma_res->bi.readonly) pte_flags |= PTE_READ_ONLY; - if (i915_gem_object_is_lmem(vma->obj)) + if (vma_res->bi.lmem) pte_flags |= PTE_LM; - vm->insert_entries(vm, vma, cache_level, pte_flags); + vm->insert_entries(vm, vma_res, cache_level, pte_flags); wmb(); } -void ppgtt_unbind_vma(struct i915_address_space *vm, struct i915_vma *vma) +void ppgtt_unbind_vma(struct i915_address_space *vm, + struct i915_vma_resource *vma_res) { - if (test_and_clear_bit(I915_VMA_ALLOC_BIT, __i915_vma_flags(vma))) - vm->clear_range(vm, vma->node.start, vma->size); + if (vma_res->allocated) + vm->clear_range(vm, vma_res->start, vma_res->vma_size); } static unsigned long pd_count(u64 size, int shift) diff --git a/drivers/gpu/drm/i915/gt/uc/intel_uc_fw.c b/drivers/gpu/drm/i915/gt/uc/intel_uc_fw.c index a5af05bde6f2..777fc6f0ceff 100644 --- a/drivers/gpu/drm/i915/gt/uc/intel_uc_fw.c +++ b/drivers/gpu/drm/i915/gt/uc/intel_uc_fw.c @@ -448,20 +448,19 @@ static void uc_fw_bind_ggtt(struct intel_uc_fw *uc_fw) { struct drm_i915_gem_object *obj = uc_fw->obj; struct i915_ggtt *ggtt = __uc_fw_to_gt(uc_fw)->ggtt; - struct i915_vma *dummy = &uc_fw->dummy; + struct i915_vma_resource *dummy = &uc_fw->dummy; u32 pte_flags = 0; - dummy->node.start = uc_fw_ggtt_offset(uc_fw); - dummy->node.size = obj->base.size; - dummy->pages = obj->mm.pages; - dummy->vm = &ggtt->vm; + dummy->start = uc_fw_ggtt_offset(uc_fw); + dummy->node_size = obj->base.size; + dummy->bi.pages = obj->mm.pages; GEM_BUG_ON(!i915_gem_object_has_pinned_pages(obj)); - GEM_BUG_ON(dummy->node.size > ggtt->uc_fw.size); + GEM_BUG_ON(dummy->node_size > ggtt->uc_fw.size); /* uc_fw->obj cache domains were not controlled across suspend */ if (i915_gem_object_has_struct_page(obj)) - drm_clflush_sg(dummy->pages); + drm_clflush_sg(dummy->bi.pages); if (i915_gem_object_is_lmem(obj)) pte_flags |= PTE_LM; diff --git a/drivers/gpu/drm/i915/gt/uc/intel_uc_fw.h b/drivers/gpu/drm/i915/gt/uc/intel_uc_fw.h index d9d1dc0b4cbb..3229018877d3 100644 --- a/drivers/gpu/drm/i915/gt/uc/intel_uc_fw.h +++ b/drivers/gpu/drm/i915/gt/uc/intel_uc_fw.h @@ -85,7 +85,7 @@ struct intel_uc_fw { * threaded as it done during driver load (inherently single threaded) * or during a GT reset (mutex guarantees single threaded). */ - struct i915_vma dummy; + struct i915_vma_resource dummy; struct i915_vma *rsa_data; /* diff --git a/drivers/gpu/drm/i915/i915_debugfs.c b/drivers/gpu/drm/i915/i915_debugfs.c index 359d8ffc6e36..ec2cffddfd7b 100644 --- a/drivers/gpu/drm/i915/i915_debugfs.c +++ b/drivers/gpu/drm/i915/i915_debugfs.c @@ -170,7 +170,8 @@ i915_debugfs_describe_obj(struct seq_file *m, struct drm_i915_gem_object *obj) seq_printf(m, " (%s offset: %08llx, size: %08llx, pages: %s", stringify_vma_type(vma), vma->node.start, vma->node.size, - stringify_page_sizes(vma->page_sizes.gtt, NULL, 0)); + stringify_page_sizes(vma->resource->page_sizes_gtt, + NULL, 0)); if (i915_vma_is_ggtt(vma) || i915_vma_is_dpt(vma)) { switch (vma->ggtt_view.type) { case I915_GGTT_VIEW_NORMAL: diff --git a/drivers/gpu/drm/i915/i915_gpu_error.c b/drivers/gpu/drm/i915/i915_gpu_error.c index e06d3aee0017..e5bb6ecf1202 100644 --- a/drivers/gpu/drm/i915/i915_gpu_error.c +++ b/drivers/gpu/drm/i915/i915_gpu_error.c @@ -1040,9 +1040,9 @@ i915_vma_coredump_create(const struct intel_gt *gt, strcpy(dst->name, vsnap->name); dst->next = NULL; - dst->gtt_offset = vsnap->gtt_offset; - dst->gtt_size = vsnap->gtt_size; - dst->gtt_page_sizes = vsnap->page_sizes; + dst->gtt_offset = vsnap->vma_resource->start; + dst->gtt_size = vsnap->vma_resource->node_size; + dst->gtt_page_sizes = vsnap->vma_resource->page_sizes_gtt; dst->unused = 0; ret = -EINVAL; diff --git a/drivers/gpu/drm/i915/i915_vma.c b/drivers/gpu/drm/i915/i915_vma.c index 4308659bf552..7ad7e5251df6 100644 --- a/drivers/gpu/drm/i915/i915_vma.c +++ b/drivers/gpu/drm/i915/i915_vma.c @@ -299,7 +299,7 @@ static void __vma_bind(struct dma_fence_work *work) struct i915_vma *vma = vw->vma; vma->ops->bind_vma(vw->vm, &vw->stash, - vma, vw->cache_level, vw->flags); + vma->resource, vw->cache_level, vw->flags); } static void __vma_release(struct dma_fence_work *work) @@ -380,6 +380,21 @@ static int i915_vma_verify_bind_complete(struct i915_vma *vma) #define i915_vma_verify_bind_complete(_vma) 0 #endif +I915_SELFTEST_EXPORT void +i915_vma_resource_init_from_vma(struct i915_vma_resource *vma_res, + struct i915_vma *vma) +{ + struct drm_i915_gem_object *obj = vma->obj; + + i915_vma_resource_init(vma_res, vma->pages, &vma->page_sizes, + i915_gem_object_is_readonly(obj), + i915_gem_object_is_lmem(obj), + vma->private, + vma->node.start, + vma->node.size, + vma->size); +} + /** * i915_vma_bind - Sets up PTEs for an VMA in it's corresponding address space. * @vma: VMA to map @@ -437,7 +452,7 @@ int i915_vma_bind(struct i915_vma *vma, GEM_WARN_ON(!vma_flags); kfree(vma_res); } else { - i915_vma_resource_init(vma_res); + i915_vma_resource_init_from_vma(vma_res, vma); vma->resource = vma_res; } trace_i915_vma_bind(vma, bind_flags); @@ -477,7 +492,8 @@ int i915_vma_bind(struct i915_vma *vma, if (ret) return ret; } - vma->ops->bind_vma(vma->vm, NULL, vma, cache_level, bind_flags); + vma->ops->bind_vma(vma->vm, NULL, vma->resource, cache_level, + bind_flags); } atomic_or(bind_flags, &vma->flags); @@ -1381,7 +1397,7 @@ void __i915_vma_evict(struct i915_vma *vma) if (likely(atomic_read(&vma->vm->open))) { trace_i915_vma_unbind(vma); - vma->ops->unbind_vma(vma->vm, vma); + vma->ops->unbind_vma(vma->vm, vma->resource); } atomic_and(~(I915_VMA_BIND_MASK | I915_VMA_ERROR | I915_VMA_GGTT_WRITE), &vma->flags); diff --git a/drivers/gpu/drm/i915/i915_vma.h b/drivers/gpu/drm/i915/i915_vma.h index 947ca7b3f3fa..e634fb1d22bf 100644 --- a/drivers/gpu/drm/i915/i915_vma.h +++ b/drivers/gpu/drm/i915/i915_vma.h @@ -341,12 +341,6 @@ void __iomem *i915_vma_pin_iomap(struct i915_vma *vma); */ void i915_vma_unpin_iomap(struct i915_vma *vma); -static inline struct page *i915_vma_first_page(struct i915_vma *vma) -{ - GEM_BUG_ON(!vma->pages); - return sg_page(vma->pages->sgl); -} - /** * i915_vma_pin_fence - pin fencing state * @vma: vma to pin fencing for @@ -447,6 +441,11 @@ i915_vma_get_current_resource(struct i915_vma *vma) return i915_vma_resource_get(vma->resource); } +#if IS_ENABLED(CONFIG_DRM_I915_SELFTEST) +void i915_vma_resource_init_from_vma(struct i915_vma_resource *vma_res, + struct i915_vma *vma); +#endif + void i915_vma_module_exit(void); int i915_vma_module_init(void); diff --git a/drivers/gpu/drm/i915/i915_vma_resource.c b/drivers/gpu/drm/i915/i915_vma_resource.c index 833e987bed2a..c86db89ab5d2 100644 --- a/drivers/gpu/drm/i915/i915_vma_resource.c +++ b/drivers/gpu/drm/i915/i915_vma_resource.c @@ -23,15 +23,12 @@ static struct dma_fence_ops unbind_fence_ops = { }; /** - * i915_vma_resource_init - Initialize a vma resource. + * __i915_vma_resource_init - Initialize a vma resource. * @vma_res: The vma resource to initialize * - * Initializes a vma resource allocated using i915_vma_resource_alloc(). - * The reason for having separate allocate and initialize function is that - * initialization may need to be performed from under a lock where - * allocation is not allowed. + * Initializes the private members of a vma resource. */ -void i915_vma_resource_init(struct i915_vma_resource *vma_res) +void __i915_vma_resource_init(struct i915_vma_resource *vma_res) { spin_lock_init(&vma_res->lock); dma_fence_init(&vma_res->unbind_fence, &unbind_fence_ops, diff --git a/drivers/gpu/drm/i915/i915_vma_resource.h b/drivers/gpu/drm/i915/i915_vma_resource.h index 34744da23072..9872de58268b 100644 --- a/drivers/gpu/drm/i915/i915_vma_resource.h +++ b/drivers/gpu/drm/i915/i915_vma_resource.h @@ -9,6 +9,25 @@ #include #include +#include "i915_gem.h" + +struct i915_page_sizes { + /** + * The sg mask of the pages sg_table. i.e the mask of + * the lengths for each sg entry. + */ + unsigned int phys; + + /** + * The gtt page sizes we are allowed to use given the + * sg mask and the supported page sizes. This will + * express the smallest unit we can use for the whole + * object, as well as the larger sizes we may be able + * to use opportunistically. + */ + unsigned int sg; +}; + /** * struct i915_vma_resource - Snapshotted unbind information. * @unbind_fence: Fence to mark unbinding complete. Note that this fence @@ -20,6 +39,13 @@ * @hold_count: Number of holders blocking the fence from finishing. * The vma itself is keeping a hold, which is released when unbind * is scheduled. + * @private: Bind backend private info. + * @start: Offset into the address space of bind range start. + * @node_size: Size of the allocated range manager node. + * @vma_size: Bind size. + * @page_sizes_gtt: Resulting page sizes from the bind operation. + * @bound_flags: Flags indicating binding status. + * @allocated: Backend private data. TODO: Should move into @private. * * The lifetime of a struct i915_vma_resource is from a binding request to * the actual possible asynchronous unbind has completed. @@ -29,6 +55,32 @@ struct i915_vma_resource { /* See above for description of the lock. */ spinlock_t lock; refcount_t hold_count; + + /** + * struct i915_vma_bindinfo - Information needed for async bind + * only but that can be dropped after the bind has taken place. + * Consider making this a separate argument to the bind_vma + * op, coalescing with other arguments like vm, stash, cache_level + * and flags + * @pages: The pages sg-table. + * @page_sizes: Page sizes of the pages. + * @readonly: Whether the vma should be bound read-only. + * @lmem: Whether the vma points to lmem. + */ + struct i915_vma_bindinfo { + struct sg_table *pages; + struct i915_page_sizes page_sizes; + bool readonly:1; + bool lmem:1; + } bi; + + void *private; + unsigned long start; + unsigned long node_size; + unsigned long vma_size; + u32 page_sizes_gtt; + u32 bound_flags; + bool allocated:1; }; bool i915_vma_resource_hold(struct i915_vma_resource *vma_res, @@ -41,6 +93,8 @@ struct i915_vma_resource *i915_vma_resource_alloc(void); struct dma_fence *i915_vma_resource_unbind(struct i915_vma_resource *vma_res); +void __i915_vma_resource_init(struct i915_vma_resource *vma_res); + /** * i915_vma_resource_get - Take a reference on a vma resource * @vma_res: The vma resource on which to take a reference. @@ -63,8 +117,47 @@ static inline void i915_vma_resource_put(struct i915_vma_resource *vma_res) dma_fence_put(&vma_res->unbind_fence); } -#if IS_ENABLED(CONFIG_DRM_I915_SELFTEST) -void i915_vma_resource_init(struct i915_vma_resource *vma_res); -#endif +/** + * i915_vma_resource_init - Initialize a vma resource. + * @vma_res: The vma resource to initialize + * @pages: The pages sg-table. + * @page_sizes: Page sizes of the pages. + * @readonly: Whether the vma should be bound read-only. + * @lmem: Whether the vma points to lmem. + * @private: Bind backend private info. + * @start: Offset into the address space of bind range start. + * @node_size: Size of the allocated range manager node. + * @size: Bind size. + * + * Initializes a vma resource allocated using i915_vma_resource_alloc(). + * The reason for having separate allocate and initialize function is that + * initialization may need to be performed from under a lock where + * allocation is not allowed. + */ +static inline void i915_vma_resource_init(struct i915_vma_resource *vma_res, + struct sg_table *pages, + const struct i915_page_sizes *page_sizes, + bool readonly, + bool lmem, + void *private, + unsigned long start, + unsigned long node_size, + unsigned long size) +{ + __i915_vma_resource_init(vma_res); + vma_res->bi.pages = pages; + vma_res->bi.page_sizes = *page_sizes; + vma_res->bi.readonly = readonly; + vma_res->bi.lmem = lmem; + vma_res->private = private; + vma_res->start = start; + vma_res->node_size = node_size; + vma_res->vma_size = size; +} + +static inline void i915_vma_resource_fini(struct i915_vma_resource *vma_res) +{ + GEM_BUG_ON(refcount_read(&vma_res->hold_count) != 1); +} #endif diff --git a/drivers/gpu/drm/i915/i915_vma_snapshot.c b/drivers/gpu/drm/i915/i915_vma_snapshot.c index f7333c7a2f5e..69f62c1ca967 100644 --- a/drivers/gpu/drm/i915/i915_vma_snapshot.c +++ b/drivers/gpu/drm/i915/i915_vma_snapshot.c @@ -24,11 +24,7 @@ void i915_vma_snapshot_init(struct i915_vma_snapshot *vsnap, assert_object_held(vma->obj); vsnap->name = name; - vsnap->size = vma->size; vsnap->obj_size = vma->obj->base.size; - vsnap->gtt_offset = vma->node.start; - vsnap->gtt_size = vma->node.size; - vsnap->page_sizes = vma->page_sizes.gtt; vsnap->pages = vma->pages; vsnap->pages_rsgt = NULL; vsnap->mr = NULL; diff --git a/drivers/gpu/drm/i915/i915_vma_snapshot.h b/drivers/gpu/drm/i915/i915_vma_snapshot.h index e74588dd676b..1b08ce9f8576 100644 --- a/drivers/gpu/drm/i915/i915_vma_snapshot.h +++ b/drivers/gpu/drm/i915/i915_vma_snapshot.h @@ -23,31 +23,23 @@ struct sg_table; /** * struct i915_vma_snapshot - Snapshot of vma metadata. - * @size: The vma size in bytes. * @obj_size: The size of the underlying object in bytes. - * @gtt_offset: The gtt offset the vma is bound to. - * @gtt_size: The size in bytes allocated for the vma in the GTT. * @pages: The struct sg_table pointing to the pages bound. * @pages_rsgt: The refcounted sg_table holding the reference for @pages if any. * @mr: The memory region pointed for the pages bound. * @kref: Reference for this structure. * @vma_resource: Pointer to the vma resource representing the vma binding. - * @page_sizes: The vma GTT page sizes information. * @onstack: Whether the structure shouldn't be freed on final put. * @present: Whether the structure is present and initialized. */ struct i915_vma_snapshot { const char *name; - size_t size; size_t obj_size; - size_t gtt_offset; - size_t gtt_size; struct sg_table *pages; struct i915_refct_sgt *pages_rsgt; struct intel_memory_region *mr; struct kref kref; struct i915_vma_resource *vma_resource; - u32 page_sizes; bool onstack:1; bool present:1; }; diff --git a/drivers/gpu/drm/i915/selftests/i915_gem_gtt.c b/drivers/gpu/drm/i915/selftests/i915_gem_gtt.c index 29b48fedc675..1279b44e1566 100644 --- a/drivers/gpu/drm/i915/selftests/i915_gem_gtt.c +++ b/drivers/gpu/drm/i915/selftests/i915_gem_gtt.c @@ -239,11 +239,11 @@ static int lowlevel_hole(struct i915_address_space *vm, unsigned long end_time) { I915_RND_STATE(seed_prng); - struct i915_vma *mock_vma; + struct i915_vma_resource *mock_vma_res; unsigned int size; - mock_vma = kzalloc(sizeof(*mock_vma), GFP_KERNEL); - if (!mock_vma) + mock_vma_res = kzalloc(sizeof(*mock_vma_res), GFP_KERNEL); + if (!mock_vma_res) return -ENOMEM; /* Keep creating larger objects until one cannot fit into the hole */ @@ -269,7 +269,7 @@ static int lowlevel_hole(struct i915_address_space *vm, break; } while (count >>= 1); if (!count) { - kfree(mock_vma); + kfree(mock_vma_res); return -ENOMEM; } GEM_BUG_ON(!order); @@ -343,12 +343,12 @@ static int lowlevel_hole(struct i915_address_space *vm, break; } - mock_vma->pages = obj->mm.pages; - mock_vma->node.size = BIT_ULL(size); - mock_vma->node.start = addr; + mock_vma_res->bi.pages = obj->mm.pages; + mock_vma_res->node_size = BIT_ULL(size); + mock_vma_res->start = addr; with_intel_runtime_pm(vm->gt->uncore->rpm, wakeref) - vm->insert_entries(vm, mock_vma, + vm->insert_entries(vm, mock_vma_res, I915_CACHE_NONE, 0); } count = n; @@ -371,7 +371,7 @@ static int lowlevel_hole(struct i915_address_space *vm, cleanup_freed_objects(vm->i915); } - kfree(mock_vma); + kfree(mock_vma_res); return 0; } @@ -1280,6 +1280,7 @@ static void track_vma_bind(struct i915_vma *vma) atomic_set(&vma->pages_count, I915_VMA_PAGES_ACTIVE); __i915_gem_object_pin_pages(obj); vma->pages = obj->mm.pages; + vma->resource->bi.pages = vma->pages; mutex_lock(&vma->vm->mutex); list_add_tail(&vma->vm_link, &vma->vm->bound_list); @@ -1354,7 +1355,7 @@ static int reserve_gtt_with_resource(struct i915_vma *vma, u64 offset) obj->cache_level, 0); if (!err) { - i915_vma_resource_init(vma_res); + i915_vma_resource_init_from_vma(vma_res, vma); vma->resource = vma_res; } else { kfree(vma_res); @@ -1533,7 +1534,7 @@ static int insert_gtt_with_resource(struct i915_vma *vma) err = i915_gem_gtt_insert(vm, &vma->node, obj->base.size, 0, obj->cache_level, 0, vm->total, 0); if (!err) { - i915_vma_resource_init(vma_res); + i915_vma_resource_init_from_vma(vma_res, vma); vma->resource = vma_res; } else { kfree(vma_res); @@ -1958,6 +1959,7 @@ static int igt_cs_tlb(void *arg) struct i915_vm_pt_stash stash = {}; struct i915_request *rq; struct i915_gem_ww_ctx ww; + struct i915_vma_resource *vma_res; u64 offset; offset = igt_random_offset(&prng, @@ -1976,6 +1978,13 @@ static int igt_cs_tlb(void *arg) if (err) goto end; + vma_res = i915_vma_resource_alloc(); + if (IS_ERR(vma_res)) { + vma->ops->clear_pages(vma); + err = PTR_ERR(vma_res); + goto end; + } + i915_gem_ww_ctx_init(&ww, false); retry: err = i915_vm_lock_objects(vm, &ww); @@ -1997,51 +2006,62 @@ static int igt_cs_tlb(void *arg) goto retry; } i915_gem_ww_ctx_fini(&ww); - if (err) + if (err) { + kfree(vma_res); goto end; + } + i915_vma_resource_init_from_vma(vma_res, vma); /* Prime the TLB with the dummy pages */ for (i = 0; i < count; i++) { - vma->node.start = offset + i * PAGE_SIZE; - vm->insert_entries(vm, vma, I915_CACHE_NONE, 0); + vma_res->start = offset + i * PAGE_SIZE; + vm->insert_entries(vm, vma_res, I915_CACHE_NONE, + 0); - rq = submit_batch(ce, vma->node.start); + rq = submit_batch(ce, vma_res->start); if (IS_ERR(rq)) { err = PTR_ERR(rq); + i915_vma_resource_fini(vma_res); + kfree(vma_res); goto end; } i915_request_put(rq); } - + i915_vma_resource_fini(vma_res); vma->ops->clear_pages(vma); err = context_sync(ce); if (err) { pr_err("%s: dummy setup timed out\n", ce->engine->name); + kfree(vma_res); goto end; } vma = i915_vma_instance(act, vm, NULL); if (IS_ERR(vma)) { + kfree(vma_res); err = PTR_ERR(vma); goto end; } err = vma->ops->set_pages(vma); - if (err) + if (err) { + kfree(vma_res); goto end; + } + i915_vma_resource_init_from_vma(vma_res, vma); /* Replace the TLB with target batches */ for (i = 0; i < count; i++) { struct i915_request *rq; u32 *cs = batch + i * 64 / sizeof(*cs); u64 addr; - vma->node.start = offset + i * PAGE_SIZE; - vm->insert_entries(vm, vma, I915_CACHE_NONE, 0); + vma_res->start = offset + i * PAGE_SIZE; + vm->insert_entries(vm, vma_res, I915_CACHE_NONE, 0); - addr = vma->node.start + i * 64; + addr = vma_res->start + i * 64; cs[4] = MI_NOOP; cs[6] = lower_32_bits(addr); cs[7] = upper_32_bits(addr); @@ -2050,6 +2070,8 @@ static int igt_cs_tlb(void *arg) rq = submit_batch(ce, addr); if (IS_ERR(rq)) { err = PTR_ERR(rq); + i915_vma_resource_fini(vma_res); + kfree(vma_res); goto end; } @@ -2066,6 +2088,8 @@ static int igt_cs_tlb(void *arg) } end_spin(batch, count - 1); + i915_vma_resource_fini(vma_res); + kfree(vma_res); vma->ops->clear_pages(vma); err = context_sync(ce); diff --git a/drivers/gpu/drm/i915/selftests/mock_gtt.c b/drivers/gpu/drm/i915/selftests/mock_gtt.c index 32ca8962d0ab..3287804689a4 100644 --- a/drivers/gpu/drm/i915/selftests/mock_gtt.c +++ b/drivers/gpu/drm/i915/selftests/mock_gtt.c @@ -33,23 +33,23 @@ static void mock_insert_page(struct i915_address_space *vm, } static void mock_insert_entries(struct i915_address_space *vm, - struct i915_vma *vma, + struct i915_vma_resource *vma_res, enum i915_cache_level level, u32 flags) { } static void mock_bind_ppgtt(struct i915_address_space *vm, struct i915_vm_pt_stash *stash, - struct i915_vma *vma, + struct i915_vma_resource *vma_res, enum i915_cache_level cache_level, u32 flags) { GEM_BUG_ON(flags & I915_VMA_GLOBAL_BIND); - set_bit(I915_VMA_LOCAL_BIND_BIT, __i915_vma_flags(vma)); + vma_res->bound_flags |= flags; } static void mock_unbind_ppgtt(struct i915_address_space *vm, - struct i915_vma *vma) + struct i915_vma_resource *vma_res) { } @@ -95,14 +95,14 @@ struct i915_ppgtt *mock_ppgtt(struct drm_i915_private *i915, const char *name) static void mock_bind_ggtt(struct i915_address_space *vm, struct i915_vm_pt_stash *stash, - struct i915_vma *vma, + struct i915_vma_resource *vma_res, enum i915_cache_level cache_level, u32 flags) { } static void mock_unbind_ggtt(struct i915_address_space *vm, - struct i915_vma *vma) + struct i915_vma_resource *vma_res) { } From patchwork Fri Dec 17 09:19:28 2021 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 8bit X-Patchwork-Submitter: Thomas Hellstrom X-Patchwork-Id: 12683961 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from gabe.freedesktop.org (gabe.freedesktop.org [131.252.210.177]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.lore.kernel.org (Postfix) with ESMTPS id 9C08EC433F5 for ; Fri, 17 Dec 2021 09:20:10 +0000 (UTC) Received: from gabe.freedesktop.org (localhost [127.0.0.1]) by gabe.freedesktop.org (Postfix) with ESMTP id 6912910FAAF; Fri, 17 Dec 2021 09:19:53 +0000 (UTC) Received: from mga06.intel.com (mga06.intel.com [134.134.136.31]) by gabe.freedesktop.org (Postfix) with ESMTPS id DE9F410FA62; Fri, 17 Dec 2021 09:19:50 +0000 (UTC) X-IronPort-AV: E=McAfee;i="6200,9189,10200"; a="300493347" X-IronPort-AV: E=Sophos;i="5.88,213,1635231600"; d="scan'208";a="300493347" Received: from orsmga005.jf.intel.com ([10.7.209.41]) by orsmga104.jf.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 17 Dec 2021 01:19:50 -0800 X-IronPort-AV: E=Sophos;i="5.88,213,1635231600"; d="scan'208";a="683321554" Received: from olindum-mobl1.ger.corp.intel.com (HELO thellstr-mobl1.intel.com) ([10.249.254.180]) by orsmga005-auth.jf.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 17 Dec 2021 01:19:49 -0800 From: =?utf-8?q?Thomas_Hellstr=C3=B6m?= To: intel-gfx@lists.freedesktop.org, dri-devel@lists.freedesktop.org Date: Fri, 17 Dec 2021 10:19:28 +0100 Message-Id: <20211217091929.105781-7-thomas.hellstrom@linux.intel.com> X-Mailer: git-send-email 2.31.1 In-Reply-To: <20211217091929.105781-1-thomas.hellstrom@linux.intel.com> References: <20211217091929.105781-1-thomas.hellstrom@linux.intel.com> MIME-Version: 1.0 Subject: [Intel-gfx] [PATCH v2 6/7] drm/i915: Use vma resources for async unbinding X-BeenThere: intel-gfx@lists.freedesktop.org X-Mailman-Version: 2.1.29 Precedence: list List-Id: Intel graphics driver community testing & development List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Cc: =?utf-8?q?Thomas_Hellstr=C3=B6m?= , matthew.auld@intel.com Errors-To: intel-gfx-bounces@lists.freedesktop.org Sender: "Intel-gfx" Implement async (non-blocking) unbinding by not syncing the vma before calling unbind on the vma_resource. Add the resulting unbind fence to the object's dma_resv from where it is picked up by the ttm migration code. Ideally these unbind fences should be coalesced with the migration blit fence to avoid stalling the migration blit waiting for unbind, as they can certainly go on in parallel, but since we don't yet have a reasonable data structure to use to coalesce fences and attach the resulting fence to a timeline, we defer that for now. Note that with async unbinding, even while the unbind waits for the preceding bind to complete before unbinding, the vma itself might have been destroyed in the process, clearing the vma pages. Therefore we can only allow async unbinding if we have a refcounted sg-list and keep a refcount on that for the vma resource pages to stay intact until binding occurs. If this condition is not met, a request for an async unbind is diverted to a sync unbind. v2: - Use a separate kmem_cache for vma resources for now to isolate their memory allocation and aid debugging. - Move the check for vm closed to the actual unbinding thread. Regardless of whether the vm is closed, we need the unbind fence to properly wait for capture. - Clear vma_res::vm on unbind and update its documentation. Signed-off-by: Thomas Hellström Reported-by: kernel test robot --- drivers/gpu/drm/i915/gem/i915_gem_ttm_move.c | 11 +- drivers/gpu/drm/i915/gt/intel_ggtt.c | 2 +- drivers/gpu/drm/i915/gt/intel_gtt.c | 4 + drivers/gpu/drm/i915/gt/intel_gtt.h | 3 + drivers/gpu/drm/i915/i915_drv.h | 1 + drivers/gpu/drm/i915/i915_gem.c | 3 + drivers/gpu/drm/i915/i915_module.c | 3 + drivers/gpu/drm/i915/i915_vma.c | 181 ++++++++-- drivers/gpu/drm/i915/i915_vma.h | 3 +- drivers/gpu/drm/i915/i915_vma_resource.c | 326 +++++++++++++++++-- drivers/gpu/drm/i915/i915_vma_resource.h | 45 +++ 11 files changed, 519 insertions(+), 63 deletions(-) diff --git a/drivers/gpu/drm/i915/gem/i915_gem_ttm_move.c b/drivers/gpu/drm/i915/gem/i915_gem_ttm_move.c index e8a99e8cd129..0b0477f0af8b 100644 --- a/drivers/gpu/drm/i915/gem/i915_gem_ttm_move.c +++ b/drivers/gpu/drm/i915/gem/i915_gem_ttm_move.c @@ -142,7 +142,16 @@ int i915_ttm_move_notify(struct ttm_buffer_object *bo) struct drm_i915_gem_object *obj = i915_ttm_to_gem(bo); int ret; - ret = i915_gem_object_unbind(obj, I915_GEM_OBJECT_UNBIND_ACTIVE); + /* + * Note: The async unbinding here will actually transform the + * blocking wait for unbind into a wait before finally submitting + * evict / migration blit and thus stall the migration timeline + * which may not be good for overall throughput. We should make + * sure we await the unbind fences *after* the migration blit + * instead of *before* as we currently do. + */ + ret = i915_gem_object_unbind(obj, I915_GEM_OBJECT_UNBIND_ACTIVE | + I915_GEM_OBJECT_UNBIND_ASYNC); if (ret) return ret; diff --git a/drivers/gpu/drm/i915/gt/intel_ggtt.c b/drivers/gpu/drm/i915/gt/intel_ggtt.c index ad54941fcb98..3e9fbbfa13c6 100644 --- a/drivers/gpu/drm/i915/gt/intel_ggtt.c +++ b/drivers/gpu/drm/i915/gt/intel_ggtt.c @@ -145,7 +145,7 @@ void i915_ggtt_suspend_vm(struct i915_address_space *vm) continue; if (!i915_vma_is_bound(vma, I915_VMA_GLOBAL_BIND)) { - __i915_vma_evict(vma); + __i915_vma_evict(vma, false); drm_mm_remove_node(&vma->node); } } diff --git a/drivers/gpu/drm/i915/gt/intel_gtt.c b/drivers/gpu/drm/i915/gt/intel_gtt.c index 30683c06b344..b582a4c6c3c7 100644 --- a/drivers/gpu/drm/i915/gt/intel_gtt.c +++ b/drivers/gpu/drm/i915/gt/intel_gtt.c @@ -161,6 +161,9 @@ static void __i915_vm_release(struct work_struct *work) struct i915_address_space *vm = container_of(work, struct i915_address_space, release_work); + /* Synchronize async unbinds. */ + i915_vma_resource_bind_dep_sync_all(vm); + vm->cleanup(vm); i915_address_space_fini(vm); @@ -189,6 +192,7 @@ void i915_address_space_init(struct i915_address_space *vm, int subclass) if (!kref_read(&vm->resv_ref)) kref_init(&vm->resv_ref); + vm->pending_unbind = RB_ROOT_CACHED; INIT_WORK(&vm->release_work, __i915_vm_release); atomic_set(&vm->open, 1); diff --git a/drivers/gpu/drm/i915/gt/intel_gtt.h b/drivers/gpu/drm/i915/gt/intel_gtt.h index 19c2497630e8..b9bd60cb2687 100644 --- a/drivers/gpu/drm/i915/gt/intel_gtt.h +++ b/drivers/gpu/drm/i915/gt/intel_gtt.h @@ -267,6 +267,9 @@ struct i915_address_space { /* Flags used when creating page-table objects for this vm */ unsigned long lmem_pt_obj_flags; + /* Interval tree for pending unbind vma resources */ + struct rb_root_cached pending_unbind; + struct drm_i915_gem_object * (*alloc_pt_dma)(struct i915_address_space *vm, int sz); struct drm_i915_gem_object * diff --git a/drivers/gpu/drm/i915/i915_drv.h b/drivers/gpu/drm/i915/i915_drv.h index ed48179bacd8..a99696500045 100644 --- a/drivers/gpu/drm/i915/i915_drv.h +++ b/drivers/gpu/drm/i915/i915_drv.h @@ -1666,6 +1666,7 @@ int i915_gem_object_unbind(struct drm_i915_gem_object *obj, #define I915_GEM_OBJECT_UNBIND_BARRIER BIT(1) #define I915_GEM_OBJECT_UNBIND_TEST BIT(2) #define I915_GEM_OBJECT_UNBIND_VM_TRYLOCK BIT(3) +#define I915_GEM_OBJECT_UNBIND_ASYNC BIT(4) void i915_gem_runtime_suspend(struct drm_i915_private *dev_priv); diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c index 527228d4da7e..22ffa6636f0e 100644 --- a/drivers/gpu/drm/i915/i915_gem.c +++ b/drivers/gpu/drm/i915/i915_gem.c @@ -165,6 +165,9 @@ int i915_gem_object_unbind(struct drm_i915_gem_object *obj, } else { ret = -EBUSY; } + } else if (flags & I915_GEM_OBJECT_UNBIND_ASYNC) { + assert_object_held(vma->obj); + ret = i915_vma_unbind_async(vma); } else { ret = i915_vma_unbind(vma); } diff --git a/drivers/gpu/drm/i915/i915_module.c b/drivers/gpu/drm/i915/i915_module.c index f6bcd2f89257..a8f175960b34 100644 --- a/drivers/gpu/drm/i915/i915_module.c +++ b/drivers/gpu/drm/i915/i915_module.c @@ -17,6 +17,7 @@ #include "i915_scheduler.h" #include "i915_selftest.h" #include "i915_vma.h" +#include "i915_vma_resource.h" static int i915_check_nomodeset(void) { @@ -64,6 +65,8 @@ static const struct { .exit = i915_scheduler_module_exit }, { .init = i915_vma_module_init, .exit = i915_vma_module_exit }, + { .init = i915_vma_resource_module_init, + .exit = i915_vma_resource_module_exit }, { .init = i915_mock_selftests }, { .init = i915_pmu_init, .exit = i915_pmu_exit }, diff --git a/drivers/gpu/drm/i915/i915_vma.c b/drivers/gpu/drm/i915/i915_vma.c index 7ad7e5251df6..3c4389aac53a 100644 --- a/drivers/gpu/drm/i915/i915_vma.c +++ b/drivers/gpu/drm/i915/i915_vma.c @@ -286,9 +286,10 @@ struct i915_vma_work { struct dma_fence_work base; struct i915_address_space *vm; struct i915_vm_pt_stash stash; - struct i915_vma *vma; + struct i915_vma_resource *vma_res; struct drm_i915_gem_object *pinned; struct i915_sw_dma_fence_cb cb; + struct i915_refct_sgt *rsgt; enum i915_cache_level cache_level; unsigned int flags; }; @@ -296,10 +297,11 @@ struct i915_vma_work { static void __vma_bind(struct dma_fence_work *work) { struct i915_vma_work *vw = container_of(work, typeof(*vw), base); - struct i915_vma *vma = vw->vma; + struct i915_vma_resource *vma_res = vw->vma_res; + + vma_res->ops->bind_vma(vma_res->vm, &vw->stash, + vma_res, vw->cache_level, vw->flags); - vma->ops->bind_vma(vw->vm, &vw->stash, - vma->resource, vw->cache_level, vw->flags); } static void __vma_release(struct dma_fence_work *work) @@ -313,6 +315,10 @@ static void __vma_release(struct dma_fence_work *work) i915_vm_free_pt_stash(vw->vm, &vw->stash); i915_vm_put(vw->vm); + if (vw->vma_res) + i915_vma_resource_put(vw->vma_res); + if (vw->rsgt) + i915_refct_sgt_put(vw->rsgt); } static const struct dma_fence_work_ops bind_ops = { @@ -386,13 +392,11 @@ i915_vma_resource_init_from_vma(struct i915_vma_resource *vma_res, { struct drm_i915_gem_object *obj = vma->obj; - i915_vma_resource_init(vma_res, vma->pages, &vma->page_sizes, + i915_vma_resource_init(vma_res, vma->vm, vma->pages, &vma->page_sizes, i915_gem_object_is_readonly(obj), i915_gem_object_is_lmem(obj), - vma->private, - vma->node.start, - vma->node.size, - vma->size); + vma->ops, vma->private, vma->node.start, + vma->node.size, vma->size); } /** @@ -416,6 +420,7 @@ int i915_vma_bind(struct i915_vma *vma, { u32 bind_flags; u32 vma_flags; + int ret; lockdep_assert_held(&vma->vm->mutex); GEM_BUG_ON(!drm_mm_node_allocated(&vma->node)); @@ -424,12 +429,12 @@ int i915_vma_bind(struct i915_vma *vma, if (GEM_DEBUG_WARN_ON(range_overflows(vma->node.start, vma->node.size, vma->vm->total))) { - kfree(vma_res); + i915_vma_resource_free(vma_res); return -ENODEV; } if (GEM_DEBUG_WARN_ON(!flags)) { - kfree(vma_res); + i915_vma_resource_free(vma_res); return -EINVAL; } @@ -441,12 +446,30 @@ int i915_vma_bind(struct i915_vma *vma, bind_flags &= ~vma_flags; if (bind_flags == 0) { - kfree(vma_res); + i915_vma_resource_free(vma_res); return 0; } GEM_BUG_ON(!vma->pages); + /* Wait for or await async unbinds touching our range */ + if (work && bind_flags & vma->vm->bind_async_flags) + ret = i915_vma_resource_bind_dep_await(vma->vm, + &work->base.chain, + vma->node.start, + vma->node.size, + true, + GFP_NOWAIT | + __GFP_RETRY_MAYFAIL | + __GFP_NOWARN); + else + ret = i915_vma_resource_bind_dep_sync(vma->vm, vma->node.start, + vma->node.size, true); + if (ret) { + i915_vma_resource_free(vma_res); + return ret; + } + if (vma->resource || !vma_res) { /* Rebinding with an additional I915_VMA_*_BIND */ GEM_WARN_ON(!vma_flags); @@ -459,9 +482,11 @@ int i915_vma_bind(struct i915_vma *vma, if (work && bind_flags & vma->vm->bind_async_flags) { struct dma_fence *prev; - work->vma = vma; + work->vma_res = i915_vma_resource_get(vma->resource); work->cache_level = cache_level; work->flags = bind_flags; + if (vma->obj->mm.rsgt) + work->rsgt = i915_refct_sgt_get(vma->obj->mm.rsgt); /* * Note we only want to chain up to the migration fence on @@ -489,8 +514,12 @@ int i915_vma_bind(struct i915_vma *vma, int ret; ret = i915_gem_object_wait_moving_fence(vma->obj, true); - if (ret) + if (ret) { + i915_vma_resource_free(vma->resource); + vma->resource = NULL; + return ret; + } } vma->ops->bind_vma(vma->vm, NULL, vma->resource, cache_level, bind_flags); @@ -1361,7 +1390,7 @@ int _i915_vma_move_to_active(struct i915_vma *vma, return 0; } -void __i915_vma_evict(struct i915_vma *vma) +struct dma_fence *__i915_vma_evict(struct i915_vma *vma, bool async) { struct dma_fence *unbind_fence; @@ -1395,27 +1424,35 @@ void __i915_vma_evict(struct i915_vma *vma) GEM_BUG_ON(vma->fence); GEM_BUG_ON(i915_vma_has_userfault(vma)); - if (likely(atomic_read(&vma->vm->open))) { - trace_i915_vma_unbind(vma); - vma->ops->unbind_vma(vma->vm, vma->resource); - } - atomic_and(~(I915_VMA_BIND_MASK | I915_VMA_ERROR | I915_VMA_GGTT_WRITE), - &vma->flags); + /* Object backend must be async capable. */ + GEM_WARN_ON(async && !vma->obj->mm.rsgt); + trace_i915_vma_unbind(vma); unbind_fence = i915_vma_resource_unbind(vma->resource); - i915_vma_resource_put(vma->resource); vma->resource = NULL; + atomic_and(~(I915_VMA_BIND_MASK | I915_VMA_ERROR | I915_VMA_GGTT_WRITE), + &vma->flags); + + /* Object backend must be async capable. */ + GEM_WARN_ON(async && !vma->obj->mm.rsgt); + i915_vma_detach(vma); - vma_unbind_pages(vma); + + if (!async && unbind_fence) { + dma_fence_wait(unbind_fence, false); + dma_fence_put(unbind_fence); + unbind_fence = NULL; + } /* - * This uninterruptible wait under the vm mutex is currently - * only ever blocking while the vma is being captured from. - * With async unbinding, this wait here will be removed. + * Binding itself may not have completed until the unbind fence signals, + * so don't drop the pages until that happens, unless the resource is + * async_capable. */ - dma_fence_wait(unbind_fence, false); - dma_fence_put(unbind_fence); + + vma_unbind_pages(vma); + return unbind_fence; } int __i915_vma_unbind(struct i915_vma *vma) @@ -1442,12 +1479,51 @@ int __i915_vma_unbind(struct i915_vma *vma) return ret; GEM_BUG_ON(i915_vma_is_active(vma)); - __i915_vma_evict(vma); + __i915_vma_evict(vma, false); drm_mm_remove_node(&vma->node); /* pairs with i915_vma_release() */ return 0; } +static struct dma_fence *__i915_vma_unbind_async(struct i915_vma *vma) +{ + struct dma_fence *fence; + + lockdep_assert_held(&vma->vm->mutex); + + if (!drm_mm_node_allocated(&vma->node)) + return NULL; + + if (i915_vma_is_pinned(vma)) { + vma_print_allocator(vma, "is pinned"); + return ERR_PTR(-EAGAIN); + } + + /* + * We probably need to replace this with awaiting the fences of the + * object's dma_resv when the vma active goes away. When doing that + * we need to be careful to not add the vma_resource unbind fence + * immediately to the object's dma_resv, because then unbinding + * the next vma from the object, in case there are many, will + * actually await the unbinding of the previous vmas, which is + * undesirable. + */ + if (i915_sw_fence_await_active(&vma->resource->chain, &vma->active, + I915_ACTIVE_AWAIT_EXCL | + I915_ACTIVE_AWAIT_ACTIVE) < 0) { + int ret = i915_vma_sync(vma); + + if (ret) + return ERR_PTR(ret); + } + + fence = __i915_vma_evict(vma, true); + + drm_mm_remove_node(&vma->node); /* pairs with i915_vma_release() */ + + return fence; +} + int i915_vma_unbind(struct i915_vma *vma) { struct i915_address_space *vm = vma->vm; @@ -1484,6 +1560,53 @@ int i915_vma_unbind(struct i915_vma *vma) return err; } +int i915_vma_unbind_async(struct i915_vma *vma) +{ + struct drm_i915_gem_object *obj = vma->obj; + struct i915_address_space *vm = vma->vm; + struct dma_fence *fence; + int err; + + /* + * We need the dma-resv lock since we add the + * unbind fence to the dma-resv object. + */ + assert_object_held(obj); + + if (!drm_mm_node_allocated(&vma->node)) + return 0; + + if (i915_vma_is_pinned(vma)) { + vma_print_allocator(vma, "is pinned"); + return -EAGAIN; + } + + if (!obj->mm.rsgt) + return i915_vma_unbind(vma); + + if (i915_vma_is_bound(vma, I915_VMA_GLOBAL_BIND)) + /* Might require pm. Not handled yet.*/ + return i915_vma_unbind(vma); + + err = dma_resv_reserve_shared(obj->base.resv, 1); + if (err) + return i915_vma_unbind(vma); + + err = mutex_lock_interruptible(&vma->vm->mutex); + if (err) + return err; + + fence = __i915_vma_unbind_async(vma); + mutex_unlock(&vm->mutex); + if (IS_ERR_OR_NULL(fence)) + return PTR_ERR_OR_ZERO(fence); + + dma_resv_add_shared_fence(obj->base.resv, fence); + dma_fence_put(fence); + + return 0; +} + struct i915_vma *i915_vma_make_unshrinkable(struct i915_vma *vma) { i915_gem_object_make_unshrinkable(vma->obj); diff --git a/drivers/gpu/drm/i915/i915_vma.h b/drivers/gpu/drm/i915/i915_vma.h index e634fb1d22bf..2b823d1943a1 100644 --- a/drivers/gpu/drm/i915/i915_vma.h +++ b/drivers/gpu/drm/i915/i915_vma.h @@ -215,9 +215,10 @@ bool i915_vma_misplaced(const struct i915_vma *vma, u64 size, u64 alignment, u64 flags); void __i915_vma_set_map_and_fenceable(struct i915_vma *vma); void i915_vma_revoke_mmap(struct i915_vma *vma); -void __i915_vma_evict(struct i915_vma *vma); +struct dma_fence *__i915_vma_evict(struct i915_vma *vma, bool async); int __i915_vma_unbind(struct i915_vma *vma); int __must_check i915_vma_unbind(struct i915_vma *vma); +int __must_check i915_vma_unbind_async(struct i915_vma *vma); void i915_vma_unlink_ctx(struct i915_vma *vma); void i915_vma_close(struct i915_vma *vma); void i915_vma_reopen(struct i915_vma *vma); diff --git a/drivers/gpu/drm/i915/i915_vma_resource.c b/drivers/gpu/drm/i915/i915_vma_resource.c index c86db89ab5d2..39ef68a531e3 100644 --- a/drivers/gpu/drm/i915/i915_vma_resource.c +++ b/drivers/gpu/drm/i915/i915_vma_resource.c @@ -2,39 +2,43 @@ /* * Copyright © 2021 Intel Corporation */ + +#include #include +#include "i915_sw_fence.h" #include "i915_vma_resource.h" -/* Callbacks for the unbind dma-fence. */ -static const char *get_driver_name(struct dma_fence *fence) -{ - return "vma unbind fence"; -} +#include "gt/intel_gtt.h" -static const char *get_timeline_name(struct dma_fence *fence) -{ - return "unbound"; -} - -static struct dma_fence_ops unbind_fence_ops = { - .get_driver_name = get_driver_name, - .get_timeline_name = get_timeline_name, -}; +static struct kmem_cache *slab_vma_resources; /** - * __i915_vma_resource_init - Initialize a vma resource. - * @vma_res: The vma resource to initialize + * DOC: + * We use a per-vm interval tree to keep track of vma_resources + * scheduled for unbind but not yet unbound. The tree is protected by + * the vm mutex, and nodes are removed just after the unbind fence signals. + * The removal takes the vm mutex from a kernel thread which we need to + * keep in mind so that we don't grab the mutex and try to wait for all + * pending unbinds to complete, because that will temporaryily block many + * of the workqueue threads, and people will get angry. * - * Initializes the private members of a vma resource. + * We should consider using a single ordered fence per VM instead but that + * requires ordering the unbinds and might introduce unnecessary waiting + * for unrelated unbinds. Amount of code will probably be roughly the same + * due to the simplicity of using the interval tree interface. + * + * Another drawback of this interval tree is that the complexity of insertion + * and removal of fences increases as O(ln(pending_unbinds)) instead of + * O(1) for a single fence without interval tree. */ -void __i915_vma_resource_init(struct i915_vma_resource *vma_res) -{ - spin_lock_init(&vma_res->lock); - dma_fence_init(&vma_res->unbind_fence, &unbind_fence_ops, - &vma_res->lock, 0, 0); - refcount_set(&vma_res->hold_count, 1); -} +#define VMA_RES_START(_node) ((_node)->start) +#define VMA_RES_LAST(_node) ((_node)->start + (_node)->node_size - 1) +INTERVAL_TREE_DEFINE(struct i915_vma_resource, rb, + unsigned long, __subtree_last, + VMA_RES_START, VMA_RES_LAST, static, vma_res_itree); + +/* Callbacks for the unbind dma-fence. */ /** * i915_vma_resource_alloc - Allocate a vma resource @@ -45,15 +49,60 @@ void __i915_vma_resource_init(struct i915_vma_resource *vma_res) struct i915_vma_resource *i915_vma_resource_alloc(void) { struct i915_vma_resource *vma_res = - kzalloc(sizeof(*vma_res), GFP_KERNEL); + kmem_cache_zalloc(slab_vma_resources, GFP_KERNEL); return vma_res ? vma_res : ERR_PTR(-ENOMEM); } +/** + * i915_vma_resource_free - Free a vma resource + * @vma_res: The vma resource to free. + */ +void i915_vma_resource_free(struct i915_vma_resource *vma_res) +{ + kmem_cache_free(slab_vma_resources, vma_res); +} + +static const char *get_driver_name(struct dma_fence *fence) +{ + return "vma unbind fence"; +} + +static const char *get_timeline_name(struct dma_fence *fence) +{ + return "unbound"; +} + +static void unbind_fence_free_rcu(struct rcu_head *head) +{ + struct i915_vma_resource *vma_res = + container_of(head, typeof(*vma_res), unbind_fence.rcu); + + i915_vma_resource_free(vma_res); +} + +static void unbind_fence_release(struct dma_fence *fence) +{ + struct i915_vma_resource *vma_res = + container_of(fence, typeof(*vma_res), unbind_fence); + + i915_sw_fence_fini(&vma_res->chain); + + call_rcu(&fence->rcu, unbind_fence_free_rcu); +} + +static struct dma_fence_ops unbind_fence_ops = { + .get_driver_name = get_driver_name, + .get_timeline_name = get_timeline_name, + .release = unbind_fence_release, +}; + static void __i915_vma_resource_unhold(struct i915_vma_resource *vma_res) { - if (refcount_dec_and_test(&vma_res->hold_count)) - dma_fence_signal(&vma_res->unbind_fence); + if (!refcount_dec_and_test(&vma_res->hold_count)) + return; + + dma_fence_signal(&vma_res->unbind_fence); } /** @@ -102,6 +151,53 @@ bool i915_vma_resource_hold(struct i915_vma_resource *vma_res, return held; } +static void i915_vma_resource_unbind_work(struct work_struct *work) +{ + struct i915_vma_resource *vma_res = + container_of(work, typeof(*vma_res), work); + struct i915_address_space *vm = vma_res->vm; + + if (likely(atomic_read(&vm->open))) + vma_res->ops->unbind_vma(vm, vma_res); + + vma_res->vm = NULL; + if (!RB_EMPTY_NODE(&vma_res->rb)) { + mutex_lock(&vm->mutex); + vma_res_itree_remove(vma_res, &vm->pending_unbind); + mutex_unlock(&vm->mutex); + } + + __i915_vma_resource_unhold(vma_res); + i915_vma_resource_put(vma_res); +} + +static int +i915_vma_resource_fence_notify(struct i915_sw_fence *fence, + enum i915_sw_fence_notify state) +{ + struct i915_vma_resource *vma_res = + container_of(fence, typeof(*vma_res), chain); + struct dma_fence *unbind_fence = + &vma_res->unbind_fence; + + switch (state) { + case FENCE_COMPLETE: + dma_fence_get(unbind_fence); + if (vma_res->immediate_unbind) { + i915_vma_resource_unbind_work(&vma_res->work); + } else { + INIT_WORK(&vma_res->work, i915_vma_resource_unbind_work); + queue_work(system_unbound_wq, &vma_res->work); + } + break; + case FENCE_FREE: + i915_vma_resource_put(vma_res); + break; + } + + return NOTIFY_DONE; +} + /** * i915_vma_resource_unbind - Unbind a vma resource * @vma_res: The vma resource to unbind. @@ -112,10 +208,178 @@ bool i915_vma_resource_hold(struct i915_vma_resource *vma_res, * Return: A refcounted pointer to a dma-fence that signals when unbinding is * complete. */ -struct dma_fence * -i915_vma_resource_unbind(struct i915_vma_resource *vma_res) +struct dma_fence *i915_vma_resource_unbind(struct i915_vma_resource *vma_res) { - __i915_vma_resource_unhold(vma_res); - dma_fence_get(&vma_res->unbind_fence); + /* Reference for the returned fence. */ + i915_vma_resource_get(vma_res); + /* Reference for the sw fence */ + i915_vma_resource_get(vma_res); + + if (atomic_read(&vma_res->chain.pending) <= 1) { + RB_CLEAR_NODE(&vma_res->rb); + vma_res->immediate_unbind = 1; + } else { + vma_res_itree_insert(vma_res, &vma_res->vm->pending_unbind); + } + + i915_sw_fence_commit(&vma_res->chain); + return &vma_res->unbind_fence; } + +/** + * __i915_vma_resource_init - Initialize a vma resource. + * @vma_res: The vma resource to initialize + * + * Initializes the private members of a vma resource. + */ +void __i915_vma_resource_init(struct i915_vma_resource *vma_res) +{ + spin_lock_init(&vma_res->lock); + dma_fence_init(&vma_res->unbind_fence, &unbind_fence_ops, + &vma_res->lock, 0, 0); + refcount_set(&vma_res->hold_count, 1); + i915_sw_fence_init(&vma_res->chain, i915_vma_resource_fence_notify); +} + +/** + * i915_vma_resource_bind_dep_sync - Wait for / sync all unbinds touching a + * certain vm range. + * @vm: The vm to look at. + * @offset: The range start. + * @size: The range size. + * @intr: Whether to wait interrubtible. + * + * The function needs to be called with the vm lock held. + * + * Return: Zero on success, -ERESTARTSYS if interrupted and @intr==true + */ +int i915_vma_resource_bind_dep_sync(struct i915_address_space *vm, + unsigned long offset, + unsigned long size, + bool intr) +{ + struct i915_vma_resource *node; + unsigned long last = offset + size - 1; + + lockdep_assert_held(&vm->mutex); + might_sleep(); + + node = vma_res_itree_iter_first(&vm->pending_unbind, offset, last); + while (node) { + int ret = dma_fence_wait(&node->unbind_fence, intr); + + if (ret) + return ret; + + node = vma_res_itree_iter_next(node, offset, last); + } + + return 0; +} + +/** + * i915_vma_resource_bind_dep_sync_all - Wait for / sync all unbinds of a vm, + * releasing the vm lock while waiting. + * @vm: The vm to look at. + * + * The function may not be called with the vm lock held. + * Typically this is called at vm destruction to finish any pending + * unbind operations. The vm mutex is released while waiting to avoid + * stalling kernel workqueues trying to grab the mutex. + */ +void i915_vma_resource_bind_dep_sync_all(struct i915_address_space *vm) +{ + struct i915_vma_resource *node; + struct dma_fence *fence; + + do { + fence = NULL; + mutex_lock(&vm->mutex); + node = vma_res_itree_iter_first(&vm->pending_unbind, 0, + ULONG_MAX); + if (node) + fence = dma_fence_get_rcu(&node->unbind_fence); + mutex_unlock(&vm->mutex); + + if (fence) { + /* + * The wait makes sure the node eventually removes + * itself from the tree. + */ + dma_fence_wait(fence, false); + dma_fence_put(fence); + } + } while (node); +} + +/** + * i915_vma_resource_bind_dep_await - Have a struct i915_sw_fence await all + * pending unbinds in a certain range of a vm. + * @vm: The vm to look at. + * @sw_fence: The struct i915_sw_fence that will be awaiting the unbinds. + * @offset: The range start. + * @size: The range size. + * @intr: Whether to wait interrubtible. + * @gfp: Allocation mode for memory allocations. + * + * The function makes @sw_fence await all pending unbinds in a certain + * vm range before calling the complete notifier. To be able to await + * each individual unbind, the function needs to allocate memory using + * the @gpf allocation mode. If that fails, the function will instead + * wait for the unbind fence to signal, using @intr to judge whether to + * wait interruptible or not. Note that @gfp should ideally be selected so + * as to avoid any expensive memory allocation stalls and rather fail and + * synchronize itself. For now the vm mutex is required when calling this + * function with means that @gfp can't call into direct reclaim. In reality + * this means that during heavy memory pressure, we will sync in this + * function. + * + * Return: Zero on success, -ERESTARTSYS if interrupted and @intr==true + */ +int i915_vma_resource_bind_dep_await(struct i915_address_space *vm, + struct i915_sw_fence *sw_fence, + unsigned long offset, + unsigned long size, + bool intr, + gfp_t gfp) +{ + struct i915_vma_resource *node; + unsigned long last = offset + size - 1; + + lockdep_assert_held(&vm->mutex); + might_alloc(gfp); + might_sleep(); + + node = vma_res_itree_iter_first(&vm->pending_unbind, offset, last); + while (node) { + int ret; + + ret = i915_sw_fence_await_dma_fence(sw_fence, + &node->unbind_fence, + MAX_SCHEDULE_TIMEOUT, gfp); + if (ret) + ret = dma_fence_wait(&node->unbind_fence, intr); + if (ret) + return ret; + + node = vma_res_itree_iter_next(node, offset, last); + } + + return 0; +} + +void i915_vma_resource_module_exit(void) +{ + kmem_cache_destroy(slab_vma_resources); +} + +int __init i915_vma_resource_module_init(void) +{ + pr_err("vma resource size is %lu\n", sizeof(struct i915_vma_resource)); + slab_vma_resources = KMEM_CACHE(i915_vma_resource, SLAB_HWCACHE_ALIGN); + if (!slab_vma_resources) + return -ENOMEM; + + return 0; +} diff --git a/drivers/gpu/drm/i915/i915_vma_resource.h b/drivers/gpu/drm/i915/i915_vma_resource.h index 9872de58268b..26914558bf6f 100644 --- a/drivers/gpu/drm/i915/i915_vma_resource.h +++ b/drivers/gpu/drm/i915/i915_vma_resource.h @@ -10,6 +10,7 @@ #include #include "i915_gem.h" +#include "i915_sw_fence.h" struct i915_page_sizes { /** @@ -39,6 +40,13 @@ struct i915_page_sizes { * @hold_count: Number of holders blocking the fence from finishing. * The vma itself is keeping a hold, which is released when unbind * is scheduled. + * @work: Work struct for deferred unbind work. + * @chain: Pointer to struct i915_sw_fence used to await dependencies. + * @rb: Rb node for the vm's pending unbind interval tree. + * @__subtree_last: Interval tree private member. + * @vm: non-refcounted pointer to the vm. This is for internal use only and + * this member is cleared after vm_resource unbind. + * @ops: Pointer to the backend i915_vma_ops. * @private: Bind backend private info. * @start: Offset into the address space of bind range start. * @node_size: Size of the allocated range manager node. @@ -46,6 +54,8 @@ struct i915_page_sizes { * @page_sizes_gtt: Resulting page sizes from the bind operation. * @bound_flags: Flags indicating binding status. * @allocated: Backend private data. TODO: Should move into @private. + * @immediate_unbind: Unbind can be done immediately and don't need to be + * deferred to a work item awaiting unsignaled fences. * * The lifetime of a struct i915_vma_resource is from a binding request to * the actual possible asynchronous unbind has completed. @@ -55,6 +65,11 @@ struct i915_vma_resource { /* See above for description of the lock. */ spinlock_t lock; refcount_t hold_count; + struct work_struct work; + struct i915_sw_fence chain; + struct rb_node rb; + unsigned long __subtree_last; + struct i915_address_space *vm; /** * struct i915_vma_bindinfo - Information needed for async bind @@ -74,13 +89,16 @@ struct i915_vma_resource { bool lmem:1; } bi; + const struct i915_vma_ops *ops; void *private; unsigned long start; unsigned long node_size; unsigned long vma_size; u32 page_sizes_gtt; + u32 bound_flags; bool allocated:1; + bool immediate_unbind:1; }; bool i915_vma_resource_hold(struct i915_vma_resource *vma_res, @@ -91,6 +109,8 @@ void i915_vma_resource_unhold(struct i915_vma_resource *vma_res, struct i915_vma_resource *i915_vma_resource_alloc(void); +void i915_vma_resource_free(struct i915_vma_resource *vma_res); + struct dma_fence *i915_vma_resource_unbind(struct i915_vma_resource *vma_res); void __i915_vma_resource_init(struct i915_vma_resource *vma_res); @@ -120,10 +140,12 @@ static inline void i915_vma_resource_put(struct i915_vma_resource *vma_res) /** * i915_vma_resource_init - Initialize a vma resource. * @vma_res: The vma resource to initialize + * @vm: Pointer to the vm. * @pages: The pages sg-table. * @page_sizes: Page sizes of the pages. * @readonly: Whether the vma should be bound read-only. * @lmem: Whether the vma points to lmem. + * @ops: The backend ops. * @private: Bind backend private info. * @start: Offset into the address space of bind range start. * @node_size: Size of the allocated range manager node. @@ -135,20 +157,24 @@ static inline void i915_vma_resource_put(struct i915_vma_resource *vma_res) * allocation is not allowed. */ static inline void i915_vma_resource_init(struct i915_vma_resource *vma_res, + struct i915_address_space *vm, struct sg_table *pages, const struct i915_page_sizes *page_sizes, bool readonly, bool lmem, + const struct i915_vma_ops *ops, void *private, unsigned long start, unsigned long node_size, unsigned long size) { __i915_vma_resource_init(vma_res); + vma_res->vm = vm; vma_res->bi.pages = pages; vma_res->bi.page_sizes = *page_sizes; vma_res->bi.readonly = readonly; vma_res->bi.lmem = lmem; + vma_res->ops = ops; vma_res->private = private; vma_res->start = start; vma_res->node_size = node_size; @@ -158,6 +184,25 @@ static inline void i915_vma_resource_init(struct i915_vma_resource *vma_res, static inline void i915_vma_resource_fini(struct i915_vma_resource *vma_res) { GEM_BUG_ON(refcount_read(&vma_res->hold_count) != 1); + i915_sw_fence_fini(&vma_res->chain); } +int i915_vma_resource_bind_dep_sync(struct i915_address_space *vm, + unsigned long first, + unsigned long last, + bool intr); + +int i915_vma_resource_bind_dep_await(struct i915_address_space *vm, + struct i915_sw_fence *sw_fence, + unsigned long first, + unsigned long last, + bool intr, + gfp_t gfp); + +void i915_vma_resource_bind_dep_sync_all(struct i915_address_space *vm); + +void i915_vma_resource_module_exit(void); + +int i915_vma_resource_module_init(void); + #endif From patchwork Fri Dec 17 09:19:29 2021 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 8bit X-Patchwork-Submitter: Thomas Hellstrom X-Patchwork-Id: 12683963 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from gabe.freedesktop.org (gabe.freedesktop.org [131.252.210.177]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.lore.kernel.org (Postfix) with ESMTPS id C5DB6C433EF for ; Fri, 17 Dec 2021 09:20:13 +0000 (UTC) Received: from gabe.freedesktop.org (localhost [127.0.0.1]) by gabe.freedesktop.org (Postfix) with ESMTP id 0638C10FAA0; Fri, 17 Dec 2021 09:19:55 +0000 (UTC) Received: from mga06.intel.com (mga06.intel.com [134.134.136.31]) by gabe.freedesktop.org (Postfix) with ESMTPS id 899DC10FAA0; Fri, 17 Dec 2021 09:19:52 +0000 (UTC) X-IronPort-AV: E=McAfee;i="6200,9189,10200"; a="300493350" X-IronPort-AV: E=Sophos;i="5.88,213,1635231600"; d="scan'208";a="300493350" Received: from orsmga005.jf.intel.com ([10.7.209.41]) by orsmga104.jf.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 17 Dec 2021 01:19:52 -0800 X-IronPort-AV: E=Sophos;i="5.88,213,1635231600"; d="scan'208";a="683321562" Received: from olindum-mobl1.ger.corp.intel.com (HELO thellstr-mobl1.intel.com) ([10.249.254.180]) by orsmga005-auth.jf.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 17 Dec 2021 01:19:50 -0800 From: =?utf-8?q?Thomas_Hellstr=C3=B6m?= To: intel-gfx@lists.freedesktop.org, dri-devel@lists.freedesktop.org Date: Fri, 17 Dec 2021 10:19:29 +0100 Message-Id: <20211217091929.105781-8-thomas.hellstrom@linux.intel.com> X-Mailer: git-send-email 2.31.1 In-Reply-To: <20211217091929.105781-1-thomas.hellstrom@linux.intel.com> References: <20211217091929.105781-1-thomas.hellstrom@linux.intel.com> MIME-Version: 1.0 Subject: [Intel-gfx] [PATCH v2 7/7] drm/i915: Use struct vma_resource instead of struct vma_snapshot X-BeenThere: intel-gfx@lists.freedesktop.org X-Mailman-Version: 2.1.29 Precedence: list List-Id: Intel graphics driver community testing & development List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Cc: =?utf-8?q?Thomas_Hellstr=C3=B6m?= , matthew.auld@intel.com Errors-To: intel-gfx-bounces@lists.freedesktop.org Sender: "Intel-gfx" There is always a struct vma_resource guaranteed to be alive when we access a corresponding struct vma_snapshot. So ditch the latter and instead of allocating vma_snapshots, reference the already existning vma_resource. This requires a couple of extra members in struct vma_resource but that's a small price to pay for the simplification. v2: - Fix a missing include and declaration (kernel test robot ) Signed-off-by: Thomas Hellström --- drivers/gpu/drm/i915/Makefile | 1 - .../gpu/drm/i915/gem/i915_gem_execbuffer.c | 15 +-- drivers/gpu/drm/i915/gt/intel_engine_cs.c | 9 +- drivers/gpu/drm/i915/i915_gpu_error.c | 87 ++++++------ drivers/gpu/drm/i915/i915_request.c | 12 +- drivers/gpu/drm/i915/i915_request.h | 6 +- drivers/gpu/drm/i915/i915_vma.c | 14 +- drivers/gpu/drm/i915/i915_vma_resource.c | 3 + drivers/gpu/drm/i915/i915_vma_resource.h | 28 +++- drivers/gpu/drm/i915/i915_vma_snapshot.c | 125 ------------------ drivers/gpu/drm/i915/i915_vma_snapshot.h | 101 -------------- 11 files changed, 88 insertions(+), 313 deletions(-) delete mode 100644 drivers/gpu/drm/i915/i915_vma_snapshot.c delete mode 100644 drivers/gpu/drm/i915/i915_vma_snapshot.h diff --git a/drivers/gpu/drm/i915/Makefile b/drivers/gpu/drm/i915/Makefile index 98433ad74194..aa86ac33effc 100644 --- a/drivers/gpu/drm/i915/Makefile +++ b/drivers/gpu/drm/i915/Makefile @@ -175,7 +175,6 @@ i915-y += \ i915_ttm_buddy_manager.o \ i915_vma.o \ i915_vma_resource.o \ - i915_vma_snapshot.o \ intel_wopcm.o # general-purpose microcontroller (GuC) support diff --git a/drivers/gpu/drm/i915/gem/i915_gem_execbuffer.c b/drivers/gpu/drm/i915/gem/i915_gem_execbuffer.c index b6faae1f9081..51649bbb8cc3 100644 --- a/drivers/gpu/drm/i915/gem/i915_gem_execbuffer.c +++ b/drivers/gpu/drm/i915/gem/i915_gem_execbuffer.c @@ -29,7 +29,6 @@ #include "i915_gem_ioctls.h" #include "i915_trace.h" #include "i915_user_extensions.h" -#include "i915_vma_snapshot.h" struct eb_vma { struct i915_vma *vma; @@ -1952,7 +1951,6 @@ static void eb_capture_stage(struct i915_execbuffer *eb) { const unsigned int count = eb->buffer_count; unsigned int i = count, j; - struct i915_vma_snapshot *vsnap; while (i--) { struct eb_vma *ev = &eb->vma[i]; @@ -1962,11 +1960,6 @@ static void eb_capture_stage(struct i915_execbuffer *eb) if (!(flags & EXEC_OBJECT_CAPTURE)) continue; - vsnap = i915_vma_snapshot_alloc(GFP_KERNEL); - if (!vsnap) - continue; - - i915_vma_snapshot_init(vsnap, vma, "user"); for_each_batch_create_order(eb, j) { struct i915_capture_list *capture; @@ -1975,10 +1968,9 @@ static void eb_capture_stage(struct i915_execbuffer *eb) continue; capture->next = eb->capture_lists[j]; - capture->vma_snapshot = i915_vma_snapshot_get(vsnap); + capture->vma_res = i915_vma_resource_get(vma->resource); eb->capture_lists[j] = capture; } - i915_vma_snapshot_put(vsnap); } } @@ -3281,9 +3273,8 @@ eb_requests_create(struct i915_execbuffer *eb, struct dma_fence *in_fence, * _onstack interface. */ if (eb->batches[i]->vma) - i915_vma_snapshot_init_onstack(&eb->requests[i]->batch_snapshot, - eb->batches[i]->vma, - "batch"); + eb->requests[i]->batch_res = + i915_vma_resource_get(eb->batches[i]->vma->resource); if (eb->batch_pool) { GEM_BUG_ON(intel_context_is_parallel(eb->context)); intel_gt_buffer_pool_mark_active(eb->batch_pool, diff --git a/drivers/gpu/drm/i915/gt/intel_engine_cs.c b/drivers/gpu/drm/i915/gt/intel_engine_cs.c index 74aa90587061..d1daa4cc2895 100644 --- a/drivers/gpu/drm/i915/gt/intel_engine_cs.c +++ b/drivers/gpu/drm/i915/gt/intel_engine_cs.c @@ -1708,18 +1708,15 @@ static void intel_engine_print_registers(struct intel_engine_cs *engine, static void print_request_ring(struct drm_printer *m, struct i915_request *rq) { - struct i915_vma_snapshot *vsnap = &rq->batch_snapshot; + struct i915_vma_resource *vma_res = rq->batch_res; void *ring; int size; - if (!i915_vma_snapshot_present(vsnap)) - vsnap = NULL; - drm_printf(m, "[head %04x, postfix %04x, tail %04x, batch 0x%08x_%08x]:\n", rq->head, rq->postfix, rq->tail, - vsnap ? upper_32_bits(vsnap->vma_resource->start) : ~0u, - vsnap ? lower_32_bits(vsnap->vma_resource->start) : ~0u); + vma_res ? upper_32_bits(vma_res->start) : ~0u, + vma_res ? lower_32_bits(vma_res->start) : ~0u); size = rq->tail - rq->head; if (rq->tail < rq->head) diff --git a/drivers/gpu/drm/i915/i915_gpu_error.c b/drivers/gpu/drm/i915/i915_gpu_error.c index e5bb6ecf1202..e07802eadfd0 100644 --- a/drivers/gpu/drm/i915/i915_gpu_error.c +++ b/drivers/gpu/drm/i915/i915_gpu_error.c @@ -48,7 +48,6 @@ #include "i915_gpu_error.h" #include "i915_memcpy.h" #include "i915_scatterlist.h" -#include "i915_vma_snapshot.h" #define ALLOW_FAIL (__GFP_KSWAPD_RECLAIM | __GFP_RETRY_MAYFAIL | __GFP_NOWARN) #define ATOMIC_MAYFAIL (GFP_ATOMIC | __GFP_NOWARN) @@ -1013,8 +1012,10 @@ void __i915_gpu_coredump_free(struct kref *error_ref) static struct i915_vma_coredump * i915_vma_coredump_create(const struct intel_gt *gt, - const struct i915_vma_snapshot *vsnap, - struct i915_vma_compress *compress) + const struct i915_vma_resource *vma_res, + struct i915_vma_compress *compress, + const char *name) + { struct i915_ggtt *ggtt = gt->ggtt; const u64 slot = ggtt->error_capture.start; @@ -1024,7 +1025,7 @@ i915_vma_coredump_create(const struct intel_gt *gt, might_sleep(); - if (!vsnap || !vsnap->pages || !compress) + if (!vma_res || !vma_res->bi.pages || !compress) return NULL; dst = kmalloc(sizeof(*dst), ALLOW_FAIL); @@ -1037,12 +1038,12 @@ i915_vma_coredump_create(const struct intel_gt *gt, } INIT_LIST_HEAD(&dst->page_list); - strcpy(dst->name, vsnap->name); + strcpy(dst->name, name); dst->next = NULL; - dst->gtt_offset = vsnap->vma_resource->start; - dst->gtt_size = vsnap->vma_resource->node_size; - dst->gtt_page_sizes = vsnap->vma_resource->page_sizes_gtt; + dst->gtt_offset = vma_res->start; + dst->gtt_size = vma_res->node_size; + dst->gtt_page_sizes = vma_res->page_sizes_gtt; dst->unused = 0; ret = -EINVAL; @@ -1050,7 +1051,7 @@ i915_vma_coredump_create(const struct intel_gt *gt, void __iomem *s; dma_addr_t dma; - for_each_sgt_daddr(dma, iter, vsnap->pages) { + for_each_sgt_daddr(dma, iter, vma_res->bi.pages) { mutex_lock(&ggtt->error_mutex); ggtt->vm.insert_page(&ggtt->vm, dma, slot, I915_CACHE_NONE, 0); @@ -1068,11 +1069,11 @@ i915_vma_coredump_create(const struct intel_gt *gt, if (ret) break; } - } else if (vsnap->mr && vsnap->mr->type != INTEL_MEMORY_SYSTEM) { - struct intel_memory_region *mem = vsnap->mr; + } else if (vma_res->bi.lmem) { + struct intel_memory_region *mem = vma_res->mr; dma_addr_t dma; - for_each_sgt_daddr(dma, iter, vsnap->pages) { + for_each_sgt_daddr(dma, iter, vma_res->bi.pages) { void __iomem *s; s = io_mapping_map_wc(&mem->iomap, @@ -1088,7 +1089,7 @@ i915_vma_coredump_create(const struct intel_gt *gt, } else { struct page *page; - for_each_sgt_page(page, iter, vsnap->pages) { + for_each_sgt_page(page, iter, vma_res->bi.pages) { void *s; drm_clflush_pages(&page, 1); @@ -1324,33 +1325,32 @@ static bool record_context(struct i915_gem_context_coredump *e, struct intel_engine_capture_vma { struct intel_engine_capture_vma *next; - struct i915_vma_snapshot *vsnap; + struct i915_vma_resource *vma_res; char name[16]; bool lockdep_cookie; }; static struct intel_engine_capture_vma * capture_vma_snapshot(struct intel_engine_capture_vma *next, - struct i915_vma_snapshot *vsnap, - gfp_t gfp) + struct i915_vma_resource *vma_res, + gfp_t gfp, const char *name) { struct intel_engine_capture_vma *c; - if (!i915_vma_snapshot_present(vsnap)) + if (!vma_res) return next; c = kmalloc(sizeof(*c), gfp); if (!c) return next; - if (!i915_vma_snapshot_resource_pin(vsnap, &c->lockdep_cookie)) { + if (!i915_vma_resource_hold(vma_res, &c->lockdep_cookie)) { kfree(c); return next; } - strcpy(c->name, vsnap->name); - c->vsnap = vsnap; - i915_vma_snapshot_get(vsnap); + strcpy(c->name, name); + c->vma_res = i915_vma_resource_get(vma_res); c->next = next; return c; @@ -1362,8 +1362,6 @@ capture_vma(struct intel_engine_capture_vma *next, const char *name, gfp_t gfp) { - struct i915_vma_snapshot *vsnap; - if (!vma) return next; @@ -1372,19 +1370,10 @@ capture_vma(struct intel_engine_capture_vma *next, * to a struct i915_vma_snapshot at command submission time. * Not here. */ - GEM_WARN_ON(!i915_vma_is_pinned(vma)); - if (!i915_vma_is_pinned(vma)) - return next; - - vsnap = i915_vma_snapshot_alloc(gfp); - if (!vsnap) + if (GEM_WARN_ON(!i915_vma_is_pinned(vma))) return next; - i915_vma_snapshot_init(vsnap, vma, name); - next = capture_vma_snapshot(next, vsnap, gfp); - - /* FIXME: Replace on async unbind. */ - i915_vma_snapshot_put(vsnap); + next = capture_vma_snapshot(next, vma->resource, gfp, name); return next; } @@ -1397,7 +1386,8 @@ capture_user(struct intel_engine_capture_vma *capture, struct i915_capture_list *c; for (c = rq->capture_list; c; c = c->next) - capture = capture_vma_snapshot(capture, c->vma_snapshot, gfp); + capture = capture_vma_snapshot(capture, c->vma_res, gfp, + "user"); return capture; } @@ -1415,16 +1405,19 @@ static struct i915_vma_coredump * create_vma_coredump(const struct intel_gt *gt, struct i915_vma *vma, const char *name, struct i915_vma_compress *compress) { - struct i915_vma_coredump *ret; - struct i915_vma_snapshot tmp; + struct i915_vma_coredump *ret = NULL; + struct i915_vma_resource *vma_res; + bool lockdep_cookie; if (!vma) return NULL; - GEM_WARN_ON(!i915_vma_is_pinned(vma)); - i915_vma_snapshot_init_onstack(&tmp, vma, name); - ret = i915_vma_coredump_create(gt, &tmp, compress); - i915_vma_snapshot_put_onstack(&tmp); + vma_res = vma->resource; + + if (i915_vma_resource_hold(vma_res, &lockdep_cookie)) { + ret = i915_vma_coredump_create(gt, vma_res, compress, name); + i915_vma_resource_unhold(vma_res, lockdep_cookie); + } return ret; } @@ -1471,7 +1464,7 @@ intel_engine_coredump_add_request(struct intel_engine_coredump *ee, * as the simplest method to avoid being overwritten * by userspace. */ - vma = capture_vma_snapshot(vma, &rq->batch_snapshot, gfp); + vma = capture_vma_snapshot(vma, rq->batch_res, gfp, "batch"); vma = capture_user(vma, rq, gfp); vma = capture_vma(vma, rq->ring->vma, "ring", gfp); vma = capture_vma(vma, rq->context->state, "HW context", gfp); @@ -1492,14 +1485,14 @@ intel_engine_coredump_add_vma(struct intel_engine_coredump *ee, while (capture) { struct intel_engine_capture_vma *this = capture; - struct i915_vma_snapshot *vsnap = this->vsnap; + struct i915_vma_resource *vma_res = this->vma_res; add_vma(ee, - i915_vma_coredump_create(engine->gt, - vsnap, compress)); + i915_vma_coredump_create(engine->gt, vma_res, + compress, this->name)); - i915_vma_snapshot_resource_unpin(vsnap, this->lockdep_cookie); - i915_vma_snapshot_put(vsnap); + i915_vma_resource_unhold(vma_res, this->lockdep_cookie); + i915_vma_resource_put(vma_res); capture = this->next; kfree(this); diff --git a/drivers/gpu/drm/i915/i915_request.c b/drivers/gpu/drm/i915/i915_request.c index 76cf5ac91e94..ba3a70b2cc57 100644 --- a/drivers/gpu/drm/i915/i915_request.c +++ b/drivers/gpu/drm/i915/i915_request.c @@ -116,8 +116,10 @@ static void i915_fence_release(struct dma_fence *fence) rq->guc_prio != GUC_PRIO_FINI); i915_request_free_capture_list(fetch_and_zero(&rq->capture_list)); - if (i915_vma_snapshot_present(&rq->batch_snapshot)) - i915_vma_snapshot_put_onstack(&rq->batch_snapshot); + if (rq->batch_res) { + i915_vma_resource_put(rq->batch_res); + rq->batch_res = NULL; + } /* * The request is put onto a RCU freelist (i.e. the address @@ -308,7 +310,7 @@ void i915_request_free_capture_list(struct i915_capture_list *capture) while (capture) { struct i915_capture_list *next = capture->next; - i915_vma_snapshot_put(capture->vma_snapshot); + i915_vma_resource_put(capture->vma_res); kfree(capture); capture = next; } @@ -854,7 +856,7 @@ static void __i915_request_ctor(void *arg) i915_sw_fence_init(&rq->semaphore, semaphore_notify); clear_capture_list(rq); - rq->batch_snapshot.present = false; + rq->batch_res = NULL; init_llist_head(&rq->execute_cb); } @@ -960,7 +962,7 @@ __i915_request_create(struct intel_context *ce, gfp_t gfp) __rq_init_watchdog(rq); assert_capture_list_is_null(rq); GEM_BUG_ON(!llist_empty(&rq->execute_cb)); - GEM_BUG_ON(i915_vma_snapshot_present(&rq->batch_snapshot)); + GEM_BUG_ON(rq->batch_res); /* * Reserve space in the ring buffer for all the commands required to diff --git a/drivers/gpu/drm/i915/i915_request.h b/drivers/gpu/drm/i915/i915_request.h index 170ee78c2858..28b1f9db5487 100644 --- a/drivers/gpu/drm/i915/i915_request.h +++ b/drivers/gpu/drm/i915/i915_request.h @@ -40,7 +40,7 @@ #include "i915_scheduler.h" #include "i915_selftest.h" #include "i915_sw_fence.h" -#include "i915_vma_snapshot.h" +#include "i915_vma_resource.h" #include @@ -52,7 +52,7 @@ struct i915_request; #if IS_ENABLED(CONFIG_DRM_I915_CAPTURE_ERROR) struct i915_capture_list { - struct i915_vma_snapshot *vma_snapshot; + struct i915_vma_resource *vma_res; struct i915_capture_list *next; }; @@ -300,7 +300,7 @@ struct i915_request { /** Batch buffer pointer for selftest internal use. */ I915_SELFTEST_DECLARE(struct i915_vma *batch); - struct i915_vma_snapshot batch_snapshot; + struct i915_vma_resource *batch_res; #if IS_ENABLED(CONFIG_DRM_I915_CAPTURE_ERROR) /** diff --git a/drivers/gpu/drm/i915/i915_vma.c b/drivers/gpu/drm/i915/i915_vma.c index 3c4389aac53a..f417939d91a8 100644 --- a/drivers/gpu/drm/i915/i915_vma.c +++ b/drivers/gpu/drm/i915/i915_vma.c @@ -289,7 +289,6 @@ struct i915_vma_work { struct i915_vma_resource *vma_res; struct drm_i915_gem_object *pinned; struct i915_sw_dma_fence_cb cb; - struct i915_refct_sgt *rsgt; enum i915_cache_level cache_level; unsigned int flags; }; @@ -317,8 +316,6 @@ static void __vma_release(struct dma_fence_work *work) i915_vm_put(vw->vm); if (vw->vma_res) i915_vma_resource_put(vw->vma_res); - if (vw->rsgt) - i915_refct_sgt_put(vw->rsgt); } static const struct dma_fence_work_ops bind_ops = { @@ -393,8 +390,8 @@ i915_vma_resource_init_from_vma(struct i915_vma_resource *vma_res, struct drm_i915_gem_object *obj = vma->obj; i915_vma_resource_init(vma_res, vma->vm, vma->pages, &vma->page_sizes, - i915_gem_object_is_readonly(obj), - i915_gem_object_is_lmem(obj), + obj->mm.rsgt, i915_gem_object_is_readonly(obj), + i915_gem_object_is_lmem(obj), obj->mm.region, vma->ops, vma->private, vma->node.start, vma->node.size, vma->size); } @@ -485,8 +482,6 @@ int i915_vma_bind(struct i915_vma *vma, work->vma_res = i915_vma_resource_get(vma->resource); work->cache_level = cache_level; work->flags = bind_flags; - if (vma->obj->mm.rsgt) - work->rsgt = i915_refct_sgt_get(vma->obj->mm.rsgt); /* * Note we only want to chain up to the migration fence on @@ -1425,7 +1420,7 @@ struct dma_fence *__i915_vma_evict(struct i915_vma *vma, bool async) GEM_BUG_ON(i915_vma_has_userfault(vma)); /* Object backend must be async capable. */ - GEM_WARN_ON(async && !vma->obj->mm.rsgt); + GEM_WARN_ON(async && !vma->resource->bi.pages_rsgt); trace_i915_vma_unbind(vma); unbind_fence = i915_vma_resource_unbind(vma->resource); @@ -1434,9 +1429,6 @@ struct dma_fence *__i915_vma_evict(struct i915_vma *vma, bool async) atomic_and(~(I915_VMA_BIND_MASK | I915_VMA_ERROR | I915_VMA_GGTT_WRITE), &vma->flags); - /* Object backend must be async capable. */ - GEM_WARN_ON(async && !vma->obj->mm.rsgt); - i915_vma_detach(vma); if (!async && unbind_fence) { diff --git a/drivers/gpu/drm/i915/i915_vma_resource.c b/drivers/gpu/drm/i915/i915_vma_resource.c index 39ef68a531e3..763b246a00c7 100644 --- a/drivers/gpu/drm/i915/i915_vma_resource.c +++ b/drivers/gpu/drm/i915/i915_vma_resource.c @@ -8,6 +8,7 @@ #include "i915_sw_fence.h" #include "i915_vma_resource.h" +#include "intel_memory_region.h" #include "gt/intel_gtt.h" @@ -103,6 +104,8 @@ static void __i915_vma_resource_unhold(struct i915_vma_resource *vma_res) return; dma_fence_signal(&vma_res->unbind_fence); + if (vma_res->bi.pages_rsgt) + i915_refct_sgt_put(vma_res->bi.pages_rsgt); } /** diff --git a/drivers/gpu/drm/i915/i915_vma_resource.h b/drivers/gpu/drm/i915/i915_vma_resource.h index 26914558bf6f..e45e7fb65f96 100644 --- a/drivers/gpu/drm/i915/i915_vma_resource.h +++ b/drivers/gpu/drm/i915/i915_vma_resource.h @@ -10,8 +10,11 @@ #include #include "i915_gem.h" +#include "i915_scatterlist.h" #include "i915_sw_fence.h" +struct intel_memory_region; + struct i915_page_sizes { /** * The sg mask of the pages sg_table. i.e the mask of @@ -46,6 +49,7 @@ struct i915_page_sizes { * @__subtree_last: Interval tree private member. * @vm: non-refcounted pointer to the vm. This is for internal use only and * this member is cleared after vm_resource unbind. + * @mr: The memory region of the object pointed to by the vma. * @ops: Pointer to the backend i915_vma_ops. * @private: Bind backend private info. * @start: Offset into the address space of bind range start. @@ -54,8 +58,10 @@ struct i915_page_sizes { * @page_sizes_gtt: Resulting page sizes from the bind operation. * @bound_flags: Flags indicating binding status. * @allocated: Backend private data. TODO: Should move into @private. - * @immediate_unbind: Unbind can be done immediately and don't need to be - * deferred to a work item awaiting unsignaled fences. + * @immediate_unbind: Unbind can be done immediately and doesn't need to be + * deferred to a work item awaiting unsignaled fences. This is a hack. + * (dma_fence_work uses a fence flag for this, but this seems slightly + * cleaner). * * The lifetime of a struct i915_vma_resource is from a binding request to * the actual possible asynchronous unbind has completed. @@ -79,16 +85,22 @@ struct i915_vma_resource { * and flags * @pages: The pages sg-table. * @page_sizes: Page sizes of the pages. + * @pages_rsgt: Refcounted sg-table when delayed object destruction + * is supported. May be NULL. * @readonly: Whether the vma should be bound read-only. * @lmem: Whether the vma points to lmem. */ struct i915_vma_bindinfo { struct sg_table *pages; struct i915_page_sizes page_sizes; + struct i915_refct_sgt *pages_rsgt; bool readonly:1; bool lmem:1; } bi; +#if IS_ENABLED(CONFIG_DRM_I915_CAPTURE_ERROR) + struct intel_memory_region *mr; +#endif const struct i915_vma_ops *ops; void *private; unsigned long start; @@ -143,8 +155,11 @@ static inline void i915_vma_resource_put(struct i915_vma_resource *vma_res) * @vm: Pointer to the vm. * @pages: The pages sg-table. * @page_sizes: Page sizes of the pages. + * @pages_rsgt: Pointer to a struct i915_refct_sgt of an object with + * delayed destruction. * @readonly: Whether the vma should be bound read-only. * @lmem: Whether the vma points to lmem. + * @mr: The memory region of the object the vma points to. * @ops: The backend ops. * @private: Bind backend private info. * @start: Offset into the address space of bind range start. @@ -160,8 +175,10 @@ static inline void i915_vma_resource_init(struct i915_vma_resource *vma_res, struct i915_address_space *vm, struct sg_table *pages, const struct i915_page_sizes *page_sizes, + struct i915_refct_sgt *pages_rsgt, bool readonly, bool lmem, + struct intel_memory_region *mr, const struct i915_vma_ops *ops, void *private, unsigned long start, @@ -172,8 +189,13 @@ static inline void i915_vma_resource_init(struct i915_vma_resource *vma_res, vma_res->vm = vm; vma_res->bi.pages = pages; vma_res->bi.page_sizes = *page_sizes; + if (pages_rsgt) + vma_res->bi.pages_rsgt = i915_refct_sgt_get(pages_rsgt); vma_res->bi.readonly = readonly; vma_res->bi.lmem = lmem; +#if IS_ENABLED(CONFIG_DRM_I915_CAPTURE_ERROR) + vma_res->mr = mr; +#endif vma_res->ops = ops; vma_res->private = private; vma_res->start = start; @@ -184,6 +206,8 @@ static inline void i915_vma_resource_init(struct i915_vma_resource *vma_res, static inline void i915_vma_resource_fini(struct i915_vma_resource *vma_res) { GEM_BUG_ON(refcount_read(&vma_res->hold_count) != 1); + if (vma_res->bi.pages_rsgt) + i915_refct_sgt_put(vma_res->bi.pages_rsgt); i915_sw_fence_fini(&vma_res->chain); } diff --git a/drivers/gpu/drm/i915/i915_vma_snapshot.c b/drivers/gpu/drm/i915/i915_vma_snapshot.c deleted file mode 100644 index 69f62c1ca967..000000000000 --- a/drivers/gpu/drm/i915/i915_vma_snapshot.c +++ /dev/null @@ -1,125 +0,0 @@ -// SPDX-License-Identifier: MIT -/* - * Copyright © 2021 Intel Corporation - */ - -#include "i915_vma_resource.h" -#include "i915_vma_snapshot.h" -#include "i915_vma_types.h" -#include "i915_vma.h" - -/** - * i915_vma_snapshot_init - Initialize a struct i915_vma_snapshot from - * a struct i915_vma. - * @vsnap: The i915_vma_snapshot to init. - * @vma: A struct i915_vma used to initialize @vsnap. - * @name: Name associated with the snapshot. The character pointer needs to - * stay alive over the lifitime of the shapsot - */ -void i915_vma_snapshot_init(struct i915_vma_snapshot *vsnap, - struct i915_vma *vma, - const char *name) -{ - if (!i915_vma_is_pinned(vma)) - assert_object_held(vma->obj); - - vsnap->name = name; - vsnap->obj_size = vma->obj->base.size; - vsnap->pages = vma->pages; - vsnap->pages_rsgt = NULL; - vsnap->mr = NULL; - if (vma->obj->mm.rsgt) - vsnap->pages_rsgt = i915_refct_sgt_get(vma->obj->mm.rsgt); - vsnap->mr = vma->obj->mm.region; - kref_init(&vsnap->kref); - vsnap->vma_resource = i915_vma_get_current_resource(vma); - vsnap->onstack = false; - vsnap->present = true; -} - -/** - * i915_vma_snapshot_init_onstack - Initialize a struct i915_vma_snapshot from - * a struct i915_vma, but avoid kfreeing it on last put. - * @vsnap: The i915_vma_snapshot to init. - * @vma: A struct i915_vma used to initialize @vsnap. - * @name: Name associated with the snapshot. The character pointer needs to - * stay alive over the lifitime of the shapsot - */ -void i915_vma_snapshot_init_onstack(struct i915_vma_snapshot *vsnap, - struct i915_vma *vma, - const char *name) -{ - i915_vma_snapshot_init(vsnap, vma, name); - vsnap->onstack = true; -} - -static void vma_snapshot_release(struct kref *ref) -{ - struct i915_vma_snapshot *vsnap = - container_of(ref, typeof(*vsnap), kref); - - vsnap->present = false; - i915_vma_resource_put(vsnap->vma_resource); - if (vsnap->pages_rsgt) - i915_refct_sgt_put(vsnap->pages_rsgt); - if (!vsnap->onstack) - kfree(vsnap); -} - -/** - * i915_vma_snapshot_put - Put an i915_vma_snapshot pointer reference - * @vsnap: The pointer reference - */ -void i915_vma_snapshot_put(struct i915_vma_snapshot *vsnap) -{ - kref_put(&vsnap->kref, vma_snapshot_release); -} - -/** - * i915_vma_snapshot_put_onstack - Put an onstcak i915_vma_snapshot pointer - * reference and varify that the structure is released - * @vsnap: The pointer reference - * - * This function is intended to be paired with a i915_vma_init_onstack() - * and should be called before exiting the scope that declared or - * freeing the structure that embedded @vsnap to verify that all references - * have been released. - */ -void i915_vma_snapshot_put_onstack(struct i915_vma_snapshot *vsnap) -{ - if (!kref_put(&vsnap->kref, vma_snapshot_release)) - GEM_BUG_ON(1); -} - -/** - * i915_vma_snapshot_resource_pin - Temporarily block the memory the - * vma snapshot is pointing to from being released. - * @vsnap: The vma snapshot. - * @lockdep_cookie: Pointer to bool needed for lockdep support. This needs - * to be passed to the paired i915_vma_snapshot_resource_unpin. - * - * This function will temporarily try to hold up a fence or similar structure - * and will therefore enter a fence signaling critical section. - * - * Return: true if we succeeded in blocking the memory from being released, - * false otherwise. - */ -bool i915_vma_snapshot_resource_pin(struct i915_vma_snapshot *vsnap, - bool *lockdep_cookie) -{ - return i915_vma_resource_hold(vsnap->vma_resource, lockdep_cookie); -} - -/** - * i915_vma_snapshot_resource_unpin - Unblock vma snapshot memory from - * being released. - * @vsnap: The vma snapshot. - * @lockdep_cookie: Cookie returned from matching i915_vma_resource_pin(). - * - * Might leave a fence signalling critical section and signal a fence. - */ -void i915_vma_snapshot_resource_unpin(struct i915_vma_snapshot *vsnap, - bool lockdep_cookie) -{ - i915_vma_resource_unhold(vsnap->vma_resource, lockdep_cookie); -} diff --git a/drivers/gpu/drm/i915/i915_vma_snapshot.h b/drivers/gpu/drm/i915/i915_vma_snapshot.h deleted file mode 100644 index 1b08ce9f8576..000000000000 --- a/drivers/gpu/drm/i915/i915_vma_snapshot.h +++ /dev/null @@ -1,101 +0,0 @@ -/* SPDX-License-Identifier: MIT */ -/* - * Copyright © 2021 Intel Corporation - */ -#ifndef _I915_VMA_SNAPSHOT_H_ -#define _I915_VMA_SNAPSHOT_H_ - -#include -#include -#include - -struct i915_active; -struct i915_refct_sgt; -struct i915_vma; -struct intel_memory_region; -struct sg_table; - -/** - * DOC: Simple utilities for snapshotting GPU vma metadata, later used for - * error capture. Vi use a separate header for this to avoid issues due to - * recursive header includes. - */ - -/** - * struct i915_vma_snapshot - Snapshot of vma metadata. - * @obj_size: The size of the underlying object in bytes. - * @pages: The struct sg_table pointing to the pages bound. - * @pages_rsgt: The refcounted sg_table holding the reference for @pages if any. - * @mr: The memory region pointed for the pages bound. - * @kref: Reference for this structure. - * @vma_resource: Pointer to the vma resource representing the vma binding. - * @onstack: Whether the structure shouldn't be freed on final put. - * @present: Whether the structure is present and initialized. - */ -struct i915_vma_snapshot { - const char *name; - size_t obj_size; - struct sg_table *pages; - struct i915_refct_sgt *pages_rsgt; - struct intel_memory_region *mr; - struct kref kref; - struct i915_vma_resource *vma_resource; - bool onstack:1; - bool present:1; -}; - -void i915_vma_snapshot_init(struct i915_vma_snapshot *vsnap, - struct i915_vma *vma, - const char *name); - -void i915_vma_snapshot_init_onstack(struct i915_vma_snapshot *vsnap, - struct i915_vma *vma, - const char *name); - -void i915_vma_snapshot_put(struct i915_vma_snapshot *vsnap); - -void i915_vma_snapshot_put_onstack(struct i915_vma_snapshot *vsnap); - -bool i915_vma_snapshot_resource_pin(struct i915_vma_snapshot *vsnap, - bool *lockdep_cookie); - -void i915_vma_snapshot_resource_unpin(struct i915_vma_snapshot *vsnap, - bool lockdep_cookie); - -/** - * i915_vma_snapshot_alloc - Allocate a struct i915_vma_snapshot - * @gfp: Allocation mode. - * - * Return: A pointer to a struct i915_vma_snapshot if successful. - * NULL otherwise. - */ -static inline struct i915_vma_snapshot *i915_vma_snapshot_alloc(gfp_t gfp) -{ - return kmalloc(sizeof(struct i915_vma_snapshot), gfp); -} - -/** - * i915_vma_snapshot_get - Take a reference on a struct i915_vma_snapshot - * - * Return: A pointer to a struct i915_vma_snapshot. - */ -static inline struct i915_vma_snapshot * -i915_vma_snapshot_get(struct i915_vma_snapshot *vsnap) -{ - kref_get(&vsnap->kref); - return vsnap; -} - -/** - * i915_vma_snapshot_present - Whether a struct i915_vma_snapshot is - * present and initialized. - * - * Return: true if present and initialized; false otherwise. - */ -static inline bool -i915_vma_snapshot_present(const struct i915_vma_snapshot *vsnap) -{ - return vsnap && vsnap->present; -} - -#endif