Message ID | 1458636164-4905-1-git-send-email-akash.goel@intel.com (mailing list archive)
State | New, archived
Tim Gore
Intel Corporation (UK) Ltd. - Co. Reg. #1134945 - Pipers Way, Swindon SN3 1RJ

> -----Original Message-----
> From: Intel-gfx [mailto:intel-gfx-bounces@lists.freedesktop.org] On Behalf
> Of akash.goel@intel.com
> Sent: Tuesday, March 22, 2016 8:43 AM
> To: intel-gfx@lists.freedesktop.org
> Cc: Goel, Akash
> Subject: [Intel-gfx] [PATCH v9] drm/i915: Support to enable TRTT on GEN9
>
> From: Akash Goel <akash.goel@intel.com>
>
> Gen9 has additional address translation hardware support in the form of the
> Tiled Resource Translation Table (TR-TT), which provides an extra level of
> abstraction over PPGTT.
> This is useful for mapping Sparse/Tiled texture resources.
> Sparse resources are created as virtual-only allocations. Regions of the
> resource that the application intends to use are bound to physical memory
> on the fly and can be re-bound to different memory allocations over the
> lifetime of the resource.
>
> TR-TT is tightly coupled with PPGTT: a new instance of TR-TT is required
> for a new PPGTT instance, but TR-TT need not be enabled for every context.
> 1/16th of the 48-bit PPGTT space is earmarked for translation by TR-TT;
> which chunk to use is conveyed to HW through a register.
> Any GFX address which lies in that reserved 44-bit range will be translated
> through TR-TT first and then through PPGTT to get the actual physical
> address, so the output of translation from TR-TT will be a PPGTT offset.
>
> TR-TT is constructed as a 3-level tile table. Each tile is 64KB in size,
> which leaves behind 44-16=28 address bits. The 28 bits are partitioned as
> 9+9+10, and each level is contained within a 4KB page, hence L3 and L2 are
> composed of 512 64b entries and L1 is composed of 1024 32b entries.
>
> There is a provision to keep TR-TT tables in virtual space, where the pages
> of the TR-TT tables will be mapped to PPGTT.
> Currently this is the supported mode; in this mode the UMD has full
> control of TR-TT management, with bare minimum support from the KMD.
> So the entries of the L3 table will contain the PPGTT offset of L2 table
> pages; similarly, entries of the L2 table will contain the PPGTT offset of
> L1 table pages. The entries of the L1 table will contain the PPGTT offset
> of the BOs actually backing the Sparse resources.
> The UMD will have to allocate the L3/L2/L1 table pages as regular BOs and
> assign them a PPGTT address through the Soft Pin API (for example, use soft
> pin to assign l3_table_address to the L3 table BO, when used).
> The UMD will also program the entries in the TR-TT page tables using regular
> batch commands (MI_STORE_DATA_IMM), or via mmapping of the page
> table BOs.
> The UMD may do the complete PPGTT address space management, on the
> premise that this could help minimize conflicts.
>
> Any space in the TR-TT segment not bound to any Sparse texture will be
> handled through the Invalid tile; the user is expected to initialize the
> entries of a new L3/L2/L1 table page with the Invalid tile pattern. The
> entries corresponding to the holes in a Sparse texture resource will be set
> with the Null tile pattern.
> Improper programming of TR-TT should only lead to a recoverable GPU
> hang, eventually leading to banning of the culprit context without
> victimizing others.
>
> The association of any Sparse resource with the BOs will be known only to
> the UMD, and only Sparse resources shall be assigned an offset from the
> TR-TT segment by the UMD. The use of the TR-TT segment or mapping of Sparse
> resources will be transparent to the KMD; the UMD will do the address
> assignment from the TR-TT segment autonomously and the KMD will be
> oblivious of it.
> Other objects must not be assigned an address from the TR-TT segment; they
> will be mapped to PPGTT in the regular way by the KMD.
>
> This patch provides an interface through which the UMD can ask the KMD to
> enable TR-TT for a given context.
A new I915_CONTEXT_PARAM_TRTT param
> has been added to the I915_GEM_CONTEXT_SETPARAM ioctl for that purpose.
> The UMD will have to pass the GFX address of the L3 table page and the
> start location of the TR-TT segment, along with the pattern values for the
> Null & Invalid Tile registers.
>
> v2:
> - Support context_getparam for TRTT also and dispense with a separate
>   GETPARAM case for TRTT (Chris).
> - Use i915_dbg to log errors for the invalid TRTT ABI parameters passed
>   from user space (Chris).
> - Move all the argument checking for TRTT in context_setparam to the
>   set_trtt function (Chris).
> - Change the type of 'flags' field inside 'intel_context' to unsigned (Chris).
> - Rename certain functions to rightly reflect their purpose, rename
>   the new param for TRTT in gem_context_param to I915_CONTEXT_PARAM_TRTT,
>   rephrase a few lines in the commit message body, add more comments (Chris).
> - Extend the ABI to allow the user to specify the TRTT segment location also.
> - Fix for selective enabling of TRTT on a per-context basis; explicitly
>   disable TR-TT at the start of a new context.
>
> v3:
> - Check the return value of gen9_emit_trtt_regs (Chris).
> - Update the kernel doc for the intel_context structure.
> - Rebased.
>
> v4:
> - Fix the warnings reported by 'checkpatch.pl --strict' (Michel).
> - Fix the context_getparam implementation, avoiding the reset of the size
>   field affecting the TRTT case.
>
> v5:
> - Update the TR-TT params right away in context_setparam, by constructing
>   & submitting a request emitting LRIs, instead of deferring it and
>   conflating with the next batch submission (Chris).
> - Follow the prescribed struct_mutex handling rules while accessing a
>   user space buffer, both in the context_setparam & getparam functions (Chris).
>
> v6:
> - Fix the warning caused by removal of an un-allocated trtt vma node.
>
> v7:
> - Move context ref/unref to context_setparam_ioctl from set_trtt() & remove
>   that from get_trtt() as not really needed there (Chris).
> - Add a check for improper values for Null & Invalid Tiles. > - Remove superfluous DRM_ERROR from trtt_context_allocate_vma (Chris). > - Rebased. > > v8: > - Add context ref/unref to context_getparam_ioctl also so as to be > consistent > and ease the extension of ioctl in future (Chris) > > v9: > - Fix the handling of return value from trtt_context_allocate_vma() function, > causing kernel panic at the time of destroying context, in case of > unsuccessful allocation of trtt vma. > - Rebased. > > Testcase: igt/gem_trtt > > Cc: Chris Wilson <chris@chris-wilson.co.uk> > Cc: Michel Thierry <michel.thierry@intel.com> > Signed-off-by: Akash Goel <akash.goel@intel.com> > Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk> > --- > drivers/gpu/drm/i915/i915_drv.h | 16 +++- > drivers/gpu/drm/i915/i915_gem_context.c | 157 > +++++++++++++++++++++++++++++++- > drivers/gpu/drm/i915/i915_gem_gtt.c | 65 +++++++++++++ > drivers/gpu/drm/i915/i915_gem_gtt.h | 8 ++ > drivers/gpu/drm/i915/i915_reg.h | 19 ++++ > drivers/gpu/drm/i915/intel_lrc.c | 124 ++++++++++++++++++++++++- > drivers/gpu/drm/i915/intel_lrc.h | 1 + > include/uapi/drm/i915_drm.h | 8 ++ > 8 files changed, 393 insertions(+), 5 deletions(-) > > diff --git a/drivers/gpu/drm/i915/i915_drv.h > b/drivers/gpu/drm/i915/i915_drv.h index ecbd418..272d1f8 100644 > --- a/drivers/gpu/drm/i915/i915_drv.h > +++ b/drivers/gpu/drm/i915/i915_drv.h > @@ -804,6 +804,7 @@ struct i915_ctx_hang_stats { #define > DEFAULT_CONTEXT_HANDLE 0 > > #define CONTEXT_NO_ZEROMAP (1<<0) > +#define CONTEXT_USE_TRTT (1 << 1) > /** > * struct intel_context - as the name implies, represents a context. > * @ref: reference count. > @@ -818,6 +819,8 @@ struct i915_ctx_hang_stats { > * @ppgtt: virtual memory space used by this context. > * @legacy_hw_ctx: render context backing object and whether it is > correctly > * initialized (legacy ring submission mechanism only). 
> + * @trtt_info: Programming parameters for tr-tt (redirection tables for > + * userspace, for sparse resource management) > * @link: link in the global list of contexts. > * > * Contexts are memory images used by the hardware to store copies of > their @@ -828,7 +831,7 @@ struct intel_context { > int user_handle; > uint8_t remap_slice; > struct drm_i915_private *i915; > - int flags; > + unsigned int flags; > struct drm_i915_file_private *file_priv; > struct i915_ctx_hang_stats hang_stats; > struct i915_hw_ppgtt *ppgtt; > @@ -849,6 +852,15 @@ struct intel_context { > uint32_t *lrc_reg_state; > } engine[I915_NUM_ENGINES]; > > + /* TRTT info */ > + struct intel_context_trtt { > + u32 invd_tile_val; > + u32 null_tile_val; > + u64 l3_table_address; > + u64 segment_base_addr; > + struct i915_vma *vma; > + } trtt_info; > + > struct list_head link; > }; > > @@ -2657,6 +2669,8 @@ struct drm_i915_cmd_table { > !IS_VALLEYVIEW(dev) && > !IS_CHERRYVIEW(dev) && \ > !IS_BROXTON(dev)) > > +#define HAS_TRTT(dev) (IS_GEN9(dev)) > + A very minor point, but there is a w/a to disable TRTT for BXT_REVID_A0/1. I realise this is basically obsolete now, but I'm still using one! 
> #define INTEL_PCH_DEVICE_ID_MASK 0xff00 > #define INTEL_PCH_IBX_DEVICE_ID_TYPE 0x3b00 > #define INTEL_PCH_CPT_DEVICE_ID_TYPE 0x1c00 > diff --git a/drivers/gpu/drm/i915/i915_gem_context.c > b/drivers/gpu/drm/i915/i915_gem_context.c > index 394e525..5f28c23 100644 > --- a/drivers/gpu/drm/i915/i915_gem_context.c > +++ b/drivers/gpu/drm/i915/i915_gem_context.c > @@ -133,6 +133,14 @@ static int get_context_size(struct drm_device *dev) > return ret; > } > > +static void intel_context_free_trtt(struct intel_context *ctx) { > + if (!ctx->trtt_info.vma) > + return; > + > + intel_trtt_context_destroy_vma(ctx->trtt_info.vma); > +} > + > static void i915_gem_context_clean(struct intel_context *ctx) { > struct i915_hw_ppgtt *ppgtt = ctx->ppgtt; @@ -164,6 +172,8 @@ > void i915_gem_context_free(struct kref *ctx_ref) > */ > i915_gem_context_clean(ctx); > > + intel_context_free_trtt(ctx); > + > i915_ppgtt_put(ctx->ppgtt); > > if (ctx->legacy_hw_ctx.rcs_state) > @@ -507,6 +517,129 @@ i915_gem_context_get(struct > drm_i915_file_private *file_priv, u32 id) > return ctx; > } > > +static int > +intel_context_get_trtt(struct intel_context *ctx, > + struct drm_i915_gem_context_param *args) { > + struct drm_i915_gem_context_trtt_param trtt_params; > + struct drm_device *dev = ctx->i915->dev; > + > + if (!HAS_TRTT(dev) || !USES_FULL_48BIT_PPGTT(dev)) { > + return -ENODEV; > + } else if (args->size < sizeof(trtt_params)) { > + args->size = sizeof(trtt_params); > + } else { > + trtt_params.segment_base_addr = > + ctx->trtt_info.segment_base_addr; > + trtt_params.l3_table_address = > + ctx->trtt_info.l3_table_address; > + trtt_params.null_tile_val = > + ctx->trtt_info.null_tile_val; > + trtt_params.invd_tile_val = > + ctx->trtt_info.invd_tile_val; > + > + mutex_unlock(&dev->struct_mutex); > + > + if (__copy_to_user(to_user_ptr(args->value), > + &trtt_params, > + sizeof(trtt_params))) { > + mutex_lock(&dev->struct_mutex); > + return -EFAULT; > + } > + > + args->size = sizeof(trtt_params); > + 
mutex_lock(&dev->struct_mutex); > + } > + > + return 0; > +} > + > +static int > +intel_context_set_trtt(struct intel_context *ctx, > + struct drm_i915_gem_context_param *args) { > + struct drm_i915_gem_context_trtt_param trtt_params; > + struct i915_vma *vma; > + struct drm_device *dev = ctx->i915->dev; > + int ret; > + > + if (!HAS_TRTT(dev) || !USES_FULL_48BIT_PPGTT(dev)) > + return -ENODEV; > + else if (ctx->flags & CONTEXT_USE_TRTT) > + return -EEXIST; > + else if (args->size < sizeof(trtt_params)) > + return -EINVAL; > + > + mutex_unlock(&dev->struct_mutex); > + > + if (copy_from_user(&trtt_params, > + to_user_ptr(args->value), > + sizeof(trtt_params))) { > + mutex_lock(&dev->struct_mutex); > + ret = -EFAULT; > + goto exit; > + } > + > + mutex_lock(&dev->struct_mutex); > + > + /* Check if the setup happened from another path */ > + if (ctx->flags & CONTEXT_USE_TRTT) { > + ret = -EEXIST; > + goto exit; > + } > + > + /* basic sanity checks for the segment location & l3 table pointer */ > + if (trtt_params.segment_base_addr & (GEN9_TRTT_SEGMENT_SIZE - > 1)) { > + i915_dbg(dev, "segment base address not correctly > aligned\n"); > + ret = -EINVAL; > + goto exit; > + } > + > + if (((trtt_params.l3_table_address + PAGE_SIZE) >= > + trtt_params.segment_base_addr) && > + (trtt_params.l3_table_address < > + (trtt_params.segment_base_addr + > GEN9_TRTT_SEGMENT_SIZE))) { > + i915_dbg(dev, "l3 table address conflicts with trtt > segment\n"); > + ret = -EINVAL; > + goto exit; > + } > + > + if (trtt_params.l3_table_address & > ~GEN9_TRTT_L3_GFXADDR_MASK) { > + i915_dbg(dev, "invalid l3 table address\n"); > + ret = -EINVAL; > + goto exit; > + } > + > + if (trtt_params.null_tile_val == trtt_params.invd_tile_val) { > + i915_dbg(dev, "incorrect values for null & invalid tiles\n"); > + return -EINVAL; > + } > + > + vma = intel_trtt_context_allocate_vma(&ctx->ppgtt->base, > + trtt_params.segment_base_addr); > + if (IS_ERR(vma)) { > + ret = PTR_ERR(vma); > + goto exit; > + } > + > 
+ ctx->trtt_info.vma = vma; > + ctx->trtt_info.null_tile_val = trtt_params.null_tile_val; > + ctx->trtt_info.invd_tile_val = trtt_params.invd_tile_val; > + ctx->trtt_info.l3_table_address = trtt_params.l3_table_address; > + ctx->trtt_info.segment_base_addr = > trtt_params.segment_base_addr; > + > + ret = intel_lr_rcs_context_setup_trtt(ctx); > + if (ret) { > + intel_trtt_context_destroy_vma(ctx->trtt_info.vma); > + goto exit; > + } > + > + ctx->flags |= CONTEXT_USE_TRTT; > + > +exit: > + return ret; > +} > + > static inline int > mi_set_context(struct drm_i915_gem_request *req, u32 hw_flags) { @@ - > 931,7 +1064,14 @@ int i915_gem_context_getparam_ioctl(struct drm_device > *dev, void *data, > return PTR_ERR(ctx); > } > > - args->size = 0; > + /* > + * Take a reference also, as in certain cases we have to release & > + * reacquire the struct_mutex and we don't want the context to > + * go away. > + */ > + i915_gem_context_reference(ctx); > + > + args->size = (args->param != I915_CONTEXT_PARAM_TRTT) ? 0 : > +args->size; > switch (args->param) { > case I915_CONTEXT_PARAM_BAN_PERIOD: > args->value = ctx->hang_stats.ban_period_seconds; > @@ -947,10 +1087,14 @@ int i915_gem_context_getparam_ioctl(struct > drm_device *dev, void *data, > else > args->value = to_i915(dev)->ggtt.base.total; > break; > + case I915_CONTEXT_PARAM_TRTT: > + ret = intel_context_get_trtt(ctx, args); > + break; > default: > ret = -EINVAL; > break; > } > + i915_gem_context_unreference(ctx); > mutex_unlock(&dev->struct_mutex); > > return ret; > @@ -974,6 +1118,13 @@ int i915_gem_context_setparam_ioctl(struct > drm_device *dev, void *data, > return PTR_ERR(ctx); > } > > + /* > + * Take a reference also, as in certain cases we have to release & > + * reacquire the struct_mutex and we don't want the context to > + * go away. 
> + */ > + i915_gem_context_reference(ctx); > + > switch (args->param) { > case I915_CONTEXT_PARAM_BAN_PERIOD: > if (args->size) > @@ -992,10 +1143,14 @@ int i915_gem_context_setparam_ioctl(struct > drm_device *dev, void *data, > ctx->flags |= args->value ? CONTEXT_NO_ZEROMAP : > 0; > } > break; > + case I915_CONTEXT_PARAM_TRTT: > + ret = intel_context_set_trtt(ctx, args); > + break; > default: > ret = -EINVAL; > break; > } > + i915_gem_context_unreference(ctx); > mutex_unlock(&dev->struct_mutex); > > return ret; > diff --git a/drivers/gpu/drm/i915/i915_gem_gtt.c > b/drivers/gpu/drm/i915/i915_gem_gtt.c > index 0715bb7..cbf8a03 100644 > --- a/drivers/gpu/drm/i915/i915_gem_gtt.c > +++ b/drivers/gpu/drm/i915/i915_gem_gtt.c > @@ -2169,6 +2169,17 @@ int i915_ppgtt_init_hw(struct drm_device *dev) { > gtt_write_workarounds(dev); > > + if (HAS_TRTT(dev) && USES_FULL_48BIT_PPGTT(dev)) { > + struct drm_i915_private *dev_priv = dev->dev_private; > + /* > + * Globally enable TR-TT support in Hw. > + * Still TR-TT enabling on per context basis is required. > + * Non-trtt contexts are not affected by this setting. > + */ > + I915_WRITE(GEN9_TR_CHICKEN_BIT_VECTOR, > + GEN9_TRTT_BYPASS_DISABLE); > + } > + > /* In the case of execlists, PPGTT is enabled by the context > descriptor > * and the PDPs are contained within the context itself. We don't > * need to do anything here. 
*/ > @@ -3362,6 +3373,60 @@ > i915_gem_obj_lookup_or_create_ggtt_vma(struct drm_i915_gem_object > *obj, > > } > > +void intel_trtt_context_destroy_vma(struct i915_vma *vma) { > + struct i915_address_space *vm = vma->vm; > + > + WARN_ON(!list_empty(&vma->obj_link)); > + WARN_ON(!list_empty(&vma->vm_link)); > + WARN_ON(!list_empty(&vma->exec_list)); > + > + WARN_ON(!vma->pin_count); > + > + if (drm_mm_node_allocated(&vma->node)) > + drm_mm_remove_node(&vma->node); > + > + i915_ppgtt_put(i915_vm_to_ppgtt(vm)); > + kmem_cache_free(to_i915(vm->dev)->vmas, vma); } > + > +struct i915_vma * > +intel_trtt_context_allocate_vma(struct i915_address_space *vm, > + uint64_t segment_base_addr) > +{ > + struct i915_vma *vma; > + int ret; > + > + vma = kmem_cache_zalloc(to_i915(vm->dev)->vmas, GFP_KERNEL); > + if (!vma) > + return ERR_PTR(-ENOMEM); > + > + INIT_LIST_HEAD(&vma->obj_link); > + INIT_LIST_HEAD(&vma->vm_link); > + INIT_LIST_HEAD(&vma->exec_list); > + vma->vm = vm; > + i915_ppgtt_get(i915_vm_to_ppgtt(vm)); > + > + /* Mark the vma as permanently pinned */ > + vma->pin_count = 1; > + > + /* Reserve from the 48 bit PPGTT space */ > + vma->node.start = segment_base_addr; > + vma->node.size = GEN9_TRTT_SEGMENT_SIZE; > + ret = drm_mm_reserve_node(&vm->mm, &vma->node); > + if (ret) { > + ret = i915_gem_evict_for_vma(vma); > + if (ret == 0) > + ret = drm_mm_reserve_node(&vm->mm, &vma- > >node); > + } > + if (ret) { > + intel_trtt_context_destroy_vma(vma); > + return ERR_PTR(ret); > + } > + > + return vma; > +} > + > static struct scatterlist * > rotate_pages(const dma_addr_t *in, unsigned int offset, > unsigned int width, unsigned int height, diff --git > a/drivers/gpu/drm/i915/i915_gem_gtt.h > b/drivers/gpu/drm/i915/i915_gem_gtt.h > index d804be0..8cbaca2 100644 > --- a/drivers/gpu/drm/i915/i915_gem_gtt.h > +++ b/drivers/gpu/drm/i915/i915_gem_gtt.h > @@ -128,6 +128,10 @@ typedef uint64_t gen8_ppgtt_pml4e_t; > #define GEN8_PPAT_ELLC_OVERRIDE (0<<2) > #define GEN8_PPAT(i, x) 
((uint64_t) (x) << ((i) * 8)) > > +/* Fixed size segment */ > +#define GEN9_TRTT_SEG_SIZE_SHIFT 44 > +#define GEN9_TRTT_SEGMENT_SIZE (1ULL << > GEN9_TRTT_SEG_SIZE_SHIFT) > + > enum i915_ggtt_view_type { > I915_GGTT_VIEW_NORMAL = 0, > I915_GGTT_VIEW_ROTATED, > @@ -560,4 +564,8 @@ size_t > i915_ggtt_view_size(struct drm_i915_gem_object *obj, > const struct i915_ggtt_view *view); > > +struct i915_vma * > +intel_trtt_context_allocate_vma(struct i915_address_space *vm, > + uint64_t segment_base_addr); > +void intel_trtt_context_destroy_vma(struct i915_vma *vma); > #endif > diff --git a/drivers/gpu/drm/i915/i915_reg.h > b/drivers/gpu/drm/i915/i915_reg.h index 264885f..07936b6 100644 > --- a/drivers/gpu/drm/i915/i915_reg.h > +++ b/drivers/gpu/drm/i915/i915_reg.h > @@ -188,6 +188,25 @@ static inline bool i915_mmio_reg_valid(i915_reg_t > reg) > #define GEN8_RPCS_EU_MIN_SHIFT 0 > #define GEN8_RPCS_EU_MIN_MASK (0xf << > GEN8_RPCS_EU_MIN_SHIFT) > > +#define GEN9_TR_CHICKEN_BIT_VECTOR _MMIO(0x4DFC) > +#define GEN9_TRTT_BYPASS_DISABLE (1 << 0) > + > +/* TRTT registers in the H/W Context */ > +#define GEN9_TRTT_L3_POINTER_DW0 _MMIO(0x4DE0) > +#define GEN9_TRTT_L3_POINTER_DW1 _MMIO(0x4DE4) > +#define GEN9_TRTT_L3_GFXADDR_MASK 0xFFFFFFFF0000 > + > +#define GEN9_TRTT_NULL_TILE_REG _MMIO(0x4DE8) > +#define GEN9_TRTT_INVD_TILE_REG _MMIO(0x4DEC) > + > +#define GEN9_TRTT_VA_MASKDATA _MMIO(0x4DF0) > +#define GEN9_TRVA_MASK_VALUE 0xF0 > +#define GEN9_TRVA_DATA_MASK 0xF > + > +#define GEN9_TRTT_TABLE_CONTROL _MMIO(0x4DF4) > +#define GEN9_TRTT_IN_GFX_VA_SPACE (1 << 1) > +#define GEN9_TRTT_ENABLE (1 << 0) > + > #define GAM_ECOCHK _MMIO(0x4090) > #define BDW_DISABLE_HDC_INVALIDATION (1<<25) > #define ECOCHK_SNB_BIT (1<<10) > diff --git a/drivers/gpu/drm/i915/intel_lrc.c > b/drivers/gpu/drm/i915/intel_lrc.c > index 3a23b95..8af480b 100644 > --- a/drivers/gpu/drm/i915/intel_lrc.c > +++ b/drivers/gpu/drm/i915/intel_lrc.c > @@ -1645,6 +1645,76 @@ static int gen9_init_render_ring(struct > 
intel_engine_cs *engine) > return init_workarounds_ring(engine); > } > > +static int gen9_init_rcs_context_trtt(struct drm_i915_gem_request *req) > +{ > + struct intel_ringbuffer *ringbuf = req->ringbuf; > + int ret; > + > + ret = intel_logical_ring_begin(req, 2 + 2); > + if (ret) > + return ret; > + > + intel_logical_ring_emit(ringbuf, MI_LOAD_REGISTER_IMM(1)); > + > + intel_logical_ring_emit_reg(ringbuf, GEN9_TRTT_TABLE_CONTROL); > + intel_logical_ring_emit(ringbuf, 0); > + > + intel_logical_ring_emit(ringbuf, MI_NOOP); > + intel_logical_ring_advance(ringbuf); > + > + return 0; > +} > + > +static int gen9_emit_trtt_regs(struct drm_i915_gem_request *req) { > + struct intel_context *ctx = req->ctx; > + struct intel_ringbuffer *ringbuf = req->ringbuf; > + u64 masked_l3_gfx_address = > + ctx->trtt_info.l3_table_address & > GEN9_TRTT_L3_GFXADDR_MASK; > + u32 trva_data_value = > + (ctx->trtt_info.segment_base_addr >> > GEN9_TRTT_SEG_SIZE_SHIFT) & > + GEN9_TRVA_DATA_MASK; > + const int num_lri_cmds = 6; > + int ret; > + > + /* > + * Emitting LRIs to update the TRTT registers is most reliable, instead > + * of directly updating the context image, as this will ensure that > + * update happens in a serialized manner for the context and also > + * lite-restore scenario will get handled. 
> + */ > + ret = intel_logical_ring_begin(req, num_lri_cmds * 2 + 2); > + if (ret) > + return ret; > + > + intel_logical_ring_emit(ringbuf, > MI_LOAD_REGISTER_IMM(num_lri_cmds)); > + > + intel_logical_ring_emit_reg(ringbuf, > GEN9_TRTT_L3_POINTER_DW0); > + intel_logical_ring_emit(ringbuf, > +lower_32_bits(masked_l3_gfx_address)); > + > + intel_logical_ring_emit_reg(ringbuf, > GEN9_TRTT_L3_POINTER_DW1); > + intel_logical_ring_emit(ringbuf, > +upper_32_bits(masked_l3_gfx_address)); > + > + intel_logical_ring_emit_reg(ringbuf, GEN9_TRTT_NULL_TILE_REG); > + intel_logical_ring_emit(ringbuf, ctx->trtt_info.null_tile_val); > + > + intel_logical_ring_emit_reg(ringbuf, GEN9_TRTT_INVD_TILE_REG); > + intel_logical_ring_emit(ringbuf, ctx->trtt_info.invd_tile_val); > + > + intel_logical_ring_emit_reg(ringbuf, GEN9_TRTT_VA_MASKDATA); > + intel_logical_ring_emit(ringbuf, > + GEN9_TRVA_MASK_VALUE | > trva_data_value); > + > + intel_logical_ring_emit_reg(ringbuf, GEN9_TRTT_TABLE_CONTROL); > + intel_logical_ring_emit(ringbuf, > + GEN9_TRTT_IN_GFX_VA_SPACE | > GEN9_TRTT_ENABLE); > + > + intel_logical_ring_emit(ringbuf, MI_NOOP); > + intel_logical_ring_advance(ringbuf); > + > + return 0; > +} > + > static int intel_logical_ring_emit_pdps(struct drm_i915_gem_request *req) > { > struct i915_hw_ppgtt *ppgtt = req->ctx->ppgtt; @@ -2003,6 > +2073,25 @@ static int gen8_init_rcs_context(struct drm_i915_gem_request > *req) > return intel_lr_context_render_state_init(req); > } > > +static int gen9_init_rcs_context(struct drm_i915_gem_request *req) { > + int ret; > + > + /* > + * Explicitly disable TR-TT at the start of a new context. > + * Otherwise on switching from a TR-TT context to a new Non TR-TT > + * context the TR-TT settings of the outgoing context could get > + * spilled on to the new incoming context as only the Ring Context > + * part is loaded on the first submission of a new context, due to > + * the setting of ENGINE_CTX_RESTORE_INHIBIT bit. 
> + */ > + ret = gen9_init_rcs_context_trtt(req); > + if (ret) > + return ret; > + > + return gen8_init_rcs_context(req); > +} > + > /** > * intel_logical_ring_cleanup() - deallocate the Engine Command Streamer > * > @@ -2134,11 +2223,14 @@ static int logical_render_ring_init(struct > drm_device *dev) > logical_ring_default_vfuncs(dev, engine); > > /* Override some for render ring. */ > - if (INTEL_INFO(dev)->gen >= 9) > + if (INTEL_INFO(dev)->gen >= 9) { > engine->init_hw = gen9_init_render_ring; > - else > + engine->init_context = gen9_init_rcs_context; > + } else { > engine->init_hw = gen8_init_render_ring; > - engine->init_context = gen8_init_rcs_context; > + engine->init_context = gen8_init_rcs_context; > + } > + > engine->cleanup = intel_fini_pipe_control; > engine->emit_flush = gen8_emit_flush_render; > engine->emit_request = gen8_emit_request_render; @@ -2702,3 > +2794,29 @@ void intel_lr_context_reset(struct drm_device *dev, > ringbuf->tail = 0; > } > } > + > +int intel_lr_rcs_context_setup_trtt(struct intel_context *ctx) { > + struct intel_engine_cs *engine = &(ctx->i915->engine[RCS]); > + struct drm_i915_gem_request *req; > + int ret; > + > + if (!ctx->engine[RCS].state) { > + ret = intel_lr_context_deferred_alloc(ctx, engine); > + if (ret) > + return ret; > + } > + > + req = i915_gem_request_alloc(engine, ctx); > + if (IS_ERR(req)) > + return PTR_ERR(req); > + > + ret = gen9_emit_trtt_regs(req); > + if (ret) { > + i915_gem_request_cancel(req); > + return ret; > + } > + > + i915_add_request(req); > + return 0; > +} > diff --git a/drivers/gpu/drm/i915/intel_lrc.h > b/drivers/gpu/drm/i915/intel_lrc.h > index a17cb12..f3600b2 100644 > --- a/drivers/gpu/drm/i915/intel_lrc.h > +++ b/drivers/gpu/drm/i915/intel_lrc.h > @@ -107,6 +107,7 @@ void intel_lr_context_reset(struct drm_device *dev, > struct intel_context *ctx); > uint64_t intel_lr_context_descriptor(struct intel_context *ctx, > struct intel_engine_cs *engine); > +int 
intel_lr_rcs_context_setup_trtt(struct intel_context *ctx); > > u32 intel_execlists_ctx_id(struct intel_context *ctx, > struct intel_engine_cs *engine); > diff --git a/include/uapi/drm/i915_drm.h b/include/uapi/drm/i915_drm.h > index a5524cc..604da23 100644 > --- a/include/uapi/drm/i915_drm.h > +++ b/include/uapi/drm/i915_drm.h > @@ -1167,7 +1167,15 @@ struct drm_i915_gem_context_param { > #define I915_CONTEXT_PARAM_BAN_PERIOD 0x1 > #define I915_CONTEXT_PARAM_NO_ZEROMAP 0x2 > #define I915_CONTEXT_PARAM_GTT_SIZE 0x3 > +#define I915_CONTEXT_PARAM_TRTT 0x4 > __u64 value; > }; > > +struct drm_i915_gem_context_trtt_param { > + __u64 segment_base_addr; > + __u64 l3_table_address; > + __u32 invd_tile_val; > + __u32 null_tile_val; > +}; > + > #endif /* _UAPI_I915_DRM_H_ */ > -- > 1.9.2 > > _______________________________________________ > Intel-gfx mailing list > Intel-gfx@lists.freedesktop.org > https://lists.freedesktop.org/mailman/listinfo/intel-gfx
On 3/24/2016 9:59 PM, Gore, Tim wrote: > > Tim Gore > Intel Corporation (UK) Ltd. - Co. Reg. #1134945 - Pipers Way, Swindon SN3 1RJ > >> -----Original Message----- >> From: Intel-gfx [mailto:intel-gfx-bounces@lists.freedesktop.org] On Behalf >> Of akash.goel@intel.com >> Sent: Tuesday, March 22, 2016 8:43 AM >> To: intel-gfx@lists.freedesktop.org >> Cc: Goel, Akash >> Subject: [Intel-gfx] [PATCH v9] drm/i915: Support to enable TRTT on GEN9 >> >> From: Akash Goel <akash.goel@intel.com> >> >> Gen9 has an additional address translation hardware support in form of Tiled >> Resource Translation Table (TR-TT) which provides an extra level of >> abstraction over PPGTT. >> This is useful for mapping Sparse/Tiled texture resources. >> Sparse resources are created as virtual-only allocations. Regions of the >> resource that the application intends to use is bound to the physical memory >> on the fly and can be re-bound to different memory allocations over the >> lifetime of the resource. >> >> TR-TT is tightly coupled with PPGTT, a new instance of TR-TT will be required >> for a new PPGTT instance, but TR-TT may not enabled for every context. >> 1/16th of the 48bit PPGTT space is earmarked for the translation by TR-TT, >> which such chunk to use is conveyed to HW through a register. >> Any GFX address, which lies in that reserved 44 bit range will be translated >> through TR-TT first and then through PPGTT to get the actual physical >> address, so the output of translation from TR-TT will be a PPGTT offset. >> >> TRTT is constructed as a 3 level tile Table. Each tile is 64KB is size which leaves >> behind 44-16=28 address bits. 28bits are partitioned as 9+9+10, and each >> level is contained within a 4KB page hence L3 and L2 is composed of >> 512 64b entries and L1 is composed of 1024 32b entries. >> >> There is a provision to keep TR-TT Tables in virtual space, where the pages of >> TRTT tables will be mapped to PPGTT. 
>> Currently this is the supported mode, in this mode UMD will have a full >> control on TR-TT management, with bare minimum support from KMD. >> So the entries of L3 table will contain the PPGTT offset of L2 Table pages, >> similarly entries of L2 table will contain the PPGTT offset of L1 Table pages. >> The entries of L1 table will contain the PPGTT offset of BOs actually backing >> the Sparse resources. >> UMD will have to allocate the L3/L2/L1 table pages as a regular BO only & >> assign them a PPGTT address through the Soft Pin API (for example, use soft >> pin to assign l3_table_address to the L3 table BO, when used). >> UMD will also program the entries in the TR-TT page tables using regular >> batch commands (MI_STORE_DATA_IMM), or via mmapping of the page >> table BOs. >> UMD may do the complete PPGTT address space management, on the >> pretext that it could help minimize the conflicts. >> >> Any space in TR-TT segment not bound to any Sparse texture, will be handled >> through Invalid tile, User is expected to initialize the entries of a new >> L3/L2/L1 table page with the Invalid tile pattern. The entries corresponding to >> the holes in the Sparse texture resource will be set with the Null tile pattern >> The improper programming of TRTT should only lead to a recoverable GPU >> hang, eventually leading to banning of the culprit context without victimizing >> others. >> >> The association of any Sparse resource with the BOs will be known only to >> UMD, and only the Sparse resources shall be assigned an offset from the TR- >> TT segment by UMD. The use of TR-TT segment or mapping of Sparse >> resources will be transparent to the KMD, UMD will do the address >> assignment from TR-TT segment autonomously and KMD will be oblivious of >> it. >> Any object must not be assigned an address from TR-TT segment, they will >> be mapped to PPGTT in a regular way by KMD. 
>> >> This patch provides an interface through which UMD can convey KMD to >> enable TR-TT for a given context. A new I915_CONTEXT_PARAM_TRTT param >> has been added to I915_GEM_CONTEXT_SETPARAM ioctl for that purpose. >> UMD will have to pass the GFX address of L3 table page, start location of TR- >> TT segment alongwith the pattern value for the Null & invalid Tile registers. >> >> v2: >> - Support context_getparam for TRTT also and dispense with a separate >> GETPARAM case for TRTT (Chris). >> - Use i915_dbg to log errors for the invalid TRTT ABI parameters passed >> from user space (Chris). >> - Move all the argument checking for TRTT in context_setparam to the >> set_trtt function (Chris). >> - Change the type of 'flags' field inside 'intel_context' to unsigned (Chris) >> - Rename certain functions to rightly reflect their purpose, rename >> the new param for TRTT in gem_context_param to >> I915_CONTEXT_PARAM_TRTT, >> rephrase few lines in the commit message body, add more comments >> (Chris). >> - Extend ABI to allow User specify TRTT segment location also. >> - Fix for selective enabling of TRTT on per context basis, explicitly >> disable TR-TT at the start of a new context. >> >> v3: >> - Check the return value of gen9_emit_trtt_regs (Chris) >> - Update the kernel doc for intel_context structure. >> - Rebased. >> >> v4: >> - Fix the warnings reported by 'checkpatch.pl --strict' (Michel) >> - Fix the context_getparam implementation avoiding the reset of size field, >> affecting the TRTT case. >> >> v5: >> - Update the TR-TT params right away in context_setparam, by constructing >> & submitting a request emitting LRIs, instead of deferring it and >> conflating with the next batch submission (Chris) >> - Follow the struct_mutex handling related prescribed rules, while accessing >> User space buffer, both in context_setparam & getparam functions (Chris). >> >> v6: >> - Fix the warning caused due to removal of un-allocated trtt vma node. 
>>
>> v7:
>> - Move context ref/unref to context_setparam_ioctl from set_trtt() & remove
>>   that from get_trtt() as not really needed there (Chris).
>> - Add a check for improper values for the Null & Invalid tiles.
>> - Remove superfluous DRM_ERROR from trtt_context_allocate_vma (Chris).
>> - Rebased.
>>
>> v8:
>> - Add context ref/unref to context_getparam_ioctl also, so as to be
>>   consistent and ease the extension of the ioctl in future (Chris).
>>
>> v9:
>> - Fix the handling of the return value from the trtt_context_allocate_vma()
>>   function, which caused a kernel panic at the time of destroying the
>>   context when the trtt vma allocation was unsuccessful.
>> - Rebased.
>>
>> Testcase: igt/gem_trtt
>>
>> Cc: Chris Wilson <chris@chris-wilson.co.uk>
>> Cc: Michel Thierry <michel.thierry@intel.com>
>> Signed-off-by: Akash Goel <akash.goel@intel.com>
>> Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
>> ---
>>  drivers/gpu/drm/i915/i915_drv.h         |  16 +++-
>>  drivers/gpu/drm/i915/i915_gem_context.c | 157 +++++++++++++++++++++++++++++++-
>>  drivers/gpu/drm/i915/i915_gem_gtt.c     |  65 +++++++++++++
>>  drivers/gpu/drm/i915/i915_gem_gtt.h     |   8 ++
>>  drivers/gpu/drm/i915/i915_reg.h         |  19 ++++
>>  drivers/gpu/drm/i915/intel_lrc.c        | 124 ++++++++++++++++++++++++++-
>>  drivers/gpu/drm/i915/intel_lrc.h        |   1 +
>>  include/uapi/drm/i915_drm.h             |   8 ++
>>  8 files changed, 393 insertions(+), 5 deletions(-)
>>
>> diff --git a/drivers/gpu/drm/i915/i915_drv.h b/drivers/gpu/drm/i915/i915_drv.h
>> index ecbd418..272d1f8 100644
>>
>> @@ -2657,6 +2669,8 @@ struct drm_i915_cmd_table {
>>  				!IS_VALLEYVIEW(dev) && !IS_CHERRYVIEW(dev) && \
>>  				!IS_BROXTON(dev))
>>
>> +#define HAS_TRTT(dev)	(IS_GEN9(dev))
>> +
>
> A very minor point, but there is a w/a to disable TRTT for BXT_REVID_A0/1.
> I realise this is basically obsolete now, but I'm still using one!
>
Thanks for raising this. Michel & Thomas also apprised me about a similar
WA for KBL. I was thinking to submit that as a follow-up patch.
Best regards Akash >> return ret; >> diff --git a/drivers/gpu/drm/i915/i915_gem_gtt.c >> b/drivers/gpu/drm/i915/i915_gem_gtt.c >> index 0715bb7..cbf8a03 100644 >> --- a/drivers/gpu/drm/i915/i915_gem_gtt.c >> +++ b/drivers/gpu/drm/i915/i915_gem_gtt.c >> @@ -2169,6 +2169,17 @@ int i915_ppgtt_init_hw(struct drm_device *dev) { >> gtt_write_workarounds(dev); >> >> + if (HAS_TRTT(dev) && USES_FULL_48BIT_PPGTT(dev)) { >> + struct drm_i915_private *dev_priv = dev->dev_private; >> + /* >> + * Globally enable TR-TT support in Hw. >> + * Still TR-TT enabling on per context basis is required. >> + * Non-trtt contexts are not affected by this setting. >> + */ >> + I915_WRITE(GEN9_TR_CHICKEN_BIT_VECTOR, >> + GEN9_TRTT_BYPASS_DISABLE); >> + } >> + >> /* In the case of execlists, PPGTT is enabled by the context >> descriptor >> * and the PDPs are contained within the context itself. We don't >> * need to do anything here. */ >> @@ -3362,6 +3373,60 @@ >> i915_gem_obj_lookup_or_create_ggtt_vma(struct drm_i915_gem_object >> *obj, >> >> } >> >> +void intel_trtt_context_destroy_vma(struct i915_vma *vma) { >> + struct i915_address_space *vm = vma->vm; >> + >> + WARN_ON(!list_empty(&vma->obj_link)); >> + WARN_ON(!list_empty(&vma->vm_link)); >> + WARN_ON(!list_empty(&vma->exec_list)); >> + >> + WARN_ON(!vma->pin_count); >> + >> + if (drm_mm_node_allocated(&vma->node)) >> + drm_mm_remove_node(&vma->node); >> + >> + i915_ppgtt_put(i915_vm_to_ppgtt(vm)); >> + kmem_cache_free(to_i915(vm->dev)->vmas, vma); } >> + >> +struct i915_vma * >> +intel_trtt_context_allocate_vma(struct i915_address_space *vm, >> + uint64_t segment_base_addr) >> +{ >> + struct i915_vma *vma; >> + int ret; >> + >> + vma = kmem_cache_zalloc(to_i915(vm->dev)->vmas, GFP_KERNEL); >> + if (!vma) >> + return ERR_PTR(-ENOMEM); >> + >> + INIT_LIST_HEAD(&vma->obj_link); >> + INIT_LIST_HEAD(&vma->vm_link); >> + INIT_LIST_HEAD(&vma->exec_list); >> + vma->vm = vm; >> + i915_ppgtt_get(i915_vm_to_ppgtt(vm)); >> + >> + /* Mark the vma 
as permanently pinned */ >> + vma->pin_count = 1; >> + >> + /* Reserve from the 48 bit PPGTT space */ >> + vma->node.start = segment_base_addr; >> + vma->node.size = GEN9_TRTT_SEGMENT_SIZE; >> + ret = drm_mm_reserve_node(&vm->mm, &vma->node); >> + if (ret) { >> + ret = i915_gem_evict_for_vma(vma); >> + if (ret == 0) >> + ret = drm_mm_reserve_node(&vm->mm, &vma- >>> node); >> + } >> + if (ret) { >> + intel_trtt_context_destroy_vma(vma); >> + return ERR_PTR(ret); >> + } >> + >> + return vma; >> +} >> + >> static struct scatterlist * >> rotate_pages(const dma_addr_t *in, unsigned int offset, >> unsigned int width, unsigned int height, diff --git >> a/drivers/gpu/drm/i915/i915_gem_gtt.h >> b/drivers/gpu/drm/i915/i915_gem_gtt.h >> index d804be0..8cbaca2 100644 >> --- a/drivers/gpu/drm/i915/i915_gem_gtt.h >> +++ b/drivers/gpu/drm/i915/i915_gem_gtt.h >> @@ -128,6 +128,10 @@ typedef uint64_t gen8_ppgtt_pml4e_t; >> #define GEN8_PPAT_ELLC_OVERRIDE (0<<2) >> #define GEN8_PPAT(i, x) ((uint64_t) (x) << ((i) * 8)) >> >> +/* Fixed size segment */ >> +#define GEN9_TRTT_SEG_SIZE_SHIFT 44 >> +#define GEN9_TRTT_SEGMENT_SIZE (1ULL << >> GEN9_TRTT_SEG_SIZE_SHIFT) >> + >> enum i915_ggtt_view_type { >> I915_GGTT_VIEW_NORMAL = 0, >> I915_GGTT_VIEW_ROTATED, >> @@ -560,4 +564,8 @@ size_t >> i915_ggtt_view_size(struct drm_i915_gem_object *obj, >> const struct i915_ggtt_view *view); >> >> +struct i915_vma * >> +intel_trtt_context_allocate_vma(struct i915_address_space *vm, >> + uint64_t segment_base_addr); >> +void intel_trtt_context_destroy_vma(struct i915_vma *vma); >> #endif >> diff --git a/drivers/gpu/drm/i915/i915_reg.h >> b/drivers/gpu/drm/i915/i915_reg.h index 264885f..07936b6 100644 >> --- a/drivers/gpu/drm/i915/i915_reg.h >> +++ b/drivers/gpu/drm/i915/i915_reg.h >> @@ -188,6 +188,25 @@ static inline bool i915_mmio_reg_valid(i915_reg_t >> reg) >> #define GEN8_RPCS_EU_MIN_SHIFT 0 >> #define GEN8_RPCS_EU_MIN_MASK (0xf << >> GEN8_RPCS_EU_MIN_SHIFT) >> >> +#define 
GEN9_TR_CHICKEN_BIT_VECTOR _MMIO(0x4DFC) >> +#define GEN9_TRTT_BYPASS_DISABLE (1 << 0) >> + >> +/* TRTT registers in the H/W Context */ >> +#define GEN9_TRTT_L3_POINTER_DW0 _MMIO(0x4DE0) >> +#define GEN9_TRTT_L3_POINTER_DW1 _MMIO(0x4DE4) >> +#define GEN9_TRTT_L3_GFXADDR_MASK 0xFFFFFFFF0000 >> + >> +#define GEN9_TRTT_NULL_TILE_REG _MMIO(0x4DE8) >> +#define GEN9_TRTT_INVD_TILE_REG _MMIO(0x4DEC) >> + >> +#define GEN9_TRTT_VA_MASKDATA _MMIO(0x4DF0) >> +#define GEN9_TRVA_MASK_VALUE 0xF0 >> +#define GEN9_TRVA_DATA_MASK 0xF >> + >> +#define GEN9_TRTT_TABLE_CONTROL _MMIO(0x4DF4) >> +#define GEN9_TRTT_IN_GFX_VA_SPACE (1 << 1) >> +#define GEN9_TRTT_ENABLE (1 << 0) >> + >> #define GAM_ECOCHK _MMIO(0x4090) >> #define BDW_DISABLE_HDC_INVALIDATION (1<<25) >> #define ECOCHK_SNB_BIT (1<<10) >> diff --git a/drivers/gpu/drm/i915/intel_lrc.c >> b/drivers/gpu/drm/i915/intel_lrc.c >> index 3a23b95..8af480b 100644 >> --- a/drivers/gpu/drm/i915/intel_lrc.c >> +++ b/drivers/gpu/drm/i915/intel_lrc.c >> @@ -1645,6 +1645,76 @@ static int gen9_init_render_ring(struct >> intel_engine_cs *engine) >> return init_workarounds_ring(engine); >> } >> >> +static int gen9_init_rcs_context_trtt(struct drm_i915_gem_request *req) >> +{ >> + struct intel_ringbuffer *ringbuf = req->ringbuf; >> + int ret; >> + >> + ret = intel_logical_ring_begin(req, 2 + 2); >> + if (ret) >> + return ret; >> + >> + intel_logical_ring_emit(ringbuf, MI_LOAD_REGISTER_IMM(1)); >> + >> + intel_logical_ring_emit_reg(ringbuf, GEN9_TRTT_TABLE_CONTROL); >> + intel_logical_ring_emit(ringbuf, 0); >> + >> + intel_logical_ring_emit(ringbuf, MI_NOOP); >> + intel_logical_ring_advance(ringbuf); >> + >> + return 0; >> +} >> + >> +static int gen9_emit_trtt_regs(struct drm_i915_gem_request *req) { >> + struct intel_context *ctx = req->ctx; >> + struct intel_ringbuffer *ringbuf = req->ringbuf; >> + u64 masked_l3_gfx_address = >> + ctx->trtt_info.l3_table_address & >> GEN9_TRTT_L3_GFXADDR_MASK; >> + u32 trva_data_value = >> + 
(ctx->trtt_info.segment_base_addr >> >> GEN9_TRTT_SEG_SIZE_SHIFT) & >> + GEN9_TRVA_DATA_MASK; >> + const int num_lri_cmds = 6; >> + int ret; >> + >> + /* >> + * Emitting LRIs to update the TRTT registers is most reliable, instead >> + * of directly updating the context image, as this will ensure that >> + * update happens in a serialized manner for the context and also >> + * lite-restore scenario will get handled. >> + */ >> + ret = intel_logical_ring_begin(req, num_lri_cmds * 2 + 2); >> + if (ret) >> + return ret; >> + >> + intel_logical_ring_emit(ringbuf, >> MI_LOAD_REGISTER_IMM(num_lri_cmds)); >> + >> + intel_logical_ring_emit_reg(ringbuf, >> GEN9_TRTT_L3_POINTER_DW0); >> + intel_logical_ring_emit(ringbuf, >> +lower_32_bits(masked_l3_gfx_address)); >> + >> + intel_logical_ring_emit_reg(ringbuf, >> GEN9_TRTT_L3_POINTER_DW1); >> + intel_logical_ring_emit(ringbuf, >> +upper_32_bits(masked_l3_gfx_address)); >> + >> + intel_logical_ring_emit_reg(ringbuf, GEN9_TRTT_NULL_TILE_REG); >> + intel_logical_ring_emit(ringbuf, ctx->trtt_info.null_tile_val); >> + >> + intel_logical_ring_emit_reg(ringbuf, GEN9_TRTT_INVD_TILE_REG); >> + intel_logical_ring_emit(ringbuf, ctx->trtt_info.invd_tile_val); >> + >> + intel_logical_ring_emit_reg(ringbuf, GEN9_TRTT_VA_MASKDATA); >> + intel_logical_ring_emit(ringbuf, >> + GEN9_TRVA_MASK_VALUE | >> trva_data_value); >> + >> + intel_logical_ring_emit_reg(ringbuf, GEN9_TRTT_TABLE_CONTROL); >> + intel_logical_ring_emit(ringbuf, >> + GEN9_TRTT_IN_GFX_VA_SPACE | >> GEN9_TRTT_ENABLE); >> + >> + intel_logical_ring_emit(ringbuf, MI_NOOP); >> + intel_logical_ring_advance(ringbuf); >> + >> + return 0; >> +} >> + >> static int intel_logical_ring_emit_pdps(struct drm_i915_gem_request *req) >> { >> struct i915_hw_ppgtt *ppgtt = req->ctx->ppgtt; @@ -2003,6 >> +2073,25 @@ static int gen8_init_rcs_context(struct drm_i915_gem_request >> *req) >> return intel_lr_context_render_state_init(req); >> } >> >> +static int gen9_init_rcs_context(struct 
drm_i915_gem_request *req) { >> + int ret; >> + >> + /* >> + * Explictily disable TR-TT at the start of a new context. >> + * Otherwise on switching from a TR-TT context to a new Non TR-TT >> + * context the TR-TT settings of the outgoing context could get >> + * spilled on to the new incoming context as only the Ring Context >> + * part is loaded on the first submission of a new context, due to >> + * the setting of ENGINE_CTX_RESTORE_INHIBIT bit. >> + */ >> + ret = gen9_init_rcs_context_trtt(req); >> + if (ret) >> + return ret; >> + >> + return gen8_init_rcs_context(req); >> +} >> + >> /** >> * intel_logical_ring_cleanup() - deallocate the Engine Command Streamer >> * >> @@ -2134,11 +2223,14 @@ static int logical_render_ring_init(struct >> drm_device *dev) >> logical_ring_default_vfuncs(dev, engine); >> >> /* Override some for render ring. */ >> - if (INTEL_INFO(dev)->gen >= 9) >> + if (INTEL_INFO(dev)->gen >= 9) { >> engine->init_hw = gen9_init_render_ring; >> - else >> + engine->init_context = gen9_init_rcs_context; >> + } else { >> engine->init_hw = gen8_init_render_ring; >> - engine->init_context = gen8_init_rcs_context; >> + engine->init_context = gen8_init_rcs_context; >> + } >> + >> engine->cleanup = intel_fini_pipe_control; >> engine->emit_flush = gen8_emit_flush_render; >> engine->emit_request = gen8_emit_request_render; @@ -2702,3 >> +2794,29 @@ void intel_lr_context_reset(struct drm_device *dev, >> ringbuf->tail = 0; >> } >> } >> + >> +int intel_lr_rcs_context_setup_trtt(struct intel_context *ctx) { >> + struct intel_engine_cs *engine = &(ctx->i915->engine[RCS]); >> + struct drm_i915_gem_request *req; >> + int ret; >> + >> + if (!ctx->engine[RCS].state) { >> + ret = intel_lr_context_deferred_alloc(ctx, engine); >> + if (ret) >> + return ret; >> + } >> + >> + req = i915_gem_request_alloc(engine, ctx); >> + if (IS_ERR(req)) >> + return PTR_ERR(req); >> + >> + ret = gen9_emit_trtt_regs(req); >> + if (ret) { >> + i915_gem_request_cancel(req); >> + return 
ret; >> + } >> + >> + i915_add_request(req); >> + return 0; >> +} >> diff --git a/drivers/gpu/drm/i915/intel_lrc.h >> b/drivers/gpu/drm/i915/intel_lrc.h >> index a17cb12..f3600b2 100644 >> --- a/drivers/gpu/drm/i915/intel_lrc.h >> +++ b/drivers/gpu/drm/i915/intel_lrc.h >> @@ -107,6 +107,7 @@ void intel_lr_context_reset(struct drm_device *dev, >> struct intel_context *ctx); >> uint64_t intel_lr_context_descriptor(struct intel_context *ctx, >> struct intel_engine_cs *engine); >> +int intel_lr_rcs_context_setup_trtt(struct intel_context *ctx); >> >> u32 intel_execlists_ctx_id(struct intel_context *ctx, >> struct intel_engine_cs *engine); >> diff --git a/include/uapi/drm/i915_drm.h b/include/uapi/drm/i915_drm.h >> index a5524cc..604da23 100644 >> --- a/include/uapi/drm/i915_drm.h >> +++ b/include/uapi/drm/i915_drm.h >> @@ -1167,7 +1167,15 @@ struct drm_i915_gem_context_param { >> #define I915_CONTEXT_PARAM_BAN_PERIOD 0x1 >> #define I915_CONTEXT_PARAM_NO_ZEROMAP 0x2 >> #define I915_CONTEXT_PARAM_GTT_SIZE 0x3 >> +#define I915_CONTEXT_PARAM_TRTT 0x4 >> __u64 value; >> }; >> >> +struct drm_i915_gem_context_trtt_param { >> + __u64 segment_base_addr; >> + __u64 l3_table_address; >> + __u32 invd_tile_val; >> + __u32 null_tile_val; >> +}; >> + >> #endif /* _UAPI_I915_DRM_H_ */ >> -- >> 1.9.2 >> >> _______________________________________________ >> Intel-gfx mailing list >> Intel-gfx@lists.freedesktop.org >> https://lists.freedesktop.org/mailman/listinfo/intel-gfx
diff --git a/drivers/gpu/drm/i915/i915_drv.h b/drivers/gpu/drm/i915/i915_drv.h
index ecbd418..272d1f8 100644
--- a/drivers/gpu/drm/i915/i915_drv.h
+++ b/drivers/gpu/drm/i915/i915_drv.h
@@ -804,6 +804,7 @@ struct i915_ctx_hang_stats {
 #define DEFAULT_CONTEXT_HANDLE 0

 #define CONTEXT_NO_ZEROMAP (1<<0)
+#define CONTEXT_USE_TRTT (1 << 1)

 /**
  * struct intel_context - as the name implies, represents a context.
  * @ref: reference count.
@@ -818,6 +819,8 @@ struct i915_ctx_hang_stats {
  * @ppgtt: virtual memory space used by this context.
  * @legacy_hw_ctx: render context backing object and whether it is correctly
  *		initialized (legacy ring submission mechanism only).
+ * @trtt_info: Programming parameters for tr-tt (redirection tables for
+ *	       userspace, for sparse resource management)
  * @link: link in the global list of contexts.
  *
  * Contexts are memory images used by the hardware to store copies of their
@@ -828,7 +831,7 @@ struct intel_context {
 	int user_handle;
 	uint8_t remap_slice;
 	struct drm_i915_private *i915;
-	int flags;
+	unsigned int flags;
 	struct drm_i915_file_private *file_priv;
 	struct i915_ctx_hang_stats hang_stats;
 	struct i915_hw_ppgtt *ppgtt;
@@ -849,6 +852,15 @@ struct intel_context {
 		uint32_t *lrc_reg_state;
 	} engine[I915_NUM_ENGINES];

+	/* TRTT info */
+	struct intel_context_trtt {
+		u32 invd_tile_val;
+		u32 null_tile_val;
+		u64 l3_table_address;
+		u64 segment_base_addr;
+		struct i915_vma *vma;
+	} trtt_info;
+
 	struct list_head link;
 };

@@ -2657,6 +2669,8 @@ struct drm_i915_cmd_table {
 				!IS_VALLEYVIEW(dev) && !IS_CHERRYVIEW(dev) && \
 				!IS_BROXTON(dev))

+#define HAS_TRTT(dev)	(IS_GEN9(dev))
+
 #define INTEL_PCH_DEVICE_ID_MASK	0xff00
 #define INTEL_PCH_IBX_DEVICE_ID_TYPE	0x3b00
 #define INTEL_PCH_CPT_DEVICE_ID_TYPE	0x1c00
diff --git a/drivers/gpu/drm/i915/i915_gem_context.c b/drivers/gpu/drm/i915/i915_gem_context.c
index 394e525..5f28c23 100644
--- a/drivers/gpu/drm/i915/i915_gem_context.c
+++ b/drivers/gpu/drm/i915/i915_gem_context.c
@@ -133,6 +133,14 @@ static int get_context_size(struct drm_device *dev)
 	return ret;
 }

+static void intel_context_free_trtt(struct intel_context *ctx)
+{
+	if (!ctx->trtt_info.vma)
+		return;
+
+	intel_trtt_context_destroy_vma(ctx->trtt_info.vma);
+}
+
 static void i915_gem_context_clean(struct intel_context *ctx)
 {
 	struct i915_hw_ppgtt *ppgtt = ctx->ppgtt;
@@ -164,6 +172,8 @@ void i915_gem_context_free(struct kref *ctx_ref)
 	 */
 	i915_gem_context_clean(ctx);

+	intel_context_free_trtt(ctx);
+
 	i915_ppgtt_put(ctx->ppgtt);

 	if (ctx->legacy_hw_ctx.rcs_state)
@@ -507,6 +517,129 @@ i915_gem_context_get(struct drm_i915_file_private *file_priv, u32 id)
 	return ctx;
 }

+static int
+intel_context_get_trtt(struct intel_context *ctx,
+		       struct drm_i915_gem_context_param *args)
+{
+	struct drm_i915_gem_context_trtt_param trtt_params;
+	struct drm_device *dev = ctx->i915->dev;
+
+	if (!HAS_TRTT(dev) || !USES_FULL_48BIT_PPGTT(dev)) {
+		return -ENODEV;
+	} else if (args->size < sizeof(trtt_params)) {
+		args->size = sizeof(trtt_params);
+	} else {
+		trtt_params.segment_base_addr =
+			ctx->trtt_info.segment_base_addr;
+		trtt_params.l3_table_address =
+			ctx->trtt_info.l3_table_address;
+		trtt_params.null_tile_val =
+			ctx->trtt_info.null_tile_val;
+		trtt_params.invd_tile_val =
+			ctx->trtt_info.invd_tile_val;
+
+		mutex_unlock(&dev->struct_mutex);
+
+		if (__copy_to_user(to_user_ptr(args->value),
+				   &trtt_params,
+				   sizeof(trtt_params))) {
+			mutex_lock(&dev->struct_mutex);
+			return -EFAULT;
+		}
+
+		args->size = sizeof(trtt_params);
+		mutex_lock(&dev->struct_mutex);
+	}
+
+	return 0;
+}
+
+static int
+intel_context_set_trtt(struct intel_context *ctx,
+		       struct drm_i915_gem_context_param *args)
+{
+	struct drm_i915_gem_context_trtt_param trtt_params;
+	struct i915_vma *vma;
+	struct drm_device *dev = ctx->i915->dev;
+	int ret;
+
+	if (!HAS_TRTT(dev) || !USES_FULL_48BIT_PPGTT(dev))
+		return -ENODEV;
+	else if (ctx->flags & CONTEXT_USE_TRTT)
+		return -EEXIST;
+	else if (args->size < sizeof(trtt_params))
+		return -EINVAL;
+
+	mutex_unlock(&dev->struct_mutex);
+
+	if (copy_from_user(&trtt_params,
+			   to_user_ptr(args->value),
+			   sizeof(trtt_params))) {
+		mutex_lock(&dev->struct_mutex);
+		ret = -EFAULT;
+		goto exit;
+	}
+
+	mutex_lock(&dev->struct_mutex);
+
+	/* Check if the setup happened from another path */
+	if (ctx->flags & CONTEXT_USE_TRTT) {
+		ret = -EEXIST;
+		goto exit;
+	}
+
+	/* basic sanity checks for the segment location & l3 table pointer */
+	if (trtt_params.segment_base_addr & (GEN9_TRTT_SEGMENT_SIZE - 1)) {
+		i915_dbg(dev, "segment base address not correctly aligned\n");
+		ret = -EINVAL;
+		goto exit;
+	}
+
+	if (((trtt_params.l3_table_address + PAGE_SIZE) >=
+	     trtt_params.segment_base_addr) &&
+	    (trtt_params.l3_table_address <
+	     (trtt_params.segment_base_addr + GEN9_TRTT_SEGMENT_SIZE))) {
+		i915_dbg(dev, "l3 table address conflicts with trtt segment\n");
+		ret = -EINVAL;
+		goto exit;
+	}
+
+	if (trtt_params.l3_table_address & ~GEN9_TRTT_L3_GFXADDR_MASK) {
+		i915_dbg(dev, "invalid l3 table address\n");
+		ret = -EINVAL;
+		goto exit;
+	}
+
+	if (trtt_params.null_tile_val == trtt_params.invd_tile_val) {
+		i915_dbg(dev, "incorrect values for null & invalid tiles\n");
+		return -EINVAL;
+	}
+
+	vma = intel_trtt_context_allocate_vma(&ctx->ppgtt->base,
+					      trtt_params.segment_base_addr);
+	if (IS_ERR(vma)) {
+		ret = PTR_ERR(vma);
+		goto exit;
+	}
+
+	ctx->trtt_info.vma = vma;
+	ctx->trtt_info.null_tile_val = trtt_params.null_tile_val;
+	ctx->trtt_info.invd_tile_val = trtt_params.invd_tile_val;
+	ctx->trtt_info.l3_table_address = trtt_params.l3_table_address;
+	ctx->trtt_info.segment_base_addr = trtt_params.segment_base_addr;
+
+	ret = intel_lr_rcs_context_setup_trtt(ctx);
+	if (ret) {
+		intel_trtt_context_destroy_vma(ctx->trtt_info.vma);
+		goto exit;
+	}
+
+	ctx->flags |= CONTEXT_USE_TRTT;
+
+exit:
+	return ret;
+}
+
 static inline int
 mi_set_context(struct drm_i915_gem_request *req, u32 hw_flags)
 {
@@ -931,7 +1064,14 @@ int i915_gem_context_getparam_ioctl(struct drm_device *dev, void *data,
 		return PTR_ERR(ctx);
 	}

-	args->size = 0;
+	/*
+	 * Take a reference also, as in certain cases we have to release &
+	 * reacquire the struct_mutex and we don't want the context to
+	 * go away.
+	 */
+	i915_gem_context_reference(ctx);
+
+	args->size = (args->param != I915_CONTEXT_PARAM_TRTT) ? 0 : args->size;
 	switch (args->param) {
 	case I915_CONTEXT_PARAM_BAN_PERIOD:
 		args->value = ctx->hang_stats.ban_period_seconds;
@@ -947,10 +1087,14 @@ int i915_gem_context_getparam_ioctl(struct drm_device *dev, void *data,
 		else
 			args->value = to_i915(dev)->ggtt.base.total;
 		break;
+	case I915_CONTEXT_PARAM_TRTT:
+		ret = intel_context_get_trtt(ctx, args);
+		break;
 	default:
 		ret = -EINVAL;
 		break;
 	}
+	i915_gem_context_unreference(ctx);
 	mutex_unlock(&dev->struct_mutex);

 	return ret;
@@ -974,6 +1118,13 @@ int i915_gem_context_setparam_ioctl(struct drm_device *dev, void *data,
 		return PTR_ERR(ctx);
 	}

+	/*
+	 * Take a reference also, as in certain cases we have to release &
+	 * reacquire the struct_mutex and we don't want the context to
+	 * go away.
+	 */
+	i915_gem_context_reference(ctx);
+
 	switch (args->param) {
 	case I915_CONTEXT_PARAM_BAN_PERIOD:
 		if (args->size)
@@ -992,10 +1143,14 @@ int i915_gem_context_setparam_ioctl(struct drm_device *dev, void *data,
 			ctx->flags |= args->value ? CONTEXT_NO_ZEROMAP : 0;
 		}
 		break;
+	case I915_CONTEXT_PARAM_TRTT:
+		ret = intel_context_set_trtt(ctx, args);
+		break;
 	default:
 		ret = -EINVAL;
 		break;
 	}
+	i915_gem_context_unreference(ctx);
 	mutex_unlock(&dev->struct_mutex);

 	return ret;
diff --git a/drivers/gpu/drm/i915/i915_gem_gtt.c b/drivers/gpu/drm/i915/i915_gem_gtt.c
index 0715bb7..cbf8a03 100644
--- a/drivers/gpu/drm/i915/i915_gem_gtt.c
+++ b/drivers/gpu/drm/i915/i915_gem_gtt.c
@@ -2169,6 +2169,17 @@ int i915_ppgtt_init_hw(struct drm_device *dev)
 {
 	gtt_write_workarounds(dev);

+	if (HAS_TRTT(dev) && USES_FULL_48BIT_PPGTT(dev)) {
+		struct drm_i915_private *dev_priv = dev->dev_private;
+		/*
+		 * Globally enable TR-TT support in Hw.
+		 * Still TR-TT enabling on per context basis is required.
+		 * Non-trtt contexts are not affected by this setting.
+		 */
+		I915_WRITE(GEN9_TR_CHICKEN_BIT_VECTOR,
+			   GEN9_TRTT_BYPASS_DISABLE);
+	}
+
 	/* In the case of execlists, PPGTT is enabled by the context descriptor
 	 * and the PDPs are contained within the context itself.  We don't
 	 * need to do anything here. */
@@ -3362,6 +3373,60 @@ i915_gem_obj_lookup_or_create_ggtt_vma(struct drm_i915_gem_object *obj,

 }

+void intel_trtt_context_destroy_vma(struct i915_vma *vma)
+{
+	struct i915_address_space *vm = vma->vm;
+
+	WARN_ON(!list_empty(&vma->obj_link));
+	WARN_ON(!list_empty(&vma->vm_link));
+	WARN_ON(!list_empty(&vma->exec_list));
+
+	WARN_ON(!vma->pin_count);
+
+	if (drm_mm_node_allocated(&vma->node))
+		drm_mm_remove_node(&vma->node);
+
+	i915_ppgtt_put(i915_vm_to_ppgtt(vm));
+	kmem_cache_free(to_i915(vm->dev)->vmas, vma);
+}
+
+struct i915_vma *
+intel_trtt_context_allocate_vma(struct i915_address_space *vm,
+				uint64_t segment_base_addr)
+{
+	struct i915_vma *vma;
+	int ret;
+
+	vma = kmem_cache_zalloc(to_i915(vm->dev)->vmas, GFP_KERNEL);
+	if (!vma)
+		return ERR_PTR(-ENOMEM);
+
+	INIT_LIST_HEAD(&vma->obj_link);
+	INIT_LIST_HEAD(&vma->vm_link);
+	INIT_LIST_HEAD(&vma->exec_list);
+	vma->vm = vm;
+	i915_ppgtt_get(i915_vm_to_ppgtt(vm));
+
+	/* Mark the vma as permanently pinned */
+	vma->pin_count = 1;
+
+	/* Reserve from the 48 bit PPGTT space */
+	vma->node.start = segment_base_addr;
+	vma->node.size = GEN9_TRTT_SEGMENT_SIZE;
+	ret = drm_mm_reserve_node(&vm->mm, &vma->node);
+	if (ret) {
+		ret = i915_gem_evict_for_vma(vma);
+		if (ret == 0)
+			ret = drm_mm_reserve_node(&vm->mm, &vma->node);
+	}
+	if (ret) {
+		intel_trtt_context_destroy_vma(vma);
+		return ERR_PTR(ret);
+	}
+
+	return vma;
+}
+
 static struct scatterlist *
 rotate_pages(const dma_addr_t *in, unsigned int offset,
 	     unsigned int width, unsigned int height,
diff --git a/drivers/gpu/drm/i915/i915_gem_gtt.h b/drivers/gpu/drm/i915/i915_gem_gtt.h
index d804be0..8cbaca2 100644
--- a/drivers/gpu/drm/i915/i915_gem_gtt.h
+++ b/drivers/gpu/drm/i915/i915_gem_gtt.h
@@ -128,6 +128,10 @@ typedef uint64_t gen8_ppgtt_pml4e_t;
 #define GEN8_PPAT_ELLC_OVERRIDE		(0<<2)
 #define GEN8_PPAT(i, x)			((uint64_t) (x) << ((i) * 8))

+/* Fixed size segment */
+#define GEN9_TRTT_SEG_SIZE_SHIFT	44
+#define GEN9_TRTT_SEGMENT_SIZE		(1ULL << GEN9_TRTT_SEG_SIZE_SHIFT)
+
 enum i915_ggtt_view_type {
 	I915_GGTT_VIEW_NORMAL = 0,
 	I915_GGTT_VIEW_ROTATED,
@@ -560,4 +564,8 @@ size_t
 i915_ggtt_view_size(struct drm_i915_gem_object *obj,
 		    const struct i915_ggtt_view *view);

+struct i915_vma *
+intel_trtt_context_allocate_vma(struct i915_address_space *vm,
+				uint64_t segment_base_addr);
+void intel_trtt_context_destroy_vma(struct i915_vma *vma);
 #endif
diff --git a/drivers/gpu/drm/i915/i915_reg.h b/drivers/gpu/drm/i915/i915_reg.h
index 264885f..07936b6 100644
--- a/drivers/gpu/drm/i915/i915_reg.h
+++ b/drivers/gpu/drm/i915/i915_reg.h
@@ -188,6 +188,25 @@ static inline bool i915_mmio_reg_valid(i915_reg_t reg)
 #define GEN8_RPCS_EU_MIN_SHIFT		0
 #define GEN8_RPCS_EU_MIN_MASK		(0xf << GEN8_RPCS_EU_MIN_SHIFT)

+#define GEN9_TR_CHICKEN_BIT_VECTOR	_MMIO(0x4DFC)
+#define   GEN9_TRTT_BYPASS_DISABLE	(1 << 0)
+
+/* TRTT registers in the H/W Context */
+#define GEN9_TRTT_L3_POINTER_DW0	_MMIO(0x4DE0)
+#define GEN9_TRTT_L3_POINTER_DW1	_MMIO(0x4DE4)
+#define   GEN9_TRTT_L3_GFXADDR_MASK	0xFFFFFFFF0000
+
+#define GEN9_TRTT_NULL_TILE_REG		_MMIO(0x4DE8)
+#define GEN9_TRTT_INVD_TILE_REG		_MMIO(0x4DEC)
+
+#define GEN9_TRTT_VA_MASKDATA		_MMIO(0x4DF0)
+#define   GEN9_TRVA_MASK_VALUE		0xF0
+#define   GEN9_TRVA_DATA_MASK		0xF
+
+#define GEN9_TRTT_TABLE_CONTROL		_MMIO(0x4DF4)
+#define   GEN9_TRTT_IN_GFX_VA_SPACE	(1 << 1)
+#define   GEN9_TRTT_ENABLE		(1 << 0)
+
 #define GAM_ECOCHK			_MMIO(0x4090)
 #define   BDW_DISABLE_HDC_INVALIDATION	(1<<25)
 #define   ECOCHK_SNB_BIT		(1<<10)
diff --git a/drivers/gpu/drm/i915/intel_lrc.c b/drivers/gpu/drm/i915/intel_lrc.c
index 3a23b95..8af480b 100644
--- a/drivers/gpu/drm/i915/intel_lrc.c
+++ b/drivers/gpu/drm/i915/intel_lrc.c
@@ -1645,6 +1645,76 @@ static int gen9_init_render_ring(struct intel_engine_cs *engine)
 	return init_workarounds_ring(engine);
 }

+static int gen9_init_rcs_context_trtt(struct drm_i915_gem_request *req)
+{
+	struct intel_ringbuffer *ringbuf = req->ringbuf;
+	int ret;
+
+	ret = intel_logical_ring_begin(req, 2 + 2);
+	if (ret)
+		return ret;
+
+	intel_logical_ring_emit(ringbuf, MI_LOAD_REGISTER_IMM(1));
+
+	intel_logical_ring_emit_reg(ringbuf, GEN9_TRTT_TABLE_CONTROL);
+	intel_logical_ring_emit(ringbuf, 0);
+
+	intel_logical_ring_emit(ringbuf, MI_NOOP);
+	intel_logical_ring_advance(ringbuf);
+
+	return 0;
+}
+
+static int gen9_emit_trtt_regs(struct drm_i915_gem_request *req)
+{
+	struct intel_context *ctx = req->ctx;
+	struct intel_ringbuffer *ringbuf = req->ringbuf;
+	u64 masked_l3_gfx_address =
+		ctx->trtt_info.l3_table_address & GEN9_TRTT_L3_GFXADDR_MASK;
+	u32 trva_data_value =
+		(ctx->trtt_info.segment_base_addr >> GEN9_TRTT_SEG_SIZE_SHIFT) &
+		GEN9_TRVA_DATA_MASK;
+	const int num_lri_cmds = 6;
+	int ret;
+
+	/*
+	 * Emitting LRIs to update the TRTT registers is most reliable, instead
+	 * of directly updating the context image, as this will ensure that
+	 * update happens in a serialized manner for the context and also
+	 * lite-restore scenario will get handled.
+	 */
+	ret = intel_logical_ring_begin(req, num_lri_cmds * 2 + 2);
+	if (ret)
+		return ret;
+
+	intel_logical_ring_emit(ringbuf, MI_LOAD_REGISTER_IMM(num_lri_cmds));
+
+	intel_logical_ring_emit_reg(ringbuf, GEN9_TRTT_L3_POINTER_DW0);
+	intel_logical_ring_emit(ringbuf, lower_32_bits(masked_l3_gfx_address));
+
+	intel_logical_ring_emit_reg(ringbuf, GEN9_TRTT_L3_POINTER_DW1);
+	intel_logical_ring_emit(ringbuf, upper_32_bits(masked_l3_gfx_address));
+
+	intel_logical_ring_emit_reg(ringbuf, GEN9_TRTT_NULL_TILE_REG);
+	intel_logical_ring_emit(ringbuf, ctx->trtt_info.null_tile_val);
+
+	intel_logical_ring_emit_reg(ringbuf, GEN9_TRTT_INVD_TILE_REG);
+	intel_logical_ring_emit(ringbuf, ctx->trtt_info.invd_tile_val);
+
+	intel_logical_ring_emit_reg(ringbuf, GEN9_TRTT_VA_MASKDATA);
+	intel_logical_ring_emit(ringbuf,
+				GEN9_TRVA_MASK_VALUE | trva_data_value);
+
+	intel_logical_ring_emit_reg(ringbuf, GEN9_TRTT_TABLE_CONTROL);
+	intel_logical_ring_emit(ringbuf,
+				GEN9_TRTT_IN_GFX_VA_SPACE | GEN9_TRTT_ENABLE);
+
+	intel_logical_ring_emit(ringbuf, MI_NOOP);
+	intel_logical_ring_advance(ringbuf);
+
+	return 0;
+}
+
 static int intel_logical_ring_emit_pdps(struct drm_i915_gem_request *req)
 {
 	struct i915_hw_ppgtt *ppgtt = req->ctx->ppgtt;
@@ -2003,6 +2073,25 @@ static int gen8_init_rcs_context(struct drm_i915_gem_request *req)
 	return intel_lr_context_render_state_init(req);
 }

+static int gen9_init_rcs_context(struct drm_i915_gem_request *req)
+{
+	int ret;
+
+	/*
+	 * Explictily disable TR-TT at the start of a new context.
+	 * Otherwise on switching from a TR-TT context to a new Non TR-TT
+	 * context the TR-TT settings of the outgoing context could get
+	 * spilled on to the new incoming context as only the Ring Context
+	 * part is loaded on the first submission of a new context, due to
+	 * the setting of ENGINE_CTX_RESTORE_INHIBIT bit.
+	 */
+	ret = gen9_init_rcs_context_trtt(req);
+	if (ret)
+		return ret;
+
+	return gen8_init_rcs_context(req);
+}
+
 /**
  * intel_logical_ring_cleanup() - deallocate the Engine Command Streamer
  *
@@ -2134,11 +2223,14 @@ static int logical_render_ring_init(struct drm_device *dev)
 	logical_ring_default_vfuncs(dev, engine);

 	/* Override some for render ring. */
-	if (INTEL_INFO(dev)->gen >= 9)
+	if (INTEL_INFO(dev)->gen >= 9) {
 		engine->init_hw = gen9_init_render_ring;
-	else
+		engine->init_context = gen9_init_rcs_context;
+	} else {
 		engine->init_hw = gen8_init_render_ring;
-	engine->init_context = gen8_init_rcs_context;
+		engine->init_context = gen8_init_rcs_context;
+	}
+
 	engine->cleanup = intel_fini_pipe_control;
 	engine->emit_flush = gen8_emit_flush_render;
 	engine->emit_request = gen8_emit_request_render;
@@ -2702,3 +2794,29 @@ void intel_lr_context_reset(struct drm_device *dev,
 		ringbuf->tail = 0;
 	}
 }
+
+int intel_lr_rcs_context_setup_trtt(struct intel_context *ctx)
+{
+	struct intel_engine_cs *engine = &(ctx->i915->engine[RCS]);
+	struct drm_i915_gem_request *req;
+	int ret;
+
+	if (!ctx->engine[RCS].state) {
+		ret = intel_lr_context_deferred_alloc(ctx, engine);
+		if (ret)
+			return ret;
+	}
+
+	req = i915_gem_request_alloc(engine, ctx);
+	if (IS_ERR(req))
+		return PTR_ERR(req);
+
+	ret = gen9_emit_trtt_regs(req);
+	if (ret) {
+		i915_gem_request_cancel(req);
+		return ret;
+	}
+
+	i915_add_request(req);
+	return 0;
+}
diff --git a/drivers/gpu/drm/i915/intel_lrc.h b/drivers/gpu/drm/i915/intel_lrc.h
index a17cb12..f3600b2 100644
--- a/drivers/gpu/drm/i915/intel_lrc.h
+++ b/drivers/gpu/drm/i915/intel_lrc.h
@@ -107,6 +107,7 @@ void intel_lr_context_reset(struct drm_device *dev,
 			struct intel_context *ctx);
 uint64_t intel_lr_context_descriptor(struct intel_context *ctx,
 				     struct intel_engine_cs *engine);
+int intel_lr_rcs_context_setup_trtt(struct intel_context *ctx);

 u32 intel_execlists_ctx_id(struct intel_context *ctx,
 			   struct intel_engine_cs *engine);
diff --git a/include/uapi/drm/i915_drm.h b/include/uapi/drm/i915_drm.h
index a5524cc..604da23 100644
--- a/include/uapi/drm/i915_drm.h
+++ b/include/uapi/drm/i915_drm.h
@@ -1167,7 +1167,15 @@ struct drm_i915_gem_context_param {
 #define I915_CONTEXT_PARAM_BAN_PERIOD	0x1
 #define I915_CONTEXT_PARAM_NO_ZEROMAP	0x2
 #define I915_CONTEXT_PARAM_GTT_SIZE	0x3
+#define I915_CONTEXT_PARAM_TRTT		0x4
 	__u64 value;
 };

+struct drm_i915_gem_context_trtt_param {
+	__u64 segment_base_addr;
+	__u64 l3_table_address;
+	__u32 invd_tile_val;
+	__u32 null_tile_val;
+};
+
 #endif /* _UAPI_I915_DRM_H_ */