Message ID | 20240415081423.495834-20-balasubramani.vivekanandan@intel.com (mailing list archive) |
---|---|
State | New, archived |
Headers | show |
Series | Enable display support for Battlemage | expand |
On Mon, Apr 15, 2024 at 01:44:21PM +0530, Balasubramani Vivekanandan wrote: > From: Nirmoy Das <nirmoy.das@intel.com> > > Display surfaces can be tagged as transient by mapping it using one of > the various L3:XD PAT index modes on Xe2. The expectation is that KMD > needs to request transient data flush at the start of flip sequence to > ensure all transient data in L3 cache is flushed to memory. Add a > routine for this which we can then call from the display code. > > CC: Matt Roper <matthew.d.roper@intel.com> > Signed-off-by: Nirmoy Das <nirmoy.das@intel.com> > Co-developed-by: Matthew Auld <matthew.auld@intel.com> > Signed-off-by: Matthew Auld <matthew.auld@intel.com> > Signed-off-by: Balasubramani Vivekanandan <balasubramani.vivekanandan@intel.com> > --- > drivers/gpu/drm/xe/regs/xe_gt_regs.h | 3 ++ > drivers/gpu/drm/xe/xe_device.c | 49 ++++++++++++++++++++++++++++ > drivers/gpu/drm/xe/xe_device.h | 2 ++ > 3 files changed, 54 insertions(+) > > diff --git a/drivers/gpu/drm/xe/regs/xe_gt_regs.h b/drivers/gpu/drm/xe/regs/xe_gt_regs.h > index 8fe811ea404a..65719a712807 100644 > --- a/drivers/gpu/drm/xe/regs/xe_gt_regs.h > +++ b/drivers/gpu/drm/xe/regs/xe_gt_regs.h > @@ -318,6 +318,9 @@ > > #define XE2LPM_L3SQCREG5 XE_REG_MCR(0xb658) > > +#define XE2_TDF_CTRL XE_REG(0xb418) > +#define TRANSIENT_FLUSH_REQUEST REG_BIT(0) > + > #define XEHP_MERT_MOD_CTRL XE_REG_MCR(0xcf28) > #define RENDER_MOD_CTRL XE_REG_MCR(0xcf2c) > #define COMP_MOD_CTRL XE_REG_MCR(0xcf30) > diff --git a/drivers/gpu/drm/xe/xe_device.c b/drivers/gpu/drm/xe/xe_device.c > index d85a2ba0a057..22e6422c7b8e 100644 > --- a/drivers/gpu/drm/xe/xe_device.c > +++ b/drivers/gpu/drm/xe/xe_device.c > @@ -717,6 +717,55 @@ void xe_device_wmb(struct xe_device *xe) > xe_mmio_write32(gt, SOFTWARE_FLAGS_SPR33, 0); > } > > +/** > + * xe_device_td_flush() - Flush transient L3 cache entries > + * @xe: The device > + * > + * Display engine has direct access to memory and is never coherent with L3/L4 > + * caches (or CPU caches), however KMD is responsible for specifically flushing > + * transient L3 GPU cache entries prior to the flip sequence to ensure scanout > + * can happen from such a surface without seeing corruption. > + * > + * Display surfaces can be tagged as transient by mapping it using one of the > + * various L3:XD PAT index modes on Xe2. > + * > + * Note: On non-discrete xe2 platforms, like LNL, the entire L3 cache is flushed > + * at the end of each submission via PIPE_CONTROL for compute/render, since SA > + * Media is not coherent with L3 and we want to support render-vs-media > + * usescases. For other engines like copy/blt the HW internally forces uncached > + * behaviour, hence why we can skip the TDF on such platforms. > + */ > +void xe_device_td_flush(struct xe_device *xe) > +{ > + struct xe_gt *gt; > + u8 id; > + > + if (!IS_DGFX(xe) || GRAPHICS_VER(xe) < 20) > + return; > + > + for_each_gt(gt, xe, id) { > + if (xe_gt_is_media_type(gt)) > + continue; > + > + if (xe_force_wake_get(gt_to_fw(gt), XE_FW_GT)) > + return; > + > + xe_mmio_write32(gt, XE2_TDF_CTRL, TRANSIENT_FLUSH_REQUEST); > + /* > + * FIXME: We can likely do better here with our choice of > + * timeout. Currently we just assume the worst case, i.e. 64us, > + * which is believed to be sufficient to cover the worst case > + * scenario on current platforms if all cache entries are > + * transient and need to be flushed.. > + */ > + if (xe_mmio_wait32(gt, XE2_TDF_CTRL, TRANSIENT_FLUSH_REQUEST, 0, > + 150, NULL, false)) Comment (64us) doesn't seem to match code (150us). Aside from that, Reviewed-by: Matt Roper <matthew.d.roper@intel.com> Matt > + xe_gt_err_once(gt, "TD flush timeout\n"); > + > + xe_force_wake_put(gt_to_fw(gt), XE_FW_GT); > + } > +} > + > u32 xe_device_ccs_bytes(struct xe_device *xe, u64 size) > { > return xe_device_has_flat_ccs(xe) ? > diff --git a/drivers/gpu/drm/xe/xe_device.h b/drivers/gpu/drm/xe/xe_device.h > index d413bc2c6be5..d3430f4b820a 100644 > --- a/drivers/gpu/drm/xe/xe_device.h > +++ b/drivers/gpu/drm/xe/xe_device.h > @@ -176,4 +176,6 @@ void xe_device_snapshot_print(struct xe_device *xe, struct drm_printer *p); > u64 xe_device_canonicalize_addr(struct xe_device *xe, u64 address); > u64 xe_device_uncanonicalize_addr(struct xe_device *xe, u64 address); > > +void xe_device_td_flush(struct xe_device *xe); > + > #endif > -- > 2.25.1 >
diff --git a/drivers/gpu/drm/xe/regs/xe_gt_regs.h b/drivers/gpu/drm/xe/regs/xe_gt_regs.h index 8fe811ea404a..65719a712807 100644 --- a/drivers/gpu/drm/xe/regs/xe_gt_regs.h +++ b/drivers/gpu/drm/xe/regs/xe_gt_regs.h @@ -318,6 +318,9 @@ #define XE2LPM_L3SQCREG5 XE_REG_MCR(0xb658) +#define XE2_TDF_CTRL XE_REG(0xb418) +#define TRANSIENT_FLUSH_REQUEST REG_BIT(0) + #define XEHP_MERT_MOD_CTRL XE_REG_MCR(0xcf28) #define RENDER_MOD_CTRL XE_REG_MCR(0xcf2c) #define COMP_MOD_CTRL XE_REG_MCR(0xcf30) diff --git a/drivers/gpu/drm/xe/xe_device.c b/drivers/gpu/drm/xe/xe_device.c index d85a2ba0a057..22e6422c7b8e 100644 --- a/drivers/gpu/drm/xe/xe_device.c +++ b/drivers/gpu/drm/xe/xe_device.c @@ -717,6 +717,55 @@ void xe_device_wmb(struct xe_device *xe) xe_mmio_write32(gt, SOFTWARE_FLAGS_SPR33, 0); } +/** + * xe_device_td_flush() - Flush transient L3 cache entries + * @xe: The device + * + * Display engine has direct access to memory and is never coherent with L3/L4 + * caches (or CPU caches), however KMD is responsible for specifically flushing + * transient L3 GPU cache entries prior to the flip sequence to ensure scanout + * can happen from such a surface without seeing corruption. + * + * Display surfaces can be tagged as transient by mapping it using one of the + * various L3:XD PAT index modes on Xe2. + * + * Note: On non-discrete xe2 platforms, like LNL, the entire L3 cache is flushed + * at the end of each submission via PIPE_CONTROL for compute/render, since SA + * Media is not coherent with L3 and we want to support render-vs-media + * usescases. For other engines like copy/blt the HW internally forces uncached + * behaviour, hence why we can skip the TDF on such platforms. + */ +void xe_device_td_flush(struct xe_device *xe) +{ + struct xe_gt *gt; + u8 id; + + if (!IS_DGFX(xe) || GRAPHICS_VER(xe) < 20) + return; + + for_each_gt(gt, xe, id) { + if (xe_gt_is_media_type(gt)) + continue; + + if (xe_force_wake_get(gt_to_fw(gt), XE_FW_GT)) + return; + + xe_mmio_write32(gt, XE2_TDF_CTRL, TRANSIENT_FLUSH_REQUEST); + /* + * FIXME: We can likely do better here with our choice of + * timeout. Currently we just assume the worst case, i.e. 64us, + * which is believed to be sufficient to cover the worst case + * scenario on current platforms if all cache entries are + * transient and need to be flushed.. + */ + if (xe_mmio_wait32(gt, XE2_TDF_CTRL, TRANSIENT_FLUSH_REQUEST, 0, + 150, NULL, false)) + xe_gt_err_once(gt, "TD flush timeout\n"); + + xe_force_wake_put(gt_to_fw(gt), XE_FW_GT); + } +} + u32 xe_device_ccs_bytes(struct xe_device *xe, u64 size) { return xe_device_has_flat_ccs(xe) ? diff --git a/drivers/gpu/drm/xe/xe_device.h b/drivers/gpu/drm/xe/xe_device.h index d413bc2c6be5..d3430f4b820a 100644 --- a/drivers/gpu/drm/xe/xe_device.h +++ b/drivers/gpu/drm/xe/xe_device.h @@ -176,4 +176,6 @@ void xe_device_snapshot_print(struct xe_device *xe, struct drm_printer *p); u64 xe_device_canonicalize_addr(struct xe_device *xe, u64 address); u64 xe_device_uncanonicalize_addr(struct xe_device *xe, u64 address); +void xe_device_td_flush(struct xe_device *xe); + #endif