Message ID | 20240403112253.1432390-24-balasubramani.vivekanandan@intel.com (mailing list archive) |
---|---|
State | New, archived |
Series | Enable dislay support for Battlemage |
Hi Bala,

On 4/3/2024 1:22 PM, Balasubramani Vivekanandan wrote:
> From: Nirmoy Das <nirmoy.das@intel.com>
>
> Display surfaces can be tagged as transient by mapping it using one of
> the various L3:XD PAT index modes on Xe2. The expectation is that KMD
> needs to request transient data flush at the start of flip sequence to
> ensure all transient data in L3 cache is flushed to memory. Add a
> routine for this which we can then call from the display code.
>
> Signed-off-by: Nirmoy Das <nirmoy.das@intel.com>
> Co-developed-by: Matthew Auld <matthew.auld@intel.com>
> Signed-off-by: Matthew Auld <matthew.auld@intel.com>
> Signed-off-by: Balasubramani Vivekanandan <balasubramani.vivekanandan@intel.com>
> ---
>  drivers/gpu/drm/xe/regs/xe_gt_regs.h |  3 ++
>  drivers/gpu/drm/xe/xe_device.c       | 52 ++++++++++++++++++++++++++++
>  drivers/gpu/drm/xe/xe_device.h       |  2 ++
>  3 files changed, 57 insertions(+)
>
> diff --git a/drivers/gpu/drm/xe/regs/xe_gt_regs.h b/drivers/gpu/drm/xe/regs/xe_gt_regs.h
> index 6617c86a096b..7afe810b3441 100644
> --- a/drivers/gpu/drm/xe/regs/xe_gt_regs.h
> +++ b/drivers/gpu/drm/xe/regs/xe_gt_regs.h
> @@ -306,6 +306,9 @@
>
>  #define XE2LPM_L3SQCREG5			XE_REG_MCR(0xb658)
>
> +#define XE2_TDF_CTRL				XE_REG(0xb418)
> +#define   TRANSIENT_FLUSH_REQUEST		REG_BIT(0)
> +
>  #define XEHP_MERT_MOD_CTRL			XE_REG_MCR(0xcf28)
>  #define RENDER_MOD_CTRL			XE_REG_MCR(0xcf2c)
>  #define COMP_MOD_CTRL				XE_REG_MCR(0xcf30)
> diff --git a/drivers/gpu/drm/xe/xe_device.c b/drivers/gpu/drm/xe/xe_device.c
> index 01bd5ccf05ca..0c9769fe04f6 100644
> --- a/drivers/gpu/drm/xe/xe_device.c
> +++ b/drivers/gpu/drm/xe/xe_device.c
> @@ -641,6 +641,58 @@ void xe_device_wmb(struct xe_device *xe)
>  		xe_mmio_write32(gt, SOFTWARE_FLAGS_SPR33, 0);
>  }
>
> +/**
> + * xe_device_td_flush() - Flush transient L3 cache entries
> + * @xe: The device
> + *
> + * Display engine has direct access to memory and is never coherent with L3/L4
> + * caches (or CPU caches), however KMD is responsible for specifically flushing
> + * transient L3 GPU cache entries prior to the flip sequence to ensure scanout
> + * can happen from such a surface without seeing corruption.
> + *
> + * Display surfaces can be tagged as transient by mapping it using one of the
> + * various L3:XD PAT index modes on Xe2.
> + *
> + * Note: On non-discrete xe2 platforms, like LNL, the entire L3 cache is flushed
> + * at the end of each submission via PIPE_CONTROL for compute/render, since SA
> + * Media is not coherent with L3 and we want to support render-vs-media
> + * usescases. For other engines like copy/blt the HW internally forces uncached
> + * behaviour, hence why we can skip the TDF on such platforms.
> + */
> +void xe_device_td_flush(struct xe_device *xe)
> +{
> +	struct xe_gt *gt;
> +	int err;
> +	u8 id;
> +
> +	if (!IS_DGFX(xe) || GRAPHICS_VER(xe) < 20)
> +		return;
> +
> +	for_each_gt(gt, xe, id) {
> +		if (xe_gt_is_media_type(gt))
> +			continue;
> +
> +		err = xe_force_wake_get(gt_to_fw(gt), XE_FW_GT);
> +		if (err)
> +			return;

This can be if (xe_force_wake_get()..) without needing the err variable.

Sorry, this was my oversight from this morning.

Regards,
Nirmoy

> +
> +		xe_mmio_write32(gt, XE2_TDF_CTRL, TRANSIENT_FLUSH_REQUEST);
> +		/*
> +		 * FIXME: We can likely do better here with our choice of
> +		 * timeout. Currently we just assume the worst case, but really
> +		 * we should make this dependent on how much actual L3 there is
> +		 * for this system. Recomendation is to allow ~64us in the worst
> +		 * case for 8M of L3 (assumes all entries are transient and need
> +		 * to be flushed).
> +		 */
> +		if (xe_mmio_wait32(gt, XE2_TDF_CTRL, TRANSIENT_FLUSH_REQUEST, 0,
> +				   150, NULL, false))
> +			xe_gt_err_once(gt, "TD flush timeout\n");
> +
> +		xe_force_wake_put(gt_to_fw(gt), XE_FW_GT);
> +	}
> +}
> +
>  u32 xe_device_ccs_bytes(struct xe_device *xe, u64 size)
>  {
>  	return xe_device_has_flat_ccs(xe) ?
> diff --git a/drivers/gpu/drm/xe/xe_device.h b/drivers/gpu/drm/xe/xe_device.h
> index d413bc2c6be5..d3430f4b820a 100644
> --- a/drivers/gpu/drm/xe/xe_device.h
> +++ b/drivers/gpu/drm/xe/xe_device.h
> @@ -176,4 +176,6 @@ void xe_device_snapshot_print(struct xe_device *xe, struct drm_printer *p);
>  u64 xe_device_canonicalize_addr(struct xe_device *xe, u64 address);
>  u64 xe_device_uncanonicalize_addr(struct xe_device *xe, u64 address);
>
> +void xe_device_td_flush(struct xe_device *xe);
> +
>  #endif
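The review comment above (test the xe_force_wake_get() result directly and drop the err local) can be illustrated with a small user-space sketch. Everything named mock_* and td_flush_sketch() is a hypothetical stand-in written for this illustration, not the xe driver API; unlike the void function in the patch, the sketch returns a count purely so the control flow can be checked.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/*
 * Hypothetical stand-ins for the xe driver's GT and force-wake helpers so the
 * control flow compiles outside the kernel. Not the real kernel API.
 */
struct mock_gt {
	bool is_media;		/* stands in for xe_gt_is_media_type() */
	bool fw_fails;		/* makes the mock force-wake get fail */
	int fw_refs;		/* outstanding force-wake references */
	bool flush_requested;	/* stands in for the XE2_TDF_CTRL write */
};

static int mock_force_wake_get(struct mock_gt *gt)
{
	if (gt->fw_fails)
		return -1;
	gt->fw_refs++;
	return 0;
}

static void mock_force_wake_put(struct mock_gt *gt)
{
	assert(gt->fw_refs > 0);
	gt->fw_refs--;
}

/* Returns the number of GTs for which a flush was requested. */
int td_flush_sketch(struct mock_gt *gts, size_t count)
{
	int flushed = 0;

	for (size_t i = 0; i < count; i++) {
		struct mock_gt *gt = &gts[i];

		if (gt->is_media)
			continue;

		/* Test the return value directly; no 'err' local is needed. */
		if (mock_force_wake_get(gt))
			return flushed;

		gt->flush_requested = true;
		mock_force_wake_put(gt);
		flushed++;
	}

	return flushed;
}
```

As in the patch, a force-wake failure ends the loop early, so later GTs are left untouched.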
diff --git a/drivers/gpu/drm/xe/regs/xe_gt_regs.h b/drivers/gpu/drm/xe/regs/xe_gt_regs.h
index 6617c86a096b..7afe810b3441 100644
--- a/drivers/gpu/drm/xe/regs/xe_gt_regs.h
+++ b/drivers/gpu/drm/xe/regs/xe_gt_regs.h
@@ -306,6 +306,9 @@
 
 #define XE2LPM_L3SQCREG5			XE_REG_MCR(0xb658)
 
+#define XE2_TDF_CTRL				XE_REG(0xb418)
+#define   TRANSIENT_FLUSH_REQUEST		REG_BIT(0)
+
 #define XEHP_MERT_MOD_CTRL			XE_REG_MCR(0xcf28)
 #define RENDER_MOD_CTRL			XE_REG_MCR(0xcf2c)
 #define COMP_MOD_CTRL				XE_REG_MCR(0xcf30)
diff --git a/drivers/gpu/drm/xe/xe_device.c b/drivers/gpu/drm/xe/xe_device.c
index 01bd5ccf05ca..0c9769fe04f6 100644
--- a/drivers/gpu/drm/xe/xe_device.c
+++ b/drivers/gpu/drm/xe/xe_device.c
@@ -641,6 +641,58 @@ void xe_device_wmb(struct xe_device *xe)
 		xe_mmio_write32(gt, SOFTWARE_FLAGS_SPR33, 0);
 }
 
+/**
+ * xe_device_td_flush() - Flush transient L3 cache entries
+ * @xe: The device
+ *
+ * Display engine has direct access to memory and is never coherent with L3/L4
+ * caches (or CPU caches), however KMD is responsible for specifically flushing
+ * transient L3 GPU cache entries prior to the flip sequence to ensure scanout
+ * can happen from such a surface without seeing corruption.
+ *
+ * Display surfaces can be tagged as transient by mapping it using one of the
+ * various L3:XD PAT index modes on Xe2.
+ *
+ * Note: On non-discrete xe2 platforms, like LNL, the entire L3 cache is flushed
+ * at the end of each submission via PIPE_CONTROL for compute/render, since SA
+ * Media is not coherent with L3 and we want to support render-vs-media
+ * usescases. For other engines like copy/blt the HW internally forces uncached
+ * behaviour, hence why we can skip the TDF on such platforms.
+ */
+void xe_device_td_flush(struct xe_device *xe)
+{
+	struct xe_gt *gt;
+	int err;
+	u8 id;
+
+	if (!IS_DGFX(xe) || GRAPHICS_VER(xe) < 20)
+		return;
+
+	for_each_gt(gt, xe, id) {
+		if (xe_gt_is_media_type(gt))
+			continue;
+
+		err = xe_force_wake_get(gt_to_fw(gt), XE_FW_GT);
+		if (err)
+			return;
+
+		xe_mmio_write32(gt, XE2_TDF_CTRL, TRANSIENT_FLUSH_REQUEST);
+		/*
+		 * FIXME: We can likely do better here with our choice of
+		 * timeout. Currently we just assume the worst case, but really
+		 * we should make this dependent on how much actual L3 there is
+		 * for this system. Recomendation is to allow ~64us in the worst
+		 * case for 8M of L3 (assumes all entries are transient and need
+		 * to be flushed).
+		 */
+		if (xe_mmio_wait32(gt, XE2_TDF_CTRL, TRANSIENT_FLUSH_REQUEST, 0,
+				   150, NULL, false))
+			xe_gt_err_once(gt, "TD flush timeout\n");
+
+		xe_force_wake_put(gt_to_fw(gt), XE_FW_GT);
+	}
+}
+
 u32 xe_device_ccs_bytes(struct xe_device *xe, u64 size)
 {
 	return xe_device_has_flat_ccs(xe) ?
diff --git a/drivers/gpu/drm/xe/xe_device.h b/drivers/gpu/drm/xe/xe_device.h
index d413bc2c6be5..d3430f4b820a 100644
--- a/drivers/gpu/drm/xe/xe_device.h
+++ b/drivers/gpu/drm/xe/xe_device.h
@@ -176,4 +176,6 @@ void xe_device_snapshot_print(struct xe_device *xe, struct drm_printer *p);
 u64 xe_device_canonicalize_addr(struct xe_device *xe, u64 address);
 u64 xe_device_uncanonicalize_addr(struct xe_device *xe, u64 address);
 
+void xe_device_td_flush(struct xe_device *xe);
+
 #endif
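The FIXME in xe_device_td_flush() suggests deriving the wait timeout from the amount of L3 actually present, quoting roughly 64us for 8M in the worst case. One possible reading is sketched below in plain C. The helper td_flush_timeout_us() and both macros are hypothetical, "8M" is assumed to mean 8 MiB, and linear scaling with L3 size is an assumption of this sketch rather than something the patch states.

```c
#include <assert.h>
#include <stdint.h>

/* Figures quoted in the FIXME: ~64us worst case for 8M of L3. */
#define TDF_REF_L3_BYTES	(8ull << 20)	/* assumes "8M" means 8 MiB */
#define TDF_REF_TIMEOUT_US	64ull

/*
 * Hypothetical helper: scale the worst-case flush time linearly with the L3
 * size, rounding up to a whole microsecond. Linear scaling is an assumption.
 */
uint64_t td_flush_timeout_us(uint64_t l3_bytes)
{
	/* Guard the multiplication below against 64-bit overflow. */
	assert(l3_bytes <= (UINT64_MAX - TDF_REF_L3_BYTES) / TDF_REF_TIMEOUT_US);

	return (l3_bytes * TDF_REF_TIMEOUT_US + TDF_REF_L3_BYTES - 1) /
	       TDF_REF_L3_BYTES;
}
```

Under these assumptions an 8 MiB L3 yields 64 and a 16 MiB L3 yields 128, compared with the fixed value of 150 the patch currently passes to xe_mmio_wait32(). A real implementation would probably also want some headroom on top of the computed figure.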