Message ID | 20230713152718.645488-2-tvrtko.ursulin@linux.intel.com (mailing list archive) |
---|---|
State | New, archived |
Headers | show |
Series | [RFC,1/2] drm/i915: Refactor PAT/object cache handling | expand |
> From: Tvrtko Ursulin <tvrtko.ursulin@intel.com> > > According to the comment in i915_gem_object_can_bypass_llc the > purpose of the function is to return false if the platform/object > has a caching mode where GPU can bypass the LLC. > > So far the only platforms which allegedly can do this are Jasperlake > and Elkhartlake, and that via MOCS (not PAT). > > Instead of blindly assuming that objects where userspace has set the > PAT index can (bypass the LLC), question is is there a such PAT index > on a platform. Probably starting with Meteorlake since that one is the > only one where set PAT extension can be currently used. Or if there is > a MOCS entry which can achieve the same thing on Meteorlake. > > If there is such PAT, now that i915 can be made to understand them > better, we can make the check more fine grained. Or if there is a MOCS > entry then we probably should apply the blanket IS_METEORLAKE condition. > > Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com> > Fixes: 9275277d5324 ("drm/i915: use pat_index instead of cache_level") > Cc: Chris Wilson <chris.p.wilson@linux.intel.com> > Cc: Fei Yang <fei.yang@intel.com> > Cc: Andi Shyti <andi.shyti@linux.intel.com> > Cc: Matt Roper <matthew.d.roper@intel.com> > --- > drivers/gpu/drm/i915/gem/i915_gem_object.c | 6 ------ > 1 file changed, 6 deletions(-) > > diff --git a/drivers/gpu/drm/i915/gem/i915_gem_object.c b/drivers/gpu/drm/i915/gem/i915_gem_object.c > index 33a1e97d18b3..1e34171c4162 100644 > --- a/drivers/gpu/drm/i915/gem/i915_gem_object.c > +++ b/drivers/gpu/drm/i915/gem/i915_gem_object.c > @@ -229,12 +229,6 @@ bool i915_gem_object_can_bypass_llc(struct drm_i915_gem_object *obj) > if (!(obj->flags & I915_BO_ALLOC_USER)) > return false; > > - /* > - * Always flush cache for UMD objects at creation time. > - */ > - if (obj->pat_set_by_user) I'm afraid this is going to break MESA. Can we run MESA tests with this patch? > /* > * EHL and JSL add the 'Bypass LLC' MOCS entry, which should make it > * possible for userspace to bypass the GTT caching bits set by the > -- > 2.39.2
On 14/07/2023 06:43, Yang, Fei wrote: >> From: Tvrtko Ursulin <tvrtko.ursulin@intel.com> >> >> According to the comment in i915_gem_object_can_bypass_llc the >> purpose of the function is to return false if the platform/object >> has a caching mode where GPU can bypass the LLC. >> >> So far the only platforms which allegedly can do this are Jasperlake >> and Elkhartlake, and that via MOCS (not PAT). >> >> Instead of blindly assuming that objects where userspace has set the >> PAT index can (bypass the LLC), question is is there a such PAT index >> on a platform. Probably starting with Meteorlake since that one is the >> only one where set PAT extension can be currently used. Or if there is >> a MOCS entry which can achieve the same thing on Meteorlake. >> >> If there is such PAT, now that i915 can be made to understand them >> better, we can make the check more fine grained. Or if there is a MOCS >> entry then we probably should apply the blanket IS_METEORLAKE condition. >> >> Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com> >> Fixes: 9275277d5324 ("drm/i915: use pat_index instead of cache_level") >> Cc: Chris Wilson <chris.p.wilson@linux.intel.com> >> Cc: Fei Yang <fei.yang@intel.com> >> Cc: Andi Shyti <andi.shyti@linux.intel.com> >> Cc: Matt Roper <matthew.d.roper@intel.com> >> --- >> drivers/gpu/drm/i915/gem/i915_gem_object.c | 6 ------ >> 1 file changed, 6 deletions(-) >> >> diff --git a/drivers/gpu/drm/i915/gem/i915_gem_object.c b/drivers/gpu/drm/i915/gem/i915_gem_object.c >> index 33a1e97d18b3..1e34171c4162 100644 >> --- a/drivers/gpu/drm/i915/gem/i915_gem_object.c >> +++ b/drivers/gpu/drm/i915/gem/i915_gem_object.c >> @@ -229,12 +229,6 @@ bool i915_gem_object_can_bypass_llc(struct drm_i915_gem_object *obj) >> if (!(obj->flags & I915_BO_ALLOC_USER)) >> return false; >> >> - /* >> - * Always flush cache for UMD objects at creation time. >> - */ >> - if (obj->pat_set_by_user) > > I'm afraid this is going to break MESA. Can we run MESA tests with this patch? I can't, but question is why it would break Mesa which would need a nice comment here? For instance should the check be IS_METEORLAKE? Or should it be "is wb" && "not has 1-way coherent"? Or both? Or, given how Meteorlake does not have LLC, how can anything bypass it there? Or is it about snooping on Meteorlake and how? Regards, Tvrtko > >> /* >> * EHL and JSL add the 'Bypass LLC' MOCS entry, which should make it >> * possible for userspace to bypass the GTT caching bits set by the >> -- >> 2.39.2
> On 14/07/2023 06:43, Yang, Fei wrote: >>> From: Tvrtko Ursulin <tvrtko.ursulin@intel.com> >>> >>> According to the comment in i915_gem_object_can_bypass_llc the >>> purpose of the function is to return false if the platform/object has >>> a caching mode where GPU can bypass the LLC. >>> >>> So far the only platforms which allegedly can do this are Jasperlake >>> and Elkhartlake, and that via MOCS (not PAT). >>> >>> Instead of blindly assuming that objects where userspace has set the >>> PAT index can (bypass the LLC), question is is there a such PAT index >>> on a platform. Probably starting with Meteorlake since that one is >>> the only one where set PAT extension can be currently used. Or if >>> there is a MOCS entry which can achieve the same thing on Meteorlake. >>> >>> If there is such PAT, now that i915 can be made to understand them >>> better, we can make the check more fine grained. Or if there is a >>> MOCS entry then we probably should apply the blanket IS_METEORLAKE condition. >>> >>> Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com> >>> Fixes: 9275277d5324 ("drm/i915: use pat_index instead of >>> cache_level") >>> Cc: Chris Wilson <chris.p.wilson@linux.intel.com> >>> Cc: Fei Yang <fei.yang@intel.com> >>> Cc: Andi Shyti <andi.shyti@linux.intel.com> >>> Cc: Matt Roper <matthew.d.roper@intel.com> >>> --- >>> drivers/gpu/drm/i915/gem/i915_gem_object.c | 6 ------ >>> 1 file changed, 6 deletions(-) >>> >>> diff --git a/drivers/gpu/drm/i915/gem/i915_gem_object.c >>> b/drivers/gpu/drm/i915/gem/i915_gem_object.c >>> index 33a1e97d18b3..1e34171c4162 100644 >>> --- a/drivers/gpu/drm/i915/gem/i915_gem_object.c >>> +++ b/drivers/gpu/drm/i915/gem/i915_gem_object.c >>> @@ -229,12 +229,6 @@ bool i915_gem_object_can_bypass_llc(struct drm_i915_gem_object *obj) >>> if (!(obj->flags & I915_BO_ALLOC_USER)) >>> return false; >>> >>> - /* >>> - * Always flush cache for UMD objects at creation time. >>> - */ >>> - if (obj->pat_set_by_user) >> >> I'm afraid this is going to break MESA. Can we run MESA tests with this patch? > > I can't, but question is why it would break Mesa which would need a > nice comment here? For objects with PAT index set by user, the KMD doesn't know whether the user space would map it as cacheable or not for CPU access. So we want to enforce a cache flush at BO creation time before handing the BO over to user space. I remember the issue we had before is that MESA sees garbage data in a BO that is supposed to be initialized with zero. I understand your point, checking for obj->pat_set_by_user is not something to be done in this function. I'm fine with the removal of these lines, but the checking needs to be done somewhere, maybe just one level up? diff --git a/drivers/gpu/drm/i915/gem/i915_gem_dmabuf.c b/drivers/gpu/drm/i915/gem/i915_gem_dmabuf.c index 1df74f7aa3dc..39cd903ba223 100644 --- a/drivers/gpu/drm/i915/gem/i915_gem_dmabuf.c +++ b/drivers/gpu/drm/i915/gem/i915_gem_dmabuf.c @@ -258,6 +258,7 @@ static int i915_gem_object_get_pages_dmabuf(struct drm_i915_gem_object *obj) * the driver. */ if (i915_gem_object_can_bypass_llc(obj) || + obj->pat_set_by_user || (!HAS_LLC(i915) && !IS_DG1(i915))) wbinvd_on_all_cpus(); diff --git a/drivers/gpu/drm/i915/gem/i915_gem_shmem.c b/drivers/gpu/drm/i915/gem/i915_gem_shmem.c index 8f1633c3fb93..770e02a851f6 100644 --- a/drivers/gpu/drm/i915/gem/i915_gem_shmem.c +++ b/drivers/gpu/drm/i915/gem/i915_gem_shmem.c @@ -254,7 +254,8 @@ static int shmem_get_pages(struct drm_i915_gem_object *obj) if (i915_gem_object_needs_bit17_swizzle(obj)) i915_gem_object_do_bit_17_swizzle(obj, st); - if (i915_gem_object_can_bypass_llc(obj)) + if (i915_gem_object_can_bypass_llc(obj) || + obj->pat_set_by_user) obj->cache_dirty = true; __i915_gem_object_set_pages(obj, st); diff --git a/drivers/gpu/drm/i915/gem/i915_gem_userptr.c b/drivers/gpu/drm/i915/gem/i915_gem_userptr.c index 1d3ebdf4069b..9e65e3324422 100644 --- a/drivers/gpu/drm/i915/gem/i915_gem_userptr.c +++ b/drivers/gpu/drm/i915/gem/i915_gem_userptr.c @@ -170,7 +170,8 @@ static int i915_gem_userptr_get_pages(struct drm_i915_gem_object *obj) } WARN_ON_ONCE(!(obj->cache_coherent & I915_BO_CACHE_COHERENT_FOR_WRITE)); - if (i915_gem_object_can_bypass_llc(obj)) + if (i915_gem_object_can_bypass_llc(obj) || + obj->pat_set_by_user) obj->cache_dirty = true; __i915_gem_object_set_pages(obj, st); -Fei > For instance should the check be IS_METEORLAKE? > > Or should it be "is wb" && "not has 1-way coherent"? > > Or both? > > Or, given how Meteorlake does not have LLC, how can anything bypass it > there? Or is it about snooping on Meteorlake and how? > > Regards, > > Tvrtko > >> >>> /* >>> * EHL and JSL add the 'Bypass LLC' MOCS entry, which should make it >>> * possible for userspace to bypass the GTT caching bits set >>> by the >>> -- >>> 2.39.2
On Fri, Jul 14, 2023 at 11:11:30AM +0100, Tvrtko Ursulin wrote: > > On 14/07/2023 06:43, Yang, Fei wrote: > > > From: Tvrtko Ursulin <tvrtko.ursulin@intel.com> > > > > > > According to the comment in i915_gem_object_can_bypass_llc the > > > purpose of the function is to return false if the platform/object > > > has a caching mode where GPU can bypass the LLC. > > > > > > So far the only platforms which allegedly can do this are Jasperlake > > > and Elkhartlake, and that via MOCS (not PAT). > > > > > > Instead of blindly assuming that objects where userspace has set the > > > PAT index can (bypass the LLC), question is is there a such PAT index > > > on a platform. Probably starting with Meteorlake since that one is the > > > only one where set PAT extension can be currently used. Or if there is > > > a MOCS entry which can achieve the same thing on Meteorlake. > > > > > > If there is such PAT, now that i915 can be made to understand them > > > better, we can make the check more fine grained. Or if there is a MOCS > > > entry then we probably should apply the blanket IS_METEORLAKE condition. > > > > > > Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com> > > > Fixes: 9275277d5324 ("drm/i915: use pat_index instead of cache_level") > > > Cc: Chris Wilson <chris.p.wilson@linux.intel.com> > > > Cc: Fei Yang <fei.yang@intel.com> > > > Cc: Andi Shyti <andi.shyti@linux.intel.com> > > > Cc: Matt Roper <matthew.d.roper@intel.com> > > > --- > > > drivers/gpu/drm/i915/gem/i915_gem_object.c | 6 ------ > > > 1 file changed, 6 deletions(-) > > > > > > diff --git a/drivers/gpu/drm/i915/gem/i915_gem_object.c b/drivers/gpu/drm/i915/gem/i915_gem_object.c > > > index 33a1e97d18b3..1e34171c4162 100644 > > > --- a/drivers/gpu/drm/i915/gem/i915_gem_object.c > > > +++ b/drivers/gpu/drm/i915/gem/i915_gem_object.c > > > @@ -229,12 +229,6 @@ bool i915_gem_object_can_bypass_llc(struct drm_i915_gem_object *obj) > > > if (!(obj->flags & I915_BO_ALLOC_USER)) > > > return false; > > > > > > - /* > > > - * Always flush cache for UMD objects at creation time. > > > - */ > > > - if (obj->pat_set_by_user) > > > > I'm afraid this is going to break MESA. Can we run MESA tests with this patch? > > I can't, but question is why it would break Mesa which would need a nice > comment here? > > For instance should the check be IS_METEORLAKE? > > Or should it be "is wb" && "not has 1-way coherent"? > > Or both? > > Or, given how Meteorlake does not have LLC, how can anything bypass it > there? Or is it about snooping on Meteorlake and how? I think the "LLC" in the function name is a bit misleading since this is really all just about the ability to avoid coherency (which might come from an LLC on some platforms or from snooping on others). The concern is that the CPU writes to the buffer and those writes sit in a CPU cache without making it to RAM immediately. If the GPU then reads the object with any of the non-coherent PAT settings that were introduced in Xe_LPG, it will not snoop the CPU cache and will read old, stale data from RAM. So I think we'd want a condition like ("Xe_LPG or later" && "any non coherent PAT"). The WB/WT/UC status of the GPU behavior shouldn't matter here, just the coherency setting. Matt > > Regards, > > Tvrtko > > > > > > /* > > > * EHL and JSL add the 'Bypass LLC' MOCS entry, which should make it > > > * possible for userspace to bypass the GTT caching bits set by the > > > -- > > > 2.39.2
On 15/07/2023 01:20, Matt Roper wrote: > On Fri, Jul 14, 2023 at 11:11:30AM +0100, Tvrtko Ursulin wrote: >> >> On 14/07/2023 06:43, Yang, Fei wrote: >>>> From: Tvrtko Ursulin <tvrtko.ursulin@intel.com> >>>> >>>> According to the comment in i915_gem_object_can_bypass_llc the >>>> purpose of the function is to return false if the platform/object >>>> has a caching mode where GPU can bypass the LLC. >>>> >>>> So far the only platforms which allegedly can do this are Jasperlake >>>> and Elkhartlake, and that via MOCS (not PAT). >>>> >>>> Instead of blindly assuming that objects where userspace has set the >>>> PAT index can (bypass the LLC), question is is there a such PAT index >>>> on a platform. Probably starting with Meteorlake since that one is the >>>> only one where set PAT extension can be currently used. Or if there is >>>> a MOCS entry which can achieve the same thing on Meteorlake. >>>> >>>> If there is such PAT, now that i915 can be made to understand them >>>> better, we can make the check more fine grained. Or if there is a MOCS >>>> entry then we probably should apply the blanket IS_METEORLAKE condition. >>>> >>>> Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com> >>>> Fixes: 9275277d5324 ("drm/i915: use pat_index instead of cache_level") >>>> Cc: Chris Wilson <chris.p.wilson@linux.intel.com> >>>> Cc: Fei Yang <fei.yang@intel.com> >>>> Cc: Andi Shyti <andi.shyti@linux.intel.com> >>>> Cc: Matt Roper <matthew.d.roper@intel.com> >>>> --- >>>> drivers/gpu/drm/i915/gem/i915_gem_object.c | 6 ------ >>>> 1 file changed, 6 deletions(-) >>>> >>>> diff --git a/drivers/gpu/drm/i915/gem/i915_gem_object.c b/drivers/gpu/drm/i915/gem/i915_gem_object.c >>>> index 33a1e97d18b3..1e34171c4162 100644 >>>> --- a/drivers/gpu/drm/i915/gem/i915_gem_object.c >>>> +++ b/drivers/gpu/drm/i915/gem/i915_gem_object.c >>>> @@ -229,12 +229,6 @@ bool i915_gem_object_can_bypass_llc(struct drm_i915_gem_object *obj) >>>> if (!(obj->flags & I915_BO_ALLOC_USER)) >>>> return false; >>>> >>>> - /* >>>> - * Always flush cache for UMD objects at creation time. >>>> - */ >>>> - if (obj->pat_set_by_user) >>> >>> I'm afraid this is going to break MESA. Can we run MESA tests with this patch? >> >> I can't, but question is why it would break Mesa which would need a nice >> comment here? >> >> For instance should the check be IS_METEORLAKE? >> >> Or should it be "is wb" && "not has 1-way coherent"? >> >> Or both? >> >> Or, given how Meteorlake does not have LLC, how can anything bypass it >> there? Or is it about snooping on Meteorlake and how? > > I think the "LLC" in the function name is a bit misleading since this is > really all just about the ability to avoid coherency (which might come > from an LLC on some platforms or from snooping on others). > > The concern is that the CPU writes to the buffer and those writes sit in > a CPU cache without making it to RAM immediately. If the GPU then > reads the object with any of the non-coherent PAT settings that were > introduced in Xe_LPG, it will not snoop the CPU cache and will read old, > stale data from RAM. > > So I think we'd want a condition like ("Xe_LPG or later" && "any non > coherent PAT"). The WB/WT/UC status of the GPU behavior shouldn't > matter here, just the coherency setting. Right, sounds plausible to me. So with this series the new condition in this function would look like this: i915_gem_object_can_bypass_llc(..) { ... if (i915_gem_object_has_cache_mode(obj, I915_CACHE_MODE_WB) && i915_gem_object_has_cache_flag(obj, I915_CACHE_FLAG_COH1W) != 1) return true; ("!= 1" in the condition meaning either it is not coherent, or i915 does not know due table being incomplete - like some PAT index on some future platform was forgotten to be defined.) That would catch any platform with non-coherent WB, as long as the PAT-to-i915-cache-mode tables are correct. It would currently only apply to Meteorlake: #define MTL_CACHE_MODES \ .cache_modes = { \ [0] = I915_CACHE(WB), \ [1] = I915_CACHE(WT), \ [2] = I915_CACHE(UC), \ [3] = _I915_CACHE(WB, COH1W), \ [4] = __I915_CACHE(WB, BIT(I915_CACHE_FLAG_COH1W) | BIT(I915_CACHE_FLAG_COH2W)), \ } Or are saying it should apply to UC and WT too somehow? I'll also try to join sub-threads to Fei's reply here too. So in terms of the stated issue with _CPU_ access from Mesa seeing stale data (non-zeroed pages) depending on the PAT index - I don't understand that yet. That seems like a completely CPU cache problem space and I do not understand how PAT index gets into the picture. But the proposed patch from your email Fei looks like it would be covered by the snippet I have in this reply. Regards, Tvrtko
diff --git a/drivers/gpu/drm/i915/gem/i915_gem_object.c b/drivers/gpu/drm/i915/gem/i915_gem_object.c index 33a1e97d18b3..1e34171c4162 100644 --- a/drivers/gpu/drm/i915/gem/i915_gem_object.c +++ b/drivers/gpu/drm/i915/gem/i915_gem_object.c @@ -229,12 +229,6 @@ bool i915_gem_object_can_bypass_llc(struct drm_i915_gem_object *obj) if (!(obj->flags & I915_BO_ALLOC_USER)) return false; - /* - * Always flush cache for UMD objects at creation time. - */ - if (obj->pat_set_by_user) - return true; - /* * EHL and JSL add the 'Bypass LLC' MOCS entry, which should make it * possible for userspace to bypass the GTT caching bits set by the