Message ID | 20211110085527.1033475-1-thomas.hellstrom@linux.intel.com (mailing list archive) |
---|---|
State | New, archived |
Headers | show |
Series | drm/i915/ttm: Fix illegal addition to shrinker list | expand |
On 10/11/2021 08:55, Thomas Hellström wrote: > There's a small window of opportunity during which the adjust_lru() > function can be called with a GEM refcount of zero from the TTM > eviction code. This results in a kernel BUG(). > > Ensure that we don't attempt to modify the GEM shrinker lists unless > we have a GEM refcount. > > Fixes: ebd4a8ec7799 ("drm/i915/ttm: move shrinker management into adjust_lru") > Signed-off-by: Thomas Hellström <thomas.hellstrom@linux.intel.com> Reviewed-by: Matthew Auld <matthew.auld@intel.com>
On 11/10/21 11:29, Matthew Auld wrote: > On 10/11/2021 08:55, Thomas Hellström wrote: >> There's a small window of opportunity during which the adjust_lru() >> function can be called with a GEM refcount of zero from the TTM >> eviction code. This results in a kernel BUG(). >> >> Ensure that we don't attempt to modify the GEM shrinker lists unless >> we have a GEM refcount. >> >> Fixes: ebd4a8ec7799 ("drm/i915/ttm: move shrinker management into >> adjust_lru") >> Signed-off-by: Thomas Hellström <thomas.hellstrom@linux.intel.com> > Reviewed-by: Matthew Auld <matthew.auld@intel.com> Thanks for reviewing, Matt. /Thomas
On 11/10/21 13:16, Patchwork wrote: > Project List - Patchwork *Patch Details* > *Series:* drm/i915/ttm: Fix illegal addition to shrinker list > *URL:* https://patchwork.freedesktop.org/series/96748/ > *State:* failure > *Details:* > https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_21552/index.html > > > CI Bug Log - changes from CI_DRM_10861_full -> Patchwork_21552_full > > > Summary > > *FAILURE* > > Serious unknown changes coming with Patchwork_21552_full absolutely > need to be > verified manually. > > If you think the reported changes have nothing to do with the changes > introduced in Patchwork_21552_full, please notify your bug team to > allow them > to document this new failure mode, which will reduce false positives > in CI. > > > Participating hosts (11 -> 11) > > No changes in participating hosts > > > Possible new issues > > Here are the unknown changes that may have been introduced in > Patchwork_21552_full: > > > IGT changes > > > Possible regressions > > * igt@i915_suspend@debugfs-reader: > o shard-snb: PASS > <https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_10861/shard-snb4/igt@i915_suspend@debugfs-reader.html> > -> DMESG-WARN > <https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_21552/shard-snb5/igt@i915_suspend@debugfs-reader.html> > Lakshmi, This failure is unrelated. Thanks, Thomas
diff --git a/drivers/gpu/drm/i915/gem/i915_gem_ttm.c b/drivers/gpu/drm/i915/gem/i915_gem_ttm.c index e98503c0830b..68cfe6e9ceab 100644 --- a/drivers/gpu/drm/i915/gem/i915_gem_ttm.c +++ b/drivers/gpu/drm/i915/gem/i915_gem_ttm.c @@ -771,18 +771,29 @@ void i915_ttm_adjust_lru(struct drm_i915_gem_object *obj) * * TODO: consider maybe also bumping the shrinker list here when we have * already unpinned it, which should give us something more like an LRU. + * + * TODO: There is a small window of opportunity for this function to + * get called from eviction after we've dropped the last GEM refcount, + * but before the TTM deleted flag is set on the object. Avoid + * adjusting the shrinker list in such cases, since the object is + * not available to the shrinker anyway due to its zero refcount. + * To fix this properly we should move to a TTM shrinker LRU list for + * these objects. */ - if (shrinkable != obj->mm.ttm_shrinkable) { - if (shrinkable) { - if (obj->mm.madv == I915_MADV_WILLNEED) - __i915_gem_object_make_shrinkable(obj); - else - __i915_gem_object_make_purgeable(obj); - } else { - i915_gem_object_make_unshrinkable(obj); + if (kref_get_unless_zero(&obj->base.refcount)) { + if (shrinkable != obj->mm.ttm_shrinkable) { + if (shrinkable) { + if (obj->mm.madv == I915_MADV_WILLNEED) + __i915_gem_object_make_shrinkable(obj); + else + __i915_gem_object_make_purgeable(obj); + } else { + i915_gem_object_make_unshrinkable(obj); + } + + obj->mm.ttm_shrinkable = shrinkable; } - - obj->mm.ttm_shrinkable = shrinkable; + i915_gem_object_put(obj); } /*
There's a small window of opportunity during which the adjust_lru() function can be called with a GEM refcount of zero from the TTM eviction code. This results in a kernel BUG(). Ensure that we don't attempt to modify the GEM shrinker lists unless we have a GEM refcount. Fixes: ebd4a8ec7799 ("drm/i915/ttm: move shrinker management into adjust_lru") Signed-off-by: Thomas Hellström <thomas.hellstrom@linux.intel.com> --- drivers/gpu/drm/i915/gem/i915_gem_ttm.c | 31 +++++++++++++++++-------- 1 file changed, 21 insertions(+), 10 deletions(-)