diff mbox series

drm/i915/ttm: Fix illegal addition to shrinker list

Message ID 20211110085527.1033475-1-thomas.hellstrom@linux.intel.com (mailing list archive)
State New, archived
Headers show
Series drm/i915/ttm: Fix illegal addition to shrinker list | expand

Commit Message

Thomas Hellstrom Nov. 10, 2021, 8:55 a.m. UTC
There's a small window of opportunity during which the adjust_lru()
function can be called with a GEM refcount of zero from the TTM
eviction code. This results in a kernel BUG().

Ensure that we don't attempt to modify the GEM shrinker lists unless
we have a GEM refcount.

Fixes: ebd4a8ec7799 ("drm/i915/ttm: move shrinker management into adjust_lru")
Signed-off-by: Thomas Hellström <thomas.hellstrom@linux.intel.com>
---
 drivers/gpu/drm/i915/gem/i915_gem_ttm.c | 31 +++++++++++++++++--------
 1 file changed, 21 insertions(+), 10 deletions(-)

Comments

Matthew Auld Nov. 10, 2021, 10:29 a.m. UTC | #1
On 10/11/2021 08:55, Thomas Hellström wrote:
> There's a small window of opportunity during which the adjust_lru()
> function can be called with a GEM refcount of zero from the TTM
> eviction code. This results in a kernel BUG().
> 
> Ensure that we don't attempt to modify the GEM shrinker lists unless
> we have a GEM refcount.
> 
> Fixes: ebd4a8ec7799 ("drm/i915/ttm: move shrinker management into adjust_lru")
> Signed-off-by: Thomas Hellström <thomas.hellstrom@linux.intel.com>
Reviewed-by: Matthew Auld <matthew.auld@intel.com>
Thomas Hellstrom Nov. 10, 2021, 10:31 a.m. UTC | #2
On 11/10/21 11:29, Matthew Auld wrote:
> On 10/11/2021 08:55, Thomas Hellström wrote:
>> There's a small window of opportunity during which the adjust_lru()
>> function can be called with a GEM refcount of zero from the TTM
>> eviction code. This results in a kernel BUG().
>>
>> Ensure that we don't attempt to modify the GEM shrinker lists unless
>> we have a GEM refcount.
>>
>> Fixes: ebd4a8ec7799 ("drm/i915/ttm: move shrinker management into 
>> adjust_lru")
>> Signed-off-by: Thomas Hellström <thomas.hellstrom@linux.intel.com>
> Reviewed-by: Matthew Auld <matthew.auld@intel.com>


Thanks for reviewing, Matt.

/Thomas
Thomas Hellstrom Nov. 10, 2021, 1:05 p.m. UTC | #3
On 11/10/21 13:16, Patchwork wrote:
> Project List - Patchwork *Patch Details*
> *Series:* 	drm/i915/ttm: Fix illegal addition to shrinker list
> *URL:* 	https://patchwork.freedesktop.org/series/96748/
> *State:* 	failure
> *Details:* 
> https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_21552/index.html
>
>
>   CI Bug Log - changes from CI_DRM_10861_full -> Patchwork_21552_full
>
>
>     Summary
>
> *FAILURE*
>
> Serious unknown changes coming with Patchwork_21552_full absolutely 
> need to be
> verified manually.
>
> If you think the reported changes have nothing to do with the changes
> introduced in Patchwork_21552_full, please notify your bug team to 
> allow them
> to document this new failure mode, which will reduce false positives 
> in CI.
>
>
>     Participating hosts (11 -> 11)
>
> No changes in participating hosts
>
>
>     Possible new issues
>
> Here are the unknown changes that may have been introduced in 
> Patchwork_21552_full:
>
>
>       IGT changes
>
>
>         Possible regressions
>
>   * igt@i915_suspend@debugfs-reader:
>       o shard-snb: PASS
>         <https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_10861/shard-snb4/igt@i915_suspend@debugfs-reader.html>
>         -> DMESG-WARN
>         <https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_21552/shard-snb5/igt@i915_suspend@debugfs-reader.html>
>
Lakshmi,

This failure is unrelated.

Thanks,

Thomas
diff mbox series

Patch

diff --git a/drivers/gpu/drm/i915/gem/i915_gem_ttm.c b/drivers/gpu/drm/i915/gem/i915_gem_ttm.c
index e98503c0830b..68cfe6e9ceab 100644
--- a/drivers/gpu/drm/i915/gem/i915_gem_ttm.c
+++ b/drivers/gpu/drm/i915/gem/i915_gem_ttm.c
@@ -771,18 +771,29 @@  void i915_ttm_adjust_lru(struct drm_i915_gem_object *obj)
 	 *
 	 * TODO: consider maybe also bumping the shrinker list here when we have
 	 * already unpinned it, which should give us something more like an LRU.
+	 *
+	 * TODO: There is a small window of opportunity for this function to
+	 * get called from eviction after we've dropped the last GEM refcount,
+	 * but before the TTM deleted flag is set on the object. Avoid
+	 * adjusting the shrinker list in such cases, since the object is
+	 * not available to the shrinker anyway due to its zero refcount.
+	 * To fix this properly we should move to a TTM shrinker LRU list for
+	 * these objects.
 	 */
-	if (shrinkable != obj->mm.ttm_shrinkable) {
-		if (shrinkable) {
-			if (obj->mm.madv == I915_MADV_WILLNEED)
-				__i915_gem_object_make_shrinkable(obj);
-			else
-				__i915_gem_object_make_purgeable(obj);
-		} else {
-			i915_gem_object_make_unshrinkable(obj);
+	if (kref_get_unless_zero(&obj->base.refcount)) {
+		if (shrinkable != obj->mm.ttm_shrinkable) {
+			if (shrinkable) {
+				if (obj->mm.madv == I915_MADV_WILLNEED)
+					__i915_gem_object_make_shrinkable(obj);
+				else
+					__i915_gem_object_make_purgeable(obj);
+			} else {
+				i915_gem_object_make_unshrinkable(obj);
+			}
+
+			obj->mm.ttm_shrinkable = shrinkable;
 		}
-
-		obj->mm.ttm_shrinkable = shrinkable;
+		i915_gem_object_put(obj);
 	}
 
 	/*