diff mbox

drm: Handle unexpected holes in color-eviction

Message ID 20180219113543.8010-1-chris@chris-wilson.co.uk (mailing list archive)
State New, archived
Headers show

Commit Message

Chris Wilson Feb. 19, 2018, 11:35 a.m. UTC
During eviction, the driver may free more than one hole in the drm_mm
due to the side-effects in evicting the scanned nodes. However,
drm_mm_scan_color_evict() expects that the scan result is the first
available hole (in the mru freed hole_stack list):

  kernel BUG at drivers/gpu/drm/drm_mm.c:844!
  invalid opcode: 0000 [#1] PREEMPT SMP KASAN PTI
  Dumping ftrace buffer:
     (ftrace buffer empty)
  Modules linked in: i915 snd_hda_codec_analog snd_hda_codec_generic coretemp snd_hda_intel snd_hda_codec snd_hwdep snd_hda_core lpc_ich snd_pcm e1000e mei_me prime_numbers mei
  CPU: 1 PID: 1490 Comm: gem_userptr_bli Tainted: G     U           4.16.0-rc1-g740f57c54ecf-kasan_6+ #1
  Hardware name: Dell Inc. OptiPlex 755                 /0PU052, BIOS A08 02/19/2008
  RIP: 0010:drm_mm_scan_color_evict+0x2b8/0x3d0
  RSP: 0018:ffff880057a573f8 EFLAGS: 00010287
  RAX: ffff8800611f5980 RBX: ffff880057a575d0 RCX: dffffc0000000000
  RDX: 00000000029d5000 RSI: 1ffff1000af4aec1 RDI: ffff8800611f5a10
  RBP: ffff88005ab884d0 R08: ffff880057a57600 R09: 000000000afff000
  R10: 1ffff1000b5710b5 R11: 0000000000001000 R12: 1ffff1000af4ae82
  R13: ffff8800611f59b0 R14: ffff8800611f5980 R15: ffff880057a57608
  FS:  00007f2de0c2e8c0(0000) GS:ffff88006ac40000(0000) knlGS:0000000000000000
  CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
  CR2: 00007f2ddde1e000 CR3: 00000000609b2000 CR4: 00000000000006e0
  Call Trace:
   ? drm_mm_scan_remove_block+0x330/0x330
   ? drm_mm_scan_remove_block+0x151/0x330
   i915_gem_evict_something+0x711/0xbd0 [i915]
   ? igt_evict_contexts+0x50/0x50 [i915]
   ? nop_clear_range+0x10/0x10 [i915]
   ? igt_evict_something+0x90/0x90 [i915]
   ? i915_gem_gtt_reserve+0x1a1/0x320 [i915]
   i915_gem_gtt_insert+0x237/0x400 [i915]
   __i915_vma_do_pin+0xc25/0x1a20 [i915]
   eb_lookup_vmas+0x1c63/0x3790 [i915]
   ? i915_gem_check_execbuffer+0x250/0x250 [i915]
   ? trace_hardirqs_on_caller+0x33f/0x590
   ? _raw_spin_unlock_irqrestore+0x39/0x60
   ? __pm_runtime_resume+0x7d/0xf0
   i915_gem_do_execbuffer+0x86a/0x2ff0 [i915]
   ? __kmalloc+0x132/0x340
   ? i915_gem_execbuffer2_ioctl+0x10f/0x760 [i915]
   ? drm_ioctl_kernel+0x12e/0x1c0
   ? drm_ioctl+0x662/0x980
   ? eb_relocate_slow+0xa90/0xa90 [i915]
   ? i915_gem_execbuffer2_ioctl+0x10f/0x760 [i915]
   ? __might_fault+0xea/0x1a0
   i915_gem_execbuffer2_ioctl+0x3cc/0x760 [i915]
   ? i915_gem_execbuffer_ioctl+0xba0/0xba0 [i915]
   ? lock_acquire+0x3c0/0x3c0
   ? i915_gem_execbuffer_ioctl+0xba0/0xba0 [i915]
   drm_ioctl_kernel+0x12e/0x1c0
   drm_ioctl+0x662/0x980
   ? i915_gem_execbuffer_ioctl+0xba0/0xba0 [i915]
   ? drm_getstats+0x20/0x20
   ? debug_check_no_obj_freed+0x2a6/0x8c0
   do_vfs_ioctl+0x170/0xe70
   ? ioctl_preallocate+0x170/0x170
   ? task_work_run+0xbe/0x160
   ? lock_acquire+0x3c0/0x3c0
   ? trace_hardirqs_on_caller+0x33f/0x590
   ? _raw_spin_unlock_irq+0x2f/0x50
   SyS_ioctl+0x36/0x70
   ? do_vfs_ioctl+0xe70/0xe70
   do_syscall_64+0x18c/0x5d0
   entry_SYSCALL_64_after_hwframe+0x26/0x9b
  RIP: 0033:0x7f2ddf13b587
  RSP: 002b:00007fff15c4f9d8 EFLAGS: 00000246 ORIG_RAX: 0000000000000010
  RAX: ffffffffffffffda RBX: 0000000000000003 RCX: 00007f2ddf13b587
  RDX: 00007fff15c4fa20 RSI: 0000000040406469 RDI: 0000000000000003
  RBP: 00007fff15c4fa20 R08: 0000000000000000 R09: 00007f2ddf3fe120
  R10: 0000000000000073 R11: 0000000000000246 R12: 0000000040406469
  R13: 0000000000000003 R14: 00007fff15c4fa20 R15: 00000000000000c7
  Code: 00 00 00 4a c7 44 22 08 00 00 00 00 42 c7 44 22 10 00 00 00 00 48 81 c4 b8 00 00 00 5b 5d 41 5c 41 5d 41 5e 41 5f c3 0f 0b 0f 0b <0f> 0b 31 c0 eb c0 4c 89 ef e8 9a 09 41 ff e9 1e fe ff ff 4c 89
  RIP: drm_mm_scan_color_evict+0x2b8/0x3d0 RSP: ffff880057a573f8

We can trivially relax this assumption, by searching the hole_stack for
the scan result and warn instead if the driver called us without any
result.

Fixes: 3fa489dabea9 ("drm: Apply tight eviction scanning to color_adjust")
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Cc: <stable@vger.kernel.org> # v4.11+
---
 drivers/gpu/drm/drm_mm.c | 21 ++++++++++++++++++---
 1 file changed, 18 insertions(+), 3 deletions(-)

Comments

Joonas Lahtinen Feb. 19, 2018, 2:40 p.m. UTC | #1
Quoting Chris Wilson (2018-02-19 13:35:43)
> +++ b/drivers/gpu/drm/drm_mm.c
> @@ -836,9 +836,24 @@ struct drm_mm_node *drm_mm_scan_color_evict(struct drm_mm_scan *scan)
>         if (!mm->color_adjust)
>                 return NULL;
>  
> -       hole = list_first_entry(&mm->hole_stack, typeof(*hole), hole_stack);
> -       hole_start = __drm_mm_hole_node_start(hole);
> -       hole_end = hole_start + hole->hole_size;
> +       /*
> +        * The hole found during scanning should ideally be the first element
> +        * in the hole_stack list, but due to side-effects in the driver it
> +        * may not be.
> +        */
> +       list_for_each_entry(hole, &mm->hole_stack, hole_stack) {
> +               hole_start = __drm_mm_hole_node_start(hole);
> +               hole_end = hole_start + hole->hole_size;
> +
> +               if (hole_start <= scan->hit_start &&
> +                   hole_end >= scan->hit_end)

How about some likely() here?

> +                       break;
> +       }
> +
> +       /* We should only be called after we found the hole previously */
> +       DRM_MM_BUG_ON(&hole->hole_stack == &mm->hole_stack);
> +       if (unlikely(&hole->hole_stack == &mm->hole_stack))

Would be more readable as:

if (...) {
	DRM_MM_BUG()
}

Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>

Regards, Joonas
Chris Wilson Feb. 19, 2018, 2:43 p.m. UTC | #2
Quoting Joonas Lahtinen (2018-02-19 14:40:32)
> Quoting Chris Wilson (2018-02-19 13:35:43)
> > +++ b/drivers/gpu/drm/drm_mm.c
> > @@ -836,9 +836,24 @@ struct drm_mm_node *drm_mm_scan_color_evict(struct drm_mm_scan *scan)
> >         if (!mm->color_adjust)
> >                 return NULL;
> >  
> > -       hole = list_first_entry(&mm->hole_stack, typeof(*hole), hole_stack);
> > -       hole_start = __drm_mm_hole_node_start(hole);
> > -       hole_end = hole_start + hole->hole_size;
> > +       /*
> > +        * The hole found during scanning should ideally be the first element
> > +        * in the hole_stack list, but due to side-effects in the driver it
> > +        * may not be.
> > +        */
> > +       list_for_each_entry(hole, &mm->hole_stack, hole_stack) {
> > +               hole_start = __drm_mm_hole_node_start(hole);
> > +               hole_end = hole_start + hole->hole_size;
> > +
> > +               if (hole_start <= scan->hit_start &&
> > +                   hole_end >= scan->hit_end)
> 
> How about some likely() here?

I felt that at the point where we admit we need a search, likely() went
out of the window. Given that I think we'll need 2 or 3 iterations at
most, that probably means in practice this is around the 50% mark.
Claiming that it's likely() may be misleading to the reader.

> > +                       break;
> > +       }
> > +
> > +       /* We should only be called after we found the hole previously */
> > +       DRM_MM_BUG_ON(&hole->hole_stack == &mm->hole_stack);
> > +       if (unlikely(&hole->hole_stack == &mm->hole_stack))
> 
> Would be more readable as:
> 
> if (...) {
>         DRM_MM_BUG()
> }

Maybe DRM_MM_WARN_ON(), both hypothetical functions :)
-Chris
Daniel Vetter Feb. 19, 2018, 3:52 p.m. UTC | #3
On Mon, Feb 19, 2018 at 02:43:59PM +0000, Chris Wilson wrote:
> Quoting Joonas Lahtinen (2018-02-19 14:40:32)
> > Quoting Chris Wilson (2018-02-19 13:35:43)
> > > +++ b/drivers/gpu/drm/drm_mm.c
> > > @@ -836,9 +836,24 @@ struct drm_mm_node *drm_mm_scan_color_evict(struct drm_mm_scan *scan)
> > >         if (!mm->color_adjust)
> > >                 return NULL;
> > >  
> > > -       hole = list_first_entry(&mm->hole_stack, typeof(*hole), hole_stack);
> > > -       hole_start = __drm_mm_hole_node_start(hole);
> > > -       hole_end = hole_start + hole->hole_size;
> > > +       /*
> > > +        * The hole found during scanning should ideally be the first element
> > > +        * in the hole_stack list, but due to side-effects in the driver it
> > > +        * may not be.
> > > +        */
> > > +       list_for_each_entry(hole, &mm->hole_stack, hole_stack) {
> > > +               hole_start = __drm_mm_hole_node_start(hole);
> > > +               hole_end = hole_start + hole->hole_size;
> > > +
> > > +               if (hole_start <= scan->hit_start &&
> > > +                   hole_end >= scan->hit_end)
> > 
> > How about some likely() here?
> 
> I felt that at the point where we admit we need a search, likely() went
> out of the window. Given that I think we'll need 2 or 3 iterations at
> most, that probably means in practice this is around the 50% mark.
> Claiming that it's likely() may be misleading to the reader.

Concurred.

Reviewed-by: Daniel Vetter <daniel.vetter@ffwll.ch>

I spent some brain cycles trying to come up with clever tricks to avoid
the need to this function outright, but failed.
-Daniel

> > > +                       break;
> > > +       }
> > > +
> > > +       /* We should only be called after we found the hole previously */
> > > +       DRM_MM_BUG_ON(&hole->hole_stack == &mm->hole_stack);
> > > +       if (unlikely(&hole->hole_stack == &mm->hole_stack))
> > 
> > Would be more readable as:
> > 
> > if (...) {
> >         DRM_MM_BUG()
> > }
> 
> Maybe DRM_MM_WARN_ON(), both hypothetical functions :)
> -Chris
> _______________________________________________
> dri-devel mailing list
> dri-devel@lists.freedesktop.org
> https://lists.freedesktop.org/mailman/listinfo/dri-devel
Chris Wilson Feb. 20, 2018, 9:08 a.m. UTC | #4
Quoting Daniel Vetter (2018-02-19 15:52:34)
> On Mon, Feb 19, 2018 at 02:43:59PM +0000, Chris Wilson wrote:
> > Quoting Joonas Lahtinen (2018-02-19 14:40:32)
> > > Quoting Chris Wilson (2018-02-19 13:35:43)
> > > > +++ b/drivers/gpu/drm/drm_mm.c
> > > > @@ -836,9 +836,24 @@ struct drm_mm_node *drm_mm_scan_color_evict(struct drm_mm_scan *scan)
> > > >         if (!mm->color_adjust)
> > > >                 return NULL;
> > > >  
> > > > -       hole = list_first_entry(&mm->hole_stack, typeof(*hole), hole_stack);
> > > > -       hole_start = __drm_mm_hole_node_start(hole);
> > > > -       hole_end = hole_start + hole->hole_size;
> > > > +       /*
> > > > +        * The hole found during scanning should ideally be the first element
> > > > +        * in the hole_stack list, but due to side-effects in the driver it
> > > > +        * may not be.
> > > > +        */
> > > > +       list_for_each_entry(hole, &mm->hole_stack, hole_stack) {
> > > > +               hole_start = __drm_mm_hole_node_start(hole);
> > > > +               hole_end = hole_start + hole->hole_size;
> > > > +
> > > > +               if (hole_start <= scan->hit_start &&
> > > > +                   hole_end >= scan->hit_end)
> > > 
> > > How about some likely() here?
> > 
> > I felt that at the point where we admit we need a search, likely() went
> > out of the window. Given that I think we'll need 2 or 3 iterations at
> > most, that probably means in practice this is around the 50% mark.
> > Claiming that it's likely() may be misleading to the reader.
> 
> Concurred.
> 
> Reviewed-by: Daniel Vetter <daniel.vetter@ffwll.ch>

Not too surprising that no one else had an opinion since only i915 uses
the node coloring at present. Pushed to drm-misc-fixes.
 
> I spent some brain cycles trying to come up with clever tricks to avoid
> the need to this function outright, but failed.

I think it's not too bad overall. When scanning we include the necessary
bookends in the hole calculation, so we know that if we do need to evict
a bookend then it is available and was included in our LRU ordering. It
is just that during the scan, the intermediate state is inconsistent
with the final arrangement and so instead of always evicting those
bookends we make a final call on whether the new node can fit into the
smaller hole or needs the enlargement. It should only be called a maximum
of two times to check on the two bookends, and have no impact for when
coloring is not in effect.
-Chris
diff mbox

Patch

diff --git a/drivers/gpu/drm/drm_mm.c b/drivers/gpu/drm/drm_mm.c
index 186c4e90cc1c..89eef1bb4ddc 100644
--- a/drivers/gpu/drm/drm_mm.c
+++ b/drivers/gpu/drm/drm_mm.c
@@ -836,9 +836,24 @@  struct drm_mm_node *drm_mm_scan_color_evict(struct drm_mm_scan *scan)
 	if (!mm->color_adjust)
 		return NULL;
 
-	hole = list_first_entry(&mm->hole_stack, typeof(*hole), hole_stack);
-	hole_start = __drm_mm_hole_node_start(hole);
-	hole_end = hole_start + hole->hole_size;
+	/*
+	 * The hole found during scanning should ideally be the first element
+	 * in the hole_stack list, but due to side-effects in the driver it
+	 * may not be.
+	 */
+	list_for_each_entry(hole, &mm->hole_stack, hole_stack) {
+		hole_start = __drm_mm_hole_node_start(hole);
+		hole_end = hole_start + hole->hole_size;
+
+		if (hole_start <= scan->hit_start &&
+		    hole_end >= scan->hit_end)
+			break;
+	}
+
+	/* We should only be called after we found the hole previously */
+	DRM_MM_BUG_ON(&hole->hole_stack == &mm->hole_stack);
+	if (unlikely(&hole->hole_stack == &mm->hole_stack))
+		return NULL;
 
 	DRM_MM_BUG_ON(hole_start > scan->hit_start);
 	DRM_MM_BUG_ON(hole_end < scan->hit_end);