Message ID | 20220905133738.466490-2-hdegoede@redhat.com (mailing list archive) |
---|---|
State | New, archived |
Headers | show |
Series | drm/gma500: Fix 2 locking related WARN_ON oopses | expand |
On Mon, Sep 5, 2022 at 3:37 PM Hans de Goede <hdegoede@redhat.com> wrote: > > gma_crtc_page_flip() was holding the event_lock spinlock while calling > crtc_funcs->mode_set_base() which takes ww_mutex. > > The only reason to hold event_lock is to clear gma_crtc->page_flip_event > on mode_set_base() errors. > > Instead unlock it after setting gma_crtc->page_flip_event and on > errors re-take the lock and clear gma_crtc->page_flip_event it > it is still set. Hi Hans, thanks for having a look at gma500. See comments below. > > This fixes the following WARN/stacktrace: > > [ 512.122953] BUG: sleeping function called from invalid context at kernel/locking/mutex.c:870 > [ 512.123004] in_atomic(): 1, irqs_disabled(): 1, non_block: 0, pid: 1253, name: gnome-shell > [ 512.123031] preempt_count: 1, expected: 0 > [ 512.123048] RCU nest depth: 0, expected: 0 > [ 512.123066] INFO: lockdep is turned off. > [ 512.123080] irq event stamp: 0 > [ 512.123094] hardirqs last enabled at (0): [<0000000000000000>] 0x0 > [ 512.123134] hardirqs last disabled at (0): [<ffffffff8d0ec28c>] copy_process+0x9fc/0x1de0 > [ 512.123176] softirqs last enabled at (0): [<ffffffff8d0ec28c>] copy_process+0x9fc/0x1de0 > [ 512.123207] softirqs last disabled at (0): [<0000000000000000>] 0x0 > [ 512.123233] Preemption disabled at: > [ 512.123241] [<0000000000000000>] 0x0 > [ 512.123275] CPU: 3 PID: 1253 Comm: gnome-shell Tainted: G W 5.19.0+ #1 > [ 512.123304] Hardware name: Packard Bell dot s/SJE01_CT, BIOS V1.10 07/23/2013 > [ 512.123323] Call Trace: > [ 512.123346] <TASK> > [ 512.123370] dump_stack_lvl+0x5b/0x77 > [ 512.123412] __might_resched.cold+0xff/0x13a > [ 512.123458] ww_mutex_lock+0x1e/0xa0 > [ 512.123495] psb_gem_pin+0x2c/0x150 [gma500_gfx] > [ 512.123601] gma_pipe_set_base+0x76/0x240 [gma500_gfx] > [ 512.123708] gma_crtc_page_flip+0x95/0x130 [gma500_gfx] > [ 512.123808] drm_mode_page_flip_ioctl+0x57d/0x5d0 > [ 512.123897] ? drm_mode_cursor2_ioctl+0x10/0x10 > [ 512.123936] drm_ioctl_kernel+0xa1/0x150 > [ 512.123984] drm_ioctl+0x21f/0x420 > [ 512.124025] ? drm_mode_cursor2_ioctl+0x10/0x10 > [ 512.124070] ? rcu_read_lock_bh_held+0xb/0x60 > [ 512.124104] ? lock_release+0x1ef/0x2d0 > [ 512.124161] __x64_sys_ioctl+0x8d/0xd0 > [ 512.124203] do_syscall_64+0x58/0x80 > [ 512.124239] ? do_syscall_64+0x67/0x80 > [ 512.124267] ? trace_hardirqs_on_prepare+0x55/0xe0 > [ 512.124300] ? do_syscall_64+0x67/0x80 > [ 512.124340] ? rcu_read_lock_sched_held+0x10/0x80 > [ 512.124377] entry_SYSCALL_64_after_hwframe+0x63/0xcd > [ 512.124411] RIP: 0033:0x7fcc4a70740f > [ 512.124442] Code: 00 48 89 44 24 18 31 c0 48 8d 44 24 60 c7 04 24 10 00 00 00 48 89 44 24 08 48 8d 44 24 20 48 89 44 24 10 b8 10 00 00 00 0f 05 <89> c2 3d 00 f0 ff ff 77 18 48 8b 44 24 18 64 48 2b 04 25 28 00 00 > [ 512.124470] RSP: 002b:00007ffda73f5390 EFLAGS: 00000246 ORIG_RAX: 0000000000000010 > [ 512.124503] RAX: ffffffffffffffda RBX: 000055cc9e474500 RCX: 00007fcc4a70740f > [ 512.124524] RDX: 00007ffda73f5420 RSI: 00000000c01864b0 RDI: 0000000000000009 > [ 512.124544] RBP: 00007ffda73f5420 R08: 000055cc9c0b0cb0 R09: 0000000000000034 > [ 512.124564] R10: 0000000000000000 R11: 0000000000000246 R12: 00000000c01864b0 > [ 512.124584] R13: 0000000000000009 R14: 000055cc9df484d0 R15: 000055cc9af5d0c0 > [ 512.124647] </TASK> > > Signed-off-by: Hans de Goede <hdegoede@redhat.com> > --- > drivers/gpu/drm/gma500/gma_display.c | 10 +++++++--- > 1 file changed, 7 insertions(+), 3 deletions(-) > > diff --git a/drivers/gpu/drm/gma500/gma_display.c b/drivers/gpu/drm/gma500/gma_display.c > index bd40c040a2c9..cf038e322164 100644 > --- a/drivers/gpu/drm/gma500/gma_display.c > +++ b/drivers/gpu/drm/gma500/gma_display.c > @@ -532,15 +532,19 @@ int gma_crtc_page_flip(struct drm_crtc *crtc, > WARN_ON(drm_crtc_vblank_get(crtc) != 0); > > gma_crtc->page_flip_event = event; > + spin_unlock_irqrestore(&dev->event_lock, flags); > > /* Call this locked if we want an event at vblank interrupt. */ If we don't hold the event_lock around mode_set_base() we could potentially get a vblank before we do the modeset. That would send the event prematurely. I think this is what the comment tries to tell us. -Patrik > ret = crtc_funcs->mode_set_base(crtc, crtc->x, crtc->y, old_fb); > if (ret) { > - gma_crtc->page_flip_event = NULL; > - drm_crtc_vblank_put(crtc); > + spin_lock_irqsave(&dev->event_lock, flags); > + if (gma_crtc->page_flip_event) { > + gma_crtc->page_flip_event = NULL; > + drm_crtc_vblank_put(crtc); > + } > + spin_unlock_irqrestore(&dev->event_lock, flags); > } > > - spin_unlock_irqrestore(&dev->event_lock, flags); > } else { > ret = crtc_funcs->mode_set_base(crtc, crtc->x, crtc->y, old_fb); > } > -- > 2.36.1 >
Hi, On 9/6/22 14:50, Patrik Jakobsson wrote: > On Mon, Sep 5, 2022 at 3:37 PM Hans de Goede <hdegoede@redhat.com> wrote: >> >> gma_crtc_page_flip() was holding the event_lock spinlock while calling >> crtc_funcs->mode_set_base() which takes ww_mutex. >> >> The only reason to hold event_lock is to clear gma_crtc->page_flip_event >> on mode_set_base() errors. >> >> Instead unlock it after setting gma_crtc->page_flip_event and on >> errors re-take the lock and clear gma_crtc->page_flip_event it >> it is still set. > > Hi Hans, thanks for having a look at gma500. > > See comments below. > >> >> This fixes the following WARN/stacktrace: >> >> [ 512.122953] BUG: sleeping function called from invalid context at kernel/locking/mutex.c:870 >> [ 512.123004] in_atomic(): 1, irqs_disabled(): 1, non_block: 0, pid: 1253, name: gnome-shell >> [ 512.123031] preempt_count: 1, expected: 0 >> [ 512.123048] RCU nest depth: 0, expected: 0 >> [ 512.123066] INFO: lockdep is turned off. >> [ 512.123080] irq event stamp: 0 >> [ 512.123094] hardirqs last enabled at (0): [<0000000000000000>] 0x0 >> [ 512.123134] hardirqs last disabled at (0): [<ffffffff8d0ec28c>] copy_process+0x9fc/0x1de0 >> [ 512.123176] softirqs last enabled at (0): [<ffffffff8d0ec28c>] copy_process+0x9fc/0x1de0 >> [ 512.123207] softirqs last disabled at (0): [<0000000000000000>] 0x0 >> [ 512.123233] Preemption disabled at: >> [ 512.123241] [<0000000000000000>] 0x0 >> [ 512.123275] CPU: 3 PID: 1253 Comm: gnome-shell Tainted: G W 5.19.0+ #1 >> [ 512.123304] Hardware name: Packard Bell dot s/SJE01_CT, BIOS V1.10 07/23/2013 >> [ 512.123323] Call Trace: >> [ 512.123346] <TASK> >> [ 512.123370] dump_stack_lvl+0x5b/0x77 >> [ 512.123412] __might_resched.cold+0xff/0x13a >> [ 512.123458] ww_mutex_lock+0x1e/0xa0 >> [ 512.123495] psb_gem_pin+0x2c/0x150 [gma500_gfx] >> [ 512.123601] gma_pipe_set_base+0x76/0x240 [gma500_gfx] >> [ 512.123708] gma_crtc_page_flip+0x95/0x130 [gma500_gfx] >> [ 512.123808] drm_mode_page_flip_ioctl+0x57d/0x5d0 >> [ 512.123897] ? drm_mode_cursor2_ioctl+0x10/0x10 >> [ 512.123936] drm_ioctl_kernel+0xa1/0x150 >> [ 512.123984] drm_ioctl+0x21f/0x420 >> [ 512.124025] ? drm_mode_cursor2_ioctl+0x10/0x10 >> [ 512.124070] ? rcu_read_lock_bh_held+0xb/0x60 >> [ 512.124104] ? lock_release+0x1ef/0x2d0 >> [ 512.124161] __x64_sys_ioctl+0x8d/0xd0 >> [ 512.124203] do_syscall_64+0x58/0x80 >> [ 512.124239] ? do_syscall_64+0x67/0x80 >> [ 512.124267] ? trace_hardirqs_on_prepare+0x55/0xe0 >> [ 512.124300] ? do_syscall_64+0x67/0x80 >> [ 512.124340] ? rcu_read_lock_sched_held+0x10/0x80 >> [ 512.124377] entry_SYSCALL_64_after_hwframe+0x63/0xcd >> [ 512.124411] RIP: 0033:0x7fcc4a70740f >> [ 512.124442] Code: 00 48 89 44 24 18 31 c0 48 8d 44 24 60 c7 04 24 10 00 00 00 48 89 44 24 08 48 8d 44 24 20 48 89 44 24 10 b8 10 00 00 00 0f 05 <89> c2 3d 00 f0 ff ff 77 18 48 8b 44 24 18 64 48 2b 04 25 28 00 00 >> [ 512.124470] RSP: 002b:00007ffda73f5390 EFLAGS: 00000246 ORIG_RAX: 0000000000000010 >> [ 512.124503] RAX: ffffffffffffffda RBX: 000055cc9e474500 RCX: 00007fcc4a70740f >> [ 512.124524] RDX: 00007ffda73f5420 RSI: 00000000c01864b0 RDI: 0000000000000009 >> [ 512.124544] RBP: 00007ffda73f5420 R08: 000055cc9c0b0cb0 R09: 0000000000000034 >> [ 512.124564] R10: 0000000000000000 R11: 0000000000000246 R12: 00000000c01864b0 >> [ 512.124584] R13: 0000000000000009 R14: 000055cc9df484d0 R15: 000055cc9af5d0c0 >> [ 512.124647] </TASK> >> >> Signed-off-by: Hans de Goede <hdegoede@redhat.com> >> --- >> drivers/gpu/drm/gma500/gma_display.c | 10 +++++++--- >> 1 file changed, 7 insertions(+), 3 deletions(-) >> >> diff --git a/drivers/gpu/drm/gma500/gma_display.c b/drivers/gpu/drm/gma500/gma_display.c >> index bd40c040a2c9..cf038e322164 100644 >> --- a/drivers/gpu/drm/gma500/gma_display.c >> +++ b/drivers/gpu/drm/gma500/gma_display.c >> @@ -532,15 +532,19 @@ int gma_crtc_page_flip(struct drm_crtc *crtc, >> WARN_ON(drm_crtc_vblank_get(crtc) != 0); >> >> gma_crtc->page_flip_event = event; >> + spin_unlock_irqrestore(&dev->event_lock, flags); >> >> /* Call this locked if we want an event at vblank interrupt. */ > > If we don't hold the event_lock around mode_set_base() we could > potentially get a vblank before we do the modeset. That would send the > event prematurely. I think this is what the comment tries to tell us. There is no way to avoid the vblank-irq and the mode_set_base() call racing with each other and then delivering a blank event from what is actually the page-flip which just completed. Even with the old code, of we hold the lock over the mode_set_base() and then the vblank-irq triggers while we are holding the lock this will block the irq-handler until the mode_set_base() has completed (which is probably only a couple 100s of usecs / max 1 msec) and then as soon as gma_crtc_page_flip() releases the lock the irq-handler will continue running and still prematurely deliver the vblank event. In practice this is not a problem because userspace does: -submit frame -wait for vblank -render new frame -submit new frame -wait for vblank And the time it takes to render a new frame means that after the first wait for vblank userspace never hits the race, unless userspace cannot keep up (is rendering at say less then 60 fps) in that case it may hit the race if it is only barely keeping up and if it is only barely keeping up then userspace hitting / winning the race is actually a good thing, because then it should start rendering the next frame asap. So TL;DR: we cannot avoid sometimes racing but in practice this is not an issue. The kernel oopses this fixes OTOH are a real issue. I hope this helps explain why I still believe this is the right fix. Regards, Hans > > -Patrik > >> ret = crtc_funcs->mode_set_base(crtc, crtc->x, crtc->y, old_fb); >> if (ret) { >> - gma_crtc->page_flip_event = NULL; >> - drm_crtc_vblank_put(crtc); >> + spin_lock_irqsave(&dev->event_lock, flags); >> + if (gma_crtc->page_flip_event) { >> + gma_crtc->page_flip_event = NULL; >> + drm_crtc_vblank_put(crtc); >> + } >> + spin_unlock_irqrestore(&dev->event_lock, flags); >> } >> >> - spin_unlock_irqrestore(&dev->event_lock, flags); >> } else { >> ret = crtc_funcs->mode_set_base(crtc, crtc->x, crtc->y, old_fb); >> } >> -- >> 2.36.1 >> >
diff --git a/drivers/gpu/drm/gma500/gma_display.c b/drivers/gpu/drm/gma500/gma_display.c index bd40c040a2c9..cf038e322164 100644 --- a/drivers/gpu/drm/gma500/gma_display.c +++ b/drivers/gpu/drm/gma500/gma_display.c @@ -532,15 +532,19 @@ int gma_crtc_page_flip(struct drm_crtc *crtc, WARN_ON(drm_crtc_vblank_get(crtc) != 0); gma_crtc->page_flip_event = event; + spin_unlock_irqrestore(&dev->event_lock, flags); /* Call this locked if we want an event at vblank interrupt. */ ret = crtc_funcs->mode_set_base(crtc, crtc->x, crtc->y, old_fb); if (ret) { - gma_crtc->page_flip_event = NULL; - drm_crtc_vblank_put(crtc); + spin_lock_irqsave(&dev->event_lock, flags); + if (gma_crtc->page_flip_event) { + gma_crtc->page_flip_event = NULL; + drm_crtc_vblank_put(crtc); + } + spin_unlock_irqrestore(&dev->event_lock, flags); } - spin_unlock_irqrestore(&dev->event_lock, flags); } else { ret = crtc_funcs->mode_set_base(crtc, crtc->x, crtc->y, old_fb); }
gma_crtc_page_flip() was holding the event_lock spinlock while calling crtc_funcs->mode_set_base() which takes ww_mutex. The only reason to hold event_lock is to clear gma_crtc->page_flip_event on mode_set_base() errors. Instead unlock it after setting gma_crtc->page_flip_event and on errors re-take the lock and clear gma_crtc->page_flip_event it it is still set. This fixes the following WARN/stacktrace: [ 512.122953] BUG: sleeping function called from invalid context at kernel/locking/mutex.c:870 [ 512.123004] in_atomic(): 1, irqs_disabled(): 1, non_block: 0, pid: 1253, name: gnome-shell [ 512.123031] preempt_count: 1, expected: 0 [ 512.123048] RCU nest depth: 0, expected: 0 [ 512.123066] INFO: lockdep is turned off. [ 512.123080] irq event stamp: 0 [ 512.123094] hardirqs last enabled at (0): [<0000000000000000>] 0x0 [ 512.123134] hardirqs last disabled at (0): [<ffffffff8d0ec28c>] copy_process+0x9fc/0x1de0 [ 512.123176] softirqs last enabled at (0): [<ffffffff8d0ec28c>] copy_process+0x9fc/0x1de0 [ 512.123207] softirqs last disabled at (0): [<0000000000000000>] 0x0 [ 512.123233] Preemption disabled at: [ 512.123241] [<0000000000000000>] 0x0 [ 512.123275] CPU: 3 PID: 1253 Comm: gnome-shell Tainted: G W 5.19.0+ #1 [ 512.123304] Hardware name: Packard Bell dot s/SJE01_CT, BIOS V1.10 07/23/2013 [ 512.123323] Call Trace: [ 512.123346] <TASK> [ 512.123370] dump_stack_lvl+0x5b/0x77 [ 512.123412] __might_resched.cold+0xff/0x13a [ 512.123458] ww_mutex_lock+0x1e/0xa0 [ 512.123495] psb_gem_pin+0x2c/0x150 [gma500_gfx] [ 512.123601] gma_pipe_set_base+0x76/0x240 [gma500_gfx] [ 512.123708] gma_crtc_page_flip+0x95/0x130 [gma500_gfx] [ 512.123808] drm_mode_page_flip_ioctl+0x57d/0x5d0 [ 512.123897] ? drm_mode_cursor2_ioctl+0x10/0x10 [ 512.123936] drm_ioctl_kernel+0xa1/0x150 [ 512.123984] drm_ioctl+0x21f/0x420 [ 512.124025] ? drm_mode_cursor2_ioctl+0x10/0x10 [ 512.124070] ? rcu_read_lock_bh_held+0xb/0x60 [ 512.124104] ? lock_release+0x1ef/0x2d0 [ 512.124161] __x64_sys_ioctl+0x8d/0xd0 [ 512.124203] do_syscall_64+0x58/0x80 [ 512.124239] ? do_syscall_64+0x67/0x80 [ 512.124267] ? trace_hardirqs_on_prepare+0x55/0xe0 [ 512.124300] ? do_syscall_64+0x67/0x80 [ 512.124340] ? rcu_read_lock_sched_held+0x10/0x80 [ 512.124377] entry_SYSCALL_64_after_hwframe+0x63/0xcd [ 512.124411] RIP: 0033:0x7fcc4a70740f [ 512.124442] Code: 00 48 89 44 24 18 31 c0 48 8d 44 24 60 c7 04 24 10 00 00 00 48 89 44 24 08 48 8d 44 24 20 48 89 44 24 10 b8 10 00 00 00 0f 05 <89> c2 3d 00 f0 ff ff 77 18 48 8b 44 24 18 64 48 2b 04 25 28 00 00 [ 512.124470] RSP: 002b:00007ffda73f5390 EFLAGS: 00000246 ORIG_RAX: 0000000000000010 [ 512.124503] RAX: ffffffffffffffda RBX: 000055cc9e474500 RCX: 00007fcc4a70740f [ 512.124524] RDX: 00007ffda73f5420 RSI: 00000000c01864b0 RDI: 0000000000000009 [ 512.124544] RBP: 00007ffda73f5420 R08: 000055cc9c0b0cb0 R09: 0000000000000034 [ 512.124564] R10: 0000000000000000 R11: 0000000000000246 R12: 00000000c01864b0 [ 512.124584] R13: 0000000000000009 R14: 000055cc9df484d0 R15: 000055cc9af5d0c0 [ 512.124647] </TASK> Signed-off-by: Hans de Goede <hdegoede@redhat.com> --- drivers/gpu/drm/gma500/gma_display.c | 10 +++++++--- 1 file changed, 7 insertions(+), 3 deletions(-)