Commit ecff665f5e3f (drm/ttm: make ttm reservation calls...) causes system hang on Radeon RS780

Message ID 51DD2FCB.70809@canonical.com (mailing list archive)
State New, archived

Commit Message

Maarten Lankhorst July 10, 2013, 9:56 a.m. UTC
On 10-07-13 11:46, Markus Trippelsdorf wrote:
> On 2013.07.10 at 11:29 +0200, Maarten Lankhorst wrote:
>> On 10-07-13 11:22, Markus Trippelsdorf wrote:
>>> Simply copying and pasting a big document in LibreOffice hangs my
>>> system. Only a hard reset gets it working again.
>>> see also: https://bugs.freedesktop.org/show_bug.cgi?id=66551
>>>
>>> I've bisected the issue to:
>>>
>>> commit ecff665f5e3f1c6909353e00b9420e45ae23d995
>>> Author: Maarten Lankhorst <m.b.lankhorst@gmail.com>
>>> Date:   Thu Jun 27 13:48:17 2013 +0200
>>>
>>>     drm/ttm: make ttm reservation calls behave like reservation calls
>>>     
>>>     This commit converts the source of the val_seq counter to
>>>     the ww_mutex api. The reservation objects are converted later,
>>>     because there is still a lockdep splat in nouveau that has to
>>>     be resolved first.
>>>     
>>>     Signed-off-by: Maarten Lankhorst <maarten.lankhorst@canonical.com>
>>>     Reviewed-by: Jerome Glisse <jglisse@redhat.com>
>>>     Signed-off-by: Dave Airlie <airlied@redhat.com>
>> Hey,
>>
>> Can you try current head with CONFIG_PROVE_LOCKING set and post the
>> lockdep splat from dmesg, if any? If there is any locking issue
>> lockdep should warn about it.  Lockdep will turn itself off after the
>> first splat, so if the lockdep splat happens before running the
>> affected parts those will have to be fixed first.
> There was an unrelated EDAC lockdep splat, so I simply disabled it.
>
> This is what I get:
>
> Jul 10 11:40:44 x4 kernel: ================================================
> Jul 10 11:40:44 x4 kernel: [ BUG: lock held when returning to user space! ]
> Jul 10 11:40:44 x4 kernel: 3.10.0-08587-g496322b #35 Not tainted
> Jul 10 11:40:44 x4 kernel: ------------------------------------------------
> Jul 10 11:40:44 x4 kernel: X/211 is leaving the kernel with locks still held!
> Jul 10 11:40:44 x4 kernel: 2 locks held by X/211:
> Jul 10 11:40:44 x4 kernel: #0:  (reservation_ww_class_acquire){+.+.+.}, at: [<ffffffff813279f0>] radeon_bo_list_validate+0x20/0xd0
> Jul 10 11:40:44 x4 kernel: #1:  (reservation_ww_class_mutex){+.+.+.}, at: [<ffffffff81309306>] ttm_eu_reserve_buffers+0x126/0x4b0
> Jul 10 11:40:52 x4 kernel: SysRq : Emergency Sync
> Jul 10 11:40:53 x4 kernel: Emergency Sync complete
>
Thanks, exactly what I thought. I missed a backoff somewhere.

Does the below patch fix it?

---

Comments

Markus Trippelsdorf July 10, 2013, 10:03 a.m. UTC | #1
On 2013.07.10 at 11:56 +0200, Maarten Lankhorst wrote:
> Does the below patch fix it?

Yes. Thank you for your quick reply.
Patch

diff --git a/drivers/gpu/drm/radeon/radeon_object.c b/drivers/gpu/drm/radeon/radeon_object.c
index 0219d26..2020bf4 100644
--- a/drivers/gpu/drm/radeon/radeon_object.c
+++ b/drivers/gpu/drm/radeon/radeon_object.c
@@ -377,6 +377,7 @@  int radeon_bo_list_validate(struct ww_acquire_ctx *ticket,
 					domain = lobj->alt_domain;
 					goto retry;
 				}
+				ttm_eu_backoff_reservation(ticket, head);
 				return r;
 			}
 		}
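
The one-line fix follows a common rule for multi-object locking: if a function reserves a whole list and then fails partway through, it has to release every reservation before returning the error, otherwise the caller goes back to user space with the locks still held (which is what the lockdep report above shows). Below is a minimal user-space sketch of that shape. All toy_* names are hypothetical stand-ins written for this illustration; they are not the TTM or radeon API, and the real kernel code uses ww_mutex tickets rather than a boolean flag.

```c
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for a buffer object; not the real TTM type. */
struct toy_bo {
	bool reserved;       /* true while the "reservation" is held */
	bool validate_fails; /* lets a caller force a validation error */
};

/* Reserve every object in the list (stand-in for ttm_eu_reserve_buffers). */
static int toy_reserve_all(struct toy_bo *bos, size_t n)
{
	for (size_t i = 0; i < n; i++)
		bos[i].reserved = true;
	return 0;
}

/* Drop every reservation (stand-in for ttm_eu_backoff_reservation). */
static void toy_backoff_all(struct toy_bo *bos, size_t n)
{
	for (size_t i = 0; i < n; i++)
		bos[i].reserved = false;
}

/* Pretend validation step that can fail on demand. */
static int toy_validate(const struct toy_bo *bo)
{
	return bo->validate_fails ? -ENOMEM : 0;
}

/* Count reservations still held; used to check for leaks. */
static size_t toy_count_reserved(const struct toy_bo *bos, size_t n)
{
	size_t held = 0;

	for (size_t i = 0; i < n; i++)
		if (bos[i].reserved)
			held++;
	return held;
}

/*
 * Same overall shape as radeon_bo_list_validate(): reserve the list,
 * validate each entry, and on failure back off before returning.
 */
static int toy_list_validate(struct toy_bo *bos, size_t n)
{
	int r = toy_reserve_all(bos, n);

	if (r)
		return r;

	for (size_t i = 0; i < n; i++) {
		r = toy_validate(&bos[i]);
		if (r) {
			/* The step the patch adds: release before the error return. */
			toy_backoff_all(bos, n);
			return r;
		}
	}
	/* On success the reservations stay held for the caller to release. */
	return 0;
}
```

Calling toy_list_validate() on a list with one failing entry returns -ENOMEM and leaves toy_count_reserved() at zero. Without the toy_backoff_all() call in the error path, every entry would still be marked reserved after the failed call.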