Message ID | 1479900325-28358-1-git-send-email-nhaehnle@gmail.com
State      | New, archived
On Wed, Nov 23, 2016 at 12:25:22PM +0100, Nicolai Hähnle wrote:
> From: Nicolai Hähnle <Nicolai.Haehnle@amd.com>
>
> Fix a race condition involving 4 threads and 2 ww_mutexes as indicated in
> the following example. Acquire context stamps are ordered like the thread
> numbers, i.e. thread #1 should back off when it encounters a mutex locked
> by thread #0 etc.
>
>   Thread #0    Thread #1    Thread #2    Thread #3
>   ---------    ---------    ---------    ---------
>   lock(ww)
>   success
>                lock(ww')
>                success
>                             lock(ww)
>                lock(ww)     .
>                .            .            unlock(ww) part 1
>   lock(ww)     .            .            .
>   success      .            .            .
>                .            .            unlock(ww) part 2
>                .            back off
>   lock(ww')    .
>   .            .
>   (stuck)      (stuck)
>
> Here, unlock(ww) part 1 is the part that sets lock->base.count to 1
> (without being protected by lock->base.wait_lock), meaning that thread #0
> can acquire ww in the fast path or, much more likely, the medium path
> in mutex_optimistic_spin. Since lock->base.count == 0, thread #0 then
> won't wake up any of the waiters in ww_mutex_set_context_fastpath.
>
> Then, unlock(ww) part 2 wakes up _only_the_first_ waiter of ww. This is
> thread #2, since waiters are added at the tail. Thread #2 wakes up and
> backs off since it sees ww owned by a context with a lower stamp.
>
> Meanwhile, thread #1 is never woken up, and so it won't back off its lock
> on ww'. So thread #0 gets stuck waiting for ww' to be released.
>
> This patch fixes the deadlock by waking up all waiters in the slow path
> of ww_mutex_unlock.
>
> We have an internal test case for amdgpu which continuously submits
> command streams from tens of threads, where all command streams reference
> hundreds of GPU buffer objects with a lot of overlap in the buffer lists
> between command streams. This test reliably caused a deadlock, and while I
> haven't completely confirmed that it is exactly the scenario outlined
> above, this patch does fix the test case.
>
> v2:
> - use wake_q_add
> - add additional explanations
>
> Cc: Peter Zijlstra <peterz@infradead.org>
> Cc: Ingo Molnar <mingo@redhat.com>
> Cc: Chris Wilson <chris@chris-wilson.co.uk>
> Cc: Maarten Lankhorst <maarten.lankhorst@canonical.com>
> Cc: dri-devel@lists.freedesktop.org
> Cc: stable@vger.kernel.org
> Reviewed-by: Christian König <christian.koenig@amd.com> (v1)
> Signed-off-by: Nicolai Hähnle <nicolai.haehnle@amd.com>

Yeah, when the owning ctx changes we need to wake up all waiters, to make
sure we catch all (new) deadlock scenarios. And I tried poking at your
example, and I think it's solid and can't be minimized any further.

I don't have much clue on mutex.c code itself, but the changes seem
reasonable. With that caveat:

Reviewed-by: Daniel Vetter <daniel.vetter@ffwll.ch>

Cheers, Daniel

> ---
>  kernel/locking/mutex.c | 33 +++++++++++++++++++++++++++++----
>  1 file changed, 29 insertions(+), 4 deletions(-)
> [...]
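For context, the back-off rule that the scenario above relies on boils down to a comparison of acquire stamps. The following is a simplified sketch of that decision, not the exact kernel implementation (which also handles stamp wraparound and tie-breaking); the function name is illustrative only:

#include <linux/ww_mutex.h>

/*
 * Simplified sketch of the wound/wait back-off rule: a contending context
 * must back off (the lock path returns -EDEADLK) when the lock is already
 * held by a context with an older, i.e. lower, stamp.
 */
static bool ww_ctx_should_back_off(struct ww_acquire_ctx *waiter_ctx,
				   struct ww_acquire_ctx *hold_ctx)
{
	if (!hold_ctx)
		return false;	/* holder took the lock without a context */

	/* holder is older than us, so we yield and retry later */
	return waiter_ctx->stamp > hold_ctx->stamp;
}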
On Wed, Nov 23, 2016 at 12:25:22PM +0100, Nicolai Hähnle wrote:
> From: Nicolai Hähnle <Nicolai.Haehnle@amd.com>
>
> Fix a race condition involving 4 threads and 2 ww_mutexes as indicated in
> the following example.
> [...]
> Signed-off-by: Nicolai Hähnle <nicolai.haehnle@amd.com>

Completely and utterly fails to apply; I think this patch is based on
code prior to the mutex rewrite.

Please rebase on tip/locking/core.

Also, is this a regression, or has this been a 'feature' of the ww_mutex
code from early on?
On Wed, Nov 23, 2016 at 02:00:46PM +0100, Peter Zijlstra wrote:
> On Wed, Nov 23, 2016 at 12:25:22PM +0100, Nicolai Hähnle wrote:
> > From: Nicolai Hähnle <Nicolai.Haehnle@amd.com>
> > [...]
>
> Completely and utterly fails to apply; I think this patch is based on
> code prior to the mutex rewrite.
>
> Please rebase on tip/locking/core.
>
> Also, is this a regression, or has this been a 'feature' of the ww_mutex
> code from early on?

Sorry forgot to mention that, but I checked. Seems to have been broken
since day 1, at least looking at the original code the wake-single-waiter
stuff is as old as the mutex code added in 2006.
-Daniel
On Wed, Nov 23, 2016 at 02:08:48PM +0100, Daniel Vetter wrote:
> On Wed, Nov 23, 2016 at 02:00:46PM +0100, Peter Zijlstra wrote:
> > [...]
> > Also, is this a regression, or has this been a 'feature' of the ww_mutex
> > code from early on?
>
> Sorry forgot to mention that, but I checked. Seems to have been broken
> since day 1, at least looking at the original code the wake-single-waiter
> stuff is as old as the mutex code added in 2006.

More details: For gpu drivers this was originally working, since the
ww_mutex implementation in ttm did use wake_up_all. So need to add a

Fixes: 5e338405119a ("drm/ttm: convert to the reservation api")
On 23-11-16 at 14:11, Daniel Vetter wrote:
> On Wed, Nov 23, 2016 at 02:08:48PM +0100, Daniel Vetter wrote:
>> On Wed, Nov 23, 2016 at 02:00:46PM +0100, Peter Zijlstra wrote:
>>> [...]
>>> Also, is this a regression, or has this been a 'feature' of the ww_mutex
>>> code from early on?
>> Sorry forgot to mention that, but I checked. Seems to have been broken
>> since day 1, at least looking at the original code the wake-single-waiter
>> stuff is as old as the mutex code added in 2006.
> More details: For gpu drivers this was originally working, since the
> ww_mutex implementation in ttm did use wake_up_all. So need to add a
>
> Fixes: 5e338405119a ("drm/ttm: convert to the reservation api")

But it's broken in the original kernel ww_mutex implementation, I guess it
doesn't matter much since ttm was the first in-kernel user anyway. :)

~Maarten
On Wed, Nov 23, 2016 at 12:25:22PM +0100, Nicolai Hähnle wrote:
> @@ -473,7 +476,14 @@ void __sched ww_mutex_unlock(struct ww_mutex *lock)
>  	 */
>  	mutex_clear_owner(&lock->base);
>  #endif
> -	__mutex_fastpath_unlock(&lock->base.count, __mutex_unlock_slowpath);
> +	/*
> +	 * A previously _not_ waiting task may acquire the lock via the fast
> +	 * path during our unlock. In that case, already waiting tasks may have
> +	 * to back off to avoid a deadlock. Wake up all waiters so that they
> +	 * can check their acquire context stamp against the new owner.
> +	 */
> +	__mutex_fastpath_unlock(&lock->base.count,
> +				__mutex_unlock_slowpath_wakeall);
>  }

So doing a wake-all has obvious issues with thundering herd etc.. Also,
with the new mutex, you'd not be able to do hand-off, which would
introduce starvation cases.

Ideally we'd iterate the blocked list and pick the waiter with the
earliest stamp, or we'd maintain the list in stamp order instead of
FIFO, for ww_mutex.
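Stamp-ordered insertion into lock->wait_list could look roughly like the following sketch. It is not actual kernel code: the waiter->ww_ctx field is hypothetical here (it is what Nicolai proposes adding further down), and a real implementation would also have to handle stamp wraparound:

/*
 * Minimal sketch of stamp-ordered waiter insertion for ww_mutex.
 * Waiters with older (lower) stamps sort toward the head of the list,
 * so the head waiter is the one that should eventually get the lock;
 * waiters without an acquire context keep plain FIFO order.
 * Caller holds lock->wait_lock.
 */
static void ww_mutex_add_waiter_sorted(struct mutex *lock,
				       struct mutex_waiter *waiter,
				       struct ww_acquire_ctx *ww_ctx)
{
	struct mutex_waiter *cur;

	waiter->ww_ctx = ww_ctx;	/* hypothetical field, see below */

	list_for_each_entry(cur, &lock->wait_list, list) {
		if (!cur->ww_ctx)
			continue;
		if (ww_ctx->stamp < cur->ww_ctx->stamp) {
			/* insert before the first waiter with a newer stamp */
			list_add_tail(&waiter->list, &cur->list);
			return;
		}
	}
	list_add_tail(&waiter->list, &lock->wait_list);
}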
On Wed, Nov 23, 2016 at 03:03:36PM +0100, Peter Zijlstra wrote:
> On Wed, Nov 23, 2016 at 12:25:22PM +0100, Nicolai Hähnle wrote:
> > @@ -473,7 +476,14 @@ void __sched ww_mutex_unlock(struct ww_mutex *lock)
> > [...]
>
> So doing a wake-all has obvious issues with thundering herd etc.. Also,
> with the new mutex, you'd not be able to do hand-off, which would
> introduce starvation cases.
>
> Ideally we'd iterate the blocked list and pick the waiter with the
> earliest stamp, or we'd maintain the list in stamp order instead of
> FIFO, for ww_mutex.

Not sure we'll win that much, at least I think we still need to wake up
everyone with earlier stamp than the one of the task that just released
the lock. Otherwise there's deadlocks. So just cuts the wakeups in half,
on average.

What we could do is do a handoff-hint with the timestamp of earliest task
we believe should get the lock. Everyone with a later timestamp that gets
woken then knows that they definitely have a deadlock situation and need
to back off (thread 2 in the example).

thread 1 would get woken, and would be able to take the lock, except when
thread 0 successfully raced it and stole the lock. And anyone else racing
in with later timestamps would also immediately back off, ensuring
fairness.

Without thinking it through in detail this is a PI issue, except that we
replace boosting with wakeup&back-off. Could we perhaps steal something
from rt mutexes to make it fair&efficient?
-Daniel
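The "wake only waiters with an earlier stamp than the releasing context" part of that idea could replace the wake_all loop from the patch with something like the sketch below. This is again hypothetical: it assumes the waiter->ww_ctx field discussed in this thread and that ww_mutex_unlock passes its own acquire context down into the slowpath; the caller holds lock->wait_lock and flushes wake_q afterwards:

/*
 * Rough sketch (not actual kernel code) of selective wake-up in the
 * unlock slowpath: wake waiters that may have to back off against the
 * new owner, i.e. those with a stamp older than the releasing context,
 * plus any waiters that have no acquire context at all.
 */
static void ww_mutex_wake_candidates(struct mutex *lock,
				     struct ww_acquire_ctx *releasing_ctx,
				     struct wake_q_head *wake_q)
{
	struct mutex_waiter *waiter;

	list_for_each_entry(waiter, &lock->wait_list, list) {
		if (!waiter->ww_ctx || !releasing_ctx ||
		    waiter->ww_ctx->stamp < releasing_ctx->stamp) {
			debug_mutex_wake_waiter(lock, waiter);
			wake_q_add(wake_q, waiter->task);
		}
	}
}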
On Wed, Nov 23, 2016 at 03:25:25PM +0100, Daniel Vetter wrote:
> Without thinking it through in detail this is a PI issue, except that we
> replace boosting with wakeup&back-off. Could we perhaps steal something
> from rt mutexes to make it fair&efficient?

rt_mutexes order the waiters by 'priority' and wake the top most.
On 23.11.2016 15:25, Daniel Vetter wrote:
> On Wed, Nov 23, 2016 at 03:03:36PM +0100, Peter Zijlstra wrote:
>> On Wed, Nov 23, 2016 at 12:25:22PM +0100, Nicolai Hähnle wrote:
>>> @@ -473,7 +476,14 @@ void __sched ww_mutex_unlock(struct ww_mutex *lock)
>>> [...]
>>
>> So doing a wake-all has obvious issues with thundering herd etc.. Also,
>> with the new mutex, you'd not be able to do hand-off, which would
>> introduce starvation cases.
>>
>> Ideally we'd iterate the blocked list and pick the waiter with the
>> earliest stamp, or we'd maintain the list in stamp order instead of
>> FIFO, for ww_mutex.
>
> Not sure we'll win that much, at least I think we still need to wake up
> everyone with earlier stamp than the one of the task that just released
> the lock. Otherwise there's deadlocks. So just cuts the wakeups in half,
> on average.
>
> What we could do is do a handoff-hint with the timestamp of earliest task
> we believe should get the lock. Everyone with a later timestamp that gets
> woken then knows that they definitely have a deadlock situation and need
> to back off (thread 2 in the example).
>
> thread 1 would get woken, and would be able to take the lock, except when
> thread 0 successfully raced it and stole the lock. And anyone else racing
> in with later timestamps would also immediately back off, ensuring
> fairness.

I did consider maintaining stamp order in the waiter list and originally
decided against it because I just wanted a simple and conservative fix to
avoid adding new regressions. Now that a different approach is needed
for >= 4.9 anyway, mostly due to the hand-off logic, I'm reconsidering
this.

I do believe we can win a bit by keeping the wait list sorted, if we also
make sure that waiters don't add themselves in the first place if they see
that a deadlock situation cannot be avoided.

I will probably want to extend struct mutex_waiter with ww_mutex-specific
fields to facilitate this (i.e. ctx pointer, perhaps stamp as well to
reduce pointer-chasing). That should be fine since it lives on the stack.

In the meantime, I'd appreciate it if patch #1 could be accepted as-is for
stable updates to <= 4.8. It fixes a real (if rare) bug, and the stampede
inefficiency isn't a problem in practice at least for GPU applications.

Thanks,
Nicolai
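Roughly, the extension being proposed would look like the sketch below. The ww_ctx field and its placement are illustrative, not from an actual patch; the surrounding fields are the ones struct mutex_waiter already has in this kernel:

/*
 * Illustrative sketch of the proposed struct mutex_waiter extension.
 * The ww_mutex-specific field is NULL for plain mutex waiters and for
 * ww_mutex_lock(lock, NULL) callers, and lives on the waiter's stack
 * like the rest of the structure, so there is no extra allocation.
 */
struct mutex_waiter {
	struct list_head	list;
	struct task_struct	*task;
#ifdef CONFIG_DEBUG_MUTEXES
	void			*magic;
#endif
	struct ww_acquire_ctx	*ww_ctx;	/* NULL for !ww waiters */
};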
On Thu, Nov 24, 2016 at 12:26:57PM +0100, Nicolai Hähnle wrote:
> I do believe we can win a bit by keeping the wait list sorted, if we also
> make sure that waiters don't add themselves in the first place if they see
> that a deadlock situation cannot be avoided.
>
> I will probably want to extend struct mutex_waiter with ww_mutex-specific
> fields to facilitate this (i.e. ctx pointer, perhaps stamp as well to reduce
> pointer-chasing). That should be fine since it lives on the stack.

Right, shouldn't be a problem I think.

The only 'problem' I can see with using that is that its possible to mix
ww and !ww waiters through ww_mutex_lock(.ctx = NULL). This makes the
list order somewhat tricky.

Ideally we'd remove that feature, although I see its actually used quite
a bit :/

> In the meantime, I'd appreciate it if patch #1 could be accepted as-is for
> stable updates to <= 4.8. It fixes a real (if rare) bug, and the stampede
> inefficiency isn't a problem in practice at least for GPU applications.

Sorry can't do. We don't do stable patches that don't have anything
upstream.
On Thu, Nov 24, 2016 at 12:40 PM, Peter Zijlstra <peterz@infradead.org> wrote:
>
>> I do believe we can win a bit by keeping the wait list sorted, if we also
>> make sure that waiters don't add themselves in the first place if they see
>> that a deadlock situation cannot be avoided.
>>
>> I will probably want to extend struct mutex_waiter with ww_mutex-specific
>> fields to facilitate this (i.e. ctx pointer, perhaps stamp as well to reduce
>> pointer-chasing). That should be fine since it lives on the stack.
>
> Right, shouldn't be a problem I think.
>
> The only 'problem' I can see with using that is that its possible to mix
> ww and !ww waiters through ww_mutex_lock(.ctx = NULL). This makes the
> list order somewhat tricky.
>
> Ideally we'd remove that feature, although I see its actually used quite
> a bit :/

I guess we could create a small fake acquire_ctx for single-lock
paths. That way callers still don't need to deal with having an
explicit ctx, but we can assume the timestamp (for ensuring fairness)
is available for all cases. Otherwise there's indeed a problem with
correctly (well fairly) interleaving ctx and non-ctx lockers I think.
-Daniel
On Thu, Nov 24, 2016 at 12:52:25PM +0100, Daniel Vetter wrote:
> On Thu, Nov 24, 2016 at 12:40 PM, Peter Zijlstra <peterz@infradead.org> wrote:
> > [...]
> > Ideally we'd remove that feature, although I see its actually used quite
> > a bit :/
>
> I guess we could create a small fake acquire_ctx for single-lock
> paths. That way callers still don't need to deal with having an
> explicit ctx, but we can assume the timestamp (for ensuring fairness)
> is available for all cases. Otherwise there's indeed a problem with
> correctly (well fairly) interleaving ctx and non-ctx lockers I think.

Actually tried that, but we need a ww_class to get a stamp from, and
ww_mutex_lock() doesn't have one of those..
On 24.11.2016 12:56, Peter Zijlstra wrote:
> On Thu, Nov 24, 2016 at 12:52:25PM +0100, Daniel Vetter wrote:
>> I guess we could create a small fake acquire_ctx for single-lock
>> paths. That way callers still don't need to deal with having an
>> explicit ctx, but we can assume the timestamp (for ensuring fairness)
>> is available for all cases. Otherwise there's indeed a problem with
>> correctly (well fairly) interleaving ctx and non-ctx lockers I think.
>
> Actually tried that, but we need a ww_class to get a stamp from, and
> ww_mutex_lock() doesn't have one of those..

The acquire context needs to be live until the unlock anyway, so this is
something that requires modifying the callers of ww_mutex_lock. Those
should all have a ww_class available, or something is very wrong :)

Nicolai
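For reference, this is the explicit-context pattern such callers would switch to. The example_* names are placeholders; DEFINE_WW_CLASS(), ww_mutex_init(), ww_acquire_init(), ww_mutex_lock(), ww_mutex_unlock() and ww_acquire_fini() are the existing <linux/ww_mutex.h> API, and the acquire context is what carries the stamp, obtained from the class at ww_acquire_init() time:

#include <linux/ww_mutex.h>

/* Placeholder class and lock; real callers already define their own. */
static DEFINE_WW_CLASS(example_ww_class);
static struct ww_mutex example_lock;	/* ww_mutex_init(&example_lock, &example_ww_class) */

/*
 * Single-lock path using an explicit acquire context instead of
 * ww_mutex_lock(&example_lock, NULL).
 */
static void example_single_lock(void)
{
	struct ww_acquire_ctx ctx;
	int ret;

	ww_acquire_init(&ctx, &example_ww_class);	/* assigns the stamp */

	ret = ww_mutex_lock(&example_lock, &ctx);
	WARN_ON(ret);	/* -EDEADLK cannot happen with a single lock and a fresh ctx */

	/* ... critical section ... */

	ww_mutex_unlock(&example_lock);
	ww_acquire_fini(&ctx);
}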
diff --git a/kernel/locking/mutex.c b/kernel/locking/mutex.c
index a70b90d..7fbf9b4 100644
--- a/kernel/locking/mutex.c
+++ b/kernel/locking/mutex.c
@@ -409,6 +409,9 @@ static bool mutex_optimistic_spin(struct mutex *lock,
 __visible __used noinline
 void __sched __mutex_unlock_slowpath(atomic_t *lock_count);
 
+static __used noinline
+void __sched __mutex_unlock_slowpath_wakeall(atomic_t *lock_count);
+
 /**
  * mutex_unlock - release the mutex
  * @lock: the mutex to be released
@@ -473,7 +476,14 @@ void __sched ww_mutex_unlock(struct ww_mutex *lock)
 	 */
 	mutex_clear_owner(&lock->base);
 #endif
-	__mutex_fastpath_unlock(&lock->base.count, __mutex_unlock_slowpath);
+	/*
+	 * A previously _not_ waiting task may acquire the lock via the fast
+	 * path during our unlock. In that case, already waiting tasks may have
+	 * to back off to avoid a deadlock. Wake up all waiters so that they
+	 * can check their acquire context stamp against the new owner.
+	 */
+	__mutex_fastpath_unlock(&lock->base.count,
+				__mutex_unlock_slowpath_wakeall);
 }
 EXPORT_SYMBOL(ww_mutex_unlock);
 
@@ -716,7 +726,7 @@ EXPORT_SYMBOL_GPL(__ww_mutex_lock_interruptible);
  * Release the lock, slowpath:
  */
 static inline void
-__mutex_unlock_common_slowpath(struct mutex *lock, int nested)
+__mutex_unlock_common_slowpath(struct mutex *lock, int nested, int wake_all)
 {
 	unsigned long flags;
 	WAKE_Q(wake_q);
@@ -740,7 +750,14 @@ __mutex_unlock_common_slowpath(struct mutex *lock, int nested)
 	mutex_release(&lock->dep_map, nested, _RET_IP_);
 	debug_mutex_unlock(lock);
 
-	if (!list_empty(&lock->wait_list)) {
+	if (wake_all) {
+		struct mutex_waiter *waiter;
+
+		list_for_each_entry(waiter, &lock->wait_list, list) {
+			debug_mutex_wake_waiter(lock, waiter);
+			wake_q_add(&wake_q, waiter->task);
+		}
+	} else if (!list_empty(&lock->wait_list)) {
 		/* get the first entry from the wait-list: */
 		struct mutex_waiter *waiter =
 				list_entry(lock->wait_list.next,
@@ -762,7 +779,15 @@ __mutex_unlock_slowpath(atomic_t *lock_count)
 {
 	struct mutex *lock = container_of(lock_count, struct mutex, count);
 
-	__mutex_unlock_common_slowpath(lock, 1);
+	__mutex_unlock_common_slowpath(lock, 1, 0);
+}
+
+static void
+__mutex_unlock_slowpath_wakeall(atomic_t *lock_count)
+{
+	struct mutex *lock = container_of(lock_count, struct mutex, count);
+
+	__mutex_unlock_common_slowpath(lock, 1, 1);
 }
 
 #ifndef CONFIG_DEBUG_LOCK_ALLOC