| Message ID | 20240924094515.3561410-6-liwei391@huawei.com (mailing list archive) |
|---|---|
| State | New |
| Delegated to: | Steven Rostedt |
| Series | tracing: Fix several deadlock/race issues in timerlat and hwlat tracer |
On Tue, 24 Sep 2024 17:45:15 +0800
Wei Li <liwei391@huawei.com> wrote:

> Another "hung task" error was reported during the test, and I figured out
> the deadlock scenario is as follows:
>
> T1 [BP]           | T2 [AP]                | T3 [hwlatd/1]                | T4
> work_for_cpu_fn() | cpuhp_thread_fun()     | kthread_fn()                 | hwlat_hotplug_workfn()
> _cpu_down()       | stop_cpu_kthread()     |                              | mutex_lock(&hwlat_data.lock)
> cpus_write_lock() | kthread_stop(hwlatd/1) | mutex_lock(&hwlat_data.lock) |
> __cpuhp_kick_ap() | wait_for_completion()  |                              | cpus_read_lock()
>
> It constitutes an ABBA deadlock indirectly between "cpu_hotplug_lock" and
> "hwlat_data.lock". Make the mutex acquisition in kthread_fn() interruptible
> to fix this.
>
> Fixes: ba998f7d9531 ("trace/hwlat: Support hotplug operations")
> Signed-off-by: Wei Li <liwei391@huawei.com>
> ---
>  kernel/trace/trace_hwlat.c | 3 ++-
>  1 file changed, 2 insertions(+), 1 deletion(-)
>
> diff --git a/kernel/trace/trace_hwlat.c b/kernel/trace/trace_hwlat.c
> index 3bd6071441ad..4c228ccb8a38 100644
> --- a/kernel/trace/trace_hwlat.c
> +++ b/kernel/trace/trace_hwlat.c
> @@ -370,7 +370,8 @@ static int kthread_fn(void *data)
>  		get_sample();
>  		local_irq_enable();
>
> -		mutex_lock(&hwlat_data.lock);
> +		if (mutex_lock_interruptible(&hwlat_data.lock))
> +			break;

So basically this requires a signal to break it out of the loop?

But if it receives a signal for any other reason, it breaks out of the loop
too. Which is not what we want. If anything, it should be:

	if (mutex_lock_interruptible(&hwlat_data.lock))
		continue;

But I still don't really like this solution, as it will still report a
deadlock.

Is it possible to switch the cpus_read_lock() to be taken before the
hwlat_data.lock?

-- Steve

>  		interval = hwlat_data.sample_window - hwlat_data.sample_width;
>  		mutex_unlock(&hwlat_data.lock);
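For context, here is a rough sketch of the sampling loop with the 'continue' variant Steve suggests. The loop body is paraphrased from the diff context above and the surrounding trace_hwlat.c code (the millisecond conversion and 1ms floor are assumptions of this sketch, not verbatim source):

```c
static int kthread_fn(void *data)
{
	u64 interval;

	while (!kthread_should_stop()) {
		local_irq_disable();
		get_sample();
		local_irq_enable();

		/* A signal only skips this window's recalculation... */
		if (mutex_lock_interruptible(&hwlat_data.lock))
			continue;
		interval = hwlat_data.sample_window - hwlat_data.sample_width;
		mutex_unlock(&hwlat_data.lock);

		do_div(interval, USEC_PER_MSEC);	/* window values are in usecs */
		if (interval < 1)
			interval = 1;			/* always sleep at least 1ms */

		/* ...while the existing exit path on a signal stays here. */
		if (msleep_interruptible(interval))
			break;
	}
	return 0;
}
```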
On 2024/10/4 4:19, Steven Rostedt wrote:
> On Tue, 24 Sep 2024 17:45:15 +0800
> Wei Li <liwei391@huawei.com> wrote:
>
>> Another "hung task" error was reported during the test, and I figured out
>> the deadlock scenario is as follows:
>>
>> T1 [BP]           | T2 [AP]                | T3 [hwlatd/1]                | T4
>> work_for_cpu_fn() | cpuhp_thread_fun()     | kthread_fn()                 | hwlat_hotplug_workfn()
>> _cpu_down()       | stop_cpu_kthread()     |                              | mutex_lock(&hwlat_data.lock)
>> cpus_write_lock() | kthread_stop(hwlatd/1) | mutex_lock(&hwlat_data.lock) |
>> __cpuhp_kick_ap() | wait_for_completion()  |                              | cpus_read_lock()
>>
>> It constitutes an ABBA deadlock indirectly between "cpu_hotplug_lock" and
>> "hwlat_data.lock". Make the mutex acquisition in kthread_fn() interruptible
>> to fix this.
>>
>> Fixes: ba998f7d9531 ("trace/hwlat: Support hotplug operations")
>> Signed-off-by: Wei Li <liwei391@huawei.com>
>> ---
>>  kernel/trace/trace_hwlat.c | 3 ++-
>>  1 file changed, 2 insertions(+), 1 deletion(-)
>>
>> diff --git a/kernel/trace/trace_hwlat.c b/kernel/trace/trace_hwlat.c
>> index 3bd6071441ad..4c228ccb8a38 100644
>> --- a/kernel/trace/trace_hwlat.c
>> +++ b/kernel/trace/trace_hwlat.c
>> @@ -370,7 +370,8 @@ static int kthread_fn(void *data)
>>  		get_sample();
>>  		local_irq_enable();
>>
>> -		mutex_lock(&hwlat_data.lock);
>> +		if (mutex_lock_interruptible(&hwlat_data.lock))
>> +			break;
>
> So basically this requires a signal to break it out of the loop?
>
> But if it receives a signal for any other reason, it breaks out of the loop
> too. Which is not what we want. If anything, it should be:
>
> 	if (mutex_lock_interruptible(&hwlat_data.lock))
> 		continue;

As it calls msleep_interruptible() below and breaks if a signal is pending,
I chose 'break' here too.

> But I still don't really like this solution, as it will still report a
> deadlock.
>
> Is it possible to switch the cpus_read_lock() to be taken before the
> hwlat_data.lock?

It's a little hard to change the sequence of these two locks, as we'd hold
"cpu_hotplug_lock" for longer than necessary if we did that.

But maybe we can remove the "hwlat_data.lock" in kthread_fn(), let me try
another modification.

Thanks,
Wei
On Wed, 9 Oct 2024 15:47:23 +0800
"liwei (GF)" <liwei391@huawei.com> wrote:

> On 2024/10/4 4:19, Steven Rostedt wrote:
> > On Tue, 24 Sep 2024 17:45:15 +0800
> > Wei Li <liwei391@huawei.com> wrote:
> >
> >> Another "hung task" error was reported during the test, and I figured out
> >> the deadlock scenario is as follows:
> >>
> >> T1 [BP]           | T2 [AP]                | T3 [hwlatd/1]                | T4
> >> work_for_cpu_fn() | cpuhp_thread_fun()     | kthread_fn()                 | hwlat_hotplug_workfn()
> >> _cpu_down()       | stop_cpu_kthread()     |                              | mutex_lock(&hwlat_data.lock)
> >> cpus_write_lock() | kthread_stop(hwlatd/1) | mutex_lock(&hwlat_data.lock) |
> >> __cpuhp_kick_ap() | wait_for_completion()  |                              | cpus_read_lock()

So, if we can make T3 not take the mutex_lock then that should be a
solution, right?

> >>
> >> It constitutes an ABBA deadlock indirectly between "cpu_hotplug_lock" and
> >> "hwlat_data.lock". Make the mutex acquisition in kthread_fn() interruptible
> >> to fix this.
> >>
> >> Fixes: ba998f7d9531 ("trace/hwlat: Support hotplug operations")
> >> Signed-off-by: Wei Li <liwei391@huawei.com>
> >> ---
> >>  kernel/trace/trace_hwlat.c | 3 ++-
> >>  1 file changed, 2 insertions(+), 1 deletion(-)
> >>
> >> diff --git a/kernel/trace/trace_hwlat.c b/kernel/trace/trace_hwlat.c
> >> index 3bd6071441ad..4c228ccb8a38 100644
> >> --- a/kernel/trace/trace_hwlat.c
> >> +++ b/kernel/trace/trace_hwlat.c
> >> @@ -370,7 +370,8 @@ static int kthread_fn(void *data)
> >>  		get_sample();
> >>  		local_irq_enable();
> >>
> >> -		mutex_lock(&hwlat_data.lock);
> >> +		if (mutex_lock_interruptible(&hwlat_data.lock))
> >> +			break;
> >
> > So basically this requires a signal to break it out of the loop?
> >
> > But if it receives a signal for any other reason, it breaks out of the loop
> > too. Which is not what we want. If anything, it should be:
> >
> > 	if (mutex_lock_interruptible(&hwlat_data.lock))
> > 		continue;
>
> As it calls msleep_interruptible() below and breaks if a signal is pending,
> I chose 'break' here too.
>
> > But I still don't really like this solution, as it will still report a
> > deadlock.
> >
> > Is it possible to switch the cpus_read_lock() to be taken before the
> > hwlat_data.lock?
>
> It's a little hard to change the sequence of these two locks, as we'd hold
> "cpu_hotplug_lock" for longer than necessary if we did that.
>
> But maybe we can remove the "hwlat_data.lock" in kthread_fn(), let me try
> another modification.

Have you found something yet? Looking at the code we have:

	mutex_lock(&hwlat_data.lock);
	interval = hwlat_data.sample_window - hwlat_data.sample_width;
	mutex_unlock(&hwlat_data.lock);

Where the lock is only there to synchronize the calculation of the
interval. We could add a counter for when sample_window and sample_width
are updated, and we could simply do:

 again:
	counter = atomic_read(&hwlat_data.counter);
	smp_rmb();
	if (!(counter & 1)) {
		new_interval = hwlat_data.sample_window - hwlat_data.sample_width;
		smp_rmb();
		if (counter == atomic_read(&hwlat_data.counter))
			interval = new_interval;
	}

Then we could do something like:

	atomic_inc(&hwlat_data.counter);
	smp_wmb();
	/* update sample_window or sample_width */
	smp_wmb();
	atomic_inc(&hwlat_data.counter);

And then the interval will only be updated if the values are not being
updated. Otherwise it just keeps the previous value.

-- Steve
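Pulling the counter scheme described above together into reader and writer helpers gives a minimal sketch like the following. The `counter` member and the helper names are hypothetical, not part of trace_hwlat.c; only `sample_window` and `sample_width` exist today:

```c
/* Writer side: called wherever sample_window (or sample_width) is updated. */
static void hwlat_update_window(u64 new_window)
{
	atomic_inc(&hwlat_data.counter);	/* counter becomes odd: update in flight */
	smp_wmb();
	hwlat_data.sample_window = new_window;
	smp_wmb();
	atomic_inc(&hwlat_data.counter);	/* counter even again: update done */
}

/* Reader side: recompute the interval only if no update is in flight. */
static u64 hwlat_read_interval(u64 prev_interval)
{
	u64 new_interval;
	int counter;

	counter = atomic_read(&hwlat_data.counter);
	smp_rmb();
	if (counter & 1)			/* writer in progress, keep the old value */
		return prev_interval;

	new_interval = hwlat_data.sample_window - hwlat_data.sample_width;

	smp_rmb();
	if (counter != atomic_read(&hwlat_data.counter))
		return prev_interval;		/* raced with a writer, keep the old value */

	return new_interval;
}
```

This is essentially an open-coded form of the kernel's seqcount pattern, which is why the next reply calls it "seqlock-like", except that the reader falls back to the previous interval instead of retrying until the writer finishes.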
Hi Steven,

On 2024/11/13 7:50, Steven Rostedt wrote:
>>>> Another "hung task" error was reported during the test, and I figured out
>>>> the deadlock scenario is as follows:
>>>>
>>>> T1 [BP]           | T2 [AP]                | T3 [hwlatd/1]                | T4
>>>> work_for_cpu_fn() | cpuhp_thread_fun()     | kthread_fn()                 | hwlat_hotplug_workfn()
>>>> _cpu_down()       | stop_cpu_kthread()     |                              | mutex_lock(&hwlat_data.lock)
>>>> cpus_write_lock() | kthread_stop(hwlatd/1) | mutex_lock(&hwlat_data.lock) |
>>>> __cpuhp_kick_ap() | wait_for_completion()  |                              | cpus_read_lock()
>
> So, if we can make T3 not take the mutex_lock then that should be a
> solution, right?
>
>>>>
>>>> It constitutes an ABBA deadlock indirectly between "cpu_hotplug_lock" and
>>>> "hwlat_data.lock". Make the mutex acquisition in kthread_fn() interruptible
>>>> to fix this.
>>>>
>>>> Fixes: ba998f7d9531 ("trace/hwlat: Support hotplug operations")
>>>> Signed-off-by: Wei Li <liwei391@huawei.com>
>>>> ---
>>>>  kernel/trace/trace_hwlat.c | 3 ++-
>>>>  1 file changed, 2 insertions(+), 1 deletion(-)
>>>>
>>>> diff --git a/kernel/trace/trace_hwlat.c b/kernel/trace/trace_hwlat.c
>>>> index 3bd6071441ad..4c228ccb8a38 100644
>>>> --- a/kernel/trace/trace_hwlat.c
>>>> +++ b/kernel/trace/trace_hwlat.c
>>>> @@ -370,7 +370,8 @@ static int kthread_fn(void *data)
>>>>  		get_sample();
>>>>  		local_irq_enable();
>>>>
>>>> -		mutex_lock(&hwlat_data.lock);
>>>> +		if (mutex_lock_interruptible(&hwlat_data.lock))
>>>> +			break;
>>>
>>> So basically this requires a signal to break it out of the loop?
>>>
>>> But if it receives a signal for any other reason, it breaks out of the loop
>>> too. Which is not what we want. If anything, it should be:
>>>
>>> 	if (mutex_lock_interruptible(&hwlat_data.lock))
>>> 		continue;
>>
>> As it calls msleep_interruptible() below and breaks if a signal is pending,
>> I chose 'break' here too.
>>
>>> But I still don't really like this solution, as it will still report a
>>> deadlock.
>>>
>>> Is it possible to switch the cpus_read_lock() to be taken before the
>>> hwlat_data.lock?
>>
>> It's a little hard to change the sequence of these two locks, as we'd hold
>> "cpu_hotplug_lock" for longer than necessary if we did that.
>>
>> But maybe we can remove the "hwlat_data.lock" in kthread_fn(), let me try
>> another modification.
>
> Have you found something yet? Looking at the code we have:
>
> 	mutex_lock(&hwlat_data.lock);
> 	interval = hwlat_data.sample_window - hwlat_data.sample_width;
> 	mutex_unlock(&hwlat_data.lock);
>
> Where the lock is only there to synchronize the calculation of the
> interval. We could add a counter for when sample_window and sample_width
> are updated, and we could simply do:
>
>  again:
> 	counter = atomic_read(&hwlat_data.counter);
> 	smp_rmb();
> 	if (!(counter & 1)) {
> 		new_interval = hwlat_data.sample_window - hwlat_data.sample_width;
> 		smp_rmb();
> 		if (counter == atomic_read(&hwlat_data.counter))
> 			interval = new_interval;
> 	}
>
> Then we could do something like:
>
> 	atomic_inc(&hwlat_data.counter);
> 	smp_wmb();
> 	/* update sample_window or sample_width */
> 	smp_wmb();
> 	atomic_inc(&hwlat_data.counter);
>
> And then the interval will only be updated if the values are not being
> updated. Otherwise it just keeps the previous value.
>

Your seqlock-like solution seems to be able to solve this issue, but the
difficulty is that the current updates of sample_window and sample_width
are implemented using the framework of 'trace_min_max_fops'.

We cannot add 'atomic_inc(&hwlat_data.counter)' into the update processes
for sample_window and sample_width directly. If we want to remove the
mutex_lock here, maybe we need to stop using trace_min_max_write(). However,
if we do that, we can add a 'hwlat_data.sample_interval' and update it at
the same time as updating sample_window and sample_width.

I haven't figured out an elegant solution yet.

Thanks,
Wei
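For illustration, here is a sketch of how the 'hwlat_data.sample_interval' idea could look. The `sample_interval` field and both helpers are hypothetical, not existing code: the write paths would publish a precomputed interval while still holding hwlat_data.lock, and kthread_fn() would read it without taking the mutex, sidestepping the ordering against cpu_hotplug_lock. As noted above, the hard part is wiring in the publish step, since the current writers go through the generic trace_min_max_write().

```c
/*
 * Sketch only: publish a precomputed interval from the (locked) write
 * paths so the sampling thread never needs hwlat_data.lock.
 */
static void hwlat_publish_interval(void)
{
	u64 interval;

	lockdep_assert_held(&hwlat_data.lock);
	interval = hwlat_data.sample_window - hwlat_data.sample_width;
	WRITE_ONCE(hwlat_data.sample_interval, interval);
}

/* In kthread_fn(), instead of mutex_lock()/mutex_unlock(): */
static u64 hwlat_get_interval(void)
{
	/* Note: a 64-bit READ_ONCE may tear on 32-bit targets; a real patch
	 * would have to account for that. */
	return READ_ONCE(hwlat_data.sample_interval);
}
```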
diff --git a/kernel/trace/trace_hwlat.c b/kernel/trace/trace_hwlat.c
index 3bd6071441ad..4c228ccb8a38 100644
--- a/kernel/trace/trace_hwlat.c
+++ b/kernel/trace/trace_hwlat.c
@@ -370,7 +370,8 @@ static int kthread_fn(void *data)
 		get_sample();
 		local_irq_enable();

-		mutex_lock(&hwlat_data.lock);
+		if (mutex_lock_interruptible(&hwlat_data.lock))
+			break;
 		interval = hwlat_data.sample_window - hwlat_data.sample_width;
 		mutex_unlock(&hwlat_data.lock);
Another "hung task" error was reported during the test, and i figured out the deadlock scenario is as follows: T1 [BP] | T2 [AP] | T3 [hwlatd/1] | T4 work_for_cpu_fn() | cpuhp_thread_fun() | kthread_fn() | hwlat_hotplug_workfn() _cpu_down() | stop_cpu_kthread() | | mutex_lock(&hwlat_data.lock) cpus_write_lock() | kthread_stop(hwlatd/1) | mutex_lock(&hwlat_data.lock) | __cpuhp_kick_ap() | wait_for_completion() | | cpus_read_lock() It constitutes ABBA deadlock indirectly between "cpu_hotplug_lock" and "hwlat_data.lock", make the mutex obtaining in kthread_fn() interruptible to fix this. Fixes: ba998f7d9531 ("trace/hwlat: Support hotplug operations") Signed-off-by: Wei Li <liwei391@huawei.com> --- kernel/trace/trace_hwlat.c | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-)