
perf/arm-dmc620: Reverse locking order in dmc620_pmu_get_irq()

Message ID 20230405172842.2663770-1-longman@redhat.com (mailing list archive)
State New, archived
Series perf/arm-dmc620: Reverse locking order in dmc620_pmu_get_irq()

Commit Message

Waiman Long April 5, 2023, 5:28 p.m. UTC
The following circular locking dependency was reported when running
a CPU online/offline test on an arm64 system.

[   84.195923] Chain exists of:
                 dmc620_pmu_irqs_lock --> cpu_hotplug_lock --> cpuhp_state-down

[   84.207305]  Possible unsafe locking scenario:

[   84.213212]        CPU0                    CPU1
[   84.217729]        ----                    ----
[   84.222247]   lock(cpuhp_state-down);
[   84.225899]                                lock(cpu_hotplug_lock);
[   84.232068]                                lock(cpuhp_state-down);
[   84.238237]   lock(dmc620_pmu_irqs_lock);
[   84.242236]
                *** DEADLOCK ***

The problematic locking order seems to be

	lock(dmc620_pmu_irqs_lock) --> lock(cpu_hotplug_lock)

This locking order happens when dmc620_pmu_get_irq() is called from
dmc620_pmu_device_probe(). Fix this possible deadlock scenario by
reversing the locking order.

Also export __cpuhp_state_add_instance_cpuslocked() so that it can be
accessed by kernel modules.

Signed-off-by: Waiman Long <longman@redhat.com>
---
 drivers/perf/arm_dmc620_pmu.c | 4 +++-
 kernel/cpu.c                  | 1 +
 2 files changed, 4 insertions(+), 1 deletion(-)

Comments

Will Deacon April 11, 2023, 12:38 p.m. UTC | #1
Hi Waiman,

[+Tuan Phan, Robin and Suzuki]

On Wed, Apr 05, 2023 at 01:28:42PM -0400, Waiman Long wrote:
> The following circular locking dependency was reported when running
> cpus online/offline test on an arm64 system.
> 
> [   84.195923] Chain exists of:
>                  dmc620_pmu_irqs_lock --> cpu_hotplug_lock --> cpuhp_state-down
> 
> [   84.207305]  Possible unsafe locking scenario:
> 
> [   84.213212]        CPU0                    CPU1
> [   84.217729]        ----                    ----
> [   84.222247]   lock(cpuhp_state-down);
> [   84.225899]                                lock(cpu_hotplug_lock);
> [   84.232068]                                lock(cpuhp_state-down);
> [   84.238237]   lock(dmc620_pmu_irqs_lock);
> [   84.242236]
>                 *** DEADLOCK ***
> 
> The problematic locking order seems to be
> 
> 	lock(dmc620_pmu_irqs_lock) --> lock(cpu_hotplug_lock)
> 
> This locking order happens when dmc620_pmu_get_irq() is called from
> dmc620_pmu_device_probe(). Fix this possible deadlock scenario by
> reversing the locking order.
> 
> Also export __cpuhp_state_add_instance_cpuslocked() so that it can be
> accessed by kernel modules.
> 
> Signed-off-by: Waiman Long <longman@redhat.com>
> ---
>  drivers/perf/arm_dmc620_pmu.c | 4 +++-
>  kernel/cpu.c                  | 1 +
>  2 files changed, 4 insertions(+), 1 deletion(-)
> 
> diff --git a/drivers/perf/arm_dmc620_pmu.c b/drivers/perf/arm_dmc620_pmu.c
> index 54aa4658fb36..78d3bfbe96a6 100644
> --- a/drivers/perf/arm_dmc620_pmu.c
> +++ b/drivers/perf/arm_dmc620_pmu.c
> @@ -425,7 +425,7 @@ static struct dmc620_pmu_irq *__dmc620_pmu_get_irq(int irq_num)
>  	if (ret)
>  		goto out_free_irq;
>  
> -	ret = cpuhp_state_add_instance_nocalls(cpuhp_state_num, &irq->node);
> +	ret = cpuhp_state_add_instance_nocalls_cpuslocked(cpuhp_state_num, &irq->node);
>  	if (ret)
>  		goto out_free_irq;
>  
> @@ -445,9 +445,11 @@ static int dmc620_pmu_get_irq(struct dmc620_pmu *dmc620_pmu, int irq_num)
>  {
>  	struct dmc620_pmu_irq *irq;
>  
> +	cpus_read_lock();
>  	mutex_lock(&dmc620_pmu_irqs_lock);
>  	irq = __dmc620_pmu_get_irq(irq_num);
>  	mutex_unlock(&dmc620_pmu_irqs_lock);
> +	cpus_read_unlock();
>  
>  	if (IS_ERR(irq))
>  		return PTR_ERR(irq);
> diff --git a/kernel/cpu.c b/kernel/cpu.c
> index 6c0a92ca6bb5..05daaef362e6 100644
> --- a/kernel/cpu.c
> +++ b/kernel/cpu.c
> @@ -2036,6 +2036,7 @@ int __cpuhp_state_add_instance_cpuslocked(enum cpuhp_state state,
>  	mutex_unlock(&cpuhp_state_mutex);
>  	return ret;
>  }
> +EXPORT_SYMBOL_GPL(__cpuhp_state_add_instance_cpuslocked);

Thanks for the fix, but I think it would be much cleaner if we could handle
this internally to the driver without having to export additional symbols
from the hotplug machinery.

Looking at the driver, it looks like it would make more sense to register
each PMU instance with the cpuhp state machine and avoid having to traverse
a mutable list, rather than add irq instances.

That said, I really don't grok this comment:

	/* We're only reading, but this isn't the place to be involving RCU */

Given that perf_pmu_migrate_context() calls synchronize_rcu()...

So perhaps we could just walk the list using RCU as the easiest fix?

Will
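Will's RCU suggestion would let the hotplug callback walk the PMU list without taking dmc620_pmu_irqs_lock at all. A hypothetical userspace miniature of that walk, with the RCU primitives shown only as comments (in the kernel the loop would run under rcu_read_lock() using list_for_each_entry_rcu()); the structure and names are invented for the example:

```c
#include <stddef.h>

/* Invented miniature of a per-IRQ PMU list node. */
struct pmu_node {
	int cpu;			/* CPU the PMU instance is bound to */
	struct pmu_node *next;
};

/*
 * Lockless reader: in the kernel this would be
 *   rcu_read_lock();
 *   list_for_each_entry_rcu(p, head, node) ...
 *   rcu_read_unlock();
 * so the hotplug path no longer depends on the driver mutex.
 */
int count_pmus_on_cpu(const struct pmu_node *head, int cpu)
{
	int n = 0;

	for (const struct pmu_node *p = head; p; p = p->next)
		if (p->cpu == cpu)
			n++;
	return n;
}
```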
Waiman Long April 11, 2023, 1:44 p.m. UTC | #2
On 4/11/23 08:38, Will Deacon wrote:
> Hi Waiman,
>
> [+Tuan Phan, Robin and Suzuki]
>
> On Wed, Apr 05, 2023 at 01:28:42PM -0400, Waiman Long wrote:
>> The following circular locking dependency was reported when running
>> cpus online/offline test on an arm64 system.
>>
>> [   84.195923] Chain exists of:
>>                   dmc620_pmu_irqs_lock --> cpu_hotplug_lock --> cpuhp_state-down
>>
>> [   84.207305]  Possible unsafe locking scenario:
>>
>> [   84.213212]        CPU0                    CPU1
>> [   84.217729]        ----                    ----
>> [   84.222247]   lock(cpuhp_state-down);
>> [   84.225899]                                lock(cpu_hotplug_lock);
>> [   84.232068]                                lock(cpuhp_state-down);
>> [   84.238237]   lock(dmc620_pmu_irqs_lock);
>> [   84.242236]
>>                  *** DEADLOCK ***
>>
>> The problematic locking order seems to be
>>
>> 	lock(dmc620_pmu_irqs_lock) --> lock(cpu_hotplug_lock)
>>
>> This locking order happens when dmc620_pmu_get_irq() is called from
>> dmc620_pmu_device_probe(). Fix this possible deadlock scenario by
>> reversing the locking order.
>>
>> Also export __cpuhp_state_add_instance_cpuslocked() so that it can be
>> accessed by kernel modules.
>>
>> Signed-off-by: Waiman Long <longman@redhat.com>
>> ---
>>   drivers/perf/arm_dmc620_pmu.c | 4 +++-
>>   kernel/cpu.c                  | 1 +
>>   2 files changed, 4 insertions(+), 1 deletion(-)
>>
>> diff --git a/drivers/perf/arm_dmc620_pmu.c b/drivers/perf/arm_dmc620_pmu.c
>> index 54aa4658fb36..78d3bfbe96a6 100644
>> --- a/drivers/perf/arm_dmc620_pmu.c
>> +++ b/drivers/perf/arm_dmc620_pmu.c
>> @@ -425,7 +425,7 @@ static struct dmc620_pmu_irq *__dmc620_pmu_get_irq(int irq_num)
>>   	if (ret)
>>   		goto out_free_irq;
>>   
>> -	ret = cpuhp_state_add_instance_nocalls(cpuhp_state_num, &irq->node);
>> +	ret = cpuhp_state_add_instance_nocalls_cpuslocked(cpuhp_state_num, &irq->node);
>>   	if (ret)
>>   		goto out_free_irq;
>>   
>> @@ -445,9 +445,11 @@ static int dmc620_pmu_get_irq(struct dmc620_pmu *dmc620_pmu, int irq_num)
>>   {
>>   	struct dmc620_pmu_irq *irq;
>>   
>> +	cpus_read_lock();
>>   	mutex_lock(&dmc620_pmu_irqs_lock);
>>   	irq = __dmc620_pmu_get_irq(irq_num);
>>   	mutex_unlock(&dmc620_pmu_irqs_lock);
>> +	cpus_read_unlock();
>>   
>>   	if (IS_ERR(irq))
>>   		return PTR_ERR(irq);
>> diff --git a/kernel/cpu.c b/kernel/cpu.c
>> index 6c0a92ca6bb5..05daaef362e6 100644
>> --- a/kernel/cpu.c
>> +++ b/kernel/cpu.c
>> @@ -2036,6 +2036,7 @@ int __cpuhp_state_add_instance_cpuslocked(enum cpuhp_state state,
>>   	mutex_unlock(&cpuhp_state_mutex);
>>   	return ret;
>>   }
>> +EXPORT_SYMBOL_GPL(__cpuhp_state_add_instance_cpuslocked);
> Thanks for the fix, but I think it would be much cleaner if we could handle
> this internally to the driver without having to export additional symbols
> from the hotplug machinery.
>
> Looking at the driver, it looks like it would make more sense to register
> each PMU instance with the cpuhp state machine and avoid having to traverse
> a mutable list, rather than add irq instances.
>
> That said, I really don't grok this comment:
>
> 	/* We're only reading, but this isn't the place to be involving RCU */
>
> Given that perf_pmu_migrate_context() calls synchronize_rcu()...
>
> So perhaps we could just walk the list using RCU as the easiest fix?

My patch is just one of several possible fixes, and I don't mind if you 
have a better one in mind. My knowledge of the internal workings of the 
driver is limited, so it would be great if someone more familiar with 
it could come up with a better fix.

Thanks,
longman
Robin Murphy April 11, 2023, 4:27 p.m. UTC | #3
On 11/04/2023 1:38 pm, Will Deacon wrote:
> Hi Waiman,
> 
> [+Tuan Phan, Robin and Suzuki]
> 
> On Wed, Apr 05, 2023 at 01:28:42PM -0400, Waiman Long wrote:
>> The following circular locking dependency was reported when running
>> cpus online/offline test on an arm64 system.
>>
>> [   84.195923] Chain exists of:
>>                   dmc620_pmu_irqs_lock --> cpu_hotplug_lock --> cpuhp_state-down
>>
>> [   84.207305]  Possible unsafe locking scenario:
>>
>> [   84.213212]        CPU0                    CPU1
>> [   84.217729]        ----                    ----
>> [   84.222247]   lock(cpuhp_state-down);
>> [   84.225899]                                lock(cpu_hotplug_lock);
>> [   84.232068]                                lock(cpuhp_state-down);
>> [   84.238237]   lock(dmc620_pmu_irqs_lock);
>> [   84.242236]
>>                  *** DEADLOCK ***
>>
>> The problematic locking order seems to be
>>
>> 	lock(dmc620_pmu_irqs_lock) --> lock(cpu_hotplug_lock)
>>
>> This locking order happens when dmc620_pmu_get_irq() is called from
>> dmc620_pmu_device_probe(). Fix this possible deadlock scenario by
>> reversing the locking order.
>>
>> Also export __cpuhp_state_add_instance_cpuslocked() so that it can be
>> accessed by kernel modules.
>>
>> Signed-off-by: Waiman Long <longman@redhat.com>
>> ---
>>   drivers/perf/arm_dmc620_pmu.c | 4 +++-
>>   kernel/cpu.c                  | 1 +
>>   2 files changed, 4 insertions(+), 1 deletion(-)
>>
>> diff --git a/drivers/perf/arm_dmc620_pmu.c b/drivers/perf/arm_dmc620_pmu.c
>> index 54aa4658fb36..78d3bfbe96a6 100644
>> --- a/drivers/perf/arm_dmc620_pmu.c
>> +++ b/drivers/perf/arm_dmc620_pmu.c
>> @@ -425,7 +425,7 @@ static struct dmc620_pmu_irq *__dmc620_pmu_get_irq(int irq_num)
>>   	if (ret)
>>   		goto out_free_irq;
>>   
>> -	ret = cpuhp_state_add_instance_nocalls(cpuhp_state_num, &irq->node);
>> +	ret = cpuhp_state_add_instance_nocalls_cpuslocked(cpuhp_state_num, &irq->node);
>>   	if (ret)
>>   		goto out_free_irq;
>>   
>> @@ -445,9 +445,11 @@ static int dmc620_pmu_get_irq(struct dmc620_pmu *dmc620_pmu, int irq_num)
>>   {
>>   	struct dmc620_pmu_irq *irq;
>>   
>> +	cpus_read_lock();
>>   	mutex_lock(&dmc620_pmu_irqs_lock);
>>   	irq = __dmc620_pmu_get_irq(irq_num);
>>   	mutex_unlock(&dmc620_pmu_irqs_lock);
>> +	cpus_read_unlock();
>>   
>>   	if (IS_ERR(irq))
>>   		return PTR_ERR(irq);
>> diff --git a/kernel/cpu.c b/kernel/cpu.c
>> index 6c0a92ca6bb5..05daaef362e6 100644
>> --- a/kernel/cpu.c
>> +++ b/kernel/cpu.c
>> @@ -2036,6 +2036,7 @@ int __cpuhp_state_add_instance_cpuslocked(enum cpuhp_state state,
>>   	mutex_unlock(&cpuhp_state_mutex);
>>   	return ret;
>>   }
>> +EXPORT_SYMBOL_GPL(__cpuhp_state_add_instance_cpuslocked);
> 
> Thanks for the fix, but I think it would be much cleaner if we could handle
> this internally to the driver without having to export additional symbols
> from the hotplug machinery.

Ack, I suspect using distinct locks for the global dmc620_pmu_irqs list 
and the per-IRQ-context PMU lists might be the simplest and cleanest 
way? I don't recall any good reason why I wrote it like it is, and TBH 
it does look a bit fishy with fresh eyes 3 years later.

> Looking at the driver, it looks like it would make more sense to register
> each PMU instance with the cpuhp state machine and avoid having to traverse
> a mutable list, rather than add irq instances.

At least part of it is that this IRQ-sharing machinery was originally[1] 
designed in the shape of driver-agnostic helper functions, because 
munging system PMU interrupts together is a thing people are unlikely to 
stop doing in general. I still expect to need to factor it out some day...

> That said, I really don't grok this comment:
> 
> 	/* We're only reading, but this isn't the place to be involving RCU */
> 
> Given that perf_pmu_migrate_context() calls synchronize_rcu()...

I think that may have been the point - we can't reasonably use RCU to 
protect the list iteration itself here *because* that would then mean 
synchronize_rcu() being called within the read-side critical section. 
The rest was that hotplug was *supposed* to be a context where just 
taking the same lock that serialises against dmc620_pmu_get_irq() 
touching the same list is safe, simple and obvious, even if it's 
stronger than we strictly need.

Cheers,
Robin.

[1] 
https://lore.kernel.org/linux-arm-kernel/d73dd8c3579fbf713d6215317404549aede8ad2d.1586363449.git.robin.murphy@arm.com/
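Robin's distinct-locks idea can be sketched in the same userspace style: one lock for the global IRQ list, one per-IRQ lock for that IRQ's PMU list, so the hotplug callback only ever takes the per-IRQ lock. This is a hypothetical miniature with invented names, not the driver's actual structures.

```c
#include <pthread.h>

/* Invented miniature of the split-lock scheme Robin suggests. */
struct irq_ctx {
	pthread_mutex_t pmus_lock;	/* guards only this IRQ's PMU list */
	int npmus;
};

/* Guards only the global IRQ list, never the per-IRQ PMU lists. */
static pthread_mutex_t irqs_lock = PTHREAD_MUTEX_INITIALIZER;

/* Probe path: global lock to find/insert the IRQ (elided here), then
 * the per-IRQ lock to attach the PMU. */
void add_pmu(struct irq_ctx *irq)
{
	pthread_mutex_lock(&irqs_lock);
	pthread_mutex_lock(&irq->pmus_lock);
	irq->npmus++;
	pthread_mutex_unlock(&irq->pmus_lock);
	pthread_mutex_unlock(&irqs_lock);
}

/* Hotplug callback: takes only the per-IRQ lock, so it never nests
 * inside whatever lock is held across cpuhp instance registration. */
int pmus_on_irq(struct irq_ctx *irq)
{
	pthread_mutex_lock(&irq->pmus_lock);
	int n = irq->npmus;
	pthread_mutex_unlock(&irq->pmus_lock);
	return n;
}
```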

Patch

diff --git a/drivers/perf/arm_dmc620_pmu.c b/drivers/perf/arm_dmc620_pmu.c
index 54aa4658fb36..78d3bfbe96a6 100644
--- a/drivers/perf/arm_dmc620_pmu.c
+++ b/drivers/perf/arm_dmc620_pmu.c
@@ -425,7 +425,7 @@ static struct dmc620_pmu_irq *__dmc620_pmu_get_irq(int irq_num)
 	if (ret)
 		goto out_free_irq;
 
-	ret = cpuhp_state_add_instance_nocalls(cpuhp_state_num, &irq->node);
+	ret = cpuhp_state_add_instance_nocalls_cpuslocked(cpuhp_state_num, &irq->node);
 	if (ret)
 		goto out_free_irq;
 
@@ -445,9 +445,11 @@ static int dmc620_pmu_get_irq(struct dmc620_pmu *dmc620_pmu, int irq_num)
 {
 	struct dmc620_pmu_irq *irq;
 
+	cpus_read_lock();
 	mutex_lock(&dmc620_pmu_irqs_lock);
 	irq = __dmc620_pmu_get_irq(irq_num);
 	mutex_unlock(&dmc620_pmu_irqs_lock);
+	cpus_read_unlock();
 
 	if (IS_ERR(irq))
 		return PTR_ERR(irq);
diff --git a/kernel/cpu.c b/kernel/cpu.c
index 6c0a92ca6bb5..05daaef362e6 100644
--- a/kernel/cpu.c
+++ b/kernel/cpu.c
@@ -2036,6 +2036,7 @@ int __cpuhp_state_add_instance_cpuslocked(enum cpuhp_state state,
 	mutex_unlock(&cpuhp_state_mutex);
 	return ret;
 }
+EXPORT_SYMBOL_GPL(__cpuhp_state_add_instance_cpuslocked);
 
 int __cpuhp_state_add_instance(enum cpuhp_state state, struct hlist_node *node,
 			       bool invoke)