diff mbox series

drivers/perf: Fix kernel panic when rmmod PMU modules during perf sampling

Message ID 1594891165-8228-1-git-send-email-liuqi115@huawei.com (mailing list archive)
State New, archived
Headers show
Series drivers/perf: Fix kernel panic when rmmod PMU modules during perf sampling | expand

Commit Message

liuqi (BA) July 16, 2020, 9:19 a.m. UTC
When users try to remove PMU modules during perf sampling, kernel panic
will happen because the pmu->read() is a NULL pointer here.

INFO on HiSilicon hip08 platform as follow:
pc : hisi_uncore_pmu_event_update+0x30/0xa4 [hisi_uncore_pmu]
lr : hisi_uncore_pmu_read+0x20/0x2c [hisi_uncore_pmu]
sp : ffff800010103e90
x29: ffff800010103e90 x28: ffff0027db0c0e40
x27: ffffa29a76f129d8 x26: ffffa29a77ceb000
x25: ffffa29a773a5000 x24: ffffa29a77392000
x23: ffffddffe5943f08 x22: ffff002784285960
x21: ffff002784285800 x20: ffff0027d2e76c80
x19: ffff0027842859e0 x18: ffff80003498bcc8
x17: ffffa29a76afe910 x16: ffffa29a7583f530
x15: 16151a1512061a1e x14: 0000000000000000
x13: ffffa29a76f1e238 x12: 0000000000000001
x11: 0000000000000400 x10: 00000000000009f0
x9 : ffff8000107b3e70 x8 : ffff0027db0c1890
x7 : ffffa29a773a7000 x6 : 00000007f5131013
x5 : 00000007f5131013 x4 : 09f257d417c00000
x3 : 00000002187bd7ce x2 : ffffa29a38f0f0d8
x1 : ffffa29a38eae268 x0 : ffff0027d2e76c80
Call trace:
hisi_uncore_pmu_event_update+0x30/0xa4 [hisi_uncore_pmu]
hisi_uncore_pmu_read+0x20/0x2c [hisi_uncore_pmu]
__perf_event_read+0x1a0/0x1f8
flush_smp_call_function_queue+0xa0/0x160
generic_smp_call_function_single_interrupt+0x18/0x20
handle_IPI+0x31c/0x4dc
gic_handle_irq+0x2c8/0x310
el1_irq+0xcc/0x180
arch_cpu_idle+0x4c/0x20c
default_idle_call+0x20/0x30
do_idle+0x1b4/0x270
cpu_startup_entry+0x28/0x30
secondary_start_kernel+0x1a4/0x1fc

To solve the above issue, current module should be registered to kernel,
so that try_module_get() can be invoked when perf sampling starts. This
adds the reference counting of module and could prevent users from removing
modules during sampling.

Signed-off-by: Qi Liu <liuqi115@huawei.com>
Reported-by: Haifeng Wang <wang.wanghaifeng@huawei.com>
Reviewed-by: John Garry <john.garry@huawei.com>
---
Kernel panic will also happen when users try to unbind PMU drivers with
device. This unbind issue could be solved by another patch latter.

 drivers/perf/arm_smmuv3_pmu.c                 | 1 +
 drivers/perf/fsl_imx8_ddr_perf.c              | 1 +
 drivers/perf/hisilicon/hisi_uncore_ddrc_pmu.c | 1 +
 drivers/perf/hisilicon/hisi_uncore_hha_pmu.c  | 1 +
 drivers/perf/hisilicon/hisi_uncore_l3c_pmu.c  | 1 +
 5 files changed, 5 insertions(+)

--
2.7.4

Comments

Will Deacon July 16, 2020, 9:41 a.m. UTC | #1
On Thu, Jul 16, 2020 at 05:19:25PM +0800, Qi Liu wrote:
> Kernel panic will also happen when users try to unbind PMU drivers with
> device. This unbind issue could be solved by another patch latter.
> 
>  drivers/perf/arm_smmuv3_pmu.c                 | 1 +
>  drivers/perf/fsl_imx8_ddr_perf.c              | 1 +
>  drivers/perf/hisilicon/hisi_uncore_ddrc_pmu.c | 1 +
>  drivers/perf/hisilicon/hisi_uncore_hha_pmu.c  | 1 +
>  drivers/perf/hisilicon/hisi_uncore_l3c_pmu.c  | 1 +
>  5 files changed, 5 insertions(+)
> 
> diff --git a/drivers/perf/arm_smmuv3_pmu.c b/drivers/perf/arm_smmuv3_pmu.c
> index 48e28ef..90caba56 100644
> --- a/drivers/perf/arm_smmuv3_pmu.c
> +++ b/drivers/perf/arm_smmuv3_pmu.c
> @@ -742,6 +742,7 @@ static int smmu_pmu_probe(struct platform_device *pdev)
>  	platform_set_drvdata(pdev, smmu_pmu);
> 
>  	smmu_pmu->pmu = (struct pmu) {
> +		.module		= THIS_MODULE,

I thought platform_driver_register() did this automatically?

Will
John Garry July 16, 2020, 10:21 a.m. UTC | #2
On 16/07/2020 10:41, Will Deacon wrote:
> On Thu, Jul 16, 2020 at 05:19:25PM +0800, Qi Liu wrote:
>> Kernel panic will also happen when users try to unbind PMU drivers with
>> device. This unbind issue could be solved by another patch latter.
>>
>>   drivers/perf/arm_smmuv3_pmu.c                 | 1 +
>>   drivers/perf/fsl_imx8_ddr_perf.c              | 1 +
>>   drivers/perf/hisilicon/hisi_uncore_ddrc_pmu.c | 1 +
>>   drivers/perf/hisilicon/hisi_uncore_hha_pmu.c  | 1 +
>>   drivers/perf/hisilicon/hisi_uncore_l3c_pmu.c  | 1 +
>>   5 files changed, 5 insertions(+)
>>
>> diff --git a/drivers/perf/arm_smmuv3_pmu.c b/drivers/perf/arm_smmuv3_pmu.c
>> index 48e28ef..90caba56 100644
>> --- a/drivers/perf/arm_smmuv3_pmu.c
>> +++ b/drivers/perf/arm_smmuv3_pmu.c
>> @@ -742,6 +742,7 @@ static int smmu_pmu_probe(struct platform_device *pdev)
>>   	platform_set_drvdata(pdev, smmu_pmu);
>>
>>   	smmu_pmu->pmu = (struct pmu) {
>> +		.module		= THIS_MODULE,
> 
> I thought platform_driver_register() did this automatically?
> 

Isn't that something different? The perf framework knows nothing of the 
platform_device/driver really, and just knows the event_source device 
which it creates. And so we also need to tell the perf framework about 
the module backing this pmu.

I think some relevant code is perf_try_init_event() -> 
try_module_get(pmu->module).

Thanks,
John
Robin Murphy July 16, 2020, 10:26 a.m. UTC | #3
On 2020-07-16 10:41, Will Deacon wrote:
> On Thu, Jul 16, 2020 at 05:19:25PM +0800, Qi Liu wrote:
>> Kernel panic will also happen when users try to unbind PMU drivers with
>> device. This unbind issue could be solved by another patch latter.
>>
>>   drivers/perf/arm_smmuv3_pmu.c                 | 1 +
>>   drivers/perf/fsl_imx8_ddr_perf.c              | 1 +
>>   drivers/perf/hisilicon/hisi_uncore_ddrc_pmu.c | 1 +
>>   drivers/perf/hisilicon/hisi_uncore_hha_pmu.c  | 1 +
>>   drivers/perf/hisilicon/hisi_uncore_l3c_pmu.c  | 1 +
>>   5 files changed, 5 insertions(+)
>>
>> diff --git a/drivers/perf/arm_smmuv3_pmu.c b/drivers/perf/arm_smmuv3_pmu.c
>> index 48e28ef..90caba56 100644
>> --- a/drivers/perf/arm_smmuv3_pmu.c
>> +++ b/drivers/perf/arm_smmuv3_pmu.c
>> @@ -742,6 +742,7 @@ static int smmu_pmu_probe(struct platform_device *pdev)
>>   	platform_set_drvdata(pdev, smmu_pmu);
>>
>>   	smmu_pmu->pmu = (struct pmu) {
>> +		.module		= THIS_MODULE,
> 
> I thought platform_driver_register() did this automatically?

For the platform device itself, yes, but this is for the PMU device - 
perf needs to take a reference to the module, otherwise the platform 
device can still be pulled out from under its feet.

I can't remember if we ever discussed making perf_pmu_register() do the 
same trick as platform_device_register() and friends, but obviously it's 
a possibility.

Robin.
Will Deacon July 16, 2020, 10:30 a.m. UTC | #4
On Thu, Jul 16, 2020 at 11:26:25AM +0100, Robin Murphy wrote:
> On 2020-07-16 10:41, Will Deacon wrote:
> > On Thu, Jul 16, 2020 at 05:19:25PM +0800, Qi Liu wrote:
> > > Kernel panic will also happen when users try to unbind PMU drivers with
> > > device. This unbind issue could be solved by another patch latter.
> > > 
> > >   drivers/perf/arm_smmuv3_pmu.c                 | 1 +
> > >   drivers/perf/fsl_imx8_ddr_perf.c              | 1 +
> > >   drivers/perf/hisilicon/hisi_uncore_ddrc_pmu.c | 1 +
> > >   drivers/perf/hisilicon/hisi_uncore_hha_pmu.c  | 1 +
> > >   drivers/perf/hisilicon/hisi_uncore_l3c_pmu.c  | 1 +
> > >   5 files changed, 5 insertions(+)
> > > 
> > > diff --git a/drivers/perf/arm_smmuv3_pmu.c b/drivers/perf/arm_smmuv3_pmu.c
> > > index 48e28ef..90caba56 100644
> > > --- a/drivers/perf/arm_smmuv3_pmu.c
> > > +++ b/drivers/perf/arm_smmuv3_pmu.c
> > > @@ -742,6 +742,7 @@ static int smmu_pmu_probe(struct platform_device *pdev)
> > >   	platform_set_drvdata(pdev, smmu_pmu);
> > > 
> > >   	smmu_pmu->pmu = (struct pmu) {
> > > +		.module		= THIS_MODULE,
> > 
> > I thought platform_driver_register() did this automatically?
> 
> For the platform device itself, yes, but this is for the PMU device - perf
> needs to take a reference to the module, otherwise the platform device can
> still be pulled out from under its feet.

Urgh, gross.

> I can't remember if we ever discussed making perf_pmu_register() do the same
> trick as platform_device_register() and friends, but obviously it's a
> possibility.

Yeah, but I suppose this patch is the right thing to do for now. I'll queue
it as a fix.

Will
John Garry July 16, 2020, 10:36 a.m. UTC | #5
On 16/07/2020 11:30, Will Deacon wrote:
> On Thu, Jul 16, 2020 at 11:26:25AM +0100, Robin Murphy wrote:
>> On 2020-07-16 10:41, Will Deacon wrote:
>>> On Thu, Jul 16, 2020 at 05:19:25PM +0800, Qi Liu wrote:
>>>> Kernel panic will also happen when users try to unbind PMU drivers with
>>>> device. This unbind issue could be solved by another patch latter.
>>>>
>>>>    drivers/perf/arm_smmuv3_pmu.c                 | 1 +
>>>>    drivers/perf/fsl_imx8_ddr_perf.c              | 1 +
>>>>    drivers/perf/hisilicon/hisi_uncore_ddrc_pmu.c | 1 +
>>>>    drivers/perf/hisilicon/hisi_uncore_hha_pmu.c  | 1 +
>>>>    drivers/perf/hisilicon/hisi_uncore_l3c_pmu.c  | 1 +
>>>>    5 files changed, 5 insertions(+)
>>>>
>>>> diff --git a/drivers/perf/arm_smmuv3_pmu.c b/drivers/perf/arm_smmuv3_pmu.c
>>>> index 48e28ef..90caba56 100644
>>>> --- a/drivers/perf/arm_smmuv3_pmu.c
>>>> +++ b/drivers/perf/arm_smmuv3_pmu.c
>>>> @@ -742,6 +742,7 @@ static int smmu_pmu_probe(struct platform_device *pdev)
>>>>    	platform_set_drvdata(pdev, smmu_pmu);
>>>>
>>>>    	smmu_pmu->pmu = (struct pmu) {
>>>> +		.module		= THIS_MODULE,
>>> I thought platform_driver_register() did this automatically?
>> For the platform device itself, yes, but this is for the PMU device - perf
>> needs to take a reference to the module, otherwise the platform device can
>> still be pulled out from under its feet.
> Urgh, gross.
> 
>> I can't remember if we ever discussed making perf_pmu_register() do the same
>> trick as platform_device_register() and friends, but obviously it's a
>> possibility.
> Yeah, but I suppose this patch is the right thing to do for now. I'll queue
> it as a fix.
> 

Please also note what Qi Liu wrote about being able to unbind the driver 
and cause the same issue. I don't know if you can about that issue also.
Will Deacon July 16, 2020, 10:44 a.m. UTC | #6
On Thu, Jul 16, 2020 at 11:36:10AM +0100, John Garry wrote:
> On 16/07/2020 11:30, Will Deacon wrote:
> > On Thu, Jul 16, 2020 at 11:26:25AM +0100, Robin Murphy wrote:
> > > On 2020-07-16 10:41, Will Deacon wrote:
> > > > On Thu, Jul 16, 2020 at 05:19:25PM +0800, Qi Liu wrote:
> > > > > Kernel panic will also happen when users try to unbind PMU drivers with
> > > > > device. This unbind issue could be solved by another patch latter.
> > > > > 
> > > > >    drivers/perf/arm_smmuv3_pmu.c                 | 1 +
> > > > >    drivers/perf/fsl_imx8_ddr_perf.c              | 1 +
> > > > >    drivers/perf/hisilicon/hisi_uncore_ddrc_pmu.c | 1 +
> > > > >    drivers/perf/hisilicon/hisi_uncore_hha_pmu.c  | 1 +
> > > > >    drivers/perf/hisilicon/hisi_uncore_l3c_pmu.c  | 1 +
> > > > >    5 files changed, 5 insertions(+)
> > > > > 
> > > > > diff --git a/drivers/perf/arm_smmuv3_pmu.c b/drivers/perf/arm_smmuv3_pmu.c
> > > > > index 48e28ef..90caba56 100644
> > > > > --- a/drivers/perf/arm_smmuv3_pmu.c
> > > > > +++ b/drivers/perf/arm_smmuv3_pmu.c
> > > > > @@ -742,6 +742,7 @@ static int smmu_pmu_probe(struct platform_device *pdev)
> > > > >    	platform_set_drvdata(pdev, smmu_pmu);
> > > > > 
> > > > >    	smmu_pmu->pmu = (struct pmu) {
> > > > > +		.module		= THIS_MODULE,
> > > > I thought platform_driver_register() did this automatically?
> > > For the platform device itself, yes, but this is for the PMU device - perf
> > > needs to take a reference to the module, otherwise the platform device can
> > > still be pulled out from under its feet.
> > Urgh, gross.
> > 
> > > I can't remember if we ever discussed making perf_pmu_register() do the same
> > > trick as platform_device_register() and friends, but obviously it's a
> > > possibility.
> > Yeah, but I suppose this patch is the right thing to do for now. I'll queue
> > it as a fix.
> > 
> 
> Please also note what Qi Liu wrote about being able to unbind the driver and
> cause the same issue. I don't know if you can about that issue also.

I guess we have to throw in some '.suppress_bind_attrs = true,' lines as
well, then? I'll queue this as-is, but happy to take a follow-up if somebody
can test it.

Will
John Garry July 16, 2020, 11:09 a.m. UTC | #7
On 16/07/2020 11:44, Will Deacon wrote:
>> Please also note what Qi Liu wrote about being able to unbind the driver and
>> cause the same issue. I don't know if you can about that issue also.
> I guess we have to throw in some '.suppress_bind_attrs = true,' lines as
> well, then?

Yeah, I think it would be a follow up using suppress_bind_attrs. TBH, I 
don't know why this is an opt out feature, than an opt in. Probably a 
good reason...

  I'll queue this as-is, but happy to take a follow-up if somebody
> can test it.
> 

Thanks,
john
Will Deacon July 16, 2020, noon UTC | #8
On Thu, 16 Jul 2020 17:19:25 +0800, Qi Liu wrote:
> When users try to remove PMU modules during perf sampling, kernel panic
> will happen because the pmu->read() is a NULL pointer here.
> 
> INFO on HiSilicon hip08 platform as follow:
> pc : hisi_uncore_pmu_event_update+0x30/0xa4 [hisi_uncore_pmu]
> lr : hisi_uncore_pmu_read+0x20/0x2c [hisi_uncore_pmu]
> sp : ffff800010103e90
> x29: ffff800010103e90 x28: ffff0027db0c0e40
> x27: ffffa29a76f129d8 x26: ffffa29a77ceb000
> x25: ffffa29a773a5000 x24: ffffa29a77392000
> x23: ffffddffe5943f08 x22: ffff002784285960
> x21: ffff002784285800 x20: ffff0027d2e76c80
> x19: ffff0027842859e0 x18: ffff80003498bcc8
> x17: ffffa29a76afe910 x16: ffffa29a7583f530
> x15: 16151a1512061a1e x14: 0000000000000000
> x13: ffffa29a76f1e238 x12: 0000000000000001
> x11: 0000000000000400 x10: 00000000000009f0
> x9 : ffff8000107b3e70 x8 : ffff0027db0c1890
> x7 : ffffa29a773a7000 x6 : 00000007f5131013
> x5 : 00000007f5131013 x4 : 09f257d417c00000
> x3 : 00000002187bd7ce x2 : ffffa29a38f0f0d8
> x1 : ffffa29a38eae268 x0 : ffff0027d2e76c80
> Call trace:
> hisi_uncore_pmu_event_update+0x30/0xa4 [hisi_uncore_pmu]
> hisi_uncore_pmu_read+0x20/0x2c [hisi_uncore_pmu]
> __perf_event_read+0x1a0/0x1f8
> flush_smp_call_function_queue+0xa0/0x160
> generic_smp_call_function_single_interrupt+0x18/0x20
> handle_IPI+0x31c/0x4dc
> gic_handle_irq+0x2c8/0x310
> el1_irq+0xcc/0x180
> arch_cpu_idle+0x4c/0x20c
> default_idle_call+0x20/0x30
> do_idle+0x1b4/0x270
> cpu_startup_entry+0x28/0x30
> secondary_start_kernel+0x1a4/0x1fc
> 
> [...]

Applied to arm64 (for-next/fixes), thanks!

[1/1] drivers/perf: Fix kernel panic when rmmod PMU modules during perf sampling
      https://git.kernel.org/arm64/c/bdc5c744c7b6

Cheers,
diff mbox series

Patch

diff --git a/drivers/perf/arm_smmuv3_pmu.c b/drivers/perf/arm_smmuv3_pmu.c
index 48e28ef..90caba56 100644
--- a/drivers/perf/arm_smmuv3_pmu.c
+++ b/drivers/perf/arm_smmuv3_pmu.c
@@ -742,6 +742,7 @@  static int smmu_pmu_probe(struct platform_device *pdev)
 	platform_set_drvdata(pdev, smmu_pmu);

 	smmu_pmu->pmu = (struct pmu) {
+		.module		= THIS_MODULE,
 		.task_ctx_nr    = perf_invalid_context,
 		.pmu_enable	= smmu_pmu_enable,
 		.pmu_disable	= smmu_pmu_disable,
diff --git a/drivers/perf/fsl_imx8_ddr_perf.c b/drivers/perf/fsl_imx8_ddr_perf.c
index 90884d1..2aed2d9 100644
--- a/drivers/perf/fsl_imx8_ddr_perf.c
+++ b/drivers/perf/fsl_imx8_ddr_perf.c
@@ -512,6 +512,7 @@  static int ddr_perf_init(struct ddr_pmu *pmu, void __iomem *base,
 {
 	*pmu = (struct ddr_pmu) {
 		.pmu = (struct pmu) {
+			.module	      = THIS_MODULE,
 			.capabilities = PERF_PMU_CAP_NO_EXCLUDE,
 			.task_ctx_nr = perf_invalid_context,
 			.attr_groups = attr_groups,
diff --git a/drivers/perf/hisilicon/hisi_uncore_ddrc_pmu.c b/drivers/perf/hisilicon/hisi_uncore_ddrc_pmu.c
index 15713fa..71587f1 100644
--- a/drivers/perf/hisilicon/hisi_uncore_ddrc_pmu.c
+++ b/drivers/perf/hisilicon/hisi_uncore_ddrc_pmu.c
@@ -378,6 +378,7 @@  static int hisi_ddrc_pmu_probe(struct platform_device *pdev)
 			      ddrc_pmu->sccl_id, ddrc_pmu->index_id);
 	ddrc_pmu->pmu = (struct pmu) {
 		.name		= name,
+		.module		= THIS_MODULE,
 		.task_ctx_nr	= perf_invalid_context,
 		.event_init	= hisi_uncore_pmu_event_init,
 		.pmu_enable	= hisi_uncore_pmu_enable,
diff --git a/drivers/perf/hisilicon/hisi_uncore_hha_pmu.c b/drivers/perf/hisilicon/hisi_uncore_hha_pmu.c
index dcc5600..c199de7 100644
--- a/drivers/perf/hisilicon/hisi_uncore_hha_pmu.c
+++ b/drivers/perf/hisilicon/hisi_uncore_hha_pmu.c
@@ -390,6 +390,7 @@  static int hisi_hha_pmu_probe(struct platform_device *pdev)
 			      hha_pmu->sccl_id, hha_pmu->index_id);
 	hha_pmu->pmu = (struct pmu) {
 		.name		= name,
+		.module		= THIS_MODULE,
 		.task_ctx_nr	= perf_invalid_context,
 		.event_init	= hisi_uncore_pmu_event_init,
 		.pmu_enable	= hisi_uncore_pmu_enable,
diff --git a/drivers/perf/hisilicon/hisi_uncore_l3c_pmu.c b/drivers/perf/hisilicon/hisi_uncore_l3c_pmu.c
index 7719ae4..567d7e6 100644
--- a/drivers/perf/hisilicon/hisi_uncore_l3c_pmu.c
+++ b/drivers/perf/hisilicon/hisi_uncore_l3c_pmu.c
@@ -380,6 +380,7 @@  static int hisi_l3c_pmu_probe(struct platform_device *pdev)
 			      l3c_pmu->sccl_id, l3c_pmu->index_id);
 	l3c_pmu->pmu = (struct pmu) {
 		.name		= name,
+		.module		= THIS_MODULE,
 		.task_ctx_nr	= perf_invalid_context,
 		.event_init	= hisi_uncore_pmu_event_init,
 		.pmu_enable	= hisi_uncore_pmu_enable,