diff mbox series

pmdomain: arm: Fix NULL dereference on scmi_perf_domain removal

Message ID 20240125191756.868860-1-cristian.marussi@arm.com (mailing list archive)
State Handled Elsewhere, archived
Headers show
Series pmdomain: arm: Fix NULL dereference on scmi_perf_domain removal | expand

Commit Message

Cristian Marussi Jan. 25, 2024, 7:17 p.m. UTC
On unloading of the scmi_perf_domain module got the below splat, when in
the DT provided to the system under test the '#power-domain-cells' property
was missing.
Indeed, this particular setup causes the probe to bail out early without
giving any error, so that, then, the removal code is run on unload, but
without all the expected initialized structures in place.

Add a check and bail out early on remove too.

Unable to handle kernel NULL pointer dereference at virtual address 0000000000000008
Mem abort info:
   ESR = 0x0000000096000004
   EC = 0x25: DABT (current EL), IL = 32 bits
   SET = 0, FnV = 0
   EA = 0, S1PTW = 0
   FSC = 0x04: level 0 translation fault
 Data abort info:
   ISV = 0, ISS = 0x00000004, ISS2 = 0x00000000
   CM = 0, WnR = 0, TnD = 0, TagAccess = 0
   GCS = 0, Overlay = 0, DirtyBit = 0, Xs = 0
 user pgtable: 4k pages, 48-bit VAs, pgdp=00000001076e5000
 [0000000000000008] pgd=0000000000000000, p4d=0000000000000000
 Internal error: Oops: 0000000096000004 [#1] PREEMPT SMP
 Modules linked in: scmi_perf_domain(-) scmi_module scmi_core
 CPU: 0 PID: 231 Comm: rmmod Not tainted 6.7.0-00084-gb4b1f27d3b83-dirty #15
 Hardware name: linux,dummy-virt (DT)
 pstate: 61400005 (nZCv daif +PAN -UAO -TCO +DIT -SSBS BTYPE=--)
 pc : scmi_perf_domain_remove+0x28/0x70 [scmi_perf_domain]
 lr : scmi_perf_domain_remove+0x28/0x70 [scmi_perf_domain]
 sp : ffff80008393bc10
 x29: ffff80008393bc10 x28: ffff0000875a8000 x27: 0000000000000000
 x26: 0000000000000000 x25: 0000000000000000 x24: 0000000000000000
 x23: ffff00008030c090 x22: ffff00008032d490 x21: ffff80007b287050
 x20: 0000000000000000 x19: ffff00008032d410 x18: 0000000000000000
 x17: 0000000000000000 x16: 0000000000000000 x15: 0000000000000000
 x14: 8ba0696d05013a2f x13: 0000000000000000 x12: 0000000000000002
 x11: 0101010101010101 x10: ffff00008510cff8 x9 : ffff800080a6797c
 x8 : 0101010101010101 x7 : 7f7f7f7f7f7f7f7f x6 : fefefeff6364626d
 x5 : 8080808000000000 x4 : 0000000000000020 x3 : 00000000553a3dc1
 x2 : ffff0000875a8000 x1 : ffff0000875a8000 x0 : ffff800082ffa048
 Call trace:
  scmi_perf_domain_remove+0x28/0x70 [scmi_perf_domain]
  scmi_dev_remove+0x28/0x40 [scmi_core]
  device_remove+0x54/0x90
  device_release_driver_internal+0x1dc/0x240
  driver_detach+0x58/0xa8
  bus_remove_driver+0x78/0x108
  driver_unregister+0x38/0x70
  scmi_driver_unregister+0x28/0x180 [scmi_core]
  scmi_perf_domain_driver_exit+0x18/0xb78 [scmi_perf_domain]
  __arm64_sys_delete_module+0x1a8/0x2c0
  invoke_syscall+0x50/0x128
  el0_svc_common.constprop.0+0x48/0xf0
  do_el0_svc+0x24/0x38
  el0_svc+0x34/0xb8
  el0t_64_sync_handler+0x100/0x130
  el0t_64_sync+0x190/0x198
 Code: a90153f3 f9403c14 f9414800 955f8a05 (b9400a80)
 ---[ end trace 0000000000000000 ]---

Cc: Sudeep Holla <sudeep.holla@arm.com>
Cc: Ulf Hansson <ulf.hansson@linaro.org>
Fixes: 2af23ceb8624 ("pmdomain: arm: Add the SCMI performance domain")
Signed-off-by: Cristian Marussi <cristian.marussi@arm.com>
---
I suppose the probe does NOT bail out with an error because this DT config has
to be supported, right ?
---
 drivers/pmdomain/arm/scmi_perf_domain.c | 3 +++
 1 file changed, 3 insertions(+)

Comments

Ulf Hansson Jan. 30, 2024, 1:09 p.m. UTC | #1
On Thu, 25 Jan 2024 at 20:18, Cristian Marussi <cristian.marussi@arm.com> wrote:
>
> On unloading of the scmi_perf_domain module got the below splat, when in
> the DT provided to the system under test the '#power-domain-cells' property
> was missing.
> Indeed, this particular setup causes the probe to bail out early without
> giving any error, so that, then, the removal code is run on unload, but
> without all the expected initialized structures in place.
>
> Add a check and bail out early on remove too.

Thanks for spotting this!

>
> Unable to handle kernel NULL pointer dereference at virtual address 0000000000000008
> Mem abort info:
>    ESR = 0x0000000096000004
>    EC = 0x25: DABT (current EL), IL = 32 bits
>    SET = 0, FnV = 0
>    EA = 0, S1PTW = 0
>    FSC = 0x04: level 0 translation fault
>  Data abort info:
>    ISV = 0, ISS = 0x00000004, ISS2 = 0x00000000
>    CM = 0, WnR = 0, TnD = 0, TagAccess = 0
>    GCS = 0, Overlay = 0, DirtyBit = 0, Xs = 0
>  user pgtable: 4k pages, 48-bit VAs, pgdp=00000001076e5000
>  [0000000000000008] pgd=0000000000000000, p4d=0000000000000000
>  Internal error: Oops: 0000000096000004 [#1] PREEMPT SMP
>  Modules linked in: scmi_perf_domain(-) scmi_module scmi_core
>  CPU: 0 PID: 231 Comm: rmmod Not tainted 6.7.0-00084-gb4b1f27d3b83-dirty #15
>  Hardware name: linux,dummy-virt (DT)
>  pstate: 61400005 (nZCv daif +PAN -UAO -TCO +DIT -SSBS BTYPE=--)
>  pc : scmi_perf_domain_remove+0x28/0x70 [scmi_perf_domain]
>  lr : scmi_perf_domain_remove+0x28/0x70 [scmi_perf_domain]
>  sp : ffff80008393bc10
>  x29: ffff80008393bc10 x28: ffff0000875a8000 x27: 0000000000000000
>  x26: 0000000000000000 x25: 0000000000000000 x24: 0000000000000000
>  x23: ffff00008030c090 x22: ffff00008032d490 x21: ffff80007b287050
>  x20: 0000000000000000 x19: ffff00008032d410 x18: 0000000000000000
>  x17: 0000000000000000 x16: 0000000000000000 x15: 0000000000000000
>  x14: 8ba0696d05013a2f x13: 0000000000000000 x12: 0000000000000002
>  x11: 0101010101010101 x10: ffff00008510cff8 x9 : ffff800080a6797c
>  x8 : 0101010101010101 x7 : 7f7f7f7f7f7f7f7f x6 : fefefeff6364626d
>  x5 : 8080808000000000 x4 : 0000000000000020 x3 : 00000000553a3dc1
>  x2 : ffff0000875a8000 x1 : ffff0000875a8000 x0 : ffff800082ffa048
>  Call trace:
>   scmi_perf_domain_remove+0x28/0x70 [scmi_perf_domain]
>   scmi_dev_remove+0x28/0x40 [scmi_core]
>   device_remove+0x54/0x90
>   device_release_driver_internal+0x1dc/0x240
>   driver_detach+0x58/0xa8
>   bus_remove_driver+0x78/0x108
>   driver_unregister+0x38/0x70
>   scmi_driver_unregister+0x28/0x180 [scmi_core]
>   scmi_perf_domain_driver_exit+0x18/0xb78 [scmi_perf_domain]
>   __arm64_sys_delete_module+0x1a8/0x2c0
>   invoke_syscall+0x50/0x128
>   el0_svc_common.constprop.0+0x48/0xf0
>   do_el0_svc+0x24/0x38
>   el0_svc+0x34/0xb8
>   el0t_64_sync_handler+0x100/0x130
>   el0t_64_sync+0x190/0x198
>  Code: a90153f3 f9403c14 f9414800 955f8a05 (b9400a80)
>  ---[ end trace 0000000000000000 ]---
>
> Cc: Sudeep Holla <sudeep.holla@arm.com>
> Cc: Ulf Hansson <ulf.hansson@linaro.org>
> Fixes: 2af23ceb8624 ("pmdomain: arm: Add the SCMI performance domain")
> Signed-off-by: Cristian Marussi <cristian.marussi@arm.com>
> ---
> I suppose the probe does NOT bail out with an error because this DT config has
> to be supported, right ?

Actually, no. It's a mistake by me, the probe should bail out with an
error code.

In fact, there is also one additional similar problem in probe, when
the number of perf-domains are zero. In that case, we should also
return an error code, rather than returning 0.

Would you mind updating the patch to cover both problems - or if you
are too busy, just let me know and I can help out.

> ---
>  drivers/pmdomain/arm/scmi_perf_domain.c | 3 +++
>  1 file changed, 3 insertions(+)
>
> diff --git a/drivers/pmdomain/arm/scmi_perf_domain.c b/drivers/pmdomain/arm/scmi_perf_domain.c
> index 709bbc448fad..d7ef46ccd9b8 100644
> --- a/drivers/pmdomain/arm/scmi_perf_domain.c
> +++ b/drivers/pmdomain/arm/scmi_perf_domain.c
> @@ -159,6 +159,9 @@ static void scmi_perf_domain_remove(struct scmi_device *sdev)
>         struct genpd_onecell_data *scmi_pd_data = dev_get_drvdata(dev);
>         int i;
>
> +       if (!scmi_pd_data)
> +               return;
> +
>         of_genpd_del_provider(dev->of_node);
>
>         for (i = 0; i < scmi_pd_data->num_domains; i++)
> --
> 2.43.0
>

Kind regards
Uffe
Cristian Marussi Jan. 30, 2024, 8:07 p.m. UTC | #2
On Tue, Jan 30, 2024 at 02:09:20PM +0100, Ulf Hansson wrote:
> On Thu, 25 Jan 2024 at 20:18, Cristian Marussi <cristian.marussi@arm.com> wrote:
> >
> > On unloading of the scmi_perf_domain module got the below splat, when in
> > the DT provided to the system under test the '#power-domain-cells' property
> > was missing.
> > Indeed, this particular setup causes the probe to bail out early without
> > giving any error, so that, then, the removal code is run on unload, but
> > without all the expected initialized structures in place.
> >
> > Add a check and bail out early on remove too.
> 
> Thanks for spotting this!
> 
> >
> > Unable to handle kernel NULL pointer dereference at virtual address 0000000000000008
> > Mem abort info:
> >    ESR = 0x0000000096000004
> >    EC = 0x25: DABT (current EL), IL = 32 bits
> >    SET = 0, FnV = 0
> >    EA = 0, S1PTW = 0
> >    FSC = 0x04: level 0 translation fault
> >  Data abort info:
> >    ISV = 0, ISS = 0x00000004, ISS2 = 0x00000000
> >    CM = 0, WnR = 0, TnD = 0, TagAccess = 0
> >    GCS = 0, Overlay = 0, DirtyBit = 0, Xs = 0
> >  user pgtable: 4k pages, 48-bit VAs, pgdp=00000001076e5000
> >  [0000000000000008] pgd=0000000000000000, p4d=0000000000000000
> >  Internal error: Oops: 0000000096000004 [#1] PREEMPT SMP
> >  Modules linked in: scmi_perf_domain(-) scmi_module scmi_core
> >  CPU: 0 PID: 231 Comm: rmmod Not tainted 6.7.0-00084-gb4b1f27d3b83-dirty #15
> >  Hardware name: linux,dummy-virt (DT)
> >  pstate: 61400005 (nZCv daif +PAN -UAO -TCO +DIT -SSBS BTYPE=--)
> >  pc : scmi_perf_domain_remove+0x28/0x70 [scmi_perf_domain]
> >  lr : scmi_perf_domain_remove+0x28/0x70 [scmi_perf_domain]
> >  sp : ffff80008393bc10
> >  x29: ffff80008393bc10 x28: ffff0000875a8000 x27: 0000000000000000
> >  x26: 0000000000000000 x25: 0000000000000000 x24: 0000000000000000
> >  x23: ffff00008030c090 x22: ffff00008032d490 x21: ffff80007b287050
> >  x20: 0000000000000000 x19: ffff00008032d410 x18: 0000000000000000
> >  x17: 0000000000000000 x16: 0000000000000000 x15: 0000000000000000
> >  x14: 8ba0696d05013a2f x13: 0000000000000000 x12: 0000000000000002
> >  x11: 0101010101010101 x10: ffff00008510cff8 x9 : ffff800080a6797c
> >  x8 : 0101010101010101 x7 : 7f7f7f7f7f7f7f7f x6 : fefefeff6364626d
> >  x5 : 8080808000000000 x4 : 0000000000000020 x3 : 00000000553a3dc1
> >  x2 : ffff0000875a8000 x1 : ffff0000875a8000 x0 : ffff800082ffa048
> >  Call trace:
> >   scmi_perf_domain_remove+0x28/0x70 [scmi_perf_domain]
> >   scmi_dev_remove+0x28/0x40 [scmi_core]
> >   device_remove+0x54/0x90
> >   device_release_driver_internal+0x1dc/0x240
> >   driver_detach+0x58/0xa8
> >   bus_remove_driver+0x78/0x108
> >   driver_unregister+0x38/0x70
> >   scmi_driver_unregister+0x28/0x180 [scmi_core]
> >   scmi_perf_domain_driver_exit+0x18/0xb78 [scmi_perf_domain]
> >   __arm64_sys_delete_module+0x1a8/0x2c0
> >   invoke_syscall+0x50/0x128
> >   el0_svc_common.constprop.0+0x48/0xf0
> >   do_el0_svc+0x24/0x38
> >   el0_svc+0x34/0xb8
> >   el0t_64_sync_handler+0x100/0x130
> >   el0t_64_sync+0x190/0x198
> >  Code: a90153f3 f9403c14 f9414800 955f8a05 (b9400a80)
> >  ---[ end trace 0000000000000000 ]---
> >
> > Cc: Sudeep Holla <sudeep.holla@arm.com>
> > Cc: Ulf Hansson <ulf.hansson@linaro.org>
> > Fixes: 2af23ceb8624 ("pmdomain: arm: Add the SCMI performance domain")
> > Signed-off-by: Cristian Marussi <cristian.marussi@arm.com>
> > ---
> > I suppose the probe does NOT bail out with an error because this DT config has
> > to be supported, right ?
> 
> Actually, no. It's a mistake by me, the probe should bail out with an
> error code.
>

Ok. I suppose any old platform like JUNO that missed this will have to
update their DT to use the new scmi_perf_domain...well it should have
anyway really, it is just that now it is silently failing.

> In fact, there is also one additional similar problem in probe, when
> the number of perf-domains are zero. In that case, we should also
> return an error code, rather than returning 0.
> 
> Would you mind updating the patch to cover both problems - or if you
> are too busy, just let me know and I can help out.

No problem, I can do it next week, but regarding the zero domain case,
I remember I used to do the same on regulator/voltage driver and bail out
when no domains were found, but we were asked by some customer to support
instead the very useless and funny case of zero domains for some of their
testing setup scenarios .. i.e. allowing the driver to load with zero domains
(and do nothing) and then unload cleanly avoiding harms while unloading ...)

Thoughts about this ? Can fix as you prefer .

Thanks,
Cristian
Ulf Hansson Jan. 31, 2024, 11:35 a.m. UTC | #3
On Tue, 30 Jan 2024 at 21:07, Cristian Marussi <cristian.marussi@arm.com> wrote:
>
> On Tue, Jan 30, 2024 at 02:09:20PM +0100, Ulf Hansson wrote:
> > On Thu, 25 Jan 2024 at 20:18, Cristian Marussi <cristian.marussi@arm.com> wrote:
> > >
> > > On unloading of the scmi_perf_domain module got the below splat, when in
> > > the DT provided to the system under test the '#power-domain-cells' property
> > > was missing.
> > > Indeed, this particular setup causes the probe to bail out early without
> > > giving any error, so that, then, the removal code is run on unload, but
> > > without all the expected initialized structures in place.
> > >
> > > Add a check and bail out early on remove too.
> >
> > Thanks for spotting this!
> >
> > >
> > > Unable to handle kernel NULL pointer dereference at virtual address 0000000000000008
> > > Mem abort info:
> > >    ESR = 0x0000000096000004
> > >    EC = 0x25: DABT (current EL), IL = 32 bits
> > >    SET = 0, FnV = 0
> > >    EA = 0, S1PTW = 0
> > >    FSC = 0x04: level 0 translation fault
> > >  Data abort info:
> > >    ISV = 0, ISS = 0x00000004, ISS2 = 0x00000000
> > >    CM = 0, WnR = 0, TnD = 0, TagAccess = 0
> > >    GCS = 0, Overlay = 0, DirtyBit = 0, Xs = 0
> > >  user pgtable: 4k pages, 48-bit VAs, pgdp=00000001076e5000
> > >  [0000000000000008] pgd=0000000000000000, p4d=0000000000000000
> > >  Internal error: Oops: 0000000096000004 [#1] PREEMPT SMP
> > >  Modules linked in: scmi_perf_domain(-) scmi_module scmi_core
> > >  CPU: 0 PID: 231 Comm: rmmod Not tainted 6.7.0-00084-gb4b1f27d3b83-dirty #15
> > >  Hardware name: linux,dummy-virt (DT)
> > >  pstate: 61400005 (nZCv daif +PAN -UAO -TCO +DIT -SSBS BTYPE=--)
> > >  pc : scmi_perf_domain_remove+0x28/0x70 [scmi_perf_domain]
> > >  lr : scmi_perf_domain_remove+0x28/0x70 [scmi_perf_domain]
> > >  sp : ffff80008393bc10
> > >  x29: ffff80008393bc10 x28: ffff0000875a8000 x27: 0000000000000000
> > >  x26: 0000000000000000 x25: 0000000000000000 x24: 0000000000000000
> > >  x23: ffff00008030c090 x22: ffff00008032d490 x21: ffff80007b287050
> > >  x20: 0000000000000000 x19: ffff00008032d410 x18: 0000000000000000
> > >  x17: 0000000000000000 x16: 0000000000000000 x15: 0000000000000000
> > >  x14: 8ba0696d05013a2f x13: 0000000000000000 x12: 0000000000000002
> > >  x11: 0101010101010101 x10: ffff00008510cff8 x9 : ffff800080a6797c
> > >  x8 : 0101010101010101 x7 : 7f7f7f7f7f7f7f7f x6 : fefefeff6364626d
> > >  x5 : 8080808000000000 x4 : 0000000000000020 x3 : 00000000553a3dc1
> > >  x2 : ffff0000875a8000 x1 : ffff0000875a8000 x0 : ffff800082ffa048
> > >  Call trace:
> > >   scmi_perf_domain_remove+0x28/0x70 [scmi_perf_domain]
> > >   scmi_dev_remove+0x28/0x40 [scmi_core]
> > >   device_remove+0x54/0x90
> > >   device_release_driver_internal+0x1dc/0x240
> > >   driver_detach+0x58/0xa8
> > >   bus_remove_driver+0x78/0x108
> > >   driver_unregister+0x38/0x70
> > >   scmi_driver_unregister+0x28/0x180 [scmi_core]
> > >   scmi_perf_domain_driver_exit+0x18/0xb78 [scmi_perf_domain]
> > >   __arm64_sys_delete_module+0x1a8/0x2c0
> > >   invoke_syscall+0x50/0x128
> > >   el0_svc_common.constprop.0+0x48/0xf0
> > >   do_el0_svc+0x24/0x38
> > >   el0_svc+0x34/0xb8
> > >   el0t_64_sync_handler+0x100/0x130
> > >   el0t_64_sync+0x190/0x198
> > >  Code: a90153f3 f9403c14 f9414800 955f8a05 (b9400a80)
> > >  ---[ end trace 0000000000000000 ]---
> > >
> > > Cc: Sudeep Holla <sudeep.holla@arm.com>
> > > Cc: Ulf Hansson <ulf.hansson@linaro.org>
> > > Fixes: 2af23ceb8624 ("pmdomain: arm: Add the SCMI performance domain")
> > > Signed-off-by: Cristian Marussi <cristian.marussi@arm.com>
> > > ---
> > > I suppose the probe does NOT bail out with an error because this DT config has
> > > to be supported, right ?
> >
> > Actually, no. It's a mistake by me, the probe should bail out with an
> > error code.
> >
>
> Ok. I suppose any old platform like JUNO that missed this will have to
> update their DT to use the new scmi_perf_domain...well it should have
> anyway really, it is just that now it is silently failing.

I don't think it's failing. The old binding for SCMI perf (using
clock-cells) is still supported the way they were before, which is
only for cpufreq.

But, yes you are right, both the DT and the consumer driver would need
to be updated to support SCMI perf.

>
> > In fact, there is also one additional similar problem in probe, when
> > the number of perf-domains are zero. In that case, we should also
> > return an error code, rather than returning 0.
> >
> > Would you mind updating the patch to cover both problems - or if you
> > are too busy, just let me know and I can help out.
>
> No problem, I can do it next week, but regarding the zero domain case,
> I remember I used to do the same on regulator/voltage driver and bail out
> when no domains were found, but we were asked by some customer to support
> instead the very useless and funny case of zero domains for some of their
> testing setup scenarios .. i.e. allowing the driver to load with zero domains
> (and do nothing) and then unload cleanly avoiding harms while unloading ...)
>
> Thoughts about this ? Can fix as you prefer .

In my opinion, there is no point having a module/driver loaded to do
nothing. I would prefer to just return an error code.

Thanks for fixing this!

Kind regards
Uffe
Sudeep Holla Jan. 31, 2024, 11:53 a.m. UTC | #4
On Wed, Jan 31, 2024 at 12:35:56PM +0100, Ulf Hansson wrote:
> On Tue, 30 Jan 2024 at 21:07, Cristian Marussi <cristian.marussi@arm.com> wrote:
> >
> > On Tue, Jan 30, 2024 at 02:09:20PM +0100, Ulf Hansson wrote:
> > > On Thu, 25 Jan 2024 at 20:18, Cristian Marussi <cristian.marussi@arm.com> wrote:
> > > >
> > > > On unloading of the scmi_perf_domain module got the below splat, when in
> > > > the DT provided to the system under test the '#power-domain-cells' property
> > > > was missing.
> > > > Indeed, this particular setup causes the probe to bail out early without
> > > > giving any error, so that, then, the removal code is run on unload, but
> > > > without all the expected initialized structures in place.
> > > >
> > > > Add a check and bail out early on remove too.
> > >
> > > Thanks for spotting this!
> > >
> > > >
> > > > Unable to handle kernel NULL pointer dereference at virtual address 0000000000000008
> > > > Mem abort info:
> > > >    ESR = 0x0000000096000004
> > > >    EC = 0x25: DABT (current EL), IL = 32 bits
> > > >    SET = 0, FnV = 0
> > > >    EA = 0, S1PTW = 0
> > > >    FSC = 0x04: level 0 translation fault
> > > >  Data abort info:
> > > >    ISV = 0, ISS = 0x00000004, ISS2 = 0x00000000
> > > >    CM = 0, WnR = 0, TnD = 0, TagAccess = 0
> > > >    GCS = 0, Overlay = 0, DirtyBit = 0, Xs = 0
> > > >  user pgtable: 4k pages, 48-bit VAs, pgdp=00000001076e5000
> > > >  [0000000000000008] pgd=0000000000000000, p4d=0000000000000000
> > > >  Internal error: Oops: 0000000096000004 [#1] PREEMPT SMP
> > > >  Modules linked in: scmi_perf_domain(-) scmi_module scmi_core
> > > >  CPU: 0 PID: 231 Comm: rmmod Not tainted 6.7.0-00084-gb4b1f27d3b83-dirty #15
> > > >  Hardware name: linux,dummy-virt (DT)
> > > >  pstate: 61400005 (nZCv daif +PAN -UAO -TCO +DIT -SSBS BTYPE=--)
> > > >  pc : scmi_perf_domain_remove+0x28/0x70 [scmi_perf_domain]
> > > >  lr : scmi_perf_domain_remove+0x28/0x70 [scmi_perf_domain]
> > > >  sp : ffff80008393bc10
> > > >  x29: ffff80008393bc10 x28: ffff0000875a8000 x27: 0000000000000000
> > > >  x26: 0000000000000000 x25: 0000000000000000 x24: 0000000000000000
> > > >  x23: ffff00008030c090 x22: ffff00008032d490 x21: ffff80007b287050
> > > >  x20: 0000000000000000 x19: ffff00008032d410 x18: 0000000000000000
> > > >  x17: 0000000000000000 x16: 0000000000000000 x15: 0000000000000000
> > > >  x14: 8ba0696d05013a2f x13: 0000000000000000 x12: 0000000000000002
> > > >  x11: 0101010101010101 x10: ffff00008510cff8 x9 : ffff800080a6797c
> > > >  x8 : 0101010101010101 x7 : 7f7f7f7f7f7f7f7f x6 : fefefeff6364626d
> > > >  x5 : 8080808000000000 x4 : 0000000000000020 x3 : 00000000553a3dc1
> > > >  x2 : ffff0000875a8000 x1 : ffff0000875a8000 x0 : ffff800082ffa048
> > > >  Call trace:
> > > >   scmi_perf_domain_remove+0x28/0x70 [scmi_perf_domain]
> > > >   scmi_dev_remove+0x28/0x40 [scmi_core]
> > > >   device_remove+0x54/0x90
> > > >   device_release_driver_internal+0x1dc/0x240
> > > >   driver_detach+0x58/0xa8
> > > >   bus_remove_driver+0x78/0x108
> > > >   driver_unregister+0x38/0x70
> > > >   scmi_driver_unregister+0x28/0x180 [scmi_core]
> > > >   scmi_perf_domain_driver_exit+0x18/0xb78 [scmi_perf_domain]
> > > >   __arm64_sys_delete_module+0x1a8/0x2c0
> > > >   invoke_syscall+0x50/0x128
> > > >   el0_svc_common.constprop.0+0x48/0xf0
> > > >   do_el0_svc+0x24/0x38
> > > >   el0_svc+0x34/0xb8
> > > >   el0t_64_sync_handler+0x100/0x130
> > > >   el0t_64_sync+0x190/0x198
> > > >  Code: a90153f3 f9403c14 f9414800 955f8a05 (b9400a80)
> > > >  ---[ end trace 0000000000000000 ]---
> > > >
> > > > Cc: Sudeep Holla <sudeep.holla@arm.com>
> > > > Cc: Ulf Hansson <ulf.hansson@linaro.org>
> > > > Fixes: 2af23ceb8624 ("pmdomain: arm: Add the SCMI performance domain")
> > > > Signed-off-by: Cristian Marussi <cristian.marussi@arm.com>
> > > > ---
> > > > I suppose the probe does NOT bail out with an error because this DT config has
> > > > to be supported, right ?
> > >
> > > Actually, no. It's a mistake by me, the probe should bail out with an
> > > error code.
> > >
> >
> > Ok. I suppose any old platform like JUNO that missed this will have to
> > update their DT to use the new scmi_perf_domain...well it should have
> > anyway really, it is just that now it is silently failing.
> 
> I don't think it's failing. The old binding for SCMI perf (using
> clock-cells) is still supported the way they were before, which is
> only for cpufreq.
> 
> But, yes you are right, both the DT and the consumer driver would need
> to be updated to support SCMI perf.
> 

Not sure if you want to flag an error on platforms that doesn't use this.
IMO probe succeeding doing nothing seems right. Won't returning the error
from probe gets flagged as error during boot or module loading though
it is harmless on the platform since it doesn't use it.

> In fact, there is also one additional similar problem in probe, when
> the number of perf-domains are zero. In that case, we should also
> return an error code, rather than returning 0.
>
> >
> > > In fact, there is also one additional similar problem in probe, when
> > > the number of perf-domains are zero. In that case, we should also
> > > return an error code, rather than returning 0.
> > >
> > > Would you mind updating the patch to cover both problems - or if you
> > > are too busy, just let me know and I can help out.
> >
> > No problem, I can do it next week, but regarding the zero domain case,
> > I remember I used to do the same on regulator/voltage driver and bail out
> > when no domains were found, but we were asked by some customer to support
> > instead the very useless and funny case of zero domains for some of their
> > testing setup scenarios .. i.e. allowing the driver to load with zero domains
> > (and do nothing) and then unload cleanly avoiding harms while unloading ...)
> >
> > Thoughts about this ? Can fix as you prefer .
> 
> In my opinion, there is no point having a module/driver loaded to do
> nothing. I would prefer to just return an error code.
> 

IIRC we had this in one of the driver but there was a request to keep it
this way as it is useful in SCMI f/w bringup/testing. Not all info/features
need to be ready. That said I am fine if pmdomain prefers to flag 0 domains
as error.
Ulf Hansson Jan. 31, 2024, 12:26 p.m. UTC | #5
On Wed, 31 Jan 2024 at 12:53, Sudeep Holla <sudeep.holla@arm.com> wrote:
>
> On Wed, Jan 31, 2024 at 12:35:56PM +0100, Ulf Hansson wrote:
> > On Tue, 30 Jan 2024 at 21:07, Cristian Marussi <cristian.marussi@arm.com> wrote:
> > >
> > > On Tue, Jan 30, 2024 at 02:09:20PM +0100, Ulf Hansson wrote:
> > > > On Thu, 25 Jan 2024 at 20:18, Cristian Marussi <cristian.marussi@arm.com> wrote:
> > > > >
> > > > > On unloading of the scmi_perf_domain module got the below splat, when in
> > > > > the DT provided to the system under test the '#power-domain-cells' property
> > > > > was missing.
> > > > > Indeed, this particular setup causes the probe to bail out early without
> > > > > giving any error, so that, then, the removal code is run on unload, but
> > > > > without all the expected initialized structures in place.
> > > > >
> > > > > Add a check and bail out early on remove too.
> > > >
> > > > Thanks for spotting this!
> > > >
> > > > >
> > > > > Unable to handle kernel NULL pointer dereference at virtual address 0000000000000008
> > > > > Mem abort info:
> > > > >    ESR = 0x0000000096000004
> > > > >    EC = 0x25: DABT (current EL), IL = 32 bits
> > > > >    SET = 0, FnV = 0
> > > > >    EA = 0, S1PTW = 0
> > > > >    FSC = 0x04: level 0 translation fault
> > > > >  Data abort info:
> > > > >    ISV = 0, ISS = 0x00000004, ISS2 = 0x00000000
> > > > >    CM = 0, WnR = 0, TnD = 0, TagAccess = 0
> > > > >    GCS = 0, Overlay = 0, DirtyBit = 0, Xs = 0
> > > > >  user pgtable: 4k pages, 48-bit VAs, pgdp=00000001076e5000
> > > > >  [0000000000000008] pgd=0000000000000000, p4d=0000000000000000
> > > > >  Internal error: Oops: 0000000096000004 [#1] PREEMPT SMP
> > > > >  Modules linked in: scmi_perf_domain(-) scmi_module scmi_core
> > > > >  CPU: 0 PID: 231 Comm: rmmod Not tainted 6.7.0-00084-gb4b1f27d3b83-dirty #15
> > > > >  Hardware name: linux,dummy-virt (DT)
> > > > >  pstate: 61400005 (nZCv daif +PAN -UAO -TCO +DIT -SSBS BTYPE=--)
> > > > >  pc : scmi_perf_domain_remove+0x28/0x70 [scmi_perf_domain]
> > > > >  lr : scmi_perf_domain_remove+0x28/0x70 [scmi_perf_domain]
> > > > >  sp : ffff80008393bc10
> > > > >  x29: ffff80008393bc10 x28: ffff0000875a8000 x27: 0000000000000000
> > > > >  x26: 0000000000000000 x25: 0000000000000000 x24: 0000000000000000
> > > > >  x23: ffff00008030c090 x22: ffff00008032d490 x21: ffff80007b287050
> > > > >  x20: 0000000000000000 x19: ffff00008032d410 x18: 0000000000000000
> > > > >  x17: 0000000000000000 x16: 0000000000000000 x15: 0000000000000000
> > > > >  x14: 8ba0696d05013a2f x13: 0000000000000000 x12: 0000000000000002
> > > > >  x11: 0101010101010101 x10: ffff00008510cff8 x9 : ffff800080a6797c
> > > > >  x8 : 0101010101010101 x7 : 7f7f7f7f7f7f7f7f x6 : fefefeff6364626d
> > > > >  x5 : 8080808000000000 x4 : 0000000000000020 x3 : 00000000553a3dc1
> > > > >  x2 : ffff0000875a8000 x1 : ffff0000875a8000 x0 : ffff800082ffa048
> > > > >  Call trace:
> > > > >   scmi_perf_domain_remove+0x28/0x70 [scmi_perf_domain]
> > > > >   scmi_dev_remove+0x28/0x40 [scmi_core]
> > > > >   device_remove+0x54/0x90
> > > > >   device_release_driver_internal+0x1dc/0x240
> > > > >   driver_detach+0x58/0xa8
> > > > >   bus_remove_driver+0x78/0x108
> > > > >   driver_unregister+0x38/0x70
> > > > >   scmi_driver_unregister+0x28/0x180 [scmi_core]
> > > > >   scmi_perf_domain_driver_exit+0x18/0xb78 [scmi_perf_domain]
> > > > >   __arm64_sys_delete_module+0x1a8/0x2c0
> > > > >   invoke_syscall+0x50/0x128
> > > > >   el0_svc_common.constprop.0+0x48/0xf0
> > > > >   do_el0_svc+0x24/0x38
> > > > >   el0_svc+0x34/0xb8
> > > > >   el0t_64_sync_handler+0x100/0x130
> > > > >   el0t_64_sync+0x190/0x198
> > > > >  Code: a90153f3 f9403c14 f9414800 955f8a05 (b9400a80)
> > > > >  ---[ end trace 0000000000000000 ]---
> > > > >
> > > > > Cc: Sudeep Holla <sudeep.holla@arm.com>
> > > > > Cc: Ulf Hansson <ulf.hansson@linaro.org>
> > > > > Fixes: 2af23ceb8624 ("pmdomain: arm: Add the SCMI performance domain")
> > > > > Signed-off-by: Cristian Marussi <cristian.marussi@arm.com>
> > > > > ---
> > > > > I suppose the probe does NOT bail out with an error because this DT config has
> > > > > to be supported, right ?
> > > >
> > > > Actually, no. It's a mistake by me, the probe should bail out with an
> > > > error code.
> > > >
> > >
> > > Ok. I suppose any old platform like JUNO that missed this will have to
> > > update their DT to use the new scmi_perf_domain...well it should have
> > > anyway really, it is just that now it is silently failing.
> >
> > I don't think it's failing. The old binding for SCMI perf (using
> > clock-cells) is still supported the way they were before, which is
> > only for cpufreq.
> >
> > But, yes you are right, both the DT and the consumer driver would need
> > to be updated to support SCMI perf.
> >
>
> Not sure if you want to flag an error on platforms that doesn't use this.
> IMO probe succeeding doing nothing seems right. Won't returning the error
> from probe gets flagged as error during boot or module loading though
> it is harmless on the platform since it doesn't use it.
>
> > In fact, there is also one additional similar problem in probe, when
> > the number of perf-domains are zero. In that case, we should also
> > return an error code, rather than returning 0.
> >
> > >
> > > > In fact, there is also one additional similar problem in probe, when
> > > > the number of perf-domains are zero. In that case, we should also
> > > > return an error code, rather than returning 0.
> > > >
> > > > Would you mind updating the patch to cover both problems - or if you
> > > > are too busy, just let me know and I can help out.
> > >
> > > No problem, I can do it next week, but regarding the zero domain case,
> > > I remember I used to do the same on regulator/voltage driver and bail out
> > > when no domains were found, but we were asked by some customer to support
> > > instead the very useless and funny case of zero domains for some of their
> > > testing setup scenarios .. i.e. allowing the driver to load with zero domains
> > > (and do nothing) and then unload cleanly avoiding harms while unloading ...)
> > >
> > > Thoughts about this ? Can fix as you prefer .
> >
> > In my opinion, there is no point having a module/driver loaded to do
> > nothing. I would prefer to just return an error code.
> >
>
> IIRC we had this in one of the driver but there was a request to keep it
> this way as it is useful in SCMI f/w bringup/testing. Not all info/features
> need to be ready. That said I am fine if pmdomain prefers to flag 0 domains
> as error.

Well, I don't have a strong opinion around this particular case. My
opinion is more generic, keeping modules loaded wastes memory for no
good reason.

On the other hand, it may not be a problem in this case, as you suggested above.

Perhaps a consistent behaviour is better?

Kind regards
Uffe
Sudeep Holla Jan. 31, 2024, 4:16 p.m. UTC | #6
On Thu, Jan 25, 2024 at 07:17:56PM +0000, Cristian Marussi wrote:
> On unloading of the scmi_perf_domain module got the below splat, when in
> the DT provided to the system under test the '#power-domain-cells' property
> was missing.
> Indeed, this particular setup causes the probe to bail out early without
> giving any error, so that, then, the removal code is run on unload, but
> without all the expected initialized structures in place.
> 
> Add a check and bail out early on remove too.
> 
> Unable to handle kernel NULL pointer dereference at virtual address 0000000000000008
> Mem abort info:
>    ESR = 0x0000000096000004
>    EC = 0x25: DABT (current EL), IL = 32 bits
>    SET = 0, FnV = 0
>    EA = 0, S1PTW = 0
>    FSC = 0x04: level 0 translation fault
>  Data abort info:
>    ISV = 0, ISS = 0x00000004, ISS2 = 0x00000000
>    CM = 0, WnR = 0, TnD = 0, TagAccess = 0
>    GCS = 0, Overlay = 0, DirtyBit = 0, Xs = 0
>  user pgtable: 4k pages, 48-bit VAs, pgdp=00000001076e5000
>  [0000000000000008] pgd=0000000000000000, p4d=0000000000000000
>  Internal error: Oops: 0000000096000004 [#1] PREEMPT SMP
>  Modules linked in: scmi_perf_domain(-) scmi_module scmi_core
>  CPU: 0 PID: 231 Comm: rmmod Not tainted 6.7.0-00084-gb4b1f27d3b83-dirty #15
>  Hardware name: linux,dummy-virt (DT)
>  pstate: 61400005 (nZCv daif +PAN -UAO -TCO +DIT -SSBS BTYPE=--)
>  pc : scmi_perf_domain_remove+0x28/0x70 [scmi_perf_domain]
>  lr : scmi_perf_domain_remove+0x28/0x70 [scmi_perf_domain]


>  sp : ffff80008393bc10
>  x29: ffff80008393bc10 x28: ffff0000875a8000 x27: 0000000000000000
>  x26: 0000000000000000 x25: 0000000000000000 x24: 0000000000000000
>  x23: ffff00008030c090 x22: ffff00008032d490 x21: ffff80007b287050
>  x20: 0000000000000000 x19: ffff00008032d410 x18: 0000000000000000
>  x17: 0000000000000000 x16: 0000000000000000 x15: 0000000000000000
>  x14: 8ba0696d05013a2f x13: 0000000000000000 x12: 0000000000000002
>  x11: 0101010101010101 x10: ffff00008510cff8 x9 : ffff800080a6797c
>  x8 : 0101010101010101 x7 : 7f7f7f7f7f7f7f7f x6 : fefefeff6364626d
>  x5 : 8080808000000000 x4 : 0000000000000020 x3 : 00000000553a3dc1
>  x2 : ffff0000875a8000 x1 : ffff0000875a8000 x0 : ffff800082ffa048

These can be dropped as they are not useful.

>  Call trace:
>   scmi_perf_domain_remove+0x28/0x70 [scmi_perf_domain]
>   scmi_dev_remove+0x28/0x40 [scmi_core]
>   device_remove+0x54/0x90
>   device_release_driver_internal+0x1dc/0x240
>   driver_detach+0x58/0xa8
>   bus_remove_driver+0x78/0x108
>   driver_unregister+0x38/0x70
>   scmi_driver_unregister+0x28/0x180 [scmi_core]
>   scmi_perf_domain_driver_exit+0x18/0xb78 [scmi_perf_domain]
>   __arm64_sys_delete_module+0x1a8/0x2c0
>   invoke_syscall+0x50/0x128
>   el0_svc_common.constprop.0+0x48/0xf0
>   do_el0_svc+0x24/0x38
>   el0_svc+0x34/0xb8
>   el0t_64_sync_handler+0x100/0x130
>   el0t_64_sync+0x190/0x198
>  Code: a90153f3 f9403c14 f9414800 955f8a05 (b9400a80)
>  ---[ end trace 0000000000000000 ]---
> 
> Cc: Sudeep Holla <sudeep.holla@arm.com>

Reviewed-by: Sudeep Holla <sudeep.holla@arm.com>
Ulf Hansson Feb. 13, 2024, 12:33 p.m. UTC | #7
On Thu, 25 Jan 2024 at 20:18, Cristian Marussi <cristian.marussi@arm.com> wrote:
>
> On unloading of the scmi_perf_domain module got the below splat, when in
> the DT provided to the system under test the '#power-domain-cells' property
> was missing.
> Indeed, this particular setup causes the probe to bail out early without
> giving any error, so that, then, the removal code is run on unload, but
> without all the expected initialized structures in place.
>
> Add a check and bail out early on remove too.
>
> Unable to handle kernel NULL pointer dereference at virtual address 0000000000000008
> Mem abort info:
>    ESR = 0x0000000096000004
>    EC = 0x25: DABT (current EL), IL = 32 bits
>    SET = 0, FnV = 0
>    EA = 0, S1PTW = 0
>    FSC = 0x04: level 0 translation fault
>  Data abort info:
>    ISV = 0, ISS = 0x00000004, ISS2 = 0x00000000
>    CM = 0, WnR = 0, TnD = 0, TagAccess = 0
>    GCS = 0, Overlay = 0, DirtyBit = 0, Xs = 0
>  user pgtable: 4k pages, 48-bit VAs, pgdp=00000001076e5000
>  [0000000000000008] pgd=0000000000000000, p4d=0000000000000000
>  Internal error: Oops: 0000000096000004 [#1] PREEMPT SMP
>  Modules linked in: scmi_perf_domain(-) scmi_module scmi_core
>  CPU: 0 PID: 231 Comm: rmmod Not tainted 6.7.0-00084-gb4b1f27d3b83-dirty #15
>  Hardware name: linux,dummy-virt (DT)
>  pstate: 61400005 (nZCv daif +PAN -UAO -TCO +DIT -SSBS BTYPE=--)
>  pc : scmi_perf_domain_remove+0x28/0x70 [scmi_perf_domain]
>  lr : scmi_perf_domain_remove+0x28/0x70 [scmi_perf_domain]
>  sp : ffff80008393bc10
>  x29: ffff80008393bc10 x28: ffff0000875a8000 x27: 0000000000000000
>  x26: 0000000000000000 x25: 0000000000000000 x24: 0000000000000000
>  x23: ffff00008030c090 x22: ffff00008032d490 x21: ffff80007b287050
>  x20: 0000000000000000 x19: ffff00008032d410 x18: 0000000000000000
>  x17: 0000000000000000 x16: 0000000000000000 x15: 0000000000000000
>  x14: 8ba0696d05013a2f x13: 0000000000000000 x12: 0000000000000002
>  x11: 0101010101010101 x10: ffff00008510cff8 x9 : ffff800080a6797c
>  x8 : 0101010101010101 x7 : 7f7f7f7f7f7f7f7f x6 : fefefeff6364626d
>  x5 : 8080808000000000 x4 : 0000000000000020 x3 : 00000000553a3dc1
>  x2 : ffff0000875a8000 x1 : ffff0000875a8000 x0 : ffff800082ffa048
>  Call trace:
>   scmi_perf_domain_remove+0x28/0x70 [scmi_perf_domain]
>   scmi_dev_remove+0x28/0x40 [scmi_core]
>   device_remove+0x54/0x90
>   device_release_driver_internal+0x1dc/0x240
>   driver_detach+0x58/0xa8
>   bus_remove_driver+0x78/0x108
>   driver_unregister+0x38/0x70
>   scmi_driver_unregister+0x28/0x180 [scmi_core]
>   scmi_perf_domain_driver_exit+0x18/0xb78 [scmi_perf_domain]
>   __arm64_sys_delete_module+0x1a8/0x2c0
>   invoke_syscall+0x50/0x128
>   el0_svc_common.constprop.0+0x48/0xf0
>   do_el0_svc+0x24/0x38
>   el0_svc+0x34/0xb8
>   el0t_64_sync_handler+0x100/0x130
>   el0t_64_sync+0x190/0x198
>  Code: a90153f3 f9403c14 f9414800 955f8a05 (b9400a80)
>  ---[ end trace 0000000000000000 ]---
>
> Cc: Sudeep Holla <sudeep.holla@arm.com>
> Cc: Ulf Hansson <ulf.hansson@linaro.org>
> Fixes: 2af23ceb8624 ("pmdomain: arm: Add the SCMI performance domain")
> Signed-off-by: Cristian Marussi <cristian.marussi@arm.com>

Applied for fixes - and by amending the commit message a bit,
according to suggestions from Sudeep, thanks!

Kind regards
Uffe


> ---
> I suppose the probe does NOT bail out with an error because this DT config has
> to be supported, right ?
> ---
>  drivers/pmdomain/arm/scmi_perf_domain.c | 3 +++
>  1 file changed, 3 insertions(+)
>
> diff --git a/drivers/pmdomain/arm/scmi_perf_domain.c b/drivers/pmdomain/arm/scmi_perf_domain.c
> index 709bbc448fad..d7ef46ccd9b8 100644
> --- a/drivers/pmdomain/arm/scmi_perf_domain.c
> +++ b/drivers/pmdomain/arm/scmi_perf_domain.c
> @@ -159,6 +159,9 @@ static void scmi_perf_domain_remove(struct scmi_device *sdev)
>         struct genpd_onecell_data *scmi_pd_data = dev_get_drvdata(dev);
>         int i;
>
> +       if (!scmi_pd_data)
> +               return;
> +
>         of_genpd_del_provider(dev->of_node);
>
>         for (i = 0; i < scmi_pd_data->num_domains; i++)
> --
> 2.43.0
>
diff mbox series

Patch

diff --git a/drivers/pmdomain/arm/scmi_perf_domain.c b/drivers/pmdomain/arm/scmi_perf_domain.c
index 709bbc448fad..d7ef46ccd9b8 100644
--- a/drivers/pmdomain/arm/scmi_perf_domain.c
+++ b/drivers/pmdomain/arm/scmi_perf_domain.c
@@ -159,6 +159,9 @@  static void scmi_perf_domain_remove(struct scmi_device *sdev)
 	struct genpd_onecell_data *scmi_pd_data = dev_get_drvdata(dev);
 	int i;
 
+	if (!scmi_pd_data)
+		return;
+
 	of_genpd_del_provider(dev->of_node);
 
 	for (i = 0; i < scmi_pd_data->num_domains; i++)