diff mbox series

soc: imx8m: Probe the SoC driver late

Message ID 20240925160026.84091-1-marex@denx.de (mailing list archive)
State New, archived
Headers show
Series soc: imx8m: Probe the SoC driver late | expand

Commit Message

Marek Vasut Sept. 25, 2024, 4 p.m. UTC
With driver_async_probe=* on kernel command line, the following trace is
produced because on i.MX8M Plus hardware because the soc-imx8m.c driver
calls of_clk_get_by_name() which returns -EPROBE_DEFER because the clock
driver is not yet probed. This was not detected during regular testing
without driver_async_probe.

Attempt to fix it by probing the SoC driver late, but I don't think that
is the correct approach here.

"
------------[ cut here ]------------
WARNING: CPU: 1 PID: 1 at drivers/soc/imx/soc-imx8m.c:115 imx8mm_soc_revision+0xdc/0x180
CPU: 1 UID: 0 PID: 1 Comm: swapper/0 Not tainted 6.11.0-next-20240924-00002-g2062bb554dea #603
Hardware name: DH electronics i.MX8M Plus DHCOM Premium Developer Kit (3) (DT)
pstate: 20000005 (nzCv daif -PAN -UAO -TCO -DIT -SSBS BTYPE=--)
pc : imx8mm_soc_revision+0xdc/0x180
lr : imx8mm_soc_revision+0xd0/0x180
sp : ffff8000821fbcc0
x29: ffff8000821fbce0 x28: 0000000000000000 x27: ffff800081810120
x26: ffff8000818a9970 x25: 0000000000000006 x24: 0000000000824311
x23: ffff8000817f42c8 x22: ffff0000df8be210 x21: fffffffffffffdfb
x20: ffff800082780000 x19: 0000000000000001 x18: ffffffffffffffff
x17: ffff800081fff418 x16: ffff8000823e1000 x15: ffff0000c03b65e8
x14: ffff0000c00051b0 x13: ffff800082790000 x12: 0000000000000801
x11: ffff80008278ffff x10: ffff80008209d3a6 x9 : ffff80008062e95c
x8 : ffff8000821fb9a0 x7 : 0000000000000000 x6 : 00000000000080e3
x5 : ffff0000df8c03d8 x4 : 0000000000000000 x3 : 0000000000000000
x2 : 0000000000000000 x1 : fffffffffffffdfb x0 : fffffffffffffdfb
Call trace:
 imx8mm_soc_revision+0xdc/0x180
 imx8_soc_init+0xb0/0x1e0
 do_one_initcall+0x94/0x1a8
 kernel_init_freeable+0x240/0x2a8
 kernel_init+0x28/0x140
 ret_from_fork+0x10/0x20
---[ end trace 0000000000000000 ]---
SoC: i.MX8MP revision 1.1
"

Signed-off-by: Marek Vasut <marex@denx.de>
---
Cc: AngeloGioacchino Del Regno <angelogioacchino.delregno@collabora.com>
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: Fabio Estevam <festevam@gmail.com>
Cc: Jeff Johnson <quic_jjohnson@quicinc.com>
Cc: Neil Armstrong <neil.armstrong@linaro.org>
Cc: Pengutronix Kernel Team <kernel@pengutronix.de>
Cc: Saravana Kannan <saravanak@google.com>
Cc: Sascha Hauer <s.hauer@pengutronix.de>
Cc: Shawn Guo <shawnguo@kernel.org>
Cc: imx@lists.linux.dev
Cc: linux-arm-kernel@lists.infradead.org
Cc: linux-kernel@vger.kernel.org
---
 drivers/soc/imx/soc-imx8m.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

Comments

Arnd Bergmann Sept. 25, 2024, 4:04 p.m. UTC | #1
On Wed, Sep 25, 2024, at 16:00, Marek Vasut wrote:
> With driver_async_probe=* on kernel command line, the following trace is
> produced because on i.MX8M Plus hardware because the soc-imx8m.c driver
> calls of_clk_get_by_name() which returns -EPROBE_DEFER because the clock
> driver is not yet probed. This was not detected during regular testing
> without driver_async_probe.
>
> Attempt to fix it by probing the SoC driver late, but I don't think that
> is the correct approach here.

I think the correct fix would be to propagate the -EPROBE_DEFER
and return that from imx8_soc_init(), so it gets retried again
after the clock driver.

      Arnd
Marek Vasut Sept. 25, 2024, 4:09 p.m. UTC | #2
On 9/25/24 6:04 PM, Arnd Bergmann wrote:
> On Wed, Sep 25, 2024, at 16:00, Marek Vasut wrote:
>> With driver_async_probe=* on kernel command line, the following trace is
>> produced because on i.MX8M Plus hardware because the soc-imx8m.c driver
>> calls of_clk_get_by_name() which returns -EPROBE_DEFER because the clock
>> driver is not yet probed. This was not detected during regular testing
>> without driver_async_probe.
>>
>> Attempt to fix it by probing the SoC driver late, but I don't think that
>> is the correct approach here.
> 
> I think the correct fix would be to propagate the -EPROBE_DEFER
> and return that from imx8_soc_init(), so it gets retried again
> after the clock driver.
I already tried that, but if I return -EPROBE_DEFER from 
device_initcall, it doesn't get retriggered . I suspect EPROBE_DEFER 
works only for proper drivers ?
Arnd Bergmann Sept. 25, 2024, 4:31 p.m. UTC | #3
On Wed, Sep 25, 2024, at 16:09, Marek Vasut wrote:
> On 9/25/24 6:04 PM, Arnd Bergmann wrote:
>> On Wed, Sep 25, 2024, at 16:00, Marek Vasut wrote:
>>> With driver_async_probe=* on kernel command line, the following trace is
>>> produced because on i.MX8M Plus hardware because the soc-imx8m.c driver
>>> calls of_clk_get_by_name() which returns -EPROBE_DEFER because the clock
>>> driver is not yet probed. This was not detected during regular testing
>>> without driver_async_probe.
>>>
>>> Attempt to fix it by probing the SoC driver late, but I don't think that
>>> is the correct approach here.
>> 
>> I think the correct fix would be to propagate the -EPROBE_DEFER
>> and return that from imx8_soc_init(), so it gets retried again
>> after the clock driver.
> I already tried that, but if I return -EPROBE_DEFER from 
> device_initcall, it doesn't get retriggered . I suspect EPROBE_DEFER 
> works only for proper drivers ?

Right, of course. And unfortunately it can't just register to
the fsl,imx8mm-anatop/fsl,imx8mm-ocotp/... nodes because they
all have a driver already.

On the other hand, making it a late_initcall() defeats the
purpose of the driver because then it can't be used by other
drivers with soc_device_match(), resulting in incorrect
behavior when another driver relies on this to enable
a chip revision specific workaround.

     Arnd
Marek Vasut Sept. 25, 2024, 5:07 p.m. UTC | #4
On 9/25/24 6:31 PM, Arnd Bergmann wrote:
> On Wed, Sep 25, 2024, at 16:09, Marek Vasut wrote:
>> On 9/25/24 6:04 PM, Arnd Bergmann wrote:
>>> On Wed, Sep 25, 2024, at 16:00, Marek Vasut wrote:
>>>> With driver_async_probe=* on kernel command line, the following trace is
>>>> produced because on i.MX8M Plus hardware because the soc-imx8m.c driver
>>>> calls of_clk_get_by_name() which returns -EPROBE_DEFER because the clock
>>>> driver is not yet probed. This was not detected during regular testing
>>>> without driver_async_probe.
>>>>
>>>> Attempt to fix it by probing the SoC driver late, but I don't think that
>>>> is the correct approach here.
>>>
>>> I think the correct fix would be to propagate the -EPROBE_DEFER
>>> and return that from imx8_soc_init(), so it gets retried again
>>> after the clock driver.
>> I already tried that, but if I return -EPROBE_DEFER from
>> device_initcall, it doesn't get retriggered . I suspect EPROBE_DEFER
>> works only for proper drivers ?
> 
> Right, of course. And unfortunately it can't just register to
> the fsl,imx8mm-anatop/fsl,imx8mm-ocotp/... nodes because they
> all have a driver already.
> 
> On the other hand, making it a late_initcall() defeats the
> purpose of the driver because then it can't be used by other
> drivers with soc_device_match(), resulting in incorrect
> behavior when another driver relies on this to enable
> a chip revision specific workaround.
I know, I am unsure how to proceed with this. Convert this to 
platform_driver or some such and probe it early maybe ?
Saravana Kannan Sept. 25, 2024, 6:48 p.m. UTC | #5
On Wed, Sep 25, 2024 at 10:07 AM Marek Vasut <marex@denx.de> wrote:
>
> On 9/25/24 6:31 PM, Arnd Bergmann wrote:
> > On Wed, Sep 25, 2024, at 16:09, Marek Vasut wrote:
> >> On 9/25/24 6:04 PM, Arnd Bergmann wrote:
> >>> On Wed, Sep 25, 2024, at 16:00, Marek Vasut wrote:
> >>>> With driver_async_probe=* on kernel command line, the following trace is
> >>>> produced because on i.MX8M Plus hardware because the soc-imx8m.c driver
> >>>> calls of_clk_get_by_name() which returns -EPROBE_DEFER because the clock
> >>>> driver is not yet probed. This was not detected during regular testing
> >>>> without driver_async_probe.
> >>>>
> >>>> Attempt to fix it by probing the SoC driver late, but I don't think that
> >>>> is the correct approach here.
> >>>
> >>> I think the correct fix would be to propagate the -EPROBE_DEFER
> >>> and return that from imx8_soc_init(), so it gets retried again
> >>> after the clock driver.
> >> I already tried that, but if I return -EPROBE_DEFER from
> >> device_initcall, it doesn't get retriggered . I suspect EPROBE_DEFER
> >> works only for proper drivers ?
> >
> > Right, of course. And unfortunately it can't just register to
> > the fsl,imx8mm-anatop/fsl,imx8mm-ocotp/... nodes because they
> > all have a driver already.

Arnd,

Can't we change this to add a platform device and a platform driver in
the initcall? And then the driver can return -EPROBE_DEFER if it can't
get the clock yet?

> >
> > On the other hand, making it a late_initcall() defeats the
> > purpose of the driver because then it can't be used by other
> > drivers with soc_device_match(), resulting in incorrect
> > behavior when another driver relies on this to enable
> > a chip revision specific workaround.

We could have soc_device_match() return -EPROBE_DEFER if no soc device
has been registered yet.

For cases where it's already working without any changes, we shouldn't
see any new -EPROBE_DEFER return values. But for cases like what Marek
is trying to do, it should work properly. He might have to fix bad
driver code where they remap the error instead of returning it as is.

On a tangential note, the soc framework seems to be yet another
framework violating the bus vs class thing. If it's a bus, then you
need to have a probe. Otherwise, just make it a class. Might be too
much to fix at this point, but might be good to keep this in mind if
you plan to write more frameworks or redo soc framework at some point
:)

See Slide 20:
https://lpc.events/event/18/contributions/1734/

> I know, I am unsure how to proceed with this. Convert this to
> platform_driver or some such and probe it early maybe ?

Marek,

Thanks for trying out drive_async_probe=* and trying to fix boards/drivers.

The issue you are facing and the proper way to handle it was covered
in my talk at LPC 2024 in Slide 18:
https://lpc.events/event/18/contributions/1734/

All the benefits of fw_devlink are only present for drivers that use a
device-driver model. And yes, in this case we should convert this
driver into a platform device/driver if it's possible.

Thanks,
Saravana
Arnd Bergmann Sept. 25, 2024, 8 p.m. UTC | #6
On Wed, Sep 25, 2024, at 18:48, Saravana Kannan wrote:
> On Wed, Sep 25, 2024 at 10:07 AM Marek Vasut <marex@denx.de> wrote:
>> >
>> > Right, of course. And unfortunately it can't just register to
>> > the fsl,imx8mm-anatop/fsl,imx8mm-ocotp/... nodes because they
>> > all have a driver already.
>
> Can't we change this to add a platform device and a platform driver in
> the initcall? And then the driver can return -EPROBE_DEFER if it can't
> get the clock yet?

Yes, good idea. So the initcall would still use of_match_node()
to see if wants to be loaded and then either bail early or
call platform_create_bundle() to register the driver and the
device.

>> > On the other hand, making it a late_initcall() defeats the
>> > purpose of the driver because then it can't be used by other
>> > drivers with soc_device_match(), resulting in incorrect
>> > behavior when another driver relies on this to enable
>> > a chip revision specific workaround.
>
> We could have soc_device_match() return -EPROBE_DEFER if no soc device
> has been registered yet.
>
> For cases where it's already working without any changes, we shouldn't
> see any new -EPROBE_DEFER return values. But for cases like what Marek
> is trying to do, it should work properly. He might have to fix bad
> driver code where they remap the error instead of returning it as is.

Right.

> On a tangential note, the soc framework seems to be yet another
> framework violating the bus vs class thing. If it's a bus, then you
> need to have a probe. Otherwise, just make it a class. Might be too
> much to fix at this point, but might be good to keep this in mind if
> you plan to write more frameworks or redo soc framework at some point
> :)
>
> See Slide 20:
> https://lpc.events/event/18/contributions/1734/

Very useful, I don't think I've seen this explained like this
in the past. It's probably not easy to change now since I'm
sure there is existing userspace looking at /sys/bus/soc, but
I can at least make sure I'll follow these when reviewing new
bus_type or class submissions.

     Arnd
Saravana Kannan Sept. 25, 2024, 8:04 p.m. UTC | #7
On Wed, Sep 25, 2024 at 1:00 PM Arnd Bergmann <arnd@arndb.de> wrote:
>
> On Wed, Sep 25, 2024, at 18:48, Saravana Kannan wrote:
> > On Wed, Sep 25, 2024 at 10:07 AM Marek Vasut <marex@denx.de> wrote:
> >> >
> >> > Right, of course. And unfortunately it can't just register to
> >> > the fsl,imx8mm-anatop/fsl,imx8mm-ocotp/... nodes because they
> >> > all have a driver already.
> >
> > Can't we change this to add a platform device and a platform driver in
> > the initcall? And then the driver can return -EPROBE_DEFER if it can't
> > get the clock yet?
>
> Yes, good idea. So the initcall would still use of_match_node()
> to see if wants to be loaded and then either bail early or
> call platform_create_bundle() to register the driver and the
> device.
>
> >> > On the other hand, making it a late_initcall() defeats the
> >> > purpose of the driver because then it can't be used by other
> >> > drivers with soc_device_match(), resulting in incorrect
> >> > behavior when another driver relies on this to enable
> >> > a chip revision specific workaround.
> >
> > We could have soc_device_match() return -EPROBE_DEFER if no soc device
> > has been registered yet.
> >
> > For cases where it's already working without any changes, we shouldn't
> > see any new -EPROBE_DEFER return values. But for cases like what Marek
> > is trying to do, it should work properly. He might have to fix bad
> > driver code where they remap the error instead of returning it as is.
>
> Right.

Sweet!

Marek, now you know what to do :)

>
> > On a tangential note, the soc framework seems to be yet another
> > framework violating the bus vs class thing. If it's a bus, then you
> > need to have a probe. Otherwise, just make it a class. Might be too
> > much to fix at this point, but might be good to keep this in mind if
> > you plan to write more frameworks or redo soc framework at some point
> > :)
> >
> > See Slide 20:
> > https://lpc.events/event/18/contributions/1734/
>
> Very useful, I don't think I've seen this explained like this
> in the past. It's probably not easy to change now since I'm
> sure there is existing userspace looking at /sys/bus/soc, but
> I can at least make sure I'll follow these when reviewing new
> bus_type or class submissions.

Great to hear it was useful!

-Saravana
Marek Vasut Sept. 25, 2024, 10:11 p.m. UTC | #8
On 9/25/24 8:48 PM, Saravana Kannan wrote:

Hi,

>>> On the other hand, making it a late_initcall() defeats the
>>> purpose of the driver because then it can't be used by other
>>> drivers with soc_device_match(), resulting in incorrect
>>> behavior when another driver relies on this to enable
>>> a chip revision specific workaround.
> 
> We could have soc_device_match() return -EPROBE_DEFER if no soc device
> has been registered yet.
> 
> For cases where it's already working without any changes, we shouldn't
> see any new -EPROBE_DEFER return values. But for cases like what Marek
> is trying to do, it should work properly. He might have to fix bad
> driver code where they remap the error instead of returning it as is.

I sent out V2.

> On a tangential note, the soc framework seems to be yet another
> framework violating the bus vs class thing. If it's a bus, then you
> need to have a probe. Otherwise, just make it a class. Might be too
> much to fix at this point, but might be good to keep this in mind if
> you plan to write more frameworks or redo soc framework at some point
> :)
> 
> See Slide 20:
> https://lpc.events/event/18/contributions/1734/
> 
>> I know, I am unsure how to proceed with this. Convert this to
>> platform_driver or some such and probe it early maybe ?
> 
> Marek,
> 
> Thanks for trying out drive_async_probe=* and trying to fix boards/drivers.

That was one of the most useful things I picked at LPC this year, thanks.

> The issue you are facing and the proper way to handle it was covered
> in my talk at LPC 2024 in Slide 18:
> https://lpc.events/event/18/contributions/1734/
> 
> All the benefits of fw_devlink are only present for drivers that use a
> device-driver model. And yes, in this case we should convert this
> driver into a platform device/driver if it's possible.
Done in V2, thanks.
diff mbox series

Patch

diff --git a/drivers/soc/imx/soc-imx8m.c b/drivers/soc/imx/soc-imx8m.c
index fe111bae38c8e..04d6ab1073b25 100644
--- a/drivers/soc/imx/soc-imx8m.c
+++ b/drivers/soc/imx/soc-imx8m.c
@@ -251,6 +251,6 @@  static int __init imx8_soc_init(void)
 	kfree(soc_dev_attr);
 	return ret;
 }
-device_initcall(imx8_soc_init);
+late_initcall(imx8_soc_init);
 MODULE_DESCRIPTION("NXP i.MX8M SoC driver");
 MODULE_LICENSE("GPL");