diff mbox series

[1/2] irqchip: irq-meson-gpio: make it possible to build as a module

Message ID 20201020072532.949137-2-narmstrong@baylibre.com (mailing list archive)
State New, archived
Headers show
Series irq-meson-gpio: make it possible to build as a module | expand

Commit Message

Neil Armstrong Oct. 20, 2020, 7:25 a.m. UTC
In order to reduce the kernel Image size on multi-platform distributions,
make it possible to build the Amlogic GPIO IRQ controller as a module
by switching it to a platform driver.

Signed-off-by: Neil Armstrong <narmstrong@baylibre.com>
---
 drivers/irqchip/Kconfig          |  5 +-
 drivers/irqchip/irq-meson-gpio.c | 89 ++++++++++++++++++++------------
 2 files changed, 59 insertions(+), 35 deletions(-)

Comments

Kevin Hilman Oct. 20, 2020, 6:23 p.m. UTC | #1
Neil Armstrong <narmstrong@baylibre.com> writes:

> In order to reduce the kernel Image size on multi-platform distributions,
> make it possible to build the Amlogic GPIO IRQ controller as a module
> by switching it to a platform driver.
>
> Signed-off-by: Neil Armstrong <narmstrong@baylibre.com>

Reviewed-by: Kevin Hilman <khilman@baylibre.com>
Tested-by: Kevin Hilman <khilman@baylibre.com>

Tested as a module on meson-sm1-khadas-vim3l where the wired networking
uses GPIO IRQs.

Kevin
Marc Zyngier May 24, 2021, 10:11 a.m. UTC | #2
On Fri, 21 May 2021 10:47:48 +0100,
Lee Jones <lee.jones@linaro.org> wrote:
> 
> [1  <text/plain; UTF-8 (quoted-printable)>]
> On Tue, 20 Oct 2020 at 19:23, Kevin Hilman <khilman@baylibre.com> wrote:
> 
> > Neil Armstrong <narmstrong@baylibre.com> writes:
> >
> > > In order to reduce the kernel Image size on multi-platform distributions,
> > > make it possible to build the Amlogic GPIO IRQ controller as a module
> > > by switching it to a platform driver.
> > >
> > > Signed-off-by: Neil Armstrong <narmstrong@baylibre.com>
> >
> > Reviewed-by: Kevin Hilman <khilman@baylibre.com>
> > Tested-by: Kevin Hilman <khilman@baylibre.com>
> >
> > Tested as a module on meson-sm1-khadas-vim3l where the wired networking
> > uses GPIO IRQs.
> >
> 
> Good morning Neil, Kevin,
> 
> What happened to this set in the end?  I still don't see it in Mainline.

Last time I tried this patch, it broke my test setup in non-obvious
ways. Has someone checked that the issue I reported back then has been
resolved now that fw_devlink is more usable?

Thanks,

	M.
Kevin Hilman May 25, 2021, 4:17 p.m. UTC | #3
Marc Zyngier <maz@kernel.org> writes:

> On Fri, 21 May 2021 10:47:48 +0100,
> Lee Jones <lee.jones@linaro.org> wrote:
>> 
>> [1  <text/plain; UTF-8 (quoted-printable)>]
>> On Tue, 20 Oct 2020 at 19:23, Kevin Hilman <khilman@baylibre.com> wrote:
>> 
>> > Neil Armstrong <narmstrong@baylibre.com> writes:
>> >
>> > > In order to reduce the kernel Image size on multi-platform distributions,
>> > > make it possible to build the Amlogic GPIO IRQ controller as a module
>> > > by switching it to a platform driver.
>> > >
>> > > Signed-off-by: Neil Armstrong <narmstrong@baylibre.com>
>> >
>> > Reviewed-by: Kevin Hilman <khilman@baylibre.com>
>> > Tested-by: Kevin Hilman <khilman@baylibre.com>
>> >
>> > Tested as a module on meson-sm1-khadas-vim3l where the wired networking
>> > uses GPIO IRQs.
>> >
>> 
>> Good morning Neil, Kevin,
>> 
>> What happened to this set in the end?  I still don't see it in Mainline.
>
> Last time I tried this patch, it broke my test setup in non-obvious
> ways. Has someone checked that the issue I reported back then has been
> resolved now that fw_devlink is more usable?

Not yet, that's (still) on my TODO list... (well, TBH, I forgot about
it, but I'll have another look.)

Kevin
Lee Jones May 25, 2021, 4:30 p.m. UTC | #4
On Tue, 25 May 2021, Kevin Hilman wrote:

> Marc Zyngier <maz@kernel.org> writes:
> 
> > On Fri, 21 May 2021 10:47:48 +0100,
> > Lee Jones <lee.jones@linaro.org> wrote:
> >> 
> >> [1  <text/plain; UTF-8 (quoted-printable)>]
> >> On Tue, 20 Oct 2020 at 19:23, Kevin Hilman <khilman@baylibre.com> wrote:
> >> 
> >> > Neil Armstrong <narmstrong@baylibre.com> writes:
> >> >
> >> > > In order to reduce the kernel Image size on multi-platform distributions,
> >> > > make it possible to build the Amlogic GPIO IRQ controller as a module
> >> > > by switching it to a platform driver.
> >> > >
> >> > > Signed-off-by: Neil Armstrong <narmstrong@baylibre.com>
> >> >
> >> > Reviewed-by: Kevin Hilman <khilman@baylibre.com>
> >> > Tested-by: Kevin Hilman <khilman@baylibre.com>
> >> >
> >> > Tested as a module on meson-sm1-khadas-vim3l where the wired networking
> >> > uses GPIO IRQs.
> >> >
> >> 
> >> Good morning Neil, Kevin,
> >> 
> >> What happened to this set in the end?  I still don't see it in Mainline.
> >
> > Last time I tried this patch, it broke my test setup in non-obvious
> > ways. Has someone checked that the issue I reported back then has been
> > resolved now that fw_devlink is more usable?
> 
> Not yet, that's (still) on my TODO list... (well, TBH, I forgot about
> it, but I'll have another look.)

Superstar!  Thanks Kevin.
Kevin Hilman June 14, 2021, 10:30 p.m. UTC | #5
Marc Zyngier <maz@kernel.org> writes:

> On Fri, 21 May 2021 10:47:48 +0100,
> Lee Jones <lee.jones@linaro.org> wrote:
>> 
>> [1  <text/plain; UTF-8 (quoted-printable)>]
>> On Tue, 20 Oct 2020 at 19:23, Kevin Hilman <khilman@baylibre.com> wrote:
>> 
>> > Neil Armstrong <narmstrong@baylibre.com> writes:
>> >
>> > > In order to reduce the kernel Image size on multi-platform distributions,
>> > > make it possible to build the Amlogic GPIO IRQ controller as a module
>> > > by switching it to a platform driver.
>> > >
>> > > Signed-off-by: Neil Armstrong <narmstrong@baylibre.com>
>> >
>> > Reviewed-by: Kevin Hilman <khilman@baylibre.com>
>> > Tested-by: Kevin Hilman <khilman@baylibre.com>
>> >
>> > Tested as a module on meson-sm1-khadas-vim3l where the wired networking
>> > uses GPIO IRQs.
>> >
>> 
>> Good morning Neil, Kevin,
>> 
>> What happened to this set in the end?  I still don't see it in Mainline.
>
> Last time I tried this patch, it broke my test setup in non-obvious
> ways. Has someone checked that the issue I reported back then has been
> resolved now that fw_devlink is more usable?

OK, after much anticipation (and much delay due to me forgetting about
this), I just gave this series a spin again on top of v5.13-rc6, and it
seems to work fine with `fw_devlink=on`

I started with your config[1] and accepting all the defaults of any new
configs.  IOW, I ran: yes '' | make oldconfig after copying your config
to .config.

With that it seems to be working fine for me.

Right after boot (and before network probes) I see module loaded, but no
users yet in /proc/interrupts:

/ # uname -a
Linux buildroot 5.13.0-rc6-00002-g679c8e852942 #5 SMP PREEMPT Mon Jun 14 15:08:40 PDT 2021 aarch64 GNU/Linux
/ # lsmod |grep gpio
irq_meson_gpio         20480  0
leds_gpio              16384  0
/ # cat /proc/interrupts
           CPU0       CPU1       CPU2       CPU3
  9:          0          0          0          0     GICv2  25 Level     vgic
 11:          0          0          0          0     GICv2  30 Level     kvm guest ptimer
 12:          0          0          0          0     GICv2  27 Level     kvm guest vtimer
 13:       1416        916        534       1421     GICv2  26 Level     arch_timer
 15:          5          0          0          0     GICv2  89 Edge      dw_hdmi_top_irq, ff600000.hdmi-tx
 22:         38          0          0          0     GICv2 225 Edge      ttyAML0
 23:         20          0          0          0     GICv2 227 Edge      ff805000.i2c
 25:          2          0          0          0     GICv2 232 Edge      ff809000.adc
 28:        322          0          0          0     GICv2  35 Edge      meson
 31:          0          0          0          0     GICv2 222 Edge      ffe05000.sd
 32:        787          0          0          0     GICv2 223 Edge      ffe07000.mmc
 34:          0          0          0          0     GICv2 194 Level     panfrost-job
 35:          0          0          0          0     GICv2 193 Level     panfrost-mmu
 36:          3          0          0          0     GICv2 192 Level     panfrost-gpu
 39:          0          0          0          0     GICv2  63 Level     ff400000.usb, ff400000.usb
 40:         32          0          0          0     GICv2  62 Level     xhci-hcd:usb1
IPI0:       425        544        664        925       Rescheduling interrupts
IPI1:        86        166        269        136       Function call interrupts
IPI2:         0          0          0          0       CPU stop interrupts
IPI3:         0          0          0          0       CPU stop (for crash dump) interrupts
IPI4:         0          0          0          0       Timer broadcast interrupts
IPI5:         0          0          0          0       IRQ work interrupts
IPI6:         0          0          0          0       CPU wake-up interrupts
Err:          0

So then I init the network interface and PHY works, DHCP works etc.

/ # udhcpc
udhcpc: started, v1.31.1
[  102.250449] meson8b-dwmac ff3f0000.ethernet eth0: PHY [0.0:00] driver [RTL8211F Gigabit Ethernet] (irq=37)
[  102.256413] meson8b-dwmac ff3f0000.ethernet eth0: Register MEM_TYPE_PAGE_POOL RxQ-0
[  102.269433] meson8b-dwmac ff3f0000.ethernet eth0: No Safety Features support found
[  102.271357] meson8b-dwmac ff3f0000.ethernet eth0: PTP not supported by HW
[  102.278493] meson8b-dwmac ff3f0000.ethernet eth0: configuring for phy/rgmii link mode
udhcpc: sending discover
[  104.743301] meson8b-dwmac ff3f0000.ethernet eth0: Link is Up - 100Mbps/Full - flow control rx/tx
[  104.746470] IPv6: ADDRCONF(NETDEV_CHANGE): eth0: link becomes ready
udhcpc: sending discover
udhcpc: sending select for 192.168.0.122
udhcpc: lease of 192.168.0.122 obtained, lease time 600
deleting routers
adding dns 192.168.0.254
adding dns 192.168.0.254
/ # cat /proc/interrupts
           CPU0       CPU1       CPU2       CPU3
  9:          0          0          0          0     GICv2  25 Level     vgic
 11:          0          0          0          0     GICv2  30 Level     kvm guest ptimer
 12:          0          0          0          0     GICv2  27 Level     kvm guest vtimer
 13:       1575       1018        604       1588     GICv2  26 Level     arch_timer
 14:          8          0          0          0     GICv2  40 Level     eth0
 15:          5          0          0          0     GICv2  89 Edge      dw_hdmi_top_irq, ff600000.hdmi-tx
 22:        132          0          0          0     GICv2 225 Edge      ttyAML0
 23:         20          0          0          0     GICv2 227 Edge      ff805000.i2c
 25:          2          0          0          0     GICv2 232 Edge      ff809000.adc
 28:        322          0          0          0     GICv2  35 Edge      meson
 31:          0          0          0          0     GICv2 222 Edge      ffe05000.sd
 32:        787          0          0          0     GICv2 223 Edge      ffe07000.mmc
 34:          0          0          0          0     GICv2 194 Level     panfrost-job
 35:          0          0          0          0     GICv2 193 Level     panfrost-mmu
 36:          3          0          0          0     GICv2 192 Level     panfrost-gpu
 37:          2          0          0          0  meson-gpio-irqchip  26 Level     0.0:00
 39:          0          0          0          0     GICv2  63 Level     ff400000.usb, ff400000.usb
 40:         32          0          0          0     GICv2  62 Level     xhci-hcd:usb1
IPI0:       476        567        720        956       Rescheduling interrupts
IPI1:        93        166        270        137       Function call interrupts
IPI2:         0          0          0          0       CPU stop interrupts
IPI3:         0          0          0          0       CPU stop (for crash dump) interrupts
IPI4:         0          0          0          0       Timer broadcast interrupts
IPI5:         0          0          0          0       IRQ work interrupts
IPI6:         0          0          0          0       CPU wake-up interrupts
Err:          0
/ #


Kevin


[1] http://www.loen.fr/tmp/Config.full-arm64
Lee Jones July 13, 2021, 9:05 a.m. UTC | #6
On Mon, 14 Jun 2021, Kevin Hilman wrote:

> Marc Zyngier <maz@kernel.org> writes:
> 
> > On Fri, 21 May 2021 10:47:48 +0100,
> > Lee Jones <lee.jones@linaro.org> wrote:
> >> 
> >> [1  <text/plain; UTF-8 (quoted-printable)>]
> >> On Tue, 20 Oct 2020 at 19:23, Kevin Hilman <khilman@baylibre.com> wrote:
> >> 
> >> > Neil Armstrong <narmstrong@baylibre.com> writes:
> >> >
> >> > > In order to reduce the kernel Image size on multi-platform distributions,
> >> > > make it possible to build the Amlogic GPIO IRQ controller as a module
> >> > > by switching it to a platform driver.
> >> > >
> >> > > Signed-off-by: Neil Armstrong <narmstrong@baylibre.com>
> >> >
> >> > Reviewed-by: Kevin Hilman <khilman@baylibre.com>
> >> > Tested-by: Kevin Hilman <khilman@baylibre.com>
> >> >
> >> > Tested as a module on meson-sm1-khadas-vim3l where the wired networking
> >> > uses GPIO IRQs.
> >> >
> >> 
> >> Good morning Neil, Kevin,
> >> 
> >> What happened to this set in the end?  I still don't see it in Mainline.
> >
> > Last time I tried this patch, it broke my test setup in non-obvious
> > ways. Has someone checked that the issue I reported back then has been
> > resolved now that fw_devlink is more usable?
> 
> OK, after much anticipation (and much delay due to me forgetting about
> this), I just gave this series a spin again on top of v5.13-rc6, and it
> seems to work fine with `fw_devlink=on`
> 
> I started with your config[1] and accepting all the defaults of any new
> configs.  IOW, I ran: yes '' | make oldconfig after copying your config
> to .config.
> 
> With that it seems to be working fine for me.
> 
> Right after boot (and before network probes) I see module loaded, but no
> users yet in /proc/interrupts:
> 
> / # uname -a
> Linux buildroot 5.13.0-rc6-00002-g679c8e852942 #5 SMP PREEMPT Mon Jun 14 15:08:40 PDT 2021 aarch64 GNU/Linux
> / # lsmod |grep gpio
> irq_meson_gpio         20480  0
> leds_gpio              16384  0
> / # cat /proc/interrupts
>            CPU0       CPU1       CPU2       CPU3
>   9:          0          0          0          0     GICv2  25 Level     vgic
>  11:          0          0          0          0     GICv2  30 Level     kvm guest ptimer
>  12:          0          0          0          0     GICv2  27 Level     kvm guest vtimer
>  13:       1416        916        534       1421     GICv2  26 Level     arch_timer
>  15:          5          0          0          0     GICv2  89 Edge      dw_hdmi_top_irq, ff600000.hdmi-tx
>  22:         38          0          0          0     GICv2 225 Edge      ttyAML0
>  23:         20          0          0          0     GICv2 227 Edge      ff805000.i2c
>  25:          2          0          0          0     GICv2 232 Edge      ff809000.adc
>  28:        322          0          0          0     GICv2  35 Edge      meson
>  31:          0          0          0          0     GICv2 222 Edge      ffe05000.sd
>  32:        787          0          0          0     GICv2 223 Edge      ffe07000.mmc
>  34:          0          0          0          0     GICv2 194 Level     panfrost-job
>  35:          0          0          0          0     GICv2 193 Level     panfrost-mmu
>  36:          3          0          0          0     GICv2 192 Level     panfrost-gpu
>  39:          0          0          0          0     GICv2  63 Level     ff400000.usb, ff400000.usb
>  40:         32          0          0          0     GICv2  62 Level     xhci-hcd:usb1
> IPI0:       425        544        664        925       Rescheduling interrupts
> IPI1:        86        166        269        136       Function call interrupts
> IPI2:         0          0          0          0       CPU stop interrupts
> IPI3:         0          0          0          0       CPU stop (for crash dump) interrupts
> IPI4:         0          0          0          0       Timer broadcast interrupts
> IPI5:         0          0          0          0       IRQ work interrupts
> IPI6:         0          0          0          0       CPU wake-up interrupts
> Err:          0
> 
> So then I init the network interface and PHY works, DHCP works etc.
> 
> / # udhcpc
> udhcpc: started, v1.31.1
> [  102.250449] meson8b-dwmac ff3f0000.ethernet eth0: PHY [0.0:00] driver [RTL8211F Gigabit Ethernet] (irq=37)
> [  102.256413] meson8b-dwmac ff3f0000.ethernet eth0: Register MEM_TYPE_PAGE_POOL RxQ-0
> [  102.269433] meson8b-dwmac ff3f0000.ethernet eth0: No Safety Features support found
> [  102.271357] meson8b-dwmac ff3f0000.ethernet eth0: PTP not supported by HW
> [  102.278493] meson8b-dwmac ff3f0000.ethernet eth0: configuring for phy/rgmii link mode
> udhcpc: sending discover
> [  104.743301] meson8b-dwmac ff3f0000.ethernet eth0: Link is Up - 100Mbps/Full - flow control rx/tx
> [  104.746470] IPv6: ADDRCONF(NETDEV_CHANGE): eth0: link becomes ready
> udhcpc: sending discover
> udhcpc: sending select for 192.168.0.122
> udhcpc: lease of 192.168.0.122 obtained, lease time 600
> deleting routers
> adding dns 192.168.0.254
> adding dns 192.168.0.254
> / # cat /proc/interrupts
>            CPU0       CPU1       CPU2       CPU3
>   9:          0          0          0          0     GICv2  25 Level     vgic
>  11:          0          0          0          0     GICv2  30 Level     kvm guest ptimer
>  12:          0          0          0          0     GICv2  27 Level     kvm guest vtimer
>  13:       1575       1018        604       1588     GICv2  26 Level     arch_timer
>  14:          8          0          0          0     GICv2  40 Level     eth0
>  15:          5          0          0          0     GICv2  89 Edge      dw_hdmi_top_irq, ff600000.hdmi-tx
>  22:        132          0          0          0     GICv2 225 Edge      ttyAML0
>  23:         20          0          0          0     GICv2 227 Edge      ff805000.i2c
>  25:          2          0          0          0     GICv2 232 Edge      ff809000.adc
>  28:        322          0          0          0     GICv2  35 Edge      meson
>  31:          0          0          0          0     GICv2 222 Edge      ffe05000.sd
>  32:        787          0          0          0     GICv2 223 Edge      ffe07000.mmc
>  34:          0          0          0          0     GICv2 194 Level     panfrost-job
>  35:          0          0          0          0     GICv2 193 Level     panfrost-mmu
>  36:          3          0          0          0     GICv2 192 Level     panfrost-gpu
>  37:          2          0          0          0  meson-gpio-irqchip  26 Level     0.0:00
>  39:          0          0          0          0     GICv2  63 Level     ff400000.usb, ff400000.usb
>  40:         32          0          0          0     GICv2  62 Level     xhci-hcd:usb1
> IPI0:       476        567        720        956       Rescheduling interrupts
> IPI1:        93        166        270        137       Function call interrupts
> IPI2:         0          0          0          0       CPU stop interrupts
> IPI3:         0          0          0          0       CPU stop (for crash dump) interrupts
> IPI4:         0          0          0          0       Timer broadcast interrupts
> IPI5:         0          0          0          0       IRQ work interrupts
> IPI6:         0          0          0          0       CPU wake-up interrupts
> Err:          0
> / #
> 
> Kevin
> 
> [1] http://www.loen.fr/tmp/Config.full-arm64

Thanks Kevin.

Now that -rc1 is out, hopefully Marc can assign some time to this.
Marc Zyngier Aug. 3, 2021, 9:44 a.m. UTC | #7
On Mon, 14 Jun 2021 23:30:22 +0100,
Kevin Hilman <khilman@baylibre.com> wrote:
> 
> Marc Zyngier <maz@kernel.org> writes:
> 
> > On Fri, 21 May 2021 10:47:48 +0100,
> > Lee Jones <lee.jones@linaro.org> wrote:
> >> 
> >> [1  <text/plain; UTF-8 (quoted-printable)>]
> >> On Tue, 20 Oct 2020 at 19:23, Kevin Hilman <khilman@baylibre.com> wrote:
> >> 
> >> > Neil Armstrong <narmstrong@baylibre.com> writes:
> >> >
> >> > > In order to reduce the kernel Image size on multi-platform distributions,
> >> > > make it possible to build the Amlogic GPIO IRQ controller as a module
> >> > > by switching it to a platform driver.
> >> > >
> >> > > Signed-off-by: Neil Armstrong <narmstrong@baylibre.com>
> >> >
> >> > Reviewed-by: Kevin Hilman <khilman@baylibre.com>
> >> > Tested-by: Kevin Hilman <khilman@baylibre.com>
> >> >
> >> > Tested as a module on meson-sm1-khadas-vim3l where the wired networking
> >> > uses GPIO IRQs.
> >> >
> >> 
> >> Good morning Neil, Kevin,
> >> 
> >> What happened to this set in the end?  I still don't see it in Mainline.
> >
> > Last time I tried this patch, it broke my test setup in non-obvious
> > ways. Has someone checked that the issue I reported back then has been
> > resolved now that fw_devlink is more usable?
> 
> OK, after much anticipation (and much delay due to me forgetting about
> this), I just gave this series a spin again on top of v5.13-rc6, and it
> seems to work fine with `fw_devlink=on`
> 
> I started with your config[1] and accepting all the defaults of any new
> configs.  IOW, I ran: yes '' | make oldconfig after copying your config
> to .config.
> 
> With that it seems to be working fine for me.
> 
> Right after boot (and before network probes) I see module loaded, but no
> users yet in /proc/interrupts:
> 
> / # uname -a
> Linux buildroot 5.13.0-rc6-00002-g679c8e852942 #5 SMP PREEMPT Mon Jun 14 15:08:40 PDT 2021 aarch64 GNU/Linux
> / # lsmod |grep gpio
> irq_meson_gpio         20480  0
> leds_gpio              16384  0
> / # cat /proc/interrupts
>            CPU0       CPU1       CPU2       CPU3
>   9:          0          0          0          0     GICv2  25 Level     vgic
>  11:          0          0          0          0     GICv2  30 Level     kvm guest ptimer
>  12:          0          0          0          0     GICv2  27 Level     kvm guest vtimer
>  13:       1416        916        534       1421     GICv2  26 Level     arch_timer
>  15:          5          0          0          0     GICv2  89 Edge      dw_hdmi_top_irq, ff600000.hdmi-tx
>  22:         38          0          0          0     GICv2 225 Edge      ttyAML0
>  23:         20          0          0          0     GICv2 227 Edge      ff805000.i2c
>  25:          2          0          0          0     GICv2 232 Edge      ff809000.adc
>  28:        322          0          0          0     GICv2  35 Edge      meson
>  31:          0          0          0          0     GICv2 222 Edge      ffe05000.sd
>  32:        787          0          0          0     GICv2 223 Edge      ffe07000.mmc
>  34:          0          0          0          0     GICv2 194 Level     panfrost-job
>  35:          0          0          0          0     GICv2 193 Level     panfrost-mmu
>  36:          3          0          0          0     GICv2 192 Level     panfrost-gpu
>  39:          0          0          0          0     GICv2  63 Level     ff400000.usb, ff400000.usb
>  40:         32          0          0          0     GICv2  62 Level     xhci-hcd:usb1
> IPI0:       425        544        664        925       Rescheduling interrupts
> IPI1:        86        166        269        136       Function call interrupts
> IPI2:         0          0          0          0       CPU stop interrupts
> IPI3:         0          0          0          0       CPU stop (for crash dump) interrupts
> IPI4:         0          0          0          0       Timer broadcast interrupts
> IPI5:         0          0          0          0       IRQ work interrupts
> IPI6:         0          0          0          0       CPU wake-up interrupts
> Err:          0
> 
> So then I init the network interface and PHY works, DHCP works etc.
> 
> / # udhcpc
> udhcpc: started, v1.31.1
> [  102.250449] meson8b-dwmac ff3f0000.ethernet eth0: PHY [0.0:00] driver [RTL8211F Gigabit Ethernet] (irq=37)
> [  102.256413] meson8b-dwmac ff3f0000.ethernet eth0: Register MEM_TYPE_PAGE_POOL RxQ-0
> [  102.269433] meson8b-dwmac ff3f0000.ethernet eth0: No Safety Features support found
> [  102.271357] meson8b-dwmac ff3f0000.ethernet eth0: PTP not supported by HW
> [  102.278493] meson8b-dwmac ff3f0000.ethernet eth0: configuring for phy/rgmii link mode
> udhcpc: sending discover
> [  104.743301] meson8b-dwmac ff3f0000.ethernet eth0: Link is Up - 100Mbps/Full - flow control rx/tx
> [  104.746470] IPv6: ADDRCONF(NETDEV_CHANGE): eth0: link becomes ready
> udhcpc: sending discover
> udhcpc: sending select for 192.168.0.122
> udhcpc: lease of 192.168.0.122 obtained, lease time 600
> deleting routers
> adding dns 192.168.0.254
> adding dns 192.168.0.254
> / # cat /proc/interrupts
>            CPU0       CPU1       CPU2       CPU3
>   9:          0          0          0          0     GICv2  25 Level     vgic
>  11:          0          0          0          0     GICv2  30 Level     kvm guest ptimer
>  12:          0          0          0          0     GICv2  27 Level     kvm guest vtimer
>  13:       1575       1018        604       1588     GICv2  26 Level     arch_timer
>  14:          8          0          0          0     GICv2  40 Level     eth0
>  15:          5          0          0          0     GICv2  89 Edge      dw_hdmi_top_irq, ff600000.hdmi-tx
>  22:        132          0          0          0     GICv2 225 Edge      ttyAML0
>  23:         20          0          0          0     GICv2 227 Edge      ff805000.i2c
>  25:          2          0          0          0     GICv2 232 Edge      ff809000.adc
>  28:        322          0          0          0     GICv2  35 Edge      meson
>  31:          0          0          0          0     GICv2 222 Edge      ffe05000.sd
>  32:        787          0          0          0     GICv2 223 Edge      ffe07000.mmc
>  34:          0          0          0          0     GICv2 194 Level     panfrost-job
>  35:          0          0          0          0     GICv2 193 Level     panfrost-mmu
>  36:          3          0          0          0     GICv2 192 Level     panfrost-gpu
>  37:          2          0          0          0  meson-gpio-irqchip  26 Level     0.0:00
>  39:          0          0          0          0     GICv2  63 Level     ff400000.usb, ff400000.usb
>  40:         32          0          0          0     GICv2  62 Level     xhci-hcd:usb1
> IPI0:       476        567        720        956       Rescheduling interrupts
> IPI1:        93        166        270        137       Function call interrupts
> IPI2:         0          0          0          0       CPU stop interrupts
> IPI3:         0          0          0          0       CPU stop (for crash dump) interrupts
> IPI4:         0          0          0          0       Timer broadcast interrupts
> IPI5:         0          0          0          0       IRQ work interrupts
> IPI6:         0          0          0          0       CPU wake-up interrupts
> Err:          0
> / #

This thing keeps failing on my end. It only works if I force the
irqchip module to be present before the MDIO module is loaded. Here's
an example:

root@tiger-roach:~# modprobe mdio_mux_meson_g12a
[  125.871544] libphy: mdio_mux: probed
[  125.882575] g12a-mdio_mux ff64c000.mdio-multiplexer: Error: Failed to register MDIO bus for child /soc/bus@ff600000/mdio-multiplexer@4c000/mdio@0
[  125.892630] libphy: mdio_mux: probed

Trying to bring up the Ethernet interface will fail. Note that there
was no attempt to load the irqchip driver.

root@tiger-roach:~# modprobe -r mdio_mux_meson_g12a
root@tiger-roach:~# modprobe irq-meson-gpio 
[  144.983344] meson-gpio-intc ffd0f080.interrupt-controller: 100 to 8 gpio interrupt mux initialized
root@tiger-roach:~# modprobe mdio_mux_meson_g12a
[  150.376464] libphy: mdio_mux: probed
[  150.391039] libphy: mdio_mux: probed

And it now works.

Is it a MDIO issue? a fw_devlink issue? No idea. But I'd really like
to see this addressed before taking this patch, as everything works
just fine as long as the irqchip is built in (which on its own could
well pure luck).

Saravana, could you please have a look from a fw_devlink perspective?

Thanks,

	M.
Marc Zyngier Aug. 3, 2021, 9:51 a.m. UTC | #8
On Tue, 03 Aug 2021 10:44:34 +0100,
Marc Zyngier <maz@kernel.org> wrote:

[...]

> This thing keeps failing on my end. It only works if I force the
> irqchip module to be present before the MDIO module is loaded. Here's
> an example:
> 
> root@tiger-roach:~# modprobe mdio_mux_meson_g12a
> [  125.871544] libphy: mdio_mux: probed
> [  125.882575] g12a-mdio_mux ff64c000.mdio-multiplexer: Error: Failed to register MDIO bus for child /soc/bus@ff600000/mdio-multiplexer@4c000/mdio@0
> [  125.892630] libphy: mdio_mux: probed
> 
> Trying to bring up the Ethernet interface will fail. Note that there
> was no attempt to load the irqchip driver.
> 
> root@tiger-roach:~# modprobe -r mdio_mux_meson_g12a
> root@tiger-roach:~# modprobe irq-meson-gpio 
> [  144.983344] meson-gpio-intc ffd0f080.interrupt-controller: 100 to 8 gpio interrupt mux initialized
> root@tiger-roach:~# modprobe mdio_mux_meson_g12a
> [  150.376464] libphy: mdio_mux: probed
> [  150.391039] libphy: mdio_mux: probed
> 
> And it now works.

An additional source of amusement is that this patch allows the
irqchip to be removed from the kernel. It becomes really fun when you
have live interrupts...

	M.
Neil Armstrong Aug. 3, 2021, 9:51 a.m. UTC | #9
On 03/08/2021 11:44, Marc Zyngier wrote:
> On Mon, 14 Jun 2021 23:30:22 +0100,
> Kevin Hilman <khilman@baylibre.com> wrote:
>>
>> Marc Zyngier <maz@kernel.org> writes:
>>
>>> On Fri, 21 May 2021 10:47:48 +0100,
>>> Lee Jones <lee.jones@linaro.org> wrote:
>>>>
>>>> [1  <text/plain; UTF-8 (quoted-printable)>]
>>>> On Tue, 20 Oct 2020 at 19:23, Kevin Hilman <khilman@baylibre.com> wrote:
>>>>
>>>>> Neil Armstrong <narmstrong@baylibre.com> writes:
>>>>>
>>>>>> In order to reduce the kernel Image size on multi-platform distributions,
>>>>>> make it possible to build the Amlogic GPIO IRQ controller as a module
>>>>>> by switching it to a platform driver.
>>>>>>
>>>>>> Signed-off-by: Neil Armstrong <narmstrong@baylibre.com>
>>>>>
>>>>> Reviewed-by: Kevin Hilman <khilman@baylibre.com>
>>>>> Tested-by: Kevin Hilman <khilman@baylibre.com>
>>>>>
>>>>> Tested as a module on meson-sm1-khadas-vim3l where the wired networking
>>>>> uses GPIO IRQs.
>>>>>
>>>>
>>>> Good morning Neil, Kevin,
>>>>
>>>> What happened to this set in the end?  I still don't see it in Mainline.
>>>
>>> Last time I tried this patch, it broke my test setup in non-obvious
>>> ways. Has someone checked that the issue I reported back then has been
>>> resolved now that fw_devlink is more usable?
>>
>> OK, after much anticipation (and much delay due to me forgetting about
>> this), I just gave this series a spin again on top of v5.13-rc6, and it
>> seems to work fine with `fw_devlink=on`
>>
>> I started with your config[1] and accepting all the defaults of any new
>> configs.  IOW, I ran: yes '' | make oldconfig after copying your config
>> to .config.
>>
>> With that it seems to be working fine for me.
>>
>> Right after boot (and before network probes) I see module loaded, but no
>> users yet in /proc/interrupts:
>>
>> / # uname -a
>> Linux buildroot 5.13.0-rc6-00002-g679c8e852942 #5 SMP PREEMPT Mon Jun 14 15:08:40 PDT 2021 aarch64 GNU/Linux
>> / # lsmod |grep gpio
>> irq_meson_gpio         20480  0
>> leds_gpio              16384  0
>> / # cat /proc/interrupts
>>            CPU0       CPU1       CPU2       CPU3
>>   9:          0          0          0          0     GICv2  25 Level     vgic
>>  11:          0          0          0          0     GICv2  30 Level     kvm guest ptimer
>>  12:          0          0          0          0     GICv2  27 Level     kvm guest vtimer
>>  13:       1416        916        534       1421     GICv2  26 Level     arch_timer
>>  15:          5          0          0          0     GICv2  89 Edge      dw_hdmi_top_irq, ff600000.hdmi-tx
>>  22:         38          0          0          0     GICv2 225 Edge      ttyAML0
>>  23:         20          0          0          0     GICv2 227 Edge      ff805000.i2c
>>  25:          2          0          0          0     GICv2 232 Edge      ff809000.adc
>>  28:        322          0          0          0     GICv2  35 Edge      meson
>>  31:          0          0          0          0     GICv2 222 Edge      ffe05000.sd
>>  32:        787          0          0          0     GICv2 223 Edge      ffe07000.mmc
>>  34:          0          0          0          0     GICv2 194 Level     panfrost-job
>>  35:          0          0          0          0     GICv2 193 Level     panfrost-mmu
>>  36:          3          0          0          0     GICv2 192 Level     panfrost-gpu
>>  39:          0          0          0          0     GICv2  63 Level     ff400000.usb, ff400000.usb
>>  40:         32          0          0          0     GICv2  62 Level     xhci-hcd:usb1
>> IPI0:       425        544        664        925       Rescheduling interrupts
>> IPI1:        86        166        269        136       Function call interrupts
>> IPI2:         0          0          0          0       CPU stop interrupts
>> IPI3:         0          0          0          0       CPU stop (for crash dump) interrupts
>> IPI4:         0          0          0          0       Timer broadcast interrupts
>> IPI5:         0          0          0          0       IRQ work interrupts
>> IPI6:         0          0          0          0       CPU wake-up interrupts
>> Err:          0
>>
>> So then I init the network interface and PHY works, DHCP works etc.
>>
>> / # udhcpc
>> udhcpc: started, v1.31.1
>> [  102.250449] meson8b-dwmac ff3f0000.ethernet eth0: PHY [0.0:00] driver [RTL8211F Gigabit Ethernet] (irq=37)
>> [  102.256413] meson8b-dwmac ff3f0000.ethernet eth0: Register MEM_TYPE_PAGE_POOL RxQ-0
>> [  102.269433] meson8b-dwmac ff3f0000.ethernet eth0: No Safety Features support found
>> [  102.271357] meson8b-dwmac ff3f0000.ethernet eth0: PTP not supported by HW
>> [  102.278493] meson8b-dwmac ff3f0000.ethernet eth0: configuring for phy/rgmii link mode
>> udhcpc: sending discover
>> [  104.743301] meson8b-dwmac ff3f0000.ethernet eth0: Link is Up - 100Mbps/Full - flow control rx/tx
>> [  104.746470] IPv6: ADDRCONF(NETDEV_CHANGE): eth0: link becomes ready
>> udhcpc: sending discover
>> udhcpc: sending select for 192.168.0.122
>> udhcpc: lease of 192.168.0.122 obtained, lease time 600
>> deleting routers
>> adding dns 192.168.0.254
>> adding dns 192.168.0.254
>> / # cat /proc/interrupts
>>            CPU0       CPU1       CPU2       CPU3
>>   9:          0          0          0          0     GICv2  25 Level     vgic
>>  11:          0          0          0          0     GICv2  30 Level     kvm guest ptimer
>>  12:          0          0          0          0     GICv2  27 Level     kvm guest vtimer
>>  13:       1575       1018        604       1588     GICv2  26 Level     arch_timer
>>  14:          8          0          0          0     GICv2  40 Level     eth0
>>  15:          5          0          0          0     GICv2  89 Edge      dw_hdmi_top_irq, ff600000.hdmi-tx
>>  22:        132          0          0          0     GICv2 225 Edge      ttyAML0
>>  23:         20          0          0          0     GICv2 227 Edge      ff805000.i2c
>>  25:          2          0          0          0     GICv2 232 Edge      ff809000.adc
>>  28:        322          0          0          0     GICv2  35 Edge      meson
>>  31:          0          0          0          0     GICv2 222 Edge      ffe05000.sd
>>  32:        787          0          0          0     GICv2 223 Edge      ffe07000.mmc
>>  34:          0          0          0          0     GICv2 194 Level     panfrost-job
>>  35:          0          0          0          0     GICv2 193 Level     panfrost-mmu
>>  36:          3          0          0          0     GICv2 192 Level     panfrost-gpu
>>  37:          2          0          0          0  meson-gpio-irqchip  26 Level     0.0:00
>>  39:          0          0          0          0     GICv2  63 Level     ff400000.usb, ff400000.usb
>>  40:         32          0          0          0     GICv2  62 Level     xhci-hcd:usb1
>> IPI0:       476        567        720        956       Rescheduling interrupts
>> IPI1:        93        166        270        137       Function call interrupts
>> IPI2:         0          0          0          0       CPU stop interrupts
>> IPI3:         0          0          0          0       CPU stop (for crash dump) interrupts
>> IPI4:         0          0          0          0       Timer broadcast interrupts
>> IPI5:         0          0          0          0       IRQ work interrupts
>> IPI6:         0          0          0          0       CPU wake-up interrupts
>> Err:          0
>> / #
> 
> This thing keeps failing on my end. It only works if I force the
> irqchip module to be present before the MDIO module is loaded. Here's
> an example:
> 
> root@tiger-roach:~# modprobe mdio_mux_meson_g12a
> [  125.871544] libphy: mdio_mux: probed
> [  125.882575] g12a-mdio_mux ff64c000.mdio-multiplexer: Error: Failed to register MDIO bus for child /soc/bus@ff600000/mdio-multiplexer@4c000/mdio@0
> [  125.892630] libphy: mdio_mux: probed

This error is caused because the PHY in the mdio@0 sub-bus cannot get the IRQ from the irq-meson-gpio driver,
so it may be a dependency issue causing the PHY not probing the irq-meson-gpio node.

IRQ management from PHY drivers is weird, because the core request the IRQ instead of the PHY driver.
maybe causing fw_devlink issue not detecting the link.

Neil

> 
> Trying to bring up the Ethernet interface will fail. Note that there
> was no attempt to load the irqchip driver.
> 
> root@tiger-roach:~# modprobe -r mdio_mux_meson_g12a
> root@tiger-roach:~# modprobe irq-meson-gpio 
> [  144.983344] meson-gpio-intc ffd0f080.interrupt-controller: 100 to 8 gpio interrupt mux initialized
> root@tiger-roach:~# modprobe mdio_mux_meson_g12a
> [  150.376464] libphy: mdio_mux: probed
> [  150.391039] libphy: mdio_mux: probed
> 
> And it now works.
> 
> Is it a MDIO issue? a fw_devlink issue? No idea. But I'd really like
> to see this addressed before taking this patch, as everything works
> just fine as long as the irqchip is built in (which on its own could
> well pure luck).
> 
> Saravana, could you please have a look from a fw_devlink perspective?
> 
> Thanks,
> 
> 	M.
>
Saravana Kannan Aug. 4, 2021, 1:36 a.m. UTC | #10
On Tue, Aug 3, 2021 at 2:44 AM Marc Zyngier <maz@kernel.org> wrote:
>
> On Mon, 14 Jun 2021 23:30:22 +0100,
> Kevin Hilman <khilman@baylibre.com> wrote:
> >
> > Marc Zyngier <maz@kernel.org> writes:
> >
> > > On Fri, 21 May 2021 10:47:48 +0100,
> > > Lee Jones <lee.jones@linaro.org> wrote:
> > >>
> > >> [1  <text/plain; UTF-8 (quoted-printable)>]
> > >> On Tue, 20 Oct 2020 at 19:23, Kevin Hilman <khilman@baylibre.com> wrote:
> > >>
> > >> > Neil Armstrong <narmstrong@baylibre.com> writes:
> > >> >
> > >> > > In order to reduce the kernel Image size on multi-platform distributions,
> > >> > > make it possible to build the Amlogic GPIO IRQ controller as a module
> > >> > > by switching it to a platform driver.
> > >> > >
> > >> > > Signed-off-by: Neil Armstrong <narmstrong@baylibre.com>
> > >> >
> > >> > Reviewed-by: Kevin Hilman <khilman@baylibre.com>
> > >> > Tested-by: Kevin Hilman <khilman@baylibre.com>
> > >> >
> > >> > Tested as a module on meson-sm1-khadas-vim3l where the wired networking
> > >> > uses GPIO IRQs.
> > >> >
> > >>
> > >> Good morning Neil, Kevin,
> > >>
> > >> What happened to this set in the end?  I still don't see it in Mainline.
> > >
> > > Last time I tried this patch, it broke my test setup in non-obvious
> > > ways. Has someone checked that the issue I reported back then has been
> > > resolved now that fw_devlink is more usable?
> >
> > OK, after much anticipation (and much delay due to me forgetting about
> > this), I just gave this series a spin again on top of v5.13-rc6, and it
> > seems to work fine with `fw_devlink=on`
> >
> > I started with your config[1] and accepting all the defaults of any new
> > configs.  IOW, I ran: yes '' | make oldconfig after copying your config
> > to .config.
> >
> > With that it seems to be working fine for me.
> >
> > Right after boot (and before network probes) I see module loaded, but no
> > users yet in /proc/interrupts:
> >
> > / # uname -a
> > Linux buildroot 5.13.0-rc6-00002-g679c8e852942 #5 SMP PREEMPT Mon Jun 14 15:08:40 PDT 2021 aarch64 GNU/Linux
> > / # lsmod |grep gpio
> > irq_meson_gpio         20480  0
> > leds_gpio              16384  0
> > / # cat /proc/interrupts
> >            CPU0       CPU1       CPU2       CPU3
> >   9:          0          0          0          0     GICv2  25 Level     vgic
> >  11:          0          0          0          0     GICv2  30 Level     kvm guest ptimer
> >  12:          0          0          0          0     GICv2  27 Level     kvm guest vtimer
> >  13:       1416        916        534       1421     GICv2  26 Level     arch_timer
> >  15:          5          0          0          0     GICv2  89 Edge      dw_hdmi_top_irq, ff600000.hdmi-tx
> >  22:         38          0          0          0     GICv2 225 Edge      ttyAML0
> >  23:         20          0          0          0     GICv2 227 Edge      ff805000.i2c
> >  25:          2          0          0          0     GICv2 232 Edge      ff809000.adc
> >  28:        322          0          0          0     GICv2  35 Edge      meson
> >  31:          0          0          0          0     GICv2 222 Edge      ffe05000.sd
> >  32:        787          0          0          0     GICv2 223 Edge      ffe07000.mmc
> >  34:          0          0          0          0     GICv2 194 Level     panfrost-job
> >  35:          0          0          0          0     GICv2 193 Level     panfrost-mmu
> >  36:          3          0          0          0     GICv2 192 Level     panfrost-gpu
> >  39:          0          0          0          0     GICv2  63 Level     ff400000.usb, ff400000.usb
> >  40:         32          0          0          0     GICv2  62 Level     xhci-hcd:usb1
> > IPI0:       425        544        664        925       Rescheduling interrupts
> > IPI1:        86        166        269        136       Function call interrupts
> > IPI2:         0          0          0          0       CPU stop interrupts
> > IPI3:         0          0          0          0       CPU stop (for crash dump) interrupts
> > IPI4:         0          0          0          0       Timer broadcast interrupts
> > IPI5:         0          0          0          0       IRQ work interrupts
> > IPI6:         0          0          0          0       CPU wake-up interrupts
> > Err:          0
> >
> > So then I init the network interface and PHY works, DHCP works etc.
> >
> > / # udhcpc
> > udhcpc: started, v1.31.1
> > [  102.250449] meson8b-dwmac ff3f0000.ethernet eth0: PHY [0.0:00] driver [RTL8211F Gigabit Ethernet] (irq=37)
> > [  102.256413] meson8b-dwmac ff3f0000.ethernet eth0: Register MEM_TYPE_PAGE_POOL RxQ-0
> > [  102.269433] meson8b-dwmac ff3f0000.ethernet eth0: No Safety Features support found
> > [  102.271357] meson8b-dwmac ff3f0000.ethernet eth0: PTP not supported by HW
> > [  102.278493] meson8b-dwmac ff3f0000.ethernet eth0: configuring for phy/rgmii link mode
> > udhcpc: sending discover
> > [  104.743301] meson8b-dwmac ff3f0000.ethernet eth0: Link is Up - 100Mbps/Full - flow control rx/tx
> > [  104.746470] IPv6: ADDRCONF(NETDEV_CHANGE): eth0: link becomes ready
> > udhcpc: sending discover
> > udhcpc: sending select for 192.168.0.122
> > udhcpc: lease of 192.168.0.122 obtained, lease time 600
> > deleting routers
> > adding dns 192.168.0.254
> > adding dns 192.168.0.254
> > / # cat /proc/interrupts
> >            CPU0       CPU1       CPU2       CPU3
> >   9:          0          0          0          0     GICv2  25 Level     vgic
> >  11:          0          0          0          0     GICv2  30 Level     kvm guest ptimer
> >  12:          0          0          0          0     GICv2  27 Level     kvm guest vtimer
> >  13:       1575       1018        604       1588     GICv2  26 Level     arch_timer
> >  14:          8          0          0          0     GICv2  40 Level     eth0
> >  15:          5          0          0          0     GICv2  89 Edge      dw_hdmi_top_irq, ff600000.hdmi-tx
> >  22:        132          0          0          0     GICv2 225 Edge      ttyAML0
> >  23:         20          0          0          0     GICv2 227 Edge      ff805000.i2c
> >  25:          2          0          0          0     GICv2 232 Edge      ff809000.adc
> >  28:        322          0          0          0     GICv2  35 Edge      meson
> >  31:          0          0          0          0     GICv2 222 Edge      ffe05000.sd
> >  32:        787          0          0          0     GICv2 223 Edge      ffe07000.mmc
> >  34:          0          0          0          0     GICv2 194 Level     panfrost-job
> >  35:          0          0          0          0     GICv2 193 Level     panfrost-mmu
> >  36:          3          0          0          0     GICv2 192 Level     panfrost-gpu
> >  37:          2          0          0          0  meson-gpio-irqchip  26 Level     0.0:00
> >  39:          0          0          0          0     GICv2  63 Level     ff400000.usb, ff400000.usb
> >  40:         32          0          0          0     GICv2  62 Level     xhci-hcd:usb1
> > IPI0:       476        567        720        956       Rescheduling interrupts
> > IPI1:        93        166        270        137       Function call interrupts
> > IPI2:         0          0          0          0       CPU stop interrupts
> > IPI3:         0          0          0          0       CPU stop (for crash dump) interrupts
> > IPI4:         0          0          0          0       Timer broadcast interrupts
> > IPI5:         0          0          0          0       IRQ work interrupts
> > IPI6:         0          0          0          0       CPU wake-up interrupts
> > Err:          0
> > / #
>
> This thing keeps failing on my end. It only works if I force the
> irqchip module to be present before the MDIO module is loaded. Here's
> an example:
>
> root@tiger-roach:~# modprobe mdio_mux_meson_g12a
> [  125.871544] libphy: mdio_mux: probed
> [  125.882575] g12a-mdio_mux ff64c000.mdio-multiplexer: Error: Failed to register MDIO bus for child /soc/bus@ff600000/mdio-multiplexer@4c000/mdio@0
> [  125.892630] libphy: mdio_mux: probed
>
> Trying to bring up the Ethernet interface will fail. Note that there
> was no attempt to load the irqchip driver.
>
> root@tiger-roach:~# modprobe -r mdio_mux_meson_g12a
> root@tiger-roach:~# modprobe irq-meson-gpio
> [  144.983344] meson-gpio-intc ffd0f080.interrupt-controller: 100 to 8 gpio interrupt mux initialized
> root@tiger-roach:~# modprobe mdio_mux_meson_g12a
> [  150.376464] libphy: mdio_mux: probed
> [  150.391039] libphy: mdio_mux: probed
>
> And it now works.
>
> Is it a MDIO issue? a fw_devlink issue? No idea. But I'd really like
> to see this addressed before taking this patch, as everything works
> just fine as long as the irqchip is built in (which on its own could
> well pure luck).
>
> Saravana, could you please have a look from a fw_devlink perspective?

Sigh... I spent several hours looking at this and wrote up an analysis
and then realized I might be looking at the wrong DT files.

Marc, can you point me to the board file in upstream that corresponds
to the platform in which you see this issue? I'm not asking for [1],
but the actual final .dts (not .dtsi) file that corresponds to the
platform/board/system.

Based on your error messages, it's failing for mdio@0 which
corresponds to ext_mdio. But none of the board dts files in upstream
have a compatible property for "ext_mdio". Which means fw_devlink
_should_ propagate the gpio_intc IRQ dependency all the way up to
eth_phy.

Also, in the failing case, can you run:
ls -ld supplier:*

in the /sys/devices/....<something>/ folder that corresponds to the
"eth_phy: mdio-multiplexer@4c000" DT node and tell me what it shows?

Thanks,
Saravana

[1] - arch/arm64/boot/dts/amlogic/meson-g12-common.dtsi
Saravana Kannan Aug. 4, 2021, 2:11 a.m. UTC | #11
On Tue, Aug 3, 2021 at 2:51 AM Marc Zyngier <maz@kernel.org> wrote:
>
> On Tue, 03 Aug 2021 10:44:34 +0100,
> Marc Zyngier <maz@kernel.org> wrote:
>
> [...]
>
> > This thing keeps failing on my end. It only works if I force the
> > irqchip module to be present before the MDIO module is loaded. Here's
> > an example:
> >
> > root@tiger-roach:~# modprobe mdio_mux_meson_g12a
> > [  125.871544] libphy: mdio_mux: probed
> > [  125.882575] g12a-mdio_mux ff64c000.mdio-multiplexer: Error: Failed to register MDIO bus for child /soc/bus@ff600000/mdio-multiplexer@4c000/mdio@0
> > [  125.892630] libphy: mdio_mux: probed
> >
> > Trying to bring up the Ethernet interface will fail. Note that there
> > was no attempt to load the irqchip driver.
> >
> > root@tiger-roach:~# modprobe -r mdio_mux_meson_g12a
> > root@tiger-roach:~# modprobe irq-meson-gpio
> > [  144.983344] meson-gpio-intc ffd0f080.interrupt-controller: 100 to 8 gpio interrupt mux initialized
> > root@tiger-roach:~# modprobe mdio_mux_meson_g12a
> > [  150.376464] libphy: mdio_mux: probed
> > [  150.391039] libphy: mdio_mux: probed
> >
> > And it now works.
>
> An additional source of amusement is that this patch allows the
> irqchip to be removed from the kernel. It becomes really fun when you
> have live interrupts...

Which is why I wrote IRQCHIP_PLATFORM_DRIVER_BEGIN/END macros. Maybe
those should be used instead?

-Saravana

[1] - https://lore.kernel.org/lkml/20200718000637.3632841-2-saravanak@google.com/
Marc Zyngier Aug. 4, 2021, 8:50 a.m. UTC | #12
On Wed, 04 Aug 2021 02:36:45 +0100,
Saravana Kannan <saravanak@google.com> wrote:

Hi Saravana,

Thanks for looking into this.

[...]

> > Saravana, could you please have a look from a fw_devlink perspective?
> 
> Sigh... I spent several hours looking at this and wrote up an analysis
> and then realized I might be looking at the wrong DT files.
> 
> Marc, can you point me to the board file in upstream that corresponds
> to the platform in which you see this issue? I'm not asking for [1],
> but the actual final .dts (not .dtsi) file that corresponds to the
> platform/board/system.

The platform I can reproduce this on is described in
arch/arm64/boot/dts/amlogic/meson-sm1-khadas-vim3l.dts. It is an
intricate maze of inclusion, node merge and other DT subtleties. I
suggest you look at the decompiled version to get a view of the
result.

> Based on your error messages, it's failing for mdio@0 which
> corresponds to ext_mdio. But none of the board dts files in upstream
> have a compatible property for "ext_mdio". Which means fw_devlink
> _should_ propagate the gpio_intc IRQ dependency all the way up to
> eth_phy.
> 
> Also, in the failing case, can you run:
> ls -ld supplier:*
> 
> in the /sys/devices/....<something>/ folder that corresponds to the
> "eth_phy: mdio-multiplexer@4c000" DT node and tell me what it shows?

Here you go:

root@tiger-roach:~# find /sys/devices/ -name 'supplier*'|grep -i mdio | xargs ls -ld
lrwxrwxrwx 1 root root 0 Aug  4 09:47 /sys/devices/platform/soc/ff600000.bus/ff64c000.mdio-multiplexer/supplier:platform:ff63c000.system-controller:clock-controller -> ../../../../virtual/devlink/platform:ff63c000.system-controller:clock-controller--platform:ff64c000.mdio-multiplexer
lrwxrwxrwx 1 root root 0 Aug  4 09:47 /sys/devices/virtual/devlink/platform:ff63c000.system-controller:clock-controller--platform:ff64c000.mdio-multiplexer/supplier -> ../../../platform/soc/ff600000.bus/ff63c000.bus/ff63c000.system-controller/ff63c000.system-controller:clock-controller

Thanks,

	M.
Saravana Kannan Aug. 4, 2021, 6:20 p.m. UTC | #13
On Wed, Aug 4, 2021 at 1:50 AM Marc Zyngier <maz@kernel.org> wrote:
>
> On Wed, 04 Aug 2021 02:36:45 +0100,
> Saravana Kannan <saravanak@google.com> wrote:
>
> Hi Saravana,
>
> Thanks for looking into this.

You are welcome. I just don't want people to think fw_devlink is broken :)

>
> [...]
>
> > > Saravana, could you please have a look from a fw_devlink perspective?
> >
> > Sigh... I spent several hours looking at this and wrote up an analysis
> > and then realized I might be looking at the wrong DT files.
> >
> > Marc, can you point me to the board file in upstream that corresponds
> > to the platform in which you see this issue? I'm not asking for [1],
> > but the actual final .dts (not .dtsi) file that corresponds to the
> > platform/board/system.
>
> The platform I can reproduce this on is described in
> arch/arm64/boot/dts/amlogic/meson-sm1-khadas-vim3l.dts. It is an
> intricate maze of inclusion, node merge and other DT subtleties. I
> suggest you look at the decompiled version to get a view of the
> result.

Thanks. After decompiling it, it looks something like (stripped a
bunch of reg and address properties and added the labels back):

eth_phy: mdio-multiplexer@4c000 {
        compatible = "amlogic,g12a-mdio-mux";
        clocks = <0x02 0x13 0x1e 0x02 0xb1>;
        clock-names = "pclk\0clkin0\0clkin1";
        mdio-parent-bus = <0x22>;

        ext_mdio: mdio@0 {
                reg = <0x00>;

                ethernet-phy@0 {
                        max-speed = <0x3e8>;
                        interrupt-parent = <0x23>;
                        interrupts = <0x1a 0x08>;
                        phandle = <0x16>;
                };
        };

        int_mdio: mdio@1 {
                ...
        }
}

And phandle 0x23 refers to the gpio_intc interrupt controller with the
modular driver.

> > Based on your error messages, it's failing for mdio@0 which
> > corresponds to ext_mdio. But none of the board dts files in upstream
> > have a compatible property for "ext_mdio". Which means fw_devlink
> > _should_ propagate the gpio_intc IRQ dependency all the way up to
> > eth_phy.
> >
> > Also, in the failing case, can you run:
> > ls -ld supplier:*
> >
> > in the /sys/devices/....<something>/ folder that corresponds to the
> > "eth_phy: mdio-multiplexer@4c000" DT node and tell me what it shows?
>
> Here you go:
>
> root@tiger-roach:~# find /sys/devices/ -name 'supplier*'|grep -i mdio | xargs ls -ld
> lrwxrwxrwx 1 root root 0 Aug  4 09:47 /sys/devices/platform/soc/ff600000.bus/ff64c000.mdio-multiplexer/supplier:platform:ff63c000.system-controller:clock-controller -> ../../../../virtual/devlink/platform:ff63c000.system-controller:clock-controller--platform:ff64c000.mdio-multiplexer

As we discussed over chat, this was taken after the mdio-multiplexer
driver "successfully" probes this device. This will cause
SYNC_STATE_ONLY device links created by fw_devlink to be deleted
(because they are useless after a device probes). So, this doesn't
show the info I was hoping to demonstrate.

In any case, one can see that fw_devlink properly created the device
link for the clocks dependency. So fw_devlink is parsing this node
properly. But it doesn't create a similar probe order enforcing device
link between the mdio-multiplexer and the gpio_intc because the
dependency is only present in a grand child DT node (ethernet-phy@0
under ext_mdio). So fw_devlink is working as intended.

I spent several hours squinting at the code/DT yesterday. Here's what
is going on and causing the problem:

The failing driver in this case is
drivers/net/mdio/mdio-mux-meson-g12a.c. And the only DT node it's
handling is what I pasted above in this email. In the failure case,
the call flow is something like this:

g12a_mdio_mux_probe()
-> mdio_mux_init()
-> of_mdiobus_register(ext_mdio DT node)
-> of_mdiobus_register_phy(ext_mdio DT node)
-> several calls deep fwnode_mdiobus_phy_device_register(ethernet_phy DT node)
-> Tried to get the IRQ listed in ethernet_phy and fails with
-EPROBE_DEFER because the IRQ driver isn't loaded yet.

The error is propagated correctly all the way up to of_mdiobus_register(), but
mdio_mux_init() ignores the -EPROBE_DEFER from of_mdiobus_register() and just
continues on with the rest of the stuff and returns success as long as
one of the child nodes (in this case int_mdio) succeeds.

Since the probe returns 0 without really succeeding, networking stuff
just fails badly after this. So, IMO, the real problem is with
mdio_mux_init() not propagating up the -EPROBE_DEFER. I gave Marc a
quick hack (pasted at the end of this email) to test my theory and he
confirmed that it fixes the issue (a few deferred probes later, things
work properly).

Andrew, I don't see any good reason for mdio_mux_init() not
propagating the errors up correctly (at least for EPROBE_DEFER). I'll
send a patch to fix this. Please let me know if there's a reason it
has to stay as-is.

-Saravana

index 110e4ee85785..d973a267151f 100644
--- a/drivers/net/mdio/mdio-mux.c
+++ b/drivers/net/mdio/mdio-mux.c
@@ -170,6 +170,9 @@ int mdio_mux_init(struct device *dev,
                                child_bus_node);
                        mdiobus_free(cb->mii_bus);
                        devm_kfree(dev, cb);
+                       /* Not a final fix. I think it can cause UAF issues. */
+                       mdio_mux_uninit(pb);
+                       return r;
                } else {
                        cb->next = pb->children;
                        pb->children = cb;
Saravana Kannan Aug. 4, 2021, 9:47 p.m. UTC | #14
On Wed, Aug 4, 2021 at 11:20 AM Saravana Kannan <saravanak@google.com> wrote:
>
> On Wed, Aug 4, 2021 at 1:50 AM Marc Zyngier <maz@kernel.org> wrote:
> >
> > On Wed, 04 Aug 2021 02:36:45 +0100,
> > Saravana Kannan <saravanak@google.com> wrote:
> >
> > Hi Saravana,
> >
> > Thanks for looking into this.
>
> You are welcome. I just don't want people to think fw_devlink is broken :)
>
> >
> > [...]
> >
> > > > Saravana, could you please have a look from a fw_devlink perspective?
> > >
> > > Sigh... I spent several hours looking at this and wrote up an analysis
> > > and then realized I might be looking at the wrong DT files.
> > >
> > > Marc, can you point me to the board file in upstream that corresponds
> > > to the platform in which you see this issue? I'm not asking for [1],
> > > but the actual final .dts (not .dtsi) file that corresponds to the
> > > platform/board/system.
> >
> > The platform I can reproduce this on is described in
> > arch/arm64/boot/dts/amlogic/meson-sm1-khadas-vim3l.dts. It is an
> > intricate maze of inclusion, node merge and other DT subtleties. I
> > suggest you look at the decompiled version to get a view of the
> > result.
>
> Thanks. After decompiling it, it looks something like (stripped a
> bunch of reg and address properties and added the labels back):
>
> eth_phy: mdio-multiplexer@4c000 {
>         compatible = "amlogic,g12a-mdio-mux";
>         clocks = <0x02 0x13 0x1e 0x02 0xb1>;
>         clock-names = "pclk\0clkin0\0clkin1";
>         mdio-parent-bus = <0x22>;
>
>         ext_mdio: mdio@0 {
>                 reg = <0x00>;
>
>                 ethernet-phy@0 {
>                         max-speed = <0x3e8>;
>                         interrupt-parent = <0x23>;
>                         interrupts = <0x1a 0x08>;
>                         phandle = <0x16>;
>                 };
>         };
>
>         int_mdio: mdio@1 {
>                 ...
>         }
> }
>
> And phandle 0x23 refers to the gpio_intc interrupt controller with the
> modular driver.
>
> > > Based on your error messages, it's failing for mdio@0 which
> > > corresponds to ext_mdio. But none of the board dts files in upstream
> > > have a compatible property for "ext_mdio". Which means fw_devlink
> > > _should_ propagate the gpio_intc IRQ dependency all the way up to
> > > eth_phy.
> > >
> > > Also, in the failing case, can you run:
> > > ls -ld supplier:*
> > >
> > > in the /sys/devices/....<something>/ folder that corresponds to the
> > > "eth_phy: mdio-multiplexer@4c000" DT node and tell me what it shows?
> >
> > Here you go:
> >
> > root@tiger-roach:~# find /sys/devices/ -name 'supplier*'|grep -i mdio | xargs ls -ld
> > lrwxrwxrwx 1 root root 0 Aug  4 09:47 /sys/devices/platform/soc/ff600000.bus/ff64c000.mdio-multiplexer/supplier:platform:ff63c000.system-controller:clock-controller -> ../../../../virtual/devlink/platform:ff63c000.system-controller:clock-controller--platform:ff64c000.mdio-multiplexer
>
> As we discussed over chat, this was taken after the mdio-multiplexer
> driver "successfully" probes this device. This will cause
> SYNC_STATE_ONLY device links created by fw_devlink to be deleted
> (because they are useless after a device probes). So, this doesn't
> show the info I was hoping to demonstrate.
>
> In any case, one can see that fw_devlink properly created the device
> link for the clocks dependency. So fw_devlink is parsing this node
> properly. But it doesn't create a similar probe order enforcing device
> link between the mdio-multiplexer and the gpio_intc because the
> dependency is only present in a grand child DT node (ethernet-phy@0
> under ext_mdio). So fw_devlink is working as intended.
>
> I spent several hours squinting at the code/DT yesterday. Here's what
> is going on and causing the problem:
>
> The failing driver in this case is
> drivers/net/mdio/mdio-mux-meson-g12a.c. And the only DT node it's
> handling is what I pasted above in this email. In the failure case,
> the call flow is something like this:
>
> g12a_mdio_mux_probe()
> -> mdio_mux_init()
> -> of_mdiobus_register(ext_mdio DT node)
> -> of_mdiobus_register_phy(ext_mdio DT node)
> -> several calls deep fwnode_mdiobus_phy_device_register(ethernet_phy DT node)
> -> Tried to get the IRQ listed in ethernet_phy and fails with
> -EPROBE_DEFER because the IRQ driver isn't loaded yet.
>
> The error is propagated correctly all the way up to of_mdiobus_register(), but
> mdio_mux_init() ignores the -EPROBE_DEFER from of_mdiobus_register() and just
> continues on with the rest of the stuff and returns success as long as
> one of the child nodes (in this case int_mdio) succeeds.
>
> Since the probe returns 0 without really succeeding, networking stuff
> just fails badly after this. So, IMO, the real problem is with
> mdio_mux_init() not propagating up the -EPROBE_DEFER. I gave Marc a
> quick hack (pasted at the end of this email) to test my theory and he
> confirmed that it fixes the issue (a few deferred probes later, things
> work properly).
>
> Andrew, I don't see any good reason for mdio_mux_init() not
> propagating the errors up correctly (at least for EPROBE_DEFER). I'll
> send a patch to fix this. Please let me know if there's a reason it
> has to stay as-is.

I sent out the proper fix as a series:
https://lore.kernel.org/lkml/20210804214333.927985-1-saravanak@google.com/T/#t

Marc, can you give it a shot please?

-Saravana

>
> -Saravana
>
> index 110e4ee85785..d973a267151f 100644
> --- a/drivers/net/mdio/mdio-mux.c
> +++ b/drivers/net/mdio/mdio-mux.c
> @@ -170,6 +170,9 @@ int mdio_mux_init(struct device *dev,
>                                 child_bus_node);
>                         mdiobus_free(cb->mii_bus);
>                         devm_kfree(dev, cb);
> +                       /* Not a final fix. I think it can cause UAF issues. */
> +                       mdio_mux_uninit(pb);
> +                       return r;
>                 } else {
>                         cb->next = pb->children;
>                         pb->children = cb;
Neil Armstrong Aug. 5, 2021, 6:31 a.m. UTC | #15
Hi Saravana,

On 04/08/2021 23:47, Saravana Kannan wrote:
> On Wed, Aug 4, 2021 at 11:20 AM Saravana Kannan <saravanak@google.com> wrote:
>>
>> On Wed, Aug 4, 2021 at 1:50 AM Marc Zyngier <maz@kernel.org> wrote:
>>>
>>> On Wed, 04 Aug 2021 02:36:45 +0100,
>>> Saravana Kannan <saravanak@google.com> wrote:
>>>
>>> Hi Saravana,
>>>
>>> Thanks for looking into this.
>>
>> You are welcome. I just don't want people to think fw_devlink is broken :)
>>
>>>
>>> [...]
>>>
>>>>> Saravana, could you please have a look from a fw_devlink perspective?
>>>>
>>>> Sigh... I spent several hours looking at this and wrote up an analysis
>>>> and then realized I might be looking at the wrong DT files.
>>>>
>>>> Marc, can you point me to the board file in upstream that corresponds
>>>> to the platform in which you see this issue? I'm not asking for [1],
>>>> but the actual final .dts (not .dtsi) file that corresponds to the
>>>> platform/board/system.
>>>
>>> The platform I can reproduce this on is described in
>>> arch/arm64/boot/dts/amlogic/meson-sm1-khadas-vim3l.dts. It is an
>>> intricate maze of inclusion, node merge and other DT subtleties. I
>>> suggest you look at the decompiled version to get a view of the
>>> result.
>>
>> Thanks. After decompiling it, it looks something like (stripped a
>> bunch of reg and address properties and added the labels back):
>>
>> eth_phy: mdio-multiplexer@4c000 {
>>         compatible = "amlogic,g12a-mdio-mux";
>>         clocks = <0x02 0x13 0x1e 0x02 0xb1>;
>>         clock-names = "pclk\0clkin0\0clkin1";
>>         mdio-parent-bus = <0x22>;
>>
>>         ext_mdio: mdio@0 {
>>                 reg = <0x00>;
>>
>>                 ethernet-phy@0 {
>>                         max-speed = <0x3e8>;
>>                         interrupt-parent = <0x23>;
>>                         interrupts = <0x1a 0x08>;
>>                         phandle = <0x16>;
>>                 };
>>         };
>>
>>         int_mdio: mdio@1 {
>>                 ...
>>         }
>> }
>>
>> And phandle 0x23 refers to the gpio_intc interrupt controller with the
>> modular driver.
>>
>>>> Based on your error messages, it's failing for mdio@0 which
>>>> corresponds to ext_mdio. But none of the board dts files in upstream
>>>> have a compatible property for "ext_mdio". Which means fw_devlink
>>>> _should_ propagate the gpio_intc IRQ dependency all the way up to
>>>> eth_phy.
>>>>
>>>> Also, in the failing case, can you run:
>>>> ls -ld supplier:*
>>>>
>>>> in the /sys/devices/....<something>/ folder that corresponds to the
>>>> "eth_phy: mdio-multiplexer@4c000" DT node and tell me what it shows?
>>>
>>> Here you go:
>>>
>>> root@tiger-roach:~# find /sys/devices/ -name 'supplier*'|grep -i mdio | xargs ls -ld
>>> lrwxrwxrwx 1 root root 0 Aug  4 09:47 /sys/devices/platform/soc/ff600000.bus/ff64c000.mdio-multiplexer/supplier:platform:ff63c000.system-controller:clock-controller -> ../../../../virtual/devlink/platform:ff63c000.system-controller:clock-controller--platform:ff64c000.mdio-multiplexer
>>
>> As we discussed over chat, this was taken after the mdio-multiplexer
>> driver "successfully" probes this device. This will cause
>> SYNC_STATE_ONLY device links created by fw_devlink to be deleted
>> (because they are useless after a device probes). So, this doesn't
>> show the info I was hoping to demonstrate.
>>
>> In any case, one can see that fw_devlink properly created the device
>> link for the clocks dependency. So fw_devlink is parsing this node
>> properly. But it doesn't create a similar probe order enforcing device
>> link between the mdio-multiplexer and the gpio_intc because the
>> dependency is only present in a grand child DT node (ethernet-phy@0
>> under ext_mdio). So fw_devlink is working as intended.
>>
>> I spent several hours squinting at the code/DT yesterday. Here's what
>> is going on and causing the problem:
>>
>> The failing driver in this case is
>> drivers/net/mdio/mdio-mux-meson-g12a.c. And the only DT node it's
>> handling is what I pasted above in this email. In the failure case,
>> the call flow is something like this:
>>
>> g12a_mdio_mux_probe()
>> -> mdio_mux_init()
>> -> of_mdiobus_register(ext_mdio DT node)
>> -> of_mdiobus_register_phy(ext_mdio DT node)
>> -> several calls deep fwnode_mdiobus_phy_device_register(ethernet_phy DT node)
>> -> Tried to get the IRQ listed in ethernet_phy and fails with
>> -EPROBE_DEFER because the IRQ driver isn't loaded yet.
>>
>> The error is propagated correctly all the way up to of_mdiobus_register(), but
>> mdio_mux_init() ignores the -EPROBE_DEFER from of_mdiobus_register() and just
>> continues on with the rest of the stuff and returns success as long as
>> one of the child nodes (in this case int_mdio) succeeds.
>>
>> Since the probe returns 0 without really succeeding, networking stuff
>> just fails badly after this. So, IMO, the real problem is with
>> mdio_mux_init() not propagating up the -EPROBE_DEFER. I gave Marc a
>> quick hack (pasted at the end of this email) to test my theory and he
>> confirmed that it fixes the issue (a few deferred probes later, things
>> work properly).
>>
>> Andrew, I don't see any good reason for mdio_mux_init() not
>> propagating the errors up correctly (at least for EPROBE_DEFER). I'll
>> send a patch to fix this. Please let me know if there's a reason it
>> has to stay as-is.
> 
> I sent out the proper fix as a series:
> https://lore.kernel.org/lkml/20210804214333.927985-1-saravanak@google.com/T/#t

Thanks a lot for digging here and providing the appropriate fixes !

Neil

> 
> Marc, can you give it a shot please?
> 
> -Saravana
> 
>>
>> -Saravana
>>
>> index 110e4ee85785..d973a267151f 100644
>> --- a/drivers/net/mdio/mdio-mux.c
>> +++ b/drivers/net/mdio/mdio-mux.c
>> @@ -170,6 +170,9 @@ int mdio_mux_init(struct device *dev,
>>                                 child_bus_node);
>>                         mdiobus_free(cb->mii_bus);
>>                         devm_kfree(dev, cb);
>> +                       /* Not a final fix. I think it can cause UAF issues. */
>> +                       mdio_mux_uninit(pb);
>> +                       return r;
>>                 } else {
>>                         cb->next = pb->children;
>>                         pb->children = cb;
Lee Jones Aug. 5, 2021, 7:57 a.m. UTC | #16
On Wed, 04 Aug 2021, Saravana Kannan wrote:

> On Wed, Aug 4, 2021 at 11:20 AM Saravana Kannan <saravanak@google.com> wrote:
> >
> > On Wed, Aug 4, 2021 at 1:50 AM Marc Zyngier <maz@kernel.org> wrote:
> > >
> > > On Wed, 04 Aug 2021 02:36:45 +0100,
> > > Saravana Kannan <saravanak@google.com> wrote:
> > >
> > > Hi Saravana,
> > >
> > > Thanks for looking into this.
> >
> > You are welcome. I just don't want people to think fw_devlink is broken :)
> >
> > >
> > > [...]
> > >
> > > > > Saravana, could you please have a look from a fw_devlink perspective?
> > > >
> > > > Sigh... I spent several hours looking at this and wrote up an analysis
> > > > and then realized I might be looking at the wrong DT files.
> > > >
> > > > Marc, can you point me to the board file in upstream that corresponds
> > > > to the platform in which you see this issue? I'm not asking for [1],
> > > > but the actual final .dts (not .dtsi) file that corresponds to the
> > > > platform/board/system.
> > >
> > > The platform I can reproduce this on is described in
> > > arch/arm64/boot/dts/amlogic/meson-sm1-khadas-vim3l.dts. It is an
> > > intricate maze of inclusion, node merge and other DT subtleties. I
> > > suggest you look at the decompiled version to get a view of the
> > > result.
> >
> > Thanks. After decompiling it, it looks something like (stripped a
> > bunch of reg and address properties and added the labels back):
> >
> > eth_phy: mdio-multiplexer@4c000 {
> >         compatible = "amlogic,g12a-mdio-mux";
> >         clocks = <0x02 0x13 0x1e 0x02 0xb1>;
> >         clock-names = "pclk\0clkin0\0clkin1";
> >         mdio-parent-bus = <0x22>;
> >
> >         ext_mdio: mdio@0 {
> >                 reg = <0x00>;
> >
> >                 ethernet-phy@0 {
> >                         max-speed = <0x3e8>;
> >                         interrupt-parent = <0x23>;
> >                         interrupts = <0x1a 0x08>;
> >                         phandle = <0x16>;
> >                 };
> >         };
> >
> >         int_mdio: mdio@1 {
> >                 ...
> >         }
> > }
> >
> > And phandle 0x23 refers to the gpio_intc interrupt controller with the
> > modular driver.
> >
> > > > Based on your error messages, it's failing for mdio@0 which
> > > > corresponds to ext_mdio. But none of the board dts files in upstream
> > > > have a compatible property for "ext_mdio". Which means fw_devlink
> > > > _should_ propagate the gpio_intc IRQ dependency all the way up to
> > > > eth_phy.
> > > >
> > > > Also, in the failing case, can you run:
> > > > ls -ld supplier:*
> > > >
> > > > in the /sys/devices/....<something>/ folder that corresponds to the
> > > > "eth_phy: mdio-multiplexer@4c000" DT node and tell me what it shows?
> > >
> > > Here you go:
> > >
> > > root@tiger-roach:~# find /sys/devices/ -name 'supplier*'|grep -i mdio | xargs ls -ld
> > > lrwxrwxrwx 1 root root 0 Aug  4 09:47 /sys/devices/platform/soc/ff600000.bus/ff64c000.mdio-multiplexer/supplier:platform:ff63c000.system-controller:clock-controller -> ../../../../virtual/devlink/platform:ff63c000.system-controller:clock-controller--platform:ff64c000.mdio-multiplexer
> >
> > As we discussed over chat, this was taken after the mdio-multiplexer
> > driver "successfully" probes this device. This will cause
> > SYNC_STATE_ONLY device links created by fw_devlink to be deleted
> > (because they are useless after a device probes). So, this doesn't
> > show the info I was hoping to demonstrate.
> >
> > In any case, one can see that fw_devlink properly created the device
> > link for the clocks dependency. So fw_devlink is parsing this node
> > properly. But it doesn't create a similar probe order enforcing device
> > link between the mdio-multiplexer and the gpio_intc because the
> > dependency is only present in a grand child DT node (ethernet-phy@0
> > under ext_mdio). So fw_devlink is working as intended.
> >
> > I spent several hours squinting at the code/DT yesterday. Here's what
> > is going on and causing the problem:
> >
> > The failing driver in this case is
> > drivers/net/mdio/mdio-mux-meson-g12a.c. And the only DT node it's
> > handling is what I pasted above in this email. In the failure case,
> > the call flow is something like this:
> >
> > g12a_mdio_mux_probe()
> > -> mdio_mux_init()
> > -> of_mdiobus_register(ext_mdio DT node)
> > -> of_mdiobus_register_phy(ext_mdio DT node)
> > -> several calls deep fwnode_mdiobus_phy_device_register(ethernet_phy DT node)
> > -> Tried to get the IRQ listed in ethernet_phy and fails with
> > -EPROBE_DEFER because the IRQ driver isn't loaded yet.
> >
> > The error is propagated correctly all the way up to of_mdiobus_register(), but
> > mdio_mux_init() ignores the -EPROBE_DEFER from of_mdiobus_register() and just
> > continues on with the rest of the stuff and returns success as long as
> > one of the child nodes (in this case int_mdio) succeeds.
> >
> > Since the probe returns 0 without really succeeding, networking stuff
> > just fails badly after this. So, IMO, the real problem is with
> > mdio_mux_init() not propagating up the -EPROBE_DEFER. I gave Marc a
> > quick hack (pasted at the end of this email) to test my theory and he
> > confirmed that it fixes the issue (a few deferred probes later, things
> > work properly).
> >
> > Andrew, I don't see any good reason for mdio_mux_init() not
> > propagating the errors up correctly (at least for EPROBE_DEFER). I'll
> > send a patch to fix this. Please let me know if there's a reason it
> > has to stay as-is.
> 
> I sent out the proper fix as a series:
> https://lore.kernel.org/lkml/20210804214333.927985-1-saravanak@google.com/T/#t
> 
> Marc, can you give it a shot please?
> 
> -Saravana

Superstar!  Thanks for taking the time to rectify this for all of us.
Saravana Kannan Aug. 6, 2021, 11:55 p.m. UTC | #17
On Wed, Aug 4, 2021 at 11:31 PM Neil Armstrong <narmstrong@baylibre.com> wrote:
>
> Hi Saravana,
>
> On 04/08/2021 23:47, Saravana Kannan wrote:
> > On Wed, Aug 4, 2021 at 11:20 AM Saravana Kannan <saravanak@google.com> wrote:
> >>
> >> On Wed, Aug 4, 2021 at 1:50 AM Marc Zyngier <maz@kernel.org> wrote:
> >>>
> >>> On Wed, 04 Aug 2021 02:36:45 +0100,
> >>> Saravana Kannan <saravanak@google.com> wrote:
> >>>
> >>> Hi Saravana,
> >>>
> >>> Thanks for looking into this.
> >>
> >> You are welcome. I just don't want people to think fw_devlink is broken :)
> >>
> >>>
> >>> [...]
> >>>
> >>>>> Saravana, could you please have a look from a fw_devlink perspective?
> >>>>
> >>>> Sigh... I spent several hours looking at this and wrote up an analysis
> >>>> and then realized I might be looking at the wrong DT files.
> >>>>
> >>>> Marc, can you point me to the board file in upstream that corresponds
> >>>> to the platform in which you see this issue? I'm not asking for [1],
> >>>> but the actual final .dts (not .dtsi) file that corresponds to the
> >>>> platform/board/system.
> >>>
> >>> The platform I can reproduce this on is described in
> >>> arch/arm64/boot/dts/amlogic/meson-sm1-khadas-vim3l.dts. It is an
> >>> intricate maze of inclusion, node merge and other DT subtleties. I
> >>> suggest you look at the decompiled version to get a view of the
> >>> result.
> >>
> >> Thanks. After decompiling it, it looks something like (stripped a
> >> bunch of reg and address properties and added the labels back):
> >>
> >> eth_phy: mdio-multiplexer@4c000 {
> >>         compatible = "amlogic,g12a-mdio-mux";
> >>         clocks = <0x02 0x13 0x1e 0x02 0xb1>;
> >>         clock-names = "pclk\0clkin0\0clkin1";
> >>         mdio-parent-bus = <0x22>;
> >>
> >>         ext_mdio: mdio@0 {
> >>                 reg = <0x00>;
> >>
> >>                 ethernet-phy@0 {
> >>                         max-speed = <0x3e8>;
> >>                         interrupt-parent = <0x23>;
> >>                         interrupts = <0x1a 0x08>;
> >>                         phandle = <0x16>;
> >>                 };
> >>         };
> >>
> >>         int_mdio: mdio@1 {
> >>                 ...
> >>         }
> >> }
> >>
> >> And phandle 0x23 refers to the gpio_intc interrupt controller with the
> >> modular driver.
> >>
> >>>> Based on your error messages, it's failing for mdio@0 which
> >>>> corresponds to ext_mdio. But none of the board dts files in upstream
> >>>> have a compatible property for "ext_mdio". Which means fw_devlink
> >>>> _should_ propagate the gpio_intc IRQ dependency all the way up to
> >>>> eth_phy.
> >>>>
> >>>> Also, in the failing case, can you run:
> >>>> ls -ld supplier:*
> >>>>
> >>>> in the /sys/devices/....<something>/ folder that corresponds to the
> >>>> "eth_phy: mdio-multiplexer@4c000" DT node and tell me what it shows?
> >>>
> >>> Here you go:
> >>>
> >>> root@tiger-roach:~# find /sys/devices/ -name 'supplier*'|grep -i mdio | xargs ls -ld
> >>> lrwxrwxrwx 1 root root 0 Aug  4 09:47 /sys/devices/platform/soc/ff600000.bus/ff64c000.mdio-multiplexer/supplier:platform:ff63c000.system-controller:clock-controller -> ../../../../virtual/devlink/platform:ff63c000.system-controller:clock-controller--platform:ff64c000.mdio-multiplexer
> >>
> >> As we discussed over chat, this was taken after the mdio-multiplexer
> >> driver "successfully" probes this device. This will cause
> >> SYNC_STATE_ONLY device links created by fw_devlink to be deleted
> >> (because they are useless after a device probes). So, this doesn't
> >> show the info I was hoping to demonstrate.
> >>
> >> In any case, one can see that fw_devlink properly created the device
> >> link for the clocks dependency. So fw_devlink is parsing this node
> >> properly. But it doesn't create a similar probe order enforcing device
> >> link between the mdio-multiplexer and the gpio_intc because the
> >> dependency is only present in a grand child DT node (ethernet-phy@0
> >> under ext_mdio). So fw_devlink is working as intended.
> >>
> >> I spent several hours squinting at the code/DT yesterday. Here's what
> >> is going on and causing the problem:
> >>
> >> The failing driver in this case is
> >> drivers/net/mdio/mdio-mux-meson-g12a.c. And the only DT node it's
> >> handling is what I pasted above in this email. In the failure case,
> >> the call flow is something like this:
> >>
> >> g12a_mdio_mux_probe()
> >> -> mdio_mux_init()
> >> -> of_mdiobus_register(ext_mdio DT node)
> >> -> of_mdiobus_register_phy(ext_mdio DT node)
> >> -> several calls deep fwnode_mdiobus_phy_device_register(ethernet_phy DT node)
> >> -> Tried to get the IRQ listed in ethernet_phy and fails with
> >> -EPROBE_DEFER because the IRQ driver isn't loaded yet.
> >>
> >> The error is propagated correctly all the way up to of_mdiobus_register(), but
> >> mdio_mux_init() ignores the -EPROBE_DEFER from of_mdiobus_register() and just
> >> continues on with the rest of the stuff and returns success as long as
> >> one of the child nodes (in this case int_mdio) succeeds.
> >>
> >> Since the probe returns 0 without really succeeding, networking stuff
> >> just fails badly after this. So, IMO, the real problem is with
> >> mdio_mux_init() not propagating up the -EPROBE_DEFER. I gave Marc a
> >> quick hack (pasted at the end of this email) to test my theory and he
> >> confirmed that it fixes the issue (a few deferred probes later, things
> >> work properly).
> >>
> >> Andrew, I don't see any good reason for mdio_mux_init() not
> >> propagating the errors up correctly (at least for EPROBE_DEFER). I'll
> >> send a patch to fix this. Please let me know if there's a reason it
> >> has to stay as-is.
> >
> > I sent out the proper fix as a series:
> > https://lore.kernel.org/lkml/20210804214333.927985-1-saravanak@google.com/T/#t
>
> Thanks a lot for digging here and providing the appropriate fixes !

You are welcome!

Btw, 'm too lazy to download the mbox for your original patch
(justifiably not cc'ed in it) and reply to it. I made this comment
earlier too.

Can you please use the IRQCHIP_PLATFORM_DRIVER_BEGIN and
IRQCHIP_PLATFORM_DRIVER_END macros? They avoid boilerplate code every
irqchip driver has to implement, adds some restrictions to avoid
unbinding these drivers/unloading these modules, and also makes it
easy to convert from IRQCHIP_DECLARE to a platform driver. It'll also
allow you to drop the of_irq_find_parent() call in your probe.

Cheers,
Saravana

>
> Neil
>
> >
> > Marc, can you give it a shot please?
> >
> > -Saravana
> >
> >>
> >> -Saravana
> >>
> >> index 110e4ee85785..d973a267151f 100644
> >> --- a/drivers/net/mdio/mdio-mux.c
> >> +++ b/drivers/net/mdio/mdio-mux.c
> >> @@ -170,6 +170,9 @@ int mdio_mux_init(struct device *dev,
> >>                                 child_bus_node);
> >>                         mdiobus_free(cb->mii_bus);
> >>                         devm_kfree(dev, cb);
> >> +                       /* Not a final fix. I think it can cause UAF issues. */
> >> +                       mdio_mux_uninit(pb);
> >> +                       return r;
> >>                 } else {
> >>                         cb->next = pb->children;
> >>                         pb->children = cb;
>
Lee Jones Aug. 16, 2021, 12:47 p.m. UTC | #18
On Thu, 05 Aug 2021, Lee Jones wrote:

> On Wed, 04 Aug 2021, Saravana Kannan wrote:
> 
> > On Wed, Aug 4, 2021 at 11:20 AM Saravana Kannan <saravanak@google.com> wrote:
> > >
> > > On Wed, Aug 4, 2021 at 1:50 AM Marc Zyngier <maz@kernel.org> wrote:
> > > >
> > > > On Wed, 04 Aug 2021 02:36:45 +0100,
> > > > Saravana Kannan <saravanak@google.com> wrote:
> > > >
> > > > Hi Saravana,
> > > >
> > > > Thanks for looking into this.
> > >
> > > You are welcome. I just don't want people to think fw_devlink is broken :)
> > >
> > > >
> > > > [...]
> > > >
> > > > > > Saravana, could you please have a look from a fw_devlink perspective?
> > > > >
> > > > > Sigh... I spent several hours looking at this and wrote up an analysis
> > > > > and then realized I might be looking at the wrong DT files.
> > > > >
> > > > > Marc, can you point me to the board file in upstream that corresponds
> > > > > to the platform in which you see this issue? I'm not asking for [1],
> > > > > but the actual final .dts (not .dtsi) file that corresponds to the
> > > > > platform/board/system.
> > > >
> > > > The platform I can reproduce this on is described in
> > > > arch/arm64/boot/dts/amlogic/meson-sm1-khadas-vim3l.dts. It is an
> > > > intricate maze of inclusion, node merge and other DT subtleties. I
> > > > suggest you look at the decompiled version to get a view of the
> > > > result.
> > >
> > > Thanks. After decompiling it, it looks something like (stripped a
> > > bunch of reg and address properties and added the labels back):
> > >
> > > eth_phy: mdio-multiplexer@4c000 {
> > >         compatible = "amlogic,g12a-mdio-mux";
> > >         clocks = <0x02 0x13 0x1e 0x02 0xb1>;
> > >         clock-names = "pclk\0clkin0\0clkin1";
> > >         mdio-parent-bus = <0x22>;
> > >
> > >         ext_mdio: mdio@0 {
> > >                 reg = <0x00>;
> > >
> > >                 ethernet-phy@0 {
> > >                         max-speed = <0x3e8>;
> > >                         interrupt-parent = <0x23>;
> > >                         interrupts = <0x1a 0x08>;
> > >                         phandle = <0x16>;
> > >                 };
> > >         };
> > >
> > >         int_mdio: mdio@1 {
> > >                 ...
> > >         }
> > > }
> > >
> > > And phandle 0x23 refers to the gpio_intc interrupt controller with the
> > > modular driver.
> > >
> > > > > Based on your error messages, it's failing for mdio@0 which
> > > > > corresponds to ext_mdio. But none of the board dts files in upstream
> > > > > have a compatible property for "ext_mdio". Which means fw_devlink
> > > > > _should_ propagate the gpio_intc IRQ dependency all the way up to
> > > > > eth_phy.
> > > > >
> > > > > Also, in the failing case, can you run:
> > > > > ls -ld supplier:*
> > > > >
> > > > > in the /sys/devices/....<something>/ folder that corresponds to the
> > > > > "eth_phy: mdio-multiplexer@4c000" DT node and tell me what it shows?
> > > >
> > > > Here you go:
> > > >
> > > > root@tiger-roach:~# find /sys/devices/ -name 'supplier*'|grep -i mdio | xargs ls -ld
> > > > lrwxrwxrwx 1 root root 0 Aug  4 09:47 /sys/devices/platform/soc/ff600000.bus/ff64c000.mdio-multiplexer/supplier:platform:ff63c000.system-controller:clock-controller -> ../../../../virtual/devlink/platform:ff63c000.system-controller:clock-controller--platform:ff64c000.mdio-multiplexer
> > >
> > > As we discussed over chat, this was taken after the mdio-multiplexer
> > > driver "successfully" probes this device. This will cause
> > > SYNC_STATE_ONLY device links created by fw_devlink to be deleted
> > > (because they are useless after a device probes). So, this doesn't
> > > show the info I was hoping to demonstrate.
> > >
> > > In any case, one can see that fw_devlink properly created the device
> > > link for the clocks dependency. So fw_devlink is parsing this node
> > > properly. But it doesn't create a similar probe order enforcing device
> > > link between the mdio-multiplexer and the gpio_intc because the
> > > dependency is only present in a grand child DT node (ethernet-phy@0
> > > under ext_mdio). So fw_devlink is working as intended.
> > >
> > > I spent several hours squinting at the code/DT yesterday. Here's what
> > > is going on and causing the problem:
> > >
> > > The failing driver in this case is
> > > drivers/net/mdio/mdio-mux-meson-g12a.c. And the only DT node it's
> > > handling is what I pasted above in this email. In the failure case,
> > > the call flow is something like this:
> > >
> > > g12a_mdio_mux_probe()
> > > -> mdio_mux_init()
> > > -> of_mdiobus_register(ext_mdio DT node)
> > > -> of_mdiobus_register_phy(ext_mdio DT node)
> > > -> several calls deep fwnode_mdiobus_phy_device_register(ethernet_phy DT node)
> > > -> Tried to get the IRQ listed in ethernet_phy and fails with
> > > -EPROBE_DEFER because the IRQ driver isn't loaded yet.
> > >
> > > The error is propagated correctly all the way up to of_mdiobus_register(), but
> > > mdio_mux_init() ignores the -EPROBE_DEFER from of_mdiobus_register() and just
> > > continues on with the rest of the stuff and returns success as long as
> > > one of the child nodes (in this case int_mdio) succeeds.
> > >
> > > Since the probe returns 0 without really succeeding, networking stuff
> > > just fails badly after this. So, IMO, the real problem is with
> > > mdio_mux_init() not propagating up the -EPROBE_DEFER. I gave Marc a
> > > quick hack (pasted at the end of this email) to test my theory and he
> > > confirmed that it fixes the issue (a few deferred probes later, things
> > > work properly).
> > >
> > > Andrew, I don't see any good reason for mdio_mux_init() not
> > > propagating the errors up correctly (at least for EPROBE_DEFER). I'll
> > > send a patch to fix this. Please let me know if there's a reason it
> > > has to stay as-is.
> > 
> > I sent out the proper fix as a series:
> > https://lore.kernel.org/lkml/20210804214333.927985-1-saravanak@google.com/T/#t
> > 
> > Marc, can you give it a shot please?
> > 
> > -Saravana
> 
> Superstar!  Thanks for taking the time to rectify this for all of us.

Just to clarify:

  Are we waiting on a subsequent patch submission at this point?
Saravana Kannan Aug. 16, 2021, 8:27 p.m. UTC | #19
On Mon, Aug 16, 2021 at 5:47 AM Lee Jones <lee.jones@linaro.org> wrote:
>
> On Thu, 05 Aug 2021, Lee Jones wrote:
>
> > On Wed, 04 Aug 2021, Saravana Kannan wrote:
> >
> > > On Wed, Aug 4, 2021 at 11:20 AM Saravana Kannan <saravanak@google.com> wrote:
> > > >
> > > > On Wed, Aug 4, 2021 at 1:50 AM Marc Zyngier <maz@kernel.org> wrote:
> > > > >
> > > > > On Wed, 04 Aug 2021 02:36:45 +0100,
> > > > > Saravana Kannan <saravanak@google.com> wrote:
> > > > >
> > > > > Hi Saravana,
> > > > >
> > > > > Thanks for looking into this.
> > > >
> > > > You are welcome. I just don't want people to think fw_devlink is broken :)
> > > >
> > > > >
> > > > > [...]
> > > > >
> > > > > > > Saravana, could you please have a look from a fw_devlink perspective?
> > > > > >
> > > > > > Sigh... I spent several hours looking at this and wrote up an analysis
> > > > > > and then realized I might be looking at the wrong DT files.
> > > > > >
> > > > > > Marc, can you point me to the board file in upstream that corresponds
> > > > > > to the platform in which you see this issue? I'm not asking for [1],
> > > > > > but the actual final .dts (not .dtsi) file that corresponds to the
> > > > > > platform/board/system.
> > > > >
> > > > > The platform I can reproduce this on is described in
> > > > > arch/arm64/boot/dts/amlogic/meson-sm1-khadas-vim3l.dts. It is an
> > > > > intricate maze of inclusion, node merge and other DT subtleties. I
> > > > > suggest you look at the decompiled version to get a view of the
> > > > > result.
> > > >
> > > > Thanks. After decompiling it, it looks something like (stripped a
> > > > bunch of reg and address properties and added the labels back):
> > > >
> > > > eth_phy: mdio-multiplexer@4c000 {
> > > >         compatible = "amlogic,g12a-mdio-mux";
> > > >         clocks = <0x02 0x13 0x1e 0x02 0xb1>;
> > > >         clock-names = "pclk\0clkin0\0clkin1";
> > > >         mdio-parent-bus = <0x22>;
> > > >
> > > >         ext_mdio: mdio@0 {
> > > >                 reg = <0x00>;
> > > >
> > > >                 ethernet-phy@0 {
> > > >                         max-speed = <0x3e8>;
> > > >                         interrupt-parent = <0x23>;
> > > >                         interrupts = <0x1a 0x08>;
> > > >                         phandle = <0x16>;
> > > >                 };
> > > >         };
> > > >
> > > >         int_mdio: mdio@1 {
> > > >                 ...
> > > >         }
> > > > }
> > > >
> > > > And phandle 0x23 refers to the gpio_intc interrupt controller with the
> > > > modular driver.
> > > >
> > > > > > Based on your error messages, it's failing for mdio@0 which
> > > > > > corresponds to ext_mdio. But none of the board dts files in upstream
> > > > > > have a compatible property for "ext_mdio". Which means fw_devlink
> > > > > > _should_ propagate the gpio_intc IRQ dependency all the way up to
> > > > > > eth_phy.
> > > > > >
> > > > > > Also, in the failing case, can you run:
> > > > > > ls -ld supplier:*
> > > > > >
> > > > > > in the /sys/devices/....<something>/ folder that corresponds to the
> > > > > > "eth_phy: mdio-multiplexer@4c000" DT node and tell me what it shows?
> > > > >
> > > > > Here you go:
> > > > >
> > > > > root@tiger-roach:~# find /sys/devices/ -name 'supplier*'|grep -i mdio | xargs ls -ld
> > > > > lrwxrwxrwx 1 root root 0 Aug  4 09:47 /sys/devices/platform/soc/ff600000.bus/ff64c000.mdio-multiplexer/supplier:platform:ff63c000.system-controller:clock-controller -> ../../../../virtual/devlink/platform:ff63c000.system-controller:clock-controller--platform:ff64c000.mdio-multiplexer
> > > >
> > > > As we discussed over chat, this was taken after the mdio-multiplexer
> > > > driver "successfully" probes this device. This will cause
> > > > SYNC_STATE_ONLY device links created by fw_devlink to be deleted
> > > > (because they are useless after a device probes). So, this doesn't
> > > > show the info I was hoping to demonstrate.
> > > >
> > > > In any case, one can see that fw_devlink properly created the device
> > > > link for the clocks dependency. So fw_devlink is parsing this node
> > > > properly. But it doesn't create a similar probe order enforcing device
> > > > link between the mdio-multiplexer and the gpio_intc because the
> > > > dependency is only present in a grand child DT node (ethernet-phy@0
> > > > under ext_mdio). So fw_devlink is working as intended.
> > > >
> > > > I spent several hours squinting at the code/DT yesterday. Here's what
> > > > is going on and causing the problem:
> > > >
> > > > The failing driver in this case is
> > > > drivers/net/mdio/mdio-mux-meson-g12a.c. And the only DT node it's
> > > > handling is what I pasted above in this email. In the failure case,
> > > > the call flow is something like this:
> > > >
> > > > g12a_mdio_mux_probe()
> > > > -> mdio_mux_init()
> > > > -> of_mdiobus_register(ext_mdio DT node)
> > > > -> of_mdiobus_register_phy(ext_mdio DT node)
> > > > -> several calls deep fwnode_mdiobus_phy_device_register(ethernet_phy DT node)
> > > > -> Tried to get the IRQ listed in ethernet_phy and fails with
> > > > -EPROBE_DEFER because the IRQ driver isn't loaded yet.
> > > >
> > > > The error is propagated correctly all the way up to of_mdiobus_register(), but
> > > > mdio_mux_init() ignores the -EPROBE_DEFER from of_mdiobus_register() and just
> > > > continues on with the rest of the stuff and returns success as long as
> > > > one of the child nodes (in this case int_mdio) succeeds.
> > > >
> > > > Since the probe returns 0 without really succeeding, networking stuff
> > > > just fails badly after this. So, IMO, the real problem is with
> > > > mdio_mux_init() not propagating up the -EPROBE_DEFER. I gave Marc a
> > > > quick hack (pasted at the end of this email) to test my theory and he
> > > > confirmed that it fixes the issue (a few deferred probes later, things
> > > > work properly).
> > > >
> > > > Andrew, I don't see any good reason for mdio_mux_init() not
> > > > propagating the errors up correctly (at least for EPROBE_DEFER). I'll
> > > > send a patch to fix this. Please let me know if there's a reason it
> > > > has to stay as-is.
> > >
> > > I sent out the proper fix as a series:
> > > https://lore.kernel.org/lkml/20210804214333.927985-1-saravanak@google.com/T/#t
> > >
> > > Marc, can you give it a shot please?
> > >
> > > -Saravana
> >
> > Superstar!  Thanks for taking the time to rectify this for all of us.
>
> Just to clarify:
>
>   Are we waiting on a subsequent patch submission at this point?

Not that I'm aware of. Andrew added a "Reviewed-by" to all 3 of my
proper fix patches. I didn't think I needed to send any newer patches.
Is there some reason you that I needed to?
https://lore.kernel.org/lkml/20210804214333.927985-1-saravanak@google.com/T/#t

-Saravana


>
> --
> Lee Jones [李琼斯]
> Senior Technical Lead - Developer Services
> Linaro.org │ Open source software for Arm SoCs
> Follow Linaro: Facebook | Twitter | Blog
Andrew Lunn Aug. 16, 2021, 8:46 p.m. UTC | #20
> Not that I'm aware of. Andrew added a "Reviewed-by" to all 3 of my
> proper fix patches. I didn't think I needed to send any newer patches.
> Is there some reason you that I needed to?
> https://lore.kernel.org/lkml/20210804214333.927985-1-saravanak@google.com/T/#t

https://patchwork.kernel.org/project/netdevbpf/list/?series=&submitter=&state=*&q=net%3A+mdio-mux%3A+Delete+unnecessary+devm_kfree&archive=both&delegate=

State Changes Requested. I guess because you got the subject wrong.

With netdev, if it has not been merged within three days, you probably
need to resubmit.

     Andrew
Saravana Kannan Aug. 16, 2021, 9:02 p.m. UTC | #21
On Mon, Aug 16, 2021 at 1:46 PM Andrew Lunn <andrew@lunn.ch> wrote:
>
> > Not that I'm aware of. Andrew added a "Reviewed-by" to all 3 of my
> > proper fix patches. I didn't think I needed to send any newer patches.
> > Is there some reason you that I needed to?
> > https://lore.kernel.org/lkml/20210804214333.927985-1-saravanak@google.com/T/#t
>
> https://patchwork.kernel.org/project/netdevbpf/list/?series=&submitter=&state=*&q=net%3A+mdio-mux%3A+Delete+unnecessary+devm_kfree&archive=both&delegate=
>
> State Changes Requested. I guess because you got the subject wrong.

I'm assuming the prefix is wrong? What should it be? I went by looking
at the latest commit in:
$ git log --oneline  drivers/net/mdio/
ac53c26433b5 net: mdiobus: withdraw fwnode_mdbiobus_register

What prefix do I need to use to be considered correct?
net: mdio:?

>
> With netdev, if it has not been merged within three days, you probably
> need to resubmit.

Ah, thanks for the info. Since you didn't say anything, I assumed it'd
get in eventually and didn't really check the patchwork status.

-Saravana
Andrew Lunn Aug. 16, 2021, 9:18 p.m. UTC | #22
On Mon, Aug 16, 2021 at 02:02:12PM -0700, Saravana Kannan wrote:
> On Mon, Aug 16, 2021 at 1:46 PM Andrew Lunn <andrew@lunn.ch> wrote:
> >
> > > Not that I'm aware of. Andrew added a "Reviewed-by" to all 3 of my
> > > proper fix patches. I didn't think I needed to send any newer patches.
> > > Is there some reason you that I needed to?
> > > https://lore.kernel.org/lkml/20210804214333.927985-1-saravanak@google.com/T/#t
> >
> > https://patchwork.kernel.org/project/netdevbpf/list/?series=&submitter=&state=*&q=net%3A+mdio-mux%3A+Delete+unnecessary+devm_kfree&archive=both&delegate=
> >
> > State Changes Requested. I guess because you got the subject wrong.
> 
> I'm assuming the prefix is wrong? What should it be? I went by looking
> at the latest commit in:
> $ git log --oneline  drivers/net/mdio/
> ac53c26433b5 net: mdiobus: withdraw fwnode_mdbiobus_register
> 
> What prefix do I need to use to be considered correct?
> net: mdio:?

https://www.kernel.org/doc/html/latest/networking/netdev-FAQ.html

and in particular:

https://www.kernel.org/doc/html/latest/networking/netdev-FAQ.html#how-do-i-indicate-which-tree-net-vs-net-next-my-patch-should-be-in

	Andrew
Lee Jones Aug. 17, 2021, 7:24 a.m. UTC | #23
On Mon, 16 Aug 2021, Saravana Kannan wrote:
> > > > I sent out the proper fix as a series:
> > > > https://lore.kernel.org/lkml/20210804214333.927985-1-saravanak@google.com/T/#t
> > > >
> > > > Marc, can you give it a shot please?
> > > >
> > > > -Saravana
> > >
> > > Superstar!  Thanks for taking the time to rectify this for all of us.
> >
> > Just to clarify:
> >
> >   Are we waiting on a subsequent patch submission at this point?
> 
> Not that I'm aware of. Andrew added a "Reviewed-by" to all 3 of my
> proper fix patches. I didn't think I needed to send any newer patches.
> Is there some reason you that I needed to?

Actually, I meant *this* patch.

But happy to have unlocked your patches also. :)
Saravana Kannan Aug. 17, 2021, 6:12 p.m. UTC | #24
On Tue, Aug 17, 2021 at 12:24 AM Lee Jones <lee.jones@linaro.org> wrote:
>
> On Mon, 16 Aug 2021, Saravana Kannan wrote:
> > > > > I sent out the proper fix as a series:
> > > > > https://lore.kernel.org/lkml/20210804214333.927985-1-saravanak@google.com/T/#t
> > > > >
> > > > > Marc, can you give it a shot please?
> > > > >
> > > > > -Saravana
> > > >
> > > > Superstar!  Thanks for taking the time to rectify this for all of us.
> > >
> > > Just to clarify:
> > >
> > >   Are we waiting on a subsequent patch submission at this point?
> >
> > Not that I'm aware of. Andrew added a "Reviewed-by" to all 3 of my
> > proper fix patches. I didn't think I needed to send any newer patches.
> > Is there some reason you that I needed to?
>
> Actually, I meant *this* patch.

I think it'll be nice if Neil addresses the stuff Marc mentioned
(ideally) using the macros I suggested. Not sure if Marc is waiting on
that though. Marc also probably wants my mdio-mux series to merge
first before he takes this patch. So that it doesn't break networking
in his device.

-Saravana

>
> But happy to have unlocked your patches also. :)
>
> --
> Lee Jones [李琼斯]
> Senior Technical Lead - Developer Services
> Linaro.org │ Open source software for Arm SoCs
> Follow Linaro: Facebook | Twitter | Blog
Marc Zyngier Aug. 18, 2021, 11:19 a.m. UTC | #25
On Tue, 17 Aug 2021 19:12:34 +0100,
Saravana Kannan <saravanak@google.com> wrote:
> 
> On Tue, Aug 17, 2021 at 12:24 AM Lee Jones <lee.jones@linaro.org> wrote:
> >
> > On Mon, 16 Aug 2021, Saravana Kannan wrote:
> > > > > > I sent out the proper fix as a series:
> > > > > > https://lore.kernel.org/lkml/20210804214333.927985-1-saravanak@google.com/T/#t
> > > > > >
> > > > > > Marc, can you give it a shot please?
> > > > > >
> > > > > > -Saravana
> > > > >
> > > > > Superstar!  Thanks for taking the time to rectify this for all of us.
> > > >
> > > > Just to clarify:
> > > >
> > > >   Are we waiting on a subsequent patch submission at this point?
> > >
> > > Not that I'm aware of. Andrew added a "Reviewed-by" to all 3 of my
> > > proper fix patches. I didn't think I needed to send any newer patches.
> > > Is there some reason you that I needed to?
> >
> > Actually, I meant *this* patch.
> 
> I think it'll be nice if Neil addresses the stuff Marc mentioned
> (ideally) using the macros I suggested. Not sure if Marc is waiting on
> that though. Marc also probably wants my mdio-mux series to merge
> first before he takes this patch. So that it doesn't break networking
> in his device.

Yup. Two things need to happen here:

- the MDIO fixes must be merged (I think they are queued, from what I
  can see)

- the irqchip patch must be fixed so that the driver cannot be
  unloaded (Saravana did explain what needs to be done).

Once these two condition are met, I'll happily take this patch.

	M.
Neil Armstrong Sept. 2, 2021, 9:28 a.m. UTC | #26
Hi,

On 18/08/2021 13:19, Marc Zyngier wrote:
> On Tue, 17 Aug 2021 19:12:34 +0100,
> Saravana Kannan <saravanak@google.com> wrote:
>>
>> On Tue, Aug 17, 2021 at 12:24 AM Lee Jones <lee.jones@linaro.org> wrote:
>>>
>>> On Mon, 16 Aug 2021, Saravana Kannan wrote:
>>>>>>> I sent out the proper fix as a series:
>>>>>>> https://lore.kernel.org/lkml/20210804214333.927985-1-saravanak@google.com/T/#t
>>>>>>>
>>>>>>> Marc, can you give it a shot please?
>>>>>>>
>>>>>>> -Saravana
>>>>>>
>>>>>> Superstar!  Thanks for taking the time to rectify this for all of us.
>>>>>
>>>>> Just to clarify:
>>>>>
>>>>>   Are we waiting on a subsequent patch submission at this point?
>>>>
>>>> Not that I'm aware of. Andrew added a "Reviewed-by" to all 3 of my
>>>> proper fix patches. I didn't think I needed to send any newer patches.
>>>> Is there some reason you that I needed to?
>>>
>>> Actually, I meant *this* patch.
>>
>> I think it'll be nice if Neil addresses the stuff Marc mentioned
>> (ideally) using the macros I suggested. Not sure if Marc is waiting on
>> that though. Marc also probably wants my mdio-mux series to merge
>> first before he takes this patch. So that it doesn't break networking
>> in his device.
> 
> Yup. Two things need to happen here:
> 
> - the MDIO fixes must be merged (I think they are queued, from what I
>   can see)
> 
> - the irqchip patch must be fixed so that the driver cannot be
>   unloaded (Saravana did explain what needs to be done).

I'll post a re-spin of this serie with the suggested fixes.

Neil

> 
> Once these two condition are met, I'll happily take this patch.
> 
> 	M.
>
diff mbox series

Patch

diff --git a/drivers/irqchip/Kconfig b/drivers/irqchip/Kconfig
index bfc9719dbcdc..04fbae99a429 100644
--- a/drivers/irqchip/Kconfig
+++ b/drivers/irqchip/Kconfig
@@ -410,8 +410,9 @@  config IRQ_UNIPHIER_AIDET
 	  Support for the UniPhier AIDET (ARM Interrupt Detector).
 
 config MESON_IRQ_GPIO
-       bool "Meson GPIO Interrupt Multiplexer"
-       depends on ARCH_MESON
+       tristate "Meson GPIO Interrupt Multiplexer"
+       depends on ARCH_MESON || COMPILE_TEST
+       default ARCH_MESON
        select IRQ_DOMAIN_HIERARCHY
        help
          Support Meson SoC Family GPIO Interrupt Multiplexer
diff --git a/drivers/irqchip/irq-meson-gpio.c b/drivers/irqchip/irq-meson-gpio.c
index bc7aebcc96e9..e3b462bd3981 100644
--- a/drivers/irqchip/irq-meson-gpio.c
+++ b/drivers/irqchip/irq-meson-gpio.c
@@ -15,6 +15,7 @@ 
 #include <linux/irqchip.h>
 #include <linux/of.h>
 #include <linux/of_address.h>
+#include <linux/of_irq.h>
 
 #define NUM_CHANNEL 8
 #define MAX_INPUT_MUX 256
@@ -136,6 +137,7 @@  static const struct of_device_id meson_irq_gpio_matches[] = {
 struct meson_gpio_irq_controller {
 	const struct meson_gpio_irq_params *params;
 	void __iomem *base;
+	struct irq_domain *domain;
 	u32 channel_irqs[NUM_CHANNEL];
 	DECLARE_BITMAP(channel_map, NUM_CHANNEL);
 	spinlock_t lock;
@@ -436,8 +438,8 @@  static const struct irq_domain_ops meson_gpio_irq_domain_ops = {
 	.translate	= meson_gpio_irq_domain_translate,
 };
 
-static int __init meson_gpio_irq_parse_dt(struct device_node *node,
-					  struct meson_gpio_irq_controller *ctl)
+static int meson_gpio_irq_parse_dt(struct device_node *node,
+				   struct meson_gpio_irq_controller *ctl)
 {
 	const struct of_device_id *match;
 	int ret;
@@ -463,63 +465,84 @@  static int __init meson_gpio_irq_parse_dt(struct device_node *node,
 	return 0;
 }
 
-static int __init meson_gpio_irq_of_init(struct device_node *node,
-					 struct device_node *parent)
+static int meson_gpio_intc_probe(struct platform_device *pdev)
 {
-	struct irq_domain *domain, *parent_domain;
+	struct device_node *node = pdev->dev.of_node, *parent;
 	struct meson_gpio_irq_controller *ctl;
+	struct irq_domain *parent_domain;
+	struct resource *res;
 	int ret;
 
+	parent = of_irq_find_parent(node);
 	if (!parent) {
-		pr_err("missing parent interrupt node\n");
+		dev_err(&pdev->dev, "missing parent interrupt node\n");
 		return -ENODEV;
 	}
 
 	parent_domain = irq_find_host(parent);
 	if (!parent_domain) {
-		pr_err("unable to obtain parent domain\n");
+		dev_err(&pdev->dev, "unable to obtain parent domain\n");
 		return -ENXIO;
 	}
 
-	ctl = kzalloc(sizeof(*ctl), GFP_KERNEL);
+	ctl = devm_kzalloc(&pdev->dev, sizeof(*ctl), GFP_KERNEL);
 	if (!ctl)
 		return -ENOMEM;
 
 	spin_lock_init(&ctl->lock);
 
-	ctl->base = of_iomap(node, 0);
-	if (!ctl->base) {
-		ret = -ENOMEM;
-		goto free_ctl;
-	}
+	res = platform_get_resource(pdev, IORESOURCE_MEM, 0);
+	ctl->base = devm_ioremap_resource(&pdev->dev, res);
+	if (IS_ERR(ctl->base))
+		return PTR_ERR(ctl->base);
 
 	ret = meson_gpio_irq_parse_dt(node, ctl);
 	if (ret)
-		goto free_channel_irqs;
-
-	domain = irq_domain_create_hierarchy(parent_domain, 0,
-					     ctl->params->nr_hwirq,
-					     of_node_to_fwnode(node),
-					     &meson_gpio_irq_domain_ops,
-					     ctl);
-	if (!domain) {
-		pr_err("failed to add domain\n");
-		ret = -ENODEV;
-		goto free_channel_irqs;
+		return ret;
+
+	ctl->domain = irq_domain_create_hierarchy(parent_domain, 0,
+						  ctl->params->nr_hwirq,
+						  of_node_to_fwnode(node),
+						  &meson_gpio_irq_domain_ops,
+						  ctl);
+	if (!ctl->domain) {
+		dev_err(&pdev->dev, "failed to add domain\n");
+		return -ENODEV;
 	}
 
-	pr_info("%d to %d gpio interrupt mux initialized\n",
-		ctl->params->nr_hwirq, NUM_CHANNEL);
+	platform_set_drvdata(pdev, ctl);
+
+	dev_info(&pdev->dev, "%d to %d gpio interrupt mux initialized\n",
+		 ctl->params->nr_hwirq, NUM_CHANNEL);
 
 	return 0;
+}
 
-free_channel_irqs:
-	iounmap(ctl->base);
-free_ctl:
-	kfree(ctl);
+static int meson_gpio_intc_remove(struct platform_device *pdev)
+{
+	struct meson_gpio_irq_controller *ctl = platform_get_drvdata(pdev);
 
-	return ret;
+	irq_domain_remove(ctl->domain);
+
+	return 0;
 }
 
-IRQCHIP_DECLARE(meson_gpio_intc, "amlogic,meson-gpio-intc",
-		meson_gpio_irq_of_init);
+static const struct of_device_id meson_gpio_intc_of_match[] = {
+	{ .compatible = "amlogic,meson-gpio-intc", },
+	{},
+};
+MODULE_DEVICE_TABLE(of, meson_gpio_intc_of_match);
+
+static struct platform_driver meson_gpio_intc_driver = {
+	.probe  = meson_gpio_intc_probe,
+	.remove = meson_gpio_intc_remove,
+	.driver = {
+		.name = "meson-gpio-intc",
+		.of_match_table = meson_gpio_intc_of_match,
+	},
+};
+module_platform_driver(meson_gpio_intc_driver);
+
+MODULE_AUTHOR("Jerome Brunet <jbrunet@baylibre.com>");
+MODULE_LICENSE("GPL v2");
+MODULE_ALIAS("platform:meson-gpio-intc");