Message ID | 20211214013800.2703568-1-vladimir.oltean@nxp.com (mailing list archive) |
---|---|
Headers | show |
Series | Do something about ls-extirq interrupt-map breakage | expand |
On Tue, 14 Dec 2021 01:37:50 +0000, Vladimir Oltean <vladimir.oltean@nxp.com> wrote: > > Currently the ls-extirq driver's use of the "interrupt-map" property is > double-broken: > - once by Rob Herring's commit 869f0ec048dc ("arm64: dts: freescale: Fix > 'interrupt-map' parent address cells") > - twice by Marc Zyngier's commit 041284181226 ("of/irq: Allow matching > of an interrupt-map local to an interrupt controller"), later revised, > not very elegantly, through commit de4adddcbcc2 ("of/irq: Add a quirk Elegance is, I'm afraid to say, bloody overrated when dealing with this sort of crap. > for controllers with their own definition of interrupt-map"). So this > part works but we're on an offender list. Define 'part works'. Either it does, or it doesn't. There is no middle ground here. > > Mark suggests that the problem may lie with the ls-extirq driver, and > its interpretation of the "interrupt-map" property, to be exact. s/Mark/Marc/, unless you are talking about someone else (who?). > > This set of changes attempts to make the problem smaller by using a > vendor-specific name for the property, and reverts Rob's patch because > similarity with "interrupt-map" isn't actually a desirable feature after > all, it seems. > > Vladimir Oltean (10): > irqchip/ls-extirq: rename "interrupt-map" OF property to > "fsl,extirq-map" > Revert "arm64: dts: freescale: Fix 'interrupt-map' parent address > cells" > dt-bindings: ls-extirq: replace "interrupt-map" documentation with > "fsl,extirq-map" > arm64: dts: ls1043a: rename the "interrupt-map" of the extirq node to > "fsl,extirq-map" > arm64: dts: ls1046a: rename the "interrupt-map" of the extirq node to > "fsl,extirq-map" > arm64: dts: ls1088a: rename the "interrupt-map" of the extirq node to > "fsl,extirq-map" > arm64: dts: ls208xa: rename the "interrupt-map" of the extirq node to > "fsl,extirq-map" > arm64: dts: lx2160a: rename the "interrupt-map" of the extirq node to > "fsl,extirq-map" > ARM: dts: ls1021a: rename the "interrupt-map" of the extirq node to > "fsl,extirq-map" > dt-bindings: ls-extirq: add a YAML schema for the validator > > .../interrupt-controller/fsl,ls-extirq.txt | 53 --------- > .../interrupt-controller/fsl,ls-extirq.yaml | 110 ++++++++++++++++++ > arch/arm/boot/dts/ls1021a.dtsi | 3 +- > .../arm64/boot/dts/freescale/fsl-ls1043a.dtsi | 3 +- > .../arm64/boot/dts/freescale/fsl-ls1046a.dtsi | 3 +- > .../arm64/boot/dts/freescale/fsl-ls1088a.dtsi | 27 +++-- > .../arm64/boot/dts/freescale/fsl-ls208xa.dtsi | 27 +++-- > .../arm64/boot/dts/freescale/fsl-lx2160a.dtsi | 27 +++-- > drivers/irqchip/irq-ls-extirq.c | 12 +- > 9 files changed, 161 insertions(+), 104 deletions(-) > delete mode 100644 Documentation/devicetree/bindings/interrupt-controller/fsl,ls-extirq.txt > create mode 100644 Documentation/devicetree/bindings/interrupt-controller/fsl,ls-extirq.yaml This is totally pointless. These machines have been in the wild for years, and existing DTs will be there *forever*. The very notion of 'backporting' DT changes is totally ludicrous when it is some firmware (ATF, u-boot, or something else *that isn't under your control*) that provides the DT. It also breaks backward compatibility (old kernel with new DT), which is just as important. Why do you think I went the elegance-deprived route and added a quirk? So no, I'm not taking the irqchip changes, as most of this churn serves no purpose. The revert of 869f0ec048dc is the only thing that makes some sense. M.
Hi Marc (with a c), On Tue, Dec 14, 2021 at 08:51:47AM +0000, Marc Zyngier wrote: > On Tue, 14 Dec 2021 01:37:50 +0000, > Vladimir Oltean <vladimir.oltean@nxp.com> wrote: > > > > Currently the ls-extirq driver's use of the "interrupt-map" property is > > double-broken: > > - once by Rob Herring's commit 869f0ec048dc ("arm64: dts: freescale: Fix > > 'interrupt-map' parent address cells") > > - twice by Marc Zyngier's commit 041284181226 ("of/irq: Allow matching > > of an interrupt-map local to an interrupt controller"), later revised, > > not very elegantly, through commit de4adddcbcc2 ("of/irq: Add a quirk > > Elegance is, I'm afraid to say, bloody overrated when dealing with > this sort of crap. > > > > for controllers with their own definition of interrupt-map"). So this > > part works but we're on an offender list. > > Define 'part works'. Either it does, or it doesn't. There is no middle > ground here. "Part" is the subject, "works" is the predicate. It is a part of a larger set of changes that now works after some breakage. > > > > Mark suggests that the problem may lie with the ls-extirq driver, and > > its interpretation of the "interrupt-map" property, to be exact. > > s/Mark/Marc/, unless you are talking about someone else (who?). Twas a typo, my hand must have slipped. This will not happen again. > > This set of changes attempts to make the problem smaller by using a > > vendor-specific name for the property, and reverts Rob's patch because > > similarity with "interrupt-map" isn't actually a desirable feature after > > all, it seems. > > > > Vladimir Oltean (10): > > irqchip/ls-extirq: rename "interrupt-map" OF property to > > "fsl,extirq-map" > > Revert "arm64: dts: freescale: Fix 'interrupt-map' parent address > > cells" > > dt-bindings: ls-extirq: replace "interrupt-map" documentation with > > "fsl,extirq-map" > > arm64: dts: ls1043a: rename the "interrupt-map" of the extirq node to > > "fsl,extirq-map" > > arm64: dts: ls1046a: rename the "interrupt-map" of the extirq node to > > "fsl,extirq-map" > > arm64: dts: ls1088a: rename the "interrupt-map" of the extirq node to > > "fsl,extirq-map" > > arm64: dts: ls208xa: rename the "interrupt-map" of the extirq node to > > "fsl,extirq-map" > > arm64: dts: lx2160a: rename the "interrupt-map" of the extirq node to > > "fsl,extirq-map" > > ARM: dts: ls1021a: rename the "interrupt-map" of the extirq node to > > "fsl,extirq-map" > > dt-bindings: ls-extirq: add a YAML schema for the validator > > > > .../interrupt-controller/fsl,ls-extirq.txt | 53 --------- > > .../interrupt-controller/fsl,ls-extirq.yaml | 110 ++++++++++++++++++ > > arch/arm/boot/dts/ls1021a.dtsi | 3 +- > > .../arm64/boot/dts/freescale/fsl-ls1043a.dtsi | 3 +- > > .../arm64/boot/dts/freescale/fsl-ls1046a.dtsi | 3 +- > > .../arm64/boot/dts/freescale/fsl-ls1088a.dtsi | 27 +++-- > > .../arm64/boot/dts/freescale/fsl-ls208xa.dtsi | 27 +++-- > > .../arm64/boot/dts/freescale/fsl-lx2160a.dtsi | 27 +++-- > > drivers/irqchip/irq-ls-extirq.c | 12 +- > > 9 files changed, 161 insertions(+), 104 deletions(-) > > delete mode 100644 Documentation/devicetree/bindings/interrupt-controller/fsl,ls-extirq.txt > > create mode 100644 Documentation/devicetree/bindings/interrupt-controller/fsl,ls-extirq.yaml > > This is totally pointless. These machines have been in the wild for > years, and existing DTs will be there *forever*. The very notion of > 'backporting' DT changes is totally ludicrous when it is some firmware > (ATF, u-boot, or something else *that isn't under your control*) that > provides the DT. It also breaks backward compatibility (old kernel > with new DT), which is just as important. Why do you think I went the > elegance-deprived route and added a quirk? I wish the firmware for these SoCs was smart enough to be compatible with the bindings that are in the kernel and provide a blob that the kernel could actually use. Some work has been started there and this is work in progress. True, I don't know what other OF-based firmware some other customers may use, but I trust it isn't a lot more advanced than what U-Boot currently has :) Also, the machines may have been in the wild for years, but the ls-extirq driver was added in November 2019. So not with the introduction of the SoC device trees themselves. That isn't so long ago. As for compatibility between old kernel and new DT: I guess you'll hear various opinions on this one. https://www.spinics.net/lists/linux-mips/msg07778.html | > Are we okay with the new device tree blobs breaking the old kernel? | | From my point of view, newer device trees are not required to work on | older kernel, this would impose an unreasonable limitation and the use | case is very limited. Sadly we're at a point where we cannot have software any longer that works with all device trees in circulation, after Rob's change, because the ls-extirq driver won't know what is the expected correct length of an "interrupt-map"/"fsl,extirq-map"/whatever-you-want-to-call-it specifier. And even with the revert, the argument you've brought to the table still holds: any kernel running on a device tree with Rob's change will stay broken no matter what we do. I'd like to take a more constructive approach and see what we can do going forward (in the direction of the arrow of time, in case that's not clear), at least. > So no, I'm not taking the irqchip changes, as most of this churn > serves no purpose. The revert of 869f0ec048dc is the only thing that > makes some sense. The devicetree changes were for Shawn to pick up anyway. But I posted the entire series as RFC to gather comments. Other opinions still welcome.
On Tue, 14 Dec 2021 09:58:54 +0000, Vladimir Oltean <vladimir.oltean@nxp.com> wrote: > > Hi Marc (with a c), > > I wish the firmware for these SoCs was smart enough to be compatible > with the bindings that are in the kernel and provide a blob that the > kernel could actually use. Some work has been started there and this is > work in progress. True, I don't know what other OF-based firmware some > other customers may use, but I trust it isn't a lot more advanced than > what U-Boot currently has :) > > Also, the machines may have been in the wild for years, but the > ls-extirq driver was added in November 2019. So not with the > introduction of the SoC device trees themselves. That isn't so long ago. > > As for compatibility between old kernel and new DT: I guess you'll hear > various opinions on this one. > https://www.spinics.net/lists/linux-mips/msg07778.html > > | > Are we okay with the new device tree blobs breaking the old kernel? > | > | From my point of view, newer device trees are not required to work on > | older kernel, this would impose an unreasonable limitation and the use > | case is very limited. My views are on the opposite side. DT is an ABI, full stop. If you change something, you *must* guarantee forward *and* backward compatibility. That's because: - you don't control how updatable the firmware is - people may need to revert to other versions of the kernel because the new one is broken - there are plenty of DT users beyond Linux, and we are not creating bindings for Linux only. You may disagree with this, but for the subsystems I maintain, this is the rule I intent to stick to. M.
On Tue, Dec 14, 2021 at 10:20:36AM +0000, Marc Zyngier wrote: > On Tue, 14 Dec 2021 09:58:54 +0000, > Vladimir Oltean <vladimir.oltean@nxp.com> wrote: > > > > Hi Marc (with a c), > > > > I wish the firmware for these SoCs was smart enough to be compatible > > with the bindings that are in the kernel and provide a blob that the > > kernel could actually use. Some work has been started there and this is > > work in progress. True, I don't know what other OF-based firmware some > > other customers may use, but I trust it isn't a lot more advanced than > > what U-Boot currently has :) > > > > Also, the machines may have been in the wild for years, but the > > ls-extirq driver was added in November 2019. So not with the > > introduction of the SoC device trees themselves. That isn't so long ago. > > > > As for compatibility between old kernel and new DT: I guess you'll hear > > various opinions on this one. > > https://www.spinics.net/lists/linux-mips/msg07778.html > > > > | > Are we okay with the new device tree blobs breaking the old kernel? > > | > > | From my point of view, newer device trees are not required to work on > > | older kernel, this would impose an unreasonable limitation and the use > > | case is very limited. > > My views are on the opposite side. DT is an ABI, full stop. If you > change something, you *must* guarantee forward *and* backward > compatibility. That's because: > > - you don't control how updatable the firmware is > > - people may need to revert to other versions of the kernel because > the new one is broken > > - there are plenty of DT users beyond Linux, and we are not creating > bindings for Linux only. > > You may disagree with this, but for the subsystems I maintain, this is > the rule I intent to stick to. That's an honorable set of guiding principles, but how do you apply them here? Reverting Rob's change won't fix the past, and updating the code to account for one format will break the other. As for trying one format, and if there's an error try the other, there may be situations in which you accept invalid input as valid.
On Tue, 14 Dec 2021 10:30:26 +0000, Vladimir Oltean <vladimir.oltean@nxp.com> wrote: > > On Tue, Dec 14, 2021 at 10:20:36AM +0000, Marc Zyngier wrote: > > On Tue, 14 Dec 2021 09:58:54 +0000, > > Vladimir Oltean <vladimir.oltean@nxp.com> wrote: > > > > > > Hi Marc (with a c), > > > > > > I wish the firmware for these SoCs was smart enough to be compatible > > > with the bindings that are in the kernel and provide a blob that the > > > kernel could actually use. Some work has been started there and this is > > > work in progress. True, I don't know what other OF-based firmware some > > > other customers may use, but I trust it isn't a lot more advanced than > > > what U-Boot currently has :) > > > > > > Also, the machines may have been in the wild for years, but the > > > ls-extirq driver was added in November 2019. So not with the > > > introduction of the SoC device trees themselves. That isn't so long ago. > > > > > > As for compatibility between old kernel and new DT: I guess you'll hear > > > various opinions on this one. > > > https://www.spinics.net/lists/linux-mips/msg07778.html > > > > > > | > Are we okay with the new device tree blobs breaking the old kernel? > > > | > > > | From my point of view, newer device trees are not required to work on > > > | older kernel, this would impose an unreasonable limitation and the use > > > | case is very limited. > > > > My views are on the opposite side. DT is an ABI, full stop. If you > > change something, you *must* guarantee forward *and* backward > > compatibility. That's because: > > > > - you don't control how updatable the firmware is > > > > - people may need to revert to other versions of the kernel because > > the new one is broken > > > > - there are plenty of DT users beyond Linux, and we are not creating > > bindings for Linux only. > > > > You may disagree with this, but for the subsystems I maintain, this is > > the rule I intent to stick to. > > That's an honorable set of guiding principles, but how do you apply them > here? Reverting Rob's change won't fix the past, and updating the code > to account for one format will break the other. As for trying one > format, and if there's an error try the other, there may be situations > in which you accept invalid input as valid. maz@hot-poop:~/arm-platforms$ git describe --contains 869f0ec048dc --match=v\* v5.16-rc1~125^2~19^2~16 This patch landed in -rc1, and isn't part of any release. Just revert it, and no damage is done. M.
On Tue, Dec 14, 2021 at 10:39:35AM +0000, Marc Zyngier wrote: > On Tue, 14 Dec 2021 10:30:26 +0000, > Vladimir Oltean <vladimir.oltean@nxp.com> wrote: > > > > On Tue, Dec 14, 2021 at 10:20:36AM +0000, Marc Zyngier wrote: > > > On Tue, 14 Dec 2021 09:58:54 +0000, > > > Vladimir Oltean <vladimir.oltean@nxp.com> wrote: > > > > > > > > Hi Marc (with a c), > > > > > > > > I wish the firmware for these SoCs was smart enough to be compatible > > > > with the bindings that are in the kernel and provide a blob that the > > > > kernel could actually use. Some work has been started there and this is > > > > work in progress. True, I don't know what other OF-based firmware some > > > > other customers may use, but I trust it isn't a lot more advanced than > > > > what U-Boot currently has :) > > > > > > > > Also, the machines may have been in the wild for years, but the > > > > ls-extirq driver was added in November 2019. So not with the > > > > introduction of the SoC device trees themselves. That isn't so long ago. > > > > > > > > As for compatibility between old kernel and new DT: I guess you'll hear > > > > various opinions on this one. > > > > https://www.spinics.net/lists/linux-mips/msg07778.html > > > > > > > > | > Are we okay with the new device tree blobs breaking the old kernel? > > > > | > > > > | From my point of view, newer device trees are not required to work on > > > > | older kernel, this would impose an unreasonable limitation and the use > > > > | case is very limited. > > > > > > My views are on the opposite side. DT is an ABI, full stop. If you > > > change something, you *must* guarantee forward *and* backward > > > compatibility. That's because: > > > > > > - you don't control how updatable the firmware is > > > > > > - people may need to revert to other versions of the kernel because > > > the new one is broken > > > > > > - there are plenty of DT users beyond Linux, and we are not creating > > > bindings for Linux only. > > > > > > You may disagree with this, but for the subsystems I maintain, this is > > > the rule I intent to stick to. > > > > That's an honorable set of guiding principles, but how do you apply them > > here? Reverting Rob's change won't fix the past, and updating the code > > to account for one format will break the other. As for trying one > > format, and if there's an error try the other, there may be situations > > in which you accept invalid input as valid. > > maz@hot-poop:~/arm-platforms$ git describe --contains 869f0ec048dc --match=v\* > v5.16-rc1~125^2~19^2~16 > > This patch landed in -rc1, and isn't part of any release. Just revert > it, and no damage is done. The revert is one of the patches posted here. It will fix the problem short-term but it may not be enough long-term. I think Rob is working on some sort of validation for "interrupt-map" and this is how the apparently non-conformant property was brought to his attention. It will trigger validation warnings that I'm afraid will be tempting for many to "fix". Thus the rest of the patches. Maybe it's just me, but between having to play a whack-a-mole game and snapping compatibility of old kernels with new DT blobs, I think more time is lost with the latter.
On Tue, 14 Dec 2021 10:53:16 +0000, Vladimir Oltean <vladimir.oltean@nxp.com> wrote: > > On Tue, Dec 14, 2021 at 10:39:35AM +0000, Marc Zyngier wrote: > > On Tue, 14 Dec 2021 10:30:26 +0000, > > Vladimir Oltean <vladimir.oltean@nxp.com> wrote: > > > > > > On Tue, Dec 14, 2021 at 10:20:36AM +0000, Marc Zyngier wrote: > > > > On Tue, 14 Dec 2021 09:58:54 +0000, > > > > Vladimir Oltean <vladimir.oltean@nxp.com> wrote: > > > > > > > > > > Hi Marc (with a c), > > > > > > > > > > I wish the firmware for these SoCs was smart enough to be compatible > > > > > with the bindings that are in the kernel and provide a blob that the > > > > > kernel could actually use. Some work has been started there and this is > > > > > work in progress. True, I don't know what other OF-based firmware some > > > > > other customers may use, but I trust it isn't a lot more advanced than > > > > > what U-Boot currently has :) > > > > > > > > > > Also, the machines may have been in the wild for years, but the > > > > > ls-extirq driver was added in November 2019. So not with the > > > > > introduction of the SoC device trees themselves. That isn't so long ago. > > > > > > > > > > As for compatibility between old kernel and new DT: I guess you'll hear > > > > > various opinions on this one. > > > > > https://www.spinics.net/lists/linux-mips/msg07778.html > > > > > > > > > > | > Are we okay with the new device tree blobs breaking the old kernel? > > > > > | > > > > > | From my point of view, newer device trees are not required to work on > > > > > | older kernel, this would impose an unreasonable limitation and the use > > > > > | case is very limited. > > > > > > > > My views are on the opposite side. DT is an ABI, full stop. If you > > > > change something, you *must* guarantee forward *and* backward > > > > compatibility. That's because: > > > > > > > > - you don't control how updatable the firmware is > > > > > > > > - people may need to revert to other versions of the kernel because > > > > the new one is broken > > > > > > > > - there are plenty of DT users beyond Linux, and we are not creating > > > > bindings for Linux only. > > > > > > > > You may disagree with this, but for the subsystems I maintain, this is > > > > the rule I intent to stick to. > > > > > > That's an honorable set of guiding principles, but how do you apply them > > > here? Reverting Rob's change won't fix the past, and updating the code > > > to account for one format will break the other. As for trying one > > > format, and if there's an error try the other, there may be situations > > > in which you accept invalid input as valid. > > > > maz@hot-poop:~/arm-platforms$ git describe --contains 869f0ec048dc --match=v\* > > v5.16-rc1~125^2~19^2~16 > > > > This patch landed in -rc1, and isn't part of any release. Just revert > > it, and no damage is done. > > The revert is one of the patches posted here. It will fix the problem > short-term but it may not be enough long-term. I think Rob is working on > some sort of validation for "interrupt-map" and this is how the apparently > non-conformant property was brought to his attention. It will trigger > validation warnings that I'm afraid will be tempting for many to "fix". Then build an annotation mechanism for the warning not to fire for quirked systems. > Thus the rest of the patches. Maybe it's just me, but between having to > play a whack-a-mole game and snapping compatibility of old kernels with > new DT blobs, I think more time is lost with the latter. I said what I had to say on the subject, and when it comes to wasted time, that's more than enough. M.
Hello Marc, On Tue, Dec 14, 2021 at 10:20:36AM +0000, Marc Zyngier wrote: > On Tue, 14 Dec 2021 09:58:54 +0000, > Vladimir Oltean <vladimir.oltean@nxp.com> wrote: > > > > Hi Marc (with a c), > > > > I wish the firmware for these SoCs was smart enough to be compatible > > with the bindings that are in the kernel and provide a blob that the > > kernel could actually use. Some work has been started there and this is > > work in progress. True, I don't know what other OF-based firmware some > > other customers may use, but I trust it isn't a lot more advanced than > > what U-Boot currently has :) > > > > Also, the machines may have been in the wild for years, but the > > ls-extirq driver was added in November 2019. So not with the > > introduction of the SoC device trees themselves. That isn't so long ago. > > > > As for compatibility between old kernel and new DT: I guess you'll hear > > various opinions on this one. > > https://www.spinics.net/lists/linux-mips/msg07778.html > > > > | > Are we okay with the new device tree blobs breaking the old kernel? > > | > > | From my point of view, newer device trees are not required to work on > > | older kernel, this would impose an unreasonable limitation and the use > > | case is very limited. > > My views are on the opposite side. DT is an ABI, full stop. If you > change something, you *must* guarantee forward *and* backward > compatibility. That's because: > > - you don't control how updatable the firmware is > > - people may need to revert to other versions of the kernel because > the new one is broken > > - there are plenty of DT users beyond Linux, and we are not creating > bindings for Linux only. > > You may disagree with this, but for the subsystems I maintain, this is > the rule I intent to stick to. > > M. > > -- > Without deviation from the norm, progress is not possible. I was just debugging an interesting issue with an old kernel not working with a new DT blob, and after figuring out what the problem was (is), I remembered this message and I'm curious what you have to say about it. I have this DT layout: ethernet-phy@1 { reg = <0x1>; interrupts-extended = <&extirq 2 IRQ_TYPE_LEVEL_LOW>; }; extirq: interrupt-controller@1ac { compatible = "fsl,ls1021a-extirq"; <bla bla> }; I booted the new DT blob (which has "interrupts-extended") on a kernel where the ls-extirq driver did not exist. This had the result of of_mdiobus_phy_device_register() -> of_irq_get() returning -EPROBE_DEFER forever and ever. So the PHY driver in turn never probed, and Ethernet was broken. So I had to delete the interrupts OF property to let the PHY at least work in poll mode. What went wrong here in your opinion?
On Thu, 24 Mar 2022 17:10:42 +0000, Vladimir Oltean <vladimir.oltean@nxp.com> wrote: > > Hello Marc, > > On Tue, Dec 14, 2021 at 10:20:36AM +0000, Marc Zyngier wrote: > > On Tue, 14 Dec 2021 09:58:54 +0000, > > Vladimir Oltean <vladimir.oltean@nxp.com> wrote: > > > > > > Hi Marc (with a c), > > > > > > I wish the firmware for these SoCs was smart enough to be compatible > > > with the bindings that are in the kernel and provide a blob that the > > > kernel could actually use. Some work has been started there and this is > > > work in progress. True, I don't know what other OF-based firmware some > > > other customers may use, but I trust it isn't a lot more advanced than > > > what U-Boot currently has :) > > > > > > Also, the machines may have been in the wild for years, but the > > > ls-extirq driver was added in November 2019. So not with the > > > introduction of the SoC device trees themselves. That isn't so long ago. > > > > > > As for compatibility between old kernel and new DT: I guess you'll hear > > > various opinions on this one. > > > https://www.spinics.net/lists/linux-mips/msg07778.html > > > > > > | > Are we okay with the new device tree blobs breaking the old kernel? > > > | > > > | From my point of view, newer device trees are not required to work on > > > | older kernel, this would impose an unreasonable limitation and the use > > > | case is very limited. > > > > My views are on the opposite side. DT is an ABI, full stop. If you > > change something, you *must* guarantee forward *and* backward > > compatibility. That's because: > > > > - you don't control how updatable the firmware is > > > > - people may need to revert to other versions of the kernel because > > the new one is broken > > > > - there are plenty of DT users beyond Linux, and we are not creating > > bindings for Linux only. > > > > You may disagree with this, but for the subsystems I maintain, this is > > the rule I intent to stick to. > > > > M. > > > > -- > > Without deviation from the norm, progress is not possible. > > I was just debugging an interesting issue with an old kernel not working > with a new DT blob, and after figuring out what the problem was (is), > I remembered this message and I'm curious what you have to say about it. > > I have this DT layout: > > ethernet-phy@1 { > reg = <0x1>; > interrupts-extended = <&extirq 2 IRQ_TYPE_LEVEL_LOW>; > }; > > extirq: interrupt-controller@1ac { > compatible = "fsl,ls1021a-extirq"; > <bla bla> > }; > > I booted the new DT blob (which has "interrupts-extended") on a kernel > where the ls-extirq driver did not exist. This had the result of > of_mdiobus_phy_device_register() -> of_irq_get() returning -EPROBE_DEFER > forever and ever. So the PHY driver in turn never probed, and Ethernet > was broken. So I had to delete the interrupts OF property to let the PHY > at least work in poll mode. > > What went wrong here in your opinion? I'm not sure what you expect me to say here. You have a device that references an interrupt. The DT seems sound (I don't get why you think "interrupt-extended" is a problem here, but hey...). If your kernel doesn't have a driver for the interrupt controller referenced here, what do you expect, other than things not working? M.
On Thu, Mar 24, 2022 at 05:21:50PM +0000, Marc Zyngier wrote: > On Thu, 24 Mar 2022 17:10:42 +0000, > Vladimir Oltean <vladimir.oltean@nxp.com> wrote: > > > > Hello Marc, > > > > On Tue, Dec 14, 2021 at 10:20:36AM +0000, Marc Zyngier wrote: > > > On Tue, 14 Dec 2021 09:58:54 +0000, > > > Vladimir Oltean <vladimir.oltean@nxp.com> wrote: > > > > > > > > Hi Marc (with a c), > > > > > > > > I wish the firmware for these SoCs was smart enough to be compatible > > > > with the bindings that are in the kernel and provide a blob that the > > > > kernel could actually use. Some work has been started there and this is > > > > work in progress. True, I don't know what other OF-based firmware some > > > > other customers may use, but I trust it isn't a lot more advanced than > > > > what U-Boot currently has :) > > > > > > > > Also, the machines may have been in the wild for years, but the > > > > ls-extirq driver was added in November 2019. So not with the > > > > introduction of the SoC device trees themselves. That isn't so long ago. > > > > > > > > As for compatibility between old kernel and new DT: I guess you'll hear > > > > various opinions on this one. > > > > https://www.spinics.net/lists/linux-mips/msg07778.html > > > > > > > > | > Are we okay with the new device tree blobs breaking the old kernel? > > > > | > > > > | From my point of view, newer device trees are not required to work on > > > > | older kernel, this would impose an unreasonable limitation and the use > > > > | case is very limited. > > > > > > My views are on the opposite side. DT is an ABI, full stop. If you > > > change something, you *must* guarantee forward *and* backward > > > compatibility. That's because: > > > > > > - you don't control how updatable the firmware is > > > > > > - people may need to revert to other versions of the kernel because > > > the new one is broken > > > > > > - there are plenty of DT users beyond Linux, and we are not creating > > > bindings for Linux only. > > > > > > You may disagree with this, but for the subsystems I maintain, this is > > > the rule I intent to stick to. > > > > > > M. > > > > > > -- > > > Without deviation from the norm, progress is not possible. > > > > I was just debugging an interesting issue with an old kernel not working > > with a new DT blob, and after figuring out what the problem was (is), > > I remembered this message and I'm curious what you have to say about it. > > > > I have this DT layout: > > > > ethernet-phy@1 { > > reg = <0x1>; > > interrupts-extended = <&extirq 2 IRQ_TYPE_LEVEL_LOW>; > > }; > > > > extirq: interrupt-controller@1ac { > > compatible = "fsl,ls1021a-extirq"; > > <bla bla> > > }; > > > > I booted the new DT blob (which has "interrupts-extended") on a kernel > > where the ls-extirq driver did not exist. This had the result of > > of_mdiobus_phy_device_register() -> of_irq_get() returning -EPROBE_DEFER > > forever and ever. So the PHY driver in turn never probed, and Ethernet > > was broken. So I had to delete the interrupts OF property to let the PHY > > at least work in poll mode. > > > > What went wrong here in your opinion? > > I'm not sure what you expect me to say here. You have a device that > references an interrupt. The DT seems sound (I don't get why you think > "interrupt-extended" is a problem here, but hey...). > > If your kernel doesn't have a driver for the interrupt controller > referenced here, what do you expect, other than things not working? > > M. > > -- > Without deviation from the norm, progress is not possible. I was just raising this as what I thought would be a simple and non-controversial counter example to your remark "If you change something, you *must* guarantee forward *and* backward compatibility." Practically speaking, what has happened is that the board DT appeared in kernel N, the ls-extirq driver in kernel N+1, and the DT was updated to enable PHY interrupts in kernel N+2. That DT update practically broke kernel N from running correctly on DTs taken from kernel N+2 onwards. This is the observable behavior, we can find as many justifications for it as we wish. (as to what I expect, Ethernet PHYs work without an interrupt too, but of_mdiobus_phy_device_register() treats -EPROBE_DEFER from of_irq_get() as special, because it assumes the IRQ domain will eventually come up. The IRQ is optional, as evidenced by the fact that kernel N worked)
On Thu, 24 Mar 2022 17:34:06 +0000, Vladimir Oltean <vladimir.oltean@nxp.com> wrote: > > On Thu, Mar 24, 2022 at 05:21:50PM +0000, Marc Zyngier wrote: > > On Thu, 24 Mar 2022 17:10:42 +0000, > > Vladimir Oltean <vladimir.oltean@nxp.com> wrote: > > > > > > Hello Marc, > > > > > > On Tue, Dec 14, 2021 at 10:20:36AM +0000, Marc Zyngier wrote: > > > > On Tue, 14 Dec 2021 09:58:54 +0000, > > > > Vladimir Oltean <vladimir.oltean@nxp.com> wrote: > > > > > > > > > > Hi Marc (with a c), > > > > > > > > > > I wish the firmware for these SoCs was smart enough to be compatible > > > > > with the bindings that are in the kernel and provide a blob that the > > > > > kernel could actually use. Some work has been started there and this is > > > > > work in progress. True, I don't know what other OF-based firmware some > > > > > other customers may use, but I trust it isn't a lot more advanced than > > > > > what U-Boot currently has :) > > > > > > > > > > Also, the machines may have been in the wild for years, but the > > > > > ls-extirq driver was added in November 2019. So not with the > > > > > introduction of the SoC device trees themselves. That isn't so long ago. > > > > > > > > > > As for compatibility between old kernel and new DT: I guess you'll hear > > > > > various opinions on this one. > > > > > https://www.spinics.net/lists/linux-mips/msg07778.html > > > > > > > > > > | > Are we okay with the new device tree blobs breaking the old kernel? > > > > > | > > > > > | From my point of view, newer device trees are not required to work on > > > > > | older kernel, this would impose an unreasonable limitation and the use > > > > > | case is very limited. > > > > > > > > My views are on the opposite side. DT is an ABI, full stop. If you > > > > change something, you *must* guarantee forward *and* backward > > > > compatibility. That's because: > > > > > > > > - you don't control how updatable the firmware is > > > > > > > > - people may need to revert to other versions of the kernel because > > > > the new one is broken > > > > > > > > - there are plenty of DT users beyond Linux, and we are not creating > > > > bindings for Linux only. > > > > > > > > You may disagree with this, but for the subsystems I maintain, this is > > > > the rule I intent to stick to. > > > > > > > > M. > > > > > > > > -- > > > > Without deviation from the norm, progress is not possible. > > > > > > I was just debugging an interesting issue with an old kernel not working > > > with a new DT blob, and after figuring out what the problem was (is), > > > I remembered this message and I'm curious what you have to say about it. > > > > > > I have this DT layout: > > > > > > ethernet-phy@1 { > > > reg = <0x1>; > > > interrupts-extended = <&extirq 2 IRQ_TYPE_LEVEL_LOW>; > > > }; > > > > > > extirq: interrupt-controller@1ac { > > > compatible = "fsl,ls1021a-extirq"; > > > <bla bla> > > > }; > > > > > > I booted the new DT blob (which has "interrupts-extended") on a kernel > > > where the ls-extirq driver did not exist. This had the result of > > > of_mdiobus_phy_device_register() -> of_irq_get() returning -EPROBE_DEFER > > > forever and ever. So the PHY driver in turn never probed, and Ethernet > > > was broken. So I had to delete the interrupts OF property to let the PHY > > > at least work in poll mode. > > > > > > What went wrong here in your opinion? > > > > I'm not sure what you expect me to say here. You have a device that > > references an interrupt. The DT seems sound (I don't get why you think > > "interrupt-extended" is a problem here, but hey...). > > > > If your kernel doesn't have a driver for the interrupt controller > > referenced here, what do you expect, other than things not working? > > > > M. > > > > -- > > Without deviation from the norm, progress is not possible. > > I was just raising this as what I thought would be a simple and > non-controversial counter example to your remark "If you change something, > you *must* guarantee forward *and* backward compatibility." If you change something *in the binding*, which was implicit in the context, and makes no sense out of context. > > Practically speaking, what has happened is that the board DT appeared in > kernel N, the ls-extirq driver in kernel N+1, and the DT was updated to > enable PHY interrupts in kernel N+2. That DT update practically broke > kernel N from running correctly on DTs taken from kernel N+2 onwards. > This is the observable behavior, we can find as many justifications for > it as we wish. Well, you can also argue that the DT was broken at N and N+1 for not describing the HW correctly and completely. No binding has changed here. Your DT was incomplete, and someone fixed it for you. We can argue this things forever and a half. I've laid down the ground rules for the stuff I maintain. If you're not happy with this, you can fix it by either removing the NXP hardware from the tree, or taking over from me as the irqchip maintainer. I'd be perfectly happy with any (and even more, with both) of these outcomes. M.
On Thu, Mar 24, 2022 at 06:06:51PM +0000, Marc Zyngier wrote: > > I was just raising this as what I thought would be a simple and > > non-controversial counter example to your remark "If you change something, > > you *must* guarantee forward *and* backward compatibility." > > If you change something *in the binding*, which was implicit in the > context, and makes no sense out of context. > > > Practically speaking, what has happened is that the board DT appeared in > > kernel N, the ls-extirq driver in kernel N+1, and the DT was updated to > > enable PHY interrupts in kernel N+2. That DT update practically broke > > kernel N from running correctly on DTs taken from kernel N+2 onwards. > > This is the observable behavior, we can find as many justifications for > > it as we wish. > > Well, you can also argue that the DT was broken at N and N+1 for not > describing the HW correctly and completely. No binding has changed > here. Your DT was incomplete, and someone fixed it for you. > > We can argue this things forever and a half. I've laid down the ground > rules for the stuff I maintain. If you're not happy with this, you can > fix it by either removing the NXP hardware from the tree, or taking > over from me as the irqchip maintainer. I'd be perfectly happy with > any (and even more, with both) of these outcomes. Ok, my intention wasn't to inflame you even though the way in which I presented the problem might have suggested otherwise. With my developer hat I still don't agree with you even with the additional clarification you've made that you were referring only to bindings and not to any and all DT changes. The reason being that the DT blob is a whole, and it doesn't matter if there's a regression because of a binding change or something else, you still need to be prepared to update it, sometimes in lockstep with the kernel, like it or not. But as a user, I just wanted to get an opinion from you what can we do to deal better with this situation: optional interrupt provided by device with missing driver, which of_irq_get() doesn't seem to understand. There are more angles to this than just "new DT with old kernel". It can also be new kernel, but ls-extirq driver disabled, and I still see that as a kernel <-> DT compatibility concern.
On Thu, 24 Mar 2022 19:09:05 +0000, Vladimir Oltean <vladimir.oltean@nxp.com> wrote: > > On Thu, Mar 24, 2022 at 06:06:51PM +0000, Marc Zyngier wrote: > > > I was just raising this as what I thought would be a simple and > > > non-controversial counter example to your remark "If you change something, > > > you *must* guarantee forward *and* backward compatibility." > > > > If you change something *in the binding*, which was implicit in the > > context, and makes no sense out of context. > > > > > Practically speaking, what has happened is that the board DT appeared in > > > kernel N, the ls-extirq driver in kernel N+1, and the DT was updated to > > > enable PHY interrupts in kernel N+2. That DT update practically broke > > > kernel N from running correctly on DTs taken from kernel N+2 onwards. > > > This is the observable behavior, we can find as many justifications for > > > it as we wish. > > > > Well, you can also argue that the DT was broken at N and N+1 for not > > describing the HW correctly and completely. No binding has changed > > here. Your DT was incomplete, and someone fixed it for you. > > > > We can argue this things forever and a half. I've laid down the ground > > rules for the stuff I maintain. If you're not happy with this, you can > > fix it by either removing the NXP hardware from the tree, or taking > > over from me as the irqchip maintainer. I'd be perfectly happy with > > any (and even more, with both) of these outcomes. > > Ok, my intention wasn't to inflame you even though the way in which I > presented the problem might have suggested otherwise. > > With my developer hat I still don't agree with you even with the > additional clarification you've made that you were referring only to > bindings and not to any and all DT changes. The reason being that the DT > blob is a whole, and it doesn't matter if there's a regression because > of a binding change or something else, you still need to be prepared to > update it, sometimes in lockstep with the kernel, like it or not. <rant> No. This doesn't happen on systems that ship the DT as part of their firmware, and this doesn't happen with ACPI either. This only happens on platform that are maintained like the NXP, Marvell, and other similar platforms that are being used as a job security tool by doing piecemeal enablement. Properly maintained systems have had the same DT for years. Features have been added over time, yet without breaking compatibility in either direction. Yes, it requires some effort and planning. And even quirks at times. Yet they don't break. Amusingly, some of these better supported platforms do not have their DT in the kernel tree. Synquacer, for example. Keeping the DTs in the kernel tree has been one of the worse decision we have ever made. It has simply moved the board files of old to a different place, under the guise of separating description and code. In practice, it abstracted nothing at all, only made it more complicated because people are treating DT as an integral part of the kernel code base, which it really shouldn't be. </rant> > But as a user, I just wanted to get an opinion from you what can we do > to deal better with this situation: optional interrupt provided by > device with missing driver, which of_irq_get() doesn't seem to understand. > There are more angles to this than just "new DT with old kernel". It can > also be new kernel, but ls-extirq driver disabled, and I still see that > as a kernel <-> DT compatibility concern. If you're missing a driver, that's a user error. Or rather, a platform maintainer error for not establishing the correct dependencies. This has nothing to do with the DT. As for optional interrupts, that has nothing to do with the DT either, but with the kernel code that requests it. If you think the kernel should do better, you can always post a patch. And I'm done on that subject. M.
On 2022-03-24 19:09, Vladimir Oltean wrote: > On Thu, Mar 24, 2022 at 06:06:51PM +0000, Marc Zyngier wrote: >>> I was just raising this as what I thought would be a simple and >>> non-controversial counter example to your remark "If you change something, >>> you *must* guarantee forward *and* backward compatibility." >> >> If you change something *in the binding*, which was implicit in the >> context, and makes no sense out of context. >> >>> Practically speaking, what has happened is that the board DT appeared in >>> kernel N, the ls-extirq driver in kernel N+1, and the DT was updated to >>> enable PHY interrupts in kernel N+2. That DT update practically broke >>> kernel N from running correctly on DTs taken from kernel N+2 onwards. >>> This is the observable behavior, we can find as many justifications for >>> it as we wish. >> >> Well, you can also argue that the DT was broken at N and N+1 for not >> describing the HW correctly and completely. No binding has changed >> here. Your DT was incomplete, and someone fixed it for you. >> >> We can argue this things forever and a half. I've laid down the ground >> rules for the stuff I maintain. If you're not happy with this, you can >> fix it by either removing the NXP hardware from the tree, or taking >> over from me as the irqchip maintainer. I'd be perfectly happy with >> any (and even more, with both) of these outcomes. > > Ok, my intention wasn't to inflame you even though the way in which I > presented the problem might have suggested otherwise. > > With my developer hat I still don't agree with you even with the > additional clarification you've made that you were referring only to > bindings and not to any and all DT changes. The reason being that the DT > blob is a whole, and it doesn't matter if there's a regression because > of a binding change or something else, you still need to be prepared to > update it, sometimes in lockstep with the kernel, like it or not. > > But as a user, I just wanted to get an opinion from you what can we do > to deal better with this situation: optional interrupt provided by > device with missing driver, which of_irq_get() doesn't seem to understand. FWIW, of_irq_get() absolutely understands how to handle a missing IRQ provider driver; it returns -EPROBE_DEFER. If a caller considers the IRQ optional, then it's up to that caller to decide how long to keep waiting for the provider to appear until giving up and carrying on without it. If your phy driver is making the dumb decision to wait for ever for something which isn't critical, then you're free to fix it, or perhaps even propose for of_irq_get() to opt in to the driver_deferred_probe_check_state() mechanism if you believe it's a sufficiently general case. If a new DT with an additional new property (either on an existing machine, or on a completely new machine which has the property from the start) exposes a bug in a driver, that's unfortunate, but it is entirely irrelevant to the ABI implications of changing the interpretation of an existing property. Robin.
On Fri, Mar 25, 2022 at 10:34:13AM +0000, Robin Murphy wrote: > On 2022-03-24 19:09, Vladimir Oltean wrote: > > On Thu, Mar 24, 2022 at 06:06:51PM +0000, Marc Zyngier wrote: > > > > I was just raising this as what I thought would be a simple and > > > > non-controversial counter example to your remark "If you change something, > > > > you *must* guarantee forward *and* backward compatibility." > > > > > > If you change something *in the binding*, which was implicit in the > > > context, and makes no sense out of context. > > > > > > > Practically speaking, what has happened is that the board DT appeared in > > > > kernel N, the ls-extirq driver in kernel N+1, and the DT was updated to > > > > enable PHY interrupts in kernel N+2. That DT update practically broke > > > > kernel N from running correctly on DTs taken from kernel N+2 onwards. > > > > This is the observable behavior, we can find as many justifications for > > > > it as we wish. > > > > > > Well, you can also argue that the DT was broken at N and N+1 for not > > > describing the HW correctly and completely. No binding has changed > > > here. Your DT was incomplete, and someone fixed it for you. > > > > > > We can argue this things forever and a half. I've laid down the ground > > > rules for the stuff I maintain. If you're not happy with this, you can > > > fix it by either removing the NXP hardware from the tree, or taking > > > over from me as the irqchip maintainer. I'd be perfectly happy with > > > any (and even more, with both) of these outcomes. > > > > Ok, my intention wasn't to inflame you even though the way in which I > > presented the problem might have suggested otherwise. > > > > With my developer hat I still don't agree with you even with the > > additional clarification you've made that you were referring only to > > bindings and not to any and all DT changes. The reason being that the DT > > blob is a whole, and it doesn't matter if there's a regression because > > of a binding change or something else, you still need to be prepared to > > update it, sometimes in lockstep with the kernel, like it or not. > > > > But as a user, I just wanted to get an opinion from you what can we do > > to deal better with this situation: optional interrupt provided by > > device with missing driver, which of_irq_get() doesn't seem to understand. > > FWIW, of_irq_get() absolutely understands how to handle a missing IRQ > provider driver; it returns -EPROBE_DEFER. If a caller considers the IRQ > optional, then it's up to that caller to decide how long to keep waiting for > the provider to appear until giving up and carrying on without it. If your > phy driver is making the dumb decision to wait for ever for something which > isn't critical, then you're free to fix it, or perhaps even propose for > of_irq_get() to opt in to the driver_deferred_probe_check_state() mechanism > if you believe it's a sufficiently general case. Thanks, I really needed that suggestion, at the moment I made this change which seems to do what I want when I force-disable the ls-extirq driver (which in turn simulates an unwritten driver from an old kernel): diff --git a/drivers/net/mdio/fwnode_mdio.c b/drivers/net/mdio/fwnode_mdio.c index 1becb1a731f6..1c1584fca632 100644 --- a/drivers/net/mdio/fwnode_mdio.c +++ b/drivers/net/mdio/fwnode_mdio.c @@ -43,6 +43,11 @@ int fwnode_mdiobus_phy_device_register(struct mii_bus *mdio, int rc; rc = fwnode_irq_get(child, 0); + /* Don't wait forever if the IRQ provider doesn't become available, + * just fall back to poll mode + */ + if (rc == -EPROBE_DEFER) + rc = driver_deferred_probe_check_state(&phy->mdio.dev); if (rc == -EPROBE_DEFER) return rc; The trickier part seems to be to adjust this change for older kernels where the MDIO bus code calls of_irq_get() directly, and not get regressions in. That, plus I need to understand what changes in behavior when the irqchip driver is built as module and the MDIO bus driver is built-in, and the "driver_deferred_probe_timeout" kernel boot parameter has the default value of 0. > If a new DT with an additional new property (either on an existing machine, > or on a completely new machine which has the property from the start) > exposes a bug in a driver, that's unfortunate, but it is entirely irrelevant > to the ABI implications of changing the interpretation of an existing > property. > > Robin. Agree, but I'd like to at least be shot down for a point I _am_ trying to make, not for one I am not. When I resumed this discussion I wasn't really focused on the patch set that proposed to change some DT bindings. That was a bad idea, I abandoned it, issue was solved (more or less) through other means, end of story. Instead I focused on one of the arguments that Marc brought, that being able to roll back kernel independently of firmware is important. As a realist, I can't help but remark that this is effectively a non-goal. There is always a risk that old kernels are caught off guard by DT changes which they aren't prepared to handle, and even if I was aware of the fact that I'm making a breaking change for old kernels when I'm adding the 'interrupts-extended' property to the PHY (which I wasn't), I'd still do it 10 out of 10 times. I guess I'll always treat the 'old kernel works with new DT blob' case as a pleasant surprise rather than the inviolable norm that Marc is trying to make it sound like. Mentality difference between NXP/Marvell vs Socionext aside, I really don't see how you can systematically avoid these issues, so it's just a losing game to try to claim that every firmware blob should work with every kernel, for the simple reason that you can't change the past. As to whether this has any implication on the point you and Marc are trying to make, that the firmware ABI contract shouldn't change, maybe not, probably not, especially if there are alternatives. But bring a more solid argument to the table. Changing DT bindings is not off the table _because_ old kernels will stop working with new DT blobs.