Message ID | f2a14edb5761d372ec939ccbea4fb8dfd1fdab91.1731685185.git.pnewman@connecttech.com (mailing list archive) |
---|---|
State | Changes Requested |
Delegated to: | Netdev Maintainers |
Series | net: stmmac: dwmac-tegra: Read iommu stream id from device tree |
On 11/15/24 17:31, Parker Newman wrote:
> From: Parker Newman <pnewman@connecttech.com>
>
> Read the iommu stream id from device tree rather than hard coding to mgbe0.
> Fixes kernel panics when using mgbe controllers other than mgbe0.

It's better to include the full Oops backtrace, possibly decoded.

> Tested with Orin AGX 64GB module on Connect Tech Forge carrier board.

Since this looks like a fix, you should include a suitable 'Fixes' tag
here, and specify the 'net' target tree in the subj prefix.

> @@ -241,6 +243,12 @@ static int tegra_mgbe_probe(struct platform_device *pdev)
>  	if (IS_ERR(mgbe->xpcs))
>  		return PTR_ERR(mgbe->xpcs);
>
> +	/* get controller's stream id from iommu property in device tree */
> +	if (!tegra_dev_iommu_get_stream_id(mgbe->dev, &mgbe->iommu_sid)) {
> +		dev_err(mgbe->dev, "failed to get iommu stream id\n");
> +		return -EINVAL;
> +	}

I *think* it would be better to fallback (possibly with a warning or
notice) to the previous default value when the device tree property is
not available, to avoid regressions.

Thanks,

Paolo
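For readers following the thread, the fallback suggested above could look roughly like the sketch below. It is a user-space model, not driver code: `resolve_sid()` and its `have_dt_sid`/`dt_sid` parameters are hypothetical names that stand in for the result of `tegra_dev_iommu_get_stream_id()`, and `MGBE_SID` is the value the driver hard-coded before this patch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>

#define MGBE_SID 0x6	/* previously hard-coded stream ID (mgbe0) */

/*
 * Hypothetical helper: choose the stream ID to program.
 * have_dt_sid and dt_sid model the outcome of the device tree lookup.
 */
static uint32_t resolve_sid(bool have_dt_sid, uint32_t dt_sid)
{
	if (have_dt_sid)
		return dt_sid;

	/* In the driver this would be a dev_warn() rather than a probe failure. */
	fprintf(stderr, "no iommu stream id found, falling back to 0x%x\n",
		(unsigned int)MGBE_SID);
	return MGBE_SID;
}
```

The trade-off is that probe keeps working on device trees without the property, but a controller other than mgbe0 would then be programmed with mgbe0's stream ID.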
On Fri, 15 Nov 2024 18:17:07 +0100
Paolo Abeni <pabeni@redhat.com> wrote:

> On 11/15/24 17:31, Parker Newman wrote:
> > From: Parker Newman <pnewman@connecttech.com>
> >
> > Read the iommu stream id from device tree rather than hard coding to mgbe0.
> > Fixes kernel panics when using mgbe controllers other than mgbe0.
>
> It's better to include the full Oops backtrace, possibly decoded.
>

Will do, there are many different ones but I can add the most common.

> > Tested with Orin AGX 64GB module on Connect Tech Forge carrier board.
>
> Since this looks like a fix, you should include a suitable 'Fixes' tag
> here, and specify the 'net' target tree in the subj prefix.
>

Sorry I missed the "net" tag.

The bug has existed since dwmac-tegra.c was added. I can add a Fixes tag but
in the past I was told they aren't needed in that situation?

> > @@ -241,6 +243,12 @@ static int tegra_mgbe_probe(struct platform_device *pdev)
> >  	if (IS_ERR(mgbe->xpcs))
> >  		return PTR_ERR(mgbe->xpcs);
> >
> > +	/* get controller's stream id from iommu property in device tree */
> > +	if (!tegra_dev_iommu_get_stream_id(mgbe->dev, &mgbe->iommu_sid)) {
> > +		dev_err(mgbe->dev, "failed to get iommu stream id\n");
> > +		return -EINVAL;
> > +	}
>
> I *think* it would be better to fallback (possibly with a warning or
> notice) to the previous default value when the device tree property is
> not available, to avoid regressions.
>

I debated this as well... In theory the iommu must be setup for the
mgbe controller to work anyways. Doing it this way means the worst case is
probe() fails and you lose an ethernet port.

Having it fall back to mgbe0's SID adds the risk of the entire system crashing.
I can see arguments for both methods.

I can add the fallback to mgbe0's SID and change the message to a warning
when I send V2 if you like.

Thanks!
Parker

> Thanks,
>
> Paolo
>
On Fri, Nov 15, 2024 at 01:59:40PM -0500, Parker Newman wrote:
> On Fri, 15 Nov 2024 18:17:07 +0100
> Paolo Abeni <pabeni@redhat.com> wrote:
>
> > On 11/15/24 17:31, Parker Newman wrote:
> > > From: Parker Newman <pnewman@connecttech.com>
> > >
> > > Read the iommu stream id from device tree rather than hard coding to mgbe0.
> > > Fixes kernel panics when using mgbe controllers other than mgbe0.
> >
> > It's better to include the full Oops backtrace, possibly decoded.
> >
>
> Will do, there are many different ones but I can add the most common.
>
> > > Tested with Orin AGX 64GB module on Connect Tech Forge carrier board.
> >
> > Since this looks like a fix, you should include a suitable 'Fixes' tag
> > here, and specify the 'net' target tree in the subj prefix.
> >
>
> Sorry I missed the "net" tag.
>
> The bug has existed since dwmac-tegra.c was added. I can add a Fixes tag but
> in the past I was told they aren't needed in that situation?
>
> > > @@ -241,6 +243,12 @@ static int tegra_mgbe_probe(struct platform_device *pdev)
> > >  	if (IS_ERR(mgbe->xpcs))
> > >  		return PTR_ERR(mgbe->xpcs);
> > >
> > > +	/* get controller's stream id from iommu property in device tree */
> > > +	if (!tegra_dev_iommu_get_stream_id(mgbe->dev, &mgbe->iommu_sid)) {
> > > +		dev_err(mgbe->dev, "failed to get iommu stream id\n");
> > > +		return -EINVAL;
> > > +	}
> >
> > I *think* it would be better to fallback (possibly with a warning or
> > notice) to the previous default value when the device tree property is
> > not available, to avoid regressions.
> >
>
> I debated this as well... In theory the iommu must be setup for the
> mgbe controller to work anyways. Doing it this way means the worst case is
> probe() fails and you lose an ethernet port.

New DT properties are always optional. Take the example of a board
only using a single controller. It should happily work. It probably
does not have this property because it is not needed. Your change is
likely to cause a regression on such a board.

Also, is a binding patch needed?

	Andrew
On Sat, 16 Nov 2024 20:22:53 +0100
Andrew Lunn <andrew@lunn.ch> wrote:

> On Fri, Nov 15, 2024 at 01:59:40PM -0500, Parker Newman wrote:
> > On Fri, 15 Nov 2024 18:17:07 +0100
> > Paolo Abeni <pabeni@redhat.com> wrote:
> >
> > > On 11/15/24 17:31, Parker Newman wrote:
> > > > From: Parker Newman <pnewman@connecttech.com>
> > > >
> > > > Read the iommu stream id from device tree rather than hard coding to mgbe0.
> > > > Fixes kernel panics when using mgbe controllers other than mgbe0.
> > >
> > > It's better to include the full Oops backtrace, possibly decoded.
> > >
> >
> > Will do, there are many different ones but I can add the most common.
> >
> > > > Tested with Orin AGX 64GB module on Connect Tech Forge carrier board.
> > >
> > > Since this looks like a fix, you should include a suitable 'Fixes' tag
> > > here, and specify the 'net' target tree in the subj prefix.
> > >
> >
> > Sorry I missed the "net" tag.
> >
> > The bug has existed since dwmac-tegra.c was added. I can add a Fixes tag but
> > in the past I was told they aren't needed in that situation?
> >
> > > > @@ -241,6 +243,12 @@ static int tegra_mgbe_probe(struct platform_device *pdev)
> > > >  	if (IS_ERR(mgbe->xpcs))
> > > >  		return PTR_ERR(mgbe->xpcs);
> > > >
> > > > +	/* get controller's stream id from iommu property in device tree */
> > > > +	if (!tegra_dev_iommu_get_stream_id(mgbe->dev, &mgbe->iommu_sid)) {
> > > > +		dev_err(mgbe->dev, "failed to get iommu stream id\n");
> > > > +		return -EINVAL;
> > > > +	}
> > >
> > > I *think* it would be better to fallback (possibly with a warning or
> > > notice) to the previous default value when the device tree property is
> > > not available, to avoid regressions.
> > >
> >
> > I debated this as well... In theory the iommu must be setup for the
> > mgbe controller to work anyways. Doing it this way means the worst case is
> > probe() fails and you lose an ethernet port.
>
> New DT properties are always optional. Take the example of a board
> only using a single controller. It should happily work. It probably
> does not have this property because it is not needed. Your change is
> likely to cause a regression on such a board.
>
> Also, is a binding patch needed?
>
> 	Andrew

This is not a new dt property, the "iommus" property is an existing property
that is parsed by the Nvidia implementation of the arm-smmu driver.

Here is a snippet from the device tree:

smmu_niso0: iommu@12000000 {
	compatible = "nvidia,tegra234-smmu", "nvidia,smmu-500";
	...
}

/* MGBE0 */
ethernet@6800000 {
	compatible = "nvidia,tegra234-mgbe";
	...
	iommus = <&smmu_niso0 TEGRA234_SID_MGBE>;
	...
}

/* MGBE1 */
ethernet@6900000 {
	compatible = "nvidia,tegra234-mgbe";
	...
	iommus = <&smmu_niso0 TEGRA234_SID_MGBE_VF1>;
	...
}

The 2nd field of the iommus property is the "Stream ID" which arm-smmu stores
in the device's struct iommu_fwspec *fwspec. This is what the existing
tegra_dev_iommu_get_stream_id() function uses to get the SID.

If the iommus property is missing completely from the MGBE's device tree node it
causes secure read/write errors which spam the kernel log and can cause crashes.

I can add the fallback in V2 with a warning if that is preferred.

Thanks,
Parker
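To make the lookup described above concrete, here is a minimal user-space sketch of the logic. It assumes the helper succeeds only when exactly one stream ID was recorded for the device and that the ID is masked to 16 bits; the `fake_*` types and function are hypothetical stand-ins for the kernel's `struct device`, `struct iommu_fwspec` and `tegra_dev_iommu_get_stream_id()`, not the kernel source.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for the kernel's struct iommu_fwspec. */
struct fake_fwspec {
	unsigned int num_ids;	/* number of IDs parsed from "iommus" */
	uint32_t ids[4];	/* ids[0] holds the second cell of "iommus" */
};

/* Hypothetical stand-in for struct device; fwspec is NULL without "iommus". */
struct fake_device {
	const struct fake_fwspec *fwspec;
};

/*
 * Models the behaviour assumed for tegra_dev_iommu_get_stream_id():
 * report success only when exactly one stream ID is attached to the device.
 */
static bool fake_get_stream_id(const struct fake_device *dev, uint32_t *sid)
{
	assert(dev != NULL && sid != NULL);

	if (dev->fwspec && dev->fwspec->num_ids == 1) {
		*sid = dev->fwspec->ids[0] & 0xffff;
		return true;
	}
	return false;
}
```

With this model, a node that carries an `iommus` entry yields its own stream ID, while a node without the property makes the lookup fail, which is the case the patch turns into a probe error.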
> This is not a new dt property, the "iommus" property is an existing property
> that is parsed by the Nvidia implementation of the arm-smmu driver.
>
> Here is a snippet from the device tree:
>
> smmu_niso0: iommu@12000000 {
> 	compatible = "nvidia,tegra234-smmu", "nvidia,smmu-500";
> 	...
> }
>
> /* MGBE0 */
> ethernet@6800000 {
> 	compatible = "nvidia,tegra234-mgbe";
> 	...
> 	iommus = <&smmu_niso0 TEGRA234_SID_MGBE>;
> 	...
> }
>
> /* MGBE1 */
> ethernet@6900000 {
> 	compatible = "nvidia,tegra234-mgbe";
> 	...
> 	iommus = <&smmu_niso0 TEGRA234_SID_MGBE_VF1>;
> 	...
> }

What i was meaning does the nvidia,tegra234-mgbe binding allow iommus?
I just checked, yes it does.

> If the iommus property is missing completely from the MGBE's device tree node it
> causes secure read/write errors which spam the kernel log and can cause crashes.
>
> I can add the fallback in V2 with a warning if that is preferred.

The fact it crashed makes me think it is optional. Any existing users
must work, otherwise it would crash, and then be debugged. I guess you
are pushing the usage further, and so have come across this condition.

Is the iommus a SoC property, or a board property? If it is a SoC
property, could you review all the SoC .dtsi files and fix up any
which are missing the property?

Adding a warning is O.K, but ideally the missing property should be
added first.

The merge window is open now, so patches will need to wait two weeks.

	Andrew
On Tue, 19 Nov 2024 01:50:18 +0100
Andrew Lunn <andrew@lunn.ch> wrote:

> > This is not a new dt property, the "iommus" property is an existing property
> > that is parsed by the Nvidia implementation of the arm-smmu driver.
> >
> > Here is a snippet from the device tree:
> >
> > smmu_niso0: iommu@12000000 {
> > 	compatible = "nvidia,tegra234-smmu", "nvidia,smmu-500";
> > 	...
> > }
> >
> > /* MGBE0 */
> > ethernet@6800000 {
> > 	compatible = "nvidia,tegra234-mgbe";
> > 	...
> > 	iommus = <&smmu_niso0 TEGRA234_SID_MGBE>;
> > 	...
> > }
> >
> > /* MGBE1 */
> > ethernet@6900000 {
> > 	compatible = "nvidia,tegra234-mgbe";
> > 	...
> > 	iommus = <&smmu_niso0 TEGRA234_SID_MGBE_VF1>;
> > 	...
> > }
>
> What i was meaning does the nvidia,tegra234-mgbe binding allow iommus?
> I just checked, yes it does.
>
> > If the iommus property is missing completely from the MGBE's device tree node it
> > causes secure read/write errors which spam the kernel log and can cause crashes.
> >
> > I can add the fallback in V2 with a warning if that is preferred.
>
> The fact it crashed makes me think it is optional. Any existing users
> must work, otherwise it would crash, and then be debugged. I guess you
> are pushing the usage further, and so have come across this condition.
>
> Is the iommus a SoC property, or a board property? If it is a SoC
> property, could you review all the SoC .dtsi files and fix up any
> which are missing the property?
>
> Adding a warning is O.K, but ideally the missing property should be
> added first.

I think there is some confusion here. I will try to summarize:
- The iommu is supported by the Tegra SOC.
- The way the mgbe driver is written the iommu DT property is REQUIRED.
- "iommus" is a SOC DT property and is defined in tegra234.dtsi.
- The mgbe device tree nodes in tegra234.dtsi DO have the iommus property.
- There are no device tree changes required to make this patch work.
- This patch works fine with existing device trees.

I will add the fallback however in case there are changes made to the iommu
subsystem in the future.

> The merge window is open now, so patches will need to wait two weeks.
>

Ok thanks, I will wait a couple weeks to resend.

Parker
> I think there is some confusion here. I will try to summarize:
> - The iommu is supported by the Tegra SOC.
> - The way the mgbe driver is written the iommu DT property is REQUIRED.

If it is required, please also include a patch to
nvidia,tegra234-mgbe.yaml and make iommus required.

> - "iommus" is a SOC DT property and is defined in tegra234.dtsi.
> - The mgbe device tree nodes in tegra234.dtsi DO have the iommus property.
> - There are no device tree changes required to make this patch work.
> - This patch works fine with existing device trees.
>
> I will add the fallback however in case there are changes made to the iommu
> subsystem in the future.

I would suggest you make iommus a required property and run the tests
over the existing .dts files.

I looked at the history of tegra234.dtsi. The ethernet nodes were
added in:

610cdf3186bc604961bf04851e300deefd318038
Author: Thierry Reding <treding@nvidia.com>
Date:   Thu Jul 7 09:48:15 2022 +0200

    arm64: tegra: Add MGBE nodes on Tegra234

and the iommus property is present. So the requires is safe.

Please expand the commit message. It is clear from all the questions
and backwards and forwards, it does not provide enough details.

I just have one open issue. The code has been like this for over 2
years. Why has it only now started crashing?

	Andrew
On Tue, 19 Nov 2024 20:18:00 +0100
Andrew Lunn <andrew@lunn.ch> wrote:

> > I think there is some confusion here. I will try to summarize:
> > - The iommu is supported by the Tegra SOC.
> > - The way the mgbe driver is written the iommu DT property is REQUIRED.
>
> If it is required, please also include a patch to
> nvidia,tegra234-mgbe.yaml and make iommus required.
>

I will add this when I submit a v2 of the patch.

> > - "iommus" is a SOC DT property and is defined in tegra234.dtsi.
> > - The mgbe device tree nodes in tegra234.dtsi DO have the iommus property.
> > - There are no device tree changes required to make this patch work.
> > - This patch works fine with existing device trees.
> >
> > I will add the fallback however in case there are changes made to the iommu
> > subsystem in the future.
>
> I would suggest you make iommus a required property and run the tests
> over the existing .dts files.
>
> I looked at the history of tegra234.dtsi. The ethernet nodes were
> added in:
>
> 610cdf3186bc604961bf04851e300deefd318038
> Author: Thierry Reding <treding@nvidia.com>
> Date:   Thu Jul 7 09:48:15 2022 +0200
>
>     arm64: tegra: Add MGBE nodes on Tegra234
>
> and the iommus property is present. So the requires is safe.
>
> Please expand the commit message. It is clear from all the questions
> and backwards and forwards, it does not provide enough details.
>

I will add more details when I submit V2.

> I just have one open issue. The code has been like this for over 2
> years. Why has it only now started crashing?
>

It is rare for Nvidia Jetson users to use the mainline kernel. Nvidia
provides a custom kernel package with many out of tree drivers including a
driver for the mgbe controllers.

Also, while the Orin AGX SOC (tegra234) has 4 instances of the mgbe controller,
the Nvidia Orin AGX devkit only uses mgbe0. Connect Tech has carrier boards
that use 2 or more of the mgbe controllers which is why we found the bug.

Thanks,
Parker
diff --git a/drivers/net/ethernet/stmicro/stmmac/dwmac-tegra.c b/drivers/net/ethernet/stmicro/stmmac/dwmac-tegra.c
index 3827997d2132..dc903b846b1b 100644
--- a/drivers/net/ethernet/stmicro/stmmac/dwmac-tegra.c
+++ b/drivers/net/ethernet/stmicro/stmmac/dwmac-tegra.c
@@ -1,4 +1,5 @@
 // SPDX-License-Identifier: GPL-2.0-only
+#include <linux/iommu.h>
 #include <linux/platform_device.h>
 #include <linux/of.h>
 #include <linux/module.h>
@@ -19,6 +20,8 @@ struct tegra_mgbe {
 	struct reset_control *rst_mac;
 	struct reset_control *rst_pcs;
 
+	u32 iommu_sid;
+
 	void __iomem *hv;
 	void __iomem *regs;
 	void __iomem *xpcs;
@@ -50,7 +53,6 @@ struct tegra_mgbe {
 #define MGBE_WRAP_COMMON_INTR_ENABLE	0x8704
 #define MAC_SBD_INTR			BIT(2)
 #define MGBE_WRAP_AXI_ASID0_CTRL	0x8400
-#define MGBE_SID			0x6
 
 static int __maybe_unused tegra_mgbe_suspend(struct device *dev)
 {
@@ -84,7 +86,7 @@ static int __maybe_unused tegra_mgbe_resume(struct device *dev)
 	writel(MAC_SBD_INTR, mgbe->regs + MGBE_WRAP_COMMON_INTR_ENABLE);
 
 	/* Program SID */
-	writel(MGBE_SID, mgbe->hv + MGBE_WRAP_AXI_ASID0_CTRL);
+	writel(mgbe->iommu_sid, mgbe->hv + MGBE_WRAP_AXI_ASID0_CTRL);
 
 	value = readl(mgbe->xpcs + XPCS_WRAP_UPHY_STATUS);
 	if ((value & XPCS_WRAP_UPHY_STATUS_TX_P_UP) == 0) {
@@ -241,6 +243,12 @@ static int tegra_mgbe_probe(struct platform_device *pdev)
 	if (IS_ERR(mgbe->xpcs))
 		return PTR_ERR(mgbe->xpcs);
 
+	/* get controller's stream id from iommu property in device tree */
+	if (!tegra_dev_iommu_get_stream_id(mgbe->dev, &mgbe->iommu_sid)) {
+		dev_err(mgbe->dev, "failed to get iommu stream id\n");
+		return -EINVAL;
+	}
+
 	res.addr = mgbe->regs;
 	res.irq = irq;
 
@@ -346,7 +354,7 @@ static int tegra_mgbe_probe(struct platform_device *pdev)
 	writel(MAC_SBD_INTR, mgbe->regs + MGBE_WRAP_COMMON_INTR_ENABLE);
 
 	/* Program SID */
-	writel(MGBE_SID, mgbe->hv + MGBE_WRAP_AXI_ASID0_CTRL);
+	writel(mgbe->iommu_sid, mgbe->hv + MGBE_WRAP_AXI_ASID0_CTRL);
 
 	plat->flags |= STMMAC_FLAG_SERDES_UP_AFTER_PHY_LINKUP;
 