
[v1,1/1] net: stmmac: dwmac-tegra: Read iommu stream id from device tree

Message ID f2a14edb5761d372ec939ccbea4fb8dfd1fdab91.1731685185.git.pnewman@connecttech.com
State New
Series net: stmmac: dwmac-tegra: Read iommu stream id from device tree

Commit Message

Parker Newman Nov. 15, 2024, 4:31 p.m. UTC
From: Parker Newman <pnewman@connecttech.com>

Read the iommu stream id from device tree rather than hard coding to mgbe0.
Fixes kernel panics when using mgbe controllers other than mgbe0.

Tested with Orin AGX 64GB module on Connect Tech Forge carrier board.

Signed-off-by: Parker Newman <pnewman@connecttech.com>
---
 drivers/net/ethernet/stmicro/stmmac/dwmac-tegra.c | 14 +++++++++++---
 1 file changed, 11 insertions(+), 3 deletions(-)

--
2.47.0
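A simplified userspace model of the failure mode described in the commit message: each MGBE instance has its own SID in its "iommus" property, but the old driver programmed every instance with mgbe0's MGBE_SID (0x6). The struct mgbe_model type, the helper names and the mgbe1 SID value are hypothetical illustrations, not taken from the driver or from the Tegra234 SID header, and the model only checks whether the two SIDs agree rather than simulating the SMMU.

```c
#include <assert.h>
#include <stdint.h>

/* Value the patch removes from dwmac-tegra.c (mgbe0's stream ID). */
#define MGBE_SID 0x6u

/* Hypothetical model of one MGBE instance. */
struct mgbe_model {
	const char *name;
	uint32_t dt_sid;         /* SID from the node's "iommus" property */
	uint32_t programmed_sid; /* SID written to MGBE_WRAP_AXI_ASID0_CTRL */
};

/* Old behaviour: every controller gets mgbe0's SID. */
static void program_hardcoded(struct mgbe_model *m)
{
	m->programmed_sid = MGBE_SID;
}

/* Patched behaviour: use the SID the device tree assigns to this node. */
static void program_from_dt(struct mgbe_model *m)
{
	m->programmed_sid = m->dt_sid;
}

/* The controller's DMA only hits its own SMMU context when these agree. */
static int sid_matches(const struct mgbe_model *m)
{
	return m->programmed_sid == m->dt_sid;
}
```

With the hard-coded value, mgbe0 happens to match while any other instance does not, which is consistent with the bug only showing up on boards that use more than one controller.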

Comments

Paolo Abeni Nov. 15, 2024, 5:17 p.m. UTC | #1
On 11/15/24 17:31, Parker Newman wrote:
> From: Parker Newman <pnewman@connecttech.com>
> 
> Read the iommu stream id from device tree rather than hard coding to mgbe0.
> Fixes kernel panics when using mgbe controllers other than mgbe0.

It's better to include the full Oops backtrace, possibly decoded.

> Tested with Orin AGX 64GB module on Connect Tech Forge carrier board.

Since this looks like a fix, you should include a suitable 'Fixes' tag
here, and specify the 'net' target tree in the subj prefix.

> @@ -241,6 +243,12 @@ static int tegra_mgbe_probe(struct platform_device *pdev)
>  	if (IS_ERR(mgbe->xpcs))
>  		return PTR_ERR(mgbe->xpcs);
> 
> +	/* get controller's stream id from iommu property in device tree */
> +	if (!tegra_dev_iommu_get_stream_id(mgbe->dev, &mgbe->iommu_sid)) {
> +		dev_err(mgbe->dev, "failed to get iommu stream id\n");
> +		return -EINVAL;
> +	}

I *think* it would be better to fall back (possibly with a warning or
notice) to the previous default value when the device tree property is
not available, to avoid regressions.

Thanks,

Paolo
Parker Newman Nov. 15, 2024, 6:59 p.m. UTC | #2
On Fri, 15 Nov 2024 18:17:07 +0100
Paolo Abeni <pabeni@redhat.com> wrote:

> On 11/15/24 17:31, Parker Newman wrote:
> > From: Parker Newman <pnewman@connecttech.com>
> >
> > Read the iommu stream id from device tree rather than hard coding to mgbe0.
> > Fixes kernel panics when using mgbe controllers other than mgbe0.
>
> It's better to include the full Oops backtrace, possibly decoded.
>

Will do, there are many different ones but I can add the most common.

> > Tested with Orin AGX 64GB module on Connect Tech Forge carrier board.
>
> Since this looks like a fix, you should include a suitable 'Fixes' tag
> here, and specify the 'net' target tree in the subj prefix.
>

Sorry I missed the "net" tag.

The bug has existed since dwmac-tegra.c was added. I can add a Fixes tag but
in the past I was told they aren't needed in that situation?

> > @@ -241,6 +243,12 @@ static int tegra_mgbe_probe(struct platform_device *pdev)
> >  	if (IS_ERR(mgbe->xpcs))
> >  		return PTR_ERR(mgbe->xpcs);
> >
> > +	/* get controller's stream id from iommu property in device tree */
> > +	if (!tegra_dev_iommu_get_stream_id(mgbe->dev, &mgbe->iommu_sid)) {
> > +		dev_err(mgbe->dev, "failed to get iommu stream id\n");
> > +		return -EINVAL;
> > +	}
>
> I *think* it would be better to fall back (possibly with a warning or
> notice) to the previous default value when the device tree property is
> not available, to avoid regressions.
>

I debated this as well... In theory the iommu must be set up for the
mgbe controller to work anyway. Doing it this way means the worst case is
probe() fails and you lose an ethernet port.

Having it fall back to mgbe0's SID adds the risk of the entire system crashing.

I can see arguments for both methods. I can add the fallback to mgbe0's SID
and change the message to a warning when I send V2 if you like.

Thanks!
Parker

> Thanks,
>
> Paolo
>
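The two policies debated above (fail probe with -EINVAL, as in v1, or warn and fall back to mgbe0's SID, as Paolo suggests) can be sketched side by side in userspace. stub_get_stream_id() is a hypothetical stand-in for tegra_dev_iommu_get_stream_id(), and fprintf() stands in for the kernel's dev_warn(); this is an illustration of the trade-off, not the driver's code.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>

#define MGBE_SID 0x6u /* mgbe0's SID, the value the driver used to hard-code */

/* Hypothetical stub: succeeds only when the node has an "iommus" entry. */
static bool stub_get_stream_id(bool dt_has_iommus, uint32_t dt_sid,
			       uint32_t *sid)
{
	if (!dt_has_iommus)
		return false;
	*sid = dt_sid;
	return true;
}

/*
 * strict == true:  v1 behaviour, probe fails and the port is lost.
 * strict == false: warn and fall back to mgbe0's SID, which keeps old
 *                  setups probing but may program the wrong SID.
 */
static int resolve_sid(bool strict, bool dt_has_iommus, uint32_t dt_sid,
		       uint32_t *out)
{
	if (stub_get_stream_id(dt_has_iommus, dt_sid, out))
		return 0;

	if (strict)
		return -EINVAL;

	fprintf(stderr, "warning: no iommus property, using default SID 0x%x\n",
		MGBE_SID);
	*out = MGBE_SID;
	return 0;
}
```

The fallback only changes what happens when the lookup fails; with a valid "iommus" property both policies program the same per-controller SID.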
Andrew Lunn Nov. 16, 2024, 7:22 p.m. UTC | #3
On Fri, Nov 15, 2024 at 01:59:40PM -0500, Parker Newman wrote:
> On Fri, 15 Nov 2024 18:17:07 +0100
> Paolo Abeni <pabeni@redhat.com> wrote:
> 
> > On 11/15/24 17:31, Parker Newman wrote:
> > > From: Parker Newman <pnewman@connecttech.com>
> > >
> > > Read the iommu stream id from device tree rather than hard coding to mgbe0.
> > > Fixes kernel panics when using mgbe controllers other than mgbe0.
> >
> > It's better to include the full Oops backtrace, possibly decoded.
> >
> 
> Will do, there are many different ones but I can add the most common.
> 
> > > Tested with Orin AGX 64GB module on Connect Tech Forge carrier board.
> >
> > Since this looks like a fix, you should include a suitable 'Fixes' tag
> > here, and specify the 'net' target tree in the subj prefix.
> >
> 
> Sorry I missed the "net" tag.
> 
> The bug has existed since dwmac-tegra.c was added. I can add a Fixes tag but
> in the past I was told they aren't needed in that situation?
> 
> > > @@ -241,6 +243,12 @@ static int tegra_mgbe_probe(struct platform_device *pdev)
> > >  	if (IS_ERR(mgbe->xpcs))
> > >  		return PTR_ERR(mgbe->xpcs);
> > >
> > > +	/* get controller's stream id from iommu property in device tree */
> > > +	if (!tegra_dev_iommu_get_stream_id(mgbe->dev, &mgbe->iommu_sid)) {
> > > +		dev_err(mgbe->dev, "failed to get iommu stream id\n");
> > > +		return -EINVAL;
> > > +	}
> >
> > I *think* it would be better to fall back (possibly with a warning or
> > notice) to the previous default value when the device tree property is
> > not available, to avoid regressions.
> >
> 
> I debated this as well... In theory the iommu must be set up for the
> mgbe controller to work anyway. Doing it this way means the worst case is
> probe() fails and you lose an ethernet port.

New DT properties are always optional. Take the example of a board
only using a single controller. It should happily work. It probably
does not have this property because it is not needed. Your change is
likely to cause a regression on such a board.

Also, is a binding patch needed?

	Andrew
Parker Newman Nov. 18, 2024, 1:44 p.m. UTC | #4
On Sat, 16 Nov 2024 20:22:53 +0100
Andrew Lunn <andrew@lunn.ch> wrote:

> On Fri, Nov 15, 2024 at 01:59:40PM -0500, Parker Newman wrote:
> > On Fri, 15 Nov 2024 18:17:07 +0100
> > Paolo Abeni <pabeni@redhat.com> wrote:
> >
> > > On 11/15/24 17:31, Parker Newman wrote:
> > > > From: Parker Newman <pnewman@connecttech.com>
> > > >
> > > > Read the iommu stream id from device tree rather than hard coding to mgbe0.
> > > > Fixes kernel panics when using mgbe controllers other than mgbe0.
> > >
> > > It's better to include the full Oops backtrace, possibly decoded.
> > >
> >
> > Will do, there are many different ones but I can add the most common.
> >
> > > > Tested with Orin AGX 64GB module on Connect Tech Forge carrier board.
> > >
> > > Since this looks like a fix, you should include a suitable 'Fixes' tag
> > > here, and specify the 'net' target tree in the subj prefix.
> > >
> >
> > Sorry I missed the "net" tag.
> >
> > The bug has existed since dwmac-tegra.c was added. I can add a Fixes tag but
> > in the past I was told they aren't needed in that situation?
> >
> > > > @@ -241,6 +243,12 @@ static int tegra_mgbe_probe(struct platform_device *pdev)
> > > >  	if (IS_ERR(mgbe->xpcs))
> > > >  		return PTR_ERR(mgbe->xpcs);
> > > >
> > > > +	/* get controller's stream id from iommu property in device tree */
> > > > +	if (!tegra_dev_iommu_get_stream_id(mgbe->dev, &mgbe->iommu_sid)) {
> > > > +		dev_err(mgbe->dev, "failed to get iommu stream id\n");
> > > > +		return -EINVAL;
> > > > +	}
> > >
> > > I *think* it would be better to fall back (possibly with a warning or
> > > notice) to the previous default value when the device tree property is
> > > not available, to avoid regressions.
> > >
> >
> > I debated this as well... In theory the iommu must be set up for the
> > mgbe controller to work anyway. Doing it this way means the worst case is
> > probe() fails and you lose an ethernet port.
>
> New DT properties are always optional. Take the example of a board
> only using a single controller. It should happily work. It probably
> does not have this property because it is not needed. Your change is
> likely to cause a regression on such a board.
>
> Also, is a binding patch needed?
>
> 	Andrew

This is not a new dt property, the "iommus" property is an existing property
that is parsed by the Nvidia implementation of the arm-smmu driver.

Here is a snippet from the device tree:

smmu_niso0: iommu@12000000 {
        compatible = "nvidia,tegra234-smmu", "nvidia,smmu-500";
...
}

/* MGBE0 */
ethernet@6800000 {
	compatible = "nvidia,tegra234-mgbe";
...
	iommus = <&smmu_niso0 TEGRA234_SID_MGBE>;
...
}

/* MGBE1 */
ethernet@6900000 {
	compatible = "nvidia,tegra234-mgbe";
...
	iommus = <&smmu_niso0 TEGRA234_SID_MGBE_VF1>;
...
}

The 2nd field of the iommus property is the "Stream ID" which arm-smmu stores
in the device's struct iommu_fwspec *fwspec. This is what the existing
tegra_dev_iommu_get_stream_id() function uses to get the SID.

If the iommus property is missing completely from the MGBE's device tree node it
causes secure read/write errors which spam the kernel log and can cause crashes.

I can add the fallback in V2 with a warning if that is preferred.

Thanks,
Parker
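The lookup described above can be modeled in userspace as follows. struct fwspec_model is a cut-down, hypothetical stand-in for the kernel's struct iommu_fwspec, and model_get_stream_id() mirrors our understanding of tegra_dev_iommu_get_stream_id(): it succeeds only when the device has exactly one stream ID. The 0xffff mask reflects our reading that arm-smmu keeps the SID in the low 16 bits of each ID; treat it as an assumption and check include/linux/iommu.h before relying on it.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical cut-down fwspec: the SMMU driver fills ids[] from the cell
 * after the phandle in "iommus = <&smmu_niso0 SID>".
 */
struct fwspec_model {
	unsigned int num_ids;
	uint32_t ids[4];
};

/* Succeed only for a device with exactly one stream ID. */
static bool model_get_stream_id(const struct fwspec_model *fwspec,
				uint32_t *stream_id)
{
	if (fwspec && fwspec->num_ids == 1) {
		*stream_id = fwspec->ids[0] & 0xffff; /* assumed SID field */
		return true;
	}
	return false;
}
```

A node without "iommus" ends up with no fwspec (modeled here as NULL or num_ids == 0), so the lookup fails and the driver has to decide between failing probe and falling back.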
Andrew Lunn Nov. 19, 2024, 12:50 a.m. UTC | #5
> This is not a new dt property, the "iommus" property is an existing property
> that is parsed by the Nvidia implementation of the arm-smmu driver.
> 
> Here is a snippet from the device tree:
> 
> smmu_niso0: iommu@12000000 {
>         compatible = "nvidia,tegra234-smmu", "nvidia,smmu-500";
> ...
> }
> 
> /* MGBE0 */
> ethernet@6800000 {
> 	compatible = "nvidia,tegra234-mgbe";
> ...
> 	iommus = <&smmu_niso0 TEGRA234_SID_MGBE>;
> ...
> }
> 
> /* MGBE1 */
> ethernet@6900000 {
> 	compatible = "nvidia,tegra234-mgbe";
> ...
> 	iommus = <&smmu_niso0 TEGRA234_SID_MGBE_VF1>;
> ...
> }

What I meant was: does the nvidia,tegra234-mgbe binding allow iommus?
I just checked, yes it does.

> If the iommus property is missing completely from the MGBE's device tree node it
> causes secure read/write errors which spam the kernel log and can cause crashes.
> 
> I can add the fallback in V2 with a warning if that is preferred.

The fact it crashed makes me think it is optional. Any existing users
must work, otherwise it would crash, and then be debugged. I guess you
are pushing the usage further, and so have come across this condition.

Is the iommus a SoC property, or a board property? If it is a SoC
property, could you review all the SoC .dtsi files and fix up any
which are missing the property?

Adding a warning is OK, but ideally the missing property should be
added first.

The merge window is open now, so patches will need to wait two weeks.

	Andrew
Parker Newman Nov. 19, 2024, 6:13 p.m. UTC | #6
On Tue, 19 Nov 2024 01:50:18 +0100
Andrew Lunn <andrew@lunn.ch> wrote:

> > This is not a new dt property, the "iommus" property is an existing property
> > that is parsed by the Nvidia implementation of the arm-smmu driver.
> >
> > Here is a snippet from the device tree:
> >
> > smmu_niso0: iommu@12000000 {
> >         compatible = "nvidia,tegra234-smmu", "nvidia,smmu-500";
> > ...
> > }
> >
> > /* MGBE0 */
> > ethernet@6800000 {
> > 	compatible = "nvidia,tegra234-mgbe";
> > ...
> > 	iommus = <&smmu_niso0 TEGRA234_SID_MGBE>;
> > ...
> > }
> >
> > /* MGBE1 */
> > ethernet@6900000 {
> > 	compatible = "nvidia,tegra234-mgbe";
> > ...
> > 	iommus = <&smmu_niso0 TEGRA234_SID_MGBE_VF1>;
> > ...
> > }
>
> What I meant was: does the nvidia,tegra234-mgbe binding allow iommus?
> I just checked, yes it does.
>
> > If the iommus property is missing completely from the MGBE's device tree node it
> > causes secure read/write errors which spam the kernel log and can cause crashes.
> >
> > I can add the fallback in V2 with a warning if that is preferred.
>
> The fact it crashed makes me think it is optional. Any existing users
> must work, otherwise it would crash, and then be debugged. I guess you
> are pushing the usage further, and so have come across this condition.
>
> Is the iommus a SoC property, or a board property? If it is a SoC
> property, could you review all the SoC .dtsi files and fix up any
> which are missing the property?
>
> Adding a warning is OK, but ideally the missing property should be
> added first.

I think there is some confusion here. I will try to summarize:
- The iommu is supported by the Tegra SOC.
- The way the mgbe driver is written the iommu DT property is REQUIRED.
- "iommus" is a SOC DT property and is defined in tegra234.dtsi.
- The mgbe device tree nodes in tegra234.dtsi DO have the iommus property.
- There are no device tree changes required to make this patch work.
- This patch works fine with existing device trees.

I will add the fallback, however, in case there are changes made to the iommu
subsystem in the future.

> The merge window is open now, so patches will need to wait two weeks.
>

Ok thanks, I will wait a couple weeks to resend.
Parker
Andrew Lunn Nov. 19, 2024, 7:18 p.m. UTC | #7
> I think there is some confusion here. I will try to summarize:
> - The iommu is supported by the Tegra SOC.
> - The way the mgbe driver is written the iommu DT property is REQUIRED.

If it is required, please also include a patch to
nvidia,tegra234-mgbe.yaml and make iommus required.

> - "iommus" is a SOC DT property and is defined in tegra234.dtsi.
> - The mgbe device tree nodes in tegra234.dtsi DO have the iommus property.
> - There are no device tree changes required to make this patch work.
> - This patch works fine with existing device trees.
> 
> I will add the fallback, however, in case there are changes made to the iommu
> subsystem in the future.

I would suggest you make iommus a required property and run the tests
over the existing .dts files.

I looked at the history of tegra234.dtsi. The ethernet nodes were
added in:

610cdf3186bc604961bf04851e300deefd318038
Author: Thierry Reding <treding@nvidia.com>
Date:   Thu Jul 7 09:48:15 2022 +0200

    arm64: tegra: Add MGBE nodes on Tegra234

and the iommus property is present. So making it required is safe.

Please expand the commit message. It is clear from all the questions
and backwards and forwards, it does not provide enough details.

I just have one open issue. The code has been like this for over 2
years. Why has it only now started crashing?

	Andrew
Parker Newman Nov. 19, 2024, 7:47 p.m. UTC | #8
On Tue, 19 Nov 2024 20:18:00 +0100
Andrew Lunn <andrew@lunn.ch> wrote:

> > I think there is some confusion here. I will try to summarize:
> > - The iommu is supported by the Tegra SOC.
> > - The way the mgbe driver is written the iommu DT property is REQUIRED.
>
> If it is required, please also include a patch to
> nvidia,tegra234-mgbe.yaml and make iommus required.
>

I will add this when I submit a v2 of the patch.

> > - "iommus" is a SOC DT property and is defined in tegra234.dtsi.
> > - The mgbe device tree nodes in tegra234.dtsi DO have the iommus property.
> > - There are no device tree changes required to make this patch work.
> > - This patch works fine with existing device trees.
> >
> > I will add the fallback, however, in case there are changes made to the iommu
> > subsystem in the future.
>
> I would suggest you make iommus a required property and run the tests
> over the existing .dts files.
>
> I looked at the history of tegra234.dtsi. The ethernet nodes were
> added in:
>
> 610cdf3186bc604961bf04851e300deefd318038
> Author: Thierry Reding <treding@nvidia.com>
> Date:   Thu Jul 7 09:48:15 2022 +0200
>
>     arm64: tegra: Add MGBE nodes on Tegra234
>
> and the iommus property is present. So making it required is safe.
>
> Please expand the commit message. It is clear from all the questions
> and backwards and forwards, it does not provide enough details.
>

I will add more details when I submit V2.

> I just have one open issue. The code has been like this for over 2
> years. Why has it only now started crashing?
>

It is rare for Nvidia Jetson users to use the mainline kernel. Nvidia
provides a custom kernel package with many out of tree drivers including a
driver for the mgbe controllers.

Also, while the Orin AGX SOC (tegra234) has 4 instances of the mgbe controller,
the Nvidia Orin AGX devkit only uses mgbe0. Connect Tech has carrier boards
that use 2 or more of the mgbe controllers which is why we found the bug.

Thanks,
Parker

Patch

diff --git a/drivers/net/ethernet/stmicro/stmmac/dwmac-tegra.c b/drivers/net/ethernet/stmicro/stmmac/dwmac-tegra.c
index 3827997d2132..dc903b846b1b 100644
--- a/drivers/net/ethernet/stmicro/stmmac/dwmac-tegra.c
+++ b/drivers/net/ethernet/stmicro/stmmac/dwmac-tegra.c
@@ -1,4 +1,5 @@ 
 // SPDX-License-Identifier: GPL-2.0-only
+#include <linux/iommu.h>
 #include <linux/platform_device.h>
 #include <linux/of.h>
 #include <linux/module.h>
@@ -19,6 +20,8 @@  struct tegra_mgbe {
 	struct reset_control *rst_mac;
 	struct reset_control *rst_pcs;

+	u32 iommu_sid;
+
 	void __iomem *hv;
 	void __iomem *regs;
 	void __iomem *xpcs;
@@ -50,7 +53,6 @@  struct tegra_mgbe {
 #define MGBE_WRAP_COMMON_INTR_ENABLE	0x8704
 #define MAC_SBD_INTR			BIT(2)
 #define MGBE_WRAP_AXI_ASID0_CTRL	0x8400
-#define MGBE_SID			0x6

 static int __maybe_unused tegra_mgbe_suspend(struct device *dev)
 {
@@ -84,7 +86,7 @@  static int __maybe_unused tegra_mgbe_resume(struct device *dev)
 	writel(MAC_SBD_INTR, mgbe->regs + MGBE_WRAP_COMMON_INTR_ENABLE);

 	/* Program SID */
-	writel(MGBE_SID, mgbe->hv + MGBE_WRAP_AXI_ASID0_CTRL);
+	writel(mgbe->iommu_sid, mgbe->hv + MGBE_WRAP_AXI_ASID0_CTRL);

 	value = readl(mgbe->xpcs + XPCS_WRAP_UPHY_STATUS);
 	if ((value & XPCS_WRAP_UPHY_STATUS_TX_P_UP) == 0) {
@@ -241,6 +243,12 @@  static int tegra_mgbe_probe(struct platform_device *pdev)
 	if (IS_ERR(mgbe->xpcs))
 		return PTR_ERR(mgbe->xpcs);

+	/* get controller's stream id from iommu property in device tree */
+	if (!tegra_dev_iommu_get_stream_id(mgbe->dev, &mgbe->iommu_sid)) {
+		dev_err(mgbe->dev, "failed to get iommu stream id\n");
+		return -EINVAL;
+	}
+
 	res.addr = mgbe->regs;
 	res.irq = irq;

@@ -346,7 +354,7 @@  static int tegra_mgbe_probe(struct platform_device *pdev)
 	writel(MAC_SBD_INTR, mgbe->regs + MGBE_WRAP_COMMON_INTR_ENABLE);

 	/* Program SID */
-	writel(MGBE_SID, mgbe->hv + MGBE_WRAP_AXI_ASID0_CTRL);
+	writel(mgbe->iommu_sid, mgbe->hv + MGBE_WRAP_AXI_ASID0_CTRL);

 	plat->flags |= STMMAC_FLAG_SERDES_UP_AFTER_PHY_LINKUP;
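The patch caches the SID in struct tegra_mgbe so that tegra_mgbe_probe() and tegra_mgbe_resume() program the same per-controller value into MGBE_WRAP_AXI_ASID0_CTRL. Below is a hypothetical userspace model of that flow: the register is a plain struct field standing in for writel(), and the assumption that suspend clears the register is ours, made only to show why resume writes it again.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical model of the ASID control register in the MGBE wrapper. */
struct mgbe_hw {
	uint32_t asid0_ctrl;
};

struct mgbe_state {
	struct mgbe_hw *hv;
	uint32_t iommu_sid; /* looked up once at probe, as in the patch */
};

static void program_sid(struct mgbe_state *m)
{
	m->hv->asid0_ctrl = m->iommu_sid; /* stands in for writel() */
}

static void model_probe(struct mgbe_state *m, uint32_t sid_from_dt)
{
	m->iommu_sid = sid_from_dt;
	program_sid(m);
}

/* Modeling assumption: register contents are lost across suspend. */
static void model_suspend(struct mgbe_state *m)
{
	m->hv->asid0_ctrl = 0;
}

/* No device tree lookup on resume: reuse the cached per-controller SID. */
static void model_resume(struct mgbe_state *m)
{
	program_sid(m);
}
```

Storing the SID once avoids repeating the device tree lookup on every resume and guarantees both paths write the same value.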