Message ID | 1527160318-10958-2-git-send-email-vladimir_zapolskiy@mentor.com (mailing list archive) |
---|---|
State | Superseded |
Delegated to: | Geert Uytterhoeven |
Headers | show |
On Thu, May 24, 2018 at 02:11:53PM +0300, Vladimir Zapolskiy wrote: > The change fixes a sleep in atomic context issue, which can be > always triggered by running 'ethtool -r' command, because > phy_start_aneg() protects phydev fields by a mutex. > > Another note is that the change implicitly replaces phy_start_aneg() > with a newer phy_restart_aneg(). > > Signed-off-by: Vladimir Zapolskiy <vladimir_zapolskiy@mentor.com> > --- > drivers/net/ethernet/renesas/ravb_main.c | 17 +---------------- > 1 file changed, 1 insertion(+), 16 deletions(-) > > diff --git a/drivers/net/ethernet/renesas/ravb_main.c b/drivers/net/ethernet/renesas/ravb_main.c > index 68f122140966..4a043eb0e2aa 100644 > --- a/drivers/net/ethernet/renesas/ravb_main.c > +++ b/drivers/net/ethernet/renesas/ravb_main.c > @@ -1150,21 +1150,6 @@ static int ravb_set_link_ksettings(struct net_device *ndev, > return error; > } > > -static int ravb_nway_reset(struct net_device *ndev) > -{ > - struct ravb_private *priv = netdev_priv(ndev); > - int error = -ENODEV; > - unsigned long flags; > - > - if (ndev->phydev) { > - spin_lock_irqsave(&priv->lock, flags); > - error = phy_start_aneg(ndev->phydev); > - spin_unlock_irqrestore(&priv->lock, flags); > - } Eck! phylib assumes thread context and takes a mutex while calling into the PHY driver. It would be good to add some sort of fixes: tag. Maybe for the commit that added the generic nway_reset? That would at least cover some stable kernels. Reviewed-by: Andrew Lunn <andrew@lunn.ch> Andrew
On 05/24/2018 04:22 PM, Andrew Lunn wrote: > On Thu, May 24, 2018 at 02:11:53PM +0300, Vladimir Zapolskiy wrote: >> The change fixes a sleep in atomic context issue, which can be >> always triggered by running 'ethtool -r' command, because >> phy_start_aneg() protects phydev fields by a mutex. >> >> Another note is that the change implicitly replaces phy_start_aneg() >> with a newer phy_restart_aneg(). >> >> Signed-off-by: Vladimir Zapolskiy <vladimir_zapolskiy@mentor.com> >> --- >> drivers/net/ethernet/renesas/ravb_main.c | 17 +---------------- >> 1 file changed, 1 insertion(+), 16 deletions(-) >> >> diff --git a/drivers/net/ethernet/renesas/ravb_main.c b/drivers/net/ethernet/renesas/ravb_main.c >> index 68f122140966..4a043eb0e2aa 100644 >> --- a/drivers/net/ethernet/renesas/ravb_main.c >> +++ b/drivers/net/ethernet/renesas/ravb_main.c >> @@ -1150,21 +1150,6 @@ static int ravb_set_link_ksettings(struct net_device *ndev, >> return error; >> } >> >> -static int ravb_nway_reset(struct net_device *ndev) >> -{ >> - struct ravb_private *priv = netdev_priv(ndev); >> - int error = -ENODEV; >> - unsigned long flags; >> - >> - if (ndev->phydev) { >> - spin_lock_irqsave(&priv->lock, flags); >> - error = phy_start_aneg(ndev->phydev); >> - spin_unlock_irqrestore(&priv->lock, flags); >> - } > > Eck! phylib assumes thread context and takes a mutex while calling > into the PHY driver. > > It would be good to add some sort of fixes: tag. Maybe for the commit > that added the generic nway_reset? That would at least cover some > stable kernels. > > Reviewed-by: Andrew Lunn <andrew@lunn.ch> > Hi Andrew, thank you for review. generally it makes sense to add Fixes tag, but as I said in the commit message the problem is present before reused phy_ethtool_*() functions were added to the kernel, so some kind of juggling with the proper kernel version would be required in assumption that the fixes are backported as an unmodified changes. Hopefully Sergei as the driver maintainer can verify the fixes on older kernels and suggest the right kernel versions for backporting. -- With best wishes, Vladimir
Hello! On 05/24/2018 05:11 PM, Vladimir Zapolskiy wrote: >>> The change fixes a sleep in atomic context issue, which can be >>> always triggered by running 'ethtool -r' command, because >>> phy_start_aneg() protects phydev fields by a mutex. You don't say that *not* grabbing the spinlock is safe... >>> Another note is that the change implicitly replaces phy_start_aneg() >>> with a newer phy_restart_aneg(). Hm, perphaps this could be a material for a separate patch? >>> >>> Signed-off-by: Vladimir Zapolskiy <vladimir_zapolskiy@mentor.com> >>> --- >>> drivers/net/ethernet/renesas/ravb_main.c | 17 +---------------- >>> 1 file changed, 1 insertion(+), 16 deletions(-) >>> >>> diff --git a/drivers/net/ethernet/renesas/ravb_main.c b/drivers/net/ethernet/renesas/ravb_main.c >>> index 68f122140966..4a043eb0e2aa 100644 >>> --- a/drivers/net/ethernet/renesas/ravb_main.c >>> +++ b/drivers/net/ethernet/renesas/ravb_main.c >>> @@ -1150,21 +1150,6 @@ static int ravb_set_link_ksettings(struct net_device *ndev, >>> return error; >>> } >>> >>> -static int ravb_nway_reset(struct net_device *ndev) >>> -{ >>> - struct ravb_private *priv = netdev_priv(ndev); >>> - int error = -ENODEV; >>> - unsigned long flags; >>> - >>> - if (ndev->phydev) { >>> - spin_lock_irqsave(&priv->lock, flags); >>> - error = phy_start_aneg(ndev->phydev); >>> - spin_unlock_irqrestore(&priv->lock, flags); >>> - } >> >> Eck! phylib assumes thread context and takes a mutex while calling >> into the PHY driver. >> >> It would be good to add some sort of fixes: tag. Maybe for the commit >> that added the generic nway_reset? That would at least cover some >> stable kernels. >> >> Reviewed-by: Andrew Lunn <andrew@lunn.ch> >> > > Hi Andrew, thank you for review. > > generally it makes sense to add Fixes tag, but as I said in > the commit message the problem is present before reused phy_ethtool_*() > functions were added to the kernel, so some kind of juggling with > the proper kernel version would be required in assumption that > the fixes are backported as an unmodified changes. The -stable fixes can vary from version to version, IIUC. You could be asked to backport your patch if Greg KH (or somebody else from the -stable kernel maintainers) gets in trouble backporting your patch. > Hopefully Sergei as the driver maintainer can verify the fixes on I'm *not* a maintainer, just a humble reviewer! :-) > older kernels and suggest the right kernel versions for backporting. This would be asking too much from me, I'm afraid... Still, Dave, could you please give me a couple of days to spend on this series? > -- > With best wishes, > Vladimir MBR, Sergei
On Thu, May 24, 2018 at 07:18:28PM +0300, Sergei Shtylyov wrote: > Hello! > > On 05/24/2018 05:11 PM, Vladimir Zapolskiy wrote: > > >>> The change fixes a sleep in atomic context issue, which can be > >>> always triggered by running 'ethtool -r' command, because > >>> phy_start_aneg() protects phydev fields by a mutex. > > You don't say that *not* grabbing the spinlock is safe... For it to be unsafe, i think that would mean phylib would need to call back into the MAC driver? The only way that could happen is via the adjust_link call. And that will deadlock, since it takes the same lock. Or am i/we missing something? Andrew
On 05/24/2018 07:44 PM, Andrew Lunn wrote: >>>>> The change fixes a sleep in atomic context issue, which can be >>>>> always triggered by running 'ethtool -r' command, because >>>>> phy_start_aneg() protects phydev fields by a mutex. >> >> You don't say that *not* grabbing the spinlock is safe... > > For it to be unsafe, i think that would mean phylib would need to call > back into the MAC driver? The only way that could happen is via the > adjust_link call. And that will deadlock, since it takes the same > lock. > > Or am i/we missing something? It doesn't take any locks currently, only patches #3/#6 makes it do so... > Andrew MBR, Sergei
> > For it to be unsafe, i think that would mean phylib would need to call > > back into the MAC driver? The only way that could happen is via the > > adjust_link call. And that will deadlock, since it takes the same > > lock. > > > > Or am i/we missing something? > > It doesn't take any locks currently, only patches #3/#6 makes it do so... Ah, yes. You should not be holding any spinlocks when calling into phylib. It does its own locking, which is mutex based. The code in this patch is not touching the MAC, so looks safe to me. Andrew
Hello Sergei, On 05/24/2018 08:01 PM, Sergei Shtylyov wrote: > On 05/24/2018 07:44 PM, Andrew Lunn wrote: > >>>>>> The change fixes a sleep in atomic context issue, which can be >>>>>> always triggered by running 'ethtool -r' command, because >>>>>> phy_start_aneg() protects phydev fields by a mutex. >>> >>> You don't say that *not* grabbing the spinlock is safe... I say both that it is the fix and it is safe, I've already described the function of the spinlock in my comments, and it is more or less clear from the driver code. >> >> For it to be unsafe, i think that would mean phylib would need to call >> back into the MAC driver? The only way that could happen is via the >> adjust_link call. And that will deadlock, since it takes the same >> lock. >> >> Or am i/we missing something? > > It doesn't take any locks currently, only patches #3/#6 makes it do so... And that's the proper fix in my opinion, my tests don't unveil any issues. -- With best wishes, Vladimir
Hello. A formal patch review this time... On 05/24/2018 02:11 PM, Vladimir Zapolskiy wrote: > The change fixes a sleep in atomic context issue, which can be > always triggered by running 'ethtool -r' command, because > phy_start_aneg() protects phydev fields by a mutex. OK so far... > Another note is that the change implicitly replaces phy_start_aneg() > with a newer phy_restart_aneg(). Why? Is this necessary to fix the BUG()? > Signed-off-by: Vladimir Zapolskiy <vladimir_zapolskiy@mentor.com> > --- > drivers/net/ethernet/renesas/ravb_main.c | 17 +---------------- > 1 file changed, 1 insertion(+), 16 deletions(-) > > diff --git a/drivers/net/ethernet/renesas/ravb_main.c b/drivers/net/ethernet/renesas/ravb_main.c > index 68f122140966..4a043eb0e2aa 100644 > --- a/drivers/net/ethernet/renesas/ravb_main.c > +++ b/drivers/net/ethernet/renesas/ravb_main.c > @@ -1150,21 +1150,6 @@ static int ravb_set_link_ksettings(struct net_device *ndev, > return error; > } > > -static int ravb_nway_reset(struct net_device *ndev) > -{ > - struct ravb_private *priv = netdev_priv(ndev); > - int error = -ENODEV; > - unsigned long flags; > - > - if (ndev->phydev) { > - spin_lock_irqsave(&priv->lock, flags); OK, removing spin_lock_irqsave() fixes the BUG()... Not sure what we rotect against here anyway, MAC interrupts? > - error = phy_start_aneg(ndev->phydev); > - spin_unlock_irqrestore(&priv->lock, flags); > - } > - > - return error; > -} > - > static u32 ravb_get_msglevel(struct net_device *ndev) > { > struct ravb_private *priv = netdev_priv(ndev); > @@ -1377,7 +1362,7 @@ static int ravb_set_wol(struct net_device *ndev, struct ethtool_wolinfo *wol) > } > > static const struct ethtool_ops ravb_ethtool_ops = { > - .nway_reset = ravb_nway_reset, > + .nway_reset = phy_ethtool_nway_reset, What does this fix? > .get_msglevel = ravb_get_msglevel, > .set_msglevel = ravb_set_msglevel, > .get_link = ethtool_op_get_link, MBR, Sergei
Hello. A formal patch review this time... On 05/24/2018 02:11 PM, Vladimir Zapolskiy wrote: > The change fixes a sleep in atomic context issue, which can be > always triggered by running 'ethtool -r' command, because > phy_start_aneg() protects phydev fields by a mutex. OK so far... > Another note is that the change implicitly replaces phy_start_aneg() > with a newer phy_restart_aneg(). Why? Is this necessary to fix the BUG()? > Signed-off-by: Vladimir Zapolskiy <vladimir_zapolskiy@mentor.com> > --- > drivers/net/ethernet/renesas/ravb_main.c | 17 +---------------- > 1 file changed, 1 insertion(+), 16 deletions(-) > > diff --git a/drivers/net/ethernet/renesas/ravb_main.c b/drivers/net/ethernet/renesas/ravb_main.c > index 68f122140966..4a043eb0e2aa 100644 > --- a/drivers/net/ethernet/renesas/ravb_main.c > +++ b/drivers/net/ethernet/renesas/ravb_main.c > @@ -1150,21 +1150,6 @@ static int ravb_set_link_ksettings(struct net_device *ndev, > return error; > } > > -static int ravb_nway_reset(struct net_device *ndev) > -{ > - struct ravb_private *priv = netdev_priv(ndev); > - int error = -ENODEV; > - unsigned long flags; > - > - if (ndev->phydev) { > - spin_lock_irqsave(&priv->lock, flags); OK, removing spin_lock_irqsave() fixes the BUG()... Not sure what we rotect against here anyway, MAC interrupts? > - error = phy_start_aneg(ndev->phydev); > - spin_unlock_irqrestore(&priv->lock, flags); > - } > - > - return error; > -} > - > static u32 ravb_get_msglevel(struct net_device *ndev) > { > struct ravb_private *priv = netdev_priv(ndev); > @@ -1377,7 +1362,7 @@ static int ravb_set_wol(struct net_device *ndev, struct ethtool_wolinfo *wol) > } > > static const struct ethtool_ops ravb_ethtool_ops = { > - .nway_reset = ravb_nway_reset, > + .nway_reset = phy_ethtool_nway_reset, What does this fix? > .get_msglevel = ravb_get_msglevel, > .set_msglevel = ravb_set_msglevel, > .get_link = ethtool_op_get_link, MBR, Sergei
On 05/24/2018 02:11 PM, Vladimir Zapolskiy wrote: > The change fixes a sleep in atomic context issue, which can be > always triggered by running 'ethtool -r' command, because > phy_start_aneg() protects phydev fields by a mutex. BTW, I was unable to trigger the BUG() with 'ethtool -r eth0' where 'eth0' is EtherAVB. What am I doing wrong? :-) MBR, Sergei
diff --git a/drivers/net/ethernet/renesas/ravb_main.c b/drivers/net/ethernet/renesas/ravb_main.c index 68f122140966..4a043eb0e2aa 100644 --- a/drivers/net/ethernet/renesas/ravb_main.c +++ b/drivers/net/ethernet/renesas/ravb_main.c @@ -1150,21 +1150,6 @@ static int ravb_set_link_ksettings(struct net_device *ndev, return error; } -static int ravb_nway_reset(struct net_device *ndev) -{ - struct ravb_private *priv = netdev_priv(ndev); - int error = -ENODEV; - unsigned long flags; - - if (ndev->phydev) { - spin_lock_irqsave(&priv->lock, flags); - error = phy_start_aneg(ndev->phydev); - spin_unlock_irqrestore(&priv->lock, flags); - } - - return error; -} - static u32 ravb_get_msglevel(struct net_device *ndev) { struct ravb_private *priv = netdev_priv(ndev); @@ -1377,7 +1362,7 @@ static int ravb_set_wol(struct net_device *ndev, struct ethtool_wolinfo *wol) } static const struct ethtool_ops ravb_ethtool_ops = { - .nway_reset = ravb_nway_reset, + .nway_reset = phy_ethtool_nway_reset, .get_msglevel = ravb_get_msglevel, .set_msglevel = ravb_set_msglevel, .get_link = ethtool_op_get_link,
The change fixes a sleep in atomic context issue, which can be always triggered by running 'ethtool -r' command, because phy_start_aneg() protects phydev fields by a mutex. Another note is that the change implicitly replaces phy_start_aneg() with a newer phy_restart_aneg(). Signed-off-by: Vladimir Zapolskiy <vladimir_zapolskiy@mentor.com> --- drivers/net/ethernet/renesas/ravb_main.c | 17 +---------------- 1 file changed, 1 insertion(+), 16 deletions(-)