Message ID | 20250123-fix_missing_rtnl_lock_phy_disconnect-v2-1-e6206f5508ba@bootlin.com (mailing list archive) |
---|---|
State | Superseded |
Delegated to: | Geert Uytterhoeven |
Headers | show |
Series | Fix missing rtnl lock in suspend path | expand |
On 23/01/2025 16:58, Kory Maincent wrote: > Fix the suspend path by ensuring the rtnl lock is held where required. > Calls to ravb_open, ravb_close and wol operations must be performed under > the rtnl lock to prevent conflicts with ongoing ndo operations. > > Without this fix, the following warning is triggered: > [ 39.032969] ============================= > [ 39.032983] WARNING: suspicious RCU usage > [ 39.033019] ----------------------------- > [ 39.033033] drivers/net/phy/phy_device.c:2004 suspicious > rcu_dereference_protected() usage! > ... > [ 39.033597] stack backtrace: > [ 39.033613] CPU: 0 UID: 0 PID: 174 Comm: python3 Not tainted > 6.13.0-rc7-next-20250116-arm64-renesas-00002-g35245dfdc62c #7 > [ 39.033623] Hardware name: Renesas SMARC EVK version 2 based on > r9a08g045s33 (DT) > [ 39.033628] Call trace: > [ 39.033633] show_stack+0x14/0x1c (C) > [ 39.033652] dump_stack_lvl+0xb4/0xc4 > [ 39.033664] dump_stack+0x14/0x1c > [ 39.033671] lockdep_rcu_suspicious+0x16c/0x22c > [ 39.033682] phy_detach+0x160/0x190 > [ 39.033694] phy_disconnect+0x40/0x54 > [ 39.033703] ravb_close+0x6c/0x1cc > [ 39.033714] ravb_suspend+0x48/0x120 > [ 39.033721] dpm_run_callback+0x4c/0x14c > [ 39.033731] device_suspend+0x11c/0x4dc > [ 39.033740] dpm_suspend+0xdc/0x214 > [ 39.033748] dpm_suspend_start+0x48/0x60 > [ 39.033758] suspend_devices_and_enter+0x124/0x574 > [ 39.033769] pm_suspend+0x1ac/0x274 > [ 39.033778] state_store+0x88/0x124 > [ 39.033788] kobj_attr_store+0x14/0x24 > [ 39.033798] sysfs_kf_write+0x48/0x6c > [ 39.033808] kernfs_fop_write_iter+0x118/0x1a8 > [ 39.033817] vfs_write+0x27c/0x378 > [ 39.033825] ksys_write+0x64/0xf4 > [ 39.033833] __arm64_sys_write+0x18/0x20 > [ 39.033841] invoke_syscall+0x44/0x104 > [ 39.033852] el0_svc_common.constprop.0+0xb4/0xd4 > [ 39.033862] do_el0_svc+0x18/0x20 > [ 39.033870] el0_svc+0x3c/0xf0 > [ 39.033880] el0t_64_sync_handler+0xc0/0xc4 > [ 39.033888] el0t_64_sync+0x154/0x158 > [ 39.041274] ravb 11c30000.ethernet eth0: Link is Down > > Reported-by: Claudiu Beznea <claudiu.beznea.uj@bp.renesas.com> > Closes: https://lore.kernel.org/netdev/4c6419d8-c06b-495c-b987-d66c2e1ff848@tuxon.dev/ > Fixes: 0184165b2f42 ("ravb: add sleep PM suspend/resume support") > Reviewed-by: Paul Barker <paul.barker.ct@bp.renesas.com> > Tested-by: Niklas Söderlund <niklas.soderlund+renesas@ragnatech.se> > Reviewed-by: Sergey Shtylyov <s.shtylyov@omp.ru> > Signed-off-by: Kory Maincent <kory.maincent@bootlin.com> > --- > > Changes in v2: > - Move the rtnl_lock before ravb_wol_setup() and and ravb_wol_restore() > instead of before the if condition. > --- > drivers/net/ethernet/renesas/ravb_main.c | 14 ++++++++++++-- > 1 file changed, 12 insertions(+), 2 deletions(-) > > diff --git a/drivers/net/ethernet/renesas/ravb_main.c b/drivers/net/ethernet/renesas/ravb_main.c > index bc395294a32d..cfe4f0f364f3 100644 > --- a/drivers/net/ethernet/renesas/ravb_main.c > +++ b/drivers/net/ethernet/renesas/ravb_main.c > @@ -3217,10 +3217,16 @@ static int ravb_suspend(struct device *dev) > > netif_device_detach(ndev); > > - if (priv->wol_enabled) > - return ravb_wol_setup(ndev); > + if (priv->wol_enabled) { > + rtnl_lock(); > + ret = ravb_wol_setup(ndev); > + rtnl_unlock(); > + return ret; > + } > > + rtnl_lock(); > ret = ravb_close(ndev); > + rtnl_unlock(); > if (ret) > return ret; > > @@ -3247,7 +3253,9 @@ static int ravb_resume(struct device *dev) > > /* If WoL is enabled restore the interface. */ > if (priv->wol_enabled) { > + rtnl_lock(); > ret = ravb_wol_restore(ndev); > + rtnl_unlock(); > if (ret) > return ret; > } else { > @@ -3257,7 +3265,9 @@ static int ravb_resume(struct device *dev) > } > > /* Reopening the interface will restore the device to the working state. */ > + rtnl_lock(); > ret = ravb_open(ndev); > + rtnl_unlock(); > if (ret < 0) > goto out_rpm_put; > > Please remove Reviewed-by tags when making changes like this in a subsequent version of a patch series, this is no longer the patch I reviewed. I don't like the multiple lock/unlock calls in each function. I think v1 was better, where we take the lock once in each function and then unlock when it is no longer needed or when we're about to return. Thanks,
On Thu, 23 Jan 2025 17:23:11 +0000 Paul Barker <paul.barker.ct@bp.renesas.com> wrote: > On 23/01/2025 16:58, Kory Maincent wrote: > > Fix the suspend path by ensuring the rtnl lock is held where required. > > Calls to ravb_open, ravb_close and wol operations must be performed under > > the rtnl lock to prevent conflicts with ongoing ndo operations. ... > > > > @@ -3247,7 +3253,9 @@ static int ravb_resume(struct device *dev) > > > > /* If WoL is enabled restore the interface. */ > > if (priv->wol_enabled) { > > + rtnl_lock(); > > ret = ravb_wol_restore(ndev); > > + rtnl_unlock(); > > if (ret) > > return ret; > > } else { > > @@ -3257,7 +3265,9 @@ static int ravb_resume(struct device *dev) > > } > > > > /* Reopening the interface will restore the device to the working > > state. */ > > + rtnl_lock(); > > ret = ravb_open(ndev); > > + rtnl_unlock(); > > if (ret < 0) > > goto out_rpm_put; > > > > > > Please remove Reviewed-by tags when making changes like this in a > subsequent version of a patch series, this is no longer the patch I > reviewed. Oh, sorry for that! > I don't like the multiple lock/unlock calls in each function. I think v1 > was better, where we take the lock once in each function and then unlock > when it is no longer needed or when we're about to return. You will need to achieve a consensus on it with Claudiu. His point of view has that the locking scheme looks complicated. On my side I don't have really an opinion, maybe a small preference for v1 which is protecting wol_enabled flag even if it is not needed. --- pw-bot: cr
On Thu, 23 Jan 2025 18:33:58 +0100 Kory Maincent <kory.maincent@bootlin.com> wrote: > > > > > > @@ -3247,7 +3253,9 @@ static int ravb_resume(struct device *dev) > > > > > > /* If WoL is enabled restore the interface. */ > > > if (priv->wol_enabled) { > > > + rtnl_lock(); > > > ret = ravb_wol_restore(ndev); > > > + rtnl_unlock(); > > > if (ret) > > > return ret; > > > } else { > > > @@ -3257,7 +3265,9 @@ static int ravb_resume(struct device *dev) > > > } > > > > > > /* Reopening the interface will restore the device to the working > > > state. */ > > > + rtnl_lock(); > > > ret = ravb_open(ndev); > > > + rtnl_unlock(); > > > if (ret < 0) > > > goto out_rpm_put; > > > I don't like the multiple lock/unlock calls in each function. I think v1 > > was better, where we take the lock once in each function and then unlock > > when it is no longer needed or when we're about to return. > > You will need to achieve a consensus on it with Claudiu. His point of view has > that the locking scheme looks complicated. > > On my side I don't have really an opinion, maybe a small preference for v1 > which is protecting wol_enabled flag even if it is not needed. Claudiu any remarks? If not I will come back to the first version as asked by Paul who is the Maintainer of the ravb driver. Sergey have asked to remove the duplicate of the if condition. Paul is this ok for you? @@ -3245,19 +3250,21 @@ static int ravb_resume(struct device *dev) if (!netif_running(ndev)) return 0; + rtnl_lock(); /* If WoL is enabled restore the interface. */ - if (priv->wol_enabled) { + if (priv->wol_enabled) ret = ravb_wol_restore(ndev); - if (ret) - return ret; - } else { + else ret = pm_runtime_force_resume(dev); - if (ret) - return ret; + + if (ret) { + rtnl_unlock(); + return ret; } /* Reopening the interface will restore the device to the working state. */ ret = ravb_open(ndev); + rtnl_unlock(); if (ret < 0) goto out_rpm_put; Note: Sergey, I have received your mail as spam. Regards,
On 27.01.2025 12:28, Kory Maincent wrote: > On Thu, 23 Jan 2025 18:33:58 +0100 > Kory Maincent <kory.maincent@bootlin.com> wrote: > >>>> >>>> @@ -3247,7 +3253,9 @@ static int ravb_resume(struct device *dev) >>>> >>>> /* If WoL is enabled restore the interface. */ >>>> if (priv->wol_enabled) { >>>> + rtnl_lock(); >>>> ret = ravb_wol_restore(ndev); >>>> + rtnl_unlock(); >>>> if (ret) >>>> return ret; >>>> } else { >>>> @@ -3257,7 +3265,9 @@ static int ravb_resume(struct device *dev) >>>> } >>>> >>>> /* Reopening the interface will restore the device to the working >>>> state. */ >>>> + rtnl_lock(); >>>> ret = ravb_open(ndev); >>>> + rtnl_unlock(); >>>> if (ret < 0) >>>> goto out_rpm_put; >> >>> I don't like the multiple lock/unlock calls in each function. I think v1 >>> was better, where we take the lock once in each function and then unlock >>> when it is no longer needed or when we're about to return. >> >> You will need to achieve a consensus on it with Claudiu. His point of view has >> that the locking scheme looks complicated. >> >> On my side I don't have really an opinion, maybe a small preference for v1 >> which is protecting wol_enabled flag even if it is not needed. > > Claudiu any remarks? Sorry for the delay. I still consider it safe as proposed (taking the lock around the individual functions) due to the above reasons: 1/ in ravb_suspend(): - the execution just returns after ravb_wol_setup() - there is no need to lock around runtime PM function (pm_runtime_force_suspend()) as the execution through it reach this driver only though the driver specific runtime PM function which is a nop (and FMPOV it should be removed) 2/ in ravb_resume(): - locking only around ravb_wol_restore() and ravb_open() mimics what is done when the interface is open/closed through user space; in that scenario the ravb_close()/ravb_open() are called with rtnl_lock() held through devinet_ioctl() - and for the above mentioned reason there is no need to lock around pm_runtime_force_resume() Please follow the approach preferred by the maintainers. Thank you, Claudiu > If not I will come back to the first version as asked by Paul who is the > Maintainer of the ravb driver. > > Sergey have asked to remove the duplicate of the if condition. > Paul is this ok for you? > > @@ -3245,19 +3250,21 @@ static int ravb_resume(struct device *dev) > if (!netif_running(ndev)) > return 0; > > + rtnl_lock(); > /* If WoL is enabled restore the interface. */ > - if (priv->wol_enabled) { > + if (priv->wol_enabled) > ret = ravb_wol_restore(ndev); > - if (ret) > - return ret; > - } else { > + else > ret = pm_runtime_force_resume(dev); > - if (ret) > - return ret; > + > + if (ret) { > + rtnl_unlock(); > + return ret; > } > > /* Reopening the interface will restore the device to the working > state. */ > ret = ravb_open(ndev); > + rtnl_unlock(); > if (ret < 0) > goto out_rpm_put; > > > Note: Sergey, I have received your mail as spam. > > Regards,
diff --git a/drivers/net/ethernet/renesas/ravb_main.c b/drivers/net/ethernet/renesas/ravb_main.c index bc395294a32d..cfe4f0f364f3 100644 --- a/drivers/net/ethernet/renesas/ravb_main.c +++ b/drivers/net/ethernet/renesas/ravb_main.c @@ -3217,10 +3217,16 @@ static int ravb_suspend(struct device *dev) netif_device_detach(ndev); - if (priv->wol_enabled) - return ravb_wol_setup(ndev); + if (priv->wol_enabled) { + rtnl_lock(); + ret = ravb_wol_setup(ndev); + rtnl_unlock(); + return ret; + } + rtnl_lock(); ret = ravb_close(ndev); + rtnl_unlock(); if (ret) return ret; @@ -3247,7 +3253,9 @@ static int ravb_resume(struct device *dev) /* If WoL is enabled restore the interface. */ if (priv->wol_enabled) { + rtnl_lock(); ret = ravb_wol_restore(ndev); + rtnl_unlock(); if (ret) return ret; } else { @@ -3257,7 +3265,9 @@ static int ravb_resume(struct device *dev) } /* Reopening the interface will restore the device to the working state. */ + rtnl_lock(); ret = ravb_open(ndev); + rtnl_unlock(); if (ret < 0) goto out_rpm_put;