diff mbox series

another netdev instance lock bug in ipv6_add_dev

Message ID aac073de8beec3e531c86c101b274d434741c28e.camel@nvidia.com (mailing list archive)
State New
Headers show
Series another netdev instance lock bug in ipv6_add_dev | expand

Checks

Context Check Description
netdev/tree_selection success Guessing tree name failed - patch did not apply

Commit Message

Cosmin Ratiu April 2, 2025, 9:41 p.m. UTC
Hi,

Not sure if it's reported already, but I encountered a bug while
testing with the new locking scheme.
This is the call trace:

[ 3454.975672] WARNING: CPU: 1 PID: 58237 at
./include/net/netdev_lock.h:54 ipv6_add_dev+0x370/0x620
[ 3455.008776]  ? ipv6_add_dev+0x370/0x620
[ 3455.010097]  ipv6_find_idev+0x96/0xe0
[ 3455.010725]  addrconf_add_dev+0x1e/0xa0
[ 3455.011382]  addrconf_init_auto_addrs+0xb0/0x720
[ 3455.013537]  addrconf_notify+0x35f/0x8d0
[ 3455.014214]  notifier_call_chain+0x38/0xf0
[ 3455.014903]  netdev_state_change+0x65/0x90
[ 3455.015586]  linkwatch_do_dev+0x5a/0x70
[ 3455.016238]  rtnl_getlink+0x241/0x3e0
[ 3455.019046]  rtnetlink_rcv_msg+0x177/0x5e0

The call chain is rtnl_getlink -> linkwatch_sync_dev ->
linkwatch_do_dev -> netdev_state_change -> ...

Nothing on this path acquires the netdev lock, resulting in a warning.
Perhaps rtnl_getlink should acquire it, in addition to the RTNL already
held by rtnetlink_rcv_msg?

The same thing can be seen from the regular linkwatch wq:

[ 3456.637014] WARNING: CPU: 16 PID: 83257 at
./include/net/netdev_lock.h:54 ipv6_add_dev+0x370/0x620
[ 3456.655305] Call Trace:
[ 3456.655610]  <TASK>
[ 3456.655890]  ? __warn+0x89/0x1b0
[ 3456.656261]  ? ipv6_add_dev+0x370/0x620
[ 3456.660039]  ipv6_find_idev+0x96/0xe0
[ 3456.660445]  addrconf_add_dev+0x1e/0xa0
[ 3456.660861]  addrconf_init_auto_addrs+0xb0/0x720
[ 3456.661803]  addrconf_notify+0x35f/0x8d0
[ 3456.662236]  notifier_call_chain+0x38/0xf0
[ 3456.662676]  netdev_state_change+0x65/0x90
[ 3456.663112]  linkwatch_do_dev+0x5a/0x70
[ 3456.663529]  __linkwatch_run_queue+0xeb/0x200
[ 3456.663990]  linkwatch_event+0x21/0x30
[ 3456.664399]  process_one_work+0x211/0x610
[ 3456.664828]  worker_thread+0x1cc/0x380
[ 3456.665691]  kthread+0xf4/0x210

In this case, __linkwatch_run_queue seems like a good place to grab a
device lock before calling linkwatch_do_dev.

The proposed patch is below, I'll let you reason through the
implications of calling NETDEV_CHANGE notifiers from linkwatch with the
instance lock, you have thought about this much longer than me.

---
 net/core/link_watch.c | 2 ++
 net/core/rtnetlink.c  | 2 ++
 2 files changed, 4 insertions(+)

 			       RTM_NEWLINK, NETLINK_CB(skb).portid,

Comments

Stanislav Fomichev April 2, 2025, 11:20 p.m. UTC | #1
On 04/02, Cosmin Ratiu wrote:
> Hi,
> 
> Not sure if it's reported already, but I encountered a bug while
> testing with the new locking scheme.
> This is the call trace:
> 
> [ 3454.975672] WARNING: CPU: 1 PID: 58237 at
> ./include/net/netdev_lock.h:54 ipv6_add_dev+0x370/0x620
> [ 3455.008776]  ? ipv6_add_dev+0x370/0x620
> [ 3455.010097]  ipv6_find_idev+0x96/0xe0
> [ 3455.010725]  addrconf_add_dev+0x1e/0xa0
> [ 3455.011382]  addrconf_init_auto_addrs+0xb0/0x720
> [ 3455.013537]  addrconf_notify+0x35f/0x8d0
> [ 3455.014214]  notifier_call_chain+0x38/0xf0
> [ 3455.014903]  netdev_state_change+0x65/0x90
> [ 3455.015586]  linkwatch_do_dev+0x5a/0x70
> [ 3455.016238]  rtnl_getlink+0x241/0x3e0
> [ 3455.019046]  rtnetlink_rcv_msg+0x177/0x5e0
> 
> The call chain is rtnl_getlink -> linkwatch_sync_dev ->
> linkwatch_do_dev -> netdev_state_change -> ...
> 
> Nothing on this path acquires the netdev lock, resulting in a warning.
> Perhaps rtnl_getlink should acquire it, in addition to the RTNL already
> held by rtnetlink_rcv_msg?
> 
> The same thing can be seen from the regular linkwatch wq:
> 
> [ 3456.637014] WARNING: CPU: 16 PID: 83257 at
> ./include/net/netdev_lock.h:54 ipv6_add_dev+0x370/0x620
> [ 3456.655305] Call Trace:
> [ 3456.655610]  <TASK>
> [ 3456.655890]  ? __warn+0x89/0x1b0
> [ 3456.656261]  ? ipv6_add_dev+0x370/0x620
> [ 3456.660039]  ipv6_find_idev+0x96/0xe0
> [ 3456.660445]  addrconf_add_dev+0x1e/0xa0
> [ 3456.660861]  addrconf_init_auto_addrs+0xb0/0x720
> [ 3456.661803]  addrconf_notify+0x35f/0x8d0
> [ 3456.662236]  notifier_call_chain+0x38/0xf0
> [ 3456.662676]  netdev_state_change+0x65/0x90
> [ 3456.663112]  linkwatch_do_dev+0x5a/0x70
> [ 3456.663529]  __linkwatch_run_queue+0xeb/0x200
> [ 3456.663990]  linkwatch_event+0x21/0x30
> [ 3456.664399]  process_one_work+0x211/0x610
> [ 3456.664828]  worker_thread+0x1cc/0x380
> [ 3456.665691]  kthread+0xf4/0x210
> 
> In this case, __linkwatch_run_queue seems like a good place to grab a
> device lock before calling linkwatch_do_dev.

Thanks for the report! What about linkwatch_sync_dev in netdev_run_todo
and carrier_show? Should probably also need to be wrapped?

> The proposed patch is below, I'll let you reason through the
> implications of calling NETDEV_CHANGE notifiers from linkwatch with the
> instance lock, you have thought about this much longer than me.

Yeah, I wonder whether other unlocked NETDEV_CHANGE paths can trigger
a call to ipv6_add_dev. Will try to take a look. Initially I
though that only NETDEV_UP will trigger ipv6_add_dev, but looks
like it can happen on CHANGE as well :-(
Cosmin Ratiu April 3, 2025, 1:24 p.m. UTC | #2
On Wed, 2025-04-02 at 16:20 -0700, Stanislav Fomichev wrote:
> On 04/02, Cosmin Ratiu wrote:
> > Hi,
> > 
> > Not sure if it's reported already, but I encountered a bug while
> > testing with the new locking scheme.
> > This is the call trace:
> > 
> > [ 3454.975672] WARNING: CPU: 1 PID: 58237 at
> > ./include/net/netdev_lock.h:54 ipv6_add_dev+0x370/0x620
> > [ 3455.008776]  ? ipv6_add_dev+0x370/0x620
> > [ 3455.010097]  ipv6_find_idev+0x96/0xe0
> > [ 3455.010725]  addrconf_add_dev+0x1e/0xa0
> > [ 3455.011382]  addrconf_init_auto_addrs+0xb0/0x720
> > [ 3455.013537]  addrconf_notify+0x35f/0x8d0
> > [ 3455.014214]  notifier_call_chain+0x38/0xf0
> > [ 3455.014903]  netdev_state_change+0x65/0x90
> > [ 3455.015586]  linkwatch_do_dev+0x5a/0x70
> > [ 3455.016238]  rtnl_getlink+0x241/0x3e0
> > [ 3455.019046]  rtnetlink_rcv_msg+0x177/0x5e0
> > 
> > The call chain is rtnl_getlink -> linkwatch_sync_dev ->
> > linkwatch_do_dev -> netdev_state_change -> ...
> > 
> > Nothing on this path acquires the netdev lock, resulting in a
> > warning.
> > Perhaps rtnl_getlink should acquire it, in addition to the RTNL
> > already
> > held by rtnetlink_rcv_msg?
> > 
> > The same thing can be seen from the regular linkwatch wq:
> > 
> > [ 3456.637014] WARNING: CPU: 16 PID: 83257 at
> > ./include/net/netdev_lock.h:54 ipv6_add_dev+0x370/0x620
> > [ 3456.655305] Call Trace:
> > [ 3456.655610]  <TASK>
> > [ 3456.655890]  ? __warn+0x89/0x1b0
> > [ 3456.656261]  ? ipv6_add_dev+0x370/0x620
> > [ 3456.660039]  ipv6_find_idev+0x96/0xe0
> > [ 3456.660445]  addrconf_add_dev+0x1e/0xa0
> > [ 3456.660861]  addrconf_init_auto_addrs+0xb0/0x720
> > [ 3456.661803]  addrconf_notify+0x35f/0x8d0
> > [ 3456.662236]  notifier_call_chain+0x38/0xf0
> > [ 3456.662676]  netdev_state_change+0x65/0x90
> > [ 3456.663112]  linkwatch_do_dev+0x5a/0x70
> > [ 3456.663529]  __linkwatch_run_queue+0xeb/0x200
> > [ 3456.663990]  linkwatch_event+0x21/0x30
> > [ 3456.664399]  process_one_work+0x211/0x610
> > [ 3456.664828]  worker_thread+0x1cc/0x380
> > [ 3456.665691]  kthread+0xf4/0x210
> > 
> > In this case, __linkwatch_run_queue seems like a good place to grab
> > a
> > device lock before calling linkwatch_do_dev.
> 
> Thanks for the report! What about linkwatch_sync_dev in
> netdev_run_todo
> and carrier_show? Should probably also need to be wrapped?

Done, here's the patch I'm testing with which works for all tests I
could get my hands on. Will you officially propose it (maybe in a
slightly different form) please?

-------------------------

linkwatch can end up calling the IPv6 addrconf notifier, which might
end
up calling ipv6_add_dev which requires holding the netdev lock.

This patch makes sure that the netdev instance lock is held on all call
paths.

Signed-off-by: Cosmin Ratiu <cratiu@nvidia.com>
Change-Id: Ief821cf069408cecc82adaa01cafa0462c51908a
---
 net/core/dev.c        | 2 +-
 net/core/link_watch.c | 2 ++
 net/core/net-sysfs.c  | 2 ++
 net/core/rtnetlink.c  | 2 ++
 4 files changed, 7 insertions(+), 1 deletion(-)

diff --git a/net/core/dev.c b/net/core/dev.c
index 87cba93fa59f..1b9ee2828076 100644
--- a/net/core/dev.c
+++ b/net/core/dev.c
@@ -11343,8 +11343,8 @@ void netdev_run_todo(void)
 
 		netdev_lock(dev);
 		WRITE_ONCE(dev->reg_state, NETREG_UNREGISTERED);
-		netdev_unlock(dev);
 		linkwatch_sync_dev(dev);
+		netdev_unlock(dev);
 	}
 
 	cnt = 0;
diff --git a/net/core/link_watch.c b/net/core/link_watch.c
index cb04ef2b9807..002f18b11d85 100644
--- a/net/core/link_watch.c
+++ b/net/core/link_watch.c
@@ -240,7 +240,9 @@ static void __linkwatch_run_queue(int urgent_only)
 		 */
 		netdev_tracker_free(dev, &dev->linkwatch_dev_tracker);
 		spin_unlock_irq(&lweventlist_lock);
+		netdev_lock_ops(dev);
 		linkwatch_do_dev(dev);
+		netdev_unlock_ops(dev);
 		do_dev--;
 		spin_lock_irq(&lweventlist_lock);
 	}
diff --git a/net/core/net-sysfs.c b/net/core/net-sysfs.c
index 1ace0cd01adc..92cffb233306 100644
--- a/net/core/net-sysfs.c
+++ b/net/core/net-sysfs.c
@@ -325,7 +325,9 @@ static ssize_t carrier_show(struct device *dev,
 		/* Synchronize carrier state with link watch,
 		 * see also rtnl_getlink().
 		 */
+		netdev_lock_ops(netdev);
 		linkwatch_sync_dev(netdev);
+		netdev_unlock_ops(netdev);
 
 		ret = sysfs_emit(buf, fmt_dec,
!!netif_carrier_ok(netdev));
 	}
diff --git a/net/core/rtnetlink.c b/net/core/rtnetlink.c
index e4c93f87f5d4..2cb28a3d0d20 100644
--- a/net/core/rtnetlink.c
+++ b/net/core/rtnetlink.c
@@ -4175,7 +4175,9 @@ static int rtnl_getlink(struct sk_buff *skb,
struct nlmsghdr *nlh,
 	 * only TX if link watch work has run, but without this we'd
 	 * already report carrier on, even if it doesn't work yet.
 	 */
+	netdev_lock_ops(dev);
 	linkwatch_sync_dev(dev);
+	netdev_unlock_ops(dev);
 
 	err = rtnl_fill_ifinfo(nskb, dev, net,
 			       RTM_NEWLINK, NETLINK_CB(skb).portid,
Stanislav Fomichev April 3, 2025, 4:22 p.m. UTC | #3
On 04/03, Cosmin Ratiu wrote:
> On Wed, 2025-04-02 at 16:20 -0700, Stanislav Fomichev wrote:
> > On 04/02, Cosmin Ratiu wrote:
> > > Hi,
> > > 
> > > Not sure if it's reported already, but I encountered a bug while
> > > testing with the new locking scheme.
> > > This is the call trace:
> > > 
> > > [ 3454.975672] WARNING: CPU: 1 PID: 58237 at
> > > ./include/net/netdev_lock.h:54 ipv6_add_dev+0x370/0x620
> > > [ 3455.008776]  ? ipv6_add_dev+0x370/0x620
> > > [ 3455.010097]  ipv6_find_idev+0x96/0xe0
> > > [ 3455.010725]  addrconf_add_dev+0x1e/0xa0
> > > [ 3455.011382]  addrconf_init_auto_addrs+0xb0/0x720
> > > [ 3455.013537]  addrconf_notify+0x35f/0x8d0
> > > [ 3455.014214]  notifier_call_chain+0x38/0xf0
> > > [ 3455.014903]  netdev_state_change+0x65/0x90
> > > [ 3455.015586]  linkwatch_do_dev+0x5a/0x70
> > > [ 3455.016238]  rtnl_getlink+0x241/0x3e0
> > > [ 3455.019046]  rtnetlink_rcv_msg+0x177/0x5e0
> > > 
> > > The call chain is rtnl_getlink -> linkwatch_sync_dev ->
> > > linkwatch_do_dev -> netdev_state_change -> ...
> > > 
> > > Nothing on this path acquires the netdev lock, resulting in a
> > > warning.
> > > Perhaps rtnl_getlink should acquire it, in addition to the RTNL
> > > already
> > > held by rtnetlink_rcv_msg?
> > > 
> > > The same thing can be seen from the regular linkwatch wq:
> > > 
> > > [ 3456.637014] WARNING: CPU: 16 PID: 83257 at
> > > ./include/net/netdev_lock.h:54 ipv6_add_dev+0x370/0x620
> > > [ 3456.655305] Call Trace:
> > > [ 3456.655610]  <TASK>
> > > [ 3456.655890]  ? __warn+0x89/0x1b0
> > > [ 3456.656261]  ? ipv6_add_dev+0x370/0x620
> > > [ 3456.660039]  ipv6_find_idev+0x96/0xe0
> > > [ 3456.660445]  addrconf_add_dev+0x1e/0xa0
> > > [ 3456.660861]  addrconf_init_auto_addrs+0xb0/0x720
> > > [ 3456.661803]  addrconf_notify+0x35f/0x8d0
> > > [ 3456.662236]  notifier_call_chain+0x38/0xf0
> > > [ 3456.662676]  netdev_state_change+0x65/0x90
> > > [ 3456.663112]  linkwatch_do_dev+0x5a/0x70
> > > [ 3456.663529]  __linkwatch_run_queue+0xeb/0x200
> > > [ 3456.663990]  linkwatch_event+0x21/0x30
> > > [ 3456.664399]  process_one_work+0x211/0x610
> > > [ 3456.664828]  worker_thread+0x1cc/0x380
> > > [ 3456.665691]  kthread+0xf4/0x210
> > > 
> > > In this case, __linkwatch_run_queue seems like a good place to grab
> > > a
> > > device lock before calling linkwatch_do_dev.
> > 
> > Thanks for the report! What about linkwatch_sync_dev in
> > netdev_run_todo
> > and carrier_show? Should probably also need to be wrapped?
> 
> Done, here's the patch I'm testing with which works for all tests I
> could get my hands on. Will you officially propose it (maybe in a
> slightly different form) please?

I'm thinking maybe we should push down the locking a bit? To the
level of netdev_state_change. Since, in theory, every NETDEV_CHANGE
can reach addrconf_notify. I was playing with the patch below,
but I think the ethtool will lockup, so I need to fix that at least...
Let me spend a bit more time today chasing down the callers...

diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h
index 9fb03a292817..f5a322b63e37 100644
--- a/include/linux/netdevice.h
+++ b/include/linux/netdevice.h
@@ -4975,6 +4975,7 @@ void dev_set_rx_mode(struct net_device *dev);
 int dev_set_promiscuity(struct net_device *dev, int inc);
 int netif_set_allmulti(struct net_device *dev, int inc, bool notify);
 int dev_set_allmulti(struct net_device *dev, int inc);
+void netif_state_change(struct net_device *dev);
 void netdev_state_change(struct net_device *dev);
 void __netdev_notify_peers(struct net_device *dev);
 void netdev_notify_peers(struct net_device *dev);
diff --git a/include/linux/rtnetlink.h b/include/linux/rtnetlink.h
index ccaaf4c7d5f6..ea39dd23a197 100644
--- a/include/linux/rtnetlink.h
+++ b/include/linux/rtnetlink.h
@@ -240,6 +240,6 @@ rtnl_notify_needed(const struct net *net, u16 nlflags, u32 group)
 	return (nlflags & NLM_F_ECHO) || rtnl_has_listeners(net, group);
 }
 
-void netdev_set_operstate(struct net_device *dev, int newstate);
+void netif_set_operstate(struct net_device *dev, int newstate);
 
 #endif	/* __LINUX_RTNETLINK_H */
diff --git a/net/core/dev.c b/net/core/dev.c
index 87cba93fa59f..d4a5c07a0d73 100644
--- a/net/core/dev.c
+++ b/net/core/dev.c
@@ -1567,15 +1567,7 @@ void netdev_features_change(struct net_device *dev)
 }
 EXPORT_SYMBOL(netdev_features_change);
 
-/**
- *	netdev_state_change - device changes state
- *	@dev: device to cause notification
- *
- *	Called to indicate a device has changed state. This function calls
- *	the notifier chains for netdev_chain and sends a NEWLINK message
- *	to the routing socket.
- */
-void netdev_state_change(struct net_device *dev)
+void netif_state_change(struct net_device *dev)
 {
 	if (dev->flags & IFF_UP) {
 		struct netdev_notifier_change_info change_info = {
@@ -1587,7 +1579,6 @@ void netdev_state_change(struct net_device *dev)
 		rtmsg_ifinfo(RTM_NEWLINK, dev, 0, GFP_KERNEL, 0, NULL);
 	}
 }
-EXPORT_SYMBOL(netdev_state_change);
 
 /**
  * __netdev_notify_peers - notify network peers about existence of @dev,
diff --git a/net/core/dev_api.c b/net/core/dev_api.c
index 90bafb0b1b8c..6c6ca15ef2a3 100644
--- a/net/core/dev_api.c
+++ b/net/core/dev_api.c
@@ -327,3 +327,19 @@ int dev_xdp_propagate(struct net_device *dev, struct netdev_bpf *bpf)
 	return ret;
 }
 EXPORT_SYMBOL_GPL(dev_xdp_propagate);
+
+/**
+ *	netdev_state_change - device changes state
+ *	@dev: device to cause notification
+ *
+ *	Called to indicate a device has changed state. This function calls
+ *	the notifier chains for netdev_chain and sends a NEWLINK message
+ *	to the routing socket.
+ */
+void netdev_state_change(struct net_device *dev)
+{
+	netdev_lock_ops(dev);
+	netif_state_change(dev);
+	netdev_unlock_ops(dev);
+}
+EXPORT_SYMBOL(netdev_state_change);
diff --git a/net/core/lock_debug.c b/net/core/lock_debug.c
index 72e522a68775..c442bf52dbaf 100644
--- a/net/core/lock_debug.c
+++ b/net/core/lock_debug.c
@@ -20,11 +20,11 @@ int netdev_debug_event(struct notifier_block *nb, unsigned long event,
 	switch (cmd) {
 	case NETDEV_REGISTER:
 	case NETDEV_UP:
+	case NETDEV_CHANGE:
 		netdev_ops_assert_locked(dev);
 		fallthrough;
 	case NETDEV_DOWN:
 	case NETDEV_REBOOT:
-	case NETDEV_CHANGE:
 	case NETDEV_UNREGISTER:
 	case NETDEV_CHANGEMTU:
 	case NETDEV_CHANGEADDR:
diff --git a/net/core/rtnetlink.c b/net/core/rtnetlink.c
index c23852835050..d8d03ff87a3b 100644
--- a/net/core/rtnetlink.c
+++ b/net/core/rtnetlink.c
@@ -1043,7 +1043,7 @@ int rtnl_put_cacheinfo(struct sk_buff *skb, struct dst_entry *dst, u32 id,
 }
 EXPORT_SYMBOL_GPL(rtnl_put_cacheinfo);
 
-void netdev_set_operstate(struct net_device *dev, int newstate)
+void netif_set_operstate(struct net_device *dev, int newstate)
 {
 	unsigned int old = READ_ONCE(dev->operstate);
 
@@ -1052,9 +1052,9 @@ void netdev_set_operstate(struct net_device *dev, int newstate)
 			return;
 	} while (!try_cmpxchg(&dev->operstate, &old, newstate));
 
-	netdev_state_change(dev);
+	netif_state_change(dev);
 }
-EXPORT_SYMBOL(netdev_set_operstate);
+EXPORT_SYMBOL(netif_set_operstate);
 
 static void set_operstate(struct net_device *dev, unsigned char transition)
 {
@@ -1080,7 +1080,7 @@ static void set_operstate(struct net_device *dev, unsigned char transition)
 		break;
 	}
 
-	netdev_set_operstate(dev, operstate);
+	netif_set_operstate(dev, operstate);
 }
 
 static unsigned int rtnl_dev_get_flags(const struct net_device *dev)
@@ -3396,7 +3396,7 @@ static int do_setlink(const struct sk_buff *skb, struct net_device *dev,
 errout:
 	if (status & DO_SETLINK_MODIFIED) {
 		if ((status & DO_SETLINK_NOTIFY) == DO_SETLINK_NOTIFY)
-			netdev_state_change(dev);
+			netif_state_change(dev);
 
 		if (err < 0)
 			net_warn_ratelimited("A link change request failed with some changes committed already. Interface %s may have been left with an inconsistent configuration, please check.\n",
@@ -3676,8 +3676,11 @@ struct net_device *rtnl_create_link(struct net *net, const char *ifname,
 				nla_len(tb[IFLA_BROADCAST]));
 	if (tb[IFLA_TXQLEN])
 		dev->tx_queue_len = nla_get_u32(tb[IFLA_TXQLEN]);
-	if (tb[IFLA_OPERSTATE])
+	if (tb[IFLA_OPERSTATE]) {
+		netdev_lock_ops(dev);
 		set_operstate(dev, nla_get_u8(tb[IFLA_OPERSTATE]));
+		netdev_unlock_ops(dev);
+	}
 	if (tb[IFLA_LINKMODE])
 		dev->link_mode = nla_get_u8(tb[IFLA_LINKMODE]);
 	if (tb[IFLA_GROUP])
diff --git a/net/hsr/hsr_device.c b/net/hsr/hsr_device.c
index 439cfb7ad5d1..1b1b700ec05e 100644
--- a/net/hsr/hsr_device.c
+++ b/net/hsr/hsr_device.c
@@ -33,14 +33,14 @@ static void hsr_set_operstate(struct hsr_port *master, bool has_carrier)
 	struct net_device *dev = master->dev;
 
 	if (!is_admin_up(dev)) {
-		netdev_set_operstate(dev, IF_OPER_DOWN);
+		netif_set_operstate(dev, IF_OPER_DOWN);
 		return;
 	}
 
 	if (has_carrier)
-		netdev_set_operstate(dev, IF_OPER_UP);
+		netif_set_operstate(dev, IF_OPER_UP);
 	else
-		netdev_set_operstate(dev, IF_OPER_LOWERLAYERDOWN);
+		netif_set_operstate(dev, IF_OPER_LOWERLAYERDOWN);
 }
 
 static bool hsr_check_carrier(struct hsr_port *master)
Stanislav Fomichev April 3, 2025, 10:04 p.m. UTC | #4
On 04/03, Stanislav Fomichev wrote:
> On 04/03, Cosmin Ratiu wrote:
> > On Wed, 2025-04-02 at 16:20 -0700, Stanislav Fomichev wrote:
> > > On 04/02, Cosmin Ratiu wrote:
> > > > Hi,
> > > > 
> > > > Not sure if it's reported already, but I encountered a bug while
> > > > testing with the new locking scheme.
> > > > This is the call trace:
> > > > 
> > > > [ 3454.975672] WARNING: CPU: 1 PID: 58237 at
> > > > ./include/net/netdev_lock.h:54 ipv6_add_dev+0x370/0x620
> > > > [ 3455.008776]  ? ipv6_add_dev+0x370/0x620
> > > > [ 3455.010097]  ipv6_find_idev+0x96/0xe0
> > > > [ 3455.010725]  addrconf_add_dev+0x1e/0xa0
> > > > [ 3455.011382]  addrconf_init_auto_addrs+0xb0/0x720
> > > > [ 3455.013537]  addrconf_notify+0x35f/0x8d0
> > > > [ 3455.014214]  notifier_call_chain+0x38/0xf0
> > > > [ 3455.014903]  netdev_state_change+0x65/0x90
> > > > [ 3455.015586]  linkwatch_do_dev+0x5a/0x70
> > > > [ 3455.016238]  rtnl_getlink+0x241/0x3e0
> > > > [ 3455.019046]  rtnetlink_rcv_msg+0x177/0x5e0
> > > > 
> > > > The call chain is rtnl_getlink -> linkwatch_sync_dev ->
> > > > linkwatch_do_dev -> netdev_state_change -> ...
> > > > 
> > > > Nothing on this path acquires the netdev lock, resulting in a
> > > > warning.
> > > > Perhaps rtnl_getlink should acquire it, in addition to the RTNL
> > > > already
> > > > held by rtnetlink_rcv_msg?
> > > > 
> > > > The same thing can be seen from the regular linkwatch wq:
> > > > 
> > > > [ 3456.637014] WARNING: CPU: 16 PID: 83257 at
> > > > ./include/net/netdev_lock.h:54 ipv6_add_dev+0x370/0x620
> > > > [ 3456.655305] Call Trace:
> > > > [ 3456.655610]  <TASK>
> > > > [ 3456.655890]  ? __warn+0x89/0x1b0
> > > > [ 3456.656261]  ? ipv6_add_dev+0x370/0x620
> > > > [ 3456.660039]  ipv6_find_idev+0x96/0xe0
> > > > [ 3456.660445]  addrconf_add_dev+0x1e/0xa0
> > > > [ 3456.660861]  addrconf_init_auto_addrs+0xb0/0x720
> > > > [ 3456.661803]  addrconf_notify+0x35f/0x8d0
> > > > [ 3456.662236]  notifier_call_chain+0x38/0xf0
> > > > [ 3456.662676]  netdev_state_change+0x65/0x90
> > > > [ 3456.663112]  linkwatch_do_dev+0x5a/0x70
> > > > [ 3456.663529]  __linkwatch_run_queue+0xeb/0x200
> > > > [ 3456.663990]  linkwatch_event+0x21/0x30
> > > > [ 3456.664399]  process_one_work+0x211/0x610
> > > > [ 3456.664828]  worker_thread+0x1cc/0x380
> > > > [ 3456.665691]  kthread+0xf4/0x210
> > > > 
> > > > In this case, __linkwatch_run_queue seems like a good place to grab
> > > > a
> > > > device lock before calling linkwatch_do_dev.
> > > 
> > > Thanks for the report! What about linkwatch_sync_dev in
> > > netdev_run_todo
> > > and carrier_show? Should probably also need to be wrapped?
> > 
> > Done, here's the patch I'm testing with which works for all tests I
> > could get my hands on. Will you officially propose it (maybe in a
> > slightly different form) please?
> 
> I'm thinking maybe we should push down the locking a bit? To the
> level of netdev_state_change. Since, in theory, every NETDEV_CHANGE
> can reach addrconf_notify. I was playing with the patch below,
> but I think the ethtool will lockup, so I need to fix that at least...
> Let me spend a bit more time today chasing down the callers...

Ok, so the final patch is gonna look like this. LMK if you can give it
a test on your side.

diff --git a/Documentation/networking/netdevices.rst b/Documentation/networking/netdevices.rst
index 6c2d8945f597..eab601ab2db0 100644
--- a/Documentation/networking/netdevices.rst
+++ b/Documentation/networking/netdevices.rst
@@ -338,10 +338,11 @@ operations directly under the netdev instance lock.
 Devices drivers are encouraged to rely on the instance lock where possible.
 
 For the (mostly software) drivers that need to interact with the core stack,
-there are two sets of interfaces: ``dev_xxx`` and ``netif_xxx`` (e.g.,
-``dev_set_mtu`` and ``netif_set_mtu``). The ``dev_xxx`` functions handle
-acquiring the instance lock themselves, while the ``netif_xxx`` functions
-assume that the driver has already acquired the instance lock.
+there are two sets of interfaces: ``dev_xxx``/``netdev_xxx`` and ``netif_xxx``
+(e.g., ``dev_set_mtu`` and ``netif_set_mtu``). The ``dev_xxx``/``netdev_xxx``
+functions handle acquiring the instance lock themselves, while the
+``netif_xxx`` functions assume that the driver has already acquired
+the instance lock.
 
 Notifiers and netdev instance lock
 ==================================
@@ -354,6 +355,7 @@ For devices with locked ops, currently only the following notifiers are
 running under the lock:
 * ``NETDEV_REGISTER``
 * ``NETDEV_UP``
+* ``NETDEV_CHANGE``
 
 The following notifiers are running without the lock:
 * ``NETDEV_UNREGISTER``
diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h
index 9fb03a292817..b3a162105129 100644
--- a/include/linux/netdevice.h
+++ b/include/linux/netdevice.h
@@ -4430,6 +4430,7 @@ void linkwatch_fire_event(struct net_device *dev);
  * pending work list (if queued).
  */
 void linkwatch_sync_dev(struct net_device *dev);
+void __linkwatch_sync_dev(struct net_device *dev);
 
 /**
  *	netif_carrier_ok - test if carrier present
@@ -4975,6 +4976,7 @@ void dev_set_rx_mode(struct net_device *dev);
 int dev_set_promiscuity(struct net_device *dev, int inc);
 int netif_set_allmulti(struct net_device *dev, int inc, bool notify);
 int dev_set_allmulti(struct net_device *dev, int inc);
+void netif_state_change(struct net_device *dev);
 void netdev_state_change(struct net_device *dev);
 void __netdev_notify_peers(struct net_device *dev);
 void netdev_notify_peers(struct net_device *dev);
diff --git a/include/linux/rtnetlink.h b/include/linux/rtnetlink.h
index ccaaf4c7d5f6..ea39dd23a197 100644
--- a/include/linux/rtnetlink.h
+++ b/include/linux/rtnetlink.h
@@ -240,6 +240,6 @@ rtnl_notify_needed(const struct net *net, u16 nlflags, u32 group)
 	return (nlflags & NLM_F_ECHO) || rtnl_has_listeners(net, group);
 }
 
-void netdev_set_operstate(struct net_device *dev, int newstate);
+void netif_set_operstate(struct net_device *dev, int newstate);
 
 #endif	/* __LINUX_RTNETLINK_H */
diff --git a/net/core/dev.c b/net/core/dev.c
index 87cba93fa59f..d4a5c07a0d73 100644
--- a/net/core/dev.c
+++ b/net/core/dev.c
@@ -1567,15 +1567,7 @@ void netdev_features_change(struct net_device *dev)
 }
 EXPORT_SYMBOL(netdev_features_change);
 
-/**
- *	netdev_state_change - device changes state
- *	@dev: device to cause notification
- *
- *	Called to indicate a device has changed state. This function calls
- *	the notifier chains for netdev_chain and sends a NEWLINK message
- *	to the routing socket.
- */
-void netdev_state_change(struct net_device *dev)
+void netif_state_change(struct net_device *dev)
 {
 	if (dev->flags & IFF_UP) {
 		struct netdev_notifier_change_info change_info = {
@@ -1587,7 +1579,6 @@ void netdev_state_change(struct net_device *dev)
 		rtmsg_ifinfo(RTM_NEWLINK, dev, 0, GFP_KERNEL, 0, NULL);
 	}
 }
-EXPORT_SYMBOL(netdev_state_change);
 
 /**
  * __netdev_notify_peers - notify network peers about existence of @dev,
diff --git a/net/core/dev_api.c b/net/core/dev_api.c
index 90bafb0b1b8c..6c6ca15ef2a3 100644
--- a/net/core/dev_api.c
+++ b/net/core/dev_api.c
@@ -327,3 +327,19 @@ int dev_xdp_propagate(struct net_device *dev, struct netdev_bpf *bpf)
 	return ret;
 }
 EXPORT_SYMBOL_GPL(dev_xdp_propagate);
+
+/**
+ *	netdev_state_change - device changes state
+ *	@dev: device to cause notification
+ *
+ *	Called to indicate a device has changed state. This function calls
+ *	the notifier chains for netdev_chain and sends a NEWLINK message
+ *	to the routing socket.
+ */
+void netdev_state_change(struct net_device *dev)
+{
+	netdev_lock_ops(dev);
+	netif_state_change(dev);
+	netdev_unlock_ops(dev);
+}
+EXPORT_SYMBOL(netdev_state_change);
diff --git a/net/core/link_watch.c b/net/core/link_watch.c
index cb04ef2b9807..864f3bbc3a4c 100644
--- a/net/core/link_watch.c
+++ b/net/core/link_watch.c
@@ -183,7 +183,7 @@ static void linkwatch_do_dev(struct net_device *dev)
 		else
 			dev_deactivate(dev);
 
-		netdev_state_change(dev);
+		netif_state_change(dev);
 	}
 	/* Note: our callers are responsible for calling netdev_tracker_free().
 	 * This is the reason we use __dev_put() instead of dev_put().
@@ -240,7 +240,9 @@ static void __linkwatch_run_queue(int urgent_only)
 		 */
 		netdev_tracker_free(dev, &dev->linkwatch_dev_tracker);
 		spin_unlock_irq(&lweventlist_lock);
+		netdev_lock_ops(dev);
 		linkwatch_do_dev(dev);
+		netdev_unlock_ops(dev);
 		do_dev--;
 		spin_lock_irq(&lweventlist_lock);
 	}
@@ -253,25 +255,41 @@ static void __linkwatch_run_queue(int urgent_only)
 	spin_unlock_irq(&lweventlist_lock);
 }
 
-void linkwatch_sync_dev(struct net_device *dev)
+static bool linkwatch_clean_dev(struct net_device *dev)
 {
 	unsigned long flags;
-	int clean = 0;
+	bool clean = false;
 
 	spin_lock_irqsave(&lweventlist_lock, flags);
 	if (!list_empty(&dev->link_watch_list)) {
 		list_del_init(&dev->link_watch_list);
-		clean = 1;
+		clean = true;
 		/* We must release netdev tracker under
 		 * the spinlock protection.
 		 */
 		netdev_tracker_free(dev, &dev->linkwatch_dev_tracker);
 	}
 	spin_unlock_irqrestore(&lweventlist_lock, flags);
-	if (clean)
+
+	return clean;
+}
+
+void __linkwatch_sync_dev(struct net_device *dev)
+{
+	netdev_ops_assert_locked(dev);
+
+	if (linkwatch_clean_dev(dev))
 		linkwatch_do_dev(dev);
 }
 
+void linkwatch_sync_dev(struct net_device *dev)
+{
+	if (linkwatch_clean_dev(dev)) {
+		netdev_lock_ops(dev);
+		linkwatch_do_dev(dev);
+		netdev_unlock_ops(dev);
+	}
+}
 
 /* Must be called with the rtnl semaphore held */
 void linkwatch_run_queue(void)
diff --git a/net/core/lock_debug.c b/net/core/lock_debug.c
index 72e522a68775..c442bf52dbaf 100644
--- a/net/core/lock_debug.c
+++ b/net/core/lock_debug.c
@@ -20,11 +20,11 @@ int netdev_debug_event(struct notifier_block *nb, unsigned long event,
 	switch (cmd) {
 	case NETDEV_REGISTER:
 	case NETDEV_UP:
+	case NETDEV_CHANGE:
 		netdev_ops_assert_locked(dev);
 		fallthrough;
 	case NETDEV_DOWN:
 	case NETDEV_REBOOT:
-	case NETDEV_CHANGE:
 	case NETDEV_UNREGISTER:
 	case NETDEV_CHANGEMTU:
 	case NETDEV_CHANGEADDR:
diff --git a/net/core/rtnetlink.c b/net/core/rtnetlink.c
index c23852835050..d8d03ff87a3b 100644
--- a/net/core/rtnetlink.c
+++ b/net/core/rtnetlink.c
@@ -1043,7 +1043,7 @@ int rtnl_put_cacheinfo(struct sk_buff *skb, struct dst_entry *dst, u32 id,
 }
 EXPORT_SYMBOL_GPL(rtnl_put_cacheinfo);
 
-void netdev_set_operstate(struct net_device *dev, int newstate)
+void netif_set_operstate(struct net_device *dev, int newstate)
 {
 	unsigned int old = READ_ONCE(dev->operstate);
 
@@ -1052,9 +1052,9 @@ void netdev_set_operstate(struct net_device *dev, int newstate)
 			return;
 	} while (!try_cmpxchg(&dev->operstate, &old, newstate));
 
-	netdev_state_change(dev);
+	netif_state_change(dev);
 }
-EXPORT_SYMBOL(netdev_set_operstate);
+EXPORT_SYMBOL(netif_set_operstate);
 
 static void set_operstate(struct net_device *dev, unsigned char transition)
 {
@@ -1080,7 +1080,7 @@ static void set_operstate(struct net_device *dev, unsigned char transition)
 		break;
 	}
 
-	netdev_set_operstate(dev, operstate);
+	netif_set_operstate(dev, operstate);
 }
 
 static unsigned int rtnl_dev_get_flags(const struct net_device *dev)
@@ -3396,7 +3396,7 @@ static int do_setlink(const struct sk_buff *skb, struct net_device *dev,
 errout:
 	if (status & DO_SETLINK_MODIFIED) {
 		if ((status & DO_SETLINK_NOTIFY) == DO_SETLINK_NOTIFY)
-			netdev_state_change(dev);
+			netif_state_change(dev);
 
 		if (err < 0)
 			net_warn_ratelimited("A link change request failed with some changes committed already. Interface %s may have been left with an inconsistent configuration, please check.\n",
@@ -3676,8 +3676,11 @@ struct net_device *rtnl_create_link(struct net *net, const char *ifname,
 				nla_len(tb[IFLA_BROADCAST]));
 	if (tb[IFLA_TXQLEN])
 		dev->tx_queue_len = nla_get_u32(tb[IFLA_TXQLEN]);
-	if (tb[IFLA_OPERSTATE])
+	if (tb[IFLA_OPERSTATE]) {
+		netdev_lock_ops(dev);
 		set_operstate(dev, nla_get_u8(tb[IFLA_OPERSTATE]));
+		netdev_unlock_ops(dev);
+	}
 	if (tb[IFLA_LINKMODE])
 		dev->link_mode = nla_get_u8(tb[IFLA_LINKMODE]);
 	if (tb[IFLA_GROUP])
diff --git a/net/ethtool/ioctl.c b/net/ethtool/ioctl.c
index 221639407c72..8262cc10f98d 100644
--- a/net/ethtool/ioctl.c
+++ b/net/ethtool/ioctl.c
@@ -60,7 +60,7 @@ static struct devlink *netdev_to_devlink_get(struct net_device *dev)
 u32 ethtool_op_get_link(struct net_device *dev)
 {
 	/* Synchronize carrier state with link watch, see also rtnl_getlink() */
-	linkwatch_sync_dev(dev);
+	__linkwatch_sync_dev(dev);
 
 	return netif_carrier_ok(dev) ? 1 : 0;
 }
diff --git a/net/hsr/hsr_device.c b/net/hsr/hsr_device.c
index 439cfb7ad5d1..1b1b700ec05e 100644
--- a/net/hsr/hsr_device.c
+++ b/net/hsr/hsr_device.c
@@ -33,14 +33,14 @@ static void hsr_set_operstate(struct hsr_port *master, bool has_carrier)
 	struct net_device *dev = master->dev;
 
 	if (!is_admin_up(dev)) {
-		netdev_set_operstate(dev, IF_OPER_DOWN);
+		netif_set_operstate(dev, IF_OPER_DOWN);
 		return;
 	}
 
 	if (has_carrier)
-		netdev_set_operstate(dev, IF_OPER_UP);
+		netif_set_operstate(dev, IF_OPER_UP);
 	else
-		netdev_set_operstate(dev, IF_OPER_LOWERLAYERDOWN);
+		netif_set_operstate(dev, IF_OPER_LOWERLAYERDOWN);
 }
 
 static bool hsr_check_carrier(struct hsr_port *master)
Cosmin Ratiu April 4, 2025, 2:03 p.m. UTC | #5
On Thu, 2025-04-03 at 15:04 -0700, Stanislav Fomichev wrote:
> Ok, so the final patch is gonna look like this. LMK if you can give
> it a test on your side.

I put this through regressions with my pending mlx5 locking patches and
got all green.
Thanks for fixing this new issue!

Cosmin.
diff mbox series

Patch

diff --git a/net/core/link_watch.c b/net/core/link_watch.c
index cb04ef2b9807..002f18b11d85 100644
--- a/net/core/link_watch.c
+++ b/net/core/link_watch.c
@@ -240,7 +240,9 @@  static void __linkwatch_run_queue(int urgent_only)
 		 */
 		netdev_tracker_free(dev, &dev->linkwatch_dev_tracker);
 		spin_unlock_irq(&lweventlist_lock);
+		netdev_lock_ops(dev);
 		linkwatch_do_dev(dev);
+		netdev_unlock_ops(dev);
 		do_dev--;
 		spin_lock_irq(&lweventlist_lock);
 	}
diff --git a/net/core/rtnetlink.c b/net/core/rtnetlink.c
index a2736e434712..c77b37d897eb 100644
--- a/net/core/rtnetlink.c
+++ b/net/core/rtnetlink.c
@@ -4175,7 +4175,9 @@  static int rtnl_getlink(struct sk_buff *skb,
struct nlmsghdr *nlh,
 	 * only TX if link watch work has run, but without this we'd
 	 * already report carrier on, even if it doesn't work yet.
 	 */
+	netdev_lock_ops(dev);
 	linkwatch_sync_dev(dev);
+	netdev_unlock_ops(dev);
 
 	err = rtnl_fill_ifinfo(nskb, dev, net,