Message ID | 20230406233058.780721-1-vladimir.oltean@nxp.com (mailing list archive) |
---|---|
State | Superseded |
Delegated to: | Netdev Maintainers |
Headers | show |
Series | [RFC,net] net: ipv4/ipv6 addrconf: call igmp{,6}_group_dropped() while dev is still up | expand |
Context | Check | Description |
---|---|---|
netdev/series_format | success | Single patches do not need cover letters |
netdev/tree_selection | success | Clearly marked for net |
netdev/fixes_present | success | Fixes tag present in non-next series |
netdev/header_inline | success | No static functions without inline keyword in header files |
netdev/build_32bit | success | Errors and warnings before: 20 this patch: 20 |
netdev/cc_maintainers | success | CCed 7 of 7 maintainers |
netdev/build_clang | success | Errors and warnings before: 18 this patch: 18 |
netdev/verify_signedoff | success | Signed-off-by tag matches author and committer |
netdev/deprecated_api | success | None detected |
netdev/check_selftest | success | No net selftest shell script |
netdev/verify_fixes | success | Fixes tag looks correct |
netdev/build_allmodconfig_warn | success | Errors and warnings before: 20 this patch: 20 |
netdev/checkpatch | warning | WARNING: line length of 81 exceeds 80 columns |
netdev/kdoc | success | Errors and warnings before: 0 this patch: 0 |
netdev/source_inline | success | Was 0 now: 0 |
[ cc Ido in case such a change has implications to mlxsw ] On 4/6/23 5:30 PM, Vladimir Oltean wrote: > ipv4 devinet calls ip_mc_down(), and ipv6 calls addrconf_ifdown(), and > both of these eventually result in calls to dev_mc_del(), either through > igmp_group_dropped() or igmp6_group_dropped(). > > The problem is that dev_mc_del() does call __dev_set_rx_mode(), but this > will not propagate all the way to the ndo_set_rx_mode() of the device, > because of this check: > > /* dev_open will call this function so the list will stay sane. */ > if (!(dev->flags&IFF_UP)) > return; > > and the NETDEV_DOWN notifier is emitted while the interface is already > down. OTOH we have NETDEV_GOING_DOWN which is emitted a bit earlier - > see: > > dev_close_many() > -> __dev_close_many() > -> call_netdevice_notifiers(NETDEV_GOING_DOWN, dev); > -> dev->flags &= ~IFF_UP; > -> call_netdevice_notifiers(NETDEV_DOWN, dev); > > Normally this oversight is easy to miss, because the addresses aren't > lost, just not synced to the device until the next up event. > > DSA does some processing in its dsa_slave_set_rx_mode(), and assumes > that all addresses that were synced are also unsynced by the time the > device is unregistered. Due to that assumption not being satisfied, > the WARN_ON(!list_empty(&dp->mdbs)); from dsa_switch_release_ports() > triggers, and we leak memory corresponding to the multicast addresses > that were never synced. > > Minimal reproducer: > ip link set swp0 up > ip link set swp0 down > echo 0000:00:00.5 > /sys/bus/pci/drivers/mscc_felix/unbind > > The proposal is to respond to that slightly earlier notifier with the > IGMP address deletion, so that the ndo_set_rx_mode() of the device does > actually get called. I am not familiar with the details of these layers, > but it appeared to me that NETDEV_DOWN needed to be replaced everywhere > with NETDEV_GOING_DOWN, so I blindly did that and it worked. > > Fixes: 5e8a1e03aa4d ("net: dsa: install secondary unicast and multicast addresses as host FDB/MDB") > Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> > --- > Obviously DSA is not the only affected driver, but the extent to which > other drivers are impacted is not obvious to me. At least in DSA, there > is a WARN_ON() and a memory leak, so this is why I chose that Fixes tag. > > net/ipv4/devinet.c | 7 ++++--- > net/ipv6/addrconf.c | 12 ++++++------ > 2 files changed, 10 insertions(+), 9 deletions(-) > > diff --git a/net/ipv4/devinet.c b/net/ipv4/devinet.c > index 5deac0517ef7..95690d16d651 100644 > --- a/net/ipv4/devinet.c > +++ b/net/ipv4/devinet.c > @@ -392,7 +392,8 @@ static void __inet_del_ifa(struct in_device *in_dev, > > rtmsg_ifa(RTM_DELADDR, ifa, nlh, portid); > blocking_notifier_call_chain(&inetaddr_chain, > - NETDEV_DOWN, ifa); > + NETDEV_GOING_DOWN, > + ifa); > inet_free_ifa(ifa); > } else { > promote = ifa; > @@ -429,7 +430,7 @@ static void __inet_del_ifa(struct in_device *in_dev, > So that, this order is correct. > */ > rtmsg_ifa(RTM_DELADDR, ifa1, nlh, portid); > - blocking_notifier_call_chain(&inetaddr_chain, NETDEV_DOWN, ifa1); > + blocking_notifier_call_chain(&inetaddr_chain, NETDEV_GOING_DOWN, ifa1); > > if (promote) { > struct in_ifaddr *next_sec; > @@ -1588,7 +1589,7 @@ static int inetdev_event(struct notifier_block *this, unsigned long event, > /* Send gratuitous ARP to notify of link change */ > inetdev_send_gratuitous_arp(dev, in_dev); > break; > - case NETDEV_DOWN: > + case NETDEV_GOING_DOWN: > ip_mc_down(in_dev); > break; > case NETDEV_PRE_TYPE_CHANGE: > diff --git a/net/ipv6/addrconf.c b/net/ipv6/addrconf.c > index 3797917237d0..9e484f829f1c 100644 > --- a/net/ipv6/addrconf.c > +++ b/net/ipv6/addrconf.c > @@ -1307,7 +1307,7 @@ static void ipv6_del_addr(struct inet6_ifaddr *ifp) > > ipv6_ifa_notify(RTM_DELADDR, ifp); > > - inet6addr_notifier_call_chain(NETDEV_DOWN, ifp); > + inet6addr_notifier_call_chain(NETDEV_GOING_DOWN, ifp); > > if (action != CLEANUP_PREFIX_RT_NOP) { > cleanup_prefix_route(ifp, expires, > @@ -3670,12 +3670,12 @@ static int addrconf_notify(struct notifier_block *this, unsigned long event, > } > break; > > - case NETDEV_DOWN: > + case NETDEV_GOING_DOWN: > case NETDEV_UNREGISTER: > /* > * Remove all addresses from this interface. > */ > - addrconf_ifdown(dev, event != NETDEV_DOWN); > + addrconf_ifdown(dev, event != NETDEV_GOING_DOWN); > break; > > case NETDEV_CHANGENAME: > @@ -3741,7 +3741,7 @@ static bool addr_is_local(const struct in6_addr *addr) > > static int addrconf_ifdown(struct net_device *dev, bool unregister) > { > - unsigned long event = unregister ? NETDEV_UNREGISTER : NETDEV_DOWN; > + unsigned long event = unregister ? NETDEV_UNREGISTER : NETDEV_GOING_DOWN; > struct net *net = dev_net(dev); > struct inet6_dev *idev; > struct inet6_ifaddr *ifa; > @@ -3877,7 +3877,7 @@ static int addrconf_ifdown(struct net_device *dev, bool unregister) > > if (state != INET6_IFADDR_STATE_DEAD) { > __ipv6_ifa_notify(RTM_DELADDR, ifa); > - inet6addr_notifier_call_chain(NETDEV_DOWN, ifa); > + inet6addr_notifier_call_chain(NETDEV_GOING_DOWN, ifa); > } else { > if (idev->cnf.forwarding) > addrconf_leave_anycast(ifa); > @@ -6252,7 +6252,7 @@ static void dev_disable_change(struct inet6_dev *idev) > > netdev_notifier_info_init(&info, idev->dev); > if (idev->cnf.disable_ipv6) > - addrconf_notify(NULL, NETDEV_DOWN, &info); > + addrconf_notify(NULL, NETDEV_GOING_DOWN, &info); > else > addrconf_notify(NULL, NETDEV_UP, &info); > }
On Sun, Apr 09, 2023 at 08:08:01PM -0600, David Ahern wrote: > [ cc Ido in case such a change has implications to mlxsw ] Thanks > > On 4/6/23 5:30 PM, Vladimir Oltean wrote: > > ipv4 devinet calls ip_mc_down(), and ipv6 calls addrconf_ifdown(), and > > both of these eventually result in calls to dev_mc_del(), either through > > igmp_group_dropped() or igmp6_group_dropped(). > > > > The problem is that dev_mc_del() does call __dev_set_rx_mode(), but this > > will not propagate all the way to the ndo_set_rx_mode() of the device, > > because of this check: > > > > /* dev_open will call this function so the list will stay sane. */ > > if (!(dev->flags&IFF_UP)) > > return; > > > > and the NETDEV_DOWN notifier is emitted while the interface is already > > down. OTOH we have NETDEV_GOING_DOWN which is emitted a bit earlier - > > see: > > > > dev_close_many() > > -> __dev_close_many() > > -> call_netdevice_notifiers(NETDEV_GOING_DOWN, dev); > > -> dev->flags &= ~IFF_UP; > > -> call_netdevice_notifiers(NETDEV_DOWN, dev); > > > > Normally this oversight is easy to miss, because the addresses aren't > > lost, just not synced to the device until the next up event. > > > > DSA does some processing in its dsa_slave_set_rx_mode(), and assumes > > that all addresses that were synced are also unsynced by the time the > > device is unregistered. Due to that assumption not being satisfied, > > the WARN_ON(!list_empty(&dp->mdbs)); from dsa_switch_release_ports() > > triggers, and we leak memory corresponding to the multicast addresses > > that were never synced. > > > > Minimal reproducer: > > ip link set swp0 up > > ip link set swp0 down > > echo 0000:00:00.5 > /sys/bus/pci/drivers/mscc_felix/unbind > > > > The proposal is to respond to that slightly earlier notifier with the > > IGMP address deletion, so that the ndo_set_rx_mode() of the device does > > actually get called. I am not familiar with the details of these layers, > > but it appeared to me that NETDEV_DOWN needed to be replaced everywhere > > with NETDEV_GOING_DOWN, so I blindly did that and it worked. I think there is a confusion here between the netdev notifier and inetaddr notifiers. They all use "NETDEV_DOWN", but in the inetaddr notifiers it means that an address is being deleted. Changing the event to "NETDEV_GOING_DOWN" is going to break a lot of users since none of the inetaddr listeners respond to "NETDEV_GOING_DOWN". IOW, I believe you only need this change for IPv4 (and similarly for IPv6): diff --git a/net/ipv4/devinet.c b/net/ipv4/devinet.c index 5deac0517ef7..679c9819f25b 100644 --- a/net/ipv4/devinet.c +++ b/net/ipv4/devinet.c @@ -1588,7 +1588,7 @@ static int inetdev_event(struct notifier_block *this, unsigned long event, /* Send gratuitous ARP to notify of link change */ inetdev_send_gratuitous_arp(dev, in_dev); break; - case NETDEV_DOWN: + case NETDEV_GOING_DOWN: ip_mc_down(in_dev); break; case NETDEV_PRE_TYPE_CHANGE:
Hi Ido, On Mon, Apr 10, 2023 at 10:55:02AM +0300, Ido Schimmel wrote: > > > The proposal is to respond to that slightly earlier notifier with the > > > IGMP address deletion, so that the ndo_set_rx_mode() of the device does > > > actually get called. I am not familiar with the details of these layers, > > > but it appeared to me that NETDEV_DOWN needed to be replaced everywhere > > > with NETDEV_GOING_DOWN, so I blindly did that and it worked. > > I think there is a confusion here between the netdev notifier and > inetaddr notifiers. They all use "NETDEV_DOWN", but in the inetaddr > notifiers it means that an address is being deleted. Changing the event > to "NETDEV_GOING_DOWN" is going to break a lot of users since none of > the inetaddr listeners respond to "NETDEV_GOING_DOWN". > > IOW, I believe you only need this change for IPv4 (and similarly for > IPv6): > > diff --git a/net/ipv4/devinet.c b/net/ipv4/devinet.c > index 5deac0517ef7..679c9819f25b 100644 > --- a/net/ipv4/devinet.c > +++ b/net/ipv4/devinet.c > @@ -1588,7 +1588,7 @@ static int inetdev_event(struct notifier_block *this, unsigned long event, > /* Send gratuitous ARP to notify of link change */ > inetdev_send_gratuitous_arp(dev, in_dev); > break; > - case NETDEV_DOWN: > + case NETDEV_GOING_DOWN: > ip_mc_down(in_dev); > break; > case NETDEV_PRE_TYPE_CHANGE: You are correct, only that portion is needed for IPv4. When I open my eyes, I see it too :) Although it would have been a lot less confusing for someone looking at the code for the first time if the inetaddr and inet6addr notifiers did not use events from the same NETDEV_ namespace as the netdev notifiers... So, how do you think I should proceed with this? One patch or two (for IPv4 and IPv6)? Is the Fixes: tag ok?
On Mon, Apr 10, 2023 at 01:09:58PM +0300, Vladimir Oltean wrote: > So, how do you think I should proceed with this? One patch or two > (for IPv4 and IPv6)? Is the Fixes: tag ok? Fixes tag looks OK and one patch is fine by me. However, given the problem is the check you mentioned in __dev_set_rx_mode(), wouldn't it be better to simply remove it? From the comment above this check it seems to assume that there is no need to update the Rx filters of the device when it's down because they will be synced when it's put back up, but it fails to consider the case where one wants to clear the filters of the device as part of device dismantle.
On Mon, Apr 10, 2023 at 02:43:43PM +0300, Ido Schimmel wrote: > On Mon, Apr 10, 2023 at 01:09:58PM +0300, Vladimir Oltean wrote: > > So, how do you think I should proceed with this? One patch or two > > (for IPv4 and IPv6)? Is the Fixes: tag ok? > > Fixes tag looks OK and one patch is fine by me. However, given the > problem is the check you mentioned in __dev_set_rx_mode(), wouldn't it > be better to simply remove it? From the comment above this check it > seems to assume that there is no need to update the Rx filters of the > device when it's down because they will be synced when it's put back up, > but it fails to consider the case where one wants to clear the filters > of the device as part of device dismantle. Yes, I can do this.
diff --git a/net/ipv4/devinet.c b/net/ipv4/devinet.c index 5deac0517ef7..95690d16d651 100644 --- a/net/ipv4/devinet.c +++ b/net/ipv4/devinet.c @@ -392,7 +392,8 @@ static void __inet_del_ifa(struct in_device *in_dev, rtmsg_ifa(RTM_DELADDR, ifa, nlh, portid); blocking_notifier_call_chain(&inetaddr_chain, - NETDEV_DOWN, ifa); + NETDEV_GOING_DOWN, + ifa); inet_free_ifa(ifa); } else { promote = ifa; @@ -429,7 +430,7 @@ static void __inet_del_ifa(struct in_device *in_dev, So that, this order is correct. */ rtmsg_ifa(RTM_DELADDR, ifa1, nlh, portid); - blocking_notifier_call_chain(&inetaddr_chain, NETDEV_DOWN, ifa1); + blocking_notifier_call_chain(&inetaddr_chain, NETDEV_GOING_DOWN, ifa1); if (promote) { struct in_ifaddr *next_sec; @@ -1588,7 +1589,7 @@ static int inetdev_event(struct notifier_block *this, unsigned long event, /* Send gratuitous ARP to notify of link change */ inetdev_send_gratuitous_arp(dev, in_dev); break; - case NETDEV_DOWN: + case NETDEV_GOING_DOWN: ip_mc_down(in_dev); break; case NETDEV_PRE_TYPE_CHANGE: diff --git a/net/ipv6/addrconf.c b/net/ipv6/addrconf.c index 3797917237d0..9e484f829f1c 100644 --- a/net/ipv6/addrconf.c +++ b/net/ipv6/addrconf.c @@ -1307,7 +1307,7 @@ static void ipv6_del_addr(struct inet6_ifaddr *ifp) ipv6_ifa_notify(RTM_DELADDR, ifp); - inet6addr_notifier_call_chain(NETDEV_DOWN, ifp); + inet6addr_notifier_call_chain(NETDEV_GOING_DOWN, ifp); if (action != CLEANUP_PREFIX_RT_NOP) { cleanup_prefix_route(ifp, expires, @@ -3670,12 +3670,12 @@ static int addrconf_notify(struct notifier_block *this, unsigned long event, } break; - case NETDEV_DOWN: + case NETDEV_GOING_DOWN: case NETDEV_UNREGISTER: /* * Remove all addresses from this interface. */ - addrconf_ifdown(dev, event != NETDEV_DOWN); + addrconf_ifdown(dev, event != NETDEV_GOING_DOWN); break; case NETDEV_CHANGENAME: @@ -3741,7 +3741,7 @@ static bool addr_is_local(const struct in6_addr *addr) static int addrconf_ifdown(struct net_device *dev, bool unregister) { - unsigned long event = unregister ? NETDEV_UNREGISTER : NETDEV_DOWN; + unsigned long event = unregister ? NETDEV_UNREGISTER : NETDEV_GOING_DOWN; struct net *net = dev_net(dev); struct inet6_dev *idev; struct inet6_ifaddr *ifa; @@ -3877,7 +3877,7 @@ static int addrconf_ifdown(struct net_device *dev, bool unregister) if (state != INET6_IFADDR_STATE_DEAD) { __ipv6_ifa_notify(RTM_DELADDR, ifa); - inet6addr_notifier_call_chain(NETDEV_DOWN, ifa); + inet6addr_notifier_call_chain(NETDEV_GOING_DOWN, ifa); } else { if (idev->cnf.forwarding) addrconf_leave_anycast(ifa); @@ -6252,7 +6252,7 @@ static void dev_disable_change(struct inet6_dev *idev) netdev_notifier_info_init(&info, idev->dev); if (idev->cnf.disable_ipv6) - addrconf_notify(NULL, NETDEV_DOWN, &info); + addrconf_notify(NULL, NETDEV_GOING_DOWN, &info); else addrconf_notify(NULL, NETDEV_UP, &info); }
ipv4 devinet calls ip_mc_down(), and ipv6 calls addrconf_ifdown(), and both of these eventually result in calls to dev_mc_del(), either through igmp_group_dropped() or igmp6_group_dropped(). The problem is that dev_mc_del() does call __dev_set_rx_mode(), but this will not propagate all the way to the ndo_set_rx_mode() of the device, because of this check: /* dev_open will call this function so the list will stay sane. */ if (!(dev->flags&IFF_UP)) return; and the NETDEV_DOWN notifier is emitted while the interface is already down. OTOH we have NETDEV_GOING_DOWN which is emitted a bit earlier - see: dev_close_many() -> __dev_close_many() -> call_netdevice_notifiers(NETDEV_GOING_DOWN, dev); -> dev->flags &= ~IFF_UP; -> call_netdevice_notifiers(NETDEV_DOWN, dev); Normally this oversight is easy to miss, because the addresses aren't lost, just not synced to the device until the next up event. DSA does some processing in its dsa_slave_set_rx_mode(), and assumes that all addresses that were synced are also unsynced by the time the device is unregistered. Due to that assumption not being satisfied, the WARN_ON(!list_empty(&dp->mdbs)); from dsa_switch_release_ports() triggers, and we leak memory corresponding to the multicast addresses that were never synced. Minimal reproducer: ip link set swp0 up ip link set swp0 down echo 0000:00:00.5 > /sys/bus/pci/drivers/mscc_felix/unbind The proposal is to respond to that slightly earlier notifier with the IGMP address deletion, so that the ndo_set_rx_mode() of the device does actually get called. I am not familiar with the details of these layers, but it appeared to me that NETDEV_DOWN needed to be replaced everywhere with NETDEV_GOING_DOWN, so I blindly did that and it worked. Fixes: 5e8a1e03aa4d ("net: dsa: install secondary unicast and multicast addresses as host FDB/MDB") Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> --- Obviously DSA is not the only affected driver, but the extent to which other drivers are impacted is not obvious to me. At least in DSA, there is a WARN_ON() and a memory leak, so this is why I chose that Fixes tag. net/ipv4/devinet.c | 7 ++++--- net/ipv6/addrconf.c | 12 ++++++------ 2 files changed, 10 insertions(+), 9 deletions(-)