Message ID | 20230725102137.299305-1-liuhangbin@gmail.com (mailing list archive) |
---|---|
State | Changes Requested |
Delegated to: | Netdev Maintainers |
Headers | show |
Series | [PATCHv4,net] ipv6: do not match device when remove source route | expand |
On Tue, 25 Jul 2023 18:21:37 +0800 Hangbin Liu wrote: > After deleting an IPv6 address on an interface and cleaning up the > related preferred source entries, it is important to ensure that all > routes associated with the deleted address are properly cleared. The > current implementation of rt6_remove_prefsrc() only checks the preferred > source addresses bound to the current device. However, there may be > routes that are bound to other devices but still utilize the same > preferred source address. David, Ido, where do you stand on this patch? The discussion on v3 continued after this was posted and we got no review tags...
On Tue, Jul 25, 2023 at 06:21:37PM +0800, Hangbin Liu wrote: > diff --git a/net/ipv6/route.c b/net/ipv6/route.c > index 64e873f5895f..44e980109e30 100644 > --- a/net/ipv6/route.c > +++ b/net/ipv6/route.c > @@ -4590,10 +4590,10 @@ static int fib6_remove_prefsrc(struct fib6_info *rt, void *arg) > struct net_device *dev = ((struct arg_dev_net_ip *)arg)->dev; > struct net *net = ((struct arg_dev_net_ip *)arg)->net; > struct in6_addr *addr = ((struct arg_dev_net_ip *)arg)->addr; > + u32 tb6_id = l3mdev_fib_table(dev) ? : RT_TABLE_MAIN; > > - if (!rt->nh && > - ((void *)rt->fib6_nh->fib_nh_dev == dev || !dev) && > - rt != net->ipv6.fib6_null_entry && > + if (rt != net->ipv6.fib6_null_entry && > + rt->fib6_table->tb6_id == tb6_id && > ipv6_addr_equal(addr, &rt->fib6_prefsrc.addr)) { > spin_lock_bh(&rt6_exception_lock); > /* remove prefsrc entry */ > @@ -4611,7 +4611,9 @@ void rt6_remove_prefsrc(struct inet6_ifaddr *ifp) > .net = net, > .addr = &ifp->addr, > }; > - fib6_clean_all(net, fib6_remove_prefsrc, &adni); > + > + if (!ipv6_chk_addr_and_flags(net, adni.addr, adni.dev, true, 0, IFA_F_TENTATIVE)) Setting 'skip_dev_check' to true is problematic since when a link-local address is deleted from a device, it should be removed as the preferred source address from routes using the device as their nexthop device, even if this address is configured on other devices. You can't configure a route with a link-local preferred source address if the address is not configured on the nexthop device: # ip link add name dummy1 up type dummy # ip link add name dummy2 up type dummy # ip -6 address add fe80::1/128 dev dummy1 # ip -6 route add 2001:db8:1::/64 dev dummy2 src fe80::1 Error: Invalid source address. # ip -6 address add fe80::1/128 dev dummy2 # ip -6 route add 2001:db8:1::/64 dev dummy2 src fe80::1 # ip -6 route show 2001:db8:1::/64 2001:db8:1::/64 dev dummy2 src fe80::1 metric 1024 pref medium But if I now delete the address from dummy2, the preferred source address still exists: # ip -6 address del fe80::1/128 dev dummy2 # ip -6 route show 2001:db8:1::/64 2001:db8:1::/64 dev dummy2 src fe80::1 metric 1024 pref medium Setting 'skip_dev_check' to false will solve this problem: # ip link add name dummy1 up type dummy # ip link add name dummy2 up type dummy # ip -6 address add fe80::1/128 dev dummy1 # ip -6 address add fe80::1/128 dev dummy2 # ip -6 route add 2001:db8:1::/64 dev dummy2 src fe80::1 # ip -6 route show 2001:db8:1::/64 2001:db8:1::/64 dev dummy2 src fe80::1 metric 1024 pref medium # ip -6 address del fe80::1/128 dev dummy2 # ip -6 route show 2001:db8:1::/64 2001:db8:1::/64 dev dummy2 metric 1024 pref medium But will create another problem where when such an address is deleted it also affects routes that shouldn't be affected: # ip link add name dummy1 up type dummy # ip link add name dummy2 up type dummy # ip -6 address add fe80::1/128 dev dummy1 # ip -6 address add fe80::1/128 dev dummy2 # ip -6 route add 2001:db8:1::/64 dev dummy1 src fe80::1 # ip -6 route add 2001:db8:2::/64 dev dummy2 src fe80::1 # ip -6 route show 2001:db8:1::/64 2001:db8:1::/64 dev dummy1 src fe80::1 metric 1024 pref medium # ip -6 route show 2001:db8:2::/64 2001:db8:2::/64 dev dummy2 src fe80::1 metric 1024 pref medium # ip -6 address del fe80::1/128 dev dummy2 # ip -6 route show 2001:db8:1::/64 2001:db8:1::/64 dev dummy1 metric 1024 pref medium # ip -6 route show 2001:db8:2::/64 2001:db8:2::/64 dev dummy2 metric 1024 pref medium So, I think we need to call ipv6_chk_addr() from rt6_remove_prefsrc() to be consistent with the addition path in ip6_route_info_create(): diff --git a/net/ipv6/route.c b/net/ipv6/route.c index 56a55585eb79..e7e2187bff0c 100644 --- a/net/ipv6/route.c +++ b/net/ipv6/route.c @@ -4591,11 +4591,13 @@ static int fib6_remove_prefsrc(struct fib6_info *rt, void *arg) struct net_device *dev = ((struct arg_dev_net_ip *)arg)->dev; struct net *net = ((struct arg_dev_net_ip *)arg)->net; struct in6_addr *addr = ((struct arg_dev_net_ip *)arg)->addr; + u32 tb6_id = l3mdev_fib_table(dev) ? : RT_TABLE_MAIN; if (!rt->nh && - ((void *)rt->fib6_nh->fib_nh_dev == dev || !dev) && rt != net->ipv6.fib6_null_entry && - ipv6_addr_equal(addr, &rt->fib6_prefsrc.addr)) { + rt->fib6_table->tb6_id == tb6_id && + ipv6_addr_equal(addr, &rt->fib6_prefsrc.addr) && + !ipv6_chk_addr(net, addr, rt->fib6_nh->fib_nh_dev, 0)) { spin_lock_bh(&rt6_exception_lock); /* remove prefsrc entry */ rt->fib6_prefsrc.plen = 0; ipv6_chk_addr() is not cheap, but it's only called for routes that match the previous criteria. With the above patch, the previous test cases now work as expected: # ip link add name dummy1 up type dummy # ip link add name dummy2 up type dummy # ip -6 address add fe80::1/128 dev dummy1 # ip -6 address add fe80::1/128 dev dummy2 # ip -6 route add 2001:db8:1::/64 dev dummy2 src fe80::1 # ip -6 route show 2001:db8:1::/64 2001:db8:1::/64 dev dummy2 src fe80::1 metric 1024 pref medium # ip -6 address del fe80::1/128 dev dummy2 # ip -6 route show 2001:db8:1::/64 2001:db8:1::/64 dev dummy2 metric 1024 pref medium # ip link add name dummy1 up type dummy # ip link add name dummy2 up type dummy # ip -6 address add fe80::1/128 dev dummy1 # ip -6 address add fe80::1/128 dev dummy2 # ip -6 route add 2001:db8:1::/64 dev dummy1 src fe80::1 # ip -6 route add 2001:db8:2::/64 dev dummy2 src fe80::1 # ip -6 route show 2001:db8:1::/64 2001:db8:1::/64 dev dummy1 src fe80::1 metric 1024 pref medium # ip -6 route show 2001:db8:2::/64 2001:db8:2::/64 dev dummy2 src fe80::1 metric 1024 pref medium # ip -6 address del fe80::1/128 dev dummy2 # ip -6 route show 2001:db8:1::/64 2001:db8:1::/64 dev dummy1 src fe80::1 metric 1024 pref medium # ip -6 route show 2001:db8:2::/64 2001:db8:2::/64 dev dummy2 metric 1024 pref medium There is however one failure in the selftest: # ./fib_tests.sh -t ipv6_del_addr IPv6 delete address route tests Regular FIB info TEST: Prefsrc removed from VRF when source address deleted [ OK ] TEST: Prefsrc in default VRF not removed [ OK ] TEST: Prefsrc removed in default VRF when source address deleted [ OK ] TEST: Prefsrc in VRF is not removed by address delete [ OK ] Identical FIB info with different table ID TEST: Prefsrc removed from VRF when source address deleted [FAIL] TEST: Prefsrc in default VRF not removed [ OK ] TEST: Prefsrc removed in default VRF when source address deleted [ OK ] TEST: Prefsrc in VRF is not removed by address delete [ OK ] Table ID 0 TEST: Prefsrc removed in default VRF when source address deleted [ OK ] Identical address on different devices TEST: Prefsrc not removed when src address exists on other device [ OK ] Tests passed: 9 Tests failed: 1 Which is basically: # ip link add name dummy1 up type dummy # ip link add name dummy2 up type dummy # ip link add red type vrf table 1111 # ip link set dev red up # ip link set dummy2 vrf red # ip -6 address add dev dummy1 2001:db8:104::12/64 # ip -6 address add dev dummy2 2001:db8:104::12/64 # ip -6 route add 2001:db8:106::/64 dev lo src 2001:db8:104::12 # ip -6 route add vrf red 2001:db8:106::/64 dev lo src 2001:db8:104::12 # ip -6 address del dev dummy2 2001:db8:104::12/64 # ip -6 route show vrf red | grep "src 2001:db8:104::12" 2001:db8:106::/64 dev lo src 2001:db8:104::12 metric 1024 pref medium I'm not sure it's realistic to expect the source address to be removed when the address is deleted from dummy2, given that user space was only able to configure the route because the address was available on dummy1 in the default vrf: # ip link add name dummy1 up type dummy # ip link add name dummy2 up type dummy # ip link add red type vrf table 1111 # ip link set dev red up # ip link set dummy2 vrf red # ip -6 address add dev dummy2 2001:db8:104::12/64 # ip -6 route add vrf red 2001:db8:106::/64 dev lo src 2001:db8:104::12 Error: Invalid source address. Anyway, given that this patch does not fix a regression and the on-going discussion around the semantics, I suggest to target future versions at net-next.
On Tue, Aug 01, 2023 at 02:51:52PM +0300, Ido Schimmel wrote: > On Tue, Jul 25, 2023 at 06:21:37PM +0800, Hangbin Liu wrote: > > diff --git a/net/ipv6/route.c b/net/ipv6/route.c > > index 64e873f5895f..44e980109e30 100644 > > --- a/net/ipv6/route.c > > +++ b/net/ipv6/route.c > > @@ -4590,10 +4590,10 @@ static int fib6_remove_prefsrc(struct fib6_info *rt, void *arg) > > struct net_device *dev = ((struct arg_dev_net_ip *)arg)->dev; > > struct net *net = ((struct arg_dev_net_ip *)arg)->net; > > struct in6_addr *addr = ((struct arg_dev_net_ip *)arg)->addr; > > + u32 tb6_id = l3mdev_fib_table(dev) ? : RT_TABLE_MAIN; > > > > - if (!rt->nh && > > - ((void *)rt->fib6_nh->fib_nh_dev == dev || !dev) && > > - rt != net->ipv6.fib6_null_entry && > > + if (rt != net->ipv6.fib6_null_entry && > > + rt->fib6_table->tb6_id == tb6_id && > > ipv6_addr_equal(addr, &rt->fib6_prefsrc.addr)) { > > spin_lock_bh(&rt6_exception_lock); > > /* remove prefsrc entry */ > > @@ -4611,7 +4611,9 @@ void rt6_remove_prefsrc(struct inet6_ifaddr *ifp) > > .net = net, > > .addr = &ifp->addr, > > }; > > - fib6_clean_all(net, fib6_remove_prefsrc, &adni); > > + > > + if (!ipv6_chk_addr_and_flags(net, adni.addr, adni.dev, true, 0, IFA_F_TENTATIVE)) > > Setting 'skip_dev_check' to true is problematic since when a link-local > address is deleted from a device, it should be removed as the preferred > source address from routes using the device as their nexthop device, > even if this address is configured on other devices. > > You can't configure a route with a link-local preferred source address > if the address is not configured on the nexthop device: Thanks for letting me know another case I'm not aware... > Setting 'skip_dev_check' to false will solve this problem: > > But will create another problem where when such an address is deleted it > also affects routes that shouldn't be affected: > > So, I think we need to call ipv6_chk_addr() from rt6_remove_prefsrc() to > be consistent with the addition path in ip6_route_info_create(): > > diff --git a/net/ipv6/route.c b/net/ipv6/route.c > index 56a55585eb79..e7e2187bff0c 100644 > --- a/net/ipv6/route.c > +++ b/net/ipv6/route.c > @@ -4591,11 +4591,13 @@ static int fib6_remove_prefsrc(struct fib6_info *rt, void *arg) > struct net_device *dev = ((struct arg_dev_net_ip *)arg)->dev; > struct net *net = ((struct arg_dev_net_ip *)arg)->net; > struct in6_addr *addr = ((struct arg_dev_net_ip *)arg)->addr; > + u32 tb6_id = l3mdev_fib_table(dev) ? : RT_TABLE_MAIN; > > if (!rt->nh && > - ((void *)rt->fib6_nh->fib_nh_dev == dev || !dev) && > rt != net->ipv6.fib6_null_entry && > - ipv6_addr_equal(addr, &rt->fib6_prefsrc.addr)) { > + rt->fib6_table->tb6_id == tb6_id && > + ipv6_addr_equal(addr, &rt->fib6_prefsrc.addr) && > + !ipv6_chk_addr(net, addr, rt->fib6_nh->fib_nh_dev, 0)) { > spin_lock_bh(&rt6_exception_lock); > /* remove prefsrc entry */ > rt->fib6_prefsrc.plen = 0; > > ipv6_chk_addr() is not cheap, but it's only called for routes that match > the previous criteria. > > With the above patch, the previous test cases now work as expected: > > There is however one failure in the selftest: > > Which is basically: > > # ip link add name dummy1 up type dummy > # ip link add name dummy2 up type dummy > # ip link add red type vrf table 1111 > # ip link set dev red up > # ip link set dummy2 vrf red > # ip -6 address add dev dummy1 2001:db8:104::12/64 > # ip -6 address add dev dummy2 2001:db8:104::12/64 > # ip -6 route add 2001:db8:106::/64 dev lo src 2001:db8:104::12 > # ip -6 route add vrf red 2001:db8:106::/64 dev lo src 2001:db8:104::12 > # ip -6 address del dev dummy2 2001:db8:104::12/64 > # ip -6 route show vrf red | grep "src 2001:db8:104::12" > 2001:db8:106::/64 dev lo src 2001:db8:104::12 metric 1024 pref medium > > I'm not sure it's realistic to expect the source address to be removed > when the address is deleted from dummy2, given that user space was only > able to configure the route because the address was available on dummy1 > in the default vrf: > > # ip link add name dummy1 up type dummy > # ip link add name dummy2 up type dummy > # ip link add red type vrf table 1111 > # ip link set dev red up > # ip link set dummy2 vrf red > # ip -6 address add dev dummy2 2001:db8:104::12/64 > # ip -6 route add vrf red 2001:db8:106::/64 dev lo src 2001:db8:104::12 > Error: Invalid source address. OK.. Another difference with IPv4, which could add this route directly. e.g. ip addr add dev dummy2 172.16.104.13/24 ip route add vrf red 172.16.107.0/24 dev lo src 172.16.104.13 For the IPv6 part, if we remove dummy1 addr, should we remove the src route in vrf since user only able to config the route when the addr available on dummy1? My current patch, and with yours, will keep the src route in vrf.. + ip link add name dummy1 up type dummy + ip link add name dummy2 up type dummy + ip link add red type vrf table 1111 + ip link set dev red up + ip link set dummy2 vrf red + ip -6 address add dev dummy1 2001:db8:104::12/64 + ip -6 address add dev dummy2 2001:db8:104::12/64 + ip -6 route add 2001:db8:106::/64 dev lo src 2001:db8:104::12 + ip -6 route add vrf red 2001:db8:106::/64 dev lo src 2001:db8:104::12 + ip -6 address del dev dummy1 2001:db8:104::12/64 + ip -6 route show vrf red 2001:db8:104::/64 dev dummy2 proto kernel metric 256 pref medium 2001:db8:106::/64 dev lo src 2001:db8:104::12 metric 1024 pref medium fe80::/64 dev dummy2 proto kernel metric 256 pref medium multicast ff00::/8 dev dummy2 proto kernel metric 256 pref medium + ip -6 route show ::1 dev lo proto kernel metric 256 pref medium 2001:db8:106::/64 dev lo metric 1024 pref medium > > Anyway, given that this patch does not fix a regression and the on-going > discussion around the semantics, I suggest to target future versions at > net-next. OK, I will. Thanks Hangbin
On Tue, Aug 01, 2023 at 02:51:52PM +0300, Ido Schimmel wrote: > So, I think we need to call ipv6_chk_addr() from rt6_remove_prefsrc() to > be consistent with the addition path in ip6_route_info_create(): > > ipv6_chk_addr() is not cheap, but it's only called for routes that match > the previous criteria. > > There is however one failure in the selftest: > > Which is basically: > > # ip link add name dummy1 up type dummy > # ip link add name dummy2 up type dummy > # ip link add red type vrf table 1111 > # ip link set dev red up > # ip link set dummy2 vrf red > # ip -6 address add dev dummy1 2001:db8:104::12/64 > # ip -6 address add dev dummy2 2001:db8:104::12/64 > # ip -6 route add 2001:db8:106::/64 dev lo src 2001:db8:104::12 > # ip -6 route add vrf red 2001:db8:106::/64 dev lo src 2001:db8:104::12 > # ip -6 address del dev dummy2 2001:db8:104::12/64 > # ip -6 route show vrf red | grep "src 2001:db8:104::12" > 2001:db8:106::/64 dev lo src 2001:db8:104::12 metric 1024 pref medium > > I'm not sure it's realistic to expect the source address to be removed > when the address is deleted from dummy2, given that user space was only > able to configure the route because the address was available on dummy1 > in the default vrf: > > # ip link add name dummy1 up type dummy > # ip link add name dummy2 up type dummy > # ip link add red type vrf table 1111 > # ip link set dev red up > # ip link set dummy2 vrf red > # ip -6 address add dev dummy2 2001:db8:104::12/64 > # ip -6 route add vrf red 2001:db8:106::/64 dev lo src 2001:db8:104::12 > Error: Invalid source address. After looking into it. I think this looks like a bug in ip6_route_info_create(). When the dev is lo. It's vrf is the default vrf table, which makes ipv6_chk_addr() only check dummy1 instead of dummy2. Maybe we should add a special check for lo dev? On the other hand, for route delete. How about add a special check for link local src addr add loopback dev? e.g. diff --git a/net/ipv6/route.c b/net/ipv6/route.c index b45b33394f43..6dbfae99f811 100644 --- a/net/ipv6/route.c +++ b/net/ipv6/route.c @@ -4586,11 +4586,17 @@ static int fib6_remove_prefsrc(struct fib6_info *rt, void *arg) struct net_device *dev = ((struct arg_dev_net_ip *)arg)->dev; struct net *net = ((struct arg_dev_net_ip *)arg)->net; struct in6_addr *addr = ((struct arg_dev_net_ip *)arg)->addr; + u32 tb6_id = l3mdev_fib_table(dev) ? : RT_TABLE_MAIN; - if (!rt->nh && - ((void *)rt->fib6_nh->fib_nh_dev == dev || !dev) && - rt != net->ipv6.fib6_null_entry && - ipv6_addr_equal(addr, &rt->fib6_prefsrc.addr)) { + bool skip_dev_check = !(ipv6_addr_type(addr) & IPV6_ADDR_LINKLOCAL); + + if (!(rt->fib6_nh->fib_nh_dev->flags & IFF_LOOPBACK)) + dev = rt->fib6_nh->fib_nh_dev; + + if (rt != net->ipv6.fib6_null_entry && + rt->fib6_table->tb6_id == tb6_id && + ipv6_addr_equal(addr, &rt->fib6_prefsrc.addr) && + !ipv6_chk_addr_and_flags(net, addr, dev, skip_dev_check, 0, IFA_F_TENTATIVE)) { spin_lock_bh(&rt6_exception_lock); /* remove prefsrc entry */ rt->fib6_prefsrc.plen = 0; With this, we can pass the fib_tests and your link local src route testing Thanks Hangbin
On Thu, Aug 03, 2023 at 12:15:40PM +0800, Hangbin Liu wrote: > On Tue, Aug 01, 2023 at 02:51:52PM +0300, Ido Schimmel wrote: > > On Tue, Jul 25, 2023 at 06:21:37PM +0800, Hangbin Liu wrote: > > > diff --git a/net/ipv6/route.c b/net/ipv6/route.c > > > index 64e873f5895f..44e980109e30 100644 > > > --- a/net/ipv6/route.c > > > +++ b/net/ipv6/route.c > > > @@ -4590,10 +4590,10 @@ static int fib6_remove_prefsrc(struct fib6_info *rt, void *arg) > > > struct net_device *dev = ((struct arg_dev_net_ip *)arg)->dev; > > > struct net *net = ((struct arg_dev_net_ip *)arg)->net; > > > struct in6_addr *addr = ((struct arg_dev_net_ip *)arg)->addr; > > > + u32 tb6_id = l3mdev_fib_table(dev) ? : RT_TABLE_MAIN; > > > > > > - if (!rt->nh && > > > - ((void *)rt->fib6_nh->fib_nh_dev == dev || !dev) && > > > - rt != net->ipv6.fib6_null_entry && > > > + if (rt != net->ipv6.fib6_null_entry && > > > + rt->fib6_table->tb6_id == tb6_id && > > > ipv6_addr_equal(addr, &rt->fib6_prefsrc.addr)) { > > > spin_lock_bh(&rt6_exception_lock); > > > /* remove prefsrc entry */ > > > @@ -4611,7 +4611,9 @@ void rt6_remove_prefsrc(struct inet6_ifaddr *ifp) > > > .net = net, > > > .addr = &ifp->addr, > > > }; > > > - fib6_clean_all(net, fib6_remove_prefsrc, &adni); > > > + > > > + if (!ipv6_chk_addr_and_flags(net, adni.addr, adni.dev, true, 0, IFA_F_TENTATIVE)) > > > > Setting 'skip_dev_check' to true is problematic since when a link-local > > address is deleted from a device, it should be removed as the preferred > > source address from routes using the device as their nexthop device, > > even if this address is configured on other devices. > > > > You can't configure a route with a link-local preferred source address > > if the address is not configured on the nexthop device: > > Thanks for letting me know another case I'm not aware... > > > Setting 'skip_dev_check' to false will solve this problem: > > > > But will create another problem where when such an address is deleted it > > also affects routes that shouldn't be affected: > > > > So, I think we need to call ipv6_chk_addr() from rt6_remove_prefsrc() to > > be consistent with the addition path in ip6_route_info_create(): > > > > diff --git a/net/ipv6/route.c b/net/ipv6/route.c > > index 56a55585eb79..e7e2187bff0c 100644 > > --- a/net/ipv6/route.c > > +++ b/net/ipv6/route.c > > @@ -4591,11 +4591,13 @@ static int fib6_remove_prefsrc(struct fib6_info *rt, void *arg) > > struct net_device *dev = ((struct arg_dev_net_ip *)arg)->dev; > > struct net *net = ((struct arg_dev_net_ip *)arg)->net; > > struct in6_addr *addr = ((struct arg_dev_net_ip *)arg)->addr; > > + u32 tb6_id = l3mdev_fib_table(dev) ? : RT_TABLE_MAIN; > > > > if (!rt->nh && > > - ((void *)rt->fib6_nh->fib_nh_dev == dev || !dev) && > > rt != net->ipv6.fib6_null_entry && > > - ipv6_addr_equal(addr, &rt->fib6_prefsrc.addr)) { > > + rt->fib6_table->tb6_id == tb6_id && > > + ipv6_addr_equal(addr, &rt->fib6_prefsrc.addr) && > > + !ipv6_chk_addr(net, addr, rt->fib6_nh->fib_nh_dev, 0)) { > > spin_lock_bh(&rt6_exception_lock); > > /* remove prefsrc entry */ > > rt->fib6_prefsrc.plen = 0; > > > > ipv6_chk_addr() is not cheap, but it's only called for routes that match > > the previous criteria. > > > > With the above patch, the previous test cases now work as expected: > > > > There is however one failure in the selftest: > > > > Which is basically: > > > > # ip link add name dummy1 up type dummy > > # ip link add name dummy2 up type dummy > > # ip link add red type vrf table 1111 > > # ip link set dev red up > > # ip link set dummy2 vrf red > > # ip -6 address add dev dummy1 2001:db8:104::12/64 > > # ip -6 address add dev dummy2 2001:db8:104::12/64 > > # ip -6 route add 2001:db8:106::/64 dev lo src 2001:db8:104::12 > > # ip -6 route add vrf red 2001:db8:106::/64 dev lo src 2001:db8:104::12 > > # ip -6 address del dev dummy2 2001:db8:104::12/64 > > # ip -6 route show vrf red | grep "src 2001:db8:104::12" > > 2001:db8:106::/64 dev lo src 2001:db8:104::12 metric 1024 pref medium > > > > I'm not sure it's realistic to expect the source address to be removed > > when the address is deleted from dummy2, given that user space was only > > able to configure the route because the address was available on dummy1 > > in the default vrf: > > > > # ip link add name dummy1 up type dummy > > # ip link add name dummy2 up type dummy > > # ip link add red type vrf table 1111 > > # ip link set dev red up > > # ip link set dummy2 vrf red > > # ip -6 address add dev dummy2 2001:db8:104::12/64 > > # ip -6 route add vrf red 2001:db8:106::/64 dev lo src 2001:db8:104::12 > > Error: Invalid source address. > > OK.. Another difference with IPv4, which could add this route directly. e.g. > > ip addr add dev dummy2 172.16.104.13/24 > ip route add vrf red 172.16.107.0/24 dev lo src 172.16.104.13 There is a fundamental difference between how the preferred source address is implemented for IPv4 compared to IPv6. In IPv4, the local route for the preferred source address is looked up in the table where the route is installed and there is a fallback to the local table in case the route was not installed in the main table. See fib_valid_prefsrc() and commit e1b8d903c6c3 ("net: Fix prefsrc lookups"). In IPv6, the preferred source address is looked up in the same VRF as the first nexthop device. This will give you similar results to IPv4 if the route is installed in the same VRF as the nexthop device, but not when the nexthop device is enslaved to a different VRF. Personally, I find this behavior weird because at least as far as user space is concerned, the preferred source address is an attribute of the route, not the nexthop. This is probably rooted in historical reasons such as the fact that this feature was implemented in IPv6 before multipath routes and VRFs. Anyway, the objective of the patch is not to align IPv4 and IPv6 (I'm not sure it's possible / worth the effort), but to align the logic for the preferred source address between addition and deletion. That is, if a global address is deleted from a device, then it should be removed from all the relevant routes, not only those using the device as their nexthop device. I tested [1] the following patch [2] and it seems to work. [1] + ip link add name dummy1 up type dummy + ip link add name dummy2 up type dummy + ip link add name dummy3 up type dummy + ip link add name red up type vrf table 1111 + ip link set dev dummy3 master red + ip address add 2001:db8:1::1/64 dev dummy1 + ip address add 2001:db8:1::1/64 dev dummy3 + ip route add 2001:db8:2::/64 dev dummy2 src 2001:db8:1::1 + ip -6 route show 2001:db8:2::/64 dev dummy2 2001:db8:2::/64 src 2001:db8:1::1 metric 1024 pref medium + ip address del 2001:db8:1::1/64 dev dummy3 + ip -6 route show 2001:db8:2::/64 dev dummy2 2001:db8:2::/64 src 2001:db8:1::1 metric 1024 pref medium + ip address del 2001:db8:1::1/64 dev dummy1 + ip -6 route show 2001:db8:2::/64 dev dummy2 2001:db8:2::/64 metric 1024 pref medium + ip address add fe80::1/128 dev dummy1 + ip address add fe80::1/128 dev dummy2 + ip route add 2001:db8:3::/64 dev dummy2 src fe80::1 + ip -6 route show 2001:db8:3::/64 2001:db8:3::/64 dev dummy2 src fe80::1 metric 1024 pref medium + ip address del fe80::1/128 dev dummy1 + ip -6 route show 2001:db8:3::/64 2001:db8:3::/64 dev dummy2 src fe80::1 metric 1024 pref medium + ip address del fe80::1/128 dev dummy2 + ip -6 route show 2001:db8:3::/64 2001:db8:3::/64 dev dummy2 metric 1024 pref medium [2] diff --git a/net/ipv6/route.c b/net/ipv6/route.c index 10751df16dab..3e1c76c7bdd3 100644 --- a/net/ipv6/route.c +++ b/net/ipv6/route.c @@ -4582,21 +4582,19 @@ struct fib6_info *addrconf_f6i_alloc(struct net *net, /* remove deleted ip from prefsrc entries */ struct arg_dev_net_ip { - struct net_device *dev; struct net *net; struct in6_addr *addr; }; static int fib6_remove_prefsrc(struct fib6_info *rt, void *arg) { - struct net_device *dev = ((struct arg_dev_net_ip *)arg)->dev; struct net *net = ((struct arg_dev_net_ip *)arg)->net; struct in6_addr *addr = ((struct arg_dev_net_ip *)arg)->addr; if (!rt->nh && - ((void *)rt->fib6_nh->fib_nh_dev == dev || !dev) && rt != net->ipv6.fib6_null_entry && - ipv6_addr_equal(addr, &rt->fib6_prefsrc.addr)) { + ipv6_addr_equal(addr, &rt->fib6_prefsrc.addr) && + !ipv6_chk_addr(net, addr, rt->fib6_nh->fib_nh_dev, 0)) { spin_lock_bh(&rt6_exception_lock); /* remove prefsrc entry */ rt->fib6_prefsrc.plen = 0; @@ -4609,7 +4607,6 @@ void rt6_remove_prefsrc(struct inet6_ifaddr *ifp) { struct net *net = dev_net(ifp->idev->dev); struct arg_dev_net_ip adni = { - .dev = ifp->idev->dev, .net = net, .addr = &ifp->addr, };
On Thu, Aug 10, 2023 at 12:06:31PM +0300, Ido Schimmel wrote: > static int fib6_remove_prefsrc(struct fib6_info *rt, void *arg) > { > - struct net_device *dev = ((struct arg_dev_net_ip *)arg)->dev; We can't remove this parameter and there is an tb id check. My final patch will looks like: static int fib6_remove_prefsrc(struct fib6_info *rt, void *arg) { + struct in6_addr *addr = ((struct arg_dev_net_ip *)arg)->addr; struct net_device *dev = ((struct arg_dev_net_ip *)arg)->dev; struct net *net = ((struct arg_dev_net_ip *)arg)->net; - struct in6_addr *addr = ((struct arg_dev_net_ip *)arg)->addr; + u32 tb6_id = l3mdev_fib_table(dev) ? : RT_TABLE_MAIN; - if (!rt->nh && - ((void *)rt->fib6_nh->fib_nh_dev == dev || !dev) && - rt != net->ipv6.fib6_null_entry && - ipv6_addr_equal(addr, &rt->fib6_prefsrc.addr)) { + if (rt != net->ipv6.fib6_null_entry && + rt->fib6_table->tb6_id == tb6_id && + ipv6_addr_equal(addr, &rt->fib6_prefsrc.addr) && + !ipv6_chk_addr(net, addr, rt->fib6_nh->fib_nh_dev, 0)) { spin_lock_bh(&rt6_exception_lock); /* remove prefsrc entry */ rt->fib6_prefsrc.plen = 0; And I will update the fib_test to do a special check for nexthop on loopback dev. Thanks Hangbin > struct net *net = ((struct arg_dev_net_ip *)arg)->net; > struct in6_addr *addr = ((struct arg_dev_net_ip *)arg)->addr; > > if (!rt->nh && > - ((void *)rt->fib6_nh->fib_nh_dev == dev || !dev) && > rt != net->ipv6.fib6_null_entry && > - ipv6_addr_equal(addr, &rt->fib6_prefsrc.addr)) { > + ipv6_addr_equal(addr, &rt->fib6_prefsrc.addr) && > + !ipv6_chk_addr(net, addr, rt->fib6_nh->fib_nh_dev, 0)) { > spin_lock_bh(&rt6_exception_lock); > /* remove prefsrc entry */ > rt->fib6_prefsrc.plen = 0; > @@ -4609,7 +4607,6 @@ void rt6_remove_prefsrc(struct inet6_ifaddr *ifp) > { > struct net *net = dev_net(ifp->idev->dev); > struct arg_dev_net_ip adni = { > - .dev = ifp->idev->dev, > .net = net, > .addr = &ifp->addr, > };
diff --git a/net/ipv6/route.c b/net/ipv6/route.c index 64e873f5895f..44e980109e30 100644 --- a/net/ipv6/route.c +++ b/net/ipv6/route.c @@ -4590,10 +4590,10 @@ static int fib6_remove_prefsrc(struct fib6_info *rt, void *arg) struct net_device *dev = ((struct arg_dev_net_ip *)arg)->dev; struct net *net = ((struct arg_dev_net_ip *)arg)->net; struct in6_addr *addr = ((struct arg_dev_net_ip *)arg)->addr; + u32 tb6_id = l3mdev_fib_table(dev) ? : RT_TABLE_MAIN; - if (!rt->nh && - ((void *)rt->fib6_nh->fib_nh_dev == dev || !dev) && - rt != net->ipv6.fib6_null_entry && + if (rt != net->ipv6.fib6_null_entry && + rt->fib6_table->tb6_id == tb6_id && ipv6_addr_equal(addr, &rt->fib6_prefsrc.addr)) { spin_lock_bh(&rt6_exception_lock); /* remove prefsrc entry */ @@ -4611,7 +4611,9 @@ void rt6_remove_prefsrc(struct inet6_ifaddr *ifp) .net = net, .addr = &ifp->addr, }; - fib6_clean_all(net, fib6_remove_prefsrc, &adni); + + if (!ipv6_chk_addr_and_flags(net, adni.addr, adni.dev, true, 0, IFA_F_TENTATIVE)) + fib6_clean_all(net, fib6_remove_prefsrc, &adni); } #define RTF_RA_ROUTER (RTF_ADDRCONF | RTF_DEFAULT) diff --git a/tools/testing/selftests/net/fib_tests.sh b/tools/testing/selftests/net/fib_tests.sh index 35d89dfa6f11..618aceb7497d 100755 --- a/tools/testing/selftests/net/fib_tests.sh +++ b/tools/testing/selftests/net/fib_tests.sh @@ -9,7 +9,7 @@ ret=0 ksft_skip=4 # all tests in this script. Can be overridden with -t option -TESTS="unregister down carrier nexthop suppress ipv6_notify ipv4_notify ipv6_rt ipv4_rt ipv6_addr_metric ipv4_addr_metric ipv6_route_metrics ipv4_route_metrics ipv4_route_v6_gw rp_filter ipv4_del_addr ipv4_mangle ipv6_mangle ipv4_bcast_neigh" +TESTS="unregister down carrier nexthop suppress ipv6_notify ipv4_notify ipv6_rt ipv4_rt ipv6_addr_metric ipv4_addr_metric ipv6_route_metrics ipv4_route_metrics ipv4_route_v6_gw rp_filter ipv4_del_addr ipv6_del_addr ipv4_mangle ipv6_mangle ipv4_bcast_neigh" VERBOSE=0 PAUSE_ON_FAIL=no @@ -1796,6 +1796,8 @@ ipv4_del_addr_test() $IP li set dummy1 up $IP li add dummy2 type dummy $IP li set dummy2 up + $IP li add dummy3 type dummy + $IP li set dummy3 up $IP li add red type vrf table 1111 $IP li set red up $IP ro add vrf red unreachable default @@ -1808,11 +1810,13 @@ ipv4_del_addr_test() $IP addr add dev dummy2 172.16.104.1/24 $IP addr add dev dummy2 172.16.104.11/24 $IP addr add dev dummy2 172.16.104.12/24 + $IP addr add dev dummy3 172.16.104.1/24 $IP route add 172.16.105.0/24 via 172.16.104.2 src 172.16.104.11 $IP route add 172.16.106.0/24 dev lo src 172.16.104.12 $IP route add table 0 172.16.107.0/24 via 172.16.104.2 src 172.16.104.13 $IP route add vrf red 172.16.105.0/24 via 172.16.104.2 src 172.16.104.11 $IP route add vrf red 172.16.106.0/24 dev lo src 172.16.104.12 + $IP route add 172.16.108.0/24 via 172.16.104.2 src 172.16.104.1 set +e # removing address from device in vrf should only remove route from vrf table @@ -1864,11 +1868,117 @@ ipv4_del_addr_test() $IP ro ls | grep -q 172.16.107.0/24 log_test $? 1 "Route removed in default VRF when source address deleted" + # removing address from one device while other device still has this + # address should not remove the source route + echo " Identical address on different device" + $IP addr del dev dummy3 172.16.104.1/24 + $IP ro ls | grep -q 172.16.108.0/24 + log_test $? 0 "Route not removed when source address exists on other device" + $IP li del dummy1 $IP li del dummy2 + $IP li del dummy3 cleanup } +ipv6_del_addr_test() +{ + echo + echo "IPv6 delete address route tests" + + setup + + set -e + $IP li add dummy1 up type dummy + $IP li add dummy2 up type dummy + $IP li add dummy3 up type dummy + $IP li add red type vrf table 1111 + $IP li set red up + $IP ro add vrf red unreachable default + $IP li set dummy2 vrf red + + $IP addr add dev dummy1 2001:db8:104::1/64 + $IP addr add dev dummy1 2001:db8:104::11/64 + $IP addr add dev dummy1 2001:db8:104::12/64 + $IP addr add dev dummy1 2001:db8:104::13/64 + $IP addr add dev dummy2 2001:db8:104::1/64 + $IP addr add dev dummy2 2001:db8:104::11/64 + $IP addr add dev dummy2 2001:db8:104::12/64 + $IP addr add dev dummy3 2001:db8:104::1/64 + $IP route add 2001:db8:105::/64 via 2001:db8:104::2 src 2001:db8:104::11 + $IP route add 2001:db8:106::/64 dev lo src 2001:db8:104::12 + $IP route add table 0 2001:db8:107::/64 via 2001:db8:104::2 src 2001:db8:104::13 + $IP route add vrf red 2001:db8:105::/64 via 2001:db8:104::2 src 2001:db8:104::11 + $IP route add vrf red 2001:db8:106::/64 dev lo src 2001:db8:104::12 + $IP route add 2001:db8:108::/64 via 2001:db8:104::2 src 2001:db8:104::1 + set +e + + # removing address from device in vrf should only remove it as a + # preferred source address from routes in vrf table + echo " Regular FIB info" + + $IP addr del dev dummy2 2001:db8:104::11/64 + # Checking if the source address exists instead of the dest subnet + # as IPv6 only removes the preferred source address, not whole route. + $IP -6 ro ls vrf red | grep -q "src 2001:db8:104::11" + log_test $? 1 "Prefsrc removed from VRF when source address deleted" + + $IP -6 ro ls | grep -q " src 2001:db8:104::11" + log_test $? 0 "Prefsrc in default VRF not removed" + + $IP addr add dev dummy2 2001:db8:104::11/64 + $IP route replace vrf red 2001:db8:105::/64 via 2001:db8:104::2 src 2001:db8:104::11 + + $IP addr del dev dummy1 2001:db8:104::11/64 + $IP -6 ro ls | grep -q "src 2001:db8:104::11" + log_test $? 1 "Prefsrc removed in default VRF when source address deleted" + + $IP -6 ro ls vrf red | grep -q "src 2001:db8:104::11" + log_test $? 0 "Prefsrc in VRF is not removed by address delete" + + # removing address from device in vrf should only remove preferred + # source address from vrf table even when the associated fib info + # only differs in table ID + echo " Identical FIB info with different table ID" + + $IP addr del dev dummy2 2001:db8:104::12/64 + $IP -6 ro ls vrf red | grep -q "src 2001:db8:104::12" + log_test $? 1 "Prefsrc removed from VRF when source address deleted" + + $IP -6 ro ls | grep -q "src 2001:db8:104::12" + log_test $? 0 "Prefsrc in default VRF not removed" + + $IP addr add dev dummy2 2001:db8:104::12/64 + $IP route replace vrf red 2001:db8:106::/64 dev lo src 2001:db8:104::12 + + $IP addr del dev dummy1 2001:db8:104::12/64 + $IP -6 ro ls | grep -q "src 2001:db8:104::12" + log_test $? 1 "Prefsrc removed in default VRF when source address deleted" + + $IP -6 ro ls vrf red | grep -q "src 2001:db8:104::12" + log_test $? 0 "Prefsrc in VRF is not removed by address delete" + + # removing address from device in default vrf should remove preferred + # source address from the default vrf even when route was inserted + # with a table ID of 0. + echo " Table ID 0" + + $IP addr del dev dummy1 2001:db8:104::13/64 + $IP -6 ro ls | grep -q "src 2001:db8:104::13" + log_test $? 1 "Prefsrc removed in default VRF when source address deleted" + + # removing address from one device while other device still has this + # address should not remove the source route + echo " Identical address on different devices" + $IP addr del dev dummy3 2001:db8:104::1/64 + $IP -6 ro ls | grep -q "src 2001:db8:104::1 " + log_test $? 0 "Prefsrc not removed when src address exists on other device" + + $IP li del dummy1 + $IP li del dummy2 + $IP li del dummy3 + cleanup +} ipv4_route_v6_gw_test() { @@ -2211,6 +2321,7 @@ do ipv6_addr_metric) ipv6_addr_metric_test;; ipv4_addr_metric) ipv4_addr_metric_test;; ipv4_del_addr) ipv4_del_addr_test;; + ipv6_del_addr) ipv6_del_addr_test;; ipv6_route_metrics) ipv6_route_metrics_test;; ipv4_route_metrics) ipv4_route_metrics_test;; ipv4_route_v6_gw) ipv4_route_v6_gw_test;;
After deleting an IPv6 address on an interface and cleaning up the related preferred source entries, it is important to ensure that all routes associated with the deleted address are properly cleared. The current implementation of rt6_remove_prefsrc() only checks the preferred source addresses bound to the current device. However, there may be routes that are bound to other devices but still utilize the same preferred source address. To address this issue, it is necessary to also delete entries that are bound to other interfaces but share the same source address with the current device. Failure to delete these entries would leave routes that are bound to the deleted address unclear. Here is an example reproducer (I have omitted unrelated routes): + ip link add dummy1 type dummy + ip link add dummy2 type dummy + ip link set dummy1 up + ip link set dummy2 up + ip addr add 1:2:3:4::5/64 dev dummy1 + ip route add 7:7:7:0::1 dev dummy1 src 1:2:3:4::5 + ip route add 7:7:7:0::2 dev dummy2 src 1:2:3:4::5 + ip -6 route show 1:2:3:4::/64 dev dummy1 proto kernel metric 256 pref medium 7:7:7::1 dev dummy1 src 1:2:3:4::5 metric 1024 pref medium 7:7:7::2 dev dummy2 src 1:2:3:4::5 metric 1024 pref medium + ip addr del 1:2:3:4::5/64 dev dummy1 + ip -6 route show 7:7:7::1 dev dummy1 metric 1024 pref medium 7:7:7::2 dev dummy2 src 1:2:3:4::5 metric 1024 pref medium Ido notified that there is a commit 5a56a0b3a45d ("net: Don't delete routes in different VRFs") to not affect the route in different VRFs. To fix all these issues. Here are what we do: 1. In fib6_remove_prefsrc, remove the rt dev checking and add an table id checking to not remove the route in different VRFs. 2. In fib6_remove_prefsrc, remove the !rt-nh checking to clear the IPv6 routes that are using a nexthop object. This would be consistent with IPv4. 3. In rt6_remove_prefsrc, add a check to make sure not remove the src route if the address still exists on other device(in same VRF). After fix: + ip addr del 1:2:3:4::5/64 dev dummy1 + ip -6 route show 7:7:7::1 dev dummy1 metric 1024 pref medium 7:7:7::2 dev dummy2 metric 1024 pref medium An ipv6_del_addr test is also added for fib_tests.sh. Note that instead of removing the whole route for IPv4, IPv6 only remove the preferred source address for source routing. So in the testing use "grep -q src $src_ipv6_address" instead of "grep -q $dst_ipv6_subnet/64" when checking if the source route deleted. Here is the fib_tests.sh ipv6_del_addr test result. Before fix: IPv6 delete address route tests Regular FIB info TEST: Prefsrc removed from VRF when source address deleted [ OK ] TEST: Prefsrc in default VRF not removed [ OK ] TEST: Prefsrc removed in default VRF when source address deleted [ OK ] TEST: Prefsrc in VRF is not removed by address delete [ OK ] Identical FIB info with different table ID TEST: Prefsrc removed from VRF when source address deleted [FAIL] TEST: Prefsrc in default VRF not removed [ OK ] TEST: Prefsrc removed in default VRF when source address deleted [FAIL] TEST: Prefsrc in VRF is not removed by address delete [ OK ] Table ID 0 TEST: Prefsrc removed in default VRF when source address deleted [ OK ] Identical address on different devices TEST: Prefsrc not removed when src address exists on other device [ OK ] After fix: IPv6 delete address route tests Regular FIB info TEST: Prefsrc removed from VRF when source address deleted [ OK ] TEST: Prefsrc in default VRF not removed [ OK ] TEST: Prefsrc removed in default VRF when source address deleted [ OK ] TEST: Prefsrc in VRF is not removed by address delete [ OK ] Identical FIB info with different table ID TEST: Prefsrc removed from VRF when source address deleted [ OK ] TEST: Prefsrc in default VRF not removed [ OK ] TEST: Prefsrc removed in default VRF when source address deleted [ OK ] TEST: Prefsrc in VRF is not removed by address delete [ OK ] Table ID 0 TEST: Prefsrc removed in default VRF when source address deleted [ OK ] Identical address on different devices TEST: Prefsrc not removed when src address exists on other device [ OK ] Reported-by: Thomas Haller <thaller@redhat.com> Fixes: c3968a857a6b ("ipv6: RTA_PREFSRC support for ipv6 route source address selection") Signed-off-by: Hangbin Liu <liuhangbin@gmail.com> --- v4: check if the prefsrc address still exists on other device v3: remove rt nh checking. update the ipv6_del_addr test descriptions v2: checking table id and update fib_test.sh --- net/ipv6/route.c | 10 +- tools/testing/selftests/net/fib_tests.sh | 113 ++++++++++++++++++++++- 2 files changed, 118 insertions(+), 5 deletions(-)