Message ID | 20220829100121.3821-1-nicolas.dichtel@6wind.com (mailing list archive) |
---|---|
State | Accepted |
Commit | eb55dc09b5dd040232d5de32812cc83001a23da6 |
Delegated to: | Netdev Maintainers |
Headers | show |
Series | [net] ip: fix triggering of 'icmp redirect' | expand |
On 8/29/22 4:01 AM, Nicolas Dichtel wrote: > __mkroute_input() uses fib_validate_source() to trigger an icmp redirect. > My understanding is that fib_validate_source() is used to know if the src > address and the gateway address are on the same link. For that, > fib_validate_source() returns 1 (same link) or 0 (not the same network). > __mkroute_input() is the only user of these positive values, all other > callers only look if the returned value is negative. > > Since the below patch, fib_validate_source() didn't return anymore 1 when > both addresses are on the same network, because the route lookup returns > RT_SCOPE_LINK instead of RT_SCOPE_HOST. But this is, in fact, right. > Let's adapat the test to return 1 again when both addresses are on the same > link. > > CC: stable@vger.kernel.org > Fixes: 747c14307214 ("ip: fix dflt addr selection for connected nexthop") > Reported-by: kernel test robot <yujie.liu@intel.com> > Reported-by: Heng Qi <hengqi@linux.alibaba.com> > Signed-off-by: Nicolas Dichtel <nicolas.dichtel@6wind.com> > --- > > This code exists since more than two decades: > https://git.kernel.org/pub/scm/linux/kernel/git/davem/netdev-vger-cvs.git/commit/?id=0c2c94df8133f > > Please, feel free to comment if my analysis seems wrong. > > Regards, > Nicolas > > net/ipv4/fib_frontend.c | 4 ++-- > 1 file changed, 2 insertions(+), 2 deletions(-) > > diff --git a/net/ipv4/fib_frontend.c b/net/ipv4/fib_frontend.c > index f361d3d56be2..943edf4ad4db 100644 > --- a/net/ipv4/fib_frontend.c > +++ b/net/ipv4/fib_frontend.c > @@ -389,7 +389,7 @@ static int __fib_validate_source(struct sk_buff *skb, __be32 src, __be32 dst, > dev_match = dev_match || (res.type == RTN_LOCAL && > dev == net->loopback_dev); > if (dev_match) { > - ret = FIB_RES_NHC(res)->nhc_scope >= RT_SCOPE_HOST; > + ret = FIB_RES_NHC(res)->nhc_scope >= RT_SCOPE_LINK; > return ret; > } > if (no_addr) > @@ -401,7 +401,7 @@ static int __fib_validate_source(struct sk_buff *skb, __be32 src, __be32 dst, > ret = 0; > if (fib_lookup(net, &fl4, &res, FIB_LOOKUP_IGNORE_LINKSTATE) == 0) { > if (res.type == RTN_UNICAST) > - ret = FIB_RES_NHC(res)->nhc_scope >= RT_SCOPE_HOST; > + ret = FIB_RES_NHC(res)->nhc_scope >= RT_SCOPE_LINK; > } > return ret; > Looks ok to me. Reviewed-by: David Ahern <dsahern@kernel.org>
Hello: This patch was applied to netdev/net.git (master) by Jakub Kicinski <kuba@kernel.org>: On Mon, 29 Aug 2022 12:01:21 +0200 you wrote: > __mkroute_input() uses fib_validate_source() to trigger an icmp redirect. > My understanding is that fib_validate_source() is used to know if the src > address and the gateway address are on the same link. For that, > fib_validate_source() returns 1 (same link) or 0 (not the same network). > __mkroute_input() is the only user of these positive values, all other > callers only look if the returned value is negative. > > [...] Here is the summary with links: - [net] ip: fix triggering of 'icmp redirect' https://git.kernel.org/netdev/net/c/eb55dc09b5dd You are awesome, thank you!
Hello, On Mon, 29 Aug 2022, Nicolas Dichtel wrote: > __mkroute_input() uses fib_validate_source() to trigger an icmp redirect. > My understanding is that fib_validate_source() is used to know if the src > address and the gateway address are on the same link. For that, > fib_validate_source() returns 1 (same link) or 0 (not the same network). > __mkroute_input() is the only user of these positive values, all other > callers only look if the returned value is negative. > > Since the below patch, fib_validate_source() didn't return anymore 1 when > both addresses are on the same network, because the route lookup returns > RT_SCOPE_LINK instead of RT_SCOPE_HOST. But this is, in fact, right. > Let's adapat the test to return 1 again when both addresses are on the same > link. > > CC: stable@vger.kernel.org > Fixes: 747c14307214 ("ip: fix dflt addr selection for connected nexthop") > Reported-by: kernel test robot <yujie.liu@intel.com> > Reported-by: Heng Qi <hengqi@linux.alibaba.com> > Signed-off-by: Nicolas Dichtel <nicolas.dichtel@6wind.com> > --- > > This code exists since more than two decades: > https://git.kernel.org/pub/scm/linux/kernel/git/davem/netdev-vger-cvs.git/commit/?id=0c2c94df8133f > > Please, feel free to comment if my analysis seems wrong. > > Regards, > Nicolas > > net/ipv4/fib_frontend.c | 4 ++-- > 1 file changed, 2 insertions(+), 2 deletions(-) > > diff --git a/net/ipv4/fib_frontend.c b/net/ipv4/fib_frontend.c > index f361d3d56be2..943edf4ad4db 100644 > --- a/net/ipv4/fib_frontend.c > +++ b/net/ipv4/fib_frontend.c > @@ -389,7 +389,7 @@ static int __fib_validate_source(struct sk_buff *skb, __be32 src, __be32 dst, > dev_match = dev_match || (res.type == RTN_LOCAL && > dev == net->loopback_dev); > if (dev_match) { > - ret = FIB_RES_NHC(res)->nhc_scope >= RT_SCOPE_HOST; > + ret = FIB_RES_NHC(res)->nhc_scope >= RT_SCOPE_LINK; Looks like I'm late here. nhc_scope is related to the nhc_gw. nhc_scope = RT_SCOPE_LINK means rt_gw4 is a target behing a GW (nhc_gw). OTOH, RT_SCOPE_HOST means nhc_gw is 0 (missing) or a local IP and as result, rt_gw4 is directly connected. IIRC, this should look like this: nhc_gw nhc_scope rt_gw4 fib_scope (route) --------------------------------------------------------------------------- 0 RT_SCOPE_NOWHERE Host RT_SCOPE_HOST (local) 0 RT_SCOPE_HOST LAN_TARGET RT_SCOPE_LINK (link) LOCAL1 RT_SCOPE_HOST LAN_TARGET RT_SCOPE_LINK (link) REM_GW1 RT_SCOPE_LINK Universe RT_SCOPE_UNIVERSE (indirect) For the code above: we do not check res->scope, we are interested what is the nhc_gw's scope (LINK/HOST/NOWHERE). It means, reverse route points back to same device and sender is not reached via gateway REM_GW1. By changing it to nhc_scope >= RT_SCOPE_LINK, ret always will be 1 because nhc_scope is not set below RT_SCOPE_LINK (253). Note that RT_SCOPE_HOST is 254. Looks like calling fib_info_update_nhc_saddr() in nh_create_ipv4() with fib_nh->fib_nh_scope (nhc_scope) is a problem, it should be fib_nh->fib_nh_scope - 1 or something like that, see below. 127.0.0.1 is selected because it is ifa_scope = RT_SCOPE_HOST, lo is first device and fib_nh_scope is RT_SCOPE_HOST when GW is not provided while creating the nexthop. About commit 747c14307214: - if nexthop_create() assigns always nhc_scope = RT_SCOPE_LINK then the assumption is that all added gateways are forced to be used for routes to universe? If GW is not provided, we should use nhc_scope = RT_SCOPE_HOST to allow link routes, right? Now, the question is, can I use same nexthop in routes with different scope ? What if later I add local IP that matches the IP used in nexthop? This nexthop's GW becomes local one. But these are corner cases. What I see as things to change: - fib_check_nexthop(): "scope == RT_SCOPE_HOST", "Route with host scope can not have multiple nexthops". Why not? We may need to spread traffic to multiple local IPs. But this is old problem and needs more investigation. - as fib_check_nh() is called (and sets nhc_scope) in nh_create_ipv4(), i.e. when a nexthop is added, we should tune nhc_scope in nh_create_ipv4() by selecting fib_cfg.fc_scope based on the present GW and then to provide it to fib_check_nh() and fib_info_update_nhc_saddr. As result, this nexthop will get valid nhc_scope based on the current IPs and link routes and also valid nh_saddr (not 127.0.0.1). We can do it as follows: .fc_scope = cfg->gw.ipv4 ? RT_SCOPE_UNIVERSE : RT_SCOPE_LINK, As result, we will also fix the scope provided to fib_info_update_nhc_saddr which looks like the main problem here. - later, if created route refers to this nexthop, fib_check_nexthop() should make sure the provided route's scope is not >= the nhc_scope, may be a check in nexthop_check_scope() is needed. This will ensure that new link route is not using nexthop with provided gateway. - __fib_validate_source() should match for RT_SCOPE_HOST as before. - fib_check_nh_nongw: no GW => RT_SCOPE_HOST So, we return to all state but with fixed argument to fib_info_update_nhc_saddr(). > return ret; > } > if (no_addr) > @@ -401,7 +401,7 @@ static int __fib_validate_source(struct sk_buff *skb, __be32 src, __be32 dst, > ret = 0; > if (fib_lookup(net, &fl4, &res, FIB_LOOKUP_IGNORE_LINKSTATE) == 0) { > if (res.type == RTN_UNICAST) > - ret = FIB_RES_NHC(res)->nhc_scope >= RT_SCOPE_HOST; > + ret = FIB_RES_NHC(res)->nhc_scope >= RT_SCOPE_LINK; > } > return ret; > > -- > 2.33.0 Regards -- Julian Anastasov <ja@ssi.bg>
Le 27/09/2022 à 14:56, Julian Anastasov a écrit : > > Hello, Hello, thanks for the detailed explanation. > > On Mon, 29 Aug 2022, Nicolas Dichtel wrote: > >> __mkroute_input() uses fib_validate_source() to trigger an icmp redirect. >> My understanding is that fib_validate_source() is used to know if the src >> address and the gateway address are on the same link. For that, >> fib_validate_source() returns 1 (same link) or 0 (not the same network). >> __mkroute_input() is the only user of these positive values, all other >> callers only look if the returned value is negative. >> >> Since the below patch, fib_validate_source() didn't return anymore 1 when >> both addresses are on the same network, because the route lookup returns >> RT_SCOPE_LINK instead of RT_SCOPE_HOST. But this is, in fact, right. >> Let's adapat the test to return 1 again when both addresses are on the same >> link. >> >> CC: stable@vger.kernel.org >> Fixes: 747c14307214 ("ip: fix dflt addr selection for connected nexthop") >> Reported-by: kernel test robot <yujie.liu@intel.com> >> Reported-by: Heng Qi <hengqi@linux.alibaba.com> >> Signed-off-by: Nicolas Dichtel <nicolas.dichtel@6wind.com> >> --- >> >> This code exists since more than two decades: >> https://git.kernel.org/pub/scm/linux/kernel/git/davem/netdev-vger-cvs.git/commit/?id=0c2c94df8133f >> >> Please, feel free to comment if my analysis seems wrong. >> >> Regards, >> Nicolas >> >> net/ipv4/fib_frontend.c | 4 ++-- >> 1 file changed, 2 insertions(+), 2 deletions(-) >> >> diff --git a/net/ipv4/fib_frontend.c b/net/ipv4/fib_frontend.c >> index f361d3d56be2..943edf4ad4db 100644 >> --- a/net/ipv4/fib_frontend.c >> +++ b/net/ipv4/fib_frontend.c >> @@ -389,7 +389,7 @@ static int __fib_validate_source(struct sk_buff *skb, __be32 src, __be32 dst, >> dev_match = dev_match || (res.type == RTN_LOCAL && >> dev == net->loopback_dev); >> if (dev_match) { >> - ret = FIB_RES_NHC(res)->nhc_scope >= RT_SCOPE_HOST; >> + ret = FIB_RES_NHC(res)->nhc_scope >= RT_SCOPE_LINK; > > Looks like I'm late here. nhc_scope is related to the > nhc_gw. nhc_scope = RT_SCOPE_LINK means rt_gw4 is a target behing a GW > (nhc_gw). OTOH, RT_SCOPE_HOST means nhc_gw is 0 (missing) or a local IP > and as result, rt_gw4 is directly connected. IIRC, this should look > like this: > > nhc_gw nhc_scope rt_gw4 fib_scope (route) > --------------------------------------------------------------------------- > 0 RT_SCOPE_NOWHERE Host RT_SCOPE_HOST (local) > 0 RT_SCOPE_HOST LAN_TARGET RT_SCOPE_LINK (link) > LOCAL1 RT_SCOPE_HOST LAN_TARGET RT_SCOPE_LINK (link) > REM_GW1 RT_SCOPE_LINK Universe RT_SCOPE_UNIVERSE (indirect) > > For the code above: we do not check res->scope, > we are interested what is the nhc_gw's scope (LINK/HOST/NOWHERE). > It means, reverse route points back to same device and sender is not > reached via gateway REM_GW1. iproute2 reject a gw which is not directly connected, am I wrong? > > By changing it to nhc_scope >= RT_SCOPE_LINK, ret always > will be 1 because nhc_scope is not set below RT_SCOPE_LINK (253). > Note that RT_SCOPE_HOST is 254. Do you have a setup which shows the problem? > > Looks like calling fib_info_update_nhc_saddr() in > nh_create_ipv4() with fib_nh->fib_nh_scope (nhc_scope) is a problem, > it should be fib_nh->fib_nh_scope - 1 or something like that, > see below. 127.0.0.1 is selected because it is > ifa_scope = RT_SCOPE_HOST, lo is first device and fib_nh_scope is > RT_SCOPE_HOST when GW is not provided while creating the nexthop. > > About commit 747c14307214: > > - if nexthop_create() assigns always nhc_scope = RT_SCOPE_LINK then > the assumption is that all added gateways are forced to be used > for routes to universe? If GW is not provided, we should use > nhc_scope = RT_SCOPE_HOST to allow link routes, right? > Now, the question is, can I use same nexthop in routes > with different scope ? What if later I add local IP that matches > the IP used in nexthop? This nexthop's GW becomes local one. > But these are corner cases. > > What I see as things to change: > > - fib_check_nexthop(): "scope == RT_SCOPE_HOST", > "Route with host scope can not have multiple nexthops". > Why not? We may need to spread traffic to multiple > local IPs. But this is old problem and needs more > investigation. > > - as fib_check_nh() is called (and sets nhc_scope) in > nh_create_ipv4(), i.e. when a nexthop is added, we should > tune nhc_scope in nh_create_ipv4() by selecting fib_cfg.fc_scope > based on the present GW and then to provide it to fib_check_nh() > and fib_info_update_nhc_saddr. As result, this nexthop will get > valid nhc_scope based on the current IPs and link routes and > also valid nh_saddr (not 127.0.0.1). We can do it as follows: > > .fc_scope = cfg->gw.ipv4 ? RT_SCOPE_UNIVERSE : > RT_SCOPE_LINK, > > As result, we will also fix the scope provided to > fib_info_update_nhc_saddr which looks like the main > problem here. > > - later, if created route refers to this nexthop, > fib_check_nexthop() should make sure the provided > route's scope is not >= the nhc_scope, may be a > check in nexthop_check_scope() is needed. This > will ensure that new link route is not using nexthop > with provided gateway. > > - __fib_validate_source() should match for RT_SCOPE_HOST > as before. > > - fib_check_nh_nongw: no GW => RT_SCOPE_HOST After reverting the two commits (747c14307214 and eb55dc09b5dd) and putting the below patch, the initial problem is fixed. But it's not clear what is broken with the current code. Before sending these patches formally, it would be nice to be able to add a selftest to demonstrate what is wrong. @@ -2534,7 +2534,7 @@ static int nh_create_ipv4(struct net *net, struct nexthop *nh, if (!err) { nh->nh_flags = fib_nh->fib_nh_flags; fib_info_update_nhc_saddr(net, &fib_nh->nh_common, - fib_nh->fib_nh_scope); + !fib_nh->fib_nh_scope ? 0 : fib_nh->fib_nh_scope - 1); } else { fib_nh_release(net, fib_nh); }
Hello, On Thu, 29 Sep 2022, Nicolas Dichtel wrote: > Le 27/09/2022 à 14:56, Julian Anastasov a écrit : > > > > nhc_gw nhc_scope rt_gw4 fib_scope (route) > > --------------------------------------------------------------------------- > > 0 RT_SCOPE_NOWHERE Host RT_SCOPE_HOST (local) > > 0 RT_SCOPE_HOST LAN_TARGET RT_SCOPE_LINK (link) > > LOCAL1 RT_SCOPE_HOST LAN_TARGET RT_SCOPE_LINK (link) > > REM_GW1 RT_SCOPE_LINK Universe RT_SCOPE_UNIVERSE (indirect) > > > > For the code above: we do not check res->scope, > > we are interested what is the nhc_gw's scope (LINK/HOST/NOWHERE). > > It means, reverse route points back to same device and sender is not > > reached via gateway REM_GW1. In short, to send redirects, sender should be reachable via link route (with nhc_scope = RT_SCOPE_HOST). > iproute2 reject a gw which is not directly connected, am I wrong? ip tool can not do it. It only provides route's scope specified by user and the GW's scope is determined by fib_check_nh_v4_gw() as route's scope + 1 but at least RT_SCOPE_LINK: /* It is not necessary, but requires a bit of thinking */ if (fl4.flowi4_scope < RT_SCOPE_LINK) fl4.flowi4_scope = RT_SCOPE_LINK; The other allowed value for nhc_scope when rt_gw4 is not 0 is RT_SCOPE_HOST (GW is a local IP, useful when autoselecting source address from same subnet). It is set by fib_check_nh_v4_gw() when res.type = RTN_LOCAL. > > By changing it to nhc_scope >= RT_SCOPE_LINK, ret always > > will be 1 because nhc_scope is not set below RT_SCOPE_LINK (253). > > Note that RT_SCOPE_HOST is 254. > Do you have a setup which shows the problem? No, just by analyze. RT_SCOPE_LINK indicates sender is reached via GW. > After reverting the two commits (747c14307214 and eb55dc09b5dd) and putting the > below patch, the initial problem is fixed. But it's not clear what is broken > with the current code. Before sending these patches formally, it would be nice > to be able to add a selftest to demonstrate what is wrong. What is broken? I guess, __fib_validate_source always returns 1 causing redirects. As for nh_create_ipv4(), may be using scope=0 as arg to fib_check_nh() should work. Now I can not find example for corner case where this can fail. > @@ -2534,7 +2534,7 @@ static int nh_create_ipv4(struct net *net, struct nexthop *nh, > if (!err) { > nh->nh_flags = fib_nh->fib_nh_flags; > fib_info_update_nhc_saddr(net, &fib_nh->nh_common, > - fib_nh->fib_nh_scope); > + !fib_nh->fib_nh_scope ? 0 : fib_nh->fib_nh_scope - 1); And this fix is needed to not expose scope host saddr (127.0.0.1) when nexthop is without GW and to not expose scope link saddr when nexthop is with GW (for traffic via scope global routes). > } else { > fib_nh_release(net, fib_nh); > } Regards -- Julian Anastasov <ja@ssi.bg>
diff --git a/net/ipv4/fib_frontend.c b/net/ipv4/fib_frontend.c index f361d3d56be2..943edf4ad4db 100644 --- a/net/ipv4/fib_frontend.c +++ b/net/ipv4/fib_frontend.c @@ -389,7 +389,7 @@ static int __fib_validate_source(struct sk_buff *skb, __be32 src, __be32 dst, dev_match = dev_match || (res.type == RTN_LOCAL && dev == net->loopback_dev); if (dev_match) { - ret = FIB_RES_NHC(res)->nhc_scope >= RT_SCOPE_HOST; + ret = FIB_RES_NHC(res)->nhc_scope >= RT_SCOPE_LINK; return ret; } if (no_addr) @@ -401,7 +401,7 @@ static int __fib_validate_source(struct sk_buff *skb, __be32 src, __be32 dst, ret = 0; if (fib_lookup(net, &fl4, &res, FIB_LOOKUP_IGNORE_LINKSTATE) == 0) { if (res.type == RTN_UNICAST) - ret = FIB_RES_NHC(res)->nhc_scope >= RT_SCOPE_HOST; + ret = FIB_RES_NHC(res)->nhc_scope >= RT_SCOPE_LINK; } return ret;
__mkroute_input() uses fib_validate_source() to trigger an icmp redirect. My understanding is that fib_validate_source() is used to know if the src address and the gateway address are on the same link. For that, fib_validate_source() returns 1 (same link) or 0 (not the same network). __mkroute_input() is the only user of these positive values, all other callers only look if the returned value is negative. Since the below patch, fib_validate_source() didn't return anymore 1 when both addresses are on the same network, because the route lookup returns RT_SCOPE_LINK instead of RT_SCOPE_HOST. But this is, in fact, right. Let's adapat the test to return 1 again when both addresses are on the same link. CC: stable@vger.kernel.org Fixes: 747c14307214 ("ip: fix dflt addr selection for connected nexthop") Reported-by: kernel test robot <yujie.liu@intel.com> Reported-by: Heng Qi <hengqi@linux.alibaba.com> Signed-off-by: Nicolas Dichtel <nicolas.dichtel@6wind.com> --- This code exists since more than two decades: https://git.kernel.org/pub/scm/linux/kernel/git/davem/netdev-vger-cvs.git/commit/?id=0c2c94df8133f Please, feel free to comment if my analysis seems wrong. Regards, Nicolas net/ipv4/fib_frontend.c | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-)