Message ID | 20221026032017.3675060-1-william.xuanziyang@huawei.com (mailing list archive) |
---|---|
State | Changes Requested |
Delegated to: | Netdev Maintainers |
Headers | show |
Series | [net] ipv4: fix source address and gateway mismatch under multiple default gateways | expand |
Hello, On Wed, 26 Oct 2022, Ziyang Xuan wrote: > We found a problem that source address doesn't match with selected gateway > under multiple default gateways. The reproducer is as following: > > Setup in client as following: > > $ ip link add link eth2 dev eth2.71 type vlan id 71 > $ ip link add link eth2 dev eth2.72 type vlan id 72 > $ ip addr add 192.168.71.41/24 dev eth2.71 > $ ip addr add 192.168.72.41/24 dev eth2.72 > $ ip link set eth2.71 up > $ ip link set eth2.72 up > $ route add -net default gw 192.168.71.1 dev eth2.71 > $ route add -net default gw 192.168.72.1 dev eth2.72 Second route goes to first position due to the prepend operation for the route add command. That is why 192.168.72.41 is selected. > Add a nameserver configuration in the following file: > $ cat /etc/resolv.conf > nameserver 8.8.8.8 > > Setup in peer server as following: > > $ ip link add link eth2 dev eth2.71 type vlan id 71 > $ ip link add link eth2 dev eth2.72 type vlan id 72 > $ ip addr add 192.168.71.1/24 dev eth2.71 > $ ip addr add 192.168.72.1/24 dev eth2.72 > $ ip link set eth2.71 up > $ ip link set eth2.72 up > > Use the following command trigger DNS packet in client: > $ ping www.baidu.com > > Capture packets with tcpdump in client when ping: > $ tcpdump -i eth2 -vne > ... > 20:30:22.996044 52:54:00:20:23:a9 > 52:54:00:d2:4f:e3, ethertype 802.1Q (0x8100), length 77: vlan 71, p 0, ethertype IPv4, (tos 0x0, ttl 64, id 25407, offset 0, flags [DF], proto UDP (17), length 59) > 192.168.72.41.42666 > 8.8.8.8.domain: 58562+ A? www.baidu.com. (31) > ... > > We get the problem that IPv4 saddr "192.168.72.41" do not match with > selected VLAN device "eth2.71". The problem could be that source address is selected once and later used as source address in following routing lookups. And your routing rules do not express the restriction that both addresses can not be used for specific route. If you have such restriction which is common, you should use source-specific routes: 1. ip rule to consider table main only for link routes, no default routes here ip rule add prio 10 table main 2. source-specific routes: ip rule add prio 20 from 192.168.71.0/24 table 20 ip route append default via 192.168.71.1 dev eth2.71 src 192.168.71.41 table 20 ip rule add prio 30 from 192.168.72.0/24 table 30 ip route append default via 192.168.72.1 dev eth2.72 src 192.168.72.41 table 30 3. Store default alternative routes not in table main: ip rule add prio 200 table 200 ip route append default via 192.168.71.1 dev eth2.71 src 192.168.71.41 table 200 ip route append default via 192.168.72.1 dev eth2.72 src 192.168.72.41 table 200 Above routes should work even without specifying prefsrc. As result, table 200 is used only for routing lookups without specific source address, usually for first packet in connection, next packets should hit tables 20/30. You can check https://ja.ssi.bg/dgd-usage.txt for such examples, see under 2. Alternative routes and dead gateway detection > In above scenario, the process does __ip_route_output_key() twice in > ip_route_connect(), the two processes have chosen different default gateway, > and the last choice is not the best. > > Add flowi4->saddr and fib_nh_common->nhc_gw.ipv4 matching consideration in > fib_select_default() to fix that. Other setups may not have such restriction, they can prefer any gateway in reachable state no matter the saddr. > Fixes: 19baf839ff4a ("[IPV4]: Add LC-Trie FIB lookup algorithm.") > Signed-off-by: Ziyang Xuan <william.xuanziyang@huawei.com> > --- > net/ipv4/fib_semantics.c | 6 ++++++ > 1 file changed, 6 insertions(+) > > diff --git a/net/ipv4/fib_semantics.c b/net/ipv4/fib_semantics.c > index e9a7f70a54df..8bd94875a009 100644 > --- a/net/ipv4/fib_semantics.c > +++ b/net/ipv4/fib_semantics.c > @@ -2046,6 +2046,7 @@ static void fib_select_default(const struct flowi4 *flp, struct fib_result *res) > int order = -1, last_idx = -1; > struct fib_alias *fa, *fa1 = NULL; > u32 last_prio = res->fi->fib_priority; > + u8 prefix, max_prefix = 0; > dscp_t last_dscp = 0; > > hlist_for_each_entry_rcu(fa, fa_head, fa_list) { > @@ -2078,6 +2079,11 @@ static void fib_select_default(const struct flowi4 *flp, struct fib_result *res) > if (!nhc->nhc_gw_family || nhc->nhc_scope != RT_SCOPE_LINK) > continue; > > + prefix = __ffs(flp->saddr ^ nhc->nhc_gw.ipv4); > + if (prefix < max_prefix) > + continue; > + max_prefix = max_t(u8, prefix, max_prefix); > + > fib_alias_accessed(fa); > > if (!fi) { > -- > 2.25.1 Regards -- Julian Anastasov <ja@ssi.bg>
Hi Ziyang, Thank you for the patch! Perhaps something to improve: [auto build test WARNING on net/master] url: https://github.com/intel-lab-lkp/linux/commits/Ziyang-Xuan/ipv4-fix-source-address-and-gateway-mismatch-under-multiple-default-gateways/20221026-112135 patch link: https://lore.kernel.org/r/20221026032017.3675060-1-william.xuanziyang%40huawei.com patch subject: [PATCH net] ipv4: fix source address and gateway mismatch under multiple default gateways config: sparc-randconfig-s033-20221026 compiler: sparc64-linux-gcc (GCC) 12.1.0 reproduce: wget https://raw.githubusercontent.com/intel/lkp-tests/master/sbin/make.cross -O ~/bin/make.cross chmod +x ~/bin/make.cross # apt-get install sparse # sparse version: v0.6.4-39-gce1a6720-dirty # https://github.com/intel-lab-lkp/linux/commit/10ba8a9a3b91638837103326eb6a23b814b5ee2b git remote add linux-review https://github.com/intel-lab-lkp/linux git fetch --no-tags linux-review Ziyang-Xuan/ipv4-fix-source-address-and-gateway-mismatch-under-multiple-default-gateways/20221026-112135 git checkout 10ba8a9a3b91638837103326eb6a23b814b5ee2b # save the config file mkdir build_dir && cp config build_dir/.config COMPILER_INSTALL_PATH=$HOME/0day COMPILER=gcc-12.1.0 make.cross C=1 CF='-fdiagnostic-prefix -D__CHECK_ENDIAN__' O=build_dir ARCH=sparc SHELL=/bin/bash net/ipv4/ If you fix the issue, kindly add following tag where applicable | Reported-by: kernel test robot <lkp@intel.com> sparse warnings: (new ones prefixed by >>) >> net/ipv4/fib_semantics.c:2082:43: sparse: sparse: incorrect type in argument 1 (different base types) @@ expected unsigned long @@ got restricted __be32 @@ net/ipv4/fib_semantics.c:2082:43: sparse: expected unsigned long net/ipv4/fib_semantics.c:2082:43: sparse: got restricted __be32 vim +2082 net/ipv4/fib_semantics.c 2038 2039 /* Must be invoked inside of an RCU protected region. */ 2040 static void fib_select_default(const struct flowi4 *flp, struct fib_result *res) 2041 { 2042 struct fib_info *fi = NULL, *last_resort = NULL; 2043 struct hlist_head *fa_head = res->fa_head; 2044 struct fib_table *tb = res->table; 2045 u8 slen = 32 - res->prefixlen; 2046 int order = -1, last_idx = -1; 2047 struct fib_alias *fa, *fa1 = NULL; 2048 u32 last_prio = res->fi->fib_priority; 2049 u8 prefix, max_prefix = 0; 2050 dscp_t last_dscp = 0; 2051 2052 hlist_for_each_entry_rcu(fa, fa_head, fa_list) { 2053 struct fib_info *next_fi = fa->fa_info; 2054 struct fib_nh_common *nhc; 2055 2056 if (fa->fa_slen != slen) 2057 continue; 2058 if (fa->fa_dscp && 2059 fa->fa_dscp != inet_dsfield_to_dscp(flp->flowi4_tos)) 2060 continue; 2061 if (fa->tb_id != tb->tb_id) 2062 continue; 2063 if (next_fi->fib_priority > last_prio && 2064 fa->fa_dscp == last_dscp) { 2065 if (last_dscp) 2066 continue; 2067 break; 2068 } 2069 if (next_fi->fib_flags & RTNH_F_DEAD) 2070 continue; 2071 last_dscp = fa->fa_dscp; 2072 last_prio = next_fi->fib_priority; 2073 2074 if (next_fi->fib_scope != res->scope || 2075 fa->fa_type != RTN_UNICAST) 2076 continue; 2077 2078 nhc = fib_info_nhc(next_fi, 0); 2079 if (!nhc->nhc_gw_family || nhc->nhc_scope != RT_SCOPE_LINK) 2080 continue; 2081 > 2082 prefix = __ffs(flp->saddr ^ nhc->nhc_gw.ipv4); 2083 if (prefix < max_prefix) 2084 continue; 2085 max_prefix = max_t(u8, prefix, max_prefix); 2086 2087 fib_alias_accessed(fa); 2088 2089 if (!fi) { 2090 if (next_fi != res->fi) 2091 break; 2092 fa1 = fa; 2093 } else if (!fib_detect_death(fi, order, &last_resort, 2094 &last_idx, fa1->fa_default)) { 2095 fib_result_assign(res, fi); 2096 fa1->fa_default = order; 2097 goto out; 2098 } 2099 fi = next_fi; 2100 order++; 2101 } 2102 2103 if (order <= 0 || !fi) { 2104 if (fa1) 2105 fa1->fa_default = -1; 2106 goto out; 2107 } 2108 2109 if (!fib_detect_death(fi, order, &last_resort, &last_idx, 2110 fa1->fa_default)) { 2111 fib_result_assign(res, fi); 2112 fa1->fa_default = order; 2113 goto out; 2114 } 2115 2116 if (last_idx >= 0) 2117 fib_result_assign(res, last_resort); 2118 fa1->fa_default = last_idx; 2119 out: 2120 return; 2121 } 2122
Hi Ziyang, Thank you for the patch! Perhaps something to improve: [auto build test WARNING on net/master] url: https://github.com/intel-lab-lkp/linux/commits/Ziyang-Xuan/ipv4-fix-source-address-and-gateway-mismatch-under-multiple-default-gateways/20221026-112135 patch link: https://lore.kernel.org/r/20221026032017.3675060-1-william.xuanziyang%40huawei.com patch subject: [PATCH net] ipv4: fix source address and gateway mismatch under multiple default gateways config: loongarch-randconfig-s043-20221024 compiler: loongarch64-linux-gcc (GCC) 12.1.0 reproduce: wget https://raw.githubusercontent.com/intel/lkp-tests/master/sbin/make.cross -O ~/bin/make.cross chmod +x ~/bin/make.cross # apt-get install sparse # sparse version: v0.6.4-39-gce1a6720-dirty # https://github.com/intel-lab-lkp/linux/commit/10ba8a9a3b91638837103326eb6a23b814b5ee2b git remote add linux-review https://github.com/intel-lab-lkp/linux git fetch --no-tags linux-review Ziyang-Xuan/ipv4-fix-source-address-and-gateway-mismatch-under-multiple-default-gateways/20221026-112135 git checkout 10ba8a9a3b91638837103326eb6a23b814b5ee2b # save the config file mkdir build_dir && cp config build_dir/.config COMPILER_INSTALL_PATH=$HOME/0day COMPILER=gcc-12.1.0 make.cross C=1 CF='-fdiagnostic-prefix -D__CHECK_ENDIAN__' O=build_dir ARCH=loongarch SHELL=/bin/bash net/ipv4/ If you fix the issue, kindly add following tag where applicable | Reported-by: kernel test robot <lkp@intel.com> sparse warnings: (new ones prefixed by >>) >> net/ipv4/fib_semantics.c:2082:43: sparse: sparse: incorrect type in argument 1 (different base types) @@ expected unsigned long word @@ got restricted __be32 @@ net/ipv4/fib_semantics.c:2082:43: sparse: expected unsigned long word net/ipv4/fib_semantics.c:2082:43: sparse: got restricted __be32 vim +2082 net/ipv4/fib_semantics.c 2038 2039 /* Must be invoked inside of an RCU protected region. */ 2040 static void fib_select_default(const struct flowi4 *flp, struct fib_result *res) 2041 { 2042 struct fib_info *fi = NULL, *last_resort = NULL; 2043 struct hlist_head *fa_head = res->fa_head; 2044 struct fib_table *tb = res->table; 2045 u8 slen = 32 - res->prefixlen; 2046 int order = -1, last_idx = -1; 2047 struct fib_alias *fa, *fa1 = NULL; 2048 u32 last_prio = res->fi->fib_priority; 2049 u8 prefix, max_prefix = 0; 2050 dscp_t last_dscp = 0; 2051 2052 hlist_for_each_entry_rcu(fa, fa_head, fa_list) { 2053 struct fib_info *next_fi = fa->fa_info; 2054 struct fib_nh_common *nhc; 2055 2056 if (fa->fa_slen != slen) 2057 continue; 2058 if (fa->fa_dscp && 2059 fa->fa_dscp != inet_dsfield_to_dscp(flp->flowi4_tos)) 2060 continue; 2061 if (fa->tb_id != tb->tb_id) 2062 continue; 2063 if (next_fi->fib_priority > last_prio && 2064 fa->fa_dscp == last_dscp) { 2065 if (last_dscp) 2066 continue; 2067 break; 2068 } 2069 if (next_fi->fib_flags & RTNH_F_DEAD) 2070 continue; 2071 last_dscp = fa->fa_dscp; 2072 last_prio = next_fi->fib_priority; 2073 2074 if (next_fi->fib_scope != res->scope || 2075 fa->fa_type != RTN_UNICAST) 2076 continue; 2077 2078 nhc = fib_info_nhc(next_fi, 0); 2079 if (!nhc->nhc_gw_family || nhc->nhc_scope != RT_SCOPE_LINK) 2080 continue; 2081 > 2082 prefix = __ffs(flp->saddr ^ nhc->nhc_gw.ipv4); 2083 if (prefix < max_prefix) 2084 continue; 2085 max_prefix = max_t(u8, prefix, max_prefix); 2086 2087 fib_alias_accessed(fa); 2088 2089 if (!fi) { 2090 if (next_fi != res->fi) 2091 break; 2092 fa1 = fa; 2093 } else if (!fib_detect_death(fi, order, &last_resort, 2094 &last_idx, fa1->fa_default)) { 2095 fib_result_assign(res, fi); 2096 fa1->fa_default = order; 2097 goto out; 2098 } 2099 fi = next_fi; 2100 order++; 2101 } 2102 2103 if (order <= 0 || !fi) { 2104 if (fa1) 2105 fa1->fa_default = -1; 2106 goto out; 2107 } 2108 2109 if (!fib_detect_death(fi, order, &last_resort, &last_idx, 2110 fa1->fa_default)) { 2111 fib_result_assign(res, fi); 2112 fa1->fa_default = order; 2113 goto out; 2114 } 2115 2116 if (last_idx >= 0) 2117 fib_result_assign(res, last_resort); 2118 fa1->fa_default = last_idx; 2119 out: 2120 return; 2121 } 2122
Hi Ziyang, Thank you for the patch! Perhaps something to improve: [auto build test WARNING on net/master] url: https://github.com/intel-lab-lkp/linux/commits/Ziyang-Xuan/ipv4-fix-source-address-and-gateway-mismatch-under-multiple-default-gateways/20221026-112135 patch link: https://lore.kernel.org/r/20221026032017.3675060-1-william.xuanziyang%40huawei.com patch subject: [PATCH net] ipv4: fix source address and gateway mismatch under multiple default gateways config: x86_64-randconfig-s023 compiler: gcc-11 (Debian 11.3.0-8) 11.3.0 reproduce: # apt-get install sparse # sparse version: v0.6.4-39-gce1a6720-dirty # https://github.com/intel-lab-lkp/linux/commit/10ba8a9a3b91638837103326eb6a23b814b5ee2b git remote add linux-review https://github.com/intel-lab-lkp/linux git fetch --no-tags linux-review Ziyang-Xuan/ipv4-fix-source-address-and-gateway-mismatch-under-multiple-default-gateways/20221026-112135 git checkout 10ba8a9a3b91638837103326eb6a23b814b5ee2b # save the config file mkdir build_dir && cp config build_dir/.config make W=1 C=1 CF='-fdiagnostic-prefix -D__CHECK_ENDIAN__' O=build_dir ARCH=x86_64 SHELL=/bin/bash net/ipv4/ If you fix the issue, kindly add following tag where applicable | Reported-by: kernel test robot <lkp@intel.com> sparse warnings: (new ones prefixed by >>) net/ipv4/fib_semantics.c:2082:26: sparse: sparse: incorrect type in argument 1 (different base types) @@ expected unsigned long word @@ got restricted __be32 @@ net/ipv4/fib_semantics.c:2082:26: sparse: expected unsigned long word net/ipv4/fib_semantics.c:2082:26: sparse: got restricted __be32 >> net/ipv4/fib_semantics.c:2082:26: sparse: sparse: incorrect type in argument 1 (different base types) @@ expected long @@ got restricted __be32 @@ net/ipv4/fib_semantics.c:2082:26: sparse: expected long net/ipv4/fib_semantics.c:2082:26: sparse: got restricted __be32 vim +2082 net/ipv4/fib_semantics.c 2038 2039 /* Must be invoked inside of an RCU protected region. */ 2040 static void fib_select_default(const struct flowi4 *flp, struct fib_result *res) 2041 { 2042 struct fib_info *fi = NULL, *last_resort = NULL; 2043 struct hlist_head *fa_head = res->fa_head; 2044 struct fib_table *tb = res->table; 2045 u8 slen = 32 - res->prefixlen; 2046 int order = -1, last_idx = -1; 2047 struct fib_alias *fa, *fa1 = NULL; 2048 u32 last_prio = res->fi->fib_priority; 2049 u8 prefix, max_prefix = 0; 2050 dscp_t last_dscp = 0; 2051 2052 hlist_for_each_entry_rcu(fa, fa_head, fa_list) { 2053 struct fib_info *next_fi = fa->fa_info; 2054 struct fib_nh_common *nhc; 2055 2056 if (fa->fa_slen != slen) 2057 continue; 2058 if (fa->fa_dscp && 2059 fa->fa_dscp != inet_dsfield_to_dscp(flp->flowi4_tos)) 2060 continue; 2061 if (fa->tb_id != tb->tb_id) 2062 continue; 2063 if (next_fi->fib_priority > last_prio && 2064 fa->fa_dscp == last_dscp) { 2065 if (last_dscp) 2066 continue; 2067 break; 2068 } 2069 if (next_fi->fib_flags & RTNH_F_DEAD) 2070 continue; 2071 last_dscp = fa->fa_dscp; 2072 last_prio = next_fi->fib_priority; 2073 2074 if (next_fi->fib_scope != res->scope || 2075 fa->fa_type != RTN_UNICAST) 2076 continue; 2077 2078 nhc = fib_info_nhc(next_fi, 0); 2079 if (!nhc->nhc_gw_family || nhc->nhc_scope != RT_SCOPE_LINK) 2080 continue; 2081 > 2082 prefix = __ffs(flp->saddr ^ nhc->nhc_gw.ipv4); 2083 if (prefix < max_prefix) 2084 continue; 2085 max_prefix = max_t(u8, prefix, max_prefix); 2086 2087 fib_alias_accessed(fa); 2088 2089 if (!fi) { 2090 if (next_fi != res->fi) 2091 break; 2092 fa1 = fa; 2093 } else if (!fib_detect_death(fi, order, &last_resort, 2094 &last_idx, fa1->fa_default)) { 2095 fib_result_assign(res, fi); 2096 fa1->fa_default = order; 2097 goto out; 2098 } 2099 fi = next_fi; 2100 order++; 2101 } 2102 2103 if (order <= 0 || !fi) { 2104 if (fa1) 2105 fa1->fa_default = -1; 2106 goto out; 2107 } 2108 2109 if (!fib_detect_death(fi, order, &last_resort, &last_idx, 2110 fa1->fa_default)) { 2111 fib_result_assign(res, fi); 2112 fa1->fa_default = order; 2113 goto out; 2114 } 2115 2116 if (last_idx >= 0) 2117 fib_result_assign(res, last_resort); 2118 fa1->fa_default = last_idx; 2119 out: 2120 return; 2121 } 2122
Hi Ziyang, Thank you for the patch! Perhaps something to improve: [auto build test WARNING on net/master] url: https://github.com/intel-lab-lkp/linux/commits/Ziyang-Xuan/ipv4-fix-source-address-and-gateway-mismatch-under-multiple-default-gateways/20221026-112135 patch link: https://lore.kernel.org/r/20221026032017.3675060-1-william.xuanziyang%40huawei.com patch subject: [PATCH net] ipv4: fix source address and gateway mismatch under multiple default gateways config: parisc-randconfig-s031-20221026 compiler: hppa64-linux-gcc (GCC) 12.1.0 reproduce: wget https://raw.githubusercontent.com/intel/lkp-tests/master/sbin/make.cross -O ~/bin/make.cross chmod +x ~/bin/make.cross # apt-get install sparse # sparse version: v0.6.4-39-gce1a6720-dirty # https://github.com/intel-lab-lkp/linux/commit/10ba8a9a3b91638837103326eb6a23b814b5ee2b git remote add linux-review https://github.com/intel-lab-lkp/linux git fetch --no-tags linux-review Ziyang-Xuan/ipv4-fix-source-address-and-gateway-mismatch-under-multiple-default-gateways/20221026-112135 git checkout 10ba8a9a3b91638837103326eb6a23b814b5ee2b # save the config file mkdir build_dir && cp config build_dir/.config COMPILER_INSTALL_PATH=$HOME/0day COMPILER=gcc-12.1.0 make.cross C=1 CF='-fdiagnostic-prefix -D__CHECK_ENDIAN__' O=build_dir ARCH=parisc64 SHELL=/bin/bash If you fix the issue, kindly add following tag where applicable | Reported-by: kernel test robot <lkp@intel.com> sparse warnings: (new ones prefixed by >>) >> net/ipv4/fib_semantics.c:2082:43: sparse: sparse: incorrect type in argument 1 (different base types) @@ expected unsigned long x @@ got restricted __be32 @@ net/ipv4/fib_semantics.c:2082:43: sparse: expected unsigned long x net/ipv4/fib_semantics.c:2082:43: sparse: got restricted __be32 vim +2082 net/ipv4/fib_semantics.c 2038 2039 /* Must be invoked inside of an RCU protected region. */ 2040 static void fib_select_default(const struct flowi4 *flp, struct fib_result *res) 2041 { 2042 struct fib_info *fi = NULL, *last_resort = NULL; 2043 struct hlist_head *fa_head = res->fa_head; 2044 struct fib_table *tb = res->table; 2045 u8 slen = 32 - res->prefixlen; 2046 int order = -1, last_idx = -1; 2047 struct fib_alias *fa, *fa1 = NULL; 2048 u32 last_prio = res->fi->fib_priority; 2049 u8 prefix, max_prefix = 0; 2050 dscp_t last_dscp = 0; 2051 2052 hlist_for_each_entry_rcu(fa, fa_head, fa_list) { 2053 struct fib_info *next_fi = fa->fa_info; 2054 struct fib_nh_common *nhc; 2055 2056 if (fa->fa_slen != slen) 2057 continue; 2058 if (fa->fa_dscp && 2059 fa->fa_dscp != inet_dsfield_to_dscp(flp->flowi4_tos)) 2060 continue; 2061 if (fa->tb_id != tb->tb_id) 2062 continue; 2063 if (next_fi->fib_priority > last_prio && 2064 fa->fa_dscp == last_dscp) { 2065 if (last_dscp) 2066 continue; 2067 break; 2068 } 2069 if (next_fi->fib_flags & RTNH_F_DEAD) 2070 continue; 2071 last_dscp = fa->fa_dscp; 2072 last_prio = next_fi->fib_priority; 2073 2074 if (next_fi->fib_scope != res->scope || 2075 fa->fa_type != RTN_UNICAST) 2076 continue; 2077 2078 nhc = fib_info_nhc(next_fi, 0); 2079 if (!nhc->nhc_gw_family || nhc->nhc_scope != RT_SCOPE_LINK) 2080 continue; 2081 > 2082 prefix = __ffs(flp->saddr ^ nhc->nhc_gw.ipv4); 2083 if (prefix < max_prefix) 2084 continue; 2085 max_prefix = max_t(u8, prefix, max_prefix); 2086 2087 fib_alias_accessed(fa); 2088 2089 if (!fi) { 2090 if (next_fi != res->fi) 2091 break; 2092 fa1 = fa; 2093 } else if (!fib_detect_death(fi, order, &last_resort, 2094 &last_idx, fa1->fa_default)) { 2095 fib_result_assign(res, fi); 2096 fa1->fa_default = order; 2097 goto out; 2098 } 2099 fi = next_fi; 2100 order++; 2101 } 2102 2103 if (order <= 0 || !fi) { 2104 if (fa1) 2105 fa1->fa_default = -1; 2106 goto out; 2107 } 2108 2109 if (!fib_detect_death(fi, order, &last_resort, &last_idx, 2110 fa1->fa_default)) { 2111 fib_result_assign(res, fi); 2112 fa1->fa_default = order; 2113 goto out; 2114 } 2115 2116 if (last_idx >= 0) 2117 fib_result_assign(res, last_resort); 2118 fa1->fa_default = last_idx; 2119 out: 2120 return; 2121 } 2122
> > Hello, > > On Wed, 26 Oct 2022, Ziyang Xuan wrote: > >> We found a problem that source address doesn't match with selected gateway >> under multiple default gateways. The reproducer is as following: >> >> Setup in client as following: >> >> $ ip link add link eth2 dev eth2.71 type vlan id 71 >> $ ip link add link eth2 dev eth2.72 type vlan id 72 >> $ ip addr add 192.168.71.41/24 dev eth2.71 >> $ ip addr add 192.168.72.41/24 dev eth2.72 >> $ ip link set eth2.71 up >> $ ip link set eth2.72 up >> $ route add -net default gw 192.168.71.1 dev eth2.71 >> $ route add -net default gw 192.168.72.1 dev eth2.72 > > Second route goes to first position due to the > prepend operation for the route add command. That is > why 192.168.72.41 is selected. > >> Add a nameserver configuration in the following file: >> $ cat /etc/resolv.conf >> nameserver 8.8.8.8 >> >> Setup in peer server as following: >> >> $ ip link add link eth2 dev eth2.71 type vlan id 71 >> $ ip link add link eth2 dev eth2.72 type vlan id 72 >> $ ip addr add 192.168.71.1/24 dev eth2.71 >> $ ip addr add 192.168.72.1/24 dev eth2.72 >> $ ip link set eth2.71 up >> $ ip link set eth2.72 up >> >> Use the following command trigger DNS packet in client: >> $ ping www.baidu.com >> >> Capture packets with tcpdump in client when ping: >> $ tcpdump -i eth2 -vne >> ... >> 20:30:22.996044 52:54:00:20:23:a9 > 52:54:00:d2:4f:e3, ethertype 802.1Q (0x8100), length 77: vlan 71, p 0, ethertype IPv4, (tos 0x0, ttl 64, id 25407, offset 0, flags [DF], proto UDP (17), length 59) >> 192.168.72.41.42666 > 8.8.8.8.domain: 58562+ A? www.baidu.com. (31) >> ... >> >> We get the problem that IPv4 saddr "192.168.72.41" do not match with >> selected VLAN device "eth2.71". > > The problem could be that source address is selected > once and later used as source address in following routing lookups. > > And your routing rules do not express the restriction that > both addresses can not be used for specific route. If you have > such restriction which is common, you should use source-specific routes: Hi Julian, Thank you for your quick reply. Can we make some work to make the restriction "both addresses can not be used for specific route" in code but in consciousness? Thanks. > > 1. ip rule to consider table main only for link routes, > no default routes here > > ip rule add prio 10 table main > > 2. source-specific routes: > > ip rule add prio 20 from 192.168.71.0/24 table 20 > ip route append default via 192.168.71.1 dev eth2.71 src 192.168.71.41 table 20 > ip rule add prio 30 from 192.168.72.0/24 table 30 > ip route append default via 192.168.72.1 dev eth2.72 src 192.168.72.41 table 30 > > 3. Store default alternative routes not in table main: > ip rule add prio 200 table 200 > ip route append default via 192.168.71.1 dev eth2.71 src 192.168.71.41 table 200 > ip route append default via 192.168.72.1 dev eth2.72 src 192.168.72.41 table 200 > > Above routes should work even without specifying prefsrc. > > As result, table 200 is used only for routing lookups > without specific source address, usually for first packet in > connection, next packets should hit tables 20/30. > > You can check https://ja.ssi.bg/dgd-usage.txt for such > examples, see under 2. Alternative routes and dead gateway detection > >> In above scenario, the process does __ip_route_output_key() twice in >> ip_route_connect(), the two processes have chosen different default gateway, >> and the last choice is not the best. >> >> Add flowi4->saddr and fib_nh_common->nhc_gw.ipv4 matching consideration in >> fib_select_default() to fix that. > > Other setups may not have such restriction, they can > prefer any gateway in reachable state no matter the saddr. > >> Fixes: 19baf839ff4a ("[IPV4]: Add LC-Trie FIB lookup algorithm.") >> Signed-off-by: Ziyang Xuan <william.xuanziyang@huawei.com> >> --- >> net/ipv4/fib_semantics.c | 6 ++++++ >> 1 file changed, 6 insertions(+) >> >> diff --git a/net/ipv4/fib_semantics.c b/net/ipv4/fib_semantics.c >> index e9a7f70a54df..8bd94875a009 100644 >> --- a/net/ipv4/fib_semantics.c >> +++ b/net/ipv4/fib_semantics.c >> @@ -2046,6 +2046,7 @@ static void fib_select_default(const struct flowi4 *flp, struct fib_result *res) >> int order = -1, last_idx = -1; >> struct fib_alias *fa, *fa1 = NULL; >> u32 last_prio = res->fi->fib_priority; >> + u8 prefix, max_prefix = 0; >> dscp_t last_dscp = 0; >> >> hlist_for_each_entry_rcu(fa, fa_head, fa_list) { >> @@ -2078,6 +2079,11 @@ static void fib_select_default(const struct flowi4 *flp, struct fib_result *res) >> if (!nhc->nhc_gw_family || nhc->nhc_scope != RT_SCOPE_LINK) >> continue; >> >> + prefix = __ffs(flp->saddr ^ nhc->nhc_gw.ipv4); >> + if (prefix < max_prefix) >> + continue; >> + max_prefix = max_t(u8, prefix, max_prefix); >> + >> fib_alias_accessed(fa); >> >> if (!fi) { >> -- >> 2.25.1 > > Regards > > -- > Julian Anastasov <ja@ssi.bg> > > > . >
Hello, On Sat, 29 Oct 2022, Ziyang Xuan (William) wrote: > > On Wed, 26 Oct 2022, Ziyang Xuan wrote: > > > >> We found a problem that source address doesn't match with selected gateway > >> under multiple default gateways. The reproducer is as following: > >> > >> Setup in client as following: > >> > >> $ ip link add link eth2 dev eth2.71 type vlan id 71 > >> $ ip link add link eth2 dev eth2.72 type vlan id 72 > >> $ ip addr add 192.168.71.41/24 dev eth2.71 > >> $ ip addr add 192.168.72.41/24 dev eth2.72 > >> $ ip link set eth2.71 up > >> $ ip link set eth2.72 up > >> $ route add -net default gw 192.168.71.1 dev eth2.71 > >> $ route add -net default gw 192.168.72.1 dev eth2.72 > > > > Second route goes to first position due to the > > prepend operation for the route add command. That is > > why 192.168.72.41 is selected. > > > >> Add a nameserver configuration in the following file: > >> $ cat /etc/resolv.conf > >> nameserver 8.8.8.8 > >> > >> Setup in peer server as following: > >> > >> $ ip link add link eth2 dev eth2.71 type vlan id 71 > >> $ ip link add link eth2 dev eth2.72 type vlan id 72 > >> $ ip addr add 192.168.71.1/24 dev eth2.71 > >> $ ip addr add 192.168.72.1/24 dev eth2.72 > >> $ ip link set eth2.71 up > >> $ ip link set eth2.72 up > >> > >> Use the following command trigger DNS packet in client: > >> $ ping www.baidu.com > >> > >> Capture packets with tcpdump in client when ping: > >> $ tcpdump -i eth2 -vne > >> ... > >> 20:30:22.996044 52:54:00:20:23:a9 > 52:54:00:d2:4f:e3, ethertype 802.1Q (0x8100), length 77: vlan 71, p 0, ethertype IPv4, (tos 0x0, ttl 64, id 25407, offset 0, flags [DF], proto UDP (17), length 59) > >> 192.168.72.41.42666 > 8.8.8.8.domain: 58562+ A? www.baidu.com. (31) > >> ... > >> > >> We get the problem that IPv4 saddr "192.168.72.41" do not match with > >> selected VLAN device "eth2.71". > > > > The problem could be that source address is selected > > once and later used as source address in following routing lookups. > > > > And your routing rules do not express the restriction that > > both addresses can not be used for specific route. If you have > > such restriction which is common, you should use source-specific routes: > > Hi Julian, > > Thank you for your quick reply. > > Can we make some work to make the restriction "both addresses can not be used for specific route" in code but in consciousness? No good idea about this. Not sure if the new 'nexthop' object is suitable for such policy. Regards -- Julian Anastasov <ja@ssi.bg>
diff --git a/net/ipv4/fib_semantics.c b/net/ipv4/fib_semantics.c index e9a7f70a54df..8bd94875a009 100644 --- a/net/ipv4/fib_semantics.c +++ b/net/ipv4/fib_semantics.c @@ -2046,6 +2046,7 @@ static void fib_select_default(const struct flowi4 *flp, struct fib_result *res) int order = -1, last_idx = -1; struct fib_alias *fa, *fa1 = NULL; u32 last_prio = res->fi->fib_priority; + u8 prefix, max_prefix = 0; dscp_t last_dscp = 0; hlist_for_each_entry_rcu(fa, fa_head, fa_list) { @@ -2078,6 +2079,11 @@ static void fib_select_default(const struct flowi4 *flp, struct fib_result *res) if (!nhc->nhc_gw_family || nhc->nhc_scope != RT_SCOPE_LINK) continue; + prefix = __ffs(flp->saddr ^ nhc->nhc_gw.ipv4); + if (prefix < max_prefix) + continue; + max_prefix = max_t(u8, prefix, max_prefix); + fib_alias_accessed(fa); if (!fi) {
We found a problem that source address doesn't match with selected gateway under multiple default gateways. The reproducer is as following: Setup in client as following: $ ip link add link eth2 dev eth2.71 type vlan id 71 $ ip link add link eth2 dev eth2.72 type vlan id 72 $ ip addr add 192.168.71.41/24 dev eth2.71 $ ip addr add 192.168.72.41/24 dev eth2.72 $ ip link set eth2.71 up $ ip link set eth2.72 up $ route add -net default gw 192.168.71.1 dev eth2.71 $ route add -net default gw 192.168.72.1 dev eth2.72 Add a nameserver configuration in the following file: $ cat /etc/resolv.conf nameserver 8.8.8.8 Setup in peer server as following: $ ip link add link eth2 dev eth2.71 type vlan id 71 $ ip link add link eth2 dev eth2.72 type vlan id 72 $ ip addr add 192.168.71.1/24 dev eth2.71 $ ip addr add 192.168.72.1/24 dev eth2.72 $ ip link set eth2.71 up $ ip link set eth2.72 up Use the following command trigger DNS packet in client: $ ping www.baidu.com Capture packets with tcpdump in client when ping: $ tcpdump -i eth2 -vne ... 20:30:22.996044 52:54:00:20:23:a9 > 52:54:00:d2:4f:e3, ethertype 802.1Q (0x8100), length 77: vlan 71, p 0, ethertype IPv4, (tos 0x0, ttl 64, id 25407, offset 0, flags [DF], proto UDP (17), length 59) 192.168.72.41.42666 > 8.8.8.8.domain: 58562+ A? www.baidu.com. (31) ... We get the problem that IPv4 saddr "192.168.72.41" do not match with selected VLAN device "eth2.71". In above scenario, the process does __ip_route_output_key() twice in ip_route_connect(), the two processes have chosen different default gateway, and the last choice is not the best. Add flowi4->saddr and fib_nh_common->nhc_gw.ipv4 matching consideration in fib_select_default() to fix that. Fixes: 19baf839ff4a ("[IPV4]: Add LC-Trie FIB lookup algorithm.") Signed-off-by: Ziyang Xuan <william.xuanziyang@huawei.com> --- net/ipv4/fib_semantics.c | 6 ++++++ 1 file changed, 6 insertions(+)