Message ID | 20210429170254.5grfgsz2hgy2qjhk@dwarf.suse.cz (mailing list archive) |
---|---|
State | RFC |
Delegated to: | Netdev Maintainers |
Headers | show |
Series | [RFC] fix xfrm MTU regression | expand |
Context | Check | Description |
---|---|---|
netdev/cover_letter | success | Link |
netdev/fixes_present | success | Link |
netdev/patch_count | success | Link |
netdev/tree_selection | success | Guessed tree name to be net-next |
netdev/subject_prefix | warning | Target tree name not specified in the subject |
netdev/cc_maintainers | warning | 3 maintainers not CCed: yoshfuji@linux-ipv6.org dsahern@kernel.org kuba@kernel.org |
netdev/source_inline | success | Was 0 now: 0 |
netdev/verify_signedoff | success | Link |
netdev/module_param | success | Was 0 now: 0 |
netdev/build_32bit | success | Errors and warnings before: 37 this patch: 37 |
netdev/kdoc | success | Errors and warnings before: 0 this patch: 0 |
netdev/verify_fixes | success | Link |
netdev/checkpatch | warning | WARNING: line length of 83 exceeds 80 columns |
netdev/build_allmodconfig_warn | success | Errors and warnings before: 30 this patch: 30 |
netdev/header_inline | success | Link |
2021-04-29, 19:02:54 +0200, Jiri Bohac wrote: > Hi, > > Commit 749439bfac6e1a2932c582e2699f91d329658196 ("ipv6: fix udpv6 > sendmsg crash caused by too small MTU") breaks PMTU for xfrm. > > A Packet Too Big ICMPv6 message received in response to an ESP > packet will prevent all further communication through the tunnel > if the reported MTU minus the ESP overhead is smaller than 1280. > > E.g. in a case of a tunnel-mode ESP with sha256/aes the overhead > is 92 bytes. Receiving a PTB with MTU of 1371 or less will result > in all further packets in the tunnel dropped. A ping through the > tunnel fails with "ping: sendmsg: Invalid argument". > > Apparently the MTU on the xfrm route is smaller than 1280 and > fails the check inside ip6_setup_cork() added by 749439bf. > > We found this by debugging USGv6/ipv6ready failures. Failing > tests are: "Phase-2 Interoperability Test Scenario IPsec" / > 5.3.11 and 5.4.11 (Tunnel Mode: Fragmentation). That should be fixed with commit b515d2637276 ("xfrm: xfrm_state_mtu should return at least 1280 for ipv6"), currently in Steffen's ipsec tree: https://git.kernel.org/pub/scm/linux/kernel/git/klassert/ipsec.git/commit/?id=b515d2637276
On Thu, Apr 29, 2021 at 09:48:09PM +0200, Sabrina Dubroca wrote: > That should be fixed with commit b515d2637276 ("xfrm: xfrm_state_mtu > should return at least 1280 for ipv6"), currently in Steffen's ipsec > tree: > https://git.kernel.org/pub/scm/linux/kernel/git/klassert/ipsec.git/commit/?id=b515d2637276 Thanks, that is interesting! The patch makes my large (-s 1400) pings inside ESP pass through a 1280-MTU link on an intermediary router but in a suboptimal double-fragmented way. tcpdump on the router shows: 22:09:44.556452 IP6 2001:db8:ffff::1 > 2001:db8:ffff:1::1: frag (0|1232) ESP(spi=0x00000001,seq=0xdd), length 1232 22:09:44.566269 IP6 2001:db8:ffff::1 > 2001:db8:ffff:1::1: frag (1232|100) 22:09:44.566553 IP6 2001:db8:ffff::1 > 2001:db8:ffff:1::1: ESP(spi=0x00000001,seq=0xde), length 276 I.e. the ping is fragmented into two ESP packets and the first ESP packet is then fragmented again. The same pings with my patch come through in two fragments: 22:13:22.072934 IP6 2001:db8:ffff::1 > 2001:db8:ffff:1::1: ESP(spi=0x00000001,seq=0x28), length 1236 22:13:22.073039 IP6 2001:db8:ffff::1 > 2001:db8:ffff:1::1: ESP(spi=0x00000001,seq=0x29), length 356 I can do more tests if needed.
On Thu, Apr 29, 2021 at 07:02:55PM +0200, Jiri Bohac wrote: > Below is my attempt to fix the situation by dropping the MTU > check and instead checking for the underflows described in the > 749439bf commit message (without much understanding of the > details!). Does this make sense?: the first version left headersize uninitialized in the error path; v2 below fixes this. Signed-off-by: Jiri Bohac <jbohac@suse.cz> diff --git a/net/ipv6/ip6_output.c b/net/ipv6/ip6_output.c index ff4f9ebcf7f6..171eb4ec1e67 100644 --- a/net/ipv6/ip6_output.c +++ b/net/ipv6/ip6_output.c @@ -1402,8 +1402,6 @@ static int ip6_setup_cork(struct sock *sk, struct inet_cork_full *cork, if (np->frag_size) mtu = np->frag_size; } - if (mtu < IPV6_MIN_MTU) - return -EINVAL; cork->base.fragsize = mtu; cork->base.gso_size = ipc6->gso_size; cork->base.tx_flags = 0; @@ -1465,8 +1463,6 @@ static int __ip6_append_data(struct sock *sk, fragheaderlen = sizeof(struct ipv6hdr) + rt->rt6i_nfheader_len + (opt ? opt->opt_nflen : 0); - maxfraglen = ((mtu - fragheaderlen) & ~7) + fragheaderlen - - sizeof(struct frag_hdr); headersize = sizeof(struct ipv6hdr) + (opt ? opt->opt_flen + opt->opt_nflen : 0) + @@ -1474,6 +1470,13 @@ static int __ip6_append_data(struct sock *sk, sizeof(struct frag_hdr) : 0) + rt->rt6i_nfheader_len; + if (mtu < fragheaderlen || + ((mtu - fragheaderlen) & ~7) + fragheaderlen < sizeof(struct frag_hdr)) + goto emsgsize; + + maxfraglen = ((mtu - fragheaderlen) & ~7) + fragheaderlen - + sizeof(struct frag_hdr); + /* as per RFC 7112 section 5, the entire IPv6 Header Chain must fit * the first fragment */
2021-04-29, 22:25:29 +0200, Jiri Bohac wrote: > On Thu, Apr 29, 2021 at 09:48:09PM +0200, Sabrina Dubroca wrote: > > That should be fixed with commit b515d2637276 ("xfrm: xfrm_state_mtu > > should return at least 1280 for ipv6"), currently in Steffen's ipsec > > tree: > > https://git.kernel.org/pub/scm/linux/kernel/git/klassert/ipsec.git/commit/?id=b515d2637276 > > Thanks, that is interesting! The patch makes my large (-s 1400) pings inside > ESP pass through a 1280-MTU link on an intermediary router but in a suboptimal > double-fragmented way. tcpdump on the router shows: > > 22:09:44.556452 IP6 2001:db8:ffff::1 > 2001:db8:ffff:1::1: frag (0|1232) ESP(spi=0x00000001,seq=0xdd), length 1232 > 22:09:44.566269 IP6 2001:db8:ffff::1 > 2001:db8:ffff:1::1: frag (1232|100) > 22:09:44.566553 IP6 2001:db8:ffff::1 > 2001:db8:ffff:1::1: ESP(spi=0x00000001,seq=0xde), length 276 > > I.e. the ping is fragmented into two ESP packets and the first ESP packet is then fragmented again. It's a bit ugly, but I don't think we can do any better. We're going through the stack twice in tunnel mode. The first pass (before xfrm) we fragment according to the PMTU (adjusted to IPV6_MIN_MTU, because MTUs lower than that are illegal in IPv6). The second time (after xfrm), the first ESP packet is too big so we fragment it. This behavior is consistent with a vti device running over a network with MTU=1280 (which doesn't seem to work without my patch). In transport mode, we're only going through the stack once, so we don't see this double fragmentation. I think my patch is correct, because without it we have IPv6 dsts going around the kernel with an associated MTU smaller than IPV6_MIN_MTU.
diff --git a/net/ipv6/ip6_output.c b/net/ipv6/ip6_output.c index ff4f9ebcf7f6..8af6adb42c85 100644 --- a/net/ipv6/ip6_output.c +++ b/net/ipv6/ip6_output.c @@ -1402,8 +1402,6 @@ static int ip6_setup_cork(struct sock *sk, struct inet_cork_full *cork, if (np->frag_size) mtu = np->frag_size; } - if (mtu < IPV6_MIN_MTU) - return -EINVAL; cork->base.fragsize = mtu; cork->base.gso_size = ipc6->gso_size; cork->base.tx_flags = 0; @@ -1465,6 +1463,11 @@ static int __ip6_append_data(struct sock *sk, fragheaderlen = sizeof(struct ipv6hdr) + rt->rt6i_nfheader_len + (opt ? opt->opt_nflen : 0); + + if (mtu < fragheaderlen || + ((mtu - fragheaderlen) & ~7) + fragheaderlen < sizeof(struct frag_hdr)) + goto emsgsize; + maxfraglen = ((mtu - fragheaderlen) & ~7) + fragheaderlen - sizeof(struct frag_hdr);
Hi, Commit 749439bfac6e1a2932c582e2699f91d329658196 ("ipv6: fix udpv6 sendmsg crash caused by too small MTU") breaks PMTU for xfrm. A Packet Too Big ICMPv6 message received in response to an ESP packet will prevent all further communication through the tunnel if the reported MTU minus the ESP overhead is smaller than 1280. E.g. in a case of a tunnel-mode ESP with sha256/aes the overhead is 92 bytes. Receiving a PTB with MTU of 1371 or less will result in all further packets in the tunnel dropped. A ping through the tunnel fails with "ping: sendmsg: Invalid argument". Apparently the MTU on the xfrm route is smaller than 1280 and fails the check inside ip6_setup_cork() added by 749439bf. We found this by debugging USGv6/ipv6ready failures. Failing tests are: "Phase-2 Interoperability Test Scenario IPsec" / 5.3.11 and 5.4.11 (Tunnel Mode: Fragmentation). Below is my attempt to fix the situation by dropping the MTU check and instead checking for the underflows described in the 749439bf commit message (without much understanding of the details!). Does this make sense?: Signed-off-by: Jiri Bohac <jbohac@suse.cz>