Message ID | 20220114174058.rqhtuwpfhq6czldn@dwarf.suse.cz (mailing list archive) |
---|---|
State | Awaiting Upstream |
Delegated to: | Netdev Maintainers |
Headers | show |
Series | xfrm: fix MTU regression | expand |
On Fri, Jan 14, 2022 at 06:40:58PM +0100, Jiri Bohac wrote: > Commit 749439bfac6e1a2932c582e2699f91d329658196 ("ipv6: fix udpv6 > sendmsg crash caused by too small MTU") breaks PMTU for xfrm. > > A Packet Too Big ICMPv6 message received in response to an ESP > packet will prevent all further communication through the tunnel > if the reported MTU minus the ESP overhead is smaller than 1280. > > E.g. in a case of a tunnel-mode ESP with sha256/aes the overhead > is 92 bytes. Receiving a PTB with MTU of 1371 or less will result > in all further packets in the tunnel dropped. A ping through the > tunnel fails with "ping: sendmsg: Invalid argument". > > Apparently the MTU on the xfrm route is smaller than 1280 and > fails the check inside ip6_setup_cork() added by 749439bf. > > We found this by debugging USGv6/ipv6ready failures. Failing > tests are: "Phase-2 Interoperability Test Scenario IPsec" / > 5.3.11 and 5.4.11 (Tunnel Mode: Fragmentation). > > Commit b515d2637276a3810d6595e10ab02c13bfd0b63a ("xfrm: > xfrm_state_mtu should return at least 1280 for ipv6") attempted > to fix this but caused another regression in TCP MSS calculations > and had to be reverted. > > The patch below fixes the situation by dropping the MTU > check and instead checking for the underflows described in the > 749439bf commit message. > > Signed-off-by: Jiri Bohac <jbohac@suse.cz> Can you please add a 'Fixes:' tag so that it can be backported to the stable trees? Btw. this fixes a xfrm issue, but touches only generic IPv6 code. To which tree should this patch be applied? I can take it to the ipsec tee, but would also be ok if it is applied directly to the net tree.
On Wed, Jan 19, 2022 at 08:35:19AM +0100, Steffen Klassert wrote: > Can you please add a 'Fixes:' tag so that it can be backported > to the stable trees? sure; I'll send a v2 with added Fixes: for the original regression (749439bf), which will reappear once b515d263 (which causes the current regression) is reverted. Note this patch needs to be accompanied by the revert! > Btw. this fixes a xfrm issue, but touches only generic IPv6 code. > To which tree should this patch be applied? I can take it to > the ipsec tee, but would also be ok if it is applied directly > to the net tree. b515d263 touches xfrm code; but being a regression maybe we want the fastest track possible? Thanks,
Commit 749439bfac6e1a2932c582e2699f91d329658196 ("ipv6: fix udpv6
sendmsg crash caused by too small MTU") breaks PMTU for xfrm.
A Packet Too Big ICMPv6 message received in response to an ESP
packet will prevent all further communication through the tunnel
if the reported MTU minus the ESP overhead is smaller than 1280.
E.g. in a case of a tunnel-mode ESP with sha256/aes the overhead
is 92 bytes. Receiving a PTB with MTU of 1371 or less will result
in all further packets in the tunnel dropped. A ping through the
tunnel fails with "ping: sendmsg: Invalid argument".
Apparently the MTU on the xfrm route is smaller than 1280 and
fails the check inside ip6_setup_cork() added by 749439bf.
We found this by debugging USGv6/ipv6ready failures. Failing
tests are: "Phase-2 Interoperability Test Scenario IPsec" /
5.3.11 and 5.4.11 (Tunnel Mode: Fragmentation).
Commit b515d2637276a3810d6595e10ab02c13bfd0b63a ("xfrm:
xfrm_state_mtu should return at least 1280 for ipv6") attempted
to fix this but caused another regression in TCP MSS calculations
and had to be reverted.
The patch below fixes the situation by dropping the MTU
check and instead checking for the underflows described in the
749439bf commit message.
Signed-off-by: Jiri Bohac <jbohac@suse.cz>
Fixes: 749439bfac6e ("ipv6: fix udpv6 sendmsg crash caused by too small MTU")
diff --git a/net/ipv6/ip6_output.c b/net/ipv6/ip6_output.c
index ff4f9ebcf7f6..171eb4ec1e67 100644
--- a/net/ipv6/ip6_output.c
+++ b/net/ipv6/ip6_output.c
@@ -1402,8 +1402,6 @@ static int ip6_setup_cork(struct sock *sk, struct inet_cork_full *cork,
if (np->frag_size)
mtu = np->frag_size;
}
- if (mtu < IPV6_MIN_MTU)
- return -EINVAL;
cork->base.fragsize = mtu;
cork->base.gso_size = ipc6->gso_size;
cork->base.tx_flags = 0;
@@ -1465,8 +1463,6 @@ static int __ip6_append_data(struct sock *sk,
fragheaderlen = sizeof(struct ipv6hdr) + rt->rt6i_nfheader_len +
(opt ? opt->opt_nflen : 0);
- maxfraglen = ((mtu - fragheaderlen) & ~7) + fragheaderlen -
- sizeof(struct frag_hdr);
headersize = sizeof(struct ipv6hdr) +
(opt ? opt->opt_flen + opt->opt_nflen : 0) +
@@ -1474,6 +1470,13 @@ static int __ip6_append_data(struct sock *sk,
sizeof(struct frag_hdr) : 0) +
rt->rt6i_nfheader_len;
+ if (mtu < fragheaderlen ||
+ ((mtu - fragheaderlen) & ~7) + fragheaderlen < sizeof(struct frag_hdr))
+ goto emsgsize;
+
+ maxfraglen = ((mtu - fragheaderlen) & ~7) + fragheaderlen -
+ sizeof(struct frag_hdr);
+
/* as per RFC 7112 section 5, the entire IPv6 Header Chain must fit
* the first fragment
*/
On Wed, Jan 19, 2022 at 10:12:33AM +0100, Jiri Bohac wrote: > On Wed, Jan 19, 2022 at 08:35:19AM +0100, Steffen Klassert wrote: > > Can you please add a 'Fixes:' tag so that it can be backported > > to the stable trees? > > sure; I'll send a v2 with added Fixes: for the original > regression (749439bf), which will reappear once b515d263 (which > causes the current regression) is reverted. Note this patch needs > to be accompanied by the revert! > > > Btw. this fixes a xfrm issue, but touches only generic IPv6 code. > > To which tree should this patch be applied? I can take it to > > the ipsec tee, but would also be ok if it is applied directly > > to the net tree. > > b515d263 touches xfrm code; but being a regression maybe we want > the fastest track possible? The patch is already marked as 'awaiting upstream' in patchwork, so I'll take it into the ipsec tree.
On Mon, Jan 24, 2022 at 04:45:31PM +0100, Steffen Klassert wrote: > > sure; I'll send a v2 with added Fixes: for the original > > regression (749439bf), which will reappear once b515d263 (which > > causes the current regression) is reverted. Note this patch needs > > to be accompanied by the revert! > > > > > Btw. this fixes a xfrm issue, but touches only generic IPv6 code. > > > To which tree should this patch be applied? I can take it to > > > the ipsec tee, but would also be ok if it is applied directly > > > to the net tree. > > > > b515d263 touches xfrm code; but being a regression maybe we want > > the fastest track possible? > > The patch is already marked as 'awaiting upstream' in patchwork, > so I'll take it into the ipsec tree. OK, thanks! Will you also revert b515d263 in the ipsec tree?
On Wed, Jan 19, 2022 at 10:22:53AM +0100, Jiri Bohac wrote: > Commit 749439bfac6e1a2932c582e2699f91d329658196 ("ipv6: fix udpv6 > sendmsg crash caused by too small MTU") breaks PMTU for xfrm. > > A Packet Too Big ICMPv6 message received in response to an ESP > packet will prevent all further communication through the tunnel > if the reported MTU minus the ESP overhead is smaller than 1280. > > E.g. in a case of a tunnel-mode ESP with sha256/aes the overhead > is 92 bytes. Receiving a PTB with MTU of 1371 or less will result > in all further packets in the tunnel dropped. A ping through the > tunnel fails with "ping: sendmsg: Invalid argument". > > Apparently the MTU on the xfrm route is smaller than 1280 and > fails the check inside ip6_setup_cork() added by 749439bf. > > We found this by debugging USGv6/ipv6ready failures. Failing > tests are: "Phase-2 Interoperability Test Scenario IPsec" / > 5.3.11 and 5.4.11 (Tunnel Mode: Fragmentation). > > Commit b515d2637276a3810d6595e10ab02c13bfd0b63a ("xfrm: > xfrm_state_mtu should return at least 1280 for ipv6") attempted > to fix this but caused another regression in TCP MSS calculations > and had to be reverted. > > The patch below fixes the situation by dropping the MTU > check and instead checking for the underflows described in the > 749439bf commit message. > > Signed-off-by: Jiri Bohac <jbohac@suse.cz> > Fixes: 749439bfac6e ("ipv6: fix udpv6 sendmsg crash caused by too small MTU") Applied to the ipsec tree, thanks Jiri!
On Tue, Jan 25, 2022 at 10:41:02AM +0100, Jiri Bohac wrote: > On Mon, Jan 24, 2022 at 04:45:31PM +0100, Steffen Klassert wrote: > > > sure; I'll send a v2 with added Fixes: for the original > > > regression (749439bf), which will reappear once b515d263 (which > > > causes the current regression) is reverted. Note this patch needs > > > to be accompanied by the revert! > > > > > > > Btw. this fixes a xfrm issue, but touches only generic IPv6 code. > > > > To which tree should this patch be applied? I can take it to > > > > the ipsec tee, but would also be ok if it is applied directly > > > > to the net tree. > > > > > > b515d263 touches xfrm code; but being a regression maybe we want > > > the fastest track possible? > > > > The patch is already marked as 'awaiting upstream' in patchwork, > > so I'll take it into the ipsec tree. > > OK, thanks! Will you also revert b515d263 in the ipsec tree? You need to send a patch if you want to have this reverted.
This reverts commit b515d2637276a3810d6595e10ab02c13bfd0b63a.
Commit b515d2637276a3810d6595e10ab02c13bfd0b63a ("xfrm: xfrm_state_mtu
should return at least 1280 for ipv6") in v5.14 breaks the TCP MSS
calculation in ipsec transport mode, resulting complete stalls of TCP
connections. This happens when the (P)MTU is 1280 or slighly larger.
The desired formula for the MSS is:
MSS = (MTU - ESP_overhead) - IP header - TCP header
However, the above commit clamps the (MTU - ESP_overhead) to a
minimum of 1280, turning the formula into
MSS = max(MTU - ESP overhead, 1280) - IP header - TCP header
With the (P)MTU near 1280, the calculated MSS is too large and the
resulting TCP packets never make it to the destination because they
are over the actual PMTU.
The above commit also causes suboptimal double fragmentation in
xfrm tunnel mode, as described in
https://lore.kernel.org/netdev/20210429202529.codhwpc7w6kbudug@dwarf.suse.cz/
The original problem the above commit was trying to fix is now fixed
by commit 6596a0229541270fb8d38d989f91b78838e5e9da ("xfrm: fix MTU
regression").
Signed-off-by: Jiri Bohac <jbohac@suse.cz>
---
include/net/xfrm.h | 1 -
net/ipv4/esp4.c | 2 +-
net/ipv6/esp6.c | 2 +-
net/xfrm/xfrm_state.c | 14 ++------------
4 files changed, 4 insertions(+), 15 deletions(-)
diff --git a/include/net/xfrm.h b/include/net/xfrm.h
index fdb41e8bb626..d78a294e1d0c 100644
--- a/include/net/xfrm.h
+++ b/include/net/xfrm.h
@@ -1568,7 +1568,6 @@ void xfrm_sad_getinfo(struct net *net, struct xfrmk_sadinfo *si);
void xfrm_spd_getinfo(struct net *net, struct xfrmk_spdinfo *si);
u32 xfrm_replay_seqhi(struct xfrm_state *x, __be32 net_seq);
int xfrm_init_replay(struct xfrm_state *x);
-u32 __xfrm_state_mtu(struct xfrm_state *x, int mtu);
u32 xfrm_state_mtu(struct xfrm_state *x, int mtu);
int __xfrm_init_state(struct xfrm_state *x, bool init_replay, bool offload);
int xfrm_init_state(struct xfrm_state *x);
diff --git a/net/ipv4/esp4.c b/net/ipv4/esp4.c
index 851f542928a3..e1b1d080e908 100644
--- a/net/ipv4/esp4.c
+++ b/net/ipv4/esp4.c
@@ -671,7 +671,7 @@ static int esp_output(struct xfrm_state *x, struct sk_buff *skb)
struct xfrm_dst *dst = (struct xfrm_dst *)skb_dst(skb);
u32 padto;
- padto = min(x->tfcpad, __xfrm_state_mtu(x, dst->child_mtu_cached));
+ padto = min(x->tfcpad, xfrm_state_mtu(x, dst->child_mtu_cached));
if (skb->len < padto)
esp.tfclen = padto - skb->len;
}
diff --git a/net/ipv6/esp6.c b/net/ipv6/esp6.c
index 8bb2c407b46b..7591160edce1 100644
--- a/net/ipv6/esp6.c
+++ b/net/ipv6/esp6.c
@@ -707,7 +707,7 @@ static int esp6_output(struct xfrm_state *x, struct sk_buff *skb)
struct xfrm_dst *dst = (struct xfrm_dst *)skb_dst(skb);
u32 padto;
- padto = min(x->tfcpad, __xfrm_state_mtu(x, dst->child_mtu_cached));
+ padto = min(x->tfcpad, xfrm_state_mtu(x, dst->child_mtu_cached));
if (skb->len < padto)
esp.tfclen = padto - skb->len;
}
diff --git a/net/xfrm/xfrm_state.c b/net/xfrm/xfrm_state.c
index ca6bee18346d..dde82f8eb949 100644
--- a/net/xfrm/xfrm_state.c
+++ b/net/xfrm/xfrm_state.c
@@ -2572,7 +2572,7 @@ void xfrm_state_delete_tunnel(struct xfrm_state *x)
}
EXPORT_SYMBOL(xfrm_state_delete_tunnel);
-u32 __xfrm_state_mtu(struct xfrm_state *x, int mtu)
+u32 xfrm_state_mtu(struct xfrm_state *x, int mtu)
{
const struct xfrm_type *type = READ_ONCE(x->type);
struct crypto_aead *aead;
@@ -2603,17 +2603,7 @@ u32 __xfrm_state_mtu(struct xfrm_state *x, int mtu)
return ((mtu - x->props.header_len - crypto_aead_authsize(aead) -
net_adj) & ~(blksize - 1)) + net_adj - 2;
}
-EXPORT_SYMBOL_GPL(__xfrm_state_mtu);
-
-u32 xfrm_state_mtu(struct xfrm_state *x, int mtu)
-{
- mtu = __xfrm_state_mtu(x, mtu);
-
- if (x->props.family == AF_INET6 && mtu < IPV6_MIN_MTU)
- return IPV6_MIN_MTU;
-
- return mtu;
-}
+EXPORT_SYMBOL_GPL(xfrm_state_mtu);
int __xfrm_init_state(struct xfrm_state *x, bool init_replay, bool offload)
{
On Wed, Jan 26, 2022 at 04:00:18PM +0100, Jiri Bohac wrote: > This reverts commit b515d2637276a3810d6595e10ab02c13bfd0b63a. > > Commit b515d2637276a3810d6595e10ab02c13bfd0b63a ("xfrm: xfrm_state_mtu > should return at least 1280 for ipv6") in v5.14 breaks the TCP MSS > calculation in ipsec transport mode, resulting complete stalls of TCP > connections. This happens when the (P)MTU is 1280 or slighly larger. > > The desired formula for the MSS is: > MSS = (MTU - ESP_overhead) - IP header - TCP header > > However, the above commit clamps the (MTU - ESP_overhead) to a > minimum of 1280, turning the formula into > MSS = max(MTU - ESP overhead, 1280) - IP header - TCP header > > With the (P)MTU near 1280, the calculated MSS is too large and the > resulting TCP packets never make it to the destination because they > are over the actual PMTU. > > The above commit also causes suboptimal double fragmentation in > xfrm tunnel mode, as described in > https://lore.kernel.org/netdev/20210429202529.codhwpc7w6kbudug@dwarf.suse.cz/ > > The original problem the above commit was trying to fix is now fixed > by commit 6596a0229541270fb8d38d989f91b78838e5e9da ("xfrm: fix MTU > regression"). > > Signed-off-by: Jiri Bohac <jbohac@suse.cz> Applied, thanks Jiri!
Hi, this is your Linux kernel regression tracker speaking. The patch discussed below is now in linux-next for about 11 days, but not yet in the net tree afaics. Will it be merged this week? And shouldn't this patch have these stables tags? Cc: <stable@vger.kernel.org> # 5.14: 6596a0229541 xfrm: fix MTU regression Cc: <stable@vger.kernel.org> # 5.14 And maybe a fixes tag, too? Ciao, Thorsten (wearing his 'the Linux kernel's regression tracker' hat) P.S.: As the Linux kernel's regression tracker I'm getting a lot of reports on my table. I can only look briefly into most of them and lack knowledge about most of the areas they concern. I thus unfortunately will sometimes get things wrong or miss something important. I hope that's not the case here; if you think it is, don't hesitate to tell me in a public reply, it's in everyone's interest to set the public record straight. #regzbot poke On 01.02.22 07:46, Steffen Klassert wrote: > On Wed, Jan 26, 2022 at 04:00:18PM +0100, Jiri Bohac wrote: >> This reverts commit b515d2637276a3810d6595e10ab02c13bfd0b63a. >> >> Commit b515d2637276a3810d6595e10ab02c13bfd0b63a ("xfrm: xfrm_state_mtu >> should return at least 1280 for ipv6") in v5.14 breaks the TCP MSS >> calculation in ipsec transport mode, resulting complete stalls of TCP >> connections. This happens when the (P)MTU is 1280 or slighly larger. >> >> The desired formula for the MSS is: >> MSS = (MTU - ESP_overhead) - IP header - TCP header >> >> However, the above commit clamps the (MTU - ESP_overhead) to a >> minimum of 1280, turning the formula into >> MSS = max(MTU - ESP overhead, 1280) - IP header - TCP header >> >> With the (P)MTU near 1280, the calculated MSS is too large and the >> resulting TCP packets never make it to the destination because they >> are over the actual PMTU. >> >> The above commit also causes suboptimal double fragmentation in >> xfrm tunnel mode, as described in >> https://lore.kernel.org/netdev/20210429202529.codhwpc7w6kbudug@dwarf.suse.cz/ >> >> The original problem the above commit was trying to fix is now fixed >> by commit 6596a0229541270fb8d38d989f91b78838e5e9da ("xfrm: fix MTU >> regression"). >> >> Signed-off-by: Jiri Bohac <jbohac@suse.cz> > > Applied, thanks Jiri!
On Tue, Feb 15, 2022 at 03:59:38PM +0100, Thorsten Leemhuis wrote: > Hi, this is your Linux kernel regression tracker speaking. > > The patch discussed below is now in linux-next for about 11 days, but > not yet in the net tree afaics. Will it be merged this week? And It will be merged with the next pull request for the ipsec tree that will happen likely next week.
diff --git a/net/ipv6/ip6_output.c b/net/ipv6/ip6_output.c index ff4f9ebcf7f6..171eb4ec1e67 100644 --- a/net/ipv6/ip6_output.c +++ b/net/ipv6/ip6_output.c @@ -1402,8 +1402,6 @@ static int ip6_setup_cork(struct sock *sk, struct inet_cork_full *cork, if (np->frag_size) mtu = np->frag_size; } - if (mtu < IPV6_MIN_MTU) - return -EINVAL; cork->base.fragsize = mtu; cork->base.gso_size = ipc6->gso_size; cork->base.tx_flags = 0; @@ -1465,8 +1463,6 @@ static int __ip6_append_data(struct sock *sk, fragheaderlen = sizeof(struct ipv6hdr) + rt->rt6i_nfheader_len + (opt ? opt->opt_nflen : 0); - maxfraglen = ((mtu - fragheaderlen) & ~7) + fragheaderlen - - sizeof(struct frag_hdr); headersize = sizeof(struct ipv6hdr) + (opt ? opt->opt_flen + opt->opt_nflen : 0) + @@ -1474,6 +1470,13 @@ static int __ip6_append_data(struct sock *sk, sizeof(struct frag_hdr) : 0) + rt->rt6i_nfheader_len; + if (mtu < fragheaderlen || + ((mtu - fragheaderlen) & ~7) + fragheaderlen < sizeof(struct frag_hdr)) + goto emsgsize; + + maxfraglen = ((mtu - fragheaderlen) & ~7) + fragheaderlen - + sizeof(struct frag_hdr); + /* as per RFC 7112 section 5, the entire IPv6 Header Chain must fit * the first fragment */
Commit 749439bfac6e1a2932c582e2699f91d329658196 ("ipv6: fix udpv6 sendmsg crash caused by too small MTU") breaks PMTU for xfrm. A Packet Too Big ICMPv6 message received in response to an ESP packet will prevent all further communication through the tunnel if the reported MTU minus the ESP overhead is smaller than 1280. E.g. in a case of a tunnel-mode ESP with sha256/aes the overhead is 92 bytes. Receiving a PTB with MTU of 1371 or less will result in all further packets in the tunnel dropped. A ping through the tunnel fails with "ping: sendmsg: Invalid argument". Apparently the MTU on the xfrm route is smaller than 1280 and fails the check inside ip6_setup_cork() added by 749439bf. We found this by debugging USGv6/ipv6ready failures. Failing tests are: "Phase-2 Interoperability Test Scenario IPsec" / 5.3.11 and 5.4.11 (Tunnel Mode: Fragmentation). Commit b515d2637276a3810d6595e10ab02c13bfd0b63a ("xfrm: xfrm_state_mtu should return at least 1280 for ipv6") attempted to fix this but caused another regression in TCP MSS calculations and had to be reverted. The patch below fixes the situation by dropping the MTU check and instead checking for the underflows described in the 749439bf commit message. Signed-off-by: Jiri Bohac <jbohac@suse.cz>