
[net,3/3] net: ip: always refragment ip defragmented packets

Message ID 20210105231523.622-4-fw@strlen.de (mailing list archive)
State Accepted
Delegated to: Netdev Maintainers
Series net: fix netfilter defrag/ip tunnel pmtu blackhole

Checks

Context Check Description
netdev/cover_letter success Link
netdev/fixes_present success Link
netdev/patch_count success Link
netdev/tree_selection success Clearly marked for net
netdev/subject_prefix success Link
netdev/cc_maintainers warning 5 maintainers not CCed: hannes@stressinduktion.org kuba@kernel.org yoshfuji@linux-ipv6.org davem@davemloft.net kuznet@ms2.inr.ac.ru
netdev/source_inline success Was 0 now: 0
netdev/verify_signedoff success Link
netdev/module_param success Was 0 now: 0
netdev/build_32bit success Errors and warnings before: 3 this patch: 3
netdev/kdoc success Errors and warnings before: 0 this patch: 0
netdev/verify_fixes success Link
netdev/checkpatch success total: 0 errors, 0 warnings, 0 checks, 8 lines checked
netdev/build_allmodconfig_warn success Errors and warnings before: 3 this patch: 3
netdev/header_inline success Link
netdev/stable success Stable not CCed

Commit Message

Florian Westphal Jan. 5, 2021, 11:15 p.m. UTC
Conntrack reassembly records the largest fragment size seen in IPCB.
However, when the reassembled packet gets forwarded/transmitted,
fragmentation will only be forced if one of the original fragments had
the DF bit set.

In that case, a flag in IPCB will force fragmentation even if the
MTU is large enough.

This normally works fine, but it breaks with ip tunnels.
Consider a client that sends a UDP datagram of size X to another host.

The client fragments the datagram, so two packets, of size y and z, are
sent. The DF bit is not set on either of these packets.

Netfilter on the middlebox reassembles those packets back into a single
size-X packet before the routing decision.

The packet-size-vs-mtu checks in ip_forward are irrelevant, because the
DF bit isn't set.  At output time, ip refragmentation is skipped as well
because X is still smaller than the mtu of the output device.

If the transmit device is an ip tunnel, the packet size increases to
X+overhead.

Also, the tunnel might be configured to force the DF bit on the outer
header.

In this case, the packet will be dropped (exceeds MTU) and an ICMP error
is sent back to the sender.

But the sender already respects the announced MTU: all the packets that
it sent did fit the announced mtu.
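
The size arithmetic of this scenario can be sketched in a few lines of C.
This is only an illustration: the MTU values, the 20-byte overhead (one
outer IPv4 header, as for plain IPIP) and both helper names are
assumptions made for the example and do not come from the report or from
kernel code.

```c
#include <stdbool.h>

/* Hypothetical numbers, chosen only to illustrate the scenario. */
enum {
	TUNNEL_DEV_MTU  = 1500, /* assumed mtu of the tunnel output device */
	UNDERLAY_MTU    = 1500, /* assumed mtu of the link carrying the tunnel */
	TUNNEL_OVERHEAD = 20,   /* assumed overhead: one outer IPv4 header */
};

/* Output-time size check against the tunnel device: true means the
 * packet is handed to the tunnel without refragmentation. */
static bool passes_output_check(unsigned int inner_len)
{
	return inner_len <= TUNNEL_DEV_MTU;
}

/* Size check after encapsulation: with DF forced on the outer header,
 * a packet for which this is false is dropped. */
static bool fits_after_encap(unsigned int inner_len)
{
	return inner_len + TUNNEL_OVERHEAD <= UNDERLAY_MTU;
}
```

With these assumed values, a reassembled datagram of 1490 bytes passes
the output check but no longer fits once encapsulated, while an original
1400-byte fragment would have fit.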

Force refragmentation as per original sizes unconditionally so ip tunnel
will encapsulate the fragments instead.

The only other solution I see is to place ip refragmentation in
the ip_tunnel code to handle this case.

Fixes: d6b915e29f4ad ("ip_fragment: don't forward defragmented DF packet")
Reported-by: Christian Perle <christian.perle@secunet.com>
Signed-off-by: Florian Westphal <fw@strlen.de>
---
 net/ipv4/ip_output.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

Comments

Christian Perle Jan. 7, 2021, 7:52 a.m. UTC | #1
Hello Florian,

On Wed, Jan 06, 2021 at 00:15:23 +0100, Florian Westphal wrote:

> Force refragmentation as per original sizes unconditionally so ip tunnel
> will encapsulate the fragments instead.
[...]
> diff --git a/net/ipv4/ip_output.c b/net/ipv4/ip_output.c
> index 89fff5f59eea..2ed0b01f72f0 100644
> --- a/net/ipv4/ip_output.c
> +++ b/net/ipv4/ip_output.c
> @@ -302,7 +302,7 @@ static int __ip_finish_output(struct net *net, struct sock *sk, struct sk_buff *
>  	if (skb_is_gso(skb))
>  		return ip_finish_output_gso(net, sk, skb, mtu);
>  
> -	if (skb->len > mtu || (IPCB(skb)->flags & IPSKB_FRAG_PMTU))
> +	if (skb->len > mtu || IPCB(skb)->frag_max_size)
>  		return ip_fragment(net, sk, skb, mtu, ip_finish_output2);
>  
>  	return ip_finish_output2(net, sk, skb);
> -- 
> 2.26.2

Did some tests yesterday and I can confirm that this patch fixes the
problem for both IPIP tunnel and XFRM tunnel interfaces.

Thanks for the fix!
  Christian Perle

Patch

diff --git a/net/ipv4/ip_output.c b/net/ipv4/ip_output.c
index 89fff5f59eea..2ed0b01f72f0 100644
--- a/net/ipv4/ip_output.c
+++ b/net/ipv4/ip_output.c
@@ -302,7 +302,7 @@  static int __ip_finish_output(struct net *net, struct sock *sk, struct sk_buff *
 	if (skb_is_gso(skb))
 		return ip_finish_output_gso(net, sk, skb, mtu);
 
-	if (skb->len > mtu || (IPCB(skb)->flags & IPSKB_FRAG_PMTU))
+	if (skb->len > mtu || IPCB(skb)->frag_max_size)
 		return ip_fragment(net, sk, skb, mtu, ip_finish_output2);
 
 	return ip_finish_output2(net, sk, skb);
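
The effect of the one-line change can be illustrated with a small
userspace model of the condition. This is a sketch, not kernel code:
struct model_pkt, MODEL_FRAG_PMTU (with an arbitrary bit value) and both
function names are hypothetical stand-ins for skb->len, the IPCB fields
and the IPSKB_FRAG_PMTU flag used in the patch.

```c
#include <stdbool.h>
#include <stdint.h>

/* Stand-in for IPSKB_FRAG_PMTU; the bit value here is arbitrary. */
#define MODEL_FRAG_PMTU (1U << 0)

/* Hypothetical model of the fields the condition looks at. */
struct model_pkt {
	unsigned int len;       /* models skb->len */
	uint16_t flags;         /* models IPCB(skb)->flags */
	uint16_t frag_max_size; /* models IPCB(skb)->frag_max_size,
				 * 0 if the packet was never reassembled */
};

/* Condition before the patch: refragment only if the packet is too
 * large or the PMTU flag was set (a fragment carried DF). */
static bool needs_fragment_old(const struct model_pkt *p, unsigned int mtu)
{
	return p->len > mtu || (p->flags & MODEL_FRAG_PMTU);
}

/* Condition after the patch: refragment every packet that was
 * reassembled from fragments, regardless of DF. */
static bool needs_fragment_new(const struct model_pkt *p, unsigned int mtu)
{
	return p->len > mtu || p->frag_max_size;
}
```

For a packet reassembled from non-DF fragments that still fits the
device mtu, the old condition is false and the new one is true; for a
packet that was never fragmented, both conditions give the same result.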