[net,0/3] net: fix netfilter defrag/ip tunnel pmtu blackhole

Message ID 20210105231523.622-1-fw@strlen.de (mailing list archive)

Message

Florian Westphal Jan. 5, 2021, 11:15 p.m. UTC
Christian Perle reported a PMTU blackhole due to unexpected interaction
between the ip defragmentation that comes with connection tracking and
ip tunnels.

Unfortunately setting 'nopmtudisc' on the tunnel breaks the test
scenario even without netfilter.

Christian's setup looks like this:
     +--------+       +---------+       +--------+
     |Router A|-------|Wanrouter|-------|Router B|
     |        |.IPIP..|         |..IPIP.|        |
     +--------+       +---------+       +--------+
          /             mtu 1400           \
         /                                  \
 +--------+                                  +--------+
 |Client A|                                  |Client B|
 +--------+                                  +--------+

MTU is 1500 everywhere, except on the Router A to Wanrouter and
Wanrouter to Router B links, where it is 1400.

Router A and Router B use IPIP tunnel interfaces to tunnel traffic
between Client A and Client B over WAN.

Client A sends a 1400 byte UDP datagram to Client B.
This packet gets encapsulated in the IPIP tunnel.

This works: the packet is received on Client B.

When conntrack (or anything else that forces ip defragmentation) is
enabled on Router A, the packet gets dropped on Router A after
encapsulation because it exceeds the link MTU.
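
The size arithmetic behind this drop can be sketched as follows. This is
an illustrative calculation, not taken from the patches. It assumes plain
IPv4 headers without options (20 bytes), an 8-byte UDP header, that
"1400 byte UDP datagram" refers to the UDP payload, and that IPIP adds
exactly one outer IPv4 header. The helper names are hypothetical.

```python
IPV4_HEADER = 20   # bytes, assuming no IP options
UDP_HEADER = 8     # bytes
LINK_MTU = 1400    # Router A <-> Wanrouter link, as shown in the diagram


def ipip_encapsulated_size(udp_payload: int) -> int:
    """Size of the outer IPv4 packet after IPIP encapsulation (hypothetical helper)."""
    inner_packet = IPV4_HEADER + UDP_HEADER + udp_payload
    # Assumption: IPIP prepends exactly one outer IPv4 header.
    return IPV4_HEADER + inner_packet


def fits_link(udp_payload: int, link_mtu: int = LINK_MTU) -> bool:
    """True if the encapsulated packet fits the link without fragmentation."""
    return ipip_encapsulated_size(udp_payload) <= link_mtu


print(ipip_encapsulated_size(1400), fits_link(1400))
```

Under these assumptions the encapsulated packet is 1448 bytes, 48 bytes
over the 1400-byte link MTU, so it can only cross the WAN link if it is
fragmented somewhere along the way.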

Setting the 'nopmtudisc' flag on the IPIP tunnel makes things worse:
no packets pass even in the no-netfilter scenario.

Patch one is a reproducer script for selftest infra.

Patch two is a fix for 'nopmtudisc' behaviour so ip_tunnel will send
an ICMP error to Client A.  This allows a 'nopmtudisc' tunnel to forward
the UDP datagrams.

Patch three enables IP refragmentation for all reassembled packets, just
like IPv6.
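
For readers unfamiliar with what refragmentation involves, here is a
minimal sketch of generic IPv4 fragmentation arithmetic. It is not the
kernel's implementation and not part of this series. The function name
and the example sizes (a 1428-byte inner packet and a 1380-byte limit,
i.e. 1400 minus one assumed 20-byte outer IPv4 header) are illustrative
assumptions.

```python
from typing import List


def ipv4_fragment_lengths(total_len: int, mtu: int, header_len: int = 20) -> List[int]:
    """Return the on-wire length of each fragment of one IPv4 packet.

    Hypothetical helper: models only the size arithmetic, assuming every
    fragment carries a header of header_len bytes (no IP options).
    """
    if total_len <= mtu:
        return [total_len]  # fits as-is, no fragmentation needed

    payload = total_len - header_len
    # Fragment offsets are expressed in units of 8 bytes, so every
    # fragment except the last must carry a multiple of 8 payload bytes.
    per_fragment = (mtu - header_len) // 8 * 8
    if per_fragment <= 0:
        raise ValueError("MTU too small to carry any payload")

    lengths = []
    while payload > 0:
        chunk = min(per_fragment, payload)
        lengths.append(header_len + chunk)
        payload -= chunk
    return lengths


print(ipv4_fragment_lengths(1428, 1380))
```

With these example numbers the packet splits into two fragments, each
small enough that adding a 20-byte outer header keeps it within a
1400-byte link MTU.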

Comments

Pablo Neira Ayuso Jan. 7, 2021, 10:14 p.m. UTC | #1

Acked-by: Pablo Neira Ayuso <pablo@netfilter.org>

Thanks.

Jakub Kicinski Jan. 7, 2021, 10:45 p.m. UTC | #2
On Thu, 7 Jan 2021 23:14:03 +0100 Pablo Neira Ayuso wrote:
> Acked-by: Pablo Neira Ayuso <pablo@netfilter.org>

Applied, thanks!