Message ID | 20210615110709.541499-1-toke@redhat.com (mailing list archive) |
---|---|
State | Superseded |
Delegated to: | Netdev Maintainers |
Headers | show |
Series | [net] icmp: don't send out ICMP messages with a source address of 0.0.0.0 | expand |
Context | Check | Description |
---|---|---|
netdev/cover_letter | success | Link |
netdev/fixes_present | success | Link |
netdev/patch_count | success | Link |
netdev/tree_selection | success | Clearly marked for net |
netdev/subject_prefix | success | Link |
netdev/cc_maintainers | success | CCed 5 of 5 maintainers |
netdev/source_inline | success | Was 0 now: 0 |
netdev/verify_signedoff | success | Link |
netdev/module_param | success | Was 0 now: 0 |
netdev/build_32bit | success | Errors and warnings before: 5591 this patch: 5591 |
netdev/kdoc | success | Errors and warnings before: 0 this patch: 0 |
netdev/verify_fixes | success | Link |
netdev/checkpatch | warning | CHECK: No space is necessary after a cast WARNING: Prefer 'unsigned long' over 'unsigned long int' as the int is unnecessary WARNING: Unnecessary typecast of c90 int constant - '(unsigned long int) 0xc0000008' could be '0xc0000008UL' |
netdev/build_allmodconfig_warn | success | Errors and warnings before: 5656 this patch: 5656 |
netdev/header_inline | success | Link |
On 6/15/21 5:07 AM, Toke Høiland-Jørgensen wrote: > When constructing ICMP response messages, the kernel will try to pick a > suitable source address for the outgoing packet. However, if no IPv4 > addresses are configured on the system at all, this will fail and we end up > producing an ICMP message with a source address of 0.0.0.0. This can happen > on a box routing IPv4 traffic via v6 nexthops, for instance. > > Since 0.0.0.0 is not generally routable on the internet, there's a good > chance that such ICMP messages will never make it back to the sender of the > original packet that the ICMP message was sent in response to. This, in > turn, can create connectivity and PMTUd problems for senders. Fortunately, > RFC7600 reserves a dummy address to be used as a source for ICMP > messages (192.0.0.8/32), so let's teach the kernel to substitute that > address as a last resort if the regular source address selection procedure > fails. > > Below is a quick example reproducing this issue with network namespaces: > > ip netns add ns0 > ip l add type veth peer netns ns0 > ip l set dev veth0 up > ip a add 10.0.0.1/24 dev veth0 > ip a add fc00:dead:cafe:42::1/64 dev veth0 > ip r add 10.1.0.0/24 via inet6 fc00:dead:cafe:42::2 > ip -n ns0 l set dev veth0 up > ip -n ns0 a add fc00:dead:cafe:42::2/64 dev veth0 > ip -n ns0 r add 10.0.0.0/24 via inet6 fc00:dead:cafe:42::1 > ip netns exec ns0 sysctl -w net.ipv4.icmp_ratelimit=0 > ip netns exec ns0 sysctl -w net.ipv4.ip_forward=1 > tcpdump -tpni veth0 -c 2 icmp & > ping -w 1 10.1.0.1 > /dev/null > tcpdump: verbose output suppressed, use -v[v]... for full protocol decode > listening on veth0, link-type EN10MB (Ethernet), snapshot length 262144 bytes > IP 10.0.0.1 > 10.1.0.1: ICMP echo request, id 29, seq 1, length 64 > IP 0.0.0.0 > 10.0.0.1: ICMP net 10.1.0.1 unreachable, length 92 > 2 packets captured > 2 packets received by filter > 0 packets dropped by kernel > > With this patch the above capture changes to: > IP 10.0.0.1 > 10.1.0.1: ICMP echo request, id 31127, seq 1, length 64 > IP 192.0.0.8 > 10.0.0.1: ICMP net 10.1.0.1 unreachable, length 92 We should capture this use case in a test script. There is already an icmp_redirect.sh; how about starting an icmp.sh script? > > Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2") This should be the one that allows IPv6 nexthops with IPv4 routes. Fixes: d15662682db2 ("ipv4: Allow ipv6 gateway with ipv4 routes") > Reported-by: Juliusz Chroboczek <jch@irif.fr> > Signed-off-by: Toke Høiland-Jørgensen <toke@redhat.com> > --- > include/uapi/linux/in.h | 3 +++ > net/ipv4/icmp.c | 7 +++++++ > 2 files changed, 10 insertions(+) > > diff --git a/include/uapi/linux/in.h b/include/uapi/linux/in.h > index 7d6687618d80..d1b327036ae4 100644 > --- a/include/uapi/linux/in.h > +++ b/include/uapi/linux/in.h > @@ -289,6 +289,9 @@ struct sockaddr_in { > /* Address indicating an error return. */ > #define INADDR_NONE ((unsigned long int) 0xffffffff) > > +/* Dummy address for src of ICMP replies if no real address is set (RFC7600). */ > +#define INADDR_DUMMY ((unsigned long int) 0xc0000008) > + > /* Network number for local host loopback. */ > #define IN_LOOPBACKNET 127 > > diff --git a/net/ipv4/icmp.c b/net/ipv4/icmp.c > index 7b6931a4d775..752e392083e6 100644 > --- a/net/ipv4/icmp.c > +++ b/net/ipv4/icmp.c > @@ -759,6 +759,13 @@ void __icmp_send(struct sk_buff *skb_in, int type, int code, __be32 info, > icmp_param.data_len = room; > icmp_param.head_len = sizeof(struct icmphdr); > > + /* if we don't have a source address at this point, fall back to the > + * dummy address instead of sending out a packet with a source address > + * of 0.0.0.0 > + */ > + if (!fl4.saddr) > + fl4.saddr = htonl(INADDR_DUMMY); > + > icmp_push_reply(&icmp_param, &fl4, &ipc, &rt); > ende: > ip_rt_put(rt); > Reviewed-by: David Ahern <dsahern@kernel.org>
On 6/17/21 9:06 AM, David Ahern wrote: >> Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2") > This should be the one that allows IPv6 nexthops with IPv4 routes. > > Fixes: d15662682db2 ("ipv4: Allow ipv6 gateway with ipv4 routes") > on further thought, this is the receiving / transit path and not the sending node, so my change to allow v6 gw with v4 routes is not relevant.
David Ahern <dsahern@gmail.com> writes: > On 6/17/21 9:06 AM, David Ahern wrote: >>> Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2") >> This should be the one that allows IPv6 nexthops with IPv4 routes. >> >> Fixes: d15662682db2 ("ipv4: Allow ipv6 gateway with ipv4 routes") >> > > on further thought, this is the receiving / transit path and not the > sending node, so my change to allow v6 gw with v4 routes is not relevant. Right, so you're OK with keeping the patch as-is, then? I can send a test-case as a follow-up... -Toke
diff --git a/include/uapi/linux/in.h b/include/uapi/linux/in.h index 7d6687618d80..d1b327036ae4 100644 --- a/include/uapi/linux/in.h +++ b/include/uapi/linux/in.h @@ -289,6 +289,9 @@ struct sockaddr_in { /* Address indicating an error return. */ #define INADDR_NONE ((unsigned long int) 0xffffffff) +/* Dummy address for src of ICMP replies if no real address is set (RFC7600). */ +#define INADDR_DUMMY ((unsigned long int) 0xc0000008) + /* Network number for local host loopback. */ #define IN_LOOPBACKNET 127 diff --git a/net/ipv4/icmp.c b/net/ipv4/icmp.c index 7b6931a4d775..752e392083e6 100644 --- a/net/ipv4/icmp.c +++ b/net/ipv4/icmp.c @@ -759,6 +759,13 @@ void __icmp_send(struct sk_buff *skb_in, int type, int code, __be32 info, icmp_param.data_len = room; icmp_param.head_len = sizeof(struct icmphdr); + /* if we don't have a source address at this point, fall back to the + * dummy address instead of sending out a packet with a source address + * of 0.0.0.0 + */ + if (!fl4.saddr) + fl4.saddr = htonl(INADDR_DUMMY); + icmp_push_reply(&icmp_param, &fl4, &ipc, &rt); ende: ip_rt_put(rt);
When constructing ICMP response messages, the kernel will try to pick a suitable source address for the outgoing packet. However, if no IPv4 addresses are configured on the system at all, this will fail and we end up producing an ICMP message with a source address of 0.0.0.0. This can happen on a box routing IPv4 traffic via v6 nexthops, for instance. Since 0.0.0.0 is not generally routable on the internet, there's a good chance that such ICMP messages will never make it back to the sender of the original packet that the ICMP message was sent in response to. This, in turn, can create connectivity and PMTUd problems for senders. Fortunately, RFC7600 reserves a dummy address to be used as a source for ICMP messages (192.0.0.8/32), so let's teach the kernel to substitute that address as a last resort if the regular source address selection procedure fails. Below is a quick example reproducing this issue with network namespaces: ip netns add ns0 ip l add type veth peer netns ns0 ip l set dev veth0 up ip a add 10.0.0.1/24 dev veth0 ip a add fc00:dead:cafe:42::1/64 dev veth0 ip r add 10.1.0.0/24 via inet6 fc00:dead:cafe:42::2 ip -n ns0 l set dev veth0 up ip -n ns0 a add fc00:dead:cafe:42::2/64 dev veth0 ip -n ns0 r add 10.0.0.0/24 via inet6 fc00:dead:cafe:42::1 ip netns exec ns0 sysctl -w net.ipv4.icmp_ratelimit=0 ip netns exec ns0 sysctl -w net.ipv4.ip_forward=1 tcpdump -tpni veth0 -c 2 icmp & ping -w 1 10.1.0.1 > /dev/null tcpdump: verbose output suppressed, use -v[v]... for full protocol decode listening on veth0, link-type EN10MB (Ethernet), snapshot length 262144 bytes IP 10.0.0.1 > 10.1.0.1: ICMP echo request, id 29, seq 1, length 64 IP 0.0.0.0 > 10.0.0.1: ICMP net 10.1.0.1 unreachable, length 92 2 packets captured 2 packets received by filter 0 packets dropped by kernel With this patch the above capture changes to: IP 10.0.0.1 > 10.1.0.1: ICMP echo request, id 31127, seq 1, length 64 IP 192.0.0.8 > 10.0.0.1: ICMP net 10.1.0.1 unreachable, length 92 Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2") Reported-by: Juliusz Chroboczek <jch@irif.fr> Signed-off-by: Toke Høiland-Jørgensen <toke@redhat.com> --- include/uapi/linux/in.h | 3 +++ net/ipv4/icmp.c | 7 +++++++ 2 files changed, 10 insertions(+)