Message ID | 20241204221254.3537932-3-sbrivio@redhat.com (mailing list archive) |
---|---|
State | Rejected |
Delegated to: | Netdev Maintainers |
Series | Fix race between datagram socket address change and rehash |
Hi, On 12/4/24 23:12, Stefano Brivio wrote: > If a UDP socket changes its local address while it's receiving > datagrams, as a result of connect(), there is a period during which > a lookup operation might fail to find it, after the address is changed > but before the secondary hash (port and address) and the four-tuple > hash (local and remote ports and addresses) are updated. > > Secondary hash chains were introduced by commit 30fff9231fad ("udp: > bind() optimisation") and, as a result, a rehash operation became > needed to make a bound socket reachable again after a connect(). > > This operation was introduced by commit 719f835853a9 ("udp: add > rehash on connect()") which isn't however a complete fix: the > socket will be found once the rehashing completes, but not while > it's pending. > > This is noticeable with a socat(1) server in UDP4-LISTEN mode, and a > client sending datagrams to it. After the server receives the first > datagram (cf. _xioopen_ipdgram_listen()), it issues a connect() to > the address of the sender, in order to set up a directed flow. 
> > Now, if the client, running on a different CPU thread, happens to > send a (subsequent) datagram while the server's socket changes its > address, but is not rehashed yet, this will result in a failed > lookup and a port unreachable error delivered to the client, as > apparent from the following reproducer: > > LEN=$(($(cat /proc/sys/net/core/wmem_default) / 4)) > dd if=/dev/urandom bs=1 count=${LEN} of=tmp.in > > while :; do > taskset -c 1 socat UDP4-LISTEN:1337,null-eof OPEN:tmp.out,create,trunc & > sleep 0.1 || sleep 1 > taskset -c 2 socat OPEN:tmp.in UDP4:localhost:1337,shut-null > wait > done > > where the client will eventually get ECONNREFUSED on a write() > (typically the second or third one of a given iteration): > > 2024/11/13 21:28:23 socat[46901] E write(6, 0x556db2e3c000, 8192): Connection refused > > This issue was first observed as a seldom failure in Podman's tests > checking UDP functionality while using pasta(1) to connect the > container's network namespace, which leads us to a reproducer with > the lookup error resulting in an ICMP packet on a tap device: > > LOCAL_ADDR="$(ip -j -4 addr show|jq -rM '.[] | .addr_info[0] | select(.scope == "global").local')" > > while :; do > ./pasta --config-net -p pasta.pcap -u 1337 socat UDP4-LISTEN:1337,null-eof OPEN:tmp.out,create,trunc & > sleep 0.2 || sleep 1 > socat OPEN:tmp.in UDP4:${LOCAL_ADDR}:1337,shut-null > wait > cmp tmp.in tmp.out > done > > Once this fails: > > tmp.in tmp.out differ: char 8193, line 29 > > we can finally have a look at what's going on: > > $ tshark -r pasta.pcap > 1 0.000000 :: → ff02::16 ICMPv6 110 Multicast Listener Report Message v2 > 2 0.168690 88.198.0.161 → 88.198.0.164 UDP 8234 60260 → 1337 Len=8192 > 3 0.168767 88.198.0.161 → 88.198.0.164 UDP 8234 60260 → 1337 Len=8192 > 4 0.168806 88.198.0.161 → 88.198.0.164 UDP 8234 60260 → 1337 Len=8192 > 5 0.168827 c6:47:05:8d:dc:04 → Broadcast ARP 42 Who has 88.198.0.161? Tell 88.198.0.164 > 6 0.168851 9a:55:9a:55:9a:55 →
c6:47:05:8d:dc:04 ARP 42 88.198.0.161 is at 9a:55:9a:55:9a:55 > 7 0.168875 88.198.0.161 → 88.198.0.164 UDP 8234 60260 → 1337 Len=8192 > 8 0.168896 88.198.0.164 → 88.198.0.161 ICMP 590 Destination unreachable (Port unreachable) > 9 0.168926 88.198.0.161 → 88.198.0.164 UDP 8234 60260 → 1337 Len=8192 > 10 0.168959 88.198.0.161 → 88.198.0.164 UDP 8234 60260 → 1337 Len=8192 > 11 0.168989 88.198.0.161 → 88.198.0.164 UDP 4138 60260 → 1337 Len=4096 > 12 0.169010 88.198.0.161 → 88.198.0.164 UDP 42 60260 → 1337 Len=0 > > On the third datagram received, the network namespace of the container > initiates an ARP lookup to deliver the ICMP message. > > In another variant of this reproducer, starting the client with: > > strace -f pasta --config-net -u 1337 socat UDP4-LISTEN:1337,null-eof OPEN:tmp.out,create,trunc 2>strace.log & > > and connecting to the socat server using a loopback address: > > socat OPEN:tmp.in UDP4:localhost:1337,shut-null > > we can more clearly observe a sendmmsg() call failing after the > first datagram is delivered: > > [pid 278012] connect(173, 0x7fff96c95fc0, 16) = 0 > [...] > [pid 278012] recvmmsg(173, 0x7fff96c96020, 1024, MSG_DONTWAIT, NULL) = -1 EAGAIN (Resource temporarily unavailable) > [pid 278012] sendmmsg(173, 0x561c5ad0a720, 1, MSG_NOSIGNAL) = 1 > [...] > [pid 278012] sendmmsg(173, 0x561c5ad0a720, 1, MSG_NOSIGNAL) = -1 ECONNREFUSED (Connection refused) > > and, somewhat confusingly, after a connect() on the same socket > succeeded. > > To fix this, replace the rehash operation by a set_rcv_saddr() > callback holding the spinlock on the primary hash chain, just like > the rehash operation used to do, but also setting the address (via > inet_update_saddr(), moved to headers) while holding the spinlock. > > To make this atomic against the lookup operation, also acquire the > spinlock on the primary chain there.

I'm sorry for the late feedback.

I'm concerned by the unconditional spinlock in __udp4_lib_lookup().
I fear it could cause performance regressions in different workloads: heavy UDP unicast flow, or even TCP over UDP tunnel when the NIC supports RX offload for the relevant UDP tunnel protocol. In the first case there will be an additional atomic operation per packet. In the latter the spin_lock will be contended with multiple concurrent TCP over UDP tunnel flows: the NIC with UDP tunnel offload can use the inner header to compute the RX hash, and use different rx queues for such flows. The GRO stage will perform UDP tunnel socket lookup and will contend the bucket lock. > This results in some awkwardness at a caller site, specifically > sock_bindtoindex_locked(), where we really just need to rehash the > socket without changing its address. With the new operation, we now > need to forcibly set the current address again. > > On the other hand, this appears more elegant than alternatives such > as fetching the spinlock reference in ip4_datagram_connect() and > ip6_datagram_conect(), and keeping the rehash operation around for > a single user also seems a tad overkill. Would such option require the same additional lock at lookup time? Thanks, Paolo
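The local address change that opens the race window is visible from plain userspace sockets. Below is a minimal Python sketch (an illustration of the recvfrom()-then-connect() pattern described above, not socat's actual code, and the helper name is hypothetical): the server socket is bound to the wildcard address, and connect() silently rewrites its local receive address, which is what forces the kernel-side rehash.

```python
import socket

def learn_peer_and_connect():
    """Bind to the wildcard address, learn the peer from the first
    datagram, then connect() back to it: the socat UDP4-LISTEN-style
    pattern that changes the socket's local (receive) address."""
    srv = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    srv.bind(("0.0.0.0", 0))            # wildcard: rcv_saddr is 0.0.0.0
    port = srv.getsockname()[1]

    cli = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    cli.sendto(b"first", ("127.0.0.1", port))

    _, peer = srv.recvfrom(1024)        # learn the sender's address
    addr_before = srv.getsockname()[0]  # still 0.0.0.0
    srv.connect(peer)                   # kernel picks a source address...
    addr_after = srv.getsockname()[0]   # ...and rcv_saddr changes (rehash!)

    cli.close()
    srv.close()
    return addr_before, addr_after

if __name__ == "__main__":
    print(learn_peer_and_connect())
```

Datagrams arriving between the address change and the rehash are the ones the lookup misses.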
On Thu, 5 Dec 2024 10:30:14 +0100 Paolo Abeni <pabeni@redhat.com> wrote: > On 12/4/24 23:12, Stefano Brivio wrote: > > > [...] > > > > To fix this, replace the rehash operation by a set_rcv_saddr() > > callback holding the spinlock on the primary hash chain, just like > > the rehash operation used to do, but also setting the address (via > > inet_update_saddr(), moved to headers) while holding the spinlock. > > > > To make this atomic against the lookup operation, also acquire the > > spinlock on the primary chain there. > > I'm sorry for the late feedback. > > I'm concerned by the unconditional spinlock in __udp4_lib_lookup(). I > fear it could cause performance regressions in different workloads: > heavy UDP unicast flow, or even TCP over UDP tunnel when the NIC > supports RX offload for the relevant UDP tunnel protocol. > > In the first case there will be an additional atomic operation per packet. So, I've been looking into this a bit, and request-response rates with neper's udp_rr (https://github.com/google/neper/blob/master/udp_rr.c) for a client/server pair via loopback interface are the same before and after this patch. The reason is, I suppose, that the only contention on that spinlock is the "intended" one, that is, between connect() and lookup. Then I moved on to bulk flows, with socat or iperf3. But there (and that's the whole point of this fix) we have connected sockets, and once they are connected, we switch to early demux, which is not affected by this patch. In the end, I don't think this will affect "regular", bulk unicast flows, because applications using them will typically connect sockets, and we'll switch to early demux right away. This lookup is not exactly "slow path", but it's not fast path either. > In the latter the spin_lock will be contended with multiple concurrent > TCP over UDP tunnel flows: the NIC with UDP tunnel offload can use the > inner header to compute the RX hash, and use different rx queues for > such flows. 
> > The GRO stage will perform UDP tunnel socket lookup and will contend the > bucket lock. In this case (I couldn't find out yet), aren't sockets connected? I would expect that we switch to the early demux path relatively soon for anything that needs to have somehow high throughput. And if we don't, probably the more reasonable alternative would be to "fix" that, rather than keeping this relatively common case broken. Do you have a benchmark or something I can run? > > This results in some awkwardness at a caller site, specifically > > sock_bindtoindex_locked(), where we really just need to rehash the > > socket without changing its address. With the new operation, we now > > need to forcibly set the current address again. > > > > On the other hand, this appears more elegant than alternatives such > > as fetching the spinlock reference in ip4_datagram_connect() and > > ip6_datagram_conect(), and keeping the rehash operation around for > > a single user also seems a tad overkill. > > Would such option require the same additional lock at lookup time? Yes, it's conceptually the same, we would pretty much just move code around. I've been thinking about possible alternatives but they all involve a much bigger rework. One idea could be that we RCU-connect() sockets, instead of just having the hash table insertion under RCU. That is, as long as we're in the grace period, the lookup would still see the old receive address. But, especially now that we have *three* hash tables, this is extremely involved, and perhaps would warrant a rewrite of the whole thing. Given that we're currently breaking users, I'd rather fix this first. Sure, things have been broken for 19 years so I guess it's okay to defer this fix to net-next (see discussion around the RFC), but I'd still suggest that we fix this as a first step, because the breakage is embarrassingly obvious (see reproducers).
On Wed, Dec 4, 2024 at 11:12 PM Stefano Brivio <sbrivio@redhat.com> wrote: > > If a UDP socket changes its local address while it's receiving > datagrams, as a result of connect(), there is a period during which > a lookup operation might fail to find it, after the address is changed > but before the secondary hash (port and address) and the four-tuple > hash (local and remote ports and addresses) are updated. > > Secondary hash chains were introduced by commit 30fff9231fad ("udp: > bind() optimisation") and, as a result, a rehash operation became > needed to make a bound socket reachable again after a connect(). > > This operation was introduced by commit 719f835853a9 ("udp: add > rehash on connect()") which isn't however a complete fix: the > socket will be found once the rehashing completes, but not while > it's pending. > > This is noticeable with a socat(1) server in UDP4-LISTEN mode, and a > client sending datagrams to it. After the server receives the first > datagram (cf. _xioopen_ipdgram_listen()), it issues a connect() to > the address of the sender, in order to set up a directed flow. 
[snip]
> > This results in some awkwardness at a caller site, specifically > sock_bindtoindex_locked(), where we really just need to rehash the > socket without changing its address. With the new operation, we now > need to forcibly set the current address again. > > On the other hand, this appears more elegant than alternatives such > as fetching the spinlock reference in ip4_datagram_connect() and > ip6_datagram_conect(), and keeping the rehash operation around for > a single user also seems a tad overkill. > > v1: > - fix build with CONFIG_IPV6=n: add ifdef around sk_v6_rcv_saddr > usage (Kuniyuki Iwashima) > - directly use sk_rcv_saddr for IPv4 receive addresses instead of > fetching inet_rcv_saddr (Kuniyuki Iwashima) > - move inet_update_saddr() to inet_hashtables.h and use that > to set IPv4/IPv6 addresses as suitable (Kuniyuki Iwashima) > - rebase onto net-next, update commit message accordingly > > Reported-by: Ed Santiago <santiago@redhat.com> > Link: https://github.com/containers/podman/issues/24147 > Analysed-by: David Gibson <david@gibson.dropbear.id.au> > Fixes: 30fff9231fad ("udp: bind() optimisation") > Signed-off-by: Stefano Brivio <sbrivio@redhat.com> > --- > include/net/inet_hashtables.h | 13 ++++++ > include/net/sock.h | 2 +- > include/net/udp.h | 3 +- > net/core/sock.c | 12 ++++- > net/ipv4/datagram.c | 7 +-- > net/ipv4/inet_hashtables.c | 13 ------ > net/ipv4/udp.c | 84 +++++++++++++++++++++++------------ > net/ipv4/udp_impl.h | 2 +- > net/ipv4/udplite.c | 2 +- > net/ipv6/datagram.c | 30 +++++++++---- > net/ipv6/udp.c | 31 +++++++------ > net/ipv6/udp_impl.h | 2 +- > net/ipv6/udplite.c | 2 +- > 13 files changed, 130 insertions(+), 73 deletions(-) > > diff --git a/include/net/inet_hashtables.h b/include/net/inet_hashtables.h > index 5eea47f135a4..7f05e7ebc2e4 100644 > --- a/include/net/inet_hashtables.h > +++ b/include/net/inet_hashtables.h > @@ -525,6 +525,19 @@ static inline void sk_rcv_saddr_set(struct sock *sk, __be32 addr) > #endif > } > > +static 
inline void inet_update_saddr(struct sock *sk, void *saddr, int family) > +{ > + if (family == AF_INET) { > + inet_sk(sk)->inet_saddr = *(__be32 *)saddr; > + sk_rcv_saddr_set(sk, inet_sk(sk)->inet_saddr); > + } > +#if IS_ENABLED(CONFIG_IPV6) > + else { > + sk->sk_v6_rcv_saddr = *(struct in6_addr *)saddr; > + } > +#endif > +} > + > int __inet_hash_connect(struct inet_timewait_death_row *death_row, > struct sock *sk, u64 port_offset, > int (*check_established)(struct inet_timewait_death_row *, > diff --git a/include/net/sock.h b/include/net/sock.h > index 7464e9f9f47c..1410036e4f5a 100644 > --- a/include/net/sock.h > +++ b/include/net/sock.h > @@ -1228,6 +1228,7 @@ struct proto { > int (*connect)(struct sock *sk, > struct sockaddr *uaddr, > int addr_len); > + void (*set_rcv_saddr)(struct sock *sk, void *addr); > int (*disconnect)(struct sock *sk, int flags); > > struct sock * (*accept)(struct sock *sk, > @@ -1269,7 +1270,6 @@ struct proto { > /* Keeping track of sk's, looking them up, and port selection methods. 
*/ > int (*hash)(struct sock *sk); > void (*unhash)(struct sock *sk); > - void (*rehash)(struct sock *sk); > int (*get_port)(struct sock *sk, unsigned short snum); > void (*put_port)(struct sock *sk); > #ifdef CONFIG_BPF_SYSCALL > diff --git a/include/net/udp.h b/include/net/udp.h > index 6e89520e100d..8283ea5768de 100644 > --- a/include/net/udp.h > +++ b/include/net/udp.h > @@ -302,7 +302,6 @@ static inline int udp_lib_hash(struct sock *sk) > } > > void udp_lib_unhash(struct sock *sk); > -void udp_lib_rehash(struct sock *sk, u16 new_hash, u16 new_hash4); > u32 udp_ehashfn(const struct net *net, const __be32 laddr, const __u16 lport, > const __be32 faddr, const __be16 fport); > > @@ -411,6 +410,8 @@ int udp_rcv(struct sk_buff *skb); > int udp_ioctl(struct sock *sk, int cmd, int *karg); > int udp_init_sock(struct sock *sk); > int udp_pre_connect(struct sock *sk, struct sockaddr *uaddr, int addr_len); > +void udp_lib_set_rcv_saddr(struct sock *sk, void *addr, u16 hash, u16 hash4); > +int udp_set_rcv_saddr(struct sock *sk, void *addr); > int __udp_disconnect(struct sock *sk, int flags); > int udp_disconnect(struct sock *sk, int flags); > __poll_t udp_poll(struct file *file, struct socket *sock, poll_table *wait); > diff --git a/net/core/sock.c b/net/core/sock.c > index 74729d20cd00..221c904d870d 100644 > --- a/net/core/sock.c > +++ b/net/core/sock.c > @@ -641,8 +641,16 @@ static int sock_bindtoindex_locked(struct sock *sk, int ifindex) > /* Paired with all READ_ONCE() done locklessly. 
*/ > WRITE_ONCE(sk->sk_bound_dev_if, ifindex); > > - if (sk->sk_prot->rehash) > - sk->sk_prot->rehash(sk); > + /* Force rehash if protocol needs it */ > + if (sk->sk_prot->set_rcv_saddr) { > + if (sk->sk_family == AF_INET) > + sk->sk_prot->set_rcv_saddr(sk, &sk->sk_rcv_saddr); > +#if IS_ENABLED(CONFIG_IPV6) > + else if (sk->sk_family == AF_INET6) > + sk->sk_prot->set_rcv_saddr(sk, &sk->sk_v6_rcv_saddr); > +#endif > + } > + > sk_dst_reset(sk); > > ret = 0; > diff --git a/net/ipv4/datagram.c b/net/ipv4/datagram.c > index d52333e921f3..3ea3fa94c127 100644 > --- a/net/ipv4/datagram.c > +++ b/net/ipv4/datagram.c > @@ -64,9 +64,10 @@ int __ip4_datagram_connect(struct sock *sk, struct sockaddr *uaddr, int addr_len > if (!inet->inet_saddr) > inet->inet_saddr = fl4->saddr; /* Update source address */ > if (!inet->inet_rcv_saddr) { > - inet->inet_rcv_saddr = fl4->saddr; > - if (sk->sk_prot->rehash && sk->sk_family == AF_INET) > - sk->sk_prot->rehash(sk); > + if (sk->sk_prot->set_rcv_saddr && sk->sk_family == AF_INET) > + sk->sk_prot->set_rcv_saddr(sk, &fl4->saddr); > + else > + inet->inet_rcv_saddr = fl4->saddr; > } > inet->inet_daddr = fl4->daddr; > inet->inet_dport = usin->sin_port; > diff --git a/net/ipv4/inet_hashtables.c b/net/ipv4/inet_hashtables.c > index 9bfcfd016e18..74e6a3604bcf 100644 > --- a/net/ipv4/inet_hashtables.c > +++ b/net/ipv4/inet_hashtables.c > @@ -874,19 +874,6 @@ inet_bhash2_addr_any_hashbucket(const struct sock *sk, const struct net *net, in > return &hinfo->bhash2[hash & (hinfo->bhash_size - 1)]; > } > > -static void inet_update_saddr(struct sock *sk, void *saddr, int family) > -{ > - if (family == AF_INET) { > - inet_sk(sk)->inet_saddr = *(__be32 *)saddr; > - sk_rcv_saddr_set(sk, inet_sk(sk)->inet_saddr); > - } > -#if IS_ENABLED(CONFIG_IPV6) > - else { > - sk->sk_v6_rcv_saddr = *(struct in6_addr *)saddr; > - } > -#endif > -} > - > static int __inet_bhash2_update_saddr(struct sock *sk, void *saddr, int family, bool reset) > { > struct inet_hashinfo 
*hinfo = tcp_or_dccp_get_hashinfo(sk); > diff --git a/net/ipv4/udp.c b/net/ipv4/udp.c > index 6a01905d379f..8490408f6009 100644 > --- a/net/ipv4/udp.c > +++ b/net/ipv4/udp.c > @@ -639,18 +639,21 @@ struct sock *__udp4_lib_lookup(const struct net *net, __be32 saddr, > int sdif, struct udp_table *udptable, struct sk_buff *skb) > { > unsigned short hnum = ntohs(dport); > - struct udp_hslot *hslot2; > + struct udp_hslot *hslot, *hslot2; > struct sock *result, *sk; > unsigned int hash2; > > + hslot = udp_hashslot(udptable, net, hnum); > + spin_lock_bh(&hslot->lock); This is not acceptable. UDP is best effort, packets can be dropped. Please fix user application expectations.
On 12/5/24 16:58, Stefano Brivio wrote: > On Thu, 5 Dec 2024 10:30:14 +0100 > Paolo Abeni <pabeni@redhat.com> wrote: > >> On 12/4/24 23:12, Stefano Brivio wrote: >> >>> [...] >>> >>> To fix this, replace the rehash operation by a set_rcv_saddr() >>> callback holding the spinlock on the primary hash chain, just like >>> the rehash operation used to do, but also setting the address (via >>> inet_update_saddr(), moved to headers) while holding the spinlock. >>> >>> To make this atomic against the lookup operation, also acquire the >>> spinlock on the primary chain there. >> >> I'm sorry for the late feedback. >> >> I'm concerned by the unconditional spinlock in __udp4_lib_lookup(). I >> fear it could cause performance regressions in different workloads: >> heavy UDP unicast flow, or even TCP over UDP tunnel when the NIC >> supports RX offload for the relevant UDP tunnel protocol. >> >> In the first case there will be an additional atomic operation per packet. > > So, I've been looking into this a bit, and request-response rates with > neper's udp_rr (https://github.com/google/neper/blob/master/udp_rr.c) > for a client/server pair via loopback interface are the same before and > after this patch. > > The reason is, I suppose, that the only contention on that spinlock is > the "intended" one, that is, between connect() and lookup. > > Then I moved on to bulk flows, with socat or iperf3. But there (and > that's the whole point of this fix) we have connected sockets, and once > they are connected, we switch to early demux, which is not affected by > this patch. > > In the end, I don't think this will affect "regular", bulk unicast > flows, because applications using them will typically connect sockets, > and we'll switch to early demux right away. > > This lookup is not exactly "slow path", but it's not fast path either. Some (most ?) quick server implementations don't use connect. 
DNS servers will be affected, and will see contention on the hash lock.

Even deployments using SO_REUSEPORT with a per-CPU UDP socket will see contention. This latter case would be pretty bad, as it's supposed to scale linearly. I really think the hash lock during lookup is a no-go. >> In the latter the spin_lock will be contended with multiple concurrent >> TCP over UDP tunnel flows: the NIC with UDP tunnel offload can use the >> inner header to compute the RX hash, and use different rx queues for >> such flows. >> >> The GRO stage will perform UDP tunnel socket lookup and will contend the >> bucket lock. > > In this case (I couldn't find out yet), aren't sockets connected? I > would expect that we switch to the early demux path relatively soon for > anything that needs to have somehow high throughput. The UDP socket backing a tunnel is unconnected and can receive data from multiple other tunnel endpoints. > And if we don't, probably the more reasonable alternative would be to > "fix" that, rather than keeping this relatively common case broken. > > Do you have a benchmark or something I can run? I'm sorry, but I don't have anything handy. If you have a NIC implementing e.g. vxlan H/W offload you should be able to observe contention with multiple simultaneous TCP over vxlan flows targeting an endpoint on top of it. >>> This results in some awkwardness at a caller site, specifically >>> sock_bindtoindex_locked(), where we really just need to rehash the >>> socket without changing its address. With the new operation, we now >>> need to forcibly set the current address again. >>> >>> On the other hand, this appears more elegant than alternatives such >>> as fetching the spinlock reference in ip4_datagram_connect() and >>> ip6_datagram_connect(), and keeping the rehash operation around for >>> a single user also seems a tad overkill. >> >> Would such an option require the same additional lock at lookup time?
> > Yes, it's conceptually the same, we would pretty much just move code > around. > > I've been thinking about possible alternatives but they all involve a > much bigger rework. One idea could be that we RCU-connect() sockets, > instead of just having the hash table insertion under RCU. That is, as > long as we're in the grace period, the lookup would still see the old > receive address. I'm wondering if the issue could be solved (almost) entirely in the rehash callback?!? If the rehash happens on connect and the socket does not have hash4 yet (it's not a reconnect), do the l4 hashing before everything else. Incoming packets should match the l4 hash and reach the socket even while later updating the other hash(es). Cheers, Paolo
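The ordering Paolo suggests can be modelled with plain dictionaries standing in for the three hash tables (a toy sketch that ignores RCU grace periods, memory ordering and locking entirely; all names are illustrative, not kernel APIs): if the new entries are published before the stale ones are retracted, a concurrent lookup that tries the most specific table first never misses the socket at any intermediate step.

```python
def rehash_publish_first(tables, sock, new_keys, old_keys):
    """Toy rehash: publish the socket under all its new keys first,
    only then retract the stale entries.  A concurrent reader that
    walks the tables in order always finds the socket somewhere."""
    for table, key in new_keys:        # 1. publish first...
        tables[table][key] = sock
    for table, key in old_keys:        # 2. ...retract stale entries last
        tables[table].pop(key, None)

def lookup(tables, keys):
    """Best-match lookup: try the most specific table first
    (hash4, then hash2, then the port-only hash)."""
    for table, key in keys:
        sock = tables[table].get(key)
        if sock is not None:
            return sock
    return None
```

The inverse order (retract, then publish) is exactly the window in which the reported lookup failure happens.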
On Thu, Dec 05, 2024 at 05:35:52PM +0100, Eric Dumazet wrote: > On Wed, Dec 4, 2024 at 11:12 PM Stefano Brivio <sbrivio@redhat.com> wrote: [snip] > > diff --git a/net/ipv4/udp.c b/net/ipv4/udp.c > > index 6a01905d379f..8490408f6009 100644 > > --- a/net/ipv4/udp.c > > +++ b/net/ipv4/udp.c > > @@ -639,18 +639,21 @@ struct sock *__udp4_lib_lookup(const struct net *net, __be32 saddr, > > int sdif, struct udp_table *udptable, struct sk_buff *skb) > > { > > unsigned short hnum = ntohs(dport); > > - struct udp_hslot *hslot2; > > + struct udp_hslot *hslot, *hslot2; > > struct sock *result, *sk; > > unsigned int hash2; > > > > + hslot = udp_hashslot(udptable, net, hnum); > > + spin_lock_bh(&hslot->lock); > > This is not acceptable. > UDP is best effort, packets can be dropped. > Please fix user application expectations. The packets aren't merely dropped, they're rejected with an ICMP Port Unreachable.
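On Linux, that ICMP Port Unreachable surfaces to a connected UDP sender as ECONNREFUSED on a later send() or recv(), which matches the write() failure in the reproducer. A small Python sketch of that behaviour (loopback only; the helper name is hypothetical):

```python
import errno
import socket

def probe_closed_port():
    """Send from a connected UDP socket to a loopback port with no
    listener: the kernel answers with ICMP Port Unreachable, reported
    as ECONNREFUSED on a subsequent operation on the socket."""
    # grab a currently-free port number, then release it again
    tmp = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    tmp.bind(("127.0.0.1", 0))
    port = tmp.getsockname()[1]
    tmp.close()

    s = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    s.connect(("127.0.0.1", port))
    s.settimeout(1.0)
    try:
        for _ in range(3):
            try:
                s.send(b"hello?")
                s.recv(16)          # the queued ICMP error pops out here
            except ConnectionRefusedError:
                return errno.ECONNREFUSED
            except socket.timeout:
                pass                # error not queued yet, try again
    finally:
        s.close()
    return 0
```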
On Thu, Dec 5, 2024 at 11:32 PM David Gibson <david@gibson.dropbear.id.au> wrote: > > On Thu, Dec 05, 2024 at 05:35:52PM +0100, Eric Dumazet wrote: > > On Wed, Dec 4, 2024 at 11:12 PM Stefano Brivio <sbrivio@redhat.com> wrote: > [snip] > > > diff --git a/net/ipv4/udp.c b/net/ipv4/udp.c > > > index 6a01905d379f..8490408f6009 100644 > > > --- a/net/ipv4/udp.c > > > +++ b/net/ipv4/udp.c > > > @@ -639,18 +639,21 @@ struct sock *__udp4_lib_lookup(const struct net *net, __be32 saddr, > > > int sdif, struct udp_table *udptable, struct sk_buff *skb) > > > { > > > unsigned short hnum = ntohs(dport); > > > - struct udp_hslot *hslot2; > > > + struct udp_hslot *hslot, *hslot2; > > > struct sock *result, *sk; > > > unsigned int hash2; > > > > > > + hslot = udp_hashslot(udptable, net, hnum); > > > + spin_lock_bh(&hslot->lock); > > > > This is not acceptable. > > UDP is best effort, packets can be dropped. > > Please fix user application expectations. > > The packets aren't merely dropped, they're rejected with an ICMP Port > Unreachable. We made UDP stack scalable with RCU, it took years of work. And this patch is bringing back the UDP stack to horrible performance from more than a decade ago. Everybody will go back to DPDK. I am pretty certain this can be solved without using a spinlock in the fast path. Think about UDP DNS/QUIC servers, using SO_REUSEPORT and receiving 10,000,000 packets per second.... Changing source address on an UDP socket is highly unusual, we are not going to slow down UDP for this case. Application could instead open another socket, and would probably work on old linux versions. If the regression was recent, this would be considered as a normal regression, but apparently nobody noticed for 10 years. This should be saying something...
On Thu, Dec 05, 2024 at 11:52:38PM +0100, Eric Dumazet wrote: > On Thu, Dec 5, 2024 at 11:32 PM David Gibson > <david@gibson.dropbear.id.au> wrote: > > > > On Thu, Dec 05, 2024 at 05:35:52PM +0100, Eric Dumazet wrote: > > > On Wed, Dec 4, 2024 at 11:12 PM Stefano Brivio <sbrivio@redhat.com> wrote: > > [snip] > > > > diff --git a/net/ipv4/udp.c b/net/ipv4/udp.c > > > > index 6a01905d379f..8490408f6009 100644 > > > > --- a/net/ipv4/udp.c > > > > +++ b/net/ipv4/udp.c > > > > @@ -639,18 +639,21 @@ struct sock *__udp4_lib_lookup(const struct net *net, __be32 saddr, > > > > int sdif, struct udp_table *udptable, struct sk_buff *skb) > > > > { > > > > unsigned short hnum = ntohs(dport); > > > > - struct udp_hslot *hslot2; > > > > + struct udp_hslot *hslot, *hslot2; > > > > struct sock *result, *sk; > > > > unsigned int hash2; > > > > > > > > + hslot = udp_hashslot(udptable, net, hnum); > > > > + spin_lock_bh(&hslot->lock); > > > > > > This is not acceptable. > > > UDP is best effort, packets can be dropped. > > > Please fix user application expectations. > > > > The packets aren't merely dropped, they're rejected with an ICMP Port > > Unreachable. > > We made UDP stack scalable with RCU, it took years of work. > > And this patch is bringing back the UDP stack to horrible performance > from more than a decade ago. > Everybody will go back to DPDK. It's reasonable to be concerned about the performance impact. But this seems like premature hyperbole given no-one has numbers yet, or has even suggested a specific benchmark to reveal the impact. > I am pretty certain this can be solved without using a spinlock in the > fast path. Quite possibly. But Stefano has tried, and it certainly wasn't trivial. > Think about UDP DNS/QUIC servers, using SO_REUSEPORT and receiving > 10,000,000 packets per second.... > > Changing source address on an UDP socket is highly unusual, we are not > going to slow down UDP for this case. 
Changing in a general way is very rare, one specific case is not. Every time you connect() a socket that wasn't previously bound to a specific address you get an implicit source address change from 0.0.0.0 or :: to something that depends on the routing table. > Application could instead open another socket, and would probably work > on old linux versions. Possibly there's a procedure that would work here, but it's not at all obvious: * Clearly, you can't close the non-connected socket before opening the connected one - that just introduces a new much wider race. It doesn't even get rid of the existing one, because unless you can independently predict what the correct bound address will be for a given peer address, the second socket will still have an address change when you connect(). * So, you must create the connected socket before closing the unconnected one, meaning you have to use SO_REUSEADDR or SO_REUSEPORT whether or not you otherwise wanted to. * While both sockets are open, you need to handle the possibility that packets could be delivered to either one. Doable, but a pain in the arse. * How do you know when the transition is completed and you can close the unconnected socket? The fact that the rehashing has completed and all the necessary memory barriers passed isn't something userspace can directly discern. > If the regression was recent, this would be considered as a normal regression, > but apparently nobody noticed for 10 years. This should be saying something... It does. But so does the fact that it can be trivially reproduced.
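The handover David outlines can be sketched as follows (a hypothetical Python illustration over loopback only; a real implementation would still have to deal with all four problems listed above, in particular datagrams racing onto either socket):

```python
import select
import socket

def two_socket_handover():
    """Sketch of the proposed workaround: keep the wildcard socket
    open, open a second SO_REUSEPORT socket on the same port, connect
    it to the learned peer, and only then retire the first socket."""
    def mksock():
        s = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
        s.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEPORT, 1)
        return s

    srv1 = mksock()
    srv1.bind(("0.0.0.0", 0))
    port = srv1.getsockname()[1]

    cli = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    cli.bind(("127.0.0.1", 0))
    cli.sendto(b"first", ("127.0.0.1", port))
    _, peer = srv1.recvfrom(1024)       # learn the peer address

    srv2 = mksock()
    srv2.bind(("0.0.0.0", port))        # needs SO_REUSEPORT: srv1 is open
    srv2.connect(peer)                  # directed flow on the new socket

    cli.sendto(b"second", ("127.0.0.1", port))
    # until srv1 is closed, datagrams may land on either socket
    readable, _, _ = select.select([srv1, srv2], [], [], 1.0)
    data = readable[0].recv(1024) if readable else None

    srv1.close()
    srv2.close()
    cli.close()
    return data
```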
On Fri, Dec 6, 2024 at 3:16 AM David Gibson <david@gibson.dropbear.id.au> wrote: > > On Thu, Dec 05, 2024 at 11:52:38PM +0100, Eric Dumazet wrote: > > On Thu, Dec 5, 2024 at 11:32 PM David Gibson > > <david@gibson.dropbear.id.au> wrote: > > > > > > On Thu, Dec 05, 2024 at 05:35:52PM +0100, Eric Dumazet wrote: > > > > On Wed, Dec 4, 2024 at 11:12 PM Stefano Brivio <sbrivio@redhat.com> wrote: > > > [snip] > > > > > diff --git a/net/ipv4/udp.c b/net/ipv4/udp.c > > > > > index 6a01905d379f..8490408f6009 100644 > > > > > --- a/net/ipv4/udp.c > > > > > +++ b/net/ipv4/udp.c > > > > > @@ -639,18 +639,21 @@ struct sock *__udp4_lib_lookup(const struct net *net, __be32 saddr, > > > > > int sdif, struct udp_table *udptable, struct sk_buff *skb) > > > > > { > > > > > unsigned short hnum = ntohs(dport); > > > > > - struct udp_hslot *hslot2; > > > > > + struct udp_hslot *hslot, *hslot2; > > > > > struct sock *result, *sk; > > > > > unsigned int hash2; > > > > > > > > > > + hslot = udp_hashslot(udptable, net, hnum); > > > > > + spin_lock_bh(&hslot->lock); > > > > > > > > This is not acceptable. > > > > UDP is best effort, packets can be dropped. > > > > Please fix user application expectations. > > > > > > The packets aren't merely dropped, they're rejected with an ICMP Port > > > Unreachable. > > > > We made UDP stack scalable with RCU, it took years of work. > > > > And this patch is bringing back the UDP stack to horrible performance > > from more than a decade ago. > > Everybody will go back to DPDK. > > It's reasonable to be concerned about the performance impact. But > this seems like preamture hyperbole given no-one has numbers yet, or > has even suggested a specific benchmark to reveal the impact. > > > I am pretty certain this can be solved without using a spinlock in the > > fast path. > > Quite possibly. But Stefano has tried, and it certainly wasn't > trivial. > > > Think about UDP DNS/QUIC servers, using SO_REUSEPORT and receiving > > 10,000,000 packets per second.... 
> > > > Changing source address on an UDP socket is highly unusual, we are not > > going to slow down UDP for this case. > > Changing in a general way is very rare, one specific case is not. > Every time you connect() a socket that wasn't previously bound to a > specific address you get an implicit source address change from > 0.0.0.0 or :: to something that depends on the routing table. > > > Application could instead open another socket, and would probably work > > on old linux versions. > > Possibly there's a procedure that would work here, but it's not at all > obvious: > > * Clearly, you can't close the non-connected socket before opening > the connected one - that just introduces a new much wider race. It > doesn't even get rid of the existing one, because unless you can > independently predict what the correct bound address will be > for a given peer address, the second socket will still have an > address change when you connect(). > The order is kind of obvious. Kernel does not have to deal with wrong application design. > * So, you must create the connected socket before closing the > unconnected one, meaning you have to use SO_REUSEADDR or > SO_REUSEPORT whether or not you otherwise wanted to. > > * While both sockets are open, you need to handle the possibility > that packets could be delivered to either one. Doable, but a pain > in the arse. Given UDP does not have a proper listen() + accept() model, I am afraid this is the only way You need to keep the generic UDP socket as a catch all, and deal with packets received on it. > > * How do you know when the transition is completed and you can close > the unconnected socket? The fact that the rehashing has completed > and all the necessary memory barriers passed isn't something > userspace can directly discern. > > > If the regression was recent, this would be considered as a normal regression, > > but apparently nobody noticed for 10 years. This should be saying something... > > It does. 
But so does the fact that it can be trivially reproduced. If a kernel fix is doable without making UDP stack a complete nogo for most of us, I will be happy to review it.
On Thu, 5 Dec 2024 17:53:33 +0100 Paolo Abeni <pabeni@redhat.com> wrote: > On 12/5/24 16:58, Stefano Brivio wrote: > > On Thu, 5 Dec 2024 10:30:14 +0100 > > Paolo Abeni <pabeni@redhat.com> wrote: > > > >> On 12/4/24 23:12, Stefano Brivio wrote: > >> > >>> [...] > >>> > >>> To fix this, replace the rehash operation by a set_rcv_saddr() > >>> callback holding the spinlock on the primary hash chain, just like > >>> the rehash operation used to do, but also setting the address (via > >>> inet_update_saddr(), moved to headers) while holding the spinlock. > >>> > >>> To make this atomic against the lookup operation, also acquire the > >>> spinlock on the primary chain there. > >> > >> I'm sorry for the late feedback. > >> > >> I'm concerned by the unconditional spinlock in __udp4_lib_lookup(). I > >> fear it could cause performance regressions in different workloads: > >> heavy UDP unicast flow, or even TCP over UDP tunnel when the NIC > >> supports RX offload for the relevant UDP tunnel protocol. > >> > >> In the first case there will be an additional atomic operation per packet. > > > > So, I've been looking into this a bit, and request-response rates with > > neper's udp_rr (https://github.com/google/neper/blob/master/udp_rr.c) > > for a client/server pair via loopback interface are the same before and > > after this patch. > > > > The reason is, I suppose, that the only contention on that spinlock is > > the "intended" one, that is, between connect() and lookup. > > > > Then I moved on to bulk flows, with socat or iperf3. But there (and > > that's the whole point of this fix) we have connected sockets, and once > > they are connected, we switch to early demux, which is not affected by > > this patch. > > > > In the end, I don't think this will affect "regular", bulk unicast > > flows, because applications using them will typically connect sockets, > > and we'll switch to early demux right away. 
> > > > This lookup is not exactly "slow path", but it's not fast path either. > > Some (most ?) quick server implementations don't use connect. Assuming you mean QUIC, fair enough, I see your point. > DNS servers will be affected, and will see contention on the hash lock At the same time, clients (not just DNS) are surely affected by bogus ICMP Port Unreachable messages, if remote, or ECONNREFUSED on send() (!), if local. If (presumed) contention is so relevant, I would have expected that somebody could indicate a benchmark for it. As I mentioned, udp_rr from 'neper' didn't really show any difference for me. Anyway, fine, let's assume that it's an issue. > Even deployment using SO_REUSEPORT with a per-cpu UDP socket will see > contention. This latter case would be pretty bad, as it's supposed to > scale linearly. Okay, I guess we could observe a bigger impact in this case (this is something I didn't try). > I really think the hash lock during lookup is a no go. > > >> In the latter the spin_lock will be contended with multiple concurrent > >> TCP over UDP tunnel flows: the NIC with UDP tunnel offload can use the > >> inner header to compute the RX hash, and use different rx queues for > >> such flows. > >> > >> The GRO stage will perform UDP tunnel socket lookup and will contend the > >> bucket lock. > > > > In this case (I couldn't find out yet), aren't sockets connected? I > > would expect that we switch to the early demux path relatively soon for > > anything that needs to have somehow high throughput. > > The UDP socket backing tunnels is unconnected and can receive data from > multiple other tunnel endpoints. > > > And if we don't, probably the more reasonable alternative would be to > > "fix" that, rather than keeping this relatively common case broken. > > > > Do you have a benchmark or something I can run? > > I'm sorry, but I don't have anything handy. If you have a NIC > implementing i.e. 
vxlan H/W offload you should be able to observe > contention with multiple simultaneous TCP over vxlan flows targeting an > endpoint on top of it. Thanks for the idea, but no, I don't have one right now. > >>> This results in some awkwardness at a caller site, specifically > >>> sock_bindtoindex_locked(), where we really just need to rehash the > >>> socket without changing its address. With the new operation, we now > >>> need to forcibly set the current address again. > >>> > >>> On the other hand, this appears more elegant than alternatives such > >>> as fetching the spinlock reference in ip4_datagram_connect() and > >>> ip6_datagram_connect(), and keeping the rehash operation around for > >>> a single user also seems a tad overkill. > >> > >> Would such option require the same additional lock at lookup time? > > > > Yes, it's conceptually the same, we would pretty much just move code > > around. > > > > I've been thinking about possible alternatives but they all involve a > > much bigger rework. One idea could be that we RCU-connect() sockets, > > instead of just having the hash table insertion under RCU. That is, as > > long as we're in the grace period, the lookup would still see the old > > receive address. > > I'm wondering if the issue could be solved (almost) entirely in the > rehash callback?!? if the rehash happens on connect and the socket > does not have hash4 yet (it's not a reconnect) do the l4 hashing before > everything else. So, yes, that's actually the first thing I tried: do the hashing (any hash) before setting the address (I guess that's what you mean by "everything else"). If you take this series, and drop the changes in __udp4_lib_lookup(), I guess that would match what you suggest. With udp_lib_set_rcv_saddr() instead of a "rehash" callback you can see pretty easily that hashes are updated first, and then we set the receiving address. It doesn't work because the socket does have a receiving address (and hashes) already: it's 0.0.0.0.
So we're just moving the race condition. I don't think we can really change that part. Note that this issue occurs with and without four-tuple hashes (I actually posted the original fix before they were introduced). > Incoming packets should match the l4 hash and reach the socket even > while later updating the other hash(es). ...to obtain this kind of outcome, I'm trying to keep the old hash around until the new hash is there *and* we changed the address. For simplicity, I cut out four-tuple hashes, and, in the new udp_lib_set_rcv_saddr(), I changed RCU calls so that it should always be the case... but that doesn't help either for some reason. I wonder if you have some idea as to whether that's a viable approach at all, and if there's something particular I should observe while implementing it.
On 12/6/24 11:50, Stefano Brivio wrote: > On Thu, 5 Dec 2024 17:53:33 +0100 Paolo Abeni <pabeni@redhat.com> wrote: >> I'm wondering if the issue could be solved (almost) entirely in the >> rehash callback?!? if the rehash happens on connect and the the socket >> does not have hash4 yet (it's not a reconnect) do the l4 hashing before >> everything else. > > So, yes, that's actually the first thing I tried: do the hashing (any > hash) before setting the address (I guess that's what you mean by > "everything else"). > > If you take this series, and drop the changes in __udp4_lib_lookup(), I > guess that would match what you suggest. I mean something slightly different. Just to explain the idea something alike the following (completely untested): --- diff --git a/net/ipv4/datagram.c b/net/ipv4/datagram.c index cc6d0bd7b0a9..e9cc6edbcdc6 100644 --- a/net/ipv4/datagram.c +++ b/net/ipv4/datagram.c @@ -61,6 +61,10 @@ int __ip4_datagram_connect(struct sock *sk, struct sockaddr *uaddr, int addr_len err = -EACCES; goto out; } + + sk->sk_state = TCP_ESTABLISHED; + inet->inet_daddr = fl4->daddr; + inet->inet_dport = usin->sin_port; if (!inet->inet_saddr) inet->inet_saddr = fl4->saddr; /* Update source address */ if (!inet->inet_rcv_saddr) { @@ -68,10 +72,7 @@ int __ip4_datagram_connect(struct sock *sk, struct sockaddr *uaddr, int addr_len if (sk->sk_prot->rehash) sk->sk_prot->rehash(sk); } - inet->inet_daddr = fl4->daddr; - inet->inet_dport = usin->sin_port; reuseport_has_conns_set(sk); - sk->sk_state = TCP_ESTABLISHED; sk_set_txhash(sk); atomic_set(&inet->inet_id, get_random_u16()); diff --git a/net/ipv4/udp.c b/net/ipv4/udp.c index 6a01905d379f..c6c58b0a6b7b 100644 --- a/net/ipv4/udp.c +++ b/net/ipv4/udp.c @@ -2194,6 +2194,21 @@ void udp_lib_rehash(struct sock *sk, u16 newhash, u16 newhash4) if (rcu_access_pointer(sk->sk_reuseport_cb)) reuseport_detach_sock(sk); + if (sk->sk_state == TCP_ESTABLISHED && !udp_hashed4(sk)) { + struct udp_hslot * hslot4 = udp_hashslot4(udptable, 
newhash4); + + udp_sk(sk)->udp_lrpa_hash = newhash4; + spin_lock(&hslot4->lock); + hlist_nulls_add_head_rcu(&udp_sk(sk)->udp_lrpa_node, + &hslot4->nulls_head); + hslot4->count++; + spin_unlock(&hslot4->lock); + + spin_lock(&hslot2->lock); + udp_hash4_inc(hslot2); + spin_unlock(&hslot2->lock); + } + if (hslot2 != nhslot2) { spin_lock(&hslot2->lock); hlist_del_init_rcu(&udp_sk(sk)->udp_portaddr_node); --- Basically the idea is to leverage the hash4 - which should be not yet initialized when rehash is invoked due to connect(). In such a case, before touching hash{,2}, do hash4. /P
On Fri, 6 Dec 2024 13:36:47 +0100 Paolo Abeni <pabeni@redhat.com> wrote: > On 12/6/24 11:50, Stefano Brivio wrote: > > On Thu, 5 Dec 2024 17:53:33 +0100 Paolo Abeni <pabeni@redhat.com> wrote: > >> I'm wondering if the issue could be solved (almost) entirely in the > >> rehash callback?!? if the rehash happens on connect and the the socket > >> does not have hash4 yet (it's not a reconnect) do the l4 hashing before > >> everything else. > > > > So, yes, that's actually the first thing I tried: do the hashing (any > > hash) before setting the address (I guess that's what you mean by > > "everything else"). > > > > If you take this series, and drop the changes in __udp4_lib_lookup(), I > > guess that would match what you suggest. > > I mean something slightly different. Just to explain the idea something > alike the following (completely untested): > > --- > diff --git a/net/ipv4/datagram.c b/net/ipv4/datagram.c > index cc6d0bd7b0a9..e9cc6edbcdc6 100644 > --- a/net/ipv4/datagram.c > +++ b/net/ipv4/datagram.c > @@ -61,6 +61,10 @@ int __ip4_datagram_connect(struct sock *sk, struct > sockaddr *uaddr, int addr_len > err = -EACCES; > goto out; > } > + > + sk->sk_state = TCP_ESTABLISHED; > + inet->inet_daddr = fl4->daddr; > + inet->inet_dport = usin->sin_port; > if (!inet->inet_saddr) > inet->inet_saddr = fl4->saddr; /* Update source address */ > if (!inet->inet_rcv_saddr) { > @@ -68,10 +72,7 @@ int __ip4_datagram_connect(struct sock *sk, struct > sockaddr *uaddr, int addr_len > if (sk->sk_prot->rehash) > sk->sk_prot->rehash(sk); > } > - inet->inet_daddr = fl4->daddr; > - inet->inet_dport = usin->sin_port; > reuseport_has_conns_set(sk); > - sk->sk_state = TCP_ESTABLISHED; > sk_set_txhash(sk); > atomic_set(&inet->inet_id, get_random_u16()); > > diff --git a/net/ipv4/udp.c b/net/ipv4/udp.c > index 6a01905d379f..c6c58b0a6b7b 100644 > --- a/net/ipv4/udp.c > +++ b/net/ipv4/udp.c > @@ -2194,6 +2194,21 @@ void udp_lib_rehash(struct sock *sk, u16 newhash, > u16 newhash4) > if 
(rcu_access_pointer(sk->sk_reuseport_cb)) > reuseport_detach_sock(sk); > > + if (sk->sk_state == TCP_ESTABLISHED && !udp_hashed4(sk)) { > + struct udp_hslot * hslot4 = udp_hashslot4(udptable, newhash4); > + > + udp_sk(sk)->udp_lrpa_hash = newhash4; > + spin_lock(&hslot4->lock); > + hlist_nulls_add_head_rcu(&udp_sk(sk)->udp_lrpa_node, > + &hslot4->nulls_head); > + hslot4->count++; > + spin_unlock(&hslot4->lock); > + > + spin_lock(&hslot2->lock); > + udp_hash4_inc(hslot2); > + spin_unlock(&hslot2->lock); > + } > + > if (hslot2 != nhslot2) { > spin_lock(&hslot2->lock); > hlist_del_init_rcu(&udp_sk(sk)->udp_portaddr_node); > --- > > Basically the idea is to leverage the hash4 - which should be not yet > initialized when rehash is invoked due to connect(). That assumption seems to be correct from my tests. > In such a case, before touching hash{,2}, do hash4. Brilliant, thanks. I'll give that a try.
On 12/6/24 14:35, Stefano Brivio wrote: > On Fri, 6 Dec 2024 13:36:47 +0100 > Paolo Abeni <pabeni@redhat.com> wrote: > >> On 12/6/24 11:50, Stefano Brivio wrote: >>> On Thu, 5 Dec 2024 17:53:33 +0100 Paolo Abeni <pabeni@redhat.com> wrote: >>>> I'm wondering if the issue could be solved (almost) entirely in the >>>> rehash callback?!? if the rehash happens on connect and the the socket >>>> does not have hash4 yet (it's not a reconnect) do the l4 hashing before >>>> everything else. >>> >>> So, yes, that's actually the first thing I tried: do the hashing (any >>> hash) before setting the address (I guess that's what you mean by >>> "everything else"). >>> >>> If you take this series, and drop the changes in __udp4_lib_lookup(), I >>> guess that would match what you suggest. >> >> I mean something slightly different. Just to explain the idea something >> alike the following (completely untested): >> >> --- >> diff --git a/net/ipv4/datagram.c b/net/ipv4/datagram.c >> index cc6d0bd7b0a9..e9cc6edbcdc6 100644 >> --- a/net/ipv4/datagram.c >> +++ b/net/ipv4/datagram.c >> @@ -61,6 +61,10 @@ int __ip4_datagram_connect(struct sock *sk, struct >> sockaddr *uaddr, int addr_len >> err = -EACCES; >> goto out; >> } >> + >> + sk->sk_state = TCP_ESTABLISHED; >> + inet->inet_daddr = fl4->daddr; >> + inet->inet_dport = usin->sin_port; >> if (!inet->inet_saddr) >> inet->inet_saddr = fl4->saddr; /* Update source address */ >> if (!inet->inet_rcv_saddr) { >> @@ -68,10 +72,7 @@ int __ip4_datagram_connect(struct sock *sk, struct >> sockaddr *uaddr, int addr_len >> if (sk->sk_prot->rehash) >> sk->sk_prot->rehash(sk); >> } >> - inet->inet_daddr = fl4->daddr; >> - inet->inet_dport = usin->sin_port; Side note: I think that moving the initialization of the above fields before the rehash is separate fix - otherwise reconnect will screw hash4. I'll submit just that (sub) chunk separately. Cheers, Paolo
On Fri, Dec 06, 2024 at 10:04:33AM +0100, Eric Dumazet wrote: > On Fri, Dec 6, 2024 at 3:16 AM David Gibson <david@gibson.dropbear.id.au> wrote: > > > > On Thu, Dec 05, 2024 at 11:52:38PM +0100, Eric Dumazet wrote: > > > On Thu, Dec 5, 2024 at 11:32 PM David Gibson > > > <david@gibson.dropbear.id.au> wrote: > > > > > > > > On Thu, Dec 05, 2024 at 05:35:52PM +0100, Eric Dumazet wrote: > > > > > On Wed, Dec 4, 2024 at 11:12 PM Stefano Brivio <sbrivio@redhat.com> wrote: > > > > [snip] > > > > > > diff --git a/net/ipv4/udp.c b/net/ipv4/udp.c > > > > > > index 6a01905d379f..8490408f6009 100644 > > > > > > --- a/net/ipv4/udp.c > > > > > > +++ b/net/ipv4/udp.c > > > > > > @@ -639,18 +639,21 @@ struct sock *__udp4_lib_lookup(const struct net *net, __be32 saddr, > > > > > > int sdif, struct udp_table *udptable, struct sk_buff *skb) > > > > > > { > > > > > > unsigned short hnum = ntohs(dport); > > > > > > - struct udp_hslot *hslot2; > > > > > > + struct udp_hslot *hslot, *hslot2; > > > > > > struct sock *result, *sk; > > > > > > unsigned int hash2; > > > > > > > > > > > > + hslot = udp_hashslot(udptable, net, hnum); > > > > > > + spin_lock_bh(&hslot->lock); > > > > > > > > > > This is not acceptable. > > > > > UDP is best effort, packets can be dropped. > > > > > Please fix user application expectations. > > > > > > > > The packets aren't merely dropped, they're rejected with an ICMP Port > > > > Unreachable. > > > > > > We made UDP stack scalable with RCU, it took years of work. > > > > > > And this patch is bringing back the UDP stack to horrible performance > > > from more than a decade ago. > > > Everybody will go back to DPDK. > > > > It's reasonable to be concerned about the performance impact. But > > this seems like preamture hyperbole given no-one has numbers yet, or > > has even suggested a specific benchmark to reveal the impact. > > > > > I am pretty certain this can be solved without using a spinlock in the > > > fast path. > > > > Quite possibly. 
But Stefano has tried, and it certainly wasn't > > trivial. > > > > > Think about UDP DNS/QUIC servers, using SO_REUSEPORT and receiving > > > 10,000,000 packets per second.... > > > > > > Changing source address on an UDP socket is highly unusual, we are not > > > going to slow down UDP for this case. > > > > Changing in a general way is very rare, one specific case is not. > > Every time you connect() a socket that wasn't previously bound to a > > specific address you get an implicit source address change from > > 0.0.0.0 or :: to something that depends on the routing table. > > > > > Application could instead open another socket, and would probably work > > > on old linux versions. > > > > Possibly there's a procedure that would work here, but it's not at all > > obvious: > > > > * Clearly, you can't close the non-connected socket before opening > > the connected one - that just introduces a new much wider race. It > > doesn't even get rid of the existing one, because unless you can > > independently predict what the correct bound address will be > > for a given peer address, the second socket will still have an > > address change when you connect(). > > > > The order is kind of obvious. > > Kernel does not have to deal with wrong application design. What we're talking about is: bind("0.0.0.0:12345"); connect("1.2.3.4:54321"); Which AFAIK has been a legal sequence since the sockets interface was a thing. I don't think it's reasonable to call expecting that *not* to trigger ICMPs around the connect "wrong application design". > > * So, you must create the connected socket before closing the > > unconnected one, meaning you have to use SO_REUSEADDR or > > SO_REUSEPORT whether or not you otherwise wanted to. > > > > * While both sockets are open, you need to handle the possibility > > that packets could be delivered to either one. Doable, but a pain > > in the arse. 
> > Given UDP does not have a proper listen() + accept() model, I am > afraid this is the only way > > You need to keep the generic UDP socket as a catch all, and deal with > packets received on it. > > > > > * How do you know when the transition is completed and you can close > > the unconnected socket? The fact that the rehashing has completed > > and all the necessary memory barriers passed isn't something > > userspace can directly discern. > > > > > If the regression was recent, this would be considered as a normal regression, > > > but apparently nobody noticed for 10 years. This should be saying something... > > > > It does. But so does the fact that it can be trivially reproduced. > > If a kernel fix is doable without making UDP stack a complete nogo for > most of us, The benchmarks Stefano has tried so far don't show an impact, and you haven't yet suggested another one. Again, calling this a "complete nogo" seems like huge hyperbole without more data. > I will be happy to review it. >
On Fri, 6 Dec 2024 14:35:35 +0100 Stefano Brivio <sbrivio@redhat.com> wrote: > On Fri, 6 Dec 2024 13:36:47 +0100 > Paolo Abeni <pabeni@redhat.com> wrote: > > > On 12/6/24 11:50, Stefano Brivio wrote: > > > On Thu, 5 Dec 2024 17:53:33 +0100 Paolo Abeni <pabeni@redhat.com> wrote: > > >> I'm wondering if the issue could be solved (almost) entirely in the > > >> rehash callback?!? if the rehash happens on connect and the the socket > > >> does not have hash4 yet (it's not a reconnect) do the l4 hashing before > > >> everything else. > > > > > > So, yes, that's actually the first thing I tried: do the hashing (any > > > hash) before setting the address (I guess that's what you mean by > > > "everything else"). > > > > > > If you take this series, and drop the changes in __udp4_lib_lookup(), I > > > guess that would match what you suggest. > > > > I mean something slightly different. Just to explain the idea something > > alike the following (completely untested): > > > > --- > > diff --git a/net/ipv4/datagram.c b/net/ipv4/datagram.c > > index cc6d0bd7b0a9..e9cc6edbcdc6 100644 > > --- a/net/ipv4/datagram.c > > +++ b/net/ipv4/datagram.c > > @@ -61,6 +61,10 @@ int __ip4_datagram_connect(struct sock *sk, struct > > sockaddr *uaddr, int addr_len > > err = -EACCES; > > goto out; > > } > > + > > + sk->sk_state = TCP_ESTABLISHED; > > + inet->inet_daddr = fl4->daddr; > > + inet->inet_dport = usin->sin_port; > > if (!inet->inet_saddr) > > inet->inet_saddr = fl4->saddr; /* Update source address */ > > if (!inet->inet_rcv_saddr) { > > @@ -68,10 +72,7 @@ int __ip4_datagram_connect(struct sock *sk, struct > > sockaddr *uaddr, int addr_len > > if (sk->sk_prot->rehash) > > sk->sk_prot->rehash(sk); > > } > > - inet->inet_daddr = fl4->daddr; > > - inet->inet_dport = usin->sin_port; > > reuseport_has_conns_set(sk); > > - sk->sk_state = TCP_ESTABLISHED; > > sk_set_txhash(sk); > > atomic_set(&inet->inet_id, get_random_u16()); > > > > diff --git a/net/ipv4/udp.c b/net/ipv4/udp.c > > index 
6a01905d379f..c6c58b0a6b7b 100644 > > --- a/net/ipv4/udp.c > > +++ b/net/ipv4/udp.c > > @@ -2194,6 +2194,21 @@ void udp_lib_rehash(struct sock *sk, u16 newhash, > > u16 newhash4) > > if (rcu_access_pointer(sk->sk_reuseport_cb)) > > reuseport_detach_sock(sk); > > > > + if (sk->sk_state == TCP_ESTABLISHED && !udp_hashed4(sk)) { > > + struct udp_hslot * hslot4 = udp_hashslot4(udptable, newhash4); > > + > > + udp_sk(sk)->udp_lrpa_hash = newhash4; > > + spin_lock(&hslot4->lock); > > + hlist_nulls_add_head_rcu(&udp_sk(sk)->udp_lrpa_node, > > + &hslot4->nulls_head); > > + hslot4->count++; > > + spin_unlock(&hslot4->lock); > > + > > + spin_lock(&hslot2->lock); > > + udp_hash4_inc(hslot2); > > + spin_unlock(&hslot2->lock); > > + } > > + > > if (hslot2 != nhslot2) { > > spin_lock(&hslot2->lock); > > hlist_del_init_rcu(&udp_sk(sk)->udp_portaddr_node); > > --- > > > > Basically the idea is to leverage the hash4 - which should be not yet > > initialized when rehash is invoked due to connect(). > > That assumption seems to be correct from my tests. ...but that doesn't help in a general case, because we don't have a wildcard lookup for four-tuple hashes, more on that below. > > In such a case, before touching hash{,2}, do hash4. > > Brilliant, thanks. I'll give that a try. It sounded like a nice idea and I actually tried quite hard, but it can't work (so I'm posting a different/simpler fix), mostly for three reasons (plus a bunch that would require sparse but doable changes): 1. we can't use four-tuple hashes on CONFIG_BASE_SMALL=y, and it would be rather weird to leave it unfixed in that case (and, worse, to have substantially different behaviours depending on CONFIG_BASE_SMALL). At the same time, I see your point about it (from review to v4 of the four-tuple hash series), and I don't feel like it's worth adding it back also for CONFIG_BASE_SMALL. 
I tried adding some special handling based on a similar concept that wouldn't make struct udp_table bigger, but it's strictly more complicated than the other fix I'm posting. 2. hash4_cnt is stored in the secondary hash slot, and I see why, but that means that if the secondary hash doesn't match, we'll also fail the lookup based on four-tuple hash. We could introduce a special case in the lookup, perhaps as fallback only, ignoring the result of udp_has_hash4(), but it looks rather convoluted (especially compared to the fix I'm posting) 3. we would need another version of udp{4,6}_lib_lookup4() (or a branch inside it), handling wildcard lookups like udp{4,6}_lib_lookup2() does, and then call udp{4,6}_lib_lookup4() a second time with INADDR_ANY / &in6addr_any, because we don't know if the receive address changed yet, as we're performing the lookup. So, instead, I'm resorting to the primary hash, as fallback only. If what we need is a hash that doesn't include the address, such as an "uninitialised" four-tuple hash, we can as well use the original hash that doesn't include addresses by design.
diff --git a/include/net/inet_hashtables.h b/include/net/inet_hashtables.h index 5eea47f135a4..7f05e7ebc2e4 100644 --- a/include/net/inet_hashtables.h +++ b/include/net/inet_hashtables.h @@ -525,6 +525,19 @@ static inline void sk_rcv_saddr_set(struct sock *sk, __be32 addr) #endif } +static inline void inet_update_saddr(struct sock *sk, void *saddr, int family) +{ + if (family == AF_INET) { + inet_sk(sk)->inet_saddr = *(__be32 *)saddr; + sk_rcv_saddr_set(sk, inet_sk(sk)->inet_saddr); + } +#if IS_ENABLED(CONFIG_IPV6) + else { + sk->sk_v6_rcv_saddr = *(struct in6_addr *)saddr; + } +#endif +} + int __inet_hash_connect(struct inet_timewait_death_row *death_row, struct sock *sk, u64 port_offset, int (*check_established)(struct inet_timewait_death_row *, diff --git a/include/net/sock.h b/include/net/sock.h index 7464e9f9f47c..1410036e4f5a 100644 --- a/include/net/sock.h +++ b/include/net/sock.h @@ -1228,6 +1228,7 @@ struct proto { int (*connect)(struct sock *sk, struct sockaddr *uaddr, int addr_len); + void (*set_rcv_saddr)(struct sock *sk, void *addr); int (*disconnect)(struct sock *sk, int flags); struct sock * (*accept)(struct sock *sk, @@ -1269,7 +1270,6 @@ struct proto { /* Keeping track of sk's, looking them up, and port selection methods. 
 */
	int			(*hash)(struct sock *sk);
	void			(*unhash)(struct sock *sk);
-	void			(*rehash)(struct sock *sk);
	int			(*get_port)(struct sock *sk, unsigned short snum);
	void			(*put_port)(struct sock *sk);
#ifdef CONFIG_BPF_SYSCALL
diff --git a/include/net/udp.h b/include/net/udp.h
index 6e89520e100d..8283ea5768de 100644
--- a/include/net/udp.h
+++ b/include/net/udp.h
@@ -302,7 +302,6 @@ static inline int udp_lib_hash(struct sock *sk)
 }
 
 void udp_lib_unhash(struct sock *sk);
-void udp_lib_rehash(struct sock *sk, u16 new_hash, u16 new_hash4);
 u32 udp_ehashfn(const struct net *net, const __be32 laddr, const __u16 lport,
 		const __be32 faddr, const __be16 fport);
 
@@ -411,6 +410,8 @@ int udp_rcv(struct sk_buff *skb);
 int udp_ioctl(struct sock *sk, int cmd, int *karg);
 int udp_init_sock(struct sock *sk);
 int udp_pre_connect(struct sock *sk, struct sockaddr *uaddr, int addr_len);
+void udp_lib_set_rcv_saddr(struct sock *sk, void *addr, u16 hash, u16 hash4);
+int udp_set_rcv_saddr(struct sock *sk, void *addr);
 int __udp_disconnect(struct sock *sk, int flags);
 int udp_disconnect(struct sock *sk, int flags);
 __poll_t udp_poll(struct file *file, struct socket *sock, poll_table *wait);
diff --git a/net/core/sock.c b/net/core/sock.c
index 74729d20cd00..221c904d870d 100644
--- a/net/core/sock.c
+++ b/net/core/sock.c
@@ -641,8 +641,16 @@ static int sock_bindtoindex_locked(struct sock *sk, int ifindex)
 
 	/* Paired with all READ_ONCE() done locklessly. */
 	WRITE_ONCE(sk->sk_bound_dev_if, ifindex);
-	if (sk->sk_prot->rehash)
-		sk->sk_prot->rehash(sk);
+	/* Force rehash if protocol needs it */
+	if (sk->sk_prot->set_rcv_saddr) {
+		if (sk->sk_family == AF_INET)
+			sk->sk_prot->set_rcv_saddr(sk, &sk->sk_rcv_saddr);
+#if IS_ENABLED(CONFIG_IPV6)
+		else if (sk->sk_family == AF_INET6)
+			sk->sk_prot->set_rcv_saddr(sk, &sk->sk_v6_rcv_saddr);
+#endif
+	}
+
 	sk_dst_reset(sk);
 
 	ret = 0;
diff --git a/net/ipv4/datagram.c b/net/ipv4/datagram.c
index d52333e921f3..3ea3fa94c127 100644
--- a/net/ipv4/datagram.c
+++ b/net/ipv4/datagram.c
@@ -64,9 +64,10 @@ int __ip4_datagram_connect(struct sock *sk, struct sockaddr *uaddr, int addr_len
 	if (!inet->inet_saddr)
 		inet->inet_saddr = fl4->saddr;	/* Update source address */
 	if (!inet->inet_rcv_saddr) {
-		inet->inet_rcv_saddr = fl4->saddr;
-		if (sk->sk_prot->rehash && sk->sk_family == AF_INET)
-			sk->sk_prot->rehash(sk);
+		if (sk->sk_prot->set_rcv_saddr && sk->sk_family == AF_INET)
+			sk->sk_prot->set_rcv_saddr(sk, &fl4->saddr);
+		else
+			inet->inet_rcv_saddr = fl4->saddr;
 	}
 	inet->inet_daddr = fl4->daddr;
 	inet->inet_dport = usin->sin_port;
diff --git a/net/ipv4/inet_hashtables.c b/net/ipv4/inet_hashtables.c
index 9bfcfd016e18..74e6a3604bcf 100644
--- a/net/ipv4/inet_hashtables.c
+++ b/net/ipv4/inet_hashtables.c
@@ -874,19 +874,6 @@ inet_bhash2_addr_any_hashbucket(const struct sock *sk, const struct net *net, in
 	return &hinfo->bhash2[hash & (hinfo->bhash_size - 1)];
 }
 
-static void inet_update_saddr(struct sock *sk, void *saddr, int family)
-{
-	if (family == AF_INET) {
-		inet_sk(sk)->inet_saddr = *(__be32 *)saddr;
-		sk_rcv_saddr_set(sk, inet_sk(sk)->inet_saddr);
-	}
-#if IS_ENABLED(CONFIG_IPV6)
-	else {
-		sk->sk_v6_rcv_saddr = *(struct in6_addr *)saddr;
-	}
-#endif
-}
-
 static int __inet_bhash2_update_saddr(struct sock *sk, void *saddr, int family, bool reset)
 {
 	struct inet_hashinfo *hinfo = tcp_or_dccp_get_hashinfo(sk);
diff --git a/net/ipv4/udp.c b/net/ipv4/udp.c
index 6a01905d379f..8490408f6009 100644
--- a/net/ipv4/udp.c
+++ b/net/ipv4/udp.c
@@ -639,18 +639,21 @@ struct sock *__udp4_lib_lookup(const struct net *net, __be32 saddr,
 		int sdif, struct udp_table *udptable, struct sk_buff *skb)
 {
 	unsigned short hnum = ntohs(dport);
-	struct udp_hslot *hslot2;
+	struct udp_hslot *hslot, *hslot2;
 	struct sock *result, *sk;
 	unsigned int hash2;
 
+	hslot = udp_hashslot(udptable, net, hnum);
+	spin_lock_bh(&hslot->lock);
+
 	hash2 = ipv4_portaddr_hash(net, daddr, hnum);
 	hslot2 = udp_hashslot2(udptable, hash2);
 
 	if (udp_has_hash4(hslot2)) {
 		result = udp4_lib_lookup4(net, saddr, sport, daddr, hnum,
 					  dif, sdif, udptable);
-		if (result) /* udp4_lib_lookup4 return sk or NULL */
-			return result;
+		if (result) /* udp4_lib_lookup4() returns sk or NULL */
+			goto done;
 	}
 
 	/* Lookup connected or non-wildcard socket */
@@ -684,6 +687,8 @@ struct sock *__udp4_lib_lookup(const struct net *net, __be32 saddr,
 					  htonl(INADDR_ANY), hnum, dif, sdif,
 					  hslot2, skb);
 done:
+	spin_unlock_bh(&hslot->lock);
+
 	if (IS_ERR(result))
 		return NULL;
 	return result;
@@ -2118,10 +2123,12 @@ int __udp_disconnect(struct sock *sk, int flags)
 	sock_rps_reset_rxhash(sk);
 	sk->sk_bound_dev_if = 0;
 	if (!(sk->sk_userlocks & SOCK_BINDADDR_LOCK)) {
+		if (sk->sk_prot->set_rcv_saddr &&
+		    (sk->sk_userlocks & SOCK_BINDPORT_LOCK)) {
+			sk->sk_prot->set_rcv_saddr(sk, &(__be32){ 0 });
+		}
+
 		inet_reset_saddr(sk);
-		if (sk->sk_prot->rehash &&
-		    (sk->sk_userlocks & SOCK_BINDPORT_LOCK))
-			sk->sk_prot->rehash(sk);
 	}
 
 	if (!(sk->sk_userlocks & SOCK_BINDPORT_LOCK)) {
@@ -2172,25 +2179,37 @@ void udp_lib_unhash(struct sock *sk)
 }
 EXPORT_SYMBOL(udp_lib_unhash);
 
-/*
- * inet_rcv_saddr was changed, we must rehash secondary hash
+/**
+ * udp_lib_set_rcv_saddr() - Set local address and rehash socket atomically
+ * @sk: Socket changing local address
+ * @addr: New address, __be32 * or struct in6_addr * depending on family
+ * @hash: New secondary hash (local port and address) for socket
+ * @hash4: New 4-tuple hash (local/remote port and address) for socket
+ *
+ * Set local address for socket and rehash it while holding a spinlock on the
+ * primary hash chain (port only). This needs to be atomic to avoid that a
+ * concurrent lookup misses a socket while it's being connected or disconnected.
  */
-void udp_lib_rehash(struct sock *sk, u16 newhash, u16 newhash4)
+void udp_lib_set_rcv_saddr(struct sock *sk, void *addr, u16 hash, u16 hash4)
 {
+	struct udp_hslot *hslot;
+
 	if (sk_hashed(sk)) {
 		struct udp_table *udptable = udp_get_table_prot(sk);
-		struct udp_hslot *hslot, *hslot2, *nhslot2;
+		struct udp_hslot *hslot2, *nhslot2;
 
 		hslot2 = udp_hashslot2(udptable, udp_sk(sk)->udp_portaddr_hash);
-		nhslot2 = udp_hashslot2(udptable, newhash);
-		udp_sk(sk)->udp_portaddr_hash = newhash;
+		nhslot2 = udp_hashslot2(udptable, hash);
+
+		hslot = udp_hashslot(udptable, sock_net(sk),
+				     udp_sk(sk)->udp_port_hash);
+
+		spin_lock_bh(&hslot->lock);
+
+		udp_sk(sk)->udp_portaddr_hash = hash;
 
 		if (hslot2 != nhslot2 ||
 		    rcu_access_pointer(sk->sk_reuseport_cb)) {
-			hslot = udp_hashslot(udptable, sock_net(sk),
-					     udp_sk(sk)->udp_port_hash);
-			/* we must lock primary chain too */
-			spin_lock_bh(&hslot->lock);
 			if (rcu_access_pointer(sk->sk_reuseport_cb))
 				reuseport_detach_sock(sk);
 
@@ -2208,7 +2227,7 @@ void udp_lib_rehash(struct sock *sk, u16 newhash, u16 newhash4)
 			}
 
 			if (udp_hashed4(sk)) {
-				udp_rehash4(udptable, sk, newhash4);
+				udp_rehash4(udptable, sk, hash4);
 
 				if (hslot2 != nhslot2) {
 					spin_lock(&hslot2->lock);
@@ -2220,23 +2239,32 @@ void udp_lib_rehash(struct sock *sk, u16 newhash, u16 newhash4)
 					spin_unlock(&nhslot2->lock);
 				}
 			}
-			spin_unlock_bh(&hslot->lock);
 		}
 	}
+
+	inet_update_saddr(sk, addr, sk->sk_family);
+
+	if (sk_hashed(sk))
+		spin_unlock_bh(&hslot->lock);
 }
-EXPORT_SYMBOL(udp_lib_rehash);
+EXPORT_SYMBOL(udp_lib_set_rcv_saddr);
 
-void udp_v4_rehash(struct sock *sk)
+/**
+ * udp_v4_set_rcv_saddr() - Set local address and new hash for IPv4 socket
+ * @sk: Socket changing local address
+ * @addr: New address, pointer to __be32 representation
+ */
+void udp_v4_set_rcv_saddr(struct sock *sk, void *addr)
 {
-	u16 new_hash = ipv4_portaddr_hash(sock_net(sk),
-					  inet_sk(sk)->inet_rcv_saddr,
-					  inet_sk(sk)->inet_num);
-	u16 new_hash4 = udp_ehashfn(sock_net(sk),
-				    sk->sk_rcv_saddr, sk->sk_num,
-				    sk->sk_daddr, sk->sk_dport);
+	u16 hash = ipv4_portaddr_hash(sock_net(sk),
+				      *(__be32 *)addr, inet_sk(sk)->inet_num);
+	u16 hash4 = udp_ehashfn(sock_net(sk),
+				*(__be32 *)addr, sk->sk_num,
+				sk->sk_daddr, sk->sk_dport);
 
-	udp_lib_rehash(sk, new_hash, new_hash4);
+	udp_lib_set_rcv_saddr(sk, addr, hash, hash4);
 }
+EXPORT_SYMBOL(udp_v4_set_rcv_saddr);
 
 static int __udp_queue_rcv_skb(struct sock *sk, struct sk_buff *skb)
 {
@@ -3129,6 +3157,7 @@ struct proto udp_prot = {
 	.close			= udp_lib_close,
 	.pre_connect		= udp_pre_connect,
 	.connect		= udp_connect,
+	.set_rcv_saddr		= udp_v4_set_rcv_saddr,
 	.disconnect		= udp_disconnect,
 	.ioctl			= udp_ioctl,
 	.init			= udp_init_sock,
@@ -3141,7 +3170,6 @@ struct proto udp_prot = {
 	.release_cb		= ip4_datagram_release_cb,
 	.hash			= udp_lib_hash,
 	.unhash			= udp_lib_unhash,
-	.rehash			= udp_v4_rehash,
 	.get_port		= udp_v4_get_port,
 	.put_port		= udp_lib_unhash,
 #ifdef CONFIG_BPF_SYSCALL
diff --git a/net/ipv4/udp_impl.h b/net/ipv4/udp_impl.h
index e1ff3a375996..1c5ad903c064 100644
--- a/net/ipv4/udp_impl.h
+++ b/net/ipv4/udp_impl.h
@@ -10,7 +10,7 @@ int __udp4_lib_rcv(struct sk_buff *, struct udp_table *, int);
 int __udp4_lib_err(struct sk_buff *, u32, struct udp_table *);
 
 int udp_v4_get_port(struct sock *sk, unsigned short snum);
-void udp_v4_rehash(struct sock *sk);
+void udp_v4_set_rcv_saddr(struct sock *sk, void *addr);
 
 int udp_setsockopt(struct sock *sk, int level, int optname, sockptr_t optval,
 		   unsigned int optlen);
diff --git a/net/ipv4/udplite.c b/net/ipv4/udplite.c
index af37af3ab727..4a5c2e080120 100644
--- a/net/ipv4/udplite.c
+++ b/net/ipv4/udplite.c
@@ -57,7 +57,7 @@ struct proto udplite_prot = {
 	.recvmsg	   = udp_recvmsg,
 	.hash		   = udp_lib_hash,
 	.unhash		   = udp_lib_unhash,
-	.rehash		   = udp_v4_rehash,
+	.set_rcv_saddr	   = udp_v4_set_rcv_saddr,
 	.get_port	   = udp_v4_get_port,
 
 	.memory_allocated  = &udp_memory_allocated,
diff --git a/net/ipv6/datagram.c b/net/ipv6/datagram.c
index 5c28a11128c7..ec659bc97c04 100644
--- a/net/ipv6/datagram.c
+++ b/net/ipv6/datagram.c
@@ -104,10 +104,19 @@ int ip6_datagram_dst_update(struct sock *sk, bool fix_sk_saddr)
 			np->saddr = fl6.saddr;
 
 		if (ipv6_addr_any(&sk->sk_v6_rcv_saddr)) {
-			sk->sk_v6_rcv_saddr = fl6.saddr;
-			inet->inet_rcv_saddr = LOOPBACK4_IPV6;
-			if (sk->sk_prot->rehash)
-				sk->sk_prot->rehash(sk);
+			__be32 v4addr = LOOPBACK4_IPV6;
+
+			if (sk->sk_prot->set_rcv_saddr &&
+			    sk->sk_family == AF_INET)
+				sk->sk_prot->set_rcv_saddr(sk, &v4addr);
+			else
+				inet->inet_rcv_saddr = v4addr;
+
+			if (sk->sk_prot->set_rcv_saddr &&
+			    sk->sk_family == AF_INET6)
+				sk->sk_prot->set_rcv_saddr(sk, &fl6.saddr);
+			else
+				sk->sk_v6_rcv_saddr = fl6.saddr;
 		}
 	}
 
@@ -209,10 +218,15 @@ int __ip6_datagram_connect(struct sock *sk, struct sockaddr *uaddr,
 
 	if (ipv6_addr_any(&sk->sk_v6_rcv_saddr) ||
 	    ipv6_mapped_addr_any(&sk->sk_v6_rcv_saddr)) {
-		ipv6_addr_set_v4mapped(inet->inet_rcv_saddr,
-				       &sk->sk_v6_rcv_saddr);
-		if (sk->sk_prot->rehash && sk->sk_family == AF_INET6)
-			sk->sk_prot->rehash(sk);
+		struct in6_addr v4mapped;
+
+		ipv6_addr_set_v4mapped(inet->inet_rcv_saddr, &v4mapped);
+
+		if (sk->sk_prot->set_rcv_saddr &&
+		    sk->sk_family == AF_INET6)
+			sk->sk_prot->set_rcv_saddr(sk, &v4mapped);
+		else
+			sk->sk_v6_rcv_saddr = v4mapped;
 	}
 
 	goto out;
diff --git a/net/ipv6/udp.c b/net/ipv6/udp.c
index d766fd798ecf..6f7b13c53941 100644
--- a/net/ipv6/udp.c
+++ b/net/ipv6/udp.c
@@ -105,25 +105,30 @@ int udp_v6_get_port(struct sock *sk, unsigned short snum)
 	return udp_lib_get_port(sk, snum, hash2_nulladdr);
 }
 
-void udp_v6_rehash(struct sock *sk)
+/**
+ * udp_v6_set_rcv_saddr() - Set local address and new hashes for IPv6 socket
+ * @sk: Socket changing local address
+ * @addr: New address, pointer to struct in6_addr
+ */
+void udp_v6_set_rcv_saddr(struct sock *sk, void *addr)
 {
-	u16 new_hash = ipv6_portaddr_hash(sock_net(sk),
-					  &sk->sk_v6_rcv_saddr,
-					  inet_sk(sk)->inet_num);
-	u16 new_hash4;
+	u16 hash = ipv6_portaddr_hash(sock_net(sk),
+				      addr, inet_sk(sk)->inet_num);
+	u16 hash4;
 
 	if (ipv6_addr_v4mapped(&sk->sk_v6_rcv_saddr)) {
-		new_hash4 = udp_ehashfn(sock_net(sk),
-					sk->sk_rcv_saddr, sk->sk_num,
-					sk->sk_daddr, sk->sk_dport);
+		hash4 = udp_ehashfn(sock_net(sk),
+				    sk->sk_rcv_saddr, sk->sk_num,
+				    sk->sk_daddr, sk->sk_dport);
 	} else {
-		new_hash4 = udp6_ehashfn(sock_net(sk),
-					 &sk->sk_v6_rcv_saddr, sk->sk_num,
-					 &sk->sk_v6_daddr, sk->sk_dport);
+		hash4 = udp6_ehashfn(sock_net(sk),
+				     addr, sk->sk_num,
+				     &sk->sk_v6_daddr, sk->sk_dport);
 	}
 
-	udp_lib_rehash(sk, new_hash, new_hash4);
+	udp_lib_set_rcv_saddr(sk, addr, hash, hash4);
 }
+EXPORT_SYMBOL(udp_v6_set_rcv_saddr);
 
 static int compute_score(struct sock *sk, const struct net *net,
 			 const struct in6_addr *saddr, __be16 sport,
@@ -1860,6 +1865,7 @@ struct proto udpv6_prot = {
 	.close			= udp_lib_close,
 	.pre_connect		= udpv6_pre_connect,
 	.connect		= udpv6_connect,
+	.set_rcv_saddr		= udp_v6_set_rcv_saddr,
 	.disconnect		= udp_disconnect,
 	.ioctl			= udp_ioctl,
 	.init			= udpv6_init_sock,
@@ -1872,7 +1878,6 @@ struct proto udpv6_prot = {
 	.release_cb		= ip6_datagram_release_cb,
 	.hash			= udp_lib_hash,
 	.unhash			= udp_lib_unhash,
-	.rehash			= udp_v6_rehash,
 	.get_port		= udp_v6_get_port,
 	.put_port		= udp_lib_unhash,
 #ifdef CONFIG_BPF_SYSCALL
diff --git a/net/ipv6/udp_impl.h b/net/ipv6/udp_impl.h
index 0590f566379d..45166e56ef12 100644
--- a/net/ipv6/udp_impl.h
+++ b/net/ipv6/udp_impl.h
@@ -14,7 +14,7 @@ int __udp6_lib_err(struct sk_buff *, struct inet6_skb_parm *, u8, u8, int,
 
 int udpv6_init_sock(struct sock *sk);
 int udp_v6_get_port(struct sock *sk, unsigned short snum);
-void udp_v6_rehash(struct sock *sk);
+void udp_v6_set_rcv_saddr(struct sock *sk, void *addr);
 
 int udpv6_getsockopt(struct sock *sk, int level, int optname,
 		     char __user *optval, int __user *optlen);
diff --git a/net/ipv6/udplite.c b/net/ipv6/udplite.c
index a60bec9b14f1..597ca4558b3f 100644
--- a/net/ipv6/udplite.c
+++ b/net/ipv6/udplite.c
@@ -56,7 +56,7 @@ struct proto udplitev6_prot = {
 	.recvmsg	   = udpv6_recvmsg,
 	.hash		   = udp_lib_hash,
 	.unhash		   = udp_lib_unhash,
-	.rehash		   = udp_v6_rehash,
+	.set_rcv_saddr	   = udp_v6_set_rcv_saddr,
 	.get_port	   = udp_v6_get_port,
 
 	.memory_allocated  = &udp_memory_allocated,
If a UDP socket changes its local address while it's receiving
datagrams, as a result of connect(), there is a period during which
a lookup operation might fail to find it, after the address is changed
but before the secondary hash (port and address) and the four-tuple
hash (local and remote ports and addresses) are updated.

Secondary hash chains were introduced by commit 30fff9231fad ("udp:
bind() optimisation") and, as a result, a rehash operation became
needed to make a bound socket reachable again after a connect().

This operation was introduced by commit 719f835853a9 ("udp: add
rehash on connect()") which isn't however a complete fix: the socket
will be found once the rehashing completes, but not while it's
pending.

This is noticeable with a socat(1) server in UDP4-LISTEN mode, and a
client sending datagrams to it. After the server receives the first
datagram (cf. _xioopen_ipdgram_listen()), it issues a connect() to
the address of the sender, in order to set up a directed flow.

Now, if the client, running on a different CPU thread, happens to
send a (subsequent) datagram while the server's socket changes its
address, but is not rehashed yet, this will result in a failed lookup
and a port unreachable error delivered to the client, as apparent
from the following reproducer:

  LEN=$(($(cat /proc/sys/net/core/wmem_default) / 4))
  dd if=/dev/urandom bs=1 count=${LEN} of=tmp.in

  while :; do
	taskset -c 1 socat UDP4-LISTEN:1337,null-eof OPEN:tmp.out,create,trunc &
	sleep 0.1 || sleep 1
	taskset -c 2 socat OPEN:tmp.in UDP4:localhost:1337,shut-null
	wait
  done

where the client will eventually get ECONNREFUSED on a write()
(typically the second or third one of a given iteration):

  2024/11/13 21:28:23 socat[46901] E write(6, 0x556db2e3c000, 8192): Connection refused

This issue was first observed as a seldom failure in Podman's tests
checking UDP functionality while using pasta(1) to connect the
container's network namespace, which leads us to a reproducer with
the lookup error resulting in an ICMP packet on a tap device:

  LOCAL_ADDR="$(ip -j -4 addr show|jq -rM '.[] | .addr_info[0] | select(.scope == "global").local')"

  while :; do
	./pasta --config-net -p pasta.pcap -u 1337 socat UDP4-LISTEN:1337,null-eof OPEN:tmp.out,create,trunc &
	sleep 0.2 || sleep 1
	socat OPEN:tmp.in UDP4:${LOCAL_ADDR}:1337,shut-null
	wait
	cmp tmp.in tmp.out
  done

Once this fails:

  tmp.in tmp.out differ: char 8193, line 29

we can finally have a look at what's going on:

  $ tshark -r pasta.pcap
      1   0.000000                :: → ff02::16          ICMPv6 110 Multicast Listener Report Message v2
      2   0.168690      88.198.0.161 → 88.198.0.164      UDP 8234 60260 → 1337 Len=8192
      3   0.168767      88.198.0.161 → 88.198.0.164      UDP 8234 60260 → 1337 Len=8192
      4   0.168806      88.198.0.161 → 88.198.0.164      UDP 8234 60260 → 1337 Len=8192
      5   0.168827 c6:47:05:8d:dc:04 → Broadcast         ARP 42 Who has 88.198.0.161? Tell 88.198.0.164
      6   0.168851 9a:55:9a:55:9a:55 → c6:47:05:8d:dc:04 ARP 42 88.198.0.161 is at 9a:55:9a:55:9a:55
      7   0.168875      88.198.0.161 → 88.198.0.164      UDP 8234 60260 → 1337 Len=8192
      8   0.168896      88.198.0.164 → 88.198.0.161      ICMP 590 Destination unreachable (Port unreachable)
      9   0.168926      88.198.0.161 → 88.198.0.164      UDP 8234 60260 → 1337 Len=8192
     10   0.168959      88.198.0.161 → 88.198.0.164      UDP 8234 60260 → 1337 Len=8192
     11   0.168989      88.198.0.161 → 88.198.0.164      UDP 4138 60260 → 1337 Len=4096
     12   0.169010      88.198.0.161 → 88.198.0.164      UDP   42 60260 → 1337 Len=0

On the third datagram received, the network namespace of the container
initiates an ARP lookup to deliver the ICMP message.

In another variant of this reproducer, starting the client with:

  strace -f pasta --config-net -u 1337 socat UDP4-LISTEN:1337,null-eof OPEN:tmp.out,create,trunc 2>strace.log &

and connecting to the socat server using a loopback address:

  socat OPEN:tmp.in UDP4:localhost:1337,shut-null

we can more clearly observe a sendmmsg() call failing after the first
datagram is delivered:

  [pid 278012] connect(173, 0x7fff96c95fc0, 16) = 0
  [...]
  [pid 278012] recvmmsg(173, 0x7fff96c96020, 1024, MSG_DONTWAIT, NULL) = -1 EAGAIN (Resource temporarily unavailable)
  [pid 278012] sendmmsg(173, 0x561c5ad0a720, 1, MSG_NOSIGNAL) = 1
  [...]
  [pid 278012] sendmmsg(173, 0x561c5ad0a720, 1, MSG_NOSIGNAL) = -1 ECONNREFUSED (Connection refused)

and, somewhat confusingly, after a connect() on the same socket
succeeded.

To fix this, replace the rehash operation by a set_rcv_saddr()
callback holding the spinlock on the primary hash chain, just like
the rehash operation used to do, but also setting the address (via
inet_update_saddr(), moved to headers) while holding the spinlock.
To make this atomic against the lookup operation, also acquire the
spinlock on the primary chain there.

This results in some awkwardness at a caller site, specifically
sock_bindtoindex_locked(), where we really just need to rehash the
socket without changing its address. With the new operation, we now
need to forcibly set the current address again. On the other hand,
this appears more elegant than alternatives such as fetching the
spinlock reference in ip4_datagram_connect() and
ip6_datagram_connect(), and keeping the rehash operation around for
a single user also seems a tad overkill.
v1:
  - fix build with CONFIG_IPV6=n: add ifdef around sk_v6_rcv_saddr
    usage (Kuniyuki Iwashima)
  - directly use sk_rcv_saddr for IPv4 receive addresses instead of
    fetching inet_rcv_saddr (Kuniyuki Iwashima)
  - move inet_update_saddr() to inet_hashtables.h and use that to set
    IPv4/IPv6 addresses as suitable (Kuniyuki Iwashima)
  - rebase onto net-next, update commit message accordingly

Reported-by: Ed Santiago <santiago@redhat.com>
Link: https://github.com/containers/podman/issues/24147
Analysed-by: David Gibson <david@gibson.dropbear.id.au>
Fixes: 30fff9231fad ("udp: bind() optimisation")
Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
---
 include/net/inet_hashtables.h | 13 ++++++
 include/net/sock.h            |  2 +-
 include/net/udp.h             |  3 +-
 net/core/sock.c               | 12 ++++-
 net/ipv4/datagram.c           |  7 +--
 net/ipv4/inet_hashtables.c    | 13 ------
 net/ipv4/udp.c                | 84 +++++++++++++++++++++++------------
 net/ipv4/udp_impl.h           |  2 +-
 net/ipv4/udplite.c            |  2 +-
 net/ipv6/datagram.c           | 30 +++++++++----
 net/ipv6/udp.c                | 31 +++++++------
 net/ipv6/udp_impl.h           |  2 +-
 net/ipv6/udplite.c            |  2 +-
 13 files changed, 130 insertions(+), 73 deletions(-)