diff mbox series

[net] ipv4: give an IPv4 dev to blackhole_netdev

Message ID 3000792d45ca44e16c785ebe2b092e610e5b3df1.1728499633.git.lucien.xin@gmail.com (mailing list archive)
State Accepted
Commit 22600596b6756b166fd052d5facb66287e6f0bad
Delegated to: Netdev Maintainers
Headers show
Series [net] ipv4: give an IPv4 dev to blackhole_netdev | expand

Checks

Context Check Description
netdev/series_format success Single patches do not need cover letters
netdev/tree_selection success Clearly marked for net
netdev/ynl success Generated files up to date; no warnings/errors; no diff in generated;
netdev/fixes_present success Fixes tag present in non-next series
netdev/header_inline success No static functions without inline keyword in header files
netdev/build_32bit success Errors and warnings before: 6 this patch: 6
netdev/build_tools success No tools touched, skip
netdev/cc_maintainers warning 1 maintainers not CCed: dsahern@kernel.org
netdev/build_clang success Errors and warnings before: 6 this patch: 6
netdev/verify_signedoff success Signed-off-by tag matches author and committer
netdev/deprecated_api success None detected
netdev/check_selftest success No net selftest shell script
netdev/verify_fixes success Fixes tag looks correct
netdev/build_allmodconfig_warn success Errors and warnings before: 5 this patch: 5
netdev/checkpatch success total: 0 errors, 0 warnings, 0 checks, 48 lines checked
netdev/build_clang_rust success No Rust files in patch. Skipping build
netdev/kdoc success Errors and warnings before: 1 this patch: 1
netdev/source_inline success Was 0 now: 0
netdev/contest success net-next-2024-10-11--15-00 (tests: 776)

Commit Message

Xin Long Oct. 9, 2024, 6:47 p.m. UTC
After commit 8d7017fd621d ("blackhole_netdev: use blackhole_netdev to
invalidate dst entries"), blackhole_netdev was introduced to invalidate
dst cache entries on the TX path whenever the cache times out or is
flushed.

When two UDP sockets (sk1 and sk2) send messages to the same destination
simultaneously, they are using the same dst cache. If the dst cache is
invalidated on one path (sk2) while the other (sk1) is still transmitting,
sk1 may try to use the invalid dst entry.

         CPU1                   CPU2

      udp_sendmsg(sk1)       udp_sendmsg(sk2)
      udp_send_skb()
      ip_output()
                                             <--- dst timeout or flushed
                             dst_dev_put()
      ip_finish_output2()
      ip_neigh_for_gw()

This results in a scenario where ip_neigh_for_gw() returns -EINVAL because
blackhole_dev lacks an in_dev, which is needed to initialize the neigh in
arp_constructor(). This error is then propagated back to userspace,
breaking the UDP application.

The patch fixes this issue by assigning an in_dev to blackhole_dev for
IPv4, similar to what was done for IPv6 in commit e5f80fcf869a ("ipv6:
give an IPv6 dev to blackhole_netdev"). This ensures that even when the
dst entry is invalidated with blackhole_dev, it will not fail to create
the neigh entry.

As devinet_init() is called ealier than blackhole_netdev_init() in system
booting, it can not assign the in_dev to blackhole_dev in devinet_init().
As Paolo suggested, add a separate late_initcall() in devinet.c to ensure
inet_blackhole_dev_init() is called after blackhole_netdev_init().

Fixes: 8d7017fd621d ("blackhole_netdev: use blackhole_netdev to invalidate dst entries")
Signed-off-by: Xin Long <lucien.xin@gmail.com>
---
 net/ipv4/devinet.c | 35 +++++++++++++++++++++++++----------
 1 file changed, 25 insertions(+), 10 deletions(-)

Comments

Eric Dumazet Oct. 10, 2024, 6:58 p.m. UTC | #1
On Wed, Oct 9, 2024 at 8:47 PM Xin Long <lucien.xin@gmail.com> wrote:
>
> After commit 8d7017fd621d ("blackhole_netdev: use blackhole_netdev to
> invalidate dst entries"), blackhole_netdev was introduced to invalidate
> dst cache entries on the TX path whenever the cache times out or is
> flushed.
>
> When two UDP sockets (sk1 and sk2) send messages to the same destination
> simultaneously, they are using the same dst cache. If the dst cache is
> invalidated on one path (sk2) while the other (sk1) is still transmitting,
> sk1 may try to use the invalid dst entry.
>
>          CPU1                   CPU2
>
>       udp_sendmsg(sk1)       udp_sendmsg(sk2)
>       udp_send_skb()
>       ip_output()
>                                              <--- dst timeout or flushed
>                              dst_dev_put()
>       ip_finish_output2()
>       ip_neigh_for_gw()
>
> This results in a scenario where ip_neigh_for_gw() returns -EINVAL because
> blackhole_dev lacks an in_dev, which is needed to initialize the neigh in
> arp_constructor(). This error is then propagated back to userspace,
> breaking the UDP application.
>
> The patch fixes this issue by assigning an in_dev to blackhole_dev for
> IPv4, similar to what was done for IPv6 in commit e5f80fcf869a ("ipv6:
> give an IPv6 dev to blackhole_netdev"). This ensures that even when the
> dst entry is invalidated with blackhole_dev, it will not fail to create
> the neigh entry.
>
> As devinet_init() is called ealier than blackhole_netdev_init() in system
> booting, it can not assign the in_dev to blackhole_dev in devinet_init().
> As Paolo suggested, add a separate late_initcall() in devinet.c to ensure
> inet_blackhole_dev_init() is called after blackhole_netdev_init().
>
> Fixes: 8d7017fd621d ("blackhole_netdev: use blackhole_netdev to invalidate dst entries")
> Signed-off-by: Xin Long <lucien.xin@gmail.com>

Reviewed-by: Eric Dumazet <edumazet@google.com>
patchwork-bot+netdevbpf@kernel.org Oct. 11, 2024, 10:50 p.m. UTC | #2
Hello:

This patch was applied to netdev/net.git (main)
by Jakub Kicinski <kuba@kernel.org>:

On Wed,  9 Oct 2024 14:47:13 -0400 you wrote:
> After commit 8d7017fd621d ("blackhole_netdev: use blackhole_netdev to
> invalidate dst entries"), blackhole_netdev was introduced to invalidate
> dst cache entries on the TX path whenever the cache times out or is
> flushed.
> 
> When two UDP sockets (sk1 and sk2) send messages to the same destination
> simultaneously, they are using the same dst cache. If the dst cache is
> invalidated on one path (sk2) while the other (sk1) is still transmitting,
> sk1 may try to use the invalid dst entry.
> 
> [...]

Here is the summary with links:
  - [net] ipv4: give an IPv4 dev to blackhole_netdev
    https://git.kernel.org/netdev/net/c/22600596b675

You are awesome, thank you!
diff mbox series

Patch

diff --git a/net/ipv4/devinet.c b/net/ipv4/devinet.c
index ab76744383cf..7cf5f7d0d0de 100644
--- a/net/ipv4/devinet.c
+++ b/net/ipv4/devinet.c
@@ -298,17 +298,19 @@  static struct in_device *inetdev_init(struct net_device *dev)
 	/* Account for reference dev->ip_ptr (below) */
 	refcount_set(&in_dev->refcnt, 1);
 
-	err = devinet_sysctl_register(in_dev);
-	if (err) {
-		in_dev->dead = 1;
-		neigh_parms_release(&arp_tbl, in_dev->arp_parms);
-		in_dev_put(in_dev);
-		in_dev = NULL;
-		goto out;
+	if (dev != blackhole_netdev) {
+		err = devinet_sysctl_register(in_dev);
+		if (err) {
+			in_dev->dead = 1;
+			neigh_parms_release(&arp_tbl, in_dev->arp_parms);
+			in_dev_put(in_dev);
+			in_dev = NULL;
+			goto out;
+		}
+		ip_mc_init_dev(in_dev);
+		if (dev->flags & IFF_UP)
+			ip_mc_up(in_dev);
 	}
-	ip_mc_init_dev(in_dev);
-	if (dev->flags & IFF_UP)
-		ip_mc_up(in_dev);
 
 	/* we can receive as soon as ip_ptr is set -- do this last */
 	rcu_assign_pointer(dev->ip_ptr, in_dev);
@@ -347,6 +349,19 @@  static void inetdev_destroy(struct in_device *in_dev)
 	in_dev_put(in_dev);
 }
 
+static int __init inet_blackhole_dev_init(void)
+{
+	int err = 0;
+
+	rtnl_lock();
+	if (!inetdev_init(blackhole_netdev))
+		err = -ENOMEM;
+	rtnl_unlock();
+
+	return err;
+}
+late_initcall(inet_blackhole_dev_init);
+
 int inet_addr_onlink(struct in_device *in_dev, __be32 a, __be32 b)
 {
 	const struct in_ifaddr *ifa;