Message ID | 20250220072559.782296-1-idosch@nvidia.com (mailing list archive) |
---|---|
State | Accepted |
Commit | 0e4427f8f587c4b603475468bb3aee9418574893 |
Delegated to: | Netdev Maintainers |
Headers | show |
Series | [net] net: loopback: Avoid sending IP packets without an Ethernet header | expand |
On Thu, Feb 20, 2025 at 8:26 AM Ido Schimmel <idosch@nvidia.com> wrote: > > After commit 22600596b675 ("ipv4: give an IPv4 dev to blackhole_netdev") > IPv4 neighbors can be constructed on the blackhole net device, but they > are constructed with an output function (neigh_direct_output()) that > simply calls dev_queue_xmit(). The latter will transmit packets via > 'skb->dev' which might not be the blackhole net device if dst_dev_put() > switched 'dst->dev' to the blackhole net device while another CPU was > using the dst entry in ip_output(), but after it already initialized > 'skb->dev' from 'dst->dev'. > > Specifically, the following can happen: > > CPU1 CPU2 > > udp_sendmsg(sk1) udp_sendmsg(sk2) > udp_send_skb() [...] > ip_output() > skb->dev = skb_dst(skb)->dev > dst_dev_put() > dst->dev = blackhole_netdev > ip_finish_output2() > resolves neigh on dst->dev > neigh_output() > neigh_direct_output() > dev_queue_xmit() > > This will result in IPv4 packets being sent without an Ethernet header > via a valid net device: > > tcpdump: verbose output suppressed, use -v[v]... for full protocol decode > listening on enp9s0, link-type EN10MB (Ethernet), snapshot length 262144 bytes > 22:07:02.329668 20:00:40:11:18:fb > 45:00:00:44:f4:94, ethertype Unknown > (0x58c6), length 68: > 0x0000: 8dda 74ca f1ae ca6c ca6c 0098 969c 0400 ..t....l.l...... > 0x0010: 0000 4730 3f18 6800 0000 0000 0000 9971 ..G0?.h........q > 0x0020: c4c9 9055 a157 0a70 9ead bf83 38ca ab38 ...U.W.p....8..8 > 0x0030: 8add ab96 e052 .....R > > Fix by making sure that neighbors are constructed on top of the > blackhole net device with an output function that simply consumes the > packets, in a similar fashion to dst_discard_out() and > blackhole_netdev_xmit(). > > Fixes: 8d7017fd621d ("blackhole_netdev: use blackhole_netdev to invalidate dst entries") > Fixes: 22600596b675 ("ipv4: give an IPv4 dev to blackhole_netdev") > Reported-by: Florian Meister <fmei@sfs.com> > Closes: https://lore.kernel.org/netdev/20250210084931.23a5c2e4@hermes.local/ > Signed-off-by: Ido Schimmel <idosch@nvidia.com> > --- > drivers/net/loopback.c | 14 ++++++++++++++ > 1 file changed, 14 insertions(+) > > diff --git a/drivers/net/loopback.c b/drivers/net/loopback.c > index c8840c3b9a1b..f1d68153987e 100644 > --- a/drivers/net/loopback.c > +++ b/drivers/net/loopback.c > @@ -244,8 +244,22 @@ static netdev_tx_t blackhole_netdev_xmit(struct sk_buff *skb, > return NETDEV_TX_OK; > } > > +static int blackhole_neigh_output(struct neighbour *n, struct sk_buff *skb) > +{ > + kfree_skb(skb); If there is any risk of this being hit often, I would probably use the recent SKB_DROP_REASON_BLACKHOLE (feel free to resubmit https://lore.kernel.org/netdev/20250212164323.2183023-1-edumazet@google.com/T/#mbb8d4b0779cb8f0654a382772c943af5389606ea ?) Otherwise, this looks good to me, thanks ! Reviewed-by: Eric Dumazet <edumazet@google.com>
On Thu, Feb 20, 2025 at 11:40:07AM +0100, Eric Dumazet wrote: > On Thu, Feb 20, 2025 at 8:26 AM Ido Schimmel <idosch@nvidia.com> wrote: > > diff --git a/drivers/net/loopback.c b/drivers/net/loopback.c > > index c8840c3b9a1b..f1d68153987e 100644 > > --- a/drivers/net/loopback.c > > +++ b/drivers/net/loopback.c > > @@ -244,8 +244,22 @@ static netdev_tx_t blackhole_netdev_xmit(struct sk_buff *skb, > > return NETDEV_TX_OK; > > } > > > > +static int blackhole_neigh_output(struct neighbour *n, struct sk_buff *skb) > > +{ > > + kfree_skb(skb); > > If there is any risk of this being hit often, I would probably use the > recent SKB_DROP_REASON_BLACKHOLE Not very often. About 10 times while running the reproducer I shared here: https://lore.kernel.org/netdev/Z7D9cR22BDPN7WSJ@shredder/ In line with the original report: https://github.com/siderolabs/talos/issues/9837#issuecomment-2642116378 > (feel free to resubmit > https://lore.kernel.org/netdev/20250212164323.2183023-1-edumazet@google.com/T/#mbb8d4b0779cb8f0654a382772c943af5389606ea > ?) Can we do it in net-next? A few questions / suggestions regarding the patch: 1. Can we use it for IPv4 as well? I tested the following: diff --git a/net/ipv4/route.c b/net/ipv4/route.c index 753704f75b2c..2aeab70c1cb5 100644 --- a/net/ipv4/route.c +++ b/net/ipv4/route.c @@ -966,6 +966,7 @@ static int ip_error(struct sk_buff *skb) switch (rt->dst.error) { case EINVAL: + SKB_DR_SET(reason, BLACKHOLE); default: goto out; case EHOSTUNREACH: 2. Given that this reason is going to be triggered both for user-installed blackhole routes and dst entries being destroyed how about adjusting the comment? Otherwise I think it will be confusing for users who didn't install a blackhole route. Something like: diff --git a/drivers/net/loopback.c b/drivers/net/loopback.c index f1d68153987e..cb269b3251d4 100644 --- a/drivers/net/loopback.c +++ b/drivers/net/loopback.c @@ -239,14 +239,13 @@ struct pernet_operations __net_initdata loopback_net_ops = { static netdev_tx_t blackhole_netdev_xmit(struct sk_buff *skb, struct net_device *dev) { - kfree_skb(skb); - net_warn_ratelimited("%s(): Dropping skb.\n", __func__); + kfree_skb_reason(skb, SKB_DROP_REASON_BLACKHOLE); return NETDEV_TX_OK; } static int blackhole_neigh_output(struct neighbour *n, struct sk_buff *skb) { - kfree_skb(skb); + kfree_skb_reason(skb, SKB_DROP_REASON_BLACKHOLE); return 0; } diff --git a/include/net/dropreason-core.h b/include/net/dropreason-core.h index de42577f16dd..0ef6869dbd1b 100644 --- a/include/net/dropreason-core.h +++ b/include/net/dropreason-core.h @@ -556,7 +556,8 @@ enum skb_drop_reason { */ SKB_DROP_REASON_BRIDGE_INGRESS_STP_STATE, /** - * @SKB_DROP_REASON_BLACKHOLE: blackhole route. + * @SKB_DROP_REASON_BLACKHOLE: blackhole route or dst entry being + * destroyed. */ SKB_DROP_REASON_BLACKHOLE, /** > Otherwise, this looks good to me, thanks ! > > Reviewed-by: Eric Dumazet <edumazet@google.com> Thanks!
Hello: This patch was applied to netdev/net.git (main) by Jakub Kicinski <kuba@kernel.org>: On Thu, 20 Feb 2025 09:25:59 +0200 you wrote: > After commit 22600596b675 ("ipv4: give an IPv4 dev to blackhole_netdev") > IPv4 neighbors can be constructed on the blackhole net device, but they > are constructed with an output function (neigh_direct_output()) that > simply calls dev_queue_xmit(). The latter will transmit packets via > 'skb->dev' which might not be the blackhole net device if dst_dev_put() > switched 'dst->dev' to the blackhole net device while another CPU was > using the dst entry in ip_output(), but after it already initialized > 'skb->dev' from 'dst->dev'. > > [...] Here is the summary with links: - [net] net: loopback: Avoid sending IP packets without an Ethernet header https://git.kernel.org/netdev/net/c/0e4427f8f587 You are awesome, thank you!
diff --git a/drivers/net/loopback.c b/drivers/net/loopback.c index c8840c3b9a1b..f1d68153987e 100644 --- a/drivers/net/loopback.c +++ b/drivers/net/loopback.c @@ -244,8 +244,22 @@ static netdev_tx_t blackhole_netdev_xmit(struct sk_buff *skb, return NETDEV_TX_OK; } +static int blackhole_neigh_output(struct neighbour *n, struct sk_buff *skb) +{ + kfree_skb(skb); + return 0; +} + +static int blackhole_neigh_construct(struct net_device *dev, + struct neighbour *n) +{ + n->output = blackhole_neigh_output; + return 0; +} + static const struct net_device_ops blackhole_netdev_ops = { .ndo_start_xmit = blackhole_netdev_xmit, + .ndo_neigh_construct = blackhole_neigh_construct, }; /* This is a dst-dummy device used specifically for invalidated
After commit 22600596b675 ("ipv4: give an IPv4 dev to blackhole_netdev") IPv4 neighbors can be constructed on the blackhole net device, but they are constructed with an output function (neigh_direct_output()) that simply calls dev_queue_xmit(). The latter will transmit packets via 'skb->dev' which might not be the blackhole net device if dst_dev_put() switched 'dst->dev' to the blackhole net device while another CPU was using the dst entry in ip_output(), but after it already initialized 'skb->dev' from 'dst->dev'. Specifically, the following can happen: CPU1 CPU2 udp_sendmsg(sk1) udp_sendmsg(sk2) udp_send_skb() [...] ip_output() skb->dev = skb_dst(skb)->dev dst_dev_put() dst->dev = blackhole_netdev ip_finish_output2() resolves neigh on dst->dev neigh_output() neigh_direct_output() dev_queue_xmit() This will result in IPv4 packets being sent without an Ethernet header via a valid net device: tcpdump: verbose output suppressed, use -v[v]... for full protocol decode listening on enp9s0, link-type EN10MB (Ethernet), snapshot length 262144 bytes 22:07:02.329668 20:00:40:11:18:fb > 45:00:00:44:f4:94, ethertype Unknown (0x58c6), length 68: 0x0000: 8dda 74ca f1ae ca6c ca6c 0098 969c 0400 ..t....l.l...... 0x0010: 0000 4730 3f18 6800 0000 0000 0000 9971 ..G0?.h........q 0x0020: c4c9 9055 a157 0a70 9ead bf83 38ca ab38 ...U.W.p....8..8 0x0030: 8add ab96 e052 .....R Fix by making sure that neighbors are constructed on top of the blackhole net device with an output function that simply consumes the packets, in a similar fashion to dst_discard_out() and blackhole_netdev_xmit(). Fixes: 8d7017fd621d ("blackhole_netdev: use blackhole_netdev to invalidate dst entries") Fixes: 22600596b675 ("ipv4: give an IPv4 dev to blackhole_netdev") Reported-by: Florian Meister <fmei@sfs.com> Closes: https://lore.kernel.org/netdev/20250210084931.23a5c2e4@hermes.local/ Signed-off-by: Ido Schimmel <idosch@nvidia.com> --- drivers/net/loopback.c | 14 ++++++++++++++ 1 file changed, 14 insertions(+)