Message ID | 5055ba8f8f72bdcb602faa299faca73c280b7735.1739743613.git.sd@queasysnail.net (mailing list archive) |
---|---|
State | New |
Headers | show |
Series | [net,v2] tcp: drop secpath at the same time as we currently drop dst | expand |
On Mon, Feb 17, 2025 at 11:23 AM Sabrina Dubroca <sd@queasysnail.net> wrote: > > Xiumei reported hitting the WARN in xfrm6_tunnel_net_exit while > running tests that boil down to: > - create a pair of netns > - run a basic TCP test over ipcomp6 > - delete the pair of netns > > The xfrm_state found on spi_byaddr was not deleted at the time we > delete the netns, because we still have a reference on it. This > lingering reference comes from a secpath (which holds a ref on the > xfrm_state), which is still attached to an skb. This skb is not > leaked, it ends up on sk_receive_queue and then gets defer-free'd by > skb_attempt_defer_free. > > The problem happens when we defer freeing an skb (push it on one CPU's > defer_list), and don't flush that list before the netns is deleted. In > that case, we still have a reference on the xfrm_state that we don't > expect at this point. > > We already drop the skb's dst in the TCP receive path when it's no > longer needed, so let's also drop the secpath. At this point, > tcp_filter has already called into the LSM hooks that may require the > secpath, so it should not be needed anymore. However, in some of those > places, the MPTCP extension has just been attached to the skb, so we > cannot simply drop all extensions. > > Fixes: 68822bdf76f1 ("net: generalize skb freeing deferral to per-cpu lists") > Reported-by: Xiumei Mu <xmu@redhat.com> > Signed-off-by: Sabrina Dubroca <sd@queasysnail.net> > --- > v1: drop all extensions just before calling skb_attempt_defer_free > https://lore.kernel.org/netdev/879a4592e4e4bd0c30dbe29ca189e224ec1739a5.1739201151.git.sd@queasysnail.net/ > v2: - drop only secpath, as soon as possible - per Eric's feedback > - add debug warns if trying to add to sk_receive_queue an skb with > a dst or a secpath > > @Eric feel free to add some tags (Suggested-by? sign-off?) for the > code I adapted from > https://lore.kernel.org/netdev/CANn89i+JdDukwEhZ%3D41FxY-w63eER6JVixkwL+s2eSOjo6aWEQ@mail.gmail.com/ Reviewed-by: Eric Dumazet <edumazet@google.com> Thanks !
On Mon, Feb 17, 2025 at 5:23 AM Sabrina Dubroca <sd@queasysnail.net> wrote: > > Xiumei reported hitting the WARN in xfrm6_tunnel_net_exit while > running tests that boil down to: > - create a pair of netns > - run a basic TCP test over ipcomp6 > - delete the pair of netns > > The xfrm_state found on spi_byaddr was not deleted at the time we > delete the netns, because we still have a reference on it. This > lingering reference comes from a secpath (which holds a ref on the > xfrm_state), which is still attached to an skb. This skb is not > leaked, it ends up on sk_receive_queue and then gets defer-free'd by > skb_attempt_defer_free. > > The problem happens when we defer freeing an skb (push it on one CPU's > defer_list), and don't flush that list before the netns is deleted. In > that case, we still have a reference on the xfrm_state that we don't > expect at this point. > > We already drop the skb's dst in the TCP receive path when it's no > longer needed, so let's also drop the secpath. At this point, > tcp_filter has already called into the LSM hooks that may require the > secpath, so it should not be needed anymore. I don't recall seeing any follow up in the v1 patchset regarding IP_CMSG_PASSSEC/security_socket_getpeersec_dgram(), can you confirm that the secpath is preserved for that code path? https://lore.kernel.org/linux-security-module/CAHC9VhQZ+k1J0UidJ-bgdBGBuVX9M18tQ+a+fuqXQM_L-PFvzA@mail.gmail.com > However, in some of those > places, the MPTCP extension has just been attached to the skb, so we > cannot simply drop all extensions. > > Fixes: 68822bdf76f1 ("net: generalize skb freeing deferral to per-cpu lists") > Reported-by: Xiumei Mu <xmu@redhat.com> > Signed-off-by: Sabrina Dubroca <sd@queasysnail.net> > --- > v1: drop all extensions just before calling skb_attempt_defer_free > https://lore.kernel.org/netdev/879a4592e4e4bd0c30dbe29ca189e224ec1739a5.1739201151.git.sd@queasysnail.net/ > v2: - drop only secpath, as soon as possible - per Eric's feedback > - add debug warns if trying to add to sk_receive_queue an skb with > a dst or a secpath > > @Eric feel free to add some tags (Suggested-by? sign-off?) for the > code I adapted from > https://lore.kernel.org/netdev/CANn89i+JdDukwEhZ%3D41FxY-w63eER6JVixkwL+s2eSOjo6aWEQ@mail.gmail.com/ > > include/net/tcp.h | 14 ++++++++++++++ > net/ipv4/tcp_fastopen.c | 4 ++-- > net/ipv4/tcp_input.c | 8 ++++---- > net/ipv4/tcp_ipv4.c | 2 +- > 4 files changed, 21 insertions(+), 7 deletions(-)
2025-02-17, 17:35:32 -0500, Paul Moore wrote: > On Mon, Feb 17, 2025 at 5:23 AM Sabrina Dubroca <sd@queasysnail.net> wrote: > > > > Xiumei reported hitting the WARN in xfrm6_tunnel_net_exit while > > running tests that boil down to: > > - create a pair of netns > > - run a basic TCP test over ipcomp6 > > - delete the pair of netns > > > > The xfrm_state found on spi_byaddr was not deleted at the time we > > delete the netns, because we still have a reference on it. This > > lingering reference comes from a secpath (which holds a ref on the > > xfrm_state), which is still attached to an skb. This skb is not > > leaked, it ends up on sk_receive_queue and then gets defer-free'd by > > skb_attempt_defer_free. > > > > The problem happens when we defer freeing an skb (push it on one CPU's > > defer_list), and don't flush that list before the netns is deleted. In > > that case, we still have a reference on the xfrm_state that we don't > > expect at this point. > > > > We already drop the skb's dst in the TCP receive path when it's no > > longer needed, so let's also drop the secpath. At this point, > > tcp_filter has already called into the LSM hooks that may require the > > secpath, so it should not be needed anymore. > > I don't recall seeing any follow up in the v1 patchset regarding > IP_CMSG_PASSSEC/security_socket_getpeersec_dgram(), can you confirm > that the secpath is preserved for that code path? > > https://lore.kernel.org/linux-security-module/CAHC9VhQZ+k1J0UidJ-bgdBGBuVX9M18tQ+a+fuqXQM_L-PFvzA@mail.gmail.com Sorry, I thought we'd addressed this in the v1 discussion with Eric. IP_CMSG_PASSSEC is not blocked for TCP sockets, but it will only process skbs that came from the error queue (ip_recv_error -> ip_cmsg_recv -> ip_cmsg_recv_offset -> ip_cmsg_recv_security -> security_socket_getpeersec_dgram), which don't go through those code paths at all. So AFAICT IP_CMSG_PASSSEC for TCP isn't affected by dropping the secpath early.
On Mon, Feb 17, 2025 at 6:14 PM Sabrina Dubroca <sd@queasysnail.net> wrote: > 2025-02-17, 17:35:32 -0500, Paul Moore wrote: > > On Mon, Feb 17, 2025 at 5:23 AM Sabrina Dubroca <sd@queasysnail.net> wrote: > > > > > > Xiumei reported hitting the WARN in xfrm6_tunnel_net_exit while > > > running tests that boil down to: > > > - create a pair of netns > > > - run a basic TCP test over ipcomp6 > > > - delete the pair of netns > > > > > > The xfrm_state found on spi_byaddr was not deleted at the time we > > > delete the netns, because we still have a reference on it. This > > > lingering reference comes from a secpath (which holds a ref on the > > > xfrm_state), which is still attached to an skb. This skb is not > > > leaked, it ends up on sk_receive_queue and then gets defer-free'd by > > > skb_attempt_defer_free. > > > > > > The problem happens when we defer freeing an skb (push it on one CPU's > > > defer_list), and don't flush that list before the netns is deleted. In > > > that case, we still have a reference on the xfrm_state that we don't > > > expect at this point. > > > > > > We already drop the skb's dst in the TCP receive path when it's no > > > longer needed, so let's also drop the secpath. At this point, > > > tcp_filter has already called into the LSM hooks that may require the > > > secpath, so it should not be needed anymore. > > > > I don't recall seeing any follow up in the v1 patchset regarding > > IP_CMSG_PASSSEC/security_socket_getpeersec_dgram(), can you confirm > > that the secpath is preserved for that code path? > > > > https://lore.kernel.org/linux-security-module/CAHC9VhQZ+k1J0UidJ-bgdBGBuVX9M18tQ+a+fuqXQM_L-PFvzA@mail.gmail.com > > Sorry, I thought we'd addressed this in the v1 discussion with Eric. > > IP_CMSG_PASSSEC is not blocked for TCP sockets, but it will only > process skbs that came from the error queue (ip_recv_error -> > ip_cmsg_recv -> ip_cmsg_recv_offset -> ip_cmsg_recv_security -> > security_socket_getpeersec_dgram), which don't go through those code > paths at all. So AFAICT IP_CMSG_PASSSEC for TCP isn't affected by > dropping the secpath early. Great, thanks for clearing that up.
Hello: This patch was applied to netdev/net.git (main) by Paolo Abeni <pabeni@redhat.com>: On Mon, 17 Feb 2025 11:23:35 +0100 you wrote: > Xiumei reported hitting the WARN in xfrm6_tunnel_net_exit while > running tests that boil down to: > - create a pair of netns > - run a basic TCP test over ipcomp6 > - delete the pair of netns > > The xfrm_state found on spi_byaddr was not deleted at the time we > delete the netns, because we still have a reference on it. This > lingering reference comes from a secpath (which holds a ref on the > xfrm_state), which is still attached to an skb. This skb is not > leaked, it ends up on sk_receive_queue and then gets defer-free'd by > skb_attempt_defer_free. > > [...] Here is the summary with links: - [net,v2] tcp: drop secpath at the same time as we currently drop dst https://git.kernel.org/netdev/net/c/9b6412e6979f You are awesome, thank you!
diff --git a/include/net/tcp.h b/include/net/tcp.h index 5b2b04835688..930cda5b5eb9 100644 --- a/include/net/tcp.h +++ b/include/net/tcp.h @@ -41,6 +41,7 @@ #include <net/inet_ecn.h> #include <net/dst.h> #include <net/mptcp.h> +#include <net/xfrm.h> #include <linux/seq_file.h> #include <linux/memcontrol.h> @@ -683,6 +684,19 @@ void tcp_fin(struct sock *sk); void tcp_check_space(struct sock *sk); void tcp_sack_compress_send_ack(struct sock *sk); +static inline void tcp_cleanup_skb(struct sk_buff *skb) +{ + skb_dst_drop(skb); + secpath_reset(skb); +} + +static inline void tcp_add_receive_queue(struct sock *sk, struct sk_buff *skb) +{ + DEBUG_NET_WARN_ON_ONCE(skb_dst(skb)); + DEBUG_NET_WARN_ON_ONCE(secpath_exists(skb)); + __skb_queue_tail(&sk->sk_receive_queue, skb); +} + /* tcp_timer.c */ void tcp_init_xmit_timers(struct sock *); static inline void tcp_clear_xmit_timers(struct sock *sk) diff --git a/net/ipv4/tcp_fastopen.c b/net/ipv4/tcp_fastopen.c index 0f523cbfe329..32b28fc21b63 100644 --- a/net/ipv4/tcp_fastopen.c +++ b/net/ipv4/tcp_fastopen.c @@ -178,7 +178,7 @@ void tcp_fastopen_add_skb(struct sock *sk, struct sk_buff *skb) if (!skb) return; - skb_dst_drop(skb); + tcp_cleanup_skb(skb); /* segs_in has been initialized to 1 in tcp_create_openreq_child(). * Hence, reset segs_in to 0 before calling tcp_segs_in() * to avoid double counting. Also, tcp_segs_in() expects @@ -195,7 +195,7 @@ void tcp_fastopen_add_skb(struct sock *sk, struct sk_buff *skb) TCP_SKB_CB(skb)->tcp_flags &= ~TCPHDR_SYN; tp->rcv_nxt = TCP_SKB_CB(skb)->end_seq; - __skb_queue_tail(&sk->sk_receive_queue, skb); + tcp_add_receive_queue(sk, skb); tp->syn_data_acked = 1; /* u64_stats_update_begin(&tp->syncp) not needed here, diff --git a/net/ipv4/tcp_input.c b/net/ipv4/tcp_input.c index eb82e01da911..6821e5540a53 100644 --- a/net/ipv4/tcp_input.c +++ b/net/ipv4/tcp_input.c @@ -4970,7 +4970,7 @@ static void tcp_ofo_queue(struct sock *sk) tcp_rcv_nxt_update(tp, TCP_SKB_CB(skb)->end_seq); fin = TCP_SKB_CB(skb)->tcp_flags & TCPHDR_FIN; if (!eaten) - __skb_queue_tail(&sk->sk_receive_queue, skb); + tcp_add_receive_queue(sk, skb); else kfree_skb_partial(skb, fragstolen); @@ -5162,7 +5162,7 @@ static int __must_check tcp_queue_rcv(struct sock *sk, struct sk_buff *skb, skb, fragstolen)) ? 1 : 0; tcp_rcv_nxt_update(tcp_sk(sk), TCP_SKB_CB(skb)->end_seq); if (!eaten) { - __skb_queue_tail(&sk->sk_receive_queue, skb); + tcp_add_receive_queue(sk, skb); skb_set_owner_r(skb, sk); } return eaten; @@ -5245,7 +5245,7 @@ static void tcp_data_queue(struct sock *sk, struct sk_buff *skb) __kfree_skb(skb); return; } - skb_dst_drop(skb); + tcp_cleanup_skb(skb); __skb_pull(skb, tcp_hdr(skb)->doff * 4); reason = SKB_DROP_REASON_NOT_SPECIFIED; @@ -6226,7 +6226,7 @@ void tcp_rcv_established(struct sock *sk, struct sk_buff *skb) NET_INC_STATS(sock_net(sk), LINUX_MIB_TCPHPHITS); /* Bulk data transfer: receiver */ - skb_dst_drop(skb); + tcp_cleanup_skb(skb); __skb_pull(skb, tcp_header_len); eaten = tcp_queue_rcv(sk, skb, &fragstolen); diff --git a/net/ipv4/tcp_ipv4.c b/net/ipv4/tcp_ipv4.c index cc2b5194a18d..2632844d2c35 100644 --- a/net/ipv4/tcp_ipv4.c +++ b/net/ipv4/tcp_ipv4.c @@ -2027,7 +2027,7 @@ bool tcp_add_backlog(struct sock *sk, struct sk_buff *skb, */ skb_condense(skb); - skb_dst_drop(skb); + tcp_cleanup_skb(skb); if (unlikely(tcp_checksum_complete(skb))) { bh_unlock_sock(sk);
Xiumei reported hitting the WARN in xfrm6_tunnel_net_exit while running tests that boil down to: - create a pair of netns - run a basic TCP test over ipcomp6 - delete the pair of netns The xfrm_state found on spi_byaddr was not deleted at the time we delete the netns, because we still have a reference on it. This lingering reference comes from a secpath (which holds a ref on the xfrm_state), which is still attached to an skb. This skb is not leaked, it ends up on sk_receive_queue and then gets defer-free'd by skb_attempt_defer_free. The problem happens when we defer freeing an skb (push it on one CPU's defer_list), and don't flush that list before the netns is deleted. In that case, we still have a reference on the xfrm_state that we don't expect at this point. We already drop the skb's dst in the TCP receive path when it's no longer needed, so let's also drop the secpath. At this point, tcp_filter has already called into the LSM hooks that may require the secpath, so it should not be needed anymore. However, in some of those places, the MPTCP extension has just been attached to the skb, so we cannot simply drop all extensions. Fixes: 68822bdf76f1 ("net: generalize skb freeing deferral to per-cpu lists") Reported-by: Xiumei Mu <xmu@redhat.com> Signed-off-by: Sabrina Dubroca <sd@queasysnail.net> --- v1: drop all extensions just before calling skb_attempt_defer_free https://lore.kernel.org/netdev/879a4592e4e4bd0c30dbe29ca189e224ec1739a5.1739201151.git.sd@queasysnail.net/ v2: - drop only secpath, as soon as possible - per Eric's feedback - add debug warns if trying to add to sk_receive_queue an skb with a dst or a secpath @Eric feel free to add some tags (Suggested-by? sign-off?) for the code I adapted from https://lore.kernel.org/netdev/CANn89i+JdDukwEhZ%3D41FxY-w63eER6JVixkwL+s2eSOjo6aWEQ@mail.gmail.com/ include/net/tcp.h | 14 ++++++++++++++ net/ipv4/tcp_fastopen.c | 4 ++-- net/ipv4/tcp_input.c | 8 ++++---- net/ipv4/tcp_ipv4.c | 2 +- 4 files changed, 21 insertions(+), 7 deletions(-)