Message ID | 20201103191830.60151-5-saeedm@nvidia.com (mailing list archive) |
---|---|
State | Superseded |
Delegated to: | Netdev Maintainers |
Headers | show |
Series | [net,1/9] net/mlx5e: Fix modify header actions memory leak | expand |
On Tue, 3 Nov 2020 11:18:25 -0800 Saeed Mahameed wrote: > From: Maxim Mikityanskiy <maximmi@mellanox.com> > > On resync, the driver calls inet_lookup_established > (__inet6_lookup_established) that increases sk_refcnt of the socket. To > decrease it, the driver set skb->destructor to sock_edemux. However, it > didn't work well, because the TCP stack also sets this destructor for > early demux, and the refcount gets decreased only once Why is the stack doing early_demux if there is already a socket assigned? Or is it not early_demux but something else? Can you point us at the code? IPv4: if (net->ipv4.sysctl_ip_early_demux && !skb_dst(skb) && !skb->sk && <============ !ip_is_fragment(iph)) { const struct net_protocol *ipprot; int protocol = iph->protocol; ipprot = rcu_dereference(inet_protos[protocol]); if (ipprot && (edemux = READ_ONCE(ipprot->early_demux))) { err = INDIRECT_CALL_2(edemux, tcp_v4_early_demux, udp_v4_early_demux, skb); if (unlikely(err)) goto drop_error; /* must reload iph, skb->head might have changed */ iph = ip_hdr(skb); } } IPv6: if (net->ipv4.sysctl_ip_early_demux && !skb_dst(skb) && skb->sk == NULL) { ~~~~~~~~~~~~~~~ const struct inet6_protocol *ipprot; ipprot = rcu_dereference(inet6_protos[ipv6_hdr(skb)->nexthdr]); if (ipprot && (edemux = READ_ONCE(ipprot->early_demux))) INDIRECT_CALL_2(edemux, tcp_v6_early_demux, udp_v6_early_demux, skb); }
On 2020-11-05 00:59, Jakub Kicinski wrote: > On Tue, 3 Nov 2020 11:18:25 -0800 Saeed Mahameed wrote: >> From: Maxim Mikityanskiy <maximmi@mellanox.com> >> >> On resync, the driver calls inet_lookup_established >> (__inet6_lookup_established) that increases sk_refcnt of the socket. To >> decrease it, the driver set skb->destructor to sock_edemux. However, it >> didn't work well, because the TCP stack also sets this destructor for >> early demux, and the refcount gets decreased only once > > Why is the stack doing early_demux if there is already a socket > assigned? Or is it not early_demux but something else? > Can you point us at the code? I thought this was the code that was in conflict with setting skb->destructor in the driver: void skb_set_owner_w(struct sk_buff *skb, struct sock *sk) { skb_orphan(skb); skb->sk = sk; #ifdef CONFIG_INET if (unlikely(!sk_fullsock(sk))) { skb->destructor = sock_edemux; sock_hold(sk); return; } #endif However, after taking another look, it seems that the root cause is somewhere else. This piece of code actually calls skb_orphan before reassigning the destructor. I'll debug it further to try to find where the destructor is replaced or just not called. > IPv4: > if (net->ipv4.sysctl_ip_early_demux && > !skb_dst(skb) && > !skb->sk && <============ > !ip_is_fragment(iph)) { > const struct net_protocol *ipprot; > int protocol = iph->protocol; > > ipprot = rcu_dereference(inet_protos[protocol]); > if (ipprot && (edemux = READ_ONCE(ipprot->early_demux))) { > err = INDIRECT_CALL_2(edemux, tcp_v4_early_demux, > udp_v4_early_demux, skb); > if (unlikely(err)) > goto drop_error; > /* must reload iph, skb->head might have changed */ > iph = ip_hdr(skb); > } > } > > IPv6: > if (net->ipv4.sysctl_ip_early_demux && !skb_dst(skb) && skb->sk == NULL) { > ~~~~~~~~~~~~~~~ > const struct inet6_protocol *ipprot; > > ipprot = rcu_dereference(inet6_protos[ipv6_hdr(skb)->nexthdr]); > if (ipprot && (edemux = READ_ONCE(ipprot->early_demux))) > INDIRECT_CALL_2(edemux, tcp_v6_early_demux, > udp_v6_early_demux, skb); > } >
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_accel/ktls_rx.c b/drivers/net/ethernet/mellanox/mlx5/core/en_accel/ktls_rx.c index 7f6221b8b1f7..6a1d82503ef8 100644 --- a/drivers/net/ethernet/mellanox/mlx5/core/en_accel/ktls_rx.c +++ b/drivers/net/ethernet/mellanox/mlx5/core/en_accel/ktls_rx.c @@ -476,19 +476,22 @@ static void resync_update_sn(struct mlx5e_rq *rq, struct sk_buff *skb) depth += sizeof(struct tcphdr); - if (unlikely(!sk || sk->sk_state == TCP_TIME_WAIT)) + if (unlikely(!sk)) return; - if (unlikely(!resync_queue_get_psv(sk))) - return; + if (unlikely(sk->sk_state == TCP_TIME_WAIT)) + goto unref; - skb->sk = sk; - skb->destructor = sock_edemux; + if (unlikely(!resync_queue_get_psv(sk))) + goto unref; seq = th->seq; datalen = skb->len - depth; tls_offload_rx_resync_async_request_start(sk, seq, datalen); rq->stats->tls_resync_req_start++; + +unref: + sock_gen_put(sk); } void mlx5e_ktls_rx_resync(struct net_device *netdev, struct sock *sk,