Message ID | 20220225014929.942444-2-wangyufen@huawei.com (mailing list archive) |
---|---|
State | Superseded |
Delegated to: | BPF |
Series | bpf, sockmap: Fix memleaks and issues of mem charge/uncharge |
Context | Check | Description |
---|---|---|
bpf/vmtest-bpf-next-PR | success | PR summary |
netdev/tree_selection | success | Clearly marked for bpf-next |
netdev/fixes_present | success | Fixes tag not required for -next series |
netdev/subject_prefix | success | Link |
netdev/cover_letter | success | Series has a cover letter |
netdev/patch_count | success | Link |
netdev/header_inline | success | No static functions without inline keyword in header files |
netdev/build_32bit | success | Errors and warnings before: 76 this patch: 76 |
netdev/cc_maintainers | success | CCed 12 of 12 maintainers |
netdev/build_clang | success | Errors and warnings before: 22 this patch: 22 |
netdev/module_param | success | Was 0 now: 0 |
netdev/verify_signedoff | success | Signed-off-by tag matches author and committer |
netdev/verify_fixes | success | Fixes tag looks correct |
netdev/build_allmodconfig_warn | success | Errors and warnings before: 81 this patch: 81 |
netdev/checkpatch | warning | CHECK: Unbalanced braces around else statement |
netdev/kdoc | success | Errors and warnings before: 0 this patch: 0 |
netdev/source_inline | success | Was 0 now: 0 |
bpf/vmtest-bpf-next | success | VM_Test |
On Fri, Feb 25, 2022 at 09:49:26AM +0800, Wang Yufen wrote:
> If tcp_bpf_sendmsg is running during a tear down operation we may enqueue
> data on the ingress msg queue while tear down is trying to free it.
>
>  sk1 (redirect sk2)                         sk2
>  -------------------                      ---------------
> tcp_bpf_sendmsg()
>  tcp_bpf_send_verdict()
>   tcp_bpf_sendmsg_redir()
>    bpf_tcp_ingress()
>                                           sock_map_close()
>                                            lock_sock()
>     lock_sock() ... blocking
>                                            sk_psock_stop
>                                             sk_psock_clear_state(psock, SK_PSOCK_TX_ENABLED);
>                                            release_sock(sk);
>     lock_sock()
>     sk_mem_charge()
>     get_page()
>     sk_psock_queue_msg()
>      sk_psock_test_state(psock, SK_PSOCK_TX_ENABLED);
>       drop_sk_msg()
>     release_sock()
>
> When drop_sk_msg() is called, the msg has already charged memory from sk
> via sk_mem_charge() and holds sg pages that need to be put. To fix this,
> use sk_msg_free() and then kfree() the msg.
>

What about the other code path? That is, sk_psock_skb_ingress_enqueue().
I don't see the skmsg being charged there.

Thanks.

On 2022/2/28 3:21, Cong Wang wrote:
> On Fri, Feb 25, 2022 at 09:49:26AM +0800, Wang Yufen wrote:
>> If tcp_bpf_sendmsg is running during a tear down operation we may enqueue
>> data on the ingress msg queue while tear down is trying to free it.
>>
>>  sk1 (redirect sk2)                         sk2
>>  -------------------                      ---------------
>> tcp_bpf_sendmsg()
>>  tcp_bpf_send_verdict()
>>   tcp_bpf_sendmsg_redir()
>>    bpf_tcp_ingress()
>>                                           sock_map_close()
>>                                            lock_sock()
>>     lock_sock() ... blocking
>>                                            sk_psock_stop
>>                                             sk_psock_clear_state(psock, SK_PSOCK_TX_ENABLED);
>>                                            release_sock(sk);
>>     lock_sock()
>>     sk_mem_charge()
>>     get_page()
>>     sk_psock_queue_msg()
>>      sk_psock_test_state(psock, SK_PSOCK_TX_ENABLED);
>>       drop_sk_msg()
>>     release_sock()
>>
>> When drop_sk_msg() is called, the msg has already charged memory from sk
>> via sk_mem_charge() and holds sg pages that need to be put. To fix this,
>> use sk_msg_free() and then kfree() the msg.
>>
> What about the other code path? That is, sk_psock_skb_ingress_enqueue().
> I don't see the skmsg being charged there.

sk_psock_skb_ingress_self() | sk_psock_skb_ingress()
   skb_set_owner_r()
      sk_mem_charge()
   sk_psock_skb_ingress_enqueue()

In the other code path, the skmsg is charged by
skb_set_owner_r()->sk_mem_charge().

>
> Thanks.

wangyufen wrote:
>
> On 2022/2/28 3:21, Cong Wang wrote:
> > On Fri, Feb 25, 2022 at 09:49:26AM +0800, Wang Yufen wrote:
> >> If tcp_bpf_sendmsg is running during a tear down operation we may enqueue
> >> data on the ingress msg queue while tear down is trying to free it.
> >>
> >>  sk1 (redirect sk2)                         sk2
> >>  -------------------                      ---------------
> >> tcp_bpf_sendmsg()
> >>  tcp_bpf_send_verdict()
> >>   tcp_bpf_sendmsg_redir()
> >>    bpf_tcp_ingress()
> >>                                           sock_map_close()
> >>                                            lock_sock()
> >>     lock_sock() ... blocking
> >>                                            sk_psock_stop
> >>                                             sk_psock_clear_state(psock, SK_PSOCK_TX_ENABLED);
> >>                                            release_sock(sk);
> >>     lock_sock()
> >>     sk_mem_charge()
> >>     get_page()
> >>     sk_psock_queue_msg()
> >>      sk_psock_test_state(psock, SK_PSOCK_TX_ENABLED);
> >>       drop_sk_msg()
> >>     release_sock()
> >>
> >> When drop_sk_msg() is called, the msg has already charged memory from sk
> >> via sk_mem_charge() and holds sg pages that need to be put. To fix this,
> >> use sk_msg_free() and then kfree() the msg.
> >>
> > What about the other code path? That is, sk_psock_skb_ingress_enqueue().
> > I don't see the skmsg being charged there.
>
> sk_psock_skb_ingress_self() | sk_psock_skb_ingress()
>    skb_set_owner_r()
>       sk_mem_charge()
>    sk_psock_skb_ingress_enqueue()
>
> In the other code path, the skmsg is charged by
> skb_set_owner_r()->sk_mem_charge().
>
> >
> > Thanks.

I walked through that code, and the fix LGTM as well.

Acked-by: John Fastabend <john.fastabend@gmail.com>

On Tue, Mar 01, 2022 at 09:49:12AM +0800, wangyufen wrote:
>
> On 2022/2/28 3:21, Cong Wang wrote:
> > On Fri, Feb 25, 2022 at 09:49:26AM +0800, Wang Yufen wrote:
> > > If tcp_bpf_sendmsg is running during a tear down operation we may enqueue
> > > data on the ingress msg queue while tear down is trying to free it.
> > >
> > >  sk1 (redirect sk2)                         sk2
> > >  -------------------                      ---------------
> > > tcp_bpf_sendmsg()
> > >  tcp_bpf_send_verdict()
> > >   tcp_bpf_sendmsg_redir()
> > >    bpf_tcp_ingress()
> > >                                           sock_map_close()
> > >                                            lock_sock()
> > >     lock_sock() ... blocking
> > >                                            sk_psock_stop
> > >                                             sk_psock_clear_state(psock, SK_PSOCK_TX_ENABLED);
> > >                                            release_sock(sk);
> > >     lock_sock()
> > >     sk_mem_charge()
> > >     get_page()
> > >     sk_psock_queue_msg()
> > >      sk_psock_test_state(psock, SK_PSOCK_TX_ENABLED);
> > >       drop_sk_msg()
> > >     release_sock()
> > >
> > > When drop_sk_msg() is called, the msg has already charged memory from sk
> > > via sk_mem_charge() and holds sg pages that need to be put. To fix this,
> > > use sk_msg_free() and then kfree() the msg.
> > >
> > What about the other code path? That is, sk_psock_skb_ingress_enqueue().
> > I don't see the skmsg being charged there.
>
> sk_psock_skb_ingress_self() | sk_psock_skb_ingress()
>    skb_set_owner_r()
>       sk_mem_charge()
>    sk_psock_skb_ingress_enqueue()
>
> In the other code path, the skmsg is charged by
> skb_set_owner_r()->sk_mem_charge().
>

skb_set_owner_r() charges the skb; I was asking about the skmsg. ;)

In sk_psock_skb_ingress_enqueue(), the skmsg was initialized but not
actually charged, hence I was asking...

On a second look, it seems sk_mem_uncharge() is not called for
sk_psock_skb_ingress_enqueue(), where msg->skb is clearly not NULL.

Also, you introduce an unnecessary sk_msg_init() from __sk_msg_free(),
because you call kfree(msg) right after it.

Thanks.
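
The distinction raised in this reply (a charge held by the skb versus a charge held by the skmsg itself) can be illustrated with a small user-space toy model. This is only a sketch under stated assumptions: `toy_sock`, `toy_skb`, `toy_msg` and every `toy_*` function are hypothetical names invented for illustration, not kernel APIs, and the model does not claim to reproduce what `sk_msg_free()` actually does in the kernel. It only shows the invariant under discussion: each charge must be returned exactly once, by whichever object owns it.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical toy types; they only mirror the roles discussed in the thread. */
struct toy_sock {
	int charged;            /* bytes accounted to the receiving socket */
};

struct toy_skb {
	struct toy_sock *owner; /* set when the skb is assigned an owner */
	int truesize;
};

struct toy_msg {
	struct toy_skb *skb;    /* non-NULL when the msg wraps an skb */
	int own_charge;         /* bytes charged for the msg itself */
};

/* Models skb_set_owner_r(): the charge is recorded against the skb. */
static void toy_skb_set_owner(struct toy_skb *skb, struct toy_sock *sk)
{
	skb->owner = sk;
	sk->charged += skb->truesize;
}

/* Models freeing an owned skb: its destructor returns the skb's charge. */
static void toy_skb_free(struct toy_skb *skb)
{
	if (skb->owner) {
		assert(skb->owner->charged >= skb->truesize);
		skb->owner->charged -= skb->truesize;
	}
	free(skb);
}

/* Models charging the msg directly, as in the sendmsg redirect path. */
static void toy_msg_charge(struct toy_sock *sk, struct toy_msg *msg, int bytes)
{
	msg->own_charge += bytes;
	sk->charged += bytes;
}

/* Frees a msg, returning each charge exactly once through its owner. */
static void toy_msg_free(struct toy_sock *sk, struct toy_msg *msg)
{
	assert(sk->charged >= msg->own_charge);
	sk->charged -= msg->own_charge; /* zero for an skb-backed msg */
	if (msg->skb)
		toy_skb_free(msg->skb);
	free(msg);
}
```

In this model an skb-backed msg has `own_charge == 0`, so freeing it only releases the skb's charge, while a msg charged directly releases its own. Whether the real free path keeps the two cases apart in the same way is exactly the question being asked here.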

diff --git a/include/linux/skmsg.h b/include/linux/skmsg.h
index fdb5375f0562..c5a2d6f50f25 100644
--- a/include/linux/skmsg.h
+++ b/include/linux/skmsg.h
@@ -304,21 +304,16 @@ static inline void sock_drop(struct sock *sk, struct sk_buff *skb)
 	kfree_skb(skb);
 }
 
-static inline void drop_sk_msg(struct sk_psock *psock, struct sk_msg *msg)
-{
-	if (msg->skb)
-		sock_drop(psock->sk, msg->skb);
-	kfree(msg);
-}
-
 static inline void sk_psock_queue_msg(struct sk_psock *psock,
 				      struct sk_msg *msg)
 {
 	spin_lock_bh(&psock->ingress_lock);
 	if (sk_psock_test_state(psock, SK_PSOCK_TX_ENABLED))
 		list_add_tail(&msg->list, &psock->ingress_msg);
-	else
-		drop_sk_msg(psock, msg);
+	else {
+		sk_msg_free(psock->sk, msg);
+		kfree(msg);
+	}
 	spin_unlock_bh(&psock->ingress_lock);
 }
 

If tcp_bpf_sendmsg is running during a tear down operation we may enqueue
data on the ingress msg queue while tear down is trying to free it.

 sk1 (redirect sk2)                         sk2
 -------------------                      ---------------
tcp_bpf_sendmsg()
 tcp_bpf_send_verdict()
  tcp_bpf_sendmsg_redir()
   bpf_tcp_ingress()
                                          sock_map_close()
                                           lock_sock()
    lock_sock() ... blocking
                                           sk_psock_stop
                                            sk_psock_clear_state(psock, SK_PSOCK_TX_ENABLED);
                                           release_sock(sk);
    lock_sock()
    sk_mem_charge()
    get_page()
    sk_psock_queue_msg()
     sk_psock_test_state(psock, SK_PSOCK_TX_ENABLED);
      drop_sk_msg()
    release_sock()

When drop_sk_msg() is called, the msg has already charged memory from sk
via sk_mem_charge() and holds sg pages that need to be put. To fix this,
use sk_msg_free() and then kfree() the msg.

This issue can trigger the following warnings:
WARNING: CPU: 0 PID: 9202 at net/core/stream.c:205 sk_stream_kill_queues+0xc8/0xe0
Call Trace:
 <IRQ>
 inet_csk_destroy_sock+0x55/0x110
 tcp_rcv_state_process+0xe5f/0xe90
 ? sk_filter_trim_cap+0x10d/0x230
 ? tcp_v4_do_rcv+0x161/0x250
 tcp_v4_do_rcv+0x161/0x250
 tcp_v4_rcv+0xc3a/0xce0
 ip_protocol_deliver_rcu+0x3d/0x230
 ip_local_deliver_finish+0x54/0x60
 ip_local_deliver+0xfd/0x110
 ? ip_protocol_deliver_rcu+0x230/0x230
 ip_rcv+0xd6/0x100
 ? ip_local_deliver+0x110/0x110
 __netif_receive_skb_one_core+0x85/0xa0
 process_backlog+0xa4/0x160
 __napi_poll+0x29/0x1b0
 net_rx_action+0x287/0x300
 __do_softirq+0xff/0x2fc
 do_softirq+0x79/0x90
 </IRQ>

WARNING: CPU: 0 PID: 531 at net/ipv4/af_inet.c:154 inet_sock_destruct+0x175/0x1b0
Call Trace:
 <TASK>
 __sk_destruct+0x24/0x1f0
 sk_psock_destroy+0x19b/0x1c0
 process_one_work+0x1b3/0x3c0
 ? process_one_work+0x3c0/0x3c0
 worker_thread+0x30/0x350
 ? process_one_work+0x3c0/0x3c0
 kthread+0xe6/0x110
 ? kthread_complete_and_exit+0x20/0x20
 ret_from_fork+0x22/0x30
 </TASK>

Fixes: 9635720b7c88 ("bpf, sockmap: Fix memleak on ingress msg enqueue")
Signed-off-by: Wang Yufen <wangyufen@huawei.com>
---
 include/linux/skmsg.h | 13 ++++---------
 1 file changed, 4 insertions(+), 9 deletions(-)
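
The leak described in the commit message can be sketched in user space. The following is a minimal toy model, assuming hypothetical names throughout (`toy_sock`, `toy_msg`, `toy_msg_prepare`, `toy_drop_leaky`, `toy_drop_fixed`, `toy_queue_msg`); none of these are kernel APIs, and locking is deliberately left out. It only shows why a drop routine that frees just the container leaves the socket's charge and the page reference behind, while one that undoes both before freeing does not.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical user-space stand-ins; these are not kernel types. */
struct toy_sock {
	int charged;     /* models memory charged via sk_mem_charge() */
	int tx_enabled;  /* models the SK_PSOCK_TX_ENABLED state bit */
	int queued;      /* number of msgs accepted on the ingress queue */
};

struct toy_msg {
	int size;        /* bytes charged for this msg */
	int *page_ref;   /* models a page reference taken via get_page() */
};

/* Models the charge and page reference taken before queueing. */
static struct toy_msg *toy_msg_prepare(struct toy_sock *sk, int size,
				       int *page_ref)
{
	struct toy_msg *msg = malloc(sizeof(*msg));

	if (!msg)
		return NULL;
	msg->size = size;
	msg->page_ref = page_ref;
	sk->charged += size;
	(*page_ref)++;
	return msg;
}

/* Old-style drop: frees only the container; charge and page ref leak. */
static void toy_drop_leaky(struct toy_sock *sk, struct toy_msg *msg)
{
	(void)sk;
	free(msg);
}

/* Fixed-style drop: undo the charge and page ref, then free the container. */
static void toy_drop_fixed(struct toy_sock *sk, struct toy_msg *msg)
{
	assert(sk->charged >= msg->size);
	sk->charged -= msg->size;
	(*msg->page_ref)--;
	free(msg);
}

/* Queue when TX is enabled, otherwise hand the msg to the drop routine.
 * Returns 1 if queued, 0 if dropped. */
static int toy_queue_msg(struct toy_sock *sk, struct toy_msg *msg,
			 void (*drop)(struct toy_sock *, struct toy_msg *))
{
	if (sk->tx_enabled) {
		sk->queued++;   /* ownership would pass to the queue here */
		return 1;
	}
	drop(sk, msg);
	return 0;
}
```

With `tx_enabled` cleared, which is the state after tear down in the race above, passing `toy_drop_leaky` leaves `charged` and the page count elevated after the msg is gone; passing `toy_drop_fixed` returns both to their starting values.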