| Message ID | 25c57c05b6f5cc81fd49b8f060ebf0961ea8af68.1619638230.git.lucien.xin@gmail.com (mailing list archive) |
|---|---|
| State | Superseded |
| Delegated to | Netdev Maintainers |
| Series | [net] tipc: fix a race in tipc_sk_mcast_rcv |
| Context | Check | Description |
|---|---|---|
| netdev/cover_letter | success | Link |
| netdev/fixes_present | success | Link |
| netdev/patch_count | success | Link |
| netdev/tree_selection | success | Clearly marked for net |
| netdev/subject_prefix | success | Link |
| netdev/cc_maintainers | success | CCed 7 of 7 maintainers |
| netdev/source_inline | success | Was 0 now: 0 |
| netdev/verify_signedoff | success | Link |
| netdev/module_param | success | Was 0 now: 0 |
| netdev/build_32bit | success | Errors and warnings before: 5 this patch: 5 |
| netdev/kdoc | success | Errors and warnings before: 0 this patch: 0 |
| netdev/verify_fixes | success | Link |
| netdev/checkpatch | success | total: 0 errors, 0 warnings, 0 checks, 66 lines checked |
| netdev/build_allmodconfig_warn | success | Errors and warnings before: 5 this patch: 5 |
| netdev/header_inline | success | Link |
Hi Xin,

See my comments inline.

Best regards,
Tung Nguyen

-----Original Message-----
From: Xin Long <lucien.xin@gmail.com>
Sent: Thursday, April 29, 2021 2:31 AM
To: network dev <netdev@vger.kernel.org>; tipc-discussion@lists.sourceforge.net
Cc: kuba@kernel.org; lyl2019@mail.ustc.edu.cn; davem@davemloft.net
Subject: [tipc-discussion] [PATCH net] tipc: fix a race in tipc_sk_mcast_rcv

After commit cb1b728096f5 ("tipc: eliminate race condition at multicast
reception"), when processing multicast reception, the packets are first
moved from be->inputq1 to be->arrvq in tipc_node_broadcast(), and
be->arrvq is then processed in tipc_sk_mcast_rcv().

In tipc_sk_mcast_rcv(), the first skb is fetched with skb_peek() and then
processed without any lock. This means that, meanwhile, another thread
could also call tipc_sk_mcast_rcv(), process be->arrvq, pick up the same
skb, and free it. This causes a double free, as Li Shuang reported:

[] kernel BUG at mm/slub.c:305!
[] kfree+0x3a7/0x3d0
[] kfree_skb+0x32/0xa0
[] skb_release_data+0xb4/0x170
[] kfree_skb+0x32/0xa0
[] skb_release_data+0xb4/0x170
[] kfree_skb+0x32/0xa0
[] tipc_sk_mcast_rcv+0x1fa/0x380 [tipc]
[] tipc_rcv+0x411/0x1120 [tipc]
[] tipc_udp_recv+0xc6/0x1e0 [tipc]
[] udp_queue_rcv_one_skb+0x1a9/0x500
[] udp_unicast_rcv_skb.isra.66+0x75/0x90
[] __udp4_lib_rcv+0x537/0xc40
[] ip_protocol_deliver_rcu+0xdf/0x1d0
[] ip_local_deliver_finish+0x4a/0x50
[] ip_local_deliver+0x6b/0xe0
[] ip_rcv+0x27b/0x36a
[] __netif_receive_skb_core+0xb47/0xc40
[] process_backlog+0xae/0x160

Commit 6bf24dc0cc0c ("net:tipc: Fix a double free in tipc_sk_mcast_rcv")
tried to fix this double free by not releasing the skbs in be->arrvq, but
that would instead leak the skbs.

The problem is that the skbs in be->arrvq must not be processed without a
lock protecting the code from the peek through to the dequeue. The fix
here is to use a temporary skb list instead of be->arrvq, making the
processing "per thread safe". While at it, remove the no-longer-used
be->arrvq.
Fixes: cb1b728096f5 ("tipc: eliminate race condition at multicast reception")
Fixes: 6bf24dc0cc0c ("net:tipc: Fix a double free in tipc_sk_mcast_rcv")
Reported-by: Li Shuang <shuali@redhat.com>
Signed-off-by: Xin Long <lucien.xin@gmail.com>
---
 net/tipc/node.c   |  9 ++++-----
 net/tipc/socket.c | 16 +++-------------
 2 files changed, 7 insertions(+), 18 deletions(-)

diff --git a/net/tipc/node.c b/net/tipc/node.c
index e0ee832..0c636fb 100644
--- a/net/tipc/node.c
+++ b/net/tipc/node.c
@@ -72,7 +72,6 @@ struct tipc_link_entry {
 struct tipc_bclink_entry {
 	struct tipc_link *link;
 	struct sk_buff_head inputq1;
-	struct sk_buff_head arrvq;
 	struct sk_buff_head inputq2;
 	struct sk_buff_head namedq;
 	u16 named_rcv_nxt;
@@ -552,7 +551,6 @@ struct tipc_node *tipc_node_create(struct net *net, u32 addr, u8 *peer_id,
 	INIT_LIST_HEAD(&n->conn_sks);
 	skb_queue_head_init(&n->bc_entry.namedq);
 	skb_queue_head_init(&n->bc_entry.inputq1);
-	__skb_queue_head_init(&n->bc_entry.arrvq);
 	skb_queue_head_init(&n->bc_entry.inputq2);
 	for (i = 0; i < MAX_BEARERS; i++)
 		spin_lock_init(&n->links[i].lock);
@@ -1803,14 +1801,15 @@ void tipc_node_broadcast(struct net *net, struct sk_buff *skb, int rc_dests)
 static void tipc_node_mcast_rcv(struct tipc_node *n)
 {
 	struct tipc_bclink_entry *be = &n->bc_entry;
+	struct sk_buff_head tmpq;

-	/* 'arrvq' is under inputq2's lock protection */
+	__skb_queue_head_init(&tmpq);
 	spin_lock_bh(&be->inputq2.lock);
 	spin_lock_bh(&be->inputq1.lock);
-	skb_queue_splice_tail_init(&be->inputq1, &be->arrvq);
+	skb_queue_splice_tail_init(&be->inputq1, &tmpq);
 	spin_unlock_bh(&be->inputq1.lock);
 	spin_unlock_bh(&be->inputq2.lock);
-	tipc_sk_mcast_rcv(n->net, &be->arrvq, &be->inputq2);
+	tipc_sk_mcast_rcv(n->net, &tmpq, &be->inputq2);
 }

 static void tipc_node_bc_sync_rcv(struct tipc_node *n, struct tipc_msg *hdr,
diff --git a/net/tipc/socket.c b/net/tipc/socket.c
index 022999e..2870798 100644
--- a/net/tipc/socket.c
+++ b/net/tipc/socket.c
@@ -1210,8 +1210,7 @@ void tipc_sk_mcast_rcv(struct net *net, struct sk_buff_head *arrvq,
 	__skb_queue_head_init(&tmpq);
 	INIT_LIST_HEAD(&dports);

-	skb = tipc_skb_peek(arrvq, &inputq->lock);
-	for (; skb; skb = tipc_skb_peek(arrvq, &inputq->lock)) {

Please remove function tipc_skb_peek() because it is no longer used.

+	while ((skb = __skb_dequeue(arrvq)) != NULL) {
 		hdr = buf_msg(skb);
 		user = msg_user(hdr);
 		mtyp = msg_type(hdr);
@@ -1220,13 +1219,7 @@ void tipc_sk_mcast_rcv(struct net *net, struct sk_buff_head *arrvq,
 		type = msg_nametype(hdr);

 		if (mtyp == TIPC_GRP_UCAST_MSG || user == GROUP_PROTOCOL) {
-			spin_lock_bh(&inputq->lock);
-			if (skb_peek(arrvq) == skb) {
-				__skb_dequeue(arrvq);
-				__skb_queue_tail(inputq, skb);
-			}
-			kfree_skb(skb);
-			spin_unlock_bh(&inputq->lock);
+			skb_queue_tail(inputq, skb);
 			continue;
 		}

@@ -1263,10 +1256,7 @@ void tipc_sk_mcast_rcv(struct net *net, struct sk_buff_head *arrvq,
 		}
 		/* Append to inputq if not already done by other thread */

Please remove above comment because the temporary queue is thread-safe.

 		spin_lock_bh(&inputq->lock);
-		if (skb_peek(arrvq) == skb) {
-			skb_queue_splice_tail_init(&tmpq, inputq);
-			__skb_dequeue(arrvq);
-		}
+		skb_queue_splice_tail_init(&tmpq, inputq);
 		spin_unlock_bh(&inputq->lock);
 		__skb_queue_purge(&tmpq);
 		kfree_skb(skb);
On 4/28/21 3:30 PM, Xin Long wrote:
> After commit cb1b728096f5 ("tipc: eliminate race condition at multicast
> reception"), when processing the multicast reception, the packets are
> firstly moved from be->inputq1 to be->arrvq in tipc_node_broadcast(),
> then process be->arrvq in tipc_sk_mcast_rcv().
>
> In tipc_sk_mcast_rcv(), it gets the 1st skb by skb_peek(), then process
> this skb without any lock. It means meanwhile another thread could also
> call tipc_sk_mcast_rcv() and process be->arrvq and pick up the same skb,
> then free it. A double free issue will be caused as Li Shuang reported:
>
> [] kernel BUG at mm/slub.c:305!
> [] kfree+0x3a7/0x3d0
> [] kfree_skb+0x32/0xa0
> [] skb_release_data+0xb4/0x170
> [] kfree_skb+0x32/0xa0
> [] skb_release_data+0xb4/0x170
> [] kfree_skb+0x32/0xa0
> [] tipc_sk_mcast_rcv+0x1fa/0x380 [tipc]
> [] tipc_rcv+0x411/0x1120 [tipc]
> [] tipc_udp_recv+0xc6/0x1e0 [tipc]
> [] udp_queue_rcv_one_skb+0x1a9/0x500
> [] udp_unicast_rcv_skb.isra.66+0x75/0x90
> [] __udp4_lib_rcv+0x537/0xc40
> [] ip_protocol_deliver_rcu+0xdf/0x1d0
> [] ip_local_deliver_finish+0x4a/0x50
> [] ip_local_deliver+0x6b/0xe0
> [] ip_rcv+0x27b/0x36a
> [] __netif_receive_skb_core+0xb47/0xc40
> [] process_backlog+0xae/0x160
>
> Commit 6bf24dc0cc0c ("net:tipc: Fix a double free in tipc_sk_mcast_rcv")
> tried to fix this double free by not releasing the skbs in be->arrvq,
> which would definitely cause the skbs' leak.
>
> The problem is we shouldn't process the skbs in be->arrvq without any
> lock to protect the code from peeking to dequeuing them. The fix here
> is to use a temp skb list instead of be->arrvq to make it "per thread
> safe". While at it, remove the no-longer-used be->arrvq.
>
> Fixes: cb1b728096f5 ("tipc: eliminate race condition at multicast reception")
> Fixes: 6bf24dc0cc0c ("net:tipc: Fix a double free in tipc_sk_mcast_rcv")
> Reported-by: Li Shuang <shuali@redhat.com>
> Signed-off-by: Xin Long <lucien.xin@gmail.com>
> ---
>  net/tipc/node.c   |  9 ++++-----
>  net/tipc/socket.c | 16 +++-------------
>  2 files changed, 7 insertions(+), 18 deletions(-)
>
> diff --git a/net/tipc/node.c b/net/tipc/node.c
> index e0ee832..0c636fb 100644
> --- a/net/tipc/node.c
> +++ b/net/tipc/node.c
> @@ -72,7 +72,6 @@ struct tipc_link_entry {
>  struct tipc_bclink_entry {
>  	struct tipc_link *link;
>  	struct sk_buff_head inputq1;
> -	struct sk_buff_head arrvq;
>  	struct sk_buff_head inputq2;
>  	struct sk_buff_head namedq;
>  	u16 named_rcv_nxt;
> @@ -552,7 +551,6 @@ struct tipc_node *tipc_node_create(struct net *net, u32 addr, u8 *peer_id,
>  	INIT_LIST_HEAD(&n->conn_sks);
>  	skb_queue_head_init(&n->bc_entry.namedq);
>  	skb_queue_head_init(&n->bc_entry.inputq1);
> -	__skb_queue_head_init(&n->bc_entry.arrvq);
>  	skb_queue_head_init(&n->bc_entry.inputq2);
>  	for (i = 0; i < MAX_BEARERS; i++)
>  		spin_lock_init(&n->links[i].lock);
> @@ -1803,14 +1801,15 @@ void tipc_node_broadcast(struct net *net, struct sk_buff *skb, int rc_dests)
>  static void tipc_node_mcast_rcv(struct tipc_node *n)
>  {
>  	struct tipc_bclink_entry *be = &n->bc_entry;
> +	struct sk_buff_head tmpq;
>
> -	/* 'arrvq' is under inputq2's lock protection */
> +	__skb_queue_head_init(&tmpq);
>  	spin_lock_bh(&be->inputq2.lock);
>  	spin_lock_bh(&be->inputq1.lock);
> -	skb_queue_splice_tail_init(&be->inputq1, &be->arrvq);
> +	skb_queue_splice_tail_init(&be->inputq1, &tmpq);
>  	spin_unlock_bh(&be->inputq1.lock);
>  	spin_unlock_bh(&be->inputq2.lock);
> -	tipc_sk_mcast_rcv(n->net, &be->arrvq, &be->inputq2);
> +	tipc_sk_mcast_rcv(n->net, &tmpq, &be->inputq2);
>  }
>
>  static void tipc_node_bc_sync_rcv(struct tipc_node *n, struct tipc_msg *hdr,
> diff --git a/net/tipc/socket.c b/net/tipc/socket.c
> index 022999e..2870798 100644
> --- a/net/tipc/socket.c
> +++ b/net/tipc/socket.c
> @@ -1210,8 +1210,7 @@ void tipc_sk_mcast_rcv(struct net *net, struct sk_buff_head *arrvq,
>  	__skb_queue_head_init(&tmpq);
>  	INIT_LIST_HEAD(&dports);
>
> -	skb = tipc_skb_peek(arrvq, &inputq->lock);
> -	for (; skb; skb = tipc_skb_peek(arrvq, &inputq->lock)) {
> +	while ((skb = __skb_dequeue(arrvq)) != NULL) {
>  		hdr = buf_msg(skb);
>  		user = msg_user(hdr);
>  		mtyp = msg_type(hdr);
> @@ -1220,13 +1219,7 @@ void tipc_sk_mcast_rcv(struct net *net, struct sk_buff_head *arrvq,
>  		type = msg_nametype(hdr);
>
>  		if (mtyp == TIPC_GRP_UCAST_MSG || user == GROUP_PROTOCOL) {
> -			spin_lock_bh(&inputq->lock);
> -			if (skb_peek(arrvq) == skb) {
> -				__skb_dequeue(arrvq);
> -				__skb_queue_tail(inputq, skb);
> -			}
> -			kfree_skb(skb);
> -			spin_unlock_bh(&inputq->lock);
> +			skb_queue_tail(inputq, skb);
>  			continue;
>  		}
>
> @@ -1263,10 +1256,7 @@ void tipc_sk_mcast_rcv(struct net *net, struct sk_buff_head *arrvq,
>  		}
>  		/* Append to inputq if not already done by other thread */
>  		spin_lock_bh(&inputq->lock);
> -		if (skb_peek(arrvq) == skb) {
> -			skb_queue_splice_tail_init(&tmpq, inputq);
> -			__skb_dequeue(arrvq);
> -		}
> +		skb_queue_splice_tail_init(&tmpq, inputq);
>  		spin_unlock_bh(&inputq->lock);
>  		__skb_queue_purge(&tmpq);
>  		kfree_skb(skb);

Nack.

This would invalidate the sequence guarantee of messages between two
specific sockets. The whole point of having a lock protected arrival
queue is to preserve the order when messages are moved from inputq1 to
inputq2.

Let's take a discussion on our mailing list.

///jon