diff mbox series

[net] tipc: fix a race in tipc_sk_mcast_rcv

Message ID 25c57c05b6f5cc81fd49b8f060ebf0961ea8af68.1619638230.git.lucien.xin@gmail.com (mailing list archive)
State Superseded
Delegated to: Netdev Maintainers
Headers show
Series [net] tipc: fix a race in tipc_sk_mcast_rcv | expand

Checks

Context Check Description
netdev/cover_letter success Link
netdev/fixes_present success Link
netdev/patch_count success Link
netdev/tree_selection success Clearly marked for net
netdev/subject_prefix success Link
netdev/cc_maintainers success CCed 7 of 7 maintainers
netdev/source_inline success Was 0 now: 0
netdev/verify_signedoff success Link
netdev/module_param success Was 0 now: 0
netdev/build_32bit success Errors and warnings before: 5 this patch: 5
netdev/kdoc success Errors and warnings before: 0 this patch: 0
netdev/verify_fixes success Link
netdev/checkpatch success total: 0 errors, 0 warnings, 0 checks, 66 lines checked
netdev/build_allmodconfig_warn success Errors and warnings before: 5 this patch: 5
netdev/header_inline success Link

Commit Message

Xin Long April 28, 2021, 7:30 p.m. UTC
After commit cb1b728096f5 ("tipc: eliminate race condition at multicast
reception"), when processing the multicast reception, the packets are
firstly moved from be->inputq1 to be->arrvq in tipc_node_broadcast(),
then process be->arrvq in tipc_sk_mcast_rcv().

In tipc_sk_mcast_rcv(), it gets the 1st skb by skb_peek(), then process
this skb without any lock. It means meanwhile another thread could also
call tipc_sk_mcast_rcv() and process be->arrvq and pick up the same skb,
then free it. A double free issue will be caused as Li Shuang reported:

  [] kernel BUG at mm/slub.c:305!
  []  kfree+0x3a7/0x3d0
  []  kfree_skb+0x32/0xa0
  []  skb_release_data+0xb4/0x170
  []  kfree_skb+0x32/0xa0
  []  skb_release_data+0xb4/0x170
  []  kfree_skb+0x32/0xa0
  []  tipc_sk_mcast_rcv+0x1fa/0x380 [tipc]
  []  tipc_rcv+0x411/0x1120 [tipc]
  []  tipc_udp_recv+0xc6/0x1e0 [tipc]
  []  udp_queue_rcv_one_skb+0x1a9/0x500
  []  udp_unicast_rcv_skb.isra.66+0x75/0x90
  []  __udp4_lib_rcv+0x537/0xc40
  []  ip_protocol_deliver_rcu+0xdf/0x1d0
  []  ip_local_deliver_finish+0x4a/0x50
  []  ip_local_deliver+0x6b/0xe0
  []  ip_rcv+0x27b/0x36a
  []  __netif_receive_skb_core+0xb47/0xc40
  []  process_backlog+0xae/0x160

Commit 6bf24dc0cc0c ("net:tipc: Fix a double free in tipc_sk_mcast_rcv")
tried to fix this double free by not releasing the skbs in be->arrvq,
which would definitely cause the skbs' leak.

The problem is we shouldn't process the skbs in be->arrvq without any
lock to protect the code from peeking to dequeuing them. The fix here
is to use a temp skb list instead of be->arrvq to make it "per thread
safe". While at it, remove the no-longer-used be->arrvq.

Fixes: cb1b728096f5 ("tipc: eliminate race condition at multicast reception")
Fixes: 6bf24dc0cc0c ("net:tipc: Fix a double free in tipc_sk_mcast_rcv")
Reported-by: Li Shuang <shuali@redhat.com>
Signed-off-by: Xin Long <lucien.xin@gmail.com>
---
 net/tipc/node.c   |  9 ++++-----
 net/tipc/socket.c | 16 +++-------------
 2 files changed, 7 insertions(+), 18 deletions(-)

Comments

Tung Quang Nguyen April 29, 2021, 2:18 a.m. UTC | #1
Hi Xin,

See my comments inline.

Best regards,
Tung Nguyen

-----Original Message-----
From: Xin Long <lucien.xin@gmail.com> 
Sent: Thursday, April 29, 2021 2:31 AM
To: network dev <netdev@vger.kernel.org>; tipc-discussion@lists.sourceforge.net
Cc: kuba@kernel.org; lyl2019@mail.ustc.edu.cn; davem@davemloft.net
Subject: [tipc-discussion] [PATCH net] tipc: fix a race in tipc_sk_mcast_rcv

After commit cb1b728096f5 ("tipc: eliminate race condition at multicast
reception"), when processing the multicast reception, the packets are
firstly moved from be->inputq1 to be->arrvq in tipc_node_broadcast(),
then process be->arrvq in tipc_sk_mcast_rcv().

In tipc_sk_mcast_rcv(), it gets the 1st skb by skb_peek(), then process
this skb without any lock. It means meanwhile another thread could also
call tipc_sk_mcast_rcv() and process be->arrvq and pick up the same skb,
then free it. A double free issue will be caused as Li Shuang reported:

  [] kernel BUG at mm/slub.c:305!
  []  kfree+0x3a7/0x3d0
  []  kfree_skb+0x32/0xa0
  []  skb_release_data+0xb4/0x170
  []  kfree_skb+0x32/0xa0
  []  skb_release_data+0xb4/0x170
  []  kfree_skb+0x32/0xa0
  []  tipc_sk_mcast_rcv+0x1fa/0x380 [tipc]
  []  tipc_rcv+0x411/0x1120 [tipc]
  []  tipc_udp_recv+0xc6/0x1e0 [tipc]
  []  udp_queue_rcv_one_skb+0x1a9/0x500
  []  udp_unicast_rcv_skb.isra.66+0x75/0x90
  []  __udp4_lib_rcv+0x537/0xc40
  []  ip_protocol_deliver_rcu+0xdf/0x1d0
  []  ip_local_deliver_finish+0x4a/0x50
  []  ip_local_deliver+0x6b/0xe0
  []  ip_rcv+0x27b/0x36a
  []  __netif_receive_skb_core+0xb47/0xc40
  []  process_backlog+0xae/0x160

Commit 6bf24dc0cc0c ("net:tipc: Fix a double free in tipc_sk_mcast_rcv")
tried to fix this double free by not releasing the skbs in be->arrvq,
which would definitely cause the skbs' leak.

The problem is we shouldn't process the skbs in be->arrvq without any
lock to protect the code from peeking to dequeuing them. The fix here
is to use a temp skb list instead of be->arrvq to make it "per thread
safe". While at it, remove the no-longer-used be->arrvq.

Fixes: cb1b728096f5 ("tipc: eliminate race condition at multicast reception")
Fixes: 6bf24dc0cc0c ("net:tipc: Fix a double free in tipc_sk_mcast_rcv")
Reported-by: Li Shuang <shuali@redhat.com>
Signed-off-by: Xin Long <lucien.xin@gmail.com>
---
 net/tipc/node.c   |  9 ++++-----
 net/tipc/socket.c | 16 +++-------------
 2 files changed, 7 insertions(+), 18 deletions(-)

diff --git a/net/tipc/node.c b/net/tipc/node.c
index e0ee832..0c636fb 100644
--- a/net/tipc/node.c
+++ b/net/tipc/node.c
@@ -72,7 +72,6 @@ struct tipc_link_entry {
 struct tipc_bclink_entry {
 	struct tipc_link *link;
 	struct sk_buff_head inputq1;
-	struct sk_buff_head arrvq;
 	struct sk_buff_head inputq2;
 	struct sk_buff_head namedq;
 	u16 named_rcv_nxt;
@@ -552,7 +551,6 @@ struct tipc_node *tipc_node_create(struct net *net, u32 addr, u8 *peer_id,
 	INIT_LIST_HEAD(&n->conn_sks);
 	skb_queue_head_init(&n->bc_entry.namedq);
 	skb_queue_head_init(&n->bc_entry.inputq1);
-	__skb_queue_head_init(&n->bc_entry.arrvq);
 	skb_queue_head_init(&n->bc_entry.inputq2);
 	for (i = 0; i < MAX_BEARERS; i++)
 		spin_lock_init(&n->links[i].lock);
@@ -1803,14 +1801,15 @@ void tipc_node_broadcast(struct net *net, struct sk_buff *skb, int rc_dests)
 static void tipc_node_mcast_rcv(struct tipc_node *n)
 {
 	struct tipc_bclink_entry *be = &n->bc_entry;
+	struct sk_buff_head tmpq;
 
-	/* 'arrvq' is under inputq2's lock protection */
+	__skb_queue_head_init(&tmpq);
 	spin_lock_bh(&be->inputq2.lock);
 	spin_lock_bh(&be->inputq1.lock);
-	skb_queue_splice_tail_init(&be->inputq1, &be->arrvq);
+	skb_queue_splice_tail_init(&be->inputq1, &tmpq);
 	spin_unlock_bh(&be->inputq1.lock);
 	spin_unlock_bh(&be->inputq2.lock);
-	tipc_sk_mcast_rcv(n->net, &be->arrvq, &be->inputq2);
+	tipc_sk_mcast_rcv(n->net, &tmpq, &be->inputq2);
 }
 
 static void tipc_node_bc_sync_rcv(struct tipc_node *n, struct tipc_msg *hdr,
diff --git a/net/tipc/socket.c b/net/tipc/socket.c
index 022999e..2870798 100644
--- a/net/tipc/socket.c
+++ b/net/tipc/socket.c
@@ -1210,8 +1210,7 @@ void tipc_sk_mcast_rcv(struct net *net, struct sk_buff_head *arrvq,
 	__skb_queue_head_init(&tmpq);
 	INIT_LIST_HEAD(&dports);
 
-	skb = tipc_skb_peek(arrvq, &inputq->lock);
-	for (; skb; skb = tipc_skb_peek(arrvq, &inputq->lock)) {
Please remove function tipc_skb_peek() because it is no longer used.

+	while ((skb = __skb_dequeue(arrvq)) != NULL) {
 		hdr = buf_msg(skb);
 		user = msg_user(hdr);
 		mtyp = msg_type(hdr);
@@ -1220,13 +1219,7 @@ void tipc_sk_mcast_rcv(struct net *net, struct sk_buff_head *arrvq,
 		type = msg_nametype(hdr);
 
 		if (mtyp == TIPC_GRP_UCAST_MSG || user == GROUP_PROTOCOL) {
-			spin_lock_bh(&inputq->lock);
-			if (skb_peek(arrvq) == skb) {
-				__skb_dequeue(arrvq);
-				__skb_queue_tail(inputq, skb);
-			}
-			kfree_skb(skb);
-			spin_unlock_bh(&inputq->lock);
+			skb_queue_tail(inputq, skb);
 			continue;
 		}
 
@@ -1263,10 +1256,7 @@ void tipc_sk_mcast_rcv(struct net *net, struct sk_buff_head *arrvq,
 		}
 		/* Append to inputq if not already done by other thread */
Please remove above comment because the temporary queue is thread-safe
 		spin_lock_bh(&inputq->lock);
-		if (skb_peek(arrvq) == skb) {
-			skb_queue_splice_tail_init(&tmpq, inputq);
-			__skb_dequeue(arrvq);
-		}
+		skb_queue_splice_tail_init(&tmpq, inputq);
 		spin_unlock_bh(&inputq->lock);
 		__skb_queue_purge(&tmpq);
 		kfree_skb(skb);
Jon Maloy May 13, 2021, 9:15 p.m. UTC | #2
On 4/28/21 3:30 PM, Xin Long wrote:
> After commit cb1b728096f5 ("tipc: eliminate race condition at multicast
> reception"), when processing the multicast reception, the packets are
> firstly moved from be->inputq1 to be->arrvq in tipc_node_broadcast(),
> then process be->arrvq in tipc_sk_mcast_rcv().
>
> In tipc_sk_mcast_rcv(), it gets the 1st skb by skb_peek(), then process
> this skb without any lock. It means meanwhile another thread could also
> call tipc_sk_mcast_rcv() and process be->arrvq and pick up the same skb,
> then free it. A double free issue will be caused as Li Shuang reported:
>
>    [] kernel BUG at mm/slub.c:305!
>    []  kfree+0x3a7/0x3d0
>    []  kfree_skb+0x32/0xa0
>    []  skb_release_data+0xb4/0x170
>    []  kfree_skb+0x32/0xa0
>    []  skb_release_data+0xb4/0x170
>    []  kfree_skb+0x32/0xa0
>    []  tipc_sk_mcast_rcv+0x1fa/0x380 [tipc]
>    []  tipc_rcv+0x411/0x1120 [tipc]
>    []  tipc_udp_recv+0xc6/0x1e0 [tipc]
>    []  udp_queue_rcv_one_skb+0x1a9/0x500
>    []  udp_unicast_rcv_skb.isra.66+0x75/0x90
>    []  __udp4_lib_rcv+0x537/0xc40
>    []  ip_protocol_deliver_rcu+0xdf/0x1d0
>    []  ip_local_deliver_finish+0x4a/0x50
>    []  ip_local_deliver+0x6b/0xe0
>    []  ip_rcv+0x27b/0x36a
>    []  __netif_receive_skb_core+0xb47/0xc40
>    []  process_backlog+0xae/0x160
>
> Commit 6bf24dc0cc0c ("net:tipc: Fix a double free in tipc_sk_mcast_rcv")
> tried to fix this double free by not releasing the skbs in be->arrvq,
> which would definitely cause the skbs' leak.
>
> The problem is we shouldn't process the skbs in be->arrvq without any
> lock to protect the code from peeking to dequeuing them. The fix here
> is to use a temp skb list instead of be->arrvq to make it "per thread
> safe". While at it, remove the no-longer-used be->arrvq.
>
> Fixes: cb1b728096f5 ("tipc: eliminate race condition at multicast reception")
> Fixes: 6bf24dc0cc0c ("net:tipc: Fix a double free in tipc_sk_mcast_rcv")
> Reported-by: Li Shuang <shuali@redhat.com>
> Signed-off-by: Xin Long <lucien.xin@gmail.com>
> ---
>   net/tipc/node.c   |  9 ++++-----
>   net/tipc/socket.c | 16 +++-------------
>   2 files changed, 7 insertions(+), 18 deletions(-)
>
> diff --git a/net/tipc/node.c b/net/tipc/node.c
> index e0ee832..0c636fb 100644
> --- a/net/tipc/node.c
> +++ b/net/tipc/node.c
> @@ -72,7 +72,6 @@ struct tipc_link_entry {
>   struct tipc_bclink_entry {
>   	struct tipc_link *link;
>   	struct sk_buff_head inputq1;
> -	struct sk_buff_head arrvq;
>   	struct sk_buff_head inputq2;
>   	struct sk_buff_head namedq;
>   	u16 named_rcv_nxt;
> @@ -552,7 +551,6 @@ struct tipc_node *tipc_node_create(struct net *net, u32 addr, u8 *peer_id,
>   	INIT_LIST_HEAD(&n->conn_sks);
>   	skb_queue_head_init(&n->bc_entry.namedq);
>   	skb_queue_head_init(&n->bc_entry.inputq1);
> -	__skb_queue_head_init(&n->bc_entry.arrvq);
>   	skb_queue_head_init(&n->bc_entry.inputq2);
>   	for (i = 0; i < MAX_BEARERS; i++)
>   		spin_lock_init(&n->links[i].lock);
> @@ -1803,14 +1801,15 @@ void tipc_node_broadcast(struct net *net, struct sk_buff *skb, int rc_dests)
>   static void tipc_node_mcast_rcv(struct tipc_node *n)
>   {
>   	struct tipc_bclink_entry *be = &n->bc_entry;
> +	struct sk_buff_head tmpq;
>   
> -	/* 'arrvq' is under inputq2's lock protection */
> +	__skb_queue_head_init(&tmpq);
>   	spin_lock_bh(&be->inputq2.lock);
>   	spin_lock_bh(&be->inputq1.lock);
> -	skb_queue_splice_tail_init(&be->inputq1, &be->arrvq);
> +	skb_queue_splice_tail_init(&be->inputq1, &tmpq);
>   	spin_unlock_bh(&be->inputq1.lock);
>   	spin_unlock_bh(&be->inputq2.lock);
> -	tipc_sk_mcast_rcv(n->net, &be->arrvq, &be->inputq2);
> +	tipc_sk_mcast_rcv(n->net, &tmpq, &be->inputq2);
>   }
>   
>   static void tipc_node_bc_sync_rcv(struct tipc_node *n, struct tipc_msg *hdr,
> diff --git a/net/tipc/socket.c b/net/tipc/socket.c
> index 022999e..2870798 100644
> --- a/net/tipc/socket.c
> +++ b/net/tipc/socket.c
> @@ -1210,8 +1210,7 @@ void tipc_sk_mcast_rcv(struct net *net, struct sk_buff_head *arrvq,
>   	__skb_queue_head_init(&tmpq);
>   	INIT_LIST_HEAD(&dports);
>   
> -	skb = tipc_skb_peek(arrvq, &inputq->lock);
> -	for (; skb; skb = tipc_skb_peek(arrvq, &inputq->lock)) {
> +	while ((skb = __skb_dequeue(arrvq)) != NULL) {
>   		hdr = buf_msg(skb);
>   		user = msg_user(hdr);
>   		mtyp = msg_type(hdr);
> @@ -1220,13 +1219,7 @@ void tipc_sk_mcast_rcv(struct net *net, struct sk_buff_head *arrvq,
>   		type = msg_nametype(hdr);
>   
>   		if (mtyp == TIPC_GRP_UCAST_MSG || user == GROUP_PROTOCOL) {
> -			spin_lock_bh(&inputq->lock);
> -			if (skb_peek(arrvq) == skb) {
> -				__skb_dequeue(arrvq);
> -				__skb_queue_tail(inputq, skb);
> -			}
> -			kfree_skb(skb);
> -			spin_unlock_bh(&inputq->lock);
> +			skb_queue_tail(inputq, skb);
>   			continue;
>   		}
>   
> @@ -1263,10 +1256,7 @@ void tipc_sk_mcast_rcv(struct net *net, struct sk_buff_head *arrvq,
>   		}
>   		/* Append to inputq if not already done by other thread */
>   		spin_lock_bh(&inputq->lock);
> -		if (skb_peek(arrvq) == skb) {
> -			skb_queue_splice_tail_init(&tmpq, inputq);
> -			__skb_dequeue(arrvq);
> -		}
> +		skb_queue_splice_tail_init(&tmpq, inputq);
>   		spin_unlock_bh(&inputq->lock);
>   		__skb_queue_purge(&tmpq);
>   		kfree_skb(skb);
Nack.

This would invalidate the sequence guarantee of messages between two 
specific sockets.
The whole point of having a lock protected arrival queue is to preserve 
the order when messages are moved from inputq1 to inputq2.
Let's take a discussion on our mailing list.

///jon
diff mbox series

Patch

diff --git a/net/tipc/node.c b/net/tipc/node.c
index e0ee832..0c636fb 100644
--- a/net/tipc/node.c
+++ b/net/tipc/node.c
@@ -72,7 +72,6 @@  struct tipc_link_entry {
 struct tipc_bclink_entry {
 	struct tipc_link *link;
 	struct sk_buff_head inputq1;
-	struct sk_buff_head arrvq;
 	struct sk_buff_head inputq2;
 	struct sk_buff_head namedq;
 	u16 named_rcv_nxt;
@@ -552,7 +551,6 @@  struct tipc_node *tipc_node_create(struct net *net, u32 addr, u8 *peer_id,
 	INIT_LIST_HEAD(&n->conn_sks);
 	skb_queue_head_init(&n->bc_entry.namedq);
 	skb_queue_head_init(&n->bc_entry.inputq1);
-	__skb_queue_head_init(&n->bc_entry.arrvq);
 	skb_queue_head_init(&n->bc_entry.inputq2);
 	for (i = 0; i < MAX_BEARERS; i++)
 		spin_lock_init(&n->links[i].lock);
@@ -1803,14 +1801,15 @@  void tipc_node_broadcast(struct net *net, struct sk_buff *skb, int rc_dests)
 static void tipc_node_mcast_rcv(struct tipc_node *n)
 {
 	struct tipc_bclink_entry *be = &n->bc_entry;
+	struct sk_buff_head tmpq;
 
-	/* 'arrvq' is under inputq2's lock protection */
+	__skb_queue_head_init(&tmpq);
 	spin_lock_bh(&be->inputq2.lock);
 	spin_lock_bh(&be->inputq1.lock);
-	skb_queue_splice_tail_init(&be->inputq1, &be->arrvq);
+	skb_queue_splice_tail_init(&be->inputq1, &tmpq);
 	spin_unlock_bh(&be->inputq1.lock);
 	spin_unlock_bh(&be->inputq2.lock);
-	tipc_sk_mcast_rcv(n->net, &be->arrvq, &be->inputq2);
+	tipc_sk_mcast_rcv(n->net, &tmpq, &be->inputq2);
 }
 
 static void tipc_node_bc_sync_rcv(struct tipc_node *n, struct tipc_msg *hdr,
diff --git a/net/tipc/socket.c b/net/tipc/socket.c
index 022999e..2870798 100644
--- a/net/tipc/socket.c
+++ b/net/tipc/socket.c
@@ -1210,8 +1210,7 @@  void tipc_sk_mcast_rcv(struct net *net, struct sk_buff_head *arrvq,
 	__skb_queue_head_init(&tmpq);
 	INIT_LIST_HEAD(&dports);
 
-	skb = tipc_skb_peek(arrvq, &inputq->lock);
-	for (; skb; skb = tipc_skb_peek(arrvq, &inputq->lock)) {
+	while ((skb = __skb_dequeue(arrvq)) != NULL) {
 		hdr = buf_msg(skb);
 		user = msg_user(hdr);
 		mtyp = msg_type(hdr);
@@ -1220,13 +1219,7 @@  void tipc_sk_mcast_rcv(struct net *net, struct sk_buff_head *arrvq,
 		type = msg_nametype(hdr);
 
 		if (mtyp == TIPC_GRP_UCAST_MSG || user == GROUP_PROTOCOL) {
-			spin_lock_bh(&inputq->lock);
-			if (skb_peek(arrvq) == skb) {
-				__skb_dequeue(arrvq);
-				__skb_queue_tail(inputq, skb);
-			}
-			kfree_skb(skb);
-			spin_unlock_bh(&inputq->lock);
+			skb_queue_tail(inputq, skb);
 			continue;
 		}
 
@@ -1263,10 +1256,7 @@  void tipc_sk_mcast_rcv(struct net *net, struct sk_buff_head *arrvq,
 		}
 		/* Append to inputq if not already done by other thread */
 		spin_lock_bh(&inputq->lock);
-		if (skb_peek(arrvq) == skb) {
-			skb_queue_splice_tail_init(&tmpq, inputq);
-			__skb_dequeue(arrvq);
-		}
+		skb_queue_splice_tail_init(&tmpq, inputq);
 		spin_unlock_bh(&inputq->lock);
 		__skb_queue_purge(&tmpq);
 		kfree_skb(skb);