Message ID | 20240122011807.2110357-1-shaozhengchao@huawei.com (mailing list archive) |
---|---|
State | Accepted |
Commit | 234ec0b6034b16869d45128b8cd2dc6ffe596f04 |
Delegated to: | Netdev Maintainers |
Headers | show |
Series | [net,v4] netlink: fix potential sleeping issue in mqueue_flush_file | expand |
On Mon, Jan 22, 2024 at 09:18:07AM +0800, Zhengchao Shao wrote: > I analyze the potential sleeping issue of the following processes: > Thread A Thread B > ... netlink_create //ref = 1 > do_mq_notify ... > sock = netlink_getsockbyfilp ... //ref = 2 > info->notify_sock = sock; ... > ... netlink_sendmsg > ... skb = netlink_alloc_large_skb //skb->head is vmalloced > ... netlink_unicast > ... sk = netlink_getsockbyportid //ref = 3 > ... netlink_sendskb > ... __netlink_sendskb > ... skb_queue_tail //put skb to sk_receive_queue > ... sock_put //ref = 2 > ... ... > ... netlink_release > ... deferred_put_nlk_sk //ref = 1 > mqueue_flush_file > spin_lock > remove_notification > netlink_sendskb > sock_put //ref = 0 > sk_free > ... > __sk_destruct > netlink_sock_destruct > skb_queue_purge //get skb from sk_receive_queue > ... > __skb_queue_purge_reason > kfree_skb_reason > __kfree_skb > ... > skb_release_all > skb_release_head_state > netlink_skb_destructor > vfree(skb->head) //sleeping while holding spinlock > > In netlink_sendmsg, if the memory pointed to by skb->head is allocated by > vmalloc, and is put to sk_receive_queue queue, also the skb is not freed. > When the mqueue executes flush, the sleeping bug will occur. Use > vfree_atomic instead of vfree in netlink_skb_destructor to solve the issue. mqueue notification is of NOTIFY_COOKIE_LEN size: static int do_mq_notify(mqd_t mqdes, const struct sigevent *notification) { [...] if (notification->sigev_notify == SIGEV_THREAD) { long timeo; /* create the notify skb */ nc = alloc_skb(NOTIFY_COOKIE_LEN, GFP_KERNEL); if (!nc) return -ENOMEM; Do you have a reproducer?
On 2024/1/22 16:56, Pablo Neira Ayuso wrote: > On Mon, Jan 22, 2024 at 09:18:07AM +0800, Zhengchao Shao wrote: >> I analyze the potential sleeping issue of the following processes: >> Thread A Thread B >> ... netlink_create //ref = 1 >> do_mq_notify ... >> sock = netlink_getsockbyfilp ... //ref = 2 >> info->notify_sock = sock; ... >> ... netlink_sendmsg >> ... skb = netlink_alloc_large_skb //skb->head is vmalloced >> ... netlink_unicast >> ... sk = netlink_getsockbyportid //ref = 3 >> ... netlink_sendskb >> ... __netlink_sendskb >> ... skb_queue_tail //put skb to sk_receive_queue >> ... sock_put //ref = 2 >> ... ... >> ... netlink_release >> ... deferred_put_nlk_sk //ref = 1 >> mqueue_flush_file >> spin_lock >> remove_notification >> netlink_sendskb >> sock_put //ref = 0 >> sk_free >> ... >> __sk_destruct >> netlink_sock_destruct >> skb_queue_purge //get skb from sk_receive_queue >> ... >> __skb_queue_purge_reason >> kfree_skb_reason >> __kfree_skb >> ... >> skb_release_all >> skb_release_head_state >> netlink_skb_destructor >> vfree(skb->head) //sleeping while holding spinlock >> >> In netlink_sendmsg, if the memory pointed to by skb->head is allocated by >> vmalloc, and is put to sk_receive_queue queue, also the skb is not freed. >> When the mqueue executes flush, the sleeping bug will occur. Use >> vfree_atomic instead of vfree in netlink_skb_destructor to solve the issue. > > mqueue notification is of NOTIFY_COOKIE_LEN size: > > static int do_mq_notify(mqd_t mqdes, const struct sigevent *notification) > { > [...] > if (notification->sigev_notify == SIGEV_THREAD) { > long timeo; > > /* create the notify skb */ > nc = alloc_skb(NOTIFY_COOKIE_LEN, GFP_KERNEL); > if (!nc) > return -ENOMEM; > > Do you have a reproducer? Hi Pablo: I donot have reproducer. I found the issue when running syz on the 5.10 stable branch, but it only happened once. Then I analyzed the mainline code and found the same issue. The sock can be obtained from the value of sigev_signo transferred by the user in do_mq_notify. And the sock may be of type netlink, and it is possible to allocate the head area using vmalloc. Not only release the skb allocated in do_mq_notify, but also release the skb allocated in netlink_sendmsg when put sock mqueue_flush_file. What I missed? Thank you. Zhengchao Shao
Hi, On Mon, 2024-01-22 at 19:10 +0800, shaozhengchao wrote: > On 2024/1/22 16:56, Pablo Neira Ayuso wrote: > > On Mon, Jan 22, 2024 at 09:18:07AM +0800, Zhengchao Shao wrote: > > > I analyze the potential sleeping issue of the following processes: > > > Thread A Thread B > > > ... netlink_create //ref = 1 > > > do_mq_notify ... > > > sock = netlink_getsockbyfilp ... //ref = 2 > > > info->notify_sock = sock; ... > > > ... netlink_sendmsg > > > ... skb = netlink_alloc_large_skb //skb->head is vmalloced > > > ... netlink_unicast > > > ... sk = netlink_getsockbyportid //ref = 3 > > > ... netlink_sendskb > > > ... __netlink_sendskb > > > ... skb_queue_tail //put skb to sk_receive_queue > > > ... sock_put //ref = 2 > > > ... ... > > > ... netlink_release > > > ... deferred_put_nlk_sk //ref = 1 > > > mqueue_flush_file > > > spin_lock > > > remove_notification > > > netlink_sendskb > > > sock_put //ref = 0 > > > sk_free > > > ... > > > __sk_destruct > > > netlink_sock_destruct > > > skb_queue_purge //get skb from sk_receive_queue > > > ... > > > __skb_queue_purge_reason > > > kfree_skb_reason > > > __kfree_skb > > > ... > > > skb_release_all > > > skb_release_head_state > > > netlink_skb_destructor > > > vfree(skb->head) //sleeping while holding spinlock > > > > > > In netlink_sendmsg, if the memory pointed to by skb->head is allocated by > > > vmalloc, and is put to sk_receive_queue queue, also the skb is not freed. > > > When the mqueue executes flush, the sleeping bug will occur. Use > > > vfree_atomic instead of vfree in netlink_skb_destructor to solve the issue. > > > > mqueue notification is of NOTIFY_COOKIE_LEN size: > > > > static int do_mq_notify(mqd_t mqdes, const struct sigevent *notification) > > { > > [...] > > if (notification->sigev_notify == SIGEV_THREAD) { > > long timeo; > > > > /* create the notify skb */ > > nc = alloc_skb(NOTIFY_COOKIE_LEN, GFP_KERNEL); > > if (!nc) > > return -ENOMEM; > > > > Do you have a reproducer? > Hi Pablo: > I donot have reproducer. I found the issue when running syz on > the 5.10 stable branch, but it only happened once. Then I analyzed the > mainline code and found the same issue. > The sock can be obtained from the value of sigev_signo > transferred by the user in do_mq_notify. And the sock may be of type > netlink, and it is possible to allocate the head area using vmalloc. > Not only release the skb allocated in do_mq_notify, but also release > the skb allocated in netlink_sendmsg when put sock mqueue_flush_file. > What I missed? I believe your explanation is solid and the patch looks safe, I'm applying it. Thank you! Paolo p.s. for once I'm answering in place of Pablo, which is sort of funny given that our names are confused a significant number of times on the ML...)
Hello: This patch was applied to netdev/net.git (main) by Paolo Abeni <pabeni@redhat.com>: On Mon, 22 Jan 2024 09:18:07 +0800 you wrote: > I analyze the potential sleeping issue of the following processes: > Thread A Thread B > ... netlink_create //ref = 1 > do_mq_notify ... > sock = netlink_getsockbyfilp ... //ref = 2 > info->notify_sock = sock; ... > ... netlink_sendmsg > ... skb = netlink_alloc_large_skb //skb->head is vmalloced > ... netlink_unicast > ... sk = netlink_getsockbyportid //ref = 3 > ... netlink_sendskb > ... __netlink_sendskb > ... skb_queue_tail //put skb to sk_receive_queue > ... sock_put //ref = 2 > ... ... > ... netlink_release > ... deferred_put_nlk_sk //ref = 1 > mqueue_flush_file > spin_lock > remove_notification > netlink_sendskb > sock_put //ref = 0 > sk_free > ... > __sk_destruct > netlink_sock_destruct > skb_queue_purge //get skb from sk_receive_queue > ... > __skb_queue_purge_reason > kfree_skb_reason > __kfree_skb > ... > skb_release_all > skb_release_head_state > netlink_skb_destructor > vfree(skb->head) //sleeping while holding spinlock > > [...] Here is the summary with links: - [net,v4] netlink: fix potential sleeping issue in mqueue_flush_file https://git.kernel.org/netdev/net/c/234ec0b6034b You are awesome, thank you!
diff --git a/net/netlink/af_netlink.c b/net/netlink/af_netlink.c index 4ed8ffd58ff3..9c962347cf85 100644 --- a/net/netlink/af_netlink.c +++ b/net/netlink/af_netlink.c @@ -374,7 +374,7 @@ static void netlink_skb_destructor(struct sk_buff *skb) if (is_vmalloc_addr(skb->head)) { if (!skb->cloned || !atomic_dec_return(&(skb_shinfo(skb)->dataref))) - vfree(skb->head); + vfree_atomic(skb->head); skb->head = NULL; }
I analyze the potential sleeping issue of the following processes: Thread A Thread B ... netlink_create //ref = 1 do_mq_notify ... sock = netlink_getsockbyfilp ... //ref = 2 info->notify_sock = sock; ... ... netlink_sendmsg ... skb = netlink_alloc_large_skb //skb->head is vmalloced ... netlink_unicast ... sk = netlink_getsockbyportid //ref = 3 ... netlink_sendskb ... __netlink_sendskb ... skb_queue_tail //put skb to sk_receive_queue ... sock_put //ref = 2 ... ... ... netlink_release ... deferred_put_nlk_sk //ref = 1 mqueue_flush_file spin_lock remove_notification netlink_sendskb sock_put //ref = 0 sk_free ... __sk_destruct netlink_sock_destruct skb_queue_purge //get skb from sk_receive_queue ... __skb_queue_purge_reason kfree_skb_reason __kfree_skb ... skb_release_all skb_release_head_state netlink_skb_destructor vfree(skb->head) //sleeping while holding spinlock In netlink_sendmsg, if the memory pointed to by skb->head is allocated by vmalloc, and is put to sk_receive_queue queue, also the skb is not freed. When the mqueue executes flush, the sleeping bug will occur. Use vfree_atomic instead of vfree in netlink_skb_destructor to solve the issue. Fixes: c05cdb1b864f ("netlink: allow large data transfers from user-space") Signed-off-by: Zhengchao Shao <shaozhengchao@huawei.com> --- v4: Use vfree_atomic to release skb->head v3: Put sock after releasing the spinlock. v2: CCed some networking maintainer & netdev list --- net/netlink/af_netlink.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-)