| Message ID | 20220208232822.3432213-1-eric.dumazet@gmail.com (mailing list archive) |
|---|---|
| State | Accepted |
| Commit | 68468d8c4cd4222a4ca1f185ab5a1c14480d078c |
| Delegated to | Netdev Maintainers |
| Series | [net] veth: fix races around rq->rx_notify_masked |
Hello:

This patch was applied to netdev/net.git (master)
by David S. Miller <davem@davemloft.net>:

On Tue, 8 Feb 2022 15:28:22 -0800 you wrote:
> From: Eric Dumazet <edumazet@google.com>
>
> veth being NETIF_F_LLTX enabled, we need to be more careful
> whenever we read/write rq->rx_notify_masked.
>
> BUG: KCSAN: data-race in veth_xmit / veth_xmit
>
> [...]

Here is the summary with links:
  - [net] veth: fix races around rq->rx_notify_masked
    https://git.kernel.org/netdev/net/c/68468d8c4cd4

You are awesome, thank you!
On 2022/02/09 8:28, Eric Dumazet wrote:
> From: Eric Dumazet <edumazet@google.com>

Thank you for handling this case.

> veth being NETIF_F_LLTX enabled, we need to be more careful
> whenever we read/write rq->rx_notify_masked.
>
> BUG: KCSAN: data-race in veth_xmit / veth_xmit
>
> write to 0xffff888133d9a9f8 of 1 bytes by task 23552 on cpu 0:
>  __veth_xdp_flush drivers/net/veth.c:269 [inline]
>  veth_xmit+0x307/0x470 drivers/net/veth.c:350
>  __netdev_start_xmit include/linux/netdevice.h:4683 [inline]
>  netdev_start_xmit include/linux/netdevice.h:4697 [inline]
>  xmit_one+0x105/0x2f0 net/core/dev.c:3473
>  dev_hard_start_xmit net/core/dev.c:3489 [inline]
>  __dev_queue_xmit+0x86d/0xf90 net/core/dev.c:4116
>  dev_queue_xmit+0x13/0x20 net/core/dev.c:4149
>  br_dev_queue_push_xmit+0x3ce/0x430 net/bridge/br_forward.c:53
>  NF_HOOK include/linux/netfilter.h:307 [inline]
>  br_forward_finish net/bridge/br_forward.c:66 [inline]
>  NF_HOOK include/linux/netfilter.h:307 [inline]
>  __br_forward+0x2e4/0x400 net/bridge/br_forward.c:115
>  br_flood+0x521/0x5c0 net/bridge/br_forward.c:242
>  br_dev_xmit+0x8b6/0x960
>  __netdev_start_xmit include/linux/netdevice.h:4683 [inline]
>  netdev_start_xmit include/linux/netdevice.h:4697 [inline]
>  xmit_one+0x105/0x2f0 net/core/dev.c:3473
>  dev_hard_start_xmit net/core/dev.c:3489 [inline]
>  __dev_queue_xmit+0x86d/0xf90 net/core/dev.c:4116
>  dev_queue_xmit+0x13/0x20 net/core/dev.c:4149
>  neigh_hh_output include/net/neighbour.h:525 [inline]
>  neigh_output include/net/neighbour.h:539 [inline]
>  ip_finish_output2+0x6f8/0xb70 net/ipv4/ip_output.c:228
>  ip_finish_output+0xfb/0x240 net/ipv4/ip_output.c:316
>  NF_HOOK_COND include/linux/netfilter.h:296 [inline]
>  ip_output+0xf3/0x1a0 net/ipv4/ip_output.c:430
>  dst_output include/net/dst.h:451 [inline]
>  ip_local_out net/ipv4/ip_output.c:126 [inline]
>  ip_send_skb+0x6e/0xe0 net/ipv4/ip_output.c:1570
>  udp_send_skb+0x641/0x880 net/ipv4/udp.c:967
>  udp_sendmsg+0x12ea/0x14c0 net/ipv4/udp.c:1254
>  inet_sendmsg+0x5f/0x80 net/ipv4/af_inet.c:819
>  sock_sendmsg_nosec net/socket.c:705 [inline]
>  sock_sendmsg net/socket.c:725 [inline]
>  ____sys_sendmsg+0x39a/0x510 net/socket.c:2413
>  ___sys_sendmsg net/socket.c:2467 [inline]
>  __sys_sendmmsg+0x267/0x4c0 net/socket.c:2553
>  __do_sys_sendmmsg net/socket.c:2582 [inline]
>  __se_sys_sendmmsg net/socket.c:2579 [inline]
>  __x64_sys_sendmmsg+0x53/0x60 net/socket.c:2579
>  do_syscall_x64 arch/x86/entry/common.c:50 [inline]
>  do_syscall_64+0x44/0xd0 arch/x86/entry/common.c:80
>  entry_SYSCALL_64_after_hwframe+0x44/0xae
>
> read to 0xffff888133d9a9f8 of 1 bytes by task 23563 on cpu 1:
>  __veth_xdp_flush drivers/net/veth.c:268 [inline]
>  veth_xmit+0x2d6/0x470 drivers/net/veth.c:350
>  __netdev_start_xmit include/linux/netdevice.h:4683 [inline]
>  netdev_start_xmit include/linux/netdevice.h:4697 [inline]
>  xmit_one+0x105/0x2f0 net/core/dev.c:3473
>  dev_hard_start_xmit net/core/dev.c:3489 [inline]
>  __dev_queue_xmit+0x86d/0xf90 net/core/dev.c:4116
>  dev_queue_xmit+0x13/0x20 net/core/dev.c:4149
>  br_dev_queue_push_xmit+0x3ce/0x430 net/bridge/br_forward.c:53
>  NF_HOOK include/linux/netfilter.h:307 [inline]
>  br_forward_finish net/bridge/br_forward.c:66 [inline]
>  NF_HOOK include/linux/netfilter.h:307 [inline]
>  __br_forward+0x2e4/0x400 net/bridge/br_forward.c:115
>  br_flood+0x521/0x5c0 net/bridge/br_forward.c:242
>  br_dev_xmit+0x8b6/0x960
>  __netdev_start_xmit include/linux/netdevice.h:4683 [inline]
>  netdev_start_xmit include/linux/netdevice.h:4697 [inline]
>  xmit_one+0x105/0x2f0 net/core/dev.c:3473
>  dev_hard_start_xmit net/core/dev.c:3489 [inline]
>  __dev_queue_xmit+0x86d/0xf90 net/core/dev.c:4116
>  dev_queue_xmit+0x13/0x20 net/core/dev.c:4149
>  neigh_hh_output include/net/neighbour.h:525 [inline]
>  neigh_output include/net/neighbour.h:539 [inline]
>  ip_finish_output2+0x6f8/0xb70 net/ipv4/ip_output.c:228
>  ip_finish_output+0xfb/0x240 net/ipv4/ip_output.c:316
>  NF_HOOK_COND include/linux/netfilter.h:296 [inline]
>  ip_output+0xf3/0x1a0 net/ipv4/ip_output.c:430
>  dst_output include/net/dst.h:451 [inline]
>  ip_local_out net/ipv4/ip_output.c:126 [inline]
>  ip_send_skb+0x6e/0xe0 net/ipv4/ip_output.c:1570
>  udp_send_skb+0x641/0x880 net/ipv4/udp.c:967
>  udp_sendmsg+0x12ea/0x14c0 net/ipv4/udp.c:1254
>  inet_sendmsg+0x5f/0x80 net/ipv4/af_inet.c:819
>  sock_sendmsg_nosec net/socket.c:705 [inline]
>  sock_sendmsg net/socket.c:725 [inline]
>  ____sys_sendmsg+0x39a/0x510 net/socket.c:2413
>  ___sys_sendmsg net/socket.c:2467 [inline]
>  __sys_sendmmsg+0x267/0x4c0 net/socket.c:2553
>  __do_sys_sendmmsg net/socket.c:2582 [inline]
>  __se_sys_sendmmsg net/socket.c:2579 [inline]
>  __x64_sys_sendmmsg+0x53/0x60 net/socket.c:2579
>  do_syscall_x64 arch/x86/entry/common.c:50 [inline]
>  do_syscall_64+0x44/0xd0 arch/x86/entry/common.c:80
>  entry_SYSCALL_64_after_hwframe+0x44/0xae
>
> value changed: 0x00 -> 0x01

I'm not familiar with KCSAN.
Does this mean the rx_notify_masked value is changed while another CPU is reading it?
If so, I'm not sure there is a problem with that.
At worst we could call napi_schedule() twice, but that just causes one extra napi
poll due to NAPIF_STATE_MISSED, and it happens very rarely?

Toshiaki Makita
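For reference, the interleaving Toshiaki describes maps onto the pre-patch
__veth_xdp_flush() (shown in the patch at the end of this thread) roughly as
follows; the CPU labels in the comments are illustrative annotations, not part
of the driver source:

```c
/* Pre-patch __veth_xdp_flush() from drivers/net/veth.c, annotated with
 * the benign double-schedule interleaving of two concurrent callers:
 */
static void __veth_xdp_flush(struct veth_rq *rq)
{
	/* Write ptr_ring before reading rx_notify_masked */
	smp_mb();
	if (!rq->rx_notify_masked) {          /* CPU0 and CPU1 both read false */
		rq->rx_notify_masked = true;  /* both write true */
		napi_schedule(&rq->xdp_napi); /* called twice; the loser merely
					       * sets NAPIF_STATE_MISSED, so the
					       * cost is one extra NAPI poll */
	}
}
```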
On Wed, Feb 9, 2022 at 4:36 AM Toshiaki Makita
<toshiaki.makita1@gmail.com> wrote:
>
> On 2022/02/09 8:28, Eric Dumazet wrote:
> > From: Eric Dumazet <edumazet@google.com>
>
> Thank you for handling this case.
>
> > veth being NETIF_F_LLTX enabled, we need to be more careful
> > whenever we read/write rq->rx_notify_masked.
> >
> > BUG: KCSAN: data-race in veth_xmit / veth_xmit
> >
> > w
> > value changed: 0x00 -> 0x01
>
> I'm not familiar with KCSAN.
> Does this mean rx_notify_masked value is changed while another CPU is reading it?

Yes.

> If so, I'm not sure there is a problem with that.

This is a problem if not annotated properly.

> At least we could call napi_schedule() twice, but that just causes one extra napi
> poll due to NAPIF_STATE_MISSED, and it happens very rarely?
>
> Toshiaki Makita

The issue might be more problematic, a compiler might play bad games,
look for load and store tearing.
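As a sketch of what "store tearing" means in general (the next message points
out it cannot apply to a one-byte flag like rx_notify_masked): the compiler may
split one plain multi-byte store into several narrower ones. The struct and
constant below are hypothetical, chosen only for illustration:

```c
/* Hypothetical example, not veth code. */
struct ring_idx {
	unsigned long long tail;
};

void publish_tail(struct ring_idx *r)
{
	/* A plain store of a 64-bit constant may legally be emitted as two
	 * 32-bit immediate stores ("store tearing"), e.g. on x86-64:
	 *
	 *     mov DWORD PTR [rdi],   0xcafef00d
	 *     mov DWORD PTR [rdi+4], 0xdeadbeef
	 *
	 * A concurrent reader can then observe a half-written value.
	 */
	r->tail = 0xdeadbeefcafef00dULL;

	/* WRITE_ONCE(r->tail, 0xdeadbeefcafef00dULL) is the kernel's way of
	 * telling the compiler to use a single access for aligned
	 * word-sized stores, ruling this split out.
	 */
}
```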
On 2022/02/11 1:57, Eric Dumazet wrote:
> On Wed, Feb 9, 2022 at 4:36 AM Toshiaki Makita
> <toshiaki.makita1@gmail.com> wrote:
>>
>> On 2022/02/09 8:28, Eric Dumazet wrote:
>>> From: Eric Dumazet <edumazet@google.com>
>>
>> Thank you for handling this case.
>>
>>> veth being NETIF_F_LLTX enabled, we need to be more careful
>>> whenever we read/write rq->rx_notify_masked.
>>>
>>> BUG: KCSAN: data-race in veth_xmit / veth_xmit
>>>
>>> w
>>> value changed: 0x00 -> 0x01
>>
>> I'm not familiar with KCSAN.
>> Does this mean rx_notify_masked value is changed while another CPU is reading it?
>
> Yes.
>
>> If so, I'm not sure there is a problem with that.
>
> This is a problem if not annotated properly.
>
>> At least we could call napi_schedule() twice, but that just causes one extra napi
>> poll due to NAPIF_STATE_MISSED, and it happens very rarely?
>
> The issue might be more problematic, a compiler might play bad games,
> look for load and store tearing.

Thank you. I assume you mean problems like those listed in this page,
https://lwn.net/Articles/793253/
e.g. "invented stores", not exactly load/store tearing
(since rx_notify_masked is a 0/1 value and does not seem able to be torn).

Now I understand that it's pretty hard to know what exact problem will happen,
as the behavior is undefined and depends heavily on the compiler implementation.

Thank you for the fix again!

Toshiaki Makita
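The "invented store" pattern from that LWN article, transplanted onto the
pre-patch veth_poll() hunk in the diff below, would look roughly like this.
Whether any real compiler performs this transformation here is exactly the
implementation-dependence Toshiaki points out; the rewritten form is a sketch
of what plain (unannotated) accesses permit, not observed compiler output:

```c
/* Pre-patch source (plain accesses), from veth_poll(): */
smp_store_mb(rq->rx_notify_masked, false);
if (unlikely(!__ptr_ring_empty(&rq->xdp_ring))) {
	rq->rx_notify_masked = true;
	napi_schedule(&rq->xdp_napi);
}

/* Having just emitted the store of 'false', a compiler could in principle
 * invent the 'true' store and undo it on the untaken path:
 */
smp_store_mb(rq->rx_notify_masked, false);
rq->rx_notify_masked = true;                    /* invented store */
if (unlikely(!__ptr_ring_empty(&rq->xdp_ring)))
	napi_schedule(&rq->xdp_napi);
else
	rq->rx_notify_masked = false;           /* undo */

/* A veth_xmit() racing into the window after the invented store reads
 * 'true', concludes NAPI is already scheduled, and skips its wakeup.
 * WRITE_ONCE()/READ_ONCE() forbid such transformations.
 */
```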
```diff
diff --git a/drivers/net/veth.c b/drivers/net/veth.c
index 354a963075c5f15d194152ca9b38c59c66763e94..d29fb9759cc9557d62370f016fc160f3dbd16ce3 100644
--- a/drivers/net/veth.c
+++ b/drivers/net/veth.c
@@ -265,9 +265,10 @@ static void __veth_xdp_flush(struct veth_rq *rq)
 {
 	/* Write ptr_ring before reading rx_notify_masked */
 	smp_mb();
-	if (!rq->rx_notify_masked) {
-		rq->rx_notify_masked = true;
-		napi_schedule(&rq->xdp_napi);
+	if (!READ_ONCE(rq->rx_notify_masked) &&
+	    napi_schedule_prep(&rq->xdp_napi)) {
+		WRITE_ONCE(rq->rx_notify_masked, true);
+		__napi_schedule(&rq->xdp_napi);
 	}
 }

@@ -912,8 +913,10 @@ static int veth_poll(struct napi_struct *napi, int budget)
 		/* Write rx_notify_masked before reading ptr_ring */
 		smp_store_mb(rq->rx_notify_masked, false);
 		if (unlikely(!__ptr_ring_empty(&rq->xdp_ring))) {
-			rq->rx_notify_masked = true;
-			napi_schedule(&rq->xdp_napi);
+			if (napi_schedule_prep(&rq->xdp_napi)) {
+				WRITE_ONCE(rq->rx_notify_masked, true);
+				__napi_schedule(&rq->xdp_napi);
+			}
 		}
 	}
```
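For context on the shape of the fix: napi_schedule() is, in simplified form,
the prep/schedule pair sketched below, and napi_schedule_prep() atomically
claims the NAPI instance so that exactly one of several concurrent callers
wins. Open-coding the pair lets that single winner also own the flag update.
A simplified sketch, not the verbatim include/linux/netdevice.h source:

```c
/* Simplified sketch of the helper the fix open-codes: */
static inline void napi_schedule(struct napi_struct *n)
{
	if (napi_schedule_prep(n))   /* atomic test-and-set of the SCHED bit;
				      * true for exactly one concurrent caller */
		__napi_schedule(n);  /* only the winner queues the poll */
}

/* In the fixed __veth_xdp_flush(), rx_notify_masked is therefore written
 * by at most one CPU per scheduling cycle, between the claim and the
 * actual schedule:
 */
if (!READ_ONCE(rq->rx_notify_masked) &&
    napi_schedule_prep(&rq->xdp_napi)) {
	WRITE_ONCE(rq->rx_notify_masked, true);
	__napi_schedule(&rq->xdp_napi);
}
```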