Message ID | 20230725072049.617289-1-jasowang@redhat.com (mailing list archive) |
---|---|
State | Accepted |
Commit | 25266128fe16d5632d43ada34c847d7b8daba539 |
Delegated to: | Netdev Maintainers |
Headers | show |
Series | [net] virtio-net: fix race between set queues and probe | expand |
On Tue, Jul 25, 2023 at 03:20:49AM -0400, Jason Wang wrote: > A race were found where set_channels could be called after registering > but before virtnet_set_queues() in virtnet_probe(). Fixing this by > moving the virtnet_set_queues() before netdevice registering. While at > it, use _virtnet_set_queues() to avoid holding rtnl as the device is > not even registered at that time. > > Fixes: a220871be66f ("virtio-net: correctly enable multiqueue") > Signed-off-by: Jason Wang <jasowang@redhat.com> Acked-by: Michael S. Tsirkin <mst@redhat.com> stable material I guess? > --- > drivers/net/virtio_net.c | 4 ++-- > 1 file changed, 2 insertions(+), 2 deletions(-) > > diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c > index 0db14f6b87d3..1270c8d23463 100644 > --- a/drivers/net/virtio_net.c > +++ b/drivers/net/virtio_net.c > @@ -4219,6 +4219,8 @@ static int virtnet_probe(struct virtio_device *vdev) > if (vi->has_rss || vi->has_rss_hash_report) > virtnet_init_default_rss(vi); > > + _virtnet_set_queues(vi, vi->curr_queue_pairs); > + > /* serialize netdev register + virtio_device_ready() with ndo_open() */ > rtnl_lock(); > > @@ -4257,8 +4259,6 @@ static int virtnet_probe(struct virtio_device *vdev) > goto free_unregister_netdev; > } > > - virtnet_set_queues(vi, vi->curr_queue_pairs); > - > /* Assume link up if device can't report link status, > otherwise get link status from config. */ > netif_carrier_off(dev); > -- > 2.39.3
On Tue, 25 Jul 2023 03:20:49 -0400, Jason Wang <jasowang@redhat.com> wrote: > A race were found where set_channels could be called after registering > but before virtnet_set_queues() in virtnet_probe(). Fixing this by > moving the virtnet_set_queues() before netdevice registering. While at > it, use _virtnet_set_queues() to avoid holding rtnl as the device is > not even registered at that time. > > Fixes: a220871be66f ("virtio-net: correctly enable multiqueue") > Signed-off-by: Jason Wang <jasowang@redhat.com> Reviewed-by: Xuan Zhuo <xuanzhuo@linux.alibaba.com> > --- > drivers/net/virtio_net.c | 4 ++-- > 1 file changed, 2 insertions(+), 2 deletions(-) > > diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c > index 0db14f6b87d3..1270c8d23463 100644 > --- a/drivers/net/virtio_net.c > +++ b/drivers/net/virtio_net.c > @@ -4219,6 +4219,8 @@ static int virtnet_probe(struct virtio_device *vdev) > if (vi->has_rss || vi->has_rss_hash_report) > virtnet_init_default_rss(vi); > > + _virtnet_set_queues(vi, vi->curr_queue_pairs); > + > /* serialize netdev register + virtio_device_ready() with ndo_open() */ > rtnl_lock(); > > @@ -4257,8 +4259,6 @@ static int virtnet_probe(struct virtio_device *vdev) > goto free_unregister_netdev; > } > > - virtnet_set_queues(vi, vi->curr_queue_pairs); > - > /* Assume link up if device can't report link status, > otherwise get link status from config. */ > netif_carrier_off(dev); > -- > 2.39.3 >
On Tue, Jul 25, 2023 at 3:53 PM Michael S. Tsirkin <mst@redhat.com> wrote: > > On Tue, Jul 25, 2023 at 03:20:49AM -0400, Jason Wang wrote: > > A race were found where set_channels could be called after registering > > but before virtnet_set_queues() in virtnet_probe(). Fixing this by > > moving the virtnet_set_queues() before netdevice registering. While at > > it, use _virtnet_set_queues() to avoid holding rtnl as the device is > > not even registered at that time. > > > > Fixes: a220871be66f ("virtio-net: correctly enable multiqueue") > > Signed-off-by: Jason Wang <jasowang@redhat.com> > > > Acked-by: Michael S. Tsirkin <mst@redhat.com> > > stable material I guess? Yes it is. Thanks > > > --- > > drivers/net/virtio_net.c | 4 ++-- > > 1 file changed, 2 insertions(+), 2 deletions(-) > > > > diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c > > index 0db14f6b87d3..1270c8d23463 100644 > > --- a/drivers/net/virtio_net.c > > +++ b/drivers/net/virtio_net.c > > @@ -4219,6 +4219,8 @@ static int virtnet_probe(struct virtio_device *vdev) > > if (vi->has_rss || vi->has_rss_hash_report) > > virtnet_init_default_rss(vi); > > > > + _virtnet_set_queues(vi, vi->curr_queue_pairs); > > + > > /* serialize netdev register + virtio_device_ready() with ndo_open() */ > > rtnl_lock(); > > > > @@ -4257,8 +4259,6 @@ static int virtnet_probe(struct virtio_device *vdev) > > goto free_unregister_netdev; > > } > > > > - virtnet_set_queues(vi, vi->curr_queue_pairs); > > - > > /* Assume link up if device can't report link status, > > otherwise get link status from config. */ > > netif_carrier_off(dev); > > -- > > 2.39.3 >
Hello: This patch was applied to netdev/net.git (main) by Jakub Kicinski <kuba@kernel.org>: On Tue, 25 Jul 2023 03:20:49 -0400 you wrote: > A race were found where set_channels could be called after registering > but before virtnet_set_queues() in virtnet_probe(). Fixing this by > moving the virtnet_set_queues() before netdevice registering. While at > it, use _virtnet_set_queues() to avoid holding rtnl as the device is > not even registered at that time. > > Fixes: a220871be66f ("virtio-net: correctly enable multiqueue") > Signed-off-by: Jason Wang <jasowang@redhat.com> > > [...] Here is the summary with links: - [net] virtio-net: fix race between set queues and probe https://git.kernel.org/netdev/net/c/25266128fe16 You are awesome, thank you!
On Tue, 2023-07-25 at 03:20 -0400, Jason Wang wrote: > A race were found where set_channels could be called after registering > but before virtnet_set_queues() in virtnet_probe(). Fixing this by > moving the virtnet_set_queues() before netdevice registering. While at > it, use _virtnet_set_queues() to avoid holding rtnl as the device is > not even registered at that time. > > Fixes: a220871be66f ("virtio-net: correctly enable multiqueue") > Signed-off-by: Jason Wang <jasowang@redhat.com> > --- > drivers/net/virtio_net.c | 4 ++-- > 1 file changed, 2 insertions(+), 2 deletions(-) > > diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c > index 0db14f6b87d3..1270c8d23463 100644 > --- a/drivers/net/virtio_net.c > +++ b/drivers/net/virtio_net.c > @@ -4219,6 +4219,8 @@ static int virtnet_probe(struct virtio_device *vdev) > if (vi->has_rss || vi->has_rss_hash_report) > virtnet_init_default_rss(vi); > > + _virtnet_set_queues(vi, vi->curr_queue_pairs); > + > /* serialize netdev register + virtio_device_ready() with ndo_open() > */ > rtnl_lock(); > > @@ -4257,8 +4259,6 @@ static int virtnet_probe(struct virtio_device *vdev) > goto free_unregister_netdev; > } > > - virtnet_set_queues(vi, vi->curr_queue_pairs); > - > /* Assume link up if device can't report link status, > otherwise get link status from config. */ > netif_carrier_off(dev); This change seems to break mlx5_vdpa when using virtio_vdpa. _virtnet_set_queues() hangs. Because DRIVER_OK has not yet been set, mlx5_vdpa cvq kick handler will ignore any commands. Output: [ 199.681445] mlx5_core 0000:08:00.0: E-Switch: Enable: mode(OFFLOADS), nvfs(1), necvfs(0), active vports(2) [ 199.690154] mlx5_core 0000:08:00.2: firmware version: 22.38.458 [ 199.862816] mlx5_core 0000:08:00.2: Rate limit: 127 rates are supported, range: 0Mbps to 97656Mbps [ 199.895653] mlx5_core 0000:08:00.2: Assigned random MAC address b2:a1:71:c0:28:a2 [ 200.036900] mlx5_core 0000:08:00.2: MLX5E: StrdRq(1) RqSz(8) StrdSz(2048) RxCqeCmprss(0 enhanced) [ 202.215022] mlx5_core 0000:08:00.2: mlx5_vdpa_reset:2813:(pid 1294): performing device reset [ 223.228309] rcu: INFO: rcu_sched self-detected stall on CPU [ 223.228870] rcu: 3-....: (5250 ticks this GP) idle=1cdc/1/0x4000000000000000 softirq=1981/1981 fqs=2064 [ 223.229686] rcu: (t=5251 jiffies g=7061 q=4365 ncpus=10) [ 223.230165] CPU: 3 PID: 1294 Comm: modprobe Not tainted 6.5.0-rc4+ #32 [ 223.230725] Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS rel- 1.13.0-0-gf21b5a4aeb02-prebuilt.qemu.org 04/01/2014 [ 223.231657] RIP: 0010:virtqueue_get_buf_ctx+0x7/0x290 [ 223.232141] Code: ... [ 223.233639] RSP: 0018:ffff888144fe7898 EFLAGS: 00000246 [ 223.234108] RAX: 0000000000000000 RBX: ffff888103327940 RCX: ffff888100eca000 [ 223.234711] RDX: 0000000000000000 RSI: ffff888144fe78ac RDI: ffff888108c3b300 [ 223.235315] RBP: ffff888144fe78d0 R08: 0000000000000001 R09: ffff888103327940 [ 223.235921] R10: 0000000000000003 R11: 0000000000000008 R12: 0000000000000002 [ 223.236536] R13: 0000000000000004 R14: ffff8881037c1400 R15: ffff88810337f740 [ 223.237144] FS: 00007f55572f8740(0000) GS:ffff88846f980000(0000) knlGS:0000000000000000 [ 223.237877] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [ 223.238401] CR2: 00007f5556d05c00 CR3: 000000012df3a002 CR4: 0000000000370ea0 [ 223.239016] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 [ 223.239630] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 [ 223.240265] Call Trace: [ 223.240570] <IRQ> [ 223.240823] ? rcu_dump_cpu_stacks+0xc4/0x100 [ 223.241250] ? rcu_sched_clock_irq+0x407/0xaf0 [ 223.241677] ? trigger_load_balance+0x62/0x380 [ 223.242105] ? update_process_times+0x5b/0x90 [ 223.242527] ? tick_sched_timer+0x8b/0xa0 [ 223.242926] ? tick_sched_do_timer+0x80/0x80 [ 223.243346] ? __hrtimer_run_queues+0x121/0x270 [ 223.243777] ? hrtimer_interrupt+0xf4/0x230 [ 223.244193] ? __sysvec_apic_timer_interrupt+0x52/0xf0 [ 223.244684] ? sysvec_apic_timer_interrupt+0x69/0x90 [ 223.245150] </IRQ> [ 223.245412] <TASK> [ 223.245668] ? asm_sysvec_apic_timer_interrupt+0x16/0x20 [ 223.246160] ? virtqueue_get_buf_ctx+0x7/0x290 [ 223.246588] virtnet_send_command+0x113/0x170 [ 223.247008] ? virtnet_set_affinity+0x166/0x1b0 [ 223.247441] _virtnet_set_queues+0x9f/0x100 [ 223.247850] virtnet_probe+0xa27/0x1270 [ 223.248242] ? vdpa_get_config+0x5b/0x70 [vdpa] [ 223.248699] ? virtio_vdpa_notify_with_data+0x30/0x30 [virtio_vdpa] [ 223.249251] virtio_dev_probe+0x196/0x300 [ 223.249646] really_probe+0xc3/0x3d0 [ 223.250012] ? driver_probe_device+0x90/0x90 [ 223.250425] __driver_probe_device+0x80/0x160 [ 223.250846] driver_probe_device+0x1f/0x90 [ 223.251247] __device_attach_driver+0x7d/0x100 [ 223.251674] bus_for_each_drv+0x7d/0xd0 [ 223.252055] __device_attach+0xb2/0x1c0 [ 223.252455] bus_probe_device+0x86/0xa0 [ 223.252839] device_add+0x65b/0x870 [ 223.253200] ? mlx5_vdpa_set_status+0x2a/0x200 [mlx5_vdpa] [ 223.253699] ? virtio_vdpa_notify_with_data+0x30/0x30 [virtio_vdpa] [ 223.254255] register_virtio_device+0xb5/0xf0 [ 223.254678] virtio_vdpa_probe+0xae/0xf0 [virtio_vdpa] [ 223.255156] really_probe+0xc3/0x3d0 [ 223.255518] ? __device_attach_driver+0x100/0x100 [ 223.255962] __driver_probe_device+0x80/0x160 [ 223.256409] driver_probe_device+0x1f/0x90 [ 223.256812] __driver_attach+0xe9/0x1b0 [ 223.257190] bus_for_each_dev+0x70/0xc0 [ 223.257574] bus_add_driver+0xf1/0x220 [ 223.257954] driver_register+0x55/0xf0 [ 223.258330] ? 0xffffffffa0647000 [ 223.258675] do_one_initcall+0x4c/0x1e0 [ 223.259060] ? kmalloc_trace+0x26/0x80 [ 223.259438] do_init_module+0x88/0x260 [ 223.259813] init_module_from_file+0x86/0xc0 [ 223.260245] __x64_sys_finit_module+0x16d/0x2e0 [ 223.260695] do_syscall_64+0x3d/0x90 [ 223.261063] entry_SYSCALL_64_after_hwframe+0x46/0xb0 [ 223.261532] RIP: 0033:0x7f5556d2bd2d [ 223.261894] Code: ... [ 223.263415] RSP: 002b:00007ffc1ffcd908 EFLAGS: 00000246 ORIG_RAX: 0000000000000139 [ 223.264113] RAX: ffffffffffffffda RBX: 00005642fa82caf0 RCX: 00007f5556d2bd2d [ 223.264736] RDX: 0000000000000000 RSI: 00005642f8ca3fc9 RDI: 0000000000000003 [ 223.265348] RBP: 00007ffc1ffcd9c0 R08: 0000000000000000 R09: 00007ffc1ffcd950 [ 223.265966] R10: 0000000000000003 R11: 0000000000000246 R12: 00005642f8ca3fc9 [ 223.266581] R13: 0000000000040000 R14: 00005642fa82cc20 R15: 0000000000000000 [ 223.267192] </TASK> Thanks, Dragos
On Tue, Aug 8, 2023 at 7:35 PM Dragos Tatulea <dtatulea@nvidia.com> wrote: > > On Tue, 2023-07-25 at 03:20 -0400, Jason Wang wrote: > > A race were found where set_channels could be called after registering > > but before virtnet_set_queues() in virtnet_probe(). Fixing this by > > moving the virtnet_set_queues() before netdevice registering. While at > > it, use _virtnet_set_queues() to avoid holding rtnl as the device is > > not even registered at that time. > > > > Fixes: a220871be66f ("virtio-net: correctly enable multiqueue") > > Signed-off-by: Jason Wang <jasowang@redhat.com> > > --- > > drivers/net/virtio_net.c | 4 ++-- > > 1 file changed, 2 insertions(+), 2 deletions(-) > > > > diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c > > index 0db14f6b87d3..1270c8d23463 100644 > > --- a/drivers/net/virtio_net.c > > +++ b/drivers/net/virtio_net.c > > @@ -4219,6 +4219,8 @@ static int virtnet_probe(struct virtio_device *vdev) > > if (vi->has_rss || vi->has_rss_hash_report) > > virtnet_init_default_rss(vi); > > > > + _virtnet_set_queues(vi, vi->curr_queue_pairs); > > + > > /* serialize netdev register + virtio_device_ready() with ndo_open() > > */ > > rtnl_lock(); > > > > @@ -4257,8 +4259,6 @@ static int virtnet_probe(struct virtio_device *vdev) > > goto free_unregister_netdev; > > } > > > > - virtnet_set_queues(vi, vi->curr_queue_pairs); > > - > > /* Assume link up if device can't report link status, > > otherwise get link status from config. */ > > netif_carrier_off(dev); > > This change seems to break mlx5_vdpa when using virtio_vdpa. > _virtnet_set_queues() hangs. Because DRIVER_OK has not yet been set, mlx5_vdpa > cvq kick handler will ignore any commands. > Yes, I will post a fix soon. Thanks
diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c index 0db14f6b87d3..1270c8d23463 100644 --- a/drivers/net/virtio_net.c +++ b/drivers/net/virtio_net.c @@ -4219,6 +4219,8 @@ static int virtnet_probe(struct virtio_device *vdev) if (vi->has_rss || vi->has_rss_hash_report) virtnet_init_default_rss(vi); + _virtnet_set_queues(vi, vi->curr_queue_pairs); + /* serialize netdev register + virtio_device_ready() with ndo_open() */ rtnl_lock(); @@ -4257,8 +4259,6 @@ static int virtnet_probe(struct virtio_device *vdev) goto free_unregister_netdev; } - virtnet_set_queues(vi, vi->curr_queue_pairs); - /* Assume link up if device can't report link status, otherwise get link status from config. */ netif_carrier_off(dev);
A race were found where set_channels could be called after registering but before virtnet_set_queues() in virtnet_probe(). Fixing this by moving the virtnet_set_queues() before netdevice registering. While at it, use _virtnet_set_queues() to avoid holding rtnl as the device is not even registered at that time. Fixes: a220871be66f ("virtio-net: correctly enable multiqueue") Signed-off-by: Jason Wang <jasowang@redhat.com> --- drivers/net/virtio_net.c | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-)