Message ID | 20200602081112.32426-1-slp@redhat.com (mailing list archive)
---|---
State | New, archived
Series | virtio-blk: Disable request queuing while switching contexts
Am 02.06.2020 um 10:11 hat Sergio Lopez geschrieben:
> Disable request queuing while switching contexts on
> virtio_blk_data_plane_[start|stop](), preventing requests from getting
> queued on the wrong context.
>
> Placing requests on the wrong context may lead to them being wrongly
> accessed in parallel from different threads, potentially leading to
> multiple issues.
>
> [...]
>
> diff --git a/hw/block/dataplane/virtio-blk.c b/hw/block/dataplane/virtio-blk.c
> index 1b52e8159c..f1c7ba69c0 100644
> --- a/hw/block/dataplane/virtio-blk.c
> +++ b/hw/block/dataplane/virtio-blk.c
> @@ -214,12 +214,17 @@ int virtio_blk_data_plane_start(VirtIODevice *vdev)
>      vblk->dataplane_started = true;
>      trace_virtio_blk_data_plane_start(s);
>
> +    /* Prevent requests from getting queued on the old context */
> +    blk_set_disable_request_queuing(s->conf->conf.blk, true);
> +
>      r = blk_set_aio_context(s->conf->conf.blk, s->ctx, &local_err);
>      if (r < 0) {
>          error_report_err(local_err);
>          goto fail_guest_notifiers;
>      }
>
> +    blk_set_disable_request_queuing(s->conf->conf.blk, false);
> +

Why are we even getting new requests that could possibly be queued? This
is in virtio_blk_data_plane_start/stop(), so clearly the device should
be idle and the virtio queues shouldn't be processed at all at the same
time?

blk_set_disable_request_queuing() feels more like a hack that hides the
bug instead of fixing it.

Kevin
On Tue, Jun 02, 2020 at 01:23:14PM +0200, Kevin Wolf wrote:
> Am 02.06.2020 um 10:11 hat Sergio Lopez geschrieben:
> > Disable request queuing while switching contexts on
> > virtio_blk_data_plane_[start|stop](), preventing requests from getting
> > queued on the wrong context.
> > [...]
>
> Why are we even getting new requests that could possibly be queued? This
> is in virtio_blk_data_plane_start/stop(), so clearly the device should
> be idle and the virtio queues shouldn't be processed at all at the same
> time?

Requests are processed and queued by virtio_blk_dma_restart_bh(),
which is a BH registered by virtio_blk_dma_restart_cb(), which in turn
is a callback for VM state change events registered with
qemu_add_vm_change_state_handler(virtio_blk_dma_restart_cb, s).

This is the back trace of a request being queued this way:

(gdb) bt
#0  0x000055cd35ce7490 in virtio_blk_submit_multireq (blk=0x55cd38cd24d0, mrb=0x7ffd25b365c0) at /root/src/qemu/hw/block/virtio-blk.c:448
#1  0x000055cd35ce7d5f in virtio_blk_handle_request (req=0x7f5a4c022c80, mrb=0x7ffd25b365c0) at /root/src/qemu/hw/block/virtio-blk.c:686
#2  0x000055cd35ce8602 in virtio_blk_dma_restart_bh (opaque=0x55cd38ccf4c0) at /root/src/qemu/hw/block/virtio-blk.c:836
#3  0x000055cd360dc4f5 in aio_bh_call (bh=0x55cd3880b300) at /root/src/qemu/util/async.c:164
#4  0x000055cd360dc4f5 in aio_bh_poll (ctx=ctx@entry=0x55cd3780bff0) at /root/src/qemu/util/async.c:164
#5  0x000055cd360dfd29 in aio_poll (ctx=ctx@entry=0x55cd3780bff0, blocking=blocking@entry=true) at /root/src/qemu/util/aio-posix.c:650
#6  0x000055cd3603a645 in bdrv_do_drained_begin (poll=true, ignore_bds_parents=false, parent=0x0, recursive=false, bs=0x55cd3788d550) at /root/src/qemu/block/io.c:429
#7  0x000055cd3603a645 in bdrv_do_drained_begin (bs=0x55cd3788d550, recursive=<optimized out>, parent=0x0, ignore_bds_parents=<optimized out>, poll=<optimized out>) at /root/src/qemu/block/io.c:395
#8  0x000055cd35fde969 in bdrv_set_aio_context_ignore (bs=bs@entry=0x55cd3788d550, new_context=new_context@entry=0x55cd37810e40, ignore=ignore@entry=0x7ffd25b36890) at /root/src/qemu/block.c:6309
#9  0x000055cd35fdef33 in bdrv_child_try_set_aio_context (bs=bs@entry=0x55cd3788d550, ctx=0x55cd37810e40, ignore_child=<optimized out>, errp=<optimized out>) at /root/src/qemu/block.c:6431
#10 0x000055cd3602919b in blk_do_set_aio_context (blk=0x55cd38cd24d0, new_context=0x55cd37810e40, update_root_node=<optimized out>, errp=<optimized out>) at /root/src/qemu/block/block-backend.c:2016
#11 0x000055cd35cef54f in virtio_blk_data_plane_start (vdev=<optimized out>) at /root/src/qemu/hw/block/dataplane/virtio-blk.c:220
#12 0x000055cd35f48cd2 in virtio_bus_start_ioeventfd (bus=0x55cd38ccf438) at /root/src/qemu/hw/virtio/virtio-bus.c:222
#13 0x000055cd35d6ea1f in vm_state_notify (running=running@entry=1, state=state@entry=RUN_STATE_RUNNING) at /root/src/qemu/softmmu/vl.c:1271
#14 0x000055cd35ca931d in vm_prepare_start () at /root/src/qemu/cpus.c:2156
#15 0x000055cd35ca9369 in vm_start () at /root/src/qemu/cpus.c:2162
#16 0x000055cd35f7a265 in qmp_cont (errp=errp@entry=0x7ffd25b36a60) at /root/src/qemu/monitor/qmp-cmds.c:160
#17 0x000055cd35f7d704 in hmp_cont (mon=<optimized out>, qdict=<optimized out>) at /root/src/qemu/monitor/hmp-cmds.c:1050
#18 0x000055cd35f79c54 in handle_hmp_command (mon=mon@entry=0x55cd37813160, cmdline=<optimized out>, cmdline@entry=0x55cd3790cfe0 "c") at /root/src/qemu/monitor/hmp.c:1082
#19 0x000055cd35f79d7c in monitor_command_cb (opaque=0x55cd37813160, cmdline=0x55cd3790cfe0 "c", readline_opaque=<optimized out>) at /root/src/qemu/monitor/hmp.c:47
#20 0x000055cd360f46b5 in readline_handle_byte (rs=0x55cd3790cfe0, ch=<optimized out>) at /root/src/qemu/util/readline.c:408
#21 0x000055cd35f79de3 in monitor_read (opaque=0x55cd37813160, buf=<optimized out>, size=<optimized out>) at /root/src/qemu/monitor/hmp.c:1312
#22 0x000055cd3607050d in fd_chr_read (chan=0x55cd378143a0, cond=<optimized out>, opaque=<optimized out>) at /root/src/qemu/chardev/char-fd.c:68
#23 0x00007f5a70d1667d in g_main_context_dispatch () at /lib64/libglib-2.0.so.0
#24 0x000055cd360dee58 in glib_pollfds_poll () at /root/src/qemu/util/main-loop.c:219
#25 0x000055cd360dee58 in os_host_main_loop_wait (timeout=<optimized out>) at /root/src/qemu/util/main-loop.c:242
#26 0x000055cd360dee58 in main_loop_wait (nonblocking=nonblocking@entry=0) at /root/src/qemu/util/main-loop.c:518
#27 0x000055cd35d6f1b9 in qemu_main_loop () at /root/src/qemu/softmmu/vl.c:1664
#28 0x000055cd35c6205e in main (argc=<optimized out>, argv=<optimized out>, envp=<optimized out>) at /root/src/qemu/softmmu/main.c:49

I'm not familiar with the code, but I guess we need this BH to ensure
we process the virtqueue after resuming.

We could perhaps schedule the BH in a different way, or add a flag to
VirtIOBlock so virtio_blk_data_plane_[start|stop] would instruct
virtio_blk_dma_restart_bh() to avoid processing the virtqueue,
rescheduling itself instead. How would this look to you?

Thanks,
Sergio.

> blk_set_disable_request_queuing() feels more like a hack that hides the
> bug instead of fixing it.
>
> Kevin
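As background for the restart path described in the message above: the chain of
virtio_blk_dma_restart_cb() scheduling a bottom half follows QEMU's usual
VM-change-state-handler plus BH idiom. The following is a simplified sketch
reconstructed from that description, not a verbatim copy of
hw/block/virtio-blk.c; the real function bodies differ in detail.

#include "qemu/osdep.h"
#include "hw/virtio/virtio-blk.h"
#include "sysemu/block-backend.h"
#include "sysemu/runstate.h"

/* Sketch of the bottom half: in the real code this walks the requests
 * parked in s->rq and calls virtio_blk_handle_request() for each of
 * them, which is what frames #0-#2 of the backtrace above show. */
static void virtio_blk_dma_restart_bh(void *opaque)
{
    VirtIOBlock *s = opaque;

    /* ... resubmit the requests parked in s->rq ... */
    (void)s;
}

/* Sketch of the VM change state handler: when the VM is resumed, it
 * schedules the BH in the BlockBackend's *current* AioContext, which at
 * this point is still the main loop because the dataplane has not been
 * started yet (the real code also keeps and later deletes the BH). */
static void virtio_blk_dma_restart_cb(void *opaque, int running,
                                      RunState state)
{
    VirtIOBlock *s = opaque;

    if (!running) {
        return;
    }

    qemu_bh_schedule(aio_bh_new(blk_get_aio_context(s->conf.conf.blk),
                                virtio_blk_dma_restart_bh, s));
}

/* Registered once at device realize time, as mentioned above:
 *     qemu_add_vm_change_state_handler(virtio_blk_dma_restart_cb, s);
 */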
Am 02.06.2020 um 14:18 hat Sergio Lopez geschrieben:
> On Tue, Jun 02, 2020 at 01:23:14PM +0200, Kevin Wolf wrote:
> > [...]
> > Why are we even getting new requests that could possibly be queued? This
> > is in virtio_blk_data_plane_start/stop(), so clearly the device should
> > be idle and the virtio queues shouldn't be processed at all at the same
> > time?
>
> Requests are processed and queued by virtio_blk_dma_restart_bh(),
> which is a BH registered by virtio_blk_dma_restart_cb(), which in turn
> is a callback for VM state change events registered with
> qemu_add_vm_change_state_handler(virtio_blk_dma_restart_cb, s).
>
> This is the back trace of a request being queued this way:
>
> (gdb) bt
> #0  0x000055cd35ce7490 in virtio_blk_submit_multireq (blk=0x55cd38cd24d0, mrb=0x7ffd25b365c0) at /root/src/qemu/hw/block/virtio-blk.c:448
> #1  0x000055cd35ce7d5f in virtio_blk_handle_request (req=0x7f5a4c022c80, mrb=0x7ffd25b365c0) at /root/src/qemu/hw/block/virtio-blk.c:686
> #2  0x000055cd35ce8602 in virtio_blk_dma_restart_bh (opaque=0x55cd38ccf4c0) at /root/src/qemu/hw/block/virtio-blk.c:836
> #3  0x000055cd360dc4f5 in aio_bh_call (bh=0x55cd3880b300) at /root/src/qemu/util/async.c:164
> #4  0x000055cd360dc4f5 in aio_bh_poll (ctx=ctx@entry=0x55cd3780bff0) at /root/src/qemu/util/async.c:164
> #5  0x000055cd360dfd29 in aio_poll (ctx=ctx@entry=0x55cd3780bff0, blocking=blocking@entry=true) at /root/src/qemu/util/aio-posix.c:650
> #6  0x000055cd3603a645 in bdrv_do_drained_begin (poll=true, ignore_bds_parents=false, parent=0x0, recursive=false, bs=0x55cd3788d550) at /root/src/qemu/block/io.c:429
> #7  0x000055cd3603a645 in bdrv_do_drained_begin (bs=0x55cd3788d550, recursive=<optimized out>, parent=0x0, ignore_bds_parents=<optimized out>, poll=<optimized out>) at /root/src/qemu/block/io.c:395
> #8  0x000055cd35fde969 in bdrv_set_aio_context_ignore (bs=bs@entry=0x55cd3788d550, new_context=new_context@entry=0x55cd37810e40, ignore=ignore@entry=0x7ffd25b36890) at /root/src/qemu/block.c:6309
> #9  0x000055cd35fdef33 in bdrv_child_try_set_aio_context (bs=bs@entry=0x55cd3788d550, ctx=0x55cd37810e40, ignore_child=<optimized out>, errp=<optimized out>) at /root/src/qemu/block.c:6431
> #10 0x000055cd3602919b in blk_do_set_aio_context (blk=0x55cd38cd24d0, new_context=0x55cd37810e40, update_root_node=<optimized out>, errp=<optimized out>) at /root/src/qemu/block/block-backend.c:2016
> #11 0x000055cd35cef54f in virtio_blk_data_plane_start (vdev=<optimized out>) at /root/src/qemu/hw/block/dataplane/virtio-blk.c:220

Ugh. Thanks, I see now what's going on.

We were relying on blk_set_aio_context() to complete the requests, and
since BlockBackend queues the requests instead of completing them during
drained_begin, we end up executing the requests after drained_end in the
main loop context, but the block nodes have meanwhile moved to the
iothread context.

Of course, this was already a bad hack before, witnessed by the FIXME
comment in virtio_blk_dma_restart_cb().

> I'm not familiar with the code, but I guess we need this BH to ensure
> we process the virtqueue after resuming.

The BH was introduced in commit 213189ab65d. The commit message has a
very detailed problem description, but to summarise, if I understand
correctly, the main purpose is that we don't end up stopping the VM due
to an I/O error while we are only in the middle of notifying the VM
state handlers that the VM is about to be continued.

> We could perhaps schedule the BH in a different way, or add a flag to
> VirtIOBlock so virtio_blk_data_plane_[start|stop] would instruct
> virtio_blk_dma_restart_bh() to avoid processing the virtqueue,
> rescheduling itself instead. How would this look to you?

Yes, I think if dataplane is enabled, we should really run the restarted
requests in the iothread. virtio_blk_dma_restart_cb() is called too
early, before the dataplane is running, so I guess we need some way to
tell the dataplane that it should run the request restarting logic of
virtio_blk_dma_restart_bh() after it has started (no additional BH
needed there). Your flag sounds like the obvious way to get this.

In fact, would it hurt to do this just unconditionally without a flag?
s->rq should be NULL when there is nothing to restart.

And then we can skip the same logic in virtio_blk_dma_restart_cb() if we
know that a dataplane is going to be started.

Kevin
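To make the direction discussed above a bit more concrete, this is roughly how
it could look. It is only an illustration of the idea, not an actual patch:
virtio_blk_process_queued_requests() is a hypothetical helper name, and the
fragments are abbreviated.

/* hw/block/virtio-blk.c -- hypothetical helper that walks s->rq and
 * resubmits each parked request via virtio_blk_handle_request(); since
 * s->rq is NULL when there is nothing to restart, calling it
 * unconditionally would be safe. */
void virtio_blk_process_queued_requests(VirtIOBlock *s);

/* hw/block/virtio-blk.c, in virtio_blk_dma_restart_cb(): skip the BH when
 * a dataplane is about to be started, so the parked requests are not
 * handled in the main loop context. */
    if (s->dataplane) {
        return;
    }

/* hw/block/dataplane/virtio-blk.c, in virtio_blk_data_plane_start(): once
 * the BlockBackend has been moved to the iothread's AioContext, restart
 * the parked requests there instead. */
    r = blk_set_aio_context(s->conf->conf.blk, s->ctx, &local_err);
    if (r < 0) {
        error_report_err(local_err);
        goto fail_guest_notifiers;
    }

    aio_context_acquire(s->ctx);
    virtio_blk_process_queued_requests(vblk);
    aio_context_release(s->ctx);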
On Tue, Jun 02, 2020 at 03:04:33PM +0200, Kevin Wolf wrote:
> Am 02.06.2020 um 14:18 hat Sergio Lopez geschrieben:
> > [...]
> > I'm not familiar with the code, but I guess we need this BH to ensure
> > we process the virtqueue after resuming.
>
> The BH was introduced in commit 213189ab65d. The commit message has a
> very detailed problem description, but to summarise, if I understand
> correctly, the main purpose is that we don't end up stopping the VM due
> to an I/O error while we are only in the middle of notifying the VM
> state handlers that the VM is about to be continued.

Thanks for the pointers.

> > We could perhaps schedule the BH in a different way, or add a flag to
> > VirtIOBlock so virtio_blk_data_plane_[start|stop] would instruct
> > virtio_blk_dma_restart_bh() to avoid processing the virtqueue,
> > rescheduling itself instead. How would this look to you?
>
> Yes, I think if dataplane is enabled, we should really run the restarted
> requests in the iothread. virtio_blk_dma_restart_cb() is called too
> early, before the dataplane is running, so I guess we need some way to
> tell the dataplane that it should run the request restarting logic of
> virtio_blk_dma_restart_bh() after it has started (no additional BH
> needed there). Your flag sounds like the obvious way to get this.
>
> In fact, would it hurt to do this just unconditionally without a flag?
> s->rq should be NULL when there is nothing to restart.
>
> And then we can skip the same logic in virtio_blk_dma_restart_cb() if we
> know that a dataplane is going to be started.

OK, I'll prepare a patch implementing this strategy to see what it would
look like.

Thanks,
Sergio.
On Tue, Jun 02, 2020 at 03:04:33PM +0200, Kevin Wolf wrote:
> Am 02.06.2020 um 14:18 hat Sergio Lopez geschrieben:
> > [...]
> > We could perhaps schedule the BH in a different way, or add a flag to
> > VirtIOBlock so virtio_blk_data_plane_[start|stop] would instruct
> > virtio_blk_dma_restart_bh() to avoid processing the virtqueue,
> > rescheduling itself instead. How would this look to you?
>
> Yes, I think if dataplane is enabled, we should really run the restarted
> requests in the iothread. virtio_blk_dma_restart_cb() is called too
> early, before the dataplane is running, so I guess we need some way to
> tell the dataplane that it should run the request restarting logic of
> virtio_blk_dma_restart_bh() after it has started (no additional BH
> needed there). Your flag sounds like the obvious way to get this.
>
> In fact, would it hurt to do this just unconditionally without a flag?
> s->rq should be NULL when there is nothing to restart.
>
> And then we can skip the same logic in virtio_blk_dma_restart_cb() if we
> know that a dataplane is going to be started.

This sounds good.

Stefan
diff --git a/hw/block/dataplane/virtio-blk.c b/hw/block/dataplane/virtio-blk.c
index 1b52e8159c..f1c7ba69c0 100644
--- a/hw/block/dataplane/virtio-blk.c
+++ b/hw/block/dataplane/virtio-blk.c
@@ -214,12 +214,17 @@ int virtio_blk_data_plane_start(VirtIODevice *vdev)
     vblk->dataplane_started = true;
     trace_virtio_blk_data_plane_start(s);
 
+    /* Prevent requests from getting queued on the old context */
+    blk_set_disable_request_queuing(s->conf->conf.blk, true);
+
     r = blk_set_aio_context(s->conf->conf.blk, s->ctx, &local_err);
     if (r < 0) {
         error_report_err(local_err);
         goto fail_guest_notifiers;
     }
 
+    blk_set_disable_request_queuing(s->conf->conf.blk, false);
+
     /* Kick right away to begin processing requests already in vring */
     for (i = 0; i < nvqs; i++) {
         VirtQueue *vq = virtio_get_queue(s->vdev, i);
@@ -287,10 +292,15 @@ void virtio_blk_data_plane_stop(VirtIODevice *vdev)
     aio_context_acquire(s->ctx);
     aio_wait_bh_oneshot(s->ctx, virtio_blk_data_plane_stop_bh, s);
 
+    /* Prevent requests from getting queued on the old context */
+    blk_set_disable_request_queuing(s->conf->conf.blk, true);
+
     /* Drain and try to switch bs back to the QEMU main loop. If other users
      * keep the BlockBackend in the iothread, that's ok */
     blk_set_aio_context(s->conf->conf.blk, qemu_get_aio_context(), NULL);
 
+    blk_set_disable_request_queuing(s->conf->conf.blk, false);
+
     aio_context_release(s->ctx);
 
     for (i = 0; i < nvqs; i++) {
Disable request queuing while switching contexts on
virtio_blk_data_plane_[start|stop](), preventing requests from getting
queued on the wrong context.

Placing requests on the wrong context may lead to them being wrongly
accessed in parallel from different threads, potentially leading to
multiple issues.

For example, stopping and resuming a VM multiple times while the guest
is generating I/O on a virtio_blk device can trigger a crash with a
stack trace looking like this one:

<------>
Thread 2 (Thread 0x7ff736765700 (LWP 1062503)):
#0  0x00005567a13b99d6 in iov_memset (iov=0x6563617073206f4e, iov_cnt=1717922848, offset=516096, fillc=0, bytes=7018105756081554803) at util/iov.c:69
#1  0x00005567a13bab73 in qemu_iovec_memset (qiov=0x7ff73ec99748, offset=516096, fillc=0, bytes=7018105756081554803) at util/iov.c:530
#2  0x00005567a12f411c in qemu_laio_process_completion (laiocb=0x7ff6512ee6c0) at block/linux-aio.c:86
#3  0x00005567a12f42ff in qemu_laio_process_completions (s=0x7ff7182e8420) at block/linux-aio.c:217
#4  0x00005567a12f480d in ioq_submit (s=0x7ff7182e8420) at block/linux-aio.c:323
#5  0x00005567a12f43d9 in qemu_laio_process_completions_and_submit (s=0x7ff7182e8420) at block/linux-aio.c:236
#6  0x00005567a12f44c2 in qemu_laio_poll_cb (opaque=0x7ff7182e8430) at block/linux-aio.c:267
#7  0x00005567a13aed83 in run_poll_handlers_once (ctx=0x5567a2b58c70, timeout=0x7ff7367645f8) at util/aio-posix.c:520
#8  0x00005567a13aee9f in run_poll_handlers (ctx=0x5567a2b58c70, max_ns=16000, timeout=0x7ff7367645f8) at util/aio-posix.c:562
#9  0x00005567a13aefde in try_poll_mode (ctx=0x5567a2b58c70, timeout=0x7ff7367645f8) at util/aio-posix.c:597
#10 0x00005567a13af115 in aio_poll (ctx=0x5567a2b58c70, blocking=true) at util/aio-posix.c:639
#11 0x00005567a109acca in iothread_run (opaque=0x5567a2b29760) at iothread.c:75
#12 0x00005567a13b2790 in qemu_thread_start (args=0x5567a2b694c0) at util/qemu-thread-posix.c:519
#13 0x00007ff73eedf2de in start_thread () at /lib64/libpthread.so.0
#14 0x00007ff73ec10e83 in clone () at /lib64/libc.so.6

Thread 1 (Thread 0x7ff743986f00 (LWP 1062500)):
#0  0x00005567a13b99d6 in iov_memset (iov=0x6563617073206f4e, iov_cnt=1717922848, offset=516096, fillc=0, bytes=7018105756081554803) at util/iov.c:69
#1  0x00005567a13bab73 in qemu_iovec_memset (qiov=0x7ff73ec99748, offset=516096, fillc=0, bytes=7018105756081554803) at util/iov.c:530
#2  0x00005567a12f411c in qemu_laio_process_completion (laiocb=0x7ff6512ee6c0) at block/linux-aio.c:86
#3  0x00005567a12f42ff in qemu_laio_process_completions (s=0x7ff7182e8420) at block/linux-aio.c:217
#4  0x00005567a12f480d in ioq_submit (s=0x7ff7182e8420) at block/linux-aio.c:323
#5  0x00005567a12f4a2f in laio_do_submit (fd=19, laiocb=0x7ff5f4ff9ae0, offset=472363008, type=2) at block/linux-aio.c:375
#6  0x00005567a12f4af2 in laio_co_submit (bs=0x5567a2b8c460, s=0x7ff7182e8420, fd=19, offset=472363008, qiov=0x7ff5f4ff9ca0, type=2) at block/linux-aio.c:394
#7  0x00005567a12f1803 in raw_co_prw (bs=0x5567a2b8c460, offset=472363008, bytes=20480, qiov=0x7ff5f4ff9ca0, type=2) at block/file-posix.c:1892
#8  0x00005567a12f1941 in raw_co_pwritev (bs=0x5567a2b8c460, offset=472363008, bytes=20480, qiov=0x7ff5f4ff9ca0, flags=0) at block/file-posix.c:1925
#9  0x00005567a12fe3e1 in bdrv_driver_pwritev (bs=0x5567a2b8c460, offset=472363008, bytes=20480, qiov=0x7ff5f4ff9ca0, qiov_offset=0, flags=0) at block/io.c:1183
#10 0x00005567a1300340 in bdrv_aligned_pwritev (child=0x5567a2b5b070, req=0x7ff5f4ff9db0, offset=472363008, bytes=20480, align=512, qiov=0x7ff72c0425b8, qiov_offset=0, flags=0) at block/io.c:1980
#11 0x00005567a1300b29 in bdrv_co_pwritev_part (child=0x5567a2b5b070, offset=472363008, bytes=20480, qiov=0x7ff72c0425b8, qiov_offset=0, flags=0) at block/io.c:2137
#12 0x00005567a12baba1 in qcow2_co_pwritev_task (bs=0x5567a2b92740, file_cluster_offset=472317952, offset=487305216, bytes=20480, qiov=0x7ff72c0425b8, qiov_offset=0, l2meta=0x0) at block/qcow2.c:2444
#13 0x00005567a12bacdb in qcow2_co_pwritev_task_entry (task=0x5567a2b48540) at block/qcow2.c:2475
#14 0x00005567a13167d8 in aio_task_co (opaque=0x5567a2b48540) at block/aio_task.c:45
#15 0x00005567a13cf00c in coroutine_trampoline (i0=738245600, i1=32759) at util/coroutine-ucontext.c:115
#16 0x00007ff73eb622e0 in __start_context () at /lib64/libc.so.6
#17 0x00007ff6626f1350 in ()
#18 0x0000000000000000 in ()
<------>

This is also known to cause crashes with this message (assertion
failed):

aio_co_schedule: Co-routine was already scheduled in 'aio_co_schedule'

RHBZ: https://bugzilla.redhat.com/show_bug.cgi?id=1812765
Signed-off-by: Sergio Lopez <slp@redhat.com>
---
 hw/block/dataplane/virtio-blk.c | 10 ++++++++++
 1 file changed, 10 insertions(+)
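As background on what the patch toggles: since BlockBackend learned to queue
requests while drained, a request arriving during a drained section is normally
parked on an internal queue and only resumed when the drained section ends;
blk_set_disable_request_queuing() switches that parking off so requests proceed
immediately instead. The sketch below is a rough rendering of that logic as
understood from this thread, not a copy of block/block-backend.c; the real code
also adjusts the in-flight counter around the wait.

/* Rough sketch (assumption, not the exact QEMU code) of the behaviour
 * that blk_set_disable_request_queuing() turns off: */
static void coroutine_fn blk_wait_while_drained(BlockBackend *blk)
{
    if (blk->quiesce_counter && !blk->disable_request_queuing) {
        /* Park the incoming request; it is only woken up again when the
         * drained section ends -- in the case discussed in this thread,
         * after blk_set_aio_context() has already moved the nodes to a
         * different AioContext. */
        qemu_co_queue_wait(&blk->queued_requests, NULL);
    }
}

void blk_set_disable_request_queuing(BlockBackend *blk, bool disable)
{
    blk->disable_request_queuing = disable;
}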