Message ID: 20231213211544.1601971-1-stefanha@redhat.com
Series: aio-posix: call ->poll_end() when removing AioHandler
Based-on: 20231205182011.1976568-1-stefanha@redhat.com (https://lore.kernel.org/qemu-devel/20231205182011.1976568-1-stefanha@redhat.com/)
On Wed, Dec 13, 2023 at 10:15 PM Stefan Hajnoczi <stefanha@redhat.com> wrote:
> Alternatives welcome! (A cleaner version of this approach might be to forbid
> cross-thread aio_set_fd_handler() calls and to refactor all
> aio_set_fd_handler() callers so they come from the AioContext's home thread.
> I'm starting to think that only the aio_notify() and aio_schedule_bh() APIs
> should be thread-safe.)

I think that's pretty hard because aio_set_fd_handler() is a pretty
important part of the handoff from one AioContext to another and also of
drained_begin()/end(), and both of these things run in the main thread.

Regarding how to solve this issue, there is a lot of "underdocumenting" of
the locking policy in aio-posix.c, and indeed it makes running
aio_set_fd_handler() in the target AioContext tempting; but it is also
scary to rely on the iothread being able to react quickly. I'm also worried
that we're changing the logic just because we don't understand the old one,
but then we add technical debt.

So, as a first step, I would take inspiration from the block layer locking
work, and add assertions to functions like poll_set_started() or
find_aio_handler(). Is the list_lock elevated (nonzero)? Or locked? Are we
in the iothread? And likewise, for each list, does insertion happen from
the iothread or with the list_lock taken (and possibly elevated)? Does
removal happen from the iothread or with list_lock zero+taken?

After this step, we should have a clearer idea of the possible states of
the node (based on the lists, the state is a subset of {poll_started,
deleted, ready}) and draw a nice graph of the transitions. We should also
understand if any calls to QLIST_IS_INSERTED() have correctness issues.

Good news, I don't think any memory barriers are needed here. One thing
that we already do correctly is that, once a node is deleted, we try to
skip work; see for example poll_set_started(). This also provides a good
place to do cleanup work for deleted nodes, including calling poll_end():
aio_free_deleted_handlers(), because it runs with list_lock zero and taken,
just like the tail of aio_remove_fd_handler(). It's the safest possible
place to do cleanup and to take a lock. Therefore we have:

- a fast path in the iothread that runs without any concurrence with stuff
  happening in the main thread

- a slow path in the iothread that runs with list_lock zero and taken. The
  slow path shares logic with the main thread, meaning that
  aio_free_deleted_handlers() and aio_remove_fd_handler() should share some
  functions called by both.

If the code is organized this way, any wrong bits should jump out more
easily. For example, these two lines in aio_remove_fd_handler() are clearly
misplaced

    node->pfd.revents = 0;
    node->poll_ready = false;

because they run in the main thread but they touch iothread data! They
should be after qemu_lockcnt_count() is checked to be zero.

Regarding the call to io_poll_ready(), I would hope that it is unnecessary;
in other words, that after drained_end() the virtqueue notification would
be raised. Yes, virtio_queue_set_notification is edge triggered rather than
level triggered, so it would be necessary to add a check with
virtio_queue_host_notifier_aio_poll() and
virtio_queue_host_notifier_aio_poll_ready() in
virtio_queue_aio_attach_host_notifier, but that does not seem too bad
because virtio is the only user of the io_poll_begin and io_poll_end
callbacks. It would have to be documented though.
Paolo

Paolo

>
> Stefan Hajnoczi (3):
>   aio-posix: run aio_set_fd_handler() in target AioContext
>   aio: use counter instead of ctx->list_lock
>   aio-posix: call ->poll_end() when removing AioHandler
>
>  include/block/aio.h |  22 ++---
>  util/aio-posix.c    | 197 ++++++++++++++++++++++++++++++++------------
>  util/async.c        |   2 -
>  util/fdmon-epoll.c  |   6 +-
>  4 files changed, 152 insertions(+), 75 deletions(-)
>
> --
> 2.43.0
>
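As a rough illustration of the assertion step proposed above, here is a
minimal sketch for util/aio-posix.c. The helper name is hypothetical;
in_aio_context_home_thread() and qemu_lockcnt_count() are the existing
helpers, and the exact rule each call site needs (lock elevated, lock taken
with count zero, or home thread only) would still have to be worked out per
function:

    #include "qemu/osdep.h"
    #include "block/aio.h"

    /* Hypothetical helper, not in QEMU today: assert that the caller either
     * runs in the AioContext's home thread or holds list_lock elevated
     * (nonzero count), so that walking the handler lists is safe. */
    static void aio_assert_list_access(AioContext *ctx)
    {
        assert(in_aio_context_home_thread(ctx) ||
               qemu_lockcnt_count(&ctx->list_lock) > 0);
    }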
Am 13.12.23 um 22:15 schrieb Stefan Hajnoczi:
> But there you have it. Please let me know what you think and try your
> reproducers to see if this fixes the missing io_poll_end() issue. Thanks!
>

Thanks to you! I applied the RFC (and the series it depends on) on top of
8.2.0-rc4 and this fixes my reproducer which drains VirtIO SCSI or VirtIO
block devices in a loop. Also didn't encounter any other issues while
playing around a bit with backup and mirror jobs.

The changes look fine to me, but this issue is also the first time I came
in close contact with this code, so that unfortunately does not say much.

Best Regards,
Fiona
On Thu, Dec 14, 2023 at 12:10:32AM +0100, Paolo Bonzini wrote: > On Wed, Dec 13, 2023 at 10:15 PM Stefan Hajnoczi <stefanha@redhat.com> wrote: > > Alternatives welcome! (A cleaner version of this approach might be to forbid > > cross-thread aio_set_fd_handler() calls and to refactor all > > aio_set_fd_handler() callers so they come from the AioContext's home thread. > > I'm starting to think that only the aio_notify() and aio_schedule_bh() APIs > > should be thread-safe.) > > I think that's pretty hard because aio_set_fd_handler() is a pretty > important part of the handoff from one AioContext to another and also > of drained_begin()/end(), and both of these things run in the main > thread. > > Regarding how to solve this issue, there is a lot of > "underdocumenting" of the locking policy in aio-posix.c, and indeed it > makes running aio_set_fd_handler() in the target AioContext tempting; > but it is also scary to rely on the iothread being able to react > quickly. I'm also worried that we're changing the logic just because > we don't understand the old one, but then we add technical debt. > > So, as a first step, I would take inspiration from the block layer > locking work, and add assertions to functions like poll_set_started() > or find_aio_handler(). Is the list_lock elevated (nonzero)? Or locked? > Are we in the iothread? And likewise, for each list, does insertion > happen from the iothread or with the list_lock taken (and possibly > elevated)? Does removal happen from the iothread or with list_lock > zero+taken? > > After this step, we should have a clearer idea of the possible states > of the node (based on the lists, the state is a subset of > {poll_started, deleted, ready}) and draw a nice graph of the > transitions. We should also understand if any calls to > QLIST_IS_INSERTED() have correctness issues. > > Good news, I don't think any memory barriers are needed here. One > thing that we already do correctly is that, once a node is deleted, we > try to skip work; see for example poll_set_started(). This also > provides a good place to do cleanup work for deleted nodes, including > calling poll_end(): aio_free_deleted_handlers(), because it runs with > list_lock zero and taken, just like the tail of > aio_remove_fd_handler(). It's the safest possible place to do cleanup > and to take a lock. Therefore we have: > > - a fast path in the iothread that runs without any concurrence with > stuff happening in the main thread > > - a slow path in the iothread that runs with list_lock zero and taken. > The slow path shares logic with the main thread, meaning that > aio_free_deleted_handlers() and aio_remove_fd_handler() should share > some functions called by both. > > If the code is organized this way, any wrong bits should jump out more > easily. For example, these two lines in aio_remove_fd_handler() are > clearly misplaced > > node->pfd.revents = 0; > node->poll_ready = false; > > because they run in the main thread but they touch iothread data! They > should be after qemu_lockcnt_count() is checked to be zero. > > Regarding the call to io_poll_ready(), I would hope that it is > unnecessary; in other words, that after drained_end() the virtqueue > notification would be raised. 
> Yes, virtio_queue_set_notification is
> edge triggered rather than level triggered, so it would be necessary
> to add a check with virtio_queue_host_notifier_aio_poll() and
> virtio_queue_host_notifier_aio_poll_ready() in
> virtio_queue_aio_attach_host_notifier, but that does not seem too bad
> because virtio is the only user of the io_poll_begin and io_poll_end
> callbacks. It would have to be documented though.

I think Hanna had the same idea: document that ->io_poll_end() isn't called
by aio_set_fd_handler() and shift the responsibility onto the caller to get
back into a state where notifications are enabled before they add the fd
with aio_set_fd_handler() again.

In a little more detail, the caller needs to do the following before adding
the fd back with aio_set_fd_handler() again:

1. Call ->io_poll_end().

2. Poll one more time in case an event slipped in, and write to the eventfd
   so the fd is immediately readable, or call ->io_poll_ready().

I think this is more or less what you described above. I don't like pushing
this responsibility onto the caller, but adding a synchronization point in
aio_set_fd_handler() is problematic, so let's give it a try.

I'll try that approach and send a v2.

Stefan

>
> Paolo
>
> > Paolo
> >
> > Stefan Hajnoczi (3):
> >   aio-posix: run aio_set_fd_handler() in target AioContext
> >   aio: use counter instead of ctx->list_lock
> >   aio-posix: call ->poll_end() when removing AioHandler
> >
> >  include/block/aio.h |  22 ++---
> >  util/aio-posix.c    | 197 ++++++++++++++++++++++++++++++++------------
> >  util/async.c        |   2 -
> >  util/fdmon-epoll.c  |   6 +-
> >  4 files changed, 152 insertions(+), 75 deletions(-)
> >
> > --
> > 2.43.0
> >
>
On Thu, Dec 14, 2023 at 02:38:47PM +0100, Fiona Ebner wrote:
> Am 13.12.23 um 22:15 schrieb Stefan Hajnoczi:
> > But there you have it. Please let me know what you think and try your
> > reproducers to see if this fixes the missing io_poll_end() issue. Thanks!
> >
>
> Thanks to you! I applied the RFC (and the series it depends on) on top
> of 8.2.0-rc4 and this fixes my reproducer which drains VirtIO SCSI or
> VirtIO block devices in a loop. Also didn't encounter any other issues
> while playing around a bit with backup and mirror jobs.
>
> The changes look fine to me, but this issue is also the first time I
> came in close contact with this code, so that unfortunately does not say
> much.

Great. I will still try the other approach that Hanna and Paolo have
suggested. It seems more palatable. I will send a v2.

Stefan
Am 14.12.23 um 20:53 schrieb Stefan Hajnoczi:
>
> I will still try the other approach that Hanna and Paolo have suggested.
> It seems more palatable. I will send a v2.
>

FYI, what I already tried downstream (for VirtIO SCSI):

> diff --git a/hw/scsi/virtio-scsi.c b/hw/scsi/virtio-scsi.c
> index 9c751bf296..a6449b04d0 100644
> --- a/hw/scsi/virtio-scsi.c
> +++ b/hw/scsi/virtio-scsi.c
> @@ -1166,6 +1166,8 @@ static void virtio_scsi_drained_end(SCSIBus *bus)
>
>      for (uint32_t i = 0; i < total_queues; i++) {
>          VirtQueue *vq = virtio_get_queue(vdev, i);
> +        virtio_queue_set_notification(vq, 1);
> +        virtio_queue_notify(vdev, i);
>          virtio_queue_aio_attach_host_notifier(vq, s->ctx);
>      }
>  }

But this introduces an issue where e.g. a 'backup' QMP command would put
the iothread into a bad state. After the command, whenever the guest issues
IO, the thread will temporarily spike to using 100% CPU. Using QMP
stop+cont is a way to make it go back to normal.

I think it's because of nested drains, because when additionally checking
that the drain count is zero and only executing the loop then, that issue
doesn't seem to manifest, i.e.:

> diff --git a/hw/scsi/virtio-scsi.c b/hw/scsi/virtio-scsi.c
> index 9c751bf296..d22c586b38 100644
> --- a/hw/scsi/virtio-scsi.c
> +++ b/hw/scsi/virtio-scsi.c
> @@ -1164,9 +1164,13 @@ static void virtio_scsi_drained_end(SCSIBus *bus)
>          return;
>      }
>
> -    for (uint32_t i = 0; i < total_queues; i++) {
> -        VirtQueue *vq = virtio_get_queue(vdev, i);
> -        virtio_queue_aio_attach_host_notifier(vq, s->ctx);
> +    if (s->bus.drain_count == 0) {
> +        for (uint32_t i = 0; i < total_queues; i++) {
> +            VirtQueue *vq = virtio_get_queue(vdev, i);
> +            virtio_queue_set_notification(vq, 1);
> +            virtio_queue_notify(vdev, i);
> +            virtio_queue_aio_attach_host_notifier(vq, s->ctx);
> +        }
>      }
>  }
>

Best Regards,
Fiona
On Mon, Dec 18, 2023 at 01:41:38PM +0100, Fiona Ebner wrote: > Am 14.12.23 um 20:53 schrieb Stefan Hajnoczi: > > > > I will still try the other approach that Hanna and Paolo have suggested. > > It seems more palatable. I will send a v2. > > > > FYI, what I already tried downstream (for VirtIO SCSI): > > > diff --git a/hw/scsi/virtio-scsi.c b/hw/scsi/virtio-scsi.c > > index 9c751bf296..a6449b04d0 100644 > > --- a/hw/scsi/virtio-scsi.c > > +++ b/hw/scsi/virtio-scsi.c > > @@ -1166,6 +1166,8 @@ static void virtio_scsi_drained_end(SCSIBus *bus) > > > > for (uint32_t i = 0; i < total_queues; i++) { > > VirtQueue *vq = virtio_get_queue(vdev, i); > > + virtio_queue_set_notification(vq, 1); > > + virtio_queue_notify(vdev, i); > > virtio_queue_aio_attach_host_notifier(vq, s->ctx); > > } > > } > > But this introduces an issue where e.g. a 'backup' QMP command would put > the iothread into a bad state. After the command, whenever the guest > issues IO, the thread will temporarily spike to using 100% CPU. Using > QMP stop+cont is a way to make it go back to normal. > > I think it's because of nested drains, because when additionally > checking that the drain count is zero and only executing the loop then, > that issue doesn't seem to manifest, i.e.: Thanks for letting me know about the issue. I'll keep an eye out for it when playing with the code. Stefan > > > diff --git a/hw/scsi/virtio-scsi.c b/hw/scsi/virtio-scsi.c > > index 9c751bf296..d22c586b38 100644 > > --- a/hw/scsi/virtio-scsi.c > > +++ b/hw/scsi/virtio-scsi.c > > @@ -1164,9 +1164,13 @@ static void virtio_scsi_drained_end(SCSIBus *bus) > > return; > > } > > > > - for (uint32_t i = 0; i < total_queues; i++) { > > - VirtQueue *vq = virtio_get_queue(vdev, i); > > - virtio_queue_aio_attach_host_notifier(vq, s->ctx); > > + if (s->bus.drain_count == 0) { > > + for (uint32_t i = 0; i < total_queues; i++) { > > + VirtQueue *vq = virtio_get_queue(vdev, i); > > + virtio_queue_set_notification(vq, 1); > > + virtio_queue_notify(vdev, i); > > + virtio_queue_aio_attach_host_notifier(vq, s->ctx); > > + } > > } > > } > > > > Best Regards, > Fiona >
On Mon, Dec 18, 2023 at 1:41 PM Fiona Ebner <f.ebner@proxmox.com> wrote: > I think it's because of nested drains, because when additionally > checking that the drain count is zero and only executing the loop then, > that issue doesn't seem to manifest But isn't virtio_scsi_drained_end only run if bus->drain_count == 0? if (bus->drain_count-- == 1) { trace_scsi_bus_drained_end(bus, sdev); if (bus->info->drained_end) { bus->info->drained_end(bus); } } Paolo > > > diff --git a/hw/scsi/virtio-scsi.c b/hw/scsi/virtio-scsi.c > > index 9c751bf296..d22c586b38 100644 > > --- a/hw/scsi/virtio-scsi.c > > +++ b/hw/scsi/virtio-scsi.c > > @@ -1164,9 +1164,13 @@ static void virtio_scsi_drained_end(SCSIBus *bus) > > return; > > } > > > > - for (uint32_t i = 0; i < total_queues; i++) { > > - VirtQueue *vq = virtio_get_queue(vdev, i); > > - virtio_queue_aio_attach_host_notifier(vq, s->ctx); > > + if (s->bus.drain_count == 0) { > > + for (uint32_t i = 0; i < total_queues; i++) { > > + VirtQueue *vq = virtio_get_queue(vdev, i); > > + virtio_queue_set_notification(vq, 1); > > + virtio_queue_notify(vdev, i); > > + virtio_queue_aio_attach_host_notifier(vq, s->ctx); > > + } > > } > > } > > > > Best Regards, > Fiona >
Am 18.12.23 um 15:49 schrieb Paolo Bonzini: > On Mon, Dec 18, 2023 at 1:41 PM Fiona Ebner <f.ebner@proxmox.com> wrote: >> I think it's because of nested drains, because when additionally >> checking that the drain count is zero and only executing the loop then, >> that issue doesn't seem to manifest > > But isn't virtio_scsi_drained_end only run if bus->drain_count == 0? > > if (bus->drain_count-- == 1) { > trace_scsi_bus_drained_end(bus, sdev); > if (bus->info->drained_end) { > bus->info->drained_end(bus); > } > } > You're right. Sorry, I must've messed up my testing yesterday :( Sometimes the CPU spikes are very short-lived. Now I see the same issue with both variants. Unfortunately, I haven't been able to figure out why it happens yet. Best Regards, Fiona
On 13.12.23 22:15, Stefan Hajnoczi wrote:
> Hanna and Fiona encountered a bug in aio_set_fd_handler(): there is no matching
> io_poll_end() call upon removing an AioHandler when io_poll_begin() was
> previously called. The missing io_poll_end() call leaves virtqueue
> notifications disabled and the virtqueue's ioeventfd will never become
> readable anymore.
>
> The details of how virtio-scsi devices using IOThreads can hang after
> hotplug/unplug are covered here:
> https://issues.redhat.com/browse/RHEL-3934
>
> Hanna is currently away over the December holidays. I'm sending these RFC
> patches in the meantime. They demonstrate running aio_set_fd_handler() in the
> AioContext home thread and adding the missing io_poll_end() call.

I agree with Paolo that if the handlers are removed, the caller probably
isn’t interested in notifications: In our specific case, we remove the
handlers because the device is to be drained, so we won’t poll the
virtqueue anyway.

Therefore, if io_poll_end() is to be called to complete the start-end pair,
it shouldn’t be done when the handlers are removed, but instead when they
are reinstated. I believe that’s quite infeasible to do in generic code:
We’d have to basically remember that we haven’t called a specific
io_poll_end callback yet, and so once it is reinstated while we’re not in a
polling phase, we would need to call it then. That in itself is already
hairy, but in addition, the callback consists of a function pointer and an
opaque pointer, and it seems impossible to reliably establish identity
between two opaque pointers to know whether a newly instated io_poll_end
callback is the same one as one that had been removed before (pointer
identity may work, but also may not).

That’s why I think the responsibility should fall on the caller. I believe
both virtio_queue_aio_attach_host_notifier*() functions should enable vq
notifications before instating the handler – if we’re in polling mode,
io_poll_begin() will then immediately be called and disable vq
notifications again. Having them enabled briefly shouldn’t hurt anything
but performance (which I don’t think is terrible in the drain case).

For completeness’ sake, we may even have
virtio_queue_aio_detach_host_notifier() disable vq notifications, because
otherwise it’s unknown whether notifications are enabled or disabled after
removing the notifiers. That isn’t terrible, but I think (A) if any of the
two, we want them to be disabled, because we won’t check for requests
anyway, and (B) this gives us a clearer state machine, where removing the
notifiers will always leave vq notifications disabled, so when they are
reinstated (i.e. when calling virtio_queue_aio_attach_host_notifier*()),
it’s clear that we must poll once to check for new requests.

I’ve attached the preliminary patch that I didn’t get to send (or test
much) last year. Not sure if it has the same CPU-usage-spike issue Fiona
was seeing, the only functional difference is that I notify the vq after
attaching the notifiers instead of before.

Hanna
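A rough sketch of what enabling notifications in the attach path could look
like, assuming it goes into virtio_queue_aio_attach_host_notifier() in
hw/virtio/virtio.c and that the aio_set_event_notifier() call matches the
current body of that function; this is only an illustration of the idea,
not the attached patch itself:

    void virtio_queue_aio_attach_host_notifier(VirtQueue *vq, AioContext *ctx)
    {
        /*
         * Re-enable virtqueue notifications before (re)instating the handlers
         * so that the state after attaching is well defined.  If polling
         * starts, the io_poll_begin() callback disables them again right away.
         */
        virtio_queue_set_notification(vq, 1);
        aio_set_event_notifier(ctx, &vq->host_notifier,
                               virtio_queue_host_notifier_read,
                               virtio_queue_host_notifier_aio_poll,
                               virtio_queue_host_notifier_aio_poll_ready);
    }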
On Tue, Jan 2, 2024 at 4:24 PM Hanna Czenczek <hreitz@redhat.com> wrote:
> I’ve attached the preliminary patch that I didn’t get to send (or test
> much) last year. Not sure if it has the same CPU-usage-spike issue
> Fiona was seeing, the only functional difference is that I notify the vq
> after attaching the notifiers instead of before.

I think the patch makes sense and cleaning up the logic of aio_poll (which
is one of those functions that grew and grew without much clarity into who
did what) can be done on top.

Just one small thing, the virtio_queue_notify_vq() call is required because
the virtqueue interrupt and eventfd are edge-triggered rather than
level-triggered; so perhaps it should be placed in the function(s) that
establish the handlers, virtio_queue_aio_attach_host_notifier() and
virtio_queue_aio_attach_host_notifier_no_poll()? Neither
virtio_blk_drained_end() nor virtio_scsi_drained_end() are particularly
special, and the comment applies just as well:

    /*
     * We will have ignored notifications about new requests from the guest
     * while handlers were not attached, so "kick" the virt queue to process
     * those requests now.
     */

Paolo
On 02.01.24 16:53, Paolo Bonzini wrote:
> On Tue, Jan 2, 2024 at 4:24 PM Hanna Czenczek<hreitz@redhat.com> wrote:
>> I’ve attached the preliminary patch that I didn’t get to send (or test
>> much) last year. Not sure if it has the same CPU-usage-spike issue
>> Fiona was seeing, the only functional difference is that I notify the vq
>> after attaching the notifiers instead of before.
> I think the patch makes sense and cleaning up the logic of aio_poll
> (which is one of those functions that grew and grew without much
> clarity into who did what) can be done on top.
>
> Just one small thing, the virtio_queue_notify_vq() call is required
> because the virtqueue interrupt and eventfd are edge-triggered rather
> than level-triggered; so perhaps it should be placed in the
> function(s) that establish the handlers,
> virtio_queue_aio_attach_host_notifier() and
> virtio_queue_aio_attach_host_notifier_no_poll()? Neither
> virtio_blk_drained_end() nor virtio_scsi_drained_end() are
> particularly special, and the comment applies just as well:
>
> /*
>  * We will have ignored notifications about new requests from the guest
>  * while handlers were not attached, so "kick" the virt queue to process
>  * those requests now.
>  */

I wasn’t entirely sure whether we want to call notify_vq() if we have
instated the handlers for the first time (in which case someone ought to
look for existing unprocessed requests anyway), so I decided to limit it to
drained_end. But considering that it must be safe to call notify_vq() right
after instating the handlers (virtio_queue_host_notifier_read() may then be
called after all), we might as well do so every time, yes.

Hanna
Am 02.01.24 um 16:24 schrieb Hanna Czenczek: > > I’ve attached the preliminary patch that I didn’t get to send (or test > much) last year. Not sure if it has the same CPU-usage-spike issue > Fiona was seeing, the only functional difference is that I notify the vq > after attaching the notifiers instead of before. > Applied the patch on top of c12887e1b0 ("block-coroutine-wrapper: use qemu_get_current_aio_context()") because it conflicts with b6948ab01d ("virtio-blk: add iothread-vq-mapping parameter"). I'm happy to report that I cannot reproduce the CPU-usage-spike issue with the patch, but I did run into an assertion failure when trying to verify that it fixes my original stuck-guest-IO issue. See below for the backtrace [0]. Hanna wrote in https://issues.redhat.com/browse/RHEL-3934 > I think it’s sufficient to simply call virtio_queue_notify_vq(vq) after the virtio_queue_aio_attach_host_notifier(vq, ctx) call, because both virtio-scsi’s and virtio-blk’s .handle_output() implementations acquire the device’s context, so this should be directly callable from any context. I guess this is not true anymore now that the AioContext locking was removed? Back to the CPU-usage-spike issue: I experimented around and it doesn't seem to matter whether I notify the virt queue before or after attaching the notifiers. But there's another functional difference. My patch called virtio_queue_notify() which contains this block: > if (vq->host_notifier_enabled) { > event_notifier_set(&vq->host_notifier); > } else if (vq->handle_output) { > vq->handle_output(vdev, vq); In my testing, the first branch was taken, calling event_notifier_set(). Hanna's patch uses virtio_queue_notify_vq() and there, vq->handle_output() will be called. That seems to be the relevant difference regarding the CPU-usage-spike issue. 
Best Regards, Fiona [0]: > #0 __pthread_kill_implementation (threadid=<optimized out>, signo=signo@entry=6, no_tid=no_tid@entry=0) at ./nptl/pthread_kill.c:44 > #1 0x00007ffff60e3d9f in __pthread_kill_internal (signo=6, threadid=<optimized out>) at ./nptl/pthread_kill.c:78 > #2 0x00007ffff6094f32 in __GI_raise (sig=sig@entry=6) at ../sysdeps/posix/raise.c:26 > #3 0x00007ffff607f472 in __GI_abort () at ./stdlib/abort.c:79 > #4 0x00007ffff607f395 in __assert_fail_base (fmt=0x7ffff61f3a90 "%s%s%s:%u: %s%sAssertion `%s' failed.\n%n", > assertion=assertion@entry=0x555556246bf8 "ctx == qemu_get_current_aio_context()", > file=file@entry=0x555556246baf "../system/dma-helpers.c", line=line@entry=123, > function=function@entry=0x555556246c70 <__PRETTY_FUNCTION__.1> "dma_blk_cb") at ./assert/assert.c:92 > #5 0x00007ffff608de32 in __GI___assert_fail (assertion=0x555556246bf8 "ctx == qemu_get_current_aio_context()", > file=0x555556246baf "../system/dma-helpers.c", line=123, function=0x555556246c70 <__PRETTY_FUNCTION__.1> "dma_blk_cb") > at ./assert/assert.c:101 > #6 0x0000555555b83425 in dma_blk_cb (opaque=0x55555804f150, ret=0) at ../system/dma-helpers.c:123 > #7 0x0000555555b839ec in dma_blk_io (ctx=0x555557404310, sg=0x5555588ca6f8, offset=70905856, align=512, > io_func=0x555555a94a87 <scsi_dma_readv>, io_func_opaque=0x55555817ea00, cb=0x555555a8d99f <scsi_dma_complete>, opaque=0x55555817ea00, > dir=DMA_DIRECTION_FROM_DEVICE) at ../system/dma-helpers.c:236 > #8 0x0000555555a8de9a in scsi_do_read (r=0x55555817ea00, ret=0) at ../hw/scsi/scsi-disk.c:431 > #9 0x0000555555a8e249 in scsi_read_data (req=0x55555817ea00) at ../hw/scsi/scsi-disk.c:501 > #10 0x0000555555a897e3 in scsi_req_continue (req=0x55555817ea00) at ../hw/scsi/scsi-bus.c:1478 > #11 0x0000555555d8270e in virtio_scsi_handle_cmd_req_submit (s=0x555558669af0, req=0x5555588ca6b0) at ../hw/scsi/virtio-scsi.c:828 > #12 0x0000555555d82937 in virtio_scsi_handle_cmd_vq (s=0x555558669af0, vq=0x555558672550) at ../hw/scsi/virtio-scsi.c:870 > #13 0x0000555555d829a9 in virtio_scsi_handle_cmd (vdev=0x555558669af0, vq=0x555558672550) at ../hw/scsi/virtio-scsi.c:883 > #14 0x0000555555db3784 in virtio_queue_notify_vq (vq=0x555558672550) at ../hw/virtio/virtio.c:2268 > #15 0x0000555555d8346a in virtio_scsi_drained_end (bus=0x555558669d88) at ../hw/scsi/virtio-scsi.c:1179 > #16 0x0000555555a8a549 in scsi_device_drained_end (sdev=0x555558105000) at ../hw/scsi/scsi-bus.c:1774 > #17 0x0000555555a931db in scsi_disk_drained_end (opaque=0x555558105000) at ../hw/scsi/scsi-disk.c:2369 > #18 0x0000555555ee439c in blk_root_drained_end (child=0x5555574065d0) at ../block/block-backend.c:2829 > #19 0x0000555555ef0ac3 in bdrv_parent_drained_end_single (c=0x5555574065d0) at ../block/io.c:74 > #20 0x0000555555ef0b02 in bdrv_parent_drained_end (bs=0x555557409f80, ignore=0x0) at ../block/io.c:89 > #21 0x0000555555ef1b1b in bdrv_do_drained_end (bs=0x555557409f80, parent=0x0) at ../block/io.c:421 > #22 0x0000555555ef1b5a in bdrv_drained_end (bs=0x555557409f80) at ../block/io.c:428 > #23 0x0000555555efcf64 in mirror_exit_common (job=0x5555588b8220) at ../block/mirror.c:798 > #24 0x0000555555efcfde in mirror_abort (job=0x5555588b8220) at ../block/mirror.c:814 > #25 0x0000555555ec53ea in job_abort (job=0x5555588b8220) at ../job.c:825 > #26 0x0000555555ec54d5 in job_finalize_single_locked (job=0x5555588b8220) at ../job.c:855 > #27 0x0000555555ec57cb in job_completed_txn_abort_locked (job=0x5555588b8220) at ../job.c:958 > #28 0x0000555555ec5c20 in job_completed_locked 
(job=0x5555588b8220) at ../job.c:1065 > #29 0x0000555555ec5cd5 in job_exit (opaque=0x5555588b8220) at ../job.c:1088 > #30 0x000055555608342e in aio_bh_call (bh=0x7fffe400dfd0) at ../util/async.c:169 > #31 0x0000555556083549 in aio_bh_poll (ctx=0x55555718ade0) at ../util/async.c:216 > #32 0x0000555556065203 in aio_dispatch (ctx=0x55555718ade0) at ../util/aio-posix.c:423 > #33 0x0000555556083988 in aio_ctx_dispatch (source=0x55555718ade0, callback=0x0, user_data=0x0) at ../util/async.c:358 > #34 0x00007ffff753e7a9 in g_main_context_dispatch () from /lib/x86_64-linux-gnu/libglib-2.0.so.0 > #35 0x00005555560850ae in glib_pollfds_poll () at ../util/main-loop.c:290 > #36 0x000055555608512b in os_host_main_loop_wait (timeout=0) at ../util/main-loop.c:313 > #37 0x0000555556085239 in main_loop_wait (nonblocking=0) at ../util/main-loop.c:592 > #38 0x0000555555b8d501 in qemu_main_loop () at ../system/runstate.c:782 > #39 0x0000555555e55587 in qemu_default_main () at ../system/main.c:37 > #40 0x0000555555e555c2 in main (argc=68, argv=0x7fffffffd8b8) at ../system/main.c:48
On 1/3/24 12:40, Fiona Ebner wrote:
> I'm happy to report that I cannot reproduce the CPU-usage-spike issue
> with the patch, but I did run into an assertion failure when trying to
> verify that it fixes my original stuck-guest-IO issue. See below for the
> backtrace [0]. Hanna wrote in https://issues.redhat.com/browse/RHEL-3934
>
>> I think it’s sufficient to simply call virtio_queue_notify_vq(vq)
>> after the virtio_queue_aio_attach_host_notifier(vq, ctx) call, because
>> both virtio-scsi’s and virtio-blk’s .handle_output() implementations
>> acquire the device’s context, so this should be directly callable from
>> any context.
>
> I guess this is not true anymore now that the AioContext locking was
> removed?

Good point and, in fact, even before it was much safer to use
virtio_queue_notify() instead. Not only does it use the event notifier
handler, but it also calls it in the right thread/AioContext just by doing
event_notifier_set().

The call to virtio_queue_set_notification(..., 1) is safe; not sure about
the call to virtio_queue_set_notification(..., 0) though. I'd rather have
that executed synchronously in the AioContext using aio_wait_bh_oneshot().
This is consistent with the pattern used by virtio_scsi_dataplane_stop()
and virtio_blk_data_plane_stop().

Paolo
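For reference, a minimal sketch of the aio_wait_bh_oneshot() pattern
suggested here for the "disable notification" direction. The BH function
name and call site are assumptions; only aio_wait_bh_oneshot() and
virtio_queue_set_notification() are existing APIs, and aio_wait_bh_oneshot()
must be called from the main loop thread:

    /* Hypothetical BH, run synchronously in the IOThread's AioContext so it
     * cannot race with the polling handlers that also touch the virtqueue's
     * notification state. */
    static void virtio_scsi_disable_notification_bh(void *opaque)
    {
        VirtQueue *vq = opaque;

        virtio_queue_set_notification(vq, 0);
    }

    /* ...invoked from the main loop thread, e.g. on the drained_begin/detach
     * path, mirroring how virtio_scsi_dataplane_stop() uses its stop BH: */
    aio_wait_bh_oneshot(s->ctx, virtio_scsi_disable_notification_bh, vq);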
Am 03.01.24 um 14:35 schrieb Paolo Bonzini: > On 1/3/24 12:40, Fiona Ebner wrote: >> I'm happy to report that I cannot reproduce the CPU-usage-spike issue >> with the patch, but I did run into an assertion failure when trying to >> verify that it fixes my original stuck-guest-IO issue. See below for the >> backtrace [0]. Hanna wrote in https://issues.redhat.com/browse/RHEL-3934 >> >>> I think it’s sufficient to simply call virtio_queue_notify_vq(vq) >>> after the virtio_queue_aio_attach_host_notifier(vq, ctx) call, because >>> both virtio-scsi’s and virtio-blk’s .handle_output() implementations >>> acquire the device’s context, so this should be directly callable from >>> any context. >> >> I guess this is not true anymore now that the AioContext locking was >> removed? > > Good point and, in fact, even before it was much safer to use > virtio_queue_notify() instead. Not only does it use the event notifier > handler, but it also calls it in the right thread/AioContext just by > doing event_notifier_set(). > But with virtio_queue_notify() using the event notifier, the CPU-usage-spike issue is present: >> Back to the CPU-usage-spike issue: I experimented around and it doesn't >> seem to matter whether I notify the virt queue before or after attaching >> the notifiers. But there's another functional difference. My patch >> called virtio_queue_notify() which contains this block: >> >>> if (vq->host_notifier_enabled) { >>> event_notifier_set(&vq->host_notifier); >>> } else if (vq->handle_output) { >>> vq->handle_output(vdev, vq); >> >> In my testing, the first branch was taken, calling event_notifier_set(). >> Hanna's patch uses virtio_queue_notify_vq() and there, >> vq->handle_output() will be called. That seems to be the relevant >> difference regarding the CPU-usage-spike issue. I should mention that this is with a VirtIO SCSI disk. I also attempted to reproduce the CPU-usage-spike issue with a VirtIO block disk, but didn't manage yet. What I noticed is that in virtio_queue_host_notifier_aio_poll(), one of the queues (but only one) will always show as nonempty. And then, run_poll_handlers_once() will always detect progress which explains the CPU usage. The following shows 1. vq address 2. number of times vq was passed to virtio_queue_host_notifier_aio_poll() 3. number of times the result of virtio_queue_host_notifier_aio_poll() was true for the vq > 0x555fd93f9c80 17162000 0 > 0x555fd93f9e48 17162000 6 > 0x555fd93f9ee0 17162000 0 > 0x555fd93f9d18 17162000 17162000 > 0x555fd93f9db0 17162000 0 > 0x555fd93f9f78 17162000 0 And for the problematic one, the reason it is seen as nonempty is: > 0x555fd93f9d18 shadow_avail_idx 8 last_avail_idx 0 Those values stay like this while the call counts above increase. So something going wrong with the indices when the event notifier is set from QEMU side (in the main thread)? The guest is Debian 12 with a 6.1 kernel. Best Regards, Fiona
Am 05.01.24 um 14:43 schrieb Fiona Ebner: > Am 03.01.24 um 14:35 schrieb Paolo Bonzini: >> On 1/3/24 12:40, Fiona Ebner wrote: >>> I'm happy to report that I cannot reproduce the CPU-usage-spike issue >>> with the patch, but I did run into an assertion failure when trying to >>> verify that it fixes my original stuck-guest-IO issue. See below for the >>> backtrace [0]. Hanna wrote in https://issues.redhat.com/browse/RHEL-3934 >>> >>>> I think it’s sufficient to simply call virtio_queue_notify_vq(vq) >>>> after the virtio_queue_aio_attach_host_notifier(vq, ctx) call, because >>>> both virtio-scsi’s and virtio-blk’s .handle_output() implementations >>>> acquire the device’s context, so this should be directly callable from >>>> any context. >>> >>> I guess this is not true anymore now that the AioContext locking was >>> removed? >> >> Good point and, in fact, even before it was much safer to use >> virtio_queue_notify() instead. Not only does it use the event notifier >> handler, but it also calls it in the right thread/AioContext just by >> doing event_notifier_set(). >> > > But with virtio_queue_notify() using the event notifier, the > CPU-usage-spike issue is present: > >>> Back to the CPU-usage-spike issue: I experimented around and it doesn't >>> seem to matter whether I notify the virt queue before or after attaching >>> the notifiers. But there's another functional difference. My patch >>> called virtio_queue_notify() which contains this block: >>> >>>> if (vq->host_notifier_enabled) { >>>> event_notifier_set(&vq->host_notifier); >>>> } else if (vq->handle_output) { >>>> vq->handle_output(vdev, vq); >>> >>> In my testing, the first branch was taken, calling event_notifier_set(). >>> Hanna's patch uses virtio_queue_notify_vq() and there, >>> vq->handle_output() will be called. That seems to be the relevant >>> difference regarding the CPU-usage-spike issue. > > I should mention that this is with a VirtIO SCSI disk. I also attempted > to reproduce the CPU-usage-spike issue with a VirtIO block disk, but > didn't manage yet. > > What I noticed is that in virtio_queue_host_notifier_aio_poll(), one of > the queues (but only one) will always show as nonempty. And then, > run_poll_handlers_once() will always detect progress which explains the > CPU usage. > > The following shows > 1. vq address > 2. number of times vq was passed to virtio_queue_host_notifier_aio_poll() > 3. number of times the result of virtio_queue_host_notifier_aio_poll() > was true for the vq > >> 0x555fd93f9c80 17162000 0 >> 0x555fd93f9e48 17162000 6 >> 0x555fd93f9ee0 17162000 0 >> 0x555fd93f9d18 17162000 17162000 >> 0x555fd93f9db0 17162000 0 >> 0x555fd93f9f78 17162000 0 > > And for the problematic one, the reason it is seen as nonempty is: > >> 0x555fd93f9d18 shadow_avail_idx 8 last_avail_idx 0 > vring_avail_idx(vq) also gives 8 here. This is the vs->event_vq and s->events_dropped is false in my testing, so virtio_scsi_handle_event_vq() doesn't do anything. > Those values stay like this while the call counts above increase. > > So something going wrong with the indices when the event notifier is set > from QEMU side (in the main thread)? > > The guest is Debian 12 with a 6.1 kernel. Best Regards, Fiona
On 05.01.24 15:30, Fiona Ebner wrote: > Am 05.01.24 um 14:43 schrieb Fiona Ebner: >> Am 03.01.24 um 14:35 schrieb Paolo Bonzini: >>> On 1/3/24 12:40, Fiona Ebner wrote: >>>> I'm happy to report that I cannot reproduce the CPU-usage-spike issue >>>> with the patch, but I did run into an assertion failure when trying to >>>> verify that it fixes my original stuck-guest-IO issue. See below for the >>>> backtrace [0]. Hanna wrote in https://issues.redhat.com/browse/RHEL-3934 >>>> >>>>> I think it’s sufficient to simply call virtio_queue_notify_vq(vq) >>>>> after the virtio_queue_aio_attach_host_notifier(vq, ctx) call, because >>>>> both virtio-scsi’s and virtio-blk’s .handle_output() implementations >>>>> acquire the device’s context, so this should be directly callable from >>>>> any context. >>>> I guess this is not true anymore now that the AioContext locking was >>>> removed? >>> Good point and, in fact, even before it was much safer to use >>> virtio_queue_notify() instead. Not only does it use the event notifier >>> handler, but it also calls it in the right thread/AioContext just by >>> doing event_notifier_set(). >>> >> But with virtio_queue_notify() using the event notifier, the >> CPU-usage-spike issue is present: >> >>>> Back to the CPU-usage-spike issue: I experimented around and it doesn't >>>> seem to matter whether I notify the virt queue before or after attaching >>>> the notifiers. But there's another functional difference. My patch >>>> called virtio_queue_notify() which contains this block: >>>> >>>>> if (vq->host_notifier_enabled) { >>>>> event_notifier_set(&vq->host_notifier); >>>>> } else if (vq->handle_output) { >>>>> vq->handle_output(vdev, vq); >>>> In my testing, the first branch was taken, calling event_notifier_set(). >>>> Hanna's patch uses virtio_queue_notify_vq() and there, >>>> vq->handle_output() will be called. That seems to be the relevant >>>> difference regarding the CPU-usage-spike issue. >> I should mention that this is with a VirtIO SCSI disk. I also attempted >> to reproduce the CPU-usage-spike issue with a VirtIO block disk, but >> didn't manage yet. >> >> What I noticed is that in virtio_queue_host_notifier_aio_poll(), one of >> the queues (but only one) will always show as nonempty. And then, >> run_poll_handlers_once() will always detect progress which explains the >> CPU usage. >> >> The following shows >> 1. vq address >> 2. number of times vq was passed to virtio_queue_host_notifier_aio_poll() >> 3. number of times the result of virtio_queue_host_notifier_aio_poll() >> was true for the vq >> >>> 0x555fd93f9c80 17162000 0 >>> 0x555fd93f9e48 17162000 6 >>> 0x555fd93f9ee0 17162000 0 >>> 0x555fd93f9d18 17162000 17162000 >>> 0x555fd93f9db0 17162000 0 >>> 0x555fd93f9f78 17162000 0 >> And for the problematic one, the reason it is seen as nonempty is: >> >>> 0x555fd93f9d18 shadow_avail_idx 8 last_avail_idx 0 > vring_avail_idx(vq) also gives 8 here. This is the vs->event_vq and > s->events_dropped is false in my testing, so > virtio_scsi_handle_event_vq() doesn't do anything. > >> Those values stay like this while the call counts above increase. >> >> So something going wrong with the indices when the event notifier is set >> from QEMU side (in the main thread)? >> >> The guest is Debian 12 with a 6.1 kernel. 
So, trying to figure out a new RFC version:

About the stack trace you, Fiona, posted: As far as I understand, that
happens because virtio_blk_drained_end() calling virtio_queue_notify_vq()
wasn’t safe after all, and instead we need to use virtio_queue_notify().
Right?

However, you say using virtio_queue_notify() instead causes busy loops of
doing nothing in virtio-scsi (what you describe above). I mean, better than
a crash, but, you know. :)

I don’t have any prior knowledge about the event handling, unfortunately.
The fact that 8 buffers are available but we don’t use any sounds OK to me;
as far as I understand, we only use those buffers if we have any events to
push into them, so as long as we don’t, we won’t. Question is, should we
not have its poll handler return false if we don’t have any events (i.e.
events_dropped is false)? Would that solve it?

Hanna
On 22.01.24 18:41, Hanna Czenczek wrote: > On 05.01.24 15:30, Fiona Ebner wrote: >> Am 05.01.24 um 14:43 schrieb Fiona Ebner: >>> Am 03.01.24 um 14:35 schrieb Paolo Bonzini: >>>> On 1/3/24 12:40, Fiona Ebner wrote: >>>>> I'm happy to report that I cannot reproduce the CPU-usage-spike issue >>>>> with the patch, but I did run into an assertion failure when >>>>> trying to >>>>> verify that it fixes my original stuck-guest-IO issue. See below >>>>> for the >>>>> backtrace [0]. Hanna wrote in >>>>> https://issues.redhat.com/browse/RHEL-3934 >>>>> >>>>>> I think it’s sufficient to simply call virtio_queue_notify_vq(vq) >>>>>> after the virtio_queue_aio_attach_host_notifier(vq, ctx) call, >>>>>> because >>>>>> both virtio-scsi’s and virtio-blk’s .handle_output() implementations >>>>>> acquire the device’s context, so this should be directly callable >>>>>> from >>>>>> any context. >>>>> I guess this is not true anymore now that the AioContext locking was >>>>> removed? >>>> Good point and, in fact, even before it was much safer to use >>>> virtio_queue_notify() instead. Not only does it use the event >>>> notifier >>>> handler, but it also calls it in the right thread/AioContext just by >>>> doing event_notifier_set(). >>>> >>> But with virtio_queue_notify() using the event notifier, the >>> CPU-usage-spike issue is present: >>> >>>>> Back to the CPU-usage-spike issue: I experimented around and it >>>>> doesn't >>>>> seem to matter whether I notify the virt queue before or after >>>>> attaching >>>>> the notifiers. But there's another functional difference. My patch >>>>> called virtio_queue_notify() which contains this block: >>>>> >>>>>> if (vq->host_notifier_enabled) { >>>>>> event_notifier_set(&vq->host_notifier); >>>>>> } else if (vq->handle_output) { >>>>>> vq->handle_output(vdev, vq); >>>>> In my testing, the first branch was taken, calling >>>>> event_notifier_set(). >>>>> Hanna's patch uses virtio_queue_notify_vq() and there, >>>>> vq->handle_output() will be called. That seems to be the relevant >>>>> difference regarding the CPU-usage-spike issue. >>> I should mention that this is with a VirtIO SCSI disk. I also attempted >>> to reproduce the CPU-usage-spike issue with a VirtIO block disk, but >>> didn't manage yet. >>> >>> What I noticed is that in virtio_queue_host_notifier_aio_poll(), one of >>> the queues (but only one) will always show as nonempty. And then, >>> run_poll_handlers_once() will always detect progress which explains the >>> CPU usage. >>> >>> The following shows >>> 1. vq address >>> 2. number of times vq was passed to >>> virtio_queue_host_notifier_aio_poll() >>> 3. number of times the result of virtio_queue_host_notifier_aio_poll() >>> was true for the vq >>> >>>> 0x555fd93f9c80 17162000 0 >>>> 0x555fd93f9e48 17162000 6 >>>> 0x555fd93f9ee0 17162000 0 >>>> 0x555fd93f9d18 17162000 17162000 >>>> 0x555fd93f9db0 17162000 0 >>>> 0x555fd93f9f78 17162000 0 >>> And for the problematic one, the reason it is seen as nonempty is: >>> >>>> 0x555fd93f9d18 shadow_avail_idx 8 last_avail_idx 0 >> vring_avail_idx(vq) also gives 8 here. This is the vs->event_vq and >> s->events_dropped is false in my testing, so >> virtio_scsi_handle_event_vq() doesn't do anything. >> >>> Those values stay like this while the call counts above increase. >>> >>> So something going wrong with the indices when the event notifier is >>> set >>> from QEMU side (in the main thread)? >>> >>> The guest is Debian 12 with a 6.1 kernel. 
> So, trying to figure out a new RFC version:
>
> About the stack trace you, Fiona, posted: As far as I understand,
> that happens because virtio_blk_drained_end() calling
> virtio_queue_notify_vq() wasn’t safe after all, and instead we need to
> use virtio_queue_notify(). Right?
>
> However, you say using virtio_queue_notify() instead causes busy loops
> of doing nothing in virtio-scsi (what you describe above). I mean,
> better than a crash, but, you know. :)
>
> I don’t have any prior knowledge about the event handling,
> unfortunately. The fact that 8 buffers are available but we don’t use
> any sounds OK to me; as far as I understand, we only use those buffers
> if we have any events to push into them, so as long as we don’t, we
> won’t. Question is, should we not have its poll handler return false
> if we don’t have any events (i.e. events_dropped is false)? Would
> that solve it?

Or actually, maybe we could just skip the virtio_queue_notify() call for
the event vq? I.e. have it be
`if (vq != VIRTIO_SCSI_COMMON(s)->event_vq) { virtio_queue_notify(vdev, i); }`?

I wouldn’t like that very much, (a) because this would make it slightly
cumbersome to put that into virtio_queue_aio_attach_host_notifier*(), and
(b) in case that does fix it, I do kind of feel like the real problem is
that we use virtio_queue_host_notifier_aio_poll() for the event vq, which
tells the polling code to poll whenever the vq is non-empty, but we (AFAIU)
expect the event vq to be non-empty all the time.

Hanna
Am 22.01.24 um 18:52 schrieb Hanna Czenczek: > On 22.01.24 18:41, Hanna Czenczek wrote: >> On 05.01.24 15:30, Fiona Ebner wrote: >>> Am 05.01.24 um 14:43 schrieb Fiona Ebner: >>>> Am 03.01.24 um 14:35 schrieb Paolo Bonzini: >>>>> On 1/3/24 12:40, Fiona Ebner wrote: >>>>>> I'm happy to report that I cannot reproduce the CPU-usage-spike issue >>>>>> with the patch, but I did run into an assertion failure when >>>>>> trying to >>>>>> verify that it fixes my original stuck-guest-IO issue. See below >>>>>> for the >>>>>> backtrace [0]. Hanna wrote in >>>>>> https://issues.redhat.com/browse/RHEL-3934 >>>>>> >>>>>>> I think it’s sufficient to simply call virtio_queue_notify_vq(vq) >>>>>>> after the virtio_queue_aio_attach_host_notifier(vq, ctx) call, >>>>>>> because >>>>>>> both virtio-scsi’s and virtio-blk’s .handle_output() implementations >>>>>>> acquire the device’s context, so this should be directly callable >>>>>>> from >>>>>>> any context. >>>>>> I guess this is not true anymore now that the AioContext locking was >>>>>> removed? >>>>> Good point and, in fact, even before it was much safer to use >>>>> virtio_queue_notify() instead. Not only does it use the event >>>>> notifier >>>>> handler, but it also calls it in the right thread/AioContext just by >>>>> doing event_notifier_set(). >>>>> >>>> But with virtio_queue_notify() using the event notifier, the >>>> CPU-usage-spike issue is present: >>>> >>>>>> Back to the CPU-usage-spike issue: I experimented around and it >>>>>> doesn't >>>>>> seem to matter whether I notify the virt queue before or after >>>>>> attaching >>>>>> the notifiers. But there's another functional difference. My patch >>>>>> called virtio_queue_notify() which contains this block: >>>>>> >>>>>>> if (vq->host_notifier_enabled) { >>>>>>> event_notifier_set(&vq->host_notifier); >>>>>>> } else if (vq->handle_output) { >>>>>>> vq->handle_output(vdev, vq); >>>>>> In my testing, the first branch was taken, calling >>>>>> event_notifier_set(). >>>>>> Hanna's patch uses virtio_queue_notify_vq() and there, >>>>>> vq->handle_output() will be called. That seems to be the relevant >>>>>> difference regarding the CPU-usage-spike issue. >>>> I should mention that this is with a VirtIO SCSI disk. I also attempted >>>> to reproduce the CPU-usage-spike issue with a VirtIO block disk, but >>>> didn't manage yet. >>>> >>>> What I noticed is that in virtio_queue_host_notifier_aio_poll(), one of >>>> the queues (but only one) will always show as nonempty. And then, >>>> run_poll_handlers_once() will always detect progress which explains the >>>> CPU usage. >>>> >>>> The following shows >>>> 1. vq address >>>> 2. number of times vq was passed to >>>> virtio_queue_host_notifier_aio_poll() >>>> 3. number of times the result of virtio_queue_host_notifier_aio_poll() >>>> was true for the vq >>>> >>>>> 0x555fd93f9c80 17162000 0 >>>>> 0x555fd93f9e48 17162000 6 >>>>> 0x555fd93f9ee0 17162000 0 >>>>> 0x555fd93f9d18 17162000 17162000 >>>>> 0x555fd93f9db0 17162000 0 >>>>> 0x555fd93f9f78 17162000 0 >>>> And for the problematic one, the reason it is seen as nonempty is: >>>> >>>>> 0x555fd93f9d18 shadow_avail_idx 8 last_avail_idx 0 >>> vring_avail_idx(vq) also gives 8 here. This is the vs->event_vq and >>> s->events_dropped is false in my testing, so >>> virtio_scsi_handle_event_vq() doesn't do anything. >>> >>>> Those values stay like this while the call counts above increase. 
>>>> So something going wrong with the indices when the event notifier is
>>>> set from QEMU side (in the main thread)?
>>>>
>>>> The guest is Debian 12 with a 6.1 kernel.
>>
>> So, trying to figure out a new RFC version:
>>
>> About the stack trace you, Fiona, posted: As far as I understand,
>> that happens because virtio_blk_drained_end() calling
>> virtio_queue_notify_vq() wasn't safe after all, and instead we need to
>> use virtio_queue_notify(). Right?
>>

AFAICT, yes. In particular, after 4f36b13847 ("scsi: remove AioContext
locking"), the AioContext is not acquired by
virtio_scsi_handle_{cmd,ctrl,event} anymore.

>> However, you say using virtio_queue_notify() instead causes busy loops
>> of doing nothing in virtio-scsi (what you describe above). I mean,
>> better than a crash, but, you know. :)

Yes, that happens for me when using virtio_queue_notify(), i.e.

> diff --git a/hw/scsi/virtio-scsi.c b/hw/scsi/virtio-scsi.c
> index 690aceec45..8cdf04ac2d 100644
> --- a/hw/scsi/virtio-scsi.c
> +++ b/hw/scsi/virtio-scsi.c
> @@ -1166,7 +1166,11 @@ static void virtio_scsi_drained_end(SCSIBus *bus)
>
>      for (uint32_t i = 0; i < total_queues; i++) {
>          VirtQueue *vq = virtio_get_queue(vdev, i);
> +        if (!virtio_queue_get_notification(vq)) {
> +            virtio_queue_set_notification(vq, true);
> +        }
>          virtio_queue_aio_attach_host_notifier(vq, s->ctx);
> +        virtio_queue_notify(vdev, i);
>      }
>  }
>

>> I don't have any prior knowledge about the event handling,
>> unfortunately. The fact that 8 buffers are available but we don't use
>> any sounds OK to me; as far as I understand, we only use those buffers
>> if we have any events to push into them, so as long as we don't, we
>> won't. Question is, should we not have its poll handler return false
>> if we don't have any events (i.e. events_dropped is false)? Would
>> that solve it?
>
> Or actually, maybe we could just skip the virtio_queue_notify() call for
> the event vq? I.e. have it be `if (vq !=
> VIRTIO_SCSI_COMMON(s)->event_vq) { virtio_queue_notify(vdev, i); }`?

That seems to avoid the CPU-usage-spike issue :)

> I wouldn't like that very much, (a) because this would make it slightly
> cumbersome to put that into virtio_queue_aio_attach_host_notifier*(),
> and (b) in case that does fix it, I do kind of feel like the real
> problem is that we use virtio_queue_host_notifier_aio_poll() for the
> event vq, which tells the polling code to poll whenever the vq is
> non-empty, but we (AFAIU) expect the event vq to be non-empty all the
> time.

AFAIU, (at least in my testing) it's only non-empty after it was notified
via virtio_scsi_drained_end() once. But it's hard to tell, because it seems
that the poll callback is only called after the first drain.

I noticed poll_set_started() is not called, because
ctx->fdmon_ops->need_wait(ctx) was true, i.e. ctx->poll_disable_cnt was
positive (I'm using fdmon_poll). I then found this is because of the
notifier for the event vq, being attached with

> virtio_queue_aio_attach_host_notifier_no_poll(vs->event_vq, s->ctx);

in virtio_scsi_dataplane_start(). But in virtio_scsi_drained_end() it is
attached with virtio_queue_aio_attach_host_notifier() instead of the
_no_poll() variant. So that might be the actual issue here?
From a quick test, I cannot see the CPU-usage-spike issue with the
following either:

> diff --git a/hw/scsi/virtio-scsi.c b/hw/scsi/virtio-scsi.c
> index 690aceec45..ba1ab8e410 100644
> --- a/hw/scsi/virtio-scsi.c
> +++ b/hw/scsi/virtio-scsi.c
> @@ -1166,7 +1166,15 @@ static void virtio_scsi_drained_end(SCSIBus *bus)
>
>      for (uint32_t i = 0; i < total_queues; i++) {
>          VirtQueue *vq = virtio_get_queue(vdev, i);
> -        virtio_queue_aio_attach_host_notifier(vq, s->ctx);
> +        if (!virtio_queue_get_notification(vq)) {
> +            virtio_queue_set_notification(vq, true);
> +        }
> +        if (vq == VIRTIO_SCSI_COMMON(s)->event_vq) {
> +            virtio_queue_aio_attach_host_notifier_no_poll(vq, s->ctx);
> +        } else {
> +            virtio_queue_aio_attach_host_notifier(vq, s->ctx);
> +        }
> +        virtio_queue_notify(vdev, i);
>      }
>  }

Best Regards,
Fiona
On 22.01.24 18:52, Hanna Czenczek wrote: > On 22.01.24 18:41, Hanna Czenczek wrote: >> On 05.01.24 15:30, Fiona Ebner wrote: >>> Am 05.01.24 um 14:43 schrieb Fiona Ebner: >>>> Am 03.01.24 um 14:35 schrieb Paolo Bonzini: >>>>> On 1/3/24 12:40, Fiona Ebner wrote: >>>>>> I'm happy to report that I cannot reproduce the CPU-usage-spike >>>>>> issue >>>>>> with the patch, but I did run into an assertion failure when >>>>>> trying to >>>>>> verify that it fixes my original stuck-guest-IO issue. See below >>>>>> for the >>>>>> backtrace [0]. Hanna wrote in >>>>>> https://issues.redhat.com/browse/RHEL-3934 >>>>>> >>>>>>> I think it’s sufficient to simply call virtio_queue_notify_vq(vq) >>>>>>> after the virtio_queue_aio_attach_host_notifier(vq, ctx) call, >>>>>>> because >>>>>>> both virtio-scsi’s and virtio-blk’s .handle_output() >>>>>>> implementations >>>>>>> acquire the device’s context, so this should be directly >>>>>>> callable from >>>>>>> any context. >>>>>> I guess this is not true anymore now that the AioContext locking was >>>>>> removed? >>>>> Good point and, in fact, even before it was much safer to use >>>>> virtio_queue_notify() instead. Not only does it use the event >>>>> notifier >>>>> handler, but it also calls it in the right thread/AioContext just by >>>>> doing event_notifier_set(). >>>>> >>>> But with virtio_queue_notify() using the event notifier, the >>>> CPU-usage-spike issue is present: >>>> >>>>>> Back to the CPU-usage-spike issue: I experimented around and it >>>>>> doesn't >>>>>> seem to matter whether I notify the virt queue before or after >>>>>> attaching >>>>>> the notifiers. But there's another functional difference. My patch >>>>>> called virtio_queue_notify() which contains this block: >>>>>> >>>>>>> if (vq->host_notifier_enabled) { >>>>>>> event_notifier_set(&vq->host_notifier); >>>>>>> } else if (vq->handle_output) { >>>>>>> vq->handle_output(vdev, vq); >>>>>> In my testing, the first branch was taken, calling >>>>>> event_notifier_set(). >>>>>> Hanna's patch uses virtio_queue_notify_vq() and there, >>>>>> vq->handle_output() will be called. That seems to be the relevant >>>>>> difference regarding the CPU-usage-spike issue. >>>> I should mention that this is with a VirtIO SCSI disk. I also >>>> attempted >>>> to reproduce the CPU-usage-spike issue with a VirtIO block disk, but >>>> didn't manage yet. >>>> >>>> What I noticed is that in virtio_queue_host_notifier_aio_poll(), >>>> one of >>>> the queues (but only one) will always show as nonempty. And then, >>>> run_poll_handlers_once() will always detect progress which explains >>>> the >>>> CPU usage. >>>> >>>> The following shows >>>> 1. vq address >>>> 2. number of times vq was passed to >>>> virtio_queue_host_notifier_aio_poll() >>>> 3. number of times the result of virtio_queue_host_notifier_aio_poll() >>>> was true for the vq >>>> >>>>> 0x555fd93f9c80 17162000 0 >>>>> 0x555fd93f9e48 17162000 6 >>>>> 0x555fd93f9ee0 17162000 0 >>>>> 0x555fd93f9d18 17162000 17162000 >>>>> 0x555fd93f9db0 17162000 0 >>>>> 0x555fd93f9f78 17162000 0 >>>> And for the problematic one, the reason it is seen as nonempty is: >>>> >>>>> 0x555fd93f9d18 shadow_avail_idx 8 last_avail_idx 0 >>> vring_avail_idx(vq) also gives 8 here. This is the vs->event_vq and >>> s->events_dropped is false in my testing, so >>> virtio_scsi_handle_event_vq() doesn't do anything. >>> >>>> Those values stay like this while the call counts above increase. 
>>>> So something going wrong with the indices when the event notifier
>>>> is set from QEMU side (in the main thread)?
>>>>
>>>> The guest is Debian 12 with a 6.1 kernel.
>>
>> So, trying to figure out a new RFC version:
>>
>> About the stack trace you, Fiona, posted: As far as I understand,
>> that happens because virtio_blk_drained_end() calling
>> virtio_queue_notify_vq() wasn’t safe after all, and instead we need
>> to use virtio_queue_notify(). Right?
>>
>> However, you say using virtio_queue_notify() instead causes busy
>> loops of doing nothing in virtio-scsi (what you describe above). I
>> mean, better than a crash, but, you know. :)
>>
>> I don’t have any prior knowledge about the event handling,
>> unfortunately. The fact that 8 buffers are available but we don’t
>> use any sounds OK to me; as far as I understand, we only use those
>> buffers if we have any events to push into them, so as long as we
>> don’t, we won’t. Question is, should we not have its poll handler
>> return false if we don’t have any events (i.e. events_dropped is
>> false)? Would that solve it?
>
> Or actually, maybe we could just skip the virtio_queue_notify() call
> for the event vq? I.e. have it be `if (vq !=
> VIRTIO_SCSI_COMMON(s)->event_vq) { virtio_queue_notify(vdev, i); }`?
> I wouldn’t like that very much, (a) because this would make it
> slightly cumbersome to put that into
> virtio_queue_aio_attach_host_notifier*(), and (b) in case that does
> fix it, I do kind of feel like the real problem is that we use
> virtio_queue_host_notifier_aio_poll() for the event vq, which tells
> the polling code to poll whenever the vq is non-empty, but we (AFAIU)
> expect the event vq to be non-empty all the time.

Turns out there was commit 38738f7dbbda90fbc161757b7f4be35b52205552
(“virtio-scsi: don't waste CPU polling the event virtqueue”) by Stefan,
which specifically intended to not use
virtio_queue_host_notifier_aio_poll() for the event vq. I think the problem
is that virtio_scsi_drained_end() should have taken care to use
virtio_queue_aio_attach_host_notifier_no_poll() for the event vq.

If we do that, I think running virtio_queue_notify() on the event vq too
should be reasonable. We still want to check whether there are new buffers
available in case we have events_dropped. I don’t know whether it’s really
necessary, but without virtio_queue_host_notifier_aio_poll() installed, it
shouldn’t hurt.

Hanna
On 23.01.24 12:12, Fiona Ebner wrote: [...] > I noticed poll_set_started() is not called, because > ctx->fdmon_ops->need_wait(ctx) was true, i.e. ctx->poll_disable_cnt was > positive (I'm using fdmon_poll). I then found this is because of the > notifier for the event vq, being attached with > >> virtio_queue_aio_attach_host_notifier_no_poll(vs->event_vq, s->ctx); > in virtio_scsi_dataplane_start(). But in virtio_scsi_drained_end() it is > attached with virtio_queue_aio_attach_host_notifier() instead of the > _no_poll() variant. So that might be the actual issue here? > > From a quick test, I cannot see the CPU-usage-spike issue with the > following either: > >> diff --git a/hw/scsi/virtio-scsi.c b/hw/scsi/virtio-scsi.c >> index 690aceec45..ba1ab8e410 100644 >> --- a/hw/scsi/virtio-scsi.c >> +++ b/hw/scsi/virtio-scsi.c >> @@ -1166,7 +1166,15 @@ static void virtio_scsi_drained_end(SCSIBus *bus) >> >> for (uint32_t i = 0; i < total_queues; i++) { >> VirtQueue *vq = virtio_get_queue(vdev, i); >> - virtio_queue_aio_attach_host_notifier(vq, s->ctx); >> + if (!virtio_queue_get_notification(vq)) { >> + virtio_queue_set_notification(vq, true); >> + } >> + if (vq == VIRTIO_SCSI_COMMON(s)->event_vq) { >> + virtio_queue_aio_attach_host_notifier_no_poll(vq, s->ctx); >> + } else { >> + virtio_queue_aio_attach_host_notifier(vq, s->ctx); >> + } >> + virtio_queue_notify(vdev, i); >> } >> } Perfect, so we agree on trying it that way. :) Hanna
On 02.01.24 16:24, Hanna Czenczek wrote: [...] > I’ve attached the preliminary patch that I didn’t get to send (or test > much) last year. Not sure if it has the same CPU-usage-spike issue > Fiona was seeing, the only functional difference is that I notify the > vq after attaching the notifiers instead of before. It’ll take me some more time to send a patch mail to that effect, because now there’s an assertion failure on hotunplug, which bisects back to eaad0fe26050c227dc5dad63205835bac4912a51 (“scsi: only access SCSIDevice->requests from one thread”): {"execute":"device_del","arguments":{"id":"stg0"}} {"return": {}} qemu-system-x86_64: ../block/block-backend.c:2429: blk_get_aio_context: Assertion `ctx == blk->ctx' failed. (gdb) bt #0 0x00007f32a668d83c in () at /usr/lib/libc.so.6 #1 0x00007f32a663d668 in raise () at /usr/lib/libc.so.6 #2 0x00007f32a66254b8 in abort () at /usr/lib/libc.so.6 #3 0x00007f32a66253dc in () at /usr/lib/libc.so.6 #4 0x00007f32a6635d26 in () at /usr/lib/libc.so.6 #5 0x0000556e6b4880a4 in blk_get_aio_context (blk=0x556e6e89ccf0) at ../block/block-backend.c:2429 #6 blk_get_aio_context (blk=0x556e6e89ccf0) at ../block/block-backend.c:2417 #7 0x0000556e6b112d87 in scsi_device_for_each_req_async_bh (opaque=0x556e6e2c6d10) at ../hw/scsi/scsi-bus.c:128 #8 0x0000556e6b5d1966 in aio_bh_poll (ctx=ctx@entry=0x556e6d8aa290) at ../util/async.c:218 #9 0x0000556e6b5bb16a in aio_poll (ctx=0x556e6d8aa290, blocking=blocking@entry=true) at ../util/aio-posix.c:722 #10 0x0000556e6b4564b6 in iothread_run (opaque=opaque@entry=0x556e6d89d920) at ../iothread.c:63 #11 0x0000556e6b5bde58 in qemu_thread_start (args=0x556e6d8aa9b0) at ../util/qemu-thread-posix.c:541 #12 0x00007f32a668b9eb in () at /usr/lib/libc.so.6 #13 0x00007f32a670f7cc in () at /usr/lib/libc.so.6