
[RFC] mpqemu: Remove unlock/lock of iothread in mpqemu-link send and recv functions

Message ID 165331848622.286860.14764979875661796662.stgit@localhost.localdomain (mailing list archive)
State New, archived
Series [RFC] mpqemu: Remove unlock/lock of iothread in mpqemu-link send and recv functions

Commit Message

Alexander Duyck May 23, 2022, 3:09 p.m. UTC
From: Alexander Duyck <alexanderduyck@fb.com>

When I run Multi-process QEMU with an e1000 as the remote device and SMP
enabled, I see the combination lock up and become unresponsive. The QEMU build
is a fairly standard x86_64-softmmu setup. After doing some digging, I tracked
the lockup down to what appears to be a race between the mpqemu-link msg_send
and msg_receive functions and the reacquisition of the iothread lock.

I am assuming the issue is some sort of lock inversion, though I haven't
identified exactly which other lock is involved yet. For now, removing
the logic to unlock the iothread and then reacquire the lock seems to
resolve the issue. I am assuming the release of the lock was some form of
optimization, but I am not certain, so I am submitting this as an RFC.

Signed-off-by: Alexander Duyck <alexanderduyck@fb.com>
---
 hw/remote/mpqemu-link.c |   25 -------------------------
 1 file changed, 25 deletions(-)

Comments

Jag Raman May 23, 2022, 10:56 p.m. UTC | #1
> On May 23, 2022, at 11:09 AM, Alexander Duyck <alexander.duyck@gmail.com> wrote:
> 
> From: Alexander Duyck <alexanderduyck@fb.com>
> 
> When I run Multi-process QEMU with an e1000 as the remote device and SMP
> enabled, I see the combination lock up and become unresponsive. The QEMU build
> is a fairly standard x86_64-softmmu setup. After doing some digging, I tracked
> the lockup down to what appears to be a race between the mpqemu-link msg_send
> and msg_receive functions and the reacquisition of the iothread lock.
> 
> I am assuming the issue is some sort of lock inversion, though I haven't
> identified exactly which other lock is involved yet. For now, removing
> the logic to unlock the iothread and then reacquire the lock seems to
> resolve the issue. I am assuming the release of the lock was some form of
> optimization, but I am not certain, so I am submitting this as an RFC.

Hi Alexander,

We are working on moving away from Multi-process QEMU to a vfio-user-based
approach. The vfio-user patches are under review. I believe we would drop
Multi-process support once vfio-user is merged.

We release the lock here while communicating with the remote process via the
QIOChannel. It is to prevent lockup of the VM in case the QIOChannel hangs.
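
For illustration, here is a minimal standalone sketch of that pattern (a
pthread mutex stands in for the iothread lock, and the helper names are made
up; this is not the actual QEMU code):

#include <pthread.h>
#include <unistd.h>

/* Stand-in for the global iothread lock (BQL). */
static pthread_mutex_t iothread_lock = PTHREAD_MUTEX_INITIALIZER;

/* Stand-in for the blocking write to the remote process over the QIOChannel. */
static void channel_write_blocking(void)
{
    sleep(1); /* could block indefinitely if the remote end hangs */
}

/* Called by a vCPU thread that holds the iothread lock on entry. */
static void send_msg_sketch(void)
{
    /*
     * Drop the iothread lock around the blocking write so a hung channel
     * only stalls this thread, not every other user of the lock (other
     * vCPUs, the main loop).
     */
    pthread_mutex_unlock(&iothread_lock);

    channel_write_blocking();

    /* Reacquire before returning; the caller expects to still hold it. */
    pthread_mutex_lock(&iothread_lock);
}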

I was able to reproduce this issue at my end. There is a deadlock between
"mpqemu_msg_send() -> qemu_mutex_lock_iothread()" and
"mpqemu_msg_send_and_await_reply() -> QEMU_LOCK_GUARD(&pdev->io_mutex)”.

From what I can tell, as soon as one vcpu thread drops the iothread lock, another
thread running mpqemu_msg_send_and_await_reply() holds on to it. That prevents
the first thread from completing. Attaching backtrace below.

To avoid the deadlock, I think we should drop both the iothread lock and io_mutex
and reacquire them in the correct order - first iothread and then io_mutex. Given
multiprocess QEMU would be dropped in the near future, I suppose we don’t have
to proceed further along these lines.
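
Roughly, that reordering would look like the following standalone sketch
(two pthread mutexes stand in for the iothread lock and pdev->io_mutex; this
is not the actual QEMU code and the names are illustrative):

#include <pthread.h>

static pthread_mutex_t iothread_lock = PTHREAD_MUTEX_INITIALIZER; /* stands in for the BQL */
static pthread_mutex_t io_mutex = PTHREAD_MUTEX_INITIALIZER;      /* stands in for pdev->io_mutex */

/* Stand-in for the blocking request/reply round trip on the QIOChannel. */
static void channel_round_trip(void)
{
}

/* Caller (a vCPU thread) is assumed to hold the iothread lock on entry. */
static void send_and_await_reply_sketch(void)
{
    pthread_mutex_lock(&io_mutex);
    /* ... anything that must stay serialized ahead of the I/O ... */

    /* Drop both locks before the potentially hanging channel I/O. */
    pthread_mutex_unlock(&io_mutex);
    pthread_mutex_unlock(&iothread_lock);

    channel_round_trip();

    /*
     * Reacquire in one fixed order (iothread lock first, then io_mutex) so
     * no thread ever waits for the iothread lock while holding io_mutex,
     * which is the inversion visible in the backtraces below.
     */
    pthread_mutex_lock(&iothread_lock);
    pthread_mutex_lock(&io_mutex);

    /* ... consume the reply ... */

    pthread_mutex_unlock(&io_mutex);
    /* The iothread lock is returned to the caller still held. */
}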

I tested your patch, and that fixes the e1000 issue at my end also. I believe we
could adopt it.

Thank you!
--
Jag

Thread 6 (Thread 0x7f2d12281700 (LWP 31758)):
#0  0x00007f2d9b7ac54d in __lll_lock_wait () at /lib64/libpthread.so.0
#1  0x00007f2d9b7a7e9b in _L_lock_883 () at /lib64/libpthread.so.0
#2  0x00007f2d9b7a7d68 in pthread_mutex_lock () at /lib64/libpthread.so.0
#3  0x000055bdeb48663f in qemu_mutex_lock_impl (mutex=0x55bdebf68800 <qemu_global_mutex>, file=0x55bdeb5c5c5a "../hw/remote/mpqemu-link.c", line=79) at ../util/qemu-thread-posix.c:88
#4  0x000055bdeb006546 in qemu_mutex_lock_iothread_impl (file=0x55bdeb5c5c5a "../hw/remote/mpqemu-link.c", line=79) at ../softmmu/cpus.c:502
#5  0x000055bdeafed3ff in mpqemu_msg_send (msg=0x7f2d12280430, ioc=0x55bdeeb02600, errp=0x7f2d12280420) at ../hw/remote/mpqemu-link.c:79
#6  0x000055bdeafed93c in mpqemu_msg_send_and_await_reply (msg=0x7f2d12280430, pdev=0x55bdeeaff8e0, errp=0x7f2d12280420) at ../hw/remote/mpqemu-link.c:198
#7  0x000055bdeafefe0e in send_bar_access_msg (pdev=0x55bdeeaff8e0, mr=0x55bdeeb00460, write=false, addr=192, val=0x7f2d12280578, size=4, memory=true) at ../hw/remote/proxy.c:256
#8  0x000055bdeafeff3e in proxy_bar_read (opaque=0x55bdeeb00450, addr=192, size=4) at ../hw/remote/proxy.c:280
#9  0x000055bdeb1f3759 in memory_region_read_accessor (mr=0x55bdeeb00460, addr=192, value=0x7f2d12280750, size=4, shift=0, mask=4294967295, attrs=...) at ../softmmu/memory.c:440
#10 0x000055bdeb1f3c8e in access_with_adjusted_size (addr=192, value=0x7f2d12280750, size=4, access_size_min=1, access_size_max=8, access_fn=0x55bdeb1f3716 <memory_region_read_accessor>, mr=0x55bdeeb00460, attrs=...) at ../softmmu/memory.c:554
#11 0x000055bdeb1f695f in memory_region_dispatch_read1 (mr=0x55bdeeb00460, addr=192, pval=0x7f2d12280750, size=4, attrs=...) at ../softmmu/memory.c:1424
#12 0x000055bdeb1f6a79 in memory_region_dispatch_read (mr=0x55bdeeb00460, addr=192, pval=0x7f2d12280750, op=MO_32, attrs=...) at ../softmmu/memory.c:1457
#13 0x000055bdeb20451a in flatview_read_continue (fv=0x7f2d0c30ef50, addr=4273602752, attrs=..., ptr=0x7f2d9d988028, len=4, addr1=192, l=4, mr=0x55bdeeb00460) at ../softmmu/physmem.c:2881
#14 0x000055bdeb204692 in flatview_read (fv=0x7f2d0c30ef50, addr=4273602752, attrs=..., buf=0x7f2d9d988028, len=4) at ../softmmu/physmem.c:2923
#15 0x000055bdeb20471b in address_space_read_full (as=0x55bdebf705e0 <address_space_memory>, addr=4273602752, attrs=..., buf=0x7f2d9d988028, len=4) at ../softmmu/physmem.c:2936
#16 0x000055bdeb20483f in address_space_rw (as=0x55bdebf705e0 <address_space_memory>, addr=4273602752, attrs=..., buf=0x7f2d9d988028, len=4, is_write=false) at ../softmmu/physmem.c:2964
#17 0x000055bdeb29a60a in kvm_cpu_exec (cpu=0x55bdedcb0410) at ../accel/kvm/kvm-all.c:2929
#18 0x000055bdeb29c3fc in kvm_vcpu_thread_fn (arg=0x55bdedcb0410) at ../accel/kvm/kvm-accel-ops.c:49
#19 0x000055bdeb4872f8 in qemu_thread_start (args=0x55bdedcbf700) at ../util/qemu-thread-posix.c:504
#20 0x00007f2d9b7a5ea5 in start_thread () at /lib64/libpthread.so.0
#21 0x00007f2d9b4ceb0d in clone () at /lib64/libc.so.6

Thread 3 (Thread 0x7f2d13a84700 (LWP 31753)):
#0  0x00007f2d9b7ac54d in __lll_lock_wait () at /lib64/libpthread.so.0
#1  0x00007f2d9b7a7e9b in _L_lock_883 () at /lib64/libpthread.so.0
#2  0x00007f2d9b7a7d68 in pthread_mutex_lock () at /lib64/libpthread.so.0
#3  0x000055bdeb48663f in qemu_mutex_lock_impl (mutex=0x55bdeeb00308, file=0x55bdeb5c5aa0 "/home/qemu-separation/qemu-split/include/qemu/thread.h", line=118) at ../util/qemu-thread-posix.c:88
#4  0x000055bdeafecf00 in qemu_mutex_lock (mutex=0x55bdeeb00308) at /home/qemu-separation/qemu-split/include/qemu/thread.h:118
#5  0x000055bdeafecf4a in qemu_lockable_lock (x=0x7f2d13a83310) at /home/qemu-separation/qemu-split/include/qemu/lockable.h:95
#6  0x000055bdeafecf88 in qemu_lockable_auto_lock (x=0x7f2d13a83310) at /home/qemu-separation/qemu-split/include/qemu/lockable.h:105
#7  0x000055bdeafed90e in mpqemu_msg_send_and_await_reply (msg=0x7f2d13a83490, pdev=0x55bdeeaff8e0, errp=0x7f2d13a83480) at ../hw/remote/mpqemu-link.c:197
#8  0x000055bdeafefe0e in send_bar_access_msg (pdev=0x55bdeeaff8e0, mr=0x55bdeeb00460, write=true, addr=200, val=0x7f2d13a835b8, size=4, memory=true) at ../hw/remote/proxy.c:256
#9  0x000055bdeafefec9 in proxy_bar_write (opaque=0x55bdeeb00450, addr=200, val=4, size=4) at ../hw/remote/proxy.c:271
#10 0x000055bdeb1f3a48 in memory_region_write_accessor (mr=0x55bdeeb00460, addr=200, value=0x7f2d13a836e8, size=4, shift=0, mask=4294967295, attrs=...) at ../softmmu/memory.c:492
#11 0x000055bdeb1f3c8e in access_with_adjusted_size (addr=200, value=0x7f2d13a836e8, size=4, access_size_min=1, access_size_max=8, access_fn=0x55bdeb1f3952 <memory_region_write_accessor>, mr=0x55bdeeb00460, attrs=...) at ../softmmu/memory.c:554
#12 0x000055bdeb1f6d8b in memory_region_dispatch_write (mr=0x55bdeeb00460, addr=200, data=4, op=MO_32, attrs=...) at ../softmmu/memory.c:1514
#13 0x000055bdeb20429f in flatview_write_continue (fv=0x7f2d0c30ef50, addr=4273602760, attrs=..., ptr=0x7f2d9d991028, len=4, addr1=200, l=4, mr=0x55bdeeb00460) at ../softmmu/physmem.c:2814
#14 0x000055bdeb204402 in flatview_write (fv=0x7f2d0c30ef50, addr=4273602760, attrs=..., buf=0x7f2d9d991028, len=4) at ../softmmu/physmem.c:2856
#15 0x000055bdeb2047b2 in address_space_write (as=0x55bdebf705e0 <address_space_memory>, addr=4273602760, attrs=..., buf=0x7f2d9d991028, len=4) at ../softmmu/physmem.c:2952
#16 0x000055bdeb20481f in address_space_rw (as=0x55bdebf705e0 <address_space_memory>, addr=4273602760, attrs=..., buf=0x7f2d9d991028, len=4, is_write=true) at ../softmmu/physmem.c:2962
#17 0x000055bdeb29a60a in kvm_cpu_exec (cpu=0x55bdedc56930) at ../accel/kvm/kvm-all.c:2929
#18 0x000055bdeb29c3fc in kvm_vcpu_thread_fn (arg=0x55bdedc56930) at ../accel/kvm/kvm-accel-ops.c:49
#19 0x000055bdeb4872f8 in qemu_thread_start (args=0x55bdedc64fa0) at ../util/qemu-thread-posix.c:504
#20 0x00007f2d9b7a5ea5 in start_thread () at /lib64/libpthread.so.0
#21 0x00007f2d9b4ceb0d in clone () at /lib64/libc.so.6

> 
> Signed-off-by: Alexander Duyck <alexanderduyck@fb.com>
> ---
> hw/remote/mpqemu-link.c |   25 -------------------------
> 1 file changed, 25 deletions(-)
> 
> diff --git a/hw/remote/mpqemu-link.c b/hw/remote/mpqemu-link.c
> index 9bd98e82197e..3e7818f54a63 100644
> --- a/hw/remote/mpqemu-link.c
> +++ b/hw/remote/mpqemu-link.c
> @@ -33,7 +33,6 @@
>  */
> bool mpqemu_msg_send(MPQemuMsg *msg, QIOChannel *ioc, Error **errp)
> {
> -    bool iolock = qemu_mutex_iothread_locked();
>     bool iothread = qemu_in_iothread();
>     struct iovec send[2] = {};
>     int *fds = NULL;
> @@ -57,16 +56,6 @@ bool mpqemu_msg_send(MPQemuMsg *msg, QIOChannel *ioc, Error **errp)
>      */
>     assert(qemu_in_coroutine() || !iothread);
> 
> -    /*
> -     * Skip unlocking/locking iothread lock when the IOThread is running
> -     * in co-routine context. Co-routine context is asserted above
> -     * for IOThread case.
> -     * Also skip lock handling while in a co-routine in the main context.
> -     */
> -    if (iolock && !iothread && !qemu_in_coroutine()) {
> -        qemu_mutex_unlock_iothread();
> -    }
> -
>     if (!qio_channel_writev_full_all(ioc, send, G_N_ELEMENTS(send),
>                                     fds, nfds, 0, errp)) {
>         ret = true;
> @@ -74,11 +63,6 @@ bool mpqemu_msg_send(MPQemuMsg *msg, QIOChannel *ioc, Error **errp)
>         trace_mpqemu_send_io_error(msg->cmd, msg->size, nfds);
>     }
> 
> -    if (iolock && !iothread && !qemu_in_coroutine()) {
> -        /* See above comment why skip locking here. */
> -        qemu_mutex_lock_iothread();
> -    }
> -
>     return ret;
> }
> 
> @@ -96,7 +80,6 @@ static ssize_t mpqemu_read(QIOChannel *ioc, void *buf, size_t len, int **fds,
>                            size_t *nfds, Error **errp)
> {
>     struct iovec iov = { .iov_base = buf, .iov_len = len };
> -    bool iolock = qemu_mutex_iothread_locked();
>     bool iothread = qemu_in_iothread();
>     int ret = -1;
> 
> @@ -106,16 +89,8 @@ static ssize_t mpqemu_read(QIOChannel *ioc, void *buf, size_t len, int **fds,
>      */
>     assert(qemu_in_coroutine() || !iothread);
> 
> -    if (iolock && !iothread && !qemu_in_coroutine()) {
> -        qemu_mutex_unlock_iothread();
> -    }
> -
>     ret = qio_channel_readv_full_all_eof(ioc, &iov, 1, fds, nfds, errp);
> 
> -    if (iolock && !iothread && !qemu_in_coroutine()) {
> -        qemu_mutex_lock_iothread();
> -    }
> -
>     return (ret <= 0) ? ret : iov.iov_len;
> }
> 
> 
> 
>
Alexander Duyck May 24, 2022, 2:58 p.m. UTC | #2
On Mon, May 23, 2022 at 3:56 PM Jag Raman <jag.raman@oracle.com> wrote:
>
>
>
> > On May 23, 2022, at 11:09 AM, Alexander Duyck <alexander.duyck@gmail.com> wrote:
> >
> > From: Alexander Duyck <alexanderduyck@fb.com>
> >
> > When I run Multi-process QEMU with an e1000 as the remote device and SMP
> > enabled, I see the combination lock up and become unresponsive. The QEMU build
> > is a fairly standard x86_64-softmmu setup. After doing some digging, I tracked
> > the lockup down to what appears to be a race between the mpqemu-link msg_send
> > and msg_receive functions and the reacquisition of the iothread lock.
> >
> > I am assuming the issue is some sort of lock inversion, though I haven't
> > identified exactly which other lock is involved yet. For now, removing
> > the logic to unlock the iothread and then reacquire the lock seems to
> > resolve the issue. I am assuming the release of the lock was some form of
> > optimization, but I am not certain, so I am submitting this as an RFC.
>
> Hi Alexander,
>
> We are working on moving away from Multi-process QEMU to a vfio-user-based
> approach. The vfio-user patches are under review. I believe we would drop
> Multi-process support once vfio-user is merged.
>
> We release the lock here while communicating with the remote process via the
> QIOChannel. It is to prevent lockup of the VM in case the QIOChannel hangs.
>
> I was able to reproduce this issue at my end. There is a deadlock between
> "mpqemu_msg_send() -> qemu_mutex_lock_iothread()" and
> "mpqemu_msg_send_and_await_reply() -> QEMU_LOCK_GUARD(&pdev->io_mutex)”.
>
> From what I can tell, as soon as one vcpu thread drops the iothread lock, another
> thread running mpqemu_msg_send_and_await_reply() holds on to it. That prevents
> the first thread from completing. Attaching backtrace below.
>
> To avoid the deadlock, I think we should drop both the iothread lock and io_mutex
> and reacquire them in the correct order - first iothread and then io_mutex. Given
> multiprocess QEMU would be dropped in the near future, I suppose we don’t have
> to proceed further along these lines.
>
> I tested your patch, and that fixes the e1000 issue at my end also. I believe we
> could adopt it.

Hi Jag,

I will go take a look at the vfio-user patches. I hadn't been
following the list lately, so I didn't realize things were headed in
that direction. It may work out better for me depending on what is
enabled, as I was looking at working with PCIe config and MSI-X anyway,
so if vfio-user already supports those it may save me some work.

What you describe as the issue doesn't sound too surprising; as I had
mentioned, disabling SMP also solved the problem for me. It is likely
due to the fact that the e1000 has separate threads running for
handling stats and such, plus the main thread handling the networking.
I would think we cannot drop the io_mutex, though, as it is needed to
enforce the ordering of the request send and the reply receive.
Rather, if we need to drop the iothread lock, I would think it might
be better to drop it before acquiring the io_mutex, or at least to
wait until after we release the io_mutex before reacquiring it
(roughly the ordering sketched below).
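
A rough standalone sketch of that ordering, with pthread mutexes again
standing in for the iothread lock and pdev->io_mutex (not the actual QEMU
code; names are illustrative):

#include <pthread.h>

static pthread_mutex_t iothread_lock = PTHREAD_MUTEX_INITIALIZER; /* stands in for the BQL */
static pthread_mutex_t io_mutex = PTHREAD_MUTEX_INITIALIZER;      /* stands in for pdev->io_mutex */

/* Stand-in for the blocking send plus reply read on the QIOChannel. */
static void channel_round_trip(void)
{
}

/* Caller (a vCPU thread) is assumed to hold the iothread lock on entry. */
static void send_and_await_reply_sketch(void)
{
    /* Drop the iothread lock first, before touching io_mutex at all. */
    pthread_mutex_unlock(&iothread_lock);

    /* io_mutex alone keeps the request and its reply paired on the channel. */
    pthread_mutex_lock(&io_mutex);
    channel_round_trip();
    pthread_mutex_unlock(&io_mutex);

    /*
     * Only reacquire the iothread lock after io_mutex has been released, so
     * no thread ever waits for the iothread lock while holding io_mutex.
     */
    pthread_mutex_lock(&iothread_lock);
}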

If you want I can resubmit my patch for acceptance, but it sounds like
the code will be deprecated soon so I am not sure there is much point.
I will turn my focus to vfio-user and see if I can get it up and
running for my use case.

Thanks,

- Alex

Patch

diff --git a/hw/remote/mpqemu-link.c b/hw/remote/mpqemu-link.c
index 9bd98e82197e..3e7818f54a63 100644
--- a/hw/remote/mpqemu-link.c
+++ b/hw/remote/mpqemu-link.c
@@ -33,7 +33,6 @@ 
  */
 bool mpqemu_msg_send(MPQemuMsg *msg, QIOChannel *ioc, Error **errp)
 {
-    bool iolock = qemu_mutex_iothread_locked();
     bool iothread = qemu_in_iothread();
     struct iovec send[2] = {};
     int *fds = NULL;
@@ -57,16 +56,6 @@  bool mpqemu_msg_send(MPQemuMsg *msg, QIOChannel *ioc, Error **errp)
      */
     assert(qemu_in_coroutine() || !iothread);
 
-    /*
-     * Skip unlocking/locking iothread lock when the IOThread is running
-     * in co-routine context. Co-routine context is asserted above
-     * for IOThread case.
-     * Also skip lock handling while in a co-routine in the main context.
-     */
-    if (iolock && !iothread && !qemu_in_coroutine()) {
-        qemu_mutex_unlock_iothread();
-    }
-
     if (!qio_channel_writev_full_all(ioc, send, G_N_ELEMENTS(send),
                                     fds, nfds, 0, errp)) {
         ret = true;
@@ -74,11 +63,6 @@  bool mpqemu_msg_send(MPQemuMsg *msg, QIOChannel *ioc, Error **errp)
         trace_mpqemu_send_io_error(msg->cmd, msg->size, nfds);
     }
 
-    if (iolock && !iothread && !qemu_in_coroutine()) {
-        /* See above comment why skip locking here. */
-        qemu_mutex_lock_iothread();
-    }
-
     return ret;
 }
 
@@ -96,7 +80,6 @@  static ssize_t mpqemu_read(QIOChannel *ioc, void *buf, size_t len, int **fds,
                            size_t *nfds, Error **errp)
 {
     struct iovec iov = { .iov_base = buf, .iov_len = len };
-    bool iolock = qemu_mutex_iothread_locked();
     bool iothread = qemu_in_iothread();
     int ret = -1;
 
@@ -106,16 +89,8 @@  static ssize_t mpqemu_read(QIOChannel *ioc, void *buf, size_t len, int **fds,
      */
     assert(qemu_in_coroutine() || !iothread);
 
-    if (iolock && !iothread && !qemu_in_coroutine()) {
-        qemu_mutex_unlock_iothread();
-    }
-
     ret = qio_channel_readv_full_all_eof(ioc, &iov, 1, fds, nfds, errp);
 
-    if (iolock && !iothread && !qemu_in_coroutine()) {
-        qemu_mutex_lock_iothread();
-    }
-
     return (ret <= 0) ? ret : iov.iov_len;
 }