Message ID | 20231205182011.1976568-5-stefanha@redhat.com (mailing list archive) |
---|---|
State | New, archived |
Headers | show |
Series | aio: remove AioContext lock | expand |
Am 05.12.2023 um 19:20 hat Stefan Hajnoczi geschrieben: > aio_context_acquire()/aio_context_release() has been replaced by > fine-grained locking to protect state shared by multiple threads. The > AioContext lock still plays the role of balancing locking in > AIO_WAIT_WHILE() and many functions in QEMU either require that the > AioContext lock is held or not held for this reason. In other words, the > AioContext lock is purely there for consistency with itself and serves > no real purpose anymore. > > Stop actually acquiring/releasing the lock in > aio_context_acquire()/aio_context_release() so that subsequent patches > can remove callers across the codebase incrementally. > > I have performed "make check" and qemu-iotests stress tests across > x86-64, ppc64le, and aarch64 to confirm that there are no failures as a > result of eliminating the lock. > > Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com> > Reviewed-by: Eric Blake <eblake@redhat.com> > Acked-by: Kevin Wolf <kwolf@redhat.com> I knew why I wasn't confident enough to give a R-b... This crashes qemu-storage-daemon in the qemu-iotests case graph-changes-while-io. qemu-storage-daemon: ../nbd/server.c:2542: nbd_co_receive_request: Assertion `client->recv_coroutine == qemu_coroutine_self()' failed. (gdb) bt #0 0x00007fdb00529884 in __pthread_kill_implementation () from /lib64/libc.so.6 #1 0x00007fdb004d8afe in raise () from /lib64/libc.so.6 #2 0x00007fdb004c187f in abort () from /lib64/libc.so.6 #3 0x00007fdb004c179b in __assert_fail_base.cold () from /lib64/libc.so.6 #4 0x00007fdb004d1187 in __assert_fail () from /lib64/libc.so.6 #5 0x0000557f9f9534eb in nbd_co_receive_request (errp=0x7fdafc25eec0, request=0x7fdafc25ef10, req=0x7fdaf00159c0) at ../nbd/server.c:2542 #6 nbd_trip (opaque=0x557fa0b33fa0) at ../nbd/server.c:2962 #7 0x0000557f9faa416b in coroutine_trampoline (i0=<optimized out>, i1=<optimized out>) at ../util/coroutine-ucontext.c:177 #8 0x00007fdb004efe90 in ?? () from /lib64/libc.so.6 #9 0x00007fdafc35f680 in ?? () #10 0x0000000000000000 in ?? () (gdb) p *client $2 = {refcount = 4, close_fn = 0x557f9f95dc40 <nbd_blockdev_client_closed>, exp = 0x557fa0b30590, tlscreds = 0x0, tlsauthz = 0x0, sioc = 0x557fa0b33d90, ioc = 0x557fa0b33d90, recv_coroutine = 0x7fdaf0015eb0, send_lock = {locked = 0, ctx = 0x0, from_push = {slh_first = 0x0}, to_pop = {slh_first = 0x0}, handoff = 0, sequence = 0, holder = 0x0}, send_coroutine = 0x0, read_yielding = false, quiescing = false, next = {tqe_next = 0x0, tqe_circ = {tql_next = 0x0, tql_prev = 0x557fa0b305e8}}, nb_requests = 1, closing = false, check_align = 1, mode = NBD_MODE_EXTENDED, contexts = {exp = 0x557fa0b30590, count = 1, base_allocation = true, allocation_depth = false, bitmaps = 0x0}, opt = 7, optlen = 0} (gdb) p co_tls_current $3 = (Coroutine *) 0x7fdaf00061d0 Kevin
Am 19.12.2023 um 16:28 hat Kevin Wolf geschrieben: > Am 05.12.2023 um 19:20 hat Stefan Hajnoczi geschrieben: > > aio_context_acquire()/aio_context_release() has been replaced by > > fine-grained locking to protect state shared by multiple threads. The > > AioContext lock still plays the role of balancing locking in > > AIO_WAIT_WHILE() and many functions in QEMU either require that the > > AioContext lock is held or not held for this reason. In other words, the > > AioContext lock is purely there for consistency with itself and serves > > no real purpose anymore. > > > > Stop actually acquiring/releasing the lock in > > aio_context_acquire()/aio_context_release() so that subsequent patches > > can remove callers across the codebase incrementally. > > > > I have performed "make check" and qemu-iotests stress tests across > > x86-64, ppc64le, and aarch64 to confirm that there are no failures as a > > result of eliminating the lock. > > > > Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com> > > Reviewed-by: Eric Blake <eblake@redhat.com> > > Acked-by: Kevin Wolf <kwolf@redhat.com> > > I knew why I wasn't confident enough to give a R-b... This crashes > qemu-storage-daemon in the qemu-iotests case graph-changes-while-io. > > qemu-storage-daemon: ../nbd/server.c:2542: nbd_co_receive_request: Assertion `client->recv_coroutine == qemu_coroutine_self()' failed. > > (gdb) bt > #0 0x00007fdb00529884 in __pthread_kill_implementation () from /lib64/libc.so.6 > #1 0x00007fdb004d8afe in raise () from /lib64/libc.so.6 > #2 0x00007fdb004c187f in abort () from /lib64/libc.so.6 > #3 0x00007fdb004c179b in __assert_fail_base.cold () from /lib64/libc.so.6 > #4 0x00007fdb004d1187 in __assert_fail () from /lib64/libc.so.6 > #5 0x0000557f9f9534eb in nbd_co_receive_request (errp=0x7fdafc25eec0, request=0x7fdafc25ef10, req=0x7fdaf00159c0) at ../nbd/server.c:2542 > #6 nbd_trip (opaque=0x557fa0b33fa0) at ../nbd/server.c:2962 > #7 0x0000557f9faa416b in coroutine_trampoline (i0=<optimized out>, i1=<optimized out>) at ../util/coroutine-ucontext.c:177 > #8 0x00007fdb004efe90 in ?? () from /lib64/libc.so.6 > #9 0x00007fdafc35f680 in ?? () > #10 0x0000000000000000 in ?? () > (gdb) p *client > $2 = {refcount = 4, close_fn = 0x557f9f95dc40 <nbd_blockdev_client_closed>, exp = 0x557fa0b30590, tlscreds = 0x0, tlsauthz = 0x0, sioc = 0x557fa0b33d90, ioc = 0x557fa0b33d90, > recv_coroutine = 0x7fdaf0015eb0, send_lock = {locked = 0, ctx = 0x0, from_push = {slh_first = 0x0}, to_pop = {slh_first = 0x0}, handoff = 0, sequence = 0, holder = 0x0}, > send_coroutine = 0x0, read_yielding = false, quiescing = false, next = {tqe_next = 0x0, tqe_circ = {tql_next = 0x0, tql_prev = 0x557fa0b305e8}}, nb_requests = 1, closing = false, > check_align = 1, mode = NBD_MODE_EXTENDED, contexts = {exp = 0x557fa0b30590, count = 1, base_allocation = true, allocation_depth = false, bitmaps = 0x0}, opt = 7, optlen = 0} > (gdb) p co_tls_current > $3 = (Coroutine *) 0x7fdaf00061d0 This one isn't easy to debug... The first problem here is that two nbd_trip() coroutines are scheduled in the same iothread, and creating the second one overwrites client->recv_coroutine, which triggers the assertion in the first one. This can be fixed by introducing a new mutex in NBDClient and taking it in nbd_client_receive_next_request() so that there is no race between checking client->recv_coroutine != NULL and setting it to a new coroutine. (Not entirely sure why two different threads are doing this, maybe the main thread reentering in drained_end and the iothread waiting for the next request?) However, I'm seeing new assertion failures when I do that: client->quiescing isn't set in the -EAGAIN case in nbd_trip(). I haven't really figured out yet where this comes from. Taking the new NBDClient lock in the drain functions and in nbd_trip() doesn't seem to be enough to fix it anyway. Or maybe I didn't quite find the right places to take it. I'm not sure if the list of NBD clients needs a lock, too, or if it is only ever accessed in the main thread, but that should be unrelated to the assertion failures I'm seeing. Kevin
The following hack makes the test pass but there are larger safety issues that I'll need to look at on Wednesday: diff --git a/nbd/server.c b/nbd/server.c index 895cf0a752..cf4b7d5c6d 100644 --- a/nbd/server.c +++ b/nbd/server.c @@ -1617,7 +1617,7 @@ static void nbd_drained_begin(void *opaque) } } -static void nbd_drained_end(void *opaque) +static void nbd_resume_clients(void *opaque) { NBDExport *exp = opaque; NBDClient *client; @@ -1628,6 +1628,15 @@ static void nbd_drained_end(void *opaque) } } +static void nbd_drained_end(void *opaque) +{ + NBDExport *exp = opaque; + + /* TODO how to make sure exp doesn't go away? */ + /* TODO what if AioContext changes before this runs? */ + aio_bh_schedule_oneshot(nbd_export_aio_context(exp), nbd_resume_clients, exp); +} + static bool nbd_drained_poll(void *opaque) { NBDExport *exp = opaque;
Am 19.12.2023 um 22:23 hat Stefan Hajnoczi geschrieben: > The following hack makes the test pass but there are larger safety > issues that I'll need to look at on Wednesday: I see, you're taking the same approach as in the SCSI layer: Don't make things thread-safe, but just always access them from the same thread. In theory this should be okay, but I'm almost sure that at least nbd_drained_poll() must then run in the same AioContext, too. > diff --git a/nbd/server.c b/nbd/server.c > index 895cf0a752..cf4b7d5c6d 100644 > --- a/nbd/server.c > +++ b/nbd/server.c > @@ -1617,7 +1617,7 @@ static void nbd_drained_begin(void *opaque) > } > } > > -static void nbd_drained_end(void *opaque) > +static void nbd_resume_clients(void *opaque) > { > NBDExport *exp = opaque; > NBDClient *client; > @@ -1628,6 +1628,15 @@ static void nbd_drained_end(void *opaque) > } > } > > +static void nbd_drained_end(void *opaque) > +{ > + NBDExport *exp = opaque; > + > + /* TODO how to make sure exp doesn't go away? */ blk_exp_ref()? > + /* TODO what if AioContext changes before this runs? */ > + aio_bh_schedule_oneshot(nbd_export_aio_context(exp), > nbd_resume_clients, exp); We could increase client->nb_requests if we change it to be accessed atomically. Then nbd_drained_poll() will make any AioContext change wait for the BH. Or maybe aio_wait_bh_oneshot() would already solve both problems? > +} > + Kevin
On Wed, 20 Dec 2023 at 04:32, Kevin Wolf <kwolf@redhat.com> wrote: > > Am 19.12.2023 um 22:23 hat Stefan Hajnoczi geschrieben: > > The following hack makes the test pass but there are larger safety > > issues that I'll need to look at on Wednesday: > > I see, you're taking the same approach as in the SCSI layer: Don't make > things thread-safe, but just always access them from the same thread. Yes, but it feels like a hack to me. You pointed out that other parts also don't look thread-safe (e.g. the clients list) and I agree. I've started annotating the code and will try to come up with a full fix today. Stefan
On Tue, 19 Dec 2023 at 13:20, Kevin Wolf <kwolf@redhat.com> wrote: > > Am 19.12.2023 um 16:28 hat Kevin Wolf geschrieben: > > Am 05.12.2023 um 19:20 hat Stefan Hajnoczi geschrieben: > > > aio_context_acquire()/aio_context_release() has been replaced by > > > fine-grained locking to protect state shared by multiple threads. The > > > AioContext lock still plays the role of balancing locking in > > > AIO_WAIT_WHILE() and many functions in QEMU either require that the > > > AioContext lock is held or not held for this reason. In other words, the > > > AioContext lock is purely there for consistency with itself and serves > > > no real purpose anymore. > > > > > > Stop actually acquiring/releasing the lock in > > > aio_context_acquire()/aio_context_release() so that subsequent patches > > > can remove callers across the codebase incrementally. > > > > > > I have performed "make check" and qemu-iotests stress tests across > > > x86-64, ppc64le, and aarch64 to confirm that there are no failures as a > > > result of eliminating the lock. > > > > > > Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com> > > > Reviewed-by: Eric Blake <eblake@redhat.com> > > > Acked-by: Kevin Wolf <kwolf@redhat.com> > > > > I knew why I wasn't confident enough to give a R-b... This crashes > > qemu-storage-daemon in the qemu-iotests case graph-changes-while-io. > > > > qemu-storage-daemon: ../nbd/server.c:2542: nbd_co_receive_request: Assertion `client->recv_coroutine == qemu_coroutine_self()' failed. > > > > (gdb) bt > > #0 0x00007fdb00529884 in __pthread_kill_implementation () from /lib64/libc.so.6 > > #1 0x00007fdb004d8afe in raise () from /lib64/libc.so.6 > > #2 0x00007fdb004c187f in abort () from /lib64/libc.so.6 > > #3 0x00007fdb004c179b in __assert_fail_base.cold () from /lib64/libc.so.6 > > #4 0x00007fdb004d1187 in __assert_fail () from /lib64/libc.so.6 > > #5 0x0000557f9f9534eb in nbd_co_receive_request (errp=0x7fdafc25eec0, request=0x7fdafc25ef10, req=0x7fdaf00159c0) at ../nbd/server.c:2542 > > #6 nbd_trip (opaque=0x557fa0b33fa0) at ../nbd/server.c:2962 > > #7 0x0000557f9faa416b in coroutine_trampoline (i0=<optimized out>, i1=<optimized out>) at ../util/coroutine-ucontext.c:177 > > #8 0x00007fdb004efe90 in ?? () from /lib64/libc.so.6 > > #9 0x00007fdafc35f680 in ?? () > > #10 0x0000000000000000 in ?? () > > (gdb) p *client > > $2 = {refcount = 4, close_fn = 0x557f9f95dc40 <nbd_blockdev_client_closed>, exp = 0x557fa0b30590, tlscreds = 0x0, tlsauthz = 0x0, sioc = 0x557fa0b33d90, ioc = 0x557fa0b33d90, > > recv_coroutine = 0x7fdaf0015eb0, send_lock = {locked = 0, ctx = 0x0, from_push = {slh_first = 0x0}, to_pop = {slh_first = 0x0}, handoff = 0, sequence = 0, holder = 0x0}, > > send_coroutine = 0x0, read_yielding = false, quiescing = false, next = {tqe_next = 0x0, tqe_circ = {tql_next = 0x0, tql_prev = 0x557fa0b305e8}}, nb_requests = 1, closing = false, > > check_align = 1, mode = NBD_MODE_EXTENDED, contexts = {exp = 0x557fa0b30590, count = 1, base_allocation = true, allocation_depth = false, bitmaps = 0x0}, opt = 7, optlen = 0} > > (gdb) p co_tls_current > > $3 = (Coroutine *) 0x7fdaf00061d0 > > This one isn't easy to debug... > > The first problem here is that two nbd_trip() coroutines are scheduled > in the same iothread, and creating the second one overwrites > client->recv_coroutine, which triggers the assertion in the first one. > > This can be fixed by introducing a new mutex in NBDClient and taking it > in nbd_client_receive_next_request() so that there is no race between > checking client->recv_coroutine != NULL and setting it to a new > coroutine. (Not entirely sure why two different threads are doing this, > maybe the main thread reentering in drained_end and the iothread waiting > for the next request?) > > However, I'm seeing new assertion failures when I do that: > client->quiescing isn't set in the -EAGAIN case in nbd_trip(). I haven't > really figured out yet where this comes from. Taking the new NBDClient > lock in the drain functions and in nbd_trip() doesn't seem to be enough > to fix it anyway. Or maybe I didn't quite find the right places to take > it. bdrv_graph_wrlock() -> bdrv_drain_all_begin_nopoll() followed by bdrv_drain_all_end() causes this issue. It's a race condition where nbd_trip() in the IOThread sees client->quiescing == true for a moment but then the drained region ends before nbd_trip() re-acquires the lock and reaches assert(client->quiescing). The "nopoll" part of bdrv_drain_all_begin_nopoll() seems to be the issue. We cannot assume all requests have quiesced when .drained_end() is called. I'm running more tests now to be sure I have a working solution. Will send patches soon. Stefan
diff --git a/util/async.c b/util/async.c index 8f90ddc304..04ee83d220 100644 --- a/util/async.c +++ b/util/async.c @@ -725,12 +725,12 @@ void aio_context_unref(AioContext *ctx) void aio_context_acquire(AioContext *ctx) { - qemu_rec_mutex_lock(&ctx->lock); + /* TODO remove this function */ } void aio_context_release(AioContext *ctx) { - qemu_rec_mutex_unlock(&ctx->lock); + /* TODO remove this function */ } QEMU_DEFINE_STATIC_CO_TLS(AioContext *, my_aiocontext)