Message ID | 1452870739-28484-6-git-send-email-pbonzini@redhat.com (mailing list archive) |
---|---|
State | New, archived |
On Fri, Jan 15, 2016 at 04:12:08PM +0100, Paolo Bonzini wrote:
> @@ -120,9 +117,17 @@ Block layer code must therefore expect to run in an IOThread and avoid using
>  old APIs that implicitly use the main loop.  See the "How to program for
>  IOThreads" above for information on how to do that.
> 
> -If main loop code such as a QMP function wishes to access a BlockDriverState it
> -must first call aio_context_acquire(bdrv_get_aio_context(bs)) to ensure the
> -IOThread does not run in parallel.
> +If main loop code such as a QMP function wishes to access a BlockDriverState
> +it must first call aio_context_acquire(bdrv_get_aio_context(bs)) to ensure
> +that callbacks in the IOThread do not run in parallel.
> +
> +Code running in the monitor typically uses bdrv_drain() to ensure that
> +past requests from the guest are completed.  When a block device is
> +running in an IOThread, the IOThread can also process requests from
> +the guest (via ioeventfd).  These requests have to be blocked with
> +aio_disable_clients() before calling bdrv_drain().  You can then reenable
> +guest requests with aio_enable_clients() before finally releasing the
> +AioContext and completing the monitor command.

This is outdated.  Monitor commands do this:

  bdrv_drained_begin() -> ... -> bdrv_drain() -> ... -> bdrv_drained_end()

> @@ -110,6 +111,8 @@ static void *test_acquire_thread(void *opaque)
>      qemu_mutex_lock(&data->start_lock);
>      qemu_mutex_unlock(&data->start_lock);
> 
> +    g_usleep(500000);

Sleep?
On 02/02/2016 15:52, Stefan Hajnoczi wrote:
>>> @@ -110,6 +111,8 @@ static void *test_acquire_thread(void *opaque)
>>>     qemu_mutex_lock(&data->start_lock);
>>>     qemu_mutex_unlock(&data->start_lock);
>>>
>>> +   g_usleep(500000);
>
> Sleep?

What about it? :)

Paolo
On Tue, Feb 02, 2016 at 04:01:55PM +0100, Paolo Bonzini wrote:
> On 02/02/2016 15:52, Stefan Hajnoczi wrote:
> >>> @@ -110,6 +111,8 @@ static void *test_acquire_thread(void *opaque)
> >>>     qemu_mutex_lock(&data->start_lock);
> >>>     qemu_mutex_unlock(&data->start_lock);
> >>>
> >>> +   g_usleep(500000);
> > Sleep?
>
> What about it? :)

Sleep in a loop is inefficient but at least correct.

A sleep outside a loop is a race condition.  If the machine is heavily
loaded, whatever you are waiting for may not happen in 500 ms and the
test case is unreliable.

What is the purpose of this sleep?

Stefan
On 03/02/2016 10:34, Stefan Hajnoczi wrote:
>>>>>>>>> +    g_usleep(500000);
>>>>> Sleep?
>>>
>>> What about it? :)
> Sleep in a loop is inefficient but at least correct.
>
> A sleep outside a loop is a race condition.  If the machine is
> heavily loaded, whatever you are waiting for may not happen in 500
> ms and the test case is unreliable.
>
> What is the purpose of this sleep?

In the test the desired ordering of events is

    main thread                      additional thread
    -----------------------------------------------------------------
    lock start_lock
    start thread
                                     lock start_lock (waits)
    acquire context
    unlock start_lock
                                     lock start_lock (returns)
                                     unlock start_lock
    aio_poll
        poll()               <<<<
                                     event_notifier_set      <<<<

Comparing the test to QEMU, the roles of the threads are reversed.  The
test's "main thread" is QEMU's dataplane thread, and the test's
additional thread is QEMU's main thread.

The sleep "ensures" that poll() blocks before event_notifier_set.  The
sleep represents how QEMU's main thread acquires the context rarely,
and not right after starting the dataplane thread.

Because this is a file descriptor source, there is really no difference
in the code's behavior whether aio_poll starts before or after the
event_notifier_set.  The test passes even if you remove the sleep.

Do you have any suggestion?  Just add a comment?

Paolo
On Wed, Feb 03, 2016 at 10:52:38AM +0100, Paolo Bonzini wrote:
> On 03/02/2016 10:34, Stefan Hajnoczi wrote:
> >>>>>>>>> +    g_usleep(500000);
> >>>>> Sleep?
> >>>
> >>> What about it? :)
> > Sleep in a loop is inefficient but at least correct.
> >
> > A sleep outside a loop is a race condition.  If the machine is
> > heavily loaded, whatever you are waiting for may not happen in 500
> > ms and the test case is unreliable.
> >
> > What is the purpose of this sleep?
>
> In the test the desired ordering of events is
>
>     main thread                      additional thread
>     -----------------------------------------------------------------
>     lock start_lock
>     start thread
>                                      lock start_lock (waits)
>     acquire context
>     unlock start_lock
>                                      lock start_lock (returns)
>                                      unlock start_lock
>     aio_poll
>         poll()               <<<<
>                                      event_notifier_set      <<<<
>
> Comparing the test to QEMU, the roles of the threads are reversed.  The
> test's "main thread" is QEMU's dataplane thread, and the test's
> additional thread is QEMU's main thread.
>
> The sleep "ensures" that poll() blocks before event_notifier_set.  The
> sleep represents how QEMU's main thread acquires the context rarely,
> and not right after starting the dataplane thread.
>
> Because this is a file descriptor source, there is really no difference
> in the code's behavior whether aio_poll starts before or after the
> event_notifier_set.  The test passes even if you remove the sleep.
>
> Do you have any suggestion?  Just add a comment?

How about removing the sleep and adding a comment explaining that the
event_notifier_set() could happen either before or during poll()?

Stefan
On 05/02/2016 18:51, Stefan Hajnoczi wrote:
>> Because this is a file descriptor source, there is really no difference
>> in the code's behavior whether aio_poll starts before or after the
>> event_notifier_set.  The test passes even if you remove the sleep.
>>
>> Do you have any suggestion?  Just add a comment?
>
> How about removing the sleep and adding a comment explaining that
> the event_notifier_set() could happen either before or during
> poll()?

Okay.

Paolo
diff --git a/async.c b/async.c
index 29f2845..e4ef8b8 100644
--- a/async.c
+++ b/async.c
@@ -84,8 +84,8 @@ int aio_bh_poll(AioContext *ctx)
          * aio_notify again if necessary.
          */
         if (!bh->deleted && atomic_xchg(&bh->scheduled, 0)) {
-            /* Idle BHs and the notify BH don't count as progress */
-            if (!bh->idle && bh != ctx->notify_dummy_bh) {
+            /* Idle BHs don't count as progress */
+            if (!bh->idle) {
                 ret = 1;
             }
             bh->idle = 0;
@@ -237,7 +237,6 @@ aio_ctx_finalize(GSource *source)
 {
     AioContext *ctx = (AioContext *) source;
 
-    qemu_bh_delete(ctx->notify_dummy_bh);
     thread_pool_free(ctx->thread_pool);
 
     qemu_mutex_lock(&ctx->bh_lock);
@@ -522,19 +521,6 @@ static void aio_timerlist_notify(void *opaque)
     aio_notify(opaque);
 }
 
-static void aio_rfifolock_cb(void *opaque)
-{
-    AioContext *ctx = opaque;
-
-    /* Kick owner thread in case they are blocked in aio_poll() */
-    qemu_bh_schedule(ctx->notify_dummy_bh);
-}
-
-static void notify_dummy_bh(void *opaque)
-{
-    /* Do nothing, we were invoked just to force the event loop to iterate */
-}
-
 static void event_notifier_dummy_cb(EventNotifier *e)
 {
 }
@@ -563,10 +549,10 @@ AioContext *aio_context_new(Error **errp)
                            event_notifier_dummy_cb);
     ctx->thread_pool = NULL;
     qemu_mutex_init(&ctx->bh_lock);
-    rfifolock_init(&ctx->lock, aio_rfifolock_cb, ctx);
+    rfifolock_init(&ctx->lock, NULL, NULL);
     timerlistgroup_init(&ctx->tlg, aio_timerlist_notify, ctx);
 
-    ctx->notify_dummy_bh = aio_bh_new(ctx, notify_dummy_bh, NULL);
+    qemu_event_init(&ctx->sync_io_event, true);
 
     return ctx;
 fail:
diff --git a/docs/multiple-iothreads.txt b/docs/multiple-iothreads.txt
index ac3e74c..22a89b2 100644
--- a/docs/multiple-iothreads.txt
+++ b/docs/multiple-iothreads.txt
@@ -103,13 +103,10 @@ a BH in the target AioContext beforehand and then call qemu_bh_schedule().  No
 acquire/release or locking is needed for the qemu_bh_schedule() call.  But be
 sure to acquire the AioContext for aio_bh_new() if necessary.
 
-The relationship between AioContext and the block layer
--------------------------------------------------------
-The AioContext originates from the QEMU block layer because it provides a
-scoped way of running event loop iterations until all work is done.  This
-feature is used to complete all in-flight block I/O requests (see
-bdrv_drain_all()).  Nowadays AioContext is a generic event loop that can be
-used by any QEMU subsystem.
+AioContext and the block layer
+------------------------------
+The AioContext originates from the QEMU block layer, even though nowadays
+AioContext is a generic event loop that can be used by any QEMU subsystem.
 
 The block layer has support for AioContext integrated.  Each BlockDriverState
 is associated with an AioContext using bdrv_set_aio_context() and
@@ -120,9 +117,17 @@ Block layer code must therefore expect to run in an IOThread and avoid using
 old APIs that implicitly use the main loop.  See the "How to program for
 IOThreads" above for information on how to do that.
 
-If main loop code such as a QMP function wishes to access a BlockDriverState it
-must first call aio_context_acquire(bdrv_get_aio_context(bs)) to ensure the
-IOThread does not run in parallel.
+If main loop code such as a QMP function wishes to access a BlockDriverState
+it must first call aio_context_acquire(bdrv_get_aio_context(bs)) to ensure
+that callbacks in the IOThread do not run in parallel.
+
+Code running in the monitor typically uses bdrv_drain() to ensure that
+past requests from the guest are completed.  When a block device is
+running in an IOThread, the IOThread can also process requests from
+the guest (via ioeventfd).  These requests have to be blocked with
+aio_disable_clients() before calling bdrv_drain().  You can then reenable
+guest requests with aio_enable_clients() before finally releasing the
+AioContext and completing the monitor command.
 
 Long-running jobs (usually in the form of coroutines) are best scheduled in
 the BlockDriverState's AioContext to avoid the need to acquire/release around
 each
diff --git a/include/block/aio.h b/include/block/aio.h
index 191bd3e..7223daf 100644
--- a/include/block/aio.h
+++ b/include/block/aio.h
@@ -114,9 +114,6 @@ struct AioContext {
     bool notified;
     EventNotifier notifier;
 
-    /* Scheduling this BH forces the event loop it iterate */
-    QEMUBH *notify_dummy_bh;
-
     /* Thread pool for performing work and receiving completion callbacks */
     struct ThreadPool *thread_pool;
 
diff --git a/iothread.c b/iothread.c
index 1150912..14a12cd 100644
--- a/iothread.c
+++ b/iothread.c
@@ -38,7 +38,6 @@ bool aio_context_in_iothread(AioContext *ctx)
 static void *iothread_run(void *opaque)
 {
     IOThread *iothread = opaque;
-    bool blocking;
 
     rcu_register_thread();
 
@@ -48,14 +47,8 @@ static void *iothread_run(void *opaque)
     qemu_cond_signal(&iothread->init_done_cond);
     qemu_mutex_unlock(&iothread->init_done_lock);
 
-    while (!iothread->stopping) {
-        aio_context_acquire(iothread->ctx);
-        blocking = true;
-        while (!iothread->stopping && aio_poll(iothread->ctx, blocking)) {
-            /* Progress was made, keep going */
-            blocking = false;
-        }
-        aio_context_release(iothread->ctx);
+    while (!atomic_read(&iothread->stopping)) {
+        aio_poll(iothread->ctx, true);
     }
 
     rcu_unregister_thread();
diff --git a/tests/test-aio.c b/tests/test-aio.c
index 6ccea98..41e30f7 100644
--- a/tests/test-aio.c
+++ b/tests/test-aio.c
@@ -99,6 +99,7 @@ static void event_ready_cb(EventNotifier *e)
 
 typedef struct {
     QemuMutex start_lock;
+    EventNotifier notifier;
     bool thread_acquired;
 } AcquireTestData;
 
@@ -110,6 +111,8 @@ static void *test_acquire_thread(void *opaque)
     qemu_mutex_lock(&data->start_lock);
     qemu_mutex_unlock(&data->start_lock);
 
+    g_usleep(500000);
+    event_notifier_set(&data->notifier);
     aio_context_acquire(ctx);
     aio_context_release(ctx);
 
@@ -124,20 +127,19 @@ static void set_event_notifier(AioContext *ctx, EventNotifier *notifier,
     aio_set_event_notifier(ctx, notifier, false, handler);
 }
 
-static void dummy_notifier_read(EventNotifier *unused)
+static void dummy_notifier_read(EventNotifier *n)
 {
-    g_assert(false); /* should never be invoked */
+    event_notifier_test_and_clear(n);
 }
 
 static void test_acquire(void)
 {
     QemuThread thread;
-    EventNotifier notifier;
     AcquireTestData data;
 
     /* Dummy event notifier ensures aio_poll() will block */
-    event_notifier_init(&notifier, false);
-    set_event_notifier(ctx, &notifier, dummy_notifier_read);
+    event_notifier_init(&data.notifier, false);
+    set_event_notifier(ctx, &data.notifier, dummy_notifier_read);
     g_assert(!aio_poll(ctx, false)); /* consume aio_notify() */
 
     qemu_mutex_init(&data.start_lock);
@@ -151,12 +153,13 @@ static void test_acquire(void)
     /* Block in aio_poll(), let other thread kick us and acquire context */
     aio_context_acquire(ctx);
     qemu_mutex_unlock(&data.start_lock); /* let the thread run */
-    g_assert(!aio_poll(ctx, true));
+    g_assert(aio_poll(ctx, true));
+    g_assert(!data.thread_acquired);
     aio_context_release(ctx);
 
     qemu_thread_join(&thread);
-    set_event_notifier(ctx, &notifier, NULL);
-    event_notifier_cleanup(&notifier);
+    set_event_notifier(ctx, &data.notifier, NULL);
+    event_notifier_cleanup(&data.notifier);
 
     g_assert(data.thread_acquired);
 }
This is the first step towards having fine-grained critical sections in
dataplane threads, which resolves lock ordering problems between
address_space_* functions (which need the BQL when doing MMIO, even
after we complete RCU-based dispatch) and the AioContext.

Because AioContext does not use contention callbacks anymore, the unit
test has to be changed.

Previously applied as a0710f7995f914e3044e5899bd8ff6c43c62f916 and
then reverted.

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
---
 async.c                     | 22 ++++------------------
 docs/multiple-iothreads.txt | 25 +++++++++++++++----------
 include/block/aio.h         |  3 ---
 iothread.c                  | 11 ++---------
 tests/test-aio.c            | 19 +++++++++++--------
 5 files changed, 32 insertions(+), 48 deletions(-)