Message ID | 1454033989-16996-1-git-send-email-famz@redhat.com (mailing list archive) |
---|---|
State | New, archived |
On Fri, Jan 29, 2016 at 10:19:49AM +0800, Fam Zheng wrote:
> @@ -402,6 +407,10 @@ typedef void BlockJobDeferToMainLoopFn(BlockJob *job, void *opaque);
>   * AioContext acquired. Block jobs must call bdrv_unref(), bdrv_close(), and
>   * anything that uses bdrv_drain_all() in the main loop.
>   *
> + * The job->deferred_to_main_loop flag will be set. Caller must clear it once
> + * the deferred work is done and the block job coroutine continues, unless it's
> + * completing immediately.
> + *

It's not necessary to expose job->deferred_to_main_loop to the user.
Just clear it:

static void block_job_defer_to_main_loop_bh(void *opaque)
{
    BlockJobDeferToMainLoopData *data = opaque;
    AioContext *aio_context;

    qemu_bh_delete(data->bh);

    /* Prevent race with block_job_defer_to_main_loop() */
    aio_context_acquire(data->aio_context);

    /* Fetch BDS AioContext again, in case it has changed */
    aio_context = bdrv_get_aio_context(data->job->bs);
    aio_context_acquire(aio_context);

    data->fn(data->job, data->opaque);
    job->deferred_to_main_loop = false; /* <----- HERE */

    aio_context_release(aio_context);

    aio_context_release(data->aio_context);

    g_free(data);
}
On Fri, 01/29 11:31, Stefan Hajnoczi wrote:
> On Fri, Jan 29, 2016 at 10:19:49AM +0800, Fam Zheng wrote:
> > @@ -402,6 +407,10 @@ typedef void BlockJobDeferToMainLoopFn(BlockJob *job, void *opaque);
> >   * AioContext acquired. Block jobs must call bdrv_unref(), bdrv_close(), and
> >   * anything that uses bdrv_drain_all() in the main loop.
> >   *
> > + * The job->deferred_to_main_loop flag will be set. Caller must clear it once
> > + * the deferred work is done and the block job coroutine continues, unless it's
> > + * completing immediately.
> > + *
>
> It's not necessary to expose job->deferred_to_main_loop to the user.
> Just clear it:
>
> static void block_job_defer_to_main_loop_bh(void *opaque)
> {
>     BlockJobDeferToMainLoopData *data = opaque;
>     AioContext *aio_context;
>
>     qemu_bh_delete(data->bh);
>
>     /* Prevent race with block_job_defer_to_main_loop() */
>     aio_context_acquire(data->aio_context);
>
>     /* Fetch BDS AioContext again, in case it has changed */
>     aio_context = bdrv_get_aio_context(data->job->bs);
>     aio_context_acquire(aio_context);
>
>     data->fn(data->job, data->opaque);
>     job->deferred_to_main_loop = false; /* <----- HERE */

Maybe move one line above in case data->fn() does another
block_job_defer_to_main_loop()?

Fam

>
>     aio_context_release(aio_context);
>
>     aio_context_release(data->aio_context);
>
>     g_free(data);
> }
On Mon, Feb 01, 2016 at 10:49:00AM +0800, Fam Zheng wrote:
> On Fri, 01/29 11:31, Stefan Hajnoczi wrote:
> > On Fri, Jan 29, 2016 at 10:19:49AM +0800, Fam Zheng wrote:
> > > @@ -402,6 +407,10 @@ typedef void BlockJobDeferToMainLoopFn(BlockJob *job, void *opaque);
> > >   * AioContext acquired. Block jobs must call bdrv_unref(), bdrv_close(), and
> > >   * anything that uses bdrv_drain_all() in the main loop.
> > >   *
> > > + * The job->deferred_to_main_loop flag will be set. Caller must clear it once
> > > + * the deferred work is done and the block job coroutine continues, unless it's
> > > + * completing immediately.
> > > + *
> >
> > It's not necessary to expose job->deferred_to_main_loop to the user.
> > Just clear it:
> >
> > static void block_job_defer_to_main_loop_bh(void *opaque)
> > {
> >     BlockJobDeferToMainLoopData *data = opaque;
> >     AioContext *aio_context;
> >
> >     qemu_bh_delete(data->bh);
> >
> >     /* Prevent race with block_job_defer_to_main_loop() */
> >     aio_context_acquire(data->aio_context);
> >
> >     /* Fetch BDS AioContext again, in case it has changed */
> >     aio_context = bdrv_get_aio_context(data->job->bs);
> >     aio_context_acquire(aio_context);
> >
> >     data->fn(data->job, data->opaque);
> >     job->deferred_to_main_loop = false; /* <----- HERE */
>
> Maybe move one line above in case data->fn() does another
> block_job_defer_to_main_loop()?

Yes, good point. Thanks for spotting the bug. It's safe to clear the
boolean as soon as we acquire aio_context.

Stefan
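The ordering issue raised in this exchange can be illustrated with a small standalone model. This is only a sketch: `ToyJob`, `toy_defer()`, `toy_run_bh()` and the other `toy_*` names are hypothetical stand-ins invented for illustration, not QEMU APIs, and a single function pointer stands in for the scheduled BH. It shows that clearing the flag after the deferred function runs would overwrite the `true` set by a nested deferral, while clearing it before keeps the flag in step with the pending work.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy stand-ins for the QEMU types; hypothetical, not the real definitions. */
typedef struct ToyJob ToyJob;
typedef void ToyDeferFn(ToyJob *job);

struct ToyJob {
    bool deferred_to_main_loop;
    ToyDeferFn *pending;            /* the one scheduled "BH", or NULL */
};

/* Models block_job_defer_to_main_loop(): schedule fn and set the flag. */
static void toy_defer(ToyJob *job, ToyDeferFn *fn)
{
    job->pending = fn;
    job->deferred_to_main_loop = true;
}

/* Models the BH: runs the deferred fn, clearing the flag before or after it. */
static void toy_run_bh(ToyJob *job, bool clear_before_fn)
{
    ToyDeferFn *fn = job->pending;

    assert(fn != NULL);
    job->pending = NULL;
    if (clear_before_fn) {
        job->deferred_to_main_loop = false;
    }
    fn(job);
    if (!clear_before_fn) {
        /* Overwrites the 'true' set by a nested toy_defer() inside fn. */
        job->deferred_to_main_loop = false;
    }
}

static void toy_noop(ToyJob *job)
{
    (void)job;
}

/* A deferred function that defers more work from inside the BH. */
static void toy_defer_again(ToyJob *job)
{
    toy_defer(job, toy_noop);
}

/* Returns true if the flag agrees with whether deferred work is pending. */
bool toy_flag_consistent(bool clear_before_fn)
{
    ToyJob job = { false, NULL };

    toy_defer(&job, toy_defer_again);
    toy_run_bh(&job, clear_before_fn);
    return job.deferred_to_main_loop == (job.pending != NULL);
}
```

In this model, `toy_flag_consistent(true)` holds and `toy_flag_consistent(false)` does not, which matches the conclusion reached above that the boolean should be cleared before `data->fn()` is invoked.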
diff --git a/blockjob.c b/blockjob.c
index 80adb9d..25e1581 100644
--- a/blockjob.c
+++ b/blockjob.c
@@ -304,7 +304,9 @@ static int block_job_finish_sync(BlockJob *job,
         return -EBUSY;
     }
     while (!job->completed) {
-        aio_poll(bdrv_get_aio_context(bs), true);
+        aio_poll(job->deferred_to_main_loop ? qemu_get_aio_context() :
+                                              bdrv_get_aio_context(bs),
+                 true);
     }
     ret = (job->cancelled && job->ret == 0) ? -ECANCELED : job->ret;
     block_job_unref(job);
@@ -497,6 +499,7 @@ void block_job_defer_to_main_loop(BlockJob *job,
     data->aio_context = bdrv_get_aio_context(job->bs);
     data->fn = fn;
     data->opaque = opaque;
+    job->deferred_to_main_loop = true;
 
     qemu_bh_schedule(data->bh);
 }
diff --git a/include/block/blockjob.h b/include/block/blockjob.h
index d84ccd8..550de26 100644
--- a/include/block/blockjob.h
+++ b/include/block/blockjob.h
@@ -130,6 +130,11 @@ struct BlockJob {
      */
     bool ready;
 
+    /**
+     * Set to true when the job has deferred work to the main loop.
+     */
+    bool deferred_to_main_loop;
+
     /** Status that is published by the query-block-jobs QMP API */
     BlockDeviceIoStatus iostatus;
 
@@ -402,6 +407,10 @@ typedef void BlockJobDeferToMainLoopFn(BlockJob *job, void *opaque);
  * AioContext acquired. Block jobs must call bdrv_unref(), bdrv_close(), and
  * anything that uses bdrv_drain_all() in the main loop.
  *
+ * The job->deferred_to_main_loop flag will be set. Caller must clear it once
+ * the deferred work is done and the block job coroutine continues, unless it's
+ * completing immediately.
+ *
  * The @job AioContext is held while @fn executes.
  */
 void block_job_defer_to_main_loop(BlockJob *job,
With a mirror job running on a virtio-blk dataplane disk, sending "q" to
HMP will cause an infinite loop in block_job_finish_sync.

This is because aio_poll() only processes the AIO context of bs, which
has no more work to do, while the main loop BH that is scheduled for
setting the job->completed flag is never processed.

Fix this by adding a flag to the BlockJob structure to track which
context to poll for the block job to make progress. Its value is set to
true when block_job_defer_to_main_loop() is called, and is checked in
block_job_finish_sync to determine which context to poll.

Suggested-by: Stefan Hajnoczi <stefanha@redhat.com>
Signed-off-by: Fam Zheng <famz@redhat.com>
---
 blockjob.c               | 5 ++++-
 include/block/blockjob.h | 9 +++++++++
 2 files changed, 13 insertions(+), 1 deletion(-)
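The hang and the fix described in this commit message can be sketched with a standalone model. Everything here is an assumption made for illustration: `PollContext`, `PollJob`, `model_poll()` and `model_finish_sync()` are hypothetical names, not QEMU code, a context is reduced to a counter of queued callbacks, and the loop is bounded by `max_polls` so the stuck case returns instead of blocking as a real aio_poll(..., true) would.

```c
#include <assert.h>
#include <stdbool.h>

/* A context reduced to a count of queued callbacks (hypothetical model). */
typedef struct {
    int pending;
} PollContext;

typedef struct {
    bool completed;
    bool deferred_to_main_loop;
    PollContext *bs_ctx;        /* models bdrv_get_aio_context(bs) */
    PollContext *main_ctx;      /* models qemu_get_aio_context() */
} PollJob;

/* Models aio_poll(): run one queued callback, if any, and report progress. */
static bool model_poll(PollContext *ctx, PollJob *job)
{
    if (ctx->pending == 0) {
        return false;
    }
    ctx->pending--;
    if (ctx == job->main_ctx) {
        /* In this model the only main loop callback is the completion BH. */
        job->completed = true;
    }
    return true;
}

/*
 * Models the loop in block_job_finish_sync() after the job has deferred its
 * completion to the main loop. Returns the number of polls needed, or -1 if
 * the job is still incomplete after max_polls (the "hang").
 */
int model_finish_sync(bool use_flag, int max_polls)
{
    PollContext bs_ctx = { 0 };     /* the bs context has no more work */
    PollContext main_ctx = { 1 };   /* the completion BH is queued here */
    PollJob job = { false, true, &bs_ctx, &main_ctx };

    for (int polls = 1; polls <= max_polls; polls++) {
        PollContext *ctx = (use_flag && job.deferred_to_main_loop)
                               ? job.main_ctx
                               : job.bs_ctx;

        model_poll(ctx, &job);
        if (job.completed) {
            return polls;
        }
    }
    return -1;
}
```

With the flag consulted, `model_finish_sync(true, 100)` completes on the first poll. Without it, `model_finish_sync(false, 100)` gives up after 100 polls of an idle context, which is the bounded analogue of the loop that never terminates.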