Message ID | 1453964571-23016-3-git-send-email-famz@redhat.com (mailing list archive) |
---|---|
State | New, archived |
Headers | show |
On 28/01/2016 08:02, Fam Zheng wrote: > > This is because the aio_poll() only processes the AIO context of bs > which has no more work to do, while the main loop BH that is scheduled > for setting the job->completed flag is never processed. > > Fix this by adding a "ctx" pointer in BlockJob structure, to track which > context to poll for the block job to make progress. Its value is set to > the BDS context at block job creation, until > block_job_coroutine_complete() is called by the block job coroutine. > After that point, the block job's work is deferred to main loop BH. > > Signed-off-by: Fam Zheng <famz@redhat.com> > --- > blockjob.c | 4 +++- > include/block/blockjob.h | 2 ++ > 2 files changed, 5 insertions(+), 1 deletion(-) > > diff --git a/blockjob.c b/blockjob.c > index 4b16720..4ea1ce0 100644 > --- a/blockjob.c > +++ b/blockjob.c > @@ -74,6 +74,7 @@ void *block_job_create(const BlockJobDriver *driver, BlockDriverState *bs, > job->opaque = opaque; > job->busy = true; > job->refcnt = 1; > + job->ctx = bdrv_get_aio_context(bs); Can the context change if dataplane is started/stopped in the middle of a job? (For example if you start migration). Perhaps job->ctx == NULL could mean "use bdrv_get_aio_context(bs)". Paolo > bs->job = job; > > /* Only set speed when necessary to avoid NotSupported error */ > @@ -304,7 +305,7 @@ static int block_job_finish_sync(BlockJob *job, > return -EBUSY; > } > while (!job->completed) { > - aio_poll(bdrv_get_aio_context(bs), true); > + aio_poll(job->ctx, true); > } > ret = (job->cancelled && job->ret == 0) ? -ECANCELED : job->ret; > block_job_unref(job); > @@ -497,6 +498,7 @@ void block_job_coroutine_complete(BlockJob *job, > data->aio_context = bdrv_get_aio_context(job->bs); > data->fn = fn; > data->opaque = opaque; > + job->ctx = qemu_get_aio_context(); > > qemu_bh_schedule(data->bh); > } > diff --git a/include/block/blockjob.h b/include/block/blockjob.h > index de59fc2..5c6a884 100644 > --- a/include/block/blockjob.h > +++ b/include/block/blockjob.h > @@ -92,6 +92,8 @@ struct BlockJob { > */ > char *id; > > + AioContext *ctx; > + > /** > * The coroutine that executes the job. If not NULL, it is > * reentered when busy is false and the job is cancelled. > -- 2.4.3
On Thu, 01/28 10:10, Paolo Bonzini wrote: > > > On 28/01/2016 08:02, Fam Zheng wrote: > > > > This is because the aio_poll() only processes the AIO context of bs > > which has no more work to do, while the main loop BH that is scheduled > > for setting the job->completed flag is never processed. > > > > Fix this by adding a "ctx" pointer in BlockJob structure, to track which > > context to poll for the block job to make progress. Its value is set to > > the BDS context at block job creation, until > > block_job_coroutine_complete() is called by the block job coroutine. > > After that point, the block job's work is deferred to main loop BH. > > > > Signed-off-by: Fam Zheng <famz@redhat.com> > > --- > > blockjob.c | 4 +++- > > include/block/blockjob.h | 2 ++ > > 2 files changed, 5 insertions(+), 1 deletion(-) > > > > diff --git a/blockjob.c b/blockjob.c > > index 4b16720..4ea1ce0 100644 > > --- a/blockjob.c > > +++ b/blockjob.c > > @@ -74,6 +74,7 @@ void *block_job_create(const BlockJobDriver *driver, BlockDriverState *bs, > > job->opaque = opaque; > > job->busy = true; > > job->refcnt = 1; > > + job->ctx = bdrv_get_aio_context(bs); > > Can the context change if dataplane is started/stopped in the middle of > a job? (For example if you start migration). Perhaps job->ctx == NULL > could mean "use bdrv_get_aio_context(bs)". Yes, that's a good idea. Fam
On Thu, Jan 28, 2016 at 10:10:06AM +0100, Paolo Bonzini wrote: > > > On 28/01/2016 08:02, Fam Zheng wrote: > > > > This is because the aio_poll() only processes the AIO context of bs > > which has no more work to do, while the main loop BH that is scheduled > > for setting the job->completed flag is never processed. > > > > Fix this by adding a "ctx" pointer in BlockJob structure, to track which > > context to poll for the block job to make progress. Its value is set to > > the BDS context at block job creation, until > > block_job_coroutine_complete() is called by the block job coroutine. > > After that point, the block job's work is deferred to main loop BH. > > > > Signed-off-by: Fam Zheng <famz@redhat.com> > > --- > > blockjob.c | 4 +++- > > include/block/blockjob.h | 2 ++ > > 2 files changed, 5 insertions(+), 1 deletion(-) > > > > diff --git a/blockjob.c b/blockjob.c > > index 4b16720..4ea1ce0 100644 > > --- a/blockjob.c > > +++ b/blockjob.c > > @@ -74,6 +74,7 @@ void *block_job_create(const BlockJobDriver *driver, BlockDriverState *bs, > > job->opaque = opaque; > > job->busy = true; > > job->refcnt = 1; > > + job->ctx = bdrv_get_aio_context(bs); > > Can the context change if dataplane is started/stopped in the middle of > a job? (For example if you start migration). Perhaps job->ctx == NULL > could mean "use bdrv_get_aio_context(bs)". Yes, it can change. Normally a block job doesn't need to know its AioContext since the coroutine only runs from inside the AioContext anyway. Perhaps it's easier to make this special case explicit with a flag in job->deferred_to_main_loop so the synchronous wait can do: while (!job->completed) { AioContext *ctx = job->deferred_to_main_loop ? qemu_get_aio_context() : bdrv_get_aio_context(bs); aio_poll(ctx, true); } That way there's no need to store a possibly stale AioContext pointer. Stefan
diff --git a/blockjob.c b/blockjob.c index 4b16720..4ea1ce0 100644 --- a/blockjob.c +++ b/blockjob.c @@ -74,6 +74,7 @@ void *block_job_create(const BlockJobDriver *driver, BlockDriverState *bs, job->opaque = opaque; job->busy = true; job->refcnt = 1; + job->ctx = bdrv_get_aio_context(bs); bs->job = job; /* Only set speed when necessary to avoid NotSupported error */ @@ -304,7 +305,7 @@ static int block_job_finish_sync(BlockJob *job, return -EBUSY; } while (!job->completed) { - aio_poll(bdrv_get_aio_context(bs), true); + aio_poll(job->ctx, true); } ret = (job->cancelled && job->ret == 0) ? -ECANCELED : job->ret; block_job_unref(job); @@ -497,6 +498,7 @@ void block_job_coroutine_complete(BlockJob *job, data->aio_context = bdrv_get_aio_context(job->bs); data->fn = fn; data->opaque = opaque; + job->ctx = qemu_get_aio_context(); qemu_bh_schedule(data->bh); } diff --git a/include/block/blockjob.h b/include/block/blockjob.h index de59fc2..5c6a884 100644 --- a/include/block/blockjob.h +++ b/include/block/blockjob.h @@ -92,6 +92,8 @@ struct BlockJob { */ char *id; + AioContext *ctx; + /** * The coroutine that executes the job. If not NULL, it is * reentered when busy is false and the job is cancelled.
With a mirror job running on a virtio-blk dataplane disk, sending "q" to HMP will cause a dead loop in block_job_finish_sync. This is because the aio_poll() only processes the AIO context of bs which has no more work to do, while the main loop BH that is scheduled for setting the job->completed flag is never processed. Fix this by adding a "ctx" pointer in BlockJob structure, to track which context to poll for the block job to make progress. Its value is set to the BDS context at block job creation, until block_job_coroutine_complete() is called by the block job coroutine. After that point, the block job's work is deferred to main loop BH. Signed-off-by: Fam Zheng <famz@redhat.com> --- blockjob.c | 4 +++- include/block/blockjob.h | 2 ++ 2 files changed, 5 insertions(+), 1 deletion(-)