Message ID | 48f12427ab83c20fd27aa451e5a941201581742c.1476450059.git.berto@igalia.com (mailing list archive) |
---|---|
State | New, archived |

On 14/10/2016 15:08, Alberto Garcia wrote:
> When a BlockDriverState is about to be reopened it can trigger certain
> operations that need to write to disk. During this process a different
> block job can be woken up. If that block job completes and also needs
> to call bdrv_reopen() it can happen that it needs to do it on the same
> BlockDriverState that is still in the process of being reopened.
>
> This can have fatal consequences, like in this example:
>
>   1) Block job A starts and sleeps after a while.
>   2) Block job B starts and tries to reopen node1 (a qcow2 file).
>   3) Reopening node1 means flushing and replacing its qcow2 cache.
>   4) While the qcow2 cache is being flushed, job A wakes up.
>   5) Job A completes and reopens node1, replacing its cache.
>   6) Job B resumes, but the cache that was being flushed no longer
>      exists.
>
> This patch splits the bdrv_drain_all() call to keep all block jobs
> paused during bdrv_reopen_multiple(), so that step 4 can never happen
> and the operation is safe.
>
> Note that this scenario can only happen if both bdrv_reopen() calls
> are made by block jobs on the same backing chain. Otherwise there's no
> chance that the same BlockDriverState appears in both reopen queues.
>
> Signed-off-by: Alberto Garcia <berto@igalia.com>
> ---
>  block.c | 5 ++++-
>  1 file changed, 4 insertions(+), 1 deletion(-)
>
> diff --git a/block.c b/block.c
> index 7f3e7bc..adbecd0 100644
> --- a/block.c
> +++ b/block.c
> @@ -2090,7 +2090,7 @@ int bdrv_reopen_multiple(BlockReopenQueue *bs_queue, Error **errp)
>
>      assert(bs_queue != NULL);
>
> -    bdrv_drain_all();
> +    bdrv_drain_all_begin();
>
>      QSIMPLEQ_FOREACH(bs_entry, bs_queue, entry) {
>          if (bdrv_reopen_prepare(&bs_entry->state, bs_queue, &local_err)) {
> @@ -2120,6 +2120,9 @@ cleanup:
>          g_free(bs_entry);
>      }
>      g_free(bs_queue);
> +
> +    bdrv_drain_all_end();
> +
>      return ret;
>  }
>
>

Reviewed-by: Paolo Bonzini <pbonzini@redhat.com>

diff --git a/block.c b/block.c
index 7f3e7bc..adbecd0 100644
--- a/block.c
+++ b/block.c
@@ -2090,7 +2090,7 @@ int bdrv_reopen_multiple(BlockReopenQueue *bs_queue, Error **errp)

     assert(bs_queue != NULL);

-    bdrv_drain_all();
+    bdrv_drain_all_begin();

     QSIMPLEQ_FOREACH(bs_entry, bs_queue, entry) {
         if (bdrv_reopen_prepare(&bs_entry->state, bs_queue, &local_err)) {
@@ -2120,6 +2120,9 @@ cleanup:
         g_free(bs_entry);
     }
     g_free(bs_queue);
+
+    bdrv_drain_all_end();
+
     return ret;
 }

When a BlockDriverState is about to be reopened it can trigger certain
operations that need to write to disk. During this process a different
block job can be woken up. If that block job completes and also needs
to call bdrv_reopen() it can happen that it needs to do it on the same
BlockDriverState that is still in the process of being reopened.

This can have fatal consequences, like in this example:

  1) Block job A starts and sleeps after a while.
  2) Block job B starts and tries to reopen node1 (a qcow2 file).
  3) Reopening node1 means flushing and replacing its qcow2 cache.
  4) While the qcow2 cache is being flushed, job A wakes up.
  5) Job A completes and reopens node1, replacing its cache.
  6) Job B resumes, but the cache that was being flushed no longer
     exists.

This patch splits the bdrv_drain_all() call to keep all block jobs
paused during bdrv_reopen_multiple(), so that step 4 can never happen
and the operation is safe.

Note that this scenario can only happen if both bdrv_reopen() calls
are made by block jobs on the same backing chain. Otherwise there's no
chance that the same BlockDriverState appears in both reopen queues.

Signed-off-by: Alberto Garcia <berto@igalia.com>
---
 block.c | 5 ++++-
 1 file changed, 4 insertions(+), 1 deletion(-)
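
The race in steps 1) to 6) and the effect of keeping the drained section open for the whole reopen can be illustrated with a small standalone model. This is only a sketch and not QEMU code: every name in it (ToyNode, toy_drain_all_begin(), toy_drain_all_end(), toy_job_complete(), toy_reopen()) is hypothetical, and it assumes that a block job simply stays paused while a drain counter is non-zero, which is a simplification of what the real block layer does.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy model only: none of these names exist in QEMU. */
typedef struct ToyNode {
    int cache_generation;   /* bumped each time the cache is replaced */
} ToyNode;

/* Assumption: jobs stay paused while this counter is non-zero. */
static int toy_drain_depth;

static void toy_drain_all_begin(void)
{
    toy_drain_depth++;
}

static void toy_drain_all_end(void)
{
    assert(toy_drain_depth > 0);
    toy_drain_depth--;
}

/*
 * Job A's completion path (step 5): it reopens the node and replaces
 * its cache. Returns false if the job is paused and did nothing.
 */
static bool toy_job_complete(ToyNode *node)
{
    if (toy_drain_depth > 0) {
        return false;               /* job remains paused */
    }
    node->cache_generation++;       /* cache replaced by job A */
    return true;
}

/*
 * Job B's reopen (steps 2, 3 and 6). The flush is the point where
 * another job could be woken up (step 4). Returns true if the cache
 * seen before the flush is still the current one afterwards.
 */
static bool toy_reopen(ToyNode *node, bool hold_drain)
{
    int gen_before;
    bool cache_still_valid;

    if (hold_drain) {
        toy_drain_all_begin();
    }

    gen_before = node->cache_generation;

    /* "Flush": job A gets a chance to run here unless it is paused. */
    toy_job_complete(node);

    cache_still_valid = (node->cache_generation == gen_before);

    node->cache_generation++;       /* job B replaces the cache */

    if (hold_drain) {
        toy_drain_all_end();
    }
    return cache_still_valid;
}
```

Calling toy_reopen(&node, false) models the behaviour before the patch: job A runs during the flush and the cache changes underneath job B. Calling toy_reopen(&node, true) models the begin/end bracket added by the patch: job A stays paused until the reopen has finished, so the cache that was being flushed is still the one in place.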