Message ID | 20210211172242.146671-2-mreitz@redhat.com (mailing list archive) |
---|---|
State | New, archived |
Headers | show |
Series | file-posix: Cache next hole | expand |
On 2/11/21 11:22 AM, Max Reitz wrote: > mirror_top currently shares all permissions, and takes only the WRITE > permission (if some parent has taken that permission, too). > > That is wrong, though; mirror_top is a filter, so it should take > permissions like any other filter does. For example, if the parent > needs CONSISTENT_READ, we need to take that, too, and if it cannot share > the WRITE permission, we cannot share it either. > > The exception is when mirror_top is used for active commit, where we > cannot take CONSISTENT_READ (because it is deliberately unshared above > the base node) and where we must share WRITE (so that it is shared for > all images in the backing chain, so the mirror job can take it for the > target BB). > > Signed-off-by: Max Reitz <mreitz@redhat.com> > --- > block/mirror.c | 32 +++++++++++++++++++++++++------- > 1 file changed, 25 insertions(+), 7 deletions(-) > Reviewed-by: Eric Blake <eblake@redhat.com>
11.02.2021 20:22, Max Reitz wrote: > mirror_top currently shares all permissions, and takes only the WRITE > permission (if some parent has taken that permission, too). > > That is wrong, though; mirror_top is a filter, so it should take > permissions like any other filter does. For example, if the parent > needs CONSISTENT_READ, we need to take that, too, and if it cannot share > the WRITE permission, we cannot share it either. > > The exception is when mirror_top is used for active commit, where we > cannot take CONSISTENT_READ (because it is deliberately unshared above > the base node) and where we must share WRITE (so that it is shared for > all images in the backing chain, so the mirror job can take it for the > target BB). > > Signed-off-by: Max Reitz <mreitz@redhat.com> > --- > block/mirror.c | 32 +++++++++++++++++++++++++------- > 1 file changed, 25 insertions(+), 7 deletions(-) > > diff --git a/block/mirror.c b/block/mirror.c > index 8e1ad6eceb..1edfc3cc14 100644 > --- a/block/mirror.c > +++ b/block/mirror.c > @@ -89,6 +89,7 @@ typedef struct MirrorBlockJob { > typedef struct MirrorBDSOpaque { > MirrorBlockJob *job; > bool stop; > + bool is_commit; > } MirrorBDSOpaque; > > struct MirrorOp { > @@ -1513,13 +1514,27 @@ static void bdrv_mirror_top_child_perm(BlockDriverState *bs, BdrvChild *c, > return; > } > > - /* Must be able to forward guest writes to the real image */ > - *nperm = 0; > - if (perm & BLK_PERM_WRITE) { > - *nperm |= BLK_PERM_WRITE; > - } > + bdrv_default_perms(bs, c, role, reopen_queue, > + perm, shared, nperm, nshared); > > - *nshared = BLK_PERM_ALL; > + if (s->is_commit) { > + /* > + * For commit jobs, we cannot take CONSISTENT_READ, because > + * that permission is unshared for everything above the base > + * node (except for filters on the base node). > + * We also have to force-share the WRITE permission, or > + * otherwise we would block ourselves at the base node (if > + * writes are blocked for a node, they are also blocked for > + * its backing file). > + * (We could also share RESIZE, because it may be needed for > + * the target if its size is less than the top node's; but > + * bdrv_default_perms_for_cow() automatically shares RESIZE > + * for backing nodes if WRITE is shared, so there is no need > + * to do it here.) > + */ > + *nperm &= ~BLK_PERM_CONSISTENT_READ; this works only because we don't assert READ permission on generic read path in block/io.c, like we do for WRITE.. but this is better than pre-patch anyway.. How block-jobs and filters works - definitely goes beyond our permissions architecture.. > + *nshared |= BLK_PERM_WRITE; > + } > } > > /* Dummy node that provides consistent read to its users without requiring it > @@ -1583,6 +1598,8 @@ static BlockJob *mirror_start_job( > return NULL; > } > > + target_is_backing = bdrv_chain_contains(bs, target); may be initialized together with definition.. > + > /* In the case of active commit, add dummy driver to provide consistent > * reads on the top, while disabling it in the intermediate nodes, and make > * the backing chain writable. */ > @@ -1605,6 +1622,8 @@ static BlockJob *mirror_start_job( > bs_opaque = g_new0(MirrorBDSOpaque, 1); > mirror_top_bs->opaque = bs_opaque; > > + bs_opaque->is_commit = target_is_backing; > + > /* bdrv_append takes ownership of the mirror_top_bs reference, need to keep > * it alive until block_job_create() succeeds even if bs has no parent. */ > bdrv_ref(mirror_top_bs); > @@ -1646,7 +1665,6 @@ static BlockJob *mirror_start_job( > target_perms = BLK_PERM_WRITE; > target_shared_perms = BLK_PERM_WRITE_UNCHANGED; > > - target_is_backing = bdrv_chain_contains(bs, target); > if (target_is_backing) { > int64_t bs_size, target_size; > bs_size = bdrv_getlength(bs); > Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
On 12.02.21 10:04, Vladimir Sementsov-Ogievskiy wrote: > 11.02.2021 20:22, Max Reitz wrote: >> mirror_top currently shares all permissions, and takes only the WRITE >> permission (if some parent has taken that permission, too). >> >> That is wrong, though; mirror_top is a filter, so it should take >> permissions like any other filter does. For example, if the parent >> needs CONSISTENT_READ, we need to take that, too, and if it cannot share >> the WRITE permission, we cannot share it either. >> >> The exception is when mirror_top is used for active commit, where we >> cannot take CONSISTENT_READ (because it is deliberately unshared above >> the base node) and where we must share WRITE (so that it is shared for >> all images in the backing chain, so the mirror job can take it for the >> target BB). >> >> Signed-off-by: Max Reitz <mreitz@redhat.com> >> --- >> block/mirror.c | 32 +++++++++++++++++++++++++------- >> 1 file changed, 25 insertions(+), 7 deletions(-) >> >> diff --git a/block/mirror.c b/block/mirror.c >> index 8e1ad6eceb..1edfc3cc14 100644 >> --- a/block/mirror.c >> +++ b/block/mirror.c >> @@ -89,6 +89,7 @@ typedef struct MirrorBlockJob { >> typedef struct MirrorBDSOpaque { >> MirrorBlockJob *job; >> bool stop; >> + bool is_commit; >> } MirrorBDSOpaque; >> struct MirrorOp { >> @@ -1513,13 +1514,27 @@ static void >> bdrv_mirror_top_child_perm(BlockDriverState *bs, BdrvChild *c, >> return; >> } >> - /* Must be able to forward guest writes to the real image */ >> - *nperm = 0; >> - if (perm & BLK_PERM_WRITE) { >> - *nperm |= BLK_PERM_WRITE; >> - } >> + bdrv_default_perms(bs, c, role, reopen_queue, >> + perm, shared, nperm, nshared); >> - *nshared = BLK_PERM_ALL; >> + if (s->is_commit) { >> + /* >> + * For commit jobs, we cannot take CONSISTENT_READ, because >> + * that permission is unshared for everything above the base >> + * node (except for filters on the base node). >> + * We also have to force-share the WRITE permission, or >> + * otherwise we would block ourselves at the base node (if >> + * writes are blocked for a node, they are also blocked for >> + * its backing file). >> + * (We could also share RESIZE, because it may be needed for >> + * the target if its size is less than the top node's; but >> + * bdrv_default_perms_for_cow() automatically shares RESIZE >> + * for backing nodes if WRITE is shared, so there is no need >> + * to do it here.) >> + */ >> + *nperm &= ~BLK_PERM_CONSISTENT_READ; > > this works only because we don't assert READ permission on generic read > path in block/io.c, like we do for WRITE.. > but this is better than pre-patch anyway.. Yes, because you can read regardless of CONSISTENT_READ, the question is just whether you’ll receive consistent data. The caller needs to decide whether that’s the case. Taking that permission kind of is deferring the question whether the data will be consistent to the block layer. In case of commit, we unshare CONSISTENT_READ above the base, because the data between base and top will not be consistent. Starting from top, we know it is, so we do not need to take the permission, because we do not need that guarantee from the block layer; the commit job can give that guarantee itself. (I suppose we could add some ALLOW_INCONSISTENT flag to read requests, and the permission is checked when that flag is not present, but I don’t think we really need to.) (Technically we have the problem that there could be something else between top and base that unshares CONSISTENT_READ, and we’ll never know, but addressing that would be complicated and it’s only a hypothetical problem, AFAIA.) > How block-jobs and filters works - definitely goes beyond our > permissions architecture.. FWIW, AFAIR, the first job filter node was commit_top, whose purpose was precisely to allow unsharing CONSISTENT_READ on the base and then offering it again on the top. >> + *nshared |= BLK_PERM_WRITE; >> + } >> } >> /* Dummy node that provides consistent read to its users without >> requiring it >> @@ -1583,6 +1598,8 @@ static BlockJob *mirror_start_job( >> return NULL; >> } >> + target_is_backing = bdrv_chain_contains(bs, target); > > may be initialized together with definition.. Could be, but would that be better? :) >> + >> /* In the case of active commit, add dummy driver to provide >> consistent >> * reads on the top, while disabling it in the intermediate >> nodes, and make >> * the backing chain writable. */ >> @@ -1605,6 +1622,8 @@ static BlockJob *mirror_start_job( >> bs_opaque = g_new0(MirrorBDSOpaque, 1); >> mirror_top_bs->opaque = bs_opaque; >> + bs_opaque->is_commit = target_is_backing; >> + >> /* bdrv_append takes ownership of the mirror_top_bs reference, >> need to keep >> * it alive until block_job_create() succeeds even if bs has no >> parent. */ >> bdrv_ref(mirror_top_bs); >> @@ -1646,7 +1665,6 @@ static BlockJob *mirror_start_job( >> target_perms = BLK_PERM_WRITE; >> target_shared_perms = BLK_PERM_WRITE_UNCHANGED; >> - target_is_backing = bdrv_chain_contains(bs, target); >> if (target_is_backing) { >> int64_t bs_size, target_size; >> bs_size = bdrv_getlength(bs); >> > > Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Thanks! Max
12.02.2021 12:23, Max Reitz wrote: > On 12.02.21 10:04, Vladimir Sementsov-Ogievskiy wrote: >> 11.02.2021 20:22, Max Reitz wrote: >>> mirror_top currently shares all permissions, and takes only the WRITE >>> permission (if some parent has taken that permission, too). >>> >>> That is wrong, though; mirror_top is a filter, so it should take >>> permissions like any other filter does. For example, if the parent >>> needs CONSISTENT_READ, we need to take that, too, and if it cannot share >>> the WRITE permission, we cannot share it either. >>> >>> The exception is when mirror_top is used for active commit, where we >>> cannot take CONSISTENT_READ (because it is deliberately unshared above >>> the base node) and where we must share WRITE (so that it is shared for >>> all images in the backing chain, so the mirror job can take it for the >>> target BB). >>> >>> Signed-off-by: Max Reitz <mreitz@redhat.com> >>> --- >>> block/mirror.c | 32 +++++++++++++++++++++++++------- >>> 1 file changed, 25 insertions(+), 7 deletions(-) >>> >>> diff --git a/block/mirror.c b/block/mirror.c >>> index 8e1ad6eceb..1edfc3cc14 100644 >>> --- a/block/mirror.c >>> +++ b/block/mirror.c >>> @@ -89,6 +89,7 @@ typedef struct MirrorBlockJob { >>> typedef struct MirrorBDSOpaque { >>> MirrorBlockJob *job; >>> bool stop; >>> + bool is_commit; >>> } MirrorBDSOpaque; >>> struct MirrorOp { >>> @@ -1513,13 +1514,27 @@ static void bdrv_mirror_top_child_perm(BlockDriverState *bs, BdrvChild *c, >>> return; >>> } >>> - /* Must be able to forward guest writes to the real image */ >>> - *nperm = 0; >>> - if (perm & BLK_PERM_WRITE) { >>> - *nperm |= BLK_PERM_WRITE; >>> - } >>> + bdrv_default_perms(bs, c, role, reopen_queue, >>> + perm, shared, nperm, nshared); >>> - *nshared = BLK_PERM_ALL; >>> + if (s->is_commit) { >>> + /* >>> + * For commit jobs, we cannot take CONSISTENT_READ, because >>> + * that permission is unshared for everything above the base >>> + * node (except for filters on the base node). >>> + * We also have to force-share the WRITE permission, or >>> + * otherwise we would block ourselves at the base node (if >>> + * writes are blocked for a node, they are also blocked for >>> + * its backing file). >>> + * (We could also share RESIZE, because it may be needed for >>> + * the target if its size is less than the top node's; but >>> + * bdrv_default_perms_for_cow() automatically shares RESIZE >>> + * for backing nodes if WRITE is shared, so there is no need >>> + * to do it here.) >>> + */ >>> + *nperm &= ~BLK_PERM_CONSISTENT_READ; >> >> this works only because we don't assert READ permission on generic read path in block/io.c, like we do for WRITE.. >> but this is better than pre-patch anyway.. > > Yes, because you can read regardless of CONSISTENT_READ, the question is just whether you’ll receive consistent data. The caller needs to decide whether that’s the case. > > Taking that permission kind of is deferring the question whether the data will be consistent to the block layer. > > In case of commit, we unshare CONSISTENT_READ above the base, because the data between base and top will not be consistent. Starting from top, we know it is, so we do not need to take the permission, because we do not need that guarantee from the block layer; the commit job can give that guarantee itself. > > (I suppose we could add some ALLOW_INCONSISTENT flag to read requests, and the permission is checked when that flag is not present, but I don’t think we really need to.) > > (Technically we have the problem that there could be something else between top and base that unshares CONSISTENT_READ, and we’ll never know, but addressing that would be complicated and it’s only a hypothetical problem, AFAIA.) > >> How block-jobs and filters works - definitely goes beyond our permissions architecture.. > > FWIW, AFAIR, the first job filter node was commit_top, whose purpose was precisely to allow unsharing CONSISTENT_READ on the base and then offering it again on the top. Hmm reasonable. But sharing writes though backing chain is not that safe > >>> + *nshared |= BLK_PERM_WRITE; >>> + } >>> } >>> /* Dummy node that provides consistent read to its users without requiring it >>> @@ -1583,6 +1598,8 @@ static BlockJob *mirror_start_job( >>> return NULL; >>> } >>> + target_is_backing = bdrv_chain_contains(bs, target); >> >> may be initialized together with definition.. > > Could be, but would that be better? :) > >>> + >>> /* In the case of active commit, add dummy driver to provide consistent >>> * reads on the top, while disabling it in the intermediate nodes, and make >>> * the backing chain writable. */ >>> @@ -1605,6 +1622,8 @@ static BlockJob *mirror_start_job( >>> bs_opaque = g_new0(MirrorBDSOpaque, 1); >>> mirror_top_bs->opaque = bs_opaque; >>> + bs_opaque->is_commit = target_is_backing; >>> + >>> /* bdrv_append takes ownership of the mirror_top_bs reference, need to keep >>> * it alive until block_job_create() succeeds even if bs has no parent. */ >>> bdrv_ref(mirror_top_bs); >>> @@ -1646,7 +1665,6 @@ static BlockJob *mirror_start_job( >>> target_perms = BLK_PERM_WRITE; >>> target_shared_perms = BLK_PERM_WRITE_UNCHANGED; >>> - target_is_backing = bdrv_chain_contains(bs, target); >>> if (target_is_backing) { >>> int64_t bs_size, target_size; >>> bs_size = bdrv_getlength(bs); >>> >> >> Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> > > Thanks! > > Max >
diff --git a/block/mirror.c b/block/mirror.c index 8e1ad6eceb..1edfc3cc14 100644 --- a/block/mirror.c +++ b/block/mirror.c @@ -89,6 +89,7 @@ typedef struct MirrorBlockJob { typedef struct MirrorBDSOpaque { MirrorBlockJob *job; bool stop; + bool is_commit; } MirrorBDSOpaque; struct MirrorOp { @@ -1513,13 +1514,27 @@ static void bdrv_mirror_top_child_perm(BlockDriverState *bs, BdrvChild *c, return; } - /* Must be able to forward guest writes to the real image */ - *nperm = 0; - if (perm & BLK_PERM_WRITE) { - *nperm |= BLK_PERM_WRITE; - } + bdrv_default_perms(bs, c, role, reopen_queue, + perm, shared, nperm, nshared); - *nshared = BLK_PERM_ALL; + if (s->is_commit) { + /* + * For commit jobs, we cannot take CONSISTENT_READ, because + * that permission is unshared for everything above the base + * node (except for filters on the base node). + * We also have to force-share the WRITE permission, or + * otherwise we would block ourselves at the base node (if + * writes are blocked for a node, they are also blocked for + * its backing file). + * (We could also share RESIZE, because it may be needed for + * the target if its size is less than the top node's; but + * bdrv_default_perms_for_cow() automatically shares RESIZE + * for backing nodes if WRITE is shared, so there is no need + * to do it here.) + */ + *nperm &= ~BLK_PERM_CONSISTENT_READ; + *nshared |= BLK_PERM_WRITE; + } } /* Dummy node that provides consistent read to its users without requiring it @@ -1583,6 +1598,8 @@ static BlockJob *mirror_start_job( return NULL; } + target_is_backing = bdrv_chain_contains(bs, target); + /* In the case of active commit, add dummy driver to provide consistent * reads on the top, while disabling it in the intermediate nodes, and make * the backing chain writable. */ @@ -1605,6 +1622,8 @@ static BlockJob *mirror_start_job( bs_opaque = g_new0(MirrorBDSOpaque, 1); mirror_top_bs->opaque = bs_opaque; + bs_opaque->is_commit = target_is_backing; + /* bdrv_append takes ownership of the mirror_top_bs reference, need to keep * it alive until block_job_create() succeeds even if bs has no parent. */ bdrv_ref(mirror_top_bs); @@ -1646,7 +1665,6 @@ static BlockJob *mirror_start_job( target_perms = BLK_PERM_WRITE; target_shared_perms = BLK_PERM_WRITE_UNCHANGED; - target_is_backing = bdrv_chain_contains(bs, target); if (target_is_backing) { int64_t bs_size, target_size; bs_size = bdrv_getlength(bs);
mirror_top currently shares all permissions, and takes only the WRITE permission (if some parent has taken that permission, too). That is wrong, though; mirror_top is a filter, so it should take permissions like any other filter does. For example, if the parent needs CONSISTENT_READ, we need to take that, too, and if it cannot share the WRITE permission, we cannot share it either. The exception is when mirror_top is used for active commit, where we cannot take CONSISTENT_READ (because it is deliberately unshared above the base node) and where we must share WRITE (so that it is shared for all images in the backing chain, so the mirror job can take it for the target BB). Signed-off-by: Max Reitz <mreitz@redhat.com> --- block/mirror.c | 32 +++++++++++++++++++++++++------- 1 file changed, 25 insertions(+), 7 deletions(-)