Message ID | 20240605132539.3668497-2-jean-louis@dupond.be (mailing list archive) |
---|---|
State | New, archived |
Series | [v2,1/2] qcow2: handle discard-no-unref in measure |
On 05.06.24 15:25, Jean-Louis Dupond wrote:
> When doing a measure on an image with a backing file and
> discard-no-unref is enabled, the code should take this into account.

That doesn’t make sense to me. As far as I understand, 'measure' is
supposed to report how much space you need for a given image, i.e. if
you were to convert it to a new image. discard-no-unref doesn’t factor
into that, because for a 'convert' target (a new image), nothing can be
discarded.

Reading the issue, I understand that oVirt uses measure to determine the
size of the target of a 'commit' operation. Seems a bit like abuse to
me, precisely because of the issue you’re facing.

More specifically, a 'commit' operation is a complex thing with a lot of
variables, so the outcome depends on a lot. For example, this patch
just checks the discard-no-unref setting on the top image. But AFAIU it
doesn’t matter what the setting on the top image is, it matters what the
setting on the commit target is. 'measure' can’t know this because it
doesn’t know what the commit target is. As far as I can see, this patch
actually assumes the commit target is the first backing image (it
specifically checks in the image whether a block is allocated) – why?

So to me that means if 'measure' is supposed to give reliable data on
the commit case, it needs to be extended. Best thing I can come up with
off the top of my head would be to add an option e.g.
'commit=<target-node-name>', so we (A) know that we’re looking at a
commit and not a convert, and (B) we know what data will be collapsed
into which image and where we need to check for discard-no-unref.

Hanna

> If for example you have a snapshot image with a base, and you do a
> discard within the snapshot, it will be ZERO and ALLOCATED, but without
> host offset.
> Now if we commit this snapshot, and the clusters in the base image have
> a host offset, the clusters will only be set to ZERO, but the host offset
> will not be cleared.
> Therefore non-data clusters in the top image need to check the
> base to see if space will be freed or not, to have a correct measure
> output.
>
> Bug-Url: https://gitlab.com/qemu-project/qemu/-/issues/2369
> Signed-off-by: Jean-Louis Dupond <jean-louis@dupond.be>
> ---
>  block/qcow2.c | 32 +++++++++++++++++++++++++++++---
>  1 file changed, 29 insertions(+), 3 deletions(-)
>
> diff --git a/block/qcow2.c b/block/qcow2.c
> index 956128b409..50354e5b98 100644
> --- a/block/qcow2.c
> +++ b/block/qcow2.c
> @@ -5163,9 +5163,16 @@ static BlockMeasureInfo *qcow2_measure(QemuOpts *opts, BlockDriverState *in_bs,
>          } else {
>              int64_t offset;
>              int64_t pnum = 0;
> +            BlockDriverState *parent = bdrv_filter_or_cow_bs(in_bs);
> +            BDRVQcow2State *s = NULL;
> +
> +            if (parent) {
> +                s = parent->opaque;
> +            }
>
>              for (offset = 0; offset < ssize; offset += pnum) {
>                  int ret;
> +                int retp = 0;
>
>                  ret = bdrv_block_status_above(in_bs, NULL, offset,
>                                                ssize - offset, &pnum, NULL,
> @@ -5176,10 +5183,29 @@ static BlockMeasureInfo *qcow2_measure(QemuOpts *opts, BlockDriverState *in_bs,
>                      goto err;
>                  }
>
> -                if (ret & BDRV_BLOCK_ZERO) {
> +                /* If we have a parent in the chain and the current block is not data,
> +                 * then we want to check the allocation state of the parent block.
> +                 * If it has a valid offset, then we want to include it into
> +                 * the calculation, cause blocks with an offset will not be freed when
> +                 * committing the top into base with discard-no-unref enabled.
> +                 */
> +                if (parent && s->discard_no_unref && !(ret & BDRV_BLOCK_DATA)) {
> +                    int64_t pnum_parent = 0;
> +                    retp = bdrv_block_status_above(parent, NULL, offset,
> +                                                   ssize - offset, &pnum_parent, NULL,
> +                                                   NULL);
> +                    /* If the parent continuous block is smaller, use that pnum,
> +                     * so the next iteration starts with the smallest offset.
> +                     */
> +                    if (pnum_parent < pnum) {
> +                        pnum = pnum_parent;
> +                    }
> +                }
> +                if (ret & BDRV_BLOCK_ZERO && !parent && !(parent && s->discard_no_unref)) {
>                      /* Skip zero regions (safe with no backing file) */
> -                } else if ((ret & (BDRV_BLOCK_DATA | BDRV_BLOCK_ALLOCATED)) ==
> -                           (BDRV_BLOCK_DATA | BDRV_BLOCK_ALLOCATED)) {
> +                } else if (((ret & (BDRV_BLOCK_DATA | BDRV_BLOCK_ALLOCATED)) ==
> +                            (BDRV_BLOCK_DATA | BDRV_BLOCK_ALLOCATED)) ||
> +                           (retp & BDRV_BLOCK_OFFSET_VALID)) {
>                      /* Extend pnum to end of cluster for next iteration */
>                      pnum = ROUND_UP(offset + pnum, cluster_size) - offset;
>
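The point raised in the reply, that the result of a commit depends on the discard-no-unref setting of the commit target and on that target's existing allocations, can be illustrated with a small standalone model. This is only a sketch under simplifying assumptions: `TopState`, `clusters_after_commit` and its parameters are hypothetical names, not QEMU APIs, and the model assumes that a zero write to a target without discard-no-unref always frees the cluster, which real commits do not guarantee.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical per-cluster state of the image being committed (the top). */
typedef enum {
    TOP_UNALLOCATED, /* reads fall through to the backing image */
    TOP_ZERO,        /* zero cluster, e.g. left behind by a discard */
    TOP_DATA,        /* cluster holds data */
} TopState;

/*
 * Toy estimate of how many clusters the commit target occupies after the
 * top image has been committed into it.
 *
 * target_has_offset[i]: target cluster i had a host offset before commit.
 * target_no_unref:      discard-no-unref setting of the commit target
 *                       (assumption: this is the setting that matters).
 */
static size_t clusters_after_commit(const TopState *top,
                                    const bool *target_has_offset,
                                    size_t n, bool target_no_unref)
{
    size_t used = 0;

    assert(top != NULL && target_has_offset != NULL);

    for (size_t i = 0; i < n; i++) {
        switch (top[i]) {
        case TOP_DATA:
            /* Data lands in the target: existing or new cluster. */
            used++;
            break;
        case TOP_ZERO:
            /* With no-unref, an existing host offset is kept. */
            if (target_no_unref && target_has_offset[i]) {
                used++;
            }
            break;
        case TOP_UNALLOCATED:
            /* Target cluster is left untouched. */
            if (target_has_offset[i]) {
                used++;
            }
            break;
        }
    }
    return used;
}
```

With a top of {data, zero, zero, unallocated} and a target whose second and fourth clusters already have host offsets, the model yields three clusters when the target uses discard-no-unref and two when it does not. The top image's own setting never enters the calculation, which is why an explicit commit target would have to be passed to 'measure' for it to answer this question.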