[v2,1/2] qcow2: handle discard-no-unref in measure

Message ID 20240605132539.3668497-2-jean-louis@dupond.be (mailing list archive)
State New, archived

Commit Message

Jean-Louis Dupond June 5, 2024, 1:25 p.m. UTC
When doing a measure on an image with a backing file and
discard-no-unref is enabled, the code should take this into account.

If for example you have a snapshot image with a base, and you do a
discard within the snapshot, it will be ZERO and ALLOCATED, but without
host offset.
Now if we commit this snapshot, and the clusters in the base image have
a host offset, the clusters will only be set to ZERO, but the host offset
will not be cleared.
Therefore non-data clusters in the top image need to check the
base to see if space will be freed or not, to have a correct measure
output.

Bug-Url: https://gitlab.com/qemu-project/qemu/-/issues/2369
Signed-off-by: Jean-Louis Dupond <jean-louis@dupond.be>
---
 block/qcow2.c | 32 +++++++++++++++++++++++++++++---
 1 file changed, 29 insertions(+), 3 deletions(-)
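
The reasoning in the commit message can be illustrated with a toy model. This is only a sketch under stated assumptions, not the qcow2 implementation: the SK_* flags are simplified stand-ins for QEMU's BDRV_BLOCK_* status bits, sketch_clusters_after_commit() is a hypothetical helper, and each array entry stands for one cluster. The no_unref parameter models discard-no-unref on the image that receives the commit.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified stand-ins for QEMU's BDRV_BLOCK_* status bits. */
enum {
    SK_DATA         = 1 << 0,
    SK_ZERO         = 1 << 1,
    SK_OFFSET_VALID = 1 << 2,
    SK_ALLOCATED    = 1 << 3,
};

/*
 * Toy model: count the clusters that still occupy host space in the base
 * after the top image has been committed into it.
 *
 * top[i] and base[i] hold the status flags of cluster i in each image.
 * no_unref models discard-no-unref on the image receiving the commit.
 */
static size_t sketch_clusters_after_commit(const int *top, const int *base,
                                           size_t n, bool no_unref)
{
    size_t used = 0;

    for (size_t i = 0; i < n; i++) {
        bool top_data = (top[i] & (SK_DATA | SK_ALLOCATED)) ==
                        (SK_DATA | SK_ALLOCATED);
        bool top_allocated = (top[i] & SK_ALLOCATED) != 0;
        bool base_offset = (base[i] & SK_OFFSET_VALID) != 0;

        if (top_data) {
            /* Data from the top is written into the base. */
            used++;
        } else if (!top_allocated) {
            /* Nothing in the top: the base cluster is left untouched. */
            if (base_offset) {
                used++;
            }
        } else if (base_offset && no_unref) {
            /*
             * Zero/discarded in the top: the base cluster is marked zero,
             * but with discard-no-unref its host offset is kept.
             */
            used++;
        }
        /* Otherwise the cluster ends up zero without a host cluster. */
    }
    return used;
}
```

In this model, a discarded top cluster that sits over a base cluster with a host offset is counted only when no_unref is true, which is the difference the patch tries to account for.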

Comments

Hanna Czenczek July 10, 2024, 12:58 p.m. UTC | #1
On 05.06.24 15:25, Jean-Louis Dupond wrote:
> When doing a measure on an image with a backing file and
> discard-no-unref is enabled, the code should take this into account.

That doesn’t make sense to me.  As far as I understand, 'measure' is 
supposed to report how much space you need for a given image, i.e. if 
you were to convert it to a new image.  discard-no-unref doesn’t factor 
into that, because for a 'convert' target (a new image), nothing can be 
discarded.

Reading the issue, I understand that oVirt uses measure to determine the 
size of the target of a 'commit' operation.  Seems a bit like abuse to 
me, precisely because of the issue you’re facing.  More specifically, a 
'commit' operation is a complex thing with a lot of variables, so the 
outcome depends on a lot.

For example, this patch just checks the discard-no-unref setting on the 
top image.  But AFAIU it doesn’t matter what the setting on the top 
image is, it matters what the setting on the commit target is. 'measure' 
can’t know this because it doesn’t know what the commit target is.  As 
far as I can see, this patch actually assumes the commit target is the 
first backing image (it specifically checks in the image whether a block 
is allocated) – why?

So to me that means if 'measure' is supposed to give reliable data on 
the commit case, it needs to be extended.  Best thing I can come up with 
off the top of my head would be to add an option e.g. 
'commit=<target-node-name>', so that (A) we know that we’re looking at a 
commit and not a convert, and (B) we know what data will be collapsed into 
which image and where we need to check for discard-no-unref.

Hanna

> If for example you have a snapshot image with a base, and you do a
> discard within the snapshot, it will be ZERO and ALLOCATED, but without
> host offset.
> Now if we commit this snapshot, and the clusters in the base image have
> a host offset, the clusters will only be set to ZERO, but the host offset
> will not be cleared.
> Therefore non-data clusters in the top image need to check the
> base to see if space will be freed or not, to have a correct measure
> output.
>
> Bug-Url: https://gitlab.com/qemu-project/qemu/-/issues/2369
> Signed-off-by: Jean-Louis Dupond <jean-louis@dupond.be>
> ---
>   block/qcow2.c | 32 +++++++++++++++++++++++++++++---
>   1 file changed, 29 insertions(+), 3 deletions(-)
>
> diff --git a/block/qcow2.c b/block/qcow2.c
> index 956128b409..50354e5b98 100644
> --- a/block/qcow2.c
> +++ b/block/qcow2.c
> @@ -5163,9 +5163,16 @@ static BlockMeasureInfo *qcow2_measure(QemuOpts *opts, BlockDriverState *in_bs,
>           } else {
>               int64_t offset;
>               int64_t pnum = 0;
> +            BlockDriverState *parent = bdrv_filter_or_cow_bs(in_bs);
> +            BDRVQcow2State *s = NULL;
> +
> +            if (parent) {
> +                s = parent->opaque;
> +            }
>   
>               for (offset = 0; offset < ssize; offset += pnum) {
>                   int ret;
> +                int retp = 0;
>   
>                   ret = bdrv_block_status_above(in_bs, NULL, offset,
>                                                 ssize - offset, &pnum, NULL,
> @@ -5176,10 +5183,29 @@ static BlockMeasureInfo *qcow2_measure(QemuOpts *opts, BlockDriverState *in_bs,
>                       goto err;
>                   }
>   
> -                if (ret & BDRV_BLOCK_ZERO) {
> +                /* If we have a parent in the chain and the current block is not data,
> +                 * then we want to check the allocation state of the parent block.
> +                 * If it has a valid offset, then we want to include it into
> +                 * the calculation, cause blocks with an offset will not be freed when
> +                 * committing the top into base with discard-no-unref enabled.
> +                 */
> +                if (parent && s->discard_no_unref && !(ret & BDRV_BLOCK_DATA)) {
> +                        int64_t pnum_parent = 0;
> +                        retp = bdrv_block_status_above(parent, NULL, offset,
> +                                              ssize - offset, &pnum_parent, NULL,
> +                                              NULL);
> +                        /* If the parent continuous block is smaller, use that pnum,
> +                         * so the next iteration starts with the smallest offset.
> +                         */
> +                        if (pnum_parent < pnum) {
> +                            pnum = pnum_parent;
> +                        }
> +                }
> +                if (ret & BDRV_BLOCK_ZERO && !parent && !(parent && s->discard_no_unref)) {
>                       /* Skip zero regions (safe with no backing file) */
> -                } else if ((ret & (BDRV_BLOCK_DATA | BDRV_BLOCK_ALLOCATED)) ==
> -                           (BDRV_BLOCK_DATA | BDRV_BLOCK_ALLOCATED)) {
> +                } else if (((ret & (BDRV_BLOCK_DATA | BDRV_BLOCK_ALLOCATED)) ==
> +                            (BDRV_BLOCK_DATA | BDRV_BLOCK_ALLOCATED)) ||
> +                           (retp & BDRV_BLOCK_OFFSET_VALID)) {
>                       /* Extend pnum to end of cluster for next iteration */
>                       pnum = ROUND_UP(offset + pnum, cluster_size) - offset;
>

Patch

diff --git a/block/qcow2.c b/block/qcow2.c
index 956128b409..50354e5b98 100644
--- a/block/qcow2.c
+++ b/block/qcow2.c
@@ -5163,9 +5163,16 @@  static BlockMeasureInfo *qcow2_measure(QemuOpts *opts, BlockDriverState *in_bs,
         } else {
             int64_t offset;
             int64_t pnum = 0;
+            BlockDriverState *parent = bdrv_filter_or_cow_bs(in_bs);
+            BDRVQcow2State *s = NULL;
+
+            if (parent) {
+                s = parent->opaque;
+            }
 
             for (offset = 0; offset < ssize; offset += pnum) {
                 int ret;
+                int retp = 0;
 
                 ret = bdrv_block_status_above(in_bs, NULL, offset,
                                               ssize - offset, &pnum, NULL,
@@ -5176,10 +5183,29 @@  static BlockMeasureInfo *qcow2_measure(QemuOpts *opts, BlockDriverState *in_bs,
                     goto err;
                 }
 
-                if (ret & BDRV_BLOCK_ZERO) {
+                /* If we have a parent in the chain and the current block is not data,
+                 * then we want to check the allocation state of the parent block.
+                 * If it has a valid offset, then we want to include it into
+                 * the calculation, cause blocks with an offset will not be freed when
+                 * committing the top into base with discard-no-unref enabled.
+                 */
+                if (parent && s->discard_no_unref && !(ret & BDRV_BLOCK_DATA)) {
+                        int64_t pnum_parent = 0;
+                        retp = bdrv_block_status_above(parent, NULL, offset,
+                                              ssize - offset, &pnum_parent, NULL,
+                                              NULL);
+                        /* If the parent continuous block is smaller, use that pnum,
+                         * so the next iteration starts with the smallest offset.
+                         */
+                        if (pnum_parent < pnum) {
+                            pnum = pnum_parent;
+                        }
+                }
+                if (ret & BDRV_BLOCK_ZERO && !parent && !(parent && s->discard_no_unref)) {
                     /* Skip zero regions (safe with no backing file) */
-                } else if ((ret & (BDRV_BLOCK_DATA | BDRV_BLOCK_ALLOCATED)) ==
-                           (BDRV_BLOCK_DATA | BDRV_BLOCK_ALLOCATED)) {
+                } else if (((ret & (BDRV_BLOCK_DATA | BDRV_BLOCK_ALLOCATED)) ==
+                            (BDRV_BLOCK_DATA | BDRV_BLOCK_ALLOCATED)) ||
+                           (retp & BDRV_BLOCK_OFFSET_VALID)) {
                     /* Extend pnum to end of cluster for next iteration */
                     pnum = ROUND_UP(offset + pnum, cluster_size) - offset;
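
A note on reading the new zero-skip condition in the hunk above: as written, `!parent && !(parent && s->discard_no_unref)` appears to reduce to `!parent`, because the second term is always true once `parent` is NULL. The sketch below checks that by enumerating all inputs. The helper names are hypothetical and the three booleans stand in for `ret & BDRV_BLOCK_ZERO`, `parent != NULL` and `s->discard_no_unref`. This is a reading of the expression only, not a statement about the author's intent.

```c
#include <stdbool.h>

/* The zero-skip condition as written in the patch (hypothetical helper). */
static bool sketch_skip_zero_as_written(bool zero, bool parent, bool no_unref)
{
    return zero && !parent && !(parent && no_unref);
}

/* The same condition with the redundant term dropped. */
static bool sketch_skip_zero_reduced(bool zero, bool parent)
{
    return zero && !parent;
}

/* Returns true if both forms agree for all eight input combinations. */
static bool sketch_conditions_agree(void)
{
    for (int bits = 0; bits < 8; bits++) {
        bool zero = (bits & 1) != 0;
        bool parent = (bits & 2) != 0;
        bool no_unref = (bits & 4) != 0;

        if (sketch_skip_zero_as_written(zero, parent, no_unref) !=
            sketch_skip_zero_reduced(zero, parent)) {
            return false;
        }
    }
    return true;
}
```

If the two forms agree, then under this patch zero regions are skipped only when the input image has no backing or filter child, regardless of the discard-no-unref setting.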