diff mbox series

[2/2] block/rbd: workaround for ceph issue #53784

Message ID 20220110114154.3774072-3-pl@kamp.de (mailing list archive)
State New, archived
Headers show
Series block/rbd: fixes for bdrv_co_block_status | expand

Commit Message

Peter Lieven Jan. 10, 2022, 11:41 a.m. UTC
librbd had a bug until early 2022 that affected all versions of ceph that
supported fast-diff. This bug results in reporting of incorrect offsets
if the offset parameter to rbd_diff_iterate2 is not object aligned.
Work around this bug by rounding down the offset to object boundaries.

Fixes: https://tracker.ceph.com/issues/53784
Cc: qemu-stable@nongnu.org
Signed-off-by: Peter Lieven <pl@kamp.de>
---
 block/rbd.c | 17 ++++++++++++++++-
 1 file changed, 16 insertions(+), 1 deletion(-)

Comments

Stefano Garzarella Jan. 10, 2022, 2:18 p.m. UTC | #1
On Mon, Jan 10, 2022 at 12:41:54PM +0100, Peter Lieven wrote:
>librbd had a bug until early 2022 that affected all versions of ceph that
>supported fast-diff. This bug results in reporting of incorrect offsets
>if the offset parameter to rbd_diff_iterate2 is not object aligned.
>Work around this bug by rounding down the offset to object boundaries.
>
>Fixes: https://tracker.ceph.com/issues/53784
>Cc: qemu-stable@nongnu.org
>Signed-off-by: Peter Lieven <pl@kamp.de>
>---
> block/rbd.c | 17 ++++++++++++++++-
> 1 file changed, 16 insertions(+), 1 deletion(-)
>
>diff --git a/block/rbd.c b/block/rbd.c
>index 5e9dc91d81..260cb9f4b4 100644
>--- a/block/rbd.c
>+++ b/block/rbd.c
>@@ -1333,6 +1333,7 @@ static int coroutine_fn qemu_rbd_co_block_status(BlockDriverState *bs,
>     int status, r;
>     RBDDiffIterateReq req = { .offs = offset };
>     uint64_t features, flags;
>+    int64_t head;
>
>     assert(offset + bytes <= s->image_size);
>
>@@ -1360,6 +1361,19 @@ static int coroutine_fn qemu_rbd_co_block_status(BlockDriverState *bs,
>         return status;
>     }
>
>+    /*
>+     * librbd had a bug until early 2022 that affected all versions of ceph that
>+     * supported fast-diff. This bug results in reporting of incorrect offsets
>+     * if the offset parameter to rbd_diff_iterate2 is not object aligned.
>+     * Work around this bug by rounding down the offset to object boundaries.
>+     *
>+     * See: https://tracker.ceph.com/issues/53784
>+     */
>+    head = offset & (s->object_size - 1);
>+    offset -= head;
>+    req.offs -= head;
>+    bytes += head;
>+
>     r = rbd_diff_iterate2(s->image, NULL, offset, bytes, true, true,
>                           qemu_rbd_diff_iterate_cb, &req);
>     if (r < 0 && r != QEMU_RBD_EXIT_DIFF_ITERATE2) {
>@@ -1379,7 +1393,8 @@ static int coroutine_fn qemu_rbd_co_block_status(BlockDriverState *bs,
>         status = BDRV_BLOCK_ZERO | BDRV_BLOCK_OFFSET_VALID;
>     }
>
>-    *pnum = req.bytes;
>+    assert(req.bytes > head);
>+    *pnum = req.bytes - head;
>     return status;
> }

Thanks for the workaround!

I just tested this patch for the issue reported in this BZ [1] and the 
test now works correctly!

Tested-by: Stefano Garzarella <sgarzare@redhat.com>

[1] https://bugzilla.redhat.com/show_bug.cgi?id=2034791
Peter Lieven Jan. 11, 2022, 9:10 a.m. UTC | #2
Am 10.01.22 um 15:18 schrieb Stefano Garzarella:
> On Mon, Jan 10, 2022 at 12:41:54PM +0100, Peter Lieven wrote:
>> librbd had a bug until early 2022 that affected all versions of ceph that
>> supported fast-diff. This bug results in reporting of incorrect offsets
>> if the offset parameter to rbd_diff_iterate2 is not object aligned.
>> Work around this bug by rounding down the offset to object boundaries.
>>
>> Fixes: https://tracker.ceph.com/issues/53784
>> Cc: qemu-stable@nongnu.org
>> Signed-off-by: Peter Lieven <pl@kamp.de>
>> ---
>> block/rbd.c | 17 ++++++++++++++++-
>> 1 file changed, 16 insertions(+), 1 deletion(-)
>>
>> diff --git a/block/rbd.c b/block/rbd.c
>> index 5e9dc91d81..260cb9f4b4 100644
>> --- a/block/rbd.c
>> +++ b/block/rbd.c
>> @@ -1333,6 +1333,7 @@ static int coroutine_fn qemu_rbd_co_block_status(BlockDriverState *bs,
>>     int status, r;
>>     RBDDiffIterateReq req = { .offs = offset };
>>     uint64_t features, flags;
>> +    int64_t head;
>>
>>     assert(offset + bytes <= s->image_size);
>>
>> @@ -1360,6 +1361,19 @@ static int coroutine_fn qemu_rbd_co_block_status(BlockDriverState *bs,
>>         return status;
>>     }
>>
>> +    /*
>> +     * librbd had a bug until early 2022 that affected all versions of ceph that
>> +     * supported fast-diff. This bug results in reporting of incorrect offsets
>> +     * if the offset parameter to rbd_diff_iterate2 is not object aligned.
>> +     * Work around this bug by rounding down the offset to object boundaries.
>> +     *
>> +     * See: https://tracker.ceph.com/issues/53784
>> +     */
>> +    head = offset & (s->object_size - 1);
>> +    offset -= head;
>> +    req.offs -= head;
>> +    bytes += head;
>> +
>>     r = rbd_diff_iterate2(s->image, NULL, offset, bytes, true, true,
>>                           qemu_rbd_diff_iterate_cb, &req);
>>     if (r < 0 && r != QEMU_RBD_EXIT_DIFF_ITERATE2) {
>> @@ -1379,7 +1393,8 @@ static int coroutine_fn qemu_rbd_co_block_status(BlockDriverState *bs,
>>         status = BDRV_BLOCK_ZERO | BDRV_BLOCK_OFFSET_VALID;
>>     }
>>
>> -    *pnum = req.bytes;
>> +    assert(req.bytes > head);
>> +    *pnum = req.bytes - head;
>>     return status;
>> }
>
> Thanks for the workaround!
>
> I just tested this patch for the issue reported in this BZ [1] and the test now works correctly!
>
> Tested-by: Stefano Garzarella <sgarzare@redhat.com>
>
> [1] https://bugzilla.redhat.com/show_bug.cgi?id=2034791
>


Hi Stefano,


thanks for the feedback. Please note that you also need the other patch or you will sooner or later run into another assertion as soon as rbd snapshots are involved.


Regarding the workaround I need confirmation from Ilya that it covers all cases. I do not know if it works if striping or EC is configured on the pool.


Best,

Peter
Stefano Garzarella Jan. 11, 2022, 11:15 a.m. UTC | #3
Hi Peter,

On Tue, Jan 11, 2022 at 10:10:16AM +0100, Peter Lieven wrote:
>Hi Stefano,
>
>
>thanks for the feedback. Please note that you also need the other patch 
>or you will sooner or later run into another assertion as soon as rbd 
>snapshots are involved.

Yep, I tested with the entire series applied.
Anyway, thanks for clarifying that.

>
>Regarding the workaround I need confirmation from Ilya that it covers 
>all cases. I do not know if it works if striping or EC is configured on 
>the pool.

Sure :-)

Thanks,
Stefano
Ilya Dryomov Jan. 12, 2022, 9:59 a.m. UTC | #4
On Mon, Jan 10, 2022 at 12:43 PM Peter Lieven <pl@kamp.de> wrote:
>
> librbd had a bug until early 2022 that affected all versions of ceph that
> supported fast-diff. This bug results in reporting of incorrect offsets
> if the offset parameter to rbd_diff_iterate2 is not object aligned.
> Work around this bug by rounding down the offset to object boundaries.
>
> Fixes: https://tracker.ceph.com/issues/53784

I don't think the Fixes tag is appropriate here.  Linking librbd
ticket is fine but this patch doesn't really fix anything.

> Cc: qemu-stable@nongnu.org
> Signed-off-by: Peter Lieven <pl@kamp.de>
> ---
>  block/rbd.c | 17 ++++++++++++++++-
>  1 file changed, 16 insertions(+), 1 deletion(-)
>
> diff --git a/block/rbd.c b/block/rbd.c
> index 5e9dc91d81..260cb9f4b4 100644
> --- a/block/rbd.c
> +++ b/block/rbd.c
> @@ -1333,6 +1333,7 @@ static int coroutine_fn qemu_rbd_co_block_status(BlockDriverState *bs,
>      int status, r;
>      RBDDiffIterateReq req = { .offs = offset };
>      uint64_t features, flags;
> +    int64_t head;
>
>      assert(offset + bytes <= s->image_size);
>
> @@ -1360,6 +1361,19 @@ static int coroutine_fn qemu_rbd_co_block_status(BlockDriverState *bs,
>          return status;
>      }
>
> +    /*
> +     * librbd had a bug until early 2022 that affected all versions of ceph that
> +     * supported fast-diff. This bug results in reporting of incorrect offsets
> +     * if the offset parameter to rbd_diff_iterate2 is not object aligned.
> +     * Work around this bug by rounding down the offset to object boundaries.
> +     *
> +     * See: https://tracker.ceph.com/issues/53784
> +     */
> +    head = offset & (s->object_size - 1);
> +    offset -= head;
> +    req.offs -= head;
> +    bytes += head;

So it looks like the intention is to have more or less a permanent
workaround since all librbd versions are affected, right?  For that,
I think we would need to also reject custom striping patterns and
clones.  For the above to be reliable, the image has to be standalone
and have a default striping pattern (stripe_unit == object_size &&
stripe_count == 1).  Otherwise, behave as if fast-diff is disabled or
invalid.

> +

Nit: I'd replace { .offs = offset } initialization at the top with {}
and assign to req.offs here, right before calling rbd_diff_iterate2().

>      r = rbd_diff_iterate2(s->image, NULL, offset, bytes, true, true,
>                            qemu_rbd_diff_iterate_cb, &req);
>      if (r < 0 && r != QEMU_RBD_EXIT_DIFF_ITERATE2) {
> @@ -1379,7 +1393,8 @@ static int coroutine_fn qemu_rbd_co_block_status(BlockDriverState *bs,
>          status = BDRV_BLOCK_ZERO | BDRV_BLOCK_OFFSET_VALID;
>      }
>
> -    *pnum = req.bytes;
> +    assert(req.bytes > head);

I'd expand the workaround comment with an explanation of why it's OK
to round down the offset -- because rbd_diff_iterate2() is called with
whole_object=true.  If that wasn't the case, on top of inconsistent
results for different offsets within an object, this assert could be
triggered.

Thanks,

                Ilya
Peter Lieven Jan. 12, 2022, 11:55 a.m. UTC | #5
Am 12.01.22 um 10:59 schrieb Ilya Dryomov:
> On Mon, Jan 10, 2022 at 12:43 PM Peter Lieven <pl@kamp.de> wrote:
>> librbd had a bug until early 2022 that affected all versions of ceph that
>> supported fast-diff. This bug results in reporting of incorrect offsets
>> if the offset parameter to rbd_diff_iterate2 is not object aligned.
>> Work around this bug by rounding down the offset to object boundaries.
>>
>> Fixes: https://tracker.ceph.com/issues/53784
> I don't think the Fixes tag is appropriate here.  Linking librbd
> ticket is fine but this patch doesn't really fix anything.


Okay, I will change that to See:


>
>> Cc: qemu-stable@nongnu.org
>> Signed-off-by: Peter Lieven <pl@kamp.de>
>> ---
>>  block/rbd.c | 17 ++++++++++++++++-
>>  1 file changed, 16 insertions(+), 1 deletion(-)
>>
>> diff --git a/block/rbd.c b/block/rbd.c
>> index 5e9dc91d81..260cb9f4b4 100644
>> --- a/block/rbd.c
>> +++ b/block/rbd.c
>> @@ -1333,6 +1333,7 @@ static int coroutine_fn qemu_rbd_co_block_status(BlockDriverState *bs,
>>      int status, r;
>>      RBDDiffIterateReq req = { .offs = offset };
>>      uint64_t features, flags;
>> +    int64_t head;
>>
>>      assert(offset + bytes <= s->image_size);
>>
>> @@ -1360,6 +1361,19 @@ static int coroutine_fn qemu_rbd_co_block_status(BlockDriverState *bs,
>>          return status;
>>      }
>>
>> +    /*
>> +     * librbd had a bug until early 2022 that affected all versions of ceph that
>> +     * supported fast-diff. This bug results in reporting of incorrect offsets
>> +     * if the offset parameter to rbd_diff_iterate2 is not object aligned.
>> +     * Work around this bug by rounding down the offset to object boundaries.
>> +     *
>> +     * See: https://tracker.ceph.com/issues/53784
>> +     */
>> +    head = offset & (s->object_size - 1);
>> +    offset -= head;
>> +    req.offs -= head;
>> +    bytes += head;
> So it looks like the intention is to have more or less a permanent
> workaround since all librbd versions are affected, right?  For that,
> I think we would need to also reject custom striping patterns and
> clones.  For the above to be reliable, the image has to be standalone
> and have a default striping pattern (stripe_unit == object_size &&
> stripe_count == 1).  Otherwise, behave as if fast-diff is disabled or
> invalid.


Do you have a fealing how many users use a different striping pattern than default?

What about EC backed pools?

Do you have another idea how we can detect if the librbd version is broken?


>
>> +
> Nit: I'd replace { .offs = offset } initialization at the top with {}
> and assign to req.offs here, right before calling rbd_diff_iterate2().
>
>>      r = rbd_diff_iterate2(s->image, NULL, offset, bytes, true, true,
>>                            qemu_rbd_diff_iterate_cb, &req);
>>      if (r < 0 && r != QEMU_RBD_EXIT_DIFF_ITERATE2) {
>> @@ -1379,7 +1393,8 @@ static int coroutine_fn qemu_rbd_co_block_status(BlockDriverState *bs,
>>          status = BDRV_BLOCK_ZERO | BDRV_BLOCK_OFFSET_VALID;
>>      }
>>
>> -    *pnum = req.bytes;
>> +    assert(req.bytes > head);
> I'd expand the workaround comment with an explanation of why it's OK
> to round down the offset -- because rbd_diff_iterate2() is called with
> whole_object=true.  If that wasn't the case, on top of inconsistent
> results for different offsets within an object, this assert could be
> triggered.

Sure, you are right. I had this in mind. This also does not change complexity

since we stay with the offset in the same object. I will mention both.


Peter
Ilya Dryomov Jan. 12, 2022, 12:22 p.m. UTC | #6
On Wed, Jan 12, 2022 at 12:55 PM Peter Lieven <pl@kamp.de> wrote:
>
> Am 12.01.22 um 10:59 schrieb Ilya Dryomov:
> > On Mon, Jan 10, 2022 at 12:43 PM Peter Lieven <pl@kamp.de> wrote:
> >> librbd had a bug until early 2022 that affected all versions of ceph that
> >> supported fast-diff. This bug results in reporting of incorrect offsets
> >> if the offset parameter to rbd_diff_iterate2 is not object aligned.
> >> Work around this bug by rounding down the offset to object boundaries.
> >>
> >> Fixes: https://tracker.ceph.com/issues/53784
> > I don't think the Fixes tag is appropriate here.  Linking librbd
> > ticket is fine but this patch doesn't really fix anything.
>
>
> Okay, I will change that to See:

It's already linked in the source code, up to you if you also want to
link it in the description.

>
>
> >
> >> Cc: qemu-stable@nongnu.org
> >> Signed-off-by: Peter Lieven <pl@kamp.de>
> >> ---
> >>  block/rbd.c | 17 ++++++++++++++++-
> >>  1 file changed, 16 insertions(+), 1 deletion(-)
> >>
> >> diff --git a/block/rbd.c b/block/rbd.c
> >> index 5e9dc91d81..260cb9f4b4 100644
> >> --- a/block/rbd.c
> >> +++ b/block/rbd.c
> >> @@ -1333,6 +1333,7 @@ static int coroutine_fn qemu_rbd_co_block_status(BlockDriverState *bs,
> >>      int status, r;
> >>      RBDDiffIterateReq req = { .offs = offset };
> >>      uint64_t features, flags;
> >> +    int64_t head;
> >>
> >>      assert(offset + bytes <= s->image_size);
> >>
> >> @@ -1360,6 +1361,19 @@ static int coroutine_fn qemu_rbd_co_block_status(BlockDriverState *bs,
> >>          return status;
> >>      }
> >>
> >> +    /*
> >> +     * librbd had a bug until early 2022 that affected all versions of ceph that
> >> +     * supported fast-diff. This bug results in reporting of incorrect offsets
> >> +     * if the offset parameter to rbd_diff_iterate2 is not object aligned.
> >> +     * Work around this bug by rounding down the offset to object boundaries.
> >> +     *
> >> +     * See: https://tracker.ceph.com/issues/53784
> >> +     */
> >> +    head = offset & (s->object_size - 1);
> >> +    offset -= head;
> >> +    req.offs -= head;
> >> +    bytes += head;
> > So it looks like the intention is to have more or less a permanent
> > workaround since all librbd versions are affected, right?  For that,
> > I think we would need to also reject custom striping patterns and
> > clones.  For the above to be reliable, the image has to be standalone
> > and have a default striping pattern (stripe_unit == object_size &&
> > stripe_count == 1).  Otherwise, behave as if fast-diff is disabled or
> > invalid.
>
>
> Do you have a fealing how many users use a different striping pattern than default?

Very few.

>
> What about EC backed pools?

In this context EC pools behave exactly the same as replicated pools.

>
> Do you have another idea how we can detect if the librbd version is broken?

No.  Initially I wanted to just fix these bugs in librbd, relying on
the assumption that setups with a new QEMU should also have a fairly
new librbd.  But after looking at various distros and realizing the
extent of rbd_diff_iterate2() issues, I think a long-term workaround
in QEMU makes sense.  A configure-time check for known good versions
of librbd can be added later if someone feels like it.

Thanks,

                Ilya
diff mbox series

Patch

diff --git a/block/rbd.c b/block/rbd.c
index 5e9dc91d81..260cb9f4b4 100644
--- a/block/rbd.c
+++ b/block/rbd.c
@@ -1333,6 +1333,7 @@  static int coroutine_fn qemu_rbd_co_block_status(BlockDriverState *bs,
     int status, r;
     RBDDiffIterateReq req = { .offs = offset };
     uint64_t features, flags;
+    int64_t head;
 
     assert(offset + bytes <= s->image_size);
 
@@ -1360,6 +1361,19 @@  static int coroutine_fn qemu_rbd_co_block_status(BlockDriverState *bs,
         return status;
     }
 
+    /*
+     * librbd had a bug until early 2022 that affected all versions of ceph that
+     * supported fast-diff. This bug results in reporting of incorrect offsets
+     * if the offset parameter to rbd_diff_iterate2 is not object aligned.
+     * Work around this bug by rounding down the offset to object boundaries.
+     *
+     * See: https://tracker.ceph.com/issues/53784
+     */
+    head = offset & (s->object_size - 1);
+    offset -= head;
+    req.offs -= head;
+    bytes += head;
+
     r = rbd_diff_iterate2(s->image, NULL, offset, bytes, true, true,
                           qemu_rbd_diff_iterate_cb, &req);
     if (r < 0 && r != QEMU_RBD_EXIT_DIFF_ITERATE2) {
@@ -1379,7 +1393,8 @@  static int coroutine_fn qemu_rbd_co_block_status(BlockDriverState *bs,
         status = BDRV_BLOCK_ZERO | BDRV_BLOCK_OFFSET_VALID;
     }
 
-    *pnum = req.bytes;
+    assert(req.bytes > head);
+    *pnum = req.bytes - head;
     return status;
 }