diff mbox series

[v2] ide: Increment BB in-flight counter for TRIM BH

Message ID 20220120142259.120189-1-hreitz@redhat.com (mailing list archive)
State New, archived
Headers show
Series [v2] ide: Increment BB in-flight counter for TRIM BH | expand

Commit Message

Hanna Czenczek Jan. 20, 2022, 2:22 p.m. UTC
When we still have an AIOCB registered for DMA operations, we try to
settle the respective operation by draining the BlockBackend associated
with the IDE device.

However, this assumes that every DMA operation is associated with an
increment of the BlockBackend’s in-flight counter (e.g. through some
ongoing I/O operation), so that draining the BB until its in-flight
counter reaches 0 will settle all DMA operations.  That is not the case:
For TRIM, the guest can issue a zero-length operation that will not
result in any I/O operation forwarded to the BlockBackend, and also not
increment the in-flight counter in any other way.  In such a case,
blk_drain() will be a no-op if no other operations are in flight.

It is clear that if blk_drain() is a no-op, the value of
s->bus->dma->aiocb will not change between checking it in the `if`
condition and asserting that it is NULL after blk_drain().

The particular problem is that ide_issue_trim() creates a BH
(ide_trim_bh_cb()) to settle the TRIM request: iocb->common.cb() is
ide_dma_cb(), which will either create a new request, or find the
transfer to be done and call ide_set_inactive(), which clears
s->bus->dma->aiocb.  Therefore, the blk_drain() must wait for
ide_trim_bh_cb() to run, which currently it will not always do.

To fix this issue, we increment the BlockBackend's in-flight counter
when the TRIM operation begins (in ide_issue_trim(), when the
ide_trim_bh_cb() BH is created) and decrement it when ide_trim_bh_cb()
is done.

Buglink: https://bugzilla.redhat.com/show_bug.cgi?id=2029980
Suggested-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Hanna Reitz <hreitz@redhat.com>
---
v1:
https://lists.nongnu.org/archive/html/qemu-block/2022-01/msg00024.html

v2:
- Increment BB’s in-flight counter while the BH is active so that
  blk_drain() will poll until the BH is done, as suggested by Paolo

(No git-backport-diff, because this patch was basically completely
rewritten, so it wouldn’t be worth it.)
---
 hw/ide/core.c | 7 +++++++
 1 file changed, 7 insertions(+)

Comments

John Snow Jan. 21, 2022, 6:47 p.m. UTC | #1
On Thu, Jan 20, 2022 at 9:23 AM Hanna Reitz <hreitz@redhat.com> wrote:
>
> When we still have an AIOCB registered for DMA operations, we try to
> settle the respective operation by draining the BlockBackend associated
> with the IDE device.
>
> However, this assumes that every DMA operation is associated with an
> increment of the BlockBackend’s in-flight counter (e.g. through some
> ongoing I/O operation), so that draining the BB until its in-flight
> counter reaches 0 will settle all DMA operations.  That is not the case:
> For TRIM, the guest can issue a zero-length operation that will not
> result in any I/O operation forwarded to the BlockBackend, and also not
> increment the in-flight counter in any other way.  In such a case,
> blk_drain() will be a no-op if no other operations are in flight.
>
> It is clear that if blk_drain() is a no-op, the value of
> s->bus->dma->aiocb will not change between checking it in the `if`
> condition and asserting that it is NULL after blk_drain().
>
> The particular problem is that ide_issue_trim() creates a BH
> (ide_trim_bh_cb()) to settle the TRIM request: iocb->common.cb() is
> ide_dma_cb(), which will either create a new request, or find the
> transfer to be done and call ide_set_inactive(), which clears
> s->bus->dma->aiocb.  Therefore, the blk_drain() must wait for
> ide_trim_bh_cb() to run, which currently it will not always do.
>
> To fix this issue, we increment the BlockBackend's in-flight counter
> when the TRIM operation begins (in ide_issue_trim(), when the
> ide_trim_bh_cb() BH is created) and decrement it when ide_trim_bh_cb()
> is done.
>
> Buglink: https://bugzilla.redhat.com/show_bug.cgi?id=2029980
> Suggested-by: Paolo Bonzini <pbonzini@redhat.com>
> Signed-off-by: Hanna Reitz <hreitz@redhat.com>
> ---
> v1:
> https://lists.nongnu.org/archive/html/qemu-block/2022-01/msg00024.html
>
> v2:
> - Increment BB’s in-flight counter while the BH is active so that
>   blk_drain() will poll until the BH is done, as suggested by Paolo
>
> (No git-backport-diff, because this patch was basically completely
> rewritten, so it wouldn’t be worth it.)
> ---
>  hw/ide/core.c | 7 +++++++
>  1 file changed, 7 insertions(+)
>
> diff --git a/hw/ide/core.c b/hw/ide/core.c
> index e28f8aad61..15138225be 100644
> --- a/hw/ide/core.c
> +++ b/hw/ide/core.c
> @@ -433,12 +433,16 @@ static const AIOCBInfo trim_aiocb_info = {
>  static void ide_trim_bh_cb(void *opaque)
>  {
>      TrimAIOCB *iocb = opaque;
> +    BlockBackend *blk = iocb->s->blk;
>
>      iocb->common.cb(iocb->common.opaque, iocb->ret);
>
>      qemu_bh_delete(iocb->bh);
>      iocb->bh = NULL;
>      qemu_aio_unref(iocb);
> +
> +    /* Paired with an increment in ide_issue_trim() */
> +    blk_dec_in_flight(blk);
>  }
>
>  static void ide_issue_trim_cb(void *opaque, int ret)
> @@ -508,6 +512,9 @@ BlockAIOCB *ide_issue_trim(
>      IDEState *s = opaque;
>      TrimAIOCB *iocb;
>
> +    /* Paired with a decrement in ide_trim_bh_cb() */
> +    blk_inc_in_flight(s->blk);
> +
>      iocb = blk_aio_get(&trim_aiocb_info, s->blk, cb, cb_opaque);
>      iocb->s = s;
>      iocb->bh = qemu_bh_new(ide_trim_bh_cb, iocb);
> --
> 2.34.1
>

Oh, this *wasn't* the same bug I thought it was.

There's no regression test, but I will trust you (and Paolo) that this
solves the bug you were seeing. It makes sense.

Reviewed-by: John Snow <jsnow@redhat.com>
Tested-by: John Snow <jsnow@redhat.com>
Hanna Czenczek Jan. 24, 2022, 9:06 a.m. UTC | #2
On 21.01.22 19:47, John Snow wrote:
> On Thu, Jan 20, 2022 at 9:23 AM Hanna Reitz <hreitz@redhat.com> wrote:
>> When we still have an AIOCB registered for DMA operations, we try to
>> settle the respective operation by draining the BlockBackend associated
>> with the IDE device.
>>
>> However, this assumes that every DMA operation is associated with an
>> increment of the BlockBackend’s in-flight counter (e.g. through some
>> ongoing I/O operation), so that draining the BB until its in-flight
>> counter reaches 0 will settle all DMA operations.  That is not the case:
>> For TRIM, the guest can issue a zero-length operation that will not
>> result in any I/O operation forwarded to the BlockBackend, and also not
>> increment the in-flight counter in any other way.  In such a case,
>> blk_drain() will be a no-op if no other operations are in flight.
>>
>> It is clear that if blk_drain() is a no-op, the value of
>> s->bus->dma->aiocb will not change between checking it in the `if`
>> condition and asserting that it is NULL after blk_drain().
>>
>> The particular problem is that ide_issue_trim() creates a BH
>> (ide_trim_bh_cb()) to settle the TRIM request: iocb->common.cb() is
>> ide_dma_cb(), which will either create a new request, or find the
>> transfer to be done and call ide_set_inactive(), which clears
>> s->bus->dma->aiocb.  Therefore, the blk_drain() must wait for
>> ide_trim_bh_cb() to run, which currently it will not always do.
>>
>> To fix this issue, we increment the BlockBackend's in-flight counter
>> when the TRIM operation begins (in ide_issue_trim(), when the
>> ide_trim_bh_cb() BH is created) and decrement it when ide_trim_bh_cb()
>> is done.
>>
>> Buglink: https://bugzilla.redhat.com/show_bug.cgi?id=2029980
>> Suggested-by: Paolo Bonzini <pbonzini@redhat.com>
>> Signed-off-by: Hanna Reitz <hreitz@redhat.com>
>> ---
>> v1:
>> https://lists.nongnu.org/archive/html/qemu-block/2022-01/msg00024.html
>>
>> v2:
>> - Increment BB’s in-flight counter while the BH is active so that
>>    blk_drain() will poll until the BH is done, as suggested by Paolo
>>
>> (No git-backport-diff, because this patch was basically completely
>> rewritten, so it wouldn’t be worth it.)
>> ---
>>   hw/ide/core.c | 7 +++++++
>>   1 file changed, 7 insertions(+)
>>
>> diff --git a/hw/ide/core.c b/hw/ide/core.c
>> index e28f8aad61..15138225be 100644
>> --- a/hw/ide/core.c
>> +++ b/hw/ide/core.c
>> @@ -433,12 +433,16 @@ static const AIOCBInfo trim_aiocb_info = {
>>   static void ide_trim_bh_cb(void *opaque)
>>   {
>>       TrimAIOCB *iocb = opaque;
>> +    BlockBackend *blk = iocb->s->blk;
>>
>>       iocb->common.cb(iocb->common.opaque, iocb->ret);
>>
>>       qemu_bh_delete(iocb->bh);
>>       iocb->bh = NULL;
>>       qemu_aio_unref(iocb);
>> +
>> +    /* Paired with an increment in ide_issue_trim() */
>> +    blk_dec_in_flight(blk);
>>   }
>>
>>   static void ide_issue_trim_cb(void *opaque, int ret)
>> @@ -508,6 +512,9 @@ BlockAIOCB *ide_issue_trim(
>>       IDEState *s = opaque;
>>       TrimAIOCB *iocb;
>>
>> +    /* Paired with a decrement in ide_trim_bh_cb() */
>> +    blk_inc_in_flight(s->blk);
>> +
>>       iocb = blk_aio_get(&trim_aiocb_info, s->blk, cb, cb_opaque);
>>       iocb->s = s;
>>       iocb->bh = qemu_bh_new(ide_trim_bh_cb, iocb);
>> --
>> 2.34.1
>>
> Oh, this *wasn't* the same bug I thought it was.
>
> There's no regression test, but I will trust you (and Paolo) that this
> solves the bug you were seeing. It makes sense.

There is one in the BZ linked, but I don’t know where we’d put it into 
the qemu tree...  I’ve explained in v1 
(https://lists.nongnu.org/archive/html/qemu-block/2022-01/msg00024.html) 
how I didn’t find a way to write a qtest for this, and so resorted to 
writing boot sector code to reproduce the assertion failure. Now, we 
could put that as a sample image into the iotests, but that’d just be... 
wrong.  (Is there a place where something like this would belong?)

> Reviewed-by: John Snow <jsnow@redhat.com>
> Tested-by: John Snow <jsnow@redhat.com>

Thanks!

Hanna
Paolo Bonzini Jan. 24, 2022, 9:55 a.m. UTC | #3
On 1/20/22 15:22, Hanna Reitz wrote:
> 
> Buglink:https://bugzilla.redhat.com/show_bug.cgi?id=2029980
> Suggested-by: Paolo Bonzini<pbonzini@redhat.com>
> Signed-off-by: Hanna Reitz<hreitz@redhat.com>
> ---
> v1:
> https://lists.nongnu.org/archive/html/qemu-block/2022-01/msg00024.html
> 
> v2:
> - Increment BB’s in-flight counter while the BH is active so that
>    blk_drain() will poll until the BH is done, as suggested by Paolo
> 
> (No git-backport-diff, because this patch was basically completely
> rewritten, so it wouldn’t be worth it.)

FWIW you can put a

Supersedes: <20220105111337.10366-1-hreitz@redhat.com>

anywhere in the message, and patchew will gladly produce the crossdiff 
link 
https://patchew.org/QEMU/20220105111337.10366-1-hreitz@redhat.com/diff/20220120142259.120189-1-hreitz@redhat.com/ 
for disbelievers---or for those who want to compare the commit messages.

Reviewed-by: Paolo Bonzini <pbonzini@redhat.com>

Paolo
John Snow Jan. 24, 2022, 6:41 p.m. UTC | #4
On Mon, Jan 24, 2022 at 4:06 AM Hanna Reitz <hreitz@redhat.com> wrote:
>
> On 21.01.22 19:47, John Snow wrote:
> >
> > There's no regression test, but I will trust you (and Paolo) that this
> > solves the bug you were seeing. It makes sense.
>
> There is one in the BZ linked, but I don’t know where we’d put it into
> the qemu tree...  I’ve explained in v1
> (https://lists.nongnu.org/archive/html/qemu-block/2022-01/msg00024.html)
> how I didn’t find a way to write a qtest for this, and so resorted to
> writing boot sector code to reproduce the assertion failure. Now, we
> could put that as a sample image into the iotests, but that’d just be...
> wrong.  (Is there a place where something like this would belong?)
>

No idea. I guess it'd be more of an avacado-test level thing, but I'm
not sure I know how to do it quickly.  I'm worried there's lots of
little things like this that'd be nice to test against, but "where do
we put this" is a recurring problem.

(Definitely not insisting on this being solved, and also wise enough
to not want to volunteer.)

> > Reviewed-by: John Snow <jsnow@redhat.com>
> > Tested-by: John Snow <jsnow@redhat.com>
>
> Thanks!
>
> Hanna
>

--js
Hanna Czenczek Feb. 15, 2022, 5:13 p.m. UTC | #5
Ping

(I can take it too, if you’d like, John, but you’re listed as the only 
maintainer for hw/ide, so...  Just say the word, though!)

On 20.01.22 15:22, Hanna Reitz wrote:
> When we still have an AIOCB registered for DMA operations, we try to
> settle the respective operation by draining the BlockBackend associated
> with the IDE device.
>
> However, this assumes that every DMA operation is associated with an
> increment of the BlockBackend’s in-flight counter (e.g. through some
> ongoing I/O operation), so that draining the BB until its in-flight
> counter reaches 0 will settle all DMA operations.  That is not the case:
> For TRIM, the guest can issue a zero-length operation that will not
> result in any I/O operation forwarded to the BlockBackend, and also not
> increment the in-flight counter in any other way.  In such a case,
> blk_drain() will be a no-op if no other operations are in flight.
>
> It is clear that if blk_drain() is a no-op, the value of
> s->bus->dma->aiocb will not change between checking it in the `if`
> condition and asserting that it is NULL after blk_drain().
>
> The particular problem is that ide_issue_trim() creates a BH
> (ide_trim_bh_cb()) to settle the TRIM request: iocb->common.cb() is
> ide_dma_cb(), which will either create a new request, or find the
> transfer to be done and call ide_set_inactive(), which clears
> s->bus->dma->aiocb.  Therefore, the blk_drain() must wait for
> ide_trim_bh_cb() to run, which currently it will not always do.
>
> To fix this issue, we increment the BlockBackend's in-flight counter
> when the TRIM operation begins (in ide_issue_trim(), when the
> ide_trim_bh_cb() BH is created) and decrement it when ide_trim_bh_cb()
> is done.
>
> Buglink: https://bugzilla.redhat.com/show_bug.cgi?id=2029980
> Suggested-by: Paolo Bonzini <pbonzini@redhat.com>
> Signed-off-by: Hanna Reitz <hreitz@redhat.com>
> ---
> v1:
> https://lists.nongnu.org/archive/html/qemu-block/2022-01/msg00024.html
>
> v2:
> - Increment BB’s in-flight counter while the BH is active so that
>    blk_drain() will poll until the BH is done, as suggested by Paolo
>
> (No git-backport-diff, because this patch was basically completely
> rewritten, so it wouldn’t be worth it.)
> ---
>   hw/ide/core.c | 7 +++++++
>   1 file changed, 7 insertions(+)
John Snow Feb. 17, 2022, 9:02 p.m. UTC | #6
On Tue, Feb 15, 2022 at 12:14 PM Hanna Reitz <hreitz@redhat.com> wrote:
>
> Ping
>
> (I can take it too, if you’d like, John, but you’re listed as the only
> maintainer for hw/ide, so...  Just say the word, though!)
>

Sorry, I sent you a mail off-list at the time where I said you were
free to take it whenever you like. Why'd I send it off-list? I don't
know....

Please feel free to send this with your next block PR.

--js

> On 20.01.22 15:22, Hanna Reitz wrote:
> > When we still have an AIOCB registered for DMA operations, we try to
> > settle the respective operation by draining the BlockBackend associated
> > with the IDE device.
> >
> > However, this assumes that every DMA operation is associated with an
> > increment of the BlockBackend’s in-flight counter (e.g. through some
> > ongoing I/O operation), so that draining the BB until its in-flight
> > counter reaches 0 will settle all DMA operations.  That is not the case:
> > For TRIM, the guest can issue a zero-length operation that will not
> > result in any I/O operation forwarded to the BlockBackend, and also not
> > increment the in-flight counter in any other way.  In such a case,
> > blk_drain() will be a no-op if no other operations are in flight.
> >
> > It is clear that if blk_drain() is a no-op, the value of
> > s->bus->dma->aiocb will not change between checking it in the `if`
> > condition and asserting that it is NULL after blk_drain().
> >
> > The particular problem is that ide_issue_trim() creates a BH
> > (ide_trim_bh_cb()) to settle the TRIM request: iocb->common.cb() is
> > ide_dma_cb(), which will either create a new request, or find the
> > transfer to be done and call ide_set_inactive(), which clears
> > s->bus->dma->aiocb.  Therefore, the blk_drain() must wait for
> > ide_trim_bh_cb() to run, which currently it will not always do.
> >
> > To fix this issue, we increment the BlockBackend's in-flight counter
> > when the TRIM operation begins (in ide_issue_trim(), when the
> > ide_trim_bh_cb() BH is created) and decrement it when ide_trim_bh_cb()
> > is done.
> >
> > Buglink: https://bugzilla.redhat.com/show_bug.cgi?id=2029980
> > Suggested-by: Paolo Bonzini <pbonzini@redhat.com>
> > Signed-off-by: Hanna Reitz <hreitz@redhat.com>
> > ---
> > v1:
> > https://lists.nongnu.org/archive/html/qemu-block/2022-01/msg00024.html
> >
> > v2:
> > - Increment BB’s in-flight counter while the BH is active so that
> >    blk_drain() will poll until the BH is done, as suggested by Paolo
> >
> > (No git-backport-diff, because this patch was basically completely
> > rewritten, so it wouldn’t be worth it.)
> > ---
> >   hw/ide/core.c | 7 +++++++
> >   1 file changed, 7 insertions(+)
>
Hanna Czenczek Feb. 21, 2022, 8:05 a.m. UTC | #7
On 17.02.22 22:02, John Snow wrote:
> On Tue, Feb 15, 2022 at 12:14 PM Hanna Reitz <hreitz@redhat.com> wrote:
>> Ping
>>
>> (I can take it too, if you’d like, John, but you’re listed as the only
>> maintainer for hw/ide, so...  Just say the word, though!)
>>
> Sorry, I sent you a mail off-list at the time where I said you were
> free to take it whenever you like. Why'd I send it off-list? I don't
> know....
>
> Please feel free to send this with your next block PR.

Oh, oops.  You’re right.  I seemed to remember you saying something 
along those lines, but when I tried to support my memory with some 
factual email, I didn’t immediately see it, because I sorted that 
off-list mail somewhere else than where I was looking...

Thanks, will do!

Hanna
diff mbox series

Patch

diff --git a/hw/ide/core.c b/hw/ide/core.c
index e28f8aad61..15138225be 100644
--- a/hw/ide/core.c
+++ b/hw/ide/core.c
@@ -433,12 +433,16 @@  static const AIOCBInfo trim_aiocb_info = {
 static void ide_trim_bh_cb(void *opaque)
 {
     TrimAIOCB *iocb = opaque;
+    BlockBackend *blk = iocb->s->blk;
 
     iocb->common.cb(iocb->common.opaque, iocb->ret);
 
     qemu_bh_delete(iocb->bh);
     iocb->bh = NULL;
     qemu_aio_unref(iocb);
+
+    /* Paired with an increment in ide_issue_trim() */
+    blk_dec_in_flight(blk);
 }
 
 static void ide_issue_trim_cb(void *opaque, int ret)
@@ -508,6 +512,9 @@  BlockAIOCB *ide_issue_trim(
     IDEState *s = opaque;
     TrimAIOCB *iocb;
 
+    /* Paired with a decrement in ide_trim_bh_cb() */
+    blk_inc_in_flight(s->blk);
+
     iocb = blk_aio_get(&trim_aiocb_info, s->blk, cb, cb_opaque);
     iocb->s = s;
     iocb->bh = qemu_bh_new(ide_trim_bh_cb, iocb);