Message ID | 56F0EFCA.4080003@linux.vnet.ibm.com (mailing list archive)
---|---
State | New, archived
On Tue, 03/22 15:10, tu bo wrote:
> Hi Fam:
>
> On 03/21/2016 06:57 PM, Fam Zheng wrote:
> > On Thu, 03/17 19:03, tu bo wrote:
> >>
> >> On 03/17/2016 08:39 AM, Fam Zheng wrote:
> >>> On Wed, 03/16 14:45, Paolo Bonzini wrote:
> >>>>
> >>>> On 16/03/2016 14:38, Christian Borntraeger wrote:
> >>>>>> If you just remove the calls to virtio_queue_host_notifier_read, here
> >>>>>> and in virtio_queue_aio_set_host_notifier_fd_handler, does it work
> >>>>>> (keeping patches 2-4 in)?
> >>>>>
> >>>>> With these changes and patches 2-4 it no longer locks up.
> >>>>> I'll keep it running for some hours to check whether a crash happens.
> >>>>>
> >>>>> Tu Bo, your setup is currently better suited for reproducing. Can you
> >>>>> also check?
> >>>>
> >>>> Great, I'll then prepare a patch to virtio sketching the solution that
> >>>> Conny agreed with.
> >>>>
> >>>> While Fam and I agreed that patch 1 is not required, I'm not sure if the
> >>>> mutex is necessary in the end.
> >>>
> >>> If we can fix this from the virtio_queue_host_notifier_read side, the
> >>> mutex/BH are not necessary; but OTOH the mutex does catch such bugs, so
> >>> maybe it's good to have it. I'm not sure about the BH.
> >>>
> >>> And in hindsight I realize we don't want patches 2-3 either. Actually the
> >>> begin/end pair won't work as expected because of blk_set_aio_context.
> >>>
> >>> Let's hold off on this series.
> >>>
> >>>>
> >>>> So if Tu Bo can check without the virtio_queue_host_notifier_read calls,
> >>>> and both with/without Fam's patches, it would be great.
> >>>
> >>> Tu Bo, only with/without patch 4, if you want to check. Sorry for the noise.
> >>>
> >> 1. without the virtio_queue_host_notifier_read calls, without patch 4
> >>
> >> crash happens very often,
> >>
> >> (gdb) bt
> >> #0  bdrv_co_do_rw (opaque=0x0) at block/io.c:2172
> >> #1  0x000002aa165da37e in coroutine_trampoline (i0=<optimized out>,
> >>     i1=1812051552) at util/coroutine-ucontext.c:79
> >> #2  0x000003ff7dd5150a in __makecontext_ret () from /lib64/libc.so.6
> >>
> >> 2. without the virtio_queue_host_notifier_read calls, with patch 4
> >>
> >> crash happens very often,
> >>
> >> (gdb) bt
> >> #0  bdrv_co_do_rw (opaque=0x0) at block/io.c:2172
> >> #1  0x000002aa39dda43e in coroutine_trampoline (i0=<optimized out>,
> >>     i1=-1677715600) at util/coroutine-ucontext.c:79
> >> #2  0x000003ffab6d150a in __makecontext_ret () from /lib64/libc.so.6
> >>
> >
> > Tu Bo,
> >
> > Could you help test this patch (on top of master, without patch 4)?
> >
> > diff --git a/hw/virtio/virtio.c b/hw/virtio/virtio.c
> > index 08275a9..47f8043 100644
> > --- a/hw/virtio/virtio.c
> > +++ b/hw/virtio/virtio.c
> > @@ -1098,7 +1098,14 @@ void virtio_queue_notify_vq(VirtQueue *vq)
> >
> >  void virtio_queue_notify(VirtIODevice *vdev, int n)
> >  {
> > -    virtio_queue_notify_vq(&vdev->vq[n]);
> > +    VirtQueue *vq = &vdev->vq[n];
> > +    EventNotifier *n;
> > +    n = virtio_queue_get_host_notifier(vq);
> > +    if (n) {
> > +        event_notifier_set(n);
> > +    } else {
> > +        virtio_queue_notify_vq(vq);
> > +    }
> >  }
> >
> >  uint16_t virtio_queue_vector(VirtIODevice *vdev, int n)
>
> I got a build error as below:
>
> /BUILD/qemu-2.5.50/hw/virtio/virtio.c: In function 'virtio_queue_notify':
> /BUILD/qemu-2.5.50/hw/virtio/virtio.c:1102:20: error: 'n' redeclared
> as different kind of symbol
>      EventNotifier *n;
>                     ^
> /BUILD/qemu-2.5.50/hw/virtio/virtio.c:1099:50: note: previous
> definition of 'n' was here
>  void virtio_queue_notify(VirtIODevice *vdev, int n)
>
> Then I made the following change to your patch:
>
> diff --git a/hw/virtio/virtio.c b/hw/virtio/virtio.c
> index 08275a9..a10da39 100644
> --- a/hw/virtio/virtio.c
> +++ b/hw/virtio/virtio.c
> @@ -1098,7 +1098,14 @@ void virtio_queue_notify_vq(VirtQueue *vq)
>
>  void virtio_queue_notify(VirtIODevice *vdev, int n)
>  {
> -    virtio_queue_notify_vq(&vdev->vq[n]);
> +    VirtQueue *vq = &vdev->vq[n];
> +    EventNotifier *en;
> +    en = virtio_queue_get_host_notifier(vq);
> +    if (en) {
> +        event_notifier_set(en);
> +    } else {
> +        virtio_queue_notify_vq(vq);
> +    }
>  }
>
>  uint16_t virtio_queue_vector(VirtIODevice *vdev, int n)
>
> With qemu master + the modified patch above (without patch 4, without
> Conny's patches), I did NOT get a crash so far. Thanks.

Yes, it was a mistake. Thanks for the testing!

Fam
(re-adding cc:s)

On Tue, 22 Mar 2016 15:18:05 +0800
Fam Zheng <famz@redhat.com> wrote:

> On Tue, 03/22 15:10, tu bo wrote:
> [...]
> > With qemu master + the modified patch above (without patch 4, without
> > Conny's patches), I did NOT get a crash so far. Thanks.
>
> Yes, it was a mistake. Thanks for the testing!

I'm wondering what we learn from this. Bypassing notify_vq() removes the
race, but that's not the solution here.

So far, we had the best results with my refactoring + the mutex/bh
change. Two problems:

- We don't really understand yet why my refactoring helps, but passing
  the assign value is a good candidate (and it's agreed that this fixes
  a bug, I think.)
- There's some problem with the bh, if I understood Stefan correctly.
On 22/03/2016 10:07, Cornelia Huck wrote:
> So far, we had the best results with my refactoring + the mutex/bh
> change. Two problems:
>
> - We don't really understand yet why my refactoring helps, but passing
>   the assign value is a good candidate (and it's agreed that this fixes
>   a bug, I think.)
> - There's some problem with the bh, if I understood Stefan correctly.

They can be fixed with just an extra object_ref/object_unref.

I didn't understand that Tu Bo also needed the BH fix, but with that
information it makes sense. Passing the assign value ensures that the
ioeventfd remains always assigned. With the CPU threads out of the
picture, the BH becomes enough to make everything thread-safe.

Paolo
On Tue, 22 Mar 2016 10:46:58 +0100
Paolo Bonzini <pbonzini@redhat.com> wrote:

> They can be fixed with just an extra object_ref/object_unref.
>
> I didn't understand that Tu Bo also needed the BH fix, but with that
> information it makes sense. Passing the assign value ensures that the
> ioeventfd remains always assigned. With the CPU threads out of the
> picture, the BH becomes enough to make everything thread-safe.

Yes, this makes sense.

Might we still have a hole somewhere in dataplane teardown? Probably
not, from reading the code, even if it runs in cpu thread context.
On 22/03/2016 12:59, Cornelia Huck wrote:
> Yes, this makes sense.
>
> Might we still have a hole somewhere in dataplane teardown? Probably
> not, from reading the code, even if it runs in cpu thread context.

The bug arises when the main thread sets started = true, a CPU thread
comes along while the ioeventfd is reset, and as soon as the BQL is
released by the main thread the CPU thread thinks it is a dataplane
thread. This does horrible things to non-reentrant code. For stop we
should be safe because the CPU thread is the one that sets
started = false.

IOW, we should be safe as long as the ioeventfd is never unassigned
(your fix) _and_ we ensure serialization between threads that touch
dataplane_started (Fam's fix).

Paolo
On Tue, 22 Mar 2016 13:11:05 +0100
Paolo Bonzini <pbonzini@redhat.com> wrote:

> The bug arises when the main thread sets started = true, a CPU thread
> comes along while the ioeventfd is reset, and as soon as the BQL is
> released by the main thread the CPU thread thinks it is a dataplane
> thread. This does horrible things to non-reentrant code. For stop we
> should be safe because the CPU thread is the one that sets
> started = false.
>
> IOW, we should be safe as long as the ioeventfd is never unassigned
> (your fix) _and_ we ensure serialization between threads that touch
> dataplane_started (Fam's fix).

We should really add something like that explanation to the changelog so
that future generations may understand what's going on here :)

So, what do we do for 2.6? A respin of Fam's fix + my refactoring (with
some interface doc added)? I'd still like some reviews and maybe a test
on virtio-mmio.
diff --git a/hw/virtio/virtio.c b/hw/virtio/virtio.c
index 08275a9..a10da39 100644
--- a/hw/virtio/virtio.c
+++ b/hw/virtio/virtio.c
@@ -1098,7 +1098,14 @@ void virtio_queue_notify_vq(VirtQueue *vq)
 
 void virtio_queue_notify(VirtIODevice *vdev, int n)
 {
-    virtio_queue_notify_vq(&vdev->vq[n]);
+    VirtQueue *vq = &vdev->vq[n];
+    EventNotifier *en;
+    en = virtio_queue_get_host_notifier(vq);
+    if (en) {
+        event_notifier_set(en);
+    } else {
+        virtio_queue_notify_vq(vq);
+    }
 }
 
 uint16_t virtio_queue_vector(VirtIODevice *vdev, int n)