diff mbox series

hw/virtio: Add a protection against duplicate vu_scmi_stop calls

Message ID 20230720101037.2161450-1-mzamazal@redhat.com (mailing list archive)
State New, archived
Headers show
Series hw/virtio: Add a protection against duplicate vu_scmi_stop calls | expand

Commit Message

Milan Zamazal July 20, 2023, 10:10 a.m. UTC
The QEMU CI fails in virtio-scmi test occasionally.  As reported by
Thomas Huth, this happens most likely when the system is loaded and it
fails with the following error:

  qemu-system-aarch64: ../../devel/qemu/hw/pci/msix.c:659:
  msix_unset_vector_notifiers: Assertion `dev->msix_vector_use_notifier && dev->msix_vector_release_notifier' failed.
  ../../devel/qemu/tests/qtest/libqtest.c:200: kill_qemu() detected QEMU death from signal 6 (Aborted) (core dumped)

As discovered by Fabiano Rosas, the cause is a duplicate invocation of
msix_unset_vector_notifiers via duplicate vu_scmi_stop calls:

  msix_unset_vector_notifiers
  virtio_pci_set_guest_notifiers
  vu_scmi_stop
  vu_scmi_disconnect
  ...
  qemu_chr_write_buffer

  msix_unset_vector_notifiers
  virtio_pci_set_guest_notifiers
  vu_scmi_stop
  vu_scmi_set_status
  ...
  qemu_cleanup

While vu_scmi_stop calls are protected by vhost_dev_is_started()
check, it's apparently not enough.  vhost-user-blk and vhost-user-gpio
use an extra protection, see f5b22d06fb (vhost: recheck dev state in
the vhost_migration_log routine) for the motivation.  Let's use the
same in vhost-user-scmi, which fixes the failure above.

Fixes: a5dab090e142 ("hw/virtio: Add boilerplate for vhost-user-scmi device")
Signed-off-by: Milan Zamazal <mzamazal@redhat.com>
---
 hw/virtio/vhost-user-scmi.c         | 7 +++++++
 include/hw/virtio/vhost-user-scmi.h | 1 +
 2 files changed, 8 insertions(+)

Comments

Thomas Huth July 20, 2023, 10:55 a.m. UTC | #1
On 20/07/2023 12.10, Milan Zamazal wrote:
> The QEMU CI fails in virtio-scmi test occasionally.  As reported by
> Thomas Huth, this happens most likely when the system is loaded and it
> fails with the following error:
> 
>    qemu-system-aarch64: ../../devel/qemu/hw/pci/msix.c:659:
>    msix_unset_vector_notifiers: Assertion `dev->msix_vector_use_notifier && dev->msix_vector_release_notifier' failed.
>    ../../devel/qemu/tests/qtest/libqtest.c:200: kill_qemu() detected QEMU death from signal 6 (Aborted) (core dumped)
> 
> As discovered by Fabiano Rosas, the cause is a duplicate invocation of
> msix_unset_vector_notifiers via duplicate vu_scmi_stop calls:
> 
>    msix_unset_vector_notifiers
>    virtio_pci_set_guest_notifiers
>    vu_scmi_stop
>    vu_scmi_disconnect
>    ...
>    qemu_chr_write_buffer
> 
>    msix_unset_vector_notifiers
>    virtio_pci_set_guest_notifiers
>    vu_scmi_stop
>    vu_scmi_set_status
>    ...
>    qemu_cleanup
> 
> While vu_scmi_stop calls are protected by vhost_dev_is_started()
> check, it's apparently not enough.  vhost-user-blk and vhost-user-gpio
> use an extra protection, see f5b22d06fb (vhost: recheck dev state in
> the vhost_migration_log routine) for the motivation.  Let's use the
> same in vhost-user-scmi, which fixes the failure above.
> 
> Fixes: a5dab090e142 ("hw/virtio: Add boilerplate for vhost-user-scmi device")
> Signed-off-by: Milan Zamazal <mzamazal@redhat.com>
> ---
>   hw/virtio/vhost-user-scmi.c         | 7 +++++++
>   include/hw/virtio/vhost-user-scmi.h | 1 +
>   2 files changed, 8 insertions(+)

Thanks, that fixes the problem for me!

Tested-by: Thomas Huth <thuth@redhat.com>
Fabiano Rosas July 20, 2023, 12:30 p.m. UTC | #2
Milan Zamazal <mzamazal@redhat.com> writes:

> The QEMU CI fails in virtio-scmi test occasionally.  As reported by
> Thomas Huth, this happens most likely when the system is loaded and it
> fails with the following error:
>
>   qemu-system-aarch64: ../../devel/qemu/hw/pci/msix.c:659:
>   msix_unset_vector_notifiers: Assertion `dev->msix_vector_use_notifier && dev->msix_vector_release_notifier' failed.
>   ../../devel/qemu/tests/qtest/libqtest.c:200: kill_qemu() detected QEMU death from signal 6 (Aborted) (core dumped)
>
> As discovered by Fabiano Rosas, the cause is a duplicate invocation of
> msix_unset_vector_notifiers via duplicate vu_scmi_stop calls:
>
>   msix_unset_vector_notifiers
>   virtio_pci_set_guest_notifiers
>   vu_scmi_stop
>   vu_scmi_disconnect
>   ...
>   qemu_chr_write_buffer
>
>   msix_unset_vector_notifiers
>   virtio_pci_set_guest_notifiers
>   vu_scmi_stop
>   vu_scmi_set_status
>   ...
>   qemu_cleanup
>
> While vu_scmi_stop calls are protected by vhost_dev_is_started()
> check, it's apparently not enough.  vhost-user-blk and vhost-user-gpio
> use an extra protection, see f5b22d06fb (vhost: recheck dev state in
> the vhost_migration_log routine) for the motivation.  Let's use the
> same in vhost-user-scmi, which fixes the failure above.
>
> Fixes: a5dab090e142 ("hw/virtio: Add boilerplate for vhost-user-scmi device")
> Signed-off-by: Milan Zamazal <mzamazal@redhat.com>

Reviewed-by: Fabiano Rosas <farosas@suse.de>
Milan Zamazal Aug. 3, 2023, 7:10 a.m. UTC | #3
Fabiano Rosas <farosas@suse.de> writes:

> Milan Zamazal <mzamazal@redhat.com> writes:
>
>> The QEMU CI fails in virtio-scmi test occasionally.  As reported by
>> Thomas Huth, this happens most likely when the system is loaded and it
>> fails with the following error:
>>
>>   qemu-system-aarch64: ../../devel/qemu/hw/pci/msix.c:659:
>>   msix_unset_vector_notifiers: Assertion
>> `dev->msix_vector_use_notifier && dev->msix_vector_release_notifier'
>> failed.
>>   ../../devel/qemu/tests/qtest/libqtest.c:200: kill_qemu() detected
>> QEMU death from signal 6 (Aborted) (core dumped)
>>
>> As discovered by Fabiano Rosas, the cause is a duplicate invocation of
>> msix_unset_vector_notifiers via duplicate vu_scmi_stop calls:
>>
>>   msix_unset_vector_notifiers
>>   virtio_pci_set_guest_notifiers
>>   vu_scmi_stop
>>   vu_scmi_disconnect
>>   ...
>>   qemu_chr_write_buffer
>>
>>   msix_unset_vector_notifiers
>>   virtio_pci_set_guest_notifiers
>>   vu_scmi_stop
>>   vu_scmi_set_status
>>   ...
>>   qemu_cleanup
>>
>> While vu_scmi_stop calls are protected by vhost_dev_is_started()
>> check, it's apparently not enough.  vhost-user-blk and vhost-user-gpio
>> use an extra protection, see f5b22d06fb (vhost: recheck dev state in
>> the vhost_migration_log routine) for the motivation.  Let's use the
>> same in vhost-user-scmi, which fixes the failure above.
>>
>> Fixes: a5dab090e142 ("hw/virtio: Add boilerplate for vhost-user-scmi device")
>> Signed-off-by: Milan Zamazal <mzamazal@redhat.com>
>
> Reviewed-by: Fabiano Rosas <farosas@suse.de>

Please note that this bug fix should IMO definitely go to 8.1, to not
have a bug in vhost-user-scmi and to not have broken tests.  Any chance
to get it merged?

Thanks,
Milan
Thomas Huth Aug. 3, 2023, 7:38 a.m. UTC | #4
On 03/08/2023 09.10, Milan Zamazal wrote:
> Fabiano Rosas <farosas@suse.de> writes:
> 
>> Milan Zamazal <mzamazal@redhat.com> writes:
>>
>>> The QEMU CI fails in virtio-scmi test occasionally.  As reported by
>>> Thomas Huth, this happens most likely when the system is loaded and it
>>> fails with the following error:
>>>
>>>    qemu-system-aarch64: ../../devel/qemu/hw/pci/msix.c:659:
>>>    msix_unset_vector_notifiers: Assertion
>>> `dev->msix_vector_use_notifier && dev->msix_vector_release_notifier'
>>> failed.
>>>    ../../devel/qemu/tests/qtest/libqtest.c:200: kill_qemu() detected
>>> QEMU death from signal 6 (Aborted) (core dumped)
>>>
>>> As discovered by Fabiano Rosas, the cause is a duplicate invocation of
>>> msix_unset_vector_notifiers via duplicate vu_scmi_stop calls:
>>>
>>>    msix_unset_vector_notifiers
>>>    virtio_pci_set_guest_notifiers
>>>    vu_scmi_stop
>>>    vu_scmi_disconnect
>>>    ...
>>>    qemu_chr_write_buffer
>>>
>>>    msix_unset_vector_notifiers
>>>    virtio_pci_set_guest_notifiers
>>>    vu_scmi_stop
>>>    vu_scmi_set_status
>>>    ...
>>>    qemu_cleanup
>>>
>>> While vu_scmi_stop calls are protected by vhost_dev_is_started()
>>> check, it's apparently not enough.  vhost-user-blk and vhost-user-gpio
>>> use an extra protection, see f5b22d06fb (vhost: recheck dev state in
>>> the vhost_migration_log routine) for the motivation.  Let's use the
>>> same in vhost-user-scmi, which fixes the failure above.
>>>
>>> Fixes: a5dab090e142 ("hw/virtio: Add boilerplate for vhost-user-scmi device")
>>> Signed-off-by: Milan Zamazal <mzamazal@redhat.com>
>>
>> Reviewed-by: Fabiano Rosas <farosas@suse.de>
> 
> Please note that this bug fix should IMO definitely go to 8.1, to not
> have a bug in vhost-user-scmi and to not have broken tests.  Any chance
> to get it merged?

If nobody else is planning a pull request with this patch included, I can 
take it for my next PR (since it is fixing the CI, I just saw another 
failure here:
https://gitlab.com/qemu-project/qemu/-/jobs/4790457938#L4784 )

  Thomas
Michael S. Tsirkin Aug. 3, 2023, 9:50 a.m. UTC | #5
On Thu, Aug 03, 2023 at 09:38:20AM +0200, Thomas Huth wrote:
> On 03/08/2023 09.10, Milan Zamazal wrote:
> > Fabiano Rosas <farosas@suse.de> writes:
> > 
> > > Milan Zamazal <mzamazal@redhat.com> writes:
> > > 
> > > > The QEMU CI fails in virtio-scmi test occasionally.  As reported by
> > > > Thomas Huth, this happens most likely when the system is loaded and it
> > > > fails with the following error:
> > > > 
> > > >    qemu-system-aarch64: ../../devel/qemu/hw/pci/msix.c:659:
> > > >    msix_unset_vector_notifiers: Assertion
> > > > `dev->msix_vector_use_notifier && dev->msix_vector_release_notifier'
> > > > failed.
> > > >    ../../devel/qemu/tests/qtest/libqtest.c:200: kill_qemu() detected
> > > > QEMU death from signal 6 (Aborted) (core dumped)
> > > > 
> > > > As discovered by Fabiano Rosas, the cause is a duplicate invocation of
> > > > msix_unset_vector_notifiers via duplicate vu_scmi_stop calls:
> > > > 
> > > >    msix_unset_vector_notifiers
> > > >    virtio_pci_set_guest_notifiers
> > > >    vu_scmi_stop
> > > >    vu_scmi_disconnect
> > > >    ...
> > > >    qemu_chr_write_buffer
> > > > 
> > > >    msix_unset_vector_notifiers
> > > >    virtio_pci_set_guest_notifiers
> > > >    vu_scmi_stop
> > > >    vu_scmi_set_status
> > > >    ...
> > > >    qemu_cleanup
> > > > 
> > > > While vu_scmi_stop calls are protected by vhost_dev_is_started()
> > > > check, it's apparently not enough.  vhost-user-blk and vhost-user-gpio
> > > > use an extra protection, see f5b22d06fb (vhost: recheck dev state in
> > > > the vhost_migration_log routine) for the motivation.  Let's use the
> > > > same in vhost-user-scmi, which fixes the failure above.
> > > > 
> > > > Fixes: a5dab090e142 ("hw/virtio: Add boilerplate for vhost-user-scmi device")
> > > > Signed-off-by: Milan Zamazal <mzamazal@redhat.com>
> > > 
> > > Reviewed-by: Fabiano Rosas <farosas@suse.de>
> > 
> > Please note that this bug fix should IMO definitely go to 8.1, to not
> > have a bug in vhost-user-scmi and to not have broken tests.  Any chance
> > to get it merged?
> 
> If nobody else is planning a pull request with this patch included, I can
> take it for my next PR (since it is fixing the CI, I just saw another
> failure here:
> https://gitlab.com/qemu-project/qemu/-/jobs/4790457938#L4784 )
> 
>  Thomas
> 

I picked it up but if you like I can drop it.
Thomas Huth Aug. 3, 2023, 10:46 a.m. UTC | #6
On 03/08/2023 11.50, Michael S. Tsirkin wrote:
> On Thu, Aug 03, 2023 at 09:38:20AM +0200, Thomas Huth wrote:
>> On 03/08/2023 09.10, Milan Zamazal wrote:
>>> Fabiano Rosas <farosas@suse.de> writes:
>>>
>>>> Milan Zamazal <mzamazal@redhat.com> writes:
>>>>
>>>>> The QEMU CI fails in virtio-scmi test occasionally.  As reported by
>>>>> Thomas Huth, this happens most likely when the system is loaded and it
>>>>> fails with the following error:
>>>>>
>>>>>     qemu-system-aarch64: ../../devel/qemu/hw/pci/msix.c:659:
>>>>>     msix_unset_vector_notifiers: Assertion
>>>>> `dev->msix_vector_use_notifier && dev->msix_vector_release_notifier'
>>>>> failed.
>>>>>     ../../devel/qemu/tests/qtest/libqtest.c:200: kill_qemu() detected
>>>>> QEMU death from signal 6 (Aborted) (core dumped)
>>>>>
>>>>> As discovered by Fabiano Rosas, the cause is a duplicate invocation of
>>>>> msix_unset_vector_notifiers via duplicate vu_scmi_stop calls:
>>>>>
>>>>>     msix_unset_vector_notifiers
>>>>>     virtio_pci_set_guest_notifiers
>>>>>     vu_scmi_stop
>>>>>     vu_scmi_disconnect
>>>>>     ...
>>>>>     qemu_chr_write_buffer
>>>>>
>>>>>     msix_unset_vector_notifiers
>>>>>     virtio_pci_set_guest_notifiers
>>>>>     vu_scmi_stop
>>>>>     vu_scmi_set_status
>>>>>     ...
>>>>>     qemu_cleanup
>>>>>
>>>>> While vu_scmi_stop calls are protected by vhost_dev_is_started()
>>>>> check, it's apparently not enough.  vhost-user-blk and vhost-user-gpio
>>>>> use an extra protection, see f5b22d06fb (vhost: recheck dev state in
>>>>> the vhost_migration_log routine) for the motivation.  Let's use the
>>>>> same in vhost-user-scmi, which fixes the failure above.
>>>>>
>>>>> Fixes: a5dab090e142 ("hw/virtio: Add boilerplate for vhost-user-scmi device")
>>>>> Signed-off-by: Milan Zamazal <mzamazal@redhat.com>
>>>>
>>>> Reviewed-by: Fabiano Rosas <farosas@suse.de>
>>>
>>> Please note that this bug fix should IMO definitely go to 8.1, to not
>>> have a bug in vhost-user-scmi and to not have broken tests.  Any chance
>>> to get it merged?
>>
>> If nobody else is planning a pull request with this patch included, I can
>> take it for my next PR (since it is fixing the CI, I just saw another
>> failure here:
>> https://gitlab.com/qemu-project/qemu/-/jobs/4790457938#L4784 )
>>
>>   Thomas
>>
> 
> I picked it up but if you like I can drop it.

I think it better fits into your tree, so if you plan a PR before the next 
RC, then please go ahead. I'll drop it from my branch again.

  Thomas
diff mbox series

Patch

diff --git a/hw/virtio/vhost-user-scmi.c b/hw/virtio/vhost-user-scmi.c
index d386fb2df9..918bb7dcf7 100644
--- a/hw/virtio/vhost-user-scmi.c
+++ b/hw/virtio/vhost-user-scmi.c
@@ -63,6 +63,7 @@  static int vu_scmi_start(VirtIODevice *vdev)
         error_report("Error starting vhost-user-scmi: %d", ret);
         goto err_guest_notifiers;
     }
+    scmi->started_vu = true;
 
     /*
      * guest_notifier_mask/pending not used yet, so just unmask
@@ -90,6 +91,12 @@  static void vu_scmi_stop(VirtIODevice *vdev)
     struct vhost_dev *vhost_dev = &scmi->vhost_dev;
     int ret;
 
+    /* vhost_dev_is_started() check in the callers is not fully reliable. */
+    if (!scmi->started_vu) {
+        return;
+    }
+    scmi->started_vu = false;
+
     if (!k->set_guest_notifiers) {
         return;
     }
diff --git a/include/hw/virtio/vhost-user-scmi.h b/include/hw/virtio/vhost-user-scmi.h
index 6175a74ebd..c90db77dd5 100644
--- a/include/hw/virtio/vhost-user-scmi.h
+++ b/include/hw/virtio/vhost-user-scmi.h
@@ -25,6 +25,7 @@  struct VHostUserSCMI {
     VirtQueue *cmd_vq;
     VirtQueue *event_vq;
     bool connected;
+    bool started_vu;
 };
 
 #endif /* _QEMU_VHOST_USER_SCMI_H */