Message ID | 20230720101037.2161450-1-mzamazal@redhat.com (mailing list archive) |
---|---|
State | New, archived |
Headers | show |
Series | hw/virtio: Add a protection against duplicate vu_scmi_stop calls | expand |
On 20/07/2023 12.10, Milan Zamazal wrote: > The QEMU CI fails in virtio-scmi test occasionally. As reported by > Thomas Huth, this happens most likely when the system is loaded and it > fails with the following error: > > qemu-system-aarch64: ../../devel/qemu/hw/pci/msix.c:659: > msix_unset_vector_notifiers: Assertion `dev->msix_vector_use_notifier && dev->msix_vector_release_notifier' failed. > ../../devel/qemu/tests/qtest/libqtest.c:200: kill_qemu() detected QEMU death from signal 6 (Aborted) (core dumped) > > As discovered by Fabiano Rosas, the cause is a duplicate invocation of > msix_unset_vector_notifiers via duplicate vu_scmi_stop calls: > > msix_unset_vector_notifiers > virtio_pci_set_guest_notifiers > vu_scmi_stop > vu_scmi_disconnect > ... > qemu_chr_write_buffer > > msix_unset_vector_notifiers > virtio_pci_set_guest_notifiers > vu_scmi_stop > vu_scmi_set_status > ... > qemu_cleanup > > While vu_scmi_stop calls are protected by vhost_dev_is_started() > check, it's apparently not enough. vhost-user-blk and vhost-user-gpio > use an extra protection, see f5b22d06fb (vhost: recheck dev state in > the vhost_migration_log routine) for the motivation. Let's use the > same in vhost-user-scmi, which fixes the failure above. > > Fixes: a5dab090e142 ("hw/virtio: Add boilerplate for vhost-user-scmi device") > Signed-off-by: Milan Zamazal <mzamazal@redhat.com> > --- > hw/virtio/vhost-user-scmi.c | 7 +++++++ > include/hw/virtio/vhost-user-scmi.h | 1 + > 2 files changed, 8 insertions(+) Thanks, that fixes the problem for me! Tested-by: Thomas Huth <thuth@redhat.com>
Milan Zamazal <mzamazal@redhat.com> writes: > The QEMU CI fails in virtio-scmi test occasionally. As reported by > Thomas Huth, this happens most likely when the system is loaded and it > fails with the following error: > > qemu-system-aarch64: ../../devel/qemu/hw/pci/msix.c:659: > msix_unset_vector_notifiers: Assertion `dev->msix_vector_use_notifier && dev->msix_vector_release_notifier' failed. > ../../devel/qemu/tests/qtest/libqtest.c:200: kill_qemu() detected QEMU death from signal 6 (Aborted) (core dumped) > > As discovered by Fabiano Rosas, the cause is a duplicate invocation of > msix_unset_vector_notifiers via duplicate vu_scmi_stop calls: > > msix_unset_vector_notifiers > virtio_pci_set_guest_notifiers > vu_scmi_stop > vu_scmi_disconnect > ... > qemu_chr_write_buffer > > msix_unset_vector_notifiers > virtio_pci_set_guest_notifiers > vu_scmi_stop > vu_scmi_set_status > ... > qemu_cleanup > > While vu_scmi_stop calls are protected by vhost_dev_is_started() > check, it's apparently not enough. vhost-user-blk and vhost-user-gpio > use an extra protection, see f5b22d06fb (vhost: recheck dev state in > the vhost_migration_log routine) for the motivation. Let's use the > same in vhost-user-scmi, which fixes the failure above. > > Fixes: a5dab090e142 ("hw/virtio: Add boilerplate for vhost-user-scmi device") > Signed-off-by: Milan Zamazal <mzamazal@redhat.com> Reviewed-by: Fabiano Rosas <farosas@suse.de>
Fabiano Rosas <farosas@suse.de> writes: > Milan Zamazal <mzamazal@redhat.com> writes: > >> The QEMU CI fails in virtio-scmi test occasionally. As reported by >> Thomas Huth, this happens most likely when the system is loaded and it >> fails with the following error: >> >> qemu-system-aarch64: ../../devel/qemu/hw/pci/msix.c:659: >> msix_unset_vector_notifiers: Assertion >> `dev->msix_vector_use_notifier && dev->msix_vector_release_notifier' >> failed. >> ../../devel/qemu/tests/qtest/libqtest.c:200: kill_qemu() detected >> QEMU death from signal 6 (Aborted) (core dumped) >> >> As discovered by Fabiano Rosas, the cause is a duplicate invocation of >> msix_unset_vector_notifiers via duplicate vu_scmi_stop calls: >> >> msix_unset_vector_notifiers >> virtio_pci_set_guest_notifiers >> vu_scmi_stop >> vu_scmi_disconnect >> ... >> qemu_chr_write_buffer >> >> msix_unset_vector_notifiers >> virtio_pci_set_guest_notifiers >> vu_scmi_stop >> vu_scmi_set_status >> ... >> qemu_cleanup >> >> While vu_scmi_stop calls are protected by vhost_dev_is_started() >> check, it's apparently not enough. vhost-user-blk and vhost-user-gpio >> use an extra protection, see f5b22d06fb (vhost: recheck dev state in >> the vhost_migration_log routine) for the motivation. Let's use the >> same in vhost-user-scmi, which fixes the failure above. >> >> Fixes: a5dab090e142 ("hw/virtio: Add boilerplate for vhost-user-scmi device") >> Signed-off-by: Milan Zamazal <mzamazal@redhat.com> > > Reviewed-by: Fabiano Rosas <farosas@suse.de> Please note that this bug fix should IMO definitely go to 8.1, to not have a bug in vhost-user-scmi and to not have broken tests. Any chance to get it merged? Thanks, Milan
On 03/08/2023 09.10, Milan Zamazal wrote: > Fabiano Rosas <farosas@suse.de> writes: > >> Milan Zamazal <mzamazal@redhat.com> writes: >> >>> The QEMU CI fails in virtio-scmi test occasionally. As reported by >>> Thomas Huth, this happens most likely when the system is loaded and it >>> fails with the following error: >>> >>> qemu-system-aarch64: ../../devel/qemu/hw/pci/msix.c:659: >>> msix_unset_vector_notifiers: Assertion >>> `dev->msix_vector_use_notifier && dev->msix_vector_release_notifier' >>> failed. >>> ../../devel/qemu/tests/qtest/libqtest.c:200: kill_qemu() detected >>> QEMU death from signal 6 (Aborted) (core dumped) >>> >>> As discovered by Fabiano Rosas, the cause is a duplicate invocation of >>> msix_unset_vector_notifiers via duplicate vu_scmi_stop calls: >>> >>> msix_unset_vector_notifiers >>> virtio_pci_set_guest_notifiers >>> vu_scmi_stop >>> vu_scmi_disconnect >>> ... >>> qemu_chr_write_buffer >>> >>> msix_unset_vector_notifiers >>> virtio_pci_set_guest_notifiers >>> vu_scmi_stop >>> vu_scmi_set_status >>> ... >>> qemu_cleanup >>> >>> While vu_scmi_stop calls are protected by vhost_dev_is_started() >>> check, it's apparently not enough. vhost-user-blk and vhost-user-gpio >>> use an extra protection, see f5b22d06fb (vhost: recheck dev state in >>> the vhost_migration_log routine) for the motivation. Let's use the >>> same in vhost-user-scmi, which fixes the failure above. >>> >>> Fixes: a5dab090e142 ("hw/virtio: Add boilerplate for vhost-user-scmi device") >>> Signed-off-by: Milan Zamazal <mzamazal@redhat.com> >> >> Reviewed-by: Fabiano Rosas <farosas@suse.de> > > Please note that this bug fix should IMO definitely go to 8.1, to not > have a bug in vhost-user-scmi and to not have broken tests. Any chance > to get it merged? If nobody else is planning a pull request with this patch included, I can take it for my next PR (since it is fixing the CI, I just saw another failure here: https://gitlab.com/qemu-project/qemu/-/jobs/4790457938#L4784 ) Thomas
On Thu, Aug 03, 2023 at 09:38:20AM +0200, Thomas Huth wrote: > On 03/08/2023 09.10, Milan Zamazal wrote: > > Fabiano Rosas <farosas@suse.de> writes: > > > > > Milan Zamazal <mzamazal@redhat.com> writes: > > > > > > > The QEMU CI fails in virtio-scmi test occasionally. As reported by > > > > Thomas Huth, this happens most likely when the system is loaded and it > > > > fails with the following error: > > > > > > > > qemu-system-aarch64: ../../devel/qemu/hw/pci/msix.c:659: > > > > msix_unset_vector_notifiers: Assertion > > > > `dev->msix_vector_use_notifier && dev->msix_vector_release_notifier' > > > > failed. > > > > ../../devel/qemu/tests/qtest/libqtest.c:200: kill_qemu() detected > > > > QEMU death from signal 6 (Aborted) (core dumped) > > > > > > > > As discovered by Fabiano Rosas, the cause is a duplicate invocation of > > > > msix_unset_vector_notifiers via duplicate vu_scmi_stop calls: > > > > > > > > msix_unset_vector_notifiers > > > > virtio_pci_set_guest_notifiers > > > > vu_scmi_stop > > > > vu_scmi_disconnect > > > > ... > > > > qemu_chr_write_buffer > > > > > > > > msix_unset_vector_notifiers > > > > virtio_pci_set_guest_notifiers > > > > vu_scmi_stop > > > > vu_scmi_set_status > > > > ... > > > > qemu_cleanup > > > > > > > > While vu_scmi_stop calls are protected by vhost_dev_is_started() > > > > check, it's apparently not enough. vhost-user-blk and vhost-user-gpio > > > > use an extra protection, see f5b22d06fb (vhost: recheck dev state in > > > > the vhost_migration_log routine) for the motivation. Let's use the > > > > same in vhost-user-scmi, which fixes the failure above. > > > > > > > > Fixes: a5dab090e142 ("hw/virtio: Add boilerplate for vhost-user-scmi device") > > > > Signed-off-by: Milan Zamazal <mzamazal@redhat.com> > > > > > > Reviewed-by: Fabiano Rosas <farosas@suse.de> > > > > Please note that this bug fix should IMO definitely go to 8.1, to not > > have a bug in vhost-user-scmi and to not have broken tests. Any chance > > to get it merged? > > If nobody else is planning a pull request with this patch included, I can > take it for my next PR (since it is fixing the CI, I just saw another > failure here: > https://gitlab.com/qemu-project/qemu/-/jobs/4790457938#L4784 ) > > Thomas > I picked it up but if you like I can drop it.
On 03/08/2023 11.50, Michael S. Tsirkin wrote: > On Thu, Aug 03, 2023 at 09:38:20AM +0200, Thomas Huth wrote: >> On 03/08/2023 09.10, Milan Zamazal wrote: >>> Fabiano Rosas <farosas@suse.de> writes: >>> >>>> Milan Zamazal <mzamazal@redhat.com> writes: >>>> >>>>> The QEMU CI fails in virtio-scmi test occasionally. As reported by >>>>> Thomas Huth, this happens most likely when the system is loaded and it >>>>> fails with the following error: >>>>> >>>>> qemu-system-aarch64: ../../devel/qemu/hw/pci/msix.c:659: >>>>> msix_unset_vector_notifiers: Assertion >>>>> `dev->msix_vector_use_notifier && dev->msix_vector_release_notifier' >>>>> failed. >>>>> ../../devel/qemu/tests/qtest/libqtest.c:200: kill_qemu() detected >>>>> QEMU death from signal 6 (Aborted) (core dumped) >>>>> >>>>> As discovered by Fabiano Rosas, the cause is a duplicate invocation of >>>>> msix_unset_vector_notifiers via duplicate vu_scmi_stop calls: >>>>> >>>>> msix_unset_vector_notifiers >>>>> virtio_pci_set_guest_notifiers >>>>> vu_scmi_stop >>>>> vu_scmi_disconnect >>>>> ... >>>>> qemu_chr_write_buffer >>>>> >>>>> msix_unset_vector_notifiers >>>>> virtio_pci_set_guest_notifiers >>>>> vu_scmi_stop >>>>> vu_scmi_set_status >>>>> ... >>>>> qemu_cleanup >>>>> >>>>> While vu_scmi_stop calls are protected by vhost_dev_is_started() >>>>> check, it's apparently not enough. vhost-user-blk and vhost-user-gpio >>>>> use an extra protection, see f5b22d06fb (vhost: recheck dev state in >>>>> the vhost_migration_log routine) for the motivation. Let's use the >>>>> same in vhost-user-scmi, which fixes the failure above. >>>>> >>>>> Fixes: a5dab090e142 ("hw/virtio: Add boilerplate for vhost-user-scmi device") >>>>> Signed-off-by: Milan Zamazal <mzamazal@redhat.com> >>>> >>>> Reviewed-by: Fabiano Rosas <farosas@suse.de> >>> >>> Please note that this bug fix should IMO definitely go to 8.1, to not >>> have a bug in vhost-user-scmi and to not have broken tests. Any chance >>> to get it merged? >> >> If nobody else is planning a pull request with this patch included, I can >> take it for my next PR (since it is fixing the CI, I just saw another >> failure here: >> https://gitlab.com/qemu-project/qemu/-/jobs/4790457938#L4784 ) >> >> Thomas >> > > I picked it up but if you like I can drop it. I think it better fits into your tree, so if you plan a PR before the next RC, then please go ahead. I'll drop it from my branch again. Thomas
diff --git a/hw/virtio/vhost-user-scmi.c b/hw/virtio/vhost-user-scmi.c index d386fb2df9..918bb7dcf7 100644 --- a/hw/virtio/vhost-user-scmi.c +++ b/hw/virtio/vhost-user-scmi.c @@ -63,6 +63,7 @@ static int vu_scmi_start(VirtIODevice *vdev) error_report("Error starting vhost-user-scmi: %d", ret); goto err_guest_notifiers; } + scmi->started_vu = true; /* * guest_notifier_mask/pending not used yet, so just unmask @@ -90,6 +91,12 @@ static void vu_scmi_stop(VirtIODevice *vdev) struct vhost_dev *vhost_dev = &scmi->vhost_dev; int ret; + /* vhost_dev_is_started() check in the callers is not fully reliable. */ + if (!scmi->started_vu) { + return; + } + scmi->started_vu = false; + if (!k->set_guest_notifiers) { return; } diff --git a/include/hw/virtio/vhost-user-scmi.h b/include/hw/virtio/vhost-user-scmi.h index 6175a74ebd..c90db77dd5 100644 --- a/include/hw/virtio/vhost-user-scmi.h +++ b/include/hw/virtio/vhost-user-scmi.h @@ -25,6 +25,7 @@ struct VHostUserSCMI { VirtQueue *cmd_vq; VirtQueue *event_vq; bool connected; + bool started_vu; }; #endif /* _QEMU_VHOST_USER_SCMI_H */
The QEMU CI fails in virtio-scmi test occasionally. As reported by Thomas Huth, this happens most likely when the system is loaded and it fails with the following error: qemu-system-aarch64: ../../devel/qemu/hw/pci/msix.c:659: msix_unset_vector_notifiers: Assertion `dev->msix_vector_use_notifier && dev->msix_vector_release_notifier' failed. ../../devel/qemu/tests/qtest/libqtest.c:200: kill_qemu() detected QEMU death from signal 6 (Aborted) (core dumped) As discovered by Fabiano Rosas, the cause is a duplicate invocation of msix_unset_vector_notifiers via duplicate vu_scmi_stop calls: msix_unset_vector_notifiers virtio_pci_set_guest_notifiers vu_scmi_stop vu_scmi_disconnect ... qemu_chr_write_buffer msix_unset_vector_notifiers virtio_pci_set_guest_notifiers vu_scmi_stop vu_scmi_set_status ... qemu_cleanup While vu_scmi_stop calls are protected by vhost_dev_is_started() check, it's apparently not enough. vhost-user-blk and vhost-user-gpio use an extra protection, see f5b22d06fb (vhost: recheck dev state in the vhost_migration_log routine) for the motivation. Let's use the same in vhost-user-scmi, which fixes the failure above. Fixes: a5dab090e142 ("hw/virtio: Add boilerplate for vhost-user-scmi device") Signed-off-by: Milan Zamazal <mzamazal@redhat.com> --- hw/virtio/vhost-user-scmi.c | 7 +++++++ include/hw/virtio/vhost-user-scmi.h | 1 + 2 files changed, 8 insertions(+)