Message ID | 20240129132407.1474202-1-eperezma@redhat.com (mailing list archive) |
---|---|
Headers | show |
Series | Move net backend cleanup to NIC cleanup | expand |
On Mon, Jan 29, 2024 at 9:24 PM Eugenio Pérez <eperezma@redhat.com> wrote: > > Commit a0d7215e33 ("vhost-vdpa: do not cleanup the vdpa/vhost-net > structures if peer nic is present") effectively delayed the backend > cleanup, allowing the frontend or the guest to access it resources as > long as the frontend NIC is still visible to the guest. > > However it does not clean up the resources until the qemu process is > over. This causes an effective leak if the device is deleted with > device_del, as there is no way to close the vdpa device. This makes > impossible to re-add that device to this or other QEMU instances until > the first instance of QEMU is finished. > > Move the cleanup from qemu_cleanup to the NIC deletion. > > Fixes: a0d7215e33 ("vhost-vdpa: do not cleanup the vdpa/vhost-net structures if peer nic is present") > Acked-by: Jason Wang <jasowang@redhat.com> > Reported-by: Lei Yang <leiyang@redhat.com> > Signed-off-by: Eugenio Pérez <eperezma@redhat.com> > > Eugenio Pérez (2): > net: parameterize the removing client from nc list > net: move backend cleanup to NIC cleanup > > net/net.c | 30 ++++++++++++++++++++---------- > net/vhost-vdpa.c | 8 -------- > 2 files changed, 20 insertions(+), 18 deletions(-) > > -- Queued. Thanks
Hi Jason, It seems this series wasn't applied successfully, I still cannot see it from the latest tree. Any idea? In any case the fix LGTM. Reviewed-by: Si-Wei Liu <si-wei.liu@oracle.com> Thanks, -Siwei On 1/31/2024 9:43 PM, Jason Wang wrote: > On Mon, Jan 29, 2024 at 9:24 PM Eugenio Pérez <eperezma@redhat.com> wrote: >> Commit a0d7215e33 ("vhost-vdpa: do not cleanup the vdpa/vhost-net >> structures if peer nic is present") effectively delayed the backend >> cleanup, allowing the frontend or the guest to access it resources as >> long as the frontend NIC is still visible to the guest. >> >> However it does not clean up the resources until the qemu process is >> over. This causes an effective leak if the device is deleted with >> device_del, as there is no way to close the vdpa device. This makes >> impossible to re-add that device to this or other QEMU instances until >> the first instance of QEMU is finished. >> >> Move the cleanup from qemu_cleanup to the NIC deletion. >> >> Fixes: a0d7215e33 ("vhost-vdpa: do not cleanup the vdpa/vhost-net structures if peer nic is present") >> Acked-by: Jason Wang <jasowang@redhat.com> >> Reported-by: Lei Yang <leiyang@redhat.com> >> Signed-off-by: Eugenio Pérez <eperezma@redhat.com> >> >> Eugenio Pérez (2): >> net: parameterize the removing client from nc list >> net: move backend cleanup to NIC cleanup >> >> net/net.c | 30 ++++++++++++++++++++---------- >> net/vhost-vdpa.c | 8 -------- >> 2 files changed, 20 insertions(+), 18 deletions(-) >> >> -- > Queued. > > Thanks >
On Tue, Sep 10, 2024 at 11:41 AM Si-Wei Liu <si-wei.liu@oracle.com> wrote: > > Hi Jason, > > It seems this series wasn't applied successfully, I still cannot see it > from the latest tree. Any idea? It breaks make check. Eugenio, would you want to fix and resend the series? Thanks > > In any case the fix LGTM. > > Reviewed-by: Si-Wei Liu <si-wei.liu@oracle.com> > > Thanks, > -Siwei > > On 1/31/2024 9:43 PM, Jason Wang wrote: > > On Mon, Jan 29, 2024 at 9:24 PM Eugenio Pérez <eperezma@redhat.com> wrote: > >> Commit a0d7215e33 ("vhost-vdpa: do not cleanup the vdpa/vhost-net > >> structures if peer nic is present") effectively delayed the backend > >> cleanup, allowing the frontend or the guest to access it resources as > >> long as the frontend NIC is still visible to the guest. > >> > >> However it does not clean up the resources until the qemu process is > >> over. This causes an effective leak if the device is deleted with > >> device_del, as there is no way to close the vdpa device. This makes > >> impossible to re-add that device to this or other QEMU instances until > >> the first instance of QEMU is finished. > >> > >> Move the cleanup from qemu_cleanup to the NIC deletion. > >> > >> Fixes: a0d7215e33 ("vhost-vdpa: do not cleanup the vdpa/vhost-net structures if peer nic is present") > >> Acked-by: Jason Wang <jasowang@redhat.com> > >> Reported-by: Lei Yang <leiyang@redhat.com> > >> Signed-off-by: Eugenio Pérez <eperezma@redhat.com> > >> > >> Eugenio Pérez (2): > >> net: parameterize the removing client from nc list > >> net: move backend cleanup to NIC cleanup > >> > >> net/net.c | 30 ++++++++++++++++++++---------- > >> net/vhost-vdpa.c | 8 -------- > >> 2 files changed, 20 insertions(+), 18 deletions(-) > >> > >> -- > > Queued. > > > > Thanks > > >
On Tue, Sep 10, 2024 at 5:46 AM Jason Wang <jasowang@redhat.com> wrote: > > On Tue, Sep 10, 2024 at 11:41 AM Si-Wei Liu <si-wei.liu@oracle.com> wrote: > > > > Hi Jason, > > > > It seems this series wasn't applied successfully, I still cannot see it > > from the latest tree. Any idea? > > It breaks make check. > > Eugenio, would you want to fix and resend the series? > I'm trying to reproduce but with no luck :(. For the record this is the failed log. Is it possible to try to reproduce it again in the machine / env it crashed? ▶ 10/354 ERROR:../tests/qtest/qos-test. c:191:subprocess_run_one_test: child process (/x86_64/pc/i440FX-pcihost/pci-bus-pc/pci-bus/virtio-net-pci/virtio-net/virtio-net-tests/vhost-user/migrate/subprocess [1494462]) failed unexpectedly ERROR 10/354 qemu:qtest+qtest-x86_64 / qtest-x86_64/qos-test ERROR 14.19s killed by signal 6 SIGABRT >>> PYTHON=/home/devel/git/qemu/build/pyvenv/bin/python3 G_TEST_DBUS_DAEMON=/home/devel/git/qemu/tests/dbus-vmstate-daemon.sh QTEST_QEMU_STORAGE_DAEMON_BINARY=./storage-daemon/qemu-storage-daemon QTEST_QEMU_IMG=./qemu-img QTEST_QEMU_BINARY=./qemu-system-x86_64 MALLOC_PERTURB_=82 /home/devel/git/qemu/build/tests/qtest/qos-test --tap -k ――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――― ✀ ――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――― stderr: Vhost user backend fails to broadcast fake RARP ../tests/qtest/libqtest.c:204: kill_qemu() detected QEMU death from signal 11 (Segmentation fault) (core dumped) ../tests/qtest/libqtest.c:204: kill_qemu() detected QEMU death from signal 11 (Segmentation fault) (core dumped) ** ERROR:../tests/qtest/qos-test.c:191:subprocess_run_one_test: child process (/x86_64/pc/i440FX-pcihost/pci-bus-pc/pci-bus/virtio-net-pci/virtio-net/virtio-net-tests/vhost-user/migrate/subprocess [1494462]) failed unexpectedly > Thanks > > > > > In any case the fix LGTM. > > > > Reviewed-by: Si-Wei Liu <si-wei.liu@oracle.com> > > > > Thanks, > > -Siwei > > > > On 1/31/2024 9:43 PM, Jason Wang wrote: > > > On Mon, Jan 29, 2024 at 9:24 PM Eugenio Pérez <eperezma@redhat.com> wrote: > > >> Commit a0d7215e33 ("vhost-vdpa: do not cleanup the vdpa/vhost-net > > >> structures if peer nic is present") effectively delayed the backend > > >> cleanup, allowing the frontend or the guest to access it resources as > > >> long as the frontend NIC is still visible to the guest. > > >> > > >> However it does not clean up the resources until the qemu process is > > >> over. This causes an effective leak if the device is deleted with > > >> device_del, as there is no way to close the vdpa device. This makes > > >> impossible to re-add that device to this or other QEMU instances until > > >> the first instance of QEMU is finished. > > >> > > >> Move the cleanup from qemu_cleanup to the NIC deletion. > > >> > > >> Fixes: a0d7215e33 ("vhost-vdpa: do not cleanup the vdpa/vhost-net structures if peer nic is present") > > >> Acked-by: Jason Wang <jasowang@redhat.com> > > >> Reported-by: Lei Yang <leiyang@redhat.com> > > >> Signed-off-by: Eugenio Pérez <eperezma@redhat.com> > > >> > > >> Eugenio Pérez (2): > > >> net: parameterize the removing client from nc list > > >> net: move backend cleanup to NIC cleanup > > >> > > >> net/net.c | 30 ++++++++++++++++++++---------- > > >> net/vhost-vdpa.c | 8 -------- > > >> 2 files changed, 20 insertions(+), 18 deletions(-) > > >> > > >> -- > > > Queued. > > > > > > Thanks > > > > > >
On Wed, Sep 11, 2024 at 11:04 AM Eugenio Perez Martin <eperezma@redhat.com> wrote: > > On Tue, Sep 10, 2024 at 5:46 AM Jason Wang <jasowang@redhat.com> wrote: > > > > On Tue, Sep 10, 2024 at 11:41 AM Si-Wei Liu <si-wei.liu@oracle.com> wrote: > > > > > > Hi Jason, > > > > > > It seems this series wasn't applied successfully, I still cannot see it > > > from the latest tree. Any idea? > > > > It breaks make check. > > > > Eugenio, would you want to fix and resend the series? > > > > I'm trying to reproduce but with no luck :(. > I'm able to reproduce consistently now. > For the record this is the failed log. Is it possible to try to > reproduce it again in the machine / env it crashed? > > ▶ 10/354 ERROR:../tests/qtest/qos-test. > c:191:subprocess_run_one_test: > child process (/x86_64/pc/i440FX-pcihost/pci-bus-pc/pci-bus/virtio-net-pci/virtio-net/virtio-net-tests/vhost-user/migrate/subprocess > [1494462]) failed unexpectedly ERROR > 10/354 qemu:qtest+qtest-x86_64 / qtest-x86_64/qos-test > ERROR 14.19s killed by signal 6 SIGABRT > >>> PYTHON=/home/devel/git/qemu/build/pyvenv/bin/python3 G_TEST_DBUS_DAEMON=/home/devel/git/qemu/tests/dbus-vmstate-daemon.sh QTEST_QEMU_STORAGE_DAEMON_BINARY=./storage-daemon/qemu-storage-daemon QTEST_QEMU_IMG=./qemu-img QTEST_QEMU_BINARY=./qemu-system-x86_64 MALLOC_PERTURB_=82 /home/devel/git/qemu/build/tests/qtest/qos-test --tap -k > ――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――― > ✀ ――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――― > stderr: > Vhost user backend fails to broadcast fake RARP > ../tests/qtest/libqtest.c:204: kill_qemu() detected QEMU death from > signal 11 (Segmentation fault) (core dumped) > ../tests/qtest/libqtest.c:204: kill_qemu() detected QEMU death from > signal 11 (Segmentation fault) (core dumped) > ** > ERROR:../tests/qtest/qos-test.c:191:subprocess_run_one_test: child > process (/x86_64/pc/i440FX-pcihost/pci-bus-pc/pci-bus/virtio-net-pci/virtio-net/virtio-net-tests/vhost-user/migrate/subprocess > [1494462]) failed unexpectedly > > Thanks > > > > > > > > In any case the fix LGTM. > > > > > > Reviewed-by: Si-Wei Liu <si-wei.liu@oracle.com> > > > > > > Thanks, > > > -Siwei > > > > > > On 1/31/2024 9:43 PM, Jason Wang wrote: > > > > On Mon, Jan 29, 2024 at 9:24 PM Eugenio Pérez <eperezma@redhat.com> wrote: > > > >> Commit a0d7215e33 ("vhost-vdpa: do not cleanup the vdpa/vhost-net > > > >> structures if peer nic is present") effectively delayed the backend > > > >> cleanup, allowing the frontend or the guest to access it resources as > > > >> long as the frontend NIC is still visible to the guest. > > > >> > > > >> However it does not clean up the resources until the qemu process is > > > >> over. This causes an effective leak if the device is deleted with > > > >> device_del, as there is no way to close the vdpa device. This makes > > > >> impossible to re-add that device to this or other QEMU instances until > > > >> the first instance of QEMU is finished. > > > >> > > > >> Move the cleanup from qemu_cleanup to the NIC deletion. > > > >> > > > >> Fixes: a0d7215e33 ("vhost-vdpa: do not cleanup the vdpa/vhost-net structures if peer nic is present") > > > >> Acked-by: Jason Wang <jasowang@redhat.com> > > > >> Reported-by: Lei Yang <leiyang@redhat.com> > > > >> Signed-off-by: Eugenio Pérez <eperezma@redhat.com> > > > >> > > > >> Eugenio Pérez (2): > > > >> net: parameterize the removing client from nc list > > > >> net: move backend cleanup to NIC cleanup > > > >> > > > >> net/net.c | 30 ++++++++++++++++++++---------- > > > >> net/vhost-vdpa.c | 8 -------- > > > >> 2 files changed, 20 insertions(+), 18 deletions(-) > > > >> > > > >> -- > > > > Queued. > > > > > > > > Thanks > > > > > > > > >