mbox series

[0/2] Move net backend cleanup to NIC cleanup

Message ID 20240129132407.1474202-1-eperezma@redhat.com (mailing list archive)
Headers show
Series Move net backend cleanup to NIC cleanup | expand

Message

Eugenio Perez Martin Jan. 29, 2024, 1:24 p.m. UTC
Commit a0d7215e33 ("vhost-vdpa: do not cleanup the vdpa/vhost-net
structures if peer nic is present") effectively delayed the backend
cleanup, allowing the frontend or the guest to access it resources as
long as the frontend NIC is still visible to the guest.

However it does not clean up the resources until the qemu process is
over.  This causes an effective leak if the device is deleted with
device_del, as there is no way to close the vdpa device.  This makes
impossible to re-add that device to this or other QEMU instances until
the first instance of QEMU is finished.

Move the cleanup from qemu_cleanup to the NIC deletion.

Fixes: a0d7215e33 ("vhost-vdpa: do not cleanup the vdpa/vhost-net structures if peer nic is present")
Acked-by: Jason Wang <jasowang@redhat.com>
Reported-by: Lei Yang <leiyang@redhat.com>
Signed-off-by: Eugenio Pérez <eperezma@redhat.com>

Eugenio Pérez (2):
  net: parameterize the removing client from nc list
  net: move backend cleanup to NIC cleanup

 net/net.c        | 30 ++++++++++++++++++++----------
 net/vhost-vdpa.c |  8 --------
 2 files changed, 20 insertions(+), 18 deletions(-)

Comments

Jason Wang Feb. 1, 2024, 5:43 a.m. UTC | #1
On Mon, Jan 29, 2024 at 9:24 PM Eugenio Pérez <eperezma@redhat.com> wrote:
>
> Commit a0d7215e33 ("vhost-vdpa: do not cleanup the vdpa/vhost-net
> structures if peer nic is present") effectively delayed the backend
> cleanup, allowing the frontend or the guest to access it resources as
> long as the frontend NIC is still visible to the guest.
>
> However it does not clean up the resources until the qemu process is
> over.  This causes an effective leak if the device is deleted with
> device_del, as there is no way to close the vdpa device.  This makes
> impossible to re-add that device to this or other QEMU instances until
> the first instance of QEMU is finished.
>
> Move the cleanup from qemu_cleanup to the NIC deletion.
>
> Fixes: a0d7215e33 ("vhost-vdpa: do not cleanup the vdpa/vhost-net structures if peer nic is present")
> Acked-by: Jason Wang <jasowang@redhat.com>
> Reported-by: Lei Yang <leiyang@redhat.com>
> Signed-off-by: Eugenio Pérez <eperezma@redhat.com>
>
> Eugenio Pérez (2):
>   net: parameterize the removing client from nc list
>   net: move backend cleanup to NIC cleanup
>
>  net/net.c        | 30 ++++++++++++++++++++----------
>  net/vhost-vdpa.c |  8 --------
>  2 files changed, 20 insertions(+), 18 deletions(-)
>
> --

Queued.

Thanks
Si-Wei Liu Sept. 10, 2024, 3:40 a.m. UTC | #2
Hi Jason,

It seems this series wasn't applied successfully, I still cannot see it 
from the latest tree. Any idea?

In any case the fix LGTM.

Reviewed-by: Si-Wei Liu <si-wei.liu@oracle.com>

Thanks,
-Siwei

On 1/31/2024 9:43 PM, Jason Wang wrote:
> On Mon, Jan 29, 2024 at 9:24 PM Eugenio Pérez <eperezma@redhat.com> wrote:
>> Commit a0d7215e33 ("vhost-vdpa: do not cleanup the vdpa/vhost-net
>> structures if peer nic is present") effectively delayed the backend
>> cleanup, allowing the frontend or the guest to access it resources as
>> long as the frontend NIC is still visible to the guest.
>>
>> However it does not clean up the resources until the qemu process is
>> over.  This causes an effective leak if the device is deleted with
>> device_del, as there is no way to close the vdpa device.  This makes
>> impossible to re-add that device to this or other QEMU instances until
>> the first instance of QEMU is finished.
>>
>> Move the cleanup from qemu_cleanup to the NIC deletion.
>>
>> Fixes: a0d7215e33 ("vhost-vdpa: do not cleanup the vdpa/vhost-net structures if peer nic is present")
>> Acked-by: Jason Wang <jasowang@redhat.com>
>> Reported-by: Lei Yang <leiyang@redhat.com>
>> Signed-off-by: Eugenio Pérez <eperezma@redhat.com>
>>
>> Eugenio Pérez (2):
>>    net: parameterize the removing client from nc list
>>    net: move backend cleanup to NIC cleanup
>>
>>   net/net.c        | 30 ++++++++++++++++++++----------
>>   net/vhost-vdpa.c |  8 --------
>>   2 files changed, 20 insertions(+), 18 deletions(-)
>>
>> --
> Queued.
>
> Thanks
>
Jason Wang Sept. 10, 2024, 3:46 a.m. UTC | #3
On Tue, Sep 10, 2024 at 11:41 AM Si-Wei Liu <si-wei.liu@oracle.com> wrote:
>
> Hi Jason,
>
> It seems this series wasn't applied successfully, I still cannot see it
> from the latest tree. Any idea?

It breaks make check.

Eugenio, would you want to fix and resend the series?

Thanks

>
> In any case the fix LGTM.
>
> Reviewed-by: Si-Wei Liu <si-wei.liu@oracle.com>
>
> Thanks,
> -Siwei
>
> On 1/31/2024 9:43 PM, Jason Wang wrote:
> > On Mon, Jan 29, 2024 at 9:24 PM Eugenio Pérez <eperezma@redhat.com> wrote:
> >> Commit a0d7215e33 ("vhost-vdpa: do not cleanup the vdpa/vhost-net
> >> structures if peer nic is present") effectively delayed the backend
> >> cleanup, allowing the frontend or the guest to access it resources as
> >> long as the frontend NIC is still visible to the guest.
> >>
> >> However it does not clean up the resources until the qemu process is
> >> over.  This causes an effective leak if the device is deleted with
> >> device_del, as there is no way to close the vdpa device.  This makes
> >> impossible to re-add that device to this or other QEMU instances until
> >> the first instance of QEMU is finished.
> >>
> >> Move the cleanup from qemu_cleanup to the NIC deletion.
> >>
> >> Fixes: a0d7215e33 ("vhost-vdpa: do not cleanup the vdpa/vhost-net structures if peer nic is present")
> >> Acked-by: Jason Wang <jasowang@redhat.com>
> >> Reported-by: Lei Yang <leiyang@redhat.com>
> >> Signed-off-by: Eugenio Pérez <eperezma@redhat.com>
> >>
> >> Eugenio Pérez (2):
> >>    net: parameterize the removing client from nc list
> >>    net: move backend cleanup to NIC cleanup
> >>
> >>   net/net.c        | 30 ++++++++++++++++++++----------
> >>   net/vhost-vdpa.c |  8 --------
> >>   2 files changed, 20 insertions(+), 18 deletions(-)
> >>
> >> --
> > Queued.
> >
> > Thanks
> >
>
Eugenio Perez Martin Sept. 11, 2024, 9:04 a.m. UTC | #4
On Tue, Sep 10, 2024 at 5:46 AM Jason Wang <jasowang@redhat.com> wrote:
>
> On Tue, Sep 10, 2024 at 11:41 AM Si-Wei Liu <si-wei.liu@oracle.com> wrote:
> >
> > Hi Jason,
> >
> > It seems this series wasn't applied successfully, I still cannot see it
> > from the latest tree. Any idea?
>
> It breaks make check.
>
> Eugenio, would you want to fix and resend the series?
>

I'm trying to reproduce but with no luck :(.

For the record this is the failed log. Is it possible to try to
reproduce it again in the machine / env it crashed?

▶  10/354 ERROR:../tests/qtest/qos-test.
c:191:subprocess_run_one_test:
child process (/x86_64/pc/i440FX-pcihost/pci-bus-pc/pci-bus/virtio-net-pci/virtio-net/virtio-net-tests/vhost-user/migrate/subprocess
[1494462]) failed unexpectedly ERROR
 10/354 qemu:qtest+qtest-x86_64 / qtest-x86_64/qos-test
        ERROR           14.19s   killed by signal 6 SIGABRT
>>> PYTHON=/home/devel/git/qemu/build/pyvenv/bin/python3 G_TEST_DBUS_DAEMON=/home/devel/git/qemu/tests/dbus-vmstate-daemon.sh QTEST_QEMU_STORAGE_DAEMON_BINARY=./storage-daemon/qemu-storage-daemon QTEST_QEMU_IMG=./qemu-img QTEST_QEMU_BINARY=./qemu-system-x86_64 MALLOC_PERTURB_=82 /home/devel/git/qemu/build/tests/qtest/qos-test --tap -k
―――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――
✀  ―――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――
stderr:
Vhost user backend fails to broadcast fake RARP
../tests/qtest/libqtest.c:204: kill_qemu() detected QEMU death from
signal 11 (Segmentation fault) (core dumped)
../tests/qtest/libqtest.c:204: kill_qemu() detected QEMU death from
signal 11 (Segmentation fault) (core dumped)
**
ERROR:../tests/qtest/qos-test.c:191:subprocess_run_one_test: child
process (/x86_64/pc/i440FX-pcihost/pci-bus-pc/pci-bus/virtio-net-pci/virtio-net/virtio-net-tests/vhost-user/migrate/subprocess
[1494462]) failed unexpectedly
> Thanks
>
> >
> > In any case the fix LGTM.
> >
> > Reviewed-by: Si-Wei Liu <si-wei.liu@oracle.com>
> >
> > Thanks,
> > -Siwei
> >
> > On 1/31/2024 9:43 PM, Jason Wang wrote:
> > > On Mon, Jan 29, 2024 at 9:24 PM Eugenio Pérez <eperezma@redhat.com> wrote:
> > >> Commit a0d7215e33 ("vhost-vdpa: do not cleanup the vdpa/vhost-net
> > >> structures if peer nic is present") effectively delayed the backend
> > >> cleanup, allowing the frontend or the guest to access it resources as
> > >> long as the frontend NIC is still visible to the guest.
> > >>
> > >> However it does not clean up the resources until the qemu process is
> > >> over.  This causes an effective leak if the device is deleted with
> > >> device_del, as there is no way to close the vdpa device.  This makes
> > >> impossible to re-add that device to this or other QEMU instances until
> > >> the first instance of QEMU is finished.
> > >>
> > >> Move the cleanup from qemu_cleanup to the NIC deletion.
> > >>
> > >> Fixes: a0d7215e33 ("vhost-vdpa: do not cleanup the vdpa/vhost-net structures if peer nic is present")
> > >> Acked-by: Jason Wang <jasowang@redhat.com>
> > >> Reported-by: Lei Yang <leiyang@redhat.com>
> > >> Signed-off-by: Eugenio Pérez <eperezma@redhat.com>
> > >>
> > >> Eugenio Pérez (2):
> > >>    net: parameterize the removing client from nc list
> > >>    net: move backend cleanup to NIC cleanup
> > >>
> > >>   net/net.c        | 30 ++++++++++++++++++++----------
> > >>   net/vhost-vdpa.c |  8 --------
> > >>   2 files changed, 20 insertions(+), 18 deletions(-)
> > >>
> > >> --
> > > Queued.
> > >
> > > Thanks
> > >
> >
>
Eugenio Perez Martin Sept. 11, 2024, 5:48 p.m. UTC | #5
On Wed, Sep 11, 2024 at 11:04 AM Eugenio Perez Martin
<eperezma@redhat.com> wrote:
>
> On Tue, Sep 10, 2024 at 5:46 AM Jason Wang <jasowang@redhat.com> wrote:
> >
> > On Tue, Sep 10, 2024 at 11:41 AM Si-Wei Liu <si-wei.liu@oracle.com> wrote:
> > >
> > > Hi Jason,
> > >
> > > It seems this series wasn't applied successfully, I still cannot see it
> > > from the latest tree. Any idea?
> >
> > It breaks make check.
> >
> > Eugenio, would you want to fix and resend the series?
> >
>
> I'm trying to reproduce but with no luck :(.
>

I'm able to reproduce consistently now.

> For the record this is the failed log. Is it possible to try to
> reproduce it again in the machine / env it crashed?
>
> ▶  10/354 ERROR:../tests/qtest/qos-test.
> c:191:subprocess_run_one_test:
> child process (/x86_64/pc/i440FX-pcihost/pci-bus-pc/pci-bus/virtio-net-pci/virtio-net/virtio-net-tests/vhost-user/migrate/subprocess
> [1494462]) failed unexpectedly ERROR
>  10/354 qemu:qtest+qtest-x86_64 / qtest-x86_64/qos-test
>         ERROR           14.19s   killed by signal 6 SIGABRT
> >>> PYTHON=/home/devel/git/qemu/build/pyvenv/bin/python3 G_TEST_DBUS_DAEMON=/home/devel/git/qemu/tests/dbus-vmstate-daemon.sh QTEST_QEMU_STORAGE_DAEMON_BINARY=./storage-daemon/qemu-storage-daemon QTEST_QEMU_IMG=./qemu-img QTEST_QEMU_BINARY=./qemu-system-x86_64 MALLOC_PERTURB_=82 /home/devel/git/qemu/build/tests/qtest/qos-test --tap -k
> ―――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――
> ✀  ―――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――
> stderr:
> Vhost user backend fails to broadcast fake RARP
> ../tests/qtest/libqtest.c:204: kill_qemu() detected QEMU death from
> signal 11 (Segmentation fault) (core dumped)
> ../tests/qtest/libqtest.c:204: kill_qemu() detected QEMU death from
> signal 11 (Segmentation fault) (core dumped)
> **
> ERROR:../tests/qtest/qos-test.c:191:subprocess_run_one_test: child
> process (/x86_64/pc/i440FX-pcihost/pci-bus-pc/pci-bus/virtio-net-pci/virtio-net/virtio-net-tests/vhost-user/migrate/subprocess
> [1494462]) failed unexpectedly
> > Thanks
> >
> > >
> > > In any case the fix LGTM.
> > >
> > > Reviewed-by: Si-Wei Liu <si-wei.liu@oracle.com>
> > >
> > > Thanks,
> > > -Siwei
> > >
> > > On 1/31/2024 9:43 PM, Jason Wang wrote:
> > > > On Mon, Jan 29, 2024 at 9:24 PM Eugenio Pérez <eperezma@redhat.com> wrote:
> > > >> Commit a0d7215e33 ("vhost-vdpa: do not cleanup the vdpa/vhost-net
> > > >> structures if peer nic is present") effectively delayed the backend
> > > >> cleanup, allowing the frontend or the guest to access it resources as
> > > >> long as the frontend NIC is still visible to the guest.
> > > >>
> > > >> However it does not clean up the resources until the qemu process is
> > > >> over.  This causes an effective leak if the device is deleted with
> > > >> device_del, as there is no way to close the vdpa device.  This makes
> > > >> impossible to re-add that device to this or other QEMU instances until
> > > >> the first instance of QEMU is finished.
> > > >>
> > > >> Move the cleanup from qemu_cleanup to the NIC deletion.
> > > >>
> > > >> Fixes: a0d7215e33 ("vhost-vdpa: do not cleanup the vdpa/vhost-net structures if peer nic is present")
> > > >> Acked-by: Jason Wang <jasowang@redhat.com>
> > > >> Reported-by: Lei Yang <leiyang@redhat.com>
> > > >> Signed-off-by: Eugenio Pérez <eperezma@redhat.com>
> > > >>
> > > >> Eugenio Pérez (2):
> > > >>    net: parameterize the removing client from nc list
> > > >>    net: move backend cleanup to NIC cleanup
> > > >>
> > > >>   net/net.c        | 30 ++++++++++++++++++++----------
> > > >>   net/vhost-vdpa.c |  8 --------
> > > >>   2 files changed, 20 insertions(+), 18 deletions(-)
> > > >>
> > > >> --
> > > > Queued.
> > > >
> > > > Thanks
> > > >
> > >
> >