mbox series

[0/7] vhost-vdpa multiqueue fixes

Message ID 1648621997-22416-1-git-send-email-si-wei.liu@oracle.com (mailing list archive)
Headers show
Series vhost-vdpa multiqueue fixes | expand

Message

Si-Wei Liu March 30, 2022, 6:33 a.m. UTC
Hi,

This patch series attempt to fix a few issues in vhost-vdpa multiqueue functionality. 

Patch #1 is the formal submission for RFC patch in:
https://lore.kernel.org/qemu-devel/c3e931ee-1a1b-9c2f-2f59-cb4395c230f9@oracle.com/

Patch #2 and #3 were taken from a previous patchset posted on qemu-devel:
https://lore.kernel.org/qemu-devel/20211117192851.65529-1-eperezma@redhat.com/

albeit abandoned, two patches in that set turn out to be useful for patch #4, which is to fix a QEMU crash due to race condition.

Patch #5 through #7 are obviously small bug fixes. Please find the description of each in the commit log.

Thanks,
-Siwei

---

Eugenio Pérez (2):
  virtio-net: Fix indentation
  virtio-net: Only enable userland vq if using tap backend

Si-Wei Liu (5):
  virtio-net: align ctrl_vq index for non-mq guest for vhost_vdpa
  virtio: don't read pending event on host notifier if disabled
  vhost-vdpa: fix improper cleanup in net_init_vhost_vdpa
  vhost-net: fix improper cleanup in vhost_net_start
  vhost-vdpa: backend feature should set only once

 hw/net/vhost_net.c         |  4 +++-
 hw/net/virtio-net.c        | 25 +++++++++++++++++++++----
 hw/virtio/vhost-vdpa.c     |  2 +-
 hw/virtio/virtio-bus.c     |  3 ++-
 hw/virtio/virtio.c         | 21 +++++++++++++--------
 include/hw/virtio/virtio.h |  2 ++
 net/vhost-vdpa.c           |  4 +++-
 7 files changed, 45 insertions(+), 16 deletions(-)

Comments

Jason Wang April 27, 2022, 4:28 a.m. UTC | #1
在 2022/3/30 14:33, Si-Wei Liu 写道:
> Hi,
>
> This patch series attempt to fix a few issues in vhost-vdpa multiqueue functionality.
>
> Patch #1 is the formal submission for RFC patch in:
> https://lore.kernel.org/qemu-devel/c3e931ee-1a1b-9c2f-2f59-cb4395c230f9@oracle.com/
>
> Patch #2 and #3 were taken from a previous patchset posted on qemu-devel:
> https://lore.kernel.org/qemu-devel/20211117192851.65529-1-eperezma@redhat.com/
>
> albeit abandoned, two patches in that set turn out to be useful for patch #4, which is to fix a QEMU crash due to race condition.
>
> Patch #5 through #7 are obviously small bug fixes. Please find the description of each in the commit log.
>
> Thanks,
> -Siwei


Hi Si Wei:

I think we need another version of this series?

Thanks


>
> ---
>
> Eugenio Pérez (2):
>    virtio-net: Fix indentation
>    virtio-net: Only enable userland vq if using tap backend
>
> Si-Wei Liu (5):
>    virtio-net: align ctrl_vq index for non-mq guest for vhost_vdpa
>    virtio: don't read pending event on host notifier if disabled
>    vhost-vdpa: fix improper cleanup in net_init_vhost_vdpa
>    vhost-net: fix improper cleanup in vhost_net_start
>    vhost-vdpa: backend feature should set only once
>
>   hw/net/vhost_net.c         |  4 +++-
>   hw/net/virtio-net.c        | 25 +++++++++++++++++++++----
>   hw/virtio/vhost-vdpa.c     |  2 +-
>   hw/virtio/virtio-bus.c     |  3 ++-
>   hw/virtio/virtio.c         | 21 +++++++++++++--------
>   include/hw/virtio/virtio.h |  2 ++
>   net/vhost-vdpa.c           |  4 +++-
>   7 files changed, 45 insertions(+), 16 deletions(-)
>
Si-Wei Liu April 27, 2022, 8:29 a.m. UTC | #2
On 4/26/2022 9:28 PM, Jason Wang wrote:
>
> 在 2022/3/30 14:33, Si-Wei Liu 写道:
>> Hi,
>>
>> This patch series attempt to fix a few issues in vhost-vdpa 
>> multiqueue functionality.
>>
>> Patch #1 is the formal submission for RFC patch in:
>> https://urldefense.com/v3/__https://lore.kernel.org/qemu-devel/c3e931ee-1a1b-9c2f-2f59-cb4395c230f9@oracle.com/__;!!ACWV5N9M2RV99hQ!OoUKcyWauHGQOM4MTAUn88TINQo5ZP4aaYyvyUCK9ggrI_L6diSZo5Nmq55moGH769SD87drxQyqg3ysNsk$ 
>>
>> Patch #2 and #3 were taken from a previous patchset posted on 
>> qemu-devel:
>> https://urldefense.com/v3/__https://lore.kernel.org/qemu-devel/20211117192851.65529-1-eperezma@redhat.com/__;!!ACWV5N9M2RV99hQ!OoUKcyWauHGQOM4MTAUn88TINQo5ZP4aaYyvyUCK9ggrI_L6diSZo5Nmq55moGH769SD87drxQyqc3mXqDs$ 
>>
>> albeit abandoned, two patches in that set turn out to be useful for 
>> patch #4, which is to fix a QEMU crash due to race condition.
>>
>> Patch #5 through #7 are obviously small bug fixes. Please find the 
>> description of each in the commit log.
>>
>> Thanks,
>> -Siwei
>
>
> Hi Si Wei:
>
> I think we need another version of this series?
Hi Jason,

Apologies for the long delay. I was in the middle of reworking the patch 
"virtio: don't read pending event on host notifier if disabled", but 
found out that it would need quite some code change for the userspace 
fallback handler to work properly (for the queue no. change case 
specifically). I have to drop it from the series and posted it later on 
when ready. Will post a v2 with relevant patches removed.

Regards,
-Siwei

>
> Thanks
>
>
>>
>> ---
>>
>> Eugenio Pérez (2):
>>    virtio-net: Fix indentation
>>    virtio-net: Only enable userland vq if using tap backend
>>
>> Si-Wei Liu (5):
>>    virtio-net: align ctrl_vq index for non-mq guest for vhost_vdpa
>>    virtio: don't read pending event on host notifier if disabled
>>    vhost-vdpa: fix improper cleanup in net_init_vhost_vdpa
>>    vhost-net: fix improper cleanup in vhost_net_start
>>    vhost-vdpa: backend feature should set only once
>>
>>   hw/net/vhost_net.c         |  4 +++-
>>   hw/net/virtio-net.c        | 25 +++++++++++++++++++++----
>>   hw/virtio/vhost-vdpa.c     |  2 +-
>>   hw/virtio/virtio-bus.c     |  3 ++-
>>   hw/virtio/virtio.c         | 21 +++++++++++++--------
>>   include/hw/virtio/virtio.h |  2 ++
>>   net/vhost-vdpa.c           |  4 +++-
>>   7 files changed, 45 insertions(+), 16 deletions(-)
>>
>
Jason Wang April 27, 2022, 8:38 a.m. UTC | #3
On Wed, Apr 27, 2022 at 4:30 PM Si-Wei Liu <si-wei.liu@oracle.com> wrote:
>
>
>
> On 4/26/2022 9:28 PM, Jason Wang wrote:
> >
> > 在 2022/3/30 14:33, Si-Wei Liu 写道:
> >> Hi,
> >>
> >> This patch series attempt to fix a few issues in vhost-vdpa
> >> multiqueue functionality.
> >>
> >> Patch #1 is the formal submission for RFC patch in:
> >> https://urldefense.com/v3/__https://lore.kernel.org/qemu-devel/c3e931ee-1a1b-9c2f-2f59-cb4395c230f9@oracle.com/__;!!ACWV5N9M2RV99hQ!OoUKcyWauHGQOM4MTAUn88TINQo5ZP4aaYyvyUCK9ggrI_L6diSZo5Nmq55moGH769SD87drxQyqg3ysNsk$
> >>
> >> Patch #2 and #3 were taken from a previous patchset posted on
> >> qemu-devel:
> >> https://urldefense.com/v3/__https://lore.kernel.org/qemu-devel/20211117192851.65529-1-eperezma@redhat.com/__;!!ACWV5N9M2RV99hQ!OoUKcyWauHGQOM4MTAUn88TINQo5ZP4aaYyvyUCK9ggrI_L6diSZo5Nmq55moGH769SD87drxQyqc3mXqDs$
> >>
> >> albeit abandoned, two patches in that set turn out to be useful for
> >> patch #4, which is to fix a QEMU crash due to race condition.
> >>
> >> Patch #5 through #7 are obviously small bug fixes. Please find the
> >> description of each in the commit log.
> >>
> >> Thanks,
> >> -Siwei
> >
> >
> > Hi Si Wei:
> >
> > I think we need another version of this series?
> Hi Jason,
>
> Apologies for the long delay. I was in the middle of reworking the patch
> "virtio: don't read pending event on host notifier if disabled", but
> found out that it would need quite some code change for the userspace
> fallback handler to work properly (for the queue no. change case
> specifically).

We probably need this fix for -stable, so I wonder if we can have a
workaround first and do refactoring on top?

> I have to drop it from the series and posted it later on
> when ready. Will post a v2 with relevant patches removed.

Thanks

>
> Regards,
> -Siwei
>
> >
> > Thanks
> >
> >
> >>
> >> ---
> >>
> >> Eugenio Pérez (2):
> >>    virtio-net: Fix indentation
> >>    virtio-net: Only enable userland vq if using tap backend
> >>
> >> Si-Wei Liu (5):
> >>    virtio-net: align ctrl_vq index for non-mq guest for vhost_vdpa
> >>    virtio: don't read pending event on host notifier if disabled
> >>    vhost-vdpa: fix improper cleanup in net_init_vhost_vdpa
> >>    vhost-net: fix improper cleanup in vhost_net_start
> >>    vhost-vdpa: backend feature should set only once
> >>
> >>   hw/net/vhost_net.c         |  4 +++-
> >>   hw/net/virtio-net.c        | 25 +++++++++++++++++++++----
> >>   hw/virtio/vhost-vdpa.c     |  2 +-
> >>   hw/virtio/virtio-bus.c     |  3 ++-
> >>   hw/virtio/virtio.c         | 21 +++++++++++++--------
> >>   include/hw/virtio/virtio.h |  2 ++
> >>   net/vhost-vdpa.c           |  4 +++-
> >>   7 files changed, 45 insertions(+), 16 deletions(-)
> >>
> >
>
Si-Wei Liu April 27, 2022, 9:09 a.m. UTC | #4
On 4/27/2022 1:38 AM, Jason Wang wrote:
> On Wed, Apr 27, 2022 at 4:30 PM Si-Wei Liu <si-wei.liu@oracle.com> wrote:
>>
>>
>> On 4/26/2022 9:28 PM, Jason Wang wrote:
>>> 在 2022/3/30 14:33, Si-Wei Liu 写道:
>>>> Hi,
>>>>
>>>> This patch series attempt to fix a few issues in vhost-vdpa
>>>> multiqueue functionality.
>>>>
>>>> Patch #1 is the formal submission for RFC patch in:
>>>> https://urldefense.com/v3/__https://lore.kernel.org/qemu-devel/c3e931ee-1a1b-9c2f-2f59-cb4395c230f9@oracle.com/__;!!ACWV5N9M2RV99hQ!OoUKcyWauHGQOM4MTAUn88TINQo5ZP4aaYyvyUCK9ggrI_L6diSZo5Nmq55moGH769SD87drxQyqg3ysNsk$
>>>>
>>>> Patch #2 and #3 were taken from a previous patchset posted on
>>>> qemu-devel:
>>>> https://urldefense.com/v3/__https://lore.kernel.org/qemu-devel/20211117192851.65529-1-eperezma@redhat.com/__;!!ACWV5N9M2RV99hQ!OoUKcyWauHGQOM4MTAUn88TINQo5ZP4aaYyvyUCK9ggrI_L6diSZo5Nmq55moGH769SD87drxQyqc3mXqDs$
>>>>
>>>> albeit abandoned, two patches in that set turn out to be useful for
>>>> patch #4, which is to fix a QEMU crash due to race condition.
>>>>
>>>> Patch #5 through #7 are obviously small bug fixes. Please find the
>>>> description of each in the commit log.
>>>>
>>>> Thanks,
>>>> -Siwei
>>>
>>> Hi Si Wei:
>>>
>>> I think we need another version of this series?
>> Hi Jason,
>>
>> Apologies for the long delay. I was in the middle of reworking the patch
>> "virtio: don't read pending event on host notifier if disabled", but
>> found out that it would need quite some code change for the userspace
>> fallback handler to work properly (for the queue no. change case
>> specifically).
> We probably need this fix for -stable, so I wonder if we can have a
> workaround first and do refactoring on top?
Hmmm, a nasty fix but may well address the segfault is something like this:

diff --git a/hw/net/virtio-net.c b/hw/net/virtio-net.c
index 8ca0b80..3ac93a4 100644
--- a/hw/net/virtio-net.c
+++ b/hw/net/virtio-net.c
@@ -368,6 +368,10 @@ static void virtio_net_set_status(struct 
VirtIODevice *vdev, uint8_t status)
      int i;
      uint8_t queue_status;

+    if (n->status_pending)
+        return;
+
+    n->status_pending = true;
      virtio_net_vnet_endian_status(n, status);
      virtio_net_vhost_status(n, status);

@@ -416,6 +420,7 @@ static void virtio_net_set_status(struct 
VirtIODevice *vdev, uint8_t status)
              }
          }
      }
+    n->status_pending = false;
  }

  static void virtio_net_set_link_status(NetClientState *nc)
diff --git a/include/hw/virtio/virtio-net.h b/include/hw/virtio/virtio-net.h
index eb87032..95efea8 100644
--- a/include/hw/virtio/virtio-net.h
+++ b/include/hw/virtio/virtio-net.h
@@ -216,6 +216,7 @@ struct VirtIONet {
      VirtioNetRssData rss_data;
      struct NetRxPkt *rx_pkt;
      struct EBPFRSSContext ebpf_rss;
+    bool status_pending;
  };

  void virtio_net_set_netclient_name(VirtIONet *n, const char *name,

To be honest, I am not sure if this is worth a full blown fix to make it 
completely work. Without applying vq suspend patch (the one I posted in 
https://lore.kernel.org/qemu-devel/df7c9a87-b2bd-7758-a6b6-bd834a7336fe@oracle.com/), 
it's very hard for me to effectively verify my code change - it's very 
easy for the guest vq index to be out of sync if not stopping the vq 
once the vhost is up and running (I tested it with repeatedly set_link 
off and on). I am not sure if there's real chance we can run into issue 
in practice due to the incomplete fix, if we don't fix the vq 
stop/suspend issue first. Anyway I will try, as other use case e.g, live 
migration is likely to get stumbled on it, too.

-Siwei


>
>> I have to drop it from the series and posted it later on
>> when ready. Will post a v2 with relevant patches removed.
> Thanks
>
>> Regards,
>> -Siwei
>>
>>> Thanks
>>>
>>>
>>>> ---
>>>>
>>>> Eugenio Pérez (2):
>>>>     virtio-net: Fix indentation
>>>>     virtio-net: Only enable userland vq if using tap backend
>>>>
>>>> Si-Wei Liu (5):
>>>>     virtio-net: align ctrl_vq index for non-mq guest for vhost_vdpa
>>>>     virtio: don't read pending event on host notifier if disabled
>>>>     vhost-vdpa: fix improper cleanup in net_init_vhost_vdpa
>>>>     vhost-net: fix improper cleanup in vhost_net_start
>>>>     vhost-vdpa: backend feature should set only once
>>>>
>>>>    hw/net/vhost_net.c         |  4 +++-
>>>>    hw/net/virtio-net.c        | 25 +++++++++++++++++++++----
>>>>    hw/virtio/vhost-vdpa.c     |  2 +-
>>>>    hw/virtio/virtio-bus.c     |  3 ++-
>>>>    hw/virtio/virtio.c         | 21 +++++++++++++--------
>>>>    include/hw/virtio/virtio.h |  2 ++
>>>>    net/vhost-vdpa.c           |  4 +++-
>>>>    7 files changed, 45 insertions(+), 16 deletions(-)
>>>>
Jason Wang April 29, 2022, 2:30 a.m. UTC | #5
On Wed, Apr 27, 2022 at 5:09 PM Si-Wei Liu <si-wei.liu@oracle.com> wrote:
>
>
>
> On 4/27/2022 1:38 AM, Jason Wang wrote:
> > On Wed, Apr 27, 2022 at 4:30 PM Si-Wei Liu <si-wei.liu@oracle.com> wrote:
> >>
> >>
> >> On 4/26/2022 9:28 PM, Jason Wang wrote:
> >>> 在 2022/3/30 14:33, Si-Wei Liu 写道:
> >>>> Hi,
> >>>>
> >>>> This patch series attempt to fix a few issues in vhost-vdpa
> >>>> multiqueue functionality.
> >>>>
> >>>> Patch #1 is the formal submission for RFC patch in:
> >>>> https://urldefense.com/v3/__https://lore.kernel.org/qemu-devel/c3e931ee-1a1b-9c2f-2f59-cb4395c230f9@oracle.com/__;!!ACWV5N9M2RV99hQ!OoUKcyWauHGQOM4MTAUn88TINQo5ZP4aaYyvyUCK9ggrI_L6diSZo5Nmq55moGH769SD87drxQyqg3ysNsk$
> >>>>
> >>>> Patch #2 and #3 were taken from a previous patchset posted on
> >>>> qemu-devel:
> >>>> https://urldefense.com/v3/__https://lore.kernel.org/qemu-devel/20211117192851.65529-1-eperezma@redhat.com/__;!!ACWV5N9M2RV99hQ!OoUKcyWauHGQOM4MTAUn88TINQo5ZP4aaYyvyUCK9ggrI_L6diSZo5Nmq55moGH769SD87drxQyqc3mXqDs$
> >>>>
> >>>> albeit abandoned, two patches in that set turn out to be useful for
> >>>> patch #4, which is to fix a QEMU crash due to race condition.
> >>>>
> >>>> Patch #5 through #7 are obviously small bug fixes. Please find the
> >>>> description of each in the commit log.
> >>>>
> >>>> Thanks,
> >>>> -Siwei
> >>>
> >>> Hi Si Wei:
> >>>
> >>> I think we need another version of this series?
> >> Hi Jason,
> >>
> >> Apologies for the long delay. I was in the middle of reworking the patch
> >> "virtio: don't read pending event on host notifier if disabled", but
> >> found out that it would need quite some code change for the userspace
> >> fallback handler to work properly (for the queue no. change case
> >> specifically).
> > We probably need this fix for -stable, so I wonder if we can have a
> > workaround first and do refactoring on top?
> Hmmm, a nasty fix but may well address the segfault is something like this:
>
> diff --git a/hw/net/virtio-net.c b/hw/net/virtio-net.c
> index 8ca0b80..3ac93a4 100644
> --- a/hw/net/virtio-net.c
> +++ b/hw/net/virtio-net.c
> @@ -368,6 +368,10 @@ static void virtio_net_set_status(struct
> VirtIODevice *vdev, uint8_t status)
>       int i;
>       uint8_t queue_status;
>
> +    if (n->status_pending)
> +        return;
> +
> +    n->status_pending = true;
>       virtio_net_vnet_endian_status(n, status);
>       virtio_net_vhost_status(n, status);
>
> @@ -416,6 +420,7 @@ static void virtio_net_set_status(struct
> VirtIODevice *vdev, uint8_t status)
>               }
>           }
>       }
> +    n->status_pending = false;
>   }
>
>   static void virtio_net_set_link_status(NetClientState *nc)
> diff --git a/include/hw/virtio/virtio-net.h b/include/hw/virtio/virtio-net.h
> index eb87032..95efea8 100644
> --- a/include/hw/virtio/virtio-net.h
> +++ b/include/hw/virtio/virtio-net.h
> @@ -216,6 +216,7 @@ struct VirtIONet {
>       VirtioNetRssData rss_data;
>       struct NetRxPkt *rx_pkt;
>       struct EBPFRSSContext ebpf_rss;
> +    bool status_pending;
>   };
>
>   void virtio_net_set_netclient_name(VirtIONet *n, const char *name,
>
> To be honest, I am not sure if this is worth a full blown fix to make it
> completely work. Without applying vq suspend patch (the one I posted in
> https://lore.kernel.org/qemu-devel/df7c9a87-b2bd-7758-a6b6-bd834a7336fe@oracle.com/),
> it's very hard for me to effectively verify my code change - it's very
> easy for the guest vq index to be out of sync if not stopping the vq
> once the vhost is up and running (I tested it with repeatedly set_link
> off and on).

Can we test via vmstop?

> I am not sure if there's real chance we can run into issue
> in practice due to the incomplete fix, if we don't fix the vq
> stop/suspend issue first. Anyway I will try, as other use case e.g, live
> migration is likely to get stumbled on it, too.

Ok, so I think we probably don't need the "nasty" fix above. Let's fix
it with the issue of stop/resume.

Thanks

>
> -Siwei
>
>
> >
> >> I have to drop it from the series and posted it later on
> >> when ready. Will post a v2 with relevant patches removed.
> > Thanks
> >
> >> Regards,
> >> -Siwei
> >>
> >>> Thanks
> >>>
> >>>
> >>>> ---
> >>>>
> >>>> Eugenio Pérez (2):
> >>>>     virtio-net: Fix indentation
> >>>>     virtio-net: Only enable userland vq if using tap backend
> >>>>
> >>>> Si-Wei Liu (5):
> >>>>     virtio-net: align ctrl_vq index for non-mq guest for vhost_vdpa
> >>>>     virtio: don't read pending event on host notifier if disabled
> >>>>     vhost-vdpa: fix improper cleanup in net_init_vhost_vdpa
> >>>>     vhost-net: fix improper cleanup in vhost_net_start
> >>>>     vhost-vdpa: backend feature should set only once
> >>>>
> >>>>    hw/net/vhost_net.c         |  4 +++-
> >>>>    hw/net/virtio-net.c        | 25 +++++++++++++++++++++----
> >>>>    hw/virtio/vhost-vdpa.c     |  2 +-
> >>>>    hw/virtio/virtio-bus.c     |  3 ++-
> >>>>    hw/virtio/virtio.c         | 21 +++++++++++++--------
> >>>>    include/hw/virtio/virtio.h |  2 ++
> >>>>    net/vhost-vdpa.c           |  4 +++-
> >>>>    7 files changed, 45 insertions(+), 16 deletions(-)
> >>>>
>
Si-Wei Liu April 30, 2022, 2:07 a.m. UTC | #6
On 4/28/2022 7:30 PM, Jason Wang wrote:
> On Wed, Apr 27, 2022 at 5:09 PM Si-Wei Liu <si-wei.liu@oracle.com> wrote:
>>
>>
>> On 4/27/2022 1:38 AM, Jason Wang wrote:
>>> On Wed, Apr 27, 2022 at 4:30 PM Si-Wei Liu <si-wei.liu@oracle.com> wrote:
>>>>
>>>> On 4/26/2022 9:28 PM, Jason Wang wrote:
>>>>> 在 2022/3/30 14:33, Si-Wei Liu 写道:
>>>>>> Hi,
>>>>>>
>>>>>> This patch series attempt to fix a few issues in vhost-vdpa
>>>>>> multiqueue functionality.
>>>>>>
>>>>>> Patch #1 is the formal submission for RFC patch in:
>>>>>> https://urldefense.com/v3/__https://lore.kernel.org/qemu-devel/c3e931ee-1a1b-9c2f-2f59-cb4395c230f9@oracle.com/__;!!ACWV5N9M2RV99hQ!OoUKcyWauHGQOM4MTAUn88TINQo5ZP4aaYyvyUCK9ggrI_L6diSZo5Nmq55moGH769SD87drxQyqg3ysNsk$
>>>>>>
>>>>>> Patch #2 and #3 were taken from a previous patchset posted on
>>>>>> qemu-devel:
>>>>>> https://urldefense.com/v3/__https://lore.kernel.org/qemu-devel/20211117192851.65529-1-eperezma@redhat.com/__;!!ACWV5N9M2RV99hQ!OoUKcyWauHGQOM4MTAUn88TINQo5ZP4aaYyvyUCK9ggrI_L6diSZo5Nmq55moGH769SD87drxQyqc3mXqDs$
>>>>>>
>>>>>> albeit abandoned, two patches in that set turn out to be useful for
>>>>>> patch #4, which is to fix a QEMU crash due to race condition.
>>>>>>
>>>>>> Patch #5 through #7 are obviously small bug fixes. Please find the
>>>>>> description of each in the commit log.
>>>>>>
>>>>>> Thanks,
>>>>>> -Siwei
>>>>> Hi Si Wei:
>>>>>
>>>>> I think we need another version of this series?
>>>> Hi Jason,
>>>>
>>>> Apologies for the long delay. I was in the middle of reworking the patch
>>>> "virtio: don't read pending event on host notifier if disabled", but
>>>> found out that it would need quite some code change for the userspace
>>>> fallback handler to work properly (for the queue no. change case
>>>> specifically).
>>> We probably need this fix for -stable, so I wonder if we can have a
>>> workaround first and do refactoring on top?
>> Hmmm, a nasty fix but may well address the segfault is something like this:
>>
>> diff --git a/hw/net/virtio-net.c b/hw/net/virtio-net.c
>> index 8ca0b80..3ac93a4 100644
>> --- a/hw/net/virtio-net.c
>> +++ b/hw/net/virtio-net.c
>> @@ -368,6 +368,10 @@ static void virtio_net_set_status(struct
>> VirtIODevice *vdev, uint8_t status)
>>        int i;
>>        uint8_t queue_status;
>>
>> +    if (n->status_pending)
>> +        return;
>> +
>> +    n->status_pending = true;
>>        virtio_net_vnet_endian_status(n, status);
>>        virtio_net_vhost_status(n, status);
>>
>> @@ -416,6 +420,7 @@ static void virtio_net_set_status(struct
>> VirtIODevice *vdev, uint8_t status)
>>                }
>>            }
>>        }
>> +    n->status_pending = false;
>>    }
>>
>>    static void virtio_net_set_link_status(NetClientState *nc)
>> diff --git a/include/hw/virtio/virtio-net.h b/include/hw/virtio/virtio-net.h
>> index eb87032..95efea8 100644
>> --- a/include/hw/virtio/virtio-net.h
>> +++ b/include/hw/virtio/virtio-net.h
>> @@ -216,6 +216,7 @@ struct VirtIONet {
>>        VirtioNetRssData rss_data;
>>        struct NetRxPkt *rx_pkt;
>>        struct EBPFRSSContext ebpf_rss;
>> +    bool status_pending;
>>    };
>>
>>    void virtio_net_set_netclient_name(VirtIONet *n, const char *name,
>>
>> To be honest, I am not sure if this is worth a full blown fix to make it
>> completely work. Without applying vq suspend patch (the one I posted in
>> https://urldefense.com/v3/__https://lore.kernel.org/qemu-devel/df7c9a87-b2bd-7758-a6b6-bd834a7336fe@oracle.com/__;!!ACWV5N9M2RV99hQ!L4qque3YpPr-CGp12NrNdMMT1HROfEY_Juw2vnfZXHjOhtT0XJCR9GB8cvWEbJL9Aeh-WhDogBVArJn91P0$ ),
>> it's very hard for me to effectively verify my code change - it's very
>> easy for the guest vq index to be out of sync if not stopping the vq
>> once the vhost is up and running (I tested it with repeatedly set_link
>> off and on).
> Can we test via vmstop?
Yes, of coz, that's where the segfault happened. The tight loop of 
set_link on/off doesn't even work for the single queue case, hence 
that's why I doubted it ever worked for vhost-vdpa.

>
>> I am not sure if there's real chance we can run into issue
>> in practice due to the incomplete fix, if we don't fix the vq
>> stop/suspend issue first. Anyway I will try, as other use case e.g, live
>> migration is likely to get stumbled on it, too.
> Ok, so I think we probably don't need the "nasty" fix above. Let's fix
> it with the issue of stop/resume.
Ok, then does below tentative code change suffice the need? i.e. it 
would fail the request of changing queue pairs when the vhost-vdpa 
backend falls back to the userspace handler, but it's probably the 
easiest way to fix the vmstop segfault.

diff --git a/hw/net/virtio-net.c b/hw/net/virtio-net.c
index ed231f9..8ba9f09 100644
--- a/hw/net/virtio-net.c
+++ b/hw/net/virtio-net.c
@@ -1177,6 +1177,7 @@ static int virtio_net_handle_mq(VirtIONet *n, 
uint8_t cmd,
      struct virtio_net_ctrl_mq mq;
      size_t s;
      uint16_t queue_pairs;
+    NetClientState *nc = qemu_get_queue(n->nic);

      s = iov_to_buf(iov, iov_cnt, 0, &mq, sizeof(mq));
      if (s != sizeof(mq)) {
@@ -1196,6 +1197,13 @@ static int virtio_net_handle_mq(VirtIONet *n, 
uint8_t cmd,
          return VIRTIO_NET_ERR;
      }

+    /* avoid changing the number of queue_pairs for vdpa device in
+     * userspace handler.
+     * TODO: get userspace fallback to work with future patch */
+    if (nc->peer && nc->peer->info->type == NET_CLIENT_DRIVER_VHOST_VDPA) {
+        return VIRTIO_NET_ERR;
+    }
+
      n->curr_queue_pairs = queue_pairs;
      /* stop the backend before changing the number of queue_pairs to 
avoid handling a
       * disabled queue */

Thanks,
-Siwei
>
> Thanks
>
>> -Siwei
>>
>>
>>>> I have to drop it from the series and posted it later on
>>>> when ready. Will post a v2 with relevant patches removed.
>>> Thanks
>>>
>>>> Regards,
>>>> -Siwei
>>>>
>>>>> Thanks
>>>>>
>>>>>
>>>>>> ---
>>>>>>
>>>>>> Eugenio Pérez (2):
>>>>>>      virtio-net: Fix indentation
>>>>>>      virtio-net: Only enable userland vq if using tap backend
>>>>>>
>>>>>> Si-Wei Liu (5):
>>>>>>      virtio-net: align ctrl_vq index for non-mq guest for vhost_vdpa
>>>>>>      virtio: don't read pending event on host notifier if disabled
>>>>>>      vhost-vdpa: fix improper cleanup in net_init_vhost_vdpa
>>>>>>      vhost-net: fix improper cleanup in vhost_net_start
>>>>>>      vhost-vdpa: backend feature should set only once
>>>>>>
>>>>>>     hw/net/vhost_net.c         |  4 +++-
>>>>>>     hw/net/virtio-net.c        | 25 +++++++++++++++++++++----
>>>>>>     hw/virtio/vhost-vdpa.c     |  2 +-
>>>>>>     hw/virtio/virtio-bus.c     |  3 ++-
>>>>>>     hw/virtio/virtio.c         | 21 +++++++++++++--------
>>>>>>     include/hw/virtio/virtio.h |  2 ++
>>>>>>     net/vhost-vdpa.c           |  4 +++-
>>>>>>     7 files changed, 45 insertions(+), 16 deletions(-)
>>>>>>
Jason Wang May 5, 2022, 8:40 a.m. UTC | #7
On Sat, Apr 30, 2022 at 10:07 AM Si-Wei Liu <si-wei.liu@oracle.com> wrote:
>
>
>
> On 4/28/2022 7:30 PM, Jason Wang wrote:
> > On Wed, Apr 27, 2022 at 5:09 PM Si-Wei Liu <si-wei.liu@oracle.com> wrote:
> >>
> >>
> >> On 4/27/2022 1:38 AM, Jason Wang wrote:
> >>> On Wed, Apr 27, 2022 at 4:30 PM Si-Wei Liu <si-wei.liu@oracle.com> wrote:
> >>>>
> >>>> On 4/26/2022 9:28 PM, Jason Wang wrote:
> >>>>> 在 2022/3/30 14:33, Si-Wei Liu 写道:
> >>>>>> Hi,
> >>>>>>
> >>>>>> This patch series attempt to fix a few issues in vhost-vdpa
> >>>>>> multiqueue functionality.
> >>>>>>
> >>>>>> Patch #1 is the formal submission for RFC patch in:
> >>>>>> https://urldefense.com/v3/__https://lore.kernel.org/qemu-devel/c3e931ee-1a1b-9c2f-2f59-cb4395c230f9@oracle.com/__;!!ACWV5N9M2RV99hQ!OoUKcyWauHGQOM4MTAUn88TINQo5ZP4aaYyvyUCK9ggrI_L6diSZo5Nmq55moGH769SD87drxQyqg3ysNsk$
> >>>>>>
> >>>>>> Patch #2 and #3 were taken from a previous patchset posted on
> >>>>>> qemu-devel:
> >>>>>> https://urldefense.com/v3/__https://lore.kernel.org/qemu-devel/20211117192851.65529-1-eperezma@redhat.com/__;!!ACWV5N9M2RV99hQ!OoUKcyWauHGQOM4MTAUn88TINQo5ZP4aaYyvyUCK9ggrI_L6diSZo5Nmq55moGH769SD87drxQyqc3mXqDs$
> >>>>>>
> >>>>>> albeit abandoned, two patches in that set turn out to be useful for
> >>>>>> patch #4, which is to fix a QEMU crash due to race condition.
> >>>>>>
> >>>>>> Patch #5 through #7 are obviously small bug fixes. Please find the
> >>>>>> description of each in the commit log.
> >>>>>>
> >>>>>> Thanks,
> >>>>>> -Siwei
> >>>>> Hi Si Wei:
> >>>>>
> >>>>> I think we need another version of this series?
> >>>> Hi Jason,
> >>>>
> >>>> Apologies for the long delay. I was in the middle of reworking the patch
> >>>> "virtio: don't read pending event on host notifier if disabled", but
> >>>> found out that it would need quite some code change for the userspace
> >>>> fallback handler to work properly (for the queue no. change case
> >>>> specifically).
> >>> We probably need this fix for -stable, so I wonder if we can have a
> >>> workaround first and do refactoring on top?
> >> Hmmm, a nasty fix but may well address the segfault is something like this:
> >>
> >> diff --git a/hw/net/virtio-net.c b/hw/net/virtio-net.c
> >> index 8ca0b80..3ac93a4 100644
> >> --- a/hw/net/virtio-net.c
> >> +++ b/hw/net/virtio-net.c
> >> @@ -368,6 +368,10 @@ static void virtio_net_set_status(struct
> >> VirtIODevice *vdev, uint8_t status)
> >>        int i;
> >>        uint8_t queue_status;
> >>
> >> +    if (n->status_pending)
> >> +        return;
> >> +
> >> +    n->status_pending = true;
> >>        virtio_net_vnet_endian_status(n, status);
> >>        virtio_net_vhost_status(n, status);
> >>
> >> @@ -416,6 +420,7 @@ static void virtio_net_set_status(struct
> >> VirtIODevice *vdev, uint8_t status)
> >>                }
> >>            }
> >>        }
> >> +    n->status_pending = false;
> >>    }
> >>
> >>    static void virtio_net_set_link_status(NetClientState *nc)
> >> diff --git a/include/hw/virtio/virtio-net.h b/include/hw/virtio/virtio-net.h
> >> index eb87032..95efea8 100644
> >> --- a/include/hw/virtio/virtio-net.h
> >> +++ b/include/hw/virtio/virtio-net.h
> >> @@ -216,6 +216,7 @@ struct VirtIONet {
> >>        VirtioNetRssData rss_data;
> >>        struct NetRxPkt *rx_pkt;
> >>        struct EBPFRSSContext ebpf_rss;
> >> +    bool status_pending;
> >>    };
> >>
> >>    void virtio_net_set_netclient_name(VirtIONet *n, const char *name,
> >>
> >> To be honest, I am not sure if this is worth a full blown fix to make it
> >> completely work. Without applying vq suspend patch (the one I posted in
> >> https://urldefense.com/v3/__https://lore.kernel.org/qemu-devel/df7c9a87-b2bd-7758-a6b6-bd834a7336fe@oracle.com/__;!!ACWV5N9M2RV99hQ!L4qque3YpPr-CGp12NrNdMMT1HROfEY_Juw2vnfZXHjOhtT0XJCR9GB8cvWEbJL9Aeh-WhDogBVArJn91P0$ ),
> >> it's very hard for me to effectively verify my code change - it's very
> >> easy for the guest vq index to be out of sync if not stopping the vq
> >> once the vhost is up and running (I tested it with repeatedly set_link
> >> off and on).
> > Can we test via vmstop?
> Yes, of coz, that's where the segfault happened. The tight loop of
> set_link on/off doesn't even work for the single queue case, hence
> that's why I doubted it ever worked for vhost-vdpa.

Right, this is something we need to check. Set_link should stop the
vhost device anyhow, otherwise it should be a bug.

>
> >
> >> I am not sure if there's real chance we can run into issue
> >> in practice due to the incomplete fix, if we don't fix the vq
> >> stop/suspend issue first. Anyway I will try, as other use case e.g, live
> >> migration is likely to get stumbled on it, too.
> > Ok, so I think we probably don't need the "nasty" fix above. Let's fix
> > it with the issue of stop/resume.
> Ok, then does below tentative code change suffice the need? i.e. it
> would fail the request of changing queue pairs when the vhost-vdpa
> backend falls back to the userspace handler, but it's probably the
> easiest way to fix the vmstop segfault.

Probably, let's see.

Thanks

>
> diff --git a/hw/net/virtio-net.c b/hw/net/virtio-net.c
> index ed231f9..8ba9f09 100644
> --- a/hw/net/virtio-net.c
> +++ b/hw/net/virtio-net.c
> @@ -1177,6 +1177,7 @@ static int virtio_net_handle_mq(VirtIONet *n,
> uint8_t cmd,
>       struct virtio_net_ctrl_mq mq;
>       size_t s;
>       uint16_t queue_pairs;
> +    NetClientState *nc = qemu_get_queue(n->nic);
>
>       s = iov_to_buf(iov, iov_cnt, 0, &mq, sizeof(mq));
>       if (s != sizeof(mq)) {
> @@ -1196,6 +1197,13 @@ static int virtio_net_handle_mq(VirtIONet *n,
> uint8_t cmd,
>           return VIRTIO_NET_ERR;
>       }
>
> +    /* avoid changing the number of queue_pairs for vdpa device in
> +     * userspace handler.
> +     * TODO: get userspace fallback to work with future patch */
> +    if (nc->peer && nc->peer->info->type == NET_CLIENT_DRIVER_VHOST_VDPA) {
> +        return VIRTIO_NET_ERR;
> +    }
> +
>       n->curr_queue_pairs = queue_pairs;
>       /* stop the backend before changing the number of queue_pairs to
> avoid handling a
>        * disabled queue */
>
> Thanks,
> -Siwei
> >
> > Thanks
> >
> >> -Siwei
> >>
> >>
> >>>> I have to drop it from the series and posted it later on
> >>>> when ready. Will post a v2 with relevant patches removed.
> >>> Thanks
> >>>
> >>>> Regards,
> >>>> -Siwei
> >>>>
> >>>>> Thanks
> >>>>>
> >>>>>
> >>>>>> ---
> >>>>>>
> >>>>>> Eugenio Pérez (2):
> >>>>>>      virtio-net: Fix indentation
> >>>>>>      virtio-net: Only enable userland vq if using tap backend
> >>>>>>
> >>>>>> Si-Wei Liu (5):
> >>>>>>      virtio-net: align ctrl_vq index for non-mq guest for vhost_vdpa
> >>>>>>      virtio: don't read pending event on host notifier if disabled
> >>>>>>      vhost-vdpa: fix improper cleanup in net_init_vhost_vdpa
> >>>>>>      vhost-net: fix improper cleanup in vhost_net_start
> >>>>>>      vhost-vdpa: backend feature should set only once
> >>>>>>
> >>>>>>     hw/net/vhost_net.c         |  4 +++-
> >>>>>>     hw/net/virtio-net.c        | 25 +++++++++++++++++++++----
> >>>>>>     hw/virtio/vhost-vdpa.c     |  2 +-
> >>>>>>     hw/virtio/virtio-bus.c     |  3 ++-
> >>>>>>     hw/virtio/virtio.c         | 21 +++++++++++++--------
> >>>>>>     include/hw/virtio/virtio.h |  2 ++
> >>>>>>     net/vhost-vdpa.c           |  4 +++-
> >>>>>>     7 files changed, 45 insertions(+), 16 deletions(-)
> >>>>>>
>