
[0/7] Enable vdpa net migration with features depending on CVQ

Message ID 20230728172028.2074052-1-eperezma@redhat.com (mailing list archive)

Message

Eugenio Perez Martin July 28, 2023, 5:20 p.m. UTC
At this moment the migration of net features that depend on CVQ is not
possible, as there is no reliable way to restore device state such as
the MAC address, the number of enabled queues, etc. on the destination.
This is mainly because the device must read and process all the CVQ
commands before resuming the dataplane, so only CVQ can be enabled at
first.

This series lifts that requirement by sending the
VHOST_VDPA_SET_VRING_ENABLE ioctl for the dataplane vqs only after the
device has processed all the commands.
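
The intended enable order can be sketched against the raw vhost-vdpa
UAPI (a minimal sketch: VHOST_VDPA_SET_VRING_ENABLE and struct
vhost_vring_state are the real interface, both helper functions are
hypothetical and for illustration only):

#include <sys/ioctl.h>
#include <linux/vhost.h>        /* VHOST_VDPA_SET_VRING_ENABLE */
#include <linux/vhost_types.h>  /* struct vhost_vring_state */

/* Hypothetical helper: enable or disable one vring of a vhost-vdpa fd. */
static int vdpa_vring_set_enable(int device_fd, unsigned int index,
                                 int enable)
{
    /* For this ioctl, .num carries the enable flag (1 = enabled). */
    struct vhost_vring_state state = {
        .index = index,
        .num = enable,
    };

    return ioctl(device_fd, VHOST_VDPA_SET_VRING_ENABLE, &state);
}

/* Hypothetical caller showing the order this series aims for. */
static int restore_then_enable_dataplane(int device_fd,
                                         unsigned int cvq_index,
                                         unsigned int num_data_vqs)
{
    /* 1. Enable only CVQ; DRIVER_OK follows (omitted here). */
    int r = vdpa_vring_set_enable(device_fd, cvq_index, 1);
    if (r < 0) {
        return r;
    }

    /* 2. ... inject and complete all state-restore CVQ commands ... */

    /* 3. Only now resume the dataplane. */
    for (unsigned int i = 0; i < num_data_vqs; i++) {
        r = vdpa_vring_set_enable(device_fd, i, 1);
        if (r < 0) {
            return r;
        }
    }
    return 0;
}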
---
From RFC:
* Enable vqs early in case CVQ cannot be shadowed.

Eugenio Pérez (7):
  vdpa: export vhost_vdpa_set_vring_ready
  vdpa: add should_enable op
  vdpa: use virtio_ops->should_enable at vhost_vdpa_set_vrings_ready
  vdpa: add stub vhost_vdpa_should_enable
  vdpa: delay enable of data vqs
  vdpa: enable cvq svq if data vq are shadowed
  vdpa: remove net cvq migration blocker

 include/hw/virtio/vhost-vdpa.h |  9 +++++
 hw/virtio/vhost-vdpa.c         | 33 ++++++++++++----
 net/vhost-vdpa.c               | 69 ++++++++++++++++++++++++++--------
 hw/virtio/trace-events         |  2 +-
 4 files changed, 89 insertions(+), 24 deletions(-)

Comments

Jason Wang July 31, 2023, 6:41 a.m. UTC | #1
On Sat, Jul 29, 2023 at 1:20 AM Eugenio Pérez <eperezma@redhat.com> wrote:
>
> At this moment the migration of net features that depend on CVQ is not
> possible, as there is no reliable way to restore device state such as
> the MAC address, the number of enabled queues, etc. on the destination.
> This is mainly because the device must read and process all the CVQ
> commands before resuming the dataplane, so only CVQ can be enabled at
> first.
>
> This series lifts that requirement by sending the
> VHOST_VDPA_SET_VRING_ENABLE ioctl for the dataplane vqs only after the
> device has processed all the commands.

I think it's better to explain (that is the part I don't understand)
why we cannot simply reorder vhost_net_start_one() in
vhost_net_start():

    for (i = 0; i < nvhosts; i++) {
        if (i < data_queue_pairs) {
            peer = qemu_get_peer(ncs, i);
        } else {
            peer = qemu_get_peer(ncs, n->max_queue_pairs);
        }

        if (peer->vring_enable) {
            /* restore vring enable state */
            r = vhost_set_vring_enable(peer, peer->vring_enable);

            if (r < 0) {
                goto err_start;
            }
        }

=>      r = vhost_net_start_one(get_vhost_net(peer), dev);
        if (r < 0) {
            goto err_start;
        }
    }

Can we simply start cvq first here?

Thanks

> ---
> From RFC:
> * Enable vqs early in case CVQ cannot be shadowed.
>
> Eugenio Pérez (7):
>   vdpa: export vhost_vdpa_set_vring_ready
>   vdpa: add should_enable op
>   vdpa: use virtio_ops->should_enable at vhost_vdpa_set_vrings_ready
>   vdpa: add stub vhost_vdpa_should_enable
>   vdpa: delay enable of data vqs
>   vdpa: enable cvq svq if data vq are shadowed
>   vdpa: remove net cvq migration blocker
>
>  include/hw/virtio/vhost-vdpa.h |  9 +++++
>  hw/virtio/vhost-vdpa.c         | 33 ++++++++++++----
>  net/vhost-vdpa.c               | 69 ++++++++++++++++++++++++++--------
>  hw/virtio/trace-events         |  2 +-
>  4 files changed, 89 insertions(+), 24 deletions(-)
>
> --
> 2.39.3
>
>
Eugenio Perez Martin July 31, 2023, 10:15 a.m. UTC | #2
On Mon, Jul 31, 2023 at 8:41 AM Jason Wang <jasowang@redhat.com> wrote:
>
> On Sat, Jul 29, 2023 at 1:20 AM Eugenio Pérez <eperezma@redhat.com> wrote:
> >
> > At this moment the migration of net features that depend on CVQ is not
> > possible, as there is no reliable way to restore device state such as
> > the MAC address, the number of enabled queues, etc. on the destination.
> > This is mainly because the device must read and process all the CVQ
> > commands before resuming the dataplane, so only CVQ can be enabled at
> > first.
> >
> > This series lifts that requirement by sending the
> > VHOST_VDPA_SET_VRING_ENABLE ioctl for the dataplane vqs only after the
> > device has processed all the commands.
>
> I think it's better to explain (that is the part I don't understand)
> why we cannot simply reorder vhost_net_start_one() in
> vhost_net_start():
>
>     for (i = 0; i < nvhosts; i++) {
>         if (i < data_queue_pairs) {
>             peer = qemu_get_peer(ncs, i);
>         } else {
>             peer = qemu_get_peer(ncs, n->max_queue_pairs);
>         }
>
>         if (peer->vring_enable) {
>             /* restore vring enable state */
>             r = vhost_set_vring_enable(peer, peer->vring_enable);
>
>             if (r < 0) {
>                 goto err_start;
>             }
>         }
>
> =>      r = vhost_net_start_one(get_vhost_net(peer), dev);
>         if (r < 0) {
>             goto err_start;
>         }
>     }
>
> Can we simply start cvq first here?
>

Well, the current order is:
* Set dev features (conditioned by
* Configure all vq addresses
* Configure all vq sizes
...
* Enable CVQ
* DRIVER_OK
* Enable the rest of the queues.

If we just start CVQ first, we need to modify vhost_vdpa_set_features
at a minimum, and a lot of code that depends on vdev->vq_index{,_end}
may be affected.

Also, I'm not sure whether all devices will support configuring the
address, vq size, etc. after DRIVER_OK.
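
For illustration, the "one-shot on the last device" pattern that makes
the reorder fragile is roughly this (a simplified sketch; vq_index,
nvqs and vq_index_end are real struct vhost_dev fields from
include/hw/virtio/vhost.h, the function itself is an illustrative
stand-in):

/*
 * With multiqueue, each queue pair is a separate struct vhost_dev
 * covering vqs [vq_index, vq_index + nvqs), and vq_index_end is the
 * total vq count of the virtio device.  One-shot steps such as
 * setting DRIVER_OK are keyed off the *last* vhost_dev, so starting
 * the CVQ vhost_dev first would trigger them before the data vqs
 * are configured.
 */
static bool is_last_vhost_dev(const struct vhost_dev *dev)
{
    return dev->vq_index + dev->nvqs == dev->vq_index_end;
}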

> Thanks
>
> > ---
> > From RFC:
> > * Enable vqs early in case CVQ cannot be shadowed.
> >
> > Eugenio Pérez (7):
> >   vdpa: export vhost_vdpa_set_vring_ready
> >   vdpa: add should_enable op
> >   vdpa: use virtio_ops->should_enable at vhost_vdpa_set_vrings_ready
> >   vdpa: add stub vhost_vdpa_should_enable
> >   vdpa: delay enable of data vqs
> >   vdpa: enable cvq svq if data vq are shadowed
> >   vdpa: remove net cvq migration blocker
> >
> >  include/hw/virtio/vhost-vdpa.h |  9 +++++
> >  hw/virtio/vhost-vdpa.c         | 33 ++++++++++++----
> >  net/vhost-vdpa.c               | 69 ++++++++++++++++++++++++++--------
> >  hw/virtio/trace-events         |  2 +-
> >  4 files changed, 89 insertions(+), 24 deletions(-)
> >
> > --
> > 2.39.3
> >
> >
>
Jason Wang Aug. 1, 2023, 3:48 a.m. UTC | #3
On Mon, Jul 31, 2023 at 6:15 PM Eugenio Perez Martin
<eperezma@redhat.com> wrote:
>
> On Mon, Jul 31, 2023 at 8:41 AM Jason Wang <jasowang@redhat.com> wrote:
> >
> > On Sat, Jul 29, 2023 at 1:20 AM Eugenio Pérez <eperezma@redhat.com> wrote:
> > >
> > > At this moment the migration of net features that depend on CVQ is not
> > > possible, as there is no reliable way to restore device state such as
> > > the MAC address, the number of enabled queues, etc. on the destination.
> > > This is mainly because the device must read and process all the CVQ
> > > commands before resuming the dataplane, so only CVQ can be enabled at
> > > first.
> > >
> > > This series lifts that requirement by sending the
> > > VHOST_VDPA_SET_VRING_ENABLE ioctl for the dataplane vqs only after the
> > > device has processed all the commands.
> >
> > I think it's better to explain (that is the part I don't understand)
> > why we cannot simply reorder vhost_net_start_one() in
> > vhost_net_start():
> >
> >     for (i = 0; i < nvhosts; i++) {
> >         if (i < data_queue_pairs) {
> >             peer = qemu_get_peer(ncs, i);
> >         } else {
> >             peer = qemu_get_peer(ncs, n->max_queue_pairs);
> >         }
> >
> >         if (peer->vring_enable) {
> >             /* restore vring enable state */
> >             r = vhost_set_vring_enable(peer, peer->vring_enable);
> >
> >             if (r < 0) {
> >                 goto err_start;
> >             }
> >         }
> >
> > =>      r = vhost_net_start_one(get_vhost_net(peer), dev);
> >         if (r < 0) {
> >             goto err_start;
> >         }
> >     }
> >
> > Can we simply start cvq first here?
> >
>
> Well, the current order is:
> * Set dev features (conditioned by
> * Configure all vq addresses
> * Configure all vq sizes
> ...
> * Enable CVQ
> * DRIVER_OK
> * Enable the rest of the queues.
>
> If we just start CVQ first, we need to modify vhost_vdpa_set_features
> at a minimum, and a lot of code that depends on vdev->vq_index{,_end}
> may be affected.
>
> Also, I'm not sure whether all devices will support configuring the
> address, vq size, etc. after DRIVER_OK.

Ok, so basically what I meant is that we should seek a way to refactor
vhost_net_start() instead of introducing new ops (e.g. introducing
virtio ops in vhost seems like a layer violation anyhow).

Can we simply factor VRING_ENABLE out, so that we can enable the vrings
in whatever order we want in vhost_net_start()?
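
Something like this (just a sketch reusing the names from the loop
quoted above, with the same error handling; whether every backend
tolerates being started before its vrings are enabled is exactly the
open question):

    /* Pass 1: start every vhost backend, data vqs and CVQ alike. */
    for (i = 0; i < nvhosts; i++) {
        if (i < data_queue_pairs) {
            peer = qemu_get_peer(ncs, i);
        } else {
            peer = qemu_get_peer(ncs, n->max_queue_pairs);
        }

        r = vhost_net_start_one(get_vhost_net(peer), dev);
        if (r < 0) {
            goto err_start;
        }
    }

    /*
     * Pass 2: restore the vring enable state in whatever order the
     * backend needs; iterating backwards puts CVQ first, so device
     * state can be restored before the dataplane resumes.
     */
    for (i = nvhosts; i-- > 0; ) {
        if (i < data_queue_pairs) {
            peer = qemu_get_peer(ncs, i);
        } else {
            peer = qemu_get_peer(ncs, n->max_queue_pairs);
        }

        if (peer->vring_enable) {
            r = vhost_set_vring_enable(peer, peer->vring_enable);
            if (r < 0) {
                goto err_start;
            }
        }
    }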

Thanks

>
> > Thanks
> >
> > > ---
> > > From RFC:
> > > * Enable vqs early in case CVQ cannot be shadowed.
> > >
> > > Eugenio Pérez (7):
> > >   vdpa: export vhost_vdpa_set_vring_ready
> > >   vdpa: add should_enable op
> > >   vdpa: use virtio_ops->should_enable at vhost_vdpa_set_vrings_ready
> > >   vdpa: add stub vhost_vdpa_should_enable
> > >   vdpa: delay enable of data vqs
> > >   vdpa: enable cvq svq if data vq are shadowed
> > >   vdpa: remove net cvq migration blocker
> > >
> > >  include/hw/virtio/vhost-vdpa.h |  9 +++++
> > >  hw/virtio/vhost-vdpa.c         | 33 ++++++++++++----
> > >  net/vhost-vdpa.c               | 69 ++++++++++++++++++++++++++--------
> > >  hw/virtio/trace-events         |  2 +-
> > >  4 files changed, 89 insertions(+), 24 deletions(-)
> > >
> > > --
> > > 2.39.3
> > >
> > >
> >
>