[v4,0/4] Implement vdpasim stop operation

Message ID 20220526124338.36247-1-eperezma@redhat.com (mailing list archive)
Series Implement vdpasim stop operation

Message

Eugenio Perez Martin May 26, 2022, 12:43 p.m. UTC
Implement stop operation for vdpa_sim devices, so vhost-vdpa will offer
that backend feature and userspace can effectively stop the device.

This is a must before getting the virtqueue indexes (base) for live migration,
since the device could modify them after userland gets them. There are
individual ways to perform that action for some devices
(VHOST_NET_SET_BACKEND, VHOST_VSOCK_SET_RUNNING, ...) but there was no
way to perform it for any vhost device (and, in particular, vhost-vdpa).

After the return of ioctl with stop != 0, the device MUST finish any
pending operations like in flight requests. It must also preserve all
the necessary state (the virtqueue vring base plus the possible device
specific states) that is required for restoring in the future. The
device must not change its configuration after that point.

After the return of ioctl with stop == 0, the device can continue
processing buffers as long as typical conditions are met (vq is enabled,
DRIVER_OK status bit is enabled, etc).
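
To illustrate the intended usage from userspace, here is a minimal, untested
sketch. VHOST_BACKEND_F_STOP is an assumed name for the backend feature bit
introduced in patch 2, and error handling is reduced to the bare minimum:

#include <stdint.h>
#include <sys/ioctl.h>
#include <linux/vhost.h>

/* Returns the stable vring base of @vq_index, or -1 on error. */
static int stop_and_read_base(int vdpa_fd, unsigned int vq_index)
{
	struct vhost_vring_state state = { .index = vq_index };
	uint64_t features;
	int stop = 1;

	if (ioctl(vdpa_fd, VHOST_GET_BACKEND_FEATURES, &features) ||
	    !(features & (1ULL << VHOST_BACKEND_F_STOP)))
		return -1;	/* the device cannot be stopped reliably */

	if (ioctl(vdpa_fd, VHOST_VDPA_STOP, &stop))
		return -1;	/* still running: the base could keep moving */

	/* From here on the device must not use the vqs anymore. */
	if (ioctl(vdpa_fd, VHOST_GET_VRING_BASE, &state))
		return -1;

	return state.num;	/* last avail index, safe to migrate */
}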

In the future, we will provide features similar to VHOST_USER_GET_INFLIGHT_FD
so the device can save pending operations.

Comments are welcome.

v4:
* Replace VHOST_STOP with VHOST_VDPA_STOP in the vhost ioctl switch case too.

v3:
* s/VHOST_STOP/VHOST_VDPA_STOP/
* Add documentation and requirements of the ioctl above its definition.

v2:
* Replace raw _F_STOP with BIT_ULL(_F_STOP).
* Fix obtaining of stop ioctl arg (it was not obtained but written).
* Add stop to vdpa_sim_blk.

Eugenio Pérez (4):
  vdpa: Add stop operation
  vhost-vdpa: introduce STOP backend feature bit
  vhost-vdpa: uAPI to stop the device
  vdpa_sim: Implement stop vdpa op

 drivers/vdpa/vdpa_sim/vdpa_sim.c     | 21 +++++++++++++++++
 drivers/vdpa/vdpa_sim/vdpa_sim.h     |  1 +
 drivers/vdpa/vdpa_sim/vdpa_sim_blk.c |  3 +++
 drivers/vdpa/vdpa_sim/vdpa_sim_net.c |  3 +++
 drivers/vhost/vdpa.c                 | 34 +++++++++++++++++++++++++++-
 include/linux/vdpa.h                 |  6 +++++
 include/uapi/linux/vhost.h           | 14 ++++++++++++
 include/uapi/linux/vhost_types.h     |  2 ++
 8 files changed, 83 insertions(+), 1 deletion(-)
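
As a rough idea of the shape of the new op (not the literal code in the
patches): in vdpa_sim it can boil down to latching a flag that the workers
check before touching the virtqueues. The "running" field and the exact
signature below are illustrative assumptions only:

/* Sketch only: a possible vdpa_sim implementation of the new stop op. */
static int vdpasim_stop(struct vdpa_device *vdpa, bool stop)
{
	struct vdpasim *vdpasim = vdpa_to_sim(vdpa);

	spin_lock(&vdpasim->lock);
	vdpasim->running = !stop;	/* assumed field, checked by the vq workers */
	spin_unlock(&vdpasim->lock);

	return 0;
}
/* ...wired up as .stop = vdpasim_stop in the simulator's vdpa_config_ops. */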

Comments

Parav Pandit May 26, 2022, 12:54 p.m. UTC | #1
> From: Eugenio Pérez <eperezma@redhat.com>
> Sent: Thursday, May 26, 2022 8:44 AM

> Implement stop operation for vdpa_sim devices, so vhost-vdpa will offer
> that backend feature and userspace can effectively stop the device.
>
> This is a must before get virtqueue indexes (base) for live migration,
> since the device could modify them after userland gets them. There are
> individual ways to perform that action for some devices
> (VHOST_NET_SET_BACKEND, VHOST_VSOCK_SET_RUNNING, ...) but there was no
> way to perform it for any vhost device (and, in particular, vhost-vdpa).
>
> After the return of ioctl with stop != 0, the device MUST finish any
> pending operations like in flight requests. It must also preserve all
> the necessary state (the virtqueue vring base plus the possible device
> specific states) that is required for restoring in the future. The
> device must not change its configuration after that point.
>
> After the return of ioctl with stop == 0, the device can continue
> processing buffers as long as typical conditions are met (vq is enabled,
> DRIVER_OK status bit is enabled, etc).

Just to be clear, we are adding vdpa level new ioctl() that doesn’t map to any mechanism in the virtio spec.

Why can't we use this ioctl() to indicate driver to start/stop the device instead of driving it through the driver_ok?
This is in the context of other discussion we had in the LM series.
Jason Wang May 27, 2022, 2:26 a.m. UTC | #2
On Thu, May 26, 2022 at 8:54 PM Parav Pandit <parav@nvidia.com> wrote:
>
>
>
> > From: Eugenio Pérez <eperezma@redhat.com>
> > Sent: Thursday, May 26, 2022 8:44 AM
>
> > Implement stop operation for vdpa_sim devices, so vhost-vdpa will offer
> > that backend feature and userspace can effectively stop the device.
> >
> > This is a must before get virtqueue indexes (base) for live migration,
> > since the device could modify them after userland gets them. There are
> > individual ways to perform that action for some devices
> > (VHOST_NET_SET_BACKEND, VHOST_VSOCK_SET_RUNNING, ...) but there was no
> > way to perform it for any vhost device (and, in particular, vhost-vdpa).
> >
> > After the return of ioctl with stop != 0, the device MUST finish any
> > pending operations like in flight requests. It must also preserve all
> > the necessary state (the virtqueue vring base plus the possible device
> > specific states) that is required for restoring in the future. The
> > device must not change its configuration after that point.
> >
> > After the return of ioctl with stop == 0, the device can continue
> > processing buffers as long as typical conditions are met (vq is enabled,
> > DRIVER_OK status bit is enabled, etc).
>
> Just to be clear, we are adding vdpa level new ioctl() that doesn’t map to any mechanism in the virtio spec.

We try to provide forward compatibility to VIRTIO_CONFIG_S_STOP. That
means it is expected to implement at least a subset of
VIRTIO_CONFIG_S_STOP.

>
> Why can't we use this ioctl() to indicate driver to start/stop the device instead of driving it through the driver_ok?

So the idea is to add a capability that does not exist in the spec. Then
came the stop/resume, which can't be done via DRIVER_OK. I think we
should only allow stop/resume to succeed after DRIVER_OK is set.

> This is in the context of other discussion we had in the LM series.

Do you see any issue that blocks the live migration?

Thanks
Eugenio Perez Martin May 27, 2022, 7:55 a.m. UTC | #3
On Fri, May 27, 2022 at 4:26 AM Jason Wang <jasowang@redhat.com> wrote:
>
> On Thu, May 26, 2022 at 8:54 PM Parav Pandit <parav@nvidia.com> wrote:
> >
> >
> >
> > > From: Eugenio Pérez <eperezma@redhat.com>
> > > Sent: Thursday, May 26, 2022 8:44 AM
> >
> > > Implement stop operation for vdpa_sim devices, so vhost-vdpa will offer
> > > that backend feature and userspace can effectively stop the device.
> > >
> > > This is a must before get virtqueue indexes (base) for live migration,
> > > since the device could modify them after userland gets them. There are
> > > individual ways to perform that action for some devices
> > > (VHOST_NET_SET_BACKEND, VHOST_VSOCK_SET_RUNNING, ...) but there was no
> > > way to perform it for any vhost device (and, in particular, vhost-vdpa).
> > >
> > > After the return of ioctl with stop != 0, the device MUST finish any
> > > pending operations like in flight requests. It must also preserve all
> > > the necessary state (the virtqueue vring base plus the possible device
> > > specific states) that is required for restoring in the future. The
> > > device must not change its configuration after that point.
> > >
> > > After the return of ioctl with stop == 0, the device can continue
> > > processing buffers as long as typical conditions are met (vq is enabled,
> > > DRIVER_OK status bit is enabled, etc).
> >
> > Just to be clear, we are adding vdpa level new ioctl() that doesn’t map to any mechanism in the virtio spec.
>
> We try to provide forward compatibility to VIRTIO_CONFIG_S_STOP. That
> means it is expected to implement at least a subset of
> VIRTIO_CONFIG_S_STOP.
>

Appending a link to the proposal, just for reference [1].

> >
> > Why can't we use this ioctl() to indicate driver to start/stop the device instead of driving it through the driver_ok?
>

Parav, I'm not sure I follow you here.

By the proposal, the resume of the device is (From qemu POV):
1. To configure all data vqs and cvq (addr, num, ...)
2. To enable only CVQ, not data vqs
3. To send DRIVER_OK
4. Wait for all buffers of CVQ to be used
5. To enable all others data vqs (individual ioctl at the moment)

Where can we fit the resume (as "stop(false)") here? If the device is
stopped (as if we send stop(true) before DRIVER_OK), we don't read CVQ
first. If we send it right after (or instead) DRIVER_OK, data buffers
can reach data vqs before configuring RSS.

> So the idea is to add capability that does not exist in the spec. Then
> came the stop/resume which can't be done via DRIVER_OK. I think we
> should only allow the stop/resume to succeed after DRIVER_OK is set.
>
> > This is in the context of other discussion we had in the LM series.
>
> Do you see any issue that blocks the live migration?
>
> Thanks
>
Michael S. Tsirkin May 27, 2022, 10:55 a.m. UTC | #4
On Thu, May 26, 2022 at 12:54:32PM +0000, Parav Pandit wrote:
> 
> 
> > From: Eugenio Pérez <eperezma@redhat.com>
> > Sent: Thursday, May 26, 2022 8:44 AM
> 
> > Implement stop operation for vdpa_sim devices, so vhost-vdpa will offer
> > that backend feature and userspace can effectively stop the device.
> >
> > This is a must before get virtqueue indexes (base) for live migration,
> > since the device could modify them after userland gets them. There are
> > individual ways to perform that action for some devices
> > (VHOST_NET_SET_BACKEND, VHOST_VSOCK_SET_RUNNING, ...) but there was no
> > way to perform it for any vhost device (and, in particular, vhost-vdpa).
> >
> > After the return of ioctl with stop != 0, the device MUST finish any
> > pending operations like in flight requests. It must also preserve all
> > the necessary state (the virtqueue vring base plus the possible device
> > specific states) that is required for restoring in the future. The
> > device must not change its configuration after that point.
> >
> > After the return of ioctl with stop == 0, the device can continue
> > processing buffers as long as typical conditions are met (vq is enabled,
> > DRIVER_OK status bit is enabled, etc).
> 
> Just to be clear, we are adding vdpa level new ioctl() that doesn’t map to any mechanism in the virtio spec.
> 
> Why can't we use this ioctl() to indicate driver to start/stop the device instead of driving it through the driver_ok?
> This is in the context of other discussion we had in the LM series.

If there's something in the spec that does this then let's use that.
Unfortunately the LM series seems to be stuck on moving
bits around with the admin virtqueue ...
Jason Wang May 30, 2022, 3:39 a.m. UTC | #5
On Fri, May 27, 2022 at 6:56 PM Michael S. Tsirkin <mst@redhat.com> wrote:
>
> On Thu, May 26, 2022 at 12:54:32PM +0000, Parav Pandit wrote:
> >
> >
> > > From: Eugenio Pérez <eperezma@redhat.com>
> > > Sent: Thursday, May 26, 2022 8:44 AM
> >
> > > Implement stop operation for vdpa_sim devices, so vhost-vdpa will offer
> > > that backend feature and userspace can effectively stop the device.
> > >
> > > This is a must before get virtqueue indexes (base) for live migration,
> > > since the device could modify them after userland gets them. There are
> > > individual ways to perform that action for some devices
> > > (VHOST_NET_SET_BACKEND, VHOST_VSOCK_SET_RUNNING, ...) but there was no
> > > way to perform it for any vhost device (and, in particular, vhost-vdpa).
> > >
> > > After the return of ioctl with stop != 0, the device MUST finish any
> > > pending operations like in flight requests. It must also preserve all
> > > the necessary state (the virtqueue vring base plus the possible device
> > > specific states) that is required for restoring in the future. The
> > > device must not change its configuration after that point.
> > >
> > > After the return of ioctl with stop == 0, the device can continue
> > > processing buffers as long as typical conditions are met (vq is enabled,
> > > DRIVER_OK status bit is enabled, etc).
> >
> > Just to be clear, we are adding vdpa level new ioctl() that doesn’t map to any mechanism in the virtio spec.
> >
> > Why can't we use this ioctl() to indicate driver to start/stop the device instead of driving it through the driver_ok?
> > This is in the context of other discussion we had in the LM series.
>
> If there's something in the spec that does this then let's use that.

Actually, we try to propose an independent feature here:

https://lists.oasis-open.org/archives/virtio-dev/202111/msg00020.html

Does it make sense to you?

Thanks

> Unfortunately the LM series seems to be stuck on moving
> bits around with the admin virtqueue ...
>
> --
> MST
>
Michael S. Tsirkin May 31, 2022, 5:40 a.m. UTC | #6
On Mon, May 30, 2022 at 11:39:21AM +0800, Jason Wang wrote:
> On Fri, May 27, 2022 at 6:56 PM Michael S. Tsirkin <mst@redhat.com> wrote:
> >
> > On Thu, May 26, 2022 at 12:54:32PM +0000, Parav Pandit wrote:
> > >
> > >
> > > > From: Eugenio Pérez <eperezma@redhat.com>
> > > > Sent: Thursday, May 26, 2022 8:44 AM
> > >
> > > > Implement stop operation for vdpa_sim devices, so vhost-vdpa will offer
> > > > that backend feature and userspace can effectively stop the device.
> > > >
> > > > This is a must before get virtqueue indexes (base) for live migration,
> > > > since the device could modify them after userland gets them. There are
> > > > individual ways to perform that action for some devices
> > > > (VHOST_NET_SET_BACKEND, VHOST_VSOCK_SET_RUNNING, ...) but there was no
> > > > way to perform it for any vhost device (and, in particular, vhost-vdpa).
> > > >
> > > > After the return of ioctl with stop != 0, the device MUST finish any
> > > > pending operations like in flight requests. It must also preserve all
> > > > the necessary state (the virtqueue vring base plus the possible device
> > > > specific states) that is required for restoring in the future. The
> > > > device must not change its configuration after that point.
> > > >
> > > > After the return of ioctl with stop == 0, the device can continue
> > > > processing buffers as long as typical conditions are met (vq is enabled,
> > > > DRIVER_OK status bit is enabled, etc).
> > >
> > > Just to be clear, we are adding vdpa level new ioctl() that doesn’t map to any mechanism in the virtio spec.
> > >
> > > Why can't we use this ioctl() to indicate driver to start/stop the device instead of driving it through the driver_ok?
> > > This is in the context of other discussion we had in the LM series.
> >
> > If there's something in the spec that does this then let's use that.
> 
> Actually, we try to propose a independent feature here:
> 
> https://lists.oasis-open.org/archives/virtio-dev/202111/msg00020.html
> 
> Does it make sense to you?
> 
> Thanks

But I thought the LM patches are trying to replace all that?


> > Unfortunately the LM series seems to be stuck on moving
> > bits around with the admin virtqueue ...
> >
> > --
> > MST
> >
Michael S. Tsirkin May 31, 2022, 5:42 a.m. UTC | #7
On Thu, May 26, 2022 at 02:43:34PM +0200, Eugenio Pérez wrote:
> Implement stop operation for vdpa_sim devices, so vhost-vdpa will offer
> that backend feature and userspace can effectively stop the device.
> 
> This is a must before get virtqueue indexes (base) for live migration,
> since the device could modify them after userland gets them. There are
> individual ways to perform that action for some devices
> (VHOST_NET_SET_BACKEND, VHOST_VSOCK_SET_RUNNING, ...) but there was no
> way to perform it for any vhost device (and, in particular, vhost-vdpa).
> 
> After the return of ioctl with stop != 0, the device MUST finish any
> pending operations like in flight requests. It must also preserve all
> the necessary state (the virtqueue vring base plus the possible device
> specific states) that is required for restoring in the future. The
> device must not change its configuration after that point.
> 
> After the return of ioctl with stop == 0, the device can continue
> processing buffers as long as typical conditions are met (vq is enabled,
> DRIVER_OK status bit is enabled, etc).
> 
> In the future, we will provide features similar to VHOST_USER_GET_INFLIGHT_FD
> so the device can save pending operations.
> 
> Comments are welcome.


So given this is just for simulator and affects UAPI I think it's fine
to make it wait for the next merge window, until there's a consensus.
Right?

> v4:
> * Replace VHOST_STOP to VHOST_VDPA_STOP in vhost ioctl switch case too.
> 
> v3:
> * s/VHOST_STOP/VHOST_VDPA_STOP/
> * Add documentation and requirements of the ioctl above its definition.
> 
> v2:
> * Replace raw _F_STOP with BIT_ULL(_F_STOP).
> * Fix obtaining of stop ioctl arg (it was not obtained but written).
> * Add stop to vdpa_sim_blk.
> 
> Eugenio Pérez (4):
>   vdpa: Add stop operation
>   vhost-vdpa: introduce STOP backend feature bit
>   vhost-vdpa: uAPI to stop the device
>   vdpa_sim: Implement stop vdpa op
> 
>  drivers/vdpa/vdpa_sim/vdpa_sim.c     | 21 +++++++++++++++++
>  drivers/vdpa/vdpa_sim/vdpa_sim.h     |  1 +
>  drivers/vdpa/vdpa_sim/vdpa_sim_blk.c |  3 +++
>  drivers/vdpa/vdpa_sim/vdpa_sim_net.c |  3 +++
>  drivers/vhost/vdpa.c                 | 34 +++++++++++++++++++++++++++-
>  include/linux/vdpa.h                 |  6 +++++
>  include/uapi/linux/vhost.h           | 14 ++++++++++++
>  include/uapi/linux/vhost_types.h     |  2 ++
>  8 files changed, 83 insertions(+), 1 deletion(-)
> 
> -- 
> 2.31.1
>
Jason Wang May 31, 2022, 6:44 a.m. UTC | #8
On Tue, May 31, 2022 at 1:40 PM Michael S. Tsirkin <mst@redhat.com> wrote:
>
> On Mon, May 30, 2022 at 11:39:21AM +0800, Jason Wang wrote:
> > On Fri, May 27, 2022 at 6:56 PM Michael S. Tsirkin <mst@redhat.com> wrote:
> > >
> > > On Thu, May 26, 2022 at 12:54:32PM +0000, Parav Pandit wrote:
> > > >
> > > >
> > > > > From: Eugenio Pérez <eperezma@redhat.com>
> > > > > Sent: Thursday, May 26, 2022 8:44 AM
> > > >
> > > > > Implement stop operation for vdpa_sim devices, so vhost-vdpa will offer
> > > > > that backend feature and userspace can effectively stop the device.
> > > > >
> > > > > This is a must before get virtqueue indexes (base) for live migration,
> > > > > since the device could modify them after userland gets them. There are
> > > > > individual ways to perform that action for some devices
> > > > > (VHOST_NET_SET_BACKEND, VHOST_VSOCK_SET_RUNNING, ...) but there was no
> > > > > way to perform it for any vhost device (and, in particular, vhost-vdpa).
> > > > >
> > > > > After the return of ioctl with stop != 0, the device MUST finish any
> > > > > pending operations like in flight requests. It must also preserve all
> > > > > the necessary state (the virtqueue vring base plus the possible device
> > > > > specific states) that is required for restoring in the future. The
> > > > > device must not change its configuration after that point.
> > > > >
> > > > > After the return of ioctl with stop == 0, the device can continue
> > > > > processing buffers as long as typical conditions are met (vq is enabled,
> > > > > DRIVER_OK status bit is enabled, etc).
> > > >
> > > > Just to be clear, we are adding vdpa level new ioctl() that doesn’t map to any mechanism in the virtio spec.
> > > >
> > > > Why can't we use this ioctl() to indicate driver to start/stop the device instead of driving it through the driver_ok?
> > > > This is in the context of other discussion we had in the LM series.
> > >
> > > If there's something in the spec that does this then let's use that.
> >
> > Actually, we try to propose a independent feature here:
> >
> > https://lists.oasis-open.org/archives/virtio-dev/202111/msg00020.html
> >
> > Does it make sense to you?
> >
> > Thanks
>
> But I thought the LM patches are trying to replace all that?

I'm not sure, and actually I think they are orthogonal. We need a new
state and the command to set the state could be transport specific or
a virtqueue.

As far as I know, most of the vendors have implemented this semantic.

Thanks

>
>
> > > Unfortunately the LM series seems to be stuck on moving
> > > bits around with the admin virtqueue ...
> > >
> > > --
> > > MST
> > >
>
Eugenio Perez Martin May 31, 2022, 7:13 a.m. UTC | #9
On Tue, May 31, 2022 at 7:42 AM Michael S. Tsirkin <mst@redhat.com> wrote:
>
> On Thu, May 26, 2022 at 02:43:34PM +0200, Eugenio Pérez wrote:
> > Implement stop operation for vdpa_sim devices, so vhost-vdpa will offer
> > that backend feature and userspace can effectively stop the device.
> >
> > This is a must before get virtqueue indexes (base) for live migration,
> > since the device could modify them after userland gets them. There are
> > individual ways to perform that action for some devices
> > (VHOST_NET_SET_BACKEND, VHOST_VSOCK_SET_RUNNING, ...) but there was no
> > way to perform it for any vhost device (and, in particular, vhost-vdpa).
> >
> > After the return of ioctl with stop != 0, the device MUST finish any
> > pending operations like in flight requests. It must also preserve all
> > the necessary state (the virtqueue vring base plus the possible device
> > specific states) that is required for restoring in the future. The
> > device must not change its configuration after that point.
> >
> > After the return of ioctl with stop == 0, the device can continue
> > processing buffers as long as typical conditions are met (vq is enabled,
> > DRIVER_OK status bit is enabled, etc).
> >
> > In the future, we will provide features similar to VHOST_USER_GET_INFLIGHT_FD
> > so the device can save pending operations.
> >
> > Comments are welcome.
>
>
> So given this is just for simulator and affects UAPI I think it's fine
> to make it wait for the next merge window, until there's a consensus.
> Right?
>

While the change is only implemented in the simulator at this moment,
it's just the very last missing piece in the kernel to implement
complete live migration for net devices with cvq :). All vendor
drivers can implement this call with the current code; just a little
bit of plumbing is needed. And it was accepted in previous meetings.

If it proves it works for every configuration (nested, etc.), the
implementation can forward the call to the admin vq, for example. At
the moment, it follows the proposed stop status bit semantic to stop
the device, whose PoC has been tested in these circumstances.

Thanks!

> > v4:
> > * Replace VHOST_STOP to VHOST_VDPA_STOP in vhost ioctl switch case too.
> >
> > v3:
> > * s/VHOST_STOP/VHOST_VDPA_STOP/
> > * Add documentation and requirements of the ioctl above its definition.
> >
> > v2:
> > * Replace raw _F_STOP with BIT_ULL(_F_STOP).
> > * Fix obtaining of stop ioctl arg (it was not obtained but written).
> > * Add stop to vdpa_sim_blk.
> >
> > Eugenio Pérez (4):
> >   vdpa: Add stop operation
> >   vhost-vdpa: introduce STOP backend feature bit
> >   vhost-vdpa: uAPI to stop the device
> >   vdpa_sim: Implement stop vdpa op
> >
> >  drivers/vdpa/vdpa_sim/vdpa_sim.c     | 21 +++++++++++++++++
> >  drivers/vdpa/vdpa_sim/vdpa_sim.h     |  1 +
> >  drivers/vdpa/vdpa_sim/vdpa_sim_blk.c |  3 +++
> >  drivers/vdpa/vdpa_sim/vdpa_sim_net.c |  3 +++
> >  drivers/vhost/vdpa.c                 | 34 +++++++++++++++++++++++++++-
> >  include/linux/vdpa.h                 |  6 +++++
> >  include/uapi/linux/vhost.h           | 14 ++++++++++++
> >  include/uapi/linux/vhost_types.h     |  2 ++
> >  8 files changed, 83 insertions(+), 1 deletion(-)
> >
> > --
> > 2.31.1
> >
>
Michael S. Tsirkin May 31, 2022, 9:23 a.m. UTC | #10
On Tue, May 31, 2022 at 09:13:38AM +0200, Eugenio Perez Martin wrote:
> On Tue, May 31, 2022 at 7:42 AM Michael S. Tsirkin <mst@redhat.com> wrote:
> >
> > On Thu, May 26, 2022 at 02:43:34PM +0200, Eugenio Pérez wrote:
> > > Implement stop operation for vdpa_sim devices, so vhost-vdpa will offer
> > > that backend feature and userspace can effectively stop the device.
> > >
> > > This is a must before get virtqueue indexes (base) for live migration,
> > > since the device could modify them after userland gets them. There are
> > > individual ways to perform that action for some devices
> > > (VHOST_NET_SET_BACKEND, VHOST_VSOCK_SET_RUNNING, ...) but there was no
> > > way to perform it for any vhost device (and, in particular, vhost-vdpa).
> > >
> > > After the return of ioctl with stop != 0, the device MUST finish any
> > > pending operations like in flight requests. It must also preserve all
> > > the necessary state (the virtqueue vring base plus the possible device
> > > specific states) that is required for restoring in the future. The
> > > device must not change its configuration after that point.
> > >
> > > After the return of ioctl with stop == 0, the device can continue
> > > processing buffers as long as typical conditions are met (vq is enabled,
> > > DRIVER_OK status bit is enabled, etc).
> > >
> > > In the future, we will provide features similar to VHOST_USER_GET_INFLIGHT_FD
> > > so the device can save pending operations.
> > >
> > > Comments are welcome.
> >
> >
> > So given this is just for simulator and affects UAPI I think it's fine
> > to make it wait for the next merge window, until there's a consensus.
> > Right?
> >
> 
> While the change is only implemented in the simulator at this moment,
> it's just the very last missing piece in the kernel to implement
> complete live migration for net devices with cvq :). All vendor
> drivers can implement this call with current code, just a little bit
> of plumbing is needed. And it was accepted in previous meetings.
> 
> If it proves it works for every configuration (nested, etc), the
> implementation can forward the call to the admin vq for example. At
> the moment, it follows the proposed stop status bit sematic to stop
> the device, which POC has been tested in these circumstances.
> 
> Thanks!

Oh absolutely, but I am guessing this plumbing won't
be ready for this merge window.

> > > v4:
> > > * Replace VHOST_STOP to VHOST_VDPA_STOP in vhost ioctl switch case too.
> > >
> > > v3:
> > > * s/VHOST_STOP/VHOST_VDPA_STOP/
> > > * Add documentation and requirements of the ioctl above its definition.
> > >
> > > v2:
> > > * Replace raw _F_STOP with BIT_ULL(_F_STOP).
> > > * Fix obtaining of stop ioctl arg (it was not obtained but written).
> > > * Add stop to vdpa_sim_blk.
> > >
> > > Eugenio Pérez (4):
> > >   vdpa: Add stop operation
> > >   vhost-vdpa: introduce STOP backend feature bit
> > >   vhost-vdpa: uAPI to stop the device
> > >   vdpa_sim: Implement stop vdpa op
> > >
> > >  drivers/vdpa/vdpa_sim/vdpa_sim.c     | 21 +++++++++++++++++
> > >  drivers/vdpa/vdpa_sim/vdpa_sim.h     |  1 +
> > >  drivers/vdpa/vdpa_sim/vdpa_sim_blk.c |  3 +++
> > >  drivers/vdpa/vdpa_sim/vdpa_sim_net.c |  3 +++
> > >  drivers/vhost/vdpa.c                 | 34 +++++++++++++++++++++++++++-
> > >  include/linux/vdpa.h                 |  6 +++++
> > >  include/uapi/linux/vhost.h           | 14 ++++++++++++
> > >  include/uapi/linux/vhost_types.h     |  2 ++
> > >  8 files changed, 83 insertions(+), 1 deletion(-)
> > >
> > > --
> > > 2.31.1
> > >
> >
Parav Pandit May 31, 2022, 8:19 p.m. UTC | #11
> From: Jason Wang <jasowang@redhat.com>
> Sent: Sunday, May 29, 2022 11:39 PM
> 
> On Fri, May 27, 2022 at 6:56 PM Michael S. Tsirkin <mst@redhat.com> wrote:
> >
> > On Thu, May 26, 2022 at 12:54:32PM +0000, Parav Pandit wrote:
> > >
> > >
> > > > From: Eugenio Pérez <eperezma@redhat.com>
> > > > Sent: Thursday, May 26, 2022 8:44 AM
> > >
> > > > Implement stop operation for vdpa_sim devices, so vhost-vdpa will offer
> > > > that backend feature and userspace can effectively stop the device.
> > > >
> > > > This is a must before get virtqueue indexes (base) for live migration,
> > > > since the device could modify them after userland gets them. There are
> > > > individual ways to perform that action for some devices
> > > > (VHOST_NET_SET_BACKEND, VHOST_VSOCK_SET_RUNNING, ...) but there was no
> > > > way to perform it for any vhost device (and, in particular, vhost-vdpa).
> > > >
> > > > After the return of ioctl with stop != 0, the device MUST finish any
> > > > pending operations like in flight requests. It must also preserve all
> > > > the necessary state (the virtqueue vring base plus the possible device
> > > > specific states) that is required for restoring in the future. The
> > > > device must not change its configuration after that point.
> > > >
> > > > After the return of ioctl with stop == 0, the device can continue
> > > > processing buffers as long as typical conditions are met (vq is enabled,
> > > > DRIVER_OK status bit is enabled, etc).
> > >
> > > Just to be clear, we are adding vdpa level new ioctl() that doesn’t map to any mechanism in the virtio spec.
> > >
> > > Why can't we use this ioctl() to indicate driver to start/stop the device instead of driving it through the driver_ok?
> > > This is in the context of other discussion we had in the LM series.
> >
> > If there's something in the spec that does this then let's use that.
> 
> Actually, we try to propose a independent feature here:
> 
> https://lists.oasis-open.org/archives/virtio-dev/202111/msg00020.html
> 
This will stop the device for all operations.
Once the device is stopped, its state cannot be queried further, as the device won't respond.
It has a limited use case.
What we need is to stop the non-admin-queue-related portion of the device.

> Does it make sense to you?
> 
> Thanks
> 
> > Unfortunately the LM series seems to be stuck on moving bits around
> > with the admin virtqueue ...
> >
> > --
> > MST
> >
Parav Pandit May 31, 2022, 8:26 p.m. UTC | #12
> From: Eugenio Perez Martin <eperezma@redhat.com>
> Sent: Friday, May 27, 2022 3:55 AM
> 
> On Fri, May 27, 2022 at 4:26 AM Jason Wang <jasowang@redhat.com> wrote:
> >
> > On Thu, May 26, 2022 at 8:54 PM Parav Pandit <parav@nvidia.com> wrote:
> > >
> > >
> > >
> > > > From: Eugenio Pérez <eperezma@redhat.com>
> > > > Sent: Thursday, May 26, 2022 8:44 AM
> > >
> > > > Implement stop operation for vdpa_sim devices, so vhost-vdpa will offer
> > > > that backend feature and userspace can effectively stop the device.
> > > >
> > > > This is a must before get virtqueue indexes (base) for live migration,
> > > > since the device could modify them after userland gets them. There are
> > > > individual ways to perform that action for some devices
> > > > (VHOST_NET_SET_BACKEND, VHOST_VSOCK_SET_RUNNING, ...) but there was no
> > > > way to perform it for any vhost device (and, in particular, vhost-vdpa).
> > > >
> > > > After the return of ioctl with stop != 0, the device MUST finish any
> > > > pending operations like in flight requests. It must also preserve all
> > > > the necessary state (the virtqueue vring base plus the possible device
> > > > specific states) that is required for restoring in the future. The
> > > > device must not change its configuration after that point.
> > > >
> > > > After the return of ioctl with stop == 0, the device can continue
> > > > processing buffers as long as typical conditions are met (vq is enabled,
> > > > DRIVER_OK status bit is enabled, etc).
> > >
> > > Just to be clear, we are adding vdpa level new ioctl() that doesn’t map to any mechanism in the virtio spec.
> >
> > We try to provide forward compatibility to VIRTIO_CONFIG_S_STOP. That
> > means it is expected to implement at least a subset of
> > VIRTIO_CONFIG_S_STOP.
> >
> 
> Appending a link to the proposal, just for reference [1].
> 
> > >
> > > Why can't we use this ioctl() to indicate driver to start/stop the device instead of driving it through the driver_ok?
> >
> 
> Parav, I'm not sure I follow you here.
> 
> By the proposal, the resume of the device is (From qemu POV):
> 1. To configure all data vqs and cvq (addr, num, ...)
> 2. To enable only CVQ, not data vqs
> 3. To send DRIVER_OK
> 4. Wait for all buffers of CVQ to be used
> 5. To enable all others data vqs (individual ioctl at the moment)
> 
> Where can we fit the resume (as "stop(false)") here? If the device is stopped
> (as if we send stop(true) before DRIVER_OK), we don't read CVQ first. If we
> send it right after (or instead) DRIVER_OK, data buffers can reach data vqs
> before configuring RSS.
> 
It doesn’t make sense with the currently proposed way of using the cvq to replay the config.
We need to continue with the currently proposed temporary method, which will subsequently be replaced with the optimized flow we discussed.
Jason Wang June 1, 2022, 2:42 a.m. UTC | #13
On Wed, Jun 1, 2022 at 4:19 AM Parav Pandit <parav@nvidia.com> wrote:
>
>
> > From: Jason Wang <jasowang@redhat.com>
> > Sent: Sunday, May 29, 2022 11:39 PM
> >
> > On Fri, May 27, 2022 at 6:56 PM Michael S. Tsirkin <mst@redhat.com> wrote:
> > >
> > > On Thu, May 26, 2022 at 12:54:32PM +0000, Parav Pandit wrote:
> > > >
> > > >
> > > > > From: Eugenio Pérez <eperezma@redhat.com>
> > > > > Sent: Thursday, May 26, 2022 8:44 AM
> > > >
> > > > > Implement stop operation for vdpa_sim devices, so vhost-vdpa will offer
> > > > > that backend feature and userspace can effectively stop the device.
> > > > >
> > > > > This is a must before get virtqueue indexes (base) for live migration,
> > > > > since the device could modify them after userland gets them. There are
> > > > > individual ways to perform that action for some devices
> > > > > (VHOST_NET_SET_BACKEND, VHOST_VSOCK_SET_RUNNING, ...) but there was no
> > > > > way to perform it for any vhost device (and, in particular, vhost-vdpa).
> > > > >
> > > > > After the return of ioctl with stop != 0, the device MUST finish any
> > > > > pending operations like in flight requests. It must also preserve all
> > > > > the necessary state (the virtqueue vring base plus the possible device
> > > > > specific states) that is required for restoring in the future. The
> > > > > device must not change its configuration after that point.
> > > > >
> > > > > After the return of ioctl with stop == 0, the device can continue
> > > > > processing buffers as long as typical conditions are met (vq is enabled,
> > > > > DRIVER_OK status bit is enabled, etc).
> > > >
> > > > Just to be clear, we are adding vdpa level new ioctl() that doesn’t map to any mechanism in the virtio spec.
> > > >
> > > > Why can't we use this ioctl() to indicate driver to start/stop the device instead of driving it through the driver_ok?
> > > > This is in the context of other discussion we had in the LM series.
> > >
> > > If there's something in the spec that does this then let's use that.
> >
> > Actually, we try to propose a independent feature here:
> >
> > https://lists.oasis-open.org/archives/virtio-dev/202111/msg00020.html
> >
> This will stop the device for all the operations.

Well, the ability to query the virtqueue state was proposed as another
feature (Eugenio, please correct me). This should be sufficient for
making virtio-net to be live migrated.

https://lists.oasis-open.org/archives/virtio-comment/202103/msg00008.html

> Once the device is stopped, its state cannot be queried further as device won't respond.
> It has limited use case.
> What we need is to stop non admin queue related portion of the device.

See above.

Thanks

>
> > Does it make sense to you?
> >
> > Thanks
> >
> > > Unfortunately the LM series seems to be stuck on moving bits around
> > > with the admin virtqueue ...
> > >
> > > --
> > > MST
> > >
>
Eugenio Perez Martin June 1, 2022, 9:49 a.m. UTC | #14
On Tue, May 31, 2022 at 10:19 PM Parav Pandit <parav@nvidia.com> wrote:
>
>
> > From: Jason Wang <jasowang@redhat.com>
> > Sent: Sunday, May 29, 2022 11:39 PM
> >
> > On Fri, May 27, 2022 at 6:56 PM Michael S. Tsirkin <mst@redhat.com> wrote:
> > >
> > > On Thu, May 26, 2022 at 12:54:32PM +0000, Parav Pandit wrote:
> > > >
> > > >
> > > > > From: Eugenio Pérez <eperezma@redhat.com>
> > > > > Sent: Thursday, May 26, 2022 8:44 AM
> > > >
> > > > > Implement stop operation for vdpa_sim devices, so vhost-vdpa will offer
> > > > > that backend feature and userspace can effectively stop the device.
> > > > >
> > > > > This is a must before get virtqueue indexes (base) for live migration,
> > > > > since the device could modify them after userland gets them. There are
> > > > > individual ways to perform that action for some devices
> > > > > (VHOST_NET_SET_BACKEND, VHOST_VSOCK_SET_RUNNING, ...) but there was no
> > > > > way to perform it for any vhost device (and, in particular, vhost-vdpa).
> > > > >
> > > > > After the return of ioctl with stop != 0, the device MUST finish any
> > > > > pending operations like in flight requests. It must also preserve all
> > > > > the necessary state (the virtqueue vring base plus the possible device
> > > > > specific states) that is required for restoring in the future. The
> > > > > device must not change its configuration after that point.
> > > > >
> > > > > After the return of ioctl with stop == 0, the device can continue
> > > > > processing buffers as long as typical conditions are met (vq is enabled,
> > > > > DRIVER_OK status bit is enabled, etc).
> > > >
> > > > Just to be clear, we are adding vdpa level new ioctl() that doesn’t map to any mechanism in the virtio spec.
> > > >
> > > > Why can't we use this ioctl() to indicate driver to start/stop the device instead of driving it through the driver_ok?
> > > > This is in the context of other discussion we had in the LM series.
> > >
> > > If there's something in the spec that does this then let's use that.
> >
> > Actually, we try to propose a independent feature here:
> >
> > https://lists.oasis-open.org/archives/virtio-dev/202111/msg00020.html
> >
> This will stop the device for all the operations.
> Once the device is stopped, its state cannot be queried further as device won't respond.
> It has limited use case.
> What we need is to stop non admin queue related portion of the device.
>

Still don't follow this, sorry.

Adding the admin vq to the mix, this would stop a device of a device
group, but not the whole virtqueue group. If the admin VQ is offered
by the PF (since it's not exposed to the guest), it will continue
accepting requests as normal. If it's exposed in the VF, I think the
best bet is to shadow it, since guest and host requests could
conflict.

Since this is offered through vdpa, the device backend driver can
route it to whatever method works better for the hardware. For
example, to send an admin vq command to the PF. That's why it's
important to keep the feature as self-contained and orthogonal to
others as possible.
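
For example, something like this in a vendor driver (all names below are made
up for illustration, no real vendor API implied):

static int my_vdpa_stop(struct vdpa_device *vdev, bool stop)
{
	struct my_vdpa *mvdev = to_my_vdpa(vdev);	/* hypothetical driver type */

	/* Route the request to whatever mechanism the hardware offers,
	 * e.g. an admin virtqueue command issued through the PF. */
	return my_hw_send_admin_cmd(mvdev, MY_HW_CMD_STOP, stop);
}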

> > Does it make sense to you?
> >
> > Thanks
> >
> > > Unfortunately the LM series seems to be stuck on moving bits around
> > > with the admin virtqueue ...
> > >
> > > --
> > > MST
> > >
>
Eugenio Perez Martin June 1, 2022, 10:48 a.m. UTC | #15
On Tue, May 31, 2022 at 10:26 PM Parav Pandit <parav@nvidia.com> wrote:
>
>
>
> > From: Eugenio Perez Martin <eperezma@redhat.com>
> > Sent: Friday, May 27, 2022 3:55 AM
> >
> > On Fri, May 27, 2022 at 4:26 AM Jason Wang <jasowang@redhat.com> wrote:
> > >
> > > On Thu, May 26, 2022 at 8:54 PM Parav Pandit <parav@nvidia.com> wrote:
> > > >
> > > >
> > > >
> > > > > From: Eugenio Pérez <eperezma@redhat.com>
> > > > > Sent: Thursday, May 26, 2022 8:44 AM
> > > >
> > > > > Implement stop operation for vdpa_sim devices, so vhost-vdpa will offer
> > > > > that backend feature and userspace can effectively stop the device.
> > > > >
> > > > > This is a must before get virtqueue indexes (base) for live migration,
> > > > > since the device could modify them after userland gets them. There are
> > > > > individual ways to perform that action for some devices
> > > > > (VHOST_NET_SET_BACKEND, VHOST_VSOCK_SET_RUNNING, ...) but there was no
> > > > > way to perform it for any vhost device (and, in particular, vhost-vdpa).
> > > > >
> > > > > After the return of ioctl with stop != 0, the device MUST finish any
> > > > > pending operations like in flight requests. It must also preserve all
> > > > > the necessary state (the virtqueue vring base plus the possible device
> > > > > specific states) that is required for restoring in the future. The
> > > > > device must not change its configuration after that point.
> > > > >
> > > > > After the return of ioctl with stop == 0, the device can continue
> > > > > processing buffers as long as typical conditions are met (vq is enabled,
> > > > > DRIVER_OK status bit is enabled, etc).
> > > >
> > > > Just to be clear, we are adding vdpa level new ioctl() that doesn’t map to any mechanism in the virtio spec.
> > >
> > > We try to provide forward compatibility to VIRTIO_CONFIG_S_STOP. That
> > > means it is expected to implement at least a subset of
> > > VIRTIO_CONFIG_S_STOP.
> > >
> >
> > Appending a link to the proposal, just for reference [1].
> >
> > > >
> > > > Why can't we use this ioctl() to indicate driver to start/stop the device instead of driving it through the driver_ok?
> > >
> >
> > Parav, I'm not sure I follow you here.
> >
> > By the proposal, the resume of the device is (From qemu POV):
> > 1. To configure all data vqs and cvq (addr, num, ...)
> > 2. To enable only CVQ, not data vqs
> > 3. To send DRIVER_OK
> > 4. Wait for all buffers of CVQ to be used
> > 5. To enable all others data vqs (individual ioctl at the moment)
> >
> > Where can we fit the resume (as "stop(false)") here? If the device is stopped
> > (as if we send stop(true) before DRIVER_OK), we don't read CVQ first. If we
> > send it right after (or instead) DRIVER_OK, data buffers can reach data vqs
> > before configuring RSS.
> >
> It doesn’t make sense with currently proposed way of using cvq to replay the config.

The stop/resume part is not intended to restore the config through the
CVQ. The stop call is issued to be able to retrieve the vq status
(base, in vhost terminology). The symmetric operation (resume) was
added on demand; it was never intended to be part of either the
config restore or the virtqueue state restore workflow.

The configuration restore workflow was modelled after the device
initialization, so that each part needed to add as little as possible
and only qemu needed to be changed. From the device POV, there is no
need to learn new tricks for this. The support for .set_vq_ready and
.get_vq_ready is already in the kernel in every vdpa backend driver.

> Need to continue with currently proposed temporary method that subsequently to be replaced with optimized flow as we discussed.

Back then, you noted that enabling each data vq individually
after DRIVER_OK is slow on mlx5 devices. The solution was to batch
these enable calls in the kernel, so the vdpa uAPI layer did not need
to grow. That proposed solution did not involve the resume
operation.

After that, you proposed in this thread "Why can't we use this ioctl()
to indicate driver to start/stop the device instead of driving it
through the driver_ok?". As I understand, that is a mistake, since it
requires the device, the vdpa layer, etc... to learn new tricks. It
requires qemu to duplicate the initialization layer (it's now common
for start and restore config). But I might have not seen the whole
picture, missing advantages of using the resume call for this
workflow. Can you describe the workflow you have in mind? How does
that new workflow affect this proposal?

I'm ok to change the proposal as long as we find we obtain a net gain.

Thanks!
Parav Pandit June 1, 2022, 6:58 p.m. UTC | #16
> From: Jason Wang <jasowang@redhat.com>
> Sent: Tuesday, May 31, 2022 10:42 PM
> 
> Well, the ability to query the virtqueue state was proposed as another
> feature (Eugenio, please correct me). This should be sufficient for making
> virtio-net to be live migrated.
> 
The device is stopped; it won't answer to this special vq config done here.
Programming all of these using cfg registers doesn't scale for on-chip memory or for speed.

Next would be to program hundreds of statistics of the 64 VQs through a giant PCI config space register in some busy-polling scheme.

I can clearly see how all of this is inefficient for faster LM.
At a minimum, we need an efficient AQ to proceed.

> https://lists.oasis-open.org/archives/virtio-comment/202103/msg00008.html
> 
> > Once the device is stopped, its state cannot be queried further as device won't respond.
> > It has limited use case.
> > What we need is to stop non admin queue related portion of the device.
Parav Pandit June 1, 2022, 7:30 p.m. UTC | #17
> From: Eugenio Perez Martin <eperezma@redhat.com>
> Sent: Wednesday, June 1, 2022 5:50 AM
> 
> On Tue, May 31, 2022 at 10:19 PM Parav Pandit <parav@nvidia.com> wrote:
> >
> >
> > > From: Jason Wang <jasowang@redhat.com>
> > > Sent: Sunday, May 29, 2022 11:39 PM
> > >
> > > On Fri, May 27, 2022 at 6:56 PM Michael S. Tsirkin <mst@redhat.com>
> wrote:
> > > >
> > > > On Thu, May 26, 2022 at 12:54:32PM +0000, Parav Pandit wrote:
> > > > >
> > > > >
> > > > > > From: Eugenio Pérez <eperezma@redhat.com>
> > > > > > Sent: Thursday, May 26, 2022 8:44 AM
> > > > >
> > > > > > Implement stop operation for vdpa_sim devices, so vhost-vdpa will offer
> > > > > > that backend feature and userspace can effectively stop the device.
> > > > > >
> > > > > > This is a must before get virtqueue indexes (base) for live migration,
> > > > > > since the device could modify them after userland gets them. There are
> > > > > > individual ways to perform that action for some devices
> > > > > > (VHOST_NET_SET_BACKEND, VHOST_VSOCK_SET_RUNNING, ...) but there was no
> > > > > > way to perform it for any vhost device (and, in particular, vhost-vdpa).
> > > > > >
> > > > > > After the return of ioctl with stop != 0, the device MUST finish any
> > > > > > pending operations like in flight requests. It must also preserve all
> > > > > > the necessary state (the virtqueue vring base plus the possible device
> > > > > > specific states) that is required for restoring in the future. The
> > > > > > device must not change its configuration after that point.
> > > > > >
> > > > > > After the return of ioctl with stop == 0, the device can continue
> > > > > > processing buffers as long as typical conditions are met (vq is enabled,
> > > > > > DRIVER_OK status bit is enabled, etc).
> > > > >
> > > > > Just to be clear, we are adding vdpa level new ioctl() that doesn’t map to any mechanism in the virtio spec.
> > > > >
> > > > > Why can't we use this ioctl() to indicate driver to start/stop the device instead of driving it through the driver_ok?
> > > > > This is in the context of other discussion we had in the LM series.
> > > >
> > > > If there's something in the spec that does this then let's use that.
> > >
> > > Actually, we try to propose a independent feature here:
> > >
> > > https://lists.oasis-open.org/archives/virtio-dev/202111/msg00020.html
> > >
> > This will stop the device for all the operations.
> > Once the device is stopped, its state cannot be queried further as device won't respond.
> > It has limited use case.
> > What we need is to stop non admin queue related portion of the device.
> >
> 
> Still don't follow this, sorry.
Once a device is stopped, its state etc. cannot be queried.
If you want to stop the device and still allow certain operations, a better spec definition is needed that says:

stop A, B, C, but allow D, E, F, G.
A = stop CVQs and save its state somewhere
B = stop data VQs and save it state somewhere
C = stop generic config interrupt

D = query state of multiple VQs
E = query device statistics and other elements/objects in future
F = setup/config/restore certain fields
G = resume the device

> 
> Adding the admin vq to the mix, this would stop a device of a device group,
> but not the whole virtqueue group. If the admin VQ is offered by the PF
> (since it's not exposed to the guest), it will continue accepting requests as
> normal. If it's exposed in the VF, I think the best bet is to shadow it, since
> guest and host requests could conflict.
> 
> Since this is offered through vdpa, the device backend driver can route it to
> whatever method works better for the hardware. For example, to send an
> admin vq command to the PF. That's why it's important to keep the feature
> as self-contained and orthogonal to others as possible.
> 

I replied in other thread to continue there.
Jason Wang June 2, 2022, 2 a.m. UTC | #18
On Thu, Jun 2, 2022 at 2:58 AM Parav Pandit <parav@nvidia.com> wrote:
>
>
> > From: Jason Wang <jasowang@redhat.com>
> > Sent: Tuesday, May 31, 2022 10:42 PM
> >
> > Well, the ability to query the virtqueue state was proposed as another
> > feature (Eugenio, please correct me). This should be sufficient for making
> > virtio-net to be live migrated.
> >
> The device is stopped, it won't answer to this special vq config done here.

This depends on the definition of the stop. Any query to the device
state should be allowed otherwise it's meaningless for us.

> Programming all of these using cfg registers doesn't scale for on-chip memory and for the speed.

Well, they are orthogonal, and what I want to say is that we should first
define the semantics of stop and the state of the virtqueue.

Such a facility could be accessed by either a transport-specific method
or an admin virtqueue; it totally depends on the hardware architecture of
the vendor.

>
> Next would be to program hundreds of statistics of the 64 VQs through a giant PCI config space register in some busy polling scheme.

We don't need a giant config space, and this method has been implemented
by some vDPA vendors.

>
> I can clearly see how all these are inefficient for faster LM.
> We need an efficient AQ to proceed with at minimum.

I'm fine with admin virtqueue, but the stop and state are orthogonal
to that. And using admin virtqueue for stop/state will be more natural
if we use admin virtqueue as a transport.

Thanks

>
> > https://lists.oasis-open.org/archives/virtio-comment/202103/msg00008.html
> >
> > > Once the device is stopped, its state cannot be queried further as device won't respond.
> > > It has limited use case.
> > > What we need is to stop non admin queue related portion of the device.
Jason Wang June 2, 2022, 2:02 a.m. UTC | #19
On Thu, Jun 2, 2022 at 3:30 AM Parav Pandit <parav@nvidia.com> wrote:
>
>
>
> > From: Eugenio Perez Martin <eperezma@redhat.com>
> > Sent: Wednesday, June 1, 2022 5:50 AM
> >
> > On Tue, May 31, 2022 at 10:19 PM Parav Pandit <parav@nvidia.com> wrote:
> > >
> > >
> > > > From: Jason Wang <jasowang@redhat.com>
> > > > Sent: Sunday, May 29, 2022 11:39 PM
> > > >
> > > > On Fri, May 27, 2022 at 6:56 PM Michael S. Tsirkin <mst@redhat.com>
> > wrote:
> > > > >
> > > > > On Thu, May 26, 2022 at 12:54:32PM +0000, Parav Pandit wrote:
> > > > > >
> > > > > >
> > > > > > > From: Eugenio Pérez <eperezma@redhat.com>
> > > > > > > Sent: Thursday, May 26, 2022 8:44 AM
> > > > > >
> > > > > > > Implement stop operation for vdpa_sim devices, so vhost-vdpa
> > > > > > > will offer
> > > > > > >
> > > > > > > that backend feature and userspace can effectively stop the device.
> > > > > > >
> > > > > > >
> > > > > > >
> > > > > > > This is a must before get virtqueue indexes (base) for live
> > > > > > > migration,
> > > > > > >
> > > > > > > since the device could modify them after userland gets them.
> > > > > > > There are
> > > > > > >
> > > > > > > individual ways to perform that action for some devices
> > > > > > >
> > > > > > > (VHOST_NET_SET_BACKEND, VHOST_VSOCK_SET_RUNNING, ...)
> > but
> > > > there
> > > > > > > was no
> > > > > > >
> > > > > > > way to perform it for any vhost device (and, in particular, vhost-
> > vdpa).
> > > > > > >
> > > > > > >
> > > > > > >
> > > > > > > After the return of ioctl with stop != 0, the device MUST
> > > > > > > finish any
> > > > > > >
> > > > > > > pending operations like in flight requests. It must also
> > > > > > > preserve all
> > > > > > >
> > > > > > > the necessary state (the virtqueue vring base plus the
> > > > > > > possible device
> > > > > > >
> > > > > > > specific states) that is required for restoring in the future.
> > > > > > > The
> > > > > > >
> > > > > > > device must not change its configuration after that point.
> > > > > > >
> > > > > > >
> > > > > > >
> > > > > > > After the return of ioctl with stop == 0, the device can
> > > > > > > continue
> > > > > > >
> > > > > > > processing buffers as long as typical conditions are met (vq
> > > > > > > is enabled,
> > > > > > >
> > > > > > > DRIVER_OK status bit is enabled, etc).
> > > > > >
> > > > > > Just to be clear, we are adding vdpa level new ioctl() that doesn’t map to any mechanism in the virtio spec.
> > > > > >
> > > > > > Why can't we use this ioctl() to indicate driver to start/stop the device instead of driving it through the driver_ok?
> > > > > > This is in the context of other discussion we had in the LM series.
> > > > >
> > > > > If there's something in the spec that does this then let's use that.
> > > >
> > > > Actually, we try to propose a independent feature here:
> > > >
> > > > https://lists.oasis-open.org/archives/virtio-dev/202111/msg00020.html
> > > >
> > > This will stop the device for all the operations.
> > > Once the device is stopped, its state cannot be queried further as device won't respond.
> > > It has limited use case.
> > > What we need is to stop non admin queue related portion of the device.
> > >
> >
> > Still don't follow this, sorry.
> Once a device it stopped its state etc cannot be queried.

This is not what is proposed here.

> if you want to stop and still allow certain operations, a better spec definition is needed that says,
>
> stop A, B, C, but allow D, E, F, G.
> A = stop CVQs and save its state somewhere
> B = stop data VQs and save it state somewhere
> C = stop generic config interrupt

Actually, it's the stop of the config space change.
And what's more, any guest-visible state must not be changed.

>
> D = query state of multiple VQs
> E = query device statistics and other elements/objects in future

This is the device state I believe.

> F = setup/config/restore certain fields

This is the reverse of D and E, that is setting the state.

> G = resume the device
>

Thanks

> >
> > Adding the admin vq to the mix, this would stop a device of a device group,
> > but not the whole virtqueue group. If the admin VQ is offered by the PF
> > (since it's not exposed to the guest), it will continue accepting requests as
> > normal. If it's exposed in the VF, I think the best bet is to shadow it, since
> > guest and host requests could conflict.
> >
> > Since this is offered through vdpa, the device backend driver can route it to
> > whatever method works better for the hardware. For example, to send an
> > admin vq command to the PF. That's why it's important to keep the feature
> > as self-contained and orthogonal to others as possible.
> >
>
> I replied in other thread to continue there.
Jason Wang June 2, 2022, 2:08 a.m. UTC | #20
On 2022/5/26 20:43, Eugenio Pérez wrote:
> Implement stop operation for vdpa_sim devices, so vhost-vdpa will offer
> that backend feature and userspace can effectively stop the device.
>
> This is a must before get virtqueue indexes (base) for live migration,
> since the device could modify them after userland gets them. There are
> individual ways to perform that action for some devices
> (VHOST_NET_SET_BACKEND, VHOST_VSOCK_SET_RUNNING, ...) but there was no
> way to perform it for any vhost device (and, in particular, vhost-vdpa).
>
> After the return of ioctl with stop != 0, the device MUST finish any
> pending operations like in flight requests. It must also preserve all
> the necessary state (the virtqueue vring base plus the possible device
> specific states) that is required for restoring in the future. The
> device must not change its configuration after that point.


I think we probably need a more accurate definition of the state, as
Parav suggested.

Besides this, we should also clarify when stop is allowed. E.g.,
should we allow setting stop without DRIVER_OK?
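
To make the question concrete, something like the sketch below is one
possible policy (illustration only; the ->stop() op name and its
signature here are assumptions, not taken from the patches):

/*
 * Illustrative sketch: only honor stop != 0 once the driver has set
 * DRIVER_OK. The ->stop() op and its signature are assumed here.
 */
static long vhost_vdpa_stop_sketch(struct vhost_vdpa *v, bool stop)
{
	struct vdpa_device *vdpa = v->vdpa;
	const struct vdpa_config_ops *ops = vdpa->config;

	if (!ops->stop)
		return -EOPNOTSUPP;

	/* One possible rule: reject stop before DRIVER_OK. */
	if (stop && !(ops->get_status(vdpa) & VIRTIO_CONFIG_S_DRIVER_OK))
		return -EINVAL;

	ops->stop(vdpa, stop);
	return 0;
}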

Thanks


>
> After the return of ioctl with stop == 0, the device can continue
> processing buffers as long as typical conditions are met (vq is enabled,
> DRIVER_OK status bit is enabled, etc).
>
> In the future, we will provide features similar to VHOST_USER_GET_INFLIGHT_FD
> so the device can save pending operations.
>
> Comments are welcome.
>
> v4:
> * Replace VHOST_STOP to VHOST_VDPA_STOP in vhost ioctl switch case too.
>
> v3:
> * s/VHOST_STOP/VHOST_VDPA_STOP/
> * Add documentation and requirements of the ioctl above its definition.
>
> v2:
> * Replace raw _F_STOP with BIT_ULL(_F_STOP).
> * Fix obtaining of stop ioctl arg (it was not obtained but written).
> * Add stop to vdpa_sim_blk.
>
> Eugenio Pérez (4):
>    vdpa: Add stop operation
>    vhost-vdpa: introduce STOP backend feature bit
>    vhost-vdpa: uAPI to stop the device
>    vdpa_sim: Implement stop vdpa op
>
>   drivers/vdpa/vdpa_sim/vdpa_sim.c     | 21 +++++++++++++++++
>   drivers/vdpa/vdpa_sim/vdpa_sim.h     |  1 +
>   drivers/vdpa/vdpa_sim/vdpa_sim_blk.c |  3 +++
>   drivers/vdpa/vdpa_sim/vdpa_sim_net.c |  3 +++
>   drivers/vhost/vdpa.c                 | 34 +++++++++++++++++++++++++++-
>   include/linux/vdpa.h                 |  6 +++++
>   include/uapi/linux/vhost.h           | 14 ++++++++++++
>   include/uapi/linux/vhost_types.h     |  2 ++
>   8 files changed, 83 insertions(+), 1 deletion(-)
>
Parav Pandit June 2, 2022, 2:59 a.m. UTC | #21
> From: Jason Wang <jasowang@redhat.com>
> Sent: Wednesday, June 1, 2022 10:00 PM
> 
> On Thu, Jun 2, 2022 at 2:58 AM Parav Pandit <parav@nvidia.com> wrote:
> >
> >
> > > From: Jason Wang <jasowang@redhat.com>
> > > Sent: Tuesday, May 31, 2022 10:42 PM
> > >
> > > Well, the ability to query the virtqueue state was proposed as
> > > another feature (Eugenio, please correct me). This should be
> > > sufficient for making virtio-net to be live migrated.
> > >
> > The device is stopped, it won't answer to this special vq config done here.
> 
> This depends on the definition of the stop. Any query to the device state
> should be allowed otherwise it's meaningless for us.
> 
> > Programming all of these using cfg registers doesn't scale for on-chip
> memory and for the speed.
> 
> Well, they are orthogonal and what I want to say is, we should first define
> the semantics of stop and state of the virtqueue.
> 
> Such a facility could be accessed by either transport specific method or admin
> virtqueue, it totally depends on the hardware architecture of the vendor.
> 
I find it hard to believe that a vendor can implement a CVQ but not an AQ, and would choose to expose hundreds of registers instead.
But maybe it fits some specific hw.

I'd like to learn the advantages of such a method other than simplicity.

We can clearly see that we are shifting away from such PCI registers with SIOV, IMS and other scalable solutions.
virtio is drifting in the reverse direction by introducing more registers as a transport.
I expect it to be an optional transport like the AQ.

> >
> > Next would be to program hundreds of statistics of the 64 VQs through a
> giant PCI config space register in some busy polling scheme.
> 
> We don't need giant config space, and this method has been implemented
> by some vDPA vendors.
> 
There are tens of 64-bit counters per VQ. These need to be programmed on the destination side.
Programming these via registers requires exposing them on the registers.
In one of the proposals, I see them being queried via CVQ from the device.

Programming them via cfg registers requires a large cfg space, or synchronous programming until receiving an ACK from it.
This means one entry at a time...
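
(As a rough, illustrative estimate only: assuming ~10 counters per VQ and 64 VQs, that is already on the order of 640 synchronous 64-bit register accesses per migrated device just for the statistics, before touching any vring state.)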

Programming them via CVQ needs replicating and aligning cmd values, etc., on all device types. It's all duplicated and hard to maintain.


> >
> > I can clearly see how all these are inefficient for faster LM.
> > We need an efficient AQ to proceed with at minimum.
> 
> I'm fine with admin virtqueue, but the stop and state are orthogonal to that.
> And using admin virtqueue for stop/state will be more natural if we use
> admin virtqueue as a transport.
Ok.
We should have defined it a bit earlier so that all vendors can use it. :(
Jason Wang June 2, 2022, 3:53 a.m. UTC | #22
On Thu, Jun 2, 2022 at 10:59 AM Parav Pandit <parav@nvidia.com> wrote:
>
>
> > From: Jason Wang <jasowang@redhat.com>
> > Sent: Wednesday, June 1, 2022 10:00 PM
> >
> > On Thu, Jun 2, 2022 at 2:58 AM Parav Pandit <parav@nvidia.com> wrote:
> > >
> > >
> > > > From: Jason Wang <jasowang@redhat.com>
> > > > Sent: Tuesday, May 31, 2022 10:42 PM
> > > >
> > > > Well, the ability to query the virtqueue state was proposed as
> > > > another feature (Eugenio, please correct me). This should be
> > > > sufficient for making virtio-net to be live migrated.
> > > >
> > > The device is stopped, it won't answer to this special vq config done here.
> >
> > This depends on the definition of the stop. Any query to the device state
> > should be allowed otherwise it's meaningless for us.
> >
> > > Programming all of these using cfg registers doesn't scale for on-chip
> > memory and for the speed.
> >
> > Well, they are orthogonal and what I want to say is, we should first define
> > the semantics of stop and state of the virtqueue.
> >
> > Such a facility could be accessed by either transport specific method or admin
> > virtqueue, it totally depends on the hardware architecture of the vendor.
> >
> I find it hard to believe that a vendor can implement a CVQ but not AQ and chose to expose tens of hundreds of registers.
> But maybe, it fits some specific hw.

You can have a look at the ifcvf dpdk driver as an example.

But another thing that is unrelated to hardware architecture is the
nesting support. Having an admin virtqueue in a nesting environment
looks like overkill. Presenting a register in L1 and mapping it to
L0's admin should be good enough.

>
> I like to learn the advantages of such method other than simplicity.
>
> We can clearly that we are shifting away from such PCI registers with SIOV, IMS and other scalable solutions.
> virtio drifting in reverse direction by introducing more registers as transport.
> I expect it to an optional transport like AQ.

Actually, I had a proposal of using the admin virtqueue as a
transport; it's designed to be SIOV/IMS capable. And it's not hard to
extend it with state/stop support, etc.
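
Purely as an illustration of the kind of extension I mean (this layout
is hypothetical; it is not from any spec draft or from that proposal):

/* Hypothetical admin-vq command payload for saving/restoring VQ state. */
struct virtio_admin_cmd_vq_state {
	__le16 vq_index;     /* which virtqueue the state belongs to */
	__le16 reserved;
	__le32 avail_index;  /* e.g. last_avail_idx for a split virtqueue */
};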

>
> > >
> > > Next would be to program hundreds of statistics of the 64 VQs through a
> > giant PCI config space register in some busy polling scheme.
> >
> > We don't need giant config space, and this method has been implemented
> > by some vDPA vendors.
> >
> There are tens of 64-bit counters per VQs. These needs to programmed on destination side.
> Programming these via registers requires exposing them on the registers.
> In one of the proposals, I see them being queried via CVQ from the device.

I didn't see a proposal like this. And I don't think querying general
virtio state like the idx with a device-specific CVQ is a good design.

>
> Programming them via cfg registers requires large cfg space or synchronous programming until receiving ACK from it.
> This means one entry at a time...
>
> Programming them via CVQ needs replicate and align cmd values etc on all device types. All duplicate and hard to maintain.
>
>
> > >
> > > I can clearly see how all these are inefficient for faster LM.
> > > We need an efficient AQ to proceed with at minimum.
> >
> > I'm fine with admin virtqueue, but the stop and state are orthogonal to that.
> > And using admin virtqueue for stop/state will be more natural if we use
> > admin virtqueue as a transport.
> Ok.
> We should have defined it bit earlier that all vendors can use. :(

I agree.

Thanks
Eugenio Perez Martin June 2, 2022, 8:57 a.m. UTC | #23
On Thu, Jun 2, 2022 at 4:59 AM Parav Pandit <parav@nvidia.com> wrote:
>
>
> > From: Jason Wang <jasowang@redhat.com>
> > Sent: Wednesday, June 1, 2022 10:00 PM
> >
> > On Thu, Jun 2, 2022 at 2:58 AM Parav Pandit <parav@nvidia.com> wrote:
> > >
> > >
> > > > From: Jason Wang <jasowang@redhat.com>
> > > > Sent: Tuesday, May 31, 2022 10:42 PM
> > > >
> > > > Well, the ability to query the virtqueue state was proposed as
> > > > another feature (Eugenio, please correct me). This should be
> > > > sufficient for making virtio-net to be live migrated.
> > > >
> > > The device is stopped, it won't answer to this special vq config done here.
> >
> > This depends on the definition of the stop. Any query to the device state
> > should be allowed otherwise it's meaningless for us.
> >
> > > Programming all of these using cfg registers doesn't scale for on-chip
> > memory and for the speed.
> >
> > Well, they are orthogonal and what I want to say is, we should first define
> > the semantics of stop and state of the virtqueue.
> >
> > Such a facility could be accessed by either transport specific method or admin
> > virtqueue, it totally depends on the hardware architecture of the vendor.
> >
> I find it hard to believe that a vendor can implement a CVQ but not AQ and chose to expose tens of hundreds of registers.
> But maybe, it fits some specific hw.
>
> I like to learn the advantages of such method other than simplicity.
>
> We can clearly that we are shifting away from such PCI registers with SIOV, IMS and other scalable solutions.
> virtio drifting in reverse direction by introducing more registers as transport.
> I expect it to an optional transport like AQ.
>
> > >
> > > Next would be to program hundreds of statistics of the 64 VQs through a
> > giant PCI config space register in some busy polling scheme.
> >
> > We don't need giant config space, and this method has been implemented
> > by some vDPA vendors.
> >
> There are tens of 64-bit counters per VQs. These needs to programmed on destination side.
> Programming these via registers requires exposing them on the registers.
> In one of the proposals, I see them being queried via CVQ from the device.
>
> Programming them via cfg registers requires large cfg space or synchronous programming until receiving ACK from it.
> This means one entry at a time...
>
> Programming them via CVQ needs replicate and align cmd values etc on all device types. All duplicate and hard to maintain.
>

I think this discussion should be moved to the proposals on
virtio-comment. In the vdpa context, they should be covered.

This one is about exposing the basic facility of stopping and resuming
a device to userland, and it fits equally well whether the device
implements it via cfg registers, via admin vq, via channel I/O, or via
whatever transport the vdpa backend prefers. Asking for the state is
already covered in the vhost layer, and this proposal does not affect
it.
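
As a minimal userland sketch of what this series exposes (the
VHOST_BACKEND_F_STOP name and the int argument are my assumptions
here; please check the actual patches for the final uAPI):

#include <stdint.h>
#include <sys/ioctl.h>
#include <linux/vhost.h>

/* Sketch: negotiate the STOP backend feature, then stop or resume. */
static int vdpa_stop(int vhost_vdpa_fd, int stop)
{
	/* Assumed feature bit name; see the final uAPI header. */
	uint64_t features = 1ULL << VHOST_BACKEND_F_STOP;

	if (ioctl(vhost_vdpa_fd, VHOST_SET_BACKEND_FEATURES, &features))
		return -1;

	/* stop != 0 freezes the device, stop == 0 resumes it. */
	return ioctl(vhost_vdpa_fd, VHOST_VDPA_STOP, &stop);
}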

Given the flexibility of vdpa, we can even query the vq state using
backend-specific methods, cache it (knowing that it will not change
until resume or DRIVER_OK), and expose it to qemu using a config space
interface or any other batch method. Same as with the enable_vq
problem. And the same applies to stats. And we maintain compatibility
with every vendor-specific control plane.
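
A rough sketch of that caching idea (every name below is made up for
the example; it is not from any existing backend):

#include <stdbool.h>
#include <stdint.h>

struct cached_vq_state {
	uint16_t last_avail_idx;
	bool     valid;
};

struct my_vdpa_dev {                      /* made-up backend device */
	int nvqs;
	struct cached_vq_state *cache;
};

/* Placeholders for whatever vendor-specific mechanism the backend uses. */
uint16_t vendor_read_avail_idx(struct my_vdpa_dev *dev, int vq);
void vendor_stop_device(struct my_vdpa_dev *dev);
void vendor_resume_device(struct my_vdpa_dev *dev);

static void backend_stop(struct my_vdpa_dev *dev, bool stop)
{
	int i;

	if (stop) {
		vendor_stop_device(dev);
		for (i = 0; i < dev->nvqs; i++) {
			/* Safe to cache: state cannot change until resume. */
			dev->cache[i].last_avail_idx = vendor_read_avail_idx(dev, i);
			dev->cache[i].valid = true;
		}
	} else {
		for (i = 0; i < dev->nvqs; i++)
			dev->cache[i].valid = false; /* device may change it again */
		vendor_resume_device(dev);
	}
}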

Would that work for devices that cannot, or do not want to, expose
them via config space?

Thanks!

>
> > >
> > > I can clearly see how all these are inefficient for faster LM.
> > > We need an efficient AQ to proceed with at minimum.
> >
> > I'm fine with admin virtqueue, but the stop and state are orthogonal to that.
> > And using admin virtqueue for stop/state will be more natural if we use
> > admin virtqueue as a transport.
> Ok.
> We should have defined it bit earlier that all vendors can use. :(
Parav Pandit June 15, 2022, 12:10 a.m. UTC | #24
> From: Jason Wang <jasowang@redhat.com>
> Sent: Wednesday, June 1, 2022 11:54 PM
> 
> On Thu, Jun 2, 2022 at 10:59 AM Parav Pandit <parav@nvidia.com> wrote:
> >
> >
> > > From: Jason Wang <jasowang@redhat.com>
> > > Sent: Wednesday, June 1, 2022 10:00 PM
> > >
> > > On Thu, Jun 2, 2022 at 2:58 AM Parav Pandit <parav@nvidia.com> wrote:
> > > >
> > > >
> > > > > From: Jason Wang <jasowang@redhat.com>
> > > > > Sent: Tuesday, May 31, 2022 10:42 PM
> > > > >
> > > > > Well, the ability to query the virtqueue state was proposed as
> > > > > another feature (Eugenio, please correct me). This should be
> > > > > sufficient for making virtio-net to be live migrated.
> > > > >
> > > > The device is stopped, it won't answer to this special vq config done
> here.
> > >
> > > This depends on the definition of the stop. Any query to the device
> > > state should be allowed otherwise it's meaningless for us.
> > >
> > > > Programming all of these using cfg registers doesn't scale for
> > > > on-chip
> > > memory and for the speed.
> > >
> > > Well, they are orthogonal and what I want to say is, we should first
> > > define the semantics of stop and state of the virtqueue.
> > >
> > > Such a facility could be accessed by either transport specific
> > > method or admin virtqueue, it totally depends on the hardware
> architecture of the vendor.
> > >
> > I find it hard to believe that a vendor can implement a CVQ but not AQ and
> chose to expose tens of hundreds of registers.
> > But maybe, it fits some specific hw.
> 
> You can have a look at the ifcvf dpdk driver as an example.
> 
Ifcvf is an example of using registers.
It is not an answer to why an AQ would be hard for it. :)
The virtio spec has a definition of a queue now, and implementing yet another queue shouldn't be a problem.

So far no one seems to have a problem with the additional queue.
So I take it that the AQ is ok.

> But another thing that is unrelated to hardware architecture is the nesting
> support. Having admin virtqueue in a nesting environment looks like an
> overkill. Presenting a register in L1 and map it to L0's admin should be good
> enough.
So maybe an optimized interface can be added that fits the nested env.
At this point in time, the real users we have heard from are interested in non-nested use cases. Let's enable them first.


> 
> >
> > I like to learn the advantages of such method other than simplicity.
> >
> > We can clearly that we are shifting away from such PCI registers with SIOV,
> IMS and other scalable solutions.
> > virtio drifting in reverse direction by introducing more registers as
> transport.
> > I expect it to an optional transport like AQ.
> 
> Actually, I had a proposal of using admin virtqueue as a transport, it's
> designed to be SIOV/IMS capable. And it's not hard to extend it with the
> state/stop support etc.
> 
> >
> > > >
> > > > Next would be to program hundreds of statistics of the 64 VQs
> > > > through a
> > > giant PCI config space register in some busy polling scheme.
> > >
> > > We don't need giant config space, and this method has been
> > > implemented by some vDPA vendors.
> > >
> > There are tens of 64-bit counters per VQs. These needs to programmed on
> destination side.
> > Programming these via registers requires exposing them on the registers.
> > In one of the proposals, I see them being queried via CVQ from the device.
> 
> I didn't see a proposal like this. And I don't think querying general virtio state
> like idx with a device specific CVQ is a good design.
> 
My example was not for the idx, but for VQ statistics that are queried via CVQ.

> >
> > Programming them via cfg registers requires large cfg space or synchronous
> programming until receiving ACK from it.
> > This means one entry at a time...
> >
> > Programming them via CVQ needs replicate and align cmd values etc on all
> device types. All duplicate and hard to maintain.
> >
> >
> > > >
> > > > I can clearly see how all these are inefficient for faster LM.
> > > > We need an efficient AQ to proceed with at minimum.
> > >
> > > I'm fine with admin virtqueue, but the stop and state are orthogonal to
> that.
> > > And using admin virtqueue for stop/state will be more natural if we
> > > use admin virtqueue as a transport.
> > Ok.
> > We should have defined it bit earlier that all vendors can use. :(
> 
> I agree.

I remember that a few months back you acked in the weekly meeting that the TC had approved the AQ direction.
And we are still going in circles debating the AQ.
Jason Wang June 15, 2022, 1:28 a.m. UTC | #25
On Wed, Jun 15, 2022 at 8:10 AM Parav Pandit <parav@nvidia.com> wrote:
>
>
>
> > From: Jason Wang <jasowang@redhat.com>
> > Sent: Wednesday, June 1, 2022 11:54 PM
> >
> > On Thu, Jun 2, 2022 at 10:59 AM Parav Pandit <parav@nvidia.com> wrote:
> > >
> > >
> > > > From: Jason Wang <jasowang@redhat.com>
> > > > Sent: Wednesday, June 1, 2022 10:00 PM
> > > >
> > > > On Thu, Jun 2, 2022 at 2:58 AM Parav Pandit <parav@nvidia.com> wrote:
> > > > >
> > > > >
> > > > > > From: Jason Wang <jasowang@redhat.com>
> > > > > > Sent: Tuesday, May 31, 2022 10:42 PM
> > > > > >
> > > > > > Well, the ability to query the virtqueue state was proposed as
> > > > > > another feature (Eugenio, please correct me). This should be
> > > > > > sufficient for making virtio-net to be live migrated.
> > > > > >
> > > > > The device is stopped, it won't answer to this special vq config done
> > here.
> > > >
> > > > This depends on the definition of the stop. Any query to the device
> > > > state should be allowed otherwise it's meaningless for us.
> > > >
> > > > > Programming all of these using cfg registers doesn't scale for
> > > > > on-chip
> > > > memory and for the speed.
> > > >
> > > > Well, they are orthogonal and what I want to say is, we should first
> > > > define the semantics of stop and state of the virtqueue.
> > > >
> > > > Such a facility could be accessed by either transport specific
> > > > method or admin virtqueue, it totally depends on the hardware
> > architecture of the vendor.
> > > >
> > > I find it hard to believe that a vendor can implement a CVQ but not AQ and
> > chose to expose tens of hundreds of registers.
> > > But maybe, it fits some specific hw.
> >
> > You can have a look at the ifcvf dpdk driver as an example.
> >
> Ifcvf is an example of using registers.
> It is not an answer why AQ is hard for it. :)

Well, it's an example of how vDPA is implemented. I think we agree
that for vDPA, vendors have the flexibility to implement their
preferred datapath.

> virtio spec has definition of queue now and implementing yet another queue shouldn't be a problem.
>
> So far no one seem to have problem with the additional queue.
> So I take it as AQ is ok.
>
> > But another thing that is unrelated to hardware architecture is the nesting
> > support. Having admin virtqueue in a nesting environment looks like an
> > overkill. Presenting a register in L1 and map it to L0's admin should be good
> > enough.
> So may be a optimized interface can be added that fits nested env.
> At this point in time real users that we heard are interested in non-nested use cases. Let's enable them first.

That's fine. For nesting, it's actually really easy: just adding an
interface within the existing transport should be sufficient.

>
>
> >
> > >
> > > I like to learn the advantages of such method other than simplicity.
> > >
> > > We can clearly that we are shifting away from such PCI registers with SIOV,
> > IMS and other scalable solutions.
> > > virtio drifting in reverse direction by introducing more registers as
> > transport.
> > > I expect it to an optional transport like AQ.
> >
> > Actually, I had a proposal of using admin virtqueue as a transport, it's
> > designed to be SIOV/IMS capable. And it's not hard to extend it with the
> > state/stop support etc.
> >
> > >
> > > > >
> > > > > Next would be to program hundreds of statistics of the 64 VQs
> > > > > through a
> > > > giant PCI config space register in some busy polling scheme.
> > > >
> > > > We don't need giant config space, and this method has been
> > > > implemented by some vDPA vendors.
> > > >
> > > There are tens of 64-bit counters per VQs. These needs to programmed on
> > destination side.
> > > Programming these via registers requires exposing them on the registers.
> > > In one of the proposals, I see them being queried via CVQ from the device.
> >
> > I didn't see a proposal like this. And I don't think querying general virtio state
> > like idx with a device specific CVQ is a good design.
> >
> My example was not for the idx. But for VQ statistics that is queried via CVQ.
>
> > >
> > > Programming them via cfg registers requires large cfg space or synchronous
> > programming until receiving ACK from it.
> > > This means one entry at a time...
> > >
> > > Programming them via CVQ needs replicate and align cmd values etc on all
> > device types. All duplicate and hard to maintain.
> > >
> > >
> > > > >
> > > > > I can clearly see how all these are inefficient for faster LM.
> > > > > We need an efficient AQ to proceed with at minimum.
> > > >
> > > > I'm fine with admin virtqueue, but the stop and state are orthogonal to
> > that.
> > > > And using admin virtqueue for stop/state will be more natural if we
> > > > use admin virtqueue as a transport.
> > > Ok.
> > > We should have defined it bit earlier that all vendors can use. :(
> >
> > I agree.
>
> I remember few months back, you acked in the weekly meeting that TC has approved the AQ direction.
> And we are still in this circle of debating the AQ.

I think not. Just to make sure we are on the same page, the proposal
here is for vDPA, and I hope it can provide forward compatibility to
virtio. So in the context of vDPA, an admin virtqueue is not a must.

Thanks
Parav Pandit June 16, 2022, 7:36 p.m. UTC | #26
> From: Jason Wang <jasowang@redhat.com>
> Sent: Tuesday, June 14, 2022 9:29 PM
> 
> Well, it's an example of how vDPA is implemented. I think we agree that for
> vDPA, vendors have the flexibility to implement their perferrable datapath.
>
Yes for the vdpa level and for the virtio level.

> >
> > I remember few months back, you acked in the weekly meeting that TC has
> approved the AQ direction.
> > And we are still in this circle of debating the AQ.
> 
> I think not. Just to make sure we are on the same page, the proposal here is
> for vDPA, and hope it can provide forward compatibility to virtio. So in the
> context of vDPA, admin virtqueue is not a must.
In the context of vdpa over virtio, an efficient transport interface is needed.
If an AQ is not a must, any other interface, such as hundreds to thousands of registers, is not a must either.

The AQ is one proposed interface with multiple benefits.
I haven't seen any other alternative that delivers all the benefits.
The only one I have seen is synchronous config registers.

If you let vendors progress, a handful of sensible interfaces can exist, each with different characteristics.
How would we proceed from here?
Jason Wang June 17, 2022, 1:15 a.m. UTC | #27
On Fri, Jun 17, 2022 at 3:36 AM Parav Pandit <parav@nvidia.com> wrote:
>
>
> > From: Jason Wang <jasowang@redhat.com>
> > Sent: Tuesday, June 14, 2022 9:29 PM
> >
> > Well, it's an example of how vDPA is implemented. I think we agree that for
> > vDPA, vendors have the flexibility to implement their perferrable datapath.
> >
> Yes for the vdpa level and for the virtio level.
>
> > >
> > > I remember few months back, you acked in the weekly meeting that TC has
> > approved the AQ direction.
> > > And we are still in this circle of debating the AQ.
> >
> > I think not. Just to make sure we are on the same page, the proposal here is
> > for vDPA, and hope it can provide forward compatibility to virtio. So in the
> > context of vDPA, admin virtqueue is not a must.
> In context of vdpa over virtio, an efficient transport interface is needed.
> If AQ is not much any other interface such as hundreds to thousands of registers is not must either.
>
> AQ is one interface proposed with multiple benefits.
> I haven’t seen any other alternatives that delivers all the benefits.
> Only one I have seen is synchronous config registers.
>
> If you let vendors progress, handful of sensible interfaces can exist, each with different characteristics.
> How would we proceed from here?

I'm pretty fine with having an admin virtqueue in the virtio spec. If
you remember, I even submitted a proposal to use the admin virtqueue
as a transport last year.

Let's just proceed in the virtio-dev list.

Thanks
Parav Pandit June 17, 2022, 2:42 a.m. UTC | #28
> From: Jason Wang <jasowang@redhat.com>
> Sent: Thursday, June 16, 2022 9:15 PM
> 
> On Fri, Jun 17, 2022 at 3:36 AM Parav Pandit <parav@nvidia.com> wrote:
> >
> >
> > > From: Jason Wang <jasowang@redhat.com>
> > > Sent: Tuesday, June 14, 2022 9:29 PM
> > >
> > > Well, it's an example of how vDPA is implemented. I think we agree
> > > that for vDPA, vendors have the flexibility to implement their perferrable
> datapath.
> > >
> > Yes for the vdpa level and for the virtio level.
> >
> > > >
> > > > I remember few months back, you acked in the weekly meeting that
> > > > TC has
> > > approved the AQ direction.
> > > > And we are still in this circle of debating the AQ.
> > >
> > > I think not. Just to make sure we are on the same page, the proposal
> > > here is for vDPA, and hope it can provide forward compatibility to
> > > virtio. So in the context of vDPA, admin virtqueue is not a must.
> > In context of vdpa over virtio, an efficient transport interface is needed.
> > If AQ is not much any other interface such as hundreds to thousands of
> registers is not must either.
> >
> > AQ is one interface proposed with multiple benefits.
> > I haven’t seen any other alternatives that delivers all the benefits.
> > Only one I have seen is synchronous config registers.
> >
> > If you let vendors progress, handful of sensible interfaces can exist, each
> with different characteristics.
> > How would we proceed from here?
> 
> I'm pretty fine with having admin virtqueue in the virtio spec. If you
> remember, I've even submitted a proposal to use admin virtqueue as a
> transport last year.
> 
> Let's just proceed in the virtio-dev list.

o.k. thanks. I am aligned with your thoughts now.