Message ID | 1525734594-11134-1-git-send-email-sridhar.samudrala@intel.com (mailing list archive) |
---|---|
State | New, archived |
Headers | show |
Ping on this patch now that the kernel patches are accepted into davem's net-next tree. https://patchwork.ozlabs.org/cover/920005/ On 5/7/2018 4:09 PM, Sridhar Samudrala wrote: > This feature bit can be used by hypervisor to indicate virtio_net device to > act as a standby for another device with the same MAC address. > > I tested this with a small change to the patch to mark the STANDBY feature 'true' > by default as i am using libvirt to start the VMs. > Is there a way to pass the newly added feature bit 'standby' to qemu via libvirt > XML file? > > Signed-off-by: Sridhar Samudrala <sridhar.samudrala@intel.com> > --- > hw/net/virtio-net.c | 2 ++ > include/standard-headers/linux/virtio_net.h | 3 +++ > 2 files changed, 5 insertions(+) > > diff --git a/hw/net/virtio-net.c b/hw/net/virtio-net.c > index 90502fca7c..38b3140670 100644 > --- a/hw/net/virtio-net.c > +++ b/hw/net/virtio-net.c > @@ -2198,6 +2198,8 @@ static Property virtio_net_properties[] = { > true), > DEFINE_PROP_INT32("speed", VirtIONet, net_conf.speed, SPEED_UNKNOWN), > DEFINE_PROP_STRING("duplex", VirtIONet, net_conf.duplex_str), > + DEFINE_PROP_BIT64("standby", VirtIONet, host_features, VIRTIO_NET_F_STANDBY, > + false), > DEFINE_PROP_END_OF_LIST(), > }; > > diff --git a/include/standard-headers/linux/virtio_net.h b/include/standard-headers/linux/virtio_net.h > index e9f255ea3f..01ec09684c 100644 > --- a/include/standard-headers/linux/virtio_net.h > +++ b/include/standard-headers/linux/virtio_net.h > @@ -57,6 +57,9 @@ > * Steering */ > #define VIRTIO_NET_F_CTRL_MAC_ADDR 23 /* Set MAC address */ > > +#define VIRTIO_NET_F_STANDBY 62 /* Act as standby for another device > + * with the same MAC. > + */ > #define VIRTIO_NET_F_SPEED_DUPLEX 63 /* Device set linkspeed and duplex */ > > #ifndef VIRTIO_NET_NO_LEGACY
On 2018年06月05日 09:41, Samudrala, Sridhar wrote: > Ping on this patch now that the kernel patches are accepted into > davem's net-next tree. > https://patchwork.ozlabs.org/cover/920005/ > > > On 5/7/2018 4:09 PM, Sridhar Samudrala wrote: >> This feature bit can be used by hypervisor to indicate virtio_net >> device to >> act as a standby for another device with the same MAC address. >> >> I tested this with a small change to the patch to mark the STANDBY >> feature 'true' >> by default as i am using libvirt to start the VMs. >> Is there a way to pass the newly added feature bit 'standby' to qemu >> via libvirt >> XML file? >> Maybe you can try qemu command line passthrough: https://libvirt.org/drvqemu.html#qemucommand The patch looks good to me. Let's see if Michael have any comment. Thanks >> Signed-off-by: Sridhar Samudrala <sridhar.samudrala@intel.com> >> --- >> hw/net/virtio-net.c | 2 ++ >> include/standard-headers/linux/virtio_net.h | 3 +++ >> 2 files changed, 5 insertions(+) >> >> diff --git a/hw/net/virtio-net.c b/hw/net/virtio-net.c >> index 90502fca7c..38b3140670 100644 >> --- a/hw/net/virtio-net.c >> +++ b/hw/net/virtio-net.c >> @@ -2198,6 +2198,8 @@ static Property virtio_net_properties[] = { >> true), >> DEFINE_PROP_INT32("speed", VirtIONet, net_conf.speed, >> SPEED_UNKNOWN), >> DEFINE_PROP_STRING("duplex", VirtIONet, net_conf.duplex_str), >> + DEFINE_PROP_BIT64("standby", VirtIONet, host_features, >> VIRTIO_NET_F_STANDBY, >> + false), >> DEFINE_PROP_END_OF_LIST(), >> }; >> diff --git a/include/standard-headers/linux/virtio_net.h >> b/include/standard-headers/linux/virtio_net.h >> index e9f255ea3f..01ec09684c 100644 >> --- a/include/standard-headers/linux/virtio_net.h >> +++ b/include/standard-headers/linux/virtio_net.h >> @@ -57,6 +57,9 @@ >> * Steering */ >> #define VIRTIO_NET_F_CTRL_MAC_ADDR 23 /* Set MAC address */ >> +#define VIRTIO_NET_F_STANDBY 62 /* Act as standby for >> another device >> + * with the same MAC. >> + */ >> #define VIRTIO_NET_F_SPEED_DUPLEX 63 /* Device set linkspeed and >> duplex */ >> #ifndef VIRTIO_NET_NO_LEGACY >
I don't think this is sufficient. If both primary and standby devices are present, a legacy guest without support for the feature might see two devices with same mac and get confused. I think that we should only make primary visible after guest acked the backup feature bit. And on reset or when backup is cleared in some other way, unplug the primary. Something like the below will do the job: Primary device is added with a special "primary-failover" flag. A virtual machine is then initialized with just a standby virtio device. Primary is not yet added. Later QEMU detects that guest driver device set DRIVER_OK. It then exposes the primary device to the guest, and triggers a device addition event (hot-plug event) for it. If QEMU detects guest driver removal, it initiates a hot-unplug sequence to remove the primary driver. In particular, if QEMU detects guest re-initialization (e.g. by detecting guest reset) it immediately removes the primary device. We can move some of this code to management as well, architecturally it does not make too much sense but it might be easier implementation-wise. HTH On Mon, Jun 04, 2018 at 06:41:48PM -0700, Samudrala, Sridhar wrote: > Ping on this patch now that the kernel patches are accepted into davem's net-next tree. > https://patchwork.ozlabs.org/cover/920005/ > > > On 5/7/2018 4:09 PM, Sridhar Samudrala wrote: > > This feature bit can be used by hypervisor to indicate virtio_net device to > > act as a standby for another device with the same MAC address. > > > > I tested this with a small change to the patch to mark the STANDBY feature 'true' > > by default as i am using libvirt to start the VMs. > > Is there a way to pass the newly added feature bit 'standby' to qemu via libvirt > > XML file? > > > > Signed-off-by: Sridhar Samudrala <sridhar.samudrala@intel.com> > > --- > > hw/net/virtio-net.c | 2 ++ > > include/standard-headers/linux/virtio_net.h | 3 +++ > > 2 files changed, 5 insertions(+) > > > > diff --git a/hw/net/virtio-net.c b/hw/net/virtio-net.c > > index 90502fca7c..38b3140670 100644 > > --- a/hw/net/virtio-net.c > > +++ b/hw/net/virtio-net.c > > @@ -2198,6 +2198,8 @@ static Property virtio_net_properties[] = { > > true), > > DEFINE_PROP_INT32("speed", VirtIONet, net_conf.speed, SPEED_UNKNOWN), > > DEFINE_PROP_STRING("duplex", VirtIONet, net_conf.duplex_str), > > + DEFINE_PROP_BIT64("standby", VirtIONet, host_features, VIRTIO_NET_F_STANDBY, > > + false), > > DEFINE_PROP_END_OF_LIST(), > > }; > > diff --git a/include/standard-headers/linux/virtio_net.h b/include/standard-headers/linux/virtio_net.h > > index e9f255ea3f..01ec09684c 100644 > > --- a/include/standard-headers/linux/virtio_net.h > > +++ b/include/standard-headers/linux/virtio_net.h > > @@ -57,6 +57,9 @@ > > * Steering */ > > #define VIRTIO_NET_F_CTRL_MAC_ADDR 23 /* Set MAC address */ > > +#define VIRTIO_NET_F_STANDBY 62 /* Act as standby for another device > > + * with the same MAC. > > + */ > > #define VIRTIO_NET_F_SPEED_DUPLEX 63 /* Device set linkspeed and duplex */ > > #ifndef VIRTIO_NET_NO_LEGACY
On 6/5/2018 5:33 AM, Michael S. Tsirkin wrote: > I don't think this is sufficient. Sure. This is not sufficient for a complete solution, but is Qemu the right place to manage primary/standby interfaces? I think the other steps including plugging/unplugging the primary interface needs to handled by some orchestration layer. > > If both primary and standby devices are present, a legacy guest without > support for the feature might see two devices with same mac and get > confused. Yes. Isn't this possible today when a VM is started with a mis-configured domain XML file that passes a virtio-net interface and a VF with the same MAC? > > I think that we should only make primary visible after guest acked the > backup feature bit. The primary can be plugged/unplugged at any time by the management layer. So i guess this needs to be handled in the libvirt layer. > > And on reset or when backup is cleared in some other way, unplug the > primary. > > Something like the below will do the job: > > Primary device is added with a special "primary-failover" flag. Are you suggesting an extension to Qemu to add another flag for primary device too? > A virtual machine is then initialized with just a standby virtio > device. Primary is not yet added. > > Later QEMU detects that guest driver device set DRIVER_OK. > It then exposes the primary device to the guest, and triggers > a device addition event (hot-plug event) for it. > > If QEMU detects guest driver removal, it initiates a hot-unplug sequence > to remove the primary driver. In particular, if QEMU detects guest > re-initialization (e.g. by detecting guest reset) it immediately removes > the primary device. > > We can move some of this code to management as well, architecturally it > does not make too much sense but it might be easier implementation-wise. > > HTH > > On Mon, Jun 04, 2018 at 06:41:48PM -0700, Samudrala, Sridhar wrote: >> Ping on this patch now that the kernel patches are accepted into davem's net-next tree. >> https://patchwork.ozlabs.org/cover/920005/ >> >> >> On 5/7/2018 4:09 PM, Sridhar Samudrala wrote: >>> This feature bit can be used by hypervisor to indicate virtio_net device to >>> act as a standby for another device with the same MAC address. >>> >>> I tested this with a small change to the patch to mark the STANDBY feature 'true' >>> by default as i am using libvirt to start the VMs. >>> Is there a way to pass the newly added feature bit 'standby' to qemu via libvirt >>> XML file? >>> >>> Signed-off-by: Sridhar Samudrala <sridhar.samudrala@intel.com> >>> --- >>> hw/net/virtio-net.c | 2 ++ >>> include/standard-headers/linux/virtio_net.h | 3 +++ >>> 2 files changed, 5 insertions(+) >>> >>> diff --git a/hw/net/virtio-net.c b/hw/net/virtio-net.c >>> index 90502fca7c..38b3140670 100644 >>> --- a/hw/net/virtio-net.c >>> +++ b/hw/net/virtio-net.c >>> @@ -2198,6 +2198,8 @@ static Property virtio_net_properties[] = { >>> true), >>> DEFINE_PROP_INT32("speed", VirtIONet, net_conf.speed, SPEED_UNKNOWN), >>> DEFINE_PROP_STRING("duplex", VirtIONet, net_conf.duplex_str), >>> + DEFINE_PROP_BIT64("standby", VirtIONet, host_features, VIRTIO_NET_F_STANDBY, >>> + false), >>> DEFINE_PROP_END_OF_LIST(), >>> }; >>> diff --git a/include/standard-headers/linux/virtio_net.h b/include/standard-headers/linux/virtio_net.h >>> index e9f255ea3f..01ec09684c 100644 >>> --- a/include/standard-headers/linux/virtio_net.h >>> +++ b/include/standard-headers/linux/virtio_net.h >>> @@ -57,6 +57,9 @@ >>> * Steering */ >>> #define VIRTIO_NET_F_CTRL_MAC_ADDR 23 /* Set MAC address */ >>> +#define VIRTIO_NET_F_STANDBY 62 /* Act as standby for another device >>> + * with the same MAC. >>> + */ >>> #define VIRTIO_NET_F_SPEED_DUPLEX 63 /* Device set linkspeed and duplex */ >>> #ifndef VIRTIO_NET_NO_LEGACY
On Tue, Jun 05, 2018 at 01:20:33PM -0700, Samudrala, Sridhar wrote: > > On 6/5/2018 5:33 AM, Michael S. Tsirkin wrote: > > I don't think this is sufficient. > > Sure. This is not sufficient for a complete solution, but is Qemu the right place > to manage primary/standby interfaces? > > I think the other steps including plugging/unplugging the primary interface needs > to handled by some orchestration layer. I don't see any real benefit to it but yes, you can push it up the stack. Unfortunately the patch still won't be sufficient then, see below :) > > > > > If both primary and standby devices are present, a legacy guest without > > support for the feature might see two devices with same mac and get > > confused. > > Yes. Isn't this possible today when a VM is started with a mis-configured domain > XML file that passes a virtio-net interface and a VF with the same MAC? Yes, so if you misconfigure the hypervisor then your guest won't run. Not sure what you are trying to say by this though. > > > > > I think that we should only make primary visible after guest acked the > > backup feature bit. > > The primary can be plugged/unplugged at any time by the management layer. > So i guess this needs to be handled in the libvirt layer. management just wants to give the primary to guest and later take it back, it really does not care about the details of the process, so I don't see what does pushing it up the stack buy you. So I don't think it *needs* to be done in libvirt. It probably can if you add a bunch of hooks so it knows whenever vm reboots, driver binds and unbinds from device, and can check that backup flag was set. If you are pushing for a setup like that please get a buy-in from libvirt maintainers or better write a patch. > > > > And on reset or when backup is cleared in some other way, unplug the > > primary. > > > > Something like the below will do the job: > > > > Primary device is added with a special "primary-failover" flag. > > Are you suggesting an extension to Qemu to add another flag for > primary device too? Exactly. > > A virtual machine is then initialized with just a standby virtio > > device. Primary is not yet added. > > > > Later QEMU detects that guest driver device set DRIVER_OK. > > It then exposes the primary device to the guest, and triggers > > a device addition event (hot-plug event) for it. > > > > If QEMU detects guest driver removal, it initiates a hot-unplug sequence > > to remove the primary driver. In particular, if QEMU detects guest > > re-initialization (e.g. by detecting guest reset) it immediately removes > > the primary device. > > > > We can move some of this code to management as well, architecturally it > > does not make too much sense but it might be easier implementation-wise. > > > > HTH > > > > On Mon, Jun 04, 2018 at 06:41:48PM -0700, Samudrala, Sridhar wrote: > > > Ping on this patch now that the kernel patches are accepted into davem's net-next tree. > > > https://patchwork.ozlabs.org/cover/920005/ > > > > > > > > > On 5/7/2018 4:09 PM, Sridhar Samudrala wrote: > > > > This feature bit can be used by hypervisor to indicate virtio_net device to > > > > act as a standby for another device with the same MAC address. > > > > > > > > I tested this with a small change to the patch to mark the STANDBY feature 'true' > > > > by default as i am using libvirt to start the VMs. > > > > Is there a way to pass the newly added feature bit 'standby' to qemu via libvirt > > > > XML file? > > > > > > > > Signed-off-by: Sridhar Samudrala <sridhar.samudrala@intel.com> > > > > --- > > > > hw/net/virtio-net.c | 2 ++ > > > > include/standard-headers/linux/virtio_net.h | 3 +++ > > > > 2 files changed, 5 insertions(+) > > > > > > > > diff --git a/hw/net/virtio-net.c b/hw/net/virtio-net.c > > > > index 90502fca7c..38b3140670 100644 > > > > --- a/hw/net/virtio-net.c > > > > +++ b/hw/net/virtio-net.c > > > > @@ -2198,6 +2198,8 @@ static Property virtio_net_properties[] = { > > > > true), > > > > DEFINE_PROP_INT32("speed", VirtIONet, net_conf.speed, SPEED_UNKNOWN), > > > > DEFINE_PROP_STRING("duplex", VirtIONet, net_conf.duplex_str), > > > > + DEFINE_PROP_BIT64("standby", VirtIONet, host_features, VIRTIO_NET_F_STANDBY, > > > > + false), > > > > DEFINE_PROP_END_OF_LIST(), > > > > }; > > > > diff --git a/include/standard-headers/linux/virtio_net.h b/include/standard-headers/linux/virtio_net.h > > > > index e9f255ea3f..01ec09684c 100644 > > > > --- a/include/standard-headers/linux/virtio_net.h > > > > +++ b/include/standard-headers/linux/virtio_net.h > > > > @@ -57,6 +57,9 @@ > > > > * Steering */ > > > > #define VIRTIO_NET_F_CTRL_MAC_ADDR 23 /* Set MAC address */ > > > > +#define VIRTIO_NET_F_STANDBY 62 /* Act as standby for another device > > > > + * with the same MAC. > > > > + */ > > > > #define VIRTIO_NET_F_SPEED_DUPLEX 63 /* Device set linkspeed and duplex */ > > > > #ifndef VIRTIO_NET_NO_LEGACY
Good to see this discussion going. I share the same feeling that the decision of plugging the primary (passthrough) should only be made until guest driver acknowledges DRIVER_OK and _F_STANDBY. Architecturally this intelligence should be baken to QEMU itself rather than moving up to management stack, such as libvirt. A few questions in line below. On Tue, Jun 5, 2018 at 5:33 AM, Michael S. Tsirkin <mst@redhat.com> wrote: > I don't think this is sufficient. > > If both primary and standby devices are present, a legacy guest without > support for the feature might see two devices with same mac and get > confused. > > I think that we should only make primary visible after guest acked the > backup feature bit. > > And on reset or when backup is cleared in some other way, unplug the > primary. > > Something like the below will do the job: > > Primary device is added with a special "primary-failover" flag. I wonder if you envision this flag as a user interface or some internal attribute/property to QEMU device? Since QEMU needs to associate this particular primary-failover device with a virtio standby instance for checking DRIVER_OK as you indicate below, I wonder if the group ID thing will be helpful to set "primary-failover" flag internally without having to add another option in CLI? > A virtual machine is then initialized with just a standby virtio > device. Primary is not yet added. > > Later QEMU detects that guest driver device set DRIVER_OK. > It then exposes the primary device to the guest, and triggers > a device addition event (hot-plug event) for it. Sounds OK. While there might be a small window the guest already starts to use virtio interface before the passthrough gets plugged in, I think that should be fine. Another option is to wait until the primary appears and gets registered in the guest, so it can bring up the upper failover interface. > > If QEMU detects guest driver removal, it initiates a hot-unplug sequence > to remove the primary driver. In particular, if QEMU detects guest > re-initialization (e.g. by detecting guest reset) it immediately removes > the primary device. Agreed. For legacy guest, user might prefer seeing a single passthrough device rather than a virtio device. I would hope there's an option to allow it to happen, instead of removing all passthrough devices. Regards, -Siwei > > We can move some of this code to management as well, architecturally it > does not make too much sense but it might be easier implementation-wise. > > HTH > > On Mon, Jun 04, 2018 at 06:41:48PM -0700, Samudrala, Sridhar wrote: >> Ping on this patch now that the kernel patches are accepted into davem's net-next tree. >> https://patchwork.ozlabs.org/cover/920005/ >> >> >> On 5/7/2018 4:09 PM, Sridhar Samudrala wrote: >> > This feature bit can be used by hypervisor to indicate virtio_net device to >> > act as a standby for another device with the same MAC address. >> > >> > I tested this with a small change to the patch to mark the STANDBY feature 'true' >> > by default as i am using libvirt to start the VMs. >> > Is there a way to pass the newly added feature bit 'standby' to qemu via libvirt >> > XML file? >> > >> > Signed-off-by: Sridhar Samudrala <sridhar.samudrala@intel.com> >> > --- >> > hw/net/virtio-net.c | 2 ++ >> > include/standard-headers/linux/virtio_net.h | 3 +++ >> > 2 files changed, 5 insertions(+) >> > >> > diff --git a/hw/net/virtio-net.c b/hw/net/virtio-net.c >> > index 90502fca7c..38b3140670 100644 >> > --- a/hw/net/virtio-net.c >> > +++ b/hw/net/virtio-net.c >> > @@ -2198,6 +2198,8 @@ static Property virtio_net_properties[] = { >> > true), >> > DEFINE_PROP_INT32("speed", VirtIONet, net_conf.speed, SPEED_UNKNOWN), >> > DEFINE_PROP_STRING("duplex", VirtIONet, net_conf.duplex_str), >> > + DEFINE_PROP_BIT64("standby", VirtIONet, host_features, VIRTIO_NET_F_STANDBY, >> > + false), >> > DEFINE_PROP_END_OF_LIST(), >> > }; >> > diff --git a/include/standard-headers/linux/virtio_net.h b/include/standard-headers/linux/virtio_net.h >> > index e9f255ea3f..01ec09684c 100644 >> > --- a/include/standard-headers/linux/virtio_net.h >> > +++ b/include/standard-headers/linux/virtio_net.h >> > @@ -57,6 +57,9 @@ >> > * Steering */ >> > #define VIRTIO_NET_F_CTRL_MAC_ADDR 23 /* Set MAC address */ >> > +#define VIRTIO_NET_F_STANDBY 62 /* Act as standby for another device >> > + * with the same MAC. >> > + */ >> > #define VIRTIO_NET_F_SPEED_DUPLEX 63 /* Device set linkspeed and duplex */ >> > #ifndef VIRTIO_NET_NO_LEGACY > > --------------------------------------------------------------------- > To unsubscribe, e-mail: virtio-dev-unsubscribe@lists.oasis-open.org > For additional commands, e-mail: virtio-dev-help@lists.oasis-open.org >
On Tue, Jun 05, 2018 at 02:16:44PM -0700, Siwei Liu wrote: > Good to see this discussion going. I share the same feeling that the > decision of plugging the primary (passthrough) should only be made > until guest driver acknowledges DRIVER_OK and _F_STANDBY. > Architecturally this intelligence should be baken to QEMU itself > rather than moving up to management stack, such as libvirt. > > A few questions in line below. > > On Tue, Jun 5, 2018 at 5:33 AM, Michael S. Tsirkin <mst@redhat.com> wrote: > > I don't think this is sufficient. > > > > If both primary and standby devices are present, a legacy guest without > > support for the feature might see two devices with same mac and get > > confused. > > > > I think that we should only make primary visible after guest acked the > > backup feature bit. > > > > And on reset or when backup is cleared in some other way, unplug the > > primary. > > > > Something like the below will do the job: > > > > Primary device is added with a special "primary-failover" flag. > > I wonder if you envision this flag as a user interface or some > internal attribute/property to QEMU device? > > Since QEMU needs to associate this particular primary-failover device > with a virtio standby instance for checking DRIVER_OK as you indicate > below, I wonder if the group ID thing will be helpful to set > "primary-failover" flag internally without having to add another > option in CLI? I haven't thought about it but it's an option. > > A virtual machine is then initialized with just a standby virtio > > device. Primary is not yet added. > > > > Later QEMU detects that guest driver device set DRIVER_OK. > > It then exposes the primary device to the guest, and triggers > > a device addition event (hot-plug event) for it. > > Sounds OK. While there might be a small window the guest already > starts to use virtio interface before the passthrough gets plugged in, > I think that should be fine. > > Another option is to wait until the primary appears and gets > registered in the guest, so it can bring up the upper failover > interface. We can't be sure this will ever happen, can we? We might be asked to migrate at any time. > > > > If QEMU detects guest driver removal, it initiates a hot-unplug sequence > > to remove the primary driver. In particular, if QEMU detects guest > > re-initialization (e.g. by detecting guest reset) it immediately removes > > the primary device. > > Agreed. > > For legacy guest, user might prefer seeing a single passthrough device > rather than a virtio device. I would hope there's an option to allow > it to happen, instead of removing all passthrough devices. > > Regards, > -Siwei I don't see a way to make it work since then you can't migrate, can you? The way I see it, if you don't need migration, just use PT without failover. If you do, then you either use virtio or failover if guest supports it. > > > > We can move some of this code to management as well, architecturally it > > does not make too much sense but it might be easier implementation-wise. > > > > HTH > > > > On Mon, Jun 04, 2018 at 06:41:48PM -0700, Samudrala, Sridhar wrote: > >> Ping on this patch now that the kernel patches are accepted into davem's net-next tree. > >> https://patchwork.ozlabs.org/cover/920005/ > >> > >> > >> On 5/7/2018 4:09 PM, Sridhar Samudrala wrote: > >> > This feature bit can be used by hypervisor to indicate virtio_net device to > >> > act as a standby for another device with the same MAC address. > >> > > >> > I tested this with a small change to the patch to mark the STANDBY feature 'true' > >> > by default as i am using libvirt to start the VMs. > >> > Is there a way to pass the newly added feature bit 'standby' to qemu via libvirt > >> > XML file? > >> > > >> > Signed-off-by: Sridhar Samudrala <sridhar.samudrala@intel.com> > >> > --- > >> > hw/net/virtio-net.c | 2 ++ > >> > include/standard-headers/linux/virtio_net.h | 3 +++ > >> > 2 files changed, 5 insertions(+) > >> > > >> > diff --git a/hw/net/virtio-net.c b/hw/net/virtio-net.c > >> > index 90502fca7c..38b3140670 100644 > >> > --- a/hw/net/virtio-net.c > >> > +++ b/hw/net/virtio-net.c > >> > @@ -2198,6 +2198,8 @@ static Property virtio_net_properties[] = { > >> > true), > >> > DEFINE_PROP_INT32("speed", VirtIONet, net_conf.speed, SPEED_UNKNOWN), > >> > DEFINE_PROP_STRING("duplex", VirtIONet, net_conf.duplex_str), > >> > + DEFINE_PROP_BIT64("standby", VirtIONet, host_features, VIRTIO_NET_F_STANDBY, > >> > + false), > >> > DEFINE_PROP_END_OF_LIST(), > >> > }; > >> > diff --git a/include/standard-headers/linux/virtio_net.h b/include/standard-headers/linux/virtio_net.h > >> > index e9f255ea3f..01ec09684c 100644 > >> > --- a/include/standard-headers/linux/virtio_net.h > >> > +++ b/include/standard-headers/linux/virtio_net.h > >> > @@ -57,6 +57,9 @@ > >> > * Steering */ > >> > #define VIRTIO_NET_F_CTRL_MAC_ADDR 23 /* Set MAC address */ > >> > +#define VIRTIO_NET_F_STANDBY 62 /* Act as standby for another device > >> > + * with the same MAC. > >> > + */ > >> > #define VIRTIO_NET_F_SPEED_DUPLEX 63 /* Device set linkspeed and duplex */ > >> > #ifndef VIRTIO_NET_NO_LEGACY > > > > --------------------------------------------------------------------- > > To unsubscribe, e-mail: virtio-dev-unsubscribe@lists.oasis-open.org > > For additional commands, e-mail: virtio-dev-help@lists.oasis-open.org > >
On Tue, Jun 5, 2018 at 2:32 PM, Michael S. Tsirkin <mst@redhat.com> wrote: > On Tue, Jun 05, 2018 at 02:16:44PM -0700, Siwei Liu wrote: >> Good to see this discussion going. I share the same feeling that the >> decision of plugging the primary (passthrough) should only be made >> until guest driver acknowledges DRIVER_OK and _F_STANDBY. >> Architecturally this intelligence should be baken to QEMU itself >> rather than moving up to management stack, such as libvirt. >> >> A few questions in line below. >> >> On Tue, Jun 5, 2018 at 5:33 AM, Michael S. Tsirkin <mst@redhat.com> wrote: >> > I don't think this is sufficient. >> > >> > If both primary and standby devices are present, a legacy guest without >> > support for the feature might see two devices with same mac and get >> > confused. >> > >> > I think that we should only make primary visible after guest acked the >> > backup feature bit. >> > >> > And on reset or when backup is cleared in some other way, unplug the >> > primary. >> > >> > Something like the below will do the job: >> > >> > Primary device is added with a special "primary-failover" flag. >> >> I wonder if you envision this flag as a user interface or some >> internal attribute/property to QEMU device? >> >> Since QEMU needs to associate this particular primary-failover device >> with a virtio standby instance for checking DRIVER_OK as you indicate >> below, I wonder if the group ID thing will be helpful to set >> "primary-failover" flag internally without having to add another >> option in CLI? > > I haven't thought about it but it's an option. > >> > A virtual machine is then initialized with just a standby virtio >> > device. Primary is not yet added. >> > >> > Later QEMU detects that guest driver device set DRIVER_OK. >> > It then exposes the primary device to the guest, and triggers >> > a device addition event (hot-plug event) for it. >> >> Sounds OK. While there might be a small window the guest already >> starts to use virtio interface before the passthrough gets plugged in, >> I think that should be fine. >> >> Another option is to wait until the primary appears and gets >> registered in the guest, so it can bring up the upper failover >> interface. > > We can't be sure this will ever happen, can we? We might be asked to > migrate at any time. Right. For that to work, virtio driver needs a complicated state tracking hand-in-hand with the host for VF's hot plugging activity. If hot plugging is defered due to migration activity the failover interface can be brought up immediately. As said, this is just an option. I don't have strong preference here. > >> > >> > If QEMU detects guest driver removal, it initiates a hot-unplug sequence >> > to remove the primary driver. In particular, if QEMU detects guest >> > re-initialization (e.g. by detecting guest reset) it immediately removes >> > the primary device. >> >> Agreed. >> >> For legacy guest, user might prefer seeing a single passthrough device >> rather than a virtio device. I would hope there's an option to allow >> it to happen, instead of removing all passthrough devices. >> >> Regards, >> -Siwei > > I don't see a way to make it work since then you can't migrate, can you? The thing is cloud service provider might prefer sticking to the same level of service agreement (SLA) of keeping VF over migration, particularly when guest OS or kernel does not have the support.The footprint in guest OS support might NOT be prevailing in the early days. There's no point the cloud vendor can switch back and forth, as there's no way for hypervisor to detect what kernel version even OS the guest will be running ahead of time, until the guest gets fully loaded and boots up, right? -Siwei > > The way I see it, if you don't need migration, just use PT without > failover. If you do, then you either use virtio or failover if guest > supports it. > >> > >> > We can move some of this code to management as well, architecturally it >> > does not make too much sense but it might be easier implementation-wise. >> > >> > HTH >> > >> > On Mon, Jun 04, 2018 at 06:41:48PM -0700, Samudrala, Sridhar wrote: >> >> Ping on this patch now that the kernel patches are accepted into davem's net-next tree. >> >> https://patchwork.ozlabs.org/cover/920005/ >> >> >> >> >> >> On 5/7/2018 4:09 PM, Sridhar Samudrala wrote: >> >> > This feature bit can be used by hypervisor to indicate virtio_net device to >> >> > act as a standby for another device with the same MAC address. >> >> > >> >> > I tested this with a small change to the patch to mark the STANDBY feature 'true' >> >> > by default as i am using libvirt to start the VMs. >> >> > Is there a way to pass the newly added feature bit 'standby' to qemu via libvirt >> >> > XML file? >> >> > >> >> > Signed-off-by: Sridhar Samudrala <sridhar.samudrala@intel.com> >> >> > --- >> >> > hw/net/virtio-net.c | 2 ++ >> >> > include/standard-headers/linux/virtio_net.h | 3 +++ >> >> > 2 files changed, 5 insertions(+) >> >> > >> >> > diff --git a/hw/net/virtio-net.c b/hw/net/virtio-net.c >> >> > index 90502fca7c..38b3140670 100644 >> >> > --- a/hw/net/virtio-net.c >> >> > +++ b/hw/net/virtio-net.c >> >> > @@ -2198,6 +2198,8 @@ static Property virtio_net_properties[] = { >> >> > true), >> >> > DEFINE_PROP_INT32("speed", VirtIONet, net_conf.speed, SPEED_UNKNOWN), >> >> > DEFINE_PROP_STRING("duplex", VirtIONet, net_conf.duplex_str), >> >> > + DEFINE_PROP_BIT64("standby", VirtIONet, host_features, VIRTIO_NET_F_STANDBY, >> >> > + false), >> >> > DEFINE_PROP_END_OF_LIST(), >> >> > }; >> >> > diff --git a/include/standard-headers/linux/virtio_net.h b/include/standard-headers/linux/virtio_net.h >> >> > index e9f255ea3f..01ec09684c 100644 >> >> > --- a/include/standard-headers/linux/virtio_net.h >> >> > +++ b/include/standard-headers/linux/virtio_net.h >> >> > @@ -57,6 +57,9 @@ >> >> > * Steering */ >> >> > #define VIRTIO_NET_F_CTRL_MAC_ADDR 23 /* Set MAC address */ >> >> > +#define VIRTIO_NET_F_STANDBY 62 /* Act as standby for another device >> >> > + * with the same MAC. >> >> > + */ >> >> > #define VIRTIO_NET_F_SPEED_DUPLEX 63 /* Device set linkspeed and duplex */ >> >> > #ifndef VIRTIO_NET_NO_LEGACY >> > >> > --------------------------------------------------------------------- >> > To unsubscribe, e-mail: virtio-dev-unsubscribe@lists.oasis-open.org >> > For additional commands, e-mail: virtio-dev-help@lists.oasis-open.org >> >
On 2018年06月05日 20:33, Michael S. Tsirkin wrote: > I don't think this is sufficient. > > If both primary and standby devices are present, a legacy guest without > support for the feature might see two devices with same mac and get > confused. > > I think that we should only make primary visible after guest acked the > backup feature bit. I think we want exactly the reverse? E.g fail the negotiation when guest does not ack backup feature. Otherwise legacy guest won't even have the chance to see primary device in the guest. > > And on reset or when backup is cleared in some other way, unplug the > primary. What if guest does not support hotplug? > > Something like the below will do the job: > > Primary device is added with a special "primary-failover" flag. > A virtual machine is then initialized with just a standby virtio > device. Primary is not yet added. A question is how to do the matching? Qemu knows nothing about e.g mac address of a pass-through device I believe? > > Later QEMU detects that guest driver device set DRIVER_OK. > It then exposes the primary device to the guest, and triggers > a device addition event (hot-plug event) for it. Do you mean we won't have primary device in the initial qemu cli? > > If QEMU detects guest driver removal, it initiates a hot-unplug sequence > to remove the primary driver. In particular, if QEMU detects guest > re-initialization (e.g. by detecting guest reset) it immediately removes > the primary device. I believe guest failover module should handle this gracefully? Thanks > > We can move some of this code to management as well, architecturally it > does not make too much sense but it might be easier implementation-wise. > > HTH > > On Mon, Jun 04, 2018 at 06:41:48PM -0700, Samudrala, Sridhar wrote: >> Ping on this patch now that the kernel patches are accepted into davem's net-next tree. >> https://patchwork.ozlabs.org/cover/920005/ >> >> >> On 5/7/2018 4:09 PM, Sridhar Samudrala wrote: >>> This feature bit can be used by hypervisor to indicate virtio_net device to >>> act as a standby for another device with the same MAC address. >>> >>> I tested this with a small change to the patch to mark the STANDBY feature 'true' >>> by default as i am using libvirt to start the VMs. >>> Is there a way to pass the newly added feature bit 'standby' to qemu via libvirt >>> XML file? >>> >>> Signed-off-by: Sridhar Samudrala <sridhar.samudrala@intel.com> >>> --- >>> hw/net/virtio-net.c | 2 ++ >>> include/standard-headers/linux/virtio_net.h | 3 +++ >>> 2 files changed, 5 insertions(+) >>> >>> diff --git a/hw/net/virtio-net.c b/hw/net/virtio-net.c >>> index 90502fca7c..38b3140670 100644 >>> --- a/hw/net/virtio-net.c >>> +++ b/hw/net/virtio-net.c >>> @@ -2198,6 +2198,8 @@ static Property virtio_net_properties[] = { >>> true), >>> DEFINE_PROP_INT32("speed", VirtIONet, net_conf.speed, SPEED_UNKNOWN), >>> DEFINE_PROP_STRING("duplex", VirtIONet, net_conf.duplex_str), >>> + DEFINE_PROP_BIT64("standby", VirtIONet, host_features, VIRTIO_NET_F_STANDBY, >>> + false), >>> DEFINE_PROP_END_OF_LIST(), >>> }; >>> diff --git a/include/standard-headers/linux/virtio_net.h b/include/standard-headers/linux/virtio_net.h >>> index e9f255ea3f..01ec09684c 100644 >>> --- a/include/standard-headers/linux/virtio_net.h >>> +++ b/include/standard-headers/linux/virtio_net.h >>> @@ -57,6 +57,9 @@ >>> * Steering */ >>> #define VIRTIO_NET_F_CTRL_MAC_ADDR 23 /* Set MAC address */ >>> +#define VIRTIO_NET_F_STANDBY 62 /* Act as standby for another device >>> + * with the same MAC. >>> + */ >>> #define VIRTIO_NET_F_SPEED_DUPLEX 63 /* Device set linkspeed and duplex */ >>> #ifndef VIRTIO_NET_NO_LEGACY
On 6/4/2018 7:06 PM, Jason Wang wrote: > > > On 2018年06月05日 09:41, Samudrala, Sridhar wrote: >> Ping on this patch now that the kernel patches are accepted into >> davem's net-next tree. >> https://patchwork.ozlabs.org/cover/920005/ >> >> >> On 5/7/2018 4:09 PM, Sridhar Samudrala wrote: >>> This feature bit can be used by hypervisor to indicate virtio_net >>> device to >>> act as a standby for another device with the same MAC address. >>> >>> I tested this with a small change to the patch to mark the STANDBY >>> feature 'true' >>> by default as i am using libvirt to start the VMs. >>> Is there a way to pass the newly added feature bit 'standby' to qemu >>> via libvirt >>> XML file? >>> > > Maybe you can try qemu command line passthrough: > > https://libvirt.org/drvqemu.html#qemucommand It looks like this can be used to pass command line arguments to qemu. Is it possible to specify a virtio specific attribute via this method? For ex: to say mrg_rxbuf is off we can add the following line to virtio section of the domain xml file. <host mrg_rxbuf='off'/> I think libvirt needs to be extended to to support the new 'standby' attribute via this mechanism. Adding Liane Stump and libvirt to the CC list. Michael, Can we start with getting this patch into Qemu and an update to libvirt to support the 'standby' feature so that this feature can be enabled via some scripts/orchestration layer for now. We could improve this solution by enhancing Qemu to do automatic management of the addition/deletion of the primary device based on feature negotiation as a later patch. > > The patch looks good to me. Let's see if Michael have any comment. > > Thanks > >>> Signed-off-by: Sridhar Samudrala <sridhar.samudrala@intel.com> >>> --- >>> hw/net/virtio-net.c | 2 ++ >>> include/standard-headers/linux/virtio_net.h | 3 +++ >>> 2 files changed, 5 insertions(+) >>> >>> diff --git a/hw/net/virtio-net.c b/hw/net/virtio-net.c >>> index 90502fca7c..38b3140670 100644 >>> --- a/hw/net/virtio-net.c >>> +++ b/hw/net/virtio-net.c >>> @@ -2198,6 +2198,8 @@ static Property virtio_net_properties[] = { >>> true), >>> DEFINE_PROP_INT32("speed", VirtIONet, net_conf.speed, >>> SPEED_UNKNOWN), >>> DEFINE_PROP_STRING("duplex", VirtIONet, net_conf.duplex_str), >>> + DEFINE_PROP_BIT64("standby", VirtIONet, host_features, >>> VIRTIO_NET_F_STANDBY, >>> + false), >>> DEFINE_PROP_END_OF_LIST(), >>> }; >>> diff --git a/include/standard-headers/linux/virtio_net.h >>> b/include/standard-headers/linux/virtio_net.h >>> index e9f255ea3f..01ec09684c 100644 >>> --- a/include/standard-headers/linux/virtio_net.h >>> +++ b/include/standard-headers/linux/virtio_net.h >>> @@ -57,6 +57,9 @@ >>> * Steering */ >>> #define VIRTIO_NET_F_CTRL_MAC_ADDR 23 /* Set MAC address */ >>> +#define VIRTIO_NET_F_STANDBY 62 /* Act as standby for >>> another device >>> + * with the same MAC. >>> + */ >>> #define VIRTIO_NET_F_SPEED_DUPLEX 63 /* Device set linkspeed >>> and duplex */ >>> #ifndef VIRTIO_NET_NO_LEGACY >> >
On Wed, Jun 06, 2018 at 11:17:36AM -0700, Samudrala, Sridhar wrote: >On 6/4/2018 7:06 PM, Jason Wang wrote: >> >> >> On 2018年06月05日 09:41, Samudrala, Sridhar wrote: >>> Ping on this patch now that the kernel patches are accepted into >>> davem's net-next tree. >>> https://patchwork.ozlabs.org/cover/920005/ >>> >>> >>> On 5/7/2018 4:09 PM, Sridhar Samudrala wrote: >>>> This feature bit can be used by hypervisor to indicate virtio_net >>>> device to >>>> act as a standby for another device with the same MAC address. >>>> >>>> I tested this with a small change to the patch to mark the STANDBY >>>> feature 'true' >>>> by default as i am using libvirt to start the VMs. >>>> Is there a way to pass the newly added feature bit 'standby' to qemu >>>> via libvirt >>>> XML file? >>>> >> >> Maybe you can try qemu command line passthrough: >> >> https://libvirt.org/drvqemu.html#qemucommand > >It looks like this can be used to pass command line arguments to qemu. >Is it possible to specify a virtio specific attribute via this method? > Yes, for testing purposes you should be able to do this via using QEMU's -set command line argument: http://blog.vmsplice.net/2011/04/how-to-pass-qemu-command-line-options.html i.e.: <domain type='kvm' xmlns:qemu='http://libvirt.org/schemas/domain/qemu/1.0'> ... <qemu:commandline> <qemu:arg value='-set'/> <qemu:arg value='device.net0.standby=on'/> </qemu:commandline> </domain> >For ex: to say mrg_rxbuf is off we can add the following line to virtio >section of the domain xml file. > <host mrg_rxbuf='off'/> > >I think libvirt needs to be extended to to support the new 'standby' attribute >via this mechanism. >Adding Liane Stump and libvirt to the CC list. *Laine > >Michael, >Can we start with getting this patch into Qemu and an update to libvirt to >support the 'standby' feature so that this feature can be enabled via >some scripts/orchestration layer for now. > >We could improve this solution by enhancing Qemu to do automatic management of the >addition/deletion of the primary device based on feature negotiation as a later patch. > If that means the libvirt attribute would no longer be needed, I don't see the reason to add it to libvirt in the first place. Jano
On Wed, Jun 06, 2018 at 11:17:36AM -0700, Samudrala, Sridhar wrote: > On 6/4/2018 7:06 PM, Jason Wang wrote: > > > > > > On 2018年06月05日 09:41, Samudrala, Sridhar wrote: > > > Ping on this patch now that the kernel patches are accepted into > > > davem's net-next tree. > > > https://patchwork.ozlabs.org/cover/920005/ > > > > > > > > > On 5/7/2018 4:09 PM, Sridhar Samudrala wrote: > > > > This feature bit can be used by hypervisor to indicate > > > > virtio_net device to > > > > act as a standby for another device with the same MAC address. > > > > > > > > I tested this with a small change to the patch to mark the > > > > STANDBY feature 'true' > > > > by default as i am using libvirt to start the VMs. > > > > Is there a way to pass the newly added feature bit 'standby' to > > > > qemu via libvirt > > > > XML file? > > > > > > > > Maybe you can try qemu command line passthrough: > > > > https://libvirt.org/drvqemu.html#qemucommand > > It looks like this can be used to pass command line arguments to qemu. > Is it possible to specify a virtio specific attribute via this method? > > For ex: to say mrg_rxbuf is off we can add the following line to virtio > section of the domain xml file. > <host mrg_rxbuf='off'/> > > I think libvirt needs to be extended to to support the new 'standby' attribute > via this mechanism. > Adding Liane Stump and libvirt to the CC list. > > Michael, > Can we start with getting this patch into Qemu and an update to libvirt to > support the 'standby' feature so that this feature can be enabled via > some scripts/orchestration layer for now. Unfortunately we don't give the hypothetical orchestration layer enough info about driver being bound, so it does not know when is it safe to add a primary. And a similar issue around reset. We could add events for that which would be a small patch, but the issue then is that guest can trigger a storm of these events. > We could improve this solution by enhancing Qemu to do automatic management of the > addition/deletion of the primary device based on feature negotiation as a later patch. I'm not sure what the point of the back and forth would be though. Typically after people invest in a specific config and get it working, it's very hard to move them to another solution. > > > > > The patch looks good to me. Let's see if Michael have any comment. > > > > Thanks > > > > > > Signed-off-by: Sridhar Samudrala <sridhar.samudrala@intel.com> > > > > --- > > > > hw/net/virtio-net.c | 2 ++ > > > > include/standard-headers/linux/virtio_net.h | 3 +++ > > > > 2 files changed, 5 insertions(+) > > > > > > > > diff --git a/hw/net/virtio-net.c b/hw/net/virtio-net.c > > > > index 90502fca7c..38b3140670 100644 > > > > --- a/hw/net/virtio-net.c > > > > +++ b/hw/net/virtio-net.c > > > > @@ -2198,6 +2198,8 @@ static Property virtio_net_properties[] = { > > > > true), > > > > DEFINE_PROP_INT32("speed", VirtIONet, net_conf.speed, > > > > SPEED_UNKNOWN), > > > > DEFINE_PROP_STRING("duplex", VirtIONet, net_conf.duplex_str), > > > > + DEFINE_PROP_BIT64("standby", VirtIONet, host_features, > > > > VIRTIO_NET_F_STANDBY, > > > > + false), > > > > DEFINE_PROP_END_OF_LIST(), > > > > }; > > > > diff --git a/include/standard-headers/linux/virtio_net.h > > > > b/include/standard-headers/linux/virtio_net.h > > > > index e9f255ea3f..01ec09684c 100644 > > > > --- a/include/standard-headers/linux/virtio_net.h > > > > +++ b/include/standard-headers/linux/virtio_net.h > > > > @@ -57,6 +57,9 @@ > > > > * Steering */ > > > > #define VIRTIO_NET_F_CTRL_MAC_ADDR 23 /* Set MAC address */ > > > > +#define VIRTIO_NET_F_STANDBY 62 /* Act as standby for > > > > another device > > > > + * with the same MAC. > > > > + */ > > > > #define VIRTIO_NET_F_SPEED_DUPLEX 63 /* Device set > > > > linkspeed and duplex */ > > > > #ifndef VIRTIO_NET_NO_LEGACY > > > > >
On 6/6/2018 11:52 AM, Ján Tomko wrote: > On Wed, Jun 06, 2018 at 11:17:36AM -0700, Samudrala, Sridhar wrote: >> On 6/4/2018 7:06 PM, Jason Wang wrote: >>> >>> >>> On 2018年06月05日 09:41, Samudrala, Sridhar wrote: >>>> Ping on this patch now that the kernel patches are accepted into >>>> davem's net-next tree. >>>> https://patchwork.ozlabs.org/cover/920005/ >>>> >>>> >>>> On 5/7/2018 4:09 PM, Sridhar Samudrala wrote: >>>>> This feature bit can be used by hypervisor to indicate virtio_net >>>>> device to >>>>> act as a standby for another device with the same MAC address. >>>>> >>>>> I tested this with a small change to the patch to mark the STANDBY >>>>> feature 'true' >>>>> by default as i am using libvirt to start the VMs. >>>>> Is there a way to pass the newly added feature bit 'standby' to qemu >>>>> via libvirt >>>>> XML file? >>>>> >>> >>> Maybe you can try qemu command line passthrough: >>> >>> https://libvirt.org/drvqemu.html#qemucommand >> >> It looks like this can be used to pass command line arguments to qemu. >> Is it possible to specify a virtio specific attribute via this method? >> > > Yes, for testing purposes you should be able to do this via using QEMU's > -set command line argument: > http://blog.vmsplice.net/2011/04/how-to-pass-qemu-command-line-options.html > > i.e.: > > <domain type='kvm' > xmlns:qemu='http://libvirt.org/schemas/domain/qemu/1.0'> > ... > <qemu:commandline> > <qemu:arg value='-set'/> > <qemu:arg value='device.net0.standby=on'/> > </qemu:commandline> > </domain> Thanks. Will try this. > >> For ex: to say mrg_rxbuf is off we can add the following line to virtio >> section of the domain xml file. >> <host mrg_rxbuf='off'/> >> >> I think libvirt needs to be extended to to support the new 'standby' >> attribute >> via this mechanism. >> Adding Liane Stump and libvirt to the CC list. > > *Laine > >> >> Michael, >> Can we start with getting this patch into Qemu and an update to >> libvirt to >> support the 'standby' feature so that this feature can be enabled via >> some scripts/orchestration layer for now. >> >> We could improve this solution by enhancing Qemu to do automatic >> management of the >> addition/deletion of the primary device based on feature negotiation >> as a later patch. >> > > If that means the libvirt attribute would no longer be needed, I don't > see the reason to add it to libvirt in the first place. I think we still need this attribute supported via libvirt so that a user/admin can enable this feature via domain XML specification.
On Mon, May 07, 2018 at 04:09:54PM -0700, Sridhar Samudrala wrote: > This feature bit can be used by hypervisor to indicate virtio_net device to > act as a standby for another device with the same MAC address. > > I tested this with a small change to the patch to mark the STANDBY feature 'true' > by default as i am using libvirt to start the VMs. > Is there a way to pass the newly added feature bit 'standby' to qemu via libvirt > XML file? > > Signed-off-by: Sridhar Samudrala <sridhar.samudrala@intel.com> So I do not think we can commit to this interface: we really need to control visibility of the primary device. However just for testing purposes, we could add a non-stable interface "x-standby" with the understanding that as any x- prefix it's unstable and will be changed down the road, likely in the next release. > --- > hw/net/virtio-net.c | 2 ++ > include/standard-headers/linux/virtio_net.h | 3 +++ > 2 files changed, 5 insertions(+) > > diff --git a/hw/net/virtio-net.c b/hw/net/virtio-net.c > index 90502fca7c..38b3140670 100644 > --- a/hw/net/virtio-net.c > +++ b/hw/net/virtio-net.c > @@ -2198,6 +2198,8 @@ static Property virtio_net_properties[] = { > true), > DEFINE_PROP_INT32("speed", VirtIONet, net_conf.speed, SPEED_UNKNOWN), > DEFINE_PROP_STRING("duplex", VirtIONet, net_conf.duplex_str), > + DEFINE_PROP_BIT64("standby", VirtIONet, host_features, VIRTIO_NET_F_STANDBY, > + false), > DEFINE_PROP_END_OF_LIST(), > }; > > diff --git a/include/standard-headers/linux/virtio_net.h b/include/standard-headers/linux/virtio_net.h > index e9f255ea3f..01ec09684c 100644 > --- a/include/standard-headers/linux/virtio_net.h > +++ b/include/standard-headers/linux/virtio_net.h > @@ -57,6 +57,9 @@ > * Steering */ > #define VIRTIO_NET_F_CTRL_MAC_ADDR 23 /* Set MAC address */ > > +#define VIRTIO_NET_F_STANDBY 62 /* Act as standby for another device > + * with the same MAC. > + */ > #define VIRTIO_NET_F_SPEED_DUPLEX 63 /* Device set linkspeed and duplex */ > > #ifndef VIRTIO_NET_NO_LEGACY > -- > 2.14.3
On 2018年06月12日 01:26, Michael S. Tsirkin wrote: > On Mon, May 07, 2018 at 04:09:54PM -0700, Sridhar Samudrala wrote: >> This feature bit can be used by hypervisor to indicate virtio_net device to >> act as a standby for another device with the same MAC address. >> >> I tested this with a small change to the patch to mark the STANDBY feature 'true' >> by default as i am using libvirt to start the VMs. >> Is there a way to pass the newly added feature bit 'standby' to qemu via libvirt >> XML file? >> >> Signed-off-by: Sridhar Samudrala <sridhar.samudrala@intel.com> > So I do not think we can commit to this interface: we > really need to control visibility of the primary device. The problem is legacy guest won't use primary device at all if we do this. How about control the visibility of standby device? Thanks > However just for testing purposes, we could add a non-stable > interface "x-standby" with the understanding that as any > x- prefix it's unstable and will be changed down the road, > likely in the next release. > > >> --- >> hw/net/virtio-net.c | 2 ++ >> include/standard-headers/linux/virtio_net.h | 3 +++ >> 2 files changed, 5 insertions(+) >> >> diff --git a/hw/net/virtio-net.c b/hw/net/virtio-net.c >> index 90502fca7c..38b3140670 100644 >> --- a/hw/net/virtio-net.c >> +++ b/hw/net/virtio-net.c >> @@ -2198,6 +2198,8 @@ static Property virtio_net_properties[] = { >> true), >> DEFINE_PROP_INT32("speed", VirtIONet, net_conf.speed, SPEED_UNKNOWN), >> DEFINE_PROP_STRING("duplex", VirtIONet, net_conf.duplex_str), >> + DEFINE_PROP_BIT64("standby", VirtIONet, host_features, VIRTIO_NET_F_STANDBY, >> + false), >> DEFINE_PROP_END_OF_LIST(), >> }; >> >> diff --git a/include/standard-headers/linux/virtio_net.h b/include/standard-headers/linux/virtio_net.h >> index e9f255ea3f..01ec09684c 100644 >> --- a/include/standard-headers/linux/virtio_net.h >> +++ b/include/standard-headers/linux/virtio_net.h >> @@ -57,6 +57,9 @@ >> * Steering */ >> #define VIRTIO_NET_F_CTRL_MAC_ADDR 23 /* Set MAC address */ >> >> +#define VIRTIO_NET_F_STANDBY 62 /* Act as standby for another device >> + * with the same MAC. >> + */ >> #define VIRTIO_NET_F_SPEED_DUPLEX 63 /* Device set linkspeed and duplex */ >> >> #ifndef VIRTIO_NET_NO_LEGACY >> -- >> 2.14.3
On Tue, Jun 12, 2018 at 09:54:44AM +0800, Jason Wang wrote: > > > On 2018年06月12日 01:26, Michael S. Tsirkin wrote: > > On Mon, May 07, 2018 at 04:09:54PM -0700, Sridhar Samudrala wrote: > > > This feature bit can be used by hypervisor to indicate virtio_net device to > > > act as a standby for another device with the same MAC address. > > > > > > I tested this with a small change to the patch to mark the STANDBY feature 'true' > > > by default as i am using libvirt to start the VMs. > > > Is there a way to pass the newly added feature bit 'standby' to qemu via libvirt > > > XML file? > > > > > > Signed-off-by: Sridhar Samudrala <sridhar.samudrala@intel.com> > > So I do not think we can commit to this interface: we > > really need to control visibility of the primary device. > > The problem is legacy guest won't use primary device at all if we do this. And that's by design - I think it's the only way to ensure the legacy guest isn't confused. > How about control the visibility of standby device? > > Thanks standy the always there to guarantee no downtime. > > However just for testing purposes, we could add a non-stable > > interface "x-standby" with the understanding that as any > > x- prefix it's unstable and will be changed down the road, > > likely in the next release. > > > > > > > --- > > > hw/net/virtio-net.c | 2 ++ > > > include/standard-headers/linux/virtio_net.h | 3 +++ > > > 2 files changed, 5 insertions(+) > > > > > > diff --git a/hw/net/virtio-net.c b/hw/net/virtio-net.c > > > index 90502fca7c..38b3140670 100644 > > > --- a/hw/net/virtio-net.c > > > +++ b/hw/net/virtio-net.c > > > @@ -2198,6 +2198,8 @@ static Property virtio_net_properties[] = { > > > true), > > > DEFINE_PROP_INT32("speed", VirtIONet, net_conf.speed, SPEED_UNKNOWN), > > > DEFINE_PROP_STRING("duplex", VirtIONet, net_conf.duplex_str), > > > + DEFINE_PROP_BIT64("standby", VirtIONet, host_features, VIRTIO_NET_F_STANDBY, > > > + false), > > > DEFINE_PROP_END_OF_LIST(), > > > }; > > > diff --git a/include/standard-headers/linux/virtio_net.h b/include/standard-headers/linux/virtio_net.h > > > index e9f255ea3f..01ec09684c 100644 > > > --- a/include/standard-headers/linux/virtio_net.h > > > +++ b/include/standard-headers/linux/virtio_net.h > > > @@ -57,6 +57,9 @@ > > > * Steering */ > > > #define VIRTIO_NET_F_CTRL_MAC_ADDR 23 /* Set MAC address */ > > > +#define VIRTIO_NET_F_STANDBY 62 /* Act as standby for another device > > > + * with the same MAC. > > > + */ > > > #define VIRTIO_NET_F_SPEED_DUPLEX 63 /* Device set linkspeed and duplex */ > > > #ifndef VIRTIO_NET_NO_LEGACY > > > -- > > > 2.14.3
On 6/11/2018 7:17 PM, Michael S. Tsirkin wrote: > On Tue, Jun 12, 2018 at 09:54:44AM +0800, Jason Wang wrote: >> >> On 2018年06月12日 01:26, Michael S. Tsirkin wrote: >>> On Mon, May 07, 2018 at 04:09:54PM -0700, Sridhar Samudrala wrote: >>>> This feature bit can be used by hypervisor to indicate virtio_net device to >>>> act as a standby for another device with the same MAC address. >>>> >>>> I tested this with a small change to the patch to mark the STANDBY feature 'true' >>>> by default as i am using libvirt to start the VMs. >>>> Is there a way to pass the newly added feature bit 'standby' to qemu via libvirt >>>> XML file? >>>> >>>> Signed-off-by: Sridhar Samudrala <sridhar.samudrala@intel.com> >>> So I do not think we can commit to this interface: we >>> really need to control visibility of the primary device. >> The problem is legacy guest won't use primary device at all if we do this. > And that's by design - I think it's the only way to ensure the > legacy guest isn't confused. Yes. I think so. But i am not sure if Qemu is the right place to control the visibility of the primary device. The primary device may not be specified as an argument to Qemu. It may be plugged in later. The cloud service provider is providing a feature that enables low latency datapath and live migration capability. A tenant can use this feature only if he is running a VM that has virtio-net with failover support. I think Qemu should check if guest virtio-net supports this feature and provide a mechanism for an upper layer indicating if the STANDBY feature is successfully negotiated or not. The upper layer can then decide if it should hot plug a VF with the same MAC and manage the 2 links. If VF is successfully hot plugged, virtio-net link should be disabled. > >> How about control the visibility of standby device? >> >> Thanks > standy the always there to guarantee no downtime. > >>> However just for testing purposes, we could add a non-stable >>> interface "x-standby" with the understanding that as any >>> x- prefix it's unstable and will be changed down the road, >>> likely in the next release. >>> >>> >>>> --- >>>> hw/net/virtio-net.c | 2 ++ >>>> include/standard-headers/linux/virtio_net.h | 3 +++ >>>> 2 files changed, 5 insertions(+) >>>> >>>> diff --git a/hw/net/virtio-net.c b/hw/net/virtio-net.c >>>> index 90502fca7c..38b3140670 100644 >>>> --- a/hw/net/virtio-net.c >>>> +++ b/hw/net/virtio-net.c >>>> @@ -2198,6 +2198,8 @@ static Property virtio_net_properties[] = { >>>> true), >>>> DEFINE_PROP_INT32("speed", VirtIONet, net_conf.speed, SPEED_UNKNOWN), >>>> DEFINE_PROP_STRING("duplex", VirtIONet, net_conf.duplex_str), >>>> + DEFINE_PROP_BIT64("standby", VirtIONet, host_features, VIRTIO_NET_F_STANDBY, >>>> + false), >>>> DEFINE_PROP_END_OF_LIST(), >>>> }; >>>> diff --git a/include/standard-headers/linux/virtio_net.h b/include/standard-headers/linux/virtio_net.h >>>> index e9f255ea3f..01ec09684c 100644 >>>> --- a/include/standard-headers/linux/virtio_net.h >>>> +++ b/include/standard-headers/linux/virtio_net.h >>>> @@ -57,6 +57,9 @@ >>>> * Steering */ >>>> #define VIRTIO_NET_F_CTRL_MAC_ADDR 23 /* Set MAC address */ >>>> +#define VIRTIO_NET_F_STANDBY 62 /* Act as standby for another device >>>> + * with the same MAC. >>>> + */ >>>> #define VIRTIO_NET_F_SPEED_DUPLEX 63 /* Device set linkspeed and duplex */ >>>> #ifndef VIRTIO_NET_NO_LEGACY >>>> -- >>>> 2.14.3
On Mon, Jun 11, 2018 at 10:02:45PM -0700, Samudrala, Sridhar wrote: > On 6/11/2018 7:17 PM, Michael S. Tsirkin wrote: > > On Tue, Jun 12, 2018 at 09:54:44AM +0800, Jason Wang wrote: > > > > > > On 2018年06月12日 01:26, Michael S. Tsirkin wrote: > > > > On Mon, May 07, 2018 at 04:09:54PM -0700, Sridhar Samudrala wrote: > > > > > This feature bit can be used by hypervisor to indicate virtio_net device to > > > > > act as a standby for another device with the same MAC address. > > > > > > > > > > I tested this with a small change to the patch to mark the STANDBY feature 'true' > > > > > by default as i am using libvirt to start the VMs. > > > > > Is there a way to pass the newly added feature bit 'standby' to qemu via libvirt > > > > > XML file? > > > > > > > > > > Signed-off-by: Sridhar Samudrala <sridhar.samudrala@intel.com> > > > > So I do not think we can commit to this interface: we > > > > really need to control visibility of the primary device. > > > The problem is legacy guest won't use primary device at all if we do this. > > And that's by design - I think it's the only way to ensure the > > legacy guest isn't confused. > > Yes. I think so. But i am not sure if Qemu is the right place to control the visibility > of the primary device. The primary device may not be specified as an argument to Qemu. It > may be plugged in later. > The cloud service provider is providing a feature that enables low latency datapath and live > migration capability. > A tenant can use this feature only if he is running a VM that has virtio-net with failover support. Well live migration is there already. The new feature is low latency data path. And it's the guest that needs failover support not the VM. > I think Qemu should check if guest virtio-net supports this feature and provide a mechanism for > an upper layer indicating if the STANDBY feature is successfully negotiated or not. > The upper layer can then decide if it should hot plug a VF with the same MAC and manage the 2 links. > If VF is successfully hot plugged, virtio-net link should be disabled. Did you even talk to upper layer management about it? Just list the steps they need to do and you will see that's a lot of machinery to manage by the upper layer. What do we gain in flexibility? As far as I can see the only gain is some resources saved for legacy VMs. That's not a lot as tenant of the upper layer probably already has at least a hunch that it's a new guest otherwise why bother specifying the feature at all - you save even more resources without it. > > > > > > How about control the visibility of standby device? > > > > > > Thanks > > standy the always there to guarantee no downtime. > > > > > > However just for testing purposes, we could add a non-stable > > > > interface "x-standby" with the understanding that as any > > > > x- prefix it's unstable and will be changed down the road, > > > > likely in the next release. > > > > > > > > > > > > > --- > > > > > hw/net/virtio-net.c | 2 ++ > > > > > include/standard-headers/linux/virtio_net.h | 3 +++ > > > > > 2 files changed, 5 insertions(+) > > > > > > > > > > diff --git a/hw/net/virtio-net.c b/hw/net/virtio-net.c > > > > > index 90502fca7c..38b3140670 100644 > > > > > --- a/hw/net/virtio-net.c > > > > > +++ b/hw/net/virtio-net.c > > > > > @@ -2198,6 +2198,8 @@ static Property virtio_net_properties[] = { > > > > > true), > > > > > DEFINE_PROP_INT32("speed", VirtIONet, net_conf.speed, SPEED_UNKNOWN), > > > > > DEFINE_PROP_STRING("duplex", VirtIONet, net_conf.duplex_str), > > > > > + DEFINE_PROP_BIT64("standby", VirtIONet, host_features, VIRTIO_NET_F_STANDBY, > > > > > + false), > > > > > DEFINE_PROP_END_OF_LIST(), > > > > > }; > > > > > diff --git a/include/standard-headers/linux/virtio_net.h b/include/standard-headers/linux/virtio_net.h > > > > > index e9f255ea3f..01ec09684c 100644 > > > > > --- a/include/standard-headers/linux/virtio_net.h > > > > > +++ b/include/standard-headers/linux/virtio_net.h > > > > > @@ -57,6 +57,9 @@ > > > > > * Steering */ > > > > > #define VIRTIO_NET_F_CTRL_MAC_ADDR 23 /* Set MAC address */ > > > > > +#define VIRTIO_NET_F_STANDBY 62 /* Act as standby for another device > > > > > + * with the same MAC. > > > > > + */ > > > > > #define VIRTIO_NET_F_SPEED_DUPLEX 63 /* Device set linkspeed and duplex */ > > > > > #ifndef VIRTIO_NET_NO_LEGACY > > > > > -- > > > > > 2.14.3
On Tue, Jun 05, 2018 at 03:09:26PM -0700, Siwei Liu wrote: > The thing is cloud service provider might prefer sticking to the same > level of service agreement (SLA) of keeping VF over migration, That requirement is trivially satisfied by just a single VF :) If your SLA does not require live migration, you should do just that.
On Wed, Jun 06, 2018 at 10:29:03AM +0800, Jason Wang wrote: > > > On 2018年06月05日 20:33, Michael S. Tsirkin wrote: > > I don't think this is sufficient. > > > > If both primary and standby devices are present, a legacy guest without > > support for the feature might see two devices with same mac and get > > confused. > > > > I think that we should only make primary visible after guest acked the > > backup feature bit. > > I think we want exactly the reverse? E.g fail the negotiation when guest > does not ack backup feature. > > Otherwise legacy guest won't even have the chance to see primary device in > the guest. That's by design. > > > > And on reset or when backup is cleared in some other way, unplug the > > primary. > > What if guest does not support hotplug? It shouldn't ack the backup feature then and it's a good point. We should both document this and check kernel config has hotplug support. Sridhar could you take a look pls? > > > > Something like the below will do the job: > > > > Primary device is added with a special "primary-failover" flag. > > A virtual machine is then initialized with just a standby virtio > > device. Primary is not yet added. > > A question is how to do the matching? Qemu knows nothing about e.g mac > address of a pass-through device I believe? Supply a flag to the VFIO when it's added, this way QEMU will know. > > > > Later QEMU detects that guest driver device set DRIVER_OK. > > It then exposes the primary device to the guest, and triggers > > a device addition event (hot-plug event) for it. > > Do you mean we won't have primary device in the initial qemu cli? No, that's not what I mean. I mean management will supply a flag to VFIO and then - VFIO defers exposing primary to guest until guest acks the feature bit. - When we see guest ack, initiate hotplug. - On reboot, hide it again. - On reset without reboot, request hot-unplug and on eject hide it again. > > > > If QEMU detects guest driver removal, it initiates a hot-unplug sequence > > to remove the primary driver. In particular, if QEMU detects guest > > re-initialization (e.g. by detecting guest reset) it immediately removes > > the primary device. > > I believe guest failover module should handle this gracefully? It can't control enumeration order, if primary is enumerated before standby then guest will load its driver and it's too late when failover driver is loaded. > Thanks > > > > > We can move some of this code to management as well, architecturally it > > does not make too much sense but it might be easier implementation-wise. > > > > HTH > > > > On Mon, Jun 04, 2018 at 06:41:48PM -0700, Samudrala, Sridhar wrote: > > > Ping on this patch now that the kernel patches are accepted into davem's net-next tree. > > > https://patchwork.ozlabs.org/cover/920005/ > > > > > > > > > On 5/7/2018 4:09 PM, Sridhar Samudrala wrote: > > > > This feature bit can be used by hypervisor to indicate virtio_net device to > > > > act as a standby for another device with the same MAC address. > > > > > > > > I tested this with a small change to the patch to mark the STANDBY feature 'true' > > > > by default as i am using libvirt to start the VMs. > > > > Is there a way to pass the newly added feature bit 'standby' to qemu via libvirt > > > > XML file? > > > > > > > > Signed-off-by: Sridhar Samudrala <sridhar.samudrala@intel.com> > > > > --- > > > > hw/net/virtio-net.c | 2 ++ > > > > include/standard-headers/linux/virtio_net.h | 3 +++ > > > > 2 files changed, 5 insertions(+) > > > > > > > > diff --git a/hw/net/virtio-net.c b/hw/net/virtio-net.c > > > > index 90502fca7c..38b3140670 100644 > > > > --- a/hw/net/virtio-net.c > > > > +++ b/hw/net/virtio-net.c > > > > @@ -2198,6 +2198,8 @@ static Property virtio_net_properties[] = { > > > > true), > > > > DEFINE_PROP_INT32("speed", VirtIONet, net_conf.speed, SPEED_UNKNOWN), > > > > DEFINE_PROP_STRING("duplex", VirtIONet, net_conf.duplex_str), > > > > + DEFINE_PROP_BIT64("standby", VirtIONet, host_features, VIRTIO_NET_F_STANDBY, > > > > + false), > > > > DEFINE_PROP_END_OF_LIST(), > > > > }; > > > > diff --git a/include/standard-headers/linux/virtio_net.h b/include/standard-headers/linux/virtio_net.h > > > > index e9f255ea3f..01ec09684c 100644 > > > > --- a/include/standard-headers/linux/virtio_net.h > > > > +++ b/include/standard-headers/linux/virtio_net.h > > > > @@ -57,6 +57,9 @@ > > > > * Steering */ > > > > #define VIRTIO_NET_F_CTRL_MAC_ADDR 23 /* Set MAC address */ > > > > +#define VIRTIO_NET_F_STANDBY 62 /* Act as standby for another device > > > > + * with the same MAC. > > > > + */ > > > > #define VIRTIO_NET_F_SPEED_DUPLEX 63 /* Device set linkspeed and duplex */ > > > > #ifndef VIRTIO_NET_NO_LEGACY
On 6/12/2018 4:34 AM, Michael S. Tsirkin wrote: > On Mon, Jun 11, 2018 at 10:02:45PM -0700, Samudrala, Sridhar wrote: >> On 6/11/2018 7:17 PM, Michael S. Tsirkin wrote: >>> On Tue, Jun 12, 2018 at 09:54:44AM +0800, Jason Wang wrote: >>>> On 2018年06月12日 01:26, Michael S. Tsirkin wrote: >>>>> On Mon, May 07, 2018 at 04:09:54PM -0700, Sridhar Samudrala wrote: >>>>>> This feature bit can be used by hypervisor to indicate virtio_net device to >>>>>> act as a standby for another device with the same MAC address. >>>>>> >>>>>> I tested this with a small change to the patch to mark the STANDBY feature 'true' >>>>>> by default as i am using libvirt to start the VMs. >>>>>> Is there a way to pass the newly added feature bit 'standby' to qemu via libvirt >>>>>> XML file? >>>>>> >>>>>> Signed-off-by: Sridhar Samudrala <sridhar.samudrala@intel.com> >>>>> So I do not think we can commit to this interface: we >>>>> really need to control visibility of the primary device. >>>> The problem is legacy guest won't use primary device at all if we do this. >>> And that's by design - I think it's the only way to ensure the >>> legacy guest isn't confused. >> Yes. I think so. But i am not sure if Qemu is the right place to control the visibility >> of the primary device. The primary device may not be specified as an argument to Qemu. It >> may be plugged in later. >> The cloud service provider is providing a feature that enables low latency datapath and live >> migration capability. >> A tenant can use this feature only if he is running a VM that has virtio-net with failover support. > Well live migration is there already. The new feature is low latency > data path. we get live migration with just virtio. But I meant live migration with VF as primary device. > > And it's the guest that needs failover support not the VM. Isn't guest and VM synonymous? > > >> I think Qemu should check if guest virtio-net supports this feature and provide a mechanism for >> an upper layer indicating if the STANDBY feature is successfully negotiated or not. >> The upper layer can then decide if it should hot plug a VF with the same MAC and manage the 2 links. >> If VF is successfully hot plugged, virtio-net link should be disabled. > Did you even talk to upper layer management about it? > Just list the steps they need to do and you will see > that's a lot of machinery to manage by the upper layer. > > What do we gain in flexibility? As far as I can see the > only gain is some resources saved for legacy VMs. > > That's not a lot as tenant of the upper layer probably already has > at least a hunch that it's a new guest otherwise > why bother specifying the feature at all - you > save even more resources without it. > I am not all that familiar with how Qemu manages network devices. If we can do all the required management of the primary/standby devices within Qemu, that is definitely a better approach without upper layer involvement. > > >>>> How about control the visibility of standby device? >>>> >>>> Thanks >>> standy the always there to guarantee no downtime. >>> >>>>> However just for testing purposes, we could add a non-stable >>>>> interface "x-standby" with the understanding that as any >>>>> x- prefix it's unstable and will be changed down the road, >>>>> likely in the next release. >>>>> >>>>> >>>>>> --- >>>>>> hw/net/virtio-net.c | 2 ++ >>>>>> include/standard-headers/linux/virtio_net.h | 3 +++ >>>>>> 2 files changed, 5 insertions(+) >>>>>> >>>>>> diff --git a/hw/net/virtio-net.c b/hw/net/virtio-net.c >>>>>> index 90502fca7c..38b3140670 100644 >>>>>> --- a/hw/net/virtio-net.c >>>>>> +++ b/hw/net/virtio-net.c >>>>>> @@ -2198,6 +2198,8 @@ static Property virtio_net_properties[] = { >>>>>> true), >>>>>> DEFINE_PROP_INT32("speed", VirtIONet, net_conf.speed, SPEED_UNKNOWN), >>>>>> DEFINE_PROP_STRING("duplex", VirtIONet, net_conf.duplex_str), >>>>>> + DEFINE_PROP_BIT64("standby", VirtIONet, host_features, VIRTIO_NET_F_STANDBY, >>>>>> + false), >>>>>> DEFINE_PROP_END_OF_LIST(), >>>>>> }; >>>>>> diff --git a/include/standard-headers/linux/virtio_net.h b/include/standard-headers/linux/virtio_net.h >>>>>> index e9f255ea3f..01ec09684c 100644 >>>>>> --- a/include/standard-headers/linux/virtio_net.h >>>>>> +++ b/include/standard-headers/linux/virtio_net.h >>>>>> @@ -57,6 +57,9 @@ >>>>>> * Steering */ >>>>>> #define VIRTIO_NET_F_CTRL_MAC_ADDR 23 /* Set MAC address */ >>>>>> +#define VIRTIO_NET_F_STANDBY 62 /* Act as standby for another device >>>>>> + * with the same MAC. >>>>>> + */ >>>>>> #define VIRTIO_NET_F_SPEED_DUPLEX 63 /* Device set linkspeed and duplex */ >>>>>> #ifndef VIRTIO_NET_NO_LEGACY >>>>>> -- >>>>>> 2.14.3 > --------------------------------------------------------------------- > To unsubscribe, e-mail: virtio-dev-unsubscribe@lists.oasis-open.org > For additional commands, e-mail: virtio-dev-help@lists.oasis-open.org >
On 6/12/2018 4:54 AM, Michael S. Tsirkin wrote: > On Wed, Jun 06, 2018 at 10:29:03AM +0800, Jason Wang wrote: >> >> On 2018年06月05日 20:33, Michael S. Tsirkin wrote: >>> I don't think this is sufficient. >>> >>> If both primary and standby devices are present, a legacy guest without >>> support for the feature might see two devices with same mac and get >>> confused. >>> >>> I think that we should only make primary visible after guest acked the >>> backup feature bit. >> I think we want exactly the reverse? E.g fail the negotiation when guest >> does not ack backup feature. >> >> Otherwise legacy guest won't even have the chance to see primary device in >> the guest. > That's by design. > >>> And on reset or when backup is cleared in some other way, unplug the >>> primary. >> What if guest does not support hotplug? > It shouldn't ack the backup feature then and it's a good point. > We should both document this and check kernel config has > hotplug support. Sridhar could you take a look pls? Right now we select NET_FAILOVER when virtio-net is enabled, Should we select PCI_HOTPLUG when NET_FAILOVER is enabled? > >>> Something like the below will do the job: >>> >>> Primary device is added with a special "primary-failover" flag. >>> A virtual machine is then initialized with just a standby virtio >>> device. Primary is not yet added. >> A question is how to do the matching? Qemu knows nothing about e.g mac >> address of a pass-through device I believe? > Supply a flag to the VFIO when it's added, this way QEMU will know. > >>> Later QEMU detects that guest driver device set DRIVER_OK. >>> It then exposes the primary device to the guest, and triggers >>> a device addition event (hot-plug event) for it. >> Do you mean we won't have primary device in the initial qemu cli? > No, that's not what I mean. > > I mean management will supply a flag to VFIO and then > > > - VFIO defers exposing > primary to guest until guest acks the feature bit. > - When we see guest ack, initiate hotplug. > - On reboot, hide it again. > - On reset without reboot, request hot-unplug and on eject hide it again. > > >>> If QEMU detects guest driver removal, it initiates a hot-unplug sequence >>> to remove the primary driver. In particular, if QEMU detects guest >>> re-initialization (e.g. by detecting guest reset) it immediately removes >>> the primary device. >> I believe guest failover module should handle this gracefully? > It can't control enumeration order, if primary is enumerated before > standby then guest will load its driver and it's too late > when failover driver is loaded. Does it mean that if guest unloads virtio-net driver, Qemu should automatically unplug the associated primary device? What about guest unloading/re-loading primary device driver? > >> Thanks >> >>> We can move some of this code to management as well, architecturally it >>> does not make too much sense but it might be easier implementation-wise. >>> >>> HTH >>> >>> On Mon, Jun 04, 2018 at 06:41:48PM -0700, Samudrala, Sridhar wrote: >>>> Ping on this patch now that the kernel patches are accepted into davem's net-next tree. >>>> https://patchwork.ozlabs.org/cover/920005/ >>>> >>>> >>>> On 5/7/2018 4:09 PM, Sridhar Samudrala wrote: >>>>> This feature bit can be used by hypervisor to indicate virtio_net device to >>>>> act as a standby for another device with the same MAC address. >>>>> >>>>> I tested this with a small change to the patch to mark the STANDBY feature 'true' >>>>> by default as i am using libvirt to start the VMs. >>>>> Is there a way to pass the newly added feature bit 'standby' to qemu via libvirt >>>>> XML file? >>>>> >>>>> Signed-off-by: Sridhar Samudrala <sridhar.samudrala@intel.com> >>>>> --- >>>>> hw/net/virtio-net.c | 2 ++ >>>>> include/standard-headers/linux/virtio_net.h | 3 +++ >>>>> 2 files changed, 5 insertions(+) >>>>> >>>>> diff --git a/hw/net/virtio-net.c b/hw/net/virtio-net.c >>>>> index 90502fca7c..38b3140670 100644 >>>>> --- a/hw/net/virtio-net.c >>>>> +++ b/hw/net/virtio-net.c >>>>> @@ -2198,6 +2198,8 @@ static Property virtio_net_properties[] = { >>>>> true), >>>>> DEFINE_PROP_INT32("speed", VirtIONet, net_conf.speed, SPEED_UNKNOWN), >>>>> DEFINE_PROP_STRING("duplex", VirtIONet, net_conf.duplex_str), >>>>> + DEFINE_PROP_BIT64("standby", VirtIONet, host_features, VIRTIO_NET_F_STANDBY, >>>>> + false), >>>>> DEFINE_PROP_END_OF_LIST(), >>>>> }; >>>>> diff --git a/include/standard-headers/linux/virtio_net.h b/include/standard-headers/linux/virtio_net.h >>>>> index e9f255ea3f..01ec09684c 100644 >>>>> --- a/include/standard-headers/linux/virtio_net.h >>>>> +++ b/include/standard-headers/linux/virtio_net.h >>>>> @@ -57,6 +57,9 @@ >>>>> * Steering */ >>>>> #define VIRTIO_NET_F_CTRL_MAC_ADDR 23 /* Set MAC address */ >>>>> +#define VIRTIO_NET_F_STANDBY 62 /* Act as standby for another device >>>>> + * with the same MAC. >>>>> + */ >>>>> #define VIRTIO_NET_F_SPEED_DUPLEX 63 /* Device set linkspeed and duplex */ >>>>> #ifndef VIRTIO_NET_NO_LEGACY
On 2018年06月12日 19:54, Michael S. Tsirkin wrote: > On Wed, Jun 06, 2018 at 10:29:03AM +0800, Jason Wang wrote: >> >> On 2018年06月05日 20:33, Michael S. Tsirkin wrote: >>> I don't think this is sufficient. >>> >>> If both primary and standby devices are present, a legacy guest without >>> support for the feature might see two devices with same mac and get >>> confused. >>> >>> I think that we should only make primary visible after guest acked the >>> backup feature bit. >> I think we want exactly the reverse? E.g fail the negotiation when guest >> does not ack backup feature. >> >> Otherwise legacy guest won't even have the chance to see primary device in >> the guest. > That's by design. So management needs to know the capability of guest to set the backup feature. This looks a chicken or egg problem to me. > >>> And on reset or when backup is cleared in some other way, unplug the >>> primary. >> What if guest does not support hotplug? > It shouldn't ack the backup feature then and it's a good point. > We should both document this and check kernel config has > hotplug support. Sridhar could you take a look pls? > >>> Something like the below will do the job: >>> >>> Primary device is added with a special "primary-failover" flag. >>> A virtual machine is then initialized with just a standby virtio >>> device. Primary is not yet added. >> A question is how to do the matching? Qemu knows nothing about e.g mac >> address of a pass-through device I believe? > Supply a flag to the VFIO when it's added, this way QEMU will know. > >>> Later QEMU detects that guest driver device set DRIVER_OK. >>> It then exposes the primary device to the guest, and triggers >>> a device addition event (hot-plug event) for it. >> Do you mean we won't have primary device in the initial qemu cli? > No, that's not what I mean. > > I mean management will supply a flag to VFIO and then > > > - VFIO defers exposing > primary to guest until guest acks the feature bit. > - When we see guest ack, initiate hotplug. > - On reboot, hide it again. > - On reset without reboot, request hot-unplug and on eject hide it again. This sounds much like a kind of bonding in qemu. > >>> If QEMU detects guest driver removal, it initiates a hot-unplug sequence >>> to remove the primary driver. In particular, if QEMU detects guest >>> re-initialization (e.g. by detecting guest reset) it immediately removes >>> the primary device. >> I believe guest failover module should handle this gracefully? > It can't control enumeration order, if primary is enumerated before > standby then guest will load its driver and it's too late > when failover driver is loaded. Well, even if we can delay the visibility of primary until DRIVER_OK, there still be a race I think? And it looks to me it's still a bug of guest: E.g primary could be probed before failover_register() in guest. Then we will miss the enslaving of primary forever. Thanks > >> Thanks >> >>> We can move some of this code to management as well, architecturally it >>> does not make too much sense but it might be easier implementation-wise. >>> >>> HTH >>> >>> On Mon, Jun 04, 2018 at 06:41:48PM -0700, Samudrala, Sridhar wrote: >>>> Ping on this patch now that the kernel patches are accepted into davem's net-next tree. >>>> https://patchwork.ozlabs.org/cover/920005/ >>>> >>>> >>>> On 5/7/2018 4:09 PM, Sridhar Samudrala wrote: >>>>> This feature bit can be used by hypervisor to indicate virtio_net device to >>>>> act as a standby for another device with the same MAC address. >>>>> >>>>> I tested this with a small change to the patch to mark the STANDBY feature 'true' >>>>> by default as i am using libvirt to start the VMs. >>>>> Is there a way to pass the newly added feature bit 'standby' to qemu via libvirt >>>>> XML file? >>>>> >>>>> Signed-off-by: Sridhar Samudrala <sridhar.samudrala@intel.com> >>>>> --- >>>>> hw/net/virtio-net.c | 2 ++ >>>>> include/standard-headers/linux/virtio_net.h | 3 +++ >>>>> 2 files changed, 5 insertions(+) >>>>> >>>>> diff --git a/hw/net/virtio-net.c b/hw/net/virtio-net.c >>>>> index 90502fca7c..38b3140670 100644 >>>>> --- a/hw/net/virtio-net.c >>>>> +++ b/hw/net/virtio-net.c >>>>> @@ -2198,6 +2198,8 @@ static Property virtio_net_properties[] = { >>>>> true), >>>>> DEFINE_PROP_INT32("speed", VirtIONet, net_conf.speed, SPEED_UNKNOWN), >>>>> DEFINE_PROP_STRING("duplex", VirtIONet, net_conf.duplex_str), >>>>> + DEFINE_PROP_BIT64("standby", VirtIONet, host_features, VIRTIO_NET_F_STANDBY, >>>>> + false), >>>>> DEFINE_PROP_END_OF_LIST(), >>>>> }; >>>>> diff --git a/include/standard-headers/linux/virtio_net.h b/include/standard-headers/linux/virtio_net.h >>>>> index e9f255ea3f..01ec09684c 100644 >>>>> --- a/include/standard-headers/linux/virtio_net.h >>>>> +++ b/include/standard-headers/linux/virtio_net.h >>>>> @@ -57,6 +57,9 @@ >>>>> * Steering */ >>>>> #define VIRTIO_NET_F_CTRL_MAC_ADDR 23 /* Set MAC address */ >>>>> +#define VIRTIO_NET_F_STANDBY 62 /* Act as standby for another device >>>>> + * with the same MAC. >>>>> + */ >>>>> #define VIRTIO_NET_F_SPEED_DUPLEX 63 /* Device set linkspeed and duplex */ >>>>> #ifndef VIRTIO_NET_NO_LEGACY
On 2018年06月13日 08:20, Samudrala, Sridhar wrote: > On 6/12/2018 4:54 AM, Michael S. Tsirkin wrote: >> On Wed, Jun 06, 2018 at 10:29:03AM +0800, Jason Wang wrote: >>> >>> On 2018年06月05日 20:33, Michael S. Tsirkin wrote: >>>> I don't think this is sufficient. >>>> >>>> If both primary and standby devices are present, a legacy guest >>>> without >>>> support for the feature might see two devices with same mac and get >>>> confused. >>>> >>>> I think that we should only make primary visible after guest acked the >>>> backup feature bit. >>> I think we want exactly the reverse? E.g fail the negotiation when >>> guest >>> does not ack backup feature. >>> >>> Otherwise legacy guest won't even have the chance to see primary >>> device in >>> the guest. >> That's by design. >> >>>> And on reset or when backup is cleared in some other way, unplug the >>>> primary. >>> What if guest does not support hotplug? >> It shouldn't ack the backup feature then and it's a good point. >> We should both document this and check kernel config has >> hotplug support. Sridhar could you take a look pls? > > Right now we select NET_FAILOVER when virtio-net is enabled, Should we > select > PCI_HOTPLUG when NET_FAILOVER is enabled? Maybe I was wrong but it looks to me PCI_HOTPLUG is no sufficient since it depends on other to work e.g HOTPLUG_PCI_ACPI. Thanks
On 6/12/2018 7:38 PM, Jason Wang wrote: > > > On 2018年06月12日 19:54, Michael S. Tsirkin wrote: >> On Wed, Jun 06, 2018 at 10:29:03AM +0800, Jason Wang wrote: >>> >>> On 2018年06月05日 20:33, Michael S. Tsirkin wrote: >>>> I don't think this is sufficient. >>>> >>>> If both primary and standby devices are present, a legacy guest >>>> without >>>> support for the feature might see two devices with same mac and get >>>> confused. >>>> >>>> I think that we should only make primary visible after guest acked the >>>> backup feature bit. >>> I think we want exactly the reverse? E.g fail the negotiation when >>> guest >>> does not ack backup feature. >>> >>> Otherwise legacy guest won't even have the chance to see primary >>> device in >>> the guest. >> That's by design. > > So management needs to know the capability of guest to set the backup > feature. This looks a chicken or egg problem to me. I don't think so. If the tenant requests 'accelerated datapath feature', the management will set 'standby' feature bit on virtio-net interface and if the guest virtio-net driver supports this feature, then the tenant VM will get that capability via a hot-plugged primary device. > >> >>>> And on reset or when backup is cleared in some other way, unplug the >>>> primary. >>> What if guest does not support hotplug? >> It shouldn't ack the backup feature then and it's a good point. >> We should both document this and check kernel config has >> hotplug support. Sridhar could you take a look pls? >> >>>> Something like the below will do the job: >>>> >>>> Primary device is added with a special "primary-failover" flag. >>>> A virtual machine is then initialized with just a standby virtio >>>> device. Primary is not yet added. >>> A question is how to do the matching? Qemu knows nothing about e.g mac >>> address of a pass-through device I believe? >> Supply a flag to the VFIO when it's added, this way QEMU will know. >> >>>> Later QEMU detects that guest driver device set DRIVER_OK. >>>> It then exposes the primary device to the guest, and triggers >>>> a device addition event (hot-plug event) for it. >>> Do you mean we won't have primary device in the initial qemu cli? >> No, that's not what I mean. >> >> I mean management will supply a flag to VFIO and then >> >> >> - VFIO defers exposing >> primary to guest until guest acks the feature bit. >> - When we see guest ack, initiate hotplug. >> - On reboot, hide it again. >> - On reset without reboot, request hot-unplug and on eject hide it >> again. > > This sounds much like a kind of bonding in qemu. > >> >>>> If QEMU detects guest driver removal, it initiates a hot-unplug >>>> sequence >>>> to remove the primary driver. In particular, if QEMU detects guest >>>> re-initialization (e.g. by detecting guest reset) it immediately >>>> removes >>>> the primary device. >>> I believe guest failover module should handle this gracefully? >> It can't control enumeration order, if primary is enumerated before >> standby then guest will load its driver and it's too late >> when failover driver is loaded. > > Well, even if we can delay the visibility of primary until DRIVER_OK, > there still be a race I think? And it looks to me it's still a bug of > guest: > > E.g primary could be probed before failover_register() in guest. Then > we will miss the enslaving of primary forever. That is not an issue. Even if the primary is probed before failover driver, it will enslave the primary via the call to failover_existing_slave_register() as part of failover_register() routine. > > Thanks > >> >>> Thanks >>> >>>> We can move some of this code to management as well, >>>> architecturally it >>>> does not make too much sense but it might be easier >>>> implementation-wise. >>>> >>>> HTH >>>> >>>> On Mon, Jun 04, 2018 at 06:41:48PM -0700, Samudrala, Sridhar wrote: >>>>> Ping on this patch now that the kernel patches are accepted into >>>>> davem's net-next tree. >>>>> https://patchwork.ozlabs.org/cover/920005/ >>>>> >>>>> >>>>> On 5/7/2018 4:09 PM, Sridhar Samudrala wrote: >>>>>> This feature bit can be used by hypervisor to indicate virtio_net >>>>>> device to >>>>>> act as a standby for another device with the same MAC address. >>>>>> >>>>>> I tested this with a small change to the patch to mark the >>>>>> STANDBY feature 'true' >>>>>> by default as i am using libvirt to start the VMs. >>>>>> Is there a way to pass the newly added feature bit 'standby' to >>>>>> qemu via libvirt >>>>>> XML file? >>>>>> >>>>>> Signed-off-by: Sridhar Samudrala <sridhar.samudrala@intel.com> >>>>>> --- >>>>>> hw/net/virtio-net.c | 2 ++ >>>>>> include/standard-headers/linux/virtio_net.h | 3 +++ >>>>>> 2 files changed, 5 insertions(+) >>>>>> >>>>>> diff --git a/hw/net/virtio-net.c b/hw/net/virtio-net.c >>>>>> index 90502fca7c..38b3140670 100644 >>>>>> --- a/hw/net/virtio-net.c >>>>>> +++ b/hw/net/virtio-net.c >>>>>> @@ -2198,6 +2198,8 @@ static Property virtio_net_properties[] = { >>>>>> true), >>>>>> DEFINE_PROP_INT32("speed", VirtIONet, net_conf.speed, >>>>>> SPEED_UNKNOWN), >>>>>> DEFINE_PROP_STRING("duplex", VirtIONet, >>>>>> net_conf.duplex_str), >>>>>> + DEFINE_PROP_BIT64("standby", VirtIONet, host_features, >>>>>> VIRTIO_NET_F_STANDBY, >>>>>> + false), >>>>>> DEFINE_PROP_END_OF_LIST(), >>>>>> }; >>>>>> diff --git a/include/standard-headers/linux/virtio_net.h >>>>>> b/include/standard-headers/linux/virtio_net.h >>>>>> index e9f255ea3f..01ec09684c 100644 >>>>>> --- a/include/standard-headers/linux/virtio_net.h >>>>>> +++ b/include/standard-headers/linux/virtio_net.h >>>>>> @@ -57,6 +57,9 @@ >>>>>> * Steering */ >>>>>> #define VIRTIO_NET_F_CTRL_MAC_ADDR 23 /* Set MAC address */ >>>>>> +#define VIRTIO_NET_F_STANDBY 62 /* Act as standby for >>>>>> another device >>>>>> + * with the same MAC. >>>>>> + */ >>>>>> #define VIRTIO_NET_F_SPEED_DUPLEX 63 /* Device set >>>>>> linkspeed and duplex */ >>>>>> #ifndef VIRTIO_NET_NO_LEGACY >
On 2018年06月13日 12:24, Samudrala, Sridhar wrote: > On 6/12/2018 7:38 PM, Jason Wang wrote: >> >> >> On 2018年06月12日 19:54, Michael S. Tsirkin wrote: >>> On Wed, Jun 06, 2018 at 10:29:03AM +0800, Jason Wang wrote: >>>> >>>> On 2018年06月05日 20:33, Michael S. Tsirkin wrote: >>>>> I don't think this is sufficient. >>>>> >>>>> If both primary and standby devices are present, a legacy guest >>>>> without >>>>> support for the feature might see two devices with same mac and get >>>>> confused. >>>>> >>>>> I think that we should only make primary visible after guest acked >>>>> the >>>>> backup feature bit. >>>> I think we want exactly the reverse? E.g fail the negotiation when >>>> guest >>>> does not ack backup feature. >>>> >>>> Otherwise legacy guest won't even have the chance to see primary >>>> device in >>>> the guest. >>> That's by design. >> >> So management needs to know the capability of guest to set the backup >> feature. This looks a chicken or egg problem to me. > > I don't think so. If the tenant requests 'accelerated datapath > feature', the management > will set 'standby' feature bit on virtio-net interface and if the > guest virtio-net driver > supports this feature, then the tenant VM will get that capability via > a hot-plugged > primary device. Ok, I thought exactly the reverse because of the commit title is "enable virtio_net to act as a standby for a passthru device". But re-read the commit log content, I understand the case a little bit. Btw, VF is not necessarily faster than virtio-net, especially consider virtio-net may have a lot of queues. > > >> >>> >>>>> And on reset or when backup is cleared in some other way, unplug the >>>>> primary. >>>> What if guest does not support hotplug? >>> It shouldn't ack the backup feature then and it's a good point. >>> We should both document this and check kernel config has >>> hotplug support. Sridhar could you take a look pls? >>> >>>>> Something like the below will do the job: >>>>> >>>>> Primary device is added with a special "primary-failover" flag. >>>>> A virtual machine is then initialized with just a standby virtio >>>>> device. Primary is not yet added. >>>> A question is how to do the matching? Qemu knows nothing about e.g mac >>>> address of a pass-through device I believe? >>> Supply a flag to the VFIO when it's added, this way QEMU will know. >>> >>>>> Later QEMU detects that guest driver device set DRIVER_OK. >>>>> It then exposes the primary device to the guest, and triggers >>>>> a device addition event (hot-plug event) for it. >>>> Do you mean we won't have primary device in the initial qemu cli? >>> No, that's not what I mean. >>> >>> I mean management will supply a flag to VFIO and then >>> >>> >>> - VFIO defers exposing >>> primary to guest until guest acks the feature bit. >>> - When we see guest ack, initiate hotplug. >>> - On reboot, hide it again. >>> - On reset without reboot, request hot-unplug and on eject hide it >>> again. >> >> This sounds much like a kind of bonding in qemu. >> >>> >>>>> If QEMU detects guest driver removal, it initiates a hot-unplug >>>>> sequence >>>>> to remove the primary driver. In particular, if QEMU detects guest >>>>> re-initialization (e.g. by detecting guest reset) it immediately >>>>> removes >>>>> the primary device. >>>> I believe guest failover module should handle this gracefully? >>> It can't control enumeration order, if primary is enumerated before >>> standby then guest will load its driver and it's too late >>> when failover driver is loaded. >> >> Well, even if we can delay the visibility of primary until DRIVER_OK, >> there still be a race I think? And it looks to me it's still a bug of >> guest: >> >> E.g primary could be probed before failover_register() in guest. Then >> we will miss the enslaving of primary forever. > > That is not an issue. Even if the primary is probed before failover > driver, it will > enslave the primary via the call to failover_existing_slave_register() > as part of > failover_register() routine. Aha I get it. So the enumeration order is not an issue. Consider primary may still be seen by guest kernel even if we delay its visibility, I wonder whether we can control the lifecycle of primary through driver but not qemu. This can simplify a lot of things. Thanks > >> >> Thanks >> >>> >>>> Thanks >>>> >>>>> We can move some of this code to management as well, >>>>> architecturally it >>>>> does not make too much sense but it might be easier >>>>> implementation-wise. >>>>> >>>>> HTH >>>>> >>>>> On Mon, Jun 04, 2018 at 06:41:48PM -0700, Samudrala, Sridhar wrote: >>>>>> Ping on this patch now that the kernel patches are accepted into >>>>>> davem's net-next tree. >>>>>> https://patchwork.ozlabs.org/cover/920005/ >>>>>> >>>>>> >>>>>> On 5/7/2018 4:09 PM, Sridhar Samudrala wrote: >>>>>>> This feature bit can be used by hypervisor to indicate >>>>>>> virtio_net device to >>>>>>> act as a standby for another device with the same MAC address. >>>>>>> >>>>>>> I tested this with a small change to the patch to mark the >>>>>>> STANDBY feature 'true' >>>>>>> by default as i am using libvirt to start the VMs. >>>>>>> Is there a way to pass the newly added feature bit 'standby' to >>>>>>> qemu via libvirt >>>>>>> XML file? >>>>>>> >>>>>>> Signed-off-by: Sridhar Samudrala <sridhar.samudrala@intel.com> >>>>>>> --- >>>>>>> hw/net/virtio-net.c | 2 ++ >>>>>>> include/standard-headers/linux/virtio_net.h | 3 +++ >>>>>>> 2 files changed, 5 insertions(+) >>>>>>> >>>>>>> diff --git a/hw/net/virtio-net.c b/hw/net/virtio-net.c >>>>>>> index 90502fca7c..38b3140670 100644 >>>>>>> --- a/hw/net/virtio-net.c >>>>>>> +++ b/hw/net/virtio-net.c >>>>>>> @@ -2198,6 +2198,8 @@ static Property virtio_net_properties[] = { >>>>>>> true), >>>>>>> DEFINE_PROP_INT32("speed", VirtIONet, net_conf.speed, >>>>>>> SPEED_UNKNOWN), >>>>>>> DEFINE_PROP_STRING("duplex", VirtIONet, >>>>>>> net_conf.duplex_str), >>>>>>> + DEFINE_PROP_BIT64("standby", VirtIONet, host_features, >>>>>>> VIRTIO_NET_F_STANDBY, >>>>>>> + false), >>>>>>> DEFINE_PROP_END_OF_LIST(), >>>>>>> }; >>>>>>> diff --git a/include/standard-headers/linux/virtio_net.h >>>>>>> b/include/standard-headers/linux/virtio_net.h >>>>>>> index e9f255ea3f..01ec09684c 100644 >>>>>>> --- a/include/standard-headers/linux/virtio_net.h >>>>>>> +++ b/include/standard-headers/linux/virtio_net.h >>>>>>> @@ -57,6 +57,9 @@ >>>>>>> * Steering */ >>>>>>> #define VIRTIO_NET_F_CTRL_MAC_ADDR 23 /* Set MAC address */ >>>>>>> +#define VIRTIO_NET_F_STANDBY 62 /* Act as standby for >>>>>>> another device >>>>>>> + * with the same MAC. >>>>>>> + */ >>>>>>> #define VIRTIO_NET_F_SPEED_DUPLEX 63 /* Device set >>>>>>> linkspeed and duplex */ >>>>>>> #ifndef VIRTIO_NET_NO_LEGACY >> >
On Tue, Jun 12, 2018 at 4:47 AM, Michael S. Tsirkin <mst@redhat.com> wrote: > On Tue, Jun 05, 2018 at 03:09:26PM -0700, Siwei Liu wrote: >> The thing is cloud service provider might prefer sticking to the same >> level of service agreement (SLA) of keeping VF over migration, > > That requirement is trivially satisfied by just a single VF :) If your > SLA does not require live migration, you should do just that. The requirement is enable live migration by default without getting user too much attention. Migration should be attempted if guest is live migratable. If not, present a single VF to legacy guest. You seem to think it's not a valid use case? Or impossible to do so? What if I have something in mind that can support my theory as well? -Siwei > > -- > MST
On Tue, Jun 12, 2018 at 5:08 PM, Samudrala, Sridhar <sridhar.samudrala@intel.com> wrote: > On 6/12/2018 4:34 AM, Michael S. Tsirkin wrote: >> >> On Mon, Jun 11, 2018 at 10:02:45PM -0700, Samudrala, Sridhar wrote: >>> >>> On 6/11/2018 7:17 PM, Michael S. Tsirkin wrote: >>>> >>>> On Tue, Jun 12, 2018 at 09:54:44AM +0800, Jason Wang wrote: >>>>> >>>>> On 2018年06月12日 01:26, Michael S. Tsirkin wrote: >>>>>> >>>>>> On Mon, May 07, 2018 at 04:09:54PM -0700, Sridhar Samudrala wrote: >>>>>>> >>>>>>> This feature bit can be used by hypervisor to indicate virtio_net >>>>>>> device to >>>>>>> act as a standby for another device with the same MAC address. >>>>>>> >>>>>>> I tested this with a small change to the patch to mark the STANDBY >>>>>>> feature 'true' >>>>>>> by default as i am using libvirt to start the VMs. >>>>>>> Is there a way to pass the newly added feature bit 'standby' to qemu >>>>>>> via libvirt >>>>>>> XML file? >>>>>>> >>>>>>> Signed-off-by: Sridhar Samudrala <sridhar.samudrala@intel.com> >>>>>> >>>>>> So I do not think we can commit to this interface: we >>>>>> really need to control visibility of the primary device. >>>>> >>>>> The problem is legacy guest won't use primary device at all if we do >>>>> this. >>>> >>>> And that's by design - I think it's the only way to ensure the >>>> legacy guest isn't confused. >>> >>> Yes. I think so. But i am not sure if Qemu is the right place to control >>> the visibility >>> of the primary device. The primary device may not be specified as an >>> argument to Qemu. It >>> may be plugged in later. >>> The cloud service provider is providing a feature that enables low >>> latency datapath and live >>> migration capability. >>> A tenant can use this feature only if he is running a VM that has >>> virtio-net with failover support. >> >> Well live migration is there already. The new feature is low latency >> data path. > > > we get live migration with just virtio. But I meant live migration with VF > as > primary device. > >> >> And it's the guest that needs failover support not the VM. > > > Isn't guest and VM synonymous? > > >> >> >>> I think Qemu should check if guest virtio-net supports this feature and >>> provide a mechanism for >>> an upper layer indicating if the STANDBY feature is successfully >>> negotiated or not. >>> The upper layer can then decide if it should hot plug a VF with the same >>> MAC and manage the 2 links. >>> If VF is successfully hot plugged, virtio-net link should be disabled. >> >> Did you even talk to upper layer management about it? >> Just list the steps they need to do and you will see >> that's a lot of machinery to manage by the upper layer. >> >> What do we gain in flexibility? As far as I can see the >> only gain is some resources saved for legacy VMs. >> >> That's not a lot as tenant of the upper layer probably already has >> at least a hunch that it's a new guest otherwise >> why bother specifying the feature at all - you >> save even more resources without it. >> > > I am not all that familiar with how Qemu manages network devices. If we can > do all the > required management of the primary/standby devices within Qemu, that is > definitely a better > approach without upper layer involvement. Right. I would imagine in the extreme case the upper layer doesn't have to be involved at all if QEMU manages all hot plug/unplug logic. The management tool can supply passthrough device and virtio with the same group UUID, QEMU auto-manages the presence of the primary, and hot plug the device as needed before or after the migration. -Siwei > > >> >> >>>>> How about control the visibility of standby device? >>>>> >>>>> Thanks >>>> >>>> standy the always there to guarantee no downtime. >>>> >>>>>> However just for testing purposes, we could add a non-stable >>>>>> interface "x-standby" with the understanding that as any >>>>>> x- prefix it's unstable and will be changed down the road, >>>>>> likely in the next release. >>>>>> >>>>>> >>>>>>> --- >>>>>>> hw/net/virtio-net.c | 2 ++ >>>>>>> include/standard-headers/linux/virtio_net.h | 3 +++ >>>>>>> 2 files changed, 5 insertions(+) >>>>>>> >>>>>>> diff --git a/hw/net/virtio-net.c b/hw/net/virtio-net.c >>>>>>> index 90502fca7c..38b3140670 100644 >>>>>>> --- a/hw/net/virtio-net.c >>>>>>> +++ b/hw/net/virtio-net.c >>>>>>> @@ -2198,6 +2198,8 @@ static Property virtio_net_properties[] = { >>>>>>> true), >>>>>>> DEFINE_PROP_INT32("speed", VirtIONet, net_conf.speed, >>>>>>> SPEED_UNKNOWN), >>>>>>> DEFINE_PROP_STRING("duplex", VirtIONet, net_conf.duplex_str), >>>>>>> + DEFINE_PROP_BIT64("standby", VirtIONet, host_features, >>>>>>> VIRTIO_NET_F_STANDBY, >>>>>>> + false), >>>>>>> DEFINE_PROP_END_OF_LIST(), >>>>>>> }; >>>>>>> diff --git a/include/standard-headers/linux/virtio_net.h >>>>>>> b/include/standard-headers/linux/virtio_net.h >>>>>>> index e9f255ea3f..01ec09684c 100644 >>>>>>> --- a/include/standard-headers/linux/virtio_net.h >>>>>>> +++ b/include/standard-headers/linux/virtio_net.h >>>>>>> @@ -57,6 +57,9 @@ >>>>>>> * Steering */ >>>>>>> #define VIRTIO_NET_F_CTRL_MAC_ADDR 23 /* Set MAC address */ >>>>>>> +#define VIRTIO_NET_F_STANDBY 62 /* Act as standby for >>>>>>> another device >>>>>>> + * with the same MAC. >>>>>>> + */ >>>>>>> #define VIRTIO_NET_F_SPEED_DUPLEX 63 /* Device set >>>>>>> linkspeed and duplex */ >>>>>>> #ifndef VIRTIO_NET_NO_LEGACY >>>>>>> -- >>>>>>> 2.14.3 >> >> --------------------------------------------------------------------- >> To unsubscribe, e-mail: virtio-dev-unsubscribe@lists.oasis-open.org >> For additional commands, e-mail: virtio-dev-help@lists.oasis-open.org >> >
I've been pointed to this discussion (which I had missed previously) and I'm getting a headache. Let me first summarize how I understand how this feature is supposed to work, then I'll respond to some individual points. The basic idea is to enable guests to migrate seamlessly, while still making it possible for them to use a passed-through device for more performance etc. The means to do so is to hook a virtio-net device together with a network device passed through via vfio. The vfio-handled device is there for performance, the virtio device for migratability. We have a new virtio feature bit for that which needs to be negotiated for that 'combined' device to be available. We have to consider two cases: - Older guests that do not support the new feature bit. We presume that those guests will be confused if they get two network devices with the same MAC, so the idea is to not show them the vfio-handled device at all. - Guests that negotiate the feature bit. We only know positively that they (a) know the feature bit and (b) are prepared to handle the consequences of negotiating it after they set the FEATURES_OK bit. This is therefore the earliest point in time that the vfio-handled device should be visible or usable for the guest. On Wed, 13 Jun 2018 18:02:01 -0700 Siwei Liu <loseweigh@gmail.com> wrote: > On Tue, Jun 12, 2018 at 5:08 PM, Samudrala, Sridhar > <sridhar.samudrala@intel.com> wrote: > > On 6/12/2018 4:34 AM, Michael S. Tsirkin wrote: > >> > >> On Mon, Jun 11, 2018 at 10:02:45PM -0700, Samudrala, Sridhar wrote: > >>> > >>> On 6/11/2018 7:17 PM, Michael S. Tsirkin wrote: > >>>> > >>>> On Tue, Jun 12, 2018 at 09:54:44AM +0800, Jason Wang wrote: > >>>>> > >>>>> On 2018年06月12日 01:26, Michael S. Tsirkin wrote: > >>>>>> > >>>>>> On Mon, May 07, 2018 at 04:09:54PM -0700, Sridhar Samudrala wrote: > >>>>>>> > >>>>>>> This feature bit can be used by hypervisor to indicate virtio_net > >>>>>>> device to > >>>>>>> act as a standby for another device with the same MAC address. > >>>>>>> > >>>>>>> I tested this with a small change to the patch to mark the STANDBY > >>>>>>> feature 'true' > >>>>>>> by default as i am using libvirt to start the VMs. > >>>>>>> Is there a way to pass the newly added feature bit 'standby' to qemu > >>>>>>> via libvirt > >>>>>>> XML file? > >>>>>>> > >>>>>>> Signed-off-by: Sridhar Samudrala <sridhar.samudrala@intel.com> > >>>>>> > >>>>>> So I do not think we can commit to this interface: we > >>>>>> really need to control visibility of the primary device. > >>>>> > >>>>> The problem is legacy guest won't use primary device at all if we do > >>>>> this. > >>>> > >>>> And that's by design - I think it's the only way to ensure the > >>>> legacy guest isn't confused. > >>> > >>> Yes. I think so. But i am not sure if Qemu is the right place to control > >>> the visibility > >>> of the primary device. The primary device may not be specified as an > >>> argument to Qemu. It > >>> may be plugged in later. > >>> The cloud service provider is providing a feature that enables low > >>> latency datapath and live > >>> migration capability. > >>> A tenant can use this feature only if he is running a VM that has > >>> virtio-net with failover support. So, do you know from the outset that there will be such a coupled device? I.e., is it a property of the VM definition? Can there be a 'prepared' virtio-net device that presents the STANDBY feature even if there currently is no vfio-handled device available -- but making it possible to simply hotplug that device later? Should it be possible to add a virtio/vfio pair later on? > >> > >> Well live migration is there already. The new feature is low latency > >> data path. > > > > > > we get live migration with just virtio. But I meant live migration with VF > > as > > primary device. > > > >> > >> And it's the guest that needs failover support not the VM. > > > > > > Isn't guest and VM synonymous? I think we need to be really careful to not mix up the two: The VM contains the definitions, but it is up to the guest how it uses them. > > > > > >> > >> > >>> I think Qemu should check if guest virtio-net supports this feature and > >>> provide a mechanism for > >>> an upper layer indicating if the STANDBY feature is successfully > >>> negotiated or not. > >>> The upper layer can then decide if it should hot plug a VF with the same > >>> MAC and manage the 2 links. > >>> If VF is successfully hot plugged, virtio-net link should be disabled. > >> > >> Did you even talk to upper layer management about it? > >> Just list the steps they need to do and you will see > >> that's a lot of machinery to manage by the upper layer. > >> > >> What do we gain in flexibility? As far as I can see the > >> only gain is some resources saved for legacy VMs. > >> > >> That's not a lot as tenant of the upper layer probably already has > >> at least a hunch that it's a new guest otherwise > >> why bother specifying the feature at all - you > >> save even more resources without it. > >> > > > > I am not all that familiar with how Qemu manages network devices. If we can > > do all the > > required management of the primary/standby devices within Qemu, that is > > definitely a better > > approach without upper layer involvement. > > Right. I would imagine in the extreme case the upper layer doesn't > have to be involved at all if QEMU manages all hot plug/unplug logic. > The management tool can supply passthrough device and virtio with the > same group UUID, QEMU auto-manages the presence of the primary, and > hot plug the device as needed before or after the migration. I do not really see how you can manage that kind of stuff in QEMU only. Have you talked to some libvirt folks? (And I'm not sure what you refer to with 'group UUID'?) Also, I think you need to make a distinction between hotplugging a device and making it visible to the guest. What does 'hotplugging' mean here? Adding it to the VM definition? Would it be enough to have the vfio-based device not operational until the virtio feature bit has been negotiated? What happens if the guest does not use the vfio-based device after it has been made available? Will you still disable the virtio-net link? (All that link handling definitely sounds like a task for libvirt or the like.) Regarding hot(un)plugging during migration, I think you also need to keep in mind that different architectures/busses have different semantics there. Something that works if there's an unplug handshake may not work on a platform with surprise removal. Have you considered guest agents? All of this is punching through several layers, and I'm not sure if that is a good idea.
On Wed, Jun 13, 2018 at 06:02:01PM -0700, Siwei Liu wrote: > >> And it's the guest that needs failover support not the VM. > > > > > > Isn't guest and VM synonymous? Guest is whatever software is running on top of the hypervisor. The virtual machine is the interface between the two.
Thank you for sharing your thoughts, Cornelia. With questions below, I think you raised really good points, some of which I don't have answer yet and would also like to explore here. First off, I don't want to push the discussion to the extreme at this point, or sell anything about having QEMU manage everything automatically. Don't get me wrong, it's not there yet. Let's don't assume we are tied to a specific or concerte solution. I think the key for our discussion might be to define or refine the boundary between VM and guest, e.g. what each layer is expected to control and manage exactly. In my view, there might be possibly 3 different options to represent the failover device conceipt to QEMU and libvirt (or any upper layer software): a. Seperate device: in this model, virtio and passthough remains separate devices just as today. QEMU exposes the standby feature bit for virtio, and publish status/event around the negotiation process of this feature bit for libvirt to react upon. Since Libvirt has the pairing relationship itself, maybe through MAC address or something else, it can control the presence of primary by hot plugging or unplugging the passthrough device, although it has to work tightly with virtio's feature negotation process. Not just for migration but also various corner scenarios (driver/feature ok, device reset, reboot, legacy guest etc) along virtio's feature negotiation. b. Coupled device: in this model, virtio and passthough devices are weakly coupled using some group ID, i.e. QEMU match the passthough device for a standby virtio instance by comparing the group ID value present behind each device's bridge. Libvirt provides QEMU the group ID for both type of devices, and only deals with hot plug for migration, by checking some migration status exposed (e.g. the feature negotiation status on the virtio device) by QEMU. QEMU manages the visibility of the primary in guest along virtio's feature negotiation process. c. Fully combined device: in this model, virtio and passthough devices are viewed as a single VM interface altogther. QEMU not just controls the visibility of the primary in guest, but can also manage the exposure of the passthrough for migratability. It can be like that libvirt supplies the group ID to QEMU. Or libvirt does not even have to provide group ID for grouping the two devices, if just one single combined device is exposed by QEMU. In either case, QEMU manages all aspect of such internal construct, including virtio feature negotiation, presence of the primary, and live migration. It looks like to me that, in your opinion, you seem to prefer go with (a). While I'm actually okay with either (b) or (c). Do I understand your point correctly? The reason that I feel that (a) might not be ideal, just as Michael alluded to (quoting below), is that as management stack, it really doesn't need to care about the detailed process of feature negotiation (if we view the guest presence of the primary as part of feature negotiation at an extended level not just virtio). All it needs to be done is to hand in the required devices to QEMU and that's all. Why do we need to addd various hooks, events for whichever happens internally within the guest? '' Primary device is added with a special "primary-failover" flag. A virtual machine is then initialized with just a standby virtio device. Primary is not yet added. Later QEMU detects that guest driver device set DRIVER_OK. It then exposes the primary device to the guest, and triggers a device addition event (hot-plug event) for it. If QEMU detects guest driver removal, it initiates a hot-unplug sequence to remove the primary driver. In particular, if QEMU detects guest re-initialization (e.g. by detecting guest reset) it immediately removes the primary device. '' and, '' management just wants to give the primary to guest and later take it back, it really does not care about the details of the process, so I don't see what does pushing it up the stack buy you. So I don't think it *needs* to be done in libvirt. It probably can if you add a bunch of hooks so it knows whenever vm reboots, driver binds and unbinds from device, and can check that backup flag was set. If you are pushing for a setup like that please get a buy-in from libvirt maintainers or better write a patch. '' Let me know if my clarifications sound clear to you now. Thanks, -Siwei On Thu, Jun 14, 2018 at 3:02 AM, Cornelia Huck <cohuck@redhat.com> wrote: > I've been pointed to this discussion (which I had missed previously) > and I'm getting a headache. Let me first summarize how I understand how > this feature is supposed to work, then I'll respond to some individual > points. > > The basic idea is to enable guests to migrate seamlessly, while still > making it possible for them to use a passed-through device for more > performance etc. The means to do so is to hook a virtio-net device > together with a network device passed through via vfio. The > vfio-handled device is there for performance, the virtio device for > migratability. We have a new virtio feature bit for that which needs to > be negotiated for that 'combined' device to be available. We have to > consider two cases: > > - Older guests that do not support the new feature bit. We presume that > those guests will be confused if they get two network devices with > the same MAC, so the idea is to not show them the vfio-handled device > at all. > - Guests that negotiate the feature bit. We only know positively that > they (a) know the feature bit and (b) are prepared to handle the > consequences of negotiating it after they set the FEATURES_OK bit. > This is therefore the earliest point in time that the vfio-handled > device should be visible or usable for the guest. > > On Wed, 13 Jun 2018 18:02:01 -0700 > Siwei Liu <loseweigh@gmail.com> wrote: > >> On Tue, Jun 12, 2018 at 5:08 PM, Samudrala, Sridhar >> <sridhar.samudrala@intel.com> wrote: >> > On 6/12/2018 4:34 AM, Michael S. Tsirkin wrote: >> >> >> >> On Mon, Jun 11, 2018 at 10:02:45PM -0700, Samudrala, Sridhar wrote: >> >>> >> >>> On 6/11/2018 7:17 PM, Michael S. Tsirkin wrote: >> >>>> >> >>>> On Tue, Jun 12, 2018 at 09:54:44AM +0800, Jason Wang wrote: >> >>>>> >> >>>>> On 2018年06月12日 01:26, Michael S. Tsirkin wrote: >> >>>>>> >> >>>>>> On Mon, May 07, 2018 at 04:09:54PM -0700, Sridhar Samudrala wrote: >> >>>>>>> >> >>>>>>> This feature bit can be used by hypervisor to indicate virtio_net >> >>>>>>> device to >> >>>>>>> act as a standby for another device with the same MAC address. >> >>>>>>> >> >>>>>>> I tested this with a small change to the patch to mark the STANDBY >> >>>>>>> feature 'true' >> >>>>>>> by default as i am using libvirt to start the VMs. >> >>>>>>> Is there a way to pass the newly added feature bit 'standby' to qemu >> >>>>>>> via libvirt >> >>>>>>> XML file? >> >>>>>>> >> >>>>>>> Signed-off-by: Sridhar Samudrala <sridhar.samudrala@intel.com> >> >>>>>> >> >>>>>> So I do not think we can commit to this interface: we >> >>>>>> really need to control visibility of the primary device. >> >>>>> >> >>>>> The problem is legacy guest won't use primary device at all if we do >> >>>>> this. >> >>>> >> >>>> And that's by design - I think it's the only way to ensure the >> >>>> legacy guest isn't confused. >> >>> >> >>> Yes. I think so. But i am not sure if Qemu is the right place to control >> >>> the visibility >> >>> of the primary device. The primary device may not be specified as an >> >>> argument to Qemu. It >> >>> may be plugged in later. >> >>> The cloud service provider is providing a feature that enables low >> >>> latency datapath and live >> >>> migration capability. >> >>> A tenant can use this feature only if he is running a VM that has >> >>> virtio-net with failover support. > > So, do you know from the outset that there will be such a coupled > device? I.e., is it a property of the VM definition? > > Can there be a 'prepared' virtio-net device that presents the STANDBY > feature even if there currently is no vfio-handled device available -- > but making it possible to simply hotplug that device later? > > Should it be possible to add a virtio/vfio pair later on? > >> >> >> >> Well live migration is there already. The new feature is low latency >> >> data path. >> > >> > >> > we get live migration with just virtio. But I meant live migration with VF >> > as >> > primary device. >> > >> >> >> >> And it's the guest that needs failover support not the VM. >> > >> > >> > Isn't guest and VM synonymous? > > I think we need to be really careful to not mix up the two: The VM > contains the definitions, but it is up to the guest how it uses them. > >> > >> > >> >> >> >> >> >>> I think Qemu should check if guest virtio-net supports this feature and >> >>> provide a mechanism for >> >>> an upper layer indicating if the STANDBY feature is successfully >> >>> negotiated or not. >> >>> The upper layer can then decide if it should hot plug a VF with the same >> >>> MAC and manage the 2 links. >> >>> If VF is successfully hot plugged, virtio-net link should be disabled. >> >> >> >> Did you even talk to upper layer management about it? >> >> Just list the steps they need to do and you will see >> >> that's a lot of machinery to manage by the upper layer. >> >> >> >> What do we gain in flexibility? As far as I can see the >> >> only gain is some resources saved for legacy VMs. >> >> >> >> That's not a lot as tenant of the upper layer probably already has >> >> at least a hunch that it's a new guest otherwise >> >> why bother specifying the feature at all - you >> >> save even more resources without it. >> >> >> > >> > I am not all that familiar with how Qemu manages network devices. If we can >> > do all the >> > required management of the primary/standby devices within Qemu, that is >> > definitely a better >> > approach without upper layer involvement. >> >> Right. I would imagine in the extreme case the upper layer doesn't >> have to be involved at all if QEMU manages all hot plug/unplug logic. >> The management tool can supply passthrough device and virtio with the >> same group UUID, QEMU auto-manages the presence of the primary, and >> hot plug the device as needed before or after the migration. > > I do not really see how you can manage that kind of stuff in QEMU only. > Have you talked to some libvirt folks? (And I'm not sure what you refer > to with 'group UUID'?) > > Also, I think you need to make a distinction between hotplugging a > device and making it visible to the guest. What does 'hotplugging' mean > here? Adding it to the VM definition? Would it be enough to have the > vfio-based device not operational until the virtio feature bit has been > negotiated? > > What happens if the guest does not use the vfio-based device after it > has been made available? Will you still disable the virtio-net link? > (All that link handling definitely sounds like a task for libvirt or > the like.) > > Regarding hot(un)plugging during migration, I think you also need to > keep in mind that different architectures/busses have different > semantics there. Something that works if there's an unplug handshake may > not work on a platform with surprise removal. > > Have you considered guest agents? All of this is punching through > several layers, and I'm not sure if that is a good idea.
On Thu, Jun 14, 2018 at 12:02:31PM +0200, Cornelia Huck wrote: > So, do you know from the outset that there will be such a coupled > device? I.e., is it a property of the VM definition? > > Can there be a 'prepared' virtio-net device that presents the STANDBY > feature even if there currently is no vfio-handled device available -- > but making it possible to simply hotplug that device later? > Should it be possible to add a virtio/vfio pair later on? Yes, that's the plan, more or less. > > >>> I think Qemu should check if guest virtio-net supports this feature and > > >>> provide a mechanism for > > >>> an upper layer indicating if the STANDBY feature is successfully > > >>> negotiated or not. > > >>> The upper layer can then decide if it should hot plug a VF with the same > > >>> MAC and manage the 2 links. > > >>> If VF is successfully hot plugged, virtio-net link should be disabled. BTW I see no reason to do this last part. The role of the standby device is to be always there. > > >> > > >> Did you even talk to upper layer management about it? > > >> Just list the steps they need to do and you will see > > >> that's a lot of machinery to manage by the upper layer. > > >> > > >> What do we gain in flexibility? As far as I can see the > > >> only gain is some resources saved for legacy VMs. > > >> > > >> That's not a lot as tenant of the upper layer probably already has > > >> at least a hunch that it's a new guest otherwise > > >> why bother specifying the feature at all - you > > >> save even more resources without it. > > >> > > > > > > I am not all that familiar with how Qemu manages network devices. If we can > > > do all the > > > required management of the primary/standby devices within Qemu, that is > > > definitely a better > > > approach without upper layer involvement. > > > > Right. I would imagine in the extreme case the upper layer doesn't > > have to be involved at all if QEMU manages all hot plug/unplug logic. > > The management tool can supply passthrough device and virtio with the > > same group UUID, QEMU auto-manages the presence of the primary, and > > hot plug the device as needed before or after the migration. > > I do not really see how you can manage that kind of stuff in QEMU only. So right now failover is limited to pci passthrough devices only. The idea is to realize the vfio device but not expose it to guest. Have a separate command to expose it to guest. Hotunplug would also hide it from guest but not unrealize it. This will help ensure that e.g. on migration failure we can re-expose the device without risk of running out of resources.
On Fri, 15 Jun 2018 05:34:24 +0300 "Michael S. Tsirkin" <mst@redhat.com> wrote: > On Thu, Jun 14, 2018 at 12:02:31PM +0200, Cornelia Huck wrote: > > > > I am not all that familiar with how Qemu manages network devices. If we can > > > > do all the > > > > required management of the primary/standby devices within Qemu, that is > > > > definitely a better > > > > approach without upper layer involvement. > > > > > > Right. I would imagine in the extreme case the upper layer doesn't > > > have to be involved at all if QEMU manages all hot plug/unplug logic. > > > The management tool can supply passthrough device and virtio with the > > > same group UUID, QEMU auto-manages the presence of the primary, and > > > hot plug the device as needed before or after the migration. > > > > I do not really see how you can manage that kind of stuff in QEMU only. > > So right now failover is limited to pci passthrough devices only. > The idea is to realize the vfio device but not expose it > to guest. Have a separate command to expose it to guest. > Hotunplug would also hide it from guest but not unrealize it. So, this would not be real hot(un)plug, but 'hide it from the guest', right? The concept of "we have it realized in QEMU, but the guest can't discover and use it" should be translatable to non-pci as well (at least for ccw). > > This will help ensure that e.g. on migration failure we can > re-expose the device without risk of running out of resources. Makes sense. Should that 'hidden' state be visible/settable from outside as well (e.g. via a property)? I guess yes, so that management software has a chance to see whether a device is visible. Settable may be useful if we find another use case for hiding realized devices.
On Thu, 14 Jun 2018 18:57:11 -0700 Siwei Liu <loseweigh@gmail.com> wrote: > Thank you for sharing your thoughts, Cornelia. With questions below, I > think you raised really good points, some of which I don't have answer > yet and would also like to explore here. > > First off, I don't want to push the discussion to the extreme at this > point, or sell anything about having QEMU manage everything > automatically. Don't get me wrong, it's not there yet. Let's don't > assume we are tied to a specific or concerte solution. I think the key > for our discussion might be to define or refine the boundary between > VM and guest, e.g. what each layer is expected to control and manage > exactly. > > In my view, there might be possibly 3 different options to represent > the failover device conceipt to QEMU and libvirt (or any upper layer > software): > > a. Seperate device: in this model, virtio and passthough remains > separate devices just as today. QEMU exposes the standby feature bit > for virtio, and publish status/event around the negotiation process of > this feature bit for libvirt to react upon. Since Libvirt has the > pairing relationship itself, maybe through MAC address or something > else, it can control the presence of primary by hot plugging or > unplugging the passthrough device, although it has to work tightly > with virtio's feature negotation process. Not just for migration but > also various corner scenarios (driver/feature ok, device reset, > reboot, legacy guest etc) along virtio's feature negotiation. Yes, that one has obvious tie-ins to virtio's modus operandi. > > b. Coupled device: in this model, virtio and passthough devices are > weakly coupled using some group ID, i.e. QEMU match the passthough > device for a standby virtio instance by comparing the group ID value > present behind each device's bridge. Libvirt provides QEMU the group > ID for both type of devices, and only deals with hot plug for > migration, by checking some migration status exposed (e.g. the feature > negotiation status on the virtio device) by QEMU. QEMU manages the > visibility of the primary in guest along virtio's feature negotiation > process. I'm a bit confused here. What, exactly, ties the two devices together? If libvirt already has the knowledge that it should manage the two as a couple, why do we need the group id (or something else for other architectures)? (Maybe I'm simply missing something because I'm not that familiar with pci.) > > c. Fully combined device: in this model, virtio and passthough devices > are viewed as a single VM interface altogther. QEMU not just controls > the visibility of the primary in guest, but can also manage the > exposure of the passthrough for migratability. It can be like that > libvirt supplies the group ID to QEMU. Or libvirt does not even have > to provide group ID for grouping the two devices, if just one single > combined device is exposed by QEMU. In either case, QEMU manages all > aspect of such internal construct, including virtio feature > negotiation, presence of the primary, and live migration. Same question as above. > > It looks like to me that, in your opinion, you seem to prefer go with > (a). While I'm actually okay with either (b) or (c). Do I understand > your point correctly? I'm not yet preferring anything, as I'm still trying to understand how this works :) I hope we can arrive at a model that covers the use case and that is also flexible enough to be extended to other platforms. > > The reason that I feel that (a) might not be ideal, just as Michael > alluded to (quoting below), is that as management stack, it really > doesn't need to care about the detailed process of feature negotiation > (if we view the guest presence of the primary as part of feature > negotiation at an extended level not just virtio). All it needs to be > done is to hand in the required devices to QEMU and that's all. Why do > we need to addd various hooks, events for whichever happens internally > within the guest? > > '' > Primary device is added with a special "primary-failover" flag. > A virtual machine is then initialized with just a standby virtio > device. Primary is not yet added. > > Later QEMU detects that guest driver device set DRIVER_OK. > It then exposes the primary device to the guest, and triggers > a device addition event (hot-plug event) for it. > > If QEMU detects guest driver removal, it initiates a hot-unplug sequence > to remove the primary driver. In particular, if QEMU detects guest > re-initialization (e.g. by detecting guest reset) it immediately removes > the primary device. > '' > > and, > > '' > management just wants to give the primary to guest and later take it back, > it really does not care about the details of the process, > so I don't see what does pushing it up the stack buy you. > > So I don't think it *needs* to be done in libvirt. It probably can if you > add a bunch of hooks so it knows whenever vm reboots, driver binds and > unbinds from device, and can check that backup flag was set. > If you are pushing for a setup like that please get a buy-in > from libvirt maintainers or better write a patch. > '' This actually seems to mean the opposite to me: We need to know what the guest is doing and when, as it directly drives what we need to do with the devices. If we switch to a visibility vs a hotplug model (see the other mail), we might be able to handle that part within qemu. However, I don't see how you get around needing libvirt to actually set this up in the first place and to handle migration per se.
On Fri, Jun 15, 2018 at 11:32:42AM +0200, Cornelia Huck wrote: > On Fri, 15 Jun 2018 05:34:24 +0300 > "Michael S. Tsirkin" <mst@redhat.com> wrote: > > > On Thu, Jun 14, 2018 at 12:02:31PM +0200, Cornelia Huck wrote: > > > > > > I am not all that familiar with how Qemu manages network devices. If we can > > > > > do all the > > > > > required management of the primary/standby devices within Qemu, that is > > > > > definitely a better > > > > > approach without upper layer involvement. > > > > > > > > Right. I would imagine in the extreme case the upper layer doesn't > > > > have to be involved at all if QEMU manages all hot plug/unplug logic. > > > > The management tool can supply passthrough device and virtio with the > > > > same group UUID, QEMU auto-manages the presence of the primary, and > > > > hot plug the device as needed before or after the migration. > > > > > > I do not really see how you can manage that kind of stuff in QEMU only. > > > > So right now failover is limited to pci passthrough devices only. > > The idea is to realize the vfio device but not expose it > > to guest. Have a separate command to expose it to guest. > > Hotunplug would also hide it from guest but not unrealize it. > > So, this would not be real hot(un)plug, but 'hide it from the guest', > right? The concept of "we have it realized in QEMU, but the guest can't > discover and use it" should be translatable to non-pci as well (at > least for ccw). > > > > > This will help ensure that e.g. on migration failure we can > > re-expose the device without risk of running out of resources. > > Makes sense. > > Should that 'hidden' state be visible/settable from outside as well > (e.g. via a property)? I guess yes, so that management software has a > chance to see whether a device is visible. Might be handy for debug, but note that since QEMU manages this state it's transient: can change at any time, so it's kind of hard for management to rely on it. > Settable may be useful if we > find another use case for hiding realized devices.
On Fri, Jun 15, 2018 at 4:48 AM, Cornelia Huck <cohuck@redhat.com> wrote: > On Thu, 14 Jun 2018 18:57:11 -0700 > Siwei Liu <loseweigh@gmail.com> wrote: > >> Thank you for sharing your thoughts, Cornelia. With questions below, I >> think you raised really good points, some of which I don't have answer >> yet and would also like to explore here. >> >> First off, I don't want to push the discussion to the extreme at this >> point, or sell anything about having QEMU manage everything >> automatically. Don't get me wrong, it's not there yet. Let's don't >> assume we are tied to a specific or concerte solution. I think the key >> for our discussion might be to define or refine the boundary between >> VM and guest, e.g. what each layer is expected to control and manage >> exactly. >> >> In my view, there might be possibly 3 different options to represent >> the failover device conceipt to QEMU and libvirt (or any upper layer >> software): >> >> a. Seperate device: in this model, virtio and passthough remains >> separate devices just as today. QEMU exposes the standby feature bit >> for virtio, and publish status/event around the negotiation process of >> this feature bit for libvirt to react upon. Since Libvirt has the >> pairing relationship itself, maybe through MAC address or something >> else, it can control the presence of primary by hot plugging or >> unplugging the passthrough device, although it has to work tightly >> with virtio's feature negotation process. Not just for migration but >> also various corner scenarios (driver/feature ok, device reset, >> reboot, legacy guest etc) along virtio's feature negotiation. > > Yes, that one has obvious tie-ins to virtio's modus operandi. > >> >> b. Coupled device: in this model, virtio and passthough devices are >> weakly coupled using some group ID, i.e. QEMU match the passthough >> device for a standby virtio instance by comparing the group ID value >> present behind each device's bridge. Libvirt provides QEMU the group >> ID for both type of devices, and only deals with hot plug for >> migration, by checking some migration status exposed (e.g. the feature >> negotiation status on the virtio device) by QEMU. QEMU manages the >> visibility of the primary in guest along virtio's feature negotiation >> process. > > I'm a bit confused here. What, exactly, ties the two devices together? The group UUID. Since QEMU VFIO dvice does not have insight of MAC address (which it doesn't have to), the association between VFIO passthrough and standby must be specificed for QEMU to understand the relationship with this model. Note, standby feature is no longer required to be exposed under this model. > If libvirt already has the knowledge that it should manage the two as a > couple, why do we need the group id (or something else for other > architectures)? (Maybe I'm simply missing something because I'm not > that familiar with pci.) The idea is to have QEMU control the visibility and enumeration order of the passthrough VFIO for the failover scenario. Hotplug can be one way to achieve it, and perhaps there's other way around also. The group ID is not just for QEMU to couple devices, it's also helpful to guest too as grouping using MAC address is just not safe. > >> >> c. Fully combined device: in this model, virtio and passthough devices >> are viewed as a single VM interface altogther. QEMU not just controls >> the visibility of the primary in guest, but can also manage the >> exposure of the passthrough for migratability. It can be like that >> libvirt supplies the group ID to QEMU. Or libvirt does not even have >> to provide group ID for grouping the two devices, if just one single >> combined device is exposed by QEMU. In either case, QEMU manages all >> aspect of such internal construct, including virtio feature >> negotiation, presence of the primary, and live migration. > > Same question as above. > >> >> It looks like to me that, in your opinion, you seem to prefer go with >> (a). While I'm actually okay with either (b) or (c). Do I understand >> your point correctly? > > I'm not yet preferring anything, as I'm still trying to understand how > this works :) I hope we can arrive at a model that covers the use case > and that is also flexible enough to be extended to other platforms. > >> >> The reason that I feel that (a) might not be ideal, just as Michael >> alluded to (quoting below), is that as management stack, it really >> doesn't need to care about the detailed process of feature negotiation >> (if we view the guest presence of the primary as part of feature >> negotiation at an extended level not just virtio). All it needs to be >> done is to hand in the required devices to QEMU and that's all. Why do >> we need to addd various hooks, events for whichever happens internally >> within the guest? >> >> '' >> Primary device is added with a special "primary-failover" flag. >> A virtual machine is then initialized with just a standby virtio >> device. Primary is not yet added. >> >> Later QEMU detects that guest driver device set DRIVER_OK. >> It then exposes the primary device to the guest, and triggers >> a device addition event (hot-plug event) for it. >> >> If QEMU detects guest driver removal, it initiates a hot-unplug sequence >> to remove the primary driver. In particular, if QEMU detects guest >> re-initialization (e.g. by detecting guest reset) it immediately removes >> the primary device. >> '' >> >> and, >> >> '' >> management just wants to give the primary to guest and later take it back, >> it really does not care about the details of the process, >> so I don't see what does pushing it up the stack buy you. >> >> So I don't think it *needs* to be done in libvirt. It probably can if you >> add a bunch of hooks so it knows whenever vm reboots, driver binds and >> unbinds from device, and can check that backup flag was set. >> If you are pushing for a setup like that please get a buy-in >> from libvirt maintainers or better write a patch. >> '' > > This actually seems to mean the opposite to me: We need to know what > the guest is doing and when, as it directly drives what we need to do > with the devices. If we switch to a visibility vs a hotplug model (see > the other mail), we might be able to handle that part within qemu. In the model of (b), I think it essentially turns hotplug to one of mechanisms for QEMU to control the visibility. The libvirt can still manage the hotplug of individual devices during live migration or in normal situation to hot add/remove devices. Though the visibility of the VFIO is under the controll of QEMU, and it's possible that the hot add/remove request does not involve actual hot plug activity in guest at all. In the model of (c), the hotplug semantics of the combined device would mean differently - it would end up with devices plugged in or out altogther. To make this work, we either have to build a brand new bond-like QEMU device consist of virtio and VFIO internally, or need to have some abstraction in place for libvirt to manipulate the combined device (and prohibit libvirt from operating on individual internal device directly). Note with this model the group ID doesn't even need to get exposed to libvirt, just imagine libvirt to supply all options required to configure two regular virtio-net and VFIO devices for a single device object, and QEMU will deal with the device's visibility and enumeration, such when to hot plug VFIO device in to or out from the guest. It might be complicated to implement (c) though. Regards, -Siwei > However, I don't see how you get around needing libvirt to actually set > this up in the first place and to handle migration per se.
On Fri, 15 Jun 2018 15:31:43 +0300 "Michael S. Tsirkin" <mst@redhat.com> wrote: > On Fri, Jun 15, 2018 at 11:32:42AM +0200, Cornelia Huck wrote: > > On Fri, 15 Jun 2018 05:34:24 +0300 > > "Michael S. Tsirkin" <mst@redhat.com> wrote: > > > > > On Thu, Jun 14, 2018 at 12:02:31PM +0200, Cornelia Huck wrote: > > > > > > > > I am not all that familiar with how Qemu manages network devices. If we can > > > > > > do all the > > > > > > required management of the primary/standby devices within Qemu, that is > > > > > > definitely a better > > > > > > approach without upper layer involvement. > > > > > > > > > > Right. I would imagine in the extreme case the upper layer doesn't > > > > > have to be involved at all if QEMU manages all hot plug/unplug logic. > > > > > The management tool can supply passthrough device and virtio with the > > > > > same group UUID, QEMU auto-manages the presence of the primary, and > > > > > hot plug the device as needed before or after the migration. > > > > > > > > I do not really see how you can manage that kind of stuff in QEMU only. > > > > > > So right now failover is limited to pci passthrough devices only. > > > The idea is to realize the vfio device but not expose it > > > to guest. Have a separate command to expose it to guest. > > > Hotunplug would also hide it from guest but not unrealize it. > > > > So, this would not be real hot(un)plug, but 'hide it from the guest', > > right? The concept of "we have it realized in QEMU, but the guest can't > > discover and use it" should be translatable to non-pci as well (at > > least for ccw). > > > > > > > > This will help ensure that e.g. on migration failure we can > > > re-expose the device without risk of running out of resources. > > > > Makes sense. > > > > Should that 'hidden' state be visible/settable from outside as well > > (e.g. via a property)? I guess yes, so that management software has a > > chance to see whether a device is visible. > > Might be handy for debug, but note that since QEMU manages this > state it's transient: can change at any time, so it's kind > of hard for management to rely on it. Might be another reason to have this controlled by management software; being able to find out easily why a device is not visible to the guest seems to be a useful thing. Anyway, let's defer this discussion until it is clear how we actually want to handle the whole setup. > > > Settable may be useful if we > > find another use case for hiding realized devices.
On Fri, 15 Jun 2018 10:06:07 -0700 Siwei Liu <loseweigh@gmail.com> wrote: > On Fri, Jun 15, 2018 at 4:48 AM, Cornelia Huck <cohuck@redhat.com> wrote: > > On Thu, 14 Jun 2018 18:57:11 -0700 > > Siwei Liu <loseweigh@gmail.com> wrote: > > > >> Thank you for sharing your thoughts, Cornelia. With questions below, I > >> think you raised really good points, some of which I don't have answer > >> yet and would also like to explore here. > >> > >> First off, I don't want to push the discussion to the extreme at this > >> point, or sell anything about having QEMU manage everything > >> automatically. Don't get me wrong, it's not there yet. Let's don't > >> assume we are tied to a specific or concerte solution. I think the key > >> for our discussion might be to define or refine the boundary between > >> VM and guest, e.g. what each layer is expected to control and manage > >> exactly. > >> > >> In my view, there might be possibly 3 different options to represent > >> the failover device conceipt to QEMU and libvirt (or any upper layer > >> software): > >> > >> a. Seperate device: in this model, virtio and passthough remains > >> separate devices just as today. QEMU exposes the standby feature bit > >> for virtio, and publish status/event around the negotiation process of > >> this feature bit for libvirt to react upon. Since Libvirt has the > >> pairing relationship itself, maybe through MAC address or something > >> else, it can control the presence of primary by hot plugging or > >> unplugging the passthrough device, although it has to work tightly > >> with virtio's feature negotation process. Not just for migration but > >> also various corner scenarios (driver/feature ok, device reset, > >> reboot, legacy guest etc) along virtio's feature negotiation. > > > > Yes, that one has obvious tie-ins to virtio's modus operandi. > > > >> > >> b. Coupled device: in this model, virtio and passthough devices are > >> weakly coupled using some group ID, i.e. QEMU match the passthough > >> device for a standby virtio instance by comparing the group ID value > >> present behind each device's bridge. Libvirt provides QEMU the group > >> ID for both type of devices, and only deals with hot plug for > >> migration, by checking some migration status exposed (e.g. the feature > >> negotiation status on the virtio device) by QEMU. QEMU manages the > >> visibility of the primary in guest along virtio's feature negotiation > >> process. > > > > I'm a bit confused here. What, exactly, ties the two devices together? > > The group UUID. Since QEMU VFIO dvice does not have insight of MAC > address (which it doesn't have to), the association between VFIO > passthrough and standby must be specificed for QEMU to understand the > relationship with this model. Note, standby feature is no longer > required to be exposed under this model. Isn't that a bit limiting, though? With this model, you can probably tie a vfio-pci device and a virtio-net-pci device together. But this will fail if you have different transports: Consider tying together a vfio-pci device and a virtio-net-ccw device on s390, for example. The standby feature bit is on the virtio-net level and should not have any dependency on the transport used. > > > If libvirt already has the knowledge that it should manage the two as a > > couple, why do we need the group id (or something else for other > > architectures)? (Maybe I'm simply missing something because I'm not > > that familiar with pci.) > > The idea is to have QEMU control the visibility and enumeration order > of the passthrough VFIO for the failover scenario. Hotplug can be one > way to achieve it, and perhaps there's other way around also. The > group ID is not just for QEMU to couple devices, it's also helpful to > guest too as grouping using MAC address is just not safe. Sorry about dragging mainframes into this, but this will only work for homogenous device coupling, not for heterogenous. Consider my vfio-pci + virtio-net-ccw example again: The guest cannot find out that the two belong together by checking some group ID, it has to either use the MAC or some needs-to-be-architectured property. Alternatively, we could propose that mechanism as pci-only, which means we can rely on mechanisms that won't necessarily work on non-pci transports. (FWIW, I don't see a use case for using vfio-ccw to pass through a network card anytime in the near future, due to the nature of network cards currently in use on s390.) > > > > >> > >> c. Fully combined device: in this model, virtio and passthough devices > >> are viewed as a single VM interface altogther. QEMU not just controls > >> the visibility of the primary in guest, but can also manage the > >> exposure of the passthrough for migratability. It can be like that > >> libvirt supplies the group ID to QEMU. Or libvirt does not even have > >> to provide group ID for grouping the two devices, if just one single > >> combined device is exposed by QEMU. In either case, QEMU manages all > >> aspect of such internal construct, including virtio feature > >> negotiation, presence of the primary, and live migration. > > > > Same question as above. > > > >> > >> It looks like to me that, in your opinion, you seem to prefer go with > >> (a). While I'm actually okay with either (b) or (c). Do I understand > >> your point correctly? > > > > I'm not yet preferring anything, as I'm still trying to understand how > > this works :) I hope we can arrive at a model that covers the use case > > and that is also flexible enough to be extended to other platforms. > > > >> > >> The reason that I feel that (a) might not be ideal, just as Michael > >> alluded to (quoting below), is that as management stack, it really > >> doesn't need to care about the detailed process of feature negotiation > >> (if we view the guest presence of the primary as part of feature > >> negotiation at an extended level not just virtio). All it needs to be > >> done is to hand in the required devices to QEMU and that's all. Why do > >> we need to addd various hooks, events for whichever happens internally > >> within the guest? > >> > >> '' > >> Primary device is added with a special "primary-failover" flag. > >> A virtual machine is then initialized with just a standby virtio > >> device. Primary is not yet added. > >> > >> Later QEMU detects that guest driver device set DRIVER_OK. > >> It then exposes the primary device to the guest, and triggers > >> a device addition event (hot-plug event) for it. > >> > >> If QEMU detects guest driver removal, it initiates a hot-unplug sequence > >> to remove the primary driver. In particular, if QEMU detects guest > >> re-initialization (e.g. by detecting guest reset) it immediately removes > >> the primary device. > >> '' > >> > >> and, > >> > >> '' > >> management just wants to give the primary to guest and later take it back, > >> it really does not care about the details of the process, > >> so I don't see what does pushing it up the stack buy you. > >> > >> So I don't think it *needs* to be done in libvirt. It probably can if you > >> add a bunch of hooks so it knows whenever vm reboots, driver binds and > >> unbinds from device, and can check that backup flag was set. > >> If you are pushing for a setup like that please get a buy-in > >> from libvirt maintainers or better write a patch. > >> '' > > > > This actually seems to mean the opposite to me: We need to know what > > the guest is doing and when, as it directly drives what we need to do > > with the devices. If we switch to a visibility vs a hotplug model (see > > the other mail), we might be able to handle that part within qemu. > > In the model of (b), I think it essentially turns hotplug to one of > mechanisms for QEMU to control the visibility. The libvirt can still > manage the hotplug of individual devices during live migration or in > normal situation to hot add/remove devices. Though the visibility of > the VFIO is under the controll of QEMU, and it's possible that the hot > add/remove request does not involve actual hot plug activity in guest > at all. That depends on how you model visibility, I guess. You'll probably want to stop traffic flowing through one or the other of the cards; would link down or similar be enough for the virtio device? > > In the model of (c), the hotplug semantics of the combined device > would mean differently - it would end up with devices plugged in or > out altogther. To make this work, we either have to build a brand new > bond-like QEMU device consist of virtio and VFIO internally, or need > to have some abstraction in place for libvirt to manipulate the > combined device (and prohibit libvirt from operating on individual > internal device directly). Note with this model the group ID doesn't > even need to get exposed to libvirt, just imagine libvirt to supply > all options required to configure two regular virtio-net and VFIO > devices for a single device object, and QEMU will deal with the > device's visibility and enumeration, such when to hot plug VFIO device > in to or out from the guest. > > It might be complicated to implement (c) though. I think (c) would be even more complicated for heterogenous setups.
On Tue, Jun 19, 2018 at 3:54 AM, Cornelia Huck <cohuck@redhat.com> wrote: > On Fri, 15 Jun 2018 10:06:07 -0700 > Siwei Liu <loseweigh@gmail.com> wrote: > >> On Fri, Jun 15, 2018 at 4:48 AM, Cornelia Huck <cohuck@redhat.com> wrote: >> > On Thu, 14 Jun 2018 18:57:11 -0700 >> > Siwei Liu <loseweigh@gmail.com> wrote: >> > >> >> Thank you for sharing your thoughts, Cornelia. With questions below, I >> >> think you raised really good points, some of which I don't have answer >> >> yet and would also like to explore here. >> >> >> >> First off, I don't want to push the discussion to the extreme at this >> >> point, or sell anything about having QEMU manage everything >> >> automatically. Don't get me wrong, it's not there yet. Let's don't >> >> assume we are tied to a specific or concerte solution. I think the key >> >> for our discussion might be to define or refine the boundary between >> >> VM and guest, e.g. what each layer is expected to control and manage >> >> exactly. >> >> >> >> In my view, there might be possibly 3 different options to represent >> >> the failover device conceipt to QEMU and libvirt (or any upper layer >> >> software): >> >> >> >> a. Seperate device: in this model, virtio and passthough remains >> >> separate devices just as today. QEMU exposes the standby feature bit >> >> for virtio, and publish status/event around the negotiation process of >> >> this feature bit for libvirt to react upon. Since Libvirt has the >> >> pairing relationship itself, maybe through MAC address or something >> >> else, it can control the presence of primary by hot plugging or >> >> unplugging the passthrough device, although it has to work tightly >> >> with virtio's feature negotation process. Not just for migration but >> >> also various corner scenarios (driver/feature ok, device reset, >> >> reboot, legacy guest etc) along virtio's feature negotiation. >> > >> > Yes, that one has obvious tie-ins to virtio's modus operandi. >> > >> >> >> >> b. Coupled device: in this model, virtio and passthough devices are >> >> weakly coupled using some group ID, i.e. QEMU match the passthough >> >> device for a standby virtio instance by comparing the group ID value >> >> present behind each device's bridge. Libvirt provides QEMU the group >> >> ID for both type of devices, and only deals with hot plug for >> >> migration, by checking some migration status exposed (e.g. the feature >> >> negotiation status on the virtio device) by QEMU. QEMU manages the >> >> visibility of the primary in guest along virtio's feature negotiation >> >> process. >> > >> > I'm a bit confused here. What, exactly, ties the two devices together? >> >> The group UUID. Since QEMU VFIO dvice does not have insight of MAC >> address (which it doesn't have to), the association between VFIO >> passthrough and standby must be specificed for QEMU to understand the >> relationship with this model. Note, standby feature is no longer >> required to be exposed under this model. > > Isn't that a bit limiting, though? > > With this model, you can probably tie a vfio-pci device and a > virtio-net-pci device together. But this will fail if you have > different transports: Consider tying together a vfio-pci device and a > virtio-net-ccw device on s390, for example. The standby feature bit is > on the virtio-net level and should not have any dependency on the > transport used. Probably we'd limit the support for grouping to virtio-net-pci device and vfio-pci device only. For virtio-net-pci, as you might see with Venu's patch, we store the group UUID on the config space of virtio-pci, which is only applicable to PCI transport. If virtio-net-ccw needs to support the same, I think similar grouping interface should be defined on the VirtIO CCW transport. I think the current implementation of the Linux failover driver assumes that it's SR-IOV VF with same MAC address which the virtio-net-pci needs to pair with, and that the PV path is on same PF without needing to update network of the port-MAC association change. If we need to extend the grouping mechanism to virtio-net-ccw, it has to pass such failover mode to virtio driver specifically through some other option I guess. > >> >> > If libvirt already has the knowledge that it should manage the two as a >> > couple, why do we need the group id (or something else for other >> > architectures)? (Maybe I'm simply missing something because I'm not >> > that familiar with pci.) >> >> The idea is to have QEMU control the visibility and enumeration order >> of the passthrough VFIO for the failover scenario. Hotplug can be one >> way to achieve it, and perhaps there's other way around also. The >> group ID is not just for QEMU to couple devices, it's also helpful to >> guest too as grouping using MAC address is just not safe. > > Sorry about dragging mainframes into this, but this will only work for > homogenous device coupling, not for heterogenous. Consider my vfio-pci > + virtio-net-ccw example again: The guest cannot find out that the two > belong together by checking some group ID, it has to either use the MAC > or some needs-to-be-architectured property. > > Alternatively, we could propose that mechanism as pci-only, which means > we can rely on mechanisms that won't necessarily work on non-pci > transports. (FWIW, I don't see a use case for using vfio-ccw to pass > through a network card anytime in the near future, due to the nature of > network cards currently in use on s390.) Yes, let's do this just for PCI transport (homogenous) for now. > >> >> > >> >> >> >> c. Fully combined device: in this model, virtio and passthough devices >> >> are viewed as a single VM interface altogther. QEMU not just controls >> >> the visibility of the primary in guest, but can also manage the >> >> exposure of the passthrough for migratability. It can be like that >> >> libvirt supplies the group ID to QEMU. Or libvirt does not even have >> >> to provide group ID for grouping the two devices, if just one single >> >> combined device is exposed by QEMU. In either case, QEMU manages all >> >> aspect of such internal construct, including virtio feature >> >> negotiation, presence of the primary, and live migration. >> > >> > Same question as above. >> > >> >> >> >> It looks like to me that, in your opinion, you seem to prefer go with >> >> (a). While I'm actually okay with either (b) or (c). Do I understand >> >> your point correctly? >> > >> > I'm not yet preferring anything, as I'm still trying to understand how >> > this works :) I hope we can arrive at a model that covers the use case >> > and that is also flexible enough to be extended to other platforms. >> > >> >> >> >> The reason that I feel that (a) might not be ideal, just as Michael >> >> alluded to (quoting below), is that as management stack, it really >> >> doesn't need to care about the detailed process of feature negotiation >> >> (if we view the guest presence of the primary as part of feature >> >> negotiation at an extended level not just virtio). All it needs to be >> >> done is to hand in the required devices to QEMU and that's all. Why do >> >> we need to addd various hooks, events for whichever happens internally >> >> within the guest? >> >> >> >> '' >> >> Primary device is added with a special "primary-failover" flag. >> >> A virtual machine is then initialized with just a standby virtio >> >> device. Primary is not yet added. >> >> >> >> Later QEMU detects that guest driver device set DRIVER_OK. >> >> It then exposes the primary device to the guest, and triggers >> >> a device addition event (hot-plug event) for it. >> >> >> >> If QEMU detects guest driver removal, it initiates a hot-unplug sequence >> >> to remove the primary driver. In particular, if QEMU detects guest >> >> re-initialization (e.g. by detecting guest reset) it immediately removes >> >> the primary device. >> >> '' >> >> >> >> and, >> >> >> >> '' >> >> management just wants to give the primary to guest and later take it back, >> >> it really does not care about the details of the process, >> >> so I don't see what does pushing it up the stack buy you. >> >> >> >> So I don't think it *needs* to be done in libvirt. It probably can if you >> >> add a bunch of hooks so it knows whenever vm reboots, driver binds and >> >> unbinds from device, and can check that backup flag was set. >> >> If you are pushing for a setup like that please get a buy-in >> >> from libvirt maintainers or better write a patch. >> >> '' >> > >> > This actually seems to mean the opposite to me: We need to know what >> > the guest is doing and when, as it directly drives what we need to do >> > with the devices. If we switch to a visibility vs a hotplug model (see >> > the other mail), we might be able to handle that part within qemu. >> >> In the model of (b), I think it essentially turns hotplug to one of >> mechanisms for QEMU to control the visibility. The libvirt can still >> manage the hotplug of individual devices during live migration or in >> normal situation to hot add/remove devices. Though the visibility of >> the VFIO is under the controll of QEMU, and it's possible that the hot >> add/remove request does not involve actual hot plug activity in guest >> at all. > > That depends on how you model visibility, I guess. You'll probably want > to stop traffic flowing through one or the other of the cards; would > link down or similar be enough for the virtio device? I'm not sure if it is a good idea. The guest user will see two devices with same MAC but one of them is down. Do you expect user to use it or not? And since the guest is going to be migrated, we need to unplug a broken VF from guest before migrating, why do we bother plugging in this useless VF at the first place? Thanks, -Siwei > >> >> In the model of (c), the hotplug semantics of the combined device >> would mean differently - it would end up with devices plugged in or >> out altogther. To make this work, we either have to build a brand new >> bond-like QEMU device consist of virtio and VFIO internally, or need >> to have some abstraction in place for libvirt to manipulate the >> combined device (and prohibit libvirt from operating on individual >> internal device directly). Note with this model the group ID doesn't >> even need to get exposed to libvirt, just imagine libvirt to supply >> all options required to configure two regular virtio-net and VFIO >> devices for a single device object, and QEMU will deal with the >> device's visibility and enumeration, such when to hot plug VFIO device >> in to or out from the guest. >> >> It might be complicated to implement (c) though. > > I think (c) would be even more complicated for heterogenous setups.
On Tue, Jun 19, 2018 at 12:54:53PM +0200, Cornelia Huck wrote: > Sorry about dragging mainframes into this, but this will only work for > homogenous device coupling, not for heterogenous. Consider my vfio-pci > + virtio-net-ccw example again: The guest cannot find out that the two > belong together by checking some group ID, it has to either use the MAC > or some needs-to-be-architectured property. > > Alternatively, we could propose that mechanism as pci-only, which means > we can rely on mechanisms that won't necessarily work on non-pci > transports. (FWIW, I don't see a use case for using vfio-ccw to pass > through a network card anytime in the near future, due to the nature of > network cards currently in use on s390.) That's what it boils down to, yes. If there's need to have this for non-pci devices, then we should put it in config space. Cornelia, what do you think?
On Tue, 19 Jun 2018 23:32:06 +0300 "Michael S. Tsirkin" <mst@redhat.com> wrote: > On Tue, Jun 19, 2018 at 12:54:53PM +0200, Cornelia Huck wrote: > > Sorry about dragging mainframes into this, but this will only work for > > homogenous device coupling, not for heterogenous. Consider my vfio-pci > > + virtio-net-ccw example again: The guest cannot find out that the two > > belong together by checking some group ID, it has to either use the MAC > > or some needs-to-be-architectured property. > > > > Alternatively, we could propose that mechanism as pci-only, which means > > we can rely on mechanisms that won't necessarily work on non-pci > > transports. (FWIW, I don't see a use case for using vfio-ccw to pass > > through a network card anytime in the near future, due to the nature of > > network cards currently in use on s390.) > > That's what it boils down to, yes. If there's need to have this for > non-pci devices, then we should put it in config space. > Cornelia, what do you think? > I think the only really useful config on s390 is the vfio-pci network card coupled with a virtio-net-ccw device: Using an s390 network card via vfio-ccw is out due to the nature of the s390 network cards, and virtio-ccw is the default transport (virtio-pci is not supported on any enterprise distro AFAIK). For this, having a uuid in the config space could work (vfio-pci devices have a config space by virtue of being pci devices, and virtio-net-ccw devices have a config space by virtue of being virtio devices -- ccw devices usually don't have that concept).
On Wed, Jun 20, 2018 at 11:53:59AM +0200, Cornelia Huck wrote: > On Tue, 19 Jun 2018 23:32:06 +0300 > "Michael S. Tsirkin" <mst@redhat.com> wrote: > > > On Tue, Jun 19, 2018 at 12:54:53PM +0200, Cornelia Huck wrote: > > > Sorry about dragging mainframes into this, but this will only work for > > > homogenous device coupling, not for heterogenous. Consider my vfio-pci > > > + virtio-net-ccw example again: The guest cannot find out that the two > > > belong together by checking some group ID, it has to either use the MAC > > > or some needs-to-be-architectured property. > > > > > > Alternatively, we could propose that mechanism as pci-only, which means > > > we can rely on mechanisms that won't necessarily work on non-pci > > > transports. (FWIW, I don't see a use case for using vfio-ccw to pass > > > through a network card anytime in the near future, due to the nature of > > > network cards currently in use on s390.) > > > > That's what it boils down to, yes. If there's need to have this for > > non-pci devices, then we should put it in config space. > > Cornelia, what do you think? > > > > I think the only really useful config on s390 is the vfio-pci network > card coupled with a virtio-net-ccw device: Using an s390 network card > via vfio-ccw is out due to the nature of the s390 network cards, and > virtio-ccw is the default transport (virtio-pci is not supported on any > enterprise distro AFAIK). > > For this, having a uuid in the config space could work (vfio-pci > devices have a config space by virtue of being pci devices, and > virtio-net-ccw devices have a config space by virtue of being virtio > devices -- ccw devices usually don't have that concept). OK so this calls for adding such a field generally (it's device agnostic right now). How would you suggest doing that?
On Tue, 19 Jun 2018 13:09:14 -0700 Siwei Liu <loseweigh@gmail.com> wrote: > On Tue, Jun 19, 2018 at 3:54 AM, Cornelia Huck <cohuck@redhat.com> wrote: > > On Fri, 15 Jun 2018 10:06:07 -0700 > > Siwei Liu <loseweigh@gmail.com> wrote: > > > >> On Fri, Jun 15, 2018 at 4:48 AM, Cornelia Huck <cohuck@redhat.com> wrote: > >> > On Thu, 14 Jun 2018 18:57:11 -0700 > >> > Siwei Liu <loseweigh@gmail.com> wrote: > >> > I'm a bit confused here. What, exactly, ties the two devices together? > >> > >> The group UUID. Since QEMU VFIO dvice does not have insight of MAC > >> address (which it doesn't have to), the association between VFIO > >> passthrough and standby must be specificed for QEMU to understand the > >> relationship with this model. Note, standby feature is no longer > >> required to be exposed under this model. > > > > Isn't that a bit limiting, though? > > > > With this model, you can probably tie a vfio-pci device and a > > virtio-net-pci device together. But this will fail if you have > > different transports: Consider tying together a vfio-pci device and a > > virtio-net-ccw device on s390, for example. The standby feature bit is > > on the virtio-net level and should not have any dependency on the > > transport used. > > Probably we'd limit the support for grouping to virtio-net-pci device > and vfio-pci device only. For virtio-net-pci, as you might see with > Venu's patch, we store the group UUID on the config space of > virtio-pci, which is only applicable to PCI transport. > > If virtio-net-ccw needs to support the same, I think similar grouping > interface should be defined on the VirtIO CCW transport. I think the > current implementation of the Linux failover driver assumes that it's > SR-IOV VF with same MAC address which the virtio-net-pci needs to pair > with, and that the PV path is on same PF without needing to update > network of the port-MAC association change. If we need to extend the > grouping mechanism to virtio-net-ccw, it has to pass such failover > mode to virtio driver specifically through some other option I guess. Hm, I've just spent some time reading the Linux failover code and I did not really find much pci-related magic in there (other than checking for a pci device in net_failover_slave_pre_register). We also seem to look for a matching device by MAC only. What magic am I missing? Is the look-for-uuid handling supposed to happen in the host only? > >> > If libvirt already has the knowledge that it should manage the two as a > >> > couple, why do we need the group id (or something else for other > >> > architectures)? (Maybe I'm simply missing something because I'm not > >> > that familiar with pci.) > >> > >> The idea is to have QEMU control the visibility and enumeration order > >> of the passthrough VFIO for the failover scenario. Hotplug can be one > >> way to achieve it, and perhaps there's other way around also. The > >> group ID is not just for QEMU to couple devices, it's also helpful to > >> guest too as grouping using MAC address is just not safe. > > > > Sorry about dragging mainframes into this, but this will only work for > > homogenous device coupling, not for heterogenous. Consider my vfio-pci > > + virtio-net-ccw example again: The guest cannot find out that the two > > belong together by checking some group ID, it has to either use the MAC > > or some needs-to-be-architectured property. > > > > Alternatively, we could propose that mechanism as pci-only, which means > > we can rely on mechanisms that won't necessarily work on non-pci > > transports. (FWIW, I don't see a use case for using vfio-ccw to pass > > through a network card anytime in the near future, due to the nature of > > network cards currently in use on s390.) > > Yes, let's do this just for PCI transport (homogenous) for now. But why? Using pci for passthrough to make things easier (and because there's not really a use case), sure. But I really don't want to restrict this to virtio-pci only. > >> In the model of (b), I think it essentially turns hotplug to one of > >> mechanisms for QEMU to control the visibility. The libvirt can still > >> manage the hotplug of individual devices during live migration or in > >> normal situation to hot add/remove devices. Though the visibility of > >> the VFIO is under the controll of QEMU, and it's possible that the hot > >> add/remove request does not involve actual hot plug activity in guest > >> at all. > > > > That depends on how you model visibility, I guess. You'll probably want > > to stop traffic flowing through one or the other of the cards; would > > link down or similar be enough for the virtio device? > > I'm not sure if it is a good idea. The guest user will see two devices > with same MAC but one of them is down. Do you expect user to use it or > not? And since the guest is going to be migrated, we need to unplug a > broken VF from guest before migrating, why do we bother plugging in > this useless VF at the first place? I was thinking about using hotunplugging only over migration and doing the link up only after feature negotiation has finished, but that is probably too complicated. Let's stick to hotplug for simplicity's sake.
On Wed, 20 Jun 2018 17:11:59 +0300 "Michael S. Tsirkin" <mst@redhat.com> wrote: > On Wed, Jun 20, 2018 at 11:53:59AM +0200, Cornelia Huck wrote: > > On Tue, 19 Jun 2018 23:32:06 +0300 > > "Michael S. Tsirkin" <mst@redhat.com> wrote: > > > > > On Tue, Jun 19, 2018 at 12:54:53PM +0200, Cornelia Huck wrote: > > > > Sorry about dragging mainframes into this, but this will only work for > > > > homogenous device coupling, not for heterogenous. Consider my vfio-pci > > > > + virtio-net-ccw example again: The guest cannot find out that the two > > > > belong together by checking some group ID, it has to either use the MAC > > > > or some needs-to-be-architectured property. > > > > > > > > Alternatively, we could propose that mechanism as pci-only, which means > > > > we can rely on mechanisms that won't necessarily work on non-pci > > > > transports. (FWIW, I don't see a use case for using vfio-ccw to pass > > > > through a network card anytime in the near future, due to the nature of > > > > network cards currently in use on s390.) > > > > > > That's what it boils down to, yes. If there's need to have this for > > > non-pci devices, then we should put it in config space. > > > Cornelia, what do you think? > > > > > > > I think the only really useful config on s390 is the vfio-pci network > > card coupled with a virtio-net-ccw device: Using an s390 network card > > via vfio-ccw is out due to the nature of the s390 network cards, and > > virtio-ccw is the default transport (virtio-pci is not supported on any > > enterprise distro AFAIK). > > > > For this, having a uuid in the config space could work (vfio-pci > > devices have a config space by virtue of being pci devices, and > > virtio-net-ccw devices have a config space by virtue of being virtio > > devices -- ccw devices usually don't have that concept). > > OK so this calls for adding such a field generally (it's > device agnostic right now). > > How would you suggest doing that? I hope that I'm not thoroughly confused at this point in time, so I'll summarize my current understanding (also keep in mind that I haven't looked at Venu's patches yet): - The Linux guest initiates coupling from the virtio-net driver. Matching the other device is done via the MAC, and only pci devices are allowed for the failover device. (There does not seem to be any restriction on the transport of the virtio-net device.) - The Linux guest virtio-net driver does not allow changing the MAC if standby has been negotiated (implying that the hypervisor needs to configure the correct MAC). - In QEMU, we need to know which two devices (vfio-pci and virtio-net) go together, so that the virtio-net device gets the correct MAC. We also need the pairing so that we can make the vfio-pci device available once the guest has negotiated the standby feature. We can tack the two devices together in QEMU by introducing new, optional properties pointing from the virtio-net device to the vfio-pci device (only offer standby if this is set) and the other way around (don't make the device visible at the start if this is set). Problems: - The admin needs to figure out the MAC by themselves and set it correctly. If this is incorrect, the vfio-pci device cannot be found in the guest. (Not sure how much of a problem this is in practice -- and QEMU cannot figure out the MAC without poking at the vfio-pci device, and we probably want to avoid that.) - This two-way pointing makes for interesting handing of the command line and when both devices are plugged later. In any case, I'm not sure anymore why we'd want the extra uuid. Is there any way QEMU (or libvirt) can figure it out without actually looking at the vfio-pci device?
On Wed, Jun 20, 2018 at 06:06:19PM +0200, Cornelia Huck wrote:
> In any case, I'm not sure anymore why we'd want the extra uuid.
It's mostly so we can have e.g. multiple devices with same MAC
(which some people seem to want in order to then use
then with different containers).
But it is also handy for when you assign a PF, since then you
can't set the MAC.
On Wed, Jun 20, 2018 at 7:34 AM, Cornelia Huck <cohuck@redhat.com> wrote: > On Tue, 19 Jun 2018 13:09:14 -0700 > Siwei Liu <loseweigh@gmail.com> wrote: > >> On Tue, Jun 19, 2018 at 3:54 AM, Cornelia Huck <cohuck@redhat.com> wrote: >> > On Fri, 15 Jun 2018 10:06:07 -0700 >> > Siwei Liu <loseweigh@gmail.com> wrote: >> > >> >> On Fri, Jun 15, 2018 at 4:48 AM, Cornelia Huck <cohuck@redhat.com> wrote: >> >> > On Thu, 14 Jun 2018 18:57:11 -0700 >> >> > Siwei Liu <loseweigh@gmail.com> wrote: > >> >> > I'm a bit confused here. What, exactly, ties the two devices together? >> >> >> >> The group UUID. Since QEMU VFIO dvice does not have insight of MAC >> >> address (which it doesn't have to), the association between VFIO >> >> passthrough and standby must be specificed for QEMU to understand the >> >> relationship with this model. Note, standby feature is no longer >> >> required to be exposed under this model. >> > >> > Isn't that a bit limiting, though? >> > >> > With this model, you can probably tie a vfio-pci device and a >> > virtio-net-pci device together. But this will fail if you have >> > different transports: Consider tying together a vfio-pci device and a >> > virtio-net-ccw device on s390, for example. The standby feature bit is >> > on the virtio-net level and should not have any dependency on the >> > transport used. >> >> Probably we'd limit the support for grouping to virtio-net-pci device >> and vfio-pci device only. For virtio-net-pci, as you might see with >> Venu's patch, we store the group UUID on the config space of >> virtio-pci, which is only applicable to PCI transport. >> >> If virtio-net-ccw needs to support the same, I think similar grouping >> interface should be defined on the VirtIO CCW transport. I think the >> current implementation of the Linux failover driver assumes that it's >> SR-IOV VF with same MAC address which the virtio-net-pci needs to pair >> with, and that the PV path is on same PF without needing to update >> network of the port-MAC association change. If we need to extend the >> grouping mechanism to virtio-net-ccw, it has to pass such failover >> mode to virtio driver specifically through some other option I guess. > > Hm, I've just spent some time reading the Linux failover code and I did > not really find much pci-related magic in there (other than checking > for a pci device in net_failover_slave_pre_register). We also seem to > look for a matching device by MAC only. What magic am I missing? The existing assumptions around SR-IOV VF and thus PCI is implicit. A lot of simplications are built on the fact that the passthrough device is a SR-IOV Virtual Function specifically than others: MAC addresses for couple devices must be the same, changing MAC address is prohibited, programming VLAN filter is challenged, the datapath of virtio-net has to share the same physical function where VF belongs to. There's no hankshake during datapath switching at all to support a normal passthrough device at this point. I'd imagine some work around that ahead, which might be a bit involved than just to support a simplified model for VF migration. > > Is the look-for-uuid handling supposed to happen in the host only? The look-for-MAC matching scheme is not ideal in many aspects. I don't want to repeat those again, but once the group UUID is added to QEMU, the failover driver is supposed to switch to the UUID based matching scheme in the guest. > >> >> > If libvirt already has the knowledge that it should manage the two as a >> >> > couple, why do we need the group id (or something else for other >> >> > architectures)? (Maybe I'm simply missing something because I'm not >> >> > that familiar with pci.) >> >> >> >> The idea is to have QEMU control the visibility and enumeration order >> >> of the passthrough VFIO for the failover scenario. Hotplug can be one >> >> way to achieve it, and perhaps there's other way around also. The >> >> group ID is not just for QEMU to couple devices, it's also helpful to >> >> guest too as grouping using MAC address is just not safe. >> > >> > Sorry about dragging mainframes into this, but this will only work for >> > homogenous device coupling, not for heterogenous. Consider my vfio-pci >> > + virtio-net-ccw example again: The guest cannot find out that the two >> > belong together by checking some group ID, it has to either use the MAC >> > or some needs-to-be-architectured property. >> > >> > Alternatively, we could propose that mechanism as pci-only, which means >> > we can rely on mechanisms that won't necessarily work on non-pci >> > transports. (FWIW, I don't see a use case for using vfio-ccw to pass >> > through a network card anytime in the near future, due to the nature of >> > network cards currently in use on s390.) >> >> Yes, let's do this just for PCI transport (homogenous) for now. > > But why? Using pci for passthrough to make things easier (and because > there's not really a use case), sure. But I really don't want to > restrict this to virtio-pci only. Of course, technically it doesn't have to be virtio-pci only. The group UUID can even extend it further to non-pci transport. However, with the current focus of the driver support on SR-IOV VF and limited use case on non-pci, I'd feel no immediate effort will be needed on that front. > >> >> In the model of (b), I think it essentially turns hotplug to one of >> >> mechanisms for QEMU to control the visibility. The libvirt can still >> >> manage the hotplug of individual devices during live migration or in >> >> normal situation to hot add/remove devices. Though the visibility of >> >> the VFIO is under the controll of QEMU, and it's possible that the hot >> >> add/remove request does not involve actual hot plug activity in guest >> >> at all. >> > >> > That depends on how you model visibility, I guess. You'll probably want >> > to stop traffic flowing through one or the other of the cards; would >> > link down or similar be enough for the virtio device? >> >> I'm not sure if it is a good idea. The guest user will see two devices >> with same MAC but one of them is down. Do you expect user to use it or >> not? And since the guest is going to be migrated, we need to unplug a >> broken VF from guest before migrating, why do we bother plugging in >> this useless VF at the first place? > > I was thinking about using hotunplugging only over migration and doing > the link up only after feature negotiation has finished, but that is > probably too complicated. Let's stick to hotplug for simplicity's sake. OK. Thanks for the discussion, it's really useful. Regards, -Siwei
On Wed, 20 Jun 2018 22:48:58 +0300 "Michael S. Tsirkin" <mst@redhat.com> wrote: > On Wed, Jun 20, 2018 at 06:06:19PM +0200, Cornelia Huck wrote: > > In any case, I'm not sure anymore why we'd want the extra uuid. > > It's mostly so we can have e.g. multiple devices with same MAC > (which some people seem to want in order to then use > then with different containers). > > But it is also handy for when you assign a PF, since then you > can't set the MAC. > OK, so what about the following: - introduce a new feature bit, VIRTIO_NET_F_STANDBY_UUID that indicates that we have a new uuid field in the virtio-net config space - in QEMU, add a property for virtio-net that allows to specify a uuid, offer VIRTIO_NET_F_STANDBY_UUID if set - when configuring, set the property to the group UUID of the vfio-pci device - in the guest, use the uuid from the virtio-net device's config space if applicable; else, fall back to matching by MAC as done today That should work for all virtio transports.
On Wed, Jun 13, 2018 at 01:40:59PM +0800, Jason Wang wrote: > > > On 2018年06月13日 12:24, Samudrala, Sridhar wrote: > > On 6/12/2018 7:38 PM, Jason Wang wrote: > > > > > > > > > On 2018年06月12日 19:54, Michael S. Tsirkin wrote: > > > > On Wed, Jun 06, 2018 at 10:29:03AM +0800, Jason Wang wrote: > > > > > > > > > > On 2018年06月05日 20:33, Michael S. Tsirkin wrote: > > > > > > I don't think this is sufficient. > > > > > > > > > > > > If both primary and standby devices are present, a > > > > > > legacy guest without > > > > > > support for the feature might see two devices with same mac and get > > > > > > confused. > > > > > > > > > > > > I think that we should only make primary visible after > > > > > > guest acked the > > > > > > backup feature bit. > > > > > I think we want exactly the reverse? E.g fail the > > > > > negotiation when guest > > > > > does not ack backup feature. > > > > > > > > > > Otherwise legacy guest won't even have the chance to see > > > > > primary device in > > > > > the guest. > > > > That's by design. > > > > > > So management needs to know the capability of guest to set the > > > backup feature. This looks a chicken or egg problem to me. > > > > I don't think so. If the tenant requests 'accelerated datapath feature', > > the management > > will set 'standby' feature bit on virtio-net interface and if the guest > > virtio-net driver > > supports this feature, then the tenant VM will get that capability via a > > hot-plugged > > primary device. > > Ok, I thought exactly the reverse because of the commit title is "enable > virtio_net to act as a standby for a passthru device". But re-read the > commit log content, I understand the case a little bit. Btw, VF is not > necessarily faster than virtio-net, especially consider virtio-net may have > a lot of queues. Don't do that then, right? > > > > > > > > > > > > > > > > > And on reset or when backup is cleared in some other way, unplug the > > > > > > primary. > > > > > What if guest does not support hotplug? > > > > It shouldn't ack the backup feature then and it's a good point. > > > > We should both document this and check kernel config has > > > > hotplug support. Sridhar could you take a look pls? > > > > > > > > > > Something like the below will do the job: > > > > > > > > > > > > Primary device is added with a special "primary-failover" flag. > > > > > > A virtual machine is then initialized with just a standby virtio > > > > > > device. Primary is not yet added. > > > > > A question is how to do the matching? Qemu knows nothing about e.g mac > > > > > address of a pass-through device I believe? > > > > Supply a flag to the VFIO when it's added, this way QEMU will know. > > > > > > > > > > Later QEMU detects that guest driver device set DRIVER_OK. > > > > > > It then exposes the primary device to the guest, and triggers > > > > > > a device addition event (hot-plug event) for it. > > > > > Do you mean we won't have primary device in the initial qemu cli? > > > > No, that's not what I mean. > > > > > > > > I mean management will supply a flag to VFIO and then > > > > > > > > > > > > - VFIO defers exposing > > > > primary to guest until guest acks the feature bit. > > > > - When we see guest ack, initiate hotplug. > > > > - On reboot, hide it again. > > > > - On reset without reboot, request hot-unplug and on eject hide > > > > it again. > > > > > > This sounds much like a kind of bonding in qemu. > > > > > > > > > > > > > If QEMU detects guest driver removal, it initiates a > > > > > > hot-unplug sequence > > > > > > to remove the primary driver. In particular, if QEMU detects guest > > > > > > re-initialization (e.g. by detecting guest reset) it > > > > > > immediately removes > > > > > > the primary device. > > > > > I believe guest failover module should handle this gracefully? > > > > It can't control enumeration order, if primary is enumerated before > > > > standby then guest will load its driver and it's too late > > > > when failover driver is loaded. > > > > > > Well, even if we can delay the visibility of primary until > > > DRIVER_OK, there still be a race I think? And it looks to me it's > > > still a bug of guest: > > > > > > E.g primary could be probed before failover_register() in guest. > > > Then we will miss the enslaving of primary forever. > > > > That is not an issue. Even if the primary is probed before failover > > driver, it will > > enslave the primary via the call to failover_existing_slave_register() > > as part of > > failover_register() routine. > > Aha I get it. So the enumeration order is not an issue. > > Consider primary may still be seen by guest kernel even if we delay its > visibility, I wonder whether we can control the lifecycle of primary through > driver but not qemu. This can simplify a lot of things. > > Thanks > > > > > > > > > Thanks > > > > > > > > > > > > Thanks > > > > > > > > > > > We can move some of this code to management as well, > > > > > > architecturally it > > > > > > does not make too much sense but it might be easier > > > > > > implementation-wise. > > > > > > > > > > > > HTH > > > > > > > > > > > > On Mon, Jun 04, 2018 at 06:41:48PM -0700, Samudrala, Sridhar wrote: > > > > > > > Ping on this patch now that the kernel patches are > > > > > > > accepted into davem's net-next tree. > > > > > > > https://patchwork.ozlabs.org/cover/920005/ > > > > > > > > > > > > > > > > > > > > > On 5/7/2018 4:09 PM, Sridhar Samudrala wrote: > > > > > > > > This feature bit can be used by hypervisor to > > > > > > > > indicate virtio_net device to > > > > > > > > act as a standby for another device with the same MAC address. > > > > > > > > > > > > > > > > I tested this with a small change to the patch > > > > > > > > to mark the STANDBY feature 'true' > > > > > > > > by default as i am using libvirt to start the VMs. > > > > > > > > Is there a way to pass the newly added feature > > > > > > > > bit 'standby' to qemu via libvirt > > > > > > > > XML file? > > > > > > > > > > > > > > > > Signed-off-by: Sridhar Samudrala <sridhar.samudrala@intel.com> > > > > > > > > --- > > > > > > > > hw/net/virtio-net.c | 2 ++ > > > > > > > > include/standard-headers/linux/virtio_net.h | 3 +++ > > > > > > > > 2 files changed, 5 insertions(+) > > > > > > > > > > > > > > > > diff --git a/hw/net/virtio-net.c b/hw/net/virtio-net.c > > > > > > > > index 90502fca7c..38b3140670 100644 > > > > > > > > --- a/hw/net/virtio-net.c > > > > > > > > +++ b/hw/net/virtio-net.c > > > > > > > > @@ -2198,6 +2198,8 @@ static Property virtio_net_properties[] = { > > > > > > > > true), > > > > > > > > DEFINE_PROP_INT32("speed", VirtIONet, > > > > > > > > net_conf.speed, SPEED_UNKNOWN), > > > > > > > > DEFINE_PROP_STRING("duplex", VirtIONet, > > > > > > > > net_conf.duplex_str), > > > > > > > > + DEFINE_PROP_BIT64("standby", VirtIONet, > > > > > > > > host_features, VIRTIO_NET_F_STANDBY, > > > > > > > > + false), > > > > > > > > DEFINE_PROP_END_OF_LIST(), > > > > > > > > }; > > > > > > > > diff --git > > > > > > > > a/include/standard-headers/linux/virtio_net.h > > > > > > > > b/include/standard-headers/linux/virtio_net.h > > > > > > > > index e9f255ea3f..01ec09684c 100644 > > > > > > > > --- a/include/standard-headers/linux/virtio_net.h > > > > > > > > +++ b/include/standard-headers/linux/virtio_net.h > > > > > > > > @@ -57,6 +57,9 @@ > > > > > > > > * Steering */ > > > > > > > > #define VIRTIO_NET_F_CTRL_MAC_ADDR 23 /* Set MAC address */ > > > > > > > > +#define VIRTIO_NET_F_STANDBY 62 /* Act > > > > > > > > as standby for another device > > > > > > > > + * with the same MAC. > > > > > > > > + */ > > > > > > > > #define VIRTIO_NET_F_SPEED_DUPLEX 63 /* > > > > > > > > Device set linkspeed and duplex */ > > > > > > > > #ifndef VIRTIO_NET_NO_LEGACY > > > > >
On Thu, Jun 21, 2018 at 04:59:13PM +0200, Cornelia Huck wrote: > On Wed, 20 Jun 2018 22:48:58 +0300 > "Michael S. Tsirkin" <mst@redhat.com> wrote: > > > On Wed, Jun 20, 2018 at 06:06:19PM +0200, Cornelia Huck wrote: > > > In any case, I'm not sure anymore why we'd want the extra uuid. > > > > It's mostly so we can have e.g. multiple devices with same MAC > > (which some people seem to want in order to then use > > then with different containers). > > > > But it is also handy for when you assign a PF, since then you > > can't set the MAC. > > > > OK, so what about the following: > > - introduce a new feature bit, VIRTIO_NET_F_STANDBY_UUID that indicates > that we have a new uuid field in the virtio-net config space > - in QEMU, add a property for virtio-net that allows to specify a uuid, > offer VIRTIO_NET_F_STANDBY_UUID if set > - when configuring, set the property to the group UUID of the vfio-pci > device > - in the guest, use the uuid from the virtio-net device's config space > if applicable; else, fall back to matching by MAC as done today > > That should work for all virtio transports. True. I'm a bit unhappy that it's virtio net specific though since down the road I expect we'll have a very similar feature for scsi (and maybe others). But we do not have a way to have fields that are portable both across devices and transports, and I think it would be a useful addition. How would this work though? Any idea?
On Thu, Jun 21, 2018 at 11:14 AM, Michael S. Tsirkin <mst@redhat.com> wrote: > On Wed, Jun 13, 2018 at 01:40:59PM +0800, Jason Wang wrote: >> >> >> On 2018年06月13日 12:24, Samudrala, Sridhar wrote: >> > On 6/12/2018 7:38 PM, Jason Wang wrote: >> > > >> > > >> > > On 2018年06月12日 19:54, Michael S. Tsirkin wrote: >> > > > On Wed, Jun 06, 2018 at 10:29:03AM +0800, Jason Wang wrote: >> > > > > >> > > > > On 2018年06月05日 20:33, Michael S. Tsirkin wrote: >> > > > > > I don't think this is sufficient. >> > > > > > >> > > > > > If both primary and standby devices are present, a >> > > > > > legacy guest without >> > > > > > support for the feature might see two devices with same mac and get >> > > > > > confused. >> > > > > > >> > > > > > I think that we should only make primary visible after >> > > > > > guest acked the >> > > > > > backup feature bit. >> > > > > I think we want exactly the reverse? E.g fail the >> > > > > negotiation when guest >> > > > > does not ack backup feature. >> > > > > >> > > > > Otherwise legacy guest won't even have the chance to see >> > > > > primary device in >> > > > > the guest. >> > > > That's by design. >> > > >> > > So management needs to know the capability of guest to set the >> > > backup feature. This looks a chicken or egg problem to me. >> > >> > I don't think so. If the tenant requests 'accelerated datapath feature', >> > the management >> > will set 'standby' feature bit on virtio-net interface and if the guest >> > virtio-net driver >> > supports this feature, then the tenant VM will get that capability via a >> > hot-plugged >> > primary device. >> >> Ok, I thought exactly the reverse because of the commit title is "enable >> virtio_net to act as a standby for a passthru device". But re-read the >> commit log content, I understand the case a little bit. Btw, VF is not >> necessarily faster than virtio-net, especially consider virtio-net may have >> a lot of queues. > > Don't do that then, right? I don't understand. Where did the standby feature come to imply the "accelerated datapath" thing? Isn't failover/standby a generic high availblity term, rather than marry it to the concept of device model specifics? Do we expect scsi to work exactly the same way with "accelerated datapath"? -Siwei > >> > >> > >> > > >> > > > >> > > > > > And on reset or when backup is cleared in some other way, unplug the >> > > > > > primary. >> > > > > What if guest does not support hotplug? >> > > > It shouldn't ack the backup feature then and it's a good point. >> > > > We should both document this and check kernel config has >> > > > hotplug support. Sridhar could you take a look pls? >> > > > >> > > > > > Something like the below will do the job: >> > > > > > >> > > > > > Primary device is added with a special "primary-failover" flag. >> > > > > > A virtual machine is then initialized with just a standby virtio >> > > > > > device. Primary is not yet added. >> > > > > A question is how to do the matching? Qemu knows nothing about e.g mac >> > > > > address of a pass-through device I believe? >> > > > Supply a flag to the VFIO when it's added, this way QEMU will know. >> > > > >> > > > > > Later QEMU detects that guest driver device set DRIVER_OK. >> > > > > > It then exposes the primary device to the guest, and triggers >> > > > > > a device addition event (hot-plug event) for it. >> > > > > Do you mean we won't have primary device in the initial qemu cli? >> > > > No, that's not what I mean. >> > > > >> > > > I mean management will supply a flag to VFIO and then >> > > > >> > > > >> > > > - VFIO defers exposing >> > > > primary to guest until guest acks the feature bit. >> > > > - When we see guest ack, initiate hotplug. >> > > > - On reboot, hide it again. >> > > > - On reset without reboot, request hot-unplug and on eject hide >> > > > it again. >> > > >> > > This sounds much like a kind of bonding in qemu. >> > > >> > > > >> > > > > > If QEMU detects guest driver removal, it initiates a >> > > > > > hot-unplug sequence >> > > > > > to remove the primary driver. In particular, if QEMU detects guest >> > > > > > re-initialization (e.g. by detecting guest reset) it >> > > > > > immediately removes >> > > > > > the primary device. >> > > > > I believe guest failover module should handle this gracefully? >> > > > It can't control enumeration order, if primary is enumerated before >> > > > standby then guest will load its driver and it's too late >> > > > when failover driver is loaded. >> > > >> > > Well, even if we can delay the visibility of primary until >> > > DRIVER_OK, there still be a race I think? And it looks to me it's >> > > still a bug of guest: >> > > >> > > E.g primary could be probed before failover_register() in guest. >> > > Then we will miss the enslaving of primary forever. >> > >> > That is not an issue. Even if the primary is probed before failover >> > driver, it will >> > enslave the primary via the call to failover_existing_slave_register() >> > as part of >> > failover_register() routine. >> >> Aha I get it. So the enumeration order is not an issue. >> >> Consider primary may still be seen by guest kernel even if we delay its >> visibility, I wonder whether we can control the lifecycle of primary through >> driver but not qemu. This can simplify a lot of things. >> >> Thanks >> >> > >> > > >> > > Thanks >> > > >> > > > >> > > > > Thanks >> > > > > >> > > > > > We can move some of this code to management as well, >> > > > > > architecturally it >> > > > > > does not make too much sense but it might be easier >> > > > > > implementation-wise. >> > > > > > >> > > > > > HTH >> > > > > > >> > > > > > On Mon, Jun 04, 2018 at 06:41:48PM -0700, Samudrala, Sridhar wrote: >> > > > > > > Ping on this patch now that the kernel patches are >> > > > > > > accepted into davem's net-next tree. >> > > > > > > https://patchwork.ozlabs.org/cover/920005/ >> > > > > > > >> > > > > > > >> > > > > > > On 5/7/2018 4:09 PM, Sridhar Samudrala wrote: >> > > > > > > > This feature bit can be used by hypervisor to >> > > > > > > > indicate virtio_net device to >> > > > > > > > act as a standby for another device with the same MAC address. >> > > > > > > > >> > > > > > > > I tested this with a small change to the patch >> > > > > > > > to mark the STANDBY feature 'true' >> > > > > > > > by default as i am using libvirt to start the VMs. >> > > > > > > > Is there a way to pass the newly added feature >> > > > > > > > bit 'standby' to qemu via libvirt >> > > > > > > > XML file? >> > > > > > > > >> > > > > > > > Signed-off-by: Sridhar Samudrala <sridhar.samudrala@intel.com> >> > > > > > > > --- >> > > > > > > > hw/net/virtio-net.c | 2 ++ >> > > > > > > > include/standard-headers/linux/virtio_net.h | 3 +++ >> > > > > > > > 2 files changed, 5 insertions(+) >> > > > > > > > >> > > > > > > > diff --git a/hw/net/virtio-net.c b/hw/net/virtio-net.c >> > > > > > > > index 90502fca7c..38b3140670 100644 >> > > > > > > > --- a/hw/net/virtio-net.c >> > > > > > > > +++ b/hw/net/virtio-net.c >> > > > > > > > @@ -2198,6 +2198,8 @@ static Property virtio_net_properties[] = { >> > > > > > > > true), >> > > > > > > > DEFINE_PROP_INT32("speed", VirtIONet, >> > > > > > > > net_conf.speed, SPEED_UNKNOWN), >> > > > > > > > DEFINE_PROP_STRING("duplex", VirtIONet, >> > > > > > > > net_conf.duplex_str), >> > > > > > > > + DEFINE_PROP_BIT64("standby", VirtIONet, >> > > > > > > > host_features, VIRTIO_NET_F_STANDBY, >> > > > > > > > + false), >> > > > > > > > DEFINE_PROP_END_OF_LIST(), >> > > > > > > > }; >> > > > > > > > diff --git >> > > > > > > > a/include/standard-headers/linux/virtio_net.h >> > > > > > > > b/include/standard-headers/linux/virtio_net.h >> > > > > > > > index e9f255ea3f..01ec09684c 100644 >> > > > > > > > --- a/include/standard-headers/linux/virtio_net.h >> > > > > > > > +++ b/include/standard-headers/linux/virtio_net.h >> > > > > > > > @@ -57,6 +57,9 @@ >> > > > > > > > * Steering */ >> > > > > > > > #define VIRTIO_NET_F_CTRL_MAC_ADDR 23 /* Set MAC address */ >> > > > > > > > +#define VIRTIO_NET_F_STANDBY 62 /* Act >> > > > > > > > as standby for another device >> > > > > > > > + * with the same MAC. >> > > > > > > > + */ >> > > > > > > > #define VIRTIO_NET_F_SPEED_DUPLEX 63 /* >> > > > > > > > Device set linkspeed and duplex */ >> > > > > > > > #ifndef VIRTIO_NET_NO_LEGACY >> > > >> > > > --------------------------------------------------------------------- > To unsubscribe, e-mail: virtio-dev-unsubscribe@lists.oasis-open.org > For additional commands, e-mail: virtio-dev-help@lists.oasis-open.org >
On Thu, Jun 21, 2018 at 7:59 AM, Cornelia Huck <cohuck@redhat.com> wrote: > On Wed, 20 Jun 2018 22:48:58 +0300 > "Michael S. Tsirkin" <mst@redhat.com> wrote: > >> On Wed, Jun 20, 2018 at 06:06:19PM +0200, Cornelia Huck wrote: >> > In any case, I'm not sure anymore why we'd want the extra uuid. >> >> It's mostly so we can have e.g. multiple devices with same MAC >> (which some people seem to want in order to then use >> then with different containers). >> >> But it is also handy for when you assign a PF, since then you >> can't set the MAC. >> > > OK, so what about the following: > > - introduce a new feature bit, VIRTIO_NET_F_STANDBY_UUID that indicates > that we have a new uuid field in the virtio-net config space > - in QEMU, add a property for virtio-net that allows to specify a uuid, > offer VIRTIO_NET_F_STANDBY_UUID if set > - when configuring, set the property to the group UUID of the vfio-pci > device If feature negotiation fails on VIRTIO_NET_F_STANDBY_UUID, is it safe to still expose UUID in the config space on virtio-pci? I'm not even sure if it's sane to expose group UUID on the PCI bridge where the corresponding vfio-pci device attached to for a guest which doesn't support the feature (legacy). -Siwei > - in the guest, use the uuid from the virtio-net device's config space > if applicable; else, fall back to matching by MAC as done today > > That should work for all virtio transports.
On 2018-06-21 18:21:55 -0700, Siwei Liu wrote: > On Thu, Jun 21, 2018 at 7:59 AM, Cornelia Huck <cohuck@redhat.com> wrote: > > On Wed, 20 Jun 2018 22:48:58 +0300 > > "Michael S. Tsirkin" <mst@redhat.com> wrote: > > > >> On Wed, Jun 20, 2018 at 06:06:19PM +0200, Cornelia Huck wrote: > >> > In any case, I'm not sure anymore why we'd want the extra uuid. > >> > >> It's mostly so we can have e.g. multiple devices with same MAC > >> (which some people seem to want in order to then use > >> then with different containers). > >> > >> But it is also handy for when you assign a PF, since then you > >> can't set the MAC. > >> > > > > OK, so what about the following: > > > > - introduce a new feature bit, VIRTIO_NET_F_STANDBY_UUID that indicates > > that we have a new uuid field in the virtio-net config space > > - in QEMU, add a property for virtio-net that allows to specify a uuid, > > offer VIRTIO_NET_F_STANDBY_UUID if set > > - when configuring, set the property to the group UUID of the vfio-pci > > device > > If feature negotiation fails on VIRTIO_NET_F_STANDBY_UUID, is it safe > to still expose UUID in the config space on virtio-pci? Why is it not safe? When we expose VIRTIO_NET_F_STANDBY_UUID, it is up to the driver to decide if it wants to use it or not. If it does not want to use the feature, it would also imply that the driver will not be interested in the UUID value in the config space. So, the UUID will be some piece of data that simply sits around; nobody cares for it. > I'm not even sure if it's sane to expose group UUID on the PCI bridge > where the corresponding vfio-pci device attached to for a guest which > doesn't support the feature (legacy). Unfortunately, you don't know beforehand if the guest will be a legacy guest that doesn't support the feature. As is the case with the UUID in the virtio-pci device's config space, the UUID in the bridge's config space will/should be ignored by the legacy guest. Venu > > - in the guest, use the uuid from the virtio-net device's config space > > if applicable; else, fall back to matching by MAC as done today > > > > That should work for all virtio transports.
On Thu, Jun 21, 2018 at 06:07:18PM -0700, Siwei Liu wrote: > On Thu, Jun 21, 2018 at 11:14 AM, Michael S. Tsirkin <mst@redhat.com> wrote: > > On Wed, Jun 13, 2018 at 01:40:59PM +0800, Jason Wang wrote: > >> > >> > >> On 2018年06月13日 12:24, Samudrala, Sridhar wrote: > >> > On 6/12/2018 7:38 PM, Jason Wang wrote: > >> > > > >> > > > >> > > On 2018年06月12日 19:54, Michael S. Tsirkin wrote: > >> > > > On Wed, Jun 06, 2018 at 10:29:03AM +0800, Jason Wang wrote: > >> > > > > > >> > > > > On 2018年06月05日 20:33, Michael S. Tsirkin wrote: > >> > > > > > I don't think this is sufficient. > >> > > > > > > >> > > > > > If both primary and standby devices are present, a > >> > > > > > legacy guest without > >> > > > > > support for the feature might see two devices with same mac and get > >> > > > > > confused. > >> > > > > > > >> > > > > > I think that we should only make primary visible after > >> > > > > > guest acked the > >> > > > > > backup feature bit. > >> > > > > I think we want exactly the reverse? E.g fail the > >> > > > > negotiation when guest > >> > > > > does not ack backup feature. > >> > > > > > >> > > > > Otherwise legacy guest won't even have the chance to see > >> > > > > primary device in > >> > > > > the guest. > >> > > > That's by design. > >> > > > >> > > So management needs to know the capability of guest to set the > >> > > backup feature. This looks a chicken or egg problem to me. > >> > > >> > I don't think so. If the tenant requests 'accelerated datapath feature', > >> > the management > >> > will set 'standby' feature bit on virtio-net interface and if the guest > >> > virtio-net driver > >> > supports this feature, then the tenant VM will get that capability via a > >> > hot-plugged > >> > primary device. > >> > >> Ok, I thought exactly the reverse because of the commit title is "enable > >> virtio_net to act as a standby for a passthru device". But re-read the > >> commit log content, I understand the case a little bit. Btw, VF is not > >> necessarily faster than virtio-net, especially consider virtio-net may have > >> a lot of queues. > > > > Don't do that then, right? > > I don't understand. Where did the standby feature come to imply the > "accelerated datapath" thing? > Isn't failover/standby a generic high > availblity term, rather than marry it to the concept of device model > specifics? Do we expect scsi to work exactly the same way with > "accelerated datapath"? That's not what I said. The semantics are that the primary is always used if present in preference to standby. Jason said virtio net is sometimes preferable. If that's the case don't make it a standby. More advanced use-cases do exist and e.g. Alexander Duyck suggested using a switch-dev. failover isn't it though.
On Thu, Jun 21, 2018 at 06:21:55PM -0700, Siwei Liu wrote: > On Thu, Jun 21, 2018 at 7:59 AM, Cornelia Huck <cohuck@redhat.com> wrote: > > On Wed, 20 Jun 2018 22:48:58 +0300 > > "Michael S. Tsirkin" <mst@redhat.com> wrote: > > > >> On Wed, Jun 20, 2018 at 06:06:19PM +0200, Cornelia Huck wrote: > >> > In any case, I'm not sure anymore why we'd want the extra uuid. > >> > >> It's mostly so we can have e.g. multiple devices with same MAC > >> (which some people seem to want in order to then use > >> then with different containers). > >> > >> But it is also handy for when you assign a PF, since then you > >> can't set the MAC. > >> > > > > OK, so what about the following: > > > > - introduce a new feature bit, VIRTIO_NET_F_STANDBY_UUID that indicates > > that we have a new uuid field in the virtio-net config space > > - in QEMU, add a property for virtio-net that allows to specify a uuid, > > offer VIRTIO_NET_F_STANDBY_UUID if set > > - when configuring, set the property to the group UUID of the vfio-pci > > device > > If feature negotiation fails on VIRTIO_NET_F_STANDBY_UUID, is it safe > to still expose UUID in the config space on virtio-pci? Yes but guest is not supposed to read it. > I'm not even sure if it's sane to expose group UUID on the PCI bridge > where the corresponding vfio-pci device attached to for a guest which > doesn't support the feature (legacy). > > -Siwei Yes but you won't add the primary behind such a bridge. > > > - in the guest, use the uuid from the virtio-net device's config space > > if applicable; else, fall back to matching by MAC as done today > > > > That should work for all virtio transports.
On Thu, 21 Jun 2018 21:20:13 +0300 "Michael S. Tsirkin" <mst@redhat.com> wrote: > On Thu, Jun 21, 2018 at 04:59:13PM +0200, Cornelia Huck wrote: > > OK, so what about the following: > > > > - introduce a new feature bit, VIRTIO_NET_F_STANDBY_UUID that indicates > > that we have a new uuid field in the virtio-net config space > > - in QEMU, add a property for virtio-net that allows to specify a uuid, > > offer VIRTIO_NET_F_STANDBY_UUID if set > > - when configuring, set the property to the group UUID of the vfio-pci > > device > > - in the guest, use the uuid from the virtio-net device's config space > > if applicable; else, fall back to matching by MAC as done today > > > > That should work for all virtio transports. > > True. I'm a bit unhappy that it's virtio net specific though > since down the road I expect we'll have a very similar feature > for scsi (and maybe others). > > But we do not have a way to have fields that are portable > both across devices and transports, and I think it would > be a useful addition. How would this work though? Any idea? Can we introduce some kind of device-independent config space area? Pushing back the device-specific config space by a certain value if the appropriate feature is negotiated and use that for things like the uuid? But regardless of that, I'm not sure whether extending this approach to other device types is the way to go. Tying together two different devices is creating complicated situations at least in the hypervisor (even if it's fairly straightforward in the guest). [I have not come around again to look at the "how to handle visibility in QEMU" questions due to lack of cycles, sorry about that.] So, what's the goal of this approach? Only to allow migration with vfio-pci, or also to plug in a faster device and use it instead of an already attached paravirtualized device? What about migration of vfio devices that are not easily replaced by a paravirtualized device? I'm thinking of vfio-ccw, where our main (and currently only) supported device is dasd (disks) -- which can do a lot of specialized things that virtio-blk does not support (and should not or even cannot support). Would it be more helpful to focus on generic migration support for vfio instead of going about it device by device? This network device approach already seems far along, so it makes sense to continue with it. But I'm not sure whether we want to spend time and energy on that for other device types rather than working on a general solution for vfio migration.
On Fri, Jun 22, 2018 at 05:09:55PM +0200, Cornelia Huck wrote: > On Thu, 21 Jun 2018 21:20:13 +0300 > "Michael S. Tsirkin" <mst@redhat.com> wrote: > > > On Thu, Jun 21, 2018 at 04:59:13PM +0200, Cornelia Huck wrote: > > > OK, so what about the following: > > > > > > - introduce a new feature bit, VIRTIO_NET_F_STANDBY_UUID that indicates > > > that we have a new uuid field in the virtio-net config space > > > - in QEMU, add a property for virtio-net that allows to specify a uuid, > > > offer VIRTIO_NET_F_STANDBY_UUID if set > > > - when configuring, set the property to the group UUID of the vfio-pci > > > device > > > - in the guest, use the uuid from the virtio-net device's config space > > > if applicable; else, fall back to matching by MAC as done today > > > > > > That should work for all virtio transports. > > > > True. I'm a bit unhappy that it's virtio net specific though > > since down the road I expect we'll have a very similar feature > > for scsi (and maybe others). > > > > But we do not have a way to have fields that are portable > > both across devices and transports, and I think it would > > be a useful addition. How would this work though? Any idea? > > Can we introduce some kind of device-independent config space area? > Pushing back the device-specific config space by a certain value if the > appropriate feature is negotiated and use that for things like the uuid? So config moves back and forth? Reminds me of the msi vector mess we had with pci. I'd rather have every transport add a new config. > But regardless of that, I'm not sure whether extending this approach to > other device types is the way to go. Tying together two different > devices is creating complicated situations at least in the hypervisor > (even if it's fairly straightforward in the guest). [I have not come > around again to look at the "how to handle visibility in QEMU" > questions due to lack of cycles, sorry about that.] > > So, what's the goal of this approach? Only to allow migration with > vfio-pci, or also to plug in a faster device and use it instead of an > already attached paravirtualized device? These are two sides of the same coin, I think the second approach is closer to what we are doing here. > What about migration of vfio devices that are not easily replaced by a > paravirtualized device? I'm thinking of vfio-ccw, where our main (and > currently only) supported device is dasd (disks) -- which can do a lot > of specialized things that virtio-blk does not support (and should not > or even cannot support). But maybe virtio-scsi can? > Would it be more helpful to focus on generic > migration support for vfio instead of going about it device by device? > > This network device approach already seems far along, so it makes sense > to continue with it. But I'm not sure whether we want to spend time and > energy on that for other device types rather than working on a general > solution for vfio migration. I'm inclined to say finalizing this feature would be a good first step and will teach us how we can move forward.
On Thu, Jun 21, 2018 at 7:30 PM, Michael S. Tsirkin <mst@redhat.com> wrote: > On Thu, Jun 21, 2018 at 06:07:18PM -0700, Siwei Liu wrote: >> On Thu, Jun 21, 2018 at 11:14 AM, Michael S. Tsirkin <mst@redhat.com> wrote: >> > On Wed, Jun 13, 2018 at 01:40:59PM +0800, Jason Wang wrote: >> >> >> >> >> >> On 2018年06月13日 12:24, Samudrala, Sridhar wrote: >> >> > On 6/12/2018 7:38 PM, Jason Wang wrote: >> >> > > >> >> > > >> >> > > On 2018年06月12日 19:54, Michael S. Tsirkin wrote: >> >> > > > On Wed, Jun 06, 2018 at 10:29:03AM +0800, Jason Wang wrote: >> >> > > > > >> >> > > > > On 2018年06月05日 20:33, Michael S. Tsirkin wrote: >> >> > > > > > I don't think this is sufficient. >> >> > > > > > >> >> > > > > > If both primary and standby devices are present, a >> >> > > > > > legacy guest without >> >> > > > > > support for the feature might see two devices with same mac and get >> >> > > > > > confused. >> >> > > > > > >> >> > > > > > I think that we should only make primary visible after >> >> > > > > > guest acked the >> >> > > > > > backup feature bit. >> >> > > > > I think we want exactly the reverse? E.g fail the >> >> > > > > negotiation when guest >> >> > > > > does not ack backup feature. >> >> > > > > >> >> > > > > Otherwise legacy guest won't even have the chance to see >> >> > > > > primary device in >> >> > > > > the guest. >> >> > > > That's by design. >> >> > > >> >> > > So management needs to know the capability of guest to set the >> >> > > backup feature. This looks a chicken or egg problem to me. >> >> > >> >> > I don't think so. If the tenant requests 'accelerated datapath feature', >> >> > the management >> >> > will set 'standby' feature bit on virtio-net interface and if the guest >> >> > virtio-net driver >> >> > supports this feature, then the tenant VM will get that capability via a >> >> > hot-plugged >> >> > primary device. >> >> >> >> Ok, I thought exactly the reverse because of the commit title is "enable >> >> virtio_net to act as a standby for a passthru device". But re-read the >> >> commit log content, I understand the case a little bit. Btw, VF is not >> >> necessarily faster than virtio-net, especially consider virtio-net may have >> >> a lot of queues. >> > >> > Don't do that then, right? >> >> I don't understand. Where did the standby feature come to imply the >> "accelerated datapath" thing? >> Isn't failover/standby a generic high >> availblity term, rather than marry it to the concept of device model >> specifics? Do we expect scsi to work exactly the same way with >> "accelerated datapath"? > > That's not what I said. > The semantics are that the primary is always used if present in > preference to standby. OK. If this is the only semantics of what "standby" refers to in general, that is fine. I just don't want to limit the failover/standby semantics to the device model specifics, the "accelerated datapath" thing or whatever. I really don't know where the requirements of the "accelerated datapath" came from, as the originial problem is migrating vfio devices which is in match of QEMU's live migration model. Hyper-V has it's limitation to do 1-netdev should not impact how KVM/QEMU should be doing it. > Jason said virtio net is sometimes preferable. > If that's the case don't make it a standby. > > More advanced use-cases do exist and e.g. Alexander Duyck > suggested using a switch-dev. The switchdev case would need a new feature bit, right? -Siwei > failover isn't it though. > > -- > MST
On Thu, Jun 21, 2018 at 7:32 PM, Michael S. Tsirkin <mst@redhat.com> wrote: > On Thu, Jun 21, 2018 at 06:21:55PM -0700, Siwei Liu wrote: >> On Thu, Jun 21, 2018 at 7:59 AM, Cornelia Huck <cohuck@redhat.com> wrote: >> > On Wed, 20 Jun 2018 22:48:58 +0300 >> > "Michael S. Tsirkin" <mst@redhat.com> wrote: >> > >> >> On Wed, Jun 20, 2018 at 06:06:19PM +0200, Cornelia Huck wrote: >> >> > In any case, I'm not sure anymore why we'd want the extra uuid. >> >> >> >> It's mostly so we can have e.g. multiple devices with same MAC >> >> (which some people seem to want in order to then use >> >> then with different containers). >> >> >> >> But it is also handy for when you assign a PF, since then you >> >> can't set the MAC. >> >> >> > >> > OK, so what about the following: >> > >> > - introduce a new feature bit, VIRTIO_NET_F_STANDBY_UUID that indicates >> > that we have a new uuid field in the virtio-net config space >> > - in QEMU, add a property for virtio-net that allows to specify a uuid, >> > offer VIRTIO_NET_F_STANDBY_UUID if set >> > - when configuring, set the property to the group UUID of the vfio-pci >> > device >> >> If feature negotiation fails on VIRTIO_NET_F_STANDBY_UUID, is it safe >> to still expose UUID in the config space on virtio-pci? > > > Yes but guest is not supposed to read it. > >> I'm not even sure if it's sane to expose group UUID on the PCI bridge >> where the corresponding vfio-pci device attached to for a guest which >> doesn't support the feature (legacy). >> >> -Siwei > > Yes but you won't add the primary behind such a bridge. I assume the UUID feature is a new one besides the exiting VIRTIO_NET_F_STANDBY feature, where I think the current proposal is, if UUID feature is present and supported by guest, we'll do pairing based on UUID; if UUID feature present or not supported by guest, we'll still plug in the VF (if guest supports the _F_STANDBY feature) but the pairing will be fallback to using MAC address. Are you saying you don't even want to plug in the primary when the UUID feature is not supported by guest? Does the feature negotiation UUID have to depend on STANDBY being set, or the other way around? I thought that just the absence of STANDBY will prevent primary to get exposed to the guest. -Siwei > >> >> > - in the guest, use the uuid from the virtio-net device's config space >> > if applicable; else, fall back to matching by MAC as done today >> > >> > That should work for all virtio transports.
On Fri, Jun 22, 2018 at 1:00 PM, Siwei Liu <loseweigh@gmail.com> wrote: > On Thu, Jun 21, 2018 at 7:32 PM, Michael S. Tsirkin <mst@redhat.com> wrote: >> On Thu, Jun 21, 2018 at 06:21:55PM -0700, Siwei Liu wrote: >>> On Thu, Jun 21, 2018 at 7:59 AM, Cornelia Huck <cohuck@redhat.com> wrote: >>> > On Wed, 20 Jun 2018 22:48:58 +0300 >>> > "Michael S. Tsirkin" <mst@redhat.com> wrote: >>> > >>> >> On Wed, Jun 20, 2018 at 06:06:19PM +0200, Cornelia Huck wrote: >>> >> > In any case, I'm not sure anymore why we'd want the extra uuid. >>> >> >>> >> It's mostly so we can have e.g. multiple devices with same MAC >>> >> (which some people seem to want in order to then use >>> >> then with different containers). >>> >> >>> >> But it is also handy for when you assign a PF, since then you >>> >> can't set the MAC. >>> >> >>> > >>> > OK, so what about the following: >>> > >>> > - introduce a new feature bit, VIRTIO_NET_F_STANDBY_UUID that indicates >>> > that we have a new uuid field in the virtio-net config space >>> > - in QEMU, add a property for virtio-net that allows to specify a uuid, >>> > offer VIRTIO_NET_F_STANDBY_UUID if set >>> > - when configuring, set the property to the group UUID of the vfio-pci >>> > device >>> >>> If feature negotiation fails on VIRTIO_NET_F_STANDBY_UUID, is it safe >>> to still expose UUID in the config space on virtio-pci? >> >> >> Yes but guest is not supposed to read it. >> >>> I'm not even sure if it's sane to expose group UUID on the PCI bridge >>> where the corresponding vfio-pci device attached to for a guest which >>> doesn't support the feature (legacy). >>> >>> -Siwei >> >> Yes but you won't add the primary behind such a bridge. > > I assume the UUID feature is a new one besides the exiting > VIRTIO_NET_F_STANDBY feature, where I think the current proposal is, > if UUID feature is present and supported by guest, we'll do pairing > based on UUID; if UUID feature present ^^^^^^^ is NOT present > or not supported by guest, > we'll still plug in the VF (if guest supports the _F_STANDBY feature) > but the pairing will be fallback to using MAC address. Are you saying > you don't even want to plug in the primary when the UUID feature is > not supported by guest? Does the feature negotiation UUID have to > depend on STANDBY being set, or the other way around? I thought that > just the absence of STANDBY will prevent primary to get exposed to the > guest. > > -Siwei > >> >>> >>> > - in the guest, use the uuid from the virtio-net device's config space >>> > if applicable; else, fall back to matching by MAC as done today >>> > >>> > That should work for all virtio transports.
On Fri, Jun 22, 2018 at 12:05 PM, Michael S. Tsirkin <mst@redhat.com> wrote: > On Fri, Jun 22, 2018 at 05:09:55PM +0200, Cornelia Huck wrote: >> On Thu, 21 Jun 2018 21:20:13 +0300 >> "Michael S. Tsirkin" <mst@redhat.com> wrote: >> >> > On Thu, Jun 21, 2018 at 04:59:13PM +0200, Cornelia Huck wrote: >> > > OK, so what about the following: >> > > >> > > - introduce a new feature bit, VIRTIO_NET_F_STANDBY_UUID that indicates >> > > that we have a new uuid field in the virtio-net config space >> > > - in QEMU, add a property for virtio-net that allows to specify a uuid, >> > > offer VIRTIO_NET_F_STANDBY_UUID if set >> > > - when configuring, set the property to the group UUID of the vfio-pci >> > > device >> > > - in the guest, use the uuid from the virtio-net device's config space >> > > if applicable; else, fall back to matching by MAC as done today >> > > >> > > That should work for all virtio transports. >> > >> > True. I'm a bit unhappy that it's virtio net specific though >> > since down the road I expect we'll have a very similar feature >> > for scsi (and maybe others). >> > >> > But we do not have a way to have fields that are portable >> > both across devices and transports, and I think it would >> > be a useful addition. How would this work though? Any idea? >> >> Can we introduce some kind of device-independent config space area? >> Pushing back the device-specific config space by a certain value if the >> appropriate feature is negotiated and use that for things like the uuid? > > So config moves back and forth? > Reminds me of the msi vector mess we had with pci. > I'd rather have every transport add a new config. > >> But regardless of that, I'm not sure whether extending this approach to >> other device types is the way to go. Tying together two different >> devices is creating complicated situations at least in the hypervisor >> (even if it's fairly straightforward in the guest). [I have not come >> around again to look at the "how to handle visibility in QEMU" >> questions due to lack of cycles, sorry about that.] >> >> So, what's the goal of this approach? Only to allow migration with >> vfio-pci, or also to plug in a faster device and use it instead of an >> already attached paravirtualized device? > > These are two sides of the same coin, I think the second approach > is closer to what we are doing here. I'm leaning towards being conservative to keep the potential of doing both. It's the vfio migration itself that has to be addessed, not to make every virtio device to have an accelerated datapath, right? -Siwei > >> What about migration of vfio devices that are not easily replaced by a >> paravirtualized device? I'm thinking of vfio-ccw, where our main (and >> currently only) supported device is dasd (disks) -- which can do a lot >> of specialized things that virtio-blk does not support (and should not >> or even cannot support). > > But maybe virtio-scsi can? > >> Would it be more helpful to focus on generic >> migration support for vfio instead of going about it device by device? >> >> This network device approach already seems far along, so it makes sense >> to continue with it. But I'm not sure whether we want to spend time and >> energy on that for other device types rather than working on a general >> solution for vfio migration. > > I'm inclined to say finalizing this feature would be a good first step > and will teach us how we can move forward. > > -- > MST
On Fri, Jun 22, 2018 at 01:03:04PM -0700, Siwei Liu wrote: > On Fri, Jun 22, 2018 at 1:00 PM, Siwei Liu <loseweigh@gmail.com> wrote: > > On Thu, Jun 21, 2018 at 7:32 PM, Michael S. Tsirkin <mst@redhat.com> wrote: > >> On Thu, Jun 21, 2018 at 06:21:55PM -0700, Siwei Liu wrote: > >>> On Thu, Jun 21, 2018 at 7:59 AM, Cornelia Huck <cohuck@redhat.com> wrote: > >>> > On Wed, 20 Jun 2018 22:48:58 +0300 > >>> > "Michael S. Tsirkin" <mst@redhat.com> wrote: > >>> > > >>> >> On Wed, Jun 20, 2018 at 06:06:19PM +0200, Cornelia Huck wrote: > >>> >> > In any case, I'm not sure anymore why we'd want the extra uuid. > >>> >> > >>> >> It's mostly so we can have e.g. multiple devices with same MAC > >>> >> (which some people seem to want in order to then use > >>> >> then with different containers). > >>> >> > >>> >> But it is also handy for when you assign a PF, since then you > >>> >> can't set the MAC. > >>> >> > >>> > > >>> > OK, so what about the following: > >>> > > >>> > - introduce a new feature bit, VIRTIO_NET_F_STANDBY_UUID that indicates > >>> > that we have a new uuid field in the virtio-net config space > >>> > - in QEMU, add a property for virtio-net that allows to specify a uuid, > >>> > offer VIRTIO_NET_F_STANDBY_UUID if set > >>> > - when configuring, set the property to the group UUID of the vfio-pci > >>> > device > >>> > >>> If feature negotiation fails on VIRTIO_NET_F_STANDBY_UUID, is it safe > >>> to still expose UUID in the config space on virtio-pci? > >> > >> > >> Yes but guest is not supposed to read it. > >> > >>> I'm not even sure if it's sane to expose group UUID on the PCI bridge > >>> where the corresponding vfio-pci device attached to for a guest which > >>> doesn't support the feature (legacy). > >>> > >>> -Siwei > >> > >> Yes but you won't add the primary behind such a bridge. > > > > I assume the UUID feature is a new one besides the exiting > > VIRTIO_NET_F_STANDBY feature, where I think the current proposal is, > > if UUID feature is present and supported by guest, we'll do pairing > > based on UUID; if UUID feature present > ^^^^^^^ is NOT present Let's go back for a bit. What is wrong with matching by mac? 1. Does not allow multiple NICs with same mac 2. Requires MAC to be programmed by host in the PT device (which is often possible with VFs but isn't possible with all PT devices) Both issues are up to host: if host needs them it needs the UUID feature, if not MAC matching is sufficient. > > or not supported by guest, > > we'll still plug in the VF (if guest supports the _F_STANDBY feature) > > but the pairing will be fallback to using MAC address. Are you saying > > you don't even want to plug in the primary when the UUID feature is > > not supported by guest? Does the feature negotiation UUID have to > > depend on STANDBY being set, or the other way around? I thought that > > just the absence of STANDBY will prevent primary to get exposed to the > > guest. > > > > -Siwei > > > >> > >>> > >>> > - in the guest, use the uuid from the virtio-net device's config space > >>> > if applicable; else, fall back to matching by MAC as done today
On Fri, Jun 22, 2018 at 01:21:45PM -0700, Siwei Liu wrote: > On Fri, Jun 22, 2018 at 12:05 PM, Michael S. Tsirkin <mst@redhat.com> wrote: > > On Fri, Jun 22, 2018 at 05:09:55PM +0200, Cornelia Huck wrote: > >> On Thu, 21 Jun 2018 21:20:13 +0300 > >> "Michael S. Tsirkin" <mst@redhat.com> wrote: > >> > >> > On Thu, Jun 21, 2018 at 04:59:13PM +0200, Cornelia Huck wrote: > >> > > OK, so what about the following: > >> > > > >> > > - introduce a new feature bit, VIRTIO_NET_F_STANDBY_UUID that indicates > >> > > that we have a new uuid field in the virtio-net config space > >> > > - in QEMU, add a property for virtio-net that allows to specify a uuid, > >> > > offer VIRTIO_NET_F_STANDBY_UUID if set > >> > > - when configuring, set the property to the group UUID of the vfio-pci > >> > > device > >> > > - in the guest, use the uuid from the virtio-net device's config space > >> > > if applicable; else, fall back to matching by MAC as done today > >> > > > >> > > That should work for all virtio transports. > >> > > >> > True. I'm a bit unhappy that it's virtio net specific though > >> > since down the road I expect we'll have a very similar feature > >> > for scsi (and maybe others). > >> > > >> > But we do not have a way to have fields that are portable > >> > both across devices and transports, and I think it would > >> > be a useful addition. How would this work though? Any idea? > >> > >> Can we introduce some kind of device-independent config space area? > >> Pushing back the device-specific config space by a certain value if the > >> appropriate feature is negotiated and use that for things like the uuid? > > > > So config moves back and forth? > > Reminds me of the msi vector mess we had with pci. > > I'd rather have every transport add a new config. > > > >> But regardless of that, I'm not sure whether extending this approach to > >> other device types is the way to go. Tying together two different > >> devices is creating complicated situations at least in the hypervisor > >> (even if it's fairly straightforward in the guest). [I have not come > >> around again to look at the "how to handle visibility in QEMU" > >> questions due to lack of cycles, sorry about that.] > >> > >> So, what's the goal of this approach? Only to allow migration with > >> vfio-pci, or also to plug in a faster device and use it instead of an > >> already attached paravirtualized device? > > > > These are two sides of the same coin, I think the second approach > > is closer to what we are doing here. > > I'm leaning towards being conservative to keep the potential of doing > both. It's the vfio migration itself that has to be addessed, not to > make every virtio device to have an accelerated datapath, right? > > -Siwei I think that the approach we took (signal configuration through standby) assumes standby always present and primary appearing and disappearing. Anything else isn't well supported and I'm not sure we should complicate code too much to support it. > > > > >> What about migration of vfio devices that are not easily replaced by a > >> paravirtualized device? I'm thinking of vfio-ccw, where our main (and > >> currently only) supported device is dasd (disks) -- which can do a lot > >> of specialized things that virtio-blk does not support (and should not > >> or even cannot support). > > > > But maybe virtio-scsi can? > > > >> Would it be more helpful to focus on generic > >> migration support for vfio instead of going about it device by device? > >> > >> This network device approach already seems far along, so it makes sense > >> to continue with it. But I'm not sure whether we want to spend time and > >> energy on that for other device types rather than working on a general > >> solution for vfio migration. > > > > I'm inclined to say finalizing this feature would be a good first step > > and will teach us how we can move forward. > > > > -- > > MST
On Fri, Jun 22, 2018 at 05:09:55PM +0200, Cornelia Huck wrote: > Would it be more helpful to focus on generic > migration support for vfio instead of going about it device by device? Just to note this approach is actually device by device *type*. It's mostly device agnostic for a given device type so you can migrate between hosts with very different hardware. And support for more PV device types has other advantages such as security and forward compatibility to future hosts. Finally, it all can happen mostly within QEMU. User is currently required to enable it but it's pretty lightweight. OTOH vfio migration generally requires actual device-specific work, and only works when hosts are mostly identical. When they aren't it's easy to blame the user, but tools for checking host compatiblity are currently non-existent. Upper layer management will also have to learn about host and device compatibility wrt migration. At the moment they can't even figure it out wrt software versions of vhost in kernel and dpdk so I won't hold my breath for all of this happening quickly.
On Fri, Jun 22, 2018 at 12:43:26PM -0700, Siwei Liu wrote: > > The semantics are that the primary is always used if present in > > preference to standby. > OK. If this is the only semantics of what "standby" refers to in > general, that is fine. > > I just don't want to limit the failover/standby semantics to the > device model specifics, the "accelerated datapath" thing or whatever. > I really don't know where the requirements of the "accelerated > datapath" came from, It's a way to put it that matches what failover actually provides. > as the originial problem is migrating vfio > devices which is in match of QEMU's live migration model. If you put it this way then it's natural to require that we have a config with a working vfio device, and that we somehow add virtio for duration of migration. > Hyper-V has > it's limitation to do 1-netdev should not impact how KVM/QEMU should > be doing it. That's a linux thing and pretty orthogonal to host/guest interface. > > Jason said virtio net is sometimes preferable. > > If that's the case don't make it a standby. > > > > More advanced use-cases do exist and e.g. Alexander Duyck > > suggested using a switch-dev. > > The switchdev case would need a new feature bit, right? > > -Siwei I think it would need to be a completely new device. > > failover isn't it though. > > > > -- > > MST
On Fri, Jun 22, 2018 at 2:29 PM, Michael S. Tsirkin <mst@redhat.com> wrote: > On Fri, Jun 22, 2018 at 01:03:04PM -0700, Siwei Liu wrote: >> On Fri, Jun 22, 2018 at 1:00 PM, Siwei Liu <loseweigh@gmail.com> wrote: >> > On Thu, Jun 21, 2018 at 7:32 PM, Michael S. Tsirkin <mst@redhat.com> wrote: >> >> On Thu, Jun 21, 2018 at 06:21:55PM -0700, Siwei Liu wrote: >> >>> On Thu, Jun 21, 2018 at 7:59 AM, Cornelia Huck <cohuck@redhat.com> wrote: >> >>> > On Wed, 20 Jun 2018 22:48:58 +0300 >> >>> > "Michael S. Tsirkin" <mst@redhat.com> wrote: >> >>> > >> >>> >> On Wed, Jun 20, 2018 at 06:06:19PM +0200, Cornelia Huck wrote: >> >>> >> > In any case, I'm not sure anymore why we'd want the extra uuid. >> >>> >> >> >>> >> It's mostly so we can have e.g. multiple devices with same MAC >> >>> >> (which some people seem to want in order to then use >> >>> >> then with different containers). >> >>> >> >> >>> >> But it is also handy for when you assign a PF, since then you >> >>> >> can't set the MAC. >> >>> >> >> >>> > >> >>> > OK, so what about the following: >> >>> > >> >>> > - introduce a new feature bit, VIRTIO_NET_F_STANDBY_UUID that indicates >> >>> > that we have a new uuid field in the virtio-net config space >> >>> > - in QEMU, add a property for virtio-net that allows to specify a uuid, >> >>> > offer VIRTIO_NET_F_STANDBY_UUID if set >> >>> > - when configuring, set the property to the group UUID of the vfio-pci >> >>> > device >> >>> >> >>> If feature negotiation fails on VIRTIO_NET_F_STANDBY_UUID, is it safe >> >>> to still expose UUID in the config space on virtio-pci? >> >> >> >> >> >> Yes but guest is not supposed to read it. >> >> >> >>> I'm not even sure if it's sane to expose group UUID on the PCI bridge >> >>> where the corresponding vfio-pci device attached to for a guest which >> >>> doesn't support the feature (legacy). >> >>> >> >>> -Siwei >> >> >> >> Yes but you won't add the primary behind such a bridge. >> > >> > I assume the UUID feature is a new one besides the exiting >> > VIRTIO_NET_F_STANDBY feature, where I think the current proposal is, >> > if UUID feature is present and supported by guest, we'll do pairing >> > based on UUID; if UUID feature present >> ^^^^^^^ is NOT present > > Let's go back for a bit. > > What is wrong with matching by mac? > > 1. Does not allow multiple NICs with same mac > 2. Requires MAC to be programmed by host in the PT device > (which is often possible with VFs but isn't possible with all PT > devices) Might not neccessarily be something wrong, but it's very limited to prohibit the MAC of VF from changing when enslaved by failover. The same as you indicated on the PT device. I suspect the same is prohibited on even virtio-net and failover is because of the same limitation. > > Both issues are up to host: if host needs them it needs the UUID > feature, if not MAC matching is sufficient. I know. What I'm saying is that we rely on STANDBY feature to control the visibility of the primary, not the UUID feature. -Siwei > > >> > or not supported by guest, >> > we'll still plug in the VF (if guest supports the _F_STANDBY feature) >> > but the pairing will be fallback to using MAC address. Are you saying >> > you don't even want to plug in the primary when the UUID feature is >> > not supported by guest? Does the feature negotiation UUID have to >> > depend on STANDBY being set, or the other way around? I thought that >> > just the absence of STANDBY will prevent primary to get exposed to the >> > guest. >> > >> > -Siwei >> > >> >> >> >>> >> >>> > - in the guest, use the uuid from the virtio-net device's config space >> >>> > if applicable; else, fall back to matching by MAC as done today >
On Fri, Jun 22, 2018 at 2:32 PM, Michael S. Tsirkin <mst@redhat.com> wrote: > On Fri, Jun 22, 2018 at 01:21:45PM -0700, Siwei Liu wrote: >> On Fri, Jun 22, 2018 at 12:05 PM, Michael S. Tsirkin <mst@redhat.com> wrote: >> > On Fri, Jun 22, 2018 at 05:09:55PM +0200, Cornelia Huck wrote: >> >> On Thu, 21 Jun 2018 21:20:13 +0300 >> >> "Michael S. Tsirkin" <mst@redhat.com> wrote: >> >> >> >> > On Thu, Jun 21, 2018 at 04:59:13PM +0200, Cornelia Huck wrote: >> >> > > OK, so what about the following: >> >> > > >> >> > > - introduce a new feature bit, VIRTIO_NET_F_STANDBY_UUID that indicates >> >> > > that we have a new uuid field in the virtio-net config space >> >> > > - in QEMU, add a property for virtio-net that allows to specify a uuid, >> >> > > offer VIRTIO_NET_F_STANDBY_UUID if set >> >> > > - when configuring, set the property to the group UUID of the vfio-pci >> >> > > device >> >> > > - in the guest, use the uuid from the virtio-net device's config space >> >> > > if applicable; else, fall back to matching by MAC as done today >> >> > > >> >> > > That should work for all virtio transports. >> >> > >> >> > True. I'm a bit unhappy that it's virtio net specific though >> >> > since down the road I expect we'll have a very similar feature >> >> > for scsi (and maybe others). >> >> > >> >> > But we do not have a way to have fields that are portable >> >> > both across devices and transports, and I think it would >> >> > be a useful addition. How would this work though? Any idea? >> >> >> >> Can we introduce some kind of device-independent config space area? >> >> Pushing back the device-specific config space by a certain value if the >> >> appropriate feature is negotiated and use that for things like the uuid? >> > >> > So config moves back and forth? >> > Reminds me of the msi vector mess we had with pci. >> > I'd rather have every transport add a new config. >> > >> >> But regardless of that, I'm not sure whether extending this approach to >> >> other device types is the way to go. Tying together two different >> >> devices is creating complicated situations at least in the hypervisor >> >> (even if it's fairly straightforward in the guest). [I have not come >> >> around again to look at the "how to handle visibility in QEMU" >> >> questions due to lack of cycles, sorry about that.] >> >> >> >> So, what's the goal of this approach? Only to allow migration with >> >> vfio-pci, or also to plug in a faster device and use it instead of an >> >> already attached paravirtualized device? >> > >> > These are two sides of the same coin, I think the second approach >> > is closer to what we are doing here. >> >> I'm leaning towards being conservative to keep the potential of doing >> both. It's the vfio migration itself that has to be addessed, not to >> make every virtio device to have an accelerated datapath, right? >> >> -Siwei > > I think that the approach we took (signal configuration > through standby) assumes standby always present and > primary appearing and disappearing. I can imagine that's still true even for 1-netdev model. As long as we can hide the lower devices. That's what I said "to keep the potential to support both" models. But we should not go further along to assume the other aspect ie. to get PV accelerated whenever possible, but not to get VF migrated whenever possible. That's essetially a big diveregence of the priority for the goal we'd want to pursue. -Siwei > > Anything else isn't well supported and I'm not sure we > should complicate code too much to support it. > >> >> > >> >> What about migration of vfio devices that are not easily replaced by a >> >> paravirtualized device? I'm thinking of vfio-ccw, where our main (and >> >> currently only) supported device is dasd (disks) -- which can do a lot >> >> of specialized things that virtio-blk does not support (and should not >> >> or even cannot support). >> > >> > But maybe virtio-scsi can? >> > >> >> Would it be more helpful to focus on generic >> >> migration support for vfio instead of going about it device by device? >> >> >> >> This network device approach already seems far along, so it makes sense >> >> to continue with it. But I'm not sure whether we want to spend time and >> >> energy on that for other device types rather than working on a general >> >> solution for vfio migration. >> > >> > I'm inclined to say finalizing this feature would be a good first step >> > and will teach us how we can move forward. >> > >> > -- >> > MST
On Fri, Jun 22, 2018 at 2:47 PM, Michael S. Tsirkin <mst@redhat.com> wrote: > On Fri, Jun 22, 2018 at 12:43:26PM -0700, Siwei Liu wrote: >> > The semantics are that the primary is always used if present in >> > preference to standby. >> OK. If this is the only semantics of what "standby" refers to in >> general, that is fine. >> >> I just don't want to limit the failover/standby semantics to the >> device model specifics, the "accelerated datapath" thing or whatever. >> I really don't know where the requirements of the "accelerated >> datapath" came from, > > It's a way to put it that matches what failover actually provides. > >> as the originial problem is migrating vfio >> devices which is in match of QEMU's live migration model. > > If you put it this way then it's natural to require that we have a > config with a working vfio device, and that we somehow add virtio for > duration of migration. OK. That's the STANDBY feature sematics as I expect. Not something like "we have a working virtio, but we need an accelerated datapath via VFIO when the VM is not migrating". The VFIO is the what needs to be concerned with, not the virtio. The way I view it, the STANDBY feature, although present in the virtio device, is served as a communication channel for the paired VFIO device, and the main purpose of its feature negotiation process has to be around for migrating the corresponding VFIO. While it's possible to re-use it as a side way to achieve "accelerated datapath". There could be other alternatives for vfio migration though, which do not need the virtio helper, so don't need to get a STANDBY virtio for that VFIO device. > >> Hyper-V has >> it's limitation to do 1-netdev should not impact how KVM/QEMU should >> be doing it. > > That's a linux thing and pretty orthogonal to host/guest interface. I agree. So don't assume any device model specifics in the host/guest interface. > >> > Jason said virtio net is sometimes preferable. >> > If that's the case don't make it a standby. >> > >> > More advanced use-cases do exist and e.g. Alexander Duyck >> > suggested using a switch-dev. >> >> The switchdev case would need a new feature bit, right? >> >> -Siwei > > I think it would need to be a completely new device. So how it relates to virtio or failover? Regards, -Siwei > >> > failover isn't it though. >> > >> > -- >> > MST
On Fri, Jun 22, 2018 at 02:51:11PM -0700, Siwei Liu wrote: > On Fri, Jun 22, 2018 at 2:29 PM, Michael S. Tsirkin <mst@redhat.com> wrote: > > On Fri, Jun 22, 2018 at 01:03:04PM -0700, Siwei Liu wrote: > >> On Fri, Jun 22, 2018 at 1:00 PM, Siwei Liu <loseweigh@gmail.com> wrote: > >> > On Thu, Jun 21, 2018 at 7:32 PM, Michael S. Tsirkin <mst@redhat.com> wrote: > >> >> On Thu, Jun 21, 2018 at 06:21:55PM -0700, Siwei Liu wrote: > >> >>> On Thu, Jun 21, 2018 at 7:59 AM, Cornelia Huck <cohuck@redhat.com> wrote: > >> >>> > On Wed, 20 Jun 2018 22:48:58 +0300 > >> >>> > "Michael S. Tsirkin" <mst@redhat.com> wrote: > >> >>> > > >> >>> >> On Wed, Jun 20, 2018 at 06:06:19PM +0200, Cornelia Huck wrote: > >> >>> >> > In any case, I'm not sure anymore why we'd want the extra uuid. > >> >>> >> > >> >>> >> It's mostly so we can have e.g. multiple devices with same MAC > >> >>> >> (which some people seem to want in order to then use > >> >>> >> then with different containers). > >> >>> >> > >> >>> >> But it is also handy for when you assign a PF, since then you > >> >>> >> can't set the MAC. > >> >>> >> > >> >>> > > >> >>> > OK, so what about the following: > >> >>> > > >> >>> > - introduce a new feature bit, VIRTIO_NET_F_STANDBY_UUID that indicates > >> >>> > that we have a new uuid field in the virtio-net config space > >> >>> > - in QEMU, add a property for virtio-net that allows to specify a uuid, > >> >>> > offer VIRTIO_NET_F_STANDBY_UUID if set > >> >>> > - when configuring, set the property to the group UUID of the vfio-pci > >> >>> > device > >> >>> > >> >>> If feature negotiation fails on VIRTIO_NET_F_STANDBY_UUID, is it safe > >> >>> to still expose UUID in the config space on virtio-pci? > >> >> > >> >> > >> >> Yes but guest is not supposed to read it. > >> >> > >> >>> I'm not even sure if it's sane to expose group UUID on the PCI bridge > >> >>> where the corresponding vfio-pci device attached to for a guest which > >> >>> doesn't support the feature (legacy). > >> >>> > >> >>> -Siwei > >> >> > >> >> Yes but you won't add the primary behind such a bridge. > >> > > >> > I assume the UUID feature is a new one besides the exiting > >> > VIRTIO_NET_F_STANDBY feature, where I think the current proposal is, > >> > if UUID feature is present and supported by guest, we'll do pairing > >> > based on UUID; if UUID feature present > >> ^^^^^^^ is NOT present > > > > Let's go back for a bit. > > > > What is wrong with matching by mac? > > > > 1. Does not allow multiple NICs with same mac > > 2. Requires MAC to be programmed by host in the PT device > > (which is often possible with VFs but isn't possible with all PT > > devices) > > Might not neccessarily be something wrong, but it's very limited to > prohibit the MAC of VF from changing when enslaved by failover. You mean guest changing MAC? I'm not sure why we prohibit that. > The > same as you indicated on the PT device. I suspect the same is > prohibited on even virtio-net and failover is because of the same > limitation. > > > > > Both issues are up to host: if host needs them it needs the UUID > > feature, if not MAC matching is sufficient. > > I know. What I'm saying is that we rely on STANDBY feature to control > the visibility of the primary, not the UUID feature. > > -Siwei And what I'm saying is that it's up to a host policy. > > > > > >> > or not supported by guest, > >> > we'll still plug in the VF (if guest supports the _F_STANDBY feature) > >> > but the pairing will be fallback to using MAC address. Are you saying > >> > you don't even want to plug in the primary when the UUID feature is > >> > not supported by guest? Does the feature negotiation UUID have to > >> > depend on STANDBY being set, or the other way around? I thought that > >> > just the absence of STANDBY will prevent primary to get exposed to the > >> > guest. > >> > > >> > -Siwei > >> > > >> >> > >> >>> > >> >>> > - in the guest, use the uuid from the virtio-net device's config space > >> >>> > if applicable; else, fall back to matching by MAC as done today > >
On Fri, Jun 22, 2018 at 03:25:19PM -0700, Siwei Liu wrote: > On Fri, Jun 22, 2018 at 2:47 PM, Michael S. Tsirkin <mst@redhat.com> wrote: > > On Fri, Jun 22, 2018 at 12:43:26PM -0700, Siwei Liu wrote: > >> > The semantics are that the primary is always used if present in > >> > preference to standby. > >> OK. If this is the only semantics of what "standby" refers to in > >> general, that is fine. > >> > >> I just don't want to limit the failover/standby semantics to the > >> device model specifics, the "accelerated datapath" thing or whatever. > >> I really don't know where the requirements of the "accelerated > >> datapath" came from, > > > > It's a way to put it that matches what failover actually provides. > > > >> as the originial problem is migrating vfio > >> devices which is in match of QEMU's live migration model. > > > > If you put it this way then it's natural to require that we have a > > config with a working vfio device, and that we somehow add virtio for > > duration of migration. > > OK. That's the STANDBY feature sematics as I expect. Maybe but that isn't what virtio or hyperv implement. If someone tells you otherwise, they are mistaken IMHO. > Not something like "we have a working virtio, but we need an > accelerated datapath via VFIO when the VM is not migrating". The VFIO > is the what needs to be concerned with, not the virtio. > The way I view it, the STANDBY feature, although present in the virtio > device, is served as a communication channel for the paired VFIO > device, and the main purpose of its feature negotiation process has to > be around for migrating the corresponding VFIO. While it's possible to > re-use it as a side way to achieve "accelerated datapath". > > There could be other alternatives for vfio migration though, which do > not need the virtio helper, so don't need to get a STANDBY virtio for > that VFIO device. > > > > >> Hyper-V has > >> it's limitation to do 1-netdev should not impact how KVM/QEMU should > >> be doing it. > > > > That's a linux thing and pretty orthogonal to host/guest interface. > > I agree. So don't assume any device model specifics in the host/guest interface. > > > > >> > Jason said virtio net is sometimes preferable. > >> > If that's the case don't make it a standby. > >> > > >> > More advanced use-cases do exist and e.g. Alexander Duyck > >> > suggested using a switch-dev. > >> > >> The switchdev case would need a new feature bit, right? > >> > >> -Siwei > > > > I think it would need to be a completely new device. > > So how it relates to virtio or failover? > > Regards, > -Siwei It might with time offer a super-set of the features. > > > >> > failover isn't it though. > >> > > >> > -- > >> > MST
On Fri, Jun 22, 2018 at 02:57:39PM -0700, Siwei Liu wrote: > On Fri, Jun 22, 2018 at 2:32 PM, Michael S. Tsirkin <mst@redhat.com> wrote: > > On Fri, Jun 22, 2018 at 01:21:45PM -0700, Siwei Liu wrote: > >> On Fri, Jun 22, 2018 at 12:05 PM, Michael S. Tsirkin <mst@redhat.com> wrote: > >> > On Fri, Jun 22, 2018 at 05:09:55PM +0200, Cornelia Huck wrote: > >> >> On Thu, 21 Jun 2018 21:20:13 +0300 > >> >> "Michael S. Tsirkin" <mst@redhat.com> wrote: > >> >> > >> >> > On Thu, Jun 21, 2018 at 04:59:13PM +0200, Cornelia Huck wrote: > >> >> > > OK, so what about the following: > >> >> > > > >> >> > > - introduce a new feature bit, VIRTIO_NET_F_STANDBY_UUID that indicates > >> >> > > that we have a new uuid field in the virtio-net config space > >> >> > > - in QEMU, add a property for virtio-net that allows to specify a uuid, > >> >> > > offer VIRTIO_NET_F_STANDBY_UUID if set > >> >> > > - when configuring, set the property to the group UUID of the vfio-pci > >> >> > > device > >> >> > > - in the guest, use the uuid from the virtio-net device's config space > >> >> > > if applicable; else, fall back to matching by MAC as done today > >> >> > > > >> >> > > That should work for all virtio transports. > >> >> > > >> >> > True. I'm a bit unhappy that it's virtio net specific though > >> >> > since down the road I expect we'll have a very similar feature > >> >> > for scsi (and maybe others). > >> >> > > >> >> > But we do not have a way to have fields that are portable > >> >> > both across devices and transports, and I think it would > >> >> > be a useful addition. How would this work though? Any idea? > >> >> > >> >> Can we introduce some kind of device-independent config space area? > >> >> Pushing back the device-specific config space by a certain value if the > >> >> appropriate feature is negotiated and use that for things like the uuid? > >> > > >> > So config moves back and forth? > >> > Reminds me of the msi vector mess we had with pci. > >> > I'd rather have every transport add a new config. > >> > > >> >> But regardless of that, I'm not sure whether extending this approach to > >> >> other device types is the way to go. Tying together two different > >> >> devices is creating complicated situations at least in the hypervisor > >> >> (even if it's fairly straightforward in the guest). [I have not come > >> >> around again to look at the "how to handle visibility in QEMU" > >> >> questions due to lack of cycles, sorry about that.] > >> >> > >> >> So, what's the goal of this approach? Only to allow migration with > >> >> vfio-pci, or also to plug in a faster device and use it instead of an > >> >> already attached paravirtualized device? > >> > > >> > These are two sides of the same coin, I think the second approach > >> > is closer to what we are doing here. > >> > >> I'm leaning towards being conservative to keep the potential of doing > >> both. It's the vfio migration itself that has to be addessed, not to > >> make every virtio device to have an accelerated datapath, right? > >> > >> -Siwei > > > > I think that the approach we took (signal configuration > > through standby) assumes standby always present and > > primary appearing and disappearing. > > I can imagine that's still true even for 1-netdev model. As long as we > can hide the lower devices. > > That's what I said "to keep the potential to support both" models. But > we should not go further along to assume the other aspect ie. to get > PV accelerated whenever possible, but not to get VF migrated whenever > possible. That's essetially a big diveregence of the priority for the > goal we'd want to pursue. > > -Siwei I suspect the diveregence will be lost on most users though simply because they don't even care about vfio. They just want things to go fast. Rephrasing requirements in terms of acceleration actually made things implementable. So I'm not in a hurry to try and go back to asking for a migrateable vfio - very easy to get stuck there. > > > > Anything else isn't well supported and I'm not sure we > > should complicate code too much to support it. > > > >> > >> > > >> >> What about migration of vfio devices that are not easily replaced by a > >> >> paravirtualized device? I'm thinking of vfio-ccw, where our main (and > >> >> currently only) supported device is dasd (disks) -- which can do a lot > >> >> of specialized things that virtio-blk does not support (and should not > >> >> or even cannot support). > >> > > >> > But maybe virtio-scsi can? > >> > > >> >> Would it be more helpful to focus on generic > >> >> migration support for vfio instead of going about it device by device? > >> >> > >> >> This network device approach already seems far along, so it makes sense > >> >> to continue with it. But I'm not sure whether we want to spend time and > >> >> energy on that for other device types rather than working on a general > >> >> solution for vfio migration. > >> > > >> > I'm inclined to say finalizing this feature would be a good first step > >> > and will teach us how we can move forward. > >> > > >> > -- > >> > MST
On Fri, Jun 22, 2018 at 3:25 PM, Michael S. Tsirkin <mst@redhat.com> wrote: > On Fri, Jun 22, 2018 at 02:51:11PM -0700, Siwei Liu wrote: >> On Fri, Jun 22, 2018 at 2:29 PM, Michael S. Tsirkin <mst@redhat.com> wrote: >> > On Fri, Jun 22, 2018 at 01:03:04PM -0700, Siwei Liu wrote: >> >> On Fri, Jun 22, 2018 at 1:00 PM, Siwei Liu <loseweigh@gmail.com> wrote: >> >> > On Thu, Jun 21, 2018 at 7:32 PM, Michael S. Tsirkin <mst@redhat.com> wrote: >> >> >> On Thu, Jun 21, 2018 at 06:21:55PM -0700, Siwei Liu wrote: >> >> >>> On Thu, Jun 21, 2018 at 7:59 AM, Cornelia Huck <cohuck@redhat.com> wrote: >> >> >>> > On Wed, 20 Jun 2018 22:48:58 +0300 >> >> >>> > "Michael S. Tsirkin" <mst@redhat.com> wrote: >> >> >>> > >> >> >>> >> On Wed, Jun 20, 2018 at 06:06:19PM +0200, Cornelia Huck wrote: >> >> >>> >> > In any case, I'm not sure anymore why we'd want the extra uuid. >> >> >>> >> >> >> >>> >> It's mostly so we can have e.g. multiple devices with same MAC >> >> >>> >> (which some people seem to want in order to then use >> >> >>> >> then with different containers). >> >> >>> >> >> >> >>> >> But it is also handy for when you assign a PF, since then you >> >> >>> >> can't set the MAC. >> >> >>> >> >> >> >>> > >> >> >>> > OK, so what about the following: >> >> >>> > >> >> >>> > - introduce a new feature bit, VIRTIO_NET_F_STANDBY_UUID that indicates >> >> >>> > that we have a new uuid field in the virtio-net config space >> >> >>> > - in QEMU, add a property for virtio-net that allows to specify a uuid, >> >> >>> > offer VIRTIO_NET_F_STANDBY_UUID if set >> >> >>> > - when configuring, set the property to the group UUID of the vfio-pci >> >> >>> > device >> >> >>> >> >> >>> If feature negotiation fails on VIRTIO_NET_F_STANDBY_UUID, is it safe >> >> >>> to still expose UUID in the config space on virtio-pci? >> >> >> >> >> >> >> >> >> Yes but guest is not supposed to read it. >> >> >> >> >> >>> I'm not even sure if it's sane to expose group UUID on the PCI bridge >> >> >>> where the corresponding vfio-pci device attached to for a guest which >> >> >>> doesn't support the feature (legacy). >> >> >>> >> >> >>> -Siwei >> >> >> >> >> >> Yes but you won't add the primary behind such a bridge. >> >> > >> >> > I assume the UUID feature is a new one besides the exiting >> >> > VIRTIO_NET_F_STANDBY feature, where I think the current proposal is, >> >> > if UUID feature is present and supported by guest, we'll do pairing >> >> > based on UUID; if UUID feature present >> >> ^^^^^^^ is NOT present >> > >> > Let's go back for a bit. >> > >> > What is wrong with matching by mac? >> > >> > 1. Does not allow multiple NICs with same mac >> > 2. Requires MAC to be programmed by host in the PT device >> > (which is often possible with VFs but isn't possible with all PT >> > devices) >> >> Might not neccessarily be something wrong, but it's very limited to >> prohibit the MAC of VF from changing when enslaved by failover. > > You mean guest changing MAC? I'm not sure why we prohibit that. I think Sridhar and Jiri might be better person to answer it. My impression was that sync'ing the MAC address change between all 3 devices is challenging, as the failover driver uses MAC address to match net_device internally. > > >> The >> same as you indicated on the PT device. I suspect the same is >> prohibited on even virtio-net and failover is because of the same >> limitation. >> >> > >> > Both issues are up to host: if host needs them it needs the UUID >> > feature, if not MAC matching is sufficient. >> >> I know. What I'm saying is that we rely on STANDBY feature to control >> the visibility of the primary, not the UUID feature. >> >> -Siwei > > And what I'm saying is that it's up to a host policy. So lets say host policy explicitly requires UUID but the guest does not support it. The guest could be a legacy guest with no STANDBY support, or a relevately recent guest with STANDBY support but without the UUID support. In either case, since host asked for UUID matching explicitly, maybe because it requires different NIC using same MAC, so it has to override the negotiation result of STANDBY, even if STANDBY is set and supported? That will end up with inter-feature dependency over STANDBY, as you see. -Siwei > >> > >> > >> >> > or not supported by guest, >> >> > we'll still plug in the VF (if guest supports the _F_STANDBY feature) >> >> > but the pairing will be fallback to using MAC address. Are you saying >> >> > you don't even want to plug in the primary when the UUID feature is >> >> > not supported by guest? Does the feature negotiation UUID have to >> >> > depend on STANDBY being set, or the other way around? I thought that >> >> > just the absence of STANDBY will prevent primary to get exposed to the >> >> > guest. >> >> > >> >> > -Siwei >> >> > >> >> >> >> >> >>> >> >> >>> > - in the guest, use the uuid from the virtio-net device's config space >> >> >>> > if applicable; else, fall back to matching by MAC as done today >> >
On Fri, Jun 22, 2018 at 3:33 PM, Michael S. Tsirkin <mst@redhat.com> wrote: > On Fri, Jun 22, 2018 at 02:57:39PM -0700, Siwei Liu wrote: >> On Fri, Jun 22, 2018 at 2:32 PM, Michael S. Tsirkin <mst@redhat.com> wrote: >> > On Fri, Jun 22, 2018 at 01:21:45PM -0700, Siwei Liu wrote: >> >> On Fri, Jun 22, 2018 at 12:05 PM, Michael S. Tsirkin <mst@redhat.com> wrote: >> >> > On Fri, Jun 22, 2018 at 05:09:55PM +0200, Cornelia Huck wrote: >> >> >> On Thu, 21 Jun 2018 21:20:13 +0300 >> >> >> "Michael S. Tsirkin" <mst@redhat.com> wrote: >> >> >> >> >> >> > On Thu, Jun 21, 2018 at 04:59:13PM +0200, Cornelia Huck wrote: >> >> >> > > OK, so what about the following: >> >> >> > > >> >> >> > > - introduce a new feature bit, VIRTIO_NET_F_STANDBY_UUID that indicates >> >> >> > > that we have a new uuid field in the virtio-net config space >> >> >> > > - in QEMU, add a property for virtio-net that allows to specify a uuid, >> >> >> > > offer VIRTIO_NET_F_STANDBY_UUID if set >> >> >> > > - when configuring, set the property to the group UUID of the vfio-pci >> >> >> > > device >> >> >> > > - in the guest, use the uuid from the virtio-net device's config space >> >> >> > > if applicable; else, fall back to matching by MAC as done today >> >> >> > > >> >> >> > > That should work for all virtio transports. >> >> >> > >> >> >> > True. I'm a bit unhappy that it's virtio net specific though >> >> >> > since down the road I expect we'll have a very similar feature >> >> >> > for scsi (and maybe others). >> >> >> > >> >> >> > But we do not have a way to have fields that are portable >> >> >> > both across devices and transports, and I think it would >> >> >> > be a useful addition. How would this work though? Any idea? >> >> >> >> >> >> Can we introduce some kind of device-independent config space area? >> >> >> Pushing back the device-specific config space by a certain value if the >> >> >> appropriate feature is negotiated and use that for things like the uuid? >> >> > >> >> > So config moves back and forth? >> >> > Reminds me of the msi vector mess we had with pci. >> >> > I'd rather have every transport add a new config. >> >> > >> >> >> But regardless of that, I'm not sure whether extending this approach to >> >> >> other device types is the way to go. Tying together two different >> >> >> devices is creating complicated situations at least in the hypervisor >> >> >> (even if it's fairly straightforward in the guest). [I have not come >> >> >> around again to look at the "how to handle visibility in QEMU" >> >> >> questions due to lack of cycles, sorry about that.] >> >> >> >> >> >> So, what's the goal of this approach? Only to allow migration with >> >> >> vfio-pci, or also to plug in a faster device and use it instead of an >> >> >> already attached paravirtualized device? >> >> > >> >> > These are two sides of the same coin, I think the second approach >> >> > is closer to what we are doing here. >> >> >> >> I'm leaning towards being conservative to keep the potential of doing >> >> both. It's the vfio migration itself that has to be addessed, not to >> >> make every virtio device to have an accelerated datapath, right? >> >> >> >> -Siwei >> > >> > I think that the approach we took (signal configuration >> > through standby) assumes standby always present and >> > primary appearing and disappearing. >> >> I can imagine that's still true even for 1-netdev model. As long as we >> can hide the lower devices. >> >> That's what I said "to keep the potential to support both" models. But >> we should not go further along to assume the other aspect ie. to get >> PV accelerated whenever possible, but not to get VF migrated whenever >> possible. That's essetially a big diveregence of the priority for the >> goal we'd want to pursue. >> >> -Siwei > > I suspect the diveregence will be lost on most users though > simply because they don't even care about vfio. They just > want things to go fast. Like Jason said, VF isn't faster than virtio-net in all cases. It depends on the workload and performance metrics: throughput, latency, or packet per second. > > Rephrasing requirements in terms of acceleration actually > made things implementable. So I'm not in a hurry to try > and go back to asking for a migrateable vfio - very > easy to get stuck there. Understood. When we get all ethtool_ops exposed on the failover interface, we'll get back to attack migrateable vfio with the 1-netdev model. -Siwei > >> > >> > Anything else isn't well supported and I'm not sure we >> > should complicate code too much to support it. >> > >> >> >> >> > >> >> >> What about migration of vfio devices that are not easily replaced by a >> >> >> paravirtualized device? I'm thinking of vfio-ccw, where our main (and >> >> >> currently only) supported device is dasd (disks) -- which can do a lot >> >> >> of specialized things that virtio-blk does not support (and should not >> >> >> or even cannot support). >> >> > >> >> > But maybe virtio-scsi can? >> >> > >> >> >> Would it be more helpful to focus on generic >> >> >> migration support for vfio instead of going about it device by device? >> >> >> >> >> >> This network device approach already seems far along, so it makes sense >> >> >> to continue with it. But I'm not sure whether we want to spend time and >> >> >> energy on that for other device types rather than working on a general >> >> >> solution for vfio migration. >> >> > >> >> > I'm inclined to say finalizing this feature would be a good first step >> >> > and will teach us how we can move forward. >> >> > >> >> > -- >> >> > MST
On Fri, Jun 22, 2018 at 4:40 PM, Siwei Liu <loseweigh@gmail.com> wrote: > On Fri, Jun 22, 2018 at 3:25 PM, Michael S. Tsirkin <mst@redhat.com> wrote: >> On Fri, Jun 22, 2018 at 02:51:11PM -0700, Siwei Liu wrote: >>> On Fri, Jun 22, 2018 at 2:29 PM, Michael S. Tsirkin <mst@redhat.com> wrote: >>> > On Fri, Jun 22, 2018 at 01:03:04PM -0700, Siwei Liu wrote: >>> >> On Fri, Jun 22, 2018 at 1:00 PM, Siwei Liu <loseweigh@gmail.com> wrote: >>> >> > On Thu, Jun 21, 2018 at 7:32 PM, Michael S. Tsirkin <mst@redhat.com> wrote: >>> >> >> On Thu, Jun 21, 2018 at 06:21:55PM -0700, Siwei Liu wrote: >>> >> >>> On Thu, Jun 21, 2018 at 7:59 AM, Cornelia Huck <cohuck@redhat.com> wrote: >>> >> >>> > On Wed, 20 Jun 2018 22:48:58 +0300 >>> >> >>> > "Michael S. Tsirkin" <mst@redhat.com> wrote: >>> >> >>> > >>> >> >>> >> On Wed, Jun 20, 2018 at 06:06:19PM +0200, Cornelia Huck wrote: >>> >> >>> >> > In any case, I'm not sure anymore why we'd want the extra uuid. >>> >> >>> >> >>> >> >>> >> It's mostly so we can have e.g. multiple devices with same MAC >>> >> >>> >> (which some people seem to want in order to then use >>> >> >>> >> then with different containers). >>> >> >>> >> >>> >> >>> >> But it is also handy for when you assign a PF, since then you >>> >> >>> >> can't set the MAC. >>> >> >>> >> >>> >> >>> > >>> >> >>> > OK, so what about the following: >>> >> >>> > >>> >> >>> > - introduce a new feature bit, VIRTIO_NET_F_STANDBY_UUID that indicates >>> >> >>> > that we have a new uuid field in the virtio-net config space >>> >> >>> > - in QEMU, add a property for virtio-net that allows to specify a uuid, >>> >> >>> > offer VIRTIO_NET_F_STANDBY_UUID if set >>> >> >>> > - when configuring, set the property to the group UUID of the vfio-pci >>> >> >>> > device >>> >> >>> >>> >> >>> If feature negotiation fails on VIRTIO_NET_F_STANDBY_UUID, is it safe >>> >> >>> to still expose UUID in the config space on virtio-pci? >>> >> >> >>> >> >> >>> >> >> Yes but guest is not supposed to read it. >>> >> >> >>> >> >>> I'm not even sure if it's sane to expose group UUID on the PCI bridge >>> >> >>> where the corresponding vfio-pci device attached to for a guest which >>> >> >>> doesn't support the feature (legacy). >>> >> >>> >>> >> >>> -Siwei >>> >> >> >>> >> >> Yes but you won't add the primary behind such a bridge. >>> >> > >>> >> > I assume the UUID feature is a new one besides the exiting >>> >> > VIRTIO_NET_F_STANDBY feature, where I think the current proposal is, >>> >> > if UUID feature is present and supported by guest, we'll do pairing >>> >> > based on UUID; if UUID feature present >>> >> ^^^^^^^ is NOT present >>> > >>> > Let's go back for a bit. >>> > >>> > What is wrong with matching by mac? >>> > >>> > 1. Does not allow multiple NICs with same mac >>> > 2. Requires MAC to be programmed by host in the PT device >>> > (which is often possible with VFs but isn't possible with all PT >>> > devices) >>> >>> Might not neccessarily be something wrong, but it's very limited to >>> prohibit the MAC of VF from changing when enslaved by failover. >> >> You mean guest changing MAC? I'm not sure why we prohibit that. > > I think Sridhar and Jiri might be better person to answer it. My > impression was that sync'ing the MAC address change between all 3 > devices is challenging, as the failover driver uses MAC address to > match net_device internally. > >> >> >>> The >>> same as you indicated on the PT device. I suspect the same is >>> prohibited on even virtio-net and failover is because of the same >>> limitation. >>> >>> > >>> > Both issues are up to host: if host needs them it needs the UUID >>> > feature, if not MAC matching is sufficient. >>> >>> I know. What I'm saying is that we rely on STANDBY feature to control >>> the visibility of the primary, not the UUID feature. >>> >>> -Siwei >> >> And what I'm saying is that it's up to a host policy. > > So lets say host policy explicitly requires UUID but the guest does > not support it. The guest could be a legacy guest with no STANDBY > support, or a relevately recent guest with STANDBY support but without > the UUID support. In either case, since host asked for UUID matching > explicitly, maybe because it requires different NIC using same MAC, so > it has to override the negotiation result of STANDBY, even if STANDBY > is set and supported? That will end up with inter-feature dependency > over STANDBY, as you see. I forgot to mention the above has the assumption that we expose both STANDBY and UUID feature to QEMU user. In fact, as we're going towards not exposing the STANDBY feature directly to user, UUID may be always required to enable STANDBY. If not, how do we make sure QEMU can control the visibility of primary device? Something to be confirmed before implementing the code. -Siwei > > -Siwei > >> >>> > >>> > >>> >> > or not supported by guest, >>> >> > we'll still plug in the VF (if guest supports the _F_STANDBY feature) >>> >> > but the pairing will be fallback to using MAC address. Are you saying >>> >> > you don't even want to plug in the primary when the UUID feature is >>> >> > not supported by guest? Does the feature negotiation UUID have to >>> >> > depend on STANDBY being set, or the other way around? I thought that >>> >> > just the absence of STANDBY will prevent primary to get exposed to the >>> >> > guest. >>> >> > >>> >> > -Siwei >>> >> > >>> >> >> >>> >> >>> >>> >> >>> > - in the guest, use the uuid from the virtio-net device's config space >>> >> >>> > if applicable; else, fall back to matching by MAC as done today >>> >
On Fri, Jun 22, 2018 at 05:17:10PM -0700, Siwei Liu wrote: > I forgot to mention the above has the assumption that we expose both > STANDBY and UUID feature to QEMU user. In fact, as we're going towards > not exposing the STANDBY feature directly to user, UUID may be always > required to enable STANDBY. Sounds good. > If not, how do we make sure QEMU can > control the visibility of primary device? Hypervisors fundamentally always can control visibility of all virtual devices. > Something to be confirmed > before implementing the code. >
On Fri, 22 Jun 2018 22:05:50 +0300 "Michael S. Tsirkin" <mst@redhat.com> wrote: > On Fri, Jun 22, 2018 at 05:09:55PM +0200, Cornelia Huck wrote: > > On Thu, 21 Jun 2018 21:20:13 +0300 > > "Michael S. Tsirkin" <mst@redhat.com> wrote: > > > > > On Thu, Jun 21, 2018 at 04:59:13PM +0200, Cornelia Huck wrote: > > > > OK, so what about the following: > > > > > > > > - introduce a new feature bit, VIRTIO_NET_F_STANDBY_UUID that indicates > > > > that we have a new uuid field in the virtio-net config space > > > > - in QEMU, add a property for virtio-net that allows to specify a uuid, > > > > offer VIRTIO_NET_F_STANDBY_UUID if set > > > > - when configuring, set the property to the group UUID of the vfio-pci > > > > device > > > > - in the guest, use the uuid from the virtio-net device's config space > > > > if applicable; else, fall back to matching by MAC as done today > > > > > > > > That should work for all virtio transports. > > > > > > True. I'm a bit unhappy that it's virtio net specific though > > > since down the road I expect we'll have a very similar feature > > > for scsi (and maybe others). > > > > > > But we do not have a way to have fields that are portable > > > both across devices and transports, and I think it would > > > be a useful addition. How would this work though? Any idea? > > > > Can we introduce some kind of device-independent config space area? > > Pushing back the device-specific config space by a certain value if the > > appropriate feature is negotiated and use that for things like the uuid? > > So config moves back and forth? > Reminds me of the msi vector mess we had with pci. Yes, that would be a bit unfortunate. > I'd rather have every transport add a new config. You mean via different mechanisms? > > > But regardless of that, I'm not sure whether extending this approach to > > other device types is the way to go. Tying together two different > > devices is creating complicated situations at least in the hypervisor > > (even if it's fairly straightforward in the guest). [I have not come > > around again to look at the "how to handle visibility in QEMU" > > questions due to lack of cycles, sorry about that.] > > > > So, what's the goal of this approach? Only to allow migration with > > vfio-pci, or also to plug in a faster device and use it instead of an > > already attached paravirtualized device? > > These are two sides of the same coin, I think the second approach > is closer to what we are doing here. Thinking about it, do we need any knob to keep the vfio device invisible if the virtio device is not present? IOW, how does the hypervisor know that the vfio device is supposed to be paired with a virtio device? It seems we need an explicit tie-in. > > > What about migration of vfio devices that are not easily replaced by a > > paravirtualized device? I'm thinking of vfio-ccw, where our main (and > > currently only) supported device is dasd (disks) -- which can do a lot > > of specialized things that virtio-blk does not support (and should not > > or even cannot support). > > But maybe virtio-scsi can? I don't think so. Dasds have some channel commands that don't map easily to scsi commands. > > > Would it be more helpful to focus on generic > > migration support for vfio instead of going about it device by device? > > > > This network device approach already seems far along, so it makes sense > > to continue with it. But I'm not sure whether we want to spend time and > > energy on that for other device types rather than working on a general > > solution for vfio migration. > > I'm inclined to say finalizing this feature would be a good first step > and will teach us how we can move forward. I'm not opposed to figuring out this one, but I'm not sure whether we want to extend it to more device types. Are people looking into generic migration support? I have it on my things-to-look-at list (figuring out what needs to be device specific and what can be generic, figuring out how we can support vfio-ccw, etc.).
On 6/22/2018 5:17 PM, Siwei Liu wrote: > On Fri, Jun 22, 2018 at 4:40 PM, Siwei Liu <loseweigh@gmail.com> wrote: >> On Fri, Jun 22, 2018 at 3:25 PM, Michael S. Tsirkin <mst@redhat.com> wrote: >>> On Fri, Jun 22, 2018 at 02:51:11PM -0700, Siwei Liu wrote: >>>> On Fri, Jun 22, 2018 at 2:29 PM, Michael S. Tsirkin <mst@redhat.com> wrote: >>>>> On Fri, Jun 22, 2018 at 01:03:04PM -0700, Siwei Liu wrote: >>>>>> On Fri, Jun 22, 2018 at 1:00 PM, Siwei Liu <loseweigh@gmail.com> wrote: >>>>>>> On Thu, Jun 21, 2018 at 7:32 PM, Michael S. Tsirkin <mst@redhat.com> wrote: >>>>>>>> On Thu, Jun 21, 2018 at 06:21:55PM -0700, Siwei Liu wrote: >>>>>>>>> On Thu, Jun 21, 2018 at 7:59 AM, Cornelia Huck <cohuck@redhat.com> wrote: >>>>>>>>>> On Wed, 20 Jun 2018 22:48:58 +0300 >>>>>>>>>> "Michael S. Tsirkin" <mst@redhat.com> wrote: >>>>>>>>>> >>>>>>>>>>> On Wed, Jun 20, 2018 at 06:06:19PM +0200, Cornelia Huck wrote: >>>>>>>>>>>> In any case, I'm not sure anymore why we'd want the extra uuid. >>>>>>>>>>> It's mostly so we can have e.g. multiple devices with same MAC >>>>>>>>>>> (which some people seem to want in order to then use >>>>>>>>>>> then with different containers). >>>>>>>>>>> >>>>>>>>>>> But it is also handy for when you assign a PF, since then you >>>>>>>>>>> can't set the MAC. >>>>>>>>>>> >>>>>>>>>> OK, so what about the following: >>>>>>>>>> >>>>>>>>>> - introduce a new feature bit, VIRTIO_NET_F_STANDBY_UUID that indicates >>>>>>>>>> that we have a new uuid field in the virtio-net config space >>>>>>>>>> - in QEMU, add a property for virtio-net that allows to specify a uuid, >>>>>>>>>> offer VIRTIO_NET_F_STANDBY_UUID if set >>>>>>>>>> - when configuring, set the property to the group UUID of the vfio-pci >>>>>>>>>> device >>>>>>>>> If feature negotiation fails on VIRTIO_NET_F_STANDBY_UUID, is it safe >>>>>>>>> to still expose UUID in the config space on virtio-pci? >>>>>>>> >>>>>>>> Yes but guest is not supposed to read it. >>>>>>>> >>>>>>>>> I'm not even sure if it's sane to expose group UUID on the PCI bridge >>>>>>>>> where the corresponding vfio-pci device attached to for a guest which >>>>>>>>> doesn't support the feature (legacy). >>>>>>>>> >>>>>>>>> -Siwei >>>>>>>> Yes but you won't add the primary behind such a bridge. >>>>>>> I assume the UUID feature is a new one besides the exiting >>>>>>> VIRTIO_NET_F_STANDBY feature, where I think the current proposal is, >>>>>>> if UUID feature is present and supported by guest, we'll do pairing >>>>>>> based on UUID; if UUID feature present >>>>>> ^^^^^^^ is NOT present >>>>> Let's go back for a bit. >>>>> >>>>> What is wrong with matching by mac? >>>>> >>>>> 1. Does not allow multiple NICs with same mac >>>>> 2. Requires MAC to be programmed by host in the PT device >>>>> (which is often possible with VFs but isn't possible with all PT >>>>> devices) >>>> Might not neccessarily be something wrong, but it's very limited to >>>> prohibit the MAC of VF from changing when enslaved by failover. >>> You mean guest changing MAC? I'm not sure why we prohibit that. >> I think Sridhar and Jiri might be better person to answer it. My >> impression was that sync'ing the MAC address change between all 3 >> devices is challenging, as the failover driver uses MAC address to >> match net_device internally. Yes. The MAC address is assigned by the hypervisor and it needs to manage the movement of the MAC between the PF and VF. Allowing the guest to change the MAC will require synchronization between the hypervisor and the PF/VF drivers. Most of the VF drivers don't allow changing guest MAC unless it is a trusted VF.
On Mon, Jun 25, 2018 at 11:55:12AM +0200, Cornelia Huck wrote: > On Fri, 22 Jun 2018 22:05:50 +0300 > "Michael S. Tsirkin" <mst@redhat.com> wrote: > > > On Fri, Jun 22, 2018 at 05:09:55PM +0200, Cornelia Huck wrote: > > > On Thu, 21 Jun 2018 21:20:13 +0300 > > > "Michael S. Tsirkin" <mst@redhat.com> wrote: > > > > > > > On Thu, Jun 21, 2018 at 04:59:13PM +0200, Cornelia Huck wrote: > > > > > OK, so what about the following: > > > > > > > > > > - introduce a new feature bit, VIRTIO_NET_F_STANDBY_UUID that indicates > > > > > that we have a new uuid field in the virtio-net config space > > > > > - in QEMU, add a property for virtio-net that allows to specify a uuid, > > > > > offer VIRTIO_NET_F_STANDBY_UUID if set > > > > > - when configuring, set the property to the group UUID of the vfio-pci > > > > > device > > > > > - in the guest, use the uuid from the virtio-net device's config space > > > > > if applicable; else, fall back to matching by MAC as done today > > > > > > > > > > That should work for all virtio transports. > > > > > > > > True. I'm a bit unhappy that it's virtio net specific though > > > > since down the road I expect we'll have a very similar feature > > > > for scsi (and maybe others). > > > > > > > > But we do not have a way to have fields that are portable > > > > both across devices and transports, and I think it would > > > > be a useful addition. How would this work though? Any idea? > > > > > > Can we introduce some kind of device-independent config space area? > > > Pushing back the device-specific config space by a certain value if the > > > appropriate feature is negotiated and use that for things like the uuid? > > > > So config moves back and forth? > > Reminds me of the msi vector mess we had with pci. > > Yes, that would be a bit unfortunate. > > > I'd rather have every transport add a new config. > > You mean via different mechanisms? I guess so. > > > > > But regardless of that, I'm not sure whether extending this approach to > > > other device types is the way to go. Tying together two different > > > devices is creating complicated situations at least in the hypervisor > > > (even if it's fairly straightforward in the guest). [I have not come > > > around again to look at the "how to handle visibility in QEMU" > > > questions due to lack of cycles, sorry about that.] > > > > > > So, what's the goal of this approach? Only to allow migration with > > > vfio-pci, or also to plug in a faster device and use it instead of an > > > already attached paravirtualized device? > > > > These are two sides of the same coin, I think the second approach > > is closer to what we are doing here. > > Thinking about it, do we need any knob to keep the vfio device > invisible if the virtio device is not present? IOW, how does the > hypervisor know that the vfio device is supposed to be paired with a > virtio device? It seems we need an explicit tie-in. If we are going the way of the bridge, both bridge and virtio would have some kind of id. When pairing using mac, I'm less sure. PAss vfio device mac to qemu as a property? > > > > > What about migration of vfio devices that are not easily replaced by a > > > paravirtualized device? I'm thinking of vfio-ccw, where our main (and > > > currently only) supported device is dasd (disks) -- which can do a lot > > > of specialized things that virtio-blk does not support (and should not > > > or even cannot support). > > > > But maybe virtio-scsi can? > > I don't think so. Dasds have some channel commands that don't map > easily to scsi commands. There's always a choice of adding these to the spec. E.g. FC extensions were proposed, I don't remember why they are still stuck. > > > > > Would it be more helpful to focus on generic > > > migration support for vfio instead of going about it device by device? > > > > > > This network device approach already seems far along, so it makes sense > > > to continue with it. But I'm not sure whether we want to spend time and > > > energy on that for other device types rather than working on a general > > > solution for vfio migration. > > > > I'm inclined to say finalizing this feature would be a good first step > > and will teach us how we can move forward. > > I'm not opposed to figuring out this one, but I'm not sure whether we > want to extend it to more device types. > > Are people looking into generic migration support? I have it on my > things-to-look-at list (figuring out what needs to be device specific > and what can be generic, figuring out how we can support vfio-ccw, > etc.). I expect to see more of it if SPDK makes progress.
On Mon, Jun 25, 2018 at 10:54:09AM -0700, Samudrala, Sridhar wrote: > > > > > Might not neccessarily be something wrong, but it's very limited to > > > > > prohibit the MAC of VF from changing when enslaved by failover. > > > > You mean guest changing MAC? I'm not sure why we prohibit that. > > > I think Sridhar and Jiri might be better person to answer it. My > > > impression was that sync'ing the MAC address change between all 3 > > > devices is challenging, as the failover driver uses MAC address to > > > match net_device internally. > > Yes. The MAC address is assigned by the hypervisor and it needs to manage the movement > of the MAC between the PF and VF. Allowing the guest to change the MAC will require > synchronization between the hypervisor and the PF/VF drivers. Most of the VF drivers > don't allow changing guest MAC unless it is a trusted VF. OK but it's a policy thing. Maybe it's a trusted VF. Who knows? For example I can see host just failing VIRTIO_NET_CTRL_MAC_ADDR_SET if it wants to block it. I'm not sure why VIRTIO_NET_F_STANDBY has to block it in the guest.
On Tue, 26 Jun 2018 04:46:03 +0300 "Michael S. Tsirkin" <mst@redhat.com> wrote: > On Mon, Jun 25, 2018 at 11:55:12AM +0200, Cornelia Huck wrote: > > On Fri, 22 Jun 2018 22:05:50 +0300 > > "Michael S. Tsirkin" <mst@redhat.com> wrote: > > > > > On Fri, Jun 22, 2018 at 05:09:55PM +0200, Cornelia Huck wrote: > > > > On Thu, 21 Jun 2018 21:20:13 +0300 > > > > "Michael S. Tsirkin" <mst@redhat.com> wrote: > > > > > > > > > On Thu, Jun 21, 2018 at 04:59:13PM +0200, Cornelia Huck wrote: > > > > > > OK, so what about the following: > > > > > > > > > > > > - introduce a new feature bit, VIRTIO_NET_F_STANDBY_UUID that indicates > > > > > > that we have a new uuid field in the virtio-net config space > > > > > > - in QEMU, add a property for virtio-net that allows to specify a uuid, > > > > > > offer VIRTIO_NET_F_STANDBY_UUID if set > > > > > > - when configuring, set the property to the group UUID of the vfio-pci > > > > > > device > > > > > > - in the guest, use the uuid from the virtio-net device's config space > > > > > > if applicable; else, fall back to matching by MAC as done today > > > > > > > > > > > > That should work for all virtio transports. > > > > > > > > > > True. I'm a bit unhappy that it's virtio net specific though > > > > > since down the road I expect we'll have a very similar feature > > > > > for scsi (and maybe others). > > > > > > > > > > But we do not have a way to have fields that are portable > > > > > both across devices and transports, and I think it would > > > > > be a useful addition. How would this work though? Any idea? > > > > > > > > Can we introduce some kind of device-independent config space area? > > > > Pushing back the device-specific config space by a certain value if the > > > > appropriate feature is negotiated and use that for things like the uuid? > > > > > > So config moves back and forth? > > > Reminds me of the msi vector mess we had with pci. > > > > Yes, that would be a bit unfortunate. > > > > > I'd rather have every transport add a new config. > > > > You mean via different mechanisms? > > I guess so. Is there an alternate mechanism for pci to use? (Not so familiar with it.) For ccw, this needs more thought. We already introduced two commands for reading/writing the config space (a concept that does not really exist on s390). There's the generic read configuration data command, but the data returned by it is not really generic enough. So we would need one new command (or two, if we need to write as well). I'm not sure about that yet. > > > > > > > > But regardless of that, I'm not sure whether extending this approach to > > > > other device types is the way to go. Tying together two different > > > > devices is creating complicated situations at least in the hypervisor > > > > (even if it's fairly straightforward in the guest). [I have not come > > > > around again to look at the "how to handle visibility in QEMU" > > > > questions due to lack of cycles, sorry about that.] > > > > > > > > So, what's the goal of this approach? Only to allow migration with > > > > vfio-pci, or also to plug in a faster device and use it instead of an > > > > already attached paravirtualized device? > > > > > > These are two sides of the same coin, I think the second approach > > > is closer to what we are doing here. > > > > Thinking about it, do we need any knob to keep the vfio device > > invisible if the virtio device is not present? IOW, how does the > > hypervisor know that the vfio device is supposed to be paired with a > > virtio device? It seems we need an explicit tie-in. > > If we are going the way of the bridge, both bridge and > virtio would have some kind of id. So the presence of the id would indicate "this is one part of a pair"? > > When pairing using mac, I'm less sure. PAss vfio device mac to qemu > as a property? That feels a bit odd. "This is the vfio device's mac, use this instead of your usual mac property"? As we have not designed the QEMU interface yet, just go with the id in any case? The guest can still match by mac. > > > > What about migration of vfio devices that are not easily replaced by a > > > > paravirtualized device? I'm thinking of vfio-ccw, where our main (and > > > > currently only) supported device is dasd (disks) -- which can do a lot > > > > of specialized things that virtio-blk does not support (and should not > > > > or even cannot support). > > > > > > But maybe virtio-scsi can? > > > > I don't think so. Dasds have some channel commands that don't map > > easily to scsi commands. > > There's always a choice of adding these to the spec. > E.g. FC extensions were proposed, I don't remember why they > are still stuck. FC extensions are a completely different kind of enhancements, though. For a start, they are not unique to a certain transport. Also, we have a whole list of special dasd issues. Weird disk layout for eckd, low-level disk formatting, etc. (See the list of commands in drivers/s390/block/dasd_eckd.h for an idea. There's also no public documentation AFAICS; https://en.wikipedia.org/wiki/ECKD does not link to anything interesting.) I don't think we want to cram stuff like this into a completely different framework.
On Tue, Jun 26, 2018 at 01:55:09PM +0200, Cornelia Huck wrote: > On Tue, 26 Jun 2018 04:46:03 +0300 > "Michael S. Tsirkin" <mst@redhat.com> wrote: > > > On Mon, Jun 25, 2018 at 11:55:12AM +0200, Cornelia Huck wrote: > > > On Fri, 22 Jun 2018 22:05:50 +0300 > > > "Michael S. Tsirkin" <mst@redhat.com> wrote: > > > > > > > On Fri, Jun 22, 2018 at 05:09:55PM +0200, Cornelia Huck wrote: > > > > > On Thu, 21 Jun 2018 21:20:13 +0300 > > > > > "Michael S. Tsirkin" <mst@redhat.com> wrote: > > > > > > > > > > > On Thu, Jun 21, 2018 at 04:59:13PM +0200, Cornelia Huck wrote: > > > > > > > OK, so what about the following: > > > > > > > > > > > > > > - introduce a new feature bit, VIRTIO_NET_F_STANDBY_UUID that indicates > > > > > > > that we have a new uuid field in the virtio-net config space > > > > > > > - in QEMU, add a property for virtio-net that allows to specify a uuid, > > > > > > > offer VIRTIO_NET_F_STANDBY_UUID if set > > > > > > > - when configuring, set the property to the group UUID of the vfio-pci > > > > > > > device > > > > > > > - in the guest, use the uuid from the virtio-net device's config space > > > > > > > if applicable; else, fall back to matching by MAC as done today > > > > > > > > > > > > > > That should work for all virtio transports. > > > > > > > > > > > > True. I'm a bit unhappy that it's virtio net specific though > > > > > > since down the road I expect we'll have a very similar feature > > > > > > for scsi (and maybe others). > > > > > > > > > > > > But we do not have a way to have fields that are portable > > > > > > both across devices and transports, and I think it would > > > > > > be a useful addition. How would this work though? Any idea? > > > > > > > > > > Can we introduce some kind of device-independent config space area? > > > > > Pushing back the device-specific config space by a certain value if the > > > > > appropriate feature is negotiated and use that for things like the uuid? > > > > > > > > So config moves back and forth? > > > > Reminds me of the msi vector mess we had with pci. > > > > > > Yes, that would be a bit unfortunate. > > > > > > > I'd rather have every transport add a new config. > > > > > > You mean via different mechanisms? > > > > I guess so. > > Is there an alternate mechanism for pci to use? (Not so familiar with > it.) We have a device and transport config capability. We could add a generic config capability too. > For ccw, this needs more thought. We already introduced two commands > for reading/writing the config space (a concept that does not really > exist on s390). There's the generic read configuration data command, > but the data returned by it is not really generic enough. So we would > need one new command (or two, if we need to write as well). I'm not > sure about that yet. > > > > > > > > > > > > But regardless of that, I'm not sure whether extending this approach to > > > > > other device types is the way to go. Tying together two different > > > > > devices is creating complicated situations at least in the hypervisor > > > > > (even if it's fairly straightforward in the guest). [I have not come > > > > > around again to look at the "how to handle visibility in QEMU" > > > > > questions due to lack of cycles, sorry about that.] > > > > > > > > > > So, what's the goal of this approach? Only to allow migration with > > > > > vfio-pci, or also to plug in a faster device and use it instead of an > > > > > already attached paravirtualized device? > > > > > > > > These are two sides of the same coin, I think the second approach > > > > is closer to what we are doing here. > > > > > > Thinking about it, do we need any knob to keep the vfio device > > > invisible if the virtio device is not present? IOW, how does the > > > hypervisor know that the vfio device is supposed to be paired with a > > > virtio device? It seems we need an explicit tie-in. > > > > If we are going the way of the bridge, both bridge and > > virtio would have some kind of id. > > So the presence of the id would indicate "this is one part of a pair"? I guess so, yes. > > > > When pairing using mac, I'm less sure. PAss vfio device mac to qemu > > as a property? > > That feels a bit odd. "This is the vfio device's mac, use this instead > of your usual mac property"? As we have not designed the QEMU interface > yet, just go with the id in any case? The guest can still match by mac. OK > > > > > What about migration of vfio devices that are not easily replaced by a > > > > > paravirtualized device? I'm thinking of vfio-ccw, where our main (and > > > > > currently only) supported device is dasd (disks) -- which can do a lot > > > > > of specialized things that virtio-blk does not support (and should not > > > > > or even cannot support). > > > > > > > > But maybe virtio-scsi can? > > > > > > I don't think so. Dasds have some channel commands that don't map > > > easily to scsi commands. > > > > There's always a choice of adding these to the spec. > > E.g. FC extensions were proposed, I don't remember why they > > are still stuck. > > FC extensions are a completely different kind of enhancements, though. > For a start, they are not unique to a certain transport. > > Also, we have a whole list of special dasd issues. Weird disk layout > for eckd, low-level disk formatting, etc. (See the list of commands in > drivers/s390/block/dasd_eckd.h for an idea. There's also no public > documentation AFAICS; https://en.wikipedia.org/wiki/ECKD does not link > to anything interesting.) I don't think we want to cram stuff like this > into a completely different framework.
On Fri, 22 Jun 2018 17:05:04 -0700 Siwei Liu <loseweigh@gmail.com> wrote: > On Fri, Jun 22, 2018 at 3:33 PM, Michael S. Tsirkin <mst@redhat.com> wrote: > > I suspect the diveregence will be lost on most users though > > simply because they don't even care about vfio. They just > > want things to go fast. > > Like Jason said, VF isn't faster than virtio-net in all cases. It > depends on the workload and performance metrics: throughput, latency, > or packet per second. So, will it be guest/admin-controllable then where the traffic flows through? Just because we do have a vf available after negotiation of the feature bit, it does not necessarily mean we want to use it? Do we (the guest) even want to make it visible in that case?
On Tue, 26 Jun 2018 04:50:25 +0300 "Michael S. Tsirkin" <mst@redhat.com> wrote: > On Mon, Jun 25, 2018 at 10:54:09AM -0700, Samudrala, Sridhar wrote: > > > > > > Might not neccessarily be something wrong, but it's very limited to > > > > > > prohibit the MAC of VF from changing when enslaved by failover. > > > > > You mean guest changing MAC? I'm not sure why we prohibit that. > > > > I think Sridhar and Jiri might be better person to answer it. My > > > > impression was that sync'ing the MAC address change between all 3 > > > > devices is challenging, as the failover driver uses MAC address to > > > > match net_device internally. > > > > Yes. The MAC address is assigned by the hypervisor and it needs to manage the movement > > of the MAC between the PF and VF. Allowing the guest to change the MAC will require > > synchronization between the hypervisor and the PF/VF drivers. Most of the VF drivers > > don't allow changing guest MAC unless it is a trusted VF. > > OK but it's a policy thing. Maybe it's a trusted VF. Who knows? > For example I can see host just > failing VIRTIO_NET_CTRL_MAC_ADDR_SET if it wants to block it. > I'm not sure why VIRTIO_NET_F_STANDBY has to block it in the guest. > So, what I get from this is that QEMU needs to be able to control all of standby, uuid, and mac to accommodate the different setups (respectively have libvirt/management software set it up). Is the host able to find out respectively define whether a VF is trusted?
On Tue, Jun 26, 2018 at 05:17:32PM +0200, Cornelia Huck wrote: > On Tue, 26 Jun 2018 04:50:25 +0300 > "Michael S. Tsirkin" <mst@redhat.com> wrote: > > > On Mon, Jun 25, 2018 at 10:54:09AM -0700, Samudrala, Sridhar wrote: > > > > > > > Might not neccessarily be something wrong, but it's very limited to > > > > > > > prohibit the MAC of VF from changing when enslaved by failover. > > > > > > You mean guest changing MAC? I'm not sure why we prohibit that. > > > > > I think Sridhar and Jiri might be better person to answer it. My > > > > > impression was that sync'ing the MAC address change between all 3 > > > > > devices is challenging, as the failover driver uses MAC address to > > > > > match net_device internally. > > > > > > Yes. The MAC address is assigned by the hypervisor and it needs to manage the movement > > > of the MAC between the PF and VF. Allowing the guest to change the MAC will require > > > synchronization between the hypervisor and the PF/VF drivers. Most of the VF drivers > > > don't allow changing guest MAC unless it is a trusted VF. > > > > OK but it's a policy thing. Maybe it's a trusted VF. Who knows? > > For example I can see host just > > failing VIRTIO_NET_CTRL_MAC_ADDR_SET if it wants to block it. > > I'm not sure why VIRTIO_NET_F_STANDBY has to block it in the guest. > > > > So, what I get from this is that QEMU needs to be able to control all > of standby, uuid, and mac to accommodate the different setups > (respectively have libvirt/management software set it up). Is the host > able to find out respectively define whether a VF is trusted? You do it with ip link I think but QEMU doesn't normally do this, it relies on libvirt to poke at host kernel and supply the info.
On Tue, 26 Jun 2018 18:38:51 +0300 "Michael S. Tsirkin" <mst@redhat.com> wrote: > On Tue, Jun 26, 2018 at 05:17:32PM +0200, Cornelia Huck wrote: > > On Tue, 26 Jun 2018 04:50:25 +0300 > > "Michael S. Tsirkin" <mst@redhat.com> wrote: > > > > > On Mon, Jun 25, 2018 at 10:54:09AM -0700, Samudrala, Sridhar wrote: > > > > > > > > Might not neccessarily be something wrong, but it's very limited to > > > > > > > > prohibit the MAC of VF from changing when enslaved by failover. > > > > > > > You mean guest changing MAC? I'm not sure why we prohibit that. > > > > > > I think Sridhar and Jiri might be better person to answer it. My > > > > > > impression was that sync'ing the MAC address change between all 3 > > > > > > devices is challenging, as the failover driver uses MAC address to > > > > > > match net_device internally. > > > > > > > > Yes. The MAC address is assigned by the hypervisor and it needs to manage the movement > > > > of the MAC between the PF and VF. Allowing the guest to change the MAC will require > > > > synchronization between the hypervisor and the PF/VF drivers. Most of the VF drivers > > > > don't allow changing guest MAC unless it is a trusted VF. > > > > > > OK but it's a policy thing. Maybe it's a trusted VF. Who knows? > > > For example I can see host just > > > failing VIRTIO_NET_CTRL_MAC_ADDR_SET if it wants to block it. > > > I'm not sure why VIRTIO_NET_F_STANDBY has to block it in the guest. > > > > > > > So, what I get from this is that QEMU needs to be able to control all > > of standby, uuid, and mac to accommodate the different setups > > (respectively have libvirt/management software set it up). Is the host > > able to find out respectively define whether a VF is trusted? > > You do it with ip link I think but QEMU doesn't normally do this, > it relies on libvirt to poke at host kernel and supply the info. > Ok, that makes me conclude that we definitely need to involve the libvirt folks before we proceed further with defining QEMU interfaces.
On Tue, Jun 26, 2018 at 06:03:16PM +0200, Cornelia Huck wrote: > Ok, that makes me conclude that we definitely need to involve the > libvirt folks before we proceed further with defining QEMU interfaces. That's always a wise thing to do.
On Tue, Jun 26, 2018 at 05:08:13PM +0200, Cornelia Huck wrote: > On Fri, 22 Jun 2018 17:05:04 -0700 > Siwei Liu <loseweigh@gmail.com> wrote: > > > On Fri, Jun 22, 2018 at 3:33 PM, Michael S. Tsirkin <mst@redhat.com> wrote: > > > I suspect the diveregence will be lost on most users though > > > simply because they don't even care about vfio. They just > > > want things to go fast. > > > > Like Jason said, VF isn't faster than virtio-net in all cases. It > > depends on the workload and performance metrics: throughput, latency, > > or packet per second. > > So, will it be guest/admin-controllable then where the traffic flows > through? Just because we do have a vf available after negotiation of > the feature bit, it does not necessarily mean we want to use it? Do we > (the guest) even want to make it visible in that case? I think these ideas belong to what Alex Duyck wanted to do: some kind of advanced device that isn't tied to any network interfaces and allows workload and performance specific tuning. Way out of scope for a simple failover, and more importantly, no one is looking at even enumerating the problems involved, much less solving them.
On Mon, Jun 25, 2018 at 6:50 PM, Michael S. Tsirkin <mst@redhat.com> wrote: > On Mon, Jun 25, 2018 at 10:54:09AM -0700, Samudrala, Sridhar wrote: >> > > > > Might not neccessarily be something wrong, but it's very limited to >> > > > > prohibit the MAC of VF from changing when enslaved by failover. >> > > > You mean guest changing MAC? I'm not sure why we prohibit that. >> > > I think Sridhar and Jiri might be better person to answer it. My >> > > impression was that sync'ing the MAC address change between all 3 >> > > devices is challenging, as the failover driver uses MAC address to >> > > match net_device internally. >> >> Yes. The MAC address is assigned by the hypervisor and it needs to manage the movement >> of the MAC between the PF and VF. Allowing the guest to change the MAC will require >> synchronization between the hypervisor and the PF/VF drivers. Most of the VF drivers >> don't allow changing guest MAC unless it is a trusted VF. > > OK but it's a policy thing. Maybe it's a trusted VF. Who knows? > For example I can see host just > failing VIRTIO_NET_CTRL_MAC_ADDR_SET if it wants to block it. > I'm not sure why VIRTIO_NET_F_STANDBY has to block it in the guest. That's why I think pairing using MAC is fragile IMHO. When VF's MAC got changed before virtio attempts to match and pair the device, it ends up with no pairing found out at all. UUID is better. -Siwei > > -- > MST
On Tue, Jun 26, 2018 at 04:38:26PM -0700, Siwei Liu wrote: > On Mon, Jun 25, 2018 at 6:50 PM, Michael S. Tsirkin <mst@redhat.com> wrote: > > On Mon, Jun 25, 2018 at 10:54:09AM -0700, Samudrala, Sridhar wrote: > >> > > > > Might not neccessarily be something wrong, but it's very limited to > >> > > > > prohibit the MAC of VF from changing when enslaved by failover. > >> > > > You mean guest changing MAC? I'm not sure why we prohibit that. > >> > > I think Sridhar and Jiri might be better person to answer it. My > >> > > impression was that sync'ing the MAC address change between all 3 > >> > > devices is challenging, as the failover driver uses MAC address to > >> > > match net_device internally. > >> > >> Yes. The MAC address is assigned by the hypervisor and it needs to manage the movement > >> of the MAC between the PF and VF. Allowing the guest to change the MAC will require > >> synchronization between the hypervisor and the PF/VF drivers. Most of the VF drivers > >> don't allow changing guest MAC unless it is a trusted VF. > > > > OK but it's a policy thing. Maybe it's a trusted VF. Who knows? > > For example I can see host just > > failing VIRTIO_NET_CTRL_MAC_ADDR_SET if it wants to block it. > > I'm not sure why VIRTIO_NET_F_STANDBY has to block it in the guest. > > That's why I think pairing using MAC is fragile IMHO. When VF's MAC > got changed before virtio attempts to match and pair the device, it > ends up with no pairing found out at all. Guest seems to match on the hardware mac and ignore whatever is set by user. Makes sense to me and should not be fragile. > UUID is better. > > -Siwei > > > > > -- > > MST
On Tue, Jun 26, 2018 at 5:29 PM, Michael S. Tsirkin <mst@redhat.com> wrote: > On Tue, Jun 26, 2018 at 04:38:26PM -0700, Siwei Liu wrote: >> On Mon, Jun 25, 2018 at 6:50 PM, Michael S. Tsirkin <mst@redhat.com> wrote: >> > On Mon, Jun 25, 2018 at 10:54:09AM -0700, Samudrala, Sridhar wrote: >> >> > > > > Might not neccessarily be something wrong, but it's very limited to >> >> > > > > prohibit the MAC of VF from changing when enslaved by failover. >> >> > > > You mean guest changing MAC? I'm not sure why we prohibit that. >> >> > > I think Sridhar and Jiri might be better person to answer it. My >> >> > > impression was that sync'ing the MAC address change between all 3 >> >> > > devices is challenging, as the failover driver uses MAC address to >> >> > > match net_device internally. >> >> >> >> Yes. The MAC address is assigned by the hypervisor and it needs to manage the movement >> >> of the MAC between the PF and VF. Allowing the guest to change the MAC will require >> >> synchronization between the hypervisor and the PF/VF drivers. Most of the VF drivers >> >> don't allow changing guest MAC unless it is a trusted VF. >> > >> > OK but it's a policy thing. Maybe it's a trusted VF. Who knows? >> > For example I can see host just >> > failing VIRTIO_NET_CTRL_MAC_ADDR_SET if it wants to block it. >> > I'm not sure why VIRTIO_NET_F_STANDBY has to block it in the guest. >> >> That's why I think pairing using MAC is fragile IMHO. When VF's MAC >> got changed before virtio attempts to match and pair the device, it >> ends up with no pairing found out at all. > > Guest seems to match on the hardware mac and ignore whatever > is set by user. Makes sense to me and should not be fragile. Host can change the hardware mac for VF any time. -Siwei > > >> UUID is better. >> >> -Siwei >> >> > >> > -- >> > MST
On 6/26/2018 11:21 PM, Siwei Liu wrote: > On Tue, Jun 26, 2018 at 5:29 PM, Michael S. Tsirkin <mst@redhat.com> wrote: >> On Tue, Jun 26, 2018 at 04:38:26PM -0700, Siwei Liu wrote: >>> On Mon, Jun 25, 2018 at 6:50 PM, Michael S. Tsirkin <mst@redhat.com> wrote: >>>> On Mon, Jun 25, 2018 at 10:54:09AM -0700, Samudrala, Sridhar wrote: >>>>>>>>> Might not neccessarily be something wrong, but it's very limited to >>>>>>>>> prohibit the MAC of VF from changing when enslaved by failover. >>>>>>>> You mean guest changing MAC? I'm not sure why we prohibit that. >>>>>>> I think Sridhar and Jiri might be better person to answer it. My >>>>>>> impression was that sync'ing the MAC address change between all 3 >>>>>>> devices is challenging, as the failover driver uses MAC address to >>>>>>> match net_device internally. >>>>> Yes. The MAC address is assigned by the hypervisor and it needs to manage the movement >>>>> of the MAC between the PF and VF. Allowing the guest to change the MAC will require >>>>> synchronization between the hypervisor and the PF/VF drivers. Most of the VF drivers >>>>> don't allow changing guest MAC unless it is a trusted VF. >>>> OK but it's a policy thing. Maybe it's a trusted VF. Who knows? >>>> For example I can see host just >>>> failing VIRTIO_NET_CTRL_MAC_ADDR_SET if it wants to block it. >>>> I'm not sure why VIRTIO_NET_F_STANDBY has to block it in the guest. >>> That's why I think pairing using MAC is fragile IMHO. When VF's MAC >>> got changed before virtio attempts to match and pair the device, it >>> ends up with no pairing found out at all. >> Guest seems to match on the hardware mac and ignore whatever >> is set by user. Makes sense to me and should not be fragile. > Host can change the hardware mac for VF any time. Live migration is initiated and controlled by the Host, So the Source Host will reset the MAC during live migration after unplugging the VF. This is to redirect the VMs frames towards PF so that they can be received via virtio-net standby interface. The destination host will set the VF MAC and plug the VF after live migration is completed. Allowing the guest to change the MAC will require the qemu/libvirt/mgmt layers to track the MAC changes and replay that change after live migration. > > -Siwei >> >>> UUID is better. >>> >>> -Siwei >>> >>>> -- >>>> MST
On Tue, Jun 26, 2018 at 11:49 PM, Samudrala, Sridhar <sridhar.samudrala@intel.com> wrote: > On 6/26/2018 11:21 PM, Siwei Liu wrote: >> >> On Tue, Jun 26, 2018 at 5:29 PM, Michael S. Tsirkin <mst@redhat.com> >> wrote: >>> >>> On Tue, Jun 26, 2018 at 04:38:26PM -0700, Siwei Liu wrote: >>>> >>>> On Mon, Jun 25, 2018 at 6:50 PM, Michael S. Tsirkin <mst@redhat.com> >>>> wrote: >>>>> >>>>> On Mon, Jun 25, 2018 at 10:54:09AM -0700, Samudrala, Sridhar wrote: >>>>>>>>>> >>>>>>>>>> Might not neccessarily be something wrong, but it's very limited >>>>>>>>>> to >>>>>>>>>> prohibit the MAC of VF from changing when enslaved by failover. >>>>>>>>> >>>>>>>>> You mean guest changing MAC? I'm not sure why we prohibit that. >>>>>>>> >>>>>>>> I think Sridhar and Jiri might be better person to answer it. My >>>>>>>> impression was that sync'ing the MAC address change between all 3 >>>>>>>> devices is challenging, as the failover driver uses MAC address to >>>>>>>> match net_device internally. >>>>>> >>>>>> Yes. The MAC address is assigned by the hypervisor and it needs to >>>>>> manage the movement >>>>>> of the MAC between the PF and VF. Allowing the guest to change the >>>>>> MAC will require >>>>>> synchronization between the hypervisor and the PF/VF drivers. Most of >>>>>> the VF drivers >>>>>> don't allow changing guest MAC unless it is a trusted VF. >>>>> >>>>> OK but it's a policy thing. Maybe it's a trusted VF. Who knows? >>>>> For example I can see host just >>>>> failing VIRTIO_NET_CTRL_MAC_ADDR_SET if it wants to block it. >>>>> I'm not sure why VIRTIO_NET_F_STANDBY has to block it in the guest. >>>> >>>> That's why I think pairing using MAC is fragile IMHO. When VF's MAC >>>> got changed before virtio attempts to match and pair the device, it >>>> ends up with no pairing found out at all. >>> >>> Guest seems to match on the hardware mac and ignore whatever >>> is set by user. Makes sense to me and should not be fragile. >> >> Host can change the hardware mac for VF any time. > > > Live migration is initiated and controlled by the Host, So the Source Host > will > reset the MAC during live migration after unplugging the VF. This is to > redirect the > VMs frames towards PF so that they can be received via virtio-net standby > interface. > The destination host will set the VF MAC and plug the VF after live > migration is > completed. > > Allowing the guest to change the MAC will require the qemu/libvirt/mgmt > layers to > track the MAC changes and replay that change after live migration. > If the failover's MAC is kept in sync with VF's MAC address change, the VF on destination host can be paired using the permanent address after plugging in, while failover interface will resync the MAC to the current one in use when enslaving the VF. I think similar is done for multicast and unicast address list on VF's registration, right? No need of QEMU or mgmt software keep track of MAC changes. -Siwei > > >> >> -Siwei >>> >>> >>>> UUID is better. >>>> >>>> -Siwei >>>> >>>>> -- >>>>> MST > >
On Tue, 26 Jun 2018 20:50:20 +0300 "Michael S. Tsirkin" <mst@redhat.com> wrote: > On Tue, Jun 26, 2018 at 05:08:13PM +0200, Cornelia Huck wrote: > > On Fri, 22 Jun 2018 17:05:04 -0700 > > Siwei Liu <loseweigh@gmail.com> wrote: > > > > > On Fri, Jun 22, 2018 at 3:33 PM, Michael S. Tsirkin <mst@redhat.com> wrote: > > > > I suspect the diveregence will be lost on most users though > > > > simply because they don't even care about vfio. They just > > > > want things to go fast. > > > > > > Like Jason said, VF isn't faster than virtio-net in all cases. It > > > depends on the workload and performance metrics: throughput, latency, > > > or packet per second. > > > > So, will it be guest/admin-controllable then where the traffic flows > > through? Just because we do have a vf available after negotiation of > > the feature bit, it does not necessarily mean we want to use it? Do we > > (the guest) even want to make it visible in that case? > > I think these ideas belong to what Alex Duyck wanted to do: > some kind of advanced device that isn't tied to > any network interfaces and allows workload and performance > specific tuning. > > Way out of scope for a simple failover, and more importantly, > no one is looking at even enumerating the problems involved, > much less solving them. So, for simplicity's sake, we need to rely on the host admin configuring the vm for its guest's intended use case. Sounds fair, but probably needs a note somewhere.
On Sat, 23 Jun 2018 00:43:24 +0300 "Michael S. Tsirkin" <mst@redhat.com> wrote: > On Fri, Jun 22, 2018 at 05:09:55PM +0200, Cornelia Huck wrote: > > Would it be more helpful to focus on generic > > migration support for vfio instead of going about it device by device? > > Just to note this approach is actually device by device *type*. It's > mostly device agnostic for a given device type so you can migrate > between hosts with very different hardware. This enables heterogeneous environments, yes. But one drawback of that is that you cannot exploit any hardware specialities - it seems you're limited to what the paravirtual device supports. This is limiting for more homogeneous environments. > > And support for more PV device types has other advantages > such as security and forward compatibility to future hosts. But again the drawback is that we can't exploit new capabilities easily, can we? > > Finally, it all can happen mostly within QEMU. User is currently > required to enable it but it's pretty lightweight. > > OTOH vfio migration generally requires actual device-specific work, and > only works when hosts are mostly identical. When they aren't it's easy > to blame the user, but tools for checking host compatiblity are > currently non-existent. Upper layer management will also have to learn > about host and device compatibility wrt migration. At the moment they > can't even figure it out wrt software versions of vhost in kernel and > dpdk so I won't hold my breath for all of this happening quickly. Yes, that's a real problem. I think one issue here is that we want to support really different environments. For the case here, we have a lot of different networking adapters, but the guests are basically interested in one thing: doing network traffic. On the other hand, I'm thinking of the mainframe environment, where we have a very limited set of devices to support, but at the same time want to exploit their specialities, so the pv approach is limiting. For that use case, generic migration looks more useful.
diff --git a/hw/net/virtio-net.c b/hw/net/virtio-net.c index 90502fca7c..38b3140670 100644 --- a/hw/net/virtio-net.c +++ b/hw/net/virtio-net.c @@ -2198,6 +2198,8 @@ static Property virtio_net_properties[] = { true), DEFINE_PROP_INT32("speed", VirtIONet, net_conf.speed, SPEED_UNKNOWN), DEFINE_PROP_STRING("duplex", VirtIONet, net_conf.duplex_str), + DEFINE_PROP_BIT64("standby", VirtIONet, host_features, VIRTIO_NET_F_STANDBY, + false), DEFINE_PROP_END_OF_LIST(), }; diff --git a/include/standard-headers/linux/virtio_net.h b/include/standard-headers/linux/virtio_net.h index e9f255ea3f..01ec09684c 100644 --- a/include/standard-headers/linux/virtio_net.h +++ b/include/standard-headers/linux/virtio_net.h @@ -57,6 +57,9 @@ * Steering */ #define VIRTIO_NET_F_CTRL_MAC_ADDR 23 /* Set MAC address */ +#define VIRTIO_NET_F_STANDBY 62 /* Act as standby for another device + * with the same MAC. + */ #define VIRTIO_NET_F_SPEED_DUPLEX 63 /* Device set linkspeed and duplex */ #ifndef VIRTIO_NET_NO_LEGACY
This feature bit can be used by hypervisor to indicate virtio_net device to act as a standby for another device with the same MAC address. I tested this with a small change to the patch to mark the STANDBY feature 'true' by default as i am using libvirt to start the VMs. Is there a way to pass the newly added feature bit 'standby' to qemu via libvirt XML file? Signed-off-by: Sridhar Samudrala <sridhar.samudrala@intel.com> --- hw/net/virtio-net.c | 2 ++ include/standard-headers/linux/virtio_net.h | 3 +++ 2 files changed, 5 insertions(+)