Message ID | 20210322122452.369750-1-yuri.benditovich@daynix.com (mailing list archive) |
---|---|
Headers | show |
Series | virtio-net: graceful drop of vhost for TAP | expand |
在 2021/3/22 下午8:24, Yuri Benditovich 写道: > Allow fallback to userspace only upon migration, only for specific features > and only if 'vhostforce' is not requested. > > Changes from v1: > Patch 1 dropeed (will be submitted in another series) > Added device callback in case the migration should fail due to missing features Hi Yuri: Have a quick glance at the series. A questions is why we need to do the fallback only during load? I think we should do it in the device initializating. E.g when the vhost features can not satisfy, we should disable vhost since there. Thanks > > Yuri Benditovich (3): > net: add ability to hide (disable) vhost_net > virtio: introduce 'missing_features_migrated' device callback > virtio-net: implement missing_features_migrated callback > > hw/net/vhost_net.c | 4 ++- > hw/net/virtio-net.c | 51 ++++++++++++++++++++++++++++++++++++++ > hw/virtio/virtio.c | 8 ++++++ > include/hw/virtio/virtio.h | 8 ++++++ > include/net/net.h | 1 + > 5 files changed, 71 insertions(+), 1 deletion(-) >
Hi Jason, This was discussed earlier on the previous series of patches. https://lists.gnu.org/archive/html/qemu-devel/2021-02/msg01829.html There were strong objections from both Daniel and Michael and I feel that the series was rejected. There was Michael's claim: "We did what this patch is trying to change for years now, in particular KVM also seems to happily disable CPU features not supported by kernel so I wonder why we can't keep doing it, with tweaks for some corner cases." https://lists.gnu.org/archive/html/qemu-devel/2021-02/msg03187.html And it was Michael's question: "Can we limit the change to when a VM is migrated in?" https://lists.gnu.org/archive/html/qemu-devel/2021-02/msg03163.html So I'm trying to suggest another approach: - In case of conflicting features (for example RSS and vhost) we in qemu we do not have enough information to prefer one or another. - If we drop to userspace in the first set_features we say: "vhost is less important than other requested features" - This series keeps backward compatibility, i.e. if you start with vhost and some features are not available - they are silently cleared. - But in case the features are available on source machine - they are used - In case of migration this series says: "We prefer successful migration even if for that we need to drop to userspace" - On the migration back to the 1st system we again work with all the features and with vhost as all the features are available. Thanks, Yuri On Thu, Mar 25, 2021 at 8:59 AM Jason Wang <jasowang@redhat.com> wrote: > > > 在 2021/3/22 下午8:24, Yuri Benditovich 写道: > > Allow fallback to userspace only upon migration, only for specific features > > and only if 'vhostforce' is not requested. > > > > Changes from v1: > > Patch 1 dropeed (will be submitted in another series) > > Added device callback in case the migration should fail due to missing features > > > Hi Yuri: > > Have a quick glance at the series. A questions is why we need to do the > fallback only during load? > > I think we should do it in the device initializating. E.g when the vhost > features can not satisfy, we should disable vhost since there. > > Thanks > > > > > > Yuri Benditovich (3): > > net: add ability to hide (disable) vhost_net > > virtio: introduce 'missing_features_migrated' device callback > > virtio-net: implement missing_features_migrated callback > > > > hw/net/vhost_net.c | 4 ++- > > hw/net/virtio-net.c | 51 ++++++++++++++++++++++++++++++++++++++ > > hw/virtio/virtio.c | 8 ++++++ > > include/hw/virtio/virtio.h | 8 ++++++ > > include/net/net.h | 1 + > > 5 files changed, 71 insertions(+), 1 deletion(-) > > >
在 2021/3/25 下午5:00, Yuri Benditovich 写道: > Hi Jason, > > This was discussed earlier on the previous series of patches. > https://lists.gnu.org/archive/html/qemu-devel/2021-02/msg01829.html > There were strong objections from both Daniel and Michael and I feel > that the series was rejected. > There was Michael's claim: > "We did what this patch is trying to change for years now, in > particular KVM also seems to happily disable CPU features not supported > by kernel so I wonder why we can't keep doing it, with tweaks for some > corner cases." So for cpu feautres, it works since the management have other tool to the cpuid. Then management will make sure the migration happens amongs the hosts that is compatibile with the same cpuid sets. For vhost, we don't have such capabilities, that's why I think we need to have fallback. > https://lists.gnu.org/archive/html/qemu-devel/2021-02/msg03187.html > And it was Michael's question: > "Can we limit the change to when a VM is migrated in?" > https://lists.gnu.org/archive/html/qemu-devel/2021-02/msg03163.html > So I'm trying to suggest another approach: > - In case of conflicting features (for example RSS and vhost) we in > qemu we do not have enough information to prefer one or another. > - If we drop to userspace in the first set_features we say: "vhost is > less important than other requested features" > - This series keeps backward compatibility, i.e. if you start with > vhost and some features are not available - they are silently cleared. > - But in case the features are available on source machine - they are used > - In case of migration this series says: "We prefer successful > migration even if for that we need to drop to userspace" > - On the migration back to the 1st system we again work with all the > features and with vhost as all the features are available. One issue for this approach is that. Consider we had two drivers: 1) Driver A that supports split only 2) Driver B that supports packed Consider src support packed but dest doesn't So switching driver A to driver B works without migration. But if we switch driver from A to B after migration it won't work? Thanks > > Thanks, > Yuri > > > > On Thu, Mar 25, 2021 at 8:59 AM Jason Wang <jasowang@redhat.com> wrote: >> >> 在 2021/3/22 下午8:24, Yuri Benditovich 写道: >>> Allow fallback to userspace only upon migration, only for specific features >>> and only if 'vhostforce' is not requested. >>> >>> Changes from v1: >>> Patch 1 dropeed (will be submitted in another series) >>> Added device callback in case the migration should fail due to missing features >> >> Hi Yuri: >> >> Have a quick glance at the series. A questions is why we need to do the >> fallback only during load? >> >> I think we should do it in the device initializating. E.g when the vhost >> features can not satisfy, we should disable vhost since there. >> >> Thanks >> >> >>> Yuri Benditovich (3): >>> net: add ability to hide (disable) vhost_net >>> virtio: introduce 'missing_features_migrated' device callback >>> virtio-net: implement missing_features_migrated callback >>> >>> hw/net/vhost_net.c | 4 ++- >>> hw/net/virtio-net.c | 51 ++++++++++++++++++++++++++++++++++++++ >>> hw/virtio/virtio.c | 8 ++++++ >>> include/hw/virtio/virtio.h | 8 ++++++ >>> include/net/net.h | 1 + >>> 5 files changed, 71 insertions(+), 1 deletion(-) >>>
On Fri, Mar 26, 2021 at 10:51 AM Jason Wang <jasowang@redhat.com> wrote: > > > 在 2021/3/25 下午5:00, Yuri Benditovich 写道: > > Hi Jason, > > > > This was discussed earlier on the previous series of patches. > > https://lists.gnu.org/archive/html/qemu-devel/2021-02/msg01829.html > > There were strong objections from both Daniel and Michael and I feel > > that the series was rejected. > > There was Michael's claim: > > "We did what this patch is trying to change for years now, in > > particular KVM also seems to happily disable CPU features not supported > > by kernel so I wonder why we can't keep doing it, with tweaks for some > > corner cases." > > > So for cpu feautres, it works since the management have other tool to > the cpuid. Then management will make sure the migration happens amongs > the hosts that is compatibile with the same cpuid sets. > > For vhost, we don't have such capabilities, that's why I think we need > to have fallback. > Hi Jason, What, from your POV was the result of v1 discussion? IMO, there was one critical comment that the patch does not address 'forcevhost' properly (indeed). IMO, there are many comments from Daniel and Michael that in the sum say that this change is not what they would like. If I'm mistaken please let me know. I have no problem to send v3 = v1 + handling of ''forcevhost' If this is what you want, please let me know also. > > > https://lists.gnu.org/archive/html/qemu-devel/2021-02/msg03187.html > > And it was Michael's question: > > "Can we limit the change to when a VM is migrated in?" > > https://lists.gnu.org/archive/html/qemu-devel/2021-02/msg03163.html > > So I'm trying to suggest another approach: > > - In case of conflicting features (for example RSS and vhost) we in > > qemu we do not have enough information to prefer one or another. > > - If we drop to userspace in the first set_features we say: "vhost is > > less important than other requested features" > > - This series keeps backward compatibility, i.e. if you start with > > vhost and some features are not available - they are silently cleared. > > - But in case the features are available on source machine - they are used > > - In case of migration this series says: "We prefer successful > > migration even if for that we need to drop to userspace" > > - On the migration back to the 1st system we again work with all the > > features and with vhost as all the features are available. > > > One issue for this approach is that. Consider we had two drivers: > > 1) Driver A that supports split only > 2) Driver B that supports packed > > Consider src support packed but dest doesn't > > So switching driver A to driver B works without migration. But if we > switch driver from A to B after migration it won't work? I assume that both src and dest started with vhost=on. As driver B supports both packed and split, you can switch from driver A to driver B after migration and driver B will work with split. Exactly as it does today. The key question is what is more important - vhost or features that vhost does not support? current code says: vhost is more important always v1 patch says: features are more important always. v2 patch says: vhost is more important at init time, features are more important at migration time. Because we are able to drop vhost but we can't drop features when we have a running driver. Do you agree? > > Thanks > > > > > > Thanks, > > Yuri > > > > > > > > On Thu, Mar 25, 2021 at 8:59 AM Jason Wang <jasowang@redhat.com> wrote: > >> > >> 在 2021/3/22 下午8:24, Yuri Benditovich 写道: > >>> Allow fallback to userspace only upon migration, only for specific features > >>> and only if 'vhostforce' is not requested. > >>> > >>> Changes from v1: > >>> Patch 1 dropeed (will be submitted in another series) > >>> Added device callback in case the migration should fail due to missing features > >> > >> Hi Yuri: > >> > >> Have a quick glance at the series. A questions is why we need to do the > >> fallback only during load? > >> > >> I think we should do it in the device initializating. E.g when the vhost > >> features can not satisfy, we should disable vhost since there. > >> > >> Thanks > >> > >> > >>> Yuri Benditovich (3): > >>> net: add ability to hide (disable) vhost_net > >>> virtio: introduce 'missing_features_migrated' device callback > >>> virtio-net: implement missing_features_migrated callback > >>> > >>> hw/net/vhost_net.c | 4 ++- > >>> hw/net/virtio-net.c | 51 ++++++++++++++++++++++++++++++++++++++ > >>> hw/virtio/virtio.c | 8 ++++++ > >>> include/hw/virtio/virtio.h | 8 ++++++ > >>> include/net/net.h | 1 + > >>> 5 files changed, 71 insertions(+), 1 deletion(-) > >>> >
在 2021/3/26 下午5:09, Yuri Benditovich 写道: > On Fri, Mar 26, 2021 at 10:51 AM Jason Wang <jasowang@redhat.com> wrote: >> >> 在 2021/3/25 下午5:00, Yuri Benditovich 写道: >>> Hi Jason, >>> >>> This was discussed earlier on the previous series of patches. >>> https://lists.gnu.org/archive/html/qemu-devel/2021-02/msg01829.html >>> There were strong objections from both Daniel and Michael and I feel >>> that the series was rejected. >>> There was Michael's claim: >>> "We did what this patch is trying to change for years now, in >>> particular KVM also seems to happily disable CPU features not supported >>> by kernel so I wonder why we can't keep doing it, with tweaks for some >>> corner cases." >> >> So for cpu feautres, it works since the management have other tool to >> the cpuid. Then management will make sure the migration happens amongs >> the hosts that is compatibile with the same cpuid sets. >> >> For vhost, we don't have such capabilities, that's why I think we need >> to have fallback. >> > Hi Jason, > What, from your POV was the result of v1 discussion? It looks to me we don't have an agreement on that, sorry. > IMO, there was one critical comment that the patch does not address > 'forcevhost' properly (indeed). > IMO, there are many comments from Daniel and Michael that in the sum > say that this change is not what they would like. > If I'm mistaken please let me know. I think I will open a new thread and summarize the different approaches and then we can come a conclusion. > > I have no problem to send v3 = v1 + handling of ''forcevhost' > If this is what you want, please let me know also. > >>> https://lists.gnu.org/archive/html/qemu-devel/2021-02/msg03187.html >>> And it was Michael's question: >>> "Can we limit the change to when a VM is migrated in?" >>> https://lists.gnu.org/archive/html/qemu-devel/2021-02/msg03163.html >>> So I'm trying to suggest another approach: >>> - In case of conflicting features (for example RSS and vhost) we in >>> qemu we do not have enough information to prefer one or another. >>> - If we drop to userspace in the first set_features we say: "vhost is >>> less important than other requested features" >>> - This series keeps backward compatibility, i.e. if you start with >>> vhost and some features are not available - they are silently cleared. >>> - But in case the features are available on source machine - they are used >>> - In case of migration this series says: "We prefer successful >>> migration even if for that we need to drop to userspace" >>> - On the migration back to the 1st system we again work with all the >>> features and with vhost as all the features are available. >> >> One issue for this approach is that. Consider we had two drivers: >> >> 1) Driver A that supports split only >> 2) Driver B that supports packed >> >> Consider src support packed but dest doesn't >> >> So switching driver A to driver B works without migration. But if we >> switch driver from A to B after migration it won't work? > I assume that both src and dest started with vhost=on. > > As driver B supports both packed and split, you can switch from driver > A to driver B after migration > and driver B will work with split. Exactly as it does today. > > The key question is what is more important - vhost or features that > vhost does not support? > current code says: vhost is more important always > v1 patch says: features are more important always. > v2 patch says: vhost is more important at init time, features are more > important at migration time. > Because we are able to drop vhost but we can't drop features when we > have a running driver. > Do you agree? I think what came from cli is the most important. So if I understand correclty: - vhost=on means "turn on vhost when possible" it implies that fallback is allowed (we had already had fallback codes) - vhostforce=on means "turn on vhost unconditonally" it implies that we can't do fallback So my understanding is that: - "vhost=on, packed=on", we can fallback to userspace but must keep packed virtqueue works - "vhost=on,vhostforce=on,packed=on", we can't fallback and must keep both vhost and packed virtqueue work, if we can't we need to fail Thanks > >> Thanks >> >> >>> Thanks, >>> Yuri >>> >>> >>> >>> On Thu, Mar 25, 2021 at 8:59 AM Jason Wang <jasowang@redhat.com> wrote: >>>> 在 2021/3/22 下午8:24, Yuri Benditovich 写道: >>>>> Allow fallback to userspace only upon migration, only for specific features >>>>> and only if 'vhostforce' is not requested. >>>>> >>>>> Changes from v1: >>>>> Patch 1 dropeed (will be submitted in another series) >>>>> Added device callback in case the migration should fail due to missing features >>>> Hi Yuri: >>>> >>>> Have a quick glance at the series. A questions is why we need to do the >>>> fallback only during load? >>>> >>>> I think we should do it in the device initializating. E.g when the vhost >>>> features can not satisfy, we should disable vhost since there. >>>> >>>> Thanks >>>> >>>> >>>>> Yuri Benditovich (3): >>>>> net: add ability to hide (disable) vhost_net >>>>> virtio: introduce 'missing_features_migrated' device callback >>>>> virtio-net: implement missing_features_migrated callback >>>>> >>>>> hw/net/vhost_net.c | 4 ++- >>>>> hw/net/virtio-net.c | 51 ++++++++++++++++++++++++++++++++++++++ >>>>> hw/virtio/virtio.c | 8 ++++++ >>>>> include/hw/virtio/virtio.h | 8 ++++++ >>>>> include/net/net.h | 1 + >>>>> 5 files changed, 71 insertions(+), 1 deletion(-) >>>>>
在 2021/3/26 下午5:18, Jason Wang 写道: >>> ? >> I assume that both src and dest started with vhost=on. >> >> As driver B supports both packed and split, you can switch from driver >> A to driver B after migration >> and driver B will work with split. Exactly as it does today. >> >> The key question is what is more important - vhost or features that >> vhost does not support? >> current code says: vhost is more important always >> v1 patch says: features are more important always. >> v2 patch says: vhost is more important at init time, features are more >> important at migration time. >> Because we are able to drop vhost but we can't drop features when we >> have a running driver. >> Do you agree? > > > I think what came from cli is the most important. So if I understand > correclty: > > - vhost=on means "turn on vhost when possible" it implies that > fallback is allowed (we had already had fallback codes) > - vhostforce=on means "turn on vhost unconditonally" it implies that > we can't do fallback > > So my understanding is that: > > - "vhost=on, packed=on", we can fallback to userspace but must keep > packed virtqueue works > - "vhost=on,vhostforce=on,packed=on", we can't fallback and must keep > both vhost and packed virtqueue work, if we can't we need to fail > > Thanks Daniel and Michael, am I right here? We need some inputs to move forward to fix the migration compatibility issue. Thanks