Message ID | 20231004125904.110781-2-hreitz@redhat.com (mailing list archive) |
---|---|
State | New, archived |
Headers | show |
Series | vhost-user: Back-end state migration | expand |
On Wed, Oct 04, 2023 at 02:58:57PM +0200, Hanna Czenczek wrote: > There is no clearly defined purpose for the virtio status byte in > vhost-user: For resetting, we already have RESET_DEVICE; and for virtio > feature negotiation, we have [GS]ET_FEATURES. With the REPLY_ACK > protocol extension, it is possible for SET_FEATURES to return errors > (SET_PROTOCOL_FEATURES may be called before SET_FEATURES). > > As for implementations, SET_STATUS is not widely implemented. dpdk does > implement it, but only uses it to signal feature negotiation failure. > While it does log reset requests (SET_STATUS 0) as such, it effectively > ignores them, in contrast to RESET_OWNER (which is deprecated, and today > means the same thing as RESET_DEVICE). > > While qemu superficially has support for [GS]ET_STATUS, it does not > forward the guest-set status byte, but instead just makes it up > internally, and actually completely ignores what the back-end returns, > only using it as the template for a subsequent SET_STATUS to add single > bits to it. Notably, after setting FEATURES_OK, it never reads it back > to see whether the flag is still set, which is the only way in which > dpdk uses the status byte. > > As-is, no front-end or back-end can rely on the other side handling this > field in a useful manner, and it also provides no practical use over > other mechanisms the vhost-user protocol has, which are more clearly > defined. Deprecate it. > > Suggested-by: Stefan Hajnoczi <stefanha@redhat.com> > Signed-off-by: Hanna Czenczek <hreitz@redhat.com> > --- > docs/interop/vhost-user.rst | 28 +++++++++++++++++++++------- > 1 file changed, 21 insertions(+), 7 deletions(-) Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
On 05.10.23 19:15, Michael S. Tsirkin wrote: > On Thu, Oct 05, 2023 at 01:08:52PM -0400, Stefan Hajnoczi wrote: >> On Wed, Oct 04, 2023 at 02:58:57PM +0200, Hanna Czenczek wrote: >>> There is no clearly defined purpose for the virtio status byte in >>> vhost-user: For resetting, we already have RESET_DEVICE; and for virtio >>> feature negotiation, we have [GS]ET_FEATURES. With the REPLY_ACK >>> protocol extension, it is possible for SET_FEATURES to return errors >>> (SET_PROTOCOL_FEATURES may be called before SET_FEATURES). >>> >>> As for implementations, SET_STATUS is not widely implemented. dpdk does >>> implement it, but only uses it to signal feature negotiation failure. >>> While it does log reset requests (SET_STATUS 0) as such, it effectively >>> ignores them, in contrast to RESET_OWNER (which is deprecated, and today >>> means the same thing as RESET_DEVICE). >>> >>> While qemu superficially has support for [GS]ET_STATUS, it does not >>> forward the guest-set status byte, but instead just makes it up >>> internally, and actually completely ignores what the back-end returns, >>> only using it as the template for a subsequent SET_STATUS to add single >>> bits to it. Notably, after setting FEATURES_OK, it never reads it back >>> to see whether the flag is still set, which is the only way in which >>> dpdk uses the status byte. >>> >>> As-is, no front-end or back-end can rely on the other side handling this >>> field in a useful manner, and it also provides no practical use over >>> other mechanisms the vhost-user protocol has, which are more clearly >>> defined. Deprecate it. >>> >>> Suggested-by: Stefan Hajnoczi <stefanha@redhat.com> >>> Signed-off-by: Hanna Czenczek <hreitz@redhat.com> >>> --- >>> docs/interop/vhost-user.rst | 28 +++++++++++++++++++++------- >>> 1 file changed, 21 insertions(+), 7 deletions(-) >> Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com> > > SET_STATUS is the only way to signal failure to acknowledge FEATURES_OK. > The fact current backends never check errors does not mean they never > will. So no, not applying this. Can this not be done with REPLY_ACK? I.e., with the following message order: 1. GET_FEATURES to find out whether VHOST_USER_F_PROTOCOL_FEATURES is present 2. GET_PROTOCOL_FEATURES to hopefully get VHOST_USER_PROTOCOL_F_REPLY_ACK 3. SET_PROTOCOL_FEATURES to set VHOST_USER_PROTOCOL_F_REPLY_ACK 4. SET_FEATURES with need_reply If not, the problem is that qemu has sent SET_STATUS 0 for a while when the vCPUs are stopped, which generally seems to request a device reset. If we don’t state at least that SET_STATUS 0 is to be ignored, back-ends that will implement SET_STATUS later may break with at least these qemu versions. But documenting that a particular use of the status byte is to be ignored would be really strange. Hanna
On Fri, Oct 06, 2023 at 09:48:14AM +0200, Hanna Czenczek wrote: > On 05.10.23 19:15, Michael S. Tsirkin wrote: > > On Thu, Oct 05, 2023 at 01:08:52PM -0400, Stefan Hajnoczi wrote: > > > On Wed, Oct 04, 2023 at 02:58:57PM +0200, Hanna Czenczek wrote: > > > > There is no clearly defined purpose for the virtio status byte in > > > > vhost-user: For resetting, we already have RESET_DEVICE; and for virtio > > > > feature negotiation, we have [GS]ET_FEATURES. With the REPLY_ACK > > > > protocol extension, it is possible for SET_FEATURES to return errors > > > > (SET_PROTOCOL_FEATURES may be called before SET_FEATURES). > > > > > > > > As for implementations, SET_STATUS is not widely implemented. dpdk does > > > > implement it, but only uses it to signal feature negotiation failure. > > > > While it does log reset requests (SET_STATUS 0) as such, it effectively > > > > ignores them, in contrast to RESET_OWNER (which is deprecated, and today > > > > means the same thing as RESET_DEVICE). > > > > > > > > While qemu superficially has support for [GS]ET_STATUS, it does not > > > > forward the guest-set status byte, but instead just makes it up > > > > internally, and actually completely ignores what the back-end returns, > > > > only using it as the template for a subsequent SET_STATUS to add single > > > > bits to it. Notably, after setting FEATURES_OK, it never reads it back > > > > to see whether the flag is still set, which is the only way in which > > > > dpdk uses the status byte. > > > > > > > > As-is, no front-end or back-end can rely on the other side handling this > > > > field in a useful manner, and it also provides no practical use over > > > > other mechanisms the vhost-user protocol has, which are more clearly > > > > defined. Deprecate it. > > > > > > > > Suggested-by: Stefan Hajnoczi <stefanha@redhat.com> > > > > Signed-off-by: Hanna Czenczek <hreitz@redhat.com> > > > > --- > > > > docs/interop/vhost-user.rst | 28 +++++++++++++++++++++------- > > > > 1 file changed, 21 insertions(+), 7 deletions(-) > > > Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com> > > > > SET_STATUS is the only way to signal failure to acknowledge FEATURES_OK. > > The fact current backends never check errors does not mean they never > > will. So no, not applying this. > > Can this not be done with REPLY_ACK? I.e., with the following message > order: > > 1. GET_FEATURES to find out whether VHOST_USER_F_PROTOCOL_FEATURES is > present > 2. GET_PROTOCOL_FEATURES to hopefully get VHOST_USER_PROTOCOL_F_REPLY_ACK > 3. SET_PROTOCOL_FEATURES to set VHOST_USER_PROTOCOL_F_REPLY_ACK > 4. SET_FEATURES with need_reply > > If not, the problem is that qemu has sent SET_STATUS 0 for a while when the > vCPUs are stopped, which generally seems to request a device reset. If we > don’t state at least that SET_STATUS 0 is to be ignored, back-ends that will > implement SET_STATUS later may break with at least these qemu versions. But > documenting that a particular use of the status byte is to be ignored would > be really strange. > > Hanna Hmm I guess. Though just following virtio spec seems cleaner to me... vhost-user reconfigures the state fully on start. I guess symmetry was the point. So I don't see why SET_STATUS 0 has to be ignored. SET_STATUS was introduced by: commit 923b8921d210763359e96246a58658ac0db6c645 Author: Yajun Wu <yajunw@nvidia.com> Date: Mon Oct 17 14:44:52 2022 +0800 vhost-user: Support vhost_dev_start CC the author.
On 06.10.23 10:45, Michael S. Tsirkin wrote: > On Fri, Oct 06, 2023 at 09:48:14AM +0200, Hanna Czenczek wrote: >> On 05.10.23 19:15, Michael S. Tsirkin wrote: >>> On Thu, Oct 05, 2023 at 01:08:52PM -0400, Stefan Hajnoczi wrote: >>>> On Wed, Oct 04, 2023 at 02:58:57PM +0200, Hanna Czenczek wrote: >>>>> There is no clearly defined purpose for the virtio status byte in >>>>> vhost-user: For resetting, we already have RESET_DEVICE; and for virtio >>>>> feature negotiation, we have [GS]ET_FEATURES. With the REPLY_ACK >>>>> protocol extension, it is possible for SET_FEATURES to return errors >>>>> (SET_PROTOCOL_FEATURES may be called before SET_FEATURES). >>>>> >>>>> As for implementations, SET_STATUS is not widely implemented. dpdk does >>>>> implement it, but only uses it to signal feature negotiation failure. >>>>> While it does log reset requests (SET_STATUS 0) as such, it effectively >>>>> ignores them, in contrast to RESET_OWNER (which is deprecated, and today >>>>> means the same thing as RESET_DEVICE). >>>>> >>>>> While qemu superficially has support for [GS]ET_STATUS, it does not >>>>> forward the guest-set status byte, but instead just makes it up >>>>> internally, and actually completely ignores what the back-end returns, >>>>> only using it as the template for a subsequent SET_STATUS to add single >>>>> bits to it. Notably, after setting FEATURES_OK, it never reads it back >>>>> to see whether the flag is still set, which is the only way in which >>>>> dpdk uses the status byte. >>>>> >>>>> As-is, no front-end or back-end can rely on the other side handling this >>>>> field in a useful manner, and it also provides no practical use over >>>>> other mechanisms the vhost-user protocol has, which are more clearly >>>>> defined. Deprecate it. >>>>> >>>>> Suggested-by: Stefan Hajnoczi <stefanha@redhat.com> >>>>> Signed-off-by: Hanna Czenczek <hreitz@redhat.com> >>>>> --- >>>>> docs/interop/vhost-user.rst | 28 +++++++++++++++++++++------- >>>>> 1 file changed, 21 insertions(+), 7 deletions(-) >>>> Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com> >>> SET_STATUS is the only way to signal failure to acknowledge FEATURES_OK. >>> The fact current backends never check errors does not mean they never >>> will. So no, not applying this. >> Can this not be done with REPLY_ACK? I.e., with the following message >> order: >> >> 1. GET_FEATURES to find out whether VHOST_USER_F_PROTOCOL_FEATURES is >> present >> 2. GET_PROTOCOL_FEATURES to hopefully get VHOST_USER_PROTOCOL_F_REPLY_ACK >> 3. SET_PROTOCOL_FEATURES to set VHOST_USER_PROTOCOL_F_REPLY_ACK >> 4. SET_FEATURES with need_reply >> >> If not, the problem is that qemu has sent SET_STATUS 0 for a while when the >> vCPUs are stopped, which generally seems to request a device reset. If we >> don’t state at least that SET_STATUS 0 is to be ignored, back-ends that will >> implement SET_STATUS later may break with at least these qemu versions. But >> documenting that a particular use of the status byte is to be ignored would >> be really strange. >> >> Hanna > Hmm I guess. Though just following virtio spec seems cleaner to me... > vhost-user reconfigures the state fully on start. Not the internal device state, though. virtiofsd has internal state, and other devices like vhost-gpu back-ends would probably, too. Stefan has recently sent a series (https://lists.nongnu.org/archive/html/qemu-devel/2023-10/msg00709.html) to put the reset (RESET_DEVICE) into virtio_reset() (when we really need a reset). I really don’t like our current approach with the status byte. Following the virtio specification to me would mean that the guest directly controls this byte, which it does not. qemu makes up values as it deems appropriate, and this includes sending a SET_STATUS 0 when the guest is just paused, i.e. when the guest really doesn’t want a device reset. That means that qemu does not treat this as a virtio device field (because that would mean exposing it to the guest driver), but instead treats it as part of the vhost(-user) protocol. It doesn’t feel right to me that we use a virtio-defined feature for communication on the vhost level, i.e. between front-end and back-end, and not between guest driver and device. I think all vhost-level protocol features should be fully defined in the vhost-user specification, which REPLY_ACK is. Now, we could hand full control of the status byte to the guest, and that would make me content. But I feel like that doesn’t really work, because qemu needs to intercept the status byte anyway (it needs to know when there is a reset, probably wants to know when the device is configured, etc.), so I don’t think having the status byte in vhost-user really gains us much when qemu could translate status byte changes to/from other vhost-user commands. Hanna > I guess symmetry was the > point. So I don't see why SET_STATUS 0 has to be ignored. > > > SET_STATUS was introduced by: > > commit 923b8921d210763359e96246a58658ac0db6c645 > Author: Yajun Wu <yajunw@nvidia.com> > Date: Mon Oct 17 14:44:52 2022 +0800 > > vhost-user: Support vhost_dev_start > > CC the author. >
On Fri, Oct 06, 2023 at 11:15:55AM +0200, Hanna Czenczek wrote: > On 06.10.23 10:45, Michael S. Tsirkin wrote: > > On Fri, Oct 06, 2023 at 09:48:14AM +0200, Hanna Czenczek wrote: > > > On 05.10.23 19:15, Michael S. Tsirkin wrote: > > > > On Thu, Oct 05, 2023 at 01:08:52PM -0400, Stefan Hajnoczi wrote: > > > > > On Wed, Oct 04, 2023 at 02:58:57PM +0200, Hanna Czenczek wrote: > > > > > > There is no clearly defined purpose for the virtio status byte in > > > > > > vhost-user: For resetting, we already have RESET_DEVICE; and for virtio > > > > > > feature negotiation, we have [GS]ET_FEATURES. With the REPLY_ACK > > > > > > protocol extension, it is possible for SET_FEATURES to return errors > > > > > > (SET_PROTOCOL_FEATURES may be called before SET_FEATURES). > > > > > > > > > > > > As for implementations, SET_STATUS is not widely implemented. dpdk does > > > > > > implement it, but only uses it to signal feature negotiation failure. > > > > > > While it does log reset requests (SET_STATUS 0) as such, it effectively > > > > > > ignores them, in contrast to RESET_OWNER (which is deprecated, and today > > > > > > means the same thing as RESET_DEVICE). > > > > > > > > > > > > While qemu superficially has support for [GS]ET_STATUS, it does not > > > > > > forward the guest-set status byte, but instead just makes it up > > > > > > internally, and actually completely ignores what the back-end returns, > > > > > > only using it as the template for a subsequent SET_STATUS to add single > > > > > > bits to it. Notably, after setting FEATURES_OK, it never reads it back > > > > > > to see whether the flag is still set, which is the only way in which > > > > > > dpdk uses the status byte. > > > > > > > > > > > > As-is, no front-end or back-end can rely on the other side handling this > > > > > > field in a useful manner, and it also provides no practical use over > > > > > > other mechanisms the vhost-user protocol has, which are more clearly > > > > > > defined. Deprecate it. > > > > > > > > > > > > Suggested-by: Stefan Hajnoczi <stefanha@redhat.com> > > > > > > Signed-off-by: Hanna Czenczek <hreitz@redhat.com> > > > > > > --- > > > > > > docs/interop/vhost-user.rst | 28 +++++++++++++++++++++------- > > > > > > 1 file changed, 21 insertions(+), 7 deletions(-) > > > > > Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com> > > > > SET_STATUS is the only way to signal failure to acknowledge FEATURES_OK. > > > > The fact current backends never check errors does not mean they never > > > > will. So no, not applying this. > > > Can this not be done with REPLY_ACK? I.e., with the following message > > > order: > > > > > > 1. GET_FEATURES to find out whether VHOST_USER_F_PROTOCOL_FEATURES is > > > present > > > 2. GET_PROTOCOL_FEATURES to hopefully get VHOST_USER_PROTOCOL_F_REPLY_ACK > > > 3. SET_PROTOCOL_FEATURES to set VHOST_USER_PROTOCOL_F_REPLY_ACK > > > 4. SET_FEATURES with need_reply > > > > > > If not, the problem is that qemu has sent SET_STATUS 0 for a while when the > > > vCPUs are stopped, which generally seems to request a device reset. If we > > > don’t state at least that SET_STATUS 0 is to be ignored, back-ends that will > > > implement SET_STATUS later may break with at least these qemu versions. But > > > documenting that a particular use of the status byte is to be ignored would > > > be really strange. > > > > > > Hanna > > Hmm I guess. Though just following virtio spec seems cleaner to me... > > vhost-user reconfigures the state fully on start. > > Not the internal device state, though. virtiofsd has internal state, and > other devices like vhost-gpu back-ends would probably, too. > > Stefan has recently sent a series > (https://lists.nongnu.org/archive/html/qemu-devel/2023-10/msg00709.html) to > put the reset (RESET_DEVICE) into virtio_reset() (when we really need a > reset). > > I really don’t like our current approach with the status byte. Following the > virtio specification to me would mean that the guest directly controls this > byte, which it does not. qemu makes up values as it deems appropriate, and > this includes sending a SET_STATUS 0 when the guest is just paused, i.e. > when the guest really doesn’t want a device reset. > > That means that qemu does not treat this as a virtio device field (because > that would mean exposing it to the guest driver), but instead treats it as > part of the vhost(-user) protocol. It doesn’t feel right to me that we use > a virtio-defined feature for communication on the vhost level, i.e. between > front-end and back-end, and not between guest driver and device. I think > all vhost-level protocol features should be fully defined in the vhost-user > specification, which REPLY_ACK is. Hmm that makes sense. Maybe we should have done what stefan's patch is doing. Do look at the original commit that introduced it to understand why it was added. > Now, we could hand full control of the status byte to the guest, and that > would make me content. But I feel like that doesn’t really work, because > qemu needs to intercept the status byte anyway (it needs to know when there > is a reset, probably wants to know when the device is configured, etc.), so > I don’t think having the status byte in vhost-user really gains us much when > qemu could translate status byte changes to/from other vhost-user commands. > > Hanna well it intercepts it but I think it could pass it on unchanged. > > I guess symmetry was the > > point. So I don't see why SET_STATUS 0 has to be ignored. > > > > > > SET_STATUS was introduced by: > > > > commit 923b8921d210763359e96246a58658ac0db6c645 > > Author: Yajun Wu <yajunw@nvidia.com> > > Date: Mon Oct 17 14:44:52 2022 +0800 > > > > vhost-user: Support vhost_dev_start > > > > CC the author. > >
On 06.10.23 11:26, Michael S. Tsirkin wrote: > On Fri, Oct 06, 2023 at 11:15:55AM +0200, Hanna Czenczek wrote: >> On 06.10.23 10:45, Michael S. Tsirkin wrote: >>> On Fri, Oct 06, 2023 at 09:48:14AM +0200, Hanna Czenczek wrote: >>>> On 05.10.23 19:15, Michael S. Tsirkin wrote: >>>>> On Thu, Oct 05, 2023 at 01:08:52PM -0400, Stefan Hajnoczi wrote: >>>>>> On Wed, Oct 04, 2023 at 02:58:57PM +0200, Hanna Czenczek wrote: >>>>>>> There is no clearly defined purpose for the virtio status byte in >>>>>>> vhost-user: For resetting, we already have RESET_DEVICE; and for virtio >>>>>>> feature negotiation, we have [GS]ET_FEATURES. With the REPLY_ACK >>>>>>> protocol extension, it is possible for SET_FEATURES to return errors >>>>>>> (SET_PROTOCOL_FEATURES may be called before SET_FEATURES). >>>>>>> >>>>>>> As for implementations, SET_STATUS is not widely implemented. dpdk does >>>>>>> implement it, but only uses it to signal feature negotiation failure. >>>>>>> While it does log reset requests (SET_STATUS 0) as such, it effectively >>>>>>> ignores them, in contrast to RESET_OWNER (which is deprecated, and today >>>>>>> means the same thing as RESET_DEVICE). >>>>>>> >>>>>>> While qemu superficially has support for [GS]ET_STATUS, it does not >>>>>>> forward the guest-set status byte, but instead just makes it up >>>>>>> internally, and actually completely ignores what the back-end returns, >>>>>>> only using it as the template for a subsequent SET_STATUS to add single >>>>>>> bits to it. Notably, after setting FEATURES_OK, it never reads it back >>>>>>> to see whether the flag is still set, which is the only way in which >>>>>>> dpdk uses the status byte. >>>>>>> >>>>>>> As-is, no front-end or back-end can rely on the other side handling this >>>>>>> field in a useful manner, and it also provides no practical use over >>>>>>> other mechanisms the vhost-user protocol has, which are more clearly >>>>>>> defined. Deprecate it. >>>>>>> >>>>>>> Suggested-by: Stefan Hajnoczi <stefanha@redhat.com> >>>>>>> Signed-off-by: Hanna Czenczek <hreitz@redhat.com> >>>>>>> --- >>>>>>> docs/interop/vhost-user.rst | 28 +++++++++++++++++++++------- >>>>>>> 1 file changed, 21 insertions(+), 7 deletions(-) >>>>>> Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com> >>>>> SET_STATUS is the only way to signal failure to acknowledge FEATURES_OK. >>>>> The fact current backends never check errors does not mean they never >>>>> will. So no, not applying this. >>>> Can this not be done with REPLY_ACK? I.e., with the following message >>>> order: >>>> >>>> 1. GET_FEATURES to find out whether VHOST_USER_F_PROTOCOL_FEATURES is >>>> present >>>> 2. GET_PROTOCOL_FEATURES to hopefully get VHOST_USER_PROTOCOL_F_REPLY_ACK >>>> 3. SET_PROTOCOL_FEATURES to set VHOST_USER_PROTOCOL_F_REPLY_ACK >>>> 4. SET_FEATURES with need_reply >>>> >>>> If not, the problem is that qemu has sent SET_STATUS 0 for a while when the >>>> vCPUs are stopped, which generally seems to request a device reset. If we >>>> don’t state at least that SET_STATUS 0 is to be ignored, back-ends that will >>>> implement SET_STATUS later may break with at least these qemu versions. But >>>> documenting that a particular use of the status byte is to be ignored would >>>> be really strange. >>>> >>>> Hanna >>> Hmm I guess. Though just following virtio spec seems cleaner to me... >>> vhost-user reconfigures the state fully on start. >> Not the internal device state, though. virtiofsd has internal state, and >> other devices like vhost-gpu back-ends would probably, too. >> >> Stefan has recently sent a series >> (https://lists.nongnu.org/archive/html/qemu-devel/2023-10/msg00709.html) to >> put the reset (RESET_DEVICE) into virtio_reset() (when we really need a >> reset). >> >> I really don’t like our current approach with the status byte. Following the >> virtio specification to me would mean that the guest directly controls this >> byte, which it does not. qemu makes up values as it deems appropriate, and >> this includes sending a SET_STATUS 0 when the guest is just paused, i.e. >> when the guest really doesn’t want a device reset. >> >> That means that qemu does not treat this as a virtio device field (because >> that would mean exposing it to the guest driver), but instead treats it as >> part of the vhost(-user) protocol. It doesn’t feel right to me that we use >> a virtio-defined feature for communication on the vhost level, i.e. between >> front-end and back-end, and not between guest driver and device. I think >> all vhost-level protocol features should be fully defined in the vhost-user >> specification, which REPLY_ACK is. > Hmm that makes sense. Maybe we should have done what stefan's patch > is doing. > > Do look at the original commit that introduced it to understand why > it was added. I don’t understand why this was added to the stop/cont code, though. If it is time consuming to make these changes, why are they done every time the VM is paused and resumed? It makes sense that this would be done for the initial configuration (where a reset also wouldn’t hurt), but here it seems wrong. (To be clear, a reset in the stop/cont code is wrong, because it breaks stateful devices.) Also, note the newer commits 6f8be29ec17 and c3716f260bf. The reset as originally introduced was wrong even for non-stateful devices, because it occurred before we fetched the state (vring indices) so we could restore it later. I don’t know how 923b8921d21 was tested, but if the back-end used for testing implemented SET_STATUS 0 as a reset, it could not have survived either migration or a stop/cont in general, because the vring indices would have been reset to 0. What I’m saying is, 923b8921d21 introduced SET_STATUS calls that broke all devices that would implement them as per virtio spec, and even today it’s broken for stateful devices. The mentioned performance issue is likely real, but we can’t address it by making up SET_STATUS calls that are wrong. I concede that I didn’t think about DRIVER_OK. Personally, I would do all final configuration that would happen upon a DRIVER_OK once the first vring is started (i.e. receives a kick). That has the added benefit of being asynchronous because it doesn’t block any vhost-user messages (which are synchronous, and thus block downtime). Hanna >> Now, we could hand full control of the status byte to the guest, and that >> would make me content. But I feel like that doesn’t really work, because >> qemu needs to intercept the status byte anyway (it needs to know when there >> is a reset, probably wants to know when the device is configured, etc.), so >> I don’t think having the status byte in vhost-user really gains us much when >> qemu could translate status byte changes to/from other vhost-user commands. >> >> Hanna > well it intercepts it but I think it could pass it on unchanged. > > >>> I guess symmetry was the >>> point. So I don't see why SET_STATUS 0 has to be ignored. >>> >>> >>> SET_STATUS was introduced by: >>> >>> commit 923b8921d210763359e96246a58658ac0db6c645 >>> Author: Yajun Wu <yajunw@nvidia.com> >>> Date: Mon Oct 17 14:44:52 2022 +0800 >>> >>> vhost-user: Support vhost_dev_start >>> >>> CC the author. >>>
On Fri, Oct 06, 2023 at 11:47:55AM +0200, Hanna Czenczek wrote: > On 06.10.23 11:26, Michael S. Tsirkin wrote: > > On Fri, Oct 06, 2023 at 11:15:55AM +0200, Hanna Czenczek wrote: > > > On 06.10.23 10:45, Michael S. Tsirkin wrote: > > > > On Fri, Oct 06, 2023 at 09:48:14AM +0200, Hanna Czenczek wrote: > > > > > On 05.10.23 19:15, Michael S. Tsirkin wrote: > > > > > > On Thu, Oct 05, 2023 at 01:08:52PM -0400, Stefan Hajnoczi wrote: > > > > > > > On Wed, Oct 04, 2023 at 02:58:57PM +0200, Hanna Czenczek wrote: > > > > > > > > There is no clearly defined purpose for the virtio status byte in > > > > > > > > vhost-user: For resetting, we already have RESET_DEVICE; and for virtio > > > > > > > > feature negotiation, we have [GS]ET_FEATURES. With the REPLY_ACK > > > > > > > > protocol extension, it is possible for SET_FEATURES to return errors > > > > > > > > (SET_PROTOCOL_FEATURES may be called before SET_FEATURES). > > > > > > > > > > > > > > > > As for implementations, SET_STATUS is not widely implemented. dpdk does > > > > > > > > implement it, but only uses it to signal feature negotiation failure. > > > > > > > > While it does log reset requests (SET_STATUS 0) as such, it effectively > > > > > > > > ignores them, in contrast to RESET_OWNER (which is deprecated, and today > > > > > > > > means the same thing as RESET_DEVICE). > > > > > > > > > > > > > > > > While qemu superficially has support for [GS]ET_STATUS, it does not > > > > > > > > forward the guest-set status byte, but instead just makes it up > > > > > > > > internally, and actually completely ignores what the back-end returns, > > > > > > > > only using it as the template for a subsequent SET_STATUS to add single > > > > > > > > bits to it. Notably, after setting FEATURES_OK, it never reads it back > > > > > > > > to see whether the flag is still set, which is the only way in which > > > > > > > > dpdk uses the status byte. > > > > > > > > > > > > > > > > As-is, no front-end or back-end can rely on the other side handling this > > > > > > > > field in a useful manner, and it also provides no practical use over > > > > > > > > other mechanisms the vhost-user protocol has, which are more clearly > > > > > > > > defined. Deprecate it. > > > > > > > > > > > > > > > > Suggested-by: Stefan Hajnoczi <stefanha@redhat.com> > > > > > > > > Signed-off-by: Hanna Czenczek <hreitz@redhat.com> > > > > > > > > --- > > > > > > > > docs/interop/vhost-user.rst | 28 +++++++++++++++++++++------- > > > > > > > > 1 file changed, 21 insertions(+), 7 deletions(-) > > > > > > > Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com> > > > > > > SET_STATUS is the only way to signal failure to acknowledge FEATURES_OK. > > > > > > The fact current backends never check errors does not mean they never > > > > > > will. So no, not applying this. > > > > > Can this not be done with REPLY_ACK? I.e., with the following message > > > > > order: > > > > > > > > > > 1. GET_FEATURES to find out whether VHOST_USER_F_PROTOCOL_FEATURES is > > > > > present > > > > > 2. GET_PROTOCOL_FEATURES to hopefully get VHOST_USER_PROTOCOL_F_REPLY_ACK > > > > > 3. SET_PROTOCOL_FEATURES to set VHOST_USER_PROTOCOL_F_REPLY_ACK > > > > > 4. SET_FEATURES with need_reply > > > > > > > > > > If not, the problem is that qemu has sent SET_STATUS 0 for a while when the > > > > > vCPUs are stopped, which generally seems to request a device reset. If we > > > > > don’t state at least that SET_STATUS 0 is to be ignored, back-ends that will > > > > > implement SET_STATUS later may break with at least these qemu versions. But > > > > > documenting that a particular use of the status byte is to be ignored would > > > > > be really strange. > > > > > > > > > > Hanna > > > > Hmm I guess. Though just following virtio spec seems cleaner to me... > > > > vhost-user reconfigures the state fully on start. > > > Not the internal device state, though. virtiofsd has internal state, and > > > other devices like vhost-gpu back-ends would probably, too. > > > > > > Stefan has recently sent a series > > > (https://lists.nongnu.org/archive/html/qemu-devel/2023-10/msg00709.html) to > > > put the reset (RESET_DEVICE) into virtio_reset() (when we really need a > > > reset). > > > > > > I really don’t like our current approach with the status byte. Following the > > > virtio specification to me would mean that the guest directly controls this > > > byte, which it does not. qemu makes up values as it deems appropriate, and > > > this includes sending a SET_STATUS 0 when the guest is just paused, i.e. > > > when the guest really doesn’t want a device reset. > > > > > > That means that qemu does not treat this as a virtio device field (because > > > that would mean exposing it to the guest driver), but instead treats it as > > > part of the vhost(-user) protocol. It doesn’t feel right to me that we use > > > a virtio-defined feature for communication on the vhost level, i.e. between > > > front-end and back-end, and not between guest driver and device. I think > > > all vhost-level protocol features should be fully defined in the vhost-user > > > specification, which REPLY_ACK is. > > Hmm that makes sense. Maybe we should have done what stefan's patch > > is doing. > > > > Do look at the original commit that introduced it to understand why > > it was added. > > I don’t understand why this was added to the stop/cont code, though. If it > is time consuming to make these changes, why are they done every time the VM > is paused > and resumed? It makes sense that this would be done for the initial > configuration (where a reset also wouldn’t hurt), but here it seems wrong. > > (To be clear, a reset in the stop/cont code is wrong, because it breaks > stateful devices.) > > Also, note the newer commits 6f8be29ec17 and c3716f260bf. The reset as > originally introduced was wrong even for non-stateful devices, because it > occurred before we fetched the state (vring indices) so we could restore it > later. I don’t know how 923b8921d21 was tested, but if the back-end used > for testing implemented SET_STATUS 0 as a reset, it could not have survived > either migration or a stop/cont in general, because the vring indices would > have been reset to 0. > > What I’m saying is, 923b8921d21 introduced SET_STATUS calls that broke all > devices that would implement them as per virtio spec, and even today it’s > broken for stateful devices. The mentioned performance issue is likely > real, but we can’t address it by making up SET_STATUS calls that are wrong. > > I concede that I didn’t think about DRIVER_OK. Personally, I would do all > final configuration that would happen upon a DRIVER_OK once the first vring > is started (i.e. receives a kick). That has the added benefit of being > asynchronous because it doesn’t block any vhost-user messages (which are > synchronous, and thus block downtime). > > Hanna For better or worse kick is per ring. It's out of spec to start rings that were not kicked but I guess you could do configuration ... Seems somewhat asymmetrical though. Let's wait until next week, hopefully Yajun Wu will answer. > > > Now, we could hand full control of the status byte to the guest, and that > > > would make me content. But I feel like that doesn’t really work, because > > > qemu needs to intercept the status byte anyway (it needs to know when there > > > is a reset, probably wants to know when the device is configured, etc.), so > > > I don’t think having the status byte in vhost-user really gains us much when > > > qemu could translate status byte changes to/from other vhost-user commands. > > > > > > Hanna > > well it intercepts it but I think it could pass it on unchanged. > > > > > > > > I guess symmetry was the > > > > point. So I don't see why SET_STATUS 0 has to be ignored. > > > > > > > > > > > > SET_STATUS was introduced by: > > > > > > > > commit 923b8921d210763359e96246a58658ac0db6c645 > > > > Author: Yajun Wu <yajunw@nvidia.com> > > > > Date: Mon Oct 17 14:44:52 2022 +0800 > > > > > > > > vhost-user: Support vhost_dev_start > > > > > > > > CC the author. > > > >
On 06.10.23 12:34, Michael S. Tsirkin wrote: > On Fri, Oct 06, 2023 at 11:47:55AM +0200, Hanna Czenczek wrote: >> On 06.10.23 11:26, Michael S. Tsirkin wrote: >>> On Fri, Oct 06, 2023 at 11:15:55AM +0200, Hanna Czenczek wrote: >>>> On 06.10.23 10:45, Michael S. Tsirkin wrote: >>>>> On Fri, Oct 06, 2023 at 09:48:14AM +0200, Hanna Czenczek wrote: >>>>>> On 05.10.23 19:15, Michael S. Tsirkin wrote: >>>>>>> On Thu, Oct 05, 2023 at 01:08:52PM -0400, Stefan Hajnoczi wrote: >>>>>>>> On Wed, Oct 04, 2023 at 02:58:57PM +0200, Hanna Czenczek wrote: >>>>>>>>> There is no clearly defined purpose for the virtio status byte in >>>>>>>>> vhost-user: For resetting, we already have RESET_DEVICE; and for virtio >>>>>>>>> feature negotiation, we have [GS]ET_FEATURES. With the REPLY_ACK >>>>>>>>> protocol extension, it is possible for SET_FEATURES to return errors >>>>>>>>> (SET_PROTOCOL_FEATURES may be called before SET_FEATURES). >>>>>>>>> >>>>>>>>> As for implementations, SET_STATUS is not widely implemented. dpdk does >>>>>>>>> implement it, but only uses it to signal feature negotiation failure. >>>>>>>>> While it does log reset requests (SET_STATUS 0) as such, it effectively >>>>>>>>> ignores them, in contrast to RESET_OWNER (which is deprecated, and today >>>>>>>>> means the same thing as RESET_DEVICE). >>>>>>>>> >>>>>>>>> While qemu superficially has support for [GS]ET_STATUS, it does not >>>>>>>>> forward the guest-set status byte, but instead just makes it up >>>>>>>>> internally, and actually completely ignores what the back-end returns, >>>>>>>>> only using it as the template for a subsequent SET_STATUS to add single >>>>>>>>> bits to it. Notably, after setting FEATURES_OK, it never reads it back >>>>>>>>> to see whether the flag is still set, which is the only way in which >>>>>>>>> dpdk uses the status byte. >>>>>>>>> >>>>>>>>> As-is, no front-end or back-end can rely on the other side handling this >>>>>>>>> field in a useful manner, and it also provides no practical use over >>>>>>>>> other mechanisms the vhost-user protocol has, which are more clearly >>>>>>>>> defined. Deprecate it. >>>>>>>>> >>>>>>>>> Suggested-by: Stefan Hajnoczi <stefanha@redhat.com> >>>>>>>>> Signed-off-by: Hanna Czenczek <hreitz@redhat.com> >>>>>>>>> --- >>>>>>>>> docs/interop/vhost-user.rst | 28 +++++++++++++++++++++------- >>>>>>>>> 1 file changed, 21 insertions(+), 7 deletions(-) >>>>>>>> Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com> >>>>>>> SET_STATUS is the only way to signal failure to acknowledge FEATURES_OK. >>>>>>> The fact current backends never check errors does not mean they never >>>>>>> will. So no, not applying this. >>>>>> Can this not be done with REPLY_ACK? I.e., with the following message >>>>>> order: >>>>>> >>>>>> 1. GET_FEATURES to find out whether VHOST_USER_F_PROTOCOL_FEATURES is >>>>>> present >>>>>> 2. GET_PROTOCOL_FEATURES to hopefully get VHOST_USER_PROTOCOL_F_REPLY_ACK >>>>>> 3. SET_PROTOCOL_FEATURES to set VHOST_USER_PROTOCOL_F_REPLY_ACK >>>>>> 4. SET_FEATURES with need_reply >>>>>> >>>>>> If not, the problem is that qemu has sent SET_STATUS 0 for a while when the >>>>>> vCPUs are stopped, which generally seems to request a device reset. If we >>>>>> don’t state at least that SET_STATUS 0 is to be ignored, back-ends that will >>>>>> implement SET_STATUS later may break with at least these qemu versions. But >>>>>> documenting that a particular use of the status byte is to be ignored would >>>>>> be really strange. >>>>>> >>>>>> Hanna >>>>> Hmm I guess. Though just following virtio spec seems cleaner to me... >>>>> vhost-user reconfigures the state fully on start. >>>> Not the internal device state, though. virtiofsd has internal state, and >>>> other devices like vhost-gpu back-ends would probably, too. >>>> >>>> Stefan has recently sent a series >>>> (https://lists.nongnu.org/archive/html/qemu-devel/2023-10/msg00709.html) to >>>> put the reset (RESET_DEVICE) into virtio_reset() (when we really need a >>>> reset). >>>> >>>> I really don’t like our current approach with the status byte. Following the >>>> virtio specification to me would mean that the guest directly controls this >>>> byte, which it does not. qemu makes up values as it deems appropriate, and >>>> this includes sending a SET_STATUS 0 when the guest is just paused, i.e. >>>> when the guest really doesn’t want a device reset. >>>> >>>> That means that qemu does not treat this as a virtio device field (because >>>> that would mean exposing it to the guest driver), but instead treats it as >>>> part of the vhost(-user) protocol. It doesn’t feel right to me that we use >>>> a virtio-defined feature for communication on the vhost level, i.e. between >>>> front-end and back-end, and not between guest driver and device. I think >>>> all vhost-level protocol features should be fully defined in the vhost-user >>>> specification, which REPLY_ACK is. >>> Hmm that makes sense. Maybe we should have done what stefan's patch >>> is doing. >>> >>> Do look at the original commit that introduced it to understand why >>> it was added. >> I don’t understand why this was added to the stop/cont code, though. If it >> is time consuming to make these changes, why are they done every time the VM >> is paused >> and resumed? It makes sense that this would be done for the initial >> configuration (where a reset also wouldn’t hurt), but here it seems wrong. >> >> (To be clear, a reset in the stop/cont code is wrong, because it breaks >> stateful devices.) >> >> Also, note the newer commits 6f8be29ec17 and c3716f260bf. The reset as >> originally introduced was wrong even for non-stateful devices, because it >> occurred before we fetched the state (vring indices) so we could restore it >> later. I don’t know how 923b8921d21 was tested, but if the back-end used >> for testing implemented SET_STATUS 0 as a reset, it could not have survived >> either migration or a stop/cont in general, because the vring indices would >> have been reset to 0. >> >> What I’m saying is, 923b8921d21 introduced SET_STATUS calls that broke all >> devices that would implement them as per virtio spec, and even today it’s >> broken for stateful devices. The mentioned performance issue is likely >> real, but we can’t address it by making up SET_STATUS calls that are wrong. >> >> I concede that I didn’t think about DRIVER_OK. Personally, I would do all >> final configuration that would happen upon a DRIVER_OK once the first vring >> is started (i.e. receives a kick). That has the added benefit of being >> asynchronous because it doesn’t block any vhost-user messages (which are >> synchronous, and thus block downtime). >> >> Hanna > > For better or worse kick is per ring. It's out of spec to start rings > that were not kicked but I guess you could do configuration ... > Seems somewhat asymmetrical though. I meant to take the first ring being started as the signal to do the global configuration, i.e. not do this once per vring, but once globally. > Let's wait until next week, hopefully Yajun Wu will answer. I mean, personally I don’t really care about the whole SET_STATUS thing. It’s clear that it’s broken for stateful devices. The fact that it took until 6f8be29ec17d to fix it for just any device that would implement it according to spec to me is a strong indication that nobody does implement it according to spec, and is currently only used to signal to some specific back-end that all rings have been set up and should be configured in a single block. (By the way, our SET_STATUS call that adds ACKNOWLEDGE | DRIVER | DRIVER_OK is also completely against the spec, and any well-behaving device should reject it. These flags must be set one after another, and specifically, features must be read and set after setting DRIVER, but before setting FEATURES_OK, and FEATURES_OK must be set before setting DRIVER_OK. Any well-behaving device should error out when DRIVER_OK is set without FEATURES_OK set, or when FEATURES_OK is set without ACKNOWLEDGE | DRIVER set.) I can just drop this patch from the migration series, because in my opinion it doesn’t affect it whatsoever (although I understood Stefan disagrees). But honestly, I think any vhost-user back-end developer is well-advised to completely ignore the status byte. Not ignoring it means relying on qemu’s implementation-defined behavior. Hanna
Hanna Czenczek <hreitz@redhat.com> writes: > On 06.10.23 12:34, Michael S. Tsirkin wrote: >> On Fri, Oct 06, 2023 at 11:47:55AM +0200, Hanna Czenczek wrote: >>> On 06.10.23 11:26, Michael S. Tsirkin wrote: >>>> On Fri, Oct 06, 2023 at 11:15:55AM +0200, Hanna Czenczek wrote: >>>>> On 06.10.23 10:45, Michael S. Tsirkin wrote: >>>>>> On Fri, Oct 06, 2023 at 09:48:14AM +0200, Hanna Czenczek wrote: >>>>>>> On 05.10.23 19:15, Michael S. Tsirkin wrote: >>>>>>>> On Thu, Oct 05, 2023 at 01:08:52PM -0400, Stefan Hajnoczi wrote: >>>>>>>>> On Wed, Oct 04, 2023 at 02:58:57PM +0200, Hanna Czenczek wrote: <snip> >>> What I’m saying is, 923b8921d21 introduced SET_STATUS calls that broke all >>> devices that would implement them as per virtio spec, and even today it’s >>> broken for stateful devices. The mentioned performance issue is likely >>> real, but we can’t address it by making up SET_STATUS calls that are wrong. >>> >>> I concede that I didn’t think about DRIVER_OK. Personally, I would do all >>> final configuration that would happen upon a DRIVER_OK once the first vring >>> is started (i.e. receives a kick). That has the added benefit of being >>> asynchronous because it doesn’t block any vhost-user messages (which are >>> synchronous, and thus block downtime). >>> >>> Hanna >> >> For better or worse kick is per ring. It's out of spec to start rings >> that were not kicked but I guess you could do configuration ... >> Seems somewhat asymmetrical though. > > I meant to take the first ring being started as the signal to do the > global configuration, i.e. not do this once per vring, but once > globally. > >> Let's wait until next week, hopefully Yajun Wu will answer. > > I mean, personally I don’t really care about the whole SET_STATUS > thing. It’s clear that it’s broken for stateful devices. The fact > that it took until 6f8be29ec17d to fix it for just any device that > would implement it according to spec to me is a strong indication that > nobody does implement it according to spec, and is currently only used > to signal to some specific back-end that all rings have been set up > and should be configured in a single block. I'm certainly using [GS]ET_STATUS for the proposed F_TRANSPORT extensions where everything is off-loaded to the vhost-user backend.
On 06.10.23 17:17, Alex Bennée wrote: > Hanna Czenczek <hreitz@redhat.com> writes: > >> On 06.10.23 12:34, Michael S. Tsirkin wrote: >>> On Fri, Oct 06, 2023 at 11:47:55AM +0200, Hanna Czenczek wrote: >>>> On 06.10.23 11:26, Michael S. Tsirkin wrote: >>>>> On Fri, Oct 06, 2023 at 11:15:55AM +0200, Hanna Czenczek wrote: >>>>>> On 06.10.23 10:45, Michael S. Tsirkin wrote: >>>>>>> On Fri, Oct 06, 2023 at 09:48:14AM +0200, Hanna Czenczek wrote: >>>>>>>> On 05.10.23 19:15, Michael S. Tsirkin wrote: >>>>>>>>> On Thu, Oct 05, 2023 at 01:08:52PM -0400, Stefan Hajnoczi wrote: >>>>>>>>>> On Wed, Oct 04, 2023 at 02:58:57PM +0200, Hanna Czenczek wrote: > <snip> >>>> What I’m saying is, 923b8921d21 introduced SET_STATUS calls that broke all >>>> devices that would implement them as per virtio spec, and even today it’s >>>> broken for stateful devices. The mentioned performance issue is likely >>>> real, but we can’t address it by making up SET_STATUS calls that are wrong. >>>> >>>> I concede that I didn’t think about DRIVER_OK. Personally, I would do all >>>> final configuration that would happen upon a DRIVER_OK once the first vring >>>> is started (i.e. receives a kick). That has the added benefit of being >>>> asynchronous because it doesn’t block any vhost-user messages (which are >>>> synchronous, and thus block downtime). >>>> >>>> Hanna >>> For better or worse kick is per ring. It's out of spec to start rings >>> that were not kicked but I guess you could do configuration ... >>> Seems somewhat asymmetrical though. >> I meant to take the first ring being started as the signal to do the >> global configuration, i.e. not do this once per vring, but once >> globally. >> >>> Let's wait until next week, hopefully Yajun Wu will answer. >> I mean, personally I don’t really care about the whole SET_STATUS >> thing. It’s clear that it’s broken for stateful devices. The fact >> that it took until 6f8be29ec17d to fix it for just any device that >> would implement it according to spec to me is a strong indication that >> nobody does implement it according to spec, and is currently only used >> to signal to some specific back-end that all rings have been set up >> and should be configured in a single block. > I'm certainly using [GS]ET_STATUS for the proposed F_TRANSPORT > extensions where everything is off-loaded to the vhost-user backend. How do these back-ends work with the fact that qemu uses SET_STATUS incorrectly when not offloading? Do you plan on fixing that? (I.e. that we send SET_STATUS 0 when the VM is paused, potentially resetting state that is not recoverable, and that we set DRIVER and DRIVER_OK simultaneously.) Hanna
Hanna Czenczek <hreitz@redhat.com> writes: > On 06.10.23 17:17, Alex Bennée wrote: >> Hanna Czenczek <hreitz@redhat.com> writes: >> >>> On 06.10.23 12:34, Michael S. Tsirkin wrote: >>>> On Fri, Oct 06, 2023 at 11:47:55AM +0200, Hanna Czenczek wrote: >>>>> On 06.10.23 11:26, Michael S. Tsirkin wrote: >>>>>> On Fri, Oct 06, 2023 at 11:15:55AM +0200, Hanna Czenczek wrote: >>>>>>> On 06.10.23 10:45, Michael S. Tsirkin wrote: >>>>>>>> On Fri, Oct 06, 2023 at 09:48:14AM +0200, Hanna Czenczek wrote: >>>>>>>>> On 05.10.23 19:15, Michael S. Tsirkin wrote: >>>>>>>>>> On Thu, Oct 05, 2023 at 01:08:52PM -0400, Stefan Hajnoczi wrote: >>>>>>>>>>> On Wed, Oct 04, 2023 at 02:58:57PM +0200, Hanna Czenczek wrote: >> <snip> >>>>> What I’m saying is, 923b8921d21 introduced SET_STATUS calls that broke all >>>>> devices that would implement them as per virtio spec, and even today it’s >>>>> broken for stateful devices. The mentioned performance issue is likely >>>>> real, but we can’t address it by making up SET_STATUS calls that are wrong. >>>>> >>>>> I concede that I didn’t think about DRIVER_OK. Personally, I would do all >>>>> final configuration that would happen upon a DRIVER_OK once the first vring >>>>> is started (i.e. receives a kick). That has the added benefit of being >>>>> asynchronous because it doesn’t block any vhost-user messages (which are >>>>> synchronous, and thus block downtime). >>>>> >>>>> Hanna >>>> For better or worse kick is per ring. It's out of spec to start rings >>>> that were not kicked but I guess you could do configuration ... >>>> Seems somewhat asymmetrical though. >>> I meant to take the first ring being started as the signal to do the >>> global configuration, i.e. not do this once per vring, but once >>> globally. >>> >>>> Let's wait until next week, hopefully Yajun Wu will answer. >>> I mean, personally I don’t really care about the whole SET_STATUS >>> thing. It’s clear that it’s broken for stateful devices. The fact >>> that it took until 6f8be29ec17d to fix it for just any device that >>> would implement it according to spec to me is a strong indication that >>> nobody does implement it according to spec, and is currently only used >>> to signal to some specific back-end that all rings have been set up >>> and should be configured in a single block. >> I'm certainly using [GS]ET_STATUS for the proposed F_TRANSPORT >> extensions where everything is off-loaded to the vhost-user backend. > > How do these back-ends work with the fact that qemu uses SET_STATUS > incorrectly when not offloading? Do you plan on fixing that? Mainly having a common base implementation which does it right and having very lightweight derivations for legacy stubs using it. The aim is to eliminate the need for QEMU stubs entirely by fully specifying the device from the vhost-user API. > (I.e. that we send SET_STATUS 0 when the VM is paused, potentially > resetting state that is not recoverable, and that we set DRIVER and > DRIVER_OK simultaneously.) This is QEMU simulating a SET_STATUS rather than the guest triggering it? > > Hanna
On 10/6/2023 6:34 PM, Michael S. Tsirkin wrote: > External email: Use caution opening links or attachments > > > On Fri, Oct 06, 2023 at 11:47:55AM +0200, Hanna Czenczek wrote: >> On 06.10.23 11:26, Michael S. Tsirkin wrote: >>> On Fri, Oct 06, 2023 at 11:15:55AM +0200, Hanna Czenczek wrote: >>>> On 06.10.23 10:45, Michael S. Tsirkin wrote: >>>>> On Fri, Oct 06, 2023 at 09:48:14AM +0200, Hanna Czenczek wrote: >>>>>> On 05.10.23 19:15, Michael S. Tsirkin wrote: >>>>>>> On Thu, Oct 05, 2023 at 01:08:52PM -0400, Stefan Hajnoczi wrote: >>>>>>>> On Wed, Oct 04, 2023 at 02:58:57PM +0200, Hanna Czenczek wrote: >>>>>>>>> There is no clearly defined purpose for the virtio status byte in >>>>>>>>> vhost-user: For resetting, we already have RESET_DEVICE; and for virtio >>>>>>>>> feature negotiation, we have [GS]ET_FEATURES. With the REPLY_ACK >>>>>>>>> protocol extension, it is possible for SET_FEATURES to return errors >>>>>>>>> (SET_PROTOCOL_FEATURES may be called before SET_FEATURES). >>>>>>>>> >>>>>>>>> As for implementations, SET_STATUS is not widely implemented. dpdk does >>>>>>>>> implement it, but only uses it to signal feature negotiation failure. >>>>>>>>> While it does log reset requests (SET_STATUS 0) as such, it effectively >>>>>>>>> ignores them, in contrast to RESET_OWNER (which is deprecated, and today >>>>>>>>> means the same thing as RESET_DEVICE). >>>>>>>>> >>>>>>>>> While qemu superficially has support for [GS]ET_STATUS, it does not >>>>>>>>> forward the guest-set status byte, but instead just makes it up >>>>>>>>> internally, and actually completely ignores what the back-end returns, >>>>>>>>> only using it as the template for a subsequent SET_STATUS to add single >>>>>>>>> bits to it. Notably, after setting FEATURES_OK, it never reads it back >>>>>>>>> to see whether the flag is still set, which is the only way in which >>>>>>>>> dpdk uses the status byte. >>>>>>>>> >>>>>>>>> As-is, no front-end or back-end can rely on the other side handling this >>>>>>>>> field in a useful manner, and it also provides no practical use over >>>>>>>>> other mechanisms the vhost-user protocol has, which are more clearly >>>>>>>>> defined. Deprecate it. >>>>>>>>> >>>>>>>>> Suggested-by: Stefan Hajnoczi <stefanha@redhat.com> >>>>>>>>> Signed-off-by: Hanna Czenczek <hreitz@redhat.com> >>>>>>>>> --- >>>>>>>>> docs/interop/vhost-user.rst | 28 +++++++++++++++++++++------- >>>>>>>>> 1 file changed, 21 insertions(+), 7 deletions(-) >>>>>>>> Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com> >>>>>>> SET_STATUS is the only way to signal failure to acknowledge FEATURES_OK. >>>>>>> The fact current backends never check errors does not mean they never >>>>>>> will. So no, not applying this. >>>>>> Can this not be done with REPLY_ACK? I.e., with the following message >>>>>> order: >>>>>> >>>>>> 1. GET_FEATURES to find out whether VHOST_USER_F_PROTOCOL_FEATURES is >>>>>> present >>>>>> 2. GET_PROTOCOL_FEATURES to hopefully get VHOST_USER_PROTOCOL_F_REPLY_ACK >>>>>> 3. SET_PROTOCOL_FEATURES to set VHOST_USER_PROTOCOL_F_REPLY_ACK >>>>>> 4. SET_FEATURES with need_reply >>>>>> >>>>>> If not, the problem is that qemu has sent SET_STATUS 0 for a while when the >>>>>> vCPUs are stopped, which generally seems to request a device reset. If we >>>>>> don’t state at least that SET_STATUS 0 is to be ignored, back-ends that will >>>>>> implement SET_STATUS later may break with at least these qemu versions. But >>>>>> documenting that a particular use of the status byte is to be ignored would >>>>>> be really strange. >>>>>> >>>>>> Hanna >>>>> Hmm I guess. Though just following virtio spec seems cleaner to me... >>>>> vhost-user reconfigures the state fully on start. >>>> Not the internal device state, though. virtiofsd has internal state, and >>>> other devices like vhost-gpu back-ends would probably, too. >>>> >>>> Stefan has recently sent a series >>>> (https://lists.nongnu.org/archive/html/qemu-devel/2023-10/msg00709.html) to >>>> put the reset (RESET_DEVICE) into virtio_reset() (when we really need a >>>> reset). >>>> >>>> I really don’t like our current approach with the status byte. Following the >>>> virtio specification to me would mean that the guest directly controls this >>>> byte, which it does not. qemu makes up values as it deems appropriate, and >>>> this includes sending a SET_STATUS 0 when the guest is just paused, i.e. >>>> when the guest really doesn’t want a device reset. >>>> >>>> That means that qemu does not treat this as a virtio device field (because >>>> that would mean exposing it to the guest driver), but instead treats it as >>>> part of the vhost(-user) protocol. It doesn’t feel right to me that we use >>>> a virtio-defined feature for communication on the vhost level, i.e. between >>>> front-end and back-end, and not between guest driver and device. I think >>>> all vhost-level protocol features should be fully defined in the vhost-user >>>> specification, which REPLY_ACK is. >>> Hmm that makes sense. Maybe we should have done what stefan's patch >>> is doing. >>> >>> Do look at the original commit that introduced it to understand why >>> it was added. >> I don’t understand why this was added to the stop/cont code, though. If it >> is time consuming to make these changes, why are they done every time the VM >> is paused >> and resumed? It makes sense that this would be done for the initial >> configuration (where a reset also wouldn’t hurt), but here it seems wrong. >> >> (To be clear, a reset in the stop/cont code is wrong, because it breaks >> stateful devices.) >> >> Also, note the newer commits 6f8be29ec17 and c3716f260bf. The reset as >> originally introduced was wrong even for non-stateful devices, because it >> occurred before we fetched the state (vring indices) so we could restore it >> later. I don’t know how 923b8921d21 was tested, but if the back-end used >> for testing implemented SET_STATUS 0 as a reset, it could not have survived >> either migration or a stop/cont in general, because the vring indices would >> have been reset to 0. >> >> What I’m saying is, 923b8921d21 introduced SET_STATUS calls that broke all >> devices that would implement them as per virtio spec, and even today it’s >> broken for stateful devices. The mentioned performance issue is likely >> real, but we can’t address it by making up SET_STATUS calls that are wrong. >> >> I concede that I didn’t think about DRIVER_OK. Personally, I would do all >> final configuration that would happen upon a DRIVER_OK once the first vring >> is started (i.e. receives a kick). That has the added benefit of being >> asynchronous because it doesn’t block any vhost-user messages (which are >> synchronous, and thus block downtime). >> >> Hanna > > For better or worse kick is per ring. It's out of spec to start rings > that were not kicked but I guess you could do configuration ... > Seems somewhat asymmetrical though. > > Let's wait until next week, hopefully Yajun Wu will answer. The main motivation of adding VHOST_USER_SET_STATUS is to let backend DPDK know when DRIVER_OK bit is valid. It's an indication of all VQ configuration has sent, otherwise DPDK has to rely on first queue pair is ready, then receiving/applying VQ configuration one by one. During live migration, configuring VQ one by one is very time consuming. For VIRTIO net vDPA, HW needs to know how many VQs are enabled to set RSS(Receive-Side Scaling). If you don’t want SET_STATUS message, backend can remove protocol feature bit VHOST_USER_PROTOCOL_F_STATUS. DPDK is ignoring SET_STATUS 0, but using GET_VRING_BASE to do device close/reset. I'm not involved in discussion about adding SET_STATUS in Vhost protocol. This feature is essential for vDPA(same as vhost-vdpa implements VHOST_VDPA_SET_STATUS). Thanks, Yajun > >>>> Now, we could hand full control of the status byte to the guest, and that >>>> would make me content. But I feel like that doesn’t really work, because >>>> qemu needs to intercept the status byte anyway (it needs to know when there >>>> is a reset, probably wants to know when the device is configured, etc.), so >>>> I don’t think having the status byte in vhost-user really gains us much when >>>> qemu could translate status byte changes to/from other vhost-user commands. >>>> >>>> Hanna >>> well it intercepts it but I think it could pass it on unchanged. >>> >>> >>>>> I guess symmetry was the >>>>> point. So I don't see why SET_STATUS 0 has to be ignored. >>>>> >>>>> >>>>> SET_STATUS was introduced by: >>>>> >>>>> commit 923b8921d210763359e96246a58658ac0db6c645 >>>>> Author: Yajun Wu <yajunw@nvidia.com> >>>>> Date: Mon Oct 17 14:44:52 2022 +0800 >>>>> >>>>> vhost-user: Support vhost_dev_start >>>>> >>>>> CC the author. >>>>>
On 06.10.23 22:49, Alex Bennée wrote: > Hanna Czenczek <hreitz@redhat.com> writes: > >> On 06.10.23 17:17, Alex Bennée wrote: >>> Hanna Czenczek <hreitz@redhat.com> writes: >>> >>>> On 06.10.23 12:34, Michael S. Tsirkin wrote: >>>>> On Fri, Oct 06, 2023 at 11:47:55AM +0200, Hanna Czenczek wrote: >>>>>> On 06.10.23 11:26, Michael S. Tsirkin wrote: >>>>>>> On Fri, Oct 06, 2023 at 11:15:55AM +0200, Hanna Czenczek wrote: >>>>>>>> On 06.10.23 10:45, Michael S. Tsirkin wrote: >>>>>>>>> On Fri, Oct 06, 2023 at 09:48:14AM +0200, Hanna Czenczek wrote: >>>>>>>>>> On 05.10.23 19:15, Michael S. Tsirkin wrote: >>>>>>>>>>> On Thu, Oct 05, 2023 at 01:08:52PM -0400, Stefan Hajnoczi wrote: >>>>>>>>>>>> On Wed, Oct 04, 2023 at 02:58:57PM +0200, Hanna Czenczek wrote: >>> <snip> >>>>>> What I’m saying is, 923b8921d21 introduced SET_STATUS calls that broke all >>>>>> devices that would implement them as per virtio spec, and even today it’s >>>>>> broken for stateful devices. The mentioned performance issue is likely >>>>>> real, but we can’t address it by making up SET_STATUS calls that are wrong. >>>>>> >>>>>> I concede that I didn’t think about DRIVER_OK. Personally, I would do all >>>>>> final configuration that would happen upon a DRIVER_OK once the first vring >>>>>> is started (i.e. receives a kick). That has the added benefit of being >>>>>> asynchronous because it doesn’t block any vhost-user messages (which are >>>>>> synchronous, and thus block downtime). >>>>>> >>>>>> Hanna >>>>> For better or worse kick is per ring. It's out of spec to start rings >>>>> that were not kicked but I guess you could do configuration ... >>>>> Seems somewhat asymmetrical though. >>>> I meant to take the first ring being started as the signal to do the >>>> global configuration, i.e. not do this once per vring, but once >>>> globally. >>>> >>>>> Let's wait until next week, hopefully Yajun Wu will answer. >>>> I mean, personally I don’t really care about the whole SET_STATUS >>>> thing. It’s clear that it’s broken for stateful devices. The fact >>>> that it took until 6f8be29ec17d to fix it for just any device that >>>> would implement it according to spec to me is a strong indication that >>>> nobody does implement it according to spec, and is currently only used >>>> to signal to some specific back-end that all rings have been set up >>>> and should be configured in a single block. >>> I'm certainly using [GS]ET_STATUS for the proposed F_TRANSPORT >>> extensions where everything is off-loaded to the vhost-user backend. >> How do these back-ends work with the fact that qemu uses SET_STATUS >> incorrectly when not offloading? Do you plan on fixing that? > Mainly having a common base implementation which does it right and > having very lightweight derivations for legacy stubs using it. The > aim is to eliminate the need for QEMU stubs entirely by fully specifying > the device from the vhost-user API. If the current SET_STATUS use is overhauled, too, that would be good. I wonder why you need the status byte, though. >> (I.e. that we send SET_STATUS 0 when the VM is paused, potentially >> resetting state that is not recoverable, and that we set DRIVER and >> DRIVER_OK simultaneously.) > This is QEMU simulating a SET_STATUS rather than the guest triggering > it? Yes, and the fact that we simulate it when the guest will not have triggered it, i.e. we reset the device (SET_STATUS 0) when the VM is paused. Effectively, qemu injects virtio commands that the guest has never requested, which generally feels like a bad idea, because qemu will need to get the device back to its previous state before the guest is resumed, which may or may not work. Specifically, it won’t work for devices that have internal state. Furthermore, we use SET_STATUS to set ACKNOWLEDGE | DRIVER | DRIVER_OK simultaneously, which is wrong. ACKNOWLEDGE | DRIVER may perhaps be set simultaneously, but then comes feature negotiation (setting and checking FEATURES_OK), and then DRIVER_OK. Finally, how the status byte is to be used is not noted in the vhost-user specification, which instead points to the virtio specification. I think if we keep SET_STATUS, it must be documented how it interacts with other vhost-user commands. For example, how the FEATURES_OK protocol described in the virtio specification interacts with GET_FEATURES/SET_FEATURES, or whether SET_STATUS 0 and RESET_DEVICE are equivalent. Currently, the only implementation of SET_STATUS I know (DPDK) ignores SET_STATUS 0, i.e. doesn’t do a reset. To me that indicates that the spec must be clear on what these status values mean with regards to the vhost-user protocol as a whole. So every software implementation with STATUS support that I know implements SET_STATUS wrongly right now, and that’s a problem, because it prevents implementations like virtiofsd from doing so correctly. Hanna
On 07.10.23 04:22, Yajun Wu wrote: > > On 10/6/2023 6:34 PM, Michael S. Tsirkin wrote: >> External email: Use caution opening links or attachments >> >> >> On Fri, Oct 06, 2023 at 11:47:55AM +0200, Hanna Czenczek wrote: >>> On 06.10.23 11:26, Michael S. Tsirkin wrote: >>>> On Fri, Oct 06, 2023 at 11:15:55AM +0200, Hanna Czenczek wrote: >>>>> On 06.10.23 10:45, Michael S. Tsirkin wrote: >>>>>> On Fri, Oct 06, 2023 at 09:48:14AM +0200, Hanna Czenczek wrote: >>>>>>> On 05.10.23 19:15, Michael S. Tsirkin wrote: >>>>>>>> On Thu, Oct 05, 2023 at 01:08:52PM -0400, Stefan Hajnoczi wrote: >>>>>>>>> On Wed, Oct 04, 2023 at 02:58:57PM +0200, Hanna Czenczek wrote: >>>>>>>>>> There is no clearly defined purpose for the virtio status >>>>>>>>>> byte in >>>>>>>>>> vhost-user: For resetting, we already have RESET_DEVICE; and >>>>>>>>>> for virtio >>>>>>>>>> feature negotiation, we have [GS]ET_FEATURES. With the REPLY_ACK >>>>>>>>>> protocol extension, it is possible for SET_FEATURES to return >>>>>>>>>> errors >>>>>>>>>> (SET_PROTOCOL_FEATURES may be called before SET_FEATURES). >>>>>>>>>> >>>>>>>>>> As for implementations, SET_STATUS is not widely >>>>>>>>>> implemented. dpdk does >>>>>>>>>> implement it, but only uses it to signal feature negotiation >>>>>>>>>> failure. >>>>>>>>>> While it does log reset requests (SET_STATUS 0) as such, it >>>>>>>>>> effectively >>>>>>>>>> ignores them, in contrast to RESET_OWNER (which is >>>>>>>>>> deprecated, and today >>>>>>>>>> means the same thing as RESET_DEVICE). >>>>>>>>>> >>>>>>>>>> While qemu superficially has support for [GS]ET_STATUS, it >>>>>>>>>> does not >>>>>>>>>> forward the guest-set status byte, but instead just makes it up >>>>>>>>>> internally, and actually completely ignores what the back-end >>>>>>>>>> returns, >>>>>>>>>> only using it as the template for a subsequent SET_STATUS to >>>>>>>>>> add single >>>>>>>>>> bits to it. Notably, after setting FEATURES_OK, it never >>>>>>>>>> reads it back >>>>>>>>>> to see whether the flag is still set, which is the only way >>>>>>>>>> in which >>>>>>>>>> dpdk uses the status byte. >>>>>>>>>> >>>>>>>>>> As-is, no front-end or back-end can rely on the other side >>>>>>>>>> handling this >>>>>>>>>> field in a useful manner, and it also provides no practical >>>>>>>>>> use over >>>>>>>>>> other mechanisms the vhost-user protocol has, which are more >>>>>>>>>> clearly >>>>>>>>>> defined. Deprecate it. >>>>>>>>>> >>>>>>>>>> Suggested-by: Stefan Hajnoczi <stefanha@redhat.com> >>>>>>>>>> Signed-off-by: Hanna Czenczek <hreitz@redhat.com> >>>>>>>>>> --- >>>>>>>>>> docs/interop/vhost-user.rst | 28 >>>>>>>>>> +++++++++++++++++++++------- >>>>>>>>>> 1 file changed, 21 insertions(+), 7 deletions(-) >>>>>>>>> Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com> >>>>>>>> SET_STATUS is the only way to signal failure to acknowledge >>>>>>>> FEATURES_OK. >>>>>>>> The fact current backends never check errors does not mean they >>>>>>>> never >>>>>>>> will. So no, not applying this. >>>>>>> Can this not be done with REPLY_ACK? I.e., with the following >>>>>>> message >>>>>>> order: >>>>>>> >>>>>>> 1. GET_FEATURES to find out whether >>>>>>> VHOST_USER_F_PROTOCOL_FEATURES is >>>>>>> present >>>>>>> 2. GET_PROTOCOL_FEATURES to hopefully get >>>>>>> VHOST_USER_PROTOCOL_F_REPLY_ACK >>>>>>> 3. SET_PROTOCOL_FEATURES to set VHOST_USER_PROTOCOL_F_REPLY_ACK >>>>>>> 4. SET_FEATURES with need_reply >>>>>>> >>>>>>> If not, the problem is that qemu has sent SET_STATUS 0 for a >>>>>>> while when the >>>>>>> vCPUs are stopped, which generally seems to request a device >>>>>>> reset. If we >>>>>>> don’t state at least that SET_STATUS 0 is to be ignored, >>>>>>> back-ends that will >>>>>>> implement SET_STATUS later may break with at least these qemu >>>>>>> versions. But >>>>>>> documenting that a particular use of the status byte is to be >>>>>>> ignored would >>>>>>> be really strange. >>>>>>> >>>>>>> Hanna >>>>>> Hmm I guess. Though just following virtio spec seems cleaner to >>>>>> me... >>>>>> vhost-user reconfigures the state fully on start. >>>>> Not the internal device state, though. virtiofsd has internal >>>>> state, and >>>>> other devices like vhost-gpu back-ends would probably, too. >>>>> >>>>> Stefan has recently sent a series >>>>> (https://lists.nongnu.org/archive/html/qemu-devel/2023-10/msg00709.html) >>>>> to >>>>> put the reset (RESET_DEVICE) into virtio_reset() (when we really >>>>> need a >>>>> reset). >>>>> >>>>> I really don’t like our current approach with the status byte. >>>>> Following the >>>>> virtio specification to me would mean that the guest directly >>>>> controls this >>>>> byte, which it does not. qemu makes up values as it deems >>>>> appropriate, and >>>>> this includes sending a SET_STATUS 0 when the guest is just >>>>> paused, i.e. >>>>> when the guest really doesn’t want a device reset. >>>>> >>>>> That means that qemu does not treat this as a virtio device field >>>>> (because >>>>> that would mean exposing it to the guest driver), but instead >>>>> treats it as >>>>> part of the vhost(-user) protocol. It doesn’t feel right to me >>>>> that we use >>>>> a virtio-defined feature for communication on the vhost level, >>>>> i.e. between >>>>> front-end and back-end, and not between guest driver and device. >>>>> I think >>>>> all vhost-level protocol features should be fully defined in the >>>>> vhost-user >>>>> specification, which REPLY_ACK is. >>>> Hmm that makes sense. Maybe we should have done what stefan's patch >>>> is doing. >>>> >>>> Do look at the original commit that introduced it to understand why >>>> it was added. >>> I don’t understand why this was added to the stop/cont code, >>> though. If it >>> is time consuming to make these changes, why are they done every >>> time the VM >>> is paused >>> and resumed? It makes sense that this would be done for the initial >>> configuration (where a reset also wouldn’t hurt), but here it seems >>> wrong. >>> >>> (To be clear, a reset in the stop/cont code is wrong, because it breaks >>> stateful devices.) >>> >>> Also, note the newer commits 6f8be29ec17 and c3716f260bf. The reset as >>> originally introduced was wrong even for non-stateful devices, >>> because it >>> occurred before we fetched the state (vring indices) so we could >>> restore it >>> later. I don’t know how 923b8921d21 was tested, but if the back-end >>> used >>> for testing implemented SET_STATUS 0 as a reset, it could not have >>> survived >>> either migration or a stop/cont in general, because the vring >>> indices would >>> have been reset to 0. >>> >>> What I’m saying is, 923b8921d21 introduced SET_STATUS calls that >>> broke all >>> devices that would implement them as per virtio spec, and even today >>> it’s >>> broken for stateful devices. The mentioned performance issue is likely >>> real, but we can’t address it by making up SET_STATUS calls that are >>> wrong. >>> >>> I concede that I didn’t think about DRIVER_OK. Personally, I would >>> do all >>> final configuration that would happen upon a DRIVER_OK once the >>> first vring >>> is started (i.e. receives a kick). That has the added benefit of being >>> asynchronous because it doesn’t block any vhost-user messages (which >>> are >>> synchronous, and thus block downtime). >>> >>> Hanna >> >> For better or worse kick is per ring. It's out of spec to start rings >> that were not kicked but I guess you could do configuration ... >> Seems somewhat asymmetrical though. >> >> Let's wait until next week, hopefully Yajun Wu will answer. > The main motivation of adding VHOST_USER_SET_STATUS is to let backend > DPDK know > when DRIVER_OK bit is valid. It's an indication of all VQ > configuration has sent, > otherwise DPDK has to rely on first queue pair is ready, then > receiving/applying > VQ configuration one by one. > > During live migration, configuring VQ one by one is very time consuming. One question I have here is why it wasn’t then introduced in the live migration code, but in the general VM stop/cont code instead. It does seem time-consuming to do this every time the VM is paused and resumed. > For VIRTIO > net vDPA, HW needs to know how many VQs are enabled to set > RSS(Receive-Side Scaling). > > If you don’t want SET_STATUS message, backend can remove protocol > feature bit > VHOST_USER_PROTOCOL_F_STATUS. The problem isn’t back-ends that don’t want the message, the problem is that qemu uses the message wrongly, which prevents well-behaving back-ends from implementing the message. > DPDK is ignoring SET_STATUS 0, but using GET_VRING_BASE to do device > close/reset. So the right thing to do for back-ends is to announce STATUS support and then not implement it correctly? GET_VRING_BASE should not reset the close or reset the device, by the way. It should stop that one vring, not more. We have a RESET_DEVICE command for resetting. > I'm not involved in discussion about adding SET_STATUS in Vhost > protocol. This feature > is essential for vDPA(same as vhost-vdpa implements > VHOST_VDPA_SET_STATUS). So from what I gather from your response is that there is only a single use for SET_STATUS, which is the DRIVER_OK bit. If so, documenting that all other bits are to be ignored by both back-end and front-end would be fine by me. I’m not fully serious about that suggestion, but I hear the strong implication that nothing but DRIVER_OK was of any concern, and this is really important to note when we talk about the status of the STATUS feature in vhost today. It seems to me now that it was not intended to be the virtio-level status byte, but just a DRIVER_OK signalling path from front-end to back-end. That makes it a vhost-level protocol feature to me. Hanna > > Thanks, > Yajun >> >>>>> Now, we could hand full control of the status byte to the guest, >>>>> and that >>>>> would make me content. But I feel like that doesn’t really work, >>>>> because >>>>> qemu needs to intercept the status byte anyway (it needs to know >>>>> when there >>>>> is a reset, probably wants to know when the device is configured, >>>>> etc.), so >>>>> I don’t think having the status byte in vhost-user really gains us >>>>> much when >>>>> qemu could translate status byte changes to/from other vhost-user >>>>> commands. >>>>> >>>>> Hanna >>>> well it intercepts it but I think it could pass it on unchanged. >>>> >>>> >>>>>> I guess symmetry was the >>>>>> point. So I don't see why SET_STATUS 0 has to be ignored. >>>>>> >>>>>> >>>>>> SET_STATUS was introduced by: >>>>>> >>>>>> commit 923b8921d210763359e96246a58658ac0db6c645 >>>>>> Author: Yajun Wu <yajunw@nvidia.com> >>>>>> Date: Mon Oct 17 14:44:52 2022 +0800 >>>>>> >>>>>> vhost-user: Support vhost_dev_start >>>>>> >>>>>> CC the author. >>>>>> >
On 09.10.23 10:21, Hanna Czenczek wrote: > On 07.10.23 04:22, Yajun Wu wrote: [...] >> The main motivation of adding VHOST_USER_SET_STATUS is to let backend >> DPDK know >> when DRIVER_OK bit is valid. It's an indication of all VQ >> configuration has sent, >> otherwise DPDK has to rely on first queue pair is ready, then >> receiving/applying >> VQ configuration one by one. >> >> During live migration, configuring VQ one by one is very time consuming. > > One question I have here is why it wasn’t then introduced in the live > migration code, but in the general VM stop/cont code instead. It does > seem time-consuming to do this every time the VM is paused and resumed. > >> For VIRTIO >> net vDPA, HW needs to know how many VQs are enabled to set >> RSS(Receive-Side Scaling). >> >> If you don’t want SET_STATUS message, backend can remove protocol >> feature bit >> VHOST_USER_PROTOCOL_F_STATUS. > > The problem isn’t back-ends that don’t want the message, the problem > is that qemu uses the message wrongly, which prevents well-behaving > back-ends from implementing the message. > >> DPDK is ignoring SET_STATUS 0, but using GET_VRING_BASE to do device >> close/reset. > > So the right thing to do for back-ends is to announce STATUS support > and then not implement it correctly? > > GET_VRING_BASE should not reset the close or reset the device, by the > way. It should stop that one vring, not more. We have a RESET_DEVICE > command for resetting. > >> I'm not involved in discussion about adding SET_STATUS in Vhost >> protocol. This feature >> is essential for vDPA(same as vhost-vdpa implements >> VHOST_VDPA_SET_STATUS). > > So from what I gather from your response is that there is only a > single use for SET_STATUS, which is the DRIVER_OK bit. If so, > documenting that all other bits are to be ignored by both back-end and > front-end would be fine by me. > > I’m not fully serious about that suggestion, but I hear the strong > implication that nothing but DRIVER_OK was of any concern, and this is > really important to note when we talk about the status of the STATUS > feature in vhost today. It seems to me now that it was not intended > to be the virtio-level status byte, but just a DRIVER_OK signalling > path from front-end to back-end. That makes it a vhost-level protocol > feature to me. On second thought, it just is a pure vhost-level protocol feature, and has nothing to do with the virtio status byte as-is. The only stated purpose is for the front-end to send DRIVER_OK after migration, but migration is transparent to the guest, so the guest would never change the status byte during migration. Therefore, if this feature is essential, we will never be able to have a status byte that is transparently shared between guest and back-end device, i.e. the virtio status byte. Cc-ing Alex on this mail, because to me, this seems like an important detail when he plans on using the byte in the future. If we need a virtio status byte, I can’t see how we could use the existing F_STATUS for it. Hanna
On 09.10.23 11:07, Hanna Czenczek wrote: > On 09.10.23 10:21, Hanna Czenczek wrote: >> On 07.10.23 04:22, Yajun Wu wrote: > > [...] > >>> The main motivation of adding VHOST_USER_SET_STATUS is to let >>> backend DPDK know >>> when DRIVER_OK bit is valid. It's an indication of all VQ >>> configuration has sent, >>> otherwise DPDK has to rely on first queue pair is ready, then >>> receiving/applying >>> VQ configuration one by one. >>> >>> During live migration, configuring VQ one by one is very time >>> consuming. >> >> One question I have here is why it wasn’t then introduced in the live >> migration code, but in the general VM stop/cont code instead. It does >> seem time-consuming to do this every time the VM is paused and resumed. >> >>> For VIRTIO >>> net vDPA, HW needs to know how many VQs are enabled to set >>> RSS(Receive-Side Scaling). >>> >>> If you don’t want SET_STATUS message, backend can remove protocol >>> feature bit >>> VHOST_USER_PROTOCOL_F_STATUS. >> >> The problem isn’t back-ends that don’t want the message, the problem >> is that qemu uses the message wrongly, which prevents well-behaving >> back-ends from implementing the message. >> >>> DPDK is ignoring SET_STATUS 0, but using GET_VRING_BASE to do device >>> close/reset. >> >> So the right thing to do for back-ends is to announce STATUS support >> and then not implement it correctly? >> >> GET_VRING_BASE should not reset the close or reset the device, by the >> way. It should stop that one vring, not more. We have a >> RESET_DEVICE command for resetting. >> >>> I'm not involved in discussion about adding SET_STATUS in Vhost >>> protocol. This feature >>> is essential for vDPA(same as vhost-vdpa implements >>> VHOST_VDPA_SET_STATUS). >> >> So from what I gather from your response is that there is only a >> single use for SET_STATUS, which is the DRIVER_OK bit. If so, >> documenting that all other bits are to be ignored by both back-end >> and front-end would be fine by me. >> >> I’m not fully serious about that suggestion, but I hear the strong >> implication that nothing but DRIVER_OK was of any concern, and this >> is really important to note when we talk about the status of the >> STATUS feature in vhost today. It seems to me now that it was not >> intended to be the virtio-level status byte, but just a DRIVER_OK >> signalling path from front-end to back-end. That makes it a >> vhost-level protocol feature to me. > > On second thought, it just is a pure vhost-level protocol feature, and > has nothing to do with the virtio status byte as-is. The only stated > purpose is for the front-end to send DRIVER_OK after migration, but > migration is transparent to the guest, so the guest would never change > the status byte during migration. Therefore, if this feature is > essential, we will never be able to have a status byte that is > transparently shared between guest and back-end device, i.e. the > virtio status byte. On third thought, scratch that. The guest wouldn’t set it, but naturally, after migration, the front-end will need to restore the status byte from the source, so the front-end will always need to set it, even if it were otherwise used controlled only by the guest and the back-end device. So technically, this doesn’t prevent such a use case. (In practice, it isn’t controlled by the guest right now, but that could be fixed.) > Cc-ing Alex on this mail, because to me, this seems like an important > detail when he plans on using the byte in the future. If we need a > virtio status byte, I can’t see how we could use the existing F_STATUS > for it. > > Hanna
On Sat, Oct 7, 2023 at 4:23 AM Yajun Wu <yajunw@nvidia.com> wrote: > > > On 10/6/2023 6:34 PM, Michael S. Tsirkin wrote: > > External email: Use caution opening links or attachments > > > > > > On Fri, Oct 06, 2023 at 11:47:55AM +0200, Hanna Czenczek wrote: > >> On 06.10.23 11:26, Michael S. Tsirkin wrote: > >>> On Fri, Oct 06, 2023 at 11:15:55AM +0200, Hanna Czenczek wrote: > >>>> On 06.10.23 10:45, Michael S. Tsirkin wrote: > >>>>> On Fri, Oct 06, 2023 at 09:48:14AM +0200, Hanna Czenczek wrote: > >>>>>> On 05.10.23 19:15, Michael S. Tsirkin wrote: > >>>>>>> On Thu, Oct 05, 2023 at 01:08:52PM -0400, Stefan Hajnoczi wrote: > >>>>>>>> On Wed, Oct 04, 2023 at 02:58:57PM +0200, Hanna Czenczek wrote: > >>>>>>>>> There is no clearly defined purpose for the virtio status byte in > >>>>>>>>> vhost-user: For resetting, we already have RESET_DEVICE; and for virtio > >>>>>>>>> feature negotiation, we have [GS]ET_FEATURES. With the REPLY_ACK > >>>>>>>>> protocol extension, it is possible for SET_FEATURES to return errors > >>>>>>>>> (SET_PROTOCOL_FEATURES may be called before SET_FEATURES). > >>>>>>>>> > >>>>>>>>> As for implementations, SET_STATUS is not widely implemented. dpdk does > >>>>>>>>> implement it, but only uses it to signal feature negotiation failure. > >>>>>>>>> While it does log reset requests (SET_STATUS 0) as such, it effectively > >>>>>>>>> ignores them, in contrast to RESET_OWNER (which is deprecated, and today > >>>>>>>>> means the same thing as RESET_DEVICE). > >>>>>>>>> > >>>>>>>>> While qemu superficially has support for [GS]ET_STATUS, it does not > >>>>>>>>> forward the guest-set status byte, but instead just makes it up > >>>>>>>>> internally, and actually completely ignores what the back-end returns, > >>>>>>>>> only using it as the template for a subsequent SET_STATUS to add single > >>>>>>>>> bits to it. Notably, after setting FEATURES_OK, it never reads it back > >>>>>>>>> to see whether the flag is still set, which is the only way in which > >>>>>>>>> dpdk uses the status byte. > >>>>>>>>> > >>>>>>>>> As-is, no front-end or back-end can rely on the other side handling this > >>>>>>>>> field in a useful manner, and it also provides no practical use over > >>>>>>>>> other mechanisms the vhost-user protocol has, which are more clearly > >>>>>>>>> defined. Deprecate it. > >>>>>>>>> > >>>>>>>>> Suggested-by: Stefan Hajnoczi <stefanha@redhat.com> > >>>>>>>>> Signed-off-by: Hanna Czenczek <hreitz@redhat.com> > >>>>>>>>> --- > >>>>>>>>> docs/interop/vhost-user.rst | 28 +++++++++++++++++++++------- > >>>>>>>>> 1 file changed, 21 insertions(+), 7 deletions(-) > >>>>>>>> Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com> > >>>>>>> SET_STATUS is the only way to signal failure to acknowledge FEATURES_OK. > >>>>>>> The fact current backends never check errors does not mean they never > >>>>>>> will. So no, not applying this. > >>>>>> Can this not be done with REPLY_ACK? I.e., with the following message > >>>>>> order: > >>>>>> > >>>>>> 1. GET_FEATURES to find out whether VHOST_USER_F_PROTOCOL_FEATURES is > >>>>>> present > >>>>>> 2. GET_PROTOCOL_FEATURES to hopefully get VHOST_USER_PROTOCOL_F_REPLY_ACK > >>>>>> 3. SET_PROTOCOL_FEATURES to set VHOST_USER_PROTOCOL_F_REPLY_ACK > >>>>>> 4. SET_FEATURES with need_reply > >>>>>> > >>>>>> If not, the problem is that qemu has sent SET_STATUS 0 for a while when the > >>>>>> vCPUs are stopped, which generally seems to request a device reset. If we > >>>>>> don’t state at least that SET_STATUS 0 is to be ignored, back-ends that will > >>>>>> implement SET_STATUS later may break with at least these qemu versions. But > >>>>>> documenting that a particular use of the status byte is to be ignored would > >>>>>> be really strange. > >>>>>> > >>>>>> Hanna > >>>>> Hmm I guess. Though just following virtio spec seems cleaner to me... > >>>>> vhost-user reconfigures the state fully on start. > >>>> Not the internal device state, though. virtiofsd has internal state, and > >>>> other devices like vhost-gpu back-ends would probably, too. > >>>> > >>>> Stefan has recently sent a series > >>>> (https://lists.nongnu.org/archive/html/qemu-devel/2023-10/msg00709.html) to > >>>> put the reset (RESET_DEVICE) into virtio_reset() (when we really need a > >>>> reset). > >>>> > >>>> I really don’t like our current approach with the status byte. Following the > >>>> virtio specification to me would mean that the guest directly controls this > >>>> byte, which it does not. qemu makes up values as it deems appropriate, and > >>>> this includes sending a SET_STATUS 0 when the guest is just paused, i.e. > >>>> when the guest really doesn’t want a device reset. > >>>> > >>>> That means that qemu does not treat this as a virtio device field (because > >>>> that would mean exposing it to the guest driver), but instead treats it as > >>>> part of the vhost(-user) protocol. It doesn’t feel right to me that we use > >>>> a virtio-defined feature for communication on the vhost level, i.e. between > >>>> front-end and back-end, and not between guest driver and device. I think > >>>> all vhost-level protocol features should be fully defined in the vhost-user > >>>> specification, which REPLY_ACK is. > >>> Hmm that makes sense. Maybe we should have done what stefan's patch > >>> is doing. > >>> > >>> Do look at the original commit that introduced it to understand why > >>> it was added. > >> I don’t understand why this was added to the stop/cont code, though. If it > >> is time consuming to make these changes, why are they done every time the VM > >> is paused > >> and resumed? It makes sense that this would be done for the initial > >> configuration (where a reset also wouldn’t hurt), but here it seems wrong. > >> > >> (To be clear, a reset in the stop/cont code is wrong, because it breaks > >> stateful devices.) > >> > >> Also, note the newer commits 6f8be29ec17 and c3716f260bf. The reset as > >> originally introduced was wrong even for non-stateful devices, because it > >> occurred before we fetched the state (vring indices) so we could restore it > >> later. I don’t know how 923b8921d21 was tested, but if the back-end used > >> for testing implemented SET_STATUS 0 as a reset, it could not have survived > >> either migration or a stop/cont in general, because the vring indices would > >> have been reset to 0. > >> > >> What I’m saying is, 923b8921d21 introduced SET_STATUS calls that broke all > >> devices that would implement them as per virtio spec, and even today it’s > >> broken for stateful devices. The mentioned performance issue is likely > >> real, but we can’t address it by making up SET_STATUS calls that are wrong. > >> > >> I concede that I didn’t think about DRIVER_OK. Personally, I would do all > >> final configuration that would happen upon a DRIVER_OK once the first vring > >> is started (i.e. receives a kick). That has the added benefit of being > >> asynchronous because it doesn’t block any vhost-user messages (which are > >> synchronous, and thus block downtime). > >> > >> Hanna > > > > For better or worse kick is per ring. It's out of spec to start rings > > that were not kicked but I guess you could do configuration ... > > Seems somewhat asymmetrical though. > > > > Let's wait until next week, hopefully Yajun Wu will answer. > The main motivation of adding VHOST_USER_SET_STATUS is to let backend > DPDK know > when DRIVER_OK bit is valid. It's an indication of all VQ configuration > has sent, > otherwise DPDK has to rely on first queue pair is ready, then > receiving/applying > VQ configuration one by one. > > During live migration, configuring VQ one by one is very time consuming. > For VIRTIO > net vDPA, HW needs to know how many VQs are enabled to set > RSS(Receive-Side Scaling). > > If you don’t want SET_STATUS message, backend can remove protocol > feature bit > VHOST_USER_PROTOCOL_F_STATUS. > DPDK is ignoring SET_STATUS 0, but using GET_VRING_BASE to do device > close/reset. This is incorrect, resetting the device on GET_VRING_BASE breaks the stop/cont. Since you don't want to reset the VQs on stop/cont. > > I'm not involved in discussion about adding SET_STATUS in Vhost > protocol. This feature > is essential for vDPA(same as vhost-vdpa implements VHOST_VDPA_SET_STATUS). > > Thanks, > Yajun > > > >>>> Now, we could hand full control of the status byte to the guest, and that > >>>> would make me content. But I feel like that doesn’t really work, because > >>>> qemu needs to intercept the status byte anyway (it needs to know when there > >>>> is a reset, probably wants to know when the device is configured, etc.), so > >>>> I don’t think having the status byte in vhost-user really gains us much when > >>>> qemu could translate status byte changes to/from other vhost-user commands. > >>>> > >>>> Hanna > >>> well it intercepts it but I think it could pass it on unchanged. > >>> > >>> > >>>>> I guess symmetry was the > >>>>> point. So I don't see why SET_STATUS 0 has to be ignored. > >>>>> > >>>>> > >>>>> SET_STATUS was introduced by: > >>>>> > >>>>> commit 923b8921d210763359e96246a58658ac0db6c645 > >>>>> Author: Yajun Wu <yajunw@nvidia.com> > >>>>> Date: Mon Oct 17 14:44:52 2022 +0800 > >>>>> > >>>>> vhost-user: Support vhost_dev_start > >>>>> > >>>>> CC the author. > >>>>> > > _______________________________________________ > Virtio-fs mailing list > Virtio-fs@redhat.com > https://listman.redhat.com/mailman/listinfo/virtio-fs
On 10/9/2023 6:28 PM, German Maglione wrote: > External email: Use caution opening links or attachments > > > On Sat, Oct 7, 2023 at 4:23 AM Yajun Wu <yajunw@nvidia.com> wrote: >> >> On 10/6/2023 6:34 PM, Michael S. Tsirkin wrote: >>> External email: Use caution opening links or attachments >>> >>> >>> On Fri, Oct 06, 2023 at 11:47:55AM +0200, Hanna Czenczek wrote: >>>> On 06.10.23 11:26, Michael S. Tsirkin wrote: >>>>> On Fri, Oct 06, 2023 at 11:15:55AM +0200, Hanna Czenczek wrote: >>>>>> On 06.10.23 10:45, Michael S. Tsirkin wrote: >>>>>>> On Fri, Oct 06, 2023 at 09:48:14AM +0200, Hanna Czenczek wrote: >>>>>>>> On 05.10.23 19:15, Michael S. Tsirkin wrote: >>>>>>>>> On Thu, Oct 05, 2023 at 01:08:52PM -0400, Stefan Hajnoczi wrote: >>>>>>>>>> On Wed, Oct 04, 2023 at 02:58:57PM +0200, Hanna Czenczek wrote: >>>>>>>>>>> There is no clearly defined purpose for the virtio status byte in >>>>>>>>>>> vhost-user: For resetting, we already have RESET_DEVICE; and for virtio >>>>>>>>>>> feature negotiation, we have [GS]ET_FEATURES. With the REPLY_ACK >>>>>>>>>>> protocol extension, it is possible for SET_FEATURES to return errors >>>>>>>>>>> (SET_PROTOCOL_FEATURES may be called before SET_FEATURES). >>>>>>>>>>> >>>>>>>>>>> As for implementations, SET_STATUS is not widely implemented. dpdk does >>>>>>>>>>> implement it, but only uses it to signal feature negotiation failure. >>>>>>>>>>> While it does log reset requests (SET_STATUS 0) as such, it effectively >>>>>>>>>>> ignores them, in contrast to RESET_OWNER (which is deprecated, and today >>>>>>>>>>> means the same thing as RESET_DEVICE). >>>>>>>>>>> >>>>>>>>>>> While qemu superficially has support for [GS]ET_STATUS, it does not >>>>>>>>>>> forward the guest-set status byte, but instead just makes it up >>>>>>>>>>> internally, and actually completely ignores what the back-end returns, >>>>>>>>>>> only using it as the template for a subsequent SET_STATUS to add single >>>>>>>>>>> bits to it. Notably, after setting FEATURES_OK, it never reads it back >>>>>>>>>>> to see whether the flag is still set, which is the only way in which >>>>>>>>>>> dpdk uses the status byte. >>>>>>>>>>> >>>>>>>>>>> As-is, no front-end or back-end can rely on the other side handling this >>>>>>>>>>> field in a useful manner, and it also provides no practical use over >>>>>>>>>>> other mechanisms the vhost-user protocol has, which are more clearly >>>>>>>>>>> defined. Deprecate it. >>>>>>>>>>> >>>>>>>>>>> Suggested-by: Stefan Hajnoczi <stefanha@redhat.com> >>>>>>>>>>> Signed-off-by: Hanna Czenczek <hreitz@redhat.com> >>>>>>>>>>> --- >>>>>>>>>>> docs/interop/vhost-user.rst | 28 +++++++++++++++++++++------- >>>>>>>>>>> 1 file changed, 21 insertions(+), 7 deletions(-) >>>>>>>>>> Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com> >>>>>>>>> SET_STATUS is the only way to signal failure to acknowledge FEATURES_OK. >>>>>>>>> The fact current backends never check errors does not mean they never >>>>>>>>> will. So no, not applying this. >>>>>>>> Can this not be done with REPLY_ACK? I.e., with the following message >>>>>>>> order: >>>>>>>> >>>>>>>> 1. GET_FEATURES to find out whether VHOST_USER_F_PROTOCOL_FEATURES is >>>>>>>> present >>>>>>>> 2. GET_PROTOCOL_FEATURES to hopefully get VHOST_USER_PROTOCOL_F_REPLY_ACK >>>>>>>> 3. SET_PROTOCOL_FEATURES to set VHOST_USER_PROTOCOL_F_REPLY_ACK >>>>>>>> 4. SET_FEATURES with need_reply >>>>>>>> >>>>>>>> If not, the problem is that qemu has sent SET_STATUS 0 for a while when the >>>>>>>> vCPUs are stopped, which generally seems to request a device reset. If we >>>>>>>> don’t state at least that SET_STATUS 0 is to be ignored, back-ends that will >>>>>>>> implement SET_STATUS later may break with at least these qemu versions. But >>>>>>>> documenting that a particular use of the status byte is to be ignored would >>>>>>>> be really strange. >>>>>>>> >>>>>>>> Hanna >>>>>>> Hmm I guess. Though just following virtio spec seems cleaner to me... >>>>>>> vhost-user reconfigures the state fully on start. >>>>>> Not the internal device state, though. virtiofsd has internal state, and >>>>>> other devices like vhost-gpu back-ends would probably, too. >>>>>> >>>>>> Stefan has recently sent a series >>>>>> (https://lists.nongnu.org/archive/html/qemu-devel/2023-10/msg00709.html) to >>>>>> put the reset (RESET_DEVICE) into virtio_reset() (when we really need a >>>>>> reset). >>>>>> >>>>>> I really don’t like our current approach with the status byte. Following the >>>>>> virtio specification to me would mean that the guest directly controls this >>>>>> byte, which it does not. qemu makes up values as it deems appropriate, and >>>>>> this includes sending a SET_STATUS 0 when the guest is just paused, i.e. >>>>>> when the guest really doesn’t want a device reset. >>>>>> >>>>>> That means that qemu does not treat this as a virtio device field (because >>>>>> that would mean exposing it to the guest driver), but instead treats it as >>>>>> part of the vhost(-user) protocol. It doesn’t feel right to me that we use >>>>>> a virtio-defined feature for communication on the vhost level, i.e. between >>>>>> front-end and back-end, and not between guest driver and device. I think >>>>>> all vhost-level protocol features should be fully defined in the vhost-user >>>>>> specification, which REPLY_ACK is. >>>>> Hmm that makes sense. Maybe we should have done what stefan's patch >>>>> is doing. >>>>> >>>>> Do look at the original commit that introduced it to understand why >>>>> it was added. >>>> I don’t understand why this was added to the stop/cont code, though. If it >>>> is time consuming to make these changes, why are they done every time the VM >>>> is paused >>>> and resumed? It makes sense that this would be done for the initial >>>> configuration (where a reset also wouldn’t hurt), but here it seems wrong. >>>> >>>> (To be clear, a reset in the stop/cont code is wrong, because it breaks >>>> stateful devices.) >>>> >>>> Also, note the newer commits 6f8be29ec17 and c3716f260bf. The reset as >>>> originally introduced was wrong even for non-stateful devices, because it >>>> occurred before we fetched the state (vring indices) so we could restore it >>>> later. I don’t know how 923b8921d21 was tested, but if the back-end used >>>> for testing implemented SET_STATUS 0 as a reset, it could not have survived >>>> either migration or a stop/cont in general, because the vring indices would >>>> have been reset to 0. >>>> >>>> What I’m saying is, 923b8921d21 introduced SET_STATUS calls that broke all >>>> devices that would implement them as per virtio spec, and even today it’s >>>> broken for stateful devices. The mentioned performance issue is likely >>>> real, but we can’t address it by making up SET_STATUS calls that are wrong. >>>> >>>> I concede that I didn’t think about DRIVER_OK. Personally, I would do all >>>> final configuration that would happen upon a DRIVER_OK once the first vring >>>> is started (i.e. receives a kick). That has the added benefit of being >>>> asynchronous because it doesn’t block any vhost-user messages (which are >>>> synchronous, and thus block downtime). >>>> >>>> Hanna >>> For better or worse kick is per ring. It's out of spec to start rings >>> that were not kicked but I guess you could do configuration ... >>> Seems somewhat asymmetrical though. >>> >>> Let's wait until next week, hopefully Yajun Wu will answer. >> The main motivation of adding VHOST_USER_SET_STATUS is to let backend >> DPDK know >> when DRIVER_OK bit is valid. It's an indication of all VQ configuration >> has sent, >> otherwise DPDK has to rely on first queue pair is ready, then >> receiving/applying >> VQ configuration one by one. >> >> During live migration, configuring VQ one by one is very time consuming. >> For VIRTIO >> net vDPA, HW needs to know how many VQs are enabled to set >> RSS(Receive-Side Scaling). >> >> If you don’t want SET_STATUS message, backend can remove protocol >> feature bit >> VHOST_USER_PROTOCOL_F_STATUS. >> DPDK is ignoring SET_STATUS 0, but using GET_VRING_BASE to do device >> close/reset. > This is incorrect, resetting the device on GET_VRING_BASE breaks > the stop/cont. Since you don't want to reset the VQs on stop/cont. Sorry for the misunderstanding, dpdk vhost backend framework doesn't have RESET concept(only device level .dev_conf and .dev_close). On receiving DRIVER_OK does dev_conf, on receiving GET_VRING_BASE does dev_close. For every VM suspend/resume, dpdk issues dev_close then dev_conf. > >> I'm not involved in discussion about adding SET_STATUS in Vhost >> protocol. This feature >> is essential for vDPA(same as vhost-vdpa implements VHOST_VDPA_SET_STATUS). >> >> Thanks, >> Yajun >>>>>> Now, we could hand full control of the status byte to the guest, and that >>>>>> would make me content. But I feel like that doesn’t really work, because >>>>>> qemu needs to intercept the status byte anyway (it needs to know when there >>>>>> is a reset, probably wants to know when the device is configured, etc.), so >>>>>> I don’t think having the status byte in vhost-user really gains us much when >>>>>> qemu could translate status byte changes to/from other vhost-user commands. >>>>>> >>>>>> Hanna >>>>> well it intercepts it but I think it could pass it on unchanged. >>>>> >>>>> >>>>>>> I guess symmetry was the >>>>>>> point. So I don't see why SET_STATUS 0 has to be ignored. >>>>>>> >>>>>>> >>>>>>> SET_STATUS was introduced by: >>>>>>> >>>>>>> commit 923b8921d210763359e96246a58658ac0db6c645 >>>>>>> Author: Yajun Wu <yajunw@nvidia.com> >>>>>>> Date: Mon Oct 17 14:44:52 2022 +0800 >>>>>>> >>>>>>> vhost-user: Support vhost_dev_start >>>>>>> >>>>>>> CC the author. >>>>>>> >> _______________________________________________ >> Virtio-fs mailing list >> Virtio-fs@redhat.com >> https://listman.redhat.com/mailman/listinfo/virtio-fs > > > -- > German >
On 10/9/2023 5:13 PM, Hanna Czenczek wrote: > External email: Use caution opening links or attachments > > > On 09.10.23 11:07, Hanna Czenczek wrote: >> On 09.10.23 10:21, Hanna Czenczek wrote: >>> On 07.10.23 04:22, Yajun Wu wrote: >> [...] >> >>>> The main motivation of adding VHOST_USER_SET_STATUS is to let >>>> backend DPDK know >>>> when DRIVER_OK bit is valid. It's an indication of all VQ >>>> configuration has sent, >>>> otherwise DPDK has to rely on first queue pair is ready, then >>>> receiving/applying >>>> VQ configuration one by one. >>>> >>>> During live migration, configuring VQ one by one is very time >>>> consuming. >>> One question I have here is why it wasn’t then introduced in the live >>> migration code, but in the general VM stop/cont code instead. It does >>> seem time-consuming to do this every time the VM is paused and resumed. Yes, VM stop/cont will call vhost_net_stop/vhost_net_start. Maybe because there's no device level stop/cont vhost message? >>> >>>> For VIRTIO >>>> net vDPA, HW needs to know how many VQs are enabled to set >>>> RSS(Receive-Side Scaling). >>>> >>>> If you don’t want SET_STATUS message, backend can remove protocol >>>> feature bit >>>> VHOST_USER_PROTOCOL_F_STATUS. >>> The problem isn’t back-ends that don’t want the message, the problem >>> is that qemu uses the message wrongly, which prevents well-behaving >>> back-ends from implementing the message. >>> >>>> DPDK is ignoring SET_STATUS 0, but using GET_VRING_BASE to do device >>>> close/reset. >>> So the right thing to do for back-ends is to announce STATUS support >>> and then not implement it correctly? >>> >>> GET_VRING_BASE should not reset the close or reset the device, by the >>> way. It should stop that one vring, not more. We have a >>> RESET_DEVICE command for resetting. I believe dpdk uses GET_VRING_BASE long before qemu has RESET_DEVICE? It's a compatible issue. For new backend implements, we can have better solution, right? >>>> I'm not involved in discussion about adding SET_STATUS in Vhost >>>> protocol. This feature >>>> is essential for vDPA(same as vhost-vdpa implements >>>> VHOST_VDPA_SET_STATUS). >>> So from what I gather from your response is that there is only a >>> single use for SET_STATUS, which is the DRIVER_OK bit. If so, >>> documenting that all other bits are to be ignored by both back-end >>> and front-end would be fine by me. >>> >>> I’m not fully serious about that suggestion, but I hear the strong >>> implication that nothing but DRIVER_OK was of any concern, and this >>> is really important to note when we talk about the status of the >>> STATUS feature in vhost today. It seems to me now that it was not >>> intended to be the virtio-level status byte, but just a DRIVER_OK >>> signalling path from front-end to back-end. That makes it a >>> vhost-level protocol feature to me. >> On second thought, it just is a pure vhost-level protocol feature, and >> has nothing to do with the virtio status byte as-is. The only stated >> purpose is for the front-end to send DRIVER_OK after migration, but >> migration is transparent to the guest, so the guest would never change >> the status byte during migration. Therefore, if this feature is >> essential, we will never be able to have a status byte that is >> transparently shared between guest and back-end device, i.e. the >> virtio status byte. > On third thought, scratch that. The guest wouldn’t set it, but > naturally, after migration, the front-end will need to restore the > status byte from the source, so the front-end will always need to set > it, even if it were otherwise used controlled only by the guest and the > back-end device. So technically, this doesn’t prevent such a use case. > (In practice, it isn’t controlled by the guest right now, but that could > be fixed.) I only tested the feature with DPDK(the only backend use it today?). Max defined the protocol and added the corresponding code in DPDK before I added QEMU support. If other backend or different device type want to use this, we can have further discussion? >> Cc-ing Alex on this mail, because to me, this seems like an important >> detail when he plans on using the byte in the future. If we need a >> virtio status byte, I can’t see how we could use the existing F_STATUS >> for it. >> >> Hanna
On 10.10.23 06:00, Yajun Wu wrote: > > On 10/9/2023 5:13 PM, Hanna Czenczek wrote: >> External email: Use caution opening links or attachments >> >> >> On 09.10.23 11:07, Hanna Czenczek wrote: >>> On 09.10.23 10:21, Hanna Czenczek wrote: >>>> On 07.10.23 04:22, Yajun Wu wrote: >>> [...] >>> >>>>> The main motivation of adding VHOST_USER_SET_STATUS is to let >>>>> backend DPDK know >>>>> when DRIVER_OK bit is valid. It's an indication of all VQ >>>>> configuration has sent, >>>>> otherwise DPDK has to rely on first queue pair is ready, then >>>>> receiving/applying >>>>> VQ configuration one by one. >>>>> >>>>> During live migration, configuring VQ one by one is very time >>>>> consuming. >>>> One question I have here is why it wasn’t then introduced in the live >>>> migration code, but in the general VM stop/cont code instead. It does >>>> seem time-consuming to do this every time the VM is paused and >>>> resumed. > > Yes, VM stop/cont will call vhost_net_stop/vhost_net_start. Maybe > because there's no device level stop/cont vhost message? No, it is because qemu will reset the status in stop/cont*, which it should not do. Aside from guest-initiated resets, the only thing where a reset comes into play is when the back-end is changed, e.g. during migration. In that case, the source back-end will see a disconnect on the vhost-user socket and can then do whatever uninitialization it needs to do, and the destination front-end will need to be reconfigured by qemu anyway, because it’s just a case of the destination qemu initiating a fresh connection to a new back-end (except that it will need to restore the state from the source). *Yes, technically, dpdk will ignore that reset, but it still stops the device on a different message (when it should just pause processing vrings), so the outcome is the same. >>>> >>>>> For VIRTIO >>>>> net vDPA, HW needs to know how many VQs are enabled to set >>>>> RSS(Receive-Side Scaling). >>>>> >>>>> If you don’t want SET_STATUS message, backend can remove protocol >>>>> feature bit >>>>> VHOST_USER_PROTOCOL_F_STATUS. >>>> The problem isn’t back-ends that don’t want the message, the problem >>>> is that qemu uses the message wrongly, which prevents well-behaving >>>> back-ends from implementing the message. >>>> >>>>> DPDK is ignoring SET_STATUS 0, but using GET_VRING_BASE to do device >>>>> close/reset. >>>> So the right thing to do for back-ends is to announce STATUS support >>>> and then not implement it correctly? >>>> >>>> GET_VRING_BASE should not reset the close or reset the device, by the >>>> way. It should stop that one vring, not more. We have a >>>> RESET_DEVICE command for resetting. > I believe dpdk uses GET_VRING_BASE long before qemu has RESET_DEVICE? I don’t think it matters who came first. What matters is the specification, and that dpdk decided to rely on implementation-specific behavior without having all involved parties agree by matters of putting that in the specification. And now dpdk clearly deviates from the specification as a result of that action, which can result in problems if the front-end doesn’t do what qemu always used to do. (E.g. the front-end might just send GET_VRING_BASE for all vrings when suspending the guest, and then only send kicks on resume to re-start the vrings. dpdk would most likely be left in a state where the whole device is stopped, expecting DRIVER_OK. Same thing in general for front-ends that don’t support F_STATUS.) > It's a compatible issue. For new backend implements, we can have > better solution, right? The fact that dpdk and qemu deviate from the specification is a problem as-is. >>>>> I'm not involved in discussion about adding SET_STATUS in Vhost >>>>> protocol. This feature >>>>> is essential for vDPA(same as vhost-vdpa implements >>>>> VHOST_VDPA_SET_STATUS). >>>> So from what I gather from your response is that there is only a >>>> single use for SET_STATUS, which is the DRIVER_OK bit. If so, >>>> documenting that all other bits are to be ignored by both back-end >>>> and front-end would be fine by me. >>>> >>>> I’m not fully serious about that suggestion, but I hear the strong >>>> implication that nothing but DRIVER_OK was of any concern, and this >>>> is really important to note when we talk about the status of the >>>> STATUS feature in vhost today. It seems to me now that it was not >>>> intended to be the virtio-level status byte, but just a DRIVER_OK >>>> signalling path from front-end to back-end. That makes it a >>>> vhost-level protocol feature to me. >>> On second thought, it just is a pure vhost-level protocol feature, and >>> has nothing to do with the virtio status byte as-is. The only stated >>> purpose is for the front-end to send DRIVER_OK after migration, but >>> migration is transparent to the guest, so the guest would never change >>> the status byte during migration. Therefore, if this feature is >>> essential, we will never be able to have a status byte that is >>> transparently shared between guest and back-end device, i.e. the >>> virtio status byte. >> On third thought, scratch that. The guest wouldn’t set it, but >> naturally, after migration, the front-end will need to restore the >> status byte from the source, so the front-end will always need to set >> it, even if it were otherwise used controlled only by the guest and the >> back-end device. So technically, this doesn’t prevent such a use case. >> (In practice, it isn’t controlled by the guest right now, but that could >> be fixed.) > I only tested the feature with DPDK(the only backend use it today?). > Max defined the protocol and added the corresponding code in DPDK > before I added QEMU support. If other backend or different device type > want to use this, we can have further discussion? So as far as I understand, the feature is supposed to rely on implementation-specific behavior between specifically qemu as a front-end and dpdk as a back-end, nothing else. Honestly, that to me is a very good reason to deprecate it. That would make it clear that any implementation that implements it does so because it relies on implementation-specific behavior from other implementations. Option 2 is to fix it. It is not right to use this broadly defined feature with its clear protocol as given in the virtio specification just to set and clear a single bit (DRIVER_OK). The vhost-user specification points to that virtio protocol. We must adhere to the protocol. And note that we must not reset devices just because the VM is paused/resumed. (That is why I wanted to deprecate SET_STATUS, so that Stefan’s series would introduce RESET_DEVICE where we need it, and we can (for now) ignore the SET_STATUS 0 in vhost_dev_stop().) Option 3 would be to just be honest in the specification, and limit the scope of F_STATUS to say the only bit that matters is DRIVER_OK. I would say this is not really different from deprecating, though it wouldn’t affect your case. However, I understand Alex relies on a full status byte. I’m still interested to know why that is. Option 4 is of course not to do anything, and leave everything as-is, waiting for the next person to stir the hornet’s nest. >>> Cc-ing Alex on this mail, because to me, this seems like an important >>> detail when he plans on using the byte in the future. If we need a >>> virtio status byte, I can’t see how we could use the existing F_STATUS >>> for it. >>> >>> Hanna >
On Tue, Oct 10, 2023 at 4:57 AM Yajun Wu <yajunw@nvidia.com> wrote: > > > On 10/9/2023 6:28 PM, German Maglione wrote: > > External email: Use caution opening links or attachments > > > > > > On Sat, Oct 7, 2023 at 4:23 AM Yajun Wu <yajunw@nvidia.com> wrote: > >> > >> On 10/6/2023 6:34 PM, Michael S. Tsirkin wrote: > >>> External email: Use caution opening links or attachments > >>> > >>> > >>> On Fri, Oct 06, 2023 at 11:47:55AM +0200, Hanna Czenczek wrote: > >>>> On 06.10.23 11:26, Michael S. Tsirkin wrote: > >>>>> On Fri, Oct 06, 2023 at 11:15:55AM +0200, Hanna Czenczek wrote: > >>>>>> On 06.10.23 10:45, Michael S. Tsirkin wrote: > >>>>>>> On Fri, Oct 06, 2023 at 09:48:14AM +0200, Hanna Czenczek wrote: > >>>>>>>> On 05.10.23 19:15, Michael S. Tsirkin wrote: > >>>>>>>>> On Thu, Oct 05, 2023 at 01:08:52PM -0400, Stefan Hajnoczi wrote: > >>>>>>>>>> On Wed, Oct 04, 2023 at 02:58:57PM +0200, Hanna Czenczek wrote: > >>>>>>>>>>> There is no clearly defined purpose for the virtio status byte in > >>>>>>>>>>> vhost-user: For resetting, we already have RESET_DEVICE; and for virtio > >>>>>>>>>>> feature negotiation, we have [GS]ET_FEATURES. With the REPLY_ACK > >>>>>>>>>>> protocol extension, it is possible for SET_FEATURES to return errors > >>>>>>>>>>> (SET_PROTOCOL_FEATURES may be called before SET_FEATURES). > >>>>>>>>>>> > >>>>>>>>>>> As for implementations, SET_STATUS is not widely implemented. dpdk does > >>>>>>>>>>> implement it, but only uses it to signal feature negotiation failure. > >>>>>>>>>>> While it does log reset requests (SET_STATUS 0) as such, it effectively > >>>>>>>>>>> ignores them, in contrast to RESET_OWNER (which is deprecated, and today > >>>>>>>>>>> means the same thing as RESET_DEVICE). > >>>>>>>>>>> > >>>>>>>>>>> While qemu superficially has support for [GS]ET_STATUS, it does not > >>>>>>>>>>> forward the guest-set status byte, but instead just makes it up > >>>>>>>>>>> internally, and actually completely ignores what the back-end returns, > >>>>>>>>>>> only using it as the template for a subsequent SET_STATUS to add single > >>>>>>>>>>> bits to it. Notably, after setting FEATURES_OK, it never reads it back > >>>>>>>>>>> to see whether the flag is still set, which is the only way in which > >>>>>>>>>>> dpdk uses the status byte. > >>>>>>>>>>> > >>>>>>>>>>> As-is, no front-end or back-end can rely on the other side handling this > >>>>>>>>>>> field in a useful manner, and it also provides no practical use over > >>>>>>>>>>> other mechanisms the vhost-user protocol has, which are more clearly > >>>>>>>>>>> defined. Deprecate it. > >>>>>>>>>>> > >>>>>>>>>>> Suggested-by: Stefan Hajnoczi <stefanha@redhat.com> > >>>>>>>>>>> Signed-off-by: Hanna Czenczek <hreitz@redhat.com> > >>>>>>>>>>> --- > >>>>>>>>>>> docs/interop/vhost-user.rst | 28 +++++++++++++++++++++------- > >>>>>>>>>>> 1 file changed, 21 insertions(+), 7 deletions(-) > >>>>>>>>>> Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com> > >>>>>>>>> SET_STATUS is the only way to signal failure to acknowledge FEATURES_OK. > >>>>>>>>> The fact current backends never check errors does not mean they never > >>>>>>>>> will. So no, not applying this. > >>>>>>>> Can this not be done with REPLY_ACK? I.e., with the following message > >>>>>>>> order: > >>>>>>>> > >>>>>>>> 1. GET_FEATURES to find out whether VHOST_USER_F_PROTOCOL_FEATURES is > >>>>>>>> present > >>>>>>>> 2. GET_PROTOCOL_FEATURES to hopefully get VHOST_USER_PROTOCOL_F_REPLY_ACK > >>>>>>>> 3. SET_PROTOCOL_FEATURES to set VHOST_USER_PROTOCOL_F_REPLY_ACK > >>>>>>>> 4. SET_FEATURES with need_reply > >>>>>>>> > >>>>>>>> If not, the problem is that qemu has sent SET_STATUS 0 for a while when the > >>>>>>>> vCPUs are stopped, which generally seems to request a device reset. If we > >>>>>>>> don’t state at least that SET_STATUS 0 is to be ignored, back-ends that will > >>>>>>>> implement SET_STATUS later may break with at least these qemu versions. But > >>>>>>>> documenting that a particular use of the status byte is to be ignored would > >>>>>>>> be really strange. > >>>>>>>> > >>>>>>>> Hanna > >>>>>>> Hmm I guess. Though just following virtio spec seems cleaner to me... > >>>>>>> vhost-user reconfigures the state fully on start. > >>>>>> Not the internal device state, though. virtiofsd has internal state, and > >>>>>> other devices like vhost-gpu back-ends would probably, too. > >>>>>> > >>>>>> Stefan has recently sent a series > >>>>>> (https://lists.nongnu.org/archive/html/qemu-devel/2023-10/msg00709.html) to > >>>>>> put the reset (RESET_DEVICE) into virtio_reset() (when we really need a > >>>>>> reset). > >>>>>> > >>>>>> I really don’t like our current approach with the status byte. Following the > >>>>>> virtio specification to me would mean that the guest directly controls this > >>>>>> byte, which it does not. qemu makes up values as it deems appropriate, and > >>>>>> this includes sending a SET_STATUS 0 when the guest is just paused, i.e. > >>>>>> when the guest really doesn’t want a device reset. > >>>>>> > >>>>>> That means that qemu does not treat this as a virtio device field (because > >>>>>> that would mean exposing it to the guest driver), but instead treats it as > >>>>>> part of the vhost(-user) protocol. It doesn’t feel right to me that we use > >>>>>> a virtio-defined feature for communication on the vhost level, i.e. between > >>>>>> front-end and back-end, and not between guest driver and device. I think > >>>>>> all vhost-level protocol features should be fully defined in the vhost-user > >>>>>> specification, which REPLY_ACK is. > >>>>> Hmm that makes sense. Maybe we should have done what stefan's patch > >>>>> is doing. > >>>>> > >>>>> Do look at the original commit that introduced it to understand why > >>>>> it was added. > >>>> I don’t understand why this was added to the stop/cont code, though. If it > >>>> is time consuming to make these changes, why are they done every time the VM > >>>> is paused > >>>> and resumed? It makes sense that this would be done for the initial > >>>> configuration (where a reset also wouldn’t hurt), but here it seems wrong. > >>>> > >>>> (To be clear, a reset in the stop/cont code is wrong, because it breaks > >>>> stateful devices.) > >>>> > >>>> Also, note the newer commits 6f8be29ec17 and c3716f260bf. The reset as > >>>> originally introduced was wrong even for non-stateful devices, because it > >>>> occurred before we fetched the state (vring indices) so we could restore it > >>>> later. I don’t know how 923b8921d21 was tested, but if the back-end used > >>>> for testing implemented SET_STATUS 0 as a reset, it could not have survived > >>>> either migration or a stop/cont in general, because the vring indices would > >>>> have been reset to 0. > >>>> > >>>> What I’m saying is, 923b8921d21 introduced SET_STATUS calls that broke all > >>>> devices that would implement them as per virtio spec, and even today it’s > >>>> broken for stateful devices. The mentioned performance issue is likely > >>>> real, but we can’t address it by making up SET_STATUS calls that are wrong. > >>>> > >>>> I concede that I didn’t think about DRIVER_OK. Personally, I would do all > >>>> final configuration that would happen upon a DRIVER_OK once the first vring > >>>> is started (i.e. receives a kick). That has the added benefit of being > >>>> asynchronous because it doesn’t block any vhost-user messages (which are > >>>> synchronous, and thus block downtime). > >>>> > >>>> Hanna > >>> For better or worse kick is per ring. It's out of spec to start rings > >>> that were not kicked but I guess you could do configuration ... > >>> Seems somewhat asymmetrical though. > >>> > >>> Let's wait until next week, hopefully Yajun Wu will answer. > >> The main motivation of adding VHOST_USER_SET_STATUS is to let backend > >> DPDK know > >> when DRIVER_OK bit is valid. It's an indication of all VQ configuration > >> has sent, > >> otherwise DPDK has to rely on first queue pair is ready, then > >> receiving/applying > >> VQ configuration one by one. > >> > >> During live migration, configuring VQ one by one is very time consuming. > >> For VIRTIO > >> net vDPA, HW needs to know how many VQs are enabled to set > >> RSS(Receive-Side Scaling). > >> > >> If you don’t want SET_STATUS message, backend can remove protocol > >> feature bit > >> VHOST_USER_PROTOCOL_F_STATUS. > >> DPDK is ignoring SET_STATUS 0, but using GET_VRING_BASE to do device > >> close/reset. > > This is incorrect, resetting the device on GET_VRING_BASE breaks > > the stop/cont. Since you don't want to reset the VQs on stop/cont. > Sorry for the misunderstanding, dpdk vhost backend framework doesn't > have RESET concept(only device level .dev_conf and .dev_close). On > receiving DRIVER_OK does dev_conf, on receiving GET_VRING_BASE does > dev_close. For every VM suspend/resume, dpdk issues dev_close then dev_conf. (sorry I did not explain myself well) I meant that resetting the VQs upon receiveng GET_VRING_BASE makes the backend to fail if qemu continues after a "stop". I notice that in dpdk, when it receives a GET_VRING_BASE[0], it calls 'vring_invalidate(dev, vq);'[1], resetting the VQ[2], doing that is incorrect. [0] https://github.com/DPDK/dpdk/blob/main/lib/vhost/vhost_user.c#L2135 [1] https://github.com/DPDK/dpdk/blob/main/lib/vhost/vhost_user.c#L2201 [2] https://github.com/DPDK/dpdk/blob/main/lib/vhost/vhost.c#L580 > > > >> I'm not involved in discussion about adding SET_STATUS in Vhost > >> protocol. This feature > >> is essential for vDPA(same as vhost-vdpa implements VHOST_VDPA_SET_STATUS). > >> > >> Thanks, > >> Yajun > >>>>>> Now, we could hand full control of the status byte to the guest, and that > >>>>>> would make me content. But I feel like that doesn’t really work, because > >>>>>> qemu needs to intercept the status byte anyway (it needs to know when there > >>>>>> is a reset, probably wants to know when the device is configured, etc.), so > >>>>>> I don’t think having the status byte in vhost-user really gains us much when > >>>>>> qemu could translate status byte changes to/from other vhost-user commands. > >>>>>> > >>>>>> Hanna > >>>>> well it intercepts it but I think it could pass it on unchanged. > >>>>> > >>>>> > >>>>>>> I guess symmetry was the > >>>>>>> point. So I don't see why SET_STATUS 0 has to be ignored. > >>>>>>> > >>>>>>> > >>>>>>> SET_STATUS was introduced by: > >>>>>>> > >>>>>>> commit 923b8921d210763359e96246a58658ac0db6c645 > >>>>>>> Author: Yajun Wu <yajunw@nvidia.com> > >>>>>>> Date: Mon Oct 17 14:44:52 2022 +0800 > >>>>>>> > >>>>>>> vhost-user: Support vhost_dev_start > >>>>>>> > >>>>>>> CC the author. > >>>>>>> > >> _______________________________________________ > >> Virtio-fs mailing list > >> Virtio-fs@redhat.com > >> https://listman.redhat.com/mailman/listinfo/virtio-fs > > > > > > -- > > German > > >
Hanna Czenczek <hreitz@redhat.com> writes: > On 10.10.23 06:00, Yajun Wu wrote: >> >> On 10/9/2023 5:13 PM, Hanna Czenczek wrote: >>> External email: Use caution opening links or attachments >>> >>> >>> On 09.10.23 11:07, Hanna Czenczek wrote: >>>> On 09.10.23 10:21, Hanna Czenczek wrote: >>>>> On 07.10.23 04:22, Yajun Wu wrote: >>>> [...] >>>> <snip> > So as far as I understand, the feature is supposed to rely on > implementation-specific behavior between specifically qemu as a > front-end and dpdk as a back-end, nothing else. Honestly, that to me > is a very good reason to deprecate it. That would make it clear that > any implementation that implements it does so because it relies on > implementation-specific behavior from other implementations. > > Option 2 is to fix it. It is not right to use this broadly defined > feature with its clear protocol as given in the virtio specification > just to set and clear a single bit (DRIVER_OK). The vhost-user > specification points to that virtio protocol. We must adhere to the > protocol. And note that we must not reset devices just because the VM > is paused/resumed. (That is why I wanted to deprecate SET_STATUS, so > that Stefan’s series would introduce RESET_DEVICE where we need it, > and we can (for now) ignore the SET_STATUS 0 in vhost_dev_stop().) > > Option 3 would be to just be honest in the specification, and limit > the scope of F_STATUS to say the only bit that matters is DRIVER_OK. > I would say this is not really different from deprecating, though it > wouldn’t affect your case. However, I understand Alex relies on a > full status byte. I’m still interested to know why that is. For an F_TRANSPORT backend (or whatever the final name ends up being) we need the backend to have full control of the status byte because all the handling of VirtIO is deferred to it. Therefor it has to handle all the feature negotiation and indicate when the device needs resetting. (side note: feature negotiation is another slippery area when QEMU gets involved in gating which feature bits may or may not be exposed to the backend. The only one it should ever mask is F_UNUSED which is used (sic) to trigger the vhost protocol negotiation) > Option 4 is of course not to do anything, and leave everything as-is, > waiting for the next person to stir the hornet’s nest. > >>>> Cc-ing Alex on this mail, because to me, this seems like an important >>>> detail when he plans on using the byte in the future. If we need a >>>> virtio status byte, I can’t see how we could use the existing F_STATUS >>>> for it. What would we use instead of F_STATUS to query the Device Status field? >>>> >>>> Hanna >>
On 10.10.23 12:36, Alex Bennée wrote: > Hanna Czenczek <hreitz@redhat.com> writes: > >> On 10.10.23 06:00, Yajun Wu wrote: >>> On 10/9/2023 5:13 PM, Hanna Czenczek wrote: >>>> External email: Use caution opening links or attachments >>>> >>>> >>>> On 09.10.23 11:07, Hanna Czenczek wrote: >>>>> On 09.10.23 10:21, Hanna Czenczek wrote: >>>>>> On 07.10.23 04:22, Yajun Wu wrote: >>>>> [...] >>>>> > <snip> >> So as far as I understand, the feature is supposed to rely on >> implementation-specific behavior between specifically qemu as a >> front-end and dpdk as a back-end, nothing else. Honestly, that to me >> is a very good reason to deprecate it. That would make it clear that >> any implementation that implements it does so because it relies on >> implementation-specific behavior from other implementations. >> >> Option 2 is to fix it. It is not right to use this broadly defined >> feature with its clear protocol as given in the virtio specification >> just to set and clear a single bit (DRIVER_OK). The vhost-user >> specification points to that virtio protocol. We must adhere to the >> protocol. And note that we must not reset devices just because the VM >> is paused/resumed. (That is why I wanted to deprecate SET_STATUS, so >> that Stefan’s series would introduce RESET_DEVICE where we need it, >> and we can (for now) ignore the SET_STATUS 0 in vhost_dev_stop().) >> >> Option 3 would be to just be honest in the specification, and limit >> the scope of F_STATUS to say the only bit that matters is DRIVER_OK. >> I would say this is not really different from deprecating, though it >> wouldn’t affect your case. However, I understand Alex relies on a >> full status byte. I’m still interested to know why that is. > For an F_TRANSPORT backend (or whatever the final name ends up being) we > need the backend to have full control of the status byte because all the > handling of VirtIO is deferred to it. Therefor it has to handle all the > feature negotiation and indicate when the device needs resetting. > > (side note: feature negotiation is another slippery area when QEMU gets > involved in gating which feature bits may or may not be exposed to the > backend. The only one it should ever mask is F_UNUSED which is used > (sic) to trigger the vhost protocol negotiation) That’s the thing, feature negotiation is done with GET_FEATURES and SET_FEATURES. Configuring F_REPLY_ACK lets SET_FEATURES return errors. Indicating that the device needs reset is a good point, there is no other feature to do that. (And something qemu currently ignores, just like any value the device returns through GET_STATUS, but that’s besides the point.) >> Option 4 is of course not to do anything, and leave everything as-is, >> waiting for the next person to stir the hornet’s nest. >> >>>>> Cc-ing Alex on this mail, because to me, this seems like an important >>>>> detail when he plans on using the byte in the future. If we need a >>>>> virtio status byte, I can’t see how we could use the existing F_STATUS >>>>> for it. > What would we use instead of F_STATUS to query the Device Status field? We would emulate it in the front-end, just like we need to do for back-ends without F_STATUS. We can’t emulate the DEVICE_NEEDS_RESET bit, though, that’s correct. Given that qemu currently ignores DEVICE_NEEDS_RESET, I’m not 100 % convinced that your use case has a hard dependency on F_STATUS. However, this still does make a fair point in general that it would be useful to keep it. That still leaves us with the situation that currently, the only implementations with F_STATUS support are qemu and dpdk, which both handle it incorrectly. Furthermore, the specification leaves much to be desired, specifically in how F_STATUS interacts with other vhost-user commands (which is something I cited as a reason for my original patch), i.e. whether RESET_DEVICE and SET_STATUS 0 are equivalent, and whether failures in feature negotiation must result in both SET_FEATURES returning an error (with F_REPLY_ACK), and FEATURES_OK being reset in the status byte, or whether either is sufficient. What happens when DEVICE_NEEDS_RESET is set, i.e. do we just need RESET_DEVICE / SET_STATUS 0, or do we also need to reset some protocol state? (This is also connected to the fact that what happens on RESET_DEVICE is largely undefined, which I said on Stefan’s series.) In general, because we have our own transport, we should make a note how it interacts with the status negotiation phases, i.e. that GET_FEATURES must not be called before S_ACKNOWLEDGE | S_DRIVER are set, that FEATURES_OK must be set after the SET_FEATURES call, and that DRIVER_OK must not be set without FEATURES_OK set / SET_FEATURES having returned success. Here we would also answer the question about the interaction of F_REPLY_ACK+SET_FEATURES with F_STATUS, specifically whether an implementation with F_REPLY_ACK even needs to read back the status byte after setting FEATURES_OK because it could have got the feature negotiation result already as a result of the SET_FEATURES call. After migration, can you just set all flags immediately or do we need to follow this step-by-step protocol? I think we do need to do it step-by-step, mostly for simplicity in the back-end, i.e. that it just sees a normal device start-up. We should also clarify whether SET_STATUS can fail, i.e. whether setting an invalid status (is setting FEATURES_OK when the device doesn’t think so invalid?) has SET_STATUS fail (with F_REPLY_ACK) and/or immediately gets the device into DEVICE_NEEDS_RESET. We should clarify whether SET_STATUS can block. The current use of DRIVER_OK seems to indicate to me that dpdk does do time-consuming operations when it sees DRIVER_OK (code looks like it, too) and only returns when that’s done, but naïvely, I would expect SET_STATUS to be just setting some value and doing whatever needs to be done in the background, not actually launching and blocking on an operation. I think it is dangerous to just push ahead with using F_STATUS without acknowledging that its implementation is broken right now, and that it is so *on purpose* because the DRIVER_OK bit is the only thing that it was supposed to be used for. Using it for its purported original use (actually the virtio status byte) is contradictory to that. It’s probably fixable, but I think it requires taking a step back and seeing what needs to be done to remedy the conflict. If you rip out all the existing STATUS code and replace it such that qemu will let the guest have full control over the status byte (except for migration, where we restore it on the destination, which will result in DRIVER_OK being set at the end, fulfilling that requirement), that will fix the implementation in qemu. I think. But the specification should be amended to think about all these corner cases, not least because I think they will also affect your implementation. (The answers to many of the questions I raise for documentation may be obvious to you, based on “in virtio, it’s just an MMIO byte that’s written and read, so the rest follows from there”. But evidently the implementation we have kind of ignores that e.g. SET_STATUS 0 is a reset (6f8be29ec17d44496b9ed67599bceaaba72d1864 is a work-around, not much more) or that there is actually a protocol to setting the status flags and you can’t just set them all at once, so I don’t think the answers are immediately obvious, and should be documented.) As for me and the original patch: I claimed nobody really needs F_STATUS, you say you do, so plainly, I assumed wrong and will naturally take my hands off of F_STATUS and just ensure not to implement it in any back-end until you’ve fixed it, as Yajun has advised. I’d still prefer mentioning this advice in the documentation until it’s fixed, but, you know, I wouldn’t be the first one to say “I now know about the quirk, so I can work around it, no need to tell anyone else as long as my stuff works”. Hanna
Hanna Czenczek <hreitz@redhat.com> writes: (adding Viresh to CC for Xen Vhost questions) > On 10.10.23 12:36, Alex Bennée wrote: >> Hanna Czenczek <hreitz@redhat.com> writes: >> >>> On 10.10.23 06:00, Yajun Wu wrote: >>>> On 10/9/2023 5:13 PM, Hanna Czenczek wrote: >>>>> External email: Use caution opening links or attachments >>>>> >>>>> >>>>> On 09.10.23 11:07, Hanna Czenczek wrote: >>>>>> On 09.10.23 10:21, Hanna Czenczek wrote: >>>>>>> On 07.10.23 04:22, Yajun Wu wrote: >>>>>> [...] >>>>>> >> <snip> >>> So as far as I understand, the feature is supposed to rely on >>> implementation-specific behavior between specifically qemu as a >>> front-end and dpdk as a back-end, nothing else. Honestly, that to me >>> is a very good reason to deprecate it. That would make it clear that >>> any implementation that implements it does so because it relies on >>> implementation-specific behavior from other implementations. >>> >>> Option 2 is to fix it. It is not right to use this broadly defined >>> feature with its clear protocol as given in the virtio specification >>> just to set and clear a single bit (DRIVER_OK). The vhost-user >>> specification points to that virtio protocol. We must adhere to the >>> protocol. And note that we must not reset devices just because the VM >>> is paused/resumed. (That is why I wanted to deprecate SET_STATUS, so >>> that Stefan’s series would introduce RESET_DEVICE where we need it, >>> and we can (for now) ignore the SET_STATUS 0 in vhost_dev_stop().) >>> >>> Option 3 would be to just be honest in the specification, and limit >>> the scope of F_STATUS to say the only bit that matters is DRIVER_OK. >>> I would say this is not really different from deprecating, though it >>> wouldn’t affect your case. However, I understand Alex relies on a >>> full status byte. I’m still interested to know why that is. >> For an F_TRANSPORT backend (or whatever the final name ends up being) we >> need the backend to have full control of the status byte because all the >> handling of VirtIO is deferred to it. Therefor it has to handle all the >> feature negotiation and indicate when the device needs resetting. >> >> (side note: feature negotiation is another slippery area when QEMU gets >> involved in gating which feature bits may or may not be exposed to the >> backend. The only one it should ever mask is F_UNUSED which is used >> (sic) to trigger the vhost protocol negotiation) > > That’s the thing, feature negotiation is done with GET_FEATURES and > SET_FEATURES. Configuring F_REPLY_ACK lets SET_FEATURES return > errors. OK but then what - QEMU fakes up FEATURES_OK in the Device Status field on the behalf of the backend? I should point out QEMU doesn't exist in some of these use case. When using the rust-vmm backends with Xen for example there is no VMM to talk to so we have a Xen Vhost Frontend which is entirely concerned with setup and then once connected up leaves the backend to do its thing. I'd rather leave the frontend as dumb as possible rather than splitting logic between the two. > Indicating that the device needs reset is a good point, there is no > other feature to do that. (And something qemu currently ignores, just > like any value the device returns through GET_STATUS, but that’s > besides the point.) > >>> Option 4 is of course not to do anything, and leave everything as-is, >>> waiting for the next person to stir the hornet’s nest. >>> >>>>>> Cc-ing Alex on this mail, because to me, this seems like an important >>>>>> detail when he plans on using the byte in the future. If we need a >>>>>> virtio status byte, I can’t see how we could use the existing F_STATUS >>>>>> for it. >> What would we use instead of F_STATUS to query the Device Status field? > > We would emulate it in the front-end, just like we need to do for > back-ends without F_STATUS. We can’t emulate the DEVICE_NEEDS_RESET > bit, though, that’s correct. > > Given that qemu currently ignores DEVICE_NEEDS_RESET, I’m not 100 % > convinced that your use case has a hard dependency on F_STATUS. > However, this still does make a fair point in general that it would be > useful to keep it. OK/ > That still leaves us with the situation that currently, the only > implementations with F_STATUS support are qemu and dpdk, which both > handle it incorrectly. I was going to say there is also the rust-vmm vhost-user-master crates which we've imported: https://github.com/vireshk/vhost for the Xen Vhost Frontend: https://github.com/vireshk/xen-vhost-frontend but I can't actually see any handling for GET/SET_STATUS at all which makes me wonder how we actually work. Viresh? > Furthermore, the specification leaves much to > be desired, specifically in how F_STATUS interacts with other > vhost-user commands (which is something I cited as a reason for my > original patch), i.e. whether RESET_DEVICE and SET_STATUS 0 are > equivalent, and whether failures in feature negotiation must result in > both SET_FEATURES returning an error (with F_REPLY_ACK), and > FEATURES_OK being reset in the status byte, or whether either is > sufficient. What happens when DEVICE_NEEDS_RESET is set, i.e. do we > just need RESET_DEVICE / SET_STATUS 0, or do we also need to reset > some protocol state? (This is also connected to the fact that what > happens on RESET_DEVICE is largely undefined, which I said on Stefan’s > series.) I'm all for strengthening the vhost-user protocol definitions. I'm just wary of encoding QEMU<->backend implementation details. > In general, because we have our own transport, we should make a note > how it interacts with the status negotiation phases, i.e. that > GET_FEATURES must not be called before S_ACKNOWLEDGE | S_DRIVER are > set, that FEATURES_OK must be set after the SET_FEATURES call, and > that DRIVER_OK must not be set without FEATURES_OK set / SET_FEATURES > having returned success. Here we would also answer the question about > the interaction of F_REPLY_ACK+SET_FEATURES with F_STATUS, > specifically whether an implementation with F_REPLY_ACK even needs to > read back the status byte after setting FEATURES_OK because it could > have got the feature negotiation result already as a result of the > SET_FEATURES call. Some sequence diagrams would remove a lot of the ambiguity from parsing the words. I wonder if there is a pretty way to do that to render nicely in our published docs? > After migration, can you just set all flags immediately or do we need > to follow this step-by-step protocol? I think we do need to do it > step-by-step, mostly for simplicity in the back-end, i.e. that it just > sees a normal device start-up. Makes sense. > We should also clarify whether SET_STATUS can fail, i.e. whether > setting an invalid status (is setting FEATURES_OK when the device > doesn’t think so invalid?) has SET_STATUS fail (with F_REPLY_ACK) > and/or immediately gets the device into DEVICE_NEEDS_RESET. > > We should clarify whether SET_STATUS can block. The current use of > DRIVER_OK seems to indicate to me that dpdk does do time-consuming > operations when it sees DRIVER_OK (code looks like it, too) and only > returns when that’s done, but naïvely, I would expect SET_STATUS to be > just setting some value and doing whatever needs to be done in the > background, not actually launching and blocking on an operation. Shouldn't the guest driver be reading the status bit until it flips? So potentially there could be multiple GET_STATUS calls. > I think it is dangerous to just push ahead with using F_STATUS without > acknowledging that its implementation is broken right now, and that it > is so *on purpose* because the DRIVER_OK bit is the only thing that it > was supposed to be used for. Using it for its purported original use > (actually the virtio status byte) is contradictory to that. It’s > probably fixable, but I think it requires taking a step back and > seeing what needs to be done to remedy the conflict. If you rip out > all the existing STATUS code and replace it such that qemu will let > the guest have full control over the status byte (except for > migration, where we restore it on the destination, which will result > in DRIVER_OK being set at the end, fulfilling that requirement), that > will fix the implementation in qemu. I think. But the specification > should be amended to think about all these corner cases, not least > because I think they will also affect your implementation. > > (The answers to many of the questions I raise for documentation may be > obvious to you, based on “in virtio, it’s just an MMIO byte that’s > written and read, so the rest follows from there”. But evidently the > implementation we have kind of ignores that e.g. SET_STATUS 0 is a > reset (6f8be29ec17d44496b9ed67599bceaaba72d1864 is a work-around, not > much more) or that there is actually a protocol to setting the status > flags and you can’t just set them all at once, so I don’t think the > answers are immediately obvious, and should be documented.) > > As for me and the original patch: I claimed nobody really needs > F_STATUS, you say you do, so plainly, I assumed wrong and will > naturally take my hands off of F_STATUS and just ensure not to > implement it in any back-end until you’ve fixed it, as Yajun has > advised. I’d still prefer mentioning this advice in the documentation > until it’s fixed, but, you know, I wouldn’t be the first one to say “I > now know about the quirk, so I can work around it, no need to tell > anyone else as long as my stuff works”. > > Hanna
On 10.10.23 16:35, Alex Bennée wrote: > Hanna Czenczek <hreitz@redhat.com> writes: > > (adding Viresh to CC for Xen Vhost questions) > >> On 10.10.23 12:36, Alex Bennée wrote: >>> Hanna Czenczek <hreitz@redhat.com> writes: >>> >>>> On 10.10.23 06:00, Yajun Wu wrote: >>>>> On 10/9/2023 5:13 PM, Hanna Czenczek wrote: >>>>>> External email: Use caution opening links or attachments >>>>>> >>>>>> >>>>>> On 09.10.23 11:07, Hanna Czenczek wrote: >>>>>>> On 09.10.23 10:21, Hanna Czenczek wrote: >>>>>>>> On 07.10.23 04:22, Yajun Wu wrote: >>>>>>> [...] >>>>>>> >>> <snip> >>>> So as far as I understand, the feature is supposed to rely on >>>> implementation-specific behavior between specifically qemu as a >>>> front-end and dpdk as a back-end, nothing else. Honestly, that to me >>>> is a very good reason to deprecate it. That would make it clear that >>>> any implementation that implements it does so because it relies on >>>> implementation-specific behavior from other implementations. >>>> >>>> Option 2 is to fix it. It is not right to use this broadly defined >>>> feature with its clear protocol as given in the virtio specification >>>> just to set and clear a single bit (DRIVER_OK). The vhost-user >>>> specification points to that virtio protocol. We must adhere to the >>>> protocol. And note that we must not reset devices just because the VM >>>> is paused/resumed. (That is why I wanted to deprecate SET_STATUS, so >>>> that Stefan’s series would introduce RESET_DEVICE where we need it, >>>> and we can (for now) ignore the SET_STATUS 0 in vhost_dev_stop().) >>>> >>>> Option 3 would be to just be honest in the specification, and limit >>>> the scope of F_STATUS to say the only bit that matters is DRIVER_OK. >>>> I would say this is not really different from deprecating, though it >>>> wouldn’t affect your case. However, I understand Alex relies on a >>>> full status byte. I’m still interested to know why that is. >>> For an F_TRANSPORT backend (or whatever the final name ends up being) we >>> need the backend to have full control of the status byte because all the >>> handling of VirtIO is deferred to it. Therefor it has to handle all the >>> feature negotiation and indicate when the device needs resetting. >>> >>> (side note: feature negotiation is another slippery area when QEMU gets >>> involved in gating which feature bits may or may not be exposed to the >>> backend. The only one it should ever mask is F_UNUSED which is used >>> (sic) to trigger the vhost protocol negotiation) >> That’s the thing, feature negotiation is done with GET_FEATURES and >> SET_FEATURES. Configuring F_REPLY_ACK lets SET_FEATURES return >> errors. > OK but then what - QEMU fakes up FEATURES_OK in the Device Status field > on the behalf of the backend? It does that right now. When using qemu, vhost-user status byte is not exposed to the guest at all. qemu makes it up completely, and effectively ignores the response from GET_STATUS completely. (The only use of GET_STATUS is (right now): There is a function to set a flag in the status byte, and it calls GET_STATUS, ORs the flag in, and calls SET_STATUS with the result.) > I should point out QEMU doesn't exist in some of these use case. When > using the rust-vmm backends with Xen for example there is no VMM to talk > to so we have a Xen Vhost Frontend which is entirely concerned with > setup and then once connected up leaves the backend to do its thing. I'd > rather leave the frontend as dumb as possible rather than splitting > logic between the two. > >> Indicating that the device needs reset is a good point, there is no >> other feature to do that. (And something qemu currently ignores, just >> like any value the device returns through GET_STATUS, but that’s >> besides the point.) >> >>>> Option 4 is of course not to do anything, and leave everything as-is, >>>> waiting for the next person to stir the hornet’s nest. >>>> >>>>>>> Cc-ing Alex on this mail, because to me, this seems like an important >>>>>>> detail when he plans on using the byte in the future. If we need a >>>>>>> virtio status byte, I can’t see how we could use the existing F_STATUS >>>>>>> for it. >>> What would we use instead of F_STATUS to query the Device Status field? >> We would emulate it in the front-end, just like we need to do for >> back-ends without F_STATUS. We can’t emulate the DEVICE_NEEDS_RESET >> bit, though, that’s correct. >> >> Given that qemu currently ignores DEVICE_NEEDS_RESET, I’m not 100 % >> convinced that your use case has a hard dependency on F_STATUS. >> However, this still does make a fair point in general that it would be >> useful to keep it. > OK/ > >> That still leaves us with the situation that currently, the only >> implementations with F_STATUS support are qemu and dpdk, which both >> handle it incorrectly. > I was going to say there is also the rust-vmm vhost-user-master crates > which we've imported: > > https://github.com/vireshk/vhost > > for the Xen Vhost Frontend: > > https://github.com/vireshk/xen-vhost-frontend > > but I can't actually see any handling for GET/SET_STATUS at all which > makes me wonder how we actually work. Viresh? As far as I know the only back-end implementation of F_STATUS is in DPDK. As I said, if anyone else implemented it right now, that would be dangerous, because qemu doesn’t adhere to the virtio protocol when it comes to the status byte. >> Furthermore, the specification leaves much to >> be desired, specifically in how F_STATUS interacts with other >> vhost-user commands (which is something I cited as a reason for my >> original patch), i.e. whether RESET_DEVICE and SET_STATUS 0 are >> equivalent, and whether failures in feature negotiation must result in >> both SET_FEATURES returning an error (with F_REPLY_ACK), and >> FEATURES_OK being reset in the status byte, or whether either is >> sufficient. What happens when DEVICE_NEEDS_RESET is set, i.e. do we >> just need RESET_DEVICE / SET_STATUS 0, or do we also need to reset >> some protocol state? (This is also connected to the fact that what >> happens on RESET_DEVICE is largely undefined, which I said on Stefan’s >> series.) > I'm all for strengthening the vhost-user protocol definitions. I'm just > wary of encoding QEMU<->backend implementation details. > >> In general, because we have our own transport, we should make a note >> how it interacts with the status negotiation phases, i.e. that >> GET_FEATURES must not be called before S_ACKNOWLEDGE | S_DRIVER are >> set, that FEATURES_OK must be set after the SET_FEATURES call, and >> that DRIVER_OK must not be set without FEATURES_OK set / SET_FEATURES >> having returned success. Here we would also answer the question about >> the interaction of F_REPLY_ACK+SET_FEATURES with F_STATUS, >> specifically whether an implementation with F_REPLY_ACK even needs to >> read back the status byte after setting FEATURES_OK because it could >> have got the feature negotiation result already as a result of the >> SET_FEATURES call. > Some sequence diagrams would remove a lot of the ambiguity from parsing > the words. I wonder if there is a pretty way to do that to render nicely > in our published docs? I’m sure some form of SVG will work. Somehow. If not, it should. :) >> After migration, can you just set all flags immediately or do we need >> to follow this step-by-step protocol? I think we do need to do it >> step-by-step, mostly for simplicity in the back-end, i.e. that it just >> sees a normal device start-up. > Makes sense. > >> We should also clarify whether SET_STATUS can fail, i.e. whether >> setting an invalid status (is setting FEATURES_OK when the device >> doesn’t think so invalid?) has SET_STATUS fail (with F_REPLY_ACK) >> and/or immediately gets the device into DEVICE_NEEDS_RESET. >> >> We should clarify whether SET_STATUS can block. The current use of >> DRIVER_OK seems to indicate to me that dpdk does do time-consuming >> operations when it sees DRIVER_OK (code looks like it, too) and only >> returns when that’s done, but naïvely, I would expect SET_STATUS to be >> just setting some value and doing whatever needs to be done in the >> background, not actually launching and blocking on an operation. > Shouldn't the guest driver be reading the status bit until it flips? So > potentially there could be multiple GET_STATUS calls. Ah, the device will only show DRIVER_OK set once the device is ready to serve the driver? Hanna
On 13-10-23, 20:02, Hanna Czenczek wrote: > On 10.10.23 16:35, Alex Bennée wrote: > > I was going to say there is also the rust-vmm vhost-user-master crates > > which we've imported: > > > > https://github.com/vireshk/vhost > > > > for the Xen Vhost Frontend: > > > > https://github.com/vireshk/xen-vhost-frontend > > > > but I can't actually see any handling for GET/SET_STATUS at all which > > makes me wonder how we actually work. Viresh? > > As far as I know the only back-end implementation of F_STATUS is in DPDK. > As I said, if anyone else implemented it right now, that would be dangerous, > because qemu doesn’t adhere to the virtio protocol when it comes to the > status byte. Yeah, none of the Rust based Virtio backends enable `STATUS` in `VhostUserProtocolFeatures` and so these messages are never exchanged. The generic Rust code for the backends, doesn't even implement them. Not sure if they should or not.
On 17.10.23 09:49, Viresh Kumar wrote: > On 13-10-23, 20:02, Hanna Czenczek wrote: >> On 10.10.23 16:35, Alex Bennée wrote: >>> I was going to say there is also the rust-vmm vhost-user-master crates >>> which we've imported: >>> >>> https://github.com/vireshk/vhost >>> >>> for the Xen Vhost Frontend: >>> >>> https://github.com/vireshk/xen-vhost-frontend >>> >>> but I can't actually see any handling for GET/SET_STATUS at all which >>> makes me wonder how we actually work. Viresh? >> As far as I know the only back-end implementation of F_STATUS is in DPDK. >> As I said, if anyone else implemented it right now, that would be dangerous, >> because qemu doesn’t adhere to the virtio protocol when it comes to the >> status byte. > Yeah, none of the Rust based Virtio backends enable `STATUS` in > `VhostUserProtocolFeatures` and so these messages are never exchanged. > > The generic Rust code for the backends, doesn't even implement them. > Not sure if they should or not. It absolutely should not, for evidence see this whole thread. qemu sends a SET_STATUS 0, which amounts to a reset, when the VM is merely paused[1], and when it sets status bytes, it does not set them according to virtio specification. Implementing it right now means relying on and working around qemu’s implementation-defined spec-breaking behavior. Also, note that qemu ignores feature negotiation response through FEATURES_OK, and DEVICE_NEEDS_RESET, so unless it’s worth working around the problems just to get some form of DRIVER_OK information (note this information does not come from the driver, but qemu makes it up), I absolutely would not implement it. [1] Notably, it does restore the virtio state to the best of its abilities when the VM is resumed, but this is all still wrong (there is no point in doing so much on a pause/resume, it needlessly costs time) and any implementation that does a reset then will rely on the implementation-defined behavior that qemu is actually able to restore all the state that the back-end would lose during a reset. Notably, reset is not even well-defined in the vhost-user specification. It was argued, in this thread, that DPDK works just fine with this, precisely because it ignores SET_STATUS 0. Finally, if virtiofsd in particular, as a user of the Rust crates, is reset, it would lose its internal state, which qemu cannot restore short of using the upcoming migration facilities.
diff --git a/docs/interop/vhost-user.rst b/docs/interop/vhost-user.rst index 5a070adbc1..2f68e67a1a 100644 --- a/docs/interop/vhost-user.rst +++ b/docs/interop/vhost-user.rst @@ -1424,21 +1424,35 @@ Front-end message types :request payload: ``u64`` :reply payload: N/A - When the ``VHOST_USER_PROTOCOL_F_STATUS`` protocol feature has been - successfully negotiated, this message is submitted by the front-end to - notify the back-end with updated device status as defined in the Virtio +.. admonition:: Deprecated + + This is no longer used. Used to be sent by the front-end to notify the + back-end with updated device status as defined in the Virtio specification. + However, its purpose in vhost-user was never well-defined; for + example, how or if it would replace VHOST_USER_RESET_DEVICE, or how it + integrates with the feature negotiation phase. Therefore, + implementations in practice were less than strict in how the status + value was handled, which means there was actually no protocol between + front-end and back-end on the use of the status value. + + For resetting, use VHOST_USER_RESET_DEVICE instead. For feature + negotiation with acknowledgment from the device, use + VHOST_USER_SET_FEATURES with the :ref:`REPLY_ACK <reply_ack>` feature + instead. + ``VHOST_USER_GET_STATUS`` :id: 40 :equivalent ioctl: VHOST_VDPA_GET_STATUS :request payload: N/A :reply payload: ``u64`` - When the ``VHOST_USER_PROTOCOL_F_STATUS`` protocol feature has been - successfully negotiated, this message is submitted by the front-end to - query the back-end for its device status as defined in the Virtio - specification. +.. admonition:: Deprecated + + This is no longer used. Used to be sent by the front-end to query the + back-end for its device status as defined in the Virtio specification. + Deprecated together with VHOST_USER_SET_STATUS. Back-end message types
There is no clearly defined purpose for the virtio status byte in vhost-user: For resetting, we already have RESET_DEVICE; and for virtio feature negotiation, we have [GS]ET_FEATURES. With the REPLY_ACK protocol extension, it is possible for SET_FEATURES to return errors (SET_PROTOCOL_FEATURES may be called before SET_FEATURES). As for implementations, SET_STATUS is not widely implemented. dpdk does implement it, but only uses it to signal feature negotiation failure. While it does log reset requests (SET_STATUS 0) as such, it effectively ignores them, in contrast to RESET_OWNER (which is deprecated, and today means the same thing as RESET_DEVICE). While qemu superficially has support for [GS]ET_STATUS, it does not forward the guest-set status byte, but instead just makes it up internally, and actually completely ignores what the back-end returns, only using it as the template for a subsequent SET_STATUS to add single bits to it. Notably, after setting FEATURES_OK, it never reads it back to see whether the flag is still set, which is the only way in which dpdk uses the status byte. As-is, no front-end or back-end can rely on the other side handling this field in a useful manner, and it also provides no practical use over other mechanisms the vhost-user protocol has, which are more clearly defined. Deprecate it. Suggested-by: Stefan Hajnoczi <stefanha@redhat.com> Signed-off-by: Hanna Czenczek <hreitz@redhat.com> --- docs/interop/vhost-user.rst | 28 +++++++++++++++++++++------- 1 file changed, 21 insertions(+), 7 deletions(-)