Message ID | 20231004014532.1228637-2-stefanha@redhat.com (mailing list archive) |
---|---|
State | New, archived |
Headers | show |
Series | vhost: clean up device reset | expand |
On 04.10.23 03:45, Stefan Hajnoczi wrote: > The VHOST_USER_RESET_OWNER message is deprecated in the spec: > > This is no longer used. Used to be sent to request disabling all > rings, but some back-ends interpreted it to also discard connection > state (this interpretation would lead to bugs). It is recommended > that back-ends either ignore this message, or use it to disable all > rings. According to the spec, it is then indeed better to not call it in vhost_user_reset_device, because it seems like it would be interpreted as something completely different. However, between the three back-end implementations of vhost-user I know of (libvhost-user, DPDK, the vhost crates; four if you count RSD), none implement RESET_DEVICE. libvhost-user and DPDK do implement RESET_OWNER, though, and they both do it by resetting the device, not by disabling any vring. The vhost crate also implements RESET_OWNER, but it doesn’t do anything but forward it as such to the actual device implementation (virtiofsd doesn’t implement this function, so ignores it). It does document that it would disable all vrings, but does so in the past and has marked it deprecated (ever since the method was introduced in the fourth commit to the repository, making it extremely unlikely that anyone would implement it). So I would like to know why the spec says that it would disable all vrings, when none of the implementations (qemu, libvhost-user, DPDK) agree on that. Let me look it up: Before commit c61f09ed855, it did say “stopping” instead of “disabling”. The commit doesn’t explain why it changed this. Commit a586e65bbd0 (just a week prior) deprecated the command, changing it from “connection is about to be closed, [front-end] will no longer own this connection” to “deprecated, used to be sent to request stopping all vrings”. To me, the front-end closing the connection sounds like a good point to reset, which would indeed stop all vrings, but not just that. Notably, qemu agrees, because RESET_OWNER is used only in the vhost_user_reset_device() function. a586e65bbd0^ removed that function’s use, though, specifically because it would cause a reset, when the intention was just to stop. So it sounds to me like “used to be sent to request stopping all vrings” is rather what vhost_net wanted, but specifically not what the message did, which was anything between nothing and a reset, I presume (because it never specified what the back-end was supposed to do, though apparently libvhost-user and DPDK both took it to mean reset). Why it was then changed to “disabling”, I absolutely cannot say. Now, the code change here is indeed effectively a no-op, as you deduce below, but in the context of the whole series the situation is a bit different: As far as I understand, the point is to have guest-initiated resets be forwarded to back-ends. But by removing the RESET_OWNER fallback, no back-end will actually do a reset still. I understand that as per the specification, using RESET_OWNER for resetting is wrong. But all implementations that implemented it before it was deprecated do interpret it as a reset, so I don’t think using it as a fallback is actually wrong. Hanna > The only caller of vhost_user_reset_device() is vhost_user_scsi_reset(). > It checks that F_RESET_DEVICE was negotiated before calling it: > > static void vhost_user_scsi_reset(VirtIODevice *vdev) > { > VHostSCSICommon *vsc = VHOST_SCSI_COMMON(vdev); > struct vhost_dev *dev = &vsc->dev; > > /* > * Historically, reset was not implemented so only reset devices > * that are expecting it. > */ > if (!virtio_has_feature(dev->protocol_features, > VHOST_USER_PROTOCOL_F_RESET_DEVICE)) { > return; > } > > if (dev->vhost_ops->vhost_reset_device) { > dev->vhost_ops->vhost_reset_device(dev); > } > } > > Therefore VHOST_USER_RESET_OWNER is actually never sent by > vhost_user_reset_device(). Remove the dead code. This effectively moves > the vhost-user protocol specific code from vhost-user-scsi.c into > vhost-user.c where it belongs. > > Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com> > --- > hw/scsi/vhost-user-scsi.c | 9 --------- > hw/virtio/vhost-user.c | 13 +++++++++---- > 2 files changed, 9 insertions(+), 13 deletions(-) > > diff --git a/hw/scsi/vhost-user-scsi.c b/hw/scsi/vhost-user-scsi.c > index ee99b19e7a..8582b2e8ab 100644 > --- a/hw/scsi/vhost-user-scsi.c > +++ b/hw/scsi/vhost-user-scsi.c > @@ -71,15 +71,6 @@ static void vhost_user_scsi_reset(VirtIODevice *vdev) > VHostSCSICommon *vsc = VHOST_SCSI_COMMON(vdev); > struct vhost_dev *dev = &vsc->dev; > > - /* > - * Historically, reset was not implemented so only reset devices > - * that are expecting it. > - */ > - if (!virtio_has_feature(dev->protocol_features, > - VHOST_USER_PROTOCOL_F_RESET_DEVICE)) { > - return; > - } > - > if (dev->vhost_ops->vhost_reset_device) { > dev->vhost_ops->vhost_reset_device(dev); > } > diff --git a/hw/virtio/vhost-user.c b/hw/virtio/vhost-user.c > index 8dcf049d42..7bed9ad7d5 100644 > --- a/hw/virtio/vhost-user.c > +++ b/hw/virtio/vhost-user.c > @@ -1492,12 +1492,17 @@ static int vhost_user_reset_device(struct vhost_dev *dev) > { > VhostUserMsg msg = { > .hdr.flags = VHOST_USER_VERSION, > + .hdr.request = VHOST_USER_RESET_DEVICE, > }; > > - msg.hdr.request = virtio_has_feature(dev->protocol_features, > - VHOST_USER_PROTOCOL_F_RESET_DEVICE) > - ? VHOST_USER_RESET_DEVICE > - : VHOST_USER_RESET_OWNER; > + /* > + * Historically, reset was not implemented so only reset devices > + * that are expecting it. > + */ > + if (!virtio_has_feature(dev->protocol_features, > + VHOST_USER_PROTOCOL_F_RESET_DEVICE)) { > + return -ENOSYS; > + } > > return vhost_user_write(dev, &msg, NULL, 0); > }
On Wed, 4 Oct 2023 at 06:44, Hanna Czenczek <hreitz@redhat.com> wrote: > > On 04.10.23 03:45, Stefan Hajnoczi wrote: > > The VHOST_USER_RESET_OWNER message is deprecated in the spec: > > > > This is no longer used. Used to be sent to request disabling all > > rings, but some back-ends interpreted it to also discard connection > > state (this interpretation would lead to bugs). It is recommended > > that back-ends either ignore this message, or use it to disable all > > rings. > > According to the spec, it is then indeed better to not call it in > vhost_user_reset_device, because it seems like it would be interpreted > as something completely different. > > However, between the three back-end implementations of vhost-user I know > of (libvhost-user, DPDK, the vhost crates; four if you count RSD), none > implement RESET_DEVICE. libvhost-user and DPDK do implement > RESET_OWNER, though, and they both do it by resetting the device, not by > disabling any vring. The vhost crate also implements RESET_OWNER, but > it doesn’t do anything but forward it as such to the actual device > implementation (virtiofsd doesn’t implement this function, so ignores > it). It does document that it would disable all vrings, but does so in > the past and has marked it deprecated (ever since the method was > introduced in the fourth commit to the repository, making it extremely > unlikely that anyone would implement it). Hi Hanna, vhost-user-backend still seems to reset all vhost-user protocol state, making RESET_OWNER unusable: https://github.com/rust-vmm/vhost/blob/main/crates/vhost-user-backend/src/handler.rs#L230 Have I missed something? Stefan > > So I would like to know why the spec says that it would disable all > vrings, when none of the implementations (qemu, libvhost-user, DPDK) > agree on that. Let me look it up: > > Before commit c61f09ed855, it did say “stopping” instead of > “disabling”. The commit doesn’t explain why it changed this. Commit > a586e65bbd0 (just a week prior) deprecated the command, changing it from > “connection is about to be closed, [front-end] will no longer own this > connection” to “deprecated, used to be sent to request stopping all > vrings”. To me, the front-end closing the connection sounds like a good > point to reset, which would indeed stop all vrings, but not just that. > Notably, qemu agrees, because RESET_OWNER is used only in the > vhost_user_reset_device() function. a586e65bbd0^ removed that function’s > use, though, specifically because it would cause a reset, when the > intention was just to stop. > > So it sounds to me like “used to be sent to request stopping all vrings” > is rather what vhost_net wanted, but specifically not what the message > did, which was anything between nothing and a reset, I presume (because > it never specified what the back-end was supposed to do, though > apparently libvhost-user and DPDK both took it to mean reset). Why it > was then changed to “disabling”, I absolutely cannot say. > > Now, the code change here is indeed effectively a no-op, as you deduce > below, but in the context of the whole series the situation is a bit > different: As far as I understand, the point is to have guest-initiated > resets be forwarded to back-ends. But by removing the RESET_OWNER > fallback, no back-end will actually do a reset still. > > I understand that as per the specification, using RESET_OWNER for > resetting is wrong. But all implementations that implemented it before > it was deprecated do interpret it as a reset, so I don’t think using it > as a fallback is actually wrong. > > Hanna > > > The only caller of vhost_user_reset_device() is vhost_user_scsi_reset(). > > It checks that F_RESET_DEVICE was negotiated before calling it: > > > > static void vhost_user_scsi_reset(VirtIODevice *vdev) > > { > > VHostSCSICommon *vsc = VHOST_SCSI_COMMON(vdev); > > struct vhost_dev *dev = &vsc->dev; > > > > /* > > * Historically, reset was not implemented so only reset devices > > * that are expecting it. > > */ > > if (!virtio_has_feature(dev->protocol_features, > > VHOST_USER_PROTOCOL_F_RESET_DEVICE)) { > > return; > > } > > > > if (dev->vhost_ops->vhost_reset_device) { > > dev->vhost_ops->vhost_reset_device(dev); > > } > > } > > > > Therefore VHOST_USER_RESET_OWNER is actually never sent by > > vhost_user_reset_device(). Remove the dead code. This effectively moves > > the vhost-user protocol specific code from vhost-user-scsi.c into > > vhost-user.c where it belongs. > > > > Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com> > > --- > > hw/scsi/vhost-user-scsi.c | 9 --------- > > hw/virtio/vhost-user.c | 13 +++++++++---- > > 2 files changed, 9 insertions(+), 13 deletions(-) > > > > diff --git a/hw/scsi/vhost-user-scsi.c b/hw/scsi/vhost-user-scsi.c > > index ee99b19e7a..8582b2e8ab 100644 > > --- a/hw/scsi/vhost-user-scsi.c > > +++ b/hw/scsi/vhost-user-scsi.c > > @@ -71,15 +71,6 @@ static void vhost_user_scsi_reset(VirtIODevice *vdev) > > VHostSCSICommon *vsc = VHOST_SCSI_COMMON(vdev); > > struct vhost_dev *dev = &vsc->dev; > > > > - /* > > - * Historically, reset was not implemented so only reset devices > > - * that are expecting it. > > - */ > > - if (!virtio_has_feature(dev->protocol_features, > > - VHOST_USER_PROTOCOL_F_RESET_DEVICE)) { > > - return; > > - } > > - > > if (dev->vhost_ops->vhost_reset_device) { > > dev->vhost_ops->vhost_reset_device(dev); > > } > > diff --git a/hw/virtio/vhost-user.c b/hw/virtio/vhost-user.c > > index 8dcf049d42..7bed9ad7d5 100644 > > --- a/hw/virtio/vhost-user.c > > +++ b/hw/virtio/vhost-user.c > > @@ -1492,12 +1492,17 @@ static int vhost_user_reset_device(struct vhost_dev *dev) > > { > > VhostUserMsg msg = { > > .hdr.flags = VHOST_USER_VERSION, > > + .hdr.request = VHOST_USER_RESET_DEVICE, > > }; > > > > - msg.hdr.request = virtio_has_feature(dev->protocol_features, > > - VHOST_USER_PROTOCOL_F_RESET_DEVICE) > > - ? VHOST_USER_RESET_DEVICE > > - : VHOST_USER_RESET_OWNER; > > + /* > > + * Historically, reset was not implemented so only reset devices > > + * that are expecting it. > > + */ > > + if (!virtio_has_feature(dev->protocol_features, > > + VHOST_USER_PROTOCOL_F_RESET_DEVICE)) { > > + return -ENOSYS; > > + } > > > > return vhost_user_write(dev, &msg, NULL, 0); > > } > >
On 04.10.23 13:15, Stefan Hajnoczi wrote: > On Wed, 4 Oct 2023 at 06:44, Hanna Czenczek <hreitz@redhat.com> wrote: >> On 04.10.23 03:45, Stefan Hajnoczi wrote: >>> The VHOST_USER_RESET_OWNER message is deprecated in the spec: >>> >>> This is no longer used. Used to be sent to request disabling all >>> rings, but some back-ends interpreted it to also discard connection >>> state (this interpretation would lead to bugs). It is recommended >>> that back-ends either ignore this message, or use it to disable all >>> rings. >> According to the spec, it is then indeed better to not call it in >> vhost_user_reset_device, because it seems like it would be interpreted >> as something completely different. >> >> However, between the three back-end implementations of vhost-user I know >> of (libvhost-user, DPDK, the vhost crates; four if you count RSD), none >> implement RESET_DEVICE. libvhost-user and DPDK do implement >> RESET_OWNER, though, and they both do it by resetting the device, not by >> disabling any vring. The vhost crate also implements RESET_OWNER, but >> it doesn’t do anything but forward it as such to the actual device >> implementation (virtiofsd doesn’t implement this function, so ignores >> it). It does document that it would disable all vrings, but does so in >> the past and has marked it deprecated (ever since the method was >> introduced in the fourth commit to the repository, making it extremely >> unlikely that anyone would implement it). > Hi Hanna, > vhost-user-backend still seems to reset all vhost-user protocol state, > making RESET_OWNER unusable: > https://github.com/rust-vmm/vhost/blob/main/crates/vhost-user-backend/src/handler.rs#L230 > > Have I missed something? No, I missed that. I overlooked that when grepping for reset_owner. This implementation kind of does follow the original pre-2015 description of RESET_OWNER, but I can’t believe this code originated from pre-2015, which makes it really weird. It’s strange that in a commit from April 2019 the vhost crate marked the function as (“no longer used”), and then it was implemented in September of 2019, notably as something completely different than “Used to be sent to request disabling all rings”. Another thing I noticed is that while libvhost-user does call the function vu_reset_device_exec(), what it does is indeed disable all vrings instead of resetting anything, i.e. what the spec says (and what I didn’t think anyone did). DPDK interestingly does do a reset, and this includes clearing protocol_features (in reset_device()). So two things: First, I’d prefer (slightly) for the commit message to mention that while RESET_OWNER would not be usable for a reset if everything were according to spec, the main problem to me seems that RESET_OWNER was never clearly defined before being deprecated, so different implementations do different things regardless of what the spec says now, which means it’s basically burned and no front-end may ever issue it at all lest it wants to get the back-end into an implementation-defined state. Second, I wonder what exactly you mean by saying that RESET_OWNER resetting all vhost-user protocol state were to make the command unusable for resetting. The vhost-user-backend implementation doesn’t clear all state, but resets only three things: The @owned field (which I don’t think is really used outside of vhost/src/vhost_user/dummy_slave.rs (this name is begging for a conscious language overhaul...), which appears not really important?), the virtio features (this I would expect any device reset to do), and the vhost-user protocol features. Now, yes, clearing the vhost-user protocol features doesn’t seem like something that should be done just because of a device reset. However, note that DPDK’s reset_device() does this, too. What I’m getting at is that we don’t have a clear definition of what RESET_DEVICE is supposed to do either, i.e. it doesn’t explicitly say that protocol features are not to be reset. Sure, that may be obvious, but we should clarify this so that if e.g. DPDK is to implement RESET_DEVICE, they will take care not use its reset_device() implementation, which does reset protocol_features. (Also, now that I look at RESET_DEVICE – why does it say that it disables all vrings? (Has done so since its introduction.) Is this something that qemu expects and will handle, i.e. that after a guest-initiated reset, the rings are enabled when the guest wants to use the device again? Also, it doesn’t say the rings are stopped, so basically, as it is *right now* (where we only have three ring states, stopped, started+disabled, started+enabled), disabling vrings implicitly means they must still be started, because they can’t be disabled when stopped. I’m going to change that, but as it reads right now, RESET_DEVICE does not seem like the reset we want. We should really get to fix that, too, before back-ends start to actually implement it.) Hanna
diff --git a/hw/scsi/vhost-user-scsi.c b/hw/scsi/vhost-user-scsi.c index ee99b19e7a..8582b2e8ab 100644 --- a/hw/scsi/vhost-user-scsi.c +++ b/hw/scsi/vhost-user-scsi.c @@ -71,15 +71,6 @@ static void vhost_user_scsi_reset(VirtIODevice *vdev) VHostSCSICommon *vsc = VHOST_SCSI_COMMON(vdev); struct vhost_dev *dev = &vsc->dev; - /* - * Historically, reset was not implemented so only reset devices - * that are expecting it. - */ - if (!virtio_has_feature(dev->protocol_features, - VHOST_USER_PROTOCOL_F_RESET_DEVICE)) { - return; - } - if (dev->vhost_ops->vhost_reset_device) { dev->vhost_ops->vhost_reset_device(dev); } diff --git a/hw/virtio/vhost-user.c b/hw/virtio/vhost-user.c index 8dcf049d42..7bed9ad7d5 100644 --- a/hw/virtio/vhost-user.c +++ b/hw/virtio/vhost-user.c @@ -1492,12 +1492,17 @@ static int vhost_user_reset_device(struct vhost_dev *dev) { VhostUserMsg msg = { .hdr.flags = VHOST_USER_VERSION, + .hdr.request = VHOST_USER_RESET_DEVICE, }; - msg.hdr.request = virtio_has_feature(dev->protocol_features, - VHOST_USER_PROTOCOL_F_RESET_DEVICE) - ? VHOST_USER_RESET_DEVICE - : VHOST_USER_RESET_OWNER; + /* + * Historically, reset was not implemented so only reset devices + * that are expecting it. + */ + if (!virtio_has_feature(dev->protocol_features, + VHOST_USER_PROTOCOL_F_RESET_DEVICE)) { + return -ENOSYS; + } return vhost_user_write(dev, &msg, NULL, 0); }
The VHOST_USER_RESET_OWNER message is deprecated in the spec: This is no longer used. Used to be sent to request disabling all rings, but some back-ends interpreted it to also discard connection state (this interpretation would lead to bugs). It is recommended that back-ends either ignore this message, or use it to disable all rings. The only caller of vhost_user_reset_device() is vhost_user_scsi_reset(). It checks that F_RESET_DEVICE was negotiated before calling it: static void vhost_user_scsi_reset(VirtIODevice *vdev) { VHostSCSICommon *vsc = VHOST_SCSI_COMMON(vdev); struct vhost_dev *dev = &vsc->dev; /* * Historically, reset was not implemented so only reset devices * that are expecting it. */ if (!virtio_has_feature(dev->protocol_features, VHOST_USER_PROTOCOL_F_RESET_DEVICE)) { return; } if (dev->vhost_ops->vhost_reset_device) { dev->vhost_ops->vhost_reset_device(dev); } } Therefore VHOST_USER_RESET_OWNER is actually never sent by vhost_user_reset_device(). Remove the dead code. This effectively moves the vhost-user protocol specific code from vhost-user-scsi.c into vhost-user.c where it belongs. Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com> --- hw/scsi/vhost-user-scsi.c | 9 --------- hw/virtio/vhost-user.c | 13 +++++++++---- 2 files changed, 9 insertions(+), 13 deletions(-)