Message ID | 1346917610-14568-1-git-send-email-pbonzini@redhat.com (mailing list archive) |
---|---|
State | New, archived |
Headers | show |
On Thu, Sep 06, 2012 at 09:46:50AM +0200, Paolo Bonzini wrote: > VIRTIO_BALLOON_F_MUST_TELL_HOST cannot be used properly because it is a > "negative" feature: it tells you that silent defalte is not supported. > Right now, QEMU refuses migration if the target does not support all the > features that were negotiated. But then: > > - a migration from non-MUST_TELL_HOST to MUST_TELL_HOST will succeed, > which is wrong; > > - a migration from MUST_TELL_HOST to non-MUST_TELL_HOST will fail, which > is useless. > > Add instead a new feature VIRTIO_BALLOON_F_SILENT_DEFLATE, and deprecate > VIRTIO_BALLOON_F_MUST_TELL_HOST since it is never actually used. > > Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Frankly I think it's a qemu migration bug. I don't see why we need to tweak spec: just teach qemu to be smarter during migration. Can you show a scenario with old driver/new hypervisor or new driver/old hypervisor that fails? > --- > virtio-spec.lyx | 36 +++++++++++++++++++++++++++++++++--- > 1 file modificato, 33 inserzioni(+), 3 rimozioni(-) > > diff --git a/virtio-spec.lyx b/virtio-spec.lyx > index 7a073f4..1a25a18 100644 > --- a/virtio-spec.lyx > +++ b/virtio-spec.lyx > @@ -6238,6 +6238,8 @@ bits > > \begin_deeper > \begin_layout Description > + > +\change_deleted 1531152142 1346917221 > VIRTIO_BALLOON_F_MUST_TELL_HOST > \begin_inset space ~ > \end_inset > @@ -6251,6 +6253,20 @@ VIRTIO_BALLOON_F_STATS_VQ > \end_inset > > (1) A virtqueue for reporting guest memory statistics is present. > +\change_inserted 1531152142 1346917193 > + > +\end_layout > + > +\begin_layout Description > + > +\change_inserted 1531152142 1346917219 > +VIRTIO_BALLOON_F_SILENT_DEFLATE > +\begin_inset space ~ > +\end_inset > + > +(2) Host does not need to be told before pages from the balloon are used. > +\change_unchanged > + > \end_layout > > \end_deeper > @@ -6401,9 +6417,23 @@ The driver constructs an array of addresses of memory pages it has previously > \end_layout > > \begin_layout Enumerate > -If the VIRTIO_BALLOON_F_MUST_TELL_HOST feature is set, the guest may not > - use these requested pages until that descriptor in the deflateq has been > - used by the device. > +If the VIRTIO_BALLOON_F_ > +\change_deleted 1531152142 1346917234 > +MUST_TELL_HOST > +\change_inserted 1531152142 1346917237 > +SILENT_DEFLATE > +\change_unchanged > + feature is > +\change_inserted 1531152142 1346917241 > +not > +\change_unchanged > +set, the guest may not use these requested pages until that descriptor in > + the deflateq has been used by the device. > + > +\change_inserted 1531152142 1346917253 > + If it is set, the guest may choose to not use the deflateq at all. > +\change_unchanged > + > \end_layout > > \begin_layout Enumerate > -- > 1.7.11.2 -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Il 06/09/2012 10:47, Michael S. Tsirkin ha scritto: >> > - a migration from non-MUST_TELL_HOST to MUST_TELL_HOST will succeed, >> > which is wrong; >> > >> > - a migration from MUST_TELL_HOST to non-MUST_TELL_HOST will fail, which >> > is useless. >> > >> > Add instead a new feature VIRTIO_BALLOON_F_SILENT_DEFLATE, and deprecate >> > VIRTIO_BALLOON_F_MUST_TELL_HOST since it is never actually used. >> > >> > Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> > Frankly I think it's a qemu migration bug. I don't see why > we need to tweak spec: just teach qemu to be smarter > during migration. Of course you can just teach QEMU to be smarter, but that would be a one-off hack for the only ill-defined feature that says something is _not_ supported. Since in practice all virtio_balloon-enbled hypervisors supported silent deflate (so the bit was always zero), and no client used it (so it was never checked), it's easier to just reverse the direction. In fact, it's not clear how the driver should use the feature. My guess is that, if it wants to use silent deflate, it tries to negotiate VIRTIO_BALLOON_F_MUST_TELL_HOST, and can use silent deflate if negotiation fails. This is against the logic of all other features. > Can you show a scenario with old driver/new hypervisor or > new driver/old hypervisor that fails? Suppose the driver tried to negotiate the feature as above. We then have the two scenarios above. In the harmless but annoying scenario, the source hypervisor doesn't support silent deflate, so VIRTIO_BALLOON_F_MUST_TELL_HOST has been negotiated successfully. The destination hypervisor supports silent deflate, so it does _not_ include the feature. It sees that the guest requests VIRTIO_BALLOON_F_MUST_TELL_HOST, and fails migration. In the incorrect scenario, you are migrating to an older hypervisor. The source hypervisor is newer and supports silent deflate, so VIRTIO_BALLOON_F_MUST_TELL_HOST was _not_ negotiated. The destination hypervisor does not supports silent deflate. However, the guest is not requesting VIRTIO_BALLOON_F_MUST_TELL_HOST, and migration succeeds. Next time the guest tries to do use a page from the balloon, badness happens, because the hypervisor does not expect it. Paolo -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On Thu, Sep 06, 2012 at 11:24:02AM +0200, Paolo Bonzini wrote: > Il 06/09/2012 10:47, Michael S. Tsirkin ha scritto: > >> > - a migration from non-MUST_TELL_HOST to MUST_TELL_HOST will succeed, > >> > which is wrong; > >> > > >> > - a migration from MUST_TELL_HOST to non-MUST_TELL_HOST will fail, which > >> > is useless. > >> > > >> > Add instead a new feature VIRTIO_BALLOON_F_SILENT_DEFLATE, and deprecate > >> > VIRTIO_BALLOON_F_MUST_TELL_HOST since it is never actually used. > >> > > >> > Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> > > Frankly I think it's a qemu migration bug. I don't see why > > we need to tweak spec: just teach qemu to be smarter > > during migration. > > Of course you can just teach QEMU to be smarter, but that would be a > one-off hack for the only ill-defined feature that says something is > _not_ supported. Since in practice all virtio_balloon-enbled > hypervisors supported silent deflate (so the bit was always zero), and > no client used it (so it was never checked), it's easier to just reverse > the direction. > > In fact, it's not clear how the driver should use the feature. My guess > is that, if it wants to use silent deflate, it tries to negotiate > VIRTIO_BALLOON_F_MUST_TELL_HOST, and can use silent deflate if > negotiation fails. This is against the logic of all other features. Let's take a step back from the implementation details. You are trying to add a new feature bit, after all. Why? Why is silent deflate useful? This is what is missing in all this discussion. If it is not useful we do not need a bit for it. > > Can you show a scenario with old driver/new hypervisor or > > new driver/old hypervisor that fails? > > Suppose the driver tried to negotiate the feature as above. We then > have the two scenarios above. > > In the harmless but annoying scenario, the source hypervisor doesn't > support silent deflate, so VIRTIO_BALLOON_F_MUST_TELL_HOST has been > negotiated successfully. The destination hypervisor supports silent > deflate, so it does _not_ include the feature. It sees that the guest > requests VIRTIO_BALLOON_F_MUST_TELL_HOST, and fails migration. > > In the incorrect scenario, you are migrating to an older hypervisor. > The source hypervisor is newer and supports silent deflate, so > VIRTIO_BALLOON_F_MUST_TELL_HOST was _not_ negotiated. The destination > hypervisor does not supports silent deflate. However, the guest is not > requesting VIRTIO_BALLOON_F_MUST_TELL_HOST, and migration succeeds. > Next time the guest tries to do use a page from the balloon, badness > happens, because the hypervisor does not expect it. > > Paolo Sorry this is not the example I asked for. Please give and example without migration. Migration is qemu's problem: it is hypervisor's job to make sure guest sees no change during migration. It should be able to do this with any hardware it emulates, there should be no need to change hardware to make it "migrateable" somehow.
Il 06/09/2012 11:44, Michael S. Tsirkin ha scritto: >> In fact, it's not clear how the driver should use the feature. My guess >> is that, if it wants to use silent deflate, it tries to negotiate >> VIRTIO_BALLOON_F_MUST_TELL_HOST, and can use silent deflate if >> negotiation fails. This is against the logic of all other features. > > Let's take a step back from the implementation details. > You are trying to add a new feature bit, after all. > Why? Why is silent deflate useful? This is what is > missing in all this discussion. If it is not useful > we do not need a bit for it. It is useful because it lets guests inflate the balloon aggressively, and then use ballooned-out pages even in places where the guest OS cannot sleep, such as kmalloc(GFP_ATOMIC). >>> Can you show a scenario with old driver/new hypervisor or >>> new driver/old hypervisor that fails? > > Sorry this is not the example I asked for. Please give and example > without migration. > > Migration is qemu's problem: it is hypervisor's job to > make sure guest sees no change during migration. Quoting my message: "Of course you can just teach QEMU to be smarter, but that would be a one-off hack for the only ill-defined feature that says something is _not_ supported". Currently migration works the same way for all virtio devices, and assumes that features are defined only in the "positive" direction: drivers request features if they want to use it, devices provide features to say they support something. Instead, in the case of this feature, the driver requests it before relying on its lack (which is odd); the device provides if they do not support something (which is wrong). You can see that this just cannot provide backwards-compatibility in the device; it happens to work only because the feature was there in the first version of the spec. > It should be able to do this with any hardware it emulates, > there should be no need to change hardware to make it > "migrateable" somehow. Of course, but if we can fix the hardware with no bad effects, let's do that instead. Paolo -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On Thu, Sep 06, 2012 at 11:57:22AM +0200, Paolo Bonzini wrote: > Il 06/09/2012 11:44, Michael S. Tsirkin ha scritto: > >> In fact, it's not clear how the driver should use the feature. My guess > >> is that, if it wants to use silent deflate, it tries to negotiate > >> VIRTIO_BALLOON_F_MUST_TELL_HOST, and can use silent deflate if > >> negotiation fails. This is against the logic of all other features. > > > > Let's take a step back from the implementation details. > > You are trying to add a new feature bit, after all. > > Why? Why is silent deflate useful? This is what is > > missing in all this discussion. If it is not useful > > we do not need a bit for it. > > It is useful because it lets guests inflate the balloon aggressively, > and then use ballooned-out pages even in places where the guest OS > cannot sleep, such as kmalloc(GFP_ATOMIC). Interesting. Do you intend to develop a driver patch using this? I'd like to see how that works. Because if not, IMO it's best to wait until someone asks for it. > >>> Can you show a scenario with old driver/new hypervisor or > >>> new driver/old hypervisor that fails? > > > > Sorry this is not the example I asked for. Please give and example > > without migration. > > > > Migration is qemu's problem: it is hypervisor's job to > > make sure guest sees no change during migration. > > Quoting my message: "Of course you can just teach QEMU to be smarter, > but that would be a one-off hack for the only ill-defined feature that > says something is _not_ supported". > > Currently migration works the same way for all virtio devices, > and > assumes that features are defined only in the "positive" direction: > drivers request features if they want to use it, devices provide > features to say they support something. Well this approach is buggy. If I reread features after migration what do I see? Something changed right? So this is a bug. Migration should not change hardware. And it is not a "one off" thing it is fundamental for any hardware. Fix that in qemu, and the problem goes away without spec changes. > Instead, in the case of this feature, the driver requests it before > relying on its lack (which is odd); Which code in driver do you refer to? > the device provides if they do not > support something (which is wrong). Not support? It just seems to be asking guest to tell it about deflates. If guest acks the bit, we know it will. If it does not, it will not. > You can see that this just cannot > provide backwards-compatibility in the device; Sorry I do not understand this meta argument. There should be an example where a driver and device fail to work together. And without migration: as I showed migration is simply broken atm for an unrelated reason. Otherwise all's well. > it happens to work only > because the feature was there in the first version of the spec. This is how we do compatiblity in virtio. If we want driver to do something, we add a feature and it can ack, if it does we know it will do what we want. Another example is network announce bit. If driver acks it, we know we do not need to send gratitious arp from qemu. You are saying it is also broken? > > It should be able to do this with any hardware it emulates, > > there should be no need to change hardware to make it > > "migrateable" somehow. > > Of course, but if we can fix the hardware with no bad effects, let's do > that instead. > > Paolo Don't fix what is not broken. We get to carry compatibility in both driver and host for a long time for each feature. Note: adding new features adds zero value in this respect - it will not allow simplifying the hypervisor.
Il 06/09/2012 12:53, Michael S. Tsirkin ha scritto: >> It is useful because it lets guests inflate the balloon aggressively, >> and then use ballooned-out pages even in places where the guest OS >> cannot sleep, such as kmalloc(GFP_ATOMIC). > > Interesting. > Do you intend to develop a driver patch using this? I'd like to see how > that works. Because if not, IMO it's best to wait until someone asks > for it. It's been two months, but Frank Swiderski's patch that triggered the debate is exactly that (http://permalink.gmane.org/gmane.linux.kernel/1318984). However, he didn't check VIRTIO_BALLOON_F_MUST_TELL_HOST, so he has a bug there. >> Currently migration works the same way for all virtio devices, >> and assumes that features are defined only in the "positive" direction: >> drivers request features if they want to use it, devices provide >> features to say they support something. > > Well this approach is buggy. If I reread features after migration what > do I see? Something changed right? So this is a bug. Migration should > not change hardware. Exactly, virtio migration currently fails if it would change hardware due to features not supported in the destination. Except for VIRTIO_BALLOON_F_MUST_TELL_HOST, where it does not fail because it is defined in the wrong direction. > Fix that in qemu, and the problem goes away without spec changes. That would be a one-off hack, for the sole feature that was defined wrong. >> Instead, in the case of this feature, the driver requests it before >> relying on its lack (which is odd); > > Which code in driver do you refer to? I'm talking of the code Frank should have put in the driver, but he didn't (so he has a bug). Something like if (virtio_has_feature(vb->vdev, VIRTIO_BALLOON_F_MUST_TELL_HOST)) return -ENODEV; So it has to request the feature, and then fail if the feature is present. That's quite backwards. Everywhere else you'll find if (!virtio_has_feature(vblk->vdev, VIRTIO_BLK_F_SCSI)) return -ENOTTY; BUG_ON(!virtio_has_feature(vblk->vdev, VIRTIO_BLK_F_CONFIG_WCE)); if (virtio_has_feature(vblk->vdev, VIRTIO_NET_F_CTRL_VQ)) { /* do cool stuff */ } etc. >> the device provides if they do not >> support something (which is wrong). > > Not support? It just seems to be asking guest to tell it about deflates. > If guest acks the bit, we know it will. If it does not, > it will not. No, it is the other way round. The host ultimately decides what features are negotiated, so it doesn't ask anything to the guest. The _guest_ is asking the host about the need for explicit deflate. >> You can see that this just cannot >> provide backwards-compatibility in the device; > > Sorry I do not understand this meta argument. > There should be an example where a driver and device > fail to work together. There's nothing that you cannot work around. Use virtio_has_feature in the device, invert the migration feature check in the driver. Why not just _get it right_ instead? >> it happens to work only >> because the feature was there in the first version of the spec. > > This is how we do compatiblity in virtio. If we want driver to do > something, we add a feature and it can ack, if it does we know it will > do what we want. Another example is network announce bit. If driver > acks it, we know we do not need to send gratitious arp from qemu. You > are saying it is also broken? No, it's not broken. A reverse feature, let's call it like VIRTIO_NET_F_HOST_MUST_SEND_GARP, would be broken. VIRTIO_NET_F_GUEST_ANNOUNCE is a "positive" feature: if set, the host _can_ ask the guest to send a gARP, but it may also send it itself. Similarly if VIRTIO_BALLOON_F_SILENT_DEFLATE is set, the guest _can_ use ballooned pages directly, but may also deflate them explicitly. Instead, VIRTIO_NET_F_HOST_MUST_SEND_GARP would be a "negative" feature: if set, the host _may not_ rely on the guest to send a gARP. Similarly if VIRTIO_BALLOON_F_MUST_TELL_HOST is set, the guest _may not_ use ballooned pages directly. There are _no_ other negative features besides VIRTIO_BALLOON_F_MUST_TELL_HOST in the spec, and for a good reason---because they're broken. (Hmm, actually we have one, VIRTIO_BLK_F_RO. It is also a bit broken, but it is not so important because it depends on user input more than hypervisor version). Reasoning on migration is just another way to see if the feature is positive. During migration, new features available on the destination can always be masked. But if removing the feature _adds_ a capability to the hardware, it's wrong. > Don't fix what is not broken. We get to carry compatibility > in both driver and host for a long time for each feature. Sure, but better fix broken things _before_ somebody uses them. I'm not proposing to replace VIRTIO_BLK_F_RO with VIRTIO_BLK_F_RW because it's in wide use and it would pose compatibility problems indeed. But since VIRTIO_BALLOON_F_MUST_TELL_HOST does not exist in any source code, neither driver nor hypervisor, we get lucky and we can instantly deprecate it. > Note: adding new features adds zero value in this respect - it will not > allow simplifying the hypervisor. Indeed, it will add one line of code to the hypervisor to advertise the new feature. Paolo -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On Thu, Sep 06, 2012 at 02:13:14PM +0200, Paolo Bonzini wrote: > Il 06/09/2012 12:53, Michael S. Tsirkin ha scritto: > >> It is useful because it lets guests inflate the balloon aggressively, > >> and then use ballooned-out pages even in places where the guest OS > >> cannot sleep, such as kmalloc(GFP_ATOMIC). > > > > Interesting. > > Do you intend to develop a driver patch using this? I'd like to see how > > that works. Because if not, IMO it's best to wait until someone asks > > for it. > > It's been two months, but Frank Swiderski's patch that triggered the > debate is exactly that > (http://permalink.gmane.org/gmane.linux.kernel/1318984). However, he > didn't check VIRTIO_BALLOON_F_MUST_TELL_HOST, so he has a bug there. He is using a sepate device ID though, so we do not need a feature bit. > >> Currently migration works the same way for all virtio devices, > >> and assumes that features are defined only in the "positive" direction: > >> drivers request features if they want to use it, devices provide > >> features to say they support something. > > > > Well this approach is buggy. If I reread features after migration what > > do I see? Something changed right? So this is a bug. Migration should > > not change hardware. > > Exactly, virtio migration currently fails if it would change hardware > due to features not supported in the destination. Except for > VIRTIO_BALLOON_F_MUST_TELL_HOST, where it does not fail because it is > defined in the wrong direction. There is nothing wrong with the direction that I can see. The bug is that migration between backends > > Fix that in qemu, and the problem goes away without spec changes. > > That would be a one-off hack, for the sole feature that was defined wrong. Not at all. It's a fundamental bug, as long as it's unfixed talking about migration is just useless. > >> Instead, in the case of this feature, the driver requests it before > >> relying on its lack (which is odd); > > > > Which code in driver do you refer to? > > I'm talking of the code Frank should have put in the driver, but he > didn't (so he has a bug). Something like > > if (virtio_has_feature(vb->vdev, VIRTIO_BALLOON_F_MUST_TELL_HOST)) > return -ENODEV; > > So it has to request the feature, and then fail if the feature is > present. That's quite backwards. Everywhere else you'll find > > if (!virtio_has_feature(vblk->vdev, VIRTIO_BLK_F_SCSI)) > return -ENOTTY; > > BUG_ON(!virtio_has_feature(vblk->vdev, VIRTIO_BLK_F_CONFIG_WCE)); This is a bug BTW - we should not crash on bad device, failing probe is the right thing. > > if (virtio_has_feature(vblk->vdev, VIRTIO_NET_F_CTRL_VQ)) { > /* do cool stuff */ > } > > etc. > See above. Frank's driver does not seem to have a bug. > >> the device provides if they do not > >> support something (which is wrong). > > > > Not support? It just seems to be asking guest to tell it about deflates. > > If guest acks the bit, we know it will. If it does not, > > it will not. > > No, it is the other way round. The host ultimately decides what > features are negotiated, so it doesn't ask anything to the guest. The > _guest_ is asking the host about the need for explicit deflate. Now we are arguing about words. This is why meta arguments without specific examples are so bad. > >> You can see that this just cannot > >> provide backwards-compatibility in the device; > > > > Sorry I do not understand this meta argument. > > There should be an example where a driver and device > > fail to work together. > > There's nothing that you cannot work around. Use virtio_has_feature in > the device, invert the migration feature check in the driver. Why not > just _get it right_ instead? Exactly. Bug is in qemu, fix it _there_. What you do is a work around in spec: you declare old configurations unsupported. > >> it happens to work only > >> because the feature was there in the first version of the spec. > > > > This is how we do compatiblity in virtio. If we want driver to do > > something, we add a feature and it can ack, if it does we know it will > > do what we want. Another example is network announce bit. If driver > > acks it, we know we do not need to send gratitious arp from qemu. You > > are saying it is also broken? > > No, it's not broken. A reverse feature, let's call it like > VIRTIO_NET_F_HOST_MUST_SEND_GARP, would be broken. > > VIRTIO_NET_F_GUEST_ANNOUNCE is a "positive" feature: if set, the host > _can_ ask the guest to send a gARP, but it may also send it itself. > Similarly if VIRTIO_BALLOON_F_SILENT_DEFLATE is set, the guest _can_ use > ballooned pages directly, but may also deflate them explicitly. > > Instead, VIRTIO_NET_F_HOST_MUST_SEND_GARP would be a "negative" feature: > if set, the host _may not_ rely on the guest to send a gARP. Similarly > if VIRTIO_BALLOON_F_MUST_TELL_HOST is set, the guest _may not_ use > ballooned pages directly. > > There are _no_ other negative features besides > VIRTIO_BALLOON_F_MUST_TELL_HOST in the spec, and for a good > reason---because they're broken. > > (Hmm, actually we have one, VIRTIO_BLK_F_RO. It is also a bit broken, > but it is not so important because it depends on user input more than > hypervisor version). Now that we have a specific example, we can talk. Simply, some features do not need an ack from guest: they just tell guest something about device. RO is one such feature. > Reasoning on migration is just another way to see if the feature is > positive. During migration, new features available on the destination > can always be masked. But if removing the feature _adds_ a capability > to the hardware, it's wrong. Fact is, nothing except migration seems broken. This alone should make you realize there is a bug in qemu not in driver or protocol. > > Don't fix what is not broken. We get to carry compatibility > > in both driver and host for a long time for each feature. > > Sure, but better fix broken things _before_ somebody uses them. > > I'm not proposing to replace VIRTIO_BLK_F_RO with VIRTIO_BLK_F_RW > because it's in wide use and it would pose compatibility problems indeed. > > But since VIRTIO_BALLOON_F_MUST_TELL_HOST does not exist in any source > code, neither driver nor hypervisor, > we get lucky and we can instantly > deprecate it. Which driver are you looking at? grep VIRTIO_BALLOON_F_MUST_TELL_HOST drivers/virtio/*c|wc -l 2 So it does not look like we can just remove it like you did. At minimum we will need to reserve the bit. Yes qemu does not seem to set this bit. Need to check others e.g. kvm tool etc. Benefit seems very small. Why bother? > > Note: adding new features adds zero value in this respect - it will not > > allow simplifying the hypervisor. > > Indeed, it will add one line of code to the hypervisor to advertise the > new feature. > > Paolo So there's no point. Migration will stil be broken until it is fixed properly.
Il 06/09/2012 14:51, Michael S. Tsirkin ha scritto: > On Thu, Sep 06, 2012 at 02:13:14PM +0200, Paolo Bonzini wrote: >> Il 06/09/2012 12:53, Michael S. Tsirkin ha scritto: >>>> It is useful because it lets guests inflate the balloon aggressively, >>>> and then use ballooned-out pages even in places where the guest OS >>>> cannot sleep, such as kmalloc(GFP_ATOMIC). >>> >>> Interesting. >>> Do you intend to develop a driver patch using this? I'd like to see how >>> that works. Because if not, IMO it's best to wait until someone asks >>> for it. >> >> It's been two months, but Frank Swiderski's patch that triggered the >> debate is exactly that >> (http://permalink.gmane.org/gmane.linux.kernel/1318984). However, he >> didn't check VIRTIO_BALLOON_F_MUST_TELL_HOST, so he has a bug there. > > He is using a sepate device ID though, so we do not need a feature bit. It doesn't need to, it can work just as well with the existing device. >> Exactly, virtio migration currently fails if it would change hardware >> due to features not supported in the destination. Except for >> VIRTIO_BALLOON_F_MUST_TELL_HOST, where it does not fail because it is >> defined in the wrong direction. > > There is nothing wrong with the direction that I can see. > The bug is that migration between backends ...? >> BUG_ON(!virtio_has_feature(vblk->vdev, VIRTIO_BLK_F_CONFIG_WCE)); > > This is a bug BTW - we should not crash on bad device, failing probe > is the right thing. (I knew you would say that. :) The feature has been checked elsewhere, and this code will be unreachable if the feature was not there in the first place). >>>> You can see that this just cannot >>>> provide backwards-compatibility in the device; >>> >>> Sorry I do not understand this meta argument. >>> There should be an example where a driver and device >>> fail to work together. >> >> There's nothing that you cannot work around. Use virtio_has_feature in >> the device, invert the migration feature check in the driver. Why not >> just _get it right_ instead? > > Exactly. Bug is in qemu, fix it _there_. What you do is a work around > in spec: you declare old configurations unsupported. Such old configurations do not exist, so it's fine. >> There are _no_ other negative features besides >> VIRTIO_BALLOON_F_MUST_TELL_HOST in the spec, and for a good >> reason---because they're broken. >> >> (Hmm, actually we have one, VIRTIO_BLK_F_RO. It is also a bit broken, >> but it is not so important because it depends on user input more than >> hypervisor version). > > Now that we have a specific example, we can talk. > Simply, some features do not need an ack from guest: > they just tell guest something about device. > RO is one such feature. I don't understand how that's relevant. All features that just say "this field is there in the configuration at that offset" need no ack from guest, yet they're expressed as "positive" features. >> Reasoning on migration is just another way to see if the feature is >> positive. During migration, new features available on the destination >> can always be masked. But if removing the feature _adds_ a capability >> to the hardware, it's wrong. > > Fact is, nothing except migration seems broken. This alone should > make you realize there is a bug in qemu not in driver or protocol. I'm not entirely sure that you understand why migration is broken and why it is only broken for this particular feature bit (and in a more benign way for VIRTIO_BLK_F_RO). >>> Don't fix what is not broken. We get to carry compatibility >>> in both driver and host for a long time for each feature. >> >> Sure, but better fix broken things _before_ somebody uses them. >> >> I'm not proposing to replace VIRTIO_BLK_F_RO with VIRTIO_BLK_F_RW >> because it's in wide use and it would pose compatibility problems indeed. >> >> But since VIRTIO_BALLOON_F_MUST_TELL_HOST does not exist in any source >> code, neither driver nor hypervisor, >> we get lucky and we can instantly >> deprecate it. > > Which driver are you looking at? > > grep VIRTIO_BALLOON_F_MUST_TELL_HOST drivers/virtio/*c|wc -l > 2 Well, I am also looking at how it is used: static unsigned int features[] = { VIRTIO_BALLOON_F_MUST_TELL_HOST, VIRTIO_BALLOON_F_STATS_VQ, }; /* * Note that if * virtio_has_feature(vdev, VIRTIO_BALLOON_F_MUST_TELL_HOST); * is true, we *have* to do it in this order */ tell_host(vb, vb->deflate_vq); release_pages_by_pfn(vb->pfns, vb->num_pfns); So no, it is not used with virtio_has_feature in any way that matters. > So it does not look like we can just remove it like you did. At minimum > we will need to reserve the bit. Yes, reserving the bit is necessary. > Benefit seems very small. Why bother? Because the problem is clear, and easily solved, and I don't like time bombs. >>> Note: adding new features adds zero value in this respect - it will not >>> allow simplifying the hypervisor. >> >> Indeed, it will add one line of code to the hypervisor to advertise the >> new feature. > > So there's no point. Migration will stil be broken until it is > fixed properly. Migration of VIRTIO_BALLOON_F_MUST_TELL_HOST would be broken, but VIRTIO_BALLOON_F_MUST_TELL_HOST does not exist and will not exist in the hypervisor, so it will not be broken. Paolo -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On Thu, Sep 06, 2012 at 09:46:50AM +0200, Paolo Bonzini wrote: > VIRTIO_BALLOON_F_MUST_TELL_HOST cannot be used properly because it is a > "negative" feature: it tells you that silent defalte is not supported. > Right now, QEMU refuses migration if the target does not support all the > features that were negotiated. But then: > - a migration from non-MUST_TELL_HOST to MUST_TELL_HOST will succeed, > which is wrong; It need not be wrong. It depends on how host behaves. The right thing for qemu to do would be to assume that since this bit was not acked it can not assume specific guest behaviour. Even ignoring that, looking at this at a high level: basically you have configured two different machines with different qemu flags, and are complaning that migration did not fail cleanly. However - This is still a user/management bug. You should not even try such migration. Yes I put a sanity check there but it is just a debugging aid. It is not indended to be exhaustive. This is far from the only case where user error might cause problems silently. - Yes clean failure would be nicer, if you want to guarantee this just teach qemu to send all host features and verify they match on destination. Or more generally send machine configuration. No need to change spec or special case this bit at all. > > - a migration from MUST_TELL_HOST to non-MUST_TELL_HOST will fail, which > is useless. It is correct. device feature bits should not change across migration. > Add instead a new feature VIRTIO_BALLOON_F_SILENT_DEFLATE, and deprecate > VIRTIO_BALLOON_F_MUST_TELL_HOST since it is never actually used. > > Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> > --- > virtio-spec.lyx | 36 +++++++++++++++++++++++++++++++++--- > 1 file modificato, 33 inserzioni(+), 3 rimozioni(-) > > diff --git a/virtio-spec.lyx b/virtio-spec.lyx > index 7a073f4..1a25a18 100644 > --- a/virtio-spec.lyx > +++ b/virtio-spec.lyx > @@ -6238,6 +6238,8 @@ bits > > \begin_deeper > \begin_layout Description > + > +\change_deleted 1531152142 1346917221 > VIRTIO_BALLOON_F_MUST_TELL_HOST > \begin_inset space ~ > \end_inset > @@ -6251,6 +6253,20 @@ VIRTIO_BALLOON_F_STATS_VQ > \end_inset > > (1) A virtqueue for reporting guest memory statistics is present. > +\change_inserted 1531152142 1346917193 > + > +\end_layout > + > +\begin_layout Description > + > +\change_inserted 1531152142 1346917219 > +VIRTIO_BALLOON_F_SILENT_DEFLATE > +\begin_inset space ~ > +\end_inset > + > +(2) Host does not need to be told before pages from the balloon are used. > +\change_unchanged > + > \end_layout > > \end_deeper > @@ -6401,9 +6417,23 @@ The driver constructs an array of addresses of memory pages it has previously > \end_layout > > \begin_layout Enumerate > -If the VIRTIO_BALLOON_F_MUST_TELL_HOST feature is set, the guest may not > - use these requested pages until that descriptor in the deflateq has been > - used by the device. > +If the VIRTIO_BALLOON_F_ > +\change_deleted 1531152142 1346917234 > +MUST_TELL_HOST > +\change_inserted 1531152142 1346917237 > +SILENT_DEFLATE > +\change_unchanged > + feature is > +\change_inserted 1531152142 1346917241 > +not > +\change_unchanged > +set, the guest may not use these requested pages until that descriptor in > + the deflateq has been used by the device. > + > +\change_inserted 1531152142 1346917253 > + If it is set, the guest may choose to not use the deflateq at all. > +\change_unchanged > + > \end_layout > > \begin_layout Enumerate > -- > 1.7.11.2 -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Paolo Bonzini <pbonzini@redhat.com> writes: > Instead, VIRTIO_NET_F_HOST_MUST_SEND_GARP would be a "negative" feature: > if set, the host _may not_ rely on the guest to send a gARP. Similarly > if VIRTIO_BALLOON_F_MUST_TELL_HOST is set, the guest _may not_ use > ballooned pages directly. > > There are _no_ other negative features besides > VIRTIO_BALLOON_F_MUST_TELL_HOST in the spec, and for a good > reason---because they're broken. > > (Hmm, actually we have one, VIRTIO_BLK_F_RO. It is also a bit broken, > but it is not so important because it depends on user input more than > hypervisor version). Yes, this is the key observation, and an important lesson for the future. Thanks! Note that these two negative features were in the original spec, where it's assumed that every device supports them. That's not explicitly documented, however. I like killing the totally unused feature. Cheers, Rusty. -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On Fri, Sep 07, 2012 at 09:15:46AM +0930, Rusty Russell wrote: > Paolo Bonzini <pbonzini@redhat.com> writes: > > Instead, VIRTIO_NET_F_HOST_MUST_SEND_GARP would be a "negative" feature: > > if set, the host _may not_ rely on the guest to send a gARP. Similarly > > if VIRTIO_BALLOON_F_MUST_TELL_HOST is set, the guest _may not_ use > > ballooned pages directly. > > > > There are _no_ other negative features besides > > VIRTIO_BALLOON_F_MUST_TELL_HOST in the spec, and for a good > > reason---because they're broken. > > > > (Hmm, actually we have one, VIRTIO_BLK_F_RO. It is also a bit broken, > > but it is not so important because it depends on user input more than > > hypervisor version). > > Yes, this is the key observation, and an important lesson for the > future. Thanks! > Note that these two negative features were in the original spec, where > it's assumed that every device supports them. That's not explicitly > documented, however. I'm curious what would we do for the future? I tried to imagine that _RO was not in the original spec, so virtio-blk expects a r/w device. Now we can not add _RW - old hypervisors do not set it, and old drivers do not ack it. What would a new flag with equivalent functionality be? > I like killing the totally unused feature. > > Cheers, > Rusty. I tried verifying that it is unused. And found this: Linux drivers currently try to ack this feature if it is there, so do BSD drivers. Both from v1. And they tell host before leak. So that is fine. But looking at windows drivers: https://github.com/YanVugenfirer/kvm-guest-drivers-windows/blob/master/Balloon/sys/balloon.c they do *not* ack this bit, and BalloonLeak calls TellHost at the last line. So it looks like a bug: we should teach driver to tell host first on leak? Yan, Vadim, can you comment please? Also if true, looks like this bit will be useful to detect a fixed driver on the hypervisor side - to avoid unmapping such pages? Rusty what do you think?
"Michael S. Tsirkin" <mst@redhat.com> writes: > On Fri, Sep 07, 2012 at 09:15:46AM +0930, Rusty Russell wrote: >> Paolo Bonzini <pbonzini@redhat.com> writes: >> > Instead, VIRTIO_NET_F_HOST_MUST_SEND_GARP would be a "negative" feature: >> > if set, the host _may not_ rely on the guest to send a gARP. Similarly >> > if VIRTIO_BALLOON_F_MUST_TELL_HOST is set, the guest _may not_ use >> > ballooned pages directly. >> > >> > There are _no_ other negative features besides >> > VIRTIO_BALLOON_F_MUST_TELL_HOST in the spec, and for a good >> > reason---because they're broken. >> > >> > (Hmm, actually we have one, VIRTIO_BLK_F_RO. It is also a bit broken, >> > but it is not so important because it depends on user input more than >> > hypervisor version). >> >> Yes, this is the key observation, and an important lesson for the >> future. Thanks! >> Note that these two negative features were in the original spec, where >> it's assumed that every device supports them. That's not explicitly >> documented, however. > > I'm curious what would we do for the future? I tried to imagine that _RO > was not in the original spec, so virtio-blk expects a r/w device. > Now we can not add _RW - old hypervisors do not set it, and old > drivers do not ack it. > What would a new flag with equivalent functionality be? Backwards compatibility in the R/O case would actually work: just fail writes. Because it's just friendly advice to the OS, really. The final test is always: does it break users? If there are no users who will notice, we can do anything. If there are users, we have to keep backwards compatibility, and that implies we can't add "must know" features. > So it looks like a bug: we should teach driver to tell host first on leak? > Yan, Vadim, can you comment please? > > Also if true, looks like this bit will be useful to detect a fixed driver on > the hypervisor side - to avoid unmapping such pages? Rusty what do you > think? So, feature is unimplemented in qemu, and broken in drivers. I starting to share Paolo's dislike of it. Don't understand why we'd care about fixed drivers though, if we remove the feature bit.... Cheers, Rusty. -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Il 07/09/2012 08:39, Rusty Russell ha scritto: >> > So it looks like a bug: we should teach driver to tell host first on leak? >> > Yan, Vadim, can you comment please? >> > >> > Also if true, looks like this bit will be useful to detect a fixed driver on >> > the hypervisor side - to avoid unmapping such pages? Rusty what do you >> > think? > So, feature is unimplemented in qemu, and broken in drivers. I starting > to share Paolo's dislike of it. > > Don't understand why we'd care about fixed drivers though, if we remove > the feature bit.... Hmm, Michael has a point here. Basically, the Windows driver is using silent deflate, but not telling the host (yet) about it. So, we must assume that a driver that does not negotiate VIRTIO_BALLOON_F_MUST_TELL_HOST _will_ use silent deflate. Here's a way to proceed. We add VIRTIO_BALLOON_F_SILENT_DEFLATE, which is negotiated normally. If not available, at worst the guest driver may refuse to start, or revert to using the deflateq. We rename VIRTIO_BALLOON_F_MUST_TELL_HOST to WILL_TELL_HOST, since that's how it's being used. Now for the device there are three cases: - does not support silent deflate at all: it should always propose VIRTIO_BALLOON_F_WILL_TELL_HOST; if the (bad) driver does not negotiate it, the device must assume that the guest will use silent deflate, and fail to start the guest if the device does not support silent deflate. - optionally supports silent deflate: it should always propose VIRTIO_BALLOON_F_WILL_TELL_HOST; if the (bad) driver does not negotiate it, the device must assume that the guest will use silent deflate - always supports silent deflate: does not need to do anything, current behavior works fine. But the driver might as well propose VIRTIO_BALLOON_F_WILL_TELL_HOST, so that migration works fine. (This is a hardware change, so it must be versioned, yadda yadda). I can prepare a spec patch for this. BTW, since we have in the archives an example of using silent deflate, here is an example of non-silent deflate. It may help understanding the above with an actual example of a device. Suppose a guest is using PCI passthrough, so it has all memory pinned. - If the guest will _not_ use silent deflate, we can unlock memory on inflate and lock it back on deflate. (The question is what to do if locking fail; left for when someone actually implements this thing). - If the guest will use silent deflate, we cannot do that. So this is the second case above. The device must propose VIRTIO_BALLOON_F_WILL_TELL_HOST. Then: - if the guest negotiates VIRTIO_BALLOON_F_SILENT_DEFLATE, we cannot do the munlock/mlock - if the guest negotiates VIRTIO_BALLOON_F_WILL_TELL_HOST, we can do the munlock/mlock - if the guest does not negotiate either, the driver is buggy and we cannot do the munlock/mlock Paolo -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On Fri, Sep 07, 2012 at 04:09:50PM +0930, Rusty Russell wrote: > > So it looks like a bug: we should teach driver to tell host first on leak? > > Yan, Vadim, can you comment please? > > > > Also if true, looks like this bit will be useful to detect a fixed driver on > > the hypervisor side - to avoid unmapping such pages? Rusty what do you > > think? > > So, feature is unimplemented in qemu, and broken in drivers. I starting > to share Paolo's dislike of it. What is broken in drivers? To me it looks like it works exactly as advertized: linux and bsd ack TELL_HOST and tells host first. windows does not do either. Do windows drivers currently are not broken: they just do not support the feature. The claim that drivers and hyprevisors do not support it is false: qemu does not support it but drivers do and in non trivial way. > Don't understand why we'd care about fixed drivers though, if we remove > the feature bit.... > > Cheers, > Rusty. Do we really know there are no hypervisors implementing it? As I said above drivers do have support.
On Fri, Sep 07, 2012 at 11:27:42AM +0200, Paolo Bonzini wrote: > Il 07/09/2012 08:39, Rusty Russell ha scritto: > >> > So it looks like a bug: we should teach driver to tell host first on leak? > >> > Yan, Vadim, can you comment please? > >> > > >> > Also if true, looks like this bit will be useful to detect a fixed driver on > >> > the hypervisor side - to avoid unmapping such pages? Rusty what do you > >> > think? > > So, feature is unimplemented in qemu, and broken in drivers. I starting > > to share Paolo's dislike of it. > > > > Don't understand why we'd care about fixed drivers though, if we remove > > the feature bit.... > > Hmm, Michael has a point here. Basically, the Windows driver is using > silent deflate, but not telling the host (yet) about it. So, we must > assume that a driver that does not negotiate > VIRTIO_BALLOON_F_MUST_TELL_HOST _will_ use silent deflate. > > Here's a way to proceed. > > We add VIRTIO_BALLOON_F_SILENT_DEFLATE, which is negotiated normally. > If not available, at worst the guest driver may refuse to start, or > revert to using the deflateq. > > We rename VIRTIO_BALLOON_F_MUST_TELL_HOST to WILL_TELL_HOST, since > that's how it's being used. Now for the device there are three cases: > > - does not support silent deflate at all: it should always propose > VIRTIO_BALLOON_F_WILL_TELL_HOST; if the (bad) driver does not > negotiate it, the device must assume that the guest will use silent > deflate, and fail to start the guest if the device does not support > silent deflate. > > - optionally supports silent deflate: it should always propose > VIRTIO_BALLOON_F_WILL_TELL_HOST; if the (bad) driver does not > negotiate it, the device must assume that the guest will use silent > deflate > > - always supports silent deflate: does not need to do anything, > current behavior works fine. But the driver might as well propose > VIRTIO_BALLOON_F_WILL_TELL_HOST, so that migration works fine. (This > is a hardware change, so it must be versioned, yadda yadda). > > I can prepare a spec patch for this. > > > BTW, since we have in the archives an example of using silent deflate, > here is an example of non-silent deflate. It may help understanding the > above with an actual example of a device. Suppose a guest is using PCI > passthrough, so it has all memory pinned. > > - If the guest will _not_ use silent deflate, we can unlock memory on > inflate and lock it back on deflate. (The question is what to do if > locking fail; left for when someone actually implements this thing). > > - If the guest will use silent deflate, we cannot do that. > > So this is the second case above. The device must propose > VIRTIO_BALLOON_F_WILL_TELL_HOST. Then: > > - if the guest negotiates VIRTIO_BALLOON_F_SILENT_DEFLATE, > we cannot do the munlock/mlock > > - if the guest negotiates VIRTIO_BALLOON_F_WILL_TELL_HOST, > we can do the munlock/mlock > > - if the guest does not negotiate either, the driver is buggy > and we cannot do the munlock/mlock > > Paolo Let us start with what is broken currently. Looking at it very closely, I think the answer is nothing. Even migration in qemu is not broken as you claimed initially. Next, consider the interface proposed here. You defacto declare all existing drivers buggy. This is a wrong thing to do. You also use two feature bits for a single simple thing, this is inelegant. Last, let us consider how existing feature can be used in the hypervisor. If driver did not ack MUST_TELL_HOST, it is *not* buggy but it means we can not do munlock. This applies to current windows drivers. If driver *did* ack MUST_TELL_HOST, we can munlock and mlock back on leak. Seems useful, driver support is already there, so removing the MUST_TELL_HOST bit seems like a bad idea.
Il 07/09/2012 12:53, Michael S. Tsirkin ha scritto: > Let us start with what is broken currently. Looking at > it very closely, I think the answer is nothing. > Even migration in qemu is not broken as you claimed initially. Correct, migration would be broken as soon as QEMU starts using MUST_TELL_HOST. I'm trying to think ahead, since we have many ideas floating around on the implementation of ballooning. If you implement the mlock/munlock trick, you must start using MUST_TELL_HOST in QEMU to advertise it to guests, and migration breaks. > Next, consider the interface proposed here. You defacto declare > all existing drivers buggy. No, only Windows (and it is buggy, it calls tell_host last). Linux and BSD drivers do negotiate MUST_TELL_HOST, and are not buggy. > This is a wrong thing to do. > You also use two feature bits for a single simple thing, > this is inelegant. True, but the choice is: 1) add a once-only hack to QEMU that fixes migration of VIRTIO_BALLOON_F_MUST_TELL_HOST; 2) always advertise VIRTIO_BALLOON_F_MUST_TELL_HOST. If you do this, guests cannot use anymore silent deflate, which is a regression. 3) use two bits. One tells the device that the driver supports chatty deflate; one tells the driver that the device supports silent deflate. So in the end you do use two feature bits for two different things. Plus, both feature bits are "positive" and I'm happy. > Last, let us consider how existing feature can be used > in the hypervisor. If driver did not ack > MUST_TELL_HOST, it is *not* buggy but it means we can not > do munlock. This applies to current windows drivers. > If driver *did* ack MUST_TELL_HOST, we can munlock > and mlock back on leak. > Seems useful, driver support is already there, > so removing the MUST_TELL_HOST bit seems like a bad idea. Indeed, repurposing MUST_TELL_HOST to WILL_TELL_HOST is better than killing it. Paolo -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On Fri, Sep 07, 2012 at 01:20:57PM +0200, Paolo Bonzini wrote: > Il 07/09/2012 12:53, Michael S. Tsirkin ha scritto: > > Let us start with what is broken currently. Looking at > > it very closely, I think the answer is nothing. > > Even migration in qemu is not broken as you claimed initially. > > Correct, migration would be broken as soon as QEMU starts using > MUST_TELL_HOST. I'm trying to think ahead, since we have many ideas > floating around on the implementation of ballooning. > > If you implement the mlock/munlock trick, you must start using > MUST_TELL_HOST in QEMU to advertise it to guests, and migration breaks. Migration does not break. Since I wrote this code in qemu let me explain what is going on. qemu requires that local and remote side are started with same feature bits. To support cross version migration, code in hw/pc_piix.c disables features if you require migration from/to old qemu. At some point I added a sanity check: if we get guest features we know that any bit set there must be set in host features. Yes, this catches some user mistakes. This was never intended as a compatibility guarantee. User is still required to start qemu such that host features match exactly, anything else can lead to failures some of them hard to debug. Here is a simple example: 1. guest reads host features 2. guest is migrated - check passes since no features are acked 3. guest acks features -> failure This applies to any feature. Nothing special with this one. Yes, we can if we want to make this more robust against user errors, e.g. by migrating host feature bits. Patches welcome. If we do it will help all features, not just this one. > > Next, consider the interface proposed here. You defacto declare > > all existing drivers buggy. > > No, only Windows (and it is buggy, it calls tell_host last). It is not buggy. It does not ack MUST_TELL_HOST. So it is free to tell host at any point, it behaves exactly to spec. You can not retroactively declare drivers buggy like that. > Linux and > BSD drivers do negotiate MUST_TELL_HOST, and are not buggy. > > > This is a wrong thing to do. > > You also use two feature bits for a single simple thing, > > this is inelegant. > > True, but the choice is: > > 1) add a once-only hack to QEMU that fixes migration of > VIRTIO_BALLOON_F_MUST_TELL_HOST; > > 2) always advertise VIRTIO_BALLOON_F_MUST_TELL_HOST. If you do this, > guests cannot use anymore silent deflate, which is a regression. > > 3) use two bits. One tells the device that the driver supports chatty > deflate; one tells the driver that the device supports silent deflate. The right thing to do is either 4. realize we can not address all user errors, so no real issue 5. address this class of user errors by migrating host features > So in the end you do use two feature bits for two different things. > Plus, both feature bits are "positive" and I'm happy. I am not happy. We lose compatibility with all existing drivers so it will take years until the feature is actually useful. > > Last, let us consider how existing feature can be used > > in the hypervisor. If driver did not ack > > MUST_TELL_HOST, it is *not* buggy but it means we can not > > do munlock. This applies to current windows drivers. > > If driver *did* ack MUST_TELL_HOST, we can munlock > > and mlock back on leak. > > Seems useful, driver support is already there, > > so removing the MUST_TELL_HOST bit seems like a bad idea. > > Indeed, repurposing MUST_TELL_HOST to WILL_TELL_HOST is better than > killing it. > > Paolo Is this just a matter of naming? Same functionality: driver that acks this bit will tell host first, driver that does not will not? If yes that is fine.
Il 07/09/2012 14:17, Michael S. Tsirkin ha scritto: >>> Next, consider the interface proposed here. You defacto declare >>> all existing drivers buggy. >> >> No, only Windows (and it is buggy, it calls tell_host last). > > It is not buggy. It does not ack MUST_TELL_HOST. So it is free to tell > host at any point, it behaves exactly > to spec. You can not retroactively declare drivers buggy like that. Well, according to your reading of the spec. In the spec I'm reading "Host must be told before pages from the balloon are used". Doesn't say anything about the guest. Now, indeed a very free interpretation is "Guest will tell host before pages from the balloon are used". It turns out that it's exactly what guests have been doing, hence that's exactly what I'm proposing now: rename the feature to VIRTIO_BALLOON_F_WILL_TELL_HOST. >> True, but the choice is: >> >> 1) add a once-only hack to QEMU that fixes migration of >> VIRTIO_BALLOON_F_MUST_TELL_HOST; >> >> 2) always advertise VIRTIO_BALLOON_F_MUST_TELL_HOST. If you do this, >> guests cannot use anymore silent deflate, which is a regression. >> >> 3) use two bits. One tells the device that the driver supports chatty >> deflate; one tells the driver that the device supports silent deflate. > > The right thing to do is either > 4. realize we can not address all user errors, so no real issue > 5. address this class of user errors by migrating host features > >> So in the end you do use two feature bits for two different things. >> Plus, both feature bits are "positive" and I'm happy. > > I am not happy. > We lose compatibility with all existing drivers How so? > so it will take years until the feature is actually > useful. No, we don't! Windows guests will just not be able to use munlock/mlock until the driver is fixed. Which will probably happen before someone writes the munlock/mlock code. > Is this just a matter of naming? Same functionality: > driver that acks this bit will tell host first, > driver that does not will not? > > If yes that is fine. Yes, that part we agree on I think. We disagree that (after the rename) QEMU should start always proposing VIRTIO_BALLOON_F_WILL_TELL_HOST. _Plus_ adding the new feature bit, which is needed to actually tell the driver that the host supports the silent deflate. Spec patch on the way. Paolo -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On Fri, Sep 07, 2012 at 02:22:47PM +0200, Paolo Bonzini wrote: > Il 07/09/2012 14:17, Michael S. Tsirkin ha scritto: > >>> Next, consider the interface proposed here. You defacto declare > >>> all existing drivers buggy. > >> > >> No, only Windows (and it is buggy, it calls tell_host last). > > > > It is not buggy. It does not ack MUST_TELL_HOST. So it is free to tell > > host at any point, it behaves exactly > > to spec. You can not retroactively declare drivers buggy like that. > > Well, according to your reading of the spec. > > In the spec I'm reading "Host must be told before pages from the balloon > are used". Doesn't say anything about the guest. No? How is host told then? By divine force? > Now, indeed a very free interpretation is "Guest will tell host before > pages from the balloon are used". It turns out that it's exactly what > guests have been doing, hence that's exactly what I'm proposing now: > rename the feature to VIRTIO_BALLOON_F_WILL_TELL_HOST. Rename is fine. > >> True, but the choice is: > >> > >> 1) add a once-only hack to QEMU that fixes migration of > >> VIRTIO_BALLOON_F_MUST_TELL_HOST; > >> > >> 2) always advertise VIRTIO_BALLOON_F_MUST_TELL_HOST. If you do this, > >> guests cannot use anymore silent deflate, which is a regression. > >> > >> 3) use two bits. One tells the device that the driver supports chatty > >> deflate; one tells the driver that the device supports silent deflate. > > > > The right thing to do is either > > 4. realize we can not address all user errors, so no real issue > > 5. address this class of user errors by migrating host features > > > >> So in the end you do use two feature bits for two different things. > >> Plus, both feature bits are "positive" and I'm happy. > > > > I am not happy. > > We lose compatibility with all existing drivers > > How so? > > > so it will take years until the feature is actually > > useful. > > No, we don't! Windows guests will just not be able to use munlock/mlock > until the driver is fixed. Which will probably happen before someone > writes the munlock/mlock code. If the only change is rename then ofcourse things keep working. I don't care about the name it is up to Rusty. > > Is this just a matter of naming? Same functionality: > > driver that acks this bit will tell host first, > > driver that does not will not? > > > > If yes that is fine. > > Yes, that part we agree on I think. We disagree that (after the rename) > QEMU should start always proposing VIRTIO_BALLOON_F_WILL_TELL_HOST. Not always. It must be off if compatibility with old qemu is disabled. > _Plus_ adding the new feature bit, which is needed to actually tell the This part I do not get. What is silent deflate, why is it useful and what it has to do with what we are discussing here? > driver that the host supports the silent deflate. > Spec patch on the way. > > Paolo Hang on. Can we please talk about motivation? These patches which come without motivation are very hard to review. -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Il 07/09/2012 14:44, Michael S. Tsirkin ha scritto: >> Well, according to your reading of the spec. >> >> In the spec I'm reading "Host must be told before pages from the balloon >> are used". Doesn't say anything about the guest. > > No? How is host told then? By divine force? For a simple madvise-based implementation such as the one in QEMU, the host doesn't need to be told about anything (except periodic updating of the "actual" field, or the statsq). The guest doesn't need at all to use the deflateq in fact. >> Now, indeed a very free interpretation is "Guest will tell host before >> pages from the balloon are used". It turns out that it's exactly what >> guests have been doing, hence that's exactly what I'm proposing now: >> rename the feature to VIRTIO_BALLOON_F_WILL_TELL_HOST. > > Rename is fine. Cool. >> Yes, that part we agree on I think. We disagree that (after the rename) >> QEMU should start always proposing VIRTIO_BALLOON_F_WILL_TELL_HOST. > > Not always. It must be off if compatibility with old qemu is disabled. Yes, of course. >> _Plus_ adding the new feature bit, which is needed to actually tell the > > This part I do not get. What is silent deflate, why is it useful > and what it has to do with what we are discussing here? Silent deflate is deflating just by using the page, without using the deflateq at all. So it can be done even from GFP_ATOMIC context. But knowing that the guest will _not_ deflate silently also benefits the host. This is the cause of unpinning ballooned pages and pinning them again upon deflation. This allows cooperative memory overcommit even if the guests' memory is pinned, similar to what Xen did before it supported overcommit. This is the second feature bit. There are three cases: * guest will never do silent deflation: current Linux driver. Host may do munlock/mlock dance. Negotiates VIRTIO_BALLOON_F_WILL_TELL_HOST only, features = VIRTIO_BALLOON_F_WILL_TELL_HOST. * guest will always do silent deflation: current Windows driver. Negotiates nothing, or if it cares it can negotiate VIRTIO_BALLOON_F_SILENT_DEFLATE. Host mustn't do munlock/mlock dance. * guest may do silent deflation if available: combo of Linux driver + Frank's driver. Negotiates both features, looks at VIRTIO_BALLOON_F_SILENT_DEFLATE host feature to decide how to behave: ** If PCI passthrough, the host will not negotiate VIRTIO_BALLOON_F_SILENT_DEFLATE. The driver will behave as the current Linux driver, the host can do the munlock/mlock dance. ** If no PCI passthrough, the host will negotiate VIRTIO_BALLOON_F_SILENT_DEFLATE, and the driver can balloon more aggressively and not use the deflateq. Paolo -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On Fri, Sep 07, 2012 at 04:04:13PM +0200, Paolo Bonzini wrote: > Il 07/09/2012 14:44, Michael S. Tsirkin ha scritto: > >> Well, according to your reading of the spec. > >> > >> In the spec I'm reading "Host must be told before pages from the balloon > >> are used". Doesn't say anything about the guest. > > > > No? How is host told then? By divine force? > > For a simple madvise-based implementation such as the one in QEMU, the > host doesn't need to be told about anything (except periodic updating of > the "actual" field, or the statsq). > > The guest doesn't need at all to use the deflateq in fact. > > >> Now, indeed a very free interpretation is "Guest will tell host before > >> pages from the balloon are used". It turns out that it's exactly what > >> guests have been doing, hence that's exactly what I'm proposing now: > >> rename the feature to VIRTIO_BALLOON_F_WILL_TELL_HOST. > > > > Rename is fine. > > Cool. > > >> Yes, that part we agree on I think. We disagree that (after the rename) > >> QEMU should start always proposing VIRTIO_BALLOON_F_WILL_TELL_HOST. > > > > Not always. It must be off if compatibility with old qemu is disabled. > > Yes, of course. > > >> _Plus_ adding the new feature bit, which is needed to actually tell the > > > > This part I do not get. What is silent deflate, why is it useful > > and what it has to do with what we are discussing here? > > Silent deflate is deflating just by using the page, without using the > deflateq at all. So it can be done even from GFP_ATOMIC context. BTW I don't see a need to avoid deflateq. Without MUST_TELL_HOST you can just deflate and then bounce telling the host out to a thread. > But knowing that the guest will _not_ deflate silently also benefits the > host. This is the cause of unpinning ballooned pages and pinning them > again upon deflation. This allows cooperative memory overcommit even if > the guests' memory is pinned, similar to what Xen did before it > supported overcommit. This is the second feature bit. This is the MUST_TELL_HOST. > There are three cases: > > * guest will never do silent deflation: current Linux driver. Host may > do munlock/mlock dance. Negotiates VIRTIO_BALLOON_F_WILL_TELL_HOST > only, features = VIRTIO_BALLOON_F_WILL_TELL_HOST. > > * guest will always do silent deflation: current Windows driver. Not exactly. It is not silent. It tells host just after use. > Negotiates nothing, or if it cares it can negotiate > VIRTIO_BALLOON_F_SILENT_DEFLATE. Host mustn't do munlock/mlock dance. > > * guest may do silent deflation if available: combo of Linux driver + > Frank's driver. Negotiates both features, looks at > VIRTIO_BALLOON_F_SILENT_DEFLATE host feature to decide how to behave: > > ** If PCI passthrough, the host will not negotiate > VIRTIO_BALLOON_F_SILENT_DEFLATE. The driver will behave as the > current Linux driver, the host can do the munlock/mlock dance. So this case is just existing interface. OK. > ** If no PCI passthrough, the host will negotiate > VIRTIO_BALLOON_F_SILENT_DEFLATE, and the driver can balloon more > aggressively and not use the deflateq. > > Paolo This last trickery confuses me. It just does not make sense to set both SILENT and TELL: either you are silent or you tell. What is the benefit of avoiding host notification? It seems cleaner for the host to know the state.
Il 07/09/2012 16:25, Michael S. Tsirkin ha scritto: >> > Silent deflate is deflating just by using the page, without using the >> > deflateq at all. So it can be done even from GFP_ATOMIC context. > BTW I don't see a need to avoid deflateq. > Without MUST_TELL_HOST you can just deflate and then > bounce telling the host out to a thread. Yes, that's fine too. >> > But knowing that the guest will _not_ deflate silently also benefits the >> > host. This is the cause of unpinning ballooned pages and pinning them >> > again upon deflation. This allows cooperative memory overcommit even if >> > the guests' memory is pinned, similar to what Xen did before it >> > supported overcommit. This is the second feature bit. > This is the MUST_TELL_HOST. Almost. One is "the guest, if really needed, can tell the host of pages". If not negotiated, and the host does not support it, the host must break the guest (e.g. fail to offer any virtqueues). The other is "the guest, though, would prefer not to do so". It is different because the guest can proceed in a fallback mode even if the host doesn't offer it. You're interpreting features as something that dictates behavior, but they're really just advisory. You could negotiate VIRTIO_BLK_F_TOPOLOGY and end up never reading the fields; you could negotiate VIRTIO_NET_F_GUEST_ANNOUNCE and never send a guest announcement. >> > * guest will always do silent deflation: current Windows driver. > Not exactly. It is not silent. It tells host > just after use. Yeah, but no difference from the POV of the driver. Delaying or avoiding is the same in the end. The spec says it well: "In this case, deflation advice is merely a courtesy". >> > Negotiates nothing, or if it cares it can negotiate >> > VIRTIO_BALLOON_F_SILENT_DEFLATE. Host mustn't do munlock/mlock dance. >> > >> > * guest may do silent deflation if available: combo of Linux driver + >> > Frank's driver. Negotiates both features, looks at >> > VIRTIO_BALLOON_F_SILENT_DEFLATE host feature to decide how to behave: >> > >> > ** If PCI passthrough, the host will not negotiate >> > VIRTIO_BALLOON_F_SILENT_DEFLATE. The driver will behave as the >> > current Linux driver, the host can do the munlock/mlock dance. > So this case is just existing interface. OK. > >> > ** If no PCI passthrough, the host will negotiate >> > VIRTIO_BALLOON_F_SILENT_DEFLATE, and the driver can balloon more >> > aggressively and not use the deflateq. >> > > This last trickery confuses me. It just does not make sense to set both > SILENT and TELL: either you are silent or you tell. "I can be chatty if you ask me, but I'd rather be silent if you let me". TELL is a request of the host to the guest; the guest can agree or not. SILENT is a request of the guest to the host; the host can agree or not. > What is the benefit of avoiding host notification? > It seems cleaner for the host to know the state. Yeah, if you want to do it you can. Paolo -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
"Michael S. Tsirkin" <mst@redhat.com> writes: > On Fri, Sep 07, 2012 at 04:09:50PM +0930, Rusty Russell wrote: >> > So it looks like a bug: we should teach driver to tell host first on leak? >> > Yan, Vadim, can you comment please? >> > >> > Also if true, looks like this bit will be useful to detect a fixed driver on >> > the hypervisor side - to avoid unmapping such pages? Rusty what do you >> > think? >> >> So, feature is unimplemented in qemu, and broken in drivers. I starting >> to share Paolo's dislike of it. > > What is broken in drivers? Because supporting the feature is *not* optional for a driver. If the device said MUST_TELL_HOST, it meant that the driver *had* to tell the host before it touched the page, otherwise Bad Things might happen. It was in the original spec precisely to allow devices to actually *remove* pages. Noone ever noticed the windows driver didn't support it, because qemu never requires MUST_TELL_HOST. So in practice, it's now an optional feature. Since no device used it anyway, we're better off discarding it than trying to fix it. If someone wants an *optional* "tell me first" feature later, that's easy to add, but I don't see why they'd want to. > Do we really know there are no hypervisors implementing it? As much as can be known. Qemu doesn't, lkvm doesn't. > As I said above drivers do have support. Not the windows drivers. So it's optional, thus removing it will likely harm noone. Cheers, Rusty. -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Il 08/09/2012 07:06, Rusty Russell ha scritto: >> > Do we really know there are no hypervisors implementing it? > As much as can be known. Qemu doesn't, lkvm doesn't. > However, there are cases in which it would be nice to have it. Repurposing the bit to how it has been used so far (as a guest->host communication bit, not host->guest) is a better choice. Paolo -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On Fri, Sep 07, 2012 at 04:44:40PM +0200, Paolo Bonzini wrote: > Il 07/09/2012 16:25, Michael S. Tsirkin ha scritto: > >> > Silent deflate is deflating just by using the page, without using the > >> > deflateq at all. So it can be done even from GFP_ATOMIC context. > > BTW I don't see a need to avoid deflateq. > > Without MUST_TELL_HOST you can just deflate and then > > bounce telling the host out to a thread. > > Yes, that's fine too. OK so it's preferable: will work on existing hypervisors. > >> > But knowing that the guest will _not_ deflate silently also benefits the > >> > host. This is the cause of unpinning ballooned pages and pinning them > >> > again upon deflation. This allows cooperative memory overcommit even if > >> > the guests' memory is pinned, similar to what Xen did before it > >> > supported overcommit. This is the second feature bit. > > This is the MUST_TELL_HOST. > > Almost. One is "the guest, if really needed, can tell the host of > pages". If not negotiated, and the host does not support it, the host > must break the guest (e.g. fail to offer any virtqueues). There is no way in spec to break the guest. You can not fail to offer virtqueues. Besides, there is no guarantee that virtqueue setup happens after feature negotiation. > The other is "the guest, though, would prefer not to do so". It is > different because the guest can proceed in a fallback mode even if the > host doesn't offer it. I think I get what your proposed SILENT means what I do not get is the motivation. It looks like a premature optimization to me. > You're interpreting features as something that dictates behavior, but > they're really just advisory. The spec is pretty clear that if guest acks feature it is a contract that dictates behaviour. If it doesn't it is either ignored or just informative depending on feature. > You could negotiate VIRTIO_BLK_F_TOPOLOGY > and end up never reading the fields; you could negotiate > VIRTIO_NET_F_GUEST_ANNOUNCE and never send a guest announcement. Block example is just informative. It does not need to be negotiated even to be used. But last example is wrong. If you ack GUEST_ANNOUNCE hypervisor assumes guest will announce self, if guest does not do it this break migration. > >> > * guest will always do silent deflation: current Windows driver. > > Not exactly. It is not silent. It tells host > > just after use. > > Yeah, but no difference from the POV of the driver. > > Delaying or avoiding is the same in the end. The spec says it well: "In > this case, deflation advice is merely a courtesy". So it looks like we don't need a new bit to leak in atomic ctx. Just do not ack MUST_TELL_HOST and delay telling host to a wq. IMO we should not add random stuff to spec like this just because it seems like a good idea. > >> > Negotiates nothing, or if it cares it can negotiate > >> > VIRTIO_BALLOON_F_SILENT_DEFLATE. Host mustn't do munlock/mlock dance. > >> > > >> > * guest may do silent deflation if available: combo of Linux driver + > >> > Frank's driver. Negotiates both features, looks at > >> > VIRTIO_BALLOON_F_SILENT_DEFLATE host feature to decide how to behave: > >> > > >> > ** If PCI passthrough, the host will not negotiate > >> > VIRTIO_BALLOON_F_SILENT_DEFLATE. The driver will behave as the > >> > current Linux driver, the host can do the munlock/mlock dance. > > So this case is just existing interface. OK. > > > >> > ** If no PCI passthrough, the host will negotiate > >> > VIRTIO_BALLOON_F_SILENT_DEFLATE, and the driver can balloon more > >> > aggressively and not use the deflateq. > >> > > > This last trickery confuses me. It just does not make sense to set both > > SILENT and TELL: either you are silent or you tell. > > "I can be chatty if you ask me, but I'd rather be silent if you let me". > > TELL is a request of the host to the guest; the guest can agree or not. > > SILENT is a request of the guest to the host; the host can agree or not. OK so TELL says *when* to notify host, SILENT if set allows guest to skip leak notifications? In this case TELL should just be ignored when SILENT is set. Otherwise it's a reasonable extension. But I am still not sure *why* do we need SILENT - any data showing that avoiding these notifications is a win? Let's not add new features until we see an actual use. > > It seems cleaner for the host to know the state. > > Yeah, if you want to do it you can. > > Paolo IMHO, renaming is fine since there is confusion. But WILL_TELL is also not all that clear imho. I think the confusion is that TELL_HOST seems to imply we can avoid telling host at all. How about being explicit? VIRTIO_BALLOON_F_HOST_ACK_BEFORE_DEFLATE Hmm? "MUST" "WILL" are not really informative. Also some confusion I think is from spec text saying "feature is set" in many cases where it really almost always means "feature is negotiated". Let's address?
On Sat, Sep 08, 2012 at 02:36:00PM +0930, Rusty Russell wrote: > "Michael S. Tsirkin" <mst@redhat.com> writes: > > On Fri, Sep 07, 2012 at 04:09:50PM +0930, Rusty Russell wrote: > >> > So it looks like a bug: we should teach driver to tell host first on leak? > >> > Yan, Vadim, can you comment please? > >> > > >> > Also if true, looks like this bit will be useful to detect a fixed driver on > >> > the hypervisor side - to avoid unmapping such pages? Rusty what do you > >> > think? > >> > >> So, feature is unimplemented in qemu, and broken in drivers. I starting > >> to share Paolo's dislike of it. > > > > What is broken in drivers? > > Because supporting the feature is *not* optional for a driver. > > If the device said MUST_TELL_HOST, it meant that the driver *had* to > tell the host before it touched the page, otherwise Bad Things might > happen. It was in the original spec precisely to allow devices to > actually *remove* pages. > > Noone ever noticed the windows driver didn't support it, because qemu > never requires MUST_TELL_HOST. > > So in practice, it's now an optional feature. Since no device used it > anyway, we're better off discarding it than trying to fix it. I trust you this was not the intent. But it seems to be the intent in spec, because almost all features are optional. And so windows driver authors interpreted it this way. And it is *useful* like this. See below. > If someone wants an *optional* "tell me first" feature later, that's > easy to add, but I don't see why they'd want to. Suggested use is for device assignment which needs all guest memory locked. hypervisor can unlock pages in balloon but guest must wait for hypervisor to lock them back before use. when a hypervisor implements this, this will work with linux guests but not windows guests and the existing bit fits exactly the purpose. > > Do we really know there are no hypervisors implementing it? > > As much as can be known. Qemu doesn't, lkvm doesn't. But we can add it in qemu and it will be a useful feature. > > As I said above drivers do have support. > > Not the windows drivers. So it's optional, thus removing it will likely > harm noone. > > Cheers, > Rusty. I think the issue is that kvm always locked all guest memory for assignment. This restriction is removed with vfio which has separate page tables. Now that vfio is upstream and work on qemu integration is being worked on, we might finally see people using this bit to allow memory overcommit with device assignment. So let's not remove yet, there is no rush. Let's document it that this feature is optional, maybe renaming as Paolo suggests.
"Michael S. Tsirkin" <mst@redhat.com> writes: > On Sat, Sep 08, 2012 at 02:36:00PM +0930, Rusty Russell wrote: >> "Michael S. Tsirkin" <mst@redhat.com> writes: >> > On Fri, Sep 07, 2012 at 04:09:50PM +0930, Rusty Russell wrote: >> >> > So it looks like a bug: we should teach driver to tell host first on leak? >> >> > Yan, Vadim, can you comment please? >> >> > >> >> > Also if true, looks like this bit will be useful to detect a fixed driver on >> >> > the hypervisor side - to avoid unmapping such pages? Rusty what do you >> >> > think? >> >> >> >> So, feature is unimplemented in qemu, and broken in drivers. I starting >> >> to share Paolo's dislike of it. >> > >> > What is broken in drivers? >> >> Because supporting the feature is *not* optional for a driver. >> >> If the device said MUST_TELL_HOST, it meant that the driver *had* to >> tell the host before it touched the page, otherwise Bad Things might >> happen. It was in the original spec precisely to allow devices to >> actually *remove* pages. >> >> Noone ever noticed the windows driver didn't support it, because qemu >> never requires MUST_TELL_HOST. >> >> So in practice, it's now an optional feature. Since no device used it >> anyway, we're better off discarding it than trying to fix it. > > I trust you this was not the intent. But it seems to be > the intent in spec, because almost all features are optional. > > And so windows driver authors interpreted it > this way. And it is *useful* like this. See below. ... > Suggested use is for device assignment which needs all guest memory > locked. hypervisor can unlock pages in balloon but guest must wait for > hypervisor to lock them back before use. > > when a hypervisor implements this, > this will work with linux guests but not windows > guests and the existing bit fits exactly the purpose. If a hypervisor needs this, and the guest doesn't support it, then the hypervisor can only abandon the balloon device. That's not my definition of "optional". >> > Do we really know there are no hypervisors implementing it? >> >> As much as can be known. Qemu doesn't, lkvm doesn't. > > But we can add it in qemu and it will be a useful feature. > >> > As I said above drivers do have support. >> >> Not the windows drivers. So it's optional, thus removing it will likely >> harm noone. >> >> Cheers, >> Rusty. > > I think the issue is that kvm always locked all guest memory > for assignment. This restriction is removed > with vfio which has separate page tables. > Now that vfio is upstream and work on qemu integration > is being worked on, we might finally see people using this bit > to allow memory overcommit with device assignment. That was left-field.... so you're saying some guest might pull a page from the balloon and DMA to it, and the vfio device needs to know in advance that it's going to do it? But what will we do if the guest doesn't ack the bit? ie. I don't think it can really be optional. Cheers, Rusty. -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Il 09/09/2012 00:22, Michael S. Tsirkin ha scritto: >> Almost. One is "the guest, if really needed, can tell the host of >> pages". If not negotiated, and the host does not support it, the host >> must break the guest (e.g. fail to offer any virtqueues). > > There is no way in spec to break the guest. > You can not fail to offer virtqueues. You can always return 0 for the first queue. > Besides, there is no guarantee that virtqueue setup > happens after feature negotiation. It is the only way that makes sense though (unless the guest would write 0 for its features). Should we change that? >> The other is "the guest, though, would prefer not to do so". It is >> different because the guest can proceed in a fallback mode even if the >> host doesn't offer it. > > I think I get what your proposed SILENT means what I do not get > is the motivation. It looks like a premature optimization to me. The motivation is to let the driver choose between two behaviors: the current one where ballooning is only done on request, and a more aggressive one. > The spec is pretty clear that if guest acks feature it > is a contract that dictates behaviour. > If it doesn't it is either ignored or just informative > depending on feature. > >> You could negotiate VIRTIO_BLK_F_TOPOLOGY >> and end up never reading the fields; you could negotiate >> VIRTIO_NET_F_GUEST_ANNOUNCE and never send a guest announcement. > > Block example is just informative. It does not need to be > negotiated even to be used. But last example is wrong. > If you ack GUEST_ANNOUNCE hypervisor assumes guest will > announce self, if guest does not do it this break migration. It is wrong indeed, sorry. Better example: the driver can negotiate VIRTIO_NET_F_CTRL_RX and never set promiscuous mode. The device has to obey if it does. Similarly, if you set VIRTIO_BALLOON_F_SILENT_DEFLATE and only do chatty deflate later, that's fine. If you do silent deflate, and the device negotiated the feature, it has to work. >> Delaying or avoiding is the same in the end. The spec says it well: "In >> this case, deflation advice is merely a courtesy". > > So it looks like we don't need a new bit to leak in atomic ctx. > Just do not ack MUST_TELL_HOST and delay telling host to a wq. > IMO we should not add random stuff to spec like this just because it > seems like a good idea. But this way you have to choose all-or-none. If the host cannot do silent deflate, you cannot balloon anymore, not even in the normal "cooperative" mode. > OK so TELL says *when* to notify host, SILENT if set allows guest > to skip leak notifications? In this case TELL should just be ignored > when SILENT is set. Yeah, that was my first idea. However, there are existing drivers that ignore SILENT, so that would not be 100% exact. > IMHO, renaming is fine since there is confusion. > But WILL_TELL is also not all that clear imho. > I think the confusion is that TELL_HOST seems to > imply we can avoid telling host at all. > How about being explicit? > > VIRTIO_BALLOON_F_HOST_ACK_BEFORE_DEFLATE Makes sense. Paolo -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Il 10/09/2012 03:43, Rusty Russell ha scritto: >> Now that vfio is upstream and work on qemu integration >> is being worked on, we might finally see people using this bit >> to allow memory overcommit with device assignment. > That was left-field.... so you're saying some guest might pull a page > from the balloon and DMA to it, and the vfio device needs to know in > advance that it's going to do it? Not the vfio device, but the page needs to be pinned with mlock so the effect is the same. > But what will we do if the guest doesn't ack the bit? Just pin the whole memory. Paolo > ie. I don't think it can really be optional. -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On Mon, Sep 10, 2012 at 11:13:12AM +0930, Rusty Russell wrote: > "Michael S. Tsirkin" <mst@redhat.com> writes: > > On Sat, Sep 08, 2012 at 02:36:00PM +0930, Rusty Russell wrote: > >> "Michael S. Tsirkin" <mst@redhat.com> writes: > >> > On Fri, Sep 07, 2012 at 04:09:50PM +0930, Rusty Russell wrote: > >> >> > So it looks like a bug: we should teach driver to tell host first on leak? > >> >> > Yan, Vadim, can you comment please? > >> >> > > >> >> > Also if true, looks like this bit will be useful to detect a fixed driver on > >> >> > the hypervisor side - to avoid unmapping such pages? Rusty what do you > >> >> > think? > >> >> > >> >> So, feature is unimplemented in qemu, and broken in drivers. I starting > >> >> to share Paolo's dislike of it. > >> > > >> > What is broken in drivers? > >> > >> Because supporting the feature is *not* optional for a driver. > >> > >> If the device said MUST_TELL_HOST, it meant that the driver *had* to > >> tell the host before it touched the page, otherwise Bad Things might > >> happen. It was in the original spec precisely to allow devices to > >> actually *remove* pages. > >> > >> Noone ever noticed the windows driver didn't support it, because qemu > >> never requires MUST_TELL_HOST. > >> > >> So in practice, it's now an optional feature. Since no device used it > >> anyway, we're better off discarding it than trying to fix it. > > > > I trust you this was not the intent. But it seems to be > > the intent in spec, because almost all features are optional. > > > > And so windows driver authors interpreted it > > this way. And it is *useful* like this. See below. > > ... > > > Suggested use is for device assignment which needs all guest memory > > locked. hypervisor can unlock pages in balloon but guest must wait for > > hypervisor to lock them back before use. > > > > when a hypervisor implements this, > > this will work with linux guests but not windows > > guests and the existing bit fits exactly the purpose. > > If a hypervisor needs this, and the guest doesn't support it, then > the hypervisor can only abandon the balloon device. That's not my > definition of "optional". > > >> > Do we really know there are no hypervisors implementing it? > >> > >> As much as can be known. Qemu doesn't, lkvm doesn't. > > > > But we can add it in qemu and it will be a useful feature. > > > >> > As I said above drivers do have support. > >> > >> Not the windows drivers. So it's optional, thus removing it will likely > >> harm noone. > >> > >> Cheers, > >> Rusty. > > > > I think the issue is that kvm always locked all guest memory > > for assignment. This restriction is removed > > with vfio which has separate page tables. > > Now that vfio is upstream and work on qemu integration > > is being worked on, we might finally see people using this bit > > to allow memory overcommit with device assignment. > > That was left-field.... so you're saying some guest might pull a page > from the balloon and DMA to it, and the vfio device needs to know in > advance that it's going to do it? > > But what will we do if the guest doesn't ack the bit? > ie. I don't think it can really be optional. > > Cheers, > Rusty. We have several options: 1. No memory overcommit feature. This is always the case ATM! 2. Do not hot-plug assigned device. 3. Hot-unplug assigned device. 4. Some assigned devices can be able to handle memory errors e.g. using ATS. Limit hotplug to these. > I don't think it can really be optional. It is optional *for the device*. But I don't insist on my patch. I am merely saying that 1. The bit is useful for host to detect guests which can't combine memory overcommit with device assignment, and this set of guests is not empty. 2. The fact that this bit is not optional for drivers is not well documented. The only hint seems the use of words "feature is set" as opposed to "feature is negoticated" as with other features. The spec intended "feature is set in Device Features bits". Drivers interpreted this as "feature is set in Guest Features bits". Hard to blame them, let us give them time to address this.
On Mon, Sep 10, 2012 at 07:50:13AM +0200, Paolo Bonzini wrote: > Il 09/09/2012 00:22, Michael S. Tsirkin ha scritto: > >> Almost. One is "the guest, if really needed, can tell the host of > >> pages". If not negotiated, and the host does not support it, the host > >> must break the guest (e.g. fail to offer any virtqueues). > > > > There is no way in spec to break the guest. > > You can not fail to offer virtqueues. > > You can always return 0 for the first queue. I don't think guest drivers recover gracefully from this. Do they? > > Besides, there is no guarantee that virtqueue setup > > happens after feature negotiation. > > It is the only way that makes sense though (unless the guest would write > 0 for its features). > Should we change that? Not sure. This was not always the case. Further setup can fail with e.g ENOMEM and driver could retry with a set of more conservative features. I do think it would be nice to add a generic way for device to notify guest about an internal failure. This can only happen after DRIVER_OK status is written though, and since existing drivers do not expect such failure, it might be too late. > >> The other is "the guest, though, would prefer not to do so". It is > >> different because the guest can proceed in a fallback mode even if the > >> host doesn't offer it. > > > > I think I get what your proposed SILENT means what I do not get > > is the motivation. It looks like a premature optimization to me. > > The motivation is to let the driver choose between two behaviors: the > current one where ballooning is only done on request, and a more > aggressive one. Yes but why is being silent any good? Optimization? Any data to show that it will help some workload? ... > > OK so TELL says *when* to notify host, SILENT if set allows guest > > to skip leak notifications? In this case TELL should just be ignored > > when SILENT is set. > > Yeah, that was my first idea. However, there are existing drivers that > ignore SILENT, so that would not be 100% exact. Not sure I follow the logic. They don't ack SILENT so that would be 100% exact.
Il 10/09/2012 08:03, Michael S. Tsirkin ha scritto: > On Mon, Sep 10, 2012 at 07:50:13AM +0200, Paolo Bonzini wrote: >> Il 09/09/2012 00:22, Michael S. Tsirkin ha scritto: >>>> Almost. One is "the guest, if really needed, can tell the host of >>>> pages". If not negotiated, and the host does not support it, the host >>>> must break the guest (e.g. fail to offer any virtqueues). >>> >>> There is no way in spec to break the guest. >>> You can not fail to offer virtqueues. >> >> You can always return 0 for the first queue. > > I don't think guest drivers recover gracefully from this. > Do they? No, that's the point ("break the guest" is really "break the driver"). >>> Besides, there is no guarantee that virtqueue setup >>> happens after feature negotiation. >> >> It is the only way that makes sense though (unless the guest would write >> 0 for its features). Should we change that? > > I do think it would be nice to add a generic way for device to > notify guest about an internal failure. > This can only happen after DRIVER_OK status is written though, > and since existing drivers do not expect such failure, it might > be too late. Agreed. >>>> The other is "the guest, though, would prefer not to do so". It is >>>> different because the guest can proceed in a fallback mode even if the >>>> host doesn't offer it. >>> >>> I think I get what your proposed SILENT means what I do not get >>> is the motivation. It looks like a premature optimization to me. >> >> The motivation is to let the driver choose between two behaviors: the >> current one where ballooning is only done on request, and a more >> aggressive one. > > Yes but why is being silent any good? Optimization? > Any data to show that it will help some workload? Idle guests can move cache pages to the balloon. You can overcommit more aggressively, because the host can madvise away a lot more memory. >>> OK so TELL says *when* to notify host, SILENT if set allows guest >>> to skip leak notifications? In this case TELL should just be ignored >>> when SILENT is set. >> >> Yeah, that was my first idea. However, there are existing drivers that >> ignore SILENT, so that would not be 100% exact. > > Not sure I follow the logic. > They don't ack SILENT so that would be 100% exact. Hmm, then I'm not sure I follow yours. We agreed that delaying notifications or skipping them is really the same thing, right? I think we're just stuck in a linguistic problem, with "must not" being wrong and "does not have to" being too verbose. Calling it VIRTIO_BALLOON_F_SILENT_DEFLATE was a workaround for this, but perhaps it adds more confusion. Paolo -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On Mon, Sep 10, 2012 at 08:38:09AM +0200, Paolo Bonzini wrote: > Il 10/09/2012 08:03, Michael S. Tsirkin ha scritto: > > On Mon, Sep 10, 2012 at 07:50:13AM +0200, Paolo Bonzini wrote: > >> Il 09/09/2012 00:22, Michael S. Tsirkin ha scritto: > >>>> Almost. One is "the guest, if really needed, can tell the host of > >>>> pages". If not negotiated, and the host does not support it, the host > >>>> must break the guest (e.g. fail to offer any virtqueues). > >>> > >>> There is no way in spec to break the guest. > >>> You can not fail to offer virtqueues. > >> > >> You can always return 0 for the first queue. > > > > I don't think guest drivers recover gracefully from this. > > Do they? > > No, that's the point ("break the guest" is really "break the driver"). You can just stop VM then. No need for a side channel. ... > >>>> The other is "the guest, though, would prefer not to do so". It is > >>>> different because the guest can proceed in a fallback mode even if the > >>>> host doesn't offer it. > >>> > >>> I think I get what your proposed SILENT means what I do not get > >>> is the motivation. It looks like a premature optimization to me. > >> > >> The motivation is to let the driver choose between two behaviors: the > >> current one where ballooning is only done on request, and a more > >> aggressive one. > > > > Yes but why is being silent any good? Optimization? > > Any data to show that it will help some workload? > > Idle guests can move cache pages to the balloon. You can overcommit > more aggressively, because the host can madvise away a lot more memory. IMHO this feature needs more thought. E.g. how will this work with assignment? If we build something let's build it in a way that plays nicely with other features. > >>> OK so TELL says *when* to notify host, SILENT if set allows guest > >>> to skip leak notifications? In this case TELL should just be ignored > >>> when SILENT is set. > >> > >> Yeah, that was my first idea. However, there are existing drivers that > >> ignore SILENT, so that would not be 100% exact. > > > > Not sure I follow the logic. > > They don't ack SILENT so that would be 100% exact. > > Hmm, then I'm not sure I follow yours. We agreed that delaying > notifications or skipping them is really the same thing, right? > > I think we're just stuck in a linguistic problem, with "must not" being > wrong and "does not have to" being too verbose. Calling it > VIRTIO_BALLOON_F_SILENT_DEFLATE was a workaround for this, but perhaps > it adds more confusion. > > Paolo Looks like it does :) -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Il 10/09/2012 08:47, Michael S. Tsirkin ha scritto: > On Mon, Sep 10, 2012 at 08:38:09AM +0200, Paolo Bonzini wrote: >> Il 10/09/2012 08:03, Michael S. Tsirkin ha scritto: >>> On Mon, Sep 10, 2012 at 07:50:13AM +0200, Paolo Bonzini wrote: >>>> Il 09/09/2012 00:22, Michael S. Tsirkin ha scritto: >>>>>> Almost. One is "the guest, if really needed, can tell the host of >>>>>> pages". If not negotiated, and the host does not support it, the host >>>>>> must break the guest (e.g. fail to offer any virtqueues). >>>>> >>>>> There is no way in spec to break the guest. >>>>> You can not fail to offer virtqueues. >>>> >>>> You can always return 0 for the first queue. >>> >>> I don't think guest drivers recover gracefully from this. >>> Do they? >> >> No, that's the point ("break the guest" is really "break the driver"). > > You can just stop VM then. No need for a side channel. Keeping the VM running, just with no balloon driver is preferrable. >>>>>> The other is "the guest, though, would prefer not to do so". It is >>>>>> different because the guest can proceed in a fallback mode even if the >>>>>> host doesn't offer it. >>>>> >>>>> I think I get what your proposed SILENT means what I do not get >>>>> is the motivation. It looks like a premature optimization to me. >>>> >>>> The motivation is to let the driver choose between two behaviors: the >>>> current one where ballooning is only done on request, and a more >>>> aggressive one. >>> >>> Yes but why is being silent any good? Optimization? >>> Any data to show that it will help some workload? >> >> Idle guests can move cache pages to the balloon. You can overcommit >> more aggressively, because the host can madvise away a lot more memory. > > IMHO this feature needs more thought. E.g. how will this work with assignment? Revert to normal cooperative ballooning. > If we build something let's build it in a way that plays nicely > with other features. Yes, that's the point of SILENT. :) Paolo -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
"Michael S. Tsirkin" <mst@redhat.com> writes: > We have several options: > 1. No memory overcommit feature. This is always the case ATM! > 2. Do not hot-plug assigned device. > 3. Hot-unplug assigned device. > 4. Some assigned devices can be able to handle memory errors > e.g. using ATS. Limit hotplug to these. OK, let's leave it alone, and the first person to try to implement it in a device can figure out how to fall back. I replied on the other thread: I think a SILENT_DEFLATE option is a future optimization. Cheers, Rusty. -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
diff --git a/virtio-spec.lyx b/virtio-spec.lyx index 7a073f4..1a25a18 100644 --- a/virtio-spec.lyx +++ b/virtio-spec.lyx @@ -6238,6 +6238,8 @@ bits \begin_deeper \begin_layout Description + +\change_deleted 1531152142 1346917221 VIRTIO_BALLOON_F_MUST_TELL_HOST \begin_inset space ~ \end_inset @@ -6251,6 +6253,20 @@ VIRTIO_BALLOON_F_STATS_VQ \end_inset (1) A virtqueue for reporting guest memory statistics is present. +\change_inserted 1531152142 1346917193 + +\end_layout + +\begin_layout Description + +\change_inserted 1531152142 1346917219 +VIRTIO_BALLOON_F_SILENT_DEFLATE +\begin_inset space ~ +\end_inset + +(2) Host does not need to be told before pages from the balloon are used. +\change_unchanged + \end_layout \end_deeper @@ -6401,9 +6417,23 @@ The driver constructs an array of addresses of memory pages it has previously \end_layout \begin_layout Enumerate -If the VIRTIO_BALLOON_F_MUST_TELL_HOST feature is set, the guest may not - use these requested pages until that descriptor in the deflateq has been - used by the device. +If the VIRTIO_BALLOON_F_ +\change_deleted 1531152142 1346917234 +MUST_TELL_HOST +\change_inserted 1531152142 1346917237 +SILENT_DEFLATE +\change_unchanged + feature is +\change_inserted 1531152142 1346917241 +not +\change_unchanged +set, the guest may not use these requested pages until that descriptor in + the deflateq has been used by the device. + +\change_inserted 1531152142 1346917253 + If it is set, the guest may choose to not use the deflateq at all. +\change_unchanged + \end_layout \begin_layout Enumerate
VIRTIO_BALLOON_F_MUST_TELL_HOST cannot be used properly because it is a "negative" feature: it tells you that silent defalte is not supported. Right now, QEMU refuses migration if the target does not support all the features that were negotiated. But then: - a migration from non-MUST_TELL_HOST to MUST_TELL_HOST will succeed, which is wrong; - a migration from MUST_TELL_HOST to non-MUST_TELL_HOST will fail, which is useless. Add instead a new feature VIRTIO_BALLOON_F_SILENT_DEFLATE, and deprecate VIRTIO_BALLOON_F_MUST_TELL_HOST since it is never actually used. Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> --- virtio-spec.lyx | 36 +++++++++++++++++++++++++++++++++--- 1 file modificato, 33 inserzioni(+), 3 rimozioni(-)