Message ID | 20200527041414.12700.50293.stgit@localhost.localdomain (mailing list archive) |
---|---|
State | New, archived |
Headers | show |
Series | virtio-balloon: add support for page poison and free page reporting | expand |
On Tue, May 26, 2020 at 9:14 PM Alexander Duyck <alexander.duyck@gmail.com> wrote: > > From: Alexander Duyck <alexander.h.duyck@linux.intel.com> > > In an upcoming patch a feature named Free Page Reporting is about to be > added. In order to avoid any confusion we should drop the use of the word > 'report' when referring to Free Page Hinting. So what this patch does is go > through and replace all instances of 'report' with 'hint" when we are > referring to free page hinting. > > Acked-by: David Hildenbrand <david@redhat.com> > Signed-off-by: Alexander Duyck <alexander.h.duyck@linux.intel.com> > --- > hw/virtio/virtio-balloon.c | 78 ++++++++++++++++++------------------ > include/hw/virtio/virtio-balloon.h | 20 +++++---- > 2 files changed, 49 insertions(+), 49 deletions(-) > > diff --git a/hw/virtio/virtio-balloon.c b/hw/virtio/virtio-balloon.c > index 3e2ac1104b5f..dc15409b0bb6 100644 > --- a/hw/virtio/virtio-balloon.c > +++ b/hw/virtio/virtio-balloon.c ... > @@ -817,14 +817,14 @@ static int virtio_balloon_post_load_device(void *opaque, int version_id) > return 0; > } > > -static const VMStateDescription vmstate_virtio_balloon_free_page_report = { > +static const VMStateDescription vmstate_virtio_balloon_free_page_hint = { > .name = "virtio-balloon-device/free-page-report", > .version_id = 1, > .minimum_version_id = 1, > .needed = virtio_balloon_free_page_support, > .fields = (VMStateField[]) { > - VMSTATE_UINT32(free_page_report_cmd_id, VirtIOBalloon), > - VMSTATE_UINT32(free_page_report_status, VirtIOBalloon), > + VMSTATE_UINT32(free_page_hint_cmd_id, VirtIOBalloon), > + VMSTATE_UINT32(free_page_hint_status, VirtIOBalloon), > VMSTATE_END_OF_LIST() > } > }; So I noticed this patch wasn't in the list of patches pulled, but that is probably for the best since I believe the change above might have broken migration as VMSTATE_UINT32 does a stringify on the first parameter. Any advice on how to address it, or should I just give up on renaming free_page_report_cmd_id and free_page_report_status? Looking at this I wonder why we even need to migrate these values? It seems like if we are completing a migration the cmd_id should always be "DONE" shouldn't it? It isn't as if we are going to migrate the hinting from one host to another. We will have to start over which is essentially the signal that the "DONE" value provides. Same thing for the status. We shouldn't be able to migrate unless both of these are already in the "DONE" state so if anything I wonder if we shouldn't have that as the initial state for the device and just drop the migration info. Thanks. - Alex
On 13.06.20 22:07, Alexander Duyck wrote: > On Tue, May 26, 2020 at 9:14 PM Alexander Duyck > <alexander.duyck@gmail.com> wrote: >> >> From: Alexander Duyck <alexander.h.duyck@linux.intel.com> >> >> In an upcoming patch a feature named Free Page Reporting is about to be >> added. In order to avoid any confusion we should drop the use of the word >> 'report' when referring to Free Page Hinting. So what this patch does is go >> through and replace all instances of 'report' with 'hint" when we are >> referring to free page hinting. >> >> Acked-by: David Hildenbrand <david@redhat.com> >> Signed-off-by: Alexander Duyck <alexander.h.duyck@linux.intel.com> >> --- >> hw/virtio/virtio-balloon.c | 78 ++++++++++++++++++------------------ >> include/hw/virtio/virtio-balloon.h | 20 +++++---- >> 2 files changed, 49 insertions(+), 49 deletions(-) >> >> diff --git a/hw/virtio/virtio-balloon.c b/hw/virtio/virtio-balloon.c >> index 3e2ac1104b5f..dc15409b0bb6 100644 >> --- a/hw/virtio/virtio-balloon.c >> +++ b/hw/virtio/virtio-balloon.c > > ... > >> @@ -817,14 +817,14 @@ static int virtio_balloon_post_load_device(void *opaque, int version_id) >> return 0; >> } >> >> -static const VMStateDescription vmstate_virtio_balloon_free_page_report = { >> +static const VMStateDescription vmstate_virtio_balloon_free_page_hint = { >> .name = "virtio-balloon-device/free-page-report", >> .version_id = 1, >> .minimum_version_id = 1, >> .needed = virtio_balloon_free_page_support, >> .fields = (VMStateField[]) { >> - VMSTATE_UINT32(free_page_report_cmd_id, VirtIOBalloon), >> - VMSTATE_UINT32(free_page_report_status, VirtIOBalloon), >> + VMSTATE_UINT32(free_page_hint_cmd_id, VirtIOBalloon), >> + VMSTATE_UINT32(free_page_hint_status, VirtIOBalloon), >> VMSTATE_END_OF_LIST() >> } >> }; > > So I noticed this patch wasn't in the list of patches pulled, but that > is probably for the best since I believe the change above might have > broken migration as VMSTATE_UINT32 does a stringify on the first > parameter. Indeed, it's the name of the vmstate field. But I don't think it is relevant for migration. It's and indicator if a field is valid and it's used in traces/error messages. See git grep "field->name" I don't think renaming this is problematic. Can you rebase and resent? Thanks! > Any advice on how to address it, or should I just give up on renaming > free_page_report_cmd_id and free_page_report_status? > > Looking at this I wonder why we even need to migrate these values? It > seems like if we are completing a migration the cmd_id should always > be "DONE" shouldn't it? It isn't as if we are going to migrate the The *status* should be DONE IIUC. The cmd_id might be relevant, no? It's always incremented until it wraps. > hinting from one host to another. We will have to start over which is > essentially the signal that the "DONE" value provides. Same thing for > the status. We shouldn't be able to migrate unless both of these are > already in the "DONE" state so if anything I wonder if we shouldn't > have that as the initial state for the device and just drop the > migration info. We'll have to glue that to a compat machine unfortunately, so we can just keep migrating it ... :(
On Thu, Jun 18, 2020 at 5:54 AM David Hildenbrand <david@redhat.com> wrote: > > On 13.06.20 22:07, Alexander Duyck wrote: > > On Tue, May 26, 2020 at 9:14 PM Alexander Duyck > > <alexander.duyck@gmail.com> wrote: > >> > >> From: Alexander Duyck <alexander.h.duyck@linux.intel.com> > >> > >> In an upcoming patch a feature named Free Page Reporting is about to be > >> added. In order to avoid any confusion we should drop the use of the word > >> 'report' when referring to Free Page Hinting. So what this patch does is go > >> through and replace all instances of 'report' with 'hint" when we are > >> referring to free page hinting. > >> > >> Acked-by: David Hildenbrand <david@redhat.com> > >> Signed-off-by: Alexander Duyck <alexander.h.duyck@linux.intel.com> > >> --- > >> hw/virtio/virtio-balloon.c | 78 ++++++++++++++++++------------------ > >> include/hw/virtio/virtio-balloon.h | 20 +++++---- > >> 2 files changed, 49 insertions(+), 49 deletions(-) > >> > >> diff --git a/hw/virtio/virtio-balloon.c b/hw/virtio/virtio-balloon.c > >> index 3e2ac1104b5f..dc15409b0bb6 100644 > >> --- a/hw/virtio/virtio-balloon.c > >> +++ b/hw/virtio/virtio-balloon.c > > > > ... > > > >> @@ -817,14 +817,14 @@ static int virtio_balloon_post_load_device(void *opaque, int version_id) > >> return 0; > >> } > >> > >> -static const VMStateDescription vmstate_virtio_balloon_free_page_report = { > >> +static const VMStateDescription vmstate_virtio_balloon_free_page_hint = { > >> .name = "virtio-balloon-device/free-page-report", > >> .version_id = 1, > >> .minimum_version_id = 1, > >> .needed = virtio_balloon_free_page_support, > >> .fields = (VMStateField[]) { > >> - VMSTATE_UINT32(free_page_report_cmd_id, VirtIOBalloon), > >> - VMSTATE_UINT32(free_page_report_status, VirtIOBalloon), > >> + VMSTATE_UINT32(free_page_hint_cmd_id, VirtIOBalloon), > >> + VMSTATE_UINT32(free_page_hint_status, VirtIOBalloon), > >> VMSTATE_END_OF_LIST() > >> } > >> }; > > > > So I noticed this patch wasn't in the list of patches pulled, but that > > is probably for the best since I believe the change above might have > > broken migration as VMSTATE_UINT32 does a stringify on the first > > parameter. > > Indeed, it's the name of the vmstate field. But I don't think it is > relevant for migration. It's and indicator if a field is valid and it's > used in traces/error messages. > > See git grep "field->name" > > I don't think renaming this is problematic. Can you rebase and resent? > Thanks! Okay, I will. > > Any advice on how to address it, or should I just give up on renaming > > free_page_report_cmd_id and free_page_report_status? > > > > Looking at this I wonder why we even need to migrate these values? It > > seems like if we are completing a migration the cmd_id should always > > be "DONE" shouldn't it? It isn't as if we are going to migrate the > > The *status* should be DONE IIUC. The cmd_id might be relevant, no? It's > always incremented until it wraps. The thing is, the cmd_id visible to the driver if the status is DONE is the cmd_id value for DONE. So as long as the driver acknowledges the value we could essentially start over the cmd_id without any negative effect. The driver would have to put down a new descriptor to start a block of hinting in order to begin reporting again so there shouldn't be any risk of us falsely hinting pages that were in a previous epoch. Ugh, although now looking at it I think we might have a bug in the QEMU code in that the driver could in theory force its way past a "STOP" by just replaying the last command_id descriptor and then keep going. Should be a pretty easy fix though as we should only allow a transition to S_START if the status is S_REQUESTED/ > > hinting from one host to another. We will have to start over which is > > essentially the signal that the "DONE" value provides. Same thing for > > the status. We shouldn't be able to migrate unless both of these are > > already in the "DONE" state so if anything I wonder if we shouldn't > > have that as the initial state for the device and just drop the > > migration info. > > We'll have to glue that to a compat machine unfortunately, so we can > just keep migrating it ... :( Yeah, I kind of figured that would be the case. However if the name change is not an issue then it should not be a problem. Thanks. - Alex
On 18.06.20 17:14, Alexander Duyck wrote: > On Thu, Jun 18, 2020 at 5:54 AM David Hildenbrand <david@redhat.com> wrote: >> >> On 13.06.20 22:07, Alexander Duyck wrote: >>> On Tue, May 26, 2020 at 9:14 PM Alexander Duyck >>> <alexander.duyck@gmail.com> wrote: >>>> >>>> From: Alexander Duyck <alexander.h.duyck@linux.intel.com> >>>> >>>> In an upcoming patch a feature named Free Page Reporting is about to be >>>> added. In order to avoid any confusion we should drop the use of the word >>>> 'report' when referring to Free Page Hinting. So what this patch does is go >>>> through and replace all instances of 'report' with 'hint" when we are >>>> referring to free page hinting. >>>> >>>> Acked-by: David Hildenbrand <david@redhat.com> >>>> Signed-off-by: Alexander Duyck <alexander.h.duyck@linux.intel.com> >>>> --- >>>> hw/virtio/virtio-balloon.c | 78 ++++++++++++++++++------------------ >>>> include/hw/virtio/virtio-balloon.h | 20 +++++---- >>>> 2 files changed, 49 insertions(+), 49 deletions(-) >>>> >>>> diff --git a/hw/virtio/virtio-balloon.c b/hw/virtio/virtio-balloon.c >>>> index 3e2ac1104b5f..dc15409b0bb6 100644 >>>> --- a/hw/virtio/virtio-balloon.c >>>> +++ b/hw/virtio/virtio-balloon.c >>> >>> ... >>> >>>> @@ -817,14 +817,14 @@ static int virtio_balloon_post_load_device(void *opaque, int version_id) >>>> return 0; >>>> } >>>> >>>> -static const VMStateDescription vmstate_virtio_balloon_free_page_report = { >>>> +static const VMStateDescription vmstate_virtio_balloon_free_page_hint = { >>>> .name = "virtio-balloon-device/free-page-report", >>>> .version_id = 1, >>>> .minimum_version_id = 1, >>>> .needed = virtio_balloon_free_page_support, >>>> .fields = (VMStateField[]) { >>>> - VMSTATE_UINT32(free_page_report_cmd_id, VirtIOBalloon), >>>> - VMSTATE_UINT32(free_page_report_status, VirtIOBalloon), >>>> + VMSTATE_UINT32(free_page_hint_cmd_id, VirtIOBalloon), >>>> + VMSTATE_UINT32(free_page_hint_status, VirtIOBalloon), >>>> VMSTATE_END_OF_LIST() >>>> } >>>> }; >>> >>> So I noticed this patch wasn't in the list of patches pulled, but that >>> is probably for the best since I believe the change above might have >>> broken migration as VMSTATE_UINT32 does a stringify on the first >>> parameter. >> >> Indeed, it's the name of the vmstate field. But I don't think it is >> relevant for migration. It's and indicator if a field is valid and it's >> used in traces/error messages. >> >> See git grep "field->name" >> >> I don't think renaming this is problematic. Can you rebase and resent? >> Thanks! > > Okay, I will. > >>> Any advice on how to address it, or should I just give up on renaming >>> free_page_report_cmd_id and free_page_report_status? >>> >>> Looking at this I wonder why we even need to migrate these values? It >>> seems like if we are completing a migration the cmd_id should always >>> be "DONE" shouldn't it? It isn't as if we are going to migrate the >> >> The *status* should be DONE IIUC. The cmd_id might be relevant, no? It's >> always incremented until it wraps. > > The thing is, the cmd_id visible to the driver if the status is DONE > is the cmd_id value for DONE. So as long as the driver acknowledges > the value we could essentially start over the cmd_id without any > negative effect. The driver would have to put down a new descriptor to > start a block of hinting in order to begin reporting again so there > shouldn't be any risk of us falsely hinting pages that were in a > previous epoch. > > Ugh, although now looking at it I think we might have a bug in the > QEMU code in that the driver could in theory force its way past a > "STOP" by just replaying the last command_id descriptor and then keep > going. Should be a pretty easy fix though as we should only allow a > transition to S_START if the status is S_REQUESTED/ Ugh, ... @MST, you might have missed that in another discussion, what's your general opinion about removing free page hinting in QEMU (and Linux)? We keep finding issues in the QEMU implementation, including non-trivial ones, and have to speculate about the actual semantics. I can see that e.g., libvirt does not support it yet.
On Thu, Jun 18, 2020 at 05:58:28PM +0200, David Hildenbrand wrote: > On 18.06.20 17:14, Alexander Duyck wrote: > > On Thu, Jun 18, 2020 at 5:54 AM David Hildenbrand <david@redhat.com> wrote: > >> > >> On 13.06.20 22:07, Alexander Duyck wrote: > >>> On Tue, May 26, 2020 at 9:14 PM Alexander Duyck > >>> <alexander.duyck@gmail.com> wrote: > >>>> > >>>> From: Alexander Duyck <alexander.h.duyck@linux.intel.com> > >>>> > >>>> In an upcoming patch a feature named Free Page Reporting is about to be > >>>> added. In order to avoid any confusion we should drop the use of the word > >>>> 'report' when referring to Free Page Hinting. So what this patch does is go > >>>> through and replace all instances of 'report' with 'hint" when we are > >>>> referring to free page hinting. > >>>> > >>>> Acked-by: David Hildenbrand <david@redhat.com> > >>>> Signed-off-by: Alexander Duyck <alexander.h.duyck@linux.intel.com> > >>>> --- > >>>> hw/virtio/virtio-balloon.c | 78 ++++++++++++++++++------------------ > >>>> include/hw/virtio/virtio-balloon.h | 20 +++++---- > >>>> 2 files changed, 49 insertions(+), 49 deletions(-) > >>>> > >>>> diff --git a/hw/virtio/virtio-balloon.c b/hw/virtio/virtio-balloon.c > >>>> index 3e2ac1104b5f..dc15409b0bb6 100644 > >>>> --- a/hw/virtio/virtio-balloon.c > >>>> +++ b/hw/virtio/virtio-balloon.c > >>> > >>> ... > >>> > >>>> @@ -817,14 +817,14 @@ static int virtio_balloon_post_load_device(void *opaque, int version_id) > >>>> return 0; > >>>> } > >>>> > >>>> -static const VMStateDescription vmstate_virtio_balloon_free_page_report = { > >>>> +static const VMStateDescription vmstate_virtio_balloon_free_page_hint = { > >>>> .name = "virtio-balloon-device/free-page-report", > >>>> .version_id = 1, > >>>> .minimum_version_id = 1, > >>>> .needed = virtio_balloon_free_page_support, > >>>> .fields = (VMStateField[]) { > >>>> - VMSTATE_UINT32(free_page_report_cmd_id, VirtIOBalloon), > >>>> - VMSTATE_UINT32(free_page_report_status, VirtIOBalloon), > >>>> + VMSTATE_UINT32(free_page_hint_cmd_id, VirtIOBalloon), > >>>> + VMSTATE_UINT32(free_page_hint_status, VirtIOBalloon), > >>>> VMSTATE_END_OF_LIST() > >>>> } > >>>> }; > >>> > >>> So I noticed this patch wasn't in the list of patches pulled, but that > >>> is probably for the best since I believe the change above might have > >>> broken migration as VMSTATE_UINT32 does a stringify on the first > >>> parameter. > >> > >> Indeed, it's the name of the vmstate field. But I don't think it is > >> relevant for migration. It's and indicator if a field is valid and it's > >> used in traces/error messages. > >> > >> See git grep "field->name" > >> > >> I don't think renaming this is problematic. Can you rebase and resent? > >> Thanks! > > > > Okay, I will. > > > >>> Any advice on how to address it, or should I just give up on renaming > >>> free_page_report_cmd_id and free_page_report_status? > >>> > >>> Looking at this I wonder why we even need to migrate these values? It > >>> seems like if we are completing a migration the cmd_id should always > >>> be "DONE" shouldn't it? It isn't as if we are going to migrate the > >> > >> The *status* should be DONE IIUC. The cmd_id might be relevant, no? It's > >> always incremented until it wraps. > > > > The thing is, the cmd_id visible to the driver if the status is DONE > > is the cmd_id value for DONE. So as long as the driver acknowledges > > the value we could essentially start over the cmd_id without any > > negative effect. The driver would have to put down a new descriptor to > > start a block of hinting in order to begin reporting again so there > > shouldn't be any risk of us falsely hinting pages that were in a > > previous epoch. > > > > Ugh, although now looking at it I think we might have a bug in the > > QEMU code in that the driver could in theory force its way past a > > "STOP" by just replaying the last command_id descriptor and then keep > > going. Should be a pretty easy fix though as we should only allow a > > transition to S_START if the status is S_REQUESTED/ > > Ugh, ... > > @MST, you might have missed that in another discussion, what's your > general opinion about removing free page hinting in QEMU (and Linux)? We > keep finding issues in the QEMU implementation, including non-trivial > ones, and have to speculate about the actual semantics. I can see that > e.g., libvirt does not support it yet. Not maintaining two similar features sounds attractive. I'm still trying to get my head around the list of issues. So far they all look kind of minor to me. Would you like to summarize them somewhere? The appeal of hinting is that it's 0 overhead outside migration, and pains were taken to avoid keeping pages locked while hypervisor is busy. If we are to drop hinting completely we need to show that reporting can be comparable, and we'll probably want to add a mode for reporting that behaves somewhat similarly.
>> >> Ugh, ... >> >> @MST, you might have missed that in another discussion, what's your >> general opinion about removing free page hinting in QEMU (and Linux)? We >> keep finding issues in the QEMU implementation, including non-trivial >> ones, and have to speculate about the actual semantics. I can see that >> e.g., libvirt does not support it yet. > > Not maintaining two similar features sounds attractive. I consider free page hinting (in QEMU) to be in an unmaintainable state (and it looks like Alex and I are fixing a feature we don't actually intend to use / not aware of users). In contrast to that, the free page reporting functionality/implementation is a walk in the park. > > I'm still trying to get my head around the list of issues. So far they > all look kind of minor to me. Would you like to summarize them > somewhere? Some things I still have in my mind 1. If migration fails during RAM precopy, the guest will never receive a DONE notification. Probably easy to fix. 2. Unclear semantics. Alex tried to document what the actual semantics of hinted pages are. Assume the following in the guest to a previously hinted page /* page was hinted and is reused now */ if (page[x] != Y) page[x] == Y; /* migration ends, we now run on the destination */ BUG_ON(page[x] != Y); /* BUG, because the content chan A guest can observe that. And that could be a random driver that just allocated a page. (I *assume* in Linux we might catch that using kasan, but I am not 100% sure, also, the actual semantics to document are unclear - e.g., for other guests) As Alex mentioned, it is not even guaranteed in QEMU that we receive a zero page on the destination, it could also be something else (e.g., previously migrated values). 3. If I am not wrong, the iothread works in virtio_ballloon_get_free_page_hints() on the virtqueue only with holding the free_page_lock (no BQL). Assume we're migrating, the iothread is active, and the guest triggers a device reset. virtio_balloon_device_reset() will trigger a virtio_balloon_free_page_stop(s). That won't actually wait for the iothread to stop, it will only temporarily lock free_page_lock and update s->free_page_report_status. I think there can be a race between the device reset and the iothread. Once virtio_balloon_free_page_stop() returned, virtio_ballloon_get_free_page_hints() can still call - virtio_queue_set_notification(vq, 0); - virtio_queue_set_notification(vq, 1); - virtio_notify(vdev, vq); - virtqueue_pop() I doubt this is very nice. There are other concerns I had regarding the iothread (e.g., while reporting is active, virtio_ballloon_get_free_page_hints() is essentially a busy loop, in contrast to documented - continue_to_get_hints will always be true). > The appeal of hinting is that it's 0 overhead outside migration, > and pains were taken to avoid keeping pages locked while > hypervisor is busy. > > If we are to drop hinting completely we need to show that reporting > can be comparable, and we'll probably want to add a mode for > reporting that behaves somewhat similarly. Depends on the actual users. If we're dropping a feature that nobody is actively using, I don't think we have to show anything. This feature obviously saw no proper review.
> There are other concerns I had regarding the iothread (e.g., while > reporting is active, virtio_ballloon_get_free_page_hints() is > essentially a busy loop, in contrast to documented - > continue_to_get_hints will always be true). FWIW, I just double checked this and my memory was bad. -
* Alexander Duyck (alexander.duyck@gmail.com) wrote: > On Tue, May 26, 2020 at 9:14 PM Alexander Duyck > <alexander.duyck@gmail.com> wrote: > > > > From: Alexander Duyck <alexander.h.duyck@linux.intel.com> > > > > In an upcoming patch a feature named Free Page Reporting is about to be > > added. In order to avoid any confusion we should drop the use of the word > > 'report' when referring to Free Page Hinting. So what this patch does is go > > through and replace all instances of 'report' with 'hint" when we are > > referring to free page hinting. > > > > Acked-by: David Hildenbrand <david@redhat.com> > > Signed-off-by: Alexander Duyck <alexander.h.duyck@linux.intel.com> > > --- > > hw/virtio/virtio-balloon.c | 78 ++++++++++++++++++------------------ > > include/hw/virtio/virtio-balloon.h | 20 +++++---- > > 2 files changed, 49 insertions(+), 49 deletions(-) > > > > diff --git a/hw/virtio/virtio-balloon.c b/hw/virtio/virtio-balloon.c > > index 3e2ac1104b5f..dc15409b0bb6 100644 > > --- a/hw/virtio/virtio-balloon.c > > +++ b/hw/virtio/virtio-balloon.c > > ... > > > @@ -817,14 +817,14 @@ static int virtio_balloon_post_load_device(void *opaque, int version_id) > > return 0; > > } > > > > -static const VMStateDescription vmstate_virtio_balloon_free_page_report = { > > +static const VMStateDescription vmstate_virtio_balloon_free_page_hint = { > > .name = "virtio-balloon-device/free-page-report", > > .version_id = 1, > > .minimum_version_id = 1, > > .needed = virtio_balloon_free_page_support, > > .fields = (VMStateField[]) { > > - VMSTATE_UINT32(free_page_report_cmd_id, VirtIOBalloon), > > - VMSTATE_UINT32(free_page_report_status, VirtIOBalloon), > > + VMSTATE_UINT32(free_page_hint_cmd_id, VirtIOBalloon), > > + VMSTATE_UINT32(free_page_hint_status, VirtIOBalloon), > > VMSTATE_END_OF_LIST() > > } > > }; > > So I noticed this patch wasn't in the list of patches pulled, but that > is probably for the best since I believe the change above might have > broken migration as VMSTATE_UINT32 does a stringify on the first > parameter. > Any advice on how to address it, or should I just give up on renaming > free_page_report_cmd_id and free_page_report_status? The filed names never hit the wire; the migration format is trivial binary, especially of things like integers - that lands as just 4 bytes on the wire [ hopefully in the place where the destination expects to receive them ]. You need to be careful of the names of top level vmstate devices, and the names of subsections; I don't think any other naming is in the stream. (We've even done hacks in the past of converting a VMSTATE_UINT32 to a pair of UINT16 ) Dave > Looking at this I wonder why we even need to migrate these values? It > seems like if we are completing a migration the cmd_id should always > be "DONE" shouldn't it? It isn't as if we are going to migrate the > hinting from one host to another. We will have to start over which is > essentially the signal that the "DONE" value provides. Same thing for > the status. We shouldn't be able to migrate unless both of these are > already in the "DONE" state so if anything I wonder if we shouldn't > have that as the initial state for the device and just drop the > migration info. > > Thanks. > > - Alex > > --------------------------------------------------------------------- > To unsubscribe, e-mail: virtio-dev-unsubscribe@lists.oasis-open.org > For additional commands, e-mail: virtio-dev-help@lists.oasis-open.org > -- Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK
On Thu, Jun 18, 2020 at 10:10 AM David Hildenbrand <david@redhat.com> wrote: > > >> > >> Ugh, ... > >> > >> @MST, you might have missed that in another discussion, what's your > >> general opinion about removing free page hinting in QEMU (and Linux)? We > >> keep finding issues in the QEMU implementation, including non-trivial > >> ones, and have to speculate about the actual semantics. I can see that > >> e.g., libvirt does not support it yet. > > > > Not maintaining two similar features sounds attractive. Agreed. Just to make sure we are all on the same page I am adding Wei Wang since he was the original author for page hinting. > I consider free page hinting (in QEMU) to be in an unmaintainable state > (and it looks like Alex and I are fixing a feature we don't actually > intend to use / not aware of users). In contrast to that, the free page > reporting functionality/implementation is a walk in the park. > > > > > I'm still trying to get my head around the list of issues. So far they > > all look kind of minor to me. Would you like to summarize them > > somewhere? > > Some things I still have in my mind > > > 1. If migration fails during RAM precopy, the guest will never receive a > DONE notification. Probably easy to fix. Agreed. It is just a matter of finding the right point to add a hook so that if we abort the migration we can report DONE. > 2. Unclear semantics. Alex tried to document what the actual semantics > of hinted pages are. Assume the following in the guest to a previously > hinted page > > /* page was hinted and is reused now */ > if (page[x] != Y) > page[x] == Y; > /* migration ends, we now run on the destination */ > BUG_ON(page[x] != Y); > /* BUG, because the content chan > > A guest can observe that. And that could be a random driver that just > allocated a page. > > (I *assume* in Linux we might catch that using kasan, but I am not 100% > sure, also, the actual semantics to document are unclear - e.g., for > other guests) > > As Alex mentioned, it is not even guaranteed in QEMU that we receive a > zero page on the destination, it could also be something else (e.g., > previously migrated values). So this is only an issue for pages that are pushed out of the balloon as a part of the shrinker process though. So fixing it would be pretty straightforward as we would just have to initialize or at least dirty pages that are leaked as a part of the shrinker. That may have an impact on performance though as it would result in us dirtying pages that are freed as a result of the shrinker being triggered. > 3. If I am not wrong, the iothread works in > virtio_ballloon_get_free_page_hints() on the virtqueue only with holding > the free_page_lock (no BQL). > > Assume we're migrating, the iothread is active, and the guest triggers a > device reset. > > virtio_balloon_device_reset() will trigger a > virtio_balloon_free_page_stop(s). That won't actually wait for the > iothread to stop, it will only temporarily lock free_page_lock and > update s->free_page_report_status. > > I think there can be a race between the device reset and the iothread. > Once virtio_balloon_free_page_stop() returned, > virtio_ballloon_get_free_page_hints() can still call > - virtio_queue_set_notification(vq, 0); > - virtio_queue_set_notification(vq, 1); > - virtio_notify(vdev, vq); > - virtqueue_pop() > > I doubt this is very nice. And our conversation had me start looking though reference to virtio_balloon_free_page_stop. It looks like we call it for when we unrealize the device or reset the device. It might make more sense for us to look at pushing the status to DONE and forcing the iothread to be flushed out. > There are other concerns I had regarding the iothread (e.g., while > reporting is active, virtio_ballloon_get_free_page_hints() is > essentially a busy loop, in contrast to documented - > continue_to_get_hints will always be true). > > > The appeal of hinting is that it's 0 overhead outside migration, > > and pains were taken to avoid keeping pages locked while > > hypervisor is busy. > > > > If we are to drop hinting completely we need to show that reporting > > can be comparable, and we'll probably want to add a mode for > > reporting that behaves somewhat similarly. > > Depends on the actual users. If we're dropping a feature that nobody is > actively using, I don't think we have to show anything. > > This feature obviously saw no proper review. I'm pretty sure it had some, as it went through several iterations as I recall. However I don't think the review of the virtio interface was very detailed as I think most of the attention was on the kernel interface. As far as trying to do this with page reporting it would be doable, but I would need to use something like the command interface so that I would have a way to tell the driver when to drop the reported bit from the pages and when to stop/resume hinting. However it still wouldn't resolve the issue of copy on write style pages where the page is only read and never updated. I honestly wonder if it wouldn't be better to just apply the same logic that I have been doing with page reporting and migrate a zero page for a hinted page and MADVISE the page away so that it should be zero on the origin. Then you get the benefit of the guest shrinking as you perform the migration and would only be transmitting zero pages to the other side which should be significantly smaller if I am not mistaken.
> >> 2. Unclear semantics. Alex tried to document what the actual semantics >> of hinted pages are. Assume the following in the guest to a previously >> hinted page >> >> /* page was hinted and is reused now */ >> if (page[x] != Y) >> page[x] == Y; >> /* migration ends, we now run on the destination */ >> BUG_ON(page[x] != Y); >> /* BUG, because the content chan >> >> A guest can observe that. And that could be a random driver that just >> allocated a page. >> >> (I *assume* in Linux we might catch that using kasan, but I am not 100% >> sure, also, the actual semantics to document are unclear - e.g., for >> other guests) >> >> As Alex mentioned, it is not even guaranteed in QEMU that we receive a >> zero page on the destination, it could also be something else (e.g., >> previously migrated values). > > So this is only an issue for pages that are pushed out of the balloon > as a part of the shrinker process though. So fixing it would be pretty > straightforward as we would just have to initialize or at least dirty > pages that are leaked as a part of the shrinker. That may have an > impact on performance though as it would result in us dirtying pages > that are freed as a result of the shrinker being triggered. > It really depends on the desired semantics, which are unclear because there is no doc/spec. Either QEMU is buggy or the kernel is buggy. >> 3. If I am not wrong, the iothread works in >> virtio_ballloon_get_free_page_hints() on the virtqueue only with holding >> the free_page_lock (no BQL). >> >> Assume we're migrating, the iothread is active, and the guest triggers a >> device reset. >> >> virtio_balloon_device_reset() will trigger a >> virtio_balloon_free_page_stop(s). That won't actually wait for the >> iothread to stop, it will only temporarily lock free_page_lock and >> update s->free_page_report_status. >> >> I think there can be a race between the device reset and the iothread. >> Once virtio_balloon_free_page_stop() returned, >> virtio_ballloon_get_free_page_hints() can still call >> - virtio_queue_set_notification(vq, 0); >> - virtio_queue_set_notification(vq, 1); >> - virtio_notify(vdev, vq); >> - virtqueue_pop() >> >> I doubt this is very nice. > > And our conversation had me start looking though reference to > virtio_balloon_free_page_stop. It looks like we call it for when we > unrealize the device or reset the device. It might make more sense for > us to look at pushing the status to DONE and forcing the iothread to > be flushed out. > >> There are other concerns I had regarding the iothread (e.g., while >> reporting is active, virtio_ballloon_get_free_page_hints() is >> essentially a busy loop, in contrast to documented - >> continue_to_get_hints will always be true). >> >>> The appeal of hinting is that it's 0 overhead outside migration, >>> and pains were taken to avoid keeping pages locked while >>> hypervisor is busy. >>> >>> If we are to drop hinting completely we need to show that reporting >>> can be comparable, and we'll probably want to add a mode for >>> reporting that behaves somewhat similarly. >> >> Depends on the actual users. If we're dropping a feature that nobody is >> actively using, I don't think we have to show anything. >> >> This feature obviously saw no proper review. > > I'm pretty sure it had some, as it went through several iterations as > I recall. However I don't think the review of the virtio interface was > very detailed as I think most of the attention was on the kernel > interface. Yes, that‘s what I meant. The kernel side and the migration code (QEMU) got a lot of attention.
On Thu, Jun 18, 2020 at 07:10:43PM +0200, David Hildenbrand wrote: > >> > >> Ugh, ... > >> > >> @MST, you might have missed that in another discussion, what's your > >> general opinion about removing free page hinting in QEMU (and Linux)? We > >> keep finding issues in the QEMU implementation, including non-trivial > >> ones, and have to speculate about the actual semantics. I can see that > >> e.g., libvirt does not support it yet. > > > > Not maintaining two similar features sounds attractive. > > I consider free page hinting (in QEMU) to be in an unmaintainable state > (and it looks like Alex and I are fixing a feature we don't actually > intend to use / not aware of users). In contrast to that, the free page > reporting functionality/implementation is a walk in the park. So at the high level the idea was simple, we just clear the dirty bit when page is hinted, unless we sent a new command since. Implementation was reviewed by migration maintainers. If there's a consensus the code is written so badly we can't maintain it, maybe we should remove it. Which parts are unmaintainable in your eyes - migration or virtio ones? Or maybe it's the general thing that interface was never specced properly. > > > > I'm still trying to get my head around the list of issues. So far they > > all look kind of minor to me. Would you like to summarize them > > somewhere? > > Some things I still have in my mind Thanks for the summary! > > 1. If migration fails during RAM precopy, the guest will never receive a > DONE notification. Probably easy to fix. > > 2. Unclear semantics. Alex tried to document what the actual semantics > of hinted pages are. I'll reply to that now. > Assume the following in the guest to a previously > hinted page > > /* page was hinted and is reused now */ > if (page[x] != Y) > page[x] == Y; > /* migration ends, we now run on the destination */ > BUG_ON(page[x] != Y); > /* BUG, because the content chan The assumption hinting makes is that data in page is writtent to before it's used. > A guest can observe that. And that could be a random driver that just > allocated a page. > > (I *assume* in Linux we might catch that using kasan, but I am not 100% > sure, also, the actual semantics to document are unclear - e.g., for > other guests) I think it's basically simple: hinting means it's ok to fill page with trash unless it has been modified since the command ID supplied. > As Alex mentioned, it is not even guaranteed in QEMU that we receive a > zero page on the destination, it could also be something else (e.g., > previously migrated values). Absolutely. > 3. If I am not wrong, the iothread works in > virtio_ballloon_get_free_page_hints() on the virtqueue only with holding > the free_page_lock (no BQL). > > Assume we're migrating, the iothread is active, and the guest triggers a > device reset. > > virtio_balloon_device_reset() will trigger a > virtio_balloon_free_page_stop(s). That won't actually wait for the > iothread to stop, it will only temporarily lock free_page_lock and > update s->free_page_report_status. > > I think there can be a race between the device reset and the iothread. > Once virtio_balloon_free_page_stop() returned, > virtio_ballloon_get_free_page_hints() can still call > - virtio_queue_set_notification(vq, 0); > - virtio_queue_set_notification(vq, 1); > - virtio_notify(vdev, vq); > - virtqueue_pop() > > I doubt this is very nice. Reset is notoriously hard to get right. > There are other concerns I had regarding the iothread (e.g., while > reporting is active, virtio_ballloon_get_free_page_hints() is > essentially a busy loop, in contrast to documented - > continue_to_get_hints will always be true). So that would be a performance issue you are suggesting, right? > > The appeal of hinting is that it's 0 overhead outside migration, > > and pains were taken to avoid keeping pages locked while > > hypervisor is busy. > > > > If we are to drop hinting completely we need to show that reporting > > can be comparable, and we'll probably want to add a mode for > > reporting that behaves somewhat similarly. > > Depends on the actual users. If we're dropping a feature that nobody is > actively using, I don't think we have to show anything. I don't know how to find out. So far it doesn't look like we found any common data corruptions that would indicate no one can use it safely. Races around reset aren't all that uncommon but I don't think that qualifies as a deal breaker. I find the idea of asynchronously sending hints to host without waiting for them to be processed intriguing. Not something I'd work on implementing if we had reporting originally, but since it's there I'm not sure we should just discard it at this point. > This feature obviously saw no proper review. I did my best but obviously missed some things. > -- > Thanks, > > David / dhildenb
> So at the high level the idea was simple, we just clear the dirty bit > when page is hinted, unless we sent a new command since. Implementation > was reviewed by migration maintainers. If there's a consensus the code > is written so badly we can't maintain it, maybe we should remove it. > Which parts are unmaintainable in your eyes - migration or virtio ones? QEMU implementation without a propert virtio specification. I hope that we can *at least* finally document the expected behavior. Alex gave it a shot, and I was hoping that Wei could jump in to clarify, help move this forward ... after all he implemented (+designed?) the feature and the virtio interface. > Or maybe it's the general thing that interface was never specced > properly. Yes, a spec would be definitely a good starter ... [...] >> >> 1. If migration fails during RAM precopy, the guest will never receive a >> DONE notification. Probably easy to fix. >> >> 2. Unclear semantics. Alex tried to document what the actual semantics >> of hinted pages are. > > I'll reply to that now. > >> Assume the following in the guest to a previously >> hinted page >> >> /* page was hinted and is reused now */ >> if (page[x] != Y) >> page[x] == Y; >> /* migration ends, we now run on the destination */ >> BUG_ON(page[x] != Y); >> /* BUG, because the content chan > > The assumption hinting makes is that data in page is writtent to before it's used. > > >> A guest can observe that. And that could be a random driver that just >> allocated a page. >> >> (I *assume* in Linux we might catch that using kasan, but I am not 100% >> sure, also, the actual semantics to document are unclear - e.g., for >> other guests) > > I think it's basically simple: hinting means it's ok to > fill page with trash unless it has been modified since the command > ID supplied. Yeah, I quite dislike the semantics, especially, as they are different to well-know semantics as e.g., represent in MADV_FREE. Getting changed content when reading is really weird. But it seemed to be easier to implement (low hanging fruit) and nobody complained back then. Well, now we are stuck with it. [..] > >> There are other concerns I had regarding the iothread (e.g., while >> reporting is active, virtio_ballloon_get_free_page_hints() is >> essentially a busy loop, in contrast to documented - >> continue_to_get_hints will always be true). > > So that would be a performance issue you are suggesting, right? I misread the code, so that comment does no longer apply (see other message). > >>> The appeal of hinting is that it's 0 overhead outside migration, >>> and pains were taken to avoid keeping pages locked while >>> hypervisor is busy. >>> >>> If we are to drop hinting completely we need to show that reporting >>> can be comparable, and we'll probably want to add a mode for >>> reporting that behaves somewhat similarly. >> >> Depends on the actual users. If we're dropping a feature that nobody is >> actively using, I don't think we have to show anything. > > > I don't know how to find out. So far it doesn't look like we found > any common data corruptions that would indicate no one can use it safely. > Races around reset aren't all that uncommon but I don't think that > qualifies as a deal breaker. As I said, there are no libvirt bindings, so at least anything using libvirt does not use it. I'd be curious about actual users. > > I find the idea of asynchronously sending hints to host without > waiting for them to be processed intriguing. Not something > I'd work on implementing if we had reporting originally, > but since it's there I'm not sure we should just discard it > at this point. > >> This feature obviously saw no proper review. > > I did my best but obviously missed some things. Yeah, definitely not your fault. People cannot expect maintainers to review everything in detail.
On Wed, Jun 24, 2020 at 05:28:59PM +0200, David Hildenbrand wrote: > > So at the high level the idea was simple, we just clear the dirty bit > > when page is hinted, unless we sent a new command since. Implementation > > was reviewed by migration maintainers. If there's a consensus the code > > is written so badly we can't maintain it, maybe we should remove it. > > Which parts are unmaintainable in your eyes - migration or virtio ones? > > QEMU implementation without a propert virtio specification. I hope that > we can *at least* finally document the expected behavior. Alex gave it a > shot, and I was hoping that Wei could jump in to clarify, help move this > forward ... after all he implemented (+designed?) the feature and the > virtio interface. > > > Or maybe it's the general thing that interface was never specced > > properly. > > Yes, a spec would be definitely a good starter ... > > [...] > > >> > >> 1. If migration fails during RAM precopy, the guest will never receive a > >> DONE notification. Probably easy to fix. > >> > >> 2. Unclear semantics. Alex tried to document what the actual semantics > >> of hinted pages are. > > > > I'll reply to that now. > > > >> Assume the following in the guest to a previously > >> hinted page > >> > >> /* page was hinted and is reused now */ > >> if (page[x] != Y) > >> page[x] == Y; > >> /* migration ends, we now run on the destination */ > >> BUG_ON(page[x] != Y); > >> /* BUG, because the content chan > > > > The assumption hinting makes is that data in page is writtent to before it's used. > > > > > >> A guest can observe that. And that could be a random driver that just > >> allocated a page. > >> > >> (I *assume* in Linux we might catch that using kasan, but I am not 100% > >> sure, also, the actual semantics to document are unclear - e.g., for > >> other guests) > > > > I think it's basically simple: hinting means it's ok to > > fill page with trash unless it has been modified since the command > > ID supplied. > > Yeah, I quite dislike the semantics, especially, as they are different > to well-know semantics as e.g., represent in MADV_FREE. Getting changed > content when reading is really weird. But it seemed to be easier to > implement (low hanging fruit) and nobody complained back then. Well, now > we are stuck with it. > > [..] The difference with MADV_FREE is - asynchronous (using cmd id to synchronize) - zero not guaranteed right? > > > >> There are other concerns I had regarding the iothread (e.g., while > >> reporting is active, virtio_ballloon_get_free_page_hints() is > >> essentially a busy loop, in contrast to documented - > >> continue_to_get_hints will always be true). > > > > So that would be a performance issue you are suggesting, right? > > I misread the code, so that comment does no longer apply (see other > message). > > > > >>> The appeal of hinting is that it's 0 overhead outside migration, > >>> and pains were taken to avoid keeping pages locked while > >>> hypervisor is busy. > >>> > >>> If we are to drop hinting completely we need to show that reporting > >>> can be comparable, and we'll probably want to add a mode for > >>> reporting that behaves somewhat similarly. > >> > >> Depends on the actual users. If we're dropping a feature that nobody is > >> actively using, I don't think we have to show anything. > > > > > > I don't know how to find out. So far it doesn't look like we found > > any common data corruptions that would indicate no one can use it safely. > > Races around reset aren't all that uncommon but I don't think that > > qualifies as a deal breaker. > > As I said, there are no libvirt bindings, so at least anything using > libvirt does not use it. I'd be curious about actual users. > > > > > I find the idea of asynchronously sending hints to host without > > waiting for them to be processed intriguing. Not something > > I'd work on implementing if we had reporting originally, > > but since it's there I'm not sure we should just discard it > > at this point. > > > >> This feature obviously saw no proper review. > > > > I did my best but obviously missed some things. > > Yeah, definitely not your fault. People cannot expect maintainers to > review everything in detail. > > -- > Thanks, > > David / dhildenb
On 24.06.20 17:37, Michael S. Tsirkin wrote: > On Wed, Jun 24, 2020 at 05:28:59PM +0200, David Hildenbrand wrote: >>> So at the high level the idea was simple, we just clear the dirty bit >>> when page is hinted, unless we sent a new command since. Implementation >>> was reviewed by migration maintainers. If there's a consensus the code >>> is written so badly we can't maintain it, maybe we should remove it. >>> Which parts are unmaintainable in your eyes - migration or virtio ones? >> >> QEMU implementation without a propert virtio specification. I hope that >> we can *at least* finally document the expected behavior. Alex gave it a >> shot, and I was hoping that Wei could jump in to clarify, help move this >> forward ... after all he implemented (+designed?) the feature and the >> virtio interface. >> >>> Or maybe it's the general thing that interface was never specced >>> properly. >> >> Yes, a spec would be definitely a good starter ... >> >> [...] >> >>>> >>>> 1. If migration fails during RAM precopy, the guest will never receive a >>>> DONE notification. Probably easy to fix. >>>> >>>> 2. Unclear semantics. Alex tried to document what the actual semantics >>>> of hinted pages are. >>> >>> I'll reply to that now. >>> >>>> Assume the following in the guest to a previously >>>> hinted page >>>> >>>> /* page was hinted and is reused now */ >>>> if (page[x] != Y) >>>> page[x] == Y; >>>> /* migration ends, we now run on the destination */ >>>> BUG_ON(page[x] != Y); >>>> /* BUG, because the content chan >>> >>> The assumption hinting makes is that data in page is writtent to before it's used. >>> >>> >>>> A guest can observe that. And that could be a random driver that just >>>> allocated a page. >>>> >>>> (I *assume* in Linux we might catch that using kasan, but I am not 100% >>>> sure, also, the actual semantics to document are unclear - e.g., for >>>> other guests) >>> >>> I think it's basically simple: hinting means it's ok to >>> fill page with trash unless it has been modified since the command >>> ID supplied. >> >> Yeah, I quite dislike the semantics, especially, as they are different >> to well-know semantics as e.g., represent in MADV_FREE. Getting changed >> content when reading is really weird. But it seemed to be easier to >> implement (low hanging fruit) and nobody complained back then. Well, now >> we are stuck with it. >> >> [..] > > The difference with MADV_FREE is > - asynchronous (using cmd id to synchronize) > - zero not guaranteed > > right? *looking into man page*, yes, when reading you either get the old content or zero. (I remember that a re-read also makes the content stable, but looks like you really have to write to a page) We should most probably do what Alex suggested and initialize pages (at least write a single byte) when leaking them from the shrinker in the guest while hinting is active, such that the content is stable for anybody to allocate and reuse a page.
On Wed, Jun 24, 2020 at 06:01:02PM +0200, David Hildenbrand wrote: > On 24.06.20 17:37, Michael S. Tsirkin wrote: > > On Wed, Jun 24, 2020 at 05:28:59PM +0200, David Hildenbrand wrote: > >>> So at the high level the idea was simple, we just clear the dirty bit > >>> when page is hinted, unless we sent a new command since. Implementation > >>> was reviewed by migration maintainers. If there's a consensus the code > >>> is written so badly we can't maintain it, maybe we should remove it. > >>> Which parts are unmaintainable in your eyes - migration or virtio ones? > >> > >> QEMU implementation without a propert virtio specification. I hope that > >> we can *at least* finally document the expected behavior. Alex gave it a > >> shot, and I was hoping that Wei could jump in to clarify, help move this > >> forward ... after all he implemented (+designed?) the feature and the > >> virtio interface. > >> > >>> Or maybe it's the general thing that interface was never specced > >>> properly. > >> > >> Yes, a spec would be definitely a good starter ... > >> > >> [...] > >> > >>>> > >>>> 1. If migration fails during RAM precopy, the guest will never receive a > >>>> DONE notification. Probably easy to fix. > >>>> > >>>> 2. Unclear semantics. Alex tried to document what the actual semantics > >>>> of hinted pages are. > >>> > >>> I'll reply to that now. > >>> > >>>> Assume the following in the guest to a previously > >>>> hinted page > >>>> > >>>> /* page was hinted and is reused now */ > >>>> if (page[x] != Y) > >>>> page[x] == Y; > >>>> /* migration ends, we now run on the destination */ > >>>> BUG_ON(page[x] != Y); > >>>> /* BUG, because the content chan > >>> > >>> The assumption hinting makes is that data in page is writtent to before it's used. > >>> > >>> > >>>> A guest can observe that. And that could be a random driver that just > >>>> allocated a page. > >>>> > >>>> (I *assume* in Linux we might catch that using kasan, but I am not 100% > >>>> sure, also, the actual semantics to document are unclear - e.g., for > >>>> other guests) > >>> > >>> I think it's basically simple: hinting means it's ok to > >>> fill page with trash unless it has been modified since the command > >>> ID supplied. > >> > >> Yeah, I quite dislike the semantics, especially, as they are different > >> to well-know semantics as e.g., represent in MADV_FREE. Getting changed > >> content when reading is really weird. But it seemed to be easier to > >> implement (low hanging fruit) and nobody complained back then. Well, now > >> we are stuck with it. > >> > >> [..] > > > > The difference with MADV_FREE is > > - asynchronous (using cmd id to synchronize) > > - zero not guaranteed > > > > right? > > *looking into man page*, yes, when reading you either get the old > content or zero. > > (I remember that a re-read also makes the content stable, but looks like > you really have to write to a page) > > We should most probably do what Alex suggested and initialize pages (at > least write a single byte) when leaking them from the shrinker in the > guest while hinting is active, such that the content is stable for > anybody to allocate and reuse a page. Drivers ignore old content from slab though, so I don't really see the point. > -- > Thanks, > > David / dhildenb
> Am 24.06.2020 um 22:36 schrieb Michael S. Tsirkin <mst@redhat.com>: > > On Wed, Jun 24, 2020 at 06:01:02PM +0200, David Hildenbrand wrote: >>> On 24.06.20 17:37, Michael S. Tsirkin wrote: >>> On Wed, Jun 24, 2020 at 05:28:59PM +0200, David Hildenbrand wrote: >>>>> So at the high level the idea was simple, we just clear the dirty bit >>>>> when page is hinted, unless we sent a new command since. Implementation >>>>> was reviewed by migration maintainers. If there's a consensus the code >>>>> is written so badly we can't maintain it, maybe we should remove it. >>>>> Which parts are unmaintainable in your eyes - migration or virtio ones? >>>> >>>> QEMU implementation without a propert virtio specification. I hope that >>>> we can *at least* finally document the expected behavior. Alex gave it a >>>> shot, and I was hoping that Wei could jump in to clarify, help move this >>>> forward ... after all he implemented (+designed?) the feature and the >>>> virtio interface. >>>> >>>>> Or maybe it's the general thing that interface was never specced >>>>> properly. >>>> >>>> Yes, a spec would be definitely a good starter ... >>>> >>>> [...] >>>> >>>>>> >>>>>> 1. If migration fails during RAM precopy, the guest will never receive a >>>>>> DONE notification. Probably easy to fix. >>>>>> >>>>>> 2. Unclear semantics. Alex tried to document what the actual semantics >>>>>> of hinted pages are. >>>>> >>>>> I'll reply to that now. >>>>> >>>>>> Assume the following in the guest to a previously >>>>>> hinted page >>>>>> >>>>>> /* page was hinted and is reused now */ >>>>>> if (page[x] != Y) >>>>>> page[x] == Y; >>>>>> /* migration ends, we now run on the destination */ >>>>>> BUG_ON(page[x] != Y); >>>>>> /* BUG, because the content chan >>>>> >>>>> The assumption hinting makes is that data in page is writtent to before it's used. >>>>> >>>>> >>>>>> A guest can observe that. And that could be a random driver that just >>>>>> allocated a page. >>>>>> >>>>>> (I *assume* in Linux we might catch that using kasan, but I am not 100% >>>>>> sure, also, the actual semantics to document are unclear - e.g., for >>>>>> other guests) >>>>> >>>>> I think it's basically simple: hinting means it's ok to >>>>> fill page with trash unless it has been modified since the command >>>>> ID supplied. >>>> >>>> Yeah, I quite dislike the semantics, especially, as they are different >>>> to well-know semantics as e.g., represent in MADV_FREE. Getting changed >>>> content when reading is really weird. But it seemed to be easier to >>>> implement (low hanging fruit) and nobody complained back then. Well, now >>>> we are stuck with it. >>>> >>>> [..] >>> >>> The difference with MADV_FREE is >>> - asynchronous (using cmd id to synchronize) >>> - zero not guaranteed >>> >>> right? >> >> *looking into man page*, yes, when reading you either get the old >> content or zero. >> >> (I remember that a re-read also makes the content stable, but looks like >> you really have to write to a page) >> >> We should most probably do what Alex suggested and initialize pages (at >> least write a single byte) when leaking them from the shrinker in the >> guest while hinting is active, such that the content is stable for >> anybody to allocate and reuse a page. > > Drivers ignore old content from slab though, so I don't really see > the point. > That‘s what we‘re hoping for and what we would expect. Maybe we should just life with that assumption and hope for the best ... >> -- >> Thanks, >> >> David / dhildenb >
diff --git a/hw/virtio/virtio-balloon.c b/hw/virtio/virtio-balloon.c index 3e2ac1104b5f..dc15409b0bb6 100644 --- a/hw/virtio/virtio-balloon.c +++ b/hw/virtio/virtio-balloon.c @@ -527,21 +527,21 @@ static bool get_free_page_hints(VirtIOBalloon *dev) ret = false; goto out; } - if (id == dev->free_page_report_cmd_id) { - dev->free_page_report_status = FREE_PAGE_REPORT_S_START; + if (id == dev->free_page_hint_cmd_id) { + dev->free_page_hint_status = FREE_PAGE_HINT_S_START; } else { /* * Stop the optimization only when it has started. This * avoids a stale stop sign for the previous command. */ - if (dev->free_page_report_status == FREE_PAGE_REPORT_S_START) { - dev->free_page_report_status = FREE_PAGE_REPORT_S_STOP; + if (dev->free_page_hint_status == FREE_PAGE_HINT_S_START) { + dev->free_page_hint_status = FREE_PAGE_HINT_S_STOP; } } } if (elem->in_num) { - if (dev->free_page_report_status == FREE_PAGE_REPORT_S_START) { + if (dev->free_page_hint_status == FREE_PAGE_HINT_S_START) { qemu_guest_free_page_hint(elem->in_sg[0].iov_base, elem->in_sg[0].iov_len); } @@ -567,11 +567,11 @@ static void virtio_ballloon_get_free_page_hints(void *opaque) qemu_mutex_unlock(&dev->free_page_lock); virtio_notify(vdev, vq); /* - * Start to poll the vq once the reporting started. Otherwise, continue + * Start to poll the vq once the hinting started. Otherwise, continue * only when there are entries on the vq, which need to be given back. */ } while (continue_to_get_hints || - dev->free_page_report_status == FREE_PAGE_REPORT_S_START); + dev->free_page_hint_status == FREE_PAGE_HINT_S_START); virtio_queue_set_notification(vq, 1); } @@ -592,14 +592,14 @@ static void virtio_balloon_free_page_start(VirtIOBalloon *s) return; } - if (s->free_page_report_cmd_id == UINT_MAX) { - s->free_page_report_cmd_id = - VIRTIO_BALLOON_FREE_PAGE_REPORT_CMD_ID_MIN; + if (s->free_page_hint_cmd_id == UINT_MAX) { + s->free_page_hint_cmd_id = + VIRTIO_BALLOON_FREE_PAGE_HINT_CMD_ID_MIN; } else { - s->free_page_report_cmd_id++; + s->free_page_hint_cmd_id++; } - s->free_page_report_status = FREE_PAGE_REPORT_S_REQUESTED; + s->free_page_hint_status = FREE_PAGE_HINT_S_REQUESTED; virtio_notify_config(vdev); } @@ -607,18 +607,18 @@ static void virtio_balloon_free_page_stop(VirtIOBalloon *s) { VirtIODevice *vdev = VIRTIO_DEVICE(s); - if (s->free_page_report_status != FREE_PAGE_REPORT_S_STOP) { + if (s->free_page_hint_status != FREE_PAGE_HINT_S_STOP) { /* * The lock also guarantees us that the * virtio_ballloon_get_free_page_hints exits after the - * free_page_report_status is set to S_STOP. + * free_page_hint_status is set to S_STOP. */ qemu_mutex_lock(&s->free_page_lock); /* - * The guest hasn't done the reporting, so host sends a notification - * to the guest to actively stop the reporting. + * The guest isn't done hinting, so send a notification + * to the guest to actively stop the hinting. */ - s->free_page_report_status = FREE_PAGE_REPORT_S_STOP; + s->free_page_hint_status = FREE_PAGE_HINT_S_STOP; qemu_mutex_unlock(&s->free_page_lock); virtio_notify_config(vdev); } @@ -628,15 +628,15 @@ static void virtio_balloon_free_page_done(VirtIOBalloon *s) { VirtIODevice *vdev = VIRTIO_DEVICE(s); - s->free_page_report_status = FREE_PAGE_REPORT_S_DONE; + s->free_page_hint_status = FREE_PAGE_HINT_S_DONE; virtio_notify_config(vdev); } static int -virtio_balloon_free_page_report_notify(NotifierWithReturn *n, void *data) +virtio_balloon_free_page_hint_notify(NotifierWithReturn *n, void *data) { VirtIOBalloon *dev = container_of(n, VirtIOBalloon, - free_page_report_notify); + free_page_hint_notify); VirtIODevice *vdev = VIRTIO_DEVICE(dev); PrecopyNotifyData *pnd = data; @@ -685,7 +685,7 @@ static size_t virtio_balloon_config_size(VirtIOBalloon *s) if (virtio_has_feature(features, VIRTIO_BALLOON_F_FREE_PAGE_HINT)) { return offsetof(struct virtio_balloon_config, poison_val); } - return offsetof(struct virtio_balloon_config, free_page_report_cmd_id); + return offsetof(struct virtio_balloon_config, free_page_hint_cmd_id); } static void virtio_balloon_get_config(VirtIODevice *vdev, uint8_t *config_data) @@ -697,14 +697,14 @@ static void virtio_balloon_get_config(VirtIODevice *vdev, uint8_t *config_data) config.actual = cpu_to_le32(dev->actual); config.poison_val = cpu_to_le32(dev->poison_val); - if (dev->free_page_report_status == FREE_PAGE_REPORT_S_REQUESTED) { - config.free_page_report_cmd_id = - cpu_to_le32(dev->free_page_report_cmd_id); - } else if (dev->free_page_report_status == FREE_PAGE_REPORT_S_STOP) { - config.free_page_report_cmd_id = + if (dev->free_page_hint_status == FREE_PAGE_HINT_S_REQUESTED) { + config.free_page_hint_cmd_id = + cpu_to_le32(dev->free_page_hint_cmd_id); + } else if (dev->free_page_hint_status == FREE_PAGE_HINT_S_STOP) { + config.free_page_hint_cmd_id = cpu_to_le32(VIRTIO_BALLOON_CMD_ID_STOP); - } else if (dev->free_page_report_status == FREE_PAGE_REPORT_S_DONE) { - config.free_page_report_cmd_id = + } else if (dev->free_page_hint_status == FREE_PAGE_HINT_S_DONE) { + config.free_page_hint_cmd_id = cpu_to_le32(VIRTIO_BALLOON_CMD_ID_DONE); } @@ -817,14 +817,14 @@ static int virtio_balloon_post_load_device(void *opaque, int version_id) return 0; } -static const VMStateDescription vmstate_virtio_balloon_free_page_report = { +static const VMStateDescription vmstate_virtio_balloon_free_page_hint = { .name = "virtio-balloon-device/free-page-report", .version_id = 1, .minimum_version_id = 1, .needed = virtio_balloon_free_page_support, .fields = (VMStateField[]) { - VMSTATE_UINT32(free_page_report_cmd_id, VirtIOBalloon), - VMSTATE_UINT32(free_page_report_status, VirtIOBalloon), + VMSTATE_UINT32(free_page_hint_cmd_id, VirtIOBalloon), + VMSTATE_UINT32(free_page_hint_status, VirtIOBalloon), VMSTATE_END_OF_LIST() } }; @@ -851,7 +851,7 @@ static const VMStateDescription vmstate_virtio_balloon_device = { VMSTATE_END_OF_LIST() }, .subsections = (const VMStateDescription * []) { - &vmstate_virtio_balloon_free_page_report, + &vmstate_virtio_balloon_free_page_hint, &vmstate_virtio_balloon_page_poison, NULL } @@ -883,12 +883,12 @@ static void virtio_balloon_device_realize(DeviceState *dev, Error **errp) VIRTIO_BALLOON_F_FREE_PAGE_HINT)) { s->free_page_vq = virtio_add_queue(vdev, VIRTQUEUE_MAX_SIZE, virtio_balloon_handle_free_page_vq); - s->free_page_report_status = FREE_PAGE_REPORT_S_STOP; - s->free_page_report_cmd_id = - VIRTIO_BALLOON_FREE_PAGE_REPORT_CMD_ID_MIN; - s->free_page_report_notify.notify = - virtio_balloon_free_page_report_notify; - precopy_add_notifier(&s->free_page_report_notify); + s->free_page_hint_status = FREE_PAGE_HINT_S_STOP; + s->free_page_hint_cmd_id = + VIRTIO_BALLOON_FREE_PAGE_HINT_CMD_ID_MIN; + s->free_page_hint_notify.notify = + virtio_balloon_free_page_hint_notify; + precopy_add_notifier(&s->free_page_hint_notify); if (s->iothread) { object_ref(OBJECT(s->iothread)); s->free_page_bh = aio_bh_new(iothread_get_aio_context(s->iothread), @@ -919,7 +919,7 @@ static void virtio_balloon_device_unrealize(DeviceState *dev) if (virtio_balloon_free_page_support(s)) { qemu_bh_delete(s->free_page_bh); virtio_balloon_free_page_stop(s); - precopy_remove_notifier(&s->free_page_report_notify); + precopy_remove_notifier(&s->free_page_hint_notify); } balloon_stats_destroy_timer(s); qemu_remove_balloon_handler(s); diff --git a/include/hw/virtio/virtio-balloon.h b/include/hw/virtio/virtio-balloon.h index d49fef00cef2..28fd2b396087 100644 --- a/include/hw/virtio/virtio-balloon.h +++ b/include/hw/virtio/virtio-balloon.h @@ -23,7 +23,7 @@ #define VIRTIO_BALLOON(obj) \ OBJECT_CHECK(VirtIOBalloon, (obj), TYPE_VIRTIO_BALLOON) -#define VIRTIO_BALLOON_FREE_PAGE_REPORT_CMD_ID_MIN 0x80000000 +#define VIRTIO_BALLOON_FREE_PAGE_HINT_CMD_ID_MIN 0x80000000 typedef struct virtio_balloon_stat VirtIOBalloonStat; @@ -33,20 +33,20 @@ typedef struct virtio_balloon_stat_modern { uint64_t val; } VirtIOBalloonStatModern; -enum virtio_balloon_free_page_report_status { - FREE_PAGE_REPORT_S_STOP = 0, - FREE_PAGE_REPORT_S_REQUESTED = 1, - FREE_PAGE_REPORT_S_START = 2, - FREE_PAGE_REPORT_S_DONE = 3, +enum virtio_balloon_free_page_hint_status { + FREE_PAGE_HINT_S_STOP = 0, + FREE_PAGE_HINT_S_REQUESTED = 1, + FREE_PAGE_HINT_S_START = 2, + FREE_PAGE_HINT_S_DONE = 3, }; typedef struct VirtIOBalloon { VirtIODevice parent_obj; VirtQueue *ivq, *dvq, *svq, *free_page_vq, *reporting_vq; - uint32_t free_page_report_status; + uint32_t free_page_hint_status; uint32_t num_pages; uint32_t actual; - uint32_t free_page_report_cmd_id; + uint32_t free_page_hint_cmd_id; uint64_t stats[VIRTIO_BALLOON_S_NR]; VirtQueueElement *stats_vq_elem; size_t stats_vq_offset; @@ -55,7 +55,7 @@ typedef struct VirtIOBalloon { QEMUBH *free_page_bh; /* * Lock to synchronize threads to access the free page reporting related - * fields (e.g. free_page_report_status). + * fields (e.g. free_page_hint_status). */ QemuMutex free_page_lock; QemuCond free_page_cond; @@ -64,7 +64,7 @@ typedef struct VirtIOBalloon { * stopped. */ bool block_iothread; - NotifierWithReturn free_page_report_notify; + NotifierWithReturn free_page_hint_notify; int64_t stats_last_update; int64_t stats_poll_interval; uint32_t host_features;