diff mbox series

[v18,QEMU,3/3] virtio-balloon: Provide a interface for free page reporting

Message ID 20200408225529.18764.44086.stgit@localhost.localdomain (mailing list archive)
State New, archived
Headers show
Series virtio-balloon: add support for providing free page reporting | expand

Commit Message

Alexander Duyck April 8, 2020, 10:55 p.m. UTC
From: Alexander Duyck <alexander.h.duyck@linux.intel.com>

Add support for what I am referring to as "free page reporting".
Basically the idea is to function very similar to how the balloon works
in that we basically end up madvising the page as not being used. However
we don't really need to bother with any deflate type logic since the page
will be faulted back into the guest when it is read or written to.

This is meant to be a simplification of the existing balloon interface
to use for providing hints to what memory needs to be freed. I am assuming
this is safe to do as the deflate logic does not actually appear to do very
much other than tracking what subpages have been released and which ones
haven't.

Signed-off-by: Alexander Duyck <alexander.h.duyck@linux.intel.com>
---
 hw/virtio/virtio-balloon.c         |   48 +++++++++++++++++++++++++++++++++++-
 include/hw/virtio/virtio-balloon.h |    2 +-
 2 files changed, 47 insertions(+), 3 deletions(-)

Comments

David Hildenbrand April 9, 2020, 7:44 a.m. UTC | #1
On 09.04.20 00:55, Alexander Duyck wrote:
> From: Alexander Duyck <alexander.h.duyck@linux.intel.com>
> 
> Add support for what I am referring to as "free page reporting".

"Add support for "free page reporting".

> Basically the idea is to function very similar to how the balloon works
> in that we basically end up madvising the page as not being used. However

I'd get rid of one "basically".

> we don't really need to bother with any deflate type logic since the page
> will be faulted back into the guest when it is read or written to.
> 
> This is meant to be a simplification of the existing balloon interface
> to use for providing hints to what memory needs to be freed. I am assuming

It's not really a simplification, it's something new. It's a new way of
letting the guest automatically report free pages to the hypervisor, so
the hypervisor can reuse them. In contrast to inflate/deflate, that's
triggered via the hypervisor explicitly.

> this is safe to do as the deflate logic does not actually appear to do very
> much other than tracking what subpages have been released and which ones
> haven't.

"I assume this is safe" does not sound very confident. Can we just drop
the last sentence?

> 
> Signed-off-by: Alexander Duyck <alexander.h.duyck@linux.intel.com>
> ---
>  hw/virtio/virtio-balloon.c         |   48 +++++++++++++++++++++++++++++++++++-
>  include/hw/virtio/virtio-balloon.h |    2 +-
>  2 files changed, 47 insertions(+), 3 deletions(-)
> 
> diff --git a/hw/virtio/virtio-balloon.c b/hw/virtio/virtio-balloon.c
> index 1c6d36a29a04..297b267198ac 100644
> --- a/hw/virtio/virtio-balloon.c
> +++ b/hw/virtio/virtio-balloon.c
> @@ -321,6 +321,42 @@ static void balloon_stats_set_poll_interval(Object *obj, Visitor *v,
>      balloon_stats_change_timer(s, 0);
>  }
>  
> +static void virtio_balloon_handle_report(VirtIODevice *vdev, VirtQueue *vq)
> +{
> +    VirtIOBalloon *dev = VIRTIO_BALLOON(vdev);
> +    VirtQueueElement *elem;
> +
> +    while ((elem = virtqueue_pop(vq, sizeof(VirtQueueElement)))) {
> +        unsigned int i;
> +
> +        for (i = 0; i < elem->in_num; i++) {
> +            void *addr = elem->in_sg[i].iov_base;
> +            size_t size = elem->in_sg[i].iov_len;
> +            ram_addr_t ram_offset;
> +            size_t rb_page_size;
> +            RAMBlock *rb;
> +
> +            if (qemu_balloon_is_inhibited() || dev->poison_val) {
> +                continue;
> +            }

These checks are not sufficient. See virtio_balloon_handle_output(),
where we e.g., check that somebody doesn't try to discard the bios.

Maybe we can factor our/unify the handling in both code paths.

> +
> +            rb = qemu_ram_block_from_host(addr, false, &ram_offset);
> +            rb_page_size = qemu_ram_pagesize(rb);
> +
> +            /* For now we will simply ignore unaligned memory regions */
> +            if ((ram_offset | size) & (rb_page_size - 1)) {

"!QEMU_IS_ALIGNED()" please to make this easier to read.

> +                continue;
> +            }
> +
> +            ram_block_discard_range(rb, ram_offset, size);
> +        }
> +
> +        virtqueue_push(vq, elem, 0);
> +        virtio_notify(vdev, vq);
> +        g_free(elem);
> +    }
> +}
> +

[...]

>      if (virtio_has_feature(s->host_features,
>                             VIRTIO_BALLOON_F_FREE_PAGE_HINT)) {
>          s->free_page_vq = virtio_add_queue(vdev, VIRTQUEUE_MAX_SIZE,
> @@ -940,6 +982,8 @@ static Property virtio_balloon_properties[] = {
>       */
>      DEFINE_PROP_BOOL("qemu-4-0-config-size", VirtIOBalloon,
>                       qemu_4_0_config_size, false),
> +    DEFINE_PROP_BIT("unused-page-reporting", VirtIOBalloon, host_features,

"free-page-reporting"
Alexander Duyck April 9, 2020, 5:34 p.m. UTC | #2
On Thu, Apr 9, 2020 at 12:44 AM David Hildenbrand <david@redhat.com> wrote:
>
> On 09.04.20 00:55, Alexander Duyck wrote:
> > From: Alexander Duyck <alexander.h.duyck@linux.intel.com>
> >
> > Add support for what I am referring to as "free page reporting".
>
> "Add support for "free page reporting".
>
> > Basically the idea is to function very similar to how the balloon works
> > in that we basically end up madvising the page as not being used. However
>
> I'd get rid of one "basically".
>
> > we don't really need to bother with any deflate type logic since the page
> > will be faulted back into the guest when it is read or written to.
> >
> > This is meant to be a simplification of the existing balloon interface
> > to use for providing hints to what memory needs to be freed. I am assuming
>
> It's not really a simplification, it's something new. It's a new way of
> letting the guest automatically report free pages to the hypervisor, so
> the hypervisor can reuse them. In contrast to inflate/deflate, that's
> triggered via the hypervisor explicitly.
>
> > this is safe to do as the deflate logic does not actually appear to do very
> > much other than tracking what subpages have been released and which ones
> > haven't.
>
> "I assume this is safe" does not sound very confident. Can we just drop
> the last sentence?

Agreed. I will make the requested changes to the patch description.

>
> >
> > Signed-off-by: Alexander Duyck <alexander.h.duyck@linux.intel.com>
> > ---
> >  hw/virtio/virtio-balloon.c         |   48 +++++++++++++++++++++++++++++++++++-
> >  include/hw/virtio/virtio-balloon.h |    2 +-
> >  2 files changed, 47 insertions(+), 3 deletions(-)
> >
> > diff --git a/hw/virtio/virtio-balloon.c b/hw/virtio/virtio-balloon.c
> > index 1c6d36a29a04..297b267198ac 100644
> > --- a/hw/virtio/virtio-balloon.c
> > +++ b/hw/virtio/virtio-balloon.c
> > @@ -321,6 +321,42 @@ static void balloon_stats_set_poll_interval(Object *obj, Visitor *v,
> >      balloon_stats_change_timer(s, 0);
> >  }
> >
> > +static void virtio_balloon_handle_report(VirtIODevice *vdev, VirtQueue *vq)
> > +{
> > +    VirtIOBalloon *dev = VIRTIO_BALLOON(vdev);
> > +    VirtQueueElement *elem;
> > +
> > +    while ((elem = virtqueue_pop(vq, sizeof(VirtQueueElement)))) {
> > +        unsigned int i;
> > +
> > +        for (i = 0; i < elem->in_num; i++) {
> > +            void *addr = elem->in_sg[i].iov_base;
> > +            size_t size = elem->in_sg[i].iov_len;
> > +            ram_addr_t ram_offset;
> > +            size_t rb_page_size;
> > +            RAMBlock *rb;
> > +
> > +            if (qemu_balloon_is_inhibited() || dev->poison_val) {
> > +                continue;
> > +            }
>
> These checks are not sufficient. See virtio_balloon_handle_output(),
> where we e.g., check that somebody doesn't try to discard the bios.
>
> Maybe we can factor our/unify the handling in both code paths.

I am going to have to look at this further. If I understand correctly
you are asking me to add all of the memory section checks that are in
virtio_balloon_handle_output correct?

I'm not sure it makes sense with the scatterlist approach I have taken
here. All of the mappings were created as a scatterlist of writable
DMA mappings unlike the regular balloon which is just stuffing PFN
numbers into an array and then sending the array across. As such I
would think there are already such protections in place. For instance,
you wouldn't want to let virtio-net map the BIOS and request data be
written into that either, correct? So I am assuming there is something
there to prevent that already isn't there?

> > +
> > +            rb = qemu_ram_block_from_host(addr, false, &ram_offset);
> > +            rb_page_size = qemu_ram_pagesize(rb);
> > +
> > +            /* For now we will simply ignore unaligned memory regions */
> > +            if ((ram_offset | size) & (rb_page_size - 1)) {
>
> "!QEMU_IS_ALIGNED()" please to make this easier to read.

done.

> > +                continue;
> > +            }
> > +
> > +            ram_block_discard_range(rb, ram_offset, size);
> > +        }
> > +
> > +        virtqueue_push(vq, elem, 0);
> > +        virtio_notify(vdev, vq);
> > +        g_free(elem);
> > +    }
> > +}
> > +
>
> [...]
>
> >      if (virtio_has_feature(s->host_features,
> >                             VIRTIO_BALLOON_F_FREE_PAGE_HINT)) {
> >          s->free_page_vq = virtio_add_queue(vdev, VIRTQUEUE_MAX_SIZE,
> > @@ -940,6 +982,8 @@ static Property virtio_balloon_properties[] = {
> >       */
> >      DEFINE_PROP_BOOL("qemu-4-0-config-size", VirtIOBalloon,
> >                       qemu_4_0_config_size, false),
> > +    DEFINE_PROP_BIT("unused-page-reporting", VirtIOBalloon, host_features,
>
> "free-page-reporting"

Thanks for catching that. It has been a while since that was renamed
and I must have let it slip through the cracks.

- Alex
David Hildenbrand April 9, 2020, 5:35 p.m. UTC | #3
On 09.04.20 19:34, Alexander Duyck wrote:
> On Thu, Apr 9, 2020 at 12:44 AM David Hildenbrand <david@redhat.com> wrote:
>>
>> On 09.04.20 00:55, Alexander Duyck wrote:
>>> From: Alexander Duyck <alexander.h.duyck@linux.intel.com>
>>>
>>> Add support for what I am referring to as "free page reporting".
>>
>> "Add support for "free page reporting".
>>
>>> Basically the idea is to function very similar to how the balloon works
>>> in that we basically end up madvising the page as not being used. However
>>
>> I'd get rid of one "basically".
>>
>>> we don't really need to bother with any deflate type logic since the page
>>> will be faulted back into the guest when it is read or written to.
>>>
>>> This is meant to be a simplification of the existing balloon interface
>>> to use for providing hints to what memory needs to be freed. I am assuming
>>
>> It's not really a simplification, it's something new. It's a new way of
>> letting the guest automatically report free pages to the hypervisor, so
>> the hypervisor can reuse them. In contrast to inflate/deflate, that's
>> triggered via the hypervisor explicitly.
>>
>>> this is safe to do as the deflate logic does not actually appear to do very
>>> much other than tracking what subpages have been released and which ones
>>> haven't.
>>
>> "I assume this is safe" does not sound very confident. Can we just drop
>> the last sentence?
> 
> Agreed. I will make the requested changes to the patch description.
> 
>>
>>>
>>> Signed-off-by: Alexander Duyck <alexander.h.duyck@linux.intel.com>
>>> ---
>>>  hw/virtio/virtio-balloon.c         |   48 +++++++++++++++++++++++++++++++++++-
>>>  include/hw/virtio/virtio-balloon.h |    2 +-
>>>  2 files changed, 47 insertions(+), 3 deletions(-)
>>>
>>> diff --git a/hw/virtio/virtio-balloon.c b/hw/virtio/virtio-balloon.c
>>> index 1c6d36a29a04..297b267198ac 100644
>>> --- a/hw/virtio/virtio-balloon.c
>>> +++ b/hw/virtio/virtio-balloon.c
>>> @@ -321,6 +321,42 @@ static void balloon_stats_set_poll_interval(Object *obj, Visitor *v,
>>>      balloon_stats_change_timer(s, 0);
>>>  }
>>>
>>> +static void virtio_balloon_handle_report(VirtIODevice *vdev, VirtQueue *vq)
>>> +{
>>> +    VirtIOBalloon *dev = VIRTIO_BALLOON(vdev);
>>> +    VirtQueueElement *elem;
>>> +
>>> +    while ((elem = virtqueue_pop(vq, sizeof(VirtQueueElement)))) {
>>> +        unsigned int i;
>>> +
>>> +        for (i = 0; i < elem->in_num; i++) {
>>> +            void *addr = elem->in_sg[i].iov_base;
>>> +            size_t size = elem->in_sg[i].iov_len;
>>> +            ram_addr_t ram_offset;
>>> +            size_t rb_page_size;
>>> +            RAMBlock *rb;
>>> +
>>> +            if (qemu_balloon_is_inhibited() || dev->poison_val) {
>>> +                continue;
>>> +            }
>>
>> These checks are not sufficient. See virtio_balloon_handle_output(),
>> where we e.g., check that somebody doesn't try to discard the bios.
>>
>> Maybe we can factor our/unify the handling in both code paths.
> 
> I am going to have to look at this further. If I understand correctly
> you are asking me to add all of the memory section checks that are in
> virtio_balloon_handle_output correct?
> 
> I'm not sure it makes sense with the scatterlist approach I have taken
> here. All of the mappings were created as a scatterlist of writable
> DMA mappings unlike the regular balloon which is just stuffing PFN
> numbers into an array and then sending the array across. As such I
> would think there are already such protections in place. For instance,
> you wouldn't want to let virtio-net map the BIOS and request data be
> written into that either, correct? So I am assuming there is something
> there to prevent that already isn't there?

Good point, let's find out :)
Alexander Duyck April 10, 2020, 3:36 a.m. UTC | #4
On Thu, Apr 9, 2020 at 10:35 AM David Hildenbrand <david@redhat.com> wrote:
> On 09.04.20 19:34, Alexander Duyck wrote:
> >>>  hw/virtio/virtio-balloon.c         |   48 +++++++++++++++++++++++++++++++++++-
> >>>  include/hw/virtio/virtio-balloon.h |    2 +-
> >>>  2 files changed, 47 insertions(+), 3 deletions(-)
> >>>
> >>> diff --git a/hw/virtio/virtio-balloon.c b/hw/virtio/virtio-balloon.c
> >>> index 1c6d36a29a04..297b267198ac 100644
> >>> --- a/hw/virtio/virtio-balloon.c
> >>> +++ b/hw/virtio/virtio-balloon.c
> >>> @@ -321,6 +321,42 @@ static void balloon_stats_set_poll_interval(Object *obj, Visitor *v,
> >>>      balloon_stats_change_timer(s, 0);
> >>>  }
> >>>
> >>> +static void virtio_balloon_handle_report(VirtIODevice *vdev, VirtQueue *vq)
> >>> +{
> >>> +    VirtIOBalloon *dev = VIRTIO_BALLOON(vdev);
> >>> +    VirtQueueElement *elem;
> >>> +
> >>> +    while ((elem = virtqueue_pop(vq, sizeof(VirtQueueElement)))) {
> >>> +        unsigned int i;
> >>> +
> >>> +        for (i = 0; i < elem->in_num; i++) {
> >>> +            void *addr = elem->in_sg[i].iov_base;
> >>> +            size_t size = elem->in_sg[i].iov_len;
> >>> +            ram_addr_t ram_offset;
> >>> +            size_t rb_page_size;
> >>> +            RAMBlock *rb;
> >>> +
> >>> +            if (qemu_balloon_is_inhibited() || dev->poison_val) {
> >>> +                continue;
> >>> +            }
> >>
> >> These checks are not sufficient. See virtio_balloon_handle_output(),
> >> where we e.g., check that somebody doesn't try to discard the bios.
> >>
> >> Maybe we can factor our/unify the handling in both code paths.
> >
> > I am going to have to look at this further. If I understand correctly
> > you are asking me to add all of the memory section checks that are in
> > virtio_balloon_handle_output correct?
> >
> > I'm not sure it makes sense with the scatterlist approach I have taken
> > here. All of the mappings were created as a scatterlist of writable
> > DMA mappings unlike the regular balloon which is just stuffing PFN
> > numbers into an array and then sending the array across. As such I
> > would think there are already such protections in place. For instance,
> > you wouldn't want to let virtio-net map the BIOS and request data be
> > written into that either, correct? So I am assuming there is something
> > there to prevent that already isn't there?
>
> Good point, let's find out :)

Okay, so I believe I figured it out. From what I can tell there is a
call in address_space_map that will determine if we can directly write
to the page or not. However it looks like there might be one minor
issue as it is assuming it can write directly to ROM devices and that
isn't correct. I will add a patch to my set to address that.

Other than that the behavior seems to be that a bounce buffer will be
allocated on the first instance of a page backed by ROM or anything
other than RAM, and after that it will return NULL until the bounce
buffer is freed. If we start getting NULLs the mapping will be aborted
and we wouldn't even get into this code path. After we unmap the
region it will attempt to use address_space_write which should fail
for any region that isn't meant to be written to, or it will copy
zeros out of the bounce buffer into the region if it is writable via
address_space_write.

So the DMA mapping API in virtqueue_pop/virtqueue_push will take care
of doing the right stuff for the mappings in the case that the guest
is trying to do something really stupid, at least after I address the
direct write access to rom_device regions.

- Alex
diff mbox series

Patch

diff --git a/hw/virtio/virtio-balloon.c b/hw/virtio/virtio-balloon.c
index 1c6d36a29a04..297b267198ac 100644
--- a/hw/virtio/virtio-balloon.c
+++ b/hw/virtio/virtio-balloon.c
@@ -321,6 +321,42 @@  static void balloon_stats_set_poll_interval(Object *obj, Visitor *v,
     balloon_stats_change_timer(s, 0);
 }
 
+static void virtio_balloon_handle_report(VirtIODevice *vdev, VirtQueue *vq)
+{
+    VirtIOBalloon *dev = VIRTIO_BALLOON(vdev);
+    VirtQueueElement *elem;
+
+    while ((elem = virtqueue_pop(vq, sizeof(VirtQueueElement)))) {
+        unsigned int i;
+
+        for (i = 0; i < elem->in_num; i++) {
+            void *addr = elem->in_sg[i].iov_base;
+            size_t size = elem->in_sg[i].iov_len;
+            ram_addr_t ram_offset;
+            size_t rb_page_size;
+            RAMBlock *rb;
+
+            if (qemu_balloon_is_inhibited() || dev->poison_val) {
+                continue;
+            }
+
+            rb = qemu_ram_block_from_host(addr, false, &ram_offset);
+            rb_page_size = qemu_ram_pagesize(rb);
+
+            /* For now we will simply ignore unaligned memory regions */
+            if ((ram_offset | size) & (rb_page_size - 1)) {
+                continue;
+            }
+
+            ram_block_discard_range(rb, ram_offset, size);
+        }
+
+        virtqueue_push(vq, elem, 0);
+        virtio_notify(vdev, vq);
+        g_free(elem);
+    }
+}
+
 static void virtio_balloon_handle_output(VirtIODevice *vdev, VirtQueue *vq)
 {
     VirtIOBalloon *s = VIRTIO_BALLOON(vdev);
@@ -628,7 +664,8 @@  static size_t virtio_balloon_config_size(VirtIOBalloon *s)
         return sizeof(struct virtio_balloon_config);
     }
     if (virtio_has_feature(features, VIRTIO_BALLOON_F_PAGE_POISON) ||
-        virtio_has_feature(features, VIRTIO_BALLOON_F_FREE_PAGE_HINT)) {
+        virtio_has_feature(features, VIRTIO_BALLOON_F_FREE_PAGE_HINT) ||
+        virtio_has_feature(features, VIRTIO_BALLOON_F_REPORTING)) {
         return sizeof(struct virtio_balloon_config);
     }
     return offsetof(struct virtio_balloon_config, free_page_report_cmd_id);
@@ -717,7 +754,8 @@  static uint64_t virtio_balloon_get_features(VirtIODevice *vdev, uint64_t f,
     VirtIOBalloon *dev = VIRTIO_BALLOON(vdev);
     f |= dev->host_features;
     virtio_add_feature(&f, VIRTIO_BALLOON_F_STATS_VQ);
-    if (virtio_has_feature(f, VIRTIO_BALLOON_F_FREE_PAGE_HINT)) {
+    if (virtio_has_feature(f, VIRTIO_BALLOON_F_FREE_PAGE_HINT) ||
+        virtio_has_feature(f, VIRTIO_BALLOON_F_REPORTING)) {
         virtio_add_feature(&f, VIRTIO_BALLOON_F_PAGE_POISON);
     }
 
@@ -807,6 +845,10 @@  static void virtio_balloon_device_realize(DeviceState *dev, Error **errp)
     s->dvq = virtio_add_queue(vdev, 128, virtio_balloon_handle_output);
     s->svq = virtio_add_queue(vdev, 128, virtio_balloon_receive_stats);
 
+    if (virtio_has_feature(s->host_features, VIRTIO_BALLOON_F_REPORTING)) {
+        s->rvq = virtio_add_queue(vdev, 32, virtio_balloon_handle_report);
+    }
+
     if (virtio_has_feature(s->host_features,
                            VIRTIO_BALLOON_F_FREE_PAGE_HINT)) {
         s->free_page_vq = virtio_add_queue(vdev, VIRTQUEUE_MAX_SIZE,
@@ -940,6 +982,8 @@  static Property virtio_balloon_properties[] = {
      */
     DEFINE_PROP_BOOL("qemu-4-0-config-size", VirtIOBalloon,
                      qemu_4_0_config_size, false),
+    DEFINE_PROP_BIT("unused-page-reporting", VirtIOBalloon, host_features,
+                    VIRTIO_BALLOON_F_REPORTING, true),
     DEFINE_PROP_LINK("iothread", VirtIOBalloon, iothread, TYPE_IOTHREAD,
                      IOThread *),
     DEFINE_PROP_END_OF_LIST(),
diff --git a/include/hw/virtio/virtio-balloon.h b/include/hw/virtio/virtio-balloon.h
index 7fe78e5c14d7..db5bf7127112 100644
--- a/include/hw/virtio/virtio-balloon.h
+++ b/include/hw/virtio/virtio-balloon.h
@@ -42,7 +42,7 @@  enum virtio_balloon_free_page_report_status {
 
 typedef struct VirtIOBalloon {
     VirtIODevice parent_obj;
-    VirtQueue *ivq, *dvq, *svq, *free_page_vq;
+    VirtQueue *ivq, *dvq, *svq, *free_page_vq, *rvq;
     uint32_t free_page_report_status;
     uint32_t num_pages;
     uint32_t actual;