diff mbox

[RFC,3/4] virtio_balloon: add pressure notification via a new virtqueue

Message ID 1389889445-3855-4-git-send-email-lcapitulino@redhat.com (mailing list archive)
State New, archived
Headers show

Commit Message

Luiz Capitulino Jan. 16, 2014, 4:24 p.m. UTC
From: Luiz capitulino <lcapitulino@redhat.com>

This commit adds support to a new virtqueue called message virtqueue.

The message virtqueue can be used by guests to notify the host about
important memory-related state changes in the guest. Currently, the
only implemented notification is the "guest is under pressure" one,
which informs the host that the guest is under memory pressure.

This notification can be used to implement automatic memory ballooning
in the host. For example, once learning that the guest is under pressure,
the host could cancel an on-going inflate and/or start a deflate
operation.

Doing this through a virtqueue might seem like overkill, as all
we're doing is to transfer an integer between guest and host. However,
using a virtqueue offers the following advantages:

 1. We can realibly synchronize host and guest. That is, the guest
    will only continue once the host acknowledges the notification.
    This is important, because if the guest gets under pressure while
    inflating the balloon, it has to stop to give the host a chance
    to reset "num_pages" (see next commit)

 2. It's extensible. We may (or may not) want to tell the host
    which pressure level the guest finds itself in (ie. low,
    medium or critical)

The lightweight alternative is to use a configuration space parameter.
For this to work though, the guest would have to wait the for host
to acknowloedge the receipt of a configuration change update. I could
try this if the virtqueue is too overkill.

Finally, the guest learns it's under pressure by registering a
callback with the in-kernel vmpressure API.

FIXMEs:

 - vmpressure API is missing an de-registration routine
 - not completely sure my changes in virtballoon_probe() are correct

Signed-off-by: Luiz capitulino <lcapitulino@redhat.com>
---
 drivers/virtio/virtio_balloon.c     | 93 +++++++++++++++++++++++++++++++++----
 include/uapi/linux/virtio_balloon.h |  1 +
 2 files changed, 84 insertions(+), 10 deletions(-)

Comments

Rusty Russell Jan. 16, 2014, 10:40 p.m. UTC | #1
Luiz Capitulino <lcapitulino@redhat.com> writes:
> From: Luiz capitulino <lcapitulino@redhat.com>
>
> This commit adds support to a new virtqueue called message virtqueue.

OK, this needs a lot of thought (especially since reworking the virtio
balloon is on the TODO list for the OASIS virtio technical committee...)

But AFAICT this is a way of explicitly saying "no" to the host's target
(vs the implicit method of just not meeting the target).  I'm not sure
that gives enough information to the host.  On the other hand, I'm not
sure what information we *can* give.

Should we instead be counter-proposing a target?

What does qemu do with this information?

Thanks,
Rusty.

> The message virtqueue can be used by guests to notify the host about
> important memory-related state changes in the guest. Currently, the
> only implemented notification is the "guest is under pressure" one,
> which informs the host that the guest is under memory pressure.
>
> This notification can be used to implement automatic memory ballooning
> in the host. For example, once learning that the guest is under pressure,
> the host could cancel an on-going inflate and/or start a deflate
> operation.
>
> Doing this through a virtqueue might seem like overkill, as all
> we're doing is to transfer an integer between guest and host. However,
> using a virtqueue offers the following advantages:
>
>  1. We can realibly synchronize host and guest. That is, the guest
>     will only continue once the host acknowledges the notification.
>     This is important, because if the guest gets under pressure while
>     inflating the balloon, it has to stop to give the host a chance
>     to reset "num_pages" (see next commit)
>
>  2. It's extensible. We may (or may not) want to tell the host
>     which pressure level the guest finds itself in (ie. low,
>     medium or critical)
>
> The lightweight alternative is to use a configuration space parameter.
> For this to work though, the guest would have to wait the for host
> to acknowloedge the receipt of a configuration change update. I could
> try this if the virtqueue is too overkill.
>
> Finally, the guest learns it's under pressure by registering a
> callback with the in-kernel vmpressure API.
>
> FIXMEs:
>
>  - vmpressure API is missing an de-registration routine
>  - not completely sure my changes in virtballoon_probe() are correct
>
> Signed-off-by: Luiz capitulino <lcapitulino@redhat.com>
> ---
>  drivers/virtio/virtio_balloon.c     | 93 +++++++++++++++++++++++++++++++++----
>  include/uapi/linux/virtio_balloon.h |  1 +
>  2 files changed, 84 insertions(+), 10 deletions(-)
>
> diff --git a/drivers/virtio/virtio_balloon.c b/drivers/virtio/virtio_balloon.c
> index 5c4a95b..1c3ee71 100644
> --- a/drivers/virtio/virtio_balloon.c
> +++ b/drivers/virtio/virtio_balloon.c
> @@ -29,6 +29,9 @@
>  #include <linux/module.h>
>  #include <linux/balloon_compaction.h>
>  
> +#include <linux/cgroup.h>
> +#include <linux/vmpressure.h>
> +
>  /*
>   * Balloon device works in 4K page units.  So each page is pointed to by
>   * multiple balloon pages.  All memory counters in this driver are in balloon
> @@ -37,10 +40,12 @@
>  #define VIRTIO_BALLOON_PAGES_PER_PAGE (unsigned)(PAGE_SIZE >> VIRTIO_BALLOON_PFN_SHIFT)
>  #define VIRTIO_BALLOON_ARRAY_PFNS_MAX 256
>  
> +#define VIRTIO_BALLOON_MSG_PRESSURE 1
> +
>  struct virtio_balloon
>  {
>  	struct virtio_device *vdev;
> -	struct virtqueue *inflate_vq, *deflate_vq, *stats_vq;
> +	struct virtqueue *inflate_vq, *deflate_vq, *stats_vq, *message_vq;
>  
>  	/* Where the ballooning thread waits for config to change. */
>  	wait_queue_head_t config_change;
> @@ -51,6 +56,8 @@ struct virtio_balloon
>  	/* Waiting for host to ack the pages we released. */
>  	wait_queue_head_t acked;
>  
> +	wait_queue_head_t message_acked;
> +
>  	/* Number of balloon pages we've told the Host we're not using. */
>  	unsigned int num_pages;
>  	/*
> @@ -71,6 +78,9 @@ struct virtio_balloon
>  	/* Memory statistics */
>  	int need_stats_update;
>  	struct virtio_balloon_stat stats[VIRTIO_BALLOON_S_NR];
> +
> +	/* Message virtqueue */
> +	atomic_t guest_pressure;
>  };
>  
>  static struct virtio_device_id id_table[] = {
> @@ -78,6 +88,41 @@ static struct virtio_device_id id_table[] = {
>  	{ 0 },
>  };
>  
> +static inline bool guest_under_pressure(const struct virtio_balloon *vb)
> +{
> +	return atomic_read(&vb->guest_pressure) == 1;
> +}
> +
> +static void vmpressure_event_handler(void *data, int level)
> +{
> +	struct virtio_balloon *vb = data;
> +
> +	atomic_set(&vb->guest_pressure, 1);
> +	wake_up(&vb->config_change);
> +}
> +
> +static void tell_host_pressure(struct virtio_balloon *vb)
> +{
> +	const uint32_t msg = VIRTIO_BALLOON_MSG_PRESSURE;
> +	struct scatterlist sg;
> +	unsigned int len;
> +	int err;
> +
> +	sg_init_one(&sg, &msg, sizeof(msg));
> +
> +	err = virtqueue_add_outbuf(vb->message_vq, &sg, 1, vb, GFP_KERNEL);
> +	if (err < 0) {
> +		printk(KERN_WARNING "virtio-balloon: failed to send host message (%d)\n", err);
> +		goto out;
> +	}
> +	virtqueue_kick(vb->message_vq);
> +
> +	wait_event(vb->message_acked, virtqueue_get_buf(vb->message_vq, &len));
> +
> +out:
> +	atomic_set(&vb->guest_pressure, 0);
> +}
> +
>  static u32 page_to_balloon_pfn(struct page *page)
>  {
>  	unsigned long pfn = page_to_pfn(page);
> @@ -100,6 +145,13 @@ static void balloon_ack(struct virtqueue *vq)
>  	wake_up(&vb->acked);
>  }
>  
> +static void message_ack(struct virtqueue *vq)
> +{
> +	struct virtio_balloon *vb = vq->vdev->priv;
> +
> +	wake_up(&vb->message_acked);
> +}
> +
>  static void tell_host(struct virtio_balloon *vb, struct virtqueue *vq)
>  {
>  	struct scatterlist sg;
> @@ -300,6 +352,7 @@ static int balloon(void *_vballoon)
>  		try_to_freeze();
>  		wait_event_interruptible(vb->config_change,
>  					 (diff = towards_target(vb)) != 0
> +					 || guest_under_pressure(vb)
>  					 || vb->need_stats_update
>  					 || kthread_should_stop()
>  					 || freezing(current));
> @@ -310,31 +363,38 @@ static int balloon(void *_vballoon)
>  		else if (diff < 0)
>  			leak_balloon(vb, -diff);
>  		update_balloon_size(vb);
> +
> +		if (guest_under_pressure(vb))
> +			tell_host_pressure(vb);
> +
>  	}
>  	return 0;
>  }
>  
>  static int init_vqs(struct virtio_balloon *vb)
>  {
> -	struct virtqueue *vqs[3];
> -	vq_callback_t *callbacks[] = { balloon_ack, balloon_ack, stats_request };
> -	const char *names[] = { "inflate", "deflate", "stats" };
> -	int err, nvqs;
> +	struct virtqueue *vqs[4];
> +	vq_callback_t *callbacks[] = { balloon_ack, balloon_ack,
> +				       stats_request, message_ack };
> +	const char *names[] = { "inflate", "deflate", "stats", "pressure" };
> +	int err, nvqs, idx;
>  
>  	/*
>  	 * We expect two virtqueues: inflate and deflate, and
> -	 * optionally stat.
> +	 * optionally stat and message.
>  	 */
> -	nvqs = virtio_has_feature(vb->vdev, VIRTIO_BALLOON_F_STATS_VQ) ? 3 : 2;
> +	nvqs = 2 + virtio_has_feature(vb->vdev, VIRTIO_BALLOON_F_STATS_VQ) +
> +		virtio_has_feature(vb->vdev, VIRTIO_BALLOON_F_MESSAGE_VQ);
>  	err = vb->vdev->config->find_vqs(vb->vdev, nvqs, vqs, callbacks, names);
>  	if (err)
>  		return err;
>  
> -	vb->inflate_vq = vqs[0];
> -	vb->deflate_vq = vqs[1];
> +	idx = 0;
> +	vb->inflate_vq = vqs[idx++];
> +	vb->deflate_vq = vqs[idx++];
>  	if (virtio_has_feature(vb->vdev, VIRTIO_BALLOON_F_STATS_VQ)) {
>  		struct scatterlist sg;
> -		vb->stats_vq = vqs[2];
> +		vb->stats_vq = vqs[idx++];
>  
>  		/*
>  		 * Prime this virtqueue with one buffer so the hypervisor can
> @@ -346,6 +406,9 @@ static int init_vqs(struct virtio_balloon *vb)
>  			BUG();
>  		virtqueue_kick(vb->stats_vq);
>  	}
> +	if (virtio_has_feature(vb->vdev, VIRTIO_BALLOON_F_MESSAGE_VQ))
> +		vb->message_vq = vqs[idx];
> +
>  	return 0;
>  }
>  
> @@ -438,9 +501,11 @@ static int virtballoon_probe(struct virtio_device *vdev)
>  	vb->num_pages = 0;
>  	mutex_init(&vb->balloon_lock);
>  	init_waitqueue_head(&vb->config_change);
> +	init_waitqueue_head(&vb->message_acked);
>  	init_waitqueue_head(&vb->acked);
>  	vb->vdev = vdev;
>  	vb->need_stats_update = 0;
> +	atomic_set(&vb->guest_pressure, 0);
>  
>  	vb_devinfo = balloon_devinfo_alloc(vb);
>  	if (IS_ERR(vb_devinfo)) {
> @@ -467,6 +532,12 @@ static int virtballoon_probe(struct virtio_device *vdev)
>  	if (err)
>  		goto out_free_vb_mapping;
>  
> +	if (virtio_has_feature(vb->vdev, VIRTIO_BALLOON_F_MESSAGE_VQ)) {
> +		err = vmpressure_register_kernel_event(NULL, vmpressure_event_handler, vb);
> +		if (err)
> +			goto out_free_vb_mapping;
> +	}
> +
>  	vb->thread = kthread_run(balloon, vb, "vballoon");
>  	if (IS_ERR(vb->thread)) {
>  		err = PTR_ERR(vb->thread);
> @@ -476,6 +547,7 @@ static int virtballoon_probe(struct virtio_device *vdev)
>  	return 0;
>  
>  out_del_vqs:
> +	/* FIXME: add vmpressure_deregister_kernel_event() */
>  	vdev->config->del_vqs(vdev);
>  out_free_vb_mapping:
>  	balloon_mapping_free(vb_mapping);
> @@ -543,6 +615,7 @@ static int virtballoon_restore(struct virtio_device *vdev)
>  static unsigned int features[] = {
>  	VIRTIO_BALLOON_F_MUST_TELL_HOST,
>  	VIRTIO_BALLOON_F_STATS_VQ,
> +	VIRTIO_BALLOON_F_MESSAGE_VQ,
>  };
>  
>  static struct virtio_driver virtio_balloon_driver = {
> diff --git a/include/uapi/linux/virtio_balloon.h b/include/uapi/linux/virtio_balloon.h
> index 5e26f61..846e46c 100644
> --- a/include/uapi/linux/virtio_balloon.h
> +++ b/include/uapi/linux/virtio_balloon.h
> @@ -31,6 +31,7 @@
>  /* The feature bitmap for virtio balloon */
>  #define VIRTIO_BALLOON_F_MUST_TELL_HOST	0 /* Tell before reclaiming pages */
>  #define VIRTIO_BALLOON_F_STATS_VQ	1 /* Memory Stats virtqueue */
> +#define VIRTIO_BALLOON_F_MESSAGE_VQ	2 /* Message virtqueue */
>  
>  /* Size of a PFN in the balloon interface. */
>  #define VIRTIO_BALLOON_PFN_SHIFT 12
> -- 
> 1.8.1.4
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Luiz Capitulino Jan. 17, 2014, 1:38 a.m. UTC | #2
On Fri, 17 Jan 2014 09:10:47 +1030
Rusty Russell <rusty@rustcorp.com.au> wrote:

> Luiz Capitulino <lcapitulino@redhat.com> writes:
> > From: Luiz capitulino <lcapitulino@redhat.com>
> >
> > This commit adds support to a new virtqueue called message virtqueue.
> 
> OK, this needs a lot of thought (especially since reworking the virtio
> balloon is on the TODO list for the OASIS virtio technical committee...)
> 
> But AFAICT this is a way of explicitly saying "no" to the host's target
> (vs the implicit method of just not meeting the target).  I'm not sure
> that gives enough information to the host.  On the other hand, I'm not
> sure what information we *can* give.
> 
> Should we instead be counter-proposing a target?

The problem is how to estimate a target value. I found it simpler
to just try to obey what the host is asking for (and fail if not
possible) than trying to make the guest negotiate with the host.

> What does qemu do with this information?

There are two possible scenarios:

 1. The balloon driver is currently inflating when it gets under
    pressure

    QEMU resets "num_pages" to the current balloon size. This
    cancels the on-going inflate

 2. The balloon driver is not inflating, eg. it's possibly sleeping

   QEMU issues a deflate

But note that those scenarios are not supposed to be used with the
current device, they are part of the automatic ballooning feature.
I CC'ed you on the QEMU patch, you can find it here case you didn't
see it:

 http://marc.info/?l=kvm&m=138988966315690&w=2

> 
> Thanks,
> Rusty.
> 
> > The message virtqueue can be used by guests to notify the host about
> > important memory-related state changes in the guest. Currently, the
> > only implemented notification is the "guest is under pressure" one,
> > which informs the host that the guest is under memory pressure.
> >
> > This notification can be used to implement automatic memory ballooning
> > in the host. For example, once learning that the guest is under pressure,
> > the host could cancel an on-going inflate and/or start a deflate
> > operation.
> >
> > Doing this through a virtqueue might seem like overkill, as all
> > we're doing is to transfer an integer between guest and host. However,
> > using a virtqueue offers the following advantages:
> >
> >  1. We can realibly synchronize host and guest. That is, the guest
> >     will only continue once the host acknowledges the notification.
> >     This is important, because if the guest gets under pressure while
> >     inflating the balloon, it has to stop to give the host a chance
> >     to reset "num_pages" (see next commit)
> >
> >  2. It's extensible. We may (or may not) want to tell the host
> >     which pressure level the guest finds itself in (ie. low,
> >     medium or critical)
> >
> > The lightweight alternative is to use a configuration space parameter.
> > For this to work though, the guest would have to wait the for host
> > to acknowloedge the receipt of a configuration change update. I could
> > try this if the virtqueue is too overkill.
> >
> > Finally, the guest learns it's under pressure by registering a
> > callback with the in-kernel vmpressure API.
> >
> > FIXMEs:
> >
> >  - vmpressure API is missing an de-registration routine
> >  - not completely sure my changes in virtballoon_probe() are correct
> >
> > Signed-off-by: Luiz capitulino <lcapitulino@redhat.com>
> > ---
> >  drivers/virtio/virtio_balloon.c     | 93 +++++++++++++++++++++++++++++++++----
> >  include/uapi/linux/virtio_balloon.h |  1 +
> >  2 files changed, 84 insertions(+), 10 deletions(-)
> >
> > diff --git a/drivers/virtio/virtio_balloon.c b/drivers/virtio/virtio_balloon.c
> > index 5c4a95b..1c3ee71 100644
> > --- a/drivers/virtio/virtio_balloon.c
> > +++ b/drivers/virtio/virtio_balloon.c
> > @@ -29,6 +29,9 @@
> >  #include <linux/module.h>
> >  #include <linux/balloon_compaction.h>
> >  
> > +#include <linux/cgroup.h>
> > +#include <linux/vmpressure.h>
> > +
> >  /*
> >   * Balloon device works in 4K page units.  So each page is pointed to by
> >   * multiple balloon pages.  All memory counters in this driver are in balloon
> > @@ -37,10 +40,12 @@
> >  #define VIRTIO_BALLOON_PAGES_PER_PAGE (unsigned)(PAGE_SIZE >> VIRTIO_BALLOON_PFN_SHIFT)
> >  #define VIRTIO_BALLOON_ARRAY_PFNS_MAX 256
> >  
> > +#define VIRTIO_BALLOON_MSG_PRESSURE 1
> > +
> >  struct virtio_balloon
> >  {
> >  	struct virtio_device *vdev;
> > -	struct virtqueue *inflate_vq, *deflate_vq, *stats_vq;
> > +	struct virtqueue *inflate_vq, *deflate_vq, *stats_vq, *message_vq;
> >  
> >  	/* Where the ballooning thread waits for config to change. */
> >  	wait_queue_head_t config_change;
> > @@ -51,6 +56,8 @@ struct virtio_balloon
> >  	/* Waiting for host to ack the pages we released. */
> >  	wait_queue_head_t acked;
> >  
> > +	wait_queue_head_t message_acked;
> > +
> >  	/* Number of balloon pages we've told the Host we're not using. */
> >  	unsigned int num_pages;
> >  	/*
> > @@ -71,6 +78,9 @@ struct virtio_balloon
> >  	/* Memory statistics */
> >  	int need_stats_update;
> >  	struct virtio_balloon_stat stats[VIRTIO_BALLOON_S_NR];
> > +
> > +	/* Message virtqueue */
> > +	atomic_t guest_pressure;
> >  };
> >  
> >  static struct virtio_device_id id_table[] = {
> > @@ -78,6 +88,41 @@ static struct virtio_device_id id_table[] = {
> >  	{ 0 },
> >  };
> >  
> > +static inline bool guest_under_pressure(const struct virtio_balloon *vb)
> > +{
> > +	return atomic_read(&vb->guest_pressure) == 1;
> > +}
> > +
> > +static void vmpressure_event_handler(void *data, int level)
> > +{
> > +	struct virtio_balloon *vb = data;
> > +
> > +	atomic_set(&vb->guest_pressure, 1);
> > +	wake_up(&vb->config_change);
> > +}
> > +
> > +static void tell_host_pressure(struct virtio_balloon *vb)
> > +{
> > +	const uint32_t msg = VIRTIO_BALLOON_MSG_PRESSURE;
> > +	struct scatterlist sg;
> > +	unsigned int len;
> > +	int err;
> > +
> > +	sg_init_one(&sg, &msg, sizeof(msg));
> > +
> > +	err = virtqueue_add_outbuf(vb->message_vq, &sg, 1, vb, GFP_KERNEL);
> > +	if (err < 0) {
> > +		printk(KERN_WARNING "virtio-balloon: failed to send host message (%d)\n", err);
> > +		goto out;
> > +	}
> > +	virtqueue_kick(vb->message_vq);
> > +
> > +	wait_event(vb->message_acked, virtqueue_get_buf(vb->message_vq, &len));
> > +
> > +out:
> > +	atomic_set(&vb->guest_pressure, 0);
> > +}
> > +
> >  static u32 page_to_balloon_pfn(struct page *page)
> >  {
> >  	unsigned long pfn = page_to_pfn(page);
> > @@ -100,6 +145,13 @@ static void balloon_ack(struct virtqueue *vq)
> >  	wake_up(&vb->acked);
> >  }
> >  
> > +static void message_ack(struct virtqueue *vq)
> > +{
> > +	struct virtio_balloon *vb = vq->vdev->priv;
> > +
> > +	wake_up(&vb->message_acked);
> > +}
> > +
> >  static void tell_host(struct virtio_balloon *vb, struct virtqueue *vq)
> >  {
> >  	struct scatterlist sg;
> > @@ -300,6 +352,7 @@ static int balloon(void *_vballoon)
> >  		try_to_freeze();
> >  		wait_event_interruptible(vb->config_change,
> >  					 (diff = towards_target(vb)) != 0
> > +					 || guest_under_pressure(vb)
> >  					 || vb->need_stats_update
> >  					 || kthread_should_stop()
> >  					 || freezing(current));
> > @@ -310,31 +363,38 @@ static int balloon(void *_vballoon)
> >  		else if (diff < 0)
> >  			leak_balloon(vb, -diff);
> >  		update_balloon_size(vb);
> > +
> > +		if (guest_under_pressure(vb))
> > +			tell_host_pressure(vb);
> > +
> >  	}
> >  	return 0;
> >  }
> >  
> >  static int init_vqs(struct virtio_balloon *vb)
> >  {
> > -	struct virtqueue *vqs[3];
> > -	vq_callback_t *callbacks[] = { balloon_ack, balloon_ack, stats_request };
> > -	const char *names[] = { "inflate", "deflate", "stats" };
> > -	int err, nvqs;
> > +	struct virtqueue *vqs[4];
> > +	vq_callback_t *callbacks[] = { balloon_ack, balloon_ack,
> > +				       stats_request, message_ack };
> > +	const char *names[] = { "inflate", "deflate", "stats", "pressure" };
> > +	int err, nvqs, idx;
> >  
> >  	/*
> >  	 * We expect two virtqueues: inflate and deflate, and
> > -	 * optionally stat.
> > +	 * optionally stat and message.
> >  	 */
> > -	nvqs = virtio_has_feature(vb->vdev, VIRTIO_BALLOON_F_STATS_VQ) ? 3 : 2;
> > +	nvqs = 2 + virtio_has_feature(vb->vdev, VIRTIO_BALLOON_F_STATS_VQ) +
> > +		virtio_has_feature(vb->vdev, VIRTIO_BALLOON_F_MESSAGE_VQ);
> >  	err = vb->vdev->config->find_vqs(vb->vdev, nvqs, vqs, callbacks, names);
> >  	if (err)
> >  		return err;
> >  
> > -	vb->inflate_vq = vqs[0];
> > -	vb->deflate_vq = vqs[1];
> > +	idx = 0;
> > +	vb->inflate_vq = vqs[idx++];
> > +	vb->deflate_vq = vqs[idx++];
> >  	if (virtio_has_feature(vb->vdev, VIRTIO_BALLOON_F_STATS_VQ)) {
> >  		struct scatterlist sg;
> > -		vb->stats_vq = vqs[2];
> > +		vb->stats_vq = vqs[idx++];
> >  
> >  		/*
> >  		 * Prime this virtqueue with one buffer so the hypervisor can
> > @@ -346,6 +406,9 @@ static int init_vqs(struct virtio_balloon *vb)
> >  			BUG();
> >  		virtqueue_kick(vb->stats_vq);
> >  	}
> > +	if (virtio_has_feature(vb->vdev, VIRTIO_BALLOON_F_MESSAGE_VQ))
> > +		vb->message_vq = vqs[idx];
> > +
> >  	return 0;
> >  }
> >  
> > @@ -438,9 +501,11 @@ static int virtballoon_probe(struct virtio_device *vdev)
> >  	vb->num_pages = 0;
> >  	mutex_init(&vb->balloon_lock);
> >  	init_waitqueue_head(&vb->config_change);
> > +	init_waitqueue_head(&vb->message_acked);
> >  	init_waitqueue_head(&vb->acked);
> >  	vb->vdev = vdev;
> >  	vb->need_stats_update = 0;
> > +	atomic_set(&vb->guest_pressure, 0);
> >  
> >  	vb_devinfo = balloon_devinfo_alloc(vb);
> >  	if (IS_ERR(vb_devinfo)) {
> > @@ -467,6 +532,12 @@ static int virtballoon_probe(struct virtio_device *vdev)
> >  	if (err)
> >  		goto out_free_vb_mapping;
> >  
> > +	if (virtio_has_feature(vb->vdev, VIRTIO_BALLOON_F_MESSAGE_VQ)) {
> > +		err = vmpressure_register_kernel_event(NULL, vmpressure_event_handler, vb);
> > +		if (err)
> > +			goto out_free_vb_mapping;
> > +	}
> > +
> >  	vb->thread = kthread_run(balloon, vb, "vballoon");
> >  	if (IS_ERR(vb->thread)) {
> >  		err = PTR_ERR(vb->thread);
> > @@ -476,6 +547,7 @@ static int virtballoon_probe(struct virtio_device *vdev)
> >  	return 0;
> >  
> >  out_del_vqs:
> > +	/* FIXME: add vmpressure_deregister_kernel_event() */
> >  	vdev->config->del_vqs(vdev);
> >  out_free_vb_mapping:
> >  	balloon_mapping_free(vb_mapping);
> > @@ -543,6 +615,7 @@ static int virtballoon_restore(struct virtio_device *vdev)
> >  static unsigned int features[] = {
> >  	VIRTIO_BALLOON_F_MUST_TELL_HOST,
> >  	VIRTIO_BALLOON_F_STATS_VQ,
> > +	VIRTIO_BALLOON_F_MESSAGE_VQ,
> >  };
> >  
> >  static struct virtio_driver virtio_balloon_driver = {
> > diff --git a/include/uapi/linux/virtio_balloon.h b/include/uapi/linux/virtio_balloon.h
> > index 5e26f61..846e46c 100644
> > --- a/include/uapi/linux/virtio_balloon.h
> > +++ b/include/uapi/linux/virtio_balloon.h
> > @@ -31,6 +31,7 @@
> >  /* The feature bitmap for virtio balloon */
> >  #define VIRTIO_BALLOON_F_MUST_TELL_HOST	0 /* Tell before reclaiming pages */
> >  #define VIRTIO_BALLOON_F_STATS_VQ	1 /* Memory Stats virtqueue */
> > +#define VIRTIO_BALLOON_F_MESSAGE_VQ	2 /* Message virtqueue */
> >  
> >  /* Size of a PFN in the balloon interface. */
> >  #define VIRTIO_BALLOON_PFN_SHIFT 12
> > -- 
> > 1.8.1.4
> 

--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Luiz Capitulino Jan. 17, 2014, 1:43 a.m. UTC | #3
On Thu, 16 Jan 2014 20:38:19 -0500
Luiz Capitulino <lcapitulino@redhat.com> wrote:

> > What does qemu do with this information?
> 
> There are two possible scenarios:
> 
>  1. The balloon driver is currently inflating when it gets under
>     pressure
> 
>     QEMU resets "num_pages" to the current balloon size. This
>     cancels the on-going inflate
> 
>  2. The balloon driver is not inflating, eg. it's possibly sleeping
> 
>    QEMU issues a deflate
> 
> But note that those scenarios are not supposed to be used with the
> current device, they are part of the automatic ballooning feature.
> I CC'ed you on the QEMU patch, you can find it here case you didn't
> see it:
> 
>  http://marc.info/?l=kvm&m=138988966315690&w=2

By "current device" I meant "outside of automatic ballooning scope".
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Rusty Russell Jan. 20, 2014, 2:13 a.m. UTC | #4
Luiz Capitulino <lcapitulino@redhat.com> writes:
> On Fri, 17 Jan 2014 09:10:47 +1030
> Rusty Russell <rusty@rustcorp.com.au> wrote:
>
>> Luiz Capitulino <lcapitulino@redhat.com> writes:
>> > From: Luiz capitulino <lcapitulino@redhat.com>
>> >
>> > This commit adds support to a new virtqueue called message virtqueue.
>> 
>> OK, this needs a lot of thought (especially since reworking the virtio
>> balloon is on the TODO list for the OASIS virtio technical committee...)
>> 
>> But AFAICT this is a way of explicitly saying "no" to the host's target
>> (vs the implicit method of just not meeting the target).  I'm not sure
>> that gives enough information to the host.  On the other hand, I'm not
>> sure what information we *can* give.
>> 
>> Should we instead be counter-proposing a target?
>
> The problem is how to estimate a target value. I found it simpler
> to just try to obey what the host is asking for (and fail if not
> possible) than trying to make the guest negotiate with the host.

Understood, but we already do this the other way where the host tells
the guest how much memory to give up.

And is a guest expected to automatically inflate the balloon even if the
host doesn't want the memory, or wait to be asked?

> >> What does qemu do with this information?
> >
> > There are two possible scenarios:
> >
> >  1. The balloon driver is currently inflating when it gets under
> >     pressure
> >
> >     QEMU resets "num_pages" to the current balloon size. This
> >     cancels the on-going inflate
> >
> >  2. The balloon driver is not inflating, eg. it's possibly sleeping
> >
> >    QEMU issues a deflate
> >
> > But note that those scenarios are not supposed to be used with the
> > current device, they are part of the automatic ballooning feature.
> > I CC'ed you on the QEMU patch, you can find it here case you didn't
> > see it:
> >
> >  http://marc.info/?l=kvm&m=138988966315690&w=2

Yes, caught that after I replied; I should have checked first.

It seems like we are still figuring out how to do ballooning.  The
best approach in cases like this is to make it simple, so I don't hate
this.

But note that Daniel Kiper and I have been discussing a new virtio
balloon for draft 2 of the standard.  I'll CC you when I post that to
one of the OASIS virtio mailing lists.

Cheers,
Rusty.
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Luiz Capitulino Jan. 20, 2014, 2:59 p.m. UTC | #5
On Mon, 20 Jan 2014 12:43:45 +1030
Rusty Russell <rusty@rustcorp.com.au> wrote:

> Luiz Capitulino <lcapitulino@redhat.com> writes:
> > On Fri, 17 Jan 2014 09:10:47 +1030
> > Rusty Russell <rusty@rustcorp.com.au> wrote:
> >
> >> Luiz Capitulino <lcapitulino@redhat.com> writes:
> >> > From: Luiz capitulino <lcapitulino@redhat.com>
> >> >
> >> > This commit adds support to a new virtqueue called message virtqueue.
> >> 
> >> OK, this needs a lot of thought (especially since reworking the virtio
> >> balloon is on the TODO list for the OASIS virtio technical committee...)
> >> 
> >> But AFAICT this is a way of explicitly saying "no" to the host's target
> >> (vs the implicit method of just not meeting the target).  I'm not sure
> >> that gives enough information to the host.  On the other hand, I'm not
> >> sure what information we *can* give.
> >> 
> >> Should we instead be counter-proposing a target?
> >
> > The problem is how to estimate a target value. I found it simpler
> > to just try to obey what the host is asking for (and fail if not
> > possible) than trying to make the guest negotiate with the host.
> 
> Understood, but we already do this the other way where the host tells
> the guest how much memory to give up.
> 
> And is a guest expected to automatically inflate the balloon even if the
> host doesn't want the memory, or wait to be asked?

In my current design the guest always waits to be asked. Actually, all
automatic inflates and deflates are started by the host. An advantage
of this approach is that all the logic stays on the host, which makes
things simple as it matches with current balloon design.

Btw, you asked about what information we could provide to the host and
I forgot something important. The vmpressure notification (take a look
at patch 1/4 and patch 2/4) does provide an information that could be
interesting to have in the host: it provides a pressure level ("low",
"medium" or "critical") which is part of kernel's ABI with user-space.

I'm thinking about using this pressure level information to implement
different policies for automatic ballooning, which could be set by
the user or a management tool.

> > >> What does qemu do with this information?
> > >
> > > There are two possible scenarios:
> > >
> > >  1. The balloon driver is currently inflating when it gets under
> > >     pressure
> > >
> > >     QEMU resets "num_pages" to the current balloon size. This
> > >     cancels the on-going inflate
> > >
> > >  2. The balloon driver is not inflating, eg. it's possibly sleeping
> > >
> > >    QEMU issues a deflate
> > >
> > > But note that those scenarios are not supposed to be used with the
> > > current device, they are part of the automatic ballooning feature.
> > > I CC'ed you on the QEMU patch, you can find it here case you didn't
> > > see it:
> > >
> > >  http://marc.info/?l=kvm&m=138988966315690&w=2
> 
> Yes, caught that after I replied; I should have checked first.
> 
> It seems like we are still figuring out how to do ballooning.  The
> best approach in cases like this is to make it simple, so I don't hate
> this.

Good news to me :)

> But note that Daniel Kiper and I have been discussing a new virtio
> balloon for draft 2 of the standard.  I'll CC you when I post that to
> one of the OASIS virtio mailing lists.

Thank you, I appreciate that.
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
diff mbox

Patch

diff --git a/drivers/virtio/virtio_balloon.c b/drivers/virtio/virtio_balloon.c
index 5c4a95b..1c3ee71 100644
--- a/drivers/virtio/virtio_balloon.c
+++ b/drivers/virtio/virtio_balloon.c
@@ -29,6 +29,9 @@ 
 #include <linux/module.h>
 #include <linux/balloon_compaction.h>
 
+#include <linux/cgroup.h>
+#include <linux/vmpressure.h>
+
 /*
  * Balloon device works in 4K page units.  So each page is pointed to by
  * multiple balloon pages.  All memory counters in this driver are in balloon
@@ -37,10 +40,12 @@ 
 #define VIRTIO_BALLOON_PAGES_PER_PAGE (unsigned)(PAGE_SIZE >> VIRTIO_BALLOON_PFN_SHIFT)
 #define VIRTIO_BALLOON_ARRAY_PFNS_MAX 256
 
+#define VIRTIO_BALLOON_MSG_PRESSURE 1
+
 struct virtio_balloon
 {
 	struct virtio_device *vdev;
-	struct virtqueue *inflate_vq, *deflate_vq, *stats_vq;
+	struct virtqueue *inflate_vq, *deflate_vq, *stats_vq, *message_vq;
 
 	/* Where the ballooning thread waits for config to change. */
 	wait_queue_head_t config_change;
@@ -51,6 +56,8 @@  struct virtio_balloon
 	/* Waiting for host to ack the pages we released. */
 	wait_queue_head_t acked;
 
+	wait_queue_head_t message_acked;
+
 	/* Number of balloon pages we've told the Host we're not using. */
 	unsigned int num_pages;
 	/*
@@ -71,6 +78,9 @@  struct virtio_balloon
 	/* Memory statistics */
 	int need_stats_update;
 	struct virtio_balloon_stat stats[VIRTIO_BALLOON_S_NR];
+
+	/* Message virtqueue */
+	atomic_t guest_pressure;
 };
 
 static struct virtio_device_id id_table[] = {
@@ -78,6 +88,41 @@  static struct virtio_device_id id_table[] = {
 	{ 0 },
 };
 
+static inline bool guest_under_pressure(const struct virtio_balloon *vb)
+{
+	return atomic_read(&vb->guest_pressure) == 1;
+}
+
+static void vmpressure_event_handler(void *data, int level)
+{
+	struct virtio_balloon *vb = data;
+
+	atomic_set(&vb->guest_pressure, 1);
+	wake_up(&vb->config_change);
+}
+
+static void tell_host_pressure(struct virtio_balloon *vb)
+{
+	const uint32_t msg = VIRTIO_BALLOON_MSG_PRESSURE;
+	struct scatterlist sg;
+	unsigned int len;
+	int err;
+
+	sg_init_one(&sg, &msg, sizeof(msg));
+
+	err = virtqueue_add_outbuf(vb->message_vq, &sg, 1, vb, GFP_KERNEL);
+	if (err < 0) {
+		printk(KERN_WARNING "virtio-balloon: failed to send host message (%d)\n", err);
+		goto out;
+	}
+	virtqueue_kick(vb->message_vq);
+
+	wait_event(vb->message_acked, virtqueue_get_buf(vb->message_vq, &len));
+
+out:
+	atomic_set(&vb->guest_pressure, 0);
+}
+
 static u32 page_to_balloon_pfn(struct page *page)
 {
 	unsigned long pfn = page_to_pfn(page);
@@ -100,6 +145,13 @@  static void balloon_ack(struct virtqueue *vq)
 	wake_up(&vb->acked);
 }
 
+static void message_ack(struct virtqueue *vq)
+{
+	struct virtio_balloon *vb = vq->vdev->priv;
+
+	wake_up(&vb->message_acked);
+}
+
 static void tell_host(struct virtio_balloon *vb, struct virtqueue *vq)
 {
 	struct scatterlist sg;
@@ -300,6 +352,7 @@  static int balloon(void *_vballoon)
 		try_to_freeze();
 		wait_event_interruptible(vb->config_change,
 					 (diff = towards_target(vb)) != 0
+					 || guest_under_pressure(vb)
 					 || vb->need_stats_update
 					 || kthread_should_stop()
 					 || freezing(current));
@@ -310,31 +363,38 @@  static int balloon(void *_vballoon)
 		else if (diff < 0)
 			leak_balloon(vb, -diff);
 		update_balloon_size(vb);
+
+		if (guest_under_pressure(vb))
+			tell_host_pressure(vb);
+
 	}
 	return 0;
 }
 
 static int init_vqs(struct virtio_balloon *vb)
 {
-	struct virtqueue *vqs[3];
-	vq_callback_t *callbacks[] = { balloon_ack, balloon_ack, stats_request };
-	const char *names[] = { "inflate", "deflate", "stats" };
-	int err, nvqs;
+	struct virtqueue *vqs[4];
+	vq_callback_t *callbacks[] = { balloon_ack, balloon_ack,
+				       stats_request, message_ack };
+	const char *names[] = { "inflate", "deflate", "stats", "pressure" };
+	int err, nvqs, idx;
 
 	/*
 	 * We expect two virtqueues: inflate and deflate, and
-	 * optionally stat.
+	 * optionally stat and message.
 	 */
-	nvqs = virtio_has_feature(vb->vdev, VIRTIO_BALLOON_F_STATS_VQ) ? 3 : 2;
+	nvqs = 2 + virtio_has_feature(vb->vdev, VIRTIO_BALLOON_F_STATS_VQ) +
+		virtio_has_feature(vb->vdev, VIRTIO_BALLOON_F_MESSAGE_VQ);
 	err = vb->vdev->config->find_vqs(vb->vdev, nvqs, vqs, callbacks, names);
 	if (err)
 		return err;
 
-	vb->inflate_vq = vqs[0];
-	vb->deflate_vq = vqs[1];
+	idx = 0;
+	vb->inflate_vq = vqs[idx++];
+	vb->deflate_vq = vqs[idx++];
 	if (virtio_has_feature(vb->vdev, VIRTIO_BALLOON_F_STATS_VQ)) {
 		struct scatterlist sg;
-		vb->stats_vq = vqs[2];
+		vb->stats_vq = vqs[idx++];
 
 		/*
 		 * Prime this virtqueue with one buffer so the hypervisor can
@@ -346,6 +406,9 @@  static int init_vqs(struct virtio_balloon *vb)
 			BUG();
 		virtqueue_kick(vb->stats_vq);
 	}
+	if (virtio_has_feature(vb->vdev, VIRTIO_BALLOON_F_MESSAGE_VQ))
+		vb->message_vq = vqs[idx];
+
 	return 0;
 }
 
@@ -438,9 +501,11 @@  static int virtballoon_probe(struct virtio_device *vdev)
 	vb->num_pages = 0;
 	mutex_init(&vb->balloon_lock);
 	init_waitqueue_head(&vb->config_change);
+	init_waitqueue_head(&vb->message_acked);
 	init_waitqueue_head(&vb->acked);
 	vb->vdev = vdev;
 	vb->need_stats_update = 0;
+	atomic_set(&vb->guest_pressure, 0);
 
 	vb_devinfo = balloon_devinfo_alloc(vb);
 	if (IS_ERR(vb_devinfo)) {
@@ -467,6 +532,12 @@  static int virtballoon_probe(struct virtio_device *vdev)
 	if (err)
 		goto out_free_vb_mapping;
 
+	if (virtio_has_feature(vb->vdev, VIRTIO_BALLOON_F_MESSAGE_VQ)) {
+		err = vmpressure_register_kernel_event(NULL, vmpressure_event_handler, vb);
+		if (err)
+			goto out_free_vb_mapping;
+	}
+
 	vb->thread = kthread_run(balloon, vb, "vballoon");
 	if (IS_ERR(vb->thread)) {
 		err = PTR_ERR(vb->thread);
@@ -476,6 +547,7 @@  static int virtballoon_probe(struct virtio_device *vdev)
 	return 0;
 
 out_del_vqs:
+	/* FIXME: add vmpressure_deregister_kernel_event() */
 	vdev->config->del_vqs(vdev);
 out_free_vb_mapping:
 	balloon_mapping_free(vb_mapping);
@@ -543,6 +615,7 @@  static int virtballoon_restore(struct virtio_device *vdev)
 static unsigned int features[] = {
 	VIRTIO_BALLOON_F_MUST_TELL_HOST,
 	VIRTIO_BALLOON_F_STATS_VQ,
+	VIRTIO_BALLOON_F_MESSAGE_VQ,
 };
 
 static struct virtio_driver virtio_balloon_driver = {
diff --git a/include/uapi/linux/virtio_balloon.h b/include/uapi/linux/virtio_balloon.h
index 5e26f61..846e46c 100644
--- a/include/uapi/linux/virtio_balloon.h
+++ b/include/uapi/linux/virtio_balloon.h
@@ -31,6 +31,7 @@ 
 /* The feature bitmap for virtio balloon */
 #define VIRTIO_BALLOON_F_MUST_TELL_HOST	0 /* Tell before reclaiming pages */
 #define VIRTIO_BALLOON_F_STATS_VQ	1 /* Memory Stats virtqueue */
+#define VIRTIO_BALLOON_F_MESSAGE_VQ	2 /* Message virtqueue */
 
 /* Size of a PFN in the balloon interface. */
 #define VIRTIO_BALLOON_PFN_SHIFT 12