
[v10] kvm: add support for irqfd

Message ID 20090526164201.GD9842@redhat.com (mailing list archive)
State New, archived

Commit Message

Michael S. Tsirkin May 26, 2009, 4:42 p.m. UTC
On Wed, May 20, 2009 at 10:30:49AM -0400, Gregory Haskins wrote:
> +static int
> +irqfd_wakeup(wait_queue_t *wait, unsigned mode, int sync, void *key)
> +{
> +	struct _irqfd *irqfd = container_of(wait, struct _irqfd, wait);
> +
> +	/*
> +	 * The wake_up is called with interrupts disabled.  Therefore we need
> +	 * to defer the IRQ injection until later since we need to acquire the
> +	 * kvm->lock to do so.
> +	 */
> +	schedule_work(&irqfd->work);
> +
> +	return 0;
> +}

This schedule_work is there just to work around the spinlock
in eventfd_signal, which we don't really need. Isn't this right?
And this is on each interrupt. Seems like a pity.
How about a flag in eventfd that would
convert it from waking up someone to a plain function call?

Davide, could we add something like the patch below?

Comments

Gregory Haskins May 26, 2009, 6:05 p.m. UTC | #1
Michael S. Tsirkin wrote:
> On Wed, May 20, 2009 at 10:30:49AM -0400, Gregory Haskins wrote:
>   
>> +static int
>> +irqfd_wakeup(wait_queue_t *wait, unsigned mode, int sync, void *key)
>> +{
>> +	struct _irqfd *irqfd = container_of(wait, struct _irqfd, wait);
>> +
>> +	/*
>> +	 * The wake_up is called with interrupts disabled.  Therefore we need
>> +	 * to defer the IRQ injection until later since we need to acquire the
>> +	 * kvm->lock to do so.
>> +	 */
>> +	schedule_work(&irqfd->work);
>> +
>> +	return 0;
>> +}
>>     
>
> This schedule_work is there just to work around the spinlock
> in eventfd_signal, which we don't really need. Isn't this right?
>   

Yep.

> And this is on each interrupt. Seems like a pity.
>   

I agree.  Being able to inject without deferring to a workqueue would be a
good thing.  Note, however, that addressing it at the eventfd/wqh-lock
layer is only part of the picture, since ideally we could inject (i.e.,
call eventfd_signal()) from any atomic context (e.g., hard-irq), not just
the artificial one created by the wqh-based implementation.  I think
Marcelo's irq_lock split-up code is taking us in that direction by
(eventually) allowing the kvm_set_irq() path to be atomic-context friendly.
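To make the per-interrupt cost discussed above concrete, here is a small user-space model of the deferral pattern in irqfd_wakeup(). It is not kernel code, and every name in it is hypothetical. It assumes, as the quoted comment says, that injection needs kvm->lock and therefore may not run while the wait-queue spinlock is held, so the wakeup callback can only queue work and the injection happens later in a worker.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical user-space model; none of these names are kernel APIs. */
struct defer_irqfd {
    bool in_atomic;     /* true while the eventfd wqh spinlock is "held" */
    bool work_pending;  /* a deferred injection is queued */
    int  injected;      /* interrupts delivered to the guest so far */
};

/* Stand-in for the injection path, which (per the quoted comment) needs
 * kvm->lock and therefore must not run in atomic context. */
bool defer_inject(struct defer_irqfd *f)
{
    if (f->in_atomic)
        return false;   /* would sleep under a spinlock: refused */
    f->injected++;
    return true;
}

/* Stand-in for schedule_work(): queuing an item that is already pending
 * is a no-op, so several signals may collapse into one injection. */
bool defer_schedule_work(struct defer_irqfd *f)
{
    if (f->work_pending)
        return false;
    f->work_pending = true;
    return true;
}

/* Stand-in for eventfd_signal() invoking irqfd_wakeup() with the
 * wait-queue lock held and interrupts disabled. */
void defer_signal(struct defer_irqfd *f)
{
    f->in_atomic = true;
    defer_schedule_work(f);  /* the only safe thing to do here */
    f->in_atomic = false;
}

/* Stand-in for the work function, run later in process context. */
void defer_run_work(struct defer_irqfd *f)
{
    assert(!f->in_atomic);
    if (f->work_pending) {
        f->work_pending = false;
        defer_inject(f);
    }
}
```

In this model every signal has to pass through the workqueue before the guest sees anything; if the injection path were safe in atomic context, defer_signal() could call defer_inject() directly and the worker would disappear.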

> How about a flag in eventfd that would
> convert it from waking up someone to a plain function call?
>
> Davide, could we add something like
>
>
> diff --git a/fs/eventfd.c b/fs/eventfd.c
> index 2a701d5..8bfa308 100644
> --- a/fs/eventfd.c
> +++ b/fs/eventfd.c
> @@ -29,6 +29,7 @@ struct eventfd_ctx {
>  	 */
>  	__u64 count;
>  	unsigned int flags;
> +	int nolock;
>  };
>  
>  /*
> @@ -46,6 +47,12 @@ int eventfd_signal(struct file *file, int n)
>  
>  	if (n < 0)
>  		return -EINVAL;
> +	if (ctx->nolock) {
> +               /* Whoever set nolock
> +                  better set wqh.func as well. */
> +		ctx->wqh.func(&ctx->wqh, 0, 0, NULL);
> +		return 0;
> +	}
>  	spin_lock_irqsave(&ctx->wqh.lock, flags);
>  	if (ULLONG_MAX - ctx->count < n)
>  		n = (int) (ULLONG_MAX - ctx->count);
>
>   

If we think we still need to address it at the eventfd layer (which I am
not 100% convinced we do), I think we should generalize it a little more
so that it doesn't completely re-route the notification (there may be
other end-points interested in the event, I suppose).

I am thinking of something along the lines of eventfd using an
srcu_notifier chain internally, with a default notifier registered that
points to a wqh path very much like what we have today.  Then something
like kvm could register an additional srcu_notifier, which should allow it
to be invoked locklessly (*).  In theory, this would leave eventfd free to
support an arbitrary number of end-points, with both locked and lockless
operation.
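The notifier-chain idea can be sketched in user space as a list of callbacks that every signal walks in turn. This is an illustration under stated assumptions, not the kernel's srcu_notifier API: a fixed-size array stands in for the chain, all names are hypothetical, and no SRCU read-side protection or locking is modeled, so it says nothing about which callbacks could really run lockless.

```c
#include <stddef.h>
#include <stdint.h>

#define CHAIN_MAX_NOTIFIERS 4

/* Hypothetical callback type: gets the new count and a private pointer. */
typedef void (*chain_notifier_fn)(uint64_t count, void *priv);

struct chain_notifier {
    chain_notifier_fn fn;
    void *priv;
};

/* Hypothetical eventfd whose wakeup path is a chain of notifiers. */
struct chain_eventfd {
    uint64_t count;
    struct chain_notifier chain[CHAIN_MAX_NOTIFIERS];
    size_t nchain;
};

/* Append a notifier; returns 0 on success, -1 if the chain is full. */
int chain_register(struct chain_eventfd *ctx, chain_notifier_fn fn, void *priv)
{
    if (ctx->nchain == CHAIN_MAX_NOTIFIERS)
        return -1;
    ctx->chain[ctx->nchain].fn = fn;
    ctx->chain[ctx->nchain].priv = priv;
    ctx->nchain++;
    return 0;
}

/* Update the count once, then call every registered notifier, so adding
 * a new consumer does not hide the event from the existing ones. */
void chain_signal(struct chain_eventfd *ctx, uint64_t n)
{
    ctx->count += n;  /* overflow clamping omitted for brevity */
    for (size_t i = 0; i < ctx->nchain; i++)
        ctx->chain[i].fn(ctx->count, ctx->chain[i].priv);
}

/* Example notifier: counts its invocations through priv.  It can stand
 * in for the default wqh notifier or for a kvm injection notifier. */
void chain_count_calls(uint64_t count, void *priv)
{
    (void)count;
    (*(int *)priv)++;
}
```

Registering a second consumer adds an entry next to the default one rather than replacing it, which is the property the nolock flag in the patch does not have.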

-Greg

(*) Disclaimer: I've never looked at the srcu_notifier implementation,
so perhaps this is not what it really offers.  I base this only on a
basic understanding of RCU.

Davide Libenzi May 26, 2009, 8 p.m. UTC | #2
On Tue, 26 May 2009, Michael S. Tsirkin wrote:

> On Wed, May 20, 2009 at 10:30:49AM -0400, Gregory Haskins wrote:
> > +static int
> > +irqfd_wakeup(wait_queue_t *wait, unsigned mode, int sync, void *key)
> > +{
> > +	struct _irqfd *irqfd = container_of(wait, struct _irqfd, wait);
> > +
> > +	/*
> > +	 * The wake_up is called with interrupts disabled.  Therefore we need
> > +	 * to defer the IRQ injection until later since we need to acquire the
> > +	 * kvm->lock to do so.
> > +	 */
> > +	schedule_work(&irqfd->work);
> > +
> > +	return 0;
> > +}
> 
> This schedule_work is there just to work around the spinlock
> in eventfd_signal, which we don't really need. Isn't this right?
> And this is on each interrupt. Seems like a pity.
> How about a flag in eventfd that would
> convert it from waking up someone to a plain function call?
> 
> Davide, could we add something like

I'm sorry, but it's not very pretty. Please find another way around.



> diff --git a/fs/eventfd.c b/fs/eventfd.c
> index 2a701d5..8bfa308 100644
> --- a/fs/eventfd.c
> +++ b/fs/eventfd.c
> @@ -29,6 +29,7 @@ struct eventfd_ctx {
>  	 */
>  	__u64 count;
>  	unsigned int flags;
> +	int nolock;
>  };
>  
>  /*
> @@ -46,6 +47,12 @@ int eventfd_signal(struct file *file, int n)
>  
>  	if (n < 0)
>  		return -EINVAL;
> +	if (ctx->nolock) {
> +               /* Whoever set nolock
> +                  better set wqh.func as well. */
> +		ctx->wqh.func(&ctx->wqh, 0, 0, NULL);
> +		return 0;
> +	}
>  	spin_lock_irqsave(&ctx->wqh.lock, flags);
>  	if (ULLONG_MAX - ctx->count < n)
>  		n = (int) (ULLONG_MAX - ctx->count);



- Davide



Patch

diff --git a/fs/eventfd.c b/fs/eventfd.c
index 2a701d5..8bfa308 100644
--- a/fs/eventfd.c
+++ b/fs/eventfd.c
@@ -29,6 +29,7 @@ struct eventfd_ctx {
 	 */
 	__u64 count;
 	unsigned int flags;
+	int nolock;
 };
 
 /*
@@ -46,6 +47,12 @@ int eventfd_signal(struct file *file, int n)
 
 	if (n < 0)
 		return -EINVAL;
+	if (ctx->nolock) {
+               /* Whoever set nolock
+                  better set wqh.func as well. */
+		ctx->wqh.func(&ctx->wqh, 0, 0, NULL);
+		return 0;
+	}
 	spin_lock_irqsave(&ctx->wqh.lock, flags);
 	if (ULLONG_MAX - ctx->count < n)
 		n = (int) (ULLONG_MAX - ctx->count);
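The effect of the nolock branch in this patch can be seen in a simplified user-space model of eventfd_signal(). The struct, its fields, the callback helper and the -1 error value are hypothetical simplifications (the kernel returns -EINVAL and performs the update and wakeup under wqh.lock); only the control flow of the branch follows the patch.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for struct eventfd_ctx; not the kernel layout. */
struct nolock_eventfd {
    uint64_t count;              /* value a read() would consume */
    bool nolock;                 /* the flag proposed in the patch */
    void (*direct_fn)(void *);   /* the one function nolock calls */
    void *direct_arg;
    int nwaiters;                /* other parties sleeping on the fd */
    int woken;                   /* wakeups delivered to them so far */
};

/* Trivial callback for direct_fn: counts its invocations via arg. */
void nolock_count_call(void *arg)
{
    (*(int *)arg)++;
}

int nolock_eventfd_signal(struct nolock_eventfd *ctx, int n)
{
    if (n < 0)
        return -1;                      /* kernel: -EINVAL */
    if (ctx->nolock) {
        /* Patch behaviour: call one function and return early.
         * count is left untouched and no waiter is woken. */
        ctx->direct_fn(ctx->direct_arg);
        return 0;
    }
    /* Normal path: clamp so count cannot overflow, add, wake waiters. */
    if (UINT64_MAX - ctx->count < (uint64_t)n)
        n = (int)(UINT64_MAX - ctx->count);
    ctx->count += (uint64_t)n;
    ctx->woken += ctx->nwaiters;
    return n;
}
```

With nolock set, the single registered function replaces the normal path entirely, so other users of the same eventfd (for example a task in poll() or read()) see neither a count change nor a wakeup.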