Message ID: 20090526164201.GD9842@redhat.com (mailing list archive)
State: New, archived
Michael S. Tsirkin wrote:
> On Wed, May 20, 2009 at 10:30:49AM -0400, Gregory Haskins wrote:
>
>> +static int
>> +irqfd_wakeup(wait_queue_t *wait, unsigned mode, int sync, void *key)
>> +{
>> +	struct _irqfd *irqfd = container_of(wait, struct _irqfd, wait);
>> +
>> +	/*
>> +	 * The wake_up is called with interrupts disabled.  Therefore we need
>> +	 * to defer the IRQ injection until later since we need to acquire the
>> +	 * kvm->lock to do so.
>> +	 */
>> +	schedule_work(&irqfd->work);
>> +
>> +	return 0;
>> +}
>>
>
> This schedule_work is there just to work around the spinlock
> in eventfd_signal, which we don't really need. Isn't this right?
>

Yep.

> And this is on each interrupt. Seems like a pity.
>

I agree.  Moving towards a way to inject without deferring to a workqueue
will be a good thing.  Note, however, that addressing it at the
eventfd/wqh-lock layer is only part of the picture, since ideally we want
to be able to inject (i.e. eventfd_signal()) from any atomic context
(e.g. hard-irq), not just the artificial one created by the wqh-based
implementation.  I think Marcelo's irq_lock split-up code is taking us in
that direction by (eventually) allowing the kvm_set_irq() path to be
atomic-context friendly.

> How about a flag in eventfd that would
> convert it from waking up someone to a plain function call?
>
> Davide, could we add something like
>
>
> diff --git a/fs/eventfd.c b/fs/eventfd.c
> index 2a701d5..8bfa308 100644
> --- a/fs/eventfd.c
> +++ b/fs/eventfd.c
> @@ -29,6 +29,7 @@ struct eventfd_ctx {
>  	 */
>  	__u64 count;
>  	unsigned int flags;
> +	int nolock;
>  };
>
>  /*
> @@ -46,6 +47,12 @@ int eventfd_signal(struct file *file, int n)
>
>  	if (n < 0)
>  		return -EINVAL;
> +	if (ctx->nolock) {
> +		/* Whoever set nolock
> +		   better set wqh.func as well. */
> +		ctx->wqh.func(&ctx->wqh, 0, 0, NULL);
> +		return 0;
> +	}
>  	spin_lock_irqsave(&ctx->wqh.lock, flags);
>  	if (ULLONG_MAX - ctx->count < n)
>  		n = (int) (ULLONG_MAX - ctx->count);
>

If we think we still need to address it at the eventfd layer (which I am
not 100% convinced we do), I think we should probably generalize it a
little more and make it so it doesn't completely re-route the
notification (there may be other end-points interested in the event, I
suppose).

I am thinking something along the lines of having the internal eventfd
use an srcu_notifier, with a default notifier registered that points to a
wqh path very much like what we have today.  Then something like kvm
could register an additional srcu_notifier, which should allow it to be
invoked locklessly (*).  This would theoretically leave the eventfd free
to support an arbitrary number of end-points, with both locked and
lockless operation.

-Greg

(*) disclaimer: I've never looked at the srcu_notifier implementation, so
perhaps this is not what it really offers.  I base this only on a basic
RCU understanding.
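As a rough illustration of the srcu_notifier shape Greg sketches above, the fragment below shows what an eventfd-internal notifier chain with a lockless kvm-side callback could look like. This is a minimal sketch only: the structure layout, the eventfd_notifier_*() helpers, and the irqfd_notify() callback are hypothetical names invented for illustration and are not taken from any posted patch.

/*
 * Sketch: eventfd_signal() walks an SRCU notifier chain instead of
 * taking wqh.lock directly.  A default notifier (not shown) would
 * preserve today's wake_up() behaviour; a consumer such as kvm could
 * register an additional callback that is invoked without the lock.
 */
#include <linux/notifier.h>
#include <linux/types.h>

struct eventfd_ctx_sketch {
	struct srcu_notifier_head notifier;	/* hypothetical field */
	__u64 count;
};

static void eventfd_notifier_init(struct eventfd_ctx_sketch *ctx)
{
	srcu_init_notifier_head(&ctx->notifier);
}

/* Consumer registration; may sleep, so done at irqfd setup time. */
static int eventfd_notifier_register(struct eventfd_ctx_sketch *ctx,
				     struct notifier_block *nb)
{
	return srcu_notifier_chain_register(&ctx->notifier, nb);
}

/* Hypothetical kvm-side callback, invoked without holding wqh.lock. */
static int irqfd_notify(struct notifier_block *nb, unsigned long count,
			void *unused)
{
	/*
	 * Inject the interrupt here; this only helps once kvm_set_irq()
	 * itself is safe to call from atomic context.
	 */
	return NOTIFY_OK;
}

static struct notifier_block irqfd_nb = {
	.notifier_call = irqfd_notify,
};

/* Signal path: the chain walk takes only the SRCU read lock. */
static int eventfd_signal_sketch(struct eventfd_ctx_sketch *ctx, int n)
{
	srcu_notifier_call_chain(&ctx->notifier, n, NULL);
	return n;
}

An irqfd would then call eventfd_notifier_register(ctx, &irqfd_nb) at setup time. Whether srcu_notifier_call_chain() is in fact safe to invoke from hard-irq context is exactly the open question Greg flags in his disclaimer.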
On Tue, 26 May 2009, Michael S. Tsirkin wrote:

> On Wed, May 20, 2009 at 10:30:49AM -0400, Gregory Haskins wrote:
> > +static int
> > +irqfd_wakeup(wait_queue_t *wait, unsigned mode, int sync, void *key)
> > +{
> > +	struct _irqfd *irqfd = container_of(wait, struct _irqfd, wait);
> > +
> > +	/*
> > +	 * The wake_up is called with interrupts disabled.  Therefore we need
> > +	 * to defer the IRQ injection until later since we need to acquire the
> > +	 * kvm->lock to do so.
> > +	 */
> > +	schedule_work(&irqfd->work);
> > +
> > +	return 0;
> > +}
>
> This schedule_work is there just to work around the spinlock
> in eventfd_signal, which we don't really need. Isn't this right?
> And this is on each interrupt. Seems like a pity.
> How about a flag in eventfd that would
> convert it from waking up someone to a plain function call?
>
> Davide, could we add something like

I'm sorry, but it's not very pretty. Please find another way around.

> diff --git a/fs/eventfd.c b/fs/eventfd.c
> index 2a701d5..8bfa308 100644
> --- a/fs/eventfd.c
> +++ b/fs/eventfd.c
> @@ -29,6 +29,7 @@ struct eventfd_ctx {
>  	 */
>  	__u64 count;
>  	unsigned int flags;
> +	int nolock;
>  };
>
>  /*
> @@ -46,6 +47,12 @@ int eventfd_signal(struct file *file, int n)
>
>  	if (n < 0)
>  		return -EINVAL;
> +	if (ctx->nolock) {
> +		/* Whoever set nolock
> +		   better set wqh.func as well. */
> +		ctx->wqh.func(&ctx->wqh, 0, 0, NULL);
> +		return 0;
> +	}
>  	spin_lock_irqsave(&ctx->wqh.lock, flags);
>  	if (ULLONG_MAX - ctx->count < n)
>  		n = (int) (ULLONG_MAX - ctx->count);

- Davide
diff --git a/fs/eventfd.c b/fs/eventfd.c
index 2a701d5..8bfa308 100644
--- a/fs/eventfd.c
+++ b/fs/eventfd.c
@@ -29,6 +29,7 @@ struct eventfd_ctx {
 	 */
 	__u64 count;
 	unsigned int flags;
+	int nolock;
 };

 /*
@@ -46,6 +47,12 @@ int eventfd_signal(struct file *file, int n)

 	if (n < 0)
 		return -EINVAL;
+	if (ctx->nolock) {
+		/* Whoever set nolock
+		   better set wqh.func as well. */
+		ctx->wqh.func(&ctx->wqh, 0, 0, NULL);
+		return 0;
+	}
 	spin_lock_irqsave(&ctx->wqh.lock, flags);
 	if (ULLONG_MAX - ctx->count < n)
 		n = (int) (ULLONG_MAX - ctx->count);