Message ID | 20201026175325.585623-2-dwmw2@infradead.org (mailing list archive) |
---|---|
State | New, archived |
Series | [RFC,1/2] sched/wait: Add add_wait_queue_priority() |
On 26/10/20 18:53, David Woodhouse wrote:
> From: David Woodhouse <dwmw@amazon.co.uk>
>
> As far as I can tell, when we use posted interrupts we silently cut off
> the events from userspace, if it's listening on the same eventfd that
> feeds the irqfd.
>
> I like that behaviour. Let's do it all the time, even without posted
> interrupts. It makes it much easier to handle IRQ remapping invalidation
> without having to constantly add/remove the fd from the userspace poll
> set. We can just leave userspace polling on it, and the bypass will...
> well... bypass it.

This looks good, though of course it depends on the somewhat hackish
patch 1. However don't you need to read the eventfd as well, since
userspace will never be able to do so?

Paolo

> Signed-off-by: David Woodhouse <dwmw@amazon.co.uk>
> ---
>  virt/kvm/eventfd.c | 6 ++++--
>  1 file changed, 4 insertions(+), 2 deletions(-)
>
> diff --git a/virt/kvm/eventfd.c b/virt/kvm/eventfd.c
> index d6408bb497dc..39443e2f72bf 100644
> --- a/virt/kvm/eventfd.c
> +++ b/virt/kvm/eventfd.c
> @@ -191,6 +191,7 @@ irqfd_wakeup(wait_queue_entry_t *wait, unsigned mode, int sync, void *key)
>  	struct kvm *kvm = irqfd->kvm;
>  	unsigned seq;
>  	int idx;
> +	int ret = 0;
>
>  	if (flags & EPOLLIN) {
>  		idx = srcu_read_lock(&kvm->irq_srcu);
> @@ -204,6 +205,7 @@ irqfd_wakeup(wait_queue_entry_t *wait, unsigned mode, int sync, void *key)
>  					      false) == -EWOULDBLOCK)
>  			schedule_work(&irqfd->inject);
>  		srcu_read_unlock(&kvm->irq_srcu, idx);
> +		ret = 1;
>  	}
>
>  	if (flags & EPOLLHUP) {
> @@ -227,7 +229,7 @@ irqfd_wakeup(wait_queue_entry_t *wait, unsigned mode, int sync, void *key)
>  		spin_unlock_irqrestore(&kvm->irqfds.lock, iflags);
>  	}
>
> -	return 0;
> +	return ret;
>  }
>
>  static void
> @@ -236,7 +238,7 @@ irqfd_ptable_queue_proc(struct file *file, wait_queue_head_t *wqh,
>  {
>  	struct kvm_kernel_irqfd *irqfd =
>  		container_of(pt, struct kvm_kernel_irqfd, pt);
> -	add_wait_queue(wqh, &irqfd->wait);
> +	add_wait_queue_priority(wqh, &irqfd->wait);
>  }
>
>  /* Must be called under irqfds.lock */
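The "bypass" behaviour both messages describe falls out of how the wait-queue wakeup walk treats exclusive waiters. Below is a paraphrased sketch of the loop in `__wake_up_common()` (kernel/sched/wait.c), trimmed for clarity rather than quoted verbatim; the function name here is invented to mark it as a sketch:

```c
/*
 * Paraphrased sketch of the wakeup walk in kernel/sched/wait.c
 * (__wake_up_common), trimmed for clarity; not the verbatim source.
 */
static int wake_up_walk_sketch(struct wait_queue_head *wq_head,
			       unsigned int mode, int nr_exclusive,
			       int wake_flags, void *key)
{
	struct wait_queue_entry *curr, *next;

	list_for_each_entry_safe(curr, next, &wq_head->head, entry) {
		unsigned int flags = curr->flags;
		/* Call the entry's wakeup function, e.g. irqfd_wakeup(). */
		int ret = curr->func(curr, mode, wake_flags, key);

		if (ret < 0)
			break;
		/*
		 * An exclusive waiter whose callback returns nonzero
		 * consumes an exclusive wakeup slot.  eventfd_signal()
		 * wakes with nr_exclusive == 1, and the priority entry
		 * sits at the head of the list, so irqfd_wakeup()
		 * returning 1 stops the walk before userspace's
		 * ordinary poll waiters are ever reached.
		 */
		if (ret && (flags & WQ_FLAG_EXCLUSIVE) && !--nr_exclusive)
			break;
	}
	return nr_exclusive;
}
```

This is why the patch below both makes `irqfd_wakeup()` return 1 on EPOLLIN and switches registration to `add_wait_queue_priority()`: the nonzero return only short-circuits other waiters if the irqfd entry is exclusive and runs first.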
On Tue, 2020-10-27 at 09:01 +0100, Paolo Bonzini wrote:
> On 26/10/20 18:53, David Woodhouse wrote:
> > From: David Woodhouse <dwmw@amazon.co.uk>
> >
> > As far as I can tell, when we use posted interrupts we silently cut off
> > the events from userspace, if it's listening on the same eventfd that
> > feeds the irqfd.
> >
> > I like that behaviour. Let's do it all the time, even without posted
> > interrupts. It makes it much easier to handle IRQ remapping invalidation
> > without having to constantly add/remove the fd from the userspace poll
> > set. We can just leave userspace polling on it, and the bypass will...
> > well... bypass it.
>
> This looks good, though of course it depends on the somewhat hackish
> patch 1.

I thought it was quite neat :)

> However don't you need to read the eventfd as well, since
> userspace will never be able to do so?

Yes, although that's a separate cleanup, and it was already true before
my patch. Right now, userspace needs to explicitly stop polling on the
VFIO eventfd while it's assigned as a KVM irqfd (to avoid injecting
duplicate interrupts when the kernel isn't using posted interrupts and
events leak through to userspace), so it isn't going to consume the
events in that case either. Nothing has really changed.

The VFIO virqfd is just the same: the count simply builds up while the
kernel handles the events, and is eventually cleared by
eventfd_ctx_remove_wait_queue().

In both cases that works fine in practice, because the events are raised
by eventfd_signal() in the kernel, which succeeds even when the count
reaches ULLONG_MAX. It's only further events sent from *userspace* that
would block in that case.

Both of them theoretically want fixing, regardless of the priority patch.
Since the wq lock is held while the wakeup function (virqfd_wakeup or
irqfd_wakeup for VFIO and KVM respectively) runs, all they really need to
do is call eventfd_ctx_do_read() to consume the events. I'll look at
whether I can find a nicer option than just exporting that.
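A minimal sketch of the drain David describes, placed in `irqfd_wakeup()`'s EPOLLIN branch; it assumes `eventfd_ctx_do_read()` gets exported from fs/eventfd.c, which at the time of this thread it is not (that export is exactly the open question above):

```c
/*
 * Hedged sketch of the fix discussed above, inside irqfd_wakeup()'s
 * EPOLLIN branch in virt/kvm/eventfd.c.  Assumes eventfd_ctx_do_read()
 * is exported from fs/eventfd.c, where it is currently file-local.
 */
if (flags & EPOLLIN) {
	u64 cnt;

	/*
	 * Consuming the count here is safe because the eventfd's
	 * waitqueue lock is held while the wakeup function runs.
	 */
	eventfd_ctx_do_read(irqfd->eventfd, &cnt);

	/* ... existing injection path (kvm_arch_set_irq_inatomic etc.) ... */
}
```

With the count drained at wakeup time, neither userspace nor eventfd_ctx_remove_wait_queue() is left to mop up a counter that has been ratcheting towards ULLONG_MAX.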
```diff
diff --git a/virt/kvm/eventfd.c b/virt/kvm/eventfd.c
index d6408bb497dc..39443e2f72bf 100644
--- a/virt/kvm/eventfd.c
+++ b/virt/kvm/eventfd.c
@@ -191,6 +191,7 @@ irqfd_wakeup(wait_queue_entry_t *wait, unsigned mode, int sync, void *key)
 	struct kvm *kvm = irqfd->kvm;
 	unsigned seq;
 	int idx;
+	int ret = 0;
 
 	if (flags & EPOLLIN) {
 		idx = srcu_read_lock(&kvm->irq_srcu);
@@ -204,6 +205,7 @@ irqfd_wakeup(wait_queue_entry_t *wait, unsigned mode, int sync, void *key)
 					      false) == -EWOULDBLOCK)
 			schedule_work(&irqfd->inject);
 		srcu_read_unlock(&kvm->irq_srcu, idx);
+		ret = 1;
 	}
 
 	if (flags & EPOLLHUP) {
@@ -227,7 +229,7 @@ irqfd_wakeup(wait_queue_entry_t *wait, unsigned mode, int sync, void *key)
 		spin_unlock_irqrestore(&kvm->irqfds.lock, iflags);
 	}
 
-	return 0;
+	return ret;
 }
 
 static void
@@ -236,7 +238,7 @@ irqfd_ptable_queue_proc(struct file *file, wait_queue_head_t *wqh,
 {
 	struct kvm_kernel_irqfd *irqfd =
 		container_of(pt, struct kvm_kernel_irqfd, pt);
-	add_wait_queue(wqh, &irqfd->wait);
+	add_wait_queue_priority(wqh, &irqfd->wait);
 }
 
 /* Must be called under irqfds.lock */
```
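Patch 1 of the series ([RFC,1/2] sched/wait: Add add_wait_queue_priority()) is not shown on this page. As a rough sketch, modelled on the variant that later landed in kernel/sched/wait.c (the RFC version may differ in detail), it marks the entry exclusive and queues it at the head so its wakeup function runs before ordinary waiters:

```c
/*
 * Rough sketch of add_wait_queue_priority() from patch 1 of this
 * series (not shown on this page), modelled on the variant that
 * later landed in kernel/sched/wait.c; the RFC may have differed.
 */
void add_wait_queue_priority(struct wait_queue_head *wq_head,
			     struct wait_queue_entry *wq_entry)
{
	unsigned long flags;

	/*
	 * Mark the entry exclusive (so a nonzero return from its wakeup
	 * function consumes the wakeup) and add it at the head of the
	 * queue, ahead of all ordinary waiters.
	 */
	wq_entry->flags |= WQ_FLAG_EXCLUSIVE | WQ_FLAG_PRIORITY;

	spin_lock_irqsave(&wq_head->lock, flags);
	__add_wait_queue(wq_head, wq_entry);
	spin_unlock_irqrestore(&wq_head->lock, flags);
}
```

Head insertion is the piece Paolo calls "somewhat hackish": it deliberately lets one privileged consumer starve every other waiter on the queue of EPOLLIN events, which here is exactly the intended bypass.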