Message ID | 4A521082.40209@novell.com (mailing list archive) |
---|---|
State | New, archived |
Headers | show |
On Mon, Jul 06, 2009 at 10:56:02AM -0400, Gregory Haskins wrote: > Avi Kivity wrote: > > On 07/02/2009 06:50 PM, Avi Kivity wrote: > >> On 07/02/2009 06:37 PM, Gregory Haskins wrote: > >>> (Applies to kvm.git/master:1f9050fd) > >>> > >>> The following is the latest attempt to fix the races in > >>> irqfd/eventfd, as > >>> well as restore DEASSIGN support. For more details, please read the > >>> patch > >>> headers. > >>> > >>> As always, this series has been tested against the kvm-eventfd unit > >>> test > >>> and everything appears to be functioning properly. You can download > >>> this > >>> test here: > >> > >> Applied, thanks. > >> > > > > ... and unapplied. There's a refcounting mismatch in irqfd_cleanup: a > > reference is taken for each irqfd, but dropped for each guest. This > > causes an oops if a guest with no irqfds is created and destroyed: > > I was able to reproduce this issue. The problem turned out to be that I > inadvertently always did a flush_workqueue(), even if the work-queue was > never initialized. > > The following interdiff applied to the reverted patch has been confirmed > to fix the issue: Could you document the init boolean and its locking rules? The best place to put it would be where the field is declared btw. Is it true that init === list_empty(&kvm->irqfds.items)? If yes maybe we don't need this field at all. > ------------------- > > diff --git a/virt/kvm/eventfd.c b/virt/kvm/eventfd.c > index fcc3469..52b0e04 100644 > --- a/virt/kvm/eventfd.c > +++ b/virt/kvm/eventfd.c > @@ -318,6 +318,9 @@ kvm_irqfd_deassign(struct kvm *kvm, int fd, int gsi) > struct _irqfd *irqfd, *tmp; > struct eventfd_ctx *eventfd; > > + if (!kvm->irqfds.init) > + return -ENOENT; > + > eventfd = eventfd_ctx_fdget(fd); > if (IS_ERR(eventfd)) > return PTR_ERR(eventfd); wouldn't it be cleaner to error out in the for each loop if we don't find an entry to deactivate? Might be helpful for apps to get an error if they didn't deassign anything. > @@ -360,6 +363,9 @@ kvm_irqfd_release(struct kvm *kvm) > { > struct _irqfd *irqfd, *tmp; > > + if (!kvm->irqfds.init) > + return; > + So here, I recall some old comment that flush below was needed even if list is empty. Is this no longer true? If not it might be cleaner to only flush if list is not empty. > spin_lock_irq(&kvm->irqfds.lock); > > list_for_each_entry_safe(irqfd, tmp, &kvm->irqfds.items, list) > > --------------------- > > You can pick up this fix folded into the original v9:5/5 patch here: > > git pull > git://git.kernel.org/pub/scm/linux/kernel/git/ghaskins/linux-2.6-hacks.git > for-avi > > Sorry for the sloppy patch in v9. :( Will strive to do better next time. > > Regards, > -Greg > -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Michael S. Tsirkin wrote: > On Mon, Jul 06, 2009 at 10:56:02AM -0400, Gregory Haskins wrote: > >> Avi Kivity wrote: >> >>> On 07/02/2009 06:50 PM, Avi Kivity wrote: >>> >>>> On 07/02/2009 06:37 PM, Gregory Haskins wrote: >>>> >>>>> (Applies to kvm.git/master:1f9050fd) >>>>> >>>>> The following is the latest attempt to fix the races in >>>>> irqfd/eventfd, as >>>>> well as restore DEASSIGN support. For more details, please read the >>>>> patch >>>>> headers. >>>>> >>>>> As always, this series has been tested against the kvm-eventfd unit >>>>> test >>>>> and everything appears to be functioning properly. You can download >>>>> this >>>>> test here: >>>>> >>>> Applied, thanks. >>>> >>>> >>> ... and unapplied. There's a refcounting mismatch in irqfd_cleanup: a >>> reference is taken for each irqfd, but dropped for each guest. This >>> causes an oops if a guest with no irqfds is created and destroyed: >>> >> I was able to reproduce this issue. The problem turned out to be that I >> inadvertently always did a flush_workqueue(), even if the work-queue was >> never initialized. >> >> The following interdiff applied to the reverted patch has been confirmed >> to fix the issue: >> > > Could you document the init boolean and its locking rules? > The best place to put it would be where the field is declared btw. > Will do > Is it true that init === list_empty(&kvm->irqfds.items)? > If yes maybe we don't need this field at all. > > No, because its more difficult to maintain the work-queue when referenced against active irqfds (*). So instead, its maintained against guests that use irqfd, whether they have an active irqfd or not. Otherwise you have to contend with the eventfd-side release, which is a little tricky. (*) I'm sure its not rocket science to get this working, but it was getting more complex than I thought it was worth, so I simplified the model to be per-vm. Note that this design decision/limitation is declared in the patch header. > >> ------------------- >> >> diff --git a/virt/kvm/eventfd.c b/virt/kvm/eventfd.c >> index fcc3469..52b0e04 100644 >> --- a/virt/kvm/eventfd.c >> +++ b/virt/kvm/eventfd.c >> @@ -318,6 +318,9 @@ kvm_irqfd_deassign(struct kvm *kvm, int fd, int gsi) >> struct _irqfd *irqfd, *tmp; >> struct eventfd_ctx *eventfd; >> >> + if (!kvm->irqfds.init) >> + return -ENOENT; >> + >> eventfd = eventfd_ctx_fdget(fd); >> if (IS_ERR(eventfd)) >> return PTR_ERR(eventfd); >> > > wouldn't it be cleaner to error out in the for each loop if we don't > find an entry to deactivate? Might be helpful for apps to get an error > if they didn't deassign anything. > Again, irqfds.init is somewhat orthogonal to whether the list is populated or not. This check is for sanity (how can you deassign if you didnt assign, etc). Normally this would be a simple BUG_ON() sanity check, but I don't want a malicious/broken userspace to gain an easy attack vector ;) > >> @@ -360,6 +363,9 @@ kvm_irqfd_release(struct kvm *kvm) >> { >> struct _irqfd *irqfd, *tmp; >> >> + if (!kvm->irqfds.init) >> + return; >> + >> > > So here, I recall some old comment that flush below was > needed even if list is empty. Is this no longer true? > If you are using irqfd, its true. If irqfds.init == false, you are not using irqfd and thus the flush cannot be needed. > If not it might be cleaner to only flush if list is not empty. > > You have to flush if irqfds.init == true even if the list is empty because you need to be sure that eventfd-side releases complete. They may have already removed themselves from the list, but the work-item is still in flight. Regards, -Greg
On Mon, Jul 06, 2009 at 12:41:59PM -0400, Gregory Haskins wrote: > Michael S. Tsirkin wrote: > > On Mon, Jul 06, 2009 at 10:56:02AM -0400, Gregory Haskins wrote: > > > >> Avi Kivity wrote: > >> > >>> On 07/02/2009 06:50 PM, Avi Kivity wrote: > >>> > >>>> On 07/02/2009 06:37 PM, Gregory Haskins wrote: > >>>> > >>>>> (Applies to kvm.git/master:1f9050fd) > >>>>> > >>>>> The following is the latest attempt to fix the races in > >>>>> irqfd/eventfd, as > >>>>> well as restore DEASSIGN support. For more details, please read the > >>>>> patch > >>>>> headers. > >>>>> > >>>>> As always, this series has been tested against the kvm-eventfd unit > >>>>> test > >>>>> and everything appears to be functioning properly. You can download > >>>>> this > >>>>> test here: > >>>>> > >>>> Applied, thanks. > >>>> > >>>> > >>> ... and unapplied. There's a refcounting mismatch in irqfd_cleanup: a > >>> reference is taken for each irqfd, but dropped for each guest. This > >>> causes an oops if a guest with no irqfds is created and destroyed: > >>> > >> I was able to reproduce this issue. The problem turned out to be that I > >> inadvertently always did a flush_workqueue(), even if the work-queue was > >> never initialized. > >> > >> The following interdiff applied to the reverted patch has been confirmed > >> to fix the issue: > >> > > > > Could you document the init boolean and its locking rules? > > The best place to put it would be where the field is declared btw. > > > > Will do > > > Is it true that init === list_empty(&kvm->irqfds.items)? > > If yes maybe we don't need this field at all. > > > > > No, OK, I thought it is. I'll wait for the documentation patch then. > because its more difficult to maintain the work-queue when > referenced against active irqfds (*). So instead, its maintained > against guests that use irqfd, whether they have an active irqfd or > not. Otherwise you have to contend with the eventfd-side release, which > is a little tricky. > > (*) I'm sure its not rocket science to get this working, but it was > getting more complex than I thought it was worth, so I simplified the > model to be per-vm. Note that this design decision/limitation is > declared in the patch header. > > > >> ------------------- > >> > >> diff --git a/virt/kvm/eventfd.c b/virt/kvm/eventfd.c > >> index fcc3469..52b0e04 100644 > >> --- a/virt/kvm/eventfd.c > >> +++ b/virt/kvm/eventfd.c > >> @@ -318,6 +318,9 @@ kvm_irqfd_deassign(struct kvm *kvm, int fd, int gsi) > >> struct _irqfd *irqfd, *tmp; > >> struct eventfd_ctx *eventfd; > >> > >> + if (!kvm->irqfds.init) > >> + return -ENOENT; > >> + > >> eventfd = eventfd_ctx_fdget(fd); > >> if (IS_ERR(eventfd)) > >> return PTR_ERR(eventfd); > >> > > > > wouldn't it be cleaner to error out in the for each loop if we don't > > find an entry to deactivate? Might be helpful for apps to get an error > > if they didn't deassign anything. > > > > Again, irqfds.init is somewhat orthogonal to whether the list is > populated or not. This check is for sanity (how can you deassign if you > didnt assign, etc). Normally this would be a simple BUG_ON() sanity > check, but I don't want a malicious/broken userspace to gain an easy > attack vector ;) what I'm saying is that deassign should return an error if it's passed and entry that is not on the list. And if you do this and return before flush, this check won't be needed. > > > >> @@ -360,6 +363,9 @@ kvm_irqfd_release(struct kvm *kvm) > >> { > >> struct _irqfd *irqfd, *tmp; > >> > >> + if (!kvm->irqfds.init) > >> + return; > >> + > >> > > > > So here, I recall some old comment that flush below was > > needed even if list is empty. Is this no longer true? > > > > If you are using irqfd, its true. If irqfds.init == false, you are not > using irqfd and thus the flush cannot be needed. > > > If not it might be cleaner to only flush if list is not empty. > > > > > You have to flush if irqfds.init == true even if the list is empty > because you need to be sure that eventfd-side releases complete. They > may have already removed themselves from the list, but the work-item is > still in flight. > > Regards, > -Greg > -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Michael S. Tsirkin wrote: > On Mon, Jul 06, 2009 at 12:41:59PM -0400, Gregory Haskins wrote: > >> Michael S. Tsirkin wrote: >> >>>> >>>> >>> wouldn't it be cleaner to error out in the for each loop if we don't >>> find an entry to deactivate? Might be helpful for apps to get an error >>> if they didn't deassign anything. >>> >>> >> Again, irqfds.init is somewhat orthogonal to whether the list is >> populated or not. This check is for sanity (how can you deassign if you >> didnt assign, etc). Normally this would be a simple BUG_ON() sanity >> check, but I don't want a malicious/broken userspace to gain an easy >> attack vector ;) >> > > what I'm saying is that deassign should return an error if it's passed > and entry that is not on the list. This isn't an unreasonable request, and I believe this is actually the way the original deassign logic worked before we yanked the feature a few weeks ago. Its only slightly complicated by the fact that we may match multiple irqfds, but I think we solved that before by returning the number we matched. If Avi answers the other mail stating he wants to still see the on-demand work go in, lets use your suggestion. Regards, -Greg
diff --git a/virt/kvm/eventfd.c b/virt/kvm/eventfd.c index fcc3469..52b0e04 100644 --- a/virt/kvm/eventfd.c +++ b/virt/kvm/eventfd.c @@ -318,6 +318,9 @@ kvm_irqfd_deassign(struct kvm *kvm, int fd, int gsi) struct _irqfd *irqfd, *tmp; struct eventfd_ctx *eventfd; + if (!kvm->irqfds.init) + return -ENOENT; + eventfd = eventfd_ctx_fdget(fd); if (IS_ERR(eventfd)) return PTR_ERR(eventfd); @@ -360,6 +363,9 @@ kvm_irqfd_release(struct kvm *kvm) { struct _irqfd *irqfd, *tmp; + if (!kvm->irqfds.init) + return; + spin_lock_irq(&kvm->irqfds.lock); list_for_each_entry_safe(irqfd, tmp, &kvm->irqfds.items, list)