KVM: nVMX: Fix IRQs inject to L2 which belong to L1 since race

Message ID jpg1ttyhtqw.fsf@redhat.com (mailing list archive)
State New, archived

Commit Message

Bandan Das July 7, 2014, 12:56 a.m. UTC
Wanpeng Li <wanpeng.li@linux.intel.com> writes:

> On Thu, Jul 03, 2014 at 01:15:26AM -0400, Bandan Das wrote:
>>Jan Kiszka <jan.kiszka@siemens.com> writes:
>>
>>> On 2014-07-02 08:54, Wanpeng Li wrote:
>>>> This patch fixes bug https://bugzilla.kernel.org/show_bug.cgi?id=72381
>>>> 
>>>> If we didn't inject a still-pending event to L1 because of nested_run_pending,
>>>> KVM_REQ_EVENT should be requested after the vmexit in order to inject the
>>>> event to L1. However, the current logic blindly requests KVM_REQ_EVENT even if
>>>> there is no still-pending event for L1 that was blocked by nested_run_pending.
>>>> This opens a race in which an interrupt that belongs to L1 is injected into L2
>>>> if L0 sends an interrupt to L1 during this window.
>>>> 
>>>>                VCPU0                               another thread 
>>>> 
>>>> L1 intr not blocked on L2 first entry
>>>> vmx_vcpu_run req event 
>>>> kvm check request req event 
>>>> check_nested_events don't have any intr 
>>>> not nested exit 
>>>>                                             intr occur (8254, lapic timer etc)
>>>> inject_pending_event now have intr 
>>>> inject interrupt 
>>>> 
>>>> This patch fixes the race by introducing an l1_events_blocked field in nested_vmx
>>>> which indicates that there is a still-pending event blocked by nested_run_pending,
>>>> and by requesting KVM_REQ_EVENT only if there is such a still-pending event.
>>>
>>> There are more, unrelated reasons why KVM_REQ_EVENT could be set. Why
>>> aren't those able to trigger this scenario?
>>>
>>> In any case, unconditionally setting KVM_REQ_EVENT seems strange and
>>> should be changed.
>>
>>
>>Ugh! I think I am hitting another one but this one's probably because 
>>we are not setting KVM_REQ_EVENT for something we should.
>>
>>Before this patch, I was able to hit this bug every time with
>>"modprobe kvm_intel ept=0 nested=1 enable_shadow_vmcs=0" and then booting 
>>L2. I can verify that I was indeed hitting the race in inject_pending_event.
>>
>>After this patch, I believe I am hitting another bug - this happens 
>>after I boot L2, as above, and then start a Linux kernel compilation
>>and then wait and watch :) It's a pain to debug because this happens
>>almost once in three times; it never happens if I run with ept=1, however,
>>I think that's only because the test completes sooner. But I can confirm
>>that I don't see it if I always set REQ_EVENT if nested_run_pending is set instead of
>>the approach this patch takes.
>>(Any debugging hints or help appreciated!)
>>
>>So, I am not sure if this is the right fix. Rather, I think the safer thing
>>to do is to have the interrupt pending check for injection into L1 at
>>the "same site" as the call to kvm_queue_interrupt() just like we had before 
>>commit b6b8a1451fc40412c57d1. Is there any advantage to having all the 
>>nested events checks together ?
>>
>
> How about reverting commit b6b8a1451 and checking whether the bug you
> mentioned is still there?

Sorry, didn't get time at all to look at this over the weekend but thought of 
putting down what I have so far..

So, as mentioned in http://www.spinics.net/linux/lists/kvm/msg105316.html,
I have two tests - one is just booting up L2 with enable_shadow_vmcs=0 and 
ept=0 and the other is compiling the Linux kernel in L2.

Starting *without* your patch, let's apply this change -

This will presumably avoid the race (assuming only interrupts) in all 
cases.

And, sure enough, booting up L2 comes up fine. The next test compiling the 
kernel goes fine too.

Finally, let's apply your patch on top of these changes. With your change, L2
boots up fine, and when compiling the kernel in L2, I finally encounter a 
hang after some time. (In my last test it took around 22 minutes and I was 
compiling a kernel with everything enabled). The WARN() that we added doesn't
get hit, so it doesn't seem like the same race.

The only thing I can think of at this point is that since this patch
sets REQ_EVENT only under certain conditions, it's exposing a bug in the handling
of some event, a bug that setting REQ_EVENT in all cases apparently hides. Note that
I do think this patch is doing the right thing, but it's just exposing another
bug somewhere else :)

Bandan
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Comments

Wanpeng Li July 7, 2014, 8:46 a.m. UTC | #1
On Sun, Jul 06, 2014 at 08:56:07PM -0400, Bandan Das wrote:
[...]
>>
>> How about reverting commit b6b8a1451 and checking whether the bug you
>> mentioned is still there?
>
>Sorry, didn't get time at all to look at this over the weekend but thought of 
>putting down what I have so far..
>
>So, as mentioned in http://www.spinics.net/linux/lists/kvm/msg105316.html,
>I have two tests - one is just booting up L2 with enable_shadow_vmcs=0 and 
>ept=0 and the other is compiling the Linux kernel in L2.
>
>Starting *without* your patch, let's apply this change -
>diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
>index f32a025..c28730d 100644
>--- a/arch/x86/kvm/x86.c
>+++ b/arch/x86/kvm/x86.c
>@@ -5887,6 +5887,7 @@ static int inject_pending_event(struct kvm_vcpu *vcpu, bool req_int_win)
> 			kvm_x86_ops->set_nmi(vcpu);
> 		}
> 	} else if (kvm_cpu_has_injectable_intr(vcpu)) {
>+		WARN_ON(is_guest_mode(vcpu));
> 		if (kvm_x86_ops->interrupt_allowed(vcpu)) {
> 			kvm_queue_interrupt(vcpu, kvm_cpu_get_interrupt(vcpu),
> 					    false);
>
>This will trigger a warning if we encounter a race (IIUC). Now, when booting L2,
>sure enough, I encounter the following in L0. Also, L2 hangs, so the next test
>(compiling the kernel) is not applicable anymore.
>[139132.361063] Call Trace:
>[139132.361070]  [<ffffffff816c0d31>] dump_stack+0x45/0x56
>[139132.361075]  [<ffffffff81084a7d>] warn_slowpath_common+0x7d/0xa0
>[139132.361077]  [<ffffffff81084b5a>] warn_slowpath_null+0x1a/0x20
>[139132.361093]  [<ffffffffa0437697>] kvm_arch_vcpu_ioctl_run+0xf77/0x1130 [kvm]
>[139132.361100]  [<ffffffffa04331ee>] ? kvm_arch_vcpu_load+0x4e/0x1e0 [kvm]
>[139132.361106]  [<ffffffffa0421bf2>] kvm_vcpu_ioctl+0x2b2/0x590 [kvm]
>[139132.361109]  [<ffffffff811eca08>] do_vfs_ioctl+0x2d8/0x4b0
>[139132.361111]  [<ffffffff811ecc61>] SyS_ioctl+0x81/0xa0
>[139132.361115]  [<ffffffff81114fd6>] ? __audit_syscall_exit+0x1f6/0x2a0
>[139132.361118]  [<ffffffff816c7ee9>] system_call_fastpath+0x16/0x1b
>
>The next step is to apply this change -
>diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
>index f32a025..432aa25 100644
>--- a/arch/x86/kvm/x86.c
>+++ b/arch/x86/kvm/x86.c
>@@ -5887,6 +5887,12 @@ static int inject_pending_event(struct kvm_vcpu *vcpu, bool req_int_win)
> 			kvm_x86_ops->set_nmi(vcpu);
> 		}
> 	} else if (kvm_cpu_has_injectable_intr(vcpu)) {
>+		if (is_guest_mode(vcpu) && kvm_x86_ops->check_nested_events) {
>+			r = kvm_x86_ops->check_nested_events(vcpu, req_int_win);
>+			if (r != 0)
>+				return r;
>+		}
>+		WARN_ON(is_guest_mode(vcpu));
> 		if (kvm_x86_ops->interrupt_allowed(vcpu)) {
> 			kvm_queue_interrupt(vcpu, kvm_cpu_get_interrupt(vcpu),
> 					    false);
>
>This will presumably avoid the race (assuming only interrupts) in all 
>cases.
>
>And, sure enough, booting up L2 comes up fine. The next test compiling the 
>kernel goes fine too.
>
>Finally, let's apply your patch on top of these changes. With your change, L2
>boots up fine, and when compiling the kernel in L2, I finally encounter a 
>hang after some time. (In my last test it took around 22 minutes and I was 
>compiling a kernel with everything enabled). The WARN() that we added doesn't
>get hit, so it doesn't seem like the same race.

Agreed.

>
>The only thing I can think of at this point is that since this patch
>sets REQ_EVENT only under certain conditions, it's exposing a bug in the handling
>of some event, a bug that setting REQ_EVENT in all cases apparently hides. Note that
>I do think this patch is doing the right thing, but it's just exposing another
>bug somewhere else :)

Agreed. 

Hi Paolo, 

Is it OK for you to apply this patch now? More effort can then be spent on
figuring out the other bug, which has no relationship with the race that
this patch fixes.

Regards,
Wanpeng Li 


>
>Bandan
Paolo Bonzini July 7, 2014, 1:03 p.m. UTC | #2
Il 07/07/2014 10:46, Wanpeng Li ha scritto:
> Hi Paolo,
>
> Is it OK for you to apply this patch now? More effort can then be spent on
> figuring out the other bug, which has no relationship with the race that
> this patch fixes.

Which patch?  Yours or Bandan's?

Paolo
Bandan Das July 7, 2014, 5:31 p.m. UTC | #3
Paolo Bonzini <pbonzini@redhat.com> writes:

> Il 07/07/2014 10:46, Wanpeng Li ha scritto:
>> Hi Paolo,
>>
>> Is it OK for you to apply this patch now? More effort can then be spent on
>> figuring out the other bug, which has no relationship with the race that
>> this patch fixes.
>
> Which patch?  Yours or Bandan's?
Why don't we hold off on Wanpeng's patch and instead apply the one I proposed
to call check_nested_events() when checking for interrupt in inject_pending_event() ?

I think that will take care of https://bugzilla.kernel.org/show_bug.cgi?id=72381 
too. Once we figure out what's causing hangs under certain conditions with his
patch, we can apply that and revert this change.


> Paolo
Paolo Bonzini July 7, 2014, 5:34 p.m. UTC | #4
Il 07/07/2014 19:31, Bandan Das ha scritto:
>> >
>> > Which patch?  Yours or Bandan's?
> Why don't we hold off on Wanpeng's patch and instead apply the one I proposed
> to call check_nested_events() when checking for interrupt in inject_pending_event() ?

Exactly, yours seemed better to apply as a quick regression fix.

Can you post it as a toplevel patch, so that the commit message explains 
what's happening?  Perhaps add a comment in the code as well.

Paolo

> I think that will take care of https://bugzilla.kernel.org/show_bug.cgi?id=72381
> too. Once we figure out what's causing hangs under certain conditions with his
> patch, we can apply that and revert this change.
>
>

Bandan Das July 7, 2014, 5:38 p.m. UTC | #5
Paolo Bonzini <pbonzini@redhat.com> writes:

> Il 07/07/2014 19:31, Bandan Das ha scritto:
>>> >
>>> > Which patch?  Yours or Bandan's?
>> Why don't we hold off on Wanpeng's patch and instead apply the one I proposed
>> to call check_nested_events() when checking for interrupt in inject_pending_event() ?
>
> Exactly, yours seemed better to apply as a quick regression fix.
>
> Can you post it as a toplevel patch, so that the commit message
> explains what's happening?  Perhaps add a comment in the code as well.

Ok, will do, thanks!

> Paolo
>
>> I think that will take care of https://bugzilla.kernel.org/show_bug.cgi?id=72381
>> too. Once we figure out what's causing hangs under certain conditions with his
>> patch, we can apply that and revert this change.
>>
>>
Wanpeng Li July 7, 2014, 11:14 p.m. UTC | #6
On Mon, Jul 07, 2014 at 01:38:37PM -0400, Bandan Das wrote:
>Paolo Bonzini <pbonzini@redhat.com> writes:
>
>> Il 07/07/2014 19:31, Bandan Das ha scritto:
>>>> >
>>>> > Which patch?  Yours or Bandan's?
>>> Why don't we hold off on Wanpeng's patch and instead apply the one I proposed
>>> to call check_nested_events() when checking for interrupt in inject_pending_event() ?
>>
>> Exactly, yours seemed better to apply as a quick regression fix.
>>
>> Can you post it as a toplevel patch, so that the commit message
>> explains what's happening?  Perhaps add a comment in the code as well.
>
>Ok, will do, thanks!

As Jan mentioned in http://www.spinics.net/lists/kvm/msg105238.html, "In any case,
unconditionally setting KVM_REQ_EVENT seems strange and should be changed." Your
trick still keeps the unconditional setting of KVM_REQ_EVENT, which is the root
cause of the race. Anyway, I am currently focusing on fixing the hang, and a patch
will be submitted soon.

Regards,
Wanpeng Li 

>
>> Paolo
>>
>>> I think that will take care of https://bugzilla.kernel.org/show_bug.cgi?id=72381
>>> too. Once we figure out what's causing hangs under certain conditions with his
>>> patch, we can apply that and revert this change.
>>>
>>>
Wanpeng Li July 7, 2014, 11:38 p.m. UTC | #7
On Mon, Jul 07, 2014 at 03:03:13PM +0200, Paolo Bonzini wrote:
>Il 07/07/2014 10:46, Wanpeng Li ha scritto:
>>Hi Paolo,
>>
>>Is it OK for you to apply this patch now? More effort can then be spent on
>>figuring out the other bug, which has no relationship with the race that
>>this patch fixes.
>
>Which patch?  Yours or Bandan's?

Please wait; a patch which fixes the hang will be submitted soon.

Regards,
Wanpeng Li 

>
>Paolo
Bandan Das July 8, 2014, 4:35 a.m. UTC | #8
Wanpeng Li <wanpeng.li@linux.intel.com> writes:
...
>
> As Jan mentioned in http://www.spinics.net/lists/kvm/msg105238.html, "In any case,
> unconditionally setting KVM_REQ_EVENT seems strange and should be changed." Your
> trick still keeps the unconditional setting of KVM_REQ_EVENT, which is the root
> cause of the race. Anyway, I am currently focusing on fixing the hang, and a patch
> will be submitted soon.

Right, that's the plan. Once you submit an updated fix, we can always revert
this change :)
Paolo Bonzini July 8, 2014, 5:49 a.m. UTC | #9
Il 08/07/2014 01:38, Wanpeng Li ha scritto:
> On Mon, Jul 07, 2014 at 03:03:13PM +0200, Paolo Bonzini wrote:
>> Il 07/07/2014 10:46, Wanpeng Li ha scritto:
>>> Hi Paolo,
>>>
>>> Is it OK for you to apply this patch now? More effort can then be spent on
>>> figuring out the other bug, which has no relationship with the race that
>>> this patch fixes.
>>
>> Which patch?  Yours or Bandan's?
>
> Please wait; a patch which fixes the hang will be submitted soon.

This is a regression, so I think the right thing to do is to apply 
Bandan's patch to 3.16 and yours to 3.17.

Paolo


Patch

diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index f32a025..c28730d 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -5887,6 +5887,7 @@  static int inject_pending_event(struct kvm_vcpu *vcpu, bool req_int_win)
 			kvm_x86_ops->set_nmi(vcpu);
 		}
 	} else if (kvm_cpu_has_injectable_intr(vcpu)) {
+		WARN_ON(is_guest_mode(vcpu));
 		if (kvm_x86_ops->interrupt_allowed(vcpu)) {
 			kvm_queue_interrupt(vcpu, kvm_cpu_get_interrupt(vcpu),
 					    false);

This will trigger a warning if we encounter a race (IIUC). Now, when booting L2,
sure enough, I encounter the following in L0. Also, L2 hangs, so the next test
(compiling the kernel) is not applicable anymore.
[139132.361063] Call Trace:
[139132.361070]  [<ffffffff816c0d31>] dump_stack+0x45/0x56
[139132.361075]  [<ffffffff81084a7d>] warn_slowpath_common+0x7d/0xa0
[139132.361077]  [<ffffffff81084b5a>] warn_slowpath_null+0x1a/0x20
[139132.361093]  [<ffffffffa0437697>] kvm_arch_vcpu_ioctl_run+0xf77/0x1130 [kvm]
[139132.361100]  [<ffffffffa04331ee>] ? kvm_arch_vcpu_load+0x4e/0x1e0 [kvm]
[139132.361106]  [<ffffffffa0421bf2>] kvm_vcpu_ioctl+0x2b2/0x590 [kvm]
[139132.361109]  [<ffffffff811eca08>] do_vfs_ioctl+0x2d8/0x4b0
[139132.361111]  [<ffffffff811ecc61>] SyS_ioctl+0x81/0xa0
[139132.361115]  [<ffffffff81114fd6>] ? __audit_syscall_exit+0x1f6/0x2a0
[139132.361118]  [<ffffffff816c7ee9>] system_call_fastpath+0x16/0x1b

The next step is to apply this change -
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index f32a025..432aa25 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -5887,6 +5887,12 @@  static int inject_pending_event(struct kvm_vcpu *vcpu, bool req_int_win)
 			kvm_x86_ops->set_nmi(vcpu);
 		}
 	} else if (kvm_cpu_has_injectable_intr(vcpu)) {
+		if (is_guest_mode(vcpu) && kvm_x86_ops->check_nested_events) {
+			r = kvm_x86_ops->check_nested_events(vcpu, req_int_win);
+			if (r != 0)
+				return r;
+		}
+		WARN_ON(is_guest_mode(vcpu));
 		if (kvm_x86_ops->interrupt_allowed(vcpu)) {
 			kvm_queue_interrupt(vcpu, kvm_cpu_get_interrupt(vcpu),
 					    false);