
[v4,2/2] KVM: nVMX: Fix preemption timer bit set in vmcs02 even if L1 doesn't enable it

Message ID 1467893939-3335-2-git-send-email-wanpeng.li@hotmail.com (mailing list archive)
State New, archived

Commit Message

Wanpeng Li July 7, 2016, 12:18 p.m. UTC
From: Wanpeng Li <wanpeng.li@hotmail.com>

After L0 emulates VMRESUME, we go back to the vcpu_run() loop, which may
incur kvm_sched_out and kvm_sched_in operations since cond_resched()
is called whenever a reschedule is needed. The preemption timer is then
reprogrammed if the vCPU is scheduled to a different pCPU. As a result,
the preemption timer bit of vmcs02 gets set whenever L0 enables the
preemption timer to run L1, even if L1 doesn't enable the preemption
timer to run L2.

This patch fixes it by not reprogramming the preemption timer of vmcs02
if L1's vCPU is scheduled on a different pCPU while we are on the way to
vmresume the nested guest, falling back to the hrtimer-based emulation
method instead.

Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: Radim Krčmář <rkrcmar@redhat.com>
Cc: Yunhong Jiang <yunhong.jiang@intel.com>
Cc: Jan Kiszka <jan.kiszka@siemens.com>
Cc: Haozhong Zhang <haozhong.zhang@intel.com>
Signed-off-by: Wanpeng Li <wanpeng.li@hotmail.com>
---
v3 -> v4:
 * fall back to the hrtimer-based emulation method when on the way to vmresume the nested guest

 arch/x86/kvm/x86.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)
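
For context, the path described above looks roughly like this (a
simplified sketch of vcpu_run() in arch/x86/kvm/x86.c; locking, error
handling and unrelated details are elided):

    static int vcpu_run(struct kvm_vcpu *vcpu)
    {
            int r;

            for (;;) {
                    /* may emulate VMRESUME and enter guest mode */
                    r = vcpu_enter_guest(vcpu);
                    if (r <= 0)
                            break;
                    if (need_resched())
                            /*
                             * May schedule the vCPU out and back in
                             * (kvm_sched_out()/kvm_sched_in()).  On a new
                             * pCPU, kvm_arch_vcpu_load() reprograms the
                             * preemption timer via set_hv_timer(), which
                             * writes the *loaded* VMCS -- a vmcs02 when we
                             * are on the way to vmresume the nested guest.
                             */
                            cond_resched();
            }
            return r;
    }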

Comments

Paolo Bonzini July 7, 2016, 12:29 p.m. UTC | #1
On 07/07/2016 14:18, Wanpeng Li wrote:
> From: Wanpeng Li <wanpeng.li@hotmail.com>
> 
> After L0 emulates VMRESUME, we go back to the vcpu_run() loop, which may
> incur kvm_sched_out and kvm_sched_in operations since cond_resched()
> is called whenever a reschedule is needed. The preemption timer is then
> reprogrammed if the vCPU is scheduled to a different pCPU. As a result,
> the preemption timer bit of vmcs02 gets set whenever L0 enables the
> preemption timer to run L1, even if L1 doesn't enable the preemption
> timer to run L2.
> 
> This patch fixes it by not reprogramming the preemption timer of vmcs02
> if L1's vCPU is scheduled on a different pCPU while we are on the way to
> vmresume the nested guest, falling back to the hrtimer-based emulation
> method instead.
> 
> Cc: Paolo Bonzini <pbonzini@redhat.com>
> Cc: Radim Krčmář <rkrcmar@redhat.com>
> Cc: Yunhong Jiang <yunhong.jiang@intel.com>
> Cc: Jan Kiszka <jan.kiszka@siemens.com>
> Cc: Haozhong Zhang <haozhong.zhang@intel.com>
> Signed-off-by: Wanpeng Li <wanpeng.li@hotmail.com>
> ---
> v3 -> v4:
>  * fall back to the hrtimer-based emulation method when on the way to vmresume the nested guest
> 
>  arch/x86/kvm/x86.c | 3 ++-
>  1 file changed, 2 insertions(+), 1 deletion(-)
> 
> diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
> index 0cc6cf8..05137c0 100644
> --- a/arch/x86/kvm/x86.c
> +++ b/arch/x86/kvm/x86.c
> @@ -2743,8 +2743,9 @@ void kvm_arch_vcpu_load(struct kvm_vcpu *vcpu, int cpu)
>  			mark_tsc_unstable("KVM discovered backwards TSC");
>  
>  		if (kvm_lapic_hv_timer_in_use(vcpu) &&
> +			(is_guest_mode(vcpu) ||
>  				kvm_x86_ops->set_hv_timer(vcpu,
> -					kvm_get_lapic_tscdeadline_msr(vcpu)))
> +					kvm_get_lapic_tscdeadline_msr(vcpu))))
>  			kvm_lapic_switch_to_sw_timer(vcpu);
>  		if (check_tsc_unstable()) {
>  			u64 offset = kvm_compute_tsc_offset(vcpu,
> 

Thanks, this is good as a fallback.  I'll try to fix it by getting the
pin-based execution controls right but if I fail this patch is okay.

Reviewed-by: Paolo Bonzini <pbonzini@redhat.com>
Wanpeng Li July 7, 2016, 1:23 p.m. UTC | #2
2016-07-07 20:29 GMT+08:00 Paolo Bonzini <pbonzini@redhat.com>:
>
>
> On 07/07/2016 14:18, Wanpeng Li wrote:
>> From: Wanpeng Li <wanpeng.li@hotmail.com>
>>
>> After L0 emulates VMRESUME, we go back to the vcpu_run() loop, which may
>> incur kvm_sched_out and kvm_sched_in operations since cond_resched()
>> is called whenever a reschedule is needed. The preemption timer is then
>> reprogrammed if the vCPU is scheduled to a different pCPU. As a result,
>> the preemption timer bit of vmcs02 gets set whenever L0 enables the
>> preemption timer to run L1, even if L1 doesn't enable the preemption
>> timer to run L2.
>>
>> This patch fixes it by not reprogramming the preemption timer of vmcs02
>> if L1's vCPU is scheduled on a different pCPU while we are on the way to
>> vmresume the nested guest, falling back to the hrtimer-based emulation
>> method instead.
>>
>> Cc: Paolo Bonzini <pbonzini@redhat.com>
>> Cc: Radim Krčmář <rkrcmar@redhat.com>
>> Cc: Yunhong Jiang <yunhong.jiang@intel.com>
>> Cc: Jan Kiszka <jan.kiszka@siemens.com>
>> Cc: Haozhong Zhang <haozhong.zhang@intel.com>
>> Signed-off-by: Wanpeng Li <wanpeng.li@hotmail.com>
>> ---
>> v3 -> v4:
>>  * fall back to the hrtimer-based emulation method when on the way to vmresume the nested guest
>>
>>  arch/x86/kvm/x86.c | 3 ++-
>>  1 file changed, 2 insertions(+), 1 deletion(-)
>>
>> diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
>> index 0cc6cf8..05137c0 100644
>> --- a/arch/x86/kvm/x86.c
>> +++ b/arch/x86/kvm/x86.c
>> @@ -2743,8 +2743,9 @@ void kvm_arch_vcpu_load(struct kvm_vcpu *vcpu, int cpu)
>>                       mark_tsc_unstable("KVM discovered backwards TSC");
>>
>>               if (kvm_lapic_hv_timer_in_use(vcpu) &&
>> +                     (is_guest_mode(vcpu) ||
>>                               kvm_x86_ops->set_hv_timer(vcpu,
>> -                                     kvm_get_lapic_tscdeadline_msr(vcpu)))
>> +                                     kvm_get_lapic_tscdeadline_msr(vcpu))))
>>                       kvm_lapic_switch_to_sw_timer(vcpu);
>>               if (check_tsc_unstable()) {
>>                       u64 offset = kvm_compute_tsc_offset(vcpu,
>>
>
> Thanks, this is good as a fallback.  I'll try to fix it by getting the
> pin-based execution controls right but if I fail this patch is okay.

I believe we still need this patch even if you eventually implement "L1
TSC deadline timer to trigger while L2 is running". The code you posted
before:

  exec_control = vmcs12->pin_based_vm_exec_control;
+exec_control &= ~PIN_BASED_VMX_PREEMPTION_TIMER;
  exec_control |= vmcs_config.pin_based_exec_ctrl;
- exec_control &= ~PIN_BASED_VMX_PREEMPTION_TIMER;
+ if (vmx->hv_deadline_tsc == -1)
+     exec_control &= ~PIN_BASED_VMX_PREEMPTION_TIMER;

So there is still a case where the preemption timer bit of vmcs02 is not
set; however, the scenario I mentioned above in kvm_arch_vcpu_load()
will set it unnecessarily.

Regards,
Wanpeng Li
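
(For context, that snippet is the pin-based controls setup in
prepare_vmcs02() in arch/x86/kvm/vmx.c.  Roughly, with the change
applied -- a sketch, with explanatory comments added:)

    /* relevant part of prepare_vmcs02() */
    exec_control = vmcs12->pin_based_vm_exec_control;
    /* the preemption timer bit is never taken from vmcs12 ... */
    exec_control &= ~PIN_BASED_VMX_PREEMPTION_TIMER;
    exec_control |= vmcs_config.pin_based_exec_ctrl;
    /* ... and is kept only if L0 itself has armed the hv timer */
    if (vmx->hv_deadline_tsc == -1)
            exec_control &= ~PIN_BASED_VMX_PREEMPTION_TIMER;
    vmcs_write32(PIN_BASED_VM_EXEC_CONTROL, exec_control);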
Paolo Bonzini July 7, 2016, 2:11 p.m. UTC | #3
On 07/07/2016 15:23, Wanpeng Li wrote:
>>> >>
>>> >>               if (kvm_lapic_hv_timer_in_use(vcpu) &&
>>> >> +                     (is_guest_mode(vcpu) ||
>>> >>                               kvm_x86_ops->set_hv_timer(vcpu,
>>> >> -                                     kvm_get_lapic_tscdeadline_msr(vcpu)))
>>> >> +                                     kvm_get_lapic_tscdeadline_msr(vcpu))))
>>> >>                       kvm_lapic_switch_to_sw_timer(vcpu);
>>> >>               if (check_tsc_unstable()) {
>>> >>                       u64 offset = kvm_compute_tsc_offset(vcpu,
>>> >>
>> >
>> > Thanks, this is good as a fallback.  I'll try to fix it by getting the
>> > pin-based execution controls right but if I fail this patch is okay.
> I believe we still need this patch even if you eventually implement "L1
> TSC deadline timer to trigger while L2 is running". The code you posted
> before:
> 
>   exec_control = vmcs12->pin_based_vm_exec_control;
> +exec_control &= ~PIN_BASED_VMX_PREEMPTION_TIMER;
>   exec_control |= vmcs_config.pin_based_exec_ctrl;
> - exec_control &= ~PIN_BASED_VMX_PREEMPTION_TIMER;
> + if (vmx->hv_deadline_tsc == -1)
> +     exec_control &= ~PIN_BASED_VMX_PREEMPTION_TIMER;
> 
> So there is still a case where the preemption timer bit of vmcs02 is not
> set; however, the scenario I mentioned above in kvm_arch_vcpu_load()
> will set it unnecessarily.

kvm_x86_ops->set_hv_timer _will_ set the preemption timer bit of vmcs02
if vmcs02 is the loaded one.

This can happen if L2 has access to L1's local APIC registers (i.e. L1
passes the local APIC instead of emulating it, as is the case in a
partitioning hypervisor).  While L2 runs, it writes to the TSC deadline
MSR of L1.  This causes a call to kvm_x86_ops->set_hv_timer while the
active VMCS is a vmcs02.

Paolo
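
(For reference, a simplified sketch of vmx_set_hv_timer() from the
preemption timer enablement series; the TSC-scaling and range checks are
elided:)

    static int vmx_set_hv_timer(struct kvm_vcpu *vcpu, u64 guest_deadline_tsc)
    {
            struct vcpu_vmx *vmx = to_vmx(vcpu);
            u64 tscl = rdtsc();
            u64 guest_tscl = kvm_read_l1_tsc(vcpu, tscl);
            u64 delta_tsc = max(guest_deadline_tsc, guest_tscl) - guest_tscl;

            /* ... TSC scaling and 32-bit range checks elided ... */

            vmx->hv_deadline_tsc = tscl + delta_tsc;
            /*
             * This sets the bit in whatever VMCS is currently loaded:
             * a vmcs02 if the call happens while L2 is running.
             */
            vmcs_set_bits(PIN_BASED_VM_EXEC_CONTROL,
                          PIN_BASED_VMX_PREEMPTION_TIMER);
            return 0;
    }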
Wanpeng Li July 8, 2016, 12:38 a.m. UTC | #4
2016-07-07 22:11 GMT+08:00 Paolo Bonzini <pbonzini@redhat.com>:
>
>
> On 07/07/2016 15:23, Wanpeng Li wrote:
>>>> >>
>>>> >>               if (kvm_lapic_hv_timer_in_use(vcpu) &&
>>>> >> +                     (is_guest_mode(vcpu) ||
>>>> >>                               kvm_x86_ops->set_hv_timer(vcpu,
>>>> >> -                                     kvm_get_lapic_tscdeadline_msr(vcpu)))
>>>> >> +                                     kvm_get_lapic_tscdeadline_msr(vcpu))))
>>>> >>                       kvm_lapic_switch_to_sw_timer(vcpu);
>>>> >>               if (check_tsc_unstable()) {
>>>> >>                       u64 offset = kvm_compute_tsc_offset(vcpu,
>>>> >>
>>> >
>>> > Thanks, this is good as a fallback.  I'll try to fix it by getting the
>>> > pin-based execution controls right but if I fail this patch is okay.
>> I believe we still need this patch even if you eventually implement "L1
>> TSC deadline timer to trigger while L2 is running". The code you posted
>> before:
>>
>>   exec_control = vmcs12->pin_based_vm_exec_control;
>> +exec_control &= ~PIN_BASED_VMX_PREEMPTION_TIMER;
>>   exec_control |= vmcs_config.pin_based_exec_ctrl;
>> - exec_control &= ~PIN_BASED_VMX_PREEMPTION_TIMER;
>> + if (vmx->hv_deadline_tsc == -1)
>> +     exec_control &= ~PIN_BASED_VMX_PREEMPTION_TIMER;
>>
>> So there is still a case where the preemption timer bit of vmcs02 is not
>> set; however, the scenario I mentioned above in kvm_arch_vcpu_load()
>> will set it unnecessarily.
>
> kvm_x86_ops->set_hv_timer _will_ set the preemption timer bit of vmcs02
> if vmcs02 is the loaded one.
>
> This can happen if L2 has access to L1's local APIC registers (i.e. L1
> passes the local APIC instead of emulating it, as is the case in a
> partitioning hypervisor).  While L2 runs, it writes to the TSC deadline
> MSR of L1.  This causes a call to kvm_x86_ops->set_hv_timer while the
> active VMCS is a vmcs02.

Yes, in the scenario you pointed out, the call to
kvm_x86_ops->set_hv_timer while the active VMCS is vmcs02 is correct;
however, in the scenario I mentioned in the patch description it is not
correct, even with "L1 TSC deadline timer to trigger while L2 is
running" enabled.

Regards,
Wanpeng Li
Paolo Bonzini July 8, 2016, 10:18 a.m. UTC | #5
On 08/07/2016 02:38, Wanpeng Li wrote:
> 2016-07-07 22:11 GMT+08:00 Paolo Bonzini <pbonzini@redhat.com>:
>>
>>
>> On 07/07/2016 15:23, Wanpeng Li wrote:
>>>>>>>
>>>>>>>               if (kvm_lapic_hv_timer_in_use(vcpu) &&
>>>>>>> +                     (is_guest_mode(vcpu) ||
>>>>>>>                               kvm_x86_ops->set_hv_timer(vcpu,
>>>>>>> -                                     kvm_get_lapic_tscdeadline_msr(vcpu)))
>>>>>>> +                                     kvm_get_lapic_tscdeadline_msr(vcpu))))
>>>>>>>                       kvm_lapic_switch_to_sw_timer(vcpu);
>>>>>>>               if (check_tsc_unstable()) {
>>>>>>>                       u64 offset = kvm_compute_tsc_offset(vcpu,
>>>>>>>
>>>>>
>>>>> Thanks, this is good as a fallback.  I'll try to fix it by getting the
>>>>> pin-based execution controls right but if I fail this patch is okay.
>>> I believe we still need this patch even if you eventually implement "L1
>>> TSC deadline timer to trigger while L2 is running". The code you posted
>>> before:
>>>
>>>   exec_control = vmcs12->pin_based_vm_exec_control;
>>> +exec_control &= ~PIN_BASED_VMX_PREEMPTION_TIMER;
>>>   exec_control |= vmcs_config.pin_based_exec_ctrl;
>>> - exec_control &= ~PIN_BASED_VMX_PREEMPTION_TIMER;
>>> + if (vmx->hv_deadline_tsc == -1)
>>> +     exec_control &= ~PIN_BASED_VMX_PREEMPTION_TIMER;
>>>
>>> So there is still a case where the preemption timer bit of vmcs02 is not
>>> set; however, the scenario I mentioned above in kvm_arch_vcpu_load()
>>> will set it unnecessarily.
>>
>> kvm_x86_ops->set_hv_timer _will_ set the preemption timer bit of vmcs02
>> if vmcs02 is the loaded one.
>>
>> This can happen if L2 has access to L1's local APIC registers (i.e. L1
>> passes the local APIC instead of emulating it, as is the case in a
>> partitioning hypervisor).  While L2 runs, it writes to the TSC deadline
>> MSR of L1.  This causes a call to kvm_x86_ops->set_hv_timer while the
>> active VMCS is a vmcs02.
> 
> Yes, in the scenario you pointed out, the call to
> kvm_x86_ops->set_hv_timer while the active VMCS is vmcs02 is correct;
> however, in the scenario I mentioned in the patch description it is not
> correct, even with "L1 TSC deadline timer to trigger while L2 is
> running" enabled.

It doesn't help that you have not explained how to reproduce the
bug---this is what the cover letter and commit messages are for, too.

Your patch 1 is enough for me to boot L2 Windows 2008 inside L1 KVM 4.1.
 So I have an updated patch to handle the TSC deadline timer while L2 is
running, but I have no idea how to test its correctness.  I'll send the
patch shortly.

Paolo
Wanpeng Li July 8, 2016, 1:58 p.m. UTC | #6
2016-07-08 18:18 GMT+08:00 Paolo Bonzini <pbonzini@redhat.com>:
>
>
> On 08/07/2016 02:38, Wanpeng Li wrote:
>> 2016-07-07 22:11 GMT+08:00 Paolo Bonzini <pbonzini@redhat.com>:
>>>
>>>
>>> On 07/07/2016 15:23, Wanpeng Li wrote:
>>>>>>>>
>>>>>>>>               if (kvm_lapic_hv_timer_in_use(vcpu) &&
>>>>>>>> +                     (is_guest_mode(vcpu) ||
>>>>>>>>                               kvm_x86_ops->set_hv_timer(vcpu,
>>>>>>>> -                                     kvm_get_lapic_tscdeadline_msr(vcpu)))
>>>>>>>> +                                     kvm_get_lapic_tscdeadline_msr(vcpu))))
>>>>>>>>                       kvm_lapic_switch_to_sw_timer(vcpu);
>>>>>>>>               if (check_tsc_unstable()) {
>>>>>>>>                       u64 offset = kvm_compute_tsc_offset(vcpu,
>>>>>>>>
>>>>>>
>>>>>> Thanks, this is good as a fallback.  I'll try to fix it by getting the
>>>>>> pin-based execution controls right but if I fail this patch is okay.
>>>> I believe we still need this patch even if you eventually implement "L1
>>>> TSC deadline timer to trigger while L2 is running". The code you posted
>>>> before:
>>>>
>>>>   exec_control = vmcs12->pin_based_vm_exec_control;
>>>> +exec_control &= ~PIN_BASED_VMX_PREEMPTION_TIMER;
>>>>   exec_control |= vmcs_config.pin_based_exec_ctrl;
>>>> - exec_control &= ~PIN_BASED_VMX_PREEMPTION_TIMER;
>>>> + if (vmx->hv_deadline_tsc == -1)
>>>> +     exec_control &= ~PIN_BASED_VMX_PREEMPTION_TIMER;
>>>>
>>>> So there is still a case where the preemption timer bit of vmcs02 is not
>>>> set; however, the scenario I mentioned above in kvm_arch_vcpu_load()
>>>> will set it unnecessarily.
>>>
>>> kvm_x86_ops->set_hv_timer _will_ set the preemption timer bit of vmcs02
>>> if vmcs02 is the loaded one.
>>>
>>> This can happen if L2 has access to L1's local APIC registers (i.e. L1
>>> passes the local APIC instead of emulating it, as is the case in a
>>> partitioning hypervisor).  While L2 runs, it writes to the TSC deadline
>>> MSR of L1.  This causes a call to kvm_x86_ops->set_hv_timer while the
>>> active VMCS is a vmcs02.
>>
>> Yes, in the scenario you pointed out, the call to
>> kvm_x86_ops->set_hv_timer while the active VMCS is vmcs02 is correct;
>> however, in the scenario I mentioned in the patch description it is not
>> correct, even with "L1 TSC deadline timer to trigger while L2 is
>> running" enabled.
>
> It doesn't help that you have not explained how to reproduce the
> bug---this is what the cover letter and commit messages are for, too.

I believe you pointed out this before:

| The patch looks correct, but I'm not sure how you get a preemption
| timer vmexit while vmcs02 is active:
|
| exec_control = vmcs12->pin_based_vm_exec_control;
| exec_control |= vmcs_config.pin_based_exec_ctrl;
| exec_control &= ~PIN_BASED_VMX_PREEMPTION_TIMER;

We can't get a preemption timer vmexit while vmcs02 is loaded, since the
preemption timer bit in vmcs02 is not set; so how can we get the
incorrect preemption timer vmexit in the nested guest which patch 1
fixed? Because the scenario I mentioned in patch 2 sets the bit in vmcs02.

w/o patch 1 + w/o enabling "L1 TSC deadline timer to trigger while L2 is
running" => we get the bug calltrace mentioned in patch 1, since the
incorrect vmcs02 bit is set due to the bug mentioned in patch 2. So
applying patch 2 fixes it.

However, after enabling "L1 TSC deadline timer to trigger while L2 is
running", we still need patch 1 to correctly capture the nested vmexit
for the preemption timer.

My setup: L0 and L1 both run the latest kvm queue branch w/ Yunhong's
preemption timer enablement patches and my previous "fix missed
cancellation of TSC deadline timer" patches. I always enable the
preemption timer in L0, but try both enabling and disabling the
preemption timer in L1.

Regards,
Wanpeng Li
Wanpeng Li July 8, 2016, 2:08 p.m. UTC | #7
2016-07-08 21:58 GMT+08:00 Wanpeng Li <kernellwp@gmail.com>:
> 2016-07-08 18:18 GMT+08:00 Paolo Bonzini <pbonzini@redhat.com>:
>>
>>
>> On 08/07/2016 02:38, Wanpeng Li wrote:
>>> 2016-07-07 22:11 GMT+08:00 Paolo Bonzini <pbonzini@redhat.com>:
>>>>
>>>>
>>>> On 07/07/2016 15:23, Wanpeng Li wrote:
>>>>>>>>>
>>>>>>>>>               if (kvm_lapic_hv_timer_in_use(vcpu) &&
>>>>>>>>> +                     (is_guest_mode(vcpu) ||
>>>>>>>>>                               kvm_x86_ops->set_hv_timer(vcpu,
>>>>>>>>> -                                     kvm_get_lapic_tscdeadline_msr(vcpu)))
>>>>>>>>> +                                     kvm_get_lapic_tscdeadline_msr(vcpu))))
>>>>>>>>>                       kvm_lapic_switch_to_sw_timer(vcpu);
>>>>>>>>>               if (check_tsc_unstable()) {
>>>>>>>>>                       u64 offset = kvm_compute_tsc_offset(vcpu,
>>>>>>>>>
>>>>>>>
>>>>>>> Thanks, this is good as a fallback.  I'll try to fix it by getting the
>>>>>>> pin-based execution controls right but if I fail this patch is okay.
>>>>> I believe we still need this patch even if you eventually implement "L1
>>>>> TSC deadline timer to trigger while L2 is running". The code you posted
>>>>> before:
>>>>>
>>>>>   exec_control = vmcs12->pin_based_vm_exec_control;
>>>>> +exec_control &= ~PIN_BASED_VMX_PREEMPTION_TIMER;
>>>>>   exec_control |= vmcs_config.pin_based_exec_ctrl;
>>>>> - exec_control &= ~PIN_BASED_VMX_PREEMPTION_TIMER;
>>>>> + if (vmx->hv_deadline_tsc == -1)
>>>>> +     exec_control &= ~PIN_BASED_VMX_PREEMPTION_TIMER;
>>>>>
>>>>> So there is still a case where the preemption timer bit of vmcs02 is not
>>>>> set; however, the scenario I mentioned above in kvm_arch_vcpu_load()
>>>>> will set it unnecessarily.
>>>>
>>>> kvm_x86_ops->set_hv_timer _will_ set the preemption timer bit of vmcs02
>>>> if vmcs02 is the loaded one.
>>>>
>>>> This can happen if L2 has access to L1's local APIC registers (i.e. L1
>>>> passes the local APIC instead of emulating it, as is the case in a
>>>> partitioning hypervisor).  While L2 runs, it writes to the TSC deadline
>>>> MSR of L1.  This causes a call to kvm_x86_ops->set_hv_timer while the
>>>> active VMCS is a vmcs02.
>>>
>>> Yes, in the scenario you pointed out, the call to
>>> kvm_x86_ops->set_hv_timer while the active VMCS is vmcs02 is correct;
>>> however, in the scenario I mentioned in the patch description it is not
>>> correct, even with "L1 TSC deadline timer to trigger while L2 is
>>> running" enabled.
>>
>> It doesn't help that you have not explained how to reproduce the
>> bug---this is what the cover letter and commit messages are for, too.
>
> I believe you pointed out this before:
>
> | The patch looks correct, but I'm not sure how you get a preemption
> | timer vmexit while vmcs02 is active:
> |
> | exec_control = vmcs12->pin_based_vm_exec_control;
> | exec_control |= vmcs_config.pin_based_exec_ctrl;
> | exec_control &= ~PIN_BASED_VMX_PREEMPTION_TIMER;
>
> We can't get a preemption timer vmexit while vmcs02 is loaded, since the
> preemption timer bit in vmcs02 is not set; so how can we get the
> incorrect preemption timer vmexit in the nested guest which patch 1
> fixed? Because the scenario I mentioned in patch 2 sets the bit in vmcs02.
>
> w/o patch 1 + w/o enabling "L1 TSC deadline timer to trigger while L2 is
> running" => we get the bug calltrace mentioned in patch 1, since the
> incorrect vmcs02 bit is set due to the bug mentioned in patch 2. So
> applying patch 2 fixes it.
>
> However, after enabling "L1 TSC deadline timer to trigger while L2 is
> running", we still need patch 1 to correctly capture the nested vmexit
> for the preemption timer.
>
> My setup: L0 and L1 both run the latest kvm queue branch w/ Yunhong's
> preemption timer enablement patches and my previous "fix missed
> cancellation of TSC deadline timer" patches. I always enable the
> preemption timer in L0, but try both enabling and disabling the
> preemption timer in L1.

Btw, my L1 is a full dynticks guest so that hrtimers in L1 are
heavily used.

Regards,
Wanpeng Li
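
(A full dynticks L1 means CONFIG_NO_HZ_FULL with most vCPUs marked
tickless -- for instance, a hypothetical L1 command line such as the
following, so that the periodic tick is stopped and the TSC deadline
timer/hrtimer path is exercised heavily:)

    nohz_full=1-3 rcu_nocbs=1-3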
Paolo Bonzini July 8, 2016, 3:47 p.m. UTC | #8
On 08/07/2016 16:08, Wanpeng Li wrote:
> 2016-07-08 21:58 GMT+08:00 Wanpeng Li <kernellwp@gmail.com>:
>> We can't get a preemption timer vmexit while vmcs02 is loaded, since the
>> preemption timer bit in vmcs02 is not set; so how can we get the
>> incorrect preemption timer vmexit in the nested guest which patch 1
>> fixed? Because the scenario I mentioned in patch 2 sets the bit in vmcs02.
>>
>> w/o patch 1 + w/o enabling "L1 TSC deadline timer to trigger while L2 is
>> running" => we get the bug calltrace mentioned in patch 1, since the
>> incorrect vmcs02 bit is set due to the bug mentioned in patch 2. So
>> applying patch 2 fixes it.
>>
>> However, after enabling "L1 TSC deadline timer to trigger while L2 is
>> running", we still need patch 1 to correctly capture the nested vmexit
>> for the preemption timer.
>>
>> My setup: L0 and L1 both run the latest kvm queue branch w/ Yunhong's
>> preemption timer enablement patches and my previous "fix missed
>> cancellation of TSC deadline timer" patches. I always enable the
>> preemption timer in L0, but try both enabling and disabling the
>> preemption timer in L1.
> 
> Btw, my L1 is a full dynticks guest so that hrtimers in L1 are
> heavily used.

Thanks---I'd be grateful if you tested my patch series.

Paolo
Wanpeng Li July 8, 2016, 10:57 p.m. UTC | #9
2016-07-08 23:47 GMT+08:00 Paolo Bonzini <pbonzini@redhat.com>:
>
>
> On 08/07/2016 16:08, Wanpeng Li wrote:
>> 2016-07-08 21:58 GMT+08:00 Wanpeng Li <kernellwp@gmail.com>:
>>> We can't get a preemption timer vmexit while vmcs02 is loaded, since the
>>> preemption timer bit in vmcs02 is not set; so how can we get the
>>> incorrect preemption timer vmexit in the nested guest which patch 1
>>> fixed? Because the scenario I mentioned in patch 2 sets the bit in vmcs02.
>>>
>>> w/o patch 1 + w/o enabling "L1 TSC deadline timer to trigger while L2 is
>>> running" => we get the bug calltrace mentioned in patch 1, since the
>>> incorrect vmcs02 bit is set due to the bug mentioned in patch 2. So
>>> applying patch 2 fixes it.
>>>
>>> However, after enabling "L1 TSC deadline timer to trigger while L2 is
>>> running", we still need patch 1 to correctly capture the nested vmexit
>>> for the preemption timer.
>>>
>>> My setup: L0 and L1 both run the latest kvm queue branch w/ Yunhong's
>>> preemption timer enablement patches and my previous "fix missed
>>> cancellation of TSC deadline timer" patches. I always enable the
>>> preemption timer in L0, but try both enabling and disabling the
>>> preemption timer in L1.
>>
>> Btw, my L1 is a full dynticks guest so that hrtimers in L1 are
>> heavily used.
>
> Thanks---I'd be grateful if you tested my patch series.

I will do it and also reconfirm my patch 2 next Monday.

Regards,
Wanpeng Li

Patch

diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 0cc6cf8..05137c0 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -2743,8 +2743,9 @@  void kvm_arch_vcpu_load(struct kvm_vcpu *vcpu, int cpu)
 			mark_tsc_unstable("KVM discovered backwards TSC");
 
 		if (kvm_lapic_hv_timer_in_use(vcpu) &&
+			(is_guest_mode(vcpu) ||
 				kvm_x86_ops->set_hv_timer(vcpu,
-					kvm_get_lapic_tscdeadline_msr(vcpu)))
+					kvm_get_lapic_tscdeadline_msr(vcpu))))
 			kvm_lapic_switch_to_sw_timer(vcpu);
 		if (check_tsc_unstable()) {
 			u64 offset = kvm_compute_tsc_offset(vcpu,
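
Read together with the surrounding code, the new condition behaves as
follows (an annotated sketch; the comments are explanatory and not part
of the patch):

    if (kvm_lapic_hv_timer_in_use(vcpu) &&
            /*
             * On the way to vmresume the nested guest, the loaded VMCS
             * is (or is about to be) a vmcs02: don't reprogram the
             * hardware preemption timer, unconditionally fall back to
             * the hrtimer-based sw timer instead.
             */
            (is_guest_mode(vcpu) ||
                    /*
                     * Otherwise reprogram the preemption timer for the
                     * new pCPU, and fall back to the sw timer only if
                     * that fails.
                     */
                    kvm_x86_ops->set_hv_timer(vcpu,
                            kvm_get_lapic_tscdeadline_msr(vcpu))))
            kvm_lapic_switch_to_sw_timer(vcpu);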