diff mbox

KVM: VMX: switch to hrtimer for TSC deadline timer when L2 guest is running

Message ID 20160706051051.7869-1-haozhong.zhang@intel.com (mailing list archive)
State New, archived
Headers show

Commit Message

Haozhong Zhang July 6, 2016, 5:10 a.m. UTC
A different VMCS is loaded when L2 guest is running, so it's incorrect
to use the VMX preemption timer for L1 TSC deadline timer. This patch
switches to hrtimer for L1 TSC deadline timer when entering L2 guest,
and switches back to VMX preemption timer when nested VMEXIT from L2 to
L1.

Signed-off-by: Haozhong Zhang <haozhong.zhang@intel.com>
---
 arch/x86/kvm/vmx.c | 5 +++++
 1 file changed, 5 insertions(+)

Comments

Wanpeng Li July 6, 2016, 5:37 a.m. UTC | #1
Cc Jan,
2016-07-06 13:10 GMT+08:00 Haozhong Zhang <haozhong.zhang@intel.com>:
> A different VMCS is loaded when L2 guest is running, so it's incorrect
> to use the VMX preemption timer for L1 TSC deadline timer. This patch
> switches to hrtimer for L1 TSC deadline timer when entering L2 guest,
> and switches back to VMX preemption timer when nested VMEXIT from L2 to
> L1.

Nested preemption timer is emulated by hrtimer, so it doesn't
influence vmcs02, why this is needed?

Regards,
Wanpeng Li
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Paolo Bonzini July 6, 2016, 6:04 a.m. UTC | #2
On 06/07/2016 07:37, Wanpeng Li wrote:
> Cc Jan,
> 2016-07-06 13:10 GMT+08:00 Haozhong Zhang <haozhong.zhang@intel.com>:
>> A different VMCS is loaded when L2 guest is running, so it's incorrect
>> to use the VMX preemption timer for L1 TSC deadline timer. This patch
>> switches to hrtimer for L1 TSC deadline timer when entering L2 guest,
>> and switches back to VMX preemption timer when nested VMEXIT from L2 to
>> L1.
> 
> Nested preemption timer is emulated by hrtimer, so it doesn't
> influence vmcs02, why this is needed?

I agree (and nested APIC timer is emulated by L1).  Because the L1 APIC
timer is set in the preemption timer before vmentry, this should not be
necessary.

Paolo
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Haozhong Zhang July 6, 2016, 6:05 a.m. UTC | #3
On 07/06/16 13:37, Wanpeng Li wrote:
> Cc Jan,
> 2016-07-06 13:10 GMT+08:00 Haozhong Zhang <haozhong.zhang@intel.com>:
> > A different VMCS is loaded when L2 guest is running, so it's incorrect
> > to use the VMX preemption timer for L1 TSC deadline timer. This patch
> > switches to hrtimer for L1 TSC deadline timer when entering L2 guest,
> > and switches back to VMX preemption timer when nested VMEXIT from L2 to
> > L1.
> 
> Nested preemption timer is emulated by hrtimer, so it doesn't
> influence vmcs02, why this is needed?

Nested (L2) preemption timer is not affected for sure, and this patch
is to fix another problem caused by using L1 preemption timer for L1
TSC deadline timer. When we use L1 VMX preemption timer for L1 TSC
deadline timer, we intend to use the pin-based exec config and the VMX
preemption timer value in vmcs01. However, when L2 guest is running,
vmcs02 is loaded as the current VMCS so that a different VMX
preemption timer config is used (i.e. VMX preemption timer is disabled
in prepare_vmcs02()). If we still use preemption timer for L1 TSC
deadline timer at this moment, then L1 TSC deadline timer will not be
able to be triggered when L2 guest is running.

Thanks,
Haozhong
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Paolo Bonzini July 6, 2016, 7:46 a.m. UTC | #4
On 06/07/2016 08:05, Haozhong Zhang wrote:
>> > 
>> > Nested preemption timer is emulated by hrtimer, so it doesn't
>> > influence vmcs02, why this is needed?
> Nested (L2) preemption timer is not affected for sure, and this patch
> is to fix another problem caused by using L1 preemption timer for L1
> TSC deadline timer. When we use L1 VMX preemption timer for L1 TSC
> deadline timer, we intend to use the pin-based exec config and the VMX
> preemption timer value in vmcs01. However, when L2 guest is running,
> vmcs02 is loaded as the current VMCS so that a different VMX
> preemption timer config is used (i.e. VMX preemption timer is disabled
> in prepare_vmcs02()). If we still use preemption timer for L1 TSC
> deadline timer at this moment, then L1 TSC deadline timer will not be
> able to be triggered when L2 guest is running.

The right fix then is to set the pin-based controls in prepare_vmcs02,
based on vcpu->arch.hv_deadline_tsc.

Paolo
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Haozhong Zhang July 6, 2016, 8:01 a.m. UTC | #5
On 07/06/16 09:46, Paolo Bonzini wrote:
> 
> 
> On 06/07/2016 08:05, Haozhong Zhang wrote:
> >> > 
> >> > Nested preemption timer is emulated by hrtimer, so it doesn't
> >> > influence vmcs02, why this is needed?
> > Nested (L2) preemption timer is not affected for sure, and this patch
> > is to fix another problem caused by using L1 preemption timer for L1
> > TSC deadline timer. When we use L1 VMX preemption timer for L1 TSC
> > deadline timer, we intend to use the pin-based exec config and the VMX
> > preemption timer value in vmcs01. However, when L2 guest is running,
> > vmcs02 is loaded as the current VMCS so that a different VMX
> > preemption timer config is used (i.e. VMX preemption timer is disabled
> > in prepare_vmcs02()). If we still use preemption timer for L1 TSC
> > deadline timer at this moment, then L1 TSC deadline timer will not be
> > able to be triggered when L2 guest is running.
> 
> The right fix then is to set the pin-based controls in prepare_vmcs02,
> based on vcpu->arch.hv_deadline_tsc.

Then KVM needs to distinguish L1 and L2 VMEXITs for preemption timer
(or does KVM already have this?). The current implementation which
uses a separate hrtimer for L2 preemption timer can easily distinguish
them.

Haozhong
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Haozhong Zhang July 6, 2016, 8:09 a.m. UTC | #6
On 07/06/16 16:01, Haozhong Zhang wrote:
> On 07/06/16 09:46, Paolo Bonzini wrote:
> > 
> > 
> > On 06/07/2016 08:05, Haozhong Zhang wrote:
> > >> > 
> > >> > Nested preemption timer is emulated by hrtimer, so it doesn't
> > >> > influence vmcs02, why this is needed?
> > > Nested (L2) preemption timer is not affected for sure, and this patch
> > > is to fix another problem caused by using L1 preemption timer for L1
> > > TSC deadline timer. When we use L1 VMX preemption timer for L1 TSC
> > > deadline timer, we intend to use the pin-based exec config and the VMX
> > > preemption timer value in vmcs01. However, when L2 guest is running,
> > > vmcs02 is loaded as the current VMCS so that a different VMX
> > > preemption timer config is used (i.e. VMX preemption timer is disabled
> > > in prepare_vmcs02()). If we still use preemption timer for L1 TSC
> > > deadline timer at this moment, then L1 TSC deadline timer will not be
> > > able to be triggered when L2 guest is running.
> > 
> > The right fix then is to set the pin-based controls in prepare_vmcs02,
> > based on vcpu->arch.hv_deadline_tsc.
> 
> Then KVM needs to distinguish L1 and L2 VMEXITs for preemption timer
> (or does KVM already have this?). The current implementation which
> uses a separate hrtimer for L2 preemption timer can easily distinguish
> them.
> 

And L1 hypervisor may also want to use preemption timer for L2
guest. In this case, L0 KVM can not reuse preemption timer for both L1
and L2.

Haozhong
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Paolo Bonzini July 6, 2016, 8:14 a.m. UTC | #7
> On 07/06/16 16:01, Haozhong Zhang wrote:
> > On 07/06/16 09:46, Paolo Bonzini wrote:
> > > On 06/07/2016 08:05, Haozhong Zhang wrote:
> > > >> > Nested preemption timer is emulated by hrtimer, so it doesn't
> > > >> > influence vmcs02, why this is needed?
> > > > Nested (L2) preemption timer is not affected for sure, and this patch
> > > > is to fix another problem caused by using L1 preemption timer for L1
> > > > TSC deadline timer. When we use L1 VMX preemption timer for L1 TSC
> > > > deadline timer, we intend to use the pin-based exec config and the VMX
> > > > preemption timer value in vmcs01. However, when L2 guest is running,
> > > > vmcs02 is loaded as the current VMCS so that a different VMX
> > > > preemption timer config is used (i.e. VMX preemption timer is disabled
> > > > in prepare_vmcs02()). If we still use preemption timer for L1 TSC
> > > > deadline timer at this moment, then L1 TSC deadline timer will not be
> > > > able to be triggered when L2 guest is running.
> > > 
> > > The right fix then is to set the pin-based controls in prepare_vmcs02,
> > > based on vcpu->arch.hv_deadline_tsc.
> > 
> > Then KVM needs to distinguish L1 and L2 VMEXITs for preemption timer
> > (or does KVM already have this?). The current implementation which
> > uses a separate hrtimer for L2 preemption timer can easily distinguish
> > them.
> 
> And L1 hypervisor may also want to use preemption timer for L2
> guest. In this case, L0 KVM can not reuse preemption timer for both L1
> and L2.

The vmcs12's preemption timer is emulated by KVM through a separate hrtimer,
so KVM would only use the vmcs02 preemption timer for the L1 APIC timer.

Paolo
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Haozhong Zhang July 6, 2016, 8:21 a.m. UTC | #8
On 07/06/16 04:14, Paolo Bonzini wrote:
> > On 07/06/16 16:01, Haozhong Zhang wrote:
> > > On 07/06/16 09:46, Paolo Bonzini wrote:
> > > > On 06/07/2016 08:05, Haozhong Zhang wrote:
> > > > >> > Nested preemption timer is emulated by hrtimer, so it doesn't
> > > > >> > influence vmcs02, why this is needed?
> > > > > Nested (L2) preemption timer is not affected for sure, and this patch
> > > > > is to fix another problem caused by using L1 preemption timer for L1
> > > > > TSC deadline timer. When we use L1 VMX preemption timer for L1 TSC
> > > > > deadline timer, we intend to use the pin-based exec config and the VMX
> > > > > preemption timer value in vmcs01. However, when L2 guest is running,
> > > > > vmcs02 is loaded as the current VMCS so that a different VMX
> > > > > preemption timer config is used (i.e. VMX preemption timer is disabled
> > > > > in prepare_vmcs02()). If we still use preemption timer for L1 TSC
> > > > > deadline timer at this moment, then L1 TSC deadline timer will not be
> > > > > able to be triggered when L2 guest is running.
> > > > 
> > > > The right fix then is to set the pin-based controls in prepare_vmcs02,
> > > > based on vcpu->arch.hv_deadline_tsc.
> > > 
> > > Then KVM needs to distinguish L1 and L2 VMEXITs for preemption timer
> > > (or does KVM already have this?). The current implementation which
> > > uses a separate hrtimer for L2 preemption timer can easily distinguish
> > > them.
> > 
> > And L1 hypervisor may also want to use preemption timer for L2
> > guest. In this case, L0 KVM can not reuse preemption timer for both L1
> > and L2.
> 
> The vmcs12's preemption timer is emulated by KVM through a separate hrtimer,
> so KVM would only use the vmcs02 preemption timer for the L1 APIC timer.

Ah, I got what you mean. Please drop this patch, and I'll send another
one later.

Thanks,
Haozhong
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
diff mbox

Patch

diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
index 85e2f0a..cc29c2a 100644
--- a/arch/x86/kvm/vmx.c
+++ b/arch/x86/kvm/vmx.c
@@ -10203,6 +10203,9 @@  static int nested_vmx_run(struct kvm_vcpu *vcpu, bool launch)
 	if (!vmcs02)
 		return -ENOMEM;
 
+	if (kvm_lapic_hv_timer_in_use(vcpu))
+		kvm_lapic_switch_to_sw_timer(vcpu);
+
 	enter_guest_mode(vcpu);
 
 	vmx->nested.vmcs01_tsc_offset = vmcs_read64(TSC_OFFSET);
@@ -10227,6 +10230,7 @@  static int nested_vmx_run(struct kvm_vcpu *vcpu, bool launch)
 	if (msr_entry_idx) {
 		leave_guest_mode(vcpu);
 		vmx_load_vmcs01(vcpu);
+		kvm_lapic_switch_to_hv_timer(vcpu);
 		nested_vmx_entry_failure(vcpu, vmcs12,
 				EXIT_REASON_MSR_LOAD_FAIL, msr_entry_idx);
 		return 1;
@@ -10700,6 +10704,7 @@  static void nested_vmx_vmexit(struct kvm_vcpu *vcpu, u32 exit_reason,
 		nested_vmx_abort(vcpu, VMX_ABORT_SAVE_GUEST_MSR_FAIL);
 
 	vmx_load_vmcs01(vcpu);
+	kvm_lapic_switch_to_hv_timer(vcpu);
 
 	if ((exit_reason == EXIT_REASON_EXTERNAL_INTERRUPT)
 	    && nested_exit_intr_ack_set(vcpu)) {