
[2/3] KVM: X86: implement the logic for spinlock optimization

Message ID 1502095466-21312-3-git-send-email-longpeng2@huawei.com (mailing list archive)
State New, archived

Commit Message

Longpeng(Mike) Aug. 7, 2017, 8:44 a.m. UTC
Implement kvm_arch_vcpu_spin/preempt_in_kernel(). Because get_cpl()
requires vcpu_load(), we must cache the result (whether the vcpu was
preempted while its CPL was 0) in struct kvm_vcpu_arch.

Signed-off-by: Longpeng(Mike) <longpeng2@huawei.com>
---
 arch/x86/include/asm/kvm_host.h |  5 +++++
 arch/x86/kvm/svm.c              |  6 ++++++
 arch/x86/kvm/vmx.c              | 20 ++++++++++++++++++++
 arch/x86/kvm/x86.c              |  9 +++++++--
 4 files changed, 38 insertions(+), 2 deletions(-)

Comments

Paolo Bonzini Aug. 7, 2017, 10:45 a.m. UTC | #1
On 07/08/2017 10:44, Longpeng(Mike) wrote:
> +
> +	/*
> +	 * Intel sdm vol3 ch-25.1.3 says: The “PAUSE-loop exiting”
> +	 * VM-execution control is ignored if CPL > 0. So the vcpu
> +	 * is always exiting with CPL=0 if it uses PLE.

This is not true (how can it be?).  What 25.1.3 says is, the VCPU is
always at CPL=0 if you get a PAUSE exit (reason 40) and PAUSE exiting is
0 (it always is for KVM).  But here you're looking for a VCPU that
didn't get a PAUSE exit, so the CPL can certainly be 3.

However, I understand that vmx_get_cpl can be a bit slow here.  You can
actually read SS's access rights directly in this function and get the
DPL from there, that's going to be just a single VMREAD.
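
Something along these lines would do (an illustrative sketch only; it
mirrors the protected-mode path of vmx_get_cpl(), reading
GUEST_SS_AR_BYTES and taking the DPL from bits 6:5 of the access-rights
word):

static bool vmx_spin_in_kernel(struct kvm_vcpu *vcpu)
{
	/* SS.DPL lives in bits 6:5 of the segment access rights. */
	u32 ss_ar = vmcs_read32(GUEST_SS_AR_BYTES);

	return ((ss_ar >> 5) & 3) == 0;	/* DPL 0 => kernel mode */
}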

The only difference is when vmx->rmode.vm86_active=1.  However,
pause-loop exiting is not working properly anyway if
vmx->rmode.vm86_active=1, because CPL=3 according to the processor.

Paolo

> +	 * The following block needs less cycles than vmx_get_cpl().
> +	 */
> +	if (cpu_has_secondary_exec_ctrls())
> +		secondary_exec_ctrl = vmcs_read32(SECONDARY_VM_EXEC_CONTROL);
> +	if (secondary_exec_ctrl & SECONDARY_EXEC_PAUSE_LOOP_EXITING)
> +		return true;
> +

Paolo
long mike Aug. 7, 2017, 12:28 p.m. UTC | #2
On 08/07/2017 06:45 PM, Paolo Bonzini wrote:
> On 07/08/2017 10:44, Longpeng(Mike) wrote:
>> +
>> +	/*
>> +	 * Intel sdm vol3 ch-25.1.3 says: The “PAUSE-loop exiting”
>> +	 * VM-execution control is ignored if CPL > 0. So the vcpu
>> +	 * is always exiting with CPL=0 if it uses PLE.
> 
> This is not true (how can it be?).  What 25.1.3 says is, the VCPU is
> always at CPL=0 if you get a PAUSE exit (reason 40) and PAUSE exiting is
> 0 (it always is for KVM).  But here you're looking for a VCPU that
> didn't get a PAUSE exit, so the CPL can certainly be 3.
> 

Hi Paolo,

My comment above is wrong (please forgive my poor English); what I
originally meant was:
	The “PAUSE-loop exiting” VM-execution control is ignored if
	CPL > 0. So the vcpu's CPL must be 0 if it exits due to PLE.

* kvm_arch_spin_in_kernel() returns whether the vcpu (which exits due to
a spinlock) is at CPL=0. It is only called by kvm_vcpu_on_spin(), and the
input vcpu is 'me', which has just taken a PAUSE exit. *

I split kvm_arch_vcpu_in_kernel() (from the RFC) into two functions:
kvm_arch_spin_in_kernel() and kvm_arch_preempt_in_kernel()


Because KVM/VMX never sets CPU_BASED_PAUSE_EXITING for L1 and only sets
SECONDARY_EXEC_PAUSE_LOOP_EXITING when it is supported, for L1 we:
1. get a PAUSE exit with CPL=0 if PLE is supported
2. never get a PAUSE exit if PLE is not supported

So I think it can directly return true (CPL=0) if PLE is supported.

But for nested KVM/VMX (I'm not familiar with nested), the nested
hypervisor could set CPU_BASED_PAUSE_EXITING, so I think get_cpl() is
also needed.


If the above is correct, what about this approach (it saves a vmcs_read
operation for L1):

kvm_arch_vcpu_spin_in_kernel(vcpu)
{
	if (!is_guest_mode(vcpu))
		return true;

	return vmx_get_cpl(vcpu) == 0;
}

kvm_vcpu_on_spin()
{
	/* @me has just taken a PAUSE exit */
	me_in_kernel = kvm_arch_vcpu_spin_in_kernel(me);
	...
	for each vcpu {
		...
		if (me_in_kernel && !...preempt_in_kernel(vcpu))
			continue;
		...
	}
	...
}

---
Regards,
Longpeng(Mike)

> However, I understand that vmx_get_cpl can be a bit slow here.  You can
> actually read SS's access rights directly in this function and get the
> DPL from there, that's going to be just a single VMREAD.
> 
> The only difference is when vmx->rmode.vm86_active=1.  However,
> pause-loop exiting is not working properly anyway if
> vmx->rmode.vm86_active=1, because CPL=3 according to the processor.
> 
> Paolo
> 
>> +	 * The following block needs less cycles than vmx_get_cpl().
>> +	 */
>> +	if (cpu_has_secondary_exec_ctrls())
>> +		secondary_exec_ctrl = vmcs_read32(SECONDARY_VM_EXEC_CONTROL);
>> +	if (secondary_exec_ctrl & SECONDARY_EXEC_PAUSE_LOOP_EXITING)
>> +		return true;
>> +
> 
> Paolo
>
Paolo Bonzini Aug. 7, 2017, 1:16 p.m. UTC | #3
On 07/08/2017 14:28, Longpeng(Mike) wrote:
> * kvm_arch_spin_in_kernel() returns whether the vcpu (which exits due to
> a spinlock) is at CPL=0. It is only called by kvm_vcpu_on_spin(), and the
> input vcpu is 'me', which has just taken a PAUSE exit. *
> 
> I split kvm_arch_vcpu_in_kernel() (from the RFC) into two functions:
> kvm_arch_spin_in_kernel() and kvm_arch_preempt_in_kernel()
> 
> Because KVM/VMX never sets CPU_BASED_PAUSE_EXITING for L1 and only sets
> SECONDARY_EXEC_PAUSE_LOOP_EXITING when it is supported, for L1 we:

I understand better now.  I think vmx.c should just return true from
vmx_spin_in_kernel.  However, kvm_arch_vcpu_spin_in_kernel is not
necessary.  Instead you should make "in_kern" an argument to
kvm_vcpu_on_spin (maybe renamed to "yield_to_kernel_mode_vcpu").

Then vmx.c can just call "kvm_vcpu_on_spin(vcpu, true)".
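
Roughly like this (a sketch only; the names here are just illustrative):

void kvm_vcpu_on_spin(struct kvm_vcpu *me, bool yield_to_kernel_mode)
{
	...
	kvm_for_each_vcpu(i, vcpu, kvm) {
		...
		/* only skip user-mode vcpus when the yielder was in kernel mode */
		if (yield_to_kernel_mode &&
		    !kvm_arch_vcpu_preempt_in_kernel(vcpu))
			continue;
		...
	}
	...
}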

> 1. get a PAUSE exit with CPL=0 if PLE is supported
> 2. never get a PAUSE exit if PLE is not supported
> 
> So I think it can directly return true (CPL=0) if PLE is supported.
> 
> But for nested KVM/VMX (I'm not familiar with nested), the nested
> hypervisor could set CPU_BASED_PAUSE_EXITING, so I think get_cpl() is
> also needed.

If the nested hypervisor sets CPU_BASED_PAUSE_EXITING, a PAUSE vmexit
while running a nested guest would be reflected to the nested
hypervisor.  So you wouldn't get to handle_pause and thus to
kvm_vcpu_on_spin.
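
(For reference, the check in nested_vmx_exit_handled() is roughly:

	case EXIT_REASON_PAUSE_INSTRUCTION:
		return nested_cpu_has(vmcs12, CPU_BASED_PAUSE_EXITING) ||
			nested_cpu_has2(vmcs12,
				SECONDARY_EXEC_PAUSE_LOOP_EXITING);

so with either control set by the nested hypervisor, the exit is
forwarded to it and never reaches L0's handle_pause().)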

Thanks,

Paolo

Patch

diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
index 87ac4fb..d2b2d57 100644
--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -688,6 +688,9 @@  struct kvm_vcpu_arch {
 
 	/* GPA available (AMD only) */
 	bool gpa_available;
+
+	/* be preempted when it's in kernel-mode(cpl=0) */
+	bool preempted_in_kernel;
 };
 
 struct kvm_lpage_info {
@@ -1057,6 +1060,8 @@  struct kvm_x86_ops {
 	void (*cancel_hv_timer)(struct kvm_vcpu *vcpu);
 
 	void (*setup_mce)(struct kvm_vcpu *vcpu);
+
+	bool (*spin_in_kernel)(struct kvm_vcpu *vcpu);
 };
 
 struct kvm_arch_async_pf {
diff --git a/arch/x86/kvm/svm.c b/arch/x86/kvm/svm.c
index 4d8141e..552ab4c 100644
--- a/arch/x86/kvm/svm.c
+++ b/arch/x86/kvm/svm.c
@@ -5352,6 +5352,11 @@  static void svm_setup_mce(struct kvm_vcpu *vcpu)
 	vcpu->arch.mcg_cap &= 0x1ff;
 }
 
+static bool svm_spin_in_kernel(struct kvm_vcpu *vcpu)
+{
+	return svm_get_cpl(vcpu) == 0;
+}
+
 static struct kvm_x86_ops svm_x86_ops __ro_after_init = {
 	.cpu_has_kvm_support = has_svm,
 	.disabled_by_bios = is_disabled,
@@ -5464,6 +5469,7 @@  static void svm_setup_mce(struct kvm_vcpu *vcpu)
 	.deliver_posted_interrupt = svm_deliver_avic_intr,
 	.update_pi_irte = svm_update_pi_irte,
 	.setup_mce = svm_setup_mce,
+	.spin_in_kernel = svm_spin_in_kernel,
 };
 
 static int __init svm_init(void)
diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
index 39a6222..d0dfe2e 100644
--- a/arch/x86/kvm/vmx.c
+++ b/arch/x86/kvm/vmx.c
@@ -11547,6 +11547,25 @@  static void vmx_setup_mce(struct kvm_vcpu *vcpu)
 			~FEATURE_CONTROL_LMCE;
 }
 
+static bool vmx_spin_in_kernel(struct kvm_vcpu *vcpu)
+{
+	u32 secondary_exec_ctrl = 0;
+
+	/*
+	 * Intel sdm vol3 ch-25.1.3 says: The “PAUSE-loop exiting”
+	 * VM-execution control is ignored if CPL > 0. So the vcpu
+	 * is always exiting with CPL=0 if it uses PLE.
+	 *
+	 * The following block needs less cycles than vmx_get_cpl().
+	 */
+	if (cpu_has_secondary_exec_ctrls())
+		secondary_exec_ctrl = vmcs_read32(SECONDARY_VM_EXEC_CONTROL);
+	if (secondary_exec_ctrl & SECONDARY_EXEC_PAUSE_LOOP_EXITING)
+		return true;
+
+	return vmx_get_cpl(vcpu) == 0;
+}
+
 static struct kvm_x86_ops vmx_x86_ops __ro_after_init = {
 	.cpu_has_kvm_support = cpu_has_kvm_support,
 	.disabled_by_bios = vmx_disabled_by_bios,
@@ -11674,6 +11693,7 @@  static void vmx_setup_mce(struct kvm_vcpu *vcpu)
 #endif
 
 	.setup_mce = vmx_setup_mce,
+	.spin_in_kernel = vmx_spin_in_kernel,
 };
 
 static int __init vmx_init(void)
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 04c6a1f..fa79a60 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -2881,6 +2881,10 @@  static void kvm_steal_time_set_preempted(struct kvm_vcpu *vcpu)
 void kvm_arch_vcpu_put(struct kvm_vcpu *vcpu)
 {
 	int idx;
+
+	if (vcpu->preempted)
+		vcpu->arch.preempted_in_kernel = !kvm_x86_ops->get_cpl(vcpu);
+
 	/*
 	 * Disable page faults because we're in atomic context here.
 	 * kvm_write_guest_offset_cached() would call might_fault()
@@ -7988,6 +7992,7 @@  int kvm_arch_vcpu_init(struct kvm_vcpu *vcpu)
 	kvm_pmu_init(vcpu);
 
 	vcpu->arch.pending_external_vector = -1;
+	vcpu->arch.preempted_in_kernel = false;
 
 	kvm_hv_vcpu_init(vcpu);
 
@@ -8437,12 +8442,12 @@  int kvm_arch_vcpu_runnable(struct kvm_vcpu *vcpu)
 
 bool kvm_arch_vcpu_spin_in_kernel(struct kvm_vcpu *vcpu)
 {
-	return false;
+	return kvm_x86_ops->spin_in_kernel(vcpu);
 }
 
 bool kvm_arch_vcpu_preempt_in_kernel(struct kvm_vcpu *vcpu)
 {
-	return false;
+	return vcpu->arch.preempted_in_kernel;
 }
 
 int kvm_arch_vcpu_should_kick(struct kvm_vcpu *vcpu)