Message ID | 1648800605-18074-4-git-send-email-wanpengli@tencent.com (mailing list archive) |
---|---|
State | New, archived |
Series | KVM: X86: Scaling Guest OS Critical Sections with boosting |
+tglx and PeterZ

On Fri, Apr 01, 2022, Wanpeng Li wrote:
> From: Wanpeng Li <wanpengli@tencent.com>
>
> The missing semantic gap that occurs when a guest OS is preempted
> when executing its own critical section, this leads to degradation
> of application scalability. We try to bridge this semantic gap in
> some ways, by passing guest preempt_count to the host and checking
> guest irq disable state, the hypervisor now knows whether guest
> OSes are running in the critical section, the hypervisor yield-on-spin
> heuristics can be more smart this time to boost the vCPU candidate
> who is in the critical section to mitigate this preemption problem,
> in addition, it is more likely to be a potential lock holder.
>
> Testing on 96 HT 2 socket Xeon CLX server, with 96 vCPUs VM 100GB RAM,
> one VM running benchmark, the other(none-2) VMs running cpu-bound
> workloads, There is no performance regression for other benchmarks
> like Unixbench etc.

...

> Signed-off-by: Wanpeng Li <wanpengli@tencent.com>
> ---
>  arch/x86/kvm/x86.c       | 22 ++++++++++++++++++++++
>  include/linux/kvm_host.h |  1 +
>  virt/kvm/kvm_main.c      |  7 +++++++
>  3 files changed, 30 insertions(+)
>
> diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
> index 9aa05f79b743..b613cd2b822a 100644
> --- a/arch/x86/kvm/x86.c
> +++ b/arch/x86/kvm/x86.c
> @@ -10377,6 +10377,28 @@ static int vcpu_run(struct kvm_vcpu *vcpu)
>  	return r;
>  }
>
> +static bool kvm_vcpu_is_preemptible(struct kvm_vcpu *vcpu)
> +{
> +	int count;
> +
> +	if (!vcpu->arch.pv_pc.preempt_count_enabled)
> +		return false;
> +
> +	if (!kvm_read_guest_cached(vcpu->kvm, &vcpu->arch.pv_pc.preempt_count_cache,
> +				   &count, sizeof(int)))
> +		return !(count & ~PREEMPT_NEED_RESCHED);

As I pointed out in v1[*], this makes PREEMPT_NEED_RESCHED and really the entire
__preempt_count to some extent, KVM guest/host ABI. That needs acks from sched
folks, and if they're ok with it, needs to be formalized somewhere in kvm_para.h,
not buried in the KVM host code.

[*] https://lore.kernel.org/all/YkOfJeXm8MiMOEyh@google.com

> +
> +	return false;
> +}
> +
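The masking in the snippet Sean quotes is exactly where the coupling lies: the host-side check only gives the right answer if the guest lays out its per-CPU `__preempt_count` the way current x86 kernels happen to. Roughly, that layout looks like this (a simplified sketch of arch/x86/include/asm/preempt.h, not a verbatim copy):

```c
/*
 * Simplified sketch of the x86 encoding the host-side check relies on
 * (see arch/x86/include/asm/preempt.h for the real thing).  The top bit
 * of the per-CPU __preempt_count caches an *inverted* need-resched flag;
 * only the remaining bits hold the actual preemption nesting count.
 */
DECLARE_PER_CPU(int, __preempt_count);

#define PREEMPT_NEED_RESCHED	0x80000000

static __always_inline int preempt_count(void)
{
	/* Strip the need-resched bit to recover the real nesting depth. */
	return raw_cpu_read_4(__preempt_count) & ~PREEMPT_NEED_RESCHED;
}
```

Any future rework of that encoding (and, as Peter notes below, it has changed repeatedly) would silently change what the host reads out of the guest, which is why Sean asks for the contract to be spelled out in kvm_para.h if it is to exist at all.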
On Wed, Apr 13, 2022 at 09:43:03PM +0000, Sean Christopherson wrote:
> +tglx and PeterZ
>
> On Fri, Apr 01, 2022, Wanpeng Li wrote:
> > From: Wanpeng Li <wanpengli@tencent.com>
> >
> > The missing semantic gap that occurs when a guest OS is preempted
> > when executing its own critical section, this leads to degradation
> > of application scalability. We try to bridge this semantic gap in
> > some ways, by passing guest preempt_count to the host and checking
> > guest irq disable state, the hypervisor now knows whether guest
> > OSes are running in the critical section, the hypervisor yield-on-spin
> > heuristics can be more smart this time to boost the vCPU candidate
> > who is in the critical section to mitigate this preemption problem,
> > in addition, it is more likely to be a potential lock holder.
> >
> > Testing on 96 HT 2 socket Xeon CLX server, with 96 vCPUs VM 100GB RAM,
> > one VM running benchmark, the other(none-2) VMs running cpu-bound
> > workloads, There is no performance regression for other benchmarks
> > like Unixbench etc.
> > ...
> > Signed-off-by: Wanpeng Li <wanpengli@tencent.com>
> > ---
> >  arch/x86/kvm/x86.c       | 22 ++++++++++++++++++++++
> >  include/linux/kvm_host.h |  1 +
> >  virt/kvm/kvm_main.c      |  7 +++++++
> >  3 files changed, 30 insertions(+)
> >
> > diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
> > index 9aa05f79b743..b613cd2b822a 100644
> > --- a/arch/x86/kvm/x86.c
> > +++ b/arch/x86/kvm/x86.c
> > @@ -10377,6 +10377,28 @@ static int vcpu_run(struct kvm_vcpu *vcpu)
> >  	return r;
> >  }
> >
> > +static bool kvm_vcpu_is_preemptible(struct kvm_vcpu *vcpu)
> > +{
> > +	int count;
> > +
> > +	if (!vcpu->arch.pv_pc.preempt_count_enabled)
> > +		return false;
> > +
> > +	if (!kvm_read_guest_cached(vcpu->kvm, &vcpu->arch.pv_pc.preempt_count_cache,
> > +				   &count, sizeof(int)))
> > +		return !(count & ~PREEMPT_NEED_RESCHED);
>
> As I pointed out in v1[*], this makes PREEMPT_NEED_RESCHED and really the entire
> __preempt_count to some extent, KVM guest/host ABI. That needs acks from sched
> folks, and if they're ok with it, needs to be formalized somewhere in kvm_para.h,
> not buried in the KVM host code.

Right, not going to happen. There's been plenty changes to __preempt_count
over the past years, suggesting that making it ABI will be an incredibly
bad idea.

It also only solves part of the problem; namely spinlocks, but doesn't
help at all with mutexes, which can be equally short lived, as evidenced
by the adaptive spinning mutex code etc..

Also, I'm not sure I fully understand the problem, doesn't the paravirt
spinlock code give sufficient clues?
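For reference, the "clues" Peter is pointing at are the paravirt queued-spinlock hooks the guest already registers. Very roughly (an abridged sketch of the hooks in arch/x86/kernel/kvm.c, with the IRQ and NMI handling details omitted), they look like this:

```c
/*
 * Abridged sketch of the guest-side PV spinlock hooks (arch/x86/kernel/kvm.c).
 * A lock waiter halts, so the host can see it is blocked rather than
 * spinning; the unlocker then names the exact vCPU to wake via a hypercall.
 * Both events are already scheduling hints available to the host.
 */
static void kvm_wait(u8 *ptr, u8 val)
{
	/* Waiter: sleep in HLT until the lock holder kicks us. */
	if (READ_ONCE(*ptr) == val)
		safe_halt();
}

static void kvm_kick_cpu(int cpu)
{
	/* Unlocker: tell the host exactly which vCPU should run next. */
	kvm_hypercall2(KVM_HC_KICK_CPU, 0, per_cpu(x86_cpu_to_apicid, cpu));
}
```

With these hooks, waiters park themselves and lock holders hand the host an explicit wake-up target, which may be what Peter has in mind when he asks whether those clues already suffice.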
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 9aa05f79b743..b613cd2b822a 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -10377,6 +10377,28 @@ static int vcpu_run(struct kvm_vcpu *vcpu)
 	return r;
 }
 
+static bool kvm_vcpu_is_preemptible(struct kvm_vcpu *vcpu)
+{
+	int count;
+
+	if (!vcpu->arch.pv_pc.preempt_count_enabled)
+		return false;
+
+	if (!kvm_read_guest_cached(vcpu->kvm, &vcpu->arch.pv_pc.preempt_count_cache,
+				   &count, sizeof(int)))
+		return !(count & ~PREEMPT_NEED_RESCHED);
+
+	return false;
+}
+
+bool kvm_arch_boost_candidate(struct kvm_vcpu *vcpu)
+{
+	if (vcpu->arch.last_guest_irq_disabled || !kvm_vcpu_is_preemptible(vcpu))
+		return true;
+
+	return false;
+}
+
 static inline int complete_emulated_io(struct kvm_vcpu *vcpu)
 {
 	int r;
diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h
index 9536ffa0473b..28d9e99284f1 100644
--- a/include/linux/kvm_host.h
+++ b/include/linux/kvm_host.h
@@ -1420,6 +1420,7 @@ bool kvm_arch_dy_has_pending_interrupt(struct kvm_vcpu *vcpu);
 int kvm_arch_post_init_vm(struct kvm *kvm);
 void kvm_arch_pre_destroy_vm(struct kvm *kvm);
 int kvm_arch_create_vm_debugfs(struct kvm *kvm);
+bool kvm_arch_boost_candidate(struct kvm_vcpu *vcpu);
 
 #ifndef __KVM_HAVE_ARCH_VM_ALLOC
 /*
diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
index 69c318fdff61..018a87af01a1 100644
--- a/virt/kvm/kvm_main.c
+++ b/virt/kvm/kvm_main.c
@@ -3544,6 +3544,11 @@ bool __weak kvm_arch_dy_has_pending_interrupt(struct kvm_vcpu *vcpu)
 	return false;
 }
 
+bool __weak kvm_arch_boost_candidate(struct kvm_vcpu *vcpu)
+{
+	return true;
+}
+
 void kvm_vcpu_on_spin(struct kvm_vcpu *me, bool yield_to_kernel_mode)
 {
 	struct kvm *kvm = me->kvm;
@@ -3579,6 +3584,8 @@ void kvm_vcpu_on_spin(struct kvm_vcpu *me, bool yield_to_kernel_mode)
 		    !kvm_arch_dy_has_pending_interrupt(vcpu) &&
 		    !kvm_arch_vcpu_in_kernel(vcpu))
 			continue;
+		if (!kvm_arch_boost_candidate(vcpu))
+			continue;
 		if (!kvm_vcpu_eligible_for_directed_yield(vcpu))
 			continue;
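If something along the lines Sean suggests were pursued, the guest/host contract would have to live in the shared paravirt header rather than piggy-back on the kernel-internal __preempt_count layout. Purely as a hypothetical illustration (none of these names exist in kvm_para.h), that could look like:

```c
/*
 * HYPOTHETICAL sketch only -- nothing like this exists in kvm_para.h today.
 * The point is that the layout the host parses would be fixed by the
 * paravirt ABI itself, so the guest fills it in from its own
 * __preempt_count rather than exposing that internal value verbatim.
 */
#define KVM_PV_PC_COUNT_MASK	0x0000ffffU	/* preempt_disable() nesting depth */
#define KVM_PV_PC_IRQS_DISABLED	0x00010000U	/* guest is in an irqs-off region  */

/* Host side: only ABI-defined bits decide whether a vCPU is a poor yield target. */
static bool pv_pc_preemptible(u32 pc)
{
	return !(pc & (KVM_PV_PC_COUNT_MASK | KVM_PV_PC_IRQS_DISABLED));
}
```

Whether such an interface is worth having at all is, of course, exactly what Peter questions above.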