Message ID | 20211221090449.15337-1-kechenl@nvidia.com (mailing list archive)
---|---
Series | KVM: x86: add per-vCPU exits disable capability
On Tue, Dec 21, 2021 at 01:04:46AM -0800, Kechen Lu wrote:
> Summary
> ===========
> Introduce support for a vCPU-scoped ioctl with the KVM_CAP_X86_DISABLE_EXITS
> cap, enabling finer-grained VM exit disabling on a per-vCPU scale instead of
> for the whole guest. This patch series enables the vCPU-scoped exit control
> for HLT VM-exits.
>
> Motivation
> ============
> In use cases such as a Windows guest running heavy CPU-bound workloads,
> disabling HLT VM-exits can mitigate host scheduler context switch overhead.
> Simply disabling HLT exits on all vCPUs can bring performance benefits, but
> if no pCPUs are reserved for host threads, forced preemption can occur
> because the host does not know when to schedule other host threads that
> want to run. With this patch series, HLT exits can be disabled on only a
> subset of a guest's vCPUs; this retains the performance benefits while also
> showing resiliency to a host stressing workload running at the same time.
>
> Performance and Testing
> =========================
> In the host stressing workload experiment with a Windows guest running
> heavy CPU-bound workloads, the patch shows good resiliency and a ~3%
> performance improvement. E.g., Passmark running in a Windows guest with
> this patch disabling HLT exits on only half of the vCPUs still shows a
> 2.4% higher main score vs. the baseline.
>
> Tested everything on AMD machines.
>
>
> v1->v2 (Sean Christopherson):
> - Add an explicit restriction that VM-scoped exit disabling must be done
>   before vCPU creation (patch 1)
> - Use a vCPU ioctl instead of a 64-bit vCPU bitmask (patch 3), and make
>   the exit-disable flag checks purely per-vCPU instead of per-VM (patch 2)

This is still quite blunt and assumes a ton of configuration on the host
exactly matching the workload within the guest, which seems a waste since
guests actually have the smarts to know what's happening within them.

If you are going to allow the guest to halt a vCPU, how about working on
exposing mwait to the guest cleanly instead?
The idea is to expose this in ACPI - Linux guests ignore ACPI and go by
CPUID, but Windows guests follow ACPI. Linux can be patched ;)

What we would have is a mirror of the host ACPI states, such that lower
power states invoke HLT and exit, while higher power states invoke mwait
and wait within the guest.

The nice thing with this approach is that it's already supported by the
host kernel, so it's just a question of coding up the ACPI.

>
> Best Regards,
> Kechen
>
> Kechen Lu (3):
>   KVM: x86: only allow exits disable before vCPUs created
>   KVM: x86: move ()_in_guest checking to vCPU scope
>   KVM: x86: add vCPU ioctl for HLT exits disable capability
>
>  Documentation/virt/kvm/api.rst     |  4 +++-
>  arch/x86/include/asm/kvm-x86-ops.h |  1 +
>  arch/x86/include/asm/kvm_host.h    |  7 +++++++
>  arch/x86/kvm/cpuid.c               |  2 +-
>  arch/x86/kvm/lapic.c               |  2 +-
>  arch/x86/kvm/svm/svm.c             | 20 +++++++++++++++-----
>  arch/x86/kvm/vmx/vmx.c             | 26 ++++++++++++++++++--------
>  arch/x86/kvm/x86.c                 | 24 +++++++++++++++++++++++-
>  arch/x86/kvm/x86.h                 | 16 ++++++++--------
>  9 files changed, 77 insertions(+), 25 deletions(-)
>
> --
> 2.30.2
Hi Michael,

> -----Original Message-----
> From: Michael S. Tsirkin <mst@redhat.com>
> Sent: Monday, January 10, 2022 1:18 PM
> To: Kechen Lu <kechenl@nvidia.com>
> Cc: kvm@vger.kernel.org; pbonzini@redhat.com; seanjc@google.com;
> wanpengli@tencent.com; vkuznets@redhat.com; Somdutta Roy
> <somduttar@nvidia.com>; linux-kernel@vger.kernel.org
> Subject: Re: [RFC PATCH v2 0/3] KVM: x86: add per-vCPU exits disable
> capability
>
> External email: Use caution opening links or attachments
>
>
> On Tue, Dec 21, 2021 at 01:04:46AM -0800, Kechen Lu wrote:
> > Summary
> > ===========
> > Introduce support for a vCPU-scoped ioctl with the
> > KVM_CAP_X86_DISABLE_EXITS cap, enabling finer-grained VM exit disabling
> > on a per-vCPU scale instead of for the whole guest. This patch series
> > enables the vCPU-scoped exit control for HLT VM-exits.
> >
> > Motivation
> > ============
> > In use cases such as a Windows guest running heavy CPU-bound workloads,
> > disabling HLT VM-exits can mitigate host scheduler context switch
> > overhead. Simply disabling HLT exits on all vCPUs can bring performance
> > benefits, but if no pCPUs are reserved for host threads, forced
> > preemption can occur because the host does not know when to schedule
> > other host threads that want to run. With this patch series, HLT exits
> > can be disabled on only a subset of a guest's vCPUs; this retains the
> > performance benefits while also showing resiliency to a host stressing
> > workload running at the same time.
> >
> > Performance and Testing
> > =========================
> > In the host stressing workload experiment with a Windows guest running
> > heavy CPU-bound workloads, the patch shows good resiliency and a ~3%
> > performance improvement. E.g., Passmark running in a Windows guest with
> > this patch disabling HLT exits on only half of the vCPUs still shows a
> > 2.4% higher main score vs. the baseline.
> >
> > Tested everything on AMD machines.
> >
> >
> > v1->v2 (Sean Christopherson):
> > - Add an explicit restriction that VM-scoped exit disabling must be done
> >   before vCPU creation (patch 1)
> > - Use a vCPU ioctl instead of a 64-bit vCPU bitmask (patch 3), and make
> >   the exit-disable flag checks purely per-vCPU instead of per-VM (patch 2)
>
> This is still quite blunt and assumes a ton of configuration on the host
> exactly matching the workload within the guest, which seems a waste since
> guests actually have the smarts to know what's happening within them.
>

For now we use a fixed configuration on the host for our guests; it still
gives promising performance benefits for most workloads in our use case.
But yeah, it's not adaptive or flexible with respect to the workloads in
the guest.

> If you are going to allow the guest to halt a vCPU, how about working on
> exposing mwait to the guest cleanly instead?
>
> The idea is to expose this in ACPI - Linux guests ignore ACPI and go by
> CPUID, but Windows guests follow ACPI. Linux can be patched ;)
>
> What we would have is a mirror of the host ACPI states, such that lower
> power states invoke HLT and exit, while higher power states invoke mwait
> and wait within the guest.
>
> The nice thing with this approach is that it's already supported by the
> host kernel, so it's just a question of coding up the ACPI.
>

This idea looks really interesting! If, through ACPI configuration, we
could have longer idle periods (deeper power states) invoke HLT and exit,
and shorter idle periods (shallower power states) use mwait within the
guest, that would indeed be a more adaptive and cleaner approach. But
especially for Windows guests, the idle process execution and idle/sleep
state switching logic does not seem well documented; we need to figure out
the impact of the change on the idle process and OS power-management
behaviors.

Many thanks for this suggestion. I will explore it a bit and post updates.

Thanks!
Best Regards,
Kechen

> >
> >
> > Best Regards,
> > Kechen
> >
> > Kechen Lu (3):
> >   KVM: x86: only allow exits disable before vCPUs created
> >   KVM: x86: move ()_in_guest checking to vCPU scope
> >   KVM: x86: add vCPU ioctl for HLT exits disable capability
> >
> >  Documentation/virt/kvm/api.rst     |  4 +++-
> >  arch/x86/include/asm/kvm-x86-ops.h |  1 +
> >  arch/x86/include/asm/kvm_host.h    |  7 +++++++
> >  arch/x86/kvm/cpuid.c               |  2 +-
> >  arch/x86/kvm/lapic.c               |  2 +-
> >  arch/x86/kvm/svm/svm.c             | 20 +++++++++++++++-----
> >  arch/x86/kvm/vmx/vmx.c             | 26 ++++++++++++++++++--------
> >  arch/x86/kvm/x86.c                 | 24 +++++++++++++++++++++++-
> >  arch/x86/kvm/x86.h                 | 16 ++++++++--------
> >  9 files changed, 77 insertions(+), 25 deletions(-)
> >
> > --
> > 2.30.2