Message ID | 20211221090449.15337-1-kechenl@nvidia.com (mailing list archive)
---|---
Series | KVM: x86: add per-vCPU exits disable capability
On Tue, Dec 21, 2021 at 01:04:46AM -0800, Kechen Lu wrote:
> Summary
> ===========
> Introduce support for a vCPU-scoped ioctl with the KVM_CAP_X86_DISABLE_EXITS
> cap, enabling finer-grained VM exit disabling on a per-vCPU scale instead of
> for the whole guest. This patch series enables the vCPU-scoped exit control
> for HLT VM-exits.
>
> Motivation
> ============
> In use cases such as a Windows guest running heavy CPU-bound workloads,
> disabling HLT VM-exits can mitigate host scheduler context switch overhead.
> Simply disabling HLT exits on all vCPUs can bring performance benefits, but
> if no pCPUs are reserved for host threads, forced preemption can occur
> because the host does not know when to schedule other host threads that
> want to run. With this patch series, HLT exits can be disabled on only a
> subset of a guest's vCPUs; this retains the performance benefits while also
> showing resiliency to a host stressing workload running at the same time.
>
> Performance and Testing
> =========================
> In the host stressing workload experiment with a Windows guest running
> heavy CPU-bound workloads, the patch shows good resiliency and a ~3%
> performance improvement. E.g., Passmark running in a Windows guest with
> this patch disabling HLT exits on only half of the vCPUs still shows a
> 2.4% higher main score vs. the baseline.
>
> Tested everything on AMD machines.
>
>
> v1->v2 (Sean Christopherson):
> - Add an explicit restriction that VM-scoped exit disabling must be done
>   before vCPU creation (patch 1)
> - Use a vCPU ioctl instead of a 64-bit vCPU bitmask (patch 3), and make
>   the exit-disable flag checks purely per-vCPU instead of per-VM (patch 2)

This is still quite blunt and assumes a ton of configuration on the host
exactly matching the workload within the guest, which seems a waste since
guests actually have the smarts to know what's happening within them.

If you are going to allow the guest to halt a vCPU, how about working on
exposing mwait to the guest cleanly instead?
The idea is to expose this in ACPI - Linux guests ignore ACPI and go by
CPUID, but Windows guests follow ACPI. Linux can be patched ;)

What we would have is a mirror of the host ACPI states, such that lower
power states invoke HLT and exit, while higher power states invoke mwait
and wait within the guest.

The nice thing with this approach is that it's already supported by the
host kernel, so it's just a question of coding up the ACPI.

>
> Best Regards,
> Kechen
>
> Kechen Lu (3):
>   KVM: x86: only allow exits disable before vCPUs created
>   KVM: x86: move ()_in_guest checking to vCPU scope
>   KVM: x86: add vCPU ioctl for HLT exits disable capability
>
>  Documentation/virt/kvm/api.rst     |  4 +++-
>  arch/x86/include/asm/kvm-x86-ops.h |  1 +
>  arch/x86/include/asm/kvm_host.h    |  7 +++++++
>  arch/x86/kvm/cpuid.c               |  2 +-
>  arch/x86/kvm/lapic.c               |  2 +-
>  arch/x86/kvm/svm/svm.c             | 20 +++++++++++++++-----
>  arch/x86/kvm/vmx/vmx.c             | 26 ++++++++++++++++++--------
>  arch/x86/kvm/x86.c                 | 24 +++++++++++++++++++++++-
>  arch/x86/kvm/x86.h                 | 16 ++++++++--------
>  9 files changed, 77 insertions(+), 25 deletions(-)
>
> --
> 2.30.2
Hi Michael,

> -----Original Message-----
> From: Michael S. Tsirkin <mst@redhat.com>
> Sent: Monday, January 10, 2022 1:18 PM
> To: Kechen Lu <kechenl@nvidia.com>
> Cc: kvm@vger.kernel.org; pbonzini@redhat.com; seanjc@google.com;
> wanpengli@tencent.com; vkuznets@redhat.com; Somdutta Roy
> <somduttar@nvidia.com>; linux-kernel@vger.kernel.org
> Subject: Re: [RFC PATCH v2 0/3] KVM: x86: add per-vCPU exits disable
> capability
>
> External email: Use caution opening links or attachments
>
>
> On Tue, Dec 21, 2021 at 01:04:46AM -0800, Kechen Lu wrote:
> > Summary
> > ===========
> > Introduce support for a vCPU-scoped ioctl with the
> > KVM_CAP_X86_DISABLE_EXITS cap, enabling finer-grained VM exit disabling
> > on a per-vCPU scale instead of for the whole guest. This patch series
> > enables the vCPU-scoped exit control for HLT VM-exits.
> >
> > Motivation
> > ============
> > In use cases such as a Windows guest running heavy CPU-bound workloads,
> > disabling HLT VM-exits can mitigate host scheduler context switch
> > overhead. Simply disabling HLT exits on all vCPUs can bring performance
> > benefits, but if no pCPUs are reserved for host threads, forced
> > preemption can occur because the host does not know when to schedule
> > other host threads that want to run. With this patch series, HLT exits
> > can be disabled on only a subset of a guest's vCPUs; this retains the
> > performance benefits while also showing resiliency to a host stressing
> > workload running at the same time.
> >
> > Performance and Testing
> > =========================
> > In the host stressing workload experiment with a Windows guest running
> > heavy CPU-bound workloads, the patch shows good resiliency and a ~3%
> > performance improvement. E.g., Passmark running in a Windows guest with
> > this patch disabling HLT exits on only half of the vCPUs still shows a
> > 2.4% higher main score vs. the baseline.
> >
> > Tested everything on AMD machines.
> >
> >
> > v1->v2 (Sean Christopherson):
> > - Add an explicit restriction that VM-scoped exit disabling must be done
> >   before vCPU creation (patch 1)
> > - Use a vCPU ioctl instead of a 64-bit vCPU bitmask (patch 3), and make
> >   the exit-disable flag checks purely per-vCPU instead of per-VM (patch 2)
>
> This is still quite blunt and assumes a ton of configuration on the host
> exactly matching the workload within the guest, which seems a waste since
> guests actually have the smarts to know what's happening within them.
>

For now we use a fixed configuration on the host for our guests; it still
gives promising performance benefits for most workloads in our use case.
But yeah, it's not adaptive or flexible with respect to the workloads in
the guest.

> If you are going to allow the guest to halt a vCPU, how about working on
> exposing mwait to the guest cleanly instead?
>
> The idea is to expose this in ACPI - Linux guests ignore ACPI and go by
> CPUID, but Windows guests follow ACPI. Linux can be patched ;)
>
> What we would have is a mirror of the host ACPI states, such that lower
> power states invoke HLT and exit, while higher power states invoke mwait
> and wait within the guest.
>
> The nice thing with this approach is that it's already supported by the
> host kernel, so it's just a question of coding up the ACPI.
>

This idea looks really interesting! If, through ACPI configuration, we
could have longer idle periods (deeper power states) invoke HLT and exit,
and shorter idle periods (shallower power states) use mwait within the
guest, that would indeed be a more adaptive and cleaner approach. But
especially for Windows guests, the idle process execution and idle/sleep
state switching logic does not seem well documented; we need to figure out
the impact of the change on the idle process and OS power-management
behaviors.

Many thanks for this suggestion. I will explore it a bit and post updates.

Thanks!
Best Regards,
Kechen

> >
> >
> > Best Regards,
> > Kechen
> >
> > Kechen Lu (3):
> >   KVM: x86: only allow exits disable before vCPUs created
> >   KVM: x86: move ()_in_guest checking to vCPU scope
> >   KVM: x86: add vCPU ioctl for HLT exits disable capability
> >
> >  Documentation/virt/kvm/api.rst     |  4 +++-
> >  arch/x86/include/asm/kvm-x86-ops.h |  1 +
> >  arch/x86/include/asm/kvm_host.h    |  7 +++++++
> >  arch/x86/kvm/cpuid.c               |  2 +-
> >  arch/x86/kvm/lapic.c               |  2 +-
> >  arch/x86/kvm/svm/svm.c             | 20 +++++++++++++++-----
> >  arch/x86/kvm/vmx/vmx.c             | 26 ++++++++++++++++++--------
> >  arch/x86/kvm/x86.c                 | 24 +++++++++++++++++++++++-
> >  arch/x86/kvm/x86.h                 | 16 ++++++++--------
> >  9 files changed, 77 insertions(+), 25 deletions(-)
> >
> > --
> > 2.30.2