[00/14] KVM: x86/mmu: Dirty logging fixes and improvements

Message ID 20210213005015.1651772-1-seanjc@google.com (mailing list archive)

Message

Sean Christopherson Feb. 13, 2021, 12:50 a.m. UTC
Paolo, this is more or less ready, but on final read-through before
sending I realized it would be a good idea to WARN during VM destruction
if cpu_dirty_logging_count is non-zero.  I wanted to get you this before
the 5.12 window opens in case you want the TDP MMU fixes for 5.12.  I'll
do the above change and retest next week (note, Monday is a US holiday).

On to the code...

This started out as a small tweak to collapsible SPTE zapping in the TDP
MMU, and ended up as a rather large overhaul of CPU dirty logging, a.k.a.
PML.

Four main highlights:

  - Do a more precise check on whether or not a SPTE should be zapped to
    rebuild it as a large page.
  - Disable PML when running L2.  PML is fully emulated for L1 VMMs, thus
    enabling PML in L2 can only hurt and never help.
  - Drop the existing PML kvm_x86_ops.  They're basically trampolines into
    the MMU, and IMO do far more harm than good.
  - Turn on PML only when it's needed instead of setting all dirty bits to
    soft disable PML.

What led me down the rabbit hole of ripping out the existing PML
kvm_x86_ops isn't really shown here.  Prior to incorporating Makarand's
patch, which allowed for the wholesale removal of setting dirty bits,
I spent a bunch of time poking around the "set dirty bits" code.  My
original changes optimized that path to skip setting dirty bits in the
nested MMU, since the nested MMU relies on write-protection and not PML.
That in turn allowed the TDP MMU zapping to completely skip walking the
rmaps, but doing so based on a bunch of callbacks was a twisted mess.

Happily, those patches got dropped in favor of nuking the code entirely.

Ran selftests and unit tests, and migrated actual VMs on AMD and Intel,
with and without the TDP MMU, and with and without EPT.  The AMD system
I'm testing on loops infinitely on the reset vector due to a #PF when
NPT is disabled, so that configuration didn't get tested.  That
reproduces with kvm/next; I'll dig into it next week (no idea if it's a
KVM or hardware issue).

For actual migration, I ran kvm-unit-tests in L1 along with stress to
hammer memory, and verified migration was effectively blocked until the
stress threads were killed (I didn't feel like figuring out how to
throttle the VM).

Makarand Sonare (1):
  KVM: VMX: Dynamically enable/disable PML based on memslot dirty
    logging

Sean Christopherson (13):
  KVM: x86/mmu: Expand collapsible SPTE zap for TDP MMU to ZONE_DEVICE
    pages
  KVM: x86/mmu: Don't unnecessarily write-protect small pages in TDP MMU
  KVM: x86/mmu: Split out max mapping level calculation to helper
  KVM: x86/mmu: Pass the memslot to the rmap callbacks
  KVM: x86/mmu: Consult max mapping level when zapping collapsible SPTEs
  KVM: nVMX: Disable PML in hardware when running L2
  KVM: x86/mmu: Expand on the comment in
    kvm_vcpu_ad_need_write_protect()
  KVM: x86/mmu: Make dirty log size hook (PML) a value, not a function
  KVM: x86: Move MMU's PML logic to common code
  KVM: x86: Further clarify the logic and comments for toggling log
    dirty
  KVM: x86/mmu: Don't set dirty bits when disabling dirty logging w/ PML
  KVM: x86: Fold "write-protect large" use case into generic
    write-protect
  KVM: x86/mmu: Remove a variety of unnecessary exports

 arch/x86/include/asm/kvm-x86-ops.h |   6 +-
 arch/x86/include/asm/kvm_host.h    |  36 +----
 arch/x86/kvm/mmu/mmu.c             | 203 +++++++++--------------------
 arch/x86/kvm/mmu/mmu_internal.h    |   7 +-
 arch/x86/kvm/mmu/tdp_mmu.c         |  66 +---------
 arch/x86/kvm/mmu/tdp_mmu.h         |   3 +-
 arch/x86/kvm/vmx/nested.c          |  34 +++--
 arch/x86/kvm/vmx/vmx.c             |  94 +++++--------
 arch/x86/kvm/vmx/vmx.h             |   2 +
 arch/x86/kvm/x86.c                 | 145 +++++++++++++--------
 10 files changed, 230 insertions(+), 366 deletions(-)

Comments

Sean Christopherson Feb. 17, 2021, 10:50 p.m. UTC | #1
On Fri, Feb 12, 2021, Sean Christopherson wrote:
> Paolo, this is more or less ready, but on final read-through before
> sending I realized it would be a good idea to WARN during VM destruction
> if cpu_dirty_logging_count is non-zero.  I wanted to get you this before
> the 5.12 window opens in case you want the TDP MMU fixes for 5.12.  I'll
> do the above change and retest next week (note, Monday is a US holiday).

Verified cpu_dirty_logging_count does indeed hit zero during VM destruction.

Adding a WARN to KVM would require adding an arch hook to kvm_free_memslots(),
otherwise the count will be non-zero if the VM is destroyed with dirty logging
active.  That doesn't seem worthwhile, so I'm not planning on pursuing a WARN
for the upstream code.
Paolo Bonzini Feb. 18, 2021, 12:57 p.m. UTC | #2
On 13/02/21 01:50, Sean Christopherson wrote:
> Paolo, this is more or less ready, but on final read-through before
> sending I realized it would be a good idea to WARN during VM destruction
> if cpu_dirty_logging_count is non-zero.  I wanted to get you this before
> the 5.12 window opens in case you want the TDP MMU fixes for 5.12.  I'll
> do the above change and retest next week (note, Monday is a US holiday).
> [...]

Queued 1-9 and 14 until you clarify my doubt about patch 10, thanks.

Paolo