mbox series

[0/5] Introduce a quirk to control memslot zap behavior

Message ID 20240613060708.11761-1-yan.y.zhao@intel.com (mailing list archive)
Headers show
Series Introduce a quirk to control memslot zap behavior | expand

Message

Yan Zhao June 13, 2024, 6:06 a.m. UTC
Today "zapping only memslot leaf SPTEs" on moving/deleting a memslot is not
done. Instead, KVM opts to invalidate all page tables and generate fresh
new ones based on the new memslot layout (referred to as "zap all" for
short). This "zap all" behavior is of low overhead for most use cases, and
is adopted primarily due to a bug which caused VM instability when a VM is
with Nvidia Geforce GPU assigned (see link in patch 1).

However, the "zap all" behavior is not desired for certain specific
scenarios. e.g.
- It's not viable for TDX,
  a) TDX requires root page of private page table remains unaltered
     throughout the TD life cycle.
  b) TDX mandates that leaf entries in private page table must be zapped
     prior to non-leaf entries.
  c) TDX requires re-accepting of private pages after page dropping.
- It's not performant for scenarios involving frequent deletion and
  re-adding of numerous small memslots.

This series therefore introduces the KVM_X86_QUIRK_SLOT_ZAP_ALL quirk,
enabling users to control the behavior of memslot zapping when a memslot is
moved/deleted.

The quirk is turned on by default, leading to invalidation/zapping to all
SPTEs when a memslot is moved/deleted.

Users have the option to turn off the quirk. Doing so will limit the
zapping to only leaf SPTEs within the range of memslot being moved/deleted.

This series has been tested with
- Normal VMs
  w/ and w/o device assignment, and kvm selftests

- TDX guests.
  Memslot deletion typically does not occur without device assignment for a
  TD. Therefore, it is tested with shared device assignment.

Note: For TDX integration, the quirk is currently disabled via TDX code in
      QEMU rather than being automatically disabled based on VM type in
      KVM, which is not safe. A malfunctioning QEMU that fails to disable
      the quirk could result in the shared EPT being invalidated while the
      private EPT remains unaffected, as kvm_mmu_zap_all_fast() only
      targets the shared EPT.      

      However, current kvm->arch.disabled_quirks is entirely
      user-controlled, and there is no mechanism for users to verify if a
      quirk has been disabled by the kernel.
      We are therefore wondering which below options are better for TDX:

      a) Add a condition for TDX VM type in kvm_arch_flush_shadow_memslot()
         besides the testing of kvm_check_has_quirk(). It is similar to
         "all new VM types have the quirk disabled". e.g.

         static inline bool kvm_memslot_flush_zap_all(struct kvm *kvm)                    
         {                                                                                
              return kvm->arch.vm_type != KVM_X86_TDX_VM &&                               
                     kvm_check_has_quirk(kvm, KVM_X86_QUIRK_SLOT_ZAP_ALL);                
         }
         
      b) Init the disabled_quirks based on VM type in kernel, extend
         disabled_quirk querying/setting interface to enforce the quirk to
         be disabled for TDX.

Patch 1:   KVM changes.
Patch 2-5: Selftests updates. Verify memslot move/deletion functionality
           with the quirk enabled/disabled.


Yan Zhao (5):
  KVM: x86/mmu: Introduce a quirk to control memslot zap behavior
  KVM: selftests: Test slot move/delete with slot zap quirk
    enabled/disabled
  KVM: selftests: Allow slot modification stress test with quirk
    disabled
  KVM: selftests: Test memslot move in memslot_perf_test with quirk
    disabled
  KVM: selftests: Test private access to deleted memslot with quirk
    disabled

 Documentation/virt/kvm/api.rst                |  6 ++++
 arch/x86/include/asm/kvm_host.h               |  3 +-
 arch/x86/include/uapi/asm/kvm.h               |  1 +
 arch/x86/kvm/mmu/mmu.c                        | 36 ++++++++++++++++++-
 .../kvm/memslot_modification_stress_test.c    | 19 ++++++++--
 .../testing/selftests/kvm/memslot_perf_test.c | 12 ++++++-
 .../selftests/kvm/set_memory_region_test.c    | 29 ++++++++++-----
 .../kvm/x86_64/private_mem_kvm_exits_test.c   | 11 ++++--
 8 files changed, 102 insertions(+), 15 deletions(-)

base-commit: dd5a440a31fae6e459c0d6271dddd62825505361

Comments

Rick Edgecombe June 13, 2024, 8:01 p.m. UTC | #1
On Thu, 2024-06-13 at 14:06 +0800, Yan Zhao wrote:
>       a) Add a condition for TDX VM type in kvm_arch_flush_shadow_memslot()
>          besides the testing of kvm_check_has_quirk(). It is similar to
>          "all new VM types have the quirk disabled". e.g.
> 
>          static inline bool kvm_memslot_flush_zap_all(struct kvm
> *kvm)                    
>         
> {                                                                             
>    
>               return kvm->arch.vm_type != KVM_X86_TDX_VM
> &&                               
>                      kvm_check_has_quirk(kvm,
> KVM_X86_QUIRK_SLOT_ZAP_ALL);                
>          }
>          
>       b) Init the disabled_quirks based on VM type in kernel, extend
>          disabled_quirk querying/setting interface to enforce the quirk to
>          be disabled for TDX.

I'd prefer to go with option (a) here. Because we don't have any behavior
defined yet for KVM_X86_TDX_VM, we don't really need to "disable a quirk" of it.

Instead we could just define KVM_X86_QUIRK_SLOT_ZAP_ALL to be about the behavior
of the existing vm_types. It would be a few lines of documentation to save
implementing and maintaining a whole interface with special logic for TDX. So to
me it doesn't seem worth it, unless there is some other user for a new more
complex quirk interface.
Yan Zhao June 17, 2024, 10:15 a.m. UTC | #2
On Fri, Jun 14, 2024 at 04:01:07AM +0800, Edgecombe, Rick P wrote:
> On Thu, 2024-06-13 at 14:06 +0800, Yan Zhao wrote:
> >       a) Add a condition for TDX VM type in kvm_arch_flush_shadow_memslot()
> >          besides the testing of kvm_check_has_quirk(). It is similar to
> >          "all new VM types have the quirk disabled". e.g.
> > 
> >          static inline bool kvm_memslot_flush_zap_all(struct kvm
> > *kvm)                    
> >         
> > {                                                                             
> >    
> >               return kvm->arch.vm_type != KVM_X86_TDX_VM
> > &&                               
> >                      kvm_check_has_quirk(kvm,
> > KVM_X86_QUIRK_SLOT_ZAP_ALL);                
> >          }
> >          
> >       b) Init the disabled_quirks based on VM type in kernel, extend
> >          disabled_quirk querying/setting interface to enforce the quirk to
> >          be disabled for TDX.
> 
> I'd prefer to go with option (a) here. Because we don't have any behavior
> defined yet for KVM_X86_TDX_VM, we don't really need to "disable a quirk" of it.
> 
> Instead we could just define KVM_X86_QUIRK_SLOT_ZAP_ALL to be about the behavior
> of the existing vm_types. It would be a few lines of documentation to save
> implementing and maintaining a whole interface with special logic for TDX. So to
> me it doesn't seem worth it, unless there is some other user for a new more
> complex quirk interface.
What about introducing a forced disabled_quirk field?

diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
index 8152b5259435..32859952fa75 100644
--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -1425,6 +1425,7 @@ struct kvm_arch {
        u32 bsp_vcpu_id;

        u64 disabled_quirks;
+       u64 force_disabled_quirks;

        enum kvm_irqchip_mode irqchip_mode;
        u8 nr_reserved_ioapic_pins;
diff --git a/arch/x86/kvm/vmx/tdx.c b/arch/x86/kvm/vmx/tdx.c
index e4bb1db45476..629a95cbe568 100644
--- a/arch/x86/kvm/vmx/tdx.c
+++ b/arch/x86/kvm/vmx/tdx.c
@@ -565,6 +565,7 @@ int tdx_vm_init(struct kvm *kvm)
 {
        kvm->arch.has_private_mem = true;

+       kvm->arch.force_disabled_quirks |= KVM_X86_QUIRK_SLOT_ZAP_ALL;
        /*
         * Because guest TD is protected, VMM can't parse the instruction in TD.
         * Instead, guest uses MMIO hypercall.  For unmodified device driver,
diff --git a/arch/x86/kvm/x86.h b/arch/x86/kvm/x86.h
index 229497fd3266..53ef06a06517 100644
--- a/arch/x86/kvm/x86.h
+++ b/arch/x86/kvm/x86.h
@@ -291,7 +291,8 @@ static inline void kvm_register_write(struct kvm_vcpu *vcpu,

 static inline bool kvm_check_has_quirk(struct kvm *kvm, u64 quirk)
 {
-       return !(kvm->arch.disabled_quirks & quirk);
+       return !(kvm->arch.disabled_quirks & quirk) &&
+              !(kvm->arch.force_disabled_quirks & quirk);
 }
Sean Christopherson June 18, 2024, 2:34 p.m. UTC | #3
On Mon, Jun 17, 2024, Yan Zhao wrote:
> On Fri, Jun 14, 2024 at 04:01:07AM +0800, Edgecombe, Rick P wrote:
> > On Thu, 2024-06-13 at 14:06 +0800, Yan Zhao wrote:
> > >       a) Add a condition for TDX VM type in kvm_arch_flush_shadow_memslot()
> > >          besides the testing of kvm_check_has_quirk(). It is similar to
> > >          "all new VM types have the quirk disabled". e.g.
> > > 
> > >          static inline bool kvm_memslot_flush_zap_all(struct kvm
> > > *kvm)                    
> > >         
> > > {                                                                             
> > >    
> > >               return kvm->arch.vm_type != KVM_X86_TDX_VM
> > > &&                               
> > >                      kvm_check_has_quirk(kvm,
> > > KVM_X86_QUIRK_SLOT_ZAP_ALL);                
> > >          }
> > >          
> > >       b) Init the disabled_quirks based on VM type in kernel, extend
> > >          disabled_quirk querying/setting interface to enforce the quirk to
> > >          be disabled for TDX.

There's also option:

            c) Init disabled_quirks based on VM type.

I.e. let userspace enable the quirk.  If the VMM wants to shoot its TDX VM guests,
then so be it.  That said, I don't like this option because it would create a very
bizarre ABI.

> > 
> > I'd prefer to go with option (a) here. Because we don't have any behavior
> > defined yet for KVM_X86_TDX_VM, we don't really need to "disable a quirk" of it.

I vote for (a) as well.

> > Instead we could just define KVM_X86_QUIRK_SLOT_ZAP_ALL to be about the behavior
> > of the existing vm_types. It would be a few lines of documentation to save
> > implementing and maintaining a whole interface with special logic for TDX. So to
> > me it doesn't seem worth it, unless there is some other user for a new more
> > complex quirk interface.
> What about introducing a forced disabled_quirk field?

Nah, it'd require manual opt-in for every VM type for almost no benefit.  In fact,
IMO the code itself would be a net negative versus:

		return kvm->arch.vm_type == KVM_X86_DEFAULT_VM &&
		       kvm_check_has_quirk(kvm, KVM_X86_QUIRK_SLOT_ZAP_ALL);

because explicitly checking for KVM_X86_DEFAULT_VM would directly match the
documentation (which would state that the quirk only applies to DEFAULT_VM).
Yan Zhao June 20, 2024, 3:59 a.m. UTC | #4
On Tue, Jun 18, 2024 at 07:34:15AM -0700, Sean Christopherson wrote:
> On Mon, Jun 17, 2024, Yan Zhao wrote:
> > On Fri, Jun 14, 2024 at 04:01:07AM +0800, Edgecombe, Rick P wrote:
> > > On Thu, 2024-06-13 at 14:06 +0800, Yan Zhao wrote:
> > > >       a) Add a condition for TDX VM type in kvm_arch_flush_shadow_memslot()
> > > >          besides the testing of kvm_check_has_quirk(). It is similar to
> > > >          "all new VM types have the quirk disabled". e.g.
> > > > 
> > > >          static inline bool kvm_memslot_flush_zap_all(struct kvm
> > > > *kvm)                    
> > > >         
> > > > {                                                                             
> > > >    
> > > >               return kvm->arch.vm_type != KVM_X86_TDX_VM
> > > > &&                               
> > > >                      kvm_check_has_quirk(kvm,
> > > > KVM_X86_QUIRK_SLOT_ZAP_ALL);                
> > > >          }
> > > >          
> > > >       b) Init the disabled_quirks based on VM type in kernel, extend
> > > >          disabled_quirk querying/setting interface to enforce the quirk to
> > > >          be disabled for TDX.
> 
> There's also option:
> 
>             c) Init disabled_quirks based on VM type.
> 
> I.e. let userspace enable the quirk.  If the VMM wants to shoot its TDX VM guests,
> then so be it.  That said, I don't like this option because it would create a very
> bizarre ABI.
> 
> > > 
> > > I'd prefer to go with option (a) here. Because we don't have any behavior
> > > defined yet for KVM_X86_TDX_VM, we don't really need to "disable a quirk" of it.
> 
> I vote for (a) as well.
> 
> > > Instead we could just define KVM_X86_QUIRK_SLOT_ZAP_ALL to be about the behavior
> > > of the existing vm_types. It would be a few lines of documentation to save
> > > implementing and maintaining a whole interface with special logic for TDX. So to
> > > me it doesn't seem worth it, unless there is some other user for a new more
> > > complex quirk interface.
> > What about introducing a forced disabled_quirk field?
> 
> Nah, it'd require manual opt-in for every VM type for almost no benefit.  In fact,
> IMO the code itself would be a net negative versus:
> 
> 		return kvm->arch.vm_type == KVM_X86_DEFAULT_VM &&
> 		       kvm_check_has_quirk(kvm, KVM_X86_QUIRK_SLOT_ZAP_ALL);
> 
> because explicitly checking for KVM_X86_DEFAULT_VM would directly match the
> documentation (which would state that the quirk only applies to DEFAULT_VM).
Makes sense.
Then, (a) looks good to me :)
Do you prefer to include the document update (i.e. the quirk only applies to
DEFAULT_VM) in this series or in TDX MMU series?
And it means the quirk will not be enabled for all other VM types, e.g.
KVM_X86_SW_PROTECTED_VM, and SEV, SNP..., right?
Rick Edgecombe June 20, 2024, 7:38 p.m. UTC | #5
On Tue, 2024-06-18 at 07:34 -0700, Sean Christopherson wrote:
> There's also option:
> 
>             c) Init disabled_quirks based on VM type.
> 
> I.e. let userspace enable the quirk.  If the VMM wants to shoot its TDX VM
> guests,
> then so be it.  That said, I don't like this option because it would create a
> very
> bizarre ABI.

I think we actually need to force it on for TDX because kvm_mmu_zap_all_fast()
only zaps the direct (shared) root. If userspace decides to not enable the
quirk, mirror/private memory will not be zapped on memslot deletion. Then later
if there is a hole punch it will skip zapping that range because there is no
memslot. Then won't it let the pages get freed while they are still mapped in
the TD?

If I got that right (not 100% sure on the gmem hole punch page freeing), I think
KVM needs to force the behavior for TDs.

> 
> > > 
> > > I'd prefer to go with option (a) here. Because we don't have any behavior
> > > defined yet for KVM_X86_TDX_VM, we don't really need to "disable a quirk"
> > > of it.
> 
> I vote for (a) as well.
> 
> > > Instead we could just define KVM_X86_QUIRK_SLOT_ZAP_ALL to be about the
> > > behavior
> > > of the existing vm_types. It would be a few lines of documentation to save
> > > implementing and maintaining a whole interface with special logic for TDX.
> > > So to
> > > me it doesn't seem worth it, unless there is some other user for a new
> > > more
> > > complex quirk interface.
> > What about introducing a forced disabled_quirk field?
> 
> Nah, it'd require manual opt-in for every VM type for almost no benefit.  In
> fact,
> IMO the code itself would be a net negative versus:
> 
>                 return kvm->arch.vm_type == KVM_X86_DEFAULT_VM &&
>                        kvm_check_has_quirk(kvm, KVM_X86_QUIRK_SLOT_ZAP_ALL);
> 
> because explicitly checking for KVM_X86_DEFAULT_VM would directly match the
> documentation (which would state that the quirk only applies to DEFAULT_VM).

Ok, I updated (and posted on this series) the TDX integration patch.
Sean Christopherson June 21, 2024, midnight UTC | #6
On Thu, Jun 20, 2024, Rick P Edgecombe wrote:
> On Tue, 2024-06-18 at 07:34 -0700, Sean Christopherson wrote:
> > There's also option:
> > 
> >             c) Init disabled_quirks based on VM type.
> > 
> > I.e. let userspace enable the quirk.  If the VMM wants to shoot its TDX VM
> > guests, then so be it.  That said, I don't like this option because it
> > would create a very bizarre ABI.
> 
> I think we actually need to force it on for TDX because kvm_mmu_zap_all_fast()
> only zaps the direct (shared) root. If userspace decides to not enable the
> quirk, mirror/private memory will not be zapped on memslot deletion. Then later
> if there is a hole punch it will skip zapping that range because there is no
> memslot. Then won't it let the pages get freed while they are still mapped in
> the TD?
> 
> If I got that right (not 100% sure on the gmem hole punch page freeing), I think
> KVM needs to force the behavior for TDs.

What I was suggesting is that we condition the skipping of the mirror/private
EPT pages tables on the quirk, i.e. zap *everything* for TDX VMs if the quirk is
enabled.  Hence the very bizarre ABI.
Rick Edgecombe June 21, 2024, 1:06 a.m. UTC | #7
On Thu, 2024-06-20 at 17:00 -0700, Sean Christopherson wrote:
> 
> What I was suggesting is that we condition the skipping of the mirror/private
> EPT pages tables on the quirk, i.e. zap *everything* for TDX VMs if the quirk
> is
> enabled.  Hence the very bizarre ABI.

Ah, I see now. Yes, that would be strange.