mbox series

[RFC,0/2] SEPT SEAMCALL retry proposal

Message ID 20241121115139.26338-1-yan.y.zhao@intel.com (mailing list archive)
Headers show
Series SEPT SEAMCALL retry proposal | expand

Message

Yan Zhao Nov. 21, 2024, 11:51 a.m. UTC
This SEPT SEAMCALL retry proposal aims to remove patch
"[HACK] KVM: TDX: Retry seamcall when TDX_OPERAND_BUSY with operand SEPT"
[1] at the tail of v2 series "TDX MMU Part 2".

==brief history==

In the v1 series 'TDX MMU Part 2', there were several discussions regarding
the necessity of retrying SEPT-related SEAMCALLs up to 16 times within the
SEAMCALL wrapper tdx_seamcall_sept().

The lock status of each SEAMCALL relevant to KVM was analyzed in [2].

The conclusion was that 16 retries was necessary because
- tdh_vp_enter() contends with tdh_mem*() when 0-stepping is suspected.

  When the TDX module detects that EPT violations are caused by the same
  RIP as in the last tdh_vp_enter() for 6 consecutive times, tdh_vp_enter()
  will take SEPT tree lock and contend with tdh_mem*().

- tdg_mem_page_accept() can contend with other tdh_mem*().


Sean provided several good suggestions[3], including:
- Implement retries within TDX code when the TDP MMU returns
  RET_PF_RETRY_FOZEN (for RET_PF_RETRY and frozen SPTE) to avoid triggering
  0-step mitigation.
- It's not necessary for tdg_mem_page_accept() to contend with tdh_mem*()
  inside TDX module.
- Use a method similar to KVM_REQ_MCLOCK_INPROGRESS to kick off vCPUs and
  prevent tdh_vp_enter() during page uninstallation.

Yan later found out that only retry RET_PF_RETRY_FOZEN within TDX code is
insufficient to prevent 0-step mitigation [4].

Rick and Yan then consulted TDX module team with findings that:
- The threshold of zero-step mitigation is counted per vCPU.
  It's of value 6 because

    "There can be at most 2 mapping faults on instruction fetch
     (x86 macro-instructions length is at most 15 bytes) when the
     instruction crosses page boundary; then there can be at most 2
     mapping faults for each memory operand, when the operand crosses
     page boundary. For most of x86 macro-instructions, there are up to 2
     memory operands and each one of them is small, which brings us to
     maximum 2+2*2 = 6 legal mapping faults."
  
- Besides tdg_mem_page_accept(),  tdg_mem_page_attr_rd/wr() can also 
  contend with SEAMCALLs tdh_mem*().

So, we decided to make a proposal to tolerate 0-step mitigation.

==proposal details==

The proposal discusses SEPT-related and TLB-flush-related SEAMCALLs
together which are required for page installation and uninstallation.

These SEAMCALLs can be divided into three groups:
Group 1: tdh_mem_page_add().
         The SEAMCALL is invoked only during TD build time and therefore
         KVM has ensured that no contention will occur.

         Proposal: (as in patch 1)
         Just return error when TDX_OPERAND_BUSY is found.

Group 2: tdh_mem_sept_add(), tdh_mem_page_aug().
         These two SEAMCALLs are invoked for page installation. 
         They return TDX_OPERAND_BUSY when contending with tdh_vp_enter()
	 (due to 0-step mitigation) or TDCALLs tdg_mem_page_accept(),
	 tdg_mem_page_attr_rd/wr().

         Proposal: (as in patch 1)
         - Return -EBUSY in KVM for TDX_OPERAND_BUSY to cause RET_PF_RETRY
           to be returned in kvm_mmu_do_page_fault()/kvm_mmu_page_fault().
         
         - Inside TDX's EPT violation handler, retry on RET_PF_RETRY as
           long as there are no pending signals/interrupts.

         The retry inside TDX aims to reduce the count of tdh_vp_enter()
         before resolving EPT violations in the local vCPU, thereby
         minimizing contentions with other vCPUs. However, it can't
         completely eliminate 0-step mitigation as it exits when there're
         pending signals/interrupts and does not (and cannot) remove the
         tdh_vp_enter() caused by KVM_EXIT_MEMORY_FAULT.

         Resources    SHARED  users      EXCLUSIVE users
         ------------------------------------------------------------
         SEPT tree  tdh_mem_sept_add     tdh_vp_enter(0-step mitigation)
                    tdh_mem_page_aug
         ------------------------------------------------------------
         SEPT entry                      tdh_mem_sept_add (Host lock)
                                         tdh_mem_page_aug (Host lock)
                                         tdg_mem_page_accept (Guest lock)
                                         tdg_mem_page_attr_rd (Guest lock)
                                         tdg_mem_page_attr_wr (Guest lock)

Group 3: tdh_mem_range_block(), tdh_mem_track(), tdh_mem_page_remove().
         These three SEAMCALLs are invoked for page uninstallation, with
         KVM mmu_lock held for writing.

         Resources     SHARED users      EXCLUSIVE users
         ------------------------------------------------------------
         TDCS epoch    tdh_vp_enter      tdh_mem_track
         ------------------------------------------------------------
         SEPT tree  tdh_mem_page_remove  tdh_vp_enter (0-step mitigation)
                                         tdh_mem_range_block   
         ------------------------------------------------------------
         SEPT entry                      tdh_mem_range_block (Host lock)
                                         tdh_mem_page_remove (Host lock)
                                         tdg_mem_page_accept (Guest lock)
                                         tdg_mem_page_attr_rd (Guest lock)
                                         tdg_mem_page_attr_wr (Guest lock)

         Proposal: (as in patch 2)
         - Upon detection of TDX_OPERAND_BUSY, retry each SEAMCALL only
           once.
         - During the retry, kick off all vCPUs and prevent any vCPU from
           entering to avoid potential contentions.

         This is because tdh_vp_enter() and TDCALLs are not protected by
         KVM mmu_lock, and remove_external_spte() in KVM must not fail.



SEAMCALL                Lock Type        Resource 
-----------------------------Group 1--------------------------------
tdh_mem_page_add        EXCLUSIVE        TDR
                        NO_LOCK          TDCS
                        NO_LOCK          SEPT tree
                        EXCLUSIVE        page to add

----------------------------Group 2--------------------------------
tdh_mem_sept_add        SHARED           TDR
                        SHARED           TDCS
                        SHARED           SEPT tree
                        HOST,EXCLUSIVE   SEPT entry to modify
                        EXCLUSIVE        page to add


tdh_mem_page_aug        SHARED           TDR
                        SHARED           TDCS
                        SHARED           SEPT tree
                        HOST,EXCLUSIVE   SEPT entry to modify
                        EXCLUSIVE        page to aug

----------------------------Group 3--------------------------------
tdh_mem_range_block     SHARED           TDR
                        SHARED           TDCS
                        EXCLUSIVE        SEPT tree
                        HOST,EXCLUSIVE   SEPT entry to modify

tdh_mem_track           SHARED           TDR
                        SHARED           TDCS
                        EXCLUSIVE        TDCS epoch

tdh_mem_page_remove     SHARED           TDR
                        SHARED           TDCS
                        SHARED           SEPT tree
                        HOST,EXCLUSIVE   SEPT entry to modify
                        EXCLUSIVE        page to remove


[1] https://lore.kernel.org/all/20241112073909.22326-1-yan.y.zhao@intel.com
[2] https://lore.kernel.org/kvm/ZuP5eNXFCljzRgWo@yzhao56-desk.sh.intel.com
[3] https://lore.kernel.org/kvm/ZuR09EqzU1WbQYGd@google.com
[4] https://lore.kernel.org/kvm/Zw%2FKElXSOf1xqLE7@yzhao56-desk.sh.intel.com

Yan Zhao (2):
  KVM: TDX: Retry in TDX when installing TD private/sept pages
  KVM: TDX: Kick off vCPUs when SEAMCALL is busy during TD page removal

 arch/x86/include/asm/kvm_host.h |   2 +
 arch/x86/kvm/mmu/mmu.c          |   2 +-
 arch/x86/kvm/vmx/tdx.c          | 102 +++++++++++++++++++++++++-------
 3 files changed, 85 insertions(+), 21 deletions(-)