From patchwork Thu Feb 27 01:20:02 2025 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Binbin Wu X-Patchwork-Id: 13993402 Received: from mgamail.intel.com (mgamail.intel.com [198.175.65.9]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id D2D3C482EB; Thu, 27 Feb 2025 01:18:52 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=198.175.65.9 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1740619134; cv=none; b=Nn1ZB/0hcvUZ9Az7wgCgoT6NTr6krsIOJD6uYDpfUk/3PF1r0hM5hzcjCSRz4WsM5E4+FXWQpDswppYxiEEfWvds6dWfRLpEVIQZfbgpARDM6zMPiy2MpO9b4D6PZX3ElQr28PR7LhrsQAsFTcvS4ShS2Qk8GXv0FgSdP9pZWcs= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1740619134; c=relaxed/simple; bh=BpzZmI2O2fbXdMpIbvZS+E9MGmSSlZxok3GGQxKhq7Y=; h=From:To:Cc:Subject:Date:Message-ID:In-Reply-To:References: MIME-Version; b=NBKDmtk8gt4O5TqOz+mRY/Sgns5FaPkNEX/UZ3CcZNsyPwiTgN0fE6oPdOS1l7wcWTxk2h/uov/NBVyygZxdAwHXRlHwjLVFs3yw2Gi3yPeQ0NTsFgYP0pWE+m3LDJKog8y9oYE1HMaJ0P0eqMKDSBVCK7uxbDOhF7Aw79hwq2s= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=linux.intel.com; spf=none smtp.mailfrom=linux.intel.com; dkim=pass (2048-bit key) header.d=intel.com header.i=@intel.com header.b=C19wVxQU; arc=none smtp.client-ip=198.175.65.9 Authentication-Results: smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=linux.intel.com Authentication-Results: smtp.subspace.kernel.org; spf=none smtp.mailfrom=linux.intel.com Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=intel.com header.i=@intel.com header.b="C19wVxQU" DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=intel.com; i=@intel.com; q=dns/txt; s=Intel; t=1740619133; x=1772155133; h=from:to:cc:subject:date:message-id:in-reply-to: references:mime-version:content-transfer-encoding; bh=BpzZmI2O2fbXdMpIbvZS+E9MGmSSlZxok3GGQxKhq7Y=; b=C19wVxQUB3KGuCCqCBJgiJ0qHfKYERcDZy/u0qWcLF4SmMw0snKkWb33 yvLhvK1nqjvI9OMxKT2A5ySrUxpnPJmpqbrW9PbHtsujWv8Khy70Be7t0 D2NL+RDdOHW3SJ1geqR8XV0N9lw97mytl4oHtwnMkb8VGlDZ6L2+IqkLG kVQCEN2aKPU9+c4l+HpwC8VVeB9tCTq+1IZHvP2M2YGEjTDaTo2GVFcAi T83Wg+MCRsba1ZuxEQqKMk89K5U0kDvU6Np+fPtuB22WyvJLvKArufEMo Aj9LgFFiHQ+3bI0beQMpWwETi+A9v2V/4jHCiT6sPi/VkQA5KCnsn928q w==; X-CSE-ConnectionGUID: gyYyzBKNSQGYZ7T3pJydEw== X-CSE-MsgGUID: Hxu4wBqnQwuOcavfmQn2VQ== X-IronPort-AV: E=McAfee;i="6700,10204,11357"; a="63959586" X-IronPort-AV: E=Sophos;i="6.13,318,1732608000"; d="scan'208";a="63959586" Received: from fmviesa006.fm.intel.com ([10.60.135.146]) by orvoesa101.jf.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 26 Feb 2025 17:18:53 -0800 X-CSE-ConnectionGUID: 23tYVhGLR96LIp8r/Rju0A== X-CSE-MsgGUID: vHLmkKCoTUiHBeFZLSCwUA== X-ExtLoop1: 1 X-IronPort-AV: E=Sophos;i="6.13,318,1732608000"; d="scan'208";a="116674832" Received: from litbin-desktop.sh.intel.com ([10.239.156.93]) by fmviesa006-auth.fm.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 26 Feb 2025 17:18:49 -0800 From: Binbin Wu To: pbonzini@redhat.com, seanjc@google.com, kvm@vger.kernel.org Cc: rick.p.edgecombe@intel.com, kai.huang@intel.com, adrian.hunter@intel.com, reinette.chatre@intel.com, xiaoyao.li@intel.com, tony.lindgren@intel.com, isaku.yamahata@intel.com, yan.y.zhao@intel.com, chao.gao@intel.com, linux-kernel@vger.kernel.org, binbin.wu@linux.intel.com Subject: [PATCH v2 01/20] KVM: TDX: Handle EPT violation/misconfig exit Date: Thu, 27 Feb 2025 09:20:02 +0800 Message-ID: <20250227012021.1778144-2-binbin.wu@linux.intel.com> X-Mailer: git-send-email 2.46.0 In-Reply-To: <20250227012021.1778144-1-binbin.wu@linux.intel.com> References: <20250227012021.1778144-1-binbin.wu@linux.intel.com> Precedence: bulk X-Mailing-List: kvm@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 From: Isaku Yamahata For TDX, on EPT violation, call common __vmx_handle_ept_violation() to trigger x86 MMU code; on EPT misconfiguration, bug the VM since it shouldn't happen. EPT violation due to instruction fetch should never be triggered from shared memory in TDX guest. If such EPT violation occurs, treat it as broken hardware. EPT misconfiguration shouldn't happen on neither shared nor secure EPT for TDX guests. - TDX module guarantees no EPT misconfiguration on secure EPT. Per TDX module v1.5 spec section 9.4 "Secure EPT Induced TD Exits": "By design, since secure EPT is fully controlled by the TDX module, an EPT misconfiguration on a private GPA indicates a TDX module bug and is handled as a fatal error." - For shared EPT, the MMIO caching optimization, which is the only case where current KVM configures EPT entries to generate EPT misconfiguration, is implemented in a different way for TDX guests. KVM configures EPT entries to non-present value without suppressing #VE bit. It causes #VE in the TDX guest and the guest will call TDG.VP.VMCALL to request MMIO emulation. Suggested-by: Sean Christopherson Signed-off-by: Isaku Yamahata Co-developed-by: Adrian Hunter Signed-off-by: Adrian Hunter [binbin: rework changelog] Co-developed-by: Binbin Wu Signed-off-by: Binbin Wu --- TDX "the rest" v2: - KVM_BUG_ON() and return -EIO for real EXIT_REASON_EPT_MISCONFIG. (Sean) - Defer the handling for real EXIT_REASON_EPT_MISCONFIG until tdx_handle_exit() because tdx_to_vmx_exit_reason() is called by non-instrumentable code with interrupt disabled. - Rebased after adding tdcall_to_vmx_exit_reason(). TDX "the rest" v1: - Renamed from "KVM: TDX: Handle ept violation/misconfig exit" to "KVM: TDX: Handle EPT violation/misconfig exit" (Reinette) - Removed WARN_ON_ONCE(1) in tdx_handle_ept_misconfig(). (Rick) - Add comment above EPT_VIOLATION_ACC_INSTR check. (Chao) https://lore.kernel.org/lkml/Zgoz0sizgEZhnQ98@chao-email/ https://lore.kernel.org/lkml/ZjiE+O9fct5zI4Sf@chao-email/ - Remove unnecessary define of TDX_SEPT_VIOLATION_EXIT_QUAL. (Sean) - Replace pr_warn() and KVM_EXIT_EXCEPTION with KVM_BUG_ON(). (Sean) - KVM_BUG_ON() for EPT misconfig. (Sean) - Rework changelog. v14 -> v15: - use PFERR_GUEST_ENC_MASK to tell the fault is private --- arch/x86/kvm/vmx/tdx.c | 47 ++++++++++++++++++++++++++++++++++++++++++ 1 file changed, 47 insertions(+) diff --git a/arch/x86/kvm/vmx/tdx.c b/arch/x86/kvm/vmx/tdx.c index f9eccee02a69..0e2c734070d6 100644 --- a/arch/x86/kvm/vmx/tdx.c +++ b/arch/x86/kvm/vmx/tdx.c @@ -860,6 +860,12 @@ static __always_inline u32 tdx_to_vmx_exit_reason(struct kvm_vcpu *vcpu) return EXIT_REASON_VMCALL; return tdcall_to_vmx_exit_reason(vcpu); + case EXIT_REASON_EPT_MISCONFIG: + /* + * Defer KVM_BUG_ON() until tdx_handle_exit() because this is in + * non-instrumentable code with interrupts disabled. + */ + return -1u; default: break; } @@ -968,6 +974,9 @@ fastpath_t tdx_vcpu_run(struct kvm_vcpu *vcpu, bool force_immediate_exit) vcpu->arch.regs_avail &= ~TDX_REGS_UNSUPPORTED_SET; + if (unlikely(tdx->vp_enter_ret == EXIT_REASON_EPT_MISCONFIG)) + return EXIT_FASTPATH_NONE; + if (unlikely((tdx->vp_enter_ret & TDX_SW_ERROR) == TDX_SW_ERROR)) return EXIT_FASTPATH_NONE; @@ -1674,6 +1683,37 @@ void tdx_deliver_interrupt(struct kvm_lapic *apic, int delivery_mode, trace_kvm_apicv_accept_irq(vcpu->vcpu_id, delivery_mode, trig_mode, vector); } +static int tdx_handle_ept_violation(struct kvm_vcpu *vcpu) +{ + unsigned long exit_qual; + gpa_t gpa = to_tdx(vcpu)->exit_gpa; + + if (vt_is_tdx_private_gpa(vcpu->kvm, gpa)) { + /* + * Always treat SEPT violations as write faults. Ignore the + * EXIT_QUALIFICATION reported by TDX-SEAM for SEPT violations. + * TD private pages are always RWX in the SEPT tables, + * i.e. they're always mapped writable. Just as importantly, + * treating SEPT violations as write faults is necessary to + * avoid COW allocations, which will cause TDAUGPAGE failures + * due to aliasing a single HPA to multiple GPAs. + */ + exit_qual = EPT_VIOLATION_ACC_WRITE; + } else { + exit_qual = vmx_get_exit_qual(vcpu); + /* + * EPT violation due to instruction fetch should never be + * triggered from shared memory in TDX guest. If such EPT + * violation occurs, treat it as broken hardware. + */ + if (KVM_BUG_ON(exit_qual & EPT_VIOLATION_ACC_INSTR, vcpu->kvm)) + return -EIO; + } + + trace_kvm_page_fault(vcpu, gpa, exit_qual); + return __vmx_handle_ept_violation(vcpu, gpa, exit_qual); +} + int tdx_handle_exit(struct kvm_vcpu *vcpu, fastpath_t fastpath) { struct vcpu_tdx *tdx = to_tdx(vcpu); @@ -1683,6 +1723,11 @@ int tdx_handle_exit(struct kvm_vcpu *vcpu, fastpath_t fastpath) if (fastpath != EXIT_FASTPATH_NONE) return 1; + if (unlikely(vp_enter_ret == EXIT_REASON_EPT_MISCONFIG)) { + KVM_BUG_ON(1, vcpu->kvm); + return -EIO; + } + /* * Handle TDX SW errors, including TDX_SEAMCALL_UD, TDX_SEAMCALL_GP and * TDX_SEAMCALL_VMFAILINVALID. @@ -1732,6 +1777,8 @@ int tdx_handle_exit(struct kvm_vcpu *vcpu, fastpath_t fastpath) return tdx_emulate_io(vcpu); case EXIT_REASON_EPT_MISCONFIG: return tdx_emulate_mmio(vcpu); + case EXIT_REASON_EPT_VIOLATION: + return tdx_handle_ept_violation(vcpu); case EXIT_REASON_OTHER_SMI: /* * Unlike VMX, SMI in SEAM non-root mode (i.e. when From patchwork Thu Feb 27 01:20:03 2025 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Binbin Wu X-Patchwork-Id: 13993403 Received: from mgamail.intel.com (mgamail.intel.com [198.175.65.9]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 1A51B1459F7; Thu, 27 Feb 2025 01:18:56 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=198.175.65.9 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1740619137; cv=none; b=Kh9nqw+BWukursED1kl3pwEIpDj44MtRfnIXs/Z80TFz/tb3RoGqdePbgXkyEThZ8huypFz0uSAP8rHKteiCrac2+CVLuMBLynHBaHKNmGDUZH48V/2wZk8idwVHS812NN5BDZvOGY73g7GKeP2XbI93y1f5vv7T8l2EwjCWqlc= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1740619137; c=relaxed/simple; bh=rAInuVdG2jGTcBVvYvvjjkp1E50lnprsmHCpiiwqzNY=; h=From:To:Cc:Subject:Date:Message-ID:In-Reply-To:References: MIME-Version; b=r7UWa4EM8yYvblA28iOCQin12OM6s1YwIShAcSrBnfsB7rc+FVjqn6HQ2XE2503T82TziHiHx1AmgFqGLc5FrTNACrVw8BDA5UBDTWxp/XgjzG12leCYj3G7uhXjuqdkuL3s9k6rlNyc4zZY+FcWjz5jm5Hp85bMxN8vLpKrdvU= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=linux.intel.com; spf=none smtp.mailfrom=linux.intel.com; dkim=pass (2048-bit key) header.d=intel.com header.i=@intel.com header.b=TZjIkrJo; arc=none smtp.client-ip=198.175.65.9 Authentication-Results: smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=linux.intel.com Authentication-Results: smtp.subspace.kernel.org; spf=none smtp.mailfrom=linux.intel.com Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=intel.com header.i=@intel.com header.b="TZjIkrJo" DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=intel.com; i=@intel.com; q=dns/txt; s=Intel; t=1740619136; x=1772155136; h=from:to:cc:subject:date:message-id:in-reply-to: references:mime-version:content-transfer-encoding; bh=rAInuVdG2jGTcBVvYvvjjkp1E50lnprsmHCpiiwqzNY=; b=TZjIkrJoFEyBWLsZwZaX+hj57/IFXh3v2ZnFZe7+KSdPeSECqiCXbxQJ rdiyzbAWWxNmygfTtc1cyoTLEeEGqk5AumAL26ybO99LC2lWwN2OaUyOY j5Fv8W/C1kmaqopqFEdmEW6ifGZFFGnEucvyraDz2fB9wrhXpydaJ8rHk lhHpbRF88qB+VUBpQgHgFj7yn1eHfwm+Uv2uawCMD44NliUH3OIfNtk4P DG3CMMZF1+cxwFzanFXxCuaejo3qNiDamQoCzW16pEUf2i+oeu+dGtUHT 8mXdbkGBxkE92IY5Zh5Ul1JkxZ91BU+Y7ZsAcCr0G7K4JvwLZrrJ37hAR w==; X-CSE-ConnectionGUID: mOsAYCCtRZah4j1WUfyULQ== X-CSE-MsgGUID: l6dyqi1UQQiHyizB0lVBLg== X-IronPort-AV: E=McAfee;i="6700,10204,11357"; a="63959594" X-IronPort-AV: E=Sophos;i="6.13,318,1732608000"; d="scan'208";a="63959594" Received: from fmviesa006.fm.intel.com ([10.60.135.146]) by orvoesa101.jf.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 26 Feb 2025 17:18:56 -0800 X-CSE-ConnectionGUID: JLHF8FKUTYe7njiIYsGvLw== X-CSE-MsgGUID: kmUPc+EMRl6WFiBQdEKlXA== X-ExtLoop1: 1 X-IronPort-AV: E=Sophos;i="6.13,318,1732608000"; d="scan'208";a="116674836" Received: from litbin-desktop.sh.intel.com ([10.239.156.93]) by fmviesa006-auth.fm.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 26 Feb 2025 17:18:52 -0800 From: Binbin Wu To: pbonzini@redhat.com, seanjc@google.com, kvm@vger.kernel.org Cc: rick.p.edgecombe@intel.com, kai.huang@intel.com, adrian.hunter@intel.com, reinette.chatre@intel.com, xiaoyao.li@intel.com, tony.lindgren@intel.com, isaku.yamahata@intel.com, yan.y.zhao@intel.com, chao.gao@intel.com, linux-kernel@vger.kernel.org, binbin.wu@linux.intel.com Subject: [PATCH v2 02/20] KVM: TDX: Detect unexpected SEPT violations due to pending SPTEs Date: Thu, 27 Feb 2025 09:20:03 +0800 Message-ID: <20250227012021.1778144-3-binbin.wu@linux.intel.com> X-Mailer: git-send-email 2.46.0 In-Reply-To: <20250227012021.1778144-1-binbin.wu@linux.intel.com> References: <20250227012021.1778144-1-binbin.wu@linux.intel.com> Precedence: bulk X-Mailing-List: kvm@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 From: Yan Zhao Detect SEPT violations that occur when an SEPT entry is in PENDING state while the TD is configured not to receive #VE on SEPT violations. A TD guest can be configured not to receive #VE by setting SEPT_VE_DISABLE to 1 in tdh_mng_init() or modifying pending_ve_disable to 1 in TDCS when flexible_pending_ve is permitted. In such cases, the TDX module will not inject #VE into the TD upon encountering an EPT violation caused by an SEPT entry in the PENDING state. Instead, TDX module will exit to VMM and set extended exit qualification type to PENDING_EPT_VIOLATION and exit qualification bit 6:3 to 0. Since #VE will not be injected to such TDs, they are not able to be notified to accept a GPA. TD accessing before accepting a private GPA is regarded as an error within the guest. Detect such guest error by inspecting the (extended) exit qualification bits and make such VM dead. Cc: Xiaoyao Li Cc: Rick Edgecombe Signed-off-by: Yan Zhao Signed-off-by: Binbin Wu --- TDX "the rest" v2: - Rebased on getting exit_qualification, ext_exit_qualification. TDX "the rest" v1: - New patch --- arch/x86/include/asm/vmx.h | 2 ++ arch/x86/kvm/vmx/tdx.c | 17 +++++++++++++++++ arch/x86/kvm/vmx/tdx_arch.h | 2 ++ 3 files changed, 21 insertions(+) diff --git a/arch/x86/include/asm/vmx.h b/arch/x86/include/asm/vmx.h index 9298fb9d4bb3..028f3b8db2af 100644 --- a/arch/x86/include/asm/vmx.h +++ b/arch/x86/include/asm/vmx.h @@ -585,12 +585,14 @@ enum vm_entry_failure_code { #define EPT_VIOLATION_ACC_WRITE_BIT 1 #define EPT_VIOLATION_ACC_INSTR_BIT 2 #define EPT_VIOLATION_RWX_SHIFT 3 +#define EPT_VIOLATION_EXEC_R3_LIN_BIT 6 #define EPT_VIOLATION_GVA_IS_VALID_BIT 7 #define EPT_VIOLATION_GVA_TRANSLATED_BIT 8 #define EPT_VIOLATION_ACC_READ (1 << EPT_VIOLATION_ACC_READ_BIT) #define EPT_VIOLATION_ACC_WRITE (1 << EPT_VIOLATION_ACC_WRITE_BIT) #define EPT_VIOLATION_ACC_INSTR (1 << EPT_VIOLATION_ACC_INSTR_BIT) #define EPT_VIOLATION_RWX_MASK (VMX_EPT_RWX_MASK << EPT_VIOLATION_RWX_SHIFT) +#define EPT_VIOLATION_EXEC_FOR_RING3_LIN (1 << EPT_VIOLATION_EXEC_R3_LIN_BIT) #define EPT_VIOLATION_GVA_IS_VALID (1 << EPT_VIOLATION_GVA_IS_VALID_BIT) #define EPT_VIOLATION_GVA_TRANSLATED (1 << EPT_VIOLATION_GVA_TRANSLATED_BIT) diff --git a/arch/x86/kvm/vmx/tdx.c b/arch/x86/kvm/vmx/tdx.c index 0e2c734070d6..b8701e343e80 100644 --- a/arch/x86/kvm/vmx/tdx.c +++ b/arch/x86/kvm/vmx/tdx.c @@ -1683,12 +1683,29 @@ void tdx_deliver_interrupt(struct kvm_lapic *apic, int delivery_mode, trace_kvm_apicv_accept_irq(vcpu->vcpu_id, delivery_mode, trig_mode, vector); } +static inline bool tdx_is_sept_violation_unexpected_pending(struct kvm_vcpu *vcpu) +{ + u64 eeq_type = to_tdx(vcpu)->ext_exit_qualification & TDX_EXT_EXIT_QUAL_TYPE_MASK; + u64 eq = vmx_get_exit_qual(vcpu); + + if (eeq_type != TDX_EXT_EXIT_QUAL_TYPE_PENDING_EPT_VIOLATION) + return false; + + return !(eq & EPT_VIOLATION_RWX_MASK) && !(eq & EPT_VIOLATION_EXEC_FOR_RING3_LIN); +} + static int tdx_handle_ept_violation(struct kvm_vcpu *vcpu) { unsigned long exit_qual; gpa_t gpa = to_tdx(vcpu)->exit_gpa; if (vt_is_tdx_private_gpa(vcpu->kvm, gpa)) { + if (tdx_is_sept_violation_unexpected_pending(vcpu)) { + pr_warn("Guest access before accepting 0x%llx on vCPU %d\n", + gpa, vcpu->vcpu_id); + kvm_vm_dead(vcpu->kvm); + return -EIO; + } /* * Always treat SEPT violations as write faults. Ignore the * EXIT_QUALIFICATION reported by TDX-SEAM for SEPT violations. diff --git a/arch/x86/kvm/vmx/tdx_arch.h b/arch/x86/kvm/vmx/tdx_arch.h index a8071409498f..fcbf0d4abc5f 100644 --- a/arch/x86/kvm/vmx/tdx_arch.h +++ b/arch/x86/kvm/vmx/tdx_arch.h @@ -69,6 +69,8 @@ struct tdx_cpuid_value { #define TDX_TD_ATTR_KL BIT_ULL(31) #define TDX_TD_ATTR_PERFMON BIT_ULL(63) +#define TDX_EXT_EXIT_QUAL_TYPE_MASK GENMASK(3, 0) +#define TDX_EXT_EXIT_QUAL_TYPE_PENDING_EPT_VIOLATION 6 /* * TD_PARAMS is provided as an input to TDH_MNG_INIT, the size of which is 1024B. */ From patchwork Thu Feb 27 01:20:04 2025 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Binbin Wu X-Patchwork-Id: 13993404 Received: from mgamail.intel.com (mgamail.intel.com [198.175.65.9]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 5DE1422F01; Thu, 27 Feb 2025 01:18:59 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=198.175.65.9 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1740619141; cv=none; b=qC81KS+wOaotP2jPaeMGYEb55SOYGndy+ZYnhuCDcxOMUXMUP/8FA6UY3smtPq94VS2SkouWJHBBIAkPYuFhwOU8+RaErQ004nNnYBOcCbvJcPSknr+3KXwI/tgxQobgTtWFaETkyoHOxsusRfKSI3KURGiBNL/xaFl84+Deq3o= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1740619141; c=relaxed/simple; bh=DmK8LRAGl5CcObJpm47+U2I+JjTgigBYqYk/CcW5jq0=; h=From:To:Cc:Subject:Date:Message-ID:In-Reply-To:References: MIME-Version; b=WVGOfFN/phTb8ImFQtvq+HBCc3V8vtEeXE/0AYtUwXP3IPL4AT3lN6MV/JFngrdOhVMcxGrR+jq319X5dHYwU5Kay0IFQTaoHro5uR+utQLu8K8tAU7j9AFOey7RHswPTHEp68+WsLY0y4DRFCpxD33GN8C4pAab7tZRfQOnIe0= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=linux.intel.com; spf=none smtp.mailfrom=linux.intel.com; dkim=pass (2048-bit key) header.d=intel.com header.i=@intel.com header.b=Wy2zMwuI; arc=none smtp.client-ip=198.175.65.9 Authentication-Results: smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=linux.intel.com Authentication-Results: smtp.subspace.kernel.org; spf=none smtp.mailfrom=linux.intel.com Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=intel.com header.i=@intel.com header.b="Wy2zMwuI" DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=intel.com; i=@intel.com; q=dns/txt; s=Intel; t=1740619139; x=1772155139; h=from:to:cc:subject:date:message-id:in-reply-to: references:mime-version:content-transfer-encoding; bh=DmK8LRAGl5CcObJpm47+U2I+JjTgigBYqYk/CcW5jq0=; b=Wy2zMwuIYkoX7P/6lxwbCp6DzNVLCwqlmA3aEmQsfHwti4WPC1n9+L2y Crrmo+IIbrZGWI7jQB+/ccCMTLCSSUxW0LZsj/tnC4m4oNVihubDxbZXc +cQe0PD7RR3Cq1/WYGG/fgb+eFF0JuYGOnjfYbFYiZqWfmMZKPl/ak9uD jFE4ZD3LnAt4cSu4eM4PN/Wqb9T7nfydP4EasqxFhcwe8e2oAyF/75pja tFLU/9bUOrIcxQ61TibAmVLuZoktzHQ5Ptq4aVTcZ73Pwq5YkRZjxwybg B1I9LBKMzrBN1Wyef5Ieiw5k4nrC/ytUxnebyHzrRSTy4T/+eWFFLCXpK w==; X-CSE-ConnectionGUID: ZIiosn3LTs+mc5GAAlNdIw== X-CSE-MsgGUID: h6nPWl5hSoKZGIONPm0w8w== X-IronPort-AV: E=McAfee;i="6700,10204,11357"; a="63959598" X-IronPort-AV: E=Sophos;i="6.13,318,1732608000"; d="scan'208";a="63959598" Received: from fmviesa006.fm.intel.com ([10.60.135.146]) by orvoesa101.jf.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 26 Feb 2025 17:18:59 -0800 X-CSE-ConnectionGUID: jvspfqGaTo21I7sy2NgkcA== X-CSE-MsgGUID: c8rykWehTv+q2chbFli9SA== X-ExtLoop1: 1 X-IronPort-AV: E=Sophos;i="6.13,318,1732608000"; d="scan'208";a="116674839" Received: from litbin-desktop.sh.intel.com ([10.239.156.93]) by fmviesa006-auth.fm.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 26 Feb 2025 17:18:55 -0800 From: Binbin Wu To: pbonzini@redhat.com, seanjc@google.com, kvm@vger.kernel.org Cc: rick.p.edgecombe@intel.com, kai.huang@intel.com, adrian.hunter@intel.com, reinette.chatre@intel.com, xiaoyao.li@intel.com, tony.lindgren@intel.com, isaku.yamahata@intel.com, yan.y.zhao@intel.com, chao.gao@intel.com, linux-kernel@vger.kernel.org, binbin.wu@linux.intel.com Subject: [PATCH v2 03/20] KVM: TDX: Retry locally in TDX EPT violation handler on RET_PF_RETRY Date: Thu, 27 Feb 2025 09:20:04 +0800 Message-ID: <20250227012021.1778144-4-binbin.wu@linux.intel.com> X-Mailer: git-send-email 2.46.0 In-Reply-To: <20250227012021.1778144-1-binbin.wu@linux.intel.com> References: <20250227012021.1778144-1-binbin.wu@linux.intel.com> Precedence: bulk X-Mailing-List: kvm@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 From: Yan Zhao Retry locally in the TDX EPT violation handler for private memory to reduce the chances for tdh_mem_sept_add()/tdh_mem_page_aug() to contend with tdh_vp_enter(). TDX EPT violation installs private pages via tdh_mem_sept_add() and tdh_mem_page_aug(). The two may have contention with tdh_vp_enter() or TDCALLs. Resources SHARED users EXCLUSIVE users ------------------------------------------------------------ SEPT tree tdh_mem_sept_add tdh_vp_enter(0-step mitigation) tdh_mem_page_aug ------------------------------------------------------------ SEPT entry tdh_mem_sept_add (Host lock) tdh_mem_page_aug (Host lock) tdg_mem_page_accept (Guest lock) tdg_mem_page_attr_rd (Guest lock) tdg_mem_page_attr_wr (Guest lock) Though the contention between tdh_mem_sept_add()/tdh_mem_page_aug() and TDCALLs may be removed in future TDX module, their contention with tdh_vp_enter() due to 0-step mitigation still persists. The TDX module may trigger 0-step mitigation in SEAMCALL TDH.VP.ENTER, which works as follows: 0. Each TDH.VP.ENTER records the guest RIP on TD entry. 1. When the TDX module encounters a VM exit with reason EPT_VIOLATION, it checks if the guest RIP is the same as last guest RIP on TD entry. -if yes, it means the EPT violation is caused by the same instruction that caused the last VM exit. Then, the TDX module increases the guest RIP no-progress count. When the count increases from 0 to the threshold (currently 6), the TDX module records the faulting GPA into a last_epf_gpa_list. -if no, it means the guest RIP has made progress. So, the TDX module resets the RIP no-progress count and the last_epf_gpa_list. 2. On the next TDH.VP.ENTER, the TDX module (after saving the guest RIP on TD entry) checks if the last_epf_gpa_list is empty. -if yes, TD entry continues without acquiring the lock on the SEPT tree. -if no, it triggers the 0-step mitigation by acquiring the exclusive lock on SEPT tree, walking the EPT tree to check if all page faults caused by the GPAs in the last_epf_gpa_list have been resolved before continuing TD entry. Since KVM TDP MMU usually re-enters guest whenever it exits to userspace (e.g. for KVM_EXIT_MEMORY_FAULT) or encounters a BUSY, it is possible for a tdh_vp_enter() to be called more than the threshold count before a page fault is addressed, triggering contention when tdh_vp_enter() attempts to acquire exclusive lock on SEPT tree. Retry locally in TDX EPT violation handler to reduce the count of invoking tdh_vp_enter(), hence reducing the possibility of its contention with tdh_mem_sept_add()/tdh_mem_page_aug(). However, the 0-step mitigation and the contention are still not eliminated due to KVM_EXIT_MEMORY_FAULT, signals/interrupts, and cases when one instruction faults more GFNs than the threshold count. Signed-off-by: Yan Zhao --- Changes from "KVM: TDX SEPT SEAMCALL retry" series [1] - Use kvm_vcpu_has_events() to replace "pi_has_pending_interrupt(vcpu) || kvm_test_request(KVM_REQ_NMI, vcpu) || vcpu->arch.nmi_pending)" [2]. (Sean) - Update code comment to explain why break out for false positive of interrupt injection is fine (Sean). [1]: https://lore.kernel.org/all/20250113021218.18922-1-yan.y.zhao@intel.com [2]: https://lore.kernel.org/all/Z4rIGv4E7Jdmhl8P@google.com --- arch/x86/kvm/vmx/tdx.c | 57 +++++++++++++++++++++++++++++++++++++++++- 1 file changed, 56 insertions(+), 1 deletion(-) diff --git a/arch/x86/kvm/vmx/tdx.c b/arch/x86/kvm/vmx/tdx.c index b8701e343e80..6fa6f7e13e15 100644 --- a/arch/x86/kvm/vmx/tdx.c +++ b/arch/x86/kvm/vmx/tdx.c @@ -1698,6 +1698,8 @@ static int tdx_handle_ept_violation(struct kvm_vcpu *vcpu) { unsigned long exit_qual; gpa_t gpa = to_tdx(vcpu)->exit_gpa; + bool local_retry = false; + int ret; if (vt_is_tdx_private_gpa(vcpu->kvm, gpa)) { if (tdx_is_sept_violation_unexpected_pending(vcpu)) { @@ -1716,6 +1718,9 @@ static int tdx_handle_ept_violation(struct kvm_vcpu *vcpu) * due to aliasing a single HPA to multiple GPAs. */ exit_qual = EPT_VIOLATION_ACC_WRITE; + + /* Only private GPA triggers zero-step mitigation */ + local_retry = true; } else { exit_qual = vmx_get_exit_qual(vcpu); /* @@ -1728,7 +1733,57 @@ static int tdx_handle_ept_violation(struct kvm_vcpu *vcpu) } trace_kvm_page_fault(vcpu, gpa, exit_qual); - return __vmx_handle_ept_violation(vcpu, gpa, exit_qual); + + /* + * To minimize TDH.VP.ENTER invocations, retry locally for private GPA + * mapping in TDX. + * + * KVM may return RET_PF_RETRY for private GPA due to + * - contentions when atomically updating SPTEs of the mirror page table + * - in-progress GFN invalidation or memslot removal. + * - TDX_OPERAND_BUSY error from TDH.MEM.PAGE.AUG or TDH.MEM.SEPT.ADD, + * caused by contentions with TDH.VP.ENTER (with zero-step mitigation) + * or certain TDCALLs. + * + * If TDH.VP.ENTER is invoked more times than the threshold set by the + * TDX module before KVM resolves the private GPA mapping, the TDX + * module will activate zero-step mitigation during TDH.VP.ENTER. This + * process acquires an SEPT tree lock in the TDX module, leading to + * further contentions with TDH.MEM.PAGE.AUG or TDH.MEM.SEPT.ADD + * operations on other vCPUs. + * + * Breaking out of local retries for kvm_vcpu_has_events() is for + * interrupt injection. kvm_vcpu_has_events() should not see pending + * events for TDX. Since KVM can't determine if IRQs (or NMIs) are + * blocked by TDs, false positives are inevitable i.e., KVM may re-enter + * the guest even if the IRQ/NMI can't be delivered. + * + * Note: even without breaking out of local retries, zero-step + * mitigation may still occur due to + * - invoking of TDH.VP.ENTER after KVM_EXIT_MEMORY_FAULT, + * - a single RIP causing EPT violations for more GFNs than the + * threshold count. + * This is safe, as triggering zero-step mitigation only introduces + * contentions to page installation SEAMCALLs on other vCPUs, which will + * handle retries locally in their EPT violation handlers. + */ + while (1) { + ret = __vmx_handle_ept_violation(vcpu, gpa, exit_qual); + + if (ret != RET_PF_RETRY || !local_retry) + break; + + if (kvm_vcpu_has_events(vcpu) || signal_pending(current)) + break; + + if (kvm_check_request(KVM_REQ_VM_DEAD, vcpu)) { + ret = -EIO; + break; + } + + cond_resched(); + } + return ret; } int tdx_handle_exit(struct kvm_vcpu *vcpu, fastpath_t fastpath) From patchwork Thu Feb 27 01:20:05 2025 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Binbin Wu X-Patchwork-Id: 13993405 Received: from mgamail.intel.com (mgamail.intel.com [198.175.65.9]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 04DA814B96E; Thu, 27 Feb 2025 01:19:02 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=198.175.65.9 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1740619144; cv=none; b=RbtHi03G3Tth7YhoSkJOSIXqj3qj65NNRqx9FuOjw22QzelD/87CAGHTJ2pHJgUeL5Beit0pbBWEjduOLSYJe6NBgSPWjUrE14PrdoLnhcBbOpoXeXbQ2hcAygCo2g9IT2DLl28HRLH171GowzfahO1zPUwPkl+m1x3Un+CohZM= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1740619144; c=relaxed/simple; bh=qXfVJSCufMdZGiMGONaIm/kQ0o1b85tXM8RtmwjYVvM=; h=From:To:Cc:Subject:Date:Message-ID:In-Reply-To:References: MIME-Version; b=ergCAsyJe4I/GOoMRypqPGi3QxsrxTCECP/djqMgvhMJhOC0OP+eCpywcQstz57nCqxG6fAVy2WuCXHetvtNEaaMVH/Bw5Y9od7kglE7WOub5R/dkeXZUDxLXLQepLfc47RT6w2xCvQALUhpmhn8NDmxkw3yBA7FaDdzWylbHKc= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=linux.intel.com; spf=none smtp.mailfrom=linux.intel.com; dkim=pass (2048-bit key) header.d=intel.com header.i=@intel.com header.b=nqGUi7QG; arc=none smtp.client-ip=198.175.65.9 Authentication-Results: smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=linux.intel.com Authentication-Results: smtp.subspace.kernel.org; spf=none smtp.mailfrom=linux.intel.com Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=intel.com header.i=@intel.com header.b="nqGUi7QG" DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=intel.com; i=@intel.com; q=dns/txt; s=Intel; t=1740619143; x=1772155143; h=from:to:cc:subject:date:message-id:in-reply-to: references:mime-version:content-transfer-encoding; bh=qXfVJSCufMdZGiMGONaIm/kQ0o1b85tXM8RtmwjYVvM=; b=nqGUi7QGIQWYpUx8zPOe8VZHbBAzUD70TEpaVKEEgS/tvCM2ymdI5bKT XNU4/R2Y8GRBjKosKNqbktbGmbqLYm7I6w4Q1j3fKeSbQLWZNB+2QfGxy YB5htETgeV7IvmgJhTteLTu7s++3eAFr5jQXWwXYHP2PfKaKPqW5tyXpp MfjE3JJKAPg7kIsNnu1z9k/94Gc+0sZuITEi+V7GRFnPMv3jog7nrdzX0 7nTh4OqP6vhli4Bg2VgyQ3/L13ctZn9m0avA3z4XsD4ZBvY1xwtyqZluM rqV4/nt3oFETc8O65ieWjwIYVicwR4k3E2yOd1MlN6ZiQgybdbxif6Lox w==; X-CSE-ConnectionGUID: zAaU9LMrReqQJ73hno7NzA== X-CSE-MsgGUID: MmYIAqU/RvSMvE9ZPXdnSg== X-IronPort-AV: E=McAfee;i="6700,10204,11357"; a="63959604" X-IronPort-AV: E=Sophos;i="6.13,318,1732608000"; d="scan'208";a="63959604" Received: from fmviesa006.fm.intel.com ([10.60.135.146]) by orvoesa101.jf.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 26 Feb 2025 17:19:02 -0800 X-CSE-ConnectionGUID: xJftErw0Rr6ZlA/zdj9X9Q== X-CSE-MsgGUID: 0yfIO+L4QOOCyhMvH6sjcA== X-ExtLoop1: 1 X-IronPort-AV: E=Sophos;i="6.13,318,1732608000"; d="scan'208";a="116674858" Received: from litbin-desktop.sh.intel.com ([10.239.156.93]) by fmviesa006-auth.fm.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 26 Feb 2025 17:18:59 -0800 From: Binbin Wu To: pbonzini@redhat.com, seanjc@google.com, kvm@vger.kernel.org Cc: rick.p.edgecombe@intel.com, kai.huang@intel.com, adrian.hunter@intel.com, reinette.chatre@intel.com, xiaoyao.li@intel.com, tony.lindgren@intel.com, isaku.yamahata@intel.com, yan.y.zhao@intel.com, chao.gao@intel.com, linux-kernel@vger.kernel.org, binbin.wu@linux.intel.com Subject: [PATCH v2 04/20] KVM: TDX: Kick off vCPUs when SEAMCALL is busy during TD page removal Date: Thu, 27 Feb 2025 09:20:05 +0800 Message-ID: <20250227012021.1778144-5-binbin.wu@linux.intel.com> X-Mailer: git-send-email 2.46.0 In-Reply-To: <20250227012021.1778144-1-binbin.wu@linux.intel.com> References: <20250227012021.1778144-1-binbin.wu@linux.intel.com> Precedence: bulk X-Mailing-List: kvm@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 From: Yan Zhao Kick off all vCPUs and prevent tdh_vp_enter() from executing whenever tdh_mem_range_block()/tdh_mem_track()/tdh_mem_page_remove() encounters contention, since the page removal path does not expect error and is less sensitive to the performance penalty caused by kicking off vCPUs. Although KVM has protected SEPT zap-related SEAMCALLs with kvm->mmu_lock, KVM may still encounter TDX_OPERAND_BUSY due to the contention in the TDX module. - tdh_mem_track() may contend with tdh_vp_enter(). - tdh_mem_range_block()/tdh_mem_page_remove() may contend with tdh_vp_enter() and TDCALLs. Resources SHARED users EXCLUSIVE users ------------------------------------------------------------ TDCS epoch tdh_vp_enter tdh_mem_track ------------------------------------------------------------ SEPT tree tdh_mem_page_remove tdh_vp_enter (0-step mitigation) tdh_mem_range_block ------------------------------------------------------------ SEPT entry tdh_mem_range_block (Host lock) tdh_mem_page_remove (Host lock) tdg_mem_page_accept (Guest lock) tdg_mem_page_attr_rd (Guest lock) tdg_mem_page_attr_wr (Guest lock) Use a TDX specific per-VM flag wait_for_sept_zap along with KVM_REQ_OUTSIDE_GUEST_MODE to kick off vCPUs and prevent them from entering TD, thereby avoiding the potential contention. Apply the kick-off and no vCPU entering only after each SEAMCALL busy error to minimize the window of no TD entry, as the contention due to 0-step mitigation or TDCALLs is expected to be rare. Suggested-by: Sean Christopherson Signed-off-by: Yan Zhao --- Rebased to use helper tdx_operand_busy(). --- arch/x86/kvm/vmx/tdx.c | 63 ++++++++++++++++++++++++++++++++++++------ arch/x86/kvm/vmx/tdx.h | 7 +++++ 2 files changed, 61 insertions(+), 9 deletions(-) diff --git a/arch/x86/kvm/vmx/tdx.c b/arch/x86/kvm/vmx/tdx.c index 6fa6f7e13e15..25ae2c2826d0 100644 --- a/arch/x86/kvm/vmx/tdx.c +++ b/arch/x86/kvm/vmx/tdx.c @@ -294,6 +294,26 @@ static void tdx_clear_page(struct page *page) __mb(); } +static void tdx_no_vcpus_enter_start(struct kvm *kvm) +{ + struct kvm_tdx *kvm_tdx = to_kvm_tdx(kvm); + + lockdep_assert_held_write(&kvm->mmu_lock); + + WRITE_ONCE(kvm_tdx->wait_for_sept_zap, true); + + kvm_make_all_cpus_request(kvm, KVM_REQ_OUTSIDE_GUEST_MODE); +} + +static void tdx_no_vcpus_enter_stop(struct kvm *kvm) +{ + struct kvm_tdx *kvm_tdx = to_kvm_tdx(kvm); + + lockdep_assert_held_write(&kvm->mmu_lock); + + WRITE_ONCE(kvm_tdx->wait_for_sept_zap, false); +} + /* TDH.PHYMEM.PAGE.RECLAIM is allowed only when destroying the TD. */ static int __tdx_reclaim_page(struct page *page) { @@ -953,6 +973,14 @@ fastpath_t tdx_vcpu_run(struct kvm_vcpu *vcpu, bool force_immediate_exit) return EXIT_FASTPATH_NONE; } + /* + * Wait until retry of SEPT-zap-related SEAMCALL completes before + * allowing vCPU entry to avoid contention with tdh_vp_enter() and + * TDCALLs. + */ + if (unlikely(READ_ONCE(to_kvm_tdx(vcpu->kvm)->wait_for_sept_zap))) + return EXIT_FASTPATH_EXIT_HANDLED; + trace_kvm_entry(vcpu, force_immediate_exit); if (pi_test_on(&vt->pi_desc)) { @@ -1467,15 +1495,24 @@ static int tdx_sept_drop_private_spte(struct kvm *kvm, gfn_t gfn, if (KVM_BUG_ON(!is_hkid_assigned(kvm_tdx), kvm)) return -EINVAL; - do { + /* + * When zapping private page, write lock is held. So no race condition + * with other vcpu sept operation. + * Race with TDH.VP.ENTER due to (0-step mitigation) and Guest TDCALLs. + */ + err = tdh_mem_page_remove(&kvm_tdx->td, gpa, tdx_level, &entry, + &level_state); + + if (unlikely(tdx_operand_busy(err))) { /* - * When zapping private page, write lock is held. So no race - * condition with other vcpu sept operation. Race only with - * TDH.VP.ENTER. + * The second retry is expected to succeed after kicking off all + * other vCPUs and prevent them from invoking TDH.VP.ENTER. */ + tdx_no_vcpus_enter_start(kvm); err = tdh_mem_page_remove(&kvm_tdx->td, gpa, tdx_level, &entry, &level_state); - } while (unlikely(tdx_operand_busy(err))); + tdx_no_vcpus_enter_stop(kvm); + } if (KVM_BUG_ON(err, kvm)) { pr_tdx_error_2(TDH_MEM_PAGE_REMOVE, err, entry, level_state); @@ -1559,9 +1596,13 @@ static int tdx_sept_zap_private_spte(struct kvm *kvm, gfn_t gfn, WARN_ON_ONCE(level != PG_LEVEL_4K); err = tdh_mem_range_block(&kvm_tdx->td, gpa, tdx_level, &entry, &level_state); - if (unlikely(tdx_operand_busy(err))) - return -EBUSY; + if (unlikely(tdx_operand_busy(err))) { + /* After no vCPUs enter, the second retry is expected to succeed */ + tdx_no_vcpus_enter_start(kvm); + err = tdh_mem_range_block(&kvm_tdx->td, gpa, tdx_level, &entry, &level_state); + tdx_no_vcpus_enter_stop(kvm); + } if (tdx_is_sept_zap_err_due_to_premap(kvm_tdx, err, entry, level) && !KVM_BUG_ON(!atomic64_read(&kvm_tdx->nr_premapped), kvm)) { atomic64_dec(&kvm_tdx->nr_premapped); @@ -1611,9 +1652,13 @@ static void tdx_track(struct kvm *kvm) lockdep_assert_held_write(&kvm->mmu_lock); - do { + err = tdh_mem_track(&kvm_tdx->td); + if (unlikely(tdx_operand_busy(err))) { + /* After no vCPUs enter, the second retry is expected to succeed */ + tdx_no_vcpus_enter_start(kvm); err = tdh_mem_track(&kvm_tdx->td); - } while (unlikely(tdx_operand_busy(err))); + tdx_no_vcpus_enter_stop(kvm); + } if (KVM_BUG_ON(err, kvm)) pr_tdx_error(TDH_MEM_TRACK, err); diff --git a/arch/x86/kvm/vmx/tdx.h b/arch/x86/kvm/vmx/tdx.h index 4e1e71d5d8e8..c6bafac31d4d 100644 --- a/arch/x86/kvm/vmx/tdx.h +++ b/arch/x86/kvm/vmx/tdx.h @@ -36,6 +36,13 @@ struct kvm_tdx { /* For KVM_TDX_INIT_MEM_REGION. */ atomic64_t nr_premapped; + + /* + * Prevent vCPUs from TD entry to ensure SEPT zap related SEAMCALLs do + * not contend with tdh_vp_enter() and TDCALLs. + * Set/unset is protected with kvm->mmu_lock. + */ + bool wait_for_sept_zap; }; /* TDX module vCPU states */ From patchwork Thu Feb 27 01:20:06 2025 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Binbin Wu X-Patchwork-Id: 13993406 Received: from mgamail.intel.com (mgamail.intel.com [198.175.65.9]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id DAE8015F41F; Thu, 27 Feb 2025 01:19:05 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=198.175.65.9 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1740619147; cv=none; b=ncRqSh9DPjuVrcqnuWSJ10kYqItLrSHM4xf5JQwFOiYPMPi90nEnA9a//Oo3LxyHMX/iaPokhnImjBMv25fLKo0/gP7/JUid9Yw10TZgr9GMRLXeNpoR8cBz4efZ8h/P3q44V+O5lQtjOrtFRtsLtMk4MtL0WycDFZtiwB66k2Q= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1740619147; c=relaxed/simple; bh=gUJ+Ih2J/jHcolFrLhOfb5sHgVPqwuVe3UgbkZb0DwQ=; h=From:To:Cc:Subject:Date:Message-ID:In-Reply-To:References: MIME-Version; b=hgIh+AVdMQxjNiKYlGTdbXNL0vbvkgFJw8lK8G+V2FzsSFNxDhOyW4aPi7m8sdYRCDMXRqp1Tm7nhxou1rWDnQOWSNpJBvsz08sdEJYPL1DF21ixkPscvyGsjNmRGv+iGJGgWdGyxm7kU45UZx8+yF7BD5wycmlStVe8QL59PIk= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=linux.intel.com; spf=none smtp.mailfrom=linux.intel.com; dkim=pass (2048-bit key) header.d=intel.com header.i=@intel.com header.b=bQALva22; arc=none smtp.client-ip=198.175.65.9 Authentication-Results: smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=linux.intel.com Authentication-Results: smtp.subspace.kernel.org; spf=none smtp.mailfrom=linux.intel.com Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=intel.com header.i=@intel.com header.b="bQALva22" DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=intel.com; i=@intel.com; q=dns/txt; s=Intel; t=1740619146; x=1772155146; h=from:to:cc:subject:date:message-id:in-reply-to: references:mime-version:content-transfer-encoding; bh=gUJ+Ih2J/jHcolFrLhOfb5sHgVPqwuVe3UgbkZb0DwQ=; b=bQALva225N/ljgHVGTKwAY7uQwL57DaWz19qmuCL/h5+7Ic1TxN/jqBl JNR2OGQUBUwOuiUS4mAYnby3nmaLvZw9CbfVCZEGWcQk5qIjAMlDhuUmZ YJMTgVCTguZQdfeF3+U78j280SCxFtdlFrYPt6h7l7y/jlKwnKxoI8BeZ Hek00kIuiWGv15mtzVP3k2i1d058y5Mm+S8rmaoZmSNyOV2yxzhxWfTWa F2XfXzhE1W1ePtRQz0yCYZvocFoXyCMJoo7zKpdWqk5eWAPtW53kVw2hj nI7R0KEHf0jl8aGVq7hW56+MyWFz0aywEo+dPnesOErvoxHRfw1hm36Pc w==; X-CSE-ConnectionGUID: eWHlxOoPS1SZhblRBWIvLQ== X-CSE-MsgGUID: pAm4npjrRPSvcWkwZbwEzg== X-IronPort-AV: E=McAfee;i="6700,10204,11357"; a="63959608" X-IronPort-AV: E=Sophos;i="6.13,318,1732608000"; d="scan'208";a="63959608" Received: from fmviesa006.fm.intel.com ([10.60.135.146]) by orvoesa101.jf.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 26 Feb 2025 17:19:06 -0800 X-CSE-ConnectionGUID: c8OPJmhqSJ6Qi8Zub0CWIw== X-CSE-MsgGUID: 4NIe5vdFSriBgLX4zYdJ2Q== X-ExtLoop1: 1 X-IronPort-AV: E=Sophos;i="6.13,318,1732608000"; d="scan'208";a="116674876" Received: from litbin-desktop.sh.intel.com ([10.239.156.93]) by fmviesa006-auth.fm.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 26 Feb 2025 17:19:02 -0800 From: Binbin Wu To: pbonzini@redhat.com, seanjc@google.com, kvm@vger.kernel.org Cc: rick.p.edgecombe@intel.com, kai.huang@intel.com, adrian.hunter@intel.com, reinette.chatre@intel.com, xiaoyao.li@intel.com, tony.lindgren@intel.com, isaku.yamahata@intel.com, yan.y.zhao@intel.com, chao.gao@intel.com, linux-kernel@vger.kernel.org, binbin.wu@linux.intel.com Subject: [PATCH v2 05/20] KVM: TDX: Handle TDX PV CPUID hypercall Date: Thu, 27 Feb 2025 09:20:06 +0800 Message-ID: <20250227012021.1778144-6-binbin.wu@linux.intel.com> X-Mailer: git-send-email 2.46.0 In-Reply-To: <20250227012021.1778144-1-binbin.wu@linux.intel.com> References: <20250227012021.1778144-1-binbin.wu@linux.intel.com> Precedence: bulk X-Mailing-List: kvm@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 From: Isaku Yamahata Handle TDX PV CPUID hypercall for the CPUIDs virtualized by VMM according to TDX Guest Host Communication Interface (GHCI). For TDX, most CPUID leaf/sub-leaf combinations are virtualized by the TDX module while some trigger #VE. On #VE, TDX guest can issue TDG.VP.VMCALL (same value as EXIT_REASON_CPUID) to request VMM to emulate CPUID operation. Morph PV CPUID hypercall to EXIT_REASON_CPUID and wire up to the KVM backend function. Suggested-by: Sean Christopherson Signed-off-by: Isaku Yamahata [binbin: rewrite changelog] Signed-off-by: Binbin Wu --- TDX "the rest" v2: - Morph PV CPUID hypercall to EXIT_REASON_CPUID. (Sean) - Rebased to use tdcall_to_vmx_exit_reason(). - Use vp_enter_args directly instead of helpers. - Skip setting return code as TDVMCALL_STATUS_SUCCESS. TDX "the rest" v1: - Rewrite changelog. - Use TDVMCALL_STATUS prefix for TDX call status codes (Binbin) --- arch/x86/kvm/vmx/tdx.c | 22 ++++++++++++++++++++++ 1 file changed, 22 insertions(+) diff --git a/arch/x86/kvm/vmx/tdx.c b/arch/x86/kvm/vmx/tdx.c index 25ae2c2826d0..8019bf553ca5 100644 --- a/arch/x86/kvm/vmx/tdx.c +++ b/arch/x86/kvm/vmx/tdx.c @@ -845,6 +845,7 @@ static bool tdx_guest_state_is_invalid(struct kvm_vcpu *vcpu) static __always_inline u32 tdcall_to_vmx_exit_reason(struct kvm_vcpu *vcpu) { switch (tdvmcall_leaf(vcpu)) { + case EXIT_REASON_CPUID: case EXIT_REASON_IO_INSTRUCTION: return tdvmcall_leaf(vcpu); case EXIT_REASON_EPT_VIOLATION: @@ -1212,6 +1213,25 @@ static int tdx_report_fatal_error(struct kvm_vcpu *vcpu) return 0; } +static int tdx_emulate_cpuid(struct kvm_vcpu *vcpu) +{ + u32 eax, ebx, ecx, edx; + struct vcpu_tdx *tdx = to_tdx(vcpu); + + /* EAX and ECX for cpuid is stored in R12 and R13. */ + eax = tdx->vp_enter_args.r12; + ecx = tdx->vp_enter_args.r13; + + kvm_cpuid(vcpu, &eax, &ebx, &ecx, &edx, false); + + tdx->vp_enter_args.r12 = eax; + tdx->vp_enter_args.r13 = ebx; + tdx->vp_enter_args.r14 = ecx; + tdx->vp_enter_args.r15 = edx; + + return 1; +} + static int tdx_complete_pio_out(struct kvm_vcpu *vcpu) { vcpu->arch.pio.count = 0; @@ -1886,6 +1906,8 @@ int tdx_handle_exit(struct kvm_vcpu *vcpu, fastpath_t fastpath) case EXIT_REASON_EXTERNAL_INTERRUPT: ++vcpu->stat.irq_exits; return 1; + case EXIT_REASON_CPUID: + return tdx_emulate_cpuid(vcpu); case EXIT_REASON_TDCALL: return handle_tdvmcall(vcpu); case EXIT_REASON_VMCALL: From patchwork Thu Feb 27 01:20:07 2025 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Binbin Wu X-Patchwork-Id: 13993407 Received: from mgamail.intel.com (mgamail.intel.com [198.175.65.9]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 5B22B17A2F3; Thu, 27 Feb 2025 01:19:09 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=198.175.65.9 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1740619151; cv=none; b=cqkNC0KQH0JxyFWT6y+YKBR6ix/k/FagOb5LXiCDVJLxKDEvQ2J1n5roO6araic6pi7SR8mgUyOvltHudVThGjoYVYWVN+mVCgRvY/KQQpwUDdTB6Cd4bdvrdkkwedNCpa4qj0n+Qgzak9wECbYA/IjSuwneNe7lO4Ay2cuooyg= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1740619151; c=relaxed/simple; bh=I2BGa656dde3G5onU6rC3eO3gUchuMpeokI55iQR/iU=; h=From:To:Cc:Subject:Date:Message-ID:In-Reply-To:References: MIME-Version; b=ZnPfBjZGLaYrSZsjpF3lgiNNv66u0vXS8RESyXYwlVmq5iVEMTCR+ONfFJcsHalAHftU/v2c/G8zEBJRyf0q0tI4d4eczghG15pEwNDE5CSKbx90ZCNcw0c/MpktoQ3ngtQB0nEBXSUUrt9up/IYdfB2qnsGvMtH1w7XioZjNEU= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=linux.intel.com; spf=none smtp.mailfrom=linux.intel.com; dkim=pass (2048-bit key) header.d=intel.com header.i=@intel.com header.b=OWr8Qa4R; arc=none smtp.client-ip=198.175.65.9 Authentication-Results: smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=linux.intel.com Authentication-Results: smtp.subspace.kernel.org; spf=none smtp.mailfrom=linux.intel.com Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=intel.com header.i=@intel.com header.b="OWr8Qa4R" DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=intel.com; i=@intel.com; q=dns/txt; s=Intel; t=1740619149; x=1772155149; h=from:to:cc:subject:date:message-id:in-reply-to: references:mime-version:content-transfer-encoding; bh=I2BGa656dde3G5onU6rC3eO3gUchuMpeokI55iQR/iU=; b=OWr8Qa4RE94rRSrXbUFqDYy0mlEod2Wc2KdmMQ+aKiFqiIhgtOPLmrrM M2lPgiYO97M0lAA9+qCo7EJfTd78wEmyNhT33cfUQDTX07buClQQt0BgQ G33D6uathivEfmWrxm5aoRXARMv6n0BpICYxkqrtC6yBmQ4wVXF+ScXOe k6yFxUjOt1+BbamzT+41lQjMfpeoG6qHNhXEM2G7vDreG1hqK1ieCckj6 cg7bppeud961ZzU+LV6qYEK8d7JUzdzp+y21WXAhxAaX400PaUAu+98SD ZmHdr8qeWnxXtfbgpAc5mPnVx5idQJq1Tw1FC18gB/lNxHdVMs5EvMGzh w==; X-CSE-ConnectionGUID: 5/sJzgIRSASa1x2SRjPsUQ== X-CSE-MsgGUID: XxQ5tHCyQDyuE6z+fcLS1g== X-IronPort-AV: E=McAfee;i="6700,10204,11357"; a="63959613" X-IronPort-AV: E=Sophos;i="6.13,318,1732608000"; d="scan'208";a="63959613" Received: from fmviesa006.fm.intel.com ([10.60.135.146]) by orvoesa101.jf.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 26 Feb 2025 17:19:09 -0800 X-CSE-ConnectionGUID: idOysd39QxWehSbzUmNG6A== X-CSE-MsgGUID: EFvmq/ZnScyYLu5DUfcbIA== X-ExtLoop1: 1 X-IronPort-AV: E=Sophos;i="6.13,318,1732608000"; d="scan'208";a="116674886" Received: from litbin-desktop.sh.intel.com ([10.239.156.93]) by fmviesa006-auth.fm.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 26 Feb 2025 17:19:05 -0800 From: Binbin Wu To: pbonzini@redhat.com, seanjc@google.com, kvm@vger.kernel.org Cc: rick.p.edgecombe@intel.com, kai.huang@intel.com, adrian.hunter@intel.com, reinette.chatre@intel.com, xiaoyao.li@intel.com, tony.lindgren@intel.com, isaku.yamahata@intel.com, yan.y.zhao@intel.com, chao.gao@intel.com, linux-kernel@vger.kernel.org, binbin.wu@linux.intel.com Subject: [PATCH v2 06/20] KVM: TDX: Handle TDX PV HLT hypercall Date: Thu, 27 Feb 2025 09:20:07 +0800 Message-ID: <20250227012021.1778144-7-binbin.wu@linux.intel.com> X-Mailer: git-send-email 2.46.0 In-Reply-To: <20250227012021.1778144-1-binbin.wu@linux.intel.com> References: <20250227012021.1778144-1-binbin.wu@linux.intel.com> Precedence: bulk X-Mailing-List: kvm@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 From: Isaku Yamahata Handle TDX PV HLT hypercall and the interrupt status due to it. TDX guest status is protected, KVM can't get the interrupt status of TDX guest and it assumes interrupt is always allowed unless TDX guest calls TDVMCALL with HLT, which passes the interrupt blocked flag. If the guest halted with interrupt enabled, also query pending RVI by checking bit 0 of TD_VCPU_STATE_DETAILS_NON_ARCH field via a seamcall. Update vt_interrupt_allowed() for TDX based on interrupt blocked flag passed by HLT TDVMCALL. Do not wakeup TD vCPU if interrupt is blocked for VT-d PI. For NMIs, KVM cannot determine the NMI blocking status for TDX guests, so KVM always assumes NMIs are not blocked. In the unlikely scenario where a guest invokes the PV HLT hypercall within an NMI handler, this could result in a spurious wakeup. The guest should implement the PV HLT hypercall within a loop if it truly requires no interruptions, since NMI could be unblocked by an IRET due to an exception occurring before the PV HLT is executed in the NMI handler. Suggested-by: Sean Christopherson Signed-off-by: Isaku Yamahata Co-developed-by: Binbin Wu Signed-off-by: Binbin Wu --- TDX "the rest" v2: - Morph PV HLT hypercall to EXIT_REASON_HLT. (Sean) - Rebased to use tdcall_to_vmx_exit_reason(). - Use vp_enter_args directly instead of helpers. - Check RVI pending for halted case and updated the changelog accordingly. TDX "the rest" v1: - Update the changelog. - Remove the interrupt_disabled_hlt field (Sean) https://lore.kernel.org/kvm/Zg1seIaTmM94IyR8@google.com/ - Move the logic of interrupt status to vt_interrupt_allowed() (Chao) https://lore.kernel.org/kvm/ZhIX7K0WK+gYtcan@chao-email/ - Add suggested-by tag. - Use tdx_check_exit_reason() - Use TDVMCALL_STATUS prefix for TDX call status codes (Binbin) v19: - move tdvps_state_non_arch_check() to this patch v18: - drop buggy_hlt_workaround and use TDH.VP.RD(TD_VCPU_STATE_DETAILS) --- arch/x86/kvm/vmx/main.c | 2 +- arch/x86/kvm/vmx/posted_intr.c | 3 ++- arch/x86/kvm/vmx/tdx.c | 39 ++++++++++++++++++++++++++++++---- arch/x86/kvm/vmx/tdx.h | 7 ++++++ arch/x86/kvm/vmx/tdx_arch.h | 11 ++++++++++ 5 files changed, 56 insertions(+), 6 deletions(-) diff --git a/arch/x86/kvm/vmx/main.c b/arch/x86/kvm/vmx/main.c index 55c8507d60fe..d383f4510c15 100644 --- a/arch/x86/kvm/vmx/main.c +++ b/arch/x86/kvm/vmx/main.c @@ -418,7 +418,7 @@ static void vt_cancel_injection(struct kvm_vcpu *vcpu) static int vt_interrupt_allowed(struct kvm_vcpu *vcpu, bool for_injection) { if (is_td_vcpu(vcpu)) - return true; + return tdx_interrupt_allowed(vcpu); return vmx_interrupt_allowed(vcpu, for_injection); } diff --git a/arch/x86/kvm/vmx/posted_intr.c b/arch/x86/kvm/vmx/posted_intr.c index 895bbe85b818..f2ca37b3f606 100644 --- a/arch/x86/kvm/vmx/posted_intr.c +++ b/arch/x86/kvm/vmx/posted_intr.c @@ -203,7 +203,8 @@ void vmx_vcpu_pi_put(struct kvm_vcpu *vcpu) return; if (kvm_vcpu_is_blocking(vcpu) && - (is_td_vcpu(vcpu) || !vmx_interrupt_blocked(vcpu))) + ((is_td_vcpu(vcpu) && tdx_interrupt_allowed(vcpu)) || + (!is_td_vcpu(vcpu) && !vmx_interrupt_blocked(vcpu)))) pi_enable_wakeup_handler(vcpu); /* diff --git a/arch/x86/kvm/vmx/tdx.c b/arch/x86/kvm/vmx/tdx.c index 8019bf553ca5..30f0ceeed7d2 100644 --- a/arch/x86/kvm/vmx/tdx.c +++ b/arch/x86/kvm/vmx/tdx.c @@ -720,9 +720,39 @@ void tdx_vcpu_load(struct kvm_vcpu *vcpu, int cpu) local_irq_enable(); } +bool tdx_interrupt_allowed(struct kvm_vcpu *vcpu) +{ + /* + * KVM can't get the interrupt status of TDX guest and it assumes + * interrupt is always allowed unless TDX guest calls TDVMCALL with HLT, + * which passes the interrupt blocked flag. + */ + return vmx_get_exit_reason(vcpu).basic != EXIT_REASON_HLT || + !to_tdx(vcpu)->vp_enter_args.r12; +} + bool tdx_protected_apic_has_interrupt(struct kvm_vcpu *vcpu) { - return pi_has_pending_interrupt(vcpu); + u64 vcpu_state_details; + + if (pi_has_pending_interrupt(vcpu)) + return true; + + /* + * Only check RVI pending for HALTED case with IRQ enabled. + * For non-HLT cases, KVM doesn't care about STI/SS shadows. And if the + * interrupt was pending before TD exit, then it _must_ be blocked, + * otherwise the interrupt would have been serviced at the instruction + * boundary. + */ + if (vmx_get_exit_reason(vcpu).basic != EXIT_REASON_HLT || + to_tdx(vcpu)->vp_enter_args.r12) + return false; + + vcpu_state_details = + td_state_non_arch_read64(to_tdx(vcpu), TD_VCPU_STATE_DETAILS_NON_ARCH); + + return tdx_vcpu_state_details_intr_pending(vcpu_state_details); } /* @@ -846,6 +876,7 @@ static __always_inline u32 tdcall_to_vmx_exit_reason(struct kvm_vcpu *vcpu) { switch (tdvmcall_leaf(vcpu)) { case EXIT_REASON_CPUID: + case EXIT_REASON_HLT: case EXIT_REASON_IO_INSTRUCTION: return tdvmcall_leaf(vcpu); case EXIT_REASON_EPT_VIOLATION: @@ -1103,9 +1134,7 @@ static int tdx_complete_vmcall_map_gpa(struct kvm_vcpu *vcpu) /* * Stop processing the remaining part if there is a pending interrupt, * which could be qualified to deliver. Skip checking pending RVI for - * TDVMCALL_MAP_GPA. - * TODO: Add a comment to link the reason when the target function is - * implemented. + * TDVMCALL_MAP_GPA, see comments in tdx_protected_apic_has_interrupt(). */ if (kvm_vcpu_has_events(vcpu)) { tdvmcall_set_return_code(vcpu, TDVMCALL_STATUS_RETRY); @@ -1908,6 +1937,8 @@ int tdx_handle_exit(struct kvm_vcpu *vcpu, fastpath_t fastpath) return 1; case EXIT_REASON_CPUID: return tdx_emulate_cpuid(vcpu); + case EXIT_REASON_HLT: + return kvm_emulate_halt_noskip(vcpu); case EXIT_REASON_TDCALL: return handle_tdvmcall(vcpu); case EXIT_REASON_VMCALL: diff --git a/arch/x86/kvm/vmx/tdx.h b/arch/x86/kvm/vmx/tdx.h index c6bafac31d4d..cb9014b7a4f1 100644 --- a/arch/x86/kvm/vmx/tdx.h +++ b/arch/x86/kvm/vmx/tdx.h @@ -121,6 +121,7 @@ static __always_inline void tdvps_vmcs_check(u32 field, u8 bits) } static __always_inline void tdvps_management_check(u64 field, u8 bits) {} +static __always_inline void tdvps_state_non_arch_check(u64 field, u8 bits) {} #define TDX_BUILD_TDVPS_ACCESSORS(bits, uclass, lclass) \ static __always_inline u##bits td_##lclass##_read##bits(struct vcpu_tdx *tdx, \ @@ -168,11 +169,15 @@ static __always_inline void td_##lclass##_clearbit##bits(struct vcpu_tdx *tdx, \ tdh_vp_wr_failed(tdx, #uclass, " &= ~", field, bit, err);\ } + +bool tdx_interrupt_allowed(struct kvm_vcpu *vcpu); + TDX_BUILD_TDVPS_ACCESSORS(16, VMCS, vmcs); TDX_BUILD_TDVPS_ACCESSORS(32, VMCS, vmcs); TDX_BUILD_TDVPS_ACCESSORS(64, VMCS, vmcs); TDX_BUILD_TDVPS_ACCESSORS(8, MANAGEMENT, management); +TDX_BUILD_TDVPS_ACCESSORS(64, STATE_NON_ARCH, state_non_arch); #else static inline int tdx_bringup(void) { return 0; } @@ -188,6 +193,8 @@ struct vcpu_tdx { struct kvm_vcpu vcpu; }; +static inline bool tdx_interrupt_allowed(struct kvm_vcpu *vcpu) { return false; } + #endif #endif diff --git a/arch/x86/kvm/vmx/tdx_arch.h b/arch/x86/kvm/vmx/tdx_arch.h index fcbf0d4abc5f..e1f636ef5d0e 100644 --- a/arch/x86/kvm/vmx/tdx_arch.h +++ b/arch/x86/kvm/vmx/tdx_arch.h @@ -36,6 +36,17 @@ enum tdx_tdcs_execution_control { TD_TDCS_EXEC_TSC_OFFSET = 10, }; +enum tdx_vcpu_guest_other_state { + TD_VCPU_STATE_DETAILS_NON_ARCH = 0x100, +}; + +#define TDX_VCPU_STATE_DETAILS_INTR_PENDING BIT_ULL(0) + +static inline bool tdx_vcpu_state_details_intr_pending(u64 vcpu_state_details) +{ + return !!(vcpu_state_details & TDX_VCPU_STATE_DETAILS_INTR_PENDING); +} + /* @field is any of enum tdx_tdcs_execution_control */ #define TDCS_EXEC(field) BUILD_TDX_FIELD(TD_CLASS_EXECUTION_CONTROLS, (field)) From patchwork Thu Feb 27 01:20:08 2025 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Binbin Wu X-Patchwork-Id: 13993408 Received: from mgamail.intel.com (mgamail.intel.com [198.175.65.9]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 99B7017A310; Thu, 27 Feb 2025 01:19:12 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=198.175.65.9 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1740619154; cv=none; b=MlmcWGgdrIr/3KXWq9E4pxJBi+J8AAxMf8HVOhPRUG3FdzxHk1GtKXLyF/70r5Mm6u7uSci9HNO0U+Mkp2i5nEAwRL4ILEYH1M8jlKsWmFS8PygBwAnSqJv8JKEf+bwfdkVdHnVbY4f+iLMibfEFgwc0Kt6sDsXO49jrTuULni8= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1740619154; c=relaxed/simple; bh=6j2mZ1I5om5klX55W+lZKIazbOQN+SR/BAi30yYJ+t4=; h=From:To:Cc:Subject:Date:Message-ID:In-Reply-To:References: MIME-Version; b=U+i8PvHe+LEqeObJhmURLLoA/6l6EgXtnHt/+Wp4VD2rpQCYCSzXITco02KPbnBofLTVEJ/sbXkzRJHjMIUxjuJU/S3YjPpybwRMEk4ldeCflK4l2K5tNstA43ah57lThrnnoFJByZ2MNtYqvjDFukuyboZsmPhyztOuG6z5Go4= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=linux.intel.com; spf=none smtp.mailfrom=linux.intel.com; dkim=pass (2048-bit key) header.d=intel.com header.i=@intel.com header.b=ka/p5YOi; arc=none smtp.client-ip=198.175.65.9 Authentication-Results: smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=linux.intel.com Authentication-Results: smtp.subspace.kernel.org; spf=none smtp.mailfrom=linux.intel.com Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=intel.com header.i=@intel.com header.b="ka/p5YOi" DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=intel.com; i=@intel.com; q=dns/txt; s=Intel; t=1740619153; x=1772155153; h=from:to:cc:subject:date:message-id:in-reply-to: references:mime-version:content-transfer-encoding; bh=6j2mZ1I5om5klX55W+lZKIazbOQN+SR/BAi30yYJ+t4=; b=ka/p5YOicjzYd2ZOzEg1jmuud3kstIn7ZTtrU42/fuv7+5reyRcCZv2h 1U0Py0QUr1mTbHexRRGQW5rZxpnYvtFIP2tiR1y6skmOFO9k3GiX7M34h hSSHasqZHhqCFaqTADdSCF5WFYH5Ci8KXUTd0mBtjMpOTdCEBsfPirNq1 iXrrDmXv2aaW17cKSv2uTcAl1bZTKJPLffyqQM4Qp9NuaeEgwawLsBbUc okx+Vi+uFuRKU13Wv0KvTkiuEcaNPSyDcbtmr/CSY+4YAO+qfHHL8bqNj AZpSBHpB8twuK6U0qaXRQwP4nHlSFA4uAyulGLfI7H5Pck1dWlmlMhVqy Q==; X-CSE-ConnectionGUID: xwDc5j/xSIyJEA0m1XrKeA== X-CSE-MsgGUID: 7kPhUU0hQjSaJXBp2v3hNA== X-IronPort-AV: E=McAfee;i="6700,10204,11357"; a="63959616" X-IronPort-AV: E=Sophos;i="6.13,318,1732608000"; d="scan'208";a="63959616" Received: from fmviesa006.fm.intel.com ([10.60.135.146]) by orvoesa101.jf.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 26 Feb 2025 17:19:12 -0800 X-CSE-ConnectionGUID: A73mde9rTpWfWeCWOxiNKA== X-CSE-MsgGUID: H+YZyFXsSZKhf/NLRIYiXw== X-ExtLoop1: 1 X-IronPort-AV: E=Sophos;i="6.13,318,1732608000"; d="scan'208";a="116674893" Received: from litbin-desktop.sh.intel.com ([10.239.156.93]) by fmviesa006-auth.fm.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 26 Feb 2025 17:19:08 -0800 From: Binbin Wu To: pbonzini@redhat.com, seanjc@google.com, kvm@vger.kernel.org Cc: rick.p.edgecombe@intel.com, kai.huang@intel.com, adrian.hunter@intel.com, reinette.chatre@intel.com, xiaoyao.li@intel.com, tony.lindgren@intel.com, isaku.yamahata@intel.com, yan.y.zhao@intel.com, chao.gao@intel.com, linux-kernel@vger.kernel.org, binbin.wu@linux.intel.com Subject: [PATCH v2 07/20] KVM: x86: Move KVM_MAX_MCE_BANKS to header file Date: Thu, 27 Feb 2025 09:20:08 +0800 Message-ID: <20250227012021.1778144-8-binbin.wu@linux.intel.com> X-Mailer: git-send-email 2.46.0 In-Reply-To: <20250227012021.1778144-1-binbin.wu@linux.intel.com> References: <20250227012021.1778144-1-binbin.wu@linux.intel.com> Precedence: bulk X-Mailing-List: kvm@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 From: Isaku Yamahata Move KVM_MAX_MCE_BANKS to header file so that it can be used for TDX in a future patch. Signed-off-by: Isaku Yamahata [binbin: split into new patch] Signed-off-by: Binbin Wu --- TDX "the rest" v2: - No Change. TDX "the rest" v1: - New patch. Split from "KVM: TDX: Implement callbacks for MSR operations for TDX". (Sean) https://lore.kernel.org/kvm/Zg1yPIV6cVJrwGxX@google.com/ --- arch/x86/kvm/x86.c | 1 - arch/x86/kvm/x86.h | 2 ++ 2 files changed, 2 insertions(+), 1 deletion(-) diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c index 33b64b65c166..1bd7977006c9 100644 --- a/arch/x86/kvm/x86.c +++ b/arch/x86/kvm/x86.c @@ -90,7 +90,6 @@ #include "trace.h" #define MAX_IO_MSRS 256 -#define KVM_MAX_MCE_BANKS 32 /* * Note, kvm_caps fields should *never* have default values, all fields must be diff --git a/arch/x86/kvm/x86.h b/arch/x86/kvm/x86.h index 72b4cee27d02..772d5c320be1 100644 --- a/arch/x86/kvm/x86.h +++ b/arch/x86/kvm/x86.h @@ -10,6 +10,8 @@ #include "kvm_emulate.h" #include "cpuid.h" +#define KVM_MAX_MCE_BANKS 32 + struct kvm_caps { /* control of guest tsc rate supported? */ bool has_tsc_control; From patchwork Thu Feb 27 01:20:09 2025 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Binbin Wu X-Patchwork-Id: 13993409 Received: from mgamail.intel.com (mgamail.intel.com [198.175.65.9]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 1D6BC189B84; Thu, 27 Feb 2025 01:19:16 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=198.175.65.9 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1740619157; cv=none; b=fufvWw/Lc8VXDyOh6SWR1xCX+pO5+bRrMLYphlKi8HEipr4IRWh8Bsn4hg5U3UdI2trorKEECvTo4VOBNy2syrM4d2XOzw4mtdpQ+lUF7pwbKhgBNF/LDjq5sxneII8fUTQGuWPEpn6QJbrhfE8x34otVaK+s7fKDcBedAMVBwY= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1740619157; c=relaxed/simple; bh=xCkNm4JnyBb3SmxvO8+ccf3merznB8vgaoOvBZJQoco=; h=From:To:Cc:Subject:Date:Message-ID:In-Reply-To:References: MIME-Version; b=VuUyR30NsJqiQR1aNte2PNEZfDcNABcCviu5nr1cY/5ByCf5ZwCPucYyZf9mfTgbeEatf6yfTRCTVUvDDZ7HafR39wG7uL6kvCE7Y1aWnWN0hb3/ruiG/PTApNWGJYf8hdqS0h5b8tjMQMiEV+5JZK+dozN4zfsoUZkRBtcxuT4= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=linux.intel.com; spf=none smtp.mailfrom=linux.intel.com; dkim=pass (2048-bit key) header.d=intel.com header.i=@intel.com header.b=iIgqryt8; arc=none smtp.client-ip=198.175.65.9 Authentication-Results: smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=linux.intel.com Authentication-Results: smtp.subspace.kernel.org; spf=none smtp.mailfrom=linux.intel.com Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=intel.com header.i=@intel.com header.b="iIgqryt8" DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=intel.com; i=@intel.com; q=dns/txt; s=Intel; t=1740619156; x=1772155156; h=from:to:cc:subject:date:message-id:in-reply-to: references:mime-version:content-transfer-encoding; bh=xCkNm4JnyBb3SmxvO8+ccf3merznB8vgaoOvBZJQoco=; b=iIgqryt8uvaSa016I5QJXbJ0wzoX7+4n5HXTum4O8lfKUH9+S/onflbJ X3LC+42lS4YM2yzma3B9pSApIRUggb4tAcnY8xWI+ns0ZkZmOuC8aGqfr U2eiPL4JknhJu0p+auRq8/oc8N+R5STfgj1olXpKE9+HlCNinFnVcOx+2 MHstIKZ9pe5b80C0rbaJxRMuj1TI+ahElh2nNyh51LB4tHaCUYkx9vnNa kBrKjA7sHyBurFNsM8Me7vwxArwvltYXZyEx3+DB9+mh3NNBPb96KGjAx PFvBtnnWw1gbLNkOL7bFKrathOo9PhpDHzWIRrGXr3FsyE1PF+CkOw+Ys A==; X-CSE-ConnectionGUID: QnqcxewQTSKgg/dDyl2crQ== X-CSE-MsgGUID: 7ZzVTap/Th+QHFKY/qHMMg== X-IronPort-AV: E=McAfee;i="6700,10204,11357"; a="63959622" X-IronPort-AV: E=Sophos;i="6.13,318,1732608000"; d="scan'208";a="63959622" Received: from fmviesa006.fm.intel.com ([10.60.135.146]) by orvoesa101.jf.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 26 Feb 2025 17:19:16 -0800 X-CSE-ConnectionGUID: IRsI8NVSTW+rxyv86KxjxQ== X-CSE-MsgGUID: uXl6nllRSwCSktPIGUQclg== X-ExtLoop1: 1 X-IronPort-AV: E=Sophos;i="6.13,318,1732608000"; d="scan'208";a="116674897" Received: from litbin-desktop.sh.intel.com ([10.239.156.93]) by fmviesa006-auth.fm.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 26 Feb 2025 17:19:12 -0800 From: Binbin Wu To: pbonzini@redhat.com, seanjc@google.com, kvm@vger.kernel.org Cc: rick.p.edgecombe@intel.com, kai.huang@intel.com, adrian.hunter@intel.com, reinette.chatre@intel.com, xiaoyao.li@intel.com, tony.lindgren@intel.com, isaku.yamahata@intel.com, yan.y.zhao@intel.com, chao.gao@intel.com, linux-kernel@vger.kernel.org, binbin.wu@linux.intel.com Subject: [PATCH v2 08/20] KVM: TDX: Implement callbacks for MSR operations Date: Thu, 27 Feb 2025 09:20:09 +0800 Message-ID: <20250227012021.1778144-9-binbin.wu@linux.intel.com> X-Mailer: git-send-email 2.46.0 In-Reply-To: <20250227012021.1778144-1-binbin.wu@linux.intel.com> References: <20250227012021.1778144-1-binbin.wu@linux.intel.com> Precedence: bulk X-Mailing-List: kvm@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 From: Isaku Yamahata Add functions to implement MSR related callbacks, .set_msr(), .get_msr(), and .has_emulated_msr(), for preparation of handling hypercalls from TDX guest for PV RDMSR and WRMSR. Ignore KVM_REQ_MSR_FILTER_CHANGED for TDX. There are three classes of MSR virtualization for TDX. - Non-configurable: TDX module directly virtualizes it. VMM can't configure it, the value set by KVM_SET_MSRS is ignored. - Configurable: TDX module directly virtualizes it. VMM can configure it at VM creation time. The value set by KVM_SET_MSRS is used. - #VE case: TDX guest would issue TDG.VP.VMCALL and VMM handles the MSR hypercall. The value set by KVM_SET_MSRS is used. For the MSRs belonging to the #VE case, the TDX module injects #VE to the TDX guest upon RDMSR or WRMSR. The exact list of such MSRs is defined in TDX Module ABI Spec. Upon #VE, the TDX guest may call TDG.VP.VMCALL, which are defined in GHCI (Guest-Host Communication Interface) so that the host VMM (e.g. KVM) can virtualize the MSRs. TDX doesn't allow VMM to configure interception of MSR accesses. Ignore KVM_REQ_MSR_FILTER_CHANGED for TDX guest. If the userspace has set any MSR filters, it will be applied when handling TDG.VP.VMCALL in a later patch. Suggested-by: Sean Christopherson Signed-off-by: Isaku Yamahata Co-developed-by: Binbin Wu Signed-off-by: Binbin Wu Reviewed-by: Paolo Bonzini --- TDX "the rest" v2: - Typo fix in changelog. (Kai) https://lore.kernel.org/kvm/34ed02a6aedc7dcb35df9ce639e427e4d1a61f72.camel@intel.com TDX "the rest" v1: - Renamed from "KVM: TDX: Implement callbacks for MSR operations for TDX" to "KVM: TDX: Implement callbacks for MSR operations" - Update changelog. - Remove the @write parameter of tdx_has_emulated_msr(), check whether the MSR is readonly or not in tdx_set_msr(). (Sean) https://lore.kernel.org/kvm/Zg1yPIV6cVJrwGxX@google.com/ - Change the code align with the patten "make the happy path the not-taken path". (Sean) - Open code the handling for an emulated KVM MSR MSR_KVM_POLL_CONTROL, and let others go through the default statement. (Sean) - Split macro KVM_MAX_MCE_BANKS move to a separate patch. (Sean) - Add comments in vt_msr_filter_changed(). - Add Suggested-by tag. --- arch/x86/kvm/vmx/main.c | 50 +++++++++++++++++++++++++--- arch/x86/kvm/vmx/tdx.c | 67 ++++++++++++++++++++++++++++++++++++++ arch/x86/kvm/vmx/x86_ops.h | 6 ++++ 3 files changed, 119 insertions(+), 4 deletions(-) diff --git a/arch/x86/kvm/vmx/main.c b/arch/x86/kvm/vmx/main.c index d383f4510c15..463de3add3bd 100644 --- a/arch/x86/kvm/vmx/main.c +++ b/arch/x86/kvm/vmx/main.c @@ -193,6 +193,48 @@ static int vt_handle_exit(struct kvm_vcpu *vcpu, return vmx_handle_exit(vcpu, fastpath); } +static int vt_set_msr(struct kvm_vcpu *vcpu, struct msr_data *msr_info) +{ + if (unlikely(is_td_vcpu(vcpu))) + return tdx_set_msr(vcpu, msr_info); + + return vmx_set_msr(vcpu, msr_info); +} + +/* + * The kvm parameter can be NULL (module initialization, or invocation before + * VM creation). Be sure to check the kvm parameter before using it. + */ +static bool vt_has_emulated_msr(struct kvm *kvm, u32 index) +{ + if (kvm && is_td(kvm)) + return tdx_has_emulated_msr(index); + + return vmx_has_emulated_msr(kvm, index); +} + +static int vt_get_msr(struct kvm_vcpu *vcpu, struct msr_data *msr_info) +{ + if (unlikely(is_td_vcpu(vcpu))) + return tdx_get_msr(vcpu, msr_info); + + return vmx_get_msr(vcpu, msr_info); +} + +static void vt_msr_filter_changed(struct kvm_vcpu *vcpu) +{ + /* + * TDX doesn't allow VMM to configure interception of MSR accesses. + * TDX guest requests MSR accesses by calling TDVMCALL. The MSR + * filters will be applied when handling the TDVMCALL for RDMSR/WRMSR + * if the userspace has set any. + */ + if (is_td_vcpu(vcpu)) + return; + + vmx_msr_filter_changed(vcpu); +} + #ifdef CONFIG_KVM_SMM static int vt_smi_allowed(struct kvm_vcpu *vcpu, bool for_injection) { @@ -516,7 +558,7 @@ struct kvm_x86_ops vt_x86_ops __initdata = { .disable_virtualization_cpu = vt_disable_virtualization_cpu, .emergency_disable_virtualization_cpu = vmx_emergency_disable_virtualization_cpu, - .has_emulated_msr = vmx_has_emulated_msr, + .has_emulated_msr = vt_has_emulated_msr, .vm_size = sizeof(struct kvm_vmx), @@ -535,8 +577,8 @@ struct kvm_x86_ops vt_x86_ops __initdata = { .update_exception_bitmap = vmx_update_exception_bitmap, .get_feature_msr = vmx_get_feature_msr, - .get_msr = vmx_get_msr, - .set_msr = vmx_set_msr, + .get_msr = vt_get_msr, + .set_msr = vt_set_msr, .get_segment_base = vmx_get_segment_base, .get_segment = vmx_get_segment, .set_segment = vmx_set_segment, @@ -643,7 +685,7 @@ struct kvm_x86_ops vt_x86_ops __initdata = { .apic_init_signal_blocked = vt_apic_init_signal_blocked, .migrate_timers = vmx_migrate_timers, - .msr_filter_changed = vmx_msr_filter_changed, + .msr_filter_changed = vt_msr_filter_changed, .complete_emulated_msr = kvm_complete_insn_gp, .vcpu_deliver_sipi_vector = kvm_vcpu_deliver_sipi_vector, diff --git a/arch/x86/kvm/vmx/tdx.c b/arch/x86/kvm/vmx/tdx.c index 30f0ceeed7d2..5b556af0f139 100644 --- a/arch/x86/kvm/vmx/tdx.c +++ b/arch/x86/kvm/vmx/tdx.c @@ -2002,6 +2002,73 @@ void tdx_get_exit_info(struct kvm_vcpu *vcpu, u32 *reason, *error_code = 0; } +bool tdx_has_emulated_msr(u32 index) +{ + switch (index) { + case MSR_IA32_UCODE_REV: + case MSR_IA32_ARCH_CAPABILITIES: + case MSR_IA32_POWER_CTL: + case MSR_IA32_CR_PAT: + case MSR_IA32_TSC_DEADLINE: + case MSR_IA32_MISC_ENABLE: + case MSR_PLATFORM_INFO: + case MSR_MISC_FEATURES_ENABLES: + case MSR_IA32_APICBASE: + case MSR_EFER: + case MSR_IA32_MCG_CAP: + case MSR_IA32_MCG_STATUS: + case MSR_IA32_MCG_CTL: + case MSR_IA32_MCG_EXT_CTL: + case MSR_IA32_MC0_CTL ... MSR_IA32_MCx_CTL(KVM_MAX_MCE_BANKS) - 1: + case MSR_IA32_MC0_CTL2 ... MSR_IA32_MCx_CTL2(KVM_MAX_MCE_BANKS) - 1: + /* MSR_IA32_MCx_{CTL, STATUS, ADDR, MISC, CTL2} */ + case MSR_KVM_POLL_CONTROL: + return true; + case APIC_BASE_MSR ... APIC_BASE_MSR + 0xff: + /* + * x2APIC registers that are virtualized by the CPU can't be + * emulated, KVM doesn't have access to the virtual APIC page. + */ + switch (index) { + case X2APIC_MSR(APIC_TASKPRI): + case X2APIC_MSR(APIC_PROCPRI): + case X2APIC_MSR(APIC_EOI): + case X2APIC_MSR(APIC_ISR) ... X2APIC_MSR(APIC_ISR + APIC_ISR_NR): + case X2APIC_MSR(APIC_TMR) ... X2APIC_MSR(APIC_TMR + APIC_ISR_NR): + case X2APIC_MSR(APIC_IRR) ... X2APIC_MSR(APIC_IRR + APIC_ISR_NR): + return false; + default: + return true; + } + default: + return false; + } +} + +static bool tdx_is_read_only_msr(u32 index) +{ + return index == MSR_IA32_APICBASE || index == MSR_EFER; +} + +int tdx_get_msr(struct kvm_vcpu *vcpu, struct msr_data *msr) +{ + if (!tdx_has_emulated_msr(msr->index)) + return 1; + + return kvm_get_msr_common(vcpu, msr); +} + +int tdx_set_msr(struct kvm_vcpu *vcpu, struct msr_data *msr) +{ + if (tdx_is_read_only_msr(msr->index)) + return 1; + + if (!tdx_has_emulated_msr(msr->index)) + return 1; + + return kvm_set_msr_common(vcpu, msr); +} + static int tdx_get_capabilities(struct kvm_tdx_cmd *cmd) { const struct tdx_sys_info_td_conf *td_conf = &tdx_sysinfo->td_conf; diff --git a/arch/x86/kvm/vmx/x86_ops.h b/arch/x86/kvm/vmx/x86_ops.h index e5d0c8a03566..19f770b0fc81 100644 --- a/arch/x86/kvm/vmx/x86_ops.h +++ b/arch/x86/kvm/vmx/x86_ops.h @@ -145,6 +145,9 @@ void tdx_deliver_interrupt(struct kvm_lapic *apic, int delivery_mode, void tdx_inject_nmi(struct kvm_vcpu *vcpu); void tdx_get_exit_info(struct kvm_vcpu *vcpu, u32 *reason, u64 *info1, u64 *info2, u32 *intr_info, u32 *error_code); +bool tdx_has_emulated_msr(u32 index); +int tdx_get_msr(struct kvm_vcpu *vcpu, struct msr_data *msr); +int tdx_set_msr(struct kvm_vcpu *vcpu, struct msr_data *msr); int tdx_vcpu_ioctl(struct kvm_vcpu *vcpu, void __user *argp); @@ -188,6 +191,9 @@ static inline void tdx_deliver_interrupt(struct kvm_lapic *apic, int delivery_mo static inline void tdx_inject_nmi(struct kvm_vcpu *vcpu) {} static inline void tdx_get_exit_info(struct kvm_vcpu *vcpu, u32 *reason, u64 *info1, u64 *info2, u32 *intr_info, u32 *error_code) {} +static inline bool tdx_has_emulated_msr(u32 index) { return false; } +static inline int tdx_get_msr(struct kvm_vcpu *vcpu, struct msr_data *msr) { return 1; } +static inline int tdx_set_msr(struct kvm_vcpu *vcpu, struct msr_data *msr) { return 1; } static inline int tdx_vcpu_ioctl(struct kvm_vcpu *vcpu, void __user *argp) { return -EOPNOTSUPP; } From patchwork Thu Feb 27 01:20:10 2025 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Binbin Wu X-Patchwork-Id: 13993410 Received: from mgamail.intel.com (mgamail.intel.com [198.175.65.9]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 58E5B1917FB; Thu, 27 Feb 2025 01:19:19 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=198.175.65.9 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1740619160; cv=none; b=rvnAFTalUS75SmfY4VUlgrSXkMzka7qCAVmwc2xIYT3m2luhPQJXcEJy0b0o01tvU8cUyltXGUutwBe0iXovS4nu8FbPRfmSxMyrXtoiqrm/tUkT9rXEBlZq4nCg6bCeU00P6pamylftPQHXjl5WGpldriDUbWdFldsusj6hrt4= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1740619160; c=relaxed/simple; bh=ri6AyiRTtAQTnjyYiC0FobX78tZ80oBXhVhCjzKkk9I=; h=From:To:Cc:Subject:Date:Message-ID:In-Reply-To:References: MIME-Version; b=puTnHtNxW8jCARZjEmR5LiHLYo2TH/gglIC4nEXC0XuGr8HHFmFAElzYTKBU+3vYX4EYH5r4dZh2g3kwM/I92nLpWdpx8Pt4cduUHqYUUuGvSZsOdJNt4bQHcCM5iUHX8wLhDQxpGWzNL36TRGf4ZeohC9KoyYpChVu/yp2V7Zo= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=linux.intel.com; spf=none smtp.mailfrom=linux.intel.com; dkim=pass (2048-bit key) header.d=intel.com header.i=@intel.com header.b=nsgDY+01; arc=none smtp.client-ip=198.175.65.9 Authentication-Results: smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=linux.intel.com Authentication-Results: smtp.subspace.kernel.org; spf=none smtp.mailfrom=linux.intel.com Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=intel.com header.i=@intel.com header.b="nsgDY+01" DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=intel.com; i=@intel.com; q=dns/txt; s=Intel; t=1740619159; x=1772155159; h=from:to:cc:subject:date:message-id:in-reply-to: references:mime-version:content-transfer-encoding; bh=ri6AyiRTtAQTnjyYiC0FobX78tZ80oBXhVhCjzKkk9I=; b=nsgDY+01bk4nyLptcTnvweZ7dV9SXqs0L6KatVTwnDlSdbpYWuZ63iDM jt1Mli2Npbt9MLdGI2cGJrn1E43za2P6FI+1DLMepyLmRHTJ4dg33PJQS 0gWRO7WNau66VgjonU/u3XWnJnMeCkWfBKxe+oq4xUXgR3iW5QOMxJOCD 5ZVtPUVkdmrc8RyHY2oCP7b06N+8BbSapJVQxF3+MzmnmotT48yxH4cqB GJLwaOuaWClHPQIdqyJYH5yDM9jBXEIp3Wh4FnQhW9Rj9+33R3Ej6OE4i gjpcxpsh3Z/snAJ5TFdxT7/yMETbYshnWZ2nq4zK7DKce4F9Fy+QMx81g Q==; X-CSE-ConnectionGUID: y6eDhVYXRneWQ1vcVZ+u2Q== X-CSE-MsgGUID: d/6uD2QASiab12tkYbH2mg== X-IronPort-AV: E=McAfee;i="6700,10204,11357"; a="63959626" X-IronPort-AV: E=Sophos;i="6.13,318,1732608000"; d="scan'208";a="63959626" Received: from fmviesa006.fm.intel.com ([10.60.135.146]) by orvoesa101.jf.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 26 Feb 2025 17:19:19 -0800 X-CSE-ConnectionGUID: tLI6RLXxRxqkpvfZtvvDqg== X-CSE-MsgGUID: msoI5mqTQQyrheb4v1ne9Q== X-ExtLoop1: 1 X-IronPort-AV: E=Sophos;i="6.13,318,1732608000"; d="scan'208";a="116674900" Received: from litbin-desktop.sh.intel.com ([10.239.156.93]) by fmviesa006-auth.fm.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 26 Feb 2025 17:19:15 -0800 From: Binbin Wu To: pbonzini@redhat.com, seanjc@google.com, kvm@vger.kernel.org Cc: rick.p.edgecombe@intel.com, kai.huang@intel.com, adrian.hunter@intel.com, reinette.chatre@intel.com, xiaoyao.li@intel.com, tony.lindgren@intel.com, isaku.yamahata@intel.com, yan.y.zhao@intel.com, chao.gao@intel.com, linux-kernel@vger.kernel.org, binbin.wu@linux.intel.com Subject: [PATCH v2 09/20] KVM: TDX: Handle TDX PV rdmsr/wrmsr hypercall Date: Thu, 27 Feb 2025 09:20:10 +0800 Message-ID: <20250227012021.1778144-10-binbin.wu@linux.intel.com> X-Mailer: git-send-email 2.46.0 In-Reply-To: <20250227012021.1778144-1-binbin.wu@linux.intel.com> References: <20250227012021.1778144-1-binbin.wu@linux.intel.com> Precedence: bulk X-Mailing-List: kvm@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 From: Isaku Yamahata Morph PV RDMSR/WRMSR hypercall to EXIT_REASON_MSR_{READ,WRITE} and wire up KVM backend functions. For complete_emulated_msr() callback, instead of injecting #GP on error, implement tdx_complete_emulated_msr() to set return code on error. Also set return value on MSR read according to the values from kvm x86 registers. Suggested-by: Sean Christopherson Signed-off-by: Isaku Yamahata Signed-off-by: Binbin Wu Reviewed-by: Paolo Bonzini --- TDX "the rest" v2: - Morph PV RDMSR/WRMSR hypercall to EXIT_REASON_MSR_{READ,WRITE}. (Sean) - Rebased to use tdcall_to_vmx_exit_reason(). - Marshall values to the appropriate x86 registers to leverage the existing kvm_emulate_{rdmsr,wrmsr}(). (Sean) - Implement complete_emulated_msr() callback for TDX to set return value and return code to vp_enter_args. - Update changelog. TDX "the rest" v1: - Use TDVMCALL_STATUS prefix for TDX call status codes (Binbin) --- arch/x86/kvm/vmx/main.c | 10 +++++++++- arch/x86/kvm/vmx/tdx.c | 24 ++++++++++++++++++++++++ arch/x86/kvm/vmx/tdx.h | 2 ++ 3 files changed, 35 insertions(+), 1 deletion(-) diff --git a/arch/x86/kvm/vmx/main.c b/arch/x86/kvm/vmx/main.c index 463de3add3bd..7f3be1b65ce1 100644 --- a/arch/x86/kvm/vmx/main.c +++ b/arch/x86/kvm/vmx/main.c @@ -235,6 +235,14 @@ static void vt_msr_filter_changed(struct kvm_vcpu *vcpu) vmx_msr_filter_changed(vcpu); } +static int vt_complete_emulated_msr(struct kvm_vcpu *vcpu, int err) +{ + if (is_td_vcpu(vcpu)) + return tdx_complete_emulated_msr(vcpu, err); + + return kvm_complete_insn_gp(vcpu, err); +} + #ifdef CONFIG_KVM_SMM static int vt_smi_allowed(struct kvm_vcpu *vcpu, bool for_injection) { @@ -686,7 +694,7 @@ struct kvm_x86_ops vt_x86_ops __initdata = { .migrate_timers = vmx_migrate_timers, .msr_filter_changed = vt_msr_filter_changed, - .complete_emulated_msr = kvm_complete_insn_gp, + .complete_emulated_msr = vt_complete_emulated_msr, .vcpu_deliver_sipi_vector = kvm_vcpu_deliver_sipi_vector, diff --git a/arch/x86/kvm/vmx/tdx.c b/arch/x86/kvm/vmx/tdx.c index 5b556af0f139..85ff6e040cf3 100644 --- a/arch/x86/kvm/vmx/tdx.c +++ b/arch/x86/kvm/vmx/tdx.c @@ -878,6 +878,8 @@ static __always_inline u32 tdcall_to_vmx_exit_reason(struct kvm_vcpu *vcpu) case EXIT_REASON_CPUID: case EXIT_REASON_HLT: case EXIT_REASON_IO_INSTRUCTION: + case EXIT_REASON_MSR_READ: + case EXIT_REASON_MSR_WRITE: return tdvmcall_leaf(vcpu); case EXIT_REASON_EPT_VIOLATION: return EXIT_REASON_EPT_MISCONFIG; @@ -1880,6 +1882,20 @@ static int tdx_handle_ept_violation(struct kvm_vcpu *vcpu) return ret; } +int tdx_complete_emulated_msr(struct kvm_vcpu *vcpu, int err) +{ + if (err) { + tdvmcall_set_return_code(vcpu, TDVMCALL_STATUS_INVALID_OPERAND); + return 1; + } + + if (vmx_get_exit_reason(vcpu).basic == EXIT_REASON_MSR_READ) + tdvmcall_set_return_val(vcpu, kvm_read_edx_eax(vcpu)); + + return 1; +} + + int tdx_handle_exit(struct kvm_vcpu *vcpu, fastpath_t fastpath) { struct vcpu_tdx *tdx = to_tdx(vcpu); @@ -1945,6 +1961,14 @@ int tdx_handle_exit(struct kvm_vcpu *vcpu, fastpath_t fastpath) return tdx_emulate_vmcall(vcpu); case EXIT_REASON_IO_INSTRUCTION: return tdx_emulate_io(vcpu); + case EXIT_REASON_MSR_READ: + kvm_rcx_write(vcpu, tdx->vp_enter_args.r12); + return kvm_emulate_rdmsr(vcpu); + case EXIT_REASON_MSR_WRITE: + kvm_rcx_write(vcpu, tdx->vp_enter_args.r12); + kvm_rax_write(vcpu, tdx->vp_enter_args.r13 & -1u); + kvm_rdx_write(vcpu, tdx->vp_enter_args.r13 >> 32); + return kvm_emulate_wrmsr(vcpu); case EXIT_REASON_EPT_MISCONFIG: return tdx_emulate_mmio(vcpu); case EXIT_REASON_EPT_VIOLATION: diff --git a/arch/x86/kvm/vmx/tdx.h b/arch/x86/kvm/vmx/tdx.h index cb9014b7a4f1..8f8070d0f55e 100644 --- a/arch/x86/kvm/vmx/tdx.h +++ b/arch/x86/kvm/vmx/tdx.h @@ -171,6 +171,7 @@ static __always_inline void td_##lclass##_clearbit##bits(struct vcpu_tdx *tdx, \ bool tdx_interrupt_allowed(struct kvm_vcpu *vcpu); +int tdx_complete_emulated_msr(struct kvm_vcpu *vcpu, int err); TDX_BUILD_TDVPS_ACCESSORS(16, VMCS, vmcs); TDX_BUILD_TDVPS_ACCESSORS(32, VMCS, vmcs); @@ -194,6 +195,7 @@ struct vcpu_tdx { }; static inline bool tdx_interrupt_allowed(struct kvm_vcpu *vcpu) { return false; } +static inline int tdx_complete_emulated_msr(struct kvm_vcpu *vcpu, int err) { return 0; } #endif From patchwork Thu Feb 27 01:20:11 2025 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Binbin Wu X-Patchwork-Id: 13993411 Received: from mgamail.intel.com (mgamail.intel.com [198.175.65.9]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 9C0231A0730; Thu, 27 Feb 2025 01:19:22 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=198.175.65.9 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1740619164; cv=none; b=KFWF6irqNeGuhQplHUc42CtwWY21J7DjCmB1kZuqpUypapHNOfnKs5RPtqtRaQGnnsy1/EwAzT/+O5bLfIjUQq3MxI1M/SOJs1l7IaU8gPl9aqUCefERUnhyyKkAJCIQsbB2syewHNFdp646202LPaclYjEhg+fTLLYOp/PO1vM= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1740619164; c=relaxed/simple; bh=vMUAeJTPBCoxwqnHG1I/vI/AaVpM8hcMwkiv2EGrwSY=; h=From:To:Cc:Subject:Date:Message-ID:In-Reply-To:References: MIME-Version; b=cp6+OsNYHLS0OAgUHKFaPa70UJF4W8YO9jbwgr0AGQn8i6FVp2h2xKLyAslR4odfAh6cEj8pffSjgk/0vyHQc7uMM68HxnmG9L48sc8geTOrmkD/krfVGBU6aiXVdLS3y6HNi1OTp0DYrUFnPG2wszYbRGQ0G93/jqW8iTH2gzQ= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=linux.intel.com; spf=none smtp.mailfrom=linux.intel.com; dkim=pass (2048-bit key) header.d=intel.com header.i=@intel.com header.b=QnFIBP4I; arc=none smtp.client-ip=198.175.65.9 Authentication-Results: smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=linux.intel.com Authentication-Results: smtp.subspace.kernel.org; spf=none smtp.mailfrom=linux.intel.com Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=intel.com header.i=@intel.com header.b="QnFIBP4I" DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=intel.com; i=@intel.com; q=dns/txt; s=Intel; t=1740619163; x=1772155163; h=from:to:cc:subject:date:message-id:in-reply-to: references:mime-version:content-transfer-encoding; bh=vMUAeJTPBCoxwqnHG1I/vI/AaVpM8hcMwkiv2EGrwSY=; b=QnFIBP4Iq5FUlxuVLLTm/6/JwdFIupoFs4jjiXS9HagofIAyUNe6Rt/V OQTICO81E8s04joeQFgLy3L/TGjPj/PTeAkNl/OPOzzS9UVXLH2KQpEPo o5C3i2mHDOg/H+ydDnKg5lbFx28y/VtkT2KIxwj/M92bIgrp0IZygj+qs OJhr9ruah5NI/gtcNMDmhhFfyfvmEs7ZMxC7SZlxAjysF4KKVBqoOL2Fm ta/PG8z9NRmE4bM2EnYTotMgxD4RNlIfxYnN4VTnPFLE7C69tGQKPOoIs AHJeUuBUpH9xhVjxSFhfjRF3CJJcJ2Fgw/gN+r2AjxAEQbPqKLHPdXrZI w==; X-CSE-ConnectionGUID: VAQziL1gQmeyGhXFnEhQxw== X-CSE-MsgGUID: 95E1klA2TVa93KWQ1YQdqg== X-IronPort-AV: E=McAfee;i="6700,10204,11357"; a="63959629" X-IronPort-AV: E=Sophos;i="6.13,318,1732608000"; d="scan'208";a="63959629" Received: from fmviesa006.fm.intel.com ([10.60.135.146]) by orvoesa101.jf.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 26 Feb 2025 17:19:22 -0800 X-CSE-ConnectionGUID: B9a286aRSqSCybPrJlJGOw== X-CSE-MsgGUID: IBjaz6nsSIqmXyBFootrVw== X-ExtLoop1: 1 X-IronPort-AV: E=Sophos;i="6.13,318,1732608000"; d="scan'208";a="116674903" Received: from litbin-desktop.sh.intel.com ([10.239.156.93]) by fmviesa006-auth.fm.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 26 Feb 2025 17:19:18 -0800 From: Binbin Wu To: pbonzini@redhat.com, seanjc@google.com, kvm@vger.kernel.org Cc: rick.p.edgecombe@intel.com, kai.huang@intel.com, adrian.hunter@intel.com, reinette.chatre@intel.com, xiaoyao.li@intel.com, tony.lindgren@intel.com, isaku.yamahata@intel.com, yan.y.zhao@intel.com, chao.gao@intel.com, linux-kernel@vger.kernel.org, binbin.wu@linux.intel.com Subject: [PATCH v2 10/20] KVM: TDX: Enable guest access to LMCE related MSRs Date: Thu, 27 Feb 2025 09:20:11 +0800 Message-ID: <20250227012021.1778144-11-binbin.wu@linux.intel.com> X-Mailer: git-send-email 2.46.0 In-Reply-To: <20250227012021.1778144-1-binbin.wu@linux.intel.com> References: <20250227012021.1778144-1-binbin.wu@linux.intel.com> Precedence: bulk X-Mailing-List: kvm@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 From: Isaku Yamahata Allow TDX guest to configure LMCE (Local Machine Check Event) by handling MSR IA32_FEAT_CTL and IA32_MCG_EXT_CTL. MCE and MCA are advertised via cpuid based on the TDX module spec. Guest kernel can access IA32_FEAT_CTL to check whether LMCE is opted-in by the platform or not. If LMCE is opted-in by the platform, guest kernel can access IA32_MCG_EXT_CTL to enable/disable LMCE. Handle MSR IA32_FEAT_CTL and IA32_MCG_EXT_CTL for TDX guests to avoid failure when a guest accesses them with TDG.VP.VMCALL on #VE. E.g., Linux guest will treat the failure as a #GP(0). Userspace VMM may not opt-in LMCE by default, e.g., QEMU disables it by default, "-cpu lmce=on" is needed in QEMU command line to opt-in it. Signed-off-by: Isaku Yamahata [binbin: rework changelog] Signed-off-by: Binbin Wu --- TDX "the rest" v2: - No Change. TDX "the rest" v1: - Renamed from "KVM: TDX: Handle MSR IA32_FEAT_CTL MSR and IA32_MCG_EXT_CTL" to "KVM: TDX: Enable guest access to LMCE related MSRs". - Update changelog. - Check reserved bits are not set when set MSR_IA32_MCG_EXT_CTL. --- arch/x86/kvm/vmx/tdx.c | 46 +++++++++++++++++++++++++++++++++--------- 1 file changed, 37 insertions(+), 9 deletions(-) diff --git a/arch/x86/kvm/vmx/tdx.c b/arch/x86/kvm/vmx/tdx.c index 85ff6e040cf3..76764bf5ba29 100644 --- a/arch/x86/kvm/vmx/tdx.c +++ b/arch/x86/kvm/vmx/tdx.c @@ -2039,6 +2039,7 @@ bool tdx_has_emulated_msr(u32 index) case MSR_MISC_FEATURES_ENABLES: case MSR_IA32_APICBASE: case MSR_EFER: + case MSR_IA32_FEAT_CTL: case MSR_IA32_MCG_CAP: case MSR_IA32_MCG_STATUS: case MSR_IA32_MCG_CTL: @@ -2071,26 +2072,53 @@ bool tdx_has_emulated_msr(u32 index) static bool tdx_is_read_only_msr(u32 index) { - return index == MSR_IA32_APICBASE || index == MSR_EFER; + return index == MSR_IA32_APICBASE || index == MSR_EFER || + index == MSR_IA32_FEAT_CTL; } int tdx_get_msr(struct kvm_vcpu *vcpu, struct msr_data *msr) { - if (!tdx_has_emulated_msr(msr->index)) - return 1; + switch (msr->index) { + case MSR_IA32_FEAT_CTL: + /* + * MCE and MCA are advertised via cpuid. Guest kernel could + * check if LMCE is enabled or not. + */ + msr->data = FEAT_CTL_LOCKED; + if (vcpu->arch.mcg_cap & MCG_LMCE_P) + msr->data |= FEAT_CTL_LMCE_ENABLED; + return 0; + case MSR_IA32_MCG_EXT_CTL: + if (!msr->host_initiated && !(vcpu->arch.mcg_cap & MCG_LMCE_P)) + return 1; + msr->data = vcpu->arch.mcg_ext_ctl; + return 0; + default: + if (!tdx_has_emulated_msr(msr->index)) + return 1; - return kvm_get_msr_common(vcpu, msr); + return kvm_get_msr_common(vcpu, msr); + } } int tdx_set_msr(struct kvm_vcpu *vcpu, struct msr_data *msr) { - if (tdx_is_read_only_msr(msr->index)) - return 1; + switch (msr->index) { + case MSR_IA32_MCG_EXT_CTL: + if ((!msr->host_initiated && !(vcpu->arch.mcg_cap & MCG_LMCE_P)) || + (msr->data & ~MCG_EXT_CTL_LMCE_EN)) + return 1; + vcpu->arch.mcg_ext_ctl = msr->data; + return 0; + default: + if (tdx_is_read_only_msr(msr->index)) + return 1; - if (!tdx_has_emulated_msr(msr->index)) - return 1; + if (!tdx_has_emulated_msr(msr->index)) + return 1; - return kvm_set_msr_common(vcpu, msr); + return kvm_set_msr_common(vcpu, msr); + } } static int tdx_get_capabilities(struct kvm_tdx_cmd *cmd) From patchwork Thu Feb 27 01:20:12 2025 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Binbin Wu X-Patchwork-Id: 13993412 Received: from mgamail.intel.com (mgamail.intel.com [198.175.65.9]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id DF9B61A83E6; Thu, 27 Feb 2025 01:19:25 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=198.175.65.9 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1740619167; cv=none; b=tPKlJSy9b++Q4ccAD6OxjwHl1iS67cHH92BLenAnxMh1HcQBG9AQHoDbT8TppCUF0wEzNq7McUk1BgZyiDIAFWpqSv0bDkBX/cVIgIS79kfZe3dsOODGKJdwUGjG54Akwz+JOmCRYUYzqmB1kjApe3kMGfa25pu4qnrrYk2gM+0= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1740619167; c=relaxed/simple; bh=lEHEJBGNv7Z2ZmoEjKOLCQwWgl97DVj2RsejJ+To+VM=; h=From:To:Cc:Subject:Date:Message-ID:In-Reply-To:References: MIME-Version; b=Us2Zu++9kTazA37qvQ4CN76i4+jo1IH8T0nnFat3E+e5mtCIjMdyoTTcRJctT5HqDmVvUW4BOxEi863vbtRYoWGZpkNi0tP18BTNcVv6hwdVz1Fk4EQ7R+QXivKaqP3pbrQLtLE8EVnOrkopemvo+2akwhZBk7w3gfEd9UYLxpk= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=linux.intel.com; spf=none smtp.mailfrom=linux.intel.com; dkim=pass (2048-bit key) header.d=intel.com header.i=@intel.com header.b=QUDCEqIp; arc=none smtp.client-ip=198.175.65.9 Authentication-Results: smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=linux.intel.com Authentication-Results: smtp.subspace.kernel.org; spf=none smtp.mailfrom=linux.intel.com Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=intel.com header.i=@intel.com header.b="QUDCEqIp" DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=intel.com; i=@intel.com; q=dns/txt; s=Intel; t=1740619166; x=1772155166; h=from:to:cc:subject:date:message-id:in-reply-to: references:mime-version:content-transfer-encoding; bh=lEHEJBGNv7Z2ZmoEjKOLCQwWgl97DVj2RsejJ+To+VM=; b=QUDCEqIp/ikRVel9mP3PC9NFNZfl54C8yW/R+lMF9mvBF4jj49DiFTEn BxWgmVPLC4AziWf+nV5N7q9T8AU5lWUKLFJfavhPxrY2LZgDyU6URtNgT oEnR2WDYmxheDwf+SwalwNVPCJpf1gqkIElzPLkfFABRWE+KCa+31yq8b 7JgSqbo8JQZTvLN7gzp6CoqYDGiJfx4f9wf4R4RRcC/GLm9qVDNrCfCEd ovQFvxjsDNAhWLtRX0zk3qqzp+gBAiJDiPfRGPHaKolfbULZFh1SaaTqA aDTdknvyMLXT/CWkMnqaRlg8g3G3FZRhaPrewO38VjU6AiM2QxUgEUs7o w==; X-CSE-ConnectionGUID: kJFl6CeoSiib36MIWREWMw== X-CSE-MsgGUID: XPizCEjnQ1mIE1KbWPCoQw== X-IronPort-AV: E=McAfee;i="6700,10204,11357"; a="63959632" X-IronPort-AV: E=Sophos;i="6.13,318,1732608000"; d="scan'208";a="63959632" Received: from fmviesa006.fm.intel.com ([10.60.135.146]) by orvoesa101.jf.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 26 Feb 2025 17:19:26 -0800 X-CSE-ConnectionGUID: +OlgE+0bRomnfoTj7CiJwg== X-CSE-MsgGUID: cm2TS4g5Q1KaSsnlAxP6Dg== X-ExtLoop1: 1 X-IronPort-AV: E=Sophos;i="6.13,318,1732608000"; d="scan'208";a="116674906" Received: from litbin-desktop.sh.intel.com ([10.239.156.93]) by fmviesa006-auth.fm.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 26 Feb 2025 17:19:22 -0800 From: Binbin Wu To: pbonzini@redhat.com, seanjc@google.com, kvm@vger.kernel.org Cc: rick.p.edgecombe@intel.com, kai.huang@intel.com, adrian.hunter@intel.com, reinette.chatre@intel.com, xiaoyao.li@intel.com, tony.lindgren@intel.com, isaku.yamahata@intel.com, yan.y.zhao@intel.com, chao.gao@intel.com, linux-kernel@vger.kernel.org, binbin.wu@linux.intel.com Subject: [PATCH v2 11/20] KVM: TDX: Handle TDG.VP.VMCALL hypercall Date: Thu, 27 Feb 2025 09:20:12 +0800 Message-ID: <20250227012021.1778144-12-binbin.wu@linux.intel.com> X-Mailer: git-send-email 2.46.0 In-Reply-To: <20250227012021.1778144-1-binbin.wu@linux.intel.com> References: <20250227012021.1778144-1-binbin.wu@linux.intel.com> Precedence: bulk X-Mailing-List: kvm@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 From: Isaku Yamahata Implement TDG.VP.VMCALL hypercall. If the input value is zero, return success code and zero in output registers. TDG.VP.VMCALL hypercall is a subleaf of TDG.VP.VMCALL to enumerate which TDG.VP.VMCALL sub leaves are supported. This hypercall is for future enhancement of the Guest-Host-Communication Interface (GHCI) specification. The GHCI version of 344426-001US defines it to require input R12 to be zero and to return zero in output registers, R11, R12, R13, and R14 so that guest TD enumerates no enhancement. Signed-off-by: Isaku Yamahata Signed-off-by: Binbin Wu --- TDX "the rest" v2: - Use vp_enter_args directly instead of helpers. - Skip setting return code as TDVMCALL_STATUS_SUCCESS. - Skip setting tdx->vp_enter_args.r12 to 0 because it already is 0. TDX "the rest" v1: - Use TDVMCALL_STATUS prefix for TDX call status codes (Binbin) v19: - rename TDG_VP_VMCALL_GET_TD_VM_CALL_INFO => TDVMCALL_GET_TD_VM_CALL_INFO --- arch/x86/include/asm/shared/tdx.h | 1 + arch/x86/kvm/vmx/tdx.c | 16 ++++++++++++++++ 2 files changed, 17 insertions(+) diff --git a/arch/x86/include/asm/shared/tdx.h b/arch/x86/include/asm/shared/tdx.h index f23657350d28..606d93a1cbac 100644 --- a/arch/x86/include/asm/shared/tdx.h +++ b/arch/x86/include/asm/shared/tdx.h @@ -67,6 +67,7 @@ #define TD_CTLS_LOCK BIT_ULL(TD_CTLS_LOCK_BIT) /* TDX hypercall Leaf IDs */ +#define TDVMCALL_GET_TD_VM_CALL_INFO 0x10000 #define TDVMCALL_MAP_GPA 0x10001 #define TDVMCALL_GET_QUOTE 0x10002 #define TDVMCALL_REPORT_FATAL_ERROR 0x10003 diff --git a/arch/x86/kvm/vmx/tdx.c b/arch/x86/kvm/vmx/tdx.c index 76764bf5ba29..dbc9fffcbc26 100644 --- a/arch/x86/kvm/vmx/tdx.c +++ b/arch/x86/kvm/vmx/tdx.c @@ -1422,6 +1422,20 @@ static int tdx_emulate_mmio(struct kvm_vcpu *vcpu) return 1; } +static int tdx_get_td_vm_call_info(struct kvm_vcpu *vcpu) +{ + struct vcpu_tdx *tdx = to_tdx(vcpu); + + if (tdx->vp_enter_args.r12) + tdvmcall_set_return_code(vcpu, TDVMCALL_STATUS_INVALID_OPERAND); + else { + tdx->vp_enter_args.r11 = 0; + tdx->vp_enter_args.r13 = 0; + tdx->vp_enter_args.r14 = 0; + } + return 1; +} + static int handle_tdvmcall(struct kvm_vcpu *vcpu) { switch (tdvmcall_leaf(vcpu)) { @@ -1429,6 +1443,8 @@ static int handle_tdvmcall(struct kvm_vcpu *vcpu) return tdx_map_gpa(vcpu); case TDVMCALL_REPORT_FATAL_ERROR: return tdx_report_fatal_error(vcpu); + case TDVMCALL_GET_TD_VM_CALL_INFO: + return tdx_get_td_vm_call_info(vcpu); default: break; } From patchwork Thu Feb 27 01:20:13 2025 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Binbin Wu X-Patchwork-Id: 13993413 Received: from mgamail.intel.com (mgamail.intel.com [198.175.65.9]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 1E9661ABEAC; Thu, 27 Feb 2025 01:19:29 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=198.175.65.9 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1740619170; cv=none; b=h3X93zZQ6643SeFmLxQ9oBUw7v+5BU75TnHdHufFpNGfaWnZxxqKEomyEYE5QT7gE+Qt6+irjYg0tngrm2aFfqKOFkut+T/PUuOzRB+qyzK2Oea29fvQo8yiZlOCfKyP/HrtshvUY0IT2idie4BG6C0BR6n6ijCC8CdiFEOK7tQ= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1740619170; c=relaxed/simple; bh=MMd4x+ve8cCyfIEcQz0SDKFYVbOKm7OQ9khPYc920ic=; h=From:To:Cc:Subject:Date:Message-ID:In-Reply-To:References: MIME-Version; b=Owmzn8jrSDnab41nq5RYDE0ChuddKSQ8/gBWHHn4KoTGuzPDmQcJ/gOEd/lDsH3a3T0PpKeIowlxgmzRHo4Y5CDePPwXD6foeECEvcHQVw+YKGISZPNyM5gOFAIwj6kHEiPx6KG+Bv5OHQiNXeIS7T9WPbOzJuYLPmDkH9uIkO8= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=linux.intel.com; spf=none smtp.mailfrom=linux.intel.com; dkim=pass (2048-bit key) header.d=intel.com header.i=@intel.com header.b=LAvRnsKi; arc=none smtp.client-ip=198.175.65.9 Authentication-Results: smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=linux.intel.com Authentication-Results: smtp.subspace.kernel.org; spf=none smtp.mailfrom=linux.intel.com Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=intel.com header.i=@intel.com header.b="LAvRnsKi" DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=intel.com; i=@intel.com; q=dns/txt; s=Intel; t=1740619169; x=1772155169; h=from:to:cc:subject:date:message-id:in-reply-to: references:mime-version:content-transfer-encoding; bh=MMd4x+ve8cCyfIEcQz0SDKFYVbOKm7OQ9khPYc920ic=; b=LAvRnsKiDJWS6nU5Rvs0zvuCAhNgk8Ejwwq2YUyBYEH4al3drDnVXowq M8x0rSN92MNvotrrsep6e+eaV8/lVazCefhxNUXsxqnzMD/ceGvyK5WnX 7fw9BWuP0VmI1khmKVscm42r4/S8mIRGKXOl+MZzwICBKT6EdN9wA6ErL xbJp/r9vfE7TIGKnojlf7BSTgAAZIQ09FEZNYQZFYoaQnI3fKkQGzIgRy ESLf46isDy0WZ9FLcRs+GH02Ide/AaGY7vSM1+fbgSp2+ZgU/9/DueqfP LHDpuvaho9SMMa8BDQ78cIFtKjuoWQVCIXiYS3+4he3l+DBipxDS4ijA4 g==; X-CSE-ConnectionGUID: iw/nia+GRgmu/5mD0Ma6fw== X-CSE-MsgGUID: vLH5g8uSS5G26454yEQ+Qw== X-IronPort-AV: E=McAfee;i="6700,10204,11357"; a="63959637" X-IronPort-AV: E=Sophos;i="6.13,318,1732608000"; d="scan'208";a="63959637" Received: from fmviesa006.fm.intel.com ([10.60.135.146]) by orvoesa101.jf.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 26 Feb 2025 17:19:29 -0800 X-CSE-ConnectionGUID: JdwNei5QTY+9s+SmsYZgtw== X-CSE-MsgGUID: Ljg26MCmRBqZVl+fm8mvdQ== X-ExtLoop1: 1 X-IronPort-AV: E=Sophos;i="6.13,318,1732608000"; d="scan'208";a="116674909" Received: from litbin-desktop.sh.intel.com ([10.239.156.93]) by fmviesa006-auth.fm.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 26 Feb 2025 17:19:25 -0800 From: Binbin Wu To: pbonzini@redhat.com, seanjc@google.com, kvm@vger.kernel.org Cc: rick.p.edgecombe@intel.com, kai.huang@intel.com, adrian.hunter@intel.com, reinette.chatre@intel.com, xiaoyao.li@intel.com, tony.lindgren@intel.com, isaku.yamahata@intel.com, yan.y.zhao@intel.com, chao.gao@intel.com, linux-kernel@vger.kernel.org, binbin.wu@linux.intel.com Subject: [PATCH v2 12/20] KVM: TDX: Add methods to ignore accesses to CPU state Date: Thu, 27 Feb 2025 09:20:13 +0800 Message-ID: <20250227012021.1778144-13-binbin.wu@linux.intel.com> X-Mailer: git-send-email 2.46.0 In-Reply-To: <20250227012021.1778144-1-binbin.wu@linux.intel.com> References: <20250227012021.1778144-1-binbin.wu@linux.intel.com> Precedence: bulk X-Mailing-List: kvm@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 From: Isaku Yamahata TDX protects TDX guest state from VMM. Implement access methods for TDX guest state to ignore them or return zero. Because those methods can be called by kvm ioctls to set/get cpu registers, they don't have KVM_BUG_ON. Signed-off-by: Isaku Yamahata Co-developed-by: Adrian Hunter Signed-off-by: Adrian Hunter Co-developed-by: Binbin Wu Signed-off-by: Binbin Wu --- TDX "the rest" v2: - Discard the mistakenly introduced change. https://lore.kernel.org/kvm/67dd44fb-0211-4435-a294-b9f00dc681d8@linux.intel.com - Drop tdx_cache_reg() (Adrian, Sean) TDX "the rest" v1: - Dropped KVM_BUG_ON() in vt_sync_dirty_debug_regs(). (Rick) Since the KVM_BUG_ON() is removed, change the changlog accordingly. --- arch/x86/kvm/vmx/main.c | 307 ++++++++++++++++++++++++++++++++++++---- 1 file changed, 278 insertions(+), 29 deletions(-) diff --git a/arch/x86/kvm/vmx/main.c b/arch/x86/kvm/vmx/main.c index 7f3be1b65ce1..b76f39cc56fb 100644 --- a/arch/x86/kvm/vmx/main.c +++ b/arch/x86/kvm/vmx/main.c @@ -335,6 +335,214 @@ static void vt_deliver_interrupt(struct kvm_lapic *apic, int delivery_mode, vmx_deliver_interrupt(apic, delivery_mode, trig_mode, vector); } +static void vt_vcpu_after_set_cpuid(struct kvm_vcpu *vcpu) +{ + if (is_td_vcpu(vcpu)) + return; + + vmx_vcpu_after_set_cpuid(vcpu); +} + +static void vt_update_exception_bitmap(struct kvm_vcpu *vcpu) +{ + if (is_td_vcpu(vcpu)) + return; + + vmx_update_exception_bitmap(vcpu); +} + +static u64 vt_get_segment_base(struct kvm_vcpu *vcpu, int seg) +{ + if (is_td_vcpu(vcpu)) + return 0; + + return vmx_get_segment_base(vcpu, seg); +} + +static void vt_get_segment(struct kvm_vcpu *vcpu, struct kvm_segment *var, + int seg) +{ + if (is_td_vcpu(vcpu)) { + memset(var, 0, sizeof(*var)); + return; + } + + vmx_get_segment(vcpu, var, seg); +} + +static void vt_set_segment(struct kvm_vcpu *vcpu, struct kvm_segment *var, + int seg) +{ + if (is_td_vcpu(vcpu)) + return; + + vmx_set_segment(vcpu, var, seg); +} + +static int vt_get_cpl(struct kvm_vcpu *vcpu) +{ + if (is_td_vcpu(vcpu)) + return 0; + + return vmx_get_cpl(vcpu); +} + +static int vt_get_cpl_no_cache(struct kvm_vcpu *vcpu) +{ + if (is_td_vcpu(vcpu)) + return 0; + + return vmx_get_cpl_no_cache(vcpu); +} + +static void vt_get_cs_db_l_bits(struct kvm_vcpu *vcpu, int *db, int *l) +{ + if (is_td_vcpu(vcpu)) { + *db = 0; + *l = 0; + return; + } + + vmx_get_cs_db_l_bits(vcpu, db, l); +} + +static bool vt_is_valid_cr0(struct kvm_vcpu *vcpu, unsigned long cr0) +{ + if (is_td_vcpu(vcpu)) + return true; + + return vmx_is_valid_cr0(vcpu, cr0); +} + +static void vt_set_cr0(struct kvm_vcpu *vcpu, unsigned long cr0) +{ + if (is_td_vcpu(vcpu)) + return; + + vmx_set_cr0(vcpu, cr0); +} + +static bool vt_is_valid_cr4(struct kvm_vcpu *vcpu, unsigned long cr4) +{ + if (is_td_vcpu(vcpu)) + return true; + + return vmx_is_valid_cr4(vcpu, cr4); +} + +static void vt_set_cr4(struct kvm_vcpu *vcpu, unsigned long cr4) +{ + if (is_td_vcpu(vcpu)) + return; + + vmx_set_cr4(vcpu, cr4); +} + +static int vt_set_efer(struct kvm_vcpu *vcpu, u64 efer) +{ + if (is_td_vcpu(vcpu)) + return 0; + + return vmx_set_efer(vcpu, efer); +} + +static void vt_get_idt(struct kvm_vcpu *vcpu, struct desc_ptr *dt) +{ + if (is_td_vcpu(vcpu)) { + memset(dt, 0, sizeof(*dt)); + return; + } + + vmx_get_idt(vcpu, dt); +} + +static void vt_set_idt(struct kvm_vcpu *vcpu, struct desc_ptr *dt) +{ + if (is_td_vcpu(vcpu)) + return; + + vmx_set_idt(vcpu, dt); +} + +static void vt_get_gdt(struct kvm_vcpu *vcpu, struct desc_ptr *dt) +{ + if (is_td_vcpu(vcpu)) { + memset(dt, 0, sizeof(*dt)); + return; + } + + vmx_get_gdt(vcpu, dt); +} + +static void vt_set_gdt(struct kvm_vcpu *vcpu, struct desc_ptr *dt) +{ + if (is_td_vcpu(vcpu)) + return; + + vmx_set_gdt(vcpu, dt); +} + +static void vt_set_dr6(struct kvm_vcpu *vcpu, unsigned long val) +{ + if (is_td_vcpu(vcpu)) + return; + + vmx_set_dr6(vcpu, val); +} + +static void vt_set_dr7(struct kvm_vcpu *vcpu, unsigned long val) +{ + if (is_td_vcpu(vcpu)) + return; + + vmx_set_dr7(vcpu, val); +} + +static void vt_sync_dirty_debug_regs(struct kvm_vcpu *vcpu) +{ + /* + * MOV-DR exiting is always cleared for TD guest, even in debug mode. + * Thus KVM_DEBUGREG_WONT_EXIT can never be set and it should never + * reach here for TD vcpu. + */ + if (is_td_vcpu(vcpu)) + return; + + vmx_sync_dirty_debug_regs(vcpu); +} + +static void vt_cache_reg(struct kvm_vcpu *vcpu, enum kvm_reg reg) +{ + if (WARN_ON_ONCE(is_td_vcpu(vcpu))) + return; + + vmx_cache_reg(vcpu, reg); +} + +static unsigned long vt_get_rflags(struct kvm_vcpu *vcpu) +{ + if (is_td_vcpu(vcpu)) + return 0; + + return vmx_get_rflags(vcpu); +} + +static void vt_set_rflags(struct kvm_vcpu *vcpu, unsigned long rflags) +{ + if (is_td_vcpu(vcpu)) + return; + + vmx_set_rflags(vcpu, rflags); +} + +static bool vt_get_if_flag(struct kvm_vcpu *vcpu) +{ + if (is_td_vcpu(vcpu)) + return false; + + return vmx_get_if_flag(vcpu); +} + static void vt_flush_tlb_all(struct kvm_vcpu *vcpu) { if (is_td_vcpu(vcpu)) { @@ -457,6 +665,14 @@ static void vt_inject_irq(struct kvm_vcpu *vcpu, bool reinjected) vmx_inject_irq(vcpu, reinjected); } +static void vt_inject_exception(struct kvm_vcpu *vcpu) +{ + if (is_td_vcpu(vcpu)) + return; + + vmx_inject_exception(vcpu); +} + static void vt_cancel_injection(struct kvm_vcpu *vcpu) { if (is_td_vcpu(vcpu)) @@ -504,6 +720,14 @@ static void vt_get_exit_info(struct kvm_vcpu *vcpu, u32 *reason, vmx_get_exit_info(vcpu, reason, info1, info2, intr_info, error_code); } +static void vt_update_cr8_intercept(struct kvm_vcpu *vcpu, int tpr, int irr) +{ + if (is_td_vcpu(vcpu)) + return; + + vmx_update_cr8_intercept(vcpu, tpr, irr); +} + static void vt_set_apic_access_page_addr(struct kvm_vcpu *vcpu) { if (is_td_vcpu(vcpu)) @@ -522,6 +746,30 @@ static void vt_refresh_apicv_exec_ctrl(struct kvm_vcpu *vcpu) vmx_refresh_apicv_exec_ctrl(vcpu); } +static void vt_load_eoi_exitmap(struct kvm_vcpu *vcpu, u64 *eoi_exit_bitmap) +{ + if (is_td_vcpu(vcpu)) + return; + + vmx_load_eoi_exitmap(vcpu, eoi_exit_bitmap); +} + +static int vt_set_tss_addr(struct kvm *kvm, unsigned int addr) +{ + if (is_td(kvm)) + return 0; + + return vmx_set_tss_addr(kvm, addr); +} + +static int vt_set_identity_map_addr(struct kvm *kvm, u64 ident_addr) +{ + if (is_td(kvm)) + return 0; + + return vmx_set_identity_map_addr(kvm, ident_addr); +} + static int vt_mem_enc_ioctl(struct kvm *kvm, void __user *argp) { if (!is_td(kvm)) @@ -583,32 +831,33 @@ struct kvm_x86_ops vt_x86_ops __initdata = { .vcpu_load = vt_vcpu_load, .vcpu_put = vt_vcpu_put, - .update_exception_bitmap = vmx_update_exception_bitmap, + .update_exception_bitmap = vt_update_exception_bitmap, .get_feature_msr = vmx_get_feature_msr, .get_msr = vt_get_msr, .set_msr = vt_set_msr, - .get_segment_base = vmx_get_segment_base, - .get_segment = vmx_get_segment, - .set_segment = vmx_set_segment, - .get_cpl = vmx_get_cpl, - .get_cpl_no_cache = vmx_get_cpl_no_cache, - .get_cs_db_l_bits = vmx_get_cs_db_l_bits, - .is_valid_cr0 = vmx_is_valid_cr0, - .set_cr0 = vmx_set_cr0, - .is_valid_cr4 = vmx_is_valid_cr4, - .set_cr4 = vmx_set_cr4, - .set_efer = vmx_set_efer, - .get_idt = vmx_get_idt, - .set_idt = vmx_set_idt, - .get_gdt = vmx_get_gdt, - .set_gdt = vmx_set_gdt, - .set_dr6 = vmx_set_dr6, - .set_dr7 = vmx_set_dr7, - .sync_dirty_debug_regs = vmx_sync_dirty_debug_regs, - .cache_reg = vmx_cache_reg, - .get_rflags = vmx_get_rflags, - .set_rflags = vmx_set_rflags, - .get_if_flag = vmx_get_if_flag, + + .get_segment_base = vt_get_segment_base, + .get_segment = vt_get_segment, + .set_segment = vt_set_segment, + .get_cpl = vt_get_cpl, + .get_cpl_no_cache = vt_get_cpl_no_cache, + .get_cs_db_l_bits = vt_get_cs_db_l_bits, + .is_valid_cr0 = vt_is_valid_cr0, + .set_cr0 = vt_set_cr0, + .is_valid_cr4 = vt_is_valid_cr4, + .set_cr4 = vt_set_cr4, + .set_efer = vt_set_efer, + .get_idt = vt_get_idt, + .set_idt = vt_set_idt, + .get_gdt = vt_get_gdt, + .set_gdt = vt_set_gdt, + .set_dr6 = vt_set_dr6, + .set_dr7 = vt_set_dr7, + .sync_dirty_debug_regs = vt_sync_dirty_debug_regs, + .cache_reg = vt_cache_reg, + .get_rflags = vt_get_rflags, + .set_rflags = vt_set_rflags, + .get_if_flag = vt_get_if_flag, .flush_tlb_all = vt_flush_tlb_all, .flush_tlb_current = vt_flush_tlb_current, @@ -625,7 +874,7 @@ struct kvm_x86_ops vt_x86_ops __initdata = { .patch_hypercall = vmx_patch_hypercall, .inject_irq = vt_inject_irq, .inject_nmi = vt_inject_nmi, - .inject_exception = vmx_inject_exception, + .inject_exception = vt_inject_exception, .cancel_injection = vt_cancel_injection, .interrupt_allowed = vt_interrupt_allowed, .nmi_allowed = vt_nmi_allowed, @@ -633,13 +882,13 @@ struct kvm_x86_ops vt_x86_ops __initdata = { .set_nmi_mask = vt_set_nmi_mask, .enable_nmi_window = vt_enable_nmi_window, .enable_irq_window = vt_enable_irq_window, - .update_cr8_intercept = vmx_update_cr8_intercept, + .update_cr8_intercept = vt_update_cr8_intercept, .x2apic_icr_is_split = false, .set_virtual_apic_mode = vt_set_virtual_apic_mode, .set_apic_access_page_addr = vt_set_apic_access_page_addr, .refresh_apicv_exec_ctrl = vt_refresh_apicv_exec_ctrl, - .load_eoi_exitmap = vmx_load_eoi_exitmap, + .load_eoi_exitmap = vt_load_eoi_exitmap, .apicv_pre_state_restore = vt_apicv_pre_state_restore, .required_apicv_inhibits = VMX_REQUIRED_APICV_INHIBITS, .hwapic_isr_update = vt_hwapic_isr_update, @@ -647,14 +896,14 @@ struct kvm_x86_ops vt_x86_ops __initdata = { .deliver_interrupt = vt_deliver_interrupt, .dy_apicv_has_pending_interrupt = pi_has_pending_interrupt, - .set_tss_addr = vmx_set_tss_addr, - .set_identity_map_addr = vmx_set_identity_map_addr, + .set_tss_addr = vt_set_tss_addr, + .set_identity_map_addr = vt_set_identity_map_addr, .get_mt_mask = vmx_get_mt_mask, .get_exit_info = vt_get_exit_info, .get_entry_info = vt_get_entry_info, - .vcpu_after_set_cpuid = vmx_vcpu_after_set_cpuid, + .vcpu_after_set_cpuid = vt_vcpu_after_set_cpuid, .has_wbinvd_exit = cpu_has_vmx_wbinvd_exit, From patchwork Thu Feb 27 01:20:14 2025 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Binbin Wu X-Patchwork-Id: 13993414 Received: from mgamail.intel.com (mgamail.intel.com [198.175.65.9]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 590371487D1; Thu, 27 Feb 2025 01:19:32 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=198.175.65.9 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1740619173; cv=none; b=YTXlxL4SOFLh1cpRYLgVe2rhJM0K3qnhLDoC+ZhFSfkLVbZQMeg4Wz6UV45m6TmEnIJL2geWJfupSSiY2+zDH9B1nX29r3JcyUZb/APBNGaruiWi+tzEosaB42fpJTk3Zbuh2tVdK0y55J5x351RH0lO6WydMCpMG1JT30kJchM= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1740619173; c=relaxed/simple; bh=RFuuXxLhAxJ8SSjFqPETXY9iTs1K4e+otykdJV4ex5w=; h=From:To:Cc:Subject:Date:Message-ID:In-Reply-To:References: MIME-Version; b=n3M95z78pSBKuWz6CY2GHUd5MZVXSG6rVyAogyeJgx+J74F0gm0vkBNhGBuXrWeo7yrV66VbAMnEpkl5JYuBYA8fNT6W5TVS+wTZSd8gAENdeByNf0p5ljSWekqyCSZ6mxIAdv1YVVTrDs3xQEK4D5RCnoE952CbM1fc0TH19PE= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=linux.intel.com; spf=none smtp.mailfrom=linux.intel.com; dkim=pass (2048-bit key) header.d=intel.com header.i=@intel.com header.b=jJydBdrD; arc=none smtp.client-ip=198.175.65.9 Authentication-Results: smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=linux.intel.com Authentication-Results: smtp.subspace.kernel.org; spf=none smtp.mailfrom=linux.intel.com Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=intel.com header.i=@intel.com header.b="jJydBdrD" DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=intel.com; i=@intel.com; q=dns/txt; s=Intel; t=1740619172; x=1772155172; h=from:to:cc:subject:date:message-id:in-reply-to: references:mime-version:content-transfer-encoding; bh=RFuuXxLhAxJ8SSjFqPETXY9iTs1K4e+otykdJV4ex5w=; b=jJydBdrDzmZLcTv8GleIAAAnga8K4bGpvfdiuB2o6KE2nYs+1LH0Qvss wR/qX2nO/vmDhmRxj/mgyPC5lqFagLnw1d2uy7/jg2GIUtW4rzlcWhw+5 uG8qpJWsnAy8Ix36XiqQLExWT4Y9WCYADv8oM78+hDIZNc6uaDnt+aCRh i965VXs7QkFAU8AzMIYxUYQ40/zzKpCY58rJBpJcUEGUPfIkPkMvL8ZG5 Fy3J1kF7Evsw8onZ6c/LFH+7Nmr8SjUbJjWdfv1n+Ww1yVxVj0wZZ9lgp CYFxXejhMLyVW1okfUveaXbmGr0iUmBwsSpmVRB5MCtPZVDqX4RtO8/t9 A==; X-CSE-ConnectionGUID: CuFk5ddbRCGnDDH9fOzQ4g== X-CSE-MsgGUID: kZZ5qlmCSA+taPr0TinM3Q== X-IronPort-AV: E=McAfee;i="6700,10204,11357"; a="63959644" X-IronPort-AV: E=Sophos;i="6.13,318,1732608000"; d="scan'208";a="63959644" Received: from fmviesa006.fm.intel.com ([10.60.135.146]) by orvoesa101.jf.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 26 Feb 2025 17:19:32 -0800 X-CSE-ConnectionGUID: 3jdzHrL+TiOMnWAEY/9saA== X-CSE-MsgGUID: eDhEubT4RrOkLxVAQu63NQ== X-ExtLoop1: 1 X-IronPort-AV: E=Sophos;i="6.13,318,1732608000"; d="scan'208";a="116674913" Received: from litbin-desktop.sh.intel.com ([10.239.156.93]) by fmviesa006-auth.fm.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 26 Feb 2025 17:19:28 -0800 From: Binbin Wu To: pbonzini@redhat.com, seanjc@google.com, kvm@vger.kernel.org Cc: rick.p.edgecombe@intel.com, kai.huang@intel.com, adrian.hunter@intel.com, reinette.chatre@intel.com, xiaoyao.li@intel.com, tony.lindgren@intel.com, isaku.yamahata@intel.com, yan.y.zhao@intel.com, chao.gao@intel.com, linux-kernel@vger.kernel.org, binbin.wu@linux.intel.com Subject: [PATCH v2 13/20] KVM: TDX: Add method to ignore guest instruction emulation Date: Thu, 27 Feb 2025 09:20:14 +0800 Message-ID: <20250227012021.1778144-14-binbin.wu@linux.intel.com> X-Mailer: git-send-email 2.46.0 In-Reply-To: <20250227012021.1778144-1-binbin.wu@linux.intel.com> References: <20250227012021.1778144-1-binbin.wu@linux.intel.com> Precedence: bulk X-Mailing-List: kvm@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 From: Isaku Yamahata Skip instruction emulation and let the TDX guest retry for MMIO emulation after installing the MMIO SPTE with suppress #VE bit cleared. TDX protects TDX guest state from VMM, instructions in guest memory cannot be emulated. MMIO emulation is the only case that triggers the instruction emulation code path for TDX guest. The MMIO emulation handling flow as following: - The TDX guest issues a vMMIO instruction. (The GPA must be shared and is not covered by KVM memory slot.) - The default SPTE entry for shared-EPT by KVM has suppress #VE bit set. So EPT violation causes TD exit to KVM. - Trigger KVM page fault handler and install a new SPTE with suppress #VE bit cleared. - Skip instruction emulation and return X86EMU_RETRY_INSTR to let the vCPU retry. - TDX guest re-executes the vMMIO instruction. - TDX guest gets #VE because KVM has cleared #VE suppress bit. - TDX guest #VE handler converts MMIO into TDG.VP.VMCALL Return X86EMU_RETRY_INSTR in the callback check_emulate_instruction() for TDX guests to retry the MMIO instruction. Also, the instruction emulation handling will be skipped, so that the callback check_intercept() will never be called for TDX guest. Signed-off-by: Isaku Yamahata Co-developed-by: Binbin Wu Signed-off-by: Binbin Wu --- TDX "the rest" v2: - No change. TDX "the rest" v1: - Dropped vt_check_intercept(). - Add a comment in vt_check_emulate_instruction(). - Update the changelog. --- arch/x86/kvm/vmx/main.c | 18 +++++++++++++++++- 1 file changed, 17 insertions(+), 1 deletion(-) diff --git a/arch/x86/kvm/vmx/main.c b/arch/x86/kvm/vmx/main.c index b76f39cc56fb..035c3ed263b7 100644 --- a/arch/x86/kvm/vmx/main.c +++ b/arch/x86/kvm/vmx/main.c @@ -278,6 +278,22 @@ static void vt_enable_smi_window(struct kvm_vcpu *vcpu) } #endif +static int vt_check_emulate_instruction(struct kvm_vcpu *vcpu, int emul_type, + void *insn, int insn_len) +{ + /* + * For TDX, this can only be triggered for MMIO emulation. Let the + * guest retry after installing the SPTE with suppress #VE bit cleared, + * so that the guest will receive #VE when retry. The guest is expected + * to call TDG.VP.VMCALL to request VMM to do MMIO emulation on + * #VE. + */ + if (is_td_vcpu(vcpu)) + return X86EMUL_RETRY_INSTR; + + return vmx_check_emulate_instruction(vcpu, emul_type, insn, insn_len); +} + static bool vt_apic_init_signal_blocked(struct kvm_vcpu *vcpu) { /* @@ -938,7 +954,7 @@ struct kvm_x86_ops vt_x86_ops __initdata = { .enable_smi_window = vt_enable_smi_window, #endif - .check_emulate_instruction = vmx_check_emulate_instruction, + .check_emulate_instruction = vt_check_emulate_instruction, .apic_init_signal_blocked = vt_apic_init_signal_blocked, .migrate_timers = vmx_migrate_timers, From patchwork Thu Feb 27 01:20:15 2025 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Binbin Wu X-Patchwork-Id: 13993415 Received: from mgamail.intel.com (mgamail.intel.com [198.175.65.9]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 98BBB14885D; Thu, 27 Feb 2025 01:19:35 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=198.175.65.9 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1740619177; cv=none; b=cEKYVzI56LbJjxqLrtFknMws0TLCdBoa8la0D5Ni9h0B/+7gYZZgTNNa5xFE1+YvfgrVWxkp681lKi9qLpDff0lzmfZ5ZgsUVPwCCcbGfq+7iiFe9ofxVrjuA9GCb407IIOi03MTurb8aVUWZey82vXaIov5EltA7v58QQWshCY= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1740619177; c=relaxed/simple; bh=SUrcgcg5oLsLNX/TkXFvCFnyDw/4Fg8vQrrN66xcXXs=; h=From:To:Cc:Subject:Date:Message-ID:In-Reply-To:References: MIME-Version; b=Ta04gYAY2xnkBYjJkbLTyMReUzPZXzO/oA8teQ+FpB3kmwUOyh/unfFyGy6jNjoZ7T0cC6KwID/tTfEFafFSlROYnEqKX1ymLRIuSgcN0tCQq2RoozWbsD4quF9y/xFle/XArr9E/bH4UfrZJdjZxiN9nY8bqd57QOo+iGM+eJs= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=linux.intel.com; spf=none smtp.mailfrom=linux.intel.com; dkim=pass (2048-bit key) header.d=intel.com header.i=@intel.com header.b=da8GQCWT; arc=none smtp.client-ip=198.175.65.9 Authentication-Results: smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=linux.intel.com Authentication-Results: smtp.subspace.kernel.org; spf=none smtp.mailfrom=linux.intel.com Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=intel.com header.i=@intel.com header.b="da8GQCWT" DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=intel.com; i=@intel.com; q=dns/txt; s=Intel; t=1740619176; x=1772155176; h=from:to:cc:subject:date:message-id:in-reply-to: references:mime-version:content-transfer-encoding; bh=SUrcgcg5oLsLNX/TkXFvCFnyDw/4Fg8vQrrN66xcXXs=; b=da8GQCWTBGcjxJ8myovRGfpLCtD5BC9no9w3fKimOpVBUJ5M8P5dv6np SpCHLBwiASVj9wqF0LWjzK93B5rkmDEdQEQ37kyjpjxaRMUNqFGwFf2+B Rn+u/pUsQG2QZRIHw8Slg3gFLA+p50QdGfe/Ts+q6OtmCCcgYJDIuBxew 83PSXGdfuy58moSda52aV3EdJhB13af1A8fd9AtZFcri2FajYgiJo/wGO Mzb9h+T0AihKqV/3y/vExK1UzzD5dQ9njckHqJESS60+mfLUHXh7TY72V KBPMM6YhFzljkmLQAoRCw62cJXCTK4/fyhcvCCR54ok4ApVVGU2BFCRoS A==; X-CSE-ConnectionGUID: dRjJMhf3TOCzvY5RZ96L/w== X-CSE-MsgGUID: jFbYD8KbRoKgCyH3R17Qcw== X-IronPort-AV: E=McAfee;i="6700,10204,11357"; a="63959654" X-IronPort-AV: E=Sophos;i="6.13,318,1732608000"; d="scan'208";a="63959654" Received: from fmviesa006.fm.intel.com ([10.60.135.146]) by orvoesa101.jf.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 26 Feb 2025 17:19:35 -0800 X-CSE-ConnectionGUID: fQqVmp//Stib7W157oTdkw== X-CSE-MsgGUID: 1AS+1REqSa2+t6glqUBusA== X-ExtLoop1: 1 X-IronPort-AV: E=Sophos;i="6.13,318,1732608000"; d="scan'208";a="116674917" Received: from litbin-desktop.sh.intel.com ([10.239.156.93]) by fmviesa006-auth.fm.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 26 Feb 2025 17:19:32 -0800 From: Binbin Wu To: pbonzini@redhat.com, seanjc@google.com, kvm@vger.kernel.org Cc: rick.p.edgecombe@intel.com, kai.huang@intel.com, adrian.hunter@intel.com, reinette.chatre@intel.com, xiaoyao.li@intel.com, tony.lindgren@intel.com, isaku.yamahata@intel.com, yan.y.zhao@intel.com, chao.gao@intel.com, linux-kernel@vger.kernel.org, binbin.wu@linux.intel.com Subject: [PATCH v2 14/20] KVM: TDX: Add methods to ignore VMX preemption timer Date: Thu, 27 Feb 2025 09:20:15 +0800 Message-ID: <20250227012021.1778144-15-binbin.wu@linux.intel.com> X-Mailer: git-send-email 2.46.0 In-Reply-To: <20250227012021.1778144-1-binbin.wu@linux.intel.com> References: <20250227012021.1778144-1-binbin.wu@linux.intel.com> Precedence: bulk X-Mailing-List: kvm@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 From: Isaku Yamahata TDX doesn't support VMX preemption timer. Implement access methods for VMM to ignore VMX preemption timer. Signed-off-by: Isaku Yamahata Signed-off-by: Binbin Wu --- TDX "the rest" v2: - No change. TDX "the rest" v1: - Dropped KVM_BUG_ON() in vt_cancel_hv_timer(). (Rick) --- arch/x86/kvm/vmx/main.c | 25 +++++++++++++++++++++++-- 1 file changed, 23 insertions(+), 2 deletions(-) diff --git a/arch/x86/kvm/vmx/main.c b/arch/x86/kvm/vmx/main.c index 035c3ed263b7..75e7fef7914e 100644 --- a/arch/x86/kvm/vmx/main.c +++ b/arch/x86/kvm/vmx/main.c @@ -786,6 +786,27 @@ static int vt_set_identity_map_addr(struct kvm *kvm, u64 ident_addr) return vmx_set_identity_map_addr(kvm, ident_addr); } +#ifdef CONFIG_X86_64 +static int vt_set_hv_timer(struct kvm_vcpu *vcpu, u64 guest_deadline_tsc, + bool *expired) +{ + /* VMX-preemption timer isn't available for TDX. */ + if (is_td_vcpu(vcpu)) + return -EINVAL; + + return vmx_set_hv_timer(vcpu, guest_deadline_tsc, expired); +} + +static void vt_cancel_hv_timer(struct kvm_vcpu *vcpu) +{ + /* VMX-preemption timer can't be set. See vt_set_hv_timer(). */ + if (is_td_vcpu(vcpu)) + return; + + vmx_cancel_hv_timer(vcpu); +} +#endif + static int vt_mem_enc_ioctl(struct kvm *kvm, void __user *argp) { if (!is_td(kvm)) @@ -941,8 +962,8 @@ struct kvm_x86_ops vt_x86_ops __initdata = { .pi_start_assignment = vmx_pi_start_assignment, #ifdef CONFIG_X86_64 - .set_hv_timer = vmx_set_hv_timer, - .cancel_hv_timer = vmx_cancel_hv_timer, + .set_hv_timer = vt_set_hv_timer, + .cancel_hv_timer = vt_cancel_hv_timer, #endif .setup_mce = vmx_setup_mce, From patchwork Thu Feb 27 01:20:16 2025 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Binbin Wu X-Patchwork-Id: 13993416 Received: from mgamail.intel.com (mgamail.intel.com [198.175.65.9]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id E3FFF1B21B4; Thu, 27 Feb 2025 01:19:38 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=198.175.65.9 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1740619180; cv=none; b=iu1dlNHK2AFi9RMKTW71VIsL5s93intz+BE2p1EoI43Y4YzgzKg1vXNjE+vCY5yQAcXFfsj46JT3+2wSCxnOfToqx3Uxt9NqPLYZxAhecawXKKY5CvTtAo/9O1kMCNb1GptM0E35zPWBW4g3AB15g30hquQWyZxnGTVTrHaHj/0= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1740619180; c=relaxed/simple; bh=pVYU29wbge2aVLiMpQm4eg0isUm9O54M4ws6Q27APC8=; h=From:To:Cc:Subject:Date:Message-ID:In-Reply-To:References: MIME-Version; b=Xlz/nDuZnv9i/UIXOU0Ujjs5EAE2vOQkr1iaRiOfIm7ik0MMP8RhK3GKFk6lMrReA2MTwXVC/lpxxofihzdJxDmF3K1Qrxly9qF5cAgdi5DbTZCy9WMNRGhZ5Lav2KSvL91ZIa9yF9qyYkZmgb1BMkp0eKNRm467SN8DRu3qBvY= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=linux.intel.com; spf=none smtp.mailfrom=linux.intel.com; dkim=pass (2048-bit key) header.d=intel.com header.i=@intel.com header.b=j+HpiFm6; arc=none smtp.client-ip=198.175.65.9 Authentication-Results: smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=linux.intel.com Authentication-Results: smtp.subspace.kernel.org; spf=none smtp.mailfrom=linux.intel.com Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=intel.com header.i=@intel.com header.b="j+HpiFm6" DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=intel.com; i=@intel.com; q=dns/txt; s=Intel; t=1740619179; x=1772155179; h=from:to:cc:subject:date:message-id:in-reply-to: references:mime-version:content-transfer-encoding; bh=pVYU29wbge2aVLiMpQm4eg0isUm9O54M4ws6Q27APC8=; b=j+HpiFm6L317UN6Sc06JUZgSliK5eu1k0Ca3351Xgp0X1GENYTR3ZNfR 13YidBTLx2ItiJ/PS7drsAJRSHi+OB70f9s/YHvhkIZGajvZYbLl0aQIX dVz2Lm3jy4OZhOd/P1LwmF2ucqOP4CNYqjG7lOxlf4Kd5wgMjklpeJ1Ka YREGhEea3ueYo33LDLzWOQTsDxlw0zvs5ninxNUi+bk3ef7sYWmn1h4Ki IQA2uYP4KX65netuEhiuhB1yVQlztkIMXqMr9zmotBniqYeo33MI3Q9Vb C4rDM84MeooNE0erCejjsIJ9e1NB7e9nX98Yr7aEPwX2jSyZsu23d5x2d w==; X-CSE-ConnectionGUID: 2D1yOiLIQza/HQ10SWXJYw== X-CSE-MsgGUID: LfslKYtDSMOjGcWXU1d6fw== X-IronPort-AV: E=McAfee;i="6700,10204,11357"; a="63959667" X-IronPort-AV: E=Sophos;i="6.13,318,1732608000"; d="scan'208";a="63959667" Received: from fmviesa006.fm.intel.com ([10.60.135.146]) by orvoesa101.jf.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 26 Feb 2025 17:19:39 -0800 X-CSE-ConnectionGUID: DuHM7DIRS9yW1xpL5/TXvQ== X-CSE-MsgGUID: PM5BR6GrQOqJNSmSQQdX7Q== X-ExtLoop1: 1 X-IronPort-AV: E=Sophos;i="6.13,318,1732608000"; d="scan'208";a="116674920" Received: from litbin-desktop.sh.intel.com ([10.239.156.93]) by fmviesa006-auth.fm.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 26 Feb 2025 17:19:35 -0800 From: Binbin Wu To: pbonzini@redhat.com, seanjc@google.com, kvm@vger.kernel.org Cc: rick.p.edgecombe@intel.com, kai.huang@intel.com, adrian.hunter@intel.com, reinette.chatre@intel.com, xiaoyao.li@intel.com, tony.lindgren@intel.com, isaku.yamahata@intel.com, yan.y.zhao@intel.com, chao.gao@intel.com, linux-kernel@vger.kernel.org, binbin.wu@linux.intel.com Subject: [PATCH v2 15/20] KVM: TDX: Add methods to ignore accesses to TSC Date: Thu, 27 Feb 2025 09:20:16 +0800 Message-ID: <20250227012021.1778144-16-binbin.wu@linux.intel.com> X-Mailer: git-send-email 2.46.0 In-Reply-To: <20250227012021.1778144-1-binbin.wu@linux.intel.com> References: <20250227012021.1778144-1-binbin.wu@linux.intel.com> Precedence: bulk X-Mailing-List: kvm@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 From: Isaku Yamahata TDX protects TDX guest TSC state from VMM. Implement access methods to ignore guest TSC. Signed-off-by: Isaku Yamahata Signed-off-by: Binbin Wu --- TDX "the rest" v2: - No change. TDX "the rest" v1: - Dropped KVM_BUG_ON() in vt_get_l2_tsc_offset(). (Rick) --- arch/x86/kvm/vmx/main.c | 44 +++++++++++++++++++++++++++++++++++++---- 1 file changed, 40 insertions(+), 4 deletions(-) diff --git a/arch/x86/kvm/vmx/main.c b/arch/x86/kvm/vmx/main.c index 75e7fef7914e..554975f053ff 100644 --- a/arch/x86/kvm/vmx/main.c +++ b/arch/x86/kvm/vmx/main.c @@ -786,6 +786,42 @@ static int vt_set_identity_map_addr(struct kvm *kvm, u64 ident_addr) return vmx_set_identity_map_addr(kvm, ident_addr); } +static u64 vt_get_l2_tsc_offset(struct kvm_vcpu *vcpu) +{ + /* TDX doesn't support L2 guest at the moment. */ + if (is_td_vcpu(vcpu)) + return 0; + + return vmx_get_l2_tsc_offset(vcpu); +} + +static u64 vt_get_l2_tsc_multiplier(struct kvm_vcpu *vcpu) +{ + /* TDX doesn't support L2 guest at the moment. */ + if (is_td_vcpu(vcpu)) + return 0; + + return vmx_get_l2_tsc_multiplier(vcpu); +} + +static void vt_write_tsc_offset(struct kvm_vcpu *vcpu) +{ + /* In TDX, tsc offset can't be changed. */ + if (is_td_vcpu(vcpu)) + return; + + vmx_write_tsc_offset(vcpu); +} + +static void vt_write_tsc_multiplier(struct kvm_vcpu *vcpu) +{ + /* In TDX, tsc multiplier can't be changed. */ + if (is_td_vcpu(vcpu)) + return; + + vmx_write_tsc_multiplier(vcpu); +} + #ifdef CONFIG_X86_64 static int vt_set_hv_timer(struct kvm_vcpu *vcpu, u64 guest_deadline_tsc, bool *expired) @@ -944,10 +980,10 @@ struct kvm_x86_ops vt_x86_ops __initdata = { .has_wbinvd_exit = cpu_has_vmx_wbinvd_exit, - .get_l2_tsc_offset = vmx_get_l2_tsc_offset, - .get_l2_tsc_multiplier = vmx_get_l2_tsc_multiplier, - .write_tsc_offset = vmx_write_tsc_offset, - .write_tsc_multiplier = vmx_write_tsc_multiplier, + .get_l2_tsc_offset = vt_get_l2_tsc_offset, + .get_l2_tsc_multiplier = vt_get_l2_tsc_multiplier, + .write_tsc_offset = vt_write_tsc_offset, + .write_tsc_multiplier = vt_write_tsc_multiplier, .load_mmu_pgd = vt_load_mmu_pgd, From patchwork Thu Feb 27 01:20:17 2025 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Binbin Wu X-Patchwork-Id: 13993417 Received: from mgamail.intel.com (mgamail.intel.com [198.175.65.9]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 2BD9F1B4159; Thu, 27 Feb 2025 01:19:42 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=198.175.65.9 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1740619183; cv=none; b=MqHgssW6giF+QIG6074ZZEAjiDwCE1PT47pihcZ7yBkX8ZvReOnCboNDhvIoCcAe7uiaODy3t9un6WfFFj/QvZv78y2Jqc5eocBhgs5tgUN1nwvAWqmUrjI/LggzDOSzWM0hP9Jj56MWy9JcNcRJUrxfcYefqPizQ+D9uVIc4ZA= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1740619183; c=relaxed/simple; bh=zItX0PUquswBbQehRZ9Mq2klvkqoMesHQu9V+fBv0nQ=; h=From:To:Cc:Subject:Date:Message-ID:In-Reply-To:References: MIME-Version; b=pIiB7i9cOM6P0bPxQR0xWgm28e96GZWxNLrIH0wWu0Dap3TX17c1Qx6w0i6WKipmU+I617BvhomJmoAA6mN39OAStkxwoHzV/9HQ+ETjR4No8Fa8v96+9pXekIQ9MXkiK3pmAXJQpTaf/g/+Jkka9ELvY0yKdL5bA0jlPrPZwcc= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=linux.intel.com; spf=none smtp.mailfrom=linux.intel.com; dkim=pass (2048-bit key) header.d=intel.com header.i=@intel.com header.b=QICB+FfM; arc=none smtp.client-ip=198.175.65.9 Authentication-Results: smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=linux.intel.com Authentication-Results: smtp.subspace.kernel.org; spf=none smtp.mailfrom=linux.intel.com Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=intel.com header.i=@intel.com header.b="QICB+FfM" DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=intel.com; i=@intel.com; q=dns/txt; s=Intel; t=1740619182; x=1772155182; h=from:to:cc:subject:date:message-id:in-reply-to: references:mime-version:content-transfer-encoding; bh=zItX0PUquswBbQehRZ9Mq2klvkqoMesHQu9V+fBv0nQ=; b=QICB+FfMN4SgBmFcCZBl8h8VA8Bqh/YukE7YhV8rXZezz9NMRGWWpNLX +bhdk225ejeQCXHCOMIpG0Fiyi0El6rhJEsb2WV/vT+8YIi6sKTYglO/m PRJC/HRGJCiaz6zcuQ7TNW6CXRHJFxFTGkeLBEm9jIMLRVSZUQNjnATi5 CgRI1c56g6Aiku52JMzT/MbpsLVnncYkDNiK6oJ27TFJ6bciampSB9HTq FW8qdrrz+EJ7F8qS5aT1SoURyN3lHepk//sdKAV/gvDTieImMvAsS/3Nc waquQWDhT1CV4iDv2765RU8VZfWBGRIpnXhp4rcUenCcwN9xUeq5ww/E+ g==; X-CSE-ConnectionGUID: 0VFseoftTSiiw/Y4Vszj2Q== X-CSE-MsgGUID: ZHefEP26SyqDCpr61Z+3Iw== X-IronPort-AV: E=McAfee;i="6700,10204,11357"; a="63959676" X-IronPort-AV: E=Sophos;i="6.13,318,1732608000"; d="scan'208";a="63959676" Received: from fmviesa006.fm.intel.com ([10.60.135.146]) by orvoesa101.jf.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 26 Feb 2025 17:19:42 -0800 X-CSE-ConnectionGUID: IQQQZUV3S5m+DHW4GByRnA== X-CSE-MsgGUID: i6SFevVcQGO/Tdq1ptKaqQ== X-ExtLoop1: 1 X-IronPort-AV: E=Sophos;i="6.13,318,1732608000"; d="scan'208";a="116674923" Received: from litbin-desktop.sh.intel.com ([10.239.156.93]) by fmviesa006-auth.fm.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 26 Feb 2025 17:19:38 -0800 From: Binbin Wu To: pbonzini@redhat.com, seanjc@google.com, kvm@vger.kernel.org Cc: rick.p.edgecombe@intel.com, kai.huang@intel.com, adrian.hunter@intel.com, reinette.chatre@intel.com, xiaoyao.li@intel.com, tony.lindgren@intel.com, isaku.yamahata@intel.com, yan.y.zhao@intel.com, chao.gao@intel.com, linux-kernel@vger.kernel.org, binbin.wu@linux.intel.com Subject: [PATCH v2 16/20] KVM: TDX: Ignore setting up mce Date: Thu, 27 Feb 2025 09:20:17 +0800 Message-ID: <20250227012021.1778144-17-binbin.wu@linux.intel.com> X-Mailer: git-send-email 2.46.0 In-Reply-To: <20250227012021.1778144-1-binbin.wu@linux.intel.com> References: <20250227012021.1778144-1-binbin.wu@linux.intel.com> Precedence: bulk X-Mailing-List: kvm@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 From: Isaku Yamahata Because vmx_set_mce function is VMX specific and it cannot be used for TDX. Add vt stub to ignore setting up mce for TDX. Signed-off-by: Isaku Yamahata Signed-off-by: Binbin Wu --- TDX "the rest" v2: - No change. --- arch/x86/kvm/vmx/main.c | 10 +++++++++- 1 file changed, 9 insertions(+), 1 deletion(-) diff --git a/arch/x86/kvm/vmx/main.c b/arch/x86/kvm/vmx/main.c index 554975f053ff..d73ea9ce750d 100644 --- a/arch/x86/kvm/vmx/main.c +++ b/arch/x86/kvm/vmx/main.c @@ -843,6 +843,14 @@ static void vt_cancel_hv_timer(struct kvm_vcpu *vcpu) } #endif +static void vt_setup_mce(struct kvm_vcpu *vcpu) +{ + if (is_td_vcpu(vcpu)) + return; + + vmx_setup_mce(vcpu); +} + static int vt_mem_enc_ioctl(struct kvm *kvm, void __user *argp) { if (!is_td(kvm)) @@ -1002,7 +1010,7 @@ struct kvm_x86_ops vt_x86_ops __initdata = { .cancel_hv_timer = vt_cancel_hv_timer, #endif - .setup_mce = vmx_setup_mce, + .setup_mce = vt_setup_mce, #ifdef CONFIG_KVM_SMM .smi_allowed = vt_smi_allowed, From patchwork Thu Feb 27 01:20:18 2025 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Binbin Wu X-Patchwork-Id: 13993418 Received: from mgamail.intel.com (mgamail.intel.com [198.175.65.9]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 7240214B965; Thu, 27 Feb 2025 01:19:45 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=198.175.65.9 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1740619186; cv=none; b=XY8annml1qjJ04rGFsS952YYP2a7Xwfg+43trX+TI/OTz6YfNhUOyJJp7rNmqPj5NTdKI/2EsBOWBWp9Ur/kJbx2jlHNfH0AnOYrQ3IQLpODG/HRsXZM2Wc3nojko5iWtN8NW7NYhpgWbJHo/62SEzbSXqSU+9AoRKFOvOpEUAs= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1740619186; c=relaxed/simple; bh=1FiDmvpnC4DRyC4nO4hfG0+/4Ilohx50+cijVzK/D84=; h=From:To:Cc:Subject:Date:Message-ID:In-Reply-To:References: MIME-Version; b=Q8mOJ0WCGIZ1FrRHFSKBq/nKkTpmTjEfNBLJeuCp3b9uHGoU3c1JOu7aotLcZRaSvHXgl0zrQab3ihilX/6f+v00mud1vRCzPKB1kuhr3gFs/8EcJwbSQlvWZG1LPZyy+sXw5g58F9gWpKyXY+6/8Fs/V/bpeKHpF6c5ELiXPDQ= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=linux.intel.com; spf=none smtp.mailfrom=linux.intel.com; dkim=pass (2048-bit key) header.d=intel.com header.i=@intel.com header.b=Z4+Js9zr; arc=none smtp.client-ip=198.175.65.9 Authentication-Results: smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=linux.intel.com Authentication-Results: smtp.subspace.kernel.org; spf=none smtp.mailfrom=linux.intel.com Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=intel.com header.i=@intel.com header.b="Z4+Js9zr" DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=intel.com; i=@intel.com; q=dns/txt; s=Intel; t=1740619185; x=1772155185; h=from:to:cc:subject:date:message-id:in-reply-to: references:mime-version:content-transfer-encoding; bh=1FiDmvpnC4DRyC4nO4hfG0+/4Ilohx50+cijVzK/D84=; b=Z4+Js9zr0GbskW+f1BXxNsTd7kYV72fmilMFftevKvUk+zxG8ITwaTzC FXfGHuFmQ+yJvts3CfPSm2WUyGoG1RLbNW9wjOAX3mv3CYTXJZaCmElBM d9yX48CGcipLaMPCJCvcOqlal3Ry4Pi8AZOoq4uKf7yg+qJcYhaKo3pcc emXjjmeU+pOE1eIXBW9io9wGUWZV+xlzrC+RXhS7LP6/oJ0n/Rl8ZPzHY 7qB5GNRd+RIp7uXrOes5xVb0+Pg36jBE1n59IXF+5mJOKP8dFQpjCHYvP EeUJ/UZ2Ah9SONzZBOg8KODE2/Qo2L2kpzXBE6RGB9VqU724DSVCSb4qe g==; X-CSE-ConnectionGUID: HToI0uZgTbmUo8PjBibOJQ== X-CSE-MsgGUID: F6TwHFvaRX29bFmURnD6xA== X-IronPort-AV: E=McAfee;i="6700,10204,11357"; a="63959689" X-IronPort-AV: E=Sophos;i="6.13,318,1732608000"; d="scan'208";a="63959689" Received: from fmviesa006.fm.intel.com ([10.60.135.146]) by orvoesa101.jf.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 26 Feb 2025 17:19:45 -0800 X-CSE-ConnectionGUID: iSPkmwjEQY2LeeHmHd6tNg== X-CSE-MsgGUID: AC5V+EonQRCorckgj1u9Lw== X-ExtLoop1: 1 X-IronPort-AV: E=Sophos;i="6.13,318,1732608000"; d="scan'208";a="116674928" Received: from litbin-desktop.sh.intel.com ([10.239.156.93]) by fmviesa006-auth.fm.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 26 Feb 2025 17:19:41 -0800 From: Binbin Wu To: pbonzini@redhat.com, seanjc@google.com, kvm@vger.kernel.org Cc: rick.p.edgecombe@intel.com, kai.huang@intel.com, adrian.hunter@intel.com, reinette.chatre@intel.com, xiaoyao.li@intel.com, tony.lindgren@intel.com, isaku.yamahata@intel.com, yan.y.zhao@intel.com, chao.gao@intel.com, linux-kernel@vger.kernel.org, binbin.wu@linux.intel.com Subject: [PATCH v2 17/20] KVM: TDX: Add a method to ignore hypercall patching Date: Thu, 27 Feb 2025 09:20:18 +0800 Message-ID: <20250227012021.1778144-18-binbin.wu@linux.intel.com> X-Mailer: git-send-email 2.46.0 In-Reply-To: <20250227012021.1778144-1-binbin.wu@linux.intel.com> References: <20250227012021.1778144-1-binbin.wu@linux.intel.com> Precedence: bulk X-Mailing-List: kvm@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 From: Isaku Yamahata Because guest TD memory is protected, VMM patching guest binary for hypercall instruction isn't possible. Add a method to ignore hypercall patching. Note: guest TD kernel needs to be modified to use TDG.VP.VMCALL for hypercall. Signed-off-by: Isaku Yamahata Signed-off-by: Binbin Wu --- TDX "the rest" v2: - No change. TDX "the rest" v1: - Renamed from "KVM: TDX: Add a method to ignore for TDX to ignore hypercall patch" to "KVM: TDX: Add a method to ignore hypercall patching". - Dropped KVM_BUG_ON() in vt_patch_hypercall(). (Rick) - Remove "with a warning" from "Add a method to ignore hypercall patching with a warning." in changelog to reflect code change. --- arch/x86/kvm/vmx/main.c | 15 ++++++++++++++- 1 file changed, 14 insertions(+), 1 deletion(-) diff --git a/arch/x86/kvm/vmx/main.c b/arch/x86/kvm/vmx/main.c index d73ea9ce750d..fa8b5f609666 100644 --- a/arch/x86/kvm/vmx/main.c +++ b/arch/x86/kvm/vmx/main.c @@ -673,6 +673,19 @@ static u32 vt_get_interrupt_shadow(struct kvm_vcpu *vcpu) return vmx_get_interrupt_shadow(vcpu); } +static void vt_patch_hypercall(struct kvm_vcpu *vcpu, + unsigned char *hypercall) +{ + /* + * Because guest memory is protected, guest can't be patched. TD kernel + * is modified to use TDG.VP.VMCALL for hypercall. + */ + if (is_td_vcpu(vcpu)) + return; + + vmx_patch_hypercall(vcpu, hypercall); +} + static void vt_inject_irq(struct kvm_vcpu *vcpu, bool reinjected) { if (is_td_vcpu(vcpu)) @@ -952,7 +965,7 @@ struct kvm_x86_ops vt_x86_ops __initdata = { .update_emulated_instruction = vmx_update_emulated_instruction, .set_interrupt_shadow = vt_set_interrupt_shadow, .get_interrupt_shadow = vt_get_interrupt_shadow, - .patch_hypercall = vmx_patch_hypercall, + .patch_hypercall = vt_patch_hypercall, .inject_irq = vt_inject_irq, .inject_nmi = vt_inject_nmi, .inject_exception = vt_inject_exception, From patchwork Thu Feb 27 01:20:19 2025 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Binbin Wu X-Patchwork-Id: 13993419 Received: from mgamail.intel.com (mgamail.intel.com [198.175.65.9]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 893951C07DA; Thu, 27 Feb 2025 01:19:48 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=198.175.65.9 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1740619189; cv=none; b=TMY+Qrm6I3+8CcL/ZCihOiBhIRElN1fxJ/F4+3s1Hh5UfvdptVzgnya6bsvSU7CUqpXqz/R2riq+LeLkbuhJDgMfHphtyOuBwfja8oZZ2Lqpw4BgZEiS3LkAKDXgA21mmq+wNCxvir9R+EYaHr8VF3keunNXbh+4ygiCv7R1+AE= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1740619189; c=relaxed/simple; bh=XJaBTgpvmUOYXvblbK+evXNJHErt7dbEPe57+h9OFdA=; h=From:To:Cc:Subject:Date:Message-ID:In-Reply-To:References: MIME-Version; b=egLnN5bH3bwBcXzmxYudYXcFDtsRuGu/WHacEy7ALquBtF4MC+mW4PLgYw+wpih2RGp8DjHq54vlDeoeKqjLICQaX8oOomfcLrliAsaBTV+H8/wC2GmyWDjxxj9nWVo3TlMKif7c7X8HdXpi9PRkIJWTpnBiABT+LRUmZFd/TO8= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=linux.intel.com; spf=none smtp.mailfrom=linux.intel.com; dkim=pass (2048-bit key) header.d=intel.com header.i=@intel.com header.b=TZ+oJogq; arc=none smtp.client-ip=198.175.65.9 Authentication-Results: smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=linux.intel.com Authentication-Results: smtp.subspace.kernel.org; spf=none smtp.mailfrom=linux.intel.com Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=intel.com header.i=@intel.com header.b="TZ+oJogq" DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=intel.com; i=@intel.com; q=dns/txt; s=Intel; t=1740619189; x=1772155189; h=from:to:cc:subject:date:message-id:in-reply-to: references:mime-version:content-transfer-encoding; bh=XJaBTgpvmUOYXvblbK+evXNJHErt7dbEPe57+h9OFdA=; b=TZ+oJogq8qu4MVLVQPO0wrOSru5DurmAbmTeQs/XhNRV9X9vff/KqDSQ XXjTRlLCEEKzfyMiLuwr2tGdLjUvLKVVNcUG7g0IyW1H7ZRMse8P+SGmK siG2ycJ7LfmBAtkeFHRoHtR8pBopyokg11pgfr7XkMutjP7dK3zfLZ8z3 OkPK7FICKCO/c7WxQpCR1V71OxmYn1Botiu8D8Tm2q/NjrxPSA9PcHlVN JDvC2jE0Npb6sXpS0on6IMHtA3Ukfl+tgg4C/u6hm6ra3OnQ6/dNUqhmb e6Zism2VSbw2nFUguQ/hBGRHONxsBcH/UNuAsQjvsJNJmPNZgKuHHAZmN Q==; X-CSE-ConnectionGUID: BYKBpeBKQTuUWI45+sEEoQ== X-CSE-MsgGUID: bgYJCz4MRoWzSm6kfllh9g== X-IronPort-AV: E=McAfee;i="6700,10204,11357"; a="63959695" X-IronPort-AV: E=Sophos;i="6.13,318,1732608000"; d="scan'208";a="63959695" Received: from fmviesa006.fm.intel.com ([10.60.135.146]) by orvoesa101.jf.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 26 Feb 2025 17:19:48 -0800 X-CSE-ConnectionGUID: dkpu7eQWTFucAns7gVrD+g== X-CSE-MsgGUID: Um/k5dJ0TI2X3mXmyP3P0A== X-ExtLoop1: 1 X-IronPort-AV: E=Sophos;i="6.13,318,1732608000"; d="scan'208";a="116674931" Received: from litbin-desktop.sh.intel.com ([10.239.156.93]) by fmviesa006-auth.fm.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 26 Feb 2025 17:19:45 -0800 From: Binbin Wu To: pbonzini@redhat.com, seanjc@google.com, kvm@vger.kernel.org Cc: rick.p.edgecombe@intel.com, kai.huang@intel.com, adrian.hunter@intel.com, reinette.chatre@intel.com, xiaoyao.li@intel.com, tony.lindgren@intel.com, isaku.yamahata@intel.com, yan.y.zhao@intel.com, chao.gao@intel.com, linux-kernel@vger.kernel.org, binbin.wu@linux.intel.com Subject: [PATCH v2 18/20] KVM: TDX: Enable guest access to MTRR MSRs Date: Thu, 27 Feb 2025 09:20:19 +0800 Message-ID: <20250227012021.1778144-19-binbin.wu@linux.intel.com> X-Mailer: git-send-email 2.46.0 In-Reply-To: <20250227012021.1778144-1-binbin.wu@linux.intel.com> References: <20250227012021.1778144-1-binbin.wu@linux.intel.com> Precedence: bulk X-Mailing-List: kvm@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 Allow TDX guests to access MTRR MSRs as what KVM does for normal VMs, i.e., KVM emulates accesses to MTRR MSRs, but doesn't virtualize guest MTRR memory types. TDX module exposes MTRR feature to TDX guests unconditionally. KVM needs to support MTRR MSRs accesses for TDX guests to match the architectural behavior. Signed-off-by: Binbin Wu --- TDX "the rest" v2: - Add the MTRR access support back, but drop the special handling for TDX guests, just align with what KVM does for normal VMs. --- arch/x86/kvm/vmx/tdx.c | 3 +++ 1 file changed, 3 insertions(+) diff --git a/arch/x86/kvm/vmx/tdx.c b/arch/x86/kvm/vmx/tdx.c index dbc9fffcbc26..5b957e5e503d 100644 --- a/arch/x86/kvm/vmx/tdx.c +++ b/arch/x86/kvm/vmx/tdx.c @@ -2049,6 +2049,9 @@ bool tdx_has_emulated_msr(u32 index) case MSR_IA32_ARCH_CAPABILITIES: case MSR_IA32_POWER_CTL: case MSR_IA32_CR_PAT: + case MSR_MTRRcap: + case MTRRphysBase_MSR(0) ... MSR_MTRRfix4K_F8000: + case MSR_MTRRdefType: case MSR_IA32_TSC_DEADLINE: case MSR_IA32_MISC_ENABLE: case MSR_PLATFORM_INFO: From patchwork Thu Feb 27 01:20:20 2025 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Binbin Wu X-Patchwork-Id: 13993420 Received: from mgamail.intel.com (mgamail.intel.com [198.175.65.9]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id EBF721C5D63; Thu, 27 Feb 2025 01:19:51 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=198.175.65.9 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1740619193; cv=none; b=WocB+RczTTRJYvYCF+EcBqSqqZUiziY606nkp1o5HDv6Ruy1Ld9M2TbFGPEqZ3XiMI6MD4HB4voNkWEc1LQncUPEE8gq/7mI2SRtH5BJl0G44fmyTEx0jXJ66jOQ7QCjBTC1+G87OEREEHcSC4X4I/rYW2tf9+9R/oQTo6TGUgw= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1740619193; c=relaxed/simple; bh=69h9BawlmRPy4kcfbZCH7uVMvPfzAdDJo52SPxWa6Ec=; h=From:To:Cc:Subject:Date:Message-ID:In-Reply-To:References: MIME-Version; b=Sw5bZAKU4AaKrYmJrI2/oXq0bzcqiXj8CAY6K+S3YdD5ZqSxRwBxx/lPv6Lofi+4JJCNv/5WUmRfSrYpXPRhlmYZW5wtEzRU+RICl0jNYzxCk9h5lIcinS67SiOb64LwBCchgnvAMebUun4hPzZso2ucqq0FA61oc6QJU92uiME= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=linux.intel.com; spf=none smtp.mailfrom=linux.intel.com; dkim=pass (2048-bit key) header.d=intel.com header.i=@intel.com header.b=bgjTqVw7; arc=none smtp.client-ip=198.175.65.9 Authentication-Results: smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=linux.intel.com Authentication-Results: smtp.subspace.kernel.org; spf=none smtp.mailfrom=linux.intel.com Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=intel.com header.i=@intel.com header.b="bgjTqVw7" DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=intel.com; i=@intel.com; q=dns/txt; s=Intel; t=1740619192; x=1772155192; h=from:to:cc:subject:date:message-id:in-reply-to: references:mime-version:content-transfer-encoding; bh=69h9BawlmRPy4kcfbZCH7uVMvPfzAdDJo52SPxWa6Ec=; b=bgjTqVw78XbR0DMZ8y4/j++w/3+143IsqNqcaGPNrp0NEEsbDU+k8fuK irXJ9YN7cEciD6vt9VkjaMqmqjt585n18PrtgzbMYMqZQn6GDYGxL4/7j lbICneFvtRwJzEbbVxe/vgN2Vg/wBkjcmceMP4OP3HsvMo7Qoo4LyT38R J5wdq0MN2FpWQAJUsaj7ESJ6KUBfTdzedAwi52RC93tLaE7gRZJRnlhsy zN2VT1gNaz/fEei8hfeUzJwGHdLALaQppAtp7lcN5ebxQYmhWBiW2IR8P 3L4au1+uQh/DfzKvH94H2bNpeP/+9QTGDBewFasiDDZuSHqoIhFodfOEu g==; X-CSE-ConnectionGUID: WI5UhokuQuGWvW26BQR9+g== X-CSE-MsgGUID: KUCUUu0NQZytttnV0B34jw== X-IronPort-AV: E=McAfee;i="6700,10204,11357"; a="63959705" X-IronPort-AV: E=Sophos;i="6.13,318,1732608000"; d="scan'208";a="63959705" Received: from fmviesa006.fm.intel.com ([10.60.135.146]) by orvoesa101.jf.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 26 Feb 2025 17:19:52 -0800 X-CSE-ConnectionGUID: sqt1etO+R8WB9MUAhgAXIQ== X-CSE-MsgGUID: oGhUGVg4RGuyI61XIhJlrw== X-ExtLoop1: 1 X-IronPort-AV: E=Sophos;i="6.13,318,1732608000"; d="scan'208";a="116674941" Received: from litbin-desktop.sh.intel.com ([10.239.156.93]) by fmviesa006-auth.fm.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 26 Feb 2025 17:19:48 -0800 From: Binbin Wu To: pbonzini@redhat.com, seanjc@google.com, kvm@vger.kernel.org Cc: rick.p.edgecombe@intel.com, kai.huang@intel.com, adrian.hunter@intel.com, reinette.chatre@intel.com, xiaoyao.li@intel.com, tony.lindgren@intel.com, isaku.yamahata@intel.com, yan.y.zhao@intel.com, chao.gao@intel.com, linux-kernel@vger.kernel.org, binbin.wu@linux.intel.com Subject: [PATCH v2 19/20] KVM: TDX: Make TDX VM type supported Date: Thu, 27 Feb 2025 09:20:20 +0800 Message-ID: <20250227012021.1778144-20-binbin.wu@linux.intel.com> X-Mailer: git-send-email 2.46.0 In-Reply-To: <20250227012021.1778144-1-binbin.wu@linux.intel.com> References: <20250227012021.1778144-1-binbin.wu@linux.intel.com> Precedence: bulk X-Mailing-List: kvm@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 From: Isaku Yamahata Now all the necessary code for TDX is in place, it's ready to run TDX guest. Advertise the VM type of KVM_X86_TDX_VM so that the user space VMM like QEMU can start to use it. Signed-off-by: Isaku Yamahata Signed-off-by: Binbin Wu --- TDX "the rest" v2: - No change. TDX "the rest" v1: - Move down to the end of patch series. --- arch/x86/kvm/vmx/main.c | 1 + 1 file changed, 1 insertion(+) diff --git a/arch/x86/kvm/vmx/main.c b/arch/x86/kvm/vmx/main.c index fa8b5f609666..320c96e1e80a 100644 --- a/arch/x86/kvm/vmx/main.c +++ b/arch/x86/kvm/vmx/main.c @@ -1092,6 +1092,7 @@ static int __init vt_init(void) vcpu_align = max_t(unsigned, vcpu_align, __alignof__(struct vcpu_tdx)); kvm_caps.supported_quirks &= ~KVM_X86_QUIRK_EPT_IGNORE_GUEST_PAT; + kvm_caps.supported_vm_types |= BIT(KVM_X86_TDX_VM); } /* From patchwork Thu Feb 27 01:20:21 2025 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Binbin Wu X-Patchwork-Id: 13993421 Received: from mgamail.intel.com (mgamail.intel.com [198.175.65.9]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 406421D5AAE; Thu, 27 Feb 2025 01:19:55 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=198.175.65.9 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1740619197; cv=none; b=thrTj/BLMia9X3ZwQgCujfnz5Ht45302oXxTXPuqATV8YFDpBfj6Xvv/73zItp2K4jKnepucemcg+zzW4k0JFlgCPqn/ibh5iQY/gSrtOpsYM4SvQsSID/yjP0DOB+BCcfYZitOPyxORFuWCYpIQsH1vpZMjHO65+k+pKtXS4eA= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1740619197; c=relaxed/simple; bh=mIjc+dLi3yzXCyrtTzBHorO1x7uI89b87fbmJwN5hf4=; h=From:To:Cc:Subject:Date:Message-ID:In-Reply-To:References: MIME-Version; b=VuW2M+s01c0gXMANMUSOePeXiuk9K/5PPjoCLCn+JYxYMqSB1HmIgaf74z4oZClaeD5tva3eeIqhJUEOrQBU746MOR472yF8Z0hMrSxc3Y4XH7U7BtdTUupASGzBwu0gIBRKXAQwzlmB7gjaRDBoTbMlNeiNDUCFWT9EOH4qb/Q= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=linux.intel.com; spf=none smtp.mailfrom=linux.intel.com; dkim=pass (2048-bit key) header.d=intel.com header.i=@intel.com header.b=N6h7vY4J; arc=none smtp.client-ip=198.175.65.9 Authentication-Results: smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=linux.intel.com Authentication-Results: smtp.subspace.kernel.org; spf=none smtp.mailfrom=linux.intel.com Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=intel.com header.i=@intel.com header.b="N6h7vY4J" DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=intel.com; i=@intel.com; q=dns/txt; s=Intel; t=1740619195; x=1772155195; h=from:to:cc:subject:date:message-id:in-reply-to: references:mime-version:content-transfer-encoding; bh=mIjc+dLi3yzXCyrtTzBHorO1x7uI89b87fbmJwN5hf4=; b=N6h7vY4JIOSRThjDJj2eULJnweECnZDKhnL+xMjaHOA/cmoFKO4A3qWF BYUv6PticyGmf7UrtPL275gN/EfaaNDKPtvlWLTl8tiDbTeUH2c138Bz3 dAe5U5Nfzr9O6DvZYfIYiXgm11rPZftRfVbJZwD8Vnz9jdCShImqg/CnZ wiHa5ODKkpz/k845nrEOdGk5+K39lkKA4CM4O2w1fO0lo8PsRHRPzQpI1 1l7zhE47eRUmhAtBagLFEJSFtZfwwCXCtJRUrDhfZN9CmHFg0v+oWZl7/ NDxm+5soivwD1aXNU1qDiF5mgpetZnSL8wNFBZXN0GU0VkRTKO7F+w+Ys w==; X-CSE-ConnectionGUID: Z8beyJ6zTvmH2/Jm7JaKAg== X-CSE-MsgGUID: sAwaCRRbRGaQ34FQzP7Vmw== X-IronPort-AV: E=McAfee;i="6700,10204,11357"; a="63959712" X-IronPort-AV: E=Sophos;i="6.13,318,1732608000"; d="scan'208";a="63959712" Received: from fmviesa006.fm.intel.com ([10.60.135.146]) by orvoesa101.jf.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 26 Feb 2025 17:19:55 -0800 X-CSE-ConnectionGUID: x2H/xLwYT+qC6U4pKLvfuQ== X-CSE-MsgGUID: h3A4Th8kSmmqbsLzAVo7tA== X-ExtLoop1: 1 X-IronPort-AV: E=Sophos;i="6.13,318,1732608000"; d="scan'208";a="116674951" Received: from litbin-desktop.sh.intel.com ([10.239.156.93]) by fmviesa006-auth.fm.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 26 Feb 2025 17:19:51 -0800 From: Binbin Wu To: pbonzini@redhat.com, seanjc@google.com, kvm@vger.kernel.org Cc: rick.p.edgecombe@intel.com, kai.huang@intel.com, adrian.hunter@intel.com, reinette.chatre@intel.com, xiaoyao.li@intel.com, tony.lindgren@intel.com, isaku.yamahata@intel.com, yan.y.zhao@intel.com, chao.gao@intel.com, linux-kernel@vger.kernel.org, binbin.wu@linux.intel.com Subject: [PATCH v2 20/20] Documentation/virt/kvm: Document on Trust Domain Extensions (TDX) Date: Thu, 27 Feb 2025 09:20:21 +0800 Message-ID: <20250227012021.1778144-21-binbin.wu@linux.intel.com> X-Mailer: git-send-email 2.46.0 In-Reply-To: <20250227012021.1778144-1-binbin.wu@linux.intel.com> References: <20250227012021.1778144-1-binbin.wu@linux.intel.com> Precedence: bulk X-Mailing-List: kvm@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 From: Isaku Yamahata Add documentation to Intel Trusted Domain Extensions (TDX) support. Signed-off-by: Isaku Yamahata Signed-off-by: Binbin Wu --- TDX "the rest" v2: - Update for configurable CPUID comment (Tony) - Add the documentation for KVM_TDX_GET_CPUID - Add missing white space. (Kai) - Consolidate description of KVM_MEMORY_ENCRYPT_OP for SEV and TDX. (Kai) - Update the "Overview" part. (Kai) - Sync description of struct kvm_tdx_cmd with source code. (Kai) - Update description of KVM_TDX_CAPABILITIES. (Xiaoyao) - Use hw_error instead of error and remove the description of "unused" for TDX specific sub ioctl commands. (Kai) - Add description for return values for TDX specific sub-ioctl() commands. (Kai) - Replace "SEAM call" with "SEAMCALL", and replace "sub ioctl" with "sub-ioctl()" for consistency. (Kai) - Remove the detailed SEAMCALLs when describing TDX specific sub-ioctl() commands. (Kai, Xiaoyao) - Update description for KVM_TDX_INIT_VM, KVM_TDX_INIT_VCPU, and KVM_TDX_INIT_MEM_REGION. (Xiaoyao) - Update "KVM TDX creation flow". (Kai, Xiaoyao) - Fix typos in enum/struct comments (Xiaoyao) - Remove design section (Kai) TDX "the rest" v1: - Updates to match code changes (Tony) --- Documentation/virt/kvm/api.rst | 13 +- Documentation/virt/kvm/x86/index.rst | 1 + Documentation/virt/kvm/x86/intel-tdx.rst | 255 +++++++++++++++++++++++ 3 files changed, 265 insertions(+), 4 deletions(-) create mode 100644 Documentation/virt/kvm/x86/intel-tdx.rst diff --git a/Documentation/virt/kvm/api.rst b/Documentation/virt/kvm/api.rst index 641d3537a38d..7e155977a8ed 100644 --- a/Documentation/virt/kvm/api.rst +++ b/Documentation/virt/kvm/api.rst @@ -1407,6 +1407,9 @@ the memory region are automatically reflected into the guest. For example, an mmap() that affects the region will be made visible immediately. Another example is madvise(MADV_DROP). +For TDX guest, deleting/moving memory region loses guest memory contents. +Read only region isn't supported. Only as-id 0 is supported. + Note: On arm64, a write generated by the page-table walker (to update the Access and Dirty flags, for example) never results in a KVM_EXIT_MMIO exit when the slot has the KVM_MEM_READONLY flag. This @@ -4764,7 +4767,7 @@ H_GET_CPU_CHARACTERISTICS hypercall. :Capability: basic :Architectures: x86 -:Type: vm +:Type: vm ioctl, vcpu ioctl :Parameters: an opaque platform specific structure (in/out) :Returns: 0 on success; -1 on error @@ -4772,9 +4775,11 @@ If the platform supports creating encrypted VMs then this ioctl can be used for issuing platform-specific memory encryption commands to manage those encrypted VMs. -Currently, this ioctl is used for issuing Secure Encrypted Virtualization -(SEV) commands on AMD Processors. The SEV commands are defined in -Documentation/virt/kvm/x86/amd-memory-encryption.rst. +Currently, this ioctl is used for issuing both Secure Encrypted Virtualization +(SEV) commands on AMD Processors and Trusted Domain Extensions (TDX) commands +on Intel Processors. The detailed commands are defined in +Documentation/virt/kvm/x86/amd-memory-encryption.rst and +Documentation/virt/kvm/x86/intel-tdx.rst. 4.111 KVM_MEMORY_ENCRYPT_REG_REGION ----------------------------------- diff --git a/Documentation/virt/kvm/x86/index.rst b/Documentation/virt/kvm/x86/index.rst index 9ece6b8dc817..851e99174762 100644 --- a/Documentation/virt/kvm/x86/index.rst +++ b/Documentation/virt/kvm/x86/index.rst @@ -11,6 +11,7 @@ KVM for x86 systems cpuid errata hypercalls + intel-tdx mmu msr nested-vmx diff --git a/Documentation/virt/kvm/x86/intel-tdx.rst b/Documentation/virt/kvm/x86/intel-tdx.rst new file mode 100644 index 000000000000..de41d4c01e5c --- /dev/null +++ b/Documentation/virt/kvm/x86/intel-tdx.rst @@ -0,0 +1,255 @@ +.. SPDX-License-Identifier: GPL-2.0 + +=================================== +Intel Trust Domain Extensions (TDX) +=================================== + +Overview +======== +Intel's Trust Domain Extensions (TDX) protect confidential guest VMs from the +host and physical attacks. A CPU-attested software module called 'the TDX +module' runs inside a new CPU isolated range to provide the functionalities to +manage and run protected VMs, a.k.a, TDX guests or TDs. + +Please refer to [1] for the whitepaper, specifications and other resources. + +This documentation describes TDX-specific KVM ABIs. The TDX module needs to be +initialized before it can be used by KVM to run any TDX guests. The host +core-kernel provides the support of initializing the TDX module, which is +described in the Documentation/arch/x86/tdx.rst. + +API description +=============== + +KVM_MEMORY_ENCRYPT_OP +--------------------- +:Type: vm ioctl, vcpu ioctl + +For TDX operations, KVM_MEMORY_ENCRYPT_OP is re-purposed to be generic +ioctl with TDX specific sub-ioctl() commands. + +:: + + /* Trust Domain Extensions sub-ioctl() commands. */ + enum kvm_tdx_cmd_id { + KVM_TDX_CAPABILITIES = 0, + KVM_TDX_INIT_VM, + KVM_TDX_INIT_VCPU, + KVM_TDX_INIT_MEM_REGION, + KVM_TDX_FINALIZE_VM, + KVM_TDX_GET_CPUID, + + KVM_TDX_CMD_NR_MAX, + }; + + struct kvm_tdx_cmd { + /* enum kvm_tdx_cmd_id */ + __u32 id; + /* flags for sub-command. If sub-command doesn't use this, set zero. */ + __u32 flags; + /* + * data for each sub-command. An immediate or a pointer to the actual + * data in process virtual address. If sub-command doesn't use it, + * set zero. + */ + __u64 data; + /* + * Auxiliary error code. The sub-command may return TDX SEAMCALL + * status code in addition to -Exxx. + */ + __u64 hw_error; + }; + +KVM_TDX_CAPABILITIES +-------------------- +:Type: vm ioctl +:Returns: 0 on success, <0 on error + +Return the TDX capabilities that current KVM supports with the specific TDX +module loaded in the system. It reports what features/capabilities are allowed +to be configured to the TDX guest. + +- id: KVM_TDX_CAPABILITIES +- flags: must be 0 +- data: pointer to struct kvm_tdx_capabilities +- hw_error: must be 0 + +:: + + struct kvm_tdx_capabilities { + __u64 supported_attrs; + __u64 supported_xfam; + __u64 reserved[254]; + + /* Configurable CPUID bits for userspace */ + struct kvm_cpuid2 cpuid; + }; + + +KVM_TDX_INIT_VM +--------------- +:Type: vm ioctl +:Returns: 0 on success, <0 on error + +Perform TDX specific VM initialization. This needs to be called after +KVM_CREATE_VM and before creating any VCPUs. + +- id: KVM_TDX_INIT_VM +- flags: must be 0 +- data: pointer to struct kvm_tdx_init_vm +- hw_error: must be 0 + +:: + + struct kvm_tdx_init_vm { + __u64 attributes; + __u64 xfam; + __u64 mrconfigid[6]; /* sha384 digest */ + __u64 mrowner[6]; /* sha384 digest */ + __u64 mrownerconfig[6]; /* sha384 digest */ + + /* The total space for TD_PARAMS before the CPUIDs is 256 bytes */ + __u64 reserved[12]; + + /* + * Call KVM_TDX_INIT_VM before vcpu creation, thus before + * KVM_SET_CPUID2. + * This configuration supersedes KVM_SET_CPUID2s for VCPUs because the + * TDX module directly virtualizes those CPUIDs without VMM. The user + * space VMM, e.g. qemu, should make KVM_SET_CPUID2 consistent with + * those values. If it doesn't, KVM may have wrong idea of vCPUIDs of + * the guest, and KVM may wrongly emulate CPUIDs or MSRs that the TDX + * module doesn't virtualize. + */ + struct kvm_cpuid2 cpuid; + }; + + +KVM_TDX_INIT_VCPU +----------------- +:Type: vcpu ioctl +:Returns: 0 on success, <0 on error + +Perform TDX specific VCPU initialization. + +- id: KVM_TDX_INIT_VCPU +- flags: must be 0 +- data: initial value of the guest TD VCPU RCX +- hw_error: must be 0 + +KVM_TDX_INIT_MEM_REGION +----------------------- +:Type: vcpu ioctl +:Returns: 0 on success, <0 on error + +Initialize @nr_pages TDX guest private memory starting from @gpa with userspace +provided data from @source_addr. + +Note, before calling this sub command, memory attribute of the range +[gpa, gpa + nr_pages] needs to be private. Userspace can use +KVM_SET_MEMORY_ATTRIBUTES to set the attribute. + +If KVM_TDX_MEASURE_MEMORY_REGION flag is specified, it also extends measurement. + +- id: KVM_TDX_INIT_MEM_REGION +- flags: currently only KVM_TDX_MEASURE_MEMORY_REGION is defined +- data: pointer to struct kvm_tdx_init_mem_region +- hw_error: must be 0 + +:: + + #define KVM_TDX_MEASURE_MEMORY_REGION (1UL << 0) + + struct kvm_tdx_init_mem_region { + __u64 source_addr; + __u64 gpa; + __u64 nr_pages; + }; + + +KVM_TDX_FINALIZE_VM +------------------- +:Type: vm ioctl +:Returns: 0 on success, <0 on error + +Complete measurement of the initial TD contents and mark it ready to run. + +- id: KVM_TDX_FINALIZE_VM +- flags: must be 0 +- data: must be 0 +- hw_error: must be 0 + + +KVM_TDX_GET_CPUID +----------------- +:Type: vcpu ioctl +:Returns: 0 on success, <0 on error + +Get the CPUID values that the TDX module virtualizes for the TD guest. +When it returns -E2BIG, the user space should allocate a larger buffer and +retry. The minimum buffer size is updated in the nent field of the +struct kvm_cpuid2. + +- id: KVM_TDX_GET_CPUID +- flags: must be 0 +- data: pointer to struct kvm_cpuid2 (in/out) +- hw_error: must be 0 (out) + +:: + + struct kvm_cpuid2 { + __u32 nent; + __u32 padding; + struct kvm_cpuid_entry2 entries[0]; + }; + + struct kvm_cpuid_entry2 { + __u32 function; + __u32 index; + __u32 flags; + __u32 eax; + __u32 ebx; + __u32 ecx; + __u32 edx; + __u32 padding[3]; + }; + +KVM TDX creation flow +===================== +In addition to the standard KVM flow, new TDX ioctls need to be called. The +control flow is as follows: + +#. Check system wide capability + + * KVM_CAP_VM_TYPES: Check if VM type is supported and if KVM_X86_TDX_VM + is supported. + +#. Create VM + + * KVM_CREATE_VM + * KVM_TDX_CAPABILITIES: Query TDX capabilities for creating TDX guests. + * KVM_CHECK_EXTENSION(KVM_CAP_MAX_VCPUS): Query maximum VCPUs the TD can + support at VM level (TDX has its own limitation on this). + * KVM_SET_TSC_KHZ: Configure TD's TSC frequency if a different TSC frequency + than host is desired. This is Optional. + * KVM_TDX_INIT_VM: Pass TDX specific VM parameters. + +#. Create VCPU + + * KVM_CREATE_VCPU + * KVM_TDX_INIT_VCPU: Pass TDX specific VCPU parameters. + * KVM_SET_CPUID2: Configure TD's CPUIDs. + * KVM_SET_MSRS: Configure TD's MSRs. + +#. Initialize initial guest memory + + * Prepare content of initial guest memory. + * KVM_TDX_INIT_MEM_REGION: Add initial guest memory. + * KVM_TDX_FINALIZE_VM: Finalize the measurement of the TDX guest. + +#. Run VCPU + +References +========== + +.. [1] https://www.intel.com/content/www/us/en/developer/tools/trust-domain-extensions/documentation.html