From patchwork Tue Feb 11 02:54:35 2025
From: Binbin Wu
To: pbonzini@redhat.com, seanjc@google.com, kvm@vger.kernel.org
Cc: rick.p.edgecombe@intel.com, kai.huang@intel.com, adrian.hunter@intel.com,
	reinette.chatre@intel.com, xiaoyao.li@intel.com, tony.lindgren@intel.com,
	isaku.yamahata@intel.com, yan.y.zhao@intel.com, chao.gao@intel.com,
	linux-kernel@vger.kernel.org, binbin.wu@linux.intel.com
Subject: [PATCH v2 1/8] KVM: x86: Have ____kvm_emulate_hypercall() read the GPRs
Date: Tue, 11 Feb 2025 10:54:35 +0800
Message-ID: <20250211025442.3071607-2-binbin.wu@linux.intel.com>
In-Reply-To: <20250211025442.3071607-1-binbin.wu@linux.intel.com>
References: <20250211025442.3071607-1-binbin.wu@linux.intel.com>

Have ____kvm_emulate_hypercall() read the GPRs instead of passing them in
via the macro.

When emulating KVM hypercalls via TDVMCALL, TDX will marshal the registers
of the TDVMCALL ABI into KVM's x86 registers to match the definition of the
KVM hypercall ABI _before_ ____kvm_emulate_hypercall() gets called.
Therefore, ____kvm_emulate_hypercall() can simply read the registers
internally based on the KVM hypercall ABI, and those registers can be
removed from the __kvm_emulate_hypercall() macro.

Also, op_64_bit can be determined inside ____kvm_emulate_hypercall(), so
remove it from the __kvm_emulate_hypercall() macro as well.

No functional change intended.

Suggested-by: Sean Christopherson
Signed-off-by: Binbin Wu
Reviewed-by: Kai Huang
---
 arch/x86/kvm/x86.c | 15 ++++++++-------
 arch/x86/kvm/x86.h | 26 +++++++++-----------------
 2 files changed, 17 insertions(+), 24 deletions(-)

diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 6ace11303f90..29f33f7c9da9 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -10022,13 +10022,16 @@ static int complete_hypercall_exit(struct kvm_vcpu *vcpu)
 	return kvm_skip_emulated_instruction(vcpu);
 }
 
-int ____kvm_emulate_hypercall(struct kvm_vcpu *vcpu, unsigned long nr,
-			      unsigned long a0, unsigned long a1,
-			      unsigned long a2, unsigned long a3,
-			      int op_64_bit, int cpl,
+int ____kvm_emulate_hypercall(struct kvm_vcpu *vcpu, int cpl,
 			      int (*complete_hypercall)(struct kvm_vcpu *))
 {
 	unsigned long ret;
+	unsigned long nr = kvm_rax_read(vcpu);
+	unsigned long a0 = kvm_rbx_read(vcpu);
+	unsigned long a1 = kvm_rcx_read(vcpu);
+	unsigned long a2 = kvm_rdx_read(vcpu);
+	unsigned long a3 = kvm_rsi_read(vcpu);
+	int op_64_bit = is_64_bit_hypercall(vcpu);
 
 	++vcpu->stat.hypercalls;
 
@@ -10131,9 +10134,7 @@ int kvm_emulate_hypercall(struct kvm_vcpu *vcpu)
 	if (kvm_hv_hypercall_enabled(vcpu))
 		return kvm_hv_hypercall(vcpu);
 
-	return __kvm_emulate_hypercall(vcpu, rax, rbx, rcx, rdx, rsi,
-				       is_64_bit_hypercall(vcpu),
-				       kvm_x86_call(get_cpl)(vcpu),
+	return __kvm_emulate_hypercall(vcpu, kvm_x86_call(get_cpl)(vcpu),
 				       complete_hypercall_exit);
 }
 EXPORT_SYMBOL_GPL(kvm_emulate_hypercall);
diff --git a/arch/x86/kvm/x86.h b/arch/x86/kvm/x86.h
index 91e50a513100..8b27f70c6321 100644
--- a/arch/x86/kvm/x86.h
+++ b/arch/x86/kvm/x86.h
@@ -621,25 +621,17 @@ static inline bool user_exit_on_hypercall(struct kvm *kvm, unsigned long hc_nr)
 	return kvm->arch.hypercall_exit_enabled & BIT(hc_nr);
 }
 
-int ____kvm_emulate_hypercall(struct kvm_vcpu *vcpu, unsigned long nr,
-			      unsigned long a0, unsigned long a1,
-			      unsigned long a2, unsigned long a3,
-			      int op_64_bit, int cpl,
+int ____kvm_emulate_hypercall(struct kvm_vcpu *vcpu, int cpl,
 			      int (*complete_hypercall)(struct kvm_vcpu *));
 
-#define __kvm_emulate_hypercall(_vcpu, nr, a0, a1, a2, a3, op_64_bit, cpl, complete_hypercall)	\
-({												\
-	int __ret;										\
-												\
-	__ret = ____kvm_emulate_hypercall(_vcpu,						\
-					  kvm_##nr##_read(_vcpu), kvm_##a0##_read(_vcpu),	\
-					  kvm_##a1##_read(_vcpu), kvm_##a2##_read(_vcpu),	\
-					  kvm_##a3##_read(_vcpu), op_64_bit, cpl,		\
-					  complete_hypercall);					\
-												\
-	if (__ret > 0)										\
-		__ret = complete_hypercall(_vcpu);						\
-	__ret;											\
+#define __kvm_emulate_hypercall(_vcpu, cpl, complete_hypercall)			\
+({											\
+	int __ret;									\
+	__ret = ____kvm_emulate_hypercall(_vcpu, cpl, complete_hypercall);	\
+											\
+	if (__ret > 0)									\
+		__ret = complete_hypercall(_vcpu);					\
+	__ret;										\
 })
 
 int kvm_emulate_hypercall(struct kvm_vcpu *vcpu);

From patchwork Tue Feb 11 02:54:36 2025
From: Binbin Wu
To: pbonzini@redhat.com, seanjc@google.com, kvm@vger.kernel.org
Cc: rick.p.edgecombe@intel.com, kai.huang@intel.com, adrian.hunter@intel.com,
	reinette.chatre@intel.com, xiaoyao.li@intel.com, tony.lindgren@intel.com,
	isaku.yamahata@intel.com, yan.y.zhao@intel.com, chao.gao@intel.com,
	linux-kernel@vger.kernel.org, binbin.wu@linux.intel.com
Subject: [PATCH v2 2/8] KVM: TDX: Add a place holder to handle TDX VM exit
Date: Tue, 11 Feb 2025 10:54:36 +0800
Message-ID: <20250211025442.3071607-3-binbin.wu@linux.intel.com>
In-Reply-To: <20250211025442.3071607-1-binbin.wu@linux.intel.com>
References: <20250211025442.3071607-1-binbin.wu@linux.intel.com>

From: Isaku Yamahata

Introduce the wiring for handling TDX VM exits by implementing the
callbacks .get_exit_info(), .get_entry_info(), and .handle_exit().
Additionally, add error handling to the TDX VM exit flow, and add a
placeholder to handle the various exit reasons.

Store the VMX exit reason and exit qualification in struct vcpu_vt for
TDX, so that TDX and VMX can use the same helpers to get the exit reason
and exit qualification.
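
To illustrate why a shared exit-reason field lets TDX and VMX reuse the
same helpers, here is a minimal user-space C sketch; it is not kernel
code. It assumes the VMX exit-reason layout (basic reason in bits 15:0,
failed VM entry in bit 31) and assumes that the TDH.VP.ENTER status
occupies bits 63:32 of the return value with the exit reason in bits
31:0. The model_* and MODEL_* names are hypothetical stand-ins for the
kernel's union vmx_exit_reason, tdx_to_vmx_exit_reason() and
tdx_failed_vmentry(), and only two of the accepted status codes are
modeled. It also shows why -1u must be excluded from the failed-VM-entry
check: the all-ones sentinel has bit 31 set too.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical model of the VMX exit-reason layout. Assumes GCC/Clang on
 * x86, which allocate bit-fields starting from the least significant bit. */
union model_exit_reason {
	struct {
		uint32_t basic : 16;         /* bits 15:0: basic exit reason */
		uint32_t reserved : 15;      /* bits 30:16: other flags, unused here */
		uint32_t failed_vmentry : 1; /* bit 31: VM entry failed */
	};
	uint32_t full;
};

static_assert(sizeof(union model_exit_reason) == sizeof(uint32_t),
	      "exit reason must stay 32 bits wide");

/* Assumed encoding: status code in bits 63:32, exit reason in bits 31:0. */
#define MODEL_STATUS_MASK          0xFFFFFFFF00000000ULL
#define MODEL_SUCCESS              0x0000000000000000ULL
#define MODEL_NON_RECOVERABLE_VCPU 0x4000000100000000ULL
#define MODEL_INVALID_EXIT_REASON  UINT32_MAX /* the "-1u" sentinel */

/* Keep the low 32 bits only for statuses that carry a valid exit reason. */
static uint32_t model_to_exit_reason(uint64_t vp_enter_ret)
{
	switch (vp_enter_ret & MODEL_STATUS_MASK) {
	case MODEL_SUCCESS:
	case MODEL_NON_RECOVERABLE_VCPU:
		return (uint32_t)vp_enter_ret;
	default:
		return MODEL_INVALID_EXIT_REASON;
	}
}

/* The sentinel also has bit 31 set, so it must not count as a failed entry. */
static bool model_failed_vmentry(union model_exit_reason r)
{
	return r.failed_vmentry && r.full != MODEL_INVALID_EXIT_REASON;
}
```

With this layout, a caller that only inspects .basic or .failed_vmentry
does not need to know whether the value was read from a VMCS or derived
from the TDH.VP.ENTER return value.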
Store the extended exit qualification and exit GPA in struct vcpu_tdx,
because only TDX code uses them.

Contention Handling: The TDH.VP.ENTER operation may contend with
TDH.MEM.* operations on the secure EPT or the TD EPOCH. When contention
occurs, the return value has TDX_OPERAND_BUSY set, and KVM re-enters the
guest by returning EXIT_FASTPATH_EXIT_HANDLED rather than
EXIT_FASTPATH_REENTER_GUEST. This forces the vCPU out of the fastpath so
that interrupts (e.g. IPIs) left pending while it was IN_GUEST_MODE are
guaranteed to be delivered; otherwise, the requester of
KVM_REQ_OUTSIDE_GUEST_MODE could be blocked indefinitely.

Error Handling:
- TDX_SW_ERROR: This includes #UD caused by the SEAMCALL instruction when
  the CPU isn't in VMX operation, #GP caused by the SEAMCALL instruction
  when TDX isn't enabled by the BIOS, and TDX_SEAMCALL_VMFAILINVALID when
  the SEAM firmware is not loaded or is disabled.
- TDX_ERROR: Some check in the TDX module failed, preventing the vCPU
  from running.
- Failed VM Entry: Exit to userspace with KVM_EXIT_FAIL_ENTRY. Handle it
  before TDX_NON_RECOVERABLE, because TDX_NON_RECOVERABLE is also set on
  a failed VM entry when off-TD debug is not enabled.
- TDX_NON_RECOVERABLE: Set by the TDX module when the error is
  non-recoverable, indicating that the TDX guest is dead or the vCPU is
  disabled. A special case is triple fault, which also sets
  TDX_NON_RECOVERABLE but exits to userspace with KVM_EXIT_SHUTDOWN,
  aligning with the VMX case.
- As with the TDX_SW_ERROR, TDX_ERROR and (non-triple-fault)
  TDX_NON_RECOVERABLE cases above, any unhandled VM exit reason returns
  to userspace with KVM_EXIT_INTERNAL_ERROR.

Suggested-by: Sean Christopherson
Signed-off-by: Isaku Yamahata
Co-developed-by: Binbin Wu
Signed-off-by: Binbin Wu
Reviewed-by: Chao Gao
---
Hypercalls exit to userspace v2:
- Record vmx exit reason and exit_qualification in struct vcpu_vt for
  TDX, so that TDX/VMX can use the same helpers to get exit reason and
  exit qualification. (Sean)
- Handle failed vmentry separately by KVM_EXIT_FAIL_ENTRY.
  (Xiaoyao)
- Remove the print of hkid & set_hkid_to_hpa() for TDX_ERROR or
  TDX_NON_RECOVERABLE case. (Xiaoyao)
- Handle EXIT_REASON_TRIPLE_FAULT in switch case, and drop the helper
  tdx_handle_triple_fault(), open code it. (Sean)
- intr_info should be 0 for the case VMX exit reason is invalid in
  tdx_get_exit_info(). (Chao)
- Combine TDX_OPERAND_BUSY for TDX_OPERAND_ID_TD_EPOCH and
  TDX_OPERAND_ID_SEPT, use EXIT_FASTPATH_EXIT_HANDLED instead of
  EXIT_FASTPATH_REENTER_GUEST. Updated comments.
- Use helper tdx_operand_busy().
- Add vt_get_entry_info() to implement .get_entry_info() for TDX.

Hypercalls exit to userspace v1:
- Dropped Paolo's Reviewed-by since the change is not subtle.
- Mention addition of .get_exit_info() handler in changelog. (Binbin)
- tdh_sept_seamcall() -> tdx_seamcall_sept() in comments. (Binbin)
- Do not open code TDX_ERROR_SEPT_BUSY. (Binbin)
- "TDH.VP.ENTRY" -> "TDH.VP.ENTER". (Binbin)
- Remove the use of union tdx_exit_reason. (Sean)
  https://lore.kernel.org/kvm/ZfSExlemFMKjBtZb@google.com/
- Add tdx_check_exit_reason() to check a VMX exit reason against the
  status code of TDH.VP.ENTER.
- Move the handling of TDX_ERROR_SEPT_BUSY and (TDX_OPERAND_BUSY |
  TDX_OPERAND_ID_TD_EPOCH) into fast path, and add a helper function
  tdx_exit_handlers_fastpath().
- Remove the warning on TDX_SW_ERROR in fastpath, but return without
  further handling.
- Call kvm_machine_check() for EXIT_REASON_MCE_DURING_VMENTRY, align with
  VMX case.
- On failed_vmentry in fast path, return without further handling.
- Exit to userspace for #UD and #GP.
- Fix whitespace in tdx_get_exit_info()
- Add a comment in tdx_handle_exit() to describe failed_vmentry case
  is handled by TDX_NON_RECOVERABLE handling.
- Move the code of handling NMI, exception and external interrupts out of
  the patch, i.e., the NMI handling in tdx_vcpu_enter_exit() and the
  wiring of .handle_exit_irqoff() are removed.
- Drop the check for VCPU_TD_STATE_INITIALIZED in tdx_handle_exit()
  because it has been checked in tdx_vcpu_pre_run().
- Update changelog.
---
 arch/x86/include/asm/tdx.h   |   1 +
 arch/x86/kvm/vmx/main.c      |  38 +++++++++-
 arch/x86/kvm/vmx/tdx.c       | 141 ++++++++++++++++++++++++++++++++++-
 arch/x86/kvm/vmx/tdx.h       |   2 +
 arch/x86/kvm/vmx/tdx_errno.h |   3 +
 arch/x86/kvm/vmx/x86_ops.h   |   8 ++
 6 files changed, 189 insertions(+), 4 deletions(-)

diff --git a/arch/x86/include/asm/tdx.h b/arch/x86/include/asm/tdx.h
index 8a47a69c148e..897db9392d7d 100644
--- a/arch/x86/include/asm/tdx.h
+++ b/arch/x86/include/asm/tdx.h
@@ -18,6 +18,7 @@
  * TDX module.
  */
 #define TDX_ERROR			_BITUL(63)
+#define TDX_NON_RECOVERABLE		_BITUL(62)
 #define TDX_SW_ERROR			(TDX_ERROR | GENMASK_ULL(47, 40))
 #define TDX_SEAMCALL_VMFAILINVALID	(TDX_SW_ERROR | _UL(0xFFFF0000))
 
diff --git a/arch/x86/kvm/vmx/main.c b/arch/x86/kvm/vmx/main.c
index 341aa537ca72..7f1318c44040 100644
--- a/arch/x86/kvm/vmx/main.c
+++ b/arch/x86/kvm/vmx/main.c
@@ -169,6 +169,15 @@ static fastpath_t vt_vcpu_run(struct kvm_vcpu *vcpu, bool force_immediate_exit)
 	return vmx_vcpu_run(vcpu, force_immediate_exit);
 }
 
+static int vt_handle_exit(struct kvm_vcpu *vcpu,
+			  enum exit_fastpath_completion fastpath)
+{
+	if (is_td_vcpu(vcpu))
+		return tdx_handle_exit(vcpu, fastpath);
+
+	return vmx_handle_exit(vcpu, fastpath);
+}
+
 static void vt_flush_tlb_all(struct kvm_vcpu *vcpu)
 {
 	if (is_td_vcpu(vcpu)) {
@@ -216,6 +225,29 @@ static void vt_load_mmu_pgd(struct kvm_vcpu *vcpu, hpa_t root_hpa,
 	vmx_load_mmu_pgd(vcpu, root_hpa, pgd_level);
 }
 
+static void vt_get_entry_info(struct kvm_vcpu *vcpu, u32 *intr_info, u32 *error_code)
+{
+	*intr_info = 0;
+	*error_code = 0;
+
+	if (is_td_vcpu(vcpu))
+		return;
+
+	vmx_get_entry_info(vcpu, intr_info, error_code);
+}
+
+static void vt_get_exit_info(struct kvm_vcpu *vcpu, u32 *reason,
+			     u64 *info1, u64 *info2, u32 *intr_info, u32 *error_code)
+{
+	if (is_td_vcpu(vcpu)) {
+		tdx_get_exit_info(vcpu, reason, info1, info2, intr_info,
+				  error_code);
+		return;
+	}
+
+	vmx_get_exit_info(vcpu, reason, info1, info2, intr_info, error_code);
+}
+
 static int vt_mem_enc_ioctl(struct kvm *kvm, void __user *argp)
 {
 	if (!is_td(kvm))
@@ -310,7 +342,7 @@ struct kvm_x86_ops vt_x86_ops __initdata = {
 
 	.vcpu_pre_run = vt_vcpu_pre_run,
 	.vcpu_run = vt_vcpu_run,
-	.handle_exit = vmx_handle_exit,
+	.handle_exit = vt_handle_exit,
 	.skip_emulated_instruction = vmx_skip_emulated_instruction,
 	.update_emulated_instruction = vmx_update_emulated_instruction,
 	.set_interrupt_shadow = vmx_set_interrupt_shadow,
@@ -344,8 +376,8 @@ struct kvm_x86_ops vt_x86_ops __initdata = {
 	.set_identity_map_addr = vmx_set_identity_map_addr,
 	.get_mt_mask = vmx_get_mt_mask,
 
-	.get_exit_info = vmx_get_exit_info,
-	.get_entry_info = vmx_get_entry_info,
+	.get_exit_info = vt_get_exit_info,
+	.get_entry_info = vt_get_entry_info,
 
 	.vcpu_after_set_cpuid = vmx_vcpu_after_set_cpuid,
 
diff --git a/arch/x86/kvm/vmx/tdx.c b/arch/x86/kvm/vmx/tdx.c
index 0863bdaf761a..cb64675e6ad9 100644
--- a/arch/x86/kvm/vmx/tdx.c
+++ b/arch/x86/kvm/vmx/tdx.c
@@ -807,17 +807,70 @@ static bool tdx_guest_state_is_invalid(struct kvm_vcpu *vcpu)
 		 !guest_cpu_cap_has(vcpu, X86_FEATURE_XSAVES));
 }
 
+static __always_inline u32 tdx_to_vmx_exit_reason(struct kvm_vcpu *vcpu)
+{
+	struct vcpu_tdx *tdx = to_tdx(vcpu);
+
+	switch (tdx->vp_enter_ret & TDX_SEAMCALL_STATUS_MASK) {
+	case TDX_SUCCESS:
+	case TDX_NON_RECOVERABLE_VCPU:
+	case TDX_NON_RECOVERABLE_TD:
+	case TDX_NON_RECOVERABLE_TD_NON_ACCESSIBLE:
+	case TDX_NON_RECOVERABLE_TD_WRONG_APIC_MODE:
+		break;
+	default:
+		return -1u;
+	}
+
+	return tdx->vp_enter_ret;
+}
+
 static noinstr void tdx_vcpu_enter_exit(struct kvm_vcpu *vcpu)
 {
 	struct vcpu_tdx *tdx = to_tdx(vcpu);
+	struct vcpu_vt *vt = to_vt(vcpu);
 
 	guest_state_enter_irqoff();
 
 	tdx->vp_enter_ret = tdh_vp_enter(&tdx->vp, &tdx->vp_enter_args);
 
+	vt->exit_reason.full = tdx_to_vmx_exit_reason(vcpu);
+
+	vt->exit_qualification = tdx->vp_enter_args.rcx;
+	tdx->ext_exit_qualification = tdx->vp_enter_args.rdx;
+	tdx->exit_gpa = tdx->vp_enter_args.r8;
+	vt->exit_intr_info = tdx->vp_enter_args.r9;
+
 	guest_state_exit_irqoff();
 }
 
+static bool tdx_failed_vmentry(struct kvm_vcpu *vcpu)
+{
+	return vmx_get_exit_reason(vcpu).failed_vmentry &&
+	       vmx_get_exit_reason(vcpu).full != -1u;
+}
+
+static fastpath_t tdx_exit_handlers_fastpath(struct kvm_vcpu *vcpu)
+{
+	u64 vp_enter_ret = to_tdx(vcpu)->vp_enter_ret;
+
+	/*
+	 * TDX_OPERAND_BUSY could be returned for SEPT due to 0-step mitigation
+	 * or for TD EPOCH due to contention with TDH.MEM.TRACK on TDH.VP.ENTER.
+	 *
+	 * When KVM requests KVM_REQ_OUTSIDE_GUEST_MODE, which has both
+	 * KVM_REQUEST_WAIT and KVM_REQUEST_NO_ACTION set, it requires target
+	 * vCPUs leaving fastpath so that interrupt can be enabled to ensure the
+	 * IPIs can be delivered. Return EXIT_FASTPATH_EXIT_HANDLED instead of
+	 * EXIT_FASTPATH_REENTER_GUEST to exit fastpath, otherwise, the
+	 * requester may be blocked endlessly.
+	 */
+	if (unlikely(tdx_operand_busy(vp_enter_ret)))
+		return EXIT_FASTPATH_EXIT_HANDLED;
+
+	return EXIT_FASTPATH_NONE;
+}
+
 #define TDX_REGS_UNSUPPORTED_SET	(BIT(VCPU_EXREG_RFLAGS) |	\
 					 BIT(VCPU_EXREG_SEGMENTS))
 
@@ -863,9 +916,18 @@ fastpath_t tdx_vcpu_run(struct kvm_vcpu *vcpu, bool force_immediate_exit)
 
 	vcpu->arch.regs_avail &= ~TDX_REGS_UNSUPPORTED_SET;
 
+	if (unlikely((tdx->vp_enter_ret & TDX_SW_ERROR) == TDX_SW_ERROR))
+		return EXIT_FASTPATH_NONE;
+
+	if (unlikely(vmx_get_exit_reason(vcpu).basic == EXIT_REASON_MCE_DURING_VMENTRY))
+		kvm_machine_check();
+
 	trace_kvm_exit(vcpu, KVM_ISA_VMX);
 
-	return EXIT_FASTPATH_NONE;
+	if (unlikely(tdx_failed_vmentry(vcpu)))
+		return EXIT_FASTPATH_NONE;
+
+	return tdx_exit_handlers_fastpath(vcpu);
 }
 
 void tdx_load_mmu_pgd(struct kvm_vcpu *vcpu, hpa_t root_hpa, int pgd_level)
@@ -1155,6 +1217,83 @@ int tdx_sept_remove_private_spte(struct kvm *kvm, gfn_t gfn,
 	return tdx_sept_drop_private_spte(kvm, gfn, level, pfn_to_page(pfn));
 }
 
+int tdx_handle_exit(struct kvm_vcpu *vcpu, fastpath_t fastpath)
+{
+	struct vcpu_tdx *tdx = to_tdx(vcpu);
+	u64 vp_enter_ret = tdx->vp_enter_ret;
+	union vmx_exit_reason exit_reason = vmx_get_exit_reason(vcpu);
+
+	if (fastpath != EXIT_FASTPATH_NONE)
+		return 1;
+
+	/*
+	 * Handle TDX SW errors, including TDX_SEAMCALL_UD, TDX_SEAMCALL_GP and
+	 * TDX_SEAMCALL_VMFAILINVALID.
+	 */
+	if (unlikely((vp_enter_ret & TDX_SW_ERROR) == TDX_SW_ERROR)) {
+		KVM_BUG_ON(!kvm_rebooting, vcpu->kvm);
+		goto unhandled_exit;
+	}
+
+	if (unlikely(tdx_failed_vmentry(vcpu))) {
+		/*
+		 * If the guest state is protected, that means off-TD debug is
+		 * not enabled, TDX_NON_RECOVERABLE must be set.
+		 */
+		WARN_ON_ONCE(vcpu->arch.guest_state_protected &&
+			     !(vp_enter_ret & TDX_NON_RECOVERABLE));
+		vcpu->run->exit_reason = KVM_EXIT_FAIL_ENTRY;
+		vcpu->run->fail_entry.hardware_entry_failure_reason = exit_reason.full;
+		vcpu->run->fail_entry.cpu = vcpu->arch.last_vmentry_cpu;
+		return 0;
+	}
+
+	if (unlikely(vp_enter_ret & (TDX_ERROR | TDX_NON_RECOVERABLE)) &&
+	    exit_reason.basic != EXIT_REASON_TRIPLE_FAULT) {
+		kvm_pr_unimpl("TD vp_enter_ret 0x%llx\n", vp_enter_ret);
+		goto unhandled_exit;
+	}
+
+	WARN_ON_ONCE(exit_reason.basic != EXIT_REASON_TRIPLE_FAULT &&
+		     (vp_enter_ret & TDX_SEAMCALL_STATUS_MASK) != TDX_SUCCESS);
+
+	switch (exit_reason.basic) {
+	case EXIT_REASON_TRIPLE_FAULT:
+		vcpu->run->exit_reason = KVM_EXIT_SHUTDOWN;
+		vcpu->mmio_needed = 0;
+		return 0;
+	default:
+		break;
+	}
+
+unhandled_exit:
+	vcpu->run->exit_reason = KVM_EXIT_INTERNAL_ERROR;
+	vcpu->run->internal.suberror = KVM_INTERNAL_ERROR_UNEXPECTED_EXIT_REASON;
+	vcpu->run->internal.ndata = 2;
+	vcpu->run->internal.data[0] = vp_enter_ret;
+	vcpu->run->internal.data[1] = vcpu->arch.last_vmentry_cpu;
+	return 0;
+}
+
+void tdx_get_exit_info(struct kvm_vcpu *vcpu, u32 *reason,
+		u64 *info1, u64 *info2, u32 *intr_info, u32 *error_code)
+{
+	struct vcpu_tdx *tdx = to_tdx(vcpu);
+
+	*reason = tdx->vt.exit_reason.full;
+	if (*reason != -1u) {
+		*info1 = vmx_get_exit_qual(vcpu);
+		*info2 = tdx->ext_exit_qualification;
+		*intr_info = vmx_get_intr_info(vcpu);
+	} else {
+		*info1 = 0;
+		*info2 = 0;
+		*intr_info = 0;
+	}
+
+	*error_code = 0;
+}
+
 static int tdx_get_capabilities(struct kvm_tdx_cmd *cmd)
 {
 	const struct tdx_sys_info_td_conf *td_conf = &tdx_sysinfo->td_conf;
diff --git a/arch/x86/kvm/vmx/tdx.h b/arch/x86/kvm/vmx/tdx.h
index 8339bbf0fdd4..0e3522e423cc 100644
--- a/arch/x86/kvm/vmx/tdx.h
+++ b/arch/x86/kvm/vmx/tdx.h
@@ -46,6 +46,8 @@ enum vcpu_tdx_state {
 struct vcpu_tdx {
 	struct kvm_vcpu	vcpu;
 	struct vcpu_vt vt;
+	u64 ext_exit_qualification;
+	gpa_t exit_gpa;
 	struct tdx_module_args vp_enter_args;
 
 	struct tdx_vp vp;
diff --git a/arch/x86/kvm/vmx/tdx_errno.h b/arch/x86/kvm/vmx/tdx_errno.h
index f9dbb3a065cc..6ff4672c4181 100644
--- a/arch/x86/kvm/vmx/tdx_errno.h
+++ b/arch/x86/kvm/vmx/tdx_errno.h
@@ -10,6 +10,9 @@
  * TDX SEAMCALL Status Codes (returned in RAX)
  */
 #define TDX_NON_RECOVERABLE_VCPU		0x4000000100000000ULL
+#define TDX_NON_RECOVERABLE_TD			0x4000000200000000ULL
+#define TDX_NON_RECOVERABLE_TD_NON_ACCESSIBLE	0x6000000500000000ULL
+#define TDX_NON_RECOVERABLE_TD_WRONG_APIC_MODE	0x6000000700000000ULL
 #define TDX_INTERRUPTED_RESUMABLE		0x8000000300000000ULL
 #define TDX_OPERAND_INVALID			0xC000010000000000ULL
 #define TDX_OPERAND_BUSY			0x8000020000000000ULL
diff --git a/arch/x86/kvm/vmx/x86_ops.h b/arch/x86/kvm/vmx/x86_ops.h
index f856eac8f1e8..92716f6486e9 100644
--- a/arch/x86/kvm/vmx/x86_ops.h
+++ b/arch/x86/kvm/vmx/x86_ops.h
@@ -135,6 +135,10 @@ int tdx_vcpu_pre_run(struct kvm_vcpu *vcpu);
 fastpath_t tdx_vcpu_run(struct kvm_vcpu *vcpu, bool force_immediate_exit);
 void tdx_prepare_switch_to_guest(struct kvm_vcpu *vcpu);
 void tdx_vcpu_put(struct kvm_vcpu *vcpu);
+int tdx_handle_exit(struct kvm_vcpu *vcpu,
+		enum exit_fastpath_completion fastpath);
+void tdx_get_exit_info(struct kvm_vcpu *vcpu, u32 *reason,
+		u64 *info1, u64 *info2, u32 *intr_info, u32 *error_code);
 
 int tdx_vcpu_ioctl(struct kvm_vcpu *vcpu, void __user *argp);
 
@@ -169,6 +173,10 @@ static inline fastpath_t tdx_vcpu_run(struct kvm_vcpu *vcpu, bool force_immediat
 }
 static inline void tdx_prepare_switch_to_guest(struct kvm_vcpu *vcpu) {}
 static inline void tdx_vcpu_put(struct kvm_vcpu *vcpu) {}
+static inline int tdx_handle_exit(struct kvm_vcpu *vcpu,
+		enum exit_fastpath_completion fastpath) { return 0; }
+static inline void tdx_get_exit_info(struct kvm_vcpu *vcpu, u32 *reason, u64 *info1,
+				     u64 *info2, u32 *intr_info, u32 *error_code) {}
 
 static inline int tdx_vcpu_ioctl(struct kvm_vcpu *vcpu, void __user *argp) { return -EOPNOTSUPP; }
 

From patchwork Tue Feb 11 02:54:37 2025
From: Binbin Wu
To: pbonzini@redhat.com, seanjc@google.com, kvm@vger.kernel.org
Cc: rick.p.edgecombe@intel.com, kai.huang@intel.com, adrian.hunter@intel.com,
	reinette.chatre@intel.com, xiaoyao.li@intel.com, tony.lindgren@intel.com,
	isaku.yamahata@intel.com, yan.y.zhao@intel.com, chao.gao@intel.com,
	linux-kernel@vger.kernel.org, binbin.wu@linux.intel.com
Subject: [PATCH v2 3/8] KVM: TDX: Add a place holder for handler of TDX hypercalls (TDG.VP.VMCALL)
Date: Tue, 11 Feb 2025 10:54:37 +0800
Message-ID: <20250211025442.3071607-4-binbin.wu@linux.intel.com>
In-Reply-To: <20250211025442.3071607-1-binbin.wu@linux.intel.com>
References: <20250211025442.3071607-1-binbin.wu@linux.intel.com>

From: Isaku Yamahata

Add a placeholder and related helper functions in preparation for
TDG.VP.VMCALL handling.

The TDX module specification defines the TDG.VP.VMCALL API (TDVMCALL for
short) for the guest TD to issue hypercalls to the VMM. When the guest TD
issues a TDVMCALL, it exits to the VMM with a new exit reason. The
arguments from the guest TD and the values returned by the VMM are passed
in the guest registers, and the guest's RCX register indicates which
registers are used. Define helper functions to access those registers.

A new VMX exit reason, TDCALL, indicates that the exit is due to a
TDVMCALL from the guest TD. Define the TDCALL exit reason and add a
placeholder to handle such exits.

Suggested-by: Sean Christopherson
Co-developed-by: Xiaoyao Li
Signed-off-by: Xiaoyao Li
Signed-off-by: Sean Christopherson
Signed-off-by: Isaku Yamahata
Co-developed-by: Binbin Wu
Signed-off-by: Binbin Wu
Reviewed-by: Chao Gao
---
Hypercalls exit to userspace v2:
- Get/set tdvmcall inputs/outputs from/to vp_enter_args.
- Morph the guest requested exit reason (via TDVMCALL) to KVM's tracked
  exit reason when it could, i.e. when the TDVMCALL leaf number is less
  than 0x10000. (Sean)
- Drop helpers for read/write a0~a3.

Hypercalls exit to userspace v1:
- Update changelog.
- Drop the unused tdx->tdvmcall.
  (Chao)
- Use TDVMCALL_STATUS prefix for TDX call status codes (Binbin)
---
 arch/x86/include/uapi/asm/vmx.h |  4 ++-
 arch/x86/kvm/vmx/tdx.c          | 49 ++++++++++++++++++++++++++++++++-
 2 files changed, 51 insertions(+), 2 deletions(-)

diff --git a/arch/x86/include/uapi/asm/vmx.h b/arch/x86/include/uapi/asm/vmx.h
index a5faf6d88f1b..6a9f268a2d2c 100644
--- a/arch/x86/include/uapi/asm/vmx.h
+++ b/arch/x86/include/uapi/asm/vmx.h
@@ -92,6 +92,7 @@
 #define EXIT_REASON_TPAUSE              68
 #define EXIT_REASON_BUS_LOCK            74
 #define EXIT_REASON_NOTIFY              75
+#define EXIT_REASON_TDCALL              77
 
 #define VMX_EXIT_REASONS \
 	{ EXIT_REASON_EXCEPTION_NMI,         "EXCEPTION_NMI" }, \
@@ -155,7 +156,8 @@
 	{ EXIT_REASON_UMWAIT,                "UMWAIT" }, \
 	{ EXIT_REASON_TPAUSE,                "TPAUSE" }, \
 	{ EXIT_REASON_BUS_LOCK,              "BUS_LOCK" }, \
-	{ EXIT_REASON_NOTIFY,                "NOTIFY" }
+	{ EXIT_REASON_NOTIFY,                "NOTIFY" }, \
+	{ EXIT_REASON_TDCALL,                "TDCALL" }
 
 #define VMX_EXIT_REASON_FLAGS \
 	{ VMX_EXIT_REASONS_FAILED_VMENTRY,	"FAILED_VMENTRY" }
diff --git a/arch/x86/kvm/vmx/tdx.c b/arch/x86/kvm/vmx/tdx.c
index cb64675e6ad9..420ee492e919 100644
--- a/arch/x86/kvm/vmx/tdx.c
+++ b/arch/x86/kvm/vmx/tdx.c
@@ -235,6 +235,25 @@ static bool tdx_operand_busy(u64 err)
  */
 static DEFINE_PER_CPU(struct list_head, associated_tdvcpus);
 
+static __always_inline unsigned long tdvmcall_exit_type(struct kvm_vcpu *vcpu)
+{
+	return to_tdx(vcpu)->vp_enter_args.r10;
+}
+static __always_inline unsigned long tdvmcall_leaf(struct kvm_vcpu *vcpu)
+{
+	return to_tdx(vcpu)->vp_enter_args.r11;
+}
+static __always_inline void tdvmcall_set_return_code(struct kvm_vcpu *vcpu,
+						     long val)
+{
+	to_tdx(vcpu)->vp_enter_args.r10 = val;
+}
+static __always_inline void tdvmcall_set_return_val(struct kvm_vcpu *vcpu,
+						    unsigned long val)
+{
+	to_tdx(vcpu)->vp_enter_args.r11 = val;
+}
+
 static inline void tdx_hkid_free(struct kvm_tdx *kvm_tdx)
 {
 	tdx_guest_keyid_free(kvm_tdx->hkid);
@@ -810,6 +829,7 @@ static bool tdx_guest_state_is_invalid(struct kvm_vcpu *vcpu)
 static __always_inline u32 tdx_to_vmx_exit_reason(struct kvm_vcpu *vcpu)
 {
 	struct vcpu_tdx *tdx = to_tdx(vcpu);
+	u32 exit_reason;
 
 	switch (tdx->vp_enter_ret & TDX_SEAMCALL_STATUS_MASK) {
 	case TDX_SUCCESS:
@@ -822,7 +842,21 @@ static __always_inline u32 tdx_to_vmx_exit_reason(struct kvm_vcpu *vcpu)
 		return -1u;
 	}
 
-	return tdx->vp_enter_ret;
+	exit_reason = tdx->vp_enter_ret;
+
+	switch (exit_reason) {
+	case EXIT_REASON_TDCALL:
+		if (tdvmcall_exit_type(vcpu))
+			return EXIT_REASON_VMCALL;
+
+		if (tdvmcall_leaf(vcpu) < 0x10000)
+			return tdvmcall_leaf(vcpu);
+		break;
+	default:
+		break;
+	}
+
+	return exit_reason;
 }
 
 static noinstr void tdx_vcpu_enter_exit(struct kvm_vcpu *vcpu)
@@ -930,6 +964,17 @@ fastpath_t tdx_vcpu_run(struct kvm_vcpu *vcpu, bool force_immediate_exit)
 	return tdx_exit_handlers_fastpath(vcpu);
 }
 
+static int handle_tdvmcall(struct kvm_vcpu *vcpu)
+{
+	switch (tdvmcall_leaf(vcpu)) {
+	default:
+		break;
+	}
+
+	tdvmcall_set_return_code(vcpu, TDVMCALL_STATUS_INVALID_OPERAND);
+	return 1;
+}
+
 void tdx_load_mmu_pgd(struct kvm_vcpu *vcpu, hpa_t root_hpa, int pgd_level)
 {
 	u64 shared_bit = (pgd_level == 5) ? TDX_SHARED_BIT_PWL_5 :
@@ -1262,6 +1307,8 @@ int tdx_handle_exit(struct kvm_vcpu *vcpu, fastpath_t fastpath)
 		vcpu->run->exit_reason = KVM_EXIT_SHUTDOWN;
 		vcpu->mmio_needed = 0;
 		return 0;
+	case EXIT_REASON_TDCALL:
+		return handle_tdvmcall(vcpu);
 	default:
 		break;
 	}

From patchwork Tue Feb 11 02:54:38 2025
From: Binbin Wu
To: pbonzini@redhat.com, seanjc@google.com, kvm@vger.kernel.org
Cc: rick.p.edgecombe@intel.com, kai.huang@intel.com, adrian.hunter@intel.com,
 reinette.chatre@intel.com, xiaoyao.li@intel.com, tony.lindgren@intel.com,
 isaku.yamahata@intel.com, yan.y.zhao@intel.com, chao.gao@intel.com,
 linux-kernel@vger.kernel.org, binbin.wu@linux.intel.com
Subject: [PATCH v2 4/8] KVM: TDX: Handle KVM hypercall with TDG.VP.VMCALL
Date: Tue, 11 Feb 2025 10:54:38 +0800
Message-ID: <20250211025442.3071607-5-binbin.wu@linux.intel.com>
In-Reply-To: <20250211025442.3071607-1-binbin.wu@linux.intel.com>
References: <20250211025442.3071607-1-binbin.wu@linux.intel.com>

From: Isaku Yamahata

Handle KVM hypercalls for TDX according to the TDX Guest-Host
Communication Interface (GHCI) specification.

The TDX GHCI specification defines the ABI for the guest TD to issue
hypercalls. A non-zero R10 indicates that the TDG.VP.VMCALL is
vendor-specific. KVM uses R10 as the KVM hypercall number and R11-R14 as
the 4 arguments, while the error code is returned in R10.

Morph the TDG.VP.VMCALL with KVM hypercall to EXIT_REASON_VMCALL and
marshal r10~r14 from vp_enter_args to the appropriate x86 registers for
KVM hypercall handling.

Signed-off-by: Isaku Yamahata
Co-developed-by: Binbin Wu
Signed-off-by: Binbin Wu
---
Hypercalls exit to userspace v2:
- Morph the TDG.VP.VMCALL with KVM hypercall to EXIT_REASON_VMCALL.
- Marshall values to the appropriate x86 registers for KVM hypercall
  handling.

Hypercalls exit to userspace v1:
- Renamed from "KVM: TDX: handle KVM hypercall with TDG.VP.VMCALL" to
  "KVM: TDX: Handle KVM hypercall with TDG.VP.VMCALL".
- Update the change log.
- Rebased on Sean's "Prep KVM hypercall handling for TDX" patch set.
  https://lore.kernel.org/kvm/20241128004344.4072099-1-seanjc@google.com
- Use the right register (i.e. R10) to set the return code after
  returning back from userspace.
---
 arch/x86/kvm/vmx/tdx.c | 19 +++++++++++++++++++
 1 file changed, 19 insertions(+)

diff --git a/arch/x86/kvm/vmx/tdx.c b/arch/x86/kvm/vmx/tdx.c
index 420ee492e919..daa49f2ee2b3 100644
--- a/arch/x86/kvm/vmx/tdx.c
+++ b/arch/x86/kvm/vmx/tdx.c
@@ -964,6 +964,23 @@ fastpath_t tdx_vcpu_run(struct kvm_vcpu *vcpu, bool force_immediate_exit)
 	return tdx_exit_handlers_fastpath(vcpu);
 }
 
+static int complete_hypercall_exit(struct kvm_vcpu *vcpu)
+{
+	tdvmcall_set_return_code(vcpu, vcpu->run->hypercall.ret);
+	return 1;
+}
+
+static int tdx_emulate_vmcall(struct kvm_vcpu *vcpu)
+{
+	kvm_rax_write(vcpu, to_tdx(vcpu)->vp_enter_args.r10);
+	kvm_rbx_write(vcpu, to_tdx(vcpu)->vp_enter_args.r11);
+	kvm_rcx_write(vcpu, to_tdx(vcpu)->vp_enter_args.r12);
+	kvm_rdx_write(vcpu, to_tdx(vcpu)->vp_enter_args.r13);
+	kvm_rsi_write(vcpu, to_tdx(vcpu)->vp_enter_args.r14);
+
+	return __kvm_emulate_hypercall(vcpu, 0, complete_hypercall_exit);
+}
+
 static int handle_tdvmcall(struct kvm_vcpu *vcpu)
 {
 	switch (tdvmcall_leaf(vcpu)) {
@@ -1309,6 +1326,8 @@ int tdx_handle_exit(struct kvm_vcpu *vcpu, fastpath_t fastpath)
 		return 0;
 	case EXIT_REASON_TDCALL:
 		return handle_tdvmcall(vcpu);
+	case EXIT_REASON_VMCALL:
+		return tdx_emulate_vmcall(vcpu);
 	default:
 		break;
 	}

From patchwork Tue Feb 11 02:54:39 2025
X-Patchwork-Submitter: Binbin Wu
X-Patchwork-Id: 13969233
From: Binbin Wu
To: pbonzini@redhat.com, seanjc@google.com, kvm@vger.kernel.org
Cc: rick.p.edgecombe@intel.com, kai.huang@intel.com, adrian.hunter@intel.com,
 reinette.chatre@intel.com, xiaoyao.li@intel.com, tony.lindgren@intel.com,
 isaku.yamahata@intel.com, yan.y.zhao@intel.com, chao.gao@intel.com,
 linux-kernel@vger.kernel.org, binbin.wu@linux.intel.com
Subject: [PATCH v2 5/8] KVM: TDX: Handle TDG.VP.VMCALL<MapGPA>
Date: Tue, 11 Feb 2025 10:54:39 +0800
Message-ID: <20250211025442.3071607-6-binbin.wu@linux.intel.com>
In-Reply-To: <20250211025442.3071607-1-binbin.wu@linux.intel.com>
References: <20250211025442.3071607-1-binbin.wu@linux.intel.com>

Convert TDG.VP.VMCALL<MapGPA> to KVM_EXIT_HYPERCALL with
KVM_HC_MAP_GPA_RANGE and forward it to userspace for handling.

MapGPA is used by a TDX guest to request that a GPA range be mapped as
private or shared memory. It needs to exit to userspace for handling.
KVM has already implemented a similar hypercall, KVM_HC_MAP_GPA_RANGE,
which exits to userspace with exit reason KVM_EXIT_HYPERCALL. Do sanity
checks, convert TDVMCALL_MAP_GPA to KVM_HC_MAP_GPA_RANGE, and forward the
request to userspace.

To prevent a TDG.VP.VMCALL from taking too long, the MapGPA range is
split into 2MB chunks, and pending interrupts are checked between chunks.
This allows for timely injection of interrupts and prevents issues with
guest lockup detection. The TDX guest should retry the operation for the
GPA starting at the address specified in R11 when the TDVMCALL returns
TDVMCALL_STATUS_RETRY as the status code.

Note that userspace needs to enable KVM_CAP_EXIT_HYPERCALL with the
KVM_HC_MAP_GPA_RANGE bit set for the TD VM.

Suggested-by: Sean Christopherson
Signed-off-by: Binbin Wu
---
Hypercalls exit to userspace v2:
- Skip setting of return code as TDVMCALL_STATUS_SUCCESS.
- Use vp_enter_args instead of x86 registers.
- Remove unnecessary comments.
- Zero run->hypercall.ret in __tdx_map_gpa() following the pattern of
  Paolo's patch, the feedback of adding a helper is still pending. (Rick)
  https://lore.kernel.org/kvm/20241213194137.315304-1-pbonzini@redhat.com

Hypercalls exit to userspace v1:
- New added.
  Implement one of the hypercalls need to exit to userspace for handling
  after dropping "KVM: TDX: Add KVM Exit for TDX TDG.VP.VMCALL", which
  tries to resolve Sean's comment.
  https://lore.kernel.org/kvm/Zg18ul8Q4PGQMWam@google.com/
- Check interrupt pending between chunks suggested by Sean.
  https://lore.kernel.org/kvm/ZleJvmCawKqmpFIa@google.com/
- Use TDVMCALL_STATUS prefix for TDX call status codes (Binbin)
- Use vt_is_tdx_private_gpa()
---
 arch/x86/include/asm/shared/tdx.h |   1 +
 arch/x86/kvm/vmx/tdx.c            | 113 ++++++++++++++++++++++++++++++
 arch/x86/kvm/vmx/tdx.h            |   3 +
 3 files changed, 117 insertions(+)

diff --git a/arch/x86/include/asm/shared/tdx.h b/arch/x86/include/asm/shared/tdx.h
index 4aedab1f2a1a..f23657350d28 100644
--- a/arch/x86/include/asm/shared/tdx.h
+++ b/arch/x86/include/asm/shared/tdx.h
@@ -77,6 +77,7 @@
 #define TDVMCALL_STATUS_SUCCESS 0x0000000000000000ULL
 #define TDVMCALL_STATUS_RETRY 0x0000000000000001ULL
 #define TDVMCALL_STATUS_INVALID_OPERAND 0x8000000000000000ULL
+#define TDVMCALL_STATUS_ALIGN_ERROR 0x8000000000000002ULL
 
 /*
  * Bitmasks of exposed registers (with VMM).
diff --git a/arch/x86/kvm/vmx/tdx.c b/arch/x86/kvm/vmx/tdx.c
index daa49f2ee2b3..8b51b4c937e9 100644
--- a/arch/x86/kvm/vmx/tdx.c
+++ b/arch/x86/kvm/vmx/tdx.c
@@ -981,9 +981,122 @@ static int tdx_emulate_vmcall(struct kvm_vcpu *vcpu)
 	return __kvm_emulate_hypercall(vcpu, 0, complete_hypercall_exit);
 }
 
+/*
+ * Split into chunks and check interrupt pending between chunks. This allows
+ * for timely injection of interrupts to prevent issues with guest lockup
+ * detection.
+ */
+#define TDX_MAP_GPA_MAX_LEN (2 * 1024 * 1024)
+static void __tdx_map_gpa(struct vcpu_tdx *tdx);
+
+static int tdx_complete_vmcall_map_gpa(struct kvm_vcpu *vcpu)
+{
+	struct vcpu_tdx *tdx = to_tdx(vcpu);
+
+	if (vcpu->run->hypercall.ret) {
+		tdvmcall_set_return_code(vcpu, TDVMCALL_STATUS_INVALID_OPERAND);
+		tdx->vp_enter_args.r11 = tdx->map_gpa_next;
+		return 1;
+	}
+
+	tdx->map_gpa_next += TDX_MAP_GPA_MAX_LEN;
+	if (tdx->map_gpa_next >= tdx->map_gpa_end)
+		return 1;
+
+	/*
+	 * Stop processing the remaining part if there is pending interrupt.
+	 * Skip checking pending virtual interrupt (reflected by
+	 * TDX_VCPU_STATE_DETAILS_INTR_PENDING bit) to save a seamcall because
+	 * if guest disabled interrupt, it's OK not returning back to guest
+	 * due to non-NMI interrupt. Also it's rare to TDVMCALL_MAP_GPA
+	 * immediately after STI or MOV/POP SS.
+	 */
+	if (pi_has_pending_interrupt(vcpu) ||
+	    kvm_test_request(KVM_REQ_NMI, vcpu) || vcpu->arch.nmi_pending) {
+		tdvmcall_set_return_code(vcpu, TDVMCALL_STATUS_RETRY);
+		tdx->vp_enter_args.r11 = tdx->map_gpa_next;
+		return 1;
+	}
+
+	__tdx_map_gpa(tdx);
+	return 0;
+}
+
+static void __tdx_map_gpa(struct vcpu_tdx *tdx)
+{
+	u64 gpa = tdx->map_gpa_next;
+	u64 size = tdx->map_gpa_end - tdx->map_gpa_next;
+
+	if (size > TDX_MAP_GPA_MAX_LEN)
+		size = TDX_MAP_GPA_MAX_LEN;
+
+	tdx->vcpu.run->exit_reason = KVM_EXIT_HYPERCALL;
+	tdx->vcpu.run->hypercall.nr = KVM_HC_MAP_GPA_RANGE;
+	/*
+	 * In principle this should have been -KVM_ENOSYS, but userspace (QEMU <=9.2)
+	 * assumed that vcpu->run->hypercall.ret is never changed by KVM and thus that
+	 * it was always zero on KVM_EXIT_HYPERCALL. Since KVM is now overwriting
+	 * vcpu->run->hypercall.ret, ensuring that it is zero to not break QEMU.
+	 */
+	tdx->vcpu.run->hypercall.ret = 0;
+	tdx->vcpu.run->hypercall.args[0] = gpa & ~gfn_to_gpa(kvm_gfn_direct_bits(tdx->vcpu.kvm));
+	tdx->vcpu.run->hypercall.args[1] = size / PAGE_SIZE;
+	tdx->vcpu.run->hypercall.args[2] = vt_is_tdx_private_gpa(tdx->vcpu.kvm, gpa) ?
+					   KVM_MAP_GPA_RANGE_ENCRYPTED :
+					   KVM_MAP_GPA_RANGE_DECRYPTED;
+	tdx->vcpu.run->hypercall.flags = KVM_EXIT_HYPERCALL_LONG_MODE;
+
+	tdx->vcpu.arch.complete_userspace_io = tdx_complete_vmcall_map_gpa;
+}
+
+static int tdx_map_gpa(struct kvm_vcpu *vcpu)
+{
+	struct vcpu_tdx *tdx = to_tdx(vcpu);
+	u64 gpa = tdx->vp_enter_args.r12;
+	u64 size = tdx->vp_enter_args.r13;
+	u64 ret;
+
+	/*
+	 * Converting TDVMCALL_MAP_GPA to KVM_HC_MAP_GPA_RANGE requires
+	 * userspace to enable KVM_CAP_EXIT_HYPERCALL with KVM_HC_MAP_GPA_RANGE
+	 * bit set. If not, the error code is not defined in GHCI for TDX, use
+	 * TDVMCALL_STATUS_INVALID_OPERAND for this case.
+	 */
+	if (!user_exit_on_hypercall(vcpu->kvm, KVM_HC_MAP_GPA_RANGE)) {
+		ret = TDVMCALL_STATUS_INVALID_OPERAND;
+		goto error;
+	}
+
+	if (gpa + size <= gpa || !kvm_vcpu_is_legal_gpa(vcpu, gpa) ||
+	    !kvm_vcpu_is_legal_gpa(vcpu, gpa + size - 1) ||
+	    (vt_is_tdx_private_gpa(vcpu->kvm, gpa) !=
+	     vt_is_tdx_private_gpa(vcpu->kvm, gpa + size - 1))) {
+		ret = TDVMCALL_STATUS_INVALID_OPERAND;
+		goto error;
+	}
+
+	if (!PAGE_ALIGNED(gpa) || !PAGE_ALIGNED(size)) {
+		ret = TDVMCALL_STATUS_ALIGN_ERROR;
+		goto error;
+	}
+
+	tdx->map_gpa_end = gpa + size;
+	tdx->map_gpa_next = gpa;
+
+	__tdx_map_gpa(tdx);
+	return 0;
+
+error:
+	tdvmcall_set_return_code(vcpu, ret);
+	tdx->vp_enter_args.r11 = gpa;
+	return 1;
+}
+
 static int handle_tdvmcall(struct kvm_vcpu *vcpu)
 {
 	switch (tdvmcall_leaf(vcpu)) {
+	case TDVMCALL_MAP_GPA:
+		return tdx_map_gpa(vcpu);
 	default:
 		break;
 	}
diff --git a/arch/x86/kvm/vmx/tdx.h b/arch/x86/kvm/vmx/tdx.h
index 0e3522e423cc..45c1d064b6b7 100644
--- a/arch/x86/kvm/vmx/tdx.h
+++ b/arch/x86/kvm/vmx/tdx.h
@@ -57,6 +57,9 @@ struct vcpu_tdx {
 	u64 vp_enter_ret;
 
 	enum vcpu_tdx_state state;
+
+	u64 map_gpa_next;
+	u64 map_gpa_end;
 };
 
 void tdh_vp_rd_failed(struct vcpu_tdx *tdx, char *uclass, u32 field, u64 err);

From patchwork Tue Feb 11 02:54:40 2025
X-Patchwork-Submitter: Binbin Wu
X-Patchwork-Id: 13969234
From: Binbin Wu
To: pbonzini@redhat.com, seanjc@google.com, kvm@vger.kernel.org
Cc: rick.p.edgecombe@intel.com, kai.huang@intel.com, adrian.hunter@intel.com,
 reinette.chatre@intel.com, xiaoyao.li@intel.com, tony.lindgren@intel.com,
 isaku.yamahata@intel.com, yan.y.zhao@intel.com, chao.gao@intel.com,
 linux-kernel@vger.kernel.org, binbin.wu@linux.intel.com
Subject: [PATCH v2 6/8] KVM: TDX: Handle TDG.VP.VMCALL<ReportFatalError>
Date: Tue, 11 Feb 2025 10:54:40 +0800
Message-ID: <20250211025442.3071607-7-binbin.wu@linux.intel.com>
In-Reply-To: <20250211025442.3071607-1-binbin.wu@linux.intel.com>
References: <20250211025442.3071607-1-binbin.wu@linux.intel.com>

Convert TDG.VP.VMCALL<ReportFatalError> to KVM_EXIT_SYSTEM_EVENT with a
new type KVM_SYSTEM_EVENT_TDX_FATAL and forward it to userspace for
handling.

A TD guest can use TDG.VP.VMCALL<ReportFatalError> to report a fatal
error it has experienced. This hypercall is special because the TD guest
is requesting termination with error information. Since KVM needs to
forward the hypercall to userspace anyway, KVM doesn't do sanity checks
and lets userspace decide what to do.

Signed-off-by: Binbin Wu
---
Hypercalls exit to userspace v2:
- Use vp_enter_args instead of x86 registers.
- vcpu->run->system_event.ndata is not hardcoded to 10. (Xiaoyao)
- Undefine COPY_REG after use. (Yilun)
- Updated the document about KVM_SYSTEM_EVENT_TDX_FATAL. (Chao)

Hypercalls exit to userspace v1:
- New added.
  Implement one of the hypercalls need to exit to userspace for handling
  after reverting "KVM: TDX: Add KVM Exit for TDX TDG.VP.VMCALL", which
  tries to resolve Sean's comment.
  https://lore.kernel.org/kvm/Zg18ul8Q4PGQMWam@google.com/
- Use TDVMCALL_STATUS prefix for TDX call status codes (Binbin)
---
 Documentation/virt/kvm/api.rst |  9 +++++++
 arch/x86/kvm/vmx/tdx.c         | 45 ++++++++++++++++++++++++++++++++++
 include/uapi/linux/kvm.h       |  1 +
 3 files changed, 55 insertions(+)

diff --git a/Documentation/virt/kvm/api.rst b/Documentation/virt/kvm/api.rst
index 2ba70c1fad51..5e415b312ab0 100644
--- a/Documentation/virt/kvm/api.rst
+++ b/Documentation/virt/kvm/api.rst
@@ -6823,6 +6823,7 @@ should put the acknowledged interrupt vector into the 'epr' field.
   #define KVM_SYSTEM_EVENT_WAKEUP 4
   #define KVM_SYSTEM_EVENT_SUSPEND 5
   #define KVM_SYSTEM_EVENT_SEV_TERM 6
+  #define KVM_SYSTEM_EVENT_TDX_FATAL 7
 			__u32 type;
 			__u32 ndata;
 			__u64 data[16];
@@ -6849,6 +6850,14 @@ Valid values for 'type' are:
    reset/shutdown of the VM.
  - KVM_SYSTEM_EVENT_SEV_TERM -- an AMD SEV guest requested termination.
    The guest physical address of the guest's GHCB is stored in `data[0]`.
+ - KVM_SYSTEM_EVENT_TDX_FATAL -- a TDX guest reported a fatal error state.
+   The error code reported by the TDX guest is stored in `data[0]`, the error
+   code format is defined in TDX GHCI specification.
+   If the bit 63 of `data[0]` is set, it indicates there is TD specified
+   additional information provided in a page, which is shared memory. The
+   guest physical address of the information page is stored in `data[1]`.
+   An optional error message is provided by `data[2]` ~ `data[9]`, which is
+   byte sequence, LSB filled first. Typically, ASCII code(0x20-0x7e) is filled.
  - KVM_SYSTEM_EVENT_WAKEUP -- the exiting vCPU is in a suspended state and
    KVM has recognized a wakeup event. Userspace may honor this event by
    marking the exiting vCPU as runnable, or deny it and call KVM_RUN again.
diff --git a/arch/x86/kvm/vmx/tdx.c b/arch/x86/kvm/vmx/tdx.c
index 8b51b4c937e9..85956768c515 100644
--- a/arch/x86/kvm/vmx/tdx.c
+++ b/arch/x86/kvm/vmx/tdx.c
@@ -1092,11 +1092,56 @@ static int tdx_map_gpa(struct kvm_vcpu *vcpu)
 	return 1;
 }
 
+static int tdx_report_fatal_error(struct kvm_vcpu *vcpu)
+{
+	struct vcpu_tdx *tdx = to_tdx(vcpu);
+	u64 reg_mask = tdx->vp_enter_args.rcx;
+	u64 *opt_regs;
+
+	/*
+	 * Skip sanity checks and let userspace decide what to do if sanity
+	 * checks fail.
+	 */
+	vcpu->run->exit_reason = KVM_EXIT_SYSTEM_EVENT;
+	vcpu->run->system_event.type = KVM_SYSTEM_EVENT_TDX_FATAL;
+	/* Error codes. */
+	vcpu->run->system_event.data[0] = tdx->vp_enter_args.r12;
+	/* GPA of additional information page. */
+	vcpu->run->system_event.data[1] = tdx->vp_enter_args.r13;
+	/* Information passed via registers (up to 64 bytes). */
+	opt_regs = &vcpu->run->system_event.data[2];
+
+#define COPY_REG(REG, MASK)					\
+	do {							\
+		if (reg_mask & MASK) {				\
+			*opt_regs = tdx->vp_enter_args.REG;	\
+			opt_regs++;				\
+		}						\
+	} while (0)
+
+	/* The order is defined in GHCI. */
+	COPY_REG(r14, BIT_ULL(14));
+	COPY_REG(r15, BIT_ULL(15));
+	COPY_REG(rbx, BIT_ULL(3));
+	COPY_REG(rdi, BIT_ULL(7));
+	COPY_REG(rsi, BIT_ULL(6));
+	COPY_REG(r8, BIT_ULL(8));
+	COPY_REG(r9, BIT_ULL(9));
+	COPY_REG(rdx, BIT_ULL(2));
+#undef COPY_REG
+
+	vcpu->run->system_event.ndata = opt_regs - vcpu->run->system_event.data;
+
+	return 0;
+}
+
 static int handle_tdvmcall(struct kvm_vcpu *vcpu)
 {
 	switch (tdvmcall_leaf(vcpu)) {
 	case TDVMCALL_MAP_GPA:
 		return tdx_map_gpa(vcpu);
+	case TDVMCALL_REPORT_FATAL_ERROR:
+		return tdx_report_fatal_error(vcpu);
 	default:
 		break;
 	}
diff --git a/include/uapi/linux/kvm.h b/include/uapi/linux/kvm.h
index 45e6d8fca9b9..937400350317 100644
--- a/include/uapi/linux/kvm.h
+++ b/include/uapi/linux/kvm.h
@@ -375,6 +375,7 @@ struct kvm_run {
 #define KVM_SYSTEM_EVENT_WAKEUP 4
 #define KVM_SYSTEM_EVENT_SUSPEND 5
 #define KVM_SYSTEM_EVENT_SEV_TERM 6
+#define KVM_SYSTEM_EVENT_TDX_FATAL 7
 			__u32 type;
 			__u32 ndata;
 			union {

From patchwork Tue Feb 11 02:54:41 2025
X-Patchwork-Submitter: Binbin Wu
X-Patchwork-Id: 13969235
From: Binbin Wu
To: pbonzini@redhat.com, seanjc@google.com, kvm@vger.kernel.org
Cc: rick.p.edgecombe@intel.com, kai.huang@intel.com, adrian.hunter@intel.com, reinette.chatre@intel.com, xiaoyao.li@intel.com, tony.lindgren@intel.com, isaku.yamahata@intel.com, yan.y.zhao@intel.com, chao.gao@intel.com, linux-kernel@vger.kernel.org, binbin.wu@linux.intel.com
Subject: [PATCH v2 7/8] KVM: TDX: Handle TDX PV port I/O hypercall
Date: Tue, 11 Feb 2025 10:54:41 +0800
Message-ID: <20250211025442.3071607-8-binbin.wu@linux.intel.com>
In-Reply-To: <20250211025442.3071607-1-binbin.wu@linux.intel.com>
References: <20250211025442.3071607-1-binbin.wu@linux.intel.com>

From: Isaku Yamahata

Emulate port I/O requested by a TDX guest via TDVMCALL with leaf
Instruction.IO (same value as EXIT_REASON_IO_INSTRUCTION), according to the
TDX Guest-Host Communication Interface (GHCI).

All port I/O instructions inside a TDX guest trigger a #VE exception. On a
#VE triggered by an I/O instruction, the TDX guest can call TDVMCALL with
leaf Instruction.IO to request that the VMM emulate the I/O instruction.

As with normal port I/O emulation, try to handle the port I/O in the kernel
first; if the kernel can't handle it, forward the request to userspace.

Note that string I/O operations are not supported in TDX. The guest should
unroll them before calling TDVMCALL.

Suggested-by: Sean Christopherson
Signed-off-by: Isaku Yamahata
Co-developed-by: Binbin Wu
Signed-off-by: Binbin Wu
Reviewed-by: Paolo Bonzini
---
Hypercalls exit to userspace v2:
 - Morph PV port I/O hypercall to EXIT_REASON_IO_INSTRUCTION. (Sean)
 - Use vp_enter_args instead of x86 registers.
 - Check write is either 0 or 1. (Chao)
 - Skip setting return code as TDVMCALL_STATUS_SUCCESS. (Sean)

Hypercalls exit to userspace v1:
 - Renamed from "KVM: TDX: Handle TDX PV port io hypercall" to
   "KVM: TDX: Handle TDX PV port I/O hypercall".
 - Update changelog.
 - Add missing curly brackets.
 - Move reset of pio.count to tdx_complete_pio_out() and remove the stale
   comment. (binbin)
 - Use TDVMCALL_STATUS prefix for TDX call status codes (Binbin)
 - Set status code to TDVMCALL_STATUS_SUCCESS when PIO is handled in kernel.
 - Don't write to R11 when it is a write operation for output.

v18:
 - Fix out case to set R10 and R11 correctly when user space handled port out.
---
 arch/x86/kvm/vmx/tdx.c | 60 ++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 60 insertions(+)

diff --git a/arch/x86/kvm/vmx/tdx.c b/arch/x86/kvm/vmx/tdx.c
index 85956768c515..f13da28dd4a2 100644
--- a/arch/x86/kvm/vmx/tdx.c
+++ b/arch/x86/kvm/vmx/tdx.c
@@ -1135,6 +1135,64 @@ static int tdx_report_fatal_error(struct kvm_vcpu *vcpu)
 	return 0;
 }
 
+static int tdx_complete_pio_out(struct kvm_vcpu *vcpu)
+{
+	vcpu->arch.pio.count = 0;
+	return 1;
+}
+
+static int tdx_complete_pio_in(struct kvm_vcpu *vcpu)
+{
+	struct x86_emulate_ctxt *ctxt = vcpu->arch.emulate_ctxt;
+	unsigned long val = 0;
+	int ret;
+
+	ret = ctxt->ops->pio_in_emulated(ctxt, vcpu->arch.pio.size,
+					 vcpu->arch.pio.port, &val, 1);
+
+	WARN_ON_ONCE(!ret);
+
+	tdvmcall_set_return_val(vcpu, val);
+
+	return 1;
+}
+
+static int tdx_emulate_io(struct kvm_vcpu *vcpu)
+{
+	struct vcpu_tdx *tdx = to_tdx(vcpu);
+	struct x86_emulate_ctxt *ctxt = vcpu->arch.emulate_ctxt;
+	unsigned long val = 0;
+	unsigned int port;
+	u64 size, write;
+	int ret;
+
+	++vcpu->stat.io_exits;
+
+	size = tdx->vp_enter_args.r12;
+	write = tdx->vp_enter_args.r13;
+	port = tdx->vp_enter_args.r14;
+
+	if ((write != 0 && write != 1) || (size != 1 && size != 2 && size != 4)) {
+		tdvmcall_set_return_code(vcpu, TDVMCALL_STATUS_INVALID_OPERAND);
+		return 1;
+	}
+
+	if (write) {
+		val = tdx->vp_enter_args.r15;
+		ret = ctxt->ops->pio_out_emulated(ctxt, size, port, &val, 1);
+	} else {
+		ret = ctxt->ops->pio_in_emulated(ctxt, size, port, &val, 1);
+	}
+
+	if (!ret)
+		vcpu->arch.complete_userspace_io = write ? tdx_complete_pio_out :
+							   tdx_complete_pio_in;
+	else if (!write)
+		tdvmcall_set_return_val(vcpu, val);
+
+	return ret;
+}
+
 static int handle_tdvmcall(struct kvm_vcpu *vcpu)
 {
 	switch (tdvmcall_leaf(vcpu)) {
@@ -1486,6 +1544,8 @@ int tdx_handle_exit(struct kvm_vcpu *vcpu, fastpath_t fastpath)
 		return handle_tdvmcall(vcpu);
 	case EXIT_REASON_VMCALL:
 		return tdx_emulate_vmcall(vcpu);
+	case EXIT_REASON_IO_INSTRUCTION:
+		return tdx_emulate_io(vcpu);
 	default:
 		break;
 	}

From patchwork Tue Feb 11 02:54:42 2025
X-Patchwork-Submitter: Binbin Wu
X-Patchwork-Id: 13969236
From: Binbin Wu
To: pbonzini@redhat.com, seanjc@google.com, kvm@vger.kernel.org
Cc: rick.p.edgecombe@intel.com, kai.huang@intel.com, adrian.hunter@intel.com, reinette.chatre@intel.com, xiaoyao.li@intel.com, tony.lindgren@intel.com, isaku.yamahata@intel.com, yan.y.zhao@intel.com, chao.gao@intel.com, linux-kernel@vger.kernel.org, binbin.wu@linux.intel.com
Subject: [PATCH v2 8/8] KVM: TDX: Handle TDX PV MMIO hypercall
Date: Tue, 11 Feb 2025 10:54:42 +0800
Message-ID: <20250211025442.3071607-9-binbin.wu@linux.intel.com>
In-Reply-To: <20250211025442.3071607-1-binbin.wu@linux.intel.com>
References: <20250211025442.3071607-1-binbin.wu@linux.intel.com>

From: Sean Christopherson

Handle the TDX PV MMIO hypercall when a TDX guest calls TDVMCALL with leaf
#VE.RequestMMIO (same value as EXIT_REASON_EPT_VIOLATION), according to the
TDX Guest-Host Communication Interface (GHCI) spec.

For TDX guests, the VMM is not allowed to access vCPU registers or private
memory, and instructions must be fetched from private memory. So the MMIO
emulation implemented for non-TDX VMs is not possible for TDX guests.

In TDX, the MMIO regions are instead configured by the VMM to trigger a #VE
exception in the guest. The guest's #VE handler is supposed to emulate the
MMIO instruction inside the guest and convert it into a TDVMCALL with leaf
#VE.RequestMMIO, which equals EXIT_REASON_EPT_VIOLATION.

The requested MMIO address must be in shared GPA space. The shared bit is
stripped after the check because the existing MMIO emulation code is not
aware of the shared bit.

The MMIO GPA shouldn't have a valid memslot, and the attribute of the GPA
should be shared. KVM could do these checks before exiting to userspace;
however, even if it does, there will still be races between the check in KVM
and the emulation of the MMIO access in userspace, due to memslot hotplug or
memory attribute conversion.
If userspace doesn't check the attribute of the GPA and the attribute happens
to be private, this does not pose a security risk or cause an MCE, but it can
lead to other issues. E.g., in QEMU, treating a GPA with the private
attribute as shared when it falls within RAM's range can result in extra
memory consumption when the emulated access goes through the HVA of the GPA.

There are two options:
1) Do the check both in KVM and userspace.
2) Do the check only in QEMU.

This patch chooses option 2, i.e., KVM omits the memslot and attribute checks
and expects userspace to do them.

As with normal MMIO emulation, try to handle the MMIO in the kernel first; if
the kernel can't handle it, forward the request to userspace. Export the
symbols needed for MMIO handling.

Fragment handling is not needed for TDX PV MMIO because the GPA is provided:
if an MMIO access crosses a page boundary, it is contiguous in GPA space.
Also, the size is limited to 1, 2, 4, or 8 bytes, so no further split is
needed. Allow cross-page accesses, because no extra handling is needed once
both the start and end GPAs have been checked to be shared GPAs.

Suggested-by: Sean Christopherson
Signed-off-by: Sean Christopherson
Signed-off-by: Isaku Yamahata
Co-developed-by: Binbin Wu
Signed-off-by: Binbin Wu
Reviewed-by: Paolo Bonzini
---
Hypercalls exit to userspace v2:
 - Morph PV MMIO hypercall to EXIT_REASON_EPT_MISCONFIG. (Sean)
 - Skip setting return code as TDVMCALL_STATUS_SUCCESS; as a result, the
   complete_userspace_io() callback for write is not needed. (Sean)
 - Remove the code for reading/writing APIC MMIO, since TDX guests support
   x2APIC only.
 - Use vp_enter_args directly instead of helpers.

Hypercalls exit to userspace v1:
 - Update the changelog.
 - Remove the check of memslot for GPA.
 - Allow MMIO access crossing page boundary.
 - Move the tracepoint for KVM_TRACE_MMIO_WRITE earlier so the tracepoint
   handles the cases both for kernel and userspace. (Isaku)
 - Set TDVMCALL return code when back from userspace, which is missing in v19.
 - Move fast MMIO write into tdx_mmio_write()
 - Check GPA is shared GPA. (Binbin)
 - Remove extra check for size > 8u. (Binbin)
 - Removed KVM_BUG_ON() in tdx_complete_mmio() and tdx_emulate_mmio()
 - Removed vcpu->mmio_needed code since it's not used after removing
   KVM_BUG_ON().
 - Use TDVMCALL_STATUS prefix for TDX call status codes (Binbin)
 - Use vt_is_tdx_private_gpa()
---
 arch/x86/kvm/vmx/tdx.c | 109 ++++++++++++++++++++++++++++++++++++++++-
 arch/x86/kvm/x86.c     |   1 +
 virt/kvm/kvm_main.c    |   1 +
 3 files changed, 110 insertions(+), 1 deletion(-)

diff --git a/arch/x86/kvm/vmx/tdx.c b/arch/x86/kvm/vmx/tdx.c
index f13da28dd4a2..8f3147c6e602 100644
--- a/arch/x86/kvm/vmx/tdx.c
+++ b/arch/x86/kvm/vmx/tdx.c
@@ -849,8 +849,12 @@ static __always_inline u32 tdx_to_vmx_exit_reason(struct kvm_vcpu *vcpu)
 		if (tdvmcall_exit_type(vcpu))
 			return EXIT_REASON_VMCALL;
 
-		if (tdvmcall_leaf(vcpu) < 0x10000)
+		if (tdvmcall_leaf(vcpu) < 0x10000) {
+			if (tdvmcall_leaf(vcpu) == EXIT_REASON_EPT_VIOLATION)
+				return EXIT_REASON_EPT_MISCONFIG;
+
 			return tdvmcall_leaf(vcpu);
+		}
 		break;
 	default:
 		break;
@@ -1193,6 +1197,107 @@ static int tdx_emulate_io(struct kvm_vcpu *vcpu)
 	return ret;
 }
 
+static int tdx_complete_mmio_read(struct kvm_vcpu *vcpu)
+{
+	unsigned long val = 0;
+	gpa_t gpa;
+	int size;
+
+	gpa = vcpu->mmio_fragments[0].gpa;
+	size = vcpu->mmio_fragments[0].len;
+
+	memcpy(&val, vcpu->run->mmio.data, size);
+	tdvmcall_set_return_val(vcpu, val);
+	trace_kvm_mmio(KVM_TRACE_MMIO_READ, size, gpa, &val);
+	return 1;
+}
+
+static inline int tdx_mmio_write(struct kvm_vcpu *vcpu, gpa_t gpa, int size,
+				 unsigned long val)
+{
+	if (!kvm_io_bus_write(vcpu, KVM_FAST_MMIO_BUS, gpa, 0, NULL)) {
+		trace_kvm_fast_mmio(gpa);
+		return 0;
+	}
+
+	trace_kvm_mmio(KVM_TRACE_MMIO_WRITE, size, gpa, &val);
+	if (kvm_io_bus_write(vcpu, KVM_MMIO_BUS, gpa, size, &val))
+		return -EOPNOTSUPP;
+
+	return 0;
+}
+
+static inline int tdx_mmio_read(struct kvm_vcpu *vcpu, gpa_t gpa, int size)
+{
+	unsigned long val;
+
+	if (kvm_io_bus_read(vcpu, KVM_MMIO_BUS, gpa, size, &val))
+		return -EOPNOTSUPP;
+
+	tdvmcall_set_return_val(vcpu, val);
+	trace_kvm_mmio(KVM_TRACE_MMIO_READ, size, gpa, &val);
+	return 0;
+}
+
+static int tdx_emulate_mmio(struct kvm_vcpu *vcpu)
+{
+	struct vcpu_tdx *tdx = to_tdx(vcpu);
+	int size, write, r;
+	unsigned long val;
+	gpa_t gpa;
+
+	size = tdx->vp_enter_args.r12;
+	write = tdx->vp_enter_args.r13;
+	gpa = tdx->vp_enter_args.r14;
+	val = write ? tdx->vp_enter_args.r15 : 0;
+
+	if (size != 1 && size != 2 && size != 4 && size != 8)
+		goto error;
+	if (write != 0 && write != 1)
+		goto error;
+
+	/*
+	 * TDG.VP.VMCALL allows only shared GPA, it makes no sense to
+	 * do MMIO emulation for private GPA.
+	 */
+	if (vt_is_tdx_private_gpa(vcpu->kvm, gpa) ||
+	    vt_is_tdx_private_gpa(vcpu->kvm, gpa + size - 1))
+		goto error;
+
+	gpa = gpa & ~gfn_to_gpa(kvm_gfn_direct_bits(vcpu->kvm));
+
+	if (write)
+		r = tdx_mmio_write(vcpu, gpa, size, val);
+	else
+		r = tdx_mmio_read(vcpu, gpa, size);
+	if (!r)
+		/* Kernel completed device emulation. */
+		return 1;
+
+	/* Request the device emulation to userspace device model. */
+	vcpu->mmio_is_write = write;
+	if (!write)
+		vcpu->arch.complete_userspace_io = tdx_complete_mmio_read;
+
+	vcpu->run->mmio.phys_addr = gpa;
+	vcpu->run->mmio.len = size;
+	vcpu->run->mmio.is_write = write;
+	vcpu->run->exit_reason = KVM_EXIT_MMIO;
+
+	if (write) {
+		memcpy(vcpu->run->mmio.data, &val, size);
+	} else {
+		vcpu->mmio_fragments[0].gpa = gpa;
+		vcpu->mmio_fragments[0].len = size;
+		trace_kvm_mmio(KVM_TRACE_MMIO_READ_UNSATISFIED, size, gpa, NULL);
+	}
+	return 0;
+
+error:
+	tdvmcall_set_return_code(vcpu, TDVMCALL_STATUS_INVALID_OPERAND);
+	return 1;
+}
+
 static int handle_tdvmcall(struct kvm_vcpu *vcpu)
 {
 	switch (tdvmcall_leaf(vcpu)) {
@@ -1546,6 +1651,8 @@ int tdx_handle_exit(struct kvm_vcpu *vcpu, fastpath_t fastpath)
 		return tdx_emulate_vmcall(vcpu);
 	case EXIT_REASON_IO_INSTRUCTION:
 		return tdx_emulate_io(vcpu);
+	case EXIT_REASON_EPT_MISCONFIG:
+		return tdx_emulate_mmio(vcpu);
 	default:
 		break;
 	}
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 29f33f7c9da9..a41d57ba4a86 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -14010,6 +14010,7 @@ EXPORT_SYMBOL_GPL(kvm_sev_es_string_io);
 
 EXPORT_TRACEPOINT_SYMBOL_GPL(kvm_entry);
 EXPORT_TRACEPOINT_SYMBOL_GPL(kvm_exit);
+EXPORT_TRACEPOINT_SYMBOL_GPL(kvm_mmio);
 EXPORT_TRACEPOINT_SYMBOL_GPL(kvm_fast_mmio);
 EXPORT_TRACEPOINT_SYMBOL_GPL(kvm_inj_virq);
 EXPORT_TRACEPOINT_SYMBOL_GPL(kvm_page_fault);
diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
index 836e0c69f53b..783683d04939 100644
--- a/virt/kvm/kvm_main.c
+++ b/virt/kvm/kvm_main.c
@@ -5835,6 +5835,7 @@ int kvm_io_bus_read(struct kvm_vcpu *vcpu, enum kvm_bus bus_idx, gpa_t addr,
 	r = __kvm_io_bus_read(vcpu, bus, &range, val);
 	return r < 0 ? r : 0;
 }
+EXPORT_SYMBOL_GPL(kvm_io_bus_read);
 
 int kvm_io_bus_register_dev(struct kvm *kvm, enum kvm_bus bus_idx, gpa_t addr,
 			    int len, struct kvm_io_device *dev)