From patchwork Wed Sep 4 03:07:31 2024
X-Patchwork-Submitter: Rick Edgecombe
X-Patchwork-Id: 13789633
Subject: [PATCH 01/21] KVM: x86/mmu: Implement memslot deletion for TDX
Date: Tue, 3 Sep 2024 20:07:31 -0700
Message-Id: <20240904030751.117579-2-rick.p.edgecombe@intel.com>

Force TDX VMs to use the zap-leafs behavior of KVM_X86_QUIRK_SLOT_ZAP_ALL,
i.e. to behave as if the quirk is disabled. TDs cannot use the fast
zapping operation to implement memslot deletion for a couple of reasons:

1. KVM cannot fully zap and re-build TDX private PTEs without coordinating
   with the guest, because TDs need to "accept" memory. So an operation to
   delete a memslot needs to limit the private zapping to the range of the
   memslot.

2. For reason (1), kvm_mmu_zap_all_fast() is limited to direct (shared)
   roots. This means it will not zap the mirror (private) PTEs. If a
   memslot were deleted with private memory mapped, the private memory
   would remain mapped in the TD. Then, if the gmem fd were later hole
   punched, the pages could be freed on the host while still mapped in the
   TD, because that operation would no longer have the memslot to map the
   pgoff to the gfn.

To handle the first case, userspace could simply disable the
KVM_X86_QUIRK_SLOT_ZAP_ALL quirk for TDs. This would prevent the issue in
(1), but it is not sufficient to resolve (2), because the problems there
extend beyond the userspace's TD to affect the rest of the host. So the
zap-leafs-only behavior is required for both reasons.

A couple of options were considered, including forcing the
KVM_X86_QUIRK_SLOT_ZAP_ALL quirk to always be disabled for TDs. However,
due to the currently limited quirks interface (there is no way to query
quirks, or to force them to be disabled), this would require developing
additional interfaces. So instead just do the simple thing and make TDs
always use the zap-leafs behavior, as when KVM_X86_QUIRK_SLOT_ZAP_ALL is
disabled.

While at it, have the new behavior apply to all non-KVM_X86_DEFAULT_VM
VMs, as the previous behavior was not ideal (see [0]). It is assumed until
proven otherwise that the other VM types will not be exposed to the bug[1]
that derailed that effort.

Memslot deletion needs to zap both the private and shared mappings of a
GFN, so update the attr_filter field in kvm_mmu_zap_memslot_leafs() to
include both.
Co-developed-by: Yan Zhao
Signed-off-by: Yan Zhao
Signed-off-by: Rick Edgecombe
Link: https://lore.kernel.org/kvm/20190205205443.1059-1-sean.j.christopherson@intel.com/ [0]
Link: https://patchwork.kernel.org/project/kvm/patch/20190205210137.1377-11-sean.j.christopherson@intel.com [1]
---
TDX MMU part 2 v1:
 - Clarify TDX limits on zapping private memory (Sean)

Memslot quirk series:
 - New patch
---
 arch/x86/kvm/mmu/mmu.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c
index a8d91cf11761..7e66d7c426c1 100644
--- a/arch/x86/kvm/mmu/mmu.c
+++ b/arch/x86/kvm/mmu/mmu.c
@@ -7104,6 +7104,7 @@ static void kvm_mmu_zap_memslot_leafs(struct kvm *kvm, struct kvm_memory_slot *s
 		.start = slot->base_gfn,
 		.end = slot->base_gfn + slot->npages,
 		.may_block = true,
+		.attr_filter = KVM_FILTER_PRIVATE | KVM_FILTER_SHARED,
 	};
 	bool flush = false;
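For illustration, the path that triggers this zap is memslot deletion from
userspace. A minimal sketch (not part of the patch; assumes vm_fd, slot_id
and base_gpa refer to a gmem-backed slot created earlier with
KVM_SET_USER_MEMORY_REGION2):

	struct kvm_userspace_memory_region2 region = {
		.slot = slot_id,
		.guest_phys_addr = base_gpa,
		.memory_size = 0,	/* size 0 deletes the slot */
	};

	/* With this patch, a TD zaps only leaf SPTEs in the deleted range,
	 * in both the shared and mirror roots, instead of fast-zapping
	 * everything. */
	ioctl(vm_fd, KVM_SET_USER_MEMORY_REGION2, &region);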
d="scan'208";a="23564629" Received: from orviesa009.jf.intel.com ([10.64.159.149]) by fmvoesa112.fm.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 03 Sep 2024 20:07:59 -0700 X-CSE-ConnectionGUID: RATggeyJTZ235pnag5nl4Q== X-CSE-MsgGUID: WbwoMKdZQAW/bOOyvvJ7ZA== X-ExtLoop1: 1 X-IronPort-AV: E=Sophos;i="6.10,200,1719903600"; d="scan'208";a="65106218" Received: from dgramcko-desk.amr.corp.intel.com (HELO rpedgeco-desk4..) ([10.124.221.153]) by orviesa009-auth.jf.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 03 Sep 2024 20:07:58 -0700 From: Rick Edgecombe To: seanjc@google.com, pbonzini@redhat.com, kvm@vger.kernel.org Cc: kai.huang@intel.com, dmatlack@google.com, isaku.yamahata@gmail.com, yan.y.zhao@intel.com, nik.borisov@suse.com, rick.p.edgecombe@intel.com, linux-kernel@vger.kernel.org Subject: [PATCH 02/21] KVM: x86/tdp_mmu: Add a helper function to walk down the TDP MMU Date: Tue, 3 Sep 2024 20:07:32 -0700 Message-Id: <20240904030751.117579-3-rick.p.edgecombe@intel.com> X-Mailer: git-send-email 2.34.1 In-Reply-To: <20240904030751.117579-1-rick.p.edgecombe@intel.com> References: <20240904030751.117579-1-rick.p.edgecombe@intel.com> Precedence: bulk X-Mailing-List: kvm@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 From: Isaku Yamahata Export a function to walk down the TDP without modifying it and simply check if a PGA is mapped. Future changes will support pre-populating TDX private memory. In order to implement this KVM will need to check if a given GFN is already pre-populated in the mirrored EPT. [1] There is already a TDP MMU walker, kvm_tdp_mmu_get_walk() for use within the KVM MMU that almost does what is required. However, to make sense of the results, MMU internal PTE helpers are needed. Refactor the code to provide a helper that can be used outside of the KVM MMU code. Refactoring the KVM page fault handler to support this lookup usage was also considered, but it was an awkward fit. kvm_tdp_mmu_gpa_is_mapped() is based on a diff by Paolo Bonzini. 
Link: https://lore.kernel.org/kvm/ZfBkle1eZFfjPI8l@google.com/ [1]
Signed-off-by: Isaku Yamahata
Co-developed-by: Rick Edgecombe
Signed-off-by: Rick Edgecombe
Reviewed-by: Paolo Bonzini
---
TDX MMU part 2 v1:
 - Change exported function to just return if GPA is mapped because
   "You are executing with the filemap_invalidate_lock() taken, and
   therefore cannot race with kvm_gmem_punch_hole()" (Paolo)
   https://lore.kernel.org/kvm/CABgObfbpNN842noAe77WYvgi5MzK2SAA_FYw-=fGa+PcT_Z22w@mail.gmail.com/
 - Take root hpa instead of enum (Paolo)

TDX MMU Prep v2:
 - Rename function with "mirror" and use root enum

TDX MMU Prep:
 - New patch
---
 arch/x86/kvm/mmu.h         |  3 +++
 arch/x86/kvm/mmu/mmu.c     |  3 +--
 arch/x86/kvm/mmu/tdp_mmu.c | 37 ++++++++++++++++++++++++++++++++-----
 3 files changed, 36 insertions(+), 7 deletions(-)

diff --git a/arch/x86/kvm/mmu.h b/arch/x86/kvm/mmu.h
index 8f289222b353..5faa416ac874 100644
--- a/arch/x86/kvm/mmu.h
+++ b/arch/x86/kvm/mmu.h
@@ -254,6 +254,9 @@ extern bool tdp_mmu_enabled;
 #define tdp_mmu_enabled false
 #endif

+bool kvm_tdp_mmu_gpa_is_mapped(struct kvm_vcpu *vcpu, u64 gpa);
+int kvm_tdp_map_page(struct kvm_vcpu *vcpu, gpa_t gpa, u64 error_code, u8 *level);
+
 static inline bool kvm_memslots_have_rmaps(struct kvm *kvm)
 {
 	return !tdp_mmu_enabled || kvm_shadow_root_allocated(kvm);

diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c
index 7e66d7c426c1..01808cdf8627 100644
--- a/arch/x86/kvm/mmu/mmu.c
+++ b/arch/x86/kvm/mmu/mmu.c
@@ -4713,8 +4713,7 @@ int kvm_tdp_page_fault(struct kvm_vcpu *vcpu, struct kvm_page_fault *fault)
 	return direct_page_fault(vcpu, fault);
 }

-static int kvm_tdp_map_page(struct kvm_vcpu *vcpu, gpa_t gpa, u64 error_code,
-			    u8 *level)
+int kvm_tdp_map_page(struct kvm_vcpu *vcpu, gpa_t gpa, u64 error_code, u8 *level)
 {
 	int r;

diff --git a/arch/x86/kvm/mmu/tdp_mmu.c b/arch/x86/kvm/mmu/tdp_mmu.c
index 37b3769a5d32..019b43723d90 100644
--- a/arch/x86/kvm/mmu/tdp_mmu.c
+++ b/arch/x86/kvm/mmu/tdp_mmu.c
@@ -1911,16 +1911,13 @@ bool kvm_tdp_mmu_write_protect_gfn(struct kvm *kvm,
  *
  * Must be called between kvm_tdp_mmu_walk_lockless_{begin,end}.
  */
-int kvm_tdp_mmu_get_walk(struct kvm_vcpu *vcpu, u64 addr, u64 *sptes,
-			 int *root_level)
+static int __kvm_tdp_mmu_get_walk(struct kvm_vcpu *vcpu, u64 addr, u64 *sptes,
+				  struct kvm_mmu_page *root)
 {
-	struct kvm_mmu_page *root = root_to_sp(vcpu->arch.mmu->root.hpa);
 	struct tdp_iter iter;
 	gfn_t gfn = addr >> PAGE_SHIFT;
 	int leaf = -1;

-	*root_level = vcpu->arch.mmu->root_role.level;
-
 	tdp_mmu_for_each_pte(iter, vcpu->kvm, root, gfn, gfn + 1) {
 		leaf = iter.level;
 		sptes[leaf] = iter.old_spte;
@@ -1929,6 +1926,36 @@ int kvm_tdp_mmu_get_walk(struct kvm_vcpu *vcpu, u64 addr, u64 *sptes,
 	return leaf;
 }

+int kvm_tdp_mmu_get_walk(struct kvm_vcpu *vcpu, u64 addr, u64 *sptes,
+			 int *root_level)
+{
+	struct kvm_mmu_page *root = root_to_sp(vcpu->arch.mmu->root.hpa);
+	*root_level = vcpu->arch.mmu->root_role.level;
+
+	return __kvm_tdp_mmu_get_walk(vcpu, addr, sptes, root);
+}
+
+bool kvm_tdp_mmu_gpa_is_mapped(struct kvm_vcpu *vcpu, u64 gpa)
+{
+	struct kvm *kvm = vcpu->kvm;
+	bool is_direct = kvm_is_addr_direct(kvm, gpa);
+	hpa_t root = is_direct ? vcpu->arch.mmu->root.hpa :
vcpu->arch.mmu->root.hpa : + vcpu->arch.mmu->mirror_root_hpa; + u64 sptes[PT64_ROOT_MAX_LEVEL + 1], spte; + int leaf; + + lockdep_assert_held(&kvm->mmu_lock); + rcu_read_lock(); + leaf = __kvm_tdp_mmu_get_walk(vcpu, gpa, sptes, root_to_sp(root)); + rcu_read_unlock(); + if (leaf < 0) + return false; + + spte = sptes[leaf]; + return is_shadow_present_pte(spte) && is_last_spte(spte, leaf); +} +EXPORT_SYMBOL_GPL(kvm_tdp_mmu_gpa_is_mapped); + /* * Returns the last level spte pointer of the shadow page walk for the given * gpa, and sets *spte to the spte value. This spte may be non-preset. If no From patchwork Wed Sep 4 03:07:33 2024 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Rick Edgecombe X-Patchwork-Id: 13789636 Received: from mgamail.intel.com (mgamail.intel.com [192.198.163.18]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 3873B4F20E; Wed, 4 Sep 2024 03:14:33 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=192.198.163.18 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1725419675; cv=none; b=g0l0s8S/IRNwp/MfzkKQQX/OaYxpTgA0S8wBXgba56aAgB7sB0G9iewU58L/tFuGH9nb1nzYaAFK8eGQkvPAk+6GNDpoDVgyjJK3aQmvqYVx8ZzVdHNpFpFfx/C/S+V+uQxysTd621ZNmT39tDINy0PdYR1qGzWfUpmXgJjOwvU= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1725419675; c=relaxed/simple; bh=w+GKyCxiK7UnwLz7nTaK27EoEWg1ukTr+XO1hSqevvk=; h=From:To:Cc:Subject:Date:Message-Id:In-Reply-To:References: MIME-Version; b=luNA7Xb/c0z4jsfXykzXhCqhPR4/sytiBLT3ZkWfozOpRcZ6p686nRVRSgoIhRnvHf25EcJV+bDFnzxPWhuxKY1g8HxH4Al/z+LaGikSBhDBKNZXaMrtMnX5lMJO2//9C+HPpjdzoLr+USLwNJcAiIzIJCR1b1rhp9YWx1pJngE= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=intel.com; spf=pass smtp.mailfrom=intel.com; dkim=pass (2048-bit key) header.d=intel.com header.i=@intel.com header.b=J1B8VSON; arc=none smtp.client-ip=192.198.163.18 Authentication-Results: smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=intel.com Authentication-Results: smtp.subspace.kernel.org; spf=pass smtp.mailfrom=intel.com Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=intel.com header.i=@intel.com header.b="J1B8VSON" DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=intel.com; i=@intel.com; q=dns/txt; s=Intel; t=1725419673; x=1756955673; h=from:to:cc:subject:date:message-id:in-reply-to: references:mime-version:content-transfer-encoding; bh=w+GKyCxiK7UnwLz7nTaK27EoEWg1ukTr+XO1hSqevvk=; b=J1B8VSONQfwmGgCy6iy93VHiaOJ+iuJmAVyPIf1iGVOMG93LKmeEgMho OnH8iNtQPN80j2jxxnc/SEBqCXqJAkZoF4S/thiid6a5lSKVW/jqMN70b tJ+vPc9Bq6vMeWfUtzNCsfAFS2m3ujGro7+2YyRu+Nt8i+0JDrJdFt3Hy 5TwT3vDYF3uEJD0NkgxToZDLrN4yL4DIPGaEkh41zoHgps0XeEDoX/tHg mZmI/cvBNMTxnMvJ0GXYTMX5A63rODXuyXcTzR2TG5RRI3lerZSFVUk3S 4P1elXn0wDpjxCQGoApHdiv57dG+j6UuRJuPrd+sQNRSPO2qfIq2Eu1z3 g==; X-CSE-ConnectionGUID: 2hG7oIIvSdKB1VxIPhyoJg== X-CSE-MsgGUID: 5Zr2ENC7T1q9SD8dIBCsqA== X-IronPort-AV: E=McAfee;i="6700,10204,11184"; a="23564632" X-IronPort-AV: E=Sophos;i="6.10,200,1719903600"; d="scan'208";a="23564632" Received: from orviesa009.jf.intel.com ([10.64.159.149]) by fmvoesa112.fm.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 03 Sep 2024 20:08:00 -0700 X-CSE-ConnectionGUID: kNhRQgSqSUCgZO44tWkFXA== X-CSE-MsgGUID: kc6yr9tFQWiamHJtDlbcWA== X-ExtLoop1: 1 X-IronPort-AV: 
E=Sophos;i="6.10,200,1719903600"; d="scan'208";a="65106228" Received: from dgramcko-desk.amr.corp.intel.com (HELO rpedgeco-desk4..) ([10.124.221.153]) by orviesa009-auth.jf.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 03 Sep 2024 20:07:59 -0700 From: Rick Edgecombe To: seanjc@google.com, pbonzini@redhat.com, kvm@vger.kernel.org Cc: kai.huang@intel.com, dmatlack@google.com, isaku.yamahata@gmail.com, yan.y.zhao@intel.com, nik.borisov@suse.com, rick.p.edgecombe@intel.com, linux-kernel@vger.kernel.org, Yuan Yao , Binbin Wu Subject: [PATCH 03/21] KVM: x86/mmu: Do not enable page track for TD guest Date: Tue, 3 Sep 2024 20:07:33 -0700 Message-Id: <20240904030751.117579-4-rick.p.edgecombe@intel.com> X-Mailer: git-send-email 2.34.1 In-Reply-To: <20240904030751.117579-1-rick.p.edgecombe@intel.com> References: <20240904030751.117579-1-rick.p.edgecombe@intel.com> Precedence: bulk X-Mailing-List: kvm@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 From: Yan Zhao TDX does not support write protection and hence page track. Though !tdp_enabled and kvm_shadow_root_allocated(kvm) are always false for TD guest, should also return false when external write tracking is enabled. Cc: Yuan Yao Signed-off-by: Yan Zhao Signed-off-by: Rick Edgecombe Reviewed-by: Binbin Wu --- v19: - drop TDX: from the short log - Added reviewed-by: BinBin --- arch/x86/kvm/mmu/page_track.c | 3 +++ 1 file changed, 3 insertions(+) diff --git a/arch/x86/kvm/mmu/page_track.c b/arch/x86/kvm/mmu/page_track.c index 561c331fd6ec..26436113103a 100644 --- a/arch/x86/kvm/mmu/page_track.c +++ b/arch/x86/kvm/mmu/page_track.c @@ -35,6 +35,9 @@ static bool kvm_external_write_tracking_enabled(struct kvm *kvm) bool kvm_page_track_write_tracking_enabled(struct kvm *kvm) { + if (kvm->arch.vm_type == KVM_X86_TDX_VM) + return false; + return kvm_external_write_tracking_enabled(kvm) || kvm_shadow_root_allocated(kvm) || !tdp_enabled; } From patchwork Wed Sep 4 03:07:34 2024 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Rick Edgecombe X-Patchwork-Id: 13789637 Received: from mgamail.intel.com (mgamail.intel.com [192.198.163.18]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 2147D6CDCC; Wed, 4 Sep 2024 03:14:34 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=192.198.163.18 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1725419676; cv=none; b=BO3YIusfBXpR0CC6PRKjyw7pyvqpRudM6sZJeXB+yhOjjtIKYqwFwFMgH2/1B5rxXCDOkjppYt0j7Q0jfasQCcIhegq9CiAMQr0e3bCpCPRYhzuw7DtBzp61DPTe9uSyN7OV4SEMC0uvsqqtKYb2JzRY5Uq3VEHzJ2qZaK0bmVI= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1725419676; c=relaxed/simple; bh=dJIz3M3DQymfcueoB4QfkdH4fJLqj05YhkoxZc/qRRw=; h=From:To:Cc:Subject:Date:Message-Id:In-Reply-To:References: MIME-Version; b=O/EzowXS8+g3lORv9m9m/8TdXtTwc0BDSm5NduBT8fHpq/RYWOAs+xdWHSrAEd5HWelgX5XtVSNcv6FaXvgwm5yphjagbphxD81Jz/WTnZBEa7+qSlDPle6jgJkOU6MyBFUDzSzyz/R/lAv44kngXB0YPtQcj2CG3h3VGP9Jw6U= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=intel.com; spf=pass smtp.mailfrom=intel.com; dkim=pass (2048-bit key) header.d=intel.com header.i=@intel.com header.b=dBtbhDRU; arc=none smtp.client-ip=192.198.163.18 Authentication-Results: smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=intel.com 
From patchwork Wed Sep 4 03:07:34 2024
X-Patchwork-Submitter: Rick Edgecombe
X-Patchwork-Id: 13789637
Subject: [PATCH 04/21] KVM: VMX: Split out guts of EPT violation to common/exposed function
Date: Tue, 3 Sep 2024 20:07:34 -0700
Message-Id: <20240904030751.117579-5-rick.p.edgecombe@intel.com>

From: Sean Christopherson

A TDX EPT violation differs from a VMX one only in how the information,
i.e. the GPA and the exit qualification, is retrieved. To share the code
that handles EPT violations, split out the guts of the EPT violation
handler so that the VMX/TDX exit handlers can call it after retrieving the
GPA and exit qualification.

Signed-off-by: Sean Christopherson
Signed-off-by: Isaku Yamahata
Signed-off-by: Rick Edgecombe
Reviewed-by: Paolo Bonzini
Reviewed-by: Kai Huang
Reviewed-by: Binbin Wu
---
 arch/x86/kvm/vmx/common.h | 34 ++++++++++++++++++++++++++++++++++
 arch/x86/kvm/vmx/vmx.c    | 25 +++----------------------
 2 files changed, 37 insertions(+), 22 deletions(-)
 create mode 100644 arch/x86/kvm/vmx/common.h

diff --git a/arch/x86/kvm/vmx/common.h b/arch/x86/kvm/vmx/common.h
new file mode 100644
index 000000000000..78ae39b6cdcd
--- /dev/null
+++ b/arch/x86/kvm/vmx/common.h
@@ -0,0 +1,34 @@
+/* SPDX-License-Identifier: GPL-2.0-only */
+#ifndef __KVM_X86_VMX_COMMON_H
+#define __KVM_X86_VMX_COMMON_H
+
+#include <linux/kvm_host.h>
+
+#include "mmu.h"
+
+static inline int __vmx_handle_ept_violation(struct kvm_vcpu *vcpu, gpa_t gpa,
+					     unsigned long exit_qualification)
+{
+	u64 error_code;
+
+	/* Is it a read fault? */
+	error_code = (exit_qualification & EPT_VIOLATION_ACC_READ)
+		     ? PFERR_USER_MASK : 0;
+	/* Is it a write fault? */
+	error_code |= (exit_qualification & EPT_VIOLATION_ACC_WRITE)
+		      ? PFERR_WRITE_MASK : 0;
+	/* Is it a fetch fault? */
+	error_code |= (exit_qualification & EPT_VIOLATION_ACC_INSTR)
+		      ? PFERR_FETCH_MASK : 0;
+	/* ept page table entry is present? */
+	error_code |= (exit_qualification & EPT_VIOLATION_RWX_MASK)
+		      ? PFERR_PRESENT_MASK : 0;
+
+	if (error_code & EPT_VIOLATION_GVA_IS_VALID)
+		error_code |= (exit_qualification & EPT_VIOLATION_GVA_TRANSLATED) ?
+			      PFERR_GUEST_FINAL_MASK : PFERR_GUEST_PAGE_MASK;
+
+	return kvm_mmu_page_fault(vcpu, gpa, error_code, NULL, 0);
+}
+
+#endif /* __KVM_X86_VMX_COMMON_H */

diff --git a/arch/x86/kvm/vmx/vmx.c b/arch/x86/kvm/vmx/vmx.c
index 5e7b5732f35d..ade7666febe9 100644
--- a/arch/x86/kvm/vmx/vmx.c
+++ b/arch/x86/kvm/vmx/vmx.c
@@ -53,6 +53,7 @@
 #include <asm/vmx.h>

 #include "capabilities.h"
+#include "common.h"
 #include "cpuid.h"
 #include "hyperv.h"
 #include "kvm_onhyperv.h"
@@ -5771,11 +5772,8 @@ static int handle_task_switch(struct kvm_vcpu *vcpu)

 static int handle_ept_violation(struct kvm_vcpu *vcpu)
 {
-	unsigned long exit_qualification;
+	unsigned long exit_qualification = vmx_get_exit_qual(vcpu);
 	gpa_t gpa;
-	u64 error_code;
-
-	exit_qualification = vmx_get_exit_qual(vcpu);

 	/*
 	 * EPT violation happened while executing iret from NMI,
@@ -5791,23 +5789,6 @@ static int handle_ept_violation(struct kvm_vcpu *vcpu)
 	gpa = vmcs_read64(GUEST_PHYSICAL_ADDRESS);
 	trace_kvm_page_fault(vcpu, gpa, exit_qualification);

-	/* Is it a read fault? */
-	error_code = (exit_qualification & EPT_VIOLATION_ACC_READ)
-		     ? PFERR_USER_MASK : 0;
-	/* Is it a write fault? */
-	error_code |= (exit_qualification & EPT_VIOLATION_ACC_WRITE)
-		      ? PFERR_WRITE_MASK : 0;
-	/* Is it a fetch fault? */
-	error_code |= (exit_qualification & EPT_VIOLATION_ACC_INSTR)
-		      ? PFERR_FETCH_MASK : 0;
-	/* ept page table entry is present? */
-	error_code |= (exit_qualification & EPT_VIOLATION_RWX_MASK)
-		      ? PFERR_PRESENT_MASK : 0;
-
-	if (error_code & EPT_VIOLATION_GVA_IS_VALID)
-		error_code |= (exit_qualification & EPT_VIOLATION_GVA_TRANSLATED) ?
-			      PFERR_GUEST_FINAL_MASK : PFERR_GUEST_PAGE_MASK;
-
 	/*
 	 * Check that the GPA doesn't exceed physical memory limits, as that is
 	 * a guest page fault.
 	 * We have to emulate the instruction here, because ...
 	 */
@@ -5819,7 +5800,7 @@ static int handle_ept_violation(struct kvm_vcpu *vcpu)
 	if (unlikely(allow_smaller_maxphyaddr &&
 		     !kvm_vcpu_is_legal_gpa(vcpu, gpa)))
 		return kvm_emulate_instruction(vcpu, 0);

-	return kvm_mmu_page_fault(vcpu, gpa, error_code, NULL, 0);
+	return __vmx_handle_ept_violation(vcpu, gpa, exit_qualification);
 }

 static int handle_ept_misconfig(struct kvm_vcpu *vcpu)
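For context, the intended TDX-side caller would look roughly like this (a
sketch based on the commit message; tdx_handle_ept_violation() and the
tdexit_*() accessors are assumptions, not part of this patch):

	/* Sketch: the TDX exit handler retrieves the GPA and exit
	 * qualification from TDX-specific exit info, then reuses the
	 * common helper added above. */
	static int tdx_handle_ept_violation(struct kvm_vcpu *vcpu)
	{
		gpa_t gpa = tdexit_gpa(vcpu);
		unsigned long exit_qual = tdexit_exit_qual(vcpu);

		trace_kvm_page_fault(vcpu, gpa, exit_qual);
		return __vmx_handle_ept_violation(vcpu, gpa, exit_qual);
	}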
From patchwork Wed Sep 4 03:07:35 2024
X-Patchwork-Submitter: Rick Edgecombe
X-Patchwork-Id: 13789634
Subject: [PATCH 05/21] KVM: VMX: Teach EPT violation helper about private mem
Date: Tue, 3 Sep 2024 20:07:35 -0700
Message-Id: <20240904030751.117579-6-rick.p.edgecombe@intel.com>

Teach the EPT violation helper to check the shared mask of a GPA to find
out whether the GPA is for private memory.

When an EPT violation is triggered by a TD accessing a private GPA, KVM
will exit to user space if the corresponding GFN's attribute is not
private. User space will then update the GFN's attribute during its memory
conversion process. After that, the TD will re-access the private GPA and
trigger the EPT violation again. Only once the GFN's attribute matches
private will KVM fault in the private page, map it in the mirrored TDP
root, and propagate the changes to the private EPT to resolve the EPT
violation.

Relying on the GFN's attribute-tracking xarray to determine whether a GFN
is private, as is done for KVM_X86_SW_PROTECTED_VM, could lead to endless
EPT violations.

Co-developed-by: Yan Zhao
Signed-off-by: Yan Zhao
Signed-off-by: Rick Edgecombe
Reviewed-by: Paolo Bonzini
---
TDX MMU part 2 v1:
 - Split from "KVM: TDX: handle ept violation/misconfig exit"
---
 arch/x86/kvm/vmx/common.h | 13 +++++++++++++
 1 file changed, 13 insertions(+)

diff --git a/arch/x86/kvm/vmx/common.h b/arch/x86/kvm/vmx/common.h
index 78ae39b6cdcd..10aa12d45097 100644
--- a/arch/x86/kvm/vmx/common.h
+++ b/arch/x86/kvm/vmx/common.h
@@ -6,6 +6,12 @@

 #include "mmu.h"

+static inline bool kvm_is_private_gpa(struct kvm *kvm, gpa_t gpa)
+{
+	/* For TDX the direct mask is the shared mask. */
+	return !kvm_is_addr_direct(kvm, gpa);
+}
+
 static inline int __vmx_handle_ept_violation(struct kvm_vcpu *vcpu, gpa_t gpa,
 					     unsigned long exit_qualification)
 {
@@ -28,6 +34,13 @@ static inline int __vmx_handle_ept_violation(struct kvm_vcpu *vcpu, gpa_t gpa,
 		error_code |= (exit_qualification & EPT_VIOLATION_GVA_TRANSLATED) ?
 			      PFERR_GUEST_FINAL_MASK : PFERR_GUEST_PAGE_MASK;

+	/*
+	 * Don't rely on GFN's attribute tracking xarray to prevent EPT
+	 * violation loops.
+	 */
+	if (kvm_is_private_gpa(vcpu->kvm, gpa))
+		error_code |= PFERR_PRIVATE_ACCESS;
+
 	return kvm_mmu_page_fault(vcpu, gpa, error_code, NULL, 0);
 }
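As a worked example (a sketch; the bit position assumes the shared bit is
GPA bit 47, which patch 8 configures for the non-MAX_GPAW case):

	/* kvm_gfn_direct_bits(kvm) == gpa_to_gfn(BIT_ULL(47)) */
	kvm_is_private_gpa(kvm, 0x123000);		 /* true: shared bit clear */
	kvm_is_private_gpa(kvm, BIT_ULL(47) | 0x123000); /* false: shared alias */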
From patchwork Wed Sep 4 03:07:36 2024
X-Patchwork-Submitter: Rick Edgecombe
X-Patchwork-Id: 13789640
Subject: [PATCH 06/21] KVM: TDX: Add accessors VMX VMCS helpers
Date: Tue, 3 Sep 2024 20:07:36 -0700
Message-Id: <20240904030751.117579-7-rick.p.edgecombe@intel.com>

From: Isaku Yamahata

TDX defines SEAMCALL APIs to access TDX control structures corresponding
to the VMX VMCS. Introduce helper accessors to hide the SEAMCALL ABI
details.

Signed-off-by: Isaku Yamahata
Signed-off-by: Rick Edgecombe
---
TDX MMU part 2 v1:
 - Update for the wrapper functions for SEAMCALLs. (Sean)
 - Eliminate kvm_mmu_free_private_spt() and open code it.
 - Fix bisectability issues in headers (Kai)
 - Updates from seamcall overhaul (Kai)

v19:
 - deleted unnecessary stub functions, tdvps_state_non_arch_check() and
   tdvps_management_check().
---
 arch/x86/kvm/vmx/tdx.h | 87 ++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 87 insertions(+)

diff --git a/arch/x86/kvm/vmx/tdx.h b/arch/x86/kvm/vmx/tdx.h
index 7eeb54fbcae1..66540c57ed61 100644
--- a/arch/x86/kvm/vmx/tdx.h
+++ b/arch/x86/kvm/vmx/tdx.h
@@ -76,6 +76,93 @@ static __always_inline struct vcpu_tdx *to_tdx(struct kvm_vcpu *vcpu)
  */
 #include "tdx_ops.h"

+static __always_inline void tdvps_vmcs_check(u32 field, u8 bits)
+{
+#define VMCS_ENC_ACCESS_TYPE_MASK	0x1UL
+#define VMCS_ENC_ACCESS_TYPE_FULL	0x0UL
+#define VMCS_ENC_ACCESS_TYPE_HIGH	0x1UL
+#define VMCS_ENC_ACCESS_TYPE(field)	((field) & VMCS_ENC_ACCESS_TYPE_MASK)
+
+	/* TDX is 64bit only. HIGH field isn't supported. */
+	BUILD_BUG_ON_MSG(__builtin_constant_p(field) &&
+			 VMCS_ENC_ACCESS_TYPE(field) == VMCS_ENC_ACCESS_TYPE_HIGH,
+			 "Read/Write to TD VMCS *_HIGH fields not supported");
+
+	BUILD_BUG_ON(bits != 16 && bits != 32 && bits != 64);
+
+#define VMCS_ENC_WIDTH_MASK	GENMASK(14, 13)
+#define VMCS_ENC_WIDTH_16BIT	(0UL << 13)
+#define VMCS_ENC_WIDTH_64BIT	(1UL << 13)
+#define VMCS_ENC_WIDTH_32BIT	(2UL << 13)
+#define VMCS_ENC_WIDTH_NATURAL	(3UL << 13)
+#define VMCS_ENC_WIDTH(field)	((field) & VMCS_ENC_WIDTH_MASK)
+
+	/* TDX is 64bit only. i.e. natural width = 64bit. */
+	BUILD_BUG_ON_MSG(bits != 64 && __builtin_constant_p(field) &&
+			 (VMCS_ENC_WIDTH(field) == VMCS_ENC_WIDTH_64BIT ||
+			  VMCS_ENC_WIDTH(field) == VMCS_ENC_WIDTH_NATURAL),
+			 "Invalid TD VMCS access for 64-bit field");
+	BUILD_BUG_ON_MSG(bits != 32 && __builtin_constant_p(field) &&
+			 VMCS_ENC_WIDTH(field) == VMCS_ENC_WIDTH_32BIT,
+			 "Invalid TD VMCS access for 32-bit field");
+	BUILD_BUG_ON_MSG(bits != 16 && __builtin_constant_p(field) &&
+			 VMCS_ENC_WIDTH(field) == VMCS_ENC_WIDTH_16BIT,
+			 "Invalid TD VMCS access for 16-bit field");
+}
+
+#define TDX_BUILD_TDVPS_ACCESSORS(bits, uclass, lclass)			\
+static __always_inline u##bits td_##lclass##_read##bits(struct vcpu_tdx *tdx, \
+							u32 field)	\
+{									\
+	u64 err, data;							\
+									\
+	tdvps_##lclass##_check(field, bits);				\
+	err = tdh_vp_rd(tdx, TDVPS_##uclass(field), &data);		\
+	if (KVM_BUG_ON(err, tdx->vcpu.kvm)) {				\
+		pr_err("TDH_VP_RD["#uclass".0x%x] failed: 0x%llx\n",	\
+		       field, err);					\
+		return 0;						\
+	}								\
+	return (u##bits)data;						\
+}									\
+static __always_inline void td_##lclass##_write##bits(struct vcpu_tdx *tdx, \
+						      u32 field, u##bits val) \
+{									\
+	u64 err;							\
+									\
+	tdvps_##lclass##_check(field, bits);				\
+	err = tdh_vp_wr(tdx, TDVPS_##uclass(field), val,		\
+			GENMASK_ULL(bits - 1, 0));			\
+	if (KVM_BUG_ON(err, tdx->vcpu.kvm))				\
+		pr_err("TDH_VP_WR["#uclass".0x%x] = 0x%llx failed: 0x%llx\n", \
+		       field, (u64)val, err);				\
+}									\
+static __always_inline void td_##lclass##_setbit##bits(struct vcpu_tdx *tdx, \
+						       u32 field, u64 bit) \
+{									\
+	u64 err;							\
+									\
+	tdvps_##lclass##_check(field, bits);				\
+	err = tdh_vp_wr(tdx, TDVPS_##uclass(field), bit, bit);		\
+	if (KVM_BUG_ON(err, tdx->vcpu.kvm))				\
+		pr_err("TDH_VP_WR["#uclass".0x%x] |= 0x%llx failed: 0x%llx\n", \
+		       field, bit, err);				\
+}									\
+static __always_inline void td_##lclass##_clearbit##bits(struct vcpu_tdx *tdx, \
+							 u32 field, u64 bit) \
+{									\
+	u64 err;							\
+									\
+	tdvps_##lclass##_check(field, bits);				\
+	err = tdh_vp_wr(tdx, TDVPS_##uclass(field), 0, bit);		\
+	if (KVM_BUG_ON(err, tdx->vcpu.kvm))				\
+		pr_err("TDH_VP_WR["#uclass".0x%x] &= ~0x%llx failed: 0x%llx\n", \
+		       field, bit, err);				\
+}
+
+TDX_BUILD_TDVPS_ACCESSORS(16, VMCS, vmcs);
+TDX_BUILD_TDVPS_ACCESSORS(32, VMCS, vmcs);
+TDX_BUILD_TDVPS_ACCESSORS(64, VMCS, vmcs);
 #else
 static inline void tdx_bringup(void) {}
 static inline void tdx_cleanup(void) {}
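To see what the macro generates, a sketch of typical call sites (the field
names are standard VMCS encodings; whether these exact calls appear later
in the series is an assumption):

	/* The macro expands to td_vmcs_read64()/td_vmcs_write64()/
	 * td_vmcs_setbit32()/... which wrap TDH.VP.RD/TDH.VP.WR on the TD
	 * VMCS and BUG the VM on unexpected SEAMCALL errors. */
	static void example_td_vmcs_usage(struct vcpu_tdx *tdx)
	{
		u64 rip = td_vmcs_read64(tdx, GUEST_RIP);	/* TDH.VP.RD */

		td_vmcs_write64(tdx, GUEST_RIP, rip + 2);	/* TDH.VP.WR */
		td_vmcs_setbit32(tdx, PIN_BASED_VM_EXEC_CONTROL,
				 PIN_BASED_NMI_EXITING);
	}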
From patchwork Wed Sep 4 03:07:37 2024
X-Patchwork-Submitter: Rick Edgecombe
X-Patchwork-Id: 13789638
Subject: [PATCH 07/21] KVM: TDX: Add load_mmu_pgd method for TDX
Date: Tue, 3 Sep 2024 20:07:37 -0700
Message-Id: <20240904030751.117579-8-rick.p.edgecombe@intel.com>

From: Sean Christopherson

TDX uses two EPT pointers, one for the private half of the GPA space and
one for the shared half. The private half uses the normal EPT_POINTER vmcs
field, which is managed in a special way by the TDX module. For TDX, KVM
is not allowed to operate on it directly. The shared half uses a new
SHARED_EPT_POINTER field and will be managed by the conventional MMU
management operations that operate directly on the EPT root. This means
that for TDX the .load_mmu_pgd() operation will need to know to use the
SHARED_EPT_POINTER field instead of the normal one.

Add a new wrapper in x86 ops for load_mmu_pgd() that either directs the
write to the existing vmx implementation or a TDX one.
tdx_load_mmu_pgd() is much simpler than vmx_load_mmu_pgd() since, for the
TDX mode of operation, EPT will always be used and KVM does not need to be
involved in virtualization of CR3 behavior. So tdx_load_mmu_pgd() can
simply write to SHARED_EPT_POINTER.

Signed-off-by: Sean Christopherson
Signed-off-by: Isaku Yamahata
Signed-off-by: Rick Edgecombe
Reviewed-by: Paolo Bonzini
---
TDX MMU part 2 v1:
 - update the commit msg with the version rephrased by Rick.
   https://lore.kernel.org/all/78b1024ec3f5868e228baf797c6be98c5397bd49.camel@intel.com/

v19:
 - Add WARN_ON_ONCE() to tdx_load_mmu_pgd() and drop unconditional mask
---
 arch/x86/include/asm/vmx.h |  1 +
 arch/x86/kvm/vmx/main.c    | 13 ++++++++++++-
 arch/x86/kvm/vmx/tdx.c     |  5 +++++
 arch/x86/kvm/vmx/x86_ops.h |  4 ++++
 4 files changed, 22 insertions(+), 1 deletion(-)

diff --git a/arch/x86/include/asm/vmx.h b/arch/x86/include/asm/vmx.h
index d77a31039f24..3e003183a4f7 100644
--- a/arch/x86/include/asm/vmx.h
+++ b/arch/x86/include/asm/vmx.h
@@ -237,6 +237,7 @@ enum vmcs_field {
 	TSC_MULTIPLIER_HIGH             = 0x00002033,
 	TERTIARY_VM_EXEC_CONTROL        = 0x00002034,
 	TERTIARY_VM_EXEC_CONTROL_HIGH   = 0x00002035,
+	SHARED_EPT_POINTER              = 0x0000203C,
 	PID_POINTER_TABLE               = 0x00002042,
 	PID_POINTER_TABLE_HIGH          = 0x00002043,
 	GUEST_PHYSICAL_ADDRESS          = 0x00002400,

diff --git a/arch/x86/kvm/vmx/main.c b/arch/x86/kvm/vmx/main.c
index d63685ea95ce..c9dfa3aa866c 100644
--- a/arch/x86/kvm/vmx/main.c
+++ b/arch/x86/kvm/vmx/main.c
@@ -100,6 +100,17 @@ static void vt_vcpu_reset(struct kvm_vcpu *vcpu, bool init_event)
 	vmx_vcpu_reset(vcpu, init_event);
 }

+static void vt_load_mmu_pgd(struct kvm_vcpu *vcpu, hpa_t root_hpa,
+			    int pgd_level)
+{
+	if (is_td_vcpu(vcpu)) {
+		tdx_load_mmu_pgd(vcpu, root_hpa, pgd_level);
+		return;
+	}
+
+	vmx_load_mmu_pgd(vcpu, root_hpa, pgd_level);
+}
+
 static int vt_mem_enc_ioctl(struct kvm *kvm, void __user *argp)
 {
 	if (!is_td(kvm))
@@ -229,7 +240,7 @@ struct kvm_x86_ops vt_x86_ops __initdata = {
 	.write_tsc_offset = vmx_write_tsc_offset,
 	.write_tsc_multiplier = vmx_write_tsc_multiplier,

-	.load_mmu_pgd = vmx_load_mmu_pgd,
+	.load_mmu_pgd = vt_load_mmu_pgd,

 	.check_intercept = vmx_check_intercept,
 	.handle_exit_irqoff = vmx_handle_exit_irqoff,

diff --git a/arch/x86/kvm/vmx/tdx.c b/arch/x86/kvm/vmx/tdx.c
index 2ef95c84ee5b..8f43977ef4c6 100644
--- a/arch/x86/kvm/vmx/tdx.c
+++ b/arch/x86/kvm/vmx/tdx.c
@@ -428,6 +428,11 @@ void tdx_vcpu_reset(struct kvm_vcpu *vcpu, bool init_event)
 	 */
 }

+void tdx_load_mmu_pgd(struct kvm_vcpu *vcpu, hpa_t root_hpa, int pgd_level)
+{
+	td_vmcs_write64(to_tdx(vcpu), SHARED_EPT_POINTER, root_hpa);
+}
+
 static int tdx_get_capabilities(struct kvm_tdx_cmd *cmd)
 {
 	const struct tdx_sys_info_td_conf *td_conf = &tdx_sysinfo->td_conf;

diff --git a/arch/x86/kvm/vmx/x86_ops.h b/arch/x86/kvm/vmx/x86_ops.h
index debc6877729a..dcf2b36efbb9 100644
--- a/arch/x86/kvm/vmx/x86_ops.h
+++ b/arch/x86/kvm/vmx/x86_ops.h
@@ -130,6 +130,8 @@ void tdx_vcpu_free(struct kvm_vcpu *vcpu);
 void tdx_vcpu_reset(struct kvm_vcpu *vcpu, bool init_event);

 int tdx_vcpu_ioctl(struct kvm_vcpu *vcpu, void __user *argp);
+
+void tdx_load_mmu_pgd(struct kvm_vcpu *vcpu, hpa_t root_hpa, int root_level);
 #else
 static inline int tdx_vm_init(struct kvm *kvm) { return -EOPNOTSUPP; }
 static inline void tdx_mmu_release_hkid(struct kvm *kvm) {}
@@ -142,6 +144,8 @@ static inline void tdx_vcpu_free(struct kvm_vcpu *vcpu) {}
 static inline void tdx_vcpu_reset(struct kvm_vcpu *vcpu, bool init_event) {}

 static inline int tdx_vcpu_ioctl(struct kvm_vcpu *vcpu, void __user *argp) { return -EOPNOTSUPP; }
+
+static inline void tdx_load_mmu_pgd(struct kvm_vcpu *vcpu, hpa_t root_hpa, int root_level) {}
 #endif

 #endif /* __KVM_X86_VMX_X86_OPS_H */
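For orientation, the generic MMU side that reaches this hook is
kvm_mmu_load_pgd(), paraphrased below from arch/x86/kvm/mmu.h (a sketch
from general KVM knowledge, not part of this patch). For a TD, the root
passed in is the shared (direct) EPT root, so tdx_load_mmu_pgd() writes it
to SHARED_EPT_POINTER while the private root stays under TDX module
control:

	static inline void kvm_mmu_load_pgd(struct kvm_vcpu *vcpu)
	{
		u64 root_hpa = vcpu->arch.mmu->root.hpa;

		if (!VALID_PAGE(root_hpa))
			return;

		/* Dispatches to vt_load_mmu_pgd() -> tdx_load_mmu_pgd() for TDs */
		static_call(kvm_x86_load_mmu_pgd)(vcpu, root_hpa,
						  vcpu->arch.mmu->root_role.level);
	}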
From patchwork Wed Sep 4 03:07:38 2024
X-Patchwork-Submitter: Rick Edgecombe
X-Patchwork-Id: 13789639
Subject: [PATCH 08/21] KVM: TDX: Set gfn_direct_bits to shared bit
Date: Tue, 3 Sep 2024 20:07:38 -0700
Message-Id: <20240904030751.117579-9-rick.p.edgecombe@intel.com>

From: Isaku Yamahata

Make the direct root handle memslot GFNs at an alias with the TDX shared
bit set.

For TDX shared memory, the memslot GFNs need to be mapped at an alias with
the shared bit set. These shared mappings will be mapped on the KVM MMU's
"direct" root. The direct root has its mappings shifted by applying
"gfn_direct_bits" as a mask. The concept of "GPAW" (guest physical address
width) determines the location of the shared bit. So set gfn_direct_bits
based on this, to map shared memory at the proper GPA.

Signed-off-by: Isaku Yamahata
Signed-off-by: Rick Edgecombe
Reviewed-by: Paolo Bonzini
---
TDX MMU part 2 v1:
 - Move setting of gfn_direct_bits to separate patch (Yan)
---
 arch/x86/kvm/vmx/tdx.c | 5 +++++
 1 file changed, 5 insertions(+)

diff --git a/arch/x86/kvm/vmx/tdx.c b/arch/x86/kvm/vmx/tdx.c
index 8f43977ef4c6..25c24901061b 100644
--- a/arch/x86/kvm/vmx/tdx.c
+++ b/arch/x86/kvm/vmx/tdx.c
@@ -921,6 +921,11 @@ static int tdx_td_init(struct kvm *kvm, struct kvm_tdx_cmd *cmd)
 	kvm_tdx->attributes = td_params->attributes;
 	kvm_tdx->xfam = td_params->xfam;

+	if (td_params->exec_controls & TDX_EXEC_CONTROL_MAX_GPAW)
+		kvm->arch.gfn_direct_bits = gpa_to_gfn(BIT_ULL(51));
+	else
+		kvm->arch.gfn_direct_bits = gpa_to_gfn(BIT_ULL(47));
+
 out:
 	/* kfree() accepts NULL. */
 	kfree(init_vm);
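As a worked example of the arithmetic (a sketch, not part of the patch):

	/* GPAW 52: the shared bit is GPA bit 51, i.e. GFN bit 39 after the
	 * PAGE_SHIFT (12) conversion done by gpa_to_gfn(). */
	gfn_t direct_bits = gpa_to_gfn(BIT_ULL(51));	/* == BIT_ULL(39) */

	/* A private GFN and its shared alias on the direct root: */
	gfn_t gfn = 0x1000;
	gfn_t shared_alias = gfn | direct_bits;		/* == 0x8000001000 */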
From: Rick Edgecombe
To: seanjc@google.com, pbonzini@redhat.com, kvm@vger.kernel.org
Cc: kai.huang@intel.com, dmatlack@google.com, isaku.yamahata@gmail.com,
    yan.y.zhao@intel.com, nik.borisov@suse.com, rick.p.edgecombe@intel.com,
    linux-kernel@vger.kernel.org, Yuan Yao
Subject: [PATCH 09/21] KVM: TDX: Retry seamcall when TDX_OPERAND_BUSY with operand SEPT
Date: Tue, 3 Sep 2024 20:07:39 -0700
Message-Id: <20240904030751.117579-10-rick.p.edgecombe@intel.com>

From: Yuan Yao

The TDX module uses internal locks to protect its resources. Because its
execution time is limited, it does not spin on a contended lock; instead it
fails with a TDX_OPERAND_BUSY error carrying an operand id.

The TDX SEAMCALL API reference describes which resources each SEAMCALL
uses, so it is known which SEAMCALLs can contend on which resources. The
VMM can avoid contention inside the TDX module by serializing contentious
SEAMCALLs itself, for example with a spinlock. Because the OS knows its own
scheduling and scalability requirements better, a lock at the OS/VMM layer
generally works better than blindly retrying SEAMCALLs.

The TDH.MEM.* SEAMCALLs, except TDH.MEM.TRACK, operate on the secure EPT
tree, and the TDX module internally takes the secure EPT tree lock. They
return TDX_OPERAND_BUSY | TDX_OPERAND_ID_SEPT when the lock cannot be
acquired. TDX KVM allows the S-EPT callbacks to return an error so that the
TDP MMU layer can retry. Retry the TDH.MEM.* SEAMCALLs on this error,
because it is a rare event caused by the zero-step attack mitigation.

Signed-off-by: Yuan Yao
Signed-off-by: Isaku Yamahata
Signed-off-by: Rick Edgecombe
---
TDX MMU part 2 v1:
 - Updates from seamcall overhaul (Kai)

v19:
 - fix typo TDG.VP.ENTER => TDH.VP.ENTER, TDX_OPRRAN_BUSY => TDX_OPERAND_BUSY
 - drop the description on TDH.VP.ENTER as this patch doesn't touch
   TDH.VP.ENTER
---
 arch/x86/kvm/vmx/tdx_ops.h | 48 ++++++++++++++++++++++++++++++++------
 1 file changed, 41 insertions(+), 7 deletions(-)

diff --git a/arch/x86/kvm/vmx/tdx_ops.h b/arch/x86/kvm/vmx/tdx_ops.h
index 0363d8544f42..8ca3e252a6ed 100644
--- a/arch/x86/kvm/vmx/tdx_ops.h
+++ b/arch/x86/kvm/vmx/tdx_ops.h
@@ -31,6 +31,40 @@
 #define pr_tdx_error_3(__fn, __err, __rcx, __rdx, __r8) \
 	pr_tdx_error_N(__fn, __err, "rcx 0x%llx, rdx 0x%llx, r8 0x%llx\n", __rcx, __rdx, __r8)
 
+/*
+ * TDX module acquires its internal lock for resources. It doesn't spin to get
+ * locks because of its restrictions of allowed execution time. Instead, it
+ * returns TDX_OPERAND_BUSY with an operand id.
+ *
+ * Multiple VCPUs can operate on SEPT. Also with zero-step attack mitigation,
+ * TDH.VP.ENTER may rarely acquire SEPT lock and release it when zero-step
+ * attack is suspected. It results in TDX_OPERAND_BUSY | TDX_OPERAND_ID_SEPT
+ * with TDH.MEM.* operation. Note: TDH.MEM.TRACK is an exception.
+ *
+ * Because TDP MMU uses read lock for scalability, spin lock around SEAMCALL
+ * spoils TDP MMU effort. Retry several times with the assumption that SEPT
+ * lock contention is rare. But don't loop forever to avoid lockup. Let TDP
+ * MMU retry.
+ */
+#define TDX_ERROR_SEPT_BUSY	(TDX_OPERAND_BUSY | TDX_OPERAND_ID_SEPT)
+
+static inline u64 tdx_seamcall_sept(u64 op, struct tdx_module_args *in)
+{
+#define SEAMCALL_RETRY_MAX	16
+	struct tdx_module_args args_in;
+	int retry = SEAMCALL_RETRY_MAX;
+	u64 ret;
+
+	do {
+		args_in = *in;
+		ret = seamcall_ret(op, in);
+	} while (ret == TDX_ERROR_SEPT_BUSY && retry-- > 0);
+
+	*in = args_in;
+
+	return ret;
+}
+
 static inline u64 tdh_mng_addcx(struct kvm_tdx *kvm_tdx, hpa_t addr)
 {
 	struct tdx_module_args in = {
@@ -55,7 +89,7 @@ static inline u64 tdh_mem_page_add(struct kvm_tdx *kvm_tdx, gpa_t gpa,
 	u64 ret;
 
 	clflush_cache_range(__va(hpa), PAGE_SIZE);
-	ret = seamcall_ret(TDH_MEM_PAGE_ADD, &in);
+	ret = tdx_seamcall_sept(TDH_MEM_PAGE_ADD, &in);
 
 	*rcx = in.rcx;
 	*rdx = in.rdx;
@@ -76,7 +110,7 @@ static inline u64 tdh_mem_sept_add(struct kvm_tdx *kvm_tdx, gpa_t gpa,
 
 	clflush_cache_range(__va(page), PAGE_SIZE);
 
-	ret = seamcall_ret(TDH_MEM_SEPT_ADD, &in);
+	ret = tdx_seamcall_sept(TDH_MEM_SEPT_ADD, &in);
 
 	*rcx = in.rcx;
 	*rdx = in.rdx;
@@ -93,7 +127,7 @@ static inline u64 tdh_mem_sept_remove(struct kvm_tdx *kvm_tdx, gpa_t gpa,
 	};
 	u64 ret;
 
-	ret = seamcall_ret(TDH_MEM_SEPT_REMOVE, &in);
+	ret = tdx_seamcall_sept(TDH_MEM_SEPT_REMOVE, &in);
 
 	*rcx = in.rcx;
 	*rdx = in.rdx;
@@ -123,7 +157,7 @@ static inline u64 tdh_mem_page_aug(struct kvm_tdx *kvm_tdx, gpa_t gpa, hpa_t hpa
 	u64 ret;
 
 	clflush_cache_range(__va(hpa), PAGE_SIZE);
-	ret = seamcall_ret(TDH_MEM_PAGE_AUG, &in);
+	ret = tdx_seamcall_sept(TDH_MEM_PAGE_AUG, &in);
 
 	*rcx = in.rcx;
 	*rdx = in.rdx;
@@ -140,7 +174,7 @@ static inline u64 tdh_mem_range_block(struct kvm_tdx *kvm_tdx, gpa_t gpa,
 	};
 	u64 ret;
 
-	ret = seamcall_ret(TDH_MEM_RANGE_BLOCK, &in);
+	ret = tdx_seamcall_sept(TDH_MEM_RANGE_BLOCK, &in);
 
 	*rcx = in.rcx;
 	*rdx = in.rdx;
@@ -335,7 +369,7 @@ static inline u64 tdh_mem_page_remove(struct kvm_tdx *kvm_tdx, gpa_t gpa,
 	};
 	u64 ret;
 
-	ret = seamcall_ret(TDH_MEM_PAGE_REMOVE, &in);
+	ret = tdx_seamcall_sept(TDH_MEM_PAGE_REMOVE, &in);
 
 	*rcx = in.rcx;
 	*rdx = in.rdx;
@@ -361,7 +395,7 @@ static inline u64 tdh_mem_range_unblock(struct kvm_tdx *kvm_tdx, gpa_t gpa,
 	};
 	u64 ret;
 
-	ret = seamcall_ret(TDH_MEM_RANGE_UNBLOCK, &in);
+	ret = tdx_seamcall_sept(TDH_MEM_RANGE_UNBLOCK, &in);
 
 	*rcx = in.rcx;
 	*rdx = in.rdx;

From patchwork Wed Sep 4 03:07:40 2024
From: Rick Edgecombe
To: seanjc@google.com, pbonzini@redhat.com, kvm@vger.kernel.org
Cc: kai.huang@intel.com, dmatlack@google.com, isaku.yamahata@gmail.com,
    yan.y.zhao@intel.com, nik.borisov@suse.com, rick.p.edgecombe@intel.com,
    linux-kernel@vger.kernel.org
Subject: [PATCH 10/21] KVM: TDX: Require TDP MMU and mmio caching for TDX
Date: Tue, 3 Sep 2024 20:07:40 -0700
Message-Id: <20240904030751.117579-11-rick.p.edgecombe@intel.com>

From: Isaku Yamahata

Disable TDX support when the TDP MMU or MMIO caching is not supported. The
TDP MMU has become the mainstream MMU, so legacy (shadow) MMU support for
TDX is not implemented.

TDX requires KVM's MMIO caching. Without MMIO caching, KVM falls back to
MMIO emulation without installing SPTEs for MMIOs. However, a TDX guest is
protected, so KVM would hit errors when trying to decode guest instructions
to emulate MMIO for it. A TDX guest therefore relies on MMIO SPTEs being
installed with no RWX bits and with the suppress-VE bit cleared, so that a
#VE is injected into the guest. The guest's #VE handler then issues a
TDVMCALL to perform instruction decoding and have the host do the MMIO
emulation.
Signed-off-by: Isaku Yamahata
Co-developed-by: Yan Zhao
Signed-off-by: Yan Zhao
Signed-off-by: Rick Edgecombe
Reviewed-by: Paolo Bonzini
---
TDX MMU part 2 v1:
 - Addressed Binbin's comment by massaging Isaku's updated comments and
   adding more explanations about introducing mmio caching.
 - Addressed Sean's v19 comments per Isaku's update, but kept the warning
   for MOVDIR64B.
 - Move code change in tdx_hardware_setup() to __tdx_bringup() since the
   former has been removed.
---
 arch/x86/kvm/mmu/mmu.c  | 1 +
 arch/x86/kvm/vmx/main.c | 1 +
 arch/x86/kvm/vmx/tdx.c  | 8 +++-----
 3 files changed, 5 insertions(+), 5 deletions(-)

diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c
index 01808cdf8627..d26b235d8f84 100644
--- a/arch/x86/kvm/mmu/mmu.c
+++ b/arch/x86/kvm/mmu/mmu.c
@@ -110,6 +110,7 @@ static bool __ro_after_init tdp_mmu_allowed;
 #ifdef CONFIG_X86_64
 bool __read_mostly tdp_mmu_enabled = true;
 module_param_named(tdp_mmu, tdp_mmu_enabled, bool, 0444);
+EXPORT_SYMBOL_GPL(tdp_mmu_enabled);
 #endif
 
 static int max_huge_page_level __read_mostly;
diff --git a/arch/x86/kvm/vmx/main.c b/arch/x86/kvm/vmx/main.c
index c9dfa3aa866c..2cc29d0fc279 100644
--- a/arch/x86/kvm/vmx/main.c
+++ b/arch/x86/kvm/vmx/main.c
@@ -3,6 +3,7 @@
 
 #include "x86_ops.h"
 #include "vmx.h"
+#include "mmu.h"
 #include "nested.h"
 #include "pmu.h"
 #include "posted_intr.h"
diff --git a/arch/x86/kvm/vmx/tdx.c b/arch/x86/kvm/vmx/tdx.c
index 25c24901061b..0c08062ef99f 100644
--- a/arch/x86/kvm/vmx/tdx.c
+++ b/arch/x86/kvm/vmx/tdx.c
@@ -1474,16 +1474,14 @@ static int __init __tdx_bringup(void)
 	const struct tdx_sys_info_td_conf *td_conf;
 	int r;
 
+	if (!tdp_mmu_enabled || !enable_mmio_caching)
+		return -EOPNOTSUPP;
+
 	if (!cpu_feature_enabled(X86_FEATURE_MOVDIR64B)) {
 		pr_warn("MOVDIR64B is reqiured for TDX\n");
 		return -EOPNOTSUPP;
 	}
 
-	if (!enable_ept) {
-		pr_err("Cannot enable TDX with EPT disabled.\n");
-		return -EINVAL;
-	}
-
 	/*
 	 * Enabling TDX requires enabling hardware virtualization first,
 	 * as making SEAMCALLs requires CPU being in post-VMXON state.
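Note: to make the required MMIO SPTE shape concrete, here is a minimal
sketch (not code from this series; the helper name is made up, while the
mask names mirror the arch/x86 VMX definitions) of the condition under
which a TD's MMIO access raises #VE:

static inline bool mmio_spte_injects_ve(u64 spte)
{
	/*
	 * RWX all clear: the access takes an EPT violation rather than
	 * an EPT misconfig. Suppress-VE (bit 63) also clear: the
	 * violation is delivered to the guest as #VE.
	 */
	return !(spte & VMX_EPT_RWX_MASK) &&
	       !(spte & VMX_EPT_SUPPRESS_VE_BIT);
}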
From patchwork Wed Sep 4 03:07:41 2024
From: Rick Edgecombe
To: seanjc@google.com, pbonzini@redhat.com, kvm@vger.kernel.org
Cc: kai.huang@intel.com, dmatlack@google.com, isaku.yamahata@gmail.com,
    yan.y.zhao@intel.com, nik.borisov@suse.com, rick.p.edgecombe@intel.com,
    linux-kernel@vger.kernel.org
Subject: [PATCH 11/21] KVM: x86/mmu: Add setter for shadow_mmio_value
Date: Tue, 3 Sep 2024 20:07:41 -0700
Message-Id: <20240904030751.117579-12-rick.p.edgecombe@intel.com>

From: Isaku Yamahata

Future changes will want to set shadow_mmio_value from TDX code. Add a
setter helper with a name that makes more sense in that context.

Signed-off-by: Isaku Yamahata
[split into new patch]
Signed-off-by: Rick Edgecombe
Reviewed-by: Paolo Bonzini
---
TDX MMU part 2 v1:
 - Split into new patch
---
 arch/x86/kvm/mmu.h      | 1 +
 arch/x86/kvm/mmu/spte.c | 6 ++++++
 2 files changed, 7 insertions(+)

diff --git a/arch/x86/kvm/mmu.h b/arch/x86/kvm/mmu.h
index 5faa416ac874..72035154a23a 100644
--- a/arch/x86/kvm/mmu.h
+++ b/arch/x86/kvm/mmu.h
@@ -78,6 +78,7 @@ static inline gfn_t kvm_mmu_max_gfn(void)
 u8 kvm_mmu_get_max_tdp_level(void);
 
 void kvm_mmu_set_mmio_spte_mask(u64 mmio_value, u64 mmio_mask, u64 access_mask);
+void kvm_mmu_set_mmio_spte_value(struct kvm *kvm, u64 mmio_value);
 void kvm_mmu_set_me_spte_mask(u64 me_value, u64 me_mask);
 void kvm_mmu_set_ept_masks(bool has_ad_bits, bool has_exec_only);
 
diff --git a/arch/x86/kvm/mmu/spte.c b/arch/x86/kvm/mmu/spte.c
index d4527965e48c..46a26be0245b 100644
--- a/arch/x86/kvm/mmu/spte.c
+++ b/arch/x86/kvm/mmu/spte.c
@@ -409,6 +409,12 @@ void kvm_mmu_set_mmio_spte_mask(u64 mmio_value, u64 mmio_mask, u64 access_mask)
 }
 EXPORT_SYMBOL_GPL(kvm_mmu_set_mmio_spte_mask);
 
+void kvm_mmu_set_mmio_spte_value(struct kvm *kvm, u64 mmio_value)
+{
+	kvm->arch.shadow_mmio_value = mmio_value;
+}
+EXPORT_SYMBOL_GPL(kvm_mmu_set_mmio_spte_value);
+
 void kvm_mmu_set_me_spte_mask(u64 me_value, u64 me_mask)
 {
 	/* shadow_me_value must be a subset of shadow_me_mask */

From patchwork Wed Sep 4 03:07:42 2024
From: Rick Edgecombe
To: seanjc@google.com, pbonzini@redhat.com, kvm@vger.kernel.org
Cc: kai.huang@intel.com, dmatlack@google.com, isaku.yamahata@gmail.com,
    yan.y.zhao@intel.com, nik.borisov@suse.com, rick.p.edgecombe@intel.com,
    linux-kernel@vger.kernel.org
Subject: [PATCH 12/21] KVM: TDX: Set per-VM shadow_mmio_value to 0
Date: Tue, 3 Sep 2024 20:07:42 -0700
Message-Id: <20240904030751.117579-13-rick.p.edgecombe@intel.com>

From: Isaku Yamahata

Set the per-VM shadow_mmio_value to 0 for TDX. With enable_mmio_caching on,
KVM installs MMIO SPTEs for TDs. To correctly configure MMIO SPTEs, TDX
requires the per-VM shadow_mmio_value to be set to 0. This is necessary to
override the default value of the suppress-VE bit in the SPTE, which is 1,
and to ensure value 0 in the RWX bits.

For an MMIO SPTE, the SPTE value changes as follows:
1. Initial value (suppress-VE bit is set)
2. Guest issues MMIO and triggers an EPT violation
3. KVM updates the SPTE value to the MMIO value (suppress-VE bit is cleared)
4. Guest MMIO resumes; it triggers a #VE exception in the guest TD
5. Guest #VE handler issues TDG.VP.VMCALL
6. KVM handles the MMIO
7. Guest #VE handler resumes execution after the MMIO instruction

Co-developed-by: Yan Zhao
Signed-off-by: Yan Zhao
Signed-off-by: Isaku Yamahata
Signed-off-by: Rick Edgecombe
Reviewed-by: Paolo Bonzini
---
TDX MMU part 2 v1:
 - Split from the big patch "KVM: TDX: TDP MMU TDX support".
 - Remove warning for shadow_mmio_value
---
 arch/x86/kvm/mmu/spte.c |  2 --
 arch/x86/kvm/vmx/tdx.c  | 15 ++++++++++++++-
 2 files changed, 14 insertions(+), 3 deletions(-)

diff --git a/arch/x86/kvm/mmu/spte.c b/arch/x86/kvm/mmu/spte.c
index 46a26be0245b..4ab6d2a87032 100644
--- a/arch/x86/kvm/mmu/spte.c
+++ b/arch/x86/kvm/mmu/spte.c
@@ -94,8 +94,6 @@ u64 make_mmio_spte(struct kvm_vcpu *vcpu, u64 gfn, unsigned int access)
 	u64 spte = generation_mmio_spte_mask(gen);
 	u64 gpa = gfn << PAGE_SHIFT;
 
-	WARN_ON_ONCE(!vcpu->kvm->arch.shadow_mmio_value);
-
 	access &= shadow_mmio_access_mask;
 	spte |= vcpu->kvm->arch.shadow_mmio_value | access;
 	spte |= gpa | shadow_nonpresent_or_rsvd_mask;
diff --git a/arch/x86/kvm/vmx/tdx.c b/arch/x86/kvm/vmx/tdx.c
index 0c08062ef99f..9da71782660f 100644
--- a/arch/x86/kvm/vmx/tdx.c
+++ b/arch/x86/kvm/vmx/tdx.c
@@ -6,7 +6,7 @@
 #include "mmu.h"
 #include "tdx.h"
 #include "tdx_ops.h"
-
+#include "mmu/spte.h"
 #undef pr_fmt
 #define pr_fmt(fmt) KBUILD_MODNAME ": " fmt
 
@@ -344,6 +344,19 @@ int tdx_vm_init(struct kvm *kvm)
 {
 	kvm->arch.has_private_mem = true;
 
+	/*
+	 * Because guest TD is protected, VMM can't parse the instruction in TD.
+	 * Instead, guest uses MMIO hypercall. For unmodified device driver,
+	 * #VE needs to be injected for MMIO and #VE handler in TD converts MMIO
+	 * instruction into MMIO hypercall.
+	 *
+	 * SPTE value for MMIO needs to be setup so that #VE is injected into
+	 * TD instead of triggering EPT MISCONFIG.
+	 * - RWX=0 so that EPT violation is triggered.
+	 * - suppress #VE bit is cleared to inject #VE.
+	 */
+	kvm_mmu_set_mmio_spte_value(kvm, 0);
+
 	/*
 	 * This function initializes only KVM software construct. It doesn't
 	 * initialize TDX stuff, e.g. TDCS, TDR, TDCX, HKID etc.
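Note: a worked sketch of why the value 0 yields exactly the shape the TD
needs (simplified from make_mmio_spte() above; the generation mask and
shadow_nonpresent_or_rsvd_mask terms are omitted, and the MMIO access mask
is taken to be zero with EPT):

	/*
	 * spte = shadow_mmio_value | (access & shadow_mmio_access_mask) | gpa
	 *      = 0                 | 0                                  | gpa
	 *
	 * With shadow_mmio_value == 0, RWX (bits 2:0, which are zero for
	 * a page-aligned GPA) and the suppress-VE bit (bit 63) stay
	 * clear, so the guest's MMIO access faults with #VE instead of
	 * hitting an EPT misconfig.
	 */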
From patchwork Wed Sep 4 03:07:43 2024
From: Rick Edgecombe
To: seanjc@google.com, pbonzini@redhat.com, kvm@vger.kernel.org
Cc: kai.huang@intel.com, dmatlack@google.com, isaku.yamahata@gmail.com,
    yan.y.zhao@intel.com, nik.borisov@suse.com, rick.p.edgecombe@intel.com,
    linux-kernel@vger.kernel.org
Subject: [PATCH 13/21] KVM: TDX: Handle TLB tracking for TDX
Date: Tue, 3 Sep 2024 20:07:43 -0700
Message-Id: <20240904030751.117579-14-rick.p.edgecombe@intel.com>

From: Isaku Yamahata

Handle TLB tracking for TDX by introducing the function tdx_track() for
private memory TLB tracking, and by implementing the flush_tlb* hooks to
flush TLBs for shared memory.

Introduce tdx_track() to do TLB tracking on private memory. It basically
does two things: call TDH.MEM.TRACK to increase the TD epoch, and kick off
all vCPUs. The private EPT will then be flushed when each vCPU re-enters
the TD. The function is temporarily unused in this patch; a later patch
will call it on a page-by-page basis on removal of a private guest page.

In earlier revisions, tdx_track() relied on an atomic counter to coordinate
kicking off vCPUs, incrementing the TD epoch, and the vCPUs waiting for the
incremented TD epoch after being kicked off. However, the core MMU only
needs to call tdx_track() while already holding mmu_lock for write, so this
synchronization is unneeded. vCPUs are kicked off only after the successful
execution of TDH.MEM.TRACK, eliminating the need for them to wait for
TDH.MEM.TRACK completion after being kicked. tdx_track() can therefore send
KVM_REQ_OUTSIDE_GUEST_MODE requests rather than KVM_REQ_TLB_FLUSH.

Hooks for flush_remote_tlb and flush_remote_tlbs_range are not necessary
for TDX, as tdx_track() handles TLB tracking of private memory on a
page-by-page basis when private guest pages are removed. There is no need
to invoke tdx_track() again in kvm_flush_remote_tlbs(), even after changes
to the mirrored page table.

For the hooks flush_tlb_current and flush_tlb_all, which are invoked during
kvm_mmu_load() and vCPU load for normal VMs, have the VMM flush all EPTs in
the two hooks for simplicity, since TDX does not depend on them to notify
the TDX module to flush the private EPT in those cases.

Co-developed-by: Yan Zhao
Signed-off-by: Yan Zhao
Signed-off-by: Isaku Yamahata
Signed-off-by: Rick Edgecombe
---
TDX MMU part 2 v1:
 - Split from the big patch "KVM: TDX: TDP MMU TDX support".
 - Modification of synchronization mechanism in tdx_track().
 - Dropped hooks flush_remote_tlb and flush_remote_tlbs_range.
 - Let the VMM flush all EPTs in hooks flush_tlb_all and flush_tlb_current.
 - Dropped KVM_BUG_ON() in vt_flush_tlb_gva().
(Rick) --- arch/x86/kvm/vmx/main.c | 52 ++++++++++++++++++++++++++++++++--- arch/x86/kvm/vmx/tdx.c | 55 ++++++++++++++++++++++++++++++++++++++ arch/x86/kvm/vmx/x86_ops.h | 2 ++ 3 files changed, 105 insertions(+), 4 deletions(-) diff --git a/arch/x86/kvm/vmx/main.c b/arch/x86/kvm/vmx/main.c index 2cc29d0fc279..1c86849680a3 100644 --- a/arch/x86/kvm/vmx/main.c +++ b/arch/x86/kvm/vmx/main.c @@ -101,6 +101,50 @@ static void vt_vcpu_reset(struct kvm_vcpu *vcpu, bool init_event) vmx_vcpu_reset(vcpu, init_event); } +static void vt_flush_tlb_all(struct kvm_vcpu *vcpu) +{ + /* + * TDX calls tdx_track() in tdx_sept_remove_private_spte() to ensure + * private EPT will be flushed on the next TD enter. + * No need to call tdx_track() here again even when this callback is as + * a result of zapping private EPT. + * Just invoke invept() directly here to work for both shared EPT and + * private EPT. + */ + if (is_td_vcpu(vcpu)) { + ept_sync_global(); + return; + } + + vmx_flush_tlb_all(vcpu); +} + +static void vt_flush_tlb_current(struct kvm_vcpu *vcpu) +{ + if (is_td_vcpu(vcpu)) { + tdx_flush_tlb_current(vcpu); + return; + } + + vmx_flush_tlb_current(vcpu); +} + +static void vt_flush_tlb_gva(struct kvm_vcpu *vcpu, gva_t addr) +{ + if (is_td_vcpu(vcpu)) + return; + + vmx_flush_tlb_gva(vcpu, addr); +} + +static void vt_flush_tlb_guest(struct kvm_vcpu *vcpu) +{ + if (is_td_vcpu(vcpu)) + return; + + vmx_flush_tlb_guest(vcpu); +} + static void vt_load_mmu_pgd(struct kvm_vcpu *vcpu, hpa_t root_hpa, int pgd_level) { @@ -190,10 +234,10 @@ struct kvm_x86_ops vt_x86_ops __initdata = { .set_rflags = vmx_set_rflags, .get_if_flag = vmx_get_if_flag, - .flush_tlb_all = vmx_flush_tlb_all, - .flush_tlb_current = vmx_flush_tlb_current, - .flush_tlb_gva = vmx_flush_tlb_gva, - .flush_tlb_guest = vmx_flush_tlb_guest, + .flush_tlb_all = vt_flush_tlb_all, + .flush_tlb_current = vt_flush_tlb_current, + .flush_tlb_gva = vt_flush_tlb_gva, + .flush_tlb_guest = vt_flush_tlb_guest, .vcpu_pre_run = vmx_vcpu_pre_run, .vcpu_run = vmx_vcpu_run, diff --git a/arch/x86/kvm/vmx/tdx.c b/arch/x86/kvm/vmx/tdx.c index 9da71782660f..6feb3ab96926 100644 --- a/arch/x86/kvm/vmx/tdx.c +++ b/arch/x86/kvm/vmx/tdx.c @@ -6,6 +6,7 @@ #include "mmu.h" #include "tdx.h" #include "tdx_ops.h" +#include "vmx.h" #include "mmu/spte.h" #undef pr_fmt @@ -446,6 +447,51 @@ void tdx_load_mmu_pgd(struct kvm_vcpu *vcpu, hpa_t root_hpa, int pgd_level) td_vmcs_write64(to_tdx(vcpu), SHARED_EPT_POINTER, root_hpa); } +/* + * Ensure shared and private EPTs to be flushed on all vCPUs. + * tdh_mem_track() is the only caller that increases TD epoch. An increase in + * the TD epoch (e.g., to value "N + 1") is successful only if no vCPUs are + * running in guest mode with the value "N - 1". + * + * A successful execution of tdh_mem_track() ensures that vCPUs can only run in + * guest mode with TD epoch value "N" if no TD exit occurs after the TD epoch + * being increased to "N + 1". + * + * Kicking off all vCPUs after that further results in no vCPUs can run in guest + * mode with TD epoch value "N", which unblocks the next tdh_mem_track() (e.g. + * to increase TD epoch to "N + 2"). + * + * TDX module will flush EPT on the next TD enter and make vCPUs to run in + * guest mode with TD epoch value "N + 1". + * + * kvm_make_all_cpus_request() guarantees all vCPUs are out of guest mode by + * waiting empty IPI handler ack_kick(). 
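+ *
+ * For illustration, assuming the TD epoch starts at "N" (a sketch of the
+ * sequence described above, added for clarity):
+ *   1. Under write mmu_lock, TDH.MEM.TRACK advances the epoch to "N + 1".
+ *   2. All vCPUs are kicked and leave guest mode.
+ *   3. On the next TD enter, the TDX module flushes the private EPT
+ *      translations and the vCPU runs with epoch "N + 1".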
+ * + * No action is required to the vCPUs being kicked off since the kicking off + * occurs certainly after TD epoch increment and before the next + * tdh_mem_track(). + */ +static void __always_unused tdx_track(struct kvm *kvm) +{ + struct kvm_tdx *kvm_tdx = to_kvm_tdx(kvm); + u64 err; + + /* If TD isn't finalized, it's before any vcpu running. */ + if (unlikely(!is_td_finalized(kvm_tdx))) + return; + + lockdep_assert_held_write(&kvm->mmu_lock); + + do { + err = tdh_mem_track(kvm_tdx); + } while (unlikely((err & TDX_SEAMCALL_STATUS_MASK) == TDX_OPERAND_BUSY)); + + if (KVM_BUG_ON(err, kvm)) + pr_tdx_error(TDH_MEM_TRACK, err); + + kvm_make_all_cpus_request(kvm, KVM_REQ_OUTSIDE_GUEST_MODE); +} + static int tdx_get_capabilities(struct kvm_tdx_cmd *cmd) { const struct tdx_sys_info_td_conf *td_conf = &tdx_sysinfo->td_conf; @@ -947,6 +993,15 @@ static int tdx_td_init(struct kvm *kvm, struct kvm_tdx_cmd *cmd) return ret; } +void tdx_flush_tlb_current(struct kvm_vcpu *vcpu) +{ + /* + * flush_tlb_current() is used only the first time for the vcpu to run. + * As it isn't performance critical, keep this function simple. + */ + ept_sync_global(); +} + int tdx_vm_ioctl(struct kvm *kvm, void __user *argp) { struct kvm_tdx_cmd tdx_cmd; diff --git a/arch/x86/kvm/vmx/x86_ops.h b/arch/x86/kvm/vmx/x86_ops.h index dcf2b36efbb9..28fda93f0b27 100644 --- a/arch/x86/kvm/vmx/x86_ops.h +++ b/arch/x86/kvm/vmx/x86_ops.h @@ -131,6 +131,7 @@ void tdx_vcpu_reset(struct kvm_vcpu *vcpu, bool init_event); int tdx_vcpu_ioctl(struct kvm_vcpu *vcpu, void __user *argp); +void tdx_flush_tlb_current(struct kvm_vcpu *vcpu); void tdx_load_mmu_pgd(struct kvm_vcpu *vcpu, hpa_t root_hpa, int root_level); #else static inline int tdx_vm_init(struct kvm *kvm) { return -EOPNOTSUPP; } @@ -145,6 +146,7 @@ static inline void tdx_vcpu_reset(struct kvm_vcpu *vcpu, bool init_event) {} static inline int tdx_vcpu_ioctl(struct kvm_vcpu *vcpu, void __user *argp) { return -EOPNOTSUPP; } +static inline void tdx_flush_tlb_current(struct kvm_vcpu *vcpu) {} static inline void tdx_load_mmu_pgd(struct kvm_vcpu *vcpu, hpa_t root_hpa, int root_level) {} #endif From patchwork Wed Sep 4 03:07:44 2024 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Rick Edgecombe X-Patchwork-Id: 13789646 Received: from mgamail.intel.com (mgamail.intel.com [192.198.163.18]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 62B8D57CA7; Wed, 4 Sep 2024 03:14:40 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=192.198.163.18 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1725419683; cv=none; b=vAfCqx4ZJNASldh2pEkAy5athryevZPESMe2kv/fH94R8uRu38IN2Y9ZRzIlnRQnPXDi3C3WvUJQ4VHLrt/SVXLg4+79HBuQW99Ngi3psR48B6F6qLPaN8w2lQl+LXiBi+ZfIz1dM+TNi8EqdiYGQUB0/He8NL1XUDU13Ulbx6w= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1725419683; c=relaxed/simple; bh=dVkl2Ju9fcT/LP1iKG9XPwmDZFKTu0VPLIUzUyrrd+w=; h=From:To:Cc:Subject:Date:Message-Id:In-Reply-To:References: MIME-Version; b=THnBOh6pEajarilk2WC0yFnLYxXExScq+jEKPcbu2oROFonNpMGh0TEHOvsKNgIgLFKJ04zE39DLF9kqXWHW4BJBf3lanzCCr1LDQiX7oxM7jB7imq/StgvahJZpkHXd40nIvlS/ToWNVMcahc90llO1LRi8Rrxy/yCSXqHfPao= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=intel.com; spf=pass smtp.mailfrom=intel.com; dkim=pass (2048-bit key) 
From: Rick Edgecombe
To: seanjc@google.com, pbonzini@redhat.com, kvm@vger.kernel.org
Cc: kai.huang@intel.com, dmatlack@google.com, isaku.yamahata@gmail.com,
    yan.y.zhao@intel.com, nik.borisov@suse.com, rick.p.edgecombe@intel.com,
    linux-kernel@vger.kernel.org
Subject: [PATCH 14/21] KVM: TDX: Implement hooks to propagate changes of TDP MMU mirror page table
Date: Tue, 3 Sep 2024 20:07:44 -0700
Message-Id: <20240904030751.117579-15-rick.p.edgecombe@intel.com>

From: Isaku Yamahata

Implement hooks in TDX to propagate changes of the mirror page table to the
private EPT, including the addition/removal of page table pages and of
guest pages. TDX invokes the corresponding SEAMCALLs in the hooks.

- Hook link_external_spt propagates the addition of a page table page into
  the private EPT.
- Hook set_external_spte: tdx_sept_set_private_spte() in this patch only
  handles adding a guest private page when the TD is finalized. Later
  patches will handle the case of adding guest private pages before TD
  finalization.
- Hook free_external_spt is invoked when a page table page is removed from
  the mirror page table, which currently must occur at the TD tear down
  phase, after the hkid is freed.
- Hook remove_external_spte is invoked when a guest private page is removed
  from the mirror page table, which can occur while the TD is active, e.g.
  during shared <-> private conversion and slot move/deletion.
This hook is ensured to be triggered before hkid is freed, because gmem fd is released along with all private leaf mappings zapped before freeing hkid at VM destroy. TDX invokes below SEAMCALLs sequentially: 1) TDH.MEM.RANGE.BLOCK (remove RWX bits from a private EPT entry), 2) TDH.MEM.TRACK (increases TD epoch) 3) TDH.MEM.PAGE.REMOVE (remove the private EPT entry and untrack the guest page). TDH.MEM.PAGE.REMOVE can't succeed without TDH.MEM.RANGE.BLOCK and TDH.MEM.TRACK being called successfully. SEAMCALL TDH.MEM.TRACK is called in function tdx_track() to enforce that TLB tracking will be performed by TDX module for private EPT. Co-developed-by: Yan Zhao Signed-off-by: Yan Zhao Signed-off-by: Isaku Yamahata Signed-off-by: Rick Edgecombe --- TDX MMU part 2 v1: - Split from the big patch "KVM: TDX: TDP MMU TDX support". - Move setting up the 4 callbacks (kvm_x86_ops::link_external_spt etc) from tdx_hardware_setup() (which doesn't exist anymore) to vt_hardware_setup() directly. Make tdx_sept_link_external_spt() those 4 callbacks global and add declarations to x86_ops.h so they can be setup in vt_hardware_setup(). - Updated the KVM_BUG_ON() in tdx_sept_free_private_spt(). (Isaku, Binbin) - Removed the unused tdx_post_mmu_map_page(). - Removed WARN_ON_ONCE) in tdh_mem_page_aug() according to Isaku's feedback: "This WARN_ON_ONCE() is a guard for buggy TDX module. It shouldn't return (TDX_EPT_ENTRY_STATE_INCORRECT | TDX_OPERAND_ID_RCX)) when SEPT_VE_DISABLED cleared. Maybe we should remove this WARN_ON_ONCE() because the TDX module is mature." - Update for the wrapper functions for SEAMCALLs. (Sean) - Add preparation for KVM_TDX_INIT_MEM_REGION to make tdx_sept_set_private_spte() callback nop when the guest isn't finalized. - use unlikely(err) in tdx_reclaim_td_page(). - Updates from seamcall overhaul (Kai) - Move header definitions from "KVM: TDX: Define TDX architectural definitions" (Sean) - Drop ugly unions (Sean) - Remove tdx_mng_key_config_lock cleanup after dropped in "KVM: TDX: create/destroy VM structure" (Chao) - Since HKID is freed on vm_destroy() zapping only happens when HKID is allocated. Remove relevant code in zapping handlers that assume the opposite, and add some KVM_BUG_ON() to assert this where it was missing. (Isaku) --- arch/x86/kvm/vmx/main.c | 14 ++- arch/x86/kvm/vmx/tdx.c | 222 +++++++++++++++++++++++++++++++++++- arch/x86/kvm/vmx/tdx_arch.h | 23 ++++ arch/x86/kvm/vmx/tdx_ops.h | 6 + arch/x86/kvm/vmx/x86_ops.h | 37 ++++++ 5 files changed, 300 insertions(+), 2 deletions(-) diff --git a/arch/x86/kvm/vmx/main.c b/arch/x86/kvm/vmx/main.c index 1c86849680a3..bf6fd5cca1d6 100644 --- a/arch/x86/kvm/vmx/main.c +++ b/arch/x86/kvm/vmx/main.c @@ -36,9 +36,21 @@ static __init int vt_hardware_setup(void) * is KVM may allocate couple of more bytes than needed for * each VM. */ - if (enable_tdx) + if (enable_tdx) { vt_x86_ops.vm_size = max_t(unsigned int, vt_x86_ops.vm_size, sizeof(struct kvm_tdx)); + /* + * Note, TDX may fail to initialize in a later time in + * vt_init(), in which case it is not necessary to setup + * those callbacks. But making them valid here even + * when TDX fails to init later is fine because those + * callbacks won't be called if the VM isn't TDX guest. 
+ */ + vt_x86_ops.link_external_spt = tdx_sept_link_private_spt; + vt_x86_ops.set_external_spte = tdx_sept_set_private_spte; + vt_x86_ops.free_external_spt = tdx_sept_free_private_spt; + vt_x86_ops.remove_external_spte = tdx_sept_remove_private_spte; + } return 0; } diff --git a/arch/x86/kvm/vmx/tdx.c b/arch/x86/kvm/vmx/tdx.c index 6feb3ab96926..b8cd5a629a80 100644 --- a/arch/x86/kvm/vmx/tdx.c +++ b/arch/x86/kvm/vmx/tdx.c @@ -447,6 +447,177 @@ void tdx_load_mmu_pgd(struct kvm_vcpu *vcpu, hpa_t root_hpa, int pgd_level) td_vmcs_write64(to_tdx(vcpu), SHARED_EPT_POINTER, root_hpa); } +static void tdx_unpin(struct kvm *kvm, kvm_pfn_t pfn) +{ + struct page *page = pfn_to_page(pfn); + + put_page(page); +} + +static int tdx_mem_page_aug(struct kvm *kvm, gfn_t gfn, + enum pg_level level, kvm_pfn_t pfn) +{ + int tdx_level = pg_level_to_tdx_sept_level(level); + struct kvm_tdx *kvm_tdx = to_kvm_tdx(kvm); + hpa_t hpa = pfn_to_hpa(pfn); + gpa_t gpa = gfn_to_gpa(gfn); + u64 entry, level_state; + u64 err; + + err = tdh_mem_page_aug(kvm_tdx, gpa, hpa, &entry, &level_state); + if (unlikely(err == TDX_ERROR_SEPT_BUSY)) { + tdx_unpin(kvm, pfn); + return -EAGAIN; + } + if (unlikely(err == (TDX_EPT_ENTRY_STATE_INCORRECT | TDX_OPERAND_ID_RCX))) { + if (tdx_get_sept_level(level_state) == tdx_level && + tdx_get_sept_state(level_state) == TDX_SEPT_PENDING && + is_last_spte(entry, level) && + spte_to_pfn(entry) == pfn && + entry & VMX_EPT_SUPPRESS_VE_BIT) { + tdx_unpin(kvm, pfn); + return -EAGAIN; + } + } + if (KVM_BUG_ON(err, kvm)) { + pr_tdx_error_2(TDH_MEM_PAGE_AUG, err, entry, level_state); + tdx_unpin(kvm, pfn); + return -EIO; + } + + return 0; +} + +int tdx_sept_set_private_spte(struct kvm *kvm, gfn_t gfn, + enum pg_level level, kvm_pfn_t pfn) +{ + struct kvm_tdx *kvm_tdx = to_kvm_tdx(kvm); + + /* TODO: handle large pages. */ + if (KVM_BUG_ON(level != PG_LEVEL_4K, kvm)) + return -EINVAL; + + /* + * Because guest_memfd doesn't support page migration with + * a_ops->migrate_folio (yet), no callback is triggered for KVM on page + * migration. Until guest_memfd supports page migration, prevent page + * migration. + * TODO: Once guest_memfd introduces callback on page migration, + * implement it and remove get_page/put_page(). + */ + get_page(pfn_to_page(pfn)); + + if (likely(is_td_finalized(kvm_tdx))) + return tdx_mem_page_aug(kvm, gfn, level, pfn); + + /* + * TODO: KVM_MAP_MEMORY support to populate before finalize comes + * here for the initial memory. + */ + return 0; +} + +static int tdx_sept_drop_private_spte(struct kvm *kvm, gfn_t gfn, + enum pg_level level, kvm_pfn_t pfn) +{ + int tdx_level = pg_level_to_tdx_sept_level(level); + struct kvm_tdx *kvm_tdx = to_kvm_tdx(kvm); + gpa_t gpa = gfn_to_gpa(gfn); + hpa_t hpa = pfn_to_hpa(pfn); + hpa_t hpa_with_hkid; + u64 err, entry, level_state; + + /* TODO: handle large pages. */ + if (KVM_BUG_ON(level != PG_LEVEL_4K, kvm)) + return -EINVAL; + + if (KVM_BUG_ON(!is_hkid_assigned(kvm_tdx), kvm)) + return -EINVAL; + + do { + /* + * When zapping private page, write lock is held. So no race + * condition with other vcpu sept operation. Race only with + * TDH.VP.ENTER. + */ + err = tdh_mem_page_remove(kvm_tdx, gpa, tdx_level, &entry, + &level_state); + } while (unlikely(err == TDX_ERROR_SEPT_BUSY)); + if (unlikely(!is_td_finalized(kvm_tdx) && + err == (TDX_EPT_WALK_FAILED | TDX_OPERAND_ID_RCX))) { + /* + * This page was mapped with KVM_MAP_MEMORY, but + * KVM_TDX_INIT_MEM_REGION is not issued yet. 
+ */ + if (!is_last_spte(entry, level) || !(entry & VMX_EPT_RWX_MASK)) { + tdx_unpin(kvm, pfn); + return 0; + } + } + if (KVM_BUG_ON(err, kvm)) { + pr_tdx_error_2(TDH_MEM_PAGE_REMOVE, err, entry, level_state); + return -EIO; + } + + hpa_with_hkid = set_hkid_to_hpa(hpa, (u16)kvm_tdx->hkid); + do { + /* + * TDX_OPERAND_BUSY can happen on locking PAMT entry. Because + * this page was removed above, other thread shouldn't be + * repeatedly operating on this page. Just retry loop. + */ + err = tdh_phymem_page_wbinvd(hpa_with_hkid); + } while (unlikely(err == (TDX_OPERAND_BUSY | TDX_OPERAND_ID_RCX))); + if (KVM_BUG_ON(err, kvm)) { + pr_tdx_error(TDH_PHYMEM_PAGE_WBINVD, err); + return -EIO; + } + tdx_clear_page(hpa); + tdx_unpin(kvm, pfn); + return 0; +} + +int tdx_sept_link_private_spt(struct kvm *kvm, gfn_t gfn, + enum pg_level level, void *private_spt) +{ + int tdx_level = pg_level_to_tdx_sept_level(level); + gpa_t gpa = gfn_to_gpa(gfn); + hpa_t hpa = __pa(private_spt); + u64 err, entry, level_state; + + err = tdh_mem_sept_add(to_kvm_tdx(kvm), gpa, tdx_level, hpa, &entry, + &level_state); + if (unlikely(err == TDX_ERROR_SEPT_BUSY)) + return -EAGAIN; + if (KVM_BUG_ON(err, kvm)) { + pr_tdx_error_2(TDH_MEM_SEPT_ADD, err, entry, level_state); + return -EIO; + } + + return 0; +} + +static int tdx_sept_zap_private_spte(struct kvm *kvm, gfn_t gfn, + enum pg_level level) +{ + int tdx_level = pg_level_to_tdx_sept_level(level); + struct kvm_tdx *kvm_tdx = to_kvm_tdx(kvm); + gpa_t gpa = gfn_to_gpa(gfn) & KVM_HPAGE_MASK(level); + u64 err, entry, level_state; + + /* For now large page isn't supported yet. */ + WARN_ON_ONCE(level != PG_LEVEL_4K); + + err = tdh_mem_range_block(kvm_tdx, gpa, tdx_level, &entry, &level_state); + if (unlikely(err == TDX_ERROR_SEPT_BUSY)) + return -EAGAIN; + if (KVM_BUG_ON(err, kvm)) { + pr_tdx_error_2(TDH_MEM_RANGE_BLOCK, err, entry, level_state); + return -EIO; + } + return 0; +} + /* * Ensure shared and private EPTs to be flushed on all vCPUs. * tdh_mem_track() is the only caller that increases TD epoch. An increase in @@ -471,7 +642,7 @@ void tdx_load_mmu_pgd(struct kvm_vcpu *vcpu, hpa_t root_hpa, int pgd_level) * occurs certainly after TD epoch increment and before the next * tdh_mem_track(). */ -static void __always_unused tdx_track(struct kvm *kvm) +static void tdx_track(struct kvm *kvm) { struct kvm_tdx *kvm_tdx = to_kvm_tdx(kvm); u64 err; @@ -492,6 +663,55 @@ static void __always_unused tdx_track(struct kvm *kvm) kvm_make_all_cpus_request(kvm, KVM_REQ_OUTSIDE_GUEST_MODE); } +int tdx_sept_free_private_spt(struct kvm *kvm, gfn_t gfn, + enum pg_level level, void *private_spt) +{ + struct kvm_tdx *kvm_tdx = to_kvm_tdx(kvm); + + /* + * free_external_spt() is only called after hkid is freed when TD is + * tearing down. + * KVM doesn't (yet) zap page table pages in mirror page table while + * TD is active, though guest pages mapped in mirror page table could be + * zapped during TD is active, e.g. for shared <-> private conversion + * and slot move/deletion. + */ + if (KVM_BUG_ON(is_hkid_assigned(kvm_tdx), kvm)) + return -EINVAL; + + /* + * The HKID assigned to this TD was already freed and cache was + * already flushed. We don't have to flush again. + */ + return tdx_reclaim_page(__pa(private_spt)); +} + +int tdx_sept_remove_private_spte(struct kvm *kvm, gfn_t gfn, + enum pg_level level, kvm_pfn_t pfn) +{ + int ret; + + /* + * HKID is released when vm_free() which is after closing gmem_fd + * which causes gmem invalidation to zap all spte. 
+ * Population is only allowed after KVM_TDX_INIT_VM. + */ + if (KVM_BUG_ON(!is_hkid_assigned(to_kvm_tdx(kvm)), kvm)) + return -EINVAL; + + ret = tdx_sept_zap_private_spte(kvm, gfn, level); + if (ret) + return ret; + + /* + * TDX requires TLB tracking before dropping private page. Do + * it here, although it is also done later. + */ + tdx_track(kvm); + + return tdx_sept_drop_private_spte(kvm, gfn, level, pfn); +} + static int tdx_get_capabilities(struct kvm_tdx_cmd *cmd) { const struct tdx_sys_info_td_conf *td_conf = &tdx_sysinfo->td_conf; diff --git a/arch/x86/kvm/vmx/tdx_arch.h b/arch/x86/kvm/vmx/tdx_arch.h index 815e74408a34..634ed76db26a 100644 --- a/arch/x86/kvm/vmx/tdx_arch.h +++ b/arch/x86/kvm/vmx/tdx_arch.h @@ -155,6 +155,29 @@ struct td_params { #define TDX_MIN_TSC_FREQUENCY_KHZ (100 * 1000) #define TDX_MAX_TSC_FREQUENCY_KHZ (10 * 1000 * 1000) +/* Additional Secure EPT entry information */ +#define TDX_SEPT_LEVEL_MASK GENMASK_ULL(2, 0) +#define TDX_SEPT_STATE_MASK GENMASK_ULL(15, 8) +#define TDX_SEPT_STATE_SHIFT 8 + +enum tdx_sept_entry_state { + TDX_SEPT_FREE = 0, + TDX_SEPT_BLOCKED = 1, + TDX_SEPT_PENDING = 2, + TDX_SEPT_PENDING_BLOCKED = 3, + TDX_SEPT_PRESENT = 4, +}; + +static inline u8 tdx_get_sept_level(u64 sept_entry_info) +{ + return sept_entry_info & TDX_SEPT_LEVEL_MASK; +} + +static inline u8 tdx_get_sept_state(u64 sept_entry_info) +{ + return (sept_entry_info & TDX_SEPT_STATE_MASK) >> TDX_SEPT_STATE_SHIFT; +} + #define MD_FIELD_ID_FEATURES0_TOPOLOGY_ENUM BIT_ULL(20) /* diff --git a/arch/x86/kvm/vmx/tdx_ops.h b/arch/x86/kvm/vmx/tdx_ops.h index 8ca3e252a6ed..73ffd80223b0 100644 --- a/arch/x86/kvm/vmx/tdx_ops.h +++ b/arch/x86/kvm/vmx/tdx_ops.h @@ -31,6 +31,12 @@ #define pr_tdx_error_3(__fn, __err, __rcx, __rdx, __r8) \ pr_tdx_error_N(__fn, __err, "rcx 0x%llx, rdx 0x%llx, r8 0x%llx\n", __rcx, __rdx, __r8) +static inline int pg_level_to_tdx_sept_level(enum pg_level level) +{ + WARN_ON_ONCE(level == PG_LEVEL_NONE); + return level - 1; +} + /* * TDX module acquires its internal lock for resources. It doesn't spin to get * locks because of its restrictions of allowed execution time. 
Instead, it diff --git a/arch/x86/kvm/vmx/x86_ops.h b/arch/x86/kvm/vmx/x86_ops.h index 28fda93f0b27..d1db807b793a 100644 --- a/arch/x86/kvm/vmx/x86_ops.h +++ b/arch/x86/kvm/vmx/x86_ops.h @@ -131,6 +131,15 @@ void tdx_vcpu_reset(struct kvm_vcpu *vcpu, bool init_event); int tdx_vcpu_ioctl(struct kvm_vcpu *vcpu, void __user *argp); +int tdx_sept_link_private_spt(struct kvm *kvm, gfn_t gfn, + enum pg_level level, void *private_spt); +int tdx_sept_free_private_spt(struct kvm *kvm, gfn_t gfn, + enum pg_level level, void *private_spt); +int tdx_sept_set_private_spte(struct kvm *kvm, gfn_t gfn, + enum pg_level level, kvm_pfn_t pfn); +int tdx_sept_remove_private_spte(struct kvm *kvm, gfn_t gfn, + enum pg_level level, kvm_pfn_t pfn); + void tdx_flush_tlb_current(struct kvm_vcpu *vcpu); void tdx_load_mmu_pgd(struct kvm_vcpu *vcpu, hpa_t root_hpa, int root_level); #else @@ -146,6 +155,34 @@ static inline void tdx_vcpu_reset(struct kvm_vcpu *vcpu, bool init_event) {} static inline int tdx_vcpu_ioctl(struct kvm_vcpu *vcpu, void __user *argp) { return -EOPNOTSUPP; } +static inline int tdx_sept_link_private_spt(struct kvm *kvm, gfn_t gfn, + enum pg_level level, + void *private_spt) +{ + return -EOPNOTSUPP; +} + +static inline int tdx_sept_free_private_spt(struct kvm *kvm, gfn_t gfn, + enum pg_level level, + void *private_spt) +{ + return -EOPNOTSUPP; +} + +static inline int tdx_sept_set_private_spte(struct kvm *kvm, gfn_t gfn, + enum pg_level level, + kvm_pfn_t pfn) +{ + return -EOPNOTSUPP; +} + +static inline int tdx_sept_remove_private_spte(struct kvm *kvm, gfn_t gfn, + enum pg_level level, + kvm_pfn_t pfn) +{ + return -EOPNOTSUPP; +} + static inline void tdx_flush_tlb_current(struct kvm_vcpu *vcpu) {} static inline void tdx_load_mmu_pgd(struct kvm_vcpu *vcpu, hpa_t root_hpa, int root_level) {} #endif From patchwork Wed Sep 4 03:07:45 2024 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Rick Edgecombe X-Patchwork-Id: 13789648 Received: from mgamail.intel.com (mgamail.intel.com [192.198.163.18]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 7C5C0139CFC; Wed, 4 Sep 2024 03:14:42 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=192.198.163.18 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1725419684; cv=none; b=sN5lkjCKT/EOa4wHmDOzKI3hZspXThPihW+ehDJritr0+hNlih7AHyYoD/4QnCVTpA72MmY4AxqM/7E9XexQNjhT3FbKK86TawAn8LAff+D2sbVfVQuaREbzYceC+ZQ3F+/SMn+LlkIbLXtsjWrzqKeVSCIt0CITYpVGRCnJ500= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1725419684; c=relaxed/simple; bh=P5V/xRC5gwlM8RuyOqE/ALPqpDAH8g0S2cKP5+h2srI=; h=From:To:Cc:Subject:Date:Message-Id:In-Reply-To:References: MIME-Version; b=QT2FwGCieRJfrmC2ePC3eIlzFYa4DIUvOssjS0dGG/ju+re2sMX9E8VSIxQix2CYeHWE8thPC5Pu9qgasujOCXvzBrBzBiN071g6dJNvP0fLrb39yREIQ2GZPDMPk8paffSk26yio9pyLDuojw6T2hThBURFZadbYZw3hbmZpyU= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=intel.com; spf=pass smtp.mailfrom=intel.com; dkim=pass (2048-bit key) header.d=intel.com header.i=@intel.com header.b=Ytze5hD2; arc=none smtp.client-ip=192.198.163.18 Authentication-Results: smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=intel.com Authentication-Results: smtp.subspace.kernel.org; spf=pass smtp.mailfrom=intel.com Authentication-Results: 
From: Rick Edgecombe
To: seanjc@google.com, pbonzini@redhat.com, kvm@vger.kernel.org
Cc: kai.huang@intel.com, dmatlack@google.com, isaku.yamahata@gmail.com,
    yan.y.zhao@intel.com, nik.borisov@suse.com, rick.p.edgecombe@intel.com,
    linux-kernel@vger.kernel.org
Subject: [PATCH 15/21] KVM: TDX: Implement hook to get max mapping level of private pages
Date: Tue, 3 Sep 2024 20:07:45 -0700
Message-Id: <20240904030751.117579-16-rick.p.edgecombe@intel.com>

From: Isaku Yamahata

Implement the hook private_max_mapping_level for TDX to let the TDP MMU
core query the maximum mapping level of private pages. The value is
hard-coded to 4K, since huge pages are not supported yet.

Co-developed-by: Yan Zhao
Signed-off-by: Yan Zhao
Signed-off-by: Isaku Yamahata
Signed-off-by: Rick Edgecombe
Reviewed-by: Paolo Bonzini
---
TDX MMU part 2 v1:
 - Split from the big patch "KVM: TDX: TDP MMU TDX support".
 - Fix missing tdx_gmem_private_max_mapping_level() implementation for
   !CONFIG_INTEL_TDX_HOST

v19:
 - Use gmem_max_level callback, delete tdp_max_page_level.
---
 arch/x86/kvm/vmx/main.c    | 10 ++++++++++
 arch/x86/kvm/vmx/tdx.c     |  5 +++++
 arch/x86/kvm/vmx/x86_ops.h |  2 ++
 3 files changed, 17 insertions(+)

diff --git a/arch/x86/kvm/vmx/main.c b/arch/x86/kvm/vmx/main.c
index bf6fd5cca1d6..5d43b44e2467 100644
--- a/arch/x86/kvm/vmx/main.c
+++ b/arch/x86/kvm/vmx/main.c
@@ -184,6 +184,14 @@ static int vt_vcpu_mem_enc_ioctl(struct kvm_vcpu *vcpu, void __user *argp)
 	return tdx_vcpu_ioctl(vcpu, argp);
 }
 
+static int vt_gmem_private_max_mapping_level(struct kvm *kvm, kvm_pfn_t pfn)
+{
+	if (is_td(kvm))
+		return tdx_gmem_private_max_mapping_level(kvm, pfn);
+
+	return 0;
+}
+
 #define VMX_REQUIRED_APICV_INHIBITS \
 	(BIT(APICV_INHIBIT_REASON_DISABLED) | \
 	 BIT(APICV_INHIBIT_REASON_ABSENT) | \
@@ -337,6 +345,8 @@ struct kvm_x86_ops vt_x86_ops __initdata = {
 
 	.mem_enc_ioctl = vt_mem_enc_ioctl,
 	.vcpu_mem_enc_ioctl = vt_vcpu_mem_enc_ioctl,
+
+	.private_max_mapping_level = vt_gmem_private_max_mapping_level
 };
 
 struct kvm_x86_init_ops vt_init_ops __initdata = {
diff --git a/arch/x86/kvm/vmx/tdx.c b/arch/x86/kvm/vmx/tdx.c
index b8cd5a629a80..59b627b45475 100644
--- a/arch/x86/kvm/vmx/tdx.c
+++ b/arch/x86/kvm/vmx/tdx.c
@@ -1582,6 +1582,11 @@ int tdx_vcpu_ioctl(struct kvm_vcpu *vcpu, void __user *argp)
 	return ret;
 }
 
+int tdx_gmem_private_max_mapping_level(struct kvm *kvm, kvm_pfn_t pfn)
+{
+	return PG_LEVEL_4K;
+}
+
 #define KVM_SUPPORTED_TD_ATTRS (TDX_TD_ATTR_SEPT_VE_DISABLE)
 
 static int __init setup_kvm_tdx_caps(void)
diff --git a/arch/x86/kvm/vmx/x86_ops.h b/arch/x86/kvm/vmx/x86_ops.h
index d1db807b793a..66829413797d 100644
--- a/arch/x86/kvm/vmx/x86_ops.h
+++ b/arch/x86/kvm/vmx/x86_ops.h
@@ -142,6 +142,7 @@ int tdx_sept_remove_private_spte(struct kvm *kvm, gfn_t gfn,
 
 void tdx_flush_tlb_current(struct kvm_vcpu *vcpu);
 void tdx_load_mmu_pgd(struct kvm_vcpu *vcpu, hpa_t root_hpa, int root_level);
+int tdx_gmem_private_max_mapping_level(struct kvm *kvm, kvm_pfn_t pfn);
 #else
 static inline int tdx_vm_init(struct kvm *kvm) { return -EOPNOTSUPP; }
 static inline void tdx_mmu_release_hkid(struct kvm *kvm) {}
@@ -185,6 +186,7 @@ static inline int tdx_sept_remove_private_spte(struct kvm *kvm, gfn_t gfn,
 
 static inline void tdx_flush_tlb_current(struct kvm_vcpu *vcpu) {}
 static inline void tdx_load_mmu_pgd(struct kvm_vcpu *vcpu, hpa_t root_hpa, int root_level) {}
+static inline int tdx_gmem_private_max_mapping_level(struct kvm *kvm, kvm_pfn_t pfn) { return 0; }
 #endif
 
 #endif /* __KVM_X86_VMX_X86_OPS_H */
From patchwork Wed Sep 4 03:07:46 2024
From: Rick Edgecombe
To: seanjc@google.com, pbonzini@redhat.com, kvm@vger.kernel.org
Cc: kai.huang@intel.com, dmatlack@google.com, isaku.yamahata@gmail.com, yan.y.zhao@intel.com, nik.borisov@suse.com, rick.p.edgecombe@intel.com, linux-kernel@vger.kernel.org
Subject: [PATCH 16/21] KVM: TDX: Premap initial guest memory
Date: Tue, 3 Sep 2024 20:07:46 -0700
Message-Id: <20240904030751.117579-17-rick.p.edgecombe@intel.com>

From: Isaku Yamahata

Update TDX's set_external_spte() hook to record the pre-mapping count, instead of doing nothing and returning, when the TD is not yet finalized.

TDX uses the ioctl KVM_TDX_INIT_MEM_REGION to initialize its initial guest memory. This ioctl calls kvm_gmem_populate() to get guest pages, and in tdx_gmem_post_populate() it will:
(1) Map page table pages into the KVM mirror page table and private EPT.
(2) Map guest pages into the KVM mirror page table. In the propagation hook, just record the pre-mapping count without mapping the guest page into private EPT.
(3) Map guest pages into private EPT and decrease the pre-mapping count (a condensed sketch of steps (2) and (3) follows below).
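[Editorial illustration, not part of the patch: a condensed sketch of steps (2) and (3), based on the hunks in this patch and on patch 19. premap_then_add() is a made-up name and error handling is simplified.]

static int premap_then_add(struct kvm_vcpu *vcpu, gpa_t gpa, kvm_pfn_t pfn,
                           hpa_t src_hpa)
{
        struct kvm_tdx *kvm_tdx = to_kvm_tdx(vcpu->kvm);
        u8 level = PG_LEVEL_4K;
        u64 entry, level_state;
        int ret;

        /*
         * Step (2): fault into the mirror page table only; the
         * set_external_spte() hook just bumps nr_premapped because the
         * TD is not finalized yet.
         */
        ret = kvm_tdp_map_page(vcpu, gpa, PFERR_PRIVATE_ACCESS, &level);
        if (ret)
                return ret;

        /*
         * Step (3): add the page, with its initial contents copied from
         * the source page, to the private EPT and balance the counter.
         */
        if (tdh_mem_page_add(kvm_tdx, gpa, pfn_to_hpa(pfn), src_hpa,
                             &entry, &level_state))
                return -EIO;
        atomic64_dec(&kvm_tdx->nr_premapped);
        return 0;
}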
Guest pages are not mapped into private EPT directly in step (2) because, before the TD is finalized, TDX requires TDH.MEM.PAGE.ADD() to add a guest page; that SEAMCALL copies the page content from a user-provided source page into the target guest page being added. However, the source page is not available via the common interface kvm_tdp_map_page() in step (2). Therefore, just pre-map the guest page into the KVM mirror page table and record the pre-mapping count in TDX's propagation hook. The pre-mapping count is decreased in the ioctl KVM_TDX_INIT_MEM_REGION when the guest page is mapped into private EPT.

Co-developed-by: Yan Zhao
Signed-off-by: Yan Zhao
Signed-off-by: Isaku Yamahata
Signed-off-by: Rick Edgecombe
---
TDX MMU part 2 v1:
- Update the code comment and patch log according to latest gmem update.
  https://lore.kernel.org/kvm/CABgObfa=a3cKcKJHQRrCs-3Ty8ppSRou=dhi6Q+KdZnom0Zegw@mail.gmail.com/
- Rename tdx_mem_page_add() to tdx_mem_page_record_premap_cnt() to avoid
  confusion.
- Change the patch title to "KVM: TDX: Premap initial guest memory".
- Rename KVM_MEMORY_MAPPING => KVM_MAP_MEMORY (Sean)
- Drop issuing TDH.MEM.PAGE.ADD() on KVM_MAP_MEMORY(), defer it to
  KVM_TDX_INIT_MEM_REGION. (Sean)
- Added nr_premapped to track the number of premapped pages
- Drop tdx_post_mmu_map_page().
v19:
- Switched to use KVM_MEMORY_MAPPING
- Dropped measurement extension
- updated commit message. private_page_add() => set_private_spte()
---
 arch/x86/kvm/vmx/tdx.c | 40 +++++++++++++++++++++++++++++++++-------
 arch/x86/kvm/vmx/tdx.h |  2 +-
 2 files changed, 34 insertions(+), 8 deletions(-)

diff --git a/arch/x86/kvm/vmx/tdx.c b/arch/x86/kvm/vmx/tdx.c
index 59b627b45475..435112562954 100644
--- a/arch/x86/kvm/vmx/tdx.c
+++ b/arch/x86/kvm/vmx/tdx.c
@@ -488,6 +488,34 @@ static int tdx_mem_page_aug(struct kvm *kvm, gfn_t gfn,
 	return 0;
 }
 
+/*
+ * KVM_TDX_INIT_MEM_REGION calls kvm_gmem_populate() to get guest pages and
+ * tdx_gmem_post_populate() to premap page table pages into private EPT.
+ * Mapping guest pages into private EPT before TD is finalized should use a
+ * seamcall TDH.MEM.PAGE.ADD(), which copies page content from a source page
+ * from user to target guest pages to be added. This source page is not
+ * available via common interface kvm_tdp_map_page(). So, currently,
+ * kvm_tdp_map_page() only premaps guest pages into KVM mirrored root.
+ * A counter nr_premapped is increased here to record status. The counter will
+ * be decreased after TDH.MEM.PAGE.ADD() is called after the kvm_tdp_map_page()
+ * in tdx_gmem_post_populate().
+ */
+static int tdx_mem_page_record_premap_cnt(struct kvm *kvm, gfn_t gfn,
+					  enum pg_level level, kvm_pfn_t pfn)
+{
+	struct kvm_tdx *kvm_tdx = to_kvm_tdx(kvm);
+
+	/* Returning error here to let TDP MMU bail out early. */
+	if (KVM_BUG_ON(level != PG_LEVEL_4K, kvm)) {
+		tdx_unpin(kvm, pfn);
+		return -EINVAL;
+	}
+
+	/* nr_premapped will be decreased when tdh_mem_page_add() is called. */
+	atomic64_inc(&kvm_tdx->nr_premapped);
+	return 0;
+}
+
 int tdx_sept_set_private_spte(struct kvm *kvm, gfn_t gfn,
 			      enum pg_level level, kvm_pfn_t pfn)
 {
@@ -510,11 +538,7 @@ int tdx_sept_set_private_spte(struct kvm *kvm, gfn_t gfn,
 	if (likely(is_td_finalized(kvm_tdx)))
 		return tdx_mem_page_aug(kvm, gfn, level, pfn);
 
-	/*
-	 * TODO: KVM_MAP_MEMORY support to populate before finalize comes
-	 * here for the initial memory.
-	 */
-	return 0;
+	return tdx_mem_page_record_premap_cnt(kvm, gfn, level, pfn);
 }
 
 static int tdx_sept_drop_private_spte(struct kvm *kvm, gfn_t gfn,
@@ -546,10 +570,12 @@ static int tdx_sept_drop_private_spte(struct kvm *kvm, gfn_t gfn,
 	if (unlikely(!is_td_finalized(kvm_tdx) &&
 		     err == (TDX_EPT_WALK_FAILED | TDX_OPERAND_ID_RCX))) {
 		/*
-		 * This page was mapped with KVM_MAP_MEMORY, but
-		 * KVM_TDX_INIT_MEM_REGION is not issued yet.
+		 * Page is mapped by KVM_TDX_INIT_MEM_REGION, but hasn't called
+		 * tdh_mem_page_add().
 		 */
 		if (!is_last_spte(entry, level) || !(entry & VMX_EPT_RWX_MASK)) {
+			WARN_ON_ONCE(!atomic64_read(&kvm_tdx->nr_premapped));
+			atomic64_dec(&kvm_tdx->nr_premapped);
 			tdx_unpin(kvm, pfn);
 			return 0;
 		}
diff --git a/arch/x86/kvm/vmx/tdx.h b/arch/x86/kvm/vmx/tdx.h
index 66540c57ed61..25a4aaede2ba 100644
--- a/arch/x86/kvm/vmx/tdx.h
+++ b/arch/x86/kvm/vmx/tdx.h
@@ -26,7 +26,7 @@ struct kvm_tdx {
 
 	u64 tsc_offset;
 
-	/* For KVM_MAP_MEMORY and KVM_TDX_INIT_MEM_REGION. */
+	/* For KVM_TDX_INIT_MEM_REGION. */
 	atomic64_t nr_premapped;
 
 	struct kvm_cpuid2 *cpuid;
From patchwork Wed Sep 4 03:07:47 2024
From: Rick Edgecombe
To: seanjc@google.com, pbonzini@redhat.com, kvm@vger.kernel.org
Cc: kai.huang@intel.com, dmatlack@google.com, isaku.yamahata@gmail.com, yan.y.zhao@intel.com, nik.borisov@suse.com, rick.p.edgecombe@intel.com, linux-kernel@vger.kernel.org
Subject: [PATCH 17/21] KVM: TDX: MTRR: implement get_mt_mask() for TDX
Date: Tue, 3 Sep 2024 20:07:47 -0700
Message-Id: <20240904030751.117579-18-rick.p.edgecombe@intel.com>

From: Isaku Yamahata

Although TDX supports only WB for private GPA, it is desirable to support MTRR for shared GPA. Always honor guest PAT for shared EPT, as is done for normal VMs.

Suggested-by: Kai Huang
Co-developed-by: Yan Zhao
Signed-off-by: Yan Zhao
Signed-off-by: Isaku Yamahata
Signed-off-by: Rick Edgecombe
---
TDX MMU part 2 v1:
- Align with latest vmx code in kvm/queue.
- Updated patch log.
- Dropped KVM_BUG_ON() in vt_get_mt_mask(). (Rick)
v19:
- typo in the commit message
- Deleted stale paragraph in the commit message
---
 arch/x86/kvm/vmx/main.c    | 10 +++++++++-
 arch/x86/kvm/vmx/tdx.c     |  8 ++++++++
 arch/x86/kvm/vmx/x86_ops.h |  2 ++
 3 files changed, 19 insertions(+), 1 deletion(-)

diff --git a/arch/x86/kvm/vmx/main.c b/arch/x86/kvm/vmx/main.c
index 5d43b44e2467..8f5dbab9099f 100644
--- a/arch/x86/kvm/vmx/main.c
+++ b/arch/x86/kvm/vmx/main.c
@@ -168,6 +168,14 @@ static void vt_load_mmu_pgd(struct kvm_vcpu *vcpu, hpa_t root_hpa,
 	vmx_load_mmu_pgd(vcpu, root_hpa, pgd_level);
 }
 
+static u8 vt_get_mt_mask(struct kvm_vcpu *vcpu, gfn_t gfn, bool is_mmio)
+{
+	if (is_td_vcpu(vcpu))
+		return tdx_get_mt_mask(vcpu, gfn, is_mmio);
+
+	return vmx_get_mt_mask(vcpu, gfn, is_mmio);
+}
+
 static int vt_mem_enc_ioctl(struct kvm *kvm, void __user *argp)
 {
 	if (!is_td(kvm))
@@ -292,7 +300,7 @@ struct kvm_x86_ops vt_x86_ops __initdata = {
 
 	.set_tss_addr = vmx_set_tss_addr,
 	.set_identity_map_addr = vmx_set_identity_map_addr,
-	.get_mt_mask = vmx_get_mt_mask,
+	.get_mt_mask = vt_get_mt_mask,
 
 	.get_exit_info = vmx_get_exit_info,
 
diff --git a/arch/x86/kvm/vmx/tdx.c b/arch/x86/kvm/vmx/tdx.c
index 435112562954..50ce24905062 100644
--- a/arch/x86/kvm/vmx/tdx.c
+++ b/arch/x86/kvm/vmx/tdx.c
@@ -374,6 +374,14 @@ int tdx_vm_init(struct kvm *kvm)
 	return 0;
 }
 
+u8 tdx_get_mt_mask(struct kvm_vcpu *vcpu, gfn_t gfn, bool is_mmio)
+{
+	if (is_mmio)
+		return MTRR_TYPE_UNCACHABLE << VMX_EPT_MT_EPTE_SHIFT;
+
+	return MTRR_TYPE_WRBACK << VMX_EPT_MT_EPTE_SHIFT;
+}
+
 int tdx_vcpu_create(struct kvm_vcpu *vcpu)
 {
 	struct kvm_tdx *kvm_tdx = to_kvm_tdx(vcpu->kvm);
diff --git a/arch/x86/kvm/vmx/x86_ops.h b/arch/x86/kvm/vmx/x86_ops.h
index 66829413797d..d8a00ab4651c 100644
--- a/arch/x86/kvm/vmx/x86_ops.h
+++ b/arch/x86/kvm/vmx/x86_ops.h
@@ -128,6 +128,7 @@ int tdx_vm_ioctl(struct kvm *kvm, void __user *argp);
 int tdx_vcpu_create(struct kvm_vcpu *vcpu);
 void tdx_vcpu_free(struct kvm_vcpu *vcpu);
 void tdx_vcpu_reset(struct kvm_vcpu *vcpu, bool init_event);
+u8 tdx_get_mt_mask(struct kvm_vcpu *vcpu, gfn_t gfn, bool is_mmio);
 
 int tdx_vcpu_ioctl(struct kvm_vcpu *vcpu, void __user *argp);
 
@@ -153,6 +154,7 @@ static inline int tdx_vm_ioctl(struct kvm *kvm, void __user *argp) { return -EOP
 static inline int tdx_vcpu_create(struct kvm_vcpu *vcpu) { return -EOPNOTSUPP; }
 static inline void tdx_vcpu_free(struct kvm_vcpu *vcpu) {}
 static inline void tdx_vcpu_reset(struct kvm_vcpu *vcpu, bool init_event) {}
+static inline u8 tdx_get_mt_mask(struct kvm_vcpu *vcpu, gfn_t gfn, bool is_mmio) { return 0; }
 static inline int tdx_vcpu_ioctl(struct kvm_vcpu *vcpu, void __user *argp) { return -EOPNOTSUPP; }
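[Editorial note: for readers unfamiliar with the EPT memtype encoding used above, here is a small standalone illustration. The constant values (shift of 3, UC = 0, WB = 6) match the kernel's VMX/MTRR definitions, but this snippet is not kernel code.]

#include <stdio.h>

#define VMX_EPT_MT_EPTE_SHIFT	3	/* EPTE bits 5:3 hold the memory type */
#define MTRR_TYPE_UNCACHABLE	0
#define MTRR_TYPE_WRBACK	6

int main(void)
{
	/* MMIO -> UC (0 << 3 == 0x0); regular shared memory -> WB (6 << 3 == 0x30). */
	printf("UC mask: %#lx\n",
	       (unsigned long)MTRR_TYPE_UNCACHABLE << VMX_EPT_MT_EPTE_SHIFT);
	printf("WB mask: %#lx\n",
	       (unsigned long)MTRR_TYPE_WRBACK << VMX_EPT_MT_EPTE_SHIFT);
	return 0;
}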
From patchwork Wed Sep 4 03:07:48 2024
From: Rick Edgecombe
To: seanjc@google.com, pbonzini@redhat.com, kvm@vger.kernel.org
Cc: kai.huang@intel.com, dmatlack@google.com, isaku.yamahata@gmail.com, yan.y.zhao@intel.com, nik.borisov@suse.com, rick.p.edgecombe@intel.com, linux-kernel@vger.kernel.org
Subject: [PATCH 18/21] KVM: x86/mmu: Export kvm_tdp_map_page()
Date: Tue, 3 Sep 2024 20:07:48 -0700
Message-Id: <20240904030751.117579-19-rick.p.edgecombe@intel.com>

In future changes, CoCo-specific code will need to call kvm_tdp_map_page() from within its gmem_post_populate() callback. Export it so this can be done from vendor-specific code. Since kvm_mmu_reload() will be needed for this operation, export it as well.
Signed-off-by: Rick Edgecombe
---
TDX MMU part 2 v1:
- New patch
---
 arch/x86/kvm/mmu/mmu.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c
index d26b235d8f84..1a7965cfa08e 100644
--- a/arch/x86/kvm/mmu/mmu.c
+++ b/arch/x86/kvm/mmu/mmu.c
@@ -4754,6 +4754,7 @@ int kvm_tdp_map_page(struct kvm_vcpu *vcpu, gpa_t gpa, u64 error_code, u8 *level
 		return -EIO;
 	}
 }
+EXPORT_SYMBOL_GPL(kvm_tdp_map_page);
 
 long kvm_arch_vcpu_pre_fault_memory(struct kvm_vcpu *vcpu,
 				    struct kvm_pre_fault_memory *range)
@@ -5776,6 +5777,7 @@ int kvm_mmu_load(struct kvm_vcpu *vcpu)
 out:
 	return r;
 }
+EXPORT_SYMBOL_GPL(kvm_mmu_load);
 
 void kvm_mmu_unload(struct kvm_vcpu *vcpu)
 {
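[Editorial illustration, not part of the patch: a hypothetical vendor gmem_post_populate() callback using the newly exported symbols. It mirrors the pattern that patch 19 adopts; the function name and the assumption that the vCPU arrives via the opaque arg pointer are inventions for illustration.]

static int vendor_post_populate(struct kvm *kvm, gfn_t gfn, kvm_pfn_t pfn,
                                void __user *src, int order, void *arg)
{
        struct kvm_vcpu *vcpu = arg;    /* assumed to be passed by the caller */
        u8 level = PG_LEVEL_4K;

        /* Make sure the MMU context is loaded before faulting in the GPA. */
        kvm_mmu_reload(vcpu);

        /* Pre-fault the page into the (mirror) TDP MMU at 4K granularity. */
        return kvm_tdp_map_page(vcpu, gfn_to_gpa(gfn),
                                PFERR_GUEST_FINAL_MASK | PFERR_PRIVATE_ACCESS,
                                &level);
}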
From patchwork Wed Sep 4 03:07:49 2024
From: Rick Edgecombe
To: seanjc@google.com, pbonzini@redhat.com, kvm@vger.kernel.org
Cc: kai.huang@intel.com, dmatlack@google.com, isaku.yamahata@gmail.com, yan.y.zhao@intel.com, nik.borisov@suse.com, rick.p.edgecombe@intel.com, linux-kernel@vger.kernel.org
Subject: [PATCH 19/21] KVM: TDX: Add an ioctl to create initial guest memory
Date: Tue, 3 Sep 2024 20:07:49 -0700
Message-Id: <20240904030751.117579-20-rick.p.edgecombe@intel.com>

From: Isaku Yamahata

Add a new ioctl for the userspace VMM to initialize guest memory with specified memory contents. Because TDX protects the guest's memory, creating the initial guest memory requires a dedicated TDX module API, TDH.MEM.PAGE.ADD(), instead of directly copying the contents into the guest's memory as is done for the default VM type.

Define a new subcommand, KVM_TDX_INIT_MEM_REGION, of the vCPU-scoped KVM_MEMORY_ENCRYPT_OP. Check that the GFN is already pre-allocated, assign the guest page in Secure-EPT, copy the initial memory contents into the guest memory, and encrypt the guest memory. Optionally, extend the memory measurement of the TDX guest.

Discussion history:
- Originally, KVM_TDX_INIT_MEM_REGION used the TDP MMU callback of the KVM page fault handler. It issued the TDX SEAMCALL deep in the call stack, with the ioctl passing down the necessary parameters. [2] rejected it. [3] suggests that the call to the TDX module should be invoked in a shallow call stack.
- Instead, introduce guest memory pre-population [1] that doesn't update the vendor-specific part (Secure-EPT in the TDX case), and have the vendor-specific code (KVM_TDX_INIT_MEM_REGION) update only the vendor-specific parts without modifying the KVM TDP MMU, as suggested at [4]:

    Crazy idea. For TDX S-EPT, what if KVM_MAP_MEMORY does all of the
    SEPT.ADD stuff, which doesn't affect the measurement, and even fills in
    KVM's copy of the leaf EPTE, but tdx_sept_set_private_spte() doesn't do
    anything if the TD isn't finalized? Then KVM provides a dedicated TDX
    ioctl(), i.e. what is/was KVM_TDX_INIT_MEM_REGION, to do PAGE.ADD.
    KVM_TDX_INIT_MEM_REGION wouldn't need to map anything, it would simply
    need to verify that the pfn from guest_memfd() is the same as what's in
    the TDP MMU.

- Use the common guest_memfd population function, kvm_gmem_populate(), instead of a custom function. It should check whether the PFN from the TDP MMU is the same as the one from guest_memfd. [1]
- Instead of forcing userspace to do two passes, pre-map the guest initial memory in tdx_gmem_post_populate. [5]
Link: https://lore.kernel.org/kvm/20240419085927.3648704-1-pbonzini@redhat.com/ [1]
Link: https://lore.kernel.org/kvm/Zbrj5WKVgMsUFDtb@google.com/ [2]
Link: https://lore.kernel.org/kvm/Zh8DHbb8FzoVErgX@google.com/ [3]
Link: https://lore.kernel.org/kvm/Ze-TJh0BBOWm9spT@google.com/ [4]
Link: https://lore.kernel.org/kvm/CABgObfa=a3cKcKJHQRrCs-3Ty8ppSRou=dhi6Q+KdZnom0Zegw@mail.gmail.com/ [5]
Suggested-by: Sean Christopherson
Signed-off-by: Isaku Yamahata
Co-developed-by: Yan Zhao
Signed-off-by: Yan Zhao
Signed-off-by: Rick Edgecombe
---
TDX MMU part 2 v1:
- Update the code according to latest gmem update.
  https://lore.kernel.org/kvm/CABgObfa=a3cKcKJHQRrCs-3Ty8ppSRou=dhi6Q+KdZnom0Zegw@mail.gmail.com/
- Fix an alignment bug reported by Binbin.
- Rename KVM_MEMORY_MAPPING => KVM_MAP_MEMORY (Sean)
- Drop issuing TDH.MEM.PAGE.ADD() on KVM_MAP_MEMORY(), defer it to
  KVM_TDX_INIT_MEM_REGION. (Sean)
- Added nr_premapped to track the number of premapped pages
- Drop tdx_post_mmu_map_page().
- Drop kvm_slot_can_be_private() check (Paolo)
- Use kvm_tdp_mmu_gpa_is_mapped() (Paolo)
v19:
- Switched to use KVM_MEMORY_MAPPING
- Dropped measurement extension
- updated commit message. private_page_add() => set_private_spte()
---
 arch/x86/include/uapi/asm/kvm.h |   9 ++
 arch/x86/kvm/vmx/tdx.c          | 150 ++++++++++++++++++++++++++++++++
 virt/kvm/kvm_main.c             |   1 +
 3 files changed, 160 insertions(+)

diff --git a/arch/x86/include/uapi/asm/kvm.h b/arch/x86/include/uapi/asm/kvm.h
index 39636be5c891..789d1d821b4f 100644
--- a/arch/x86/include/uapi/asm/kvm.h
+++ b/arch/x86/include/uapi/asm/kvm.h
@@ -931,6 +931,7 @@ enum kvm_tdx_cmd_id {
 	KVM_TDX_CAPABILITIES = 0,
 	KVM_TDX_INIT_VM,
 	KVM_TDX_INIT_VCPU,
+	KVM_TDX_INIT_MEM_REGION,
 	KVM_TDX_GET_CPUID,
 
 	KVM_TDX_CMD_NR_MAX,
@@ -996,4 +997,12 @@ struct kvm_tdx_init_vm {
 	struct kvm_cpuid2 cpuid;
 };
 
+#define KVM_TDX_MEASURE_MEMORY_REGION	_BITULL(0)
+
+struct kvm_tdx_init_mem_region {
+	__u64 source_addr;
+	__u64 gpa;
+	__u64 nr_pages;
+};
+
 #endif /* _ASM_X86_KVM_H */
diff --git a/arch/x86/kvm/vmx/tdx.c b/arch/x86/kvm/vmx/tdx.c
index 50ce24905062..796d1a495a66 100644
--- a/arch/x86/kvm/vmx/tdx.c
+++ b/arch/x86/kvm/vmx/tdx.c
@@ -8,6 +8,7 @@
 #include "tdx_ops.h"
 #include "vmx.h"
 #include "mmu/spte.h"
+#include "common.h"
 
 #undef pr_fmt
 #define pr_fmt(fmt) KBUILD_MODNAME ": " fmt
@@ -1586,6 +1587,152 @@ static int tdx_vcpu_init(struct kvm_vcpu *vcpu, struct kvm_tdx_cmd *cmd)
 	return 0;
 }
 
+struct tdx_gmem_post_populate_arg {
+	struct kvm_vcpu *vcpu;
+	__u32 flags;
+};
+
+static int tdx_gmem_post_populate(struct kvm *kvm, gfn_t gfn, kvm_pfn_t pfn,
+				  void __user *src, int order, void *_arg)
+{
+	u64 error_code = PFERR_GUEST_FINAL_MASK | PFERR_PRIVATE_ACCESS;
+	struct kvm_tdx *kvm_tdx = to_kvm_tdx(kvm);
+	struct tdx_gmem_post_populate_arg *arg = _arg;
+	struct kvm_vcpu *vcpu = arg->vcpu;
+	gpa_t gpa = gfn_to_gpa(gfn);
+	u8 level = PG_LEVEL_4K;
+	struct page *page;
+	int ret, i;
+	u64 err, entry, level_state;
+
+	/*
+	 * Get the source page if it has been faulted in. Return failure if the
+	 * source page has been swapped out or unmapped in primary memory.
+	 */
+	ret = get_user_pages_fast((unsigned long)src, 1, 0, &page);
+	if (ret < 0)
+		return ret;
+	if (ret != 1)
+		return -ENOMEM;
+
+	if (!kvm_mem_is_private(kvm, gfn)) {
+		ret = -EFAULT;
+		goto out_put_page;
+	}
+
+	ret = kvm_tdp_map_page(vcpu, gpa, error_code, &level);
+	if (ret < 0)
+		goto out_put_page;
+
+	read_lock(&kvm->mmu_lock);
+
+	if (!kvm_tdp_mmu_gpa_is_mapped(vcpu, gpa)) {
+		ret = -ENOENT;
+		goto out;
+	}
+
+	ret = 0;
+	do {
+		err = tdh_mem_page_add(kvm_tdx, gpa, pfn_to_hpa(pfn),
+				       pfn_to_hpa(page_to_pfn(page)),
+				       &entry, &level_state);
+	} while (err == TDX_ERROR_SEPT_BUSY);
+	if (err) {
+		ret = -EIO;
+		goto out;
+	}
+
+	WARN_ON_ONCE(!atomic64_read(&kvm_tdx->nr_premapped));
+	atomic64_dec(&kvm_tdx->nr_premapped);
+
+	if (arg->flags & KVM_TDX_MEASURE_MEMORY_REGION) {
+		for (i = 0; i < PAGE_SIZE; i += TDX_EXTENDMR_CHUNKSIZE) {
+			err = tdh_mr_extend(kvm_tdx, gpa + i, &entry,
+					    &level_state);
+			if (err) {
+				ret = -EIO;
+				break;
+			}
+		}
+	}
+
+out:
+	read_unlock(&kvm->mmu_lock);
+out_put_page:
+	put_page(page);
+	return ret;
+}
+
+static int tdx_vcpu_init_mem_region(struct kvm_vcpu *vcpu, struct kvm_tdx_cmd *cmd)
+{
+	struct kvm *kvm = vcpu->kvm;
+	struct kvm_tdx *kvm_tdx = to_kvm_tdx(kvm);
+	struct kvm_tdx_init_mem_region region;
+	struct tdx_gmem_post_populate_arg arg;
+	long gmem_ret;
+	int ret;
+
+	if (!to_tdx(vcpu)->initialized)
+		return -EINVAL;
+
+	/* Once TD is finalized, the initial guest memory is fixed. */
+	if (is_td_finalized(kvm_tdx))
+		return -EINVAL;
+
+	if (cmd->flags & ~KVM_TDX_MEASURE_MEMORY_REGION)
+		return -EINVAL;
+
+	if (copy_from_user(&region, u64_to_user_ptr(cmd->data), sizeof(region)))
+		return -EFAULT;
+
+	if (!PAGE_ALIGNED(region.source_addr) || !PAGE_ALIGNED(region.gpa) ||
+	    !region.nr_pages ||
+	    region.gpa + (region.nr_pages << PAGE_SHIFT) <= region.gpa ||
+	    !kvm_is_private_gpa(kvm, region.gpa) ||
+	    !kvm_is_private_gpa(kvm, region.gpa + (region.nr_pages << PAGE_SHIFT) - 1))
+		return -EINVAL;
+
+	mutex_lock(&kvm->slots_lock);
+
+	kvm_mmu_reload(vcpu);
+	ret = 0;
+	while (region.nr_pages) {
+		if (signal_pending(current)) {
+			ret = -EINTR;
+			break;
+		}
+
+		arg = (struct tdx_gmem_post_populate_arg) {
+			.vcpu = vcpu,
+			.flags = cmd->flags,
+		};
+		gmem_ret = kvm_gmem_populate(kvm, gpa_to_gfn(region.gpa),
+					     u64_to_user_ptr(region.source_addr),
+					     1, tdx_gmem_post_populate, &arg);
+		if (gmem_ret < 0) {
+			ret = gmem_ret;
+			break;
+		}
+
+		if (gmem_ret != 1) {
+			ret = -EIO;
+			break;
+		}
+
+		region.source_addr += PAGE_SIZE;
+		region.gpa += PAGE_SIZE;
+		region.nr_pages--;
+
+		cond_resched();
+	}
+
+	mutex_unlock(&kvm->slots_lock);
+
+	if (copy_to_user(u64_to_user_ptr(cmd->data), &region, sizeof(region)))
+		ret = -EFAULT;
+	return ret;
+}
+
 int tdx_vcpu_ioctl(struct kvm_vcpu *vcpu, void __user *argp)
 {
 	struct kvm_tdx *kvm_tdx = to_kvm_tdx(vcpu->kvm);
@@ -1605,6 +1752,9 @@ int tdx_vcpu_ioctl(struct kvm_vcpu *vcpu, void __user *argp)
 	case KVM_TDX_INIT_VCPU:
 		ret = tdx_vcpu_init(vcpu, &cmd);
 		break;
+	case KVM_TDX_INIT_MEM_REGION:
+		ret = tdx_vcpu_init_mem_region(vcpu, &cmd);
+		break;
 	case KVM_TDX_GET_CPUID:
 		ret = tdx_vcpu_get_cpuid(vcpu, &cmd);
 		break;
diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
index 73fc3334721d..0822db480719 100644
--- a/virt/kvm/kvm_main.c
+++ b/virt/kvm/kvm_main.c
@@ -2639,6 +2639,7 @@ struct kvm_memory_slot *kvm_vcpu_gfn_to_memslot(struct kvm_vcpu *vcpu, gfn_t gfn
 
 	return NULL;
 }
+EXPORT_SYMBOL_GPL(kvm_vcpu_gfn_to_memslot);
 
 bool kvm_is_visible_gfn(struct kvm *kvm, gfn_t gfn)
 {
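[Editorial illustration, not part of the patch: a hypothetical userspace sketch of the new ABI. The struct layouts come from the uapi hunks above; the file descriptor, the source buffer, and the availability of these definitions in the installed headers are assumptions.]

#include <sys/ioctl.h>
#include <linux/kvm.h>
#include <asm/kvm.h>	/* assumed to carry this series' TDX definitions */

/* vcpu_fd: an initialized TD vCPU; src: page-aligned image; gpa: private GPA. */
static int tdx_init_mem_region(int vcpu_fd, void *src, __u64 gpa, __u64 nr_pages)
{
	struct kvm_tdx_init_mem_region region = {
		.source_addr = (__u64)src,	/* must be page-aligned */
		.gpa = gpa,			/* must be a private GPA */
		.nr_pages = nr_pages,
	};
	struct kvm_tdx_cmd cmd = {
		.id = KVM_TDX_INIT_MEM_REGION,
		.flags = KVM_TDX_MEASURE_MEMORY_REGION,	/* also extend the measurement */
		.data = (__u64)&region,
	};

	/* vCPU-scoped KVM_MEMORY_ENCRYPT_OP, per this patch. */
	return ioctl(vcpu_fd, KVM_MEMORY_ENCRYPT_OP, &cmd);
}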
From patchwork Wed Sep 4 03:07:50 2024
From: Rick Edgecombe
To: seanjc@google.com, pbonzini@redhat.com, kvm@vger.kernel.org
Cc: kai.huang@intel.com, dmatlack@google.com, isaku.yamahata@gmail.com, yan.y.zhao@intel.com, nik.borisov@suse.com, rick.p.edgecombe@intel.com, linux-kernel@vger.kernel.org, Adrian Hunter
Subject: [PATCH 20/21] KVM: TDX: Finalize VM initialization
Date: Tue, 3 Sep 2024 20:07:50 -0700
Message-Id: <20240904030751.117579-21-rick.p.edgecombe@intel.com>
From: Isaku Yamahata

Add a new VM-scoped KVM_MEMORY_ENCRYPT_OP IOCTL subcommand, KVM_TDX_FINALIZE_VM, to perform TD Measurement Finalization. Documentation for the API is added in another patch: "Documentation/virt/kvm: Document on Trust Domain Extensions (TDX)"

For the purpose of attestation, a measurement must be made of the TDX VM's initial state. This is referred to as TD Measurement Finalization and uses SEAMCALL TDH.MR.FINALIZE, after which:
1. The VMM can no longer add TD private pages with arbitrary content.
2. The TDX VM becomes runnable.

Co-developed-by: Adrian Hunter
Signed-off-by: Adrian Hunter
Signed-off-by: Isaku Yamahata
Signed-off-by: Rick Edgecombe
---
TDX MMU part 2 v1:
- Added premapped check.
- Update for the wrapper functions for SEAMCALLs. (Sean)
- Add check if nr_premapped is zero. If not, return error.
- Use KVM_BUG_ON() in tdx_td_finalizer() for consistency.
- Change tdx_td_finalizemr() to take struct kvm_tdx_cmd *cmd and return error
  (Adrian)
- Handle TDX_OPERAND_BUSY case (Adrian)
- Updates from seamcall overhaul (Kai)
- Rename error->hw_error
v18:
- Remove the change of tools/arch/x86/include/uapi/asm/kvm.h.
v15:
- removed unconditional tdx_track() by tdx_flush_tlb_current() that does
  tdx_track().
---
 arch/x86/include/uapi/asm/kvm.h |  1 +
 arch/x86/kvm/vmx/tdx.c          | 28 ++++++++++++++++++++++++++++
 2 files changed, 29 insertions(+)

diff --git a/arch/x86/include/uapi/asm/kvm.h b/arch/x86/include/uapi/asm/kvm.h
index 789d1d821b4f..0b4827e39458 100644
--- a/arch/x86/include/uapi/asm/kvm.h
+++ b/arch/x86/include/uapi/asm/kvm.h
@@ -932,6 +932,7 @@ enum kvm_tdx_cmd_id {
 	KVM_TDX_INIT_VM,
 	KVM_TDX_INIT_VCPU,
 	KVM_TDX_INIT_MEM_REGION,
+	KVM_TDX_FINALIZE_VM,
 	KVM_TDX_GET_CPUID,
 
 	KVM_TDX_CMD_NR_MAX,
diff --git a/arch/x86/kvm/vmx/tdx.c b/arch/x86/kvm/vmx/tdx.c
index 796d1a495a66..3083a66bb895 100644
--- a/arch/x86/kvm/vmx/tdx.c
+++ b/arch/x86/kvm/vmx/tdx.c
@@ -1257,6 +1257,31 @@ void tdx_flush_tlb_current(struct kvm_vcpu *vcpu)
 	ept_sync_global();
 }
 
+static int tdx_td_finalizemr(struct kvm *kvm, struct kvm_tdx_cmd *cmd)
+{
+	struct kvm_tdx *kvm_tdx = to_kvm_tdx(kvm);
+
+	if (!is_hkid_assigned(kvm_tdx) || is_td_finalized(kvm_tdx))
+		return -EINVAL;
+
+	/*
+	 * Pages are pending for KVM_TDX_INIT_MEM_REGION to issue
+	 * TDH.MEM.PAGE.ADD().
+	 */
+	if (atomic64_read(&kvm_tdx->nr_premapped))
+		return -EINVAL;
+
+	cmd->hw_error = tdh_mr_finalize(kvm_tdx);
+	if ((cmd->hw_error & TDX_SEAMCALL_STATUS_MASK) == TDX_OPERAND_BUSY)
+		return -EAGAIN;
+	if (KVM_BUG_ON(cmd->hw_error, kvm)) {
+		pr_tdx_error(TDH_MR_FINALIZE, cmd->hw_error);
+		return -EIO;
+	}
+
+	kvm_tdx->finalized = true;
+	return 0;
+}
+
 int tdx_vm_ioctl(struct kvm *kvm, void __user *argp)
 {
 	struct kvm_tdx_cmd tdx_cmd;
@@ -1281,6 +1306,9 @@ int tdx_vm_ioctl(struct kvm *kvm, void __user *argp)
 	case KVM_TDX_INIT_VM:
 		r = tdx_td_init(kvm, &tdx_cmd);
 		break;
+	case KVM_TDX_FINALIZE_VM:
+		r = tdx_td_finalizemr(kvm, &tdx_cmd);
+		break;
 	default:
 		r = -EINVAL;
 		goto out;
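[Editorial illustration, not part of the patch: a hypothetical userspace sketch of the finalize step, retrying on the -EAGAIN that the TDX_OPERAND_BUSY case above can produce. The vm_fd variable and the kvm_tdx_cmd layout are assumptions taken from this series.]

#include <errno.h>
#include <sys/ioctl.h>
#include <linux/kvm.h>
#include <asm/kvm.h>	/* assumed to carry this series' TDX definitions */

/* Call after all KVM_TDX_INIT_MEM_REGION regions have been added. */
static int tdx_finalize_vm(int vm_fd)
{
	struct kvm_tdx_cmd cmd = { .id = KVM_TDX_FINALIZE_VM };
	int r;

	do {
		/* VM-scoped KVM_MEMORY_ENCRYPT_OP, per this patch. */
		r = ioctl(vm_fd, KVM_MEMORY_ENCRYPT_OP, &cmd);
	} while (r < 0 && errno == EAGAIN);

	return r;
}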
From patchwork Wed Sep 4 03:07:51 2024
From: Rick Edgecombe
To: seanjc@google.com, pbonzini@redhat.com, kvm@vger.kernel.org
Cc: kai.huang@intel.com, dmatlack@google.com, isaku.yamahata@gmail.com, yan.y.zhao@intel.com, nik.borisov@suse.com, rick.p.edgecombe@intel.com, linux-kernel@vger.kernel.org
Subject: [PATCH 21/21] KVM: TDX: Handle vCPU dissociation
Date: Tue, 3 Sep 2024 20:07:51 -0700
Message-Id: <20240904030751.117579-22-rick.p.edgecombe@intel.com>

From: Isaku Yamahata

Handle vCPU dissociation by invoking SEAMCALL TDH.VP.FLUSH, which flushes the address translation caches and the cached TD VMCS of a TD vCPU on its associated pCPU.

In TDX, a vCPU can only be associated with one pCPU at a time; the association is established by invoking SEAMCALL TDH.VP.ENTER. Before a vCPU can be successfully associated with a new pCPU, it must be dissociated from its previous pCPU.

To facilitate vCPU dissociation, introduce a per-pCPU list, associated_tdvcpus. Add a vCPU to this list when it is loaded onto a new pCPU (i.e., when a vCPU is loaded for the first time or migrated to a new pCPU).

vCPU dissociation can happen under the following conditions:
- When the hardware_disable op is called. This op is called when virtualization is disabled on a given pCPU, e.g., when hot-unplugging a pCPU or on machine shutdown/suspend. In this case, dissociate all vCPUs from the pCPU by iterating over its per-pCPU associated_tdvcpus list.
- On vCPU migration to a new pCPU. Before adding a vCPU to the associated_tdvcpus list of the new pCPU, it must be dissociated from its old pCPU, which is done by issuing an IPI and executing SEAMCALL TDH.VP.FLUSH on the old pCPU. On successful dissociation, the vCPU is removed from the associated_tdvcpus list of its previously associated pCPU.
- When tdx_mmu_release_hkid() is called. TDX mandates that all vCPUs be dissociated prior to the release of an HKID. Therefore, dissociation of all vCPUs is a must before executing SEAMCALL TDH.MNG.VPFLUSHDONE and subsequently freeing the HKID.

Signed-off-by: Isaku Yamahata
Co-developed-by: Yan Zhao
Signed-off-by: Yan Zhao
Signed-off-by: Rick Edgecombe
---
TDX MMU part 2 v1:
- Changed title to "KVM: TDX: Handle vCPU dissociation".
- Updated commit log.
- Removed calling tdx_disassociate_vp_on_cpu() in tdx_vcpu_free() since no new
  TD enter would be called for vCPU association after tdx_mmu_release_hkid(),
  which is now called in vt_vm_destroy(), i.e. after releasing vcpu fd and
  kvm_unload_vcpu_mmus(), and before tdx_vcpu_free().
- TODO: include Isaku's fix
  https://eclists.intel.com/sympa/arc/kvm-qemu-review/2024-07/msg00359.html
- Update for the wrapper functions for SEAMCALLs. (Sean)
- Removed unnecessary pr_err() in tdx_flush_vp_on_cpu().
- Use KVM_BUG_ON() in tdx_flush_vp_on_cpu() for consistency.
- Capitalize the first word of title. (Binbin)
- Minor fixes in changelog. (Binbin, Reinette (internal))
- Fix some comments. (Binbin, Reinette (internal))
- Rename arg_ to _arg (Binbin)
- Updates from seamcall overhaul (Kai)
- Remove lockdep_assert_preemption_disabled() in tdx_hardware_setup() since
  hardware_enable() is no longer called via SMP function call but from the
  (per-CPU) CPU hotplug thread
- Use KVM_BUG_ON() for SEAMCALLs in tdx_mmu_release_hkid() (Kai)
- Update based on upstream commit "KVM: x86: Fold kvm_arch_sched_in() into
  kvm_arch_vcpu_load()"
- Eliminate TDX_FLUSHVP_NOT_DONE error check because vCPUs were all freed,
  so the error won't happen. (Sean)
---
 arch/x86/kvm/vmx/main.c    |  22 +++++-
 arch/x86/kvm/vmx/tdx.c     | 151 +++++++++++++++++++++++++++++++++++--
 arch/x86/kvm/vmx/tdx.h     |   2 +
 arch/x86/kvm/vmx/x86_ops.h |   4 +
 4 files changed, 169 insertions(+), 10 deletions(-)

diff --git a/arch/x86/kvm/vmx/main.c b/arch/x86/kvm/vmx/main.c
index 8f5dbab9099f..8171c1412c3b 100644
--- a/arch/x86/kvm/vmx/main.c
+++ b/arch/x86/kvm/vmx/main.c
@@ -10,6 +10,14 @@
 #include "tdx.h"
 #include "tdx_arch.h"
 
+static void vt_hardware_disable(void)
+{
+	/* Note, TDX *and* VMX need to be disabled if TDX is enabled. */
+	if (enable_tdx)
+		tdx_hardware_disable();
+	vmx_hardware_disable();
+}
+
 static __init int vt_hardware_setup(void)
 {
 	int ret;
@@ -113,6 +121,16 @@ static void vt_vcpu_reset(struct kvm_vcpu *vcpu, bool init_event)
 	vmx_vcpu_reset(vcpu, init_event);
 }
 
+static void vt_vcpu_load(struct kvm_vcpu *vcpu, int cpu)
+{
+	if (is_td_vcpu(vcpu)) {
+		tdx_vcpu_load(vcpu, cpu);
+		return;
+	}
+
+	vmx_vcpu_load(vcpu, cpu);
+}
+
 static void vt_flush_tlb_all(struct kvm_vcpu *vcpu)
 {
 	/*
@@ -217,7 +235,7 @@ struct kvm_x86_ops vt_x86_ops __initdata = {
 
 	.hardware_unsetup = vmx_hardware_unsetup,
 	.hardware_enable = vmx_hardware_enable,
-	.hardware_disable = vmx_hardware_disable,
+	.hardware_disable = vt_hardware_disable,
 	.emergency_disable = vmx_emergency_disable,
 
 	.has_emulated_msr = vmx_has_emulated_msr,
@@ -234,7 +252,7 @@ struct kvm_x86_ops vt_x86_ops __initdata = {
 	.vcpu_reset = vt_vcpu_reset,
 
 	.prepare_switch_to_guest = vmx_prepare_switch_to_guest,
-	.vcpu_load = vmx_vcpu_load,
+	.vcpu_load = vt_vcpu_load,
 	.vcpu_put = vmx_vcpu_put,
 
 	.update_exception_bitmap = vmx_update_exception_bitmap,
diff --git a/arch/x86/kvm/vmx/tdx.c b/arch/x86/kvm/vmx/tdx.c
index 3083a66bb895..554154d3dd58 100644
--- a/arch/x86/kvm/vmx/tdx.c
+++ b/arch/x86/kvm/vmx/tdx.c
@@ -57,6 +57,14 @@ static DEFINE_MUTEX(tdx_lock);
 /* Maximum number of retries to attempt for SEAMCALLs. */
 #define TDX_SEAMCALL_RETRIES 10000
 
+/*
+ * A per-CPU list of TD vCPUs associated with a given CPU. Used when a CPU
+ * is brought down to invoke TDH_VP_FLUSH on the appropriate TD vCPUS.
+ * Protected by interrupt mask. This list is manipulated in process context
+ * of vCPU and IPI callback. See tdx_flush_vp_on_cpu().
+ */
+static DEFINE_PER_CPU(struct list_head, associated_tdvcpus);
+
 static __always_inline hpa_t set_hkid_to_hpa(hpa_t pa, u16 hkid)
 {
 	return pa | ((hpa_t)hkid << boot_cpu_data.x86_phys_bits);
@@ -88,6 +96,22 @@ static inline bool is_td_finalized(struct kvm_tdx *kvm_tdx)
 	return kvm_tdx->finalized;
 }
 
+static inline void tdx_disassociate_vp(struct kvm_vcpu *vcpu)
+{
+	lockdep_assert_irqs_disabled();
+
+	list_del(&to_tdx(vcpu)->cpu_list);
+
+	/*
+	 * Ensure tdx->cpu_list is updated before setting vcpu->cpu to -1,
+	 * otherwise, a different CPU can see vcpu->cpu = -1 and add the vCPU
+	 * to its list before it's deleted from this CPU's list.
+	 */
+	smp_wmb();
+
+	vcpu->cpu = -1;
+}
+
 static void tdx_clear_page(unsigned long page_pa)
 {
 	const void *zero_page = (const void *) __va(page_to_phys(ZERO_PAGE(0)));
@@ -168,6 +192,83 @@ static void tdx_reclaim_control_page(unsigned long ctrl_page_pa)
 	free_page((unsigned long)__va(ctrl_page_pa));
 }
 
+struct tdx_flush_vp_arg {
+	struct kvm_vcpu *vcpu;
+	u64 err;
+};
+
+static void tdx_flush_vp(void *_arg)
+{
+	struct tdx_flush_vp_arg *arg = _arg;
+	struct kvm_vcpu *vcpu = arg->vcpu;
+	u64 err;
+
+	arg->err = 0;
+	lockdep_assert_irqs_disabled();
+
+	/* Task migration can race with CPU offlining. */
+	if (unlikely(vcpu->cpu != raw_smp_processor_id()))
+		return;
+
+	/*
+	 * No need to do TDH_VP_FLUSH if the vCPU hasn't been initialized. The
+	 * list tracking still needs to be updated so that it's correct if/when
+	 * the vCPU does get initialized.
+	 */
+	if (is_td_vcpu_created(to_tdx(vcpu))) {
+		/*
+		 * No need to retry. TDX Resources needed for TDH.VP.FLUSH are:
+		 * TDVPR as exclusive, TDR as shared, and TDCS as shared. This
+		 * vp flush function is called when destructing vCPU/TD or vCPU
+		 * migration. No other thread uses TDVPR in those cases.
+		 */
+		err = tdh_vp_flush(to_tdx(vcpu));
+		if (unlikely(err && err != TDX_VCPU_NOT_ASSOCIATED)) {
+			/*
+			 * This function is called in IPI context. Do not use
+			 * printk to avoid console semaphore.
+			 * The caller prints out the error message, instead.
+			 */
+			if (err)
+				arg->err = err;
+		}
+	}
+
+	tdx_disassociate_vp(vcpu);
+}
+
+static void tdx_flush_vp_on_cpu(struct kvm_vcpu *vcpu)
+{
+	struct tdx_flush_vp_arg arg = {
+		.vcpu = vcpu,
+	};
+	int cpu = vcpu->cpu;
+
+	if (unlikely(cpu == -1))
+		return;
+
+	smp_call_function_single(cpu, tdx_flush_vp, &arg, 1);
+	if (KVM_BUG_ON(arg.err, vcpu->kvm))
+		pr_tdx_error(TDH_VP_FLUSH, arg.err);
+}
+
+void tdx_hardware_disable(void)
+{
+	int cpu = raw_smp_processor_id();
+	struct list_head *tdvcpus = &per_cpu(associated_tdvcpus, cpu);
+	struct tdx_flush_vp_arg arg;
+	struct vcpu_tdx *tdx, *tmp;
+	unsigned long flags;
+
+	local_irq_save(flags);
+	/* Safe variant needed as tdx_disassociate_vp() deletes the entry. */
+	list_for_each_entry_safe(tdx, tmp, tdvcpus, cpu_list) {
+		arg.vcpu = &tdx->vcpu;
+		tdx_flush_vp(&arg);
+	}
+	local_irq_restore(flags);
+}
+
 static void smp_func_do_phymem_cache_wb(void *unused)
 {
 	u64 err = 0;
@@ -204,22 +305,21 @@ void tdx_mmu_release_hkid(struct kvm *kvm)
 	bool packages_allocated, targets_allocated;
 	struct kvm_tdx *kvm_tdx = to_kvm_tdx(kvm);
 	cpumask_var_t packages, targets;
-	u64 err;
+	struct kvm_vcpu *vcpu;
+	unsigned long j;
 	int i;
+	u64 err;
 
 	if (!is_hkid_assigned(kvm_tdx))
 		return;
 
-	/* KeyID has been allocated but guest is not yet configured */
-	if (!is_td_created(kvm_tdx)) {
-		tdx_hkid_free(kvm_tdx);
-		return;
-	}
-
 	packages_allocated = zalloc_cpumask_var(&packages, GFP_KERNEL);
 	targets_allocated = zalloc_cpumask_var(&targets, GFP_KERNEL);
 	cpus_read_lock();
 
+	kvm_for_each_vcpu(j, vcpu, kvm)
+		tdx_flush_vp_on_cpu(vcpu);
+
 	/*
 	 * TDH.PHYMEM.CACHE.WB tries to acquire the TDX module global lock
 	 * and can fail with TDX_OPERAND_BUSY when it fails to get the lock.
@@ -233,6 +333,16 @@ void tdx_mmu_release_hkid(struct kvm *kvm)
 	 * After the above flushing vps, there should be no more vCPU
 	 * associations, as all vCPU fds have been released at this stage.
	 */
+	err = tdh_mng_vpflushdone(kvm_tdx);
+	if (err == TDX_FLUSHVP_NOT_DONE)
+		goto out;
+	if (KVM_BUG_ON(err, kvm)) {
+		pr_tdx_error(TDH_MNG_VPFLUSHDONE, err);
+		pr_err("tdh_mng_vpflushdone() failed. HKID %d is leaked.\n",
+		       kvm_tdx->hkid);
+		goto out;
+	}
+
 	for_each_online_cpu(i) {
 		if (packages_allocated &&
 		    cpumask_test_and_set_cpu(topology_physical_package_id(i),
@@ -258,6 +368,7 @@ void tdx_mmu_release_hkid(struct kvm *kvm)
 
 	tdx_hkid_free(kvm_tdx);
 
+out:
 	mutex_unlock(&tdx_lock);
 	cpus_read_unlock();
 	free_cpumask_var(targets);
@@ -409,6 +520,26 @@ int tdx_vcpu_create(struct kvm_vcpu *vcpu)
 	return 0;
 }
 
+void tdx_vcpu_load(struct kvm_vcpu *vcpu, int cpu)
+{
+	struct vcpu_tdx *tdx = to_tdx(vcpu);
+
+	if (vcpu->cpu == cpu)
+		return;
+
+	tdx_flush_vp_on_cpu(vcpu);
+
+	local_irq_disable();
+	/*
+	 * Pairs with the smp_wmb() in tdx_disassociate_vp() to ensure
+	 * vcpu->cpu is read before tdx->cpu_list.
+	 */
+	smp_rmb();
+
+	list_add(&tdx->cpu_list, &per_cpu(associated_tdvcpus, cpu));
+	local_irq_enable();
+}
+
 void tdx_vcpu_free(struct kvm_vcpu *vcpu)
 {
 	struct vcpu_tdx *tdx = to_tdx(vcpu);
@@ -1977,7 +2108,7 @@ static int __init __do_tdx_bringup(void)
 static int __init __tdx_bringup(void)
 {
 	const struct tdx_sys_info_td_conf *td_conf;
-	int r;
+	int r, i;
 
 	if (!tdp_mmu_enabled || !enable_mmio_caching)
 		return -EOPNOTSUPP;
@@ -1987,6 +2118,10 @@ static int __init __tdx_bringup(void)
 		return -EOPNOTSUPP;
 	}
 
+	/* tdx_hardware_disable() uses associated_tdvcpus. */
+	for_each_possible_cpu(i)
+		INIT_LIST_HEAD(&per_cpu(associated_tdvcpus, i));
+
 	/*
 	 * Enabling TDX requires enabling hardware virtualization first,
 	 * as making SEAMCALLs requires CPU being in post-VMXON state.
diff --git a/arch/x86/kvm/vmx/tdx.h b/arch/x86/kvm/vmx/tdx.h
index 25a4aaede2ba..4b6fc25feeb6 100644
--- a/arch/x86/kvm/vmx/tdx.h
+++ b/arch/x86/kvm/vmx/tdx.h
@@ -39,6 +39,8 @@ struct vcpu_tdx {
 	unsigned long *tdcx_pa;
 	bool td_vcpu_created;
 
+	struct list_head cpu_list;
+
 	bool initialized;
 
 	/*
diff --git a/arch/x86/kvm/vmx/x86_ops.h b/arch/x86/kvm/vmx/x86_ops.h
index d8a00ab4651c..f4aa0ec16980 100644
--- a/arch/x86/kvm/vmx/x86_ops.h
+++ b/arch/x86/kvm/vmx/x86_ops.h
@@ -119,6 +119,7 @@ void vmx_cancel_hv_timer(struct kvm_vcpu *vcpu);
 void vmx_setup_mce(struct kvm_vcpu *vcpu);
 
 #ifdef CONFIG_INTEL_TDX_HOST
+void tdx_hardware_disable(void);
 int tdx_vm_init(struct kvm *kvm);
 void tdx_mmu_release_hkid(struct kvm *kvm);
 void tdx_vm_free(struct kvm *kvm);
@@ -128,6 +129,7 @@ int tdx_vm_ioctl(struct kvm *kvm, void __user *argp);
 int tdx_vcpu_create(struct kvm_vcpu *vcpu);
 void tdx_vcpu_free(struct kvm_vcpu *vcpu);
 void tdx_vcpu_reset(struct kvm_vcpu *vcpu, bool init_event);
+void tdx_vcpu_load(struct kvm_vcpu *vcpu, int cpu);
 u8 tdx_get_mt_mask(struct kvm_vcpu *vcpu, gfn_t gfn, bool is_mmio);
 
 int tdx_vcpu_ioctl(struct kvm_vcpu *vcpu, void __user *argp);
 
@@ -145,6 +147,7 @@ void tdx_flush_tlb_current(struct kvm_vcpu *vcpu);
 void tdx_load_mmu_pgd(struct kvm_vcpu *vcpu, hpa_t root_hpa, int root_level);
 int tdx_gmem_private_max_mapping_level(struct kvm *kvm, kvm_pfn_t pfn);
 #else
+static inline void tdx_hardware_disable(void) {}
 static inline int tdx_vm_init(struct kvm *kvm) { return -EOPNOTSUPP; }
 static inline void tdx_mmu_release_hkid(struct kvm *kvm) {}
 static inline void tdx_vm_free(struct kvm *kvm) {}
@@ -154,6 +157,7 @@ static inline int tdx_vm_ioctl(struct kvm *kvm, void __user *argp) { return -EOP
 static inline int tdx_vcpu_create(struct kvm_vcpu *vcpu) { return -EOPNOTSUPP; }
 static inline void tdx_vcpu_free(struct kvm_vcpu *vcpu) {}
 static inline void tdx_vcpu_reset(struct kvm_vcpu *vcpu, bool init_event) {}
+static inline void tdx_vcpu_load(struct kvm_vcpu *vcpu, int cpu) {}
 static inline u8 tdx_get_mt_mask(struct kvm_vcpu *vcpu, gfn_t gfn, bool is_mmio) { return 0; }
 static inline int tdx_vcpu_ioctl(struct kvm_vcpu *vcpu, void __user *argp) { return -EOPNOTSUPP; }
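[Editorial note on patch 21's ordering: the smp_wmb()/smp_rmb() pair keeps a vCPU from ever appearing on two per-CPU lists at once. A condensed, illustrative restatement follows; it is a sketch against the kernel APIs used above, not literal kernel code, and the function names are made up.]

/* Writer side, as in tdx_disassociate_vp() (runs with IRQs disabled). */
static void writer_side(struct kvm_vcpu *vcpu)
{
	list_del(&to_tdx(vcpu)->cpu_list);	/* A: leave the old pCPU's list */
	smp_wmb();				/* make A visible before B */
	vcpu->cpu = -1;				/* B: publish "not associated" */
}

/* Reader side, as in tdx_vcpu_load(): vcpu->cpu is read (via the cpu == -1
 * check in tdx_flush_vp_on_cpu()) before the list is touched, so a reader
 * that observes B is guaranteed to also observe A. */
static void reader_side(struct kvm_vcpu *vcpu, int cpu)
{
	smp_rmb();				/* pairs with the smp_wmb() above */
	list_add(&to_tdx(vcpu)->cpu_list, &per_cpu(associated_tdvcpus, cpu));
}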