From patchwork Tue Nov 12 07:34:26 2024 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Yan Zhao X-Patchwork-Id: 13871799 Received: from mgamail.intel.com (mgamail.intel.com [198.175.65.18]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 11C5820A5FB; Tue, 12 Nov 2024 07:37:02 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=198.175.65.18 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1731397024; cv=none; b=HqARHr9J14ZvDLpW8EgX7BPJshh1dj4ajMCXlQG4AVwoAC2m3Ce+kj8Xn5/gj5sebskfvzqurN3lEgAKkgCVDNYhfrPVG3AArsSKZ70Xq5iF5aL9ihENyiJVnZ/8nA2f+QmRsvePfTsQlnaawA+A5i8qHB5FQ4sQj0FjxzIU/Vg= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1731397024; c=relaxed/simple; bh=sYlwcimPr/bcLGxNCU6gu7tC+Bt3r+i40j9uiMiWFoI=; h=From:To:Cc:Subject:Date:Message-ID:In-Reply-To:References: MIME-Version; b=dLOnGmd5ZKPstB5aL8VJZPdHKl8VRn5dXJ0Ijl5zPeTezvuVCGN8JEoHNW1Wl7dpp41BflOR0ADs6wcpjavwpih2pR/XIVMHXrFraB85Xx7EPCNES17jVL6m17CoNnulirqFCoOXWLodm6qfTfNZviMketF9LDtgDwmEn3fCWxo= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=intel.com; spf=pass smtp.mailfrom=intel.com; dkim=pass (2048-bit key) header.d=intel.com header.i=@intel.com header.b=iB7rW+eg; arc=none smtp.client-ip=198.175.65.18 Authentication-Results: smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=intel.com Authentication-Results: smtp.subspace.kernel.org; spf=pass smtp.mailfrom=intel.com Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=intel.com header.i=@intel.com header.b="iB7rW+eg" DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=intel.com; i=@intel.com; q=dns/txt; s=Intel; t=1731397023; x=1762933023; h=from:to:cc:subject:date:message-id:in-reply-to: 
references:mime-version:content-transfer-encoding; bh=sYlwcimPr/bcLGxNCU6gu7tC+Bt3r+i40j9uiMiWFoI=; b=iB7rW+egSHVgZuGlBH1kzC9Mol5ICrLfLEDEzqj42J+TXJjj7ccv3i0n aL3HevzbiFO7LRzBGNpfzPRRca3Gxtf6iZCj8b1jNCcQ1/ZJC6KfWz5Pn 9oLuLb5GY4TRjLWaWwRrMW5EnQOJcvEYqYOpD8ukUknOJQn/5EgxliAjy 7Vw5fjoRdIp4sluuzwEUuHIgpn7TQ22oL1ZgN5qgAcnYOEm/rjKvanzpC DFpIW1B/FbJpNxv8CwLexAYnN+MfcrZPBrDsk0OZ5w2UBRrlKCnCHTdcI JBnhLNcYWyWcAyo2Bhb0d75sii0Eyl2otdqU90WIOScB/PCIRUp8oTWIL A==; X-CSE-ConnectionGUID: zp9osZzbQjOdWFwSjnJ0tQ== X-CSE-MsgGUID: J+QGyxiyRteAkkn3INBd5A== X-IronPort-AV: E=McAfee;i="6700,10204,11222"; a="31389118" X-IronPort-AV: E=Sophos;i="6.11,199,1725346800"; d="scan'208";a="31389118" Received: from fmviesa007.fm.intel.com ([10.60.135.147]) by orvoesa110.jf.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 11 Nov 2024 23:37:03 -0800 X-CSE-ConnectionGUID: 1kvmQp4JSI6yopj1pAkjlw== X-CSE-MsgGUID: hSC+T8WiRyeZJPfII3pgpw== X-ExtLoop1: 1 X-IronPort-AV: E=Sophos;i="6.12,147,1728975600"; d="scan'208";a="87081579" Received: from yzhao56-desk.sh.intel.com ([10.239.159.62]) by fmviesa007-auth.fm.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 11 Nov 2024 23:36:56 -0800 From: Yan Zhao To: pbonzini@redhat.com, seanjc@google.com, kvm@vger.kernel.org, dave.hansen@linux.intel.com Cc: rick.p.edgecombe@intel.com, kai.huang@intel.com, adrian.hunter@intel.com, reinette.chatre@intel.com, xiaoyao.li@intel.com, tony.lindgren@intel.com, binbin.wu@linux.intel.com, dmatlack@google.com, isaku.yamahata@intel.com, isaku.yamahata@gmail.com, nik.borisov@suse.com, linux-kernel@vger.kernel.org, x86@kernel.org Subject: [PATCH v2 01/24] KVM: x86/mmu: Implement memslot deletion for TDX Date: Tue, 12 Nov 2024 15:34:26 +0800 Message-ID: <20241112073426.21997-1-yan.y.zhao@intel.com> X-Mailer: git-send-email 2.43.2 In-Reply-To: <20241112073327.21979-1-yan.y.zhao@intel.com> References: <20241112073327.21979-1-yan.y.zhao@intel.com> Precedence: bulk X-Mailing-List: kvm@vger.kernel.org List-Id: List-Subscribe: 
List-Unsubscribe: MIME-Version: 1.0 From: Rick Edgecombe Update attr_filter field to zap both private and shared mappings for TDX when memslot is deleted. Signed-off-by: Rick Edgecombe Co-developed-by: Yan Zhao Signed-off-by: Yan Zhao --- TDX MMU part 2 v2: - Update commit msg. TDX MMU part 2 v1: - Clarify TDX limits on zapping private memory (Sean) Memslot quirk series: - New patch --- arch/x86/kvm/mmu/mmu.c | 1 + 1 file changed, 1 insertion(+) diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c index 2e253a488949..68bb78a2306c 100644 --- a/arch/x86/kvm/mmu/mmu.c +++ b/arch/x86/kvm/mmu/mmu.c @@ -7126,6 +7126,7 @@ static void kvm_mmu_zap_memslot(struct kvm *kvm, .start = slot->base_gfn, .end = slot->base_gfn + slot->npages, .may_block = true, + .attr_filter = KVM_FILTER_PRIVATE | KVM_FILTER_SHARED, }; bool flush; From patchwork Tue Nov 12 07:34:57 2024 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Yan Zhao X-Patchwork-Id: 13871800 Received: from mgamail.intel.com (mgamail.intel.com [192.198.163.19]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 7443C20A5FB; Tue, 12 Nov 2024 07:37:29 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=192.198.163.19 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1731397051; cv=none; b=urayIIOixAaXAw6CogDrW8fzm5ciNFMToyivdGlXcX0lpuyF6u7XzYb/Br4j5IDQVNGRENHB+NFnUOpmH/fmOvgcJFgPSm/VxAn/2YiKlydp+KQ0N3P2pRqHC89xIagNF78IEwrarBh6TxWsPyXxYpAAa/WYflwA71qwBXuzanY= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1731397051; c=relaxed/simple; bh=czVlomoh24KL+gMFZIleuMzZnXotvPsIaQAUtpIa1oE=; h=From:To:Cc:Subject:Date:Message-ID:In-Reply-To:References: MIME-Version; 
b=SLToCsSs4JbFt/w08fupPhAMS7TDbM/CjiDKGdxNoWE+D8aQMNzS7p0tjPNRIByZCR6kCQN+87m0HFxK6fUptT1xrJue9ZRW6d8hx5+oS2M7Na4K7ZSth+OFE7Xdc9QTvnWCLoFxydjR/c5rtGnZjQ/cNuOJwyruKCmgtW26wcA= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=intel.com; spf=pass smtp.mailfrom=intel.com; dkim=pass (2048-bit key) header.d=intel.com header.i=@intel.com header.b=WV+tQhiO; arc=none smtp.client-ip=192.198.163.19 Authentication-Results: smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=intel.com Authentication-Results: smtp.subspace.kernel.org; spf=pass smtp.mailfrom=intel.com Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=intel.com header.i=@intel.com header.b="WV+tQhiO" DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=intel.com; i=@intel.com; q=dns/txt; s=Intel; t=1731397049; x=1762933049; h=from:to:cc:subject:date:message-id:in-reply-to: references:mime-version:content-transfer-encoding; bh=czVlomoh24KL+gMFZIleuMzZnXotvPsIaQAUtpIa1oE=; b=WV+tQhiODcjPUxVn2XJG0qqVDUOJk4XXUinJmr9cJob6vnkd0FnfZ5S5 5bvuJTmZvXY5XQvU5Kg9WqJySRzWUXszMwMiLZm4UHQltYXwXG2wPp7Kg /xGbe5zPRWpfCXr8qnJIIGkkxh8IrTIybFoN1xqQjYbVZow3cZsbRCGps X/XJp1jFqG2Cg0v9dZN2eRW9PzC8prC4j4qyjtXth23hb+D7fVQuWeThR HitR6dgcnx3kPM0YdEv7t6QaUrrf6N+jwB7GlrdYHNG/puxtyK36R0sc9 x/3qecV4J4rLVb3gXNgn6NlLp3bQEjsVBkVAQbwKavD+ipHEZW3RW/k7z w==; X-CSE-ConnectionGUID: UZIf8EhGQO2jdXcjjqWLJg== X-CSE-MsgGUID: 7GkXf/jXQKKZ5tvL/cyJng== X-IronPort-AV: E=McAfee;i="6700,10204,11253"; a="30616010" X-IronPort-AV: E=Sophos;i="6.12,147,1728975600"; d="scan'208";a="30616010" Received: from orviesa004.jf.intel.com ([10.64.159.144]) by fmvoesa113.fm.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 11 Nov 2024 23:37:28 -0800 X-CSE-ConnectionGUID: SVh5p95lQ8qwGhbtZoy0oA== X-CSE-MsgGUID: RW8Yj9ERT+yWAUYH1Ru8Zw== X-ExtLoop1: 1 X-IronPort-AV: E=Sophos;i="6.12,147,1728975600"; d="scan'208";a="92416795" Received: from yzhao56-desk.sh.intel.com 
([10.239.159.62]) by orviesa004-auth.jf.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 11 Nov 2024 23:37:24 -0800 From: Yan Zhao To: pbonzini@redhat.com, seanjc@google.com, kvm@vger.kernel.org, dave.hansen@linux.intel.com Cc: rick.p.edgecombe@intel.com, kai.huang@intel.com, adrian.hunter@intel.com, reinette.chatre@intel.com, xiaoyao.li@intel.com, tony.lindgren@intel.com, binbin.wu@linux.intel.com, dmatlack@google.com, isaku.yamahata@intel.com, isaku.yamahata@gmail.com, nik.borisov@suse.com, linux-kernel@vger.kernel.org, x86@kernel.org Subject: [PATCH v2 02/24] KVM: x86/tdp_mmu: Add a helper function to walk down the TDP MMU Date: Tue, 12 Nov 2024 15:34:57 +0800 Message-ID: <20241112073457.22011-1-yan.y.zhao@intel.com> X-Mailer: git-send-email 2.43.2 In-Reply-To: <20241112073327.21979-1-yan.y.zhao@intel.com> References: <20241112073327.21979-1-yan.y.zhao@intel.com> Precedence: bulk X-Mailing-List: kvm@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 From: Isaku Yamahata Export a function to walk down the TDP without modifying it and simply check if a GPA is mapped. Future changes will support pre-populating TDX private memory. In order to implement this KVM will need to check if a given GFN is already pre-populated in the mirrored EPT. [1] There is already a TDP MMU walker, kvm_tdp_mmu_get_walk() for use within the KVM MMU that almost does what is required. However, to make sense of the results, MMU internal PTE helpers are needed. Refactor the code to provide a helper that can be used outside of the KVM MMU code. Refactoring the KVM page fault handler to support this lookup usage was also considered, but it was an awkward fit. kvm_tdp_mmu_gpa_is_mapped() is based on a diff by Paolo Bonzini. 
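For illustration only, the read-only lookup described above can be sketched outside the kernel as a toy table walk. This is a minimal sketch under stated assumptions, not the KVM implementation: the two-level layout, the toy_* names and the TOY_* flags are all hypothetical. It mirrors the shape of the helper: record the entry seen at each level, then report "mapped" only if the lowest entry reached is both present and a leaf.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical toy layout: 2 levels, 4 entries per table, 4 KiB pages. */
#define TOY_PAGE_SHIFT  12
#define TOY_ROOT_LEVEL  2
#define TOY_INDEX_BITS  2
#define TOY_ENTRIES     (1u << TOY_INDEX_BITS)
#define TOY_PRESENT     (1ull << 0)	/* entry is valid */
#define TOY_HUGE        (1ull << 1)	/* entry above level 1 maps a page */

struct toy_table {
	uint64_t entry[TOY_ENTRIES];
	struct toy_table *child[TOY_ENTRIES];
};

bool toy_is_present(uint64_t e)
{
	return (e & TOY_PRESENT) != 0;
}

/* Level 1 entries are always leaves; higher levels need TOY_HUGE. */
bool toy_is_last(uint64_t e, int level)
{
	return level == 1 || (e & TOY_HUGE) != 0;
}

/*
 * Read-only walk: store the entry seen at each level in sptes[] and
 * return the lowest level reached, or -1 if there is no root.
 */
int toy_get_walk(const struct toy_table *root, uint64_t gpa, uint64_t *sptes)
{
	uint64_t gfn = gpa >> TOY_PAGE_SHIFT;
	const struct toy_table *t = root;
	int level, leaf = -1;

	assert(sptes != NULL);

	for (level = TOY_ROOT_LEVEL; t && level >= 1; level--) {
		unsigned int idx = (gfn >> ((level - 1) * TOY_INDEX_BITS)) &
				   (TOY_ENTRIES - 1);
		uint64_t e = t->entry[idx];

		leaf = level;
		sptes[leaf] = e;
		if (!toy_is_present(e) || toy_is_last(e, level))
			break;
		t = t->child[idx];
	}
	return leaf;
}

/* Mapped means: the walk ended on an entry that is present and a leaf. */
bool toy_gpa_is_mapped(const struct toy_table *root, uint64_t gpa)
{
	uint64_t sptes[TOY_ROOT_LEVEL + 1];
	int leaf = toy_get_walk(root, gpa, sptes);

	if (leaf < 0)
		return false;
	return toy_is_present(sptes[leaf]) && toy_is_last(sptes[leaf], leaf);
}

/* Build a tiny example: only GFN (1 << 2) | 2 == 6 is mapped. */
void toy_build_example(struct toy_table *root, struct toy_table *l1)
{
	memset(root, 0, sizeof(*root));
	memset(l1, 0, sizeof(*l1));
	root->entry[1] = TOY_PRESENT;	/* non-leaf entry pointing at l1 */
	root->child[1] = l1;
	l1->entry[2] = TOY_PRESENT;	/* level 1 leaf */
}
```

With the example table, toy_gpa_is_mapped() is true for GPA 6 << 12 and false for its neighbours. Unlike this toy, the real helper also has locking requirements (mmu_lock held, RCU read-side section) and chooses between the direct and mirror roots.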
Link: https://lore.kernel.org/kvm/ZfBkle1eZFfjPI8l@google.com/ [1] Signed-off-by: Isaku Yamahata Co-developed-by: Rick Edgecombe Signed-off-by: Rick Edgecombe Signed-off-by: Yan Zhao Reviewed-by: Paolo Bonzini --- TDX MMU part 2 v2: - Added Paolo's rb TDX MMU part 2 v1: - Change exported function to just return if GPA is mapped because "You are executing with the filemap_invalidate_lock() taken, and therefore cannot race with kvm_gmem_punch_hole()" (Paolo) https://lore.kernel.org/kvm/CABgObfbpNN842noAe77WYvgi5MzK2SAA_FYw-=fGa+PcT_Z22w@mail.gmail.com/ - Take root hpa instead of enum (Paolo) TDX MMU Prep v2: - Rename function with "mirror" and use root enum TDX MMU Prep: - New patch --- arch/x86/kvm/mmu.h | 3 +++ arch/x86/kvm/mmu/mmu.c | 3 +-- arch/x86/kvm/mmu/tdp_mmu.c | 37 ++++++++++++++++++++++++++++++++----- 3 files changed, 36 insertions(+), 7 deletions(-) diff --git a/arch/x86/kvm/mmu.h b/arch/x86/kvm/mmu.h index 0fa86e47e9f3..398b6b06ed73 100644 --- a/arch/x86/kvm/mmu.h +++ b/arch/x86/kvm/mmu.h @@ -252,6 +252,9 @@ extern bool tdp_mmu_enabled; #define tdp_mmu_enabled false #endif +bool kvm_tdp_mmu_gpa_is_mapped(struct kvm_vcpu *vcpu, u64 gpa); +int kvm_tdp_map_page(struct kvm_vcpu *vcpu, gpa_t gpa, u64 error_code, u8 *level); + static inline bool kvm_memslots_have_rmaps(struct kvm *kvm) { return !tdp_mmu_enabled || kvm_shadow_root_allocated(kvm); diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c index 68bb78a2306c..3a338df541c1 100644 --- a/arch/x86/kvm/mmu/mmu.c +++ b/arch/x86/kvm/mmu/mmu.c @@ -4743,8 +4743,7 @@ int kvm_tdp_page_fault(struct kvm_vcpu *vcpu, struct kvm_page_fault *fault) return direct_page_fault(vcpu, fault); } -static int kvm_tdp_map_page(struct kvm_vcpu *vcpu, gpa_t gpa, u64 error_code, - u8 *level) +int kvm_tdp_map_page(struct kvm_vcpu *vcpu, gpa_t gpa, u64 error_code, u8 *level) { int r; diff --git a/arch/x86/kvm/mmu/tdp_mmu.c b/arch/x86/kvm/mmu/tdp_mmu.c index d7d60116672b..b0e1c4cb3004 100644 --- a/arch/x86/kvm/mmu/tdp_mmu.c 
+++ b/arch/x86/kvm/mmu/tdp_mmu.c @@ -1898,16 +1898,13 @@ bool kvm_tdp_mmu_write_protect_gfn(struct kvm *kvm, * * Must be called between kvm_tdp_mmu_walk_lockless_{begin,end}. */ -int kvm_tdp_mmu_get_walk(struct kvm_vcpu *vcpu, u64 addr, u64 *sptes, - int *root_level) +static int __kvm_tdp_mmu_get_walk(struct kvm_vcpu *vcpu, u64 addr, u64 *sptes, + struct kvm_mmu_page *root) { - struct kvm_mmu_page *root = root_to_sp(vcpu->arch.mmu->root.hpa); struct tdp_iter iter; gfn_t gfn = addr >> PAGE_SHIFT; int leaf = -1; - *root_level = vcpu->arch.mmu->root_role.level; - tdp_mmu_for_each_pte(iter, vcpu->kvm, root, gfn, gfn + 1) { leaf = iter.level; sptes[leaf] = iter.old_spte; @@ -1916,6 +1913,36 @@ int kvm_tdp_mmu_get_walk(struct kvm_vcpu *vcpu, u64 addr, u64 *sptes, return leaf; } +int kvm_tdp_mmu_get_walk(struct kvm_vcpu *vcpu, u64 addr, u64 *sptes, + int *root_level) +{ + struct kvm_mmu_page *root = root_to_sp(vcpu->arch.mmu->root.hpa); + *root_level = vcpu->arch.mmu->root_role.level; + + return __kvm_tdp_mmu_get_walk(vcpu, addr, sptes, root); +} + +bool kvm_tdp_mmu_gpa_is_mapped(struct kvm_vcpu *vcpu, u64 gpa) +{ + struct kvm *kvm = vcpu->kvm; + bool is_direct = kvm_is_addr_direct(kvm, gpa); + hpa_t root = is_direct ? vcpu->arch.mmu->root.hpa : + vcpu->arch.mmu->mirror_root_hpa; + u64 sptes[PT64_ROOT_MAX_LEVEL + 1], spte; + int leaf; + + lockdep_assert_held(&kvm->mmu_lock); + rcu_read_lock(); + leaf = __kvm_tdp_mmu_get_walk(vcpu, gpa, sptes, root_to_sp(root)); + rcu_read_unlock(); + if (leaf < 0) + return false; + + spte = sptes[leaf]; + return is_shadow_present_pte(spte) && is_last_spte(spte, leaf); +} +EXPORT_SYMBOL_GPL(kvm_tdp_mmu_gpa_is_mapped); + /* * Returns the last level spte pointer of the shadow page walk for the given * gpa, and sets *spte to the spte value. This spte may be non-preset. 
If no From patchwork Tue Nov 12 07:35:15 2024 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Yan Zhao X-Patchwork-Id: 13871801 Received: from mgamail.intel.com (mgamail.intel.com [192.198.163.19]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 45E27849C; Tue, 12 Nov 2024 07:37:46 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=192.198.163.19 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1731397068; cv=none; b=DcMWTkMzCPape3pMDDzvmTpRhnx+wMzydzfPElZhVvl/jZKJ6/4WwnMDMkbu7zQuWcg4PiL2lSEVVNDs56hAW669AMDVztCojBoWScmwyoEzMfVUy8hvYjprfLWkS9eY8CuBINFv0CfxtyIb/YfsNjdVaft+9xfRBbz/mb64p14= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1731397068; c=relaxed/simple; bh=kT4mFrbUXzSw+ZvkIhQ9I58T4nMsjmaFcq1Xjn9KWGc=; h=From:To:Cc:Subject:Date:Message-ID:In-Reply-To:References: MIME-Version; b=VA6DxSq+dNC0CXtv+Zkademu00qCf7p07ZhCtcEnP+wnD2LmlkTHyzQnvLkgdb31RUjSo0F3mKRo40A4X4dylENJYYHpcfAQtB4n9bxv/eq2wkXPCuXfvDeu30qjJFZnRkR+wj/nxOOO7QKzfBZj1iY/GPjOb3G3sgWr6/ThQYU= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=intel.com; spf=pass smtp.mailfrom=intel.com; dkim=pass (2048-bit key) header.d=intel.com header.i=@intel.com header.b=coDSGFy8; arc=none smtp.client-ip=192.198.163.19 Authentication-Results: smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=intel.com Authentication-Results: smtp.subspace.kernel.org; spf=pass smtp.mailfrom=intel.com Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=intel.com header.i=@intel.com header.b="coDSGFy8" DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=intel.com; i=@intel.com; q=dns/txt; s=Intel; t=1731397067; x=1762933067; 
h=from:to:cc:subject:date:message-id:in-reply-to: references:mime-version:content-transfer-encoding; bh=kT4mFrbUXzSw+ZvkIhQ9I58T4nMsjmaFcq1Xjn9KWGc=; b=coDSGFy8U7MZDMBaf3heUDcq6FvSOqjxywB3484zzij6UycgvLsOEXAy Oh2AfKnU7bk/4LM2Iw1SqgGiuhj+eLw3/R2DJdxy9X5AN7HtO2v1ddW8z WLrFXJuTa17NXLNtbS205HugnT1C980fBP0owLUWjTSMITKKgiWXYag+P bm8c3Gv1QSxAmriWA+n6h2oGXWvmO/dKnQQKyfgiF+Rv+T0vYSLEveOx3 REbg52PApcPE3w4iJ1QJzk1EpYkTbOjU8Cj7oPLi4dfxvZD+nmLHTcsJ5 c2+oxl31QrycDfFYGqgHK9n5B2wCbGYsc/Dfi6RZXoCmGmt4oBI2rrS1v w==; X-CSE-ConnectionGUID: 3yqSpFjWSFK7UG8ao/U+hA== X-CSE-MsgGUID: pJabUOZkShmllVKL6gxR8Q== X-IronPort-AV: E=McAfee;i="6700,10204,11253"; a="30616049" X-IronPort-AV: E=Sophos;i="6.12,147,1728975600"; d="scan'208";a="30616049" Received: from orviesa004.jf.intel.com ([10.64.159.144]) by fmvoesa113.fm.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 11 Nov 2024 23:37:46 -0800 X-CSE-ConnectionGUID: IQ9SC6sdTiqE00aRrGNeTg== X-CSE-MsgGUID: S+D2hQ5GQMuTIe7ZRP2rvg== X-ExtLoop1: 1 X-IronPort-AV: E=Sophos;i="6.12,147,1728975600"; d="scan'208";a="92416912" Received: from yzhao56-desk.sh.intel.com ([10.239.159.62]) by orviesa004-auth.jf.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 11 Nov 2024 23:37:42 -0800 From: Yan Zhao To: pbonzini@redhat.com, seanjc@google.com, kvm@vger.kernel.org, dave.hansen@linux.intel.com Cc: rick.p.edgecombe@intel.com, kai.huang@intel.com, adrian.hunter@intel.com, reinette.chatre@intel.com, xiaoyao.li@intel.com, tony.lindgren@intel.com, binbin.wu@linux.intel.com, dmatlack@google.com, isaku.yamahata@intel.com, isaku.yamahata@gmail.com, nik.borisov@suse.com, linux-kernel@vger.kernel.org, x86@kernel.org Subject: [PATCH v2 03/24] KVM: x86/mmu: Do not enable page track for TD guest Date: Tue, 12 Nov 2024 15:35:15 +0800 Message-ID: <20241112073515.22028-1-yan.y.zhao@intel.com> X-Mailer: git-send-email 2.43.2 In-Reply-To: <20241112073327.21979-1-yan.y.zhao@intel.com> References: <20241112073327.21979-1-yan.y.zhao@intel.com> Precedence: bulk 
X-Mailing-List: kvm@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 Fail kvm_enable_external_write_tracking() if the VM type is TDX so that external page track users fail in kvm_page_track_register_notifier(), since TDX does not support write protection and hence page tracking. There is no need to fail KVM-internal users of page tracking (i.e. for shadow pages), because TDX always runs with EPT enabled and currently the TDX module does not emulate and send VMLAUNCH/VMRESUME VM exits to the VMM. Suggested-by: Paolo Bonzini Signed-off-by: Yan Zhao Reviewed-by: Binbin Wu Cc: Yuan Yao --- TDX MMU part 2 v2: - Move the checking of VM type from kvm_page_track_write_tracking_enabled() to kvm_enable_external_write_tracking() to make kvm_page_track_register_notifier() fail. (Paolo) - Updated patch msg (Yan) - Added Paolo's Suggested-by tag since the patch is simple enough and the current implementation was suggested by Paolo (Yan) v19: - drop TDX: from the short log - Added reviewed-by: BinBin --- arch/x86/kvm/mmu/page_track.c | 3 +++ 1 file changed, 3 insertions(+) diff --git a/arch/x86/kvm/mmu/page_track.c b/arch/x86/kvm/mmu/page_track.c index 561c331fd6ec..1b17b12393a8 100644 --- a/arch/x86/kvm/mmu/page_track.c +++ b/arch/x86/kvm/mmu/page_track.c @@ -172,6 +172,9 @@ static int kvm_enable_external_write_tracking(struct kvm *kvm) struct kvm_memory_slot *slot; int r = 0, i, bkt; + if (kvm->arch.vm_type == KVM_X86_TDX_VM) + return -EOPNOTSUPP; + mutex_lock(&kvm->slots_arch_lock); /* From patchwork Tue Nov 12 07:35:28 2024 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Yan Zhao X-Patchwork-Id: 13871802 Received: from mgamail.intel.com (mgamail.intel.com [192.198.163.10]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 67B5420ADDD; Tue, 12 Nov 2024 07:38:01 +0000 (UTC) Authentication-Results: 
smtp.subspace.kernel.org; arc=none smtp.client-ip=192.198.163.10 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1731397083; cv=none; b=Acm8clUnbkI9PX8fRNSoLKhl9ldT/fmE2taqUDR+QxELh/OHqYSiHYCcHxTVWDcSCDfn0wvg8JJG5yqaeek80AVBhcv+Ct9rcyl7Mw/CoRfeSQlAhO88zfwdxyHaw/H1H0AMhPu1Nb1rDMN8eEVF8ehoeDVdKvYMx+k+uEUGw7M= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1731397083; c=relaxed/simple; bh=a9HsTh3oTSpqmpI5Nd9DSVkMqXwyIFrApC3W0w3zqYE=; h=From:To:Cc:Subject:Date:Message-ID:In-Reply-To:References: MIME-Version; b=XSg1IKnY+oDdTCngmjkEGclR+RQ0flnCtdqA6V5jMPn8gi/qh0FqHm1pgKkhpenZLGIX68PZZZ9GKe/i9ZklPLDTyonXrI7U+uH8huUVAtAVoRQXca/aY9VQorPhTKmkUIZ1Q6DJnbMdjHU0lJJ5Dw1pGXJGxTFg5wHA7OaddLA= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=intel.com; spf=pass smtp.mailfrom=intel.com; dkim=pass (2048-bit key) header.d=intel.com header.i=@intel.com header.b=c/Be51o8; arc=none smtp.client-ip=192.198.163.10 Authentication-Results: smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=intel.com Authentication-Results: smtp.subspace.kernel.org; spf=pass smtp.mailfrom=intel.com Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=intel.com header.i=@intel.com header.b="c/Be51o8" DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=intel.com; i=@intel.com; q=dns/txt; s=Intel; t=1731397081; x=1762933081; h=from:to:cc:subject:date:message-id:in-reply-to: references:mime-version:content-transfer-encoding; bh=a9HsTh3oTSpqmpI5Nd9DSVkMqXwyIFrApC3W0w3zqYE=; b=c/Be51o8glu8cFGja4SiYGj3nf44Ylossrn4rAFG0oW1x2ND5wXMwc8Y h/gU9Tw6zTkWTwkpELzhTPSLCvy/H/tolPhVcBzizW5CF2OPPgfHX6Znw mTllHTxUiMNnQB5OVXHcH00TEY/VxfUhTDI7r7ZS6iPfIvDnXGC2tNfEA qmX8oWqHry80wJoRx0D401kpWdhkKArhcYg1JuoAm29qPCOKoo0ZfgHSg Z37L/3f5/qFvLEb5ax9qlZKzPcmCN9ag9QpM37v3pK5R9A9gobiVNuQ3P uttnQxew898Mk7kICkFqUWgJ9q0fjDy6WsX+MCQKnKk7MmL2cf07P7Bwe A==; X-CSE-ConnectionGUID: 
oXtCJOrVRbewVeMTtzKu7A== X-CSE-MsgGUID: BxttSaPOTo2HVQoDFk/Vpg== X-IronPort-AV: E=McAfee;i="6700,10204,11253"; a="42598598" X-IronPort-AV: E=Sophos;i="6.12,147,1728975600"; d="scan'208";a="42598598" Received: from fmviesa002.fm.intel.com ([10.60.135.142]) by fmvoesa104.fm.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 11 Nov 2024 23:37:59 -0800 X-CSE-ConnectionGUID: VN6HTYJBTuaJc1qvHb1kPA== X-CSE-MsgGUID: 7W5iTpsARM+VC3OB8STNqg== X-ExtLoop1: 1 X-IronPort-AV: E=Sophos;i="6.12,147,1728975600"; d="scan'208";a="110595063" Received: from yzhao56-desk.sh.intel.com ([10.239.159.62]) by fmviesa002-auth.fm.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 11 Nov 2024 23:37:54 -0800 From: Yan Zhao To: pbonzini@redhat.com, seanjc@google.com, kvm@vger.kernel.org, dave.hansen@linux.intel.com Cc: rick.p.edgecombe@intel.com, kai.huang@intel.com, adrian.hunter@intel.com, reinette.chatre@intel.com, xiaoyao.li@intel.com, tony.lindgren@intel.com, binbin.wu@linux.intel.com, dmatlack@google.com, isaku.yamahata@intel.com, isaku.yamahata@gmail.com, nik.borisov@suse.com, linux-kernel@vger.kernel.org, x86@kernel.org Subject: [PATCH v2 04/24] KVM: VMX: Split out guts of EPT violation to common/exposed function Date: Tue, 12 Nov 2024 15:35:28 +0800 Message-ID: <20241112073528.22042-1-yan.y.zhao@intel.com> X-Mailer: git-send-email 2.43.2 In-Reply-To: <20241112073327.21979-1-yan.y.zhao@intel.com> References: <20241112073327.21979-1-yan.y.zhao@intel.com> Precedence: bulk X-Mailing-List: kvm@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 From: Sean Christopherson A TDX EPT violation differs from a VMX one in how the information, i.e. the GPA and the exit qualification, is retrieved. To share the EPT violation handling code, split out the guts of the EPT violation handler so that both the VMX and the TDX exit handlers can call it after retrieving the GPA and the exit qualification. 
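For illustration only, the split described above can be sketched as a standalone C program: one shared helper turns a GPA plus an exit qualification into a page fault error code, and two thin callers differ only in where they fetch those inputs. This is a sketch under stated assumptions, not the kernel code. The bit values below are what these masks are believed to be in the kernel headers and should be checked there; the sketch_* structs and functions are hypothetical. The sketch tests the GVA-valid bit in the exit qualification, which is where that bit lives.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed bit positions for this sketch; verify against the kernel headers. */
#define EPT_VIOLATION_ACC_READ        (1ull << 0)
#define EPT_VIOLATION_ACC_WRITE       (1ull << 1)
#define EPT_VIOLATION_ACC_INSTR       (1ull << 2)
#define EPT_VIOLATION_RWX_MASK        (7ull << 3)
#define EPT_VIOLATION_GVA_IS_VALID    (1ull << 7)
#define EPT_VIOLATION_GVA_TRANSLATED  (1ull << 8)

#define PFERR_PRESENT_MASK      (1ull << 0)
#define PFERR_WRITE_MASK        (1ull << 1)
#define PFERR_USER_MASK         (1ull << 2)
#define PFERR_FETCH_MASK        (1ull << 4)
#define PFERR_GUEST_FINAL_MASK  (1ull << 32)
#define PFERR_GUEST_PAGE_MASK   (1ull << 33)

/* What the shared helper would hand to the MMU page fault path. */
struct sketch_fault {
	uint64_t gpa;
	uint64_t error_code;
};

/* Shared part: translate the exit qualification into an error code. */
struct sketch_fault sketch_handle_ept_violation(uint64_t gpa,
						uint64_t exit_qualification)
{
	struct sketch_fault f = { .gpa = gpa, .error_code = 0 };

	if (exit_qualification & EPT_VIOLATION_ACC_READ)
		f.error_code |= PFERR_USER_MASK;
	if (exit_qualification & EPT_VIOLATION_ACC_WRITE)
		f.error_code |= PFERR_WRITE_MASK;
	if (exit_qualification & EPT_VIOLATION_ACC_INSTR)
		f.error_code |= PFERR_FETCH_MASK;
	/* Any of the R/W/X permission bits set means the entry was present. */
	if (exit_qualification & EPT_VIOLATION_RWX_MASK)
		f.error_code |= PFERR_PRESENT_MASK;

	if (exit_qualification & EPT_VIOLATION_GVA_IS_VALID)
		f.error_code |= (exit_qualification & EPT_VIOLATION_GVA_TRANSLATED) ?
				PFERR_GUEST_FINAL_MASK : PFERR_GUEST_PAGE_MASK;
	return f;
}

/* Hypothetical stand-ins for where each caller finds its inputs. */
struct sketch_vmcs {
	uint64_t guest_physical_address;
	uint64_t exit_qualification;
};

struct sketch_tdx_exit {
	uint64_t gpa;
	uint64_t exit_qualification;
};

/* VMX-style caller: inputs come from the (toy) VMCS. */
struct sketch_fault sketch_vmx_exit(const struct sketch_vmcs *vmcs)
{
	return sketch_handle_ept_violation(vmcs->guest_physical_address,
					   vmcs->exit_qualification);
}

/* TDX-style caller: inputs come from the (toy) TDX exit information. */
struct sketch_fault sketch_tdx_exit(const struct sketch_tdx_exit *exit)
{
	return sketch_handle_ept_violation(exit->gpa, exit->exit_qualification);
}
```

Both callers produce the same error code for the same exit qualification, which is the point of the split: only the retrieval step is caller-specific.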
Signed-off-by: Sean Christopherson Co-developed-by: Isaku Yamahata Signed-off-by: Isaku Yamahata Co-developed-by: Rick Edgecombe Signed-off-by: Rick Edgecombe Signed-off-by: Yan Zhao Reviewed-by: Paolo Bonzini Reviewed-by: Kai Huang Reviewed-by: Binbin Wu --- arch/x86/kvm/vmx/common.h | 34 ++++++++++++++++++++++++++++++++++ arch/x86/kvm/vmx/vmx.c | 25 +++---------------------- 2 files changed, 37 insertions(+), 22 deletions(-) create mode 100644 arch/x86/kvm/vmx/common.h diff --git a/arch/x86/kvm/vmx/common.h b/arch/x86/kvm/vmx/common.h new file mode 100644 index 000000000000..78ae39b6cdcd --- /dev/null +++ b/arch/x86/kvm/vmx/common.h @@ -0,0 +1,34 @@ +/* SPDX-License-Identifier: GPL-2.0-only */ +#ifndef __KVM_X86_VMX_COMMON_H +#define __KVM_X86_VMX_COMMON_H + +#include + +#include "mmu.h" + +static inline int __vmx_handle_ept_violation(struct kvm_vcpu *vcpu, gpa_t gpa, + unsigned long exit_qualification) +{ + u64 error_code; + + /* Is it a read fault? */ + error_code = (exit_qualification & EPT_VIOLATION_ACC_READ) + ? PFERR_USER_MASK : 0; + /* Is it a write fault? */ + error_code |= (exit_qualification & EPT_VIOLATION_ACC_WRITE) + ? PFERR_WRITE_MASK : 0; + /* Is it a fetch fault? */ + error_code |= (exit_qualification & EPT_VIOLATION_ACC_INSTR) + ? PFERR_FETCH_MASK : 0; + /* ept page table entry is present? */ + error_code |= (exit_qualification & EPT_VIOLATION_RWX_MASK) + ? PFERR_PRESENT_MASK : 0; + + if (error_code & EPT_VIOLATION_GVA_IS_VALID) + error_code |= (exit_qualification & EPT_VIOLATION_GVA_TRANSLATED) ? 
+ PFERR_GUEST_FINAL_MASK : PFERR_GUEST_PAGE_MASK; + + return kvm_mmu_page_fault(vcpu, gpa, error_code, NULL, 0); +} + +#endif /* __KVM_X86_VMX_COMMON_H */ diff --git a/arch/x86/kvm/vmx/vmx.c b/arch/x86/kvm/vmx/vmx.c index 976fe6579f62..f7ae2359cea2 100644 --- a/arch/x86/kvm/vmx/vmx.c +++ b/arch/x86/kvm/vmx/vmx.c @@ -53,6 +53,7 @@ #include #include "capabilities.h" +#include "common.h" #include "cpuid.h" #include "hyperv.h" #include "kvm_onhyperv.h" @@ -5774,11 +5775,8 @@ static int handle_task_switch(struct kvm_vcpu *vcpu) static int handle_ept_violation(struct kvm_vcpu *vcpu) { - unsigned long exit_qualification; + unsigned long exit_qualification = vmx_get_exit_qual(vcpu); gpa_t gpa; - u64 error_code; - - exit_qualification = vmx_get_exit_qual(vcpu); /* * EPT violation happened while executing iret from NMI, @@ -5794,23 +5792,6 @@ static int handle_ept_violation(struct kvm_vcpu *vcpu) gpa = vmcs_read64(GUEST_PHYSICAL_ADDRESS); trace_kvm_page_fault(vcpu, gpa, exit_qualification); - /* Is it a read fault? */ - error_code = (exit_qualification & EPT_VIOLATION_ACC_READ) - ? PFERR_USER_MASK : 0; - /* Is it a write fault? */ - error_code |= (exit_qualification & EPT_VIOLATION_ACC_WRITE) - ? PFERR_WRITE_MASK : 0; - /* Is it a fetch fault? */ - error_code |= (exit_qualification & EPT_VIOLATION_ACC_INSTR) - ? PFERR_FETCH_MASK : 0; - /* ept page table entry is present? */ - error_code |= (exit_qualification & EPT_VIOLATION_RWX_MASK) - ? PFERR_PRESENT_MASK : 0; - - if (error_code & EPT_VIOLATION_GVA_IS_VALID) - error_code |= (exit_qualification & EPT_VIOLATION_GVA_TRANSLATED) ? - PFERR_GUEST_FINAL_MASK : PFERR_GUEST_PAGE_MASK; - /* * Check that the GPA doesn't exceed physical memory limits, as that is * a guest page fault. 
We have to emulate the instruction here, because @@ -5822,7 +5803,7 @@ static int handle_ept_violation(struct kvm_vcpu *vcpu) if (unlikely(allow_smaller_maxphyaddr && !kvm_vcpu_is_legal_gpa(vcpu, gpa))) return kvm_emulate_instruction(vcpu, 0); - return kvm_mmu_page_fault(vcpu, gpa, error_code, NULL, 0); + return __vmx_handle_ept_violation(vcpu, gpa, exit_qualification); } static int handle_ept_misconfig(struct kvm_vcpu *vcpu) From patchwork Tue Nov 12 07:35:39 2024 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Yan Zhao X-Patchwork-Id: 13871803 Received: from mgamail.intel.com (mgamail.intel.com [192.198.163.10]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 16EF820B814; Tue, 12 Nov 2024 07:38:10 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=192.198.163.10 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1731397092; cv=none; b=Gg617zf61Tl0nuEcBWWeE/cCWsIBQE/YuIR7aoDJEaZjTxgMTBrAKz4M5MNUDbt4sAjei7pWOFFc2ww6YAnvtHCrhV8X2kA5MPYGynCX5Jzh5fzNn/6jc8uxHWdVOZkqbdNG3R/OYRKJNPXOpM1dck0/dmXXIzChxYNFUAP9Mw4= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1731397092; c=relaxed/simple; bh=wMpGoehg1J/B3n5jSNSrNlfUDzijvrljKuSJUktw21o=; h=From:To:Cc:Subject:Date:Message-ID:In-Reply-To:References: MIME-Version; b=q0/OKCYlucHosH8AvjR1vOpwtg/uVBjFfjGeZUcfTgIY5VRIbJoHjfvzR3rBSMNJ25mXB0yRbD9P/EY81YEbCHz7yVE87q6+S24q0VRGxoBe6Js22jkaPyBkjS448vmdgJMUusX+1oICD17G6lJ2B2ukb5SRYJ++pPkne+qG8Ms= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=intel.com; spf=pass smtp.mailfrom=intel.com; dkim=pass (2048-bit key) header.d=intel.com header.i=@intel.com header.b=WZMgCD/3; arc=none smtp.client-ip=192.198.163.10 Authentication-Results: smtp.subspace.kernel.org; dmarc=pass 
(p=none dis=none) header.from=intel.com Authentication-Results: smtp.subspace.kernel.org; spf=pass smtp.mailfrom=intel.com Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=intel.com header.i=@intel.com header.b="WZMgCD/3" DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=intel.com; i=@intel.com; q=dns/txt; s=Intel; t=1731397091; x=1762933091; h=from:to:cc:subject:date:message-id:in-reply-to: references:mime-version:content-transfer-encoding; bh=wMpGoehg1J/B3n5jSNSrNlfUDzijvrljKuSJUktw21o=; b=WZMgCD/3LDUgUBgjYEmlrHT+vkvTYThnmaWZ0an1F4tRnG0a2LHr8+bY mZBNESsNRuEKYelfeafQm628x9IKe7z37yrLpGrCNWkriMoBVoPXeTAPW Z82lYMl7OsrEmcyVZ9V3y0LlQFA5VbHrB08GNS9IVboucw0wwkzSgb52V 3+Dgi3CAV81Sow/DFyRFOfWQbVanH21fUFzrrfJyblXojywtPDQLPIkjs cK9kvzXZDDqDAEtJfNAm5Bc0YLd9GlE/cCxQGqoNRxguNr9NtcsXeLAVb ZhMSFmjuEiIAPT0FbvzTa5Kvdy665Rf+oFmddB2OncnZW1OoMPGZIuRQK w==; X-CSE-ConnectionGUID: r2z7AeLQQ0CcB7a+QIUw2Q== X-CSE-MsgGUID: OV1V894QTTSHzPfWqYXhFQ== X-IronPort-AV: E=McAfee;i="6700,10204,11253"; a="42598621" X-IronPort-AV: E=Sophos;i="6.12,147,1728975600"; d="scan'208";a="42598621" Received: from fmviesa002.fm.intel.com ([10.60.135.142]) by fmvoesa104.fm.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 11 Nov 2024 23:38:10 -0800 X-CSE-ConnectionGUID: ACOXBA0fQl+UndL2pYoT6Q== X-CSE-MsgGUID: cjJW8phnTEepnbzucvUpnQ== X-ExtLoop1: 1 X-IronPort-AV: E=Sophos;i="6.12,147,1728975600"; d="scan'208";a="110595076" Received: from yzhao56-desk.sh.intel.com ([10.239.159.62]) by fmviesa002-auth.fm.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 11 Nov 2024 23:38:06 -0800 From: Yan Zhao To: pbonzini@redhat.com, seanjc@google.com, kvm@vger.kernel.org, dave.hansen@linux.intel.com Cc: rick.p.edgecombe@intel.com, kai.huang@intel.com, adrian.hunter@intel.com, reinette.chatre@intel.com, xiaoyao.li@intel.com, tony.lindgren@intel.com, binbin.wu@linux.intel.com, dmatlack@google.com, isaku.yamahata@intel.com, isaku.yamahata@gmail.com, nik.borisov@suse.com, 
linux-kernel@vger.kernel.org, x86@kernel.org Subject: [PATCH v2 05/24] KVM: VMX: Teach EPT violation helper about private mem Date: Tue, 12 Nov 2024 15:35:39 +0800 Message-ID: <20241112073539.22056-1-yan.y.zhao@intel.com> X-Mailer: git-send-email 2.43.2 In-Reply-To: <20241112073327.21979-1-yan.y.zhao@intel.com> References: <20241112073327.21979-1-yan.y.zhao@intel.com> Precedence: bulk X-Mailing-List: kvm@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 From: Rick Edgecombe Teach the EPT violation helper to check the shared mask of a GPA to find out whether the GPA is for private memory. When an EPT violation is triggered by a TD accessing a private GPA, KVM will exit to user space if the corresponding GFN's attribute is not private. User space will then update the GFN's attribute during its memory conversion process. After that, the TD will re-access the private GPA and trigger an EPT violation again. Only when the GFN's attribute is private will KVM fault in the private page, map it in the mirrored TDP root, and propagate the changes to the private EPT to resolve the EPT violation. Relying on the GFN attribute tracking xarray to determine whether a GFN is private, as is done for KVM_X86_SW_PROTECTED_VM, may lead to endless EPT violations. Signed-off-by: Rick Edgecombe Co-developed-by: Yan Zhao Signed-off-by: Yan Zhao --- TDX MMU part 2 v2: - Rename kvm_is_private_gpa() to vt_is_tdx_private_gpa() (Paolo) TDX MMU part 2 v1: - Split from "KVM: TDX: handle ept violation/misconfig exit" --- arch/x86/kvm/vmx/common.h | 9 +++++++++ 1 file changed, 9 insertions(+) diff --git a/arch/x86/kvm/vmx/common.h b/arch/x86/kvm/vmx/common.h index 78ae39b6cdcd..7a592467a044 100644 --- a/arch/x86/kvm/vmx/common.h +++ b/arch/x86/kvm/vmx/common.h @@ -6,6 +6,12 @@ #include "mmu.h" +static inline bool vt_is_tdx_private_gpa(struct kvm *kvm, gpa_t gpa) +{ + /* For TDX the direct mask is the shared mask. 
*/ + return !kvm_is_addr_direct(kvm, gpa); +} + static inline int __vmx_handle_ept_violation(struct kvm_vcpu *vcpu, gpa_t gpa, unsigned long exit_qualification) { @@ -28,6 +34,9 @@ static inline int __vmx_handle_ept_violation(struct kvm_vcpu *vcpu, gpa_t gpa, error_code |= (exit_qualification & EPT_VIOLATION_GVA_TRANSLATED) ? PFERR_GUEST_FINAL_MASK : PFERR_GUEST_PAGE_MASK; + if (vt_is_tdx_private_gpa(vcpu->kvm, gpa)) + error_code |= PFERR_PRIVATE_ACCESS; + return kvm_mmu_page_fault(vcpu, gpa, error_code, NULL, 0); } From patchwork Tue Nov 12 07:35:50 2024 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Yan Zhao X-Patchwork-Id: 13871804 Received: from mgamail.intel.com (mgamail.intel.com [192.198.163.10]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 19C7F20A5FE; Tue, 12 Nov 2024 07:38:21 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=192.198.163.10 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1731397103; cv=none; b=Z3+UinAD5oq0Jqu/P01Spk/VVjf46GMkpilkArpA26VAIq8T4mTCMdjq8mJ3H4UE9vyC9oRlYHid8N37jC8l9bBI7Vd9GJH8XfUsxWPr6oXb3yPANybVp++u+lT1JjSZ6r/iUheTlBX8yNh7S4KfcN7yKvaxKcMoKPHsNIriOfk= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1731397103; c=relaxed/simple; bh=LJP5i6H0vrVARHyHyA5b39kA++Huw/2CwDa6XjsgVdQ=; h=From:To:Cc:Subject:Date:Message-ID:In-Reply-To:References: MIME-Version; b=uzVAx/7H5YO1iiOTJ+u40DRZp2Q6m/CawUH1Z5MZS/J+/XUGkgxAdYIgoX8OyF86xzl4IUAzL++OkcJXZ217lJ9bTYfgTGPYO/sx2+cjc+Z3Fc4yFIOpZl79r/f9/aruomFS2s+CovlcdKIkr2QkVCBytAfTh0tARqfzSb8PoDQ= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=intel.com; spf=pass smtp.mailfrom=intel.com; dkim=pass (2048-bit key) header.d=intel.com header.i=@intel.com header.b=DabZaDXy; arc=none 
smtp.client-ip=192.198.163.10 Authentication-Results: smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=intel.com Authentication-Results: smtp.subspace.kernel.org; spf=pass smtp.mailfrom=intel.com Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=intel.com header.i=@intel.com header.b="DabZaDXy" DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=intel.com; i=@intel.com; q=dns/txt; s=Intel; t=1731397102; x=1762933102; h=from:to:cc:subject:date:message-id:in-reply-to: references:mime-version:content-transfer-encoding; bh=LJP5i6H0vrVARHyHyA5b39kA++Huw/2CwDa6XjsgVdQ=; b=DabZaDXyFkV7CC1mvIWhfdDo/zVh2wBkDctNcZVww8SUzlRVFNZe8WsF UYoPtOQJibX+J+9iSXDqNHRFQh+d7AUZ7wbXgUmo1N7YkWZLa7gJUO0Ew pSJqP97ci/jhnvsDeihgx3m3EIy/8RLQRM/6ElpYB1UaNkDLQCMZRWnTx Dg0HvCbri63XOypgTtTwAd1nY8+zuV+dEj5SNiZm7+NvvcGZFskQ5V2xS 3LmRt7f032E4q6GCXZjlep619GorypKO4x+9S9PEWPrInD131r13z3Qqh tkrz5Yf0zlpwFUJQNcXSyZ2dRTZNhjumyBIu+3fiOsy72GPV1je4FyVDG Q==; X-CSE-ConnectionGUID: jF4UArfmS7S1geYmbjew1Q== X-CSE-MsgGUID: dc/yFJ8rR5e2y/8owxcfKw== X-IronPort-AV: E=McAfee;i="6700,10204,11253"; a="42598658" X-IronPort-AV: E=Sophos;i="6.12,147,1728975600"; d="scan'208";a="42598658" Received: from fmviesa002.fm.intel.com ([10.60.135.142]) by fmvoesa104.fm.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 11 Nov 2024 23:38:22 -0800 X-CSE-ConnectionGUID: B4UqiyKrQc65YKcJV4DvLA== X-CSE-MsgGUID: DZR3eZcrQ8Ofxujm9Aj9TQ== X-ExtLoop1: 1 X-IronPort-AV: E=Sophos;i="6.12,147,1728975600"; d="scan'208";a="110595087" Received: from yzhao56-desk.sh.intel.com ([10.239.159.62]) by fmviesa002-auth.fm.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 11 Nov 2024 23:38:17 -0800 From: Yan Zhao To: pbonzini@redhat.com, seanjc@google.com, kvm@vger.kernel.org, dave.hansen@linux.intel.com Cc: rick.p.edgecombe@intel.com, kai.huang@intel.com, adrian.hunter@intel.com, reinette.chatre@intel.com, xiaoyao.li@intel.com, tony.lindgren@intel.com, binbin.wu@linux.intel.com, 
dmatlack@google.com, isaku.yamahata@intel.com, isaku.yamahata@gmail.com, nik.borisov@suse.com, linux-kernel@vger.kernel.org, x86@kernel.org Subject: [PATCH v2 06/24] KVM: TDX: Add accessors VMX VMCS helpers Date: Tue, 12 Nov 2024 15:35:50 +0800 Message-ID: <20241112073551.22070-1-yan.y.zhao@intel.com> X-Mailer: git-send-email 2.43.2 In-Reply-To: <20241112073327.21979-1-yan.y.zhao@intel.com> References: <20241112073327.21979-1-yan.y.zhao@intel.com> Precedence: bulk X-Mailing-List: kvm@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 From: Isaku Yamahata TDX defines SEAMCALL APIs to access TDX control structures corresponding to the VMX VMCS. Introduce helper accessors to hide its SEAMCALL ABI details. Signed-off-by: Isaku Yamahata Co-developed-by: Rick Edgecombe Signed-off-by: Rick Edgecombe Co-developed-by: Yan Zhao Signed-off-by: Yan Zhao --- TDX MMU part 2 v2: -Move inline warning msgs to tdh_vp_rd/wr_failed() (Paolo) TDX MMU part 2 v1: - Update for the wrapper functions for SEAMCALLs. (Sean) - Eliminate kvm_mmu_free_private_spt() and open code it. - Fix bisectability issues in headers (Kai) - Updates from seamcall overhaul (Kai) v19: - deleted unnecessary stub functions, tdvps_state_non_arch_check() and tdvps_management_check(). 
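Note for reviewers (illustrative only, not part of the patch): the accessors added here reject mismatched VMCS field accesses at build time. The user-space sketch below restates that rule as a run-time check, assuming the usual VMCS encoding layout (bit 0 is the access type, bits 14:13 the width). The names vmcs_field_bits() and vmcs_access_ok() are hypothetical and do not exist in the kernel.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* VMCS field encoding: bit 0 is the access type, bits 14:13 the width. */
#define ENC_ACCESS_TYPE_HIGH(field)	(((field) & 0x1u) != 0)
#define ENC_WIDTH(field)		(((field) >> 13) & 0x3u)

/* Hypothetical helper: access size in bits that a field encoding implies. */
static unsigned int vmcs_field_bits(uint32_t field)
{
	switch (ENC_WIDTH(field)) {
	case 0:
		return 16;
	case 2:
		return 32;
	default:
		/* 1 = 64-bit, 3 = natural width, which is 64-bit for TDX. */
		return 64;
	}
}

/* Hypothetical run-time mirror of the build-time check in the patch. */
static bool vmcs_access_ok(uint32_t field, unsigned int bits)
{
	assert(bits == 16 || bits == 32 || bits == 64);

	/* TDX is 64-bit only, so *_HIGH encodings are never valid. */
	if (ENC_ACCESS_TYPE_HIGH(field))
		return false;

	return vmcs_field_bits(field) == bits;
}
```

Under this model a 64-bit access to encoding 0x203C (SHARED_EPT_POINTER in this series) passes, while any access to a *_HIGH encoding such as 0x2033 is rejected.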
--- arch/x86/kvm/vmx/tdx.c | 13 +++++++ arch/x86/kvm/vmx/tdx.h | 88 ++++++++++++++++++++++++++++++++++++++++++ 2 files changed, 101 insertions(+) diff --git a/arch/x86/kvm/vmx/tdx.c b/arch/x86/kvm/vmx/tdx.c index 7fb32d3b1aae..ed4473d0c2cd 100644 --- a/arch/x86/kvm/vmx/tdx.c +++ b/arch/x86/kvm/vmx/tdx.c @@ -32,6 +32,19 @@ static enum cpuhp_state tdx_cpuhp_state; static const struct tdx_sys_info *tdx_sysinfo; +void tdh_vp_rd_failed(struct vcpu_tdx *tdx, char *uclass, u32 field, u64 err) +{ + KVM_BUG_ON(1, tdx->vcpu.kvm); + pr_err("TDH_VP_RD[%s.0x%x] failed 0x%llx\n", uclass, field, err); +} + +void tdh_vp_wr_failed(struct vcpu_tdx *tdx, char *uclass, char *op, u32 field, + u64 val, u64 err) +{ + KVM_BUG_ON(1, tdx->vcpu.kvm); + pr_err("TDH_VP_WR[%s.0x%x]%s0x%llx failed: 0x%llx\n", uclass, field, op, val, err); +} + #define KVM_SUPPORTED_TD_ATTRS (TDX_TD_ATTR_SEPT_VE_DISABLE) static u64 tdx_get_supported_attrs(const struct tdx_sys_info_td_conf *td_conf) diff --git a/arch/x86/kvm/vmx/tdx.h b/arch/x86/kvm/vmx/tdx.h index 1b78a7ea988e..727bcf25d731 100644 --- a/arch/x86/kvm/vmx/tdx.h +++ b/arch/x86/kvm/vmx/tdx.h @@ -49,6 +49,10 @@ struct vcpu_tdx { enum vcpu_tdx_state state; }; +void tdh_vp_rd_failed(struct vcpu_tdx *tdx, char *uclass, u32 field, u64 err); +void tdh_vp_wr_failed(struct vcpu_tdx *tdx, char *uclass, char *op, u32 field, + u64 val, u64 err); + static inline bool is_td(struct kvm *kvm) { return kvm->arch.vm_type == KVM_X86_TDX_VM; @@ -80,6 +84,90 @@ static __always_inline u64 td_tdcs_exec_read64(struct kvm_tdx *kvm_tdx, u32 fiel } return data; } + +static __always_inline void tdvps_vmcs_check(u32 field, u8 bits) +{ +#define VMCS_ENC_ACCESS_TYPE_MASK 0x1UL +#define VMCS_ENC_ACCESS_TYPE_FULL 0x0UL +#define VMCS_ENC_ACCESS_TYPE_HIGH 0x1UL +#define VMCS_ENC_ACCESS_TYPE(field) ((field) & VMCS_ENC_ACCESS_TYPE_MASK) + + /* TDX is 64bit only. HIGH field isn't supported. 
*/ + BUILD_BUG_ON_MSG(__builtin_constant_p(field) && + VMCS_ENC_ACCESS_TYPE(field) == VMCS_ENC_ACCESS_TYPE_HIGH, + "Read/Write to TD VMCS *_HIGH fields not supported"); + + BUILD_BUG_ON(bits != 16 && bits != 32 && bits != 64); + +#define VMCS_ENC_WIDTH_MASK GENMASK(14, 13) +#define VMCS_ENC_WIDTH_16BIT (0UL << 13) +#define VMCS_ENC_WIDTH_64BIT (1UL << 13) +#define VMCS_ENC_WIDTH_32BIT (2UL << 13) +#define VMCS_ENC_WIDTH_NATURAL (3UL << 13) +#define VMCS_ENC_WIDTH(field) ((field) & VMCS_ENC_WIDTH_MASK) + + /* TDX is 64bit only. i.e. natural width = 64bit. */ + BUILD_BUG_ON_MSG(bits != 64 && __builtin_constant_p(field) && + (VMCS_ENC_WIDTH(field) == VMCS_ENC_WIDTH_64BIT || + VMCS_ENC_WIDTH(field) == VMCS_ENC_WIDTH_NATURAL), + "Invalid TD VMCS access for 64-bit field"); + BUILD_BUG_ON_MSG(bits != 32 && __builtin_constant_p(field) && + VMCS_ENC_WIDTH(field) == VMCS_ENC_WIDTH_32BIT, + "Invalid TD VMCS access for 32-bit field"); + BUILD_BUG_ON_MSG(bits != 16 && __builtin_constant_p(field) && + VMCS_ENC_WIDTH(field) == VMCS_ENC_WIDTH_16BIT, + "Invalid TD VMCS access for 16-bit field"); +} + +#define TDX_BUILD_TDVPS_ACCESSORS(bits, uclass, lclass) \ +static __always_inline u##bits td_##lclass##_read##bits(struct vcpu_tdx *tdx, \ + u32 field) \ +{ \ + u64 err, data; \ + \ + tdvps_##lclass##_check(field, bits); \ + err = tdh_vp_rd(tdx->tdvpr_pa, TDVPS_##uclass(field), &data); \ + if (unlikely(err)) { \ + tdh_vp_rd_failed(tdx, #uclass, field, err); \ + return 0; \ + } \ + return (u##bits)data; \ +} \ +static __always_inline void td_##lclass##_write##bits(struct vcpu_tdx *tdx, \ + u32 field, u##bits val) \ +{ \ + u64 err; \ + \ + tdvps_##lclass##_check(field, bits); \ + err = tdh_vp_wr(tdx->tdvpr_pa, TDVPS_##uclass(field), val, \ + GENMASK_ULL(bits - 1, 0)); \ + if (unlikely(err)) \ + tdh_vp_wr_failed(tdx, #uclass, " = ", field, (u64)val, err); \ +} \ +static __always_inline void td_##lclass##_setbit##bits(struct vcpu_tdx *tdx, \ + u32 field, u64 bit) \ +{ \ + u64 err; \ + \ + 
tdvps_##lclass##_check(field, bits); \ + err = tdh_vp_wr(tdx->tdvpr_pa, TDVPS_##uclass(field), bit, bit); \ + if (unlikely(err)) \ + tdh_vp_wr_failed(tdx, #uclass, " |= ", field, bit, err); \ +} \ +static __always_inline void td_##lclass##_clearbit##bits(struct vcpu_tdx *tdx, \ + u32 field, u64 bit) \ +{ \ + u64 err; \ + \ + tdvps_##lclass##_check(field, bits); \ + err = tdh_vp_wr(tdx->tdvpr_pa, TDVPS_##uclass(field), 0, bit); \ + if (unlikely(err)) \ + tdh_vp_wr_failed(tdx, #uclass, " &= ~", field, bit, err);\ +} + +TDX_BUILD_TDVPS_ACCESSORS(16, VMCS, vmcs); +TDX_BUILD_TDVPS_ACCESSORS(32, VMCS, vmcs); +TDX_BUILD_TDVPS_ACCESSORS(64, VMCS, vmcs); #else static inline void tdx_bringup(void) {} static inline void tdx_cleanup(void) {} From patchwork Tue Nov 12 07:36:01 2024 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Yan Zhao X-Patchwork-Id: 13871805 Received: from mgamail.intel.com (mgamail.intel.com [198.175.65.18]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 736241C1AD1; Tue, 12 Nov 2024 07:38:33 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=198.175.65.18 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1731397115; cv=none; b=Bxsm2l8kxOYDku8bm1X85eqccDFN+QMMf1U2f14exH6QzVMfyjq8w8DkZC3Yl5kCMt+HTuPmRiPpdlPRGBTXjAecYl1iIOPZZKCldqw11LuBs+Un7i+OJ1+BpZSQLjPM6bm1wDVaPhL6Jn+P3lLmEyrxarw3aFPhJDewmusD2Pg= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1731397115; c=relaxed/simple; bh=sN9nijE9CbITTXnFl8o4aImSrCwVmYnOzQ3JNw75nO0=; h=From:To:Cc:Subject:Date:Message-ID:In-Reply-To:References: MIME-Version; b=u08xkqlIyJPT/B5qXW2/0eeN62uG9j0bO+mcDUM55/4O7/7jrUgU0i+j/4m8NcTuYfP9TETt4VciTlUWWkYo5z1t8P+Ig0g5kn+ALB6MH+8JTDfUgmULXZsFaC0ag5d2jpwfrImk/nSRPRlW20ouf7di1cW9HwJ/16Gb+doYrps= 
ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=intel.com; spf=pass smtp.mailfrom=intel.com; dkim=pass (2048-bit key) header.d=intel.com header.i=@intel.com header.b=mHM5Lsao; arc=none smtp.client-ip=198.175.65.18 Authentication-Results: smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=intel.com Authentication-Results: smtp.subspace.kernel.org; spf=pass smtp.mailfrom=intel.com Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=intel.com header.i=@intel.com header.b="mHM5Lsao" DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=intel.com; i=@intel.com; q=dns/txt; s=Intel; t=1731397113; x=1762933113; h=from:to:cc:subject:date:message-id:in-reply-to: references:mime-version:content-transfer-encoding; bh=sN9nijE9CbITTXnFl8o4aImSrCwVmYnOzQ3JNw75nO0=; b=mHM5Lsaogo2egUOEtqZHUGQMXbV5slEF6r4m4tT7h7oJud+ioC14RLyR IzpyulcfRqyJv+8OB82reuN5cZHIfEifnaWNoGTPcIEpjWVMWSZrUNT9C tQRB8wCnp3/q/d9RvQUDmNZVCF3zl6YJyDBcmL9bCvD+NZVOZNoS99FJ/ xdEZ3/q2Juxtt7hSAppYwKoJXJ6uQbc7noMaXojUTnEDdfZAmUCkR1x6F leitfpsGenpMk37XTtbAkc1svLGaTutEc5YRn73uP+Dx0DQFdBa7x4PUa wv0EhDcvFpyVUBSkH/zKf0znUthahCPOiIEln8d5n/GYwspvLWppiN5iw A==; X-CSE-ConnectionGUID: tUwHEn1NQ+O9//LFyUbpSQ== X-CSE-MsgGUID: 4kdYjOz4Q72UoSlcRZsNmg== X-IronPort-AV: E=McAfee;i="6700,10204,11222"; a="31389325" X-IronPort-AV: E=Sophos;i="6.11,199,1725346800"; d="scan'208";a="31389325" Received: from fmviesa009.fm.intel.com ([10.60.135.149]) by orvoesa110.jf.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 11 Nov 2024 23:38:33 -0800 X-CSE-ConnectionGUID: v947A+FzS7yrcfp0f4V+Fw== X-CSE-MsgGUID: APiAnDlnTGCyF4dRgxMmAQ== X-ExtLoop1: 1 X-IronPort-AV: E=Sophos;i="6.12,147,1728975600"; d="scan'208";a="87736007" Received: from yzhao56-desk.sh.intel.com ([10.239.159.62]) by fmviesa009-auth.fm.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 11 Nov 2024 23:38:28 -0800 From: Yan Zhao To: pbonzini@redhat.com, seanjc@google.com, 
kvm@vger.kernel.org, dave.hansen@linux.intel.com Cc: rick.p.edgecombe@intel.com, kai.huang@intel.com, adrian.hunter@intel.com, reinette.chatre@intel.com, xiaoyao.li@intel.com, tony.lindgren@intel.com, binbin.wu@linux.intel.com, dmatlack@google.com, isaku.yamahata@intel.com, isaku.yamahata@gmail.com, nik.borisov@suse.com, linux-kernel@vger.kernel.org, x86@kernel.org Subject: [PATCH v2 07/24] KVM: TDX: Add load_mmu_pgd method for TDX Date: Tue, 12 Nov 2024 15:36:01 +0800 Message-ID: <20241112073601.22084-1-yan.y.zhao@intel.com> X-Mailer: git-send-email 2.43.2 In-Reply-To: <20241112073327.21979-1-yan.y.zhao@intel.com> References: <20241112073327.21979-1-yan.y.zhao@intel.com> Precedence: bulk X-Mailing-List: kvm@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 From: Sean Christopherson TDX uses two EPT pointers, one for the private half of the GPA space and one for the shared half. The private half uses the normal EPT_POINTER vmcs field, which is managed in a special way by the TDX module. For TDX, KVM is not allowed to operate on it directly. The shared half uses a new SHARED_EPT_POINTER field and will be managed by the conventional MMU management operations that operate directly on the EPT root. This means for TDX the .load_mmu_pgd() operation will need to know to use the SHARED_EPT_POINTER field instead of the normal one. Add a new wrapper in x86 ops for load_mmu_pgd() that either directs the write to the existing vmx implementation or a TDX one. tdx_load_mmu_pgd() is so much simpler than vmx_load_mmu_pgd() since for the TDX mode of operation, EPT will always be used and KVM does not need to be involved in virtualization of CR3 behavior. So tdx_load_mmu_pgd() can simply write to SHARED_EPT_POINTER. 
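For illustration only (not part of the patch): a rough user-space model of the TDX path, including the level check listed in the v2 notes. It assumes the shared bit is GPA bit 51 for a 5-level EPT and GPA bit 47 for a 4-level EPT, as the macros in this patch define. struct model_vcpu, shared_gfn_bit() and model_load_mmu_pgd() are hypothetical names; the real code writes the field through a SEAMCALL-backed accessor rather than a struct member.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define PAGE_SHIFT 12

/* Hypothetical stand-in for the state the real code reaches via SEAMCALL. */
struct model_vcpu {
	uint64_t gfn_direct_bits;	/* shared bit, expressed as a GFN bit */
	uint64_t shared_ept_pointer;	/* models the SHARED_EPT_POINTER field */
};

/* Shared bit as a GFN bit: GPA bit 51 (5-level) or GPA bit 47 (4-level). */
static uint64_t shared_gfn_bit(int pgd_level)
{
	assert(pgd_level == 4 || pgd_level == 5);
	return (1ULL << (pgd_level == 5 ? 51 : 47)) >> PAGE_SHIFT;
}

/*
 * Refuse to load a root whose level disagrees with the configured shared
 * bit; otherwise record the root, standing in for the VMCS write.
 */
static bool model_load_mmu_pgd(struct model_vcpu *vcpu, uint64_t root_hpa,
			       int pgd_level)
{
	if (shared_gfn_bit(pgd_level) != vcpu->gfn_direct_bits)
		return false;

	vcpu->shared_ept_pointer = root_hpa;
	return true;
}
```

The model returns a bool so a mismatch is observable; the patch instead marks the VM as bugged and returns without writing the field.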
Signed-off-by: Sean Christopherson Co-developed-by: Isaku Yamahata Signed-off-by: Isaku Yamahata Co-developed-by: Rick Edgecombe Signed-off-by: Rick Edgecombe Co-developed-by: Yan Zhao Signed-off-by: Yan Zhao Reviewed-by: Paolo Bonzini --- TDX MMU part 2 v2: -Check shared EPT level matches to direct bits mask in tdx_load_mmu_pgd() (Chao Gao) TDX MMU part 2 v1: - update the commit msg with the version rephrased by Rick. https://lore.kernel.org/all/78b1024ec3f5868e228baf797c6be98c5397bd49.camel@intel.com/ v19: - Add WARN_ON_ONCE() to tdx_load_mmu_pgd() and drop unconditional mask --- arch/x86/include/asm/vmx.h | 1 + arch/x86/kvm/vmx/main.c | 13 ++++++++++++- arch/x86/kvm/vmx/tdx.c | 15 +++++++++++++++ arch/x86/kvm/vmx/x86_ops.h | 4 ++++ 4 files changed, 32 insertions(+), 1 deletion(-) diff --git a/arch/x86/include/asm/vmx.h b/arch/x86/include/asm/vmx.h index f7fd4369b821..9298fb9d4bb3 100644 --- a/arch/x86/include/asm/vmx.h +++ b/arch/x86/include/asm/vmx.h @@ -256,6 +256,7 @@ enum vmcs_field { TSC_MULTIPLIER_HIGH = 0x00002033, TERTIARY_VM_EXEC_CONTROL = 0x00002034, TERTIARY_VM_EXEC_CONTROL_HIGH = 0x00002035, + SHARED_EPT_POINTER = 0x0000203C, PID_POINTER_TABLE = 0x00002042, PID_POINTER_TABLE_HIGH = 0x00002043, GUEST_PHYSICAL_ADDRESS = 0x00002400, diff --git a/arch/x86/kvm/vmx/main.c b/arch/x86/kvm/vmx/main.c index d28ffddd766f..3c292b4a063a 100644 --- a/arch/x86/kvm/vmx/main.c +++ b/arch/x86/kvm/vmx/main.c @@ -98,6 +98,17 @@ static void vt_vcpu_reset(struct kvm_vcpu *vcpu, bool init_event) vmx_vcpu_reset(vcpu, init_event); } +static void vt_load_mmu_pgd(struct kvm_vcpu *vcpu, hpa_t root_hpa, + int pgd_level) +{ + if (is_td_vcpu(vcpu)) { + tdx_load_mmu_pgd(vcpu, root_hpa, pgd_level); + return; + } + + vmx_load_mmu_pgd(vcpu, root_hpa, pgd_level); +} + static int vt_mem_enc_ioctl(struct kvm *kvm, void __user *argp) { if (!is_td(kvm)) @@ -229,7 +240,7 @@ struct kvm_x86_ops vt_x86_ops __initdata = { .write_tsc_offset = vmx_write_tsc_offset, .write_tsc_multiplier = 
vmx_write_tsc_multiplier, - .load_mmu_pgd = vmx_load_mmu_pgd, + .load_mmu_pgd = vt_load_mmu_pgd, .check_intercept = vmx_check_intercept, .handle_exit_irqoff = vmx_handle_exit_irqoff, diff --git a/arch/x86/kvm/vmx/tdx.c b/arch/x86/kvm/vmx/tdx.c index ed4473d0c2cd..785ee9f95504 100644 --- a/arch/x86/kvm/vmx/tdx.c +++ b/arch/x86/kvm/vmx/tdx.c @@ -28,6 +28,9 @@ bool enable_tdx __ro_after_init; module_param_named(tdx, enable_tdx, bool, 0444); +#define TDX_SHARED_BIT_PWL_5 gpa_to_gfn(BIT_ULL(51)) +#define TDX_SHARED_BIT_PWL_4 gpa_to_gfn(BIT_ULL(47)) + static enum cpuhp_state tdx_cpuhp_state; static const struct tdx_sys_info *tdx_sysinfo; @@ -495,6 +498,18 @@ void tdx_vcpu_free(struct kvm_vcpu *vcpu) tdx->state = VCPU_TD_STATE_UNINITIALIZED; } + +void tdx_load_mmu_pgd(struct kvm_vcpu *vcpu, hpa_t root_hpa, int pgd_level) +{ + u64 shared_bit = (pgd_level == 5) ? TDX_SHARED_BIT_PWL_5 : + TDX_SHARED_BIT_PWL_4; + + if (KVM_BUG_ON(shared_bit != kvm_gfn_direct_bits(vcpu->kvm), vcpu->kvm)) + return; + + td_vmcs_write64(to_tdx(vcpu), SHARED_EPT_POINTER, root_hpa); +} + static int tdx_get_capabilities(struct kvm_tdx_cmd *cmd) { const struct tdx_sys_info_td_conf *td_conf = &tdx_sysinfo->td_conf; diff --git a/arch/x86/kvm/vmx/x86_ops.h b/arch/x86/kvm/vmx/x86_ops.h index 4739891858ea..f49135094c94 100644 --- a/arch/x86/kvm/vmx/x86_ops.h +++ b/arch/x86/kvm/vmx/x86_ops.h @@ -129,6 +129,8 @@ int tdx_vcpu_create(struct kvm_vcpu *vcpu); void tdx_vcpu_free(struct kvm_vcpu *vcpu); int tdx_vcpu_ioctl(struct kvm_vcpu *vcpu, void __user *argp); + +void tdx_load_mmu_pgd(struct kvm_vcpu *vcpu, hpa_t root_hpa, int root_level); #else static inline int tdx_vm_init(struct kvm *kvm) { return -EOPNOTSUPP; } static inline void tdx_mmu_release_hkid(struct kvm *kvm) {} @@ -140,6 +142,8 @@ static inline int tdx_vcpu_create(struct kvm_vcpu *vcpu) { return -EOPNOTSUPP; } static inline void tdx_vcpu_free(struct kvm_vcpu *vcpu) {} static inline int tdx_vcpu_ioctl(struct kvm_vcpu *vcpu, void __user *argp) { 
return -EOPNOTSUPP; } + +static inline void tdx_load_mmu_pgd(struct kvm_vcpu *vcpu, hpa_t root_hpa, int root_level) {} #endif #endif /* __KVM_X86_VMX_X86_OPS_H */ From patchwork Tue Nov 12 07:36:13 2024 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Yan Zhao X-Patchwork-Id: 13871808 Received: from mgamail.intel.com (mgamail.intel.com [198.175.65.18]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 0988820C03B; Tue, 12 Nov 2024 07:38:44 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=198.175.65.18 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1731397126; cv=none; b=UuK1fQdsilOwWpaT386qyYkh43UGfhTKY81PFOQ0mxmpb1Vbo2918okTYi7zKfSi39jmvqoiAsPF58F+o5Oi4dzx766J68YeGHknUDjuSmX0BeMjPaYwBxulBJUmN4emp9nWiaxkxvsvLkOiYiDsgi1Xeip6Q46KELACcdOQaXM= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1731397126; c=relaxed/simple; bh=hpL8qRg9b5uNCj3fBmLnd4DdrGDSB5YFOl7ygYxVAmQ=; h=From:To:Cc:Subject:Date:Message-ID:In-Reply-To:References: MIME-Version; b=cCGTZ9/9GTbT8kO2kI9FZaL2Z/nvu83Ec5F374MGv+emDLxhwUlSyy9sQ7s38wuzVO9bPnsUnSPHcEbVMC+uPaTRAoB7UsmzrJUt+AOL3fOlYLwSKMvHgZ4FXuTfFQmYGN2E8rleDSrdiD2mj2u0J8lrw79buZiEfCbXen9sEyI= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=intel.com; spf=pass smtp.mailfrom=intel.com; dkim=pass (2048-bit key) header.d=intel.com header.i=@intel.com header.b=aFuTflUe; arc=none smtp.client-ip=198.175.65.18 Authentication-Results: smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=intel.com Authentication-Results: smtp.subspace.kernel.org; spf=pass smtp.mailfrom=intel.com Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=intel.com header.i=@intel.com header.b="aFuTflUe" DKIM-Signature: 
v=1; a=rsa-sha256; c=relaxed/simple; d=intel.com; i=@intel.com; q=dns/txt; s=Intel; t=1731397125; x=1762933125; h=from:to:cc:subject:date:message-id:in-reply-to: references:mime-version:content-transfer-encoding; bh=hpL8qRg9b5uNCj3fBmLnd4DdrGDSB5YFOl7ygYxVAmQ=; b=aFuTflUeEQpgyHYt3BOHsAtCROEmMANLkTxhA7kj2sYbhYz4inqOdSlX XqJEzHGL6ac0rjmOlHYHDM7ddt0xh+rDwjZqBnIptOCtnG36mk5IZLX13 kV8WhPRgYKyS+R0BsffdLa0RsNh6nFpFVNgilsNt5JBLGXXutHTD/p7W4 TnSKMzRfsGpL6DgvdMdvVWCQ/cLv15LGQqzYCRpp0w8MUyE+5GDCpW6Hn JuvATrFw+EFOAsdRLJnw5OxswxvWqOy/tUoz2DFgQXVkIE2/a12E/5UeW DOKdORXdo3qB9MhCTOwp/e2W0OIkGzMaY54qMHdY4DXzK0NBmmJxoPVMI g==; X-CSE-ConnectionGUID: c2nd6TvbQa+/sTwlC07bTQ== X-CSE-MsgGUID: 8/KQMbcfS0mPI9dea2mgug== X-IronPort-AV: E=McAfee;i="6700,10204,11222"; a="31389345" X-IronPort-AV: E=Sophos;i="6.11,199,1725346800"; d="scan'208";a="31389345" Received: from fmviesa009.fm.intel.com ([10.60.135.149]) by orvoesa110.jf.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 11 Nov 2024 23:38:44 -0800 X-CSE-ConnectionGUID: 4cd0IoREQ7KLeIrtfNa9jA== X-CSE-MsgGUID: 1JFz42ZjS4OSeolUaLGb8A== X-ExtLoop1: 1 X-IronPort-AV: E=Sophos;i="6.12,147,1728975600"; d="scan'208";a="87736272" Received: from yzhao56-desk.sh.intel.com ([10.239.159.62]) by fmviesa009-auth.fm.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 11 Nov 2024 23:38:40 -0800 From: Yan Zhao To: pbonzini@redhat.com, seanjc@google.com, kvm@vger.kernel.org, dave.hansen@linux.intel.com Cc: rick.p.edgecombe@intel.com, kai.huang@intel.com, adrian.hunter@intel.com, reinette.chatre@intel.com, xiaoyao.li@intel.com, tony.lindgren@intel.com, binbin.wu@linux.intel.com, dmatlack@google.com, isaku.yamahata@intel.com, isaku.yamahata@gmail.com, nik.borisov@suse.com, linux-kernel@vger.kernel.org, x86@kernel.org Subject: [PATCH v2 08/24] KVM: TDX: Set gfn_direct_bits to shared bit Date: Tue, 12 Nov 2024 15:36:13 +0800 Message-ID: <20241112073613.22100-1-yan.y.zhao@intel.com> X-Mailer: git-send-email 2.43.2 In-Reply-To: 
<20241112073327.21979-1-yan.y.zhao@intel.com> References: <20241112073327.21979-1-yan.y.zhao@intel.com> Precedence: bulk X-Mailing-List: kvm@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 From: Isaku Yamahata Make the direct root handle memslot GFNs at an alias with the TDX shared bit set. For TDX shared memory, the memslot GFNs need to be mapped at an alias with the shared bit set. These shared mappings will be mapped on the KVM MMU's "direct" root. The direct root has its mappings shifted by applying "gfn_direct_bits" as a mask. The concept of "GPAW" (guest physical address width) determines the location of the shared bit. So set gfn_direct_bits based on this, to map shared memory at the proper GPA. Signed-off-by: Isaku Yamahata Co-developed-by: Rick Edgecombe Signed-off-by: Rick Edgecombe Co-developed-by: Yan Zhao Signed-off-by: Yan Zhao Reviewed-by: Paolo Bonzini --- TDX MMU part 2 v2: - Added Paolo's rb - Use TDX 1.5 naming of config_flags instead of exec_controls (Xiaoyao) - Use macro TDX_SHARED_BIT_PWL_5 and TDX_SHARED_BIT_PWL_4 for gfn_direct_bits. (Yan) TDX MMU part 2 v1: - Move setting of gfn_direct_bits to separate patch (Yan) --- arch/x86/kvm/vmx/tdx.c | 5 +++++ 1 file changed, 5 insertions(+) diff --git a/arch/x86/kvm/vmx/tdx.c b/arch/x86/kvm/vmx/tdx.c index 785ee9f95504..38369cafc175 100644 --- a/arch/x86/kvm/vmx/tdx.c +++ b/arch/x86/kvm/vmx/tdx.c @@ -1041,6 +1041,11 @@ static int tdx_td_init(struct kvm *kvm, struct kvm_tdx_cmd *cmd) kvm_tdx->attributes = td_params->attributes; kvm_tdx->xfam = td_params->xfam; + if (td_params->config_flags & TDX_CONFIG_FLAGS_MAX_GPAW) + kvm->arch.gfn_direct_bits = TDX_SHARED_BIT_PWL_5; + else + kvm->arch.gfn_direct_bits = TDX_SHARED_BIT_PWL_4; + kvm_tdx->state = TD_STATE_INITIALIZED; out: /* kfree() accepts NULL.
*/ From patchwork Tue Nov 12 07:36:24 2024 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Yan Zhao X-Patchwork-Id: 13871809 Received: from mgamail.intel.com (mgamail.intel.com [198.175.65.18]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 2DD1E20ADDB; Tue, 12 Nov 2024 07:38:55 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=198.175.65.18 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1731397138; cv=none; b=GtBRCw83PHmXHUqF/Jy3PL8l8N2JhK5OikEb1E3DN4ZBolESFxqwg1bv0QYINH5tALx/kkLwGh5fH5bkoTJWoS7U6pldkv14seAYL9tLNVwExokzd0dPZ88bF8iNhwO6gbrJx7kK/DSEU9wcxOq68PtVyv389E3OKLJa0Z1oQkU= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1731397138; c=relaxed/simple; bh=HG9MO4y7d/mRf/kaNdvPBYM1BwuKLbFyc7R09Li8cWM=; h=From:To:Cc:Subject:Date:Message-ID:In-Reply-To:References: MIME-Version; b=CWkWK2lIh78ebhhz9bJvBE49qlHUQrELH4O1e3nIjd6KISaq+5sTxL7VUy0hANpVsaNqHfQid4MOKG9Tf/w/vISlBOMvnyyJZX8XxVLPEKtIy9Bs3oAPzKFfDGya4ML4Vc73FWvoPCA8yY1HO7iX9vpYjI8mP//UWr1mZaVC+kQ= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=intel.com; spf=pass smtp.mailfrom=intel.com; dkim=pass (2048-bit key) header.d=intel.com header.i=@intel.com header.b=QP93u2OD; arc=none smtp.client-ip=198.175.65.18 Authentication-Results: smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=intel.com Authentication-Results: smtp.subspace.kernel.org; spf=pass smtp.mailfrom=intel.com Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=intel.com header.i=@intel.com header.b="QP93u2OD" DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=intel.com; i=@intel.com; q=dns/txt; s=Intel; t=1731397136; x=1762933136; h=from:to:cc:subject:date:message-id:in-reply-to: 
references:mime-version:content-transfer-encoding; bh=HG9MO4y7d/mRf/kaNdvPBYM1BwuKLbFyc7R09Li8cWM=; b=QP93u2ODrz4lj++Z9v2Dva7Nl39vH4hwXbiz4FKbOwfUbchqX1ATbcfo fCHiogZGzLtssjqh9SDY+CgOtABOKAScTqVB5tQ0aPwBsvXCZ4BS9By3q ZF+Tv3rsbvFe7gsLlHtfoZXXJxBatM71inLxBQW6XrQiXShQcEOpEEDfo gZXeTaVH0c3MXJwDcXg/4sZh/BeP1H9fksVLfrVo3kU40ksI3SH27hip+ faZdOzz8vb8igIb9ezQd/IsDtDArqaaaWOjyJr+4B92fZJAvzn039fgpL O96fKSmv1CISTf+HYTSfQCqq4FPpzkOSBhlvfW0AXgeo+OdZBCZS3UeCm g==; X-CSE-ConnectionGUID: i7iy9ORvQs2RB4GJOPmgTg== X-CSE-MsgGUID: b7LN8y1BTmeM4ZesoFDyqg== X-IronPort-AV: E=McAfee;i="6700,10204,11222"; a="31389357" X-IronPort-AV: E=Sophos;i="6.11,199,1725346800"; d="scan'208";a="31389357" Received: from fmviesa009.fm.intel.com ([10.60.135.149]) by orvoesa110.jf.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 11 Nov 2024 23:38:56 -0800 X-CSE-ConnectionGUID: Mq+rblRIRzOqJJ6djdjhvg== X-CSE-MsgGUID: q6j754+QTCOqD4i820HDNQ== X-ExtLoop1: 1 X-IronPort-AV: E=Sophos;i="6.12,147,1728975600"; d="scan'208";a="87736526" Received: from yzhao56-desk.sh.intel.com ([10.239.159.62]) by fmviesa009-auth.fm.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 11 Nov 2024 23:38:51 -0800 From: Yan Zhao To: pbonzini@redhat.com, seanjc@google.com, kvm@vger.kernel.org, dave.hansen@linux.intel.com Cc: rick.p.edgecombe@intel.com, kai.huang@intel.com, adrian.hunter@intel.com, reinette.chatre@intel.com, xiaoyao.li@intel.com, tony.lindgren@intel.com, binbin.wu@linux.intel.com, dmatlack@google.com, isaku.yamahata@intel.com, isaku.yamahata@gmail.com, nik.borisov@suse.com, linux-kernel@vger.kernel.org, x86@kernel.org Subject: [PATCH v2 09/24] x86/virt/tdx: Add SEAMCALL wrapper tdh_mem_sept_add() to add SEPT pages Date: Tue, 12 Nov 2024 15:36:24 +0800 Message-ID: <20241112073624.22114-1-yan.y.zhao@intel.com> X-Mailer: git-send-email 2.43.2 In-Reply-To: <20241112073327.21979-1-yan.y.zhao@intel.com> References: <20241112073327.21979-1-yan.y.zhao@intel.com> Precedence: bulk X-Mailing-List: kvm@vger.kernel.org List-Id: 
List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 From: Isaku Yamahata TDX architecture introduces the concept of private GPA vs shared GPA, depending on the GPA.SHARED bit. The TDX module maintains a Secure EPT (S-EPT or SEPT) tree per TD for private GPA to HPA translation. Wrap the TDH.MEM.SEPT.ADD SEAMCALL with tdh_mem_sept_add() to provide pages to the TDX module for building a TD's SEPT tree. (Refer to these pages as SEPT pages). Callers need to allocate and provide a normal page to tdh_mem_sept_add(), which then passes the page to the TDX module via the SEAMCALL TDH.MEM.SEPT.ADD. The TDX module then installs the page into the SEPT tree and encrypts this SEPT page with the TD's guest keyID. The kernel cannot use the SEPT page until after reclaiming it via TDH.MEM.SEPT.REMOVE or TDH.PHYMEM.PAGE.RECLAIM. Before passing the page to the TDX module, tdh_mem_sept_add() performs a CLFLUSH on the page mapped with keyID 0 to ensure that any dirty cache lines don't write back later and clobber TD memory or control structures. Don't worry about the other MK-TME keyIDs because the kernel doesn't use them. The TDX docs specify that this flush is not needed unless the TDX module exposes the CLFLUSH_BEFORE_ALLOC feature bit. Do the CLFLUSH unconditionally for two reasons: it keeps the solution simple by having a single path that handles both the !CLFLUSH_BEFORE_ALLOC and CLFLUSH_BEFORE_ALLOC cases, and it avoids wading into any correctness uncertainty by starting with a conservative solution. Callers should specify "GPA" and "level" for the TDX module to install the SEPT page at the specified position in the SEPT. Do not include the root page level in "level" since TDH.MEM.SEPT.ADD can only add non-root pages to the SEPT. Ensure "level" is between 1 and 3 for a 4-level SEPT or between 1 and 4 for a 5-level SEPT. Call tdh_mem_sept_add() during the TD's build time or during the TD's runtime.
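For illustration only (not part of the patch): a small user-space sketch of the "level" constraint and of how the wrapper in this patch combines "GPA" and "level" into one register value (gpa | level). It assumes the GPA is 4KB-aligned so its low bits are free to carry the level. sept_level_valid() and sept_add_rcx() are hypothetical names.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define PAGE_OFFSET_MASK 0xfffULL

/*
 * Non-root levels only: 1..3 for a 4-level SEPT, 1..4 for a 5-level SEPT.
 * sept_levels is the total number of levels in the tree.
 */
static bool sept_level_valid(int level, int sept_levels)
{
	assert(sept_levels == 4 || sept_levels == 5);
	return level >= 1 && level <= sept_levels - 1;
}

/* Pack GPA and level the way the wrapper does: gpa | level. */
static uint64_t sept_add_rcx(uint64_t gpa, int level)
{
	/* Assumption: a page-aligned GPA leaves the low bits unused. */
	assert((gpa & PAGE_OFFSET_MASK) == 0);
	return gpa | (uint64_t)level;
}
```

A caller would validate the level against the TD's SEPT depth first and only then build the argument.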
Check for errors from the function return value and retrieve extended error info from the function output parameters. The TDX module has many internal locks. To avoid staying in SEAM mode for too long, SEAMCALLs return a BUSY error code to the kernel instead of spinning on the locks. Depending on the specific SEAMCALL, the caller may need to handle this error in specific ways (e.g., retry). Therefore, return the SEAMCALL error code directly to the caller. Don't attempt to handle it in the core kernel. TDH.MEM.SEPT.ADD effectively manages two internal resources of the TDX module: it installs page table pages in the SEPT tree and also updates the TDX module's page metadata (PAMT). Don't add a wrapper for the matching SEAMCALL for removing a SEPT page (TDH.MEM.SEPT.REMOVE) because KVM, as the only in-kernel user, will only tear down the SEPT tree when the TD is being torn down. When this happens it can just do other operations that reclaim the SEPT pages for the host kernel to use, update the PAMT and let the SEPT get trashed. [Kai: Switched from generic seamcall export] [Yan: Re-wrote the changelog] Co-developed-by: Sean Christopherson Signed-off-by: Sean Christopherson Signed-off-by: Isaku Yamahata Signed-off-by: Kai Huang Signed-off-by: Rick Edgecombe Signed-off-by: Yan Zhao --- TDX MMU part 2 v2: - split out TDH.MEM.SEPT.ADD and re-wrote the patch msg. (Yan) - dropped TDH.MEM.SEPT.REMOVE. (Yan) - split out from original patch "KVM: TDX: Add C wrapper functions for SEAMCALLs to the TDX module" and move to x86 core.
(Kai) --- arch/x86/include/asm/tdx.h | 1 + arch/x86/virt/vmx/tdx/tdx.c | 19 +++++++++++++++++++ arch/x86/virt/vmx/tdx/tdx.h | 1 + 3 files changed, 21 insertions(+) diff --git a/arch/x86/include/asm/tdx.h b/arch/x86/include/asm/tdx.h index d093dc4350ac..b6f3e5504d4d 100644 --- a/arch/x86/include/asm/tdx.h +++ b/arch/x86/include/asm/tdx.h @@ -124,6 +124,7 @@ void tdx_guest_keyid_free(unsigned int keyid); /* SEAMCALL wrappers for creating/destroying/running TDX guests */ u64 tdh_mng_addcx(u64 tdr, u64 tdcs); +u64 tdh_mem_sept_add(u64 tdr, u64 gpa, u64 level, u64 hpa, u64 *rcx, u64 *rdx); u64 tdh_vp_addcx(u64 tdvpr, u64 tdcx); u64 tdh_mng_key_config(u64 tdr); u64 tdh_mng_create(u64 tdr, u64 hkid); diff --git a/arch/x86/virt/vmx/tdx/tdx.c b/arch/x86/virt/vmx/tdx/tdx.c index af121a73de80..1dc9be680475 100644 --- a/arch/x86/virt/vmx/tdx/tdx.c +++ b/arch/x86/virt/vmx/tdx/tdx.c @@ -1575,6 +1575,25 @@ u64 tdh_mng_addcx(u64 tdr, u64 tdcs) } EXPORT_SYMBOL_GPL(tdh_mng_addcx); +u64 tdh_mem_sept_add(u64 tdr, u64 gpa, u64 level, u64 hpa, u64 *rcx, u64 *rdx) +{ + struct tdx_module_args args = { + .rcx = gpa | level, + .rdx = tdr, + .r8 = hpa, + }; + u64 ret; + + clflush_cache_range(__va(hpa), PAGE_SIZE); + ret = seamcall_ret(TDH_MEM_SEPT_ADD, &args); + + *rcx = args.rcx; + *rdx = args.rdx; + + return ret; +} +EXPORT_SYMBOL_GPL(tdh_mem_sept_add); + u64 tdh_vp_addcx(u64 tdvpr, u64 tdcx) { struct tdx_module_args args = { diff --git a/arch/x86/virt/vmx/tdx/tdx.h b/arch/x86/virt/vmx/tdx/tdx.h index a63037036c91..7624d098515f 100644 --- a/arch/x86/virt/vmx/tdx/tdx.h +++ b/arch/x86/virt/vmx/tdx/tdx.h @@ -18,6 +18,7 @@ * TDX module SEAMCALL leaf functions */ #define TDH_MNG_ADDCX 1 +#define TDH_MEM_SEPT_ADD 3 #define TDH_VP_ADDCX 4 #define TDH_MNG_KEY_CONFIG 8 #define TDH_MNG_CREATE 9 From patchwork Tue Nov 12 07:36:36 2024 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Yan Zhao X-Patchwork-Id: 13871810 Received: from 
From: Yan Zhao
To: pbonzini@redhat.com, seanjc@google.com, kvm@vger.kernel.org, dave.hansen@linux.intel.com
Cc: rick.p.edgecombe@intel.com, kai.huang@intel.com, adrian.hunter@intel.com, reinette.chatre@intel.com, xiaoyao.li@intel.com, tony.lindgren@intel.com, binbin.wu@linux.intel.com, dmatlack@google.com, isaku.yamahata@intel.com, isaku.yamahata@gmail.com, nik.borisov@suse.com, linux-kernel@vger.kernel.org, x86@kernel.org
Subject: [PATCH v2 10/24] x86/virt/tdx: Add SEAMCALL wrappers to add TD private pages
Date: Tue, 12 Nov 2024 15:36:36 +0800
Message-ID: <20241112073636.22129-1-yan.y.zhao@intel.com>
In-Reply-To: <20241112073327.21979-1-yan.y.zhao@intel.com>
References: <20241112073327.21979-1-yan.y.zhao@intel.com>

From: Isaku Yamahata

TDX architecture introduces the concept of private GPA vs shared GPA, depending on the GPA.SHARED
bit. The TDX module maintains a Secure EPT (S-EPT or SEPT) tree per TD to translate the TD's private memory accessed using a private GPA.

Wrap the SEAMCALL TDH.MEM.PAGE.ADD with tdh_mem_page_add() and TDH.MEM.PAGE.AUG with tdh_mem_page_aug() to add TD private pages and map them to the TD's private GPAs in the SEPT.

Callers of tdh_mem_page_add() and tdh_mem_page_aug() allocate and provide normal pages to the wrappers, which then pass those pages to the TDX module. Before passing the pages to the TDX module, tdh_mem_page_add() and tdh_mem_page_aug() perform a CLFLUSH on the page mapped with keyID 0 to ensure that any dirty cache lines don't write back later and clobber TD memory or control structures. Don't worry about the other MK-TME keyIDs because the kernel doesn't use them. The TDX docs specify that this flush is not needed unless the TDX module exposes the CLFLUSH_BEFORE_ALLOC feature bit. Do the CLFLUSH unconditionally for two reasons: first, it makes the solution simpler by having a single path that handles both the !CLFLUSH_BEFORE_ALLOC and CLFLUSH_BEFORE_ALLOC cases; second, it avoids wading into any correctness uncertainty by going with a conservative solution to start.

Call tdh_mem_page_add() to add a private page to a TD during the TD's build time (i.e., before TDH.MR.FINALIZE). Specify which GPA the 4K private page will map to. There is no need to specify level info since TDH.MEM.PAGE.ADD only adds pages at the 4K level. To provide initial contents to the TD, provide an additional source page residing in memory managed by the host kernel itself (encrypted with a shared keyID). The TDX module will copy the initial contents from the source page in shared memory into the private page after mapping the page in the SEPT to the specified private GPA. The TDX module allows the source page to be the same page as the private page to be added. In that case, the TDX module converts and encrypts the source page as a TD private page.
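For illustration only, the register layout that tdh_mem_page_add() uses, including the case where the source page is the target page itself, might be sketched in userspace C like this. struct sketch_args is a local stand-in for the kernel's struct tdx_module_args, and treating source == 0 as "reuse the target page" is purely a convention of this sketch, not something the wrapper or the TDX module defines.

```c
#include <stdint.h>

/* Local stand-in for struct tdx_module_args; only the fields used here. */
struct sketch_args {
	uint64_t rcx, rdx, r8, r9;
};

/*
 * Build TDH.MEM.PAGE.ADD inputs the way the wrapper in this patch does.
 * Sketch-only convention: source == 0 means "use the target page as source".
 */
static struct sketch_args sketch_page_add_args(uint64_t tdr, uint64_t gpa,
					       uint64_t hpa, uint64_t source)
{
	struct sketch_args args = {
		.rcx = gpa,			/* 4K only, so no level bits */
		.rdx = tdr,
		.r8  = hpa,			/* page that becomes TD private */
		.r9  = source ? source : hpa,	/* in-place add when no source */
	};

	return args;
}
```

The in-place case corresponds to the last two sentences above: the same HPA ends up in both r8 and r9.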
Call tdh_mem_page_aug() to add a private page to a TD during the TD's runtime (i.e., after TDH.MR.FINALIZE). TDH.MEM.PAGE.AUG supports adding huge pages. Specify which GPA the private page will map to, along with level info embedded in the lower bits of the GPA. The TDX module will recognize the added page as the TD's private page after the TD accepts it with TDCALL TDG.MEM.PAGE.ACCEPT.

tdh_mem_page_add() and tdh_mem_page_aug() may fail. Callers can check the function return value and retrieve extended error info from the function output parameters.

The TDX module has many internal locks. To avoid staying in SEAM mode for too long, SEAMCALLs return a BUSY error code to the kernel instead of spinning on the locks. Depending on the specific SEAMCALL, the caller may need to handle this error in specific ways (e.g., retry). Therefore, return the SEAMCALL error code directly to the caller. Don't attempt to handle it in the core kernel.

[Kai: Switched from generic seamcall export]
[Yan: Re-wrote the changelog]
Co-developed-by: Sean Christopherson
Signed-off-by: Sean Christopherson
Signed-off-by: Isaku Yamahata
Signed-off-by: Kai Huang
Signed-off-by: Rick Edgecombe
Signed-off-by: Yan Zhao
---
TDX MMU part 2 v2:
- split out TDH.MEM.PAGE.ADD/AUG and re-wrote the patch msg (Yan).
- removed the code comment in tdh_mem_page_add() about rcx/rdx since the callers still need to check for accurate interpretation from spec and need to put the comment in a central place (Yan, Reinette).
- split out from original patch "KVM: TDX: Add C wrapper functions for SEAMCALLs to the TDX module" and move to x86 core (Kai) --- arch/x86/include/asm/tdx.h | 2 ++ arch/x86/virt/vmx/tdx/tdx.c | 39 +++++++++++++++++++++++++++++++++++++ arch/x86/virt/vmx/tdx/tdx.h | 2 ++ 3 files changed, 43 insertions(+) diff --git a/arch/x86/include/asm/tdx.h b/arch/x86/include/asm/tdx.h index b6f3e5504d4d..d363aa201283 100644 --- a/arch/x86/include/asm/tdx.h +++ b/arch/x86/include/asm/tdx.h @@ -124,8 +124,10 @@ void tdx_guest_keyid_free(unsigned int keyid); /* SEAMCALL wrappers for creating/destroying/running TDX guests */ u64 tdh_mng_addcx(u64 tdr, u64 tdcs); +u64 tdh_mem_page_add(u64 tdr, u64 gpa, u64 hpa, u64 source, u64 *rcx, u64 *rdx); u64 tdh_mem_sept_add(u64 tdr, u64 gpa, u64 level, u64 hpa, u64 *rcx, u64 *rdx); u64 tdh_vp_addcx(u64 tdvpr, u64 tdcx); +u64 tdh_mem_page_aug(u64 tdr, u64 gpa, u64 hpa, u64 *rcx, u64 *rdx); u64 tdh_mng_key_config(u64 tdr); u64 tdh_mng_create(u64 tdr, u64 hkid); u64 tdh_vp_create(u64 tdr, u64 tdvpr); diff --git a/arch/x86/virt/vmx/tdx/tdx.c b/arch/x86/virt/vmx/tdx/tdx.c index 1dc9be680475..e63e3cfd41fc 100644 --- a/arch/x86/virt/vmx/tdx/tdx.c +++ b/arch/x86/virt/vmx/tdx/tdx.c @@ -1575,6 +1575,26 @@ u64 tdh_mng_addcx(u64 tdr, u64 tdcs) } EXPORT_SYMBOL_GPL(tdh_mng_addcx); +u64 tdh_mem_page_add(u64 tdr, u64 gpa, u64 hpa, u64 source, u64 *rcx, u64 *rdx) +{ + struct tdx_module_args args = { + .rcx = gpa, + .rdx = tdr, + .r8 = hpa, + .r9 = source, + }; + u64 ret; + + clflush_cache_range(__va(hpa), PAGE_SIZE); + ret = seamcall_ret(TDH_MEM_PAGE_ADD, &args); + + *rcx = args.rcx; + *rdx = args.rdx; + + return ret; +} +EXPORT_SYMBOL_GPL(tdh_mem_page_add); + u64 tdh_mem_sept_add(u64 tdr, u64 gpa, u64 level, u64 hpa, u64 *rcx, u64 *rdx) { struct tdx_module_args args = { @@ -1606,6 +1626,25 @@ u64 tdh_vp_addcx(u64 tdvpr, u64 tdcx) } EXPORT_SYMBOL_GPL(tdh_vp_addcx); +u64 tdh_mem_page_aug(u64 tdr, u64 gpa, u64 hpa, u64 *rcx, u64 *rdx) +{ + struct tdx_module_args 
args = { + .rcx = gpa, + .rdx = tdr, + .r8 = hpa, + }; + u64 ret; + + clflush_cache_range(__va(hpa), PAGE_SIZE); + ret = seamcall_ret(TDH_MEM_PAGE_AUG, &args); + + *rcx = args.rcx; + *rdx = args.rdx; + + return ret; +} +EXPORT_SYMBOL_GPL(tdh_mem_page_aug); + u64 tdh_mng_key_config(u64 tdr) { struct tdx_module_args args = { diff --git a/arch/x86/virt/vmx/tdx/tdx.h b/arch/x86/virt/vmx/tdx/tdx.h index 7624d098515f..d32ed527f67f 100644 --- a/arch/x86/virt/vmx/tdx/tdx.h +++ b/arch/x86/virt/vmx/tdx/tdx.h @@ -18,8 +18,10 @@ * TDX module SEAMCALL leaf functions */ #define TDH_MNG_ADDCX 1 +#define TDH_MEM_PAGE_ADD 2 #define TDH_MEM_SEPT_ADD 3 #define TDH_VP_ADDCX 4 +#define TDH_MEM_PAGE_AUG 6 #define TDH_MNG_KEY_CONFIG 8 #define TDH_MNG_CREATE 9 #define TDH_VP_CREATE 10 From patchwork Tue Nov 12 07:36:47 2024 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Yan Zhao X-Patchwork-Id: 13871811 Received: from mgamail.intel.com (mgamail.intel.com [198.175.65.18]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id EFCF220B7F5; Tue, 12 Nov 2024 07:39:18 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=198.175.65.18 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1731397160; cv=none; b=Zbqa0pmlODSHFhF5O+S6M0rTlqAcuX2oOt0af1Yk9SNnqj3U2DY5c53QeZbVz7xyt5yqXnxU5ZUdsk1B7h0VG/yWsvaR9NKFhqkRyjIO+pDdhJp65Id7X/qlrfTZ+WHcxUpS29eTWOUm+wUK4i6xtkdTn3f5XS1p4wG9Sp06Yrk= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1731397160; c=relaxed/simple; bh=Tlk940Q5sXfX8YFlbf4ZFyHk3Wts0lRf2SX7bTseaDM=; h=From:To:Cc:Subject:Date:Message-ID:In-Reply-To:References: MIME-Version; 
From: Yan Zhao
To: pbonzini@redhat.com, seanjc@google.com, kvm@vger.kernel.org, dave.hansen@linux.intel.com
Cc: rick.p.edgecombe@intel.com, kai.huang@intel.com, adrian.hunter@intel.com, reinette.chatre@intel.com, xiaoyao.li@intel.com, tony.lindgren@intel.com, binbin.wu@linux.intel.com, dmatlack@google.com, isaku.yamahata@intel.com, isaku.yamahata@gmail.com, nik.borisov@suse.com, linux-kernel@vger.kernel.org, x86@kernel.org
Subject: [PATCH v2 11/24] x86/virt/tdx: Add SEAMCALL wrappers to manage TDX TLB tracking
Date: Tue, 12 Nov 2024 15:36:47 +0800
Message-ID: <20241112073648.22143-1-yan.y.zhao@intel.com>
In-Reply-To: <20241112073327.21979-1-yan.y.zhao@intel.com>
References: <20241112073327.21979-1-yan.y.zhao@intel.com>

From: Isaku Yamahata

The TDX module defines a TLB tracking protocol to make sure that no logical processor holds any stale Secure EPT (S-EPT or SEPT) TLB translations for a given TD private GPA range. After a successful TDH.MEM.RANGE.BLOCK, TDH.MEM.TRACK, and kicking off all vCPUs, the TDX module ensures that the subsequent TDH.VP.ENTER on each vCPU will flush all stale TLB entries for the specified GPA ranges in TDH.MEM.RANGE.BLOCK.

Wrap the TDH.MEM.RANGE.BLOCK with tdh_mem_range_block() and TDH.MEM.TRACK with tdh_mem_track() to enable the kernel to assist the TDX module in TLB tracking management.

The caller of tdh_mem_range_block() needs to specify "GPA" and "level" to request the TDX module to block the subsequent creation of TLB translations for a GPA range. This GPA range can correspond to a SEPT page or a TD private page at any level.

Contentions and errors are possible with the SEAMCALL TDH.MEM.RANGE.BLOCK.
Therefore, the caller of tdh_mem_range_block() needs to check the function return value and retrieve extended error info from the function output params. Upon TDH.MEM.RANGE.BLOCK success, no new TLB entries will be created for the specified private GPA range, though the existing TLB translations may still persist.

Call tdh_mem_track() after tdh_mem_range_block(). No extra info is required except the TDR HPA to denote the TD. TDH.MEM.TRACK will advance the TD's epoch counter to ensure the TDX module will flush TLBs in all vCPUs once the vCPUs re-enter the TD. TDH.MEM.TRACK will fail to advance the TD's epoch counter if there are vCPUs still running in non-root mode at the previous TD epoch counter. Therefore, send IPIs to kick off vCPUs after tdh_mem_track() to avoid the failure by forcing all vCPUs to re-enter the TD.

Contentions are also possible in TDH.MEM.TRACK. For example, TDH.MEM.TRACK may contend with TDH.VP.ENTER when advancing the TD epoch counter. tdh_mem_track() does not retry on behalf of the caller. Callers can choose to avoid contentions or retry on their own.

[Kai: Switched from generic seamcall export]
[Yan: Re-wrote the changelog]
Co-developed-by: Sean Christopherson
Signed-off-by: Sean Christopherson
Signed-off-by: Isaku Yamahata
Signed-off-by: Kai Huang
Signed-off-by: Rick Edgecombe
Signed-off-by: Yan Zhao
---
TDX MMU part 2 v2:
- split out TDH.MEM.RANGE.BLOCK and TDH.MEM.TRACK and re-wrote the patch msg (Yan).
- removed TDH.MEM.RANGE.UNBLOCK since it's unused. (Yan)
- split out from original patch "KVM: TDX: Add C wrapper functions for SEAMCALLs to the TDX module" and move to x86 core (Yan).
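Since tdh_mem_track() leaves retrying to the caller, one possible caller-side policy is a bounded retry loop. The userspace sketch below is only an illustration under stated assumptions: sketch_tdh_mem_track() is a stub rather than the real SEAMCALL wrapper, and SKETCH_BUSY is a placeholder value, not the status code defined by the TDX module.

```c
#include <stdint.h>

/* Placeholder status values for this sketch only. */
#define SKETCH_SUCCESS	0ULL
#define SKETCH_BUSY	1ULL

/* How many more times the stub below reports contention. */
static int sketch_busy_left;

/* Stub standing in for tdh_mem_track(); it never issues a SEAMCALL. */
static uint64_t sketch_tdh_mem_track(uint64_t tdr)
{
	(void)tdr;
	if (sketch_busy_left > 0) {
		sketch_busy_left--;
		return SKETCH_BUSY;
	}
	return SKETCH_SUCCESS;
}

/* Retry while the stub reports BUSY, at most max_tries times. */
static uint64_t sketch_track_with_retry(uint64_t tdr, int max_tries)
{
	uint64_t err = SKETCH_BUSY;
	int i;

	for (i = 0; i < max_tries && err == SKETCH_BUSY; i++)
		err = sketch_tdh_mem_track(tdr);

	/* Either success, a non-BUSY error, or BUSY after max_tries attempts. */
	return err;
}
```

A real caller might prefer to avoid the contention instead, for example by holding a lock that excludes the contending path; the loop only shows the "retry on their own" option.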
--- arch/x86/include/asm/tdx.h | 2 ++ arch/x86/virt/vmx/tdx/tdx.c | 27 +++++++++++++++++++++++++++ arch/x86/virt/vmx/tdx/tdx.h | 2 ++ 3 files changed, 31 insertions(+) diff --git a/arch/x86/include/asm/tdx.h b/arch/x86/include/asm/tdx.h index d363aa201283..227cb334176e 100644 --- a/arch/x86/include/asm/tdx.h +++ b/arch/x86/include/asm/tdx.h @@ -128,6 +128,7 @@ u64 tdh_mem_page_add(u64 tdr, u64 gpa, u64 hpa, u64 source, u64 *rcx, u64 *rdx); u64 tdh_mem_sept_add(u64 tdr, u64 gpa, u64 level, u64 hpa, u64 *rcx, u64 *rdx); u64 tdh_vp_addcx(u64 tdvpr, u64 tdcx); u64 tdh_mem_page_aug(u64 tdr, u64 gpa, u64 hpa, u64 *rcx, u64 *rdx); +u64 tdh_mem_range_block(u64 tdr, u64 gpa, u64 level, u64 *rcx, u64 *rdx); u64 tdh_mng_key_config(u64 tdr); u64 tdh_mng_create(u64 tdr, u64 hkid); u64 tdh_vp_create(u64 tdr, u64 tdvpr); @@ -141,6 +142,7 @@ u64 tdh_vp_rd(u64 tdvpr, u64 field, u64 *data); u64 tdh_vp_wr(u64 tdvpr, u64 field, u64 data, u64 mask); u64 tdh_vp_init_apicid(u64 tdvpr, u64 initial_rcx, u32 x2apicid); u64 tdh_phymem_page_reclaim(u64 page, u64 *rcx, u64 *rdx, u64 *r8); +u64 tdh_mem_track(u64 tdr); u64 tdh_phymem_cache_wb(bool resume); u64 tdh_phymem_page_wbinvd_tdr(u64 tdr); #else diff --git a/arch/x86/virt/vmx/tdx/tdx.c b/arch/x86/virt/vmx/tdx/tdx.c index e63e3cfd41fc..f7f83d86ec18 100644 --- a/arch/x86/virt/vmx/tdx/tdx.c +++ b/arch/x86/virt/vmx/tdx/tdx.c @@ -1645,6 +1645,23 @@ u64 tdh_mem_page_aug(u64 tdr, u64 gpa, u64 hpa, u64 *rcx, u64 *rdx) } EXPORT_SYMBOL_GPL(tdh_mem_page_aug); +u64 tdh_mem_range_block(u64 tdr, u64 gpa, u64 level, u64 *rcx, u64 *rdx) +{ + struct tdx_module_args args = { + .rcx = gpa | level, + .rdx = tdr, + }; + u64 ret; + + ret = seamcall_ret(TDH_MEM_RANGE_BLOCK, &args); + + *rcx = args.rcx; + *rdx = args.rdx; + + return ret; +} +EXPORT_SYMBOL_GPL(tdh_mem_range_block); + u64 tdh_mng_key_config(u64 tdr) { struct tdx_module_args args = { @@ -1820,6 +1837,16 @@ u64 tdh_phymem_page_reclaim(u64 page, u64 *rcx, u64 *rdx, u64 *r8) } 
EXPORT_SYMBOL_GPL(tdh_phymem_page_reclaim); +u64 tdh_mem_track(u64 tdr) +{ + struct tdx_module_args args = { + .rcx = tdr, + }; + + return seamcall(TDH_MEM_TRACK, &args); +} +EXPORT_SYMBOL_GPL(tdh_mem_track); + u64 tdh_phymem_cache_wb(bool resume) { struct tdx_module_args args = { diff --git a/arch/x86/virt/vmx/tdx/tdx.h b/arch/x86/virt/vmx/tdx/tdx.h index d32ed527f67f..e659eee1080a 100644 --- a/arch/x86/virt/vmx/tdx/tdx.h +++ b/arch/x86/virt/vmx/tdx/tdx.h @@ -22,6 +22,7 @@ #define TDH_MEM_SEPT_ADD 3 #define TDH_VP_ADDCX 4 #define TDH_MEM_PAGE_AUG 6 +#define TDH_MEM_RANGE_BLOCK 7 #define TDH_MNG_KEY_CONFIG 8 #define TDH_MNG_CREATE 9 #define TDH_VP_CREATE 10 @@ -37,6 +38,7 @@ #define TDH_SYS_KEY_CONFIG 31 #define TDH_SYS_INIT 33 #define TDH_SYS_RD 34 +#define TDH_MEM_TRACK 38 #define TDH_SYS_LP_INIT 35 #define TDH_SYS_TDMR_INIT 36 #define TDH_PHYMEM_CACHE_WB 40 From patchwork Tue Nov 12 07:36:58 2024 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Yan Zhao X-Patchwork-Id: 13871812 Received: from mgamail.intel.com (mgamail.intel.com [198.175.65.18]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id BDE6420B80E; Tue, 12 Nov 2024 07:39:29 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=198.175.65.18 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1731397171; cv=none; b=NBYRksxyc6Jb49PhS9NG61LGYPCI0dFveMTx0iaQ+HSKN2cRF6JpZ/Tj80+ZSjpNAl0r2iq/ou5d8DX+lex4rSJ1NdNxH2+/nL/HBkRu1XQ3l7ZntRUg84dB8SM+RzKCH5kdFbrRo26OAJNiZM2JUz1odRMqWpiZh5MItnA9yYk= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1731397171; c=relaxed/simple; bh=4fRyHylFpFmoOmS+0uQ9CGHFi3HY1T18UcGp/o4aH74=; h=From:To:Cc:Subject:Date:Message-ID:In-Reply-To:References: MIME-Version; 
From: Yan Zhao
To: pbonzini@redhat.com, seanjc@google.com, kvm@vger.kernel.org, dave.hansen@linux.intel.com
Cc: rick.p.edgecombe@intel.com, kai.huang@intel.com, adrian.hunter@intel.com, reinette.chatre@intel.com, xiaoyao.li@intel.com, tony.lindgren@intel.com, binbin.wu@linux.intel.com, dmatlack@google.com, isaku.yamahata@intel.com, isaku.yamahata@gmail.com, nik.borisov@suse.com, linux-kernel@vger.kernel.org, x86@kernel.org
Subject: [PATCH v2 12/24] x86/virt/tdx: Add SEAMCALL wrappers to remove a TD private page
Date: Tue, 12 Nov 2024 15:36:58 +0800
Message-ID: <20241112073658.22157-1-yan.y.zhao@intel.com>
In-Reply-To: <20241112073327.21979-1-yan.y.zhao@intel.com>
References: <20241112073327.21979-1-yan.y.zhao@intel.com>

From: Isaku Yamahata

TDX architecture introduces the concept of private GPA vs shared GPA, depending on the GPA.SHARED bit. The TDX module maintains a single Secure EPT (S-EPT or SEPT) tree per TD to translate the TD's private memory accessed using a private GPA.

Wrap the SEAMCALL TDH.MEM.PAGE.REMOVE with tdh_mem_page_remove() and TDH.PHYMEM.PAGE.WBINVD with tdh_phymem_page_wbinvd_hkid() to unmap a TD private page from the SEPT, remove the TD private page from the TDX module and flush cache lines to memory after removal of the private page.

Callers should specify "GPA" and "level" when calling tdh_mem_page_remove() to indicate to the TDX module which TD private page to unmap and remove. TDH.MEM.PAGE.REMOVE may fail, and the caller of tdh_mem_page_remove() can check the function return value and retrieve extended error information from the function output parameters. Follow the TLB tracking protocol before calling tdh_mem_page_remove() to remove a TD private page to avoid SEAMCALL failure.
After removing a TD's private page, the TDX module does not write back and invalidate cache lines associated with the page and the page's keyID (i.e., the TD's guest keyID). Therefore, provide tdh_phymem_page_wbinvd_hkid() to allow the caller to pass in the TD's guest keyID and invoke TDH.PHYMEM.PAGE.WBINVD to perform this action.

Before reusing the page, the host kernel needs to map the page with keyID 0 and invoke movdir64b() to convert the TD private page to a normal shared page.

TDH.MEM.PAGE.REMOVE and TDH.PHYMEM.PAGE.WBINVD may meet contentions inside the TDX module for TDX's internal resources. To avoid staying in SEAM mode for too long, the TDX module will return a BUSY error code to the kernel instead of spinning on the locks. The caller may need to handle this error in specific ways (e.g., retry). The wrappers return the SEAMCALL error code directly to the caller. Don't attempt to handle it in the core kernel.

[Kai: Switched from generic seamcall export]
[Yan: Re-wrote the changelog]
Co-developed-by: Sean Christopherson
Signed-off-by: Sean Christopherson
Signed-off-by: Isaku Yamahata
Signed-off-by: Kai Huang
Signed-off-by: Rick Edgecombe
Signed-off-by: Yan Zhao
---
TDX MMU part 2 v2:
- split out TDH.MEM.PAGE.REMOVE, TDH.PHYMEM.PAGE.WBINVD and re-wrote the patch msg (Yan).
- split out from original patch "KVM: TDX: Add C wrapper functions for SEAMCALLs to the TDX module" and move to x86 core (Kai) --- arch/x86/include/asm/tdx.h | 2 ++ arch/x86/virt/vmx/tdx/tdx.c | 27 +++++++++++++++++++++++++++ arch/x86/virt/vmx/tdx/tdx.h | 1 + 3 files changed, 30 insertions(+) diff --git a/arch/x86/include/asm/tdx.h b/arch/x86/include/asm/tdx.h index 227cb334176e..bad47415894b 100644 --- a/arch/x86/include/asm/tdx.h +++ b/arch/x86/include/asm/tdx.h @@ -143,8 +143,10 @@ u64 tdh_vp_wr(u64 tdvpr, u64 field, u64 data, u64 mask); u64 tdh_vp_init_apicid(u64 tdvpr, u64 initial_rcx, u32 x2apicid); u64 tdh_phymem_page_reclaim(u64 page, u64 *rcx, u64 *rdx, u64 *r8); u64 tdh_mem_track(u64 tdr); +u64 tdh_mem_page_remove(u64 tdr, u64 gpa, u64 level, u64 *rcx, u64 *rdx); u64 tdh_phymem_cache_wb(bool resume); u64 tdh_phymem_page_wbinvd_tdr(u64 tdr); +u64 tdh_phymem_page_wbinvd_hkid(u64 hpa, u64 hkid); #else static inline void tdx_init(void) { } static inline int tdx_cpu_enable(void) { return -ENODEV; } diff --git a/arch/x86/virt/vmx/tdx/tdx.c b/arch/x86/virt/vmx/tdx/tdx.c index f7f83d86ec18..1b57486f2f06 100644 --- a/arch/x86/virt/vmx/tdx/tdx.c +++ b/arch/x86/virt/vmx/tdx/tdx.c @@ -1847,6 +1847,23 @@ u64 tdh_mem_track(u64 tdr) } EXPORT_SYMBOL_GPL(tdh_mem_track); +u64 tdh_mem_page_remove(u64 tdr, u64 gpa, u64 level, u64 *rcx, u64 *rdx) +{ + struct tdx_module_args args = { + .rcx = gpa | level, + .rdx = tdr, + }; + u64 ret; + + ret = seamcall_ret(TDH_MEM_PAGE_REMOVE, &args); + + *rcx = args.rcx; + *rdx = args.rdx; + + return ret; +} +EXPORT_SYMBOL_GPL(tdh_mem_page_remove); + u64 tdh_phymem_cache_wb(bool resume) { struct tdx_module_args args = { @@ -1866,3 +1883,13 @@ u64 tdh_phymem_page_wbinvd_tdr(u64 tdr) return seamcall(TDH_PHYMEM_PAGE_WBINVD, &args); } EXPORT_SYMBOL_GPL(tdh_phymem_page_wbinvd_tdr); + +u64 tdh_phymem_page_wbinvd_hkid(u64 hpa, u64 hkid) +{ + struct tdx_module_args args = {}; + + args.rcx = hpa | (hkid << boot_cpu_data.x86_phys_bits); + + return 
seamcall(TDH_PHYMEM_PAGE_WBINVD, &args); +} +EXPORT_SYMBOL_GPL(tdh_phymem_page_wbinvd_hkid); diff --git a/arch/x86/virt/vmx/tdx/tdx.h b/arch/x86/virt/vmx/tdx/tdx.h index e659eee1080a..505203a89238 100644 --- a/arch/x86/virt/vmx/tdx/tdx.h +++ b/arch/x86/virt/vmx/tdx/tdx.h @@ -35,6 +35,7 @@ #define TDH_PHYMEM_PAGE_RDMD 24 #define TDH_VP_RD 26 #define TDH_PHYMEM_PAGE_RECLAIM 28 +#define TDH_MEM_PAGE_REMOVE 29 #define TDH_SYS_KEY_CONFIG 31 #define TDH_SYS_INIT 33 #define TDH_SYS_RD 34 From patchwork Tue Nov 12 07:37:08 2024 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Yan Zhao X-Patchwork-Id: 13871813 Received: from mgamail.intel.com (mgamail.intel.com [198.175.65.17]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id E671320B807; Tue, 12 Nov 2024 07:39:40 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=198.175.65.17 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1731397182; cv=none; b=roaigA///RjxKBQBUlM8CX+C28ImUPX/DTd4dDzNvRLRwjibMBf4rYCjgE09UbCB5lMrBmQ4X7Xk2yCPW0sFd5uTTsepbUgjirYPZacbf9hlWwuy669+OIwuz0r3BxEISXNyT2cJja6aWTuwijcqCM35yzJwrMG3K83htN9Rauk= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1731397182; c=relaxed/simple; bh=ikdXp3Cf8U+u3zRTfMhQoih6YK2g0+SWrWmE8if5VuA=; h=From:To:Cc:Subject:Date:Message-ID:In-Reply-To:References: MIME-Version; b=KtCWuUhWyDZWyjH9t4LZWlGCBfaVY3n3srhjoYE+w3bWU91LIicL/ac0SALnH0UGYsxh0qlf0cu66xBeiz7FCg/ETsNNHAcUICDg96+DnC3hZYwMMH71GPlhQEHiciKgE3kkm5fa7D01Tc8jaD99HwWx+boiFrmc1N2vMFptk2U= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=intel.com; spf=pass smtp.mailfrom=intel.com; dkim=pass (2048-bit key) header.d=intel.com header.i=@intel.com header.b=DQM0dRhu; arc=none smtp.client-ip=198.175.65.17 
From: Yan Zhao
To: pbonzini@redhat.com, seanjc@google.com, kvm@vger.kernel.org, dave.hansen@linux.intel.com
Cc: rick.p.edgecombe@intel.com, kai.huang@intel.com, adrian.hunter@intel.com, reinette.chatre@intel.com, xiaoyao.li@intel.com, tony.lindgren@intel.com, binbin.wu@linux.intel.com, dmatlack@google.com, isaku.yamahata@intel.com, isaku.yamahata@gmail.com, nik.borisov@suse.com, linux-kernel@vger.kernel.org, x86@kernel.org
Subject: [PATCH v2 13/24] x86/virt/tdx: Add SEAMCALL wrappers for TD measurement of initial contents
Date: Tue, 12 Nov 2024 15:37:08 +0800
Message-ID: <20241112073709.22171-1-yan.y.zhao@intel.com>
In-Reply-To: <20241112073327.21979-1-yan.y.zhao@intel.com>
References: <20241112073327.21979-1-yan.y.zhao@intel.com>

From: Isaku Yamahata

The TDX module measures the TD during the build process and saves the measurement in TDCS.MRTD to facilitate TD attestation of the initial contents of the TD. Wrap the SEAMCALL TDH.MR.EXTEND with tdh_mr_extend() and TDH.MR.FINALIZE with tdh_mr_finalize() to enable the host kernel to assist the TDX module in performing the measurement.

The measurement in TDCS.MRTD is a SHA-384 digest of the build process. SEAMCALLs TDH.MNG.INIT and TDH.MEM.PAGE.ADD initialize and contribute to the MRTD digest calculation.

The caller of tdh_mr_extend() should break the TD private page into chunks of size TDX_EXTENDMR_CHUNKSIZE and invoke tdh_mr_extend() to add the page content into the digest calculation. Failures are possible with TDH.MR.EXTEND (e.g., due to SEPT walking). The caller of tdh_mr_extend() can check the function return value and retrieve extended error information from the function output parameters.

Calling tdh_mr_finalize() completes the measurement. The TDX module then turns the TD into the runnable state. Further TDH.MEM.PAGE.ADD and TDH.MR.EXTEND calls will fail.

TDH.MR.FINALIZE may fail due to errors such as the TD having no vCPUs or contentions. Check the function return value when calling tdh_mr_finalize() to determine the exact reason for failure. Take proper locks on the caller's side to avoid contention failures, or handle the BUSY error in specific ways (e.g., retry).
Return the SEAMCALL error code directly to the caller. Do not attempt to handle it in the core kernel. [Kai: Switched from generic seamcall export] [Yan: Re-wrote the changelog] Co-developed-by: Sean Christopherson Signed-off-by: Sean Christopherson Signed-off-by: Isaku Yamahata Signed-off-by: Kai Huang Signed-off-by: Rick Edgecombe Signed-off-by: Yan Zhao --- TDX MMU part 2 v2: - Rewrote the patch log (Yan). uAPI breakout v2: - Change to use 'u64' as function parameter to prepare to move SEAMCALL wrappers to arch/x86. (Kai) - Split to separate patch - Move SEAMCALL wrappers from KVM to x86 core; - Move TDH_xx macros from KVM to x86 core; - Re-write log uAPI breakout v1: - Make argument to C wrapper function struct kvm_tdx * or struct vcpu_tdx * .(Sean) - Drop unused helpers (Kai) - Fix bisectability issues in headers (Kai) - Updates from seamcall overhaul (Kai) v19: - Update the commit message to match the patch by Yuan - Use seamcall() and seamcall_ret() by paolo v18: - removed stub functions for __seamcall{,_ret}() - Added Reviewed-by Binbin - Make tdx_seamcall() use struct tdx_module_args instead of taking each inputs. v16: - use struct tdx_module_args instead of struct tdx_module_output - Add tdh_mem_sept_rd() for SEPT_VE_DISABLE=1. 
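Reviewer note: the chunked measurement flow described in the changelog above can be sketched in user space as follows. This is a minimal sketch under stated assumptions, not KVM's implementation. fake_tdh_mr_extend() is a hypothetical stand-in that only mirrors the parameter list of tdh_mr_extend() from this patch, measure_private_page() is an illustrative name, and the 256-byte value for TDX_EXTENDMR_CHUNKSIZE and the 4 KiB page size are assumptions.

```c
#include <stdint.h>

typedef uint64_t u64;

#define SKETCH_PAGE_SIZE        4096u  /* assumption: 4 KiB private page */
#define TDX_EXTENDMR_CHUNKSIZE  256u   /* assumption: chunk size in bytes */

/* Bookkeeping for the hypothetical stub below. */
static unsigned int extend_calls;
static u64 last_extended_gpa;
static u64 fail_at_gpa = UINT64_MAX;   /* GPA at which the stub reports an error */

/* Hypothetical stand-in for tdh_mr_extend(); same parameter list as the wrapper. */
static u64 fake_tdh_mr_extend(u64 tdr, u64 gpa, u64 *rcx, u64 *rdx)
{
	(void)tdr;
	extend_calls++;
	last_extended_gpa = gpa;
	*rcx = 0;
	*rdx = 0;
	return gpa == fail_at_gpa ? 1 : 0;   /* non-zero models a failed SEAMCALL */
}

/*
 * Feed one private page into the measurement, one chunk at a time.
 * Returns 0 on success, or the first non-zero status. On failure the
 * extended error information stays in *rcx and *rdx for the caller.
 */
static u64 measure_private_page(u64 tdr, u64 gpa, u64 *rcx, u64 *rdx)
{
	u64 off, err;

	for (off = 0; off < SKETCH_PAGE_SIZE; off += TDX_EXTENDMR_CHUNKSIZE) {
		err = fake_tdh_mr_extend(tdr, gpa + off, rcx, rdx);
		if (err)
			return err;   /* pass the status straight back to the caller */
	}
	return 0;
}
```

The loop stops at the first failing chunk and returns the status unchanged, which matches the changelog's intent of leaving error handling to the caller.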
--- arch/x86/include/asm/tdx.h | 2 ++ arch/x86/virt/vmx/tdx/tdx.c | 27 +++++++++++++++++++++++++++ arch/x86/virt/vmx/tdx/tdx.h | 2 ++ 3 files changed, 31 insertions(+) diff --git a/arch/x86/include/asm/tdx.h b/arch/x86/include/asm/tdx.h index bad47415894b..fdc81799171e 100644 --- a/arch/x86/include/asm/tdx.h +++ b/arch/x86/include/asm/tdx.h @@ -133,6 +133,8 @@ u64 tdh_mng_key_config(u64 tdr); u64 tdh_mng_create(u64 tdr, u64 hkid); u64 tdh_vp_create(u64 tdr, u64 tdvpr); u64 tdh_mng_rd(u64 tdr, u64 field, u64 *data); +u64 tdh_mr_extend(u64 tdr, u64 gpa, u64 *rcx, u64 *rdx); +u64 tdh_mr_finalize(u64 tdr); u64 tdh_vp_flush(u64 tdvpr); u64 tdh_mng_vpflushdone(u64 tdr); u64 tdh_mng_key_freeid(u64 tdr); diff --git a/arch/x86/virt/vmx/tdx/tdx.c b/arch/x86/virt/vmx/tdx/tdx.c index 1b57486f2f06..7e0574facfb0 100644 --- a/arch/x86/virt/vmx/tdx/tdx.c +++ b/arch/x86/virt/vmx/tdx/tdx.c @@ -1713,6 +1713,33 @@ u64 tdh_mng_rd(u64 tdr, u64 field, u64 *data) } EXPORT_SYMBOL_GPL(tdh_mng_rd); +u64 tdh_mr_extend(u64 tdr, u64 gpa, u64 *rcx, u64 *rdx) +{ + struct tdx_module_args args = { + .rcx = gpa, + .rdx = tdr, + }; + u64 ret; + + ret = seamcall_ret(TDH_MR_EXTEND, &args); + + *rcx = args.rcx; + *rdx = args.rdx; + + return ret; +} +EXPORT_SYMBOL_GPL(tdh_mr_extend); + +u64 tdh_mr_finalize(u64 tdr) +{ + struct tdx_module_args args = { + .rcx = tdr, + }; + + return seamcall(TDH_MR_FINALIZE, &args); +} +EXPORT_SYMBOL_GPL(tdh_mr_finalize); + u64 tdh_vp_flush(u64 tdvpr) { struct tdx_module_args args = { diff --git a/arch/x86/virt/vmx/tdx/tdx.h b/arch/x86/virt/vmx/tdx/tdx.h index 505203a89238..4919d00025c9 100644 --- a/arch/x86/virt/vmx/tdx/tdx.h +++ b/arch/x86/virt/vmx/tdx/tdx.h @@ -27,6 +27,8 @@ #define TDH_MNG_CREATE 9 #define TDH_VP_CREATE 10 #define TDH_MNG_RD 11 +#define TDH_MR_EXTEND 16 +#define TDH_MR_FINALIZE 17 #define TDH_VP_FLUSH 18 #define TDH_MNG_VPFLUSHDONE 19 #define TDH_MNG_KEY_FREEID 20 From patchwork Tue Nov 12 07:37:20 2024 Content-Type: text/plain; charset="utf-8" 
MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Yan Zhao X-Patchwork-Id: 13871814 Received: from mgamail.intel.com (mgamail.intel.com [198.175.65.17]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 1BE2720B807; Tue, 12 Nov 2024 07:39:51 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=198.175.65.17 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1731397192; cv=none; b=DwJkmn3LSLQ0vCt0/IPnbnP/9j2SWZW2qRFJwCRctJ3uBOjOoajX+KCXnUNhG4AKjFd//gw8Jej7+gnpmaepk8/bOmQRddLUIdr/WV2Yy+RvH+qmHsOe9W+Gt8Aaj/6kHH3b3NMwXIVq/QTO988x9dy/6OMO3iZ7WhIWhYEmjpg= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1731397192; c=relaxed/simple; bh=UG5Ify1ngckIVJ9ozXP8b7ZYwtgvy0VLcrhVx+XnxfY=; h=From:To:Cc:Subject:Date:Message-ID:In-Reply-To:References: MIME-Version; b=kWdELj+RhMFGTUIkaxyQat5mMDiilQY4F8/ylTITpyAjwSXFHabKZmt777uzwvCKmqBEO7H5YEgx97Y2g8Ccg2pxoRknVH2RwZs9V1Vyjs6xBe01iA6YGwtE6P4rWno7kEfLBHGryQ76vBQJwCN9CDgabctaAEF+7KPSQy53Meg= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=intel.com; spf=pass smtp.mailfrom=intel.com; dkim=pass (2048-bit key) header.d=intel.com header.i=@intel.com header.b=I3tVCzXO; arc=none smtp.client-ip=198.175.65.17 Authentication-Results: smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=intel.com Authentication-Results: smtp.subspace.kernel.org; spf=pass smtp.mailfrom=intel.com Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=intel.com header.i=@intel.com header.b="I3tVCzXO" DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=intel.com; i=@intel.com; q=dns/txt; s=Intel; t=1731397191; x=1762933191; h=from:to:cc:subject:date:message-id:in-reply-to: references:mime-version:content-transfer-encoding; 
bh=UG5Ify1ngckIVJ9ozXP8b7ZYwtgvy0VLcrhVx+XnxfY=; b=I3tVCzXOK3Qb3e1PWxe48sK7WvYbPvMY8OAL0JGk7DlfiBCDnBVpyVAA k30/5SlfriEdG4Ifo1wsT1p52pCsFFUAQBZMNthpEtYhj+uyJeP18+18c wnacf+CV0DsxYt88lc2LHf+sAQ5KWpYBBp4IbWPzFHZnjimBY6KeFDeFd tu7Jp/ZWzsHJL1p93lSurgEVFm9gZRQzA+o2WhFekVIYgc26ho3RpJMBm eNoObpHat4voBigW+XjB/CCwAUUNaMJUKtCFM9vpt9fIcLiTIBofFQXUt pzUCldwZI8jeQTvrkum3xaSMTRoJA/73hCgAZFT+GxMArFxh3+a8hNm2k Q==; X-CSE-ConnectionGUID: Evq5/bccTgyXME2CW+XmwQ== X-CSE-MsgGUID: zOpew0D1TK6IVI6YkTfEGA== X-IronPort-AV: E=McAfee;i="6700,10204,11253"; a="31311359" X-IronPort-AV: E=Sophos;i="6.12,147,1728975600"; d="scan'208";a="31311359" Received: from orviesa001.jf.intel.com ([10.64.159.141]) by orvoesa109.jf.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 11 Nov 2024 23:39:51 -0800 X-CSE-ConnectionGUID: HQlcwpURR3CEYzXUos1FiQ== X-CSE-MsgGUID: 84t+DI5kQ6Sl6U0bQYdqPw== X-ExtLoop1: 1 X-IronPort-AV: E=Sophos;i="6.12,147,1728975600"; d="scan'208";a="124830472" Received: from yzhao56-desk.sh.intel.com ([10.239.159.62]) by smtpauth.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 11 Nov 2024 23:39:47 -0800 From: Yan Zhao To: pbonzini@redhat.com, seanjc@google.com, kvm@vger.kernel.org, dave.hansen@linux.intel.com Cc: rick.p.edgecombe@intel.com, kai.huang@intel.com, adrian.hunter@intel.com, reinette.chatre@intel.com, xiaoyao.li@intel.com, tony.lindgren@intel.com, binbin.wu@linux.intel.com, dmatlack@google.com, isaku.yamahata@intel.com, isaku.yamahata@gmail.com, nik.borisov@suse.com, linux-kernel@vger.kernel.org, x86@kernel.org Subject: [PATCH v2 14/24] KVM: TDX: Require TDP MMU and mmio caching for TDX Date: Tue, 12 Nov 2024 15:37:20 +0800 Message-ID: <20241112073720.22186-1-yan.y.zhao@intel.com> X-Mailer: git-send-email 2.43.2 In-Reply-To: <20241112073327.21979-1-yan.y.zhao@intel.com> References: <20241112073327.21979-1-yan.y.zhao@intel.com> Precedence: bulk X-Mailing-List: kvm@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 From: Isaku Yamahata Disable 
TDX support when TDP MMU or mmio caching is not enabled.

Because the TDP MMU is becoming more mainstream than the legacy MMU, legacy MMU support for TDX isn't implemented.

TDX requires KVM mmio caching. Without mmio caching, KVM will go to MMIO emulation without installing SPTEs for MMIOs. However, a TDX guest is protected, and KVM would hit errors during instruction decoding when trying to emulate MMIO for it. A TDX guest therefore relies on SPTEs being installed for MMIOs, with no RWX bits and with the suppress-VE bit clear, so that a #VE is injected into the guest. The guest's #VE handler then decodes the instruction and issues a TDVMCALL to have the host do the MMIO emulation.

Signed-off-by: Isaku Yamahata
Co-developed-by: Rick Edgecombe
Signed-off-by: Rick Edgecombe
Co-developed-by: Yan Zhao
Signed-off-by: Yan Zhao
Reviewed-by: Paolo Bonzini
---
TDX MMU part 2 v2:
- Added Paolo's rb.

TDX MMU part 2 v1:
- Addressed Binbin's comment by massaging Isaku's updated comments and adding more explanations about introducing mmio caching.
- Addressed Sean's comments of v19 according to Isaku's update but kept the warning for MOVDIR64B.
- Move code change in tdx_hardware_setup() to __tdx_bringup() since the former has been removed.
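Reviewer note: the MMIO requirement in the changelog above (no RWX bits, suppress-VE bit clear) can be illustrated with a small user-space model. This is a simplified sketch, not KVM or TDX module code. The bit positions (RWX in bits 0-2, suppress #VE in bit 63) are assumptions taken from the EPT entry layout, all names are hypothetical, and the model ignores access type and the VM-execution controls that also take part in #VE delivery.

```c
#include <stdint.h>

#define SKETCH_EPT_READ         (1ULL << 0)
#define SKETCH_EPT_WRITE        (1ULL << 1)
#define SKETCH_EPT_EXEC         (1ULL << 2)
#define SKETCH_EPT_RWX          (SKETCH_EPT_READ | SKETCH_EPT_WRITE | SKETCH_EPT_EXEC)
#define SKETCH_EPT_SUPPRESS_VE  (1ULL << 63)

enum mmio_outcome {
	OUTCOME_VE_TO_GUEST,         /* #VE delivered inside the guest */
	OUTCOME_EPT_VIOLATION_EXIT,  /* EPT violation reported to the host */
	OUTCOME_EPT_MISCONFIG,       /* EPT misconfiguration reported to the host */
	OUTCOME_PRESENT,             /* entry is present; normal permission checks apply */
};

/* Classify what a guest access to a page mapped by this entry would cause. */
static enum mmio_outcome classify_mmio_spte(uint64_t spte)
{
	uint64_t rwx = spte & SKETCH_EPT_RWX;

	if (rwx == 0) {
		/* Not-present entry: EPT violation, converted to #VE unless suppressed. */
		return (spte & SKETCH_EPT_SUPPRESS_VE) ? OUTCOME_EPT_VIOLATION_EXIT
						       : OUTCOME_VE_TO_GUEST;
	}

	/* Writable but not readable is treated as a misconfigured entry. */
	if ((rwx & SKETCH_EPT_WRITE) && !(rwx & SKETCH_EPT_READ))
		return OUTCOME_EPT_MISCONFIG;

	return OUTCOME_PRESENT;
}
```

In this model an MMIO value of 0 is the only one of the three not-mapped encodings that lands in the guest's #VE handler, which is the behavior the changelog says TDX guests rely on.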
--- arch/x86/kvm/mmu/mmu.c | 1 + arch/x86/kvm/vmx/main.c | 1 + arch/x86/kvm/vmx/tdx.c | 8 +++----- 3 files changed, 5 insertions(+), 5 deletions(-) diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c index 3a338df541c1..e2f75c8145fd 100644 --- a/arch/x86/kvm/mmu/mmu.c +++ b/arch/x86/kvm/mmu/mmu.c @@ -110,6 +110,7 @@ static bool __ro_after_init tdp_mmu_allowed; #ifdef CONFIG_X86_64 bool __read_mostly tdp_mmu_enabled = true; module_param_named(tdp_mmu, tdp_mmu_enabled, bool, 0444); +EXPORT_SYMBOL_GPL(tdp_mmu_enabled); #endif static int max_huge_page_level __read_mostly; diff --git a/arch/x86/kvm/vmx/main.c b/arch/x86/kvm/vmx/main.c index 3c292b4a063a..a34c0bebe1c3 100644 --- a/arch/x86/kvm/vmx/main.c +++ b/arch/x86/kvm/vmx/main.c @@ -3,6 +3,7 @@ #include "x86_ops.h" #include "vmx.h" +#include "mmu.h" #include "nested.h" #include "pmu.h" #include "posted_intr.h" diff --git a/arch/x86/kvm/vmx/tdx.c b/arch/x86/kvm/vmx/tdx.c index 38369cafc175..8832f76e4a22 100644 --- a/arch/x86/kvm/vmx/tdx.c +++ b/arch/x86/kvm/vmx/tdx.c @@ -1412,16 +1412,14 @@ static int __init __tdx_bringup(void) const struct tdx_sys_info_td_conf *td_conf; int r; + if (!tdp_mmu_enabled || !enable_mmio_caching) + return -EOPNOTSUPP; + if (!cpu_feature_enabled(X86_FEATURE_MOVDIR64B)) { pr_warn("MOVDIR64B is reqiured for TDX\n"); return -EOPNOTSUPP; } - if (!enable_ept) { - pr_err("Cannot enable TDX with EPT disabled.\n"); - return -EINVAL; - } - /* * Enabling TDX requires enabling hardware virtualization first, * as making SEAMCALLs requires CPU being in post-VMXON state. 
From patchwork Tue Nov 12 07:37:30 2024 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Yan Zhao X-Patchwork-Id: 13871815 Received: from mgamail.intel.com (mgamail.intel.com [192.198.163.17]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 976F420BB49; Tue, 12 Nov 2024 07:40:02 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=192.198.163.17 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1731397204; cv=none; b=qoQ30jafG40M0xfJ6y80GmG7ktzAPxIKL3VPOnLlHszfioHs8T2HXAvwepAhpyuZX12XkJH56YJVQjoZssXoN9CVtgoenYI7zxDfIXHV+XwKIkjjckMHlVH1I/RtgL08IEmDElNnWAXAFRjznFbmAR5GJnm03XJDPc/1fKXnfpM= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1731397204; c=relaxed/simple; bh=i6gBHuaj0gHh60VTayQdIjmV6bLsnXSbpoPy7aqQGH0=; h=From:To:Cc:Subject:Date:Message-ID:In-Reply-To:References: MIME-Version; b=er22eXgWb4yX6UKJtuuBPvcDUo0mW1p2NLq9rDINOXYdAON6UNzp0fktaSXCUkaxpVnjRfrVr92UotoHZ/NWbgsRxYTut221H1wS8/mNyJxaKF9wR8tKTMjQSNlg9JcXzIaZv6RGiz28pz15AhkrtKKPcUpXNdS1lh+PIC9xYLI= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=intel.com; spf=pass smtp.mailfrom=intel.com; dkim=pass (2048-bit key) header.d=intel.com header.i=@intel.com header.b=lttCRNkO; arc=none smtp.client-ip=192.198.163.17 Authentication-Results: smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=intel.com Authentication-Results: smtp.subspace.kernel.org; spf=pass smtp.mailfrom=intel.com Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=intel.com header.i=@intel.com header.b="lttCRNkO" DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=intel.com; i=@intel.com; q=dns/txt; s=Intel; t=1731397202; x=1762933202; h=from:to:cc:subject:date:message-id:in-reply-to: 
references:mime-version:content-transfer-encoding; bh=i6gBHuaj0gHh60VTayQdIjmV6bLsnXSbpoPy7aqQGH0=; b=lttCRNkOKJI48i/uVj25zzq4Oe45RNEx2xh1y11SBbUuv2VfMDwTqpbE cGBXykPC99SEGZ2sPEKrmcSzhDdstzsVzDjAXib3XZ3Dzim9KCgHxyRTn MuyReFGiLqxbBVZ9xGZ8U2v3lA5et+pMbCp2fd0o34nZr6AsbvTzwMHM2 Vdp6jpeEdZ4RmYippIIP9fJJ8oYpnC5OfvSz3yR2zMf5TiRznxhiqKt+B mk4fso/LLzDsGI9HYM+9Ml463YAgWa7C6l0E5hG2zvWHjGbG+8eRuwTec acXJFZikzWlsLO/a4+nVm7GoiRqJsIakuYcqYFbvZ3DxyORJdpPesLO8t A==; X-CSE-ConnectionGUID: jpXVPQQpQrqD2G9/tnCucg== X-CSE-MsgGUID: rX0N0It4T72CFgeIAoXpZA== X-IronPort-AV: E=McAfee;i="6700,10204,11253"; a="31090602" X-IronPort-AV: E=Sophos;i="6.12,147,1728975600"; d="scan'208";a="31090602" Received: from orviesa008.jf.intel.com ([10.64.159.148]) by fmvoesa111.fm.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 11 Nov 2024 23:40:02 -0800 X-CSE-ConnectionGUID: SkO8oupuQfeCxZqUq5hV/A== X-CSE-MsgGUID: CzCvCOZ/QB2wKLxcUc4EDg== X-ExtLoop1: 1 X-IronPort-AV: E=Sophos;i="6.12,147,1728975600"; d="scan'208";a="88115312" Received: from yzhao56-desk.sh.intel.com ([10.239.159.62]) by orviesa008-auth.jf.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 11 Nov 2024 23:39:58 -0800 From: Yan Zhao To: pbonzini@redhat.com, seanjc@google.com, kvm@vger.kernel.org, dave.hansen@linux.intel.com Cc: rick.p.edgecombe@intel.com, kai.huang@intel.com, adrian.hunter@intel.com, reinette.chatre@intel.com, xiaoyao.li@intel.com, tony.lindgren@intel.com, binbin.wu@linux.intel.com, dmatlack@google.com, isaku.yamahata@intel.com, isaku.yamahata@gmail.com, nik.borisov@suse.com, linux-kernel@vger.kernel.org, x86@kernel.org Subject: [PATCH v2 15/24] KVM: x86/mmu: Add setter for shadow_mmio_value Date: Tue, 12 Nov 2024 15:37:30 +0800 Message-ID: <20241112073730.22200-1-yan.y.zhao@intel.com> X-Mailer: git-send-email 2.43.2 In-Reply-To: <20241112073327.21979-1-yan.y.zhao@intel.com> References: <20241112073327.21979-1-yan.y.zhao@intel.com> Precedence: bulk X-Mailing-List: kvm@vger.kernel.org List-Id: List-Subscribe: 
List-Unsubscribe: MIME-Version: 1.0 From: Isaku Yamahata

Future changes will want to set shadow_mmio_value from TDX code. Add a setter helper with a name that makes more sense from that context.

Signed-off-by: Isaku Yamahata
[split into new patch]
Co-developed-by: Rick Edgecombe
Signed-off-by: Rick Edgecombe
Signed-off-by: Yan Zhao
Reviewed-by: Paolo Bonzini
---
TDX MMU part 2 v2:
- Added Paolo's rb

TDX MMU part 2 v1:
- Split into new patch
---
arch/x86/kvm/mmu.h | 1 + arch/x86/kvm/mmu/spte.c | 6 ++++++ 2 files changed, 7 insertions(+)
diff --git a/arch/x86/kvm/mmu.h b/arch/x86/kvm/mmu.h index 398b6b06ed73..a935e65a133d 100644 --- a/arch/x86/kvm/mmu.h +++ b/arch/x86/kvm/mmu.h @@ -78,6 +78,7 @@ static inline gfn_t kvm_mmu_max_gfn(void) u8 kvm_mmu_get_max_tdp_level(void); void kvm_mmu_set_mmio_spte_mask(u64 mmio_value, u64 mmio_mask, u64 access_mask); +void kvm_mmu_set_mmio_spte_value(struct kvm *kvm, u64 mmio_value); void kvm_mmu_set_me_spte_mask(u64 me_value, u64 me_mask); void kvm_mmu_set_ept_masks(bool has_ad_bits, bool has_exec_only); diff --git a/arch/x86/kvm/mmu/spte.c b/arch/x86/kvm/mmu/spte.c index f1a50a78badb..a831e76f379a 100644 --- a/arch/x86/kvm/mmu/spte.c +++ b/arch/x86/kvm/mmu/spte.c @@ -422,6 +422,12 @@ void kvm_mmu_set_mmio_spte_mask(u64 mmio_value, u64 mmio_mask, u64 access_mask) } EXPORT_SYMBOL_GPL(kvm_mmu_set_mmio_spte_mask); +void kvm_mmu_set_mmio_spte_value(struct kvm *kvm, u64 mmio_value) +{ + kvm->arch.shadow_mmio_value = mmio_value; +} +EXPORT_SYMBOL_GPL(kvm_mmu_set_mmio_spte_value); + void kvm_mmu_set_me_spte_mask(u64 me_value, u64 me_mask) { /* shadow_me_value must be a subset of shadow_me_mask */
From patchwork Tue Nov 12 07:37:43 2024 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Yan Zhao X-Patchwork-Id: 13871816 Received: from mgamail.intel.com (mgamail.intel.com [192.198.163.17]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client
certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 66C4E20BB49; Tue, 12 Nov 2024 07:40:15 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=192.198.163.17 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1731397216; cv=none; b=QCJS4BwjnvbVvMWY139mK1QtW6kDj1gIWWDLWKLZqmfeCKPQ2OJFmWYnh397RQN3RFhKPIQFThcmYDbS1bkPSv4GbI1oyEpS54FNKFQx0GgN9bbvMhLh+0Vnt68W8R1HdpxKDIE5aeQ2V3qmpfxIo3sTv4a0Ux+/2mnY3CcVWes= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1731397216; c=relaxed/simple; bh=ATuGK4QHEOp9CzuIj/HnUEsWjzZZ+yVOHjj0esAelLs=; h=From:To:Cc:Subject:Date:Message-ID:In-Reply-To:References: MIME-Version; b=fEpOMn7yll5tvAS96kKhTvVNEoHGFRMGe/ppeDT/GLEO+KLRxvXaEc9PNNyAmZH9e7/4ZqLgw5Olx5dx2oaWyXaP02RrSwUfaC0Qbsxkh3Eq+aNcMbpBGzQGhL0nDMUMRCeGVX3VGdkp+VPsdDoeW8jQiUJbdyjtGs2Ctp7vkoI= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=intel.com; spf=pass smtp.mailfrom=intel.com; dkim=pass (2048-bit key) header.d=intel.com header.i=@intel.com header.b=YrsJzqMU; arc=none smtp.client-ip=192.198.163.17 Authentication-Results: smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=intel.com Authentication-Results: smtp.subspace.kernel.org; spf=pass smtp.mailfrom=intel.com Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=intel.com header.i=@intel.com header.b="YrsJzqMU" DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=intel.com; i=@intel.com; q=dns/txt; s=Intel; t=1731397215; x=1762933215; h=from:to:cc:subject:date:message-id:in-reply-to: references:mime-version:content-transfer-encoding; bh=ATuGK4QHEOp9CzuIj/HnUEsWjzZZ+yVOHjj0esAelLs=; b=YrsJzqMUI/IG0hfd9PDpU0DzhE0gPefmeyHm7LZxlCpH07A3UUywoxIM F3vAe8rsr6C8PBCK3RSffQqkBMMprcATm6JmcAT8HgCQqM0k9fz3c3vxq tasdfTia+nYLePS3XZ4+u2/rqcyPCQ55WEq6zxREUfnOZL6fCp+orZbxE T/cpdsA35Xc+y96bwppCK8z/zlQnP5jhtxjFLLLjibjsoc4uFNe5I+5VS 
JNF49hRVvPmA4i90uUrIZ7A20auPmBTIpKxwmdSnOPEv8WY8gYmez5GdC 3bJccNpA6VoHmFcQ/tFu6Rq2CZXAjgSHIkYmcTZ2rUOUGvAiCspUtSIxC Q==; X-CSE-ConnectionGUID: zfPrR7s7Qd6x0P9hEr199g== X-CSE-MsgGUID: kfga5oBZR0GWqh7xfc7AKg== X-IronPort-AV: E=McAfee;i="6700,10204,11253"; a="31090630" X-IronPort-AV: E=Sophos;i="6.12,147,1728975600"; d="scan'208";a="31090630" Received: from fmviesa004.fm.intel.com ([10.60.135.144]) by fmvoesa111.fm.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 11 Nov 2024 23:40:14 -0800 X-CSE-ConnectionGUID: kbajvGrFSlOzLveGKflLCA== X-CSE-MsgGUID: BNbJ+AG9QH6xkIzJ6kMmCg== X-ExtLoop1: 1 X-IronPort-AV: E=Sophos;i="6.12,147,1728975600"; d="scan'208";a="92089301" Received: from yzhao56-desk.sh.intel.com ([10.239.159.62]) by fmviesa004-auth.fm.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 11 Nov 2024 23:40:09 -0800 From: Yan Zhao To: pbonzini@redhat.com, seanjc@google.com, kvm@vger.kernel.org, dave.hansen@linux.intel.com Cc: rick.p.edgecombe@intel.com, kai.huang@intel.com, adrian.hunter@intel.com, reinette.chatre@intel.com, xiaoyao.li@intel.com, tony.lindgren@intel.com, binbin.wu@linux.intel.com, dmatlack@google.com, isaku.yamahata@intel.com, isaku.yamahata@gmail.com, nik.borisov@suse.com, linux-kernel@vger.kernel.org, x86@kernel.org Subject: [PATCH v2 16/24] KVM: TDX: Set per-VM shadow_mmio_value to 0 Date: Tue, 12 Nov 2024 15:37:43 +0800 Message-ID: <20241112073743.22214-1-yan.y.zhao@intel.com> X-Mailer: git-send-email 2.43.2 In-Reply-To: <20241112073327.21979-1-yan.y.zhao@intel.com> References: <20241112073327.21979-1-yan.y.zhao@intel.com> Precedence: bulk X-Mailing-List: kvm@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 From: Isaku Yamahata Set per-VM shadow_mmio_value to 0 for TDX. With enable_mmio_caching on, KVM installs MMIO SPTEs for TDs. To correctly configure MMIO SPTEs, TDX requires the per-VM shadow_mmio_value to be set to 0. 
This is necessary to override the default value of the suppress VE bit in the SPTE, which is 1, and to ensure value 0 in RWX bits. For MMIO SPTE, the spte value changes as follows: 1. initial value (suppress VE bit is set) 2. Guest issues MMIO and triggers EPT violation 3. KVM updates SPTE value to MMIO value (suppress VE bit is cleared) 4. Guest MMIO resumes. It triggers VE exception in guest TD 5. Guest VE handler issues TDG.VP.VMCALL 6. KVM handles MMIO 7. Guest VE handler resumes its execution after MMIO instruction Signed-off-by: Isaku Yamahata Co-developed-by: Rick Edgecombe Signed-off-by: Rick Edgecombe Co-developed-by: Yan Zhao Signed-off-by: Yan Zhao Reviewed-by: Paolo Bonzini --- TDX MMU part 2 v2: - Added Paolo's rb. TDX MMU part 2 v1: - Split from the big patch "KVM: TDX: TDP MMU TDX support". - Remove warning for shadow_mmio_value --- arch/x86/kvm/mmu/spte.c | 2 -- arch/x86/kvm/vmx/tdx.c | 15 ++++++++++++++- 2 files changed, 14 insertions(+), 3 deletions(-) diff --git a/arch/x86/kvm/mmu/spte.c b/arch/x86/kvm/mmu/spte.c index a831e76f379a..817c68ad8bd5 100644 --- a/arch/x86/kvm/mmu/spte.c +++ b/arch/x86/kvm/mmu/spte.c @@ -94,8 +94,6 @@ u64 make_mmio_spte(struct kvm_vcpu *vcpu, u64 gfn, unsigned int access) u64 spte = generation_mmio_spte_mask(gen); u64 gpa = gfn << PAGE_SHIFT; - WARN_ON_ONCE(!vcpu->kvm->arch.shadow_mmio_value); - access &= shadow_mmio_access_mask; spte |= vcpu->kvm->arch.shadow_mmio_value | access; spte |= gpa | shadow_nonpresent_or_rsvd_mask; diff --git a/arch/x86/kvm/vmx/tdx.c b/arch/x86/kvm/vmx/tdx.c index 8832f76e4a22..37696adb574c 100644 --- a/arch/x86/kvm/vmx/tdx.c +++ b/arch/x86/kvm/vmx/tdx.c @@ -5,7 +5,7 @@ #include "mmu.h" #include "x86_ops.h" #include "tdx.h" - +#include "mmu/spte.h" #undef pr_fmt #define pr_fmt(fmt) KBUILD_MODNAME ": " fmt @@ -415,6 +415,19 @@ int tdx_vm_init(struct kvm *kvm) kvm->arch.has_private_mem = true; + /* + * Because guest TD is protected, VMM can't parse the instruction in TD. 
+ * Instead, guest uses MMIO hypercall. For unmodified device driver, + * #VE needs to be injected for MMIO and #VE handler in TD converts MMIO + * instruction into MMIO hypercall. + * + * SPTE value for MMIO needs to be setup so that #VE is injected into + * TD instead of triggering EPT MISCONFIG. + * - RWX=0 so that EPT violation is triggered. + * - suppress #VE bit is cleared to inject #VE. + */ + kvm_mmu_set_mmio_spte_value(kvm, 0); + /* * TDX has its own limit of maximum vCPUs it can support for all * TDX guests in addition to KVM_MAX_VCPUS. TDX module reports From patchwork Tue Nov 12 07:37:53 2024 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Yan Zhao X-Patchwork-Id: 13871817 Received: from mgamail.intel.com (mgamail.intel.com [192.198.163.17]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id D744420BB49; Tue, 12 Nov 2024 07:40:30 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=192.198.163.17 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1731397233; cv=none; b=ATOkJymQ7B2o6IVeg9NTir9Iuyqyb3Ac7wrRUP+ig9rKjsmXr0uTy0wk2VaOmYyI0m2c83YU8CYD7LK0AcwR2+Rattiosl/mXyzNAabSGUeX19cHIrjwrqgn84NdEhIj8tnfjXJooy4V44LblVGDq1CTa7gQFNiTOHkNej6kfYo= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1731397233; c=relaxed/simple; bh=Y28Bn1uVmNPfYs5Pzm1YxRbZtqYBIEZoCJS9YviJUok=; h=From:To:Cc:Subject:Date:Message-ID:In-Reply-To:References: MIME-Version; b=ERp+E+mJCv5zGG60rRtOJaJJplTz+7lnwrPkwsdqbyqRCj/UfFJpS6TbgYvNePghVrX3VLRAZwZ3mwU0l3a1r47TpcgJ/5Xv5AtrJNi/+htdA2IcUFSjlZjD3rEmeY6D/o9kMVDm783ZbNYRapTRfANCjuIUrotbwiyMiwlT0Jc= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=intel.com; spf=pass smtp.mailfrom=intel.com; dkim=pass (2048-bit key) header.d=intel.com 
header.i=@intel.com header.b=G3GxDkT1; arc=none smtp.client-ip=192.198.163.17 Authentication-Results: smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=intel.com Authentication-Results: smtp.subspace.kernel.org; spf=pass smtp.mailfrom=intel.com Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=intel.com header.i=@intel.com header.b="G3GxDkT1" DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=intel.com; i=@intel.com; q=dns/txt; s=Intel; t=1731397231; x=1762933231; h=from:to:cc:subject:date:message-id:in-reply-to: references:mime-version:content-transfer-encoding; bh=Y28Bn1uVmNPfYs5Pzm1YxRbZtqYBIEZoCJS9YviJUok=; b=G3GxDkT1/5ol2Z7jXyyiuqazjo896bYiG7Gt4c1lzXu+Al1QAJ8pnTc4 cx9muPw5PI8/2lkRZL+8b8sv3qUCTDBb5wMPbVQcZ/mCNQglcMZW8FL/m TBKb1S8xZSj8EwCd+YCfmsPNVpwZwXzF/XVoAycfETKKSyPX57vUt0mMT twRR6s5HRsDfePS1utPxFxmznsQ8E+WJpytX6C9H3wWD7diJSndKrO6s3 pd2OhXIOtm712LOCoGERTx5B6oxm6Jh2+F3wnSNdse0F9zQuWlLlMrQbt Y8L8RFaqIXXfYaQzXJXa3EPfqsaqHUtjmguU+C/LgG68vE1N+S4AIhugN Q==; X-CSE-ConnectionGUID: TII8dMbgS1SQnNYO4OgUPg== X-CSE-MsgGUID: XYQYD5NxQoiJlPHe80qSMQ== X-IronPort-AV: E=McAfee;i="6700,10204,11253"; a="31090671" X-IronPort-AV: E=Sophos;i="6.12,147,1728975600"; d="scan'208";a="31090671" Received: from fmviesa004.fm.intel.com ([10.60.135.144]) by fmvoesa111.fm.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 11 Nov 2024 23:40:24 -0800 X-CSE-ConnectionGUID: QzoqBj2dSkupYP6K/28R2g== X-CSE-MsgGUID: vOknj8eeQ/2W1nkFjEUvlQ== X-ExtLoop1: 1 X-IronPort-AV: E=Sophos;i="6.12,147,1728975600"; d="scan'208";a="92089528" Received: from yzhao56-desk.sh.intel.com ([10.239.159.62]) by fmviesa004-auth.fm.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 11 Nov 2024 23:40:20 -0800 From: Yan Zhao To: pbonzini@redhat.com, seanjc@google.com, kvm@vger.kernel.org, dave.hansen@linux.intel.com Cc: rick.p.edgecombe@intel.com, kai.huang@intel.com, adrian.hunter@intel.com, reinette.chatre@intel.com, xiaoyao.li@intel.com, 
tony.lindgren@intel.com, binbin.wu@linux.intel.com, dmatlack@google.com, isaku.yamahata@intel.com, isaku.yamahata@gmail.com, nik.borisov@suse.com, linux-kernel@vger.kernel.org, x86@kernel.org Subject: [PATCH v2 17/24] KVM: TDX: Handle TLB tracking for TDX Date: Tue, 12 Nov 2024 15:37:53 +0800 Message-ID: <20241112073753.22228-1-yan.y.zhao@intel.com> X-Mailer: git-send-email 2.43.2 In-Reply-To: <20241112073327.21979-1-yan.y.zhao@intel.com> References: <20241112073327.21979-1-yan.y.zhao@intel.com> Precedence: bulk X-Mailing-List: kvm@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 From: Isaku Yamahata

Handle TLB tracking for TDX by introducing function tdx_track() for private memory TLB tracking and implementing flush_tlb* hooks to flush TLBs for shared memory.

Introduce function tdx_track() to do TLB tracking on private memory, which basically does two things: calling TDH.MEM.TRACK to increase the TD epoch and kicking off all vCPUs. The private EPT will then be flushed when each vCPU re-enters the TD. This function is temporarily unused in this patch and will be called on a page-by-page basis on removal of a private guest page in a later patch.

In earlier revisions, tdx_track() relied on an atomic counter to coordinate the synchronization between the actions of kicking off vCPUs, incrementing the TD epoch, and the vCPUs waiting for the incremented TD epoch after being kicked off. However, the core MMU only actually needs to call tdx_track() while already holding mmu_lock for write, so this synchronization is unnecessary. vCPUs are kicked off only after the successful execution of TDH.MEM.TRACK, eliminating the need for vCPUs to wait for TDH.MEM.TRACK completion after being kicked off. tdx_track() is therefore able to send KVM_REQ_OUTSIDE_GUEST_MODE requests rather than KVM_REQ_TLB_FLUSH.
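Reviewer note: the ordering described above (retry TDH.MEM.TRACK while it reports BUSY, then kick all vCPUs) can be sketched in user space as follows. This is a minimal model under stated assumptions, not the tdx_track() added by this patch. struct sketch_td, fake_tdh_mem_track() and fake_kick_all_vcpus() are hypothetical stand-ins, and the numeric values given for TDX_SEAMCALL_STATUS_MASK and TDX_OPERAND_BUSY are assumptions.

```c
#include <stdint.h>

typedef uint64_t u64;

/* Assumptions: status encoding modelled on the TDX_* names used in this series. */
#define TDX_SEAMCALL_STATUS_MASK  0xFFFFFFFF00000000ULL
#define TDX_OPERAND_BUSY          0x8000020000000000ULL

/* Hypothetical per-TD state for the sketch. */
struct sketch_td {
	u64 epoch;                    /* models the TD epoch */
	unsigned int busy_left;       /* how many times the stub reports BUSY */
	unsigned int track_attempts;  /* number of TDH.MEM.TRACK attempts */
	unsigned int kicks;           /* number of "kick all vCPUs" rounds */
	u64 epoch_at_kick;            /* epoch observed when the kick happened */
};

/* Hypothetical stand-in for tdh_mem_track(): BUSY a few times, then success. */
static u64 fake_tdh_mem_track(struct sketch_td *td)
{
	td->track_attempts++;
	if (td->busy_left) {
		td->busy_left--;
		return TDX_OPERAND_BUSY | 0x1;  /* low bits carry operand details */
	}
	td->epoch++;
	return 0;
}

/* Hypothetical stand-in for kvm_make_all_cpus_request(). */
static void fake_kick_all_vcpus(struct sketch_td *td)
{
	td->kicks++;
	td->epoch_at_kick = td->epoch;
}

/* Retry the epoch bump while BUSY, and only then kick the vCPUs. */
static u64 sketch_tdx_track(struct sketch_td *td)
{
	u64 err;

	do {
		err = fake_tdh_mem_track(td);
	} while ((err & TDX_SEAMCALL_STATUS_MASK) == TDX_OPERAND_BUSY);

	fake_kick_all_vcpus(td);
	return err;
}
```

Because the kick always follows the epoch bump, a kicked vCPU has nothing to wait for, which is the argument the changelog makes for dropping the earlier atomic counter.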
Hooks for flush_remote_tlb and flush_remote_tlbs_range are not necessary for TDX, as tdx_track() will handle TLB tracking of private memory on a page-by-page basis when private guest pages are removed. There is no need to invoke tdx_track() again in kvm_flush_remote_tlbs() even after changes to the mirrored page table.

For hooks flush_tlb_current and flush_tlb_all, which are invoked during kvm_mmu_load() and vcpu load for normal VMs, let the VMM flush all EPTs in the two hooks for simplicity, since TDX does not depend on the two hooks to notify the TDX module to flush private EPT in those cases.

Signed-off-by: Isaku Yamahata
Co-developed-by: Rick Edgecombe
Signed-off-by: Rick Edgecombe
Co-developed-by: Yan Zhao
Signed-off-by: Yan Zhao
---
TDX MMU part 2 v2:
- No need for is_td_finalized() (Rick)
- Fixup SEAMCALL call sites due to function parameter changes to SEAMCALL wrappers (Kai)
- Add TD state handling (Tony)
- Added tdx_flush_tlb_all() (Paolo)
- Re-explained the reason to do global invalidation in tdx_flush_tlb_current() (Yan)

TDX MMU part 2 v1:
- Split from the big patch "KVM: TDX: TDP MMU TDX support".
- Modification of synchronization mechanism in tdx_track().
- Dropped hooks flush_remote_tlb and flush_remote_tlbs_range.
- Let the VMM flush all EPTs in hooks flush_tlb_all and flush_tlb_current.
- Dropped KVM_BUG_ON() in vt_flush_tlb_gva().
(Rick) --- arch/x86/kvm/vmx/main.c | 44 +++++++++++++++++++-- arch/x86/kvm/vmx/tdx.c | 81 ++++++++++++++++++++++++++++++++++++++ arch/x86/kvm/vmx/x86_ops.h | 4 ++ 3 files changed, 125 insertions(+), 4 deletions(-) diff --git a/arch/x86/kvm/vmx/main.c b/arch/x86/kvm/vmx/main.c index a34c0bebe1c3..4902d7bb86f3 100644 --- a/arch/x86/kvm/vmx/main.c +++ b/arch/x86/kvm/vmx/main.c @@ -99,6 +99,42 @@ static void vt_vcpu_reset(struct kvm_vcpu *vcpu, bool init_event) vmx_vcpu_reset(vcpu, init_event); } +static void vt_flush_tlb_all(struct kvm_vcpu *vcpu) +{ + if (is_td_vcpu(vcpu)) { + tdx_flush_tlb_all(vcpu); + return; + } + + vmx_flush_tlb_all(vcpu); +} + +static void vt_flush_tlb_current(struct kvm_vcpu *vcpu) +{ + if (is_td_vcpu(vcpu)) { + tdx_flush_tlb_current(vcpu); + return; + } + + vmx_flush_tlb_current(vcpu); +} + +static void vt_flush_tlb_gva(struct kvm_vcpu *vcpu, gva_t addr) +{ + if (is_td_vcpu(vcpu)) + return; + + vmx_flush_tlb_gva(vcpu, addr); +} + +static void vt_flush_tlb_guest(struct kvm_vcpu *vcpu) +{ + if (is_td_vcpu(vcpu)) + return; + + vmx_flush_tlb_guest(vcpu); +} + static void vt_load_mmu_pgd(struct kvm_vcpu *vcpu, hpa_t root_hpa, int pgd_level) { @@ -188,10 +224,10 @@ struct kvm_x86_ops vt_x86_ops __initdata = { .set_rflags = vmx_set_rflags, .get_if_flag = vmx_get_if_flag, - .flush_tlb_all = vmx_flush_tlb_all, - .flush_tlb_current = vmx_flush_tlb_current, - .flush_tlb_gva = vmx_flush_tlb_gva, - .flush_tlb_guest = vmx_flush_tlb_guest, + .flush_tlb_all = vt_flush_tlb_all, + .flush_tlb_current = vt_flush_tlb_current, + .flush_tlb_gva = vt_flush_tlb_gva, + .flush_tlb_guest = vt_flush_tlb_guest, .vcpu_pre_run = vmx_vcpu_pre_run, .vcpu_run = vmx_vcpu_run, diff --git a/arch/x86/kvm/vmx/tdx.c b/arch/x86/kvm/vmx/tdx.c index 37696adb574c..9eef361c8e57 100644 --- a/arch/x86/kvm/vmx/tdx.c +++ b/arch/x86/kvm/vmx/tdx.c @@ -5,6 +5,7 @@ #include "mmu.h" #include "x86_ops.h" #include "tdx.h" +#include "vmx.h" #include "mmu/spte.h" #undef pr_fmt @@ -523,6 +524,51 @@ 
void tdx_load_mmu_pgd(struct kvm_vcpu *vcpu, hpa_t root_hpa, int pgd_level) td_vmcs_write64(to_tdx(vcpu), SHARED_EPT_POINTER, root_hpa); } +/* + * Ensure shared and private EPTs to be flushed on all vCPUs. + * tdh_mem_track() is the only caller that increases TD epoch. An increase in + * the TD epoch (e.g., to value "N + 1") is successful only if no vCPUs are + * running in guest mode with the value "N - 1". + * + * A successful execution of tdh_mem_track() ensures that vCPUs can only run in + * guest mode with TD epoch value "N" if no TD exit occurs after the TD epoch + * being increased to "N + 1". + * + * Kicking off all vCPUs after that further results in no vCPUs can run in guest + * mode with TD epoch value "N", which unblocks the next tdh_mem_track() (e.g. + * to increase TD epoch to "N + 2"). + * + * TDX module will flush EPT on the next TD enter and make vCPUs to run in + * guest mode with TD epoch value "N + 1". + * + * kvm_make_all_cpus_request() guarantees all vCPUs are out of guest mode by + * waiting empty IPI handler ack_kick(). + * + * No action is required to the vCPUs being kicked off since the kicking off + * occurs certainly after TD epoch increment and before the next + * tdh_mem_track(). + */ +static void __always_unused tdx_track(struct kvm *kvm) +{ + struct kvm_tdx *kvm_tdx = to_kvm_tdx(kvm); + u64 err; + + /* If TD isn't finalized, it's before any vcpu running. 
*/ + if (unlikely(kvm_tdx->state != TD_STATE_RUNNABLE)) + return; + + lockdep_assert_held_write(&kvm->mmu_lock); + + do { + err = tdh_mem_track(kvm_tdx->tdr_pa); + } while (unlikely((err & TDX_SEAMCALL_STATUS_MASK) == TDX_OPERAND_BUSY)); + + if (KVM_BUG_ON(err, kvm)) + pr_tdx_error(TDH_MEM_TRACK, err); + + kvm_make_all_cpus_request(kvm, KVM_REQ_OUTSIDE_GUEST_MODE); +} + static int tdx_get_capabilities(struct kvm_tdx_cmd *cmd) { const struct tdx_sys_info_td_conf *td_conf = &tdx_sysinfo->td_conf; @@ -1068,6 +1114,41 @@ static int tdx_td_init(struct kvm *kvm, struct kvm_tdx_cmd *cmd) return ret; } +void tdx_flush_tlb_current(struct kvm_vcpu *vcpu) +{ + /* + * flush_tlb_current() is invoked when the first time for the vcpu to + * run or when root of shared EPT is invalidated. + * KVM only needs to flush shared EPT because the TDX module handles TLB + * invalidation for private EPT in tdh_vp_enter(); + * + * A single context invalidation for shared EPT can be performed here. + * However, this single context invalidation requires the private EPTP + * rather than the shared EPTP to flush shared EPT, as shared EPT uses + * private EPTP as its ASID for TLB invalidation. + * + * To avoid reading back private EPTP, perform a global invalidation for + * shared EPT instead to keep this function simple. + */ + ept_sync_global(); +} + +void tdx_flush_tlb_all(struct kvm_vcpu *vcpu) +{ + /* + * TDX has called tdx_track() in tdx_sept_remove_private_spte() to + * ensure that private EPT will be flushed on the next TD enter. No need + * to call tdx_track() here again even when this callback is a result of + * zapping private EPT. + * + * Due to the lack of the context to determine which EPT has been + * affected by zapping, invoke invept() directly here for both shared + * EPT and private EPT for simplicity, though it's not necessary for + * private EPT. 
+ */ + ept_sync_global(); +} + int tdx_vm_ioctl(struct kvm *kvm, void __user *argp) { struct kvm_tdx_cmd tdx_cmd; diff --git a/arch/x86/kvm/vmx/x86_ops.h b/arch/x86/kvm/vmx/x86_ops.h index f49135094c94..7151ac38bc31 100644 --- a/arch/x86/kvm/vmx/x86_ops.h +++ b/arch/x86/kvm/vmx/x86_ops.h @@ -130,6 +130,8 @@ void tdx_vcpu_free(struct kvm_vcpu *vcpu); int tdx_vcpu_ioctl(struct kvm_vcpu *vcpu, void __user *argp); +void tdx_flush_tlb_current(struct kvm_vcpu *vcpu); +void tdx_flush_tlb_all(struct kvm_vcpu *vcpu); void tdx_load_mmu_pgd(struct kvm_vcpu *vcpu, hpa_t root_hpa, int root_level); #else static inline int tdx_vm_init(struct kvm *kvm) { return -EOPNOTSUPP; } @@ -143,6 +145,8 @@ static inline void tdx_vcpu_free(struct kvm_vcpu *vcpu) {} static inline int tdx_vcpu_ioctl(struct kvm_vcpu *vcpu, void __user *argp) { return -EOPNOTSUPP; } +static inline void tdx_flush_tlb_current(struct kvm_vcpu *vcpu) {} +static inline void tdx_flush_tlb_all(struct kvm_vcpu *vcpu) {} static inline void tdx_load_mmu_pgd(struct kvm_vcpu *vcpu, hpa_t root_hpa, int root_level) {} #endif From patchwork Tue Nov 12 07:38:04 2024 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Yan Zhao X-Patchwork-Id: 13871818 Received: from mgamail.intel.com (mgamail.intel.com [192.198.163.17]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 555FE2139AF; Tue, 12 Nov 2024 07:40:35 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=192.198.163.17 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1731397237; cv=none; b=slXu1hbgAmgqsPkGe9CbuWRwdv1ODJi0HONe/9eRoZMx0+cpYEXmQ8/jw23vowyJxxqirUkKIEXwTl/5VUne3LyUGPPZ8taxmQZ/54bkHxJwSaVb2eN+z6Rg/bgJW4gYfWklcyco5WGA9P8mFCVE3fTuKKhEDi1EZq/ufmmg4kA= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1731397237; 
X-ExtLoop1: 1 X-IronPort-AV: E=Sophos;i="6.12,147,1728975600"; d="scan'208";a="92089590" Received: from yzhao56-desk.sh.intel.com ([10.239.159.62]) by fmviesa004-auth.fm.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 11 Nov 2024 23:40:31 -0800 From: Yan Zhao To: pbonzini@redhat.com, seanjc@google.com, kvm@vger.kernel.org, dave.hansen@linux.intel.com Cc: rick.p.edgecombe@intel.com, kai.huang@intel.com, adrian.hunter@intel.com, reinette.chatre@intel.com, xiaoyao.li@intel.com, tony.lindgren@intel.com, binbin.wu@linux.intel.com, dmatlack@google.com, isaku.yamahata@intel.com, isaku.yamahata@gmail.com, nik.borisov@suse.com, linux-kernel@vger.kernel.org, x86@kernel.org Subject: [PATCH v2 18/24] KVM: TDX: Implement hooks to propagate changes of TDP MMU mirror page table Date: Tue, 12 Nov 2024 15:38:04 +0800 Message-ID: <20241112073804.22242-1-yan.y.zhao@intel.com> X-Mailer: git-send-email 2.43.2 In-Reply-To: <20241112073327.21979-1-yan.y.zhao@intel.com> References: <20241112073327.21979-1-yan.y.zhao@intel.com> Precedence: bulk X-Mailing-List: kvm@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 From: Isaku Yamahata Implement hooks in TDX to propagate changes of mirror page table to private EPT, including changes for page table page adding/removing, guest page adding/removing. TDX invokes corresponding SEAMCALLs in the hooks. - Hook link_external_spt propagates adding page table page into private EPT. - Hook set_external_spte tdx_sept_set_private_spte() in this patch only handles adding of guest private page when TD is finalized. Later patches will handle the case of adding guest private pages before TD finalization. - Hook free_external_spt It is invoked when page table page is removed in mirror page table, which currently must occur at TD tear down phase, after hkid is freed. - Hook remove_external_spte It is invoked when guest private page is removed in mirror page table, which can occur when TD is active, e.g. 
  during shared <-> private conversion and slot move/deletion.
  This hook is guaranteed to be triggered before the hkid is freed, because
  the gmem fd is released and all private leaf mappings are zapped before
  the hkid is freed at VM destroy.

  TDX invokes the following SEAMCALLs sequentially:
  1) TDH.MEM.RANGE.BLOCK (removes RWX bits from a private EPT entry),
  2) TDH.MEM.TRACK (increases the TD epoch),
  3) TDH.MEM.PAGE.REMOVE (removes the private EPT entry and untracks the
     guest page).

  TDH.MEM.PAGE.REMOVE can't succeed unless TDH.MEM.RANGE.BLOCK and
  TDH.MEM.TRACK have been called successfully.
  SEAMCALL TDH.MEM.TRACK is called in function tdx_track() to ensure that
  TLB tracking is performed by the TDX module for the private EPT.

Signed-off-by: Isaku Yamahata
Co-developed-by: Rick Edgecombe
Signed-off-by: Rick Edgecombe
Co-developed-by: Yan Zhao
Signed-off-by: Yan Zhao
---
TDX MMU part 2 v2:
 - No need for is_td_finalized() (Rick)
 - Fixup SEAMCALL call sites due to function parameter changes to SEAMCALL
   wrappers (Kai)
 - Add TD state handling (Tony)
 - Fix "KVM_MAP_MEMORY" comment (Binbin)
 - Updated comment of KVM_BUG_ON() in tdx_sept_remove_private_spte()
   (Kai, Rick)
 - Return -EBUSY on busy in tdx_mem_page_aug(). (Kai)
 - Retry tdh_mem_page_aug() on TDX_OPERAND_BUSY instead of
   TDX_ERROR_SEPT_BUSY. (Yan)

TDX MMU part 2 v1:
 - Split from the big patch "KVM: TDX: TDP MMU TDX support".
 - Move setting up the 4 callbacks (kvm_x86_ops::link_external_spt etc)
   from tdx_hardware_setup() (which doesn't exist anymore) to
   vt_hardware_setup() directly. Make those 4 callbacks
   (tdx_sept_link_private_spt() etc) global and add declarations to
   x86_ops.h so they can be set up in vt_hardware_setup().
 - Updated the KVM_BUG_ON() in tdx_sept_free_private_spt(). (Isaku, Binbin)
 - Removed the unused tdx_post_mmu_map_page().
 - Removed WARN_ON_ONCE() in tdh_mem_page_aug() according to Isaku's
   feedback:
   "This WARN_ON_ONCE() is a guard for buggy TDX module.
It shouldn't return (TDX_EPT_ENTRY_STATE_INCORRECT | TDX_OPERAND_ID_RCX)) when SEPT_VE_DISABLED cleared. Maybe we should remove this WARN_ON_ONCE() because the TDX module is mature." - Update for the wrapper functions for SEAMCALLs. (Sean) - Add preparation for KVM_TDX_INIT_MEM_REGION to make tdx_sept_set_private_spte() callback nop when the guest isn't finalized. - use unlikely(err) in tdx_reclaim_td_page(). - Updates from seamcall overhaul (Kai) - Move header definitions from "KVM: TDX: Define TDX architectural definitions" (Sean) - Drop ugly unions (Sean) - Remove tdx_mng_key_config_lock cleanup after dropped in "KVM: TDX: create/destroy VM structure" (Chao) - Since HKID is freed on vm_destroy() zapping only happens when HKID is allocated. Remove relevant code in zapping handlers that assume the opposite, and add some KVM_BUG_ON() to assert this where it was missing. (Isaku) --- arch/x86/kvm/vmx/main.c | 14 ++- arch/x86/kvm/vmx/tdx.c | 219 +++++++++++++++++++++++++++++++++++- arch/x86/kvm/vmx/tdx_arch.h | 23 ++++ arch/x86/kvm/vmx/x86_ops.h | 37 ++++++ 4 files changed, 291 insertions(+), 2 deletions(-) diff --git a/arch/x86/kvm/vmx/main.c b/arch/x86/kvm/vmx/main.c index 4902d7bb86f3..cb41e9a1d3e3 100644 --- a/arch/x86/kvm/vmx/main.c +++ b/arch/x86/kvm/vmx/main.c @@ -36,9 +36,21 @@ static __init int vt_hardware_setup(void) * is KVM may allocate couple of more bytes than needed for * each VM. */ - if (enable_tdx) + if (enable_tdx) { vt_x86_ops.vm_size = max_t(unsigned int, vt_x86_ops.vm_size, sizeof(struct kvm_tdx)); + /* + * Note, TDX may fail to initialize in a later time in + * vt_init(), in which case it is not necessary to setup + * those callbacks. But making them valid here even + * when TDX fails to init later is fine because those + * callbacks won't be called if the VM isn't TDX guest. 
+ */ + vt_x86_ops.link_external_spt = tdx_sept_link_private_spt; + vt_x86_ops.set_external_spte = tdx_sept_set_private_spte; + vt_x86_ops.free_external_spt = tdx_sept_free_private_spt; + vt_x86_ops.remove_external_spte = tdx_sept_remove_private_spte; + } return 0; } diff --git a/arch/x86/kvm/vmx/tdx.c b/arch/x86/kvm/vmx/tdx.c index 9eef361c8e57..29f01cff0e6b 100644 --- a/arch/x86/kvm/vmx/tdx.c +++ b/arch/x86/kvm/vmx/tdx.c @@ -142,6 +142,14 @@ static DEFINE_MUTEX(tdx_lock); static atomic_t nr_configured_hkid; +#define TDX_ERROR_SEPT_BUSY (TDX_OPERAND_BUSY | TDX_OPERAND_ID_SEPT) + +static inline int pg_level_to_tdx_sept_level(enum pg_level level) +{ + WARN_ON_ONCE(level == PG_LEVEL_NONE); + return level - 1; +} + /* Maximum number of retries to attempt for SEAMCALLs. */ #define TDX_SEAMCALL_RETRIES 10000 @@ -524,6 +532,166 @@ void tdx_load_mmu_pgd(struct kvm_vcpu *vcpu, hpa_t root_hpa, int pgd_level) td_vmcs_write64(to_tdx(vcpu), SHARED_EPT_POINTER, root_hpa); } +static void tdx_unpin(struct kvm *kvm, kvm_pfn_t pfn) +{ + put_page(pfn_to_page(pfn)); +} + +static int tdx_mem_page_aug(struct kvm *kvm, gfn_t gfn, + enum pg_level level, kvm_pfn_t pfn) +{ + struct kvm_tdx *kvm_tdx = to_kvm_tdx(kvm); + hpa_t hpa = pfn_to_hpa(pfn); + gpa_t gpa = gfn_to_gpa(gfn); + u64 entry, level_state; + u64 err; + + err = tdh_mem_page_aug(kvm_tdx->tdr_pa, gpa, hpa, &entry, &level_state); + if (unlikely(err & TDX_OPERAND_BUSY)) { + tdx_unpin(kvm, pfn); + return -EBUSY; + } + + if (KVM_BUG_ON(err, kvm)) { + pr_tdx_error_2(TDH_MEM_PAGE_AUG, err, entry, level_state); + tdx_unpin(kvm, pfn); + return -EIO; + } + + return 0; +} + +int tdx_sept_set_private_spte(struct kvm *kvm, gfn_t gfn, + enum pg_level level, kvm_pfn_t pfn) +{ + struct kvm_tdx *kvm_tdx = to_kvm_tdx(kvm); + + /* TODO: handle large pages. 
*/ + if (KVM_BUG_ON(level != PG_LEVEL_4K, kvm)) + return -EINVAL; + + /* + * Because guest_memfd doesn't support page migration with + * a_ops->migrate_folio (yet), no callback is triggered for KVM on page + * migration. Until guest_memfd supports page migration, prevent page + * migration. + * TODO: Once guest_memfd introduces callback on page migration, + * implement it and remove get_page/put_page(). + */ + get_page(pfn_to_page(pfn)); + + if (likely(kvm_tdx->state == TD_STATE_RUNNABLE)) + return tdx_mem_page_aug(kvm, gfn, level, pfn); + + /* + * TODO: KVM_TDX_INIT_MEM_REGION support to populate before finalize + * comes here for the initial memory. + */ + return -EOPNOTSUPP; +} + +static int tdx_sept_drop_private_spte(struct kvm *kvm, gfn_t gfn, + enum pg_level level, kvm_pfn_t pfn) +{ + int tdx_level = pg_level_to_tdx_sept_level(level); + struct kvm_tdx *kvm_tdx = to_kvm_tdx(kvm); + gpa_t gpa = gfn_to_gpa(gfn); + hpa_t hpa = pfn_to_hpa(pfn); + u64 err, entry, level_state; + + /* TODO: handle large pages. */ + if (KVM_BUG_ON(level != PG_LEVEL_4K, kvm)) + return -EINVAL; + + if (KVM_BUG_ON(!is_hkid_assigned(kvm_tdx), kvm)) + return -EINVAL; + + do { + /* + * When zapping private page, write lock is held. So no race + * condition with other vcpu sept operation. Race only with + * TDH.VP.ENTER. + */ + err = tdh_mem_page_remove(kvm_tdx->tdr_pa, gpa, tdx_level, &entry, + &level_state); + } while (unlikely(err == TDX_ERROR_SEPT_BUSY)); + + if (unlikely(kvm_tdx->state != TD_STATE_RUNNABLE && + err == (TDX_EPT_WALK_FAILED | TDX_OPERAND_ID_RCX))) { + /* + * This page was mapped with KVM_MAP_MEMORY, but + * KVM_TDX_INIT_MEM_REGION is not issued yet. + */ + if (!is_last_spte(entry, level) || !(entry & VMX_EPT_RWX_MASK)) { + tdx_unpin(kvm, pfn); + return 0; + } + } + + if (KVM_BUG_ON(err, kvm)) { + pr_tdx_error_2(TDH_MEM_PAGE_REMOVE, err, entry, level_state); + return -EIO; + } + + do { + /* + * TDX_OPERAND_BUSY can happen on locking PAMT entry. 
Because + * this page was removed above, other thread shouldn't be + * repeatedly operating on this page. Just retry loop. + */ + err = tdh_phymem_page_wbinvd_hkid(hpa, (u16)kvm_tdx->hkid); + } while (unlikely(err == (TDX_OPERAND_BUSY | TDX_OPERAND_ID_RCX))); + + if (KVM_BUG_ON(err, kvm)) { + pr_tdx_error(TDH_PHYMEM_PAGE_WBINVD, err); + return -EIO; + } + tdx_clear_page(hpa); + tdx_unpin(kvm, pfn); + return 0; +} + +int tdx_sept_link_private_spt(struct kvm *kvm, gfn_t gfn, + enum pg_level level, void *private_spt) +{ + int tdx_level = pg_level_to_tdx_sept_level(level); + gpa_t gpa = gfn_to_gpa(gfn); + hpa_t hpa = __pa(private_spt); + u64 err, entry, level_state; + + err = tdh_mem_sept_add(to_kvm_tdx(kvm)->tdr_pa, gpa, tdx_level, hpa, &entry, + &level_state); + if (unlikely(err == TDX_ERROR_SEPT_BUSY)) + return -EAGAIN; + if (KVM_BUG_ON(err, kvm)) { + pr_tdx_error_2(TDH_MEM_SEPT_ADD, err, entry, level_state); + return -EIO; + } + + return 0; +} + +static int tdx_sept_zap_private_spte(struct kvm *kvm, gfn_t gfn, + enum pg_level level) +{ + int tdx_level = pg_level_to_tdx_sept_level(level); + struct kvm_tdx *kvm_tdx = to_kvm_tdx(kvm); + gpa_t gpa = gfn_to_gpa(gfn) & KVM_HPAGE_MASK(level); + u64 err, entry, level_state; + + /* For now large page isn't supported yet. */ + WARN_ON_ONCE(level != PG_LEVEL_4K); + + err = tdh_mem_range_block(kvm_tdx->tdr_pa, gpa, tdx_level, &entry, &level_state); + if (unlikely(err == TDX_ERROR_SEPT_BUSY)) + return -EAGAIN; + if (KVM_BUG_ON(err, kvm)) { + pr_tdx_error_2(TDH_MEM_RANGE_BLOCK, err, entry, level_state); + return -EIO; + } + return 0; +} + /* * Ensure shared and private EPTs to be flushed on all vCPUs. * tdh_mem_track() is the only caller that increases TD epoch. An increase in @@ -548,7 +716,7 @@ void tdx_load_mmu_pgd(struct kvm_vcpu *vcpu, hpa_t root_hpa, int pgd_level) * occurs certainly after TD epoch increment and before the next * tdh_mem_track(). 
*/ -static void __always_unused tdx_track(struct kvm *kvm) +static void tdx_track(struct kvm *kvm) { struct kvm_tdx *kvm_tdx = to_kvm_tdx(kvm); u64 err; @@ -569,6 +737,55 @@ static void __always_unused tdx_track(struct kvm *kvm) kvm_make_all_cpus_request(kvm, KVM_REQ_OUTSIDE_GUEST_MODE); } +int tdx_sept_free_private_spt(struct kvm *kvm, gfn_t gfn, + enum pg_level level, void *private_spt) +{ + struct kvm_tdx *kvm_tdx = to_kvm_tdx(kvm); + + /* + * free_external_spt() is only called after hkid is freed when TD is + * tearing down. + * KVM doesn't (yet) zap page table pages in mirror page table while + * TD is active, though guest pages mapped in mirror page table could be + * zapped during TD is active, e.g. for shared <-> private conversion + * and slot move/deletion. + */ + if (KVM_BUG_ON(is_hkid_assigned(kvm_tdx), kvm)) + return -EINVAL; + + /* + * The HKID assigned to this TD was already freed and cache was + * already flushed. We don't have to flush again. + */ + return tdx_reclaim_page(__pa(private_spt)); +} + +int tdx_sept_remove_private_spte(struct kvm *kvm, gfn_t gfn, + enum pg_level level, kvm_pfn_t pfn) +{ + int ret; + + /* + * HKID is released after all private pages have been removed, and set + * before any might be populated. Warn if zapping is attempted when + * there can't be anything populated in the private EPT. + */ + if (KVM_BUG_ON(!is_hkid_assigned(to_kvm_tdx(kvm)), kvm)) + return -EINVAL; + + ret = tdx_sept_zap_private_spte(kvm, gfn, level); + if (ret) + return ret; + + /* + * TDX requires TLB tracking before dropping private page. Do + * it here, although it is also done later. 
+ */ + tdx_track(kvm); + + return tdx_sept_drop_private_spte(kvm, gfn, level, pfn); +} + static int tdx_get_capabilities(struct kvm_tdx_cmd *cmd) { const struct tdx_sys_info_td_conf *td_conf = &tdx_sysinfo->td_conf; diff --git a/arch/x86/kvm/vmx/tdx_arch.h b/arch/x86/kvm/vmx/tdx_arch.h index d80ec118834e..289728f1611f 100644 --- a/arch/x86/kvm/vmx/tdx_arch.h +++ b/arch/x86/kvm/vmx/tdx_arch.h @@ -155,6 +155,29 @@ struct td_params { #define TDX_MIN_TSC_FREQUENCY_KHZ (100 * 1000) #define TDX_MAX_TSC_FREQUENCY_KHZ (10 * 1000 * 1000) +/* Additional Secure EPT entry information */ +#define TDX_SEPT_LEVEL_MASK GENMASK_ULL(2, 0) +#define TDX_SEPT_STATE_MASK GENMASK_ULL(15, 8) +#define TDX_SEPT_STATE_SHIFT 8 + +enum tdx_sept_entry_state { + TDX_SEPT_FREE = 0, + TDX_SEPT_BLOCKED = 1, + TDX_SEPT_PENDING = 2, + TDX_SEPT_PENDING_BLOCKED = 3, + TDX_SEPT_PRESENT = 4, +}; + +static inline u8 tdx_get_sept_level(u64 sept_entry_info) +{ + return sept_entry_info & TDX_SEPT_LEVEL_MASK; +} + +static inline u8 tdx_get_sept_state(u64 sept_entry_info) +{ + return (sept_entry_info & TDX_SEPT_STATE_MASK) >> TDX_SEPT_STATE_SHIFT; +} + #define MD_FIELD_ID_FEATURES0_TOPOLOGY_ENUM BIT_ULL(20) /* diff --git a/arch/x86/kvm/vmx/x86_ops.h b/arch/x86/kvm/vmx/x86_ops.h index 7151ac38bc31..3e7e7d0eadbf 100644 --- a/arch/x86/kvm/vmx/x86_ops.h +++ b/arch/x86/kvm/vmx/x86_ops.h @@ -130,6 +130,15 @@ void tdx_vcpu_free(struct kvm_vcpu *vcpu); int tdx_vcpu_ioctl(struct kvm_vcpu *vcpu, void __user *argp); +int tdx_sept_link_private_spt(struct kvm *kvm, gfn_t gfn, + enum pg_level level, void *private_spt); +int tdx_sept_free_private_spt(struct kvm *kvm, gfn_t gfn, + enum pg_level level, void *private_spt); +int tdx_sept_set_private_spte(struct kvm *kvm, gfn_t gfn, + enum pg_level level, kvm_pfn_t pfn); +int tdx_sept_remove_private_spte(struct kvm *kvm, gfn_t gfn, + enum pg_level level, kvm_pfn_t pfn); + void tdx_flush_tlb_current(struct kvm_vcpu *vcpu); void tdx_flush_tlb_all(struct kvm_vcpu *vcpu); void 
tdx_load_mmu_pgd(struct kvm_vcpu *vcpu, hpa_t root_hpa, int root_level); @@ -145,6 +154,34 @@ static inline void tdx_vcpu_free(struct kvm_vcpu *vcpu) {} static inline int tdx_vcpu_ioctl(struct kvm_vcpu *vcpu, void __user *argp) { return -EOPNOTSUPP; } +static inline int tdx_sept_link_private_spt(struct kvm *kvm, gfn_t gfn, + enum pg_level level, + void *private_spt) +{ + return -EOPNOTSUPP; +} + +static inline int tdx_sept_free_private_spt(struct kvm *kvm, gfn_t gfn, + enum pg_level level, + void *private_spt) +{ + return -EOPNOTSUPP; +} + +static inline int tdx_sept_set_private_spte(struct kvm *kvm, gfn_t gfn, + enum pg_level level, + kvm_pfn_t pfn) +{ + return -EOPNOTSUPP; +} + +static inline int tdx_sept_remove_private_spte(struct kvm *kvm, gfn_t gfn, + enum pg_level level, + kvm_pfn_t pfn) +{ + return -EOPNOTSUPP; +} + static inline void tdx_flush_tlb_current(struct kvm_vcpu *vcpu) {} static inline void tdx_flush_tlb_all(struct kvm_vcpu *vcpu) {} static inline void tdx_load_mmu_pgd(struct kvm_vcpu *vcpu, hpa_t root_hpa, int root_level) {} From patchwork Tue Nov 12 07:38:16 2024 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Yan Zhao X-Patchwork-Id: 13871819 Received: from mgamail.intel.com (mgamail.intel.com [192.198.163.10]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id CF32920C03D; Tue, 12 Nov 2024 07:40:47 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=192.198.163.10 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1731397249; cv=none; b=SY9I2D90EOD9VgQAH/hh9MMiGU84tVmxDoakLWOhDr3xn4jcwIi9yZlSW6a1E/88zEMkMSDiOI8QaBBug+C9X2dqNAuilsm9kUNaI/au1+4VgIkXNsQpG3iOL32nxBP+g0kxc8aaGE/5y5m7EJh6TgIHtM5t9DfFUwXQsRtIZDY= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1731397249; c=relaxed/simple; 
X-IronPort-AV: E=Sophos;i="6.12,147,1728975600"; d="scan'208";a="87427089"
Received: from yzhao56-desk.sh.intel.com ([10.239.159.62]) by fmviesa008-auth.fm.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 11 Nov 2024 23:40:43 -0800
From: Yan Zhao
To: pbonzini@redhat.com, seanjc@google.com, kvm@vger.kernel.org, dave.hansen@linux.intel.com
Cc: rick.p.edgecombe@intel.com, kai.huang@intel.com, adrian.hunter@intel.com, reinette.chatre@intel.com, xiaoyao.li@intel.com, tony.lindgren@intel.com, binbin.wu@linux.intel.com, dmatlack@google.com, isaku.yamahata@intel.com, isaku.yamahata@gmail.com, nik.borisov@suse.com, linux-kernel@vger.kernel.org, x86@kernel.org
Subject: [PATCH v2 19/24] KVM: TDX: Implement hook to get max mapping level of private pages
Date: Tue, 12 Nov 2024 15:38:16 +0800
Message-ID: <20241112073816.22256-1-yan.y.zhao@intel.com>
X-Mailer: git-send-email 2.43.2
In-Reply-To: <20241112073327.21979-1-yan.y.zhao@intel.com>
References: <20241112073327.21979-1-yan.y.zhao@intel.com>
Precedence: bulk
X-Mailing-List: kvm@vger.kernel.org
List-Id:
List-Subscribe:
List-Unsubscribe:
MIME-Version: 1.0

From: Isaku Yamahata

Implement hook private_max_mapping_level for TDX to let the TDP MMU core
get the max mapping level of private pages.

The value is hard-coded to 4K for now, as huge pages are not yet supported.

Signed-off-by: Isaku Yamahata
Co-developed-by: Rick Edgecombe
Signed-off-by: Rick Edgecombe
Co-developed-by: Yan Zhao
Signed-off-by: Yan Zhao
Reviewed-by: Paolo Bonzini
---
TDX MMU part 2 v2:
 - Added Paolo's rb.

TDX MMU part 2 v1:
 - Split from the big patch "KVM: TDX: TDP MMU TDX support".
 - Fix missing tdx_gmem_private_max_mapping_level() implementation for
   !CONFIG_INTEL_TDX_HOST

v19:
 - Use gmem_max_level callback, delete tdp_max_page_level.
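
Illustration only, not part of the patch: a minimal userspace sketch of how
the hook's return value might be consumed. It assumes that the MMU core
treats a return value of 0 as "no vendor limit" and otherwise clamps the
mapping level to the returned value, which matches the wrapper in this patch
returning 0 for non-TD VMs. The enum is redefined locally (first four values
only) so the sketch builds outside the kernel; tdx_max_level(),
vt_max_level(), clamp_private_mapping_level() and the is_td flag are
hypothetical stand-ins, not kernel symbols.

```c
#include <stdbool.h>

/* Local copy of the first kernel pg_level values, for a userspace build. */
enum pg_level {
	PG_LEVEL_NONE,
	PG_LEVEL_4K,
	PG_LEVEL_2M,
	PG_LEVEL_1G,
};

/* Stand-in for tdx_gmem_private_max_mapping_level(): always 4K for now. */
static int tdx_max_level(void)
{
	return PG_LEVEL_4K;
}

/*
 * Stand-in for vt_gmem_private_max_mapping_level(): only TD guests impose
 * a limit; 0 is assumed to mean "no limit from the vendor code".
 */
static int vt_max_level(bool is_td)
{
	return is_td ? tdx_max_level() : 0;
}

/*
 * Hypothetical caller: clamp the level the MMU would otherwise use to the
 * vendor-provided maximum, if one was returned.
 */
static int clamp_private_mapping_level(bool is_td, int max_level)
{
	int req_max_level = vt_max_level(is_td);

	if (req_max_level && req_max_level < max_level)
		max_level = req_max_level;

	return max_level;
}
```

Under these assumptions a TD guest always ends up with 4K private mappings,
while a non-TD VM keeps whatever level the caller passed in.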
--- arch/x86/kvm/vmx/main.c | 10 ++++++++++ arch/x86/kvm/vmx/tdx.c | 5 +++++ arch/x86/kvm/vmx/x86_ops.h | 2 ++ 3 files changed, 17 insertions(+) diff --git a/arch/x86/kvm/vmx/main.c b/arch/x86/kvm/vmx/main.c index cb41e9a1d3e3..244fb80d385a 100644 --- a/arch/x86/kvm/vmx/main.c +++ b/arch/x86/kvm/vmx/main.c @@ -174,6 +174,14 @@ static int vt_vcpu_mem_enc_ioctl(struct kvm_vcpu *vcpu, void __user *argp) return tdx_vcpu_ioctl(vcpu, argp); } +static int vt_gmem_private_max_mapping_level(struct kvm *kvm, kvm_pfn_t pfn) +{ + if (is_td(kvm)) + return tdx_gmem_private_max_mapping_level(kvm, pfn); + + return 0; +} + #define VMX_REQUIRED_APICV_INHIBITS \ (BIT(APICV_INHIBIT_REASON_DISABLED) | \ BIT(APICV_INHIBIT_REASON_ABSENT) | \ @@ -329,6 +337,8 @@ struct kvm_x86_ops vt_x86_ops __initdata = { .mem_enc_ioctl = vt_mem_enc_ioctl, .vcpu_mem_enc_ioctl = vt_vcpu_mem_enc_ioctl, + + .private_max_mapping_level = vt_gmem_private_max_mapping_level }; struct kvm_x86_init_ops vt_init_ops __initdata = { diff --git a/arch/x86/kvm/vmx/tdx.c b/arch/x86/kvm/vmx/tdx.c index 29f01cff0e6b..ead520083397 100644 --- a/arch/x86/kvm/vmx/tdx.c +++ b/arch/x86/kvm/vmx/tdx.c @@ -1627,6 +1627,11 @@ int tdx_vcpu_ioctl(struct kvm_vcpu *vcpu, void __user *argp) return ret; } +int tdx_gmem_private_max_mapping_level(struct kvm *kvm, kvm_pfn_t pfn) +{ + return PG_LEVEL_4K; +} + static int tdx_online_cpu(unsigned int cpu) { unsigned long flags; diff --git a/arch/x86/kvm/vmx/x86_ops.h b/arch/x86/kvm/vmx/x86_ops.h index 3e7e7d0eadbf..f61daac5f2f0 100644 --- a/arch/x86/kvm/vmx/x86_ops.h +++ b/arch/x86/kvm/vmx/x86_ops.h @@ -142,6 +142,7 @@ int tdx_sept_remove_private_spte(struct kvm *kvm, gfn_t gfn, void tdx_flush_tlb_current(struct kvm_vcpu *vcpu); void tdx_flush_tlb_all(struct kvm_vcpu *vcpu); void tdx_load_mmu_pgd(struct kvm_vcpu *vcpu, hpa_t root_hpa, int root_level); +int tdx_gmem_private_max_mapping_level(struct kvm *kvm, kvm_pfn_t pfn); #else static inline int tdx_vm_init(struct kvm *kvm) { return 
-EOPNOTSUPP; } static inline void tdx_mmu_release_hkid(struct kvm *kvm) {} @@ -185,6 +186,7 @@ static inline int tdx_sept_remove_private_spte(struct kvm *kvm, gfn_t gfn, static inline void tdx_flush_tlb_current(struct kvm_vcpu *vcpu) {} static inline void tdx_flush_tlb_all(struct kvm_vcpu *vcpu) {} static inline void tdx_load_mmu_pgd(struct kvm_vcpu *vcpu, hpa_t root_hpa, int root_level) {} +static inline int tdx_gmem_private_max_mapping_level(struct kvm *kvm, kvm_pfn_t pfn) { return 0; } #endif #endif /* __KVM_X86_VMX_X86_OPS_H */ From patchwork Tue Nov 12 07:38:27 2024 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Yan Zhao X-Patchwork-Id: 13871820 Received: from mgamail.intel.com (mgamail.intel.com [192.198.163.10]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id DE14620E01E; Tue, 12 Nov 2024 07:40:57 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=192.198.163.10 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1731397259; cv=none; b=PS96s7lDRUBNRjWj1IPGbKZ6LPXPguEVkrmPzHTAGLEuAl7R9bm+2Eh+FTJabcOnbRgs/kMaa3eL5saZa6f27DaJhtVYthbF6df4gp62Uews6HZsH6b5hRBcGeSDii1YpLy8POfOeIStK8BhP+RkaBuFCOt38un9H4zpmslfsW0= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1731397259; c=relaxed/simple; bh=bbr8e4XV5c43aR99S/J74bCRTx8fPaKjjBt9L4PASU8=; h=From:To:Cc:Subject:Date:Message-ID:In-Reply-To:References: MIME-Version; b=HT/IVhvIU/aMOX3Nkc816HfbmETTeQEcT+pDowXwAffLweOZ9IeMzlSY5ET0TGp1d+/IEBZZbD5Gh3K5Tl7TUKdK8njzP5VQ0fTyLGCBEgZh/cIYuG8UnOmnuaHIScBa7Uz4HECEn/wNr3Qa+mW+468hSvMk5Whn7nCovujfVig= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=intel.com; spf=pass smtp.mailfrom=intel.com; dkim=pass (2048-bit key) header.d=intel.com header.i=@intel.com 
header.b=VfpXt4Xg; arc=none smtp.client-ip=192.198.163.10 Authentication-Results: smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=intel.com Authentication-Results: smtp.subspace.kernel.org; spf=pass smtp.mailfrom=intel.com Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=intel.com header.i=@intel.com header.b="VfpXt4Xg" DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=intel.com; i=@intel.com; q=dns/txt; s=Intel; t=1731397258; x=1762933258; h=from:to:cc:subject:date:message-id:in-reply-to: references:mime-version:content-transfer-encoding; bh=bbr8e4XV5c43aR99S/J74bCRTx8fPaKjjBt9L4PASU8=; b=VfpXt4Xgb1yTaqdGmiXpGMeOc3kSEKPKEZpUtHAeNQwLFUMBR8mZpeWT 9PpRjKUVY+ygXTzhWUuiMj51liscLLwlDsJQSYXJdtG7l9t6ghuOXSOp3 HJl9YP3djLwtz6A8CyIcGPprYSDslWn/9118HTaoHMgpOt/OwrEA8/LJq mPG4jcmqNi+FOXie4Tmsy+pBOli/3NMjIIOYD0ftoSIfvxzbkxI+B8Cjg TawIc9r/JBK4X/2/pqtoO9OEMpRjHdmoUWgDal0G7vL3nyJjf9jNh6oCl xmf5bmHqzFiSmNBHLYFBYiR7tkSVlXKrCoaAiru1VjoSFMeJ/v/uf6l6l g==; X-CSE-ConnectionGUID: mxOcKTT4RT2K8DsPdHBQAw== X-CSE-MsgGUID: 7lNy3ToFSF25WF9pSAlLfw== X-IronPort-AV: E=McAfee;i="6700,10204,11253"; a="42598925" X-IronPort-AV: E=Sophos;i="6.12,147,1728975600"; d="scan'208";a="42598925" Received: from fmviesa008.fm.intel.com ([10.60.135.148]) by fmvoesa104.fm.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 11 Nov 2024 23:40:58 -0800 X-CSE-ConnectionGUID: QwE7x1WjQH2Jk6kfd6ht+Q== X-CSE-MsgGUID: /rOO1/2vRtugSBUhJaMBhg== X-ExtLoop1: 1 X-IronPort-AV: E=Sophos;i="6.12,147,1728975600"; d="scan'208";a="87427138" Received: from yzhao56-desk.sh.intel.com ([10.239.159.62]) by fmviesa008-auth.fm.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 11 Nov 2024 23:40:54 -0800 From: Yan Zhao To: pbonzini@redhat.com, seanjc@google.com, kvm@vger.kernel.org, dave.hansen@linux.intel.com Cc: rick.p.edgecombe@intel.com, kai.huang@intel.com, adrian.hunter@intel.com, reinette.chatre@intel.com, xiaoyao.li@intel.com, tony.lindgren@intel.com, 
binbin.wu@linux.intel.com, dmatlack@google.com, isaku.yamahata@intel.com, isaku.yamahata@gmail.com, nik.borisov@suse.com, linux-kernel@vger.kernel.org, x86@kernel.org Subject: [PATCH v2 20/24] KVM: x86/mmu: Export kvm_tdp_map_page() Date: Tue, 12 Nov 2024 15:38:27 +0800 Message-ID: <20241112073827.22270-1-yan.y.zhao@intel.com> X-Mailer: git-send-email 2.43.2 In-Reply-To: <20241112073327.21979-1-yan.y.zhao@intel.com> References: <20241112073327.21979-1-yan.y.zhao@intel.com> Precedence: bulk X-Mailing-List: kvm@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 From: Rick Edgecombe In future changes coco specific code will need to call kvm_tdp_map_page() from within their respective gmem_post_populate() callbacks. Export it so this can be done from vendor specific code. Since kvm_mmu_reload() will be needed for this operation, export its callee kvm_mmu_load() as well. Signed-off-by: Rick Edgecombe Signed-off-by: Yan Zhao --- TDX MMU part 2 v2: - Updated patch msg to mention kvm_mmu_load() is kvm_mmu_reload()'s callee (Paolo) TDX MMU part 2 v1: - New patch --- arch/x86/kvm/mmu/mmu.c | 2 ++ 1 file changed, 2 insertions(+) diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c index e2f75c8145fd..7157e87c5e07 100644 --- a/arch/x86/kvm/mmu/mmu.c +++ b/arch/x86/kvm/mmu/mmu.c @@ -4782,6 +4782,7 @@ int kvm_tdp_map_page(struct kvm_vcpu *vcpu, gpa_t gpa, u64 error_code, u8 *level return -EIO; } } +EXPORT_SYMBOL_GPL(kvm_tdp_map_page); long kvm_arch_vcpu_pre_fault_memory(struct kvm_vcpu *vcpu, struct kvm_pre_fault_memory *range) @@ -5805,6 +5806,7 @@ int kvm_mmu_load(struct kvm_vcpu *vcpu) out: return r; } +EXPORT_SYMBOL_GPL(kvm_mmu_load); void kvm_mmu_unload(struct kvm_vcpu *vcpu) { From patchwork Tue Nov 12 07:38:37 2024 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Yan Zhao X-Patchwork-Id: 13871821 Received: from mgamail.intel.com (mgamail.intel.com [192.198.163.10]) (using 
TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 4E5BE2141D3; Tue, 12 Nov 2024 07:41:09 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=192.198.163.10 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1731397271; cv=none; b=mbjnV2dqq0mfGrPm56YZDUHcljs1baYKktozokYsiLhM8A3b0ObdL1gJ24+dqjEa8E6Vn0P6vDiYROnR+Vzof2e0v7LJKzZ9CV2yQxvbwJ318NnmJ/hWO02foyQWDqnd3c+oN1L3afiD1x/Px3Ykj4EUTha1M2J4UjJd6SmkKu8= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1731397271; c=relaxed/simple; bh=07u0JHTlb95NTfYGRNDculJVsJkNEKNN/iPuy0fvY3k=; h=From:To:Cc:Subject:Date:Message-ID:In-Reply-To:References: MIME-Version; b=oShNMhn3Dj4xVOriVQbrW6Qn71DQjIsz8Bzk15J5RpOPAdx2I/pAnhTmalA0E6V5Byh7qmermbMtiNSQ75fMMjxuqm/MIxzb0M698B/e+8wLqXZjnxG5PGM8Mg0B2VtKPDAS2QKa8ivdWVoa8NW1zixhTcHgSA2i8WfMY7NKbN8= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=intel.com; spf=pass smtp.mailfrom=intel.com; dkim=pass (2048-bit key) header.d=intel.com header.i=@intel.com header.b=Q17HO1RV; arc=none smtp.client-ip=192.198.163.10 Authentication-Results: smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=intel.com Authentication-Results: smtp.subspace.kernel.org; spf=pass smtp.mailfrom=intel.com Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=intel.com header.i=@intel.com header.b="Q17HO1RV" DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=intel.com; i=@intel.com; q=dns/txt; s=Intel; t=1731397269; x=1762933269; h=from:to:cc:subject:date:message-id:in-reply-to: references:mime-version:content-transfer-encoding; bh=07u0JHTlb95NTfYGRNDculJVsJkNEKNN/iPuy0fvY3k=; b=Q17HO1RV/H3JWgptbUr586bJ4kYhaVRdhHNZWGNzqxtbqYE+CHT5kL41 r/jmwS04aDZfgN3ff2WmdFvV5pansOc5ymlhuzVOwI6ReAxiOG60ns0Ta 
pN5rJYIhORChoiR/WUbqKx/FlaYkFIvzENgm2gLe3RjM5x84k2LYCTWOQ Va0zP+V2Pb8x1uc40+ZbMhhRWy4UwoFv8SuE6BDgmNnh9fVRLiwjwMmMy 4pIQfjVYc/Zdy3FjpvGlCx0wgm1hxC8M0+HLPM2qIJwypWVasQf1PUw3J CXvi8CyPU87o+MEjvCzmFHi3qzu0zhyJs4sADLYkDgvqJDYDnkeR+ryZp Q==; X-CSE-ConnectionGUID: i3jTT3JFTjSqNdpTfF9R/g== X-CSE-MsgGUID: FHWFxUSORJC8N6DSqYOMDg== X-IronPort-AV: E=McAfee;i="6700,10204,11253"; a="42598967" X-IronPort-AV: E=Sophos;i="6.12,147,1728975600"; d="scan'208";a="42598967" Received: from fmviesa008.fm.intel.com ([10.60.135.148]) by fmvoesa104.fm.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 11 Nov 2024 23:41:08 -0800 X-CSE-ConnectionGUID: tJlB+OgGSL+AEOfbKDYlEA== X-CSE-MsgGUID: Xde3ebUbTPOfEy/HofXmig== X-ExtLoop1: 1 X-IronPort-AV: E=Sophos;i="6.12,147,1728975600"; d="scan'208";a="87427183" Received: from yzhao56-desk.sh.intel.com ([10.239.159.62]) by fmviesa008-auth.fm.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 11 Nov 2024 23:41:04 -0800 From: Yan Zhao To: pbonzini@redhat.com, seanjc@google.com, kvm@vger.kernel.org, dave.hansen@linux.intel.com Cc: rick.p.edgecombe@intel.com, kai.huang@intel.com, adrian.hunter@intel.com, reinette.chatre@intel.com, xiaoyao.li@intel.com, tony.lindgren@intel.com, binbin.wu@linux.intel.com, dmatlack@google.com, isaku.yamahata@intel.com, isaku.yamahata@gmail.com, nik.borisov@suse.com, linux-kernel@vger.kernel.org, x86@kernel.org Subject: [PATCH v2 21/24] KVM: TDX: Add an ioctl to create initial guest memory Date: Tue, 12 Nov 2024 15:38:37 +0800 Message-ID: <20241112073837.22284-1-yan.y.zhao@intel.com> X-Mailer: git-send-email 2.43.2 In-Reply-To: <20241112073327.21979-1-yan.y.zhao@intel.com> References: <20241112073327.21979-1-yan.y.zhao@intel.com> Precedence: bulk X-Mailing-List: kvm@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 From: Isaku Yamahata Add a new ioctl for the user space VMM to initialize guest memory with the specified memory contents. 
Because TDX protects the guest's memory, the creation of the initial guest memory requires a dedicated TDX module API, TDH.MEM.PAGE.ADD(), instead of directly copying the memory contents into the guest's memory as is done for the default VM type.

Define a new subcommand, KVM_TDX_INIT_MEM_REGION, of vCPU-scoped KVM_MEMORY_ENCRYPT_OP. Check if the GFN is already pre-allocated, assign the guest page in Secure-EPT, copy the initial memory contents into the guest memory, and encrypt the guest memory. Optionally, extend the memory measurement of the TDX guest.

The ioctl uses the vCPU file descriptor because of the TDX module's requirement that the memory is added to the S-EPT (via TDH.MEM.SEPT.ADD) prior to initialization (TDH.MEM.PAGE.ADD). Accessing the MMU in turn requires a vCPU file descriptor, just like for KVM_PRE_FAULT_MEMORY. In fact, the post-populate callback is able to reuse the same logic used by KVM_PRE_FAULT_MEMORY, so that userspace can do everything with a single ioctl.

Note that this is the only way to invoke TDH.MEM.SEPT.ADD before the TD is finalized, as userspace cannot use KVM_PRE_FAULT_MEMORY at that point. This ensures that there cannot be pages in the S-EPT awaiting TDH.MEM.PAGE.ADD, which would be treated incorrectly as spurious by tdp_mmu_map_handle_target_level() (KVM would see the SPTE as PRESENT, but the corresponding S-EPT entry will be !PRESENT).

Suggested-by: Sean Christopherson
Signed-off-by: Isaku Yamahata
Co-developed-by: Rick Edgecombe
Signed-off-by: Rick Edgecombe
Co-developed-by: Yan Zhao
Signed-off-by: Yan Zhao
---
TDX MMU part 2 v2:
- Updated commit msg (Paolo)
- Added a guard around kvm_tdp_mmu_gpa_is_mapped() (Paolo).
- Remove checking kvm_mem_is_private() in tdx_gmem_post_populate (Rick)
- No need for is_td_finalized() (Rick)
- Remove decrement of nr_premapped (moved to "Finalize VM initialization" patch) (Paolo)
- Take slots_lock before checking kvm_tdx->finalized in tdx_vcpu_init_mem_region(), and use guard() (Paolo)
- Fixup SEAMCALL call sites due to function parameter changes to SEAMCALL wrappers (Kai)
- Add TD state handling (Tony)

TDX MMU part 2 v1:
- Update the code according to latest gmem update. https://lore.kernel.org/kvm/CABgObfa=a3cKcKJHQRrCs-3Ty8ppSRou=dhi6Q+KdZnom0Zegw@mail.gmail.com/
- Fixup an alignment bug reported by Binbin.
- Rename KVM_MEMORY_MAPPING => KVM_MAP_MEMORY (Sean)
- Drop issuing TDH.MEM.PAGE.ADD() on KVM_MAP_MEMORY(), defer it to KVM_TDX_INIT_MEM_REGION. (Sean)
- Added nr_premapped to track the number of premapped pages
- Drop tdx_post_mmu_map_page().
- Drop kvm_slot_can_be_private() check (Paolo)
- Use kvm_tdp_mmu_gpa_is_mapped() (Paolo)

v19:
- Switched to use KVM_MEMORY_MAPPING
- Dropped measurement extension
- updated commit message.
private_page_add() => set_private_spte() --- arch/x86/include/uapi/asm/kvm.h | 9 ++ arch/x86/kvm/vmx/tdx.c | 147 ++++++++++++++++++++++++++++++++ virt/kvm/kvm_main.c | 1 + 3 files changed, 157 insertions(+) diff --git a/arch/x86/include/uapi/asm/kvm.h b/arch/x86/include/uapi/asm/kvm.h index 36fa03376581..a19cd84cec76 100644 --- a/arch/x86/include/uapi/asm/kvm.h +++ b/arch/x86/include/uapi/asm/kvm.h @@ -931,6 +931,7 @@ enum kvm_tdx_cmd_id { KVM_TDX_CAPABILITIES = 0, KVM_TDX_INIT_VM, KVM_TDX_INIT_VCPU, + KVM_TDX_INIT_MEM_REGION, KVM_TDX_GET_CPUID, KVM_TDX_CMD_NR_MAX, @@ -985,4 +986,12 @@ struct kvm_tdx_init_vm { struct kvm_cpuid2 cpuid; }; +#define KVM_TDX_MEASURE_MEMORY_REGION _BITULL(0) + +struct kvm_tdx_init_mem_region { + __u64 source_addr; + __u64 gpa; + __u64 nr_pages; +}; + #endif /* _ASM_X86_KVM_H */ diff --git a/arch/x86/kvm/vmx/tdx.c b/arch/x86/kvm/vmx/tdx.c index ead520083397..15cedacd717a 100644 --- a/arch/x86/kvm/vmx/tdx.c +++ b/arch/x86/kvm/vmx/tdx.c @@ -1,4 +1,5 @@ // SPDX-License-Identifier: GPL-2.0 +#include #include #include #include "capabilities.h" @@ -7,6 +8,7 @@ #include "tdx.h" #include "vmx.h" #include "mmu/spte.h" +#include "common.h" #undef pr_fmt #define pr_fmt(fmt) KBUILD_MODNAME ": " fmt @@ -1597,6 +1599,148 @@ static int tdx_vcpu_init(struct kvm_vcpu *vcpu, struct kvm_tdx_cmd *cmd) return 0; } +struct tdx_gmem_post_populate_arg { + struct kvm_vcpu *vcpu; + __u32 flags; +}; + +static int tdx_gmem_post_populate(struct kvm *kvm, gfn_t gfn, kvm_pfn_t pfn, + void __user *src, int order, void *_arg) +{ + u64 error_code = PFERR_GUEST_FINAL_MASK | PFERR_PRIVATE_ACCESS; + struct kvm_tdx *kvm_tdx = to_kvm_tdx(kvm); + struct tdx_gmem_post_populate_arg *arg = _arg; + struct kvm_vcpu *vcpu = arg->vcpu; + gpa_t gpa = gfn_to_gpa(gfn); + u8 level = PG_LEVEL_4K; + struct page *page; + int ret, i; + u64 err, entry, level_state; + + /* + * Get the source page if it has been faulted in. 
Return failure if the + * source page has been swapped out or unmapped in primary memory. + */ + ret = get_user_pages_fast((unsigned long)src, 1, 0, &page); + if (ret < 0) + return ret; + if (ret != 1) + return -ENOMEM; + + ret = kvm_tdp_map_page(vcpu, gpa, error_code, &level); + if (ret < 0) + goto out; + + /* + * The private mem cannot be zapped after kvm_tdp_map_page() + * because all paths are covered by slots_lock and the + * filemap invalidate lock. Check that they are indeed enough. + */ + if (IS_ENABLED(CONFIG_KVM_PROVE_MMU)) { + scoped_guard(read_lock, &kvm->mmu_lock) { + if (KVM_BUG_ON(!kvm_tdp_mmu_gpa_is_mapped(vcpu, gpa), kvm)) { + ret = -EIO; + goto out; + } + } + } + + ret = 0; + do { + err = tdh_mem_page_add(kvm_tdx->tdr_pa, gpa, pfn_to_hpa(pfn), + pfn_to_hpa(page_to_pfn(page)), + &entry, &level_state); + } while (err == TDX_ERROR_SEPT_BUSY); + if (err) { + ret = -EIO; + goto out; + } + + if (arg->flags & KVM_TDX_MEASURE_MEMORY_REGION) { + for (i = 0; i < PAGE_SIZE; i += TDX_EXTENDMR_CHUNKSIZE) { + err = tdh_mr_extend(kvm_tdx->tdr_pa, gpa + i, &entry, + &level_state); + if (err) { + ret = -EIO; + break; + } + } + } + +out: + put_page(page); + return ret; +} + +static int tdx_vcpu_init_mem_region(struct kvm_vcpu *vcpu, struct kvm_tdx_cmd *cmd) +{ + struct vcpu_tdx *tdx = to_tdx(vcpu); + struct kvm *kvm = vcpu->kvm; + struct kvm_tdx *kvm_tdx = to_kvm_tdx(kvm); + struct kvm_tdx_init_mem_region region; + struct tdx_gmem_post_populate_arg arg; + long gmem_ret; + int ret; + + if (tdx->state != VCPU_TD_STATE_INITIALIZED) + return -EINVAL; + + guard(mutex)(&kvm->slots_lock); + + /* Once TD is finalized, the initial guest memory is fixed. 
*/ + if (kvm_tdx->state == TD_STATE_RUNNABLE) + return -EINVAL; + + if (cmd->flags & ~KVM_TDX_MEASURE_MEMORY_REGION) + return -EINVAL; + + if (copy_from_user(&region, u64_to_user_ptr(cmd->data), sizeof(region))) + return -EFAULT; + + if (!PAGE_ALIGNED(region.source_addr) || !PAGE_ALIGNED(region.gpa) || + !region.nr_pages || + region.gpa + (region.nr_pages << PAGE_SHIFT) <= region.gpa || + !vt_is_tdx_private_gpa(kvm, region.gpa) || + !vt_is_tdx_private_gpa(kvm, region.gpa + (region.nr_pages << PAGE_SHIFT) - 1)) + return -EINVAL; + + kvm_mmu_reload(vcpu); + ret = 0; + while (region.nr_pages) { + if (signal_pending(current)) { + ret = -EINTR; + break; + } + + arg = (struct tdx_gmem_post_populate_arg) { + .vcpu = vcpu, + .flags = cmd->flags, + }; + gmem_ret = kvm_gmem_populate(kvm, gpa_to_gfn(region.gpa), + u64_to_user_ptr(region.source_addr), + 1, tdx_gmem_post_populate, &arg); + if (gmem_ret < 0) { + ret = gmem_ret; + break; + } + + if (gmem_ret != 1) { + ret = -EIO; + break; + } + + region.source_addr += PAGE_SIZE; + region.gpa += PAGE_SIZE; + region.nr_pages--; + + cond_resched(); + } + + if (copy_to_user(u64_to_user_ptr(cmd->data), &region, sizeof(region))) + ret = -EFAULT; + return ret; +} + int tdx_vcpu_ioctl(struct kvm_vcpu *vcpu, void __user *argp) { struct kvm_tdx *kvm_tdx = to_kvm_tdx(vcpu->kvm); @@ -1616,6 +1760,9 @@ int tdx_vcpu_ioctl(struct kvm_vcpu *vcpu, void __user *argp) case KVM_TDX_INIT_VCPU: ret = tdx_vcpu_init(vcpu, &cmd); break; + case KVM_TDX_INIT_MEM_REGION: + ret = tdx_vcpu_init_mem_region(vcpu, &cmd); + break; case KVM_TDX_GET_CPUID: ret = tdx_vcpu_get_cpuid(vcpu, &cmd); break; diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c index 152afe67a00b..5901d03e372c 100644 --- a/virt/kvm/kvm_main.c +++ b/virt/kvm/kvm_main.c @@ -2600,6 +2600,7 @@ struct kvm_memory_slot *kvm_vcpu_gfn_to_memslot(struct kvm_vcpu *vcpu, gfn_t gfn return NULL; } +EXPORT_SYMBOL_GPL(kvm_vcpu_gfn_to_memslot); bool kvm_is_visible_gfn(struct kvm *kvm, gfn_t gfn) { From
patchwork Tue Nov 12 07:38:48 2024 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 8bit X-Patchwork-Submitter: Yan Zhao X-Patchwork-Id: 13871822 Received: from mgamail.intel.com (mgamail.intel.com [198.175.65.17]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 9D1BD20E32C; Tue, 12 Nov 2024 07:41:19 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=198.175.65.17 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1731397283; cv=none; b=DOXnneHsmGibdp2c2bgDJbb0REMhS19P4cRkCpOzFj4NLu13SGN1HQrh/3Shyx1Zp1DoOr3bBCAIX82MHoHAHe09sTcf4A7R+thkwV0HvL2EwhWC0BTwMEEwsZw6WiHouN37GaJEyjwHEUArQZGnhaEXoCmrk5Hytf41Dj7EDQ0= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1731397283; c=relaxed/simple; bh=2p2WLMrmWNOutVqrBIwb/cMEvzMPFLs3YqKeMUFkIRI=; h=From:To:Cc:Subject:Date:Message-ID:In-Reply-To:References: MIME-Version:Content-Type; b=SHOXcULZYVJH7uej9IzijXVGKwM44MmcnTBeeQ24XFZkhVG0zmlQgbE1C6uZG3q8vRWzRDabpvuCJKn+YvJqPQkCLZYWcBp8hN2Lg6T5Sr9ErdHbIznn/nFkfF8LyW/+uIQKm4d1m0fnaviVUsWF0CCG0hfgLjfTGN69irYRDCQ= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=intel.com; spf=pass smtp.mailfrom=intel.com; dkim=pass (2048-bit key) header.d=intel.com header.i=@intel.com header.b=E8K38Ocg; arc=none smtp.client-ip=198.175.65.17 Authentication-Results: smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=intel.com Authentication-Results: smtp.subspace.kernel.org; spf=pass smtp.mailfrom=intel.com Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=intel.com header.i=@intel.com header.b="E8K38Ocg" DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=intel.com; i=@intel.com; q=dns/txt; s=Intel; t=1731397282; x=1762933282; 
h=from:to:cc:subject:date:message-id:in-reply-to: references:mime-version:content-transfer-encoding; bh=2p2WLMrmWNOutVqrBIwb/cMEvzMPFLs3YqKeMUFkIRI=; b=E8K38OcgUqTfyBP2p9qlh5y8MZgSjvXQCWVl1Tj7t3g2DJnvYyDerBfr on+IZDTkHb3XcTnJNqG+9DbRlCRu4CehYxmPnzdeM3webbdFDhD8GgiOj E5iFLuoOE8wfFHHVEU24G+X00s6jkTNxcgrlZ0kU4DEQcaqNTdszpYZ8s Nztn/yAGhZpiqikwLIugUJmMsbtsJ0X4qtUhiVUnQetkByizU7A8Uz7Fi aC2VF4ap/Q6Jv/DibRna0Tza+g/kOLtx2BqKXIxgv3a1HS7lWmNY1Qp0E Beail5aXFFQH6ofgXD/wQ1yaEGG6nbVXI4EaHfiBlZWDYO8gp+ktC+xBM Q==; X-CSE-ConnectionGUID: YYrWEUU6QJSO6/mvD6a4Vw== X-CSE-MsgGUID: YSX0nIjPRduPnA3s29PTFw== X-IronPort-AV: E=McAfee;i="6700,10204,11253"; a="31311578" X-IronPort-AV: E=Sophos;i="6.12,147,1728975600"; d="scan'208";a="31311578" Received: from orviesa001.jf.intel.com ([10.64.159.141]) by orvoesa109.jf.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 11 Nov 2024 23:41:19 -0800 X-CSE-ConnectionGUID: +4WwSx3LQHS3EdM6kqC18w== X-CSE-MsgGUID: 76bSvV3/RqGsDwaM8ETWOg== X-ExtLoop1: 1 X-IronPort-AV: E=Sophos;i="6.12,147,1728975600"; d="scan'208";a="124830653" Received: from yzhao56-desk.sh.intel.com ([10.239.159.62]) by smtpauth.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 11 Nov 2024 23:41:15 -0800 From: Yan Zhao To: pbonzini@redhat.com, seanjc@google.com, kvm@vger.kernel.org, dave.hansen@linux.intel.com Cc: rick.p.edgecombe@intel.com, kai.huang@intel.com, adrian.hunter@intel.com, reinette.chatre@intel.com, xiaoyao.li@intel.com, tony.lindgren@intel.com, binbin.wu@linux.intel.com, dmatlack@google.com, isaku.yamahata@intel.com, isaku.yamahata@gmail.com, nik.borisov@suse.com, linux-kernel@vger.kernel.org, x86@kernel.org Subject: [PATCH v2 22/24] KVM: TDX: Finalize VM initialization Date: Tue, 12 Nov 2024 15:38:48 +0800 Message-ID: <20241112073848.22298-1-yan.y.zhao@intel.com> X-Mailer: git-send-email 2.43.2 In-Reply-To: <20241112073327.21979-1-yan.y.zhao@intel.com> References: <20241112073327.21979-1-yan.y.zhao@intel.com> Precedence: bulk X-Mailing-List: kvm@vger.kernel.org 
List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 From: Isaku Yamahata Introduce a new VM-scoped KVM_MEMORY_ENCRYPT_OP IOCTL subcommand, KVM_TDX_FINALIZE_VM, to perform TD Measurement Finalization. The API documentation is provided in a separate patch: “Documentation/virt/kvm: Document on Trust Domain Extensions (TDX)”. Enhance TDX’s set_external_spte() hook to record the pre-mapping count instead of returning without action when the TD is not finalized. Adjust the pre-mapping count when pages are added or if the mapping is dropped. Set pre_fault_allowed to true after the finalization is complete. Note: TD Measurement Finalization is the process by which the initial state of the TDX VM is measured for attestation purposes. It uses the SEAMCALL TDH.MR.FINALIZE, after which: 1. The VMM can no longer add TD private pages with arbitrary content. 2. The TDX VM becomes runnable. Signed-off-by: Isaku Yamahata Co-developed-by: Adrian Hunter Signed-off-by: Adrian Hunter Co-developed-by: Rick Edgecombe Signed-off-by: Rick Edgecombe Signed-off-by: Yan Zhao --- TDX MMU part 2 v2 - Merge changes from patch "KVM: TDX: Premap initial guest memory" into this patch (Paolo) - Consolidate nr_premapped counting into this patch (Paolo) - Page level check should be (and is) in tdx_sept_set_private_spte() in patch "KVM: TDX: Implement hooks to propagate changes of TDP MMU mirror page table" not in tdx_mem_page_record_premap_cnt() (Paolo) - Protect finalization using kvm->slots_lock (Paolo) - Set kvm->arch.pre_fault_allowed to true after finalization is done (Paolo) - Add a memory barrier to ensure correct ordering of the updates to kvm_tdx->finalized and kvm->arch.pre_fault_allowed (Adrian) - pre_fault_allowed must not be true before finalization is done. 
Highlight that fact by checking it in tdx_mem_page_record_premap_cnt() (Adrian) - No need for is_td_finalized() (Rick) - Fixup SEAMCALL call sites due to function parameter changes to SEAMCALL wrappers (Kai) - Add nr_premapped where it's first used (Tao) TDX MMU part 2 v1: - Added premapped check. - Update for the wrapper functions for SEAMCALLs. (Sean) - Add check if nr_premapped is zero. If not, return error. - Use KVM_BUG_ON() in tdx_td_finalizer() for consistency. - Change tdx_td_finalizemr() to take struct kvm_tdx_cmd *cmd and return error (Adrian) - Handle TDX_OPERAND_BUSY case (Adrian) - Updates from seamcall overhaul (Kai) - Rename error->hw_error v18: - Remove the change of tools/arch/x86/include/uapi/asm/kvm.h. v15: - removed unconditional tdx_track() by tdx_flush_tlb_current() that does tdx_track(). --- arch/x86/include/uapi/asm/kvm.h | 1 + arch/x86/kvm/vmx/tdx.c | 78 ++++++++++++++++++++++++++++++--- arch/x86/kvm/vmx/tdx.h | 3 ++ 3 files changed, 75 insertions(+), 7 deletions(-) diff --git a/arch/x86/include/uapi/asm/kvm.h b/arch/x86/include/uapi/asm/kvm.h index a19cd84cec76..eee6de05f261 100644 --- a/arch/x86/include/uapi/asm/kvm.h +++ b/arch/x86/include/uapi/asm/kvm.h @@ -932,6 +932,7 @@ enum kvm_tdx_cmd_id { KVM_TDX_INIT_VM, KVM_TDX_INIT_VCPU, KVM_TDX_INIT_MEM_REGION, + KVM_TDX_FINALIZE_VM, KVM_TDX_GET_CPUID, KVM_TDX_CMD_NR_MAX, diff --git a/arch/x86/kvm/vmx/tdx.c b/arch/x86/kvm/vmx/tdx.c index 15cedacd717a..acaa11be1031 100644 --- a/arch/x86/kvm/vmx/tdx.c +++ b/arch/x86/kvm/vmx/tdx.c @@ -563,6 +563,31 @@ static int tdx_mem_page_aug(struct kvm *kvm, gfn_t gfn, return 0; } +/* + * KVM_TDX_INIT_MEM_REGION calls kvm_gmem_populate() to get guest pages and + * tdx_gmem_post_populate() to premap page table pages into private EPT. + * Mapping guest pages into private EPT before TD is finalized should use a + * seamcall TDH.MEM.PAGE.ADD(), which copies page content from a source page + * from user to target guest pages to be added. 
This source page is not + * available via common interface kvm_tdp_map_page(). So, currently, + * kvm_tdp_map_page() only premaps guest pages into KVM mirrored root. + * A counter nr_premapped is increased here to record status. The counter will + * be decreased after TDH.MEM.PAGE.ADD() is called after the kvm_tdp_map_page() + * in tdx_gmem_post_populate(). + */ +static int tdx_mem_page_record_premap_cnt(struct kvm *kvm, gfn_t gfn, + enum pg_level level, kvm_pfn_t pfn) +{ + struct kvm_tdx *kvm_tdx = to_kvm_tdx(kvm); + + if (KVM_BUG_ON(kvm->arch.pre_fault_allowed, kvm)) + return -EINVAL; + + /* nr_premapped will be decreased when tdh_mem_page_add() is called. */ + atomic64_inc(&kvm_tdx->nr_premapped); + return 0; +} + int tdx_sept_set_private_spte(struct kvm *kvm, gfn_t gfn, enum pg_level level, kvm_pfn_t pfn) { @@ -582,14 +607,15 @@ int tdx_sept_set_private_spte(struct kvm *kvm, gfn_t gfn, */ get_page(pfn_to_page(pfn)); + /* + * To match ordering of 'finalized' and 'pre_fault_allowed' in + * tdx_td_finalizemr(). + */ + smp_rmb(); if (likely(kvm_tdx->state == TD_STATE_RUNNABLE)) return tdx_mem_page_aug(kvm, gfn, level, pfn); - /* - * TODO: KVM_TDX_INIT_MEM_REGION support to populate before finalize - * comes here for the initial memory. - */ - return -EOPNOTSUPP; + return tdx_mem_page_record_premap_cnt(kvm, gfn, level, pfn); } static int tdx_sept_drop_private_spte(struct kvm *kvm, gfn_t gfn, @@ -621,10 +647,12 @@ static int tdx_sept_drop_private_spte(struct kvm *kvm, gfn_t gfn, if (unlikely(kvm_tdx->state != TD_STATE_RUNNABLE && err == (TDX_EPT_WALK_FAILED | TDX_OPERAND_ID_RCX))) { /* - * This page was mapped with KVM_MAP_MEMORY, but - * KVM_TDX_INIT_MEM_REGION is not issued yet. + * Page is mapped by KVM_TDX_INIT_MEM_REGION, but hasn't called + * tdh_mem_page_add(). 
*/ if (!is_last_spte(entry, level) || !(entry & VMX_EPT_RWX_MASK)) { + WARN_ON_ONCE(!atomic64_read(&kvm_tdx->nr_premapped)); + atomic64_dec(&kvm_tdx->nr_premapped); tdx_unpin(kvm, pfn); return 0; } @@ -1368,6 +1396,36 @@ void tdx_flush_tlb_all(struct kvm_vcpu *vcpu) ept_sync_global(); } +static int tdx_td_finalizemr(struct kvm *kvm, struct kvm_tdx_cmd *cmd) +{ + struct kvm_tdx *kvm_tdx = to_kvm_tdx(kvm); + + guard(mutex)(&kvm->slots_lock); + + if (!is_hkid_assigned(kvm_tdx) || kvm_tdx->state == TD_STATE_RUNNABLE) + return -EINVAL; + /* + * Pages are pending for KVM_TDX_INIT_MEM_REGION to issue + * TDH.MEM.PAGE.ADD(). + */ + if (atomic64_read(&kvm_tdx->nr_premapped)) + return -EINVAL; + + cmd->hw_error = tdh_mr_finalize(kvm_tdx->tdr_pa); + if ((cmd->hw_error & TDX_SEAMCALL_STATUS_MASK) == TDX_OPERAND_BUSY) + return -EAGAIN; + if (KVM_BUG_ON(cmd->hw_error, kvm)) { + pr_tdx_error(TDH_MR_FINALIZE, cmd->hw_error); + return -EIO; + } + + kvm_tdx->state = TD_STATE_RUNNABLE; + /* TD_STATE_RUNNABLE must be set before 'pre_fault_allowed' */ + smp_wmb(); + kvm->arch.pre_fault_allowed = true; + return 0; +} + int tdx_vm_ioctl(struct kvm *kvm, void __user *argp) { struct kvm_tdx_cmd tdx_cmd; @@ -1392,6 +1450,9 @@ int tdx_vm_ioctl(struct kvm *kvm, void __user *argp) case KVM_TDX_INIT_VM: r = tdx_td_init(kvm, &tdx_cmd); break; + case KVM_TDX_FINALIZE_VM: + r = tdx_td_finalizemr(kvm, &tdx_cmd); + break; default: r = -EINVAL; goto out; @@ -1656,6 +1717,9 @@ static int tdx_gmem_post_populate(struct kvm *kvm, gfn_t gfn, kvm_pfn_t pfn, goto out; } + WARN_ON_ONCE(!atomic64_read(&kvm_tdx->nr_premapped)); + atomic64_dec(&kvm_tdx->nr_premapped); + if (arg->flags & KVM_TDX_MEASURE_MEMORY_REGION) { for (i = 0; i < PAGE_SIZE; i += TDX_EXTENDMR_CHUNKSIZE) { err = tdh_mr_extend(kvm_tdx->tdr_pa, gpa + i, &entry, diff --git a/arch/x86/kvm/vmx/tdx.h b/arch/x86/kvm/vmx/tdx.h index 727bcf25d731..aeddf2bb0a94 100644 --- a/arch/x86/kvm/vmx/tdx.h +++ b/arch/x86/kvm/vmx/tdx.h @@ -32,6 +32,9 @@ struct 
kvm_tdx { u64 tsc_offset; enum kvm_tdx_state state; + + /* For KVM_TDX_INIT_MEM_REGION. */ + atomic64_t nr_premapped; }; /* TDX module vCPU states */ From patchwork Tue Nov 12 07:38:58 2024 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Yan Zhao X-Patchwork-Id: 13871823 Received: from mgamail.intel.com (mgamail.intel.com [198.175.65.17]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id D9AA620B7EB; Tue, 12 Nov 2024 07:41:29 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=198.175.65.17 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1731397292; cv=none; b=nPyMc1wWZ2IyeD8VUiYJmG8tHLfpMuqZo2G8nzF+URYsiGtNuqCPqGKPGLMmeoq0PAGr6ItpCvp2/a0YDIo27sSQRGx/zOysMn70xrcM8+O8605f/isuu6wd8fmjP3yZgWzQ2SOOidP3LdxWM4/DYWXVdFcRxpeG2WplKNYcAxc= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1731397292; c=relaxed/simple; bh=8maquX7rIqjbR3syOA0q3Nw/T2JMdSZH0sD7Nw18LSQ=; h=From:To:Cc:Subject:Date:Message-ID:In-Reply-To:References: MIME-Version; b=EegioWD3MpnwBLOTl/CG312OU/xEs5+MMBjnDTJ0iGeFV3CjbTLzak44yMBsMcQk3vxJMx6E2y3J6AL71H8Im/Oqx+Mew8FIvEZ7oDG2mGDM+CA7/aG1/b4FIZHPRz+xf59GSvCbbZ2XiwnGqJ/9PuPY1teANXnFigtJpCydLFA= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=intel.com; spf=pass smtp.mailfrom=intel.com; dkim=pass (2048-bit key) header.d=intel.com header.i=@intel.com header.b=j1F3GSHD; arc=none smtp.client-ip=198.175.65.17 Authentication-Results: smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=intel.com Authentication-Results: smtp.subspace.kernel.org; spf=pass smtp.mailfrom=intel.com Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=intel.com header.i=@intel.com header.b="j1F3GSHD" DKIM-Signature: v=1; 
a=rsa-sha256; c=relaxed/simple; d=intel.com; i=@intel.com; q=dns/txt; s=Intel; t=1731397290; x=1762933290; h=from:to:cc:subject:date:message-id:in-reply-to: references:mime-version:content-transfer-encoding; bh=8maquX7rIqjbR3syOA0q3Nw/T2JMdSZH0sD7Nw18LSQ=; b=j1F3GSHDlK+ODsHvXQcmZBkGOrm8i6r6dmZkMg/92ZlDQL5aYjnuAi7y c3ryO+mIa9bb5ixr8UKLW6X72wJuvVFvryMGq0q56ziheTBtquoSrkOet BBXBZcMfd7CvkC4LhGFVYuu1cK43O8lZOAYaF/iht29gwfyfAHaa84vKq B0t/CC72v9dhzaFOOuhbFw/mxn4ZLO6w9t8/f8u2jMp9dWKOvVTJA+J3w mQnWC952lrRxmYwwd/aFvHG8yZVl04Fcowv5oLuqe4FfGLG/XzdiAGkIE rRPkwvI3vlFYmZJqD2lpDKIDsCNFWnVRp8jFrzyypW/CM6NdZfVCUWllr w==; X-CSE-ConnectionGUID: Z5KL+r+kSeuWhGAKm20HsA== X-CSE-MsgGUID: +73cIhYsTw+z7FTtUqjQug== X-IronPort-AV: E=McAfee;i="6700,10204,11253"; a="31311600" X-IronPort-AV: E=Sophos;i="6.12,147,1728975600"; d="scan'208";a="31311600" Received: from orviesa001.jf.intel.com ([10.64.159.141]) by orvoesa109.jf.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 11 Nov 2024 23:41:29 -0800 X-CSE-ConnectionGUID: OrDPkGjCS0GehITXlhVenA== X-CSE-MsgGUID: IHJs6NldQc2vc09pS73TbQ== X-ExtLoop1: 1 X-IronPort-AV: E=Sophos;i="6.12,147,1728975600"; d="scan'208";a="124830667" Received: from yzhao56-desk.sh.intel.com ([10.239.159.62]) by smtpauth.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 11 Nov 2024 23:41:25 -0800 From: Yan Zhao To: pbonzini@redhat.com, seanjc@google.com, kvm@vger.kernel.org, dave.hansen@linux.intel.com Cc: rick.p.edgecombe@intel.com, kai.huang@intel.com, adrian.hunter@intel.com, reinette.chatre@intel.com, xiaoyao.li@intel.com, tony.lindgren@intel.com, binbin.wu@linux.intel.com, dmatlack@google.com, isaku.yamahata@intel.com, isaku.yamahata@gmail.com, nik.borisov@suse.com, linux-kernel@vger.kernel.org, x86@kernel.org Subject: [PATCH v2 23/24] KVM: TDX: Handle vCPU dissociation Date: Tue, 12 Nov 2024 15:38:58 +0800 Message-ID: <20241112073858.22312-1-yan.y.zhao@intel.com> X-Mailer: git-send-email 2.43.2 In-Reply-To: <20241112073327.21979-1-yan.y.zhao@intel.com> 
References: <20241112073327.21979-1-yan.y.zhao@intel.com> Precedence: bulk X-Mailing-List: kvm@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0

From: Isaku Yamahata

Handle vCPU dissociation by invoking SEAMCALL TDH.VP.FLUSH, which flushes the address translation caches and the cached TD VMCS of a TD vCPU on its associated pCPU.

In TDX, a vCPU can only be associated with one pCPU at a time, which is done by invoking SEAMCALL TDH.VP.ENTER. For a successful association, the vCPU must first be dissociated from its previously associated pCPU.

To facilitate vCPU dissociation, introduce a per-pCPU list associated_tdvcpus. Add a vCPU to this list when it is loaded onto a new pCPU (i.e. when a vCPU is loaded for the first time or migrated to a new pCPU).

vCPU dissociation can happen under the following conditions:
- When the op hardware_disable is called. This op is called when virtualization is disabled on a given pCPU, e.g. when hot-unplugging a pCPU or on machine shutdown/suspend. In this case, dissociate all vCPUs from the pCPU by iterating its per-pCPU list associated_tdvcpus.
- On vCPU migration to a new pCPU. Before adding a vCPU to the associated_tdvcpus list of the new pCPU, dissociation from its old pCPU is required, which is performed by issuing an IPI and executing SEAMCALL TDH.VP.FLUSH on the old pCPU. On a successful dissociation, the vCPU is removed from the associated_tdvcpus list of its previously associated pCPU.
- When tdx_mmu_release_hkid() is called. TDX mandates that all vCPUs must be dissociated prior to the release of an hkid. Therefore, all vCPUs must be dissociated before executing the SEAMCALL TDH.MNG.VPFLUSHDONE and subsequently freeing the hkid.
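
As a rough illustration only (not the code in this patch), the list bookkeeping described above can be modelled in plain userspace C. Every name here (toy_vcpu, toy_flush_vp(), toy_vcpu_load(), toy_disable_cpu(), toy_count(), NR_PCPUS, NO_CPU) is hypothetical; the IPI to the old pCPU and the SEAMCALL itself are deliberately left out, so this sketch only shows which list a vCPU sits on after each event.

```c
#include <assert.h>
#include <stddef.h>

#define NR_PCPUS 4      /* hypothetical number of pCPUs in the model */
#define NO_CPU   (-1)   /* vCPU is not associated with any pCPU */

/* Hypothetical stand-in for a TD vCPU; tracks only what the sketch needs. */
struct toy_vcpu {
	int id;
	int cpu;                /* pCPU currently associated, or NO_CPU */
	struct toy_vcpu *next;  /* link in that pCPU's list */
};

/* One list head per pCPU, modelling the per-pCPU associated_tdvcpus list. */
static struct toy_vcpu *associated[NR_PCPUS];

/*
 * Stand-in for the dissociation step (TDH.VP.FLUSH on the old pCPU in the
 * real code): unlink the vCPU from its pCPU's list and mark it free.
 */
static void toy_flush_vp(struct toy_vcpu *v)
{
	struct toy_vcpu **pp;

	if (v->cpu == NO_CPU)
		return;

	for (pp = &associated[v->cpu]; *pp; pp = &(*pp)->next) {
		if (*pp == v) {
			*pp = v->next;
			break;
		}
	}
	v->next = NULL;
	v->cpu = NO_CPU;
}

/* Load a vCPU on a pCPU: dissociate from the old pCPU first if it moved. */
static void toy_vcpu_load(struct toy_vcpu *v, int cpu)
{
	assert(cpu >= 0 && cpu < NR_PCPUS);

	if (v->cpu == cpu)
		return;

	toy_flush_vp(v);
	v->cpu = cpu;
	v->next = associated[cpu];
	associated[cpu] = v;
}

/* Virtualization disabled on a pCPU: dissociate every vCPU on its list. */
static void toy_disable_cpu(int cpu)
{
	while (associated[cpu])
		toy_flush_vp(associated[cpu]);
}

/* Number of vCPUs currently associated with a pCPU. */
static int toy_count(int cpu)
{
	int n = 0;

	for (struct toy_vcpu *v = associated[cpu]; v; v = v->next)
		n++;
	return n;
}
```

In this model, a vCPU is on at most one list at any time, which mirrors the rule that a vCPU may be associated with only one pCPU; releasing the hkid would correspond to calling the flush helper for every vCPU of the TD.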
Signed-off-by: Isaku Yamahata
Co-developed-by: Rick Edgecombe
Signed-off-by: Rick Edgecombe
Co-developed-by: Yan Zhao
Signed-off-by: Yan Zhao
---
TDX MMU part 2 v2:
 - No need for is_td_vcpu_created() (Rick)
 - Fixup SEAMCALL call sites due to function parameter changes to SEAMCALL
   wrappers (Kai)
 - Rename vt_hardware_disable() and tdx_hardware_disable() to track
   upstream changes
 - Updated the comment of per-cpu list (Yan)
 - Added an assertion KVM_BUG_ON(cpu != raw_smp_processor_id(), vcpu->kvm)
   in tdx_vcpu_load(). (Yan)

TDX MMU part 2 v1:
 - Changed title to "KVM: TDX: Handle vCPU dissociation".
 - Updated commit log.
 - Removed calling tdx_disassociate_vp_on_cpu() in tdx_vcpu_free() since no
   new TD enter would be called for vCPU association after
   tdx_mmu_release_hkid(), which is now called in vt_vm_destroy(), i.e.
   after releasing vcpu fd and kvm_unload_vcpu_mmus(), and before
   tdx_vcpu_free().
 - TODO: include Isaku's fix
   https://eclists.intel.com/sympa/arc/kvm-qemu-review/2024-07/msg00359.html
 - Update for the wrapper functions for SEAMCALLs. (Sean)
 - Removed unnecessary pr_err() in tdx_flush_vp_on_cpu().
 - Use KVM_BUG_ON() in tdx_flush_vp_on_cpu() for consistency.
 - Capitalize the first word of title. (Binbin)
 - Minor fixes in changelog. (Binbin, Reinette(internal))
 - Fix some comments. (Binbin, Reinette(internal))
 - Rename arg_ to _arg (Binbin)
 - Updates from seamcall overhaul (Kai)
 - Remove lockdep_assert_preemption_disabled() in tdx_hardware_setup()
   since now hardware_enable() is not called via SMP func call anymore,
   but (per-cpu) CPU hotplug thread
 - Use KVM_BUG_ON() for SEAMCALLs in tdx_mmu_release_hkid() (Kai)
 - Update based on upstream commit "KVM: x86: Fold kvm_arch_sched_in() into
   kvm_arch_vcpu_load()"
 - Eliminate TDX_FLUSHVP_NOT_DONE error check because vCPUs were all freed.
   So the error won't happen.
   (Sean)
---
 arch/x86/kvm/vmx/main.c    |  22 ++++-
 arch/x86/kvm/vmx/tdx.c     | 159 +++++++++++++++++++++++++++++++++++--
 arch/x86/kvm/vmx/tdx.h     |   2 +
 arch/x86/kvm/vmx/x86_ops.h |   4 +
 4 files changed, 177 insertions(+), 10 deletions(-)

diff --git a/arch/x86/kvm/vmx/main.c b/arch/x86/kvm/vmx/main.c
index 244fb80d385a..bfed421e6fbb 100644
--- a/arch/x86/kvm/vmx/main.c
+++ b/arch/x86/kvm/vmx/main.c
@@ -10,6 +10,14 @@
 #include "tdx.h"
 #include "tdx_arch.h"
 
+static void vt_disable_virtualization_cpu(void)
+{
+	/* Note, TDX *and* VMX need to be disabled if TDX is enabled. */
+	if (enable_tdx)
+		tdx_disable_virtualization_cpu();
+	vmx_disable_virtualization_cpu();
+}
+
 static __init int vt_hardware_setup(void)
 {
 	int ret;
@@ -111,6 +119,16 @@ static void vt_vcpu_reset(struct kvm_vcpu *vcpu, bool init_event)
 	vmx_vcpu_reset(vcpu, init_event);
 }
 
+static void vt_vcpu_load(struct kvm_vcpu *vcpu, int cpu)
+{
+	if (is_td_vcpu(vcpu)) {
+		tdx_vcpu_load(vcpu, cpu);
+		return;
+	}
+
+	vmx_vcpu_load(vcpu, cpu);
+}
+
 static void vt_flush_tlb_all(struct kvm_vcpu *vcpu)
 {
 	if (is_td_vcpu(vcpu)) {
@@ -199,7 +217,7 @@ struct kvm_x86_ops vt_x86_ops __initdata = {
 	.hardware_unsetup = vmx_hardware_unsetup,
 
 	.enable_virtualization_cpu = vmx_enable_virtualization_cpu,
-	.disable_virtualization_cpu = vmx_disable_virtualization_cpu,
+	.disable_virtualization_cpu = vt_disable_virtualization_cpu,
 	.emergency_disable_virtualization_cpu = vmx_emergency_disable_virtualization_cpu,
 
 	.has_emulated_msr = vmx_has_emulated_msr,
@@ -216,7 +234,7 @@ struct kvm_x86_ops vt_x86_ops __initdata = {
 	.vcpu_reset = vt_vcpu_reset,
 
 	.prepare_switch_to_guest = vmx_prepare_switch_to_guest,
-	.vcpu_load = vmx_vcpu_load,
+	.vcpu_load = vt_vcpu_load,
 	.vcpu_put = vmx_vcpu_put,
 
 	.update_exception_bitmap = vmx_update_exception_bitmap,
diff --git a/arch/x86/kvm/vmx/tdx.c b/arch/x86/kvm/vmx/tdx.c
index acaa11be1031..dc6c5f40608e 100644
--- a/arch/x86/kvm/vmx/tdx.c
+++ b/arch/x86/kvm/vmx/tdx.c
@@ -155,6 +155,21 @@ static inline int
pg_level_to_tdx_sept_level(enum pg_level level)
 /* Maximum number of retries to attempt for SEAMCALLs. */
 #define TDX_SEAMCALL_RETRIES	10000
 
+/*
+ * A per-CPU list of TD vCPUs associated with a given CPU.
+ * Protected by interrupt mask. Only manipulated by the CPU owning this per-CPU
+ * list.
+ * - When a vCPU is loaded onto a CPU, it is removed from the per-CPU list of
+ *   the old CPU during the IPI callback running on the old CPU, and then added
+ *   to the per-CPU list of the new CPU.
+ * - When a TD is tearing down, all vCPUs are disassociated from their current
+ *   running CPUs and removed from the per-CPU list during the IPI callback
+ *   running on those CPUs.
+ * - When a CPU is brought down, traverse the per-CPU list to disassociate all
+ *   associated TD vCPUs and remove them from the per-CPU list.
+ */
+static DEFINE_PER_CPU(struct list_head, associated_tdvcpus);
+
 static __always_inline hpa_t set_hkid_to_hpa(hpa_t pa, u16 hkid)
 {
 	return pa | ((hpa_t)hkid << boot_cpu_data.x86_phys_bits);
@@ -172,6 +187,22 @@ static inline bool is_hkid_assigned(struct kvm_tdx *kvm_tdx)
 	return kvm_tdx->hkid > 0;
 }
 
+static inline void tdx_disassociate_vp(struct kvm_vcpu *vcpu)
+{
+	lockdep_assert_irqs_disabled();
+
+	list_del(&to_tdx(vcpu)->cpu_list);
+
+	/*
+	 * Ensure tdx->cpu_list is updated before setting vcpu->cpu to -1,
+	 * otherwise, a different CPU can see vcpu->cpu = -1 and add the vCPU
+	 * to its list before it's deleted from this CPU's list.
+	 */
+	smp_wmb();
+
+	vcpu->cpu = -1;
+}
+
 static void tdx_clear_page(unsigned long page_pa)
 {
 	const void *zero_page = (const void *) __va(page_to_phys(ZERO_PAGE(0)));
@@ -252,6 +283,83 @@ static void tdx_reclaim_control_page(unsigned long ctrl_page_pa)
 	free_page((unsigned long)__va(ctrl_page_pa));
 }
 
+struct tdx_flush_vp_arg {
+	struct kvm_vcpu *vcpu;
+	u64 err;
+};
+
+static void tdx_flush_vp(void *_arg)
+{
+	struct tdx_flush_vp_arg *arg = _arg;
+	struct kvm_vcpu *vcpu = arg->vcpu;
+	u64 err;
+
+	arg->err = 0;
+	lockdep_assert_irqs_disabled();
+
+	/* Task migration can race with CPU offlining. */
+	if (unlikely(vcpu->cpu != raw_smp_processor_id()))
+		return;
+
+	/*
+	 * No need to do TDH_VP_FLUSH if the vCPU hasn't been initialized. The
+	 * list tracking still needs to be updated so that it's correct if/when
+	 * the vCPU does get initialized.
+	 */
+	if (to_tdx(vcpu)->state != VCPU_TD_STATE_UNINITIALIZED) {
+		/*
+		 * No need to retry. TDX Resources needed for TDH.VP.FLUSH are:
+		 * TDVPR as exclusive, TDR as shared, and TDCS as shared. This
+		 * vp flush function is called when destructing vCPU/TD or vCPU
+		 * migration. No other thread uses TDVPR in those cases.
+		 */
+		err = tdh_vp_flush(to_tdx(vcpu)->tdvpr_pa);
+		if (unlikely(err && err != TDX_VCPU_NOT_ASSOCIATED)) {
+			/*
+			 * This function is called in IPI context. Do not use
+			 * printk to avoid console semaphore.
+			 * The caller prints out the error message, instead.
+			 */
+			if (err)
+				arg->err = err;
+		}
+	}
+
+	tdx_disassociate_vp(vcpu);
+}
+
+static void tdx_flush_vp_on_cpu(struct kvm_vcpu *vcpu)
+{
+	struct tdx_flush_vp_arg arg = {
+		.vcpu = vcpu,
+	};
+	int cpu = vcpu->cpu;
+
+	if (unlikely(cpu == -1))
+		return;
+
+	smp_call_function_single(cpu, tdx_flush_vp, &arg, 1);
+	if (KVM_BUG_ON(arg.err, vcpu->kvm))
+		pr_tdx_error(TDH_VP_FLUSH, arg.err);
+}
+
+void tdx_disable_virtualization_cpu(void)
+{
+	int cpu = raw_smp_processor_id();
+	struct list_head *tdvcpus = &per_cpu(associated_tdvcpus, cpu);
+	struct tdx_flush_vp_arg arg;
+	struct vcpu_tdx *tdx, *tmp;
+	unsigned long flags;
+
+	local_irq_save(flags);
+	/* Safe variant needed as tdx_disassociate_vp() deletes the entry. */
+	list_for_each_entry_safe(tdx, tmp, tdvcpus, cpu_list) {
+		arg.vcpu = &tdx->vcpu;
+		tdx_flush_vp(&arg);
+	}
+	local_irq_restore(flags);
+}
+
 static void smp_func_do_phymem_cache_wb(void *unused)
 {
 	u64 err = 0;
@@ -288,22 +396,21 @@ void tdx_mmu_release_hkid(struct kvm *kvm)
 	bool packages_allocated, targets_allocated;
 	struct kvm_tdx *kvm_tdx = to_kvm_tdx(kvm);
 	cpumask_var_t packages, targets;
-	u64 err;
+	struct kvm_vcpu *vcpu;
+	unsigned long j;
 	int i;
+	u64 err;
 
 	if (!is_hkid_assigned(kvm_tdx))
 		return;
 
-	/* KeyID has been allocated but guest is not yet configured */
-	if (!kvm_tdx->tdr_pa) {
-		tdx_hkid_free(kvm_tdx);
-		return;
-	}
-
 	packages_allocated = zalloc_cpumask_var(&packages, GFP_KERNEL);
 	targets_allocated = zalloc_cpumask_var(&targets, GFP_KERNEL);
 	cpus_read_lock();
 
+	kvm_for_each_vcpu(j, vcpu, kvm)
+		tdx_flush_vp_on_cpu(vcpu);
+
 	/*
 	 * TDH.PHYMEM.CACHE.WB tries to acquire the TDX module global lock
 	 * and can fail with TDX_OPERAND_BUSY when it fails to get the lock.
@@ -317,6 +424,16 @@ void tdx_mmu_release_hkid(struct kvm *kvm)
 	 * After the above flushing vps, there should be no more vCPU
 	 * associations, as all vCPU fds have been released at this stage.
 	 */
+	err = tdh_mng_vpflushdone(kvm_tdx->tdr_pa);
+	if (err == TDX_FLUSHVP_NOT_DONE)
+		goto out;
+	if (KVM_BUG_ON(err, kvm)) {
+		pr_tdx_error(TDH_MNG_VPFLUSHDONE, err);
+		pr_err("tdh_mng_vpflushdone() failed. HKID %d is leaked.\n",
+		       kvm_tdx->hkid);
+		goto out;
+	}
+
 	for_each_online_cpu(i) {
 		if (packages_allocated &&
 		    cpumask_test_and_set_cpu(topology_physical_package_id(i),
@@ -342,6 +459,7 @@ void tdx_mmu_release_hkid(struct kvm *kvm)
 		tdx_hkid_free(kvm_tdx);
 	}
 
+out:
 	mutex_unlock(&tdx_lock);
 	cpus_read_unlock();
 	free_cpumask_var(targets);
@@ -489,6 +607,27 @@ int tdx_vcpu_create(struct kvm_vcpu *vcpu)
 	return 0;
 }
 
+void tdx_vcpu_load(struct kvm_vcpu *vcpu, int cpu)
+{
+	struct vcpu_tdx *tdx = to_tdx(vcpu);
+
+	if (vcpu->cpu == cpu)
+		return;
+
+	tdx_flush_vp_on_cpu(vcpu);
+
+	KVM_BUG_ON(cpu != raw_smp_processor_id(), vcpu->kvm);
+	local_irq_disable();
+	/*
+	 * Pairs with the smp_wmb() in tdx_disassociate_vp() to ensure
+	 * vcpu->cpu is read before tdx->cpu_list.
+	 */
+	smp_rmb();
+
+	list_add(&tdx->cpu_list, &per_cpu(associated_tdvcpus, cpu));
+	local_irq_enable();
+}
+
 void tdx_vcpu_free(struct kvm_vcpu *vcpu)
 {
 	struct kvm_tdx *kvm_tdx = to_kvm_tdx(vcpu->kvm);
@@ -1937,7 +2076,7 @@ static int __init __do_tdx_bringup(void)
 static int __init __tdx_bringup(void)
 {
 	const struct tdx_sys_info_td_conf *td_conf;
-	int r;
+	int r, i;
 
 	if (!tdp_mmu_enabled || !enable_mmio_caching)
 		return -EOPNOTSUPP;
@@ -1947,6 +2086,10 @@ static int __init __tdx_bringup(void)
 		return -EOPNOTSUPP;
 	}
 
+	/* tdx_disable_virtualization_cpu() uses associated_tdvcpus. */
+	for_each_possible_cpu(i)
+		INIT_LIST_HEAD(&per_cpu(associated_tdvcpus, i));
+
 	/*
 	 * Enabling TDX requires enabling hardware virtualization first,
 	 * as making SEAMCALLs requires CPU being in post-VMXON state.
diff --git a/arch/x86/kvm/vmx/tdx.h b/arch/x86/kvm/vmx/tdx.h
index aeddf2bb0a94..899654519df6 100644
--- a/arch/x86/kvm/vmx/tdx.h
+++ b/arch/x86/kvm/vmx/tdx.h
@@ -49,6 +49,8 @@ struct vcpu_tdx {
 	unsigned long tdvpr_pa;
 	unsigned long *tdcx_pa;
 
+	struct list_head cpu_list;
+
 	enum vcpu_tdx_state state;
 };
 
diff --git a/arch/x86/kvm/vmx/x86_ops.h b/arch/x86/kvm/vmx/x86_ops.h
index f61daac5f2f0..06583b1afa4f 100644
--- a/arch/x86/kvm/vmx/x86_ops.h
+++ b/arch/x86/kvm/vmx/x86_ops.h
@@ -119,6 +119,7 @@ void vmx_cancel_hv_timer(struct kvm_vcpu *vcpu);
 void vmx_setup_mce(struct kvm_vcpu *vcpu);
 
 #ifdef CONFIG_INTEL_TDX_HOST
+void tdx_disable_virtualization_cpu(void);
 int tdx_vm_init(struct kvm *kvm);
 void tdx_mmu_release_hkid(struct kvm *kvm);
 void tdx_vm_free(struct kvm *kvm);
@@ -127,6 +128,7 @@ int tdx_vm_ioctl(struct kvm *kvm, void __user *argp);
 
 int tdx_vcpu_create(struct kvm_vcpu *vcpu);
 void tdx_vcpu_free(struct kvm_vcpu *vcpu);
+void tdx_vcpu_load(struct kvm_vcpu *vcpu, int cpu);
 
 int tdx_vcpu_ioctl(struct kvm_vcpu *vcpu, void __user *argp);
 
@@ -144,6 +146,7 @@ void tdx_flush_tlb_all(struct kvm_vcpu *vcpu);
 void tdx_load_mmu_pgd(struct kvm_vcpu *vcpu, hpa_t root_hpa, int root_level);
 int tdx_gmem_private_max_mapping_level(struct kvm *kvm, kvm_pfn_t pfn);
 #else
+static inline void tdx_disable_virtualization_cpu(void) {}
 static inline int tdx_vm_init(struct kvm *kvm) { return -EOPNOTSUPP; }
 static inline void tdx_mmu_release_hkid(struct kvm *kvm) {}
 static inline void tdx_vm_free(struct kvm *kvm) {}
@@ -152,6 +155,7 @@ static inline int tdx_vm_ioctl(struct kvm *kvm, void __user *argp) { return -EOP
 
 static inline int tdx_vcpu_create(struct kvm_vcpu *vcpu) { return -EOPNOTSUPP; }
 static inline void tdx_vcpu_free(struct kvm_vcpu *vcpu) {}
+static inline void tdx_vcpu_load(struct kvm_vcpu *vcpu, int cpu) {}
 
 static inline int tdx_vcpu_ioctl(struct kvm_vcpu *vcpu, void __user *argp) { return -EOPNOTSUPP; }
 

From patchwork Tue Nov 12 07:39:09 2024
Content-Type: text/plain;
charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Yan Zhao X-Patchwork-Id: 13871824 Received: from mgamail.intel.com (mgamail.intel.com [192.198.163.17]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 6DDA720B7EB; Tue, 12 Nov 2024 07:41:41 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=192.198.163.17 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1731397303; cv=none; b=DWNbUK6yawuI99NbMHXzXFRfx72Mjl1KxVd0UUfQdKWhIH1BdZoZr8I683+whvZTaRRTay8BGfMqykM4VJxeQxlbrem1NZu9tLaUvylHhcWJArw0JtMtC8g+WtIpde1X1unYFH+1FX3ciJn/INI7ZqOO4NmfL+7P8FQd9R28yog= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1731397303; c=relaxed/simple; bh=hSJ6BEYyUGCyEcY/U+BKLm7dTIH/L848thDKTcyA8Xg=; h=From:To:Cc:Subject:Date:Message-ID:In-Reply-To:References: MIME-Version; b=lJOlWINLMeioc5xzZiOmM8efxm5gdlssw2jvNWiG0aGpvbc9qOdT3LXok7+hWIeR7fFVGJKn+4wP/vbfFabicIkHcEtsGbHT4fNihKaRAhI+Qj0ZzsVT1b01QadO0ksHMuUbnKdmQsZO/6xL+twPN0zfop5Dphd+jd+/PyYzD+Q= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=intel.com; spf=pass smtp.mailfrom=intel.com; dkim=pass (2048-bit key) header.d=intel.com header.i=@intel.com header.b=d5noO6F9; arc=none smtp.client-ip=192.198.163.17 Authentication-Results: smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=intel.com Authentication-Results: smtp.subspace.kernel.org; spf=pass smtp.mailfrom=intel.com Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=intel.com header.i=@intel.com header.b="d5noO6F9" DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=intel.com; i=@intel.com; q=dns/txt; s=Intel; t=1731397301; x=1762933301; h=from:to:cc:subject:date:message-id:in-reply-to: references:mime-version:content-transfer-encoding; 
bh=hSJ6BEYyUGCyEcY/U+BKLm7dTIH/L848thDKTcyA8Xg=; b=d5noO6F9wG/YrECAoUAXCtQOrZzmEbAr6uRYhkdn1nsZI61iDNNHaE6g jXoK+QTmFhN6SxpuO2QreDZwf3H5qo6Vv7JsWl16W63UIDnTIHesHVARE HrfHqbAqRQJnWpSFj+pLQ8AtmmPUQXelYQOG9F90dSnFQw7SdgS8biy6M gV/WqCdm4/TyAFeyEBgNbgJo0L2EBmozzyc5yvzi663fiN/DJ6GRX7Q5l j6wG/pbvsc+WAmUqlsCI0I83uKCXQb3C5yEfjUrtBO12p2jkj2BgbGlqC gpKF1qcKI1EpJvw+b74eXjCMcSptGpTFrH9+4tttKkoX+vS0EFUXUUCso g==; X-CSE-ConnectionGUID: xPd20s7eSzGx+WfioRESqA== X-CSE-MsgGUID: qG7rL3TBR9aQ73gZXpFosA== X-IronPort-AV: E=McAfee;i="6700,10204,11253"; a="31090942" X-IronPort-AV: E=Sophos;i="6.12,147,1728975600"; d="scan'208";a="31090942" Received: from fmviesa004.fm.intel.com ([10.60.135.144]) by fmvoesa111.fm.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 11 Nov 2024 23:41:41 -0800 X-CSE-ConnectionGUID: TtLWdoWQQdmpQSpiSx+ODg== X-CSE-MsgGUID: v5QTDcqTQfyb5cCPTOPVZw== X-ExtLoop1: 1 X-IronPort-AV: E=Sophos;i="6.12,147,1728975600"; d="scan'208";a="92089776" Received: from yzhao56-desk.sh.intel.com ([10.239.159.62]) by fmviesa004-auth.fm.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 11 Nov 2024 23:41:37 -0800 From: Yan Zhao To: pbonzini@redhat.com, seanjc@google.com, kvm@vger.kernel.org, dave.hansen@linux.intel.com Cc: rick.p.edgecombe@intel.com, kai.huang@intel.com, adrian.hunter@intel.com, reinette.chatre@intel.com, xiaoyao.li@intel.com, tony.lindgren@intel.com, binbin.wu@linux.intel.com, dmatlack@google.com, isaku.yamahata@intel.com, isaku.yamahata@gmail.com, nik.borisov@suse.com, linux-kernel@vger.kernel.org, x86@kernel.org Subject: [PATCH v2 24/24] [HACK] KVM: TDX: Retry seamcall when TDX_OPERAND_BUSY with operand SEPT Date: Tue, 12 Nov 2024 15:39:09 +0800 Message-ID: <20241112073909.22326-1-yan.y.zhao@intel.com> X-Mailer: git-send-email 2.43.2 In-Reply-To: <20241112073327.21979-1-yan.y.zhao@intel.com> References: <20241112073327.21979-1-yan.y.zhao@intel.com> Precedence: bulk X-Mailing-List: kvm@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 
1.0

From: Yuan Yao

Temporarily retry in the SEAMCALL wrappers when the TDX module returns
TDX_OPERAND_BUSY with operand SEPT.

The TDX module has many internal locks to protect its resources. To avoid
staying in SEAM mode for too long, SEAMCALLs return a TDX_OPERAND_BUSY
error code to the kernel instead of spinning on the locks.

Usually, callers of the SEAMCALL wrappers can avoid contention by
implementing proper locking on their side. For example, KVM can efficiently
avoid the TDX module's lock contention for resources like TDR, TDCS, KOT,
and TDVPR by taking locks within KVM or making a resource per-thread.

However, for performance reasons, callers like KVM may not want to use
exclusive locks to avoid internal contention on the SEPT tree within the
TDX module. For instance, KVM allows TDH.VP.ENTER to run concurrently with
TDH.MEM.SEPT.ADD, TDH.MEM.PAGE.AUG, and TDH.MEM.PAGE.REMOVE.

Resources    SHARED users           EXCLUSIVE users
------------------------------------------------------------------------
SEPT tree    TDH.MEM.SEPT.ADD       TDH.VP.ENTER
             TDH.MEM.PAGE.AUG       TDH.MEM.SEPT.REMOVE
             TDH.MEM.PAGE.REMOVE    TDH.MEM.RANGE.BLOCK

Inside the TDX module, although TDH.VP.ENTER only acquires an exclusive
lock on the SEPT tree when zero-step mitigation is triggered, it is still
possible to encounter TDX_OPERAND_BUSY with operand SEPT in KVM. Retry in
the SEAMCALL wrappers temporarily until KVM either retries on the caller
side or finds a way to avoid the contention.

Note:
The wrappers retry at most 16 times on TDX_OPERAND_BUSY with operand SEPT.
Needing more than 16 retries is rare.

SEAMCALLs TDH.MEM.* can also contend with TDCALL TDG.MEM.PAGE.ACCEPT,
returning TDX_OPERAND_BUSY without operand SEPT. Do not retry in the
SEAMCALL wrappers for such rare errors; let the callers handle them.
Signed-off-by: Yuan Yao
Co-developed-by: Isaku Yamahata
Signed-off-by: Isaku Yamahata
Co-developed-by: Rick Edgecombe
Signed-off-by: Rick Edgecombe
Co-developed-by: Yan Zhao
Signed-off-by: Yan Zhao
---
TDX MMU part 2 v2:
 - Updates the patch log. (Yan)

TDX MMU part 2 v1:
 - Updates from seamcall overhaul (Kai)

v19:
 - fix typo TDG.VP.ENTER => TDH.VP.ENTER, TDX_OPRRAN_BUSY => TDX_OPERAND_BUSY
 - drop the description on TDH.VP.ENTER as this patch doesn't touch TDH.VP.ENTER
---
 arch/x86/virt/vmx/tdx/tdx.c | 47 +++++++++++++++++++++++++++++++++----
 1 file changed, 42 insertions(+), 5 deletions(-)

diff --git a/arch/x86/virt/vmx/tdx/tdx.c b/arch/x86/virt/vmx/tdx/tdx.c
index 7e0574facfb0..04cb2f1d6deb 100644
--- a/arch/x86/virt/vmx/tdx/tdx.c
+++ b/arch/x86/virt/vmx/tdx/tdx.c
@@ -1563,6 +1563,43 @@ void tdx_guest_keyid_free(unsigned int keyid)
 }
 EXPORT_SYMBOL_GPL(tdx_guest_keyid_free);
 
+/*
+ * TDX module acquires its internal lock for resources. It doesn't spin to get
+ * locks because of its restrictions of allowed execution time. Instead, it
+ * returns TDX_OPERAND_BUSY with an operand id.
+ *
+ * Multiple VCPUs can operate on SEPT. Also with zero-step attack mitigation,
+ * TDH.VP.ENTER may rarely acquire SEPT lock and release it when zero-step
+ * attack is suspected. It results in TDX_OPERAND_BUSY | TDX_OPERAND_ID_SEPT
+ * with TDH.MEM.* operation. Note: TDH.MEM.TRACK is an exception.
+ *
+ * Because TDP MMU uses read lock for scalability, spin lock around SEAMCALL
+ * spoils TDP MMU effort. Retry several times with the assumption that SEPT
+ * lock contention is rare. But don't loop forever to avoid lockup. Let TDP
+ * MMU retry.
+ */
+#define TDX_OPERAND_BUSY		0x8000020000000000ULL
+#define TDX_OPERAND_ID_SEPT		0x92
+
+#define TDX_ERROR_SEPT_BUSY		(TDX_OPERAND_BUSY | TDX_OPERAND_ID_SEPT)
+
+static inline u64 tdx_seamcall_sept(u64 op, struct tdx_module_args *in)
+{
+#define SEAMCALL_RETRY_MAX	16
+	struct tdx_module_args args_in;
+	int retry = SEAMCALL_RETRY_MAX;
+	u64 ret;
+
+	do {
+		args_in = *in;
+		ret = seamcall_ret(op, in);
+	} while (ret == TDX_ERROR_SEPT_BUSY && retry-- > 0);
+
+	*in = args_in;
+
+	return ret;
+}
+
 u64 tdh_mng_addcx(u64 tdr, u64 tdcs)
 {
 	struct tdx_module_args args = {
@@ -1586,7 +1623,7 @@ u64 tdh_mem_page_add(u64 tdr, u64 gpa, u64 hpa, u64 source, u64 *rcx, u64 *rdx)
 	u64 ret;
 
 	clflush_cache_range(__va(hpa), PAGE_SIZE);
-	ret = seamcall_ret(TDH_MEM_PAGE_ADD, &args);
+	ret = tdx_seamcall_sept(TDH_MEM_PAGE_ADD, &args);
 
 	*rcx = args.rcx;
 	*rdx = args.rdx;
@@ -1605,7 +1642,7 @@ u64 tdh_mem_sept_add(u64 tdr, u64 gpa, u64 level, u64 hpa, u64 *rcx, u64 *rdx)
 	u64 ret;
 
 	clflush_cache_range(__va(hpa), PAGE_SIZE);
-	ret = seamcall_ret(TDH_MEM_SEPT_ADD, &args);
+	ret = tdx_seamcall_sept(TDH_MEM_SEPT_ADD, &args);
 
 	*rcx = args.rcx;
 	*rdx = args.rdx;
@@ -1636,7 +1673,7 @@ u64 tdh_mem_page_aug(u64 tdr, u64 gpa, u64 hpa, u64 *rcx, u64 *rdx)
 	u64 ret;
 
 	clflush_cache_range(__va(hpa), PAGE_SIZE);
-	ret = seamcall_ret(TDH_MEM_PAGE_AUG, &args);
+	ret = tdx_seamcall_sept(TDH_MEM_PAGE_AUG, &args);
 
 	*rcx = args.rcx;
 	*rdx = args.rdx;
@@ -1653,7 +1690,7 @@ u64 tdh_mem_range_block(u64 tdr, u64 gpa, u64 level, u64 *rcx, u64 *rdx)
 	};
 	u64 ret;
 
-	ret = seamcall_ret(TDH_MEM_RANGE_BLOCK, &args);
+	ret = tdx_seamcall_sept(TDH_MEM_RANGE_BLOCK, &args);
 
 	*rcx = args.rcx;
 	*rdx = args.rdx;
@@ -1882,7 +1919,7 @@ u64 tdh_mem_page_remove(u64 tdr, u64 gpa, u64 level, u64 *rcx, u64 *rdx)
 	};
 	u64 ret;
 
-	ret = seamcall_ret(TDH_MEM_PAGE_REMOVE, &args);
+	ret = tdx_seamcall_sept(TDH_MEM_PAGE_REMOVE, &args);
 
 	*rcx = args.rcx;
 	*rdx = args.rdx;
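
A bounded retry of this kind can be sketched in user-space C as follows. It
is a minimal sketch under stated assumptions, not the kernel implementation:
struct module_args, seamcall_fn, seamcall_sept_retry() and fake_seamcall() are
hypothetical names, and the callee is assumed to overwrite its argument
structure with outputs. Unlike tdx_seamcall_sept() as posted, which passes
`in` to seamcall_ret() and copies the saved structure back afterwards, the
sketch issues each attempt on a scratch copy, so every retry starts from the
caller's original inputs and the caller receives the final outputs. The source
does not say which behaviour is intended.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define TDX_OPERAND_BUSY     0x8000020000000000ULL
#define TDX_OPERAND_ID_SEPT  0x92
#define TDX_ERROR_SEPT_BUSY  (TDX_OPERAND_BUSY | TDX_OPERAND_ID_SEPT)
#define SEAMCALL_RETRY_MAX   16

/* Hypothetical, reduced stand-in for struct tdx_module_args. */
struct module_args {
	uint64_t rcx, rdx;
};

/* Hypothetical callee type: reads inputs from *args, writes outputs back. */
typedef uint64_t (*seamcall_fn)(uint64_t op, struct module_args *args);

/*
 * Retry only on "busy with operand SEPT", at most SEAMCALL_RETRY_MAX extra
 * times. Each attempt runs on a fresh copy of the caller's inputs.
 */
static uint64_t seamcall_sept_retry(seamcall_fn call, uint64_t op,
				    struct module_args *args)
{
	struct module_args scratch;
	int retry = SEAMCALL_RETRY_MAX;
	uint64_t ret;

	assert(call != NULL && args != NULL);
	do {
		scratch = *args;          /* restore the original inputs */
		ret = call(op, &scratch); /* callee may clobber scratch */
	} while (ret == TDX_ERROR_SEPT_BUSY && retry-- > 0);

	*args = scratch;                  /* hand the last attempt's outputs back */
	return ret;
}

/* Demo callee: reports SEPT busy fake_busy_left times, then succeeds. */
static int fake_busy_left;
static int fake_calls;

static uint64_t fake_seamcall(uint64_t op, struct module_args *args)
{
	(void)op;
	fake_calls++;
	if (fake_busy_left > 0) {
		fake_busy_left--;
		args->rcx = ~0ULL;        /* clobber, as a failed call might */
		return TDX_ERROR_SEPT_BUSY;
	}
	args->rdx = args->rcx + 1;        /* output derived from the input */
	return 0;
}
```

With SEAMCALL_RETRY_MAX set to 16, a callee that is always busy is invoked 17
times in total (the initial attempt plus 16 retries) before the busy status is
returned to the caller.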