[v3,09/17] KVM: x86/mmu: Support GFN direct bits

From: Isaku Yamahata <isaku.yamahata@intel.com>

Teach the MMU to map guest GFNs at a massaged position on the TDP, to aid
in implementing TDX shared memory.

Like other CoCo technologies, TDX has the concept of private and shared
memory. For TDX, the private and shared mappings are managed on separate
EPT roots. The private half is managed indirectly through calls into a
protected runtime environment called the TDX module, whereas the shared half
is managed within KVM in normal page tables.

For TDX, the shared half will be mapped in the higher alias, with a "shared
bit" set in the GPA. However, KVM will still manage it with the same
memslots as the private half. This means memslot lookups and zapping
operations will be provided with a GFN without the shared bit set.

So KVM will either need to apply or strip the shared bit before mapping or
zapping the shared EPT. Having GFNs sometimes have the shared bit and
sometimes not would make the code confusing.

So instead arrange the code such that GFNs never have the shared bit set.
Create a concept of "direct bits" that are stripped from the fault
address when setting fault->gfn, and applied within the TDP MMU iterator.
Calling code will behave as if it is operating on the PTE mapping the GFN
(without shared bits), but within the iterator, the actual mappings will be
shifted using bits specific to the root. SPs will have the GFN set
without the shared bit. In the end, the TDP MMU will behave like it is
mapping things at the GFN without the shared bit, but with a strange page
table format where everything is offset by the shared bit.
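
The strip-then-apply flow described above can be sketched in user-space C as
follows. This is only an illustration, not the code from the patch:
struct vm_sketch, fault_gfn() and pte_index() are hypothetical names, and
the 9-bits-per-level index math assumes a conventional 4 KiB paging layout.

```c
#include <assert.h>
#include <stdint.h>

typedef uint64_t gfn_t;
typedef uint64_t gpa_t;

#define PAGE_SHIFT 12

/* Hypothetical stand-in for the per-VM value described above. */
struct vm_sketch {
	gfn_t gfn_direct_bits;	/* 0 for VMs that do not shift mappings */
};

/* Fault path: derive the GFN and strip the direct bits from it. */
static gfn_t fault_gfn(const struct vm_sketch *vm, gpa_t fault_addr)
{
	return (fault_addr >> PAGE_SHIFT) & ~vm->gfn_direct_bits;
}

/* Iterator side: the bits are ORed back in only to select the PTE index. */
static unsigned int pte_index(gfn_t gfn, gfn_t gfn_bits, int level)
{
	/* Callers are expected to pass a GFN with the direct bits stripped. */
	assert(!(gfn & gfn_bits));
	return ((gfn | gfn_bits) >> (9 * (level - 1))) & 0x1ff;
}
```

With a direct bit at GFN bit 39, a fault at GPA (1 << 51) | 0x1234000 yields
GFN 0x1234, and only the top-level index differs from the unshifted walk.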

Since TDX only needs to shift the mapping like this for the shared bit,
which is mapped as the normal TDP root, add a "gfn_direct_bits" field to
the kvm_arch structure for each VM with a default value of 0. TD-specific
initialization code will set the bit at the GFN position that corresponds
to the GPA shared bit. Keep TDX-specific concepts out of the MMU code by
not naming it "shared".
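
As a rough illustration of how a GPA shared bit position translates into the
gfn_direct_bits value, consider the sketch below. The helper name
direct_bits_from_gpa_bit() is hypothetical, and the bit positions used in
the example (GPA bit 47 or 51) are assumptions for illustration; this patch
does not set the field itself.

```c
#include <assert.h>
#include <stdint.h>

typedef uint64_t gfn_t;

#define PAGE_SHIFT 12

/*
 * Hypothetical helper: convert the GPA bit position of the shared bit into
 * the value stored in gfn_direct_bits. A position of 0 is used here to mean
 * "no direct bits", matching the default value of 0.
 */
static gfn_t direct_bits_from_gpa_bit(unsigned int gpa_shared_bit)
{
	if (!gpa_shared_bit)
		return 0;
	/* A bit below PAGE_SHIFT cannot be expressed in a GFN. */
	assert(gpa_shared_bit >= PAGE_SHIFT && gpa_shared_bit < 64);
	return (gfn_t)1 << (gpa_shared_bit - PAGE_SHIFT);
}
```

For example, a shared bit at GPA bit 51 becomes GFN bit 39, and one at GPA
bit 47 becomes GFN bit 35.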

Ranged TLB flushes (i.e. flush_remote_tlbs_range()) target specific GFN
ranges. Under the convention established above, these would need to target
the shifted GFN range. It won't matter functionally, since the actual
implementation will always result in a full flush for the only planned
user (TDX). For correctness reasons, future changes can provide a TDX
x86_ops.flush_remote_tlbs_range implementation that returns -EOPNOTSUPP to
force the full flush for TDs.
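
The fallback behaviour relied on here can be sketched as below. This is a
simplified user-space model, not KVM code: struct flush_stats, the hook
type and all function names are hypothetical, and counters stand in for the
real flushes.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

typedef uint64_t gfn_t;

/* Counters standing in for real flushes so the sketch is self-contained. */
struct flush_stats {
	unsigned int ranged;
	unsigned int full;
};

/* Hypothetical ranged-flush hook; returns 0 or a negative errno. */
typedef int (*flush_range_fn)(struct flush_stats *st, gfn_t gfn,
			      uint64_t nr_pages);

static int ranged_flush_ok(struct flush_stats *st, gfn_t gfn,
			   uint64_t nr_pages)
{
	(void)gfn;
	(void)nr_pages;
	st->ranged++;
	return 0;
}

/* What a TD-specific hook could do: refuse, so the caller flushes fully. */
static int ranged_flush_unsupported(struct flush_stats *st, gfn_t gfn,
				    uint64_t nr_pages)
{
	(void)st;
	(void)gfn;
	(void)nr_pages;
	return -EOPNOTSUPP;
}

/* Try the ranged hook first; any failure falls back to a full flush. */
static void flush_range(struct flush_stats *st, flush_range_fn hook,
			gfn_t gfn, uint64_t nr_pages)
{
	assert(nr_pages > 0);
	if (hook && !hook(st, gfn, nr_pages))
		return;
	st->full++;
}
```

Because the hook declines before looking at the range, the question of
whether the GFN range is shifted never arises for such a VM.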

This leaves one drawback. Some operations use a concept of max gfn (i.e.
kvm_mmu_max_gfn()) to iterate over the whole TDP range. These would then
exceed the range actually covered by each root. It should only result in a
bit of extra iterating, and not cause functional problems. This will be
addressed in a future change.
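
To give a feel for the size of the overshoot, the sketch below counts how
many GFNs in [0, max gfn] carry a direct bit and so can never be a valid
stripped GFN. It is an illustration under assumptions: max_gfn_for() and
wasted_gfns() are hypothetical helpers, a single direct bit is assumed, and
the 52-bit GPA width in the example is only a sample value.

```c
#include <assert.h>
#include <stdint.h>

typedef uint64_t gfn_t;

#define PAGE_SHIFT 12

/* Highest GFN for a given GPA width, modelling the "max gfn" idea above. */
static gfn_t max_gfn_for(unsigned int max_gpa_bits)
{
	assert(max_gpa_bits > PAGE_SHIFT && max_gpa_bits <= 52);
	return ((gfn_t)1 << (max_gpa_bits - PAGE_SHIFT)) - 1;
}

/*
 * Number of GFNs in [0, max_gfn] that have the direct bit set. max_gfn is
 * assumed to be of the form 2^n - 1, as returned by max_gfn_for().
 */
static uint64_t wasted_gfns(gfn_t max_gfn, gfn_t gfn_direct_bits)
{
	if (!gfn_direct_bits || gfn_direct_bits > max_gfn)
		return 0;
	/* Assumes a single direct bit, as in the TDX case described above. */
	assert((gfn_direct_bits & (gfn_direct_bits - 1)) == 0);
	/* Any single bit below n is set in exactly half of [0, 2^n). */
	return (max_gfn + 1) / 2;
}
```

With a 52-bit GPA width and a direct bit at GFN bit 39, half of the iterated
range (2^39 GFNs) falls outside what callers would ever pass in.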

Signed-off-by: Isaku Yamahata <isaku.yamahata@intel.com>
Co-developed-by: Yan Zhao <yan.y.zhao@intel.com>
Signed-off-by: Yan Zhao <yan.y.zhao@intel.com>
Co-developed-by: Rick Edgecombe <rick.p.edgecombe@intel.com>
Signed-off-by: Rick Edgecombe <rick.p.edgecombe@intel.com>
---
TDX MMU Prep v3:
 - Add comment for kvm_gfn_root_mask() (Paolo)
 - Change names mask -> bits (Paolo)
 - Add comment in struct definition for fault->gfn not containing shared
   bit. (Paolo)
 - Drop special handling in kvm_arch_flush_remote_tlbs_range(),
   implement kvm_x86_ops.flush_remote_tlbs_range in a future patch.
   (Paolo)
 - Do addition of kvm arg to iterator in previous patch (Paolo)
 - OR gfn_bits in try_step_side() too, because of issue seen with 4
   level EPT
 - Add warning for GFN bits in wrong arg in tdp_iter_start()

TDX MMU Prep v2:
 - Rename from "KVM: x86/mmu: Add address conversion functions for TDX shared bit of GPA"
 - Dropped Binbin's reviewed-by tag because of the extent of the changes
 - Rename gfn_shared_mask to gfn_direct_mask.
 - Don't include shared bits in GFNs, hide the existence in the TDP MMU
   iterator.
 - Don't do range flushes if a gfn_direct_mask is present.
---
 arch/x86/include/asm/kvm_host.h |  2 ++
 arch/x86/kvm/mmu.h              |  5 +++++
 arch/x86/kvm/mmu/mmu_internal.h | 28 ++++++++++++++++++++++++++--
 arch/x86/kvm/mmu/tdp_iter.c     | 10 ++++++----
 arch/x86/kvm/mmu/tdp_iter.h     | 10 ++++++----
 5 files changed, 45 insertions(+), 10 deletions(-)

Message ID	20240619223614.290657-10-rick.p.edgecombe@intel.com (mailing list archive)
From: Rick Edgecombe <rick.p.edgecombe@intel.com>
To: seanjc@google.com, pbonzini@redhat.com, kvm@vger.kernel.org
Cc: kai.huang@intel.com, dmatlack@google.com, erdemaktas@google.com, isaku.yamahata@gmail.com, linux-kernel@vger.kernel.org, sagis@google.com, yan.y.zhao@intel.com, rick.p.edgecombe@intel.com, Isaku Yamahata <isaku.yamahata@intel.com>
Subject: [PATCH v3 09/17] KVM: x86/mmu: Support GFN direct bits
Date: Wed, 19 Jun 2024 15:36:06 -0700
In-Reply-To: <20240619223614.290657-1-rick.p.edgecombe@intel.com>