From patchwork Tue Apr 4 13:09:19 2023
X-Patchwork-Submitter: Binbin Wu
X-Patchwork-Id: 13199930
From: Binbin Wu
To: kvm@vger.kernel.org, seanjc@google.com, pbonzini@redhat.com
Cc: binbin.wu@linux.intel.com, kai.huang@intel.com, chao.gao@intel.com, xuelian.guo@intel.com, robert.hu@linux.intel.com
Subject: [PATCH v7 1/5] KVM: x86: Virtualize CR4.LAM_SUP
Date: Tue, 4 Apr 2023 21:09:19 +0800
Message-Id: <20230404130923.27749-2-binbin.wu@linux.intel.com>
In-Reply-To: <20230404130923.27749-1-binbin.wu@linux.intel.com>
References: <20230404130923.27749-1-binbin.wu@linux.intel.com>

From: Robert Hoo

Allow the guest to set CR4.LAM_SUP (bit 28) if the vCPU supports LAM, and keep the bit intercepted (as it already is). LAM uses CR4.LAM_SUP to configure LAM masking for supervisor-mode addresses. To virtualize this, move CR4.LAM_SUP out of CR4_RESERVED_BITS and make its reservation depend on whether the vCPU has the LAM feature. CR4.LAM_SUP is allowed to be set even outside 64-bit mode, but it has no effect there since LAM only applies to 64-bit linear addresses. Leave the bit intercepted to avoid a VMREAD every time KVM fetches its value, with the expectation that the guest won't toggle the bit frequently. Hardware is not required to flush the TLB when CR4.LAM_SUP is toggled, so KVM doesn't need to emulate a TLB flush for it.
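For illustration only (this block is not part of the patch): the net effect of the __cr4_reserved_bits() change below is that CR4.LAM_SUP stays reserved, and therefore #GPs on being set, unless the vCPU's CPUID reports LAM. guest_cr4_reserved_bits() is a made-up helper name; the real code expresses the same logic through the macro in arch/x86/kvm/x86.h.

static unsigned long guest_cr4_reserved_bits(struct kvm_vcpu *vcpu)
{
	unsigned long reserved = CR4_RESERVED_BITS;	/* no longer contains X86_CR4_LAM_SUP */

	if (!guest_cpuid_has(vcpu, X86_FEATURE_LAM))
		reserved |= X86_CR4_LAM_SUP;		/* still reserved for vCPUs without LAM */

	return reserved;	/* kvm_set_cr4() rejects a CR4 value with any of these bits set */
}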
There is no interaction with other features or VMX execution controls here, therefore no additional code is needed in kvm/vmx_set_cr4().

Signed-off-by: Robert Hoo
Co-developed-by: Binbin Wu
Signed-off-by: Binbin Wu
Reviewed-by: Chao Gao
Tested-by: Xuelian Guo
---
 arch/x86/include/asm/kvm_host.h | 3 ++- arch/x86/kvm/vmx/vmx.c | 3 +++ arch/x86/kvm/x86.h | 2 ++ 3 files changed, 7 insertions(+), 1 deletion(-) diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h index 808c292ad3f4..ba594f9ea414 100644 --- a/arch/x86/include/asm/kvm_host.h +++ b/arch/x86/include/asm/kvm_host.h @@ -125,7 +125,8 @@ | X86_CR4_PGE | X86_CR4_PCE | X86_CR4_OSFXSR | X86_CR4_PCIDE \ | X86_CR4_OSXSAVE | X86_CR4_SMEP | X86_CR4_FSGSBASE \ | X86_CR4_OSXMMEXCPT | X86_CR4_LA57 | X86_CR4_VMXE \ - | X86_CR4_SMAP | X86_CR4_PKE | X86_CR4_UMIP)) + | X86_CR4_SMAP | X86_CR4_PKE | X86_CR4_UMIP \ + | X86_CR4_LAM_SUP)) #define CR8_RESERVED_BITS (~(unsigned long)X86_CR8_TPR) diff --git a/arch/x86/kvm/vmx/vmx.c b/arch/x86/kvm/vmx/vmx.c index d2d6e1b6c788..42f163862a0f 100644 --- a/arch/x86/kvm/vmx/vmx.c +++ b/arch/x86/kvm/vmx/vmx.c @@ -7630,6 +7630,9 @@ static void nested_vmx_cr_fixed1_bits_update(struct kvm_vcpu *vcpu) cr4_fixed1_update(X86_CR4_UMIP, ecx, feature_bit(UMIP)); cr4_fixed1_update(X86_CR4_LA57, ecx, feature_bit(LA57)); + entry = kvm_find_cpuid_entry_index(vcpu, 0x7, 1); + cr4_fixed1_update(X86_CR4_LAM_SUP, eax, feature_bit(LAM)); + #undef cr4_fixed1_update } diff --git a/arch/x86/kvm/x86.h b/arch/x86/kvm/x86.h index a8167b47b8c8..3a9d97b899df 100644 --- a/arch/x86/kvm/x86.h +++ b/arch/x86/kvm/x86.h @@ -487,6 +487,8 @@ bool kvm_msr_allowed(struct kvm_vcpu *vcpu, u32 index, u32 type); __reserved_bits |= X86_CR4_VMXE; \ if (!__cpu_has(__c, X86_FEATURE_PCID)) \ __reserved_bits |= X86_CR4_PCIDE; \ + if (!__cpu_has(__c, X86_FEATURE_LAM)) \ + __reserved_bits |= X86_CR4_LAM_SUP; \ __reserved_bits; \ })
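For context (editor's sketch, not part of the series): once the whole series is applied, a guest kernel or a KVM selftest would be expected to discover and enable the feature roughly as below. The bit positions come from this series (LAM is CPUID.(EAX=7,ECX=1):EAX[26], CR4.LAM_SUP is bit 28); the helper names are hypothetical, and before this patch the CR4 write would #GP because the bit is still reserved.

#include <stdint.h>

#define X86_CR4_LAM_SUP_BIT	28

/* CPUID.(EAX=7,ECX=1):EAX[26] enumerates LAM support. */
static int cpu_has_lam(void)
{
	uint32_t eax, ebx, ecx, edx;

	asm volatile("cpuid"
		     : "=a"(eax), "=b"(ebx), "=c"(ecx), "=d"(edx)
		     : "a"(7), "c"(1));
	return (eax >> 26) & 1;
}

/* Guest ring-0 only: turn on LAM masking for supervisor-mode addresses. */
static void enable_lam_sup(void)
{
	unsigned long cr4;

	asm volatile("mov %%cr4, %0" : "=r"(cr4));
	cr4 |= 1UL << X86_CR4_LAM_SUP_BIT;
	asm volatile("mov %0, %%cr4" : : "r"(cr4));
}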
E=McAfee;i="6600,9927,10670"; a="326193406" X-IronPort-AV: E=Sophos;i="5.98,317,1673942400"; d="scan'208";a="326193406" Received: from fmsmga008.fm.intel.com ([10.253.24.58]) by orsmga106.jf.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 04 Apr 2023 06:09:35 -0700 X-ExtLoop1: 1 X-IronPort-AV: E=McAfee;i="6600,9927,10670"; a="750902083" X-IronPort-AV: E=Sophos;i="5.98,317,1673942400"; d="scan'208";a="750902083" Received: from binbinwu-mobl.ccr.corp.intel.com ([10.254.215.140]) by fmsmga008-auth.fm.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 04 Apr 2023 06:09:33 -0700 From: Binbin Wu To: kvm@vger.kernel.org, seanjc@google.com, pbonzini@redhat.com Cc: binbin.wu@linux.intel.com, kai.huang@intel.com, chao.gao@intel.com, xuelian.guo@intel.com, robert.hu@linux.intel.com Subject: [PATCH v7 2/5] KVM: x86: Virtualize CR3.LAM_{U48,U57} Date: Tue, 4 Apr 2023 21:09:20 +0800 Message-Id: <20230404130923.27749-3-binbin.wu@linux.intel.com> X-Mailer: git-send-email 2.25.1 In-Reply-To: <20230404130923.27749-1-binbin.wu@linux.intel.com> References: <20230404130923.27749-1-binbin.wu@linux.intel.com> MIME-Version: 1.0 Precedence: bulk List-ID: X-Mailing-List: kvm@vger.kernel.org From: Robert Hoo LAM uses CR3.LAM_U48 (bit 62) and CR3.LAM_U57 (bit 61) to configure LAM masking for user mode pointers. When EPT is on: CR3 is fully under control of guest, guest LAM is thus transparent to KVM. When EPT is off (shadow paging): - KVM needs to handle guest CR3.LAM_U48 and CR3.LAM_U57 toggles. The two bits are allowed to be set in CR3 if vCPU supports LAM. The two bits should be kept as they are in the shadow CR3. - Perform GFN calculation from guest CR3/PGD generically by extracting the maximal base address mask since the validity has been checked already. - Leave LAM bits in root.pgd to force a new root for a CR3+LAM combination. It could potentially increase root cache misses and mmu reload, however, it's very rare to turn off EPT when performace matters. To be generic, introduce a field 'cr3_ctrl_bits' in kvm_vcpu_arch to record the bits used to control supported features related to CR3 (e.g. LAM). - Supported control bits are set to cr3_ctrl_bits after set cpuid. - Add kvm_vcpu_is_legal_cr3() to validate CR3, tp allow setting of control bits for the supported features. - Add kvm_get_active_cr3_ctrl_bits() to get the active control bits to form a new guest CR3 (in vmx_load_mmu_pgd()). Suggested-by: Sean Christopherson Signed-off-by: Robert Hoo Co-developed-by: Binbin Wu Signed-off-by: Binbin Wu Tested-by: Xuelian Guo --- arch/x86/include/asm/kvm_host.h | 6 ++++++ arch/x86/kvm/cpuid.h | 5 +++++ arch/x86/kvm/mmu.h | 5 +++++ arch/x86/kvm/mmu/mmu.c | 6 +++++- arch/x86/kvm/mmu/mmu_internal.h | 1 + arch/x86/kvm/mmu/paging_tmpl.h | 6 +++++- arch/x86/kvm/mmu/spte.h | 2 +- arch/x86/kvm/vmx/nested.c | 4 ++-- arch/x86/kvm/vmx/vmx.c | 6 +++++- arch/x86/kvm/x86.c | 4 ++-- 10 files changed, 37 insertions(+), 8 deletions(-) diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h index ba594f9ea414..498d2b5e8dc1 100644 --- a/arch/x86/include/asm/kvm_host.h +++ b/arch/x86/include/asm/kvm_host.h @@ -729,6 +729,12 @@ struct kvm_vcpu_arch { unsigned long cr0_guest_owned_bits; unsigned long cr2; unsigned long cr3; + /* + * Bits in CR3 used to enable certain features. These bits are allowed + * to be set in CR3 when vCPU supports the features. When shadow paging + * is used, these bits should be kept as they are in the shadow CR3. 
+ */ + u64 cr3_ctrl_bits; unsigned long cr4; unsigned long cr4_guest_owned_bits; unsigned long cr4_guest_rsvd_bits; diff --git a/arch/x86/kvm/cpuid.h b/arch/x86/kvm/cpuid.h index b1658c0de847..ef8e1b912d7d 100644 --- a/arch/x86/kvm/cpuid.h +++ b/arch/x86/kvm/cpuid.h @@ -42,6 +42,11 @@ static inline int cpuid_maxphyaddr(struct kvm_vcpu *vcpu) return vcpu->arch.maxphyaddr; } +static inline bool kvm_vcpu_is_legal_cr3(struct kvm_vcpu *vcpu, unsigned long cr3) +{ + return !((cr3 & vcpu->arch.reserved_gpa_bits) & ~vcpu->arch.cr3_ctrl_bits); +} + static inline bool kvm_vcpu_is_legal_gpa(struct kvm_vcpu *vcpu, gpa_t gpa) { return !(gpa & vcpu->arch.reserved_gpa_bits); diff --git a/arch/x86/kvm/mmu.h b/arch/x86/kvm/mmu.h index 168c46fd8dd1..29985eeb8e12 100644 --- a/arch/x86/kvm/mmu.h +++ b/arch/x86/kvm/mmu.h @@ -142,6 +142,11 @@ static inline unsigned long kvm_get_active_pcid(struct kvm_vcpu *vcpu) return kvm_get_pcid(vcpu, kvm_read_cr3(vcpu)); } +static inline u64 kvm_get_active_cr3_ctrl_bits(struct kvm_vcpu *vcpu) +{ + return kvm_read_cr3(vcpu) & vcpu->arch.cr3_ctrl_bits; +} + static inline void kvm_mmu_load_pgd(struct kvm_vcpu *vcpu) { u64 root_hpa = vcpu->arch.mmu->root.hpa; diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c index c8ebe542c565..de2c51a0b611 100644 --- a/arch/x86/kvm/mmu/mmu.c +++ b/arch/x86/kvm/mmu/mmu.c @@ -3732,7 +3732,11 @@ static int mmu_alloc_shadow_roots(struct kvm_vcpu *vcpu) hpa_t root; root_pgd = mmu->get_guest_pgd(vcpu); - root_gfn = root_pgd >> PAGE_SHIFT; + /* + * The guest PGD has already been checked for validity, unconditionally + * strip non-address bits when computing the GFN. + */ + root_gfn = (root_pgd & __PT_BASE_ADDR_MASK) >> PAGE_SHIFT; if (mmu_check_root(vcpu, root_gfn)) return 1; diff --git a/arch/x86/kvm/mmu/mmu_internal.h b/arch/x86/kvm/mmu/mmu_internal.h index cc58631e2336..c0479cbc2ca3 100644 --- a/arch/x86/kvm/mmu/mmu_internal.h +++ b/arch/x86/kvm/mmu/mmu_internal.h @@ -21,6 +21,7 @@ extern bool dbg; #endif /* Page table builder macros common to shadow (host) PTEs and guest PTEs. */ +#define __PT_BASE_ADDR_MASK (((1ULL << 52) - 1) & ~(u64)(PAGE_SIZE-1)) #define __PT_LEVEL_SHIFT(level, bits_per_level) \ (PAGE_SHIFT + ((level) - 1) * (bits_per_level)) #define __PT_INDEX(address, level, bits_per_level) \ diff --git a/arch/x86/kvm/mmu/paging_tmpl.h b/arch/x86/kvm/mmu/paging_tmpl.h index 57f0b75c80f9..88351ba04249 100644 --- a/arch/x86/kvm/mmu/paging_tmpl.h +++ b/arch/x86/kvm/mmu/paging_tmpl.h @@ -62,7 +62,7 @@ #endif /* Common logic, but per-type values. These also need to be undefined. */ -#define PT_BASE_ADDR_MASK ((pt_element_t)(((1ULL << 52) - 1) & ~(u64)(PAGE_SIZE-1))) +#define PT_BASE_ADDR_MASK ((pt_element_t)__PT_BASE_ADDR_MASK) #define PT_LVL_ADDR_MASK(lvl) __PT_LVL_ADDR_MASK(PT_BASE_ADDR_MASK, lvl, PT_LEVEL_BITS) #define PT_LVL_OFFSET_MASK(lvl) __PT_LVL_OFFSET_MASK(PT_BASE_ADDR_MASK, lvl, PT_LEVEL_BITS) #define PT_INDEX(addr, lvl) __PT_INDEX(addr, lvl, PT_LEVEL_BITS) @@ -324,6 +324,10 @@ static int FNAME(walk_addr_generic)(struct guest_walker *walker, trace_kvm_mmu_pagetable_walk(addr, access); retry_walk: walker->level = mmu->cpu_role.base.level; + /* + * No need to mask cr3_ctrl_bits, gpte_to_gfn() will strip + * non-address bits. 
+ */ pte = mmu->get_guest_pgd(vcpu); have_ad = PT_HAVE_ACCESSED_DIRTY(mmu); diff --git a/arch/x86/kvm/mmu/spte.h b/arch/x86/kvm/mmu/spte.h index 1279db2eab44..777f7d443e3b 100644 --- a/arch/x86/kvm/mmu/spte.h +++ b/arch/x86/kvm/mmu/spte.h @@ -36,7 +36,7 @@ static_assert(SPTE_TDP_AD_ENABLED == 0); #ifdef CONFIG_DYNAMIC_PHYSICAL_MASK #define SPTE_BASE_ADDR_MASK (physical_mask & ~(u64)(PAGE_SIZE-1)) #else -#define SPTE_BASE_ADDR_MASK (((1ULL << 52) - 1) & ~(u64)(PAGE_SIZE-1)) +#define SPTE_BASE_ADDR_MASK __PT_BASE_ADDR_MASK #endif #define SPTE_PERM_MASK (PT_PRESENT_MASK | PT_WRITABLE_MASK | shadow_user_mask \ diff --git a/arch/x86/kvm/vmx/nested.c b/arch/x86/kvm/vmx/nested.c index 1bc2b80273c9..d35bda9610e2 100644 --- a/arch/x86/kvm/vmx/nested.c +++ b/arch/x86/kvm/vmx/nested.c @@ -1079,7 +1079,7 @@ static int nested_vmx_load_cr3(struct kvm_vcpu *vcpu, unsigned long cr3, bool nested_ept, bool reload_pdptrs, enum vm_entry_failure_code *entry_failure_code) { - if (CC(kvm_vcpu_is_illegal_gpa(vcpu, cr3))) { + if (CC(!kvm_vcpu_is_legal_cr3(vcpu, cr3))) { *entry_failure_code = ENTRY_FAIL_DEFAULT; return -EINVAL; } @@ -2907,7 +2907,7 @@ static int nested_vmx_check_host_state(struct kvm_vcpu *vcpu, if (CC(!nested_host_cr0_valid(vcpu, vmcs12->host_cr0)) || CC(!nested_host_cr4_valid(vcpu, vmcs12->host_cr4)) || - CC(kvm_vcpu_is_illegal_gpa(vcpu, vmcs12->host_cr3))) + CC(!kvm_vcpu_is_legal_cr3(vcpu, vmcs12->host_cr3))) return -EINVAL; if (CC(is_noncanonical_address(vmcs12->host_ia32_sysenter_esp, vcpu)) || diff --git a/arch/x86/kvm/vmx/vmx.c b/arch/x86/kvm/vmx/vmx.c index 42f163862a0f..4d329ee9474c 100644 --- a/arch/x86/kvm/vmx/vmx.c +++ b/arch/x86/kvm/vmx/vmx.c @@ -3388,7 +3388,8 @@ static void vmx_load_mmu_pgd(struct kvm_vcpu *vcpu, hpa_t root_hpa, update_guest_cr3 = false; vmx_ept_load_pdptrs(vcpu); } else { - guest_cr3 = root_hpa | kvm_get_active_pcid(vcpu); + guest_cr3 = root_hpa | kvm_get_active_pcid(vcpu) | + kvm_get_active_cr3_ctrl_bits(vcpu); } if (update_guest_cr3) @@ -7763,6 +7764,9 @@ static void vmx_vcpu_after_set_cpuid(struct kvm_vcpu *vcpu) vmx->msr_ia32_feature_control_valid_bits &= ~FEAT_CTL_SGX_LC_ENABLED; + if (guest_cpuid_has(vcpu, X86_FEATURE_LAM)) + vcpu->arch.cr3_ctrl_bits |= X86_CR3_LAM_U48 | X86_CR3_LAM_U57; + /* Refresh #PF interception to account for MAXPHYADDR changes. */ vmx_update_exception_bitmap(vcpu); } diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c index 7713420abab0..aca255e69d0d 100644 --- a/arch/x86/kvm/x86.c +++ b/arch/x86/kvm/x86.c @@ -1260,7 +1260,7 @@ int kvm_set_cr3(struct kvm_vcpu *vcpu, unsigned long cr3) * stuff CR3, e.g. for RSM emulation, and there is no guarantee that * the current vCPU mode is accurate. 
*/ - if (kvm_vcpu_is_illegal_gpa(vcpu, cr3)) + if (!kvm_vcpu_is_legal_cr3(vcpu, cr3)) return 1; if (is_pae_paging(vcpu) && !load_pdptrs(vcpu, cr3)) @@ -11349,7 +11349,7 @@ static bool kvm_is_valid_sregs(struct kvm_vcpu *vcpu, struct kvm_sregs *sregs) */ if (!(sregs->cr4 & X86_CR4_PAE) || !(sregs->efer & EFER_LMA)) return false; - if (kvm_vcpu_is_illegal_gpa(vcpu, sregs->cr3)) + if (!kvm_vcpu_is_legal_cr3(vcpu, sregs->cr3)) return false; } else { /*

From patchwork Tue Apr 4 13:09:21 2023
X-Patchwork-Submitter: Binbin Wu
X-Patchwork-Id: 13199932
From: Binbin Wu
To: kvm@vger.kernel.org, seanjc@google.com, pbonzini@redhat.com
Cc: binbin.wu@linux.intel.com, kai.huang@intel.com, chao.gao@intel.com, xuelian.guo@intel.com, robert.hu@linux.intel.com
Subject: [PATCH v7 3/5] KVM: x86: Introduce untag_addr() in kvm_x86_ops
Date: Tue, 4 Apr 2023 21:09:21 +0800
Message-Id: <20230404130923.27749-4-binbin.wu@linux.intel.com>
In-Reply-To: <20230404130923.27749-1-binbin.wu@linux.intel.com>
References: <20230404130923.27749-1-binbin.wu@linux.intel.com>

Introduce a new interface untag_addr() in kvm_x86_ops to untag the metadata from a linear address. Implement the LAM version in VMX and a dummy version in SVM.
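As a concrete illustration (editor's sketch, not code from the patch): the LAM untag operation boils down to sign-extending from the last non-metadata bit while preserving bit 63, using the kernel's sign_extend64() and BIT_ULL() helpers; lam_untag() is a made-up name. For a supervisor pointer under LAM57, for example, bits 62:57 are overwritten with copies of bit 56 and the result then goes through the ordinary canonical check.

/* sign_ext_bit is 47 for LAM48 or 56 for LAM57. */
static inline u64 lam_untag(u64 addr, int sign_ext_bit)
{
	u64 user_bit = addr & BIT_ULL(63);	/* keep bit 63: user vs. supervisor */

	return (sign_extend64(addr, sign_ext_bit) & ~BIT_ULL(63)) | user_bit;
}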
When a feature such as Intel Linear Address Masking (LAM) or AMD Upper Address Ignore (UAI) is enabled, a linear address may be tagged with metadata in its upper bits. Where LAM or UAI applies, the linear address must be checked against the modified canonicality rules and untagged in instruction emulation and VM-exit handling. Introduce untag_addr() in kvm_x86_ops to hide the vendor-specific details.

- For VMX, implement the LAM version. LAM has a modified canonical check when applicable:

  * LAM_S48                : [ 1 ][ metadata ][ 1 ]
                               63               47
  * LAM_U48                : [ 0 ][ metadata ][ 0 ]
                               63               47
  * LAM_S57                : [ 1 ][ metadata ][ 1 ]
                               63               56
  * LAM_U57 + 5-lvl paging : [ 0 ][ metadata ][ 0 ]
                               63               56
  * LAM_U57 + 4-lvl paging : [ 0 ][ metadata ][ 0...0 ]
                               63               56..47

  If LAM is applicable to an address, untag the metadata bits by replacing them with the value of bit 47 (LAM48) or bit 56 (LAM57). The untagged address then goes through the legacy canonical check, so the LAM canonical check and mask are covered by "untag + legacy canonical check". For cases where LAM is not applicable, 'flags' is passed to the interface to skip untagging.

- For SVM, add a dummy version that does nothing and returns the original address.

Signed-off-by: Binbin Wu
Tested-by: Xuelian Guo
---
 arch/x86/include/asm/kvm-x86-ops.h | 1 + arch/x86/include/asm/kvm_host.h | 5 +++ arch/x86/kvm/svm/svm.c | 7 ++++ arch/x86/kvm/vmx/vmx.c | 60 ++++++++++++++++++++++++++++++ arch/x86/kvm/vmx/vmx.h | 2 + 5 files changed, 75 insertions(+) diff --git a/arch/x86/include/asm/kvm-x86-ops.h b/arch/x86/include/asm/kvm-x86-ops.h index 8dc345cc6318..7d63d1b942ac 100644 --- a/arch/x86/include/asm/kvm-x86-ops.h +++ b/arch/x86/include/asm/kvm-x86-ops.h @@ -52,6 +52,7 @@ KVM_X86_OP(cache_reg) KVM_X86_OP(get_rflags) KVM_X86_OP(set_rflags) KVM_X86_OP(get_if_flag) +KVM_X86_OP(untag_addr) KVM_X86_OP(flush_tlb_all) KVM_X86_OP(flush_tlb_current) KVM_X86_OP_OPTIONAL(tlb_remote_flush) diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h index 498d2b5e8dc1..cb674ec826d4 100644 --- a/arch/x86/include/asm/kvm_host.h +++ b/arch/x86/include/asm/kvm_host.h @@ -69,6 +69,9 @@ #define KVM_X86_NOTIFY_VMEXIT_VALID_BITS (KVM_X86_NOTIFY_VMEXIT_ENABLED | \ KVM_X86_NOTIFY_VMEXIT_USER) +/* flags for kvm_x86_ops::untag_addr() */ +#define KVM_X86_UNTAG_ADDR_SKIP_LAM _BITULL(0) + /* x86-specific vcpu->requests bit members */ #define KVM_REQ_MIGRATE_TIMER KVM_ARCH_REQ(0) #define KVM_REQ_REPORT_TPR_ACCESS KVM_ARCH_REQ(1) @@ -1607,6 +1610,8 @@ struct kvm_x86_ops { void (*set_rflags)(struct kvm_vcpu *vcpu, unsigned long rflags); bool (*get_if_flag)(struct kvm_vcpu *vcpu); + u64 (*untag_addr)(struct kvm_vcpu *vcpu, u64 la, u64 flags); + void (*flush_tlb_all)(struct kvm_vcpu *vcpu); void (*flush_tlb_current)(struct kvm_vcpu *vcpu); int (*tlb_remote_flush)(struct kvm *kvm); diff --git a/arch/x86/kvm/svm/svm.c b/arch/x86/kvm/svm/svm.c index 252e7f37e4e2..a6e6bd09642b 100644 --- a/arch/x86/kvm/svm/svm.c +++ b/arch/x86/kvm/svm/svm.c @@ -4696,6 +4696,11 @@ static int svm_vm_init(struct kvm *kvm) return 0; } +static u64 svm_untag_addr(struct kvm_vcpu *vcpu, u64 addr, u64 flags) +{ + return addr; +} + static struct kvm_x86_ops svm_x86_ops __initdata = { .name = KBUILD_MODNAME, @@ -4745,6 +4750,8 @@ static struct kvm_x86_ops svm_x86_ops __initdata = { .set_rflags = svm_set_rflags, .get_if_flag = svm_get_if_flag, + .untag_addr = svm_untag_addr, + .flush_tlb_all = svm_flush_tlb_current, .flush_tlb_current = svm_flush_tlb_current, .flush_tlb_gva = svm_flush_tlb_gva, diff --git a/arch/x86/kvm/vmx/vmx.c b/arch/x86/kvm/vmx/vmx.c index
4d329ee9474c..73cc495bd0da 100644 --- a/arch/x86/kvm/vmx/vmx.c +++ b/arch/x86/kvm/vmx/vmx.c @@ -8137,6 +8137,64 @@ static void vmx_vm_destroy(struct kvm *kvm) free_pages((unsigned long)kvm_vmx->pid_table, vmx_get_pid_table_order(kvm)); } + +#define LAM_S57_EN_MASK (X86_CR4_LAM_SUP | X86_CR4_LA57) + +static inline int lam_sign_extend_bit(bool user, struct kvm_vcpu *vcpu) +{ + u64 cr3, cr4; + + if (user) { + cr3 = kvm_read_cr3(vcpu); + if (!!(cr3 & X86_CR3_LAM_U57)) + return 56; + if (!!(cr3 & X86_CR3_LAM_U48)) + return 47; + } else { + cr4 = kvm_read_cr4_bits(vcpu, LAM_S57_EN_MASK); + if (cr4 == LAM_S57_EN_MASK) + return 56; + if (!!(cr4 & X86_CR4_LAM_SUP)) + return 47; + } + return -1; +} + +/* + * Only called in 64-bit mode. + * + * Metadata bits are [62:48] in LAM48 and [62:57] in LAM57. Mask metadata in + * pointers by sign-extending the value of bit 47 (LAM48) or 56 (LAM57). + * The resulting address after untagging isn't guaranteed to be canonical. + * Callers should perform the original canonical check and raise #GP/#SS if the + * address is non-canonical. + */ +u64 vmx_untag_addr(struct kvm_vcpu *vcpu, u64 addr, u64 flags) +{ + int sign_ext_bit; + + /* + * Instead of calling relatively expensive guest_cpuid_has(), just check + * LAM_U48 in cr3_ctrl_bits. If not set, vCPU doesn't supports LAM. + */ + if (!(vcpu->arch.cr3_ctrl_bits & X86_CR3_LAM_U48) || + !!(flags & KVM_X86_UNTAG_ADDR_SKIP_LAM)) + return addr; + + if(!is_64_bit_mode(vcpu)){ + WARN_ONCE(1, "Only be called in 64-bit mode"); + return addr; + } + + sign_ext_bit = lam_sign_extend_bit(!(addr >> 63), vcpu); + + if (sign_ext_bit < 0) + return addr; + + return (sign_extend64(addr, sign_ext_bit) & ~BIT_ULL(63)) | + (addr & BIT_ULL(63)); +} + static struct kvm_x86_ops vmx_x86_ops __initdata = { .name = KBUILD_MODNAME, @@ -8185,6 +8243,8 @@ static struct kvm_x86_ops vmx_x86_ops __initdata = { .set_rflags = vmx_set_rflags, .get_if_flag = vmx_get_if_flag, + .untag_addr = vmx_untag_addr, + .flush_tlb_all = vmx_flush_tlb_all, .flush_tlb_current = vmx_flush_tlb_current, .flush_tlb_gva = vmx_flush_tlb_gva, diff --git a/arch/x86/kvm/vmx/vmx.h b/arch/x86/kvm/vmx/vmx.h index 2acdc54bc34b..79855b9a4aca 100644 --- a/arch/x86/kvm/vmx/vmx.h +++ b/arch/x86/kvm/vmx/vmx.h @@ -433,6 +433,8 @@ void vmx_enable_intercept_for_msr(struct kvm_vcpu *vcpu, u32 msr, int type); u64 vmx_get_l2_tsc_offset(struct kvm_vcpu *vcpu); u64 vmx_get_l2_tsc_multiplier(struct kvm_vcpu *vcpu); +u64 vmx_untag_addr(struct kvm_vcpu *vcpu, u64 addr, u64 flags); + static inline void vmx_set_intercept_for_msr(struct kvm_vcpu *vcpu, u32 msr, int type, bool value) { From patchwork Tue Apr 4 13:09:22 2023 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Binbin Wu X-Patchwork-Id: 13199933 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id EBA71C77B6C for ; Tue, 4 Apr 2023 13:09:46 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S235133AbjDDNJo (ORCPT ); Tue, 4 Apr 2023 09:09:44 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:43036 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S235102AbjDDNJl (ORCPT ); Tue, 4 Apr 2023 09:09:41 -0400 Received: from mga18.intel.com (mga18.intel.com [134.134.136.126]) by lindbergh.monkeyblade.net (Postfix) with 
ESMTPS id 930411BE6 for ; Tue, 4 Apr 2023 06:09:40 -0700 (PDT)
From: Binbin Wu
To: kvm@vger.kernel.org, seanjc@google.com, pbonzini@redhat.com
Cc: binbin.wu@linux.intel.com, kai.huang@intel.com, chao.gao@intel.com, xuelian.guo@intel.com, robert.hu@linux.intel.com
Subject: [PATCH v7 4/5] KVM: x86: Untag address when LAM applicable
Date: Tue, 4 Apr 2023 21:09:22 +0800
Message-Id: <20230404130923.27749-5-binbin.wu@linux.intel.com>
In-Reply-To: <20230404130923.27749-1-binbin.wu@linux.intel.com>
References: <20230404130923.27749-1-binbin.wu@linux.intel.com>

Untag the address of a 64-bit memory/MMIO operand in instruction emulation and in VM-exit handlers when LAM is applicable.

For instruction emulation, untag the address in __linearize() before the canonical check. LAM doesn't apply to instruction fetches or to invlpg, so use KVM_X86_UNTAG_ADDR_SKIP_LAM to skip LAM untagging there.

For VM-exit handling related to 64-bit linear addresses:
- Cases that need untagging:
  Operand(s) of VMX instructions and INVPCID
  Operand(s) of SGX ENCLS
  Linear address in the INVVPID descriptor.
- Cases LAM doesn't apply to (no change needed) Operand of INVLPG Linear address in INVPCID descriptor Co-developed-by: Robert Hoo Signed-off-by: Robert Hoo Signed-off-by: Binbin Wu Tested-by: Xuelian Guo --- arch/x86/kvm/emulate.c | 23 ++++++++++++++++++----- arch/x86/kvm/kvm_emulate.h | 2 ++ arch/x86/kvm/vmx/nested.c | 4 ++++ arch/x86/kvm/vmx/sgx.c | 1 + arch/x86/kvm/x86.c | 10 ++++++++++ 5 files changed, 35 insertions(+), 5 deletions(-) diff --git a/arch/x86/kvm/emulate.c b/arch/x86/kvm/emulate.c index a20bec931764..b7df465eccf2 100644 --- a/arch/x86/kvm/emulate.c +++ b/arch/x86/kvm/emulate.c @@ -688,7 +688,8 @@ static __always_inline int __linearize(struct x86_emulate_ctxt *ctxt, struct segmented_address addr, unsigned *max_size, unsigned size, bool write, bool fetch, - enum x86emul_mode mode, ulong *linear) + enum x86emul_mode mode, ulong *linear, + u64 untag_flags) { struct desc_struct desc; bool usable; @@ -701,6 +702,7 @@ static __always_inline int __linearize(struct x86_emulate_ctxt *ctxt, *max_size = 0; switch (mode) { case X86EMUL_MODE_PROT64: + la = ctxt->ops->untag_addr(ctxt, la, untag_flags); *linear = la; va_bits = ctxt_virt_addr_bits(ctxt); if (!__is_canonical_address(la, va_bits)) @@ -758,7 +760,7 @@ static int linearize(struct x86_emulate_ctxt *ctxt, { unsigned max_size; return __linearize(ctxt, addr, &max_size, size, write, false, - ctxt->mode, linear); + ctxt->mode, linear, 0); } static inline int assign_eip(struct x86_emulate_ctxt *ctxt, ulong dst) @@ -771,7 +773,12 @@ static inline int assign_eip(struct x86_emulate_ctxt *ctxt, ulong dst) if (ctxt->op_bytes != sizeof(unsigned long)) addr.ea = dst & ((1UL << (ctxt->op_bytes << 3)) - 1); - rc = __linearize(ctxt, addr, &max_size, 1, false, true, ctxt->mode, &linear); + /* + * LAM does not apply to addresses used for instruction fetches + * or to those that specify the targets of jump and call instructions + */ + rc = __linearize(ctxt, addr, &max_size, 1, false, true, ctxt->mode, + &linear, KVM_X86_UNTAG_ADDR_SKIP_LAM); if (rc == X86EMUL_CONTINUE) ctxt->_eip = addr.ea; return rc; @@ -906,9 +913,12 @@ static int __do_insn_fetch_bytes(struct x86_emulate_ctxt *ctxt, int op_size) * __linearize is called with size 0 so that it does not do any * boundary check itself. Instead, we use max_size to check * against op_size. + * + * LAM does not apply to addresses used for instruction fetches + * or to those that specify the targets of jump and call instructions */ rc = __linearize(ctxt, addr, &max_size, 0, false, true, ctxt->mode, - &linear); + &linear, KVM_X86_UNTAG_ADDR_SKIP_LAM); if (unlikely(rc != X86EMUL_CONTINUE)) return rc; @@ -3433,8 +3443,11 @@ static int em_invlpg(struct x86_emulate_ctxt *ctxt) { int rc; ulong linear; + unsigned max_size; - rc = linearize(ctxt, ctxt->src.addr.mem, 1, false, &linear); + /* skip untag for invlpg since LAM is not applied to invlpg */ + rc = __linearize(ctxt, ctxt->src.addr.mem, &max_size, 1, false, false, + ctxt->mode, &linear, KVM_X86_UNTAG_ADDR_SKIP_LAM); if (rc == X86EMUL_CONTINUE) ctxt->ops->invlpg(ctxt, linear); /* Disable writeback. 
*/ diff --git a/arch/x86/kvm/kvm_emulate.h b/arch/x86/kvm/kvm_emulate.h index ab65f3a47dfd..8d9f782adccb 100644 --- a/arch/x86/kvm/kvm_emulate.h +++ b/arch/x86/kvm/kvm_emulate.h @@ -225,6 +225,8 @@ struct x86_emulate_ops { int (*leave_smm)(struct x86_emulate_ctxt *ctxt); void (*triple_fault)(struct x86_emulate_ctxt *ctxt); int (*set_xcr)(struct x86_emulate_ctxt *ctxt, u32 index, u64 xcr); + + u64 (*untag_addr)(struct x86_emulate_ctxt *ctxt, u64 addr, u64 flags); }; /* Type, address-of, and value of an instruction's operand. */ diff --git a/arch/x86/kvm/vmx/nested.c b/arch/x86/kvm/vmx/nested.c index d35bda9610e2..48cca88bfd37 100644 --- a/arch/x86/kvm/vmx/nested.c +++ b/arch/x86/kvm/vmx/nested.c @@ -4970,6 +4970,7 @@ int get_vmx_mem_address(struct kvm_vcpu *vcpu, unsigned long exit_qualification, else *ret = off; + *ret = vmx_untag_addr(vcpu, *ret, 0); /* Long mode: #GP(0)/#SS(0) if the memory address is in a * non-canonical form. This is the only check on the memory * destination for long mode! @@ -5787,6 +5788,9 @@ static int handle_invvpid(struct kvm_vcpu *vcpu) vpid02 = nested_get_vpid02(vcpu); switch (type) { case VMX_VPID_EXTENT_INDIVIDUAL_ADDR: + /* invvpid is not valid in compatibility mode */ + if (is_long_mode(vcpu)) + operand.gla = vmx_untag_addr(vcpu, operand.gla, 0); if (!operand.vpid || is_noncanonical_address(operand.gla, vcpu)) return nested_vmx_fail(vcpu, diff --git a/arch/x86/kvm/vmx/sgx.c b/arch/x86/kvm/vmx/sgx.c index 0574030b071f..527f1a902c65 100644 --- a/arch/x86/kvm/vmx/sgx.c +++ b/arch/x86/kvm/vmx/sgx.c @@ -37,6 +37,7 @@ static int sgx_get_encls_gva(struct kvm_vcpu *vcpu, unsigned long offset, if (!IS_ALIGNED(*gva, alignment)) { fault = true; } else if (likely(is_64_bit_mode(vcpu))) { + *gva = vmx_untag_addr(vcpu, *gva, 0); fault = is_noncanonical_address(*gva, vcpu); } else { *gva &= 0xffffffff; diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c index aca255e69d0d..18ad38649714 100644 --- a/arch/x86/kvm/x86.c +++ b/arch/x86/kvm/x86.c @@ -8218,6 +8218,11 @@ static void emulator_vm_bugged(struct x86_emulate_ctxt *ctxt) kvm_vm_bugged(kvm); } +static u64 emulator_untag_addr(struct x86_emulate_ctxt *ctxt, u64 addr, u64 flags) +{ + return static_call(kvm_x86_untag_addr)(emul_to_vcpu(ctxt), addr, flags); +} + static const struct x86_emulate_ops emulate_ops = { .vm_bugged = emulator_vm_bugged, .read_gpr = emulator_read_gpr, @@ -8263,6 +8268,7 @@ static const struct x86_emulate_ops emulate_ops = { .leave_smm = emulator_leave_smm, .triple_fault = emulator_triple_fault, .set_xcr = emulator_set_xcr, + .untag_addr = emulator_untag_addr, }; static void toggle_interruptibility(struct kvm_vcpu *vcpu, u32 mask) @@ -13260,6 +13266,10 @@ int kvm_handle_invpcid(struct kvm_vcpu *vcpu, unsigned long type, gva_t gva) switch (type) { case INVPCID_TYPE_INDIV_ADDR: + /* + * LAM doesn't apply to the linear address in the descriptor, + * still need to be canonical + */ if ((!pcid_enabled && (operand.pcid != 0)) || is_noncanonical_address(operand.gla, vcpu)) { kvm_inject_gp(vcpu, 0); From patchwork Tue Apr 4 13:09:23 2023 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Binbin Wu X-Patchwork-Id: 13199934 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id 8744BC6FD1D for ; Tue, 4 Apr 2023 13:09:48 +0000 (UTC) Received: (majordomo@vger.kernel.org) by 
vger.kernel.org via listexpand id S234786AbjDDNJr (ORCPT ); Tue, 4 Apr 2023 09:09:47 -0400
From: Binbin Wu
To: kvm@vger.kernel.org, seanjc@google.com, pbonzini@redhat.com
Cc: binbin.wu@linux.intel.com, kai.huang@intel.com, chao.gao@intel.com, xuelian.guo@intel.com, robert.hu@linux.intel.com
Subject: [PATCH v7 5/5] KVM: x86: Expose LAM feature to userspace VMM
Date: Tue, 4 Apr 2023 21:09:23 +0800
Message-Id: <20230404130923.27749-6-binbin.wu@linux.intel.com>
In-Reply-To: <20230404130923.27749-1-binbin.wu@linux.intel.com>
References: <20230404130923.27749-1-binbin.wu@linux.intel.com>

From: Robert Hoo

The LAM feature is enumerated by CPUID.7.1:EAX.LAM[bit 26]. Expose the feature to userspace as the final step, now that the following support is in place:
- CR4.LAM_SUP virtualization
- CR3.LAM_U48 and CR3.LAM_U57 virtualization
- Checking and untagging of 64-bit linear addresses when LAM applies, in instruction emulation and VM-exit handlers.

Signed-off-by: Robert Hoo
Signed-off-by: Binbin Wu
Reviewed-by: Jingqi Liu
Reviewed-by: Chao Gao
Tested-by: Xuelian Guo
---
 arch/x86/kvm/cpuid.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/arch/x86/kvm/cpuid.c b/arch/x86/kvm/cpuid.c index 599aebec2d52..1dd5e9034386 100644 --- a/arch/x86/kvm/cpuid.c +++ b/arch/x86/kvm/cpuid.c @@ -670,7 +670,7 @@ void kvm_set_cpu_caps(void) kvm_cpu_cap_mask(CPUID_7_1_EAX, F(AVX_VNNI) | F(AVX512_BF16) | F(CMPCCXADD) | F(FZRM) | F(FSRS) | F(FSRC) | - F(AMX_FP16) | F(AVX_IFMA) + F(AMX_FP16) | F(AVX_IFMA) | F(LAM) ); kvm_cpu_cap_init_kvm_defined(CPUID_7_1_EDX,