
[v9,3/6] KVM: x86: Virtualize CR3.LAM_{U48,U57}

Message ID 20230606091842.13123-4-binbin.wu@linux.intel.com (mailing list archive)
State New, archived
Series Linear Address Masking (LAM) KVM Enabling

Commit Message

Binbin Wu June 6, 2023, 9:18 a.m. UTC
From: Robert Hoo <robert.hu@linux.intel.com>

Add support to allow guests to set the two new CR3 non-address control bits
that enable the Intel CPU feature Linear Address Masking (LAM) on user
pointers.

LAM modifies the checking applied to 64-bit linear addresses, allowing
software to use the untranslated address bits for metadata; the metadata bits
are masked off before the address is used to access memory. LAM uses two new
CR3 non-address bits, LAM_U48 (bit 62) and LAM_U57 (bit 61), to configure LAM
for user pointers. LAM also changes VM entry to allow both bits to be set in
VMCS's HOST_CR3 and GUEST_CR3 for virtualization.
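
To illustrate the masking semantics, here is a hedged, standalone sketch (not
kernel code): with LAM_U57 active, bits 62:57 of a user pointer carry
metadata, and for a canonical user address (bit 63 clear) stripping the
metadata reduces to clearing those bits.

#include <stdint.h>

/* Bits 62:57 are metadata under LAM_U57; bit 63 selects user vs. kernel. */
#define LAM_U57_METADATA_MASK	(0x3fULL << 57)

/* Recover the linear address that hardware actually translates for a user
 * pointer (bit 63 clear) when LAM_U57 is enabled. */
static inline uint64_t lam_u57_untag(uint64_t ptr)
{
	return ptr & ~LAM_U57_METADATA_MASK;
}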

When EPT is on, CR3 is not trapped by KVM and it's up to the guest to set
either of the two LAM control bits. However, when EPT is off, the actual CR3
used by the guest is generated from the shadow MMU root, which is different
from the CR3 that is *set* by the guest, and KVM needs to manually apply any
active control bits to VMCS's GUEST_CR3 based on the cached CR3 *seen* by the
guest.
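
Concretely, the shadow-paging path composes the CR3 loaded into the VMCS from
the shadow root, the active PCID, and the cached control bits (this line is
taken from the vmx_load_mmu_pgd() hunk in the patch below):

	guest_cr3 = root_hpa | kvm_get_active_pcid(vcpu) |
		    kvm_get_active_cr3_ctrl_bits(vcpu);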

KVM manually checks the guest's CR3 to make sure it points to a valid guest
physical address (i.e. to support a smaller MAXPHYADDR in the guest). Extend
this check to allow the two LAM control bits to be set. To make the check
generic, introduce a new field 'cr3_ctrl_bits' in the vcpu to record all
feature control bits that the guest is allowed to set. After the check, the
non-address bits of the guest CR3 are stripped off to extract the guest
physical address.
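
As a hedged worked example with hypothetical values (guest MAXPHYADDR = 48,
LAM exposed to the guest), the extended check behaves as follows:

	u64 reserved_gpa_bits = GENMASK_ULL(63, 48);	/* MAXPHYADDR = 48 */
	u64 cr3_ctrl_bits = X86_CR3_LAM_U48 | X86_CR3_LAM_U57; /* bits 62, 61 */

	/* LAM_U57 plus a 4KiB-aligned root: bit 61 lies above MAXPHYADDR... */
	unsigned long cr3 = BIT_ULL(61) | 0x1000;

	/* ...but ~cr3_ctrl_bits carves it out, so this CR3 is legal. */
	bool legal = !((cr3 & reserved_gpa_bits) & ~cr3_ctrl_bits); /* true */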

In the nested case, for a guest which supports LAM, both VMCS12's HOST_CR3 and
GUEST_CR3 are allowed to have the new LAM control bits set, i.e. when L0 enters
L1 to emulate a VMEXIT from L2 to L1 or when L0 enters L2 directly. KVM also
manually checks that VMCS12's HOST_CR3 and GUEST_CR3 contain valid physical
addresses. Extend these checks to allow the new LAM control bits too.

Note, LAM doesn't have a global control bit to turn it on/off completely;
whether it can be enabled depends purely on the hardware's CPUID. That means,
when EPT is on, even when KVM doesn't expose LAM to the guest, the guest can
still set LAM control bits in CR3 without causing problems. This is an
unfortunate virtualization hole. KVM could choose to intercept CR3 in this
case and inject a fault, but that would hurt performance when running a normal
VM without LAM support, which is undesirable. Simply let the guest do such an
illegal thing; the worst case is the guest being killed when KVM eventually
finds out about the illegal behaviour, and that is the guest's own fault.

Opportunistically use GENMASK_ULL() to define __PT_BASE_ADDR_MASK.
Opportunistically use kvm_vcpu_is_legal_cr3() to check CR3 in SVM nested code,
to provide a clear distinction b/t CR3 and GPA checks.

Suggested-by: Sean Christopherson <seanjc@google.com>
Signed-off-by: Robert Hoo <robert.hu@linux.intel.com>
Co-developed-by: Binbin Wu <binbin.wu@linux.intel.com>
Signed-off-by: Binbin Wu <binbin.wu@linux.intel.com>
Tested-by: Xuelian Guo <xuelian.guo@intel.com>
Reviewed-by: Kai Huang <kai.huang@intel.com>
Reviewed-by: Chao Gao <chao.gao@intel.com>
---
 arch/x86/include/asm/kvm_host.h | 5 +++++
 arch/x86/kvm/cpuid.h            | 5 +++++
 arch/x86/kvm/mmu.h              | 5 +++++
 arch/x86/kvm/mmu/mmu.c          | 8 +++++++-
 arch/x86/kvm/mmu/mmu_internal.h | 1 +
 arch/x86/kvm/mmu/paging_tmpl.h  | 3 ++-
 arch/x86/kvm/mmu/spte.h         | 2 +-
 arch/x86/kvm/svm/nested.c       | 4 ++--
 arch/x86/kvm/vmx/nested.c       | 4 ++--
 arch/x86/kvm/vmx/vmx.c          | 8 +++++++-
 arch/x86/kvm/x86.c              | 4 ++--
 11 files changed, 39 insertions(+), 10 deletions(-)

Comments

Sean Christopherson June 27, 2023, 11:40 p.m. UTC | #1
On Tue, Jun 06, 2023, Binbin Wu wrote:
> Opportunistically use GENMASK_ULL() to define __PT_BASE_ADDR_MASK.

These are not the type of changes to make opportunistically.  Opportunistic
changes are things like fixing comment typos, dropping unnecessary semicolons,
fixing coding style violations, etc.

> Opportunistically use kvm_vcpu_is_legal_cr3() to check CR3 in SVM nested code,
> to provide a clear distinction b/t CR3 and GPA checks.

This *shouldn't* be an opportunistic thing.  That you felt compelled to call it
out is a symptom of this patch doing too much.

In short, split this into three patches:

  1. Do the __PT_BASE_ADDR_MASK() changes
  2. Add and use kvm_vcpu_is_legal_cr3()
  3. Add support for CR3.LAM bits

> Suggested-by: Sean Christopherson <seanjc@google.com>
> Signed-off-by: Robert Hoo <robert.hu@linux.intel.com>
> Co-developed-by: Binbin Wu <binbin.wu@linux.intel.com>
> Signed-off-by: Binbin Wu <binbin.wu@linux.intel.com>
> Tested-by: Xuelian Guo <xuelian.guo@intel.com>
> Reviewed-by: Kai Huang <kai.huang@intel.com>
> Reviewed-by: Chao Gao <chao.gao@intel.com>
> ---
>  arch/x86/include/asm/kvm_host.h | 5 +++++
>  arch/x86/kvm/cpuid.h            | 5 +++++
>  arch/x86/kvm/mmu.h              | 5 +++++
>  arch/x86/kvm/mmu/mmu.c          | 8 +++++++-
>  arch/x86/kvm/mmu/mmu_internal.h | 1 +
>  arch/x86/kvm/mmu/paging_tmpl.h  | 3 ++-
>  arch/x86/kvm/mmu/spte.h         | 2 +-
>  arch/x86/kvm/svm/nested.c       | 4 ++--
>  arch/x86/kvm/vmx/nested.c       | 4 ++--
>  arch/x86/kvm/vmx/vmx.c          | 8 +++++++-
>  arch/x86/kvm/x86.c              | 4 ++--
>  11 files changed, 39 insertions(+), 10 deletions(-)
> 
> diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
> index c6f03d151c31..46471dd9cc1b 100644
> --- a/arch/x86/include/asm/kvm_host.h
> +++ b/arch/x86/include/asm/kvm_host.h
> @@ -727,6 +727,11 @@ struct kvm_vcpu_arch {
>  	unsigned long cr0_guest_owned_bits;
>  	unsigned long cr2;
>  	unsigned long cr3;
> +	/*
> +	 * CR3 non-address feature control bits.
> +	 * Guest CR3 may contain any of those bits at runtime.
> +	 */
> +	u64 cr3_ctrl_bits;

This should be an "unsigned long".

Hmm, "ctrl_bits" is unnecessarily generic at this point.  It's also arguably wrong,
because X86_CR3_PCID_NOFLUSH is also a control bit, it's just allowed in CR3 itself.

I think I'd prefer to drop this field and avoid bikeshedding the name entirely.  The
only reason to effectively cache "X86_CR3_LAM_U48 | X86_CR3_LAM_U57" is because
guest_cpuid_has() is slow, and I'd rather solve that problem with the "governed
feature" framework.

More below.

>  	unsigned long cr4;
>  	unsigned long cr4_guest_owned_bits;
>  	unsigned long cr4_guest_rsvd_bits;
> diff --git a/arch/x86/kvm/cpuid.h b/arch/x86/kvm/cpuid.h
> index b1658c0de847..ef8e1b912d7d 100644
> --- a/arch/x86/kvm/cpuid.h
> +++ b/arch/x86/kvm/cpuid.h
> @@ -42,6 +42,11 @@ static inline int cpuid_maxphyaddr(struct kvm_vcpu *vcpu)
>  	return vcpu->arch.maxphyaddr;
>  }
>  
> +static inline bool kvm_vcpu_is_legal_cr3(struct kvm_vcpu *vcpu, unsigned long cr3)

Heh, I think it makes sense to wrap this one.  I'll probably tell you differently
tomorrow, but today, let's wrap.

> +{
> +	return !((cr3 & vcpu->arch.reserved_gpa_bits) & ~vcpu->arch.cr3_ctrl_bits);

Don't open code something for which there is a perfect helper, i.e. use
kvm_vcpu_is_legal_gpa().

If we go the governed feature route, this becomes:

static inline bool kvm_vcpu_is_legal_cr3(struct kvm_vcpu *vcpu,
					 unsigned long cr3)
{
	if (guest_can_use(vcpu, X86_FEATURE_LAM))
		cr3 &= ~(X86_CR3_LAM_U48 | X86_CR3_LAM_U57);

	return kvm_vcpu_is_legal_gpa(cr3);
}

> +}
> +
>  static inline bool kvm_vcpu_is_legal_gpa(struct kvm_vcpu *vcpu, gpa_t gpa)
>  {
>  	return !(gpa & vcpu->arch.reserved_gpa_bits);
> diff --git a/arch/x86/kvm/mmu.h b/arch/x86/kvm/mmu.h
> index 92d5a1924fc1..81d8a433dae1 100644
> --- a/arch/x86/kvm/mmu.h
> +++ b/arch/x86/kvm/mmu.h
> @@ -144,6 +144,11 @@ static inline unsigned long kvm_get_active_pcid(struct kvm_vcpu *vcpu)
>  	return kvm_get_pcid(vcpu, kvm_read_cr3(vcpu));
>  }
>  
> +static inline u64 kvm_get_active_cr3_ctrl_bits(struct kvm_vcpu *vcpu)

And then this becomes:

static inline u64 kvm_get_active_cr3_lam_bits(struct kvm_vcpu *vcpu)
{
	if (!guest_can_use(vcpu, X86_FEATURE_LAM))
		return 0;

	return kvm_read_cr3(vcpu) & (X86_CR3_LAM_U48 | X86_CR3_LAM_U57);
}

> +{
> +	return kvm_read_cr3(vcpu) & vcpu->arch.cr3_ctrl_bits;
> +}
> +
>  static inline void kvm_mmu_load_pgd(struct kvm_vcpu *vcpu)
>  {
>  	u64 root_hpa = vcpu->arch.mmu->root.hpa;
> diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c
> index c8961f45e3b1..deea9a9f0c75 100644
> --- a/arch/x86/kvm/mmu/mmu.c
> +++ b/arch/x86/kvm/mmu/mmu.c
> @@ -3812,7 +3812,13 @@ static int mmu_alloc_shadow_roots(struct kvm_vcpu *vcpu)
>  	hpa_t root;
>  
>  	root_pgd = kvm_mmu_get_guest_pgd(vcpu, mmu);
> -	root_gfn = root_pgd >> PAGE_SHIFT;
> +	/*
> +	 * Guest PGD can be CR3 or EPTP (for nested EPT case). CR3 may contain
> +	 * additional control bits (e.g. LAM control bits). To be generic,
> +	 * unconditionally strip non-address bits when computing the GFN since
> +	 * the guest PGD has already been checked for validity.
> +	 */

Drop this comment, the code is self-explanatory, and the comment is incomplete,
e.g. it can also be nCR3.

> +	root_gfn = (root_pgd & __PT_BASE_ADDR_MASK) >> PAGE_SHIFT;
>  
>  	if (mmu_check_root(vcpu, root_gfn))
>  		return 1;
> diff --git a/arch/x86/kvm/mmu/mmu_internal.h b/arch/x86/kvm/mmu/mmu_internal.h
> index d39af5639ce9..7d2105432d66 100644
> --- a/arch/x86/kvm/mmu/mmu_internal.h
> +++ b/arch/x86/kvm/mmu/mmu_internal.h
> @@ -21,6 +21,7 @@ extern bool dbg;
>  #endif
>  
>  /* Page table builder macros common to shadow (host) PTEs and guest PTEs. */
> +#define __PT_BASE_ADDR_MASK GENMASK_ULL(51, 12)
>  #define __PT_LEVEL_SHIFT(level, bits_per_level)	\
>  	(PAGE_SHIFT + ((level) - 1) * (bits_per_level))
>  #define __PT_INDEX(address, level, bits_per_level) \
> diff --git a/arch/x86/kvm/mmu/paging_tmpl.h b/arch/x86/kvm/mmu/paging_tmpl.h
> index 0662e0278e70..394733ac9088 100644
> --- a/arch/x86/kvm/mmu/paging_tmpl.h
> +++ b/arch/x86/kvm/mmu/paging_tmpl.h
> @@ -62,7 +62,7 @@
>  #endif
>  
>  /* Common logic, but per-type values.  These also need to be undefined. */
> -#define PT_BASE_ADDR_MASK	((pt_element_t)(((1ULL << 52) - 1) & ~(u64)(PAGE_SIZE-1)))
> +#define PT_BASE_ADDR_MASK	((pt_element_t)__PT_BASE_ADDR_MASK)
>  #define PT_LVL_ADDR_MASK(lvl)	__PT_LVL_ADDR_MASK(PT_BASE_ADDR_MASK, lvl, PT_LEVEL_BITS)
>  #define PT_LVL_OFFSET_MASK(lvl)	__PT_LVL_OFFSET_MASK(PT_BASE_ADDR_MASK, lvl, PT_LEVEL_BITS)
>  #define PT_INDEX(addr, lvl)	__PT_INDEX(addr, lvl, PT_LEVEL_BITS)
> @@ -324,6 +324,7 @@ static int FNAME(walk_addr_generic)(struct guest_walker *walker,
>  	trace_kvm_mmu_pagetable_walk(addr, access);
>  retry_walk:
>  	walker->level = mmu->cpu_role.base.level;
> +	/* gpte_to_gfn() will strip non-address bits. */

Drop this comment too, it's not relevant to the immediate code, i.e. it'd be
better suited above this code:

	table_gfn = gpte_to_gfn(pte);

but IMO that code is quite self-explanatory too.

> @@ -7740,6 +7741,11 @@ static void vmx_vcpu_after_set_cpuid(struct kvm_vcpu *vcpu)
>  		vmx->msr_ia32_feature_control_valid_bits &=
>  			~FEAT_CTL_SGX_LC_ENABLED;
>  
> +	if (guest_cpuid_has(vcpu, X86_FEATURE_LAM))

This is wrong, KVM needs to check that the host supports LAM too, otherwise KVM
will allow userspace to shove garbage into guest CR3 and induce VM-Entry failures
and whatnot.  If we go the guest_can_use() route, this problem solves itself.
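
A minimal sketch of that host-support gating (hedged; it mirrors the option-2
snippet Binbin posts later in the thread):

	/* Only let the guest set LAM bits if the host itself supports LAM. */
	if (boot_cpu_has(X86_FEATURE_LAM) &&
	    guest_cpuid_has(vcpu, X86_FEATURE_LAM))
		vcpu->arch.cr3_ctrl_bits |= X86_CR3_LAM_U48 | X86_CR3_LAM_U57;
	else
		vcpu->arch.cr3_ctrl_bits &= ~(X86_CR3_LAM_U48 | X86_CR3_LAM_U57);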
Binbin Wu June 28, 2023, 3:05 a.m. UTC | #2
On 6/28/2023 7:40 AM, Sean Christopherson wrote:
> On Tue, Jun 06, 2023, Binbin Wu wrote:
>> Opportunistically use GENMASK_ULL() to define __PT_BASE_ADDR_MASK.
> These are not the type of changes to make opportunistically.  Opportunistic
> changes are things like fixing comment typos, dropping unnecessary semicolons,
> fixing coding style violations, etc.

OK, thanks for the education.
>
>> Opportunistically use kvm_vcpu_is_legal_cr3() to check CR3 in SVM nested code,
>> to provide a clear distinction b/t CR3 and GPA checks.
> This *shouldn't* be an opportunistic thing.  That you felt compelled to call it
> out is a symptom of this patch doing too much.
>
> In short, split this into three patches:
>
>    1. Do the __PT_BASE_ADDR_MASK() changes
>    2. Add and use kvm_vcpu_is_legal_cr3()
>    3. Add support for CR3.LAM bits
Will do that, thanks.

>
>> Suggested-by: Sean Christopherson <seanjc@google.com>
>> Signed-off-by: Robert Hoo <robert.hu@linux.intel.com>
>> Co-developed-by: Binbin Wu <binbin.wu@linux.intel.com>
>> Signed-off-by: Binbin Wu <binbin.wu@linux.intel.com>
>> Tested-by: Xuelian Guo <xuelian.guo@intel.com>
>> Reviewed-by: Kai Huang <kai.huang@intel.com>
>> Reviewed-by: Chao Gao <chao.gao@intel.com>
>> ---
>>   arch/x86/include/asm/kvm_host.h | 5 +++++
>>   arch/x86/kvm/cpuid.h            | 5 +++++
>>   arch/x86/kvm/mmu.h              | 5 +++++
>>   arch/x86/kvm/mmu/mmu.c          | 8 +++++++-
>>   arch/x86/kvm/mmu/mmu_internal.h | 1 +
>>   arch/x86/kvm/mmu/paging_tmpl.h  | 3 ++-
>>   arch/x86/kvm/mmu/spte.h         | 2 +-
>>   arch/x86/kvm/svm/nested.c       | 4 ++--
>>   arch/x86/kvm/vmx/nested.c       | 4 ++--
>>   arch/x86/kvm/vmx/vmx.c          | 8 +++++++-
>>   arch/x86/kvm/x86.c              | 4 ++--
>>   11 files changed, 39 insertions(+), 10 deletions(-)
>>
>> diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
>> index c6f03d151c31..46471dd9cc1b 100644
>> --- a/arch/x86/include/asm/kvm_host.h
>> +++ b/arch/x86/include/asm/kvm_host.h
>> @@ -727,6 +727,11 @@ struct kvm_vcpu_arch {
>>   	unsigned long cr0_guest_owned_bits;
>>   	unsigned long cr2;
>>   	unsigned long cr3;
>> +	/*
>> +	 * CR3 non-address feature control bits.
>> +	 * Guest CR3 may contain any of those bits at runtime.
>> +	 */
>> +	u64 cr3_ctrl_bits;
> This should be an "unsigned long".
>
> Hmm, "ctrl_bits" is unnecessarily generic at this point.  It's also arguably wrong,
> because X86_CR3_PCID_NOFLUSH is also a control bit, it's just allowed in CR3 itself.
>
> I think I'd prefer to drop this field and avoid bikeshedding the name entirely.  The
> only reason to effectively cache "X86_CR3_LAM_U48 | X86_CR3_LAM_U57" is because
> guest_cpuid_has() is slow, and I'd rather solve that problem with the "governed
> feature" framework.
Thanks for the suggestion.

Is the below patch the latest version of the "governed feature" framework
support?
https://lore.kernel.org/kvm/20230217231022.816138-2-seanjc@google.com/

Do you plan to apply it to the kvm-x86 repo?

>
> More below.
>
>>   	unsigned long cr4;
>>   	unsigned long cr4_guest_owned_bits;
>>   	unsigned long cr4_guest_rsvd_bits;
>> diff --git a/arch/x86/kvm/cpuid.h b/arch/x86/kvm/cpuid.h
>> index b1658c0de847..ef8e1b912d7d 100644
>> --- a/arch/x86/kvm/cpuid.h
>> +++ b/arch/x86/kvm/cpuid.h
>> @@ -42,6 +42,11 @@ static inline int cpuid_maxphyaddr(struct kvm_vcpu *vcpu)
>>   	return vcpu->arch.maxphyaddr;
>>   }
>>   
>> +static inline bool kvm_vcpu_is_legal_cr3(struct kvm_vcpu *vcpu, unsigned long cr3)
> Heh, I think it makes sense to wrap this one.  I'll probably tell you differently
> tomorrow, but today, let's wrap.
>
>> +{
>> +	return !((cr3 & vcpu->arch.reserved_gpa_bits) & ~vcpu->arch.cr3_ctrl_bits);
> Don't open code something for which there is a perfect helper, i.e. use
> kvm_vcpu_is_legal_gpa().
I didn't use the helper because, after masking the control bits (LAM bits),
CR3 is not conceptually a GPA, i.e. it contains the PCID or PWT/PCD in its
lower bits.
But maybe I am overthinking this? Will use the helper instead.

>
> If we go the governed feature route, this becomes:
>
> static inline bool kvm_vcpu_is_legal_cr3(struct kvm_vcpu *vcpu,
> 					 unsigned long cr3)
> {
> 	if (guest_can_use(vcpu, X86_FEATURE_LAM))
> 		cr3 &= ~(X86_CR3_LAM_U48 | X86_CR3_LAM_U57);
>
> 	return kvm_vcpu_is_legal_gpa(cr3);
> }
>
>> +}
>> +
>>   static inline bool kvm_vcpu_is_legal_gpa(struct kvm_vcpu *vcpu, gpa_t gpa)
>>   {
>>   	return !(gpa & vcpu->arch.reserved_gpa_bits);
>> diff --git a/arch/x86/kvm/mmu.h b/arch/x86/kvm/mmu.h
>> index 92d5a1924fc1..81d8a433dae1 100644
>> --- a/arch/x86/kvm/mmu.h
>> +++ b/arch/x86/kvm/mmu.h
>> @@ -144,6 +144,11 @@ static inline unsigned long kvm_get_active_pcid(struct kvm_vcpu *vcpu)
>>   	return kvm_get_pcid(vcpu, kvm_read_cr3(vcpu));
>>   }
>>   
>> +static inline u64 kvm_get_active_cr3_ctrl_bits(struct kvm_vcpu *vcpu)
> And then this becomes:
>
> static inline u64 kvm_get_active_cr3_lam_bits(struct kvm_vcpu *vcpu)
> {
> 	if (!guest_can_use(vcpu, X86_FEATURE_LAM))
> 		return 0;
>
> 	return kvm_read_cr3(vcpu) & (X86_CR3_LAM_U48 | X86_CR3_LAM_U57);
> }
>
>> +{
>> +	return kvm_read_cr3(vcpu) & vcpu->arch.cr3_ctrl_bits;
>> +}
>> +
>>   static inline void kvm_mmu_load_pgd(struct kvm_vcpu *vcpu)
>>   {
>>   	u64 root_hpa = vcpu->arch.mmu->root.hpa;
>> diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c
>> index c8961f45e3b1..deea9a9f0c75 100644
>> --- a/arch/x86/kvm/mmu/mmu.c
>> +++ b/arch/x86/kvm/mmu/mmu.c
>> @@ -3812,7 +3812,13 @@ static int mmu_alloc_shadow_roots(struct kvm_vcpu *vcpu)
>>   	hpa_t root;
>>   
>>   	root_pgd = kvm_mmu_get_guest_pgd(vcpu, mmu);
>> -	root_gfn = root_pgd >> PAGE_SHIFT;
>> +	/*
>> +	 * Guest PGD can be CR3 or EPTP (for nested EPT case). CR3 may contain
>> +	 * additional control bits (e.g. LAM control bits). To be generic,
>> +	 * unconditionally strip non-address bits when computing the GFN since
>> +	 * the guest PGD has already been checked for validity.
>> +	 */
> Drop this comment, the code is self-explanatory, and the comment is incomplete,
> e.g. it can also be nCR3.
I was trying to use CR3 for both nested/non-nested cases. Sorry for the 
confusion.
Anyway, will drop the comment.


>
>> +	root_gfn = (root_pgd & __PT_BASE_ADDR_MASK) >> PAGE_SHIFT;
>>   
>>   	if (mmu_check_root(vcpu, root_gfn))
>>   		return 1;
>> diff --git a/arch/x86/kvm/mmu/mmu_internal.h b/arch/x86/kvm/mmu/mmu_internal.h
>> index d39af5639ce9..7d2105432d66 100644
>> --- a/arch/x86/kvm/mmu/mmu_internal.h
>> +++ b/arch/x86/kvm/mmu/mmu_internal.h
>> @@ -21,6 +21,7 @@ extern bool dbg;
>>   #endif
>>   
>>   /* Page table builder macros common to shadow (host) PTEs and guest PTEs. */
>> +#define __PT_BASE_ADDR_MASK GENMASK_ULL(51, 12)
>>   #define __PT_LEVEL_SHIFT(level, bits_per_level)	\
>>   	(PAGE_SHIFT + ((level) - 1) * (bits_per_level))
>>   #define __PT_INDEX(address, level, bits_per_level) \
>> diff --git a/arch/x86/kvm/mmu/paging_tmpl.h b/arch/x86/kvm/mmu/paging_tmpl.h
>> index 0662e0278e70..394733ac9088 100644
>> --- a/arch/x86/kvm/mmu/paging_tmpl.h
>> +++ b/arch/x86/kvm/mmu/paging_tmpl.h
>> @@ -62,7 +62,7 @@
>>   #endif
>>   
>>   /* Common logic, but per-type values.  These also need to be undefined. */
>> -#define PT_BASE_ADDR_MASK	((pt_element_t)(((1ULL << 52) - 1) & ~(u64)(PAGE_SIZE-1)))
>> +#define PT_BASE_ADDR_MASK	((pt_element_t)__PT_BASE_ADDR_MASK)
>>   #define PT_LVL_ADDR_MASK(lvl)	__PT_LVL_ADDR_MASK(PT_BASE_ADDR_MASK, lvl, PT_LEVEL_BITS)
>>   #define PT_LVL_OFFSET_MASK(lvl)	__PT_LVL_OFFSET_MASK(PT_BASE_ADDR_MASK, lvl, PT_LEVEL_BITS)
>>   #define PT_INDEX(addr, lvl)	__PT_INDEX(addr, lvl, PT_LEVEL_BITS)
>> @@ -324,6 +324,7 @@ static int FNAME(walk_addr_generic)(struct guest_walker *walker,
>>   	trace_kvm_mmu_pagetable_walk(addr, access);
>>   retry_walk:
>>   	walker->level = mmu->cpu_role.base.level;
>> +	/* gpte_to_gfn() will strip non-address bits. */
> Drop this comment too, it's not relevant to the immediate code, i.e. it'd be
> better suited above this code:
>
> 	table_gfn = gpte_to_gfn(pte);
>
> but IMO that code is quite self-explanatory too.

OK, thanks.
>
>> @@ -7740,6 +7741,11 @@ static void vmx_vcpu_after_set_cpuid(struct kvm_vcpu *vcpu)
>>   		vmx->msr_ia32_feature_control_valid_bits &=
>>   			~FEAT_CTL_SGX_LC_ENABLED;
>>   
>> +	if (guest_cpuid_has(vcpu, X86_FEATURE_LAM))
> This is wrong, KVM needs to check that the host supports LAM too, otherwise KVM
> will allow userspace to shove garbage into guest CR3 and induce VM-Entry failures
> and whatnot.
Right, will fix it.

>    If we go the guest_can_use() route, this problem solves itself.
Sean Christopherson June 28, 2023, 5:40 p.m. UTC | #3
On Wed, Jun 28, 2023, Binbin Wu wrote:
> 
> 
> On 6/28/2023 7:40 AM, Sean Christopherson wrote:
> > I think I'd prefer to drop this field and avoid bikeshedding the name entirely.  The
> > only reason to effectively cache "X86_CR3_LAM_U48 | X86_CR3_LAM_U57" is because
> > guest_cpuid_has() is slow, and I'd rather solve that problem with the "governed
> > feature" framework.
> Thanks for the suggestion.
> 
> Is the below patch the latest version of the "governed feature" framework
> support?
> https://lore.kernel.org/kvm/20230217231022.816138-2-seanjc@google.com/

Yes, I haven't refreshed it since the original posting.

> Do you plan to apply it to the kvm-x86 repo?

I'm leaning more and more towards pushing it through sooner than later as this
isn't the first time in recent memory that a patch/series has done somewhat odd
things to work around guest_cpuid_has() being slow.  I was hoping to get feedback
before applying, but that's not looking likely at this point.

> > More below.
> > 
> > >   	unsigned long cr4;
> > >   	unsigned long cr4_guest_owned_bits;
> > >   	unsigned long cr4_guest_rsvd_bits;
> > > diff --git a/arch/x86/kvm/cpuid.h b/arch/x86/kvm/cpuid.h
> > > index b1658c0de847..ef8e1b912d7d 100644
> > > --- a/arch/x86/kvm/cpuid.h
> > > +++ b/arch/x86/kvm/cpuid.h
> > > @@ -42,6 +42,11 @@ static inline int cpuid_maxphyaddr(struct kvm_vcpu *vcpu)
> > >   	return vcpu->arch.maxphyaddr;
> > >   }
> > > +static inline bool kvm_vcpu_is_legal_cr3(struct kvm_vcpu *vcpu, unsigned long cr3)
> > Heh, I think it makes sense to wrap this one.  I'll probably tell you differently
> > tomorrow, but today, let's wrap.
> > 
> > > +{
> > > +	return !((cr3 & vcpu->arch.reserved_gpa_bits) & ~vcpu->arch.cr3_ctrl_bits);
> > Don't open code something for which there is a perfect helper, i.e. use
> > kvm_vcpu_is_legal_gpa().
> I didn't use the helper because, after masking the control bits (LAM bits),
> CR3 is not conceptually a GPA, i.e. it contains the PCID or PWT/PCD in its
> lower bits.
> But maybe I am overthinking this? Will use the helper instead.

You're not overthinking it, I had the exact same reaction.  However, the above
also directly looks at arch.reserved_gpa_bits, i.e. treats CR3 like a GPA for
all intents and purposes, so it's not any better than using kvm_vcpu_is_legal_gpa().
And I couldn't bring myself to suggest adding a "reserved CR3 bits" mask because
CR3 *does* contain a GPA, i.e. we'd still have to check kvm_vcpu_is_legal_gpa(),
and realistically the "reserved CR3 bits" will never be a superset of "illegal
GPA bits".

> > If we go the governed feature route, this becomes:
> > 
> > static inline bool kvm_vcpu_is_legal_cr3(struct kvm_vcpu *vcpu,
> > 					 unsigned long cr3)
> > {
> > 	if (guest_can_use(vcpu, X86_FEATURE_LAM))
> > 		cr3 &= ~(X86_CR3_LAM_U48 | X86_CR3_LAM_U57);
> > 
> > 	return kvm_vcpu_is_legal_gpa(cr3);
> > }
> > 
> > > +}
> > > +
> > >   static inline bool kvm_vcpu_is_legal_gpa(struct kvm_vcpu *vcpu, gpa_t gpa)
> > >   {
> > >   	return !(gpa & vcpu->arch.reserved_gpa_bits);
> > > diff --git a/arch/x86/kvm/mmu.h b/arch/x86/kvm/mmu.h
> > > index 92d5a1924fc1..81d8a433dae1 100644
> > > --- a/arch/x86/kvm/mmu.h
> > > +++ b/arch/x86/kvm/mmu.h
> > > @@ -144,6 +144,11 @@ static inline unsigned long kvm_get_active_pcid(struct kvm_vcpu *vcpu)
> > >   	return kvm_get_pcid(vcpu, kvm_read_cr3(vcpu));
> > >   }
> > > +static inline u64 kvm_get_active_cr3_ctrl_bits(struct kvm_vcpu *vcpu)
> > And then this becomes:
> > 
> > static inline u64 kvm_get_active_cr3_lam_bits(struct kvm_vcpu *vcpu)
> > {
> > 	if (!guest_can_use(vcpu, X86_FEATURE_LAM))
> > 		return 0;
> > 
> > 	return kvm_read_cr3(vcpu) & (X86_CR3_LAM_U48 | X86_CR3_LAM_U57);
> > }
> > 
> > > +{
> > > +	return kvm_read_cr3(vcpu) & vcpu->arch.cr3_ctrl_bits;
> > > +}
> > > +
> > >   static inline void kvm_mmu_load_pgd(struct kvm_vcpu *vcpu)
> > >   {
> > >   	u64 root_hpa = vcpu->arch.mmu->root.hpa;
> > > diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c
> > > index c8961f45e3b1..deea9a9f0c75 100644
> > > --- a/arch/x86/kvm/mmu/mmu.c
> > > +++ b/arch/x86/kvm/mmu/mmu.c
> > > @@ -3812,7 +3812,13 @@ static int mmu_alloc_shadow_roots(struct kvm_vcpu *vcpu)
> > >   	hpa_t root;
> > >   	root_pgd = kvm_mmu_get_guest_pgd(vcpu, mmu);
> > > -	root_gfn = root_pgd >> PAGE_SHIFT;
> > > +	/*
> > > +	 * Guest PGD can be CR3 or EPTP (for nested EPT case). CR3 may contain
> > > +	 * additional control bits (e.g. LAM control bits). To be generic,
> > > +	 * unconditionally strip non-address bits when computing the GFN since
> > > +	 * the guest PGD has already been checked for validity.
> > > +	 */
> > Drop this comment, the code is self-explanatory, and the comment is incomplete,
> > e.g. it can also be nCR3.
> I was trying to use CR3 for both nested/non-nested cases. Sorry for the
> confusion.
> Anyway, will drop the comment.

FWIW, EPTP also has non-address bits.  But the real reason I don't think this
warrants a comment is that "pgd" is specifically not an address, i.e. it's
fully expected and intuitive that retrieving the gfn from a pgd would need to
mask off bits.
Binbin Wu July 3, 2023, 7:56 a.m. UTC | #4
On 6/29/2023 1:40 AM, Sean Christopherson wrote:
> On Wed, Jun 28, 2023, Binbin Wu wrote:
>>
>> On 6/28/2023 7:40 AM, Sean Christopherson wrote:
>>> I think I'd prefer to drop this field and avoid bikeshedding the name entirely.  The
>>> only reason to effectively cache "X86_CR3_LAM_U48 | X86_CR3_LAM_U57" is because
>>> guest_cpuid_has() is slow, and I'd rather solve that problem with the "governed
>>> feature" framework.
>> Thanks for the suggestion.
>>
>> Is the below patch the latest version of the "governed feature" framework
>> support?
>> https://lore.kernel.org/kvm/20230217231022.816138-2-seanjc@google.com/
> Yes, I haven't refreshed it since the original posting.
>
>> Do you plan to apply it to the kvm-x86 repo?
> I'm leaning more and more towards pushing it through sooner than later as this
> isn't the first time in recent memory that a patch/series has done somewhat odd
> things to work around guest_cpuid_has() being slow.  I was hoping to get feedback
> before applying, but that's not looking likely at this point.
Hi Sean,

I plan to adopt the "KVM-governed feature framework" to track whether
the guest can use the LAM feature.
Because your patchset is not applied yet, there are two ways to do it. 
Which one do you prefer?

Option 1:
Make KVM LAM patchset base on your "KVM-governed feature framework" 
patchset.

Option 2:
Temporarily add a bool in kvm_vcpu_arch as follows, and use the bool
"can_use_lam" instead of guest_can_use(vcpu, X86_FEATURE_LAM).
And provide a cleanup patch to use the "KVM-governed feature framework",
which can be applied along with or after your patchset.

index fb9d1f2d6136..74c0c70b0a44 100644
--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -748,6 +748,7 @@ struct kvm_vcpu_arch {
         bool tpr_access_reporting;
         bool xsaves_enabled;
         bool xfd_no_write_intercept;
+       bool can_use_lam;
         u64 ia32_xss;
         u64 microcode_version;
         u64 arch_capabilities;
diff --git a/arch/x86/kvm/vmx/vmx.c b/arch/x86/kvm/vmx/vmx.c
index 2d9d155691a7..5b2db5daebb3 100644
--- a/arch/x86/kvm/vmx/vmx.c
+++ b/arch/x86/kvm/vmx/vmx.c
@@ -7737,6 +7737,9 @@ static void vmx_vcpu_after_set_cpuid(struct kvm_vcpu *vcpu)
                 vmx->msr_ia32_feature_control_valid_bits &=
                         ~FEAT_CTL_SGX_LC_ENABLED;

+       vcpu->arch.can_use_lam = boot_cpu_has(X86_FEATURE_LAM) &&
+                                guest_cpuid_has(vcpu, X86_FEATURE_LAM);
+
         /* Refresh #PF interception to account for MAXPHYADDR changes. */
         vmx_update_exception_bitmap(vcpu);
  }

[...]
Sean Christopherson July 22, 2023, 1:28 a.m. UTC | #5
On Mon, Jul 03, 2023, Binbin Wu wrote:
> 
> On 6/29/2023 1:40 AM, Sean Christopherson wrote:
> > On Wed, Jun 28, 2023, Binbin Wu wrote:
> > > 
> > > On 6/28/2023 7:40 AM, Sean Christopherson wrote:
> > > > I think I'd prefer to drop this field and avoid bikeshedding the name entirely.  The
> > > > only reason to effectively cache "X86_CR3_LAM_U48 | X86_CR3_LAM_U57" is because
> > > > guest_cpuid_has() is slow, and I'd rather solve that problem with the "governed
> > > > feature" framework.
> > > Thanks for the suggestion.
> > > 
> > > Is the below patch the latest version of the "governed feature" framework
> > > support?
> > > https://lore.kernel.org/kvm/20230217231022.816138-2-seanjc@google.com/
> > Yes, I haven't refreshed it since the original posting.
> > 
> > > Do you plan to apply it to the kvm-x86 repo?
> > I'm leaning more and more towards pushing it through sooner than later as this
> > isn't the first time in recent memory that a patch/series has done somewhat odd
> > things to work around guest_cpuid_has() being slow.  I was hoping to get feedback
> > before applying, but that's not looking likely at this point.
> Hi Sean,
> 
> I plan to adopt the "KVM-governed feature framework" to track whether the
> guest can use the LAM feature.
> Because your patchset is not applied yet, there are two ways to do it. Which
> one do you prefer?
> 
> Option 1:
> Make KVM LAM patchset base on your "KVM-governed feature framework"
> patchset.
> 
> Option 2:
> Temporarily add a bool in kvm_vcpu_arch as follows, and use the bool
> "can_use_lam" instead of guest_can_use(vcpu, X86_FEATURE_LAM).
> And provide a cleanup patch to use the "KVM-governed feature framework",
> which can be applied along with or after your patchset.

Sorry for not responding.  I was hoping I could get v2 posted before advising on
a direction, but long story short, I made a few goofs and got delayed (I won't get
v2 out until next week).  Belatedly, either option is fine by me (I see you posted
v10 on top of the governed feature stuff).

Thanks!  And again, sorry.

Patch

diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
index c6f03d151c31..46471dd9cc1b 100644
--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -727,6 +727,11 @@  struct kvm_vcpu_arch {
 	unsigned long cr0_guest_owned_bits;
 	unsigned long cr2;
 	unsigned long cr3;
+	/*
+	 * CR3 non-address feature control bits.
+	 * Guest CR3 may contain any of those bits at runtime.
+	 */
+	u64 cr3_ctrl_bits;
 	unsigned long cr4;
 	unsigned long cr4_guest_owned_bits;
 	unsigned long cr4_guest_rsvd_bits;
diff --git a/arch/x86/kvm/cpuid.h b/arch/x86/kvm/cpuid.h
index b1658c0de847..ef8e1b912d7d 100644
--- a/arch/x86/kvm/cpuid.h
+++ b/arch/x86/kvm/cpuid.h
@@ -42,6 +42,11 @@  static inline int cpuid_maxphyaddr(struct kvm_vcpu *vcpu)
 	return vcpu->arch.maxphyaddr;
 }
 
+static inline bool kvm_vcpu_is_legal_cr3(struct kvm_vcpu *vcpu, unsigned long cr3)
+{
+	return !((cr3 & vcpu->arch.reserved_gpa_bits) & ~vcpu->arch.cr3_ctrl_bits);
+}
+
 static inline bool kvm_vcpu_is_legal_gpa(struct kvm_vcpu *vcpu, gpa_t gpa)
 {
 	return !(gpa & vcpu->arch.reserved_gpa_bits);
diff --git a/arch/x86/kvm/mmu.h b/arch/x86/kvm/mmu.h
index 92d5a1924fc1..81d8a433dae1 100644
--- a/arch/x86/kvm/mmu.h
+++ b/arch/x86/kvm/mmu.h
@@ -144,6 +144,11 @@  static inline unsigned long kvm_get_active_pcid(struct kvm_vcpu *vcpu)
 	return kvm_get_pcid(vcpu, kvm_read_cr3(vcpu));
 }
 
+static inline u64 kvm_get_active_cr3_ctrl_bits(struct kvm_vcpu *vcpu)
+{
+	return kvm_read_cr3(vcpu) & vcpu->arch.cr3_ctrl_bits;
+}
+
 static inline void kvm_mmu_load_pgd(struct kvm_vcpu *vcpu)
 {
 	u64 root_hpa = vcpu->arch.mmu->root.hpa;
diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c
index c8961f45e3b1..deea9a9f0c75 100644
--- a/arch/x86/kvm/mmu/mmu.c
+++ b/arch/x86/kvm/mmu/mmu.c
@@ -3812,7 +3812,13 @@  static int mmu_alloc_shadow_roots(struct kvm_vcpu *vcpu)
 	hpa_t root;
 
 	root_pgd = kvm_mmu_get_guest_pgd(vcpu, mmu);
-	root_gfn = root_pgd >> PAGE_SHIFT;
+	/*
+	 * Guest PGD can be CR3 or EPTP (for nested EPT case). CR3 may contain
+	 * additional control bits (e.g. LAM control bits). To be generic,
+	 * unconditionally strip non-address bits when computing the GFN since
+	 * the guest PGD has already been checked for validity.
+	 */
+	root_gfn = (root_pgd & __PT_BASE_ADDR_MASK) >> PAGE_SHIFT;
 
 	if (mmu_check_root(vcpu, root_gfn))
 		return 1;
diff --git a/arch/x86/kvm/mmu/mmu_internal.h b/arch/x86/kvm/mmu/mmu_internal.h
index d39af5639ce9..7d2105432d66 100644
--- a/arch/x86/kvm/mmu/mmu_internal.h
+++ b/arch/x86/kvm/mmu/mmu_internal.h
@@ -21,6 +21,7 @@  extern bool dbg;
 #endif
 
 /* Page table builder macros common to shadow (host) PTEs and guest PTEs. */
+#define __PT_BASE_ADDR_MASK GENMASK_ULL(51, 12)
 #define __PT_LEVEL_SHIFT(level, bits_per_level)	\
 	(PAGE_SHIFT + ((level) - 1) * (bits_per_level))
 #define __PT_INDEX(address, level, bits_per_level) \
diff --git a/arch/x86/kvm/mmu/paging_tmpl.h b/arch/x86/kvm/mmu/paging_tmpl.h
index 0662e0278e70..394733ac9088 100644
--- a/arch/x86/kvm/mmu/paging_tmpl.h
+++ b/arch/x86/kvm/mmu/paging_tmpl.h
@@ -62,7 +62,7 @@ 
 #endif
 
 /* Common logic, but per-type values.  These also need to be undefined. */
-#define PT_BASE_ADDR_MASK	((pt_element_t)(((1ULL << 52) - 1) & ~(u64)(PAGE_SIZE-1)))
+#define PT_BASE_ADDR_MASK	((pt_element_t)__PT_BASE_ADDR_MASK)
 #define PT_LVL_ADDR_MASK(lvl)	__PT_LVL_ADDR_MASK(PT_BASE_ADDR_MASK, lvl, PT_LEVEL_BITS)
 #define PT_LVL_OFFSET_MASK(lvl)	__PT_LVL_OFFSET_MASK(PT_BASE_ADDR_MASK, lvl, PT_LEVEL_BITS)
 #define PT_INDEX(addr, lvl)	__PT_INDEX(addr, lvl, PT_LEVEL_BITS)
@@ -324,6 +324,7 @@  static int FNAME(walk_addr_generic)(struct guest_walker *walker,
 	trace_kvm_mmu_pagetable_walk(addr, access);
 retry_walk:
 	walker->level = mmu->cpu_role.base.level;
+	/* gpte_to_gfn() will strip non-address bits. */
 	pte           = kvm_mmu_get_guest_pgd(vcpu, mmu);
 	have_ad       = PT_HAVE_ACCESSED_DIRTY(mmu);
 
diff --git a/arch/x86/kvm/mmu/spte.h b/arch/x86/kvm/mmu/spte.h
index 1279db2eab44..777f7d443e3b 100644
--- a/arch/x86/kvm/mmu/spte.h
+++ b/arch/x86/kvm/mmu/spte.h
@@ -36,7 +36,7 @@  static_assert(SPTE_TDP_AD_ENABLED == 0);
 #ifdef CONFIG_DYNAMIC_PHYSICAL_MASK
 #define SPTE_BASE_ADDR_MASK (physical_mask & ~(u64)(PAGE_SIZE-1))
 #else
-#define SPTE_BASE_ADDR_MASK (((1ULL << 52) - 1) & ~(u64)(PAGE_SIZE-1))
+#define SPTE_BASE_ADDR_MASK __PT_BASE_ADDR_MASK
 #endif
 
 #define SPTE_PERM_MASK (PT_PRESENT_MASK | PT_WRITABLE_MASK | shadow_user_mask \
diff --git a/arch/x86/kvm/svm/nested.c b/arch/x86/kvm/svm/nested.c
index 96936ddf1b3c..1df801a48451 100644
--- a/arch/x86/kvm/svm/nested.c
+++ b/arch/x86/kvm/svm/nested.c
@@ -311,7 +311,7 @@  static bool __nested_vmcb_check_save(struct kvm_vcpu *vcpu,
 	if ((save->efer & EFER_LME) && (save->cr0 & X86_CR0_PG)) {
 		if (CC(!(save->cr4 & X86_CR4_PAE)) ||
 		    CC(!(save->cr0 & X86_CR0_PE)) ||
-		    CC(kvm_vcpu_is_illegal_gpa(vcpu, save->cr3)))
+		    CC(!kvm_vcpu_is_legal_cr3(vcpu, save->cr3)))
 			return false;
 	}
 
@@ -520,7 +520,7 @@  static void nested_svm_transition_tlb_flush(struct kvm_vcpu *vcpu)
 static int nested_svm_load_cr3(struct kvm_vcpu *vcpu, unsigned long cr3,
 			       bool nested_npt, bool reload_pdptrs)
 {
-	if (CC(kvm_vcpu_is_illegal_gpa(vcpu, cr3)))
+	if (CC(!kvm_vcpu_is_legal_cr3(vcpu, cr3)))
 		return -EINVAL;
 
 	if (reload_pdptrs && !nested_npt && is_pae_paging(vcpu) &&
diff --git a/arch/x86/kvm/vmx/nested.c b/arch/x86/kvm/vmx/nested.c
index e35cf0bd0df9..11b12a75ca91 100644
--- a/arch/x86/kvm/vmx/nested.c
+++ b/arch/x86/kvm/vmx/nested.c
@@ -1085,7 +1085,7 @@  static int nested_vmx_load_cr3(struct kvm_vcpu *vcpu, unsigned long cr3,
 			       bool nested_ept, bool reload_pdptrs,
 			       enum vm_entry_failure_code *entry_failure_code)
 {
-	if (CC(kvm_vcpu_is_illegal_gpa(vcpu, cr3))) {
+	if (CC(!kvm_vcpu_is_legal_cr3(vcpu, cr3))) {
 		*entry_failure_code = ENTRY_FAIL_DEFAULT;
 		return -EINVAL;
 	}
@@ -2913,7 +2913,7 @@  static int nested_vmx_check_host_state(struct kvm_vcpu *vcpu,
 
 	if (CC(!nested_host_cr0_valid(vcpu, vmcs12->host_cr0)) ||
 	    CC(!nested_host_cr4_valid(vcpu, vmcs12->host_cr4)) ||
-	    CC(kvm_vcpu_is_illegal_gpa(vcpu, vmcs12->host_cr3)))
+	    CC(!kvm_vcpu_is_legal_cr3(vcpu, vmcs12->host_cr3)))
 		return -EINVAL;
 
 	if (CC(is_noncanonical_address(vmcs12->host_ia32_sysenter_esp, vcpu)) ||
diff --git a/arch/x86/kvm/vmx/vmx.c b/arch/x86/kvm/vmx/vmx.c
index 0dd2970ba5c8..52dcf3c00bb8 100644
--- a/arch/x86/kvm/vmx/vmx.c
+++ b/arch/x86/kvm/vmx/vmx.c
@@ -3358,7 +3358,8 @@  static void vmx_load_mmu_pgd(struct kvm_vcpu *vcpu, hpa_t root_hpa,
 			update_guest_cr3 = false;
 		vmx_ept_load_pdptrs(vcpu);
 	} else {
-		guest_cr3 = root_hpa | kvm_get_active_pcid(vcpu);
+		guest_cr3 = root_hpa | kvm_get_active_pcid(vcpu) |
+		            kvm_get_active_cr3_ctrl_bits(vcpu);
 	}
 
 	if (update_guest_cr3)
@@ -7740,6 +7741,11 @@  static void vmx_vcpu_after_set_cpuid(struct kvm_vcpu *vcpu)
 		vmx->msr_ia32_feature_control_valid_bits &=
 			~FEAT_CTL_SGX_LC_ENABLED;
 
+	if (guest_cpuid_has(vcpu, X86_FEATURE_LAM))
+		vcpu->arch.cr3_ctrl_bits |= X86_CR3_LAM_U48 | X86_CR3_LAM_U57;
+	else
+		vcpu->arch.cr3_ctrl_bits &= ~(X86_CR3_LAM_U48 | X86_CR3_LAM_U57);
+
 	/* Refresh #PF interception to account for MAXPHYADDR changes. */
 	vmx_update_exception_bitmap(vcpu);
 }
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 5ad55ef71433..709fc920f378 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -1275,7 +1275,7 @@  int kvm_set_cr3(struct kvm_vcpu *vcpu, unsigned long cr3)
 	 * stuff CR3, e.g. for RSM emulation, and there is no guarantee that
 	 * the current vCPU mode is accurate.
 	 */
-	if (kvm_vcpu_is_illegal_gpa(vcpu, cr3))
+	if (!kvm_vcpu_is_legal_cr3(vcpu, cr3))
 		return 1;
 
 	if (is_pae_paging(vcpu) && !load_pdptrs(vcpu, cr3))
@@ -11456,7 +11456,7 @@  static bool kvm_is_valid_sregs(struct kvm_vcpu *vcpu, struct kvm_sregs *sregs)
 		 */
 		if (!(sregs->cr4 & X86_CR4_PAE) || !(sregs->efer & EFER_LMA))
 			return false;
-		if (kvm_vcpu_is_illegal_gpa(vcpu, sregs->cr3))
+		if (!kvm_vcpu_is_legal_cr3(vcpu, sregs->cr3))
 			return false;
 	} else {
 		/*