Message ID: 1502544906-1108-4-git-send-email-yu.c.zhang@linux.intel.com
State: New, archived
On 12/08/2017 15:35, Yu Zhang wrote:
>  struct rsvd_bits_validate {
> -	u64 rsvd_bits_mask[2][4];
> +	u64 rsvd_bits_mask[2][5];
>  	u64 bad_mt_xwr;
>  };

Can you change this 4 to PT64_ROOT_MAX_LEVEL in patch 2?

> -	if (vcpu->arch.mmu.shadow_root_level == PT64_ROOT_4LEVEL &&
> -	    (vcpu->arch.mmu.root_level == PT64_ROOT_4LEVEL ||
> -	     vcpu->arch.mmu.direct_map)) {
> +	if (vcpu->arch.mmu.root_level >= PT64_ROOT_4LEVEL ||
> +	    vcpu->arch.mmu.direct_map) {
>  		hpa_t root = vcpu->arch.mmu.root_hpa;

You should keep the check on shadow_root_level (changing it to >= of
course), otherwise you break the case where EPT is disabled, paging is
disabled (so vcpu->arch.mmu.direct_map is true) and the host kernel is
32-bit.  In that case shadow pages use PAE format, and entering this
branch is incorrect.

> @@ -4444,7 +4457,7 @@ void kvm_init_shadow_ept_mmu(struct kvm_vcpu *vcpu, bool execonly,
>
>  	MMU_WARN_ON(VALID_PAGE(context->root_hpa));
>
> -	context->shadow_root_level = kvm_x86_ops->get_tdp_level();
> +	context->shadow_root_level = kvm_x86_ops->get_tdp_level(vcpu);
>
>  	context->nx = true;
>  	context->ept_ad = accessed_dirty;

Below, there is:

	context->root_level = context->shadow_root_level;

this should be forced to PT64_ROOT_4LEVEL until there is support for
nested EPT 5-level page tables.

Thanks,

Paolo
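[A minimal sketch of the check Paolo asks for, keeping the shadow_root_level
test and only relaxing the comparisons to >=; this illustrates the review
comment against the mmu_free_roots() context quoted above, it is not the
final committed code.]

	/*
	 * Keep testing shadow_root_level: with EPT disabled, paging
	 * disabled (direct_map == true) and a 32-bit host kernel,
	 * shadow pages use PAE format and must not take this branch.
	 */
	if (vcpu->arch.mmu.shadow_root_level >= PT64_ROOT_4LEVEL &&
	    (vcpu->arch.mmu.root_level >= PT64_ROOT_4LEVEL ||
	     vcpu->arch.mmu.direct_map)) {
		hpa_t root = vcpu->arch.mmu.root_hpa;
		/* ... free the single 64-bit root as before ... */
	}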
Thanks a lot for your comments, Paolo. :-)

On 8/14/2017 3:31 PM, Paolo Bonzini wrote:
> On 12/08/2017 15:35, Yu Zhang wrote:
>>  struct rsvd_bits_validate {
>> -	u64 rsvd_bits_mask[2][4];
>> +	u64 rsvd_bits_mask[2][5];
>>  	u64 bad_mt_xwr;
>>  };
>
> Can you change this 4 to PT64_ROOT_MAX_LEVEL in patch 2?

Well, I had tried, but failed to find a neat approach to do so. The
difficulty I have met is that PT64_ROOT_MAX_LEVEL is defined together
with PT64_ROOT_4LEVEL/PT32E_ROOT_LEVEL/PT32_ROOT_LEVEL in mmu.h, yet
the rsvd_bits_validate structure is defined in kvm_host.h, which is
included in quite a lot of .c files that do not include mmu.h, or
include mmu.h after kvm_host.h.

I guess that's the reason why the magic number 4, instead of
PT64_ROOT_4LEVEL, is used in the current definition of
rsvd_bits_validate. :-)

>> -	if (vcpu->arch.mmu.shadow_root_level == PT64_ROOT_4LEVEL &&
>> -	    (vcpu->arch.mmu.root_level == PT64_ROOT_4LEVEL ||
>> -	     vcpu->arch.mmu.direct_map)) {
>> +	if (vcpu->arch.mmu.root_level >= PT64_ROOT_4LEVEL ||
>> +	    vcpu->arch.mmu.direct_map) {
>>  		hpa_t root = vcpu->arch.mmu.root_hpa;
> You should keep the check on shadow_root_level (changing it to >= of
> course), otherwise you break the case where EPT is disabled, paging is
> disabled (so vcpu->arch.mmu.direct_map is true) and the host kernel is
> 32-bit.  In that case shadow pages use PAE format, and entering this
> branch is incorrect.

Oh, right. Thanks!

>> @@ -4444,7 +4457,7 @@ void kvm_init_shadow_ept_mmu(struct kvm_vcpu *vcpu, bool execonly,
>>
>>  	MMU_WARN_ON(VALID_PAGE(context->root_hpa));
>>
>> -	context->shadow_root_level = kvm_x86_ops->get_tdp_level();
>> +	context->shadow_root_level = kvm_x86_ops->get_tdp_level(vcpu);
>>
>>  	context->nx = true;
>>  	context->ept_ad = accessed_dirty;
> Below, there is:
>
> 	context->root_level = context->shadow_root_level;
>
> this should be forced to PT64_ROOT_4LEVEL until there is support for
> nested EPT 5-level page tables.

So the context->shadow_root_level could be 5 or 4, and
context->root_level is always 4?

My understanding is that shadow ept level should be determined by
the width of ngpa, and that if L1 guest is not exposed with EPT5
feature, it shall only use 4 level ept for L2 guest, and the shadow
ept does not need a 5 level one. Is this understanding correct? And
how about we set both values to PT64_ROOT_4LEVEL for now?

Besides, if we wanna support nested EPT5, what do you think we need to
do besides exposing the EPT5 feature to L1 guest?

Thanks
Yu

> Thanks,
>
> Paolo
On 14/08/2017 13:37, Yu Zhang wrote:
> Thanks a lot for your comments, Paolo. :-)
>
> On 8/14/2017 3:31 PM, Paolo Bonzini wrote:
>> On 12/08/2017 15:35, Yu Zhang wrote:
>>>  struct rsvd_bits_validate {
>>> -	u64 rsvd_bits_mask[2][4];
>>> +	u64 rsvd_bits_mask[2][5];
>>>  	u64 bad_mt_xwr;
>>>  };
>>
>> Can you change this 4 to PT64_ROOT_MAX_LEVEL in patch 2?
>
> Well, I had tried, but failed to find a neat approach to do so. The
> difficulty I have met is that PT64_ROOT_MAX_LEVEL is defined together
> with PT64_ROOT_4LEVEL/PT32E_ROOT_LEVEL/PT32_ROOT_LEVEL in mmu.h, yet
> the rsvd_bits_validate structure is defined in kvm_host.h, which is
> included in quite a lot of .c files that do not include mmu.h, or
> include mmu.h after kvm_host.h.
>
> I guess that's the reason why the magic number 4, instead of
> PT64_ROOT_4LEVEL, is used in the current definition of
> rsvd_bits_validate. :-)

Yes, you're right.  I think the solution is to define
PT64_ROOT_MAX_LEVEL in kvm_host.h.

>>> @@ -4444,7 +4457,7 @@ void kvm_init_shadow_ept_mmu(struct kvm_vcpu
>>> *vcpu, bool execonly,
>>>  	MMU_WARN_ON(VALID_PAGE(context->root_hpa));
>>> -	context->shadow_root_level = kvm_x86_ops->get_tdp_level();
>>> +	context->shadow_root_level = kvm_x86_ops->get_tdp_level(vcpu);
>>>  	context->nx = true;
>>>  	context->ept_ad = accessed_dirty;
>> Below, there is:
>>
>> 	context->root_level = context->shadow_root_level;
>>
>> this should be forced to PT64_ROOT_4LEVEL until there is support for
>> nested EPT 5-level page tables.
>
> So the context->shadow_root_level could be 5 or 4, and
> context->root_level is always 4?

That was my idea, but setting both to 4 should be fine too as you
suggest below.

> My understanding is that shadow ept level should be determined by
> the width of ngpa, and that if L1 guest is not exposed with EPT5
> feature, it shall only use 4 level ept for L2 guest, and the shadow
> ept does not need a 5 level one. Is this understanding correct? And
> how about we set both values to PT64_ROOT_4LEVEL for now?
>
> Besides, if we wanna support nested EPT5, what do you think we need to
> do besides exposing the EPT5 feature to L1 guest?

Nothing else, I think.

Paolo
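[A minimal sketch of the layering Paolo suggests; the exact placement and
surrounding lines are assumptions for illustration, not a quote of the
resulting patch.]

	/* arch/x86/include/asm/kvm_host.h: only the maximum lives here,
	 * so this header does not depend on mmu.h being included first. */
	#define PT64_ROOT_MAX_LEVEL 5

	struct rsvd_bits_validate {
		u64 rsvd_bits_mask[2][PT64_ROOT_MAX_LEVEL];
		u64 bad_mt_xwr;
	};

	/* arch/x86/kvm/mmu.h: the mode-specific root levels stay here. */
	#define PT64_ROOT_5LEVEL 5
	#define PT64_ROOT_4LEVEL 4
	#define PT32E_ROOT_LEVEL 3
	#define PT32_ROOT_LEVEL  2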
On 8/14/2017 10:13 PM, Paolo Bonzini wrote:
> On 14/08/2017 13:37, Yu Zhang wrote:
>> Thanks a lot for your comments, Paolo. :-)
>>
>> On 8/14/2017 3:31 PM, Paolo Bonzini wrote:
>>> On 12/08/2017 15:35, Yu Zhang wrote:
>>>>  struct rsvd_bits_validate {
>>>> -	u64 rsvd_bits_mask[2][4];
>>>> +	u64 rsvd_bits_mask[2][5];
>>>>  	u64 bad_mt_xwr;
>>>>  };
>>> Can you change this 4 to PT64_ROOT_MAX_LEVEL in patch 2?
>> Well, I had tried, but failed to find a neat approach to do so. The
>> difficulty I have met is that PT64_ROOT_MAX_LEVEL is defined together
>> with PT64_ROOT_4LEVEL/PT32E_ROOT_LEVEL/PT32_ROOT_LEVEL in mmu.h, yet
>> the rsvd_bits_validate structure is defined in kvm_host.h, which is
>> included in quite a lot of .c files that do not include mmu.h, or
>> include mmu.h after kvm_host.h.
>>
>> I guess that's the reason why the magic number 4, instead of
>> PT64_ROOT_4LEVEL, is used in the current definition of
>> rsvd_bits_validate. :-)
> Yes, you're right.  I think the solution is to define
> PT64_ROOT_MAX_LEVEL in kvm_host.h.

Thanks, Paolo. How about we also move the definition of PT64_ROOT_4LEVEL/
PT32E_ROOT_LEVEL/PT32_ROOT_LEVEL from mmu.h to kvm_host.h? Then we
can define PT64_ROOT_MAX_LEVEL as PT64_ROOT_4LEVEL instead of 4 in
kvm_host.h.

>>>> @@ -4444,7 +4457,7 @@ void kvm_init_shadow_ept_mmu(struct kvm_vcpu
>>>> *vcpu, bool execonly,
>>>>  	MMU_WARN_ON(VALID_PAGE(context->root_hpa));
>>>> -	context->shadow_root_level = kvm_x86_ops->get_tdp_level();
>>>> +	context->shadow_root_level = kvm_x86_ops->get_tdp_level(vcpu);
>>>>  	context->nx = true;
>>>>  	context->ept_ad = accessed_dirty;
>>> Below, there is:
>>>
>>> 	context->root_level = context->shadow_root_level;
>>>
>>> this should be forced to PT64_ROOT_4LEVEL until there is support for
>>> nested EPT 5-level page tables.
>> So the context->shadow_root_level could be 5 or 4, and
>> context->root_level is always 4?
> That was my idea, but setting both to 4 should be fine too as you
> suggest below.
>
>> My understanding is that shadow ept level should be determined by
>> the width of ngpa, and that if L1 guest is not exposed with EPT5
>> feature, it shall only use 4 level ept for L2 guest, and the shadow
>> ept does not need a 5 level one. Is this understanding correct? And
>> how about we set both values to PT64_ROOT_4LEVEL for now?
>>
>> Besides, if we wanna support nested EPT5, what do you think we need to
>> do besides exposing the EPT5 feature to L1 guest?
> Nothing else, I think.

Thanks. I'll try to keep both values fixed to PT64_ROOT_4LEVEL then. :-)
For nested EPT5, we can enable it later (should be a quite simple patch,
but it needs to be verified in our Simics environment, and I am not sure
whether the nested scenario works there).

B.R.
Yu
On 8/14/2017 11:02 PM, Paolo Bonzini wrote:
> On 14/08/2017 16:32, Yu Zhang wrote:
>> On 8/14/2017 10:13 PM, Paolo Bonzini wrote:
>>> On 14/08/2017 13:37, Yu Zhang wrote:
>>>> Thanks a lot for your comments, Paolo. :-)
>>>>
>>>> On 8/14/2017 3:31 PM, Paolo Bonzini wrote:
>>>>> On 12/08/2017 15:35, Yu Zhang wrote:
>>>>>>  struct rsvd_bits_validate {
>>>>>> -	u64 rsvd_bits_mask[2][4];
>>>>>> +	u64 rsvd_bits_mask[2][5];
>>>>>>  	u64 bad_mt_xwr;
>>>>>>  };
>>>>> Can you change this 4 to PT64_ROOT_MAX_LEVEL in patch 2?
>>>> Well, I had tried, but failed to find a neat approach to do so. The
>>>> difficulty I have met is that PT64_ROOT_MAX_LEVEL is defined together
>>>> with PT64_ROOT_4LEVEL/PT32E_ROOT_LEVEL/PT32_ROOT_LEVEL in mmu.h, yet
>>>> the rsvd_bits_validate structure is defined in kvm_host.h, which is
>>>> included in quite a lot of .c files that do not include mmu.h, or
>>>> include mmu.h after kvm_host.h.
>>>>
>>>> I guess that's the reason why the magic number 4, instead of
>>>> PT64_ROOT_4LEVEL, is used in the current definition of
>>>> rsvd_bits_validate. :-)
>>> Yes, you're right.  I think the solution is to define
>>> PT64_ROOT_MAX_LEVEL in kvm_host.h.
>> Thanks, Paolo. How about we also move the definition of PT64_ROOT_4LEVEL/
>> PT32E_ROOT_LEVEL/PT32_ROOT_LEVEL from mmu.h to kvm_host.h? Then we
>> can define PT64_ROOT_MAX_LEVEL as PT64_ROOT_4LEVEL instead of 4 in
>> kvm_host.h.
> No, I think those are best left in mmu.h.  They are only used in mmu
> files, except for two occurrences in svm.c.
>
> kvm_host.h would have PT64_ROOT_MAX_LEVEL just because it is slightly
> better than "4" or "5".

OK. I can define PT64_ROOT_MAX_LEVEL in kvm_host.h as 4 in patch 2, and
change it to 5 in patch 3. :-)

Thanks
Yu
On 14/08/2017 16:32, Yu Zhang wrote:
> On 8/14/2017 10:13 PM, Paolo Bonzini wrote:
>> On 14/08/2017 13:37, Yu Zhang wrote:
>>> Thanks a lot for your comments, Paolo. :-)
>>>
>>> On 8/14/2017 3:31 PM, Paolo Bonzini wrote:
>>>> On 12/08/2017 15:35, Yu Zhang wrote:
>>>>>  struct rsvd_bits_validate {
>>>>> -	u64 rsvd_bits_mask[2][4];
>>>>> +	u64 rsvd_bits_mask[2][5];
>>>>>  	u64 bad_mt_xwr;
>>>>>  };
>>>> Can you change this 4 to PT64_ROOT_MAX_LEVEL in patch 2?
>>> Well, I had tried, but failed to find a neat approach to do so. The
>>> difficulty I have met is that PT64_ROOT_MAX_LEVEL is defined together
>>> with PT64_ROOT_4LEVEL/PT32E_ROOT_LEVEL/PT32_ROOT_LEVEL in mmu.h, yet
>>> the rsvd_bits_validate structure is defined in kvm_host.h, which is
>>> included in quite a lot of .c files that do not include mmu.h, or
>>> include mmu.h after kvm_host.h.
>>>
>>> I guess that's the reason why the magic number 4, instead of
>>> PT64_ROOT_4LEVEL, is used in the current definition of
>>> rsvd_bits_validate. :-)
>> Yes, you're right.  I think the solution is to define
>> PT64_ROOT_MAX_LEVEL in kvm_host.h.
>
> Thanks, Paolo. How about we also move the definition of PT64_ROOT_4LEVEL/
> PT32E_ROOT_LEVEL/PT32_ROOT_LEVEL from mmu.h to kvm_host.h? Then we
> can define PT64_ROOT_MAX_LEVEL as PT64_ROOT_4LEVEL instead of 4 in
> kvm_host.h.

No, I think those are best left in mmu.h.  They are only used in mmu
files, except for two occurrences in svm.c.

kvm_host.h would have PT64_ROOT_MAX_LEVEL just because it is slightly
better than "4" or "5".

Paolo

>>>>> @@ -4444,7 +4457,7 @@ void kvm_init_shadow_ept_mmu(struct kvm_vcpu
>>>>> *vcpu, bool execonly,
>>>>>  	MMU_WARN_ON(VALID_PAGE(context->root_hpa));
>>>>> -	context->shadow_root_level = kvm_x86_ops->get_tdp_level();
>>>>> +	context->shadow_root_level = kvm_x86_ops->get_tdp_level(vcpu);
>>>>>  	context->nx = true;
>>>>>  	context->ept_ad = accessed_dirty;
>>>> Below, there is:
>>>>
>>>> 	context->root_level = context->shadow_root_level;
>>>>
>>>> this should be forced to PT64_ROOT_4LEVEL until there is support for
>>>> nested EPT 5-level page tables.
>>> So the context->shadow_root_level could be 5 or 4, and
>>> context->root_level is always 4?
>> That was my idea, but setting both to 4 should be fine too as you
>> suggest below.
>>
>>> My understanding is that shadow ept level should be determined by
>>> the width of ngpa, and that if L1 guest is not exposed with EPT5
>>> feature, it shall only use 4 level ept for L2 guest, and the shadow
>>> ept does not need a 5 level one. Is this understanding correct? And
>>> how about we set both values to PT64_ROOT_4LEVEL for now?
>>>
>>> Besides, if we wanna support nested EPT5, what do you think we need to
>>> do besides exposing the EPT5 feature to L1 guest?
>> Nothing else, I think.
>
> Thanks. I'll try to keep both values fixed to PT64_ROOT_4LEVEL then. :-)
> For nested EPT5, we can enable it later (should be a quite simple patch,
> but it needs to be verified in our Simics environment, and I am not sure
> whether the nested scenario works there).
>
> B.R.
> Yu
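[A minimal sketch of the nested-EPT outcome discussed above, against the
kvm_init_shadow_ept_mmu() context from the patch; pinning both fields to 4
is the option Yu proposes and Paolo accepts as fine, not necessarily the
exact final code.]

	/*
	 * Until nested EPT 5-level page tables are supported, pin both
	 * the shadow level and the guest's EPT level to 4, regardless
	 * of what get_tdp_level() would allow on the host.
	 */
	context->shadow_root_level = PT64_ROOT_4LEVEL;
	/* ... nx, ept_ad, handlers, etc. as in the patch ... */
	context->root_level = PT64_ROOT_4LEVEL;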
diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
index 018300e..7e98a75 100644
--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -316,14 +316,14 @@ struct kvm_pio_request {
 };
 
 struct rsvd_bits_validate {
-	u64 rsvd_bits_mask[2][4];
+	u64 rsvd_bits_mask[2][5];
 	u64 bad_mt_xwr;
 };
 
 /*
- * x86 supports 3 paging modes (4-level 64-bit, 3-level 64-bit, and 2-level
- * 32-bit).  The kvm_mmu structure abstracts the details of the current mmu
- * mode.
+ * x86 supports 4 paging modes (5-level 64-bit, 4-level 64-bit, 3-level 32-bit,
+ * and 2-level 32-bit).  The kvm_mmu structure abstracts the details of the
+ * current mmu mode.
  */
 struct kvm_mmu {
 	void (*set_cr3)(struct kvm_vcpu *vcpu, unsigned long root);
@@ -979,7 +979,7 @@ struct kvm_x86_ops {
 	void (*deliver_posted_interrupt)(struct kvm_vcpu *vcpu, int vector);
 	int (*sync_pir_to_irr)(struct kvm_vcpu *vcpu);
 	int (*set_tss_addr)(struct kvm *kvm, unsigned int addr);
-	int (*get_tdp_level)(void);
+	int (*get_tdp_level)(struct kvm_vcpu *vcpu);
 	u64 (*get_mt_mask)(struct kvm_vcpu *vcpu, gfn_t gfn, bool is_mmio);
 	int (*get_lpage_level)(void);
 	bool (*rdtscp_supported)(void);
diff --git a/arch/x86/include/asm/vmx.h b/arch/x86/include/asm/vmx.h
index 5f63a2e..a0fb025 100644
--- a/arch/x86/include/asm/vmx.h
+++ b/arch/x86/include/asm/vmx.h
@@ -453,6 +453,7 @@ enum vmcs_field {
 
 #define VMX_EPT_EXECUTE_ONLY_BIT		(1ull)
 #define VMX_EPT_PAGE_WALK_4_BIT			(1ull << 6)
+#define VMX_EPT_PAGE_WALK_5_BIT			(1ull << 7)
 #define VMX_EPTP_UC_BIT				(1ull << 8)
 #define VMX_EPTP_WB_BIT				(1ull << 14)
 #define VMX_EPT_2MB_PAGE_BIT			(1ull << 16)
diff --git a/arch/x86/kvm/cpuid.c b/arch/x86/kvm/cpuid.c
index 59ca2ee..aceacf8 100644
--- a/arch/x86/kvm/cpuid.c
+++ b/arch/x86/kvm/cpuid.c
@@ -137,6 +137,11 @@ int kvm_update_cpuid(struct kvm_vcpu *vcpu)
 	/* Update physical-address width */
 	vcpu->arch.maxphyaddr = cpuid_query_maxphyaddr(vcpu);
 
+#ifdef CONFIG_X86_64
+	if (vcpu->arch.maxphyaddr > 48)
+		kvm_mmu_reset_context(vcpu);
+#endif
+
 	kvm_pmu_refresh(vcpu);
 	return 0;
 }
diff --git a/arch/x86/kvm/mmu.c b/arch/x86/kvm/mmu.c
index cd4d2cc..298d840 100644
--- a/arch/x86/kvm/mmu.c
+++ b/arch/x86/kvm/mmu.c
@@ -3323,9 +3323,8 @@ static void mmu_free_roots(struct kvm_vcpu *vcpu)
 	if (!VALID_PAGE(vcpu->arch.mmu.root_hpa))
 		return;
 
-	if (vcpu->arch.mmu.shadow_root_level == PT64_ROOT_4LEVEL &&
-	    (vcpu->arch.mmu.root_level == PT64_ROOT_4LEVEL ||
-	     vcpu->arch.mmu.direct_map)) {
+	if (vcpu->arch.mmu.root_level >= PT64_ROOT_4LEVEL ||
+	    vcpu->arch.mmu.direct_map) {
 		hpa_t root = vcpu->arch.mmu.root_hpa;
 
 		spin_lock(&vcpu->kvm->mmu_lock);
@@ -3376,10 +3375,11 @@ static int mmu_alloc_direct_roots(struct kvm_vcpu *vcpu)
 	struct kvm_mmu_page *sp;
 	unsigned i;
 
-	if (vcpu->arch.mmu.shadow_root_level == PT64_ROOT_4LEVEL) {
+	if (vcpu->arch.mmu.shadow_root_level >= PT64_ROOT_4LEVEL) {
 		spin_lock(&vcpu->kvm->mmu_lock);
 		make_mmu_pages_available(vcpu);
-		sp = kvm_mmu_get_page(vcpu, 0, 0, PT64_ROOT_4LEVEL, 1, ACC_ALL);
+		sp = kvm_mmu_get_page(vcpu, 0, 0,
+				vcpu->arch.mmu.shadow_root_level, 1, ACC_ALL);
 		++sp->root_count;
 		spin_unlock(&vcpu->kvm->mmu_lock);
 		vcpu->arch.mmu.root_hpa = __pa(sp->spt);
@@ -3420,15 +3420,15 @@ static int mmu_alloc_shadow_roots(struct kvm_vcpu *vcpu)
 	 * Do we shadow a long mode page table? If so we need to
 	 * write-protect the guests page table root.
 	 */
-	if (vcpu->arch.mmu.root_level == PT64_ROOT_4LEVEL) {
+	if (vcpu->arch.mmu.root_level >= PT64_ROOT_4LEVEL) {
 		hpa_t root = vcpu->arch.mmu.root_hpa;
 
 		MMU_WARN_ON(VALID_PAGE(root));
 
 		spin_lock(&vcpu->kvm->mmu_lock);
 		make_mmu_pages_available(vcpu);
-		sp = kvm_mmu_get_page(vcpu, root_gfn, 0, PT64_ROOT_4LEVEL,
-				0, ACC_ALL);
+		sp = kvm_mmu_get_page(vcpu, root_gfn, 0,
+				vcpu->arch.mmu.shadow_root_level, 0, ACC_ALL);
 		root = __pa(sp->spt);
 		++sp->root_count;
 		spin_unlock(&vcpu->kvm->mmu_lock);
@@ -3520,7 +3520,7 @@ static void mmu_sync_roots(struct kvm_vcpu *vcpu)
 
 	vcpu_clear_mmio_info(vcpu, MMIO_GVA_ANY);
 	kvm_mmu_audit(vcpu, AUDIT_PRE_SYNC);
-	if (vcpu->arch.mmu.root_level == PT64_ROOT_4LEVEL) {
+	if (vcpu->arch.mmu.root_level >= PT64_ROOT_4LEVEL) {
 		hpa_t root = vcpu->arch.mmu.root_hpa;
 		sp = page_header(root);
 		mmu_sync_children(vcpu, sp);
@@ -4022,6 +4022,12 @@ __reset_rsvds_bits_mask(struct kvm_vcpu *vcpu,
 			rsvd_check->rsvd_bits_mask[1][0] =
 				rsvd_check->rsvd_bits_mask[0][0];
 		break;
+	case PT64_ROOT_5LEVEL:
+		rsvd_check->rsvd_bits_mask[0][4] = exb_bit_rsvd |
+			nonleaf_bit8_rsvd | rsvd_bits(7, 7) |
+			rsvd_bits(maxphyaddr, 51);
+		rsvd_check->rsvd_bits_mask[1][4] =
+			rsvd_check->rsvd_bits_mask[0][4];
 	case PT64_ROOT_4LEVEL:
 		rsvd_check->rsvd_bits_mask[0][3] = exb_bit_rsvd |
 			nonleaf_bit8_rsvd | rsvd_bits(7, 7) |
@@ -4063,6 +4069,8 @@ __reset_rsvds_bits_mask_ept(struct rsvd_bits_validate *rsvd_check,
 {
 	u64 bad_mt_xwr;
 
+	rsvd_check->rsvd_bits_mask[0][4] =
+		rsvd_bits(maxphyaddr, 51) | rsvd_bits(3, 7);
 	rsvd_check->rsvd_bits_mask[0][3] =
 		rsvd_bits(maxphyaddr, 51) | rsvd_bits(3, 7);
 	rsvd_check->rsvd_bits_mask[0][2] =
@@ -4072,6 +4080,7 @@ __reset_rsvds_bits_mask_ept(struct rsvd_bits_validate *rsvd_check,
 	rsvd_check->rsvd_bits_mask[0][0] = rsvd_bits(maxphyaddr, 51);
 
 	/* large page */
+	rsvd_check->rsvd_bits_mask[1][4] = rsvd_check->rsvd_bits_mask[0][4];
 	rsvd_check->rsvd_bits_mask[1][3] = rsvd_check->rsvd_bits_mask[0][3];
 	rsvd_check->rsvd_bits_mask[1][2] =
 		rsvd_bits(maxphyaddr, 51) | rsvd_bits(12, 29);
@@ -4332,7 +4341,10 @@ static void paging64_init_context_common(struct kvm_vcpu *vcpu,
 static void paging64_init_context(struct kvm_vcpu *vcpu,
 				  struct kvm_mmu *context)
 {
-	paging64_init_context_common(vcpu, context, PT64_ROOT_4LEVEL);
+	int root_level = is_la57_mode(vcpu) ?
+			 PT64_ROOT_5LEVEL : PT64_ROOT_4LEVEL;
+
+	paging64_init_context_common(vcpu, context, root_level);
 }
 
 static void paging32_init_context(struct kvm_vcpu *vcpu,
@@ -4373,7 +4385,7 @@ static void init_kvm_tdp_mmu(struct kvm_vcpu *vcpu)
 	context->sync_page = nonpaging_sync_page;
 	context->invlpg = nonpaging_invlpg;
 	context->update_pte = nonpaging_update_pte;
-	context->shadow_root_level = kvm_x86_ops->get_tdp_level();
+	context->shadow_root_level = kvm_x86_ops->get_tdp_level(vcpu);
 	context->root_hpa = INVALID_PAGE;
 	context->direct_map = true;
 	context->set_cr3 = kvm_x86_ops->set_tdp_cr3;
@@ -4387,7 +4399,8 @@ static void init_kvm_tdp_mmu(struct kvm_vcpu *vcpu)
 		context->root_level = 0;
 	} else if (is_long_mode(vcpu)) {
 		context->nx = is_nx(vcpu);
-		context->root_level = PT64_ROOT_4LEVEL;
+		context->root_level = is_la57_mode(vcpu) ?
+				PT64_ROOT_5LEVEL : PT64_ROOT_4LEVEL;
 		reset_rsvds_bits_mask(vcpu, context);
 		context->gva_to_gpa = paging64_gva_to_gpa;
 	} else if (is_pae(vcpu)) {
@@ -4444,7 +4457,7 @@ void kvm_init_shadow_ept_mmu(struct kvm_vcpu *vcpu, bool execonly,
 
 	MMU_WARN_ON(VALID_PAGE(context->root_hpa));
 
-	context->shadow_root_level = kvm_x86_ops->get_tdp_level();
+	context->shadow_root_level = kvm_x86_ops->get_tdp_level(vcpu);
 
 	context->nx = true;
 	context->ept_ad = accessed_dirty;
@@ -4498,7 +4511,8 @@ static void init_kvm_nested_mmu(struct kvm_vcpu *vcpu)
 		g_context->gva_to_gpa = nonpaging_gva_to_gpa_nested;
 	} else if (is_long_mode(vcpu)) {
 		g_context->nx = is_nx(vcpu);
-		g_context->root_level = PT64_ROOT_4LEVEL;
+		g_context->root_level = is_la57_mode(vcpu) ?
+					PT64_ROOT_5LEVEL : PT64_ROOT_4LEVEL;
 		reset_rsvds_bits_mask(vcpu, g_context);
 		g_context->gva_to_gpa = paging64_gva_to_gpa_nested;
 	} else if (is_pae(vcpu)) {
diff --git a/arch/x86/kvm/mmu.h b/arch/x86/kvm/mmu.h
index 60b9001..7152b5b 100644
--- a/arch/x86/kvm/mmu.h
+++ b/arch/x86/kvm/mmu.h
@@ -37,11 +37,12 @@
 #define PT32_DIR_PSE36_MASK \
 	(((1ULL << PT32_DIR_PSE36_SIZE) - 1) << PT32_DIR_PSE36_SHIFT)
 
+#define PT64_ROOT_5LEVEL 5
 #define PT64_ROOT_4LEVEL 4
 #define PT32_ROOT_LEVEL 2
 #define PT32E_ROOT_LEVEL 3
 
-#define PT64_ROOT_MAX_LEVEL PT64_ROOT_4LEVEL
+#define PT64_ROOT_MAX_LEVEL PT64_ROOT_5LEVEL
 
 #define PT_PDPE_LEVEL 3
 #define PT_DIRECTORY_LEVEL 2
@@ -50,6 +51,9 @@
 
 static inline u64 rsvd_bits(int s, int e)
 {
+	if (e < s)
+		return 0;
+
 	return ((1ULL << (e - s + 1)) - 1) << s;
 }
 
diff --git a/arch/x86/kvm/mmu_audit.c b/arch/x86/kvm/mmu_audit.c
index 2e6996d..d22ddbd 100644
--- a/arch/x86/kvm/mmu_audit.c
+++ b/arch/x86/kvm/mmu_audit.c
@@ -62,11 +62,11 @@ static void mmu_spte_walk(struct kvm_vcpu *vcpu, inspect_spte_fn fn)
 	if (!VALID_PAGE(vcpu->arch.mmu.root_hpa))
 		return;
 
-	if (vcpu->arch.mmu.root_level == PT64_ROOT_4LEVEL) {
+	if (vcpu->arch.mmu.root_level >= PT64_ROOT_4LEVEL) {
 		hpa_t root = vcpu->arch.mmu.root_hpa;
 
 		sp = page_header(root);
-		__mmu_spte_walk(vcpu, sp, fn, PT64_ROOT_4LEVEL);
+		__mmu_spte_walk(vcpu, sp, fn, vcpu->arch.mmu.root_level);
 		return;
 	}
 
diff --git a/arch/x86/kvm/svm.c b/arch/x86/kvm/svm.c
index f7aa33d..bdd0142 100644
--- a/arch/x86/kvm/svm.c
+++ b/arch/x86/kvm/svm.c
@@ -567,7 +567,7 @@ static inline void invlpga(unsigned long addr, u32 asid)
 	asm volatile (__ex(SVM_INVLPGA) : : "a"(addr), "c"(asid));
 }
 
-static int get_npt_level(void)
+static int get_npt_level(struct kvm_vcpu *vcpu)
 {
 #ifdef CONFIG_X86_64
 	return PT64_ROOT_4LEVEL;
@@ -2389,7 +2389,7 @@ static void nested_svm_init_mmu_context(struct kvm_vcpu *vcpu)
 	vcpu->arch.mmu.get_cr3           = nested_svm_get_tdp_cr3;
 	vcpu->arch.mmu.get_pdptr         = nested_svm_get_tdp_pdptr;
 	vcpu->arch.mmu.inject_page_fault = nested_svm_inject_npf_exit;
-	vcpu->arch.mmu.shadow_root_level = get_npt_level();
+	vcpu->arch.mmu.shadow_root_level = get_npt_level(vcpu);
 	reset_shadow_zero_bits_mask(vcpu, &vcpu->arch.mmu);
 	vcpu->arch.walk_mmu              = &vcpu->arch.nested_mmu;
 }
diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
index ed1074e..614ade7 100644
--- a/arch/x86/kvm/vmx.c
+++ b/arch/x86/kvm/vmx.c
@@ -1200,6 +1200,11 @@ static inline bool cpu_has_vmx_ept_4levels(void)
 	return vmx_capability.ept & VMX_EPT_PAGE_WALK_4_BIT;
 }
 
+static inline bool cpu_has_vmx_ept_5levels(void)
+{
+	return vmx_capability.ept & VMX_EPT_PAGE_WALK_5_BIT;
+}
+
 static inline bool cpu_has_vmx_ept_ad_bits(void)
 {
 	return vmx_capability.ept & VMX_EPT_AD_BIT;
@@ -4296,13 +4301,20 @@ static void vmx_set_cr0(struct kvm_vcpu *vcpu, unsigned long cr0)
 	vmx->emulation_required = emulation_required(vcpu);
 }
 
+static int get_ept_level(struct kvm_vcpu *vcpu)
+{
+	if (cpu_has_vmx_ept_5levels() && (cpuid_maxphyaddr(vcpu) > 48))
+		return VMX_EPT_MAX_GAW + 1;
+	return VMX_EPT_DEFAULT_GAW + 1;
+}
+
 static u64 construct_eptp(struct kvm_vcpu *vcpu, unsigned long root_hpa)
 {
 	u64 eptp;
 
 	/* TODO write the value reading from MSR */
 	eptp = VMX_EPT_DEFAULT_MT |
-		VMX_EPT_DEFAULT_GAW << VMX_EPT_GAW_EPTP_SHIFT;
+		(get_ept_level(vcpu) - 1) << VMX_EPT_GAW_EPTP_SHIFT;
 	if (enable_ept_ad_bits &&
 	    (!is_guest_mode(vcpu) || nested_ept_ad_enabled(vcpu)))
 		eptp |= VMX_EPT_AD_ENABLE_BIT;
@@ -9505,11 +9517,6 @@ static void __init vmx_check_processor_compat(void *rtn)
 	}
 }
 
-static int get_ept_level(void)
-{
-	return VMX_EPT_DEFAULT_GAW + 1;
-}
-
 static u64 vmx_get_mt_mask(struct kvm_vcpu *vcpu, gfn_t gfn, bool is_mmio)
 {
 	u8 cache;
diff --git a/arch/x86/kvm/x86.h b/arch/x86/kvm/x86.h
index 6120670..0107ab7 100644
--- a/arch/x86/kvm/x86.h
+++ b/arch/x86/kvm/x86.h
@@ -62,6 +62,16 @@ static inline bool is_64_bit_mode(struct kvm_vcpu *vcpu)
 	return cs_l;
 }
 
+static inline bool is_la57_mode(struct kvm_vcpu *vcpu)
+{
+#ifdef CONFIG_X86_64
+	return (vcpu->arch.efer & EFER_LMA) &&
+		kvm_read_cr4_bits(vcpu, X86_CR4_LA57);
+#else
+	return 0;
+#endif
+}
+
 static inline bool mmu_is_nested(struct kvm_vcpu *vcpu)
 {
 	return vcpu->arch.walk_mmu == &vcpu->arch.nested_mmu;
Extend the shadow paging code so that a 5-level shadow page table can
be constructed if the VM is running in 5-level paging mode.

Also extend the EPT code so that a 5-level EPT table can be constructed
if the maxphysaddr of the VM exceeds 48 bits. Unlike the shadow logic,
KVM should still use a 4-level EPT table for a VM whose physical
address width is less than 48 bits, even when the VM is running in
5-level paging mode.

Signed-off-by: Yu Zhang <yu.c.zhang@linux.intel.com>
---
 arch/x86/include/asm/kvm_host.h | 10 +++++-----
 arch/x86/include/asm/vmx.h      |  1 +
 arch/x86/kvm/cpuid.c            |  5 +++++
 arch/x86/kvm/mmu.c              | 42 +++++++++++++++++++++++++++--------------
 arch/x86/kvm/mmu.h              |  6 +++++-
 arch/x86/kvm/mmu_audit.c        |  4 ++--
 arch/x86/kvm/svm.c              |  4 ++--
 arch/x86/kvm/vmx.c              | 19 +++++++++++++------
 arch/x86/kvm/x86.h              | 10 ++++++++++
 9 files changed, 71 insertions(+), 30 deletions(-)