Message ID | 1503577676-12345-3-git-send-email-yu.c.zhang@linux.intel.com (mailing list archive) |
---|---|
State | New, archived |
Headers | show |
On 24/08/2017 14:27, Yu Zhang wrote: > diff --git a/arch/x86/kvm/mmu.h b/arch/x86/kvm/mmu.h > index 3ed6192..67e7ec2 100644 > --- a/arch/x86/kvm/mmu.h > +++ b/arch/x86/kvm/mmu.h > @@ -48,6 +48,9 @@ > > static inline u64 rsvd_bits(int s, int e) > { > + if (e < s) > + return 0; > + > return ((1ULL << (e - s + 1)) - 1) << s; > } e = s - 1 is already supported; why do you need e <= s - 2? Paolo
On 8/24/2017 9:40 PM, Paolo Bonzini wrote: > On 24/08/2017 14:27, Yu Zhang wrote: >> diff --git a/arch/x86/kvm/mmu.h b/arch/x86/kvm/mmu.h >> index 3ed6192..67e7ec2 100644 >> --- a/arch/x86/kvm/mmu.h >> +++ b/arch/x86/kvm/mmu.h >> @@ -48,6 +48,9 @@ >> >> static inline u64 rsvd_bits(int s, int e) >> { >> + if (e < s) >> + return 0; >> + >> return ((1ULL << (e - s + 1)) - 1) << s; >> } > e = s - 1 is already supported; why do you need e <= s - 2? Sorry? I do not quite understand. When will e = s - 1? Thanks Yu > Paolo >
On 8/24/2017 11:50 PM, Paolo Bonzini wrote: > On 24/08/2017 17:23, Yu Zhang wrote: >>>> static inline u64 rsvd_bits(int s, int e) >>>> { >>>> + if (e < s) >>>> + return 0; >>>> + >>>> return ((1ULL << (e - s + 1)) - 1) << s; >>>> } >>> e = s - 1 is already supported; why do you need e <= s - 2? >> Sorry? I do not quite understand. When will e = s - 1? > Is there any case where e < s? I can see that MAXPHYADDR=63 gives > rsvd_bits(63, 62), but that works. > > In practice, MAXPHYADDR will never be 59 even because the PKRU bits are > at bits 59..62. Thanks, Paolo. I see. I had made an assumption that MAXPHYADDR shall not exceed the physical one, which is 52 I believe. But I'm not sure there's any place to check this. Maybe we should make sure the vcpu->arch.maxphyaddr will not be greater than the value of the host? Yu > > Paolo >
On 24/08/2017 17:23, Yu Zhang wrote: >>> >>> static inline u64 rsvd_bits(int s, int e) >>> { >>> + if (e < s) >>> + return 0; >>> + >>> return ((1ULL << (e - s + 1)) - 1) << s; >>> } >> e = s - 1 is already supported; why do you need e <= s - 2? > > Sorry? I do not quite understand. When will e = s - 1? Is there any case where e < s? I can see that MAXPHYADDR=63 gives rsvd_bits(63, 62), but that works. In practice, MAXPHYADDR will never be 59 even because the PKRU bits are at bits 59..62. Paolo
On 8/25/2017 12:27 AM, Paolo Bonzini wrote: > On 24/08/2017 17:38, Yu Zhang wrote: >>> >>> In practice, MAXPHYADDR will never be 59 even because the PKRU bits are >>> at bits 59..62. >> Thanks, Paolo. >> I see. I had made an assumption that MAXPHYADDR shall not exceed the >> physical one, >> which is 52 I believe. But I'm not sure there's any place to check this. >> Maybe we should make sure the vcpu->arch.maxphyaddr will not be greater >> than the value of the host? > That's a separate change anyway. In any case, since currently the > MAXPHYADDR is not validated, your change to rsvd_bits makes sense. Thanks, Paolo. As to this patch series, any change I need do? BTW, I have written a patch for kvm-unit-test access test, but the test failed. Not sure if my patch is erroneous or due to a simulator error. I'll send out the test patch after it works.:-) Yu > Paolo >
On 24/08/2017 17:38, Yu Zhang wrote: >> >> >> In practice, MAXPHYADDR will never be 59 even because the PKRU bits are >> at bits 59..62. > > Thanks, Paolo. > I see. I had made an assumption that MAXPHYADDR shall not exceed the > physical one, > which is 52 I believe. But I'm not sure there's any place to check this. > Maybe we should make sure the vcpu->arch.maxphyaddr will not be greater > than the value of the host? That's a separate change anyway. In any case, since currently the MAXPHYADDR is not validated, your change to rsvd_bits makes sense. Paolo
On 24/08/2017 18:21, Yu Zhang wrote: > > > On 8/25/2017 12:27 AM, Paolo Bonzini wrote: >> On 24/08/2017 17:38, Yu Zhang wrote: >>>> >>>> In practice, MAXPHYADDR will never be 59 even because the PKRU bits are >>>> at bits 59..62. >>> Thanks, Paolo. >>> I see. I had made an assumption that MAXPHYADDR shall not exceed the >>> physical one, >>> which is 52 I believe. But I'm not sure there's any place to check this. >>> Maybe we should make sure the vcpu->arch.maxphyaddr will not be greater >>> than the value of the host? >> That's a separate change anyway. In any case, since currently the >> MAXPHYADDR is not validated, your change to rsvd_bits makes sense. > > Thanks, Paolo. > As to this patch series, any change I need do? No, it's fine. > BTW, I have written a patch for kvm-unit-test access test, but the test > failed. > Not sure if my patch is erroneous or due to a simulator error. I'll send > out the > test patch after it works.:-) Try to send it. I can also test it with QEMU TCG. Paolo
On Thu, Aug 24, 2017 at 5:27 AM, Yu Zhang <yu.c.zhang@linux.intel.com> wrote: > Currently, KVM uses CR3_L_MODE_RESERVED_BITS to check the > reserved bits in CR3. Yet the length of reserved bits in > guest CR3 should be based on the physical address width > exposed to the VM. This patch changes CR3 check logic to > calculate the reserved bits at runtime. > > Signed-off-by: Yu Zhang <yu.c.zhang@linux.intel.com> > --- > arch/x86/include/asm/kvm_host.h | 1 - > arch/x86/kvm/emulate.c | 14 ++++++++++++-- > arch/x86/kvm/mmu.h | 3 +++ > arch/x86/kvm/x86.c | 8 ++++---- > 4 files changed, 19 insertions(+), 7 deletions(-) > > diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h > index 6db0ed9..e716228 100644 > --- a/arch/x86/include/asm/kvm_host.h > +++ b/arch/x86/include/asm/kvm_host.h > @@ -79,7 +79,6 @@ > | X86_CR0_ET | X86_CR0_NE | X86_CR0_WP | X86_CR0_AM \ > | X86_CR0_NW | X86_CR0_CD | X86_CR0_PG)) > > -#define CR3_L_MODE_RESERVED_BITS 0xFFFFFF0000000000ULL > #define CR3_PCID_INVD BIT_64(63) > #define CR4_RESERVED_BITS \ > (~(unsigned long)(X86_CR4_VME | X86_CR4_PVI | X86_CR4_TSD | X86_CR4_DE\ > diff --git a/arch/x86/kvm/emulate.c b/arch/x86/kvm/emulate.c > index 319d91f..a89b595 100644 > --- a/arch/x86/kvm/emulate.c > +++ b/arch/x86/kvm/emulate.c > @@ -28,6 +28,7 @@ > > #include "x86.h" > #include "tss.h" > +#include "mmu.h" > > /* > * Operand types > @@ -4097,8 +4098,17 @@ static int check_cr_write(struct x86_emulate_ctxt *ctxt) > u64 rsvd = 0; > > ctxt->ops->get_msr(ctxt, MSR_EFER, &efer); > - if (efer & EFER_LMA) > - rsvd = CR3_L_MODE_RESERVED_BITS & ~CR3_PCID_INVD; > + if (efer & EFER_LMA) { > + u64 maxphyaddr; > + u32 eax = 0x80000008; > + > + if (ctxt->ops->get_cpuid(ctxt, &eax, NULL, NULL, > + NULL, false)) Passing NULL for the address of ecx looks problematic to me. We have: static bool emulator_get_cpuid(struct x86_emulate_ctxt *ctxt, u32 *eax, u32 *ebx, u32 *ecx, u32 *edx, bool check_limit) { return kvm_cpuid(emul_to_vcpu(ctxt), eax, ebx, ecx, edx, check_limit); } And: bool kvm_cpuid(struct kvm_vcpu *vcpu, u32 *eax, u32 *ebx, u32 *ecx, u32 *edx, bool check_limit) { u32 function = *eax, index = *ecx; struct kvm_cpuid_entry2 *best; bool entry_found = true; ... Doesn't this immediately try to dereference a NULL pointer? How much testing have you done of this code? > + maxphyaddr = eax & 0xff; > + else > + maxphyaddr = 36; > + rsvd = rsvd_bits(maxphyaddr, 62); > + } > > if (new_val & rsvd) > return emulate_gp(ctxt, 0); > diff --git a/arch/x86/kvm/mmu.h b/arch/x86/kvm/mmu.h > index 3ed6192..67e7ec2 100644 > --- a/arch/x86/kvm/mmu.h > +++ b/arch/x86/kvm/mmu.h > @@ -48,6 +48,9 @@ > > static inline u64 rsvd_bits(int s, int e) > { > + if (e < s) > + return 0; > + > return ((1ULL << (e - s + 1)) - 1) << s; > } > > diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c > index cc2c7e4..79f5889 100644 > --- a/arch/x86/kvm/x86.c > +++ b/arch/x86/kvm/x86.c > @@ -813,10 +813,10 @@ int kvm_set_cr3(struct kvm_vcpu *vcpu, unsigned long cr3) > return 0; > } > > - if (is_long_mode(vcpu)) { > - if (cr3 & CR3_L_MODE_RESERVED_BITS) > - return 1; > - } else if (is_pae(vcpu) && is_paging(vcpu) && > + if (is_long_mode(vcpu) && > + (cr3 & rsvd_bits(cpuid_maxphyaddr(vcpu), 62))) > + return 1; > + else if (is_pae(vcpu) && is_paging(vcpu) && > !load_pdptrs(vcpu, vcpu->arch.walk_mmu, cr3)) > return 1; > > -- > 2.5.0 >
On 9/16/2017 7:19 AM, Jim Mattson wrote: > On Thu, Aug 24, 2017 at 5:27 AM, Yu Zhang <yu.c.zhang@linux.intel.com> wrote: >> Currently, KVM uses CR3_L_MODE_RESERVED_BITS to check the >> reserved bits in CR3. Yet the length of reserved bits in >> guest CR3 should be based on the physical address width >> exposed to the VM. This patch changes CR3 check logic to >> calculate the reserved bits at runtime. >> >> Signed-off-by: Yu Zhang <yu.c.zhang@linux.intel.com> >> --- >> arch/x86/include/asm/kvm_host.h | 1 - >> arch/x86/kvm/emulate.c | 14 ++++++++++++-- >> arch/x86/kvm/mmu.h | 3 +++ >> arch/x86/kvm/x86.c | 8 ++++---- >> 4 files changed, 19 insertions(+), 7 deletions(-) >> >> diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h >> index 6db0ed9..e716228 100644 >> --- a/arch/x86/include/asm/kvm_host.h >> +++ b/arch/x86/include/asm/kvm_host.h >> @@ -79,7 +79,6 @@ >> | X86_CR0_ET | X86_CR0_NE | X86_CR0_WP | X86_CR0_AM \ >> | X86_CR0_NW | X86_CR0_CD | X86_CR0_PG)) >> >> -#define CR3_L_MODE_RESERVED_BITS 0xFFFFFF0000000000ULL >> #define CR3_PCID_INVD BIT_64(63) >> #define CR4_RESERVED_BITS \ >> (~(unsigned long)(X86_CR4_VME | X86_CR4_PVI | X86_CR4_TSD | X86_CR4_DE\ >> diff --git a/arch/x86/kvm/emulate.c b/arch/x86/kvm/emulate.c >> index 319d91f..a89b595 100644 >> --- a/arch/x86/kvm/emulate.c >> +++ b/arch/x86/kvm/emulate.c >> @@ -28,6 +28,7 @@ >> >> #include "x86.h" >> #include "tss.h" >> +#include "mmu.h" >> >> /* >> * Operand types >> @@ -4097,8 +4098,17 @@ static int check_cr_write(struct x86_emulate_ctxt *ctxt) >> u64 rsvd = 0; >> >> ctxt->ops->get_msr(ctxt, MSR_EFER, &efer); >> - if (efer & EFER_LMA) >> - rsvd = CR3_L_MODE_RESERVED_BITS & ~CR3_PCID_INVD; >> + if (efer & EFER_LMA) { >> + u64 maxphyaddr; >> + u32 eax = 0x80000008; >> + >> + if (ctxt->ops->get_cpuid(ctxt, &eax, NULL, NULL, >> + NULL, false)) > Passing NULL for the address of ecx looks problematic to me. > > We have: > > static bool emulator_get_cpuid(struct x86_emulate_ctxt *ctxt, > u32 *eax, u32 *ebx, u32 *ecx, u32 *edx, bool > check_limit) > { > return kvm_cpuid(emul_to_vcpu(ctxt), eax, ebx, ecx, edx, check_limit); > } > > And: > > bool kvm_cpuid(struct kvm_vcpu *vcpu, u32 *eax, u32 *ebx, > u32 *ecx, u32 *edx, bool check_limit) > { > u32 function = *eax, index = *ecx; > struct kvm_cpuid_entry2 *best; > bool entry_found = true; > ... > > Doesn't this immediately try to dereference a NULL pointer? How much > testing have you done of this code? Thanks Jim. I have tested this code in a simulator to successfully boot a VM in shadow mode. Seems this code is not covered(but I am now still perplexed why this is not covered). Any possibility that the check_cr_write() is not triggered when emulating the cr operations? Anyway, this should be a bug and thanks for pointing this out, and I'll send out the fix later. BR Yu
On 18/09/2017 10:15, Yu Zhang wrote: >> >> static bool emulator_get_cpuid(struct x86_emulate_ctxt *ctxt, >> u32 *eax, u32 *ebx, u32 *ecx, u32 *edx, bool >> check_limit) >> { >> return kvm_cpuid(emul_to_vcpu(ctxt), eax, ebx, ecx, edx, >> check_limit); >> } >> >> And: >> >> bool kvm_cpuid(struct kvm_vcpu *vcpu, u32 *eax, u32 *ebx, >> u32 *ecx, u32 *edx, bool check_limit) >> { >> u32 function = *eax, index = *ecx; >> struct kvm_cpuid_entry2 *best; >> bool entry_found = true; >> ... >> >> Doesn't this immediately try to dereference a NULL pointer? How much >> testing have you done of this code? > > Thanks Jim. > I have tested this code in a simulator to successfully boot a VM in > shadow mode. Seems this code is not covered(but I am now still > perplexed why this is not covered). Any possibility that the > check_cr_write() is not triggered when emulating the cr operations? CR moves usually don't go through the emulator (the main exception is emulation of invalid guest state when the processor doesn't support unrestricted_guest=1, but even that is unlikely to happen with EFER.LMA=1). This explains why you didn't see the failure. > Anyway, this should be a bug and thanks for pointing this out, and > I'll send out the fix later. Thanks, Paolo
On 9/18/2017 4:41 PM, Paolo Bonzini wrote: > On 18/09/2017 10:15, Yu Zhang wrote: >>> static bool emulator_get_cpuid(struct x86_emulate_ctxt *ctxt, >>> u32 *eax, u32 *ebx, u32 *ecx, u32 *edx, bool >>> check_limit) >>> { >>> return kvm_cpuid(emul_to_vcpu(ctxt), eax, ebx, ecx, edx, >>> check_limit); >>> } >>> >>> And: >>> >>> bool kvm_cpuid(struct kvm_vcpu *vcpu, u32 *eax, u32 *ebx, >>> u32 *ecx, u32 *edx, bool check_limit) >>> { >>> u32 function = *eax, index = *ecx; >>> struct kvm_cpuid_entry2 *best; >>> bool entry_found = true; >>> ... >>> >>> Doesn't this immediately try to dereference a NULL pointer? How much >>> testing have you done of this code? >> Thanks Jim. >> I have tested this code in a simulator to successfully boot a VM in >> shadow mode. Seems this code is not covered(but I am now still >> perplexed why this is not covered). Any possibility that the >> check_cr_write() is not triggered when emulating the cr operations? > CR moves usually don't go through the emulator (the main exception is > emulation of invalid guest state when the processor doesn't support > unrestricted_guest=1, but even that is unlikely to happen with > EFER.LMA=1). This explains why you didn't see the failure. Oh, right. It normally goes to handle_cr(). Thanks, Paolo. Yu > >> Anyway, this should be a bug and thanks for pointing this out, and >> I'll send out the fix later. > Thanks, > > Paolo >
diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h index 6db0ed9..e716228 100644 --- a/arch/x86/include/asm/kvm_host.h +++ b/arch/x86/include/asm/kvm_host.h @@ -79,7 +79,6 @@ | X86_CR0_ET | X86_CR0_NE | X86_CR0_WP | X86_CR0_AM \ | X86_CR0_NW | X86_CR0_CD | X86_CR0_PG)) -#define CR3_L_MODE_RESERVED_BITS 0xFFFFFF0000000000ULL #define CR3_PCID_INVD BIT_64(63) #define CR4_RESERVED_BITS \ (~(unsigned long)(X86_CR4_VME | X86_CR4_PVI | X86_CR4_TSD | X86_CR4_DE\ diff --git a/arch/x86/kvm/emulate.c b/arch/x86/kvm/emulate.c index 319d91f..a89b595 100644 --- a/arch/x86/kvm/emulate.c +++ b/arch/x86/kvm/emulate.c @@ -28,6 +28,7 @@ #include "x86.h" #include "tss.h" +#include "mmu.h" /* * Operand types @@ -4097,8 +4098,17 @@ static int check_cr_write(struct x86_emulate_ctxt *ctxt) u64 rsvd = 0; ctxt->ops->get_msr(ctxt, MSR_EFER, &efer); - if (efer & EFER_LMA) - rsvd = CR3_L_MODE_RESERVED_BITS & ~CR3_PCID_INVD; + if (efer & EFER_LMA) { + u64 maxphyaddr; + u32 eax = 0x80000008; + + if (ctxt->ops->get_cpuid(ctxt, &eax, NULL, NULL, + NULL, false)) + maxphyaddr = eax & 0xff; + else + maxphyaddr = 36; + rsvd = rsvd_bits(maxphyaddr, 62); + } if (new_val & rsvd) return emulate_gp(ctxt, 0); diff --git a/arch/x86/kvm/mmu.h b/arch/x86/kvm/mmu.h index 3ed6192..67e7ec2 100644 --- a/arch/x86/kvm/mmu.h +++ b/arch/x86/kvm/mmu.h @@ -48,6 +48,9 @@ static inline u64 rsvd_bits(int s, int e) { + if (e < s) + return 0; + return ((1ULL << (e - s + 1)) - 1) << s; } diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c index cc2c7e4..79f5889 100644 --- a/arch/x86/kvm/x86.c +++ b/arch/x86/kvm/x86.c @@ -813,10 +813,10 @@ int kvm_set_cr3(struct kvm_vcpu *vcpu, unsigned long cr3) return 0; } - if (is_long_mode(vcpu)) { - if (cr3 & CR3_L_MODE_RESERVED_BITS) - return 1; - } else if (is_pae(vcpu) && is_paging(vcpu) && + if (is_long_mode(vcpu) && + (cr3 & rsvd_bits(cpuid_maxphyaddr(vcpu), 62))) + return 1; + else if (is_pae(vcpu) && is_paging(vcpu) && !load_pdptrs(vcpu, vcpu->arch.walk_mmu, cr3)) return 1;
Currently, KVM uses CR3_L_MODE_RESERVED_BITS to check the reserved bits in CR3. Yet the length of reserved bits in guest CR3 should be based on the physical address width exposed to the VM. This patch changes CR3 check logic to calculate the reserved bits at runtime. Signed-off-by: Yu Zhang <yu.c.zhang@linux.intel.com> --- arch/x86/include/asm/kvm_host.h | 1 - arch/x86/kvm/emulate.c | 14 ++++++++++++-- arch/x86/kvm/mmu.h | 3 +++ arch/x86/kvm/x86.c | 8 ++++---- 4 files changed, 19 insertions(+), 7 deletions(-)