Message ID | jpg615ul1j8.fsf@linux.bootlegged.copy (mailing list archive) |
---|---|
State | New, archived |
On 09/07/2015 00:36, Bandan Das wrote:
> Let userspace inquire the maximum physical address width
> of the host processors; this can be used to identify maximum
> memory that can be assigned to the guest.
>
> Reported-by: Laszlo Ersek <lersek@redhat.com>
> Signed-off-by: Bandan Das <bsd@redhat.com>
> ---
>  arch/x86/kvm/x86.c       | 3 +++
>  include/uapi/linux/kvm.h | 1 +
>  2 files changed, 4 insertions(+)
>
> diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
> index bbaf44e..97d6746 100644
> --- a/arch/x86/kvm/x86.c
> +++ b/arch/x86/kvm/x86.c
> @@ -2683,6 +2683,9 @@ int kvm_vm_ioctl_check_extension(struct kvm *kvm, long ext)
>  	case KVM_CAP_NR_MEMSLOTS:
>  		r = KVM_USER_MEM_SLOTS;
>  		break;
> +	case KVM_CAP_PHY_ADDR_WIDTH:
> +		r = boot_cpu_data.x86_phys_bits;
> +		break;

Userspace can just use CPUID, can't it?

Paolo

>  	case KVM_CAP_PV_MMU:	/* obsolete */
>  		r = 0;
>  		break;
> diff --git a/include/uapi/linux/kvm.h b/include/uapi/linux/kvm.h
> index 716ad4a..e7949a1 100644
> --- a/include/uapi/linux/kvm.h
> +++ b/include/uapi/linux/kvm.h
> @@ -817,6 +817,7 @@ struct kvm_ppc_smmu_info {
>  #define KVM_CAP_DISABLE_QUIRKS 116
>  #define KVM_CAP_X86_SMM 117
>  #define KVM_CAP_MULTI_ADDRESS_SPACE 118
> +#define KVM_CAP_PHY_ADDR_WIDTH 119
>
>  #ifdef KVM_CAP_IRQ_ROUTING
>
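For reference, the CPUID route Paolo mentions could look roughly like the sketch below. It assumes a GCC or Clang toolchain on x86 (the `<cpuid.h>` helper `__get_cpuid`); the function names are illustrative and do not come from the patch. CPUID leaf 0x80000008 reports the physical address width in EAX[7:0] and the linear address width in EAX[15:8].

```c
#include <assert.h>
#include <stdint.h>

#if defined(__x86_64__) || defined(__i386__)
#include <cpuid.h>   /* GCC/Clang helper header; x86 only */
#endif

/* CPUID leaf 0x80000008: EAX[7:0] = physical address bits. */
static unsigned phys_bits_from_eax(uint32_t eax)
{
    return eax & 0xffu;
}

/* CPUID leaf 0x80000008: EAX[15:8] = linear (virtual) address bits. */
static unsigned virt_bits_from_eax(uint32_t eax)
{
    return (eax >> 8) & 0xffu;
}

/* Returns the host physical address width, or 0 if it cannot be queried
 * (non-x86 build, or a CPU that lacks leaf 0x80000008). */
static unsigned host_phys_bits(void)
{
#if defined(__x86_64__) || defined(__i386__)
    unsigned int eax, ebx, ecx, edx;

    if (__get_cpuid(0x80000008u, &eax, &ebx, &ecx, &edx))
        return phys_bits_from_eax(eax);
#endif
    return 0;
}

/* Illustrative check on a hand-built EAX value: 0x3024 encodes
 * 36 physical bits (0x24) and 48 virtual bits (0x30). */
static void self_check(void)
{
    assert(phys_bits_from_eax(0x3024u) == 36);
    assert(virt_bits_from_eax(0x3024u) == 48);
}
```

This only yields the raw width of the host processor; it says nothing about whether hardware-assisted translation is in use.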
On 07/09/15 08:09, Paolo Bonzini wrote:
>
>
> On 09/07/2015 00:36, Bandan Das wrote:
>> Let userspace inquire the maximum physical address width
>> of the host processors; this can be used to identify maximum
>> memory that can be assigned to the guest.
>>
>> Reported-by: Laszlo Ersek <lersek@redhat.com>
>> Signed-off-by: Bandan Das <bsd@redhat.com>
>> ---
>>  arch/x86/kvm/x86.c       | 3 +++
>>  include/uapi/linux/kvm.h | 1 +
>>  2 files changed, 4 insertions(+)
>>
>> diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
>> index bbaf44e..97d6746 100644
>> --- a/arch/x86/kvm/x86.c
>> +++ b/arch/x86/kvm/x86.c
>> @@ -2683,6 +2683,9 @@ int kvm_vm_ioctl_check_extension(struct kvm *kvm, long ext)
>>  	case KVM_CAP_NR_MEMSLOTS:
>>  		r = KVM_USER_MEM_SLOTS;
>>  		break;
>> +	case KVM_CAP_PHY_ADDR_WIDTH:
>> +		r = boot_cpu_data.x86_phys_bits;
>> +		break;
>
> Userspace can just use CPUID, can't it?

I believe KVM's cooperation is necessary, for the following reason:

The truncation only occurs when the guest-phys <-> host-phys translation
is done in hardware, *and* the phys bits of the host processor are
insufficient to represent the highest guest-phys address that the guest
will ever face.

The first condition (of course) means that the truncation depends on EPT
being enabled. (I didn't test on AMD so I don't know if RVI has the same
issue.) If EPT is disabled, either because the host processor lacks it,
or because the respective kvm_intel module parameter is set so, then the
issue cannot be experienced.

Therefore I believe a KVM patch is necessary.

However, this specific patch doesn't seem sufficient; it should also
consider whether EPT is enabled. (And the ioctl should be perhaps
renamed to reflect that -- what QEMU needs to know is not the raw
physical address width of the host processor, but whether that width
will cause EPT to silently truncate high guest-phys addresses.)

Thanks
Laszlo

>
> Paolo
>
>>  	case KVM_CAP_PV_MMU:	/* obsolete */
>>  		r = 0;
>>  		break;
>> diff --git a/include/uapi/linux/kvm.h b/include/uapi/linux/kvm.h
>> index 716ad4a..e7949a1 100644
>> --- a/include/uapi/linux/kvm.h
>> +++ b/include/uapi/linux/kvm.h
>> @@ -817,6 +817,7 @@ struct kvm_ppc_smmu_info {
>>  #define KVM_CAP_DISABLE_QUIRKS 116
>>  #define KVM_CAP_X86_SMM 117
>>  #define KVM_CAP_MULTI_ADDRESS_SPACE 118
>> +#define KVM_CAP_PHY_ADDR_WIDTH 119
>>
>>  #ifdef KVM_CAP_IRQ_ROUTING
>>
>>
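As an illustration of the "kvm_intel module parameter" Laszlo refers to, the sketch below reads the `ept` parameter through sysfs. It is not part of the proposed patch and is not how QEMU queries KVM. It assumes the usual sysfs layout for module parameters, where boolean parameters read back as "Y" or "N"; the helper names are made up for this example.

```c
#include <assert.h>
#include <stdio.h>

/* Interpret the text of a boolean module parameter as exposed in sysfs.
 * Kernel bool parameters read back as "Y" or "N" followed by a newline. */
static int bool_param_is_set(const char *text)
{
    assert(text != NULL);
    return text[0] == 'Y' || text[0] == 'y' || text[0] == '1';
}

/* Returns 1 if kvm_intel reports EPT enabled, 0 if it reports it disabled,
 * and -1 if the parameter cannot be read (module not loaded, AMD host, ...). */
static int kvm_intel_ept_enabled(void)
{
    char buf[8] = "";
    FILE *f = fopen("/sys/module/kvm_intel/parameters/ept", "r");

    if (!f)
        return -1;
    if (!fgets(buf, sizeof buf, f)) {
        fclose(f);
        return -1;
    }
    fclose(f);
    return bool_param_is_set(buf);
}
```

Reading sysfs from userspace is exactly the sort of workaround a proper KVM capability would make unnecessary, which is the argument for having KVM's cooperation.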
On 09/07/2015 08:43, Laszlo Ersek wrote:
> On 07/09/15 08:09, Paolo Bonzini wrote:
>>
>>
>> On 09/07/2015 00:36, Bandan Das wrote:
>>> Let userspace inquire the maximum physical address width
>>> of the host processors; this can be used to identify maximum
>>> memory that can be assigned to the guest.
>>>
>>> Reported-by: Laszlo Ersek <lersek@redhat.com>
>>> Signed-off-by: Bandan Das <bsd@redhat.com>
>>> ---
>>>  arch/x86/kvm/x86.c       | 3 +++
>>>  include/uapi/linux/kvm.h | 1 +
>>>  2 files changed, 4 insertions(+)
>>>
>>> diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
>>> index bbaf44e..97d6746 100644
>>> --- a/arch/x86/kvm/x86.c
>>> +++ b/arch/x86/kvm/x86.c
>>> @@ -2683,6 +2683,9 @@ int kvm_vm_ioctl_check_extension(struct kvm *kvm, long ext)
>>>  	case KVM_CAP_NR_MEMSLOTS:
>>>  		r = KVM_USER_MEM_SLOTS;
>>>  		break;
>>> +	case KVM_CAP_PHY_ADDR_WIDTH:
>>> +		r = boot_cpu_data.x86_phys_bits;
>>> +		break;
>>
>> Userspace can just use CPUID, can't it?
>
> I believe KVM's cooperation is necessary, for the following reason:
>
> The truncation only occurs when the guest-phys <-> host-phys translation
> is done in hardware, *and* the phys bits of the host processor are
> insufficient to represent the highest guest-phys address that the guest
> will ever face.
>
> The first condition (of course) means that the truncation depends on EPT
> being enabled. (I didn't test on AMD so I don't know if RVI has the same
> issue.) If EPT is disabled, either because the host processor lacks it,
> or because the respective kvm_intel module parameter is set so, then the
> issue cannot be experienced.
>
> Therefore I believe a KVM patch is necessary.
>
> However, this specific patch doesn't seem sufficient; it should also
> consider whether EPT is enabled. (And the ioctl should be perhaps
> renamed to reflect that -- what QEMU needs to know is not the raw
> physical address width of the host processor, but whether that width
> will cause EPT to silently truncate high guest-phys addresses.)

Right; if you want to consider whether EPT is enabled (which is the
right thing to do, albeit it makes for a much bigger patch) a KVM patch
is necessary. In that case you also need to patch the API documentation.

Paolo
Paolo Bonzini <pbonzini@redhat.com> writes:

> On 09/07/2015 08:43, Laszlo Ersek wrote:
>> On 07/09/15 08:09, Paolo Bonzini wrote:
>>>
>>>
>>> On 09/07/2015 00:36, Bandan Das wrote:
>>>> Let userspace inquire the maximum physical address width
>>>> of the host processors; this can be used to identify maximum
>>>> memory that can be assigned to the guest.
>>>>
>>>> Reported-by: Laszlo Ersek <lersek@redhat.com>
>>>> Signed-off-by: Bandan Das <bsd@redhat.com>
>>>> ---
>>>>  arch/x86/kvm/x86.c       | 3 +++
>>>>  include/uapi/linux/kvm.h | 1 +
>>>>  2 files changed, 4 insertions(+)
>>>>
>>>> diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
>>>> index bbaf44e..97d6746 100644
>>>> --- a/arch/x86/kvm/x86.c
>>>> +++ b/arch/x86/kvm/x86.c
>>>> @@ -2683,6 +2683,9 @@ int kvm_vm_ioctl_check_extension(struct kvm *kvm, long ext)
>>>>  	case KVM_CAP_NR_MEMSLOTS:
>>>>  		r = KVM_USER_MEM_SLOTS;
>>>>  		break;
>>>> +	case KVM_CAP_PHY_ADDR_WIDTH:
>>>> +		r = boot_cpu_data.x86_phys_bits;
>>>> +		break;
>>>
>>> Userspace can just use CPUID, can't it?
>>
>> I believe KVM's cooperation is necessary, for the following reason:
>>
>> The truncation only occurs when the guest-phys <-> host-phys translation
>> is done in hardware, *and* the phys bits of the host processor are
>> insufficient to represent the highest guest-phys address that the guest
>> will ever face.
>>
>> The first condition (of course) means that the truncation depends on EPT
>> being enabled. (I didn't test on AMD so I don't know if RVI has the same
>> issue.) If EPT is disabled, either because the host processor lacks it,
>> or because the respective kvm_intel module parameter is set so, then the
>> issue cannot be experienced.
>>
>> Therefore I believe a KVM patch is necessary.
>>
>> However, this specific patch doesn't seem sufficient; it should also
>> consider whether EPT is enabled. (And the ioctl should be perhaps
>> renamed to reflect that -- what QEMU needs to know is not the raw
>> physical address width of the host processor, but whether that width
>> will cause EPT to silently truncate high guest-phys addresses.)
>
> Right; if you want to consider whether EPT is enabled (which is the
> right thing to do, albeit it makes for a much bigger patch) a KVM patch
> is necessary. In that case you also need to patch the API documentation.

Note that this patch really doesn't do anything except for printing a
message that something might potentially go wrong. Without EPT, you don't
hit the processor limitation with your setup, but the user should nevertheless
still be notified. In fact, I think shadow paging code should also emulate
this behavior if the gpa is out of range.

> Paolo
On 07/09/15 20:32, Bandan Das wrote:
> Paolo Bonzini <pbonzini@redhat.com> writes:
>
>> On 09/07/2015 08:43, Laszlo Ersek wrote:
>>> On 07/09/15 08:09, Paolo Bonzini wrote:
>>>>
>>>>
>>>> On 09/07/2015 00:36, Bandan Das wrote:
>>>>> Let userspace inquire the maximum physical address width
>>>>> of the host processors; this can be used to identify maximum
>>>>> memory that can be assigned to the guest.
>>>>>
>>>>> Reported-by: Laszlo Ersek <lersek@redhat.com>
>>>>> Signed-off-by: Bandan Das <bsd@redhat.com>
>>>>> ---
>>>>>  arch/x86/kvm/x86.c       | 3 +++
>>>>>  include/uapi/linux/kvm.h | 1 +
>>>>>  2 files changed, 4 insertions(+)
>>>>>
>>>>> diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
>>>>> index bbaf44e..97d6746 100644
>>>>> --- a/arch/x86/kvm/x86.c
>>>>> +++ b/arch/x86/kvm/x86.c
>>>>> @@ -2683,6 +2683,9 @@ int kvm_vm_ioctl_check_extension(struct kvm *kvm, long ext)
>>>>>  	case KVM_CAP_NR_MEMSLOTS:
>>>>>  		r = KVM_USER_MEM_SLOTS;
>>>>>  		break;
>>>>> +	case KVM_CAP_PHY_ADDR_WIDTH:
>>>>> +		r = boot_cpu_data.x86_phys_bits;
>>>>> +		break;
>>>>
>>>> Userspace can just use CPUID, can't it?
>>>
>>> I believe KVM's cooperation is necessary, for the following reason:
>>>
>>> The truncation only occurs when the guest-phys <-> host-phys translation
>>> is done in hardware, *and* the phys bits of the host processor are
>>> insufficient to represent the highest guest-phys address that the guest
>>> will ever face.
>>>
>>> The first condition (of course) means that the truncation depends on EPT
>>> being enabled. (I didn't test on AMD so I don't know if RVI has the same
>>> issue.) If EPT is disabled, either because the host processor lacks it,
>>> or because the respective kvm_intel module parameter is set so, then the
>>> issue cannot be experienced.
>>>
>>> Therefore I believe a KVM patch is necessary.
>>>
>>> However, this specific patch doesn't seem sufficient; it should also
>>> consider whether EPT is enabled. (And the ioctl should be perhaps
>>> renamed to reflect that -- what QEMU needs to know is not the raw
>>> physical address width of the host processor, but whether that width
>>> will cause EPT to silently truncate high guest-phys addresses.)
>>
>> Right; if you want to consider whether EPT is enabled (which is the
>> right thing to do, albeit it makes for a much bigger patch) a KVM patch
>> is necessary. In that case you also need to patch the API documentation.
>
> Note that this patch really doesn't do anything except for printing a
> message that something might potentially go wrong.

Yes.

> Without EPT, you don't
> hit the processor limitation with your setup, but the user should nevertheless
> still be notified.

I disagree.

> In fact, I think shadow paging code should also emulate
> this behavior if the gpa is out of range.

I disagree.

There is no "out of range" gpa. QEMU allocates enough memory, and it
should be completely transparent to the guest. The fact that it silently
breaks with nested paging if the host processor doesn't have enough
address bits is a bug (maybe a hardware bug, maybe a KVM bug; I'm not
sure, but I suspect it's a hardware bug). In any case the guest
shouldn't care at all. It is a *virtual* machine, and the VMM should lie
to it plausibly enough. How much RAM, and how many phys address bits the
host has, is a performance question, but it should not be a correctness
question. A 256 GB guest should run (slowly, but correctly) on a laptop
that has only 4 GB of RAM and only 36 phys addr bits, but plenty of swap
space.

Because otherwise your argument could be extrapolated as "TCG should
break too if the gpa is 'out of range'".

So, I disagree. Whatever memory you give to the guest should just work
(unless of course you want to emulate a small address width for the
*VCPU*, but that's absolutely not the use case here). What we have here
is a leaky abstraction: a PCPU limitation giving away a lie that the
guest should never notice. The guest should be able to use all memory
that was specified with QEMU's -m, regardless of TCG vs. KVM-without-EPT
vs. KVM-with-EPT. If the last case cannot work (due to hardware
limitations), that's fine, but then (and only then) a warning should be
printed.

... In any case, please understand that I'm not campaigning for this
warning :) IIRC the warning was your (very welcome!) idea after I
reported the problem; I'm just trying to ensure that the warning match
the exact issue I encountered.

Thanks!
Laszlo
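The condition Laszlo describes (warn only for KVM with EPT, and only when the host's physical address bits cannot represent the highest guest-physical address) can be written down as a small predicate. This is a sketch under stated assumptions: the enum and function names are hypothetical, and how a VMM would obtain the inputs is left open.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical labels for the three cases named in the thread. */
enum accel { ACCEL_TCG, ACCEL_KVM_SHADOW, ACCEL_KVM_EPT };

/* Number of address bits needed to represent gpa (0 needs 0 bits). */
static unsigned gpa_bits_needed(uint64_t gpa)
{
    unsigned bits = 0;

    while (gpa) {
        bits++;
        gpa >>= 1;
    }
    return bits;
}

/* Warn if, and only if, translation is done in hardware (EPT) and the
 * highest guest-physical address does not fit in the host's width. */
static bool should_warn(enum accel a, unsigned host_phys_bits,
                        uint64_t highest_gpa)
{
    assert(host_phys_bits > 0 && host_phys_bits <= 64);

    if (a != ACCEL_KVM_EPT)
        return false;
    return gpa_bits_needed(highest_gpa) > host_phys_bits;
}
```

With 36 host bits, the highest representable address is 2^36 - 1 (just under 64 GiB), so `should_warn(ACCEL_KVM_EPT, 36, 1ULL << 36)` is true, while the same guest under TCG or shadow paging yields false.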
Laszlo Ersek <lersek@redhat.com> writes:
...
> Yes.
>
>> Without EPT, you don't
>> hit the processor limitation with your setup, but the user should nevertheless
>> still be notified.
>
> I disagree.
>
>> In fact, I think shadow paging code should also emulate
>> this behavior if the gpa is out of range.
>
> I disagree.
>
> There is no "out of range" gpa. QEMU allocates enough memory, and it
> should be completely transparent to the guest. The fact that it silently
> breaks with nested paging if the host processor doesn't have enough
> address bits is a bug (maybe a hardware bug, maybe a KVM bug; I'm not
> sure, but I suspect it's a hardware bug). In any case the guest
> shouldn't care at all. It is a *virtual* machine, and the VMM should lie
> to it plausibly enough. How much RAM, and how many phys address bits the
> host has, is a performance question, but it should not be a correctness
> question. A 256 GB guest should run (slowly, but correctly) on a laptop
> that has only 4 GB of RAM and only 36 phys addr bits, but plenty of swap
> space.
>
> Because otherwise your argument could be extrapolated as "TCG should
> break too if the gpa is 'out of range'".
>
> So, I disagree. Whatever memory you give to the guest should just work
> (unless of course you want to emulate a small address width for the
> *VCPU*, but that's absolutely not the use case here). What we have here
> is a leaky abstraction: a PCPU limitation giving away a lie that the
> guest should never notice. The guest should be able to use all memory
> that was specified with QEMU's -m, regardless of TCG vs. KVM-without-EPT
> vs. KVM-with-EPT. If the last case cannot work (due to hardware
> limitations), that's fine, but then (and only then) a warning should be
> printed.

Hmm... Ok, I understand your point. So, this is more like a EPT
limitation/bug in that Qemu isn't complaining about the memory assigned
to the guest but EPT code is breaking owing to the processor physical
address width. And honestly, I now think that this patch just makes the whole
situation more confusing :) I am wondering if it's just possible for kvm to
simply throw an error like a EPT misconfiguration or something ..

Or in other words, if using a hardware assisted mechanism is just not
possible, KVM will simply not let it run instead of letting a guest
stuck in boot.

> ... In any case, please understand that I'm not campaigning for this
> warning :) IIRC the warning was your (very welcome!) idea after I
> reported the problem; I'm just trying to ensure that the warning match
> the exact issue I encountered.
>
> Thanks!
> Laszlo
On 07/09/15 22:02, Bandan Das wrote:
> Laszlo Ersek <lersek@redhat.com> writes:
> ...
>> Yes.
>>
>>> Without EPT, you don't
>>> hit the processor limitation with your setup, but the user should nevertheless
>>> still be notified.
>>
>> I disagree.
>>
>>> In fact, I think shadow paging code should also emulate
>>> this behavior if the gpa is out of range.
>>
>> I disagree.
>>
>> There is no "out of range" gpa. QEMU allocates enough memory, and it
>> should be completely transparent to the guest. The fact that it silently
>> breaks with nested paging if the host processor doesn't have enough
>> address bits is a bug (maybe a hardware bug, maybe a KVM bug; I'm not
>> sure, but I suspect it's a hardware bug). In any case the guest
>> shouldn't care at all. It is a *virtual* machine, and the VMM should lie
>> to it plausibly enough. How much RAM, and how many phys address bits the
>> host has, is a performance question, but it should not be a correctness
>> question. A 256 GB guest should run (slowly, but correctly) on a laptop
>> that has only 4 GB of RAM and only 36 phys addr bits, but plenty of swap
>> space.
>>
>> Because otherwise your argument could be extrapolated as "TCG should
>> break too if the gpa is 'out of range'".
>>
>> So, I disagree. Whatever memory you give to the guest should just work
>> (unless of course you want to emulate a small address width for the
>> *VCPU*, but that's absolutely not the use case here). What we have here
>> is a leaky abstraction: a PCPU limitation giving away a lie that the
>> guest should never notice. The guest should be able to use all memory
>> that was specified with QEMU's -m, regardless of TCG vs. KVM-without-EPT
>> vs. KVM-with-EPT. If the last case cannot work (due to hardware
>> limitations), that's fine, but then (and only then) a warning should be
>> printed.
>
> Hmm... Ok, I understand your point. So, this is more like a EPT
> limitation/bug in that Qemu isn't complaining about the memory assigned
> to the guest but EPT code is breaking owing to the processor physical
> address width.

Exactly.

> And honestly, I now think that this patch just makes the whole
> situation more confusing :) I am wondering if it's just possible for kvm to
> simply throw an error like a EPT misconfiguration or something ..
>
> Or in other words, if using a hardware assisted mechanism is just not
> possible, KVM will simply not let it run instead of letting a guest
> stuck in boot.

That would be the best solution.

Thanks
Laszlo
Bandan Das <bsd@redhat.com> writes:

> Laszlo Ersek <lersek@redhat.com> writes:
> ...
>> Yes.
>>
>>> Without EPT, you don't
>>> hit the processor limitation with your setup, but the user should nevertheless
>>> still be notified.
>>
>> I disagree.
>>
>>> In fact, I think shadow paging code should also emulate
>>> this behavior if the gpa is out of range.
>>
>> I disagree.
>>
>> There is no "out of range" gpa. QEMU allocates enough memory, and it
>> should be completely transparent to the guest. The fact that it silently
>> breaks with nested paging if the host processor doesn't have enough
>> address bits is a bug (maybe a hardware bug, maybe a KVM bug; I'm not
>> sure, but I suspect it's a hardware bug). In any case the guest
>> shouldn't care at all. It is a *virtual* machine, and the VMM should lie
>> to it plausibly enough. How much RAM, and how many phys address bits the
>> host has, is a performance question, but it should not be a correctness
>> question. A 256 GB guest should run (slowly, but correctly) on a laptop
>> that has only 4 GB of RAM and only 36 phys addr bits, but plenty of swap
>> space.
>>
>> Because otherwise your argument could be extrapolated as "TCG should
>> break too if the gpa is 'out of range'".
>>
>> So, I disagree. Whatever memory you give to the guest should just work
>> (unless of course you want to emulate a small address width for the
>> *VCPU*, but that's absolutely not the use case here). What we have here
>> is a leaky abstraction: a PCPU limitation giving away a lie that the
>> guest should never notice. The guest should be able to use all memory
>> that was specified with QEMU's -m, regardless of TCG vs. KVM-without-EPT
>> vs. KVM-with-EPT. If the last case cannot work (due to hardware
>> limitations), that's fine, but then (and only then) a warning should be
>> printed.
>
> Hmm... Ok, I understand your point. So, this is more like a EPT
> limitation/bug in that Qemu isn't complaining about the memory assigned
> to the guest but EPT code is breaking owing to the processor physical
> address width. And honestly, I now think that this patch just makes the whole
> situation more confusing :) I am wondering if it's just possible for kvm to
> simply throw an error like a EPT misconfiguration or something ..
>
> Or in other words, if using a hardware assisted mechanism is just not
> possible, KVM will simply not let it run instead of letting a guest
> stuck in boot.

I noticed that when the guest gets stuck, trace shows an endless loop of
EXTERNAL_INTERRUPT exits with code 14 (PF). There's a note in 28.2.2 of
the spec that "No processors supporting the Intel64 architecture support
more than 48 physical-address bits.. An attempt to use such an address
causes a page fault". So, my first guess was to print out the guest
physical address. That seems to be well beyond the range and is always
0xff000 (when the guest is stuck). The other thing I can think of is the
EPT entries have bits in the 51:N range set which is reserved and always
0. I haven't verified but it looks like there's
ept_misconfig_inspect_spte() that should already catch this condition.
I am out of ideas for today :)

>
>> ... In any case, please understand that I'm not campaigning for this
>> warning :) IIRC the warning was your (very welcome!) idea after I
>> reported the problem; I'm just trying to ensure that the warning match
>> the exact issue I encountered.
>>
>> Thanks!
>> Laszlo
On 09/07/2015 20:57, Laszlo Ersek wrote:
>> Without EPT, you don't
>> hit the processor limitation with your setup, but the user should nevertheless
>> still be notified.
>
> I disagree.

FWIW, I also disagree (and it looks like Bandan disagrees with himself
now :)).

>> In fact, I think shadow paging code should also emulate
>> this behavior if the gpa is out of range.
>
> I disagree.

Same here.

> There is no "out of range" gpa. QEMU allocates enough memory, and it
> should be completely transparent to the guest. The fact that it silently
> breaks with nested paging if the host processor doesn't have enough
> address bits is a bug (maybe a hardware bug, maybe a KVM bug; I'm not
> sure, but I suspect it's a hardware bug).

It's a hardware bug, possibly due to some limitations in the physical
addresses that the TLB can store? I guess KVM could detect the
situation and fall back to sloooow shadow paging.

> ... In any case, please understand that I'm not campaigning for this
> warning :) IIRC the warning was your (very welcome!) idea after I
> reported the problem; I'm just trying to ensure that the warning match
> the exact issue I encountered.

Yup. I think the right thing to do would be to hide memory above the
limit. A kernel patch to query the limit is definitely necessary, but
it needs to return e.g. 48 for shadow paging (otherwise you could just
use CPUID). I'm not sure if the rest is possible with just QEMU, or it
requires help from the firmware. Probably yes.

Paolo
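A sketch of the query semantics Paolo outlines: report the host's physical address width when EPT is in use, and a fixed larger value for shadow paging. The value 48 is taken from his "e.g. 48"; the constant and function names are placeholders, and nothing here reflects an actual KVM interface.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Placeholder: the "e.g. 48" from the thread, not a value defined by KVM. */
#define SHADOW_PAGING_GPA_BITS 48u

/* Width that guest-physical addresses may safely use. Under EPT the
 * host's physical address width is the limit; under shadow paging the
 * host width does not constrain the guest. */
static unsigned usable_gpa_bits(bool ept_enabled, unsigned host_phys_bits)
{
    assert(host_phys_bits > 0 && host_phys_bits <= 52);
    return ept_enabled ? host_phys_bits : SHADOW_PAGING_GPA_BITS;
}

/* First guest-physical address that does NOT fit in the given width. */
static uint64_t gpa_limit(unsigned bits)
{
    assert(bits < 64);
    return UINT64_C(1) << bits;
}
```

On the 36-bit host discussed in the thread, `gpa_limit(usable_gpa_bits(true, 36))` is 2^36 (64 GiB); memory at or above that address is what would have to be hidden, or what should make the VM fail to start, depending on which policy is chosen.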
On 07/10/15 16:13, Paolo Bonzini wrote:
>
>
> On 09/07/2015 20:57, Laszlo Ersek wrote:
>>> Without EPT, you don't
>>> hit the processor limitation with your setup, but the user should nevertheless
>>> still be notified.
>>
>> I disagree.
>
> FWIW, I also disagree (and it looks like Bandan disagrees with himself
> now :)).
>
>>> In fact, I think shadow paging code should also emulate
>>> this behavior if the gpa is out of range.
>>
>> I disagree.
>
> Same here.
>
>> There is no "out of range" gpa. QEMU allocates enough memory, and it
>> should be completely transparent to the guest. The fact that it silently
>> breaks with nested paging if the host processor doesn't have enough
>> address bits is a bug (maybe a hardware bug, maybe a KVM bug; I'm not
>> sure, but I suspect it's a hardware bug).
>
> It's a hardware bug, possibly due to some limitations in the physical
> addresses that the TLB can store? I guess KVM could detect the
> situation and fall back to sloooow shadow paging.
>
>> ... In any case, please understand that I'm not campaigning for this
>> warning :) IIRC the warning was your (very welcome!) idea after I
>> reported the problem; I'm just trying to ensure that the warning match
>> the exact issue I encountered.
>
> Yup. I think the right thing to do would be to hide memory above the
> limit.

How so?

- The stack would not be doing what the user asks for. Pass -m <a_lot>,
  and the guest would silently see less memory. If the user found out,
  he'd immediately ask (or set out debugging) why. I think if the user's
  request cannot be satisfied, the stack should fail hard.

- Assuming the user didn't find out, and the guest just worked (with
  less memory than the user asked for), then the hidden portion of the
  memory (that QEMU allocated nonetheless) would be just wasted, on the
  host system. (Especially with overcommit_memory=2 (which is the most
  prudent setting).)

Thanks
Laszlo

> A kernel patch to query the limit is definitely necessary, but
> it needs to return e.g. 48 for shadow paging (otherwise you could just
> use CPUID). I'm not sure if the rest is possible with just QEMU, or it
> requires help from the firmware. Probably yes.
>
> Paolo
>
On 10/07/2015 16:57, Laszlo Ersek wrote:
> > > ... In any case, please understand that I'm not campaigning for this
> > > warning :) IIRC the warning was your (very welcome!) idea after I
> > > reported the problem; I'm just trying to ensure that the warning match
> > > the exact issue I encountered.
> >
> > Yup. I think the right thing to do would be to hide memory above the
> > limit.
> How so?
>
> - The stack would not be doing what the user asks for. Pass -m <a_lot>,
> and the guest would silently see less memory. If the user found out,
> he'd immediately ask (or set out debugging) why. I think if the user's
> request cannot be satisfied, the stack should fail hard.

That's another possibility. I think both of them are wrong depending on
_why_ you're using "-m <a lot>" in the first place.

Considering that this really happens (on Xeons) only for 1TB+ guests,
it's probably just for debugging and then hiding the memory makes some
sense.

Paolo
On 07/10/15 16:59, Paolo Bonzini wrote:
>
>
> On 10/07/2015 16:57, Laszlo Ersek wrote:
>>>> ... In any case, please understand that I'm not campaigning for this
>>>> warning :) IIRC the warning was your (very welcome!) idea after I
>>>> reported the problem; I'm just trying to ensure that the warning match
>>>> the exact issue I encountered.
>>>
>>> Yup. I think the right thing to do would be to hide memory above the
>>> limit.
>> How so?
>>
>> - The stack would not be doing what the user asks for. Pass -m <a_lot>,
>> and the guest would silently see less memory. If the user found out,
>> he'd immediately ask (or set out debugging) why. I think if the user's
>> request cannot be satisfied, the stack should fail hard.
>
> That's another possibility. I think both of them are wrong depending on
> _why_ you're using "-m <a lot>" in the first place.
>
> Considering that this really happens (on Xeons) only for 1TB+ guests,

I reported this issue because I ran into it with a ~64GB guest. From my
/proc/cpuinfo:

model name	: Intel(R) Core(TM) i7 CPU M 620 @ 2.67GHz
address sizes	: 36 bits physical, 48 bits virtual

I was specifically developing 64GB+ support for OVMF, and this
limitation caused me to think that there was a bug in my OVMF patches.
(There wasn't.) An error message from QEMU, advising me to turn off EPT,
would have saved me many hours.

Thanks
Laszlo

> it's probably just for debugging and then hiding the memory makes some
> sense.
>
> Paolo
>
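The "address sizes" line Laszlo quotes can also be read programmatically. A minimal sketch, assuming the /proc/cpuinfo line has the shape shown in his mail; the function name is made up for illustration.

```c
#include <assert.h>
#include <stdio.h>

/* Parse a /proc/cpuinfo line of the form
 *   "address sizes : 36 bits physical, 48 bits virtual"
 * Whitespace in the format matches any run of spaces or tabs in the input.
 * Returns 0 and fills *phys / *virt on success, -1 if the line does not
 * match. */
static int parse_address_sizes(const char *line, int *phys, int *virt)
{
    assert(line != NULL && phys != NULL && virt != NULL);

    if (sscanf(line, "address sizes : %d bits physical, %d bits virtual",
               phys, virt) == 2)
        return 0;
    return -1;
}
```

Fed the line from the mail, this yields 36 physical and 48 virtual bits. A guest whose highest guest-physical address needs more than 36 bits would trip the EPT problem on that machine, which fits the ~64GB report.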
Laszlo Ersek <lersek@redhat.com> writes:

> On 07/10/15 16:59, Paolo Bonzini wrote:
>>
>>
>> On 10/07/2015 16:57, Laszlo Ersek wrote:
>>>>> ... In any case, please understand that I'm not campaigning for this
>>>>> warning :) IIRC the warning was your (very welcome!) idea after I
>>>>> reported the problem; I'm just trying to ensure that the warning match
>>>>> the exact issue I encountered.
>>>>
>>>> Yup. I think the right thing to do would be to hide memory above the
>>>> limit.
>>> How so?
>>>
>>> - The stack would not be doing what the user asks for. Pass -m <a_lot>,
>>> and the guest would silently see less memory. If the user found out,
>>> he'd immediately ask (or set out debugging) why. I think if the user's
>>> request cannot be satisfied, the stack should fail hard.
>>
>> That's another possibility. I think both of them are wrong depending on
>> _why_ you're using "-m <a lot>" in the first place.
>>
>> Considering that this really happens (on Xeons) only for 1TB+ guests,
>
> I reported this issue because I ran into it with a ~64GB guest. From my
> /proc/cpuinfo:
>
> model name	: Intel(R) Core(TM) i7 CPU M 620 @ 2.67GHz
> address sizes	: 36 bits physical, 48 bits virtual
>
> I was specifically developing 64GB+ support for OVMF, and this
> limitation caused me to think that there was a bug in my OVMF patches.
> (There wasn't.) An error message from QEMU, advising me to turn off EPT,
> would have saved me many hours.

Right, I specifically reserved a system with 36 bits physical to
reproduce this and it was very easy to reproduce. If it's a hardware
bug, I would say, it's a very annoying one (if not serious). I wonder if
Intel folks can chime in.

> Thanks
> Laszlo
>
>> it's probably just for debugging and then hiding the memory makes some
>> sense.

Actually, I agree with Laszlo here. Hiding memory is synonymous to
forcing the user to use less for the -m argument as is failing. But
failing and letting the user do it himself can save hours of debugging.

Regards,
The confused teenager who can't make up his mind.

>> Paolo
>>
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index bbaf44e..97d6746 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -2683,6 +2683,9 @@ int kvm_vm_ioctl_check_extension(struct kvm *kvm, long ext)
 	case KVM_CAP_NR_MEMSLOTS:
 		r = KVM_USER_MEM_SLOTS;
 		break;
+	case KVM_CAP_PHY_ADDR_WIDTH:
+		r = boot_cpu_data.x86_phys_bits;
+		break;
 	case KVM_CAP_PV_MMU:	/* obsolete */
 		r = 0;
 		break;
diff --git a/include/uapi/linux/kvm.h b/include/uapi/linux/kvm.h
index 716ad4a..e7949a1 100644
--- a/include/uapi/linux/kvm.h
+++ b/include/uapi/linux/kvm.h
@@ -817,6 +817,7 @@ struct kvm_ppc_smmu_info {
 #define KVM_CAP_DISABLE_QUIRKS 116
 #define KVM_CAP_X86_SMM 117
 #define KVM_CAP_MULTI_ADDRESS_SPACE 118
+#define KVM_CAP_PHY_ADDR_WIDTH 119
 
 #ifdef KVM_CAP_IRQ_ROUTING
Let userspace inquire the maximum physical address width
of the host processors; this can be used to identify maximum
memory that can be assigned to the guest.

Reported-by: Laszlo Ersek <lersek@redhat.com>
Signed-off-by: Bandan Das <bsd@redhat.com>
---
 arch/x86/kvm/x86.c       | 3 +++
 include/uapi/linux/kvm.h | 1 +
 2 files changed, 4 insertions(+)