diff mbox series

x86/mce: Check for hypervisor before enabling additional error logging

Message ID 20201109232402.GA25492@agluck-desk2.amr.corp.intel.com (mailing list archive)
State New, archived
Headers show
Series x86/mce: Check for hypervisor before enabling additional error logging | expand

Commit Message

Tony Luck Nov. 9, 2020, 11:24 p.m. UTC
Booting as a guest under KVM results in error messages about
unchecked MSR access:

[    6.814328][    T0] unchecked MSR access error: RDMSR from 0x17f at rIP: 0xffffffff84483f16 (mce_intel_feature_init+0x156/0x270)

because KVM doesn't provide emulation for random model specific registers.

Check for X86_FEATURE_HYPERVISOR and skip trying to enable the mode (a
guest shouldn't be concerned with corrected errors anyway).

Reported-by: Qian Cai <cai@redhat.com>
Fixes: 68299a42f842 ("x86/mce: Enable additional error logging on certain Intel CPUs")
Signed-off-by: Tony Luck <tony.luck@intel.com>
---
 arch/x86/kernel/cpu/mce/intel.c | 3 +++
 1 file changed, 3 insertions(+)

Comments

Borislav Petkov Nov. 10, 2020, 6:31 a.m. UTC | #1
On Mon, Nov 09, 2020 at 03:24:02PM -0800, Luck, Tony wrote:
> Booting as a guest under KVM results in error messages about
> unchecked MSR access:
> 
> [    6.814328][    T0] unchecked MSR access error: RDMSR from 0x17f at rIP: 0xffffffff84483f16 (mce_intel_feature_init+0x156/0x270)
> 
> because KVM doesn't provide emulation for random model specific registers.
> 
> Check for X86_FEATURE_HYPERVISOR and skip trying to enable the mode (a
> guest shouldn't be concerned with corrected errors anyway).
> 
> Reported-by: Qian Cai <cai@redhat.com>
> Fixes: 68299a42f842 ("x86/mce: Enable additional error logging on certain Intel CPUs")
> Signed-off-by: Tony Luck <tony.luck@intel.com>
> ---
>  arch/x86/kernel/cpu/mce/intel.c | 3 +++
>  1 file changed, 3 insertions(+)
> 
> diff --git a/arch/x86/kernel/cpu/mce/intel.c b/arch/x86/kernel/cpu/mce/intel.c
> index b47883e364b4..7f7d863400b7 100644
> --- a/arch/x86/kernel/cpu/mce/intel.c
> +++ b/arch/x86/kernel/cpu/mce/intel.c
> @@ -517,6 +517,9 @@ static void intel_imc_init(struct cpuinfo_x86 *c)
>  {
>  	u64 error_control;
>  
> +	if (boot_cpu_has(X86_FEATURE_HYPERVISOR))
> +		return;
> +

Frankly, I'm tired of wagging the dog because the tail can't. If
qemu/kvm can't emulate a CPU model fully then it should ignore those
unknown MSR accesses by default, i.e., that "ignore_msrs" functionality
should be on by default I'd say...

We certainly can't be sprinkling this check everytime the kernel tries
to do something as basic as read an MSR.
Paolo Bonzini Nov. 10, 2020, 8:50 a.m. UTC | #2
On 10/11/20 07:31, Borislav Petkov wrote:
>>   
>> +	if (boot_cpu_has(X86_FEATURE_HYPERVISOR))
>> +		return;
>> +
> Frankly, I'm tired of wagging the dog because the tail can't. If
> qemu/kvm can't emulate a CPU model fully then it should ignore those
> unknown MSR accesses by default, i.e., that "ignore_msrs" functionality
> should be on by default I'd say...
> 
> We certainly can't be sprinkling this check everytime the kernel tries
> to do something as basic as read an MSR.

You don't have to, also because it's wrong.  Fortunately it's much 
simpler than that:

1) ignore_msrs _cannot_ be on by default.  You cannot know in advance 
that for all non-architectural MSRs it's okay for them to read as zero 
and eat writes.  For some non-architectural MSR which never reads as 
zero on real hardware, who knows that there isn't some code using the 
contents of the MSR as a divisor, and causing a division by zero 
exception with ignore_msrs=1?

2) it's not just KVM.  _Any_ hypervisor is bound to have this issue for 
some non-architectural MSRs.  KVM just gets the flak because Linux CI 
environments (for obvious reasons) use it more than they use Hyper-V or 
ESXi or VirtualBox.

3) because of (1) and (2), the solution is very simple.  If the MSR is 
architectural, its absence is a KVM bug and we'll fix it in all stable 
versions.  If the MSR is not architectural (and 17Fh isn't; not only 
it's not mentioned in the SDM, even Google is failing me), never ever 
assume that the CPUID family/model/stepping implies a given MSR is 
there, and just use rdmsr_safe/wrmsr_safe.

So, for this patch,

Nacked-by: Paolo Bonzini <pbonzini@redhat.com>

Paolo
Borislav Petkov Nov. 10, 2020, 9:56 a.m. UTC | #3
On Tue, Nov 10, 2020 at 09:50:43AM +0100, Paolo Bonzini wrote:
> 1) ignore_msrs _cannot_ be on by default.  You cannot know in advance that
> for all non-architectural MSRs it's okay for them to read as zero and eat
> writes.  For some non-architectural MSR which never reads as zero on real
> hardware, who knows that there isn't some code using the contents of the MSR
> as a divisor, and causing a division by zero exception with ignore_msrs=1?

So if you're emulating a certain type of hardware - say a certain CPU
model - then what are you saying? That you're emulating it but not
really all of it, just some bits?

Because this is what happens - the kernel checks that it runs on a
certain CPU type and this tells it that those MSRs are there. But then
comes virt and throws all assumptions out.

So if it emulates a CPU model and the kernel tries to access those MSRs,
then the HV should ignore those MSR accesses if it doesn't know about
them. Why should the kernel change everytime some tool or virtualization
has shortcomings?

> 2) it's not just KVM.  _Any_ hypervisor is bound to have this issue for some
> non-architectural MSRs.  KVM just gets the flak because Linux CI
> environments (for obvious reasons) use it more than they use Hyper-V or ESXi
> or VirtualBox.

It's not flak - I'm trying to find a solution which is workable for
both. The kernel is not at fault here.

> 3) because of (1) and (2), the solution is very simple.  If the MSR is
> architectural, its absence is a KVM bug and we'll fix it in all stable
> versions.  If the MSR is not architectural (and 17Fh isn't; not only it's
> not mentioned in the SDM,

It is mentioned in the SDM.

> even Google is failing me), never ever assume that the CPUID
> family/model/stepping implies a given MSR is there, and just use
> rdmsr_safe/wrmsr_safe.

Yes, we don't have a choice, as always. ;-\

But maybe we should have a choice and maybe qemu/kvm should have a way
to ignore certain MSRs for certain CPU types, regardless of them being
architectural or not.

Thx.
Paolo Bonzini Nov. 10, 2020, 10:40 a.m. UTC | #4
On 10/11/20 10:56, Borislav Petkov wrote:
> On Tue, Nov 10, 2020 at 09:50:43AM +0100, Paolo Bonzini wrote:
>> 1) ignore_msrs _cannot_ be on by default.  You cannot know in advance that
>> for all non-architectural MSRs it's okay for them to read as zero and eat
>> writes.  For some non-architectural MSR which never reads as zero on real
>> hardware, who knows that there isn't some code using the contents of the MSR
>> as a divisor, and causing a division by zero exception with ignore_msrs=1?
> 
> So if you're emulating a certain type of hardware - say a certain CPU
> model - then what are you saying? That you're emulating it but not
> really all of it, just some bits?

We try to emulate all that is described in the SDM as architectural, as 
long as we expose the corresponding CPUID leaves.

However, f/m/s mean nothing when running virtualized.  First, trying to 
derive any non-architectural property from the f/m/s is going to fail. 
Second, even the host can be anything as long as it's newer than the 
f/m/s that the VM reports (i.e. you can get a Sandy Bridge model and 
model name even if running on Skylake).

Also, X86_FEATURE_HYPERVISOR might be clear even if running virtualized. 
  (Thank you nVidia for using it to play market segmentation games).

> Because this is what happens - the kernel checks that it runs on a
> certain CPU type and this tells it that those MSRs are there. But then
> comes virt and throws all assumptions out.
> 
> So if it emulates a CPU model and the kernel tries to access those MSRs,
> then the HV should ignore those MSR accesses if it doesn't know about
> them. Why should the kernel change everytime some tool or virtualization
> has shortcomings?

See above: how can the hypervisor know a safe value for all MSRs, 
possibly including the undocumented ones?

>> 3) because of (1) and (2), the solution is very simple.  If the MSR is
>> architectural, its absence is a KVM bug and we'll fix it in all stable
>> versions.  If the MSR is not architectural (and 17Fh isn't; not only it's
>> not mentioned in the SDM,
> 
> It is mentioned in the SDM.

Oh right they moved the MSRs to a separate manual; found it now.  Still, 
it's not architectural.

> But maybe we should have a choice and maybe qemu/kvm should have a way
> to ignore certain MSRs for certain CPU types, regardless of them being
> architectural or not.

If it makes sense to emulate certain non-architectural MSRs we can add 
them.  Supporting the error control MSR wouldn't even be hard, but I'm 
not sure it makes sense:

1) that MSR has not been there on current processors for several years 
(and therefore Intel has clearly no intention of making architectural). 
  For what we know, even current processors might not provide any of 
that extended information at all (and still the VM could present Sandy 
Bridge f/m/s).

2) it would only present extended error info if the host itself enables 
the bit, so one might question the wisdom of backporting that support 
this to stable kernels

3) It's unclear whether the guest would be able to use the extended 
error information at all (and in some cases the description in the 
manual is not even proper English: "allows the iMC to log first device 
error when corrected error is detected during normal read"?).

4) other hypervisors, including older distros, would likely have the 
same issue.

Paolo
Borislav Petkov Nov. 10, 2020, 3:50 p.m. UTC | #5
On Tue, Nov 10, 2020 at 11:40:34AM +0100, Paolo Bonzini wrote:
> However, f/m/s mean nothing when running virtualized.  First, trying to
> derive any non-architectural property from the f/m/s is going to fail.
> Second, even the host can be anything as long as it's newer than the f/m/s
> that the VM reports (i.e. you can get a Sandy Bridge model and model name
> even if running on Skylake).

Yah, that's why I said "then comes virt and throws all assumptions out."

> See above: how can the hypervisor know a safe value for all MSRs, possibly
> including the undocumented ones?

Not all - just the ones which belong to a model. I was thinking of
having a mapping between f/m/s and a list of MSRs which those models
have - even non-architectural ones - but that's a waste of energy. Why?
Because using the *msr_safe() variants will give you the same thing
without doing any of the MSR lists in KVM and any of that jumping thru
hoops.

Which means, the kernel MSR accessors should be doing _safe(), i.e.,
exception handling, by default. And the non-safe ones - rdmsrl, wrmsrl,
etc, should all be used only in very seldom cases. Hmm, yeah, that would
probably solve this class of problems once and for all.

> If it makes sense to emulate certain non-architectural MSRs we can add them.

See above - probably not worth the effort.

I'll take a look at how ugly it would become to make the majority of MSR
accesses safe.

Thx.
Paolo Bonzini Nov. 10, 2020, 4:08 p.m. UTC | #6
On 10/11/20 16:50, Borislav Petkov wrote:
> I was thinking of
> having a mapping between f/m/s and a list of MSRs which those models
> have - even non-architectural ones - but that's a waste of energy. Why?
> Because using the *msr_safe() variants will give you the same thing

Yes, pretty much.

>> If it makes sense to emulate certain non-architectural MSRs we can add them.
> See above - probably not worth the effort.

When we do, certain Microsoft OSes are usually involved. :)

> I'll take a look at how ugly it would become to make the majority of MSR
> accesses safe.

I think most of them already are, especially the non-architectural ones, 
because I remember going through a similar discussion a few years ago 
and Andy said basically "screw it, let's just use *_safe anywhere" as 
well.  I don't see any need to do anything but add it to your review 
checklist if you have it (and do it now for MSR_ERROR_CONTROL).

Paolo
Tony Luck Nov. 10, 2020, 5:52 p.m. UTC | #7
I still think there is a reasonable case to claim that this usage is right to check
whether it is running as a guest.

Look at what it is trying to do ... change the behavior of the platform w.r.t. logging
of memory errors.  How does that make any sense for a guest ... that doesn't even
know what memory is present on the platform. Or have guarantees that what it sees
as memory address 0x12345678 maps to the same set of cells in a DRAM from one
second to the next?

-Tony
Paolo Bonzini Nov. 10, 2020, 8:37 p.m. UTC | #8
On 10/11/20 18:52, Luck, Tony wrote:
> Look at what it is trying to do ... change the behavior of the platform w.r.t. logging
> of memory errors.  How does that make any sense for a guest ...

Logging of memory errors certainly makes sense for a guest, KVM already 
does MCE forwarding as you probably know.

The exact set of information that MSR_ERROR_CONTROL[1] adds may not make 
much sense in the case of KVM, but it may make sense for other 
hypervisors that do nothing but partition the host.  (Difficult for me 
to say since the relevant part of the SDM might as well be written in 
Klingon :)).

In any case, checking HYPERVISOR is not enough because having it clear 
is a valid configuration.  So you would still have to switch to 
{rd,wr}msrl_safe, and then checking HYPERVISOR is pointless.

Paolo

> that doesn't even
> know what memory is present on the platform. Or have guarantees that what it sees
> as memory address 0x12345678 maps to the same set of cells in a DRAM from one
> second to the next?
diff mbox series

Patch

diff --git a/arch/x86/kernel/cpu/mce/intel.c b/arch/x86/kernel/cpu/mce/intel.c
index b47883e364b4..7f7d863400b7 100644
--- a/arch/x86/kernel/cpu/mce/intel.c
+++ b/arch/x86/kernel/cpu/mce/intel.c
@@ -517,6 +517,9 @@  static void intel_imc_init(struct cpuinfo_x86 *c)
 {
 	u64 error_control;
 
+	if (boot_cpu_has(X86_FEATURE_HYPERVISOR))
+		return;
+
 	switch (c->x86_model) {
 	case INTEL_FAM6_SANDYBRIDGE_X:
 	case INTEL_FAM6_IVYBRIDGE_X: